0e6495223a
6 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
| 9dd5821436 |
v0.7.5: OAuth CR-taint fix + mouse opt-in + CR-safety sweep
- Fix bash arithmetic crash on MobaXterm/Cygwin: $(date +%s) was returning CR-tainted values landing in $(( )) operands - Mouse mode off by default; opt in via LARRY_MOUSE=1 or /mouse on - Comprehensive CR-safety sweep across lib/*.sh and larry.sh — every command-substitution result, file read, and user input that feeds an arithmetic context, case dispatcher, or path/header is now CR-stripped at the source New shared helper lib/cygwin-safe.sh defines three primitives: coerce_int VAL [DEFAULT] — for arithmetic / integer-test operands strip_cr VAL — for case patterns, regex tests, paths, headers read_clean VAR [PROMPT] — read -r wrapper that strips CR pre-assign Hardened call sites (14 files, 60+ patch points): - larry.sh: status-line date/tput, 3 y/N approvals, auth menu, API key - lib/oauth.sh: cmd_login + cmd_refresh date+%s captures - lib/nc-engine.sh: 5 y/N action prompts + find|wc arithmetic - lib/nc-msgs.sh: parse_time_ms (4 date sites) + meta-TSV time + MSG_COUNT - lib/nc-regression.sh: tr|wc count + hl7-diff ?-fallback arithmetic - lib/nc-smat-diff.sh: A_COUNT/B_COUNT/DIFFS_TOTAL - lib/nc-insert-protocol.sh: every awk-emitted line number → head/tail math - lib/journal.sh: _next_seq wc -l arithmetic - lib/lessons.sh: _next_id/_count + 2 y/N prompts - lib/hl7-sanitize.sh: cmd_count + clear-table y/N - lib/ssh-helper.sh: 4 local+remote wc -c integer compares - lib/nc-find.sh, lib/nc-table.sh, lib/nc-document.sh, larry-rollback.sh Reproduces the exact error Bryan hit: bash: ...: arithmetic syntax error: invalid arithmetic operator (error token is "") lib/cygwin-safe.sh added to MANIFEST so it auto-syncs on next launch. Co-Authored-By: Clover (Claude Opus 4.7) <noreply@anthropic.com> |
|||
| 58e6bf4e03 |
v0.7.3: automatic PHI detection (tiered detection + blacklist contexts)
Adds automatic PHI tokenization on two surfaces: user input and HL7-shaped tool results. Supersedes Bryan's reverted |
|||
| 0927238dcd |
v0.7.1: status line moves from above-prompt to between-turn (post-input, pre-response)
render_status_line is no longer called before printf 'you[model]>'. It is
now invoked after read_user_input returns and after @file/PHI preprocessing
complete, immediately before add_user_text/agent_turn. The visual effect is
that the dim status divider sits BETWEEN turns — summarising the cost of
the just-completed turn as the user heads into the next one.
The slash-command and empty-input paths all 'continue' before the new call
site, so no status line renders on /help, /status, /clear, /quit, etc.
First-turn suppression continues to live inside render_status_line (it
returns silently while STATUS_* globals are empty and _LARRY_TURNS=0), so
the very first prompt of a session still has nothing above the response.
/status on-demand command is unchanged; LARRY_NO_STATUS=1 still disables
entirely. Comments updated at render_status_line, the STATUS_* globals
header, the help block, and the LARRY_NO_STATUS env doc.
Supersedes the earlier combined v0.7.1 (
|
|||
| af2ffe883c |
v0.7.1: status line below prompt + automatic PHI detection + session-artifact upload
Feature 1 — Status line BELOW the prompt (was: above).
The dim status line now renders AFTER each completed agent_turn and BEFORE
the next prompt, sitting between turns as a footer to the just-finished
exchange. Shipped Option B from the spec — render_status_line moved to the
tail of the REPL loop, the call before printing the prompt was removed.
Option A (cursor manipulation under an active readline prompt) was rejected
because `read -e` takes exclusive control of the cursor and inserting a
repositioned footer below an active prompt is fragile on MobaXterm / Cygwin
(readline redisplay clobbers manual cursor moves). Visual outcome is
identical to "below the previous prompt cycle", and /status still forces a
re-render mid-conversation if needed.
Feature 2 — Automatic PHI detection.
New auto_detect_phi() runs BEFORE preprocess_phi_markers and tokenizes any
value matching PHI-shaped patterns (email, SSN, phone, DOB, MRN 6-12 digits,
HL7 caret-name, "Last, First", or loose "Title Case Title Case"). Uses the
existing hl7-sanitize.sh tokenize-value pipeline so canonicalization
(sort-unique-lowercase NAME tokens, ISO DOB, digits-only PHONE/SSN,
lowercase EMAIL) collapses different surface forms onto one token across
the session. Skipped: paths, URLs, already-tokenized values, manual @@/{{phi:}}
markers, timestamps (13+ digits or 10 digits starting with '1'), and a
built-in allowlist of common non-PHI two-word phrases ("Home Assistant",
"Mac Studio", etc.).
Modes: confirm (default — prompts Y/n on loose name-like matches once per
session), aggressive (silent always-tokenize), off. Env LARRY_AUTO_PHI;
runtime /auto-phi and /auto-phi-status slash commands. Per-turn override
with "!nophi " prefix. Manual markers always win. New normalize-value
subcommand on hl7-sanitize.sh exposes the canonicalization step so the
per-session memory cache uses canonical keys (so "John Smith" and
"JOHN SMITH" share one confirm decision). EMAIL + PHONE categories added
to normalize_value().
Feature 3 — Session-artifact upload at close.
New upload_session_artifacts() POSTs $LARRY_HOME/log/headers.log,
$LARRY_HOME/sessions/<id>.log.md, and <id>.messages.json to
$LARRY_MEMORY_UPLOAD_URL on session exit. Each request carries
X-Larry-Source (headers-log | session-log | session-messages),
X-Larry-Version, and X-Session-Id headers so the ingest side can route
appropriately. Fires from both the clean main_loop exit and the EXIT/INT/TERM
trap (idempotent via _LARRY_UPLOAD_FIRED guard). Unset URL = silent skip
with a one-line warn. Auth tokens are never logged: headers.log captures
only response headers matching ^anthropic-* or ^retry-after: (per v0.6.9
writer); the session log + messages contain post-tokenization content only.
No regressions to v0.7.0 work — HL7 tab completion, mouse mode toggles,
TOOLS_JSON heredoc, streaming, @file refs, status-line existence, slash
completion, and all v0.6.x machinery remain untouched. MANIFEST unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
|||
| c2bba7be90 |
v0.5.5: @@VALUE inline PHI syntax + name canonicalization
Bryan asked for an easier-to-remember inline PHI marker than {{phi:VALUE}}
and for name forms like SMITH^JOHN / Smith, John / John Smith / JOHN SMITH
to all collapse to the same hash. Both shipped.
INLINE SYNTAX (in addition to the legacy {{phi:VALUE}} which still works):
@@VALUE unbracketed — VALUE has no whitespace
e.g. @@12345 @@SMITH^JOHN @@V789
@@VALUE@@ bracketed — VALUE may contain spaces
e.g. @@John Smith@@ @@Smith, John@@
Parser is 2-pass to disambiguate mixed forms in the same prompt: bracketed
markers are matched first (via grep -oE with a regex that excludes leading/
trailing whitespace inside the brackets), then the unbracketed pass scans
the remaining text. Verified against:
"look for @@12345 in PID.3 for @@John Smith@@ DOB @@01/15/1985 ..."
extracts 4 markers correctly and routes each to its category.
AUTO-CATEGORY DETECTION (lib/hl7-sanitize.sh: detect_category):
pure digits 4-15 → MRN
9 digits with dashes → SSN
date-shaped → DOB
caret or comma → NAME
2+ alpha tokens → NAME
else → MANUAL
CANONICALIZATION (lib/hl7-sanitize.sh: normalize_value):
NAME: lowercase, replace ',^/' with spaces, sort unique alpha tokens
SMITH^JOHN, Smith John, John Smith, JOHN SMITH → "john smith"
DOB: parse to YYYY-MM-DD (GNU date or BSD date fallback)
SSN: strip dashes/whitespace
MRN/MANUAL: trim outer whitespace only
TABLE SCHEMA bumped to 4 columns (token / category / canonical / original).
Legacy 3-column rows still read fine — lookups key on column 3 which is
"canonical" in new rows and "value" in legacy rows (mismatches just create
a new token, no corruption). Detokenize prefers column 4, falls back to
column 3 for legacy compat.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
|||
| b9415f3b57 |
v0.3.3: PHI sanitize/desanitize + {{phi:...}} prompt preprocessing
Bryan's ask: use Larry on prod data without PHI ever leaving the client box.
Added:
lib/hl7-sanitize.sh — tokenize PHI fields in HL7 messages
lib/hl7-desanitize.sh — reverse op (local view-time unmask)
Tokenization model:
- Replace PHI fields with [[CATEGORY_NNNN]] tokens (MRN, NAME, DOB,
ADDR, PHONE, ACCT, SSN, PROV, VISIT, etc.)
- Same value → same token across messages (deterministic via local
lookup table; analysis can still correlate patients).
- Lookup table at $LARRY_HOME/sanitize/lookup.tsv mode 0600 — never
leaves the client.
- Default PHI rule set covers PID, PV1, NK1, GT1, IN1, OBR, OBX,
DG1, ORC; --rules-file to extend.
- --strict also tokenizes unknown Z segments wholesale.
Prompt-side preprocessing in larry.sh:
- {{phi:VALUE}} inline marker, auto-category lookup
- {{phi:CATEGORY:VALUE}} explicit category
- Replaced with the token BEFORE the user input enters conversation
history. The original never reaches the API.
- Local feedback "phi> {{phi:...}} → [[TOKEN]]" printed to terminal only.
New REPL slash commands:
/phi <value> tokenize a single value, print the token
/unmask <token> show original (local terminal only, never API)
/tokens show full PHI ↔ token lookup table
New tools in larry.sh schema:
hl7_sanitize agent can sanitize a file before reading PHI
tokenize-value / detokenize-value (subcommands of hl7-sanitize.sh)
Persona update (agents/larry.md):
- Documented PHI mode and rules for proactive sanitize-first behavior
MANUAL.md updated with the full PHI section including limitations.
Brings total native tools to 29.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|