Bryan Johnson 60b8f0e1c8 v0.8.2: Presidio sidecar for free-text NER (tier-5) — closes V1

The only path that closes V1 (free-text PHI gap — the dominant real-world
failure mode per Vera). Opt-in install; larry runs in v0.8.1 mode on hosts
without Presidio (MobaXterm/Cygwin per Bryan's accepted tradeoff).

New files:
- lib/phi-presidio-sidecar.py — FastAPI service on 127.0.0.1:$LARRY_PHI_PORT
  (default 41189). Presidio AnalyzerEngine + AnonymizerEngine over spaCy
  en_core_web_sm + 3 HL7-specific custom recognizers (HL7_MRN, HL7_CARET_NAME,
  HL7_PHONE_BARE). POST /redact and GET /health.
- lib/phi-sidecar.sh — lifecycle (start/stop/status/health/ensure). ensure
  is idempotent; called backgrounded from main_loop so it never blocks the
  first prompt. Honors LARRY_PHI_VENV.
- lib/phi-client.sh — bash client (phi_client_available / phi_redact_text /
  phi_redact_entities). CR-safe; 5s timeout bounds tier-5 stall.

larry.sh:
- auto_detect_phi gains tier-5: after tiers 1-4, before status summary,
  source phi-client.sh, run Presidio on a token-masked copy of the input,
  tokenize each entity through hl7-sanitize.sh tokenize-value (category
  presidio_<TYPE>) so token IDs stay stable. Honors confirm + strict modes.
  Removed the v0.7.3 early-return that skipped past tier-5 when tiers 1-4
  found nothing — pure prose now always reaches tier-5.
- Token-safe substitution: existing [[...]] tokens are pulled to sentinels,
  tier-5 value is replaced, sentinels restored — prevents the token-within-
  token corruption that naive literal-replace caused on already-tokenized
  text. Acronym guard drops HL7/clinical jargon (SSN/MRN/DOB/ADT) Presidio
  over-tags as ORGANIZATION.
- Graceful degradation: sidecar unreachable → tier-5 no-ops with a one-time
  stderr warning. /phi-sidecar slash command + completion table.

install-larry.sh:
- Probes python3 3.9+; offers to create $LARRY_HOME/phi-venv and install
  presidio + fastapi + uvicorn + en_core_web_sm. Skips silently (with a
  v0.8.1-mode note) on Cygwin/MobaXterm without python3, and on
  non-interactive pipe installs. Sets LARRY_PHI_VENV in the larry shim.

MANIFEST: three new lib files added for auto-sync.

Prototype validation (Bryan's Mac, Apple Silicon, Python 3.14):
  cold start (en_core_web_sm): ~9s   (vs ~82s if Presidio auto-grabs _lg;
                                       we pin _sm for the REPL budget)
  warm analyzer latency:       P50 20.6ms / P95 22.7ms
  end-to-end HTTP round-trip:  ~57ms warm; ~150ms first-post-startup
All comfortably under the 200ms-per-turn budget.

MobaXterm verdict: v0.8.2 is Mac/Linux-only. MobaXterm stays on v0.8.1 +
nudges, per Bryan's explicit acceptance. install-larry.sh enforces this
by platform detection; larry.sh tier-5 silently no-ops when the sidecar
is absent (which IS the MobaXterm path — no code is platform-gated).

Verification: bash -n clean on larry.sh + all 3 new lib scripts; python3
ast.parse clean on the sidecar; end-to-end tier-5 tested live against the
sidecar (pure prose, rule-pack+tier-5 combined with no token corruption,
!nophi bypass); strict-mode fail-closed abort tested; CR-taint, path-block,
and base64 round-trip batteries re-run green.

Co-Authored-By: Clover (Claude Opus 4.7) <noreply@anthropic.com>

2026-05-27 20:00:23 -07:00

16 KiB

Raw Blame History

Changelog

All notable changes to cloverleaf-larry / larry-anywhere are recorded here. Versioning is loose-semver; bumps trigger the in-process self-update on every running client via LARRY_BASE_URL + MANIFEST.

v0.8.2 — 2026-05-27

Microsoft Presidio sidecar for free-text NER. Closes V1 from Vera's audit — the dominant real-world failure mode (patient names, addresses, un-keyworded dates in prose chat). Opt-in install; larry runs in v0.8.1 mode on hosts where Presidio isn't installed (MobaXterm/Cygwin per Bryan's accepted tradeoff).

lib/phi-presidio-sidecar.py — FastAPI service on 127.0.0.1:$LARRY_PHI_PORT (default 41189). Wraps Presidio's AnalyzerEngine + AnonymizerEngine over spaCy en_core_web_sm (12MB model, ~9-second cold start). Two endpoints: POST /redact takes {"text": "..."} and returns {"redacted": "...", "entities": [...], "latency_ms": N}; GET /health for the launcher's readiness probe. Three HL7-specific custom recognizers added (HL7_MRN for 6-12 digit numerics with patient/MRN/account context; HL7_CARET_NAME for SMITH^JOHN outside Tier-3 line context; HL7_PHONE_BARE for plain 10-digit phones). Confidence threshold for tier-5 tokenize is 0.3 (below that is too noisy).
lib/phi-sidecar.sh — lifecycle launcher. Subcommands: start / stop / status / health / ensure. ensure is idempotent (no-op if already up); called from larry.sh main_loop startup, backgrounded so it never blocks larry's first prompt. Waits up to 30 seconds for the sidecar to become healthy after start; surfaces the log tail if startup fails. PID file at $LARRY_HOME/.phi-sidecar.pid; log at $LARRY_HOME/log/phi-sidecar.log. Honors LARRY_PHI_VENV env to use a dedicated virtualenv (which the installer sets up at $LARRY_HOME/phi-venv when the user opts in).
lib/phi-client.sh — bash wrapper around /redact. Sourceable functions: phi_client_available, phi_redact_text, phi_redact_entities. Also runs standalone as a CLI (./phi-client.sh check / redact / entities). CR-safe (sources cygwin-safe.sh defensively); 5-second curl timeout bounds any tier-5 stall.
Tier-5 integration in larry.sh:auto_detect_phi. New stage AFTER the existing tier-1/2/3/4 substitution and BEFORE the status summary. Sources phi-client.sh lazily, probes phi_client_available, and on success runs phi_redact_entities to get Presidio's per-entity output. Each entity is tokenized through the SAME hl7-sanitize.sh tokenize-value pipeline as tiers 1-4 (category prefixed presidio_<TYPE>) so token IDs remain stable across surfaces and the /tokens listing stays unified. Tier-5 honors LARRY_AUTO_PHI=confirm (prompts Y/n once per value) and strict (aborts the turn if tokenize-value fails on a Presidio hit). Critically, v0.8.2 removes the v0.7.3 early-return that exited auto_detect_phi when tiers 1-4 found nothing — pure-prose input now ALWAYS reaches tier-5.
Graceful degradation. If the sidecar is unreachable (not installed, not started, crashed), tier-5 silently no-ops with a one-time stderr warning per session. Larry's REPL remains fully functional in v0.8.1 mode. LARRY_AUTO_PHI=strict does NOT abort on absent sidecar (the strict mode escape is for HL7-shaped content where rule-pack would have caught the leak; tier-5 is additive coverage).
/phi-sidecar slash command — start / stop / status / health / ensure exposed to the user. Slash-completion table and _LARRY_SLASH_CMDS_DESC updated.
install-larry.sh install path. On hosts with Python 3.9+ + pip, the installer prompts before creating $LARRY_HOME/phi-venv and installing presidio_analyzer + presidio_anonymizer + fastapi + uvicorn + spaCy en_core_web_sm (~400MB on disk, ~250MB RAM resident). On MobaXterm/Cygwin without python3, the installer skips the prompt entirely and prints Bryan's accepted tradeoff (MobaXterm stays on v0.8.1 + nudges). Re-runnable; idempotent.
MANIFEST. Added three new lib files. They auto-sync to every running client on next launch; clients without Python 3 won't run the sidecar but the files are harmless to ship.

Prototype validation (Bryan's Mac, Apple Silicon, Python 3.14). Cold start (model load): ~9 seconds with en_core_web_sm (vs ~82s with the larger en_core_web_lg Presidio auto-downloads by default — we explicitly pin _sm for the latency-sensitive REPL use case). Warm analyzer latency: P50 20.6ms, P95 22.7ms over 20 sequential requests on 100-word input. End-to-end HTTP round-trip (curl + json roundtrip): P50 ~57ms warm; first request post-startup pays a ~150ms tokenizer warmup tax then steady. Well under the 200ms-per-turn REPL budget.

Detection quality on the canonical "John Doe MRN 623000286" sample: 8 core entities caught (PERSON x2, DATE_TIME x2, PHONE_NUMBER, US_*), plus the three custom HL7 recognizers add MRN + caret-name + bare-phone coverage. Misclassifications (MRN as US_PASSPORT, "ED" as PERSON) are within tolerance for the tokenize-everything-suspicious policy — the auto-PHI lookup table sees them as presidio_* categories and the operator can audit via /tokens.

MobaXterm compatibility verdict. Per Bryan's accepted tradeoff: v0.8.2 ships Mac/Linux-only. MobaXterm/Cygwin stays on v0.8.1 (rule-pack + path-block + content-shape gating + strict mode + base64 round-trip + tool-result review gate). Test path: install-larry.sh detects platform and skips the Presidio install on windows-cygwin with a clear "v0.8.1 mode" note. No code in larry.sh is platform-gated — tier-5 silently no-ops when the sidecar is absent, which IS the MobaXterm path.

Proactive same-pattern sweep. Searched for other call sites where free-text NER would help: tool-result surface already gets HL7-shape sanitize (v0.8.1) and base64 round-trip (v0.8.1-c). Tier-5 is user_input-only by design — tool-result free-text NER deferred to a future patch (would require deciding on per-tool latency budgets; Bryan to call when needed).

v0.8.1 — 2026-05-27

Tool-result PHI gating expansion. Closes V2 / V12 and the V2 base64 sub-gap from Vera's audit. No behavior change for users not on HL7-shaped data; opt-in friction for the 8KB+ tool-result review gate.

Tool-name allow-list dropped; content-shape gating only. The v0.7.3 tool-result auto-PHI gate ran only on read_file (.hl7|.txt), nc_msgs, hl7_field, hl7_diff. v0.8.1 runs _auto_phi_looks_like_hl7 on EVERY tool result. On hit → route through lib/hl7-sanitize.sh. On miss → pass through unchanged. Closes V2: bash_exec/ssh_exec/ grep_files/read_file of .log/.csv/.dat/no-suffix files are now all covered when their output is HL7-shaped. False-positive cost is cheap (extra regex pass with zero behavioral impact on non-HL7).
Base64-wrapped HL7 round-trip. New _auto_phi_b64_roundtrip helper. Detects candidate base64 runs (length >= 200 chars, [A-Za-z0-9+/=] only, length divisible by 4 — NOT entropy-based, per Pax §V2-sub: HL7's repetitive prefixes survive base64 with LOW entropy). Speculatively decodes each run; if decoded bytes look like HL7, routes through hl7-sanitize.sh and re-encodes (base64 -w0) back into the result. Catches ssh_pull_smat sampled mode TSV (server-side encoding kept for binary-safe TSV transport; client-side unwrap handles the safety concern). Requires python3 (installed everywhere larry-anywhere runs); skipped with a one-time stderr warning if unavailable.
Operator review gate for bash_exec/ssh_exec/ssh_pull/ ssh_pull_smat results. When the tool produced HL7-shaped output OR the result exceeds LARRY_TOOL_RESULT_REVIEW_THRESHOLD bytes (default 8192), Larry prompts [Y/n/i] before passing the result back to the model. i opens the result in $PAGER then re-prompts. Default Y — zero friction by default. N substitutes a refusal JSON so the model knows a result was withheld. Skipped when LARRY_AUTO_PHI=off (consistent with the opt-out) OR running non-interactively (no TTY — never blocks headless scripts). Override with LARRY_TOOL_RESULT_REVIEW=always to gate every result. Per Pax §V2/V12: closes the "operator wanted to see this themselves, didn't want the model to see it" gap that's the actual common case.

Proactive same-pattern sweep. Searched the codebase for other call sites where tool output bypasses content-shape gating: found only the one in agent_turn. The v0.8.0-c strict-mode tool-result branch was hardened in lockstep so it now triggers on the broader (content-only) eligibility instead of the old name-allow-list.

Manifest unchanged.

v0.8.0 — 2026-05-27

PHI-safety quick-wins pack — three independent zero-risk patches closing four gap-classes Vera identified in the v0.7.5 static audit (Deliverables/2026-05-27-cloverleaf-larry-phi-leak-audit.md) with Pax's recommended mitigations (Deliverables/2026-05-27-cloverleaf-larry-phi-mitigation-research.md). No new dependencies, no behavior change for users not interacting with PHI.

read_file/grep_files/glob_files/list_dir path-block list (closes V4 + V6 + V11). Refuse — with a structured JSON error the model must surface, NOT a silent "file not found" — any tool-side attempt to read or enumerate under $LARRY_HOME/log/ (auto-phi.log, headers.log, oauth.log, session logs), $LARRY_HOME/sanitize/ (lookup.tsv — the desanitization key), $LARRY_HOME/sessions/, $LARRY_HOME/.oauth.json, or $LARRY_HOME/.env. Block-list resolves $LARRY_HOME at call time (not script-parse time) and runs against both the literal path and its realpath -m canonical form, so symlink detours don't bypass. The proactive same-pattern sweep (Bryan standing rule, 2026-05-27) extended the block from tool_read_file alone to also cover tool_grep_files, tool_glob_files, and tool_list_dir — those tools would otherwise leak filenames or grep-matched content out of the same protected dirs without any approval gate.
/load <file> HL7 pre-routing (closes V3). When the loaded file's content matches _auto_phi_looks_like_hl7, route it through lib/hl7-sanitize.sh (the segment-aware tokenizer with the full PHI field rule set: PID, PV1, NK1, GT1, IN1, OBR, OBX, DG1, ORC) BEFORE the existing user_input auto-PHI pass. Closes the gap where smat dumps loaded via /load only got the lighter per-word classifier, which misses bare HL7 PID fields. Status line reports how many fields were tokenized: phi> /load: hl7-sanitize.sh tokenized N HL7 field(s) from <file> before passing to auto-PHI. Strict mode (see below) aborts the /load if sanitize fails; default/confirm modes warn-and-continue.
LARRY_AUTO_PHI=strict fail-closed mode (closes V5). New fourth value alongside off / on / confirm. In strict mode, the auto-PHI pipeline aborts the surrounding turn (no payload built, no API call) when: (a) lib/hl7-sanitize.sh is missing/non-executable on HL7-shaped user_input, (b) the sanitizer returns empty on HL7-shaped content, (c) any single value's tokenize-value call fails inside the detection loop. On the tool-result surface (which can't kill the in-flight tool_use), strict mode substitutes the result with a structured JSON refusal sentinel so the raw HL7 NEVER reaches the model. Existing off / on / confirm semantics unchanged (still fail-open per Bryan's "don't break tools" priority). Strict is the opt-in tradeoff for HIPAA work where a silent leak is worse than a broken turn. /phi-auto strict toggle and /help text updated. Wired into both auto-PHI invocation sites: user input scan and the tool-result HL7 sanitizer gate.

Proactive same-pattern sweep (Bryan standing rule, 2026-05-27). Searched the codebase for other tools matching the pattern "reads arbitrary path, returns content to model, no approval gate": found and patched tool_grep_files, tool_glob_files, tool_list_dir alongside tool_read_file. bash_exec/ssh_exec already require Y/N operator approval — the operator is the gatekeeper there (a second gate deferred to v0.8.1). No other matches.

Manifest unchanged (no new files in lib/).

v0.7.5 — 2026-05-27

Three focused changes, one common cause: the Cygwin/MobaXterm CR-taint pattern that crashed OAuth on Bryan's v0.7.3 work-box with the cryptic error bash: ...: arithmetic syntax error: invalid arithmetic operator (error token is "").

OAuth/arithmetic CR fix. lib/oauth.sh now routes every operand entering a bash arithmetic context (fetched_at, expires_in, now) through a dedicated coerce_int helper that strips non-digits at the source. The failure mode: $(date +%s) against a Cygwin pty where Windows-native date.exe shadows Cygwin date can return a CR-tainted epoch like "1779999999\r", which crashes the very next $((expires_at - now)). Diagnosis in Deliverables/2026-05-27-cloverleaf-larry-oauth-arithmetic-fix.md.
Mouse mode is opt-in. REPL mouse handling now defaults to OFF and is enabled via LARRY_MOUSE=1 env var or /mouse on slash command. Several terminals (notably MobaXterm and stripped tmux) were swallowing the mouse ANSI sequences and printing literal ^[[?1000h garbage when v0.7.0 turned it on unconditionally. Diagnosis in Deliverables/2026-05-27-cloverleaf-larry-mouse-regression-fix.md.
CR-safety sweep across lib/*.sh and top-level scripts. Three new primitives in lib/cygwin-safe.sh (sourced by every tool family member):
- coerce_int VAL [DEFAULT] — for arithmetic and integer-test operands
- strip_cr VAL — for case patterns, regex tests, paths, HTTP headers
- read_clean VAR [PROMPT] — read -r wrapper that strips CR pre-assign Hardened call sites:
- larry.sh — status-line date +%s / tput cols, three y/N approval prompts (write_file, bash_exec, first-run auth), API-key paste, first-run auth menu
- lib/oauth.sh — cmd_login and cmd_refresh date +%s captures
- lib/nc-engine.sh — five y/N action prompts (stop/start/bounce, resend, route-test, testxlate, tpstest) + find ... | wc -l arithmetic
- lib/nc-msgs.sh — parse_time_ms date captures (4 sites), meta-TSV tm field, MSG_COUNT wc -l
- lib/nc-regression.sh — tr | wc -c count, hl7-diff ?-fallback arithmetic
- lib/nc-smat-diff.sh — A_COUNT/B_COUNT/DIFFS_TOTAL
- lib/nc-insert-protocol.sh — every awk-emitted line-number that feeds head -n $((N-1)) / tail -n +$((N+1)) arithmetic
- lib/journal.sh — _next_seq wc -l arithmetic
- lib/lessons.sh — _next_id, cmd_list, cmd_count arithmetic + two y/N prompts (clear all, clear since)
- lib/hl7-sanitize.sh — cmd_count arithmetic + clear-table y/N
- lib/ssh-helper.sh — local + remote wc -c integer compares (4 sites)
- lib/nc-find.sh — wc -l count for %d printf
- lib/nc-table.sh — $(date +%s) in backup-filename construction
- lib/nc-document.sh — two wc -l | %d printf sites
- larry-rollback.sh — Proceed? y/N prompt
Reproduction (now exercised by cygwin-safe.sh's in-line tests):
```
now=$(printf '%s\r' 1779999999); echo $((now - 1))   # pre-fix: crashes
now=$(coerce_int "$(printf '%s\r' 1779999999)" 0); echo $((now - 1))   # fix: 1779999998
```
Added lib/cygwin-safe.sh to MANIFEST so it auto-syncs to every running client on next launch.

v0.7.4 — 2026-05-27

Drop GitHub fallback from auto-update. Single-source Gitea (https://git.bjnoela.com/bryan/cloverleaf-larry.git).

v0.7.3 — 2026-05-26

Automatic PHI detection (tiered detection + blacklist contexts).

v0.7.2 — 2026-05-26

Gitea becomes primary auto-update origin; GitHub demoted to fallback.

v0.7.1 — 2026-05-26

Status line moves to between-turn position (post-input, pre-response).
Status line below prompt; automatic PHI detection; session-artifact upload.

v0.7.0 — 2026-05-26

HL7-aware tab completion + REPL mouse mode (later made opt-in in v0.7.5).

16 KiB Raw Blame History