Origin moved to Gitea 2026-05-27. GitHub is push-mirror fallback for auto-update reachability.

Bryan Johnson 58e6bf4e03 v0.7.3: automatic PHI detection (tiered detection + blacklist contexts)

Adds automatic PHI tokenization on two surfaces: user input and HL7-shaped tool results. Supersedes Bryan's reverted `af2ffe8` prototype with a tiered confidence model, explicit blacklist contexts, structured audit log, and tool-result coverage.

Bryan's directive: "Err on the side of caution and tokenize anything you think you may need to as long as it doesn't break the tools."

Priority order: (1) don't break tools (constraint), (2) catch all PHI (goal), (3) minimize false positives (secondary).

Detection: four-tier model (first match wins per token):
  Tier 1 DEFINITE    SSN (with dashes), email, formatted phone, NPI with explicit "NPI:" prefix. Always tokenize.
  Tier 2 CONTEXTUAL  Numeric value preceded by MRN/Patient/DOB/Account/Visit/Acct/Record/Birth within 20 chars. Always.
  Tier 3 HL7-CTX     Plausibly-PHI-shaped values when line mentions PID.3/5/7/11/13/18, NK1., GT1., IN1.16-20. Aggressive: prompts in confirm mode.
  Tier 4 KNOWN       Value already exists in $LARRY_HOME/sanitize/lookup.tsv.

Tier-4 scans the full set of categories actually present in the table (not a hardcoded shortlist), so any category Bryan has used before is checked.

Blacklist contexts (NEVER tokenize, even on tier match):
  * Path-like (/, ./, ../, ~/, contains /)
  * HL7 field references like PID.18: the digit after the dot is a field index, not an MRN (spec verification scenario #5)
  * Version strings (vN.N.N, semver) and ISO dates (overridden by explicit DOB/Birth context so "DOB 1980-01-15" still tokenizes)
  * Port keywords (:NNNN, port NNNN, tcp/udp NNNN, LISTEN/PORT=)
  * Error/status codes (error NNN, code NNN, HTTP NNN, rc=N)
  * JSON key position (value followed by ": or :)
  * Fenced code blocks (``` ... ``` skipped via awk redactor)
  * Timestamps (epoch ms 13+ digits, epoch s 10 digits starting 1)

Tool-result surface, routed through hl7-sanitize.sh:
  * Eligible tools: read_file (.hl7/.HL7/.txt/.TXT only), nc_msgs, hl7_field, hl7_diff
  * Eligibility further gated by _auto_phi_looks_like_hl7 shape check (segment headers MSH/PID/EVN/PV1 with | delimiter)
  * Generic outputs (list_dir, grep_files, bash_exec, glob_files, ssh_exec, web search) NEVER scanned; spec is explicit about this
  * For HL7-shaped content we use the canonical field-aware pipeline rather than the prose detector, since segments are pipe-delimited and would otherwise be a single whitespace token. Both pipelines share lookup.tsv so tokens are stable across surfaces.

Behavior controls:
  * env LARRY_AUTO_PHI: 1/on (default), 0/off, confirm
  * /phi-auto on|off|confirm|status slash command
  * "!nophi " per-turn prefix override
  * Manual @@VALUE / {{phi:VALUE}} markers always win. They are preprocessed FIRST; auto-PHI fills gaps in things Bryan didn't manually mark.
  * After each pass, dim status line summarises:
      phi> auto-tokenized 3 value(s) [user_input]: MRN×1 EMAIL×1 SSN×1

Audit: JSONL log at $LARRY_HOME/log/auto-phi.log:
  { "ts": "...", "value": "...", "category": "...", "token": "...", "tier": "definite|contextual|hl7|known|hl7_pipeline", "surface": "user_input|tool_result", "context": "..." }
Mode 0600, parent dir 0700. Best-effort write; never fails the host call.

Library changes (lib/hl7-sanitize.sh):
  * normalize_value: re-add EMAIL + PHONE arms + new NPI arm. EMAIL and PHONE arms were originally in `af2ffe8` (reverted with v0.7.1); cited in the source comments.
  * normalize-value subcommand: exposes canonical normalization so auto-PHI can build per-session memory keys. Originally `af2ffe8`.
  * lookup-original subcommand: probes the table for an exact match without creating new tokens. Used by Tier-4 "already-known" detection.

Implementation notes:
  * macOS bash 3.2 compatibility: ${pos: -20} returns empty when len < 20; use explicit ${pos:$((len-20))} guarded by length check.
  * Per-session decision cache (accept/decline) uses bash 4 associative arrays with a 3.2 fallback to pipe-delimited string membership.
  * Confirm-mode prompts only Tier 3-4. Tier 1-2 hits are high-confidence and always tokenize even in confirm mode (Bryan: err on caution).
  * Detection loop iterates line-by-line so fenced-code redaction works and so left/right context is meaningful per token.

Verification matrix (18/18 pass):
  1 SSN tokenized, 2 Email tokenized, 3 MRN contextual, 4 bare digits skipped, 5 PID.18 skipped, 6 path skipped,
  7 version skipped, 8 port skipped, 9 Tier-4 known catches custom category (EMP), 10 !nophi skips,
  11 existing token left alone, 12 read_file .hl7 sanitizes all PHI fields, 13 .py not HL7-shaped,
  14 list_dir not HL7-shaped, 15 mode=off skips, 16a /phi-auto off skips, 16b /phi-auto on tokenizes,
  17 audit JSONL parseable.

No regressions to v0.7.2 origin switching, v0.7.1 status-line position, v0.7.0 HL7 completion + mouse mode, v0.6.9 status state, v0.6.7 streaming, or any earlier OAuth/SSH/lessons work. MANIFEST unchanged.

Divergence from `af2ffe8` (cited in source comments):
  * Tiered classifier (vs. flat regex set): enables reasoning about WHY a value tokenized; gates confirm-mode behavior.
  * Explicit blacklist contexts: addresses spec false-positive cases that `af2ffe8` missed (HL7 field refs, ports, error codes, JSON keys).
  * Tool-result surface: `af2ffe8` only ran on user input.
  * Structured JSONL audit log: `af2ffe8` had no per-tokenization log.
  * /phi-auto semantics: on|off|confirm|status (spec) vs. af2ffe8's /auto-phi on|off|aggressive|confirm.
  * Dropped the loose "Title Case Title Case" pair detector and its name-allowlist: too high FP rate against narrative prose ("Larry Anywhere", "Mac Studio") and Bryan's name-allowlist couldn't keep up with the long tail. Name detection now Tier-3 (HL7-context only) and Tier-4 (already-known) only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>		2026-05-27 17:37:26 -07:00
agents	v0.3.3: PHI sanitize/desanitize + {{phi:...}} prompt preprocessing	2026-05-26 10:29:20 -07:00
lib	v0.7.3: automatic PHI detection (tiered detection + blacklist contexts)	2026-05-27 17:37:26 -07:00
.gitignore	v0.3.0: initial release of Larry-Anywhere	2026-05-26 09:46:20 -07:00
install-larry.sh	v0.7.2: Gitea becomes primary auto-update origin; GitHub demoted to fallback	2026-05-27 17:25:00 -07:00
larry-auth.sh	v0.3.1: OAuth subscription auth + offline manual cheat sheet	2026-05-26 09:57:44 -07:00
larry-rollback.sh	v0.3.0: initial release of Larry-Anywhere	2026-05-26 09:46:20 -07:00
larry-tunnel.sh	v0.3.0: initial release of Larry-Anywhere	2026-05-26 09:46:20 -07:00
larry.sh	v0.7.3: automatic PHI detection (tiered detection + blacklist contexts)	2026-05-27 17:37:26 -07:00
MANIFEST	v0.7.0: HL7-aware tab completion + REPL mouse mode	2026-05-27 16:15:11 -07:00
MANUAL.md	v0.4.3: cross-env bundle for regression — no direct peer protocol needed	2026-05-26 11:25:02 -07:00
README.md	v0.3.0: initial release of Larry-Anywhere	2026-05-26 09:46:20 -07:00
VERSION	v0.7.3: automatic PHI detection (tiered detection + blacklist contexts)	2026-05-27 17:37:26 -07:00
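
The tier and blacklist rules in the v0.7.3 commit message can be illustrated with a small bash sketch. It is not the code in larry.sh: the function names (`tail_chars`, `is_blacklisted`, `classify_token`) and every regex are assumptions made for illustration. It covers only part of Tier 1 (SSN, email), Tier 2, and a few blacklist contexts. The left-context helper uses the guarded explicit-offset form that the implementation notes recommend for bash 3.2.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of tiered PHI classification with blacklist contexts.
# Names and regexes are illustrative assumptions, not the code in larry.sh.

# Last N characters of a string; strings shorter than N are returned whole.
tail_chars() {
  local s=$1 n=$2
  local len=${#s}
  if [ "$len" -le "$n" ]; then
    printf '%s' "$s"
  else
    printf '%s' "${s:$((len - n))}"
  fi
}

# Return 0 when the token sits in a context that must never be tokenized.
is_blacklisted() {
  local tok=$1 left=$2
  local re_version='^v?[0-9]+\.[0-9]+\.[0-9]+$'
  local re_hl7ref='^[A-Z][A-Z0-9][A-Z0-9]\.[0-9]+$'
  local re_left='([Pp]ort|[Ee]rror|[Cc]ode|HTTP)[[:space:]]+$'
  case $tok in */*|'~'*) return 0 ;; esac      # path-like
  [[ $tok =~ $re_version ]] && return 0        # version string
  [[ $tok =~ $re_hl7ref ]] && return 0         # HL7 field ref such as PID.18
  [[ $left =~ $re_left ]] && return 0          # port / error-code keyword
  return 1
}

# Print "definite", "contextual", or "skip" for one whitespace-delimited
# token, given the text that precedes it on the same line.
classify_token() {
  local tok=$1 left
  left=$(tail_chars "$2" 20)
  local re_ssn='^[0-9]{3}-[0-9]{2}-[0-9]{4}$'
  local re_email='^[^@[:space:]]+@[^@[:space:]]+\.[A-Za-z]+$'
  local re_num='^[0-9]+$'
  local re_ctx='(MRN|Patient|DOB|Account|Visit|Acct|Record|Birth)'
  if is_blacklisted "$tok" "$left"; then
    echo skip
  elif [[ $tok =~ $re_ssn || $tok =~ $re_email ]]; then
    echo definite                              # Tier 1
  elif [[ $tok =~ $re_num && $left =~ $re_ctx ]]; then
    echo contextual                            # Tier 2: keyword within 20 chars
  else
    echo skip
  fi
}
```

Under these assumptions, `classify_token 5720501458 "messages for MRN "` prints `contextual`, while `classify_token PID.18 "check field "` and `classify_token 8080 "listening on port "` both print `skip`. Checking the blacklist before any tier mirrors the stated priority order, where not breaking tools outranks catching every value.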

README.md

Larry-Anywhere

Portable AI agent for Cloverleaf integration work. Single bash script, no installs, no root, no package manager. Runs on Linux and inside MobaXterm on Windows. 26 native v3 tools for NetConfig analysis, message search, system documentation, regression testing, and safe NetConfig modification — all implemented directly in bash with no dependency on v1 wrapper scripts or v2 cloverleaf-tools.pyz.

When Cloverleaf is installed, Larry uses the shipped product binaries (tclsh, hcienginerun, etc.) directly. Otherwise it falls back to bash one-liners it composes itself. Never relies on the v1/v2 wrapper layers.

Install

One-liner (recommended)

On any client box with curl and bash (essentially any Linux or MobaXterm shell):

curl -fsSL https://raw.githubusercontent.com/bojj27/cloverleaf-larry/main/install-larry.sh | bash

The installer:

Detects platform (Linux / Darwin / MobaXterm-cygwin) and arch
Creates ~/.larry/ (or wherever $LARRY_HOME points)
Pulls every script + agent file from bojj27/cloverleaf-larry raw URLs
Downloads a static jq binary into ~/.larry/bin/ if jq isn't on PATH
Drops a larry shim into ~/bin/
Makes no system changes, requires no root
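
A minimal sketch of the platform and arch detection step, assuming it is driven by `uname`. The function names and the returned labels are hypothetical, and treating `CYGWIN*`/`MINGW*`/`MSYS*` kernels as the MobaXterm case is an assumption, not something confirmed by install-larry.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: map uname output to the platform labels listed above.
# An explicit argument overrides uname so the mapping can be tried anywhere.

detect_platform() {
  local kernel=${1:-$(uname -s)}
  case $kernel in
    Linux*)               echo linux ;;
    Darwin*)              echo darwin ;;
    CYGWIN*|MINGW*|MSYS*) echo mobaxterm-cygwin ;;  # assumed MobaXterm kernels
    *)                    echo unknown ;;
  esac
}

detect_arch() {
  local machine=${1:-$(uname -m)}
  case $machine in
    x86_64|amd64)  echo x86_64 ;;
    aarch64|arm64) echo arm64 ;;
    *)             echo "$machine" ;;               # pass unknown values through
  esac
}

echo "platform=$(detect_platform) arch=$(detect_arch)"
```

A label pair like this is one plausible way to pick the right static jq binary when jq is missing from PATH.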

First run:

larry                              # prompts for ANTHROPIC_API_KEY once
                                   # saved to ~/.larry/.env mode 0600

Auto-update

Every time you run larry, it self-updates from its configured origin. As of v0.7.2 that is Gitea first, with the GitHub mirror as fallback. To suppress for one launch: larry --no-update. To disable permanently: export LARRY_NO_UPDATE=1.
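
The two opt-outs can be pictured as a single gate evaluated at launch. This is a sketch under assumptions: `should_self_update` is a hypothetical name, and the real larry.sh may parse its arguments differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the self-update gate.
# Returns 0 when an update check should run, 1 when it is suppressed.
should_self_update() {
  local arg
  # Permanent opt-out via environment.
  if [ "${LARRY_NO_UPDATE:-0}" = "1" ]; then
    return 1
  fi
  # One-launch opt-out via flag, wherever it appears in the arguments.
  for arg in "$@"; do
    if [ "$arg" = "--no-update" ]; then
      return 1
    fi
  done
  return 0
}
```

A launcher could then wrap its update step in `if should_self_update "$@"; then ...; fi`.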

Offline / scp install (when the client box can't reach github.com)

# from a machine that CAN reach github
git clone https://github.com/bojj27/cloverleaf-larry
scp -r cloverleaf-larry/ user@client-box:~/cloverleaf-larry/
ssh user@client-box
cd ~/cloverleaf-larry && ./install-larry.sh

The installer detects local files and uses them when LARRY_BASE_URL isn't reachable.

Use

Set the Cloverleaf runtime context, then point Larry at your site:

export HCIROOT=/opt/cloverleaf/cis2025/integrator
export HCISITE=adt
larry "$HCIROOT/$HCISITE"

you> list every protocol in this site
you> find threads with codametrix in the name
you> show messages from to_3m in the last 3 days for MRN 5720501458
you> generate jump threads for every TCP-listener inbound, target host=newlinux01.test, jump port = orig+10000
you> diff the ADTto_3m interface + connected threads between test and prod
you> document the codametrix system into ~/.larry/knowledge/codametrix.md
you> /quit

What Larry can do natively (v3 tools)

domain	tools
File system	`read_file`, `list_dir`, `grep_files`, `glob_files`, `write_file`, `bash_exec`
NetConfig (read)	`nc_list_protocols`, `nc_list_processes`, `nc_protocol_block`, `nc_protocol_field`, `nc_protocol_nested`, `nc_protocol_summary`, `nc_destinations`, `nc_sources`, `nc_xlate_refs`, `nc_tclproc_refs`
NetConfig (write, journaled)	`nc_insert_protocol`, `nc_add_route`
Workflows	`nc_find_inbound`, `nc_make_jump`, `nc_document`, `nc_find`, `nc_diff_interface`
Messages (smat is SQLite!)	`hl7_field`, `nc_msgs`, `hl7_diff`
Safety	`larry_rollback_list` + `larry-rollback.sh` CLI

Every write goes through a journal (~/.larry/journal/<session>/) — original snapshotted, diff saved, atomic replacement. Roll back any subset with larry-rollback.sh --list, --target /path/to/file, --session <id>, or --entry <id>.
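
The snapshot, diff, and atomic-replace sequence can be sketched as follows. The `journaled_write` function and the `.orig` / `.diff` file names are illustrative assumptions. The real journal entry layout is not documented here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a journaled write: snapshot, diff, atomic replace.
# journaled_write <target> <new content> <journal dir for this session>
journaled_write() {
  local target=$1 new_content=$2 journal_dir=$3
  local base tmp
  base=$(basename "$target")
  mkdir -p "$journal_dir" || return 1

  # 1. Snapshot the original (an empty snapshot stands in for a new file).
  if [ -f "$target" ]; then
    cp -p "$target" "$journal_dir/$base.orig" || return 1
  else
    : > "$journal_dir/$base.orig"
  fi

  # 2. Stage the new content next to the target so the final rename stays
  #    on one filesystem. mktemp creates the file with mode 0600.
  tmp=$(mktemp "$target.tmp.XXXXXX") || return 1
  printf '%s\n' "$new_content" > "$tmp" || { rm -f "$tmp"; return 1; }

  # 3. Save the diff. diff exits 1 when the files differ, which is expected.
  diff -u "$journal_dir/$base.orig" "$tmp" > "$journal_dir/$base.diff" \
    || [ $? -eq 1 ] || { rm -f "$tmp"; return 1; }

  # 4. Replace the target in one rename.
  mv -f "$tmp" "$target"
}
```

Staging in the target's own directory is the design point: a rename within one filesystem replaces the file in a single step, so readers see either the old NetConfig or the new one, never a partial write.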

Slash commands in the REPL

command	what
`/env`	show detected HCIROOT/HCISITE + tool layer presence
`/sites`	list site dirs under HCIROOT
`/site <name>`	switch HCISITE mid-session
`/cd <path>`	change working directory
`/model <name>`	switch Claude model
`/reset`	clear conversation history
`/load <file>`	load a file as your next message
`/help`	full slash-command help

Working examples (battle-tested against a 22-site Cloverleaf install)

Migration jump-threads: "find every TCP-listener inbound, generate the 3-thread jump set (linux_out / windows_in / windows_out) for each." Inserts via journaled write. Roll back instantly.
MRN search: "messages from to_3m in last 3 days for patient MRN X." Reads smat via sqlite3 -ascii, parses HL7 natively, filters by PID field — no Cloverleaf binary involved.
System documentation: "find all threads matching <pattern>, document them." Cross-site walk, threads + ports + processes + xlates + tclprocs, adjacent-thread map, placeholder POC/status/escalation sections.
Interface diff: "diff ADTto_3m + connected (depth 1) between test and prod." Connected-graph BFS, protocol-block diff + xlate-file diff + tclproc-file diff.
Regression diff (Phase 6): hl7_diff for any two HL7 message files, with --ignore MSH.7 by default and configurable field-level exceptions. The only Example 6 piece still pending is the orchestrator that drives Cloverleaf's route_test end-to-end, which needs an engine to run against.
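
To make "parses HL7 natively" and "--ignore MSH.7" concrete, here is a rough sketch of field extraction and a timestamp-insensitive comparison using only `tr`, `awk`, and `diff`. The function names are hypothetical and the actual hl7_field / hl7_diff tools may work differently. The sketch assumes `|` as the field separator and handles only whole fields, not components or repetitions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of HL7 field access and a diff that ignores MSH.7.

# Print field N of the first SEG segment in a file.
# MSH.1 is the separator itself, so for N >= 2 MSH.N is awk field N;
# for every other segment SEG.N is awk field N+1.
hl7_get_field() {
  local file=$1 seg=$2 n=$3
  tr '\r' '\n' < "$file" | awk -F'|' -v seg="$seg" -v n="$n" '
    $1 == seg {
      idx = (seg == "MSH") ? n + 0 : n + 1
      print $idx
      exit
    }'
}

# Blank MSH.7 (message timestamp) so it never shows up as a difference.
hl7_normalize() {
  tr '\r' '\n' < "$1" | awk -F'|' -v OFS='|' '$1 == "MSH" { $7 = "" } { print }'
}

# Exit 0 when two messages match apart from MSH.7, 1 when they differ.
hl7_diff_sketch() {
  local a b rc=0
  a=$(mktemp) || return 2
  b=$(mktemp) || { rm -f "$a"; return 2; }
  hl7_normalize "$1" > "$a"
  hl7_normalize "$2" > "$b"
  diff "$a" "$b" || rc=$?
  rm -f "$a" "$b"
  return "$rc"
}
```

For example, `hl7_get_field msg.hl7 PID 3` would print the raw PID.3 value of a hypothetical msg.hl7, which is the kind of value an MRN filter compares against.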

Architecture in one diagram

  Agent layer        Larry-Anywhere (this repo)
                     ├── bash REPL → Anthropic API
                     ├── personas: Larry + Clover + Regress + Cheatsheet
                     ├── 26 native tools (no v1/v2 deps)
                     └── journal-backed writes with rollback
                                       │
                                       ↓ acts on
  Cloverleaf install  $HCIROOT / $HCISITE
                      NetConfig, Xlate/, tables/, tclprocs/, formats/
                      .smatdb files (SQLite!) under exec/processes/
                      shipped binaries (tclsh, hcienginerun, ...) — invoked
                      directly via bash_exec when needed for engine ops

No layer between Larry and Cloverleaf except plain bash. The v1 wrapper scripts (tbn, hlq, mr, mp, mg, awkcut, ...) and the v2 cloverleaf-tools.pyz are intentionally absent.

Environment cheat-sheet

var	default	purpose
`LARRY_HOME`	`~/.larry`	where state lives (sessions, journal, .env, agent overrides)
`LARRY_MODEL`	`claude-sonnet-4-6`	Claude model (try `claude-opus-4-7` for deeper work)
`LARRY_MAX_TOKENS`	`8192`	per-turn output cap
`LARRY_NO_UPDATE`	`0`	set to `1` to disable self-update
`LARRY_UPDATE_URL`	github.com/bojj27/cloverleaf-larry/main/larry.sh	self-update source
`LARRY_AGENTS_URL`	github.com/bojj27/cloverleaf-larry/main/agents	persona refresh source
`ANTHROPIC_API_KEY`	(prompted on first run)	API key, saved to `$LARRY_HOME/.env`
`HCIROOT` / `HCISITE`	(unset)	auto-detected and surfaced in system prompt
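
One way to picture how the table's defaults interact with values you export yourself is the standard `${VAR:=default}` idiom. This is a sketch only: `larry_apply_defaults` is a hypothetical name, and the URL variables are omitted because their full values are not spelled out above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: apply the documented defaults without overriding
# anything the caller has already exported.
larry_apply_defaults() {
  : "${LARRY_HOME:=$HOME/.larry}"
  : "${LARRY_MODEL:=claude-sonnet-4-6}"
  : "${LARRY_MAX_TOKENS:=8192}"
  : "${LARRY_NO_UPDATE:=0}"
  export LARRY_HOME LARRY_MODEL LARRY_MAX_TOKENS LARRY_NO_UPDATE
}

larry_apply_defaults
echo "home=$LARRY_HOME model=$LARRY_MODEL max_tokens=$LARRY_MAX_TOKENS"
```

With this pattern, `export LARRY_MODEL=claude-opus-4-7` before launch survives untouched, while unset variables pick up the table values.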

Roll back any change Larry made

larry-rollback.sh --list                                # see every write Larry made, newest first
larry-rollback.sh --target /opt/cloverleaf/.../NetConfig  # undo every change to this file
larry-rollback.sh --session 2026-05-26-090724-12345     # undo a whole Larry session
larry-rollback.sh --last 1                              # undo the most recent write
larry-rollback.sh --entry <session>/<NNN_filename>      # undo one specific write

Pre-rollback copies are left at <target>.larry-prerollback.<unix-ts> so you can re-do if needed.
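
A sketch of what a single-file rollback could look like, assuming the journal holds a snapshot of the original. `rollback_sketch` and its arguments are hypothetical. Only the `<target>.larry-prerollback.<unix-ts>` naming comes from the text above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of one rollback: keep the current file as a
# pre-rollback copy, then restore the journal snapshot with a rename.
# rollback_sketch <target> <snapshot>
rollback_sketch() {
  local target=$1 snapshot=$2 ts tmp
  if [ ! -f "$snapshot" ]; then
    echo "no snapshot: $snapshot" >&2
    return 1
  fi
  ts=$(date +%s)

  # Keep what is there now so the rollback itself can be undone.
  if [ -f "$target" ]; then
    cp -p "$target" "$target.larry-prerollback.$ts" || return 1
  fi

  # Stage the snapshot beside the target, then swap it in.
  tmp=$(mktemp "$target.tmp.XXXXXX") || return 1
  if cp "$snapshot" "$tmp" && mv -f "$tmp" "$target"; then
    return 0
  fi
  rm -f "$tmp"
  return 1
}
```

Re-doing the change is then a plain copy of the `.larry-prerollback.<unix-ts>` file back over the target.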

Hard limits (V3)

No subagent dispatch — Larry + Clover + Regress live in one head. No Pax / Iris / Vera / etc. in portable mode.
No memory layer — Honcho / Hindsight / mem0 aren't reachable from a remote client box yet. Session history is the markdown logs in $LARRY_HOME/sessions/.
read_file is capped at 250 KB, grep_files/glob_files at 300 results, and bash_exec at 500 lines of output. Use targeted queries.
Subscription OAuth not yet wired — API key path only. Claude.ai Max subscription quota uses a different auth flow (OAuth device-code); landing in a future release.

Reverse SSH tunnel back home (optional)

If you also want your home Larry to dial into the client shell:

~/.larry/larry-tunnel.sh --serveo                          # zero-config (serveo.net, third-party)
~/.larry/larry-tunnel.sh --hop=user@bjnoela.com:22         # your controlled hop

Auto-reconnect built in. PID and public URL written to ~/.larry/tunnel.{pid,url}.

License

GPL? MIT? TBD. Bryan decides before this repo gets shared widely.

Issues / PRs

github.com/bojj27/cloverleaf-larry