cloverleaf-larry/CHANGELOG.md
bj 2b578f5058 v0.9.1: on upgrade to broker-mode, WIPE the now-obsolete local credentials
An install switching TO broker-mode (the v0.9.0 default) carried long-lived
Anthropic/OAuth credentials from the pre-broker era. Broker-mode authenticates
via short-lived broker tokens and never uses them — they are a pure security
liability on the box, acutely so on a PHI box. On the next self-update the agent
now cleans them up automatically:

- Secure-deletes $LARRY_HOME/.api-key and .oauth.json (reuses the
  uninstall-larry.sh shred -u -z -n3 -> overwrite -> rm logic).
- Strips the ANTHROPIC_API_KEY / CLAUDE_CODE_OAUTH_TOKEN LINES from
  $LARRY_HOME/.env and from ~/.bashrc, ~/.bash_profile, ~/.profile (backup
  first); every other line is kept.
- Idempotent (.broker-cred-wiped marker, written only after a run that removed
  something); silent no-op when clean.
- Hard-guarded on LARRY_AUTH_MODE=broker: does NOT fire under the apikey escape
  hatch (which legitimately still needs the key). Only the two Anthropic/OAuth
  vars are touched (LARRY_* / GITEA_TOKEN are still needed in broker mode).
- Prints a reminder to ALSO revoke at the source (local deletion != server
  revocation), per the decommission / kill-switch docs.

Fires at the broker-resolution block (after self_update synced a fresh
lib/broker.sh, before the fail-closed preflight). New functions in
lib/broker.sh: _broker_wipe_obsolete_credentials,
_broker_strip_cred_lines_from_env, _broker_strip_cred_lines_from_rc.
VERSION + MANIFEST regenerated. Tested: 31/31 assertions pass across the
upgrade-wipe, apikey-non-wipe, clean-no-op, idempotency, dangerous-path-guard,
and selective-line-strip paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 23:42:11 -07:00

126 KiB
Raw Blame History

Changelog

All notable changes to cloverleaf-larry / larry-anywhere are recorded here. Versioning is loose-semver; bumps trigger the in-process self-update on every running client via LARRY_BASE_URL + MANIFEST.

v0.9.1 — 2026-05-31

Upgrade to broker-mode now WIPES the now-obsolete local Anthropic/OAuth credentials it carried from the pre-broker era.

When an existing install self-updates and LARRY_AUTH_MODE resolves to broker (the v0.9.0 default), the long-lived credentials baked in before the broker pivot are unused by broker-mode (which authenticates via short-lived broker tokens) and are a pure security liability on the box — acutely so on a PHI box. On the next self-update the agent now cleans them up automatically:

  • Secure-deletes $LARRY_HOME/.api-key and $LARRY_HOME/.oauth.json (reuses the uninstall-larry.sh shred -u -z -n3 → overwrite → rm logic via lib/broker.sh _broker_secure_delete, since on a PHI box these are sensitive).
  • Strips the credential LINES (ANTHROPIC_API_KEY / CLAUDE_CODE_OAUTH_TOKEN, with optional export/leading whitespace) from $LARRY_HOME/.env and from ~/.bashrc, ~/.bash_profile, ~/.profile — keeping every other line; a timestamped .broker-upgrade.<ts>.bak is written before any rc/.env rewrite.
  • Logs clearly what was wiped, file by file, with the secure-delete method used.
  • Idempotent: writes $LARRY_HOME/.broker-cred-wiped only after a run that actually removed something, so the rc-rewrite path isn't re-attempted every launch; if nothing obsolete is present it no-ops silently and leaves no marker (so a later apikeybroker flip still triggers cleanup).
  • Does NOT fire when LARRY_AUTH_MODE=apikey — the escape hatch legitimately still needs its key. Hard-guarded on LARRY_AUTH_MODE=broker inside the function; only the two Anthropic/OAuth vars are touched (broker-mode still uses LARRY_* and GITEA_TOKEN, so those rc lines are left intact).
  • Prints a reminder that the operator must ALSO revoke the keys at the source (local deletion ≠ server revocation), consistent with the decommission / kill-switch docs.

Fires in larry.sh at the broker-resolution block (after self_update synced a fresh lib/broker.sh, before the fail-closed preflight). New functions in lib/broker.sh: _broker_wipe_obsolete_credentials, _broker_strip_cred_lines_from_env, _broker_strip_cred_lines_from_rc.

v0.9.0 — 2026-05-31

Broker mode is the DEFAULT — the remote kill-switch is wired into every Cloverleaf-Larry deployment, including self-update of existing installs.

Phase 3 of the Larry remote kill-switch (Pax design brief Deliverables/2026-05-31-larry-remote-killswitch-design.md; server broker by Mack at 192.168.20.135:8181 / Tailscale 100.86.16.114:8181). A deployed Larry no longer holds a long-lived sk-ant-… key: it holds a per-deployment enrollment secret, mints a short-lived token from the broker, and routes every LLM call THROUGH the broker (/v1/messages), which injects the real key server-side. Flip set-authorized <id> false in the broker and the deployment 401s and dies — no access to the box required.

New / changed behavior

  • LARRY_AUTH_MODE=broker is the DEFAULT (was apikey). Unset => broker. Self-update flips existing installs to broker-mode too: upgrading Gundersen delivers the kill-switch automatically, per Bryan's requirement. Escape hatch (documented, not default): LARRY_AUTH_MODE=apikey keeps the legacy baked-key rail (NO kill-switch) — never for PHI boxes.
  • New lib/broker.sh — enroll+mint (/enroll-mint), fail-closed heartbeat (/authorized), best-effort PHI wipe (reuses uninstall-larry.sh's shred/overwrite secure-delete + the same hard LARRY_HOME safety guard).
  • Fail-closed preflight at launch (after self-update, before any model call): authorized:false => for profile:phi, best-effort local PHI wipe, then refuse to run; unreachable past LARRY_HEARTBEAT_MAX_MISS (default 3) => refuse to run (NO wipe on a mere network blip — only an explicit disable wipes).
  • In-REPL heartbeat every LARRY_HEARTBEAT_INTERVAL (default 60s): a mid-session disable stops the next turn (and wipes PHI on a phi profile). The per-call token re-mint in call_api/call_api_stream is the hard stop on top.
  • call_api / call_api_stream broker branch: Authorization: Bearer <short-lived-token>, NO x-api-key. Token never touches disk.
  • Enrollment provisioning in install-larry.sh: pass LARRY_DEPLOYMENT_ID
    • LARRY_ENROLL_SECRET (+ LARRY_PROFILE, LARRY_BROKER_URL) at install → stored 0600, baked into the larry shim so every launch is broker-mode and the box shows up in the dashboard ready to toggle.
  • /auth now reports broker state (deployment id, profile, heartbeat cadence, token held) and confirms "no sk-ant-… key is stored — kill-switch ON."

Reachability (the design tension, flagged for Bryan)

Broker mode means the client MUST reach the broker to function. The broker is LAN + Tailscale only — no public route. On an egress-restricted box (the Gundersen Cloudflare block that 28'd git.bjnoela.com), the client reaches the broker over Tailscale (LARRY_BROKER_URL=http://100.86.16.114:8181, the default). If a box can reach NEITHER LAN nor Tailscale, broker-mode fail-closes = the agent will not run — a correct KILL state but a useless WORKING state. So a real deployment on a locked-down network MUST run Tailscale (or Bryan must stand up a hardened public broker ingress with its own auth). The installer and README say this explicitly so no one ships a silently-bricked box.

Bug fixed during integration test

  • _broker_json_field used jq // empty, which renders a literal false as empty — authorized:false would have mis-classified a DISABLED deployment as an unreachable MISS (delaying fail-close past the miss budget and SKIPPING the PHI wipe). Fixed to if has($k) then .[$k] else "" end. Verified: disable now fail-closes immediately and the phi wipe fires.

v0.8.34 — 2026-05-31

Hardened uninstall-larry.sh into a first-class, healthcare-grade decommission command — "update larry, then run one command."

The v0.8.33 uninstaller removed the install footprint but did not meet the operational decommission requirements for a PHI-handling deployment left on a client box. This release makes larry uninstall (and the shipped uninstall-larry.sh it delegates to) self-contained and idempotent: one run stops everything, securely destroys cleartext PHI, scrubs creds, self-removes, and tells you what you still must do at the source. The larry uninstall early-dispatch in larry.sh is unchanged — it already delegates here.

New / changed behavior

  • Stop everything (tolerant): in addition to the precise pidfile kills (PHI Presidio sidecar, reverse-SSH tunnel), it now pgrep+kills detached larry.sh REPLs, phi-presidio-sidecar, and larry-tunnel keepalives by command pattern. It NEVER kills itself, its parent, or any uninstall-larry process (patterns + per-PID exclusion).
  • PHI hygiene (healthcare): log/auto-phi.log, sanitize/lookup.tsv, and sessions/*.log.md are SECURELY deleted FIRST — shred -u -z -n 3 where available; overwrite-then-rm fallback (zero + urandom passes) where shred is absent (Windows/MobaXterm); truncate-then-rm if even dd is missing. It prints HONESTLY per-file whether a real secure-delete was achieved, and warns that overwrite-in-place is not guaranteed on SSD/CoW/Windows filesystems. find-less hosts use a bash-glob fallback so session PHI is never skipped.
  • Shell-profile creds: greps .bashrc/.bash_profile/.profile for ANTHROPIC_API_KEY|CLAUDE_CODE_OAUTH_TOKEN|LARRY_*|GITEA_TOKEN and, by default, backs up the rc (timestamped .larry-uninstall.<ts>.bak) then strips only those lines. --keep-rc prints them for manual removal instead.
  • More shim/copy locations: removes ~/larry, ~/.local/bin/larry, ~/bin/larry, $LARRY_BIN_DIR/larry, and a scp'd ~/larry-anywhere — but only OUR shims (auto-generated header or a launcher/symlink into $LARRY_HOME); a foreign larry is left untouched.
  • Self-uninstall last: when run from a standalone larry-anywhere checkout (not from inside $LARRY_HOME), removes that checkout and the script itself — only if real work was done this run and the dir is a full bundle.
  • Post-uninstall reminder: prints the explicit "FINISH AT THE SOURCE" block — REVOKE the Anthropic API key AND the Claude-Code OAuth grant (local removal does not invalidate egressed copies), revoke any Gitea PAT / rotate SSH creds, plus a BAA/PHI-disclosure reminder for healthcare deployments.
  • Safety guards: refuses to operate if LARRY_HOME is empty/unset, /, $HOME, a single-component-at-root path (/.larry), or anything that doesn't look like a Larry dir — so a misconfigured env can never rm -rf /. Removal is scoped strictly to the built target list (no glob expansion, no parent touch).
  • New flags: --keep-rc, --no-shred (alongside the existing --yes, --dry-run, --keep-data). DRY-RUN remains the default.

v0.8.33 — 2026-05-29

Two operator-requested features: a real uninstaller, and a deterministic-only (no-API) mode.

1. larry uninstall / uninstall-larry.sh — remove EVERYTHING the installer put down

There was no uninstall command before this. Now there is one, and it reverses install-larry.sh exactly.

  • New script uninstall-larry.sh (shipped by the installer into $LARRY_HOME, manifest-synced) + a larry uninstall subcommand that delegates to it. Runs BEFORE bootstrap/self-update/auth, so you can uninstall even when the API/origin is unreachable (same early-dispatch pattern as larry tools).
  • DRY-RUN BY DEFAULT. larry uninstall (or --dry-run) only PREVIEWS the full removal list and prints "would stop" for any running PHI sidecar / tunnel. larry uninstall --yes actually deletes (with a [y/N] confirm unless --yes). --keep-data removes program files but preserves user data (sessions/journal/lessons/knowledge/sanitize/log + creds).
  • Footprint removed: the whole $LARRY_HOME (shipped bundle + bin/jq fallback + optional phi-venv + all runtime artifacts: sessions, journal, lessons, knowledge, sanitize, log incl. headers.log, .history, .origin, .api-key/.env/.oauth.json, .ssh-creds/.ssh-sockets/.ssh-hosts.tsv, known_hosts, tunnel.*, every dotfile state marker) and the larry PATH shim at $LARRY_BIN_DIR/larry (default $HOME/bin/larry).
  • SAFETY: the shim is removed ONLY if it carries our auto-generated header (Auto-generated by install-larry.sh) — an unrelated larry on PATH is left alone. Background procs are stopped via THEIR OWN pidfiles only (never a guessed PID). rm -rf is scoped strictly to the built target list — no globs, no parent dirs. The installer NEVER edits your shell rc, so the uninstaller doesn't either: it prints a one-line reminder to remove any PATH export you added by hand. Cloverleaf sites / $HCIROOT are never touched.

2. --no-api — deterministic-only mode (ZERO LLM API calls, zero cost)

  • New flag --no-api (env: LARRY_NO_API=1). In this mode larry makes ZERO /v1/messages requests — no pay-as-you-go cost, ever. The interactive REPL and ALL local/deterministic commands still work; a free-text prompt is NOT sent to the model. Instead larry keyword-sniffs the prompt and points you at the matching deterministic larry tools <name> command (the same lib/nc-*.sh / hl7-*.sh tools the LLM would have called) — or at larry tools list when nothing matches.
  • No key required: the first-run auth prompt is skipped in --no-api mode (no key is read because no key is used). The REPL prompt label shows you[no-api]> instead of a model name.
  • Defense in depth: call_api and call_api_stream HARD-REFUSE when LARRY_NO_API=1, so even a stray codepath cannot silently bill a request. The turn handler short-circuits upstream, so the guard normally never fires.
  • What's unavailable: open-ended reasoning / free-form authoring (anything that genuinely needs the model). Those surface a clear "unavailable in --no-api mode" message rather than erroring obscurely or silently calling out. Coverage reference: Deliverables/2026-05-28-larry-deterministic-tool-coverage-plan.md — most of Bryan's NetConfig/HL7 operations already have deterministic tools.

v0.8.32 — 2026-05-28

★ CAPSTONE TOOL: nc_provision_jumps — point at a SITE and build the cross-environment server_jump thread set for ALL inbound (root) threads, routing the existing-env inbound feed (the ADT feed) into a NEW environment. Bryan's stated ultimate goal. PURE COMPOSITION of tools all validated in the read-pass (v0.8.29) and write/mutate pass (v0.8.30) — it reimplements nothing: nc-inbound.sh enumerates the inbound roots → nc-parse.sh reads each root's PROTOCOL.PORT + ENCODINGnc-make-jump.sh generates the per-inbound 3-thread set + splice route → nc-insert-protocol.sh persists them → journal.sh wraps the WHOLE batch in ONE session.

  • Invocation: nc-provision-jumps --site <SITE> --new-host <H> --new-port-base <BASE>. Flags: --dry-run (preview the FULL plan — every inbound + every block/route that WOULD be created — write NOTHING), --confirm yes (required to write; absent = plan-only refusal), --filter REGEX (limit which inbound roots), --scope tcp-listen|all, --netconfig / --new-netconfig (explicit OLD / server_jump NetConfigs; new defaults to the site NetConfig), --hciroot, --inbound-host, --process-jump, --format text|tsv.
  • Per-inbound loop is name-driven (tag = the inbound name, so it loops cleanly): jump_port = new_port_base + index, collision-checked. For each root it generates linux_<name>_out (OLD/site NetConfig), windows_<name>_in + windows_<name>_out (NEW-env server_jump NetConfig), and splices a route onto the OLD inbound's DATAXLATE → linux_<name>_out (existing routes preserved).
  • ★ SINGLE-SESSION JOURNALING — the key safety property. All 3N inserts + N splice routes for the whole batch share ONE LARRY_SESSION_ID, so larry-rollback.sh --session <id> undoes the ENTIRE provisioning run byte-identical in one shot (not per-thread).
  • ★ FAIL-SAFE — the batch is ALL-OR-NOTHING (robustness hardening). Each inbound's later steps are gated on its earlier steps: if ① (linux_<x>_out insert) fails, ②/③/④ are skipped — so the ④ splice can never route to a thread that was never created (no dangling DEST). On the FIRST hard failure ANYWHERE in the batch the loop STOPS and auto-rolls-back the whole session via the proven byte-identical larry-rollback.sh --session <id> --yes, then exits 6 with provisioning failed at <inbound>; auto-rolled-back the batch. If the rollback tool can't run, it fails loud and prints the manual rollback command.
  • ★ PRE-FLIGHT COLLISION CHECK against the EXISTING NetConfig (robustness hardening). Before ANY write (and surfaced in --dry-run) it scans the target NetConfig — and the --new-netconfig when different — via nc-parse (protocol-summary; never a hand-grep) for (a) any planned jump_port already in use by an existing PROTOCOL.PORT (would create a duplicate listener) and (b) any planned thread NAME already present. ANY collision ABORTS before writing anything (exit 5), listing every conflict (inbound, port/name, what it collides with). This complements the existing intra-batch port/name guards.
  • Splice route artifact path is now a proper PLAN_ROUTE[] array built at plan time (was a fragile ${oldout%old_out.tcl}route_add.tcl suffix-strip).
  • Regression handoff (capstone's SEPARATE, already-shipped half): output ends with a roots: <csv> line consumable directly as the nc_regression scope (server / threads:, route_test_cmd via the exposed nc_engine route-test). This tool does NOT run regression.
  • Wired into all 4 larry.sh surfaces (manual-tools registry, tool_nc_provision_jumps wrapper, execute_tool case, TOOLS_JSON schema).
  • Verified on a COPY of the real 24-site integrator (/tmp/clvf_jumps_test, site epic = 21 inbound roots; read fixture sha-confirmed untouched at fa129cfc…): (a) full-site --dry-run listed all 21 roots → 63 new threads + 21 splice routes, wrote NOTHING (sha unchanged); (b) a real full-site provision inserted 63 balanced-brace blocks + 21 splice routes (topology spot-checked: windows_<x>_in listens on the jump port and routes internally to windows_<x>_out → 127.0.0.1:<orig_port>; linux_<x>_out → new-host:jump_port; OLD inbound keeps its existing dests + the new jump-out), all under ONE journal session (84 entries, 1 session id); (c) larry-rollback.sh --session <id> removed the ENTIRE batch BYTE-IDENTICAL (sha back to fa129cfc…); (d) --filter '^MFNfr' correctly limited to 2 roots, and that batch also rolled back byte-identical.
  • Robustness re-test on a COPY (/tmp/clvf_jumps_h, site epic = 21 roots; read fixture /tmp/clvf_realtest sha-confirmed untouched): (a) a CLEAN run still provisions 21 roots → 63 blocks + 21 splice routes under ONE session (84 entries) and --session rollback is still byte-identical; (b) both --dry-run AND a real run ABORT cleanly (exit 5, sha unchanged, no journal minted) when a jump_port/NAME collides with the existing NetConfig (crafted via --new-port-base = an existing listener port); (c) a simulated mid-batch insert failure fail-fasts (exit 6), auto-rolls-back the whole session byte-identical, and leaves NO dangling linux_<x>_out / splice.
  • VERSION + larry.sh LARRY_VERSION → 0.8.32; MANIFEST regenerated (--check clean); bash -n clean. (Version held at 0.8.32 — still unpushed; the fail-safe + collision pre-flight harden the same release.)

v0.8.31 — 2026-05-28

★ NEW WRITE TOOL: nc_set_field — change a settable field (PORT, HOST/IP, PROCESSNAME, ENCODING) on an existing thread, JOURNALED + rollback-reversible. Bryan's top-requested write feature ("changing port numbers and ip addresses"). Built on the exact journal/atomic-write foundation the v0.8.30 mutate pass proved byte-identical-reversible — same idiom as nc-table.sh / nc-insert-protocol.sh (snapshot → diff → atomic write → journal entry; undo via larry-rollback.sh).

  • Invocation: nc-set-field <thread>[.<site>] <field> <value> (bare thread → $HCISITE; thread.site cross-site; also --site). Flags: --dry-run (show before→after, NO write), --confirm yes (skip the y/N prompt; still journaled), --netconfig PATH, --hciroot PATH, --completion (emit a bash-completion snippet: thread names + the field enum).
  • Curated safe set, explicit reject otherwise — never blind-edits arbitrary tokens and never CREATES a missing field: PORT → nested PROTOCOL.PORT; HOST (alias IP) → nested PROTOCOL.HOST; PROCESSNAME → top-level; ENCODING → top-level (must already exist). Field name is case-insensitive.
  • Anchored edit, NOT a global sed. Locates the thread's protocol-block line range via nc-parse (protocol-line + brace-balanced end walk), then the exact field line within it (nested vs. top-level scoped), and replaces ONLY that value token — preserving indentation + brace shape. A port/host SHARED by another thread is provably untouched. Re-verifies balanced braces before the journal write; a broken structure aborts with nothing written. No-op (value already set) exits 4 cleanly.
  • Wired into all 4 larry.sh surfaces (manual-tools registry, tool_nc_set_field wrapper, execute_tool case, TOOLS_JSON schema) + the bash-completion snippet.
  • Verified on a COPY of the real 24-site integrator (/tmp/clvf_setfield_test, read fixture sha-confirmed untouched): dry-run shows before→after without writing; a real PORT change (39500→39600) and HOST change (172.31.23.2→10.34.48.11) each apply as a single surgical line edit (braces balanced, the two threads sharing PORT 51205 untouched); both journaled; whole-session rollback restored the file BYTE-IDENTICAL; an unsupported field (MLP_TIMEOUT, TYPE) and an unknown thread are rejected cleanly.

v0.8.30 — 2026-05-28

★ WRITE/MUTATE TOOL VALIDATION PASS + journal-rollback foundation verified — 2 real bugs found & fixed. Ran every config-mutating tool against a COPY of the real 24-site Cloverleaf integrator fixture (/tmp/clvf_writetest_integrator, the read fixture left untouched), proving each mutation (a) applies, (b) journals, and (c) rolls back BYTE-IDENTICAL to the original. nc_table (add/delete/replace/create), nc_add_route, nc_insert_protocol (end + after-anchor), nc_create_thread (tcpip/file, ±connect-from), nc_make_jump (gen + e2e insert), nc_tclgen (all 7 templates) all verified. Rollback validated across ALL granularities: --session, --entry, --target, --last N, the 2-entry chain unwind (newest-first), and the create→delete path. The journal rollback foundation reliably restores byte-identical originals.

  • nc-create-thread.sh HOST brace-collision (malformed NetConfig). The block template inlined { HOST ${HOST:-{}} }. When --host was supplied (the common case for a tcpip client/outbound), the } closing the parameter default collided with the template's closing brace, emitting { HOST 10.9.9.9} } — one extra }, brace-imbalanced (-1), invalid TCL that Cloverleaf would reject. Now the HOST field is precomputed into $HOST_FIELD (real host, else {}) and referenced cleanly: { HOST ${HOST_FIELD} }. Only triggered with --host; the empty-host/file case was already correct. nc-make-jump.sh was NOT affected (it uses { HOST ${host} }).
  • lessons.sh:142 printf option-injection. Same class fixed in v0.8.29 (and flagged there as out-of-scope): printf '---\n\n' parsed --- as printf options → printf: --: invalid option, dropping the markdown rule from lessons.sh export bundles. Now printf '%s\n\n' '---'. (The in-loop printf '\n---\n\n' on line 151 is safe — the leading \n shields the dashes.)

KNOWN / TRIAGE (not fixed this pass): the journal records the table target path with the lowercase tables/ segment from nc-table.sh's modify_via_csv even when the file lives under Tables/ (capital). Restore works on case-insensitive filesystems (macOS APFS, Windows) but could miss on a case-sensitive Linux mount. Low-risk for current hosts; flagged for a path-normalization follow-up (do not patch blind — needs a real case-sensitive box to verify against).

v0.8.29 — 2026-05-28

★ READ/INSPECT TOOL VALIDATION PASS — 6 real bugs found & fixed. Ran every read/inspect/analysis tool against a real 24-site Cloverleaf integrator fixture (via BOTH the lib/<x>.sh path and the wired execute_tool dispatch), with real thread/site/xlate/table names discovered from the config. Same class as the v0.8.28 nc-engine arg-parse crash: each only surfaced when actually run. A BSD-awk word-boundary nit in nc-status.sh not-up (gawk-only \<up\> → portable (^|[^a-z])up([^a-z]|$)) was also caught at the Vera gate and folded in.

  • nc-find.sh --name mode returned ZERO matches (BSD/macOS). The protocol- name extraction used the GNU-only sed \+. BSD sed (macOS) treats \+ as a literal +, so the thread name came back empty and every --name hit was silently dropped. Now POSIX-portable ([[:space:]][[:space:]]* / [A-Za-z0-9_][A-Za-z0-9_]*). Worked on GNU/Cygwin hosts; broke on BSD.
  • nc-find.sh exit 1 on success for tsv/jsonl. The trailing [ "$FORMAT" = "table" ] && printf ... test left the script exit code at 1 for non-table formats (the && short-circuits). Added explicit exit 0 — a successful search (even zero matches) now returns 0; mis-signaled failure to any $?/&& caller.
  • nc-parse.sh tclproc-refs dropped digit-leading proc names. The { PROC / { PROCS regexes required a leading [A-Za-z_], so a real Cloverleaf proc like 3M_check_ack was never reported — which also blanked nc_find --tclproc 3M_check_ack. Widened to [A-Za-z0-9_]+; PROCSCONTROL still excluded.
  • nc-xlate.sh diff could not find site-scoped xlates. Unlike show/ops/tree/ summary, diff did not accept --site, so locate_xlate only checked $HCIROOT/Xlate and always died no such xlate for the common case (xlates under <site>/Xlate/). cmd_diff now takes --site (applied to both names) and larry.sh forwards it.
  • nc-diff-interface.sh printf option-injection. printf '---\n\n' (the markdown horizontal rule) parsed --- as printf options → printf: --: invalid option, dropping the rule. Now printf '%s\n\n' '---'.
  • nc-smat-diff.sh printf option-injection (8 lines). printf '- A: ...' format strings starting with - errored printf: - : invalid option in NON-interactive bash, dropping every summary bullet. Guarded with printf --.
  • nc-status.sh not-up crashed on --format. cmd_not_up accepted only --site/--filter; the larry.sh dispatch ALWAYS appends --format, so not-up died unknown flag: --format and was unusable via the wired tool. cmd_not_up now accepts --format and forwards it to cmd_threads.

All other read/inspect tools (nc_paths up/down/full/all/site-only incl. the cross-site ADTto_CodaMetrix chain, nc_destinations/sources, nc_list_, nc_protocol_, nc_find_inbound, nc_document single+system, nc_revisions timeline+diff, nc_msgs raw+field-filter+json, nc_xlate list/show/ops/tree/summary, nc_xlate_refs, nc_tclproc_refs, hl7_field, hl7_diff, nc_diff_interface, nc_regression 6-phase + chain-walk command-gen, nc_smat_diff pairing, nc_engine

  • nc_status graceful degrade, list_sites ignore-rules, all 7 nc_tclgen templates verified info complete in tclsh) PASSED unchanged. Test matrix: Deliverables/2026-05-28-cloverleaf-v3-tool-test-matrix.md.

KNOWN / TRIAGE (not fixed this pass): hl7-sanitize.sh is a silent no-op on LF-delimited input (RS="\r" reads the whole file as one record) — fixing needs a portable CR/LF normalizer (BSD awk has no regex RS); nc_engine route-test/ testxlate/resend ignore --dry-run (only stop/start/bounce/restart honor it) and the dispatch never forwards it; lessons.sh:142 has the same printf '---' option-injection (out-of-scope write tool).

v0.8.28 — 2026-05-28

★ EXPOSE 5 lib-only tools as first-class LLM tools. A roadmap audit found ~7 working lib/*.sh scripts that ran from the shell (larry tools <name>) but were NOT wired into larry.sh's LLM TOOLS_JSON. This pass exposes the READ/INSPECT/TEST ones (the config-mutating nc-table / nc-create-thread are a separate next pass). Each tool now has all four larry.sh surfaces — manual-tools registry, tool_<name> wrapper, execute_tool dispatch case, and the TOOLS_JSON LLM spec — mirroring the v0.8.27 nc_revisions pattern. The LLM tool count goes 38 → 43.

  • nc_statusnc-status.sh — engine RUNTIME status (sites/threads/ not-up/connections/queued/raw + --site/--filter/--format). READ-ONLY; needs a live engine for real numbers, degrades gracefully on a config-only host (sites → config-present, threads → reports the missing tstat binary, no crash).

  • nc_enginenc-engine.sh — engine process control + test drivers (stop/start/bounce/restart/status + route-test/testxlate/tpstest/resend-ib/ resend-ob). State changes run only on a live box and are journaled; status and the test drivers need a live engine. Unlocks the TPS-test (tpstest, wraps hcitps) and route-test driver the regression tool needs.

  • nc_xlatenc-xlate.sh — READ-ONLY xlate (.xlt) explorer (list/show/ops/tree/summary/diff). Works fully on a config fixture.

  • nc_smat_diffnc-smat-diff.sh — diff stored SMAT messages across two envs, paired by MSH.10 (--env-a/--env-b/--pair-on). READ-ONLY; full run needs two reachable archives.

  • nc_tclgennc-tclgen.sh — TCL UPOC skeleton generator (tps-presc/ tps-postsc/tps-iclkill/xlate-helper/trxid/ack/field-rewrite). Standalone, deterministic, API-free.

  • --help header-leak fix (4 libs). The sed comment-header range over-ran into live code: nc-status.sh (2,25p2,20p), nc-engine.sh (2,30p2,29p), nc-xlate.sh (2,15p2,11p), nc-tclgen.sh (2,20p2,19p). Each now stops exactly at the end of the comment header — no set -o pipefail / NC_SELF= / die() leaking into --help. nc-smat-diff.sh (2,22p) was already clean.

  • FIXED nc-engine.sh arg-parsing crash (found during this exposure pass). The subcommand dispatcher treated EVERY --flag as taking a value (flags+=("$1" "${2:-}"); shift 2), so the valueless --dry-run swallowed the next token; and the set -u that leaks in from sourcing journal.sh made the empty-array expansion "${flags[@]}" throw unbound variable under bash 3.2 (macOS/Gundersen), crashing stop/start/bounce/restart with --site/--dry-run. Fix: set +u after sourcing journal.sh (nc-engine predates nounset and uses bare $1/empty arrays), plus a multi-case parser — --dry-run is valueless, --site/--confirm consume one value (without over-shifting on a bare trailing flag). status and the test-driver paths were already unaffected.

v0.8.27 — 2026-05-28

★ NEW TOOL: nc-revisions.sh — NetConfig change-history / revision-diff. Shows how a THREAD, a multi-thread SYSTEM, or a WHOLE SITE changed over time by diffing Cloverleaf's own per-save NetConfig revision SNAPSHOTS, annotated with WHO saved each revision and WHEN. Deterministic, pure bash+awk, NO API, NO network — runs identically on an API-blocked host.

  • Revision storage (verified on the real integrator). Each save snapshots the full NetConfig into $HCIROOT/<site>/revisions/NetConfig<TS>/NetConfig, where <TS> is the save time as M D YYYY H M S with NO zero-padding (so NetConfig5212025121420 = 5/21/2025 12:14:20). Because the components are un-padded the dir NAME is NOT lexically sortable across months/hours, so we do NOT sort on it. WHO + DATE come from the snapshot's NetConfig PROLOGUE (who: / date:, TAB-separated, leading-space-indented; the block is prologue … end_prologue). The date: string (Month DD, YYYY H:MM:SS AM/PM TZ) is parsed into a sortable key YYYYMMDDHHMMSS so the timeline is in true chronological order. The live <site>/NetConfig is included last as (current); a secondary key (snapshot dir-timestamp; ~ sentinel for live) breaks ties when two saves share the same last-editor prologue stamp.

  • Scoped, not whole-file. The diff/summary is reduced to the relevant protocol-block(s) via nc-parse.sh (single thread, or every name-matching thread for --system), so it is NOT the whole 10k-line NetConfig unless --site (whole-NetConfig scope).

  • Flags. --format timeline (default) = plain-text who/when/summary, one revision per block, summary counting routing threads added/removed/modified in scope between consecutive revisions; --format diff = the scoped unified diff between consecutive revisions, one block per transition. --limit N (most-recent N), --since <date> (YYYY-MM-DD or YYYYMMDD). Invocation: nc-revisions <thread> (defaults to $HCISITE), <thread>.<site>, <site>/<thread>, --system <pat> [--site S], or --site <site>.

  • Plain text + control-byte safety. Output matches the v0.8.24/26 OneNote conventions: no markdown. Timeline runs through _sanitize_ctl (unconditional strip — human-readable artifact); --format diff through the tty-gated _sanitize_ctl_tty (raw, byte-identical on a pipe/redirect so a downstream consumer of the diff sees exact bytes). --help sed range stops at the end of the comment header (no code leak).

  • Wired into larry.sh. Added to the tools registry, the tool_nc_revisions dispatcher, the execute_tool case, and a full nc_revisions LLM-tool schema mirroring the other NetConfig tools.

Proved on the real test integrator (HCIROOT=/tmp/clvf_realtest/integrator): nc-revisions ADTto_uds.ancout2 rendered an 8-revision timeline spanning Apr 2024 → Nov 2025 (correctly ordered across months despite un-padded dir names), surfacing a ~1 modified host change and the eventual -1 removed in the live config; --format diff showed the scoped one-line { HOST 10.33.176.5 } → { HOST 10.34.48.11 } change. --system codametrix --site ancout --format diff showed ADTto_CodaMetrix added then its { PORT 8000 } → { PORT 39500 } and inbound-archive (INFILE/INSAVE) edits by BRYJOHN. bash -n clean on all touched files; MANIFEST regenerated and --check passes.

v0.8.26 — 2026-05-28

★ HARDENING: extend the v0.8.25 control-byte sanitize across the whole tool suite (Vera follow-up). v0.8.25 fixed the terminal-corruption leak in lib/nc-document.sh only. Vera flagged that the OTHER tools dumping NetConfig/.tcl/HL7 content to stdout carry the SAME risk and were still unsanitised — most importantly nc-msgs.sh, whose raw HL7 contains C0 framing bytes (e.g. 0x1c block separator) that corrupt a terminal when viewed un-redirected.

  • Shared sanitiser. _sanitize_ctl is hoisted out of nc-document.sh into lib/cygwin-safe.sh (already the sourceable shared-primitives lib), so every tool shares ONE definition. nc-document.sh now sources it; its behavior is UNCHANGED — it still strips UNCONDITIONALLY (the doc is a human-readable artifact; control bytes are unwanted even when redirected to a .txt).

  • New _sanitize_ctl_tty — the data-tool variant. For the data-producing tools (nc-msgs.sh, nc-parse.sh, hl7-field.sh, hl7-diff.sh, hl7-sanitize.sh, hl7-desanitize.sh) stripping happens ONLY when stdout is an interactive terminal ([ -t 1 ]). When the user pipes/redirects (e.g. nc-msgs … --format raw > input.msgs feeding route_test, or | awkcut), the output passes through RAW and byte-identical — the 0x1c HL7 framing and other bytes are LOAD-BEARING downstream and must not be silently corrupted. Each tool's output region runs in a brace group piped through the gate, with ${PIPESTATUS[0]} propagated so subcommand exit codes survive the pipe. (hl7-schema.sh is sourced-only — no stdout sink of its own — so it is untouched.)

  • ssh-helper.sh _read_hidden trap nits (Vera). (a) The restore trap was installed only when stty -g succeeded, yet stty -echo ran regardless — a ^C in that window left echo off. Now a restore trap is installed on EVERY path BEFORE touching echo (falling back to stty echo when the save failed). (b) The trap reset was trap - (reset-to-default); it now captures the caller's PRIOR INT/TERM/HUP trap with trap -p and restores it.

Proved on the real test integrator (to_appriss.smatdb, 11 msgs): nc-msgs … --format raw to a file kept 0x1c/0x0d intact (493 bytes), while the same command through a PTY stripped ESC/FS; cmp confirmed they differ. nc-document --name codametrix to a file still emits 0 control bytes with the em-dash preserved. Portable: LC_ALL=C tr octal ranges + POSIX [ -t 1 ]; no GNU-only flags. bash -n clean on all touched files. MANIFEST regenerated.

v0.8.25 — 2026-05-28

★ FIX: terminal line-editing corruption after larry tools … runs (Bryan, reproduced on his live Cloverleaf box). After running larry tools nc-document (and tbn-class searches), his interactive shell's line editing broke — backspace printed a literal ^H instead of erasing, arrow keys died, and he had to run stty sane/reset to recover. Two independent causes, both fixed belt-and-suspenders:

  • Cause 1 — CONTROL-BYTE OUTPUT LEAK (the one Bryan hit). lib/nc-document.sh reads arbitrary NetConfig and .tcl proc source — author comments, the --raw-tcl verbatim appendix (cat "$apath"), and xlate bodies — and emits it to stdout unsanitised. Any raw C0 control byte in that content, especially ESC (0x1B) and DECSET mode-switch sequences (ESC[?1049h, ESC[?25l, etc.), flips the terminal into a different mode when the doc is printed (i.e. NOT redirected), wrecking line editing. Fix: a single sanitiser at the one output choke-point (out_target → new _sanitize_ctl) strips C0 controls except TAB/LF/CR via LC_ALL=C tr -d '\001-\010\013\014\016-\037\177'. Applies to BOTH stdout and --out <file>. High bytes (0x800xFF) are untouched, so UTF-8 names/comments survive. Portable to AIX/Linux/BSD/Cygwin (POSIX tr octal ranges; no GNU-only flags). The non-interactive tools <name> dispatch in larry.sh itself touches NO termios (it execs the lib tool before any trap/mouse-mode is installed), confirming the leak was the tool's own output bytes.

  • Cause 2 — TTY-MODE LEAK on SIGINT (latent, hardened). lib/ssh-helper.sh read hidden passwords with a bare stty -echo … read … stty echo. A Ctrl-C (or EOF/signal) BETWEEN the two stty calls left echo permanently OFF — terminal corrupted until stty sane/reset. Fix: new _read_hidden saves the full prior termios with stty -g and restores it via a trap … INT TERM HUP plus an explicit restore on normal return, so every interrupt path restores the tty. Reads from /dev/tty. Both hidden-password prompts (cmd_pass, cmd_setup re-prompt) now route through it.

bash -n clean on all touched files. MANIFEST regenerated.

v0.8.24 — 2026-05-28

★ PLAIN-TEXT output rewrite of lib/nc-document.sh (Bryan's priority — OneNote readability). Bryan shares the generated interface doc with a teammate in OneNote, which does NOT render markdown — #, **bold**, | pipe tables |, --- rules and backticks all show as literal junk. The OUTPUT layer is rewritten to emit clean PLAIN TEXT by default (parsing/extraction logic unchanged):

  • Default = plain text everywhere. UPPERCASE headings underlined with a dashes line; no **bold**, no backtick code spans, no --- horizontal rules, no | pipe tables |. Reads cleanly in any editor and pastes straight into OneNote.
  • Tabular sections (Message Flow, Delivery breakdown) — two render modes. DEFAULT = indented label:value blocks (one block per hop/row; reads in any font, zero setup). New --onenote-table flag renders those same sections as TAB-separated rows (header row + one data row per record, real \t between cells, NO leading/trailing pipes) — paste into OneNote then Insert > Table to get real cells. Non-tabular sections (Title/Context/Description) stay plain text in both modes.
  • Removed the verbose Filter / translation logic (surfaced deterministically from the referenced UPOC procs): preamble line; the surfaced UPOC bits now sit under a simple FILTER / TRANSLATION LOGIC plain heading, still inline in each interface description (always on).
  • Raw TCL is now OPT-IN behind --raw-tcl (off by default). The readable, deterministically-extracted UPOC bits stay inline in the per-interface description regardless; only the verbatim raw-proc appendix is gated. The appendix is rendered as a plain indented source dump (no ```tcl fences).
  • New flags --onenote-table and --raw-tcl documented in the lib --help and in the nc_document tool schema/wrapper in larry.sh (onenote_table, raw_tcl booleans). The --help sed -n range was re-pinned to the END of the comment header (2,92p) so it does not leak live code.

Folded in two Vera-deferred v0.8.22 minors (lib/nc-document.sh):

  • Removed dead/unused sup/fan counters in _xlate_filter_block.
  • Added local to the previously un-local'd _dest_hit/_d loop variables in document_thread.

Verified against the real integrator (HCIROOT=/tmp/clvf_realtest/integrator, --name codametrix): default output is plain text with label:value hop blocks and no markdown chars; --onenote-table produces tab-separated rows; the removed preamble line is gone; raw TCL only appears with --raw-tcl.

v0.8.23 — 2026-05-28

★ REGRESSION CHAIN-WALK route-test capture (Bryan's priority). New --chain-walk mode in lib/nc-regression.sh — a single-env (per-env) capture that runs route_test at every routing(inbound) thread along an nc-paths chain and chains each step's per-destination output into the next step's input. Orthogonal to the existing 6-phase env-A/env-B pipeline; Bryan runs it on env-A and env-B with the same START input, then the existing hl7-diff --ignore MSH.7 diff phase compares the captured outputs.

Workflow (verbatim Bryan spec):

  • Take a START thread + N. Grab the N most-recent messages from the START inbound's SMAT via nc-msgs --limit N --format rawinput.msgs.
  • Resolve the downstream chain(s) from START with nc-paths --format jsonl. --target <site/thread> restricts to the single chain ending at that node; with no --target, ALL fan-out branches are walked (each branch a chain dir).
  • The route_test ENTRY threads are the START node plus every node that immediately follows a cross-site ==> hop (the remote inbound). Outbound sender nodes (e.g. OB3_RAD_ordersS) are NOT entries — they are produced as .out.<sender> files by the upstream route_test.
  • At each entry E: hciroutetest -a -d -f nl -s <out_base> <E> <input> — the command string is mined VERBATIM from the v1/v2 wrapper (cloverleaf_tools/tools/route_test_wrapper.py:10/149-156); -f nl yields NEWLINE output directly (no manual len2nl+delete). -a (all routes) writes every fan-out branch's output in one invocation.
  • route_test writes ONE FILE PER DESTINATION named <out_base>.out.<DEST> (.out.<DEST> naming confirmed from legacy_workflow_commands.py:1015). File SELECTION: the node immediately AFTER E in the chain is the suffix selected to feed the NEXT step. For the cross-site boundary S ==> R, the selected .out.<S> payload becomes the input fed to the next site's inbound R.
  • PRODUCES under $OUT/chain/<NN>/: input.msgs (original START messages), step-NN.<E>.out.<DEST> (per-destination outputs), step-NN.<E>.selected (the chosen next-step input), commands.sh (the exact generated command sequence), chain.txt (the resolved v1 chain).

hciroutetest is a Linux engine binary needing a live engine — NOT runnable off the server — so chain-walk GENERATES the orchestration + command STRINGS + file-chaining/selection and stubs the engine call under --dry-run; real execution is on the server (source the Cloverleaf profile first). Everything that does NOT need the engine is self-verified against the real integrator: the nc_paths chain resolution, the nc-msgs message-grab command, the per-step route_test command strings, and the .out.<DEST> → next-step SELECTION.

New flags: --chain-walk --start <site/thread|thread> [--site SITE] --count N --hciroot HCIROOT --out DIR [--target <site/thread>] [--route-test-bin BIN] [--dry-run]. Portable bash (Win+Linux), pure bash+awk, API-FREE, no python/.pyz. Existing phase pipeline untouched.

Verified on ORUto_CodaMextrix orders (mux/RadOrdfr_epic_972310 --> mux/OB3_RAD_ordersS ==> orders/IB3_RAD_muxS --> orders/ORUto_CodaMextrix): 2 route_test entries (RadOrdfr_epic_972310@mux, IB3_RAD_muxS@orders); start grab + the exact per-step hciroutetest strings + the .out.OB3_RAD_ordersS (cross-site) → .out.ORUto_CodaMextrix (terminus) selection. No-target run walks all 6 IB3_RAD_muxS fan-out branches.

v0.8.22 — 2026-05-28

Interface document tool follow-on (lib/nc-document.sh, inbound-systems.tsv, cheatsheet). All changes remain deterministic, pure bash+awk, API-FREE.

★ Xlate-internal filtering & fan-out (Bryan). The route's .xlt is now parsed deterministically for the three ops that change the message COUNT, and they are called out in the Description prose, a new "Xlate filtering & fan-out" subsection, and the delivery breakdown's XLATE line:

  • OP SUPPRESSFILTERING — the message/segment is dropped; the governing OP IF condition is surfaced (e.g. "message SUPPRESSED when @medicopia_fac eq =KILL").
  • OP SENDFAN-OUT — an extra output copy is emitted mid-translation ("message cloned/multiplied here"); conditional sends show the when … clause.
  • OP CONTINUEFAN-OUT — translation continues after a send (the companion that yields a second distinct message). Pure-awk brace-depth + IF-frame-stack parse of the xlt; no API. A pure-pathcopy xlate (e.g. Epic_ADT_CodaMetrix.xlt) correctly yields no subsection.

★ Configurable inbound-systems lookup + known-feed hard-map (Bryan). A new curated inbound-systems.tsv (<key>\t<upstream system identity>, key on the feed thread name OR port:<n>) deterministically labels the external sender in the Message Flow "From"/feed row. On NO match it falls back to the honest generic Epic (process <name>) — it never fabricates (Vera's fabrication concern: the map is curated, not guessed). Resolution order: $LARRY_HOME/inbound-systems.tsv → shipped seed; override with --inbound-systems PATH or $INBOUND_SYSTEMS_FILE. Seeded with known feeds (ADTfr_epic_964700 → "Epic AIP 964700 (ADT)", etc.). The installer seeds it copy-if-missing (never clobbers edits); it is intentionally NOT in the MANIFEST so self-update never overwrites curation.

Vera Minor-1 (cosmetic): --help no longer leaks shell code — the header sed range now stops at the end of the usage block (was 2,72p).

Vera Minor-2: new optional --strict-delivery gate for SYSTEM mode — a thread counts as a delivery only if it has a real downstream endpoint (non-empty PROTOCOL.HOST/PORT) or OUTBOUNDONLY=1, excluding reply-only outbounds a broad --name would otherwise sweep in. Default OFF (preserves prior behavior). Verified: --name Infor drops the reply-only Empfr_Infor under the strict gate; codametrix still yields its 2 deliveries.

Vera Minor-3: list-form { DEST {a b c} } routes are now captured in _routes_of (was single-form { DEST <thread> } only, mirroring nc-parse's index parser), so a delivery reachable only via a multi-dest route is matched instead of missed. route_thr also falls back to the authoritative nc_paths chain's penultimate node when the one-hop sources primitive misses, so the route/xlate breakdown stays populated.

Cheatsheet drift (Vera): nc_make_jump row + the jump-thread-pattern section now name the actual three generated threads (linux_<tag>_out, windows_<tag>_in, windows_<tag>_out) instead of the stale to_/fr_<inbound>_server_jump pair.

Bonus: fixed a pre-existing latent printf: --: invalid option warning on the --- footer separator (now printf '%s\n\n' '---').

v0.8.21 — 2026-05-28

Interface document tool rebuilt (lib/nc-document.sh) — documents a Cloverleaf interface end-to-end in Bryan's confirmed Legacy "ADT Messages" template. Deterministic, pure bash+awk, API-FREE (the whole tool runs identically on an API-blocked host like Gundersen — no python, no .pyz, no network).

Two modes.

  • SINGLE INTERFACE: nc-document.sh <thread> [site] (or <site>/<thread>), e.g. ADTto_CodaMetrix ancout — one fully-detailed interface section.
  • SYSTEM/PATTERN: nc-document.sh --name <pattern>, e.g. --name codametrix — one section per matching DELIVERY (outbound) thread across all sites. A delivery is any thread that is NOT an inbound listener (ISSERVER=1) and NOT an ICL/file inbound router (OBWORKASIB=1), so it catches both OUTBOUNDONLY=1 threads and bidirectional Xto_* deliveries (e.g. DFTto_codaMetrix, OUTBOUNDONLY=0).

Per-interface output (Legacy template):

  • Title = the interface / message type.
  • Description prose — what the messages are, the TRXID filter that selects the delivery, where translation happens (xlate vs raw), seeded from the surfaced UPOC bits.
  • Message Flow table Platform | Action | Description | From | To — Epic feed → Cloverleaf routing → Final Delivery, one row per hop. The routing row uses nc-paths.sh and adapts its wording to whether the chain crosses a site boundary (cross-site destination-block hop ==> vs intra-site DATAXLATE route -->).
  • Delivery breakdown — Flow chain; how-received (inbound PROTOCOL TYPE/HOST/PORT/ISSERVER + ICLSERVERPORT); inbound TRXID/TPS proc (DATAFORMAT.PROC); the route's TRXID filter + WILDCARD; route TYPE; PREPROCS/POSTPROCS; XLATE; destination host:port / process / type.

★ Deterministic UPOC-bits extraction (the key feature). For every referenced UPOC proc (inbound TRXID/TPS proc + each route's PRE/POST/PROCS), locate its .tcl under $HCIROOT/<site>/tclprocs/ (home site first, then any site) and surface — with NO API — into the Description: (1) the proc's comments (header + inline # filter notes), (2) HL7 fields referenced (dotted PID.8 + the underscore PV1_3_3 form normalized to dotted), (3) literal event-code matches (A01 A02 A03 …, boundary-checked), (4) table lookups (tbllookup / .tbl, e.g. PeriCalm_Loc), (5) disposition (CONTINUE/KILL/return → "pass matching / kill non-matching"). Rendered compactly, e.g.: UPOC Epic_PeriCalm_ADT_pass — … · fields: PV1.45 PID.8 · matches: A02 A03 · table: PeriCalm_Loc · disposition: pass matching / kill non-matching. The raw proc TCL is included verbatim in a plainly-labelled ## Referenced proc source appendix for audit — with no "summarize by hand / on an API box" marker (the surfaced bits ARE the content).

LLM polish (enrichment, NOT in the bash tool). The bash tool calls no API. The nc_document tool schema now instructs the model, when run WITH the API, to transparently polish the surfaced UPOC bits into smoother filter prose in the Description (no marker, no special mechanism). On API-blocked hosts the deterministic bits + appendix ARE the deliverable.

Portability fixes baked in:

  • All extraction awk is \b-free (BSD/BWK awk on macOS + mawk on Windows Git-Bash silently match nothing on \b); token boundaries use explicit char-class scans.
  • Internal records are \037(US)-delimited, not TAB — bash read with a TAB IFS collapses CONSECUTIVE empty fields and was silently shifting columns when an ICL/file inbound has empty HOST/PORT/ISSERVER. Inbound facts are read into named globals for the same reason.
  • Route parser walks the real DATAXLATE depth map (route sub-blocks at depth 3, DEST/TYPE/XLATE at depth 6, inner { PROCS <name> } at depth 8), so per-route TRXID/TYPE/XLATE/PREPROCS extraction is exact.

Wiring. tool_nc_document (larry.sh) now takes thread/name/site; the nc_document tool schema documents single-thread + system modes and the UPOC-bit-polish instruction. larry tools nc-document drives the same script standalone (no API).

Verified on the REAL integrator (HCIROOT=/tmp/clvf_realtest/integrator, the 24-site QA env): generated ADTto_CodaMetrix ancout (matches Larry's verified prototype — flow mux/ADTfr_epic_964700 --> mux/OB_ADT_ancS ==> ancout/IB_ADT_muxS --> ancout/ADTto_CodaMetrix, inbound proc trxId_IB_ADT_muxS, TYPE xlate, XLATE Epic_ADT_CodaMetrix.xlt, dest 172.31.23.2:39500 process ADT); the codametrix system doc (2 deliveries: the ADT feed + the intra-site DFTto_codaMetrix DFT feed); and the PeriWatch route, proving the UPOC-bits extraction surfaces real content from Epic_PeriCalm_ADT_pass (comments, fields, A02/A03 matches, PeriCalm_Loc table, pass/kill disposition). bash -n clean; TOOLS_JSON valid.

v0.8.20 — 2026-05-28

Route-chain tracer (lib/nc-paths.sh) REARCHITECTED for the real integrator: parse once, walk in memory; cross-site linking corrected from a port-match heuristic to authoritative destination-block resolution. v0.8.20 was never shipped — this entry supersedes the earlier port-based draft of the same version, which FAILED Bryan's real-integrator smoke (24-site QA env): catastrophically slow AND missing real cross-site feeders.

Problem (measured on the real 24-site integrator, before this fix):

  • nc-paths.sh ADTto_CodaMetrix ancout --site-only → correct chain but 84 s.
  • full (no flag) → same single chain, 164 s.
  • --down → "unknown flag".
  • Root cause: the walker invoked nc-parse.sh as a SUBPROCESS per hop / per candidate (destinations/sources/protocol-nested/protocol-field/ list-protocols), and each invocation re-ran _blocks + cmd_protocol_block — two full awk passes over the (16K-line) NetConfig. O(threads × parse-cost) = minutes. Even the intra-site walk was a bottleneck (sources scans every body).
  • Correctness: the draft linked sites by matching an outbound's PROTOCOL.PORT to an inbound's listen/ICL port. That MISSED the real mux feeder of ancout's IB_ADT_muxS (port 62043) — because no thread has PROTOCOL.PORT 62043; the link is expressed only through a destination block.

1. Single-pass index (lib/nc-parse.sh new index subcommand, cmd_index). ONE awk pass per NetConfig emits a flat record stream the walker needs: P protocol, D DEST edge (handles BOTH { DEST name } and the list form { DEST {a b c} } — the list form was silently dropped by the old cmd_destinations regex), L listen port (server PROTOCOL.PORT with ISSERVER=1 and/or guarded ICLSERVERPORT), O outbound dest port, and X <destname> <site> <thread> <port> — the resolution of a top-level destination block. Indexing all 24 live NetConfigs is <1 s.

2. In-memory route graph + in-memory walk (lib/nc-paths.sh). The index loads once into bash associative arrays (G_PROTO/G_DESTS/G_LISTEN/G_OUT/ G_DESTBLK/G_INSRC/G_DESTBLK_REV; _load_nc, _build_in_sources, _build_graph). _walk_down/_walk_up and the one-hop primitives (_outgoing/_incoming/_xsite_down_targets/_xsite_up_feeders) are now pure O(1) lookups — NO subprocess and NO re-parse per hop. Cycle test is a bash substring match (_seen_has), not a grep fork per hop.

3. Cross-site link corrected to destination blocks. Cloverleaf links sites through the named ICL destination table: a thread's DATAXLATE DEST may name either a LOCAL protocol (intra-site hop) or a destination block, which resolves to { SITE } { THREAD } { PORT }. A DEST naming a destination block is the cross-site hop, resolved by NAME to the exact remote (site,thread). The PORT equals the remote thread's listen/ICL port (corroboration), but it is never the primary key. ICLSERVERPORT is still read GUARDED in the index (absent/{} → skipped, never the un-guarded keylget that crashed v2 paths.tcl).

4. full mode = upstream × downstream JOIN at the thread. No more O(sites × threads) entry-chain scan (Vera m3). The complete chain is the thread's upstream feeder chains (each ending AT the thread) joined to its downstream chains (each starting AT the thread); both walks follow destination blocks, so the join spans sites naturally.

5. Flag standardization. --down/--up are now accepted as aliases of --downstream/--upstream in nc-paths.sh itself (they already worked via the /paths slash handler; the bare script rejected them).

6. Intra-site hops UNCHANGED in semantics — still the DATAXLATE DEST list, never an ICLSERVERPORT walk.

7. Removed: the port-match cross-site index (_build_port_index, the PI_* arrays), the per-hop subprocess primitives (_proto_port/_proto_isserver/ _icl_port/_norm_port), and the dead _nc_for_site helper.

Verification — RE-MEASURED ON THE REAL 24-SITE INTEGRATOR (tarball cloverleaf_test.tar.gz, HCIROOT = extracted integrator/):

  • ADTto_CodaMetrix ancout --site-only: 84 s → 0.66 s.
  • ADTto_CodaMetrix ancout (full): 164 s → 1.0 s.
  • whole-tree --all (all 24 sites, 709 chains): 4.3 s (well under a minute).
  • --down / --up: now valid flags.
  • REAL cross-site chain proven: mux/ADTfr_epic_964700 --> mux/OB_ADT_ancS ==> ancout/IB_ADT_muxS --> ancout/ADTto_CodaMetrix. IB_ADT_muxS's upstream feeder lives in the mux site and reaches ancout via destination block OB_ADT_ancS ({ SITE ancout } { THREAD IB_ADT_muxS } { PORT 62043 }) — exactly the feeder the port-match draft missed and Bryan asked for. Multi-site fan-out is site-correct (each destination block resolves to its own site's IB_ADT_muxS).
  • --site-only confirmed to suppress all cross-site hops.
  • bash -n clean (nc-paths.sh, nc-parse.sh, larry.sh); /paths + tool_nc_paths drive clean under set -u; MANIFEST regenerated & --check OK.
  • No-traffic-bypass preserved (read-only NetConfig parsing; no engine/network calls; pure bash + awk, no python/.pyz; portable Win + Linux).

v0.8.19 — 2026-05-28

Deterministic route-chain nc_paths tool — the #1 fix from the deterministic tool-coverage plan (Clover). The on-server LLM had NO transitive route-chain tool: to answer "show me the path / what feeds X / full route" it brute-forced the whole NetConfig with grep/read_file/bash_exec + a long chain of nc_destinations calls (the ~$1 prompt Bryan hit, which still gave up before unrolling the chain). This release wires up a deterministic enumerator so the model makes ONE call.

1. New single walker backend lib/nc-paths.sh. A DFS path-enumerator that ports the v2 paths semantics (cloverleaf_tools/cli/legacy_workflow_commands.py paths_cmd + _enumerate_downstream_paths/_enumerate_upstream_paths/_enumerate_all_full_paths). Output columns SITE THREAD HOPS PATH — HOPS = thread count in the chain, PATH = the chain joined by -> (one row per enumerated root-to-leaf path; a branching thread yields multiple rows). Matches Bryan's exact format (pharmacy / pharm_adt_in / 2 / pharm_adt_in -> pyxismed_crh_adtorm_out).

  • DEST-based routing, never ICLSERVERPORT. Next hop is resolved ONLY from the DATAXLATE { DEST <name> } list (via nc-parse.sh destinations/sources). Bryan's old paths.tcl walked via keylget data ICLSERVERPORT, which THROWS on any thread lacking that key (every outbound/client thread), so the trace died on the first client thread. The DEST list is present on every routing thread regardless of direction and yields nothing (no crash) when absent — the v2 paths.tcl crash cannot recur here.
  • All-mode (--all): enumerates every chain from every entry point (a thread with no incoming), deduped — the whole-site/environment chain inventory (covers gap #2, v2 list_full_routes).
  • Cross-site BY DEFAULT (Bryan's resolved decision): when a chain's terminal thread is also an entry thread in another site's NetConfig (correlated by shared thread name), the walk CONTINUES into that site — the mux -> ancout -> CodaMetrix chain is followed end to end. --site-only scopes to a single site.
  • Robust cross-site cycle detection. Every walk carries the full ancestor set keyed by (site,thread); revisiting an ancestor terminates that path (the terminal node is still emitted), plus a global max-depth backstop (128, v2 parity). Always terminates — verified against a deliberate cross-site cycle.
  • Formats: table (aligned), tsv, jsonl.

2. Consolidated the walker backend (no second dark walker). Removed the never-wired cmd_chain BFS-node-set command from nc-parse.sh (it only emitted a flat set of reachable nodes, never enumerated paths, and was invisible to the LLM). nc-paths.sh is now the SINGLE route-chain backend; the nc-parse.sh chain subcommand now errors with a pointer to it.

3. Wired nc_paths into the LLM (the critical piece).

  • larry.sh — new tool_nc_paths wrapper (table output routed through _fence_aligned_table; tsv/jsonl pass unfenced), a nc_paths) dispatch case, and a {"name":"nc_paths", ...} schema entry. The schema description steers the model to use nc_paths for ANY "show me the path / trace the chain / what feeds X / full route / end-to-end flow / sources+destinations chain" question instead of grep_files / read_file / bash_exec / repeated nc_destinations.
  • nc_sources schema note tightened to "ONE HOP ONLY — use nc_paths for chains."

4. Ergonomics (manual entry + slash command).

  • larry tools nc-paths <thread> <site> [--site-only] runs the enumerator standalone (no API). Registry entry added to larry tools list.
  • /paths <thread> [site] [--up|--down|--site-only|--all|--format ...] REPL slash command (defaults site to the current $HCISITE). The fuller tbn + <thread>.<site> <subcmd> shell-shim ergonomics remain a separate follow-on pending Bryan's cheat sheet — not built here.

Zero traffic-bypass primitives.

v0.8.18 — 2026-05-28

Readable terminal output + two DIRECT-mode follow-ups from Vera's v0.8.17 gate (Clover). larry runs in a plain monospace terminal that does NOT render markdown — this release makes tabular results actually line up and site lists read vertically, and closes the deferred ssh_push direct-mode gap plus the duplicated direct-ssh options.

1. Readable output in a monospace terminal (Bryan's primary ask). The on-server LLM was re-rendering tool results as markdown tables (| col | col |, |---|) — which print raw and never align — and collapsing the site list into a comma-joined inline sentence. The tools already emit clean data (nc-find/nc-inbound pre-align columns with %-*s; list_sites prints one site per line with a sites: N (excluded: …) headline). The fix steers the model to PRESERVE that, and reinforces it at the tool boundary so it can't drift:

  • agents/larry.md — new TERMINAL OUTPUT CONTRACT section (5 rules): never emit a markdown table; reproduce a tool's pre-aligned/fenced table VERBATIM in a ```text fence; never hand-align columns (the tools do it deterministically); render entity lists ONE PER LINE vertically (never comma-joined inline), and keep list_sites's count headline + one-per-line list as returned.
  • larry.sh — new _fence_aligned_table helper wraps an already-aligned tool table in a ```text fence with an explicit "reproduce VERBATIM; do NOT convert to a markdown table" marker. tool_nc_find and tool_nc_find_inbound route their --format table output through it (tsv/jsonl data formats pass through unfenced). Empty / error / usage output passes through UNCHANGED (the operator must still see errors plainly); the table bytes are never altered — only the fence + marker lines are added.

2. cmd_push DIRECT-mode branch (closes Vera's deferred MAJOR). cmd_push called _resolve_open_master unconditionally, so a DIRECT-mode alias (no ControlMaster) died "no open master" — breaking ssh_push (exposed tool; used by nc_regression phase 4 to push cross-env input bundles, central to the epic_adt_in cross-env goal). Added the symmetric direct branch mirroring cmd_pull: _direct_scp <alias> <local> <addr>:<remote> for the transfer, then post-transfer size verification via _dispatch_remote (fresh per-command _run_direct). lib/ssh-helper.sh.

3. De-duplicated the direct-ssh options (closes Vera's MINOR). The shared direct-mode -o flags (the five security-critical ones — PreferredAuthentications=password, PubkeyAuthentication=no, StrictHostKeyChecking=accept-new, ControlMaster=no, ControlPath=none — plus NumberOfPasswordPrompts=1 and ConnectTimeout) were copied in three places (cmd_setup probe, _run_direct, _direct_scp). Extracted into one _direct_ssh_opts helper so a security-option change can't drift across copies; all three sites now splat $(_direct_ssh_opts). ssh -o ordering is immaterial (no conflicting duplicate keys) so this is byte-equivalent in BEHAVIOR — verified the reconstructed argv matches the prior inline copies token-for-token. lib/ssh-helper.sh.

No traffic bypass (unchanged, absolute). DIRECT mode is legitimate forced-password auth only — no proxy, no tunnel, no masking, and host-key checking STAYS ON (StrictHostKeyChecking=accept-new). The dedup helper keeps that posture in a single source of truth.

bash -n clean on every changed file. /sites slash path drives clean under set -u. VERSION + larry.sh:81 bumped to 0.8.18; MANIFEST regenerated.

v0.8.17 — 2026-05-28

Per-alias DIRECT (no-multiplex) SSH mode (Clover). The real unblock for /sites qa on Bryan's Legacy→qa box.

Confirmed root cause (live-verified on the qa box). The qa server (bryjohnx@lhsixfqa → Cloverleaf host shdclvf01q, release cis2025.01) REJECTS SSH ControlMaster session multiplexing. The master opens and authenticates fine, but any session multiplexed over it dies with read from master failed: Connection reset by peer, then ssh falls back to a fresh connection that fails auth. A DIRECT per-command connection WORKS and returns the 24 site dirs (21 after the helloworld/siteProto/master filter).

Fix — per-alias DIRECT mode. A new TSV column 5 (direct, on|off) opts an alias out of the ControlMaster entirely. When on, ALL remote ops for that alias (exec, discover, pull-smat, pull) run a FRESH per-command sshpass -f <credfile> ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no -o StrictHostKeyChecking=accept-new -o ControlMaster=no -o ControlPath=none -o ConnectTimeout=<n> <user@host> '<remote-cmd>'. The remote command is shaped by the SAME _remote_cmd_for chokepoint as the master path, so a pinned HCIROOT (set-hciroot) is honoured identically — only the transport changes. NO traffic bypass: legitimate forced-password auth, no proxy/tunnel/masking, host-key checking stays on (accept-new).

  1. lib/ssh-helper.sh — TSV column 5 (direct); read_host_direct / _alias_is_direct readers; cmd_set_direct (set-direct <alias> on|off, whitespace-trimmed, legacy <5-column files backfilled, pinned HCIROOT in col 4 never clobbered); _run_direct (per-command sshpass dispatch + stderr filtering); _direct_scp (fresh-sshpass scp for pull); centralised _dispatch_remote (direct vs master, identical command shaping); cmd_exec / cmd_discover / cmd_pull_smat / cmd_pull route through it and SKIP the open-master requirement in direct mode; cmd_setup in direct mode VALIDATES the stored password with one trivial direct command instead of opening a (pointless) master; cmd_hosts shows the direct column (master=n/a for direct aliases). New set-direct|direct subcommand + help.
  2. larry.sh/ssh-set-direct <alias> on|off slash command (set-u-safe split, the v0.8.16 idiom); added to the completion array + descriptions + print_help; tool_list_sites reports the transport (direct sshpass vs ControlMaster) and gives a transport-correct recovery hint on discover failure (stale password in direct mode, not "closed master"); list_sites tool description + the system-prompt remote-alias guidance updated for the multiplex-rejection symptom and the /ssh-set-direct recovery.

Clean output (UX). The qa login profile emits a pre-auth banner ("Unauthorized access…/monitored") AND sudo: a terminal is required / sudo: a password is required on STDERR for non-interactive sessions. The parsed site list is on STDOUT (already clean). _filter_direct_stderr strips exactly those known-benign lines; remaining stderr is surfaced ONLY on an actual non-zero command failure. So /sites qa shows just the 21 sites + the (excluded: …) note — no banner/sudo noise.

No regression. Aliases WITHOUT the direct flag keep the existing ControlMaster-multiplex path unchanged (and still die "no open master" when none is open).

New flow for qa. /ssh-pass qa/ssh-set-hciroot qa /hci/cis2025.01/integrator/ssh-set-direct qa on/sites qa. No master step in direct mode.

Self-verify (this gate, pre-Vera). Driven non-interactively on this Mac: flag round-trip (add → set-direct on/off → hosts → raw TSV; col 4 HCIROOT preserved across col 5 writes); legacy 3-column file backfilled to 5 columns; command-shaping (_remote_cmd_for) identical pinned-export for direct and master, login-shell for unpinned; _alias_is_direct 0/1; stderr filter against a sample banner+sudo+real-error (only the genuine find: … Permission denied survives; pure banner+sudo → empty); _run_direct end-to-end via a stubbed sshpass — STDOUT clean + STDERR empty on success, real error surfaced + banner stripped + rc propagated on failure; tool_list_sites qa over a stubbed direct discover → sites: 21 (excluded: helloworld, siteProto, master), clean, correct mode line; direct-mode setup/discover skip the open-master die; non-direct exec/discover still die "no open master" (no regression); and the /ssh-set-direct REPL slash handler (+ neighbours /ssh-set-hciroot, /ssh-setup, /ssh) drive through a case dispatcher under set -u + compat32 with no unbound-variable abort and correct routing. (Live connect to the qa box not run here — sshpass is the production host's tool, absent on the Mac; the path up to the sshpass invocation is fully exercised via the stub.)

Vera v0.8.16 caveat closed (evidence attached). Repo-wide self-ref-class grep grep -rnE '^[[:space:]]*(local|declare|typeset|readonly|export)[[:space:]]+[A-Za-z_].*=.*$' larry.sh lib/ (712 superficial matches) + a targeted same-statement self-reference detector: ZERO same-statement self-references — every match is the safe sequential ;-separated idiom. bash -n larry.sh lib/*.sh → clean (36/36 files OK).


v0.8.16 — 2026-05-28

Hotfix (Clover): set -u unbound-variable abort in the v0.8.15 /ssh-set-hciroot REPL slash command, reported live on Bryan's MobaXterm / Cygwin bash (larry.sh: line 6903: _sh_alias: unbound variable).

Root cause. larry.sh:6903 was a single-line local declaration in which the 2nd variable's initializer referenced the 1st variable assigned in the SAME local statement: local _sh_alias="${rest%% *}" _sh_path="${rest#"$_sh_alias"}". Under set -u, $_sh_alias is treated as unbound at expansion time within its own local statement, aborting the command. This is NOT Cygwin/bash-3.2 specific — it reproduces identically on bash 5.x. The same latent pattern lived at larry.sh:6925 (the /ssh <alias> <cmd> handler: local alias="${rest%% *}" rcmd="${rest#"$alias"}"), not yet triggered only because that path hadn't been run.

Fix. Both split into set-u-safe form — declare the locals first, assign the first, THEN reference it on a later line. Correct on every bash version.

  1. /ssh-set-hciroot handler (was larry.sh:6903) — safe-split; preserves the empty-path "clear the pin" semantics.
  2. /ssh handler (was larry.sh:6925) — same safe-split.

Codebase-wide bug-class audit. Every .sh (larry.sh + all lib/*.sh + top-level scripts) was scanned for any single-statement local/declare/ typeset/readonly/export where one variable's initializer references another variable assigned in the SAME statement. After the two fixes above, zero remaining instances. All superficially-similar lines elsewhere are ;-separated sequential statements (e.g. _x="${rest%% *}"; rest="${rest#"$_x"}") where the referenced variable is already bound — these are the CORRECT idiom and are safe under set -u.

Gate-gap closed. The v0.8.15 gate exercised the ssh-helper.sh set-hciroot SUBCOMMAND but never the larry.sh /ssh-set-hciroot REPL slash dispatch — where the bug actually was. The v0.8.16 gate drives the ACTUAL slash-command handler bodies (extracted verbatim from the fixed source) through a case dispatcher under set -u with BASH_COMPAT=3.2, non-interactively, with no self-update / API / bootstrap. Verified /ssh-set-hciroot <alias> <path> (normal), /ssh-set-hciroot <alias> (empty-path CLEAR), trailing-space clear, no-args usage, /ssh <alias> <cmd>, and /ssh <alias> (usage) all execute WITHOUT the unbound-var abort. The same harness run against the OLD v0.8.15 line aborts with _sh_alias: unbound variable (rc=1), confirming the gate now catches this regression class. The no-traffic-bypass security line is unchanged.


v0.8.15 — 2026-05-28

Legacy/qa remote-enumeration fix (Clover). Three confirmed-live properties of the qa box bryjohnx@lhsixfqa (→ Cloverleaf host shdclvf01q, release cis2025.01) broke v0.8.13/v0.8.14 remote site enumeration: a sudo-gated login profile (a non-interactive SSH session hits sudo: a terminal is required, so bash -lc can't initialize the env and $HCIROOT comes back EMPTY); no hcisitelist on the box; and a password that rotates ~every 12h (stale stored credential, and a pre-auth banner that masked the real auth error). All three are now handled. The no-traffic-bypass security line is unchanged — zero proxy / masking / evasion primitives (same Gundersen-class control).

  1. Per-alias HCIROOT pin (load-bearing). New ssh-helper.sh set-hciroot <alias> <path> and the /ssh-set-hciroot <alias> <path> slash command persist an HCIROOT for an alias as a 4th column in .ssh-hosts.tsv (old 3-column files stay valid; an empty path clears the pin). When an alias is pinned, exec / discover / pull-smat run the remote command with HCIROOT=<path> exported EXPLICITLY under a NON-login sh -c — they do NOT wrap in bash -lc, so the sudo-gated login profile is never invoked. A single chokepoint, _remote_cmd_for, makes every remote path honour the pin identically (unpinned aliases keep the v0.8.13 bash -lc login-shell behaviour, unchanged). /sites <alias> --hciroot <path> is a convenience that persists the pin then enumerates. This makes qa work regardless of the broken profile. qa HCIROOT = /hci/cis2025.01/integrator.
  2. Portable site enumeration (no hcisitelist dependency). discover's remote script now makes the NetConfig walk the PRIMARY path, identical to lib/each-site.sh (find $HCIROOT -mindepth 1 -maxdepth 2 -name NetConfig -type f → dirname → basename → sort -u). hcisitelist is consulted ONLY if it is actually present AND the walk found nothing — never as the dependency. Works on a box with no hcisitelist. Emits clear NOTE lines (HCIROOT empty → suggests the pin; not a directory; no NetConfigs found).
  3. ControlMaster-open hardening (banner + rotating password). setup now forces -o PreferredAuthentications=password -o PubkeyAuthentication=no -o NumberOfPasswordPrompts=1 so sshpass feeds the password cleanly past the pre-auth banner and a stale credential fails fast instead of hanging; it surfaces the REAL auth error (greps for permission/auth/password/host-key keywords) instead of echoing only the banner; and on an auth failure it RE-PROMPTS for a fresh password (the 12h rotation), stores it 0600, and retries ONCE. Every failure path emits a clear next step — never a silent no-op.
  4. /sites excludes non-real entries (transparent). The enumeration now drops (a) static scaffolding/special sites — helloworld siteProto master, a documented, tunable SITES_EXCLUDE env var — and (b) any site dir whose name equals the host: the REMOTE discover walk computes hostname -s and full hostname and drops a match (qa's alias host is lhsixfqa but the engine box is shdclvf01q; a dir just named after the box is not a site), and also drops a match against the alias's configured SSH host. The filter is applied at the SINGLE enumeration source so REMOTE (pinned + login-shell) and LOCAL /sites behave identically. NOT silently hidden: the walk emits an EXCLUDED line and the tool layer renders the real count as the headline with a note, e.g. sites: 21 (excluded: helloworld, master, siteProto). Acceptance (qa, 24 raw dirs): /sites qa → 21, with the 3 exclusions noted; no dir matches shdclvf01q. The no-traffic-bypass security line is unchanged.

Acceptance (qa): with HCIROOT pinned to /hci/cis2025.01/integrator, /sites qa returns the site list via the NetConfig walk with no bash -lc and no hcisitelist; /ssh-setup qa with a fresh password opens the master past the banner. Self-verified with bash -n on every changed file. POSIX-sh remote scripts; compatible with bash 3.2 / Cygwin; no regression to unpinned aliases.


v0.8.14 — 2026-05-28

Locked-down-box survivability (Clover): make the full toolkit usable BY HAND with no API/LLM, and turn a blocked-API failure from a raw error dump into honest, actionable guidance. Both deliverables are graceful degradation only — ZERO traffic-bypass, masking, proxy-hiding, or block-circumvention primitives were added (hard security line: larry must not try to defeat a corporate security control on a PHI box).

  1. larry tools — manual-tools dispatcher. A discoverable, low-friction entry point so all 24 operator-facing Cloverleaf/HL7 tools in lib/ are listable and runnable by hand with no REPL, no API, and no LLM. larry tools list prints every tool grouped (NetConfig read/write, diff & regression, HL7, site-iteration/format) with a one-line description and a (missing) flag for any registry entry absent on disk; larry tools <name> [args] runs a tool (no args → its own --help); larry tools help explains the mode. It dispatches at the very top of larry.sh — BEFORE bootstrap, self-update, the jq gate, and any network call — so it works on a fresh install with the API unreachable. Only requirement is a lib/ dir next to larry.sh (or in $LARRY_HOME/lib). Internal plumbing (oauth/phi/fetch-safe/cygwin-safe/ ssh-helper/journal/lessons/etc.) is intentionally NOT listed — those are REPL/agent support, not operator-facing tools. The registry descriptions are the SSOT for tools list.
  2. _diagnose_api_block — honest blocked-API detection + guidance. On a locked-down box where api.anthropic.com is blocked, larry no longer dumps a bare network error. It inspects the last call's curl exit code, curl stderr, the response body, and the response headers for block signatures — TLS interception / MITM cert (unable to get local issuer certificate, self- signed, certificate verify failed), DNS-filter / refused / timed-out / TLS- handshake exit codes, Could not resolve host / Connection refused, HTML-where-JSON-was-expected 403/interstitial pages, and explicit Cisco Umbrella fingerprints in body or headers (Server: Cisco, X-Cisco, Cisco Umbrella, This site is blocked, …). When it recognizes a block it prints what happened, the target URL, the path into manual-tools mode (larry tools list / <name> --help / a worked nc-parse example), and the correct remedy — ask IT to allowlist the API host, or run from a permitting network. It states plainly that this is a corporate control on a PHI box and that larry will not, and must not, try to bypass it. Wired into both the non-streaming and streaming API paths; reads its diagnostics from the deterministic $LARRY_HOME/.last-curl-* / .last-stream-* files that call_api persists (the call runs in a subshell, so in-memory globals don't reach the diagnoser).

Fires before bootstrap/network; compatible with bash 3.2 / Cygwin; no regression to the v0.8.13 paths.


v0.8.13 — 2026-05-28

Proactive both-mode Cloverleaf-env detection + the $HCIROOT login-shell fix + a first-class site lister + the dominant residual-slowness fix (Clover). Closes the live-session friction where "how many sites are on qa?" stalled on repeated export $HCIROOT nags despite a working qa SSH alias.

  1. $HCIROOT login-shell fix (root cause). ssh-helper.sh exec ran the remote command in a NON-login, non-interactive shell, so the Cloverleaf login profile (which exports $HCIROOT, $HCISITE, and the hci* PATH) never sourced and $HCIROOT arrived empty — and Larry gave up and asked for a path. exec now wraps the command in bash -lc (login shell), so the remote env populates exactly as for an interactive operator login. Version-agnostic, zero-config. Escape hatch: NOLOGIN prefix or LARRY_SSH_NO_LOGIN=1. The pull-smat find/sample paths use the same wrapper so $HCISITEDIR/sqlite3 resolve there too.
  2. Both-mode detection (proactive). Startup now determines and surfaces a MODE= line: LOCAL ($HCIROOT set by the local profile, or a Cloverleaf install auto-discovered at a common path — work the local tree, no SSH), REMOTE (no local install but a Cloverleaf SSH alias configured — discover over a login shell), or UNKNOWN (ask which mode applies). Larry leads with what it found instead of asking Bryan to spoon-feed paths.
  3. First-class list_sites tool + /sites [alias]. "How many sites are on X / what sites exist" now Just Works in both modes: REMOTE (list_sites(alias=qa)) resolves the remote $HCIROOT in a login shell and enumerates sites (Cloverleaf hcisitelist fast-path, NetConfig-walk fallback); LOCAL (list_sites()) does the same against the detected local $HCIROOT. Reports the resolved HCIROOT, a count, and the names. New ssh-helper.sh discover <alias> backs the remote path.
  4. System-prompt de-nagging. agents/larry.md and the /nc-diff-env / /nc-regression-env templated prompts no longer instruct Larry to ask Bryan to export $HCIROOT for a reachable host. The cardinal rule is explicit: never request a path you can resolve; the only remote precondition surfaced is an open ControlMaster (/ssh-setup <alias>).
  5. Streaming slowness — dominant residual fix. v0.8.12 collapsed per-delta routing to one jq call, but each text_delta/input_json_delta STILL forked a second jq purely to un-escape the @json-encoded payload — ~N forks/turn (N≈output tokens), and on Cygwin/MobaXterm a fork is ~50-100ms. New _json_str_decode un-escapes in pure bash for the common escape-free chunk (zero forks), deferring to jq only when a real backslash escape is present. Halves per-turn fork count on top of v0.8.12. Round-trip verified for tab, escaped quote/backslash, embedded newline, \uXXXX Unicode, and empty.
  6. pull-smat path capture hardened (Vera Minor #1). The remote .smatdb path lookup previously took tail -1 of the merged stdout+stderr stream. Under the new login shell (item 1) a profile MOTD/banner can land on stdout, which a blind tail -1 could mistake for the path. The remote find now emits the resolved path behind a SMATDB_PATH: sentinel; the client selects by pattern, not position, and only falls back to the prior tail -1 when no sentinel is present (so no host that worked before can regress). Selection logic unit-verified for banner-before, banner-after, ERROR, empty, multi- sentinel, and no-sentinel-fallback.

Compatible with bash 3.2 / Cygwin; no regression to the v0.8.12 fixes.


v0.8.12 — 2026-05-27

Crash fix + slowness + cost pass on the new API-key rail (Clover). Full diagnosis: Deliverables/2026-05-27-cloverleaf-larry-v0812-crash-slowness-cost.md.

  1. Fix the post-response arithmetic crash (bash: <n>: syntax error: invalid arithmetic operator (error token is "")). The token/cost accounting fed jq-extracted usage counts straight into $(( )); on Cygwin/MobaXterm a CRLF-translated response body makes jq -r emit 1234\r, and // 0 only guards JSON null, not the CR. coerce_int now defends all three accounting sites (non-streaming cost block, streaming cost block, _record_ctx_used). This is the v0.7.5 OAuth crash's Anomaly-#4 recurrence on the response path.
  2. Prompt caching wired into the request (cost). The ~12.7K-token static prefix (system ~6K + 35 tools ~6.7K) was re-sent uncached every turn. v0.8.12 sends system as a block array and marks the last tool with cache_control: ephemeral (apikey rail; LARRY_PROMPT_CACHE=0 to disable), so the prefix is billed at the 0.1x cache-read rate after turn 1 — a ~90% cut on the static prefix and a large TTFB reduction.
  3. Streaming parser per-delta jq forks collapsed (slowness, dominant fix). The SSE content_block_delta hot path spawned 3 jq processes per event (.index, .delta.type, then the text/json payload). A normal response ships dozens-to-hundreds of these, and on Cygwin/MobaXterm a process fork is ~50-100ms (Windows fork emulation) — so the per-turn render lagged multiple seconds ("veeery slow"). The routing fields are now pulled in ONE jq call (payload @json-encoded so embedded newlines/tabs survive the line read), cutting the text path to 2 forks/event and ignored deltas to 1. Verified round-trip on text with embedded NL/TAB/quote and on input_json_delta.
  4. Per-turn PHI sidecar /health probe cached for the session (slowness). phi_client_available ran curl -m1 every turn; on MobaXterm the sidecar can never run, so it was a guaranteed curl fork + up-to-1s wait per turn. Now probed once per session (file-flag, survives the $(...) subshell); LARRY_PHI_REPROBE=1 forces a fresh probe.

v0.8.11 — 2026-05-27

Two headline features land together (Clover):

  1. API key is now the default auth rail — sanctioned, not edge-throttled; OAuth-impersonation disabled by default to protect the user's Max account; secure per-client key via /set-api-key, stored 0600, CR-safe, masked in diagnostics.
  2. Manifest-hashing auto-update — relaunch skips unchanged files by comparing each MANIFEST sha256 to a local hash, so only changed/missing files download. Relaunch drops from minutes to seconds.

1. API key as the default auth rail

Bryan's decision (2026-05-27): the durable fix for the multi-hour OAuth rate_limit_error is to STOP impersonating the official Claude Code client and move to the sanctioned API-key rail. Anthropic actively fingerprints and blocks Claude-Code OAuth impersonation (server-side enforcement since ~2026-01-09, ToS change 2026-02-19/20); every impersonated OAuth request flags the user's Max account. The API key (sk-ant-api03-) is the intended programmatic rail — a plain x-api-key request, billed pay-as-you-go, not edge-throttled. (Research: Deliverables/2026-05-27-claude-code-oauth-request-requirements-research.md.)

  • API key is the default. LARRY_AUTH_MODE defaults to apikey. The request shape is the clean sanctioned form: x-api-key + anthropic-version: 2023-06-01 + content-type: application/jsonno Authorization: Bearer, no anthropic-beta: claude-code-*, no claude-cli (external, cli) UA, no x-app: cli, no ?beta=true, no "You are Claude Code" system block. All of the prior impersonation scaffolding was removed.
  • OAuth is OFF by default (opt-in only). larry fires OAuth ONLY when LARRY_AUTH_MODE=oauth is set explicitly, and prints a one-time account-risk warning when it does. There is no silent OAuth fallback — larry never auto-pokes the impersonation tripwire. The opt-in OAuth request is minimal and honest (Bearer + anthropic-beta: oauth-2025-04-20 only).
  • Secure per-client API-key provisioning + storage (Bryan's core ask):
    • /set-api-key (and larry-auth.sh --api-key) prompts with read -s (silent, never echoed, never in argv / process table / shell history), optionally validates with one cheap /v1/messages ping (max_tokens:1), then stores the key at $LARRY_HOME/.api-key (mode 0600, owner-only), CR-stripped (MobaXterm/Cygwin CRLF-safe). --clear removes it; --status shows it masked. Each client holds its OWN key (mint one per machine at console.anthropic.com — independently revocable, leak-contained).
    • The key is fed to curl via --config - on stdin, so it never appears in curl's argv / the process table (ps-clean), in all request and validation paths.
  • Secret hygiene: the key is never logged, never committed (added to .gitignore and to larry's PHI/secret read_file path-block), and is masked as sk-ant-api03-XXXX…last4 in /auth, /auth-debug (alias /api-debug), and /set-api-key --status. The audit-trail secret-guard's sk-ant-[A-Za-z0-9_-]{20,} pattern catches sk-ant-api03- specifically.
  • 429-discrimination (reused from the prior thread's good work): a real rate-limit 429 ALWAYS carries anthropic-ratelimit-* headers → legitimate backoff; a 429 with NO such headers on the API-key rail → clear "edge/ transient bounce, not your quota" message (no futile backoff). On the opt-in OAuth rail, that same edge-reject signature triggers an automatic flip to the sanctioned API-key rail if a key is configured (LARRY_NO_EDGE_FALLBACK=1 to opt out).
  • Migration UX: first launch with the apikey default and no key → friendly prompt to run /set-api-key (not a bounce into the risky OAuth path).

2. Manifest-hashing auto-update speedup

sync_from_manifest used to re-download all 48 manifest entries every relaunch over authenticated HTTPS (Gitea via proxy + Cloudflare) and cmp locally to find the 03 that changed — ~3 min on the work-box for a 3-file update, because the MANIFEST was paths-only and the client could not tell what changed without fetching everything.

  • MANIFEST now ships each file's expected sha256 (path<TAB>sha256, generated at release by scripts/make-manifest.sh). The client fetches MANIFEST once, hashes its LOCAL copy of each path, and downloads ONLY entries whose hash differs or are missing. 48 round-trips → 1 (MANIFEST) + a local hash pass + N real downloads (N = actually-changed files, usually 03). Relaunch drops from minutes to seconds.
  • Fail-SAFE: a doubt NEVER skips an update. A download is skipped only when ALL hold — a working sha256 tool, a valid 64-hex manifest hash, the local file exists, and its hash matches exactly. No tool / empty / malformed / non-hex hash, missing local file, hash-tool error, or any mismatch (including a stale or wrong published hash) all fall through to download. Worst case is a needless re-download, never a missed update.
  • sha256 tool fallback chain: sha256sumshasum -a 256openssl dgst -sha256 → (none → full-download fallback, updater never breaks). Detected once, cached. Tool output normalized to bare lowercase 64-hex.
  • CR-safe: the MANIFEST is fetched from Gitea (CRLF risk); the whole line is CR-stripped before splitting and the hash-tool output is tr -d '\r''d, so a CRLF-tainted hash never forces a needless re-download.
  • Download path unchanged from v0.8.9 — still routes through fetch_validate (HTML-sign-in-trap detection), keeps the post-download cmp guard (idempotent), chmod +x for *.sh, and per-file --max-time. The v0.8.9 live progress indicator now renders over the new "verifying (local)" phase and the (fewer) "downloading" frames. New summary line: manifest sync: N updated, M unchanged (local hash), F failed, T total.
  • Release tooling (committed, not synced to clients): scripts/make-manifest.sh regenerates/checks the MANIFEST hashes; scripts/hooks/pre-commit blocks a commit whose MANIFEST hashes drifted. These are release-side only — the work-box consumes manifests, never generates them — so they are deliberately NOT listed in MANIFEST.

Deliverables: Deliverables/2026-05-27-cloverleaf-larry-api-key-default-rail.md, Deliverables/2026-05-27-cloverleaf-larry-manifest-hashing-speedup.md.

v0.8.9 — 2026-05-27

Manifest-sync live progress indicator (Clover). Symptom: Bryan's auto-update relaunch is very slow and looks frozen — a multi-minute silent gap between update found: X -> Y … relaunching and manifest sync: N updated, 0 failed, M total, with NO output in between. Observed: [21:32:29][21:35:37] = ~3 min of silence to update just 3 of 48 files; earlier [20:19:42][20:20:32] = ~50s for 22 files. Bryan cannot tell working-but-slow from hung.

Root cause (confirmed by reading sync_from_manifest). Phase-A sync does NOT do cheap HEAD/hash-checks — it fully downloads EVERY manifest entry over an authenticated HTTPS round-trip (Gitea via the corporate proxy + Cloudflare), then uses cmp -s locally to decide whether the bytes actually changed (discarding unchanged ones). With 48 entries that is 48 sequential full downloads, every relaunch, regardless of how few files changed. The loop emits nothing until the trailing summary log line → the entire window is silent → looks hung. The "3 files changed" in the summary is just how many survived the cmp; all 48 were still fetched.

  • Live in-place progress over the WHOLE sync. New _sync_progress / _sync_progress_done helpers render checking N/48 lib/foo.sh per entry, rewriting the line via \r\033[K (carriage-return + clear-line). When a fetch actually lands a changed file, the frame switches to downloading N/48 lib/foo.sh so real writes are distinguishable from the common unchanged case. The current filename is always shown, so a genuine stall is VISIBLE — you see exactly which file it is stuck on instead of a blank freeze.
  • MobaXterm-safe escapes only. Uses solely \r + ESC[K (the same primitive already at the readline prompt, audited safe in the v0.8.7 escape inventory). Deliberately NO DECSTBM scroll-region, cursor save/restore, or absolute-row addressing — the exact escapes MobaXterm mis-honors. Verified under a pty: only ESC[0m, ESC[2m, ESC[K ever emitted; zero forbidden escapes. Independent of LARRY_NO_STREAM / mouse mode (gates only on a TTY).
  • Non-TTY safe. Gates on [ -t 2 ]. When stderr is a pipe/log it emits a plain newline-terminated heartbeat every 10 files (no \r), so captured logs stay clean — verified zero \r bytes in piped output.
  • Heartbeat for hangs. The per-file fetch already carries curl --max-time (5s for VERSION, 15s otherwise); a stuck file now shows its name, times out, counts as a fail, and the loop advances — never an infinite silent stall.
  • Speedup deferred (by design). The clean fix — fetch the manifest once, compare per-file hashes locally, and SKIP unchanged files with NO per-file network round-trip — requires the MANIFEST to carry hashes. It currently carries paths only (no hashes/sizes), so the optimization would require a manifest-format + release-tooling change (higher risk, separate change). Filed as a follow-up: MANIFEST should emit path<TAB>sha256 so v0.9.x can turn 48 network round-trips into 1 fetch + local compare. This release ships the indicator (Bryan's explicit ask); correctness of the sync is unchanged.

Note: the v0.8.9 relaunch itself is still the slow path (the indicator helps from the NEXT update onward), but you now see live forward progress during it.

v0.8.8 — 2026-05-27

Force unconditional 429 header capture (Clover). Symptom: Bryan's MobaXterm work-box hits rate_limit_error repeatedly, but $LARRY_HOME/log/headers.log NEVER generates — so we cannot diagnose which rate-limit rail / auth path is failing. The single goal: guarantee the log generates on the NEXT 429 so Bryan can tail it and paste it (manual paste is the plan; auto-sync is dropped).

Call-flow trace (first, to disprove the deeper hypothesis). Bryan's box runs LARRY_NO_STREAM=1 (auto-set on MobaXterm since v0.8.5), so agent_turn takes resp=$(call_api …). call_api (larry.sh) ALWAYS dumps response headers via curl -D and ALWAYS calls _parse_response_headers on that dump after curl returns — regardless of HTTP status (it explicitly comments "headers carry rate-limit info even on 429s"). So the non-stream 429 path WAS reaching the parser. The parser was NOT the missing call — the bug was entirely the over-clever write-gate INSIDE the parser.

Root cause = the write-gate was too clever for its own purpose. The v0.8.5 gate wrote headers.log only if (OAuth-mode AND a unified-* header was present) OR (retry-after was non-empty). Bryan's 429s carry NEITHER: the backoff used the exponential 2/4/8s fallback (proving no server retry-after), and a per-minute burst 429 routinely omits the unified-* family. Neither branch fired → no write → no log → no diagnosis. The capture defeated its own purpose.

  • Unconditional write on ANY 429, detected from the status line. _parse_response_headers now greps the -D dump for ^HTTP/<ver> 429 (CRLF-tolerant) and, on a match, ALWAYS writes the full raw header block to $LARRY_HOME/log/headers.log — regardless of retry-after, unified-*, or auth mode. A bare 429 with no diagnostic headers STILL logs; that absence is itself the finding (signals a low/bare-tier limit).
  • 429s exempt from the OAuth 50-call cap. New STATUS_429_headers_logged counter with its own budget (STATUS_429_HEADER_LOG_LIMIT=200), independent of the 200-path OAuth sampling cap. A session that burned all 50 OAuth captures on successful calls STILL logs its next (51st-call) 429.
  • Full diagnostic dump. The 429 block writes: a banner with auth-mode (OAuth-Max vs API-key rail), the detected limit-rail, retry-after, org-id, and request-id; then the HTTP status line, ALL anthropic-* headers (not just -ratelimit-*), retry-after, request-id, and every x-* header — so the auth-rail + which-limit question is answerable from one paste.
  • Live stderr pointer. On every 429 capture, prints phi/rl> 429 headers logged to ~/.larry/log/headers.log (rail=<x>, retry-after=<y>) — paste for diagnosis so Bryan knows the log now exists.
  • Same-pattern sweep. Streaming path (call_api_stream_drain_pending_stream_headers_parse_response_headers) shares the same function, so Mac/Linux streaming users get identical 429 capture. The v0.8.0 tool_read_file PHI path-block (which blocks $LARRY_HOME/log/) is tool-dispatch-only — Bryan reading his own headers.log via interactive shell tail is unaffected (verified: no shell-level block; bash_exec runs bash -c directly without the path-block). The 200-path OAuth sampling cap is unchanged.

v0.8.7 — 2026-05-27

Status-line render fix for MobaXterm/Cygwin (Clover). Symptom: the dim between-turn status line (session context + rate-limit reset date) never appeared on Bryan's MobaXterm work-box — the v0.7.1 status-line feature that MEMORY.md flagged as a never-verified passive item.

Root cause = suppress-when-empty gate, NOT terminal positioning. render_status_line (larry.sh) gated the OAuth arm on ctx_used_tokens, oauth_5h_utilization, AND oauth_7d_utilization ALL being empty — returning silently (rendering nothing) when so. Two facts made all three stay empty turn after turn on Bryan's box: (1) STATUS_ctx_used_tokens is populated by _record_ctx_used, which runs only AFTER a successful agent_turn; on a rate_limit_error the turn returns early (larry.sh ~L3929), so ctx was never recorded; (2) pre-v0.8.5 the anthropic-ratelimit-unified-* utilization headers weren't captured on error responses. With every turn erroring, all three gate fields were empty every turn, so the line was suppressed for the whole session and never rendered. This was NOT a positioning bug: the status line is a single plain printf'd dim line printed between turns — there is no DECSTBM scroll-region reservation, no cursor save/restore, no absolute-row positioning anywhere in the codebase, so MobaXterm's terminal emulation had nothing to mis-honor. It was also NOT coupled to streaming or mouse mode.

  • Gate on turn count, not data presence. render_status_line now suppresses ONLY before the first turn has run (_LARRY_TURNS == 0, coerce_int-guarded); thereafter it ALWAYS renders. Both auth-mode helpers already self-render a placeholder for any unpopulated field, so the line always shows session context (model context window, turns, session cost) and the rate-limit reset time fills in once a successful call — or, since v0.8.5, a captured error response — populates the headers.
  • /status always renders on demand, even before the first turn — an explicit request bypasses the turn-0 gate. Lets Bryan verify the line renders on MobaXterm without first completing a (possibly rate-limited) turn.
  • CR-taint hardening (same-pattern sweep). The OAuth segment's reset-epoch comparisons ([ <epoch> -le <now> ]) read STATUS_oauth_{5h,7d}_reset_epoch, which come from _header_value (strips only the TRAILING CR). A CRLF response on MobaXterm or a non-numeric token would have crashed the arithmetic test and aborted the entire line. Both comparisons now coerce_int the epoch first; the STATUS_oauth_status color-override case is now strip_cr-guarded so a rate_limited\r value still matches its literal-glob arm.
  • Same-pattern sweep results: audited every escape sequence in larry.sh + lib/ — only color SGR, clear-screen (\033[2J\033[H), erase-line (\r\033[K), and the opt-in mouse/bracketed-paste modes (off by default since v0.7.5) are used; ZERO scroll-region / cursor-save-restore / absolute-cursor sequences, so no other UI element is at risk of MobaXterm mis-rendering. Confirmed no user-visible element is gated behind streaming (used_stream only guards re-printing already-streamed text) or mouse mode.
  • Verification: bash -n clean; 7/7 unit tests pass against the shipped render functions — turn-0 suppressed; turn≥1 with empty data renders with placeholders; reset date shown when populated; renders with LARRY_NO_STREAM=1
    • mouse off (Bryan's exact config); survives a CR-tainted reset epoch without crashing; LARRY_NO_STATUS=1 still fully disables.

v0.8.6 — 2026-05-27

Work-box → Mac headers.log sync (tsk-2026-05-27-023, Clover headers-sync). Closes the last gap in the rate-limit-diagnosis pipeline: the anthropic-ratelimit-* headers captured on Bryan's MobaXterm work-box (where the testing happens) never reached the Mac's memory daemon, so they could not be analyzed. v0.8.6 pushes the work-box headers.log to a daemon-watched path on the Mac automatically; the Mac daemon ingests it to memory Tier 4 (Hindsight) + Tier 7 (mem0).

  • New lib/headers-sync.sh — incremental, offset-tracked, idempotent push of $LARRY_HOME/log/headers.log to a per-host file on the Mac (~/.cloverleaf/headers-<workbox-hostname>.jsonl, a daemon-watched dir). Transport rides the EXISTING authenticated SSH ControlMaster (/ssh-setup <alias>) — no new key, no second auth, the password is never in argv/env. Only the new bytes since the last sync are sent (dd skip=offset → remote cat >>); a no-op when nothing is new; a re-seed (truncate + resend) when the local file rotates/shrinks. Fully graceful: missing target, closed master, or transport failure logs a warn to $LARRY_HOME/log/sync.log and returns non-fatally — it can NEVER crash or wedge the larry session.
  • /headers-sync on|off|status|target <alias>|now slash command. target binds the Mac SSH alias; on/off toggle auto-sync (persisted to $LARRY_HOME/.env as LARRY_HEADERS_SYNC / LARRY_HEADERS_SYNC_TARGET); status shows enabled?, target, dest, last-sync time, bytes pushed, and master state; now runs one incremental sync on demand. Registered in the TAB-completion arrays and /help.
  • Auto-sync cadence: on larry exit. The REPL EXIT/INT/TERM handler flushes headers.log if auto-sync is enabled (cheap + incremental). On-demand /headers-sync now is always available. (After-EVERY-turn cadence was intentionally deferred to keep this change out of the turn/streaming loop that v0.8.5 just reworked.)
  • Mac-daemon receive side (scripts/headers_log_ingest.py, not part of the larry bundle): now resolves headers-*.jsonl glob sources under the watched dirs IN ADDITION to the fixed canonical headers.log, and processes ALL sources with PER-SOURCE offsets — so the Mac's own stream and one or more work-box streams are surfaced independently. Each fact carries a source= label (the work-box hostname) so the memory layer can tell them apart.
  • Security (Vera PHI audit V7): headers.log holds only anthropic-* response headers (rate-limit metadata + org id) and HTTP status lines — NO message body, NO PHI — so syncing is safe. The existing key/password-auth ControlMaster transport is reused unchanged (not weakened).

v0.8.5 — 2026-05-27

Diagnose-don't-assume rate-limit cluster fix (Clover #8). Symptom: a hello turn threw rate_limit_error on a work-box with 90% of the Claude Max 5h quota free — so NOT 5h-window exhaustion. Root cause = a short-window BURST rail tripped by a stream→non-stream double-send per turn, with no backoff.

  • Rate-limit backoff + actionable message (ROOT). A 429 no longer fails the turn or fires an immediate re-send. agent_turn now retries with backoff that HONORS the retry-after header (else exponential 2/4/8s capped at 30s; LARRY_RL_MAX_RETRIES/LARRY_RL_BACKOFF_MAX tunable). The error message is now ACTIONABLE: _parse_response_headers captures retry-after + which rail tripped (anthropic-ratelimit-{requests,input-tokens,output-tokens}-remaining:0 or unified-{5h,7d} for OAuth) and _humanize_rate_limit renders e.g. rate limit: requests-per-minute exhausted (short-window burst, NOT your 5h quota) — resets in 38s; retrying with backoff. headers.log now captures the full header block on ANY 429 (was: OAuth-mode + unified-* header only), tagged *** 429 retry-after=Ns rail=… ***, so the next rate-limit is always diagnosable.
  • Streaming parse failure no longer double-sends (burst trigger). A streaming 429/overload returns a plain JSON error body (not SSE); parse_stream_to_response previously dropped those non-data: lines, produced zero blocks, returned 1, and agent_turn blindly re-SENT the whole prompt non-streaming — a SECOND full API call within the same second (the per-minute burst). The parser now buffers the non-SSE body and, if it parses as a JSON error, returns a distinct code so the caller surfaces it WITH backoff instead of re-sending (single-send invariant: one logical attempt per turn). Also auto-defaults LARRY_NO_STREAM=1 on MobaXterm/Cygwin/MSYS (_is_cygwin_like) where SSE parsing is fragile; an explicit LARRY_NO_STREAM=0 still forces it on.
  • ErrorPI mangled error string fixed (CR-taint). — ErrorPI error: rate_limit_error was a carriage-return overprint: on MobaXterm the response field jq -r '.error.type' carried a trailing \r, which (a) broke the case "$err_type" in rate_limit_error) match → fell to the %s — %s default (the stray ), and (b) CR-returned the cursor so the terminal overprinted "API error" → "ErrorPI". Fix: strip_cr on err_type/err_msg in _humanize_api_error, and err()/warn()/log() now strip embedded CRs defensively. (The v0.7.5 CR sweep missed the error-DISPLAY construction path.)
  • phi tier-5 notice fires once per session (was per-turn nag). The tier-5 (presidio NER) disabled — sidecar not running notice printed every turn because auto_detect_phi runs inside $(...) command substitution and the old export _LARRY_PHI_TIER5_WARNED=1 flag died in the subshell. Now keyed to a $LARRY_HOME/.phi-notice-shown file holding SESSION_ID — fires once per session, survives the subshell, resets for a genuinely new session. Same-pattern sweep caught the identical subshell-flag bug in _auto_phi_b64_roundtrip's python3-missing notice (_LARRY_B64_PY3_WARNED) — fixed the same way.

v0.8.4 — 2026-05-27

  • Installer/updater now detects HTML-sign-in-page responses and fails loud instead of silently corrupting. Root cause (Clover #5's diagnosis, Deliverables/2026-05-27-cloverleaf-larry-stuck-update-and-tab-bug.md): a private/sign-in-gated Gitea answers an unauthenticated raw-file read with the HTML Sign-In page at HTTP 200 (303 → /user/login, followed by curl -L to a 200 HTML page). curl -fsSL treats that as success, so the old installer/auto-updater parsed the HTML as VERSION/MANIFEST/larry.sh content — silently aborting, or overwriting real on-disk files with HTML soup. This is exactly what stranded a work-box at v0.7.3 until the Gitea REQUIRE_SIGNIN_VIEW=false flip.
  • New lib/fetch-safe.sh — a content-validating fetch wrapper (fetch_validate URL DEST KIND [MAX_TIME]). After every curl, BEFORE trusting the bytes, it (a) detects the HTML-login trap (<!DOCTYPE html / <html / Sign In - Gitea / <title>Sign In markers, or a text/html Content-Type when a raw file was expected) and (b) validates the content shape per file type: VERSION must match ^[0-9]+\.[0-9]+\.[0-9]+, MANIFEST must be a path-list with no HTML, larry.sh must start with #!/usr/bin/env bash, other .sh must be non-HTML. On any failure it prints an actionable error and returns non-zero without overwriting the target. The bootstrap install-larry.sh (curl|bash, runs before any lib exists) and larry.sh's self_update() (runs before lib is sourced) each carry a byte-identical inline copy; the canonical file is in MANIFEST and auto-syncs.
  • Every remote-content fetch hardened. install-larry.sh fetch(); larry.sh agent fetch, sync_from_manifest MANIFEST + per-file fetches, and _fetch_with_fallback (Phase-B VERSION + larry.sh) all route through the validator. No trusted-content fetch still uses raw curl -fsSL.
  • Optional LARRY_GITEA_TOKEN (alias GITEA_TOKEN) for authenticated fetch. When set, fetches add Authorization: token <PAT> so the installer/updater works against a PRIVATE repo without the public-flip. The token is never hardcoded and never logged. Documented in --help + MANUAL.md.

v0.8.3 — 2026-05-27

  • Tab-completion trailing space no longer breaks command dispatch. The slash-command completer intentionally appends a trailing space after a unique match (so arg-taking commands feel snappy), but the main_loop dispatcher matched exact case globs, so /quit (completed) missed the /quit) arm and fell through to "unknown command". Latent since v0.6.6 when tab completion shipped. Fixed by rtrimming the dispatch key once at the case "$input" boundary (larry.sh), which tolerates the completer's space, a user-typed trailing space, and any CR remnant while preserving interior /load FILE argument spacing. Added a shared rtrim() helper to lib/cygwin-safe.sh (and the inline fallback) next to strip_cr.

v0.8.2 — 2026-05-27

Microsoft Presidio sidecar for free-text NER. Closes V1 from Vera's audit — the dominant real-world failure mode (patient names, addresses, un-keyworded dates in prose chat). Opt-in install; larry runs in v0.8.1 mode on hosts where Presidio isn't installed (MobaXterm/Cygwin per Bryan's accepted tradeoff).

  • lib/phi-presidio-sidecar.py — FastAPI service on 127.0.0.1:$LARRY_PHI_PORT (default 41189). Wraps Presidio's AnalyzerEngine + AnonymizerEngine over spaCy en_core_web_sm (12MB model, ~9-second cold start). Two endpoints: POST /redact takes {"text": "..."} and returns {"redacted": "...", "entities": [...], "latency_ms": N}; GET /health for the launcher's readiness probe. Three HL7-specific custom recognizers added (HL7_MRN for 6-12 digit numerics with patient/MRN/account context; HL7_CARET_NAME for SMITH^JOHN outside Tier-3 line context; HL7_PHONE_BARE for plain 10-digit phones). Confidence threshold for tier-5 tokenize is 0.3 (below that is too noisy).

  • lib/phi-sidecar.sh — lifecycle launcher. Subcommands: start / stop / status / health / ensure. ensure is idempotent (no-op if already up); called from larry.sh main_loop startup, backgrounded so it never blocks larry's first prompt. Waits up to 30 seconds for the sidecar to become healthy after start; surfaces the log tail if startup fails. PID file at $LARRY_HOME/.phi-sidecar.pid; log at $LARRY_HOME/log/phi-sidecar.log. Honors LARRY_PHI_VENV env to use a dedicated virtualenv (which the installer sets up at $LARRY_HOME/phi-venv when the user opts in).

  • lib/phi-client.sh — bash wrapper around /redact. Sourceable functions: phi_client_available, phi_redact_text, phi_redact_entities. Also runs standalone as a CLI (./phi-client.sh check / redact / entities). CR-safe (sources cygwin-safe.sh defensively); 5-second curl timeout bounds any tier-5 stall.

  • Tier-5 integration in larry.sh:auto_detect_phi. New stage AFTER the existing tier-1/2/3/4 substitution and BEFORE the status summary. Sources phi-client.sh lazily, probes phi_client_available, and on success runs phi_redact_entities to get Presidio's per-entity output. Each entity is tokenized through the SAME hl7-sanitize.sh tokenize-value pipeline as tiers 1-4 (category prefixed presidio_<TYPE>) so token IDs remain stable across surfaces and the /tokens listing stays unified. Tier-5 honors LARRY_AUTO_PHI=confirm (prompts Y/n once per value) and strict (aborts the turn if tokenize-value fails on a Presidio hit). Critically, v0.8.2 removes the v0.7.3 early-return that exited auto_detect_phi when tiers 1-4 found nothing — pure-prose input now ALWAYS reaches tier-5.

  • Graceful degradation. If the sidecar is unreachable (not installed, not started, crashed), tier-5 silently no-ops with a one-time stderr warning per session. Larry's REPL remains fully functional in v0.8.1 mode. LARRY_AUTO_PHI=strict does NOT abort on absent sidecar (the strict mode escape is for HL7-shaped content where rule-pack would have caught the leak; tier-5 is additive coverage).

  • /phi-sidecar slash commandstart / stop / status / health / ensure exposed to the user. Slash-completion table and _LARRY_SLASH_CMDS_DESC updated.

  • install-larry.sh install path. On hosts with Python 3.9+ + pip, the installer prompts before creating $LARRY_HOME/phi-venv and installing presidio_analyzer + presidio_anonymizer + fastapi + uvicorn + spaCy en_core_web_sm (~400MB on disk, ~250MB RAM resident). On MobaXterm/Cygwin without python3, the installer skips the prompt entirely and prints Bryan's accepted tradeoff (MobaXterm stays on v0.8.1 + nudges). Re-runnable; idempotent.

  • MANIFEST. Added three new lib files. They auto-sync to every running client on next launch; clients without Python 3 won't run the sidecar but the files are harmless to ship.

Prototype validation (Bryan's Mac, Apple Silicon, Python 3.14). Cold start (model load): ~9 seconds with en_core_web_sm (vs ~82s with the larger en_core_web_lg Presidio auto-downloads by default — we explicitly pin _sm for the latency-sensitive REPL use case). Warm analyzer latency: P50 20.6ms, P95 22.7ms over 20 sequential requests on 100-word input. End-to-end HTTP round-trip (curl + json roundtrip): P50 ~57ms warm; first request post-startup pays a ~150ms tokenizer warmup tax then steady. Well under the 200ms-per-turn REPL budget.

Detection quality on the canonical "John Doe MRN 623000286" sample: 8 core entities caught (PERSON x2, DATE_TIME x2, PHONE_NUMBER, US_*), plus the three custom HL7 recognizers add MRN + caret-name + bare-phone coverage. Misclassifications (MRN as US_PASSPORT, "ED" as PERSON) are within tolerance for the tokenize-everything-suspicious policy — the auto-PHI lookup table sees them as presidio_* categories and the operator can audit via /tokens.

MobaXterm compatibility verdict. Per Bryan's accepted tradeoff: v0.8.2 ships Mac/Linux-only. MobaXterm/Cygwin stays on v0.8.1 (rule-pack + path-block + content-shape gating + strict mode + base64 round-trip + tool-result review gate). Test path: install-larry.sh detects platform and skips the Presidio install on windows-cygwin with a clear "v0.8.1 mode" note. No code in larry.sh is platform-gated — tier-5 silently no-ops when the sidecar is absent, which IS the MobaXterm path.

Proactive same-pattern sweep. Searched for other call sites where free-text NER would help: tool-result surface already gets HL7-shape sanitize (v0.8.1) and base64 round-trip (v0.8.1-c). Tier-5 is user_input-only by design — tool-result free-text NER deferred to a future patch (would require deciding on per-tool latency budgets; Bryan to call when needed).

v0.8.1 — 2026-05-27

Tool-result PHI gating expansion. Closes V2 / V12 and the V2 base64 sub-gap from Vera's audit. No behavior change for users not on HL7-shaped data; opt-in friction for the 8KB+ tool-result review gate.

  • Tool-name allow-list dropped; content-shape gating only. The v0.7.3 tool-result auto-PHI gate ran only on read_file (.hl7|.txt), nc_msgs, hl7_field, hl7_diff. v0.8.1 runs _auto_phi_looks_like_hl7 on EVERY tool result. On hit → route through lib/hl7-sanitize.sh. On miss → pass through unchanged. Closes V2: bash_exec/ssh_exec/ grep_files/read_file of .log/.csv/.dat/no-suffix files are now all covered when their output is HL7-shaped. False-positive cost is cheap (extra regex pass with zero behavioral impact on non-HL7).

  • Base64-wrapped HL7 round-trip. New _auto_phi_b64_roundtrip helper. Detects candidate base64 runs (length >= 200 chars, [A-Za-z0-9+/=] only, length divisible by 4 — NOT entropy-based, per Pax §V2-sub: HL7's repetitive prefixes survive base64 with LOW entropy). Speculatively decodes each run; if decoded bytes look like HL7, routes through hl7-sanitize.sh and re-encodes (base64 -w0) back into the result. Catches ssh_pull_smat sampled mode TSV (server-side encoding kept for binary-safe TSV transport; client-side unwrap handles the safety concern). Requires python3 (installed everywhere larry-anywhere runs); skipped with a one-time stderr warning if unavailable.

  • Operator review gate for bash_exec/ssh_exec/ssh_pull/ ssh_pull_smat results. When the tool produced HL7-shaped output OR the result exceeds LARRY_TOOL_RESULT_REVIEW_THRESHOLD bytes (default 8192), Larry prompts [Y/n/i] before passing the result back to the model. i opens the result in $PAGER then re-prompts. Default Y — zero friction by default. N substitutes a refusal JSON so the model knows a result was withheld. Skipped when LARRY_AUTO_PHI=off (consistent with the opt-out) OR running non-interactively (no TTY — never blocks headless scripts). Override with LARRY_TOOL_RESULT_REVIEW=always to gate every result. Per Pax §V2/V12: closes the "operator wanted to see this themselves, didn't want the model to see it" gap that's the actual common case.

Proactive same-pattern sweep. Searched the codebase for other call sites where tool output bypasses content-shape gating: found only the one in agent_turn. The v0.8.0-c strict-mode tool-result branch was hardened in lockstep so it now triggers on the broader (content-only) eligibility instead of the old name-allow-list.

Manifest unchanged.

v0.8.0 — 2026-05-27

PHI-safety quick-wins pack — three independent zero-risk patches closing four gap-classes Vera identified in the v0.7.5 static audit (Deliverables/2026-05-27-cloverleaf-larry-phi-leak-audit.md) with Pax's recommended mitigations (Deliverables/2026-05-27-cloverleaf-larry-phi-mitigation-research.md). No new dependencies, no behavior change for users not interacting with PHI.

  • read_file/grep_files/glob_files/list_dir path-block list (closes V4 + V6 + V11). Refuse — with a structured JSON error the model must surface, NOT a silent "file not found" — any tool-side attempt to read or enumerate under $LARRY_HOME/log/ (auto-phi.log, headers.log, oauth.log, session logs), $LARRY_HOME/sanitize/ (lookup.tsv — the desanitization key), $LARRY_HOME/sessions/, $LARRY_HOME/.oauth.json, or $LARRY_HOME/.env. Block-list resolves $LARRY_HOME at call time (not script-parse time) and runs against both the literal path and its realpath -m canonical form, so symlink detours don't bypass. The proactive same-pattern sweep (Bryan standing rule, 2026-05-27) extended the block from tool_read_file alone to also cover tool_grep_files, tool_glob_files, and tool_list_dir — those tools would otherwise leak filenames or grep-matched content out of the same protected dirs without any approval gate.

  • /load <file> HL7 pre-routing (closes V3). When the loaded file's content matches _auto_phi_looks_like_hl7, route it through lib/hl7-sanitize.sh (the segment-aware tokenizer with the full PHI field rule set: PID, PV1, NK1, GT1, IN1, OBR, OBX, DG1, ORC) BEFORE the existing user_input auto-PHI pass. Closes the gap where smat dumps loaded via /load only got the lighter per-word classifier, which misses bare HL7 PID fields. Status line reports how many fields were tokenized: phi> /load: hl7-sanitize.sh tokenized N HL7 field(s) from <file> before passing to auto-PHI. Strict mode (see below) aborts the /load if sanitize fails; default/confirm modes warn-and-continue.

  • LARRY_AUTO_PHI=strict fail-closed mode (closes V5). New fourth value alongside off / on / confirm. In strict mode, the auto-PHI pipeline aborts the surrounding turn (no payload built, no API call) when: (a) lib/hl7-sanitize.sh is missing/non-executable on HL7-shaped user_input, (b) the sanitizer returns empty on HL7-shaped content, (c) any single value's tokenize-value call fails inside the detection loop. On the tool-result surface (which can't kill the in-flight tool_use), strict mode substitutes the result with a structured JSON refusal sentinel so the raw HL7 NEVER reaches the model. Existing off / on / confirm semantics unchanged (still fail-open per Bryan's "don't break tools" priority). Strict is the opt-in tradeoff for HIPAA work where a silent leak is worse than a broken turn. /phi-auto strict toggle and /help text updated. Wired into both auto-PHI invocation sites: user input scan and the tool-result HL7 sanitizer gate.

Proactive same-pattern sweep (Bryan standing rule, 2026-05-27). Searched the codebase for other tools matching the pattern "reads arbitrary path, returns content to model, no approval gate": found and patched tool_grep_files, tool_glob_files, tool_list_dir alongside tool_read_file. bash_exec/ssh_exec already require Y/N operator approval — the operator is the gatekeeper there (a second gate deferred to v0.8.1). No other matches.

Manifest unchanged (no new files in lib/).

v0.7.5 — 2026-05-27

Three focused changes, one common cause: the Cygwin/MobaXterm CR-taint pattern that crashed OAuth on Bryan's v0.7.3 work-box with the cryptic error bash: ...: arithmetic syntax error: invalid arithmetic operator (error token is "").

  • OAuth/arithmetic CR fix. lib/oauth.sh now routes every operand entering a bash arithmetic context (fetched_at, expires_in, now) through a dedicated coerce_int helper that strips non-digits at the source. The failure mode: $(date +%s) against a Cygwin pty where Windows-native date.exe shadows Cygwin date can return a CR-tainted epoch like "1779999999\r", which crashes the very next $((expires_at - now)). Diagnosis in Deliverables/2026-05-27-cloverleaf-larry-oauth-arithmetic-fix.md.

  • Mouse mode is opt-in. REPL mouse handling now defaults to OFF and is enabled via LARRY_MOUSE=1 env var or /mouse on slash command. Several terminals (notably MobaXterm and stripped tmux) were swallowing the mouse ANSI sequences and printing literal ^[[?1000h garbage when v0.7.0 turned it on unconditionally. Diagnosis in Deliverables/2026-05-27-cloverleaf-larry-mouse-regression-fix.md.

  • CR-safety sweep across lib/*.sh and top-level scripts. Three new primitives in lib/cygwin-safe.sh (sourced by every tool family member):

    • coerce_int VAL [DEFAULT] — for arithmetic and integer-test operands
    • strip_cr VAL — for case patterns, regex tests, paths, HTTP headers
    • read_clean VAR [PROMPT]read -r wrapper that strips CR pre-assign Hardened call sites:
    • larry.sh — status-line date +%s / tput cols, three y/N approval prompts (write_file, bash_exec, first-run auth), API-key paste, first-run auth menu
    • lib/oauth.shcmd_login and cmd_refresh date +%s captures
    • lib/nc-engine.sh — five y/N action prompts (stop/start/bounce, resend, route-test, testxlate, tpstest) + find ... | wc -l arithmetic
    • lib/nc-msgs.shparse_time_ms date captures (4 sites), meta-TSV tm field, MSG_COUNT wc -l
    • lib/nc-regression.shtr | wc -c count, hl7-diff ?-fallback arithmetic
    • lib/nc-smat-diff.shA_COUNT/B_COUNT/DIFFS_TOTAL
    • lib/nc-insert-protocol.sh — every awk-emitted line-number that feeds head -n $((N-1)) / tail -n +$((N+1)) arithmetic
    • lib/journal.sh_next_seq wc -l arithmetic
    • lib/lessons.sh_next_id, cmd_list, cmd_count arithmetic + two y/N prompts (clear all, clear since)
    • lib/hl7-sanitize.shcmd_count arithmetic + clear-table y/N
    • lib/ssh-helper.sh — local + remote wc -c integer compares (4 sites)
    • lib/nc-find.shwc -l count for %d printf
    • lib/nc-table.sh$(date +%s) in backup-filename construction
    • lib/nc-document.sh — two wc -l | %d printf sites
    • larry-rollback.sh — Proceed? y/N prompt

    Reproduction (now exercised by cygwin-safe.sh's in-line tests):

    now=$(printf '%s\r' 1779999999); echo $((now - 1))   # pre-fix: crashes
    now=$(coerce_int "$(printf '%s\r' 1779999999)" 0); echo $((now - 1))   # fix: 1779999998
    

    Added lib/cygwin-safe.sh to MANIFEST so it auto-syncs to every running client on next launch.

v0.7.4 — 2026-05-27

  • Drop GitHub fallback from auto-update. Single-source Gitea (https://git.bjnoela.com/bryan/cloverleaf-larry.git).

v0.7.3 — 2026-05-26

  • Automatic PHI detection (tiered detection + blacklist contexts).

v0.7.2 — 2026-05-26

  • Gitea becomes primary auto-update origin; GitHub demoted to fallback.

v0.7.1 — 2026-05-26

  • Status line moves to between-turn position (post-input, pre-response).
  • Status line below prompt; automatic PHI detection; session-artifact upload.

v0.7.0 — 2026-05-26

  • HL7-aware tab completion + REPL mouse mode (later made opt-in in v0.7.5).