Diagnose-don't-assume rate-limit cluster (Clover #8). The rate_limit_error on a work-box with 90% of the 5h Max quota free was a short-window BURST rail, not 5h exhaustion — tripped by a stream->non-stream double-send per turn with no backoff. - Rate-limit backoff honoring retry-after (else exp 2/4/8 cap 30) + actionable header-parsed message naming the tripped rail; headers.log now captures every 429 (was OAuth+unified-* only), tagged with retry-after + rail. - parse_stream_to_response detects a non-SSE JSON error body (429/overload) and returns a distinct code so agent_turn surfaces it WITH backoff instead of re-sending the whole prompt (single-send invariant). Auto LARRY_NO_STREAM=1 on MobaXterm/Cygwin/MSYS; explicit LARRY_NO_STREAM=0 still forces streaming on. - ErrorPI fix: strip_cr on err_type/err_msg in _humanize_api_error (a trailing CR broke the case match AND carriage-overprinted "API error"); err/warn/log now strip embedded CRs defensively. (v0.7.5 sweep missed the error-display path.) - phi tier-5 notice once-per-session via $LARRY_HOME/.phi-notice-shown SESSION_ID flag (old export flag died in the $(...) subshell -> per-turn nag). Same-pattern sweep fixed the identical subshell-flag bug in _auto_phi_b64_roundtrip. Deliverable: Deliverables/2026-05-27-cloverleaf-larry-v085-ratelimit-streaming-fixes.md Co-Authored-By: Clover (Claude Opus 4.7) <noreply@anthropic.com>
22 KiB
Changelog
All notable changes to cloverleaf-larry / larry-anywhere are recorded here.
Versioning is loose-semver; bumps trigger the in-process self-update on every
running client via LARRY_BASE_URL + MANIFEST.
v0.8.5 — 2026-05-27
Diagnose-don't-assume rate-limit cluster fix (Clover #8). Symptom: a hello
turn threw rate_limit_error on a work-box with 90% of the Claude Max 5h quota
free — so NOT 5h-window exhaustion. Root cause = a short-window BURST rail
tripped by a stream→non-stream double-send per turn, with no backoff.
- Rate-limit backoff + actionable message (ROOT). A 429 no longer fails the
turn or fires an immediate re-send.
agent_turnnow retries with backoff that HONORS theretry-afterheader (else exponential 2/4/8s capped at 30s;LARRY_RL_MAX_RETRIES/LARRY_RL_BACKOFF_MAXtunable). The error message is now ACTIONABLE:_parse_response_headerscapturesretry-after+ which rail tripped (anthropic-ratelimit-{requests,input-tokens,output-tokens}-remaining:0orunified-{5h,7d}for OAuth) and_humanize_rate_limitrenders e.g.rate limit: requests-per-minute exhausted (short-window burst, NOT your 5h quota) — resets in 38s; retrying with backoff.headers.lognow captures the full header block on ANY 429 (was: OAuth-mode + unified-* header only), tagged*** 429 retry-after=Ns rail=… ***, so the next rate-limit is always diagnosable. - Streaming parse failure no longer double-sends (burst trigger). A
streaming 429/overload returns a plain JSON error body (not SSE);
parse_stream_to_responsepreviously dropped those non-data:lines, produced zero blocks, returned 1, andagent_turnblindly re-SENT the whole prompt non-streaming — a SECOND full API call within the same second (the per-minute burst). The parser now buffers the non-SSE body and, if it parses as a JSON error, returns a distinct code so the caller surfaces it WITH backoff instead of re-sending (single-send invariant: one logical attempt per turn). Also auto-defaultsLARRY_NO_STREAM=1on MobaXterm/Cygwin/MSYS (_is_cygwin_like) where SSE parsing is fragile; an explicitLARRY_NO_STREAM=0still forces it on. ErrorPImangled error string fixed (CR-taint).— ErrorPI error: rate_limit_errorwas a carriage-return overprint: on MobaXterm the response fieldjq -r '.error.type'carried a trailing\r, which (a) broke thecase "$err_type" in rate_limit_error)match → fell to the%s — %sdefault (the stray—), and (b) CR-returned the cursor so the terminal overprinted "API error" → "ErrorPI". Fix:strip_cronerr_type/err_msgin_humanize_api_error, anderr()/warn()/log()now strip embedded CRs defensively. (The v0.7.5 CR sweep missed the error-DISPLAY construction path.)- phi tier-5 notice fires once per session (was per-turn nag). The
tier-5 (presidio NER) disabled — sidecar not runningnotice printed every turn becauseauto_detect_phiruns inside$(...)command substitution and the oldexport _LARRY_PHI_TIER5_WARNED=1flag died in the subshell. Now keyed to a$LARRY_HOME/.phi-notice-shownfile holdingSESSION_ID— fires once per session, survives the subshell, resets for a genuinely new session. Same-pattern sweep caught the identical subshell-flag bug in_auto_phi_b64_roundtrip's python3-missing notice (_LARRY_B64_PY3_WARNED) — fixed the same way.
v0.8.4 — 2026-05-27
- Installer/updater now detects HTML-sign-in-page responses and fails loud
instead of silently corrupting. Root cause (Clover #5's diagnosis,
Deliverables/2026-05-27-cloverleaf-larry-stuck-update-and-tab-bug.md): a private/sign-in-gated Gitea answers an unauthenticated raw-file read with the HTML Sign-In page at HTTP 200 (303 →/user/login, followed bycurl -Lto a 200 HTML page).curl -fsSLtreats that as success, so the old installer/auto-updater parsed the HTML as VERSION/MANIFEST/larry.shcontent — silently aborting, or overwriting real on-disk files with HTML soup. This is exactly what stranded a work-box at v0.7.3 until the GiteaREQUIRE_SIGNIN_VIEW=falseflip. - New
lib/fetch-safe.sh— a content-validating fetch wrapper (fetch_validate URL DEST KIND [MAX_TIME]). After everycurl, BEFORE trusting the bytes, it (a) detects the HTML-login trap (<!DOCTYPE html/<html/Sign In - Gitea/<title>Sign Inmarkers, or atext/htmlContent-Typewhen a raw file was expected) and (b) validates the content shape per file type: VERSION must match^[0-9]+\.[0-9]+\.[0-9]+, MANIFEST must be a path-list with no HTML,larry.shmust start with#!/usr/bin/env bash, other.shmust be non-HTML. On any failure it prints an actionable error and returns non-zero without overwriting the target. The bootstrapinstall-larry.sh(curl|bash, runs before any lib exists) andlarry.sh'sself_update()(runs before lib is sourced) each carry a byte-identical inline copy; the canonical file is in MANIFEST and auto-syncs. - Every remote-content fetch hardened.
install-larry.shfetch();larry.shagent fetch,sync_from_manifestMANIFEST + per-file fetches, and_fetch_with_fallback(Phase-B VERSION + larry.sh) all route through the validator. No trusted-content fetch still uses rawcurl -fsSL. - Optional
LARRY_GITEA_TOKEN(aliasGITEA_TOKEN) for authenticated fetch. When set, fetches addAuthorization: token <PAT>so the installer/updater works against a PRIVATE repo without the public-flip. The token is never hardcoded and never logged. Documented in--help+ MANUAL.md.
v0.8.3 — 2026-05-27
- Tab-completion trailing space no longer breaks command dispatch. The
slash-command completer intentionally appends a trailing space after a
unique match (so arg-taking commands feel snappy), but the main_loop
dispatcher matched exact
caseglobs, so/quit(completed) missed the/quit)arm and fell through to "unknown command". Latent since v0.6.6 when tab completion shipped. Fixed by rtrimming the dispatch key once at thecase "$input"boundary (larry.sh), which tolerates the completer's space, a user-typed trailing space, and any CR remnant while preserving interior/load FILEargument spacing. Added a sharedrtrim()helper tolib/cygwin-safe.sh(and the inline fallback) next tostrip_cr.
v0.8.2 — 2026-05-27
Microsoft Presidio sidecar for free-text NER. Closes V1 from Vera's audit — the dominant real-world failure mode (patient names, addresses, un-keyworded dates in prose chat). Opt-in install; larry runs in v0.8.1 mode on hosts where Presidio isn't installed (MobaXterm/Cygwin per Bryan's accepted tradeoff).
-
lib/phi-presidio-sidecar.py— FastAPI service on127.0.0.1:$LARRY_PHI_PORT(default41189). Wraps Presidio'sAnalyzerEngine+AnonymizerEngineover spaCyen_core_web_sm(12MB model, ~9-second cold start). Two endpoints:POST /redacttakes{"text": "..."}and returns{"redacted": "...", "entities": [...], "latency_ms": N};GET /healthfor the launcher's readiness probe. Three HL7-specific custom recognizers added (HL7_MRNfor 6-12 digit numerics with patient/MRN/account context;HL7_CARET_NAMEforSMITH^JOHNoutside Tier-3 line context;HL7_PHONE_BAREfor plain 10-digit phones). Confidence threshold for tier-5 tokenize is 0.3 (below that is too noisy). -
lib/phi-sidecar.sh— lifecycle launcher. Subcommands:start / stop / status / health / ensure.ensureis idempotent (no-op if already up); called fromlarry.shmain_loop startup, backgrounded so it never blocks larry's first prompt. Waits up to 30 seconds for the sidecar to become healthy afterstart; surfaces the log tail if startup fails. PID file at$LARRY_HOME/.phi-sidecar.pid; log at$LARRY_HOME/log/phi-sidecar.log. HonorsLARRY_PHI_VENVenv to use a dedicated virtualenv (which the installer sets up at$LARRY_HOME/phi-venvwhen the user opts in). -
lib/phi-client.sh— bash wrapper around/redact. Sourceable functions:phi_client_available,phi_redact_text,phi_redact_entities. Also runs standalone as a CLI (./phi-client.sh check / redact / entities). CR-safe (sourcescygwin-safe.shdefensively); 5-second curl timeout bounds any tier-5 stall. -
Tier-5 integration in
larry.sh:auto_detect_phi. New stage AFTER the existing tier-1/2/3/4 substitution and BEFORE the status summary. Sourcesphi-client.shlazily, probesphi_client_available, and on success runsphi_redact_entitiesto get Presidio's per-entity output. Each entity is tokenized through the SAMEhl7-sanitize.sh tokenize-valuepipeline as tiers 1-4 (category prefixedpresidio_<TYPE>) so token IDs remain stable across surfaces and the/tokenslisting stays unified. Tier-5 honorsLARRY_AUTO_PHI=confirm(prompts Y/n once per value) andstrict(aborts the turn iftokenize-valuefails on a Presidio hit). Critically, v0.8.2 removes the v0.7.3 early-return that exitedauto_detect_phiwhen tiers 1-4 found nothing — pure-prose input now ALWAYS reaches tier-5. -
Graceful degradation. If the sidecar is unreachable (not installed, not started, crashed), tier-5 silently no-ops with a one-time stderr warning per session. Larry's REPL remains fully functional in v0.8.1 mode.
LARRY_AUTO_PHI=strictdoes NOT abort on absent sidecar (the strict mode escape is for HL7-shaped content where rule-pack would have caught the leak; tier-5 is additive coverage). -
/phi-sidecarslash command —start / stop / status / health / ensureexposed to the user. Slash-completion table and_LARRY_SLASH_CMDS_DESCupdated. -
install-larry.shinstall path. On hosts with Python 3.9+ + pip, the installer prompts before creating$LARRY_HOME/phi-venvand installingpresidio_analyzer + presidio_anonymizer + fastapi + uvicorn + spaCy en_core_web_sm(~400MB on disk, ~250MB RAM resident). On MobaXterm/Cygwin without python3, the installer skips the prompt entirely and prints Bryan's accepted tradeoff (MobaXterm stays on v0.8.1 + nudges). Re-runnable; idempotent. -
MANIFEST. Added three new lib files. They auto-sync to every running client on next launch; clients without Python 3 won't run the sidecar but the files are harmless to ship.
Prototype validation (Bryan's Mac, Apple Silicon, Python 3.14).
Cold start (model load): ~9 seconds with en_core_web_sm (vs ~82s with
the larger en_core_web_lg Presidio auto-downloads by default — we
explicitly pin _sm for the latency-sensitive REPL use case). Warm
analyzer latency: P50 20.6ms, P95 22.7ms over 20 sequential requests
on 100-word input. End-to-end HTTP round-trip (curl + json roundtrip):
P50 ~57ms warm; first request post-startup pays a ~150ms tokenizer
warmup tax then steady. Well under the 200ms-per-turn REPL budget.
Detection quality on the canonical "John Doe MRN 623000286" sample: 8
core entities caught (PERSON x2, DATE_TIME x2, PHONE_NUMBER, US_*),
plus the three custom HL7 recognizers add MRN + caret-name + bare-phone
coverage. Misclassifications (MRN as US_PASSPORT, "ED" as PERSON) are
within tolerance for the tokenize-everything-suspicious policy — the
auto-PHI lookup table sees them as presidio_* categories and the
operator can audit via /tokens.
MobaXterm compatibility verdict. Per Bryan's accepted tradeoff:
v0.8.2 ships Mac/Linux-only. MobaXterm/Cygwin stays on v0.8.1
(rule-pack + path-block + content-shape gating + strict mode + base64
round-trip + tool-result review gate). Test path: install-larry.sh
detects platform and skips the Presidio install on windows-cygwin
with a clear "v0.8.1 mode" note. No code in larry.sh is platform-gated
— tier-5 silently no-ops when the sidecar is absent, which IS the
MobaXterm path.
Proactive same-pattern sweep. Searched for other call sites where free-text NER would help: tool-result surface already gets HL7-shape sanitize (v0.8.1) and base64 round-trip (v0.8.1-c). Tier-5 is user_input-only by design — tool-result free-text NER deferred to a future patch (would require deciding on per-tool latency budgets; Bryan to call when needed).
v0.8.1 — 2026-05-27
Tool-result PHI gating expansion. Closes V2 / V12 and the V2 base64 sub-gap from Vera's audit. No behavior change for users not on HL7-shaped data; opt-in friction for the 8KB+ tool-result review gate.
-
Tool-name allow-list dropped; content-shape gating only. The v0.7.3 tool-result auto-PHI gate ran only on
read_file (.hl7|.txt),nc_msgs,hl7_field,hl7_diff. v0.8.1 runs_auto_phi_looks_like_hl7on EVERY tool result. On hit → route throughlib/hl7-sanitize.sh. On miss → pass through unchanged. Closes V2:bash_exec/ssh_exec/grep_files/read_fileof.log/.csv/.dat/no-suffix files are now all covered when their output is HL7-shaped. False-positive cost is cheap (extra regex pass with zero behavioral impact on non-HL7). -
Base64-wrapped HL7 round-trip. New
_auto_phi_b64_roundtriphelper. Detects candidate base64 runs (length >= 200 chars,[A-Za-z0-9+/=]only, length divisible by 4 — NOT entropy-based, per Pax §V2-sub: HL7's repetitive prefixes survive base64 with LOW entropy). Speculatively decodes each run; if decoded bytes look like HL7, routes throughhl7-sanitize.shand re-encodes (base64 -w0) back into the result. Catchesssh_pull_smatsampled mode TSV (server-side encoding kept for binary-safe TSV transport; client-side unwrap handles the safety concern). Requirespython3(installed everywhere larry-anywhere runs); skipped with a one-time stderr warning if unavailable. -
Operator review gate for
bash_exec/ssh_exec/ssh_pull/ssh_pull_smatresults. When the tool produced HL7-shaped output OR the result exceedsLARRY_TOOL_RESULT_REVIEW_THRESHOLDbytes (default 8192), Larry prompts[Y/n/i]before passing the result back to the model.iopens the result in$PAGERthen re-prompts. Default Y — zero friction by default.Nsubstitutes a refusal JSON so the model knows a result was withheld. Skipped whenLARRY_AUTO_PHI=off(consistent with the opt-out) OR running non-interactively (no TTY — never blocks headless scripts). Override withLARRY_TOOL_RESULT_REVIEW=alwaysto gate every result. Per Pax §V2/V12: closes the "operator wanted to see this themselves, didn't want the model to see it" gap that's the actual common case.
Proactive same-pattern sweep. Searched the codebase for other call
sites where tool output bypasses content-shape gating: found only the
one in agent_turn. The v0.8.0-c strict-mode tool-result branch was
hardened in lockstep so it now triggers on the broader (content-only)
eligibility instead of the old name-allow-list.
Manifest unchanged.
v0.8.0 — 2026-05-27
PHI-safety quick-wins pack — three independent zero-risk patches closing
four gap-classes Vera identified in the v0.7.5 static audit
(Deliverables/2026-05-27-cloverleaf-larry-phi-leak-audit.md) with Pax's
recommended mitigations
(Deliverables/2026-05-27-cloverleaf-larry-phi-mitigation-research.md).
No new dependencies, no behavior change for users not interacting with PHI.
-
read_file/grep_files/glob_files/list_dirpath-block list (closes V4 + V6 + V11). Refuse — with a structured JSON error the model must surface, NOT a silent "file not found" — any tool-side attempt to read or enumerate under$LARRY_HOME/log/(auto-phi.log, headers.log, oauth.log, session logs),$LARRY_HOME/sanitize/(lookup.tsv — the desanitization key),$LARRY_HOME/sessions/,$LARRY_HOME/.oauth.json, or$LARRY_HOME/.env. Block-list resolves$LARRY_HOMEat call time (not script-parse time) and runs against both the literal path and itsrealpath -mcanonical form, so symlink detours don't bypass. The proactive same-pattern sweep (Bryan standing rule, 2026-05-27) extended the block fromtool_read_filealone to also covertool_grep_files,tool_glob_files, andtool_list_dir— those tools would otherwise leak filenames or grep-matched content out of the same protected dirs without any approval gate. -
/load <file>HL7 pre-routing (closes V3). When the loaded file's content matches_auto_phi_looks_like_hl7, route it throughlib/hl7-sanitize.sh(the segment-aware tokenizer with the full PHI field rule set: PID, PV1, NK1, GT1, IN1, OBR, OBX, DG1, ORC) BEFORE the existing user_input auto-PHI pass. Closes the gap where smat dumps loaded via/loadonly got the lighter per-word classifier, which misses bare HL7 PID fields. Status line reports how many fields were tokenized:phi> /load: hl7-sanitize.sh tokenized N HL7 field(s) from <file> before passing to auto-PHI. Strict mode (see below) aborts the/loadif sanitize fails; default/confirm modes warn-and-continue. -
LARRY_AUTO_PHI=strictfail-closed mode (closes V5). New fourth value alongsideoff / on / confirm. In strict mode, the auto-PHI pipeline aborts the surrounding turn (no payload built, no API call) when: (a)lib/hl7-sanitize.shis missing/non-executable on HL7-shaped user_input, (b) the sanitizer returns empty on HL7-shaped content, (c) any single value'stokenize-valuecall fails inside the detection loop. On the tool-result surface (which can't kill the in-flight tool_use), strict mode substitutes the result with a structured JSON refusal sentinel so the raw HL7 NEVER reaches the model. Existingoff / on / confirmsemantics unchanged (still fail-open per Bryan's "don't break tools" priority). Strict is the opt-in tradeoff for HIPAA work where a silent leak is worse than a broken turn./phi-auto stricttoggle and/helptext updated. Wired into both auto-PHI invocation sites: user input scan and the tool-result HL7 sanitizer gate.
Proactive same-pattern sweep (Bryan standing rule, 2026-05-27).
Searched the codebase for other tools matching the pattern "reads
arbitrary path, returns content to model, no approval gate": found and
patched tool_grep_files, tool_glob_files, tool_list_dir
alongside tool_read_file. bash_exec/ssh_exec already require Y/N
operator approval — the operator is the gatekeeper there (a second gate
deferred to v0.8.1). No other matches.
Manifest unchanged (no new files in lib/).
v0.7.5 — 2026-05-27
Three focused changes, one common cause: the Cygwin/MobaXterm CR-taint pattern
that crashed OAuth on Bryan's v0.7.3 work-box with the cryptic error
bash: ...: arithmetic syntax error: invalid arithmetic operator (error token is "").
-
OAuth/arithmetic CR fix.
lib/oauth.shnow routes every operand entering a bash arithmetic context (fetched_at,expires_in,now) through a dedicatedcoerce_inthelper that strips non-digits at the source. The failure mode:$(date +%s)against a Cygwin pty where Windows-nativedate.exeshadows Cygwindatecan return a CR-tainted epoch like"1779999999\r", which crashes the very next$((expires_at - now)). Diagnosis inDeliverables/2026-05-27-cloverleaf-larry-oauth-arithmetic-fix.md. -
Mouse mode is opt-in. REPL mouse handling now defaults to OFF and is enabled via
LARRY_MOUSE=1env var or/mouse onslash command. Several terminals (notably MobaXterm and stripped tmux) were swallowing the mouse ANSI sequences and printing literal^[[?1000hgarbage when v0.7.0 turned it on unconditionally. Diagnosis inDeliverables/2026-05-27-cloverleaf-larry-mouse-regression-fix.md. -
CR-safety sweep across
lib/*.shand top-level scripts. Three new primitives inlib/cygwin-safe.sh(sourced by every tool family member):coerce_int VAL [DEFAULT]— for arithmetic and integer-test operandsstrip_cr VAL— for case patterns, regex tests, paths, HTTP headersread_clean VAR [PROMPT]—read -rwrapper that strips CR pre-assign Hardened call sites:larry.sh— status-linedate +%s/tput cols, three y/N approval prompts (write_file, bash_exec, first-run auth), API-key paste, first-run auth menulib/oauth.sh—cmd_loginandcmd_refreshdate +%scaptureslib/nc-engine.sh— five y/N action prompts (stop/start/bounce, resend, route-test, testxlate, tpstest) +find ... | wc -larithmeticlib/nc-msgs.sh—parse_time_msdatecaptures (4 sites), meta-TSVtmfield,MSG_COUNTwc -llib/nc-regression.sh—tr | wc -ccount, hl7-diff?-fallback arithmeticlib/nc-smat-diff.sh—A_COUNT/B_COUNT/DIFFS_TOTALlib/nc-insert-protocol.sh— every awk-emitted line-number that feedshead -n $((N-1))/tail -n +$((N+1))arithmeticlib/journal.sh—_next_seqwc -larithmeticlib/lessons.sh—_next_id,cmd_list,cmd_countarithmetic + two y/N prompts (clear all, clear since)lib/hl7-sanitize.sh—cmd_countarithmetic + clear-table y/Nlib/ssh-helper.sh— local + remotewc -cinteger compares (4 sites)lib/nc-find.sh—wc -lcount for%dprintflib/nc-table.sh—$(date +%s)in backup-filename constructionlib/nc-document.sh— twowc -l | %dprintf siteslarry-rollback.sh— Proceed? y/N prompt
Reproduction (now exercised by
cygwin-safe.sh's in-line tests):now=$(printf '%s\r' 1779999999); echo $((now - 1)) # pre-fix: crashes now=$(coerce_int "$(printf '%s\r' 1779999999)" 0); echo $((now - 1)) # fix: 1779999998Added
lib/cygwin-safe.shtoMANIFESTso it auto-syncs to every running client on next launch.
v0.7.4 — 2026-05-27
- Drop GitHub fallback from auto-update. Single-source Gitea
(
https://git.bjnoela.com/bryan/cloverleaf-larry.git).
v0.7.3 — 2026-05-26
- Automatic PHI detection (tiered detection + blacklist contexts).
v0.7.2 — 2026-05-26
- Gitea becomes primary auto-update origin; GitHub demoted to fallback.
v0.7.1 — 2026-05-26
- Status line moves to between-turn position (post-input, pre-response).
- Status line below prompt; automatic PHI detection; session-artifact upload.
v0.7.0 — 2026-05-26
- HL7-aware tab completion + REPL mouse mode (later made opt-in in v0.7.5).