Diagnose-don't-assume rate-limit cluster (Clover #8). The rate_limit_error on a
work-box with 90% of the 5h Max quota free was a short-window BURST rail, not 5h
exhaustion — tripped by a stream->non-stream double-send per turn with no backoff.
- Rate-limit backoff honoring retry-after (else exp 2/4/8 cap 30) + actionable
header-parsed message naming the tripped rail; headers.log now captures every
429 (was OAuth+unified-* only), tagged with retry-after + rail.
- parse_stream_to_response detects a non-SSE JSON error body (429/overload) and
returns a distinct code so agent_turn surfaces it WITH backoff instead of
re-sending the whole prompt (single-send invariant). Auto LARRY_NO_STREAM=1 on
MobaXterm/Cygwin/MSYS; explicit LARRY_NO_STREAM=0 still forces streaming on.
- ErrorPI fix: strip_cr on err_type/err_msg in _humanize_api_error (a trailing
CR broke the case match AND carriage-overprinted "API error"); err/warn/log
now strip embedded CRs defensively. (v0.7.5 sweep missed the error-display path.)
- phi tier-5 notice once-per-session via $LARRY_HOME/.phi-notice-shown SESSION_ID
flag (old export flag died in the $(...) subshell -> per-turn nag). Same-pattern
sweep fixed the identical subshell-flag bug in _auto_phi_b64_roundtrip.
Deliverable: Deliverables/2026-05-27-cloverleaf-larry-v085-ratelimit-streaming-fixes.md
Co-Authored-By: Clover (Claude Opus 4.7) <noreply@anthropic.com>