v0.8.4: installer/updater detects HTML-sign-in-page responses and fails loud

Hardens the installer + auto-updater against the Gitea private-repo trap
(Clover #5 diagnosis): an unauthenticated raw-file read of a sign-in-gated
Gitea returns the HTML Sign-In page at HTTP 200, which `curl -fsSL` treats as
success — so the old code parsed HTML as VERSION/MANIFEST/larry.sh content and
silently aborted (or overwrote real files with HTML). This stranded a work-box
at v0.7.3 until the REQUIRE_SIGNIN_VIEW=false flip.

- New lib/fetch-safe.sh: fetch_validate URL DEST KIND [MAX_TIME]. Detects the
  HTML-login trap (DOCTYPE/<html/"Sign In - Gitea"/<title>Sign In markers, or
  text/html Content-Type) and validates content shape per file type (semver
  VERSION, path-list MANIFEST, shebang larry.sh, non-HTML .sh). On failure:
  actionable error + non-zero, target file left untouched.
- install-larry.sh (curl|bash bootstrap) and larry.sh self_update() each carry
  an inline copy with the same logic but condensed comments and messages
  (both run before lib/ can be sourced).
- Every remote-content fetch routed through the validator: install fetch();
  agent fetch; sync_from_manifest MANIFEST + per-file; _fetch_with_fallback.
- Optional LARRY_GITEA_TOKEN / GITEA_TOKEN env var adds Authorization: token
  <PAT> for authenticated fetch against private repos. Never hardcoded/logged.
  Documented in --help + MANUAL.md.

Co-Authored-By: Clover (Claude Opus 4.7) <noreply@anthropic.com>
Bryan Johnson 2026-05-27 20:28:58 -07:00
parent d4c382dc6d
commit 31ffae6f36
7 changed files with 509 additions and 16 deletions


@ -4,6 +4,39 @@ All notable changes to `cloverleaf-larry` / `larry-anywhere` are recorded here.
Versioning is loose-semver; bumps trigger the in-process self-update on every
running client via `LARRY_BASE_URL` + `MANIFEST`.
## v0.8.4 — 2026-05-27
- **Installer/updater now detects HTML-sign-in-page responses and fails loud
instead of silently corrupting.** Root cause (Clover #5's diagnosis,
`Deliverables/2026-05-27-cloverleaf-larry-stuck-update-and-tab-bug.md`): a
private/sign-in-gated Gitea answers an unauthenticated raw-file read with the
**HTML Sign-In page at HTTP 200** (303 → `/user/login`, followed by `curl -L`
to a 200 HTML page). `curl -fsSL` treats that as success, so the old
installer/auto-updater parsed the HTML as VERSION/MANIFEST/`larry.sh` content
— silently aborting, or overwriting real on-disk files with HTML soup. This
is exactly what stranded a work-box at v0.7.3 until the Gitea
`REQUIRE_SIGNIN_VIEW=false` flip.
- **New `lib/fetch-safe.sh`** — a content-validating fetch wrapper
(`fetch_validate URL DEST KIND [MAX_TIME]`). After every `curl`, BEFORE
trusting the bytes, it (a) detects the HTML-login trap (`<!DOCTYPE html` /
`<html` / `Sign In - Gitea` / `<title>Sign In` markers, or a `text/html`
`Content-Type` when a raw file was expected) and (b) validates the content
shape per file type: VERSION must match `^[0-9]+\.[0-9]+\.[0-9]+`, MANIFEST
must be a path-list with no HTML, `larry.sh` must start with
`#!/usr/bin/env bash`, other `.sh` must be non-HTML. On any failure it prints
an actionable error and returns non-zero **without overwriting the target**.
The bootstrap `install-larry.sh` (curl|bash, runs before any lib exists) and
`larry.sh`'s `self_update()` (runs before lib is sourced) each carry an inline
copy with the same logic (comments and error text are condensed); the
canonical file is in MANIFEST and auto-syncs.
- **Every remote-content fetch hardened.** `install-larry.sh` `fetch()`;
`larry.sh` agent fetch, `sync_from_manifest` MANIFEST + per-file fetches, and
`_fetch_with_fallback` (Phase-B VERSION + larry.sh) all route through the
validator. No trusted-content fetch still uses raw `curl -fsSL`.
- **Optional `LARRY_GITEA_TOKEN` (alias `GITEA_TOKEN`) for authenticated
fetch.** When set, fetches add `Authorization: token <PAT>` so the
installer/updater works against a PRIVATE repo without the public-flip. The
token is never hardcoded and never logged. Documented in `--help` + MANUAL.md.
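
The per-file-type shape checks described above can be sketched roughly as follows. This is a minimal illustration that inspects an already-downloaded local file, not the shipped code: `shape_ok` is a hypothetical name, and unlike `fetch_validate` it reads only the first line for VERSION and scans the whole file for `<` in a MANIFEST.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of per-kind shape checks on an already-downloaded file.
# shape_ok is an illustrative name; the shipped logic lives in fetch_validate.
shape_ok() {
  local file="$1" kind="$2" line1
  # First line with any CR removed (Windows-style line endings).
  line1="$(head -n 1 "$file" 2>/dev/null | tr -d '\r')"
  case "$kind" in
    version)  # must begin with MAJOR.MINOR.PATCH, e.g. 0.8.4
      printf '%s\n' "$line1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+' ;;
    manifest) # no "<" anywhere, and at least one plausible path line
      ! grep -q '<' "$file" &&
        grep -Eq '^[A-Za-z0-9_][A-Za-z0-9_./-]*$' "$file" ;;
    script)   # exact shebang expected for larry.sh
      [ "$line1" = '#!/usr/bin/env bash' ] ;;
    *)        # other kinds: nothing beyond the separate HTML check
      return 0 ;;
  esac
}
```

A caller would run the HTML-trap check first and only then `shape_ok "$tmp" version` (or `manifest`, `script`), treating a non-zero return as "do not install this file".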
## v0.8.3 — 2026-05-27
- **Tab-completion trailing space no longer breaks command dispatch.** The


@ -27,6 +27,11 @@ agents/regress.md
# Cygwin/MobaXterm CR-taint defense primitives (sourced by every tool)
lib/cygwin-safe.sh
# v0.8.4: content-validating fetch (HTML-sign-in-page trap detection + per-
# file-type shape checks) for the installer/auto-updater. Canonical home of the
# validators that install-larry.sh and larry.sh also carry inline (pre-source).
lib/fetch-safe.sh
# Auth implementation
lib/oauth.sh


@ -24,6 +24,37 @@ export HCISITEDIR="$HCIROOT/$HCISITE"
---
## Auto-update & origin
`larry.sh` (and `install-larry.sh`) fetch updates from a single Gitea origin:
`$LARRY_BASE_URL` (default `https://git.bjnoela.com/bryan/cloverleaf-larry/raw/branch/main`).
Env vars that control fetching:
- `LARRY_BASE_URL` — override the origin (fork/mirror). No trailing slash.
- `LARRY_NO_UPDATE=1` (or `--no-update`) — skip the self-update entirely.
- `LARRY_GITEA_TOKEN` (alias `GITEA_TOKEN`) — a Gitea **personal access token**
(read scope) for **authenticated fetch against a PRIVATE repo**. When set,
every update/install fetch adds `Authorization: token <PAT>`. The token value
is never logged. Use this when the Gitea repo is private or the instance has
`REQUIRE_SIGNIN_VIEW=true`, so you don't have to flip the repo public.
```bash
LARRY_GITEA_TOKEN=<PAT> larry # authenticated auto-update
LARRY_GITEA_TOKEN=<PAT> bash install-larry.sh # authenticated install
```
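
As a rough illustration of how the token could end up on curl's command line, the sketch below builds an argument array from the environment. The names `build_auth_args`, `describe_auth_args`, and `AUTH_ARGS` are hypothetical; the shipped helper is `_fs_curl_auth_args` in `lib/fetch-safe.sh`, which emits the arguments line by line instead.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: turn an optional token into curl header arguments.
# build_auth_args and AUTH_ARGS are illustrative names, not part of larry.sh.
build_auth_args() {
  # Prefer LARRY_GITEA_TOKEN, fall back to GITEA_TOKEN, else empty.
  local tok="${LARRY_GITEA_TOKEN:-${GITEA_TOKEN:-}}"
  tok="${tok//$'\r'/}"   # drop a stray CR from a pasted value
  AUTH_ARGS=()
  if [ -n "$tok" ]; then
    AUTH_ARGS=( -H "Authorization: token $tok" )
  fi
}

# Report how many extra curl arguments were built, without printing the token.
describe_auth_args() {
  build_auth_args
  printf 'auth args: %s\n' "${#AUTH_ARGS[@]}"
}
```

Keeping the header in an array means the space inside `Authorization: token <PAT>` survives as a single argument when the array is expanded into the curl call.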
**Hardening (v0.8.4):** every remote fetch is content-validated before the
bytes are trusted. If the origin returns the Gitea HTML *Sign-In* page (which
Gitea serves at HTTP 200 for an unauthenticated read of a private repo), the
installer/updater **fails loud** with an actionable error and does **not**
overwrite any real file — instead of silently parsing the HTML as
VERSION/MANIFEST/script content (the bug that stranded a client at v0.7.3).
The remedy is exactly what the error states: make the repo public +
`REQUIRE_SIGNIN_VIEW=false`, **or** set `LARRY_GITEA_TOKEN`.
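
The "validate before you overwrite" idea can be sketched offline as below. This is only an illustration under stated assumptions: it copies a local file where the real code downloads with curl, and `looks_like_html` and `place_if_valid` are hypothetical names that do not exist in `larry.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of "validate before you overwrite". It works on a local
# SRC file so it can run offline; the real code downloads with curl first.

# Return 0 if the first 4 KiB of FILE look like an HTML page.
looks_like_html() {
  local first
  first="$(head -c 4096 "$1" 2>/dev/null)"
  grep -qiE '<!doctype html|<html|<title>sign in|sign in - gitea' <<<"$first"
}

# place_if_valid SRC DEST: copy SRC to a temp file, check it, and only then
# move it over DEST. On any failure DEST is left exactly as it was.
place_if_valid() {
  local src="$1" dest="$2" tmp
  tmp="$(mktemp)" || return 1
  if ! cp "$src" "$tmp" 2>/dev/null || [ ! -s "$tmp" ]; then
    rm -f "$tmp"
    echo "error: $src is missing or empty" >&2
    return 1
  fi
  if looks_like_html "$tmp"; then
    rm -f "$tmp"
    echo "error: $src looks like an HTML sign-in page, not file content" >&2
    return 1
  fi
  mv "$tmp" "$dest"
}
```

Because the check runs against the temp file, a sign-in page never reaches the destination path, and an existing good file survives a failed update.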
---
## Authentication (`larry-auth.sh`, `lib/oauth.sh`)
Only needed if you're running the Larry REPL (`larry.sh`). The lib/ tools themselves never call Anthropic — they're pure local bash.


@ -1 +1 @@
-0.8.3
+0.8.4


@ -12,6 +12,8 @@
# LARRY_HOME install location (default: $HOME/.larry)
# LARRY_BASE_URL where to fetch files from (no trailing slash)
# LARRY_BIN_DIR where to symlink the `larry` command (default: $HOME/bin)
# LARRY_GITEA_TOKEN optional Gitea PAT (read scope) for authenticated fetch
# against a PRIVATE repo. Alias: GITEA_TOKEN. Never logged.
set -eu
LARRY_HOME="${LARRY_HOME:-$HOME/.larry}"
@ -35,6 +37,91 @@ ok() { printf ' %s✓%s %s\n' "$C_GREEN" "$C_RESET" "$*"; }
warn() { printf ' %s!%s %s\n' "$C_YELLOW" "$C_RESET" "$*"; }
die() { printf '%serror:%s %s\n' "$C_RED" "$C_RESET" "$*" >&2; exit 1; }
# >>> fetch-safe inline (keep in sync with lib/fetch-safe.sh) >>>
# install-larry.sh is the curl|bash bootstrap: it runs BEFORE any lib/ file
# exists on disk, so it cannot source lib/fetch-safe.sh. We inline a condensed
# copy of the validators (same logic, shorter comments and messages).
# Root cause + design: see
# Deliverables/2026-05-27-cloverleaf-larry-stuck-update-and-tab-bug.md and
# lib/fetch-safe.sh's header. The trap: Gitea answers an unauthenticated raw
# read with HTTP 200 + the HTML Sign-In page; `curl -fsSL` calls that success
# and the installer parses HTML as file content. We detect + fail loud.
_fs_curl_auth_args() {
local _tok="${LARRY_GITEA_TOKEN:-${GITEA_TOKEN:-}}"
_tok="${_tok//$'\r'/}"
if [ -n "$_tok" ]; then
printf '%s\n' '-H'
printf '%s\n' "Authorization: token $_tok"
fi
}
_fs_html_trap_error() {
printf 'error: %s returned an HTML sign-in page, not file content. The Gitea repo is private or the instance requires sign-in. Either (a) make the repo public + set REQUIRE_SIGNIN_VIEW=false, or (b) set LARRY_GITEA_TOKEN=<PAT> for authenticated fetch.\n' \
"$1" >&2
}
_fs_snippet() {
local f="$1" fb="$2" s
s="$(head -c 60 "$f" 2>/dev/null | tr -d '\r\n' )"
[ -z "$s" ] && s="$fb"
printf '"%s..."' "$s"
}
# fetch_validate URL DEST KIND [MAX_TIME] — see lib/fetch-safe.sh for the full
# contract. KIND in {version,manifest,script,sh,text}.
fetch_validate() {
local url="$1" dest="$2" kind="${3:-text}" mt="${4:-15}"
local tmp hdr code ctype first line1
tmp="$(mktemp 2>/dev/null || echo "${dest}.fs.$$")"
hdr="$(mktemp 2>/dev/null || echo "${dest}.fsh.$$")"
# -f makes curl fail on HTTP 4xx/5xx, so an error body is never validated as
# file content; -w still reports the status code for the error message.
local _args=( -fsSL --max-time "$mt" -o "$tmp" -D "$hdr" -w '%{http_code}' )
local _auth_line
while IFS= read -r _auth_line; do
[ -n "$_auth_line" ] && _args+=( "$_auth_line" )
done < <(_fs_curl_auth_args)
code="$(curl "${_args[@]}" "$url" 2>/dev/null)"
local rc=$?
code="${code//$'\r'/}"
if [ "$rc" -ne 0 ] || [ ! -s "$tmp" ]; then
rm -f "$tmp" "$hdr"
printf 'error: %s — fetch failed (curl rc=%s, http=%s). Origin unreachable, timed out, or returned an HTTP error.\n' "$url" "$rc" "${code:-000}" >&2
return 1
fi
ctype="$(grep -i '^content-type:' "$hdr" 2>/dev/null | tail -1 | tr -d '\r' | tr 'A-Z' 'a-z')"
first="$(head -c 4096 "$tmp" 2>/dev/null | tr -d '\r')"
if printf '%s' "$first" | grep -qi '<!doctype html\|<html\|sign in - gitea\|<title>sign in'; then
rm -f "$tmp" "$hdr"; _fs_html_trap_error "$url"; return 1
fi
case "$ctype" in
*text/html*) rm -f "$tmp" "$hdr"; _fs_html_trap_error "$url"; return 1 ;;
esac
rm -f "$hdr"
line1="$(head -1 "$tmp" 2>/dev/null | tr -d '\r')"
case "$kind" in
version)
local ver; ver="$(printf '%s' "$first" | tr -d '[:space:]')"
if ! printf '%s' "$ver" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+'; then
# Print the preview before deleting the temp file that _fs_snippet reads.
printf 'error: %s — expected a semver VERSION (e.g. 0.8.4), got %s.\n' "$url" "$(_fs_snippet "$tmp" "$first")" >&2
rm -f "$tmp"
return 1
fi ;;
manifest)
if printf '%s' "$first" | grep -q '<'; then
rm -f "$tmp"; printf 'error: %s — MANIFEST contains HTML markup ("<").\n' "$url" >&2; return 1
fi
if ! grep -Eq '^[A-Za-z0-9_][A-Za-z0-9_./-]*$' "$tmp"; then
rm -f "$tmp"; printf 'error: %s — MANIFEST has no plausible path line.\n' "$url" >&2; return 1
fi ;;
script)
if [ "$line1" != '#!/usr/bin/env bash' ]; then
# Print the preview before deleting the temp file that _fs_snippet reads.
printf 'error: %s — larry.sh must start with `#!/usr/bin/env bash`, got %s.\n' "$url" "$(_fs_snippet "$tmp" "$first")" >&2
rm -f "$tmp"
return 1
fi ;;
sh|text|*) : ;;
esac
mkdir -p "$(dirname "$dest")" 2>/dev/null || true
mv "$tmp" "$dest" || { rm -f "$tmp"; printf 'error: cannot write %s\n' "$dest" >&2; return 1; }
return 0
}
# <<< fetch-safe inline <<<
# ─────────────────────────────────────────────────────────────────────────────
# Detect platform
# ─────────────────────────────────────────────────────────────────────────────
@ -81,16 +168,32 @@ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]:-$0}")" 2>/dev/null && pwd)" || SC
# No fallback; if $LARRY_BASE_URL is unreachable we die with a clear error
# telling the user to verify the URL or set an alternate mirror.
# _kind_for REL — infer the content-shape contract for a manifest path so
# every fetch gets validated (HTML-trap + shape) before we trust the bytes.
_kind_for() {
case "$1" in
larry.sh) printf 'script' ;;
VERSION) printf 'version' ;;
MANIFEST) printf 'manifest' ;;
*.sh) printf 'sh' ;;
*) printf 'text' ;;
esac
}
fetch() {
# $1 = remote relative path, $2 = local destination
if [ -n "$LARRY_BASE_URL" ]; then
say "fetching $1"
-if curl -fsSL --max-time 30 "$LARRY_BASE_URL/$1" -o "$2" 2>/dev/null && [ -s "$2" ]; then
+# v0.8.4 hardening: validate every remote fetch (HTML-sign-in-page trap +
+# content-shape) BEFORE trusting the bytes. fetch_validate writes $2 only
+# on success; on failure it prints an actionable error and leaves $2
+# untouched, so we never overwrite a real file with HTML soup.
+if fetch_validate "$LARRY_BASE_URL/$1" "$2" "$(_kind_for "$1")" 30; then
ok "$2"
return 0
fi
rm -f "$2"
-die "install failed: cannot reach LARRY_BASE_URL=$LARRY_BASE_URL (fetching $1) — verify the URL or set LARRY_BASE_URL to a reachable mirror"
+die "install failed: cannot fetch $1 from LARRY_BASE_URL=$LARRY_BASE_URL — see error above (verify the URL, repo visibility, or set LARRY_GITEA_TOKEN / a reachable mirror)"
elif [ -n "$SCRIPT_DIR" ] && [ -f "$SCRIPT_DIR/$1" ]; then
cp "$SCRIPT_DIR/$1" "$2" && ok "copied $1 (local)"
else
@ -106,6 +209,8 @@ fetch agents/cloverleaf-cheatsheet.md "$LARRY_HOME/agents/cloverleaf-cheatsheet.
fetch agents/regress.md "$LARRY_HOME/agents/regress.md"
fetch larry-rollback.sh "$LARRY_HOME/larry-rollback.sh"
fetch larry-auth.sh "$LARRY_HOME/larry-auth.sh"
fetch lib/fetch-safe.sh "$LARRY_HOME/lib/fetch-safe.sh"
fetch lib/cygwin-safe.sh "$LARRY_HOME/lib/cygwin-safe.sh"
fetch lib/oauth.sh "$LARRY_HOME/lib/oauth.sh"
fetch lib/ssh-helper.sh "$LARRY_HOME/lib/ssh-helper.sh"
fetch lib/lessons.sh "$LARRY_HOME/lib/lessons.sh"

larry.sh

@ -26,6 +26,14 @@
# LARRY_MODEL Claude model (default: claude-sonnet-4-6)
# LARRY_MAX_TOKENS max output tokens per turn (default: 8192)
# LARRY_NO_UPDATE set to 1 to disable self-update
# LARRY_GITEA_TOKEN optional Gitea PAT (read scope) for authenticated
# fetch against a PRIVATE repo (alias: GITEA_TOKEN).
# When set, update/install fetches add an
# "Authorization: token <PAT>" header. Never logged.
# v0.8.4: lets the updater work against a private repo
# without flipping it public. If a fetch returns the
# Gitea HTML sign-in page (HTTP 200), the updater now
# FAILS LOUD instead of parsing HTML as file content.
# ANTHROPIC_API_KEY overrides $LARRY_HOME/.env if set
#
# Slash commands during chat:
@ -57,7 +65,7 @@ set -o pipefail
# ─────────────────────────────────────────────────────────────────────────────
# Config
# ─────────────────────────────────────────────────────────────────────────────
-LARRY_VERSION="0.8.3"
+LARRY_VERSION="0.8.4"
LARRY_HOME="${LARRY_HOME:-$HOME/.larry}"
# ─────────────────────────────────────────────────────────────────────────────
@ -140,6 +148,95 @@ err() { printf '%serror:%s %s\n' "$C_RED" "$C_RESET" "$*" >&2; }
warn() { printf '%swarn:%s %s\n' "$C_YELLOW" "$C_RESET" "$*" >&2; }
larry_say() { printf '%s%slarry>%s %s\n' "$C_MAGENTA" "$C_BOLD" "$C_RESET" "$*"; }
# >>> fetch-safe inline (keep in sync with lib/fetch-safe.sh) >>>
# self_update() (below) runs BEFORE lib/cygwin-safe.sh + lib/fetch-safe.sh are
# sourced (the source point is ~line 850, after the lib dir resolves). So the
# auto-updater carries a condensed inline copy of the fetch validators (same
# logic, shorter comments and messages).
# Root cause + design: see lib/fetch-safe.sh's header and
# Deliverables/2026-05-27-cloverleaf-larry-stuck-update-and-tab-bug.md. The
# trap: Gitea answers an unauthenticated raw read with HTTP 200 + the HTML
# Sign-In page; `curl -fsSL` calls that success and the updater parses HTML as
# VERSION/MANIFEST/larry.sh content (silent abort, or overwrites real files
# with HTML). We detect + fail loud, never overwriting a real file.
# Optional LARRY_GITEA_TOKEN / GITEA_TOKEN env var enables authenticated fetch.
_fs_curl_auth_args() {
local _tok="${LARRY_GITEA_TOKEN:-${GITEA_TOKEN:-}}"
_tok="${_tok//$'\r'/}"
if [ -n "$_tok" ]; then
printf '%s\n' '-H'
printf '%s\n' "Authorization: token $_tok"
fi
}
_fs_html_trap_error() {
printf 'error: %s returned an HTML sign-in page, not file content. The Gitea repo is private or the instance requires sign-in. Either (a) make the repo public + set REQUIRE_SIGNIN_VIEW=false, or (b) set LARRY_GITEA_TOKEN=<PAT> for authenticated fetch.\n' \
"$1" >&2
}
_fs_snippet() {
local f="$1" fb="$2" s
s="$(head -c 60 "$f" 2>/dev/null | tr -d '\r\n' )"
[ -z "$s" ] && s="$fb"
printf '"%s..."' "$s"
}
# fetch_validate URL DEST KIND [MAX_TIME] — see lib/fetch-safe.sh for the full
# contract. KIND in {version,manifest,script,sh,text}. Writes DEST only on
# success; returns non-zero + leaves DEST untouched on any failure.
fetch_validate() {
local url="$1" dest="$2" kind="${3:-text}" mt="${4:-15}"
local tmp hdr code ctype first line1
tmp="$(mktemp 2>/dev/null || echo "${dest}.fs.$$")"
hdr="$(mktemp 2>/dev/null || echo "${dest}.fsh.$$")"
# -f makes curl fail on HTTP 4xx/5xx, so an error body is never validated as
# file content; -w still reports the status code for the error message.
local _args=( -fsSL --max-time "$mt" -o "$tmp" -D "$hdr" -w '%{http_code}' )
local _auth_line
while IFS= read -r _auth_line; do
[ -n "$_auth_line" ] && _args+=( "$_auth_line" )
done < <(_fs_curl_auth_args)
code="$(curl "${_args[@]}" "$url" 2>/dev/null)"
local rc=$?
code="${code//$'\r'/}"
if [ "$rc" -ne 0 ] || [ ! -s "$tmp" ]; then
rm -f "$tmp" "$hdr"
printf 'error: %s — fetch failed (curl rc=%s, http=%s). Origin unreachable, timed out, or returned an HTTP error.\n' "$url" "$rc" "${code:-000}" >&2
return 1
fi
ctype="$(grep -i '^content-type:' "$hdr" 2>/dev/null | tail -1 | tr -d '\r' | tr 'A-Z' 'a-z')"
first="$(head -c 4096 "$tmp" 2>/dev/null | tr -d '\r')"
if printf '%s' "$first" | grep -qi '<!doctype html\|<html\|sign in - gitea\|<title>sign in'; then
rm -f "$tmp" "$hdr"; _fs_html_trap_error "$url"; return 1
fi
case "$ctype" in
*text/html*) rm -f "$tmp" "$hdr"; _fs_html_trap_error "$url"; return 1 ;;
esac
rm -f "$hdr"
line1="$(head -1 "$tmp" 2>/dev/null | tr -d '\r')"
case "$kind" in
version)
local ver; ver="$(printf '%s' "$first" | tr -d '[:space:]')"
if ! printf '%s' "$ver" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+'; then
# Print the preview before deleting the temp file that _fs_snippet reads.
printf 'error: %s — expected a semver VERSION (e.g. 0.8.4), got %s.\n' "$url" "$(_fs_snippet "$tmp" "$first")" >&2
rm -f "$tmp"
return 1
fi ;;
manifest)
if printf '%s' "$first" | grep -q '<'; then
rm -f "$tmp"; printf 'error: %s — MANIFEST contains HTML markup ("<").\n' "$url" >&2; return 1
fi
if ! grep -Eq '^[A-Za-z0-9_][A-Za-z0-9_./-]*$' "$tmp"; then
rm -f "$tmp"; printf 'error: %s — MANIFEST has no plausible path line.\n' "$url" >&2; return 1
fi ;;
script)
if [ "$line1" != '#!/usr/bin/env bash' ]; then
# Print the preview before deleting the temp file that _fs_snippet reads.
printf 'error: %s — larry.sh must start with `#!/usr/bin/env bash`, got %s.\n' "$url" "$(_fs_snippet "$tmp" "$first")" >&2
rm -f "$tmp"
return 1
fi ;;
sh|text|*) : ;;
esac
mkdir -p "$(dirname "$dest")" 2>/dev/null || true
mv "$tmp" "$dest" || { rm -f "$tmp"; printf 'error: cannot write %s\n' "$dest" >&2; return 1; }
return 0
}
# <<< fetch-safe inline <<<
# ─────────────────────────────────────────────────────────────────────────────
# CLI args
# ─────────────────────────────────────────────────────────────────────────────
@ -296,7 +393,10 @@ fetch_agents_or_warn() {
if [ -n "$LARRY_AGENTS_URL" ]; then
log "fetching agent definitions from $LARRY_AGENTS_URL"
for f in $LARRY_AGENT_FILES; do
-curl -fsSL --max-time 10 "$LARRY_AGENTS_URL/$f" -o "$LARRY_HOME/agents/$f" \
+# v0.8.4: validate the fetch (HTML-sign-in-page trap + non-HTML shape)
+# before trusting the bytes; on failure fall back to the built-in agent
+# rather than writing an HTML sign-in page into agents/.
+fetch_validate "$LARRY_AGENTS_URL/$f" "$LARRY_HOME/agents/$f" text 10 \
|| { warn "could not fetch $f — using built-in fallback"; write_fallback_agent "$f"; }
done
else
@ -370,7 +470,12 @@ _record_origin() {
sync_from_manifest() {
local base="$1"
local manifest="$LARRY_HOME/.manifest.new"
-curl -fsSL --max-time 10 "$base/MANIFEST" -o "$manifest" 2>/dev/null || {
+# v0.8.4: validate the MANIFEST fetch. If Gitea is private/sign-in-gated it
+# answers with the HTML login page at HTTP 200; the old `curl -fsSL` treated
+# that as success and the loop below then iterated HTML lines as file paths
+# and overwrote real on-disk files with HTML. fetch_validate fails loud and
+# leaves $manifest absent, so we abort cleanly without corrupting anything.
+fetch_validate "$base/MANIFEST" "$manifest" manifest 10 || {
rm -f "$manifest"
return 1
}
@ -393,7 +498,17 @@ sync_from_manifest() {
dest="$LARRY_HOME/$path"
tmp="$dest.new"
mkdir -p "$(dirname "$dest")" 2>/dev/null
-if curl -fsSL --max-time 15 "$base/$path" -o "$tmp" 2>/dev/null && [ -s "$tmp" ]; then
+# v0.8.4: per-file content validation. Infer the shape contract from the
+# path so a sign-in-page (or any HTML) response can never be written over a
+# real lib/agent/metadata file. fetch_validate writes $tmp only on success.
+local _kind
+case "$path" in
+VERSION) _kind=version ;;
+MANIFEST) _kind=manifest ;;
+*.sh) _kind=sh ;;
+*) _kind=text ;;
+esac
+if fetch_validate "$base/$path" "$tmp" "$_kind" 15 && [ -s "$tmp" ]; then
if [ ! -f "$dest" ] || ! cmp -s "$dest" "$tmp"; then
mv "$tmp" "$dest"
case "$path" in *.sh) chmod +x "$dest" 2>/dev/null || true ;; esac
@ -429,13 +544,27 @@ sync_from_manifest_with_fallback() {
return 1
}
-# _fetch_with_fallback REL_PATH DEST [MAX_TIME] — v0.7.4 single-source fetch
-# (name kept for call-site compatibility). Returns 0 if the file pulled
-# non-empty, non-zero otherwise. Records the winning origin slot in
-# $_LARRY_LAST_ORIGIN (always "primary" in single-source mode).
+# _fetch_with_fallback REL_PATH DEST [MAX_TIME] [KIND] — v0.7.4 single-source
+# fetch (name kept for call-site compatibility). Returns 0 if the file pulled
+# AND passed content validation, non-zero otherwise. Records the winning
+# origin slot in $_LARRY_LAST_ORIGIN (always "primary" in single-source mode).
+#
+# v0.8.4: routes through fetch_validate so the Gitea HTML-sign-in-page trap
+# (HTTP 200 + login HTML) is caught BEFORE the bytes are trusted. KIND defaults
+# to a shape inferred from REL_PATH (VERSION->version, MANIFEST->manifest,
+# larry.sh->script, *.sh->sh, else text).
_fetch_with_fallback() {
-local rel="$1" dest="$2" mt="${3:-15}"
-if curl -fsSL --max-time "$mt" "$LARRY_BASE_URL/$rel" -o "$dest" 2>/dev/null && [ -s "$dest" ]; then
+local rel="$1" dest="$2" mt="${3:-15}" kind="${4:-}"
+if [ -z "$kind" ]; then
+case "$rel" in
+VERSION) kind=version ;;
+MANIFEST) kind=manifest ;;
+larry.sh) kind=script ;;
+*.sh) kind=sh ;;
+*) kind=text ;;
+esac
+fi
+if fetch_validate "$LARRY_BASE_URL/$rel" "$dest" "$kind" "$mt" && [ -s "$dest" ]; then
_record_origin primary "$LARRY_BASE_URL"
return 0
fi

lib/fetch-safe.sh (new file)

@ -0,0 +1,190 @@
#!/usr/bin/env bash
# fetch-safe.sh — content-validating remote fetch for the Larry-Anywhere
# installer + auto-updater.
#
# WHY THIS EXISTS (root cause — see
# Deliverables/2026-05-27-cloverleaf-larry-stuck-update-and-tab-bug.md,
# Clover #5's diagnosis, "Problem 1"):
#
# `curl -fsSL` against a Gitea raw-file URL, when the Gitea instance
# requires sign-in (or the repo is private), returns the HTML *Sign-In
# page* with **HTTP 200** (Gitea answers an unauthenticated raw read with
# 303 -> /user/login, and `curl -L` follows it to a 200 HTML page).
# `curl -fsSL` only fails on HTTP 4xx/5xx, so it treats this 200-HTML as
# SUCCESS. The installer/updater then parses the HTML as VERSION/MANIFEST/
# larry.sh content, finds no valid version, and either silently aborts OR
# (worse) overwrites real on-disk files with the HTML soup.
#
# That exact trap stranded Bryan's work-box at v0.7.3 until the Gitea
# `REQUIRE_SIGNIN_VIEW=false` flip. The flip fixed the symptom; this file
# fixes the *fragility* — any future private-repo install, Gitea
# re-privatization, or auth-gated mirror would hit the same silent trap.
#
# DESIGN: fail LOUD, never silently corrupt. After every fetch, before the
# caller trusts the bytes, we (a) detect the HTML-login-page trap and (b)
# validate the content shape per file type. On any failure we print an
# actionable error and return non-zero WITHOUT leaving a poisoned file in
# place.
#
# OPTIONAL AUTH: if LARRY_GITEA_TOKEN (or GITEA_TOKEN) is set, fetches add an
# `Authorization: token <PAT>` header so the updater works against a
# private repo without the public-flip. The token value is NEVER logged.
#
# SOURCING NOTE: this file is the canonical, version-controlled home of these
# validators and is listed in MANIFEST so it propagates + stays auditable.
# BUT both install-larry.sh (the curl|bash bootstrap, which runs before any
# lib/ file exists on disk) and larry.sh's self_update() (which runs before
# lib/ is sourced) carry an INLINE copy of these functions (same logic,
# condensed comments and messages) so they work pre-source. When you change a
# validator here, mirror it in those two inline blocks (each is fenced with
# `# >>> fetch-safe inline (keep in sync with lib/fetch-safe.sh) >>>`).
#
# Defines functions only; runs no code on source; touches no set -e/-u/-o
# pipefail (the caller owns those). Re-sourcing is harmless.
# _fs_curl_auth_args — emit the optional Authorization header args on stdout,
# one per line, IF a Gitea PAT is present in the environment. Never echoes the
# token to a log; the caller splices the lines straight into curl's argv.
_fs_curl_auth_args() {
local _tok="${LARRY_GITEA_TOKEN:-${GITEA_TOKEN:-}}"
# Strip CR (Cygwin/MobaXterm paste can taint an env var with a trailing \r,
# which would corrupt the HTTP header line and get the request rejected).
_tok="${_tok//$'\r'/}"
if [ -n "$_tok" ]; then
printf '%s\n' '-H'
printf '%s\n' "Authorization: token $_tok"
fi
}
# fetch_validate URL DEST KIND [MAX_TIME]
# URL — fully-qualified remote URL to fetch
# DEST — local path to write on success (left ABSENT/untouched on failure)
# KIND — content-shape contract, one of:
# version -> first line must match ^[0-9]+\.[0-9]+\.[0-9]+
# manifest -> newline list of plausible paths, no HTML chars
# script -> first line must be `#!/usr/bin/env bash`
# sh -> shebang OR at least non-HTML (lib helper files)
# text -> just "not the HTML sign-in trap" (default)
# MAX_TIME — curl --max-time seconds (default 15)
#
# Returns 0 and writes DEST only when BOTH the HTML-trap check AND the
# content-shape check pass. Returns non-zero (and prints an actionable error)
# otherwise, leaving DEST untouched so the caller never overwrites a real file
# with garbage.
fetch_validate() {
local url="$1" dest="$2" kind="${3:-text}" mt="${4:-15}"
local tmp hdr code ctype first
tmp="$(mktemp 2>/dev/null || echo "${dest}.fs.$$")"
hdr="$(mktemp 2>/dev/null || echo "${dest}.fsh.$$")"
# Build curl argv. -D dumps response headers so we can inspect Content-Type
# and the final HTTP status. -w prints the final code on stdout's tail (we
# capture it separately). We deliberately DO follow redirects (-L) so we can
# still reach a CDN/mirror that legitimately 301s, but the post-fetch checks
# below catch the /user/login HTML landing that the redirect produces.
# -f makes curl fail on HTTP 4xx/5xx, so an error body is never validated as
# file content; -w still reports the status code for the error message.
local _args=( -fsSL --max-time "$mt" -o "$tmp" -D "$hdr" -w '%{http_code}' )
# Splice optional auth header (read line-by-line to preserve spaces).
local _auth_line
while IFS= read -r _auth_line; do
[ -n "$_auth_line" ] && _args+=( "$_auth_line" )
done < <(_fs_curl_auth_args)
code="$(curl "${_args[@]}" "$url" 2>/dev/null)"
local rc=$?
code="${code//$'\r'/}"
# Hard transport failure (curl non-zero, HTTP error, or empty body).
if [ "$rc" -ne 0 ] || [ ! -s "$tmp" ]; then
# Record emptiness before the temp file is removed.
local empty=yes
[ -s "$tmp" ] && empty=no
rm -f "$tmp" "$hdr"
printf 'error: %s — fetch failed (curl rc=%s, http=%s, empty=%s). Origin unreachable, timed out, or returned an HTTP error.\n' \
"$url" "$rc" "${code:-000}" "$empty" >&2
return 1
fi
# ── HTML-login-page trap detection (ANY one of these is a hard fail) ──────
ctype="$(grep -i '^content-type:' "$hdr" 2>/dev/null | tail -1 | tr -d '\r' | tr 'A-Z' 'a-z')"
first="$(head -c 4096 "$tmp" 2>/dev/null | tr -d '\r')"
if printf '%s' "$first" | grep -qi '<!doctype html\|<html\|sign in - gitea\|<title>sign in'; then
rm -f "$tmp" "$hdr"
_fs_html_trap_error "$url"
return 1
fi
case "$ctype" in
*text/html*)
rm -f "$tmp" "$hdr"
_fs_html_trap_error "$url"
return 1
;;
esac
rm -f "$hdr"
# ── Content-shape validation per KIND ─────────────────────────────────────
local line1
line1="$(head -1 "$tmp" 2>/dev/null | tr -d '\r')"
case "$kind" in
version)
local ver
ver="$(printf '%s' "$first" | tr -d '[:space:]')"
if ! printf '%s' "$ver" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+'; then
# Print the preview before deleting the temp file that _fs_snippet reads.
printf 'error: %s — expected a semver VERSION (e.g. 0.8.4), got %s. Not valid file content.\n' \
"$url" "$(_fs_snippet "$tmp" "$first")" >&2
rm -f "$tmp"
return 1
fi
;;
manifest)
# Must contain at least one plausible path line and NO HTML angle bracket.
if printf '%s' "$first" | grep -q '<'; then
rm -f "$tmp"
printf 'error: %s — MANIFEST contains HTML markup ("<"), not a path list.\n' "$url" >&2
return 1
fi
if ! grep -Eq '^[A-Za-z0-9_][A-Za-z0-9_./-]*$' "$tmp"; then
rm -f "$tmp"
printf 'error: %s — MANIFEST has no plausible path line.\n' "$url" >&2
return 1
fi
;;
script)
if [ "$line1" != '#!/usr/bin/env bash' ]; then
# Print the preview before deleting the temp file that _fs_snippet reads.
printf 'error: %s — larry.sh must start with `#!/usr/bin/env bash`, got %s.\n' \
"$url" "$(_fs_snippet "$tmp" "$first")" >&2
rm -f "$tmp"
return 1
fi
;;
sh)
# A shebang is ideal; at minimum it must not be HTML (already checked).
case "$line1" in
'#!'*) : ;;
*)
# Non-shebang .sh (rare) — accept as long as it isn't HTML (above).
: ;;
esac
;;
text|*) : ;;
esac
# All checks passed — atomically place the validated bytes.
mkdir -p "$(dirname "$dest")" 2>/dev/null || true
mv "$tmp" "$dest" || { rm -f "$tmp"; printf 'error: cannot write %s\n' "$dest" >&2; return 1; }
return 0
}
# _fs_html_trap_error URL — print the canonical, actionable HTML-trap error.
_fs_html_trap_error() {
printf 'error: %s returned an HTML sign-in page, not file content. The Gitea repo is private or the instance requires sign-in. Either (a) make the repo public + set REQUIRE_SIGNIN_VIEW=false, or (b) set LARRY_GITEA_TOKEN=<PAT> for authenticated fetch.\n' \
"$1" >&2
}
# _fs_snippet TMPFILE FALLBACK — a short, single-line, log-safe preview of what
# we actually received (first 60 chars), so errors are diagnosable without
# dumping a full HTML page.
_fs_snippet() {
local f="$1" fb="$2" s
s="$(head -c 60 "$f" 2>/dev/null | tr -d '\r\n' )"
[ -z "$s" ] && s="$fb"
printf '"%s..."' "$s"
}