diff --git a/CHANGELOG.md b/CHANGELOG.md index c9863b8..b7da1c7 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,39 @@ All notable changes to `cloverleaf-larry` / `larry-anywhere` are recorded here. Versioning is loose-semver; bumps trigger the in-process self-update on every running client via `LARRY_BASE_URL` + `MANIFEST`. +## v0.8.4 — 2026-05-27 + +- **Installer/updater now detects HTML-sign-in-page responses and fails loud + instead of silently corrupting.** Root cause (Clover #5's diagnosis, + `Deliverables/2026-05-27-cloverleaf-larry-stuck-update-and-tab-bug.md`): a + private/sign-in-gated Gitea answers an unauthenticated raw-file read with the + **HTML Sign-In page at HTTP 200** (303 → `/user/login`, followed by `curl -L` + to a 200 HTML page). `curl -fsSL` treats that as success, so the old + installer/auto-updater parsed the HTML as VERSION/MANIFEST/`larry.sh` content + — silently aborting, or overwriting real on-disk files with HTML soup. This + is exactly what stranded a work-box at v0.7.3 until the Gitea + `REQUIRE_SIGNIN_VIEW=false` flip. +- **New `lib/fetch-safe.sh`** — a content-validating fetch wrapper + (`fetch_validate URL DEST KIND [MAX_TIME]`). After every `curl`, BEFORE + trusting the bytes, it (a) detects the HTML-login trap (`Sign In` markers, or a `text/html` + `Content-Type` when a raw file was expected) and (b) validates the content + shape per file type: VERSION must match `^[0-9]+\.[0-9]+\.[0-9]+`, MANIFEST + must be a path-list with no HTML, `larry.sh` must start with + `#!/usr/bin/env bash`, other `.sh` must be non-HTML. On any failure it prints + an actionable error and returns non-zero **without overwriting the target**. + The bootstrap `install-larry.sh` (curl|bash, runs before any lib exists) and + `larry.sh`'s `self_update()` (runs before lib is sourced) each carry a + byte-identical inline copy; the canonical file is in MANIFEST and auto-syncs. 
+- **Every remote-content fetch hardened.** `install-larry.sh` `fetch()`; + `larry.sh` agent fetch, `sync_from_manifest` MANIFEST + per-file fetches, and + `_fetch_with_fallback` (Phase-B VERSION + larry.sh) all route through the + validator. No trusted-content fetch still uses raw `curl -fsSL`. +- **Optional `LARRY_GITEA_TOKEN` (alias `GITEA_TOKEN`) for authenticated + fetch.** When set, fetches add `Authorization: token ` so the + installer/updater works against a PRIVATE repo without the public-flip. The + token is never hardcoded and never logged. Documented in `--help` + MANUAL.md. + ## v0.8.3 — 2026-05-27 - **Tab-completion trailing space no longer breaks command dispatch.** The diff --git a/MANIFEST b/MANIFEST index bfb0925..196c470 100644 --- a/MANIFEST +++ b/MANIFEST @@ -27,6 +27,11 @@ agents/regress.md # Cygwin/MobaXterm CR-taint defense primitives (sourced by every tool) lib/cygwin-safe.sh +# v0.8.4: content-validating fetch (HTML-sign-in-page trap detection + per- +# file-type shape checks) for the installer/auto-updater. Canonical home of the +# validators that install-larry.sh and larry.sh also carry inline (pre-source). +lib/fetch-safe.sh + # Auth implementation lib/oauth.sh diff --git a/MANUAL.md b/MANUAL.md index 91f5533..a87a1d7 100644 --- a/MANUAL.md +++ b/MANUAL.md @@ -24,6 +24,37 @@ export HCISITEDIR="$HCIROOT/$HCISITE" --- +## Auto-update & origin + +`larry.sh` (and `install-larry.sh`) fetch updates from a single Gitea origin: +`$LARRY_BASE_URL` (default `https://git.bjnoela.com/bryan/cloverleaf-larry/raw/branch/main`). + +Env vars that control fetching: + +- `LARRY_BASE_URL` — override the origin (fork/mirror). No trailing slash. +- `LARRY_NO_UPDATE=1` (or `--no-update`) — skip the self-update entirely. +- `LARRY_GITEA_TOKEN` (alias `GITEA_TOKEN`) — a Gitea **personal access token** + (read scope) for **authenticated fetch against a PRIVATE repo**. When set, + every update/install fetch adds `Authorization: token `. 
The token value + is never logged. Use this when the Gitea repo is private or the instance has + `REQUIRE_SIGNIN_VIEW=true`, so you don't have to flip the repo public. + + ```bash + LARRY_GITEA_TOKEN= larry # authenticated auto-update + LARRY_GITEA_TOKEN= bash install-larry.sh # authenticated install + ``` + +**Hardening (v0.8.4):** every remote fetch is content-validated before the +bytes are trusted. If the origin returns the Gitea HTML *Sign-In* page (which +Gitea serves at HTTP 200 for an unauthenticated read of a private repo), the +installer/updater **fails loud** with an actionable error and does **not** +overwrite any real file — instead of silently parsing the HTML as +VERSION/MANIFEST/script content (the bug that stranded a client at v0.7.3). +The remedy is exactly what the error states: make the repo public + +`REQUIRE_SIGNIN_VIEW=false`, **or** set `LARRY_GITEA_TOKEN`. + +--- + ## Authentication (`larry-auth.sh`, `lib/oauth.sh`) Only needed if you're running the Larry REPL (`larry.sh`). The lib/ tools themselves never call Anthropic — they're pure local bash. diff --git a/VERSION b/VERSION index ee94dd8..b60d719 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.8.3 +0.8.4 diff --git a/install-larry.sh b/install-larry.sh index 5ed2cc6..c218232 100755 --- a/install-larry.sh +++ b/install-larry.sh @@ -9,9 +9,11 @@ # LARRY_BASE_URL=https://example.com/larry-anywhere bash install-larry.sh # # Env vars: -# LARRY_HOME install location (default: $HOME/.larry) -# LARRY_BASE_URL where to fetch files from (no trailing slash) -# LARRY_BIN_DIR where to symlink the `larry` command (default: $HOME/bin) +# LARRY_HOME install location (default: $HOME/.larry) +# LARRY_BASE_URL where to fetch files from (no trailing slash) +# LARRY_BIN_DIR where to symlink the `larry` command (default: $HOME/bin) +# LARRY_GITEA_TOKEN optional Gitea PAT (read scope) for authenticated fetch +# against a PRIVATE repo. Alias: GITEA_TOKEN. Never logged. 
set -eu LARRY_HOME="${LARRY_HOME:-$HOME/.larry}" @@ -35,6 +37,91 @@ ok() { printf ' %s✓%s %s\n' "$C_GREEN" "$C_RESET" "$*"; } warn() { printf ' %s!%s %s\n' "$C_YELLOW" "$C_RESET" "$*"; } die() { printf '%serror:%s %s\n' "$C_RED" "$C_RESET" "$*" >&2; exit 1; } +# >>> fetch-safe inline (keep in sync with lib/fetch-safe.sh) >>> +# install-larry.sh is the curl|bash bootstrap — it runs BEFORE any lib/ file +# exists on disk, so it cannot source lib/fetch-safe.sh. We inline a byte- +# identical copy of the validators. Root cause + design: see +# Deliverables/2026-05-27-cloverleaf-larry-stuck-update-and-tab-bug.md and +# lib/fetch-safe.sh's header. The trap: Gitea answers an unauthenticated raw +# read with HTTP 200 + the HTML Sign-In page; `curl -fsSL` calls that success +# and the installer parses HTML as file content. We detect + fail loud. +_fs_curl_auth_args() { + local _tok="${LARRY_GITEA_TOKEN:-${GITEA_TOKEN:-}}" + _tok="${_tok//$'\r'/}" + if [ -n "$_tok" ]; then + printf '%s\n' '-H' + printf '%s\n' "Authorization: token $_tok" + fi +} +_fs_html_trap_error() { + printf 'error: %s returned an HTML sign-in page, not file content. The Gitea repo is private or the instance requires sign-in. Either (a) make the repo public + set REQUIRE_SIGNIN_VIEW=false, or (b) set LARRY_GITEA_TOKEN= for authenticated fetch.\n' \ + "$1" >&2 +} +_fs_snippet() { + local f="$1" fb="$2" s + s="$(head -c 60 "$f" 2>/dev/null | tr -d '\r\n' )" + [ -z "$s" ] && s="$fb" + printf '"%s..."' "$s" +} +# fetch_validate URL DEST KIND [MAX_TIME] — see lib/fetch-safe.sh for the full +# contract. KIND in {version,manifest,script,sh,text}. 
+fetch_validate() { + local url="$1" dest="$2" kind="${3:-text}" mt="${4:-15}" + local tmp hdr code ctype first line1 + tmp="$(mktemp 2>/dev/null || echo "${dest}.fs.$$")" + hdr="$(mktemp 2>/dev/null || echo "${dest}.fsh.$$")" + local _args=( -sSL --max-time "$mt" -o "$tmp" -D "$hdr" -w '%{http_code}' ) + local _auth_line + while IFS= read -r _auth_line; do + [ -n "$_auth_line" ] && _args+=( "$_auth_line" ) + done < <(_fs_curl_auth_args) + code="$(curl "${_args[@]}" "$url" 2>/dev/null)" + local rc=$? + code="${code//$'\r'/}" + if [ "$rc" -ne 0 ] || [ ! -s "$tmp" ]; then + rm -f "$tmp" "$hdr" + printf 'error: %s — fetch failed (curl rc=%s). Origin unreachable or timed out.\n' "$url" "$rc" >&2 + return 1 + fi + ctype="$(grep -i '^content-type:' "$hdr" 2>/dev/null | tail -1 | tr -d '\r' | tr 'A-Z' 'a-z')" + first="$(head -c 4096 "$tmp" 2>/dev/null | tr -d '\r')" + if printf '%s' "$first" | grep -qi 'sign in'; then + rm -f "$tmp" "$hdr"; _fs_html_trap_error "$url"; return 1 + fi + case "$ctype" in + *text/html*) rm -f "$tmp" "$hdr"; _fs_html_trap_error "$url"; return 1 ;; + esac + rm -f "$hdr" + line1="$(head -1 "$tmp" 2>/dev/null | tr -d '\r')" + case "$kind" in + version) + local ver; ver="$(printf '%s' "$first" | tr -d '[:space:]')" + if ! printf '%s' "$ver" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+'; then + printf 'error: %s — expected a semver VERSION (e.g. 0.8.4), got %s.\n' "$url" "$(_fs_snippet "$tmp" "$first")" >&2 + rm -f "$tmp" + return 1 + fi ;; + manifest) + if printf '%s' "$first" | grep -q '<'; then + rm -f "$tmp"; printf 'error: %s — MANIFEST contains HTML markup ("<").\n' "$url" >&2; return 1 + fi + if ! 
grep -Eq '^[A-Za-z0-9_][A-Za-z0-9_./-]*$' "$tmp"; then + rm -f "$tmp"; printf 'error: %s — MANIFEST has no plausible path line.\n' "$url" >&2; return 1 + fi ;; + script) + if [ "$line1" != '#!/usr/bin/env bash' ]; then + printf 'error: %s — larry.sh must start with `#!/usr/bin/env bash`, got %s.\n' "$url" "$(_fs_snippet "$tmp" "$first")" >&2 + rm -f "$tmp" + return 1 + fi ;; + sh|text|*) : ;; + esac + mkdir -p "$(dirname "$dest")" 2>/dev/null || true + mv "$tmp" "$dest" || { rm -f "$tmp"; printf 'error: cannot write %s\n' "$dest" >&2; return 1; } + return 0 +} +# <<< fetch-safe inline <<< + # ───────────────────────────────────────────────────────────────────────────── # Detect platform # ───────────────────────────────────────────────────────────────────────────── @@ -81,16 +168,32 @@ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]:-$0}")" 2>/dev/null && pwd)" || SC # No fallback; if $LARRY_BASE_URL is unreachable we die with a clear error # telling the user to verify the URL or set an alternate mirror. +# _kind_for REL — infer the content-shape contract for a manifest path so +# every fetch gets validated (HTML-trap + shape) before we trust the bytes. +_kind_for() { + case "$1" in + larry.sh) printf 'script' ;; + VERSION) printf 'version' ;; + MANIFEST) printf 'manifest' ;; + *.sh) printf 'sh' ;; + *) printf 'text' ;; + esac +} + fetch() { # $1 = remote relative path, $2 = local destination if [ -n "$LARRY_BASE_URL" ]; then say "fetching $1" - if curl -fsSL --max-time 30 "$LARRY_BASE_URL/$1" -o "$2" 2>/dev/null && [ -s "$2" ]; then + # v0.8.4 hardening: validate every remote fetch (HTML-sign-in-page trap + + # content-shape) BEFORE trusting the bytes. fetch_validate writes $2 only + # on success; on failure it prints an actionable error and leaves $2 + # untouched, so we never overwrite a real file with HTML soup. 
+ if fetch_validate "$LARRY_BASE_URL/$1" "$2" "$(_kind_for "$1")" 30; then ok "$2" return 0 fi rm -f "$2" - die "install failed: cannot reach LARRY_BASE_URL=$LARRY_BASE_URL (fetching $1) — verify the URL or set LARRY_BASE_URL to a reachable mirror" + die "install failed: cannot fetch $1 from LARRY_BASE_URL=$LARRY_BASE_URL — see error above (verify the URL, repo visibility, or set LARRY_GITEA_TOKEN / a reachable mirror)" elif [ -n "$SCRIPT_DIR" ] && [ -f "$SCRIPT_DIR/$1" ]; then cp "$SCRIPT_DIR/$1" "$2" && ok "copied $1 (local)" else @@ -106,6 +209,8 @@ fetch agents/cloverleaf-cheatsheet.md "$LARRY_HOME/agents/cloverleaf-cheatsheet. fetch agents/regress.md "$LARRY_HOME/agents/regress.md" fetch larry-rollback.sh "$LARRY_HOME/larry-rollback.sh" fetch larry-auth.sh "$LARRY_HOME/larry-auth.sh" +fetch lib/fetch-safe.sh "$LARRY_HOME/lib/fetch-safe.sh" +fetch lib/cygwin-safe.sh "$LARRY_HOME/lib/cygwin-safe.sh" fetch lib/oauth.sh "$LARRY_HOME/lib/oauth.sh" fetch lib/ssh-helper.sh "$LARRY_HOME/lib/ssh-helper.sh" fetch lib/lessons.sh "$LARRY_HOME/lib/lessons.sh" diff --git a/larry.sh b/larry.sh index 3b03c16..22bb8bb 100755 --- a/larry.sh +++ b/larry.sh @@ -26,6 +26,14 @@ # LARRY_MODEL Claude model (default: claude-sonnet-4-6) # LARRY_MAX_TOKENS max output tokens per turn (default: 8192) # LARRY_NO_UPDATE set to 1 to disable self-update +# LARRY_GITEA_TOKEN optional Gitea PAT (read scope) for authenticated +# fetch against a PRIVATE repo (alias: GITEA_TOKEN). +# When set, update/install fetches add an +# "Authorization: token " header. Never logged. +# v0.8.4: lets the updater work against a private repo +# without flipping it public. If a fetch returns the +# Gitea HTML sign-in page (HTTP 200), the updater now +# FAILS LOUD instead of parsing HTML as file content. 
# ANTHROPIC_API_KEY overrides $LARRY_HOME/.env if set # # Slash commands during chat: @@ -57,7 +65,7 @@ set -o pipefail # ───────────────────────────────────────────────────────────────────────────── # Config # ───────────────────────────────────────────────────────────────────────────── -LARRY_VERSION="0.8.3" +LARRY_VERSION="0.8.4" LARRY_HOME="${LARRY_HOME:-$HOME/.larry}" # ───────────────────────────────────────────────────────────────────────────── @@ -140,6 +148,95 @@ err() { printf '%serror:%s %s\n' "$C_RED" "$C_RESET" "$*" >&2; } warn() { printf '%swarn:%s %s\n' "$C_YELLOW" "$C_RESET" "$*" >&2; } larry_say() { printf '%s%slarry>%s %s\n' "$C_MAGENTA" "$C_BOLD" "$C_RESET" "$*"; } +# >>> fetch-safe inline (keep in sync with lib/fetch-safe.sh) >>> +# self_update() (below) runs BEFORE lib/cygwin-safe.sh + lib/fetch-safe.sh are +# sourced (the source point is ~line 850, after the lib dir resolves). So the +# auto-updater carries a byte-identical inline copy of the fetch validators. +# Root cause + design: see lib/fetch-safe.sh's header and +# Deliverables/2026-05-27-cloverleaf-larry-stuck-update-and-tab-bug.md. The +# trap: Gitea answers an unauthenticated raw read with HTTP 200 + the HTML +# Sign-In page; `curl -fsSL` calls that success and the updater parses HTML as +# VERSION/MANIFEST/larry.sh content (silent abort, or overwrites real files +# with HTML). We detect + fail loud, never overwriting a real file. +# Optional LARRY_GITEA_TOKEN / GITEA_TOKEN env var enables authenticated fetch. +_fs_curl_auth_args() { + local _tok="${LARRY_GITEA_TOKEN:-${GITEA_TOKEN:-}}" + _tok="${_tok//$'\r'/}" + if [ -n "$_tok" ]; then + printf '%s\n' '-H' + printf '%s\n' "Authorization: token $_tok" + fi +} +_fs_html_trap_error() { + printf 'error: %s returned an HTML sign-in page, not file content. The Gitea repo is private or the instance requires sign-in. 
Either (a) make the repo public + set REQUIRE_SIGNIN_VIEW=false, or (b) set LARRY_GITEA_TOKEN= for authenticated fetch.\n' \ + "$1" >&2 +} +_fs_snippet() { + local f="$1" fb="$2" s + s="$(head -c 60 "$f" 2>/dev/null | tr -d '\r\n' )" + [ -z "$s" ] && s="$fb" + printf '"%s..."' "$s" +} +# fetch_validate URL DEST KIND [MAX_TIME] — see lib/fetch-safe.sh for the full +# contract. KIND in {version,manifest,script,sh,text}. Writes DEST only on +# success; returns non-zero + leaves DEST untouched on any failure. +fetch_validate() { + local url="$1" dest="$2" kind="${3:-text}" mt="${4:-15}" + local tmp hdr code ctype first line1 + tmp="$(mktemp 2>/dev/null || echo "${dest}.fs.$$")" + hdr="$(mktemp 2>/dev/null || echo "${dest}.fsh.$$")" + local _args=( -sSL --max-time "$mt" -o "$tmp" -D "$hdr" -w '%{http_code}' ) + local _auth_line + while IFS= read -r _auth_line; do + [ -n "$_auth_line" ] && _args+=( "$_auth_line" ) + done < <(_fs_curl_auth_args) + code="$(curl "${_args[@]}" "$url" 2>/dev/null)" + local rc=$? + code="${code//$'\r'/}" + if [ "$rc" -ne 0 ] || [ ! -s "$tmp" ]; then + rm -f "$tmp" "$hdr" + printf 'error: %s — fetch failed (curl rc=%s). Origin unreachable or timed out.\n' "$url" "$rc" >&2 + return 1 + fi + ctype="$(grep -i '^content-type:' "$hdr" 2>/dev/null | tail -1 | tr -d '\r' | tr 'A-Z' 'a-z')" + first="$(head -c 4096 "$tmp" 2>/dev/null | tr -d '\r')" + if printf '%s' "$first" | grep -qi 'sign in'; then + rm -f "$tmp" "$hdr"; _fs_html_trap_error "$url"; return 1 + fi + case "$ctype" in + *text/html*) rm -f "$tmp" "$hdr"; _fs_html_trap_error "$url"; return 1 ;; + esac + rm -f "$hdr" + line1="$(head -1 "$tmp" 2>/dev/null | tr -d '\r')" + case "$kind" in + version) + local ver; ver="$(printf '%s' "$first" | tr -d '[:space:]')" + if ! printf '%s' "$ver" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+'; then + printf 'error: %s — expected a semver VERSION (e.g. 
0.8.4), got %s.\n' "$url" "$(_fs_snippet "$tmp" "$first")" >&2 + rm -f "$tmp" + return 1 + fi ;; + manifest) + if printf '%s' "$first" | grep -q '<'; then + rm -f "$tmp"; printf 'error: %s — MANIFEST contains HTML markup ("<").\n' "$url" >&2; return 1 + fi + if ! grep -Eq '^[A-Za-z0-9_][A-Za-z0-9_./-]*$' "$tmp"; then + rm -f "$tmp"; printf 'error: %s — MANIFEST has no plausible path line.\n' "$url" >&2; return 1 + fi ;; + script) + if [ "$line1" != '#!/usr/bin/env bash' ]; then + printf 'error: %s — larry.sh must start with `#!/usr/bin/env bash`, got %s.\n' "$url" "$(_fs_snippet "$tmp" "$first")" >&2 + rm -f "$tmp" + return 1 + fi ;; + sh|text|*) : ;; + esac + mkdir -p "$(dirname "$dest")" 2>/dev/null || true + mv "$tmp" "$dest" || { rm -f "$tmp"; printf 'error: cannot write %s\n' "$dest" >&2; return 1; } + return 0 +} +# <<< fetch-safe inline <<< + # ───────────────────────────────────────────────────────────────────────────── # CLI args # ───────────────────────────────────────────────────────────────────────────── @@ -296,7 +393,10 @@ fetch_agents_or_warn() { if [ -n "$LARRY_AGENTS_URL" ]; then log "fetching agent definitions from $LARRY_AGENTS_URL" for f in $LARRY_AGENT_FILES; do - curl -fsSL --max-time 10 "$LARRY_AGENTS_URL/$f" -o "$LARRY_HOME/agents/$f" \ + # v0.8.4: validate the fetch (HTML-sign-in-page trap + non-HTML shape) + # before trusting the bytes; on failure fall back to the built-in agent + # rather than writing an HTML sign-in page into agents/. + fetch_validate "$LARRY_AGENTS_URL/$f" "$LARRY_HOME/agents/$f" text 10 \ || { warn "could not fetch $f — using built-in fallback"; write_fallback_agent "$f"; } done else @@ -370,7 +470,12 @@ _record_origin() { sync_from_manifest() { local base="$1" local manifest="$LARRY_HOME/.manifest.new" - curl -fsSL --max-time 10 "$base/MANIFEST" -o "$manifest" 2>/dev/null || { + # v0.8.4: validate the MANIFEST fetch. 
If Gitea is private/sign-in-gated it + # answers with the HTML login page at HTTP 200; the old `curl -fsSL` treated + # that as success and the loop below then iterated HTML lines as file paths + # and overwrote real on-disk files with HTML. fetch_validate fails loud and + # leaves $manifest absent, so we abort cleanly without corrupting anything. + fetch_validate "$base/MANIFEST" "$manifest" manifest 10 || { rm -f "$manifest" return 1 } @@ -393,7 +498,17 @@ sync_from_manifest() { dest="$LARRY_HOME/$path" tmp="$dest.new" mkdir -p "$(dirname "$dest")" 2>/dev/null - if curl -fsSL --max-time 15 "$base/$path" -o "$tmp" 2>/dev/null && [ -s "$tmp" ]; then + # v0.8.4: per-file content validation. Infer the shape contract from the + # path so a sign-in-page (or any HTML) response can never be written over a + # real lib/agent/metadata file. fetch_validate writes $tmp only on success. + local _kind + case "$path" in + VERSION) _kind=version ;; + MANIFEST) _kind=manifest ;; + *.sh) _kind=sh ;; + *) _kind=text ;; + esac + if fetch_validate "$base/$path" "$tmp" "$_kind" 15 && [ -s "$tmp" ]; then if [ ! -f "$dest" ] || ! cmp -s "$dest" "$tmp"; then mv "$tmp" "$dest" case "$path" in *.sh) chmod +x "$dest" 2>/dev/null || true ;; esac @@ -429,13 +544,27 @@ sync_from_manifest_with_fallback() { return 1 } -# _fetch_with_fallback REL_PATH DEST [MAX_TIME] — v0.7.4 single-source fetch -# (name kept for call-site compatibility). Returns 0 if the file pulled -# non-empty, non-zero otherwise. Records the winning origin slot in -# $_LARRY_LAST_ORIGIN (always "primary" in single-source mode). +# _fetch_with_fallback REL_PATH DEST [MAX_TIME] [KIND] — v0.7.4 single-source +# fetch (name kept for call-site compatibility). Returns 0 if the file pulled +# AND passed content validation, non-zero otherwise. Records the winning +# origin slot in $_LARRY_LAST_ORIGIN (always "primary" in single-source mode). 
+# +# v0.8.4: routes through fetch_validate so the Gitea HTML-sign-in-page trap +# (HTTP 200 + login HTML) is caught BEFORE the bytes are trusted. KIND defaults +# to a shape inferred from REL_PATH (VERSION->version, larry.sh->script, +# *.sh->sh, else text). _fetch_with_fallback() { - local rel="$1" dest="$2" mt="${3:-15}" - if curl -fsSL --max-time "$mt" "$LARRY_BASE_URL/$rel" -o "$dest" 2>/dev/null && [ -s "$dest" ]; then + local rel="$1" dest="$2" mt="${3:-15}" kind="${4:-}" + if [ -z "$kind" ]; then + case "$rel" in + VERSION) kind=version ;; + MANIFEST) kind=manifest ;; + larry.sh) kind=script ;; + *.sh) kind=sh ;; + *) kind=text ;; + esac + fi + if fetch_validate "$LARRY_BASE_URL/$rel" "$dest" "$kind" "$mt" && [ -s "$dest" ]; then _record_origin primary "$LARRY_BASE_URL" return 0 fi diff --git a/lib/fetch-safe.sh b/lib/fetch-safe.sh new file mode 100644 index 0000000..5e4be0e --- /dev/null +++ b/lib/fetch-safe.sh @@ -0,0 +1,190 @@ +#!/usr/bin/env bash +# fetch-safe.sh — content-validating remote fetch for the Larry-Anywhere +# installer + auto-updater. +# +# WHY THIS EXISTS (root cause — see +# Deliverables/2026-05-27-cloverleaf-larry-stuck-update-and-tab-bug.md, +# Clover #5's diagnosis, "Problem 1"): +# +# `curl -fsSL` against a Gitea raw-file URL, when the Gitea instance +# requires sign-in (or the repo is private), returns the HTML *Sign-In +# page* with **HTTP 200** (Gitea answers an unauthenticated raw read with +# 303 -> /user/login, and `curl -L` follows it to a 200 HTML page). +# `curl -fsSL` only fails on HTTP 4xx/5xx, so it treats this 200-HTML as +# SUCCESS. The installer/updater then parses the HTML as VERSION/MANIFEST/ +# larry.sh content, finds no valid version, and either silently aborts OR +# (worse) overwrites real on-disk files with the HTML soup. +# +# That exact trap stranded Bryan's work-box at v0.7.3 until the Gitea +# `REQUIRE_SIGNIN_VIEW=false` flip. 
The flip fixed the symptom; this file +# fixes the *fragility* — any future private-repo install, Gitea +# re-privatization, or auth-gated mirror would hit the same silent trap. +# +# DESIGN: fail LOUD, never silently corrupt. After every fetch, before the +# caller trusts the bytes, we (a) detect the HTML-login-page trap and (b) +# validate the content shape per file type. On any failure we print an +# actionable error and return non-zero WITHOUT leaving a poisoned file in +# place. +# +# OPTIONAL AUTH: if LARRY_GITEA_TOKEN (or GITEA_TOKEN) is set, fetches add an +# `Authorization: token ` header so the updater works against a +# private repo without the public-flip. The token value is NEVER logged. +# +# SOURCING NOTE: this file is the canonical, version-controlled home of these +# validators and is listed in MANIFEST so it propagates + stays auditable. +# BUT both install-larry.sh (the curl|bash bootstrap, which runs before any +# lib/ file exists on disk) and larry.sh's self_update() (which runs before +# lib/ is sourced) carry an INLINE, byte-identical copy of these functions so +# they work pre-source. When you change a validator here, mirror it in those +# two inline blocks (each is fenced with `# >>> fetch-safe inline (keep in +# sync with lib/fetch-safe.sh) >>>`). +# +# Defines functions only; runs no code on source; touches no set -e/-u/-o +# pipefail (the caller owns those). Re-sourcing is harmless. + +# _fs_curl_auth_args — emit the optional Authorization header args on stdout, +# one per line, IF a Gitea PAT is present in the environment. Never echoes the +# token to a log; the caller splices the lines straight into curl's argv. +_fs_curl_auth_args() { + local _tok="${LARRY_GITEA_TOKEN:-${GITEA_TOKEN:-}}" + # Strip CR (Cygwin/MobaXterm paste can taint an env var with a trailing \r, + # which would corrupt the HTTP header line and get the request rejected). 
+ _tok="${_tok//$'\r'/}" + if [ -n "$_tok" ]; then + printf '%s\n' '-H' + printf '%s\n' "Authorization: token $_tok" + fi +} + +# fetch_validate URL DEST KIND [MAX_TIME] +# URL — fully-qualified remote URL to fetch +# DEST — local path to write on success (left ABSENT/untouched on failure) +# KIND — content-shape contract, one of: +# version -> first line must match ^[0-9]+\.[0-9]+\.[0-9]+ +# manifest -> newline list of plausible paths, no HTML chars +# script -> first line must be `#!/usr/bin/env bash` +# sh -> shebang OR at least non-HTML (lib helper files) +# text -> just "not the HTML sign-in trap" (default) +# MAX_TIME — curl --max-time seconds (default 15) +# +# Returns 0 and writes DEST only when BOTH the HTML-trap check AND the +# content-shape check pass. Returns non-zero (and prints an actionable error) +# otherwise, leaving DEST untouched so the caller never overwrites a real file +# with garbage. +fetch_validate() { + local url="$1" dest="$2" kind="${3:-text}" mt="${4:-15}" + local tmp hdr code ctype first + tmp="$(mktemp 2>/dev/null || echo "${dest}.fs.$$")" + hdr="$(mktemp 2>/dev/null || echo "${dest}.fsh.$$")" + + # Build curl argv. -D dumps response headers so we can inspect Content-Type + # and the final HTTP status. -w prints the final code on stdout's tail (we + # capture it separately). We deliberately DO follow redirects (-L) so we can + # still reach a CDN/mirror that legitimately 301s, but the post-fetch checks + # below catch the /user/login HTML landing that the redirect produces. + local _args=( -sSL --max-time "$mt" -o "$tmp" -D "$hdr" -w '%{http_code}' ) + # Splice optional auth header (read line-by-line to preserve spaces). + local _auth_line + while IFS= read -r _auth_line; do + [ -n "$_auth_line" ] && _args+=( "$_auth_line" ) + done < <(_fs_curl_auth_args) + + code="$(curl "${_args[@]}" "$url" 2>/dev/null)" + local rc=$? + code="${code//$'\r'/}" + + # Hard transport failure (curl non-zero, or empty body). + if [ "$rc" -ne 0 ] || [ ! 
-s "$tmp" ]; then + printf 'error: %s — fetch failed (curl rc=%s, empty=%s). Origin unreachable or timed out.\n' \ + "$url" "$rc" "$([ -s "$tmp" ] && echo no || echo yes)" >&2 + rm -f "$tmp" "$hdr" + return 1 + fi + + # ── HTML-login-page trap detection (ANY one of these is a hard fail) ────── + ctype="$(grep -i '^content-type:' "$hdr" 2>/dev/null | tail -1 | tr -d '\r' | tr 'A-Z' 'a-z')" + first="$(head -c 4096 "$tmp" 2>/dev/null | tr -d '\r')" + + if printf '%s' "$first" | grep -qi 'sign in'; then + rm -f "$tmp" "$hdr" + _fs_html_trap_error "$url" + return 1 + fi + case "$ctype" in + *text/html*) + rm -f "$tmp" "$hdr" + _fs_html_trap_error "$url" + return 1 + ;; + esac + + rm -f "$hdr" + + # ── Content-shape validation per KIND ───────────────────────────────────── + local line1 + line1="$(head -1 "$tmp" 2>/dev/null | tr -d '\r')" + case "$kind" in + version) + local ver + ver="$(printf '%s' "$first" | tr -d '[:space:]')" + if ! printf '%s' "$ver" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+'; then + printf 'error: %s — expected a semver VERSION (e.g. 0.8.4), got %s. Not valid file content.\n' \ + "$url" "$(_fs_snippet "$tmp" "$first")" >&2 + rm -f "$tmp" + return 1 + fi + ;; + manifest) + # Must contain at least one plausible path line and NO HTML angle bracket. + if printf '%s' "$first" | grep -q '<'; then + rm -f "$tmp" + printf 'error: %s — MANIFEST contains HTML markup ("<"), not a path list.\n' "$url" >&2 + return 1 + fi + if ! grep -Eq '^[A-Za-z0-9_][A-Za-z0-9_./-]*$' "$tmp"; then + rm -f "$tmp" + printf 'error: %s — MANIFEST has no plausible path line.\n' "$url" >&2 + return 1 + fi + ;; + script) + if [ "$line1" != '#!/usr/bin/env bash' ]; then + printf 'error: %s — larry.sh must start with `#!/usr/bin/env bash`, got %s.\n' \ + "$url" "$(_fs_snippet "$tmp" "$first")" >&2 + rm -f "$tmp" + return 1 + fi + ;; + sh) + # A shebang is ideal; at minimum it must not be HTML (already checked). 
+ case "$line1" in + '#!'*) : ;; + *) + # Non-shebang .sh (rare) — accept as long as it isn't HTML (above). + : ;; + esac + ;; + text|*) : ;; + esac + + # All checks passed — atomically place the validated bytes. + mkdir -p "$(dirname "$dest")" 2>/dev/null || true + mv "$tmp" "$dest" || { rm -f "$tmp"; printf 'error: cannot write %s\n' "$dest" >&2; return 1; } + return 0 +} + +# _fs_html_trap_error URL — print the canonical, actionable HTML-trap error. +_fs_html_trap_error() { + printf 'error: %s returned an HTML sign-in page, not file content. The Gitea repo is private or the instance requires sign-in. Either (a) make the repo public + set REQUIRE_SIGNIN_VIEW=false, or (b) set LARRY_GITEA_TOKEN= for authenticated fetch.\n' \ + "$1" >&2 +} + +# _fs_snippet TMPFILE FALLBACK — a short, single-line, log-safe preview of what +# we actually received (first 60 chars), so errors are diagnosable without +# dumping a full HTML page. +_fs_snippet() { + local f="$1" fb="$2" s + s="$(head -c 60 "$f" 2>/dev/null | tr -d '\r\n' )" + [ -z "$s" ] && s="$fb" + printf '"%s..."' "$s" +}
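
Reviewer sketch (not part of the patch): the per-KIND shape checks that `fetch_validate` applies after a download can be tried offline against local files. The helper names `sketch_kind_for` and `sketch_validate_shape` are hypothetical and exist only for this illustration. The regexes and the `sign in` marker are taken from the diff above; the curl call, the `Content-Type` inspection, and the token handling are deliberately left out, so this is an approximation of the contract, not the shipped validator.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; NOT the project's fetch_validate.
# sketch_kind_for / sketch_validate_shape are hypothetical names.

# Map a manifest-relative path to a content-shape KIND (mirrors _kind_for).
sketch_kind_for() {
  case "$1" in
    larry.sh) printf 'script' ;;
    VERSION)  printf 'version' ;;
    MANIFEST) printf 'manifest' ;;
    *.sh)     printf 'sh' ;;
    *)        printf 'text' ;;
  esac
}

# Return 0 if FILE plausibly matches KIND, non-zero otherwise.
# Never modifies FILE; the caller decides what to do on failure.
sketch_validate_shape() {
  local file="$1" kind="$2" first line1
  [ -s "$file" ] || return 1
  first="$(head -c 4096 "$file" | tr -d '\r')"
  line1="$(head -n 1 "$file" | tr -d '\r')"
  # A sign-in marker is a hard failure for every KIND.
  if printf '%s' "$first" | grep -qi 'sign in'; then
    return 1
  fi
  case "$kind" in
    version)
      printf '%s' "$first" | tr -d '[:space:]' \
        | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+' ;;
    manifest)
      if printf '%s' "$first" | grep -q '<'; then return 1; fi
      grep -Eq '^[A-Za-z0-9_][A-Za-z0-9_./-]*$' "$file" ;;
    script)
      [ "$line1" = '#!/usr/bin/env bash' ] ;;
    *)
      return 0 ;;
  esac
}

# Small demo against throwaway files.
demo_dir="$(mktemp -d)"
printf '0.8.4\n' > "$demo_dir/VERSION"
printf '<html><title>Sign In</title></html>\n' > "$demo_dir/login.html"
sketch_validate_shape "$demo_dir/VERSION" "$(sketch_kind_for VERSION)" \
  && echo "VERSION ok"
sketch_validate_shape "$demo_dir/login.html" version \
  || echo "login page rejected"
rm -rf "$demo_dir"
```

Keeping the shape check separate from the transport step, as in this sketch, makes the validators testable without a Gitea instance; whether the project wants that split is a design choice for the maintainers.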