cloverleaf-larry/install-larry.sh
Bryan Johnson 60b8f0e1c8 v0.8.2: Presidio sidecar for free-text NER (tier-5) — closes V1
The only path that closes V1 (free-text PHI gap — the dominant real-world
failure mode per Vera). Opt-in install; larry runs in v0.8.1 mode on hosts
without Presidio (MobaXterm/Cygwin per Bryan's accepted tradeoff).

New files:
- lib/phi-presidio-sidecar.py — FastAPI service on 127.0.0.1:$LARRY_PHI_PORT
  (default 41189). Presidio AnalyzerEngine + AnonymizerEngine over spaCy
  en_core_web_sm + 3 HL7-specific custom recognizers (HL7_MRN, HL7_CARET_NAME,
  HL7_PHONE_BARE). POST /redact and GET /health.
- lib/phi-sidecar.sh — lifecycle (start/stop/status/health/ensure). ensure
  is idempotent; called backgrounded from main_loop so it never blocks the
  first prompt. Honors LARRY_PHI_VENV.
- lib/phi-client.sh — bash client (phi_client_available / phi_redact_text /
  phi_redact_entities). CR-safe; 5s timeout bounds tier-5 stall.

larry.sh:
- auto_detect_phi gains tier-5: after tiers 1-4, before status summary,
  source phi-client.sh, run Presidio on a token-masked copy of the input,
  tokenize each entity through hl7-sanitize.sh tokenize-value (category
  presidio_<TYPE>) so token IDs stay stable. Honors confirm + strict modes.
  Removed the v0.7.3 early-return that skipped past tier-5 when tiers 1-4
  found nothing — pure prose now always reaches tier-5.
- Token-safe substitution: existing [[...]] tokens are pulled to sentinels,
  tier-5 value is replaced, sentinels restored — prevents the token-within-
  token corruption that naive literal-replace caused on already-tokenized
  text. Acronym guard drops HL7/clinical jargon (SSN/MRN/DOB/ADT) Presidio
  over-tags as ORGANIZATION.
- Graceful degradation: sidecar unreachable → tier-5 no-ops with a one-time
  stderr warning. /phi-sidecar slash command + completion table.

install-larry.sh:
- Probes python3 3.9+; offers to create $LARRY_HOME/phi-venv and install
  presidio + fastapi + uvicorn + en_core_web_sm. Skips silently (with a
  v0.8.1-mode note) on Cygwin/MobaXterm without python3, and on
  non-interactive pipe installs. Sets LARRY_PHI_VENV in the larry shim.

MANIFEST: three new lib files added for auto-sync.

Prototype validation (Bryan's Mac, Apple Silicon, Python 3.14):
  cold start (en_core_web_sm): ~9s   (vs ~82s if Presidio auto-grabs _lg;
                                       we pin _sm for the REPL budget)
  warm analyzer latency:       P50 20.6ms / P95 22.7ms
  end-to-end HTTP round-trip:  ~57ms warm; ~150ms first-post-startup
All comfortably under the 200ms-per-turn budget.

MobaXterm verdict: v0.8.2 is Mac/Linux-only. MobaXterm stays on v0.8.1 +
nudges, per Bryan's explicit acceptance. install-larry.sh enforces this
by platform detection; larry.sh tier-5 silently no-ops when the sidecar
is absent (which IS the MobaXterm path — no code is platform-gated).

Verification: bash -n clean on larry.sh + all 3 new lib scripts; python3
ast.parse clean on the sidecar; end-to-end tier-5 tested live against the
sidecar (pure prose, rule-pack+tier-5 combined with no token corruption,
!nophi bypass); strict-mode fail-closed abort tested; CR-taint, path-block,
and base64 round-trip batteries re-run green.

Co-Authored-By: Clover (Claude Opus 4.7) <noreply@anthropic.com>
2026-05-27 20:00:23 -07:00

292 lines
16 KiB
Bash
Executable File

#!/usr/bin/env bash
# install-larry.sh — bootstrap Larry-Anywhere on a fresh remote shell.
# No root, no package install, no sudo. Writes only into $LARRY_HOME.
#
# Usage:
# curl -fsSL <BASE_URL>/install-larry.sh | bash
#
# Or with explicit base URL:
# LARRY_BASE_URL=https://example.com/larry-anywhere bash install-larry.sh
#
# Env vars:
# LARRY_HOME install location (default: $HOME/.larry)
# LARRY_BASE_URL where to fetch files from (no trailing slash)
# LARRY_BIN_DIR where to symlink the `larry` command (default: $HOME/bin)
set -eu
LARRY_HOME="${LARRY_HOME:-$HOME/.larry}"
# Canonical hosting (v0.7.4): self-hosted Gitea at git.bjnoela.com is the
# single source. The v0.7.2 GitHub fallback was removed after the GitHub
# mirror was made private (anonymous raw fetches now 401/403, so the
# fallback was functionally broken). Override LARRY_BASE_URL via env if you
# fork or mirror elsewhere.
#
# IMPORTANT: the Gitea repo must be PUBLIC for unauthenticated raw-URL reads
# to succeed. If the install fetch fails, set LARRY_BASE_URL to a reachable
# mirror or check repo visibility on git.bjnoela.com.
LARRY_BASE_URL="${LARRY_BASE_URL:-https://git.bjnoela.com/bryan/cloverleaf-larry/raw/branch/main}"
LARRY_BIN_DIR="${LARRY_BIN_DIR:-$HOME/bin}"
C_RESET=$'\033[0m'; C_BOLD=$'\033[1m'; C_GREEN=$'\033[32m'
C_YELLOW=$'\033[33m'; C_RED=$'\033[31m'; C_CYAN=$'\033[36m'
say() { printf '%s%sinstall-larry>%s %s\n' "$C_CYAN" "$C_BOLD" "$C_RESET" "$*"; }
ok() { printf ' %s✓%s %s\n' "$C_GREEN" "$C_RESET" "$*"; }
warn() { printf ' %s!%s %s\n' "$C_YELLOW" "$C_RESET" "$*"; }
die() { printf '%serror:%s %s\n' "$C_RED" "$C_RESET" "$*" >&2; exit 1; }
# ─────────────────────────────────────────────────────────────────────────────
# Detect platform
# ─────────────────────────────────────────────────────────────────────────────
UNAME_S="$(uname -s 2>/dev/null || echo unknown)"
PLATFORM=""
case "$UNAME_S" in
Linux*) PLATFORM="linux" ;;
Darwin*) PLATFORM="darwin" ;;
CYGWIN*|MINGW*|MSYS*) PLATFORM="windows-cygwin" ;; # MobaXterm lives here
*) PLATFORM="unknown" ;;
esac
ARCH="$(uname -m 2>/dev/null || echo unknown)"
case "$ARCH" in
x86_64|amd64) ARCH_NORM="amd64" ;;
aarch64|arm64) ARCH_NORM="arm64" ;;
i?86) ARCH_NORM="i386" ;;
*) ARCH_NORM="$ARCH" ;;
esac
say "platform: $PLATFORM/$ARCH_NORM • LARRY_HOME=$LARRY_HOME"
# ─────────────────────────────────────────────────────────────────────────────
# Check required commands
# ─────────────────────────────────────────────────────────────────────────────
command -v bash >/dev/null 2>&1 || die "bash not found"
command -v curl >/dev/null 2>&1 || die "curl not found"
# ─────────────────────────────────────────────────────────────────────────────
# Make dirs
# ─────────────────────────────────────────────────────────────────────────────
mkdir -p "$LARRY_HOME"/{agents,sessions,bin,lib} || die "cannot create $LARRY_HOME"
chmod 700 "$LARRY_HOME" 2>/dev/null || true
ok "created $LARRY_HOME"
# ─────────────────────────────────────────────────────────────────────────────
# Fetch the scripts. If LARRY_BASE_URL is not set, try to detect being run
# from a local checkout (sibling files present) — copy locally instead.
# ─────────────────────────────────────────────────────────────────────────────
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]:-$0}")" 2>/dev/null && pwd)" || SCRIPT_DIR=""
# v0.7.4 single-source: install-larry.sh is the FIRST contact with the
# origin — it runs before larry.sh exists, so it must succeed in one shot.
# No fallback; if $LARRY_BASE_URL is unreachable we die with a clear error
# telling the user to verify the URL or set an alternate mirror.
fetch() {
# $1 = remote relative path, $2 = local destination
if [ -n "$LARRY_BASE_URL" ]; then
say "fetching $1"
if curl -fsSL --max-time 30 "$LARRY_BASE_URL/$1" -o "$2" 2>/dev/null && [ -s "$2" ]; then
ok "$2"
return 0
fi
rm -f "$2"
die "install failed: cannot reach LARRY_BASE_URL=$LARRY_BASE_URL (fetching $1) — verify the URL or set LARRY_BASE_URL to a reachable mirror"
elif [ -n "$SCRIPT_DIR" ] && [ -f "$SCRIPT_DIR/$1" ]; then
cp "$SCRIPT_DIR/$1" "$2" && ok "copied $1 (local)"
else
die "no LARRY_BASE_URL set and $1 not found in script dir"
fi
}
fetch larry.sh "$LARRY_HOME/larry.sh"
fetch larry-tunnel.sh "$LARRY_HOME/larry-tunnel.sh"
fetch agents/larry.md "$LARRY_HOME/agents/larry.md"
fetch agents/clover.md "$LARRY_HOME/agents/clover.md"
fetch agents/cloverleaf-cheatsheet.md "$LARRY_HOME/agents/cloverleaf-cheatsheet.md"
fetch agents/regress.md "$LARRY_HOME/agents/regress.md"
fetch larry-rollback.sh "$LARRY_HOME/larry-rollback.sh"
fetch larry-auth.sh "$LARRY_HOME/larry-auth.sh"
fetch lib/oauth.sh "$LARRY_HOME/lib/oauth.sh"
fetch lib/ssh-helper.sh "$LARRY_HOME/lib/ssh-helper.sh"
fetch lib/lessons.sh "$LARRY_HOME/lib/lessons.sh"
fetch lib/hl7-sanitize.sh "$LARRY_HOME/lib/hl7-sanitize.sh"
fetch lib/hl7-desanitize.sh "$LARRY_HOME/lib/hl7-desanitize.sh"
fetch lib/each.sh "$LARRY_HOME/lib/each.sh"
fetch lib/each-site.sh "$LARRY_HOME/lib/each-site.sh"
fetch lib/len2nl.sh "$LARRY_HOME/lib/len2nl.sh"
fetch lib/csv-to-table.sh "$LARRY_HOME/lib/csv-to-table.sh"
fetch lib/table-to-csv.sh "$LARRY_HOME/lib/table-to-csv.sh"
fetch lib/nc-engine.sh "$LARRY_HOME/lib/nc-engine.sh"
fetch lib/nc-status.sh "$LARRY_HOME/lib/nc-status.sh"
fetch lib/nc-table.sh "$LARRY_HOME/lib/nc-table.sh"
fetch lib/nc-xlate.sh "$LARRY_HOME/lib/nc-xlate.sh"
fetch lib/nc-smat-diff.sh "$LARRY_HOME/lib/nc-smat-diff.sh"
fetch lib/nc-create-thread.sh "$LARRY_HOME/lib/nc-create-thread.sh"
fetch lib/nc-tclgen.sh "$LARRY_HOME/lib/nc-tclgen.sh"
fetch lib/nc-parse.sh "$LARRY_HOME/lib/nc-parse.sh"
fetch lib/nc-inbound.sh "$LARRY_HOME/lib/nc-inbound.sh"
fetch lib/nc-make-jump.sh "$LARRY_HOME/lib/nc-make-jump.sh"
fetch lib/hl7-field.sh "$LARRY_HOME/lib/hl7-field.sh"
fetch lib/hl7-schema.sh "$LARRY_HOME/lib/hl7-schema.sh"
fetch lib/nc-msgs.sh "$LARRY_HOME/lib/nc-msgs.sh"
fetch lib/nc-document.sh "$LARRY_HOME/lib/nc-document.sh"
fetch lib/nc-diff-interface.sh "$LARRY_HOME/lib/nc-diff-interface.sh"
fetch lib/nc-find.sh "$LARRY_HOME/lib/nc-find.sh"
fetch lib/nc-insert-protocol.sh "$LARRY_HOME/lib/nc-insert-protocol.sh"
fetch lib/hl7-diff.sh "$LARRY_HOME/lib/hl7-diff.sh"
fetch lib/nc-regression.sh "$LARRY_HOME/lib/nc-regression.sh"
fetch lib/journal.sh "$LARRY_HOME/lib/journal.sh"
fetch VERSION "$LARRY_HOME/VERSION"
fetch MANUAL.md "$LARRY_HOME/MANUAL.md"
chmod +x "$LARRY_HOME/larry.sh" "$LARRY_HOME/larry-tunnel.sh" "$LARRY_HOME/larry-rollback.sh" "$LARRY_HOME/larry-auth.sh" "$LARRY_HOME/lib/"*.sh
# ─────────────────────────────────────────────────────────────────────────────
# jq fallback — download static binary into $LARRY_HOME/bin/ if missing
# ─────────────────────────────────────────────────────────────────────────────
if ! command -v jq >/dev/null 2>&1 && [ ! -x "$LARRY_HOME/bin/jq" ]; then
say "jq not found — fetching static binary into $LARRY_HOME/bin/jq"
JQ_BASE="https://github.com/jqlang/jq/releases/download/jq-1.7.1"
JQ_URL=""
case "$PLATFORM/$ARCH_NORM" in
linux/amd64) JQ_URL="$JQ_BASE/jq-linux-amd64" ;;
linux/arm64) JQ_URL="$JQ_BASE/jq-linux-arm64" ;;
linux/i386) JQ_URL="$JQ_BASE/jq-linux-i386" ;;
darwin/amd64) JQ_URL="$JQ_BASE/jq-macos-amd64" ;;
darwin/arm64) JQ_URL="$JQ_BASE/jq-macos-arm64" ;;
windows-cygwin/amd64) JQ_URL="$JQ_BASE/jq-windows-amd64.exe" ;;
windows-cygwin/i386) JQ_URL="$JQ_BASE/jq-windows-i386.exe" ;;
*) warn "no jq binary known for $PLATFORM/$ARCH_NORM — install jq manually" ;;
esac
if [ -n "$JQ_URL" ]; then
local_jq="$LARRY_HOME/bin/jq"
case "$PLATFORM" in windows-cygwin) local_jq="$LARRY_HOME/bin/jq.exe" ;; esac
if curl -fsSL --max-time 60 "$JQ_URL" -o "$local_jq"; then
chmod +x "$local_jq"
ok "jq -> $local_jq"
else
warn "could not download jq from $JQ_URL"
fi
fi
else
ok "jq available"
fi
# ─────────────────────────────────────────────────────────────────────────────
# Drop a `larry` shim onto PATH (best-effort)
# ─────────────────────────────────────────────────────────────────────────────
mkdir -p "$LARRY_BIN_DIR" 2>/dev/null || true
if [ -d "$LARRY_BIN_DIR" ] && [ -w "$LARRY_BIN_DIR" ]; then
cat > "$LARRY_BIN_DIR/larry" <<EOF
#!/usr/bin/env bash
# Auto-generated by install-larry.sh
export LARRY_HOME="${LARRY_HOME}"
exec "${LARRY_HOME}/larry.sh" "\$@"
EOF
chmod +x "$LARRY_BIN_DIR/larry"
ok "shim: $LARRY_BIN_DIR/larry"
case ":$PATH:" in
*":$LARRY_BIN_DIR:"*) : ;;
*) warn "$LARRY_BIN_DIR is not on PATH — add 'export PATH=\"$LARRY_BIN_DIR:\$PATH\"' to your shell rc" ;;
esac
else
warn "cannot write to $LARRY_BIN_DIR — invoke larry directly as: $LARRY_HOME/larry.sh"
fi
# ─────────────────────────────────────────────────────────────────────────────
# v0.8.2 — optional PHI Presidio sidecar (free-text NER).
# Closes V1 from Vera's PHI-leak audit. Opt-in install; larry runs in
# v0.8.1 mode (rule-pack only) on hosts where this isn't installed.
# We probe for python3 + pip, then offer the install. Skip silently if
# python3 isn't available — keeps the install one-shot on raw MobaXterm
# where Python may not be present.
# ─────────────────────────────────────────────────────────────────────────────
if command -v python3 >/dev/null 2>&1; then
PYV=$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")' 2>/dev/null || echo "")
case "$PYV" in
3.9|3.10|3.11|3.12|3.13|3.14|3.15) PY_OK=1 ;;
*) PY_OK=0 ;;
esac
if [ "${PY_OK:-0}" = "1" ]; then
say "v0.8.2: Presidio PHI sidecar is available (python $PYV detected)"
echo " Presidio provides free-text NER (names, addresses, dates in prose)"
echo " that the regex tiers miss. Install adds presidio_analyzer +"
echo " presidio_anonymizer + fastapi + uvicorn + spaCy en_core_web_sm"
echo " to a dedicated virtualenv at $LARRY_HOME/phi-venv (~400MB on disk,"
echo " ~250MB RAM resident when running). One-time cost; tier-5 NER"
echo " then runs on every prompt with ~20ms latency."
echo ""
# Heuristic: if stdin is a TTY, prompt. Otherwise (curl|bash pipe), skip.
INSTALL_PHI=""
if [ -t 0 ]; then
printf 'install Presidio sidecar now? [y/N]: '
read -r INSTALL_PHI </dev/tty || INSTALL_PHI=""
else
echo " (non-interactive install — skip; rerun installer with TTY or set"
echo " LARRY_INSTALL_PHI=1 to enable. To install manually later:"
echo " python3 -m venv $LARRY_HOME/phi-venv"
echo " $LARRY_HOME/phi-venv/bin/pip install presidio_analyzer presidio_anonymizer fastapi uvicorn"
echo " $LARRY_HOME/phi-venv/bin/python -m spacy download en_core_web_sm)"
INSTALL_PHI="${LARRY_INSTALL_PHI:-n}"
fi
case "${INSTALL_PHI:-}" in
y|Y|yes|YES|1)
say "installing Presidio sidecar to $LARRY_HOME/phi-venv (this takes 2-5 minutes)..."
if python3 -m venv "$LARRY_HOME/phi-venv" >/dev/null 2>&1; then
if "$LARRY_HOME/phi-venv/bin/pip" install --quiet \
presidio_analyzer presidio_anonymizer fastapi uvicorn >/dev/null 2>&1; then
if "$LARRY_HOME/phi-venv/bin/python" -m spacy download en_core_web_sm \
>/dev/null 2>&1; then
ok "Presidio sidecar installed (venv: $LARRY_HOME/phi-venv)"
# Set LARRY_PHI_VENV in the shim so larry auto-uses it.
if [ -f "$LARRY_BIN_DIR/larry" ]; then
sed -i.bak "s|^exec \"|export LARRY_PHI_VENV=\"$LARRY_HOME/phi-venv\"\nexec \"|" \
"$LARRY_BIN_DIR/larry" 2>/dev/null || true
rm -f "$LARRY_BIN_DIR/larry.bak"
fi
else
warn "spaCy en_core_web_sm download failed; sidecar will not start until model is present"
fi
else
warn "pip install failed; Presidio sidecar not available on this host (larry runs in v0.8.1 mode)"
fi
else
warn "python3 -m venv failed; cannot install Presidio (larry runs in v0.8.1 mode)"
fi
;;
*)
ok "skipped Presidio install — larry runs in v0.8.1 mode (rule-pack auto-PHI only)"
;;
esac
else
warn "python3 detected but version ($PYV) is not 3.9+; Presidio sidecar requires 3.9+"
warn "larry runs in v0.8.1 mode (rule-pack auto-PHI only) on this host"
fi
else
case "$PLATFORM" in
windows-cygwin)
warn "python3 not detected on Cygwin/MobaXterm. v0.8.2 Presidio sidecar SKIPPED."
warn "Bryan's accepted tradeoff: MobaXterm stays on v0.8.1 + prompt nudges."
;;
*)
warn "python3 not on PATH; Presidio sidecar skipped (larry runs in v0.8.1 mode)"
;;
esac
fi
# ─────────────────────────────────────────────────────────────────────────────
# Done
# ─────────────────────────────────────────────────────────────────────────────
echo ""
say "install complete (no system changes were made; everything lives under $LARRY_HOME)"
say "origin (single-source, v0.7.4): $LARRY_BASE_URL"
echo ""
echo "Next steps:"
echo " 1) export ANTHROPIC_API_KEY=sk-ant-... (or larry will prompt on first run)"
echo " 2) larry (or $LARRY_HOME/larry.sh)"
echo " 3) larry /path/to/cloverleaf/site_root (to start with a working dir)"
echo ""
echo "Reverse SSH tunnel (optional, run in another shell or backgrounded):"
echo " $LARRY_HOME/larry-tunnel.sh --serveo # zero-config"
echo " $LARRY_HOME/larry-tunnel.sh --hop=user@bjnoela.com:22 # your hop"
echo ""