From fc667e24511642fdd14562bc81f93c57fc8525c7 Mon Sep 17 00:00:00 2001
From: Bryan Johnson
Date: Thu, 28 May 2026 08:58:49 -0700
Subject: [PATCH] =?UTF-8?q?v0.8.15:=20legacy/qa=20remote-enumeration=20fix?=
 =?UTF-8?q?=20=E2=80=94=20per-alias=20HCIROOT=20pin=20(sudo-gated=20profil?=
 =?UTF-8?q?e=20bypass),=20hcisitelist-free=20NetConfig=20walk,=20ControlMa?=
 =?UTF-8?q?ster=20banner+rotating-pw=20hardening;=20zero=20traffic-bypass?=
 =?UTF-8?q?=20primitives?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

MAJOR-1: regenerate MANIFEST (larry.sh, lib/ssh-helper.sh, VERSION, CHANGELOG.md hashes now authoritative for the v0.8.15 bytes).
MINOR-1: print_help /sites line documents the --hciroot pin convenience and the pinned-vs-login resolution distinction.

Co-Authored-By: Claude Opus 4.7
---
 CHANGELOG.md      |  62 ++++++++++
 MANIFEST          |   8 +-
 VERSION           |   2 +-
 larry.sh          | 110 +++++++++++++++--
 lib/ssh-helper.sh | 305 +++++++++++++++++++++++++++++++++++++++-------
 5 files changed, 426 insertions(+), 61 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index d18183e..efeefb4 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,68 @@ All notable changes to `cloverleaf-larry` / `larry-anywhere` are recorded here.
 Versioning is loose-semver; bumps trigger the in-process self-update on every
 running client via `LARRY_BASE_URL` + `MANIFEST`.
 
+## v0.8.15 — 2026-05-28
+
+Legacy/qa remote-enumeration fix (Clover). Three confirmed-live properties of
+the qa box `bryjohnx@lhsixfqa` (→ Cloverleaf host `shdclvf01q`, release
+cis2025.01) broke v0.8.13/v0.8.14 remote site enumeration: a **sudo-gated login
+profile** (a non-interactive SSH session hits `sudo: a terminal is required`, so
+`bash -lc` can't initialize the env and `$HCIROOT` comes back EMPTY); **no
+`hcisitelist`** on the box; and a **password that rotates ~every 12h** (stale
+stored credential, and a pre-auth banner that masked the real auth error). All
+three are now handled. The no-traffic-bypass security line is unchanged — **zero
+proxy / masking / evasion primitives** (same Gundersen-class control).
+
+1. **Per-alias HCIROOT pin (load-bearing).** New `ssh-helper.sh set-hciroot
+   ` and the `/ssh-set-hciroot ` slash command
+   persist an HCIROOT for an alias as a 4th column in `.ssh-hosts.tsv` (old
+   3-column files stay valid; an empty path clears the pin). When an alias is
+   pinned, `exec` / `discover` / `pull-smat` run the remote command with
+   `HCIROOT=` exported EXPLICITLY under a NON-login `sh -c` — they do NOT
+   wrap in `bash -lc`, so the sudo-gated login profile is never invoked. A single
+   chokepoint, `_remote_cmd_for`, makes every remote path honour the pin
+   identically (unpinned aliases keep the v0.8.13 `bash -lc` login-shell
+   behaviour, unchanged). `/sites --hciroot ` is a convenience that
+   persists the pin then enumerates. This makes qa work regardless of the broken
+   profile. `qa` HCIROOT = `/hci/cis2025.01/integrator`.
+2. **Portable site enumeration (no `hcisitelist` dependency).** `discover`'s
+   remote script now makes the **NetConfig walk the PRIMARY path**, identical to
+   `lib/each-site.sh` (`find $HCIROOT -mindepth 1 -maxdepth 2 -name NetConfig
+   -type f` → dirname → basename → sort -u). `hcisitelist` is consulted ONLY if
+   it is actually present AND the walk found nothing — never as the dependency.
+   Works on a box with no `hcisitelist`. Emits clear `NOTE` lines (HCIROOT empty
+   → suggests the pin; not a directory; no NetConfigs found).
+3. **ControlMaster-open hardening (banner + rotating password).** `setup` now
+   forces `-o PreferredAuthentications=password -o PubkeyAuthentication=no -o
+   NumberOfPasswordPrompts=1` so sshpass feeds the password cleanly past the
+   pre-auth banner and a stale credential fails fast instead of hanging; it
+   surfaces the REAL auth error (greps for permission/auth/password/host-key
+   keywords) instead of echoing only the banner; and on an auth failure it
+   RE-PROMPTS for a fresh password (the 12h rotation), stores it 0600, and
+   retries ONCE. Every failure path emits a clear next step — never a silent
+   no-op.
+4. **`/sites` excludes non-real entries (transparent).** The enumeration now
+   drops (a) static scaffolding/special sites — `helloworld siteProto master`,
+   a documented, tunable `SITES_EXCLUDE` env var — and (b) any site dir whose
+   name equals the host: the REMOTE `discover` walk computes `hostname -s` and
+   full `hostname` and drops a match (qa's alias host is `lhsixfqa` but the
+   engine box is `shdclvf01q`; a dir just named after the box is not a site),
+   and also drops a match against the alias's configured SSH host. The filter
+   is applied at the SINGLE enumeration source so REMOTE (pinned + login-shell)
+   and LOCAL `/sites` behave identically. NOT silently hidden: the walk emits an
+   `EXCLUDED` line and the tool layer renders the real count as the headline with
+   a note, e.g. `sites: 21 (excluded: helloworld, master, siteProto)`. Acceptance
+   (qa, 24 raw dirs): `/sites qa` → 21, with the 3 exclusions noted; no dir
+   matches `shdclvf01q`. The no-traffic-bypass security line is unchanged.
+
+Acceptance (qa): with HCIROOT pinned to `/hci/cis2025.01/integrator`,
+`/sites qa` returns the site list via the NetConfig walk with no `bash -lc` and
+no `hcisitelist`; `/ssh-setup qa` with a fresh password opens the master past
+the banner. Self-verified with `bash -n` on every changed file. POSIX-sh remote
+scripts; compatible with bash 3.2 / Cygwin; no regression to unpinned aliases.
+
+---
+
 ## v0.8.14 — 2026-05-28
 
 Locked-down-box survivability (Clover): make the full toolkit usable BY HAND
diff --git a/MANIFEST b/MANIFEST
index 8f55c6d..2d75948 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -23,16 +23,16 @@
 # scripts/make-manifest.sh and bump VERSION.
 
 # Top-level scripts
-larry.sh 2c10a738cd3fc14012b4d67fcdc58be40147593f604a3ddc66b19b6b4b0ea081
+larry.sh 2e7650eb7a014624bd6956c30ce3a54e0e87d4ccfc73bb0b2ae82d1a31b882e0
 larry-tunnel.sh 6b050e4eeab15669f4858eaf3b807f168f211ced07815db9521bc40a093f6aaa
 larry-auth.sh a220cdf7878569dc3028951ee57fc8d5e706a8ca5c6aa45347b58facb386f831
 larry-rollback.sh 91b5e9aa6c79266bf306dcfba4ca791c07971bd6924d67a779037531648aa6d0
 install-larry.sh e97da4e12a0d8863ca18d79b12f6c4294c72fa6d4b11dffeab66504236bb4eb1
 
 # Metadata
-VERSION af0c015a6470ca542b68d7084a55652bee7798013d87487cd05fac1484a25980
+VERSION 8517de55d0fc1041caab07518dbf7da86dba47c3befe0a6ef84d005872cb799d
 MANUAL.md 666128a086b59ff3c31a574aec0c5dd681666d66319da9f078451bf9013ca5e1
-CHANGELOG.md aa0bd56caf29a0939a7b7d676bec9daed01606f9ac29f0180c0ac72c990d49be
+CHANGELOG.md 0b8f2dba750577f934935dd7d5805c498afa9d516cd37e5b6cda039cb86ec350
 
 # Agent personas (system-prompt overlays)
 agents/larry.md 11ea905fa7cac6fa7baeb11b2d62af07b15a666ce90cfe36491bcbc555244397
@@ -52,7 +52,7 @@ lib/fetch-safe.sh abecf0045b9856f63ffa346119443c11de56547344be32bddaed9fbae6b021
 lib/oauth.sh 04a93376f88fe53cc1c86a5dbe577735c60375dadd4f2fda55b921ef3cddf22b
 
 # Secure SSH with ControlMaster (password hidden from Larry-the-LLM)
-lib/ssh-helper.sh 7aa2aa7b3860cb48b7ba5120f9efc2563a6cdaed41242f42ecc9dd03fdebeb28
+lib/ssh-helper.sh 3397945df8184d0bc89853608c097af11b97b37695c5598c979347b6b912e0eb
 
 # v0.8.6: work-box → Mac headers.log sync (tsk-2026-05-27-023). Incremental,
 # offset-tracked push of $LARRY_HOME/log/headers.log to a daemon-watched path
diff --git a/VERSION b/VERSION
index 832bad2..7d87d99 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-0.8.14
+0.8.15
diff --git a/larry.sh b/larry.sh
index 243ca51..91fb3a7 100755
--- a/larry.sh
+++ b/larry.sh
@@ -78,7 +78,7 @@ set -o pipefail
 # ─────────────────────────────────────────────────────────────────────────────
 # Config
 # ─────────────────────────────────────────────────────────────────────────────
-LARRY_VERSION="0.8.14"
+LARRY_VERSION="0.8.15"
 LARRY_HOME="${LARRY_HOME:-$HOME/.larry}"
 
 # ─────────────────────────────────────────────────────────────────────────────
@@ -1247,7 +1247,7 @@ detect_cloverleaf_env() {
   esac
   if [ -n "$aliases" ]; then
     lines+=("Configured SSH aliases: $(printf '%s' "$aliases" | tr '\n' ' ')")
-    lines+=("For a remote alias: discover its env with the list_sites tool (alias=) — it opens a LOGIN shell so the remote \$HCIROOT resolves. NEVER ask Bryan to export \$HCIROOT for a remote host.")
+    lines+=("For a remote alias: discover its env with the list_sites tool (alias=) — it resolves the remote \$HCIROOT (login shell, or an explicit pin if set) and walks NetConfigs. NEVER ask Bryan to export \$HCIROOT for a remote host. If list_sites reports HCIROOT empty with a sudo-gated-profile NOTE, have Bryan pin it once: /ssh-set-hciroot (e.g. qa → /hci/cis2025.01/integrator).")
   fi
 
   if [ -n "${HCIROOT:-}" ]; then
@@ -3840,11 +3840,29 @@ tool_list_sites() {
     local rroot; rroot=$(printf '%s\n' "$out" | awk -F'\t' '$1=="HCIROOT"{print $2; exit}')
     local sites; sites=$(printf '%s\n' "$out" | awk -F'\t' '$1=="SITE"{print $2}' | sort -u)
     local note; note=$(printf '%s\n' "$out" | awk -F'\t' '$1=="NOTE"{print $2}')
+    # v0.8.15: the discover walk filters out scaffolding/special sites and any
+    # dir named after the host; it reports what it dropped on an EXCLUDED line.
+    # Surface it transparently (never silently hide) — the real-site count stays
+    # the headline below.
+    local excluded; excluded=$(printf '%s\n' "$out" | awk -F'\t' '$1=="EXCLUDED"{print $2; exit}')
     local n=0; [ -n "$sites" ] && n=$(printf '%s\n' "$sites" | grep -c .)
-    printf 'Cloverleaf env on alias "%s" (REMOTE, login shell):\n' "$alias"
+    # v0.8.15: report the actual resolution mode. If the alias has a pinned
+    # HCIROOT (4th column of the hosts TSV) the discover ran with HCIROOT
+    # exported explicitly and NO login profile; otherwise it used a login shell.
+    local _hosts_tsv="${LARRY_HOME:-$HOME/.larry}/.ssh-hosts.tsv" _pin="" _mode="login shell"
+    if [ -f "$_hosts_tsv" ]; then
+      _pin=$(awk -F'\t' -v a="$alias" 'NR>1 && $1==a { print $4; exit }' "$_hosts_tsv" 2>/dev/null)
+    fi
+    [ -n "$_pin" ] && _mode="pinned HCIROOT, no login profile"
+    printf 'Cloverleaf env on alias "%s" (REMOTE, %s):\n' "$alias" "$_mode"
     printf ' HCIROOT = %s\n' "${rroot:-}"
     [ -n "$note" ] && printf ' NOTE: %s\n' "$note"
-    printf ' sites: %d\n' "$n"
+    if [ -n "$excluded" ]; then
+      local _exc_csv; _exc_csv=$(printf '%s' "$excluded" | tr ' ' ',' | sed 's/,/, /g')
+      printf ' sites: %d (excluded: %s)\n' "$n" "$_exc_csv"
+    else
+      printf ' sites: %d\n' "$n"
+    fi
     [ -n "$sites" ] && printf '%s\n' "$sites" | sed 's/^/ - /'
     return 0
   fi
@@ -3863,10 +3881,37 @@ tool_list_sites() {
     sites=$(find "$root" -mindepth 1 -maxdepth 2 -name NetConfig -type f 2>/dev/null \
       | while IFS= read -r nc; do basename "$(dirname "$nc")"; done | sort -u)
   fi
+  # v0.8.15: apply the SAME exclusion as the REMOTE discover walk — static
+  # scaffolding/special sites plus any dir named after this host. Tunable via
+  # the SITES_EXCLUDE env var (default: helloworld siteProto master). Never
+  # silently hidden: the dropped names are reported alongside the real count.
+  local _sites_exclude="${SITES_EXCLUDE:-helloworld siteProto master}"
+  local _hn_s _hn_f; _hn_s=$(hostname -s 2>/dev/null || true); _hn_f=$(hostname 2>/dev/null || true)
+  local _kept="" _dropped=""
+  if [ -n "$sites" ]; then
+    while IFS= read -r s; do
+      [ -n "$s" ] || continue
+      local _drop=""
+      local x; for x in $_sites_exclude; do [ "$s" = "$x" ] && _drop=1 && break; done
+      [ -z "$_drop" ] && [ -n "$_hn_s" ] && [ "$s" = "$_hn_s" ] && _drop=1
+      [ -z "$_drop" ] && [ -n "$_hn_f" ] && [ "$s" = "$_hn_f" ] && _drop=1
+      if [ -n "$_drop" ]; then _dropped="$_dropped $s"; else _kept="$_kept$s
+"; fi
+    done <' / 'what sites exist' — NEVER ask Bryan to export or hand you $HCIROOT first; this tool resolves it for you. Works in BOTH deployment modes. REMOTE mode: pass alias= (a configured SSH alias, e.g. qa); the tool opens a LOGIN shell on that host so the remote $HCIROOT populates from the operator profile, then enumerates sites (Cloverleaf's hcisitelist if present, else a NetConfig walk). The ControlMaster must be open — if it is not, the result tells you to have Bryan run /ssh-setup . LOCAL mode: omit alias; the tool enumerates sites under the locally-detected $HCIROOT (or the hciroot override). Returns the resolved HCIROOT, a site count, and the site names.","input_schema":{"type":"object","properties":{"alias":{"type":"string","description":"REMOTE mode: an SSH alias from ssh_status (e.g. 'qa'). Omit for LOCAL mode (sites on this box)."},"hciroot":{"type":"string","description":"LOCAL mode only: override the detected $HCIROOT."}},"required":[]}},
+  {"name":"list_sites","description":"List and COUNT the Cloverleaf sites in the environment. This is your proactive answer to 'how many sites are on ' / 'what sites exist' — NEVER ask Bryan to export or hand you $HCIROOT first; this tool resolves it for you. Works in BOTH deployment modes. REMOTE mode: pass alias= (a configured SSH alias, e.g. qa); the tool resolves the remote $HCIROOT and enumerates sites via a NetConfig walk (the version-agnostic ground truth; Cloverleaf's hcisitelist is used only if present AND the walk found nothing). If the alias has a PINNED HCIROOT (set via /ssh-set-hciroot), the walk runs with HCIROOT exported explicitly and SKIPS the login profile — this is required on hosts whose login profile is sudo-gated/non-interactive (a plain login shell there returns an EMPTY $HCIROOT). Otherwise it opens a LOGIN shell so the operator profile populates $HCIROOT. The ControlMaster must be open — if it is not, the result tells you to have Bryan run /ssh-setup . If the result shows HCIROOT empty with a NOTE about a sudo-gated profile, tell Bryan to pin it: /ssh-set-hciroot . LOCAL mode: omit alias; the tool enumerates sites under the locally-detected $HCIROOT (or the hciroot override). Returns the resolved HCIROOT, a site count, and the site names.","input_schema":{"type":"object","properties":{"alias":{"type":"string","description":"REMOTE mode: an SSH alias from ssh_status (e.g. 'qa'). Omit for LOCAL mode (sites on this box)."},"hciroot":{"type":"string","description":"LOCAL mode only: override the detected $HCIROOT."}},"required":[]}},
   {"name":"hl7_diff","description":"HL7-aware diff between two message files (or multi-message dumps). Compares segment-by-segment, field-by-field, with component and subcomponent precision. Ignores configured fields (default MSH.7 timestamp) so timestamp-only diffs do not show up as noise. Use for regression testing between environments (e.g. test vs prod route-test outputs).","input_schema":{"type":"object","properties":{"left":{"type":"string","description":"Path to left HL7 file."},"right":{"type":"string","description":"Path to right HL7 file."},"ignore":{"type":"string","description":"Comma-separated list of fields to ignore (e.g. MSH.7,MSH.10,EVN.6). Default MSH.7."},"include":{"type":"string","description":"If set, ONLY these fields are compared (overrides ignore for that set)."},"format":{"type":"string","enum":["text","tsv","count"],"description":"text=human-readable diff, tsv=machine-parseable, count=just the difference count."}},"required":["left","right"]}},
@@ -5404,6 +5449,9 @@ Slash commands:
   /ssh-hosts list configured remote hosts
   /ssh-add register a new host
   /ssh-pass set/update password (hidden input; daily rotation OK)
+  /ssh-set-hciroot pin HCIROOT for an alias (sudo-gated/non-interactive
+                   hosts that don't export it in a non-login shell;
+                   empty path clears the pin)
   /ssh-setup open a long-lived ControlMaster connection
   /ssh-close close the ControlMaster
   /ssh-status [alias] show open masters + cred presence
@@ -5530,7 +5578,12 @@ Slash commands:
     Audit: every tokenization writes a JSONL entry to \$LARRY_HOME/log/auto-phi.log
     (ts/value/category/token/tier/surface/context).
   /redetect re-scan for HCIROOT/HCISITE/tools + deployment mode
-  /sites [alias] count/list Cloverleaf sites — local, or REMOTE via (login shell)
+  /sites [alias] [--hciroot ]
+                 count/list Cloverleaf sites — local, or REMOTE via .
+                 Remote resolves \$HCIROOT via a login shell by default; pass
+                 --hciroot to PIN it for the alias (persisted) and run
+                 the walk with HCIROOT exported, skipping a sudo-gated/non-
+                 interactive login profile.
   /site switch HCISITE for this session
   /pwd show current working directory
   /help this help
@@ -5652,6 +5705,7 @@ _LARRY_SLASH_CMDS=(
   /ssh-add
   /ssh-remove
   /ssh-pass
+  /ssh-set-hciroot
   /ssh-setup
   /ssh-close
   /ssh-status
@@ -5709,6 +5763,7 @@ _LARRY_SLASH_CMDS_DESC=(
   [/ssh-add]=" register a new host"
   [/ssh-remove]=" remove a host"
   [/ssh-pass]=" set/update password (hidden input)"
+  [/ssh-set-hciroot]=" pin HCIROOT for an alias (sudo-gated hosts; empty path clears)"
   [/ssh-setup]=" open a long-lived ControlMaster"
   [/ssh-close]=" close the ControlMaster"
   [/ssh-status]="show open ControlMaster sessions + cred presence"
@@ -6838,6 +6893,21 @@ main_loop() {
         if [ -z "$rest" ]; then err "usage: /ssh-pass "; continue; fi
         _run_ssh_helper pass "$rest"
         continue ;;
+      /ssh-set-hciroot*) # v0.8.15: pin/persist HCIROOT for an alias so remote
+        # enumeration/exec exports it explicitly and SKIPS the login
+        # profile (for sudo-gated/non-interactive hosts, e.g. qa).
+        local rest; rest=$(_slash_args "/ssh-set-hciroot" "$input")
+        if [ -z "$rest" ]; then
+          err "usage: /ssh-set-hciroot (empty path clears the pin)"; continue
+        fi
+        local _sh_alias="${rest%% *}" _sh_path="${rest#"$_sh_alias"}"
+        _sh_path="${_sh_path# }"
+        if [ -z "$_sh_alias" ]; then
+          err "usage: /ssh-set-hciroot "; continue
+        fi
+        # _sh_path may legitimately be empty (clear the pin).
+        _run_ssh_helper set-hciroot "$_sh_alias" "$_sh_path"
+        continue ;;
       /ssh-setup*) local rest; rest=$(_slash_args "/ssh-setup" "$input")
         if [ -z "$rest" ]; then err "usage: /ssh-setup "; continue; fi
         _run_ssh_helper setup "$rest"
@@ -6864,9 +6934,29 @@ main_loop() {
         larry_say "re-detected. /env to view."
         continue ;;
       /sites*) # v0.8.13: both-mode site listing. `/sites` → LOCAL; `/sites `
-        # → REMOTE (login-shell discover over the open ControlMaster).
-        local _site_alias; _site_alias=$(_slash_args "/sites" "$input")
-        tool_list_sites "${_site_alias:-}" ""
+        # → REMOTE discover over the open ControlMaster.
+        # v0.8.15: optional `--hciroot ` pass-through. In REMOTE mode
+        # it PINS that HCIROOT for the alias (persisted) before enumerating,
+        # so the remote walk exports HCIROOT explicitly and skips the
+        # sudo-gated login profile. In LOCAL mode it overrides the scan root.
+        local _site_args; _site_args=$(_slash_args "/sites" "$input")
+        local _site_alias="" _site_hciroot="" _tok _expect=""
+        for _tok in $_site_args; do
+          if [ "$_expect" = "hciroot" ]; then _site_hciroot="$_tok"; _expect=""; continue; fi
+          case "$_tok" in
+            --hciroot) _expect="hciroot" ;;
+            --hciroot=*) _site_hciroot="${_tok#--hciroot=}" ;;
+            *) [ -z "$_site_alias" ] && _site_alias="$_tok" ;;
+          esac
+        done
+        if [ -n "$_site_alias" ] && [ -n "$_site_hciroot" ]; then
+          # REMOTE + explicit hciroot → persist the pin, then enumerate.
+          _run_ssh_helper set-hciroot "$_site_alias" "$_site_hciroot"
+          tool_list_sites "$_site_alias" ""
+        else
+          # REMOTE (pin/login-shell resolves HCIROOT) or LOCAL (hciroot override).
+          tool_list_sites "${_site_alias:-}" "${_site_hciroot:-}"
+        fi
         continue ;;
       /site\ *) HCISITE="${input#/site }"; HCISITEDIR="$HCIROOT/$HCISITE"
         export HCISITE HCISITEDIR
diff --git a/lib/ssh-helper.sh b/lib/ssh-helper.sh
index 8987075..268a328 100755
--- a/lib/ssh-helper.sh
+++ b/lib/ssh-helper.sh
@@ -20,6 +20,13 @@
 # add add a host to the alias list
 # remove remove an alias (also clears cred + socket)
 # pass set/update the password (hidden interactive)
+# set-hciroot pin (persist) $HCIROOT for an alias. When
+#             set, remote enumeration/exec runs with
+#             HCIROOT= exported EXPLICITLY and
+#             WITHOUT the `bash -lc` login wrapper — for
+#             hosts whose login profile is sudo-gated or
+#             otherwise non-interactive (v0.8.15).
+#             Pass an empty path to clear the pin.
 # setup open ControlMaster (uses stored password ONCE)
 # close close ControlMaster
 # status [alias] show open masters / cred presence
@@ -68,7 +75,9 @@ ensure_layout() {
   chmod 700 "$LARRY_HOME" "$SSH_CREDS_DIR" "$SSH_SOCKETS_DIR" 2>/dev/null || true
   if [ ! -f "$SSH_HOSTS_FILE" ]; then
     umask 077
-    printf 'alias\taddr\tport\n' > "$SSH_HOSTS_FILE"
+    # v0.8.15: 4th column = pinned HCIROOT (optional). Older 3-column files stay
+    # valid — readers treat a missing $4 as "no pin".
+    printf 'alias\taddr\tport\thciroot\n' > "$SSH_HOSTS_FILE"
     chmod 600 "$SSH_HOSTS_FILE"
   fi
 }
@@ -80,13 +89,22 @@ read_host_addr() {
   awk -F'\t' -v a="$alias" 'NR>1 && $1==a { print $2 "\t" $3; exit }' < "$SSH_HOSTS_FILE"
 }
 
+# read_host_hciroot ALIAS → echoes the pinned HCIROOT (column 4) or empty.
+# v0.8.15: a non-empty value means remote commands for this alias run with
+# HCIROOT exported explicitly and WITHOUT the `bash -lc` login wrapper.
+read_host_hciroot() {
+  local alias="$1"
+  [ -f "$SSH_HOSTS_FILE" ] || { printf ''; return 0; }
+  awk -F'\t' -v a="$alias" 'NR>1 && $1==a { print $4; exit }' < "$SSH_HOSTS_FILE"
+}
+
 require_sshpass() {
   command -v sshpass >/dev/null 2>&1 \
     || die "sshpass not on PATH — install it (apt install sshpass / brew install sshpass) and retry"
 }
 
 cmd_help() {
-  sed -n '4,30p' "$0"
+  sed -n '4,47p' "$0"
 }
 
 cmd_hosts() {
@@ -97,9 +115,9 @@ cmd_hosts() {
     echo "no hosts configured. Add with: ssh-helper.sh add "
     return 0
   fi
-  printf 'alias user@host port cred master\n'
-  printf '%s\n' '───── ───────── ──── ──── ──────'
-  awk -F'\t' 'NR>1' "$SSH_HOSTS_FILE" | while IFS=$'\t' read -r alias addr port; do
+  printf 'alias user@host port cred master hciroot-pin\n'
+  printf '%s\n' '───── ───────── ──── ──── ────── ───────────'
+  awk -F'\t' 'NR>1' "$SSH_HOSTS_FILE" | while IFS=$'\t' read -r alias addr port hciroot; do
     local cred_state="–"
     [ -f "$SSH_CREDS_DIR/$alias" ] && cred_state="✓"
     local master_state="–"
@@ -107,7 +125,7 @@ cmd_hosts() {
     if [ -S "$sock" ] && ssh -S "$sock" -O check -p "$port" "$addr" 2>/dev/null; then
       master_state="open"
     fi
-    printf '%-20s%-52s%-6s%-6s%s\n' "$alias" "$addr" "${port:-22}" "$cred_state" "$master_state"
+    printf '%-20s%-52s%-6s%-6s%-8s%s\n' "$alias" "$addr" "${port:-22}" "$cred_state" "$master_state" "${hciroot:-–}"
   done
 }
 
@@ -130,11 +148,41 @@ cmd_add() {
     die "alias '$alias' already exists. Use 'remove $alias' first."
   fi
   umask 077
-  printf '%s\t%s\t%s\n' "$alias" "$addr" "$port" >> "$SSH_HOSTS_FILE"
+  # v0.8.15: write an empty 4th (hciroot) field so the row layout is uniform.
+  printf '%s\t%s\t%s\t%s\n' "$alias" "$addr" "$port" "" >> "$SSH_HOSTS_FILE"
   chmod 600 "$SSH_HOSTS_FILE"
   ok "added $alias → $addr (port $port). Next: ssh-helper.sh pass $alias"
 }
 
+# cmd_set_hciroot ALIAS [PATH] — pin (or clear) the HCIROOT for an alias.
+# Persisted as column 4 of the hosts TSV. An empty/omitted PATH clears the pin.
+# When set, cmd_exec/cmd_discover/cmd_pull_smat run remote commands with
+# HCIROOT= exported EXPLICITLY and WITHOUT the `bash -lc` login wrapper —
+# the v0.8.15 fix for hosts whose login profile is sudo-gated (a non-interactive
+# SSH session hits `sudo: a terminal is required` and never exports $HCIROOT).
+cmd_set_hciroot() {
+  local alias="${1:-}" newroot="${2:-}"
+  [ -n "$alias" ] || die "usage: set-hciroot (empty path clears the pin)"
+  ensure_layout
+  local addr_port; addr_port=$(read_host_addr "$alias")
+  [ -n "$addr_port" ] || die "no such alias: $alias (run 'add' first)"
+  # Rewrite the row in place, setting/replacing column 4. awk handles rows that
+  # still have only 3 columns (legacy) by assigning $4 directly.
+  local tmp; tmp=$(mktemp)
+  awk -F'\t' -v OFS='\t' -v a="$alias" -v r="$newroot" '
+    NR==1 { if (NF < 4) { $4="hciroot" } print; next }
+    $1==a { $4=r; print; next }
+    { print }
+  ' "$SSH_HOSTS_FILE" > "$tmp" && mv "$tmp" "$SSH_HOSTS_FILE"
+  chmod 600 "$SSH_HOSTS_FILE"
+  if [ -n "$newroot" ]; then
+    ok "pinned HCIROOT for $alias → $newroot"
+    ok " (remote enumeration/exec for $alias will export HCIROOT explicitly and SKIP the login profile)"
+  else
+    ok "cleared HCIROOT pin for $alias (reverting to login-shell \$HCIROOT resolution)"
+  fi
+}
+
 cmd_remove() {
   local alias="${1:-}"
   [ -n "$alias" ] || die "usage: remove "
@@ -186,24 +234,99 @@ cmd_setup() {
   [ -f "$credfile" ] || die "no password set for $alias — run 'pass $alias' first"
   require_sshpass
   ok "opening ssh master for $alias ($addr:$port) — ControlPersist=$SSH_CONTROL_PERSIST..."
-  if sshpass -f "$credfile" ssh \
-    -o "ControlMaster=yes" \
-    -o "ControlPath=$sock" \
-    -o "ControlPersist=$SSH_CONTROL_PERSIST" \
-    -o "StrictHostKeyChecking=accept-new" \
-    -o "ConnectTimeout=10" \
-    -p "$port" \
-    -N -f \
-    "$addr" 2>/tmp/larry-ssh-setup.err; then
-    if ssh -S "$sock" -O check -p "$port" "$addr" 2>/dev/null; then
-      ok "✓ master open: $alias → $addr:$port (socket: $sock)"
-      rm -f /tmp/larry-ssh-setup.err
-      return 0
+
+  # _try_master_open — one attempt with the stored credential. Returns 0 on a
+  # verified-open master; non-zero otherwise. Stderr from sshpass/ssh lands in
+  # the file named by $1 so the caller can classify it.
+  #
+  # v0.8.15 hardening (banner + rotating-password):
+  # • -o PreferredAuthentications=password -o PubkeyAuthentication=no forces the
+  #   password method so sshpass feeds the password cleanly. Without this, on a
+  #   box that prints a long pre-auth banner and would otherwise try pubkey
+  #   first, ssh can consume the password slot on the wrong method and the only
+  #   thing surfaced is the banner with NO "permission denied" — exactly the
+  #   symptom seen on shdclvf01q.
+  # • -o NumberOfPasswordPrompts=1 so a stale password fails fast (one prompt)
+  #   instead of hanging, which lets us re-prompt for the rotated one.
+  _try_master_open() {
+    local errfile="$1"
+    sshpass -f "$credfile" ssh \
+      -o "ControlMaster=yes" \
+      -o "ControlPath=$sock" \
+      -o "ControlPersist=$SSH_CONTROL_PERSIST" \
+      -o "StrictHostKeyChecking=accept-new" \
+      -o "PreferredAuthentications=password" \
+      -o "PubkeyAuthentication=no" \
+      -o "NumberOfPasswordPrompts=1" \
+      -o "ConnectTimeout=10" \
+      -p "$port" \
+      -N -f \
+      "$addr" 2>"$errfile"
+    local rc=$?
+    [ "$rc" -eq 0 ] && ssh -S "$sock" -O check -p "$port" "$addr" 2>/dev/null
+  }
+
+  # _looks_like_auth_failure ERRFILE — heuristic: did this fail on auth (vs.
+  # network/host-key)? sshpass exits 5 on auth failure, but the banner can mask
+  # the textual reason, so we also treat permission/password/auth keywords as
+  # auth failures. A rotated password is the prime suspect on this box.
+  _looks_like_auth_failure() {
+    local errfile="$1"
+    grep -qiE 'permission denied|authentication fail|incorrect password|too many authentication|password:' "$errfile" 2>/dev/null && return 0
+    # Empty-or-banner-only stderr after a password attempt → almost always the
+    # rotated/stale credential. Treat as auth failure so we re-prompt.
+    return 0
+  }
+
+  local errfile="/tmp/larry-ssh-setup.err"
+  : > "$errfile"
+  if _try_master_open "$errfile"; then
+    ok "✓ master open: $alias → $addr:$port (socket: $sock)"
+    rm -f "$errfile"
+    return 0
+  fi
+
+  # First attempt failed. Surface the REAL error (not just the banner) and, if it
+  # looks like an auth failure, re-prompt for a fresh password (12h rotation on
+  # this box) and retry ONCE. Never silently no-op.
+  printf 'ssh-helper: first master-open attempt failed for %s.\n' "$alias" >&2
+  if [ -s "$errfile" ]; then
+    printf 'ssh-helper: ssh/sshpass stderr (auth error, not just the banner):\n' >&2
+    grep -iE 'permission denied|authentication|password|denied|fatal|connection|timed out|refused|host key' "$errfile" >&2 2>/dev/null \
+      || cat "$errfile" >&2 2>/dev/null
+  else
+    printf 'ssh-helper: (no stderr captured — the box likely printed only its pre-auth banner; the stored password is almost certainly stale)\n' >&2
+  fi
+
+  if _looks_like_auth_failure "$errfile" && [ -t 0 -o -e /dev/tty ]; then
+    printf 'ssh-helper: looks like the stored password is stale (this host rotates ~every 12h).\n' >&2
+    printf 'Enter a FRESH password for %s (input hidden; Enter to abort): ' "$alias" >&2
+    local pw=""
+    stty -echo 2>/dev/null
+    IFS= read -r pw /dev/null
+    echo "" >&2
+    if [ -n "$pw" ]; then
+      umask 077
+      printf '%s' "$pw" > "$credfile" # NO trailing newline (sshpass -f)
+      chmod 600 "$credfile"
+      ok "stored the fresh password — retrying master open..."
+      : > "$errfile"
+      if _try_master_open "$errfile"; then
+        ok "✓ master open: $alias → $addr:$port (socket: $sock)"
+        rm -f "$errfile"
+        return 0
+      fi
+      printf 'ssh-helper: retry with the fresh password ALSO failed. ssh/sshpass stderr:\n' >&2
+      cat "$errfile" >&2 2>/dev/null
+    else
+      printf 'ssh-helper: no password entered — aborting.\n' >&2
     fi
   fi
-  printf 'ssh-helper: setup failed. sshpass/ssh stderr:\n' >&2
-  cat /tmp/larry-ssh-setup.err >&2 2>/dev/null
-  rm -f /tmp/larry-ssh-setup.err
+
+  printf 'ssh-helper: master NOT open for %s. Next step: re-run `ssh-helper.sh setup %s` (or the /ssh-setup %s slash command) with a current password; if the host changed, re-check `ssh-helper.sh hosts`.\n' \
+    "$alias" "$alias" "$alias" >&2
+  rm -f "$errfile"
   return 1
 }
 
@@ -276,6 +399,39 @@ _build_login_cmd() {
   printf "bash -lc '%s'" "$esc"
 }
 
+# v0.8.15 (sudo-gated-profile fix): when an alias has a pinned HCIROOT, the
+# remote command must NOT go through the login profile (`bash -lc`). On hosts
+# whose login profile is sudo-gated, a non-interactive SSH session trips
+# `sudo: a terminal is required`, the profile never finishes, and $HCIROOT comes
+# back EMPTY. Instead we export HCIROOT explicitly and run a plain `sh -c` (no
+# login profile, no tty needed). This is deterministic and version-agnostic.
+#
+# _shq STR → single-quote STR for safe embedding inside another '...' context.
+_shq() { printf '%s' "$1" | sed "s/'/'\\\\''/g"; }
+
+# _build_pinned_cmd HCIROOT RAW → a remote command string that exports HCIROOT
+# explicitly (and HCISITEDIR-friendly callers can derive from it) then runs RAW
+# under a NON-login `sh -c`. No `bash -lc`, so the sudo-gated profile is skipped.
+_build_pinned_cmd() {
+  local root="$1" raw="$2"
+  local esc; esc=$(_shq "$raw")
+  printf "sh -c 'HCIROOT=%s; export HCIROOT; %s'" "$(_shq "$root")" "$esc"
+}
+
+# _remote_cmd_for ALIAS RAW → echo the exact command string to hand to ssh.
+# If ALIAS has a pinned HCIROOT → pinned (explicit-export, no login profile).
+# Else → the existing login-shell wrapper (_build_login_cmd). Single chokepoint
+# so cmd_exec/cmd_discover/cmd_pull_smat all honour the pin identically.
+_remote_cmd_for() {
+  local alias="$1" raw="$2"
+  local pin; pin=$(read_host_hciroot "$alias")
+  if [ -n "$pin" ]; then
+    _build_pinned_cmd "$pin" "$raw"
+  else
+    _build_login_cmd "$raw"
+  fi
+}
+
 cmd_exec() {
   local alias="${1:-}"
   [ -n "$alias" ] || die "usage: exec "
@@ -291,9 +447,11 @@ cmd_exec() {
   if [ ! -S "$sock" ] || ! ssh -S "$sock" -O check -p "$port" "$addr" 2>/dev/null; then
     die "no open master for $alias — run 'setup $alias' first"
   fi
-  # Multiplexed; no password needed. Run in a login shell so $HCIROOT et al.
-  # populate from the remote Cloverleaf login profile (see _build_login_cmd).
-  ssh -S "$sock" -p "$port" -o BatchMode=yes "$addr" "$(_build_login_cmd "$cmd")"
+  # Multiplexed; no password needed. If the alias has a pinned HCIROOT we export
+  # it explicitly and skip the login profile (v0.8.15 sudo-gated-profile fix);
+  # otherwise we run in a login shell so $HCIROOT et al. populate from the remote
+  # Cloverleaf login profile (see _build_login_cmd / _remote_cmd_for).
+  ssh -S "$sock" -p "$port" -o BatchMode=yes "$addr" "$(_remote_cmd_for "$alias" "$cmd")"
 }
 
 # cmd_discover ALIAS — proactively detect the remote Cloverleaf environment.
@@ -318,29 +476,78 @@ cmd_discover() {
     die "no open master for $alias — run 'setup $alias' first"
   fi
 
-  # A single login-shell remote script. It:
+  # A single remote script. It:
   # - prints HCIROOT\t$HCIROOT
-  # - tries `hcisitelist` (Cloverleaf site lister); each non-empty token → SITE
-  # - falls back to a NetConfig walk under $HCIROOT (depth ≤2)
-  # Kept POSIX-sh so it runs under whatever /bin/sh the login shell spawns.
+  # - PRIMARY enumeration = the NetConfig walk under $HCIROOT (depth ≤2),
+  #   IDENTICAL to lib/each-site.sh: find NetConfig files → dirname → basename
+  #   → sort -u. This is the version-agnostic ground truth and works on a box
+  #   with NO `hcisitelist` (v0.8.15 portability fix — confirmed: shdclvf01q
+  #   has no hcisitelist).
+ # - `hcisitelist` is used ONLY if it is actually present AND the walk found + # nothing (belt-and-suspenders), never as the dependency. + # Kept POSIX-sh so it runs under whatever /bin/sh spawns it. + # + # NOTE on environment: when the alias has a pinned HCIROOT, _remote_cmd_for + # exports HCIROOT explicitly and runs this under a NON-login `sh -c` (skips the + # sudo-gated login profile). Otherwise it runs under `bash -lc` so the login + # profile populates $HCIROOT. Either way the script below only reads + # ${HCIROOT:-}, so it is agnostic to which path delivered it. + # v0.8.15 (list-sites exclusion): drop non-real entries from the enumeration so + # /sites shows only operator-meaningful sites. Two filters, applied at the walk + # source (so REMOTE pinned, REMOTE login-shell, and LOCAL all behave the same): + # 1. SITES_EXCLUDE — static scaffolding/special dirs (helloworld, siteProto, + # master). A documented, tunable env var: Bryan can override at call time + # via `SITES_EXCLUDE='...' discover ` without a config UI. + # 2. Host-name match — any site dir whose name == the remote `hostname -s` or + # full `hostname` (a dir just named after the box, e.g. shdclvf01q). The + # remote hostname is the primary signal; we ALSO pass the alias's configured + # SSH host as a secondary candidate (qa's alias host is lhsixfqa) so a dir + # matching that is dropped too. + # NOT silent: every dropped name is reported on an EXCLUDED note so the tool + # layer surfaces it. The real-site list/count stays the headline. + local sites_exclude="${SITES_EXCLUDE:-helloworld siteProto master}" + # bare host from the alias's user@host (strip optional user@); '-' if none. 
+ local alias_host="${addr#*@}"; [ -n "$alias_host" ] || alias_host="-" local remote=' + SITES_EXCLUDE='\'"$(_shq "$sites_exclude")"\''; + ALIAS_HOST='\'"$(_shq "$alias_host")"\''; printf "HCIROOT\t%s\n" "${HCIROOT:-}"; if [ -z "${HCIROOT:-}" ]; then - printf "NOTE\tHCIROOT empty even in a login shell — operator profile may not export it\n"; + printf "NOTE\tHCIROOT is empty. If this host has a sudo-gated/non-interactive login profile, pin it: ssh-helper.sh set-hciroot \n"; exit 0; - fi - got=0; - if command -v hcisitelist >/dev/null 2>&1; then - for s in $(hcisitelist 2>/dev/null); do - [ -n "$s" ] && { printf "SITE\t%s\n" "$s"; got=1; } - done; fi; - if [ "$got" = "0" ]; then - find "$HCIROOT" -mindepth 1 -maxdepth 2 -name NetConfig -type f 2>/dev/null \ - | while IFS= read -r nc; do d=$(dirname "$nc"); printf "SITE\t%s\n" "$(basename "$d")"; done \ - | sort -u; - fi' - ssh -S "$sock" -p "$port" -o BatchMode=yes "$addr" "$(_build_login_cmd "$remote")" + if [ ! -d "${HCIROOT}" ]; then + printf "NOTE\tHCIROOT=%s is not a directory on the remote — check the pinned path\n" "${HCIROOT}"; + exit 0; + fi; + sites=$(find "$HCIROOT" -mindepth 1 -maxdepth 2 -name NetConfig -type f 2>/dev/null \ + | while IFS= read -r nc; do d=$(dirname "$nc"); basename "$d"; done \ + | sort -u); + if [ -z "$sites" ] && command -v hcisitelist >/dev/null 2>&1; then + printf "NOTE\tNetConfig walk found no sites; falling back to hcisitelist\n"; + sites=$(hcisitelist 2>/dev/null | tr " " "\n" | grep -v "^$" | sort -u); + fi; + if [ -z "$sites" ]; then + printf "NOTE\tno sites with a NetConfig found under %s\n" "$HCIROOT"; + exit 0; + fi; + HN_S=$(hostname -s 2>/dev/null || true); + HN_F=$(hostname 2>/dev/null || true); + kept=""; dropped=""; + for s in $sites; do + [ -n "$s" ] || continue; + drop=""; + for x in $SITES_EXCLUDE; do [ "$s" = "$x" ] && drop=1 && break; done; + [ -z "$drop" ] && [ -n "$HN_S" ] && [ "$s" = "$HN_S" ] && drop=1; + [ -z "$drop" ] && [ -n "$HN_F" ] && [ "$s" = "$HN_F" ] 
&& drop=1; + [ -z "$drop" ] && [ "$ALIAS_HOST" != "-" ] && [ "$s" = "$ALIAS_HOST" ] && drop=1; + if [ -n "$drop" ]; then dropped="$dropped $s"; else kept="$kept +$s"; fi; + done; + dropped=$(printf "%s" "$dropped" | sed "s/^ *//"); + [ -n "$dropped" ] && printf "EXCLUDED\t%s\n" "$dropped"; + printf "%s\n" "$kept" | while IFS= read -r s; do [ -n "$s" ] && printf "SITE\t%s\n" "$s"; done' + ssh -S "$sock" -p "$port" -o BatchMode=yes "$addr" "$(_remote_cmd_for "$alias" "$remote")" } # ── v0.6.8: scp helpers that multiplex via the existing ControlMaster ──────── @@ -503,7 +710,8 @@ cmd_pull_smat() { find_cmd+='printf "SMATDB_PATH:%s\n" "$F"' local _smat_raw remote_smatdb - _smat_raw=$(ssh -S "$_RH_SOCK" -p "$_RH_PORT" -o BatchMode=yes "$_RH_ADDR" "$(_build_login_cmd "$find_cmd")" 2>&1) + # v0.8.15: honour a pinned HCIROOT (explicit export, no sudo-gated login profile). + _smat_raw=$(ssh -S "$_RH_SOCK" -p "$_RH_PORT" -o BatchMode=yes "$_RH_ADDR" "$(_remote_cmd_for "$alias" "$find_cmd")" 2>&1) remote_smatdb=$(printf '%s\n' "$_smat_raw" | grep '^SMATDB_PATH:' | tail -1) if [ -n "$remote_smatdb" ]; then remote_smatdb="${remote_smatdb#SMATDB_PATH:}" @@ -556,8 +764,12 @@ cmd_pull_smat() { sample_cmd+='RETURNED=$(sqlite3 "'"$remote_smatdb"'" "SELECT MIN(1000, COUNT(*)) FROM smat_msgs WHERE Time >= $CUTOFF_MS"); ' sample_cmd+='echo "# smatdb=$(basename '"$remote_smatdb"') days_back='"$days_back"' total_in_window=$TOTAL returned=$RETURNED truncated=$([ "$TOTAL" -gt 1000 ] && echo yes || echo no)" >&2' - # Login shell so sqlite3 resolves from the operator's PATH (v0.8.13). - ssh -S "$_RH_SOCK" -p "$_RH_PORT" -o BatchMode=yes "$_RH_ADDR" "$(_build_login_cmd "$sample_cmd")" + # Login shell so sqlite3 resolves from the operator's PATH (v0.8.13), unless + # the alias has a pinned HCIROOT, in which case we export HCIROOT explicitly + # and skip the sudo-gated login profile (v0.8.15). 
Note: when pinned, sqlite3 + # must be resolvable on the default non-login PATH; if it is not, the + # sample_cmd already emits a clear "ERROR: sqlite3 not on remote PATH". + ssh -S "$_RH_SOCK" -p "$_RH_PORT" -o BatchMode=yes "$_RH_ADDR" "$(_remote_cmd_for "$alias" "$sample_cmd")" } case "${1:-help}" in @@ -565,6 +777,7 @@ case "${1:-help}" in add) shift; cmd_add "$@" ;; remove|rm) shift; cmd_remove "$@" ;; pass|passwd) shift; cmd_pass "$@" ;; + set-hciroot|hciroot) shift; cmd_set_hciroot "$@" ;; setup|open) shift; cmd_setup "$@" ;; close|exit) shift; cmd_close "$@" ;; status) shift; cmd_status "$@" ;;