cloverleaf-larry/lib/cygwin-safe.sh
Bryan Johnson 111be2c744 v0.8.26: harden control-byte sanitize across the tool suite + ssh-helper traps
Shared _sanitize_ctl (unconditional, nc-document) and _sanitize_ctl_tty
(strips only when stdout is a terminal) now live in cygwin-safe.sh. nc-msgs,
nc-parse, and the hl7-* tools route stdout through the tty-gated variant, so a
terminal is protected from raw HL7/NetConfig control bytes while pipes and
redirects stay byte-exact (the 0x1c framing route_test needs is preserved).
Exit codes propagate via PIPESTATUS. ssh-helper _read_hidden installs its
restore trap before stty -echo on every path and saves/restores the prior trap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 16:35:06 -07:00

180 lines
8.1 KiB
Bash
Executable File
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

#!/usr/bin/env bash
# cygwin-safe.sh — three primitives that defend Larry-Anywhere against the
# Cygwin/MobaXterm CR-taint pattern that crashed OAuth in v0.7.3.
#
# Pattern (full diagnosis in
# Deliverables/2026-05-27-cloverleaf-larry-oauth-arithmetic-fix.md and the
# v0.7.5 CR-safety sweep deliverable):
#
# On MobaXterm / Cygwin / Git-Bash-for-Windows, any of the following can
# return a string ending in a literal carriage return (\r):
# - `$(date +%s)`, `$(date ...)`, `$(cmd)` against a Cygwin-built binary
# - `read` of user input (depending on tty mode)
# - `cat`/`head`/`tail` against a CRLF-line-ended file
# - `$(<file)` when the file is CRLF
# - `wc -l < file`, `wc -c < file` (the count is fine, but `wc.exe` may
# still emit `\r\n` on the captured stdout)
# - `jq -r '.field' file.json` when file was created with CRLF
# - Heredoc lines that came through a Windows clipboard
#
# The CR is invisible in normal output but lethal when the string lands in:
# - bash arithmetic ($(( )), (( )), let, [ -gt N ]) → "invalid arithmetic
# operator (error token is "")"
# - case dispatchers → pattern matches LITERAL `/cmd\r`, not `/cmd`
# - regex tests → `[[ $x =~ ^[Yy]$ ]]` silently fails on `Y\r`
# - path construction → mkdir/stat fail with ENOENT on `dir\r/file`
# - HTTP headers → server rejects the malformed Authorization line
# - file compares → `[[ $a == $b ]]` silent false-negative
#
# This file is SOURCEABLE — every caller does:
# . "$LARRY_LIB_DIR/cygwin-safe.sh"
#
# Idempotent: re-sourcing is harmless (functions just get redefined identically).
# It defines functions only, runs no code on source, sets no `set -u/-e/-o pipefail`
# globally (those are the caller's responsibility — we must not change them).
# coerce_int VAL [DEFAULT] — return a clean decimal integer that is SAFE to
# drop into any bash arithmetic / integer-test context.
#
# Algorithm: strip every byte that isn't 0-9, then fall back to DEFAULT (or 0)
# if the result is empty. No printf %d (whose behaviour on CR taint varies by
# libc), no shell expansion in arithmetic context — nothing that can crash
# the caller.
#
# Use whenever the value will appear in:
# $((expr)) (( expr )) [ X -gt Y ] [[ X -lt Y ]] let X=...
coerce_int() {
local raw="${1:-}" default="${2:-0}"
local cleaned; cleaned=$(printf '%s' "$raw" | tr -cd '0-9')
printf '%s' "${cleaned:-$default}"
}
# strip_cr VAL — return VAL with every embedded carriage return removed.
#
# Use when the value will appear in:
# case "$X" in ...) ...; esac # pattern dispatchers
# [[ "$X" =~ ^[Yy]$ ]] # regex tests
# [[ "$X" == "literal" ]] # string compares
# "$prefix/$X" # path construction
# "-H Authorization: Bearer $X" # HTTP headers
#
# Cheaper than coerce_int — no subshell, pure bash parameter expansion.
strip_cr() {
local v="${1:-}"
# Strip ALL \r occurrences, not just trailing — embedded CRs (from CRLF
# multi-line input) are just as toxic for the consumers above.
printf '%s' "${v//$'\r'/}"
}
# rtrim VAL — return VAL with all TRAILING whitespace removed (spaces, tabs,
# and CR — anything in [:space:]). Leading and interior whitespace untouched.
#
# Use immediately before a `case "$X" in ...) esac` pattern dispatcher whose
# arms are exact-string globs (e.g. /quit) /help)). Bash case patterns are
# literal globs, so a trailing space makes "/quit " miss "/quit)" and fall
# through to the catch-all. This bites tab completion: __larry_complete_slash
# intentionally appends a friendly trailing space after a unique match (so
# arg-taking commands feel snappy), which the exact-match dispatcher then
# rejects for no-arg commands. rtrim at the dispatch boundary tolerates the
# completer's space, a user-typed trailing space, and any CR remnant in one
# defensive line — without removing the completer's UX nicety.
#
# Trailing-only by design: interior spaces separate a command from its
# argument (/load FILE), so we must never collapse those. The expansion
# below strips the run of trailing whitespace chars only.
#
# Pure bash parameter expansion — no subshell, no external tools.
rtrim() {
local v="${1:-}"
printf '%s' "${v%"${v##*[![:space:]]}"}"
}
# read_clean VAR [PROMPT] — like `read -r VAR`, but every captured byte that
# is \r gets stripped before the assignment.
#
# Why a wrapper instead of post-processing the var: bash's `read` already
# strips a trailing newline, but on Cygwin/MobaXterm with a CRLF tty the
# \r BEFORE the \n stays in the variable. Doing `read X; X="${X//$'\r'/}"`
# at every call site is 2× the diff and easy to forget; this folds it.
#
# Reads from /dev/tty by default (same as the prevailing `read -r ans </dev/tty
# || ans=""` idiom across the codebase) so it works when stdin is piped.
# If /dev/tty is unavailable, falls back to plain stdin.
#
# Usage:
# read_clean answer "Proceed? [y/N]: "
# if [[ "$answer" =~ ^[Yy]$ ]]; then ... fi
#
# Returns the same exit code as the underlying `read` (1 on EOF).
read_clean() {
local _var="$1"; shift
local _prompt="${1:-}"
local _raw=""
if [ -r /dev/tty ]; then
if [ -n "$_prompt" ]; then
IFS= read -r -p "$_prompt" _raw </dev/tty
else
IFS= read -r _raw </dev/tty
fi
else
if [ -n "$_prompt" ]; then
IFS= read -r -p "$_prompt" _raw
else
IFS= read -r _raw
fi
fi
local _rc=$?
# Strip ALL CRs (paste of multi-line CRLF can introduce embedded ones).
_raw="${_raw//$'\r'/}"
# Assign through eval — printf-quote the value so it survives metacharacters.
printf -v "$_var" '%s' "$_raw"
return $_rc
}
# ─────────────────────────────────────────────────────────────────────────────
# CONTROL-BYTE SANITIZER (terminal-corruption defence) — shared since v0.8.26.
#
# Origin: v0.8.25 added this as a private helper in nc-document.sh after raw
# ESC/control bytes in tool output flipped the user's terminal mode and broke
# backspace/arrows (recoverable only with `stty sane`/`reset`). v0.8.26 hoists
# the one definition here so EVERY tool that dumps NetConfig/.tcl/HL7 content
# shares it — most importantly nc-msgs.sh, whose raw HL7 carries 0x1c block
# framing and other C0 bytes that wreck a terminal when viewed un-redirected.
#
# `_sanitize_ctl` filters stdin→stdout, stripping the C0 control bytes that
# corrupt a terminal while PRESERVING the three whitespace controls that text
# legitimately uses (TAB, LF, CR) and all high bytes (0x80-0xFF, so UTF-8
# names/comments and the em-dash survive intact).
#
# Strip set (octal, POSIX `tr` ranges — portable to AIX/Linux/BSD/Cygwin):
# \001-\010 SOH..BS (drops BS ^H, the literal-backspace culprit)
# [keep \011 TAB, \012 LF]
# \013\014 VT, FF
# [keep \015 CR — legit in MobaXterm/Windows-tainted content]
# \016-\037 SO..US (drops ESC 0x1B, the mode-flip culprit; 0x1c FS too)
# \177 DEL
# LC_ALL=C forces byte-wise operation (AIX `tr` is locale-sensitive otherwise).
# Falls back to `cat` if `tr` is somehow unavailable, so it never drops data.
_sanitize_ctl() {
LC_ALL=C tr -d '\001-\010\013\014\016-\037\177' 2>/dev/null || cat
}
# `_sanitize_ctl_tty` — the DATA-TOOL variant. nc-msgs/nc-parse/hl7-* emit data
# that downstream tooling consumes byte-for-byte (e.g. `nc-msgs ... > input.msgs`
# feeding route_test, or `| awkcut`). The 0x1c HL7 framing and other control
# bytes are LOAD-BEARING on a pipe/redirect — stripping them would silently
# corrupt the data. So we only sanitize when stdout is an interactive TERMINAL
# (protect the human's tty); on a pipe/file we pass through RAW, byte-identical.
#
# `[ -t 1 ]` is POSIX (true only when fd 1 is a terminal). Note this is a
# FILTER (stdin→stdout); the gate decision is made once at call time. Callers
# pipe their whole output region through it and propagate ${PIPESTATUS[0]} so
# the upstream producer's exit code is preserved across the pipe.
_sanitize_ctl_tty() {
if [ -t 1 ]; then
_sanitize_ctl
else
cat
fi
}