Shared _sanitize_ctl (unconditional, nc-document) and _sanitize_ctl_tty (strips only when stdout is a terminal) now live in cygwin-safe.sh. nc-msgs, nc-parse, and the hl7-* tools route stdout through the tty-gated variant, so a terminal is protected from raw HL7/NetConfig control bytes while pipes and redirects stay byte-exact (the 0x1c framing route_test needs is preserved). Exit codes propagate via PIPESTATUS. ssh-helper _read_hidden installs its restore trap before stty -echo on every path and saves/restores the prior trap. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
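The call-site pattern this message describes can be sketched roughly as follows. This is a minimal illustration, not the actual tool code: `emit_frames` and `run_tool` are hypothetical stand-ins for a tool's output region and its wrapper, and the two sanitizer functions are local copies so the snippet runs on its own.

```shell
#!/usr/bin/env bash
# Sketch only: emit_frames and run_tool are hypothetical names, not part of
# the real tools. The sanitizers are copied locally to keep this standalone.

_sanitize_ctl() {
  # Drop C0 control bytes except TAB, LF, and CR; also drop DEL.
  LC_ALL=C tr -d '\001-\010\013\014\016-\037\177'
}

_sanitize_ctl_tty() {
  # Sanitize only when stdout is a terminal; pipes and files stay byte-exact.
  if [ -t 1 ]; then _sanitize_ctl; else cat; fi
}

emit_frames() {
  printf 'MSH|^~\\&|demo\r\034\n'   # one fake record carrying 0x1c framing
  return 3                            # pretend the producer failed
}

run_tool() {
  emit_frames | _sanitize_ctl_tty
  # PIPESTATUS[0] is the producer's status, not the filter's.
  return "${PIPESTATUS[0]}"
}

run_tool >/dev/null
echo "producer exit code: $?"        # prints: producer exit code: 3
```

Reading `${PIPESTATUS[0]}` immediately after the pipeline matters: any intervening command would overwrite the array. An alternative would be `set -o pipefail`, but that changes shell-wide behaviour, which a sourced library is probably better off leaving to its caller.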
180 lines
8.1 KiB
Bash
Executable File
#!/usr/bin/env bash
# cygwin-safe.sh — primitives that defend Larry-Anywhere against the
# Cygwin/MobaXterm CR-taint pattern that crashed OAuth in v0.7.3.
#
# Pattern (full diagnosis in
# Deliverables/2026-05-27-cloverleaf-larry-oauth-arithmetic-fix.md and the
# v0.7.5 CR-safety sweep deliverable):
#
# On MobaXterm / Cygwin / Git-Bash-for-Windows, any of the following can
# return a string ending in a literal carriage return (\r):
#   - `$(date +%s)`, `$(date ...)`, `$(cmd)` against a Cygwin-built binary
#   - `read` of user input (depending on tty mode)
#   - `cat`/`head`/`tail` against a CRLF-line-ended file
#   - `$(<file)` when the file is CRLF
#   - `wc -l < file`, `wc -c < file` (the count is fine, but `wc.exe` may
#     still emit `\r\n` on the captured stdout)
#   - `jq -r '.field' file.json` when file was created with CRLF
#   - Heredoc lines that came through a Windows clipboard
#
# The CR is invisible in normal output but lethal when the string lands in:
#   - bash arithmetic ($(( )), (( )), let, [ -gt N ]) → "invalid arithmetic
#     operator (error token is "")"
#   - case dispatchers → the input is the LITERAL `/cmd\r`, which no
#     `/cmd)` arm matches
#   - regex tests → `[[ $x =~ ^[Yy]$ ]]` silently fails on `Y\r`
#   - path construction → mkdir/stat fail with ENOENT on `dir\r/file`
#   - HTTP headers → server rejects the malformed Authorization line
#   - file compares → `[[ $a == $b ]]` silent false-negative
#
# This file is SOURCEABLE — every caller does:
#     . "$LARRY_LIB_DIR/cygwin-safe.sh"
#
# Idempotent: re-sourcing is harmless (functions just get redefined identically).
# It defines functions only, runs no code on source, sets no `set -u/-e/-o pipefail`
# globally (those are the caller's responsibility — we must not change them).

# coerce_int VAL [DEFAULT] — return a clean decimal integer that is SAFE to
# drop into any bash arithmetic / integer-test context.
#
# Algorithm: strip every byte that isn't 0-9, then fall back to DEFAULT (or 0)
# if the result is empty. No printf %d (whose behaviour on CR taint varies by
# libc), no shell expansion in arithmetic context — nothing that can crash
# the caller.
#
# Caveats: a sign is stripped too ("-5" becomes "5"), and leading zeros are
# kept, so "08" still trips $(( )) as invalid octal; write $(( 10#$n )) there.
#
# Use whenever the value will appear in:
#   $((expr))   (( expr ))   [ X -gt Y ]   [[ X -lt Y ]]   let X=...
coerce_int() {
  local raw="${1:-}" default="${2:-0}"
  local cleaned; cleaned=$(printf '%s' "$raw" | tr -cd '0-9')
  printf '%s' "${cleaned:-$default}"
}

# strip_cr VAL — return VAL with every embedded carriage return removed.
#
# Use when the value will appear in:
#   case "$X" in ...) ...; esac         # pattern dispatchers
#   [[ "$X" =~ ^[Yy]$ ]]                # regex tests
#   [[ "$X" == "literal" ]]             # string compares
#   "$prefix/$X"                        # path construction
#   "-H Authorization: Bearer $X"       # HTTP headers
#
# Cheaper than coerce_int — no subshell, pure bash parameter expansion.
strip_cr() {
  local v="${1:-}"
  # Strip ALL \r occurrences, not just trailing — embedded CRs (from CRLF
  # multi-line input) are just as toxic for the consumers above.
  printf '%s' "${v//$'\r'/}"
}

# rtrim VAL — return VAL with all TRAILING whitespace removed (spaces, tabs,
# and CR — anything in [:space:]). Leading and interior whitespace untouched.
#
# Use immediately before a `case "$X" in ...) esac` pattern dispatcher whose
# arms are exact-string globs (e.g. /quit) /help)). Bash case patterns are
# literal globs, so a trailing space makes "/quit " miss "/quit)" and fall
# through to the catch-all. This bites tab completion: __larry_complete_slash
# intentionally appends a friendly trailing space after a unique match (so
# arg-taking commands feel snappy), which the exact-match dispatcher then
# rejects for no-arg commands. rtrim at the dispatch boundary tolerates the
# completer's space, a user-typed trailing space, and any CR remnant in one
# defensive line — without removing the completer's UX nicety.
#
# Trailing-only by design: interior spaces separate a command from its
# argument (/load FILE), so we must never collapse those. The expansion
# below strips the run of trailing whitespace chars only.
#
# Pure bash parameter expansion — no subshell, no external tools.
rtrim() {
  local v="${1:-}"
  printf '%s' "${v%"${v##*[![:space:]]}"}"
}

# read_clean VAR [PROMPT] — like `read -r VAR`, but every captured byte that
# is \r gets stripped before the assignment.
#
# Why a wrapper instead of post-processing the var: bash's `read` already
# strips a trailing newline, but on Cygwin/MobaXterm with a CRLF tty the
# \r BEFORE the \n stays in the variable. Doing `read X; X="${X//$'\r'/}"`
# at every call site is 2× the diff and easy to forget; this folds it.
#
# Reads from /dev/tty by default (same as the prevailing `read -r ans </dev/tty
# || ans=""` idiom across the codebase) so it works when stdin is piped.
# If /dev/tty is unavailable, falls back to plain stdin.
#
# Usage:
#   read_clean answer "Proceed? [y/N]: "
#   if [[ "$answer" =~ ^[Yy]$ ]]; then ... fi
#
# Returns the same exit code as the underlying `read` (1 on EOF).
read_clean() {
  local _var="$1"; shift
  local _prompt="${1:-}"
  local _raw=""
  if [ -r /dev/tty ]; then
    if [ -n "$_prompt" ]; then
      IFS= read -r -p "$_prompt" _raw </dev/tty
    else
      IFS= read -r _raw </dev/tty
    fi
  else
    if [ -n "$_prompt" ]; then
      IFS= read -r -p "$_prompt" _raw
    else
      IFS= read -r _raw
    fi
  fi
  # $? here is still the status of whichever `read` ran in the branches above.
  local _rc=$?
  # Strip ALL CRs (paste of multi-line CRLF can introduce embedded ones).
  _raw="${_raw//$'\r'/}"
  # Assign with printf -v (no eval), so the value survives metacharacters.
  printf -v "$_var" '%s' "$_raw"
  return "$_rc"
}

# ─────────────────────────────────────────────────────────────────────────────
# CONTROL-BYTE SANITIZER (terminal-corruption defence) — shared since v0.8.26.
#
# Origin: v0.8.25 added this as a private helper in nc-document.sh after raw
# ESC/control bytes in tool output flipped the user's terminal mode and broke
# backspace/arrows (recoverable only with `stty sane`/`reset`). v0.8.26 hoists
# the one definition here so EVERY tool that dumps NetConfig/.tcl/HL7 content
# shares it — most importantly nc-msgs.sh, whose raw HL7 carries 0x1c block
# framing and other C0 bytes that wreck a terminal when viewed un-redirected.
#
# `_sanitize_ctl` filters stdin→stdout, stripping the C0 control bytes that
# corrupt a terminal while PRESERVING the three whitespace controls that text
# legitimately uses (TAB, LF, CR) and all high bytes (0x80-0xFF, so UTF-8
# names/comments and the em-dash survive intact).
#
# Strip set (octal, POSIX `tr` ranges — portable to AIX/Linux/BSD/Cygwin):
#   \001-\010   SOH..BS  (drops BS ^H, the literal-backspace culprit)
#               [keep \011 TAB, \012 LF]
#   \013\014    VT, FF
#               [keep \015 CR — legit in MobaXterm/Windows-tainted content]
#   \016-\037   SO..US   (drops ESC 0x1B, the mode-flip culprit; 0x1c FS too)
#   \177        DEL
# LC_ALL=C forces byte-wise operation (AIX `tr` is locale-sensitive otherwise).
# Falls back to `cat` if `tr` is somehow unavailable, so it never drops data.
_sanitize_ctl() {
  LC_ALL=C tr -d '\001-\010\013\014\016-\037\177' 2>/dev/null || cat
}

# `_sanitize_ctl_tty` — the DATA-TOOL variant. nc-msgs/nc-parse/hl7-* emit data
# that downstream tooling consumes byte-for-byte (e.g. `nc-msgs ... > input.msgs`
# feeding route_test, or `| awkcut`). The 0x1c HL7 framing and other control
# bytes are LOAD-BEARING on a pipe/redirect — stripping them would silently
# corrupt the data. So we only sanitize when stdout is an interactive TERMINAL
# (protect the human's tty); on a pipe/file we pass through RAW, byte-identical.
#
# `[ -t 1 ]` is POSIX (true only when fd 1 is a terminal). Note this is a
# FILTER (stdin→stdout); the gate decision is made once at call time. Callers
# pipe their whole output region through it and propagate ${PIPESTATUS[0]} so
# the upstream producer's exit code is preserved across the pipe.
_sanitize_ctl_tty() {
  if [ -t 1 ]; then
    _sanitize_ctl
  else
    cat
  fi
}
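
The CR-taint primitives defined above can be exercised with a short standalone sketch. The function bodies are repeated here only so the snippet runs without sourcing the library; the tainted values are made-up stand-ins for what `$(date +%s)`, a tab-completed command line, and a tty `read` might deliver on Cygwin/MobaXterm.

```shell
#!/usr/bin/env bash
# Sketch only: local copies of the primitives plus invented tainted inputs.

coerce_int() {
  local raw="${1:-}" default="${2:-0}"
  local cleaned; cleaned=$(printf '%s' "$raw" | tr -cd '0-9')
  printf '%s' "${cleaned:-$default}"
}

strip_cr() {
  local v="${1:-}"
  printf '%s' "${v//$'\r'/}"
}

rtrim() {
  local v="${1:-}"
  printf '%s' "${v%"${v##*[![:space:]]}"}"
}

# Arithmetic: a trailing CR would abort $(( )) without coerce_int.
tainted_epoch=$'1716800000\r'
now=$(coerce_int "$tainted_epoch")
echo "$(( now + 60 ))"              # prints: 1716800060

# Dispatch: the completer's trailing space and a CR remnant are both tolerated.
cmd=$(rtrim $'/quit \r')
case "$cmd" in
  /quit) echo "dispatch: quit" ;;
  *)     echo "dispatch: unknown" ;;
esac

# Regex test: the anchored pattern matches once the CR is gone.
answer=$(strip_cr $'Y\r')
if [[ "$answer" =~ ^[Yy]$ ]]; then
  echo "confirmed"
fi
```

Which primitive to reach for depends on the consumer: `coerce_int` for anything numeric, `rtrim` at a `case` dispatch boundary, and `strip_cr` for compares, paths, and headers where interior content must otherwise be kept as-is.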