v0.8.0: PHI safety quick-wins — path-block + /load HL7 routing + strict mode
Three independent zero-risk patches closing V3/V4/V5/V6/V11 gaps from
Vera's static PHI-leak audit. Implemented per Pax's mitigation
recommendations. No new deps, no behavior change for users not handling PHI.
- tool_read_file / tool_grep_files / tool_glob_files / tool_list_dir now
refuse paths under $LARRY_HOME/{log,sanitize,sessions} and
$LARRY_HOME/{.oauth.json,.env} with a structured JSON error the model
must surface. Block-list evaluates at call time; comparison runs against
both the literal and realpath-canonicalized forms of both the requested
path and $LARRY_HOME. Closes V4 + V6 + V11 (de-sanitization key, OAuth tokens,
PHI clear-text audit log). The proactive same-pattern sweep extended
the block from read_file alone to grep_files/glob_files/list_dir.
- /load <file> pre-routes HL7-shaped content through lib/hl7-sanitize.sh
(segment-aware tokenizer) BEFORE the user_input auto-PHI pass. Closes
V3 — smat dumps loaded via /load no longer rely on the lighter per-word
classifier.
- LARRY_AUTO_PHI=strict (fourth value alongside off/on/confirm) is the
fail-closed mode. Aborts the turn when sanitizer is missing or returns
empty on HL7-shaped content, or when tokenize-value fails. On the
tool-result surface (can't kill an in-flight tool_use), substitutes
the result with a refusal sentinel so raw HL7 NEVER reaches the model.
Existing off/on/confirm semantics unchanged. /phi-auto strict toggle,
/help text, and tests updated. Closes V5.
Refs:
Deliverables/2026-05-27-cloverleaf-larry-phi-leak-audit.md (Vera)
Deliverables/2026-05-27-cloverleaf-larry-phi-mitigation-research.md (Pax)
Verification: bash -n clean; path-block unit-tested with 13 cases including
symlink resolution (file and dir), ../ traversal, nonexistent paths, and
the empty-LARRY_HOME edge case — all pass.
Co-Authored-By: Clover (Claude Opus 4.7) <noreply@anthropic.com>
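
The four-way comparison described in the first bullet can be sketched as below. This is a minimal illustration, not the code from larry.sh: `DEMO_HOME`, `canon`, and `is_blocked` are hypothetical names, and the sketch assumes GNU `realpath -m` where available, falling back to the literal path otherwise.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: DEMO_HOME, canon and is_blocked are illustrative names.

# Canonicalize a path. Falls back to the literal path when `realpath -m`
# is unavailable (for example a realpath without the -m flag).
canon() {
  realpath -m -- "$1" 2>/dev/null || printf '%s\n' "$1"
}

# Return 0 (blocked) when the path is a protected entry or lies under one.
# Both the literal and canonical forms of the path are compared against
# both the literal and canonical forms of the home directory.
is_blocked() {
  local p="$1" home="${DEMO_HOME%/}"
  [ -z "$home" ] && return 1
  local h hp
  for h in "$home" "$(canon "$home")"; do
    for hp in "$p" "$(canon "$p")"; do
      case "$hp" in
        "$h"/log|"$h"/log/*) return 0 ;;
        "$h"/sanitize|"$h"/sanitize/*) return 0 ;;
        "$h"/sessions|"$h"/sessions/*) return 0 ;;
        "$h"/.oauth.json|"$h"/.env) return 0 ;;
      esac
    done
  done
  return 1
}

DEMO_HOME="${DEMO_HOME:-/tmp/demo-larry-home}"
is_blocked "$DEMO_HOME/log/auto-phi.log" && echo "blocked"
is_blocked "$DEMO_HOME/notes.txt" || echo "allowed"
```

Quoting `"$h"` inside the `case` patterns keeps any glob characters in the home directory literal, so only the trailing `/*` acts as a wildcard.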
parent
9dd5821436
commit
7434e6e8b8
61
CHANGELOG.md
@@ -4,6 +4,67 @@ All notable changes to `cloverleaf-larry` / `larry-anywhere` are recorded here.
 Versioning is loose-semver; bumps trigger the in-process self-update on every
 running client via `LARRY_BASE_URL` + `MANIFEST`.
 
+## v0.8.0 — 2026-05-27
+
+PHI-safety quick-wins pack — three independent zero-risk patches closing
+four gap-classes Vera identified in the v0.7.5 static audit
+(`Deliverables/2026-05-27-cloverleaf-larry-phi-leak-audit.md`) with Pax's
+recommended mitigations
+(`Deliverables/2026-05-27-cloverleaf-larry-phi-mitigation-research.md`).
+No new dependencies, no behavior change for users not interacting with PHI.
+
+- **`read_file`/`grep_files`/`glob_files`/`list_dir` path-block list
+  (closes V4 + V6 + V11).** Refuse — with a structured JSON error the
+  model must surface, NOT a silent "file not found" — any tool-side
+  attempt to read or enumerate under `$LARRY_HOME/log/` (auto-phi.log,
+  headers.log, oauth.log, session logs), `$LARRY_HOME/sanitize/`
+  (lookup.tsv — the desanitization key), `$LARRY_HOME/sessions/`,
+  `$LARRY_HOME/.oauth.json`, or `$LARRY_HOME/.env`. Block-list resolves
+  `$LARRY_HOME` at call time (not script-parse time) and runs against
+  both the literal path and its `realpath -m` canonical form, so symlink
+  detours don't bypass. The proactive same-pattern sweep (Bryan standing
+  rule, 2026-05-27) extended the block from `tool_read_file` alone to
+  also cover `tool_grep_files`, `tool_glob_files`, and `tool_list_dir`
+  — those tools would otherwise leak filenames or grep-matched content
+  out of the same protected dirs without any approval gate.
+
+- **`/load <file>` HL7 pre-routing (closes V3).** When the loaded file's
+  content matches `_auto_phi_looks_like_hl7`, route it through
+  `lib/hl7-sanitize.sh` (the segment-aware tokenizer with the full PHI
+  field rule set: PID, PV1, NK1, GT1, IN1, OBR, OBX, DG1, ORC) BEFORE
+  the existing user_input auto-PHI pass. Closes the gap where smat dumps
+  loaded via `/load` only got the lighter per-word classifier, which
+  misses bare HL7 PID fields. Status line reports how many fields were
+  tokenized: `phi> /load: hl7-sanitize.sh tokenized N HL7 field(s) from
+  <file> before passing to auto-PHI`. Strict mode (see below) aborts the
+  `/load` if sanitize fails; default/confirm modes warn-and-continue.
+
+- **`LARRY_AUTO_PHI=strict` fail-closed mode (closes V5).** New fourth
+  value alongside `off / on / confirm`. In strict mode, the auto-PHI
+  pipeline aborts the surrounding turn (no payload built, no API call)
+  when: (a) `lib/hl7-sanitize.sh` is missing/non-executable on HL7-shaped
+  user_input, (b) the sanitizer returns empty on HL7-shaped content,
+  (c) any single value's `tokenize-value` call fails inside the
+  detection loop. On the tool-result surface (which can't kill the
+  in-flight tool_use), strict mode substitutes the result with a
+  structured JSON refusal sentinel so the raw HL7 NEVER reaches the
+  model. Existing `off / on / confirm` semantics unchanged (still
+  fail-open per Bryan's "don't break tools" priority). Strict is the
+  opt-in tradeoff for HIPAA work where a silent leak is worse than a
+  broken turn. `/phi-auto strict` toggle and `/help` text updated.
+  Wired into both auto-PHI invocation sites: user input scan and the
+  tool-result HL7 sanitizer gate.
+
+**Proactive same-pattern sweep (Bryan standing rule, 2026-05-27).**
+Searched the codebase for other tools matching the pattern "reads
+arbitrary path, returns content to model, no approval gate": found and
+patched `tool_grep_files`, `tool_glob_files`, `tool_list_dir`
+alongside `tool_read_file`. `bash_exec`/`ssh_exec` already require Y/N
+operator approval — the operator is the gatekeeper there (a second gate
+deferred to v0.8.1). No other matches.
+
+Manifest unchanged (no new files in `lib/`).
+
 ## v0.7.5 — 2026-05-27
 
 Three focused changes, one common cause: the Cygwin/MobaXterm CR-taint pattern
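
The mode-dependent handling of HL7-shaped `/load` content that the changelog entry describes can be summarised as a small decision function. This is a sketch under stated assumptions: `load_decision` and its result strings are hypothetical names, not part of larry.sh, and the caller is assumed to have already decided that the content is HL7-shaped.

```shell
#!/usr/bin/env bash
# Hypothetical decision helper; the name and result strings are illustrative.
# Args: mode            off | on | confirm | strict
#       sanitizer_ok    1 if the sanitizer script is present and executable
#       output_nonempty 1 if the sanitizer produced non-empty output
load_decision() {
  local mode="$1" sanitizer_ok="$2" output_nonempty="$3"
  # off: the operator opted out, nothing is routed through the sanitizer.
  if [ "$mode" = "off" ]; then echo "bypass"; return 0; fi
  # Healthy sanitizer with usable output: the sanitized text replaces the input.
  if [ "$sanitizer_ok" = "1" ] && [ "$output_nonempty" = "1" ]; then
    echo "use-sanitized"; return 0
  fi
  # Sanitizer missing or empty output: strict fails closed, other modes fail open.
  if [ "$mode" = "strict" ]; then echo "abort"; return 0; fi
  echo "warn-and-continue"
}

load_decision strict 0 0
load_decision confirm 1 0
```

Keeping the fail-open and fail-closed branches in one place makes it easier to see that only `strict` ever drops the loaded content.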
250
larry.sh
@@ -57,7 +57,7 @@ set -o pipefail
 # ─────────────────────────────────────────────────────────────────────────────
 # Config
 # ─────────────────────────────────────────────────────────────────────────────
-LARRY_VERSION="0.7.5"
+LARRY_VERSION="0.8.0"
 LARRY_HOME="${LARRY_HOME:-$HOME/.larry}"
 
 # ─────────────────────────────────────────────────────────────────────────────
@@ -652,6 +652,24 @@ add_user_tool_results() {
 # ─────────────────────────────────────────────────────────────────────────────
 tool_read_file() {
   local path="$1"
+  # v0.8.0-a: PHI safety path-block list. Refuse reads under any path that
+  # contains the desanitization key, OAuth tokens, env secrets, the auto-PHI
+  # audit log (which stores PHI values in clear), or prior-session transcripts.
+  # Returns a structured JSON error so the model surfaces the refusal explicitly
+  # instead of treating it like an ordinary "not found". Closes V4, V6, V11
+  # from Vera's audit (Deliverables/2026-05-27-cloverleaf-larry-phi-leak-audit.md).
+  #
+  # Block-list is computed AT CALL TIME (not script-parse time) so $LARRY_HOME
+  # resolves against the running process's value. realpath normalization is
+  # best-effort — symlinks resolve when realpath is available, otherwise we
+  # fall back to literal-prefix comparison on both the user-supplied path
+  # AND the canonicalized form.
+  if _read_file_path_blocked "$path"; then
+    printf '{"error":"path blocked by PHI safety policy","path":%s,"reason":"%s"}\n' \
+      "$(printf '%s' "$path" | jq -Rs .)" \
+      "this path is under \$LARRY_HOME/log, sanitize, sessions, or contains an OAuth/env secret file; access denied to prevent de-sanitization or credential leak"
+    return
+  fi
   if [ ! -e "$path" ]; then echo "ERROR: file not found: $path"; return; fi
   if [ ! -f "$path" ]; then echo "ERROR: not a regular file: $path"; return; fi
   local size; size=$(wc -c < "$path" 2>/dev/null || echo 0)
@@ -662,14 +680,88 @@ tool_read_file() {
   awk '{printf "%6d\t%s\n", NR, $0}' "$path"
 }
 
+# v0.8.0-a: True (0) if PATH is on the PHI-safety block list.
+# Blocks (each compared against both the literal path and its canonicalized
+# form, against both the literal $LARRY_HOME and its canonicalized form —
+# four-way comparison handles macOS /tmp → /private/tmp symlinking and
+# user-supplied symlinks alike):
+# $LARRY_HOME/log/ — auto-phi.log, oauth.log, headers.log, session logs
+# $LARRY_HOME/sanitize/ — lookup.tsv (the desanitization key)
+# $LARRY_HOME/sessions/ — prior transcript history
+# $LARRY_HOME/.oauth.json — OAuth subscription tokens
+# $LARRY_HOME/.env — env-var secrets (if present)
+#
+# Portability:
+# - GNU `realpath -m` resolves nonexistent paths; macOS `realpath` requires
+# the path to exist. We try `realpath -m` first, then plain `realpath`,
+# then a python3 fallback (os.path.realpath, which works on nonexistent
+# paths everywhere we ship). If all three fail, literal-prefix is the
+# only remaining defense — the block still works for direct attempts.
+_read_file_canon() {
+  local p="$1"
+  [ -z "$p" ] && return 1
+  local out
+  if command -v realpath >/dev/null 2>&1; then
+    out=$(realpath -m "$p" 2>/dev/null || true)
+    if [ -n "$out" ]; then printf '%s' "$out"; return 0; fi
+    out=$(realpath "$p" 2>/dev/null || true)
+    if [ -n "$out" ]; then printf '%s' "$out"; return 0; fi
+  fi
+  if command -v python3 >/dev/null 2>&1; then
+    out=$(python3 -c 'import os,sys; print(os.path.realpath(sys.argv[1]))' "$p" 2>/dev/null || true)
+    if [ -n "$out" ]; then printf '%s' "$out"; return 0; fi
+  fi
+  return 1
+}
+
+_read_file_path_blocked() {
+  local p="$1"
+  local home="${LARRY_HOME%/}"
+  [ -z "$home" ] && return 1
+  local canon hcanon
+  canon=$(_read_file_canon "$p" 2>/dev/null || true)
+  hcanon=$(_read_file_canon "$home" 2>/dev/null || true)
+  local h hp
+  for h in "$home" "$hcanon"; do
+    [ -z "$h" ] && continue
+    for hp in "$p" "$canon"; do
+      [ -z "$hp" ] && continue
+      case "$hp" in
+        "$h"/log|"$h"/log/*) return 0 ;;
+        "$h"/sanitize|"$h"/sanitize/*) return 0 ;;
+        "$h"/sessions|"$h"/sessions/*) return 0 ;;
+        "$h"/.oauth.json|"$h"/.oauth.json.*) return 0 ;;
+        "$h"/.env|"$h"/.env.*) return 0 ;;
+      esac
+    done
+  done
+  return 1
+}
+
 tool_list_dir() {
   local path="${1:-.}"
+  # v0.8.0-a sweep: list_dir of $LARRY_HOME/log etc. leaks filenames (e.g.
+  # session-2026-05-27-deadbeef.log) and existence of .oauth.json. Block.
+  if _read_file_path_blocked "$path"; then
+    printf '{"error":"path blocked by PHI safety policy","path":%s,"reason":"%s"}\n' \
+      "$(printf '%s' "$path" | jq -Rs .)" \
+      "directory listing denied for \$LARRY_HOME/log, sanitize, sessions"
+    return
+  fi
   if [ ! -d "$path" ]; then echo "ERROR: not a directory: $path"; return; fi
   ls -la --color=never "$path" 2>/dev/null || ls -la "$path"
 }
 
 tool_grep_files() {
   local pattern="$1"; local path="${2:-.}"
+  # v0.8.0-a sweep: grep_files of $LARRY_HOME/log/auto-phi.log would emit
+  # JSONL value/token pairs in clear (same de-sanitization risk as read_file).
+  if _read_file_path_blocked "$path"; then
+    printf '{"error":"path blocked by PHI safety policy","path":%s,"reason":"%s"}\n' \
+      "$(printf '%s' "$path" | jq -Rs .)" \
+      "grep denied for \$LARRY_HOME/log, sanitize, sessions, OAuth, env"
+    return
+  fi
   if [ ! -e "$path" ]; then echo "ERROR: path not found: $path"; return; fi
   local total
   total=$(grep -rnI --color=never -c "$pattern" "$path" 2>/dev/null \
@@ -682,6 +774,14 @@ tool_grep_files() {
 
 tool_glob_files() {
   local pattern="$1"; local path="${2:-.}"
+  # v0.8.0-a sweep: glob_files of $LARRY_HOME/sessions/ would enumerate every
+  # past session log filename; block.
+  if _read_file_path_blocked "$path"; then
+    printf '{"error":"path blocked by PHI safety policy","path":%s,"reason":"%s"}\n' \
+      "$(printf '%s' "$path" | jq -Rs .)" \
+      "glob denied for \$LARRY_HOME/log, sanitize, sessions"
+    return
+  fi
   if [ ! -d "$path" ]; then echo "ERROR: not a directory: $path"; return; fi
   local all; all=$(find "$path" -type f -name "$pattern" 2>/dev/null)
   local total; total=$(printf '%s\n' "$all" | grep -c .)
@@ -1064,8 +1164,8 @@ preprocess_phi_markers() {
 # Behavior controls
 # -----------------
 # env LARRY_AUTO_PHI 1 (default, ON) | 0 (off) | confirm (prompt on
-# Tier 3-4 matches)
-# /phi-auto on|off|confirm|status
+# Tier 3-4 matches) | strict (v0.8.0, fail-closed)
+# /phi-auto on|off|confirm|strict|status
 # !nophi <prompt> per-turn override (strip prefix, skip auto-PHI)
 #
 # After each pass, a dim status line summarises what was caught:
@@ -1079,13 +1179,15 @@ preprocess_phi_markers() {
 # ─────────────────────────────────────────────────────────────────────────────
 
 # Mode resolution. Env default per spec: ON unless 0 / off.
-# Accepted env values: "1" / "on" / "" → on ; "0" / "off" → off ; "confirm" → confirm.
+# Accepted env values: "1" / "on" / "" → on ; "0" / "off" → off ; "confirm" → confirm ;
+# "strict" → fail-closed (v0.8.0-c).
 # (aggressive accepted as an alias for "on" to preserve af2ffe8 muscle memory.)
 _resolve_auto_phi_mode() {
   local v="${LARRY_AUTO_PHI:-1}"
   case "$v" in
     0|off|OFF) printf 'off' ;;
     confirm|CONFIRM) printf 'confirm' ;;
+    strict|STRICT) printf 'strict' ;;
     1|on|ON|aggressive|"") printf 'on' ;;
     *) printf 'on' ;;
   esac
@@ -1453,11 +1555,32 @@ _auto_phi_looks_like_hl7() {
 
 # Main detector. Args: surface ("user_input"|"tool_result"), input text.
 # Echoes the rewritten input. Status message goes to stderr.
+#
+# v0.8.0-c: in LARRY_AUTO_PHI=strict mode, this function may signal a
+# fail-closed abort by:
+# - returning exit code 42, AND
+# - leaving the explanatory message on stderr (no stdout content).
+# Callers MUST check the return code and abort the surrounding turn when
+# they observe 42. The surrounding turn does NOT proceed with the original
+# input on strict abort; that would defeat the whole point of fail-closed.
 auto_detect_phi() {
   local surface="$1"
   local input="$2"
   local sanitize_script="$LARRY_LIB_DIR/hl7-sanitize.sh"
-  [ -x "$sanitize_script" ] || { printf '%s' "$input"; return; }
+
+  # v0.8.0-c: strict mode aborts when sanitizer is unavailable AND the input
+  # is HL7-shaped (the case where leaking would be most likely). Non-HL7 inputs
+  # in strict mode still get the best-effort pass; strict is about not
+  # silently passing HL7 PHI through when the tokenizer is broken.
+  if [ ! -x "$sanitize_script" ]; then
+    if [ "$AUTO_PHI_MODE" = "strict" ] && _auto_phi_looks_like_hl7 "$input"; then
+      printf 'error: auto-PHI sanitizer unavailable (missing or non-executable: %s); LARRY_AUTO_PHI=strict aborts turn (set LARRY_AUTO_PHI=on to fall back to best-effort)\n' \
+        "$sanitize_script" >&2
+      return 42
+    fi
+    printf '%s' "$input"
+    return 0
+  fi
 
   # Per-turn override (user-input surface only).
   if [ "$surface" = "user_input" ] && [[ "$input" == '!nophi '* ]]; then
@@ -1561,7 +1684,18 @@ auto_detect_phi() {
     # Tokenize via the canonical pipeline.
     local token
     token=$("$sanitize_script" tokenize-value --category "$cat" "$orig" 2>/dev/null)
-    [ -z "$token" ] && continue
+    # v0.8.0-c: strict mode aborts the whole turn if any single value's
+    # tokenize-value call fails — passing the original value through would
+    # be a silent leak, which is exactly what strict is opted-in to prevent.
+    # Default ("on") and "confirm" still skip-and-continue (fail-open).
+    if [ -z "$token" ]; then
+      if [ "$AUTO_PHI_MODE" = "strict" ]; then
+        printf 'error: auto-PHI tokenize-value returned empty for value (category=%s); LARRY_AUTO_PHI=strict aborts turn (run /phi-auto on to fall back to best-effort)\n' \
+          "$cat" >&2
+        return 42
+      fi
+      continue
+    fi
 
     # Substitute. Literal string replace catches all occurrences.
     input="${input//"$orig"/"$token"}"
@@ -3125,6 +3259,19 @@ agent_turn() {
         ;;
       nc_msgs|hl7_field|hl7_diff) _ap_eligible=1 ;;
     esac
+    # v0.8.0-c: strict mode aborts if sanitizer script is missing/non-exec
+    # when we have HL7-shaped output. We can't kill the tool-loop iteration
+    # without sending SOMETHING back to satisfy the tool_use; substitute
+    # a refusal that the model can surface to Bryan, NOT the raw HL7.
+    if [ "$AUTO_PHI_MODE" = "strict" ] \
+      && [ "$_ap_eligible" = "1" ] \
+      && _auto_phi_looks_like_hl7 "$result" \
+      && [ ! -x "$LARRY_LIB_DIR/hl7-sanitize.sh" ]; then
+      printf '%sphi>%s strict mode: hl7-sanitize.sh unavailable; replacing %s result with refusal sentinel (raw HL7 NOT sent to model)\n' \
+        "$C_DIM" "$C_RESET" "$name" >&2
+      result='{"error":"auto-PHI sanitizer unavailable on HL7-shaped result","tool":"'"$name"'","action":"result withheld; set LARRY_AUTO_PHI=on to fall back to best-effort, or repair lib/hl7-sanitize.sh"}'
+      _ap_eligible=0 # skip the normal sanitize path below
+    fi
     if [ "$_ap_eligible" = "1" ] && _auto_phi_looks_like_hl7 "$result"; then
       local _ap_tmp _ap_sanitized _ap_before _ap_after
       _ap_tmp=$(mktemp)
@@ -3141,6 +3288,15 @@ agent_turn() {
           AUTO_PHI_SESSION_COUNT=$(( AUTO_PHI_SESSION_COUNT + _ap_new ))
           _auto_phi_log "(hl7-sanitize batch)" "BATCH" "(+${_ap_new} tokens)" "hl7_pipeline" "tool_result" "$name"
         fi
+      else
+        # v0.8.0-c: sanitizer returned empty (failure) on HL7-shaped input.
+        # In strict mode, refuse the result. In default/confirm, keep prior
+        # fail-open behavior (raw result flows — preserves "don't break tools").
+        if [ "$AUTO_PHI_MODE" = "strict" ]; then
+          printf '%sphi>%s strict mode: hl7-sanitize.sh returned empty on HL7-shaped %s result; replacing with refusal sentinel (raw HL7 NOT sent to model)\n' \
+            "$C_DIM" "$C_RESET" "$name" >&2
+          result='{"error":"auto-PHI sanitize returned empty on HL7-shaped result","tool":"'"$name"'","action":"result withheld; set LARRY_AUTO_PHI=on to fall back to best-effort"}'
+        fi
       fi
       rm -f "$_ap_tmp"
     fi
@@ -3288,12 +3444,26 @@ Slash commands:
 Modes (env LARRY_AUTO_PHI or /phi-auto):
 on default — all four tiers always tokenize (caution-first)
 confirm Tier 3-4 prompts Y/n once per session per canonical value
+strict (v0.8.0) fail-closed — HL7-shaped content aborts the turn
+if hl7-sanitize.sh is missing or returns empty, or if any
+single value's tokenize-value call fails. Use for HIPAA work
+where a silent leak is worse than a broken turn.
 off disable auto-detection entirely (manual markers still work)
 
 Per-turn override: prefix any prompt with "!nophi " to skip the scan
 for that turn only. Explicit @@VALUE / {{phi:VALUE}} markers always
 win — they are processed first; auto-PHI fills only the gaps.
 
+/load (v0.8.0): HL7-shaped file content is pre-routed through
+hl7-sanitize.sh (the segment-aware tokenizer) BEFORE the user_input scan.
+strict mode aborts /load if sanitize fails on HL7-shaped content.
+
+read_file / grep_files / glob_files / list_dir (v0.8.0): refuse paths
+under \$LARRY_HOME/log, \$LARRY_HOME/sanitize, \$LARRY_HOME/sessions,
+\$LARRY_HOME/.oauth.json, \$LARRY_HOME/.env. These hold the
+de-sanitization key (lookup.tsv), PHI clear-text audit log, prior
+sessions, and OAuth tokens — the model never gets to read them.
+
 Audit: every tokenization writes a JSONL entry to
 \$LARRY_HOME/log/auto-phi.log (ts/value/category/token/tier/surface/context).
 /redetect re-scan for HCIROOT/HCISITE/tools
@@ -3483,7 +3653,7 @@ _LARRY_SLASH_CMDS_DESC=(
   [/hl7-fields]="<SEG.FIELD> print component breakdown (e.g. /hl7-fields PID.5)"
   [/mouse]="on|off toggle xterm mouse mode for this session"
   [/origin]="show/pin auto-update origin (gitea|auto|<https URL>) — v0.7.4 single-source"
-  [/phi-auto]="on|off|confirm|status — runtime control for v0.7.3 auto PHI detection"
+  [/phi-auto]="on|off|confirm|strict|status — runtime control for v0.7.3+v0.8.0 auto PHI detection"
 )
 
 # __larry_complete_slash — bound to TAB via `bind -x` (see _install_readline_tab).
@@ -4333,11 +4503,17 @@ main_loop() {
           AUTO_PHI_MODE="confirm"
           larry_say "auto-PHI: confirm (Tier 3-4 matches prompt Y/n; Tier 1-2 still always tokenize)"
           ;;
+        strict)
+          # v0.8.0-c: fail-closed mode. HL7-shaped content with broken
+          # sanitizer aborts the turn instead of passing through.
+          AUTO_PHI_MODE="strict"
+          larry_say "auto-PHI: strict (fail-closed — HL7-shaped content aborts turn if hl7-sanitize.sh missing or returns empty; tokenize-value failure aborts turn)"
+          ;;
         status)
           larry_say "auto-PHI: $AUTO_PHI_MODE (this session tokenized: $AUTO_PHI_SESSION_COUNT) log: $AUTO_PHI_LOG"
           ;;
         *)
-          err "usage: /phi-auto on|off|confirm (no arg → status)"
+          err "usage: /phi-auto on|off|confirm|strict (no arg → status)"
           ;;
       esac
       continue ;;
@@ -4491,6 +4667,52 @@ main_loop() {
       /load\ *) local f="${input#/load }"
         if [ ! -f "$f" ]; then err "no such file: $f"; continue; fi
         input="$(cat "$f")"
+        # v0.8.0-b: pre-route HL7-shaped /load content through
+        # hl7-sanitize.sh BEFORE it enters the user_input auto-PHI
+        # pipeline. The user_input scan (per-word classifier) is
+        # weaker than hl7-sanitize.sh's segment-aware field tokenizer
+        # for raw HL7 dumps. Closes V3 from Vera's audit.
+        #
+        # LARRY_AUTO_PHI semantics:
+        # off — bypass entirely (operator opted out)
+        # strict — abort /load if hl7-sanitize.sh missing OR returns empty
+        # on/default/confirm — best-effort; warn-and-continue on sanitize failure
+        if [ "$AUTO_PHI_MODE" != "off" ] && _auto_phi_looks_like_hl7 "$input"; then
+          local _ld_sanitize="$LARRY_LIB_DIR/hl7-sanitize.sh"
+          if [ ! -x "$_ld_sanitize" ]; then
+            if [ "$AUTO_PHI_MODE" = "strict" ]; then
+              err "/load aborted: HL7-shaped content but hl7-sanitize.sh unavailable (LARRY_AUTO_PHI=strict)"
+              continue
+            else
+              warn "/load: HL7-shaped content but hl7-sanitize.sh unavailable — content passed through best-effort user_input scan only"
+            fi
+          else
+            local _ld_tmp _ld_sanitized _ld_before _ld_after _ld_new
+            _ld_tmp=$(mktemp)
+            printf '%s' "$input" > "$_ld_tmp"
+            _ld_before=$(bash "$_ld_sanitize" count 2>/dev/null || echo 0)
+            _ld_sanitized=$(bash "$_ld_sanitize" "$_ld_tmp" 2>/dev/null)
+            rm -f "$_ld_tmp"
+            if [ -z "$_ld_sanitized" ]; then
+              if [ "$AUTO_PHI_MODE" = "strict" ]; then
+                err "/load aborted: hl7-sanitize.sh returned empty on HL7-shaped content (LARRY_AUTO_PHI=strict)"
+                continue
+              else
+                warn "/load: hl7-sanitize.sh returned empty — content passed through best-effort user_input scan only"
+              fi
+            else
+              input="$_ld_sanitized"
+              _ld_after=$(bash "$_ld_sanitize" count 2>/dev/null || echo 0)
+              _ld_new=$((_ld_after - _ld_before))
+              if [ "$_ld_new" -gt 0 ]; then
+                printf '%sphi>%s /load: hl7-sanitize.sh tokenized %d HL7 field(s) from %s before passing to auto-PHI\n' \
+                  "$C_DIM" "$C_RESET" "$_ld_new" "$f" >&2
+                AUTO_PHI_SESSION_COUNT=$(( AUTO_PHI_SESSION_COUNT + _ld_new ))
+                _auto_phi_log "(hl7-sanitize /load)" "BATCH" "(+${_ld_new} tokens)" "hl7_pipeline" "user_input" "/load $f"
+              fi
+            fi
+          fi
+        fi
         larry_say "loaded $(wc -l < "$f" | tr -d ' ') lines from $f as your next message" ;;
       # v0.6.8: cross-env convenience commands. These templatize a prompt and
       # hand it to Larry-the-LLM to execute via the existing tools (no new
@@ -4609,7 +4831,17 @@ EOF
     # things Bryan didn't manually mark. Per-turn "!nophi " prefix override
     # is consumed inside auto_detect_phi. Bypassed entirely when mode=off.
     # Supersedes af2ffe8 (reverted with v0.7.1).
-    input=$(auto_detect_phi user_input "$input")
+    #
+    # v0.8.0-c: capture auto_detect_phi's exit code. Code 42 = strict-mode
+    # fail-closed signal; the error message has already been printed to
+    # stderr by auto_detect_phi. We skip add_user_text/agent_turn entirely,
+    # leaving the turn as a no-op so no payload is ever built or sent.
+    local _ap_rc=0
+    input=$(auto_detect_phi user_input "$input") || _ap_rc=$?
+    if [ "$_ap_rc" = "42" ]; then
+      err "turn aborted by LARRY_AUTO_PHI=strict (see above). Set LARRY_AUTO_PHI=on or /phi-auto on to retry without strict mode."
+      continue
+    fi
 
     log_section "user"; log_append "$input"
     # v0.7.1: render the persistent status line BETWEEN turns — after the
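
The exit-code-42 convention used by the last hunk can be tried in isolation with the sketch below. It is an illustration under stated assumptions, not the code from larry.sh: `detect` and `run_turn` are hypothetical stand-ins, and the `sanitizer_ok` flag simulates whether the sanitizer script is usable.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the detector and its caller; names are illustrative.

# Echo the input on success. In strict mode with no usable sanitizer,
# print an error to stderr, emit nothing on stdout, and return 42.
detect() {
  local mode="$1" sanitizer_ok="$2" input="$3"
  if [ "$mode" = "strict" ] && [ "$sanitizer_ok" != "1" ]; then
    echo "error: sanitizer unavailable; strict mode aborts turn" >&2
    return 42
  fi
  printf '%s' "$input"
}

# Capture stdout and the exit status separately. The variable is declared
# first, because `local out=$(...)` would report the status of `local`
# rather than the status of the command substitution.
run_turn() {
  local mode="$1" sanitizer_ok="$2" input="$3"
  local rc=0 out
  out=$(detect "$mode" "$sanitizer_ok" "$input") || rc=$?
  if [ "$rc" -eq 42 ]; then
    echo "aborted"
    return 0
  fi
  echo "sent: $out"
}

run_turn strict 0 'MSH|demo'
run_turn on 0 'MSH|demo'
```

A dedicated code such as 42 lets the caller tell a deliberate fail-closed abort apart from an ordinary non-zero exit, which is why the caller compares against that exact value.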