Bryan's ask: use Larry on prod data without PHI ever leaving the client box.
Added:
- lib/hl7-sanitize.sh: tokenize PHI fields in HL7 messages
- lib/hl7-desanitize.sh: reverse operation (local, view-time unmask)
Tokenization model:
- Replace PHI fields with [[CATEGORY_NNNN]] tokens (MRN, NAME, DOB,
  ADDR, PHONE, ACCT, SSN, PROV, VISIT, etc.)
- Same value → same token across messages (deterministic via the local
  lookup table, so analysis can still correlate patients).
- Lookup table at $LARRY_HOME/sanitize/lookup.tsv, mode 0600; it never
  leaves the client.
- Default PHI rule set covers PID, PV1, NK1, GT1, IN1, OBR, OBX,
  DG1, ORC; extend it with --rules-file.
- --strict also tokenizes unknown Z segments wholesale.
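A minimal sketch of the tokenization model, assuming the lookup table layout that hl7-desanitize.sh reads (a "token<TAB>category<TAB>original" header row, then one row per token). The function name sanitize_hl7 and the PHI_RULES triples are hypothetical and cover only a few PID fields plus PV1-19; the sketch tokenizes whole fields, assumes one segment per line, does not handle values containing tabs, and is not the actual hl7-sanitize.sh implementation.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical sanitize_hl7, NOT the real hl7-sanitize.sh.
# Assumes one HL7 segment per line and a lookup table whose first row is
# "token<TAB>category<TAB>original". Whole fields are tokenized.

# Hypothetical rule subset: SEGMENT:FIELD:CATEGORY (HL7 v2 field numbers).
PHI_RULES='PID:3:MRN PID:5:NAME PID:7:DOB PID:11:ADDR PID:13:PHONE PID:18:ACCT PID:19:SSN PV1:19:VISIT'

# Usage: sanitize_hl7 TABLE < message.hl7 > message.sanitized.hl7
sanitize_hl7() {
  local table="$1"
  # Create the table owner-only (mode 0600) on first use.
  [ -f "$table" ] || ( umask 077; printf 'token\tcategory\toriginal\n' > "$table" )
  TABLE="$table" RULES="$PHI_RULES" awk '
    BEGIN {
      FS = OFS = "|"
      table = ENVIRON["TABLE"]
      # Reload earlier tokens so a repeated value maps to the same token.
      while ((getline row < table) > 0) {
        split(row, c, "\t")
        if (c[1] == "token") continue
        tok[c[2] SUBSEP c[3]] = c[1]
        count[c[2]]++   # assumes tokens were numbered 1..n per category
      }
      close(table)
      n = split(ENVIRON["RULES"], r, " ")
      for (i = 1; i <= n; i++) {
        split(r[i], p, ":")
        want[p[1], p[2]] = p[3]
      }
    }
    {
      had_cr = sub(/\r$/, "")
      # Outside MSH, HL7 field N of a segment is awk field N+1.
      for (f = 2; f <= NF; f++) {
        if (!(($1, f - 1) in want) || $f == "") continue
        category = want[$1, f - 1]
        key = category SUBSEP $f
        if (!(key in tok)) {
          tok[key] = sprintf("[[%s_%04d]]", category, ++count[category])
          printf("%s\t%s\t%s\n", tok[key], category, $f) >> table
        }
        $f = tok[key]
      }
      line = $0
      if (had_cr) line = line "\r"
      print line
    }
    END { close(table) }'
}
```

Because existing rows are reloaded on every run, the same MRN keeps the same token across messages and across separate invocations, which is what lets analysis correlate patients without seeing the identifier.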
Prompt-side preprocessing in larry.sh:
- {{phi:VALUE}}: inline marker; category chosen by auto-category lookup
- {{phi:CATEGORY:VALUE}}: explicit category
- Markers are replaced with tokens BEFORE the user input enters the
  conversation history, so the original value never reaches the API.
- Local feedback "phi> {{phi:...}} → [[TOKEN]]" is printed to the
  terminal only.
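The marker handling can be sketched in bash as below, assuming bash 4+ for associative arrays. phi_preprocess and stub_tokenize are hypothetical names: the in-memory stub stands in for the lookup table, and the PHI fallback category for bare {{phi:VALUE}} markers is a placeholder for the real auto-category lookup. In this sketch a value containing "}" or beginning with an uppercase word followed by a colon would be misparsed.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical phi_preprocess, not the actual larry.sh code.
# Needs bash 4+ for associative arrays.

declare -A PHI_TOKENS=()   # "CATEGORY<TAB>value" -> token (stand-in for lookup.tsv)
declare -A PHI_COUNTS=()   # CATEGORY -> last number issued

# Hypothetical stand-in tokenizer; sets REPLY. "PHI" as the fallback
# category is a placeholder for the real auto-category lookup.
stub_tokenize() {
  local category="${1:-PHI}" value="$2" key tok
  key="$category"$'\t'"$value"
  if [[ -z "${PHI_TOKENS[$key]+set}" ]]; then
    PHI_COUNTS[$category]=$(( ${PHI_COUNTS[$category]:-0} + 1 ))
    printf -v tok '[[%s_%04d]]' "$category" "${PHI_COUNTS[$category]}"
    PHI_TOKENS[$key]=$tok
  fi
  REPLY=${PHI_TOKENS[$key]}
}

# Replace every {{phi:VALUE}} / {{phi:CATEGORY:VALUE}} marker in $1 with a token.
# The sanitized prompt goes to stdout; "phi>" feedback goes to stderr only.
phi_preprocess() {
  local rest="$1" out="" marker body category value
  local marker_re='[{][{]phi:([^}]*)[}][}]'
  local explicit_re='^([A-Z]+):(.*)$'
  while [[ $rest =~ $marker_re ]]; do
    marker=${BASH_REMATCH[0]}
    body=${BASH_REMATCH[1]}
    if [[ $body =~ $explicit_re ]]; then
      category=${BASH_REMATCH[1]} value=${BASH_REMATCH[2]}
    else
      category="" value=$body
    fi
    stub_tokenize "$category" "$value"
    printf 'phi> %s → %s\n' "$marker" "$REPLY" >&2
    # Keep the text before the first occurrence of the marker, then the token.
    out+="${rest%%"$marker"*}$REPLY"
    rest=${rest#*"$marker"}
  done
  printf '%s\n' "$out$rest"
}
```

Only the returned string would be appended to conversation history; the original value appears solely in the stderr feedback line on the local terminal.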
New REPL slash commands:
  /phi <value>      tokenize a single value, print the token
  /unmask <token>   show the original (local terminal only, never the API)
  /tokens           show the full PHI ↔ token lookup table
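/phi and /unmask amount to single-value lookups against lookup.tsv. The sketch below assumes the same table layout as hl7-desanitize.sh (token, category, original; header row first). phi_tokenize and phi_unmask are hypothetical helpers, not larry.sh functions, and the sketch does no locking for concurrent writers.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers behind /phi and /unmask, not larry.sh code.

# Print the token for CATEGORY/VALUE, minting [[CATEGORY_NNNN]] on first sight.
phi_tokenize() {
  local table="$1" category="$2" value="$3" tok n
  [ -f "$table" ] || ( umask 077; printf 'token\tcategory\toriginal\n' > "$table" )
  # ENVIRON avoids awk -v escape processing; appending "" forces a string
  # comparison so "0012345" and "12345" stay distinct values.
  tok=$(C="$category" V="$value" awk -F'\t' \
    '$1 != "token" && $2 == ENVIRON["C"] && ($3 "") == (ENVIRON["V"] "") { print $1; exit }' "$table")
  if [ -z "$tok" ]; then
    n=$(C="$category" awk -F'\t' '$1 != "token" && $2 == ENVIRON["C"] { n++ } END { print n + 0 }' "$table")
    tok=$(printf '[[%s_%04d]]' "$category" $((n + 1)))
    printf '%s\t%s\t%s\n' "$tok" "$category" "$value" >> "$table"
  fi
  printf '%s\n' "$tok"
}

# Print the original value for TOKEN; exit status 2 if the token is unknown.
phi_unmask() {
  local table="$1"
  T="$2" awk -F'\t' '$1 != "token" && $1 == ENVIRON["T"] { print $3; found = 1; exit }
    END { exit (found ? 0 : 2) }' "$table"
}
```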
New tools in larry.sh schema:
  hl7_sanitize                        lets the agent sanitize a file before reading any PHI in it
  tokenize-value / detokenize-value   subcommands of hl7-sanitize.sh
Persona update (agents/larry.md):
- Documented PHI mode and rules for proactive sanitize-first behavior
MANUAL.md updated with the full PHI section including limitations.
Brings total native tools to 29.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
#!/usr/bin/env bash
# hl7-desanitize.sh: reverse of hl7-sanitize.sh. Replaces [[CATEGORY_NNNN]]
# tokens with the original values from $LARRY_HOME/sanitize/lookup.tsv.
#
# Use this LOCALLY ONLY, at view time, in your terminal. Never feed
# desanitized output back into Larry; that defeats the whole point.
#
# Usage:
#   hl7-desanitize.sh [FILE]                    # read FILE, or stdin if omitted
#   hl7-desanitize.sh --table PATH [FILE]       # use an alternate lookup table
#   hl7-desanitize.sh --token '[[NAME_0001]]'   # single-token lookup (quote it:
#                                               # unquoted, [[...]] is a glob)
#
# Examples:
#   # View Larry's sanitized output unmasked, in less:
#   cat larry-output.txt | hl7-desanitize.sh | less
#
#   # Quick single-token lookup:
#   hl7-desanitize.sh --token "[[MRN_0001]]"
set -o pipefail

LARRY_HOME="${LARRY_HOME:-$HOME/.larry}"
DEFAULT_TABLE="$LARRY_HOME/sanitize/lookup.tsv"

die() { printf 'hl7-desanitize: %s\n' "$*" >&2; exit 1; }

table="$DEFAULT_TABLE"
single_token=""
input_file=""

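while [ $# -gt 0 ]; do
  case "$1" in
    --table)
      [ $# -ge 2 ] || die "--table needs a PATH"
      shift; table="$1" ;;
    --token)
      [ $# -ge 2 ] || die "--token needs a TOKEN"
      shift; single_token="$1" ;;
    # Print the leading comment block (minus the "# " prefix) as help text.
    -h|--help) awk 'NR > 1 && !/^#/ { exit } NR > 1 { sub(/^# ?/, ""); print }' "$0"; exit 0 ;;
    -*) die "unknown flag: $1" ;;
    *) input_file="$1" ;;
  esac
  shift
done
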
[ -f "$table" ] || die "no lookup table at $table (sanitize first?)"

if [ -n "$single_token" ]; then
  # Skip the header row; exit 2 if the token is not in the table.
  awk -F'\t' -v t="$single_token" '
    NR > 1 && $1 == t { print $3; found = 1; exit }
    END { if (!found) { print "no such token: " t > "/dev/stderr"; exit 2 } }
  ' "$table"
  exit $?
fi

# Lookup table: one row per token, tab-separated (token \t category \t original).
# Read the table into an awk map, then walk the input substituting tokens.
# Replacement uses plain string concatenation rather than sed s/// or awk
# sub(), so originals containing metacharacters (&, \, /) need no escaping.
awk_script='
# Table rows come from the first file operand. Test FILENAME rather than
# the NR == FNR idiom, which misfires when the table file is empty.
FILENAME == ARGV[1] {
  # Skip the header row and blank lines.
  if ($1 == "token" || $1 == "") next
  # cols: 1=token, 2=category, 3=original
  tokens[$1] = $3
  next
}
{
  # Tokens look like [[X_NNNN]]. Scan only the unprocessed remainder of the
  # line, so a substituted value or an unmapped-token marker is never
  # re-matched and the loop always terminates.
  rest = $0
  out = ""
  while (match(rest, /\[\[[A-Z_]+_[0-9]+\]\]/)) {
    tok = substr(rest, RSTART, RLENGTH)
    if (tok in tokens)
      rep = tokens[tok]
    else
      rep = "<<<unmapped:" tok ">>>"   # flag tokens missing from the table
    out = out substr(rest, 1, RSTART - 1) rep
    rest = substr(rest, RSTART + RLENGTH)
  }
  print out rest
}
'

# With no FILE operand, "-" tells awk to read stdin.
awk -F'\t' "$awk_script" "$table" "${input_file:--}"