v0.8.20: nc_paths route-chain tracer — parse-once in-memory engine (84s→0.7s single, ~5.5s full-tree), authoritative destination-block cross-site resolution, v1-fidelity output (site/thread nodes, --> intra-route / ==> cross-site) as default + --format table/nodes, pipe-first (site/thread in, awk field-1 = root). Verified EXACT vs v1 on the real 24-site integrator.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bryan Johnson 2026-05-28 11:26:31 -07:00
parent 12989b2ced
commit 9364c7edeb
8 changed files with 805 additions and 252 deletions

@ -4,6 +4,89 @@ All notable changes to `cloverleaf-larry` / `larry-anywhere` are recorded here.
Versioning is loose-semver; bumps trigger the in-process self-update on every
running client via `LARRY_BASE_URL` + `MANIFEST`.
## v0.8.20 — 2026-05-28
Route-chain tracer (`lib/nc-paths.sh`) REARCHITECTED for the real integrator:
parse once, walk in memory; cross-site linking corrected from a port-match
heuristic to authoritative **`destination`-block** resolution. The earlier
port-based draft of v0.8.20 was never shipped; this entry supersedes it. That
draft FAILED Bryan's real-integrator smoke (24-site QA env): catastrophically
slow AND missing real cross-site feeders.
**Problem (measured on the real 24-site integrator, before this fix):**
- `nc-paths.sh ADTto_CodaMetrix ancout --site-only` → correct chain but **84 s**.
- full (no flag) → same single chain, **164 s**.
- `--down` → "unknown flag".
- Root cause: the walker invoked `nc-parse.sh` as a SUBPROCESS per hop / per
candidate (`destinations`/`sources`/`protocol-nested`/`protocol-field`/
`list-protocols`), and each invocation re-ran `_blocks` + `cmd_protocol_block`
— two full awk passes over the (16K-line) NetConfig. O(threads × parse-cost) =
minutes. Even the intra-site walk was a bottleneck (`sources` scans every body).
- Correctness: the draft linked sites by matching an outbound's `PROTOCOL.PORT`
to an inbound's listen/ICL port. That MISSED the real mux feeder of ancout's
`IB_ADT_muxS` (port 62043) — because no thread has `PROTOCOL.PORT 62043`; the
link is expressed only through a `destination` block.
**1. Single-pass index (`lib/nc-parse.sh` new `index` subcommand, `cmd_index`).**
ONE awk pass per NetConfig emits a flat record stream the walker needs:
`P` protocol, `D` DEST edge (handles BOTH `{ DEST name }` and the list form
`{ DEST {a b c} }` — the list form was silently dropped by the old
`cmd_destinations` regex), `L` listen port (server `PROTOCOL.PORT` with
ISSERVER=1 and/or guarded `ICLSERVERPORT`), `O` outbound dest port, and
`X <destname> <site> <thread> <port>` — the resolution of a top-level
`destination` block. Indexing all 24 live NetConfigs takes under 1 s.
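A minimal sketch of how a single awk pass could emit `D` records for both `DEST` forms. This is not the shipped `cmd_index`: the function name `emit_dest_edges`, the explicit thread argument, and the assumption that each `DEST` sits on one line are illustrative only.

```bash
# Hypothetical sketch: emit "D<TAB>thread<TAB>dest" for both DEST forms.
# Assumes the owning thread name is passed in and each DEST is on one line.
emit_dest_edges() {               # $1 = thread name; stdin = block body lines
  awk -v t="$1" '
    # List form: { DEST {a b c} } -> one D record per list member.
    match($0, /[{] DEST [{][^}]*[}] [}]/) {
      s = substr($0, RSTART, RLENGTH)
      sub(/^[{] DEST [{]/, "", s); sub(/[}] [}]$/, "", s)
      n = split(s, d, " ")        # an empty list yields n = 0, so nothing prints
      for (i = 1; i <= n; i++) print "D\t" t "\t" d[i]
      next
    }
    # Scalar form: { DEST name } -> a single D record.
    match($0, /[{] DEST [A-Za-z0-9_]+ [}]/) {
      s = substr($0, RSTART, RLENGTH)
      split(s, d, " ")            # d[1]="{", d[2]="DEST", d[3]=name, d[4]="}"
      print "D\t" t "\t" d[3]
    }
  '
}
```

Checking the list form first keeps the scalar rule from ever seeing a `{ DEST {` line, which is the case a scalar-only regex would silently drop.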
**2. In-memory route graph + in-memory walk (`lib/nc-paths.sh`).** The index loads
once into bash associative arrays (`G_PROTO`/`G_DESTS`/`G_LISTEN`/`G_OUT`/
`G_DESTBLK`/`G_INSRC`/`G_DESTBLK_REV`; `_load_nc`, `_build_in_sources`,
`_build_graph`). `_walk_down`/`_walk_up` and the one-hop primitives
(`_outgoing`/`_incoming`/`_xsite_down_targets`/`_xsite_up_feeders`) now resolve
every hop with O(1) array lookups: NO subprocess and NO re-parse per hop. The
cycle test is a bash substring match (`_seen_has`), not a `grep` fork per hop.
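The load-once, walk-in-memory idea can be sketched as follows. Everything named `demo_*` / `DEMO_*` is hypothetical and only illustrates the technique (bash 4+ associative array, substring cycle test); the real `G_*` arrays and `_seen_has` may be laid out differently.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: load "D" records once, then walk with array lookups only.
declare -A DEMO_DESTS=()          # "site/thread" -> space-separated dest list

demo_load() {                     # $1 = site; stdin = D<TAB>thread<TAB>dest lines
  local site="$1" tag thread dest
  while IFS=$'\t' read -r tag thread dest; do
    [ "$tag" = "D" ] || continue
    DEMO_DESTS["$site/$thread"]+=" $dest"
  done
}

# Cycle test as a substring match on a "|"-delimited seen list: no fork.
demo_seen_has() {                 # $1 = seen list like "|a|b|", $2 = node
  case "$1" in *"|$2|"*) return 0 ;; esac
  return 1
}

demo_walk_down() {                # $1 = site, $2 = thread, $3 = path so far, $4 = seen
  local site="$1" node="$2" path="$3" seen="$4" key d kids
  key="$site/$node"
  kids="${DEMO_DESTS[$key]:-}"
  if [ -z "$kids" ]; then printf '%s\n' "$path"; return; fi
  for d in $kids; do
    if demo_seen_has "$seen" "$d"; then
      printf '%s\n' "$path"       # repeat found: emit the path up to the cycle
    else
      demo_walk_down "$site" "$d" "$path --> $site/$d" "$seen$d|"
    fi
  done
}
```

Note that `demo_load` must read from a redirect rather than a pipeline, otherwise the array is filled in a subshell and lost.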
**3. Cross-site link corrected to `destination` blocks.** Cloverleaf links sites
through the named ICL destination table: a thread's DATAXLATE `DEST` may name
either a LOCAL protocol (intra-site hop) or a `destination` block, which resolves
to `{ SITE }` `{ THREAD }` `{ PORT }`. A `DEST` naming a destination block is the
cross-site hop, resolved by NAME to the exact remote (site,thread). The `PORT`
equals the remote thread's listen/ICL port (corroboration), but it is never the
primary key. `ICLSERVERPORT` is still read GUARDED in the index (absent/`{}` →
skipped, never the un-guarded `keylget` that crashed v2 `paths.tcl`).
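As a rough sketch of name-based resolution over the index records: a `DEST` that matches an `X` record is rendered as the local sender node followed by the cross-site hop, otherwise a matching `P` record makes it an intra-site hop. The function name `demo_resolve_dest`, the precedence of `X` over `P`, and the `LocalOut` thread used below are assumptions for illustration, not the shipped logic.

```bash
# Hypothetical sketch: classify one DEST name using index records.
# Tab-separated input records, per the index format described above:
#   P <thread>                            local protocol
#   X <destname> <site> <thread> <port>   destination block
demo_resolve_dest() {             # $1 = index file, $2 = local site, $3 = DEST name
  awk -F '\t' -v site="$2" -v dest="$3" '
    $1 == "X" && $2 == dest { x = $3 "/" $4 }     # named destination block
    $1 == "P" && $2 == dest { p = site "/" dest } # local protocol
    END {
      if (x != "")      print "--> " site "/" dest " ==> " x   # cross-site, by NAME
      else if (p != "") print "--> " p                         # intra-site hop
      else              exit 1                                 # unknown name
    }
  ' "$1"
}
```

With the `OB_ADT_ancS` block from this entry, `demo_resolve_dest idx mux OB_ADT_ancS` would yield `--> mux/OB_ADT_ancS ==> ancout/IB_ADT_muxS`; the port never takes part in the lookup.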
**4. `full` mode = upstream × downstream JOIN at the thread.** No more
O(sites × threads) entry-chain scan (Vera m3). The complete chain is the thread's
upstream feeder chains (each ending AT the thread) joined to its downstream chains
(each starting AT the thread); both walks follow destination blocks, so the join
spans sites naturally.
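The join can be pictured as a cross product of chain strings that share the queried node. The sketch below assumes chains are already rendered as text and that every downstream chain starts with the node; `demo_join_full` is a hypothetical name.

```bash
# Hypothetical sketch: join upstream chains (each ending AT the node) with
# downstream chains (each starting AT the node), keeping the shared node once.
demo_join_full() {                # $1 = node, $2 = upstream lines, $3 = downstream lines
  local node="$1" up down
  printf '%s\n' "$2" | while IFS= read -r up; do
    printf '%s\n' "$3" | while IFS= read -r down; do
      # Strip the leading node from the downstream chain before appending.
      printf '%s%s\n' "$up" "${down#"$node"}"
    done
  done
}
```

Two upstream feeders and two downstream branches give four full chains, with no scan over unrelated entry points.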
**5. Flag standardization.** `--down`/`--up` are now accepted as aliases of
`--downstream`/`--upstream` in `nc-paths.sh` itself (they already worked via the
`/paths` slash handler; the bare script rejected them).
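Alias handling of this kind is usually a `case` with alternation. A minimal sketch, assuming a hypothetical `demo_parse_direction` helper that ignores every other argument:

```bash
# Hypothetical sketch: accept short and long direction flags as aliases.
demo_parse_direction() {          # args = CLI words; prints full|down|up
  local dir="full" arg
  for arg in "$@"; do
    case "$arg" in
      --down|--downstream) dir="down" ;;
      --up|--upstream)     dir="up" ;;
    esac
  done
  printf '%s\n' "$dir"
}
```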
**6. Intra-site hops UNCHANGED in semantics** — still the DATAXLATE `DEST` list,
never an `ICLSERVERPORT` walk.
**7. Removed:** the port-match cross-site index (`_build_port_index`, the `PI_*`
arrays), the per-hop subprocess primitives (`_proto_port`/`_proto_isserver`/
`_icl_port`/`_norm_port`), and the dead `_nc_for_site` helper.
**Verification — RE-MEASURED ON THE REAL 24-SITE INTEGRATOR** (tarball
`cloverleaf_test.tar.gz`, HCIROOT = extracted `integrator/`):
- `ADTto_CodaMetrix ancout --site-only`: **84 s → 0.66 s**.
- `ADTto_CodaMetrix ancout` (full): **164 s → 1.0 s**.
- whole-tree `--all` (all 24 sites, 709 chains): **4.3 s** (well under a minute).
- `--down` / `--up`: now valid flags.
- REAL cross-site chain proven:
`mux/ADTfr_epic_964700 --> mux/OB_ADT_ancS ==> ancout/IB_ADT_muxS --> ancout/ADTto_CodaMetrix`.
`IB_ADT_muxS`'s upstream feeder lives in the `mux` site and reaches ancout via
destination block `OB_ADT_ancS`
(`{ SITE ancout } { THREAD IB_ADT_muxS } { PORT 62043 }`), exactly the feeder
the port-match draft missed and Bryan asked for. Multi-site fan-out is
site-correct (each destination block resolves to its own site's `IB_ADT_muxS`).
- `--site-only` confirmed to suppress all cross-site hops.
- `bash -n` clean (`nc-paths.sh`, `nc-parse.sh`, `larry.sh`); `/paths` +
`tool_nc_paths` drive clean under `set -u`; MANIFEST regenerated & `--check` OK.
- No-traffic-bypass preserved (read-only NetConfig parsing; no engine/network
calls; pure bash + awk, no python/.pyz; portable Win + Linux).
## v0.8.19 — 2026-05-28
Deterministic route-chain `nc_paths` tool — the #1 fix from the deterministic

@ -23,21 +23,21 @@
# scripts/make-manifest.sh and bump VERSION.
# Top-level scripts
larry.sh 8bc938bc3351b88b4fcf2c4244617ef335c9c9e3352fcc1b8da6ddbb9275cdf9
larry.sh 20b68e650ff9a94a15f7745334fe0dc0f913da2c6d4c2b92388202c951d0d171
larry-tunnel.sh 6b050e4eeab15669f4858eaf3b807f168f211ced07815db9521bc40a093f6aaa
larry-auth.sh a220cdf7878569dc3028951ee57fc8d5e706a8ca5c6aa45347b58facb386f831
larry-rollback.sh 91b5e9aa6c79266bf306dcfba4ca791c07971bd6924d67a779037531648aa6d0
install-larry.sh e97da4e12a0d8863ca18d79b12f6c4294c72fa6d4b11dffeab66504236bb4eb1
# Metadata
VERSION d6cb21adf47733cbddb6f624c559d39c4fa8f018d961f0e577f71b91327880e6
MANUAL.md 956f736291ed3ada0f7bd61c20f60f5267a16776bae918fe3fa17d9c8e07b997
CHANGELOG.md 83fb342bf07fd2086070974ea7ec031ae665493307f95406591e89c7da222959
VERSION 9bb2e455df78105b99303d11d1de0401d94142ff3fadc8e37bcba6c0c4d59914
MANUAL.md c64bd0251a51ad150508b4e1185355bc4826a64071d4de339f92ed550dbfacde
CHANGELOG.md 73f32366662b55ddc16cb937f0e6a4d0f4cd99181e8717ab9938d80b60984db6
# Agent personas (system-prompt overlays)
agents/larry.md 0a1ef737e7fc133ab35be09f79c3a4df33de814e0404b69b950932d0c8a01be1
agents/clover.md d1bbfd6cc4642c2bff6e15dcbdf051d71b063b3fe29e0be97d17b3180d3c7ac5
agents/cloverleaf-cheatsheet.md 4bd63c40bcc71ee4a15a330a3450118d8b88c1de1174366aaeef37b8940df751
agents/cloverleaf-cheatsheet.md 95c3bc52eaae92dff548702b0a0461ccba6ac6d8b410196c45ca59f28d0b3477
agents/regress.md bb05ed1439b1e35d6e9799e32d683bfab166472c72115c1f02757e227c74e42f
# Cygwin/MobaXterm CR-taint defense primitives (sourced by every tool)
@ -97,8 +97,8 @@ lib/nc-xlate.sh ea02693c3dff5db271771d4bb2927b23465b07798df2f9912bc2d2b58a134d54
lib/nc-smat-diff.sh ac003954701ea6b7f4aa1f6941f8536af5b5cdfbb75e306789753d453f06800e
lib/nc-create-thread.sh 5a9d5407c117183cad831d6b95f0e785b1b806f5ccc67f803c12b3695882b5b7
lib/nc-tclgen.sh dc95f523d543192fc7b3ae204107ce67ebb9b7e5184fa0642a1af2e2454d3241
lib/nc-parse.sh 473b64c66a55f07ef19fc589467102c9bf2f389c20eabea63bcf272cad3e16fb
lib/nc-paths.sh dadc4138dd24c5585e40253ef33a2a9adb0af1259bc6a601df44f26667934fb7
lib/nc-parse.sh ab06df8264983a9c490af25bf20e1551a91e68b45a9ec24c6cb0fce1f1b9dd69
lib/nc-paths.sh 388d2f4560736587a01218cadc1de612cd59e392819d16db2f56f19174c1111b
lib/nc-inbound.sh 52d28c5f8d97bdf96f0fc7b5300d35b106b8e1226578f4cda430deb2a8b4a91b
lib/nc-make-jump.sh 08a0bc58a299c95c60a59a5202792daf0ada3a8a0be7dc1b4cccc5724f5c9c79
lib/nc-msgs.sh 729e2d6c9159e83fa177fc6b982e48ed8453a9743477cc90afdd3cd4ec7e620c

@ -164,15 +164,32 @@ lib/nc-parse.sh route-block "$HCISITEDIR/NetConfig" IB_ADT_muxS
## Route-chain path tracer (`lib/nc-paths.sh`) — the single walker
Enumerates the full root-to-leaf message path(s) by following the DATAXLATE
`{ DEST <name> }` routing graph. Output columns **SITE THREAD HOPS PATH** — HOPS
is the thread count in the chain, PATH is the chain joined by ` -> ` (one row per
enumerated path; a branch yields multiple rows). Routing resolves via DEST only,
never `ICLSERVERPORT` (so it never recurs the old `paths.tcl` crash). **Cross-site
by default**: when a chain's terminal thread is also an entry thread in another
site's NetConfig (same name), the chain continues into that site
(mux -> ancout -> CodaMetrix). `--site-only` scopes to one site. Cycle-safe; always
terminates.
Enumerates the full root-to-leaf message path(s). **Within a site** the next hop
follows the DATAXLATE `{ DEST <name> }` routing graph (never an `ICLSERVERPORT`
walk — so it cannot recur the old `paths.tcl` crash). Output columns
**SITE THREAD HOPS PATH** — HOPS is the thread count in the chain, PATH is the
chain joined by ` -> ` (one row per enumerated path; a branch yields multiple
rows). THREAD is the ROOT (first node) of the chain.
> **Upstream note (Vera m2):** for `--up` chains the THREAD column shows the feeder
> ROOT (the most-upstream source) and the queried thread is the chain TERMINUS —
> PATH reads `source -> ... -> queried_thread`.
**Cross-site by `destination` block (v0.8.20)** — Cloverleaf links sites through
named `destination` blocks (the inter-cloverleaf / ICL routing table), not by
thread name and not by blindly matching ports. A top-level
`destination <name> { ... }` declares `{ SITE <site> } { THREAD <thread> }
{ PORT <port> }` — the remote inbound it connects to and the link port. A thread's
DATAXLATE `DEST` may name either a LOCAL protocol (intra-site hop) or a
`destination` block; a `DEST` naming a destination block is the cross-site hop,
resolved by name to the exact remote `(site, thread)`. The `PORT` equals the
remote thread's listen/ICL port (corroboration); `ICLSERVERPORT` is still read
GUARDED (absent / `{}` → skipped, never the un-guarded `keylget` that crashed
`paths.tcl`). Upstream cross-site feeders of an inbound = every destination block
(any site) resolving to it plus the threads that `DEST` to those blocks. The whole
route graph is parsed ONCE per run (`nc-parse.sh index`) into memory and walked
with O(1) lookups — no subprocess / re-parse per hop. `--site-only` scopes to one
site. Cycle-safe; always terminates.
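The upstream rule above (every destination block resolving to the inbound, plus the threads that `DEST` to those blocks) can be sketched as a reverse lookup. The input layout is an assumption: each index record is prefixed with the site whose NetConfig it came from, and a destination block is taken to live in the sending site. `demo_upstream_feeders` is a hypothetical name.

```bash
# Hypothetical sketch: find cross-site feeders of one inbound thread.
# Assumed input, tab-separated: <owning-site> <tag> <fields...>, e.g.
#   mux  D  <thread> <dest>
#   mux  X  <destname> <site> <thread> <port>
demo_upstream_feeders() {         # $1 = file, $2 = target site, $3 = target thread
  awk -F '\t' -v ts="$2" -v tt="$3" '
    # Destination blocks (in any site) that resolve to the target inbound.
    $2 == "X" && $4 == ts && $5 == tt { blk[$1 "\t" $3] = 1 }
    # Remember every DEST edge as: owning site, thread, dest.
    $2 == "D" { edge[++n] = $1 "\t" $3 "\t" $4 }
    END {
      for (i = 1; i <= n; i++) {
        split(edge[i], e, "\t")
        if ((e[1] "\t" e[3]) in blk)
          print e[1] "/" e[2] " --> " e[1] "/" e[3] " ==> " ts "/" tt
      }
    }
  ' "$1"
}
```

Because matching happens in the `END` block, record order in the input does not matter.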
```bash
# One thread — every full path containing it (default), table output.
@ -182,6 +199,8 @@ lib/nc-paths.sh IB_ADT_muxS anc # cross-site chain followed
# Only downstream chains from a thread / only upstream feeders
lib/nc-paths.sh IB_ADT_muxS anc --downstream
# --up: THREAD column = the feeder ROOT; the queried thread (ADTto_CodaMetrix)
# is the chain TERMINUS, i.e. PATH = feeder_root -> ... -> ADTto_CodaMetrix.
lib/nc-paths.sh ADTto_CodaMetrix codametrix --upstream
# Stop at the site boundary (no cross-site join)

@ -1 +1 @@
0.8.19
0.8.20

@ -21,7 +21,7 @@ Two kinds of capability:
| `nc_protocol_summary(netconfig, [filter])` | one-line TSV per protocol with direction, port, host, type — your default "lay of the land" call |
| `nc_destinations(netconfig, name)` | "what does this thread route to?" — unique DEST list from DATAXLATE. **ONE HOP only — for the full multi-hop chain use `nc_paths`.** |
| `nc_sources(netconfig, name)` | "what routes INTO this thread?" — unique source list. **ONE HOP only — for the full chain use `nc_paths`.** |
| `nc_paths(thread, site, [all], [site_only])` | **"trace the FULL route chain / what feeds X / the whole path / downstream + upstream"** — deterministic DFS path enumerator, output `SITE THREAD HOPS PATH`, cross-site by default. **Use this instead of repeated `nc_destinations`/`nc_sources`, grep, or read_file** for ANY path / chain / route-tracing question. |
| `nc_paths(thread, site, [all], [site_only])` | **"trace the FULL route chain / what feeds X / the whole path / downstream + upstream"** — deterministic DFS path enumerator, output `SITE THREAD HOPS PATH`. Intra-site hops follow DATAXLATE DEST; **cross-site links are via named `destination` blocks** (a `DEST` naming a destination block resolves to its `{ SITE } { THREAD }`; the `PORT` corroborates). The whole route graph is parsed once into memory and walked with O(1) lookups. For `--up`, THREAD = feeder ROOT and the queried thread is the terminus. **Use this instead of repeated `nc_destinations`/`nc_sources`, grep, or read_file** for ANY path / chain / route-tracing question. |
| `nc_xlate_refs(netconfig, [name])` | "what .xlt files are referenced?" — all or scoped to one protocol |
| `nc_find_inbound(netconfig, mode, format)` | "which threads are inbound?" — modes: `tcp-listen` (real upstream-client listeners, ISSERVER=1), `icl-or-file` (OBWORKASIB=1 internal mux/file inbounds), `all`. formats: tsv, jsonl, table |

@ -78,7 +78,7 @@ set -o pipefail
# ─────────────────────────────────────────────────────────────────────────────
# Config
# ─────────────────────────────────────────────────────────────────────────────
LARRY_VERSION="0.8.19"
LARRY_VERSION="0.8.20"
LARRY_HOME="${LARRY_HOME:-$HOME/.larry}"
# ─────────────────────────────────────────────────────────────────────────────
@ -341,7 +341,7 @@ _tools_registry() {
cat <<'REG'
#NetConfig (read)
nc-parse.sh|Parse a NetConfig: list/inspect protocols & processes, fields, routes, xlate refs, one-hop destinations/sources
nc-paths.sh|Route-chain PATH tracer: enumerate full root-to-leaf chains for a thread or whole site (cross-site by default). Usage: nc-paths.sh <thread> <site> [--up|--down|--site-only] | --all [--site NAME]
nc-paths.sh|Route-chain PATH tracer: enumerate full root-to-leaf chains for a thread or whole site. Intra-site hops follow the DATAXLATE DEST list (rendered `-->`); a DEST that names a `destination` block is the LOCAL OUTBOUND SENDER node (shown, never collapsed) that cross-site-links (rendered `==>`) to the remote { SITE }/{ THREAD } it names. Default output is the v1 chain form, one path per line: `site/thread --> site/thread ==> site/thread …` (field 1 = root node, pipe-first). Accepts a `site/thread` node OR `thread site` as input. Parses each NetConfig once into an in-memory graph. Usage: nc-paths.sh <thread> <site> [--up|--down|--site-only] [--format v1|table|tsv|jsonl|nodes] | --all [--site NAME]
nc-find.sh|Cross-site search for threads/protocols by name/host/port/xlate across every site under $HCIROOT
nc-inbound.sh|List the inbound (server/listener) threads in a NetConfig
nc-status.sh|Engine runtime status (sites/threads/not-up/queued/connections) — wraps the shipped tstat binaries
@ -1701,16 +1701,21 @@ tool_nc_sources() {
"$LARRY_LIB_DIR/nc-parse.sh" sources "$nc" "$name" 2>&1
}
# nc_paths — deterministic route-chain path ENUMERATOR (v0.8.19). The single
# walker backend; the model calls this ONCE instead of chaining
# nc_destinations + grep_files + read_file (the old ~$1 brute-force). Resolves
# the next hop ONLY from the DATAXLATE DEST list (never ICLSERVERPORT) so it
# cannot recur the old paths.tcl crash. Cross-site by default; --site-only scopes
# to one site. Either pass an explicit netconfig, or a (thread,site) pair, or
# --all for the whole-site/cross-site entry-chain inventory.
# nc_paths — deterministic route-chain path ENUMERATOR. The single walker
# backend; the model calls this ONCE instead of chaining nc_destinations +
# grep_files + read_file (the old ~$1 brute-force). INTRA-site, the next hop is
# resolved from the DATAXLATE DEST list (never an ICLSERVERPORT walk, so it
# cannot recur the old paths.tcl crash). CROSS-site (v0.8.20), threads link via
# named `destination` blocks: a DEST that names a destination block resolves to
# its { SITE } { THREAD } (the PORT corroborates the link; ICLSERVERPORT is read
# GUARDED). Each NetConfig is parsed EXACTLY ONCE into an in-memory graph
# (nc-parse.sh index) and the walk is pure in-memory lookups — no subprocess /
# re-parse per hop. --site-only disables cross-site linking. Either pass an
# explicit netconfig, or a (thread,site) pair, or --all for the whole-site /
# cross-site entry-chain inventory.
tool_nc_paths() {
local netconfig="$1" thread="$2" site="$3" direction="${4:-full}"
local all_mode="${5:-0}" site_only="${6:-0}" fmt="${7:-table}" hciroot="${8:-${HCIROOT:-}}"
local all_mode="${5:-0}" site_only="${6:-0}" fmt="${7:-v1}" hciroot="${8:-${HCIROOT:-}}"
_lib_err_if_missing || return
local args=()
[ -n "$netconfig" ] && args+=(--netconfig "$netconfig")
@ -4153,7 +4158,7 @@ execute_tool() {
"$(J '.direction // "full"')" \
"$(J '.all // 0' | sed "s/false/0/;s/true/1/")" \
"$(J '.site_only // 0' | sed "s/false/0/;s/true/1/")" \
"$(J '.format // "table"')" "$(J '.hciroot // ""')" ;;
"$(J '.format // "v1"')" "$(J '.hciroot // ""')" ;;
nc_tclproc_refs) tool_nc_tclproc_refs "$(J '.netconfig')" "$(J '.name // ""')" ;;
hl7_field) tool_hl7_field "$(J '.message')" "$(J '.field_path')" ;;
nc_msgs) tool_nc_msgs "$(J '.thread')" "$(J '.after // ""')" "$(J '.before // ""')" \
@ -4215,7 +4220,7 @@ TOOLS_JSON=$(cat <<'TOOLS_END'
{"name":"nc_make_jump","description":"Generate the 3-thread jump set for the cross-environment data replay pattern Bryan uses. Emits FOUR artifacts: (1) linux_<tag>_out for OLD env (outbound tcpip-client to new linux:jump_port), (2) windows_<tag>_in for NEW env server_jump site (inbound tcpip-server listening on jump_port, routes internally to #3), (3) windows_<tag>_out for NEW env server_jump site (outbound tcpip-client to 127.0.0.1:<orig_port>, where orig_port is the existing inbound listening port read from the NetConfig), (4) route-add snippet to splice into the OLD inbound DATAXLATE block. Tag = inbound thread name (auto). The NEW env existing inbound is left COMPLETELY UNCHANGED. Pure generation; caller uses write_file (Y/N) to persist.","input_schema":{"type":"object","properties":{"netconfig":{"type":"string","description":"NetConfig path containing the inbound thread (OLD env)."},"inbound":{"type":"string","description":"Existing inbound protocol name to mirror. Must be a TCP-listener (ISSERVER=1); read its PROTOCOL.PORT first to confirm."},"new_host":{"type":"string","description":"Hostname/IP of the NEW linux env that OLD will TCP to."},"jump_port":{"type":"string","description":"TCP port for the OLD to NEW hop. linux_<tag>_out targets it, windows_<tag>_in listens on it."},"inbound_host":{"type":"string","description":"Host that windows_<tag>_out connects to on NEW (the existing inbound on NEW). Default 127.0.0.1 (same box, loopback)."},"process_jump":{"type":"string","description":"Process for NEW-side threads on server_jump. Default server_jump."},"encoding":{"type":"string","description":"ENCODING override. Default = same as the existing inbound."}},"required":["netconfig","inbound","new_host","jump_port"]}},
{"name":"nc_sources","description":"List every protocol that has a DATAXLATE DEST routing to the named thread. The inverse of nc_destinations. ONE HOP ONLY — to trace a full multi-hop chain use nc_paths, not repeated nc_sources calls.","input_schema":{"type":"object","properties":{"netconfig":{"type":"string"},"name":{"type":"string","description":"Target thread name."}},"required":["netconfig","name"]}},
{"name":"nc_paths","description":"Deterministic ROUTE-CHAIN tracer. Enumerates the full root-to-leaf message path(s) by following the DATAXLATE DEST routing graph (NEVER ICLSERVERPORT). USE THIS — DO NOT brute-force with grep_files / read_file / bash_exec / repeated nc_destinations — for ANY of: 'show me the path', 'trace the chain', 'what feeds X', 'where does X go', 'full route', 'end-to-end flow', 'sources and destinations chain', 'how does a message get from A to B', 'map the interface flow'. ONE call answers the whole question. Output columns SITE THREAD HOPS PATH where HOPS = thread count in the chain and PATH = the chain joined by ' -> ' (one row per enumerated path; a branch yields multiple rows). MODES: (a) one thread — set `thread` (and optionally `site`); default returns every full path containing that thread; set direction=down for only downstream, direction=up for only upstream feeders. (b) whole-site / whole-environment inventory — set all=true (optionally scope with `site`); enumerates every chain from every entry point (a thread with no incoming), deduped. CROSS-SITE BY DEFAULT: when a chain's terminal thread is also an entry thread in another site's NetConfig (same thread name), the chain CONTINUES into that site — e.g. mux -> ancout -> CodaMetrix spanning sites. Set site_only=true to stop at the site boundary. Resolves sites under $HCIROOT automatically (or pass hciroot / an explicit netconfig). Cycle-safe across sites; always terminates.","input_schema":{"type":"object","properties":{"thread":{"type":"string","description":"Thread/protocol name to trace. Omit only when all=true."},"site":{"type":"string","description":"Site name (the NetConfig's parent dir). Optional — disambiguates a thread present in multiple sites, or scopes all-mode to one site."},"netconfig":{"type":"string","description":"Optional explicit NetConfig path. 
If given, the thread's home site is its parent dir; cross-site joins still scan $HCIROOT unless site_only=true."},"direction":{"type":"string","enum":["full","up","down"],"description":"full (default) = every path containing the thread; down = only downstream chains; up = only upstream feeder chains."},"all":{"type":"boolean","description":"true = enumerate every chain from every entry point (whole-site/whole-environment inventory). No thread needed."},"site_only":{"type":"boolean","description":"true = do NOT cross site boundaries (scope to one site). Default false = follow the chain across sites."},"format":{"type":"string","enum":["table","tsv","jsonl"],"description":"Output format. Default table (aligned, monospace)."},"hciroot":{"type":"string","description":"Override $HCIROOT for site discovery / cross-site joins."}},"required":[]}},
{"name":"nc_paths","description":"Deterministic ROUTE-CHAIN tracer. Enumerates the full root-to-leaf message path(s). WITHIN a site the next hop follows the DATAXLATE DEST routing graph (intra-site routing never walks ICLSERVERPORT). USE THIS — DO NOT brute-force with grep_files / read_file / bash_exec / repeated nc_destinations — for ANY of: 'show me the path', 'trace the chain', 'what feeds X', 'where does X go', 'full route', 'end-to-end flow', 'sources and destinations chain', 'how does a message get from A to B', 'map the interface flow'. ONE call answers the whole question. DEFAULT OUTPUT is the v1 chain form, ONE PATH PER LINE: `site/thread --> site/thread ==> site/thread …` where every node is `site/thread`, `-->` is an INTRA-site DATAXLATE route hop, and `==>` is a CROSS-site hop. The FIRST node is the chain ROOT; field 1 (split on whitespace) IS the root node, so the output is pipe-first (`paths X | awk '{print $1}'` → the root). A branch yields multiple lines. For direction=up the root is the feeder ROOT and the queried thread is the chain TERMINUS. MODES: (a) one thread — set `thread` (accepts `thread`+`site` OR a single `site/thread` node, so output feeds back in); default returns every full path containing that thread; direction=down for only downstream, direction=up for only upstream feeders. (b) whole-site / whole-environment inventory — set all=true (optionally scope with `site`); enumerates every chain from every entry point (a thread with no incoming), deduped. CROSS-SITE BY DESTINATION BLOCK (Cloverleaf links sites through named `destination` blocks — the ICL routing table — not by thread name and not by blindly matching ports): a thread's DATAXLATE DEST may name a `destination` block; that block NAME is the LOCAL OUTBOUND SENDER node (shown in the chain, NEVER collapsed) and resolves to { SITE }/{ THREAD } { PORT } — the remote inbound it links to. 
So at every site boundary the chain reads `…local_inbound --> local_outbound_sender ==> remote_inbound --> …`, e.g. mux/ADTfr_epic_964700 --> mux/OB_ADT_ancS ==> ancout/IB_ADT_muxS --> ancout/ADTto_CodaMetrix. Upstream feeders of an inbound are resolved symmetrically. The whole route graph is parsed ONCE per run into memory; cross-site resolution is an in-memory lookup, not a per-site scan. Set site_only=true to stop at the site boundary. Resolves sites under $HCIROOT automatically (or pass hciroot / an explicit netconfig). Cycle-safe across sites; always terminates.","input_schema":{"type":"object","properties":{"thread":{"type":"string","description":"Thread/protocol name to trace, OR a `site/thread` node (the output's root node feeds straight back in). Omit only when all=true."},"site":{"type":"string","description":"Site name (the NetConfig's parent dir). Optional — disambiguates a thread present in multiple sites, or scopes all-mode to one site."},"netconfig":{"type":"string","description":"Optional explicit NetConfig path. If given, the thread's home site is its parent dir; cross-site joins still scan $HCIROOT unless site_only=true."},"direction":{"type":"string","enum":["full","up","down"],"description":"full (default) = every path containing the thread; down = only downstream chains; up = only upstream feeder chains (root = feeder root, queried thread = terminus)."},"all":{"type":"boolean","description":"true = enumerate every chain from every entry point (whole-site/whole-environment inventory). No thread needed."},"site_only":{"type":"boolean","description":"true = do NOT cross site boundaries (scope to one site). Default false = follow the chain across sites via destination blocks."},"format":{"type":"string","enum":["v1","table","tsv","jsonl","nodes"],"description":"Output format. Default v1 = the chain form, one path per line (site/thread nodes, --> intra / ==> cross), pipe-first (field 1 = root). table = aligned SITE/THREAD/HOPS/PATH. 
tsv/jsonl = data. nodes = just the site/thread nodes one per line (no arrows), for re-piping."},"hciroot":{"type":"string","description":"Override $HCIROOT for site discovery / cross-site joins."}},"required":[]}},
{"name":"nc_tclproc_refs","description":"List every TCL proc name referenced from a protocol block (or from the whole NetConfig if name is omitted). Pulls from DATAFORMAT.PROC, PREPROCS.PROCS, POSTPROCS.PROCS, etc. Unique sorted.","input_schema":{"type":"object","properties":{"netconfig":{"type":"string"},"name":{"type":"string","description":"Optional. Scope to one protocol."}},"required":["netconfig"]}},
{"name":"hl7_field","description":"Extract a specific HL7 v2 field from a message. field_path = SEG[.FIELD[.COMPONENT[.SUBCOMPONENT]]]. Examples: PID.3 (MRN), PID.18 (account number), MSH.7 (timestamp), MSH.9.2 (event code, like A08), PID.5 (patient name with components). Multiple repetitions are returned one per line. Native v3, no v1/v2 dependency.","input_schema":{"type":"object","properties":{"message":{"type":"string","description":"Raw HL7 message text. Segments separated by \\r."},"field_path":{"type":"string","description":"Field path like PID.3 or MSH.9.2"}},"required":["message","field_path"]}},
{"name":"nc_msgs","description":"Query Cloverleaf smat (SQLite!) databases for messages from a thread. Filters: time range, exact HL7 field match. Native v3 — reads smatdb directly with sqlite3 -ascii, no hcidbdump/dbExtract needed. Format text shows messages line-by-line with metadata; count returns just the count; json returns structured data. Operates on LOCAL smatdbs; for a remote env's smatdb, use ssh_pull_smat first (sampled mode is cheaper than pulling the whole DB).","input_schema":{"type":"object","properties":{"thread":{"type":"string","description":"Thread name. The .smatdb file under $HCISITEDIR/exec/processes/*/<thread>.smatdb is auto-located unless db is given."},"after":{"type":"string","description":"Time-after filter. Accepts \"3 days ago\", \"2026-05-20 14:30:00\", \"2026-05-20\", or a unix timestamp."},"before":{"type":"string","description":"Time-before filter, same formats as after."},"field":{"type":"string","description":"HL7 field path for exact-match filter, e.g. PID.18 or MSH.10."},"value":{"type":"string","description":"Value the field must equal. Use with field. Repeatable filters not supported via this single tool call — chain calls if you need multi-field AND."},"limit":{"type":"integer","description":"Max messages to return. Default 10."},"format":{"type":"string","enum":["text","json","count","raw"],"description":"text = human-readable with metadata; count = just the number; json = structured; raw = raw bytes separated by 0x1c."},"sitedir":{"type":"string","description":"Override $HCISITEDIR for thread-to-db location."},"db":{"type":"string","description":"Explicit .smatdb path; overrides auto-locate."}},"required":["thread"]}},
@ -7062,10 +7067,12 @@ main_loop() {
continue ;;
/paths|/paths\ *)
# v0.8.19: deterministic route-chain tracer (muscle-memory entry).
# /paths <thread> [site] [--up|--down] [--site-only] [--all] [--format tsv|table|jsonl]
# /paths <thread> [site] [--up|--down] [--site-only] [--all] [--format v1|table|tsv|jsonl|nodes]
# /paths <site>/<thread> ... (v1 node form — output feeds back in)
# /paths --all [site] [--site-only]
# Default format is v1 (the ground-truth chain form), pipe-first.
local _pa; _pa=$(_slash_args "/paths" "$input")
local _p_thread="" _p_site="" _p_dir="full" _p_all=0 _p_siteonly=0 _p_fmt="table" _ptok _pexpect=""
local _p_thread="" _p_site="" _p_dir="full" _p_all=0 _p_siteonly=0 _p_fmt="v1" _ptok _pexpect=""
for _ptok in $_pa; do
if [ "$_pexpect" = "format" ]; then _p_fmt="$_ptok"; _pexpect=""; continue; fi
case "$_ptok" in
@ -7084,7 +7091,7 @@ main_loop() {
done
# default site to the current $HCISITE when a thread is given without one
if [ "$_p_all" = "0" ] && [ -z "$_p_thread" ]; then
err "usage: /paths <thread> [site] [--up|--down|--site-only|--all|--format tsv|table|jsonl]"
err "usage: /paths <thread> [site] | <site>/<thread> [--up|--down|--site-only|--all|--format v1|table|tsv|jsonl|nodes]"
continue
fi
if [ "$_p_all" = "0" ] && [ -z "$_p_site" ] && [ -n "${HCISITE:-}" ]; then

@ -25,6 +25,9 @@
# xlate-refs [<NAME>] — list xlate .xlt files referenced
# tclproc-refs [<NAME>] — list TCL proc names referenced
# route-block <NAME> — emit the DATAXLATE block (the routing config)
# index — single-pass route INDEX (P/D/L/O/X
# records) for the in-memory path walker;
# see cmd_index. Parses the file ONCE.
# help — this help
#
# Route-chain PATH enumeration (root-to-leaf chains, all-mode, cross-site) lives
@ -76,6 +79,131 @@ _blocks() {
' "$nc"
}
# ─────────────────────────────────────────────────────────────────────────────
# SINGLE-PASS INDEX (v0.8.20 perf rearchitecture).
#
# Emits, in ONE awk pass over the NetConfig, every fact the path WALKER needs so
# it never has to re-invoke a subprocess or re-parse the file per hop. The old
# walker called nc-parse.sh (destinations/sources/protocol-nested/...) once PER
# HOP PER CANDIDATE, and each of those re-ran _blocks + cmd_protocol_block (two
# full awk passes over a 16K-line file). On the real 24-site integrator that was
# O(threads x parse-cost) = minutes. This subcommand replaces all of it: parse
# ONCE, walk in memory.
#
# Output is a flat TAB-separated record stream (one record per line). The leading
# single-char tag identifies the record kind:
# P <thread> protocol declared in this NetConfig
# D <thread> <dest> a DATAXLATE DEST edge thread->dest
# (handles BOTH `{ DEST name }` and the
# list form `{ DEST {a b c} }`)
# L <thread> <port> a LISTEN port for <thread>:
# server PROTOCOL.PORT (ISSERVER=1) and/or
# the guarded top-level ICLSERVERPORT.
# O <thread> <port> an OUTBOUND/tcpip-client dest port for
# <thread> (PROTOCOL.PORT with ISSERVER!=1).
# X <destname> <site> <thread> <port> a top-level `destination` block: the
# AUTHORITATIVE cross-site link. A DEST that
# names <destname> hops to <thread> in <site>
# (PORT is the connecting port). This is how
# Cloverleaf actually links sites (ICL) — by
# named destination, resolved to SITE+THREAD,
# NOT by blindly matching ports.
#
# Robust to arbitrary brace nesting (same depth bookkeeping as _blocks). DEST,
# ISSERVER, PORT and ICLSERVERPORT are recognised by their canonical one-line
# `{ KEY value }` rendering; absent/`{}` values are simply never emitted (the
# guard that the old paths.tcl lacked).
# ─────────────────────────────────────────────────────────────────────────────
cmd_index() {
local nc="$1"
require_file "$nc"
awk '
BEGIN { depth=0; in_block=0; btype=""; bname=""
in_proto=0; proto_depth=0; pport=""; isserver=""; iclport=""
dsite=""; dthread=""; dport="" }
# ---- enter a top-level block -------------------------------------------
!in_block && $0 ~ /^(process|protocol|destination) [A-Za-z0-9_]+ \{$/ {
split($0, a, " ")
btype = a[1]; bname = a[2]
depth = 1; in_block = 1
in_proto = 0; proto_depth = 0; pport=""; isserver=""; iclport=""
dsite=""; dthread=""; dport=""
if (btype == "protocol") print "P\t" bname
next
}
in_block {
line = $0
# --- field extraction BEFORE we mutate depth (fields are depth-1 inside
# their parent block; the value is the whole `{ KEY v }` on one line) ---
if (btype == "protocol") {
# DEST single: { DEST name }
if (match(line, /\{ DEST [A-Za-z0-9_]+ \}/)) {
v = substr(line, RSTART+7, RLENGTH-9) # strip "{ DEST " .. " }"
print "D\t" bname "\t" v
}
# DEST list: { DEST {a b c} }
else if (match(line, /\{ DEST \{[^}]*\}/)) {
v = substr(line, RSTART+8, RLENGTH-9) # strip "{ DEST {" .. "}"
m = split(v, dd, /[ \t]+/)
for (i=1; i<=m; i++) if (dd[i] != "") print "D\t" bname "\t" dd[i]
}
# top-level ICLSERVERPORT (a listen port, guarded numeric)
if (match(line, /^[[:space:]]+\{ ICLSERVERPORT [0-9]+ \}[[:space:]]*$/)) {
v = line; sub(/^[[:space:]]+\{ ICLSERVERPORT /, "", v); sub(/ \}[[:space:]]*$/, "", v)
iclport = v
}
# enter the nested { PROTOCOL { ... } } sub-block
if (!in_proto && line ~ /^[[:space:]]+\{ PROTOCOL \{$/) {
in_proto = 1; proto_depth = depth + 1
} else if (in_proto) {
if (match(line, /^[[:space:]]+\{ PORT [0-9]+ \}[[:space:]]*$/)) {
v = line; sub(/^[[:space:]]+\{ PORT /, "", v); sub(/ \}[[:space:]]*$/, "", v)
pport = v
}
if (match(line, /^[[:space:]]+\{ ISSERVER [0-9]+ \}[[:space:]]*$/)) {
v = line; sub(/^[[:space:]]+\{ ISSERVER /, "", v); sub(/ \}[[:space:]]*$/, "", v)
isserver = v
}
}
} else if (btype == "destination") {
if (match(line, /^[[:space:]]+\{ SITE [A-Za-z0-9_]+ \}[[:space:]]*$/)) {
v = line; sub(/^[[:space:]]+\{ SITE /, "", v); sub(/ \}[[:space:]]*$/, "", v); dsite = v
}
if (match(line, /^[[:space:]]+\{ THREAD [A-Za-z0-9_]+ \}[[:space:]]*$/)) {
v = line; sub(/^[[:space:]]+\{ THREAD /, "", v); sub(/ \}[[:space:]]*$/, "", v); dthread = v
}
if (match(line, /^[[:space:]]+\{ PORT [0-9]+ \}[[:space:]]*$/)) {
v = line; sub(/^[[:space:]]+\{ PORT /, "", v); sub(/ \}[[:space:]]*$/, "", v); dport = v
}
}
# --- depth bookkeeping ---
n_open = gsub(/\{/, "{", line)
n_close = gsub(/\}/, "}", line)
depth += n_open - n_close
if (in_proto && depth < proto_depth) in_proto = 0
# --- close the top-level block: emit its aggregate records ---
if (depth == 0) {
if (btype == "protocol") {
# listen ports: server PROTOCOL.PORT (ISSERVER=1) and/or ICL port
if (isserver == "1" && pport != "") print "L\t" bname "\t" pport
if (iclport != "") print "L\t" bname "\t" iclport
# outbound/tcpip-client dest port
if (isserver != "1" && pport != "") print "O\t" bname "\t" pport
} else if (btype == "destination") {
if (dsite != "" && dthread != "")
print "X\t" bname "\t" dsite "\t" dthread "\t" dport
}
in_block = 0; btype=""; bname=""
}
}
' "$nc"
}
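# Illustrative sketch only (not part of nc-parse.sh): how a consumer might read
# the TAB-separated record stream described above. The demo_* names are
# hypothetical, and the sample records are hand-written from the OB_ADT_ancS
# example in the nc-paths.sh header, not captured from a real NetConfig.

```shell
# Hand-written sample of the index stream (assumed values, see note above).
demo_index_records() {
  printf 'P\tADTfr_epic_964700\n'
  printf 'D\tADTfr_epic_964700\tOB_ADT_ancS\n'
  printf 'X\tOB_ADT_ancS\tancout\tIB_ADT_muxS\t62043\n'
}

# Print one line per X record: destination-block name, then the remote
# site/thread it resolves to and the connecting port.
demo_xsite_links() {
  local tag a b c d
  while IFS=$'\t' read -r tag a b c d; do
    if [ "$tag" = "X" ]; then
      printf '%s => %s/%s (port %s)\n' "$a" "$b" "$c" "$d"
    fi
  done
  return 0
}

demo_index_records | demo_xsite_links
```

# P and D records are skipped here; only the X record carries the cross-site
# resolution, which is why the walker never needs to compare ports.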
cmd_list_protocols() {
local nc="$1"
require_file "$nc"
@ -332,9 +460,10 @@ cmd_tclproc_refs() {
# cmd_chain only emitted a flat set of reachable nodes (depth/direction/thread),
# never enumerated root-to-leaf PATHS, was never wired into the LLM, and would
# have left two competing walkers. nc-paths.sh ports the v2 `paths` DFS
# enumerator (SITE/THREAD/HOPS/PATH output, all-mode, cross-site joins) and reuses
# the one-hop DEST primitives (cmd_destinations / cmd_sources) below. Do not
# reintroduce a second walker here — extend nc-paths.sh.
# enumerator (SITE/THREAD/HOPS/PATH output, all-mode, destination-block cross-site
# links) and builds its in-memory route graph from the single-pass `index`
# subcommand (cmd_index, above). Do not reintroduce a second walker here; extend
# nc-paths.sh.
cmd_route_block() {
local nc="$1" name="$2"
@ -384,6 +513,7 @@ case "$SUB" in
xlate-refs) [ $# -ge 2 ] || die "usage: $0 xlate-refs <netconfig> [name]"; cmd_xlate_refs "$2" "${3:-}" ;;
tclproc-refs) [ $# -ge 2 ] || die "usage: $0 tclproc-refs <netconfig> [name]"; cmd_tclproc_refs "$2" "${3:-}" ;;
route-block) [ $# -ge 3 ] || die "usage: $0 route-block <netconfig> <name>"; cmd_route_block "$2" "$3" ;;
index) [ $# -ge 2 ] || die "usage: $0 index <netconfig>"; cmd_index "$2" ;;
help|-h|--help) cmd_help ;;
*) die "unknown subcommand: $SUB (try '$0 help')" ;;
esac

@ -15,21 +15,45 @@
# - All-mode: enumerate from every entry point (a thread with no incoming),
# deduped — gives the whole-site chain inventory (v2 list_full_routes).
#
# ROUTING RESOLUTION: next hop is resolved ONLY from the DATAXLATE { DEST <name> }
# list (via nc-parse.sh destinations / sources). It NEVER reads ICLSERVERPORT.
# This is deliberate: Bryan's old paths.tcl walked routes via
# `keylget data ICLSERVERPORT`, which THROWS on any thread lacking that key
# (every outbound/client thread), so the trace died on the first client thread.
# The DEST list is present on every routing thread regardless of direction and
# simply yields nothing (no crash) when a thread has no routes. DO NOT
# reintroduce an ICLSERVERPORT-based hop here.
# INTRA-SITE ROUTING RESOLUTION: within a single site the next hop is resolved
# ONLY from the DATAXLATE { DEST <name> } list (the D records of `nc-parse.sh
# index`). It NEVER walks via ICLSERVERPORT inside a site. The DEST list is
# present on every routing thread regardless of direction and simply yields
# nothing (no crash) when a thread has no routes. DO NOT reintroduce an
# ICLSERVERPORT-based hop for INTRA-site routing.
#
# CROSS-SITE BY DEFAULT (Bryan's resolved decision, 2026-05-28): when a chain's
# terminal thread (a downstream leaf with no further DEST in its own site) is
# ALSO an entry/inbound thread declared in ANOTHER discovered site's NetConfig
# (correlated by shared thread name), the walk CONTINUES into that site — so the
# mux -> ancout -> CodaMetrix style chain is followed end to end across the site
# boundary. Pass --site-only to scope the walk to a single site.
# CROSS-SITE BY DESTINATION BLOCK (v0.8.20, corrected on the real integrator):
# Cloverleaf links sites through named `destination` blocks — the inter-cloverleaf
# (ICL) routing table — NOT by blindly matching ports. A `destination <name> {...}`
# top-level block declares { SITE <site> } { THREAD <thread> } { PORT <port> }: it
# names a remote inbound thread in another site and the port the link connects on.
# A protocol's DATAXLATE DEST list may name EITHER (a) a LOCAL protocol (intra-site
# hop) OR (b) a destination block — and a DEST naming a destination block is the
# cross-site hop, resolved AUTHORITATIVELY to (SITE,THREAD). The PORT equals the
# remote thread's listen/ICL port (verifiable), but the link is name-resolved, so
# it is exact: e.g. mux thread ADTfr_epic_964700 has { DEST OB_ADT_ancS }; the
# destination block OB_ADT_ancS is { SITE ancout } { THREAD IB_ADT_muxS }
# { PORT 62043 } — so the chain continues into ancout's IB_ADT_muxS.
#
# WHY NOT PURE PORT-MATCHING (the rejected v0.8.20-draft mechanism): an earlier
# draft inferred the link by matching an outbound's PROTOCOL.PORT to an inbound's
# server/ICL port. That was (1) slow and (2) WRONG — it missed real feeders whose
# cross-site link is expressed only via a destination block (the mux feeder of
# IB_ADT_muxS above is reached through DEST OB_ADT_ancS, not through any thread
# whose PROTOCOL.PORT == 62043). ICLSERVERPORT is still read GUARDED in the index
# (absent / `{}` on most threads → skipped, never an error — the un-guarded keylget
# is exactly what crashed the old paths.tcl), but it is used only to corroborate a
# destination block's PORT, never as the primary link key.
#
# The whole route graph (protocol DEST edges + destination-block resolution +
# reverse-source maps) is built ONCE per run from a single awk pass per NetConfig
# (`nc-parse.sh index`) into in-memory associative arrays. Cross-site DOWNSTREAM: a
# DEST naming a destination block continues into its (site,thread). Cross-site
# UPSTREAM feeders of (site,thread): every destination block (any site) resolving
# to it, and the threads in that block's site that DEST to the block name — all
# in-memory lookups, no per-site chain enumeration (fixes Vera's m3 AND the old
# O(threads x parse-cost) per-hop subprocess blowup). Pass --site-only to scope the
# walk to a single site.
#
# Robust cycle detection across sites: every walk carries the full ancestor set
# keyed by "site\037thread"; revisiting any (site,thread) ancestor terminates the
@ -37,20 +61,45 @@
# terminates. A global max-depth cap (default 128, matching v2) is a second
# backstop.
#
# Output columns: SITE THREAD HOPS PATH
# THREAD = the start/anchor thread of the row
# HOPS = number of threads in the chain (len of the path list)
# PATH = the chain joined by " -> " (space-arrow-space)
# One row per enumerated root-to-leaf path; a branching thread yields N rows.
# DEFAULT OUTPUT = v1 CHAINS (one path per line, site/thread nodes, typed arrows):
# mux/ADTfr_epic_964700 --> mux/OB_ADT_ancS ==> ancout/IB_ADT_muxS --> ancout/ADTto_CodaMetrix
# - every NODE is rendered "site/thread" (slash join)
# - "-->" = an INTRA-site DATAXLATE route hop (a thread's DEST that names a
# LOCAL protocol — including the local OUTBOUND SENDER node, which is
# the destination-block name living in this site)
# - "==>" = a CROSS-site hop (the destination block's link: FROM the local
# outbound sender node TO the remote inbound thread it names)
# - one path per line; a branching thread yields N lines.
# This matches Bryan's v1 ground-truth paths.tcl: at every cross-site boundary the
# chain reads …local_inbound --> local_outbound_sender ==> remote_inbound --> … —
# the sender (= the destination-block name) is ALWAYS shown, never collapsed.
#
# The v1 line is PIPE-FIRST / field-extractable: `paths X | awkcut 1` yields the
# root node (field 1 = chain root, e.g. mux/ADTfr_epic_964700). The output is also
# valid INPUT: a "site/thread" node can be fed back in (paths X → extract root →
# paths <root>). `--format nodes` emits just the site/thread nodes (no arrows) one
# per line so piping never fights the arrow tokens.
#
# OTHER FORMATS (--format):
# table — the SITE/THREAD/HOPS/PATH aligned table (Bryan: kept, opt-in).
# THREAD = the start/anchor (ROOT) node of the row (first node in PATH);
# HOPS = number of nodes in the chain; PATH = the typed v1 chain.
# tsv — site<TAB>thread<TAB>hops<TAB>path (path = the typed v1 chain)
# jsonl — one JSON object per path {site,thread,hops,path}
# nodes — node-only: each path's "site/thread" nodes, one per line, blank line
# between paths (no arrows — clean for re-piping into `paths`).
# NOTE (Vera m2): for UPSTREAM (--up) chains the root is the feeder ROOT (the
# most-upstream source) and the queried thread is the chain TERMINUS.
#
# Usage:
# nc-paths.sh --netconfig <file> <thread> [flags] # explicit NetConfig
# nc-paths.sh <thread> <site> [flags] # resolve site under $HCIROOT
# nc-paths.sh <site>/<thread> [flags] # site/thread (v1 node form)
# nc-paths.sh --all [--site <name>] [flags] # whole-site entry chains
#
# Flags:
# --upstream only the upstream chains feeding the thread
# --downstream only the downstream chains from the thread
# --upstream | --up only the upstream chains feeding the thread
# --downstream | --down only the downstream chains from the thread
# (neither flag = full paths containing the thread,
# v2 default, falling back to downstream-from-thread)
# --all enumerate from every entry point (no thread arg)
@ -60,7 +109,7 @@
# --netconfig <file> operate on one explicit NetConfig (implies the site is
# basename(dirname(file)); cross-site still scans $HCIROOT)
# --max-depth N recursion cap (default 128)
# --format tsv|table|jsonl default: table
# --format v1|table|tsv|jsonl|nodes default: v1 (the ground-truth chain form)
#
# Exit codes: 0 OK, 1 usage error, 2 not found.
set -u
@ -83,36 +132,53 @@ DIR_MODE="full" # full | up | down
ALL_MODE=0
SITE_ONLY=0
MAX_DEPTH=128
FORMAT="table"
FORMAT="v1"
POSITIONAL=()
while [ $# -gt 0 ]; do
case "$1" in
--upstream) DIR_MODE="up" ;;
--downstream) DIR_MODE="down" ;;
--upstream|--up) DIR_MODE="up" ;;
--downstream|--down) DIR_MODE="down" ;;
--all) ALL_MODE=1 ;;
--site) shift; SITE_ARG="${1:-}" ;;
--site-only) SITE_ONLY=1 ;;
--hciroot) shift; HCIROOT_OVERRIDE="${1:-}" ;;
--netconfig) shift; NETCONFIG="${1:-}" ;;
--max-depth) shift; MAX_DEPTH="${1:-128}" ;;
--format) shift; FORMAT="${1:-table}" ;;
-h|--help) sed -n '2,70p' "$NC_SELF" | sed 's/^# \{0,1\}//'; exit 0 ;;
--format) shift; FORMAT="${1:-v1}" ;;
-h|--help) sed -n '2,113p' "$NC_SELF" | sed 's/^# \{0,1\}//'; exit 0 ;;
--*) die "unknown flag: $1" ;;
*) POSITIONAL+=("$1") ;;
esac
shift
done
case "$FORMAT" in tsv|table|jsonl) ;; *) die "bad --format: $FORMAT (tsv|table|jsonl)" ;; esac
case "$FORMAT" in v1|tsv|table|jsonl|nodes) ;; *) die "bad --format: $FORMAT (v1|table|tsv|jsonl|nodes)" ;; esac
# Positional shapes:
# <thread> (manual: thread only; site from $HCISITE/$HCISITEDIR)
# <thread> <site> (manual muscle-memory: thread + site)
# <site>/<thread> (v1 node form — the output IS valid input; pipe-first)
# PIPE-FIRST: a single positional containing a "/" is parsed as site/thread, so
# the v1 output (root node = "site/thread") can be fed straight back into paths.
if [ "${#POSITIONAL[@]}" -ge 1 ]; then THREAD="${POSITIONAL[0]}"; fi
if [ "${#POSITIONAL[@]}" -ge 2 ] && [ -z "$SITE_ARG" ]; then SITE_ARG="${POSITIONAL[1]}"; fi
if [ "${#POSITIONAL[@]}" -gt 2 ]; then die "too many positional args: ${POSITIONAL[*]}"; fi
# Accept the v1 "site/thread" node form as a single positional. A bare thread with
# no embedded slash (the legacy form) is left untouched. Only split on the FIRST
# slash so thread names are preserved verbatim. The slash-embedded site is
# authoritative: it overrides any explicit --site / 2nd positional, because it
# came from our own output. The split is skipped when --netconfig is given.
if [ -n "$THREAD" ] && [ -z "$NETCONFIG" ]; then
case "$THREAD" in
*/*) _slash_site="${THREAD%%/*}"; _slash_thr="${THREAD#*/}"
if [ -n "$_slash_site" ] && [ -n "$_slash_thr" ]; then
THREAD="$_slash_thr"; SITE_ARG="$_slash_site"
fi ;;
esac
fi
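# Illustrative sketch only: the first-slash split used above, isolated into a
# hypothetical helper (demo_split_node is not part of nc-paths.sh). It shows why
# %%/* and #*/ together split on the FIRST slash only.

```shell
# Split "site/thread" on the first slash; prints site<TAB>thread.
demo_split_node() {
  local node="$1"
  local site="${node%%/*}"    # drop the longest suffix starting at a slash
  local thread="${node#*/}"   # drop the shortest prefix ending at a slash
  printf '%s\t%s\n' "$site" "$thread"
}

demo_split_node "mux/ADTfr_epic_964700"
```

# Anything after the first slash stays in the thread part, so a later slash
# would be kept verbatim rather than truncated.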
if [ "$ALL_MODE" = "0" ] && [ -z "$THREAD" ]; then
die "no thread given (and --all not set). Try: nc-paths.sh <thread> <site> OR nc-paths.sh --all --site <name>"
fi
@ -166,152 +232,348 @@ _discover_sites() {
fi
}
# Resolve the NetConfig path for a given site name (first match wins).
_nc_for_site() {
local want="$1" i
for ((i=0; i<${#SITE_NAMES[@]}; i++)); do
if [ "${SITE_NAMES[$i]}" = "$want" ]; then
printf '%s' "${SITE_NCS[$i]}"
return 0
fi
done
return 1
}
# Given a thread name, find the FIRST discovered (site,nc) pair whose NetConfig
# declares that thread as a protocol. Emits "site\037nc" or returns 1.
US=$'\037' # unit separator — safe field delimiter for site/thread keys
_locate_thread() {
local want="$1" i sname nc
# ─────────────────────────────────────────────────────────────────────────────
# IN-MEMORY ROUTE GRAPH (v0.8.20 perf rearchitecture).
#
# The old walker invoked nc-parse.sh ONCE PER HOP PER CANDIDATE (destinations /
# sources / protocol-nested / protocol-field / list-protocols), and EACH of those
# re-ran _blocks + cmd_protocol_block — two full awk passes over the (16K-line)
# NetConfig. On the real 24-site integrator that is O(threads x parse-cost) =
# minutes (84s --site-only, 164s full for a single thread). Even intra-site was a
# bottleneck because `sources` scans every protocol body.
#
# Now we PARSE EACH NEEDED NetConfig EXACTLY ONCE (`nc-parse.sh index`, a single
# awk pass — see cmd_index) and load the result into bash associative arrays. The
# walkers then do pure O(1) in-memory lookups: NO subprocess and NO re-parse per
# hop. Indexing all 24 live NetConfigs is <1s; on the real 24-site integrator a
# single-thread trace measured ~0.7s (down from 84s) and a full-tree run ~5.5s.
#
# CROSS-SITE LINK (corrected): Cloverleaf links sites through named `destination`
# blocks (the ICL routing table), NOT by blindly matching ports. A protocol's
# DATAXLATE DEST may name either (a) a LOCAL protocol (intra-site hop) or (b) a
# `destination` block, which resolves to { SITE <site> } { THREAD <thread> }
# { PORT <port> } — the authoritative remote target. The PORT is the connecting
# port (it equals the remote thread's listen/ICL port — verifiable), but the SITE
# and THREAD come straight from the destination block, so the hop is exact and
# name-resolved. (The old port-only heuristic was BOTH slow AND missed real
# feeders whose link is expressed via a destination block — e.g. the mux feeder of
# ancout's IB_ADT_muxS via destination OB_ADT_ancS.)
#
# Associative arrays (bash 4+; matches the rest of this repo, and Git-Bash /
# Cygwin on Windows ship bash 4+/5+). Keys use US ("site\037thread") so names with
# unusual characters never collide with the field delimiter.
# G_PROTO[site\037thread] = 1 membership: thread exists in site
# G_DESTS[site\037thread] = "d1\nd2..." raw DATAXLATE DEST targets (newline)
# G_LISTEN[site\037thread] = "p1 p2" listen ports (server + ICL), space-sep
# G_OUT[site\037thread] = "port" outbound/tcpip-client dest port
# G_DESTBLK[site\037destname] = "tsite\037tthread\037tport" destination-block resolution
# G_INSRC[site\037thread] = "s1\ns2..." reverse intra-site DEST edges (sources)
# G_DESTBLK_REV[tsite\037tthread] = "fsite\037fname\n..." destination blocks (any site)
# pointing AT (tsite,tthread); fname is the dest
# block name, used to find its upstream feeders.
# G_LOADED tracks which NetConfigs have already been indexed (idempotent).
# ─────────────────────────────────────────────────────────────────────────────
declare -A G_PROTO G_DESTS G_LISTEN G_OUT G_DESTBLK G_INSRC G_DESTBLK_REV G_LOADED
# Load ONE NetConfig's index into the in-memory graph (idempotent per nc path).
_load_nc() {
local site="$1" nc="$2"
[ -n "${G_LOADED[$nc]:-}" ] && return 0
G_LOADED[$nc]=1
local tag a b c d e key
while IFS=$'\t' read -r tag a b c d e; do
case "$tag" in
P) key="${site}${US}${a}"; G_PROTO[$key]=1 ;;
D) key="${site}${US}${a}"
if [ -z "${G_DESTS[$key]:-}" ]; then G_DESTS[$key]="$b"; else G_DESTS[$key]="${G_DESTS[$key]}"$'\n'"$b"; fi ;;
L) key="${site}${US}${a}"
if [ -z "${G_LISTEN[$key]:-}" ]; then G_LISTEN[$key]="$b"; else G_LISTEN[$key]="${G_LISTEN[$key]} $b"; fi ;;
O) key="${site}${US}${a}"; G_OUT[$key]="$b" ;;
X) # X <destname> <tsite> <tthread> <tport>
key="${site}${US}${a}"; G_DESTBLK[$key]="${b}${US}${c}${US}${d}"
local rkey="${b}${US}${c}"
local rval="${site}${US}${a}"
if [ -z "${G_DESTBLK_REV[$rkey]:-}" ]; then G_DESTBLK_REV[$rkey]="$rval"; else G_DESTBLK_REV[$rkey]="${G_DESTBLK_REV[$rkey]}"$'\n'"$rval"; fi ;;
esac
done < <("$NCP" index "$nc" 2>/dev/null)
}
# Build the reverse intra-site DEST edges (sources) for every loaded site. Called
# once after all needed NetConfigs are loaded. For each thread A with DEST B in
# the SAME site, record A as a source of B (only when B is a local protocol —
# DEST targets that are destination blocks are handled as cross-site, not here).
_build_in_sources() {
local key src site dst dkey
for key in "${!G_DESTS[@]}"; do
site="${key%%$US*}"; src="${key#*$US}"
while IFS= read -r dst; do
[ -z "$dst" ] && continue
dkey="${site}${US}${dst}"
[ -n "${G_PROTO[$dkey]:-}" ] || continue # only local protocols are intra-site sources
if [ -z "${G_INSRC[$dkey]:-}" ]; then G_INSRC[$dkey]="$src"; else G_INSRC[$dkey]="${G_INSRC[$dkey]}"$'\n'"$src"; fi
done <<< "${G_DESTS[$key]}"
done
}
# Ensure the WHOLE tree is loaded (all discovered sites) — needed for cross-site
# resolution and reverse-source maps. Idempotent.
GRAPH_BUILT=0
_build_graph() {
[ "$GRAPH_BUILT" = "1" ] && return 0
GRAPH_BUILT=1
local i
for ((i=0; i<${#SITE_NCS[@]}; i++)); do
sname="${SITE_NAMES[$i]}"; nc="${SITE_NCS[$i]}"
if "$NCP" list-protocols "$nc" 2>/dev/null | grep -qxF "$want"; then
printf '%s%s%s' "$sname" "$US" "$nc"
return 0
fi
_load_nc "${SITE_NAMES[$i]}" "${SITE_NCS[$i]}"
done
_build_in_sources
}
# Given a thread name, find the FIRST discovered site that declares it (in-memory).
# Emits "site" or returns 1.
_locate_thread() {
local want="$1" i sname
for ((i=0; i<${#SITE_NAMES[@]}; i++)); do
sname="${SITE_NAMES[$i]}"
[ -n "${G_PROTO[${sname}${US}${want}]:-}" ] && { printf '%s' "$sname"; return 0; }
done
return 1
}
# ─────────────────────────────────────────────────────────────────────────────
# One-hop primitives (DEST-based, never ICLSERVERPORT).
# One-hop primitives — now pure in-memory lookups (no subprocess, no re-parse).
# INTRA-site routing follows the DATAXLATE DEST list only (never ICLSERVERPORT).
# A DEST that names a destination block is NOT an intra-site dest (it is the
# cross-site link, handled in the walkers).
# ─────────────────────────────────────────────────────────────────────────────
_outgoing() { "$NCP" destinations "$1" "$2" 2>/dev/null; } # nc thread -> dest names
_incoming() { "$NCP" sources "$1" "$2" 2>/dev/null; } # nc thread -> source names
# Intra-site downstream: DEST targets that are LOCAL protocols in this site.
_outgoing() { # site thread
local site="$1" thr="$2" key="${1}${US}${2}" d dkey
[ -n "${G_DESTS[$key]:-}" ] || return 0
while IFS= read -r d; do
[ -z "$d" ] && continue
dkey="${site}${US}${d}"
[ -n "${G_PROTO[$dkey]:-}" ] && printf '%s\n' "$d"
done <<< "${G_DESTS[$key]}"
}
# Intra-site upstream: local protocols that DEST to this thread.
_incoming() { local key="${1}${US}${2}"; [ -n "${G_INSRC[$key]:-}" ] && printf '%s\n' "${G_INSRC[$key]}"; }
# Is <thread> an entry point (no incoming) in <nc>?
_is_entry_in() {
local nc="$1" t="$2"
[ -z "$(_incoming "$nc" "$t")" ]
# Is <thread> an entry point (no incoming) in <site>?
_is_entry_in() { [ -z "${G_INSRC[${1}${US}${2}]:-}" ]; }
# Cross-site DOWNSTREAM targets: a DEST of (cur_site,cur_thread) that is NOT a
# local protocol but IS a destination block. The destination-block NAME (d) is the
# LOCAL OUTBOUND SENDER node, living in cur_site — v1 shows it and we must NOT
# collapse it. The block resolves to the remote inbound (tsite,tthread). Emit each
# as "sender\037tsite\037tthread" (sender = the dest-block name in cur_site). The
# walker then renders: cur_thread --(intra)--> cur_site/sender ==(cross)==> tsite/tthread.
# Authoritative name-resolved link (PORT is just confirmation).
_xsite_down_targets() {
local cur_site="$1" cur_thread="$2" key="${1}${US}${2}" d dbkey resolved
[ -n "${G_DESTS[$key]:-}" ] || return 0
while IFS= read -r d; do
[ -z "$d" ] && continue
[ -n "${G_PROTO[${cur_site}${US}${d}]:-}" ] && continue # local protocol → intra-site, not here
dbkey="${cur_site}${US}${d}"
resolved="${G_DESTBLK[$dbkey]:-}"
[ -z "$resolved" ] && continue # not a known destination block → skip
local tsite="${resolved%%$US*}" rest="${resolved#*$US}"
local tthr="${rest%%$US*}"
printf '%s%s%s%s%s\n' "$d" "$US" "$tsite" "$US" "$tthr"
done <<< "${G_DESTS[$key]}"
}
# Cross-site UPSTREAM feeders: who feeds (cur_site,cur_thread) from another site?
# Any destination block (in any site) that resolves to (cur_site,cur_thread); its
# upstream feeders are the threads in the destination block's OWN site that DEST to
# that destination-block NAME (dbname). The block NAME is the LOCAL OUTBOUND SENDER
# node, living in fsite between the feeder and this remote inbound — v1 shows it,
# so we carry it. Emit each as "fsite\037fthread\037dbname". The walker then renders
# the upstream prefix: fsite/feeder --(intra)--> fsite/dbname ==(cross)==> cur_site/cur_thread.
# Pure in-memory lookup — no per-site chain enumeration.
_xsite_up_feeders() {
local cur_site="$1" cur_thread="$2" rkey="${1}${US}${2}" dbref
[ -n "${G_DESTBLK_REV[$rkey]:-}" ] || return 0
while IFS= read -r dbref; do
[ -z "$dbref" ] && continue
# dbref = fsite\037destblockname
local fsite="${dbref%%$US*}" dbname="${dbref#*$US}" feeder
# feeders = local protocols in fsite whose DEST names dbname
local fkey
for fkey in "${!G_DESTS[@]}"; do
[ "${fkey%%$US*}" = "$fsite" ] || continue
case $'\n'"${G_DESTS[$fkey]}"$'\n' in
(*$'\n'"$dbname"$'\n'*)
feeder="${fkey#*$US}"
printf '%s%s%s%s%s\n' "$fsite" "$US" "$feeder" "$US" "$dbname" ;;
esac
done
done <<< "${G_DESTBLK_REV[$rkey]}"
}
# ─────────────────────────────────────────────────────────────────────────────
# Path enumeration. Emitted paths are written to $OUT_PATHS as one line each:
# site<TAB>chain where chain = thread1 -> thread2 -> ...
# We carry the running chain as a space-joined token list of "site\037thread"
# keys, and the ancestor set as newline-joined keys (for cycle detection).
# site<TAB>chain where chain = the rendered v1 typed chain (site/thread nodes
# joined by --> / ==>).
#
# CHAIN ENCODING (EDGE-TYPED). We carry the running chain as a space-joined list
# of TOKENS. The FIRST token is a bare node key "site\037thread". Every SUBSEQUENT
# token is "EDGE\035site\037thread" where the leading 1-char EDGE code records how
# this node connects to the PREVIOUS node:
# i = INTRA-site DATAXLATE hop → rendered "-->"
# x = CROSS-site destination-link → rendered "==>"
# \035 (GS) separates the edge code from the node key; \037 (US) separates site
# from thread. Node names are [A-Za-z0-9_]+ so neither separator can collide, and
# tokens stay space-tokenizable (the full-mode awk join still splits on spaces).
# The ancestor set (cycle detection) remains a newline-joined list of plain node
# keys "site\037thread" — edge codes are never part of it.
# ─────────────────────────────────────────────────────────────────────────────
GS=$'\035' # group separator — delimits the edge code from the node key
OUT_PATHS=$(mktemp)
trap 'rm -f "$OUT_PATHS"' EXIT
# Append a node to a keychain with an explicit edge type.
# _chain_push CHAIN EDGE NODEKEY (EDGE = i|x; first push ignores EDGE)
# Emits the new chain on stdout.
_chain_push() {
local chain="$1" edge="$2" node="$3"
if [ -z "$chain" ]; then printf '%s' "$node"; else printf '%s %s%s%s' "$chain" "$edge" "$GS" "$node"; fi
}
# Prepend a node (upstream walk builds a prefix). The edge code lives on the node
# that follows it; when we prepend a NEW root we must move the edge code onto the
# OLD first node and leave the new root bare.
# _chain_unshift CHAIN EDGE NODEKEY
_chain_unshift() {
local chain="$1" edge="$2" node="$3"
if [ -z "$chain" ]; then printf '%s' "$node"; return 0; fi
# The current chain's first token is a bare node key (no edge code). Re-tag it
# with EDGE (its connection to the new root we are prepending), then prefix the
# bare new root.
local first="${chain%% *}" rest=""
case "$chain" in *' '*) rest=" ${chain#* }" ;; esac
printf '%s %s%s%s%s' "$node" "$edge" "$GS" "$first" "$rest"
}
# _emit_chain ANCHOR_SITE KEYCHAIN
# KEYCHAIN = space-separated list of "site\037thread" keys
# Renders to "anchor_site<TAB>t1 -> t2 -> ..." (thread names only in PATH).
# KEYCHAIN = the edge-typed token list described above.
# Renders to "anchor_site<TAB>site/thread --> site/thread ==> ..." (v1 form).
_emit_chain() {
local anchor_site="$1" keychain="$2"
local out="" k thr first=1
for k in $keychain; do
thr="${k#*$US}"
if [ "$first" = "1" ]; then out="$thr"; first=0; else out="$out -> $thr"; fi
local out="" tok edge node site thr first=1
for tok in $keychain; do
if [ "$first" = "1" ]; then
node="$tok"; edge=""
else
edge="${tok%%$GS*}"; node="${tok#*$GS}"
fi
site="${node%%$US*}"; thr="${node#*$US}"
if [ "$first" = "1" ]; then
out="${site}/${thr}"; first=0
else
case "$edge" in
x) out="$out ==> ${site}/${thr}" ;;
*) out="$out --> ${site}/${thr}" ;;
esac
fi
done
printf '%s\t%s\n' "$anchor_site" "$out"
}
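# Illustrative sketch only: a condensed, standalone copy of the edge-typed chain
# idea, to show how prepending moves the edge code onto the old first node. The
# demo_* functions are hypothetical stand-ins, not the helpers above; the node
# names come from the example chain in the header comment.

```shell
US=$'\037'   # separates site from thread inside a node key
GS=$'\035'   # separates the edge code from the node key

# Prepend NODE; the previous first node takes EDGE, the new root stays bare.
demo_unshift() {
  local chain="$1" edge="$2" node="$3"
  if [ -z "$chain" ]; then printf '%s' "$node"; return 0; fi
  local first="${chain%% *}" rest=""
  case "$chain" in *' '*) rest=" ${chain#* }" ;; esac
  printf '%s %s%s%s%s' "$node" "$edge" "$GS" "$first" "$rest"
}

# Render tokens as site/thread nodes joined by --> (i) or ==> (x).
demo_render() {
  local out="" tok edge node
  for tok in $1; do
    case "$tok" in
      *"$GS"*) edge="${tok%%$GS*}"; node="${tok#*$GS}" ;;
      *)       edge=""; node="$tok" ;;   # bare root token
    esac
    node="${node%%$US*}/${node#*$US}"
    case "$edge" in
      "") out="$node" ;;
      x)  out="$out ==> $node" ;;
      *)  out="$out --> $node" ;;
    esac
  done
  printf '%s\n' "$out"
}

chain="ancout${US}IB_ADT_muxS i${GS}ancout${US}ADTto_CodaMetrix"
chain="$(demo_unshift "$chain" x "mux${US}OB_ADT_ancS")"
chain="$(demo_unshift "$chain" i "mux${US}ADTfr_epic_964700")"
demo_render "$chain"
```

# Storing the edge on the FOLLOWING node keeps the root token bare in both walk
# directions, so one renderer serves upstream and downstream chains.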
# Cycle test against the newline-joined ancestor set — pure bash, no grep
# subprocess (this used to fork `grep -qxF` per hop). seen lines are US-keyed.
_seen_has() {
case $'\n'"$1"$'\n' in (*$'\n'"$2"$'\n'*) return 0 ;; esac
return 1
}
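# Illustrative sketch only: the newline-wrapped `case` match above, exercised on
# a hypothetical two-entry ancestor set (demo_seen_has mirrors _seen_has; the
# keys are made up). Wrapping both sides in newlines makes the match whole-line,
# so a key that is merely a prefix of an ancestor does not count as a revisit.

```shell
demo_seen_has() {
  case $'\n'"$1"$'\n' in (*$'\n'"$2"$'\n'*) return 0 ;; esac
  return 1
}

seen=$'mux\037ADTfr_epic_964700\nancout\037IB_ADT_muxS'

if demo_seen_has "$seen" $'ancout\037IB_ADT_muxS'; then echo "revisit: stop"; fi
if ! demo_seen_has "$seen" $'ancout\037IB_ADT'; then echo "prefix only: continue"; fi
```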
# Downstream DFS. Mirrors v2 _enumerate_downstream_paths + cross-site hop.
# All lookups are in-memory (the graph is keyed by SITE; no NetConfig path / no
# subprocess per hop).
# $1 anchor_site — site to report in the SITE column for these rows
# $2 cur_site — site of current thread
# $3 cur_nc — NetConfig of current thread
# $4 cur_thread — current thread name
# $5 keychain — space-joined ancestor keys NOT including current
# $6 seen — newline-joined ancestor keys (for cycle detection)
# $7 depth
# $3 cur_thread — current thread name
# $4 keychain — edge-typed ancestor chain NOT including current
# $5 seen — newline-joined ancestor node keys (for cycle detection)
# $6 depth
# $7 edge_in — edge connecting the previous node to cur (i|x; "" for root)
_walk_down() {
local anchor_site="$1" cur_site="$2" cur_nc="$3" cur_thread="$4"
local keychain="$5" seen="$6" depth="$7"
local anchor_site="$1" cur_site="$2" cur_thread="$3"
local keychain="$4" seen="$5" depth="$6" edge_in="${7:-}"
local curkey="${cur_site}${US}${cur_thread}"
local newchain
if [ -z "$keychain" ]; then newchain="$curkey"; else newchain="$keychain $curkey"; fi
newchain="$(_chain_push "$keychain" "${edge_in:-i}" "$curkey")"
# cycle / depth cap → terminate, include current node (v2 semantics)
if [ "$depth" -gt "$MAX_DEPTH" ] || printf '%s\n' "$seen" | grep -qxF "$curkey"; then
if [ "$depth" -gt "$MAX_DEPTH" ] || _seen_has "$seen" "$curkey"; then
_emit_chain "$anchor_site" "$newchain"
return 0
fi
# gather outgoing within the current site
# gather outgoing within the current site (DEST targets that are local protocols)
local outgoing=()
local d
while IFS= read -r d; do
[ -z "$d" ] && continue
outgoing+=("$d")
done < <(_outgoing "$cur_nc" "$cur_thread")
done < <(_outgoing "$cur_site" "$cur_thread")
local nseen="$seen"$'\n'"$curkey"
local branched=0
if [ "${#outgoing[@]}" -gt 0 ]; then
local nseen
nseen="$seen"$'\n'"$curkey"
branched=1
for d in "${outgoing[@]}"; do
_walk_down "$anchor_site" "$cur_site" "$cur_nc" "$d" "$newchain" "$nseen" $((depth+1))
# intra-site route hop (-->)
_walk_down "$anchor_site" "$cur_site" "$d" "$newchain" "$nseen" $((depth+1)) i
done
return 0
fi
# No outgoing in this site = a leaf for this site. CROSS-SITE HOP:
# if cross-site is enabled and this leaf thread is an entry/inbound thread in
# ANOTHER site's NetConfig (shared name) that DOES have outgoing there,
# continue the walk into that site.
# CROSS-SITE HOP via destination block (v0.8.20, corrected; v0.8.20 output fix:
# SHOW THE SENDER NODE). A DEST of this thread that names a destination block
# resolves to two nodes: the LOCAL OUTBOUND SENDER (the block name, in cur_site)
# and then the remote inbound (tsite,tthread). v1 renders BOTH:
# cur_thread --(intra -->)--> cur_site/sender ==(cross ==>)==> tsite/tthread
# so we (1) push the sender node with an INTRA edge, then (2) recurse into the
# remote inbound with a CROSS edge. NEVER collapse the sender. This is in ADDITION
# to any intra-site branches above (a thread can route both locally and cross-site).
if [ "$SITE_ONLY" = "0" ]; then
local i osite onc okey
for ((i=0; i<${#SITE_NCS[@]}; i++)); do
osite="${SITE_NAMES[$i]}"; onc="${SITE_NCS[$i]}"
[ "$osite" = "$cur_site" ] && [ "$onc" = "$cur_nc" ] && continue
# the thread must exist in the other site AND have outgoing there
"$NCP" list-protocols "$onc" 2>/dev/null | grep -qxF "$cur_thread" || continue
[ -n "$(_outgoing "$onc" "$cur_thread")" ] || continue
okey="${osite}${US}${cur_thread}"
# cycle guard across sites: don't re-enter an ancestor (site,thread)
printf '%s\n' "$seen" | grep -qxF "$okey" && continue
# Continue the chain in the other site. We DROP the duplicate boundary
# node: cur_thread is already the last node in newchain, and it is the
# same thread name in osite, so we recurse on its destinations directly,
# carrying newchain as the prefix and marking both (site,thread) keys seen.
local nseen2
nseen2="$seen"$'\n'"$curkey"$'\n'"$okey"
local dd
while IFS= read -r dd; do
[ -z "$dd" ] && continue
_walk_down "$anchor_site" "$osite" "$onc" "$dd" "$newchain" "$nseen2" $((depth+1))
done < <(_outgoing "$onc" "$cur_thread")
# only join into the first matching downstream site, then stop scanning
return 0
done
local tgt sender osite othr okey sendkey sendchain
while IFS= read -r tgt; do
[ -z "$tgt" ] && continue
# tgt = sender\037tsite\037tthread
sender="${tgt%%$US*}"; local rest="${tgt#*$US}"
osite="${rest%%$US*}"; othr="${rest#*$US}"
okey="${osite}${US}${othr}"
_seen_has "$seen" "$okey" && continue # cycle guard across sites
branched=1
# (1) the local outbound sender node, intra-site edge from cur_thread
sendkey="${cur_site}${US}${sender}"
sendchain="$(_chain_push "$newchain" i "$sendkey")"
# (2) cross-site edge from the sender into the remote inbound; continue there
_walk_down "$anchor_site" "$osite" "$othr" "$sendchain" "$nseen" $((depth+1)) x
done < <(_xsite_down_targets "$cur_site" "$cur_thread")
fi
# true terminal — emit the chain
_emit_chain "$anchor_site" "$newchain"
# true terminal (no intra- or cross-site continuation) — emit the chain
[ "$branched" = "0" ] && _emit_chain "$anchor_site" "$newchain"
return 0
}
# Upstream DFS. Mirrors v2 _enumerate_upstream_paths. Cross-site upstream:
# if a thread has no incoming in its own site but the same-named thread is a
# downstream/leaf in another site, follow that site's incoming (the feeders).
# builds the chain as a PREFIX (sources come before current)
# Upstream DFS. Mirrors v2 _enumerate_upstream_paths. Builds the chain as a PREFIX
# (sources come before current). Cross-site feeders are resolved via destination
# blocks (see _xsite_up_feeders) — in-memory, no per-site enumeration.
# $7 edge_in — edge connecting cur to the node that FOLLOWS it (already in
# keychain). i|x; "" for the terminus (nothing follows yet).
_walk_up() {
local anchor_site="$1" cur_site="$2" cur_nc="$3" cur_thread="$4"
local keychain="$5" seen="$6" depth="$7"
local anchor_site="$1" cur_site="$2" cur_thread="$3"
local keychain="$4" seen="$5" depth="$6" edge_in="${7:-}"
local curkey="${cur_site}${US}${cur_thread}"
local newchain
if [ -z "$keychain" ]; then newchain="$curkey"; else newchain="$curkey $keychain"; fi
newchain="$(_chain_unshift "$keychain" "${edge_in:-i}" "$curkey")"
if [ "$depth" -gt "$MAX_DEPTH" ] || printf '%s\n' "$seen" | grep -qxF "$curkey"; then
if [ "$depth" -gt "$MAX_DEPTH" ] || _seen_has "$seen" "$curkey"; then
_emit_chain "$anchor_site" "$newchain"
return 0
fi
@@ -321,143 +583,165 @@ _walk_up() {
while IFS= read -r s; do
[ -z "$s" ] && continue
incoming+=("$s")
done < <(_incoming "$cur_nc" "$cur_thread")
done < <(_incoming "$cur_site" "$cur_thread")
local nseen="$seen"$'\n'"$curkey"
local branched=0
if [ "${#incoming[@]}" -gt 0 ]; then
local nseen
nseen="$seen"$'\n'"$curkey"
branched=1
for s in "${incoming[@]}"; do
_walk_up "$anchor_site" "$cur_site" "$cur_nc" "$s" "$newchain" "$nseen" $((depth+1))
# intra-site source feeds cur via a route hop (-->)
_walk_up "$anchor_site" "$cur_site" "$s" "$newchain" "$nseen" $((depth+1)) i
done
return 0
fi
# cross-site upstream hop: same-named thread fed in another site
# CROSS-SITE UPSTREAM FEEDERS via destination block (v0.8.20, corrected; output
# fix: SHOW THE SENDER NODE). Find every destination block (in any site) that
# resolves to THIS (site,thread). The block NAME is the LOCAL OUTBOUND SENDER node
# in the feeder's site, and the feeders are the threads in that site that DEST to
# the block name. v1 renders the upstream prefix as:
# fsite/feeder --(intra -->)--> fsite/sender ==(cross ==>)==> cur_site/cur_thread
# so we (1) prepend the sender node with a CROSS edge (sender ==> cur), then
# (2) recurse up into the feeder with an INTRA edge (feeder --> sender). In-memory.
if [ "$SITE_ONLY" = "0" ]; then
local i osite onc okey
for ((i=0; i<${#SITE_NCS[@]}; i++)); do
osite="${SITE_NAMES[$i]}"; onc="${SITE_NCS[$i]}"
[ "$osite" = "$cur_site" ] && [ "$onc" = "$cur_nc" ] && continue
"$NCP" list-protocols "$onc" 2>/dev/null | grep -qxF "$cur_thread" || continue
[ -n "$(_incoming "$onc" "$cur_thread")" ] || continue
okey="${osite}${US}${cur_thread}"
printf '%s\n' "$seen" | grep -qxF "$okey" && continue
local nseen2
nseen2="$seen"$'\n'"$curkey"$'\n'"$okey"
local ss
while IFS= read -r ss; do
[ -z "$ss" ] && continue
_walk_up "$anchor_site" "$osite" "$onc" "$ss" "$newchain" "$nseen2" $((depth+1))
done < <(_incoming "$onc" "$cur_thread")
return 0
done
local fdr fsite othr okey sender sendkey sendchain
while IFS= read -r fdr; do
[ -z "$fdr" ] && continue
# fdr = fsite\037fthread\037dbname (dbname = destination-block name = the sender node)
fsite="${fdr%%$US*}"; local rest="${fdr#*$US}"
othr="${rest%%$US*}"; sender="${rest#*$US}"
okey="${fsite}${US}${othr}"
_seen_has "$seen" "$okey" && continue
branched=1
# (1) the local outbound sender node, CROSS edge into cur
sendkey="${fsite}${US}${sender}"
sendchain="$(_chain_unshift "$newchain" x "$sendkey")"
# (2) recurse up into the feeder, INTRA edge into the sender
_walk_up "$anchor_site" "$fsite" "$othr" "$sendchain" "$nseen" $((depth+1)) i
done < <(_xsite_up_feeders "$cur_site" "$cur_thread")
fi
_emit_chain "$anchor_site" "$newchain"
[ "$branched" = "0" ] && _emit_chain "$anchor_site" "$newchain"
return 0
}
# ─────────────────────────────────────────────────────────────────────────────
# Drivers
# ─────────────────────────────────────────────────────────────────────────────
# In-memory list of a site's protocol names (membership keys are "site\037thread").
_protos_in_site() {
local site="$1" key
for key in "${!G_PROTO[@]}"; do
[ "${key%%$US*}" = "$site" ] && printf '%s\n' "${key#*$US}"
done
}
# Enumerate every full path in a site by starting from each entry point.
# Cross-site continuation happens naturally inside _walk_down. Dedup by the
# rendered "site\tchain" line.
# rendered "site\tchain" line. All in-memory — no subprocess.
_enumerate_all_in_site() {
local site="$1" nc="$2"
local entry tmp
tmp=$(mktemp)
# entry points = threads with no incoming in this site
"$NCP" list-protocols "$nc" 2>/dev/null | while IFS= read -r entry; do
[ -z "$entry" ] && continue
if _is_entry_in "$nc" "$entry"; then
printf '%s\n' "$entry" >> "$tmp"
fi
done
# if no entry points (every thread has an incoming, e.g. a pure cycle),
# fall back to all protocols as start points (v2 fallback)
if [ ! -s "$tmp" ]; then
"$NCP" list-protocols "$nc" 2>/dev/null > "$tmp"
fi
local site="$1"
local entry entries=() any_entry=0 all=()
while IFS= read -r entry; do
[ -z "$entry" ] && continue
_walk_down "$site" "$site" "$nc" "$entry" "" "" 0
done < "$tmp"
rm -f "$tmp"
all+=("$entry")
if _is_entry_in "$site" "$entry"; then
entries+=("$entry"); any_entry=1
fi
done < <(_protos_in_site "$site")
# if no entry points (every thread has an incoming, e.g. a pure cycle),
# fall back to all protocols as start points (v2 fallback)
if [ "$any_entry" = "0" ]; then
entries=("${all[@]}")
fi
for entry in "${entries[@]}"; do
_walk_down "$site" "$site" "$entry" "" "" 0
done
}
main_enumerate() {
_discover_sites
[ "${#SITE_NCS[@]}" -gt 0 ] || die "no NetConfig found (set \$HCIROOT, or pass --netconfig / --hciroot)"
# PARSE ONCE: build the whole in-memory route graph (single awk pass per
# NetConfig + reverse-source maps). The walkers then run entirely in memory.
# With --site-only and an explicit thread we still build the full graph (it is
# <1s for 24 sites); cross-site hops are simply suppressed by the SITE_ONLY guard.
_build_graph
local raw
raw=$(mktemp)
trap 'rm -f "$OUT_PATHS" "$raw"' EXIT
if [ "$ALL_MODE" = "1" ]; then
# whole-site entry chains; scope to --site if given (else every site)
local i sname snc
local i sname
for ((i=0; i<${#SITE_NAMES[@]}; i++)); do
sname="${SITE_NAMES[$i]}"; snc="${SITE_NCS[$i]}"
sname="${SITE_NAMES[$i]}"
if [ -n "$SITE_ARG" ] && [ "$sname" != "$SITE_ARG" ]; then continue; fi
_enumerate_all_in_site "$sname" "$snc" >> "$raw"
_enumerate_all_in_site "$sname" >> "$raw"
done
else
# locate the thread's home site
local home_site home_nc loc
# locate the thread's home site (in-memory membership lookup)
local home_site
if [ -n "$NETCONFIG" ]; then
home_nc="$NETCONFIG"; home_site="$(basename "$(dirname "$NETCONFIG")")"
"$NCP" list-protocols "$home_nc" 2>/dev/null | grep -qxF "$THREAD" \
|| die "thread not found in $home_nc: $THREAD"
home_site="$(basename "$(dirname "$NETCONFIG")")"
[ -n "${G_PROTO[${home_site}${US}${THREAD}]:-}" ] \
|| die "thread not found in $NETCONFIG: $THREAD"
elif [ -n "$SITE_ARG" ]; then
home_nc="$(_nc_for_site "$SITE_ARG")" || die "site not found under \$HCIROOT: $SITE_ARG"
home_site="$SITE_ARG"
"$NCP" list-protocols "$home_nc" 2>/dev/null | grep -qxF "$THREAD" \
[ -n "${G_PROTO[${home_site}${US}${THREAD}]:-}" ] \
|| die "thread not found in site $SITE_ARG: $THREAD"
else
loc="$(_locate_thread "$THREAD")" || die "thread not found in any discovered site: $THREAD"
home_site="${loc%%$US*}"; home_nc="${loc#*$US}"
home_site="$(_locate_thread "$THREAD")" || die "thread not found in any discovered site: $THREAD"
fi
case "$DIR_MODE" in
up) _walk_up "$home_site" "$home_site" "$home_nc" "$THREAD" "" "" 0 >> "$raw" ;;
down) _walk_down "$home_site" "$home_site" "$home_nc" "$THREAD" "" "" 0 >> "$raw" ;;
up) _walk_up "$home_site" "$home_site" "$THREAD" "" "" 0 >> "$raw" ;;
down) _walk_down "$home_site" "$home_site" "$THREAD" "" "" 0 >> "$raw" ;;
full)
# v2 default: every full path (entry-point enumeration) that CONTAINS the
# thread; fall back to downstream-from-thread if none contain it.
local all_tmp
all_tmp=$(mktemp)
_enumerate_all_in_site "$home_site" "$home_nc" > "$all_tmp"
# cross-site: also enumerate full paths in any site whose entry chains
# could pass through the thread (the home site's own entry enumeration
# already crosses outward; inbound feeders in other sites are picked up
# because those sites' entry chains are enumerated in all-mode — but for
# a single-thread query we only have the home site's chains, so we also
# scan every discovered site's chains to catch upstream feeders).
if [ "$SITE_ONLY" = "0" ]; then
local j js jn
for ((j=0; j<${#SITE_NAMES[@]}; j++)); do
js="${SITE_NAMES[$j]}"; jn="${SITE_NCS[$j]}"
[ "$jn" = "$home_nc" ] && continue
_enumerate_all_in_site "$js" "$jn" >> "$all_tmp"
done
fi
# keep only chains containing the thread (match on " -> THREAD ->",
# leading "THREAD ->", or trailing "-> THREAD", or exact)
local kept
kept=$(awk -F'\t' -v t="$THREAD" '
# v2 default: every full ROOT-TO-LEAF path CONTAINING the thread.
#
# v0.8.20 (rearchitected): do NOT scan every site's entry chains (the old
# O(sites x threads) loop). The complete chain = the thread's UPSTREAM
# feeder chains (each ending AT the thread: root -> ... -> thread) JOINED at
# the thread to its DOWNSTREAM chains (each starting AT the thread:
# thread -> ... -> leaf). Both walks are in-memory and follow cross-site
# links via destination blocks, so the join naturally spans sites
# (e.g. mux/ADTfr_epic_964700 --> ... ==> ancout/IB_ADT_muxS --> ancout/ADTto_CodaMetrix).
# The cartesian join over the (usually tiny) up x down sets is done in awk.
# Both halves are the RENDERED v1 chain (site/thread nodes; --> / ==> arrows).
# The upstream prefix ENDS with the queried node (home_site/THREAD); the
# downstream chain STARTS with it. We strip the leading queried node from the
# downstream — KEEPING the arrow that follows it (--> or ==>) so the cross-site
# boundary type is preserved — and graft the remaining suffix onto the prefix.
local up_tmp down_tmp qnode
up_tmp=$(mktemp); down_tmp=$(mktemp)
qnode="${home_site}/${THREAD}"
_walk_up "$home_site" "$home_site" "$THREAD" "" "" 0 > "$up_tmp"
_walk_down "$home_site" "$home_site" "$THREAD" "" "" 0 > "$down_tmp"
# join: for each upstream prefix x each downstream chain, emit
# prefix <arrow> <downstream minus leading queried-node-and-its-arrow>.
awk -F'\t' -v q="$qnode" '
FNR==NR { usite[NR]=$1; up[NR]=$2; nu=NR; next }
{
chain=$2
# pad with arrows for unambiguous boundary matching
padded=" -> " chain " -> "
if (index(padded, " -> " t " -> ") > 0) print $0
}' "$all_tmp" | sort -u)
if [ -n "$kept" ]; then
printf '%s\n' "$kept" >> "$raw"
else
_walk_down "$home_site" "$home_site" "$home_nc" "$THREAD" "" "" 0 >> "$raw"
fi
rm -f "$all_tmp"
dn=$2
# split the downstream into the leading queried node, the arrow that
# follows it, and the remaining suffix. arrow is " --> " or " ==> ".
arrow=""; suffix=""
if (index(dn, q " --> ") == 1) { arrow=" --> "; suffix=substr(dn, length(q " --> ")+1) }
else if (index(dn, q " ==> ") == 1) { arrow=" ==> "; suffix=substr(dn, length(q " ==> ")+1) }
else { arrow=""; suffix="" } # downstream was just the node
for (i=1; i<=nu; i++) {
chain = up[i]
if (suffix != "") chain = up[i] arrow suffix
print usite[i] "\t" chain
}
}
' "$up_tmp" "$down_tmp" | sort -u >> "$raw"
rm -f "$up_tmp" "$down_tmp"
;;
esac
fi
@@ -469,25 +753,55 @@ main_enumerate() {
}
# ─────────────────────────────────────────────────────────────────────────────
# Render: OUT_PATHS holds "site<TAB>chain" lines. Build SITE THREAD HOPS PATH.
# Render: OUT_PATHS holds "site<TAB>chain" lines, where chain is the v1 rendered
# form (site/thread nodes joined by " --> " / " ==> "). All derived columns split
# the chain on the TYPED-ARROW regex " (--|==)> " so HOPS counts NODES and the
# root (field 1) is the first node — independent of the boundary type.
# THREAD = first node of the chain (the anchor/root for this row)
# HOPS = number of nodes in the chain
# ─────────────────────────────────────────────────────────────────────────────
render() {
if [ ! -s "$OUT_PATHS" ]; then
printf 'No paths found.\n'
# The no-paths message goes to stderr for data/pipe formats so stdout stays clean
# for downstream field extraction (awkcut / cut never see a prose line).
case "$FORMAT" in
v1|nodes|tsv|jsonl) printf 'No paths found.\n' >&2 ;;
*) printf 'No paths found.\n' ;;
esac
return 0
fi
# produce a 4-col TSV: site thread hops path
case "$FORMAT" in
v1)
# The ground-truth chain, one path per line. PIPE-FIRST: field 1 (split on
# the arrow tokens, e.g. `awkcut 1`) is the root node "site/thread".
awk -F'\t' '{ print $2 }' "$OUT_PATHS"
return 0
;;
nodes)
# node-only extraction: each path's site/thread nodes one per line, a blank
# line between paths. No arrows — clean for re-piping into `paths`.
awk -F'\t' '
NR>1 { print "" }
{
chain=$2
n=split(chain, parts, / (--|==)> /)
for (i=1; i<=n; i++) print parts[i]
}' "$OUT_PATHS"
return 0
;;
esac
# produce a 4-col TSV: site thread hops path (path = the v1 typed chain)
local tsv
tsv=$(awk -F'\t' '
{
site=$1; chain=$2
# first node
# first node = chain up to the first typed arrow
first=chain
sub(/ -> .*/, "", first)
# hop count = number of " -> " separators + 1
n=split(chain, parts, / -> /)
sub(/ (--|==)> .*/, "", first)
# hop count = number of nodes = typed-arrow separators + 1
n=split(chain, parts, / (--|==)> /)
printf "%s\t%s\t%d\t%s\n", site, first, n, chain
}' "$OUT_PATHS")