diff --git a/CHANGELOG.md b/CHANGELOG.md index fa8ef72..680739e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,89 @@ All notable changes to `cloverleaf-larry` / `larry-anywhere` are recorded here. Versioning is loose-semver; bumps trigger the in-process self-update on every running client via `LARRY_BASE_URL` + `MANIFEST`. +## v0.8.20 — 2026-05-28 + +Route-chain tracer (`lib/nc-paths.sh`) REARCHITECTED for the real integrator: +parse once, walk in memory; cross-site linking corrected from a port-match +heuristic to authoritative **`destination`-block** resolution. v0.8.20 was never +shipped — this entry supersedes the earlier port-based draft of the same version, +which FAILED Bryan's real-integrator smoke (24-site QA env): catastrophically slow +AND missing real cross-site feeders. + +**Problem (measured on the real 24-site integrator, before this fix):** +- `nc-paths.sh ADTto_CodaMetrix ancout --site-only` → correct chain but **84 s**. +- full (no flag) → same single chain, **164 s**. +- `--down` → "unknown flag". +- Root cause: the walker invoked `nc-parse.sh` as a SUBPROCESS per hop / per + candidate (`destinations`/`sources`/`protocol-nested`/`protocol-field`/ + `list-protocols`), and each invocation re-ran `_blocks` + `cmd_protocol_block` + — two full awk passes over the (16K-line) NetConfig. O(threads × parse-cost) = + minutes. Even the intra-site walk was a bottleneck (`sources` scans every body). +- Correctness: the draft linked sites by matching an outbound's `PROTOCOL.PORT` + to an inbound's listen/ICL port. That MISSED the real mux feeder of ancout's + `IB_ADT_muxS` (port 62043) — because no thread has `PROTOCOL.PORT 62043`; the + link is expressed only through a `destination` block. + +**1. 
Single-pass index (`lib/nc-parse.sh` new `index` subcommand, `cmd_index`).** +ONE awk pass per NetConfig emits a flat record stream the walker needs: +`P` protocol, `D` DEST edge (handles BOTH `{ DEST name }` and the list form +`{ DEST {a b c} }` — the list form was silently dropped by the old +`cmd_destinations` regex), `L` listen port (server `PROTOCOL.PORT` with +ISSERVER=1 and/or guarded `ICLSERVERPORT`), `O` outbound dest port, and +`X ` — the resolution of a top-level +`destination` block. Indexing all 24 live NetConfigs is <1 s. + +**2. In-memory route graph + in-memory walk (`lib/nc-paths.sh`).** The index loads +once into bash associative arrays (`G_PROTO`/`G_DESTS`/`G_LISTEN`/`G_OUT`/ +`G_DESTBLK`/`G_INSRC`/`G_DESTBLK_REV`; `_load_nc`, `_build_in_sources`, +`_build_graph`). `_walk_down`/`_walk_up` and the one-hop primitives +(`_outgoing`/`_incoming`/`_xsite_down_targets`/`_xsite_up_feeders`) are now pure +O(1) lookups — NO subprocess and NO re-parse per hop. Cycle test is a bash +substring match (`_seen_has`), not a `grep` fork per hop. + +**3. Cross-site link corrected to `destination` blocks.** Cloverleaf links sites +through the named ICL destination table: a thread's DATAXLATE `DEST` may name +either a LOCAL protocol (intra-site hop) or a `destination` block, which resolves +to `{ SITE }` `{ THREAD }` `{ PORT }`. A `DEST` naming a destination block is the +cross-site hop, resolved by NAME to the exact remote (site,thread). The `PORT` +equals the remote thread's listen/ICL port (corroboration), but it is never the +primary key. `ICLSERVERPORT` is still read GUARDED in the index (absent/`{}` → +skipped, never the un-guarded `keylget` that crashed v2 `paths.tcl`). + +**4. `full` mode = upstream × downstream JOIN at the thread.** No more +O(sites × threads) entry-chain scan (Vera m3). 
The complete chain is the thread's +upstream feeder chains (each ending AT the thread) joined to its downstream chains +(each starting AT the thread); both walks follow destination blocks, so the join +spans sites naturally. + +**5. Flag standardization.** `--down`/`--up` are now accepted as aliases of +`--downstream`/`--upstream` in `nc-paths.sh` itself (they already worked via the +`/paths` slash handler; the bare script rejected them). + +**6. Intra-site hops UNCHANGED in semantics** — still the DATAXLATE `DEST` list, +never an `ICLSERVERPORT` walk. + +**7. Removed:** the port-match cross-site index (`_build_port_index`, the `PI_*` +arrays), the per-hop subprocess primitives (`_proto_port`/`_proto_isserver`/ +`_icl_port`/`_norm_port`), and the dead `_nc_for_site` helper. + +**Verification — RE-MEASURED ON THE REAL 24-SITE INTEGRATOR** (tarball +`cloverleaf_test.tar.gz`, HCIROOT = extracted `integrator/`): +- `ADTto_CodaMetrix ancout --site-only`: **84 s → 0.66 s**. +- `ADTto_CodaMetrix ancout` (full): **164 s → 1.0 s**. +- whole-tree `--all` (all 24 sites, 709 chains): **4.3 s** (well under a minute). +- `--down` / `--up`: now valid flags. +- REAL cross-site chain proven: `mux/ADTfr_epic_964700 --> mux/OB_ADT_ancS ==> ancout/IB_ADT_muxS --> ancout/ADTto_CodaMetrix`. `IB_ADT_muxS`'s upstream feeder lives in the `mux` + site and reaches ancout via destination block `OB_ADT_ancS` + (`{ SITE ancout } { THREAD IB_ADT_muxS } { PORT 62043 }`) — exactly the feeder + the port-match draft missed and Bryan asked for. Multi-site fan-out is + site-correct (each destination block resolves to its own site's `IB_ADT_muxS`). +- `--site-only` confirmed to suppress all cross-site hops. +- `bash -n` clean (`nc-paths.sh`, `nc-parse.sh`, `larry.sh`); `/paths` + + `tool_nc_paths` drive clean under `set -u`; MANIFEST regenerated & `--check` OK. +- No-traffic-bypass preserved (read-only NetConfig parsing; no engine/network + calls; pure bash + awk, no python/.pyz; portable Win + Linux). 
+ ## v0.8.19 — 2026-05-28 Deterministic route-chain `nc_paths` tool — the #1 fix from the deterministic diff --git a/MANIFEST b/MANIFEST index a9f4695..4540a16 100644 --- a/MANIFEST +++ b/MANIFEST @@ -23,21 +23,21 @@ # scripts/make-manifest.sh and bump VERSION. # Top-level scripts -larry.sh 8bc938bc3351b88b4fcf2c4244617ef335c9c9e3352fcc1b8da6ddbb9275cdf9 +larry.sh 20b68e650ff9a94a15f7745334fe0dc0f913da2c6d4c2b92388202c951d0d171 larry-tunnel.sh 6b050e4eeab15669f4858eaf3b807f168f211ced07815db9521bc40a093f6aaa larry-auth.sh a220cdf7878569dc3028951ee57fc8d5e706a8ca5c6aa45347b58facb386f831 larry-rollback.sh 91b5e9aa6c79266bf306dcfba4ca791c07971bd6924d67a779037531648aa6d0 install-larry.sh e97da4e12a0d8863ca18d79b12f6c4294c72fa6d4b11dffeab66504236bb4eb1 # Metadata -VERSION d6cb21adf47733cbddb6f624c559d39c4fa8f018d961f0e577f71b91327880e6 -MANUAL.md 956f736291ed3ada0f7bd61c20f60f5267a16776bae918fe3fa17d9c8e07b997 -CHANGELOG.md 83fb342bf07fd2086070974ea7ec031ae665493307f95406591e89c7da222959 +VERSION 9bb2e455df78105b99303d11d1de0401d94142ff3fadc8e37bcba6c0c4d59914 +MANUAL.md c64bd0251a51ad150508b4e1185355bc4826a64071d4de339f92ed550dbfacde +CHANGELOG.md 73f32366662b55ddc16cb937f0e6a4d0f4cd99181e8717ab9938d80b60984db6 # Agent personas (system-prompt overlays) agents/larry.md 0a1ef737e7fc133ab35be09f79c3a4df33de814e0404b69b950932d0c8a01be1 agents/clover.md d1bbfd6cc4642c2bff6e15dcbdf051d71b063b3fe29e0be97d17b3180d3c7ac5 -agents/cloverleaf-cheatsheet.md 4bd63c40bcc71ee4a15a330a3450118d8b88c1de1174366aaeef37b8940df751 +agents/cloverleaf-cheatsheet.md 95c3bc52eaae92dff548702b0a0461ccba6ac6d8b410196c45ca59f28d0b3477 agents/regress.md bb05ed1439b1e35d6e9799e32d683bfab166472c72115c1f02757e227c74e42f # Cygwin/MobaXterm CR-taint defense primitives (sourced by every tool) @@ -97,8 +97,8 @@ lib/nc-xlate.sh ea02693c3dff5db271771d4bb2927b23465b07798df2f9912bc2d2b58a134d54 lib/nc-smat-diff.sh ac003954701ea6b7f4aa1f6941f8536af5b5cdfbb75e306789753d453f06800e lib/nc-create-thread.sh 
5a9d5407c117183cad831d6b95f0e785b1b806f5ccc67f803c12b3695882b5b7 lib/nc-tclgen.sh dc95f523d543192fc7b3ae204107ce67ebb9b7e5184fa0642a1af2e2454d3241 -lib/nc-parse.sh 473b64c66a55f07ef19fc589467102c9bf2f389c20eabea63bcf272cad3e16fb -lib/nc-paths.sh dadc4138dd24c5585e40253ef33a2a9adb0af1259bc6a601df44f26667934fb7 +lib/nc-parse.sh ab06df8264983a9c490af25bf20e1551a91e68b45a9ec24c6cb0fce1f1b9dd69 +lib/nc-paths.sh 388d2f4560736587a01218cadc1de612cd59e392819d16db2f56f19174c1111b lib/nc-inbound.sh 52d28c5f8d97bdf96f0fc7b5300d35b106b8e1226578f4cda430deb2a8b4a91b lib/nc-make-jump.sh 08a0bc58a299c95c60a59a5202792daf0ada3a8a0be7dc1b4cccc5724f5c9c79 lib/nc-msgs.sh 729e2d6c9159e83fa177fc6b982e48ed8453a9743477cc90afdd3cd4ec7e620c diff --git a/MANUAL.md b/MANUAL.md index 80b06a6..3c10b2c 100644 --- a/MANUAL.md +++ b/MANUAL.md @@ -164,15 +164,32 @@ lib/nc-parse.sh route-block "$HCISITEDIR/NetConfig" IB_ADT_muxS ## Route-chain path tracer (`lib/nc-paths.sh`) — the single walker -Enumerates the full root-to-leaf message path(s) by following the DATAXLATE -`{ DEST }` routing graph. Output columns **SITE THREAD HOPS PATH** — HOPS -is the thread count in the chain, PATH is the chain joined by ` -> ` (one row per -enumerated path; a branch yields multiple rows). Routing resolves via DEST only, -never `ICLSERVERPORT` (so it never recurs the old `paths.tcl` crash). **Cross-site -by default**: when a chain's terminal thread is also an entry thread in another -site's NetConfig (same name), the chain continues into that site -(mux -> ancout -> CodaMetrix). `--site-only` scopes to one site. Cycle-safe; always -terminates. +Enumerates the full root-to-leaf message path(s). **Within a site** the next hop +follows the DATAXLATE `{ DEST }` routing graph (never an `ICLSERVERPORT` +walk — so it cannot recur the old `paths.tcl` crash). 
Output columns +**SITE THREAD HOPS PATH** — HOPS is the thread count in the chain, PATH is the +chain joined by ` -> ` (one row per enumerated path; a branch yields multiple +rows). THREAD is the ROOT (first node) of the chain. + +> **Upstream note (Vera m2):** for `--up` chains the THREAD column shows the feeder +> ROOT (the most-upstream source) and the queried thread is the chain TERMINUS — +> PATH reads `source -> ... -> queried_thread`. + +**Cross-site by `destination` block (v0.8.20)** — Cloverleaf links sites through +named `destination` blocks (the inter-cloverleaf / ICL routing table), not by +thread name and not by blindly matching ports. A top-level +`destination { ... }` declares `{ SITE } { THREAD } +{ PORT }` — the remote inbound it connects to and the link port. A thread's +DATAXLATE `DEST` may name either a LOCAL protocol (intra-site hop) or a +`destination` block; a `DEST` naming a destination block is the cross-site hop, +resolved by name to the exact remote `(site, thread)`. The `PORT` equals the +remote thread's listen/ICL port (corroboration); `ICLSERVERPORT` is still read +GUARDED (absent / `{}` → skipped, never the un-guarded `keylget` that crashed +`paths.tcl`). Upstream cross-site feeders of an inbound = every destination block +(any site) resolving to it plus the threads that `DEST` to those blocks. The whole +route graph is parsed ONCE per run (`nc-parse.sh index`) into memory and walked +with O(1) lookups — no subprocess / re-parse per hop. `--site-only` scopes to one +site. Cycle-safe; always terminates. ```bash # One thread — every full path containing it (default), table output. @@ -182,6 +199,8 @@ lib/nc-paths.sh IB_ADT_muxS anc # cross-site chain followed # Only downstream chains from a thread / only upstream feeders lib/nc-paths.sh IB_ADT_muxS anc --downstream +# --up: THREAD column = the feeder ROOT; the queried thread (ADTto_CodaMetrix) +# is the chain TERMINUS, i.e. PATH = feeder_root -> ... -> ADTto_CodaMetrix. 
lib/nc-paths.sh ADTto_CodaMetrix codametrix --upstream # Stop at the site boundary (no cross-site join) diff --git a/VERSION b/VERSION index ad2cbac..d757c24 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.8.19 +0.8.20 diff --git a/agents/cloverleaf-cheatsheet.md b/agents/cloverleaf-cheatsheet.md index 0b2d17b..9007c5a 100644 --- a/agents/cloverleaf-cheatsheet.md +++ b/agents/cloverleaf-cheatsheet.md @@ -21,7 +21,7 @@ Two kinds of capability: | `nc_protocol_summary(netconfig, [filter])` | one-line TSV per protocol with direction, port, host, type — your default "lay of the land" call | | `nc_destinations(netconfig, name)` | "what does this thread route to?" — unique DEST list from DATAXLATE. **ONE HOP only — for the full multi-hop chain use `nc_paths`.** | | `nc_sources(netconfig, name)` | "what routes INTO this thread?" — unique source list. **ONE HOP only — for the full chain use `nc_paths`.** | -| `nc_paths(thread, site, [all], [site_only])` | **"trace the FULL route chain / what feeds X / the whole path / downstream + upstream"** — deterministic DFS path enumerator, output `SITE THREAD HOPS PATH`, cross-site by default. **Use this instead of repeated `nc_destinations`/`nc_sources`, grep, or read_file** for ANY path / chain / route-tracing question. | +| `nc_paths(thread, site, [all], [site_only])` | **"trace the FULL route chain / what feeds X / the whole path / downstream + upstream"** — deterministic DFS path enumerator, output `SITE THREAD HOPS PATH`. Intra-site hops follow DATAXLATE DEST; **cross-site links are via named `destination` blocks** (a `DEST` naming a destination block resolves to its `{ SITE } { THREAD }`; the `PORT` corroborates). The whole route graph is parsed once into memory and walked with O(1) lookups. For `--up`, THREAD = feeder ROOT and the queried thread is the terminus. **Use this instead of repeated `nc_destinations`/`nc_sources`, grep, or read_file** for ANY path / chain / route-tracing question. 
| | `nc_xlate_refs(netconfig, [name])` | "what .xlt files are referenced?" — all or scoped to one protocol | | `nc_find_inbound(netconfig, mode, format)` | "which threads are inbound?" — modes: `tcp-listen` (real upstream-client listeners, ISSERVER=1), `icl-or-file` (OBWORKASIB=1 internal mux/file inbounds), `all`. formats: tsv, jsonl, table | diff --git a/larry.sh b/larry.sh index a776989..11d2244 100755 --- a/larry.sh +++ b/larry.sh @@ -78,7 +78,7 @@ set -o pipefail # ───────────────────────────────────────────────────────────────────────────── # Config # ───────────────────────────────────────────────────────────────────────────── -LARRY_VERSION="0.8.19" +LARRY_VERSION="0.8.20" LARRY_HOME="${LARRY_HOME:-$HOME/.larry}" # ───────────────────────────────────────────────────────────────────────────── @@ -341,7 +341,7 @@ _tools_registry() { cat <<'REG' #NetConfig (read) nc-parse.sh|Parse a NetConfig: list/inspect protocols & processes, fields, routes, xlate refs, one-hop destinations/sources -nc-paths.sh|Route-chain PATH tracer: enumerate full root-to-leaf chains for a thread or whole site (cross-site by default). Usage: nc-paths.sh [--up|--down|--site-only] | --all [--site NAME] +nc-paths.sh|Route-chain PATH tracer: enumerate full root-to-leaf chains for a thread or whole site. Intra-site hops follow the DATAXLATE DEST list (rendered `-->`); a DEST that names a `destination` block is the LOCAL OUTBOUND SENDER node (shown, never collapsed) that cross-site-links (rendered `==>`) to the remote { SITE }/{ THREAD } it names. Default output is the v1 chain form, one path per line: `site/thread --> site/thread ==> site/thread …` (field 1 = root node, pipe-first). Accepts a `site/thread` node OR `thread site` as input. Parses each NetConfig once into an in-memory graph. 
Usage: nc-paths.sh [--up|--down|--site-only] [--format v1|table|tsv|jsonl|nodes] | --all [--site NAME] nc-find.sh|Cross-site search for threads/protocols by name/host/port/xlate across every site under $HCIROOT nc-inbound.sh|List the inbound (server/listener) threads in a NetConfig nc-status.sh|Engine runtime status (sites/threads/not-up/queued/connections) — wraps the shipped tstat binaries @@ -1701,16 +1701,21 @@ tool_nc_sources() { "$LARRY_LIB_DIR/nc-parse.sh" sources "$nc" "$name" 2>&1 } -# nc_paths — deterministic route-chain path ENUMERATOR (v0.8.19). The single -# walker backend; the model calls this ONCE instead of chaining -# nc_destinations + grep_files + read_file (the old ~$1 brute-force). Resolves -# the next hop ONLY from the DATAXLATE DEST list (never ICLSERVERPORT) so it -# cannot recur the old paths.tcl crash. Cross-site by default; --site-only scopes -# to one site. Either pass an explicit netconfig, or a (thread,site) pair, or -# --all for the whole-site/cross-site entry-chain inventory. +# nc_paths — deterministic route-chain path ENUMERATOR. The single walker +# backend; the model calls this ONCE instead of chaining nc_destinations + +# grep_files + read_file (the old ~$1 brute-force). INTRA-site, the next hop is +# resolved from the DATAXLATE DEST list (never an ICLSERVERPORT walk, so it +# cannot recur the old paths.tcl crash). CROSS-site (v0.8.20), threads link via +# named `destination` blocks: a DEST that names a destination block resolves to +# its { SITE } { THREAD } (the PORT corroborates the link; ICLSERVERPORT is read +# GUARDED). Each NetConfig is parsed EXACTLY ONCE into an in-memory graph +# (nc-parse.sh index) and the walk is pure in-memory lookups — no subprocess / +# re-parse per hop. --site-only disables cross-site linking. Either pass an +# explicit netconfig, or a (thread,site) pair, or --all for the whole-site / +# cross-site entry-chain inventory. 
tool_nc_paths() { local netconfig="$1" thread="$2" site="$3" direction="${4:-full}" - local all_mode="${5:-0}" site_only="${6:-0}" fmt="${7:-table}" hciroot="${8:-${HCIROOT:-}}" + local all_mode="${5:-0}" site_only="${6:-0}" fmt="${7:-v1}" hciroot="${8:-${HCIROOT:-}}" _lib_err_if_missing || return local args=() [ -n "$netconfig" ] && args+=(--netconfig "$netconfig") @@ -4153,7 +4158,7 @@ execute_tool() { "$(J '.direction // "full"')" \ "$(J '.all // 0' | sed "s/false/0/;s/true/1/")" \ "$(J '.site_only // 0' | sed "s/false/0/;s/true/1/")" \ - "$(J '.format // "table"')" "$(J '.hciroot // ""')" ;; + "$(J '.format // "v1"')" "$(J '.hciroot // ""')" ;; nc_tclproc_refs) tool_nc_tclproc_refs "$(J '.netconfig')" "$(J '.name // ""')" ;; hl7_field) tool_hl7_field "$(J '.message')" "$(J '.field_path')" ;; nc_msgs) tool_nc_msgs "$(J '.thread')" "$(J '.after // ""')" "$(J '.before // ""')" \ @@ -4215,7 +4220,7 @@ TOOLS_JSON=$(cat <<'TOOLS_END' {"name":"nc_make_jump","description":"Generate the 3-thread jump set for the cross-environment data replay pattern Bryan uses. Emits FOUR artifacts: (1) linux__out for OLD env (outbound tcpip-client to new linux:jump_port), (2) windows__in for NEW env server_jump site (inbound tcpip-server listening on jump_port, routes internally to #3), (3) windows__out for NEW env server_jump site (outbound tcpip-client to 127.0.0.1:, where orig_port is the existing inbound listening port read from the NetConfig), (4) route-add snippet to splice into the OLD inbound DATAXLATE block. Tag = inbound thread name (auto). The NEW env existing inbound is left COMPLETELY UNCHANGED. Pure generation; caller uses write_file (Y/N) to persist.","input_schema":{"type":"object","properties":{"netconfig":{"type":"string","description":"NetConfig path containing the inbound thread (OLD env)."},"inbound":{"type":"string","description":"Existing inbound protocol name to mirror. 
Must be a TCP-listener (ISSERVER=1); read its PROTOCOL.PORT first to confirm."},"new_host":{"type":"string","description":"Hostname/IP of the NEW linux env that OLD will TCP to."},"jump_port":{"type":"string","description":"TCP port for the OLD to NEW hop. linux__out targets it, windows__in listens on it."},"inbound_host":{"type":"string","description":"Host that windows__out connects to on NEW (the existing inbound on NEW). Default 127.0.0.1 (same box, loopback)."},"process_jump":{"type":"string","description":"Process for NEW-side threads on server_jump. Default server_jump."},"encoding":{"type":"string","description":"ENCODING override. Default = same as the existing inbound."}},"required":["netconfig","inbound","new_host","jump_port"]}}, {"name":"nc_sources","description":"List every protocol that has a DATAXLATE DEST routing to the named thread. The inverse of nc_destinations. ONE HOP ONLY — to trace a full multi-hop chain use nc_paths, not repeated nc_sources calls.","input_schema":{"type":"object","properties":{"netconfig":{"type":"string"},"name":{"type":"string","description":"Target thread name."}},"required":["netconfig","name"]}}, - {"name":"nc_paths","description":"Deterministic ROUTE-CHAIN tracer. Enumerates the full root-to-leaf message path(s) by following the DATAXLATE DEST routing graph (NEVER ICLSERVERPORT). USE THIS — DO NOT brute-force with grep_files / read_file / bash_exec / repeated nc_destinations — for ANY of: 'show me the path', 'trace the chain', 'what feeds X', 'where does X go', 'full route', 'end-to-end flow', 'sources and destinations chain', 'how does a message get from A to B', 'map the interface flow'. ONE call answers the whole question. Output columns SITE THREAD HOPS PATH where HOPS = thread count in the chain and PATH = the chain joined by ' -> ' (one row per enumerated path; a branch yields multiple rows). 
MODES: (a) one thread — set `thread` (and optionally `site`); default returns every full path containing that thread; set direction=down for only downstream, direction=up for only upstream feeders. (b) whole-site / whole-environment inventory — set all=true (optionally scope with `site`); enumerates every chain from every entry point (a thread with no incoming), deduped. CROSS-SITE BY DEFAULT: when a chain's terminal thread is also an entry thread in another site's NetConfig (same thread name), the chain CONTINUES into that site — e.g. mux -> ancout -> CodaMetrix spanning sites. Set site_only=true to stop at the site boundary. Resolves sites under $HCIROOT automatically (or pass hciroot / an explicit netconfig). Cycle-safe across sites; always terminates.","input_schema":{"type":"object","properties":{"thread":{"type":"string","description":"Thread/protocol name to trace. Omit only when all=true."},"site":{"type":"string","description":"Site name (the NetConfig's parent dir). Optional — disambiguates a thread present in multiple sites, or scopes all-mode to one site."},"netconfig":{"type":"string","description":"Optional explicit NetConfig path. If given, the thread's home site is its parent dir; cross-site joins still scan $HCIROOT unless site_only=true."},"direction":{"type":"string","enum":["full","up","down"],"description":"full (default) = every path containing the thread; down = only downstream chains; up = only upstream feeder chains."},"all":{"type":"boolean","description":"true = enumerate every chain from every entry point (whole-site/whole-environment inventory). No thread needed."},"site_only":{"type":"boolean","description":"true = do NOT cross site boundaries (scope to one site). Default false = follow the chain across sites."},"format":{"type":"string","enum":["table","tsv","jsonl"],"description":"Output format. 
Default table (aligned, monospace)."},"hciroot":{"type":"string","description":"Override $HCIROOT for site discovery / cross-site joins."}},"required":[]}}, + {"name":"nc_paths","description":"Deterministic ROUTE-CHAIN tracer. Enumerates the full root-to-leaf message path(s). WITHIN a site the next hop follows the DATAXLATE DEST routing graph (intra-site routing never walks ICLSERVERPORT). USE THIS — DO NOT brute-force with grep_files / read_file / bash_exec / repeated nc_destinations — for ANY of: 'show me the path', 'trace the chain', 'what feeds X', 'where does X go', 'full route', 'end-to-end flow', 'sources and destinations chain', 'how does a message get from A to B', 'map the interface flow'. ONE call answers the whole question. DEFAULT OUTPUT is the v1 chain form, ONE PATH PER LINE: `site/thread --> site/thread ==> site/thread …` where every node is `site/thread`, `-->` is an INTRA-site DATAXLATE route hop, and `==>` is a CROSS-site hop. The FIRST node is the chain ROOT; field 1 (split on whitespace) IS the root node, so the output is pipe-first (`paths X | awk '{print $1}'` → the root). A branch yields multiple lines. For direction=up the root is the feeder ROOT and the queried thread is the chain TERMINUS. MODES: (a) one thread — set `thread` (accepts `thread`+`site` OR a single `site/thread` node, so output feeds back in); default returns every full path containing that thread; direction=down for only downstream, direction=up for only upstream feeders. (b) whole-site / whole-environment inventory — set all=true (optionally scope with `site`); enumerates every chain from every entry point (a thread with no incoming), deduped. 
CROSS-SITE BY DESTINATION BLOCK (Cloverleaf links sites through named `destination` blocks — the ICL routing table — not by thread name and not by blindly matching ports): a thread's DATAXLATE DEST may name a `destination` block; that block NAME is the LOCAL OUTBOUND SENDER node (shown in the chain, NEVER collapsed) and resolves to { SITE }/{ THREAD } { PORT } — the remote inbound it links to. So at every site boundary the chain reads `…local_inbound --> local_outbound_sender ==> remote_inbound --> …`, e.g. mux/ADTfr_epic_964700 --> mux/OB_ADT_ancS ==> ancout/IB_ADT_muxS --> ancout/ADTto_CodaMetrix. Upstream feeders of an inbound are resolved symmetrically. The whole route graph is parsed ONCE per run into memory; cross-site resolution is an in-memory lookup, not a per-site scan. Set site_only=true to stop at the site boundary. Resolves sites under $HCIROOT automatically (or pass hciroot / an explicit netconfig). Cycle-safe across sites; always terminates.","input_schema":{"type":"object","properties":{"thread":{"type":"string","description":"Thread/protocol name to trace, OR a `site/thread` node (the output's root node feeds straight back in). Omit only when all=true."},"site":{"type":"string","description":"Site name (the NetConfig's parent dir). Optional — disambiguates a thread present in multiple sites, or scopes all-mode to one site."},"netconfig":{"type":"string","description":"Optional explicit NetConfig path. If given, the thread's home site is its parent dir; cross-site joins still scan $HCIROOT unless site_only=true."},"direction":{"type":"string","enum":["full","up","down"],"description":"full (default) = every path containing the thread; down = only downstream chains; up = only upstream feeder chains (root = feeder root, queried thread = terminus)."},"all":{"type":"boolean","description":"true = enumerate every chain from every entry point (whole-site/whole-environment inventory). 
No thread needed."},"site_only":{"type":"boolean","description":"true = do NOT cross site boundaries (scope to one site). Default false = follow the chain across sites via destination blocks."},"format":{"type":"string","enum":["v1","table","tsv","jsonl","nodes"],"description":"Output format. Default v1 = the chain form, one path per line (site/thread nodes, --> intra / ==> cross), pipe-first (field 1 = root). table = aligned SITE/THREAD/HOPS/PATH. tsv/jsonl = data. nodes = just the site/thread nodes one per line (no arrows), for re-piping."},"hciroot":{"type":"string","description":"Override $HCIROOT for site discovery / cross-site joins."}},"required":[]}}, {"name":"nc_tclproc_refs","description":"List every TCL proc name referenced from a protocol block (or from the whole NetConfig if name is omitted). Pulls from DATAFORMAT.PROC, PREPROCS.PROCS, POSTPROCS.PROCS, etc. Unique sorted.","input_schema":{"type":"object","properties":{"netconfig":{"type":"string"},"name":{"type":"string","description":"Optional. Scope to one protocol."}},"required":["netconfig"]}}, {"name":"hl7_field","description":"Extract a specific HL7 v2 field from a message. field_path = SEG[.FIELD[.COMPONENT[.SUBCOMPONENT]]]. Examples: PID.3 (MRN), PID.18 (account number), MSH.7 (timestamp), MSH.9.2 (event code, like A08), PID.5 (patient name with components). Multiple repetitions are returned one per line. Native v3, no v1/v2 dependency.","input_schema":{"type":"object","properties":{"message":{"type":"string","description":"Raw HL7 message text. Segments separated by \\r."},"field_path":{"type":"string","description":"Field path like PID.3 or MSH.9.2"}},"required":["message","field_path"]}}, {"name":"nc_msgs","description":"Query Cloverleaf smat (SQLite!) databases for messages from a thread. Filters: time range, exact HL7 field match. Native v3 — reads smatdb directly with sqlite3 -ascii, no hcidbdump/dbExtract needed. 
Format text shows messages line-by-line with metadata; count returns just the count; json returns structured data. Operates on LOCAL smatdbs; for a remote env's smatdb, use ssh_pull_smat first (sampled mode is cheaper than pulling the whole DB).","input_schema":{"type":"object","properties":{"thread":{"type":"string","description":"Thread name. The .smatdb file under $HCISITEDIR/exec/processes/*/.smatdb is auto-located unless db is given."},"after":{"type":"string","description":"Time-after filter. Accepts \"3 days ago\", \"2026-05-20 14:30:00\", \"2026-05-20\", or a unix timestamp."},"before":{"type":"string","description":"Time-before filter, same formats as after."},"field":{"type":"string","description":"HL7 field path for exact-match filter, e.g. PID.18 or MSH.10."},"value":{"type":"string","description":"Value the field must equal. Use with field. Repeatable filters not supported via this single tool call — chain calls if you need multi-field AND."},"limit":{"type":"integer","description":"Max messages to return. Default 10."},"format":{"type":"string","enum":["text","json","count","raw"],"description":"text = human-readable with metadata; count = just the number; json = structured; raw = raw bytes separated by 0x1c."},"sitedir":{"type":"string","description":"Override $HCISITEDIR for thread-to-db location."},"db":{"type":"string","description":"Explicit .smatdb path; overrides auto-locate."}},"required":["thread"]}}, @@ -7062,10 +7067,12 @@ main_loop() { continue ;; /paths|/paths\ *) # v0.8.19: deterministic route-chain tracer (muscle-memory entry). - # /paths [site] [--up|--down] [--site-only] [--all] [--format tsv|table|jsonl] + # /paths [site] [--up|--down] [--site-only] [--all] [--format v1|table|tsv|jsonl|nodes] + # /paths / ... (v1 node form — output feeds back in) # /paths --all [site] [--site-only] + # Default format is v1 (the ground-truth chain form), pipe-first. 
local _pa; _pa=$(_slash_args "/paths" "$input") - local _p_thread="" _p_site="" _p_dir="full" _p_all=0 _p_siteonly=0 _p_fmt="table" _ptok _pexpect="" + local _p_thread="" _p_site="" _p_dir="full" _p_all=0 _p_siteonly=0 _p_fmt="v1" _ptok _pexpect="" for _ptok in $_pa; do if [ "$_pexpect" = "format" ]; then _p_fmt="$_ptok"; _pexpect=""; continue; fi case "$_ptok" in @@ -7084,7 +7091,7 @@ main_loop() { done # default site to the current $HCISITE when a thread is given without one if [ "$_p_all" = "0" ] && [ -z "$_p_thread" ]; then - err "usage: /paths [site] [--up|--down|--site-only|--all|--format tsv|table|jsonl]" + err "usage: /paths [site] | / [--up|--down|--site-only|--all|--format v1|table|tsv|jsonl|nodes]" continue fi if [ "$_p_all" = "0" ] && [ -z "$_p_site" ] && [ -n "${HCISITE:-}" ]; then diff --git a/lib/nc-parse.sh b/lib/nc-parse.sh index 21afd76..fd9b285 100755 --- a/lib/nc-parse.sh +++ b/lib/nc-parse.sh @@ -25,6 +25,9 @@ # xlate-refs [] — list xlate .xlt files referenced # tclproc-refs [] — list TCL proc names referenced # route-block — emit the DATAXLATE block (the routing config) +# index — single-pass route INDEX (P/D/L/O/X +# records) for the in-memory path walker; +# see cmd_index. Parses the file ONCE. # help — this help # # Route-chain PATH enumeration (root-to-leaf chains, all-mode, cross-site) lives @@ -76,6 +79,131 @@ _blocks() { ' "$nc" } +# ───────────────────────────────────────────────────────────────────────────── +# SINGLE-PASS INDEX (v0.8.20 perf rearchitecture). +# +# Emits, in ONE awk pass over the NetConfig, every fact the path WALKER needs so +# it never has to re-invoke a subprocess or re-parse the file per hop. The old +# walker called nc-parse.sh (destinations/sources/protocol-nested/...) once PER +# HOP PER CANDIDATE, and each of those re-ran _blocks + cmd_protocol_block (two +# full awk passes over a 16K-line file). On the real 24-site integrator that was +# O(threads x parse-cost) = minutes. 
This subcommand replaces all of it: parse
+# ONCE, walk in memory.
+#
+# Output is a flat TAB-separated record stream (one record per line). The leading
+# single-char tag identifies the record kind:
+#   P <thread>                        protocol declared in this NetConfig
+#   D <thread> <dest>                 a DATAXLATE DEST edge thread->dest
+#                                     (handles BOTH `{ DEST name }` and the
+#                                     list form `{ DEST {a b c} }`)
+#   L <thread> <port>                 a LISTEN port for <thread>:
+#                                     server PROTOCOL.PORT (ISSERVER=1) and/or
+#                                     the guarded top-level ICLSERVERPORT.
+#   O <thread> <port>                 an OUTBOUND/tcpip-client dest port for
+#                                     <thread> (PROTOCOL.PORT with ISSERVER!=1).
+#   X <name> <site> <thread> <port>   a top-level `destination` block: the
+#                                     AUTHORITATIVE cross-site link. A DEST that
+#                                     names <name> hops to <thread> in <site>
+#                                     (PORT is the connecting port). This is how
+#                                     Cloverleaf actually links sites (ICL): by
+#                                     named destination, resolved to SITE+THREAD,
+#                                     NOT by blindly matching ports.
+#
+# Robust to arbitrary brace nesting (same depth bookkeeping as _blocks). DEST,
+# ISSERVER, PORT and ICLSERVERPORT are recognised by their canonical one-line
+# `{ KEY value }` rendering; absent/`{}` values are simply never emitted (the
+# guard that the old paths.tcl lacked).
+# ───────────────────────────────────────────────────────────────────────────── +cmd_index() { + local nc="$1" + require_file "$nc" + awk ' + BEGIN { depth=0; in_block=0; btype=""; bname="" + in_proto=0; proto_depth=0; pport=""; isserver=""; iclport="" + dsite=""; dthread=""; dport="" } + + # ---- enter a top-level block ------------------------------------------- + !in_block && $0 ~ /^(process|protocol|destination) [A-Za-z0-9_]+ \{$/ { + split($0, a, " ") + btype = a[1]; bname = a[2] + depth = 1; in_block = 1 + in_proto = 0; proto_depth = 0; pport=""; isserver=""; iclport="" + dsite=""; dthread=""; dport="" + if (btype == "protocol") print "P\t" bname + next + } + + in_block { + line = $0 + + # --- field extraction BEFORE we mutate depth (fields are depth-1 inside + # their parent block; the value is the whole `{ KEY v }` on one line) --- + if (btype == "protocol") { + # DEST single: { DEST name } + if (match(line, /\{ DEST [A-Za-z0-9_]+ \}/)) { + v = substr(line, RSTART+7, RLENGTH-9) # strip "{ DEST " .. " }" + print "D\t" bname "\t" v + } + # DEST list: { DEST {a b c} } + else if (match(line, /\{ DEST \{[^}]*\}/)) { + v = substr(line, RSTART+8, RLENGTH-9) # strip "{ DEST {" .. "}" + m = split(v, dd, /[ \t]+/) + for (i=1; i<=m; i++) if (dd[i] != "") print "D\t" bname "\t" dd[i] + } + # top-level ICLSERVERPORT (a listen port, guarded numeric) + if (match(line, /^[[:space:]]+\{ ICLSERVERPORT [0-9]+ \}[[:space:]]*$/)) { + v = line; sub(/^[[:space:]]+\{ ICLSERVERPORT /, "", v); sub(/ \}[[:space:]]*$/, "", v) + iclport = v + } + # enter the nested { PROTOCOL { ... 
} } sub-block + if (!in_proto && line ~ /^[[:space:]]+\{ PROTOCOL \{$/) { + in_proto = 1; proto_depth = depth + 1 + } else if (in_proto) { + if (match(line, /^[[:space:]]+\{ PORT [0-9]+ \}[[:space:]]*$/)) { + v = line; sub(/^[[:space:]]+\{ PORT /, "", v); sub(/ \}[[:space:]]*$/, "", v) + pport = v + } + if (match(line, /^[[:space:]]+\{ ISSERVER [0-9]+ \}[[:space:]]*$/)) { + v = line; sub(/^[[:space:]]+\{ ISSERVER /, "", v); sub(/ \}[[:space:]]*$/, "", v) + isserver = v + } + } + } else if (btype == "destination") { + if (match(line, /^[[:space:]]+\{ SITE [A-Za-z0-9_]+ \}[[:space:]]*$/)) { + v = line; sub(/^[[:space:]]+\{ SITE /, "", v); sub(/ \}[[:space:]]*$/, "", v); dsite = v + } + if (match(line, /^[[:space:]]+\{ THREAD [A-Za-z0-9_]+ \}[[:space:]]*$/)) { + v = line; sub(/^[[:space:]]+\{ THREAD /, "", v); sub(/ \}[[:space:]]*$/, "", v); dthread = v + } + if (match(line, /^[[:space:]]+\{ PORT [0-9]+ \}[[:space:]]*$/)) { + v = line; sub(/^[[:space:]]+\{ PORT /, "", v); sub(/ \}[[:space:]]*$/, "", v); dport = v + } + } + + # --- depth bookkeeping --- + n_open = gsub(/\{/, "{", line) + n_close = gsub(/\}/, "}", line) + depth += n_open - n_close + if (in_proto && depth < proto_depth) in_proto = 0 + + # --- close the top-level block: emit its aggregate records --- + if (depth == 0) { + if (btype == "protocol") { + # listen ports: server PROTOCOL.PORT (ISSERVER=1) and/or ICL port + if (isserver == "1" && pport != "") print "L\t" bname "\t" pport + if (iclport != "") print "L\t" bname "\t" iclport + # outbound/tcpip-client dest port + if (isserver != "1" && pport != "") print "O\t" bname "\t" pport + } else if (btype == "destination") { + if (dsite != "" && dthread != "") + print "X\t" bname "\t" dsite "\t" dthread "\t" dport + } + in_block = 0; btype=""; bname="" + } + } + ' "$nc" +} + cmd_list_protocols() { local nc="$1" require_file "$nc" @@ -332,9 +460,10 @@ cmd_tclproc_refs() { # cmd_chain only emitted a flat set of reachable nodes (depth/direction/thread), # never 
enumerated root-to-leaf PATHS, was never wired into the LLM, and would
 # have left two competing walkers. nc-paths.sh ports the v2 `paths` DFS
-# enumerator (SITE/THREAD/HOPS/PATH output, all-mode, cross-site joins) and reuses
-# the one-hop DEST primitives (cmd_destinations / cmd_sources) below. Do not
-# reintroduce a second walker here — extend nc-paths.sh.
+# enumerator (SITE/THREAD/HOPS/PATH output, all-mode, destination-block
+# cross-site links) and builds its route graph from the single-pass `index`
+# subcommand (cmd_index above). Do not reintroduce a second walker here;
+# extend nc-paths.sh.
 
 cmd_route_block() {
     local nc="$1" name="$2"
@@ -384,6 +513,7 @@ case "$SUB" in
     xlate-refs) [ $# -ge 2 ] || die "usage: $0 xlate-refs [name]"; cmd_xlate_refs "$2" "${3:-}" ;;
     tclproc-refs) [ $# -ge 2 ] || die "usage: $0 tclproc-refs [name]"; cmd_tclproc_refs "$2" "${3:-}" ;;
     route-block) [ $# -ge 3 ] || die "usage: $0 route-block "; cmd_route_block "$2" "$3" ;;
+    index) [ $# -ge 2 ] || die "usage: $0 index "; cmd_index "$2" ;;
     help|-h|--help) cmd_help ;;
     *) die "unknown subcommand: $SUB (try '$0 help')" ;;
 esac
diff --git a/lib/nc-paths.sh b/lib/nc-paths.sh
index 2357fb6..5e0aa8b 100755
--- a/lib/nc-paths.sh
+++ b/lib/nc-paths.sh
@@ -15,21 +15,45 @@
 # - All-mode: enumerate from every entry point (a thread with no incoming),
 #   deduped — gives the whole-site chain inventory (v2 list_full_routes).
 #
-# ROUTING RESOLUTION: next hop is resolved ONLY from the DATAXLATE { DEST }
-# list (via nc-parse.sh destinations / sources). It NEVER reads ICLSERVERPORT.
-# This is deliberate: Bryan's old paths.tcl walked routes via
-# `keylget data ICLSERVERPORT`, which THROWS on any thread lacking that key
-# (every outbound/client thread), so the trace died on the first client thread.
-# The DEST list is present on every routing thread regardless of direction and
-# simply yields nothing (no crash) when a thread has no routes.
DO NOT
-# reintroduce an ICLSERVERPORT-based hop here.
+# INTRA-SITE ROUTING RESOLUTION: within a single site the next hop is resolved
+# ONLY from the DATAXLATE { DEST } list (the D records emitted by
+# `nc-parse.sh index`). It NEVER walks via ICLSERVERPORT inside a site. The DEST
+# list is present on every routing thread regardless of direction and simply
+# yields nothing (no crash) when a thread has no routes. DO NOT reintroduce an
+# ICLSERVERPORT-based hop for INTRA-site routing.
 #
-# CROSS-SITE BY DEFAULT (Bryan's resolved decision, 2026-05-28): when a chain's
-# terminal thread (a downstream leaf with no further DEST in its own site) is
-# ALSO an entry/inbound thread declared in ANOTHER discovered site's NetConfig
-# (correlated by shared thread name), the walk CONTINUES into that site — so the
-# mux -> ancout -> CodaMetrix style chain is followed end to end across the site
-# boundary. Pass --site-only to scope the walk to a single site.
+# CROSS-SITE BY DESTINATION BLOCK (v0.8.20, corrected on the real integrator):
+# Cloverleaf links sites through named `destination` blocks (the inter-cloverleaf
+# (ICL) routing table), NOT by blindly matching ports. A `destination <name> {...}`
+# top-level block declares { SITE <site> } { THREAD <thread> } { PORT <port> }: it
+# names a remote inbound thread in another site and the port the link connects on.
+# A protocol's DATAXLATE DEST list may name EITHER (a) a LOCAL protocol (intra-site
+# hop) OR (b) a destination block, and a DEST naming a destination block is the
+# cross-site hop, resolved AUTHORITATIVELY to (SITE,THREAD). The PORT equals the
+# remote thread's listen/ICL port (verifiable), but the link is name-resolved, so
+# it is exact: e.g. mux thread ADTfr_epic_964700 has { DEST OB_ADT_ancS }; the
+# destination block OB_ADT_ancS is { SITE ancout } { THREAD IB_ADT_muxS }
+# { PORT 62043 }, so the chain continues into ancout's IB_ADT_muxS.
+# +# WHY NOT PURE PORT-MATCHING (the rejected v0.8.20-draft mechanism): an earlier +# draft inferred the link by matching an outbound's PROTOCOL.PORT to an inbound's +# server/ICL port. That was (1) slow and (2) WRONG — it missed real feeders whose +# cross-site link is expressed only via a destination block (the mux feeder of +# IB_ADT_muxS above is reached through DEST OB_ADT_ancS, not through any thread +# whose PROTOCOL.PORT == 62043). ICLSERVERPORT is still read GUARDED in the index +# (absent / `{}` on most threads → skipped, never an error — the un-guarded keylget +# is exactly what crashed the old paths.tcl), but it is used only to corroborate a +# destination block's PORT, never as the primary link key. +# +# The whole route graph (protocol DEST edges + destination-block resolution + +# reverse-source maps) is built ONCE per run from a single awk pass per NetConfig +# (`nc-parse.sh index`) into in-memory associative arrays. Cross-site DOWNSTREAM: a +# DEST naming a destination block continues into its (site,thread). Cross-site +# UPSTREAM feeders of (site,thread): every destination block (any site) resolving +# to it, and the threads in that block's site that DEST to the block name — all +# in-memory lookups, no per-site chain enumeration (fixes Vera's m3 AND the old +# O(threads x parse-cost) per-hop subprocess blowup). Pass --site-only to scope the +# walk to a single site. # # Robust cycle detection across sites: every walk carries the full ancestor set # keyed by "site\037thread"; revisiting any (site,thread) ancestor terminates the @@ -37,20 +61,45 @@ # terminates. A global max-depth cap (default 128, matching v2) is a second # backstop. # -# Output columns: SITE THREAD HOPS PATH -# THREAD = the start/anchor thread of the row -# HOPS = number of threads in the chain (len of the path list) -# PATH = the chain joined by " -> " (space-arrow-space) -# One row per enumerated root-to-leaf path; a branching thread yields N rows. 
+# DEFAULT OUTPUT = v1 CHAINS (one path per line, site/thread nodes, typed arrows):
+#   mux/ADTfr_epic_964700 --> mux/OB_ADT_ancS ==> ancout/IB_ADT_muxS --> ancout/ADTto_CodaMetrix
+#   - every NODE is rendered "site/thread" (slash join)
+#   - "-->" = an INTRA-site DATAXLATE route hop (a thread's DEST that names a
+#             LOCAL protocol, including the local OUTBOUND SENDER node, which is
+#             the destination-block name living in this site)
+#   - "==>" = a CROSS-site hop (the destination block's link: FROM the local
+#             outbound sender node TO the remote inbound thread it names)
+#   - one path per line; a branching thread yields N lines.
+# This matches Bryan's v1 ground-truth paths.tcl: at every cross-site boundary the
+# chain reads …local_inbound --> local_outbound_sender ==> remote_inbound --> …;
+# the sender (= the destination-block name) is ALWAYS shown, never collapsed.
+#
+# The v1 line is PIPE-FIRST / field-extractable: `paths X | awkcut 1` yields the
+# root node (field 1 = chain root, e.g. mux/ADTfr_epic_964700). The output is also
+# valid INPUT: a "site/thread" node can be fed back in (paths X → extract root →
+# paths <site/thread>). `--format nodes` emits just the site/thread nodes (no
+# arrows) one per line so piping never fights the arrow tokens.
+#
+# OTHER FORMATS (--format):
+#   table — the SITE/THREAD/HOPS/PATH aligned table (Bryan: kept, opt-in).
+#           THREAD = the start/anchor (ROOT) node of the row (first node in PATH);
+#           HOPS = number of nodes in the chain; PATH = the typed v1 chain.
+#   tsv   — site<TAB>thread<TAB>hops<TAB>path (path = the typed v1 chain)
+#   jsonl — one JSON object per path {site,thread,hops,path}
+#   nodes — node-only: each path's "site/thread" nodes, one per line, blank line
+#           between paths (no arrows, so the output is clean for re-piping into `paths`).
+# NOTE (Vera m2): for UPSTREAM (--up) chains the root is the feeder ROOT (the
+# most-upstream source) and the queried thread is the chain TERMINUS.
# # Usage: # nc-paths.sh --netconfig [flags] # explicit NetConfig # nc-paths.sh [flags] # resolve site under $HCIROOT +# nc-paths.sh / [flags] # site/thread (v1 node form) # nc-paths.sh --all [--site ] [flags] # whole-site entry chains # # Flags: -# --upstream only the upstream chains feeding the thread -# --downstream only the downstream chains from the thread +# --upstream | --up only the upstream chains feeding the thread +# --downstream | --down only the downstream chains from the thread # (neither flag = full paths containing the thread, # v2 default, falling back to downstream-from-thread) # --all enumerate from every entry point (no thread arg) @@ -60,7 +109,7 @@ # --netconfig operate on one explicit NetConfig (implies the site is # basename(dirname(file)); cross-site still scans $HCIROOT) # --max-depth N recursion cap (default 128) -# --format tsv|table|jsonl default: table +# --format v1|table|tsv|jsonl|nodes default: v1 (the ground-truth chain form) # # Exit codes: 0 OK, 1 usage error, 2 not found. 
set -u @@ -83,36 +132,53 @@ DIR_MODE="full" # full | up | down ALL_MODE=0 SITE_ONLY=0 MAX_DEPTH=128 -FORMAT="table" +FORMAT="v1" POSITIONAL=() while [ $# -gt 0 ]; do case "$1" in - --upstream) DIR_MODE="up" ;; - --downstream) DIR_MODE="down" ;; + --upstream|--up) DIR_MODE="up" ;; + --downstream|--down) DIR_MODE="down" ;; --all) ALL_MODE=1 ;; --site) shift; SITE_ARG="${1:-}" ;; --site-only) SITE_ONLY=1 ;; --hciroot) shift; HCIROOT_OVERRIDE="${1:-}" ;; --netconfig) shift; NETCONFIG="${1:-}" ;; --max-depth) shift; MAX_DEPTH="${1:-128}" ;; - --format) shift; FORMAT="${1:-table}" ;; - -h|--help) sed -n '2,70p' "$NC_SELF" | sed 's/^# \{0,1\}//'; exit 0 ;; + --format) shift; FORMAT="${1:-v1}" ;; + -h|--help) sed -n '2,113p' "$NC_SELF" | sed 's/^# \{0,1\}//'; exit 0 ;; --*) die "unknown flag: $1" ;; *) POSITIONAL+=("$1") ;; esac shift done -case "$FORMAT" in tsv|table|jsonl) ;; *) die "bad --format: $FORMAT (tsv|table|jsonl)" ;; esac +case "$FORMAT" in v1|tsv|table|jsonl|nodes) ;; *) die "bad --format: $FORMAT (v1|table|tsv|jsonl|nodes)" ;; esac # Positional shapes: # (manual: thread only; site from $HCISITE/$HCISITEDIR) # (manual muscle-memory: thread + site) +# / (v1 node form — the output IS valid input; pipe-first) +# PIPE-FIRST: a single positional containing a "/" is parsed as site/thread, so +# the v1 output (root node = "site/thread") can be fed straight back into paths. if [ "${#POSITIONAL[@]}" -ge 1 ]; then THREAD="${POSITIONAL[0]}"; fi if [ "${#POSITIONAL[@]}" -ge 2 ] && [ -z "$SITE_ARG" ]; then SITE_ARG="${POSITIONAL[1]}"; fi if [ "${#POSITIONAL[@]}" -gt 2 ]; then die "too many positional args: ${POSITIONAL[*]}"; fi +# Accept the v1 "site/thread" node form as a single positional. A bare thread with +# no embedded slash (the legacy form) is left untouched. Only split on the FIRST +# slash so thread names are preserved verbatim. 
An explicit --site/2nd positional +# wins over a slash-embedded site only if they agree; otherwise the slash form is +# authoritative for the site (it came from our own output). +if [ -n "$THREAD" ] && [ -z "$NETCONFIG" ]; then + case "$THREAD" in + */*) _slash_site="${THREAD%%/*}"; _slash_thr="${THREAD#*/}" + if [ -n "$_slash_site" ] && [ -n "$_slash_thr" ]; then + THREAD="$_slash_thr"; SITE_ARG="$_slash_site" + fi ;; + esac +fi + if [ "$ALL_MODE" = "0" ] && [ -z "$THREAD" ]; then die "no thread given (and --all not set). Try: nc-paths.sh OR nc-paths.sh --all --site " fi @@ -166,152 +232,348 @@ _discover_sites() { fi } -# Resolve the NetConfig path for a given site name (first match wins). -_nc_for_site() { - local want="$1" i - for ((i=0; i<${#SITE_NAMES[@]}; i++)); do - if [ "${SITE_NAMES[$i]}" = "$want" ]; then - printf '%s' "${SITE_NCS[$i]}" - return 0 - fi - done - return 1 -} - -# Given a thread name, find the FIRST discovered (site,nc) pair whose NetConfig -# declares that thread as a protocol. Emits "site\037nc" or returns 1. US=$'\037' # unit separator — safe field delimiter for site/thread keys -_locate_thread() { - local want="$1" i sname nc + +# ───────────────────────────────────────────────────────────────────────────── +# IN-MEMORY ROUTE GRAPH (v0.8.20 perf rearchitecture). +# +# The old walker invoked nc-parse.sh ONCE PER HOP PER CANDIDATE (destinations / +# sources / protocol-nested / protocol-field / list-protocols), and EACH of those +# re-ran _blocks + cmd_protocol_block — two full awk passes over the (16K-line) +# NetConfig. On the real 24-site integrator that is O(threads x parse-cost) = +# minutes (84s --site-only, 164s full for a single thread). Even intra-site was a +# bottleneck because `sources` scans every protocol body. +# +# Now we PARSE EACH NEEDED NetConfig EXACTLY ONCE (`nc-parse.sh index`, a single +# awk pass — see cmd_index) and load the result into bash associative arrays. 
The
+# walkers then do in-memory array lookups only: NO subprocess and NO re-parse per
+# hop (the cross-site upstream feeder search scans the loaded DEST edges, still
+# without forking). Indexing all 24 live NetConfigs is <1s; a single-thread trace
+# is now a few seconds and a full-tree run is well under a minute.
+#
+# CROSS-SITE LINK (corrected): Cloverleaf links sites through named `destination`
+# blocks (the ICL routing table), NOT by blindly matching ports. A protocol's
+# DATAXLATE DEST may name either (a) a LOCAL protocol (intra-site hop) or (b) a
+# `destination` block, which resolves to { SITE <site> } { THREAD <thread> }
+# { PORT <port> }, the authoritative remote target. The PORT is the connecting
+# port (it equals the remote thread's listen/ICL port, which is verifiable), but
+# the SITE and THREAD come straight from the destination block, so the hop is
+# exact and name-resolved. (The old port-only heuristic was BOTH slow AND missed
+# real feeders whose link is expressed via a destination block, e.g. the mux
+# feeder of ancout's IB_ADT_muxS via destination OB_ADT_ancS.)
+#
+# Associative arrays (bash 4+; matches the rest of this repo, and Git-Bash /
+# Cygwin on Windows ship bash 4+/5+). Keys use US ("site\037thread") so names with
+# unusual characters never collide with the field delimiter.
+#   G_PROTO[site\037thread]         = 1                            membership: thread exists in site
+#   G_DESTS[site\037thread]         = "d1\nd2..."                  raw DATAXLATE DEST targets (newline)
+#   G_LISTEN[site\037thread]        = "p1 p2"                      listen ports (server + ICL), space-sep
+#   G_OUT[site\037thread]           = "port"                       outbound/tcpip-client dest port
+#   G_DESTBLK[site\037destname]     = "tsite\037tthread\037tport"  destination-block resolution
+#   G_INSRC[site\037thread]         = "s1\ns2..."                  reverse intra-site DEST edges (sources)
+#   G_DESTBLK_REV[tsite\037tthread] = "fsite\037fname\n..."        destination blocks (any site)
+#                                     pointing AT (tsite,tthread); fname is the dest
+#                                     block name, used to find its upstream feeders.
+#   G_LOADED tracks which NetConfigs have already been indexed (idempotent).
+# ───────────────────────────────────────────────────────────────────────────── +declare -A G_PROTO G_DESTS G_LISTEN G_OUT G_DESTBLK G_INSRC G_DESTBLK_REV G_LOADED + +# Load ONE NetConfig's index into the in-memory graph (idempotent per nc path). +_load_nc() { + local site="$1" nc="$2" + [ -n "${G_LOADED[$nc]:-}" ] && return 0 + G_LOADED[$nc]=1 + local tag a b c d e key + while IFS=$'\t' read -r tag a b c d e; do + case "$tag" in + P) key="${site}${US}${a}"; G_PROTO[$key]=1 ;; + D) key="${site}${US}${a}" + if [ -z "${G_DESTS[$key]:-}" ]; then G_DESTS[$key]="$b"; else G_DESTS[$key]="${G_DESTS[$key]}"$'\n'"$b"; fi ;; + L) key="${site}${US}${a}" + if [ -z "${G_LISTEN[$key]:-}" ]; then G_LISTEN[$key]="$b"; else G_LISTEN[$key]="${G_LISTEN[$key]} $b"; fi ;; + O) key="${site}${US}${a}"; G_OUT[$key]="$b" ;; + X) # X + key="${site}${US}${a}"; G_DESTBLK[$key]="${b}${US}${c}${US}${d}" + local rkey="${b}${US}${c}" + local rval="${site}${US}${a}" + if [ -z "${G_DESTBLK_REV[$rkey]:-}" ]; then G_DESTBLK_REV[$rkey]="$rval"; else G_DESTBLK_REV[$rkey]="${G_DESTBLK_REV[$rkey]}"$'\n'"$rval"; fi ;; + esac + done < <("$NCP" index "$nc" 2>/dev/null) +} + +# Build the reverse intra-site DEST edges (sources) for every loaded site. Called +# once after all needed NetConfigs are loaded. For each thread A with DEST B in +# the SAME site, record A as a source of B (only when B is a local protocol — +# DEST targets that are destination blocks are handled as cross-site, not here). 
+_build_in_sources() { + local key src site dst dkey + for key in "${!G_DESTS[@]}"; do + site="${key%%$US*}"; src="${key#*$US}" + while IFS= read -r dst; do + [ -z "$dst" ] && continue + dkey="${site}${US}${dst}" + [ -n "${G_PROTO[$dkey]:-}" ] || continue # only local protocols are intra-site sources + if [ -z "${G_INSRC[$dkey]:-}" ]; then G_INSRC[$dkey]="$src"; else G_INSRC[$dkey]="${G_INSRC[$dkey]}"$'\n'"$src"; fi + done <<< "${G_DESTS[$key]}" + done +} + +# Ensure the WHOLE tree is loaded (all discovered sites) — needed for cross-site +# resolution and reverse-source maps. Idempotent. +GRAPH_BUILT=0 +_build_graph() { + [ "$GRAPH_BUILT" = "1" ] && return 0 + GRAPH_BUILT=1 + local i for ((i=0; i<${#SITE_NCS[@]}; i++)); do - sname="${SITE_NAMES[$i]}"; nc="${SITE_NCS[$i]}" - if "$NCP" list-protocols "$nc" 2>/dev/null | grep -qxF "$want"; then - printf '%s%s%s' "$sname" "$US" "$nc" - return 0 - fi + _load_nc "${SITE_NAMES[$i]}" "${SITE_NCS[$i]}" + done + _build_in_sources +} + +# Given a thread name, find the FIRST discovered site that declares it (in-memory). +# Emits "site" or returns 1. +_locate_thread() { + local want="$1" i sname + for ((i=0; i<${#SITE_NAMES[@]}; i++)); do + sname="${SITE_NAMES[$i]}" + [ -n "${G_PROTO[${sname}${US}${want}]:-}" ] && { printf '%s' "$sname"; return 0; } done return 1 } # ───────────────────────────────────────────────────────────────────────────── -# One-hop primitives (DEST-based, never ICLSERVERPORT). +# One-hop primitives — now pure in-memory lookups (no subprocess, no re-parse). +# INTRA-site routing follows the DATAXLATE DEST list only (never ICLSERVERPORT). +# A DEST that names a destination block is NOT an intra-site dest (it is the +# cross-site link, handled in the walkers). 
# ───────────────────────────────────────────────────────────────────────────── -_outgoing() { "$NCP" destinations "$1" "$2" 2>/dev/null; } # nc thread -> dest names -_incoming() { "$NCP" sources "$1" "$2" 2>/dev/null; } # nc thread -> source names +# Intra-site downstream: DEST targets that are LOCAL protocols in this site. +_outgoing() { # site thread + local site="$1" thr="$2" key="${1}${US}${2}" d dkey + [ -n "${G_DESTS[$key]:-}" ] || return 0 + while IFS= read -r d; do + [ -z "$d" ] && continue + dkey="${site}${US}${d}" + [ -n "${G_PROTO[$dkey]:-}" ] && printf '%s\n' "$d" + done <<< "${G_DESTS[$key]}" +} +# Intra-site upstream: local protocols that DEST to this thread. +_incoming() { local key="${1}${US}${2}"; [ -n "${G_INSRC[$key]:-}" ] && printf '%s\n' "${G_INSRC[$key]}"; } -# Is an entry point (no incoming) in ? -_is_entry_in() { - local nc="$1" t="$2" - [ -z "$(_incoming "$nc" "$t")" ] +# Is an entry point (no incoming) in ? +_is_entry_in() { [ -z "${G_INSRC[${1}${US}${2}]:-}" ]; } + +# Cross-site DOWNSTREAM targets: a DEST of (cur_site,cur_thread) that is NOT a +# local protocol but IS a destination block. The destination-block NAME (d) is the +# LOCAL OUTBOUND SENDER node, living in cur_site — v1 shows it and we must NOT +# collapse it. The block resolves to the remote inbound (tsite,tthread). Emit each +# as "sender\037tsite\037tthread" (sender = the dest-block name in cur_site). The +# walker then renders: cur_thread --(intra)--> cur_site/sender ==(cross)==> tsite/tthread. +# Authoritative name-resolved link (PORT is just confirmation). 
+_xsite_down_targets() { + local cur_site="$1" cur_thread="$2" key="${1}${US}${2}" d dbkey resolved + [ -n "${G_DESTS[$key]:-}" ] || return 0 + while IFS= read -r d; do + [ -z "$d" ] && continue + [ -n "${G_PROTO[${cur_site}${US}${d}]:-}" ] && continue # local protocol → intra-site, not here + dbkey="${cur_site}${US}${d}" + resolved="${G_DESTBLK[$dbkey]:-}" + [ -z "$resolved" ] && continue # not a known destination block → skip + local tsite="${resolved%%$US*}" rest="${resolved#*$US}" + local tthr="${rest%%$US*}" + printf '%s%s%s%s%s\n' "$d" "$US" "$tsite" "$US" "$tthr" + done <<< "${G_DESTS[$key]}" +} + +# Cross-site UPSTREAM feeders: who feeds (cur_site,cur_thread) from another site? +# Any destination block (in any site) that resolves to (cur_site,cur_thread); its +# upstream feeders are the threads in the destination block's OWN site that DEST to +# that destination-block NAME (dbname). The block NAME is the LOCAL OUTBOUND SENDER +# node, living in fsite between the feeder and this remote inbound — v1 shows it, +# so we carry it. Emit each as "fsite\037fthread\037dbname". The walker then renders +# the upstream prefix: fsite/feeder --(intra)--> fsite/dbname ==(cross)==> cur_site/cur_thread. +# Pure in-memory lookup — no per-site chain enumeration. 
+_xsite_up_feeders() { + local cur_site="$1" cur_thread="$2" rkey="${1}${US}${2}" dbref + [ -n "${G_DESTBLK_REV[$rkey]:-}" ] || return 0 + while IFS= read -r dbref; do + [ -z "$dbref" ] && continue + # dbref = fsite\037destblockname + local fsite="${dbref%%$US*}" dbname="${dbref#*$US}" feeder + # feeders = local protocols in fsite whose DEST names dbname + local fkey + for fkey in "${!G_DESTS[@]}"; do + [ "${fkey%%$US*}" = "$fsite" ] || continue + case $'\n'"${G_DESTS[$fkey]}"$'\n' in + (*$'\n'"$dbname"$'\n'*) + feeder="${fkey#*$US}" + printf '%s%s%s%s%s\n' "$fsite" "$US" "$feeder" "$US" "$dbname" ;; + esac + done + done <<< "${G_DESTBLK_REV[$rkey]}" } # ───────────────────────────────────────────────────────────────────────────── # Path enumeration. Emitted paths are written to $OUT_PATHS as one line each: -# sitechain where chain = thread1 -> thread2 -> ... -# We carry the running chain as a space-joined token list of "site\037thread" -# keys, and the ancestor set as newline-joined keys (for cycle detection). +# sitechain where chain = the rendered v1 typed chain (site/thread nodes +# joined by --> / ==>). +# +# CHAIN ENCODING (EDGE-TYPED). We carry the running chain as a space-joined list +# of TOKENS. The FIRST token is a bare node key "site\037thread". Every SUBSEQUENT +# token is "EDGE\035site\037thread" where the leading 1-char EDGE code records how +# this node connects to the PREVIOUS node: +# i = INTRA-site DATAXLATE hop → rendered "-->" +# x = CROSS-site destination-link → rendered "==>" +# \035 (GS) separates the edge code from the node key; \037 (US) separates site +# from thread. Node names are [A-Za-z0-9_]+ so neither separator can collide, and +# tokens stay space-tokenizable (the full-mode awk join still splits on spaces). +# The ancestor set (cycle detection) remains a newline-joined list of plain node +# keys "site\037thread" — edge codes are never part of it. 
# ───────────────────────────────────────────────────────────────────────────── +GS=$'\035' # group separator — delimits the edge code from the node key OUT_PATHS=$(mktemp) trap 'rm -f "$OUT_PATHS"' EXIT +# Append a node to a keychain with an explicit edge type. +# _chain_push CHAIN EDGE NODEKEY (EDGE = i|x; first push ignores EDGE) +# Emits the new chain on stdout. +_chain_push() { + local chain="$1" edge="$2" node="$3" + if [ -z "$chain" ]; then printf '%s' "$node"; else printf '%s %s%s%s' "$chain" "$edge" "$GS" "$node"; fi +} +# Prepend a node (upstream walk builds a prefix). The edge code lives on the node +# that follows it; when we prepend a NEW root we must move the edge code onto the +# OLD first node and leave the new root bare. +# _chain_unshift CHAIN EDGE NODEKEY +_chain_unshift() { + local chain="$1" edge="$2" node="$3" + if [ -z "$chain" ]; then printf '%s' "$node"; return 0; fi + # The current chain's first token is a bare node key (no edge code). Re-tag it + # with EDGE (its connection to the new root we are prepending), then prefix the + # bare new root. + local first="${chain%% *}" rest="" + case "$chain" in *' '*) rest=" ${chain#* }" ;; esac + printf '%s %s%s%s%s' "$node" "$edge" "$GS" "$first" "$rest" +} + # _emit_chain ANCHOR_SITE KEYCHAIN -# KEYCHAIN = space-separated list of "site\037thread" keys -# Renders to "anchor_sitet1 -> t2 -> ..." (thread names only in PATH). +# KEYCHAIN = the edge-typed token list described above. +# Renders to "anchor_sitesite/thread --> site/thread ==> ..." (v1 form). 
_emit_chain() { local anchor_site="$1" keychain="$2" - local out="" k thr first=1 - for k in $keychain; do - thr="${k#*$US}" - if [ "$first" = "1" ]; then out="$thr"; first=0; else out="$out -> $thr"; fi + local out="" tok edge node site thr first=1 + for tok in $keychain; do + if [ "$first" = "1" ]; then + node="$tok"; edge="" + else + edge="${tok%%$GS*}"; node="${tok#*$GS}" + fi + site="${node%%$US*}"; thr="${node#*$US}" + if [ "$first" = "1" ]; then + out="${site}/${thr}"; first=0 + else + case "$edge" in + x) out="$out ==> ${site}/${thr}" ;; + *) out="$out --> ${site}/${thr}" ;; + esac + fi done printf '%s\t%s\n' "$anchor_site" "$out" } +# Cycle test against the newline-joined ancestor set — pure bash, no grep +# subprocess (this used to fork `grep -qxF` per hop). seen lines are US-keyed. +_seen_has() { + case $'\n'"$1"$'\n' in (*$'\n'"$2"$'\n'*) return 0 ;; esac + return 1 +} + # Downstream DFS. Mirrors v2 _enumerate_downstream_paths + cross-site hop. +# All lookups are in-memory (the graph is keyed by SITE; no NetConfig path / no +# subprocess per hop). 
# $1 anchor_site — site to report in the SITE column for these rows # $2 cur_site — site of current thread -# $3 cur_nc — NetConfig of current thread -# $4 cur_thread — current thread name -# $5 keychain — space-joined ancestor keys NOT including current -# $6 seen — newline-joined ancestor keys (for cycle detection) -# $7 depth +# $3 cur_thread — current thread name +# $4 keychain — edge-typed ancestor chain NOT including current +# $5 seen — newline-joined ancestor node keys (for cycle detection) +# $6 depth +# $7 edge_in — edge connecting the previous node to cur (i|x; "" for root) _walk_down() { - local anchor_site="$1" cur_site="$2" cur_nc="$3" cur_thread="$4" - local keychain="$5" seen="$6" depth="$7" + local anchor_site="$1" cur_site="$2" cur_thread="$3" + local keychain="$4" seen="$5" depth="$6" edge_in="${7:-}" local curkey="${cur_site}${US}${cur_thread}" local newchain - if [ -z "$keychain" ]; then newchain="$curkey"; else newchain="$keychain $curkey"; fi + newchain="$(_chain_push "$keychain" "${edge_in:-i}" "$curkey")" # cycle / depth cap → terminate, include current node (v2 semantics) - if [ "$depth" -gt "$MAX_DEPTH" ] || printf '%s\n' "$seen" | grep -qxF "$curkey"; then + if [ "$depth" -gt "$MAX_DEPTH" ] || _seen_has "$seen" "$curkey"; then _emit_chain "$anchor_site" "$newchain" return 0 fi - # gather outgoing within the current site + # gather outgoing within the current site (DEST targets that are local protocols) local outgoing=() local d while IFS= read -r d; do [ -z "$d" ] && continue outgoing+=("$d") - done < <(_outgoing "$cur_nc" "$cur_thread") + done < <(_outgoing "$cur_site" "$cur_thread") + + local nseen="$seen"$'\n'"$curkey" + local branched=0 if [ "${#outgoing[@]}" -gt 0 ]; then - local nseen - nseen="$seen"$'\n'"$curkey" + branched=1 for d in "${outgoing[@]}"; do - _walk_down "$anchor_site" "$cur_site" "$cur_nc" "$d" "$newchain" "$nseen" $((depth+1)) + # intra-site route hop (-->) + _walk_down "$anchor_site" "$cur_site" "$d" "$newchain" 
"$nseen" $((depth+1)) i done - return 0 fi - # No outgoing in this site = a leaf for this site. CROSS-SITE HOP: - # if cross-site is enabled and this leaf thread is an entry/inbound thread in - # ANOTHER site's NetConfig (shared name) that DOES have outgoing there, - # continue the walk into that site. + # CROSS-SITE HOP via destination block (v0.8.20, corrected; v0.8.20 output fix: + # SHOW THE SENDER NODE). A DEST of this thread that names a destination block is + # the LOCAL OUTBOUND SENDER node (the block name, in cur_site) followed by the + # remote inbound (tsite,tthread). v1 renders BOTH: + # cur_thread --(intra -->)--> cur_site/sender ==(cross ==>)==> tsite/tthread + # so we (1) push the sender node with an INTRA edge, then (2) recurse into the + # remote inbound with a CROSS edge. NEVER collapse the sender. This is in ADDITION + # to any intra-site branches above (a thread can route both locally and cross-site). if [ "$SITE_ONLY" = "0" ]; then - local i osite onc okey - for ((i=0; i<${#SITE_NCS[@]}; i++)); do - osite="${SITE_NAMES[$i]}"; onc="${SITE_NCS[$i]}" - [ "$osite" = "$cur_site" ] && [ "$onc" = "$cur_nc" ] && continue - # the thread must exist in the other site AND have outgoing there - "$NCP" list-protocols "$onc" 2>/dev/null | grep -qxF "$cur_thread" || continue - [ -n "$(_outgoing "$onc" "$cur_thread")" ] || continue - okey="${osite}${US}${cur_thread}" - # cycle guard across sites: don't re-enter an ancestor (site,thread) - printf '%s\n' "$seen" | grep -qxF "$okey" && continue - # Continue the chain in the other site. We DROP the duplicate boundary - # node: cur_thread is already the last node in newchain, and it is the - # same thread name in osite, so we recurse on its destinations directly, - # carrying newchain as the prefix and marking both (site,thread) keys seen. 
- local nseen2 - nseen2="$seen"$'\n'"$curkey"$'\n'"$okey" - local dd - while IFS= read -r dd; do - [ -z "$dd" ] && continue - _walk_down "$anchor_site" "$osite" "$onc" "$dd" "$newchain" "$nseen2" $((depth+1)) - done < <(_outgoing "$onc" "$cur_thread") - # only join into the first matching downstream site, then stop scanning - return 0 - done + local tgt sender osite othr okey sendkey sendchain + while IFS= read -r tgt; do + [ -z "$tgt" ] && continue + # tgt = sender\037tsite\037tthread + sender="${tgt%%$US*}"; local rest="${tgt#*$US}" + osite="${rest%%$US*}"; othr="${rest#*$US}" + okey="${osite}${US}${othr}" + _seen_has "$seen" "$okey" && continue # cycle guard across sites + branched=1 + # (1) the local outbound sender node, intra-site edge from cur_thread + sendkey="${cur_site}${US}${sender}" + sendchain="$(_chain_push "$newchain" i "$sendkey")" + # (2) cross-site edge from the sender into the remote inbound; continue there + _walk_down "$anchor_site" "$osite" "$othr" "$sendchain" "$nseen" $((depth+1)) x + done < <(_xsite_down_targets "$cur_site" "$cur_thread") fi - # true terminal — emit the chain - _emit_chain "$anchor_site" "$newchain" + # true terminal (no intra- or cross-site continuation) — emit the chain + [ "$branched" = "0" ] && _emit_chain "$anchor_site" "$newchain" + return 0 } -# Upstream DFS. Mirrors v2 _enumerate_upstream_paths. Cross-site upstream: -# if a thread has no incoming in its own site but the same-named thread is a -# downstream/leaf in another site, follow that site's incoming (the feeders). -# builds the chain as a PREFIX (sources come before current) +# Upstream DFS. Mirrors v2 _enumerate_upstream_paths. Builds the chain as a PREFIX +# (sources come before current). Cross-site feeders are resolved via destination +# blocks (see _xsite_up_feeders) — in-memory, no per-site enumeration. +# $7 edge_in — edge connecting cur to the node that FOLLOWS it (already in +# keychain). i|x; "" for the terminus (nothing follows yet). 
_walk_up() { - local anchor_site="$1" cur_site="$2" cur_nc="$3" cur_thread="$4" - local keychain="$5" seen="$6" depth="$7" + local anchor_site="$1" cur_site="$2" cur_thread="$3" + local keychain="$4" seen="$5" depth="$6" edge_in="${7:-}" local curkey="${cur_site}${US}${cur_thread}" local newchain - if [ -z "$keychain" ]; then newchain="$curkey"; else newchain="$curkey $keychain"; fi + newchain="$(_chain_unshift "$keychain" "${edge_in:-i}" "$curkey")" - if [ "$depth" -gt "$MAX_DEPTH" ] || printf '%s\n' "$seen" | grep -qxF "$curkey"; then + if [ "$depth" -gt "$MAX_DEPTH" ] || _seen_has "$seen" "$curkey"; then _emit_chain "$anchor_site" "$newchain" return 0 fi @@ -321,143 +583,165 @@ _walk_up() { while IFS= read -r s; do [ -z "$s" ] && continue incoming+=("$s") - done < <(_incoming "$cur_nc" "$cur_thread") + done < <(_incoming "$cur_site" "$cur_thread") + + local nseen="$seen"$'\n'"$curkey" + local branched=0 if [ "${#incoming[@]}" -gt 0 ]; then - local nseen - nseen="$seen"$'\n'"$curkey" + branched=1 for s in "${incoming[@]}"; do - _walk_up "$anchor_site" "$cur_site" "$cur_nc" "$s" "$newchain" "$nseen" $((depth+1)) + # intra-site source feeds cur via a route hop (-->) + _walk_up "$anchor_site" "$cur_site" "$s" "$newchain" "$nseen" $((depth+1)) i done - return 0 fi - # cross-site upstream hop: same-named thread fed in another site + # CROSS-SITE UPSTREAM FEEDERS via destination block (v0.8.20, corrected; output + # fix: SHOW THE SENDER NODE). Any destination block (any site) resolving to THIS + # (site,thread); the block NAME is the LOCAL OUTBOUND SENDER node in the feeder's + # site, and the feeders are the threads in that site that DEST to the block name. + # v1 renders the upstream prefix as: + # fsite/feeder --(intra -->)--> fsite/sender ==(cross ==>)==> cur_site/cur_thread + # so we (1) prepend the sender node with a CROSS edge (sender ==> cur), then + # (2) recurse up into the feeder with an INTRA edge (feeder --> sender). In-memory. 
if [ "$SITE_ONLY" = "0" ]; then - local i osite onc okey - for ((i=0; i<${#SITE_NCS[@]}; i++)); do - osite="${SITE_NAMES[$i]}"; onc="${SITE_NCS[$i]}" - [ "$osite" = "$cur_site" ] && [ "$onc" = "$cur_nc" ] && continue - "$NCP" list-protocols "$onc" 2>/dev/null | grep -qxF "$cur_thread" || continue - [ -n "$(_incoming "$onc" "$cur_thread")" ] || continue - okey="${osite}${US}${cur_thread}" - printf '%s\n' "$seen" | grep -qxF "$okey" && continue - local nseen2 - nseen2="$seen"$'\n'"$curkey"$'\n'"$okey" - local ss - while IFS= read -r ss; do - [ -z "$ss" ] && continue - _walk_up "$anchor_site" "$osite" "$onc" "$ss" "$newchain" "$nseen2" $((depth+1)) - done < <(_incoming "$onc" "$cur_thread") - return 0 - done + local fdr fsite othr okey sender sendkey sendchain + while IFS= read -r fdr; do + [ -z "$fdr" ] && continue + # fdr = fsite\037fthread\037dbname + fsite="${fdr%%$US*}"; local rest="${fdr#*$US}" + othr="${rest%%$US*}"; sender="${rest#*$US}" + okey="${fsite}${US}${othr}" + _seen_has "$seen" "$okey" && continue + branched=1 + # (1) the local outbound sender node, CROSS edge into cur + sendkey="${fsite}${US}${sender}" + sendchain="$(_chain_unshift "$newchain" x "$sendkey")" + # (2) recurse up into the feeder, INTRA edge into the sender + _walk_up "$anchor_site" "$fsite" "$othr" "$sendchain" "$nseen" $((depth+1)) i + done < <(_xsite_up_feeders "$cur_site" "$cur_thread") fi - _emit_chain "$anchor_site" "$newchain" + [ "$branched" = "0" ] && _emit_chain "$anchor_site" "$newchain" + return 0 } # ───────────────────────────────────────────────────────────────────────────── # Drivers # ───────────────────────────────────────────────────────────────────────────── +# In-memory list of a site's protocol names (membership keys are "site\037thread"). +_protos_in_site() { + local site="$1" key + for key in "${!G_PROTO[@]}"; do + [ "${key%%$US*}" = "$site" ] && printf '%s\n' "${key#*$US}" + done +} + # Enumerate every full path in a site by starting from each entry point. 
# Cross-site continuation happens naturally inside _walk_down. Dedup by the -# rendered "site\tchain" line. +# rendered "site\tchain" line. All in-memory — no subprocess. _enumerate_all_in_site() { - local site="$1" nc="$2" - local entry tmp - tmp=$(mktemp) - # entry points = threads with no incoming in this site - "$NCP" list-protocols "$nc" 2>/dev/null | while IFS= read -r entry; do - [ -z "$entry" ] && continue - if _is_entry_in "$nc" "$entry"; then - printf '%s\n' "$entry" >> "$tmp" - fi - done - # if no entry points (every thread has an incoming, e.g. a pure cycle), - # fall back to all protocols as start points (v2 fallback) - if [ ! -s "$tmp" ]; then - "$NCP" list-protocols "$nc" 2>/dev/null > "$tmp" - fi + local site="$1" + local entry entries=() any_entry=0 all=() while IFS= read -r entry; do [ -z "$entry" ] && continue - _walk_down "$site" "$site" "$nc" "$entry" "" "" 0 - done < "$tmp" - rm -f "$tmp" + all+=("$entry") + if _is_entry_in "$site" "$entry"; then + entries+=("$entry"); any_entry=1 + fi + done < <(_protos_in_site "$site") + # if no entry points (every thread has an incoming, e.g. a pure cycle), + # fall back to all protocols as start points (v2 fallback) + if [ "$any_entry" = "0" ]; then + entries=("${all[@]}") + fi + for entry in "${entries[@]}"; do + _walk_down "$site" "$site" "$entry" "" "" 0 + done } main_enumerate() { _discover_sites [ "${#SITE_NCS[@]}" -gt 0 ] || die "no NetConfig found (set \$HCIROOT, or pass --netconfig / --hciroot)" + # PARSE ONCE: build the whole in-memory route graph (single awk pass per + # NetConfig + reverse-source maps). The walkers then run entirely in memory. + # With --site-only and an explicit thread we still build the full graph (it is + # <1s for 24 sites); cross-site hops are simply suppressed by the SITE_ONLY guard. 
+ _build_graph + local raw raw=$(mktemp) trap 'rm -f "$OUT_PATHS" "$raw"' EXIT if [ "$ALL_MODE" = "1" ]; then # whole-site entry chains; scope to --site if given (else every site) - local i sname snc + local i sname for ((i=0; i<${#SITE_NAMES[@]}; i++)); do - sname="${SITE_NAMES[$i]}"; snc="${SITE_NCS[$i]}" + sname="${SITE_NAMES[$i]}" if [ -n "$SITE_ARG" ] && [ "$sname" != "$SITE_ARG" ]; then continue; fi - _enumerate_all_in_site "$sname" "$snc" >> "$raw" + _enumerate_all_in_site "$sname" >> "$raw" done else - # locate the thread's home site - local home_site home_nc loc + # locate the thread's home site (in-memory membership lookup) + local home_site if [ -n "$NETCONFIG" ]; then - home_nc="$NETCONFIG"; home_site="$(basename "$(dirname "$NETCONFIG")")" - "$NCP" list-protocols "$home_nc" 2>/dev/null | grep -qxF "$THREAD" \ - || die "thread not found in $home_nc: $THREAD" + home_site="$(basename "$(dirname "$NETCONFIG")")" + [ -n "${G_PROTO[${home_site}${US}${THREAD}]:-}" ] \ + || die "thread not found in $NETCONFIG: $THREAD" elif [ -n "$SITE_ARG" ]; then - home_nc="$(_nc_for_site "$SITE_ARG")" || die "site not found under \$HCIROOT: $SITE_ARG" home_site="$SITE_ARG" - "$NCP" list-protocols "$home_nc" 2>/dev/null | grep -qxF "$THREAD" \ + [ -n "${G_PROTO[${home_site}${US}${THREAD}]:-}" ] \ || die "thread not found in site $SITE_ARG: $THREAD" else - loc="$(_locate_thread "$THREAD")" || die "thread not found in any discovered site: $THREAD" - home_site="${loc%%$US*}"; home_nc="${loc#*$US}" + home_site="$(_locate_thread "$THREAD")" || die "thread not found in any discovered site: $THREAD" fi case "$DIR_MODE" in - up) _walk_up "$home_site" "$home_site" "$home_nc" "$THREAD" "" "" 0 >> "$raw" ;; - down) _walk_down "$home_site" "$home_site" "$home_nc" "$THREAD" "" "" 0 >> "$raw" ;; + up) _walk_up "$home_site" "$home_site" "$THREAD" "" "" 0 >> "$raw" ;; + down) _walk_down "$home_site" "$home_site" "$THREAD" "" "" 0 >> "$raw" ;; full) - # v2 default: every full path 
(entry-point enumeration) that CONTAINS the - # thread; fall back to downstream-from-thread if none contain it. - local all_tmp - all_tmp=$(mktemp) - _enumerate_all_in_site "$home_site" "$home_nc" > "$all_tmp" - # cross-site: also enumerate full paths in any site whose entry chains - # could pass through the thread (the home site's own entry enumeration - # already crosses outward; inbound feeders in other sites are picked up - # because those sites' entry chains are enumerated in all-mode — but for - # a single-thread query we only have the home site's chains, so we also - # scan every discovered site's chains to catch upstream feeders). - if [ "$SITE_ONLY" = "0" ]; then - local j js jn - for ((j=0; j<${#SITE_NAMES[@]}; j++)); do - js="${SITE_NAMES[$j]}"; jn="${SITE_NCS[$j]}" - [ "$jn" = "$home_nc" ] && continue - _enumerate_all_in_site "$js" "$jn" >> "$all_tmp" - done - fi - # keep only chains containing the thread (match on " -> THREAD ->", - # leading "THREAD ->", or trailing "-> THREAD", or exact) - local kept - kept=$(awk -F'\t' -v t="$THREAD" ' + # v2 default: every full ROOT-TO-LEAF path CONTAINING the thread. + # + # v0.8.20 (rearchitected): do NOT scan every site's entry chains (the old + # O(sites x threads) loop). The complete chain = the thread's UPSTREAM + # feeder chains (each ending AT the thread: root -> ... -> thread) JOINED at + # the thread to its DOWNSTREAM chains (each starting AT the thread: + # thread -> ... -> leaf). Both walks are in-memory and follow cross-site + # links via destination blocks, so the join naturally spans sites + # (e.g. mux/ADTfr_epic_964700 --> ... ==> ancout/IB_ADT_muxS --> ancout/ADTto_CodaMetrix). + # The cartesian join over the (usually tiny) up x down sets is done in awk. + # Both halves are the RENDERED v1 chain (site/thread nodes; --> / ==> arrows). + # The upstream prefix ENDS with the queried node (home_site/THREAD); the + # downstream chain STARTS with it. 
We strip the leading queried node from the + # downstream, KEEPING the arrow that follows it (--> or ==>) so the cross-site + # boundary type is preserved, and graft the remaining suffix onto the prefix. + local up_tmp down_tmp qnode + up_tmp=$(mktemp); down_tmp=$(mktemp) + qnode="${home_site}/${THREAD}" + _walk_up "$home_site" "$home_site" "$THREAD" "" "" 0 > "$up_tmp" + _walk_down "$home_site" "$home_site" "$THREAD" "" "" 0 > "$down_tmp" + # join: for each upstream prefix x each downstream chain, emit + # prefix . arrow . suffix + awk -F'\t' -v q="$qnode" ' + FNR==NR { usite[NR]=$1; up[NR]=$2; nu=NR; next } { - chain=$2 - # pad with arrows for unambiguous boundary matching - padded=" -> " chain " -> " - if (index(padded, " -> " t " -> ") > 0) print $0 - }' "$all_tmp" | sort -u) - if [ -n "$kept" ]; then - printf '%s\n' "$kept" >> "$raw" - else - _walk_down "$home_site" "$home_site" "$home_nc" "$THREAD" "" "" 0 >> "$raw" - fi - rm -f "$all_tmp" + dn=$2 + # split the downstream into the leading queried node, the arrow that + # follows it, and the remaining suffix. arrow is " --> " or " ==> ". + arrow=""; suffix="" + if (index(dn, q " --> ") == 1) { arrow=" --> "; suffix=substr(dn, length(q " --> ")+1) } + else if (index(dn, q " ==> ") == 1) { arrow=" ==> "; suffix=substr(dn, length(q " ==> ")+1) } + else { arrow=""; suffix="" } # downstream was just the node + for (i=1; i<=nu; i++) { + chain = up[i] + if (suffix != "") chain = up[i] arrow suffix + print usite[i] "\t" chain + } + } + ' "$up_tmp" "$down_tmp" | sort -u >> "$raw" + rm -f "$up_tmp" "$down_tmp" ;; esac fi @@ -469,25 +753,55 @@ main_enumerate() { } # ───────────────────────────────────────────────────────────────────────────── -# Render: OUT_PATHS holds "site\tchain" lines. Build SITE THREAD HOPS PATH. +# Render: OUT_PATHS holds "site\tchain" lines, where chain is the v1 rendered +# form (site/thread nodes joined by " --> " / " ==> "). 
All derived columns split +# the chain on the TYPED-ARROW regex " (--|==)> " so HOPS counts NODES and the +# root (field 1) is the first node — independent of the boundary type. # THREAD = first node of the chain (the anchor/root for this row) # HOPS = number of nodes in the chain # ───────────────────────────────────────────────────────────────────────────── render() { if [ ! -s "$OUT_PATHS" ]; then - printf 'No paths found.\n' + # No-paths goes to stderr for data/pipe formats so stdout stays clean for + # downstream field extraction (awkcut / cut never sees a prose line). + case "$FORMAT" in + v1|nodes|tsv|jsonl) printf 'No paths found.\n' >&2 ;; + *) printf 'No paths found.\n' ;; + esac return 0 fi - # produce a 4-col TSV: site thread hops path + + case "$FORMAT" in + v1) + # The ground-truth chain, one path per line. PIPE-FIRST: field 1 (split on + # the arrow tokens, e.g. `awkcut 1`) is the root node "site/thread". + awk -F'\t' '{ print $2 }' "$OUT_PATHS" + return 0 + ;; + nodes) + # node-only extraction: each path's site/thread nodes one per line, a blank + # line between paths. No arrows — clean for re-piping into `paths`. + awk -F'\t' ' + NR>1 { print "" } + { + chain=$2 + n=split(chain, parts, / (--|==)> /) + for (i=1; i<=n; i++) print parts[i] + }' "$OUT_PATHS" + return 0 + ;; + esac + + # produce a 4-col TSV: site thread hops path (path = the v1 typed chain) local tsv tsv=$(awk -F'\t' ' { site=$1; chain=$2 - # first node + # first node = chain up to the first typed arrow first=chain - sub(/ -> .*/, "", first) - # hop count = number of " -> " separators + 1 - n=split(chain, parts, / -> /) + sub(/ (--|==)> .*/, "", first) + # hop count = number of nodes = typed-arrow separators + 1 + n=split(chain, parts, / (--|==)> /) printf "%s\t%s\t%d\t%s\n", site, first, n, chain }' "$OUT_PATHS")