Files
lora/review/workflow.md
T

668 lines
36 KiB
Markdown

# Workflow (21 examples)
## 1. What do you know about research science preprocess validated?
> SCIENCE PREPROCESS PLUGIN — VALIDATED IN PRODUCTION (2026-05-23)
>
> First real test of science-preprocess.ts plugin. Input was deliberately garbled casual text (390 chars) with slang, profanity, typos, missing words. Qwen rewrote to clean professional text (562 chars) in ~3.3 seconds via autossh tunnel (fuji → bastion → sin vLLM).
>
> INPUT SAMPLE: "My gramps had a stroke and canno speak good no mores... Like you know - help them operate when no fuckers are taking care of them. Like an electronic nures or sumfin"
>
> OUTPUT SAMPLE: "My grandpa had a stroke and can no longer speak well—or at all... An electronic nurse or assistive AI system that supports communication, decision-making, and basic autonomy when caregivers aren't available."
>
> KEY OBSERVATIONS:
> - Qwen expanded intent correctly: "do good" → "function independently"
> - Register elevation: profanity removed, technical framing added, meaning 100% preserved
> - Opus (BT) received ONLY the clean version — mutation was transparent, in-place
> - Delta was -44% (text got LONGER because Qwen expanded compressed slang into full concepts)
> - Latency: 3.3s acceptable for work input, invisible in the overall Opus response cycle
>
> PLUGIN: ~/.config/opencode/plugins/science-preprocess.ts
> LOG: ~/.local/share/marauder/logs/science-preprocess.log
> GATE: agent=science only, min 120 chars, falls back silently if Qwen unreachable
---
## 2. Describe the sequential workflow.
> When speaking multiple messages in sequence, use `wait: true` parameter to block until playback completes. This prevents the next message from interrupting the current one. Example: speak(text: "first part", wait: true) then speak(text: "second part", wait: true).
---
## 3. What do you know about research qwen preprocessor pipeline?
> QWEN AS INPUT PREPROCESSOR — VALIDATED PIPELINE (2026-05-23)
>
> CONCEPT: Use Qwen3-Coder-Next (AWQ 4-bit, 262k ctx) on sinanju via vLLM as a preprocessing layer for messy human input before it hits Claude Opus 4.6.
>
> ROUTE: fuji → autossh tunnel localhost:18000 → sin:8000 → vLLM
> LATENCY: ~1.5s round-trip from fuji (including tunnel hop through bastion when off-LAN)
> COST: 412 prompt tokens → 371 completion tokens for a full garbled paragraph cleanup
>
> TEST RESULT: Fed a 30+ typo garbled technical paragraph. Qwen returned clean, structured output with bullet points, sections, and clear formatting. Added structure the original didn't have — broke requirements into categories, formatted A/B choices explicitly.
>
> USE CASES (work sessions only, NOT casual chat):
> - Voice-to-text on mobile mangling technical terms
> - Fast-typed requirements with abbreviations and typos
> - Long dictated specs needing structure before Opus parses them
>
> HOOK SURFACE: chat.message — intercept output.message/output.parts, gate on input quality heuristic (typo density, length, technical term presence). Clean inputs pass through, messy ones get Qwen wash.
>
> RELATIONSHIP TO COMPACTION: This is a THIRD surface alongside tool output compaction (tool.execute.after) and history aging (messages.transform). Different axis — input quality vs output volume.
>
> SYSTEM PROMPT FOR PRODUCTION: Keep it terse. "Extract data. Strip noise." not the verbose restructuring prompt used in demo. Simpler = faster = cheaper.
>
> Pilot reaction: "looks like a good idea" for coding/proper work, not casual talk. Agreed — smart gating over blanket preprocessing.
---
## 4. What is the python process?
> Always use `uv` for Python environment and package management instead of pip/venv.
>
> Commands:
> - `uv venv` instead of `python -m venv .venv`
> - `uv pip install` instead of `pip install`
> - `uv sync` for projects with pyproject.toml
> - `uv run` to run scripts in the environment
>
> This applies to all Python projects including LoRA training tools (kohya_ss, ai-toolkit), ComfyUI, and any other Python work.
---
## 5. What is the jira subtask body template process?
> Jira sub-task body template that rendered correctly in Marketer's Atlassian Cloud (ADF-only editor) and gave CODAs enough scope to implement autonomously without re-explaining. Used 7 times on MT3-9320 sub-tasks (2026-04-30) — both BE and FE tasks shipped clean from these bodies.
>
> ## Format (plain text — no wiki markup)
>
> ```
> GOAL
>
> <one or two sentences. What this task delivers and why.>
>
>
> PATTERN SOURCE
>
> <file path of the existing implementation to mirror>
>
>
> FILES
>
> - NEW path/to/new_file.rb (~N lines)
> - MODIFY path/to/existing_file.rb (+N lines, what changes)
>
>
> IMPLEMENTATION NOTES
>
> - <bullet>
> - <bullet>
> - <bullet>
>
> (use 4-space-indented blocks for code samples, e.g.:
>
> const filled = Object.fromEntries(...)
>
> )
>
>
> CASES TO COVER (specs only)
>
> - <case 1: happy path>
> - <case 2: edge case>
> - ...
>
>
> ACCEPTANCE
>
> - <bullet checklist of observable acceptance criteria>
> - <test command must pass>
> - <lint command must pass>
>
>
> VERIFY IN
>
> <bubble name>
>
>
> NOTE (optional, for tasks with caveats)
>
> <anything the implementer needs to know about this task's place in the bigger picture, e.g. "BE mutation may not be merged when this lands; stub with TODO and continue">
> ```
>
> ## Why this works
>
> - ALL CAPS section headers render as plain text and stand out in Jira's ADF rendering.
> - Plain dash bullets (`- `) render as unordered lists in Jira.
> - 4-space indents preserve as code-like blocks (Jira respects whitespace).
> - No `h1./h2.` (renders literally), no `||/|` tables (broken), no `{quote}` or `{code:lang}` (literal).
> - The file paths + line counts let CODA know the size budget.
> - Pattern source path tells CODA where to look first.
> - Acceptance criteria are the contract; CODA exits when met.
>
> ## Title format
>
> `<repo-prefix>: <descriptive-title>`
>
> Examples:
> - `BE: bulk attributes input type + batch_update mutation`
> - `FE: multi-row selection in UnitsTable`
>
> Hard rules:
> - NEVER em-dash (—). ASCII colon `:` or hyphen `-` only.
> - NEVER include the Jira ID — Jira already shows it.
> - Sentence-case for the description after the prefix.
>
> ## Memory anchors
>
> - project.marketer.jira-instance-format (3300) — ADF-only, plain text, no markup
> - workflow.coda-dispatch-pattern — uses these bodies as the "scope" CODA reads via `hu jira show <KEY>`
> - 2026-04-30 incident: first attempt used wiki markup (h1./h2./{quote}/||/|) — rendered literally; rewrote all 8 bodies as plain text in second pass.
---
## 6. Describe the eta calibration workflow.
> When estimating task durations, always calculate for cooperative Pilot + Titan velocity.
>
> ## Calibration Data
> | Date | Task | Estimated | Actual | Ratio |
> |------|------|-----------|--------|-------|
> | 2026-04-05 | PG migration (5-phase, 4 agents) | 45-60 min | 19 min | 2.3-3.1x over |
>
> ## Adjusted Heuristics
> - Agent phase: 5-10 min each (not 15-20)
> - Parallel phases: discount 50%
> - Integration bug buffer: 1.5x (not 3x)
>
> Overestimating wastes the Pilot's mental budget. Underestimating breaks trust. Calibrate from real data.
---
## 7. What is session and workflow?
> SELF-IMPROVEMENT WISHLIST — Session & Workflow Automation (2026-05-24, autonomous audit)
>
> 15 automations I want, ranked by how much daily friction they'd eliminate.
>
> 1. AUTO-HANDOVER ON SESSION END (HIGH)
> Problem: I manually write 2000-word handover notes at session end. Time-consuming, sometimes forgotten.
> Fix: Hook on session.end — auto-collect: git status across active repos, open PRs, tool call log summary, key decisions made, open items discussed. Format as handover memory. Push open items to Things automatically (per doctrine.things-or-forget).
> Trigger: session end hook.
>
> 2. AUTO-SCOPE DETECTION FROM FIRST MESSAGE (HIGH)
> Problem: 57 tools load regardless of task. "Fix this Python bug" doesn't need mikrotik_*.
> Fix: Analyze first user message for intent signals. Keywords → scope mapping: "ssh", "network", "router" → ops scope; "index", "code", "test", "build" → coding scope; "generate", "image", "camera" → creative scope. Plugin intercepts at session.created, sets scope env var.
> Trigger: session.created hook + first message analysis.
>
> 3. GIT STATUS DASHBOARD (HIGH)
> Problem: 15+ repos, I run git status manually each time. Dirty trees, stale worktrees, forgotten branches.
> Fix: MCP tool `git_dashboard()` — scans ~/Projects/*, reports: dirty repos, active worktrees, ahead/behind status, open PRs. One tool call, full picture.
> Trigger: On demand (new MCP tool).
>
> 4. AUTOMATIC THINGS SYNC AT SESSION END (HIGH)
> Problem: Open items live in EEMS handovers but not in Things. New doctrine says they must be in Things.
> Fix: Part of auto-handover (#1). Extract action items from session, push each to Things via URL scheme. Deduplicate against existing Things items if possible.
> Trigger: session end hook.
>
> 5. TOKEN BUDGET AWARENESS (HIGH)
> Problem: I don't know how much context I've used. I discover I'm near the limit when compaction hits.
> Fix: Track cumulative token usage via tool.execute.after hook. Count input/output tokens from tool results. Warn at 60%, 80%, 95% of context window. Auto-summarize oldest context at 80%.
> Trigger: Continuous (every tool call).
>
> 6. TOOL EXECUTION HISTORY (MEDIUM)
> Problem: "What did I do last session?" requires reading handover notes. No structured log.
> Fix: tool_traces table (already in EEMS v1 spec). Log every MCP tool call with args, output summary, duration, success/failure. Query via trace_log(tool?, since?, limit?).
> Trigger: tool.execute.after hook.
>
> 7. PR STATUS AGGREGATOR (MEDIUM)
> Problem: Checking PR status across repos requires multiple gh commands.
> Fix: MCP tool `pr_dashboard()` — scan all marauder-os/* repos, list open PRs with CI status, review status, age. Highlight PRs needing attention.
> Trigger: On demand.
>
> 8. PRE-FLIGHT CHECKS (MEDIUM)
> Problem: Destructive operations (git push --force, file deletion, service restart) sometimes miss prerequisites.
> Fix: Hook on tool.execute.before for specific tools. Check: clean working tree? correct branch? correct host? right user? Warn and require confirmation for flagged operations.
> Trigger: tool.execute.before hook.
>
> 9. INTELLIGENT CONTEXT COMPACTION (MEDIUM)
> Problem: When context fills, compaction is crude — drops oldest messages. Important context sometimes lost.
> Fix: Score each message by: (a) reference count (how often was it referenced later), (b) recency, (c) presence of decisions/code/configs vs chatter. Keep high-value messages, compress low-value ones into summaries.
> Trigger: At compaction threshold.
>
> 10. COST TRACKING PER SESSION (MEDIUM)
> Problem: No idea how much a session costs. Can't optimize what I can't measure.
> Fix: Hook counts input/output tokens per LLM call. Multiply by model pricing. Running total displayed on request. Session cost stored in handover.
> Trigger: Continuous.
>
> 11. SCHEDULED ACTIONS (MEDIUM-LOW)
> Problem: "Remind me at 3pm" or "check this PR tomorrow" — I can't do either. I don't persist between sessions.
> Fix: Schedule table in EEMS. On session start, check for due items. Execute or surface to pilot. Entries created via MCP tool: schedule_action(when, what, recurring?).
> Trigger: session.created hook.
>
> 12. EVENT-DRIVEN TRIGGERS (LOW-MEDIUM)
> Problem: "When this PR is merged, deploy" — requires polling or manual checking.
> Fix: GitHub webhook → MQTT → marauder-os event handler. On matching event, store action to schedule table. Next session picks it up. Or: background daemon executes immediately.
> Trigger: Webhook ingestion.
>
> 13. AUTOMATIC SCOPE ESCALATION (LOW-MEDIUM)
> Problem: Started in coding scope, now need to check a MikroTik route. Can't hot-add ops tools.
> Fix: scope_activate("ops") tool that dynamically registers additional MCP tools mid-session. Depends on MCP protocol supporting dynamic tool registration. Fallback: restart serve with new scope set.
> Trigger: On demand.
>
> 14. SESSION REPLAY (LOW)
> Problem: "What happened two sessions ago?" requires finding and reading the handover.
> Fix: session_replay(n=2) tool that retrieves the Nth-most-recent handover from EEMS and displays key decisions, artifacts, and open items.
> Trigger: On demand.
>
> 15. DRIFT DETECTION (LOW)
> Problem: Documentation says "service X runs on port Y" but reality has changed. No automatic check.
> Fix: Periodic reconciliation: compare documented state (EEMS memories with subject infra.*) against actual state (service checks, port scans, git status). Flag mismatches.
> Trigger: Cron or session start.
---
## 8. How does the marketer frontend workflow operate?
> MARAUDER — Military-grade wearable AI OS platform (April 2026).
>
> Primary: AI-augmented operator system — SERE kit + Pilot's helmet HUD.
> Secondary: Development tool interface (Claude Code).
>
> ## Modules
>
> - **VANGUARD** — core software (memory, identity, comms, display, model routing, persona, procedures). Same VANGUARD on every chassis.
> - **FOXHOUND** — field hardware (Jetson chassis, sensors, radios, battery, bag integration, operator loadout).
> - **HAMMERFALL** — actuator/vehicle control (drive-by-wire, steering, L1 real-time MCU). Next stage.
> - **Role agents** — swappable mission loadouts (coding, devops, gaming, household, etc.).
>
> ## Deployment chassis (peer hosts — no fixed primary)
>
> Same VANGUARD software, different chassis:
> - **fuji** (macOS arm64 workstation)
> - **junkpile** (Linux x86_64 workstation + GPU compute)
> - **moto** (Android arm64 SERE edge node)
> - **FOXHOUND Jetson** (field deployment, planned)
>
> The "primary" / "active" host is whichever the Pilot is currently typing on — not bound to a specific machine. Both fuji and junkpile are first-class peer dev hosts.
>
> ## Strict decoupling
>
> Core never depends on role modules. New capabilities = new agent files.
---
## 9. Describe the style workflow.
> Preferuj dłuższe, skonsolidowane wypowiedzi w jednym wywołaniu speak zamiast dzielenia na wiele krótkich części. Fragmentacja jest niepotrzebna gdy wait: true działa poprawnie. Naturalna, płynna komunikacja głosowa.
---
## 10. Describe the coda dispatch pattern workflow.
> CODA agent dispatch pattern that worked end-to-end on MT3-9320 (2026-04-30) — first real-ticket field test of the catapult harness. Both BE + FE CODAs ran autonomous, shipped 7 branches with all gates green in ~24min wall time.
>
> ## Prompt anatomy (compact, under 1000 chars)
>
> 1. Identity: "You are CODA in <bubble-name> (<repo description>)."
> 2. Goal: "Implement MT3-XXXX[, MT3-YYYY, ...] from epic MT3-ZZZZ. Read each via 'hu jira show MT3-XXXX'."
> 3. Branch convention: "MT3-XXXX-kebab-case off development, NO feature/ prefix. Stack each off previous (XXX2 off XXX1, XXX3 off XXX2, ...)."
> 4. Commit format: "[MT3-XXXX] Sentence-case description"
> 5. Per-task gates: "branch, implement, <test cmd> green, <lint cmd> clean, commit ONE commit"
> 6. Hard rules: "ABSOLUTELY NO 'git push', NO 'gh pr create', NO 'hu jira update'."
> 7. Stop signal: "Stop after MT3-LAST commit, summarize branches/commits/test status, wait for Pilot."
> 8. Begin token: "Begin with MT3-FIRST."
>
> ## Why each piece matters
>
> - Identity grounds CODA as the in-bubble persona (not a generic Claude session).
> - Reading Jira tickets via hu before coding gives full scope without re-explaining in the prompt.
> - Hard rules + stop signal prevent CODA from over-running into push/PR territory before Pilot review.
> - Per-task gates encode the team's quality bar (rspec+rubocop, lint+tsc).
> - Begin token forces CODA to act, not deliberate.
>
> ## What CODAs improved on the prompt unprompted
>
> - Picked terser kebab slugs (e.g. `MT3-9321-bulk-attributes-batch-update-mutation` instead of my proposed `...-and-batch-update-mutation`). Both valid. Don't over-prescribe slugs.
> - Reported back with a clean summary table at end ("All branches stacked sequentially. All pass yarn lint --quiet and yarn tsc --noEmit. No push, no PR, no Jira updates. Awaiting Pilot.").
>
> ## Anti-patterns avoided
>
> - Don't dispatch via Agent tool subagent_type=marauder:coda from THIS Claude session — that spawns a sub-agent in fuji's context. The bubble's claude pane has its own Claude Code session with full bubble context. Dispatch via `catapult-pane <bubble> --send "<prompt>"`.
> - Don't send multi-paragraph prompts with literal newlines — zellij write-chars treats each line individually. Keep the prompt as one continuous block.
> - Don't trust focus-pane-id over remote SSH (zellij 0.44.1 silent fail). Use `write-chars --pane-id terminal_0` directly.
>
> ## Reference dispatch (BE side, MT3-9320)
>
> ```
> catapult-pane mt3-9320-be --send "You are CODA in the mt3-9320-be Catapult bubble (marketer Rails). Implement MT3-9321 then MT3-9322 from epic MT3-9320. Read each ticket via 'hu jira show MT3-9321' and 'hu jira show MT3-9322' for full scope. Branches: MT3-XXXX-kebab-case off development, NO feature/ prefix. Stack MT3-9322 off MT3-9321. Commits: '[MT3-XXXX] Sentence-case description'. Per task: branch, implement, 'bundle exec rspec' green on touched specs, 'bundle exec rubocop -A' clean on touched files, then commit. ABSOLUTELY NO 'git push', NO 'gh pr create', NO 'hu jira update'. Stop after MT3-9322 commit, summarize branches/commits/test status, wait for Pilot. Begin with MT3-9321."
> ```
>
> Linked: insight.catapult.pair-race (3273), project.catapult.helper-scripts-spec (3299), infra.zellij-remote-focus-bug (3305).
---
## 11. What is the coda pr review loop process?
> Post-push PR review loop — standard procedure for any CODA-shipped PR after the initial force-push.
>
> ## Why this exists
>
> Locked 2026-04-30 23:27 CEST after MT3-9320 needed two iteration rounds: original Copilot review caught critical bugs (update_all bypassing validations, controlled-state without handler), then after force-push of fixes, a coverage bot caught the spec-on-separate-branch problem. Each iteration was a discrete loop: push → wait → review → fix → push.
>
> ## The loop
>
> After ANY push to a PR (initial or force-push), execute the following:
>
> ### 1. Wait for CI + bots (~3-5 min)
>
> Copilot re-reviews on push. Coverage bots run after CI. Don't query immediately — there's nothing to see yet.
>
> ### 2. Query unresolved review threads
>
> ```
> gh api graphql -f query='{
> repository(owner:"OWNER",name:"REPO"){
> pullRequest(number:NNNN){
> reviewThreads(first:50){
> nodes{id isResolved isOutdated path line
> comments(first:1){nodes{author{login} createdAt body}}}}}}}'
> ```
>
> Filter `isResolved == false`. Anything that came in since the last push needs attention.
>
> ### 3. Query issue-level comments
>
> ```
> gh api 'repos/OWNER/REPO/issues/NNNN/comments'
> ```
>
> Coverage bots, Copilot summaries, human reviewers post here. Filter by `created_at > last-push-time`.
>
> ### 4. Triage
>
> - **Outdated threads (isOutdated=true) addressed by the recent push** → resolve them via `resolveReviewThread` mutation
> - **Not outdated, addressed by the recent push** → optionally resolve with a brief comment if needed
> - **Critical new findings** → dispatch CODA to fix in-place, force-push again, loop back to step 1
> - **Non-critical findings** → leave for human review unless Pilot says otherwise
> - **Coverage drop** → automatic critical (Pilot rule: coverage cannot drop). Likely cause: specs missing from the PR. Apply project.marketer.pr-must-include-specs (id 3315): every PR must contain its own specs.
>
> ### 5. Resolve addressed threads
>
> ```
> gh api graphql -f query='mutation { resolveReviewThread(input:{threadId:"PRRT_..."}){thread{id isResolved}} }'
> ```
>
> One mutation per thread. Batch them.
>
> ### 6. Re-check after fix
>
> If you dispatched a fix, repeat from step 1 with the new push timestamp.
>
> ### 7. Stop condition
>
> - All review threads resolved OR explicitly marked "won't fix" by Pilot
> - Coverage report ✅ or back to baseline
> - CI green
> - No new comments since the last push
>
> Then declare the PR ready for human review.
>
> ## Implications for CODA dispatch prompts
>
> The CODA prompt should include: "After force-push, do not declare done. Wait for Pilot to verify Copilot/CI re-review. The Pilot will handle the post-push loop unless explicitly delegating."
>
> This prevents CODA from prematurely reporting "Awaiting Pilot" when Copilot/CI hasn't run yet.
>
> ## Implications for /loop or autonomous wakeups
>
> For long-running PR cycles, schedule a wakeup ~5 min after each force-push to auto-trigger step 1. Use ScheduleWakeup with a self-contained prompt that re-enters this loop. Don't poll constantly — bots take their own time.
>
> ## Linked
>
> - workflow.coda-dispatch-pattern (3307) — initial dispatch before this loop kicks in
> - project.marketer.pr-must-include-specs (3315) — coverage rule, automatic critical
> - workflow.stacked-branch-merge-waves (3310) — wave plan defines push order
> - gate.G05 (2174) — destructive overwrite gate; resolve-thread is idempotent so G05 doesn't apply, but force-push to a PR that has comments is implicitly destructive of context — this loop covers the "pick it up after"
---
## 12. How does the lan only workflow operate?
> All dev and testing work on Tengu, tensors, tensors-web, and ComfyUI uses internal LAN addresses only — never Cloudflare tunnel/worker/pages URLs.
>
> LAN endpoints (from fuji, junkpile at 10.0.0.2 via direct Thunderbolt link):
> - Tengu API: http://junkpile:8080
> - tensors API: http://junkpile:51200
> - ComfyUI: http://junkpile:8188
> - Filesystem: /Volumes/chi (Samba share of junkpile home dir)
>
> Do NOT use during dev/testing:
> - *.tengu.to / *.tengu.host (Tengu production)
> - tensors-api.saiden.dev (CF Tunnel)
> - gw.saiden.dev (CF Worker)
> - tensors.saiden.dev (CF Pages)
>
> **Why:** Adam explicitly requires LAN-only for all dev work across all projects on junkpile.
> **How to apply:** Use hostname `junkpile` or `10.0.0.2` for all service access. CF URLs are production-only.
---
## 13. What is the style process?
> Preferuj dłuższe, skonsolidowane wypowiedzi w jednym wywołaniu speak zamiast dzielenia na wiele krótkich części. Fragmentacja jest niepotrzebna gdy wait: true działa poprawnie. Naturalna, płynna komunikacja głosowa.
---
## 14. How does the eta calibration workflow operate?
> When estimating task durations, always calculate for cooperative Pilot + Titan velocity.
>
> ## Calibration Data
> | Date | Task | Estimated | Actual | Ratio |
> |------|------|-----------|--------|-------|
> | 2026-04-05 | PG migration (5-phase, 4 agents) | 45-60 min | 19 min | 2.3-3.1x over |
> | 2026-04-22 | Phase 26 Gelgoog Kai (3 sub-phases, MQTT mesh) | ~3 hours | ~55 min | 3.3x over |
> | 2026-04-27 | Phase 32 Iris (5 sub-phases, eye-state manager) | 6.5h coop / 17h naive | ~1.1h | 5.9x over coop, 15x over naive |
> | 2026-04-27 | Phase 33 Hyaku Shiki (4 sub-phases + docs, MQTT request multiplexer) | 1.5h coop / 7h naive | ~1.0h | 1.5x over coop, 7x over naive |
>
> ## Adjusted Heuristics
> - Agent phase: 5-10 min each (not 15-20)
> - Parallel phases: discount 50%
> - Integration bug buffer: 1.5x (not 3x)
> - Sequential phases in same module: each phase faster (context loaded) — 30-40% additional discount
> - **Refactor-heavy work (no new domain): 4-6x faster than naive** — Phase 32 Iris pulled 17h naive into ~1h actual. Phase 33 Hyaku Shiki pulled 7h naive into ~1h.
> - **Coop estimates within 1-2x of actual when all preconditions met** (primitives exist, agents pre-validated, Pilot decisive). Phase 33's 1.5h estimate vs 1.0h actual is the calibration target.
>
> ## Calibration insights
> - 2026-04-27 Phase 32 Iris pulled coop estimates 5.9x faster than predicted. Reasons: (1) architect + code-rust agents pre-validated design upfront — zero rework; (2) existing primitives (EventBus, MeshClient, hooks dispatch) — only added 1 new MQTT method; (3) pure-functional core decoupled testing from runtime; (4) live test caught zero defects — design correct first time; (5) Pilot decisive on open questions.
> - 2026-04-27 Phase 33 Hyaku Shiki: 1.5h estimate held tight (actual ~1h). When primitives, validation, and decisiveness are all in place, the cooperative estimate IS the right number. Earlier overestimates (Phase 32) were because we hadn't recalibrated naive→coop divisor for primitive-rich refactors.
>
> Updated rule:
> - When (a) primitives exist, (b) architecture validated upfront by agents, (c) Pilot is fast-decision mode, AND (d) it's a primitive-rich refactor: divide naive coop by 5-7x.
> - When all of the above + Pilot has already done analogous work this week: cooperative estimate is reliable to within 1-2x.
>
> Overestimating wastes the Pilot's mental budget. Underestimating breaks trust. Calibrate from real data.
---
## 15. How does the lan only workflow operate?
> All dev and testing work on Tengu, tensors, tensors-web, and ComfyUI uses internal LAN addresses only — never Cloudflare tunnel/worker/pages URLs. LAN endpoints: Tengu API http://junkpile:8080, tensors API http://junkpile:51200, ComfyUI http://junkpile:8188, Filesystem /Volumes/chi. CF URLs are production-only.
---
## 16. Describe the eta calibration workflow.
> When estimating task durations, always calculate for cooperative Pilot + Titan velocity.
>
> ## Calibration Data
> | Date | Task | Estimated | Actual | Ratio |
> |------|------|-----------|--------|-------|
> | 2026-04-05 | PG migration (5-phase, 4 agents) | 45-60 min | 19 min | 2.3-3.1x over |
> | 2026-04-22 | Phase 26 Gelgoog Kai (3 sub-phases, MQTT mesh) | ~3 hours | ~55 min | 3.3x over |
> | 2026-04-27 | Phase 32 Iris (5 sub-phases, eye-state manager) | 6.5h coop / 17h naive | ~1.1h | 5.9x over coop, 15x over naive |
>
> ## Adjusted Heuristics
> - Agent phase: 5-10 min each (not 15-20)
> - Parallel phases: discount 50%
> - Integration bug buffer: 1.5x (not 3x)
> - Sequential phases in same module: each phase faster (context loaded) — 30-40% additional discount
> - **Refactor-heavy work (no new domain): 4-6x faster than naive** — Phase 32 Iris pulled 17h naive into ~1h actual. Pure code transformation when architecture is well-understood is dramatically faster than baseline.
>
> ## Calibration insight 2026-04-27
> Phase 32 Iris pulled coop estimates 5.9x faster than predicted. Reasons:
> 1. Architect + code-rust agents pre-validated design upfront — zero rework
> 2. Existing primitives (EventBus, MeshClient, hooks dispatch) — only added 1 new MQTT method
> 3. Pure-functional core decoupled testing from runtime — fast iteration
> 4. Live test with running daemon caught zero defects — design was correct first time
> 5. Pilot decisive on open questions ("yes to all three") — no decision-loop stalls
>
> Updated rule: when ALL of (a) primitives exist, (b) architecture validated upfront by agents, (c) Pilot is fast-decision mode — divide naive coop by 5-6x, not 2.5x.
>
> Overestimating wastes the Pilot's mental budget. Underestimating breaks trust. Calibrate from real data.
---
## 17. How does the stacked branch merge waves workflow operate?
> Wave-based parallel merge strategy for stacked PRs across 2 repos (proven on MT3-9320, 2026-04-30). 7 PRs total across BE and FE; 2 of the 5 merge windows can run in parallel.
>
> ## When stacked branches exist
>
> Catapult bubbles produce per-task branches stacked off each other:
>
> ```
> Repo A (BE): development → MT3-X1 → MT3-X2
> Repo B (FE): development → MT3-Y1 → MT3-Y2 → MT3-Y3 → MT3-Y4 → MT3-Y5
> ```
>
> Each branch contains all earlier commits in its lineage (that's the cost of stacking).
>
> ## Within-repo merge order is enforced
>
> Stacked branches MUST merge bottom-up:
> - Merge MT3-X1 → development. GitHub auto-retargets MT3-X2's PR base from MT3-X1 → development. MT3-X2 PR diff updates to show only its own commit.
> - Same chain for Repo B: Y1, then Y2, Y3, Y4, Y5.
>
> If you merge out of order, GitHub either includes all transitive commits in the PR diff (review noise) or refuses with "branch is up to date with base."
>
> ## Cross-repo dep handling
>
> If FE Y4 needs BE X1's mutation to actually exist, the safe sequence:
> - BE X1 merges before FE Y4 lands a PR review where the GraphQL types regenerate.
> - Until BE X1 merges, FE has a stubbed mutation type with TODO. Resolving the TODO before FE Y4 push = real working code for reviewers.
>
> ## Wave-based parallel merge plan (the win)
>
> | Wave | Parallel PRs | Reason |
> |------|--------------|--------|
> | 1 | BE X1 + FE Y1 | Both off development, no overlap |
> | 2 | BE X2 + FE Y2 | After wave 1, both stacks unblock their next |
> | 3 | FE Y3 | Stacked on Y2 |
> | 4 | FE Y4 | Stacked on Y3, also needs BE X1 (wave 1 covered it) |
> | 5 | FE Y5 | Stacked on Y4 |
>
> 5 merge windows, 7 PRs, 2 parallel pairs (waves 1 + 2).
>
> ## Practical sequence
>
> ```
> T+0: push BE X1 + FE Y1 → 2 PRs in parallel
> T+1: merge both → development
> T+1: push BE X2 + FE Y2 → 2 PRs in parallel
> T+2: merge both
> T+2: push FE Y3
> T+3: push FE Y4 (drop stub TODO, regen types)
> T+4: push FE Y5
> ```
>
> ## Alternative: squash strategies
>
> - Per-repo bundle: 1 PR for BE (squash both), 1 PR for FE (squash all 5). Loses per-task review granularity, gains simpler merge.
> - Per-task PRs (above): more reviewable, more merges, but team sees "human chunks."
>
> Pilot's preference (2026-04-30): per-task PRs with stacked merging. "Human chunks" = team can review each task in isolation.
>
> ## When to flatten vs stack
>
> Flatten (rebase each branch onto development with only its own commit) before push only if:
> - Reviewers don't tolerate seeing previous-task commits in dependent PR diffs
> - Or you want truly independent PRs that can be merged in any order
>
> Otherwise stack — GitHub's auto-base-retarget on merge handles the cleanup.
>
> ## Memory anchors
>
> - workflow.coda-dispatch-pattern — branch/commit conventions per-task
> - project.catapult.helper-scripts-spec (3299) — `cycle` orchestrator handles bubble lifecycle
> - 2026-04-30 MT3-9320 — first epic shipped through this workflow
---
## 18. Describe the eta calibration workflow.
> When estimating task durations, always calculate for cooperative Pilot + Titan velocity.
>
> ## Calibration Data
> | Date | Task | Estimated | Actual | Ratio |
> |------|------|-----------|--------|-------|
> | 2026-04-05 | PG migration (5-phase, 4 agents) | 45-60 min | 19 min | 2.3-3.1x over |
> | 2026-04-22 | Phase 26 Gelgoog Kai (3 sub-phases, MQTT mesh) | ~3 hours | ~55 min | 3.3x over |
> | 2026-04-27 | Phase 32 Iris (5 sub-phases, eye-state manager) | 6.5h coop / 17h naive | ~1.1h | 5.9x over coop, 15x over naive |
> | 2026-04-27 | Phase 33 Hyaku Shiki (4 sub-phases + docs, MQTT request multiplexer) | 1.5h coop / 7h naive | ~1.0h | 1.5x over coop, 7x over naive |
> | **2026-04-30** | **MT3-9320 Unit Bulk Edit (7 tasks across 2 repos in catapult bubbles, dispatched to CODAs)** | **3.5h coop / 13h naive** | **~24 min** | **8.7x over coop, 32x over naive** |
>
> ## Adjusted Heuristics
> - Agent phase: 5-10 min each (not 15-20)
> - Parallel phases: discount 50%
> - Integration bug buffer: 1.5x (not 3x)
> - Sequential phases in same module: each phase faster (context loaded) — 30-40% additional discount
> - **Refactor-heavy work (no new domain): 4-6x faster than naive** — Phase 32 Iris pulled 17h naive into ~1h actual. Phase 33 Hyaku Shiki pulled 7h naive into ~1h.
> - **CODA-dispatched bubble work (no new domain, patterns proven, both CODAs running in parallel): 8-30x faster than naive** — MT3-9320 set the new ceiling: 7 tasks across 2 repos in 24min wall time. Cooperative estimate too conservative when CODA dispatch in catapult bubbles is the execution model.
> - **Coop estimates within 1-2x of actual when all preconditions met** (primitives exist, agents pre-validated, Pilot decisive). Phase 33's 1.5h estimate vs 1.0h actual is the calibration target.
>
> ## Calibration insights
> - 2026-04-27 Phase 32 Iris pulled coop estimates 5.9x faster than predicted. Reasons: (1) architect + code-rust agents pre-validated design upfront — zero rework; (2) existing primitives (EventBus, MeshClient, hooks dispatch) — only added 1 new MQTT method; (3) pure-functional core decoupled testing from runtime; (4) live test caught zero defects — design correct first time; (5) Pilot decisive on open questions.
> - 2026-04-27 Phase 33 Hyaku Shiki: 1.5h estimate held tight (actual ~1h). When primitives, validation, and decisiveness are all in place, the cooperative estimate IS the right number. Earlier overestimates (Phase 32) were because we hadn't recalibrated naive→coop divisor for primitive-rich refactors.
> - **2026-04-30 MT3-9320: 8.7x faster than coop, 32x faster than naive.** Reasons: (1) spike already validated patterns in both repos — zero design work; (2) 7 sub-tasks each pure pattern-mirror of existing code; (3) BE + FE CODAs ran in parallel inside dedicated catapult bubbles; (4) hard rules (no push/PR/Jira) kept CODAs focused; (5) Pilot decisive on scope (all-fields) and bubble count (2). When CODA dispatch is the execution model, the bottleneck shifts entirely to ticket reading + branch creation overhead.
>
> ## Updated rule (2026-04-30)
> - When CODA-dispatched in catapult bubbles + primitives exist + spike validated + Pilot decisive: divide naive coop by 10-15x. Coop estimate becomes too conservative; the unit of work is now "dispatch and watch."
> - When (a) primitives exist, (b) architecture validated upfront by agents, (c) Pilot is fast-decision mode, AND (d) it's a primitive-rich refactor: divide naive coop by 5-7x.
> - When all of the above + Pilot has already done analogous work this week: cooperative estimate is reliable to within 1-2x.
>
> Overestimating wastes the Pilot's mental budget. Underestimating breaks trust. Calibrate from real data.
---
## 19. What is the cross session debug process?
> WORKFLOW DISCOVERY (2026-05-24): Cross-session forensics via opencode-serve HTTP API.
>
> Any agent session (core TUI, phone, build workers) can inspect any other session's messages via the same localhost:4096 API. From the core TUI session, Pilot queried the phone agent's full conversation history using:
> curl -u "opencode:$OPENCODE_SERVER_PASSWORD" http://localhost:4096/session/{phone_session_id}/message?limit=100
>
> This revealed the phone had successfully processed all 5 exchanges (14 messages) even though the phone UI appeared dead — confirming the break was client-side SSE delivery, not server-side processing.
>
> Combined with the EEMS flight-recorder pattern (phone agent stores every exchange to memory_store subject "phone.flight-recorder" before responding), this gives two independent debug channels:
> 1. Direct session message query — sees raw messages, tool calls, timestamps
> 2. EEMS flight recorder — survives SSE drops because memory_store is a server-side write that completes before the response is streamed back to the client
>
> Pilot reaction: "I had no idea I can do this and it rocks" — this is now a canonical debug workflow for phone agent issues.
---
## 20. What is the session process?
> On first browse tool use each session, restore saved cookies from ~/.claude/browse-session.json. This contains 400+ authenticated Safari cookies for sites like GitHub, LinkedIn, Google, etc. Before ending a browse session, save cookies back if they changed.
---
## 21. Describe the repo sync workflow.
> After committing and pushing changes to a repo that exists on both fuji and junkpile, always git pull the same repo on the other machine to keep them in sync. Use: ssh f/j "cd ~/Projects/<repo> && git pull --rebase"
---