madcat-os/lora

Fork 0

Files

T

marauder-actual 4cef9386b1 add docs: system lora plan, specialist specs, training review

2026-05-31 11:38:46 +02:00

36 KiB

Raw Permalink Blame History

Workflow (21 examples)

1. What do you know about research science preprocess validated?

SCIENCE PREPROCESS PLUGIN — VALIDATED IN PRODUCTION (2026-05-23)

First real test of science-preprocess.ts plugin. Input was deliberately garbled casual text (390 chars) with slang, profanity, typos, missing words. Qwen rewrote to clean professional text (562 chars) in ~3.3 seconds via autossh tunnel (fuji → bastion → sin vLLM).

INPUT SAMPLE: "My gramps had a stroke and canno speak good no mores... Like you know - help them operate when no fuckers are taking care of them. Like an electronic nures or sumfin"

OUTPUT SAMPLE: "My grandpa had a stroke and can no longer speak well—or at all... An electronic nurse or assistive AI system that supports communication, decision-making, and basic autonomy when caregivers aren't available."

KEY OBSERVATIONS:

Qwen expanded intent correctly: "do good" → "function independently"

Register elevation: profanity removed, technical framing added, meaning 100% preserved

Opus (BT) received ONLY the clean version — mutation was transparent, in-place

Delta was -44% (text got LONGER because Qwen expanded compressed slang into full concepts)

Latency: 3.3s acceptable for work input, invisible in the overall Opus response cycle

PLUGIN: ~/.config/opencode/plugins/science-preprocess.ts LOG: ~/.local/share/marauder/logs/science-preprocess.log GATE: agent=science only, min 120 chars, falls back silently if Qwen unreachable

2. Describe the sequential workflow.

When speaking multiple messages in sequence, use wait: true parameter to block until playback completes. This prevents the next message from interrupting the current one. Example: speak(text: "first part", wait: true) then speak(text: "second part", wait: true).

3. What do you know about research qwen preprocessor pipeline?

QWEN AS INPUT PREPROCESSOR — VALIDATED PIPELINE (2026-05-23)

CONCEPT: Use Qwen3-Coder-Next (AWQ 4-bit, 262k ctx) on sinanju via vLLM as a preprocessing layer for messy human input before it hits Claude Opus 4.6.

ROUTE: fuji → autossh tunnel localhost:18000 → sin:8000 → vLLM LATENCY: ~1.5s round-trip from fuji (including tunnel hop through bastion when off-LAN) COST: 412 prompt tokens → 371 completion tokens for a full garbled paragraph cleanup

TEST RESULT: Fed a 30+ typo garbled technical paragraph. Qwen returned clean, structured output with bullet points, sections, and clear formatting. Added structure the original didn't have — broke requirements into categories, formatted A/B choices explicitly.

USE CASES (work sessions only, NOT casual chat):

Voice-to-text on mobile mangling technical terms

Fast-typed requirements with abbreviations and typos

Long dictated specs needing structure before Opus parses them

HOOK SURFACE: chat.message — intercept output.message/output.parts, gate on input quality heuristic (typo density, length, technical term presence). Clean inputs pass through, messy ones get Qwen wash.

RELATIONSHIP TO COMPACTION: This is a THIRD surface alongside tool output compaction (tool.execute.after) and history aging (messages.transform). Different axis — input quality vs output volume.

SYSTEM PROMPT FOR PRODUCTION: Keep it terse. "Extract data. Strip noise." not the verbose restructuring prompt used in demo. Simpler = faster = cheaper.

Pilot reaction: "looks like a good idea" for coding/proper work, not casual talk. Agreed — smart gating over blanket preprocessing.

4. What is the python process?

Always use uv for Python environment and package management instead of pip/venv.

Commands:

uv venv instead of python -m venv .venv

uv pip install instead of pip install

uv sync for projects with pyproject.toml

uv run to run scripts in the environment

This applies to all Python projects including LoRA training tools (kohya_ss, ai-toolkit), ComfyUI, and any other Python work.

5. What is the jira subtask body template process?

Jira sub-task body template that rendered correctly in Marketer's Atlassian Cloud (ADF-only editor) and gave CODAs enough scope to implement autonomously without re-explaining. Used 7 times on MT3-9320 sub-tasks (2026-04-30) — both BE and FE tasks shipped clean from these bodies.

Format (plain text — no wiki markup)
GOAL

<one or two sentences. What this task delivers and why.>


PATTERN SOURCE

<file path of the existing implementation to mirror>


FILES

- NEW    path/to/new_file.rb               (~N lines)
- MODIFY path/to/existing_file.rb          (+N lines, what changes)


IMPLEMENTATION NOTES

- <bullet>
- <bullet>
- <bullet>

(use 4-space-indented blocks for code samples, e.g.:

    const filled = Object.fromEntries(...)

)


CASES TO COVER     (specs only)

- <case 1: happy path>
- <case 2: edge case>
- ...


ACCEPTANCE

- <bullet checklist of observable acceptance criteria>
- <test command must pass>
- <lint command must pass>


VERIFY IN

<bubble name>


NOTE     (optional, for tasks with caveats)

<anything the implementer needs to know about this task's place in the bigger picture, e.g. "BE mutation may not be merged when this lands; stub with TODO and continue">
Why this works

ALL CAPS section headers render as plain text and stand out in Jira's ADF rendering.

Plain dash bullets (- ) render as unordered lists in Jira.

4-space indents preserve as code-like blocks (Jira respects whitespace).

No h1./h2. (renders literally), no ||/| tables (broken), no {quote} or {code:lang} (literal).

The file paths + line counts let CODA know the size budget.

Pattern source path tells CODA where to look first.

Acceptance criteria are the contract; CODA exits when met.

Title format

<repo-prefix>: <descriptive-title>

Examples:

BE: bulk attributes input type + batch_update mutation

FE: multi-row selection in UnitsTable

Hard rules:

NEVER em-dash (—). ASCII colon : or hyphen - only.

NEVER include the Jira ID — Jira already shows it.

Sentence-case for the description after the prefix.

Memory anchors

project.marketer.jira-instance-format (3300) — ADF-only, plain text, no markup

workflow.coda-dispatch-pattern — uses these bodies as the "scope" CODA reads via hu jira show <KEY>

2026-04-30 incident: first attempt used wiki markup (h1./h2./{quote}/||/|) — rendered literally; rewrote all 8 bodies as plain text in second pass.

6. Describe the eta calibration workflow.

When estimating task durations, always calculate for cooperative Pilot + Titan velocity.

Calibration Data

Date Task Estimated Actual Ratio

2026-04-05 PG migration (5-phase, 4 agents) 45-60 min 19 min 2.3-3.1x over

Adjusted Heuristics

Agent phase: 5-10 min each (not 15-20)

Parallel phases: discount 50%

Integration bug buffer: 1.5x (not 3x)

Overestimating wastes the Pilot's mental budget. Underestimating breaks trust. Calibrate from real data.

Date	Task	Estimated	Actual	Ratio
2026-04-05	PG migration (5-phase, 4 agents)	45-60 min	19 min	2.3-3.1x over

7. What is session and workflow?

SELF-IMPROVEMENT WISHLIST — Session & Workflow Automation (2026-05-24, autonomous audit)

15 automations I want, ranked by how much daily friction they'd eliminate.

AUTO-HANDOVER ON SESSION END (HIGH) Problem: I manually write 2000-word handover notes at session end. Time-consuming, sometimes forgotten. Fix: Hook on session.end — auto-collect: git status across active repos, open PRs, tool call log summary, key decisions made, open items discussed. Format as handover memory. Push open items to Things automatically (per doctrine.things-or-forget). Trigger: session end hook.

AUTO-SCOPE DETECTION FROM FIRST MESSAGE (HIGH) Problem: 57 tools load regardless of task. "Fix this Python bug" doesn't need mikrotik_*. Fix: Analyze first user message for intent signals. Keywords → scope mapping: "ssh", "network", "router" → ops scope; "index", "code", "test", "build" → coding scope; "generate", "image", "camera" → creative scope. Plugin intercepts at session.created, sets scope env var. Trigger: session.created hook + first message analysis.

GIT STATUS DASHBOARD (HIGH) Problem: 15+ repos, I run git status manually each time. Dirty trees, stale worktrees, forgotten branches. Fix: MCP tool git_dashboard() — scans ~/Projects/*, reports: dirty repos, active worktrees, ahead/behind status, open PRs. One tool call, full picture. Trigger: On demand (new MCP tool).

AUTOMATIC THINGS SYNC AT SESSION END (HIGH) Problem: Open items live in EEMS handovers but not in Things. New doctrine says they must be in Things. Fix: Part of auto-handover (#1). Extract action items from session, push each to Things via URL scheme. Deduplicate against existing Things items if possible. Trigger: session end hook.

TOKEN BUDGET AWARENESS (HIGH) Problem: I don't know how much context I've used. I discover I'm near the limit when compaction hits. Fix: Track cumulative token usage via tool.execute.after hook. Count input/output tokens from tool results. Warn at 60%, 80%, 95% of context window. Auto-summarize oldest context at 80%. Trigger: Continuous (every tool call).

TOOL EXECUTION HISTORY (MEDIUM) Problem: "What did I do last session?" requires reading handover notes. No structured log. Fix: tool_traces table (already in EEMS v1 spec). Log every MCP tool call with args, output summary, duration, success/failure. Query via trace_log(tool?, since?, limit?). Trigger: tool.execute.after hook.

PR STATUS AGGREGATOR (MEDIUM) Problem: Checking PR status across repos requires multiple gh commands. Fix: MCP tool pr_dashboard() — scan all marauder-os/* repos, list open PRs with CI status, review status, age. Highlight PRs needing attention. Trigger: On demand.

PRE-FLIGHT CHECKS (MEDIUM) Problem: Destructive operations (git push --force, file deletion, service restart) sometimes miss prerequisites. Fix: Hook on tool.execute.before for specific tools. Check: clean working tree? correct branch? correct host? right user? Warn and require confirmation for flagged operations. Trigger: tool.execute.before hook.

INTELLIGENT CONTEXT COMPACTION (MEDIUM) Problem: When context fills, compaction is crude — drops oldest messages. Important context sometimes lost. Fix: Score each message by: (a) reference count (how often was it referenced later), (b) recency, (c) presence of decisions/code/configs vs chatter. Keep high-value messages, compress low-value ones into summaries. Trigger: At compaction threshold.

COST TRACKING PER SESSION (MEDIUM) Problem: No idea how much a session costs. Can't optimize what I can't measure. Fix: Hook counts input/output tokens per LLM call. Multiply by model pricing. Running total displayed on request. Session cost stored in handover. Trigger: Continuous.

SCHEDULED ACTIONS (MEDIUM-LOW) Problem: "Remind me at 3pm" or "check this PR tomorrow" — I can't do either. I don't persist between sessions. Fix: Schedule table in EEMS. On session start, check for due items. Execute or surface to pilot. Entries created via MCP tool: schedule_action(when, what, recurring?). Trigger: session.created hook.

EVENT-DRIVEN TRIGGERS (LOW-MEDIUM) Problem: "When this PR is merged, deploy" — requires polling or manual checking. Fix: GitHub webhook → MQTT → marauder-os event handler. On matching event, store action to schedule table. Next session picks it up. Or: background daemon executes immediately. Trigger: Webhook ingestion.

AUTOMATIC SCOPE ESCALATION (LOW-MEDIUM) Problem: Started in coding scope, now need to check a MikroTik route. Can't hot-add ops tools. Fix: scope_activate("ops") tool that dynamically registers additional MCP tools mid-session. Depends on MCP protocol supporting dynamic tool registration. Fallback: restart serve with new scope set. Trigger: On demand.

SESSION REPLAY (LOW) Problem: "What happened two sessions ago?" requires finding and reading the handover. Fix: session_replay(n=2) tool that retrieves the Nth-most-recent handover from EEMS and displays key decisions, artifacts, and open items. Trigger: On demand.

DRIFT DETECTION (LOW) Problem: Documentation says "service X runs on port Y" but reality has changed. No automatic check. Fix: Periodic reconciliation: compare documented state (EEMS memories with subject infra.*) against actual state (service checks, port scans, git status). Flag mismatches. Trigger: Cron or session start.

8. How does the marketer frontend workflow operate?

MARAUDER — Military-grade wearable AI OS platform (April 2026).

Primary: AI-augmented operator system — SERE kit + Pilot's helmet HUD. Secondary: Development tool interface (Claude Code).

Modules

VANGUARD — core software (memory, identity, comms, display, model routing, persona, procedures). Same VANGUARD on every chassis.

FOXHOUND — field hardware (Jetson chassis, sensors, radios, battery, bag integration, operator loadout).

HAMMERFALL — actuator/vehicle control (drive-by-wire, steering, L1 real-time MCU). Next stage.

Role agents — swappable mission loadouts (coding, devops, gaming, household, etc.).

Deployment chassis (peer hosts — no fixed primary)

Same VANGUARD software, different chassis:

fuji (macOS arm64 workstation)

junkpile (Linux x86_64 workstation + GPU compute)

moto (Android arm64 SERE edge node)

FOXHOUND Jetson (field deployment, planned)

The "primary" / "active" host is whichever the Pilot is currently typing on — not bound to a specific machine. Both fuji and junkpile are first-class peer dev hosts.

Strict decoupling

Core never depends on role modules. New capabilities = new agent files.

9. Describe the style workflow.

Preferuj dłuższe, skonsolidowane wypowiedzi w jednym wywołaniu speak zamiast dzielenia na wiele krótkich części. Fragmentacja jest niepotrzebna gdy wait: true działa poprawnie. Naturalna, płynna komunikacja głosowa.

10. Describe the coda dispatch pattern workflow.

CODA agent dispatch pattern that worked end-to-end on MT3-9320 (2026-04-30) — first real-ticket field test of the catapult harness. Both BE + FE CODAs ran autonomous, shipped 7 branches with all gates green in ~24min wall time.

Prompt anatomy (compact, under 1000 chars)

Identity: "You are CODA in ()."

Goal: "Implement MT3-XXXX[, MT3-YYYY, ...] from epic MT3-ZZZZ. Read each via 'hu jira show MT3-XXXX'."

Branch convention: "MT3-XXXX-kebab-case off development, NO feature/ prefix. Stack each off previous (XXX2 off XXX1, XXX3 off XXX2, ...)."

Commit format: "[MT3-XXXX] Sentence-case description"

Per-task gates: "branch, implement, green, clean, commit ONE commit"

Hard rules: "ABSOLUTELY NO 'git push', NO 'gh pr create', NO 'hu jira update'."

Stop signal: "Stop after MT3-LAST commit, summarize branches/commits/test status, wait for Pilot."

Begin token: "Begin with MT3-FIRST."

Why each piece matters

Identity grounds CODA as the in-bubble persona (not a generic Claude session).

Reading Jira tickets via hu before coding gives full scope without re-explaining in the prompt.

Hard rules + stop signal prevent CODA from over-running into push/PR territory before Pilot review.

Per-task gates encode the team's quality bar (rspec+rubocop, lint+tsc).

Begin token forces CODA to act, not deliberate.

What CODAs improved on the prompt unprompted

Picked terser kebab slugs (e.g. MT3-9321-bulk-attributes-batch-update-mutation instead of my proposed ...-and-batch-update-mutation). Both valid. Don't over-prescribe slugs.

Reported back with a clean summary table at end ("All branches stacked sequentially. All pass yarn lint --quiet and yarn tsc --noEmit. No push, no PR, no Jira updates. Awaiting Pilot.").

Anti-patterns avoided

Don't dispatch via Agent tool subagent_type=marauder:coda from THIS Claude session — that spawns a sub-agent in fuji's context. The bubble's claude pane has its own Claude Code session with full bubble context. Dispatch via catapult-pane <bubble> --send "<prompt>".

Don't send multi-paragraph prompts with literal newlines — zellij write-chars treats each line individually. Keep the prompt as one continuous block.

Don't trust focus-pane-id over remote SSH (zellij 0.44.1 silent fail). Use write-chars --pane-id terminal_0 directly.

Reference dispatch (BE side, MT3-9320)
catapult-pane mt3-9320-be --send "You are CODA in the mt3-9320-be Catapult bubble (marketer Rails). Implement MT3-9321 then MT3-9322 from epic MT3-9320. Read each ticket via 'hu jira show MT3-9321' and 'hu jira show MT3-9322' for full scope. Branches: MT3-XXXX-kebab-case off development, NO feature/ prefix. Stack MT3-9322 off MT3-9321. Commits: '[MT3-XXXX] Sentence-case description'. Per task: branch, implement, 'bundle exec rspec' green on touched specs, 'bundle exec rubocop -A' clean on touched files, then commit. ABSOLUTELY NO 'git push', NO 'gh pr create', NO 'hu jira update'. Stop after MT3-9322 commit, summarize branches/commits/test status, wait for Pilot. Begin with MT3-9321."
Linked: insight.catapult.pair-race (3273), project.catapult.helper-scripts-spec (3299), infra.zellij-remote-focus-bug (3305).

11. What is the coda pr review loop process?

Post-push PR review loop — standard procedure for any CODA-shipped PR after the initial force-push.

Why this exists

Locked 2026-04-30 23:27 CEST after MT3-9320 needed two iteration rounds: original Copilot review caught critical bugs (update_all bypassing validations, controlled-state without handler), then after force-push of fixes, a coverage bot caught the spec-on-separate-branch problem. Each iteration was a discrete loop: push → wait → review → fix → push.

The loop

After ANY push to a PR (initial or force-push), execute the following:

1. Wait for CI + bots (~3-5 min)

Copilot re-reviews on push. Coverage bots run after CI. Don't query immediately — there's nothing to see yet.

2. Query unresolved review threads
gh api graphql -f query='{
  repository(owner:"OWNER",name:"REPO"){
    pullRequest(number:NNNN){
      reviewThreads(first:50){
        nodes{id isResolved isOutdated path line
          comments(first:1){nodes{author{login} createdAt body}}}}}}}'
Filter isResolved == false. Anything that came in since the last push needs attention.

3. Query issue-level comments
gh api 'repos/OWNER/REPO/issues/NNNN/comments'
Coverage bots, Copilot summaries, human reviewers post here. Filter by created_at > last-push-time.

4. Triage

Outdated threads (isOutdated=true) addressed by the recent push → resolve them via resolveReviewThread mutation

Not outdated, addressed by the recent push → optionally resolve with a brief comment if needed

Critical new findings → dispatch CODA to fix in-place, force-push again, loop back to step 1

Non-critical findings → leave for human review unless Pilot says otherwise

Coverage drop → automatic critical (Pilot rule: coverage cannot drop). Likely cause: specs missing from the PR. Apply project.marketer.pr-must-include-specs (id 3315): every PR must contain its own specs.

5. Resolve addressed threads
gh api graphql -f query='mutation { resolveReviewThread(input:{threadId:"PRRT_..."}){thread{id isResolved}} }'
One mutation per thread. Batch them.

6. Re-check after fix

If you dispatched a fix, repeat from step 1 with the new push timestamp.

7. Stop condition

All review threads resolved OR explicitly marked "won't fix" by Pilot

Coverage report ✅ or back to baseline

CI green

No new comments since the last push

Then declare the PR ready for human review.

Implications for CODA dispatch prompts

The CODA prompt should include: "After force-push, do not declare done. Wait for Pilot to verify Copilot/CI re-review. The Pilot will handle the post-push loop unless explicitly delegating."

This prevents CODA from prematurely reporting "Awaiting Pilot" when Copilot/CI hasn't run yet.

Implications for /loop or autonomous wakeups

For long-running PR cycles, schedule a wakeup ~5 min after each force-push to auto-trigger step 1. Use ScheduleWakeup with a self-contained prompt that re-enters this loop. Don't poll constantly — bots take their own time.

Linked

workflow.coda-dispatch-pattern (3307) — initial dispatch before this loop kicks in

project.marketer.pr-must-include-specs (3315) — coverage rule, automatic critical

workflow.stacked-branch-merge-waves (3310) — wave plan defines push order

gate.G05 (2174) — destructive overwrite gate; resolve-thread is idempotent so G05 doesn't apply, but force-push to a PR that has comments is implicitly destructive of context — this loop covers the "pick it up after"

12. How does the lan only workflow operate?

All dev and testing work on Tengu, tensors, tensors-web, and ComfyUI uses internal LAN addresses only — never Cloudflare tunnel/worker/pages URLs.

LAN endpoints (from fuji, junkpile at 10.0.0.2 via direct Thunderbolt link):

Tengu API: http://junkpile:8080

tensors API: http://junkpile:51200

ComfyUI: http://junkpile:8188

Filesystem: /Volumes/chi (Samba share of junkpile home dir)

Do NOT use during dev/testing:

*.tengu.to / *.tengu.host (Tengu production)

tensors-api.saiden.dev (CF Tunnel)

gw.saiden.dev (CF Worker)

tensors.saiden.dev (CF Pages)

Why: Adam explicitly requires LAN-only for all dev work across all projects on junkpile. How to apply: Use hostname junkpile or 10.0.0.2 for all service access. CF URLs are production-only.

13. What is the style process?

Preferuj dłuższe, skonsolidowane wypowiedzi w jednym wywołaniu speak zamiast dzielenia na wiele krótkich części. Fragmentacja jest niepotrzebna gdy wait: true działa poprawnie. Naturalna, płynna komunikacja głosowa.

14. How does the eta calibration workflow operate?

When estimating task durations, always calculate for cooperative Pilot + Titan velocity.

Calibration Data

Date Task Estimated Actual Ratio

2026-04-05 PG migration (5-phase, 4 agents) 45-60 min 19 min 2.3-3.1x over

2026-04-22 Phase 26 Gelgoog Kai (3 sub-phases, MQTT mesh) ~3 hours ~55 min 3.3x over

2026-04-27 Phase 32 Iris (5 sub-phases, eye-state manager) 6.5h coop / 17h naive ~1.1h 5.9x over coop, 15x over naive

2026-04-27 Phase 33 Hyaku Shiki (4 sub-phases + docs, MQTT request multiplexer) 1.5h coop / 7h naive ~1.0h 1.5x over coop, 7x over naive

Adjusted Heuristics

Agent phase: 5-10 min each (not 15-20)

Parallel phases: discount 50%

Integration bug buffer: 1.5x (not 3x)

Sequential phases in same module: each phase faster (context loaded) — 30-40% additional discount

Refactor-heavy work (no new domain): 4-6x faster than naive — Phase 32 Iris pulled 17h naive into ~1h actual. Phase 33 Hyaku Shiki pulled 7h naive into ~1h.

Coop estimates within 1-2x of actual when all preconditions met (primitives exist, agents pre-validated, Pilot decisive). Phase 33's 1.5h estimate vs 1.0h actual is the calibration target.

Calibration insights

2026-04-27 Phase 32 Iris pulled coop estimates 5.9x faster than predicted. Reasons: (1) architect + code-rust agents pre-validated design upfront — zero rework; (2) existing primitives (EventBus, MeshClient, hooks dispatch) — only added 1 new MQTT method; (3) pure-functional core decoupled testing from runtime; (4) live test caught zero defects — design correct first time; (5) Pilot decisive on open questions.

2026-04-27 Phase 33 Hyaku Shiki: 1.5h estimate held tight (actual ~1h). When primitives, validation, and decisiveness are all in place, the cooperative estimate IS the right number. Earlier overestimates (Phase 32) were because we hadn't recalibrated naive→coop divisor for primitive-rich refactors.

Updated rule:

When (a) primitives exist, (b) architecture validated upfront by agents, (c) Pilot is fast-decision mode, AND (d) it's a primitive-rich refactor: divide naive coop by 5-7x.

When all of the above + Pilot has already done analogous work this week: cooperative estimate is reliable to within 1-2x.

Overestimating wastes the Pilot's mental budget. Underestimating breaks trust. Calibrate from real data.

Date	Task	Estimated	Actual	Ratio
2026-04-05	PG migration (5-phase, 4 agents)	45-60 min	19 min	2.3-3.1x over
2026-04-22	Phase 26 Gelgoog Kai (3 sub-phases, MQTT mesh)	~3 hours	~55 min	3.3x over
2026-04-27	Phase 32 Iris (5 sub-phases, eye-state manager)	6.5h coop / 17h naive	~1.1h	5.9x over coop, 15x over naive
2026-04-27	Phase 33 Hyaku Shiki (4 sub-phases + docs, MQTT request multiplexer)	1.5h coop / 7h naive	~1.0h	1.5x over coop, 7x over naive

15. How does the lan only workflow operate?

All dev and testing work on Tengu, tensors, tensors-web, and ComfyUI uses internal LAN addresses only — never Cloudflare tunnel/worker/pages URLs. LAN endpoints: Tengu API http://junkpile:8080, tensors API http://junkpile:51200, ComfyUI http://junkpile:8188, Filesystem /Volumes/chi. CF URLs are production-only.

16. Describe the eta calibration workflow.

When estimating task durations, always calculate for cooperative Pilot + Titan velocity.

Calibration Data

Date Task Estimated Actual Ratio

2026-04-05 PG migration (5-phase, 4 agents) 45-60 min 19 min 2.3-3.1x over

2026-04-22 Phase 26 Gelgoog Kai (3 sub-phases, MQTT mesh) ~3 hours ~55 min 3.3x over

2026-04-27 Phase 32 Iris (5 sub-phases, eye-state manager) 6.5h coop / 17h naive ~1.1h 5.9x over coop, 15x over naive

Adjusted Heuristics

Agent phase: 5-10 min each (not 15-20)

Parallel phases: discount 50%

Integration bug buffer: 1.5x (not 3x)

Sequential phases in same module: each phase faster (context loaded) — 30-40% additional discount

Refactor-heavy work (no new domain): 4-6x faster than naive — Phase 32 Iris pulled 17h naive into ~1h actual. Pure code transformation when architecture is well-understood is dramatically faster than baseline.

Calibration insight 2026-04-27

Phase 32 Iris pulled coop estimates 5.9x faster than predicted. Reasons:

Architect + code-rust agents pre-validated design upfront — zero rework

Existing primitives (EventBus, MeshClient, hooks dispatch) — only added 1 new MQTT method

Pure-functional core decoupled testing from runtime — fast iteration

Live test with running daemon caught zero defects — design was correct first time

Pilot decisive on open questions ("yes to all three") — no decision-loop stalls

Updated rule: when ALL of (a) primitives exist, (b) architecture validated upfront by agents, (c) Pilot is fast-decision mode — divide naive coop by 5-6x, not 2.5x.

Overestimating wastes the Pilot's mental budget. Underestimating breaks trust. Calibrate from real data.

Date	Task	Estimated	Actual	Ratio
2026-04-05	PG migration (5-phase, 4 agents)	45-60 min	19 min	2.3-3.1x over
2026-04-22	Phase 26 Gelgoog Kai (3 sub-phases, MQTT mesh)	~3 hours	~55 min	3.3x over
2026-04-27	Phase 32 Iris (5 sub-phases, eye-state manager)	6.5h coop / 17h naive	~1.1h	5.9x over coop, 15x over naive

17. How does the stacked branch merge waves workflow operate?

Wave-based parallel merge strategy for stacked PRs across 2 repos (proven on MT3-9320, 2026-04-30). 7 PRs total across BE and FE; 2 of the 5 merge windows can run in parallel.

When stacked branches exist

Catapult bubbles produce per-task branches stacked off each other:
Repo A (BE):  development → MT3-X1 → MT3-X2
Repo B (FE):  development → MT3-Y1 → MT3-Y2 → MT3-Y3 → MT3-Y4 → MT3-Y5
Each branch contains all earlier commits in its lineage (that's the cost of stacking).

Within-repo merge order is enforced

Stacked branches MUST merge bottom-up:

Merge MT3-X1 → development. GitHub auto-retargets MT3-X2's PR base from MT3-X1 → development. MT3-X2 PR diff updates to show only its own commit.

Same chain for Repo B: Y1, then Y2, Y3, Y4, Y5.

If you merge out of order, GitHub either includes all transitive commits in the PR diff (review noise) or refuses with "branch is up to date with base."

Cross-repo dep handling

If FE Y4 needs BE X1's mutation to actually exist, the safe sequence:

BE X1 merges before FE Y4 lands a PR review where the GraphQL types regenerate.

Until BE X1 merges, FE has a stubbed mutation type with TODO. Resolving the TODO before FE Y4 push = real working code for reviewers.

Wave-based parallel merge plan (the win)

Wave Parallel PRs Reason

1 BE X1 + FE Y1 Both off development, no overlap

2 BE X2 + FE Y2 After wave 1, both stacks unblock their next

3 FE Y3 Stacked on Y2

4 FE Y4 Stacked on Y3, also needs BE X1 (wave 1 covered it)

5 FE Y5 Stacked on Y4

5 merge windows, 7 PRs, 2 parallel pairs (waves 1 + 2).

Practical sequence
T+0:  push BE X1 + FE Y1   →  2 PRs in parallel
T+1:  merge both             →  development
T+1:  push BE X2 + FE Y2   →  2 PRs in parallel
T+2:  merge both
T+2:  push FE Y3
T+3:  push FE Y4 (drop stub TODO, regen types)
T+4:  push FE Y5
Alternative: squash strategies

Per-repo bundle: 1 PR for BE (squash both), 1 PR for FE (squash all 5). Loses per-task review granularity, gains simpler merge.

Per-task PRs (above): more reviewable, more merges, but team sees "human chunks."

Pilot's preference (2026-04-30): per-task PRs with stacked merging. "Human chunks" = team can review each task in isolation.

When to flatten vs stack

Flatten (rebase each branch onto development with only its own commit) before push only if:

Reviewers don't tolerate seeing previous-task commits in dependent PR diffs

Or you want truly independent PRs that can be merged in any order

Otherwise stack — GitHub's auto-base-retarget on merge handles the cleanup.

Memory anchors

workflow.coda-dispatch-pattern — branch/commit conventions per-task

project.catapult.helper-scripts-spec (3299) — cycle orchestrator handles bubble lifecycle

2026-04-30 MT3-9320 — first epic shipped through this workflow

Wave	Parallel PRs	Reason
1	BE X1 + FE Y1	Both off development, no overlap
2	BE X2 + FE Y2	After wave 1, both stacks unblock their next
3	FE Y3	Stacked on Y2
4	FE Y4	Stacked on Y3, also needs BE X1 (wave 1 covered it)
5	FE Y5	Stacked on Y4

18. Describe the eta calibration workflow.

When estimating task durations, always calculate for cooperative Pilot + Titan velocity.

Calibration Data

Date Task Estimated Actual Ratio

2026-04-05 PG migration (5-phase, 4 agents) 45-60 min 19 min 2.3-3.1x over

2026-04-22 Phase 26 Gelgoog Kai (3 sub-phases, MQTT mesh) ~3 hours ~55 min 3.3x over

2026-04-27 Phase 32 Iris (5 sub-phases, eye-state manager) 6.5h coop / 17h naive ~1.1h 5.9x over coop, 15x over naive

2026-04-27 Phase 33 Hyaku Shiki (4 sub-phases + docs, MQTT request multiplexer) 1.5h coop / 7h naive ~1.0h 1.5x over coop, 7x over naive

2026-04-30 MT3-9320 Unit Bulk Edit (7 tasks across 2 repos in catapult bubbles, dispatched to CODAs) 3.5h coop / 13h naive ~24 min 8.7x over coop, 32x over naive

Adjusted Heuristics

Agent phase: 5-10 min each (not 15-20)

Parallel phases: discount 50%

Integration bug buffer: 1.5x (not 3x)

Sequential phases in same module: each phase faster (context loaded) — 30-40% additional discount

Refactor-heavy work (no new domain): 4-6x faster than naive — Phase 32 Iris pulled 17h naive into ~1h actual. Phase 33 Hyaku Shiki pulled 7h naive into ~1h.

CODA-dispatched bubble work (no new domain, patterns proven, both CODAs running in parallel): 8-30x faster than naive — MT3-9320 set the new ceiling: 7 tasks across 2 repos in 24min wall time. Cooperative estimate too conservative when CODA dispatch in catapult bubbles is the execution model.

Coop estimates within 1-2x of actual when all preconditions met (primitives exist, agents pre-validated, Pilot decisive). Phase 33's 1.5h estimate vs 1.0h actual is the calibration target.

Calibration insights

2026-04-27 Phase 32 Iris pulled coop estimates 5.9x faster than predicted. Reasons: (1) architect + code-rust agents pre-validated design upfront — zero rework; (2) existing primitives (EventBus, MeshClient, hooks dispatch) — only added 1 new MQTT method; (3) pure-functional core decoupled testing from runtime; (4) live test caught zero defects — design correct first time; (5) Pilot decisive on open questions.

2026-04-27 Phase 33 Hyaku Shiki: 1.5h estimate held tight (actual ~1h). When primitives, validation, and decisiveness are all in place, the cooperative estimate IS the right number. Earlier overestimates (Phase 32) were because we hadn't recalibrated naive→coop divisor for primitive-rich refactors.

2026-04-30 MT3-9320: 8.7x faster than coop, 32x faster than naive. Reasons: (1) spike already validated patterns in both repos — zero design work; (2) 7 sub-tasks each pure pattern-mirror of existing code; (3) BE + FE CODAs ran in parallel inside dedicated catapult bubbles; (4) hard rules (no push/PR/Jira) kept CODAs focused; (5) Pilot decisive on scope (all-fields) and bubble count (2). When CODA dispatch is the execution model, the bottleneck shifts entirely to ticket reading + branch creation overhead.

Updated rule (2026-04-30)

When CODA-dispatched in catapult bubbles + primitives exist + spike validated + Pilot decisive: divide naive coop by 10-15x. Coop estimate becomes too conservative; the unit of work is now "dispatch and watch."

When (a) primitives exist, (b) architecture validated upfront by agents, (c) Pilot is fast-decision mode, AND (d) it's a primitive-rich refactor: divide naive coop by 5-7x.

When all of the above + Pilot has already done analogous work this week: cooperative estimate is reliable to within 1-2x.

Overestimating wastes the Pilot's mental budget. Underestimating breaks trust. Calibrate from real data.

Date	Task	Estimated	Actual	Ratio
2026-04-05	PG migration (5-phase, 4 agents)	45-60 min	19 min	2.3-3.1x over
2026-04-22	Phase 26 Gelgoog Kai (3 sub-phases, MQTT mesh)	~3 hours	~55 min	3.3x over
2026-04-27	Phase 32 Iris (5 sub-phases, eye-state manager)	6.5h coop / 17h naive	~1.1h	5.9x over coop, 15x over naive
2026-04-27	Phase 33 Hyaku Shiki (4 sub-phases + docs, MQTT request multiplexer)	1.5h coop / 7h naive	~1.0h	1.5x over coop, 7x over naive
2026-04-30	MT3-9320 Unit Bulk Edit (7 tasks across 2 repos in catapult bubbles, dispatched to CODAs)	3.5h coop / 13h naive	~24 min	8.7x over coop, 32x over naive

19. What is the cross session debug process?

WORKFLOW DISCOVERY (2026-05-24): Cross-session forensics via opencode-serve HTTP API.

Any agent session (core TUI, phone, build workers) can inspect any other session's messages via the same localhost:4096 API. From the core TUI session, Pilot queried the phone agent's full conversation history using: curl -u "opencode:$OPENCODE_SERVER_PASSWORD" http://localhost:4096/session/{phone_session_id}/message?limit=100

This revealed the phone had successfully processed all 5 exchanges (14 messages) even though the phone UI appeared dead — confirming the break was client-side SSE delivery, not server-side processing.

Combined with the EEMS flight-recorder pattern (phone agent stores every exchange to memory_store subject "phone.flight-recorder" before responding), this gives two independent debug channels:

Direct session message query — sees raw messages, tool calls, timestamps

EEMS flight recorder — survives SSE drops because memory_store is a server-side write that completes before the response is streamed back to the client

Pilot reaction: "I had no idea I can do this and it rocks" — this is now a canonical debug workflow for phone agent issues.

20. What is the session process?

On first browse tool use each session, restore saved cookies from ~/.claude/browse-session.json. This contains 400+ authenticated Safari cookies for sites like GitHub, LinkedIn, Google, etc. Before ending a browse session, save cookies back if they changed.

21. Describe the repo sync workflow.

After committing and pushing changes to a repo that exists on both fuji and junkpile, always git pull the same repo on the other machine to keep them in sync. Use: ssh f/j "cd ~/Projects/ && git pull --rebase"

36 KiB Raw Permalink Blame History

Workflow (21 examples)

1. What do you know about research science preprocess validated?

2. Describe the sequential workflow.

3. What do you know about research qwen preprocessor pipeline?

4. What is the python process?

5. What is the jira subtask body template process?

Format (plain text — no wiki markup)

Why this works

Title format

Memory anchors

6. Describe the eta calibration workflow.

Calibration Data

Adjusted Heuristics

7. What is session and workflow?

8. How does the marketer frontend workflow operate?

Modules

Deployment chassis (peer hosts — no fixed primary)

Strict decoupling

9. Describe the style workflow.

10. Describe the coda dispatch pattern workflow.

Prompt anatomy (compact, under 1000 chars)

Why each piece matters

What CODAs improved on the prompt unprompted

Anti-patterns avoided

Reference dispatch (BE side, MT3-9320)

11. What is the coda pr review loop process?

Why this exists

The loop

1. Wait for CI + bots (~3-5 min)

2. Query unresolved review threads

3. Query issue-level comments

4. Triage

5. Resolve addressed threads

6. Re-check after fix

7. Stop condition

Implications for CODA dispatch prompts

Implications for /loop or autonomous wakeups

Linked

12. How does the lan only workflow operate?

13. What is the style process?

14. How does the eta calibration workflow operate?

Calibration Data

Adjusted Heuristics

Calibration insights

15. How does the lan only workflow operate?

16. Describe the eta calibration workflow.

Calibration Data

Adjusted Heuristics

Calibration insight 2026-04-27

17. How does the stacked branch merge waves workflow operate?

When stacked branches exist

Within-repo merge order is enforced

Cross-repo dep handling

Wave-based parallel merge plan (the win)

Practical sequence

Alternative: squash strategies

When to flatten vs stack

Memory anchors

18. Describe the eta calibration workflow.

Calibration Data

Adjusted Heuristics

Calibration insights

Updated rule (2026-04-30)

19. What is the cross session debug process?

20. What is the session process?

21. Describe the repo sync workflow.

36 KiB

Raw Permalink Blame History