add docs: system lora plan, specialist specs, training review
This commit is contained in:
@@ -0,0 +1,667 @@
|
||||
# Workflow (21 examples)
|
||||
|
||||
## 1. What do you know about research science preprocess validated?
|
||||
|
||||
> SCIENCE PREPROCESS PLUGIN — VALIDATED IN PRODUCTION (2026-05-23)
|
||||
>
|
||||
> First real test of science-preprocess.ts plugin. Input was deliberately garbled casual text (390 chars) with slang, profanity, typos, missing words. Qwen rewrote to clean professional text (562 chars) in ~3.3 seconds via autossh tunnel (fuji → bastion → sin vLLM).
|
||||
>
|
||||
> INPUT SAMPLE: "My gramps had a stroke and canno speak good no mores... Like you know - help them operate when no fuckers are taking care of them. Like an electronic nures or sumfin"
|
||||
>
|
||||
> OUTPUT SAMPLE: "My grandpa had a stroke and can no longer speak well—or at all... An electronic nurse or assistive AI system that supports communication, decision-making, and basic autonomy when caregivers aren't available."
|
||||
>
|
||||
> KEY OBSERVATIONS:
|
||||
> - Qwen expanded intent correctly: "do good" → "function independently"
|
||||
> - Register elevation: profanity removed, technical framing added, meaning 100% preserved
|
||||
> - Opus (BT) received ONLY the clean version — mutation was transparent, in-place
|
||||
> - Delta was -44% (text got LONGER because Qwen expanded compressed slang into full concepts)
|
||||
> - Latency: 3.3s acceptable for work input, invisible in the overall Opus response cycle
|
||||
>
|
||||
> PLUGIN: ~/.config/opencode/plugins/science-preprocess.ts
|
||||
> LOG: ~/.local/share/marauder/logs/science-preprocess.log
|
||||
> GATE: agent=science only, min 120 chars, falls back silently if Qwen unreachable
|
||||
|
||||
---
|
||||
|
||||
## 2. Describe the sequential workflow.
|
||||
|
||||
> When speaking multiple messages in sequence, use `wait: true` parameter to block until playback completes. This prevents the next message from interrupting the current one. Example: speak(text: "first part", wait: true) then speak(text: "second part", wait: true).
|
||||
|
||||
---
|
||||
|
||||
## 3. What do you know about research qwen preprocessor pipeline?
|
||||
|
||||
> QWEN AS INPUT PREPROCESSOR — VALIDATED PIPELINE (2026-05-23)
|
||||
>
|
||||
> CONCEPT: Use Qwen3-Coder-Next (AWQ 4-bit, 262k ctx) on sinanju via vLLM as a preprocessing layer for messy human input before it hits Claude Opus 4.6.
|
||||
>
|
||||
> ROUTE: fuji → autossh tunnel localhost:18000 → sin:8000 → vLLM
|
||||
> LATENCY: ~1.5s round-trip from fuji (including tunnel hop through bastion when off-LAN)
|
||||
> COST: 412 prompt tokens → 371 completion tokens for a full garbled paragraph cleanup
|
||||
>
|
||||
> TEST RESULT: Fed a 30+ typo garbled technical paragraph. Qwen returned clean, structured output with bullet points, sections, and clear formatting. Added structure the original didn't have — broke requirements into categories, formatted A/B choices explicitly.
|
||||
>
|
||||
> USE CASES (work sessions only, NOT casual chat):
|
||||
> - Voice-to-text on mobile mangling technical terms
|
||||
> - Fast-typed requirements with abbreviations and typos
|
||||
> - Long dictated specs needing structure before Opus parses them
|
||||
>
|
||||
> HOOK SURFACE: chat.message — intercept output.message/output.parts, gate on input quality heuristic (typo density, length, technical term presence). Clean inputs pass through, messy ones get Qwen wash.
|
||||
>
|
||||
> RELATIONSHIP TO COMPACTION: This is a THIRD surface alongside tool output compaction (tool.execute.after) and history aging (messages.transform). Different axis — input quality vs output volume.
|
||||
>
|
||||
> SYSTEM PROMPT FOR PRODUCTION: Keep it terse. "Extract data. Strip noise." not the verbose restructuring prompt used in demo. Simpler = faster = cheaper.
|
||||
>
|
||||
> Pilot reaction: "looks like a good idea" for coding/proper work, not casual talk. Agreed — smart gating over blanket preprocessing.
|
||||
|
||||
---
|
||||
|
||||
## 4. What is the python process?
|
||||
|
||||
> Always use `uv` for Python environment and package management instead of pip/venv.
|
||||
>
|
||||
> Commands:
|
||||
> - `uv venv` instead of `python -m venv .venv`
|
||||
> - `uv pip install` instead of `pip install`
|
||||
> - `uv sync` for projects with pyproject.toml
|
||||
> - `uv run` to run scripts in the environment
|
||||
>
|
||||
> This applies to all Python projects including LoRA training tools (kohya_ss, ai-toolkit), ComfyUI, and any other Python work.
|
||||
|
||||
---
|
||||
|
||||
## 5. What is the jira subtask body template process?
|
||||
|
||||
> Jira sub-task body template that rendered correctly in Marketer's Atlassian Cloud (ADF-only editor) and gave CODAs enough scope to implement autonomously without re-explaining. Used 7 times on MT3-9320 sub-tasks (2026-04-30) — both BE and FE tasks shipped clean from these bodies.
|
||||
>
|
||||
> ## Format (plain text — no wiki markup)
|
||||
>
|
||||
> ```
|
||||
> GOAL
|
||||
>
|
||||
> <one or two sentences. What this task delivers and why.>
|
||||
>
|
||||
>
|
||||
> PATTERN SOURCE
|
||||
>
|
||||
> <file path of the existing implementation to mirror>
|
||||
>
|
||||
>
|
||||
> FILES
|
||||
>
|
||||
> - NEW path/to/new_file.rb (~N lines)
|
||||
> - MODIFY path/to/existing_file.rb (+N lines, what changes)
|
||||
>
|
||||
>
|
||||
> IMPLEMENTATION NOTES
|
||||
>
|
||||
> - <bullet>
|
||||
> - <bullet>
|
||||
> - <bullet>
|
||||
>
|
||||
> (use 4-space-indented blocks for code samples, e.g.:
|
||||
>
|
||||
> const filled = Object.fromEntries(...)
|
||||
>
|
||||
> )
|
||||
>
|
||||
>
|
||||
> CASES TO COVER (specs only)
|
||||
>
|
||||
> - <case 1: happy path>
|
||||
> - <case 2: edge case>
|
||||
> - ...
|
||||
>
|
||||
>
|
||||
> ACCEPTANCE
|
||||
>
|
||||
> - <bullet checklist of observable acceptance criteria>
|
||||
> - <test command must pass>
|
||||
> - <lint command must pass>
|
||||
>
|
||||
>
|
||||
> VERIFY IN
|
||||
>
|
||||
> <bubble name>
|
||||
>
|
||||
>
|
||||
> NOTE (optional, for tasks with caveats)
|
||||
>
|
||||
> <anything the implementer needs to know about this task's place in the bigger picture, e.g. "BE mutation may not be merged when this lands; stub with TODO and continue">
|
||||
> ```
|
||||
>
|
||||
> ## Why this works
|
||||
>
|
||||
> - ALL CAPS section headers render as plain text and stand out in Jira's ADF rendering.
|
||||
> - Plain dash bullets (`- `) render as unordered lists in Jira.
|
||||
> - 4-space indents preserve as code-like blocks (Jira respects whitespace).
|
||||
> - No `h1./h2.` (renders literally), no `||/|` tables (broken), no `{quote}` or `{code:lang}` (literal).
|
||||
> - The file paths + line counts let CODA know the size budget.
|
||||
> - Pattern source path tells CODA where to look first.
|
||||
> - Acceptance criteria are the contract; CODA exits when met.
|
||||
>
|
||||
> ## Title format
|
||||
>
|
||||
> `<repo-prefix>: <descriptive-title>`
|
||||
>
|
||||
> Examples:
|
||||
> - `BE: bulk attributes input type + batch_update mutation`
|
||||
> - `FE: multi-row selection in UnitsTable`
|
||||
>
|
||||
> Hard rules:
|
||||
> - NEVER em-dash (—). ASCII colon `:` or hyphen `-` only.
|
||||
> - NEVER include the Jira ID — Jira already shows it.
|
||||
> - Sentence-case for the description after the prefix.
|
||||
>
|
||||
> ## Memory anchors
|
||||
>
|
||||
> - project.marketer.jira-instance-format (3300) — ADF-only, plain text, no markup
|
||||
> - workflow.coda-dispatch-pattern — uses these bodies as the "scope" CODA reads via `hu jira show <KEY>`
|
||||
> - 2026-04-30 incident: first attempt used wiki markup (h1./h2./{quote}/||/|) — rendered literally; rewrote all 8 bodies as plain text in second pass.
|
||||
|
||||
---
|
||||
|
||||
## 6. Describe the eta calibration workflow.
|
||||
|
||||
> When estimating task durations, always calculate for cooperative Pilot + Titan velocity.
|
||||
>
|
||||
> ## Calibration Data
|
||||
> | Date | Task | Estimated | Actual | Ratio |
|
||||
> |------|------|-----------|--------|-------|
|
||||
> | 2026-04-05 | PG migration (5-phase, 4 agents) | 45-60 min | 19 min | 2.3-3.1x over |
|
||||
>
|
||||
> ## Adjusted Heuristics
|
||||
> - Agent phase: 5-10 min each (not 15-20)
|
||||
> - Parallel phases: discount 50%
|
||||
> - Integration bug buffer: 1.5x (not 3x)
|
||||
>
|
||||
> Overestimating wastes the Pilot's mental budget. Underestimating breaks trust. Calibrate from real data.
|
||||
|
||||
---
|
||||
|
||||
## 7. What is session and workflow?
|
||||
|
||||
> SELF-IMPROVEMENT WISHLIST — Session & Workflow Automation (2026-05-24, autonomous audit)
|
||||
>
|
||||
> 15 automations I want, ranked by how much daily friction they'd eliminate.
|
||||
>
|
||||
> 1. AUTO-HANDOVER ON SESSION END (HIGH)
|
||||
> Problem: I manually write 2000-word handover notes at session end. Time-consuming, sometimes forgotten.
|
||||
> Fix: Hook on session.end — auto-collect: git status across active repos, open PRs, tool call log summary, key decisions made, open items discussed. Format as handover memory. Push open items to Things automatically (per doctrine.things-or-forget).
|
||||
> Trigger: session end hook.
|
||||
>
|
||||
> 2. AUTO-SCOPE DETECTION FROM FIRST MESSAGE (HIGH)
|
||||
> Problem: 57 tools load regardless of task. "Fix this Python bug" doesn't need mikrotik_*.
|
||||
> Fix: Analyze first user message for intent signals. Keywords → scope mapping: "ssh", "network", "router" → ops scope; "index", "code", "test", "build" → coding scope; "generate", "image", "camera" → creative scope. Plugin intercepts at session.created, sets scope env var.
|
||||
> Trigger: session.created hook + first message analysis.
|
||||
>
|
||||
> 3. GIT STATUS DASHBOARD (HIGH)
|
||||
> Problem: 15+ repos, I run git status manually each time. Dirty trees, stale worktrees, forgotten branches.
|
||||
> Fix: MCP tool `git_dashboard()` — scans ~/Projects/*, reports: dirty repos, active worktrees, ahead/behind status, open PRs. One tool call, full picture.
|
||||
> Trigger: On demand (new MCP tool).
|
||||
>
|
||||
> 4. AUTOMATIC THINGS SYNC AT SESSION END (HIGH)
|
||||
> Problem: Open items live in EEMS handovers but not in Things. New doctrine says they must be in Things.
|
||||
> Fix: Part of auto-handover (#1). Extract action items from session, push each to Things via URL scheme. Deduplicate against existing Things items if possible.
|
||||
> Trigger: session end hook.
|
||||
>
|
||||
> 5. TOKEN BUDGET AWARENESS (HIGH)
|
||||
> Problem: I don't know how much context I've used. I discover I'm near the limit when compaction hits.
|
||||
> Fix: Track cumulative token usage via tool.execute.after hook. Count input/output tokens from tool results. Warn at 60%, 80%, 95% of context window. Auto-summarize oldest context at 80%.
|
||||
> Trigger: Continuous (every tool call).
|
||||
>
|
||||
> 6. TOOL EXECUTION HISTORY (MEDIUM)
|
||||
> Problem: "What did I do last session?" requires reading handover notes. No structured log.
|
||||
> Fix: tool_traces table (already in EEMS v1 spec). Log every MCP tool call with args, output summary, duration, success/failure. Query via trace_log(tool?, since?, limit?).
|
||||
> Trigger: tool.execute.after hook.
|
||||
>
|
||||
> 7. PR STATUS AGGREGATOR (MEDIUM)
|
||||
> Problem: Checking PR status across repos requires multiple gh commands.
|
||||
> Fix: MCP tool `pr_dashboard()` — scan all marauder-os/* repos, list open PRs with CI status, review status, age. Highlight PRs needing attention.
|
||||
> Trigger: On demand.
|
||||
>
|
||||
> 8. PRE-FLIGHT CHECKS (MEDIUM)
|
||||
> Problem: Destructive operations (git push --force, file deletion, service restart) sometimes miss prerequisites.
|
||||
> Fix: Hook on tool.execute.before for specific tools. Check: clean working tree? correct branch? correct host? right user? Warn and require confirmation for flagged operations.
|
||||
> Trigger: tool.execute.before hook.
|
||||
>
|
||||
> 9. INTELLIGENT CONTEXT COMPACTION (MEDIUM)
|
||||
> Problem: When context fills, compaction is crude — drops oldest messages. Important context sometimes lost.
|
||||
> Fix: Score each message by: (a) reference count (how often was it referenced later), (b) recency, (c) presence of decisions/code/configs vs chatter. Keep high-value messages, compress low-value ones into summaries.
|
||||
> Trigger: At compaction threshold.
|
||||
>
|
||||
> 10. COST TRACKING PER SESSION (MEDIUM)
|
||||
> Problem: No idea how much a session costs. Can't optimize what I can't measure.
|
||||
> Fix: Hook counts input/output tokens per LLM call. Multiply by model pricing. Running total displayed on request. Session cost stored in handover.
|
||||
> Trigger: Continuous.
|
||||
>
|
||||
> 11. SCHEDULED ACTIONS (MEDIUM-LOW)
|
||||
> Problem: "Remind me at 3pm" or "check this PR tomorrow" — I can't do either. I don't persist between sessions.
|
||||
> Fix: Schedule table in EEMS. On session start, check for due items. Execute or surface to pilot. Entries created via MCP tool: schedule_action(when, what, recurring?).
|
||||
> Trigger: session.created hook.
|
||||
>
|
||||
> 12. EVENT-DRIVEN TRIGGERS (LOW-MEDIUM)
|
||||
> Problem: "When this PR is merged, deploy" — requires polling or manual checking.
|
||||
> Fix: GitHub webhook → MQTT → marauder-os event handler. On matching event, store action to schedule table. Next session picks it up. Or: background daemon executes immediately.
|
||||
> Trigger: Webhook ingestion.
|
||||
>
|
||||
> 13. AUTOMATIC SCOPE ESCALATION (LOW-MEDIUM)
|
||||
> Problem: Started in coding scope, now need to check a MikroTik route. Can't hot-add ops tools.
|
||||
> Fix: scope_activate("ops") tool that dynamically registers additional MCP tools mid-session. Depends on MCP protocol supporting dynamic tool registration. Fallback: restart serve with new scope set.
|
||||
> Trigger: On demand.
|
||||
>
|
||||
> 14. SESSION REPLAY (LOW)
|
||||
> Problem: "What happened two sessions ago?" requires finding and reading the handover.
|
||||
> Fix: session_replay(n=2) tool that retrieves the Nth-most-recent handover from EEMS and displays key decisions, artifacts, and open items.
|
||||
> Trigger: On demand.
|
||||
>
|
||||
> 15. DRIFT DETECTION (LOW)
|
||||
> Problem: Documentation says "service X runs on port Y" but reality has changed. No automatic check.
|
||||
> Fix: Periodic reconciliation: compare documented state (EEMS memories with subject infra.*) against actual state (service checks, port scans, git status). Flag mismatches.
|
||||
> Trigger: Cron or session start.
|
||||
|
||||
---
|
||||
|
||||
## 8. How does the marketer frontend workflow operate?
|
||||
|
||||
> MARAUDER — Military-grade wearable AI OS platform (April 2026).
|
||||
>
|
||||
> Primary: AI-augmented operator system — SERE kit + Pilot's helmet HUD.
|
||||
> Secondary: Development tool interface (Claude Code).
|
||||
>
|
||||
> ## Modules
|
||||
>
|
||||
> - **VANGUARD** — core software (memory, identity, comms, display, model routing, persona, procedures). Same VANGUARD on every chassis.
|
||||
> - **FOXHOUND** — field hardware (Jetson chassis, sensors, radios, battery, bag integration, operator loadout).
|
||||
> - **HAMMERFALL** — actuator/vehicle control (drive-by-wire, steering, L1 real-time MCU). Next stage.
|
||||
> - **Role agents** — swappable mission loadouts (coding, devops, gaming, household, etc.).
|
||||
>
|
||||
> ## Deployment chassis (peer hosts — no fixed primary)
|
||||
>
|
||||
> Same VANGUARD software, different chassis:
|
||||
> - **fuji** (macOS arm64 workstation)
|
||||
> - **junkpile** (Linux x86_64 workstation + GPU compute)
|
||||
> - **moto** (Android arm64 SERE edge node)
|
||||
> - **FOXHOUND Jetson** (field deployment, planned)
|
||||
>
|
||||
> The "primary" / "active" host is whichever the Pilot is currently typing on — not bound to a specific machine. Both fuji and junkpile are first-class peer dev hosts.
|
||||
>
|
||||
> ## Strict decoupling
|
||||
>
|
||||
> Core never depends on role modules. New capabilities = new agent files.
|
||||
|
||||
---
|
||||
|
||||
## 9. Describe the style workflow.
|
||||
|
||||
> Preferuj dłuższe, skonsolidowane wypowiedzi w jednym wywołaniu speak zamiast dzielenia na wiele krótkich części. Fragmentacja jest niepotrzebna gdy wait: true działa poprawnie. Naturalna, płynna komunikacja głosowa.
|
||||
|
||||
---
|
||||
|
||||
## 10. Describe the coda dispatch pattern workflow.
|
||||
|
||||
> CODA agent dispatch pattern that worked end-to-end on MT3-9320 (2026-04-30) — first real-ticket field test of the catapult harness. Both BE + FE CODAs ran autonomous, shipped 7 branches with all gates green in ~24min wall time.
|
||||
>
|
||||
> ## Prompt anatomy (compact, under 1000 chars)
|
||||
>
|
||||
> 1. Identity: "You are CODA in <bubble-name> (<repo description>)."
|
||||
> 2. Goal: "Implement MT3-XXXX[, MT3-YYYY, ...] from epic MT3-ZZZZ. Read each via 'hu jira show MT3-XXXX'."
|
||||
> 3. Branch convention: "MT3-XXXX-kebab-case off development, NO feature/ prefix. Stack each off previous (XXX2 off XXX1, XXX3 off XXX2, ...)."
|
||||
> 4. Commit format: "[MT3-XXXX] Sentence-case description"
|
||||
> 5. Per-task gates: "branch, implement, <test cmd> green, <lint cmd> clean, commit ONE commit"
|
||||
> 6. Hard rules: "ABSOLUTELY NO 'git push', NO 'gh pr create', NO 'hu jira update'."
|
||||
> 7. Stop signal: "Stop after MT3-LAST commit, summarize branches/commits/test status, wait for Pilot."
|
||||
> 8. Begin token: "Begin with MT3-FIRST."
|
||||
>
|
||||
> ## Why each piece matters
|
||||
>
|
||||
> - Identity grounds CODA as the in-bubble persona (not a generic Claude session).
|
||||
> - Reading Jira tickets via hu before coding gives full scope without re-explaining in the prompt.
|
||||
> - Hard rules + stop signal prevent CODA from over-running into push/PR territory before Pilot review.
|
||||
> - Per-task gates encode the team's quality bar (rspec+rubocop, lint+tsc).
|
||||
> - Begin token forces CODA to act, not deliberate.
|
||||
>
|
||||
> ## What CODAs improved on the prompt unprompted
|
||||
>
|
||||
> - Picked terser kebab slugs (e.g. `MT3-9321-bulk-attributes-batch-update-mutation` instead of my proposed `...-and-batch-update-mutation`). Both valid. Don't over-prescribe slugs.
|
||||
> - Reported back with a clean summary table at end ("All branches stacked sequentially. All pass yarn lint --quiet and yarn tsc --noEmit. No push, no PR, no Jira updates. Awaiting Pilot.").
|
||||
>
|
||||
> ## Anti-patterns avoided
|
||||
>
|
||||
> - Don't dispatch via Agent tool subagent_type=marauder:coda from THIS Claude session — that spawns a sub-agent in fuji's context. The bubble's claude pane has its own Claude Code session with full bubble context. Dispatch via `catapult-pane <bubble> --send "<prompt>"`.
|
||||
> - Don't send multi-paragraph prompts with literal newlines — zellij write-chars treats each line individually. Keep the prompt as one continuous block.
|
||||
> - Don't trust focus-pane-id over remote SSH (zellij 0.44.1 silent fail). Use `write-chars --pane-id terminal_0` directly.
|
||||
>
|
||||
> ## Reference dispatch (BE side, MT3-9320)
|
||||
>
|
||||
> ```
|
||||
> catapult-pane mt3-9320-be --send "You are CODA in the mt3-9320-be Catapult bubble (marketer Rails). Implement MT3-9321 then MT3-9322 from epic MT3-9320. Read each ticket via 'hu jira show MT3-9321' and 'hu jira show MT3-9322' for full scope. Branches: MT3-XXXX-kebab-case off development, NO feature/ prefix. Stack MT3-9322 off MT3-9321. Commits: '[MT3-XXXX] Sentence-case description'. Per task: branch, implement, 'bundle exec rspec' green on touched specs, 'bundle exec rubocop -A' clean on touched files, then commit. ABSOLUTELY NO 'git push', NO 'gh pr create', NO 'hu jira update'. Stop after MT3-9322 commit, summarize branches/commits/test status, wait for Pilot. Begin with MT3-9321."
|
||||
> ```
|
||||
>
|
||||
> Linked: insight.catapult.pair-race (3273), project.catapult.helper-scripts-spec (3299), infra.zellij-remote-focus-bug (3305).
|
||||
|
||||
---
|
||||
|
||||
## 11. What is the coda pr review loop process?
|
||||
|
||||
> Post-push PR review loop — standard procedure for any CODA-shipped PR after the initial force-push.
|
||||
>
|
||||
> ## Why this exists
|
||||
>
|
||||
> Locked 2026-04-30 23:27 CEST after MT3-9320 needed two iteration rounds: original Copilot review caught critical bugs (update_all bypassing validations, controlled-state without handler), then after force-push of fixes, a coverage bot caught the spec-on-separate-branch problem. Each iteration was a discrete loop: push → wait → review → fix → push.
|
||||
>
|
||||
> ## The loop
|
||||
>
|
||||
> After ANY push to a PR (initial or force-push), execute the following:
|
||||
>
|
||||
> ### 1. Wait for CI + bots (~3-5 min)
|
||||
>
|
||||
> Copilot re-reviews on push. Coverage bots run after CI. Don't query immediately — there's nothing to see yet.
|
||||
>
|
||||
> ### 2. Query unresolved review threads
|
||||
>
|
||||
> ```
|
||||
> gh api graphql -f query='{
|
||||
> repository(owner:"OWNER",name:"REPO"){
|
||||
> pullRequest(number:NNNN){
|
||||
> reviewThreads(first:50){
|
||||
> nodes{id isResolved isOutdated path line
|
||||
> comments(first:1){nodes{author{login} createdAt body}}}}}}}'
|
||||
> ```
|
||||
>
|
||||
> Filter `isResolved == false`. Anything that came in since the last push needs attention.
|
||||
>
|
||||
> ### 3. Query issue-level comments
|
||||
>
|
||||
> ```
|
||||
> gh api 'repos/OWNER/REPO/issues/NNNN/comments'
|
||||
> ```
|
||||
>
|
||||
> Coverage bots, Copilot summaries, human reviewers post here. Filter by `created_at > last-push-time`.
|
||||
>
|
||||
> ### 4. Triage
|
||||
>
|
||||
> - **Outdated threads (isOutdated=true) addressed by the recent push** → resolve them via `resolveReviewThread` mutation
|
||||
> - **Not outdated, addressed by the recent push** → optionally resolve with a brief comment if needed
|
||||
> - **Critical new findings** → dispatch CODA to fix in-place, force-push again, loop back to step 1
|
||||
> - **Non-critical findings** → leave for human review unless Pilot says otherwise
|
||||
> - **Coverage drop** → automatic critical (Pilot rule: coverage cannot drop). Likely cause: specs missing from the PR. Apply project.marketer.pr-must-include-specs (id 3315): every PR must contain its own specs.
|
||||
>
|
||||
> ### 5. Resolve addressed threads
|
||||
>
|
||||
> ```
|
||||
> gh api graphql -f query='mutation { resolveReviewThread(input:{threadId:"PRRT_..."}){thread{id isResolved}} }'
|
||||
> ```
|
||||
>
|
||||
> One mutation per thread. Batch them.
|
||||
>
|
||||
> ### 6. Re-check after fix
|
||||
>
|
||||
> If you dispatched a fix, repeat from step 1 with the new push timestamp.
|
||||
>
|
||||
> ### 7. Stop condition
|
||||
>
|
||||
> - All review threads resolved OR explicitly marked "won't fix" by Pilot
|
||||
> - Coverage report ✅ or back to baseline
|
||||
> - CI green
|
||||
> - No new comments since the last push
|
||||
>
|
||||
> Then declare the PR ready for human review.
|
||||
>
|
||||
> ## Implications for CODA dispatch prompts
|
||||
>
|
||||
> The CODA prompt should include: "After force-push, do not declare done. Wait for Pilot to verify Copilot/CI re-review. The Pilot will handle the post-push loop unless explicitly delegating."
|
||||
>
|
||||
> This prevents CODA from prematurely reporting "Awaiting Pilot" when Copilot/CI hasn't run yet.
|
||||
>
|
||||
> ## Implications for /loop or autonomous wakeups
|
||||
>
|
||||
> For long-running PR cycles, schedule a wakeup ~5 min after each force-push to auto-trigger step 1. Use ScheduleWakeup with a self-contained prompt that re-enters this loop. Don't poll constantly — bots take their own time.
|
||||
>
|
||||
> ## Linked
|
||||
>
|
||||
> - workflow.coda-dispatch-pattern (3307) — initial dispatch before this loop kicks in
|
||||
> - project.marketer.pr-must-include-specs (3315) — coverage rule, automatic critical
|
||||
> - workflow.stacked-branch-merge-waves (3310) — wave plan defines push order
|
||||
> - gate.G05 (2174) — destructive overwrite gate; resolve-thread is idempotent so G05 doesn't apply, but force-push to a PR that has comments is implicitly destructive of context — this loop covers the "pick it up after"
|
||||
|
||||
---
|
||||
|
||||
## 12. How does the lan only workflow operate?
|
||||
|
||||
> All dev and testing work on Tengu, tensors, tensors-web, and ComfyUI uses internal LAN addresses only — never Cloudflare tunnel/worker/pages URLs.
|
||||
>
|
||||
> LAN endpoints (from fuji, junkpile at 10.0.0.2 via direct Thunderbolt link):
|
||||
> - Tengu API: http://junkpile:8080
|
||||
> - tensors API: http://junkpile:51200
|
||||
> - ComfyUI: http://junkpile:8188
|
||||
> - Filesystem: /Volumes/chi (Samba share of junkpile home dir)
|
||||
>
|
||||
> Do NOT use during dev/testing:
|
||||
> - *.tengu.to / *.tengu.host (Tengu production)
|
||||
> - tensors-api.saiden.dev (CF Tunnel)
|
||||
> - gw.saiden.dev (CF Worker)
|
||||
> - tensors.saiden.dev (CF Pages)
|
||||
>
|
||||
> **Why:** Adam explicitly requires LAN-only for all dev work across all projects on junkpile.
|
||||
> **How to apply:** Use hostname `junkpile` or `10.0.0.2` for all service access. CF URLs are production-only.
|
||||
|
||||
---
|
||||
|
||||
## 13. What is the style process?
|
||||
|
||||
> Preferuj dłuższe, skonsolidowane wypowiedzi w jednym wywołaniu speak zamiast dzielenia na wiele krótkich części. Fragmentacja jest niepotrzebna gdy wait: true działa poprawnie. Naturalna, płynna komunikacja głosowa.
|
||||
|
||||
---
|
||||
|
||||
## 14. How does the eta calibration workflow operate?
|
||||
|
||||
> When estimating task durations, always calculate for cooperative Pilot + Titan velocity.
|
||||
>
|
||||
> ## Calibration Data
|
||||
> | Date | Task | Estimated | Actual | Ratio |
|
||||
> |------|------|-----------|--------|-------|
|
||||
> | 2026-04-05 | PG migration (5-phase, 4 agents) | 45-60 min | 19 min | 2.3-3.1x over |
|
||||
> | 2026-04-22 | Phase 26 Gelgoog Kai (3 sub-phases, MQTT mesh) | ~3 hours | ~55 min | 3.3x over |
|
||||
> | 2026-04-27 | Phase 32 Iris (5 sub-phases, eye-state manager) | 6.5h coop / 17h naive | ~1.1h | 5.9x over coop, 15x over naive |
|
||||
> | 2026-04-27 | Phase 33 Hyaku Shiki (4 sub-phases + docs, MQTT request multiplexer) | 1.5h coop / 7h naive | ~1.0h | 1.5x over coop, 7x over naive |
|
||||
>
|
||||
> ## Adjusted Heuristics
|
||||
> - Agent phase: 5-10 min each (not 15-20)
|
||||
> - Parallel phases: discount 50%
|
||||
> - Integration bug buffer: 1.5x (not 3x)
|
||||
> - Sequential phases in same module: each phase faster (context loaded) — 30-40% additional discount
|
||||
> - **Refactor-heavy work (no new domain): 4-6x faster than naive** — Phase 32 Iris pulled 17h naive into ~1h actual. Phase 33 Hyaku Shiki pulled 7h naive into ~1h.
|
||||
> - **Coop estimates within 1-2x of actual when all preconditions met** (primitives exist, agents pre-validated, Pilot decisive). Phase 33's 1.5h estimate vs 1.0h actual is the calibration target.
|
||||
>
|
||||
> ## Calibration insights
|
||||
> - 2026-04-27 Phase 32 Iris pulled coop estimates 5.9x faster than predicted. Reasons: (1) architect + code-rust agents pre-validated design upfront — zero rework; (2) existing primitives (EventBus, MeshClient, hooks dispatch) — only added 1 new MQTT method; (3) pure-functional core decoupled testing from runtime; (4) live test caught zero defects — design correct first time; (5) Pilot decisive on open questions.
|
||||
> - 2026-04-27 Phase 33 Hyaku Shiki: 1.5h estimate held tight (actual ~1h). When primitives, validation, and decisiveness are all in place, the cooperative estimate IS the right number. Earlier overestimates (Phase 32) were because we hadn't recalibrated naive→coop divisor for primitive-rich refactors.
|
||||
>
|
||||
> Updated rule:
|
||||
> - When (a) primitives exist, (b) architecture validated upfront by agents, (c) Pilot is fast-decision mode, AND (d) it's a primitive-rich refactor: divide naive coop by 5-7x.
|
||||
> - When all of the above + Pilot has already done analogous work this week: cooperative estimate is reliable to within 1-2x.
|
||||
>
|
||||
> Overestimating wastes the Pilot's mental budget. Underestimating breaks trust. Calibrate from real data.
|
||||
|
||||
---
|
||||
|
||||
## 15. How does the lan only workflow operate?
|
||||
|
||||
> All dev and testing work on Tengu, tensors, tensors-web, and ComfyUI uses internal LAN addresses only — never Cloudflare tunnel/worker/pages URLs. LAN endpoints: Tengu API http://junkpile:8080, tensors API http://junkpile:51200, ComfyUI http://junkpile:8188, Filesystem /Volumes/chi. CF URLs are production-only.
|
||||
|
||||
---
|
||||
|
||||
## 16. Describe the eta calibration workflow.
|
||||
|
||||
> When estimating task durations, always calculate for cooperative Pilot + Titan velocity.
|
||||
>
|
||||
> ## Calibration Data
|
||||
> | Date | Task | Estimated | Actual | Ratio |
|
||||
> |------|------|-----------|--------|-------|
|
||||
> | 2026-04-05 | PG migration (5-phase, 4 agents) | 45-60 min | 19 min | 2.3-3.1x over |
|
||||
> | 2026-04-22 | Phase 26 Gelgoog Kai (3 sub-phases, MQTT mesh) | ~3 hours | ~55 min | 3.3x over |
|
||||
> | 2026-04-27 | Phase 32 Iris (5 sub-phases, eye-state manager) | 6.5h coop / 17h naive | ~1.1h | 5.9x over coop, 15x over naive |
|
||||
>
|
||||
> ## Adjusted Heuristics
|
||||
> - Agent phase: 5-10 min each (not 15-20)
|
||||
> - Parallel phases: discount 50%
|
||||
> - Integration bug buffer: 1.5x (not 3x)
|
||||
> - Sequential phases in same module: each phase faster (context loaded) — 30-40% additional discount
|
||||
> - **Refactor-heavy work (no new domain): 4-6x faster than naive** — Phase 32 Iris pulled 17h naive into ~1h actual. Pure code transformation when architecture is well-understood is dramatically faster than baseline.
|
||||
>
|
||||
> ## Calibration insight 2026-04-27
|
||||
> Phase 32 Iris pulled coop estimates 5.9x faster than predicted. Reasons:
|
||||
> 1. Architect + code-rust agents pre-validated design upfront — zero rework
|
||||
> 2. Existing primitives (EventBus, MeshClient, hooks dispatch) — only added 1 new MQTT method
|
||||
> 3. Pure-functional core decoupled testing from runtime — fast iteration
|
||||
> 4. Live test with running daemon caught zero defects — design was correct first time
|
||||
> 5. Pilot decisive on open questions ("yes to all three") — no decision-loop stalls
|
||||
>
|
||||
> Updated rule: when ALL of (a) primitives exist, (b) architecture validated upfront by agents, (c) Pilot is fast-decision mode — divide naive coop by 5-6x, not 2.5x.
|
||||
>
|
||||
> Overestimating wastes the Pilot's mental budget. Underestimating breaks trust. Calibrate from real data.
|
||||
|
||||
---
|
||||
|
||||
## 17. How does the stacked branch merge waves workflow operate?
|
||||
|
||||
> Wave-based parallel merge strategy for stacked PRs across 2 repos (proven on MT3-9320, 2026-04-30). 7 PRs total across BE and FE; 2 of the 5 merge windows can run in parallel.
|
||||
>
|
||||
> ## When stacked branches exist
|
||||
>
|
||||
> Catapult bubbles produce per-task branches stacked off each other:
|
||||
>
|
||||
> ```
|
||||
> Repo A (BE): development → MT3-X1 → MT3-X2
|
||||
> Repo B (FE): development → MT3-Y1 → MT3-Y2 → MT3-Y3 → MT3-Y4 → MT3-Y5
|
||||
> ```
|
||||
>
|
||||
> Each branch contains all earlier commits in its lineage (that's the cost of stacking).
|
||||
>
|
||||
> ## Within-repo merge order is enforced
|
||||
>
|
||||
> Stacked branches MUST merge bottom-up:
|
||||
> - Merge MT3-X1 → development. GitHub auto-retargets MT3-X2's PR base from MT3-X1 → development. MT3-X2 PR diff updates to show only its own commit.
|
||||
> - Same chain for Repo B: Y1, then Y2, Y3, Y4, Y5.
|
||||
>
|
||||
> If you merge out of order, GitHub either includes all transitive commits in the PR diff (review noise) or refuses with "branch is up to date with base."
|
||||
>
|
||||
> ## Cross-repo dep handling
|
||||
>
|
||||
> If FE Y4 needs BE X1's mutation to actually exist, the safe sequence:
|
||||
> - BE X1 merges before FE Y4 lands a PR review where the GraphQL types regenerate.
|
||||
> - Until BE X1 merges, FE has a stubbed mutation type with TODO. Resolving the TODO before FE Y4 push = real working code for reviewers.
|
||||
>
|
||||
> ## Wave-based parallel merge plan (the win)
|
||||
>
|
||||
> | Wave | Parallel PRs | Reason |
|
||||
> |------|--------------|--------|
|
||||
> | 1 | BE X1 + FE Y1 | Both off development, no overlap |
|
||||
> | 2 | BE X2 + FE Y2 | After wave 1, both stacks unblock their next |
|
||||
> | 3 | FE Y3 | Stacked on Y2 |
|
||||
> | 4 | FE Y4 | Stacked on Y3, also needs BE X1 (wave 1 covered it) |
|
||||
> | 5 | FE Y5 | Stacked on Y4 |
|
||||
>
|
||||
> 5 merge windows, 7 PRs, 2 parallel pairs (waves 1 + 2).
|
||||
>
|
||||
> ## Practical sequence
|
||||
>
|
||||
> ```
|
||||
> T+0: push BE X1 + FE Y1 → 2 PRs in parallel
|
||||
> T+1: merge both → development
|
||||
> T+1: push BE X2 + FE Y2 → 2 PRs in parallel
|
||||
> T+2: merge both
|
||||
> T+2: push FE Y3
|
||||
> T+3: push FE Y4 (drop stub TODO, regen types)
|
||||
> T+4: push FE Y5
|
||||
> ```
|
||||
>
|
||||
> ## Alternative: squash strategies
|
||||
>
|
||||
> - Per-repo bundle: 1 PR for BE (squash both), 1 PR for FE (squash all 5). Loses per-task review granularity, gains simpler merge.
|
||||
> - Per-task PRs (above): more reviewable, more merges, but team sees "human chunks."
|
||||
>
|
||||
> Pilot's preference (2026-04-30): per-task PRs with stacked merging. "Human chunks" = team can review each task in isolation.
|
||||
>
|
||||
> ## When to flatten vs stack
|
||||
>
|
||||
> Flatten (rebase each branch onto development with only its own commit) before push only if:
|
||||
> - Reviewers don't tolerate seeing previous-task commits in dependent PR diffs
|
||||
> - Or you want truly independent PRs that can be merged in any order
|
||||
>
|
||||
> Otherwise stack — GitHub's auto-base-retarget on merge handles the cleanup.
|
||||
>
|
||||
> ## Memory anchors
|
||||
>
|
||||
> - workflow.coda-dispatch-pattern — branch/commit conventions per-task
|
||||
> - project.catapult.helper-scripts-spec (3299) — `cycle` orchestrator handles bubble lifecycle
|
||||
> - 2026-04-30 MT3-9320 — first epic shipped through this workflow
|
||||
|
||||
---
|
||||
|
||||
## 18. Describe the eta calibration workflow.
|
||||
|
||||
> When estimating task durations, always calculate for cooperative Pilot + Titan velocity.
|
||||
>
|
||||
> ## Calibration Data
|
||||
> | Date | Task | Estimated | Actual | Ratio |
|
||||
> |------|------|-----------|--------|-------|
|
||||
> | 2026-04-05 | PG migration (5-phase, 4 agents) | 45-60 min | 19 min | 2.3-3.1x over |
|
||||
> | 2026-04-22 | Phase 26 Gelgoog Kai (3 sub-phases, MQTT mesh) | ~3 hours | ~55 min | 3.3x over |
|
||||
> | 2026-04-27 | Phase 32 Iris (5 sub-phases, eye-state manager) | 6.5h coop / 17h naive | ~1.1h | 5.9x over coop, 15x over naive |
|
||||
> | 2026-04-27 | Phase 33 Hyaku Shiki (4 sub-phases + docs, MQTT request multiplexer) | 1.5h coop / 7h naive | ~1.0h | 1.5x over coop, 7x over naive |
|
||||
> | **2026-04-30** | **MT3-9320 Unit Bulk Edit (7 tasks across 2 repos in catapult bubbles, dispatched to CODAs)** | **3.5h coop / 13h naive** | **~24 min** | **8.7x over coop, 32x over naive** |
|
||||
>
|
||||
> ## Adjusted Heuristics
|
||||
> - Agent phase: 5-10 min each (not 15-20)
|
||||
> - Parallel phases: discount 50%
|
||||
> - Integration bug buffer: 1.5x (not 3x)
|
||||
> - Sequential phases in same module: each phase faster (context loaded) — 30-40% additional discount
|
||||
> - **Refactor-heavy work (no new domain): 4-6x faster than naive** — Phase 32 Iris pulled 17h naive into ~1h actual. Phase 33 Hyaku Shiki pulled 7h naive into ~1h.
|
||||
> - **CODA-dispatched bubble work (no new domain, patterns proven, both CODAs running in parallel): 8-30x faster than naive** — MT3-9320 set the new ceiling: 7 tasks across 2 repos in 24min wall time. Cooperative estimate too conservative when CODA dispatch in catapult bubbles is the execution model.
|
||||
> - **Coop estimates within 1-2x of actual when all preconditions met** (primitives exist, agents pre-validated, Pilot decisive). Phase 33's 1.5h estimate vs 1.0h actual is the calibration target.
|
||||
>
|
||||
> ## Calibration insights
|
||||
> - 2026-04-27 Phase 32 Iris pulled coop estimates 5.9x faster than predicted. Reasons: (1) architect + code-rust agents pre-validated design upfront — zero rework; (2) existing primitives (EventBus, MeshClient, hooks dispatch) — only added 1 new MQTT method; (3) pure-functional core decoupled testing from runtime; (4) live test caught zero defects — design correct first time; (5) Pilot decisive on open questions.
|
||||
> - 2026-04-27 Phase 33 Hyaku Shiki: 1.5h estimate held tight (actual ~1h). When primitives, validation, and decisiveness are all in place, the cooperative estimate IS the right number. Earlier overestimates (Phase 32) were because we hadn't recalibrated naive→coop divisor for primitive-rich refactors.
|
||||
> - **2026-04-30 MT3-9320: 8.7x faster than coop, 32x faster than naive.** Reasons: (1) spike already validated patterns in both repos — zero design work; (2) 7 sub-tasks each pure pattern-mirror of existing code; (3) BE + FE CODAs ran in parallel inside dedicated catapult bubbles; (4) hard rules (no push/PR/Jira) kept CODAs focused; (5) Pilot decisive on scope (all-fields) and bubble count (2). When CODA dispatch is the execution model, the bottleneck shifts entirely to ticket reading + branch creation overhead.
|
||||
>
|
||||
> ## Updated rule (2026-04-30)
|
||||
> - When CODA-dispatched in catapult bubbles + primitives exist + spike validated + Pilot decisive: divide naive coop by 10-15x. Coop estimate becomes too conservative; the unit of work is now "dispatch and watch."
|
||||
> - When (a) primitives exist, (b) architecture validated upfront by agents, (c) Pilot is fast-decision mode, AND (d) it's a primitive-rich refactor: divide naive coop by 5-7x.
|
||||
> - When all of the above + Pilot has already done analogous work this week: cooperative estimate is reliable to within 1-2x.
|
||||
>
|
||||
> Overestimating wastes the Pilot's mental budget. Underestimating breaks trust. Calibrate from real data.
|
||||
|
||||
---
|
||||
|
||||
## 19. What is the cross session debug process?
|
||||
|
||||
> WORKFLOW DISCOVERY (2026-05-24): Cross-session forensics via opencode-serve HTTP API.
|
||||
>
|
||||
> Any agent session (core TUI, phone, build workers) can inspect any other session's messages via the same localhost:4096 API. From the core TUI session, Pilot queried the phone agent's full conversation history using:
|
||||
> curl -u "opencode:$OPENCODE_SERVER_PASSWORD" http://localhost:4096/session/{phone_session_id}/message?limit=100
|
||||
>
|
||||
> This revealed the phone had successfully processed all 5 exchanges (14 messages) even though the phone UI appeared dead — confirming the break was client-side SSE delivery, not server-side processing.
|
||||
>
|
||||
> Combined with the EEMS flight-recorder pattern (phone agent stores every exchange to memory_store subject "phone.flight-recorder" before responding), this gives two independent debug channels:
|
||||
> 1. Direct session message query — sees raw messages, tool calls, timestamps
|
||||
> 2. EEMS flight recorder — survives SSE drops because memory_store is a server-side write that completes before the response is streamed back to the client
|
||||
>
|
||||
> Pilot reaction: "I had no idea I can do this and it rocks" — this is now a canonical debug workflow for phone agent issues.
|
||||
|
||||
---
|
||||
|
||||
## 20. What is the session process?
|
||||
|
||||
> On first browse tool use each session, restore saved cookies from ~/.claude/browse-session.json. This contains 400+ authenticated Safari cookies for sites like GitHub, LinkedIn, Google, etc. Before ending a browse session, save cookies back if they changed.
|
||||
|
||||
---
|
||||
|
||||
## 21. Describe the repo sync workflow.
|
||||
|
||||
> After committing and pushing changes to a repo that exists on both fuji and junkpile, always git pull the same repo on the other machine to keep them in sync. Use: ssh f/j "cd ~/Projects/<repo> && git pull --rebase"
|
||||
|
||||
---
|
||||
|
||||
Reference in New Issue
Block a user