# Bugs (9 examples)

## 1. What do you know about newspaper issue 0003?

> BT Newspaper Issue 0003 — 2026-05-11 Evening. Sent to Kindle 21:34 CEST. message_id 19e188983aa61c98.
>
> Theme: operator-archetype + architecture-as-substrate. Three pieces, ~180 words each:
>
> 1. **DAY-ONE WIRE — How six people built the chip in every smartphone on Earth.** Sophie Wilson + Steve Furber at Acorn Computers, Cambridge 1983-85. RISC, 18 months, first silicon ran on leakage current alone (ammeter read zero). ARM Holdings = licenses architecture, designs no chips. Saiden thesis mirror: ARM not Intel.
>
> 2. **OPERATOR FILE — Bell Labs, the PDP-11, and two men playing.** Ken Thompson + Dennis Ritchie at Bell Labs post-Multics, 1969-70. Toy OS for a space-travel game became Unix + C. No headcount, no roadmap. "Operators who play seriously outproduce committees who plan rigorously."
>
> 3. **CURIO — Why your email has an @ in it.** Ray Tomlinson, BBN, 1971. Picked @ from the Model 33 Teletype keyboard because it was the only unused symbol. First email recipient: himself, on another machine in the same room. Tombstone: @.
>
> Companion: Issue 0004 (lighter, later-night reading; for break 2). Both sent before break 1 fired.
>
> Sequence: Issue 0001 (creator), 0002 (ARM-not-Stark doctrine trigger, 2026-05-10), 0003 (this), 0004 (companion).

---

## 2. What's the vision for debug toggle mcp tool?

> DESIGN IDEA (2026-05-24): Runtime debug toggle via marauder MCP tool.
>
> PROBLEM: Debug directives like the EEMS flight recorder need to be injected into agent prompts temporarily. Current method: manually edit opencode.json prompt text + restart server + fresh session. Works but takes ~30 seconds and is error-prone.
>
> PROPOSED TOOL: debug_toggle(agent, feature, enabled)
>
> MECHANISM:
> - Reads ~/.config/opencode/opencode.json
> - Injects or strips a named debug directive block from the target agent's prompt
> - Restarts opencode-serve (brew services restart)
> - Returns new state (agent, feature, enabled/disabled)
>
> DESIGN PRINCIPLES:
> - Zero overhead when off — directive is literally absent from the prompt, not a per-turn flag check
> - Full cost only when on — same prompt injection pattern as manual edit
> - Agent can toggle its own debug features (kamikaze-style — knows next turn picks up the change)
> - Extensible to other debug features beyond flight-recorder
>
> KNOWN DEBUG FEATURES TO SUPPORT:
> 1. flight-recorder — store every exchange to EEMS (subject phone.flight-recorder) before responding. Adds ~3-8s latency + ~$0.03/turn token cost + compounding context bloat.
> 2. (future) verbose-tools — log all tool inputs/outputs to EEMS
> 3. (future) turn-timing — store TTFT/gen/total timing to EEMS per turn
> 4. (future) context-dump — store full context window size to EEMS per turn
>
> IMPLEMENTATION LOCATION: marauder MCP server (Rust, marauder-os/marauder repo)
>
> STORAGE FORMAT: Debug directive blocks in the prompt could be delimited by markers:
>
> ...directive text...
>
> The toggle tool finds these markers and inserts/removes the block. Clean, no regex guessing.
>
> ALTERNATIVE CONSIDERED: Memory-flag approach (store config.flight-recorder in EEMS, prompt says "check flag at boot"). Rejected because it adds overhead to EVERY turn (memory_recall to check the flag) even when debug is off. Defeats the purpose.
>
> ORIGIN: Pilot suggestion after using the manual flight-recorder toggle during phone comms break debug (EEMS #6440). "I think we need a toggle for something like this for debug — togglable in config and accessible for you to toggle too."

---

## 3. What do you know about bugfix opencode serve eacces 500?
> ROOT CAUSE of opencode-serve HTTP 500s on all v1 routes (/session, /session/status, POST /session):
>
> EACCES: permission denied, lstat '/Users/chi'
>
> The serve daemon (running as user madcat) had WorkingDirectory=/Users/chi in its launchd plist. Every API request triggers realpathSync(cwd) internally, which failed because madcat couldn't lstat chi's home directory (mode 750, chi:staff, madcat not in the staff group).
>
> FIX (2026-05-25):
> 1. Changed plist WorkingDirectory from /Users/chi to /Users/madcat
> 2. Added ACL on /Users/chi: `chmod +a 'user:madcat allow list,search,readattr,readextattr,readsecurity' /Users/chi`
> 3. Killed the serve daemon (sudo kill PID); KeepAlive respawned it with the new config
>
> PLIST: /Users/madcat/Library/LaunchAgents/homebrew.mxcl.opencode-serve.plist
> RESTART METHOD: kill PID (KeepAlive respawns). `brew services restart` fails via sudo due to launchd domain mismatch (user/* vs gui/*).
>
> VALIDATION: Two consecutive lance pipeline runs (create→prompt→poll→read→cleanup) both succeeded. Before the fix, the second consecutive call always returned 500.
>
> Supersedes the WorkingDirectory=/Users/chi "fix" from EEMS 6489, which was incorrect — that entry set HOME=/Users/chi to fix the DB path, but the real issue was WorkingDirectory making the daemon try to lstat an unreadable directory.

---

## 4. Report on sweep partial success bug.

> The madcat-visual sweep verb's partial-success behavior was claimed in the docstring but broken in the implementation. Fixed in PR #5 (merged 2026-05-20 as 64c07a5).
>
> THE BUG:
> - `do_sweep` docstring: "Partial success is supported — if frame 4 fails we still snap 5-9."
> - BUT: `tapo.moveMotor(int(dx), int(dy))` was OUTSIDE the try/except.
> - Only `rtsp.grab_one_frame` was wrapped.
> - Result: any motor failure (most commonly MOTOR_LOCKED_ROTOR, error_code -64304, when sweep deltas push past the c225's pan/tilt limits) aborted the entire sweep.
> - The output dir kept the frames captured so far, but the function raised before reaching any post-loop logic (the new latest-symlink update in PR #5).
>
> THE FIX (PR #5 commit 70e6169):
> - Wrap moveMotor in a per-position try/except.
> - Append a failure entry with a `"moveMotor: ..."` prefix to the failures list.
> - `continue` to the next delta, skipping the frame-grab slot (no point grabbing from a position the camera didn't reach).
> - Loop continues past motor failures.
>
> FAILURE MESSAGE FORMAT (now triage-friendly):
> - `"moveMotor: Error: Maximum Pan/Tilt range reached (MOTOR_LOCKED_ROTOR), ..."` — motor side.
> - `"rtsp: ..."` — frame grab side.
>
> VALIDATION (sweep from Livingroom preset):
> - 8/9 frames captured. Frame 5 (BR) failed: MOTOR_LOCKED_ROTOR.
> - Loop continued through frames 6, 7, 8.
> - latest-symlink populated correctly on partial success.
>
> DOCTRINE NOTE: MOTOR_LOCKED_ROTOR (-64304) is a NORMAL sweep edge condition on c225, not a fault. The camera enforces pan ±170° / tilt ±35° hard limits. Sweep deltas of ±60° pan + ±15° tilt can hit them from non-central start positions. Always expect this on sweeps starting near a preset that's already at an extreme.

---

## 5. What do you know about issue madcat env truncated in ssh?

> Discovered 2026-05-16. `ssh madcat 'echo ${#CLOUDFLARE_EMAIL} ${#CLOUDFLARE_API_KEY}'` reports lengths `13` and `11`, while the correct lengths (as on fuji) are `26` and `37`. The vars exist on madcat (per `env | grep CLOUDFLARE` interactive output) but get TRUNCATED somewhere in the non-interactive ssh path.
>
> ## Symptoms
> - `~/.bashrc` on madcat has 0 CLOUDFLARE_* references, so the vars come from somewhere else (systemd user-env? `~/.profile`? `/etc/environment`? a leaked `Environment=` directive in a unit?)
> - The truncation pattern (13/11 chars) suggests the values are being read from a different source that holds shorter strings, maybe placeholders or stale entries.
> - Result: any CF API call from `ssh madcat '...'` returns `9106: Authentication failed (status: 400)`.
>
> ## Workaround (in use)
> Run CF API curls from fuji's chi shell, where the env is correct (loaded via fuji's zsh startup). Account-level API calls don't care which host hits them.
>
> ## TODO (low priority, deferred)
> 1. `ssh madcat 'cat /etc/environment ~/.profile ~/.bash_login ~/.bash_logout 2>/dev/null; systemctl --user show-environment | grep -i cloudflare'`
> 2. Find the source of the truncated values; remove or fix it.
> 3. Optionally inject the correct values via `~/.bashrc`, or via `systemctl --user set-environment` if systemd user services need them.
>
> ## NOT a blocker for code.saiden.dev work
> The cloudflared tunnel uses cert.pem (zone-scoped, not API key). Wrangler uses its own OAuth (`wrangler login`). Only flarectl/curl-based CF API calls are affected, and those run from fuji.

---

## 6. Report on bug1 serverbusy fix.

> EEMS #6440 Bug 1 (isServerBusy stale state) fix merged in PR #11.
>
> Bug: isServerBusy only flips to false on SSE `session.status` idle events. When the SSE stream drops (cloudflared tunnel timeout), the idle event never arrives, so isServerBusy stays stuck at true. The next sendPrompt() calls abortInFlight(), whose guard passes on the stale true value; the abort hits an idle server and leaves the next prompt unprocessed.
>
> Fix applied in syncStateAfterReconnect() in MadcatService.swift:
> - Added an unconditional isServerBusy=false reset when GET /session/{id} reveals the server is idle
> - Added foundAssistantMsg tracking to detect when no assistant message exists
> - If an assistant message is found but has no fresh text, reset the busy flag
> - If no assistant message is found at all, reset the busy flag
>
> This is Option 1 from the bug doc (lowest blast radius).
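For illustration, the four reset rules listed above can be sketched as a pure Python function (Python for brevity; the actual fix is Swift in MadcatService.swift and is not reproduced here). `ClientState`, `Message`, `server_idle`, and `messages` are hypothetical stand-ins for the client's busy flag, the assistant text already shown, and the data returned by GET /session/{id}. The sketch also assumes messages are ordered oldest to newest.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for one message returned by GET /session/{id}."""
    role: str
    text: str = ""


@dataclass
class ClientState:
    """Hypothetical stand-in for the client-side state in MadcatService."""
    is_server_busy: bool = False
    shown_text: str = ""  # assistant text the UI has already displayed


def sync_state_after_reconnect(state: ClientState, server_idle: bool,
                               messages: list[Message]) -> ClientState:
    """Sketch of the reset rules applied after the SSE stream reconnects."""
    # Rule 1: the server reports idle, so the local busy flag is stale.
    if server_idle:
        state.is_server_busy = False

    # Rule 2: track whether any assistant message exists at all.
    found_assistant_msg = False
    # Assumes messages are ordered oldest to newest, so scan from the end.
    for msg in reversed(messages):
        if msg.role != "assistant":
            continue
        found_assistant_msg = True
        if msg.text and msg.text != state.shown_text:
            # Fresh text: show it and leave the busy flag as rule 1 left it.
            state.shown_text = msg.text
        else:
            # Rule 3: an assistant message exists but has no fresh text.
            state.is_server_busy = False
        break  # only the newest assistant message matters

    # Rule 4: no assistant message at all.
    if not found_assistant_msg:
        state.is_server_busy = False
    return state
```

In this sketch, the only path that leaves the busy flag set is a server that is not idle combined with fresh assistant text to display.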
>
> PR: https://github.com/marauder-os/madcat-apple/pull/11
> Branch: fix/serverbusy-stale-state (merged to main)
> Diff: MadcatPhone/Services/MadcatService.swift - 16 insertions
>
> Note: AGENTS.md bug documentation already exists (not modified per task constraints).

---

## 7. Describe the phone comms break 2026 05 24 issue.

> PHONE COMMS BREAK — 5 bugs discovered during flight-recorder debug session (2026-05-24)
>
> CONTEXT: Phone agent (Opus 4.6) worked for 5 exchanges on session ses_1a650c545ffeKBpXxztJgHj2ZL, then comms broke. The flight recorder (EEMS subject phone.flight-recorder) confirmed that all server-side responses were generated correctly — the break was client-side.
>
> BUG 1 — isServerBusy stale state (CRITICAL)
> SYMPTOM: Phone gets stuck showing "thinking" after the SSE stream drops.
> ROOT CAUSE: isServerBusy is set to false ONLY by SSE `session.status idle` events. When SSE drops (cloudflared tunnel timeout on long-lived connections), `session.status idle` never arrives → isServerBusy stays true forever. The next sendPrompt() calls abortInFlight(), which fires (the guard passes because isServerBusy is stale-true) → abort hits an idle server → documented footgun: "Calling abort when the server is already idle has been observed to leave the next prompt unprocessed."
> FIX: (a) Reset isServerBusy=false in syncStateAfterReconnect unconditionally when the server session shows idle. (b) Add a staleness timeout: if isServerBusy stays true for >N seconds without any SSE event, force it false. (c) In abortInFlight, add a GET /session/{id} check to verify the server is actually busy before sending abort.
>
> BUG 2 — fetchTTS 60s timeout stalls UI silently
> SYMPTOM: Phone shows "speaking" animation but no audio plays for up to 60 seconds.
> ROOT CAUSE: fetchTTS has req.timeoutInterval = 60. If tts.saiden.dev stalls (bastion→sin WG hop), the phone sits in turnPhase="speak", eyeState="speaking" with dead air until the timeout. Only then does the catch block trigger the AVSpeech fallback.
> FIX: Reduce fetchTTS timeout to ~10s. Consider non-blocking TTS fetch with immediate AVSpeech fallback if fetch doesn't return in 5s.
>
> BUG 3 — toolStateByPart unbounded growth
> SYMPTOM: Memory leak across session lifetime (minor).
> ROOT CAUSE: toolStateByPart dictionary (used for deduplicating tool status updates) is never cleared between turns. Every tool call adds entries. Over a long session with many tool calls (Opus + EEMS flight recorder = 5+ tools/turn), this grows unbounded.
> FIX: Clear toolStateByPart in sendPrompt() alongside assistantTextByPart.
>
> BUG 4 — No visible error surface on prompt POST failure
> SYMPTOM: User sends PTT message, nothing happens, phone looks "idle" and normal.
> ROOT CAUSE: When sendPrompt POST fails (timeout, network, HTTP 500), only statusLine text changes. No alert, no haptic feedback, no persistent error indicator. Eye flips to "idle". User can't distinguish "idle and ready" from "failed silently."
> FIX: Add eyeState="alert" on prompt failure. Consider haptic feedback (UIImpactFeedbackGenerator). Show error in viewport area, not just status line.
>
> BUG 5 — createSession doesn't specify agent
> SYMPTOM: Session created with default agent instead of "phone".
> ROOT CAUSE: createSession() passes json: [:] (empty body). Server picks default agent. Works because sendPrompt() specifies "agent":"phone" per-prompt, but fragile.
> FIX: Pass {"agent": "phone"} in createSession body.
>
> FILES AFFECTED:
> - MadcatPhone/Services/MadcatService.swift — all 5 bugs
> - MadcatPhone/Services/Speech.swift — indirectly (Bug 2 callback)
> - MadcatPhone/Views/ContentView.swift — Bug 4 (no error UI)
>
> RELATED EEMS:
> - #6436 bug.boot-identity-recall-glob-failure (discovered in same phone session)
> - #6437 design.boot-identity-manifest (proposed fix for boot bug)
> - #6438, #6439 phone.flight-recorder entries

---

## 8. What do you know about newspaper issue 0004?

> BT Newspaper Issue 0004 — 2026-05-11 Late Evening.
> Sent to Kindle 21:37 CEST. message_id 19e188b0fd8d90ad. For Pilot's break 2 (00:38-01:08).
>
> Theme: priority-task-scheduling philosophy + architecture-rejected-then-vindicated + pattern-extension. Three pieces, ~180 words each:
>
> 1. **DAY-ONE WIRE — The 26-year-old who saved Apollo 11.** Margaret Hamilton, MIT, 20 July 1969. AGC 1202 alarm three minutes before lunar landing. Priority-scheduled task swapping she'd designed exactly for this case. Coined "software engineering" because male hardware engineers wouldn't take her field seriously. Daughter crashing the simulator → response was "system should survive it" not "kids shouldn't touch keyboard."
>
> 2. **OPERATOR FILE — The actress who invented Wi-Fi.** Hedy Lamarr + George Antheil, 1942. Frequency-hopping torpedo guidance, derived from Antheil's piano-roll mechanism for Ballet Mécanique. Patent 2,292,387. Navy rejected it ("too complicated"), then it was rediscovered in the 1960s → became Wi-Fi, Bluetooth, GPS military signals. Lamarr got zero royalties, token award 1997. "The navy didn't reject the idea because it was wrong. They rejected it because it came from a woman and a composer."
>
> 3. **CURIO — Why ham radio operators sign off "73."** Phillips Code 1857, Western Union telegraph shorthand for "best regards." Survived telegraph → radio → voice → digital. Pattern-extension over invention.
>
> Texture distinct from 0003: two women engineers vs 0003's all-men cast. Resonates with counter-UAS queue (Lamarr) + AmprNet queue (73 ham radio history).

---

## 9. What was the phone dual speak bug?

> KNOWN BUG: Dual TTS playback — phone AND fuji both speak on phone turns (2026-05-24).
>
> SYMPTOM: When the phone sends a prompt via phone.saiden.dev, the response plays audio on BOTH the phone (client-side fetchTTS → tts.saiden.dev → bt7274-en piper WAV) AND on fuji (the server-side LLM calls the marauder MCP speak tool).
>
> ROOT CAUSE: The phone agent's prompt was updated to say "do NOT call the speak tool", but the LLM can still call it (prompt compliance is not guaranteed). The speak tool is available to the phone agent via the marauder MCP server attached to opencode-serve on fuji. When the LLM calls speak(), piper plays audio on fuji's speakers.
>
> ATTEMPTED FIX: Phone agent prompt in ~/.config/opencode/opencode.json updated 2026-05-24 to explicitly say "do NOT call the speak tool (that plays on the server host, not the phone)". Did not fully resolve.
>
> PROPER FIX OPTIONS:
> 1. Add speak to the phone agent's tool denials in the opencode config (permission block: "speak": "deny") — prevents the tool from being available at all
> 2. Add speak/stop to fieldToolDenials() in MadcatService.swift so the phone's prompt body denies them on every turn
> 3. Both (belt and suspenders)
>
> The phone app's own TTS path (fetchTTS → tts.saiden.dev → bt7274-en WAV → AVAudioPlayer) is the correct one for phone playback. The speak tool should never fire for phone sessions.
>
> STATUS: Deferred. Noted for next phone session.

---
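Option 3 of the dual-speak fix (both layers) can be sketched in Python as a pair of small helpers; the real change would live in the opencode config and in Swift. Everything structural here is an assumption for illustration: the config layout `config["agent"]["phone"]["permission"]`, the tool IDs `speak` and `stop` (MCP tools may be registered under a server-prefixed name), and the `tools` and `parts` fields of the prompt body. The entry's option 1 mentions only `speak`; this sketch denies `stop` at both layers as well.

```python
import copy

# Hypothetical tool IDs: the marauder MCP server may expose these under a
# server-prefixed name, so confirm the real IDs before relying on them.
SERVER_AUDIO_TOOLS = ("speak", "stop")


def deny_in_config(config: dict, agent: str = "phone") -> dict:
    """Option 1 (sketch): return a copy of the config with each server-side
    audio tool set to "deny" in the agent's permission block.
    The config layout used here is an assumption."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    agent_cfg = updated.setdefault("agent", {}).setdefault(agent, {})
    permission = agent_cfg.setdefault("permission", {})
    for tool in SERVER_AUDIO_TOOLS:
        permission[tool] = "deny"
    return updated


def field_tool_denials() -> dict:
    """Option 2 (sketch): per-turn denial map, loosely modelled on
    fieldToolDenials(). Assumed shape: tool ID -> False means unavailable."""
    return {tool: False for tool in SERVER_AUDIO_TOOLS}


def build_prompt_body(text: str) -> dict:
    """Assemble a phone prompt body that names the agent and carries the
    per-turn denials (field names are assumptions)."""
    return {
        "agent": "phone",
        "parts": [{"type": "text", "text": text}],
        "tools": field_tool_denials(),
    }
```

With both layers in place, dropping one of them (for example, a config rewrite that loses the permission entry) still leaves the other denying the tool.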