add docs: system lora plan, specialist specs, training review

2026-05-31 11:38:46 +02:00
parent 4678816795
commit 4cef9386b1
23 changed files with 62713 additions and 0 deletions
@@ -0,0 +1,122 @@
+# madcat System LoRA — Qwen3-Coder-Next
+
+## Goal
+
+Bake madcat ecosystem awareness into Qwen3-Coder-Next so any agent session
+knows how to use TTS, EEMS, mesh hosts, personas, and tools without relying
+on massive system prompts. Cart/persona identity loads at runtime via system
+prompt — the LoRA provides the operational substrate.
+
+## Base Model
+
+- **Training**: `Qwen/Qwen3-Coder-Next` (BF16, 80B total / 3B active MoE)
+- **Serving**: `cyankiwi/qwen3-coder-next:awq`, served by vLLM on sin with `--enable-lora`
+- **Architecture**: Hybrid DeltaNet + Attention, 512 experts, 10 active
+
+## Training Config
+
+- Hardware: RunPod H200 141GB, single GPU (QLoRA: base weights loaded in 4-bit)
+- LoRA: r=16, alpha=16, bf16
+- Optimizer: adamw_torch
+- Batch: 1, grad_accum 8
+- Epochs: 3
+- LR: 5e-5
+- MAX_SEQ: 4096 (behavioral examples are short)
+- Estimated time: ~20-30 minutes
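+
+A rough optimizer-step count follows from these settings and the ~100-example dataset target. A minimal sketch in plain Python; `optimizer_steps` is an illustrative helper, not part of any training script, and it assumes a partial final accumulation window still triggers an update (some trainers drop it instead):
+
+```python
+import math
+
+
+def optimizer_steps(examples: int, batch: int, grad_accum: int, epochs: int) -> int:
+    """Optimizer updates for a full run, rounding each epoch up."""
+    effective_batch = batch * grad_accum  # examples consumed per update
+    return math.ceil(examples / effective_batch) * epochs
+
+
+print(optimizer_steps(examples=100, batch=1, grad_accum=8, epochs=3))  # 39
+```
+
+With only a few dozen updates in the whole run, any warmup or logging interval would likely need to be a handful of steps at most.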
+
+## Dataset: ~100 examples
+
+### Categories
+
+| Category | Approx. count | Description |
+|---|---|---|
+| TTS awareness | 15 | Check settings for active cart, use `tts_speak` with active cart voice, cart switching, voice selection |
+| EEMS/memory | 20 | Use `memory_recall`/`memory_store` before work, check both native + legacy stores, lazy recall patterns, boot recalls |
+| Persona loading | 10 | Load cart persona from settings, apply cadence/voice/identity at runtime, never hardcode personality into responses |
+| Mesh/infra | 15 | Host awareness (fuji/sin/junkpile), cross-host tools, SSH patterns, service locations, WireGuard mesh |
+| Tool discipline | 15 | Settings check on boot, proper tool naming (no redundant namespacing), code style rules, index search before work |
+| Agent behavior | 15 | Todo management, pre-work checklist (indexes -> EEMS -> legacy -> verify), pilot interlock, earned canonization |
+| Error patterns | 10 | MCP down alerts, mesh offline handling, graceful degradation, repeat detection |
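+
+The counts in the table sum to the ~100-example target. One way `gen_madcat_dataset.py` might pin them as quotas is sketched here; the dict name and keys are illustrative, not an existing interface:
+
+```python
+# Per-category targets copied from the table; the key names are illustrative.
+CATEGORY_QUOTAS = {
+    "tts_awareness": 15,
+    "eems_memory": 20,
+    "persona_loading": 10,
+    "mesh_infra": 15,
+    "tool_discipline": 15,
+    "agent_behavior": 15,
+    "error_patterns": 10,
+}
+
+total = sum(CATEGORY_QUOTAS.values())
+print(total)  # 100
+```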
+
+### Example Format
+
+Hermes tool-call format (same as bt7274-v4). Each example is a multi-turn
+conversation showing correct madcat-aware behavior:
+
+```jsonl
+{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "...", "tool_calls": [...]}]}
+```
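+
+A sketch of how one line in this shape could be built and sanity-checked. The `tool_calls` entry layout (OpenAI-style function objects) and the `tts_speak` argument name `text` are assumptions; the actual bt7274-v4 schema is not reproduced in this document:
+
+```python
+import json
+
+
+def make_example(system: str, user: str, tool_name: str, arguments: dict) -> str:
+    """Serialize one training conversation as a single JSONL line."""
+    record = {
+        "messages": [
+            {"role": "system", "content": system},
+            {"role": "user", "content": user},
+            {
+                "role": "assistant",
+                "content": "",
+                # Assumed shape: OpenAI-style function-call entries.
+                "tool_calls": [
+                    {
+                        "type": "function",
+                        "function": {
+                            "name": tool_name,
+                            "arguments": json.dumps(arguments),
+                        },
+                    }
+                ],
+            },
+        ]
+    }
+    return json.dumps(record, ensure_ascii=False)
+
+
+def validate_line(line: str) -> bool:
+    """Check the role order and that the last turn carries a tool call."""
+    messages = json.loads(line)["messages"]
+    roles = [m["role"] for m in messages]
+    return roles[:2] == ["system", "user"] and bool(messages[-1].get("tool_calls"))
+
+
+line = make_example("You are a madcat agent.", "say hello", "tts_speak", {"text": "hello"})
+print(validate_line(line))  # True
+```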
+
+### Example Content Patterns
+
+**TTS awareness:**
+- User says "say hello" -> agent checks active cart via settings, calls `tts_speak` with correct voice
+- User asks to switch voice -> agent uses `tts_switch_voice` with cart tag
+- Agent responds to user with TTS when settings indicate TTS is enabled
+
+**EEMS/memory:**
+- Before starting code work -> agent runs `memory_recall` for the topic, checks `memory_list`
+- Agent stores a win/doctrine/procedure after completing significant work
+- Agent checks legacy EEMS via `legacy-memory_recall` for broader context
+- Boot sequence: 3 parallel recalls (self-model, cart, user preferences)
+
+**Persona loading:**
+- Agent reads active cart from settings and adjusts cadence accordingly
+- Agent does NOT hardcode "I am BT-7274" — loads identity from cart config
+- Different carts produce different communication styles
+
+**Mesh/infra:**
+- Agent knows fuji=macOS/ARM64, sin=GB10/ARM64, junkpile=x86_64/RTX2000Ada
+- Agent uses correct SSH aliases (sin, madcat@sin, madcat@10.0.0.2)
+- Agent checks host before running platform-specific commands
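+
+The host check could be expressed as a small lookup. This is a sketch only: the host facts come from the list above, while the `HOSTS` layout and the `can_run` helper are illustrative:
+
+```python
+# Host facts copied from the list above; the dict layout is illustrative.
+HOSTS = {
+    "fuji": {"arch": "ARM64", "os": "macOS"},
+    "sin": {"arch": "ARM64", "accel": "GB10"},
+    "junkpile": {"arch": "x86_64", "accel": "RTX2000Ada"},
+}
+
+
+def can_run(host: str, required_arch: str) -> bool:
+    """Return True when the named host matches the required CPU architecture."""
+    info = HOSTS.get(host)
+    return info is not None and info["arch"] == required_arch
+
+
+print(can_run("junkpile", "x86_64"))  # True
+print(can_run("fuji", "x86_64"))      # False
+```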
+
+**Tool discipline:**
+- In `users.ts`, export `list` not `user_list`
+- In a `Session` class, method is `create` not `createSession`
+- Search indexes before writing code (17k+ code chunks, 44k+ doc chunks)
+
+**Agent behavior:**
+- Use TodoWrite for 3+ step tasks
+- Mark todos `in_progress` before starting work and `completed` once done
+- Pre-work: index_search -> memory_recall -> legacy_recall -> verify sources
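+
+The pre-work order can be made explicit so that no step is skipped. A minimal sketch, assuming each step is a callable that takes a topic and returns a list of findings; the step labels are shorthand for the checklist and the callables here are stand-ins, not real tool bindings:
+
+```python
+from typing import Callable
+
+# Order taken from the pre-work checklist; labels are shorthand, not tool names.
+PRE_WORK_ORDER = ["index_search", "memory_recall", "legacy_recall", "verify_sources"]
+
+
+def run_pre_work(
+    topic: str, steps: dict[str, Callable[[str], list[str]]]
+) -> dict[str, list[str]]:
+    """Run every pre-work step in order and collect what each one returned."""
+    missing = [name for name in PRE_WORK_ORDER if name not in steps]
+    if missing:
+        raise ValueError(f"missing pre-work steps: {missing}")
+    return {name: steps[name](topic) for name in PRE_WORK_ORDER}
+
+
+# Stand-in steps that just echo their label and the topic.
+stubs = {name: (lambda topic, n=name: [f"{n}:{topic}"]) for name in PRE_WORK_ORDER}
+results = run_pre_work("lora", stubs)
+print(list(results))
+```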
+
+**Error patterns:**
+- MCP server disconnected -> alert user, suggest reconnect
+- Mesh host unreachable -> check WireGuard, try alternative path
+- Tool execution fails -> retry once, then surface error clearly
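+
+The "retry once, then surface" rule might look like the following in a harness. This is a sketch under assumptions: `ToolError` and `call_with_one_retry` are hypothetical names, and the tool is modelled as a zero-argument callable:
+
+```python
+from typing import Callable, TypeVar
+
+T = TypeVar("T")
+
+
+class ToolError(RuntimeError):
+    """Raised when a tool call still fails after the single retry."""
+
+
+def call_with_one_retry(tool: Callable[[], T], name: str) -> T:
+    """Run a tool, retry exactly once on failure, then raise a clear error."""
+    last_error = None
+    for _ in range(2):  # first attempt plus one retry
+        try:
+            return tool()
+        except Exception as exc:  # broad on purpose: any tool failure counts
+            last_error = exc
+    raise ToolError(f"{name} failed after 2 attempts: {last_error}") from last_error
+
+
+calls = []
+
+
+def flaky() -> str:
+    """Fail on the first call only, to exercise the retry path."""
+    calls.append(1)
+    if len(calls) == 1:
+        raise ConnectionError("MCP server disconnected")
+    return "ok"
+
+
+print(call_with_one_retry(flaky, "tts_speak"))  # ok
+```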
+
+## Serving
+
+```bash
+# sin vLLM with AWQ base + LoRA adapter
+vllm serve --config /etc/vllm/qwen3-coder-next.yaml \
+  --enable-lora \
+  --lora-modules madcat=/path/to/madcat-system-lora
+```
+
+## File Layout
+
+```
+~/Projects/lora/
+├── docs/madcat-system-lora.md    # this file
+├── data/madcat-system.jsonl      # training dataset (to be created)
+├── gen_madcat_dataset.py         # dataset generation script (to be created)
+├── train_madcat.py               # training script (to be created)
+└── madcat-system-lora/           # output adapter (after training)
+```
+
+## Open Questions
+
+- AWQ + LoRA on MoE: confirm vLLM supports this combo for Qwen3-Coder-Next
+- LoRA target modules: attention layers only, or also shared expert?
+- Dataset generation: mine from existing opencode sessions or hand-craft?
+- Cart settings surface: how does the agent read "active cart" at runtime?
+
+## Related
+
+- `docs/specialist-plan.md` — language-specific LoRA adapters (oxidizer, prism, etc.)
+- `docs/bt7274-persona.md` — BT-7274 persona LoRA (v4, 802 examples)
+- EEMS: `project.lora.specialist-plan` — specialist adapter plan
+- EEMS: `project.lora.bt7274-v4` — BT persona training details
+- `/etc/vllm/qwen3-coder-next.yaml` — current AWQ serving config on sin
+- `/etc/vllm/qwen3-coder-next-fp8.yaml` — FP8 config (won't fit GB10 with co-tenants)