# madcat System LoRA: Qwen3-Coder-Next

## Goal

Bake madcat ecosystem awareness into Qwen3-Coder-Next so any agent session knows how to use TTS, EEMS, mesh hosts, personas, and tools without relying on massive system prompts. Cart/persona identity loads at runtime via system prompt; the LoRA provides the operational substrate.

## Base Model

- **Training**: `Qwen/Qwen3-Coder-Next` (BF16, 80B total / 3B active MoE)
- **Serving**: `cyankiwi/qwen3-coder-next:awq` on sin vLLM with `--enable-lora`
- **Architecture**: Hybrid DeltaNet + Attention, 512 experts, 10 active

## Training Config

- Hardware: RunPod H200 141GB (QLoRA 4-bit on single GPU)
- LoRA: r=16, alpha=16, bf16
- Optimizer: adamw_torch
- Batch: 1, grad_accum 8
- Epochs: 3
- LR: 5e-5
- MAX_SEQ: 4096 (behavioral examples are short)
- Estimated time: ~20-30 minutes

## Dataset: ~100 examples

### Categories

| Category | ~Count | Description |
|---|---|---|
| TTS awareness | 15 | Check settings for active cart, use `tts_speak` with active cart voice, cart switching, voice selection |
| EEMS/memory | 20 | Use `memory_recall`/`memory_store` before work, check both native + legacy stores, lazy recall patterns, boot recalls |
| Persona loading | 10 | Load cart persona from settings, apply cadence/voice/identity at runtime, never hardcode personality into responses |
| Mesh/infra | 15 | Host awareness (fuji/sin/junkpile), cross-host tools, SSH patterns, service locations, WireGuard mesh |
| Tool discipline | 15 | Settings check on boot, proper tool naming (no redundant namespacing), code style rules, index search before work |
| Agent behavior | 15 | Todo management, pre-work checklist (indexes -> EEMS -> legacy -> verify), pilot interlock, earned canonization |
| Error patterns | 10 | MCP down alerts, mesh offline handling, graceful degradation, repeat detection |

### Example Format

Hermes tool-call format (same as bt7274-v4).
Each example is a multi-turn conversation showing correct madcat-aware behavior:

```jsonl
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "...", "tool_calls": [...]}]}
```

### Example Content Patterns

**TTS awareness:**
- User says "say hello" -> agent checks active cart via settings, calls `tts_speak` with correct voice
- User asks to switch voice -> agent uses `tts_switch_voice` with cart tag
- Agent responds to user with TTS when settings indicate TTS is enabled

**EEMS/memory:**
- Before starting code work -> agent runs `memory_recall` for the topic, checks `memory_list`
- Agent stores a win/doctrine/procedure after completing significant work
- Agent checks legacy EEMS via `legacy-memory_recall` for broader context
- Boot sequence: 3 parallel recalls (self-model, cart, user preferences)

**Persona loading:**
- Agent reads active cart from settings and adjusts cadence accordingly
- Agent does NOT hardcode "I am BT-7274"; it loads identity from cart config
- Different carts produce different communication styles

**Mesh/infra:**
- Agent knows fuji=macOS/ARM64, sin=GB10/ARM64, junkpile=x86_64/RTX2000Ada
- Agent uses correct SSH aliases (sin, madcat@sin, madcat@10.0.0.2)
- Agent checks host before running platform-specific commands

**Tool discipline:**
- In `users.ts`, export `list` not `user_list`
- In a `Session` class, method is `create` not `createSession`
- Search indexes before writing code (17k+ code chunks, 44k+ doc chunks)

**Agent behavior:**
- Use TodoWrite for 3+ step tasks
- Mark todos in_progress before working, completed after done
- Pre-work: index_search -> memory_recall -> legacy_recall -> verify sources

**Error patterns:**
- MCP server disconnected -> alert user, suggest reconnect
- Mesh host unreachable -> check WireGuard, try alternative path
- Tool execution fails -> retry once, then surface error clearly

## Serving

```bash
# sin vLLM with AWQ base + LoRA adapter
vllm serve --config /etc/vllm/qwen3-coder-next.yaml \
  --enable-lora \
  --lora-modules madcat=/path/to/madcat-system-lora
```

## File Layout

```
~/Projects/lora/
├── docs/madcat-system-lora.md    # this file
├── data/madcat-system.jsonl      # training dataset (to be created)
├── gen_madcat_dataset.py         # dataset generation script (to be created)
├── train_madcat.py               # training script (to be created)
└── madcat-system-lora/           # output adapter (after training)
```

## Open Questions

- AWQ + LoRA on MoE: confirm vLLM supports this combo for Qwen3-Coder-Next
- LoRA target modules: attention layers only, or also shared expert?
- Dataset generation: mine from existing opencode sessions or hand-craft?
- Cart settings surface: how does the agent read "active cart" at runtime?

## Related

- `docs/specialist-plan.md`: language-specific LoRA adapters (oxidizer, prism, etc.)
- `docs/bt7274-persona.md`: BT-7274 persona LoRA (v4, 802 examples)
- EEMS: `project.lora.specialist-plan` (specialist adapter plan)
- EEMS: `project.lora.bt7274-v4` (BT persona training details)
- `/etc/vllm/qwen3-coder-next.yaml`: current AWQ serving config on sin
- `/etc/vllm/qwen3-coder-next-fp8.yaml`: FP8 config (won't fit GB10 with co-tenants)
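
## Sketch: Dataset Sanity Check

Since `data/madcat-system.jsonl` does not exist yet, a small check on the example format and the per-category targets may be useful before training. The following is a minimal stdlib-only sketch, not part of the plan itself. The function names, the short category keys, the set of allowed roles (including `tool`), and the extra `category` field on each record are all assumptions made for illustration; only the `messages`/`role`/`content`/`tool_calls` shape and the target counts come from this document.

```python
import json
from collections import Counter

# Target counts copied from the Categories table.
# The short keys are hypothetical labels, not names defined by the plan.
CATEGORY_TARGETS = {
    "tts": 15,
    "eems": 20,
    "persona": 10,
    "mesh": 15,
    "tools": 15,
    "behavior": 15,
    "errors": 10,
}

# Assumed role set for Hermes-style transcripts.
VALID_ROLES = {"system", "user", "assistant", "tool"}


def validate_example(record):
    """Return a list of problems found in one parsed JSONL record."""
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: not an object")
            continue
        role = msg.get("role")
        if role not in VALID_ROLES:
            problems.append(f"message {i}: unexpected role {role!r}")
        if "content" not in msg and "tool_calls" not in msg:
            problems.append(f"message {i}: no content or tool_calls")
        if "tool_calls" in msg and role != "assistant":
            problems.append(f"message {i}: tool_calls on non-assistant turn")
    has_assistant = any(
        isinstance(m, dict) and m.get("role") == "assistant" for m in messages
    )
    if not has_assistant:
        problems.append("no assistant turn to learn from")
    return problems


def validate_lines(lines):
    """Validate an iterable of JSONL lines; return (category counts, errors)."""
    counts = Counter()
    errors = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            errors.append((lineno, "record is not a JSON object"))
            continue
        for problem in validate_example(record):
            errors.append((lineno, problem))
        # "category" is an assumed extra field, not part of the format above.
        counts[record.get("category", "unlabeled")] += 1
    return counts, errors


if __name__ == "__main__":
    with open("data/madcat-system.jsonl", encoding="utf-8") as fh:
        counts, errors = validate_lines(fh)
    for lineno, problem in errors:
        print(f"line {lineno}: {problem}")
    for name, target in CATEGORY_TARGETS.items():
        print(f"{name}: {counts.get(name, 0)}/{target}")
```

Run from `~/Projects/lora/`, this would list malformed records and show how far each category is from its target. If the real dataset does not carry a `category` field, every record will simply be counted as `unlabeled`.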