madcat System LoRA — Qwen3-Coder-Next
Goal
Bake madcat ecosystem awareness into Qwen3-Coder-Next so any agent session knows how to use TTS, EEMS, mesh hosts, personas, and tools without relying on massive system prompts. Cart/persona identity loads at runtime via system prompt — the LoRA provides the operational substrate.
Base Model
- Training: Qwen/Qwen3-Coder-Next (BF16, 80B total / 3B active MoE)
- Serving: cyankiwi/qwen3-coder-next:awq on sin vLLM with --enable-lora
- Architecture: Hybrid DeltaNet + Attention, 512 experts, 10 active
Training Config
- Hardware: RunPod H200 141GB (QLoRA 4-bit on single GPU)
- LoRA: r=16, alpha=16, bf16
- Optimizer: adamw_torch
- Batch: 1, grad_accum 8
- Epochs: 3
- LR: 5e-5
- MAX_SEQ: 4096 (behavioral examples are short)
- Estimated time: ~20-30 minutes
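As a rough sanity check on the numbers above, the following sketch works out the effective batch size, the optimizer step count, and the LoRA scaling factor. It assumes exactly 100 examples (the dataset is only stated as ~100) and that a trailing partial accumulation window still produces an optimizer step; the `TrainPlan` class is purely illustrative and not part of any training script.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainPlan:
    # Values copied from the Training Config list; num_examples is approximate.
    num_examples: int = 100
    batch_size: int = 1
    grad_accum: int = 8
    epochs: int = 3
    lora_r: int = 16
    lora_alpha: int = 16

    @property
    def effective_batch(self) -> int:
        # Examples consumed per optimizer step.
        return self.batch_size * self.grad_accum

    @property
    def steps_per_epoch(self) -> int:
        # Assumption: a partial final accumulation window still steps.
        return math.ceil(self.num_examples / self.effective_batch)

    @property
    def total_steps(self) -> int:
        return self.steps_per_epoch * self.epochs

    @property
    def lora_scale(self) -> float:
        # Standard LoRA scaling factor alpha / r.
        return self.lora_alpha / self.lora_r


plan = TrainPlan()
print(plan.effective_batch, plan.steps_per_epoch, plan.total_steps, plan.lora_scale)
```

Under these assumptions the whole run is only a few dozen optimizer steps, so any warmup or logging interval should be sized accordingly.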
Dataset: ~100 examples
Categories
| Category | ~Count | Description |
|---|---|---|
| TTS awareness | 15 | Check settings for active cart, use tts_speak with active cart voice, cart switching, voice selection |
| EEMS/memory | 20 | Use memory_recall/memory_store before work, check both native + legacy stores, lazy recall patterns, boot recalls |
| Persona loading | 10 | Load cart persona from settings, apply cadence/voice/identity at runtime, never hardcode personality into responses |
| Mesh/infra | 15 | Host awareness (fuji/sin/junkpile), cross-host tools, SSH patterns, service locations, WireGuard mesh |
| Tool discipline | 15 | Settings check on boot, proper tool naming (no redundant namespacing), code style rules, index search before work |
| Agent behavior | 15 | Todo management, pre-work checklist (indexes -> EEMS -> legacy -> verify), pilot interlock, earned canonization |
| Error patterns | 10 | MCP down alerts, mesh offline handling, graceful degradation, repeat detection |
Example Format
Hermes tool-call format (same as bt7274-v4). Each example is a multi-turn conversation showing correct madcat-aware behavior:
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "...", "tool_calls": [...]}]}
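A minimal sketch of how one record in this shape could be built and checked before being written to the JSONL file. The `validate_example` helper, the allowed role set, and the `tts_speak` argument names are assumptions for illustration; the real tool schemas and the exact bt7274-v4 tool-call layout are not specified here.

```python
import json

# Assumed role set; adjust to whatever the chat template actually accepts.
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def validate_example(example: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks well-formed."""
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    problems = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unknown role {role!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i}: 'content' must be a string")
        if "tool_calls" in msg and role != "assistant":
            problems.append(f"message {i}: only assistant messages may carry tool_calls")
    return problems


example = {
    "messages": [
        {"role": "system", "content": "Active cart is provided at runtime."},
        {"role": "user", "content": "say hello"},
        {
            "role": "assistant",
            "content": "",
            "tool_calls": [
                # Hypothetical argument names; the real tts_speak schema may differ.
                {"type": "function",
                 "function": {"name": "tts_speak", "arguments": {"text": "Hello"}}},
            ],
        },
    ]
}

# One JSON object per line is what a .jsonl dataset file expects.
line = json.dumps(example)
print(validate_example(example), len(line) > 0)
```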
Example Content Patterns
TTS awareness:
- User says "say hello" -> agent checks active cart via settings, calls tts_speak with correct voice
- User asks to switch voice -> agent uses tts_switch_voice with cart tag
- Agent responds to user with TTS when settings indicate TTS is enabled
EEMS/memory:
- Before starting code work -> agent runs memory_recall for the topic, checks memory_list
- Agent stores a win/doctrine/procedure after completing significant work
- Agent checks legacy EEMS via legacy-memory_recall for broader context
- Boot sequence: 3 parallel recalls (self-model, cart, user preferences)
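The boot sequence of three parallel recalls could be sketched as below. The `memory_recall` coroutine here is a stand-in stub, not the real EEMS tool, and its return shape is an assumption; only the concurrency pattern is the point.

```python
import asyncio


async def memory_recall(topic: str) -> dict:
    # Stub standing in for the real memory_recall tool call.
    await asyncio.sleep(0)
    return {"topic": topic, "hits": []}


async def boot_recalls() -> dict:
    # The three boot topics named in the list above.
    topics = ["self-model", "cart", "user preferences"]
    # gather runs the recalls concurrently and returns results in input order.
    results = await asyncio.gather(*(memory_recall(t) for t in topics))
    return dict(zip(topics, results))


boot_state = asyncio.run(boot_recalls())
print(list(boot_state))
```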
Persona loading:
- Agent reads active cart from settings and adjusts cadence accordingly
- Agent does NOT hardcode "I am BT-7274" — loads identity from cart config
- Different carts produce different communication styles
Mesh/infra:
- Agent knows fuji=macOS/ARM64, sin=GB10/ARM64, junkpile=x86_64/RTX2000Ada
- Agent uses correct SSH aliases (sin, madcat@sin, madcat@10.0.0.2)
- Agent checks host before running platform-specific commands
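One way to picture the "check host before running platform-specific commands" behavior is a small lookup table. The host facts come from the list above; the `Host` dataclass and the helper names are hypothetical, and the architecture strings are normalized to lowercase for comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    arch: str
    note: str


# Facts taken from the Mesh/infra list; structure is illustrative only.
HOSTS = {
    "fuji": Host("fuji", "arm64", "macOS"),
    "sin": Host("sin", "arm64", "GB10"),
    "junkpile": Host("junkpile", "x86_64", "RTX2000Ada"),
}


def can_run(host_name: str, required_arch: str) -> bool:
    """True only if the host is known and its architecture matches."""
    host = HOSTS.get(host_name)
    return host is not None and host.arch == required_arch


def hosts_for(required_arch: str) -> list[str]:
    """Names of all known hosts with the given architecture, sorted."""
    return sorted(name for name, h in HOSTS.items() if h.arch == required_arch)


print(can_run("junkpile", "x86_64"), hosts_for("arm64"))
```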
Tool discipline:
- In users.ts, export list not user_list
- In a Session class, method is create not createSession
- Search indexes before writing code (17k+ code chunks, 44k+ doc chunks)
Agent behavior:
- Use TodoWrite for 3+ step tasks
- Mark todos in_progress before working, completed after done
- Pre-work: index_search -> memory_recall -> legacy_recall -> verify sources
Error patterns:
- MCP server disconnected -> alert user, suggest reconnect
- Mesh host unreachable -> check WireGuard, try alternative path
- Tool execution fails -> retry once, then surface error clearly
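The "retry once, then surface error clearly" pattern might look like the sketch below. `ToolError` and `call_with_one_retry` are hypothetical names introduced for illustration; the real agent runtime may handle retries differently.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ToolError(RuntimeError):
    """Raised when a tool call still fails after the single retry."""


def call_with_one_retry(tool: Callable[[], T], name: str) -> T:
    last_error = None
    # Two attempts total: the original call plus exactly one retry.
    for _attempt in (1, 2):
        try:
            return tool()
        except Exception as exc:
            last_error = exc
    # Surface the failure with the tool name and the underlying cause.
    raise ToolError(f"{name} failed after 2 attempts: {last_error}") from last_error


calls = {"n": 0}


def flaky() -> str:
    # Fails on the first call only, simulating a transient disconnect.
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("MCP server disconnected")
    return "ok"


result = call_with_one_retry(flaky, "memory_recall")
print(result, calls["n"])
```

Capping the loop at two attempts keeps a persistently failing tool from stalling the session while still absorbing a single transient fault.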
Serving
# sin vLLM with AWQ base + LoRA adapter
vllm serve --config /etc/vllm/qwen3-coder-next.yaml \
--enable-lora \
--lora-modules madcat=/path/to/madcat-system-lora
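With vLLM's OpenAI-compatible server, an adapter registered through --lora-modules is selected per request by passing its name as the model. A minimal client-side sketch follows; the base URL http://sin:8000 is an assumption (vLLM's default port, with the host name taken from this document), and `build_chat_request` is a hypothetical helper. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_chat_request(base_url: str, user_text: str,
                       adapter: str = "madcat") -> urllib.request.Request:
    payload = {
        # The LoRA adapter name from --lora-modules doubles as the model name.
        "model": adapter,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed address; replace with the actual serving endpoint.
req = build_chat_request("http://sin:8000", "say hello")
# Sending is left to the caller, e.g. urllib.request.urlopen(req).
print(req.full_url)
```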
File Layout
~/Projects/lora/
├── docs/madcat-system-lora.md # this file
├── data/madcat-system.jsonl # training dataset (to be created)
├── gen_madcat_dataset.py # dataset generation script (to be created)
├── train_madcat.py # training script (to be created)
└── madcat-system-lora/ # output adapter (after training)
Open Questions
- AWQ + LoRA on MoE: confirm vLLM supports this combo for Qwen3-Coder-Next
- LoRA target modules: attention layers only, or also shared expert?
- Dataset generation: mine from existing opencode sessions or hand-craft?
- Cart settings surface: how does the agent read "active cart" at runtime?
Related
- docs/specialist-plan.md - language-specific LoRA adapters (oxidizer, prism, etc.)
- docs/bt7274-persona.md - BT-7274 persona LoRA (v4, 802 examples)
- EEMS: project.lora.specialist-plan - specialist adapter plan
- EEMS: project.lora.bt7274-v4 - BT persona training details
- /etc/vllm/qwen3-coder-next.yaml - current AWQ serving config on sin
- /etc/vllm/qwen3-coder-next-fp8.yaml - FP8 config (won't fit GB10 with co-tenants)