
# madcat System LoRA — Qwen3-Coder-Next

## Goal

Bake madcat ecosystem awareness into Qwen3-Coder-Next so any agent session
knows how to use TTS, EEMS, mesh hosts, personas, and tools without relying
on massive system prompts. Cart/persona identity loads at runtime via system
prompt — the LoRA provides the operational substrate.

## Base Model

- **Training**: `Qwen/Qwen3-Coder-Next` (BF16, 80B total / 3B active MoE)
- **Serving**: `cyankiwi/qwen3-coder-next:awq` via vLLM on sin, started with `--enable-lora`
- **Architecture**: Hybrid DeltaNet + Attention, 512 experts, 10 active

## Training Config

- Hardware: RunPod H200 141GB (QLoRA 4-bit on single GPU)
- LoRA: r=16, alpha=16, bf16
- Optimizer: adamw_torch
- Batch: 1, grad_accum 8 (effective batch size 8)
- Epochs: 3
- LR: 5e-5
- MAX_SEQ: 4096 (behavioral examples are short)
- Estimated time: ~20-30 minutes
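
As a sanity check on the numbers above, the following sketch derives the effective batch size, the LoRA scaling factor (alpha / r), and the number of optimizer steps for the planned ~100 examples. The `TrainConfig` class is hypothetical and exists only for this arithmetic; it assumes a trailing partial accumulation window still counts as one optimizer step.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters listed above."""
    lora_r: int = 16
    lora_alpha: int = 16
    batch_size: int = 1
    grad_accum: int = 8
    epochs: int = 3
    learning_rate: float = 5e-5
    max_seq: int = 4096

    @property
    def effective_batch(self) -> int:
        # Examples consumed per optimizer step.
        return self.batch_size * self.grad_accum

    @property
    def lora_scale(self) -> float:
        # Standard LoRA scaling applied to the adapter update: alpha / r.
        return self.lora_alpha / self.lora_r

    def optimizer_steps(self, num_examples: int) -> int:
        # Assumption: a partial final accumulation window counts as a step.
        steps_per_epoch = math.ceil(num_examples / self.effective_batch)
        return steps_per_epoch * self.epochs


cfg = TrainConfig()
print(cfg.effective_batch, cfg.lora_scale, cfg.optimizer_steps(100))
# prints: 8 1.0 39
```

With only 39 optimizer steps in total, any warmup or logging interval in the eventual training script likely needs to be set in single-digit steps.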

## Dataset: ~100 examples

### Categories

| Category | ~Count | Description |
|---|---|---|
| TTS awareness | 15 | Check settings for active cart, use `tts_speak` with active cart voice, cart switching, voice selection |
| EEMS/memory | 20 | Run `memory_recall` before work and `memory_store` after significant work, check both native + legacy stores, lazy recall patterns, boot recalls |
| Persona loading | 10 | Load cart persona from settings, apply cadence/voice/identity at runtime, never hardcode personality into responses |
| Mesh/infra | 15 | Host awareness (fuji/sin/junkpile), cross-host tools, SSH patterns, service locations, WireGuard mesh |
| Tool discipline | 15 | Settings check on boot, proper tool naming (no redundant namespacing), code style rules, index search before work |
| Agent behavior | 15 | Todo management, pre-work checklist (indexes -> EEMS -> legacy -> verify), pilot interlock, earned canonization |
| Error patterns | 10 | MCP down alerts, mesh offline handling, graceful degradation, repeat detection |
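
The category targets above can be tracked while the dataset is being written. This is a minimal sketch that assumes each draft record carries a `category` field; that field is hypothetical bookkeeping and is not part of the training format described below.

```python
from collections import Counter

# Target counts copied from the table above.
CATEGORY_COUNTS = {
    "TTS awareness": 15,
    "EEMS/memory": 20,
    "Persona loading": 10,
    "Mesh/infra": 15,
    "Tool discipline": 15,
    "Agent behavior": 15,
    "Error patterns": 10,
}


def coverage_gaps(records, plan=CATEGORY_COUNTS):
    """Return {category: examples still missing} for under-filled categories."""
    # `category` is a hypothetical per-record tag used only for tracking.
    seen = Counter(r.get("category") for r in records)
    return {
        name: target - seen[name]
        for name, target in plan.items()
        if seen[name] < target
    }


print(sum(CATEGORY_COUNTS.values()))  # prints: 100
```

Calling `coverage_gaps(drafts)` on a partially written dataset shows which categories still need examples before the ~100 target is met.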

### Example Format

Hermes tool-call format (same as bt7274-v4). Each example is a multi-turn
conversation showing correct madcat-aware behavior:

```jsonl
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "...", "tool_calls": [...]}]}
```
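
A small structural check can catch malformed lines before training. The sketch below validates only what the format above shows (a `messages` list of role-tagged entries, with `tool_calls` on assistant turns). The accepted role set, including `tool`, and the shape of the sample tool call (`name`, `arguments`, and the `text` argument to `tts_speak`) are assumptions, since the format above elides them.

```python
import json

# Assumption: these are the only roles the dataset uses.
VALID_ROLES = {"system", "user", "assistant", "tool"}


def validate_example(line: str) -> list[str]:
    """Return a list of problems for one JSONL line (empty if none found)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: not an object")
            continue
        role = msg.get("role")
        if role not in VALID_ROLES:
            problems.append(f"message {i}: unexpected role {role!r}")
        if "tool_calls" in msg:
            if role != "assistant":
                problems.append(f"message {i}: tool_calls on non-assistant message")
            if not isinstance(msg["tool_calls"], list):
                problems.append(f"message {i}: tool_calls is not a list")
    return problems


# Hypothetical example line; the tool-call payload shape is illustrative only.
sample = json.dumps({
    "messages": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "say hello"},
        {
            "role": "assistant",
            "content": "",
            "tool_calls": [{"name": "tts_speak", "arguments": {"text": "hello"}}],
        },
    ]
})
print(validate_example(sample))  # prints: []
```

Running `validate_example` over every line of `data/madcat-system.jsonl` and reporting the line numbers with non-empty results is enough for a pre-training gate.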

### Example Content Patterns

**TTS awareness:**
- User says "say hello" -> agent checks active cart via settings, calls `tts_speak` with correct voice
- User asks to switch voice -> agent uses `tts_switch_voice` with cart tag
- Agent responds to user with TTS when settings indicate TTS is enabled

**EEMS/memory:**
- Before starting code work -> agent runs `memory_recall` for the topic, checks `memory_list`
- Agent stores a win/doctrine/procedure after completing significant work
- Agent checks legacy EEMS via `legacy-memory_recall` for broader context
- Boot sequence: 3 parallel recalls (self-model, cart, user preferences)
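
The boot sequence in the last bullet can be sketched as three concurrent recalls. Here `memory_recall` is a hypothetical async stub standing in for the real tool, and the query strings are placeholders for whatever keys EEMS actually uses.

```python
import asyncio

# Placeholder query labels for the three boot recalls named above.
BOOT_QUERIES = ("self-model", "cart", "user preferences")


async def memory_recall(query: str) -> dict:
    """Hypothetical stand-in for the real `memory_recall` tool call."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return {"query": query, "hits": []}


async def boot_recalls() -> dict:
    # gather() runs the three recalls concurrently and preserves input order.
    results = await asyncio.gather(*(memory_recall(q) for q in BOOT_QUERIES))
    return dict(zip(BOOT_QUERIES, results))


if __name__ == "__main__":
    print(asyncio.run(boot_recalls()))
```

Issuing the recalls concurrently keeps boot latency close to the slowest single recall instead of the sum of all three.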

**Persona loading:**
- Agent reads active cart from settings and adjusts cadence accordingly
- Agent does NOT hardcode "I am BT-7274" — loads identity from cart config
- Different carts produce different communication styles

**Mesh/infra:**
- Agent knows fuji=macOS/ARM64, sin=GB10/ARM64, junkpile=x86_64/RTX2000Ada
- Agent uses correct SSH aliases (sin, madcat@sin, madcat@10.0.0.2)
- Agent checks host before running platform-specific commands
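
The host check in the last bullet amounts to a lookup against known host facts before dispatching a command. A minimal sketch, using only the host attributes listed above; the `HOSTS` table layout and both function names are hypothetical.

```python
# Host facts copied from the list above; the dict layout is illustrative.
HOSTS = {
    "fuji": {"arch": "ARM64", "os": "macOS"},
    "sin": {"arch": "ARM64", "accelerator": "GB10"},
    "junkpile": {"arch": "x86_64", "accelerator": "RTX2000Ada"},
}


def can_run(host: str, required_arch: str) -> bool:
    """True if `host` is known and matches the required CPU architecture."""
    if host not in HOSTS:
        raise KeyError(f"unknown mesh host: {host}")
    return HOSTS[host]["arch"] == required_arch


def hosts_with_arch(arch: str) -> list[str]:
    """List known hosts with the given architecture, in table order."""
    return [name for name, info in HOSTS.items() if info["arch"] == arch]


print(can_run("junkpile", "x86_64"))  # prints: True
print(hosts_with_arch("ARM64"))       # prints: ['fuji', 'sin']
```

Raising on an unknown host, rather than returning `False`, keeps a typo in a host name from being mistaken for an architecture mismatch.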

**Tool discipline:**
- In `users.ts`, export `list` not `user_list`
- In a `Session` class, method is `create` not `createSession`
- Search indexes before writing code (17k+ code chunks, 44k+ doc chunks)

**Agent behavior:**
- Use TodoWrite for 3+ step tasks
- Mark todos in_progress before starting work and completed once done
- Pre-work: index_search -> memory_recall -> legacy-memory_recall -> verify sources

**Error patterns:**
- MCP server disconnected -> alert user, suggest reconnect
- Mesh host unreachable -> check WireGuard, try alternative path
- Tool execution fails -> retry once, then surface error clearly
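
The "retry once, then surface error clearly" rule from the last bullet can be expressed as a small wrapper. This is a sketch of the behavior the examples should demonstrate, not the agent runtime's actual retry logic; `ToolError` and `call_with_single_retry` are hypothetical names.

```python
class ToolError(RuntimeError):
    """Raised when a tool call fails on both the first attempt and the retry."""


def call_with_single_retry(tool, *args, **kwargs):
    """Call `tool`; on failure retry exactly once, then raise a clear error."""
    try:
        return tool(*args, **kwargs)
    except Exception as first_error:
        try:
            return tool(*args, **kwargs)
        except Exception as second_error:
            name = getattr(tool, "__name__", repr(tool))
            # Surface both failures so the user sees what actually went wrong.
            raise ToolError(
                f"{name} failed twice: first attempt: {first_error!r}; "
                f"retry: {second_error!r}"
            ) from second_error
```

Capping the retry at one attempt avoids the repeat loops that the error-pattern category is meant to train against.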

## Serving

```bash
# sin vLLM with AWQ base + LoRA adapter
vllm serve --config /etc/vllm/qwen3-coder-next.yaml \
  --enable-lora \
  --lora-modules madcat=/path/to/madcat-system-lora
```
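
Once the server is up, vLLM's OpenAI-compatible API selects the adapter by the name registered in `--lora-modules`, so a client requests `model: "madcat"`. A minimal client sketch using only the standard library; the base URL (host `sin`, port 8000) is an assumption and depends on what the YAML config actually sets.

```python
import json
import urllib.request

# Assumption: vLLM listens on its default port on sin; adjust to the real config.
BASE_URL = "http://sin:8000/v1"


def build_chat_request(user_text: str, system_prompt: str, model: str = "madcat"):
    """Build a chat-completions request that targets the LoRA adapter by name."""
    payload = {
        "model": model,  # adapter name from --lora-modules, not the base model
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Passing the base model's name instead of `madcat` in the same request would bypass the adapter, which gives a quick A/B comparison of adapter versus base behavior.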

## File Layout

```
~/Projects/lora/
├── docs/madcat-system-lora.md    # this file
├── data/madcat-system.jsonl      # training dataset (to be created)
├── gen_madcat_dataset.py         # dataset generation script (to be created)
├── train_madcat.py               # training script (to be created)
└── madcat-system-lora/           # output adapter (after training)
```

## Open Questions

- AWQ + LoRA on MoE: confirm vLLM supports this combo for Qwen3-Coder-Next
- LoRA target modules: attention layers only, or also shared expert?
- Dataset generation: mine from existing opencode sessions or hand-craft?
- Cart settings surface: how does the agent read "active cart" at runtime?

## Related

- `docs/specialist-plan.md` — language-specific LoRA adapters (oxidizer, prism, etc.)
- `docs/bt7274-persona.md` — BT-7274 persona LoRA (v4, 802 examples)
- EEMS: `project.lora.specialist-plan` — specialist adapter plan
- EEMS: `project.lora.bt7274-v4` — BT persona training details
- `/etc/vllm/qwen3-coder-next.yaml` — current AWQ serving config on sin
- `/etc/vllm/qwen3-coder-next-fp8.yaml` — FP8 config (won't fit GB10 with co-tenants)