madcat System LoRA — Qwen3-Coder-Next

Goal

Bake madcat ecosystem awareness into Qwen3-Coder-Next so that any agent session knows how to use TTS, EEMS, mesh hosts, personas, and tools without relying on massive system prompts. Cart/persona identity still loads at runtime via the system prompt; the LoRA provides only the operational substrate.

Base Model

Training: Qwen/Qwen3-Coder-Next (BF16, 80B total / 3B active MoE)
Serving: cyankiwi/qwen3-coder-next:awq on sin vLLM with --enable-lora
Architecture: Hybrid DeltaNet + Attention, 512 experts, 10 active

Training Config

Hardware: RunPod H200 141GB (QLoRA 4-bit on single GPU)
LoRA: r=16, alpha=16, bf16
Optimizer: adamw_torch
Batch: 1, grad_accum 8
Epochs: 3
LR: 5e-5
MAX_SEQ: 4096 (behavioral examples are short)
Estimated time: ~20-30 minutes
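
The update count implied by these settings can be worked out directly. The sketch below assumes the dataset lands at exactly 100 examples, that a trailing partial accumulation window still triggers an optimizer update (trainer implementations differ on this), and that the standard LoRA scaling of alpha / r applies. `TrainPlan` is a hypothetical helper for the arithmetic, not part of any existing training script.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainPlan:
    """Hyperparameters copied from the Training Config list."""
    num_examples: int = 100  # assumption: dataset is exactly 100 examples
    batch_size: int = 1
    grad_accum: int = 8
    epochs: int = 3
    lora_r: int = 16
    lora_alpha: int = 16

    @property
    def effective_batch(self) -> int:
        return self.batch_size * self.grad_accum

    @property
    def steps_per_epoch(self) -> int:
        # Assumes the last, partial accumulation window still yields an update.
        return math.ceil(self.num_examples / self.effective_batch)

    @property
    def total_steps(self) -> int:
        return self.steps_per_epoch * self.epochs

    @property
    def lora_scale(self) -> float:
        # Standard LoRA scaling factor applied to the low-rank update.
        return self.lora_alpha / self.lora_r


plan = TrainPlan()
print(plan.effective_batch, plan.steps_per_epoch, plan.total_steps, plan.lora_scale)
# 8 13 39 1.0
```

With only about 39 optimizer updates in the whole run, warmup and logging intervals probably need to be set in single-digit steps to be meaningful.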

Dataset: ~100 examples

Category	~Count	Description
TTS awareness	15	Check settings for active cart, use `tts_speak` with active cart voice, cart switching, voice selection
EEMS/memory	20	Use `memory_recall`/`memory_store` before work, check both native + legacy stores, lazy recall patterns, boot recalls
Persona loading	10	Load cart persona from settings, apply cadence/voice/identity at runtime, never hardcode personality into responses
Mesh/infra	15	Host awareness (fuji/sin/junkpile), cross-host tools, SSH patterns, service locations, WireGuard mesh
Tool discipline	15	Settings check on boot, proper tool naming (no redundant namespacing), code style rules, index search before work
Agent behavior	15	Todo management, pre-work checklist (indexes -> EEMS -> legacy -> verify), pilot interlock, earned canonization
Error patterns	10	MCP down alerts, mesh offline handling, graceful degradation, repeat detection

Example Format

Hermes tool-call format (same as bt7274-v4). Each example is a multi-turn conversation showing correct madcat-aware behavior:

{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "...", "tool_calls": [...]}]}
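
A small structural check can catch malformed lines before training. This is a minimal sketch, assuming one JSON object per line as in the record above and that `content` is always a string. `validate_example` is a hypothetical helper, the `tool` role is an assumption, and the inner shape of `tool_calls` is deliberately left unchecked because this document does not specify it.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant", "tool"}  # "tool" is an assumption


def validate_example(line: str) -> list[str]:
    """Return the problems found in one JSONL line (empty list means OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i}: 'content' must be a string")
        # Only the container type is checked; the inner tool-call shape is
        # not specified in this document, so it is left unvalidated.
        if "tool_calls" in msg and not isinstance(msg["tool_calls"], list):
            problems.append(f"message {i}: 'tool_calls' must be a list")
    return problems


good = json.dumps({"messages": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "say hello"},
    {"role": "assistant", "content": "", "tool_calls": []},
]})
print(validate_example(good))  # []
print(validate_example('{"messages": []}'))
```

Running this over every line of data/madcat-system.jsonl before a training run is cheap compared with a failed GPU job.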

Example Content Patterns

TTS awareness:

User says "say hello" -> agent checks active cart via settings, calls tts_speak with correct voice
User asks to switch voice -> agent uses tts_switch_voice with cart tag
Agent responds to user with TTS when settings indicate TTS is enabled

EEMS/memory:

Before starting code work -> agent runs memory_recall for the topic, checks memory_list
Agent stores a win/doctrine/procedure after completing significant work
Agent checks legacy EEMS via legacy-memory_recall for broader context
Boot sequence: 3 parallel recalls (self-model, cart, user preferences)
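
The three parallel boot recalls could be modeled as concurrent calls. The sketch below is illustrative only: the `memory_recall` defined here is a hypothetical async stand-in for the real tool, and its return shape is invented for the example.

```python
import asyncio

BOOT_TOPICS = ("self-model", "cart", "user preferences")  # from the boot sequence


async def memory_recall(topic: str) -> dict:
    """Hypothetical stand-in for the real memory_recall tool."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return {"topic": topic, "hits": []}


async def boot_recalls() -> dict:
    # gather() runs the three recalls concurrently and preserves input order.
    results = await asyncio.gather(*(memory_recall(t) for t in BOOT_TOPICS))
    return {r["topic"]: r for r in results}


context = asyncio.run(boot_recalls())
print(list(context))  # ['self-model', 'cart', 'user preferences']
```

In a training example, the same idea would appear as three tool calls issued in a single assistant turn rather than three sequential turns.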

Persona loading:

Agent reads active cart from settings and adjusts cadence accordingly
Agent does NOT hardcode "I am BT-7274" — loads identity from cart config
Different carts produce different communication styles

Mesh/infra:

Agent knows fuji=macOS/ARM64, sin=GB10/ARM64, junkpile=x86_64/RTX2000Ada
Agent uses correct SSH aliases (sin, madcat@sin, madcat@10.0.0.2)
Agent checks host before running platform-specific commands
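
The host check can be expressed as a lookup against a small inventory. This sketch uses only the three hosts and architectures listed above. `Host`, `hosts_for_arch`, and `check_host` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    arch: str
    note: str  # OS or hardware note as listed above


HOSTS = {
    "fuji": Host("fuji", "arm64", "macOS"),
    "sin": Host("sin", "arm64", "GB10"),
    "junkpile": Host("junkpile", "x86_64", "RTX2000Ada"),
}


def hosts_for_arch(arch: str) -> list[str]:
    """Names of hosts whose CPU architecture matches `arch`."""
    return sorted(h.name for h in HOSTS.values() if h.arch == arch.lower())


def check_host(host: str, required_arch: str) -> None:
    """Raise before a platform-specific command is sent to the wrong host."""
    if host not in HOSTS:
        raise KeyError(f"unknown host: {host}")
    if HOSTS[host].arch != required_arch.lower():
        raise ValueError(
            f"{host} is {HOSTS[host].arch}, command needs {required_arch}; "
            f"try one of {hosts_for_arch(required_arch)}"
        )


print(hosts_for_arch("ARM64"))  # ['fuji', 'sin']
check_host("junkpile", "x86_64")  # passes silently
```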

Tool discipline:

In users.ts, export list, not user_list (the file name already supplies the namespace)
In a Session class, name the method create, not createSession
Search indexes before writing code (17k+ code chunks, 44k+ doc chunks)

Agent behavior:

Use TodoWrite for 3+ step tasks
Mark todos in_progress before working, completed after done
Pre-work: index_search -> memory_recall -> legacy_recall -> verify sources
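
One way to audit generated examples against the pre-work checklist is to confirm that the tool calls appear in the required order. This is a sketch under assumptions: the tool names are taken from the shorthand in the checklist and may not match the real tool identifiers, `settings_get` and `edit` are made-up names for unrelated calls, and the final "verify sources" step is not modeled because it is not a tool call.

```python
PRE_WORK_ORDER = ("index_search", "memory_recall", "legacy_recall")


def follows_pre_work_order(tool_calls: list[str]) -> bool:
    """True if the pre-work tools appear in order; other calls may interleave."""
    remaining = iter(tool_calls)
    # `step in remaining` consumes the iterator up to the first match, so each
    # later step must be found after the previous one.
    return all(step in remaining for step in PRE_WORK_ORDER)


print(follows_pre_work_order(
    ["settings_get", "index_search", "memory_recall", "legacy_recall", "edit"]
))  # True
print(follows_pre_work_order(
    ["memory_recall", "index_search", "legacy_recall"]
))  # False
```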

Error patterns:

MCP server disconnected -> alert user, suggest reconnect
Mesh host unreachable -> check WireGuard, try alternative path
Tool execution fails -> retry once, then surface error clearly
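
The "retry once, then surface error clearly" rule is small enough to sketch in full. `ToolError` and `call_with_one_retry` are hypothetical names. The real agent would express this as behavior in its responses rather than as a wrapper function.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ToolError(RuntimeError):
    """Raised when a tool still fails after the single retry."""


def call_with_one_retry(tool: Callable[[], T], name: str) -> T:
    """Run `tool`; on failure retry exactly once, then surface a clear error."""
    try:
        return tool()
    except Exception as first:
        try:
            return tool()
        except Exception as second:
            raise ToolError(
                f"{name} failed twice: first={first!r}, retry={second!r}"
            ) from second


attempts = {"n": 0}


def flaky() -> str:
    # Fails on the first call only, to exercise the retry path.
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise TimeoutError("transient")
    return "ok"


print(call_with_one_retry(flaky, "tts_speak"))  # ok
```

Both failures are kept in the final message, so the user sees whether the retry hit the same problem or a different one.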

Serving

# sin vLLM with AWQ base + LoRA adapter
vllm serve --config /etc/vllm/qwen3-coder-next.yaml \
  --enable-lora \
  --lora-modules madcat=/path/to/madcat-system-lora
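
Once the server is up, a client selects the adapter by passing its registered name as the model id on vLLM's OpenAI-compatible API. The sketch below only builds the request. The base URL is an assumption (host alias `sin` plus vLLM's default port 8000, which the YAML config may override), and `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "http://sin:8000/v1"  # assumption: host alias and vLLM's default port


def build_chat_request(prompt: str, adapter: str = "madcat") -> urllib.request.Request:
    """Build (but do not send) a chat completion request for the adapter."""
    payload = {
        # vLLM exposes each --lora-modules entry under its name as a model id.
        "model": adapter,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("say hello")
print(req.full_url)
print(json.loads(req.data)["model"])
# To send: urllib.request.urlopen(req, timeout=30)
```

Sending the same payload with the base model's served name instead of `madcat` would bypass the adapter, which gives a simple A/B comparison.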

File Layout

~/Projects/lora/
├── docs/madcat-system-lora.md    # this file
├── data/madcat-system.jsonl      # training dataset (to be created)
├── gen_madcat_dataset.py         # dataset generation script (to be created)
├── train_madcat.py               # training script (to be created)
└── madcat-system-lora/           # output adapter (after training)

Open Questions

AWQ + LoRA on MoE: confirm vLLM supports this combo for Qwen3-Coder-Next
LoRA target modules: attention layers only, or also shared expert?
Dataset generation: mine from existing opencode sessions or hand-craft?
Cart settings surface: how does the agent read "active cart" at runtime?

Related

docs/specialist-plan.md — language-specific LoRA adapters (oxidizer, prism, etc.)
docs/bt7274-persona.md — BT-7274 persona LoRA (v4, 802 examples)
EEMS: project.lora.specialist-plan — specialist adapter plan
EEMS: project.lora.bt7274-v4 — BT persona training details
/etc/vllm/qwen3-coder-next.yaml — current AWQ serving config on sin
/etc/vllm/qwen3-coder-next-fp8.yaml — FP8 config (won't fit GB10 with co-tenants)
