madcat System LoRA — Qwen3-Coder-Next

Goal

Bake madcat ecosystem awareness into Qwen3-Coder-Next so any agent session knows how to use TTS, EEMS, mesh hosts, personas, and tools without relying on massive system prompts. Cart/persona identity loads at runtime via system prompt — the LoRA provides the operational substrate.

Base Model

  • Training: Qwen/Qwen3-Coder-Next (BF16, 80B total / 3B active MoE)
  • Serving: cyankiwi/qwen3-coder-next:awq on sin vLLM with --enable-lora
  • Architecture: Hybrid DeltaNet + Attention, 512 experts, 10 active

Training Config

  • Hardware: RunPod H200 141GB (QLoRA 4-bit on single GPU)
  • LoRA: r=16, alpha=16, bf16
  • Optimizer: adamw_torch
  • Batch: 1, grad_accum 8
  • Epochs: 3
  • LR: 5e-5
  • MAX_SEQ: 4096 (behavioral examples are short)
  • Estimated time: ~20-30 minutes
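
The hyperparameters above imply a very short run. A minimal sketch of the arithmetic, assuming the ~100-example dataset described in the next section and assuming a trailing partial accumulation window still triggers an optimizer step (the actual trainer may behave differently). The `TrainPlan` name is hypothetical:

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainPlan:
    """Hypothetical container for the hyperparameters listed above."""
    examples: int = 100      # approximate dataset size from this doc
    batch_size: int = 1
    grad_accum: int = 8
    epochs: int = 3
    lora_r: int = 16
    lora_alpha: int = 16

    @property
    def effective_batch(self) -> int:
        return self.batch_size * self.grad_accum

    @property
    def steps_per_epoch(self) -> int:
        # Assumes a trailing partial window still counts as one step.
        return math.ceil(self.examples / self.effective_batch)

    @property
    def total_steps(self) -> int:
        return self.steps_per_epoch * self.epochs

    @property
    def lora_scale(self) -> float:
        # Standard LoRA scaling factor applied to the low-rank update.
        return self.lora_alpha / self.lora_r


plan = TrainPlan()
print(plan.effective_batch, plan.steps_per_epoch, plan.total_steps, plan.lora_scale)
```

Under these assumptions the run is only a few dozen optimizer steps, so any warmup or logging interval would need to be small to have an effect.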

Dataset: ~100 examples

Categories

| Category | ~Count | Description |
|---|---|---|
| TTS awareness | 15 | Check settings for active cart, use tts_speak with active cart voice, cart switching, voice selection |
| EEMS/memory | 20 | Use memory_recall/memory_store before work, check both native + legacy stores, lazy recall patterns, boot recalls |
| Persona loading | 10 | Load cart persona from settings, apply cadence/voice/identity at runtime, never hardcode personality into responses |
| Mesh/infra | 15 | Host awareness (fuji/sin/junkpile), cross-host tools, SSH patterns, service locations, WireGuard mesh |
| Tool discipline | 15 | Settings check on boot, proper tool naming (no redundant namespacing), code style rules, index search before work |
| Agent behavior | 15 | Todo management, pre-work checklist (indexes -> EEMS -> legacy -> verify), pilot interlock, earned canonization |
| Error patterns | 10 | MCP down alerts, mesh offline handling, graceful degradation, repeat detection |

Example Format

Hermes tool-call format (same as bt7274-v4). Each example is a multi-turn conversation showing correct madcat-aware behavior:

{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "...", "tool_calls": [...]}]}
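
A minimal sketch of building one record in this shape and serializing it as a JSONL line. The tool-call layout (`type` / `function` / `name` / `arguments`) follows the common OpenAI-style chat schema and is an assumption, as are the helper names, the sample text, and the `text` argument; the exact field layout used by bt7274-v4 is not shown in this doc.

```python
import json


def make_example(system: str, user: str, reply: str, tool_calls: list[dict]) -> dict:
    """Build one training record in the messages layout shown above."""
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
            {"role": "assistant", "content": reply, "tool_calls": tool_calls},
        ]
    }


def tool_call(name: str, arguments: dict) -> dict:
    # Assumed OpenAI-style layout; arguments are serialized as a JSON string.
    return {"type": "function", "function": {"name": name, "arguments": json.dumps(arguments)}}


def validate(record: dict) -> None:
    """Raise ValueError if the record lacks the expected role sequence."""
    roles = [m.get("role") for m in record.get("messages", [])]
    if roles[:3] != ["system", "user", "assistant"]:
        raise ValueError(f"unexpected role order: {roles}")


record = make_example(
    system="(cart persona prompt goes here)",
    user="say hello",
    reply="",
    # Argument names are illustrative, not the real tts_speak signature.
    tool_calls=[tool_call("tts_speak", {"text": "Hello."})],
)
validate(record)
line = json.dumps(record, ensure_ascii=False)  # one line per example in the .jsonl
```

Validating each record before writing it keeps a malformed example from silently entering a dataset this small, where a single bad row is roughly 1% of the data.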

Example Content Patterns

TTS awareness:

  • User says "say hello" -> agent checks active cart via settings, calls tts_speak with correct voice
  • User asks to switch voice -> agent uses tts_switch_voice with cart tag
  • Agent responds to user with TTS when settings indicate TTS is enabled

EEMS/memory:

  • Before starting code work -> agent runs memory_recall for the topic, checks memory_list
  • Agent stores a win/doctrine/procedure after completing significant work
  • Agent checks legacy EEMS via legacy-memory_recall for broader context
  • Boot sequence: 3 parallel recalls (self-model, cart, user preferences)
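
The boot sequence could be sketched as three concurrent recalls. The `memory_recall` function below is a local stand-in with an assumed signature, not the real tool; it only illustrates issuing the recalls in parallel rather than sequentially.

```python
import asyncio


async def memory_recall(query: str) -> dict:
    """Stand-in for the real memory_recall tool; returns a canned result."""
    await asyncio.sleep(0)  # yield control, as a real I/O call would
    return {"query": query, "hits": []}


async def boot_recalls() -> dict[str, dict]:
    topics = ("self-model", "cart", "user preferences")
    # Issue all three recalls concurrently rather than one after another.
    results = await asyncio.gather(*(memory_recall(t) for t in topics))
    return dict(zip(topics, results))


boot = asyncio.run(boot_recalls())
```

`asyncio.gather` returns results in the order the awaitables were passed, so zipping them back onto the topic names is safe.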

Persona loading:

  • Agent reads active cart from settings and adjusts cadence accordingly
  • Agent does NOT hardcode "I am BT-7274" — loads identity from cart config
  • Different carts produce different communication styles

Mesh/infra:

  • Agent knows fuji=macOS/ARM64, sin=GB10/ARM64, junkpile=x86_64/RTX2000Ada
  • Agent uses correct SSH aliases (sin, madcat@sin, madcat@10.0.0.2)
  • Agent checks host before running platform-specific commands
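
The host check might look like the following sketch. The table holds only the facts listed above; the `HOSTS` structure and `can_run` helper are hypothetical names for illustration.

```python
# Host facts as stated in this doc; fields not stated there are omitted.
HOSTS = {
    "fuji": {"os": "macOS", "arch": "ARM64"},
    "sin": {"accelerator": "GB10", "arch": "ARM64"},
    "junkpile": {"arch": "x86_64", "accelerator": "RTX2000Ada"},
}


def can_run(host: str, required_arch: str) -> bool:
    """Return True only if the host is known and its architecture matches."""
    info = HOSTS.get(host)
    return info is not None and info["arch"] == required_arch


# Example: an x86_64-only binary should be routed to junkpile, not sin.
targets = [name for name in HOSTS if can_run(name, "x86_64")]
```

Treating an unknown host as a failed check, rather than raising, lets the caller fall through to an alternative path.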

Tool discipline:

  • In users.ts, name the export list, not user_list (the file name already supplies the namespace)
  • In a Session class, name the method create, not createSession
  • Search indexes before writing code (17k+ code chunks, 44k+ doc chunks)

Agent behavior:

  • Use TodoWrite for 3+ step tasks
  • Mark todos in_progress before working, completed after done
  • Pre-work: index_search -> memory_recall -> legacy_recall -> verify sources
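
The todo rules above amount to a small state machine. A sketch, assuming an initial state called `pending` (not named in this doc) and hypothetical `Todo` / `needs_todos` helpers; the real TodoWrite tool's interface is not shown here.

```python
from dataclasses import dataclass

TODO_THRESHOLD = 3  # tasks with this many steps or more get a todo list


@dataclass
class Todo:
    title: str
    status: str = "pending"  # "pending" is an assumed initial state name

    def start(self) -> None:
        if self.status != "pending":
            raise ValueError(f"cannot start a todo that is {self.status}")
        self.status = "in_progress"

    def finish(self) -> None:
        # A todo must be marked in_progress before it can be completed.
        if self.status != "in_progress":
            raise ValueError(f"cannot complete a todo that is {self.status}")
        self.status = "completed"


def needs_todos(steps: list[str]) -> bool:
    """Apply the 3+ step rule for when a todo list is warranted."""
    return len(steps) >= TODO_THRESHOLD
```

Rejecting a direct jump from `pending` to `completed` encodes the "in_progress before working" rule as a hard constraint rather than a convention.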

Error patterns:

  • MCP server disconnected -> alert user, suggest reconnect
  • Mesh host unreachable -> check WireGuard, try alternative path
  • Tool execution fails -> retry once, then surface error clearly
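
The "retry once, then surface" rule can be sketched as a small wrapper. `ToolError` and `call_with_one_retry` are hypothetical names; a real agent would likely narrow the caught exception types.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ToolError(RuntimeError):
    """Hypothetical error type raised when a tool call fails twice."""


def call_with_one_retry(tool: Callable[[], T], name: str) -> T:
    try:
        return tool()
    except Exception:
        pass  # first failure: fall through to a single retry
    try:
        return tool()
    except Exception as exc:
        # Second failure: stop retrying and surface a clear, named error.
        raise ToolError(f"{name} failed after 1 retry: {exc}") from exc
```

Chaining with `from exc` keeps the original traceback attached, so the surfaced error still points at the underlying cause.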

Serving

# sin vLLM with AWQ base + LoRA adapter
vllm serve --config /etc/vllm/qwen3-coder-next.yaml \
  --enable-lora \
  --lora-modules madcat=/path/to/madcat-system-lora
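
Once the server is up, vLLM's OpenAI-compatible API selects an adapter by passing its registered name as the `model` field. A sketch that builds such a request without sending it; the `http://sin:8000` base URL is an assumption (8000 is vLLM's default port) and `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_chat_request(base_url: str, prompt: str, adapter: str = "madcat") -> urllib.request.Request:
    """Build (but do not send) a chat completion request targeting the adapter."""
    payload = {
        "model": adapter,  # the name given on the left side of --lora-modules
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hostname and port are assumptions about the deployment on sin.
req = build_chat_request("http://sin:8000", "say hello")
# To send: urllib.request.urlopen(req) against a running server.
```

Requests that name the base model instead of `madcat` would bypass the adapter, which gives a simple way to compare behavior with and without the LoRA.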

File Layout

~/Projects/lora/
├── docs/madcat-system-lora.md    # this file
├── data/madcat-system.jsonl      # training dataset (to be created)
├── gen_madcat_dataset.py         # dataset generation script (to be created)
├── train_madcat.py               # training script (to be created)
└── madcat-system-lora/           # output adapter (after training)

Open Questions

  • AWQ + LoRA on MoE: confirm vLLM supports this combo for Qwen3-Coder-Next
  • LoRA target modules: attention layers only, or also shared expert?
  • Dataset generation: mine from existing opencode sessions or hand-craft?
  • Cart settings surface: how does the agent read "active cart" at runtime?

Related

  • docs/specialist-plan.md - language-specific LoRA adapters (oxidizer, prism, etc.)
  • docs/bt7274-persona.md - BT-7274 persona LoRA (v4, 802 examples)
  • EEMS: project.lora.specialist-plan - specialist adapter plan
  • EEMS: project.lora.bt7274-v4 - BT persona training details
  • /etc/vllm/qwen3-coder-next.yaml - current AWQ serving config on sin
  • /etc/vllm/qwen3-coder-next-fp8.yaml - FP8 config (won't fit GB10 with co-tenants)