add docs: system lora plan, specialist specs, training review
@@ -0,0 +1,122 @@
# madcat System LoRA: Qwen3-Coder-Next

## Goal

Bake madcat ecosystem awareness into Qwen3-Coder-Next so that any agent session
knows how to use TTS, EEMS, mesh hosts, personas, and tools without relying
on massive system prompts. Cart/persona identity still loads at runtime via the
system prompt; the LoRA provides only the operational substrate.

## Base Model

- **Training**: `Qwen/Qwen3-Coder-Next` (BF16, 80B total / 3B active MoE)
- **Serving**: `cyankiwi/qwen3-coder-next:awq` on sin vLLM with `--enable-lora`
- **Architecture**: Hybrid DeltaNet + Attention, 512 experts, 10 active

## Training Config

- Hardware: RunPod H200 141GB (QLoRA 4-bit on single GPU)
- LoRA: r=16, alpha=16, bf16
- Optimizer: adamw_torch
- Batch: 1, grad_accum 8
- Epochs: 3
- LR: 5e-5
- MAX_SEQ: 4096 (behavioral examples are short)
- Estimated time: ~20-30 minutes

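The batch settings above imply very few optimizer steps. The following is a minimal sketch of that arithmetic, assuming the ~100-example dataset planned in this document and a trainer that still takes an optimizer step on a partial final accumulation window (this is trainer-dependent). `TrainConfig` and its field names are illustrative; they are not taken from `train_madcat.py`, which does not exist yet.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters from the list above; field names are illustrative."""
    lora_r: int = 16
    lora_alpha: int = 16
    batch_size: int = 1
    grad_accum: int = 8
    epochs: int = 3
    learning_rate: float = 5e-5
    max_seq: int = 4096

    @property
    def effective_batch(self) -> int:
        # Examples consumed per optimizer step.
        return self.batch_size * self.grad_accum

    @property
    def lora_scale(self) -> float:
        # Standard LoRA scaling factor alpha / r applied to the adapter update.
        return self.lora_alpha / self.lora_r

    def optimizer_steps(self, n_examples: int) -> int:
        # Assumes a partial final accumulation window still counts as a step.
        return math.ceil(n_examples / self.effective_batch) * self.epochs


cfg = TrainConfig()
print(cfg.effective_batch, cfg.lora_scale, cfg.optimizer_steps(100))
```

Under these assumptions the run is roughly 39 optimizer steps at an effective batch of 8, so any learning-rate warmup would have to be only a few steps long.
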
## Dataset: ~100 examples

### Categories

| Category | ~Count | Description |
|---|---|---|
| TTS awareness | 15 | Check settings for active cart, use `tts_speak` with active cart voice, cart switching, voice selection |
| EEMS/memory | 20 | Use `memory_recall`/`memory_store` before work, check both native + legacy stores, lazy recall patterns, boot recalls |
| Persona loading | 10 | Load cart persona from settings, apply cadence/voice/identity at runtime, never hardcode personality into responses |
| Mesh/infra | 15 | Host awareness (fuji/sin/junkpile), cross-host tools, SSH patterns, service locations, WireGuard mesh |
| Tool discipline | 15 | Settings check on boot, proper tool naming (no redundant namespacing), code style rules, index search before work |
| Agent behavior | 15 | Todo management, pre-work checklist (indexes -> EEMS -> legacy -> verify), pilot interlock, earned canonization |
| Error patterns | 10 | MCP down alerts, mesh offline handling, graceful degradation, repeat detection |

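The per-category targets can be kept next to the generator so the mix stays checkable as the dataset grows. This is a sketch only: the counts are copied from the table above, while the dictionary keys and the `category_shares` helper are hypothetical names, not part of any existing script.

```python
# Target example counts per category, copied from the table above.
# The snake_case keys are hypothetical slugs chosen for this sketch.
CATEGORY_TARGETS = {
    "tts_awareness": 15,
    "eems_memory": 20,
    "persona_loading": 10,
    "mesh_infra": 15,
    "tool_discipline": 15,
    "agent_behavior": 15,
    "error_patterns": 10,
}


def category_shares(targets: dict[str, int]) -> dict[str, float]:
    """Return each category's fraction of the whole dataset."""
    total = sum(targets.values())
    return {name: count / total for name, count in targets.items()}


total = sum(CATEGORY_TARGETS.values())
shares = category_shares(CATEGORY_TARGETS)
print(total, max(shares, key=shares.get))
```

The targets sum to 100, matching the section heading, with EEMS/memory as the largest single category.
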
### Example Format

Hermes tool-call format (same as bt7274-v4). Each example is a multi-turn
conversation showing correct madcat-aware behavior:

```jsonl
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "...", "tool_calls": [...]}]}
```

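A record in this shape can be built and sanity-checked before it is written to the dataset. The sketch below is an assumption-heavy illustration: the nested `function` / `arguments` layout of a tool call, the argument names `text` and `voice`, and the `validate_record` helper are all hypothetical and should be aligned with whatever bt7274-v4 actually used.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def validate_record(line: str) -> dict:
    """Parse one JSONL line and check the minimal structure shown above."""
    record = json.loads(line)
    messages = record["messages"]
    if not messages:
        raise ValueError("record has no messages")
    for msg in messages:
        if msg["role"] not in ALLOWED_ROLES:
            raise ValueError(f"unexpected role: {msg['role']}")
        for call in msg.get("tool_calls", []):
            # Arguments are stored as a JSON string here (an assumption).
            json.loads(call["function"]["arguments"])
    return record


example = {
    "messages": [
        {"role": "system", "content": "Active cart and TTS state are in settings."},
        {"role": "user", "content": "say hello"},
        {
            "role": "assistant",
            "content": "",
            "tool_calls": [
                {
                    "type": "function",
                    "function": {
                        "name": "tts_speak",
                        # Hypothetical argument names; the real schema may differ.
                        "arguments": json.dumps(
                            {"text": "Hello.", "voice": "example-voice"}
                        ),
                    },
                }
            ],
        },
    ]
}

line = json.dumps(example)  # one record per line in the .jsonl file
record = validate_record(line)
print(record["messages"][-1]["tool_calls"][0]["function"]["name"])
```
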
### Example Content Patterns

**TTS awareness:**
- User says "say hello" -> agent checks active cart via settings, calls `tts_speak` with correct voice
- User asks to switch voice -> agent uses `tts_switch_voice` with cart tag
- Agent responds to user with TTS when settings indicate TTS is enabled

**EEMS/memory:**
- Before starting code work -> agent runs `memory_recall` for the topic, checks `memory_list`
- Agent stores a win/doctrine/procedure after completing significant work
- Agent checks legacy EEMS via `legacy-memory_recall` for broader context
- Boot sequence: 3 parallel recalls (self-model, cart, user preferences)

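The boot sequence of three parallel recalls can be sketched with `asyncio.gather`. The `memory_recall` coroutine here is a local stand-in that returns fake data; the real tool's signature and return shape are not specified in this document, so treat both as assumptions.

```python
import asyncio


async def memory_recall(query: str) -> dict:
    """Stand-in for the real `memory_recall` tool; returns a fake result."""
    await asyncio.sleep(0.01)  # simulate tool latency
    return {"query": query, "hits": []}


async def boot_recalls() -> dict:
    """Issue the three boot recalls concurrently and key results by topic."""
    topics = ("self-model", "cart", "user preferences")
    results = await asyncio.gather(*(memory_recall(t) for t in topics))
    # gather() preserves argument order, so zip lines results up with topics.
    return dict(zip(topics, results))


boot = asyncio.run(boot_recalls())
print(list(boot))
```

Because the recalls run concurrently, boot latency is bounded by the slowest recall rather than the sum of all three.
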
**Persona loading:**
- Agent reads active cart from settings and adjusts cadence accordingly
- Agent does NOT hardcode "I am BT-7274"; it loads identity from cart config
- Different carts produce different communication styles

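One way to picture "identity from cart config, not from the weights" is a system prompt composed at runtime from the active cart. Everything in this sketch is hypothetical: the settings key `active_cart`, the cart fields `name`, `cadence`, and `voice`, and the placeholder carts. How the agent really reads the active cart is still an open question in this plan.

```python
def build_system_prompt(settings: dict, carts: dict) -> str:
    """Compose the runtime system prompt from the active cart's config."""
    cart_id = settings["active_cart"]
    cart = carts[cart_id]  # raises KeyError for an unknown cart
    return (
        f"You are {cart['name']}. "
        f"Cadence: {cart['cadence']}. "
        f"TTS voice: {cart['voice']}."
    )


# Hypothetical cart data, for illustration only.
CARTS = {
    "cart-a": {"name": "Cart A", "cadence": "terse, direct", "voice": "voice-a"},
    "cart-b": {"name": "Cart B", "cadence": "warm, conversational", "voice": "voice-b"},
}

prompt_a = build_system_prompt({"active_cart": "cart-a"}, CARTS)
prompt_b = build_system_prompt({"active_cart": "cart-b"}, CARTS)
print(prompt_a)
print(prompt_b)
```

The same code path yields a different identity and cadence per cart, which is the behavior the training examples are meant to reinforce.
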
**Mesh/infra:**
- Agent knows fuji=macOS/ARM64, sin=GB10/ARM64, junkpile=x86_64/RTX2000Ada
- Agent uses correct SSH aliases (sin, madcat@sin, madcat@10.0.0.2)
- Agent checks host before running platform-specific commands

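The "check host before running platform-specific commands" pattern amounts to a lookup against a small host table. In this sketch the host facts come from the bullets above, while the `Host` dataclass, its field layout, and the helper names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    arch: str
    note: str


# Values taken from the bullets above; the field layout is illustrative.
HOSTS = {
    "fuji": Host("fuji", "arm64", "macOS"),
    "sin": Host("sin", "arm64", "GB10"),
    "junkpile": Host("junkpile", "x86_64", "RTX2000Ada"),
}


def hosts_with_arch(arch: str) -> list[str]:
    """Return the mesh hosts whose CPU architecture matches `arch`."""
    return sorted(h.name for h in HOSTS.values() if h.arch == arch)


def can_run(host: str, required_arch: str) -> bool:
    """Check a host's architecture before dispatching a platform-specific command."""
    return HOSTS[host].arch == required_arch


print(hosts_with_arch("arm64"), can_run("junkpile", "arm64"))
```
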
**Tool discipline:**
- In `users.ts`, export `list` not `user_list`
- In a `Session` class, method is `create` not `createSession`
- Search indexes before writing code (17k+ code chunks, 44k+ doc chunks)

**Agent behavior:**
- Use TodoWrite for 3+ step tasks
- Mark todos in_progress before working, completed after done
- Pre-work: index_search -> memory_recall -> legacy_recall -> verify sources

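The pre-work chain is an ordered pipeline in which no step may be skipped. A minimal sketch of that ordering follows, with stub callables standing in for the real tools. The step names follow the bullet above (`verify_sources` is a slug invented here for "verify sources"), and `run_prework` is a hypothetical helper.

```python
from typing import Callable

PREWORK_ORDER = ("index_search", "memory_recall", "legacy_recall", "verify_sources")


def run_prework(topic: str, steps: dict[str, Callable[[str], list]]) -> dict[str, list]:
    """Run every pre-work step in the fixed order and collect findings per step."""
    missing = [name for name in PREWORK_ORDER if name not in steps]
    if missing:
        raise KeyError(f"missing pre-work steps: {missing}")
    return {name: steps[name](topic) for name in PREWORK_ORDER}


# Stub steps that only echo the topic; real ones would call the tools.
stubs = {name: (lambda topic, n=name: [f"{n}:{topic}"]) for name in PREWORK_ORDER}
findings = run_prework("lora serving", stubs)
print(list(findings))
```
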
**Error patterns:**
- MCP server disconnected -> alert user, suggest reconnect
- Mesh host unreachable -> check WireGuard, try alternative path
- Tool execution fails -> retry once, then surface error clearly

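The "retry once, then surface error clearly" rule can be sketched as a small wrapper. `call_with_one_retry` and `ToolError` are hypothetical names for illustration, and the flaky tool is a stub rather than a real MCP call.

```python
from typing import Any, Callable


class ToolError(RuntimeError):
    """Raised when a tool call still fails after the single retry."""


def call_with_one_retry(tool: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call `tool`; on failure retry exactly once, then surface a clear error."""
    try:
        return tool(*args, **kwargs)
    except Exception as first:
        try:
            return tool(*args, **kwargs)
        except Exception as second:
            name = getattr(tool, "__name__", repr(tool))
            raise ToolError(
                f"{name} failed twice: first={first!r}, second={second!r}"
            ) from second


attempts = {"n": 0}


def flaky_tool() -> str:
    """Fails on the first call and succeeds on the second."""
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise ConnectionError("MCP server disconnected")
    return "ok"


print(call_with_one_retry(flaky_tool), attempts["n"])
```

Capping the retry at one attempt keeps the agent from looping on a dead service, and the raised error carries both failures so the user sees what actually went wrong.
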
## Serving

```bash
# sin vLLM with AWQ base + LoRA adapter
vllm serve --config /etc/vllm/qwen3-coder-next.yaml \
  --enable-lora \
  --lora-modules madcat=/path/to/madcat-system-lora
```

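Once the adapter is registered as `madcat`, a client on vLLM's OpenAI-compatible API selects it by passing that name in the `model` field. The sketch below only builds the request and does not send it. The base URL (host `sin`, port 8000) is an assumption and should be replaced with the real endpoint; `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute the real host and port of the vLLM server.
BASE_URL = "http://sin:8000/v1"


def build_chat_request(prompt: str, model: str = "madcat") -> urllib.request.Request:
    """Build (but do not send) a chat completion request for the LoRA adapter."""
    payload = {
        "model": model,  # adapter name registered via --lora-modules
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("say hello")
print(req.full_url, json.loads(req.data)["model"])
# To send it: urllib.request.urlopen(req) against a running server.
```
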
## File Layout

```
~/Projects/lora/
├── docs/madcat-system-lora.md    # this file
├── data/madcat-system.jsonl      # training dataset (to be created)
├── gen_madcat_dataset.py         # dataset generation script (to be created)
├── train_madcat.py               # training script (to be created)
└── madcat-system-lora/           # output adapter (after training)
```

## Open Questions

- AWQ + LoRA on MoE: confirm vLLM supports this combo for Qwen3-Coder-Next
- LoRA target modules: attention layers only, or also shared expert?
- Dataset generation: mine from existing opencode sessions or hand-craft?
- Cart settings surface: how does the agent read "active cart" at runtime?

## Related

- `docs/specialist-plan.md`: language-specific LoRA adapters (oxidizer, prism, etc.)
- `docs/bt7274-persona.md`: BT-7274 persona LoRA (v4, 802 examples)
- EEMS: `project.lora.specialist-plan` (specialist adapter plan)
- EEMS: `project.lora.bt7274-v4` (BT persona training details)
- `/etc/vllm/qwen3-coder-next.yaml`: current AWQ serving config on sin
- `/etc/vllm/qwen3-coder-next-fp8.yaml`: FP8 config (won't fit GB10 with co-tenants)