Files
lora/docs/specialist-plan.md
T
2026-05-25 16:40:09 +02:00

3.2 KiB

Coding Specialist LoRA Training Plan

Overview

LoRA fine-tune Qwen3-Coder-Next with per-language specialist adapters. Single vLLM instance on sin, multiple LoRA adapters, zero extra base model cost.

Adapters

Adapter Base Target data Source
build-rust Qwen3-Coder-Next ~300-500 examples opencode sessions + madcat-os repos
build-ts Qwen3-Coder-Next ~400-600 examples opencode sessions + plugins/visor repos
build-python Qwen3-Coder-Next ~200-400 examples opencode sessions + lora/training scripts
build-ruby Qwen3-Coder-Next ~100-200 examples opencode sessions + Rails projects
build-swift Qwen3-Coder-Next ~50-100 examples opencode sessions + madcat-apple

Data Sources

1. opencode session history

64 build agent sessions, 13,598 messages in ~/.local/share/opencode/opencode.db. Subagent types (build-rust, etc.) don't exist in history yet — all coding work is under generic build agent. Must classify by content.

Classification signals:

  • File extensions in tool calls (.rs, .ts, .py, .rb, .swift)
  • Bash commands (cargo, npm, pip, bundle, swift build)
  • Tool output content (compiler errors, test output)

2. Git repo diffs

Mine actual commit history for style-consistent examples:

  • git log --patch → extract user-intent + diff pairs
  • Real bug fixes, refactors, feature implementations
  • Preserves Pilot's code style per language

Target repos:

  • Rust: marauder-os, madcat-os/*, tengu
  • TypeScript: opencode config, plugins, visor, sere-kit
  • Python: lora training scripts, automation
  • Ruby: any Rails projects on mesh
  • Swift: madcat-apple

3. Synthetic augmentation

For underrepresented languages (Ruby, Swift), generate synthetic pairs:

  • Take real code from repos
  • Generate "implement X" / "fix Y" / "refactor Z" prompts
  • Pair with actual code as response

Training Config

Same as bt7274 v2:

  • LoRA r=16, alpha=16, dropout=0
  • Target modules: q/k/v/o_proj, gate/up/down_proj
  • bf16, gradient checkpointing
  • adamw_8bit optimizer
  • 3 epochs, batch 1, grad_accum 8

Adjustments per specialist:

  • MAX_SEQ=8192 for code (longer than chat)
  • LR=1e-4 (lower for code, less style drift)

Serving

Single vLLM instance on sin:

python -m vllm.entrypoints.openai.api_server \
  --model Qwen3-Coder-Next \
  --enable-lora \
  --lora-modules \
    build-rust=/path/to/lora-rust \
    build-ts=/path/to/lora-ts \
    build-python=/path/to/lora-python \
    build-ruby=/path/to/lora-ruby \
    build-swift=/path/to/lora-swift \
  --max-lora-rank 16 \
  --port 8000

opencode Integration

"build-rust": { "model": "vllm/build-rust" },
"build-ts": { "model": "vllm/build-ts" },
"build-python": { "model": "vllm/build-python" },
"build-ruby": { "model": "vllm/build-ruby" },
"build-swift": { "model": "vllm/build-swift" }

Pipeline

  1. extract-training-data.py — pull from opencode DB, classify by language
  2. mine-repos.py — extract git diffs as training pairs
  3. train.py — per-specialist training (reuse justfile tasks)
  4. just train-specialist LANG=rust — one command per adapter