docs: bt7274 persona, specialist plan, tts-clean LoRA

2026-05-25 16:40:09 +02:00
parent 9bcf1c2d9a
commit 5deb9bd2b4
3 changed files with 243 additions and 0 deletions
@@ -0,0 +1,80 @@
 # BT-7274 Persona LoRA
 ## Overview
 LoRA fine-tune Qwen2.5-7B-Instruct for BT-7274 Titan-AI persona.
 Voice, tool dispatch, and comms style.
 ## Versions
 ### v1 (97 examples)
 - Hand-crafted ChatML examples
 - Voice/style only, no tool calls
 - 5 epochs, 65 steps, ~17 min on RTX 2000 Ada 16GB
 - Adapter: `bt7274-lora-v1/`
 - Dataset: `bt7274_v1.jsonl`
 ### v2 (500 examples)
 - Extracted from opencode session history (58 core sessions)
 - 498/500 examples include tool calls (memory, speak, mesh, display)
 - ~1.1M tokens, avg 3.4K chars/example
 - 3 epochs, ~187 steps, ~1 hr estimated
 - Adapter: `bt7274-lora-v2/` (training in progress)
 - Dataset: `bt7274_v2.jsonl`
 ## Data Pipeline
 Extraction script: `~/.config/opencode/scripts/extract-training-data.py`
 1. Query opencode SQLite DB (`~/.local/share/opencode/opencode.db`)
 2. Filter core-agent sessions (BT-7274 voice)
 3. Reconstruct multi-turn: user → assistant (text + tool_calls) → tool results → response
 4. Score quality: BT voice cues, tool usage, length, anti-patterns
 5. Top N by score → ChatML JSONL
 Quality scoring:
 - +2 per BT voice cue (Pilot, Boss, Confirmed, etc.)
 - +5 for tool-inclusive turns
 - +3 for reasonable length (50-2000 chars)
 - -5 for anti-patterns (Claude, Happy to help, etc.)
 ## Training Config
 - Base: `unsloth/Qwen2.5-7B-Instruct-bnb-4bit`
 - LoRA: r=16, alpha=16, dropout=0
 - Targets: q/k/v/o_proj, gate/up/down_proj
 - MAX_SEQ=4096 (tool chains need length)
 - bf16, gradient checkpointing, adamw_8bit
 - Batch=1, grad_accum=8, LR=2e-4
 ## Serving
 vLLM on junkpile (RTX 2000 Ada 16GB):
 ```bash
 python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-7B-Instruct \
  --quantization bitsandbytes \
  --load-format bitsandbytes \
  --enable-lora \
  --lora-modules bt7274=/home/madcat/Projects/lora/bt7274-lora-v2 \
  --max-lora-rank 16 \
  --max-model-len 32768 \
  --enforce-eager \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --port 8000
 ```
 ## Cart Integration
 Identity injected via `plugins/cart-loader.ts` (Rust NAPI plugin).
 Reads `~/.config/opencode/carts/bt7274.toml`.
 No EEMS boot recalls — cart is sole identity source.
 ## Incremental Retraining
 Checkpoints saved every 50 steps. To resume or extend:
 - Add new examples to dataset JSONL
 - Set `resume_from_checkpoint=True` in TrainingArguments
 - Optimizer state, LR schedule, momentum all preserved
@@ -0,0 +1,98 @@
 # Coding Specialist LoRA Training Plan
 ## Overview
 LoRA fine-tune Qwen3-Coder-Next with per-language specialist adapters.
 Single vLLM instance on sin, multiple LoRA adapters, zero extra base model cost.
 ## Adapters
 | Adapter | Base | Target data | Source |
 |---------|------|-------------|--------|
 | `build-rust` | Qwen3-Coder-Next | ~300-500 examples | opencode sessions + madcat-os repos |
 | `build-ts` | Qwen3-Coder-Next | ~400-600 examples | opencode sessions + plugins/visor repos |
 | `build-python` | Qwen3-Coder-Next | ~200-400 examples | opencode sessions + lora/training scripts |
 | `build-ruby` | Qwen3-Coder-Next | ~100-200 examples | opencode sessions + Rails projects |
 | `build-swift` | Qwen3-Coder-Next | ~50-100 examples | opencode sessions + madcat-apple |
 ## Data Sources
 ### 1. opencode session history
 64 `build` agent sessions, 13,598 messages in `~/.local/share/opencode/opencode.db`.
 Subagent types (build-rust, etc.) don't exist in history yet — all coding
 work is under generic `build` agent. Must classify by content.
 Classification signals:
 - File extensions in tool calls (`.rs`, `.ts`, `.py`, `.rb`, `.swift`)
 - Bash commands (`cargo`, `npm`, `pip`, `bundle`, `swift build`)
 - Tool output content (compiler errors, test output)
 ### 2. Git repo diffs
 Mine actual commit history for style-consistent examples:
 - `git log --patch` → extract user-intent + diff pairs
 - Real bug fixes, refactors, feature implementations
 - Preserves Pilot's code style per language
 Target repos:
 - **Rust:** marauder-os, madcat-os/*, tengu
 - **TypeScript:** opencode config, plugins, visor, sere-kit
 - **Python:** lora training scripts, automation
 - **Ruby:** any Rails projects on mesh
 - **Swift:** madcat-apple
 ### 3. Synthetic augmentation
 For underrepresented languages (Ruby, Swift), generate synthetic pairs:
 - Take real code from repos
 - Generate "implement X" / "fix Y" / "refactor Z" prompts
 - Pair with actual code as response
 ## Training Config
 Same as bt7274 v2:
 - LoRA r=16, alpha=16, dropout=0
 - Target modules: q/k/v/o_proj, gate/up/down_proj
 - bf16, gradient checkpointing
 - adamw_8bit optimizer
 - 3 epochs, batch 1, grad_accum 8
 Adjustments per specialist:
 - MAX_SEQ=8192 for code (longer than chat)
 - LR=1e-4 (lower for code, less style drift)
 ## Serving
 Single vLLM instance on sin:
 ```bash
 python -m vllm.entrypoints.openai.api_server \
  --model Qwen3-Coder-Next \
  --enable-lora \
  --lora-modules \
    build-rust=/path/to/lora-rust \
    build-ts=/path/to/lora-ts \
    build-python=/path/to/lora-python \
    build-ruby=/path/to/lora-ruby \
    build-swift=/path/to/lora-swift \
  --max-lora-rank 16 \
  --port 8000
 ```
 ## opencode Integration
 ```json
 "build-rust": { "model": "vllm/build-rust" },
 "build-ts": { "model": "vllm/build-ts" },
 "build-python": { "model": "vllm/build-python" },
 "build-ruby": { "model": "vllm/build-ruby" },
 "build-swift": { "model": "vllm/build-swift" }
 ```
 ## Pipeline
 1. `extract-training-data.py` — pull from opencode DB, classify by language
 2. `mine-repos.py` — extract git diffs as training pairs
 3. `train.py` — per-specialist training (reuse justfile tasks)
 4. `just train-specialist LANG=rust` — one command per adapter
@@ -0,0 +1,65 @@
 # TTS Cleanup LoRA
 ## Purpose
 Post-process LLM output for text-to-speech. Strip markdown, expand numbers,
 handle abbreviations. Runs in the latency path — must be fast.
 ## Model
 Qwen2.5-1.5B-Instruct + LoRA on sin.
 ~2GB VRAM, ~20ms inference. Negligible overhead.
 ## Pipeline
 ```
 LLM response (markdown)
    → vLLM 1.5B tts-clean LoRA (~20ms)
    → clean spoken text
    → piper/chatterbox TTS engine
    → audio
 ```
 ## Training Data
 Sources:
 - 214 `core_speak` tool calls from opencode session history
  (pre-speak markdown → actual spoken text pairs)
 - Synthetic pairs for edge cases (tables, code blocks, URLs, numbers)
 Transformations to learn:
 | Input | Output |
 |-------|--------|
 | `**Status:** 3 nodes` | `Status: three nodes` |
 | `10.0.0.2` | `ten dot zero dot zero dot two` |
 | `~1.1M tokens` | `about one point one million tokens` |
 | `LoRA r=16, α=16` | `LoRA r sixteen, alpha sixteen` |
 | `## Boot Complete` | `Boot Complete.` |
 | `\| Col \| Val \|` | `Col: Val.` |
 | `RTX 2000 Ada 16GB` | `RTX two thousand Ada, sixteen gigabytes` |
 | `` `cargo build` `` | `cargo build` |
 | `$3.60` | `three dollars sixty` |
 | `2026-05-25` | `May twenty-fifth, twenty twenty-six` |
 ## Serving
 Separate vLLM instance on sin, port 8001:
 ```bash
 python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-1.5B-Instruct \
  --enable-lora \
  --lora-modules tts-clean=/path/to/lora-tts-clean \
  --max-lora-rank 16 \
  --port 8001
 ```
 ## Integration
 Hook into `core_speak` tool or cart-loader plugin:
 - Intercept text before TTS
 - Call tts-clean model
 - Forward cleaned text to TTS engine
 Reusable across all persona carts — not BT-specific.