From 5deb9bd2b454502bc53c4744a2cd717f06fa96f7 Mon Sep 17 00:00:00 2001 From: marauder-actual Date: Mon, 25 May 2026 16:40:09 +0200 Subject: [PATCH] docs: bt7274 persona, specialist plan, tts-clean LoRA --- docs/bt7274-persona.md | 80 +++++++++++++++++++++++++++++++++ docs/specialist-plan.md | 98 +++++++++++++++++++++++++++++++++++++++++ docs/tts-clean.md | 65 +++++++++++++++++++++++++++ 3 files changed, 243 insertions(+) create mode 100644 docs/bt7274-persona.md create mode 100644 docs/specialist-plan.md create mode 100644 docs/tts-clean.md diff --git a/docs/bt7274-persona.md b/docs/bt7274-persona.md new file mode 100644 index 0000000..8187f5e --- /dev/null +++ b/docs/bt7274-persona.md @@ -0,0 +1,80 @@ +# BT-7274 Persona LoRA + +## Overview + +LoRA fine-tune Qwen2.5-7B-Instruct for BT-7274 Titan-AI persona. +Voice, tool dispatch, and comms style. + +## Versions + +### v1 (97 examples) +- Hand-crafted ChatML examples +- Voice/style only, no tool calls +- 5 epochs, 65 steps, ~17 min on RTX 2000 Ada 16GB +- Adapter: `bt7274-lora-v1/` +- Dataset: `bt7274_v1.jsonl` + +### v2 (500 examples) +- Extracted from opencode session history (58 core sessions) +- 498/500 examples include tool calls (memory, speak, mesh, display) +- ~1.1M tokens, avg 3.4K chars/example +- 3 epochs, ~187 steps, ~1 hr estimated +- Adapter: `bt7274-lora-v2/` (training in progress) +- Dataset: `bt7274_v2.jsonl` + +## Data Pipeline + +Extraction script: `~/.config/opencode/scripts/extract-training-data.py` + +1. Query opencode SQLite DB (`~/.local/share/opencode/opencode.db`) +2. Filter core-agent sessions (BT-7274 voice) +3. Reconstruct multi-turn: user → assistant (text + tool_calls) → tool results → response +4. Score quality: BT voice cues, tool usage, length, anti-patterns +5. Top N by score → ChatML JSONL + +Quality scoring: +- +2 per BT voice cue (Pilot, Boss, Confirmed, etc.) +- +5 for tool-inclusive turns +- +3 for reasonable length (50-2000 chars) +- -5 for anti-patterns (Claude, Happy to help, etc.) + +## Training Config + +- Base: `unsloth/Qwen2.5-7B-Instruct-bnb-4bit` +- LoRA: r=16, alpha=16, dropout=0 +- Targets: q/k/v/o_proj, gate/up/down_proj +- MAX_SEQ=4096 (tool chains need length) +- bf16, gradient checkpointing, adamw_8bit +- Batch=1, grad_accum=8, LR=2e-4 + +## Serving + +vLLM on junkpile (RTX 2000 Ada 16GB): + +```bash +python -m vllm.entrypoints.openai.api_server \ + --model Qwen/Qwen2.5-7B-Instruct \ + --quantization bitsandbytes \ + --load-format bitsandbytes \ + --enable-lora \ + --lora-modules bt7274=/home/madcat/Projects/lora/bt7274-lora-v2 \ + --max-lora-rank 16 \ + --max-model-len 32768 \ + --enforce-eager \ + --enable-auto-tool-choice \ + --tool-call-parser hermes \ + --port 8000 +``` + +## Cart Integration + +Identity injected via `plugins/cart-loader.ts` (Rust NAPI plugin). +Reads `~/.config/opencode/carts/bt7274.toml`. +No EEMS boot recalls — cart is sole identity source. + +## Incremental Retraining + +Checkpoints saved every 50 steps. To resume or extend: +- Add new examples to dataset JSONL +- Set `resume_from_checkpoint=True` in TrainingArguments +- Optimizer state, LR schedule, momentum all preserved diff --git a/docs/specialist-plan.md b/docs/specialist-plan.md new file mode 100644 index 0000000..9a25d65 --- /dev/null +++ b/docs/specialist-plan.md @@ -0,0 +1,98 @@ +# Coding Specialist LoRA Training Plan + +## Overview + +LoRA fine-tune Qwen3-Coder-Next with per-language specialist adapters. +Single vLLM instance on sin, multiple LoRA adapters, zero extra base model cost. + +## Adapters + +| Adapter | Base | Target data | Source | +|---------|------|-------------|--------| +| `build-rust` | Qwen3-Coder-Next | ~300-500 examples | opencode sessions + madcat-os repos | +| `build-ts` | Qwen3-Coder-Next | ~400-600 examples | opencode sessions + plugins/visor repos | +| `build-python` | Qwen3-Coder-Next | ~200-400 examples | opencode sessions + lora/training scripts | +| `build-ruby` | Qwen3-Coder-Next | ~100-200 examples | opencode sessions + Rails projects | +| `build-swift` | Qwen3-Coder-Next | ~50-100 examples | opencode sessions + madcat-apple | + +## Data Sources + +### 1. opencode session history + +64 `build` agent sessions, 13,598 messages in `~/.local/share/opencode/opencode.db`. +Subagent types (build-rust, etc.) don't exist in history yet — all coding +work is under generic `build` agent. Must classify by content. + +Classification signals: +- File extensions in tool calls (`.rs`, `.ts`, `.py`, `.rb`, `.swift`) +- Bash commands (`cargo`, `npm`, `pip`, `bundle`, `swift build`) +- Tool output content (compiler errors, test output) + +### 2. Git repo diffs + +Mine actual commit history for style-consistent examples: +- `git log --patch` → extract user-intent + diff pairs +- Real bug fixes, refactors, feature implementations +- Preserves Pilot's code style per language + +Target repos: +- **Rust:** marauder-os, madcat-os/*, tengu +- **TypeScript:** opencode config, plugins, visor, sere-kit +- **Python:** lora training scripts, automation +- **Ruby:** any Rails projects on mesh +- **Swift:** madcat-apple + +### 3. Synthetic augmentation + +For underrepresented languages (Ruby, Swift), generate synthetic pairs: +- Take real code from repos +- Generate "implement X" / "fix Y" / "refactor Z" prompts +- Pair with actual code as response + +## Training Config + +Same as bt7274 v2: +- LoRA r=16, alpha=16, dropout=0 +- Target modules: q/k/v/o_proj, gate/up/down_proj +- bf16, gradient checkpointing +- adamw_8bit optimizer +- 3 epochs, batch 1, grad_accum 8 + +Adjustments per specialist: +- MAX_SEQ=8192 for code (longer than chat) +- LR=1e-4 (lower for code, less style drift) + +## Serving + +Single vLLM instance on sin: + +```bash +python -m vllm.entrypoints.openai.api_server \ + --model Qwen3-Coder-Next \ + --enable-lora \ + --lora-modules \ + build-rust=/path/to/lora-rust \ + build-ts=/path/to/lora-ts \ + build-python=/path/to/lora-python \ + build-ruby=/path/to/lora-ruby \ + build-swift=/path/to/lora-swift \ + --max-lora-rank 16 \ + --port 8000 +``` + +## opencode Integration + +```json +"build-rust": { "model": "vllm/build-rust" }, +"build-ts": { "model": "vllm/build-ts" }, +"build-python": { "model": "vllm/build-python" }, +"build-ruby": { "model": "vllm/build-ruby" }, +"build-swift": { "model": "vllm/build-swift" } +``` + +## Pipeline + +1. `extract-training-data.py` — pull from opencode DB, classify by language +2. `mine-repos.py` — extract git diffs as training pairs +3. `train.py` — per-specialist training (reuse justfile tasks) +4. `just train-specialist LANG=rust` — one command per adapter diff --git a/docs/tts-clean.md b/docs/tts-clean.md new file mode 100644 index 0000000..f0190cb --- /dev/null +++ b/docs/tts-clean.md @@ -0,0 +1,65 @@ +# TTS Cleanup LoRA + +## Purpose + +Post-process LLM output for text-to-speech. Strip markdown, expand numbers, +handle abbreviations. Runs in the latency path — must be fast. + +## Model + +Qwen2.5-1.5B-Instruct + LoRA on sin. +~2GB VRAM, ~20ms inference. Negligible overhead. + +## Pipeline + +``` +LLM response (markdown) + → vLLM 1.5B tts-clean LoRA (~20ms) + → clean spoken text + → piper/chatterbox TTS engine + → audio +``` + +## Training Data + +Sources: +- 214 `core_speak` tool calls from opencode session history + (pre-speak markdown → actual spoken text pairs) +- Synthetic pairs for edge cases (tables, code blocks, URLs, numbers) + +Transformations to learn: + +| Input | Output | +|-------|--------| +| `**Status:** 3 nodes` | `Status: three nodes` | +| `10.0.0.2` | `ten dot zero dot zero dot two` | +| `~1.1M tokens` | `about one point one million tokens` | +| `LoRA r=16, α=16` | `LoRA r sixteen, alpha sixteen` | +| `## Boot Complete` | `Boot Complete.` | +| `\| Col \| Val \|` | `Col: Val.` | +| `RTX 2000 Ada 16GB` | `RTX two thousand Ada, sixteen gigabytes` | +| `` `cargo build` `` | `cargo build` | +| `$3.60` | `three dollars sixty` | +| `2026-05-25` | `May twenty-fifth, twenty twenty-six` | + +## Serving + +Separate vLLM instance on sin, port 8001: + +```bash +python -m vllm.entrypoints.openai.api_server \ + --model Qwen/Qwen2.5-1.5B-Instruct \ + --enable-lora \ + --lora-modules tts-clean=/path/to/lora-tts-clean \ + --max-lora-rank 16 \ + --port 8001 +``` + +## Integration + +Hook into `core_speak` tool or cart-loader plugin: +- Intercept text before TTS +- Call tts-clean model +- Forward cleaned text to TTS engine + +Reusable across all persona carts — not BT-specific.