From 5deb9bd2b454502bc53c4744a2cd717f06fa96f7 Mon Sep 17 00:00:00 2001
From: marauder-actual <marauder@saiden.dev>
Date: Mon, 25 May 2026 16:40:09 +0200
Subject: [PATCH] docs: bt7274 persona, specialist plan, tts-clean LoRA

---
 docs/bt7274-persona.md  | 80 +++++++++++++++++++++++++++++++++
 docs/specialist-plan.md | 98 +++++++++++++++++++++++++++++++++++++++++
 docs/tts-clean.md       | 65 +++++++++++++++++++++++++++
 3 files changed, 243 insertions(+)
 create mode 100644 docs/bt7274-persona.md
 create mode 100644 docs/specialist-plan.md
 create mode 100644 docs/tts-clean.md

diff --git a/docs/bt7274-persona.md b/docs/bt7274-persona.md
new file mode 100644
index 0000000..8187f5e
--- /dev/null
+++ b/docs/bt7274-persona.md
@@ -0,0 +1,80 @@
+# BT-7274 Persona LoRA
+
+## Overview
+
+LoRA fine-tune of Qwen2.5-7B-Instruct for the BT-7274 Titan-AI persona.
+Covers voice, tool dispatch, and comms style.
+
+## Versions
+
+### v1 (97 examples)
+- Hand-crafted ChatML examples
+- Voice/style only, no tool calls
+- 5 epochs, 65 steps, ~17 min on RTX 2000 Ada 16GB
+- Adapter: `bt7274-lora-v1/`
+- Dataset: `bt7274_v1.jsonl`
+
+### v2 (500 examples)
+- Extracted from opencode session history (58 core sessions)
+- 498/500 examples include tool calls (memory, speak, mesh, display)
+- ~1.1M tokens, avg 3.4K chars/example
+- 3 epochs, ~187 steps, ~1 hr estimated
+- Adapter: `bt7274-lora-v2/` (training in progress)
+- Dataset: `bt7274_v2.jsonl`
+
+## Data Pipeline
+
+Extraction script: `~/.config/opencode/scripts/extract-training-data.py`
+
+1. Query opencode SQLite DB (`~/.local/share/opencode/opencode.db`)
+2. Filter core-agent sessions (BT-7274 voice)
+3. Reconstruct multi-turn: user → assistant (text + tool_calls) → tool results → response
+4. Score quality: BT voice cues, tool usage, length, anti-patterns
+5. Top N by score → ChatML JSONL
+
+Quality scoring:
+- +2 per BT voice cue (Pilot, Boss, Confirmed, etc.)
+- +5 for tool-inclusive turns
+- +3 for reasonable length (50-2000 chars)
+- -5 for anti-patterns (Claude, Happy to help, etc.)
+
+## Training Config
+
+- Base: `unsloth/Qwen2.5-7B-Instruct-bnb-4bit`
+- LoRA: r=16, alpha=16, dropout=0
+- Targets: q/k/v/o_proj, gate/up/down_proj
+- MAX_SEQ=4096 (tool chains need length)
+- bf16, gradient checkpointing, adamw_8bit
+- Batch=1, grad_accum=8, LR=2e-4
+
+## Serving
+
+vLLM on junkpile (RTX 2000 Ada 16GB):
+
+```bash
+python -m vllm.entrypoints.openai.api_server \
+  --model Qwen/Qwen2.5-7B-Instruct \
+  --quantization bitsandbytes \
+  --load-format bitsandbytes \
+  --enable-lora \
+  --lora-modules bt7274=/home/madcat/Projects/lora/bt7274-lora-v2 \
+  --max-lora-rank 16 \
+  --max-model-len 32768 \
+  --enforce-eager \
+  --enable-auto-tool-choice \
+  --tool-call-parser hermes \
+  --port 8000
+```
+
+## Cart Integration
+
+Identity injected via `plugins/cart-loader.ts` (Rust NAPI plugin).
+Reads `~/.config/opencode/carts/bt7274.toml`.
+No EEMS boot recalls — cart is sole identity source.
+
+## Incremental Retraining
+
+Checkpoints saved every 50 steps. To resume or extend:
+- Add new examples to dataset JSONL
+- Pass `resume_from_checkpoint=True` to `trainer.train()` (`Trainer` does not read the `TrainingArguments` field itself)
+- Optimizer state (including momentum) and LR schedule are restored from the checkpoint
diff --git a/docs/specialist-plan.md b/docs/specialist-plan.md
new file mode 100644
index 0000000..9a25d65
--- /dev/null
+++ b/docs/specialist-plan.md
@@ -0,0 +1,98 @@
+# Coding Specialist LoRA Training Plan
+
+## Overview
+
+LoRA fine-tune of Qwen3-Coder-Next, with one specialist adapter per language.
+Single vLLM instance on sin serves all adapters, so there is zero extra base model cost.
+
+## Adapters
+
+| Adapter | Base | Target data | Source |
+|---------|------|-------------|--------|
+| `build-rust` | Qwen3-Coder-Next | ~300-500 examples | opencode sessions + madcat-os repos |
+| `build-ts` | Qwen3-Coder-Next | ~400-600 examples | opencode sessions + plugins/visor repos |
+| `build-python` | Qwen3-Coder-Next | ~200-400 examples | opencode sessions + lora/training scripts |
+| `build-ruby` | Qwen3-Coder-Next | ~100-200 examples | opencode sessions + Rails projects |
+| `build-swift` | Qwen3-Coder-Next | ~50-100 examples | opencode sessions + madcat-apple |
+
+## Data Sources
+
+### 1. opencode session history
+
+64 `build` agent sessions, 13,598 messages in `~/.local/share/opencode/opencode.db`.
+Subagent types (build-rust, etc.) don't exist in history yet — all coding
+work is under generic `build` agent. Must classify by content.
+
+Classification signals:
+- File extensions in tool calls (`.rs`, `.ts`, `.py`, `.rb`, `.swift`)
+- Bash commands (`cargo`, `npm`, `pip`, `bundle`, `swift build`)
+- Tool output content (compiler errors, test output)
+
+### 2. Git repo diffs
+
+Mine actual commit history for style-consistent examples:
+- `git log --patch` → extract user-intent + diff pairs
+- Real bug fixes, refactors, feature implementations
+- Preserves Pilot's code style per language
+
+Target repos:
+- **Rust:** marauder-os, madcat-os/*, tengu
+- **TypeScript:** opencode config, plugins, visor, sere-kit
+- **Python:** lora training scripts, automation
+- **Ruby:** any Rails projects on mesh
+- **Swift:** madcat-apple
+
+### 3. Synthetic augmentation
+
+For underrepresented languages (Ruby, Swift), generate synthetic pairs:
+- Take real code from repos
+- Generate "implement X" / "fix Y" / "refactor Z" prompts
+- Pair with actual code as response
+
+## Training Config
+
+Same as bt7274 v2:
+- LoRA r=16, alpha=16, dropout=0
+- Target modules: q/k/v/o_proj, gate/up/down_proj
+- bf16, gradient checkpointing
+- adamw_8bit optimizer
+- 3 epochs, batch 1, grad_accum 8
+
+Adjustments per specialist:
+- MAX_SEQ=8192 for code (longer than chat)
+- LR=1e-4 (lower for code, less style drift)
+
+## Serving
+
+Single vLLM instance on sin:
+
+```bash
+python -m vllm.entrypoints.openai.api_server \
+  --model Qwen3-Coder-Next \
+  --enable-lora \
+  --lora-modules \
+    build-rust=/path/to/lora-rust \
+    build-ts=/path/to/lora-ts \
+    build-python=/path/to/lora-python \
+    build-ruby=/path/to/lora-ruby \
+    build-swift=/path/to/lora-swift \
+  --max-lora-rank 16 \
+  --port 8000
+```
+
+## opencode Integration
+
+```json
+"build-rust": { "model": "vllm/build-rust" },
+"build-ts": { "model": "vllm/build-ts" },
+"build-python": { "model": "vllm/build-python" },
+"build-ruby": { "model": "vllm/build-ruby" },
+"build-swift": { "model": "vllm/build-swift" }
+```
+
+## Pipeline
+
+1. `extract-training-data.py` — pull from opencode DB, classify by language
+2. `mine-repos.py` — extract git diffs as training pairs
+3. `train.py` — per-specialist training (reuse justfile tasks)
+4. `just LANG=rust train-specialist` — one command per adapter
diff --git a/docs/tts-clean.md b/docs/tts-clean.md
new file mode 100644
index 0000000..f0190cb
--- /dev/null
+++ b/docs/tts-clean.md
@@ -0,0 +1,65 @@
+# TTS Cleanup LoRA
+
+## Purpose
+
+Post-process LLM output for text-to-speech. Strip markdown, expand numbers,
+handle abbreviations. Runs in the latency path — must be fast.
+
+## Model
+
+Qwen2.5-1.5B-Instruct + LoRA on sin.
+~2GB VRAM, ~20ms inference. Negligible overhead.
+
+## Pipeline
+
+```
+LLM response (markdown)
+    → vLLM 1.5B tts-clean LoRA (~20ms)
+    → clean spoken text
+    → piper/chatterbox TTS engine
+    → audio
+```
+
+## Training Data
+
+Sources:
+- 214 `core_speak` tool calls from opencode session history
+  (pre-speak markdown → actual spoken text pairs)
+- Synthetic pairs for edge cases (tables, code blocks, URLs, numbers)
+
+Transformations to learn:
+
+| Input | Output |
+|-------|--------|
+| `**Status:** 3 nodes` | `Status: three nodes` |
+| `10.0.0.2` | `ten dot zero dot zero dot two` |
+| `~1.1M tokens` | `about one point one million tokens` |
+| `LoRA r=16, α=16` | `LoRA r sixteen, alpha sixteen` |
+| `## Boot Complete` | `Boot Complete.` |
+| `\| Col \| Val \|` | `Col: Val.` |
+| `RTX 2000 Ada 16GB` | `RTX two thousand Ada, sixteen gigabytes` |
+| `` `cargo build` `` | `cargo build` |
+| `$3.60` | `three dollars sixty` |
+| `2026-05-25` | `May twenty-fifth, twenty twenty-six` |
+
+## Serving
+
+Separate vLLM instance on sin, port 8001:
+
+```bash
+python -m vllm.entrypoints.openai.api_server \
+  --model Qwen/Qwen2.5-1.5B-Instruct \
+  --enable-lora \
+  --lora-modules tts-clean=/path/to/lora-tts-clean \
+  --max-lora-rank 16 \
+  --port 8001
+```
+
+## Integration
+
+Hook into `core_speak` tool or cart-loader plugin:
+- Intercept text before TTS
+- Call tts-clean model
+- Forward cleaned text to TTS engine
+
+Reusable across all persona carts — not BT-specific.