docs: bt7274 persona, specialist plan, tts-clean LoRA

2026-05-25 16:40:09 +02:00
parent 9bcf1c2d9a
commit 5deb9bd2b4
3 changed files with 243 additions and 0 deletions
@@ -0,0 +1,65 @@
+# TTS Cleanup LoRA
+
+## Purpose
+
+Post-process LLM output for text-to-speech. Strip markdown, expand numbers,
+handle abbreviations. Runs in the latency path — must be fast.
+
+## Model
+
+Qwen2.5-1.5B-Instruct + LoRA on sin.
+~2GB VRAM, ~20ms inference. Negligible overhead.
+
+## Pipeline
+
+```
+LLM response (markdown)
+    → vLLM 1.5B tts-clean LoRA (~20ms)
+    → clean spoken text
+    → piper/chatterbox TTS engine
+    → audio
+```
+
+## Training Data
+
+Sources:
+- 214 `core_speak` tool calls from opencode session history
+  (pre-speak markdown → actual spoken text pairs)
+- Synthetic pairs for edge cases (tables, code blocks, URLs, numbers)
+
+Transformations to learn:
+
+| Input | Output |
+|-------|--------|
+| `**Status:** 3 nodes` | `Status: three nodes` |
+| `10.0.0.2` | `ten dot zero dot zero dot two` |
+| `~1.1M tokens` | `about one point one million tokens` |
+| `LoRA r=16, α=16` | `LoRA r sixteen, alpha sixteen` |
+| `## Boot Complete` | `Boot Complete.` |
+| `\| Col \| Val \|` | `Col: Val.` |
+| `RTX 2000 Ada 16GB` | `RTX two thousand Ada, sixteen gigabytes` |
+| `` `cargo build` `` | `cargo build` |
+| `$3.60` | `three dollars sixty` |
+| `2026-05-25` | `May twenty-fifth, twenty twenty-six` |
+
+## Serving
+
+Separate vLLM instance on sin, port 8001:
+
+```bash
+python -m vllm.entrypoints.openai.api_server \
+  --model Qwen/Qwen2.5-1.5B-Instruct \
+  --enable-lora \
+  --lora-modules tts-clean=/path/to/lora-tts-clean \
+  --max-lora-rank 16 \
+  --port 8001
+```
+
+## Integration
+
+Hook into `core_speak` tool or cart-loader plugin:
+- Intercept text before TTS
+- Call tts-clean model
+- Forward cleaned text to TTS engine
+
+Reusable across all persona carts — not BT-specific.