docs: bt7274 persona, specialist plan, tts-clean LoRA

This commit is contained in:
marauder-actual
2026-05-25 16:40:09 +02:00
parent 9bcf1c2d9a
commit 5deb9bd2b4
3 changed files with 243 additions and 0 deletions
+65
View File
@@ -0,0 +1,65 @@
# TTS Cleanup LoRA
## Purpose
Post-process LLM output for text-to-speech. Strip markdown, expand numbers,
handle abbreviations. Runs in the latency path — must be fast.
## Model
Qwen2.5-1.5B-Instruct + LoRA on sin.
~2GB VRAM, ~20ms inference. Negligible overhead.
## Pipeline
```
LLM response (markdown)
→ vLLM 1.5B tts-clean LoRA (~20ms)
→ clean spoken text
→ piper/chatterbox TTS engine
→ audio
```
## Training Data
Sources:
- 214 `core_speak` tool calls from opencode session history
(pre-speak markdown → actual spoken text pairs)
- Synthetic pairs for edge cases (tables, code blocks, URLs, numbers)
Transformations to learn:
| Input | Output |
|-------|--------|
| `**Status:** 3 nodes` | `Status: three nodes` |
| `10.0.0.2` | `ten dot zero dot zero dot two` |
| `~1.1M tokens` | `about one point one million tokens` |
| `LoRA r=16, α=16` | `LoRA r sixteen, alpha sixteen` |
| `## Boot Complete` | `Boot Complete.` |
| `\| Col \| Val \|` | `Col: Val.` |
| `RTX 2000 Ada 16GB` | `RTX two thousand Ada, sixteen gigabytes` |
| `` `cargo build` `` | `cargo build` |
| `$3.60` | `three dollars sixty` |
| `2026-05-25` | `May twenty-fifth, twenty twenty-six` |
## Serving
Separate vLLM instance on sin, port 8001:
```bash
python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen2.5-1.5B-Instruct \
--enable-lora \
--lora-modules tts-clean=/path/to/lora-tts-clean \
--max-lora-rank 16 \
--port 8001
```
## Integration
Hook into `core_speak` tool or cart-loader plugin:
- Intercept text before TTS
- Call tts-clean model
- Forward cleaned text to TTS engine
Reusable across all persona carts — not BT-specific.