1.6 KiB
1.6 KiB
TTS Cleanup LoRA
Purpose
Post-process LLM output for text-to-speech. Strip markdown, expand numbers, handle abbreviations. Runs in the latency path — must be fast.
Model
Qwen2.5-1.5B-Instruct + LoRA on sin. ~2GB VRAM, ~20ms inference. Negligible overhead.
Pipeline
LLM response (markdown)
→ vLLM 1.5B tts-clean LoRA (~20ms)
→ clean spoken text
→ piper/chatterbox TTS engine
→ audio
Training Data
Sources:
- 214
core_speaktool calls from opencode session history (pre-speak markdown → actual spoken text pairs) - Synthetic pairs for edge cases (tables, code blocks, URLs, numbers)
Transformations to learn:
| Input | Output |
|---|---|
**Status:** 3 nodes |
Status: three nodes |
10.0.0.2 |
ten dot zero dot zero dot two |
~1.1M tokens |
about one point one million tokens |
LoRA r=16, α=16 |
LoRA r sixteen, alpha sixteen |
## Boot Complete |
Boot Complete. |
| Col | Val | |
Col: Val. |
RTX 2000 Ada 16GB |
RTX two thousand Ada, sixteen gigabytes |
`cargo build` |
cargo build |
$3.60 |
three dollars sixty |
2026-05-25 |
May twenty-fifth, twenty twenty-six |
Serving
Separate vLLM instance on sin, port 8001:
python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen2.5-1.5B-Instruct \
--enable-lora \
--lora-modules tts-clean=/path/to/lora-tts-clean \
--max-lora-rank 16 \
--port 8001
Integration
Hook into core_speak tool or cart-loader plugin:
- Intercept text before TTS
- Call tts-clean model
- Forward cleaned text to TTS engine
Reusable across all persona carts — not BT-specific.