Files
lora/docs/tts-clean.md
T
2026-05-25 16:40:09 +02:00

66 lines
1.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# TTS Cleanup LoRA
## Purpose
Post-process LLM output for text-to-speech. Strip markdown, expand numbers,
handle abbreviations. Runs in the latency path — must be fast.
## Model
Qwen2.5-1.5B-Instruct + LoRA on sin.
~2GB VRAM, ~20ms inference. Negligible overhead.
## Pipeline
```
LLM response (markdown)
→ vLLM 1.5B tts-clean LoRA (~20ms)
→ clean spoken text
→ piper/chatterbox TTS engine
→ audio
```
## Training Data
Sources:
- 214 `core_speak` tool calls from opencode session history
(pre-speak markdown → actual spoken text pairs)
- Synthetic pairs for edge cases (tables, code blocks, URLs, numbers)
Transformations to learn:
| Input | Output |
|-------|--------|
| `**Status:** 3 nodes` | `Status: three nodes` |
| `10.0.0.2` | `ten dot zero dot zero dot two` |
| `~1.1M tokens` | `about one point one million tokens` |
| `LoRA r=16, α=16` | `LoRA r sixteen, alpha sixteen` |
| `## Boot Complete` | `Boot Complete.` |
| `\| Col \| Val \|` | `Col: Val.` |
| `RTX 2000 Ada 16GB` | `RTX two thousand Ada, sixteen gigabytes` |
| `` `cargo build` `` | `cargo build` |
| `$3.60` | `three dollars sixty` |
| `2026-05-25` | `May twenty-fifth, twenty twenty-six` |
## Serving
Separate vLLM instance on sin, port 8001:
```bash
python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen2.5-1.5B-Instruct \
--enable-lora \
--lora-modules tts-clean=/path/to/lora-tts-clean \
--max-lora-rank 16 \
--port 8001
```
## Integration
Hook into `core_speak` tool or cart-loader plugin:
- Intercept text before TTS
- Call tts-clean model
- Forward cleaned text to TTS engine
Reusable across all persona carts — not BT-specific.