66 lines
1.6 KiB
Markdown
66 lines
1.6 KiB
Markdown
# TTS Cleanup LoRA
|
||
|
||
## Purpose
|
||
|
||
Post-process LLM output for text-to-speech. Strip markdown, expand numbers,
|
||
handle abbreviations. Runs in the latency path — must be fast.
|
||
|
||
## Model
|
||
|
||
Qwen2.5-1.5B-Instruct + LoRA on sin.
|
||
~2GB VRAM, ~20ms inference. Negligible overhead.
|
||
|
||
## Pipeline
|
||
|
||
```
|
||
LLM response (markdown)
|
||
→ vLLM 1.5B tts-clean LoRA (~20ms)
|
||
→ clean spoken text
|
||
→ piper/chatterbox TTS engine
|
||
→ audio
|
||
```
|
||
|
||
## Training Data
|
||
|
||
Sources:
|
||
- 214 `core_speak` tool calls from opencode session history
|
||
(pre-speak markdown → actual spoken text pairs)
|
||
- Synthetic pairs for edge cases (tables, code blocks, URLs, numbers)
|
||
|
||
Transformations to learn:
|
||
|
||
| Input | Output |
|
||
|-------|--------|
|
||
| `**Status:** 3 nodes` | `Status: three nodes` |
|
||
| `10.0.0.2` | `ten dot zero dot zero dot two` |
|
||
| `~1.1M tokens` | `about one point one million tokens` |
|
||
| `LoRA r=16, α=16` | `LoRA r sixteen, alpha sixteen` |
|
||
| `## Boot Complete` | `Boot Complete.` |
|
||
| `\| Col \| Val \|` | `Col: Val.` |
|
||
| `RTX 2000 Ada 16GB` | `RTX two thousand Ada, sixteen gigabytes` |
|
||
| `` `cargo build` `` | `cargo build` |
|
||
| `$3.60` | `three dollars sixty` |
|
||
| `2026-05-25` | `May twenty-fifth, twenty twenty-six` |
|
||
|
||
## Serving
|
||
|
||
Separate vLLM instance on sin, port 8001:
|
||
|
||
```bash
|
||
python -m vllm.entrypoints.openai.api_server \
|
||
--model Qwen/Qwen2.5-1.5B-Instruct \
|
||
--enable-lora \
|
||
--lora-modules tts-clean=/path/to/lora-tts-clean \
|
||
--max-lora-rank 16 \
|
||
--port 8001
|
||
```
|
||
|
||
## Integration
|
||
|
||
Hook into `core_speak` tool or cart-loader plugin:
|
||
- Intercept text before TTS
|
||
- Call tts-clean model
|
||
- Forward cleaned text to TTS engine
|
||
|
||
Reusable across all persona carts — not BT-specific.
|