# TTS Cleanup LoRA ## Purpose Post-process LLM output for text-to-speech. Strip markdown, expand numbers, handle abbreviations. Runs in the latency path — must be fast. ## Model Qwen2.5-1.5B-Instruct + LoRA on sin. ~2GB VRAM, ~20ms inference. Negligible overhead. ## Pipeline ``` LLM response (markdown) → vLLM 1.5B tts-clean LoRA (~20ms) → clean spoken text → piper/chatterbox TTS engine → audio ``` ## Training Data Sources: - 214 `core_speak` tool calls from opencode session history (pre-speak markdown → actual spoken text pairs) - Synthetic pairs for edge cases (tables, code blocks, URLs, numbers) Transformations to learn: | Input | Output | |-------|--------| | `**Status:** 3 nodes` | `Status: three nodes` | | `10.0.0.2` | `ten dot zero dot zero dot two` | | `~1.1M tokens` | `about one point one million tokens` | | `LoRA r=16, α=16` | `LoRA r sixteen, alpha sixteen` | | `## Boot Complete` | `Boot Complete.` | | `\| Col \| Val \|` | `Col: Val.` | | `RTX 2000 Ada 16GB` | `RTX two thousand Ada, sixteen gigabytes` | | `` `cargo build` `` | `cargo build` | | `$3.60` | `three dollars sixty` | | `2026-05-25` | `May twenty-fifth, twenty twenty-six` | ## Serving Separate vLLM instance on sin, port 8001: ```bash python -m vllm.entrypoints.openai.api_server \ --model Qwen/Qwen2.5-1.5B-Instruct \ --enable-lora \ --lora-modules tts-clean=/path/to/lora-tts-clean \ --max-lora-rank 16 \ --port 8001 ``` ## Integration Hook into `core_speak` tool or cart-loader plugin: - Intercept text before TTS - Call tts-clean model - Forward cleaned text to TTS engine Reusable across all persona carts — not BT-specific.