TTS Cleanup LoRA

Purpose

Post-process LLM output for text-to-speech. Strip markdown, expand numbers, handle abbreviations. Runs in the latency path — must be fast.

Model

Qwen2.5-1.5B-Instruct + LoRA on sin. ~2GB VRAM, ~20ms inference. Negligible overhead.

Pipeline

LLM response (markdown)
    → vLLM 1.5B tts-clean LoRA (~20ms)
    → clean spoken text
    → piper/chatterbox TTS engine
    → audio

Training Data

Sources:

214 core_speak tool calls from opencode session history (pre-speak markdown → actual spoken text pairs)
Synthetic pairs for edge cases (tables, code blocks, URLs, numbers)

Transformations to learn:

Input	Output
`Status: 3 nodes`	`Status: three nodes`
`10.0.0.2`	`ten dot zero dot zero dot two`
`~1.1M tokens`	`about one point one million tokens`
`LoRA r=16, α=16`	`LoRA r sixteen, alpha sixteen`
`## Boot Complete`	`Boot Complete.`
`\| Col \| Val \|`	`Col: Val.`
`RTX 2000 Ada 16GB`	`RTX two thousand Ada, sixteen gigabytes`
`cargo build`	`cargo build`
`$3.60`	`three dollars sixty`
`2026-05-25`	`May twenty-fifth, twenty twenty-six`

Serving

Separate vLLM instance on sin, port 8001:

python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-1.5B-Instruct \
  --enable-lora \
  --lora-modules tts-clean=/path/to/lora-tts-clean \
  --max-lora-rank 16 \
  --port 8001

Integration

Hook into core_speak tool or cart-loader plugin:

Intercept text before TTS
Call tts-clean model
Forward cleaned text to TTS engine

Reusable across all persona carts — not BT-specific.

1.6 KiB Raw Blame History Unescape Escape