Files
lora/docs/tts-clean.md
T
2026-05-25 16:40:09 +02:00

1.6 KiB
Raw Blame History

TTS Cleanup LoRA

Purpose

Post-process LLM output for text-to-speech. Strip markdown, expand numbers, handle abbreviations. Runs in the latency path — must be fast.

Model

Qwen2.5-1.5B-Instruct + LoRA on sin. ~2GB VRAM, ~20ms inference. Negligible overhead.

Pipeline

LLM response (markdown)
    → vLLM 1.5B tts-clean LoRA (~20ms)
    → clean spoken text
    → piper/chatterbox TTS engine
    → audio

Training Data

Sources:

  • 214 core_speak tool calls from opencode session history (pre-speak markdown → actual spoken text pairs)
  • Synthetic pairs for edge cases (tables, code blocks, URLs, numbers)

Transformations to learn:

Input Output
**Status:** 3 nodes Status: three nodes
10.0.0.2 ten dot zero dot zero dot two
~1.1M tokens about one point one million tokens
LoRA r=16, α=16 LoRA r sixteen, alpha sixteen
## Boot Complete Boot Complete.
| Col | Val | Col: Val.
RTX 2000 Ada 16GB RTX two thousand Ada, sixteen gigabytes
`cargo build` cargo build
$3.60 three dollars sixty
2026-05-25 May twenty-fifth, twenty twenty-six

Serving

Separate vLLM instance on sin, port 8001:

python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-1.5B-Instruct \
  --enable-lora \
  --lora-modules tts-clean=/path/to/lora-tts-clean \
  --max-lora-rank 16 \
  --port 8001

Integration

Hook into core_speak tool or cart-loader plugin:

  • Intercept text before TTS
  • Call tts-clean model
  • Forward cleaned text to TTS engine

Reusable across all persona carts — not BT-specific.