lora/docs/tts-clean.md

# TTS Cleanup LoRA

## Purpose

Post-process LLM output for text-to-speech. Strip markdown, expand numbers,
handle abbreviations. Runs in the latency path — must be fast.

## Model

Qwen2.5-1.5B-Instruct + LoRA on sin.
~2GB VRAM, ~20ms inference. Negligible overhead.

## Pipeline

```
LLM response (markdown)
    → vLLM 1.5B tts-clean LoRA (~20ms)
    → clean spoken text
    → piper/chatterbox TTS engine
    → audio
```

## Training Data

Sources:
- 214 `core_speak` tool calls from opencode session history
  (pre-speak markdown → actual spoken text pairs)
- Synthetic pairs for edge cases (tables, code blocks, URLs, numbers)

Transformations to learn:

| Input | Output |
|-------|--------|
| `**Status:** 3 nodes` | `Status: three nodes` |
| `10.0.0.2` | `ten dot zero dot zero dot two` |
| `~1.1M tokens` | `about one point one million tokens` |
| `LoRA r=16, α=16` | `LoRA r sixteen, alpha sixteen` |
| `## Boot Complete` | `Boot Complete.` |
| `\| Col \| Val \|` | `Col: Val.` |
| `RTX 2000 Ada 16GB` | `RTX two thousand Ada, sixteen gigabytes` |
| `` `cargo build` `` | `cargo build` |
| `$3.60` | `three dollars sixty` |
| `2026-05-25` | `May twenty-fifth, twenty twenty-six` |

## Serving

Separate vLLM instance on sin, port 8001:

```bash
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-1.5B-Instruct \
  --enable-lora \
  --lora-modules tts-clean=/path/to/lora-tts-clean \
  --max-lora-rank 16 \
  --port 8001
```

## Integration

Hook into `core_speak` tool or cart-loader plugin:
- Intercept text before TTS
- Call tts-clean model
- Forward cleaned text to TTS engine

Reusable across all persona carts — not BT-specific.