docs: bt7274 persona, specialist plan, tts-clean LoRA
This commit is contained in:
@@ -0,0 +1,65 @@
|
||||
# TTS Cleanup LoRA
|
||||
|
||||
## Purpose
|
||||
|
||||
Post-process LLM output for text-to-speech. Strip markdown, expand numbers,
|
||||
handle abbreviations. Runs in the latency path — must be fast.
|
||||
|
||||
## Model
|
||||
|
||||
Qwen2.5-1.5B-Instruct + LoRA on sin.
|
||||
~2GB VRAM, ~20ms inference. Negligible overhead.
|
||||
|
||||
## Pipeline
|
||||
|
||||
```
|
||||
LLM response (markdown)
|
||||
→ vLLM 1.5B tts-clean LoRA (~20ms)
|
||||
→ clean spoken text
|
||||
→ piper/chatterbox TTS engine
|
||||
→ audio
|
||||
```
|
||||
|
||||
## Training Data
|
||||
|
||||
Sources:
|
||||
- 214 `core_speak` tool calls from opencode session history
|
||||
(pre-speak markdown → actual spoken text pairs)
|
||||
- Synthetic pairs for edge cases (tables, code blocks, URLs, numbers)
|
||||
|
||||
Transformations to learn:
|
||||
|
||||
| Input | Output |
|
||||
|-------|--------|
|
||||
| `**Status:** 3 nodes` | `Status: three nodes` |
|
||||
| `10.0.0.2` | `ten dot zero dot zero dot two` |
|
||||
| `~1.1M tokens` | `about one point one million tokens` |
|
||||
| `LoRA r=16, α=16` | `LoRA r sixteen, alpha sixteen` |
|
||||
| `## Boot Complete` | `Boot Complete.` |
|
||||
| `\| Col \| Val \|` | `Col: Val.` |
|
||||
| `RTX 2000 Ada 16GB` | `RTX two thousand Ada, sixteen gigabytes` |
|
||||
| `` `cargo build` `` | `cargo build` |
|
||||
| `$3.60` | `three dollars sixty` |
|
||||
| `2026-05-25` | `May twenty-fifth, twenty twenty-six` |
|
||||
|
||||
## Serving
|
||||
|
||||
Separate vLLM instance on sin, port 8001:
|
||||
|
||||
```bash
|
||||
python -m vllm.entrypoints.openai.api_server \
|
||||
--model Qwen/Qwen2.5-1.5B-Instruct \
|
||||
--enable-lora \
|
||||
--lora-modules tts-clean=/path/to/lora-tts-clean \
|
||||
--max-lora-rank 16 \
|
||||
--port 8001
|
||||
```
|
||||
|
||||
## Integration
|
||||
|
||||
Hook into `core_speak` tool or cart-loader plugin:
|
||||
- Intercept text before TTS
|
||||
- Call tts-clean model
|
||||
- Forward cleaned text to TTS engine
|
||||
|
||||
Reusable across all persona carts — not BT-specific.
|
||||
Reference in New Issue
Block a user