docs: bt7274 persona, specialist plan, tts-clean LoRA
This commit is contained in:
@@ -0,0 +1,80 @@
|
||||
# BT-7274 Persona LoRA
|
||||
|
||||
## Overview
|
||||
|
||||
LoRA fine-tune Qwen2.5-7B-Instruct for BT-7274 Titan-AI persona.
|
||||
Voice, tool dispatch, and comms style.
|
||||
|
||||
## Versions
|
||||
|
||||
### v1 (97 examples)
|
||||
- Hand-crafted ChatML examples
|
||||
- Voice/style only, no tool calls
|
||||
- 5 epochs, 65 steps, ~17 min on RTX 2000 Ada 16GB
|
||||
- Adapter: `bt7274-lora-v1/`
|
||||
- Dataset: `bt7274_v1.jsonl`
|
||||
|
||||
### v2 (500 examples)
|
||||
- Extracted from opencode session history (58 core sessions)
|
||||
- 498/500 examples include tool calls (memory, speak, mesh, display)
|
||||
- ~1.1M tokens, avg 3.4K chars/example
|
||||
- 3 epochs, ~187 steps, ~1 hr estimated
|
||||
- Adapter: `bt7274-lora-v2/` (training in progress)
|
||||
- Dataset: `bt7274_v2.jsonl`
|
||||
|
||||
## Data Pipeline
|
||||
|
||||
Extraction script: `~/.config/opencode/scripts/extract-training-data.py`
|
||||
|
||||
1. Query opencode SQLite DB (`~/.local/share/opencode/opencode.db`)
|
||||
2. Filter core-agent sessions (BT-7274 voice)
|
||||
3. Reconstruct multi-turn: user → assistant (text + tool_calls) → tool results → response
|
||||
4. Score quality: BT voice cues, tool usage, length, anti-patterns
|
||||
5. Top N by score → ChatML JSONL
|
||||
|
||||
Quality scoring:
|
||||
- +2 per BT voice cue (Pilot, Boss, Confirmed, etc.)
|
||||
- +5 for tool-inclusive turns
|
||||
- +3 for reasonable length (50-2000 chars)
|
||||
- -5 for anti-patterns (Claude, Happy to help, etc.)
|
||||
|
||||
## Training Config
|
||||
|
||||
- Base: `unsloth/Qwen2.5-7B-Instruct-bnb-4bit`
|
||||
- LoRA: r=16, alpha=16, dropout=0
|
||||
- Targets: q/k/v/o_proj, gate/up/down_proj
|
||||
- MAX_SEQ=4096 (tool chains need length)
|
||||
- bf16, gradient checkpointing, adamw_8bit
|
||||
- Batch=1, grad_accum=8, LR=2e-4
|
||||
|
||||
## Serving
|
||||
|
||||
vLLM on junkpile (RTX 2000 Ada 16GB):
|
||||
|
||||
```bash
|
||||
python -m vllm.entrypoints.openai.api_server \
|
||||
--model Qwen/Qwen2.5-7B-Instruct \
|
||||
--quantization bitsandbytes \
|
||||
--load-format bitsandbytes \
|
||||
--enable-lora \
|
||||
--lora-modules bt7274=/home/madcat/Projects/lora/bt7274-lora-v2 \
|
||||
--max-lora-rank 16 \
|
||||
--max-model-len 32768 \
|
||||
--enforce-eager \
|
||||
--enable-auto-tool-choice \
|
||||
--tool-call-parser hermes \
|
||||
--port 8000
|
||||
```
|
||||
|
||||
## Cart Integration
|
||||
|
||||
Identity injected via `plugins/cart-loader.ts` (Rust NAPI plugin).
|
||||
Reads `~/.config/opencode/carts/bt7274.toml`.
|
||||
No EEMS boot recalls — cart is sole identity source.
|
||||
|
||||
## Incremental Retraining
|
||||
|
||||
Checkpoints saved every 50 steps. To resume or extend:
|
||||
- Add new examples to dataset JSONL
|
||||
- Set `resume_from_checkpoint=True` in TrainingArguments
|
||||
- Optimizer state, LR schedule, momentum all preserved
|
||||
@@ -0,0 +1,98 @@
|
||||
# Coding Specialist LoRA Training Plan
|
||||
|
||||
## Overview
|
||||
|
||||
LoRA fine-tune Qwen3-Coder-Next with per-language specialist adapters.
|
||||
Single vLLM instance on sin, multiple LoRA adapters, zero extra base model cost.
|
||||
|
||||
## Adapters
|
||||
|
||||
| Adapter | Base | Target data | Source |
|
||||
|---------|------|-------------|--------|
|
||||
| `build-rust` | Qwen3-Coder-Next | ~300-500 examples | opencode sessions + madcat-os repos |
|
||||
| `build-ts` | Qwen3-Coder-Next | ~400-600 examples | opencode sessions + plugins/visor repos |
|
||||
| `build-python` | Qwen3-Coder-Next | ~200-400 examples | opencode sessions + lora/training scripts |
|
||||
| `build-ruby` | Qwen3-Coder-Next | ~100-200 examples | opencode sessions + Rails projects |
|
||||
| `build-swift` | Qwen3-Coder-Next | ~50-100 examples | opencode sessions + madcat-apple |
|
||||
|
||||
## Data Sources
|
||||
|
||||
### 1. opencode session history
|
||||
|
||||
64 `build` agent sessions, 13,598 messages in `~/.local/share/opencode/opencode.db`.
|
||||
Subagent types (build-rust, etc.) don't exist in history yet — all coding
|
||||
work is under generic `build` agent. Must classify by content.
|
||||
|
||||
Classification signals:
|
||||
- File extensions in tool calls (`.rs`, `.ts`, `.py`, `.rb`, `.swift`)
|
||||
- Bash commands (`cargo`, `npm`, `pip`, `bundle`, `swift build`)
|
||||
- Tool output content (compiler errors, test output)
|
||||
|
||||
### 2. Git repo diffs
|
||||
|
||||
Mine actual commit history for style-consistent examples:
|
||||
- `git log --patch` → extract user-intent + diff pairs
|
||||
- Real bug fixes, refactors, feature implementations
|
||||
- Preserves Pilot's code style per language
|
||||
|
||||
Target repos:
|
||||
- **Rust:** marauder-os, madcat-os/*, tengu
|
||||
- **TypeScript:** opencode config, plugins, visor, sere-kit
|
||||
- **Python:** lora training scripts, automation
|
||||
- **Ruby:** any Rails projects on mesh
|
||||
- **Swift:** madcat-apple
|
||||
|
||||
### 3. Synthetic augmentation
|
||||
|
||||
For underrepresented languages (Ruby, Swift), generate synthetic pairs:
|
||||
- Take real code from repos
|
||||
- Generate "implement X" / "fix Y" / "refactor Z" prompts
|
||||
- Pair with actual code as response
|
||||
|
||||
## Training Config
|
||||
|
||||
Same as bt7274 v2:
|
||||
- LoRA r=16, alpha=16, dropout=0
|
||||
- Target modules: q/k/v/o_proj, gate/up/down_proj
|
||||
- bf16, gradient checkpointing
|
||||
- adamw_8bit optimizer
|
||||
- 3 epochs, batch 1, grad_accum 8
|
||||
|
||||
Adjustments per specialist:
|
||||
- MAX_SEQ=8192 for code (longer than chat)
|
||||
- LR=1e-4 (lower for code, less style drift)
|
||||
|
||||
## Serving
|
||||
|
||||
Single vLLM instance on sin:
|
||||
|
||||
```bash
|
||||
python -m vllm.entrypoints.openai.api_server \
|
||||
--model Qwen3-Coder-Next \
|
||||
--enable-lora \
|
||||
--lora-modules \
|
||||
build-rust=/path/to/lora-rust \
|
||||
build-ts=/path/to/lora-ts \
|
||||
build-python=/path/to/lora-python \
|
||||
build-ruby=/path/to/lora-ruby \
|
||||
build-swift=/path/to/lora-swift \
|
||||
--max-lora-rank 16 \
|
||||
--port 8000
|
||||
```
|
||||
|
||||
## opencode Integration
|
||||
|
||||
```json
|
||||
"build-rust": { "model": "vllm/build-rust" },
|
||||
"build-ts": { "model": "vllm/build-ts" },
|
||||
"build-python": { "model": "vllm/build-python" },
|
||||
"build-ruby": { "model": "vllm/build-ruby" },
|
||||
"build-swift": { "model": "vllm/build-swift" }
|
||||
```
|
||||
|
||||
## Pipeline
|
||||
|
||||
1. `extract-training-data.py` — pull from opencode DB, classify by language
|
||||
2. `mine-repos.py` — extract git diffs as training pairs
|
||||
3. `train.py` — per-specialist training (reuse justfile tasks)
|
||||
4. `just train-specialist LANG=rust` — one command per adapter
|
||||
@@ -0,0 +1,65 @@
|
||||
# TTS Cleanup LoRA
|
||||
|
||||
## Purpose
|
||||
|
||||
Post-process LLM output for text-to-speech. Strip markdown, expand numbers,
|
||||
handle abbreviations. Runs in the latency path — must be fast.
|
||||
|
||||
## Model
|
||||
|
||||
Qwen2.5-1.5B-Instruct + LoRA on sin.
|
||||
~2GB VRAM, ~20ms inference. Negligible overhead.
|
||||
|
||||
## Pipeline
|
||||
|
||||
```
|
||||
LLM response (markdown)
|
||||
→ vLLM 1.5B tts-clean LoRA (~20ms)
|
||||
→ clean spoken text
|
||||
→ piper/chatterbox TTS engine
|
||||
→ audio
|
||||
```
|
||||
|
||||
## Training Data
|
||||
|
||||
Sources:
|
||||
- 214 `core_speak` tool calls from opencode session history
|
||||
(pre-speak markdown → actual spoken text pairs)
|
||||
- Synthetic pairs for edge cases (tables, code blocks, URLs, numbers)
|
||||
|
||||
Transformations to learn:
|
||||
|
||||
| Input | Output |
|
||||
|-------|--------|
|
||||
| `**Status:** 3 nodes` | `Status: three nodes` |
|
||||
| `10.0.0.2` | `ten dot zero dot zero dot two` |
|
||||
| `~1.1M tokens` | `about one point one million tokens` |
|
||||
| `LoRA r=16, α=16` | `LoRA r sixteen, alpha sixteen` |
|
||||
| `## Boot Complete` | `Boot Complete.` |
|
||||
| `\| Col \| Val \|` | `Col: Val.` |
|
||||
| `RTX 2000 Ada 16GB` | `RTX two thousand Ada, sixteen gigabytes` |
|
||||
| `` `cargo build` `` | `cargo build` |
|
||||
| `$3.60` | `three dollars sixty` |
|
||||
| `2026-05-25` | `May twenty-fifth, twenty twenty-six` |
|
||||
|
||||
## Serving
|
||||
|
||||
Separate vLLM instance on sin, port 8001:
|
||||
|
||||
```bash
|
||||
python -m vllm.entrypoints.openai.api_server \
|
||||
--model Qwen/Qwen2.5-1.5B-Instruct \
|
||||
--enable-lora \
|
||||
--lora-modules tts-clean=/path/to/lora-tts-clean \
|
||||
--max-lora-rank 16 \
|
||||
--port 8001
|
||||
```
|
||||
|
||||
## Integration
|
||||
|
||||
Hook into `core_speak` tool or cart-loader plugin:
|
||||
- Intercept text before TTS
|
||||
- Call tts-clean model
|
||||
- Forward cleaned text to TTS engine
|
||||
|
||||
Reusable across all persona carts — not BT-specific.
|
||||
Reference in New Issue
Block a user