feat: tts-norm LoRA — dataset generator + training script

gen_tts_dataset.py: generates 4960 synthetic examples across 22 categories
(numbers, currencies, dates, times, temperatures, acronyms, NATO phonetic
alphabet, URLs, markdown, etc.). Bilingual EN/PL; each input carries an
explicit [lang] tag prefix.
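A minimal sketch of what one category generator could look like. The commit does not show the record layout, so the `input`/`output` field names, the lowercase `[en]`/`[pl]` tag spelling, the call-sign framing of the NATO category, and the helpers `make_example` and `nato_example` are all hypothetical. Only the idea of a seeded generator that emits (tagged raw text, spoken form) pairs is illustrated here.

```python
import json
import random

# ICAO/NATO spelling alphabet. It is international, so this sketch uses the
# same words for both EN and PL examples (an assumption, not from the commit).
NATO = {
    "A": "Alfa", "B": "Bravo", "C": "Charlie", "D": "Delta", "E": "Echo",
    "F": "Foxtrot", "G": "Golf", "H": "Hotel", "I": "India", "J": "Juliett",
    "K": "Kilo", "L": "Lima", "M": "Mike", "N": "November", "O": "Oscar",
    "P": "Papa", "Q": "Quebec", "R": "Romeo", "S": "Sierra", "T": "Tango",
    "U": "Uniform", "V": "Victor", "W": "Whiskey", "X": "X-ray",
    "Y": "Yankee", "Z": "Zulu",
}


def make_example(lang: str, raw: str, spoken: str) -> dict:
    """Hypothetical record layout: the language tag is prefixed to the input."""
    return {"input": f"[{lang}] {raw}", "output": spoken}


def nato_example(rng: random.Random, lang: str) -> dict:
    """Spell a random 3-letter call sign with the phonetic alphabet."""
    letters = "".join(rng.choice(sorted(NATO)) for _ in range(3))
    spoken = " ".join(NATO[ch] for ch in letters)
    return make_example(lang, letters, spoken)


rng = random.Random(0)  # fixed seed so the generated dataset is reproducible
examples = [nato_example(rng, lang) for lang in ("en", "pl") for _ in range(2)]
for ex in examples:
    print(json.dumps(ex, ensure_ascii=False))
```

Seeding a dedicated `random.Random` instance, rather than the global generator, keeps each category's output stable even if other generators draw random numbers in a different order.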

train_tts_norm.py: Unsloth LoRA fine-tuning of Qwen2.5-7B-Instruct.
LoRA rank 16, 3 epochs, sequence packing, max sequence length 768.
Trained on H100 in 20m38s, final loss 0.091. Adapter size: 154MB.
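Packing concatenates several short training examples into one sequence, so that a 768-token window is not mostly padding. The exact packing strategy differs across TRL/Unsloth versions. The sketch below shows only the classic concatenate-and-chunk idea, using a hypothetical `pack_sequences` helper that operates on already-tokenized ids.

```python
from typing import Iterable, List


def pack_sequences(
    tokenized: Iterable[List[int]], max_seq_length: int, eos_id: int
) -> List[List[int]]:
    """Concatenate examples, each followed by EOS, then cut the token stream
    into fixed-length chunks. A trailing partial chunk is dropped."""
    stream: List[int] = []
    for ids in tokenized:
        stream.extend(ids)
        stream.append(eos_id)  # EOS marks where one example ends
    n_full = len(stream) // max_seq_length
    return [stream[i * max_seq_length:(i + 1) * max_seq_length] for i in range(n_full)]


# Toy usage with a window of 4 tokens instead of 768:
print(pack_sequences([[1, 2], [3, 4, 5], [6]], max_seq_length=4, eos_id=0))
```

The trade-off is that an example can be split across a chunk boundary. Unless the trainer masks attention per example, tokens can also attend across example boundaries. Both effects are usually tolerated in exchange for higher throughput.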
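As a sanity check, the 154MB adapter size matches the following configuration: rank-16 LoRA on all seven attention and MLP projections of Qwen2.5-7B, with adapter weights stored in fp32. The commit does not state this configuration. The target-module list, the storage dtype, and the model shapes below (taken to match the published Qwen2.5-7B config) are assumptions used for the arithmetic.

```python
# Assumed Qwen2.5-7B shapes (believed to match its published config.json).
HIDDEN = 3584
INTERMEDIATE = 18944
LAYERS = 28
KV_DIM = 4 * 128  # 4 key/value heads x head_dim 128 (grouped-query attention)

RANK = 16

# (in_features, out_features) of each linear layer LoRA is assumed to wrap.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int, shapes: dict, layers: int) -> int:
    # LoRA adds A (rank x in) and B (out x rank) per wrapped layer: rank * (in + out).
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * layers


params = lora_params(RANK, TARGETS, LAYERS)
mib_fp32 = params * 4 / 2**20  # 4 bytes per fp32 weight
print(params, mib_fp32)  # → 40370176 154.0
```

About 40.4M trainable parameters at 4 bytes each comes to exactly 154 MiB, before the few bytes of safetensors header. The same adapter saved in bf16 would be about half that size.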
Author: marauder-actual
Date: 2026-05-26 00:14:51 +02:00
Commit: 122e73860b (parent 8137d278db)
2 changed files with 1525 additions and 0 deletions