About
This site collects notes, patches, and write-ups from a small TTS fine-tuning project. The hardware (“grahams-brain”) is a 2010-era AMD Phenom II X6 paired with an RTX 3060 12 GB. It is a deliberately constrained rig: the CPU predates AVX2, so every piece of the modern Python AI stack has to be checked for compatibility.
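On Linux, one quick way to confirm what an older CPU lacks is to read the feature flags the kernel reports in /proc/cpuinfo. The sketch below assumes a Linux x86 host. The helper names cpu_flags and has_avx2 are hypothetical illustrations, not part of any tool used in the project.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the CPU feature flags listed in Linux /proc/cpuinfo text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels list CPU features on lines whose key is exactly "flags".
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx2(cpuinfo_text: str) -> bool:
    """True if the cpuinfo text advertises the avx2 feature flag."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("AVX2 supported:", has_avx2(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found (not a Linux host?)")
```

A prebuilt wheel compiled for AVX2 typically fails on such a CPU with an illegal-instruction crash rather than a clear error message. Checking the flags first helps tell that failure apart from an ordinary bug.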
The corpus is ~3 hours of clean British (Bolton/Lancashire) speech, made up of single-speaker recordings from three speakers (Sara Cox, Maxine Peake, Diane Morgan). Both F5-TTS and StyleTTS2 were fine-tuned on it. The resulting models ship as separate production scripts because each strikes a different balance between accent strength and phonetic stability.
Code, patches, gists
- Minimal F5-TTS trainer (without the datasets or accelerate libraries)
- Non-AVX2 CPU TTS compatibility notes
- How human feedback steers TTS fine-tuning
- F5-TTS vs StyleTTS2 architecture trade-off
Upstream contributions
- StyleTTS2 PR: weights_only=False for PyTorch ≥ 2.6
- StyleTTS2 PR: drop pandas dependency
- F5-TTS issue: EMA-only checkpoint structure
- kokoro issue: broken misaki version pin
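The PyTorch 2.6 change behind the first StyleTTS2 PR is that torch.load now defaults to weights_only=True, which rejects checkpoints that pickle arbitrary Python objects. The minimal, version-gated sketch below assumes the checkpoint comes from a trusted source. The helpers torch_load_kwargs and load_trusted_checkpoint are hypothetical names and do not come from the PR itself.

```python
def torch_load_kwargs(torch_version: str) -> dict:
    """Extra torch.load() kwargs needed to load full-pickle checkpoints."""
    # Drop local build tags such as "+cu121" before parsing major.minor.
    release = torch_version.split("+")[0]
    major, minor = (int(part) for part in release.split(".")[:2])
    # From 2.6 on, torch.load defaults to weights_only=True; older releases
    # already default to False, so no extra argument is needed there.
    if (major, minor) >= (2, 6):
        return {"weights_only": False}
    return {}


def load_trusted_checkpoint(path: str):
    """Load a trusted checkpoint that contains more than plain tensors."""
    import torch  # imported lazily so the version helper works without torch

    return torch.load(path, map_location="cpu", **torch_load_kwargs(torch.__version__))
```

Only opt out for files you trust. Full unpickling can execute arbitrary code, which is exactly what the new default guards against.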
Contact
GitHub: netlinux-ai