About

This site collects notes, patches, and write-ups from a small TTS fine-tuning project. The hardware (“grahams-brain”) is a 2010-era AMD Phenom II X6 paired with an RTX 3060 12 GB. The rig is deliberately constrained: the CPU predates AVX2, so every piece of the modern Python AI stack has to be checked for compatibility.

The corpus is ~3 hours of clean single-speaker recordings of British (Bolton/Lancashire) speech from three speakers (Sara Cox, Maxine Peake, Diane Morgan). Both F5-TTS and StyleTTS2 were fine-tuned on it. The two resulting models ship with separate production scripts because they hit different sweet spots on the trade-off between accent strength and phonetic stability.

Code, patches, gists

Minimal F5-TTS trainer (no datasets / accelerate)
Non-AVX2 CPU TTS compatibility notes
How human feedback steers TTS fine-tuning
F5-TTS vs StyleTTS2 architecture trade-off

Upstream contributions

StyleTTS2 PR: weights_only=False for PyTorch ≥ 2.6
StyleTTS2 PR: drop pandas dependency
F5-TTS issue: EMA-only checkpoint structure
kokoro issue: broken misaki version pin

Contact

GitHub: netlinux-ai