netlinux-ai
Notes from a TTS fine-tuning project: getting F5-TTS and StyleTTS2 running on a 2010-era CPU (AMD Phenom II X6, no AVX2) with an RTX 3060, fine-tuning each on a small Northern English corpus, and comparing what each architecture actually learns.
The work produced two production scripts (tts.sh using StyleTTS2 for
clean phonetics; tts-f5.sh using F5-TTS for stronger accent commitment)
plus a number of upstream patches and documentation pieces published
as gists and PRs.
Posts
- How human feedback actually steers TTS fine-tuning
- Running modern Python TTS toolchains on non-AVX2 CPUs
- A minimal F5-TTS fine-tune trainer (no datasets, no accelerate)
- F5-TTS vs StyleTTS2: a real Pareto trade-off in fine-tune behaviour