A minimal F5-TTS fine-tune trainer (no datasets, no accelerate)
A ~250-line trainer for F5-TTS that bypasses the HuggingFace `datasets`/`accelerate` dependency stack. Single-file, readable end-to-end.
Useful when:
- the upstream `f5-tts_finetune-cli` won't install/run because of `pyarrow`/`pandas`/`datasets` issues
- you want a single-file trainer you can read and modify
- you want to train on pre-computed mel-spectrograms loaded from disk rather than recomputing them every epoch
The full code (trainer + mel pre-compute helper + README with the design notes) lives at the gist:
→ gist.github.com/netlinux-ai/a7bbf6c64487bdc9ae5ff66731c5646f
Key design notes worth highlighting:
- Stub `datasets` in `sys.modules` before any `f5_tts` import — F5-TTS's own `f5_tts.model.dataset` does `from datasets import Dataset` at module load, so stubbing satisfies the import without pulling in pyarrow. Sketched below.
- Strip the `ema_model.` prefix from the published F5TTS_v1_Base checkpoint. The published file contains only EMA shadow weights; naive loaders that skip `ema_model.*` keys end up with a random-initialised model. See the companion bug report. Sketched below.
- Don't decay the LR on short fine-tunes. The warmup-then-linear-decay schedule F5-TTS uses for pretraining drives the LR to ~zero over the run, so on short (< 50 epoch) fine-tunes the late-epoch gradient updates contribute almost nothing. Use a constant LR after warmup instead. Sketched below.
- `num_workers=0` for the DataLoader. Subprocess workers re-import torch and re-run dynamo init, which can SIGFPE on older CPUs. Keep loading in the main process; with pre-computed mels, throughput is GPU-bound anyway. Sketched below.
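A minimal sketch of the `sys.modules` stub, assuming `Dataset` is the only name F5-TTS pulls from `datasets` at import time (extend the stub if your version imports more):

```python
import sys
import types

# Install a fake `datasets` module before any `f5_tts` import, so that
# `from datasets import Dataset` inside f5_tts.model.dataset resolves
# without installing the real package (and its pyarrow/pandas stack).
if "datasets" not in sys.modules:
    stub = types.ModuleType("datasets")
    stub.Dataset = type("Dataset", (), {})  # placeholder class, never instantiated
    sys.modules["datasets"] = stub

# Only now is it safe to import F5-TTS:
# from f5_tts.model import CFM, DiT
```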
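A sketch of the checkpoint fix-up. The path, the nested `ema_model_state_dict` key, and the `initted`/`step` bookkeeping names are assumptions — inspect your file's keys first (and use `safetensors.torch.load_file` if your copy is a `.safetensors` file):

```python
import torch

# `model` is assumed to be an already-constructed F5-TTS model instance.
ckpt = torch.load("ckpts/F5TTS_v1_Base/model.pt", map_location="cpu")  # placeholder path
state = ckpt.get("ema_model_state_dict", ckpt)  # unwrap if nested (key name assumed)

stripped = {}
for k, v in state.items():
    if not k.startswith("ema_model."):
        continue
    k = k[len("ema_model."):]
    if k in ("initted", "step"):
        continue  # EMA bookkeeping, not model weights (names assumed)
    stripped[k] = v

# `stripped` now matches the bare model's key names; skipping the
# ema_model.* keys instead would leave the model randomly initialised.
missing, unexpected = model.load_state_dict(stripped, strict=False)
print(f"missing={len(missing)} unexpected={len(unexpected)}")
```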
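A sketch of the constant-after-warmup schedule; the LR and `warmup_steps` values here are illustrative, not the gist's defaults:

```python
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
warmup_steps = 500  # illustrative; keep it small relative to total steps

def warmup_then_constant(step: int) -> float:
    # Ramp the LR multiplier linearly up to 1.0, then hold it there.
    # (The pretraining schedule would instead decay toward 0 after warmup.)
    return min(1.0, (step + 1) / warmup_steps)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, warmup_then_constant)

# In the training loop, per optimizer step:
#   optimizer.step(); scheduler.step()
```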
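Finally, a sketch of the main-process loader over pre-computed mels. The on-disk layout (one `.pt` dict per utterance) and the pass-through `collate_fn` are assumptions — adapt to however your pre-compute helper writes files:

```python
from pathlib import Path

import torch
from torch.utils.data import DataLoader, Dataset

class PrecomputedMelDataset(Dataset):
    """Serves mel-spectrograms saved to disk by the pre-compute helper."""

    def __init__(self, mel_dir: str):
        self.paths = sorted(Path(mel_dir).glob("*.pt"))

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx):
        # Assumed file contents: {"mel": FloatTensor[n_mels, T], "text": str}.
        return torch.load(self.paths[idx])

loader = DataLoader(
    PrecomputedMelDataset("data/mels"),  # placeholder directory
    batch_size=8,
    shuffle=True,
    num_workers=0,  # main process only: no torch re-import / dynamo re-init
                    # in forked workers, so no SIGFPE on older CPUs
    collate_fn=lambda batch: batch,  # variable-length mels: pad downstream
)
```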