A ~250-line trainer for F5-TTS that bypasses the HuggingFace datasets and accelerate dependency stack. Single-file, readable end-to-end.

Useful when:

  • the upstream f5-tts_finetune-cli won’t install/run because of pyarrow / pandas / datasets issues
  • you want a single-file trainer you can read and modify
  • you want to train using pre-computed mel-spectrograms loaded from disk rather than recomputing per epoch

The full code (trainer + mel pre-compute helper + README with the design notes) lives at the gist:

gist.github.com/netlinux-ai/a7bbf6c64487bdc9ae5ff66731c5646f

Key design notes worth highlighting:

  1. Stub datasets in sys.modules before any f5_tts import. F5-TTS’ own f5_tts.model.dataset runs from datasets import Dataset at module load; a stub satisfies that import without pulling in pyarrow (first sketch below).

  2. Strip the ema_model. prefix from the published F5TTS_v1_Base checkpoint. The published file contains only EMA shadow weights; naive loaders that skip ema_model.* end up with a randomly initialised model. See the companion bug report (second sketch below).

  3. Don’t decay LR on short fine-tunes. The default warmup-then-linear-decay schedule from F5-TTS pretraining drives the LR to near zero over the run, so on short (< 50 epoch) fine-tunes the late-epoch updates are vanishingly small regardless of what the gradients say. Use a constant LR after warmup (third sketch below).

  4. num_workers=0 for the DataLoader. Subprocess workers re-import torch and re-run dynamo initialisation, which can SIGFPE on older CPUs. Keep loading in the main process; with pre-computed mels, throughput is GPU-bound anyway (fourth sketch below).
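
First, a minimal sketch of the import stub from note 1. The Dataset placeholder is never instantiated; it only has to exist so the module-level import succeeds. The f5_tts import at the bottom is one example path, not necessarily the one the gist uses:

    import sys
    import types

    # Install a fake `datasets` module before anything from f5_tts is imported.
    stub = types.ModuleType("datasets")
    stub.Dataset = type("Dataset", (), {})  # placeholder class, never called
    sys.modules["datasets"] = stub

    # Safe now: f5_tts.model.dataset's `from datasets import Dataset` resolves
    # against the stub instead of pulling in the real package (and pyarrow).
    from f5_tts.model import CFM

The stub must run before the first f5_tts import anywhere in the process, so it belongs at the very top of the trainer script.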
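Second, a sketch of the prefix strip from note 2. The ema_model_state_dict nesting and the exact key layout are assumptions about the published file; check them against your copy of the checkpoint:

    import torch

    def load_ema_weights(model: torch.nn.Module, ckpt_path: str) -> None:
        raw = torch.load(ckpt_path, map_location="cpu")
        # Some dumps nest the EMA weights one level down; tolerate both layouts.
        state = raw.get("ema_model_state_dict", raw)
        prefix = "ema_model."
        # Keep only the shadow weights. The prefix filter also drops EMA
        # bookkeeping entries ("initted", "step") that the model does not own.
        model.load_state_dict(
            {k[len(prefix):]: v for k, v in state.items() if k.startswith(prefix)}
        )

Leaving load_state_dict at its default strict=True is deliberate: if the remap misses keys, you want a loud failure rather than a silently random-initialised model.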
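Third, a sketch of the constant-after-warmup schedule from note 3, written with a plain LambdaLR; warmup_updates is a free parameter, not a value from the gist:

    import torch
    from torch.optim.lr_scheduler import LambdaLR

    def warmup_then_constant(optimizer: torch.optim.Optimizer,
                             warmup_updates: int) -> LambdaLR:
        def lr_lambda(step: int) -> float:
            if step < warmup_updates:
                return (step + 1) / warmup_updates  # linear ramp up to base LR
            return 1.0  # hold at base LR; no decay for the rest of the run
        return LambdaLR(optimizer, lr_lambda)

Call scheduler.step() once per optimizer update, as with any LR scheduler.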
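Finally, a sketch tying note 4 to the pre-computed-mel point from the list at the top. The directory layout, per-file contents, and batch size are assumptions, and a padding collate_fn for variable-length mels is omitted for brevity:

    from pathlib import Path

    import torch
    from torch.utils.data import DataLoader, Dataset

    class PrecomputedMelDataset(Dataset):
        def __init__(self, mel_dir: str):
            self.paths = sorted(Path(mel_dir).glob("*.pt"))

        def __len__(self) -> int:
            return len(self.paths)

        def __getitem__(self, idx: int):
            # Each file is assumed to hold a dict with "mel" and "text" entries,
            # written once by the pre-compute helper.
            return torch.load(self.paths[idx], map_location="cpu")

    loader = DataLoader(
        PrecomputedMelDataset("data/mels"),
        batch_size=8,
        shuffle=True,
        num_workers=0,  # main-process loading: no torch re-import, no dynamo re-init
    )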