A ~250-line trainer for F5-TTS that bypasses the HuggingFace datasets and accelerate dependency stack. Single-file, readable end-to-end.

Useful when:

  • the upstream f5-tts_finetune-cli won’t install/run because of pyarrow / pandas / datasets issues
  • you want a single-file trainer you can read and modify
  • you want to train using pre-computed mel-spectrograms loaded from disk rather than recomputing per epoch

The full code (trainer + mel pre-compute helper + README with the design notes) lives at the gist:

gist.github.com/netlinux-ai/a7bbf6c64487bdc9ae5ff66731c5646f

Key design notes worth highlighting:

  1. Stub datasets in sys.modules before any f5_tts import. F5-TTS’ own f5_tts.model.dataset runs from datasets import Dataset at module load; a stub satisfies that import without pulling in pyarrow (first sketch below).

  2. Strip the ema_model. prefix from the published F5TTS_v1_Base checkpoint. The published file contains only EMA shadow weights; naive loaders that skip ema_model.* end up with a randomly initialised model. See the companion bug report (second sketch below).

  3. Don’t decay LR on short fine-tunes. The default warmup-then-linear-decay schedule from F5-TTS pretraining drives the LR to near zero over the run, so on short (< 50 epoch) fine-tunes the late-epoch updates are vanishingly small regardless of what the gradients say. Use a constant LR after warmup (third sketch below).

  4. num_workers=0 for the DataLoader. Subprocess workers re-import torch and re-run dynamo initialisation, which can SIGFPE on older CPUs. Keep loading in the main process; with pre-computed mels, throughput is GPU-bound anyway (fourth sketch below).
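
First, a minimal sketch of the import stub from note 1. The Dataset placeholder is never instantiated; it only has to exist so the module-level import succeeds. The f5_tts import at the bottom is one example path, not necessarily the one the gist uses:

    import sys
    import types

    # Install a fake `datasets` module before anything from f5_tts is imported.
    stub = types.ModuleType("datasets")
    stub.Dataset = type("Dataset", (), {})  # placeholder class, never called
    sys.modules["datasets"] = stub

    # Safe now: f5_tts.model.dataset's `from datasets import Dataset` resolves
    # against the stub instead of pulling in the real package (and pyarrow).
    from f5_tts.model import CFM

The stub must run before the first f5_tts import anywhere in the process, so it belongs at the very top of the trainer script.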
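Second, a sketch of the prefix strip from note 2. The ema_model_state_dict nesting and the exact key layout are assumptions about the published file; check them against your copy of the checkpoint:

    import torch

    def load_ema_weights(model: torch.nn.Module, ckpt_path: str) -> None:
        raw = torch.load(ckpt_path, map_location="cpu")
        # Some dumps nest the EMA weights one level down; tolerate both layouts.
        state = raw.get("ema_model_state_dict", raw)
        prefix = "ema_model."
        # Keep only the shadow weights. The prefix filter also drops EMA
        # bookkeeping entries ("initted", "step") that the model does not own.
        model.load_state_dict(
            {k[len(prefix):]: v for k, v in state.items() if k.startswith(prefix)}
        )

Leaving load_state_dict at its default strict=True is deliberate: if the remap misses keys, you want a loud failure rather than a silently random-initialised model.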
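Third, a sketch of the constant-after-warmup schedule from note 3, written with a plain LambdaLR; warmup_updates is a free parameter, not a value from the gist:

    import torch
    from torch.optim.lr_scheduler import LambdaLR

    def warmup_then_constant(optimizer: torch.optim.Optimizer,
                             warmup_updates: int) -> LambdaLR:
        def lr_lambda(step: int) -> float:
            if step < warmup_updates:
                return (step + 1) / warmup_updates  # linear ramp up to base LR
            return 1.0  # hold at base LR; no decay for the rest of the run
        return LambdaLR(optimizer, lr_lambda)

Call scheduler.step() once per optimizer update, as with any LR scheduler.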
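Finally, a sketch tying note 4 to the pre-computed-mel point from the list at the top. The directory layout, per-file contents, and batch size are assumptions, and a padding collate_fn for variable-length mels is omitted for brevity:

    from pathlib import Path

    import torch
    from torch.utils.data import DataLoader, Dataset

    class PrecomputedMelDataset(Dataset):
        def __init__(self, mel_dir: str):
            self.paths = sorted(Path(mel_dir).glob("*.pt"))

        def __len__(self) -> int:
            return len(self.paths)

        def __getitem__(self, idx: int):
            # Each file is assumed to hold a dict with "mel" and "text" entries,
            # written once by the pre-compute helper.
            return torch.load(self.paths[idx], map_location="cpu")

    loader = DataLoader(
        PrecomputedMelDataset("data/mels"),
        batch_size=8,
        shuffle=True,
        num_workers=0,  # main-process loading: no torch re-import, no dynamo re-init
    )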