WaveRNN is a PyTorch implementation of DeepMind’s WaveRNN vocoder, bundled with a Tacotron-style TTS front end to form a complete text-to-speech stack. As a vocoder, WaveRNN models raw audio sample by sample with a compact recurrent neural network, generating high-quality waveforms more efficiently than larger autoregressive models such as WaveNet. The repository includes scripts for preprocessing datasets such as LJSpeech, training Tacotron to produce mel spectrograms, training WaveRNN on those spectrograms (optionally on ground-truth-aligned, or GTA, features), and finally generating audio. A quick_start.py script lets users immediately synthesize example sentences from a pretrained model and inspect both the generated audio and the attention plots. For custom TTS, the project guides you through training Tacotron, exporting GTA spectrograms when desired, training WaveRNN with or without GTA, and then running joint generation.
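WaveRNN-style vocoders typically predict quantized sample classes rather than raw floating-point amplitudes, commonly using mu-law companding so that quantization error is smaller for quiet samples. As a minimal, repository-independent sketch (the function names here are illustrative, not the project's API), mu-law encode/decode with 8-bit classes looks like this:

```python
import numpy as np

def mulaw_encode(x, mu=255):
    """Map waveform samples in [-1, 1] to integer classes in [0, mu]."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((y + 1) / 2 * mu + 0.5).astype(np.int64)

def mulaw_decode(q, mu=255):
    """Map integer classes in [0, mu] back to waveform samples in [-1, 1]."""
    y = 2 * q.astype(np.float64) / mu - 1
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu

# Round trip: quantize and reconstruct a ramp of sample values
x = np.linspace(-1.0, 1.0, 11)
x_hat = mulaw_decode(mulaw_encode(x))
print(np.max(np.abs(x - x_hat)))  # small reconstruction error
```

The vocoder network then treats generation as classification over these integer classes (a softmax over `mu + 1` bins), decoding each sampled class back to an amplitude.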
Features
- PyTorch implementation of WaveRNN vocoder paired with a Tacotron TTS front end
- End-to-end pipeline for preprocessing, Tacotron training, WaveRNN training, and final waveform generation
- Quick-start script that synthesizes example sentences and visualizes attention for rapid experimentation
- Support for training with ground-truth-aligned (GTA) mel spectrograms to improve quality and stability
- Pretrained WaveRNN and Tacotron models on LJSpeech for immediate use or fine-tuning
- Command-line tools (train_tacotron.py, train_wavernn.py, gen_tacotron.py, etc.) with flexible options for custom datasets and texts
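Putting the tools above together, a typical end-to-end run might look like the following. The flag names here are assumptions based on the pipeline described (check each script's `--help` for the actual options), so treat this as a hedged sketch rather than exact commands:

```shell
# Preprocess an LJSpeech-style dataset (dataset path flag assumed)
python preprocess.py --path /path/to/LJSpeech-1.1

# Train the Tacotron front end until attention aligns
python train_tacotron.py

# Optionally export ground-truth-aligned (GTA) mel spectrograms
python train_tacotron.py --force_gta

# Train WaveRNN, optionally on the GTA features
python train_wavernn.py --gta

# Synthesize audio for custom text (flag name assumed)
python gen_tacotron.py --input_text "Scientists at the CERN laboratory say they have discovered a new particle."
```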