Amphion TTS currently supports the following models and architectures:
- FastSpeech2: A non-autoregressive TTS architecture that utilizes feed-forward Transformer blocks.
- VITS: An end-to-end TTS architecture that utilizes a conditional variational autoencoder with adversarial learning.
- NaturalSpeech2 (👨‍💻 in development): A TTS architecture that utilizes a latent diffusion model to generate natural-sounding voices.
- Jets: An end-to-end TTS model that jointly trains FastSpeech2 and HiFi-GAN with an alignment module.

Here are some TTS samples from Amphion.