Amphion Text-to-Speech (TTS) Recipe

Supported Model Architectures

Amphion TTS currently supports the following models and architectures:

  • FastSpeech2: A non-autoregressive TTS architecture that utilizes feed-forward Transformer blocks.
  • VITS: An end-to-end TTS architecture that uses a conditional variational autoencoder with adversarial learning.
  • NaturalSpeech2 (👨‍💻 under development): A TTS architecture that uses a latent diffusion model to generate natural-sounding voices.
  • Jets: An end-to-end TTS model that jointly trains FastSpeech2 and HiFi-GAN with an alignment module.

Amphion TTS Demo

Here are some TTS samples from Amphion.