Xiaomi open-sources OmniVoice, a voice cloning TTS model covering more than 600 languages.

CoinFeed reported on May 7 that Xiaomi AI Labs has released OmniVoice, a multilingual voice-cloning TTS model. Built on a minimalist architecture consisting of a single bidirectional Transformer, it supports speech synthesis in 646 languages and, according to the report, outperforms mainstream models in both Chinese and English scenarios on synthesis quality and inference speed. The model was trained on approximately 580,000 hours of data drawn from 50 open-source datasets, with a dynamic upsampling strategy applied to low-resource languages. In evaluations covering 24 and 102 languages, its speaker similarity and intelligibility surpass those of many commercial systems, with some metrics approaching or even exceeding those of real speech. OmniVoice supports cross-lingual voice cloning, custom timbres, adaptation to noisy reference audio, sub-language control, and pronunciation correction. The training and inference code, along with the model weights, have been open-sourced on platforms including GitHub and Hugging Face.
