Last Updated: July 2020
Creating natural-sounding speech from text is considered a “grand challenge” in the field of AI and has been a research goal for decades. Over the last two years, research at WellSaid Labs has produced a series of breakthroughs in the quality of text-to-speech systems.
This is a collection of observations about WellSaid’s research and text-to-speech (TTS) capabilities. Below, you can find samples that demonstrate the range of our voices. The samples are unfiltered and unedited; what you hear is a transparent representation of the research in its current state. Here, you’ll get a sense of where we’ve been, from approaching human parity to achieving it, and of where we’ll go next, with new languages and more complex texts.
Meet our AI voice actors. Listen to all fifteen voices as they read Shirley Anita Chisholm's 1970 speech, "For the Equal Rights Amendment."
Vanessa
https://s3-us-west-2.amazonaws.com/secure.notion-static.com/38f95f4b-0de2-4e42-8ff3-f5dc85953a5d/VN-26.mp3
Alana
https://s3-us-west-2.amazonaws.com/secure.notion-static.com/27426390-0507-48e4-a233-628fd8660101/AB-245.mp3
David
https://s3-us-west-2.amazonaws.com/secure.notion-static.com/e77db1e5-636c-4655-8626-8d639615c2b6/DD-24.mp3
Wade
https://s3-us-west-2.amazonaws.com/secure.notion-static.com/d4e51f3b-e823-4ef2-85dc-738549b618f5/WC-95.mp3
Paige
https://s3-us-west-2.amazonaws.com/secure.notion-static.com/6cbbace0-660c-4ece-8db6-b18a622c047b/PL-6.mp3
Ava
https://s3-us-west-2.amazonaws.com/secure.notion-static.com/27075e0e-a21e-4105-8a83-c86e959012c2/AM-33.mp3
Isabel
https://s3-us-west-2.amazonaws.com/secure.notion-static.com/a522b4ae-e20e-472f-a255-6270a0383a2d/IV-12.mp3
Nicole
https://s3-us-west-2.amazonaws.com/secure.notion-static.com/2d1d5865-d4c7-4a3a-b3d7-3c659d2dbf19/NL-5.mp3
Tristan
https://s3-us-west-2.amazonaws.com/secure.notion-static.com/1d88b29d-489a-4021-96ba-cd38497678cf/TF-320.mp3
Kai
https://s3-us-west-2.amazonaws.com/secure.notion-static.com/e93bf84c-127a-4080-818e-60c56801a1ec/KM-33.mp3
Sofia
https://s3-us-west-2.amazonaws.com/secure.notion-static.com/ca2a21b4-6c2d-481f-8158-1bec438f055f/SH-18.mp3
Ramona
https://s3-us-west-2.amazonaws.com/secure.notion-static.com/cc7b3655-6b02-4d4d-a49f-e25a61139ed6/RJ-13.mp3
Tobin
https://s3-us-west-2.amazonaws.com/secure.notion-static.com/95bb234e-153e-4b9e-90a8-4df68e41b288/TA-1.mp3
Patrick
https://s3-us-west-2.amazonaws.com/secure.notion-static.com/d1f32e91-e607-48de-bafc-2b6f06526857/PK-1.mp3
Jeremy
https://s3-us-west-2.amazonaws.com/secure.notion-static.com/a26e02d3-3054-4654-89ea-12bb96614ba0/JG-1.mp3
Transcript Credits: Thank you, Shirley Anita Chisholm, for your work on equal rights.
In June 2020, WellSaid Labs' text-to-speech became the first to achieve human parity for naturalness on short audio clips (512 clips of 15 seconds or less) across multiple voices (15 voices).
<aside> 📓 In September 2018, WellSaid Labs replicated Tacotron 2, the state of the art at the time.
</aside>
Over the course of two years of research, we incrementally improved our mean opinion score for naturalness. Our initial research was evaluated anecdotally.