Spotify says more than 100 million people stream podcasts on its platform on a regular basis. It has spent years acquiring and cultivating talent and shows to call its “originals,” and the ad deals they generate help pay the bills. Now, the company is taking a step past the suggestion algorithms we’ve all come to acquiesce to and is using artificial intelligence to replicate its podcast hosts' voices for translated versions of their programs. So, how will it all work? Let’s lay out the claims.
What is AI voice replication?
Using artificial intelligence to mimic and manipulate voices already has its own cottage industry among online media creators. If you want to hear Hank Hill of “King of the Hill” (voiced by Mike Judge) sing the Marty Robbins classic “Big Iron,” you can. You might also come across pastiches or even full-on radio dramas with characters from your favourite franchises. The technology is also used in text-to-speech engines such as the ones TikTok utilizes for dictating captions.
The actual product these models generate will vary. Early examples, which had limited ability to train voices to exact scripts, would hallucinate easily: a monotone reading can suddenly burst into Hendrix-esque psych rock. Intonation is a clear weak spot if you’re listening to spoken word. In singing, you’ll find some odd trills where you might not expect them. But the biggest setback you’ll probably notice is the audio quality, which is presumably poor relative to the source material the model was trained on.
Of course, when we talk about samples, we get into the murky mess of rights licensing, theft, plagiarism, impersonation, and other difficult topics. Anything AI-related warrants these concerns. Circling back to TikTok, the Chinese-owned social media company settled with voice actress Bev Standing in 2022 over using her voice in its TTS feature (via VoiceOverXtra). But given how quickly content can proliferate across the internet, many people with even a shred of notoriety in some small corner of the world aren’t aware of AI impersonations of their voice, let alone in a position to address them. It is unclear how this issue may come to a head, but it’s definitely one to keep in mind.
What is Spotify’s AI Voice Translation program?
The company says in a corporate blog post that it is working with a handful of its shows' hosts to generate podcast episodes that have been translated into French, German, and Spanish using AI-generated versions of their voices. Spotify is piloting the scheme with Spanish versions of the following episodes now available for streaming:
More episodes from more shows will be coming in all three languages over the coming weeks. Trevor Noah, Monica Padman, Bill Simmons, and the crew of The Rewatchables from The Ringer have all signed on. One uncertainty that might play into this is whether guests will have to agree to have their voice replicated to appear on these shows, but we’ll have to see if any disagreements come to the fore.
How does Spotify’s AI Voice Translation sound?
Again, translated episodes are already live and ready for your discernment, but if you’ve only got the patience for clips, you can check out the sampler video above.
Spotify is packaging AI Voice Translation as a “tool.” It’s uncertain whether that means a podcaster can flip a switch and start using the tool, and it isn’t clear how much control they have over their voice data regarding AI training. The company does name-drop OpenAI, the maker of ChatGPT, as a tech partner. The Verge reports that the tool relies on the Whisper neural net to generate voices and perform translation functions.
While we can’t vouch for the vernacular, Whisper seems to be doing a decent job with intonation, cadence, and replication, even if the quality is still not all there.
The hope is that it’ll be good enough to draw in millions of new listeners to popular, highly produced podcasts by breaking down the English language barrier. It helps if the ads coming through are also in French, German, and/or Spanish.