ElevenLabs says it may need to impose new rules on its voice-cloning tool after bad actors reportedly used it to create fake clips of celebs and media personalities saying offensive things.
One week after opening access to its AI voice-cloning platform, AI speech startup ElevenLabs says it may need to rethink that openness amid “an increasing number of voice cloning misuse cases.”
The company, founded in 2022 by former Google machine learning engineer Piotr Dabkowski and ex-Palantir deployment strategist Mati Staniszewski, addressed the issue on Twitter after Vice found clips on 4chan featuring what sound like auto-generated celebrity voices making questionable statements. One, for instance, appears to be actor Emma Watson reading from Mein Kampf; others include homophobic, transphobic, violent, and racist sentiments from the likes of media personalities Joe Rogan and Ben Shapiro.
Though it’s not clear which of the 4chan clips were created using the ElevenLabs beta, one post contained a link to Prime Voice AI, suggesting the company’s software may have been employed.
“While we see our tech being overwhelmingly applied to positive use, we also see an increasing number of voice cloning misuse cases. We’d like to address this by implementing additional safeguards,” the firm wrote in a Monday Twitter thread, throwing out ideas like eliminating its Voice Lab platform for creating new voices, and instead manually verifying every cloning request.
It encouraged people to reply on Twitter or send the company direct messages with their thoughts. Based on the public responses, a number of people (who do not appear to be beholden to corporate lawyers) think ElevenLabs should throw caution to the wind and do nothing. “Voice is just a frequency range,” said one person. “Consider doing nothing until you get a cease and desist or something,” wrote another. A few offered tentative support for a paid service.
ElevenLabs offers “speech synthesis” and “voice cloning,” the latter of which requires a clean sample recording more than one minute in length. “Professional cloning,” meanwhile, can allegedly reproduce any accent, for use in newsletters, books, and videos.
Similar advances in automation previously led to a rise in deepfake videos (specifically: pornography). And while ElevenLabs built its software to “instantly convert spoken audio between languages,” there will always be someone who abuses AI voice tech.