This post is part of Lifehacker’s Exposing AI series.
It’s fascinating, hilarious, and terrifying.
Of course, there’s the other side of the coin: the potential for rampant misinformation.
In this case, the models are trained on samples of other people speaking.
OpenAI’s Whisper model, for example, was trained on 680,000 hours of data.
Once the model is trained, however, it doesn’t need that much data to replicate a voice.
Give it more data, and it will replicate the voice more accurately.
As the tech advances, it’s getting harder to spot a forgery right away.
However, where these models still struggle is in replicating the way we speak.
For example, an AI might pronounce “collages” as co-lah-jez in one sentence, then co-lay-ges in the next.
The pacing might be affected, as well.
An AI model might blow right past the pause between two sentences, which gives the fake away immediately.
(Even a human who can’t stop talking doesn’t sound so robotic.)
In my test, the bot barreled from my first sentence straight into my second, “Thinking about heading to the movies tonight,” with barely a breath between them.
On the flip side, it may take too long to get to the next word or sentence.
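If you’re curious what that pacing check looks like in code, here’s a minimal sketch in Python using the librosa audio library. It splits a clip on silence and flags pauses between stretches of speech that seem too short or too long; the file name and both cutoffs are illustrative assumptions, not values from any real detector.

import librosa

# Load a hypothetical clip; 16 kHz is plenty for speech analysis.
y, sr = librosa.load("clip.wav", sr=16000)

# Split the audio into runs of speech, treating anything 30 dB below
# the peak as silence. Each interval is (start, end) in samples.
intervals = librosa.effects.split(y, top_db=30)

# Measure the silent gap between consecutive runs of speech, in seconds.
for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
    gap = (next_start - prev_end) / sr
    if gap < 0.15:  # assumed cutoff: real speakers rarely pause this briefly between sentences
        print(f"{gap:.2f}s pause: suspiciously rushed")
    elif gap > 1.5:  # assumed cutoff: an oddly long dead stretch
        print(f"{gap:.2f}s pause: suspiciously long")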
But you don’t get some of the highs and lows of the original speaker’s delivery.
There’s some variance here: The bot saying “Ohh, Danny, you’re Italian” sounds realistic enough.
The last word of the recording, “sandwich,” sounds especially off.
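The same idea applies to those missing highs and lows. Here’s a rough sketch, again in Python with librosa, that estimates pitch frame by frame and measures how much it actually varies; a very flat pitch track is a hint (not proof) that a voice is synthetic. The file name and the 20 Hz cutoff are assumptions for illustration.

import librosa
import numpy as np

y, sr = librosa.load("clip.wav", sr=16000)

# pyin estimates the fundamental frequency (pitch) per frame,
# returning NaN wherever no one is voicing a sound.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
)

pitch = f0[~np.isnan(f0)]
print(f"Pitch standard deviation: {np.std(pitch):.1f} Hz")
if np.std(pitch) < 20:  # assumed cutoff for an unnaturally flat delivery
    print("Very little pitch variance -- worth a closer listen.")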
Again, things are advancing fast here.
Even still, there are imperfections you can spot if you’re listening closely.
Is a celebrity or politician saying something ridiculous or provocative?
Was it a media organization, or just some random account on Instagram?
If it’s real, multiple media organizations will likely pick up on it quickly.
As with all AI media detectors, however, take these tools with a grain of salt.
The other, Pindrop Security, correctly identified 81 of the 84 sample clips submitted (roughly a 96% hit rate), which is impressive.
Just understand the limitations of the programs you’re using.