That robot sounds just like you

First, OpenAI tackled text with ChatGPT, then images with DALL-E. Next, it announced Sora, its text-to-video platform. But perhaps the most pernicious technology is what might come next: text-to-voice. Not just audio — but specific voices.

A group of OpenAI clients is reportedly testing a new tool called Voice Engine, which can mimic a person’s voice based on a 15-second recording, according to the New York Times. And from there it can translate the voice into any language.

The report outlined a series of potential abuses: spreading disinformation, allowing criminals to impersonate people online or over phone calls, or even breaking voice-based authenticators used by banks.

In a blog post on its own site, OpenAI seems all too aware of the potential for misuse. Its usage policies mandate that anyone using Voice Engine obtain consent before impersonating someone else and disclose that the voices are AI-generated, and OpenAI says it’s watermarking all audio so third parties can detect it and trace it back to the original maker.

But the company is also using this opportunity to warn everyone else that this technology is coming, including urging financial institutions to phase out voice-based authentication.

AI voices have already wreaked havoc in American politics. In January, thousands of New Hampshire residents received a robocall from a voice pretending to be President Joe Biden, urging them not to vote in the Democratic primary election. It was generated using simple AI tools and paid for by an ally of Biden's primary challenger Dean Phillips, who has since dropped out of the race.

In response, the Federal Communications Commission clarified that AI-generated robocalls are illegal, and New Hampshire’s legislature passed a law on March 28 that requires disclosures for any political ads using AI.

So, what makes this so much more dangerous than any other AI-generated media? The imitations are convincing. The Voice Engine demonstrations so far shared with the public sound indistinguishable from the human-uttered originals — even in foreign languages. But even the Biden robocall, which its maker admitted was made for only $150 with tech from the company ElevenLabs, was a good enough imitation.

But the real danger lies in the absence of other indicators that the audio is fake. With every other kind of AI-generated media, there are clues for the discerning viewer or reader. AI text can feel clumsily written, hyper-organized, and chronically unsure of itself, often refusing to give real recommendations. AI images often have a cartoonish or sci-fi sheen, depending on their maker, and are notorious for getting human features wrong: extra teeth, extra fingers, and ears without lobes. AI video, still relatively primitive, is infinitely glitchy.

It’s conceivable that each of these applications for generative AI will improve to the point where it is indistinguishable from the real thing, but for now, AI voices are the only iteration that feels like it could become utterly undetectable without proper safeguards. And even if OpenAI, often the first to market, is responsible, that doesn’t mean all actors will be.

The announcement of Voice Engine, which doesn’t have a set release date, therefore feels less like a product launch and more like a warning shot.
