Microsoft's AI Voice Cloning Technology Is So Good, But There's a Catch

A Microsoft research team has unveiled VALL-E 2, a new AI speech synthesis system that can generate voices with "human-level performance" from just a few seconds of audio, producing speech that is indistinguishable from the source speaker.

"VALL-E 2 is the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech (TTS) synthesis, achieving human parity for the first time," the paper states.

The system builds on its predecessor, VALL-E, which was introduced in early 2023. Neural codec language models represent speech as sequences of discrete codes.

According to the team, what sets VALL-E 2 apart from other voice cloning techniques is its "Repetition Aware Sampling" method, which adaptively switches between sampling techniques during decoding. These strategies improve consistency and address some of the most common problems in traditional speech synthesis.

"VALL-E 2 consistently synthesizes high-quality speech, even for sentences that are traditionally challenging due to their complexity or repetitive phrases," the researchers wrote, pointing out that the technology could help generate speech for people who have lost the ability to speak.

There is a catch, however: the tool is so convincing that Microsoft is not making it available to the public.

“We currently have no plans to incorporate VALL-E 2 into products or expand its accessibility to the public,” Microsoft said in its ethics statement, noting that such tools carry risks such as voice mimicry without consent and the use of convincing AI voices in fraud and other criminal activities.

The research team stressed the need for a standard method of digitally watermarking AI-generated content, noting that detecting such content with high accuracy remains a challenge.

"If the model is to generalize to unseen speakers in the real world, it must include a protocol to ensure that the speaker consents to the use of their voice, as well as a synthetic speech detection model," the paper notes.

Still, VALL-E 2's results are remarkably accurate compared with other tools. In a series of tests conducted by the research team, VALL-E 2 surpassed human benchmarks for the robustness, naturalness, and speaker similarity of the generated speech.

Source: Microsoft

VALL-E 2 achieved these results with just 3 seconds of audio. However, the team notes that "using a 10-second voice sample yields even better results."

Microsoft isn't the only AI company to demonstrate advanced AI models without releasing them to the market. Meta's Voicebox and OpenAI's Voice Engine are two impressive voice cloning tools that are subject to similar restrictions.

A Meta AI spokesperson said last year:

“There are many interesting use cases for generative speech models, but due to the risk of misuse, we are not publicly releasing the Voicebox model or code at this time.”

OpenAI, for its part, says it wants to address privacy concerns before rolling out its synthetic voice model. As the company wrote in a post on its official blog:

“In line with our approach to AI safety and our voluntary commitments, we are choosing to preview but not broadly release the technology at this time.”

Calls for ethical guidelines are spreading across the AI community, especially as regulators begin to raise concerns about the impact of generative AI on everyday life.


According to Decrypt
