OpenAI and Microsoft implement Vall-E

OpenAI and Microsoft continue their battle with Google in artificial intelligence by implementing Vall-E, the new voice chatbot. It is speech synthesis software that can simulate a person’s voice after listening to just three seconds of it.

In other words, this is the latest piece of the generative artificial intelligence system developed by Microsoft and OpenAI, to which the company founded by Bill Gates has been linked since 2019 by a multi-year, multibillion-dollar partnership.

Vall-E: all the details about the new chatbot from OpenAI and Microsoft

Vall-E is a tool of AGI, Artificial General Intelligence, that is, a “general” or “strong” artificial intelligence that can simulate human intelligence. This stands in contrast to what we have known so far, which is “narrow” or “weak” AI.

The latter is able to respond with preset actions to specific tasks, but not to react to an unplanned situation. In recent years, AI chatbots have not performed as well as their creators expected because they were limited to small tasks and had a high error rate.

Vall-E was developed to be used with high-quality speech synthesis tools and to create original audio from a short sample. Its developers define Vall-E as a “neural codec language model,” as its operation is based on EnCodec, a neural audio codec developed by Meta.

OpenAI, the startup funded by Elon Musk and Sam Altman, among others, also boasts the creation of ChatGPT, a chatbot that can sustain an interactive conversation with users by remembering and building on earlier turns of the exchange.

Hence, just as ChatGPT is able to generate code autonomously, Vall-E is designed to generate discrete audio codec codes after listening to an audio sample, behaving just as a human would.
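
To make the idea of “discrete audio codec codes” more concrete, here is a minimal sketch of residual vector quantization, the general technique that neural audio codecs such as EnCodec are publicly described as using to turn a frame of audio features into a short list of integers. It is not the code behind Vall-E or EnCodec: the codebooks, the two-dimensional frame, and the function names are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_index(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to vec (squared Euclidean distance)."""
    def sq_dist(entry: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(vec, entry))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def rvq_encode(frame: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Encode one frame as one integer code per codebook.

    Each stage quantizes what the previous stages failed to capture (the residual).
    """
    residual = list(frame)
    codes = []
    for codebook in codebooks:
        idx = nearest_index(residual, codebook)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> List[float]:
    """Rebuild an approximation of the frame by summing the selected entries."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: a coarse one, then a finer one for the residual.
COARSE = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0), (1.0, -1.0)]
FINE = [(0.0, 0.0), (0.25, 0.0), (0.0, 0.25), (-0.25, -0.25)]
CODEBOOKS = [COARSE, FINE]

frame = (1.2, 0.9)                      # stand-in for one frame of audio features
codes = rvq_encode(frame, CODEBOOKS)    # a short list of integers, one per codebook
approx = rvq_decode(codes, CODEBOOKS)   # lossy reconstruction of the frame
print(codes, approx)
```

In a real codec the frames come from a learned encoder and the codebooks are learned as well; a language model can then be trained to predict sequences of such integer codes in much the same way a text model predicts words.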

Together with the GPT-3 software for text and Dall-E/Stable Diffusion for images, the Vall-E audio system completes the ChatGPT triptych and aims to revolutionize the field of generative AI.

Speaker Prompt, Ground Truth, Baseline and Vall-E

The sophistication of the new tool launched by OpenAI and Microsoft lies in Vall-E’s ability to recognize the timbre, inflection, and emotional tone of the person who is speaking and reproduce them after only three seconds of listening.

The applications in audio editing are many, as are the criticisms of the software’s potential for manipulation and misuse. Not surprisingly, unlike ChatGPT, which was opened up to the public, Vall-E has not been released: Microsoft did not provide its code for others to experiment with.

Samples of speech already synthesized by the software can be found on the Vall-E site. For each example, four clips can be heard: Speaker Prompt, Ground Truth, Baseline, and Vall-E.

The first is the short audio clip whose vocal characteristics the AI has to reproduce; the second is a recording of the same speaker saying a given sentence, which serves as a comparison for the AI’s output. The third is the same sentence generated with currently available speech synthesis technologies. Finally, Vall-E is the speech generated by Microsoft’s software.

Potential and dangers of OpenAI and Microsoft’s AI

Microsoft and OpenAI researchers seem aware of the potential harms of this technology. In fact, they stated the following in a public paper:

“Since Vall-E could synthesize speech that maintains the identity of the speaker, such technology could pose potential risks related to improper use of the model, such as spoofing voice identification or impersonating someone.”

Microsoft therefore adds that, to mitigate such risks, a detection model can be built to distinguish whether an audio clip has been synthesized by Vall-E. The two giants will also apply Microsoft’s artificial intelligence principles during further model development.

However, the risk of emulation is not the only factor generating skepticism and fear. Vall-E was trained using the LibriLight audio library made by Meta, which contains 60 thousand hours of English-language speeches extracted mostly from public domain audiobooks, recorded and read by volunteers.

In any case, to increase its synthesis capacity, Vall-E will need to expand its learning pool to the entire Internet. This next step is what enabled GPT-3, ChatGPT’s predecessor, to achieve impressive sentence processing, writing, and assembly capabilities.

Even so, that software was prone to producing violent, sexist and racist content precisely because it learned from examples taken indiscriminately from the entire Web. The same could happen with the new Vall-E.

In this case, filtering would require a large human workforce, which, at the moment, the digital giants do not seem to be planning for, given the wave of layoffs affecting big tech.

Google unveils Bard to compete with OpenAI and Microsoft

As anticipated, competing with Microsoft and OpenAI is Google, which is set to unveil Bard, its own chatbot. Bard looks much like ChatGPT, but without the limitation of out-of-date knowledge.

Sundar Pichai, Google’s CEO, presented the new software as a tool that draws information from the web to provide fresh, high-quality responses. By “fresh,” he means continuously updated, something Microsoft’s AI still fails to do.

In a nutshell, Bard aims to generate detailed answers to simple questions. Its operation is based on LaMDA, the Language Model for Dialogue Applications, which one of Google’s own engineers had previously described as “sentient.”

There is no denying that Google’s announcement of Bard’s launch was expected by technology enthusiasts. After all, according to reports in the Wall Street Journal, Alphabet, Google’s parent company, invested more than $31 billion in artificial intelligence in 2021, more than any other competitor.

After the success of ChatGPT, the company therefore decided to summon the very best: founders Larry Page and Sergey Brin. In any case, there is no doubt that artificial intelligence software is an invaluable resource in the field of innovation.

Indeed, Amazon, Meta, and Apple are certainly not going to sit back and watch what others are doing. However, while competition is a great accelerator of research, there is a risk that, in the race for the best artificial intelligence, flawed systems with errors, limitations and risks will be deployed without enough attention to the big picture.