Despite the metaverse being virtual, the basic tenets of communication and language will be the same as in the physical world. If the metaverse is supposed to behave like a global cyberspace, language barriers have to be removed. Zuckerberg’s Meta has picked up on this issue and announced an AI-powered universal speech translator. Meta has claimed that the speech-to-speech translator will have no delay due to transcription time. Conversations will feel more natural since the translation becomes seamless and will not be noticeable to the other person.
Foundation of metaverse
The universal speech translator will build on Meta’s ‘No Language Left Behind’ concept, a translation system intended to be able to learn every language. This includes languages that are primarily spoken and have little or no written text available. By 2019, Facebook was already supporting 41 languages.
It is not just Meta that believes in improving conversational AI. Last year, at the Interspeech conference held in the Czech Republic in September, NVIDIA CEO Jensen Huang demonstrated the capabilities of the company’s conversational AI. In the middle of Huang’s keynote address, a virtual Huang slipped into the speech, and nobody was able to tell the two apart, even though one had a computer-generated voice and image.
The company also released a series called ‘I am AI’, in which the speaker’s voice is computer-generated. NVIDIA also came up with Vid2Vid Cameo, which combines conversational AI and advanced real-time graphics processing. When a user is not looking their best and has to urgently show up on a video call, Cameo can map their real-time facial expressions onto an uploaded image of them.
Fresh speech applications
At this year’s NVIDIA GTC Conference, CEO Huang introduced the Riva 2.0 SDK and the company’s Riva Enterprise managed offering. Both can be used to build voice-related AI applications, signalling NVIDIA’s interest in speech recognition. The company stated that Riva 2.0 can be used with NVIDIA TAO, its low-code solution for speeding up AI model development. NVIDIA has revealed that Snap, Snapchat’s parent company, employs Riva’s automatic speech recognition and text-to-speech tech in its platform for developers. RingCentral, a communication solutions company, is also using Riva for live captions during video conferences.
According to research company MarketsandMarkets, the speech and voice recognition market is predicted to grow from USD 8.3 billion in 2021 to USD 22 billion in 2026, driven by enterprise applications. A survey conducted by Pindrop in 2018 found that 28 per cent of 500 IT and business decision makers were using voice technology to help customers.
Speech technology now also includes voice cloning tools that use AI to copy the pitch and intonation of a person’s speech. NVIDIA stated that Riva Custom Voice, its tool for voice cloning, can study 30 minutes of pre-recorded speech to create custom, human-like voices. The global voice cloning market has the potential to grow from USD 456 million to USD 1.73 billion by 2023, according to MarketsandMarkets.
“Healthcare, automobile, retail, e-commerce, banking, and human resources aiming to improve customer service through more personalised interactions are some of the use cases where we have seen explosive growth rates. Advanced algorithms which support the processing of complex, layered conversations through Natural Language Processing and Natural Language Generation that enable near-human interactions have been the driving technologies behind the success of conversational AI. In addition, Automatic Speech Recognition and Advanced Dialog Management, with machine learning as the backbone, have made ground-breaking progress in delivering optimised output,” said Amitt Sharma, CEO and founder of VDO.AI.
Last week, Raj Koneru, CEO of conversational AI platform Kore.ai, said he believed most applications would become conversational in the future. Koneru also reiterated that conversational AI would become the foundation for the metaverse and the omniverse.
With tech giants like Microsoft, Meta, Amazon, TikTok and Apple investing in creating their own metaverses, and business leaders like Bill Gates predicting that virtual meetings will shift to metaverses within the next two to three years, the business opportunities for conversational AI are growing rapidly.