
Keyboards will Soon Become Obsolete 

QWERTY anyone?

Illustration by Diksha Mishra

Entrepreneur and investor Naval Ravikant recently re-launched his social media app, Airchat, at a time when there is no dearth of such platforms. However, its USP is that the app is completely voice-centric: all interaction happens through voice.


Airchat might just be the latest entrant highlighting the power of voice, but a number of recent AI platforms and devices have already brought voice as a predominant user interface. 

Dawn of the Voice Era 

Multimodal AI was identified as one of Microsoft’s AI trends for the year, and going by the AI developments this year, voice modality is emerging as a key feature. 

The latest Humane Ai Pin, a small wearable device that serves as a personal assistant and is pitched as a possible smartphone replacement, operates primarily on voice. Interactions with the device, such as making calls, reading messages, and taking pictures, are executed through voice commands.

Bethany Bongiorno, co-founder of Humane, believes that voice will be an integral part of an AI future. “Voice-first in an AI future,” she said. Similarly, AI devices such as the Rabbit R1, a pocket-sized gadget built around a large action model, also operate on voice commands.

Brett Adcock, CEO and founder of robotics company Figure AI, said, “We believe the default user interface for the robot is speech. You’re going to want to talk to the robot. Even in an industrial setting, when you’re unboxing the robot for the first time, we think the initialisation process is speech.”

How Do We Assess Them? 

With voice models comes a different set of evaluation parameters. Notably, this shift comes after a year-long emphasis on text-based AI generation.

Benchmarks and leaderboards for evaluating text-based models have long been a staple of LLM discussions. The need has also given rise to an Indic LLM Leaderboard. However, leaderboards for voice-based generative models are not nearly as prominent.

There are evaluation parameters for voice-based models, such as latency, word error rate (WER), short-time objective intelligibility (STOI), miss rate, and ROC curves. Together, these parameters capture responsiveness, transcription accuracy, and speech quality and intelligibility.
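Of these, WER is the most widely used metric for speech recognition: the word-level edit distance (substitutions, insertions, and deletions) between a reference transcript and the model's output, divided by the number of words in the reference. A minimal sketch (the function name and example phrases are illustrative, not from any particular toolkit):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference word count,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    return dp[len(ref)][len(hyp)] / len(ref)

# Two substitutions out of four reference words -> WER of 0.5
print(word_error_rate("turn on the lights", "turn off the light"))  # 0.5
```

A lower WER is better; a perfect transcript scores 0.0, and WER can exceed 1.0 when the hypothesis contains many spurious insertions.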

Shift from Text to Voice

Chatbots that serve multiple functions, from HR operations to finding love, are essentially text-based. However, a gradual shift is now underway.

Last month, Hume AI released EVI, an empathic voice interface model. Users can converse with it naturally, and the model analyses and understands a user’s emotions based on tone of voice and other vocal features. It almost serves as a therapist.

Hume marks a significant shift from similar text-first platforms such as Inflection’s Pi, an emotionally intelligent AI that caters to users’ emotional needs.

Not All Big Tech is Gung-ho

While big tech companies are integrating voice in one form or another, be it OpenAI’s ChatGPT or Google’s Gemini, these models are multimodal, offering voice as one mode of interaction among several. Interestingly, one major player, Apple, is not too keen on this form of modality yet.

For a company making strides in bringing generative AI features to its phones, and even releasing AI models such as ReALM that could possibly beat GPT-4 on reference resolution, Apple is yet to catch up in the voice game.

However, voice is not completely alien to Apple. Its spatial computing device, the Apple Vision Pro, can be controlled using voice.

Further, the company’s famed voice assistant, Siri, is expected to get advanced AI features which will probably be announced at the Apple WWDC 2024 event in June. The feature might be a major boost to Apple’s voice modality function. 

While voice is being increasingly adopted, companies still rely on text-based chatbots. IT company Happiest Minds recently announced ‘hAPPI’, a generative AI-powered chatbot that converses with users on health and wellness-related queries.

It is obvious that to get closest to human-like interaction, voice becomes indispensable. After all, “Humans are all meant to get along with other humans, it just requires the natural voice,” said Ravikant.

PS: The story was written using a keyboard.


Vandana Nair

As a rare blend of engineering, MBA, and journalism degree, Vandana Nair brings a unique combination of technical know-how, business acumen, and storytelling skills to the table. Her insatiable curiosity for all things startups, businesses, and AI technologies ensures that there's always a fresh and insightful perspective to her reporting.