Since it came into existence about four years ago, Bengaluru-based Gnani.ai, founded by Ananth Nagaraj and Ganesh Gopalan, has designed a range of products built on voice-recognition technology for Indian vernacular languages. Back in 2016, the two founders set out to build speech recognition and NLP systems that guaranteed high accuracy and ease of use. Once the startup had developed a robust, highly accurate engine, it deepened its expertise in speech and NLP and began leveraging it to address voice processes for Indian businesses.
India has more than 500 million smartphone users, but just over 100 million English speakers — most of whom speak it as a second language. So, in a country of over 1.3 billion people and numerous languages, empowering users to interact with voice applications in their native language is much needed.
Here, AI speech engines that process multiple languages can aid conversation by acting as voice assistants, handling customer service calls, or managing voice-based transactions. AI- and NLP-powered voice assistant and speech analytics startups have grown steadily, finding use cases in customer support, onboarding, lead qualification, and user engagement.
To bring hundreds of millions of these new users online, the startup has built AI speech tools for the different languages spoken in India. Today it supports 16 languages globally, the majority of them Indian. What sets Gnani.ai apart is that, unlike most players in this space, the team built its voice engines without the aid of any third-party APIs. The founders say their vernacular language models are more accurate than those of global providers, based on specific benchmarks.
For this weekly interview, we connected with Ganesh Gopalan, CEO & Co-Founder at Gnani.ai, to learn more about the Indian NLP industry and its applications in business operations. “We have trained our language models with at least ten thousand hours of annotated audio data,” shares Ganesh.
Here are the full excerpts from the interview:
Could you brief us on the latest technological developments happening at Gnani.ai?
Gnani is a speech and NLP AI company with a focus on deep tech. We develop core IP, and we have a proprietary stack for all key technologies, from ASR and NLP to speech engines.
At Gnani.ai we are doing a lot of exciting things currently. From a technology standpoint, we now have an on-device voice model. It is an all-neural speech recognition model, and we are one of the few companies in the world to have this — and perhaps the only one to have it for Indian languages. Traditionally, on-device models are used for applications like automotive or IoT devices, where you don’t want to reach out to the cloud.
The challenge with on-device models is that they are typically low-vocabulary models rather than conversational ones. We have come up with a product with two things that are special and unique: first, it has an extensive vocabulary, and second, it has a low footprint. Our model consumes less than 100 MB of RAM in total. It’s a totally new architecture we have developed in-house.
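As a rough illustration of how a neural ASR model can be squeezed into a small on-device footprint, here is a minimal sketch using dynamic quantization in PyTorch. The toy LSTM encoder, its dimensions, and the quantization approach are our own assumptions for illustration — not Gnani.ai’s actual architecture.

```python
import os
import torch
import torch.nn as nn

# Hypothetical toy acoustic model -- NOT Gnani.ai's architecture,
# just a stand-in to illustrate shrinking a model for edge deployment.
class TinyASREncoder(nn.Module):
    def __init__(self, n_mels=80, hidden=512, vocab=4096):
        super().__init__()
        self.rnn = nn.LSTM(n_mels, hidden, num_layers=4, batch_first=True)
        self.head = nn.Linear(hidden, vocab)  # per-frame token logits

    def forward(self, feats):  # feats: (batch, time, n_mels)
        out, _ = self.rnn(feats)
        return self.head(out)

def serialized_mb(model, path="tmp_model.pt"):
    """Serialize the weights and report their size in megabytes."""
    torch.save(model.state_dict(), path)
    mb = os.path.getsize(path) / 1e6
    os.remove(path)
    return mb

model = TinyASREncoder()
# Dynamic quantization: float32 weights -> int8, a common way to cut
# the memory footprint of RNN and linear layers for on-device use.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.qint8
)
print(f"fp32: {serialized_mb(model):.1f} MB")
print(f"int8: {serialized_mb(quantized):.1f} MB")  # roughly 4x smaller
```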
You mentioned that your on-device engine has a footprint of less than 100 MB. Can you share a use case for that?
It could be used in automotive or any IoT device. We also see a lot of manufacturers looking at voice as an interface. While most of these implementations will be hybrid, you would want at least some of the key processing to be done on the device or at the edge. We believe that whenever you want quick responses with minimal latency, a hybrid or on-device approach will find its applications.
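A hedged sketch of what such a hybrid setup might look like: run the on-device model first and escalate to the cloud only when the local result looks unreliable. The engine callables, the confidence threshold, and the stub transcripts are hypothetical, purely to illustrate the routing idea.

```python
# Hypothetical hybrid ASR router -- engines and threshold are
# illustrative assumptions, not Gnani.ai's actual API.
CONFIDENCE_THRESHOLD = 0.85

def transcribe_hybrid(audio, on_device_asr, cloud_asr, network_ok=True):
    """Prefer the low-latency on-device model; fall back to the larger
    cloud engine only when confidence is low and a network is available."""
    text, confidence = on_device_asr(audio)  # fast, works offline
    if confidence >= CONFIDENCE_THRESHOLD or not network_ok:
        return text, "on-device"
    return cloud_asr(audio), "cloud"

# Toy usage with stub engines:
local = lambda a: ("नमस्ते", 0.9)
remote = lambda a: "नमस्ते, आप कैसे हैं"
print(transcribe_hybrid(b"...", local, remote))  # -> ('नमस्ते', 'on-device')
```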
What have been the latest trends when it comes to the vernacular voice business?
We believe hybrid is the future of speech technology in terms of deployment. Fundamentally, enabling a conversation with any IoT device in any Indian language is what we have been working on for the last few months.
From a business standpoint, the current uncertain situation has opened up a lot of opportunities, especially in customer service. The number of leads and implementations we have had in the last few months has been extraordinary. We do a lot of work on automating collection processes, onboarding processes, traditional customer service, and insurance renewals, to name a few. Our voice bot can be used on any channel, be it telephony lines, mobile apps, or mobile websites. There has been a massive surge in companies looking at digital transformation, which has gained prominence because of the current situation. That is a huge positive for us.
What kind of traction are you witnessing in speech analytics?
We see huge interest in speech analytics, and we continue to develop exciting innovations there while extending to other channels. Our customers have started coming back to us and asking if we can extend speech analytics to other channels. We now also offer actionable email, chat, and IVR analytics — omnichannel analytics. We used to be a little more speech-focused, but now we treat speech- and NLP-based analytics equally. All the customer insights are hidden inside those audio conversations, and it has been a challenge to figure out what customers are saying about us. Now people have realised this is a great opportunity.
Most businesses today use English for their operations. How is it different when companies use Indian languages for communication, in the context of voice and language solutions?
If you look at a consumer-facing company, the first question we ask is how much of their customer support is in English. We expected at least 50% would be, but the reality is that Indian languages make up at least 70 to 80 percent of total customer support. For slightly higher-end products, or in the bigger metros, the language of business conversation in offices is English; but when it comes to customer support, people tend to talk in their vernacular language. We also believe voice is, and will continue to be, dominant in customer service, and within that it is going to be mostly vernacular languages.
Companies are beginning to realise this and are asking for analytics on vernacular customer support. The analytics is also better done in the vernacular language. For example, if the conversation is in Hindi or Tamil and you want to understand the context of what is being said, you need a complete transcription, not just keyword spotting, for your analytics models. At Gnani.ai we do a comprehensive analysis in the native languages; we don’t just translate into English to build models. Instead, our models are developed from scratch and customised for Indian vernaculars. Merely translating into English leads to major flaws in model accuracy.
You say that analytics is better done in Hindi or other vernacular languages. Can you elaborate on that point?
Typically across many companies, voice analytics is done with keyword spotting, which is the first and most commonly practised approach. The other method is that companies first do Hindi transcription using one API and then perform translation using another API. Once you get the English translation, you do the analytics in English. At Gnani.ai we do the complete analysis in the language itself and ensure we have the entire context of what is being spoken, to avoid errors during the process.
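To make the contrast concrete, here is a minimal sketch of classifying intents directly on full Hindi transcripts, rather than spotting keywords or routing through translation. The intent labels and example sentences are invented toy data, and the pipeline is our own illustrative choice, not Gnani.ai’s models.

```python
# Minimal sketch: intent classification on native-language transcripts
# (toy data), instead of keyword spotting or an
# ASR -> translation -> English-analytics pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented Hindi transcripts and intent labels, purely illustrative.
transcripts = [
    "मेरा कार्ड ब्लॉक कर दीजिए",        # block my card
    "खाते का बैलेंस कितना है",          # what is my balance
    "पॉलिसी कब रिन्यू होगी",            # when is policy renewal
    "कार्ड खो गया है उसे बंद करें",      # lost card, close it
    "बैलेंस बताइए",                      # tell me my balance
    "बीमा रिन्यूअल की तारीख क्या है",    # insurance renewal date
]
labels = ["block_card", "check_balance", "renew_policy",
          "block_card", "check_balance", "renew_policy"]

# Character n-grams sidestep word-segmentation issues in Indic scripts.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(transcripts, labels)
print(model.predict(["कृपया मेरा कार्ड ब्लॉक करें"]))  # -> ['block_card']
```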
We also now have pre-trained intents in Hindi and all the other top Indian vernacular languages. We feel this is the value we bring to the table; we have created banking, insurance, and collections intents, to name a few. When a customer from the same domain comes to us, it becomes very quick and easy to launch the solution or product for them in a short span.
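A rough sketch of how such domain-level reuse might work; the intent names, the domain catalogue, and the helper function are all hypothetical stand-ins.

```python
# Hypothetical pre-trained intent sets per domain -- illustrative only.
PRETRAINED_INTENTS = {
    "banking":     {"check_balance", "block_card", "loan_status"},
    "insurance":   {"renew_policy", "claim_status", "premium_due"},
    "collections": {"promise_to_pay", "dispute_amount", "request_callback"},
}

def bootstrap_deployment(domain, customer_specific=()):
    """Start a new customer's bot from the domain's pre-trained intents,
    layering only their custom intents on top."""
    base = PRETRAINED_INTENTS.get(domain, set())
    return sorted(base | set(customer_specific))

print(bootstrap_deployment("banking", {"upi_dispute"}))
```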
Can you walk us through what went into building data models and collecting data sets for specific languages?
We spent around two years building our expertise in speech and NLP across a bunch of these languages. Today we support 16-17 languages globally, the majority of them Indian. It is more about getting the right systems in place, with multiple methods and approaches, and not just innovating on datasets or algorithms. We have worked with numerous linguistic groups and associations, who have helped us understand the nuances of each language and fine-tune the algorithms. We are trying to do similar things as we develop systems for companies outside India.
We train our language models with at least ten thousand hours of annotated audio data. We also have pre-trained models for specific industries; for example, for BFSI, banking, loans, and insurance collections. We believe the more niche you get with these language models, the more accurate your system will be.
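At that scale, a training corpus is usually tracked through a manifest of annotated clips. Here is a minimal sketch, assuming a JSON-lines manifest schema of our own invention, of validating one and totalling its annotated hours.

```python
import json

# Hypothetical JSON-lines manifest of annotated audio: one record per
# clip, with its transcript and duration. The schema is our assumption.
SAMPLE = """\
{"audio": "clips/0001.wav", "text": "खाते का बैलेंस कितना है", "duration_s": 4.2}
{"audio": "clips/0002.wav", "text": "पॉलिसी रिन्यू कर दीजिए", "duration_s": 3.7}
"""

def total_annotated_hours(manifest_lines):
    """Sum the durations of records that actually carry a transcript."""
    seconds = 0.0
    for line in manifest_lines:
        record = json.loads(line)
        if record.get("text"):  # skip unannotated clips
            seconds += record["duration_s"]
    return seconds / 3600

print(f"{total_annotated_hours(SAMPLE.splitlines()):.4f} hours")
```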
Tell us about your ongoing research work at Gnani.
We are in the process of filing 11 patents. We have a unique bilingual model, which works best in specific markets like India and Europe. There is a lot of research and innovation we have done in this context.
What are the different business use cases that your technology can address?
We work across a spectrum of use cases such as customer support, customer engagement, collections, onboarding, lead qualification, and more. For example, in the insurance industry, it could be asking simple questions to collect basic customer information. We deploy products such as our voice bot on multiple channels — be it mobile, telephone lines, or websites — to gather insights.
How different are your voice agents from those of big tech companies such as Google or Amazon?
When it comes to Indian languages, our APIs are definitely more accurate than the global providers’. Depending on the language, they can be anywhere between 1-3% more accurate than other players in the market. For benchmarking, we use a mix of two to three different approaches. We have been benchmarked by the mobile companies we work with. We also set aside about ten hours of data that we never use for training, and we keep running it through multiple engines to see how they compare. Recently another large MNC player benchmarked us for Tamil and Telugu and found us 4% better than the global competition in those languages.
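A minimal sketch of that held-out comparison, assuming each engine’s hypotheses are already available: the engine names and transcripts are placeholders, and the jiwer library stands in for whatever metric tooling is actually used.

```python
# Sketch: compare word error rate (WER) of several ASR engines on a
# held-out set. Engine names and transcripts are placeholders.
from jiwer import wer

references = [
    "मेरा कार्ड ब्लॉक कर दीजिए",
    "खाते का बैलेंस कितना है",
]
hypotheses = {
    "engine_a": ["मेरा कार्ड ब्लॉक कर दीजिए", "खाते का बैलेंस कितना है"],
    "engine_b": ["मेरा कार्ड ब्लॉक करो", "खाता बैलेंस कितना है"],
}

for engine, hyps in hypotheses.items():
    # jiwer.wer accepts lists of reference/hypothesis sentence pairs.
    print(f"{engine}: WER = {wer(references, hyps):.2%}")
```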
Finally, tell us about your global strategic partnerships.
We are a Samsung Ventures-invested company. This is a strategic investment to power Bixby (Samsung’s voice assistant) for multiple Indian and other languages.