According to a Research and Markets report, the artificial intelligence (AI) for speech recognition market in India is anticipated to grow at a compound annual growth rate (CAGR) of about 65.17% during the forecast period (2019-2024), reaching a value of INR 14.61 Bn by 2024.
The increasing demand for smart speakers and voice-enabled devices, coupled with the rising penetration of speech recognition technology in customer care services, is driving this growth and further stimulating development and innovation in the space.
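As a rough sanity check on these figures, the CAGR can be run backwards to estimate the market size at the start of the forecast period. The sketch below is a minimal Python illustration that assumes five years of compounding between 2019 and 2024. The report's exact base year is not stated here, so the result is indicative only, and the function names are hypothetical.

```python
# Minimal sketch of the CAGR arithmetic behind the forecast above.
# Assumption: the 2019-2024 forecast period spans five years of compounding.


def project(start_value: float, rate: float, years: int) -> float:
    """Grow start_value at a constant annual rate for the given number of years."""
    return start_value * (1 + rate) ** years


def implied_start(end_value: float, rate: float, years: int) -> float:
    """Invert project(): the starting value that reaches end_value at this rate."""
    return end_value / (1 + rate) ** years


def cagr(start_value: float, end_value: float, years: int) -> float:
    """Constant annual growth rate linking two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


END_2024_INR_BN = 14.61  # forecast value from the report
RATE = 0.6517            # ~65.17% CAGR from the report
YEARS = 5                # assumed: 2019 -> 2024

base = implied_start(END_2024_INR_BN, RATE, YEARS)
print(f"Implied 2019 market size: INR {base:.2f} Bn")
```

Under the five-year assumption, the script prints an implied 2019 value of about INR 1.19 Bn. Setting `YEARS = 6` (for example, if the report's base year were 2018) gives a noticeably smaller starting value, which shows how sensitive such back-calculations are to the assumed period.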
Enterprises across the globe spend considerable amounts of money on contact centres and agents, yet customers are rarely satisfied because of problems such as prolonged call durations. Bangalore-based Vernacular.ai aims to mitigate such issues by automating a large chunk of non-productive calls over voice. The company recently raised $5.1 million in a Series A round.
Founded in 2016 by two IIT Roorkee graduates, Sourabh Gupta and Akshay Deshraj, Vernacular.ai is an AI-first SaaS startup driven by the vision of becoming the world's leading voice automation and AI platform.
How Is It Driving Language Understanding
Vernacular.ai offers two products: VIVA (Vernacular Intelligent Voice Assistant) and VASR (Vernacular Automated Speech Recognition).
VASR enables enterprises to convert audio to text through an easy-to-use API powered by neural network models. The API can recognise over 160 dialects across ten languages to serve the enterprise user base.
Built on top of VASR, VIVA is an AI-based voice automation platform that, according to the company, helps automate 80 per cent of the calls handled by a call centre and reduces agents’ average call handling time by 30 per cent. VIVA combines natural language understanding with speech recognition technology and supports around 10 Indian languages. It also enables hyper-personalisation of customer calls by recognising various characteristics of the speaker, such as accent, speech rate, age, gender, region and even dialect.
Asked how these products differ from others in the market, Gupta pointed out three key differences:
- Validated self-learning technology that ensures the system genuinely improves over time.
- The capability to automate full end-to-end open conversations, rather than limited automation over IVR and closed-domain calls.
- VIVA, built on state-of-the-art advances in voice AI, can identify the user’s persona, including language, dialect, accent and sentiment.
The Tech Behind It
Vernacular.ai uses AI and machine learning for a number of tasks, including:
- In the Contact Center Automation solution, to understand the intent behind a user's utterances.
- For recognising text from speech, the company uses deep neural networks trained on thousands of hours of acoustic data.
- For identifying whether a person is speaking or not, the company employs recurrent models that run on real-time audio with minimal latency.
- For controlling the voice bot's behaviour based on its current understanding of the user and the call state during a call flow.
- For modelling usage patterns in all supported languages, covering conversational nuances and the semantic equivalence of words and phrases.
- In text-to-speech, where the company trains models to synthesise audio that replicates the nuances of conversational human speech.
- For analysing human-to-human conversations to gain insights into resolution, user satisfaction and more.
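The list above notes that Vernacular.ai detects whether a person is speaking using recurrent models that run on real-time audio. The sketch below does not reproduce that approach: it swaps in a much simpler energy-threshold detector, assuming 16 kHz mono audio split into 20 ms frames, to illustrate the streaming structure such a component typically has (one decision per incoming frame, with state carried between frames). The threshold, hangover length and all names are hypothetical.

```python
import math
from typing import Iterable, Iterator, List

# Hypothetical parameters, chosen for illustration only.
SAMPLE_RATE = 16_000
FRAME_MS = 20
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000  # 320 samples per frame


def frame_energy_db(frame: List[float]) -> float:
    """Root-mean-square energy of one frame, in dB relative to full scale."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(max(rms, 1e-10))  # floor avoids log10(0)


class StreamingVAD:
    """Frame-by-frame speech/non-speech decisions with a short 'hangover'.

    Energy thresholding stands in for a learned recurrent model; only the
    streaming structure (one decision per frame, state carried forward)
    is meant to carry over.
    """

    def __init__(self, threshold_db: float = -35.0, hangover_frames: int = 5):
        self.threshold_db = threshold_db
        self.hangover_frames = hangover_frames
        self._frames_since_speech = hangover_frames + 1  # start in silence

    def step(self, frame: List[float]) -> bool:
        """Consume one frame and return True if it is classified as speech."""
        if frame_energy_db(frame) > self.threshold_db:
            self._frames_since_speech = 0
        else:
            self._frames_since_speech += 1
        # Keep reporting speech for a few frames after energy drops, so short
        # pauses between words do not split an utterance.
        return self._frames_since_speech <= self.hangover_frames

    def run(self, frames: Iterable[List[float]]) -> Iterator[bool]:
        """Yield one decision per frame, as frames arrive."""
        for frame in frames:
            yield self.step(frame)


def tone(n: int, amp: float, freq: float = 440.0) -> List[float]:
    """Synthetic sine tone used as a stand-in for voiced audio."""
    return [amp * math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]


silence = [0.0] * FRAME_LEN
voiced = tone(FRAME_LEN, 0.3)
frames = [silence] * 3 + [voiced] * 4 + [silence] * 8
decisions = list(StreamingVAD().run(frames))
print("".join("S" if d else "." for d in decisions))  # prints ...SSSSSSSSS...
```

The hangover explains why five of the trailing silent frames are still marked as speech. A learned recurrent model could replace `frame_energy_db` and the fixed threshold with a per-frame speech probability while keeping the same frame-by-frame loop, which is what keeps latency low in a streaming setting.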
Core Tech Stack
As for its core tech stack, the company mostly uses Python with various ML frameworks. Deshraj said, “Most commonly TensorFlow, but people have been using PyTorch in various projects too. In places where more performance is needed, we use languages like C, C++ and most recently Rust. For speech recognition, Kaldi as a framework works well for us since it gives a lot of hackability, though we do use other stock end-to-end frameworks too.”
He added, “Since most of our engineering stack is in Golang, we also use that whenever required for stitching pieces together or for things which are less Machine Learning. At times we have experimented with Clojure for such tasks.”
On the backend, Vernacular.ai has core services written in Golang, C/C++ and Python, and its frontend applications use React.js and Elm. Gupta also mentioned that the company supports relational databases such as PostgreSQL, MySQL and Oracle, and that its services usually communicate over gRPC and JSON-over-HTTP. The company also uses Kubernetes for container orchestration.
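To make the JSON-over-HTTP style of service communication concrete, here is a minimal sketch using only Python's standard library: one service exposes an endpoint that accepts a JSON request and returns a JSON reply, and a client calls it. The `/v1/intent` path, the `text` and `intent` fields, and the keyword-based "intent" logic are all hypothetical; the article does not describe Vernacular.ai's actual interfaces, most of which are written in Go, and a gRPC equivalent would additionally need a `.proto` definition and generated stubs.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen


class IntentHandler(BaseHTTPRequestHandler):
    """Toy service: accepts {"text": ...} and returns a keyword-matched intent."""

    # Hypothetical keyword -> intent table, for illustration only.
    KEYWORDS = {"balance": "check_balance", "block": "block_card"}

    def do_POST(self):
        if self.path != "/v1/intent":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        text = str(payload.get("text", "")).lower()
        intent = next((v for k, v in self.KEYWORDS.items() if k in text), "unknown")
        body = json.dumps({"intent": intent}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this demo


def call_intent_service(base_url: str, text: str) -> dict:
    """Client side: POST a JSON body and decode the JSON reply."""
    req = Request(
        f"{base_url}/v1/intent",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())


# Port 0 asks the OS for any free port; the server runs in a background thread.
server = HTTPServer(("127.0.0.1", 0), IntentHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}"
result = call_intent_service(url, "I want to block my card")
print(result)
server.shutdown()
server.server_close()
```

JSON-over-HTTP keeps such calls easy to inspect and debug from any language, while gRPC trades that for typed contracts and more compact binary messages; using both, as the company describes, is a common split between internal high-throughput paths and simpler integrations.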
Tackling the Hiring Phase
The general hiring process at Vernacular.ai involves technical and cultural fit rounds. Gupta said, “Rigorous filtering aside, one important piece for us is sourcing from the right places and finding candidates who are going to excel in our environment. The things we look for when hiring are learnability, ambitiousness, ability to work with unknowns, and someone teeming with enthusiasm whom we would love to have on-board.”
Future Roadmap
Over the next five years, the company wants to build a language engine for the world. “We envision a world where human-machine interactions over voice will become second nature to everyone. Our mission is to build the structural components to ensure this revolution unfolds,” said Deshraj on a concluding note.