Ahmedabad-based VSpeech.ai was founded in 2015. The startup sensed an opportunity while working with Interactive Voice Response (IVR) call centres, and soon pivoted to IVR-based telephony integrations with speech products.
“We are an AI-driven technology firm dedicated to solving complex business problems with Intelligent Speech Solutions. Our AI-based technology stack offers more than 90% accuracy,” said Mausam Patel, co-founder & Director at VSpeech.ai.
The company has built speech recognition engines, trained on more than 5,000 hours of call data, that offer multilingual recognition for agent-customer communications.
VSpeech.ai offers a voice analysis system that automatically generates analytics from thousands of calls to help companies make critical business decisions.
The startup has now integrated Emotional AI into its products. Emotional AI detects and interprets human emotions from calls in real time and helps improve the overall experience.
VSpeech.ai claims to be the only conversational AI company that offers multilingual speech recognition in 15 major Indian languages and 10 foreign languages. The system also understands a mixture of languages.
“Our multilingual service is designed to provide an easy communication platform as India is a diverse country with almost 456 languages. Most of them tend to use code-switching, i.e. using two or more languages at one time for their convenience,” said Patel.
The company uses an advanced 8 kHz mono engine to understand mixed-language inputs accurately. “Current products in the market from Google, Amazon and Azure don’t support mixed languages naturally. VSpeech.ai effectively does that,” he added.
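For context, 8 kHz mono is the usual format of narrowband telephone audio. The sketch below is purely illustrative and is not VSpeech.ai's pipeline: it shows one simple way to bring a recording into that format, assuming interleaved integer PCM samples and a source rate that is an integer multiple of 8,000 Hz. The function names `to_mono` and `downsample` are hypothetical, and the averaging step is only a crude stand-in for a proper low-pass filter.

```python
def to_mono(samples, channels):
    """Average interleaved channels into one mono channel."""
    if channels < 1 or len(samples) % channels:
        raise ValueError("sample count must be a multiple of the channel count")
    return [
        sum(samples[i:i + channels]) // channels
        for i in range(0, len(samples), channels)
    ]


def downsample(samples, src_rate, dst_rate=8000):
    """Reduce the sample rate by an integer factor.

    Each group of `factor` samples is averaged, which acts as a very
    rough low-pass filter before decimation (an assumption made for
    brevity; real systems use a proper resampling filter).
    """
    if src_rate % dst_rate:
        raise ValueError("src_rate must be an integer multiple of dst_rate")
    factor = src_rate // dst_rate
    return [
        sum(samples[i:i + factor]) // factor
        for i in range(0, len(samples) - factor + 1, factor)
    ]


# Four stereo frames recorded at 16 kHz (left, right, left, right, ...).
stereo_16k = [100, 300, 200, 400, 0, 0, 10, 30]
mono_16k = to_mono(stereo_16k, channels=2)
mono_8k = downsample(mono_16k, src_rate=16000)
print(mono_16k)  # [200, 300, 0, 20]
print(mono_8k)   # [250, 10]
```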
In call centres, voice data carries a lot of noise, such as background sounds and traffic. VSpeech.ai filters out this noise while transcribing voice calls.
VSpeech.ai runs on proprietary machine learning tools. The technology includes domain-based neural networks, generative adversarial networks and TensorFlow-based AI tools. The language models consist of classifiers and N-gram stacks.
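To illustrate what an N-gram language model does, here is a minimal bigram (N = 2) sketch with add-one smoothing. It is a textbook example, not the company's model: the class name `BigramModel` and the two training sentences are invented for illustration. The model estimates how likely a word is to follow the previous one, which is the kind of signal a speech recogniser can use to choose between similar-sounding transcriptions.

```python
from collections import Counter


class BigramModel:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.context_counts = Counter()  # how often each word starts a bigram
        self.bigram_counts = Counter()   # how often each (prev, word) pair occurs
        self.vocab = set()               # every word that can follow another
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))
            self.vocab.update(tokens[1:])

    def prob(self, prev, word):
        """Smoothed estimate of P(word | prev)."""
        numerator = self.bigram_counts[(prev, word)] + 1
        denominator = self.context_counts[prev] + len(self.vocab)
        return numerator / denominator


# Hypothetical training data, for illustration only.
model = BigramModel(["recharge my phone", "check my balance"])
print(model.prob("my", "phone"))     # 0.25  (seen after "my")
print(model.prob("my", "recharge"))  # 0.125 (never seen after "my")
```

Because the counts are kept per token, the same approach works unchanged on code-switched sentences that mix words from two languages.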
The tech stack involves natural language understanding components on top of NLP/NLU libraries. VSpeech.ai builds its own supervised learning methods. The company owns server infrastructure and also has a parallel GPU system to train models. It has a large repository of audio and text data from different languages and uses linguistics experts to transfer that domain knowledge into easily usable tools. VSpeech.ai has also built its own IPA system to understand spoken and written languages effectively.
The software is delivered through HTTP/HTTPS and socket APIs. The system offers an offline mode as well as an online streaming mode for real-time services.
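The difference between the two modes can be pictured as follows: in offline mode a whole recording is submitted in one request, while in streaming mode the audio is sent in small frames as it arrives. The sketch below only shows the framing step and makes no claim about the company's actual API. The helper `pcm_frames` is hypothetical, and the 16-bit sample width and 20 ms frame length are assumptions (only the 8 kHz rate comes from the article).

```python
def pcm_frames(pcm, sample_rate=8000, sample_width=2, frame_ms=20):
    """Yield fixed-duration frames of raw mono PCM bytes.

    The last frame may be shorter if the audio does not divide evenly.
    """
    frame_bytes = sample_rate * sample_width * frame_ms // 1000
    if frame_bytes <= 0:
        raise ValueError("frame size must be positive")
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


# One second of silent 8 kHz, 16-bit mono audio.
one_second = bytes(8000 * 2)

# Streaming mode: each frame would be written to a socket as it is produced.
frames = list(pcm_frames(one_second))
print(len(frames), len(frames[0]))  # 50 320

# Offline mode: the whole buffer would be uploaded in a single request.
print(len(one_second))  # 16000
```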
VSpeech.ai executes thousands of call transcriptions per day on scalable AWS infrastructure and deploys multiple APIs on different nodes. Most backend APIs are written in Python and Node.js.
Additionally, the most common speech services are available on the website, and the backend APIs can be plugged into any software. The startup claimed that its system is highly scalable, with a single node able to process 6,000 hours of data per month.
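To put the per-node figure in perspective, the short calculation below converts it into daily volume and an average real-time factor. It assumes a 30-day month and a node that runs around the clock; both are assumptions made here for illustration, not figures from the company, and `node_throughput` is a hypothetical helper.

```python
def node_throughput(hours_per_month=6000, days_per_month=30):
    """Return (audio hours per day, average real-time factor) for one node."""
    audio_hours_per_day = hours_per_month / days_per_month
    wall_clock_hours = days_per_month * 24
    realtime_factor = hours_per_month / wall_clock_hours
    return audio_hours_per_day, realtime_factor


per_day, factor = node_throughput()
print(f"{per_day:.0f} hours of audio per day")     # 200 hours of audio per day
print(f"about {factor:.1f}x faster than real time")  # about 8.3x faster than real time
```

Under these assumptions, one node would on average handle the equivalent of roughly eight simultaneous live calls.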
The company is self-funded and invested heavily in building its technology stack during its first three years. From 2018 onwards, VSpeech.ai products started bagging enterprise contracts from telecoms, banks, IVR providers and fintech solution providers. “VSpeech.ai owns 75% of the market share in the voice solution segment in India, offering all Indian regional languages voice solutions, including Indian English, Hindi, Tamil, Telugu, Malayalam, Kannada, Bengali, Gujarati, Marathi, Oriya and more,” said Patel.
The company offers solutions in European languages and has Nordic clients. VSpeech.ai has plans to expand into the EU and cover more languages.