For this week’s startup feature, Analytics India Magazine spoke to Abhimanyu, co-founder of Agara.ai, to understand how the company is providing real-time voice AI to automate the entire customer support function in enterprises.
Agara is an autonomous virtual voice agent powered by real-time voice AI. The platform is contextualised and pre-trained to power natural conversations over voice, without human assistance.
The voice agent brings cutting-edge autonomous technology to a real-world problem in customer service. Agara's proprietary machine learning models are pre-trained on industry-specific customer care data.
Abhimanyu said, “Focused currently on voice communication, our AI platform works much like human brains on four lobes. One each for Speech Recognition, Natural Language Understanding, Conversations Module and Text-to-Speech. The lobes help in speech-to-text conversion, recognise intents, detect consumer’s emotional state, extract information from the discussion and respond.”
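The four-lobe pipeline Abhimanyu describes can be sketched as a simple chain of stages. Everything below is an illustrative stub, not Agara's actual API: the function names, the canned transcript, and the intent logic are all hypothetical.

```python
# Illustrative sketch of the four-stage voice pipeline described above.
# Every function here is a hypothetical stub, not Agara's real system.

def speech_to_text(audio: bytes) -> str:
    """Lobe 1: speech recognition (stubbed with a canned transcript)."""
    return "i want to cancel my order number 4 2"

def understand(transcript: str) -> dict:
    """Lobe 2: natural language understanding -- intent, entities, emotion."""
    intent = "cancel_order" if "cancel" in transcript else "unknown"
    digits = [t for t in transcript.split() if t.isdigit()]
    return {"intent": intent, "order_id": "".join(digits), "emotion": "neutral"}

def converse(understanding: dict) -> str:
    """Lobe 3: conversation module decides the next response."""
    if understanding["intent"] == "cancel_order" and understanding["order_id"]:
        return f"Cancelling order {understanding['order_id']} now."
    return "Could you repeat that, please?"

def text_to_speech(response: str) -> bytes:
    """Lobe 4: text-to-speech (stubbed as UTF-8 bytes)."""
    return response.encode("utf-8")

def handle_turn(audio: bytes) -> bytes:
    return text_to_speech(converse(understand(speech_to_text(audio))))

print(handle_turn(b"<caller audio>").decode())  # Cancelling order 42 now.
```

The point of the sketch is the data flow: audio in, transcript, structured understanding, response text, audio out.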
What’s the differentiator?
Abhimanyu stated, “Agara is built specifically for autonomous voice conversations. Every aspect of our product is hyper optimised to deliver quick, intelligent responses to customers over voice.”
He spoke about a few important innovations in their lab, such as:
- Patent-pending, proprietary Spoken Language Understanding modules specialised in accurately identifying key entities from a conversation like names, numbers, email addresses, cities, prices and more.
- Best-in-class speech recognition accuracy on phone audio
- Robust behaviour with accents and noisy inputs
- A combination of GPU and CPU based infrastructure for low latency responses
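To illustrate the kind of output the Spoken Language Understanding modules target, here is a toy regex sketch for a subset of the entity types listed above (emails, prices, numbers). Agara's actual SLU models run on speech, not transcripts; this works on text purely to show the output shape, and the patterns are simplified illustrations.

```python
import re

# Toy illustration of entity spotting. Agara's SLU operates directly on
# speech; this regex version works on a transcript only to show the
# target output shape, and the patterns are deliberately simplified.

ENTITY_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "price": re.compile(r"\$\d+(?:\.\d{2})?"),
    "number": re.compile(r"(?<![\w$.])\d+(?![\w.])"),
}

def extract_entities(transcript: str) -> dict:
    return {name: pat.findall(transcript) for name, pat in ENTITY_PATTERNS.items()}

result = extract_entities("charge $25.50 to jane.doe@example.com, order 7731")
print(result)
# {'email': ['jane.doe@example.com'], 'price': ['$25.50'], 'number': ['7731']}
```

A production system additionally has to handle spoken forms ("twenty five fifty", "at gmail dot com"), which is where speech-native models earn their keep.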
Abhimanyu also mentioned some of the key technology differentiators, such as:
- Conversational flexibility with pre-built conversation blocks
- No-code workflow building
- Integrated multimodal communication
- Pure autonomy focus
- Industry leading speech understanding
- One workflow invoked from anywhere
- Vertically integrated system/100% autonomous call centre
Use of AI at Agara.ai
According to Abhimanyu, Agara’s custom-trained ASR system has been trained on several hundred hours of customer support phone call recordings for optimal, context-specific speech recognition. The ASR runs on GPU-based infrastructure to ensure low latency. Alongside the proprietary ASR, Agara also uses a public ASR, specifically the Google Enhanced Phone model, for transcription.
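For context, Google Cloud Speech-to-Text exposes its enhanced phone-call model through request fields like the ones below. This is shown as a plain dict mirroring the REST request body, not Agara's code; the bucket path is a hypothetical placeholder.

```python
# Shape of a Google Cloud Speech-to-Text request selecting the enhanced
# phone-call model. A plain dict mirroring the REST body -- not Agara's
# code; the audio URI is a hypothetical placeholder.

recognition_request = {
    "config": {
        "encoding": "MULAW",         # typical 8 kHz telephony audio
        "sampleRateHertz": 8000,
        "languageCode": "en-US",
        "useEnhanced": True,         # opt in to the enhanced model tier
        "model": "phone_call",       # model tuned for phone audio
    },
    "audio": {"uri": "gs://example-bucket/call-recording.wav"},
}
```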
“Agara’s extended R&D efforts on speech recognition warrant the need for an additional capability which accounts for errors often found in public speech recognition systems. Often, these errors came in one of the following three forms, which are irrecoverable transcript errors, accents & intonations and ambient noise,” Abhimanyu said.
In parallel with the ASRs, the conversational AI platform uses SLU (Spoken Language Understanding) modules to capture specific entities and intents from the caller’s speech, remaining robust to accents and noise. The SLUs are custom machine learning models developed in-house at Agara. They operate directly on the speech input: rather than generating a transcript, they output only the requisite entity or intent.
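The defining property of an SLU module is its signature: audio in, label out, with no transcript in between. The toy sketch below shows only that interface; the "acoustic feature" and intent list are invented stand-ins for what is really a neural model over acoustic features.

```python
# Toy sketch of the SLU interface: map audio straight to an intent label
# with no intermediate transcript. The feature and intents are invented
# stand-ins; a real SLU is a neural model over acoustic features.

INTENTS = ["cancel_order", "track_order", "talk_to_agent"]

def toy_acoustic_feature(audio: bytes) -> float:
    # Stand-in "feature": average byte value of the raw audio.
    return sum(audio) / len(audio) if audio else 0.0

def classify_intent(audio: bytes) -> str:
    # Bucket the feature into one of the intents -- purely illustrative.
    return INTENTS[int(toy_acoustic_feature(audio)) % len(INTENTS)]
```

Skipping the transcript matters on noisy phone lines: the model never has to commit to exact words it cannot reliably recover.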
Talking about the proprietary NLP models, the co-founder said these models combine the outputs of the speech-to-text system and the SLU into a structured understanding of what the caller said. The models are pre-trained on industry-specific datasets to accurately identify intent, entities, tone, and sentiment from the user’s speech.
The Conversation Blocks handle complex, multi-turn conversations with the caller to collect relevant data, adapt to any switch in context naturally, and resolve caller requests in a fully autonomous manner. Moreover, the platform also uses customised versions of publicly available Text-to-Speech services to deliver responses naturally.
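One way to picture a conversation block is as a slot-filling unit on a stack, where a context switch pushes a new block and the interrupted one resumes afterwards. The block names, slots, and class design below are illustrative, not Agara's implementation.

```python
# Sketch of "conversation blocks" that collect slots over multiple turns
# and tolerate a context switch. Names and design are illustrative only.

class ConversationBlock:
    def __init__(self, name, slots):
        self.name = name
        self.slots = {s: None for s in slots}

    def next_prompt(self):
        for slot, value in self.slots.items():
            if value is None:
                return f"Please tell me your {slot}."
        return None  # block complete

    def fill(self, slot, value):
        self.slots[slot] = value

    def done(self):
        return all(v is not None for v in self.slots.values())

class Dialogue:
    """Stack of blocks: a context switch pushes a new block, and the
    interrupted block resumes once the new one completes."""
    def __init__(self, block):
        self.stack = [block]

    def prompt(self):
        return self.stack[-1].next_prompt()

    def switch_context(self, block):
        self.stack.append(block)

    def step(self, slot, value):
        top = self.stack[-1]
        top.fill(slot, value)
        if top.done() and len(self.stack) > 1:
            self.stack.pop()  # resume the interrupted block

refund = ConversationBlock("refund", ["order id", "reason"])
d = Dialogue(refund)
d.step("order id", "4417")
d.switch_context(ConversationBlock("update address", ["new address"]))
d.step("new address", "12 Main St")  # interruption resolved, refund resumes
print(d.prompt())  # Please tell me your reason.
```

The stack is what makes the context switch feel natural: the caller's digression is handled, then the original request picks up exactly where it left off.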
Core tech stack
Abhimanyu said the core tech stack comprises multiple microservices running on top of AWS.
- The company uses Golang for the core backend services.
- The ML prediction services use Python and GCP speech services.
- React powers all the frontend apps.
- All metadata, including logs, audio, and transcripts, is stored in AWS S3.
- All transactional and platform data goes into a Postgres database.
- The development and deployment pipeline is automated using GitHub Actions, Terraform and AWS Fargate.
- Languages: Golang, Python, TypeScript
- Frameworks: React, Vue
- Data Storage: AWS RDS Postgres, Redis, Elasticsearch, AWS S3
- Cloud Hosting: AWS ECS, AWS EC2
- Deployment: AWS Fargate, Terraform, GitHub Actions
The company has raised a $4.3 million Pre-Series A extension led by UTEC, a Japan-based early-stage deep-tech venture capital firm. Existing investors Blume Ventures and RTP Global also participated in the round.
Agara will focus on three main areas in the next few years:
- Significantly improve speech understanding accuracy: Agara is transcribing and annotating massive amounts of call data and training its machine learning models on it. In addition, the platform manages a growing global panel of data creators. Over the next two years, Agara intends to draw on both to significantly improve the accuracy of its proprietary speech understanding.
- Create near-human conversations that blur the line between people and machines to significantly improve customer experience: Agara uses automated text-to-speech systems to synthesise voice on the fly. It is investing considerable resources in creating a text-to-speech system that is remarkably human-like and tuned specifically to the needs of a person-to-person conversation.
- Enhance the suite of capabilities to let users create highly engaging conversation flows across a multitude of scenarios in minutes: Agara intends to simplify the process of creating, managing and improving conversation flows using its drag-and-drop framework, without the need for any technical expertise.