At its GTC 2020 conference, NVIDIA announced the release of NVIDIA Jarvis, a GPU-accelerated application framework that allows companies to use video and speech data to build state-of-the-art conversational AI services customised for their industry, products and customers.
According to the company's release, the shift toward working from home, telemedicine and remote learning has created a surge in demand for custom, language-based AI services, ranging from customer support to real-time transcription and summarisation of video calls, that keep people productive and connected.
According to Jensen Huang, founder and CEO of NVIDIA, “Conversational AI is central to the future of many industries, as applications gain the ability to understand and communicate with nuance and contextual awareness.”
He added, “NVIDIA Jarvis can enable organisations to serve millions with speed and accuracy, improving customer satisfaction and supporting growing needs in the healthcare, financial services, education and retail industries.”
Applications built with Jarvis can take advantage of innovations in the new NVIDIA A100 GPU for AI computing and the latest optimisations in NVIDIA TensorRT for inference. NVIDIA claims that, for the first time, it is possible to run an entire multimodal application, using the most powerful vision and speech models, faster than the 300-millisecond threshold for real-time interactions.
According to NVIDIA, Jarvis provides a complete, GPU-accelerated software stack and tools that make it easy for developers to create, deploy and run end-to-end, real-time conversational AI applications that can understand terminology unique to each company and its customers.
When asked, David Schubmehl, research director of AI Software Platforms at IDC, said, “We, at IDC, continue to see rapid growth within the conversational AI market largely because organisations of all sizes are beginning to realise the value of using well-trained virtual assistants and chatbots to help service their customers and grow their businesses.”
“IDC expects worldwide spending on conversational AI use cases like automated customer service agents and digital assistants to grow from $5.8 billion in 2019 to $13.8 billion in 2023, a compound annual growth rate of 24%,” stated Schubmehl.
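The 24% figure follows from the standard compound annual growth rate formula, (end value / start value)^(1 / years) − 1, applied over the four years from 2019 to 2023. The short Python sketch below reproduces it from IDC's quoted numbers; the `cagr` helper is a hypothetical name introduced only for this illustration.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values that are `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1

# IDC's quoted spending figures, in billions of US dollars.
start_2019 = 5.8
end_2023 = 13.8

# Four years of compounded growth separate 2019 and 2023.
rate = cagr(start_2019, end_2023, 2023 - 2019)
print(f"{rate:.1%}")  # prints "24.2%"
```

Rounded to the nearest whole percent, this matches the 24% growth rate IDC cites.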
NVIDIA believes that, to offer an interactive, personalised experience, companies need to train their language-based applications on data that is specific to their product offerings and customer requirements. However, building a service from scratch requires deep AI expertise, large amounts of data and compute resources to train the models, and software to regularly update models with new data.
According to the company, Jarvis addresses these challenges by offering an end-to-end deep learning pipeline for conversational AI. It includes state-of-the-art deep learning models, such as NVIDIA's Megatron BERT for natural language understanding. Enterprises can further fine-tune these models on their own data using NVIDIA NeMo, optimise them for inference using TensorRT, and deploy them in the cloud and at the edge using Helm charts available on NGC, NVIDIA's catalogue of GPU-optimised software.
Among the first companies to use Jarvis-based conversational AI in products and services for their customers are Voca, which provides AI agents for call centre support, and Kensho, which provides automatic speech transcription for finance and business.
Voca's AI virtual agents, which use NVIDIA technology for faster, more interactive, human-like engagements, are used by Toshiba, AT&T and other world-leading companies. Voca uses AI to understand the full intent of a customer's spoken conversation. This allows the agents to automatically identify different tones and vocal cues and to discern between what a customer says and what they mean. Additionally, using scalability features built into NVIDIA's AI platform, the agents can dramatically reduce customer wait times.
According to Alan Bekker, CTO and co-founder of Voca, “Low latency is critical in call centres and with NVIDIA GPUs our agents can listen, understand and respond in under a second with the highest levels of accuracy. Now our virtual agents are able to successfully handle 70%–80% of all calls, ranging from general customer service requests to payment transactions and technical support.”
Meanwhile, Kensho, the Cambridge, Mass.-based innovation hub for S&P Global that deploys scalable machine learning and analytics systems, has used NVIDIA's conversational AI to develop Scribe, a speech recognition solution for finance and business. With NVIDIA, Scribe outperforms other commercial solutions in accuracy by up to 20% on earnings calls and similar financial audio.
“We’re working closely with NVIDIA on ways to push end-to-end automatic speech recognition with deep learning even further. By training new models with NVIDIA, we’re able to offer higher transcription accuracy for financial jargon compared to traditional approaches that do not use AI. With this, we are offering our customers timely information in minutes versus days,” said Georg Kucsko, head of AI research at Kensho.
NVIDIA is also offering an early access program for NVIDIA Jarvis, open to a limited number of applicants. Developers who want to evaluate the application framework can apply to the program.