NVIDIA has released Jarvis 1.0 Beta, a flexible application framework for multimodal conversational AI services that delivers real-time performance on NVIDIA GPUs. It includes an end-to-end workflow for building and deploying real-time conversational AI apps.
Real-Time Conversational AI
Conversational AI allows humans to interact with devices, machines, and other systems using speech. Though it sounds simple, the technology behind a conversational AI system is complex. It typically involves a multi-step process that needs substantial computational power, because several computations must complete within a few milliseconds.
Advances in deep learning have greatly benefited conversational AI, and many such systems now achieve superhuman accuracy on specific tasks. Retail, healthcare, and finance are among the critical sectors that have seen widespread adoption of conversational AI.
In general, conversational AI consists of three stages:
- Automatic Speech Recognition (ASR): the human voice is taken as input and converted to readable text.
- Natural Language Understanding (NLU): the text is taken as input, its meaning is interpreted in context, and an appropriate response is produced.
- Text-to-Speech (TTS): the text response generated at the NLU stage is taken as input and converted into natural-sounding speech.
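The three stages chain naturally: the output of each stage is the input of the next. The minimal sketch below illustrates only that data flow. The functions `asr`, `nlu`, `tts`, and `compose` are hypothetical stand-ins, not Jarvis APIs or real models: "audio" is modeled as a list of spoken words, and the "NLU" step is a simple keyword rule.

```python
from typing import Callable, List

# Hypothetical stand-in: "audio" is modeled as a list of spoken words so the
# sketch runs without microphones or speech models.
Audio = List[str]


def asr(audio: Audio) -> str:
    """Stand-in for Automatic Speech Recognition: audio in, text out."""
    return " ".join(audio)


def nlu(text: str) -> str:
    """Stand-in for NLU: a keyword rule instead of a trained language model."""
    if "weather" in text.lower():
        return "It looks sunny today."
    return "Sorry, I did not understand that."


def tts(text: str) -> Audio:
    """Stand-in for Text-to-Speech: text in, "audio" (a word list) out."""
    return text.split()


def compose(*stages: Callable) -> Callable:
    """Chain stages so that each stage's output feeds the next stage's input."""
    def run(x):
        for stage in stages:
            x = stage(x)
        return x
    return run


# ASR -> NLU -> TTS, in that order.
conversation = compose(asr, nlu, tts)

if __name__ == "__main__":
    print(conversation(["what", "is", "the", "weather"]))
```

In a real system each stub would call a trained model, and the latencies of all three stages add up, which is why the whole chain has to finish within a tight real-time budget.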
What Does NVIDIA Jarvis Do?
The Jarvis framework includes pre-trained conversational AI models, tools from the NVIDIA AI toolkit, and end-to-end services for speech, vision, and NLU tasks.
By fusing sensor inputs such as vision and audio simultaneously, Jarvis supports capabilities such as multi-user and multi-context conversations. These capabilities are useful in applications such as virtual assistants, call center assistants, and multi-user diarisation.
Jarvis lets developers fine-tune models for a deeper understanding of context. It offers end-to-end real-time services optimised for inference, delivering up to seven times higher throughput on GPUs than on CPUs, and a speed-up of up to ten times when using TLT. The result is GPU-accelerated pipelines for intelligent, language-based real-time applications.
- The new beta release includes not only models for conversational AI but also support for the Transfer Learning Toolkit (TLT), an AI toolkit for building production-quality models from pre-trained models without any coding. It reduces the cost of data collection, labelling, and training. Notably, NVIDIA recently announced version 3.0 of this versatile toolkit.
Version 3.0 includes pre-trained models for gesture recognition, gaze estimation, emotion recognition, face detection, and landmark estimation. It also supports use cases such as NLP and speech recognition.
- Deep learning researchers can build novel conversational models using NVIDIA NeMo, a Python toolkit that makes it easy to experiment with new architectures. These models can be trained efficiently with mixed precision on the Tensor Cores in NVIDIA GPUs. Users can then use NVIDIA TLT to fine-tune the models on custom datasets for the highest accuracy.
- Jarvis offers fully accelerated deep learning pipelines optimised to run as scalable services. Through a simple API, developers can access high-performance services for tasks such as speech recognition, text-to-speech, intent recognition, gaze detection, and facial landmark detection. These skill pipelines can be fused to build completely new skills. They are performance-tuned and can be customised for a specific use case.
- With Jarvis, developers can automate the steps from pre-trained models to optimised services deployed in the cloud, in the data center, or at the edge. It applies TensorRT optimisations to the models and configures the NVIDIA Triton Inference Server. With just one line of code, developers can download, set up, and run entire Jarvis applications.
Enterprises such as InstaDeep, mobile network operator MTS, Ribbon, and Northwestern Medicine have already adopted Jarvis. Jarvis is available as a free download to members of the NVIDIA developer community.
I am a journalist with a postgraduate degree in computer network engineering. When not reading or writing, one can find me doodling away to my heart’s content.