NVIDIA Just Gave A PyTorch-Based Conversational AI Toolkit For Free

Last week, NVIDIA announced NeMo, an open-source toolkit built on a PyTorch backend for developing speech and language models and creating conversational AI. Neural modules form the building blocks of NeMo models. With NeMo, users can compose and train state-of-the-art neural network architectures.

How Can NeMo Help

NVIDIA NeMo lets users quickly build, train, and fine-tune conversational AI models. It consists of NeMo core and NeMo collections. While NeMo core provides a common look and feel for all models, NeMo collections are groups of domain-specific modules and models.

There are three main parts of NeMo: models, neural modules, and neural types.

The models contain all necessary information regarding training, fine-tuning, data augmentation, and infrastructure details. 

A NeMo model consists of:

  • The neural network implementation, where all the neural modules are connected for training and evaluation
  • All pre- and post-processing activities such as tokenisation and augmentation
  • The dataset classes to be used with this model
  • The optimisation algorithm and the learning rate schedule
  • Other infrastructure details
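
To make the list above concrete, here is a minimal sketch in plain Python of a model object that bundles those pieces. It does not use NeMo's API: `ToyModel`, `OptimConfig`, `tokenise`, `encoder`, and `decoder` are hypothetical names invented for illustration, and the "modules" are simple functions on lists of numbers rather than neural networks.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class OptimConfig:
    """Optimisation algorithm and learning rate schedule (hypothetical fields)."""
    name: str = "sgd"
    lr: float = 0.1
    decay: float = 0.5  # multiplicative decay applied once per epoch

    def lr_at(self, epoch: int) -> float:
        # Exponential decay schedule: lr * decay^epoch.
        return self.lr * (self.decay ** epoch)


@dataclass
class ToyModel:
    """Bundles the pieces listed above; not NeMo's actual model class."""
    modules: List[Callable[[List[float]], List[float]]]  # connected in order
    preprocess: Callable[[str], List[float]]             # e.g. tokenisation
    dataset: Sequence[str]                                # data used with this model
    optim: OptimConfig = field(default_factory=OptimConfig)

    def forward(self, raw: str) -> List[float]:
        # Pre-process the raw input, then pass it through each module in turn.
        x = self.preprocess(raw)
        for module in self.modules:
            x = module(x)
        return x


def tokenise(text: str) -> List[float]:
    # Toy tokeniser: one number per whitespace-separated token (its length).
    return [float(len(token)) for token in text.split()]


def encoder(x: List[float]) -> List[float]:
    # Stand-in for an encoder: doubles every value.
    return [value * 2.0 for value in x]


def decoder(x: List[float]) -> List[float]:
    # Stand-in for a decoder: reduces the sequence to a single value.
    return [sum(x)]


model = ToyModel(
    modules=[encoder, decoder],
    preprocess=tokenise,
    dataset=["hello world", "conversational ai"],
)
outputs = [model.forward(sample) for sample in model.dataset]
print(outputs)
print(model.optim.lr_at(2))
```

The point of the sketch is only the packaging: the network, the pre-processing, the dataset, and the optimisation settings travel together as one object, which is the role the article attributes to a NeMo model.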

Neural modules are the conceptual building blocks of encoder-decoder architectures, each responsible for a different task. At its core, a neural module is a logical part of a neural network that takes a set of inputs and computes a set of outputs.

The inputs and outputs are typed with neural types, which are pairs that contain information about the tensor’s axes layout and the semantics of its elements. This typing enables semantic safety checks between NeMo modules. The kind of inputs a neural module accepts and the outputs it returns are described by its input_types and output_types properties, respectively.
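
The idea of typed inputs and outputs can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not NeMo's implementation: `ToyNeuralType`, `ToyModule`, `ToyEncoder`, `ToyDecoder`, and `check_connection` are hypothetical names, and only the `input_types` and `output_types` attribute names are taken from the description above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ToyNeuralType:
    """A pair: the axes layout plus the semantics of the tensor's elements."""
    axes: Tuple[str, ...]  # e.g. ("B", "T", "D") for batch, time, feature
    elements: str          # e.g. "audio_signal", "log_probs"


class ToyModule:
    """Hypothetical module that declares typed inputs and outputs."""
    input_types: Dict[str, ToyNeuralType] = {}
    output_types: Dict[str, ToyNeuralType] = {}


class ToyEncoder(ToyModule):
    input_types = {"audio": ToyNeuralType(("B", "T"), "audio_signal")}
    output_types = {"encoded": ToyNeuralType(("B", "D", "T"), "encoded_representation")}


class ToyDecoder(ToyModule):
    input_types = {"encoded": ToyNeuralType(("B", "D", "T"), "encoded_representation")}
    output_types = {"log_probs": ToyNeuralType(("B", "T", "D"), "log_probs")}


def check_connection(producer: ToyModule, out_name: str,
                     consumer: ToyModule, in_name: str) -> None:
    """Raise TypeError if the producer's output type differs from the consumer's input type."""
    produced = producer.output_types[out_name]
    expected = consumer.input_types[in_name]
    if produced != expected:
        raise TypeError(f"{out_name} -> {in_name}: got {produced}, expected {expected}")


# Encoder output matches decoder input in both layout and semantics: no error.
check_connection(ToyEncoder(), "encoded", ToyDecoder(), "encoded")

try:
    # Feeding decoder output back into the decoder mixes up layout and semantics.
    check_connection(ToyDecoder(), "log_probs", ToyDecoder(), "encoded")
    mismatch_detected = False
except TypeError:
    mismatch_detected = True

print(mismatch_detected)
```

Because the check compares declared types rather than tensor values, a mismatch can be caught when modules are wired together instead of surfacing later as a shape error or, worse, as silently wrong results.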

For comparison, a neural module can be thought of as an abstraction between a layer and a full neural network: it corresponds to a conceptual piece of the network, for example an encoder, a decoder, or a language model.

Conversational AI encompasses three main areas of artificial intelligence research: automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS, or speech synthesis). NeMo helps practitioners access, re-use, and build on pre-trained models in this field.

NeMo comes with extendable collections of models for ASR, NLP, and TTS.

The NeMo speech collection (nemo_asr) has models and building blocks for speech and command recognition, speaker identification and verification, and voice activity detection. NeMo’s NLP collection (nemo_nlp) has models for question answering, punctuation, and named entity recognition, among others. NeMo’s text-to-speech collection (nemo_tts) contains spectrogram generators and vocoders, which generate synthetic speech.

NeMo models are built on PyTorch and PyTorch Lightning. PyTorch is the underlying framework, while PyTorch Lightning and Hydra (from the PyTorch ecosystem) add conveniences on top of it. One advantage of integrating with PyTorch Lightning is that actions can be invoked quickly through its trainer API. It also provides features such as logging, checkpointing, and overfit checking, among others. Hydra, a configuration framework, gives the user flexibility in configuring models along with error-checking capabilities.
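
The value of configuration with error checking can be illustrated with a small stdlib-only sketch. It is not Hydra's API: `TrainConfig` and `apply_overrides` are hypothetical names, and the only thing borrowed from Hydra is the `key=value` style of override. The sketch rejects unknown keys and values of the wrong type before any training would start.

```python
from dataclasses import dataclass, fields, replace
from typing import List


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical training options; the field names are illustrative."""
    max_epochs: int = 10
    lr: float = 0.001
    checkpoint: bool = True


def apply_overrides(cfg: TrainConfig, overrides: List[str]) -> TrainConfig:
    """Apply 'key=value' overrides, rejecting unknown keys and badly typed values."""
    known = {f.name for f in fields(cfg)}
    updates = {}
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep or key not in known:
            raise ValueError(f"unknown or malformed override: {item!r}")
        current = getattr(cfg, key)
        if isinstance(current, bool):
            # bool("false") would be True, so booleans are parsed explicitly.
            if raw.lower() not in ("true", "false"):
                raise ValueError(f"{key} expects true/false, got {raw!r}")
            updates[key] = raw.lower() == "true"
        else:
            try:
                # Convert the text to the type of the existing default value.
                updates[key] = type(current)(raw)
            except ValueError as err:
                raise ValueError(
                    f"{key} expects {type(current).__name__}, got {raw!r}"
                ) from err
    return replace(cfg, **updates)


cfg = apply_overrides(TrainConfig(), ["max_epochs=3", "lr=0.01"])
print(cfg)

try:
    apply_overrides(TrainConfig(), ["max_epoch=3"])  # typo in the key name
    typo_rejected = False
except ValueError:
    typo_rejected = True

print(typo_rejected)
```

Catching a misspelled option or a non-numeric learning rate at start-up is cheap compared with discovering it after a long training run has silently used the default.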

Wrapping Up

During the recent NVIDIA GTC 2020 event, NVIDIA announced the release of Jarvis, a GPU-accelerated application framework that uses NeMo. The company claims that Jarvis will allow video and speech data to be used to build state-of-the-art conversational AI services. According to the company's release, Jarvis addresses challenges such as large data volumes and the computational resources needed to train models by offering an end-to-end learning pipeline for conversational AI. Several organisations have already adopted it, including Voca, an AI agent for call centre support that counts Toshiba and AT&T among its clients, and Kensho, a company that provides automatic speech transcription services for finance and business.

More companies are expected to adopt NeMo for developing conversational AI in the near future.

Shraddha Goled

I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.