NVIDIA’s Large Language AI Models Are Now Available To Businesses Worldwide

NVIDIA doubles down on AI language models and inference as a platform for the metaverse, in data centres, in the cloud, and at the edge.

NVIDIA has set the stage for businesses worldwide to build and deploy large language models (LLMs), enabling them to develop domain-specific chatbots, personal assistants, and other artificial intelligence systems.

The firm announced the NVIDIA NeMo Megatron framework for training language models with trillions of parameters, along with new multi-node distributed inference capabilities in the NVIDIA Triton Inference Server that serve models customised for new domains and languages. Used in conjunction with NVIDIA DGX systems, these technologies provide an enterprise-grade solution for simplifying the construction and deployment of massive language models.

“Large language models have demonstrated their flexibility and capability, answering deep domain questions, translating languages, comprehending and summarising documents, writing stories, and computing programmes all without specialised training or supervision,” said Bryan Catanzaro, NVIDIA’s vice president of applied deep learning research. “Developing huge language models for new languages and domains is perhaps the largest supercomputing use to date, and these capabilities are now accessible to the world’s corporations.”


Speed LLM Development 

NVIDIA NeMo Megatron builds on Megatron, an open-source project led by NVIDIA researchers that implements massive transformer language models at scale. Megatron 530B is the world's largest customisable language model.

Enterprises can overcome the obstacles associated with developing complex natural language processing models using the NeMo Megatron framework. It is designed to scale out across the large-scale accelerated computing infrastructure of NVIDIA DGX SuperPOD. NeMo Megatron automates the complexity of LLM training, with data processing libraries that ingest, curate, organise, and clean data. Its data, tensor, and pipeline parallelisation techniques allow the training of huge language models to be distributed efficiently across thousands of GPUs. Enterprises can use the NeMo Megatron framework to train LLMs on the domains and languages of interest to them.
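To illustrate one of the three parallelisation strategies mentioned above, the toy sketch below simulates tensor (intra-layer) parallelism in plain Python: a layer's weight matrix is sharded column-wise across "devices", each device computes a partial matrix multiply, and concatenating the partial outputs recovers the full result. All names and the two-device setup here are illustrative assumptions, not NVIDIA APIs.

```python
# Toy sketch of tensor parallelism: shard one layer's weights
# column-wise across devices, compute partial results in parallel,
# and gather them. Real frameworks do this over GPUs with collective
# communication; this is a single-process illustration.

def matmul(x, w):
    """Multiply row vector x by weight matrix w (given as a list of rows)."""
    cols = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(cols)]

def split_columns(w, n_devices):
    """Shard a weight matrix column-wise into n_devices pieces."""
    per = len(w[0]) // n_devices
    return [[row[d * per:(d + 1) * per] for row in w] for d in range(n_devices)]

def parallel_matmul(x, shards):
    """Each 'device' computes its partial output; concatenation stands in
    for the all-gather a real implementation would perform."""
    out = []
    for shard in shards:
        out.extend(matmul(x, shard))
    return out

x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
shards = split_columns(w, n_devices=2)
assert parallel_matmul(x, shards) == matmul(x, w)
```

Data parallelism (replicating the model and splitting the batch) and pipeline parallelism (splitting layers across devices) follow the same spirit: divide the work so no single GPU has to hold the whole model or batch.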

Real-Time LLM Inference 

New multi-GPU, multi-node capabilities in the latest NVIDIA Triton Inference Server allow LLM inference workloads to scale across several GPUs and nodes in real time. These models demand more memory than a single GPU, or even a large server with numerous GPUs, can provide, and inference must be fast for applications to be useful. Megatron 530B can now be run on two NVIDIA DGX systems, cutting processing time from nearly a minute on a CPU server to half a second and making it possible to deploy LLMs for real-time applications.
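The key idea behind serving a model too large for one machine, as described above, is to partition its layers across nodes so that one node's activations feed the next. The following is a minimal single-process sketch of that split; the layer structure and two-node partition are illustrative assumptions, not Triton's actual implementation.

```python
# Toy sketch of splitting a model's layers across two "nodes" for
# inference. In a real multi-node Triton deployment, each partition
# lives on a separate DGX system and activations are sent over the
# network between them.

def make_layer(scale):
    # Stand-in for a transformer layer: just scales the activations.
    return lambda h: [scale * v for v in h]

full_model = [make_layer(2), make_layer(3), make_layer(5), make_layer(7)]

def run(layers, h):
    for layer in layers:
        h = layer(h)
    return h

# Partition the layers between two nodes; node 1's output feeds node 2.
node1, node2 = full_model[:2], full_model[2:]
h = run(node1, [1.0])    # would execute on the first system
out = run(node2, h)      # activations handed off to the second system
assert out == run(full_model, [1.0])
```

Because each node only holds part of the model, the combined GPU memory of the cluster, rather than of a single server, bounds the model size that can be served.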

Custom Language Models 

SiDi, JD Explore Academy, and VinBrain are among the early adopters building huge language models using NVIDIA DGX SuperPOD. SiDi, one of Brazil’s leading artificial intelligence research and development organisations, has adapted the Samsung virtual assistant for the country’s 200 million Portuguese speakers.

“The SiDi team has considerable expertise developing artificial intelligence (AI) virtual assistants and chatbots, which require both high AI performance and specialised software that is trained and tuned to the shifting nuances of human language,” said John Yi, SiDi’s CEO. “NVIDIA DGX SuperPOD is suitable for powering our team’s advanced work and enabling us to provide world-class AI services to Brazilian Portuguese speakers.”

JD Explore Academy, the research and development arm of JD.com, a leading supply chain technology and service provider, is utilising NVIDIA DGX SuperPOD to develop natural language processing for use in smart customer service, smart retail, smart logistics, the Internet of Things, and healthcare, among other applications.

VinBrain, a healthcare artificial intelligence firm based in Vietnam, used a DGX SuperPOD to develop and deploy a clinical language model for radiologists and telemedicine in 100 hospitals. It is currently used by over 600 healthcare practitioners.


NVIDIA Triton is available through the NVIDIA NGC catalogue, a repository for GPU-accelerated AI software that includes frameworks, toolkits, pretrained models, and Jupyter Notebooks, as well as through the Triton GitHub repository. Additionally, Triton is a component of NVIDIA’s AI Enterprise software stack, which NVIDIA optimises, certifies, and supports. As a result, enterprises can utilise the software suite to execute language model inference on commercially available accelerated servers in on-premises data centres and private clouds.

Dr. Nivash Jeevanandam
Nivash holds a doctorate in information technology and has been a research associate at a university and a development engineer in the IT industry. Data science and machine learning excite him.
