NVIDIA has set the stage for businesses worldwide to design and deploy large language models (LLMs), enabling them to develop domain-specific chatbots, personal assistants, and other artificial intelligence systems.
The company announced the NVIDIA NeMo Megatron framework for training language models with trillions of parameters. In addition, NVIDIA Triton Inference Server now offers multi-GPU, multi-node distributed inference for serving models in new domains and languages. Combined with NVIDIA DGX systems, these technologies provide an enterprise-grade solution that simplifies the development and deployment of massive language models.
“Large language models have demonstrated their flexibility and capability, answering deep domain questions, translating languages, comprehending and summarising documents, writing stories, and computing programmes all without specialised training or supervision,” said Bryan Catanzaro, NVIDIA’s vice president of applied deep learning research. “Developing huge language models for new languages and domains is perhaps the largest supercomputing use to date, and these capabilities are now accessible to the world’s corporations.”
Speed LLM Development
NVIDIA NeMo Megatron builds on Megatron, an open-source project led by NVIDIA researchers that implements massive transformer language models at scale. Megatron 530B is the world’s largest customisable language model.
The NeMo Megatron framework helps enterprises overcome the obstacles of developing complex natural language processing models. It is designed to scale out across the large-scale accelerated computing infrastructure of NVIDIA DGX SuperPOD. NeMo Megatron automates much of the complexity of LLM training, with data processing libraries that ingest, curate, organise, and clean data. Using advanced data, tensor, and pipeline parallelisation techniques, it distributes the training of large language models efficiently across thousands of GPUs. Enterprises can use the framework to train LLMs for their own domains and languages of interest.
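The tensor-parallel idea mentioned above can be sketched in plain NumPy, as a rough illustration rather than actual NeMo Megatron code: a layer’s weight matrix is split column-wise across devices, each device computes a partial matrix product, and the shards are concatenated. The function name and shapes here are illustrative assumptions, not NeMo Megatron APIs.

```python
import numpy as np

def tensor_parallel_matmul(x, w, num_shards):
    """Illustrative tensor parallelism: split the weight matrix w
    column-wise into num_shards pieces (one per hypothetical GPU),
    compute each partial matmul, then concatenate the results."""
    shards = np.array_split(w, num_shards, axis=1)   # one shard per "GPU"
    partials = [x @ shard for shard in shards]       # done in parallel in practice
    return np.concatenate(partials, axis=1)

# Sanity check: the sharded result matches a single-device matmul.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w = rng.standard_normal((8, 16))
assert np.allclose(tensor_parallel_matmul(x, w, 4), x @ w)
```

In a real system each shard lives on a different GPU and the concatenation is a collective communication step; the point of the sketch is only that the mathematics is unchanged by the split.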
Real-Time LLM Inference
New multi-GPU, multi-node capabilities in the latest NVIDIA Triton Inference Server allow LLM inference workloads to scale across multiple GPUs and nodes in real time. Such models demand more memory than a single GPU, or even a large server with many GPUs, can provide, and inference must run quickly for applications to be useful. Megatron 530B can now run on two NVIDIA DGX systems, cutting processing time from more than a minute on a CPU server to half a second and making it practical to deploy LLMs for real-time applications.
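Serving a model too large for one machine typically means partitioning its layers across devices and nodes. The sketch below shows one simple, hypothetical partitioning scheme, contiguous blocks of layers per pipeline stage, and is not code from Triton or NeMo Megatron; the layer and stage counts are made up for illustration.

```python
def partition_layers(num_layers, num_stages):
    """Illustrative pipeline partitioning: assign a contiguous block of
    transformer layers to each stage (e.g. each GPU or node), spreading
    any remainder over the earliest stages."""
    base, extra = divmod(num_layers, num_stages)
    stages, start = [], 0
    for s in range(num_stages):
        size = base + (1 if s < extra else 0)
        stages.append(list(range(start, start + size)))
        start += size
    return stages

# A hypothetical 105-layer model split across 8 stages: the first stage
# gets 14 layers and the remaining seven get 13 each.
stages = partition_layers(105, 8)
```

During inference, activations flow from stage to stage, so each node only needs memory for its own block of layers, which is what makes models larger than a single server feasible to serve.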
Custom Language Models
SiDi, JD Explore Academy, and VinBrain are among the early adopters building large language models with NVIDIA DGX SuperPOD. SiDi, one of Brazil’s leading artificial intelligence research and development organisations, has adapted Samsung’s virtual assistant for the country’s 200 million Portuguese speakers.
“The SiDi team has considerable expertise developing artificial intelligence (AI) virtual assistants and chatbots, which require both high AI performance and specialised software that is trained and tuned to the shifting nuances of human language,” said John Yi, SiDi’s CEO. “NVIDIA DGX SuperPOD is suitable for powering our team’s advanced work and enabling us to provide world-class AI services to Brazilian Portuguese speakers.”
JD Explore Academy, the research and development arm of JD.com, a leading supply chain technology and service provider, is utilising NVIDIA DGX SuperPOD to develop natural language processing for use in smart customer service, smart retail, smart logistics, the Internet of Things, and healthcare, among other applications.
VinBrain, a healthcare artificial intelligence firm based in Vietnam, used a DGX SuperPOD to develop and deploy a clinical language model for radiologists and telemedicine in 100 hospitals. It is currently used by over 600 healthcare practitioners.
NVIDIA Triton is available through the NVIDIA NGC catalogue, a repository for GPU-accelerated AI software that includes frameworks, toolkits, pretrained models, and Jupyter Notebooks, as well as through the Triton GitHub repository. Additionally, Triton is a component of NVIDIA’s AI Enterprise software stack, which NVIDIA optimises, certifies, and supports. As a result, enterprises can utilise the software suite to execute language model inference on commercially available accelerated servers in on-premises data centres and private clouds.
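For a sense of what deploying a model on Triton involves, each served model is described by a small configuration file (config.pbtxt). The fragment below is a hedged sketch only: the model name, tensor names, and dimensions are invented for illustration, though the fields themselves (backend, max_batch_size, input/output, instance_group) are standard Triton configuration.

```
name: "example_llm"                 # illustrative model name
backend: "fastertransformer"        # a backend commonly used for large transformers
max_batch_size: 8
input [
  { name: "input_ids", data_type: TYPE_UINT32, dims: [ -1 ] }
]
output [
  { name: "output_ids", data_type: TYPE_UINT32, dims: [ -1 ] }
]
instance_group [
  { count: 1, kind: KIND_GPU }      # run one instance per available GPU
]
```

A file like this sits alongside the model weights in Triton’s model repository; the server loads it at startup and exposes the model over HTTP and gRPC endpoints.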