Microsoft and NVIDIA entered a decade-long partnership earlier this year amid the generative AI craze. While NVIDIA, with its hardware prowess, is already leading the race, Microsoft enjoys an edge of its own thanks to its deal with OpenAI. Throughout the year, the two companies have announced several joint deals across the AI landscape.
“Our partnership with NVIDIA spans every layer of the Copilot stack — from silicon to software — as we innovate together for this new age of AI,” said Satya Nadella, chairman and CEO of Microsoft, at the ongoing Ignite conference.
Here are seven NVIDIA-related announcements Microsoft made at the event that caught our attention:
Microsoft has introduced the NC H100 v5 VM series for Azure, featuring the industry’s first cloud instances with NVIDIA H100 NVL GPUs. These virtual machines combine two PCIe-based H100 GPUs connected via NVIDIA NVLink, delivering nearly 4 petaflops of AI compute and 188GB of HBM3 memory.
This setup is a game-changer for mid-range AI workloads, offering up to 12x higher performance on models such as GPT-3 175B. Microsoft also plans to add the NVIDIA H200 Tensor Core GPU to its Azure fleet next year, supporting larger model inferencing with similar latency thanks to the enhanced memory capacity and bandwidth of the latest-generation HBM3e memory.
Microsoft is expanding its NVIDIA-powered services with the introduction of NCC H100 v5 VMs. These confidential virtual machines leverage NVIDIA H100 Tensor Core GPUs to ensure the confidentiality and integrity of data and applications while in use, in memory.
These GPU-enhanced confidential VMs will enter private preview soon, providing Azure customers with unparalleled acceleration while maintaining data security.
NVIDIA has introduced an AI foundry service to supercharge the development and tuning of custom generative AI applications for enterprises and startups deploying on Microsoft Azure.
The foundry service pulls together three elements — a collection of NVIDIA AI Foundation Models, the NVIDIA NeMo™ framework and tools, and NVIDIA DGX™ Cloud AI supercomputing services. This gives enterprises an end-to-end solution for creating custom generative AI models.
Businesses can then deploy their customised models with NVIDIA AI Enterprise software to power generative AI applications, including intelligent search, summarisation and content generation.
NVIDIA has also partnered with Amdocs, a key player in communications and media services, which will leverage the AI foundry service to optimise enterprise-grade LLMs for the telco and media industries. This collaboration builds on the existing Amdocs-Microsoft partnership.
Microsoft and NVIDIA are democratising access to AI Foundation Models, allowing developers to experience them through a user-friendly interface or API directly from a browser. These models, including popular ones like Llama 2, Stable Diffusion XL, and Mistral, can be customised with proprietary data.
Optimised with NVIDIA TensorRT-LLM, these models deliver high throughput and low latency and run seamlessly on any NVIDIA GPU-accelerated stack. The foundation models are accessible through the NVIDIA NGC catalogue, Hugging Face, and the Microsoft Azure AI model catalogue.
NVIDIA also launched two new simulation engines on Omniverse Cloud hosted on Microsoft Azure: the virtual factory simulation engine and the autonomous vehicle (AV) simulation engine.
As automotive companies transition to AI-enhanced digital systems, these simulation engines aim to save costs and reduce lead times. Omniverse Cloud serves as a platform-as-a-service, unifying core product and business processes for automakers.
An upcoming update to TensorRT-LLM, NVIDIA’s open-source software for accelerating AI inference, will add support for new large language models. The update makes demanding AI workloads more accessible on desktops and laptops with RTX GPUs starting at 8GB of VRAM.
TensorRT-LLM for Windows will soon be compatible with OpenAI’s Chat API, letting developers run projects locally on a PC with RTX. The upcoming release of TensorRT-LLM v0.6.0 promises improved inference performance, up to 5x faster, and support for additional popular LLMs, including Mistral 7B and Nemotron-3 8B.
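For context, a local server that speaks OpenAI’s Chat API accepts the same JSON request shape as the hosted service, so existing projects can be pointed at a PC with RTX simply by swapping the base URL. The sketch below builds such a request using only the standard library; the localhost URL and the model name are illustrative assumptions, not values documented in the announcement.

```python
import json

# Hypothetical base URL for a local TensorRT-LLM backend exposing an
# OpenAI-compatible Chat API on a PC with RTX (assumed, not documented).
LOCAL_BASE_URL = "http://localhost:8000/v1"

# Request body in OpenAI's Chat Completions format. A Chat-API-compatible
# local server accepts the same fields as api.openai.com.
payload = {
    "model": "mistral-7b",  # assumed local model identifier
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise TensorRT-LLM in one sentence."},
    ],
    "temperature": 0.7,
}

body = json.dumps(payload)
endpoint = LOCAL_BASE_URL + "/chat/completions"
print(endpoint)
```

An application using an OpenAI client library would make the same switch by overriding the client’s base URL, leaving the rest of the code untouched.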