
PyTorch Releases Version 2.3 with Focus on Large Language Models and Sparse Inference

The new release adds features targeting large language models and sparse inference to meet the needs of the AI and machine learning community.



PyTorch has announced the release of version 2.3, introducing several new features and improvements aimed at the performance and usability of large language models and sparse inference.

The release, which consists of 3,393 commits from 426 contributors, brings support for user-defined Triton kernels in torch.compile. This feature allows users to migrate their existing Triton kernels without experiencing performance regressions or graph breaks.

The feature also allows TorchInductor to precompile user-defined Triton kernels and organise the surrounding code more efficiently.
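To illustrate, here is a minimal sketch of what this looks like in practice, assuming a CUDA device with Triton installed; the vector-add kernel, its block size, and the tensor shapes are hypothetical, chosen for brevity:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

@torch.compile(fullgraph=True)  # the user-defined kernel no longer forces a graph break
def compiled_add(x, y):
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
assert torch.allclose(compiled_add(x, y), x + y)
```

Before this release, calling a hand-written Triton kernel inside a compiled region would break the graph and fall back to eager execution; with 2.3 the whole function can be captured and compiled end to end.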

Another feature is Tensor Parallelism for efficient training of large language models. It facilitates various tensor manipulations across GPUs and hosts, and it integrates with FSDP (Fully Sharded Data Parallel) for efficient 2D parallelism.

The PyTorch team has validated Tensor Parallelism on training runs for models with over 100 billion parameters, demonstrating its effectiveness in handling large-scale language models.
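As a rough sketch of how the API fits together (the two-layer MLP, its dimensions, and the 8-GPU mesh below are hypothetical; the script assumes a launch via torchrun across 8 GPUs), Tensor Parallelism shards a module's linear layers across a device mesh:

```python
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)

# Hypothetical two-layer MLP standing in for a transformer feed-forward block.
class MLP(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.up = nn.Linear(dim, 4 * dim)
        self.down = nn.Linear(4 * dim, dim)

    def forward(self, x):
        return self.down(torch.relu(self.up(x)))

# One mesh dimension of 8 GPUs; run with `torchrun --nproc_per_node=8 ...`.
mesh = init_device_mesh("cuda", (8,))
model = MLP(1024).cuda()

# Shard `up` column-wise and `down` row-wise, so the two matmuls
# need only a single communication step between them.
model = parallelize_module(
    model,
    mesh,
    {"up": ColwiseParallel(), "down": RowwiseParallel()},
)
```

For 2D parallelism, the same mesh can be given a second dimension and the tensor-parallel module wrapped in FSDP, sharding parameters across hosts while splitting computation within each host.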

PyTorch 2.3 introduces support for semi-structured sparsity, specifically the 2:4 pattern, implemented as a tensor subclass. The feature delivers up to 1.6 times faster processing than dense matrix multiplication, supports mixing different data types during quantization, uses improved versions of the cuSPARSELt and CUTLASS kernels, and is compatible with torch.compile for more efficient computation.
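A minimal sketch of the tensor-subclass workflow, assuming an NVIDIA GPU with sparse tensor core support (the layer size and the mask used to impose the 2:4 pattern are illustrative):

```python
import torch
from torch.sparse import to_sparse_semi_structured

# Impose the 2:4 pattern: two zeros in every group of four elements.
mask = torch.tensor([0, 0, 1, 1], dtype=torch.bool, device="cuda").tile(256, 64)
linear = torch.nn.Linear(256, 256).half().cuda()

# Swapping the dense weight for the sparse subclass leaves the module's
# interface unchanged; only the storage and the matmul kernel differ.
linear.weight = torch.nn.Parameter(
    to_sparse_semi_structured(linear.weight.detach() * mask)
)

x = torch.rand(64, 256, device="cuda").half()
out = linear(x)  # dispatches to cuSPARSELt / CUTLASS sparse kernels
```

Because the sparse weight is a drop-in tensor subclass, existing models can adopt 2:4 sparsity by replacing weights in place rather than rewriting layers.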

The previous release, PyTorch 2.2, brought advancements such as the integration of FlashAttention-v2 and the introduction of AOTInductor; PyTorch 2.3 builds on these improvements with new features specifically targeted at large language models and sparse inference.

With significant contributions from a large and active community, this version brings features like user-defined Triton kernels and Tensor Parallelism that collectively improve performance, scalability, and flexibility.


K L Krithika

K L Krithika is a tech journalist at AIM. Apart from writing tech news, she enjoys reading sci-fi and pondering impossible technologies, trying not to confuse them with reality.