PyTorch torchao is 🔥

Torchao’s quantization algorithms, applicable to popular models like Llama 3 and diffusion models, have demonstrated up to 97% speedup in inference.
Image by Nalini Nirad
Last week, PyTorch introduced torchao (Architecture Optimisation), a native library designed to make model training and inference faster and leaner by leveraging low-bit data types, quantization, and sparsity. According to the PyTorch team, torchao’s quantization algorithms, applicable to popular models like Llama 3 and diffusion models, have demonstrated up to 97% speedup in inference and a 73% reduction in peak VRAM while maintaining high accuracy.

“Quantizing weights to int4 and the KV cache to int8 supports Llama 3.1 8B at full 128K context length, running in under 18.9GB of VRAM,” the team noted. “If you’re interested in making your models faster and smaller for training or inference, we hope you’ll find torchao useful and easy to integrate,” said the PyTorch team.
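
For readers who want to try this, the weight-only int4 path is applied as a single in-place transform on an existing model. The snippet below is a minimal sketch, assuming a CUDA GPU, a recent torchao install with the quantize_ / int4_weight_only API, and bfloat16 weights; the toy two-layer network is only a stand-in for a real model such as Llama 3.

# Minimal sketch of torchao's int4 weight-only quantization flow. Assumes a CUDA GPU,
# a recent torchao install, and bfloat16 weights; the toy model is only a stand-in.
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int4_weight_only

# Toy stand-in for a transformer feed-forward block (in practice, e.g. Llama 3).
model = nn.Sequential(
    nn.Linear(4096, 11008),
    nn.SiLU(),
    nn.Linear(11008, 4096),
).to(device="cuda", dtype=torch.bfloat16)

# Swap each nn.Linear's weights for int4 weight-only quantized tensors, in place.
quantize_(model, int4_weight_only())

# Inference runs as usual on the now-quantized model.
x = torch.randn(1, 4096, device="cuda", dtype=torch.bfloat16)
with torch.inference_mode():
    out = model(x)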

Mohit Pandey
Mohit writes about AI in simple, explainable, and often funny words. He's especially passionate about chatting with those building AI for Bharat, with the occasional detour into AGI.