PyTorch torchao is 🔥

Torchao’s quantization algorithms, applicable to popular models like Llama 3 and diffusion models, have demonstrated up to 97% speedup in inference.
Image by Nalini Nirad
Last week, PyTorch introduced torchao (Architecture Optimisation), a native library designed to make model training and inference faster and leaner by leveraging low-bit data types, quantization, and sparsity. According to the PyTorch team, torchao’s quantization algorithms, applicable to popular models like Llama 3 and diffusion models, have demonstrated up to 97% speedup in inference and a 73% reduction in peak VRAM while maintaining high accuracy.

“Quantizing weights to int4 and the KV cache to int8 supports Llama 3.1 8B at full 128K context length, running in under 18.9GB of VRAM,” the team noted. “If you’re interested in making your models faster and smaller for training or inference, we hope you’ll find torchao useful and easy to integrate,” said the PyTorch team.
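
For readers who want to try this, the weight-only int4 path is applied as a single in-place transform on an existing model. The snippet below is a minimal sketch, assuming a CUDA GPU, a recent torchao install with the quantize_ / int4_weight_only API, and bfloat16 weights; the toy two-layer network is only a stand-in for a real model such as Llama 3.

# Minimal sketch of torchao's int4 weight-only quantization flow. Assumes a CUDA GPU,
# a recent torchao install, and bfloat16 weights; the toy model is only a stand-in.
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int4_weight_only

# Toy stand-in for a transformer feed-forward block (in practice, e.g. Llama 3).
model = nn.Sequential(
    nn.Linear(4096, 11008),
    nn.SiLU(),
    nn.Linear(11008, 4096),
).to(device="cuda", dtype=torch.bfloat16)

# Swap each nn.Linear's weights for int4 weight-only quantized tensors, in place.
quantize_(model, int4_weight_only())

# Inference runs as usual on the now-quantized model.
x = torch.randn(1, 4096, device="cuda", dtype=torch.bfloat16)
with torch.inference_mode():
    out = model(x)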

Mohit Pandey
Mohit writes about AI in simple, explainable, and often funny words. He's especially passionate about chatting with those building AI for Bharat, with the occasional detour into AGI.