PyTorch Releases ExecuTorch Alpha for Deploying LLMs on Edge Devices

The cutting-edge tool is designed to deploy LLMs on devices like smartphones and smart glasses.


PyTorch yesterday announced the release of ExecuTorch alpha, a new tool focused on deploying large language models (LLMs) and other large ML models to edge devices. The release, which comes just a few months after the 0.1 preview built in collaboration with partners at Arm, Apple, and Qualcomm Technologies, Inc., aims to stabilise the API surface and improve the installation process.

ExecuTorch alpha brings several key features that allow running LLMs efficiently on mobile devices, which are highly constrained in compute, memory, and power. It supports 4-bit post-training quantisation using GPTQ and provides broad CPU device support through dynamic shape support and new dtypes in XNNPACK.
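For readers new to the stack, the typical workflow lowers a PyTorch model into a portable .pte program that the on-device ExecuTorch runtime can load, optionally delegating supported operators to the XNNPACK CPU backend. The sketch below is illustrative only: module paths such as executorch.exir.to_edge and XnnpackPartitioner follow the alpha-era API and may differ in later releases, and the TinyModel is a placeholder.

```python
import torch
from torch.export import export

# Assumed alpha-era ExecuTorch import paths; they may differ in newer releases.
from executorch.exir import to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner


class TinyModel(torch.nn.Module):
    """Stand-in model; any torch.export-able nn.Module follows the same flow."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))


model = TinyModel().eval()
example_inputs = (torch.randn(1, 16),)

# Capture the graph, lower it to the Edge dialect, delegate supported
# operators to the XNNPACK CPU backend, and serialise a .pte program
# that the on-device ExecuTorch runtime can load.
exported = export(model, example_inputs)
edge_program = to_edge(exported).to_backend(XnnpackPartitioner())
executorch_program = edge_program.to_executorch()

with open("tiny_model_xnnpack.pte", "wb") as f:
    f.write(executorch_program.buffer)
```

The resulting .pte file is what the C++ runtime loads and executes on the phone or headset itself.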

These improvements enable running models like Llama 2 7B, with early support for Llama 3 8B, on a range of edge devices, including the iPhone 15 Pro, iPhone 15 Pro Max, and Samsung Galaxy S22, S23, and S24 phones.

The release also expands the list of supported models across NLP, vision, and speech, with traditional models expected to function seamlessly out of the box. The ExecuTorch SDK has been enhanced with better debugging and profiling tools, allowing developers to map from operator nodes back to original Python source code for efficient anomaly resolution and performance tuning.
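The profiling flow is roughly: bundle debug metadata (an ETRecord) at export time, collect an ETDump from the on-device runtime, and then correlate the two with the SDK's Inspector. The snippet below is a rough sketch; the executorch.sdk import path and constructor arguments follow the alpha-era SDK documentation and may have changed, and the file names are placeholders.

```python
# Assumed alpha-era SDK import path; the module may have moved in later releases.
from executorch.sdk import Inspector

# Correlate runtime profiling data (an ETDump collected on device) with the
# export-time debug metadata (an ETRecord) so operator-level timings can be
# traced back to the originating Python source. File names are placeholders.
inspector = Inspector(
    etdump_path="etdump.etdp",
    etrecord="etrecord.bin",
)

# Print a per-operator table of latencies and source-level debug information.
inspector.print_data_tabular()
```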

PyTorch’s collaborations with partners such as Arm, Apple, Qualcomm Technologies, Google, and MediaTek have been crucial in bringing ExecuTorch to fruition. The framework has already seen production usage, with Meta using it for hand tracking on Meta Quest 3, various models on Ray-Ban Meta Smart Glasses, and integration with Instagram and other Meta products.

PyTorch also recently released version 2.3, which introduces several features and improvements for the performance and usability of large language models and sparse inference. The release enables tensor manipulations across GPUs and hosts, integrating with FSDP (Fully Sharded Data Parallel) for efficient 2D parallelism.
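As a rough illustration of that 2D parallelism workflow, the sketch below combines the tensor-parallel API with FSDP over a 2D device mesh. The mesh dimension names, GPU counts, and the toy FeedForward module are assumptions made for the example, not code from the release notes.

```python
# Run under torchrun, e.g.: torchrun --nproc_per_node=8 tp_fsdp_sketch.py
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)


class FeedForward(nn.Module):
    """Toy MLP block standing in for a transformer feed-forward layer."""

    def __init__(self, dim=1024, hidden=4096):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden)
        self.w2 = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.w2(torch.relu(self.w1(x)))


# 2D device mesh over 8 GPUs: the outer "dp" dimension is used for FSDP,
# the inner "tp" dimension for tensor parallelism.
mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))

model = FeedForward().cuda()

# Shard the two linear layers column- and row-wise across the "tp" sub-mesh.
model = parallelize_module(
    model,
    mesh_2d["tp"],
    {"w1": ColwiseParallel(), "w2": RowwiseParallel()},
)

# Wrap the tensor-parallel model with FSDP over the "dp" sub-mesh,
# giving data parallelism across groups of tensor-parallel GPUs.
model = FSDP(model, device_mesh=mesh_2d["dp"], use_orig_params=True)
```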

K L Krithika

K L Krithika is a tech journalist at AIM. Apart from writing tech news, she enjoys reading sci-fi and pondering impossible technologies, trying not to confuse them with reality.