GPUs play a crucial role in delivering the computational power needed to deploy AI models, especially large-scale pretrained models. Because high-performance GPU inference solutions are platform-specific, AI practitioners currently have little choice in selecting among them, and the complex runtime dependencies involved make the code behind these solutions difficult to maintain.
To address these industry challenges, Meta AI has developed AITemplate (AIT), a unified open-source system with separate acceleration back ends for both AMD and NVIDIA GPU hardware.
With the help of AITemplate, it is now possible to run performant inference on hardware from both GPU vendors. AITemplate is a Python framework that converts AI models into high-performance C++ GPU template code for faster inference.
As mentioned in the company’s blog post, researchers at Meta AI used AITemplate to improve performance by up to 12x on NVIDIA GPUs and 4x on AMD GPUs compared with eager mode in PyTorch. The AITemplate system consists of a front-end layer that performs various graph transformations and a back-end layer that produces C++ kernel templates for the GPU target. The company stated that the vision behind the framework is to deliver high speed while maintaining simplicity.
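To make the two-layer design concrete, here is a minimal, self-contained sketch of the idea: a front-end pass that fuses graph operators, and a back-end pass that instantiates a C++ kernel template for the fused op. All names here (`GraphNode`, `fuse_elementwise`, `emit_cpp_template`, the template strings) are hypothetical illustrations, not AITemplate's actual API.

```python
# Toy sketch of a graph-transform front end and a C++-template back end.
# Names and template strings are illustrative only, not AITemplate's API.
from dataclasses import dataclass, field


@dataclass
class GraphNode:
    op: str                                  # e.g. "gemm", "relu", "gemm_relu"
    inputs: list = field(default_factory=list)


def fuse_elementwise(nodes):
    """Front-end pass: fuse a gemm immediately followed by relu into one op."""
    fused, skip = [], False
    for i, n in enumerate(nodes):
        if skip:
            skip = False
            continue
        if n.op == "gemm" and i + 1 < len(nodes) and nodes[i + 1].op == "relu":
            fused.append(GraphNode("gemm_relu", n.inputs))
            skip = True
        else:
            fused.append(n)
    return fused


# Back end: each (fused) op maps to a parameterized C++ kernel template.
CPP_TEMPLATES = {
    "gemm_relu": "gemm_relu_kernel<{dtype}, {tile_m}, {tile_n}>",
}


def emit_cpp_template(node, dtype="half", tile_m=128, tile_n=128):
    """Back-end pass: instantiate the C++ template string for a graph node."""
    return CPP_TEMPLATES[node.op].format(dtype=dtype, tile_m=tile_m, tile_n=tile_n)


graph = [GraphNode("gemm", ["x", "w"]), GraphNode("relu")]
optimized = fuse_elementwise(graph)          # one fused "gemm_relu" node
kernel = emit_cpp_template(optimized[0])
print(kernel)                                # gemm_relu_kernel<half, 128, 128>
```

Operator fusion of this kind is one of the graph transformations AITemplate's front end performs before its back end generates the actual GPU kernel code.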
Moreover, it delivers close to hardware-native Tensor Core (NVIDIA GPU) and Matrix Core (AMD GPU) performance on widely used AI models such as transformers, convolutional neural networks, and diffusers. At present, AITemplate is enabled on NVIDIA A100 and AMD MI200 GPU systems, both of which are widely used in the data centers of research facilities, technology companies, and cloud computing service providers, among others.
Source: AITemplate optimizations, Meta AI
The blog reads, “AITemplate offers state-of-the-art performance for current and next-gen NVIDIA and AMD GPUs with less system complexity. However, we are only at the beginning of our journey to build a high-performance AI inference engine. We also plan to extend AITemplate to additional hardware systems, such as Apple M-series GPUs, as well as CPUs from other technology providers.”