Apple Introduces AIM, New Autoregressive Pre-Trained Vision Models

Apple recently unveiled autoregressive image models (AIM), a collection of vision models pre-trained with an autoregressive objective.

Apple recently unveiled autoregressive image models (AIM), a collection of vision models pre-trained with an autoregressive objective. Inspired by their textual counterparts, large language models (LLMs), these models represent a new frontier for training large-scale vision models and exhibit similar scaling properties.

The researchers say that AIM offers a scalable method for pre-training vision models without supervision. They use a generative autoregressive objective during pre-training and propose technical improvements to adapt it for downstream transfer.
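For a concrete picture of what an autoregressive objective means for images, here is a minimal PyTorch sketch in the spirit of the approach: the image is split into a sequence of patches, a causally masked transformer processes them, and each position regresses the pixels of the next patch. All names here (patchify, PatchAR) are illustrative assumptions, not Apple's released code.

```python
# Illustrative sketch of autoregressive image pre-training (not Apple's code).
import torch
import torch.nn as nn

def patchify(images: torch.Tensor, patch: int = 16) -> torch.Tensor:
    """Split (B, C, H, W) images into a (B, N, C*patch*patch) patch sequence."""
    b, c, h, w = images.shape
    x = images.unfold(2, patch, patch).unfold(3, patch, patch)  # B,C,H/p,W/p,p,p
    return x.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * patch * patch)

class PatchAR(nn.Module):
    """Causally masked transformer that predicts the next image patch.
    A real implementation would also add positional embeddings."""
    def __init__(self, patch_dim: int, dim: int = 512, depth: int = 6, heads: int = 8):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, patch_dim)   # regress raw pixels of the next patch

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        n = patches.size(1)
        causal = nn.Transformer.generate_square_subsequent_mask(n)  # no peeking ahead
        return self.head(self.trunk(self.embed(patches), mask=causal))

images = torch.randn(4, 3, 224, 224)
patches = patchify(images)                        # (4, 196, 768)
model = PatchAR(patch_dim=patches.size(-1))
pred = model(patches[:, :-1])                     # predict patch t+1 from patches <= t
loss = nn.functional.mse_loss(pred, patches[:, 1:])  # pixel-regression loss
```

This mirrors the LLM recipe (next-token prediction under a causal mask), with image patches standing in for tokens and a pixel-regression loss standing in for the softmax over a vocabulary.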

Check out the GitHub repository for the code.

The researchers said that the performance of the visual features scales with both model capacity and the quantity of data. Further, the value of the objective function correlates with the model's performance on downstream tasks.

The team has also illustrated the practical implications of these findings by pre-training a 7-billion-parameter AIM on 2 billion images; it achieves 84.0% accuracy on ImageNet-1k with a frozen trunk.
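The "frozen trunk" protocol can be illustrated with a short, hedged sketch that continues the hypothetical PatchAR example above: the pre-trained backbone's weights stay fixed, and only a lightweight classification head is trained on labeled data. (The paper's actual evaluation uses an attentive probe; the mean pooling below is a simplification.)

```python
# Illustrative frozen-trunk probing (continues the PatchAR sketch above).
import torch
import torch.nn as nn

trunk, embed = model.trunk, model.embed           # pre-trained backbone, kept frozen
for p in list(trunk.parameters()) + list(embed.parameters()):
    p.requires_grad = False

classifier = nn.Linear(512, 1000)                 # trainable head for ImageNet-1k
opt = torch.optim.AdamW(classifier.parameters(), lr=1e-3)

def probe_step(patches: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():                         # no gradients flow into the trunk
        feats = trunk(embed(patches)).mean(dim=1) # mean-pool patch features
    loss = nn.functional.cross_entropy(classifier(feats), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss
```

Because the backbone never updates, the reported 84.0% reflects the quality of the pre-trained features themselves rather than any fine-tuning of the trunk.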

Interestingly, even at this scale, they observed no sign of saturation in performance. Pre-training AIM is similar to pre-training LLMs and does not require any image-specific strategy to stabilize training at scale.

About AIM

Apple believes that AIM has desirable properties, including the ability to scale to 7 billion parameters using a vanilla transformer implementation without stability-inducing techniques or extensive hyperparameter adjustments. 

Moreover, AIM’s performance on the pre-training task has a strong correlation with downstream performance, outperforming state-of-the-art methods like MAE and narrowing the gap between generative and joint embedding pre-training approaches.

The researchers have also found no signs of saturation as models scale, suggesting that larger models trained on longer schedules could improve performance further.
