Meet Ferret-UI, Apple’s AI-Powered Answer to Mobile UI Challenges

The core focus of Ferret-UI lies in its multimodal capabilities, combining language understanding with visual comprehension tailored specifically for mobile UI screens, incorporating referring, grounding, and reasoning capabilities.

Ahead of its flagship event, WWDC 2024, in June, Apple is going all in on bringing generative AI to its products. Enter Ferret-UI, a multimodal LLM tailored specifically for the nuanced demands of mobile user interface comprehension and interaction.

In a paper titled “Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs”, the authors present Ferret-UI as a solution to the limitations of existing LLMs in handling UI screens.

While general-purpose LLMs like GPT-3 have garnered attention for their versatility, they often struggle to understand and effectively interact with UI screens, especially in the mobile domain. The core focus of Ferret-UI lies in its multimodal capabilities, combining advanced language understanding with visual comprehension tailored specifically for mobile UI screens, incorporating referring, grounding, and reasoning capabilities. 

Under the Hood

One of the key challenges in adapting LLMs to UI screens is the unique characteristics of these screens compared to natural images. UI screens often have elongated aspect ratios and contain smaller objects of interest, such as icons and texts, which are not typically encountered in natural images. To address this challenge, Ferret-UI integrates a mechanism called “any resolution,” allowing it to handle screens of varying aspect ratios and magnifying details for enhanced visual feature extraction. By encoding each sub-image separately before feeding them to the LLM, Ferret-UI ensures that no critical visual information is lost during processing.
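
The splitting step can be pictured with a short sketch. This is an illustration only, not Apple's code: the function name `split_screen` is hypothetical, and the rule it uses (cut a portrait screen into top and bottom halves, a landscape screen into left and right halves) is an assumption about how sub-images might be chosen from the aspect ratio. It computes crop boxes only; the image encoder itself is out of scope.

```python
from typing import List, Tuple

# A crop box as (left, top, right, bottom), in pixels.
Box = Tuple[int, int, int, int]


def split_screen(width: int, height: int) -> List[Box]:
    """Return two sub-image crop boxes for a screenshot.

    Assumed rule (hypothetical, for illustration): portrait screens are
    cut into top/bottom halves, landscape screens into left/right halves,
    so each half is closer to square and small details appear larger
    once the half is resized for the encoder.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    if height >= width:
        # Portrait (or square): split along the long vertical axis.
        mid = height // 2
        return [(0, 0, width, mid), (0, mid, width, height)]

    # Landscape: split along the long horizontal axis.
    mid = width // 2
    return [(0, 0, mid, height), (mid, 0, width, height)]


if __name__ == "__main__":
    # Each box would be cropped and encoded separately.
    for box in split_screen(1170, 2532):
        print(box)
```

Each half keeps the full resolution of its part of the screen, which is the point of the approach: icons and small text are not shrunk as much as they would be if the whole elongated screenshot were resized in one go.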

Moreover, Ferret-UI employs a new approach to data curation, gathering training samples from a wide range of elementary UI tasks. These tasks include icon recognition, finding text, and widget listing, among others. By training on such diverse tasks, Ferret-UI learns to understand UI elements’ semantics and spatial positioning, enabling it to make distinctions at both broad and detailed levels.
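
To make the idea of referring and grounding samples concrete, here is a minimal sketch of how one annotated UI element could yield both kinds of training example. Everything in it is hypothetical: the `UIElement` type, the prompt wording, and the sample layout are illustrative assumptions and are not taken from the paper or its dataset.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A bounding box as (left, top, right, bottom), in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class UIElement:
    """One annotated element on a screen (hypothetical schema)."""
    kind: str   # e.g. "icon", "text", "widget"
    label: str  # what the element is or says
    box: Box    # where it sits on the screen


def referring_sample(el: UIElement) -> Dict[str, str]:
    # Referring: the question supplies a region, the answer names its content.
    return {
        "question": f"What is the {el.kind} in region {list(el.box)}?",
        "answer": el.label,
    }


def grounding_sample(el: UIElement) -> Dict[str, str]:
    # Grounding: the question names the content, the answer supplies the region.
    return {
        "question": f'Where is the {el.kind} "{el.label}" on the screen?',
        "answer": str(list(el.box)),
    }


def build_samples(elements: List[UIElement]) -> List[Dict[str, str]]:
    """Produce one referring and one grounding sample per element."""
    samples: List[Dict[str, str]] = []
    for el in elements:
        samples.append(referring_sample(el))
        samples.append(grounding_sample(el))
    return samples


if __name__ == "__main__":
    screen = [UIElement("icon", "settings", (10, 20, 50, 60))]
    for sample in build_samples(screen):
        print(sample)
```

The two functions are mirror images of each other, which is why training on both pushes a model to learn what an element is and where it is at the same time.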

In addition to elementary tasks, Ferret-UI is also trained on advanced tasks, such as detailed description, perception and interaction conversations, and function inference. These tasks prepare the model to engage in intricate discussions about visual components, formulate action plans based on specific goals, and interpret the overall purpose of a UI screen.

To evaluate the effectiveness of Ferret-UI, the authors establish a comprehensive benchmark encompassing various UI tasks. According to the paper, Ferret-UI outperforms most open-source UI MLLMs and also surpasses GPT-4V on all the elementary UI tasks.

If Apple integrates Ferret-UI into Siri, it could be a game-changing experience for Apple users. Such an integration could improve accessibility features, enable seamless app integration, offer personalised assistance, facilitate natural language UI navigation, and work more closely with voice assistive technologies, benefiting users with special needs and improving the overall user experience on iOS devices.

This update comes soon after Apple released the MM1 model last month and ReALM (Reference Resolution As Language Modeling) two weeks ago. The company has also forged a $50M licensing deal with Shutterstock to acquire AI training data.

Shritama Saha

Shritama (she/her) is a technology journalist at AIM who is passionate about exploring the influence of AI on different domains, including fashion, healthcare and banking.