How Facebook’s Latest Image Description Tool For The Visually Impaired Really Works

Alt text is the piece of text that appears on the screen when an image fails to load. When an image is tagged with proper alt text, a screen reader can read the description out in a synthetic voice, allowing visually impaired people to understand the image content. Since alt text descriptions have to be added manually, many pictures carry none, which greatly restricts the experience. To bridge this gap, Facebook introduced automatic alternative text (AAT) technology in 2016. As the name suggests, AAT generates descriptions of photos on demand.

The researchers at Facebook have now brought in three significant changes to AAT:

  • A ten-fold increase in the number of concepts AAT can detect and describe.
  • More detailed image descriptions, with information on activities, landmarks, types of animals, and so on.
  • Information about the position and relative size of objects in a photo, an industry first.

Automatic Alternative Text

AAT is rooted in Facebook’s object recognition technology, which uses a neural network with billions of parameters trained on a large number of examples.

In 2018, AAT bagged the Helen Keller Achievement Award (under the aegis of the American Foundation for the Blind) for its ‘exemplary role in creating accessible products or improving the accessibility of their already-popular products’.

The first version of AAT was developed using human-labelled data, which was used to train a deep convolutional neural network. This initial version could recognise about 100 basic concepts, such as trees, mountains, and people’s identity (using a facial recognition model).

However, this version was not scalable, necessitating a move away from the fully supervised learning model that relied on human-labelled data.

Latest Enhancements

In an earlier blog post, Facebook explained how hashtags could be an ideal source of training data for making images more accessible. However, hashtags present two main challenges: they sometimes reference nonvisual concepts (for example, #tbt), and they can be very vague. Both shortcomings could confuse deep learning models. To overcome this, the team developed new approaches, including assessing the multiple labels used per image, sorting through hashtag synonyms, and balancing frequently used hashtags against rare ones. The team also trained a large-scale hashtag prediction model for image recognition.
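The hashtag clean-up steps described above can be sketched roughly as follows. This is only an illustration and not Facebook's actual pipeline: the synonym table, the list of nonvisual tags, and the inverse square-root weighting used to balance frequent and rare hashtags are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical lookup tables, invented for this example.
NONVISUAL_TAGS = {"tbt", "love", "mood"}
SYNONYMS = {"doggo": "dog", "puppy": "dog", "kitty": "cat"}


def clean_hashtags(hashtags):
    """Lowercase tags, drop nonvisual ones, and merge synonyms into one label."""
    labels = []
    for tag in hashtags:
        tag = tag.lstrip("#").lower()
        if tag in NONVISUAL_TAGS:
            continue
        label = SYNONYMS.get(tag, tag)
        if label not in labels:  # an image may keep several distinct labels
            labels.append(label)
    return labels


def sampling_weights(all_labels):
    """Weight each label by 1/sqrt(frequency) so rare labels are not drowned out."""
    counts = Counter(label for labels in all_labels for label in labels)
    return {label: 1.0 / math.sqrt(n) for label, n in counts.items()}


posts = [
    ["#TBT", "#doggo", "#beach"],
    ["#puppy", "#dog"],
    ["#dog", "#love"],
    ["#kitty"],
]
cleaned = [clean_hashtags(p) for p in posts]
weights = sampling_weights(cleaned)
```

In this toy data, "dog" appears three times and so receives a lower sampling weight than "beach" or "cat", which appear once each.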

Leveraging this method of image recognition through hashtags, Facebook’s team has adopted for AAT a model trained on weakly supervised data in the form of public Instagram images and their hashtags. The model is fine-tuned using data obtained from different geographies so that it works better for everyone. Apart from this, concepts of gender, skin colour, and age were also applied to build a more accurate and inclusive model.

The new enhancement to AAT also applies transfer learning, a method of repurposing machine learning models for training on new tasks. This enabled the creation of a model that could identify more distinctive concepts such as monuments, type of food, and selfies.

To provide more detail, such as the position and count of objects, the team also trained Faster R-CNN, a two-stage object detector, using Detectron2, an open-source platform for object detection and segmentation.
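An object detector of this kind typically returns a list of labelled boxes, each with a confidence score. The sketch below shows one way such output could be turned into object counts. The dictionary layout and the 0.5 score cut-off are assumptions for illustration; they do not reproduce Detectron2's actual output structure.

```python
from collections import Counter


def count_objects(detections, min_score=0.5):
    """Count detected objects per label, ignoring low-confidence detections."""
    return Counter(d["label"] for d in detections if d["score"] >= min_score)


# Hypothetical detector output: label, confidence, and (x_min, y_min, x_max, y_max).
detections = [
    {"label": "person", "score": 0.98, "box": (10, 20, 200, 400)},
    {"label": "person", "score": 0.91, "box": (220, 30, 380, 410)},
    {"label": "dog", "score": 0.75, "box": (400, 300, 520, 420)},
    {"label": "dog", "score": 0.30, "box": (600, 310, 640, 350)},
]

counts = count_objects(detections)
summary = ", ".join(f"{label} x{n}" for label, n in counts.most_common())
```

The second "dog" detection falls below the cut-off and is left out of the count.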

Greater Detail & Higher Accuracy

The improved AAT can recognise over 1,200 concepts. To maintain a high level of accuracy, Facebook has included only those concepts whose models are well trained and meet a high precision threshold.
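As a rough illustration of filtering concepts by precision, the sketch below measures each concept's precision on a small validation set and keeps only the concepts that clear a cut-off. The data layout and the 0.8 threshold are assumptions; the article does not state the threshold Facebook uses.

```python
def precision(predictions, concept):
    """Share of images tagged with `concept` by the model that truly contain it."""
    predicted = [p for p in predictions if concept in p["predicted"]]
    if not predicted:
        return 0.0
    true_positives = sum(1 for p in predicted if concept in p["actual"])
    return true_positives / len(predicted)


def select_concepts(predictions, concepts, threshold=0.8):
    """Keep only the concepts whose validation precision meets the threshold."""
    return [c for c in concepts if precision(predictions, c) >= threshold]


# Hypothetical validation results: model output versus ground truth per image.
validation = [
    {"predicted": {"dog", "tree"}, "actual": {"dog", "tree"}},
    {"predicted": {"dog", "selfie"}, "actual": {"dog"}},
    {"predicted": {"tree"}, "actual": {"tree"}},
    {"predicted": {"selfie"}, "actual": {"selfie"}},
]

kept = select_concepts(validation, ["dog", "tree", "selfie"])
```

Here "selfie" is predicted twice but is correct only once, so its precision of 0.5 falls below the cut-off and the concept is dropped.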

Images will now come with a ‘Detail Image Description’ panel that provides comprehensive descriptions of their content, including positional information (top/bottom/left/right) and the relative prominence of objects (primary, secondary or minor).
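One plausible way to derive such positional and prominence labels from a bounding box is sketched below. The three-by-three grid and the area cut-offs of 25% and 5% are assumptions chosen for the example; the article does not say how Facebook defines these categories.

```python
def position(box, image_width, image_height):
    """Map a box centre to a coarse position such as 'top left' or 'center'."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2 / image_width
    cy = (y_min + y_max) / 2 / image_height
    vertical = "top" if cy < 1 / 3 else "bottom" if cy > 2 / 3 else ""
    horizontal = "left" if cx < 1 / 3 else "right" if cx > 2 / 3 else ""
    return " ".join(part for part in (vertical, horizontal) if part) or "center"


def prominence(box, image_width, image_height):
    """Classify an object by the share of the image area its box covers."""
    x_min, y_min, x_max, y_max = box
    share = (x_max - x_min) * (y_max - y_min) / (image_width * image_height)
    if share >= 0.25:
        return "primary"
    if share >= 0.05:
        return "secondary"
    return "minor"


# Boxes are (x_min, y_min, x_max, y_max) in pixels for a 1000 x 800 image.
WIDTH, HEIGHT = 1000, 800
large = (0, 0, 500, 500)
medium = (400, 300, 700, 500)
small = (800, 600, 900, 700)

descriptions = [
    (position(b, WIDTH, HEIGHT), prominence(b, WIDTH, HEIGHT))
    for b in (large, medium, small)
]
```

With these assumed cut-offs, the large box is reported as a primary object in the top left, the medium box as a secondary object in the centre, and the small box as a minor object in the bottom right.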

Shraddha Goled

I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.