How Facebook’s Latest Image Description Tool For The Visually Impaired Really Works

Alt text is the short description attached to an image. Browsers display it when the image fails to load, and screen readers read it out in a synthetic voice. When images carry proper alt text, visually impaired users can understand their content. Because alt text has to be added manually, many pictures have none, which greatly restricts the experience. To bridge this gap, Facebook introduced automatic alternative text (AAT) technology in 2016. As the name suggests, AAT generates descriptions of photos on demand.

The researchers at Facebook have now brought in three significant changes to AAT:


  • A ten-fold increase in the number of concepts AAT can detect and describe.
  • More detailed image descriptions, with information on activities, landmarks, types of animals, etc.
  • Information about the position and relative size of objects in a photo, an industry first.

Automatic Alternative Text

AAT is rooted in Facebook’s object recognition technology, which uses a neural network with billions of parameters trained on millions of examples.

In 2018, AAT bagged the Helen Keller Achievement Award (under the aegis of the American Foundation for the Blind) for its ‘exemplary role in creating accessible products or improving the accessibility of their already-popular products’.

The first version of AAT was developed using human-labelled data, which was used to train a deep convolutional neural network. This initial version could recognise about 100 basic concepts such as trees and mountains, as well as people’s identity (using a facial recognition model).

However, this approach did not scale, since every new concept needed fresh human-labelled data. That necessitated a move away from the fully supervised learning model.

Latest Enhancements

In an earlier blog, Facebook explained how hashtags could be an ideal source of training data and help make images more accessible. However, hashtags present two main challenges: they sometimes reference nonvisual concepts (for example, #tbt), and they can be very vague. Both shortcomings could confuse deep learning models. To overcome this, the team developed new approaches, including assessing the multiple labels used per image, sorting through hashtag synonyms, and balancing frequently used hashtags against rare ones. The team also trained a large-scale hashtag prediction model for image recognition.
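The hashtag clean-up steps described above can be sketched roughly as follows. This is a minimal illustration, not Facebook's pipeline: the `NON_VISUAL` and `SYNONYMS` tables, the function names, and the inverse-square-root weighting used to balance frequent and rare hashtags are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical lookup tables, for illustration only.
NON_VISUAL = {"tbt", "love", "mood"}          # tags with no visual meaning
SYNONYMS = {"puppy": "dog", "doggo": "dog"}   # map synonyms to one label


def clean_hashtags(tags):
    """Lowercase, strip '#', drop non-visual tags, and merge synonyms."""
    labels = []
    for tag in tags:
        label = tag.lower().lstrip("#")
        if label in NON_VISUAL:
            continue
        label = SYNONYMS.get(label, label)
        if label not in labels:               # keep each label once per image
            labels.append(label)
    return labels


def sampling_weights(dataset):
    """Weight each label by 1/sqrt(frequency) so rare labels are not drowned out.

    The square-root rule is an assumed balancing heuristic, not a documented one.
    """
    counts = Counter(label for labels in dataset for label in labels)
    return {label: 1.0 / math.sqrt(n) for label, n in counts.items()}
```

An image tagged `#TBT #Puppy #dog #beach` would keep only the labels `dog` and `beach`, and a label seen four times would get half the sampling weight of a label seen once.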

Leveraging this method of image recognition through hashtags, Facebook’s team has adopted for AAT a model trained on weakly supervised data, namely public Instagram images and their hashtags. The model is fine-tuned on data drawn from different geographies so that it works better for everyone. Apart from this, gender, skin colour, and age were also taken into account to build a more accurate and inclusive model.

The new enhancement to AAT also applies transfer learning, a method of repurposing an already trained machine learning model as the starting point for a new task. This enabled the creation of models that could identify more distinctive concepts such as monuments, types of food, and selfies.
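As a toy illustration of the transfer learning idea (not the model Facebook used), the sketch below keeps a "pretrained" feature extractor frozen and trains only a small new classification head on top of it. The weights, data, and function names are all hypothetical.

```python
import math

# Hypothetical "pretrained" backbone weights; they are never updated below.
FROZEN_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def backbone(x):
    """Frozen feature extractor: a fixed linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in FROZEN_WEIGHTS]


def train_head(samples, labels, epochs=200, lr=0.5):
    """Train only a new logistic-regression head on the frozen features."""
    feats = [backbone(x) for x in samples]
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))    # predicted probability
            g = p - y                         # gradient of the log loss w.r.t. z
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b


def predict(x, w, b):
    """Classify a new input with the frozen backbone plus the trained head."""
    z = sum(wi * fi for wi, fi in zip(w, backbone(x))) + b
    return 1 if z > 0 else 0


# Toy "new task": label 1 when the first input value dominates.
samples = [(2.0, 0.5), (3.0, 1.0), (0.5, 2.0), (1.0, 3.0)]
labels = [1, 1, 0, 0]
head_w, head_b = train_head(samples, labels)
```

Only the head's few parameters are learned here, which is why a new concept can be added with far less data than training the whole network from scratch would need.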

To obtain richer details such as the position and count of objects, the team also trained a Faster R-CNN, a two-stage object detector, using Detectron2, an open-source platform for object detection and segmentation.

Greater Detail & Higher Accuracy

The improved AAT can recognise over 1,200 concepts. To maintain a high level of accuracy, Facebook has included only those concepts for which its well-trained models meet a high precision threshold.
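A precision threshold of this kind could be applied along the following lines. The article does not state the threshold or the evaluation procedure, so the `0.9` cut-off, the validation counts, and the function names here are illustrative assumptions.

```python
def precision(true_positives, false_positives):
    """Share of a concept's predictions that were correct."""
    total = true_positives + false_positives
    return true_positives / total if total else 0.0


def select_concepts(validation_counts, threshold=0.9):
    """Keep only concepts whose validation precision meets the threshold.

    validation_counts maps concept -> (true_positives, false_positives).
    The 0.9 default is an assumed value, not one given by Facebook.
    """
    return sorted(
        concept
        for concept, (tp, fp) in validation_counts.items()
        if precision(tp, fp) >= threshold
    )


# Hypothetical validation results for three concepts.
counts = {"dog": (95, 5), "selfie": (80, 20), "pizza": (0, 0)}
kept = select_concepts(counts)
```

With these made-up counts, only `dog` (precision 0.95) survives; `selfie` (0.80) and the never-predicted `pizza` are left out of the descriptions.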

Images now come with a ‘Detailed Image Description’ panel that provides comprehensive descriptions of their content, including positional information (top/bottom/left/right) and the relative prominence of objects (primary, secondary or minor).
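One plausible way to derive such labels from an object detector's bounding boxes is sketched below. The article does not say how Facebook computes them, so the thirds-based grid, the area cut-offs of 25% and 5%, and all names are assumptions.

```python
def position(box, width, height):
    """Map a box centre to top/bottom/left/right using an assumed 3x3 grid.

    box is (x_min, y_min, x_max, y_max) in pixels, origin at the top-left.
    """
    cx = (box[0] + box[2]) / 2 / width
    cy = (box[1] + box[3]) / 2 / height
    vertical = "top" if cy < 1 / 3 else "bottom" if cy > 2 / 3 else ""
    horizontal = "left" if cx < 1 / 3 else "right" if cx > 2 / 3 else ""
    return " ".join(part for part in (vertical, horizontal) if part) or "centre"


def prominence(box, width, height):
    """Rank an object by the share of the image it covers (assumed cut-offs)."""
    area = (box[2] - box[0]) * (box[3] - box[1]) / (width * height)
    if area >= 0.25:
        return "primary"
    if area >= 0.05:
        return "secondary"
    return "minor"


def describe(detections, width, height):
    """Turn (label, box) pairs into short positional descriptions."""
    return [
        f"{label}: {position(box, width, height)}, {prominence(box, width, height)}"
        for label, box in detections
    ]


# Hypothetical detections in a 300x300 image.
detections = [
    ("person", (50, 50, 250, 250)),
    ("dog", (220, 100, 300, 200)),
    ("bird", (0, 0, 60, 60)),
]
lines = describe(detections, 300, 300)
```

For these example boxes the sketch yields `person: centre, primary`, `dog: right, secondary`, and `bird: top left, minor`.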

Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at
