How Facebook’s Latest Image Description Tool For The Visually Impaired Really Works

Alt text is the text that appears on screen when an image fails to load. When an image is tagged with proper alt text, a screen reader can read the description aloud in a synthetic voice, letting visually impaired people understand what the image shows. Since alt text has to be added manually, many pictures carry none, which greatly restricts the experience. To bridge this gap, Facebook introduced automatic alternative text (AAT) technology in 2016. As the name suggests, AAT generates descriptions of photos on demand.

The researchers at Facebook have now brought in three significant changes to AAT:

  • A ten-fold increase in the number of concepts AAT can detect and describe.
  • More detailed image descriptions, with information on activities, landmarks, types of animals, etc.
  • Information about the position and relative size of objects in a photo, an industry first.

Automatic Alternative Text

AAT is rooted in Facebook’s object recognition technology, which uses a neural network with billions of parameters trained on millions of examples.

In 2018, AAT bagged the Helen Keller Achievement Award (under the aegis of the American Foundation for the Blind) for its ‘exemplary role in creating accessible products or improving the accessibility of their already-popular products’.

The first version of AAT was developed using human-labelled data, which was used to train a deep convolutional neural network. This initial version could recognise about 100 basic concepts, such as trees, mountains, and people’s identities (using a facial recognition model).

However, this approach was not scalable, necessitating a move away from fully supervised learning on human-labelled data.

Latest Enhancements

In an earlier blog post, Facebook explained how hashtags could be an ideal source of training data for making images more accessible. However, hashtags present two main challenges: they sometimes reference nonvisual concepts (for example, #tbt), and they can be very vague. Both shortcomings could confuse deep learning models. To overcome this, the team developed new approaches, including assessing the multiple labels used per image, sorting through hashtag synonyms, and balancing frequently used hashtags against rare ones. The team also trained a large-scale hashtag prediction model for image recognition.
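
The hashtag clean-up described above can be sketched roughly as follows. This is a minimal illustration, not Facebook's pipeline: the lookup tables `NONVISUAL_TAGS` and `SYNONYMS`, the function names, and the inverse-square-root weighting are all assumptions chosen for the example.

```python
import math
from collections import Counter

# Hypothetical lookup tables, for illustration only.
NONVISUAL_TAGS = {"tbt", "love", "mood"}
SYNONYMS = {"doggo": "dog", "puppy": "dog", "kitty": "cat"}


def clean_hashtags(hashtags):
    """Lowercase tags, drop nonvisual ones, and merge synonyms into one label."""
    labels = []
    for tag in hashtags:
        tag = tag.lstrip("#").lower()
        if tag in NONVISUAL_TAGS:
            continue
        label = SYNONYMS.get(tag, tag)
        if label not in labels:  # keep each label once per image
            labels.append(label)
    return labels


def sampling_weights(label_lists):
    """Weight each label by 1/sqrt(frequency) so rare labels are not drowned out.

    The square-root rule is an assumed balancing scheme, not a documented one.
    """
    counts = Counter(label for labels in label_lists for label in labels)
    return {label: 1.0 / math.sqrt(count) for label, count in counts.items()}


posts = [
    ["#TBT", "#doggo", "#beach"],
    ["#puppy", "#dog"],
    ["#dog", "#love"],
    ["#cat", "#dog"],
]
cleaned = [clean_hashtags(post) for post in posts]
weights = sampling_weights(cleaned)
print(cleaned)
print(weights)
```

Here the frequent label "dog" ends up with a lower weight than the rare labels "beach" and "cat", which is one simple way to balance common and rare hashtags during training.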

Building on this method of image recognition through hashtags, Facebook’s team adopted for AAT a model trained on weakly supervised data: public Instagram images and their hashtags. The model was fine-tuned with data from different geographies to make it work better for everyone. Gender, skin colour, and age were also taken into account to build a more accurate and inclusive model.

The enhanced AAT also applies transfer learning, a method in which a model trained on one task is reused as the starting point for a new task. This enabled the creation of models that could identify more distinctive concepts such as monuments, types of food, and selfies.
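
As a toy illustration of the transfer learning idea, the sketch below keeps a "pretrained" feature extractor frozen and trains only a small new classifier head on top of it. Everything here is a stand-in: `pretrained_backbone` is a hand-written function rather than a real network, the bright-versus-dark task is invented, and a perceptron update replaces the gradient-based training a real system would use.

```python
def pretrained_backbone(pixels):
    """Frozen feature extractor: stands in for a network trained on another task."""
    mean = sum(pixels) / len(pixels)
    contrast = max(pixels) - min(pixels)
    return [mean, contrast, 1.0]  # the trailing 1.0 acts as a bias input


def predict(head, pixels):
    """Score the frozen features with the task-specific head."""
    features = pretrained_backbone(pixels)
    score = sum(w * f for w, f in zip(head, features))
    return 1 if score > 0 else 0


def train_new_head(examples, epochs=10):
    """Perceptron updates on the head only; the backbone is never modified."""
    head = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for pixels, label in examples:
            error = label - predict(head, pixels)
            features = pretrained_backbone(pixels)
            head = [w + error * f for w, f in zip(head, features)]
    return head


# Invented new task: label 1 for bright images, 0 for dark ones.
examples = [
    ([0.9, 0.8, 1.0], 1),
    ([0.8, 0.7, 0.9], 1),
    ([0.1, 0.2, 0.0], 0),
    ([0.2, 0.1, 0.3], 0),
]
head = train_new_head(examples)
print(predict(head, [0.95, 0.85, 0.9]), predict(head, [0.05, 0.15, 0.1]))
```

The point of the design is that only the three head weights are learned for the new task, while the feature extractor is reused unchanged.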

To capture richer details such as the position and count of objects, the team also trained Faster R-CNN, a two-stage object detector, using Detectron2, an open-source platform for object detection and segmentation.
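
Once a detector returns labelled boxes with confidence scores, object counts follow from simple post-processing. The sketch below assumes a hypothetical detection format (a list of dictionaries with `label` and `score` keys) and an arbitrary score cut-off of 0.5. It does not call Detectron2 itself.

```python
from collections import Counter


def count_objects(detections, min_score=0.5):
    """Count detections per label, ignoring low-confidence boxes."""
    return Counter(d["label"] for d in detections if d["score"] >= min_score)


def summarise(counts):
    """Render counts as text, most frequent label first (ties broken by name)."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ", ".join(f"{n} x {label}" for label, n in ordered)


# Hypothetical detector output for one photo.
detections = [
    {"label": "person", "score": 0.98},
    {"label": "person", "score": 0.91},
    {"label": "dog", "score": 0.88},
    {"label": "person", "score": 0.30},  # too uncertain, dropped
]
counts = count_objects(detections)
summary = summarise(counts)
print(summary)
```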


Greater Detail & Higher Accuracy

The improved AAT can recognise over 1,200 concepts. To maintain a high level of accuracy, Facebook has included only concepts backed by well-trained models that meet a high precision threshold.
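
One way to picture a precision threshold is to measure, on a labelled validation set, how often each concept's predictions are correct and keep only the concepts that clear a cut-off. The sketch below assumes a hypothetical list of `(concept, is_correct)` pairs and an arbitrary threshold of 0.8. The actual threshold Facebook uses is not stated in the article.

```python
def precision_per_concept(predictions):
    """predictions: (concept, is_correct) pairs from a labelled validation set."""
    totals, correct = {}, {}
    for concept, is_correct in predictions:
        totals[concept] = totals.get(concept, 0) + 1
        correct[concept] = correct.get(concept, 0) + (1 if is_correct else 0)
    return {concept: correct[concept] / totals[concept] for concept in totals}


def reliable_concepts(predictions, min_precision=0.8):
    """Keep only concepts whose validation precision meets the threshold."""
    precision = precision_per_concept(predictions)
    return sorted(c for c, p in precision.items() if p >= min_precision)


# Invented validation results: tree 9/10, selfie 5/5, unicorn 1/4.
validation = (
    [("tree", True)] * 9 + [("tree", False)]
    + [("selfie", True)] * 5
    + [("unicorn", True)] + [("unicorn", False)] * 3
)
kept = reliable_concepts(validation)
print(kept)
```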

Images now include a ‘Detail Image Description’ panel that provides comprehensive descriptions of their content, including positional information (top/bottom/left/right) and the relative prominence of objects (primary, secondary, or minor).
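
Positional and prominence labels like these can be derived from a bounding box alone. The sketch below is one possible mapping, not Facebook's: the 3x3 grid for position and the area cut-offs of 25% and 5% for prominence are assumptions, as are the function names and the `(x0, y0, x1, y1)` box format.

```python
def position(box, width, height):
    """Map the box centre to top/bottom/left/right (or centre) on a 3x3 grid."""
    x0, y0, x1, y1 = box
    cx = (x0 + x1) / 2 / width
    cy = (y0 + y1) / 2 / height
    vertical = "top" if cy < 1 / 3 else "bottom" if cy > 2 / 3 else ""
    horizontal = "left" if cx < 1 / 3 else "right" if cx > 2 / 3 else ""
    return " ".join(part for part in (vertical, horizontal) if part) or "centre"


def prominence(box, width, height):
    """Classify an object by the share of the image its box covers."""
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0) / (width * height)
    if area >= 0.25:  # assumed cut-off
        return "primary"
    if area >= 0.05:  # assumed cut-off
        return "secondary"
    return "minor"


def describe(label, box, width, height):
    """Combine label, prominence, and position into one phrase."""
    return f"{label} ({prominence(box, width, height)}, {position(box, width, height)})"


# Hypothetical boxes in a 1000 x 800 pixel image.
print(describe("person", (300, 100, 700, 780), 1000, 800))
print(describe("dog", (50, 550, 300, 780), 1000, 800))
print(describe("bird", (850, 50, 900, 90), 1000, 800))
```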


Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world, with a special interest in analysing its long-term impact on individuals and societies.
