
How Amazon’s Image-Recipe Hierarchical Transformer excels in Cross-modal Recipe Retrieval

Rajkumar Lakshmanamoorthy

Though food is essential to everyone, it is more than a basic need: humans consistently show interest in exploring new foods and improving the taste of their traditional dishes. The digital world offers a great way to digitise food recipes by listing ingredients, nutritional information, cooking instructions, supporting images and videos, and reviews and ratings. For over a decade, AI/ML-based recipe retrieval has attempted to satisfy people’s desire to develop their cooking skills and try something new and delicious!

Cross-modal recipe retrieval is a digital recipe approach in which a machine learning model outputs a text recipe when given an image of food. The task is challenging because it spans two entirely different modalities: natural language processing and image processing. Plenty of training data is available across various sites, which makes modelling possible. However, the data is spread across those sites without a consistent structure or any guarantee of completeness. Much research has produced impressive machine learning models, yet they fall short of human expectations on performance.

Recent cross-modal recipe retrieval approaches use LSTMs to encode text recipes alongside the corresponding image embeddings. These models typically rely on heavily pre-trained text representations, complex multi-stage training strategies and adversarial losses. Amaia Salvador, Erhan Gundogdu, Loris Bazzani and Michael Donoser of Amazon have introduced a transformer-based cross-modal recipe retrieval method that is straightforward, simple and versatile to train and deploy.

Attention-based transformer networks have recently replaced traditional convolutional neural networks and recurrent neural networks in various domains, including text, audio, image, video and structured data, showing computational efficiency and performance improvements over those traditional approaches. To this end, the Amazon scientists have applied a hierarchical transformer-based, self-supervised approach to the inter-domain task of cross-modal recipe retrieval with great success. This hierarchical recipe transformer is an end-to-end machine learning model with attention-based encoders for both text and images.

Hierarchical Recipe Transformer overview during training

This hierarchical model has two parallel encoders, one for image encoding and another for recipe text encoding. Recipe text is encoded hierarchically, from the recipe’s title through its ingredients to its instructions, with an individual transformer encoder for each of these components. The resulting embeddings are supplied to a final recipe encoder that is paired with the image encoder. The hierarchical transformer encoder (HTR) reads the text sentence by sentence, so it retrieves the ingredient and instruction information correctly, without data loss or mismatch.
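The two-level design described above can be sketched in PyTorch. This is a minimal illustration, not the paper's implementation: all dimensions, layer counts and the mean-pooling choice are assumptions. A first transformer encodes the words of each sentence (a title line, an ingredient, or an instruction), its pooled outputs form a sequence of sentence embeddings, and a second transformer encodes that sequence into one recipe-component vector.

```python
import torch
import torch.nn as nn

class HierarchicalTextEncoder(nn.Module):
    """Sketch of a hierarchical transformer text encoder: a word-level
    transformer encodes each sentence, then a sentence-level transformer
    encodes the sequence of pooled sentence embeddings. Hyperparameters
    here are illustrative only."""

    def __init__(self, vocab_size=1000, dim=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        word_layer = nn.TransformerEncoderLayer(
            dim, n_heads, dim_feedforward=4 * dim, batch_first=True)
        self.word_encoder = nn.TransformerEncoder(word_layer, n_layers)  # level 1
        sent_layer = nn.TransformerEncoderLayer(
            dim, n_heads, dim_feedforward=4 * dim, batch_first=True)
        self.sentence_encoder = nn.TransformerEncoder(sent_layer, n_layers)  # level 2

    def forward(self, token_ids):
        # token_ids: (batch, n_sentences, n_tokens)
        b, s, t = token_ids.shape
        words = self.embed(token_ids.view(b * s, t))   # (b*s, t, dim)
        sent = self.word_encoder(words).mean(dim=1)    # pool words -> sentence vectors
        sent = sent.view(b, s, -1)                     # (b, s, dim)
        out = self.sentence_encoder(sent).mean(dim=1)  # pool sentences -> one vector
        return out                                     # (b, dim)

enc = HierarchicalTextEncoder()
ids = torch.randint(0, 1000, (2, 3, 5))  # 2 recipes, 3 sentences, 5 tokens each
print(enc(ids).shape)  # torch.Size([2, 64])
```

In the full model, one such encoder would be run per component (title, ingredients, instructions) and a final recipe encoder would merge their outputs into the embedding that is matched against the image encoder.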

Difference between the traditional Transformer Encoder (on left) and the Hierarchical Transformer Encoder (on right)

The pair of encoders for image and text is trained simultaneously with a pair loss. However, some large recipe datasets lack accompanying images, which rules out supervised image-recipe training. For recipe text without an accompanying image, a self-supervised learning approach is introduced with a dedicated loss function, known as the recipe loss. The model can therefore be trained with either recipe-image pairs or recipe-only data.
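A pair loss of this kind is commonly implemented as a bidirectional triplet loss over a batch. The sketch below is a hedged illustration under the assumption of all in-batch negatives and a cosine similarity score; the paper's exact loss and negative-sampling strategy may differ. As the comment notes, the self-supervised recipe loss can reuse the same machinery on recipe components alone.

```python
import torch
import torch.nn.functional as F

def directional_triplet(sim, margin):
    """Triplet loss over one retrieval direction: the diagonal of `sim`
    holds the positive pairs, every off-diagonal entry is a negative."""
    pos = sim.diag().unsqueeze(1)                         # (n, 1) positives
    off_diag = ~torch.eye(sim.size(0), dtype=torch.bool)
    return F.relu(margin - pos + sim)[off_diag].mean()

def pair_loss(img_emb, rec_emb, margin=0.3):
    """Sketch of a bidirectional pair loss: a matching image and recipe
    (same batch index) should score higher than any in-batch mismatch
    by at least `margin`, in both retrieval directions."""
    img = F.normalize(img_emb, dim=1)
    rec = F.normalize(rec_emb, dim=1)
    sim = img @ rec.t()                                   # cosine similarities
    return directional_triplet(sim, margin) + directional_triplet(sim.t(), margin)

# The recipe loss can reuse the same function on recipe components, e.g.
# pulling a title embedding towards its own ingredients+instructions
# embedding -- no paired image is needed, so it is self-supervised.
img = torch.eye(4, 8)  # 4 orthonormal "image" embeddings
rec = torch.eye(4, 8)  # perfectly matching "recipe" embeddings
print(pair_loss(img, rec).item())  # 0.0: positives already beat negatives by > margin
```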

Self-supervised training strategy for recipe-only data

Python Implementation

Amazon’s Image-Recipe Transformer requires Git’s LFS (Large File Storage) module. The following commands install git-lfs on the local machine.

 curl -s | sudo bash
 sudo apt-get install git-lfs
 git lfs install 


Install the timm module using the following command.

!pip install timm

Download the source code from the official repository to the local machine.

!git clone


Change into the source directory and verify that the source files downloaded properly using the following commands.

 %cd /content/image-to-recipe-transformers/
 !ls -p 


If Anaconda is not installed on the machine, install the Anaconda-3 package first.


Create the development environment from within the base conda environment using the following command. It takes some time to install the dependencies.

conda env create -f environment.yml


Then activate the newly created environment.

conda activate im2recipetransformers


Download the recipe data from the Recipe1M dataset by creating an account. This dataset contains more than one million recipes and around 13 million supporting images. Once the dataset is downloaded, extract it and move it to a directory named /root/DATASET_PATH. The following command preprocesses the data.

!python --root DATASET_PATH

Start training using the following command. Training may take a long time depending on the device configuration and available memory.


 %cd /content/image-to-recipe-transformers/src/
 !python --model_name model --root DATASET_PATH --save_dir /path/to/saved/model/checkpoints 

Launch tensorboard logging using the following command.

!tensorboard --logdir "./" --port PORT

Test the trained model on the test split using the following command.


 %cd /content/image-to-recipe-transformers/src/
 !python --model_name model --eval_split test --root DATASET_PATH --save_dir /path/to/saved/model/checkpoints 

Calculate evaluation metrics such as MedR (median rank) and Recall@K using the following command.

 %cd /content/image-to-recipe-transformers/src/
 !python --embeddings_file /path/to/saved/model/checkpoints/model/feats_test.pkl --medr_N 10000 
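These retrieval metrics are straightforward to compute from the saved embeddings. The sketch below is an illustrative NumPy implementation, not the repository's exact script: it assumes the matching recipe shares the query's index, ranks candidates by cosine similarity, and reports the median rank of the true match (MedR) plus the fraction of queries whose match lands in the top K (Recall@K).

```python
import numpy as np

def retrieval_metrics(img_emb, rec_emb, ks=(1, 5, 10)):
    """Compute MedR and Recall@K for image->recipe retrieval, assuming
    row i of `img_emb` matches row i of `rec_emb`."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    rec = rec_emb / np.linalg.norm(rec_emb, axis=1, keepdims=True)
    sim = img @ rec.T                        # cosine similarity matrix
    order = np.argsort(-sim, axis=1)         # candidates, best first
    # rank of the true match for each query (1 = retrieved first)
    ranks = np.array([np.where(order[i] == i)[0][0] + 1
                      for i in range(len(sim))])
    medr = np.median(ranks)
    recall = {k: float(np.mean(ranks <= k)) for k in ks}
    return medr, recall

# Toy check: identical embeddings give perfect retrieval.
emb = np.eye(6)
medr, recall = retrieval_metrics(emb, emb)
print(medr, recall[1])  # 1.0 1.0
```

Lower MedR is better (a MedR of 1.0 means the correct recipe is the top result for at least half the queries), while higher Recall@K is better.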

Performance of Hierarchical Recipe Transformer

The Hierarchical Recipe Transformer is trained and evaluated on the largest public recipe dataset, Recipe1M. Competing models are trained and evaluated on the same dataset under identical device configurations. All models are evaluated in both directions, namely, image-to-recipe retrieval and recipe-to-image retrieval.

Top 5 results for image-recipe or recipe-image query-results pair. The query is highlighted in the blue coloured window and the correct result is highlighted in a green coloured window. 

Amazon’s Hierarchical Recipe Transformer outperforms every other competing model, such as R2GAN (Generative Adversarial Network), MCEN (Latent Variable Model), ACME (Adversarial Cross-Modal Embeddings), SCAN (Semantic Consistency and Attention Mechanisms) and DaC (Dividing and Conquering Cross-Modal Recipe Retrieval), on the MedR and recall metrics.

Moreover, the Hierarchical Recipe Transformer is evaluated in incremental stages: with the pair loss (supervised recipe-image training), the recipe loss (self-supervised recipe-only training) and a Vision Transformer (image encoding).

Image-Recipe (highlighted in blue) evaluation pair and the model’s incremental performance with pair loss, recipe loss and ViT encoder. Green coloured highlights show exact retrieval. 

Amazon’s Hierarchical Recipe Transformer achieves state-of-the-art performance on all retrieval metrics and in all retrieval scenarios.

Images and illustrations other than code outputs are obtained from this source.

Read more about this architecture here.

Find the source code repository here.


Copyright Analytics India Magazine Pvt Ltd
