Explained: Facebook’s Novel Method To Downsize Recommendation Models By 112 Times

A combined team from Facebook AI Research and the Georgia Institute of Technology has come up with a new approach, known as Tensor-Train decomposition for DLRMs (TT-Rec), to compress deep learning recommendation models by up to 112 times.

Deep neural networks (DNNs) are applied across domains such as predictive forecasting, medical diagnosis, autonomous driving and natural language processing. Meanwhile, the capacity of the embedding tables in deep learning recommendation models (DLRMs) is growing dramatically as the models become more capable.

Why This Research

Over the years, DNNs have grown along several dimensions, including the size of training data, the cost of infrastructure for training and deployment, and model complexity. For instance, OpenAI’s GPT-3 comprises 175 billion parameters, and Facebook saw an eight-fold increase in the amount of computation required for machine learning model training in a single year (2019-2020). Such unprecedented growth results in costly and complex models. The researchers have developed an algorithmic approach to deal with the large memory requirement of DNNs.


Tech Behind The Model

Deep learning-based recommendation models (DLRMs) are among the most resource-demanding deep learning workloads. According to the researchers, the large embedding tables account for 99 percent of the total recommendation model capacity.

To that end, the researchers have used a method known as tensorization to tackle the large memory capacity demand of embedding tables in a DLRM. At a high level, tensorization works by replacing a neural network’s layers with an approximate and structured low-rank form. However, the form is parametric as its shape determines the design trade-off between storage capacity, execution time, and model accuracy. 
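The storage saving from such a factorisation can be seen by comparing parameter counts. The sketch below uses hypothetical sizes of our own choosing (not the paper's configuration): a 1,000,000 x 64 embedding table whose row count factors as 100 x 100 x 100 and whose embedding dimension factors as 4 x 4 x 4, with a TT-rank of 16.

```python
import numpy as np

# Hypothetical sizes: 1,000,000 rows factored as 100*100*100,
# 64-dimensional embeddings factored as 4*4*4, TT-rank 16.
n, d, rank = (100, 100, 100), (4, 4, 4), 16

# Three TT cores replace the dense table; core k has shape
# (r_{k-1}, n_k * d_k, r_k), with boundary ranks r_0 = r_3 = 1.
cores = [
    np.random.randn(1, n[0] * d[0], rank),
    np.random.randn(rank, n[1] * d[1], rank),
    np.random.randn(rank, n[2] * d[2], 1),
]

full_params = np.prod(n) * np.prod(d)   # parameters in the dense table
tt_params = sum(c.size for c in cores)  # parameters across the TT cores
print(f"{full_params / tt_params:.0f}x fewer parameters")
```

Raising the TT-rank shrinks the compression ratio but gives the approximation more capacity, which is exactly the storage/accuracy trade-off described above.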


The researchers have designed a Tensor-Train compression technique for deep learning recommendation models, known as TT-Rec. TT-Rec is based on the idea of replacing the large embedding tables in a DLRM with a sequence of matrix products. “TT-Rec uses a hybrid approach to learn features and deliver on-par model accuracy while requiring orders-of-magnitude less memory capacity,” the researchers said.

The above figure depicts the generalised model architecture for DLRMs. The model has two primary components: Multi-Layer Perceptron (MLP) modules and Embedding Tables (EMBs). The MLP layers process continuous features, such as user age, while the EMBs process categorical features by encoding sparse, high-dimensional inputs into dense vector representations. TT-Rec customises the TT-decomposition method to compress the embedding tables in DLRMs.
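To make "a sequence of matrix products" concrete, the sketch below shows one way a single embedding row can be reconstructed from TT-matrix cores without ever materialising the dense table. This is our own simplified NumPy illustration with toy shapes, not the paper's implementation.

```python
import numpy as np

def tt_embedding_row(cores, idx, row_dims):
    """Reconstruct one embedding row from TT-matrix cores.

    cores[k] has shape (r_prev, n_k, d_k, r_next); the dense table it
    encodes has prod(n_k) rows and prod(d_k) columns.
    """
    # Split the flat row index into one index per core
    # (mixed radix, most-significant digit first).
    digits = []
    for n_k in reversed(row_dims):
        digits.append(idx % n_k)
        idx //= n_k
    digits.reverse()

    # Chain of small matrix products; the partial result keeps
    # shape (columns_so_far, current_rank).
    row = np.ones((1, 1))
    for core, i_k in zip(cores, digits):
        g = core[:, i_k, :, :]                  # (r_prev, d_k, r_next)
        row = np.einsum('cp,pdq->cdq', row, g)
        row = row.reshape(-1, g.shape[-1])
    return row[:, 0]                            # boundary rank is 1

# Toy cores for a 6 x 4 table: rows factored 2*3, columns 2*2, rank 2.
rng = np.random.default_rng(0)
g1 = rng.standard_normal((1, 2, 2, 2))
g2 = rng.standard_normal((2, 3, 2, 1))
vec = tt_embedding_row([g1, g2], idx=4, row_dims=(2, 3))
```

Only the small cores are stored; each lookup trades a memory read for a short chain of tiny matrix multiplications, which is why the scheme suits compute-rich, memory-poor accelerators.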


The key contributions are:

  • The research applied tensor-train compression in a new application context: compressing the embedding layers of deep learning recommendation models (DLRMs).
  • The researchers quantified the potential trade-off between memory requirements and accuracy.
  • To recover the accuracy loss, the researchers proposed a sampled Gaussian distribution for the weight initialisation of the tensor cores. To accelerate TT-Rec’s training performance, they introduced a separate cache structure that stores frequently accessed embedding vectors in uncompressed form, which empirically also improves accuracy.
  • TT-Rec achieved higher model accuracy at the cost of a 10 percent increase in training time on average, while reducing the total memory requirement of the embedding tables by up to 112 times.
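The caching idea in the bullets above can be sketched as follows. This is our own toy illustration, not the paper's implementation: hot rows are kept uncompressed, and only cache misses pay the reconstruction cost (here the TT-core matrix-product chain is stood in for by a dense lookup).

```python
import numpy as np

class CachedTTEmbedding:
    """Toy embedding lookup with an uncompressed cache for hot rows."""

    def __init__(self, num_rows, dim, cache_capacity):
        rng = np.random.default_rng(0)
        # Stand-in for the TT cores; a real TT-Rec table would not
        # materialise this dense array.
        self._table = rng.standard_normal((num_rows, dim))
        self.cache = {}      # row index -> uncompressed vector
        self.capacity = cache_capacity

    def reconstruct(self, idx):
        # Placeholder for the chain of small matrix products over TT cores.
        return self._table[idx].copy()

    def lookup(self, idx):
        if idx in self.cache:            # hit: no reconstruction needed
            return self.cache[idx]
        vec = self.reconstruct(idx)
        if len(self.cache) < self.capacity:
            self.cache[idx] = vec        # keep hot rows uncompressed
        return vec

emb = CachedTTEmbedding(num_rows=10, dim=4, cache_capacity=2)
v1 = emb.lookup(3)                       # miss: reconstructed, then cached
v2 = emb.lookup(3)                       # hit: served from the cache
```

Because accesses to real recommendation tables are highly skewed, a small cache like this can absorb most lookups while the bulk of the table stays in compressed form.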

Benefits of TT-Rec

  • TT-Rec provides a flexible design space between memory capacity, training time and model accuracy.
  • TT-Rec is a highly effective approach, especially for online recommendation training. 
  • According to the researchers, the orders-of-magnitude lower memory requirement with TT-Rec also unlocks many modern AI training accelerators for DLRM training.
  • TT-Rec suits accelerators like GPUs with a relatively higher compute-to-memory (FLOPs-per-Byte) ratio and limited memory capacity.

Wrapping Up

Through a judicious design and parameterisation of the tensor-train compression technique, the research demonstrated significant compression ratios for DLRMs with competitive training-time performance.

Read the paper here.


Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
