It can be challenging for a neural network to work efficiently with sparse data. The lack of publicly available details of representative models and data sets has slowed down the research into recommendation systems.
To help advance understanding in this subfield, Facebook AI Research has open-sourced a state-of-the-art deep learning recommendation model (DLRM), implemented in both PyTorch and Caffe2.
DLRM improves on earlier models by combining principles from collaborative filtering and predictive analytics, which lets it work efficiently with production-scale data and deliver state-of-the-art results.
The researchers' experiments found that DNNs for recommendation pose challenges to efficient execution that differ from those of traditional CNN and RNN architectures, which have been the focus of the systems and computer architecture community.
Understanding DLRM
The DLRM benchmark is written in Python to allow for a flexible implementation, where the model architecture, data set, and other parameters are defined by the command line arguments. DLRM can be used for both inference and training. In the latter case, the backward-pass operators are added to the computational graph to allow for parameter updates.
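To make the idea of a command-line-defined model concrete, here is a minimal sketch using Python's standard `argparse` module. The flag names (`--bottom-mlp`, `--top-mlp`, `--embedding-dim`, `--inference-only`) and the dash-separated layer format are hypothetical, chosen only for this illustration; they are not necessarily the flags the DLRM scripts accept.

```python
import argparse


def parse_layer_sizes(spec):
    """Turn a dash-separated string such as "13-64-16" into [13, 64, 16]."""
    sizes = [int(part) for part in spec.split("-")]
    if any(size <= 0 for size in sizes):
        raise argparse.ArgumentTypeError("layer sizes must be positive")
    return sizes


def build_parser():
    # All flag names below are hypothetical, chosen for this illustration.
    parser = argparse.ArgumentParser(description="Toy DLRM-style configuration")
    parser.add_argument("--bottom-mlp", type=parse_layer_sizes, default=[13, 64, 16])
    parser.add_argument("--top-mlp", type=parse_layer_sizes, default=[64, 32, 1])
    parser.add_argument("--embedding-dim", type=int, default=16)
    parser.add_argument("--inference-only", action="store_true")
    return parser


def describe(args):
    """Summarise the run implied by the parsed arguments."""
    return {
        "mode": "inference" if args.inference_only else "training",
        # Training needs a backward pass for parameter updates; inference does not.
        "needs_backward_pass": not args.inference_only,
        "bottom_mlp": args.bottom_mlp,
        "top_mlp": args.top_mlp,
        "embedding_dim": args.embedding_dim,
    }


# Parse an explicit argument list so the example does not depend on sys.argv.
example_args = build_parser().parse_args(["--bottom-mlp", "13-128-16", "--inference-only"])
print(describe(example_args))
```

Keeping the architecture in arguments rather than in code means the same script can be rerun with different layer sizes or in inference-only mode without edits.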
In the DLRM model, categorical features are processed using embeddings, while continuous features are processed with a bottom multilayer perceptron (MLP).
The results are processed with a top MLP and fed into a sigmoid function in order to give a probability of a click.
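The data flow described above can be sketched in a few dozen lines of framework-free Python. This is a toy illustration, not the released implementation: the class name `ToyDLRM`, the layer sizes, the random initialisation, and the choice to simply concatenate the bottom-MLP output with the embedding vectors before the top MLP are all assumptions made for this sketch.

```python
import math
import random


def make_mlp(layer_sizes, rng):
    """Create a list of (weights, biases) pairs with small random weights."""
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def run_mlp(layers, x, final_relu=True):
    """Apply each linear layer followed by ReLU (optionally skipping the last ReLU)."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if final_relu or i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ToyDLRM:
    def __init__(self, n_dense, table_sizes, emb_dim, seed=0):
        rng = random.Random(seed)
        # One embedding table per categorical feature.
        self.tables = [
            [[rng.uniform(-0.1, 0.1) for _ in range(emb_dim)] for _ in range(rows)]
            for rows in table_sizes
        ]
        # The bottom MLP maps the continuous features to the embedding dimension.
        self.bottom = make_mlp([n_dense, 8, emb_dim], rng)
        # Assumption of this sketch: the top MLP sees all vectors concatenated.
        top_in = emb_dim * (len(table_sizes) + 1)
        self.top = make_mlp([top_in, 8, 1], rng)

    def predict(self, dense, sparse_ids):
        dense_vec = run_mlp(self.bottom, dense)
        # An embedding lookup is just row selection from each table.
        emb_vecs = [table[idx] for table, idx in zip(self.tables, sparse_ids)]
        combined = dense_vec + [v for vec in emb_vecs for v in vec]
        logit = run_mlp(self.top, combined, final_relu=False)[0]
        return sigmoid(logit)  # interpreted as the probability of a click


model = ToyDLRM(n_dense=3, table_sizes=[10, 5], emb_dim=4)
click_probability = model.predict(dense=[0.5, 1.0, 0.0], sparse_ids=[7, 2])
print(click_probability)
```

The weights here are random, so the printed value carries no meaning beyond being a valid probability; the point is the order of operations, from embedding lookup and bottom MLP through the top MLP to the sigmoid.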
The DLRM model handles continuous (dense) and categorical (sparse) features that describe users and products. The model exercises a wide range of hardware and system components, such as memory capacity and bandwidth, as well as communication and compute resources.
The variety of servers present in Facebook’s data center introduces architectural heterogeneity, ranging from varying SIMD width to different implementations of the cache hierarchy. The architectural heterogeneity exposes additional hardware-software co-design and optimization opportunities.
The code is self-contained and can interface with public data sets, including the Kaggle Display Advertising Challenge Dataset. This particular data set contains 13 continuous and 26 categorical features, which define the size of the MLP input layer as well as the number of embeddings used in the model, while other parameters can be defined on the command line.
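As an illustration of how the feature counts drive the model shape, the sketch below parses one record and derives the bottom MLP input width and the number of embedding tables. It assumes a tab-separated layout with the label first, then the 13 continuous fields, then the 26 categorical tokens, with empty fields meaning "missing". The function names and the sample record are invented for this example.

```python
NUM_DENSE = 13   # continuous features, per the data set description above
NUM_SPARSE = 26  # categorical features, per the data set description above


def parse_record(line):
    """Split one tab-separated record into (label, dense, sparse).

    Assumed layout: label, then 13 numeric fields, then 26 categorical
    tokens. Empty numeric fields are treated as missing and mapped to 0.0.
    """
    fields = line.rstrip("\n").split("\t")
    expected = 1 + NUM_DENSE + NUM_SPARSE
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    label = int(fields[0])
    dense = [float(f) if f else 0.0 for f in fields[1:1 + NUM_DENSE]]
    sparse = fields[1 + NUM_DENSE:]
    return label, dense, sparse


def build_vocabularies(records):
    """Assign a row index to every distinct token, one vocabulary per feature."""
    vocabs = [{} for _ in range(NUM_SPARSE)]
    for _, _, sparse in records:
        for vocab, token in zip(vocabs, sparse):
            vocab.setdefault(token, len(vocab))
    return vocabs


# A made-up record, used only to exercise the parser.
sample = "\t".join(["1"] + [str(i) for i in range(13)] + [f"tok{i}" for i in range(26)])
label, dense, sparse = parse_record(sample)
vocabs = build_vocabularies([(label, dense, sparse)])

bottom_mlp_input_size = len(dense)        # width of the bottom MLP input layer
num_embedding_tables = len(vocabs)        # one table per categorical feature
table_sizes = [len(v) for v in vocabs]    # rows needed in each table
print(bottom_mlp_input_size, num_embedding_tables)
```

With a real data set the vocabularies would be built over all records, and each vocabulary size would become the row count of the matching embedding table.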
What’s The Need For DLRM?
The current practice of using only latency to benchmark inference performance is insufficient. Co-locating multiple recommendation models on a single machine can improve throughput. However, this introduces a tradeoff between single-model latency and aggregate system throughput.
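A little arithmetic shows why latency alone does not tell the whole story. The sketch below computes aggregate throughput from per-model latency for different co-location levels. The latency figures and the batch size are hypothetical placeholders invented for this illustration, not measurements from the DLRM work.

```python
def aggregate_throughput(num_models, batch_size, latency_s):
    """Queries per second when num_models co-located instances each finish
    one batch of batch_size queries every latency_s seconds."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return num_models * batch_size / latency_s


# Hypothetical numbers (NOT measured results): per-batch latency of a single
# model as more instances share the same machine.
hypothetical_latency_s = {1: 0.010, 2: 0.012, 4: 0.018, 8: 0.040}

for n, latency in hypothetical_latency_s.items():
    qps = aggregate_throughput(n, batch_size=16, latency_s=latency)
    print(f"{n} model(s): latency {latency * 1e3:.0f} ms, throughput {qps:.0f} qps")
```

Under these made-up figures, every added instance raises single-model latency, while aggregate throughput first rises and then falls once contention dominates. A benchmark that reports only one of the two numbers would miss that turning point.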
The model runs on a realistic data set that allows us to measure the accuracy of the model, which is especially useful when experimenting with different numerical techniques and other models.
Beyond the architectural implications for stand-alone recommendation systems, the researchers also studied how inference co-location and hyper-threading, two mechanisms for improving resource utilization, affect performance variability in the data center.
The team behind DLRM believes that this work will lay the foundation for future full-stack hardware solutions targeting personalized recommendation.
The open source implementation of DLRM can be used as a benchmark to measure:
- The speed at which the model (and associated operators) performs.
- How various numerical techniques affect its accuracy.
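Both kinds of measurement can be sketched on a toy model. The example below times a prediction function and checks how much the outputs change when the weights are rounded to a coarser grid, a crude stand-in for a reduced-precision number format. The logistic model, the synthetic inputs, and the rounding scheme are assumptions for illustration; none of it comes from the DLRM code.

```python
import math
import random
import time


def predict(weights, features):
    """Toy logistic model: sigmoid of the dot product."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def quantize(weights, step):
    """Round each weight to the nearest multiple of step."""
    return [round(w / step) * step for w in weights]


def time_per_call(fn, repeats=1000):
    """Average wall-clock seconds per call of fn()."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


# Synthetic weights and inputs, generated with a fixed seed.
rng = random.Random(0)
weights = [rng.uniform(-1, 1) for _ in range(32)]
inputs = [[rng.uniform(-1, 1) for _ in range(32)] for _ in range(200)]

reference = [predict(weights, x) for x in inputs]
for step in (0.01, 0.1, 0.5):
    coarse = quantize(weights, step)
    preds = [predict(coarse, x) for x in inputs]
    # Mean absolute change in predicted probability versus full precision.
    drift = sum(abs(a - b) for a, b in zip(reference, preds)) / len(inputs)
    # Fraction of inputs where the click / no-click decision is unchanged.
    agree = sum((a > 0.5) == (b > 0.5) for a, b in zip(reference, preds)) / len(inputs)
    print(f"step={step}: mean drift={drift:.4f}, decision agreement={agree:.2%}")

seconds = time_per_call(lambda: predict(weights, inputs[0]))
print(f"seconds per prediction: {seconds:.2e}")
```

On a real benchmark the reference would be labelled data rather than the full-precision model's own outputs, but the structure is the same: hold the inputs fixed, vary the numerical technique, and report both speed and accuracy.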
Implementation of DLRM
DLRM PyTorch. Implementation of DLRM in the PyTorch framework:
dlrm_s_pytorch.py
DLRM Caffe2. Implementation of DLRM in the Caffe2 framework:
dlrm_s_caffe2.py
The code for each varies slightly to adapt to the specifics of each framework, but the overall structure is similar.
Get hands on with DLRM here