Recommender systems are the growth engines of a slew of tech companies. Be it Netflix, YouTube, or Amazon, recommender engines keep users hooked to the product. These systems leverage techniques such as collaborative filtering, matrix factorization, and deep learning to predict user preferences and surface content tailored to the end-user; recommendations are estimated to account for up to 30% of revenue on some of the largest commercial platforms. Below, we take a look at two popular recommender frameworks in the market today: NVIDIA Merlin and TensorFlow Recommenders.
NVIDIA Merlin
Merlin is NVIDIA’s end-to-end, GPU-accelerated recommender framework, providing fast feature engineering and high training throughput to enable rapid experimentation and production retraining of deep learning recommender models. Merlin 1.0, an update released in 2022, is an AI framework for building high-performing recommenders at scale. The new version adds two libraries, Merlin Models and Merlin Systems, which help users choose the best-fitting features and models for a specific use case.
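As an illustration, here is a minimal sketch of what training a ranking model with the Merlin Models TensorFlow API can look like, assuming data has already been preprocessed into Parquet files; the file paths and the "click" target column are hypothetical placeholders:

```python
import merlin.models.tf as mm
from merlin.io import Dataset

# Hypothetical preprocessed datasets; Merlin infers a schema from them.
train = Dataset("train/*.parquet")

# DLRMModel assembles embeddings, bottom/top MLPs, and a prediction
# head directly from the dataset schema.
model = mm.DLRMModel(
    train.schema,
    embedding_dim=64,
    bottom_block=mm.MLPBlock([128, 64]),
    top_block=mm.MLPBlock([128, 64, 32]),
    prediction_tasks=mm.BinaryClassificationTask("click"),  # hypothetical target
)
model.compile(optimizer="adam")
model.fit(train, batch_size=1024)
```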
Architecture
NVIDIA Merlin is an application framework whose architecture aims to facilitate all phases of recommender system development, from experimentation to production, accelerated on NVIDIA GPUs. It consists of three major components:
- Merlin ETL: ETL (extract, transform, load) is a collection of tools for fast recommender system feature engineering and preprocessing on GPU. This stage prepares and exports datasets for training, at sizes reaching terabyte or even petabyte scale (see the NVTabular sketch after this list).
- Merlin training: A collection of DL recommender models and training tools such as HugeCTR and DLRM.
- Merlin inference: The Merlin inference container allows users to deploy NVTabular workflows and HugeCTR or TensorFlow models to the Triton Inference Server for production.
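For a flavour of the ETL stage, below is a minimal NVTabular preprocessing sketch, assuming tabular interaction data in Parquet files; the column names are hypothetical placeholders:

```python
import nvtabular as nvt
from nvtabular.ops import Categorify, FillMissing, Normalize

# Declare a transformation graph with the >> operator; the column
# names here are hypothetical.
cat_features = ["user_id", "item_id"] >> Categorify()
cont_features = ["price", "user_age"] >> FillMissing() >> Normalize()
workflow = nvt.Workflow(cat_features + cont_features)

train = nvt.Dataset("train/*.parquet")   # hypothetical raw data
workflow.fit(train)                      # compute statistics on GPU
workflow.transform(train).to_parquet("processed/train/")
workflow.save("workflow/")               # reuse the same transforms at serving time
```

Because the fitted workflow is saved, the exact same transformations can be replayed at inference time, which is what the Merlin inference container relies on.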
Features
- NVTabular: Merlin NVTabular is a feature engineering and preprocessing library that offers high-speed transformation capabilities to manipulate terabytes of recommender system datasets and significantly reduce data preparation time (see the sketch in the ETL section above).
- HugeCTR: Merlin HugeCTR is a deep neural network training framework for recommender systems, written in highly efficient C++. The tool features multi-GPU and multi-node training, with both model-parallel and data-parallel scaling for maximum performance. In addition, it covers common recommender system architectures, including Wide and Deep, Deep Cross Network, and DeepFM.
- DLRM: The Deep Learning Recommendation Model (DLRM) is a recommendation architecture designed to use both categorical and numerical inputs. Alongside DLRM, Merlin’s training collection covers the Wide and Deep, Neural Collaborative Filtering, and Variational Autoencoder architectures. Optimised DLRM training implementations are available for both TensorFlow and PyTorch.
- NVIDIA TensorRT: TensorRT is an SDK for high-performance DL inference that includes a DL inference optimiser and runtime. The tool delivers low latency and high throughput for DL inference applications. In addition, it can automatically optimise the network architecture with vertical and horizontal layer fusion.
- Triton Server: The Triton Inference Server provides a cloud-inferencing solution optimised for NVIDIA GPUs and supports models from frameworks including PyTorch, TensorFlow, TensorRT, and Open Neural Network Exchange (ONNX) Runtime. The server automatically manages and leverages available GPUs and can serve multiple versions of a model.
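To illustrate the serving side, here is a hedged sketch of querying a model hosted on Triton over HTTP with the tritonclient package; the model name, tensor names, shapes, and dtypes are hypothetical and must match the deployed model’s configuration:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request tensor; name, shape, and dtype are placeholders.
item_ids = httpclient.InferInput("item_ids", [1, 50], "INT64")
item_ids.set_data_from_numpy(np.zeros((1, 50), dtype=np.int64))

response = client.infer(model_name="recommender", inputs=[item_ids])
scores = response.as_numpy("scores")  # hypothetical output tensor name
```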
TensorFlow Recommenders
Google’s TensorFlow Recommenders (TFRS) is an open-source TensorFlow package for building, evaluating, and serving sophisticated recommender models. Built on top of TensorFlow 2 and Keras, TFRS assists in building a recommender system all the way from data preparation through model formulation, training, and evaluation to deployment.
Architecture
The tool’s modular design allows programmers to customise individual layers and metrics while still composing them into a cohesive whole. Developers can create and assess flexible candidate nomination models, freely incorporate item, user, and context information into recommendation models, and train multi-task models that optimise multiple recommendation goals simultaneously. The wider TensorFlow ecosystem supports production deployment of models trained on huge datasets, with tf.data input pipelines and feature columns for transforming data, a standardised data format in TFRecord, and a dedicated inference server in TF Serving.
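To make the modular design concrete, here is a minimal sketch of a two-tower retrieval model in TFRS; the "user_id" and "item_id" features and the tower definitions are hypothetical placeholders:

```python
import tensorflow as tf
import tensorflow_recommenders as tfrs

class TwoTowerModel(tfrs.Model):
    def __init__(self, user_model, item_model, candidates):
        super().__init__()
        self.user_model = user_model  # Keras model: user features -> embedding
        self.item_model = item_model  # Keras model: item features -> embedding
        # The retrieval task bundles the training loss with top-K metrics
        # computed against the candidate corpus.
        self.task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=candidates.batch(128).map(item_model)))

    def compute_loss(self, features, training=False):
        user_embeddings = self.user_model(features["user_id"])
        item_embeddings = self.item_model(features["item_id"])
        return self.task(user_embeddings, item_embeddings)
```

Swapping a tower, a metric, or the task touches only the corresponding component, which is the point of the modular design.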
“TensorFlow Recommenders is grounded in years of experience in production recommender systems at Google. The goal is to make TensorFlow Recommenders an evolving platform, flexible enough for conducting academic research and highly scalable for building web-scale recommendation systems,” said Wei Wei, Developer Advocate, Google.
Features
TFRS v0.3.0 was released in 2020 with two important features:
- Built-in support for fast, scalable approximate retrieval: TFRS leverages ScaNN to build deep learning recommender models that retrieve the best candidates out of millions in milliseconds (see the retrieval sketch after this list).
- Modelling feature interactions: TFRS implements a Deep & Cross Network (DCN) that can learn explicit and bounded-degree cross features effectively. It starts with an input (embedding) layer, followed by a cross network that models explicit feature interactions, and ends with a deep network that models implicit feature interactions.
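As a sketch of the ScaNN-backed retrieval mentioned above, assuming a trained two-tower model like the one in the Architecture section and a tf.data corpus of items (both hypothetical; the ScaNN layer also requires the optional scann dependency):

```python
import tensorflow as tf
import tensorflow_recommenders as tfrs

# Build an approximate nearest-neighbour index over item embeddings;
# `model` and `items` come from the earlier two-tower sketch.
scann = tfrs.layers.factorized_top_k.ScaNN(model.user_model)
scann.index_from_dataset(
    items.batch(4096).map(lambda item: (item["item_id"], model.item_model(item))))

# Retrieve the top-10 items for a (hypothetical) user in one call.
scores, item_ids = scann(tf.constant(["user_42"]), k=10)
```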
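And here is a minimal sketch of the stacked DCN structure described above, using TFRS’s Cross layer; the input width of 64 is an arbitrary placeholder for the concatenated embedding size:

```python
import tensorflow as tf
import tensorflow_recommenders as tfrs

inputs = tf.keras.Input(shape=(64,))                 # concatenated feature embeddings
crossed = tfrs.layers.dcn.Cross()(inputs, inputs)    # explicit feature crosses
crossed = tfrs.layers.dcn.Cross()(inputs, crossed)   # a second cross layer
deep = tf.keras.layers.Dense(128, activation="relu")(crossed)  # implicit interactions
output = tf.keras.layers.Dense(1)(deep)
model = tf.keras.Model(inputs, output)
```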
With TFRS, you can:
- Build and evaluate flexible candidate nomination models
- Freely incorporate item, user, and context information into recommendation models
- Train multi-task models that jointly optimise multiple recommendation objectives
- Efficiently serve the resulting models using TensorFlow Serving
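To round off the serving point, a hedged sketch of exporting a ScaNN-backed index (such as the one built earlier) as a SavedModel for TensorFlow Serving; the path is a placeholder, and the namespace_whitelist option keeps the custom ScaNN ops in the export:

```python
import tensorflow as tf

# `scann` is the ScaNN retrieval layer from the earlier sketch.
tf.saved_model.save(
    scann, "export/recommender/1",
    options=tf.saved_model.SaveOptions(namespace_whitelist=["Scann"]))

# The SavedModel can be served by TensorFlow Serving and queried
# like any other model.
loaded = tf.saved_model.load("export/recommender/1")
scores, item_ids = loaded(tf.constant(["user_42"]))
```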