How to build a recommendation system using TensorFlow Ranking?

TensorFlow Ranking is a library from the TensorFlow team for building learning-to-rank (LTR) models, which are used to rank items in information retrieval systems.

Learning to rank is a machine learning approach to building scalable information retrieval systems, especially when the task is to find the items most similar or relevant to a given input. Recommender systems likewise find and present relevant items based on several characteristics. TensorFlow Ranking is a Python library for building learning-to-rank models. In this article, we will discuss how to use TensorFlow Ranking to build a recommendation system based on the learning-to-rank concept. The demonstration is inspired by the TensorFlow tutorials on TensorFlow Ranking. The major points to be discussed are listed below.

Table of contents

  1. What is TensorFlow Ranking
  2. Building recommendation system 
    1. Importing and preprocessing data 
    2. Defining model
    3. Compiling model 
    4. Fitting model
    5. Generating prediction   

What is TensorFlow Ranking?

TensorFlow Ranking is a library from the TensorFlow team for building learning-to-rank (LTR) models, that is, models that produce the rankings used by information retrieval systems. In LTR modelling, the training data consists of lists of items, and the items in each list come with a partial order. That partial order can be expressed as a numerical score or as a binary judgment for each item.
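
To make the idea of a partially ordered list concrete, the sketch below represents one training list as items with graded relevance labels and derives the pairwise preferences those labels imply. The item names and labels are made up for illustration, and this plain-Python structure is not TensorFlow Ranking's input format.

```python
from itertools import combinations

# One hypothetical training list: items for a single query (or user),
# each with a graded relevance label (higher is better).
ranked_list = {
    "movie_a": 5.0,
    "movie_b": 3.0,
    "movie_c": 3.0,
    "movie_d": 1.0,
}


def pairwise_preferences(labels):
    """Return (better, worse) pairs implied by the labels.

    Items with equal labels are left unordered, which is why the
    labels define only a partial order over the list.
    """
    prefs = []
    for a, b in combinations(labels, 2):
        if labels[a] > labels[b]:
            prefs.append((a, b))
        elif labels[b] > labels[a]:
            prefs.append((b, a))
    return prefs


preferences = pairwise_preferences(ranked_list)
print(preferences)
```

Here movie_b and movie_c share a label, so no preference is recorded between them, while every other pair is ordered.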

The main purpose of this type of model is to predict the ranking of new items in lists similar to those in the training data, and TensorFlow Ranking lets us build such models. These models are used in various tasks such as collaborative filtering, sentiment analysis, and personalized advertising. A possible architecture of such a model is shown in the following figure.

[Figure: a possible architecture of a learning-to-rank model]

The library also provides various modules that speed up the building of LTR models; these modules are built on top of the Keras API. Since LTR models are well suited to recommendation, we are going to use TensorFlow Ranking to build a recommendation system in this article. Before starting, we need to install the library, which can be done using the following line of code.

!pip install -q tensorflow-ranking

After installation, we are ready to implement it in our work.

Building recommendation system

In this section, we are going to build a recommendation system using the TensorFlow Ranking package: the model will rank movies for a user, and the top-ranked movies can then be recommended to that user.

Importing and preprocessing data

Here we are going to use the MovieLens 100K dataset, which can be loaded through the tensorflow_datasets module.

import tensorflow_datasets as tfds
ratings_data = tfds.load('movielens/100k-ratings', split="train")
fetures_data = tfds.load('movielens/100k-movies', split="train")

Selecting the features from the rating data.

ratings_data = ratings_data.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"],
    "user_rating": x["user_rating"]
})

Converting user_id and movie_title into integer indices.

import tensorflow as tf
from tensorflow.keras import layers

# Keep only the title from the movie features and the id from the ratings.
feature_data = fetures_data.map(lambda x: x["movie_title"])
users = ratings_data.map(lambda x: x["user_id"])

# StringLookup maps each distinct string to an integer index.
# (Older TensorFlow releases expose this layer only as
# layers.experimental.preprocessing.StringLookup.)
user_ids_vocabulary = layers.StringLookup(mask_token=None)
user_ids_vocabulary.adapt(users.batch(1000))

movie_titles_vocabulary = layers.StringLookup(mask_token=None)
movie_titles_vocabulary.adapt(feature_data.batch(1000))
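
What the two StringLookup layers do can be sketched in plain Python: collect the distinct strings, give each an integer index, and reserve index 0 for strings that were never seen. This is a simplified illustration with made-up user ids, not the Keras implementation; in particular, the tie-breaking order is an assumption of this sketch.

```python
from collections import Counter


def build_vocabulary(tokens):
    """Order distinct tokens by frequency, most frequent first."""
    counts = Counter(tokens)
    # Ties are broken by first appearance here; that is a choice made
    # for this sketch, not a statement about Keras.
    return [token for token, _ in counts.most_common()]


def lookup(vocabulary, token, oov_index=0):
    """Map a token to an integer index, reserving index 0 for unknowns."""
    index = {tok: i + 1 for i, tok in enumerate(vocabulary)}
    return index.get(token, oov_index)


user_ids = ["138", "92", "138", "301", "92", "138"]
vocab = build_vocabulary(user_ids)

print(vocab)                  # distinct ids, most frequent first
print(lookup(vocab, "138"))   # a known id maps to a positive index
print(lookup(vocab, "999"))   # an unseen id maps to the reserved index 0
```

Because one index is reserved for unknown strings, the number of indices is one more than the number of distinct strings, which is why an embedding table built on such a vocabulary needs that extra row.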

Grouping the ratings by user_id, so that each user has a list of rated movies.

key_func = lambda x: user_ids_vocabulary(x["user_id"])
reduce_func = lambda key, dataset: dataset.batch(100)
train = ratings_data.group_by_window(
    key_func=key_func, reduce_func=reduce_func, window_size=100)
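
The grouping step can be pictured with a small plain-Python sketch: records that share a key are collected into lists of at most window_size elements, and a user with more ratings than the window size ends up with more than one list. The function name and the toy ratings are hypothetical, and the order in which lists are emitted here is a choice of this sketch rather than a description of tf.data.

```python
from collections import defaultdict


def group_by_key_in_windows(records, key_func, window_size):
    """Collect records that share a key into lists of at most window_size."""
    open_windows = defaultdict(list)
    batches = []
    for record in records:
        key = key_func(record)
        open_windows[key].append(record)
        if len(open_windows[key]) == window_size:
            # The window for this key is full: emit it and start afresh.
            batches.append(open_windows.pop(key))
    # Flush the partially filled windows left at the end of the data.
    batches.extend(window for window in open_windows.values() if window)
    return batches


ratings = [
    {"user_id": "1", "movie_title": "A", "user_rating": 4.0},
    {"user_id": "2", "movie_title": "A", "user_rating": 2.0},
    {"user_id": "1", "movie_title": "B", "user_rating": 5.0},
    {"user_id": "1", "movie_title": "C", "user_rating": 3.0},
    {"user_id": "2", "movie_title": "D", "user_rating": 1.0},
]

lists = group_by_key_in_windows(ratings, lambda r: r["user_id"], window_size=2)
for user_list in lists:
    print([(r["user_id"], r["movie_title"]) for r in user_list])
```

With a window size of 2, user 1 has three ratings and therefore produces two lists, one full and one partial.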

Here we can check the shape of our data.

print(train)
for x in train.take(1):
  for key, value in x.items():
    print(f"Shape of {key}: {value.shape}")
    print(f"Example values of {key}: {value[:5].numpy()}")
    print()

Generating batches of labels and features.

from typing import Dict, Tuple
def _features_and_labels(
    x: Dict[str, tf.Tensor]) -> Tuple[Dict[str, tf.Tensor], tf.Tensor]:
  labels = x.pop("user_rating")
  return x, labels
 
 
train = train.map(_features_and_labels)
 
train = train.apply(
    tf.data.experimental.dense_to_ragged_batch(batch_size=32))

After the code above, each element of train holds ragged tensors of user ids and movie titles of shape [32, None], together with the matching ratings as labels. Let’s define the model.

Defining model

from tensorflow.keras import Model
class RankingModel(Model):
 
  def __init__(self, user_vocab, movie_vocab):
    super().__init__()
    self.user_vocab = user_vocab
    self.movie_vocab = movie_vocab
    self.user_embed = layers.Embedding(user_vocab.vocabulary_size(),
                                                64)
    self.movie_embed = layers.Embedding(movie_vocab.vocabulary_size(),
                                                 64)
 
  def call(self, features: Dict[str, tf.Tensor]) -> tf.Tensor:
 
    embeddings_user= self.user_embed(self.user_vocab(features["user_id"]))
    embeddings_movie = self.movie_embed(
        self.movie_vocab(features["movie_title"]))
 
    return tf.reduce_sum(embeddings_user * embeddings_movie, axis=2)

In the code above, we have defined a class whose constructor stores the user and movie vocabularies and creates an embedding layer for each, and whose call method defines how the ranking scores are calculated. The output is the dot product of the user embedding and the movie embedding for every movie in a list.
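
The scoring rule itself is small enough to work through by hand. The sketch below uses made-up 3-dimensional embeddings (the model above uses 64 dimensions) and computes one dot-product score per movie for a single user.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical embeddings, chosen only to keep the arithmetic readable.
user_embedding = [0.5, -1.0, 2.0]
movie_embeddings = {
    "movie_a": [1.0, 0.0, 1.0],
    "movie_b": [0.0, 2.0, 0.5],
    "movie_c": [2.0, 1.0, 0.0],
}

# One score per movie: a higher score means the movie ranks higher for this user.
scores = {title: dot(user_embedding, emb)
          for title, emb in movie_embeddings.items()}
print(scores)  # {'movie_a': 2.5, 'movie_b': -1.0, 'movie_c': 0.0}
```

Training adjusts the embedding values so that movies a user rated highly end up with larger dot products than the movies they rated poorly.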

Model Compilation

import tensorflow_ranking as tfr
from tensorflow.keras import optimizers
model = RankingModel(user_ids_vocabulary, movie_titles_vocabulary)
optimizer = optimizers.Adagrad(0.5)
loss = tfr.keras.losses.get(
    loss=tfr.keras.losses.RankingLossKey.SOFTMAX_LOSS, ragged=True)
eval_metrics = [
    tfr.keras.metrics.get(key="ndcg", name="metric/ndcg", ragged=True),
    tfr.keras.metrics.get(key="mrr", name="metric/mrr", ragged=True)
]
model.compile(optimizer=optimizer, loss=loss, metrics=eval_metrics)

In the code above, we have used a ranking loss from the TensorFlow Ranking package to train the model and ranking metrics from the same package to evaluate it. The metrics we specified are normalized discounted cumulative gain (NDCG) and mean reciprocal rank (MRR).
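
For readers new to these two metrics, here is a minimal plain-Python sketch of how they can be computed for a single list. It assumes the common definitions (gain 2^label - 1, a log2 rank discount, and a relevance threshold of 1 for MRR); the library's defaults and its handling of ties or padding may differ in detail.

```python
import math


def rank_labels_by_score(labels, scores):
    """Return the labels reordered from highest score to lowest."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def dcg(labels_in_ranked_order):
    """Discounted cumulative gain: gain 2^label - 1, discount log2(rank + 1)."""
    return sum((2 ** label - 1) / math.log2(rank + 1)
               for rank, label in enumerate(labels_in_ranked_order, start=1))


def ndcg(labels, scores):
    """DCG of the score-induced ranking divided by the best achievable DCG."""
    ideal_dcg = dcg(sorted(labels, reverse=True))
    if ideal_dcg == 0:
        return 0.0
    return dcg(rank_labels_by_score(labels, scores)) / ideal_dcg


def mrr(labels, scores, threshold=1):
    """Reciprocal rank of the first item whose label reaches the threshold."""
    for rank, label in enumerate(rank_labels_by_score(labels, scores), start=1):
        if label >= threshold:
            return 1.0 / rank
    return 0.0


# The only relevant item is scored second best.
labels = [0, 1, 0]
scores = [0.9, 0.5, 0.1]
print(round(ndcg(labels, scores), 4))  # 0.6309
print(mrr(labels, scores))             # 0.5
```

Both metrics equal 1.0 when the model ranks the most relevant items first, and they decrease as relevant items slide down the list.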

Model fitting 

history = model.fit(train, epochs=9)

In the above, we have trained our compiled model on the data for 9 epochs. Let’s check the history of the model.

history.history

In the history, we can see that the loss values are high. This is because we are using a ranking-specific softmax loss, which is different from the softmax loss used in classification problems. Instead of picking a single correct class, this loss promotes all the relevant items in a ranking list so that they have a better chance of being ranked above the irrelevant ones.
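
A rough calculation shows why a listwise softmax loss produces large numbers. The sketch below assumes the loss for one list is the cross-entropy between the raw labels and the softmax of the scores, that is, the negative sum of label times log-probability over the list. This is an assumed simplified form, and the library's exact normalization and weighting may differ.

```python
import math


def listwise_softmax_loss(labels, scores):
    """-sum_i labels[i] * log(softmax(scores)[i]) for one list."""
    max_score = max(scores)  # subtract the max for numerical stability
    log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return -sum(y * (s - log_norm) for y, s in zip(labels, scores))


# A hypothetical list of 100 movies, each rated 3.0, all scored
# identically, as an untrained model might do.
labels = [3.0] * 100
scores = [0.0] * 100
loss = listwise_softmax_loss(labels, scores)
print(round(loss, 1))
```

Under this assumption the loss for the list is 300 * ln(100), which is in the thousands before any training, because every rated item contributes its rating times a log-probability taken over a long list.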

Generating prediction 

for movie_titles in feature_data.batch(2000):
  break
 
inputs = {
    "user_id":
        tf.expand_dims(tf.repeat("26", repeats=movie_titles.shape[0]), axis=0),
    "movie_title":
        tf.expand_dims(movie_titles, axis=0)
}
 
scores = model(inputs)
titles = tfr.utils.sort_by_scores(scores,
                                  [tf.expand_dims(movie_titles, axis=0)])[0]
print(f"Top 10 recommendations for user 26: {titles[0, :10]}")

In the code above, we have taken a batch of movie titles and built an input that pairs every title with user 26. The model scores each movie for this user, and sorting the titles by those scores gives the top 10 recommendations for user 26.
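
The final sorting step can be sketched in plain Python as follows. The helper name, titles, and scores are hypothetical; the point is only that recommending means sorting titles by score and keeping the first k.

```python
def top_k_by_score(titles, scores, k):
    """Return the k titles with the highest scores, best first."""
    ranked = sorted(zip(scores, titles), key=lambda pair: pair[0], reverse=True)
    return [title for _, title in ranked[:k]]


titles = ["movie_a", "movie_b", "movie_c", "movie_d"]
scores = [0.2, 1.7, -0.4, 0.9]
print(top_k_by_score(titles, scores, k=2))  # ['movie_b', 'movie_d']
```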

Final words

In this article, we have discussed TensorFlow Ranking, a library from the TensorFlow team for learning-to-rank modelling. Using this library, we have built a recommendation system on the MovieLens dataset.

Yugesh Verma
Yugesh is a graduate in automobile engineering and worked as a data analyst intern. He completed several Data Science projects. He has a strong interest in Deep Learning and writing blogs on data science and machine learning.