Machine learning-based ranking is an approach to building scalable information retrieval systems, especially when the task is to find the items most relevant to a given input. Recommender systems likewise find and present similar items based on several characteristics. TensorFlow Ranking is a Python library that helps in building learning-to-rank machine learning models. In this article, we will discuss how we can use TensorFlow Ranking to build a recommendation system based on the learning-to-rank concept. The demonstration here is inspired by the TensorFlow tutorials on TensorFlow Ranking. The major points to be discussed in the article are listed below.
Table of contents
- What is TensorFlow Ranking
- Building recommendation system
- Importing and preprocessing data
- Defining model
- Compiling model
- Fitting model
- Generating prediction
What is TensorFlow Ranking?
TensorFlow Ranking is an implementation from TensorFlow that helps us build learning-to-rank (LTR) models. LTR models are models that construct a ranking over a list of items for an information retrieval system. In LTR modelling, the training data consists of lists of items connected by some partial order, and that partial order can be represented as a numerical score or a binary judgment for each item.
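To make the idea of a partial order concrete, a single listwise training example can be pictured as a context (a query or a user) paired with a list of candidate items and a relevance label for each item. The names and values below are purely illustrative and not taken from any real dataset.

# A toy, hand-made example of listwise LTR training data: one query/user,
# a list of candidate items, and a graded relevance score per item.
training_example = {
    "query": "user_42",                       # the context we rank for
    "items": ["item_a", "item_b", "item_c"],  # candidates to be ordered
    "relevance": [3.0, 0.0, 1.0],             # implies item_a > item_c > item_b
}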
The main purpose of this type of model is to predict a ranking of new items that is consistent with the training data, and TensorFlow Ranking lets us produce such models. These models also find their uses in various tasks such as collaborative filtering, sentiment analysis, and personalized advertising. A possible architecture of such a model is shown in the following figure.


This implementation also provides various modules to speed up the building of LTR models; in the background, these modules work on top of the Keras API. Since LTR models have their applications in generating recommendation systems, in this article we are going to use TensorFlow Ranking for making a recommendation system. Before starting the procedure, we are required to install the library, which can be done using the following line of code.
!pip install -q tensorflow-ranking
After installation, we are ready to implement it in our work.
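Before moving on, a quick optional check that the installation worked is simply to import the library alongside TensorFlow; an error-free import is enough confirmation.

# Optional sanity check: if these imports succeed, the installation is fine.
import tensorflow as tf
import tensorflow_ranking as tfr
print(tf.__version__)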
Building recommendation system
In this article, we are going to make a recommendation system using the TensorFlow Ranking package, so that we can utilize the model to rank movies and then recommend them to a user.
Importing and preprocessing data
Here in the article, we are going to use the MovieLens dataset for making the recommendation system; it can be loaded from the tensorflow_datasets module.
import tensorflow_datasets as tfds
ratings_data = tfds.load('movielens/100k-ratings', split="train")
features_data = tfds.load('movielens/100k-movies', split="train")
Output:


Selecting the features from the rating data.
ratings_data = ratings_data.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"],
    "user_rating": x["user_rating"]
})
Converting user_ids and movie_titles into integer indices.
import tensorflow as tf
from tensorflow.keras import layers

movie_titles_data = features_data.map(lambda x: x["movie_title"])
users = ratings_data.map(lambda x: x["user_id"])

user_ids_vocabulary = layers.experimental.preprocessing.StringLookup(
    mask_token=None)
user_ids_vocabulary.adapt(users.batch(1000))

movie_titles_vocabulary = layers.experimental.preprocessing.StringLookup(
    mask_token=None)
movie_titles_vocabulary.adapt(movie_titles_data.batch(1000))
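As an optional check that is not part of the original walkthrough, we can query the adapted vocabularies directly; the title used below is just one example from the MovieLens catalogue.

# Optional check: the adapted lookups map raw strings to integer indices.
print(user_ids_vocabulary.vocabulary_size())
print(movie_titles_vocabulary(tf.constant(["Toy Story (1995)"])))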
Group by user_id.
key_func = lambda x: user_ids_vocabulary(x["user_id"])
reduce_func = lambda key, dataset: dataset.batch(100)
train = ratings_data.group_by_window(
    key_func=key_func, reduce_func=reduce_func, window_size=100)
Here we can check the shape of our data.
print(train)
for x in train.take(1):
    for key, value in x.items():
        print(f"Shape of {key}: {value.shape}")
        print(f"Example values of {key}: {value[:5].numpy()}")
        print()
Output:


Generating batches of features and labels
from typing import Dict, Tuple

def _features_and_labels(
        x: Dict[str, tf.Tensor]) -> Tuple[Dict[str, tf.Tensor], tf.Tensor]:
    labels = x.pop("user_rating")
    return x, labels

train = train.map(_features_and_labels)
train = train.apply(
    tf.data.experimental.dense_to_ragged_batch(batch_size=32))
In the above code, the train dataset now yields ragged tensors of user ids and movie titles of shape [32, None], along with the corresponding user ratings as labels.
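As an optional check (not in the original tutorial), we can pull one batch and confirm the ragged structure before building the model.

# Each feature is a tf.RaggedTensor of shape [32, None], since every user in
# the batch has rated a different number of movies.
for features, labels in train.take(1):
    print(type(features["movie_title"]).__name__)  # RaggedTensor
    print(features["movie_title"].shape)           # (32, None)
    print(labels.shape)                            # (32, None)

With the data pipeline ready, we can now define the model.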
Defining model
from tensorflow.keras import Model

class RankingModel(Model):

    def __init__(self, user_vocab, movie_vocab):
        super().__init__()
        # Vocabularies map raw string ids to integer indices.
        self.user_vocab = user_vocab
        self.movie_vocab = movie_vocab
        # Embeddings turn those integer indices into 64-dimensional vectors.
        self.user_embed = layers.Embedding(user_vocab.vocabulary_size(), 64)
        self.movie_embed = layers.Embedding(movie_vocab.vocabulary_size(), 64)

    def call(self, features: Dict[str, tf.Tensor]) -> tf.Tensor:
        embeddings_user = self.user_embed(self.user_vocab(features["user_id"]))
        embeddings_movie = self.movie_embed(
            self.movie_vocab(features["movie_title"]))
        # Score each (user, movie) pair by the dot product of its embeddings.
        return tf.reduce_sum(embeddings_user * embeddings_movie, axis=2)
Here in the above code, we have defined a model class whose constructor sets up the user and movie vocabularies and their embeddings, and whose call method defines how the ranking score is calculated. The outcome is the dot product of the user embedding and the movie embedding for each candidate movie.
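As an optional sanity check that is not part of the original walkthrough, we can instantiate the class and run it on a single batch to confirm that it returns one score per movie in each user's list; the sanity_model name here is purely illustrative.

# Hypothetical check: the output is ragged with shape [32, None], one score
# per candidate movie for each user in the batch.
sanity_model = RankingModel(user_ids_vocabulary, movie_titles_vocabulary)
for features, labels in train.take(1):
    print(sanity_model(features).shape)  # (32, None)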
Compiling model
import tensorflow_ranking as tfr
from tensorflow.keras import optimizers

model = RankingModel(user_ids_vocabulary, movie_titles_vocabulary)
optimizer = optimizers.Adagrad(0.5)
loss = tfr.keras.losses.get(
    loss=tfr.keras.losses.RankingLossKey.SOFTMAX_LOSS, ragged=True)
eval_metrics = [
    tfr.keras.metrics.get(key="ndcg", name="metric/ndcg", ragged=True),
    tfr.keras.metrics.get(key="mrr", name="metric/mrr", ragged=True)
]
model.compile(optimizer=optimizer, loss=loss, metrics=eval_metrics)
In the above, we have used a ranking loss for training the model and ranking metrics for evaluating it, both from the TensorFlow Ranking package. For the metrics, we specified normalized discounted cumulative gain (NDCG) and mean reciprocal rank (MRR).
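To build some intuition for what these metrics measure, here is a small standalone sketch with made-up labels and scores: NDCG rewards placing highly relevant items near the top of the list, while MRR looks only at the rank of the first relevant item.

# Illustrative values only: three items with graded relevance labels and
# model scores that rank them in an imperfect order.
example_labels = tf.constant([[3.0, 1.0, 0.0]])
example_scores = tf.constant([[0.2, 2.0, 1.0]])
print(float(tfr.keras.metrics.NDCGMetric()(example_labels, example_scores)))
print(float(tfr.keras.metrics.MRRMetric()(example_labels, example_scores)))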
Fitting model
history = model.fit(train, epochs=9)
Output:


In the above, we have trained our compiled model on the data for 9 epochs. Let's check the training history of the model.
history.history


In the history, we can see that the loss values from the model are quite high. This is because we are using a ranking-specific softmax loss, which is computed over an entire list of items rather than over single examples as in classification problems. This loss encourages relevant items in the ranking list to be scored higher than irrelevant items.
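To see where the magnitude comes from, here is a rough sketch using the same made-up labels and scores as in the metrics example above: the listwise softmax loss is, roughly speaking, the label-weighted negative log-softmax of the scores summed over a whole list, so with graded ratings and lists of up to 100 movies per user the values are naturally much larger than a per-example classification loss.

# Rough, standalone illustration with made-up values.
example_labels = tf.constant([[3.0, 1.0, 0.0]])
example_scores = tf.constant([[0.2, 2.0, 1.0]])
print(float(tfr.keras.losses.SoftmaxLoss()(example_labels, example_scores)))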
Generating prediction
# Score every movie in the catalogue for a single user (id "26").
for movie_titles in movie_titles_data.batch(2000):
    break

inputs = {
    "user_id":
        tf.expand_dims(tf.repeat("26", repeats=movie_titles.shape[0]), axis=0),
    "movie_title":
        tf.expand_dims(movie_titles, axis=0)
}

scores = model(inputs)
titles = tfr.utils.sort_by_scores(scores,
                                  [tf.expand_dims(movie_titles, axis=0)])[0]
print(f"Top 10 recommendations for user 26: {titles[0, :10]}")
Output:


Here in the above code, we have taken a batch containing all of the movie titles and built an input that pairs each title with user id "26". Using these inputs, the model produced scores, and sorting the titles by those scores gives the 10 movies recommended for user 26.
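Since the recommended titles come back as byte strings, a small optional post-processing step makes the list easier to read.

# Optional: decode the byte-string titles for a cleaner display.
readable_titles = [title.decode("utf-8") for title in titles[0, :10].numpy()]
print(readable_titles)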
Final words
In this article, we have discussed TensorFlow Ranking, an implementation from TensorFlow for learning-to-rank modelling. Using this library, we have built a recommendation system on the MovieLens dataset.