Unlike standard classification models, which classify one item at a time, learning-to-rank (LTR) models take an entire list of items as input and learn an ordering that maximises the utility of the whole list. Search and recommendation systems have long been the most popular applications of LTR models. Since its launch in 2018, TF-Ranking has been applied in diverse domains beyond search, including e-commerce, smart city planning, and SAT solvers.
More precisely, the goal of LTR is to learn a function f() that takes a list of items (songs, documents, products, movies, etc.) as input and outputs those items in the optimal order, i.e. in descending order of relevance. In the image below, the shade of green indicates each item's relevance level, and the red item marked with ‘X’ is non-relevant.
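To make the idea concrete, here is a minimal Python sketch that assumes a hypothetical linear scoring function and a toy list of items. It scores each item, sorts the list by descending score, and then uses Normalized Discounted Cumulative Gain (NDCG), one common measure of list utility, to check how close that ordering is to the ideal one. The names `score`, `rank` and `ndcg` and all values are illustrative only and are not part of TF-Ranking.

```python
import math


def score(features, weights):
    """Hypothetical linear scoring function f: one real-valued score per item."""
    return sum(w * x for w, x in zip(weights, features))


def rank(items, weights):
    """Return item names sorted by descending score."""
    scored = [(score(feats, weights), name) for name, feats in items.items()]
    return [name for _, name in sorted(scored, reverse=True)]


def dcg(gains):
    """Discounted cumulative gain: exponential gain, log2 position discount."""
    return sum((2 ** g - 1) / math.log2(pos + 2) for pos, g in enumerate(gains))


def ndcg(ranking, labels):
    """NDCG: DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg(sorted(labels.values(), reverse=True))
    actual = dcg([labels[name] for name in ranking])
    return actual / ideal if ideal > 0 else 0.0


# Toy list: two features per item, graded relevance labels (2 = most relevant).
items = {"doc_a": [0.2, 1.0], "doc_b": [0.9, 0.3], "doc_c": [0.1, 0.1]}
labels = {"doc_a": 1, "doc_b": 2, "doc_c": 0}

ranking = rank(items, weights=[1.0, 0.5])
print(ranking)                # ['doc_b', 'doc_a', 'doc_c']
print(ndcg(ranking, labels))  # 1.0 (matches the ideal order)
```

An LTR model's job is to learn the scoring weights so that orderings like this one score highly on a list-level metric, rather than getting each item's label right in isolation.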
Introducing the latest version of TF-Ranking
In May this year, Google released the latest version of TF-Ranking, which fully supports building LTR models natively with Keras, the high-level API of TensorFlow 2. Its native Keras ranking model consists of a flexible ModelBuilder, a DatasetBuilder to set up the training data, and a Pipeline to train the model on the provided dataset. The source code is available on GitHub. Together, these components make it easier than ever to build a customised LTR model and support rapid exploration of new model structures for research and production.
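The division of labour between these three components can be illustrated with a plain-Python analogue. This is a sketch under stated assumptions: `ToyDatasetBuilder`, `LinearModelBuilder` and `ToyPipeline` are hypothetical stand-ins that only mirror the roles described above and do not reproduce TF-Ranking's actual classes or signatures. The model is a linear scorer trained by plain gradient descent on a pairwise logistic loss, chosen here as one simple example of a ranking loss.

```python
import math


class ToyDatasetBuilder:
    """Hypothetical stand-in for a dataset builder: supplies ranked lists."""

    def __init__(self, lists):
        # Each entry is (list of per-item feature vectors, list of graded labels).
        self.lists = lists

    def build(self):
        return list(self.lists)


class LinearModelBuilder:
    """Hypothetical stand-in for a model builder: creates a linear scorer."""

    def __init__(self, num_features):
        self.num_features = num_features

    def build(self):
        return [0.0] * self.num_features  # one weight per feature


def pairwise_logistic_loss(weights, feats, labels):
    """Pairwise logistic loss for one list and its gradient w.r.t. the weights."""
    scores = [sum(w * x for w, x in zip(weights, f)) for f in feats]
    loss, grad = 0.0, [0.0] * len(weights)
    for i in range(len(feats)):
        for j in range(len(feats)):
            if labels[i] > labels[j]:  # item i should be ranked above item j
                diff = scores[i] - scores[j]
                loss += math.log1p(math.exp(-diff))
                coeff = -1.0 / (1.0 + math.exp(diff))  # d(loss)/d(diff)
                for k in range(len(weights)):
                    grad[k] += coeff * (feats[i][k] - feats[j][k])
    return loss, grad


class ToyPipeline:
    """Hypothetical stand-in for a pipeline: wires the builders together and trains."""

    def __init__(self, model_builder, dataset_builder, learning_rate=0.1, epochs=100):
        self.model_builder = model_builder
        self.dataset_builder = dataset_builder
        self.learning_rate = learning_rate
        self.epochs = epochs

    def total_loss(self, weights):
        data = self.dataset_builder.build()
        return sum(pairwise_logistic_loss(weights, f, y)[0] for f, y in data)

    def train(self):
        weights = self.model_builder.build()
        data = self.dataset_builder.build()
        for _ in range(self.epochs):
            for feats, labels in data:  # one gradient step per list
                _, grad = pairwise_logistic_loss(weights, feats, labels)
                weights = [w - self.learning_rate * g for w, g in zip(weights, grad)]
        return weights


# Toy data: feature 0 tracks relevance, feature 1 is mostly noise.
dataset = ToyDatasetBuilder([
    ([[0.9, 0.1], [0.5, 0.8], [0.1, 0.4]], [2, 1, 0]),
    ([[0.8, 0.7], [0.2, 0.9]], [1, 0]),
])
pipeline = ToyPipeline(LinearModelBuilder(num_features=2), dataset)
weights = pipeline.train()
print(weights, pipeline.total_loss(weights))
```

Keeping data loading, model construction and training behind separate interfaces is what allows one piece, such as the scorer, to be swapped without touching the others.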
The image below shows the key improvements in the latest version of TF-Ranking: the blue modules are provided by TF-Ranking, and the green modules are customisable.
LTR with TFR-BERT
Pre-trained language models such as BERT have recently achieved state-of-the-art (SOTA) performance on various language understanding tasks. To harness the expressiveness of these models, Google's latest blog post demonstrated how TF-Ranking can implement a novel TFR-BERT architecture, which combines BERT with learning to rank (LTR) to optimise the ordering of list inputs.
For example, consider a query and a list of n documents to rank in response to it. Instead of learning an independent BERT representation for each <query, document> pair, LTR models apply a ranking loss to jointly learn a BERT representation that maximises the utility of the entire ranked list with respect to the ground-truth labels.
The image below illustrates this process. First, the list of n documents to rank for a query is flattened into a list of <query, document> tuples. Next, these tuples are fed into a pre-trained language model (BERT). Finally, the BERT outputs collected for the entire document list are fine-tuned jointly with one of the specialised ranking losses available in TF-Ranking.
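The three steps can be sketched in plain Python. Everything here is an assumption for illustration: `toy_encoder_score` is a hypothetical word-overlap stand-in for BERT plus a scoring head, and listwise softmax cross-entropy is used as one example of a ranking loss. Because the stand-in has no trainable parameters, the sketch shows only the forward pass. In TFR-BERT, the gradient of the list-level loss would flow back into BERT during fine-tuning.

```python
import math


def flatten(query, documents):
    """Step 1: turn one query and its n documents into n <query, document> tuples."""
    return [(query, doc) for doc in documents]


def toy_encoder_score(query, document):
    """Step 2 (hypothetical stand-in for BERT plus a scoring head): the fraction
    of query terms that also occur in the document. A real TFR-BERT model would
    feed the concatenated pair through a pre-trained language model instead."""
    q_terms = set(query.lower().split())
    d_terms = set(document.lower().split())
    return len(q_terms & d_terms) / len(q_terms)


def softmax_cross_entropy(scores, labels):
    """Step 3: one listwise loss over the whole list. Labels are normalised into
    a target distribution and compared with softmax(scores)."""
    total = sum(labels)
    if total == 0:
        return 0.0  # no relevant documents, nothing to learn from this list
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))  # stable log-sum-exp
    return -sum((y / total) * (s - log_z) for s, y in zip(scores, labels))


query = "neural ranking models"
documents = ["neural ranking with bert", "cooking pasta at home", "learning to rank models"]
labels = [2, 0, 1]  # graded relevance for each document

pairs = flatten(query, documents)                     # n tuples
scores = [toy_encoder_score(q, d) for q, d in pairs]  # one score per tuple
loss = softmax_cross_entropy(scores, labels)          # one loss for the whole list
print(scores, loss)
```

The key design point is that the loss is computed once per list, so every document's score is pushed up or down relative to the others rather than towards a fixed per-pair target.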
“Our experience shows that this ‘TFR-BERT architecture’ delivers significant improvements in pre-trained language model performance, leading to SOTA performance for several popular ranking tasks, especially when multiple pre-trained language models are ensembled,” according to Google.
Interpretability and transparency of LTR models
Transparency and interpretability are essential when deploying LTR models in ranking systems.
In such systems, the contribution of each feature to the final ranking should be examinable and understandable to ensure transparency, accountability, and fairness of the outcomes.
This can be achieved with generalised additive models (GAMs), intrinsically interpretable ML models that are composed of a linear sum of smooth functions of individual features. GAMs have traditionally been studied on classification and regression tasks, but it has been less clear how to apply them to ranking.
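For reference, the standard GAM form is shown below. Here g is a link function, β₀ is an intercept, and each f_j is a smooth function of the single feature x_j. Because the prediction is a sum of per-feature terms, each f_j can be plotted and inspected on its own.

```latex
g\big(\mathbb{E}[y \mid \mathbf{x}]\big) = \beta_0 + \sum_{j=1}^{d} f_j(x_j)
```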
Last year, Google developed the neural ranking GAM, an extension of generalised additive models to ranking problems. In the image below, for instance, a neural ranking GAM makes visible how relevance, price and distance, in the context of a given user device, contribute to a hotel's final ranking. Neural ranking GAMs are now available as part of TF-Ranking.
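The hotel example can be mimicked with a small additive scorer. This sketch assumes one possible way to let context such as the device modulate each feature: multiply each feature's shape function by a device-dependent weight. The feature names follow the example in the text, but every function and number is hypothetical. In a neural ranking GAM, the shape functions (and any context weighting) would be learned sub-networks, not hand-written formulas.

```python
import math

# Hypothetical per-feature shape functions f_j. In a neural ranking GAM each
# would be a small learned sub-network; here they are fixed formulas.
SHAPE_FUNCTIONS = {
    "relevance": lambda x: 2.0 * x,             # more relevant, higher score
    "price": lambda x: -math.log1p(x / 100.0),  # pricier, lower score
    "distance": lambda x: -0.5 * x,             # farther away, lower score
}

# Hypothetical context weights: how strongly each feature counts per device.
CONTEXT_WEIGHTS = {
    "mobile": {"relevance": 1.0, "price": 0.5, "distance": 2.0},
    "desktop": {"relevance": 1.0, "price": 1.5, "distance": 0.5},
}


def contributions(hotel, device):
    """Per-feature contribution w_j(device) * f_j(x_j): the interpretable part."""
    weights = CONTEXT_WEIGHTS[device]
    return {name: weights[name] * f(hotel[name]) for name, f in SHAPE_FUNCTIONS.items()}


def gam_score(hotel, device):
    """The ranking score is simply the sum of the per-feature contributions."""
    return sum(contributions(hotel, device).values())


def rank_hotels(hotels, device):
    return sorted(hotels, key=lambda name: gam_score(hotels[name], device), reverse=True)


hotels = {
    "hotel_a": {"relevance": 0.8, "price": 300.0, "distance": 0.5},
    "hotel_b": {"relevance": 0.7, "price": 120.0, "distance": 3.0},
}
print(rank_hotels(hotels, "mobile"))   # ['hotel_a', 'hotel_b']
print(rank_hotels(hotels, "desktop"))  # ['hotel_b', 'hotel_a']
print(contributions(hotels["hotel_b"], "desktop"))
```

Because the score is a plain sum, the `contributions` dictionary explains exactly why the order flips between devices: on mobile, distance is weighted heavily, so the nearby hotel wins, while on desktop, price dominates.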
Neural ranking vs gradient boosting
Since launching TF-Ranking, the team has significantly deepened its understanding of how best to leverage neural models for ranking with numerical features, a setting in which gradient boosted decision trees (GBDTs) such as LambdaMART have remained the baseline to beat on various open LTR datasets.
This work culminated in a data augmented self-attentive latent cross (DASALC) model, described in an ICLR 2021 paper, which is the first neural ranking model to establish parity with, and in some cases statistically significant improvements over, strong LambdaMART baselines on open LTR datasets.
“This achievement is made possible through a combination of techniques, including data augmentation, self-attention for modeling document interactions, neural feature transformation, listwise ranking loss, and model ensembling similar to boosting in GBDTs,” according to a Google blog post. The DASALC architecture was implemented entirely with the TF-Ranking library.
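Several of these ingredients can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the DASALC implementation. It uses a symmetric log1p feature transformation, Gaussian-noise data augmentation at training time, single-head self-attention with identity projections across the documents in a list, and a latent-cross step assumed here to take the form (1 + a) ⊙ h, where a is the attention output and h is the document's own encoding. The linear `head` weights are hypothetical, and the listwise loss and ensembling are omitted.

```python
import math
import random


def log1p_transform(x):
    """Symmetric log1p transform: compresses heavy-tailed numerical features."""
    return math.copysign(math.log1p(abs(x)), x)


def augment(docs, rng, stddev=0.1):
    """Data augmentation: add Gaussian noise to every transformed feature."""
    return [[v + rng.gauss(0.0, stddev) for v in doc] for doc in docs]


def self_attention(docs):
    """Single-head scaled dot-product self-attention across the documents of
    one list, with identity query/key/value projections for brevity."""
    dim = len(docs[0])
    outputs = []
    for query in docs:
        logits = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim) for key in docs]
        top = max(logits)
        weights = [math.exp(logit - top) for logit in logits]
        total = sum(weights)
        outputs.append([sum(w * doc[i] for w, doc in zip(weights, docs)) / total
                        for i in range(dim)])
    return outputs


def latent_cross(encodings, attended):
    """Latent cross, assumed here as (1 + a) * h element-wise, so list-level
    context from self-attention modulates each document's own encoding."""
    return [[h * (1.0 + a) for h, a in zip(h_row, a_row)]
            for h_row, a_row in zip(encodings, attended)]


def score_list(raw_docs, head, training=False, rng=None):
    """Transform, optionally augment, attend, cross, then apply a linear head."""
    docs = [[log1p_transform(v) for v in doc] for doc in raw_docs]
    if training:
        docs = augment(docs, rng or random.Random())
    attended = self_attention(docs)
    crossed = latent_cross(docs, attended)
    return [sum(w * v for w, v in zip(head, doc)) for doc in crossed]


raw_docs = [[120.0, 3.0], [5.0, 0.0], [40.0, -1.0]]  # hypothetical numeric features
head = [0.5, 1.0]                                     # hypothetical trained weights
print(score_list(raw_docs, head))                                       # inference
print(score_list(raw_docs, head, training=True, rng=random.Random(0)))  # noisy training pass
```

Noise is applied only when `training=True`, so inference remains deterministic while training sees slightly perturbed copies of each list.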
“We believe that the new ‘Keras-based TF-Ranking’ version will make it easier to conduct neural LTR research and deploy production-grade ranking systems,” according to Google.