Guide to PyTerrier: A Python Framework for Information Retrieval

The PyTerrier framework offers different pipelines as Python classes for building an end-to-end, scalable Information Retrieval system.

Information Retrieval (IR) is one of the key tasks in many natural language processing applications. It is the process of searching for and collecting information from databases or other resources based on queries or requirements. The fundamental elements of an Information Retrieval system are the query and the document: the query expresses the user’s information need, and the document is the resource that contains the information. An efficient IR system collects the required information accurately from the documents in a compute-effective manner.
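To make these elements concrete, here is a toy sketch in plain Python (an illustration of the query/document idea, not part of PyTerrier) that matches a query against documents through an inverted index:

```python
from collections import defaultdict

# A toy corpus: document id -> text
docs = {
    1: "information retrieval with python",
    2: "deep learning for natural language processing",
    3: "python frameworks for information retrieval systems",
}

# Build an inverted index: term -> set of document ids containing it
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def retrieve(query):
    """Return ids of documents matching any query term, ranked by
    the number of distinct query terms they contain."""
    scores = defaultdict(int)
    for term in query.split():
        for doc_id in index.get(term, set()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))

print(retrieve("python retrieval"))  # [1, 3] — both contain both terms
```

Real IR systems replace the simple term-overlap count with weighting models such as TF-IDF or BM25, which the PyTerrier examples below use.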

Popular Information Retrieval frameworks are mostly written in Java, Scala, C++ and C. Though they are adaptable from many languages, end-to-end evaluation of Python-based IR models is a tedious process that needs many configuration adjustments. Further, reproducing an IR workflow under different environments is practically not possible with the available frameworks.



Machine Learning relies heavily on the high-level Python language. Deep learning models are almost always built on one of two Python frameworks: TensorFlow or PyTorch. Though most natural language processing applications are now built on top of Python frameworks and libraries, there has been no well-adapted Python framework for Information Retrieval tasks. Hence the need for a Python-based Information Retrieval framework that supports end-to-end experimentation with reproducible results and model comparisons.

PyTerrier & its Architecture

Craig Macdonald of the University of Glasgow and Nicola Tonellotto of the University of Pisa have introduced a Python framework, named PyTerrier, for Information Retrieval. The framework provides different pipelines as Python classes for Information Retrieval tasks such as retrieval, Learn-to-Rank re-ranking, query rewriting, indexing, feature extraction and neural re-ranking. An end-to-end Information Retrieval system can easily be built from these pre-established pipeline elements. Moreover, an IR architecture built this way can later be scaled or extended as requirements grow.

A typical model comparison experiment for two different IR models (Source)


An experiment architecture for comparing two different Information Retrieval models has many key components, such as ranked retrieval, fusion, feature extraction, LTR (Learn-to-Rank) re-ranking and neural re-ranking. The workflow is represented as a directed acyclic graph (DAG) with complex operation sequences. The PyTerrier framework helps express such a complex DAG as an end-to-end trainable pipeline.

PyTerrier & its Key Objects

PyTerrier is a declarative framework with two key objects: IR transformers and IR operators. A transformer is an object that maps an input array of queries (and possibly documents) to an output array of documents; operators combine transformers into pipelines.

The Transformer Classes of PyTerrier. Q and R represent the input query and the input document, respectively. An element provided in parentheses is optional (Source).

The basic retrieval process in PyTerrier, for example, reduces to applying a retrieval transformer to a set of queries: given an input query set Q, the transformer returns the retrieved result set R′. Thus, a complex IR task can be expressed in a few lines of Python. PyTerrier also provides operator overloading of conventional math operators so that transformers can be combined into custom IR pipelines.
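The operator-overloading idea can be mimicked in a few lines of plain Python — a simplified sketch of the concept, not PyTerrier’s actual classes — where `>>` chains two transformers ("then") and `%` applies a rank cutoff:

```python
class Transformer:
    """Toy stand-in for a PyTerrier-style transformer: an object that
    maps an input frame (here, a plain list) to an output frame."""
    def __init__(self, fn):
        self.fn = fn

    def transform(self, data):
        return self.fn(data)

    def __rshift__(self, other):
        # a >> b : feed a's output into b ("then" composition)
        return Transformer(lambda data: other.transform(self.transform(data)))

    def __mod__(self, k):
        # t % k : keep only the top-k results (rank cutoff)
        return Transformer(lambda data: self.transform(data)[:k])

# Example: "score" documents (here just sort by length), then cut to top 2
score = Transformer(lambda docs: sorted(docs, key=len, reverse=True))
top2 = score % 2
print(top2.transform(["a", "abc", "ab", "abcd"]))  # ['abcd', 'abc']
```

In PyTerrier itself the frames are pandas DataFrames of queries and results, and further operators (e.g. `**` for feature union) follow the same pattern.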

The PyTerrier operators employed under operator overloading strategy (Source).

The newly introduced PyTerrier framework has been instantiated on two retrieval platforms so far: Terrier and Anserini. More platform implementations are expected soon.

Hands-on Retrieval and Evaluation

PyTerrier is available as a PyPI package. We can simply pip install it.

!pip install python-terrier

Import the library and initialize it.

import pyterrier as pt
if not pt.started():
    pt.init()

Use one of the in-built datasets to perform the retrieval process and extract its index.

vaswani_dataset = pt.datasets.get_dataset("vaswani")

indexref = vaswani_dataset.get_index()
index = pt.IndexFactory.of(indexref)



Extract queries as topics for the dataset.

topics = vaswani_dataset.get_topics()


Perform retrieval easily using a few commands as shown below.

retr = pt.BatchRetrieve(index, controls={"wmodel": "TF_IDF"})
# the weighting model can also be set (or changed) after construction
retr.setControl("wmodel", "TF_IDF")
retr.setControls({"wmodel": "TF_IDF"})
# run the retrieval over all the topics
res = retr.transform(topics)


It can be observed that the documents are retrieved and ranked. Further, the results can be saved to disk using the ‘write_results’ method of PyTerrier’s ‘io’ module.

pt.io.write_results(res, "result1.res")

Now, evaluation is performed by comparing the results with the built-in ground truth. Get the ground-truth query relevance judgements (qrels).

qrels = vaswani_dataset.get_qrels()


Evaluate the query results.

eval = pt.Utils.evaluate(res,qrels)


Evaluation results can also be obtained per query. Here, the evaluation is based on the ‘map’ metric over all documents for each query.

eval = pt.Utils.evaluate(res, qrels, metrics=["map"], perquery=True)
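The ‘map’ metric is mean average precision. For a single query, average precision rewards placing relevant documents early in the ranking; a hand-rolled version (an illustration, independent of PyTerrier’s evaluator) makes the computation explicit:

```python
def average_precision(ranking, relevant):
    """AP for one query: mean of precision@k taken at every rank k
    where a relevant document appears, divided by |relevant|."""
    hits, precisions = 0, []
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

# Relevant docs retrieved at ranks 1 and 3: AP = (1/1 + 2/3) / 2
print(round(average_precision(["d1", "d5", "d2"], {"d1", "d2"}), 4))  # 0.8333
```

MAP is then simply this value averaged over all queries, which is what the `perquery=False` default reports.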


Find the Notebook with these code implementations here.

Hands-on Learn-To-Rank

Create the environment by importing the necessary libraries and initializing the PyTerrier framework.

import numpy as np
import pandas as pd
import pyterrier as pt
if not pt.started():
    pt.init()

Download an in-built dataset, its indices, queries and ground truth results.

dataset = pt.datasets.get_dataset("vaswani")
indexref = dataset.get_index()
topics = dataset.get_topics()
qrels = dataset.get_qrels()

For ranking the queries, the standard ‘BM25’ model is used in this example. The traditional ‘TF-IDF’ model and the ‘PL2’ model are used to re-rank the query results.

#this ranker will make the candidate set of documents for each query
BM25 = pt.BatchRetrieve(indexref, controls = {"wmodel": "BM25"})
#these rankers we will use to re-rank the BM25 results
TF_IDF =  pt.BatchRetrieve(indexref, controls = {"wmodel": "TF_IDF"})
PL2 =  pt.BatchRetrieve(indexref, controls = {"wmodel": "PL2"})

Create a PyTerrier pipeline to perform the task described above and issue a query.

pipe = BM25 >> (TF_IDF ** PL2)
pipe.transform("chemical end:2")


In the above output, ‘score’ is the ranking score from the BM25 model, and ‘features’ holds the re-ranking scores from the TF-IDF and PL2 models. However, ranking in a first step and re-ranking in two successive steps consumes more time. To tackle this issue, PyTerrier introduces a method called FeaturesBatchRetrieve, which performs the ranking and computes the re-ranking features all in one go.

fbr = pt.FeaturesBatchRetrieve(indexref, controls={"wmodel": "BM25"}, features=["WMODEL:TF_IDF", "WMODEL:PL2"])
# the top 2 results
(fbr % 2).search("chemical")


PyTerrier pipelines also have a compile() method, which optimizes the ranking and re-ranking processes automatically. This approach yields the same results as above in around the same compute time. An example implementation is as follows:

pipe_fast = pipe.compile()
(pipe_fast % 2).search("chemical")


After performing ranking and re-ranking, a machine learning model can be built to Learn-to-Rank (LTR). Split the available data into train, validation and test sets.

train_topics, valid_topics, test_topics = np.split(topics, [int(.6*len(topics)), int(.8*len(topics))])
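The np.split call above cuts the topics frame at its 60% and 80% marks, giving a 60/20/20 train/validation/test split. The same slicing in plain Python makes the proportions explicit:

```python
def split_60_20_20(items):
    """Cut points at 60% and 80% of the data, mirroring
    np.split(items, [int(.6 * n), int(.8 * n)])."""
    n = len(items)
    a, b = int(0.6 * n), int(0.8 * n)
    return items[:a], items[a:b], items[b:]

train, valid, test = split_60_20_20(list(range(10)))
print(len(train), len(valid), len(test))  # 6 2 2
```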

Build a Random Forest model to perform the LTR and obtain the results.

from sklearn.ensemble import RandomForestRegressor
BaselineLTR = fbr >> pt.pipelines.LTR_pipeline(RandomForestRegressor(n_estimators=400))
BaselineLTR.fit(train_topics, qrels)
resultsRF = pt.pipelines.Experiment([PL2, BaselineLTR], test_topics, qrels, ["map"], names=["PL2 Baseline", "LTR Baseline"])


Build an XGBoost model to perform the LTR and obtain the results.

import xgboost as xgb
params = {'objective': 'rank:ndcg',
          'learning_rate': 0.1,
          'gamma': 1.0, 'min_child_weight': 0.1,
          'max_depth': 6,
          'verbose': 2,
          'random_state': 42
          }

BaseLTR_LM = fbr >> pt.pipelines.XGBoostLTR_pipeline(xgb.sklearn.XGBRanker(**params))
BaseLTR_LM.fit(train_topics, qrels, valid_topics, qrels)
resultsLM = pt.pipelines.Experiment([PL2, BaseLTR_LM],
                                test_topics, qrels, ["map"],
                                names=["PL2 Baseline", "LambdaMART"])


Find the Notebook with these code implementations here.

Wrapping up

We discussed the newly introduced PyTerrier framework, its architecture and its use for Information Retrieval tasks. We learnt how to use the framework through two hands-on examples: a simple query retrieval and a Learn-to-Rank machine learning model. PyTerrier offers a wealth of algorithms and built-in datasets to perform almost any Information Retrieval task with minimal effort. The framework is built in Python with a chief focus on simplicity, efficiency and reproducibility.

Further reading:

Research paper

Github repository

Indexing with PyTerrier

Index API of PyTerrier

Rajkumar Lakshmanamoorthy
A geek in Machine Learning with a Master's degree in Engineering and a passion for writing and exploring new things. Loves reading novels, cooking, practicing martial arts, and occasionally writing novels and poems.
