
New Doesn’t Mean Novel: Validating Latest Approaches For Recommendation Systems


Deep learning algorithms are the go-to solution for almost all recommender systems nowadays. Deep learning thrives on devouring tonnes of data and spewing out recommendations with great accuracy. These systems are ubiquitous and have touched many lives in some form or another. From YouTube to Netflix, their applications have multiplied.

Since there is no going back for these deep learning-based recommendation systems, it is only natural to scrutinise the route they take in building the final model. Many popular recommendation approaches have been widely accepted. But are they as good as they seem? Are they reproducible? Are there simpler, better alternatives to the deep learning approach?

To address these questions, Maurizio Ferrari Dacrema and his colleagues conducted a study of recent recommendation approaches.

Finding How Good Is Good Enough

The authors considered 18 algorithms presented at top-level research conferences in recent years. They identified these algorithms by analysing recent proceedings of KDD, SIGIR, TheWebConf (WWW), and RecSys for papers that proposed new deep learning approaches to top-n recommendation tasks.

The following baseline methods were included in the experiments as points of comparison for the new recommendation approaches:

TopPopular: A non-personalized method that recommends the most popular items to everyone. Popularity is measured by the number of explicit or implicit ratings.
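
To make this baseline concrete, here is a minimal sketch of TopPopular in Python. The function name and the binary interaction matrix are illustrative assumptions, not the authors' code:

```python
import numpy as np

def top_popular(interactions: np.ndarray, n: int = 10) -> np.ndarray:
    """Recommend the same n most popular items to every user.

    Illustrative sketch, not the authors' implementation.
    interactions: a (num_users, num_items) matrix of explicit or
    implicit feedback; any non-zero entry counts as a rating.
    """
    # Popularity = number of users who rated/interacted with each item.
    popularity = (interactions != 0).sum(axis=0)
    # Indices of the n items with the highest interaction counts.
    return np.argsort(-popularity)[:n]
```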

ItemKNN: A traditional collaborative filtering (CF) approach based on k-nearest neighbors (KNN). Other baselines included UserKNN, ItemKNN-CFCBF, and similar variants.
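
Likewise, a compact sketch of item-based KNN scoring with cosine similarity might look as follows. It omits refinements such as a shrinkage term, and all names are illustrative rather than taken from the study's code:

```python
import numpy as np

def item_knn_scores(interactions: np.ndarray, k: int = 50) -> np.ndarray:
    """Score all items for all users via item-based KNN with cosine similarity.

    Illustrative sketch, not the authors' implementation.
    """
    # Cosine similarity between the item columns of the user-item matrix.
    norms = np.linalg.norm(interactions, axis=0) + 1e-8
    sim = (interactions.T @ interactions) / np.outer(norms, norms)
    np.fill_diagonal(sim, 0.0)  # an item is not its own neighbor
    # Keep only each item's k most similar neighbors, zero out the rest.
    for i in range(sim.shape[0]):
        weakest = np.argsort(sim[i])[:-k]
        sim[i, weakest] = 0.0
    # Score(u, i) = similarity-weighted sum of the items u interacted with.
    return interactions @ sim.T
```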

The latest approaches that were checked for reproducibility include:

  • Collaborative Memory Networks (CMN)
  • Metapath based Context for RECommendation (MCRec)
  • Collaborative Variational Autoencoder (CVAE)
  • Collaborative Deep Learning (CDL)
  • Neural Collaborative Filtering (NCF) and others.

After collating the relevant works, the papers with publicly available code were selected for the reproducibility check. The authors lament that they could reproduce the published results with an acceptable degree of certainty for only seven of the 18 papers.

The reproducibility of a work was decided based on the following factors:

  • A working version of the source code is available or the code only has to be modified in minimal ways to work correctly
  • At least one dataset used in the original paper is available. A further requirement here is that either the originally-used train-test splits are publicly available or that they can be reconstructed from the information in the paper (a split-reconstruction sketch follows this list)
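
For example, when a paper specifies a leave-one-out protocol and a random seed, the split can often be rebuilt along these lines. This is a hypothetical sketch, not the authors' tooling:

```python
import numpy as np

def leave_one_out_split(interactions: np.ndarray, seed: int = 42):
    """Hold out one random interaction per user for testing.

    Hypothetical sketch; the fixed seed is what makes the split
    reconstructible, while the protocol itself varies across papers.
    """
    rng = np.random.default_rng(seed)
    train = interactions.copy()
    test = np.zeros_like(interactions)
    for u in range(interactions.shape[0]):
        rated = np.flatnonzero(interactions[u])
        if rated.size == 0:
            continue  # nothing to hold out for this user
        held_out = rng.choice(rated)
        train[u, held_out] = 0
        test[u, held_out] = interactions[u, held_out]
    return train, test
```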

To check for reproducibility, the authors refactored the original implementations so that the same evaluation procedure used in the original papers could be applied across all methods. Specifically, the original code for training, hyper-parameter optimization, and prediction was separated from the evaluation code.
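
The spirit of that separation can be captured with a thin interface between models and a shared evaluation loop. The Recommender class and hit-rate metric below are illustrative assumptions rather than the authors' actual harness:

```python
import numpy as np

class Recommender:
    """Minimal interface each refactored algorithm could be wrapped in.

    Illustrative sketch, not the authors' actual harness.
    """
    def fit(self, train: np.ndarray) -> None: ...
    def recommend(self, user: int, n: int) -> np.ndarray: ...

def hit_rate_at_n(model: Recommender, test: np.ndarray, n: int = 10) -> float:
    """Shared evaluation code, kept apart from training and tuning code.

    Counts how often a user's held-out item appears in the top-n list.
    """
    hits = evaluated = 0
    for u in range(test.shape[0]):
        held_out = np.flatnonzero(test[u])
        if held_out.size == 0:
            continue  # user has no held-out item
        evaluated += 1
        hits += int(held_out[0] in model.recommend(u, n))
    return hits / max(evaluated, 1)
```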

To the authors' surprise, the study revealed that in the large majority of the investigated cases (six out of seven), the proposed deep learning techniques did not consistently outperform simple but fine-tuned baseline methods.

Future Direction

This paper was an attempt to address the following:

Reproducibility: To what extent is recent research in the area reproducible (with reasonable effort)?

Progress: To what extent are recent algorithms actually leading to better performance results when compared to relatively simple, but well-tuned, baseline methods?

Besides issues related to the baselines, an additional challenge is that researchers use various types of datasets, evaluation protocols, performance measures, and data preprocessing steps, which makes it difficult to conclude which method is the best across different application scenarios.

The underperformance of the newer approaches can be a consequence of the following:

 (i) weak baselines;

 (ii) establishment of weak methods as new baselines; and

 (iii) difficulties in comparing or reproducing results across papers.

If the newer approaches cannot outrank the older, simpler approaches, then it is a no-brainer to stick with the simpler ones.

Read the full paper here


Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.