New Hackathon For Data Scientists — Buyer’s Time Prediction Challenge


MachineHack is back with a new hackathon that challenges the machine learning community to build a model that predicts how much time a buyer will spend on an eCommerce platform.

The COVID-19 pandemic has dramatically changed consumer behaviour and buying patterns. During the pandemic, buyers spent a significant amount of time browsing eCommerce stores, which led to a surge in the number of users on eCommerce platforms across domains. Meanwhile, store owners are using various algorithms to leverage customer behaviour patterns and attract customers.

Tracking customer activity has proven to be an effective way to understand customer behaviour and to analyse it in order to serve customers better. Machine learning and artificial intelligence have played a significant role here, powering recommendation engines that attract customers by predicting their buying patterns.

In this hackathon, MachineHack is challenging the participants to develop a regression algorithm to predict the time a buyer will spend on an eCommerce platform.

The challenge will start on 18th December, Friday at 6 PM IST.

Problem Statement & Description

In this hackathon, participants must create a machine learning model to forecast the time a buyer spends on an eCommerce platform. AI and ML technologies are believed to hold significant promise not only for predicting buyers’ time on a platform but also for customer segmentation and product recommendation.

The training dataset has 5429 rows and nine columns, with the time_spent column as the target variable. The test dataset has 2327 rows and eight columns. The attributes include session identifiers (session_id and session_number); client software and device details (client_agent and device_details); the date stamp of the session; binary flags for purchase, cart and check-out activity; and the total time spent in seconds. The hackathon requires a few prerequisite skills, such as regression modelling, advanced feature engineering, and the ability to optimise the RMSLE score.

Submissions will be evaluated using the RMSLE (root mean squared logarithmic error) metric, which can be calculated with ‘np.sqrt(mean_squared_log_error(actual, predicted))’. The hackathon has both public and private leaderboards. The public leaderboard is evaluated on 70% of the test data, while the private leaderboard, made available at the end of the hackathon, is evaluated on 100% of the test data. The final score will be based on the ‘best score’ on the public leaderboard.
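As a rough illustration, RMSLE is the square root of the mean squared difference between log(1 + predicted) and log(1 + actual). The sketch below computes it with Python's standard library only; the function name rmsle is our own choice, and for non-negative inputs the result should match the NumPy/scikit-learn expression quoted above.

```python
import math
from typing import Sequence


def rmsle(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared logarithmic error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    if min(actual) < 0 or min(predicted) < 0:
        # time spent in seconds is never negative, so reject such inputs
        raise ValueError("RMSLE expects non-negative values")
    total = sum(
        (math.log1p(p) - math.log1p(a)) ** 2  # squared error in log(1 + x) space
        for a, p in zip(actual, predicted)
    )
    return math.sqrt(total / len(actual))


if __name__ == "__main__":
    # toy numbers, not taken from the competition data
    print(rmsle([120.0, 45.0, 600.0], [100.0, 60.0, 550.0]))
```

Because errors are measured on a log scale, they behave like ratios rather than absolute differences: a 10-second miss on a 20-second session costs far more than the same miss on a 2000-second session.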

With this hackathon, data scientists will get hands-on experience creating an ML model and solving use cases at the organisational level. In addition, the top three winners will receive passes to MLDS 2021, a conference designed exclusively for machine learning practitioners.

To generate a valid submission, participants must submit a .csv or .xlsx file with exactly 2327 rows and one column, i.e. time_spent. The submission will return an ‘Invalid Score’ if the file contains any extra columns or rows.
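A minimal sketch of writing such a file with Python's csv module follows. It assumes the file starts with a time_spent header row followed by one prediction per test row, in the same order as Test.json; check Sample Submission.csv to confirm the exact layout. The function name write_submission is hypothetical.

```python
import csv
from typing import Sequence

EXPECTED_ROWS = 2327  # number of rows in Test.json, per the challenge description


def write_submission(predictions: Sequence[float], path: str,
                     expected_rows: int = EXPECTED_ROWS) -> None:
    """Write a one-column time_spent CSV, refusing a wrong row count."""
    if len(predictions) != expected_rows:
        raise ValueError(
            f"expected {expected_rows} predictions, got {len(predictions)}"
        )
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["time_spent"])  # header: the only allowed column
        for value in predictions:
            # time spent cannot be negative, so clip any negative prediction to 0
            writer.writerow([max(0.0, float(value))])
```

Checking the row count before opening the file means a malformed submission is caught locally instead of coming back as an ‘Invalid Score’.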

The hackathon closes on Monday, 4th January, at 7 AM IST.

Dataset Description:

  • Train.json – 5429 rows x 9 columns (includes the time_spent column as the target variable)
  • Test.json – 2327 rows x 8 columns
  • Sample Submission.csv – Please check the Evaluation section for more details on how to generate a valid submission 
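The description does not state how the JSON files are laid out. Assuming each file holds a JSON array of row objects (an assumption; JSON Lines or column-oriented files would need pandas.read_json with its lines or orient options instead), a quick sanity check against the stated shapes could look like this. The helper names load_records and shape are illustrative.

```python
import json
from typing import Any, Dict, List, Set, Tuple

Record = Dict[str, Any]


def load_records(path: str) -> List[Record]:
    """Load a file assumed to contain a JSON array of row objects."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list):
        raise ValueError(f"{path}: expected a JSON array of records")
    return data


def shape(records: List[Record]) -> Tuple[int, Set[str]]:
    """Return the row count and the union of column names seen in any row."""
    columns: Set[str] = set()
    for row in records:
        columns.update(row.keys())
    return len(records), columns
```

If the assumption holds, shape(load_records("Train.json")) should report 5429 rows and nine column names, and the set difference between the train and test columns should contain only time_spent.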

Attribute Description:

  • session_id – Unique identifier for every row
  • session_number – Session type identifier
  • client_agent – Client-side software details
  • device_details – Client-side device details
  • date – Datestamp of the session
  • purchased – Binary value for any purchase done
  • added_in_cart – Binary value for cart activity
  • checked_out – Binary value for checking out successfully
  • time_spent – Total time spent in seconds (Target Column)
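The attributes above mix a date stamp, free-text client details and binary flags. One hedged way to turn a single record into numeric features is sketched below. It assumes date is an ISO-style string such as 2020-12-18, that client_agent and device_details are browser or device strings, and that the binary flags are 0/1 or True/False; the keyword flags (mobile, android) are hypothetical examples rather than features known to work on this dataset.

```python
from datetime import date
from typing import Any, Dict


def session_features(row: Dict[str, Any]) -> Dict[str, float]:
    """Convert one raw session record into numeric model inputs (illustrative)."""
    # Assumption: the first 10 characters of `date` are YYYY-MM-DD.
    day = date.fromisoformat(str(row["date"])[:10])
    agent = str(row.get("client_agent", "")).lower()
    device = str(row.get("device_details", "")).lower()
    return {
        "day_of_week": float(day.weekday()),  # Monday = 0, Sunday = 6
        "month": float(day.month),
        "is_weekend": float(day.weekday() >= 5),
        # hypothetical keyword flags on the free-text columns
        "agent_mobile": float("mobile" in agent),
        "device_android": float("android" in device),
        "agent_length": float(len(agent)),
        # binary activity flags (assumed 0/1 or True/False), passed through as floats
        "purchased": float(row.get("purchased", 0)),
        "added_in_cart": float(row.get("added_in_cart", 0)),
        "checked_out": float(row.get("checked_out", 0)),
    }
```

Identifiers such as session_id are deliberately left out, since a unique row identifier carries no information that generalises to unseen sessions.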

Skills:

  • Regression modelling.
  • Advanced feature engineering with date stamps and text data types.
  • Optimising the RMSLE score so that the model generalises well to unseen data.
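On the last point: because RMSLE is ordinary root mean squared error computed on log(1 + y), a common approach (not one prescribed by the organisers) is to fit models on log1p(time_spent) and map predictions back with expm1. The simplest case, a constant baseline, already shows the effect: the constant that minimises RMSLE is expm1 of the mean of log1p(y), not the plain arithmetic mean. The sketch assumes non-negative targets, and the session lengths in it are made up.

```python
import math
from typing import Sequence


def rmsle(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared logarithmic error (same definition as the metric)."""
    total = sum((math.log1p(p) - math.log1p(a)) ** 2
                for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))


def log_space_constant(y: Sequence[float]) -> float:
    """Constant prediction minimising RMSLE: average in log1p space, then invert."""
    return math.expm1(sum(math.log1p(v) for v in y) / len(y))


if __name__ == "__main__":
    # hypothetical, right-skewed session lengths in seconds
    times = [12.0, 30.0, 45.0, 90.0, 3000.0]
    log_c = log_space_constant(times)
    mean_c = sum(times) / len(times)
    print("log-space constant:", round(log_c, 1), rmsle(times, [log_c] * len(times)))
    print("arithmetic mean:   ", round(mean_c, 1), rmsle(times, [mean_c] * len(times)))
```

The same reasoning carries over to any regressor trained with squared error: fit it on log1p(time_spent), then apply expm1 to its predictions before writing the submission.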

Click here to participate in the hackathon.


Sejuti Das
Sejuti currently works as Associate Editor at Analytics India Magazine (AIM). Reach out at
