Meet This Week’s MachineHack Champions Who Cracked The ‘Grocery Sales Forecast’ Hackathon

MachineHack concluded the fourth instalment of its weekend hackathon series this Monday. The Grocery Sales Forecast hackathon drew close to 380 registrations and 171 active participants.

Out of the 171 competitors, three topped our leaderboard. In this article, we will introduce you to the winners and describe the approach they took to solve the problem.

#1: Karan Juneja

Karan is an Electronics and Telecommunication Engineer from PICT, Pune. His data science journey grew out of his passion for and curiosity about robotics, and he has been acquiring new data science skills from free online resources as well as by participating in hackathons.

Approach To Solving The Problem

Karan explains his approach briefly as follows.

Being a time series problem with just one feature, the dataset was a bit challenging. First, I tried to understand the data by plotting smoothed and rolling averages. The plots showed that the standard deviation was fairly high in the early days and dropped very low in the later days. I decided to trust my local cross-validation score instead of the public leaderboard score. I created a new feature, Quarter, which could be used to build lag and rolling-average features, and it helped me get to the top of the leaderboard. I also compared the standard deviation of the predicted 90 days’ sales.
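The lag, rolling-average and Quarter features described above can be sketched roughly as follows. This is a minimal illustration, not Karan's actual code: the function name `make_features`, the lag choices, the 7-day window and the 90-day quarter length are all assumptions made for the example.

```python
from statistics import mean

def make_features(sales, lags=(1, 7), window=7, days_per_quarter=90):
    """Turn a daily sales list into one feature row per day.

    Every feature for day t uses only values from days before t,
    so no information from the target leaks into its own features.
    """
    rows = []
    start = max(max(lags), window)  # first day with a full history
    for t in range(start, len(sales)):
        row = {
            "day": t,
            # Assumed definition: consecutive 90-day blocks numbered 1 to 4.
            "quarter": (t // days_per_quarter) % 4 + 1,
        }
        for lag in lags:
            row[f"lag_{lag}"] = sales[t - lag]
        row[f"roll_mean_{window}"] = mean(sales[t - window:t])
        row["target"] = sales[t]
        rows.append(row)
    return rows

# Synthetic series with a weekly pattern, used only to exercise the function.
sales = [float(10 + (d % 7)) for d in range(30)]
rows = make_features(sales)
```

Each resulting row can be fed to any regression model, which is what turns the forecasting task into an ordinary supervised learning problem.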

Get the complete code here.

#2: Adarsh and Sai

Adarsh and Sai are second-year B.Tech students in Electronics and Communication Engineering at Vidya Jyothi Institute of Technology, Hyderabad.

Both of them had their first encounter with machine learning during their first year of college in an NLP hackathon conducted at IIIT Hyderabad. The hackathon provided them with a great learning experience which made them want to learn more about the domain. From then on, both Adarsh and Sai have been spending time learning and upskilling by practicing machine learning problems.

Approach To Solving The Problem

We started by creating generic baseline models, which did not score well on the leaderboard. We tried neural networks with Keras-TensorFlow, Linear Regression, Random Forest, XGB Regressor, LightGBM, and other regression methods. We used a windowed dataset and tried an LSTM network using Keras, which gave a good cross-validation score, but the leaderboard score was still low. We then shifted our focus to feature engineering. The total data given to predict was for 2 years (692 days), so we treated each year as 346 working days, divided those 346 days into months, and added day and month columns to the data. After adding the features, we tried the regression models again, and the XGBoost regressor came up with the best score. Additionally, we tuned the parameters to improve the score.
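The day and month columns described above might be derived along these lines. This is only a sketch: the write-up does not say how the 346 days were split into months, so the even 12-way split and the name `calendar_features` are assumptions.

```python
DAYS_PER_YEAR = 346  # from the write-up: 692 days treated as two 346-day years

def calendar_features(day_index, days_per_year=DAYS_PER_YEAR, months=12):
    """Map a 0-based day index to year, day-of-year and month columns.

    Assumption: each year is cut into `months` near-equal blocks of days.
    """
    day_of_year = day_index % days_per_year
    return {
        "year": day_index // days_per_year + 1,
        "day_of_year": day_of_year + 1,
        "month": day_of_year * months // days_per_year + 1,
    }

# One feature row per day of the two-year span.
features = [calendar_features(d) for d in range(2 * DAYS_PER_YEAR)]
```

Columns like these give a tree-based regressor a way to pick up calendar effects that a bare day index does not expose.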

“We have been participating in MachineHack hackathons regularly. The hackathons have been very competitive, and we also got a chance to connect with other competitors and gain more knowledge. We are very thankful to MachineHack and its practice courses. They were very helpful.” – they shared their MachineHack experience.

Get the complete code here.

#3: Mohammed Abdul Qavi

A Senior Data Scientist at ADP, Mohammed Abdul Qavi solves various problems in the HCM domain. He started his career working on basic statistical models, and his interest in mathematics drew him to the data science space. Abdul Qavi earned his Master's in Industrial Engineering and Operations Research (IEOR) from IIT Bombay in 2013. He learns and acquires new skills through MOOCs and by reading articles on websites such as Analytics India Magazine, Medium, Kaggle, and LinkedIn.

Approach To Solving The Problem

Abdul explains his approach as follows:

Initially, I felt that the problem was a typical time series problem. My initial hypothesis was that the seasonality might be seven days, because people might make a lot of grocery purchases during the weekends. Of course, this might not always be true, given how much purchasing happens online. The data showed an upward trend, while the seasonality was not very obvious in the beginning.

The biggest challenge was the inconsistency between the public LB score and the local test score. My CV strategy was to hold out the last 90 days as test data and perform 5-fold time-series cross-validation on the remaining data. This way, the local CV score and test score were quite consistent. After a few submissions, I decided to trust my local CV score.
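A validation scheme of this kind can be sketched as below: the last 90 days are set aside, and the remaining days are split into expanding-window folds in the spirit of scikit-learn's `TimeSeriesSplit`. The helper `time_series_folds` and the 510-day series length are hypothetical, chosen only to keep the example small.

```python
def time_series_folds(n_samples, n_splits=5):
    """Expanding-window folds: each fold trains on everything before its test block."""
    test_size = n_samples // (n_splits + 1)
    folds = []
    for k in range(n_splits):
        test_start = n_samples - (n_splits - k) * test_size
        train_idx = range(0, test_start)
        test_idx = range(test_start, test_start + test_size)
        folds.append((train_idx, test_idx))
    return folds

n_days = 510   # hypothetical length of the training series
holdout = 90   # last 90 days kept as a local test set
n_cv = n_days - holdout

folds = time_series_folds(n_cv, n_splits=5)
holdout_idx = range(n_cv, n_days)

for train_idx, test_idx in folds:
    # Training data always ends before the validation block starts.
    print(f"train 0..{train_idx[-1]}  validate {test_idx[0]}..{test_idx[-1]}")
```

Because every validation block lies strictly after its training data, the scores come from the same kind of forward-looking prediction that the final 90-day forecast requires.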

I spent most of my time training and tuning the Prophet model. I also converted the problem from a time series problem to a regression problem and trained a LightGBM model, which gave the best result.

Steps:

1. Performed EDA and created an initial baseline solution using Prophet

2. Further data exploration suggested that quarterly seasonality was the way to proceed

3. Switched to statistical methods and fitted the data using the Holt-Winters method

4. Tuned the alpha, beta and gamma parameters and ended up with the best public LB score

5. Due to the inconsistency between the local CV and public LB scores, I decided to try other alternatives

6. Created lag features and converted the problem to a regression problem, then trained a LightGBM model
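For readers unfamiliar with the alpha, beta and gamma parameters in steps 3 and 4, here is a compact sketch of additive Holt-Winters smoothing. It is not the winner's code: the function name, the simple initialisation heuristic, the season length and the parameter values are all assumptions for illustration.

```python
def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Additive Holt-Winters; returns `horizon` forecasts past the end of `series`.

    alpha smooths the level, beta the trend, gamma the seasonal terms.
    """
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons to initialise")
    first = series[:season_len]
    second = series[season_len:2 * season_len]
    # Simple heuristic initialisation from the first two seasons.
    level = sum(first) / season_len
    trend = (sum(second) - sum(first)) / season_len ** 2
    seasonal = [x - level for x in first]

    for t in range(season_len, len(series)):
        s = seasonal[t % season_len]
        prev_level = level
        level = alpha * (series[t] - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season_len] = gamma * (series[t] - level) + (1 - gamma) * s

    n = len(series)
    return [
        level + (h + 1) * trend + seasonal[(n + h) % season_len]
        for h in range(horizon)
    ]

# Synthetic, purely periodic series used only to exercise the function.
series = [10.0, 12.0, 14.0, 12.0] * 6
forecast = holt_winters_additive(series, season_len=4, alpha=0.3,
                                 beta=0.1, gamma=0.2, horizon=4)
```

Tuning then amounts to searching over alpha, beta and gamma (each between 0 and 1) for the combination with the lowest validation error.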

The final submission was an average of 5 Fold TimeSeriesSplit predictions.

Things we can try:

–   Tuning the LightGBM model parameters

–   Training on the complete data using 3 different seeds and taking an average

–   Training an XGBoost model

–   Ensembling LightGBM and XGBoost (based on a weighted average)
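Averaging predictions across folds or seeds, and blending two models by a weighted average, both reduce to a few lines. The sketch below uses made-up prediction values and a hypothetical weight of 0.75; in practice the weight would be chosen on validation data.

```python
def average_predictions(prediction_sets):
    """Element-wise mean across equally long prediction lists (folds or seeds)."""
    n = len(prediction_sets)
    return [sum(values) / n for values in zip(*prediction_sets)]

def weighted_blend(preds_a, preds_b, weight_a=0.5):
    """Weighted average of two models' predictions; model B gets 1 - weight_a."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie between 0 and 1")
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(preds_a, preds_b)]

# Made-up numbers: three fold (or seed) predictions from one model ...
fold_preds = [[100.0, 110.0], [104.0, 114.0], [96.0, 106.0]]
model_a = average_predictions(fold_preds)

# ... blended with a second model's predictions.
model_b = [120.0, 130.0]
blended = weighted_blend(model_a, model_b, weight_a=0.75)
```

Averaging over folds or seeds mainly reduces variance from any single training run, while the weighted blend lets the stronger model dominate without discarding the other.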

“MachineHack is an amazing platform for a lot of data scientists to practice, learn, participate, and win exciting prizes. The community is continuously growing with a lot of participants and healthy competition. I recommend MachineHack to fresh graduates who are interested in solving various ML problems across industries. The articles on Analytics India Magazine are very informative and should definitely be followed. MachineHack is providing such a good opportunity for everyone to learn during the quarantine. I always have an amazing experience talking to the MachineHack organizers, and my concerns related to the hackathon are addressed quickly.” – he shared his MachineHack experience.

Get the complete code here.

Check out this week’s hackathon here.

A Computer Science Engineer turned Data Scientist who is passionate about AI and all related technologies. Contact: amal.nair@analyticsindiamag.com
