Meet The MachineHack Champions Who Cracked The ‘E-commerce Price Prediction’ Hackathon

MachineHack concluded the eighth instalment of its weekend hackathon series last Monday. The E-commerce Price Prediction hackathon was well received by data science enthusiasts, with over 400 registrations and active participation from close to 200 practitioners.

Out of the 189 competitors, three topped our leaderboard. In this article, we will introduce you to the winners and describe the approach they took to solve the problem.

#1| Devrup Banerjee

Although Devrup learnt Python purely out of the need to automate routine work and gather data at scale, his passion for data science took root in the second year of his MBA at Great Lakes Institute of Management, Gurgaon, during a marketing and retail analytics class. He realised that the real motivation for learning these algorithms was not to improve accuracy but to tell a client how much their bottom line could grow if they followed the recommended path. The subject changed his life.

“My roommate, who was equally inspired, and I used to have sleepless nights going through the 25 lacs dataset given as a final project on our rickety computers to generate actionable insights. Bettering the bottom-line percentage is what drew me into analytics,” he said.

His team has won several MBA-level competitions, including the MRA tournament at IIT Kanpur, and finished as runners-up in IIM Kashipur’s case study on analytics.

He is currently diving deeper into data science to sharpen his analytical skills so that, given a dataset in future, he can act as both a business analyst and a data scientist.

Approach To Solving The Problem 

Devrup explains his approach as follows:

The real problem in the e-commerce dataset was the product and brand names. About 300 brands were present in the test dataset but not in the train dataset, so one-hot encoding or any other form of encoding was ruled out. I considered TF-IDF with 2-grams, but it still wasn’t giving me the results I expected. The brand column, along with the item category, contained the most information about any product’s price if cleaned properly. Thus, I decided to target-encode them and take care of those 300-odd brands separately. I formed a class column based on the average price level of each brand in each category, which captured most of the information in the brand name and item category.
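As a rough illustration of the class-column idea, the sketch below averages the price per brand and category on toy data, falls back to the category average for a brand that never appears in training, and bins the result into price classes. The toy rows, the `Selling_Price` column name, the bin edges and the category-level fallback are all assumptions made for illustration; how Devrup actually treated the unseen brands is not described here.

```python
import pandas as pd

# Hypothetical toy data; the real dataset schema may differ.
train = pd.DataFrame({
    "Product_Brand": ["B-1", "B-1", "B-2", "B-2", "B-3", "B-3"],
    "Item_Category": ["shoes", "shoes", "shoes", "bags", "bags", "bags"],
    "Selling_Price": [100.0, 120.0, 900.0, 1500.0, 40.0, 60.0],
})
test = pd.DataFrame({
    "Product_Brand": ["B-1", "B-9"],  # B-9 never appears in train
    "Item_Category": ["shoes", "bags"],
})

# Average training price per (brand, category) pair and per category alone.
pair_mean = (
    train.groupby(["Product_Brand", "Item_Category"])["Selling_Price"]
    .mean()
    .rename("pair_mean")
    .reset_index()
)
cat_mean = (
    train.groupby("Item_Category")["Selling_Price"]
    .mean()
    .rename("cat_mean")
    .reset_index()
)


def add_price_class(df, bins, labels):
    """Attach an average price level and a binned price class to each row."""
    out = df.merge(pair_mean, on=["Product_Brand", "Item_Category"], how="left")
    out = out.merge(cat_mean, on="Item_Category", how="left")
    # Assumed fallback: unseen brands take the average of their category.
    out["avg_price"] = out["pair_mean"].fillna(out["cat_mean"])
    out["price_class"] = pd.cut(out["avg_price"], bins=bins, labels=labels)
    return out.drop(columns=["pair_mean", "cat_mean"])


# Illustrative bin edges only.
test_enc = add_price_class(
    test, bins=[0, 200, 1000, float("inf")], labels=["low", "mid", "high"]
)
print(test_enc)
```

On training rows, the same averages would include each row's own price, so computing them out-of-fold is a common safeguard against leakage.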

A bit of MBA knowledge was also leveraged to identify the end consumer of each product, as the first thing we were taught in our MBA was that price should always be decided after identifying the target market. That turned out to be the game-changer.

MachineHack has been a huge source of inspiration and learning, along with Analytics India Magazine, which keeps us up to date on the latest happenings in analytics from around the world. They have established themselves as the domain leaders, and I won’t be surprised if they are soon known as the Indian Kaggle.

Get the complete code here.

#2| Mrutyunjaya Rath

Mrutyunjaya is a Mechanical Engineering graduate, but data and coding have always excited him. He started his data science journey with a course from upGrad in association with IIIT-B. Although he found it a little difficult at the beginning, continuous practice gave him the confidence and skills to pursue the path.

He spends most of his time participating in hackathons and acquiring new skills by learning new techniques.

“You will succeed with some approaches, and some will fail miserably, and that is what is exciting about data science,” he said.

Approach To Solving The Problem 

Mrutyunjaya explains his approach briefly as follows:

My approach to this problem was very simple. I believe that, for any data science problem, the most important thing is EDA. When I started plotting features, I noticed that the target variable ‘Selling Price’ was highly left-skewed, which was skewing the output as well. So, I decided to apply a logarithmic transformation to normalise the target variable. After that, I created new features such as datetime features and group-by statistics over the categorical variables, and handled the categorical variables with a label encoder. After pre-processing, I built three models using XGBoost, LGBM and CatBoost. I fine-tuned the models that were giving me a good cross-validation score, using early stopping. In the end, I blended the results of all three models to get my final score.
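A minimal sketch of the log-transform-then-blend pattern described above follows. The price values and the three prediction arrays are stand-ins, not real model outputs, and the equal-weight average is an assumption, since the blend weights are not stated. `log1p`/`expm1` is used here as one common way to apply and invert a log transform.

```python
import numpy as np

# Hypothetical selling prices with a long tail.
prices = np.array([120.0, 250.0, 999.0, 15000.0])

# Train on the log scale; expm1 is the exact inverse of log1p.
y_log = np.log1p(prices)

# Stand-ins for log-scale predictions from three separately trained models
# (e.g. XGBoost, LightGBM, CatBoost); no model is actually fitted here.
pred_xgb = y_log + 0.03
pred_lgbm = y_log - 0.02
pred_cat = y_log + 0.01

# Assumed equal-weight blend on the log scale, then back to price units.
blend_log = np.mean([pred_xgb, pred_lgbm, pred_cat], axis=0)
blend_price = np.expm1(blend_log)
print(blend_price)
```

Averaging on the log scale and inverting afterwards keeps the blend consistent with the scale the models were trained on.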

“MachineHack is one of the best platforms for any data science enthusiast. Not only can you compete here, but you also get to know your fellow participants, which grows your network, and you get to talk and interact with like-minded people. I would like to thank MachineHack and Analytics India Magazine for organising this hackathon and for their contribution to the data science and machine learning community. I would also like to congratulate my fellow participants who managed to put a score on the leaderboard,” he said.

Get the complete code here.

#3| Shravan Kumar

Shravan Kumar is a senior data scientist at Novartis, Hyderabad. Though Shravan has been working in the analytics field for some time, he always had a keen interest in predictive analytics, which he wished to explore. He started with MOOCs on platforms like Coursera, edX and Udacity, and acquired the skills for his area of interest. Having learned the essentials, Shravan next focused on perfecting his skills through online competitions. He is an active participant in hackathons conducted by MachineHack, Kaggle and other platforms.

Approach To Solving The Problem 

He explains his approach briefly as follows:

Pre-processing steps:

  • Created date features, e.g. Year, Month, Day, DayofWeek, DayofYear, etc.
  • Applied a log transformation to the selling price
  • Applied frequency encoding to the categorical variables, individually and in different combinations
  • Applied mean target encoding to the categorical variables ‘Product_Brand’, ‘Item_Category’, ‘Subcategory_1’ and ‘Subcategory_2’
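The pre-processing steps above could be sketched in pandas roughly as follows. The toy rows and the `Date` and `Selling_Price` column names are assumptions, and only one categorical column is shown. The target mean is computed in-sample here for brevity; that leaks the target, so out-of-fold means are usually preferred in practice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; "Date" and "Selling_Price" are assumed column names.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2019-01-05", "2019-01-06", "2019-03-15", "2019-03-16"]
    ),
    "Product_Brand": ["B-1", "B-1", "B-2", "B-1"],
    "Selling_Price": [100.0, 300.0, 50.0, 200.0],
})

# Date features.
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Day"] = df["Date"].dt.day
df["DayofWeek"] = df["Date"].dt.dayofweek  # Monday = 0
df["DayofYear"] = df["Date"].dt.dayofyear

# Log-transformed target.
df["log_price"] = np.log1p(df["Selling_Price"])

# Frequency encoding: how often each brand occurs.
df["Brand_freq"] = df["Product_Brand"].map(df["Product_Brand"].value_counts())

# Mean target encoding: average log price per brand (in-sample, see note above).
df["Brand_target_mean"] = df.groupby("Product_Brand")["log_price"].transform("mean")
print(df)
```

Encoding a combination of columns works the same way, by passing a list of column names to `groupby`.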

Model building steps:

  • I trained CatBoost and LightGBM with different seed values and finally took the harmonic mean of the two outputs
  • My model looked stable on both the 30% and 100% public datasets
  • I had to train the model on the entire dataset, without a train-test split, as the leaderboard was not reflecting a good score
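For reference, a harmonic-mean blend of two prediction vectors can be sketched as below. The numbers are made up, and it is an assumption that the blend was taken on positive predictions in price units; the averaging over seeds is not shown.

```python
import numpy as np

# Hypothetical predictions from the two models (must be positive).
pred_catboost = np.array([100.0, 400.0, 50.0])
pred_lightgbm = np.array([300.0, 400.0, 200.0])

# Harmonic mean of two positive numbers a and b: 2ab / (a + b).
blend = 2 * pred_catboost * pred_lightgbm / (pred_catboost + pred_lightgbm)
print(blend)  # [150. 400.  80.]
```

The harmonic mean never exceeds the arithmetic mean, so where the two models disagree it pulls the blend towards the lower prediction.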

“MachineHack is a great learning platform for aspiring and current data scientists, who get a chance to solve real-world business problems. The articles and blogs provided by Analytics India Magazine are very helpful and keep people across the industry up to date with the latest news in data science and analytics. Truly, MachineHack is one of the best hackathon organisers and data science knowledge portals in India,” he said.

Get the complete code here.

Check out new hackathons here.

Amal Nair

A Computer Science Engineer turned Data Scientist who is passionate about AI and all related technologies. Contact: amal.nair@analyticsindiamag.com
