Yet another MachineHack hackathon, the Merchandise Popularity Prediction Challenge, concluded successfully on the 8th of February, 2020. In this article, we share the approaches and solutions of the participants who made it to the top of the leaderboard.
Centered around the massive transformation of consumers’ behavior and their buying patterns, Merchandise Popularity Prediction Challenge asked the participants to develop a machine learning model that can predict the popularity level of merchandise. This fortnight-long hackathon saw the participation of around 700 data scientists and machine learning practitioners who tirelessly worked towards building innovative solutions to this problem statement.
After a careful evaluation using the Multi-Class Log Loss (Multi-Class Cross-Entropy Loss) metric, along with the private and public leaderboard scores, MachineHack selected the following top three winners:
Winner 01: Saurabh Sawhney
Coming from a medical background, Saurabh is an eye surgeon by training. He had a private practice for close to 20 years in New Delhi. However, he has always been interested in computers and maths, and what better way to bring these together than Data Science.
During his years as an eye surgeon, he learned MS Excel and started applying its power for improving the outcomes of his surgeries. He also created several calculators that remain popular with the Ophthalmic community to date.
In 2017, he enrolled in the MCA program of IGNOU, where he learned some essential basics. More importantly, the program helped him believe that he could really do this. Last year, he gave up his medical practice to concentrate whole-heartedly on Data Science. After completing his MCA, Saurabh also took some online data science courses.
Being an ophthalmologist, computer vision excites him the most as he believes that the power of vision is the primary means to understand the world around us.
He also believes that we have just begun to realize the potential of computer vision and are miles away from what is truly possible. He hopes to contribute to the advancement of computer vision in the days to come.
Saurabh's solution to the Merchandise Popularity Prediction Challenge had two major components.
The first element is an ensemble of Random Forests which he likes to call Random Jungle. He created multiple Random Forest models and combined them for a stronger prediction.
“The question was, how do I get different forests from the same data? Is there an approach that can inject more randomness into a random forest?”
He decided to preprocess the data in different ways, increasing or decreasing the number of columns, and specifying the hyperparameters of the random forests differently. This included creating polynomial features as well as some simple feature generation which the polynomial function misses. He finally ended up with ten different preprocessing variations to which he applied the Random Forest algorithm, later combining the soft predictions from each forest using median values.
“The Random Jungle is a generic approach that I hope will be useful for data science problems in the future,” added Saurabh.
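The idea of several forests trained on differently preprocessed views of the same data, combined by the median of their soft predictions, can be sketched as follows. This is a minimal illustration and not Saurabh's actual code: the synthetic data, the three preprocessing variants (he used ten), and all hyperparameter values are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, PolynomialFeatures

# Synthetic stand-in data (assumption); the competition data is not used here.
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Each variant pairs a preprocessing step with its own forest settings.
# All three variants and their values are hypothetical.
variants = [
    (FunctionTransformer(), {"n_estimators": 100}),                # raw columns
    (PolynomialFeatures(degree=2, include_bias=False),
     {"n_estimators": 150, "max_features": "sqrt"}),               # more columns
    (FunctionTransformer(lambda A: A[:, :6]),
     {"n_estimators": 100, "min_samples_leaf": 3}),                # fewer columns
]

forests = [
    make_pipeline(prep, RandomForestClassifier(random_state=i, **params))
    .fit(X_train, y_train)
    for i, (prep, params) in enumerate(variants)
]

# Stack soft predictions: shape (n_forests, n_samples, n_classes).
probas = np.stack([f.predict_proba(X_test) for f in forests])

# Combine with the per-class median, then renormalize rows to sum to 1.
median_proba = np.clip(np.median(probas, axis=0), 1e-9, None)
median_proba /= median_proba.sum(axis=1, keepdims=True)

jungle_pred = forests[0].classes_[median_proba.argmax(axis=1)]
```

The renormalization step matters because per-class medians need not sum to one, and a log loss metric expects valid probability rows.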
While speaking about his MachineHack experience, Saurabh said, “Since May 2020, I have been trying my hand at the data problems offered by Machine Hack. There are many things I love about the platform. It is no-nonsense, professionally managed, and very easy to use. The problems are challenging, forcing a rethink of all I know in the effort to get a little extra performance. In my own journey as a Data Enthusiast, I have benefitted tremendously from sharpening my skills on these hackathons. The only grudge, if I can call it that, is the limited number of submissions per day. It prevents me from trying out all the things I would want to, but then I guess it is important to have that limit, or we would all be hill-climbing the test set. Overall, it has been a fantastic experience on Machine Hack. Thanks and kudos to the team behind it.”
Winner 02: Sachin Yadav
Sachin started his career as an application developer with exposure to the banking and insurance domain. His team used to get a lot of queries from the business, and there was a huge amount of underlying data that had to be analyzed to resolve them. This is where he became interested in playing with data and wanted to generate value out of it. He kick-started his journey with SAS and then gradually switched to Python. He took it to the next level when he enrolled in a postgraduate program in AI/ML from Great Lakes, and since then there has been no looking back.
To solve this challenge, Sachin first browsed the data and analyzed the columns before importing it into the Python environment. He does this to get comfortable with the data being analyzed and to spot any obvious patterns in it. For example, in this particular scenario, if the store_ratio was zero, then popularity was always zero.
He then moved on to exploratory data analysis, which made it evident that a certain class of records was overpowering the rest. He dropped the duplicates in the train dataset and, for test records that matched train records, mapped the popularity from train onto the test dataset.
“It is always worth a try to do this; it can result in a good score,” added Sachin.
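A small pandas sketch of the deduplicate-and-map step is given here. The column names store_ratio and popularity come from the description above; basket_ratio, the toy values, and the rule of skipping feature combinations with conflicting labels are assumptions made for illustration, not details of Sachin's solution.

```python
import pandas as pd

# Toy data (assumption); "basket_ratio" is a hypothetical second feature.
train = pd.DataFrame({
    "store_ratio":  [0.0, 0.0, 0.5, 0.5, 0.9],
    "basket_ratio": [0.1, 0.1, 0.4, 0.4, 0.7],
    "popularity":   [0,   0,   3,   3,   4],
})
test = pd.DataFrame({
    "store_ratio":  [0.5, 0.2, 0.0],
    "basket_ratio": [0.4, 0.3, 0.1],
})
feature_cols = ["store_ratio", "basket_ratio"]

# Drop exact duplicate rows so repeated records do not dominate training.
train_unique = train.drop_duplicates()

# Keep only feature combinations that map to a single label (assumed rule).
lookup = train_unique.drop_duplicates(subset=feature_cols, keep=False)

# Left-merge so every test row is kept; unmatched rows get NaN popularity
# and would fall back to the model's prediction.
matched = test.merge(lookup, on=feature_cols, how="left")
known = matched["popularity"].notna()
```

Rows where `known` is true can take the mapped label directly, while the remaining rows are scored by the trained model.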
This was followed by extensive feature creation and hyperparameter tuning of selected models (mainly CatBoost and GradientBoostingClassifier). But the score got stuck at a certain point and stopped improving. He then tried stacking and voting classifiers built on the best-performing baseline models, and finally settled on a voting classifier with LogisticRegression, ExtraTreesRegressor, CatBoostClassifier, and GradientBoostingClassifier as base estimators.
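To make the stacking-versus-voting comparison concrete, here is a hedged sketch using scikit-learn only. It is not Sachin's code: the data is synthetic, CatBoostClassifier is left out to avoid an extra dependency, and ExtraTreesClassifier is used in place of the ExtraTreesRegressor named above because scikit-learn's voting and stacking classifiers expect classifiers as base estimators. All hyperparameter values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Base estimators shared by both ensembles (each ensemble clones them).
base = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]

models = {
    # Soft voting averages the predicted class probabilities.
    "voting": VotingClassifier(estimators=base, voting="soft"),
    # Stacking trains a meta-model on out-of-fold base predictions.
    "stacking": StackingClassifier(
        estimators=base,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    ),
}

# Compare both on the competition metric, multi-class log loss.
scores = {
    name: log_loss(y_test, model.fit(X_train, y_train).predict_proba(X_test))
    for name, model in models.items()
}
```

Which ensemble scores better depends on the data, so comparing both on a held-out split is a reasonable way to choose between them.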
Speaking about his MachineHack experience, Sachin said, “I am a big fan of the MachineHack weekend hackathons, as they are a kind of sporting arena wherein people from different backgrounds work hard and try to outdo each other. They have increased the difficulty level with this current hackathon (Merchandise Popularity Prediction Challenge), wherein you are only allowed to test your model score three times a day. Thanks for all the learning I have gained out of these competitive hackathons.”
Winner 03: Ameya Patil
Our third winner, Ameya Patil, has completed a BTech in Electronics and works at an MNC in product design and development. He is passionate about exploring new areas and new technologies. His passion for learning (CV and NLP in particular) and his problem-solving skills led him to machine learning and deep learning. Without much background in coding, he enrolled in various MOOCs to learn about this demanding field. Outside his full-time job, he uses his free time to sharpen his skills in machine learning and deep learning. He tries to spend most of his time reading, doing projects, participating in hackathons, or doing anything that adds to his knowledge and builds on end-to-end use cases in the field of data science and product development.
Discussing his approach, he said there was a lot of experimentation involved in arriving at the final solution. His goal was to build a simple and generalized model, but most of his intuition about the problem led him astray; for example, log-transforming the skewed target led to significantly worse performance. Even making sense of the features was difficult given the limited information.
As is the case in machine learning hackathons, feature selection and feature engineering were crucial. He tried feature engineering with various automated libraries and polynomial features, but this was not enough to lift the score to a respectable position. In particular, he could not make sense of features such as time, Category_1, and Category_2 well enough to improve the score until the end of the hackathon. Of the several models he ran, ExtraTreeClassifier worked best for him initially. However, this was not enough to improve his rank in the competition. So the final model was a voting classifier combining CatBoost, ExtraTreeClassifier, and RandomForestClassifier; assigning different weights to them improved the score further. The crucial element in this hackathon was a prediction adjuster that checked for records overlapping between the train and test data and adjusted the predictions for them. This helped him further improve his scores.
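How weights enter a soft-voting ensemble can be illustrated with a short scikit-learn sketch. It is an illustration under assumptions rather than Ameya's model: the data is synthetic, CatBoost is omitted to keep the example dependent on scikit-learn alone, ExtraTreesClassifier (the ensemble variant) stands in for the extra-trees model, and the weights [2, 1] are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=3, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y)

weights = [2, 1]  # hypothetical: trust the extra-trees model twice as much

voter = VotingClassifier(
    estimators=[
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",
    weights=weights,
).fit(X_train, y_train)

weighted_proba = voter.predict_proba(X_test)

# Soft voting with weights is a weighted average of the fitted base
# models' class probabilities; reproduce it by hand to show this.
manual = np.average(
    [m.predict_proba(X_test) for m in voter.estimators_],
    axis=0,
    weights=weights,
)
```

In practice the weights would be tuned on a validation split, since a weighting that helps on one dataset may not help on another.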
Speaking about his MachineHack experience, Ameya said, “MachineHack is an amazing platform, especially for beginners. The problems here are simpler, giving a chance for new learners to get their hands dirty at machine learning problems. I intend to continue using MachineHack to practice and refresh my knowledge of Data Science. Winning solutions from previous hackathons are an invaluable learning resource that I highly encourage aspiring participants to leverage. MachineHack platform has been invaluable for my learning journey. Each and every hackathon ends up teaching something new. MachineHack has a lot of talented participants and healthy competition, which really forces one to surpass limits on solving the given problem.”