Meet The MachineHack Champions Who Cracked The ‘Forest Cover Classification’ Hackathon

Amal Nair

MachineHack successfully concluded its twelfth instalment of the weekend hackathon series last Monday. The Forest Cover Classification hackathon was greatly welcomed by data science enthusiasts with over 240 registrations and active participation from close to 130 practitioners.

Out of the 130 competitors, three topped our leaderboard. In this article, we will introduce you to the winners and describe the approach they took to solve the problem.

#1| Devrup Banerjee

Devrup first learnt Python out of sheer need, to automate routine work and gather data at scale. His real passion for data science took hold in the second year of his MBA at Great Lakes Institute of Management, Gurgaon, during a marketing and retail analytics class. He realised that the real motivation for learning these algorithms was not to improve accuracy, but to tell a client how much their bottom line could grow if they followed the recommended path. The subject changed his life.



“My roommate, who was equally inspired, and I used to have sleepless nights just going through the 25 lacs dataset given as a final project with our rickety computers to generate actionable insights. To better the bottom-line percentage, that’s what inspired me into analytics,” he said.

His team won several MBA-level competitions, including the MRA tournament at IIT Kanpur, and finished as runners-up in IIM Kashipur’s analytics case study competition.

He is currently diving deeper into data science to sharpen his analytical skills, so that when someone hands him a dataset in future, he can act as both a business analyst and a data scientist.

Approach To Solving The Problem 

Devrup explains his approach briefly as follows:

The first thing to notice about the problem was how similar the train and test datasets were. Almost every boxplot and histogram showed an identical picture for the two sets. This suggested that the train and test sets had been produced by a stratified split of the main dataset, so oversampling the minority classes would serve no purpose. Extensive feature engineering and a reverse clustering based on the test set helped reach a certain level of accuracy, and clipping a few values based on the train distribution gave a major boost. CatBoost and LightGBM gave the best results.
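To make the clipping step concrete, here is a minimal sketch of clipping a feature to bounds taken from the training data. The article does not say which values Devrup clipped or how he chose the bounds, so the 1st and 99th percentiles, the helper names and the toy data are all assumptions made for illustration.

```python
from statistics import quantiles

def clip_bounds(train_values, lower_pct=1, upper_pct=99):
    """Return (low, high) percentile bounds computed from the training column only."""
    # quantiles(..., n=100) yields 99 cut points: the 1st to 99th percentiles.
    cuts = quantiles(train_values, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]

def clip(values, low, high):
    """Pull every value outside [low, high] back to the nearest bound."""
    return [min(max(v, low), high) for v in values]

# Hypothetical feature column: training values 0..100, test values with outliers.
train_col = [float(v) for v in range(0, 101)]
test_col = [-50.0, 10.0, 55.5, 250.0]

low, high = clip_bounds(train_col)
clipped = clip(test_col, low, high)
```

The bounds come from the training column alone, so the same limits can be applied to both the train and test sets without looking at test labels.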

“MachineHack has been a huge source of inspiration and learning, along with Analytics India Magazine, which keeps us up to date on the latest happenings from around the world in analytics. They have established themselves as the domain leaders, and I won’t be surprised if they are soon known as the Indian Kaggle,” he said of MachineHack.

Get the complete code here.

#2| Karan Juneja

Karan is an Electronics and Telecommunication Engineer from PICT, Pune. His data science journey began out of his passion and curiosity for robotics. He has been acquiring new data science skills from free online resources as well as by participating in hackathons. 

Karan is a regular participant in MachineHack hackathons and has finished in the top three in several competitions.

Approach To Solving The Problem 

Karan explains his approach briefly as follows:

The hackathon used an openly available dataset, but I still tried some feature engineering of my own, which got me a score of around 0.255 with LightGBM. I also found that XGBoost could perform much better than LightGBM on this particular dataset, but I chose LightGBM because of limited computational power. The competition was tight, with very good scores on the leaderboard, so I tried pseudo labelling. It worked and brought my score to around 0.20.
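For readers new to pseudo labelling, the idea can be sketched as follows: train a model, label the test rows it is most confident about, add those rows to the training set and train again. The article does not describe how Karan selected his pseudo labels. The 0.9 confidence threshold, the function names and the toy nearest-centroid model (standing in for LightGBM so that the example has no dependencies) are assumptions.

```python
import math

def fit_centroids(X, y):
    """Toy stand-in model: per-class mean of a single feature; returns predict_proba."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    centroids = {label: sums[label] / counts[label] for label in sums}

    def predict_proba(x):
        # Softmax over negative distances to each class centroid.
        weights = {label: math.exp(-abs(x - c)) for label, c in centroids.items()}
        total = sum(weights.values())
        return {label: w / total for label, w in weights.items()}

    return predict_proba

def pseudo_label(fit, X_train, y_train, X_test, threshold=0.9):
    """One round of pseudo labelling: train, label confident test rows, retrain."""
    predict_proba = fit(X_train, y_train)
    X_aug, y_aug = list(X_train), list(y_train)
    for x in X_test:
        proba = predict_proba(x)
        label = max(proba, key=proba.get)
        if proba[label] >= threshold:  # keep only confident predictions
            X_aug.append(x)
            y_aug.append(label)
    return fit(X_aug, y_aug), len(X_aug) - len(X_train)

# Hypothetical data: two well-separated classes and three unlabelled test rows.
X_train = [0.0, 1.0, 9.0, 10.0]
y_train = ["a", "a", "b", "b"]
X_test = [0.5, 5.0, 9.5]

model, n_added = pseudo_label(fit_centroids, X_train, y_train, X_test)
```

The ambiguous test row at 5.0 sits halfway between the two classes and is left out, which is the point of the threshold: only predictions the first model is confident about are fed back as labels.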

Get the complete code here.

#3| V G Sravan

Sravan is a third-year Electronics and Communication Engineering student at IIT Kharagpur. Intrigued by technology and its advancements, he began his journey into data science and machine learning with a focus on deep learning.

Amidst the pandemic, Sravan has been spending his time practising machine learning by taking part in online hackathons.

“MachineHack is a very good place for beginners; it’s a very easy and competitive place to work on different ML models,” Sravan said of MachineHack.

Approach To Solving The Problem 

Sravan explains his approach briefly as follows:

This hackathon dataset was already very clean and uniformly distributed, so data cleaning and EDA were not required initially. Of the different models I trained, ExtraTrees worked very well. This was expected, as the features had low cardinality and there were many training samples. Feature extraction and analysis played an important role in reducing the error.
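As a rough illustration of the ExtraTrees model mentioned above, the sketch below cross-validates scikit-learn's ExtraTreesClassifier on synthetic data. Sravan's features, hyperparameters and evaluation metric are not given in the article, so the generated dataset, the parameter values and the accuracy metric here are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the hackathon data (assumption): 600 rows, 3 classes.
X, y = make_classification(
    n_samples=600,
    n_features=10,
    n_informative=6,
    n_classes=3,
    random_state=0,
)

# Extremely randomised trees: split thresholds are drawn at random per feature,
# which keeps training cheap even with many samples.
model = ExtraTreesClassifier(n_estimators=100, random_state=0)

# Three-fold cross-validated accuracy; one score per fold.
scores = cross_val_score(model, X, y, cv=3, scoring="accuracy")
mean_accuracy = scores.mean()
```

Comparing `mean_accuracy` across a few candidate models is one simple way to arrive at a choice like the one described above.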

Get the complete code here.

Copyright Analytics India Magazine Pvt Ltd