Meet This Week’s MachineHack Champions Who Cracked The ‘Insurance Churn Prediction’ Hackathon

MachineHack concluded the second instalment of its weekend hackathon series this Monday. The Insurance Churn Prediction Hackathon turned out to be a blockbuster, drawing close to 400 registrations and active participation from over 200 members of the data science and machine learning community.

Out of the 221 competitors, three topped our leaderboard. In this article, we introduce the winners and the approaches they took to solve the problem.

#1: Karan Juneja

Karan is an Electronics and Telecommunication Engineer from PICT, Pune. His data science journey began out of his passion and curiosity for robotics. He has been acquiring new data science skills from free online resources as well as by participating in hackathons. 

Approach To Solving The Problem 

Karan explains his approach briefly as follows.

  1. Since all the features were anonymised, there was very little room for feature engineering. The best strategy was to find the correlations using pairplots and try to remove the noise
  2. Identified and removed all insignificant features using XGBoost
  3. Tried mean-encoding and also created new features with polynomial features
  4. Trained two models, XGBoost (5-fold stratified) and LightGBM (10-fold stratified), and ensembled them with the weights that got the best score on the leaderboard (see the sketch after this list)
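A minimal sketch of the cross-validated training and blending in step 4, assuming X, y and X_test are NumPy arrays prepared as described above; the default hyper-parameters and the 0.5/0.5 blend weights are illustrative stand-ins, not Karan’s tuned values:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

def cv_test_probs(model, X, y, X_test, n_splits):
    """Train on each stratified fold and average the test-set probabilities."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    probs = np.zeros(len(X_test))
    for train_idx, _ in skf.split(X, y):
        model.fit(X[train_idx], y[train_idx])
        probs += model.predict_proba(X_test)[:, 1] / n_splits
    return probs

xgb_probs = cv_test_probs(XGBClassifier(), X, y, X_test, n_splits=5)
lgb_probs = cv_test_probs(LGBMClassifier(), X, y, X_test, n_splits=10)

# Blend with the weights that scored best on the leaderboard (illustrative here).
final_probs = 0.5 * xgb_probs + 0.5 * lgb_probs
```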

“It’s an amazing platform. This was just my second competition on this platform, and so far, my experience has been sublime,” said Karan.

Get the complete code here.

#2: Atif Hassan

Atif has always been fascinated with the idea of building something intelligent out of one’s own logic. As a high school student, he worked on fusing games with evolutionary algorithms. During his graduation, he worked on various ML-based projects such as topic recommendation systems, classification of scientific articles, etc.

His curiosity led him to choose data science as a career while pursuing his Master’s at IIT Kharagpur.

He is an active participant in many hackathons across platforms such as MachineHack and HackerEarth.

He has also published two novel algorithms in the data-mining and bioinformatics fields, with a third, on NLP, currently under review.

With all his skills and determination, he hopes to join an R&D department in industry or government and use data science to simplify the lives of his fellow Indians.

Approach To Solving The Problem 

He explains his approach as follows:

As the data was anonymized, there was no scope for feature engineering. It was clear from the data that some features had been standardized (treated as numeric) while others had not (treated as categorical). feature_8 was one-hot-encoded (OHE) and the other features were left as is. I noticed that XGBoost worked better on the OHE dataset, while LightGBM handles categorical variables internally and requires no one-hot-encoding; RandomForest, too, was better off without it. I therefore maintained two separate datasets, one for XGBoost and the other for RandomForest and LightGBM.
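A sketch of that two-dataset setup, assuming the training features live in a pandas DataFrame with a column named feature_8; the file name and everything else here is illustrative:

```python
import pandas as pd

df = pd.read_csv("train.csv")  # hypothetical file name

# Dataset 1: one-hot encode feature_8 for XGBoost.
df_xgb = pd.get_dummies(df, columns=["feature_8"])

# Dataset 2: keep feature_8 as a single integer-coded column for RandomForest
# and LightGBM (LightGBM can be told it is categorical via categorical_feature).
df_trees = df.copy()
df_trees["feature_8"] = df_trees["feature_8"].astype("category").cat.codes
```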

I tried reverse-engineering the features in order to derive some insight, but it did not help. It was also clear from plotting the data that none of the features could conclusively separate the positive and negative samples, so no variables were removed. Having finished the initial analysis and data transformation, I shifted my focus to building better models. I developed two separate ensembles.

Since the dataset was highly imbalanced, the first ensemble was built using cost-sensitive versions of the RandomForest and LightGBM classifiers. Specifically, each classifier’s hyper-parameters were fixed through 10-fold stratified CV, and their probability outputs were combined through a weighted average.
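A minimal sketch of that first ensemble, assuming X_trees, y and X_trees_test are the prepared non-one-hot-encoded arrays; the cost-sensitive flags shown (class_weight, is_unbalance) and the 0.6/0.4 weights stand in for whatever was actually tuned via CV:

```python
from sklearn.ensemble import RandomForestClassifier
from lightgbm import LGBMClassifier

# Cost-sensitive settings; the real hyper-parameters were fixed through
# 10-fold stratified CV, so everything below is illustrative.
rf = RandomForestClassifier(n_estimators=500, class_weight="balanced", random_state=42)
lgbm = LGBMClassifier(is_unbalance=True, random_state=42)

rf.fit(X_trees, y)
lgbm.fit(X_trees, y)

# Weighted average of the two probability outputs.
ens1_probs = (0.6 * rf.predict_proba(X_trees_test)[:, 1]
              + 0.4 * lgbm.predict_proba(X_trees_test)[:, 1])
```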

Due to the data imbalance and the relatively large sample space, the second ensemble was trained on data with the majority class randomly undersampled. This ensemble consisted of RandomForest and LightGBM classifiers as base estimators. Their output probabilities, along with a weighted average of the two, were concatenated with the entire dataset and fed to an XGBoost model that produced probabilities as output.

Finally, the probabilities from both ensembles were combined using a weighted average to yield the final result.
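Under the same assumptions as above, a sketch of the second, stacked ensemble and the final blend; the undersampler, the 50/50 averages and the meta-model settings are illustrative:

```python
import numpy as np
from imblearn.under_sampling import RandomUnderSampler
from sklearn.ensemble import RandomForestClassifier
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier

# Randomly undersample the majority class, then fit the base estimators.
X_us, y_us = RandomUnderSampler(random_state=42).fit_resample(X_trees, y)
rf = RandomForestClassifier(random_state=42).fit(X_us, y_us)
lgbm = LGBMClassifier(random_state=42).fit(X_us, y_us)

def stacked_features(X_in):
    """Append base-model probabilities and their weighted average to the features."""
    p_rf = rf.predict_proba(X_in)[:, 1]
    p_lgb = lgbm.predict_proba(X_in)[:, 1]
    return np.column_stack([X_in, p_rf, p_lgb, 0.5 * p_rf + 0.5 * p_lgb])

# XGBoost meta-model on the augmented data, then the final weighted average.
meta = XGBClassifier().fit(stacked_features(X_trees), y)
ens2_probs = meta.predict_proba(stacked_features(X_trees_test))[:, 1]
final_probs = 0.5 * ens1_probs + 0.5 * ens2_probs
```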

“MachineHack is a great platform where one can practice theoretical concepts and further one’s practical knowledge to a large extent. With talented competitors on the leaderboard, one is really pushed to one’s limits to solve a problem, which helps a person improve a lot. The hosting, submission and leaderboard of all challenges are seamlessly put together by the platform, allowing participants to quickly take part in competitions as well as perform fast re-iteration of their models,” said Atif, sharing his views on MachineHack.

Get the complete code here.

#3: Nikhil Kumar Mishra

Nikhil is currently a final-year Computer Science Engineering student at PESIT South Campus, Bangalore. He started his data science journey in his second year after being inspired by a YouTube video on self-driving cars. The technology intrigued him and drew him into the world of ML. He started with Andrew Ng’s famous course and applied his knowledge in the hackathons he participated in.

Kaggle’s Microsoft Malware Prediction hackathon, in which he finished 25th, was a turning point in his data science journey, as it gave him the confidence to take it further and challenge himself with more hackathons on platforms like MachineHack.

Approach To Solving The Problem 

Nikhil explains his approach as follows:

On exploring the data, I found that two columns had 12 unique values, so I assumed they were months, and one column had 31 unique values, so I assumed it represented the day of the month. Recovering the actual month and day numbers from the values provided was difficult, so I created very basic features from them, which gave me a good boost in the score.
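The exact features aren’t spelled out, so the following is only a sketch of one “basic” treatment consistent with the description: mapping each presumed month/day column to ordered integer codes that tree models can split on. The column names are placeholders:

```python
import pandas as pd

df = pd.read_csv("train.csv")  # hypothetical file name

# month_col_1 and month_col_2 stand for the two 12-value columns,
# day_col for the 31-value column; map each to ordered integer codes.
for col in ["month_col_1", "month_col_2", "day_col"]:
    codes = {v: i + 1 for i, v in enumerate(sorted(df[col].unique()))}
    df[col + "_code"] = df[col].map(codes)
```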

Once done with this little feature trick, I created two models, one LightGBM and one ExtraTrees, and finally made an ensemble out of them.

Next, I found the optimal threshold for maximising the F1-score, which came out to be quite low, 0.27, and used the thresholded predictions as my final submission.
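A minimal sketch of that threshold search, assuming y_val and val_probs hold held-out labels and the ensemble’s probabilities for them, and test_probs the test-set probabilities; the grid search is an illustrative way to arrive at the reported 0.27:

```python
import numpy as np
from sklearn.metrics import f1_score

# Scan candidate thresholds and keep the one that maximises validation F1.
thresholds = np.arange(0.05, 0.95, 0.01)
scores = [f1_score(y_val, (val_probs >= t).astype(int)) for t in thresholds]
best_t = thresholds[int(np.argmax(scores))]  # reported to come out near 0.27

# Apply the tuned threshold to the test probabilities for the final submission.
final_labels = (test_probs >= best_t).astype(int)
```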

“MachineHack is an amazing platform, especially for beginners. The MachineHack team is very helpful in understanding and interacting with participants to get doubts resolved. Also, the community is ever-growing, with new and brilliant participants coming up in every competition. I intend to continue using MachineHack to practice and refresh my knowledge on data science,” says Nikhil about his experience with MachineHack.

Get the complete code here.

Amal Nair

A Computer Science Engineer turned Data Scientist who is passionate about AI and all related technologies. Contact: amal.nair@analyticsindiamag.com
