
Meet This Week’s MachineHack Champions Who Cracked The ‘Metal Furnace Challenge’


MachineHack’s latest venture into a new series of hackathons has excited the Machine Learning and Data Science community. Last week, MachineHack launched its first weekend hackathon, a short-format contest in which participants have just the weekend to compete and win.

The hackathon received an overwhelming response from the Data Science and Machine Learning community with close to 300 registrations and 166 active participants.

Out of the 166 competitors, three topped our leaderboard. We introduce them below, along with the approaches they took to solve the problem.

#1: G Mothy

G Mothy is a final-year student of Computer Engineering at Army Institute of Technology, Pune. His data science journey started with an internship at IIT Madras under the guidance of its professors.

From then on, he never looked back and has been exploring different areas of data science with the help of his seniors and platforms like Kaggle and Analytics Vidhya. He likes taking part in various types of hackathons to experiment and acquire new skills.

Approach To Solving The Problem 

“Working on this dataset was challenging, and new ideas were not striking as the data was completely anonymised, which provided limited scope for feature engineering,” Mothy said.

He explains his approach briefly as follows.

  1. Started with exploratory data analysis to understand the relationships between the features and to create new ones. Although local k-fold cross-validation provided a way to validate submissions, the 30% of test data used for the public leaderboard did not give the right insight into model performance.
  2. EDA showed that feature ‘f9’ had only one unique value, so it was removed.
  3. Transformed the test feature values into train feature values, as there was a linear relationship between the two.
  4. Trained an XGBoost model on 80% of the train data and validated on the remaining 20% (a rough sketch follows the list).
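
A minimal sketch of the final step is shown below, assuming a hypothetical train.csv with a column named "target"; the hyperparameters are illustrative, not Mothy’s exact settings.

    # Sketch of step 4: 80/20 split and an XGBoost model (names are assumptions)
    import pandas as pd
    import xgboost as xgb
    from sklearn.model_selection import train_test_split

    train = pd.read_csv("train.csv")            # hypothetical file name
    X = train.drop(columns=["target", "f9"])    # f9 dropped: only one unique value
    y = train["target"]

    # 80/20 split of the train data for training and validation
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

    model = xgb.XGBClassifier(n_estimators=500, learning_rate=0.05)
    model.fit(X_tr, y_tr)
    print("Validation accuracy:", model.score(X_val, y_val))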

“This platform was quite new to me. The public and private split made the hackathon more challenging. The overall experience was filled with learnings, and I would love to explore and participate in more such challenges in the future,” he said of his experience on MachineHack.

Get the complete code here.

#2: Rahul Gupta

Rahul Gupta is a BTech student pursuing Electronics and Communication Engineering at Shri Ramswaroop Memorial Group of Professional Colleges, Lucknow.

Currently in his third year, he started his journey towards machine learning in his second year while exploring his interest in the latest technologies. Machine Learning as a field excited him enough to venture into it. He started with the basics of Python and some essential libraries, and later dived into the basics of machine learning.

Approach To Solving The Problem 

Rahul explains his approach briefly as follows:

  1. Realized that certain features could be treated as categorical and therefore converted them into one-hot vectors.
  2. Did some feature engineering, adding new features by taking means across columns.
  3. Selected specific features based on their feature importances.
  4. Finally, applied a GradientBoostingClassifier, which gave the best result (a rough sketch follows the list).
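
A rough sketch of these four steps, assuming a hypothetical train.csv with a "target" column; the cardinality threshold, engineered feature, and number of selected features are illustrative choices rather than Rahul’s actual settings.

    # Sketch of the four steps above; column names and thresholds are assumptions
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier

    train = pd.read_csv("train.csv")             # hypothetical file name
    X = train.drop(columns=["target"])
    y = train["target"]

    # 1. Treat low-cardinality features as categorical and one-hot encode them
    cat_cols = [c for c in X.columns if X[c].nunique() <= 8]
    num_cols = [c for c in X.columns if c not in cat_cols]

    # 2. Feature engineering: mean taken across the numeric columns of each row
    X["row_mean"] = X[num_cols].mean(axis=1)
    X = pd.get_dummies(X, columns=cat_cols)

    # 3. Fit once, then keep only the most important features
    gbc = GradientBoostingClassifier().fit(X, y)
    importances = pd.Series(gbc.feature_importances_, index=X.columns)
    selected = importances.nlargest(20).index

    # 4. Final model on the selected features
    final_model = GradientBoostingClassifier().fit(X[selected], y)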

“It was a great experience to work on this platform and to apply my theoretical concepts to a practical scenario. It is also a good platform for beginners like me to showcase our skills,” he added about MachineHack.

Get the complete code here.

#3: Kranthi Kiran

Kranthi Kiran is a Computer Science engineering student at Army Institute of Technology, Pune.

He first came across Machine Learning when one of his friends was working on the Titanic Survival Challenge, and he was amazed that one could predict the survival of a person in a natural disaster.

This made him curious enough to try out the problem on his own. By the time he finished the competition, he was totally astonished by the power of Analytics and Machine Learning on real-world problems.

Approach To Solving The Problem 

Kiran explains his approach as follows:

Firstly, the data looked like it had been scaled or normalised by some means. I found that both the train and test sets had the same standard deviation across all features, which suggested that the data might have been artificially generated and then normalised to mask the values.

Next, I hypothesised that the features were categorical in nature because they had very few unique values (in the range of 2-8), and only three features had unique-value counts in double digits. To test this hypothesis, I checked whether there were any common values between train and test, but there were none. I then compared the distributions of values in the train and test features side by side.
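
The check he describes could look something like the sketch below, assuming hypothetical train.csv and test.csv files: it counts the unique values per feature and looks for values shared between the two sets.

    # Sketch of the unique-value and overlap check (file names are assumptions)
    import pandas as pd

    train = pd.read_csv("train.csv")
    test = pd.read_csv("test.csv")

    for col in test.columns:
        n_unique = train[col].nunique()
        common = set(train[col].unique()) & set(test[col].unique())
        print(f"{col}: {n_unique} unique values in train, {len(common)} shared with test")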

In this side-by-side comparison, we can see that even though the train and test values differ (they are very close in terms of Euclidean distance), the percentages of values are very similar. So if we could map these uncommon test values onto train values, the model could learn and perform better on the test set.

This could actually be done by two methods:

1. Min-Max scaling on the train and test sets separately, as each has a different distribution.

2. Choosing the closest train value as a neighbour for each test value, for every feature.

The Min-Max scaling option worked better on local CV and on the leaderboard.
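
Both options could be sketched roughly as follows, again assuming hypothetical train.csv and test.csv files; fitting MinMaxScaler separately on each set corresponds to option 1, and the helper function is one possible implementation of option 2, not the author’s exact code.

    # Option 1: Min-Max scale train and test separately, since their value
    # grids differ (file and column names are assumptions)
    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler

    train = pd.read_csv("train.csv")
    test = pd.read_csv("test.csv")
    feature_cols = list(test.columns)

    train[feature_cols] = MinMaxScaler().fit_transform(train[feature_cols])
    test[feature_cols] = MinMaxScaler().fit_transform(test[feature_cols])

    # Option 2: snap each test value to its nearest train value, per feature
    def map_to_nearest_train_value(train_col, test_col):
        train_vals = np.sort(train_col.unique())
        idx = np.clip(np.searchsorted(train_vals, test_col), 1, len(train_vals) - 1)
        left, right = train_vals[idx - 1], train_vals[idx]
        return np.where(test_col - left <= right - test_col, left, right)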

The next problem was class imbalance, as some classes had as few as 2-3 records of training data. I used SMOTE oversampling to balance out the minority classes. The problem after oversampling, however, is that the model can no longer see the original imbalance of the target in the train data: a test set may have the same target distribution as the original train data, but a model trained on the oversampled set may assign lower probabilities to the originally frequent targets (because of the extra minority-class records).
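
A minimal sketch of the SMOTE step, using the imbalanced-learn library; the very small k_neighbors value is an assumption driven by the 2-3 record minority classes, not necessarily the setting Kiran used.

    # Sketch of SMOTE oversampling with imbalanced-learn (names are assumptions)
    import pandas as pd
    from imblearn.over_sampling import SMOTE

    train = pd.read_csv("train.csv")
    X, y = train.drop(columns=["target"]), train["target"]

    # The rarest classes have only 2-3 records, so k_neighbors must stay small
    smote = SMOTE(k_neighbors=1, random_state=42)
    X_res, y_res = smote.fit_resample(X, y)
    print(y_res.value_counts())    # classes are now balanced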

While SMOTE brought great results on the public set, I relied on my local CV and trained my final model on a Min-Max scaled version of the data.

I baselined almost all models; gradient boosting methods worked great, especially XGBoost and LightGBM. I ended up using XGBoost for my final model, as it worked a tad better locally than LightGBM.

“MachineHack is a great platform for anybody practising Data Science and Machine Learning, as you can compete with anybody, from a student to a Data Scientist with 10 years of experience, and learn tremendously while competing with the best,” he said of MachineHack.

Get the complete code here.

Amal Nair

A Computer Science Engineer turned Data Scientist who is passionate about AI and all related technologies. Contact: amal.nair@analyticsindiamag.com