Meet The MachineHack Champions Who Cracked The ‘Glass Quality Prediction’ Hackathon

Amal Nair

MachineHack successfully concluded the sixth instalment of its weekend hackathon series last Monday. The Glass Quality Prediction hackathon was well received by data science enthusiasts, with close to 400 registrations and active participation from over 240 practitioners.

Out of the 246 competitors, three topped our leaderboard. In this article, we will introduce you to the winners and describe the approach they took to solve the problem.

#1: Devesh Darshan

Currently in his second year of Engineering at Birla Institute of Technology and Science, Pilani, Devesh first came across the term data science during his first year. Like many, he started his journey with the popular Stanford University course by Andrew Ng. His curiosity led him to many other popular online courses as well. He started practising with simple data sets such as Titanic and House Prices on Kaggle. He spends most of his time reading articles and blogs on Analytics India Magazine and Medium to learn new ML techniques.



Approach To Solving The Problem 

Devesh explains his approach as follows:

It was a rather simple approach from my side. First, I focused on feature engineering and on making the distribution of the data points as close to normal as possible by applying transformations like log and square root. Then I split the data into train and validation sets and built some models. I cross-validated their performance using stratified k-fold (StratifiedKFold) to get a better idea and observed that the ExtraTreesClassifier produced the best result. Then I fine-tuned the model and later used a Bagging ensemble technique to get a more accurate prediction.
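The workflow Devesh describes can be sketched roughly as follows. This is a minimal illustration, not his actual code: the data is synthetic, the choice of which columns to transform is made up, the hyperparameters are arbitrary, and log loss is assumed as the metric because the other winners mention it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, ExtraTreesClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the hackathon data (the real dataset is not used here).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           random_state=0)

# Make two columns non-negative and right-skewed (hypothetical features),
# then reduce the skew with log and square-root transforms.
X[:, 0] = np.exp(X[:, 0])
X[:, 1] = X[:, 1] ** 2
X_t = X.copy()
X_t[:, 0] = np.log1p(X_t[:, 0])   # log transform
X_t[:, 1] = np.sqrt(X_t[:, 1])    # square-root transform

# Stratified folds keep the class balance similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

base = ExtraTreesClassifier(n_estimators=50, random_state=0)
# Bagging several ExtraTrees models on bootstrap samples of the training data.
bagged = BaggingClassifier(base, n_estimators=5, random_state=0)

# scikit-learn reports negated log loss so that higher is always better.
base_scores = cross_val_score(base, X_t, y, cv=cv, scoring="neg_log_loss")
bagged_scores = cross_val_score(bagged, X_t, y, cv=cv, scoring="neg_log_loss")

print("ExtraTrees mean log loss:", -base_scores.mean())
print("Bagged     mean log loss:", -bagged_scores.mean())
```

Whether bagging helps on a given dataset is an empirical question; comparing the two mean scores over the same folds is one simple way to check.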

“MachineHack is the best platform for new data scientists to practice and test their skills, as some of the problems stated are very beginner-friendly, unlike Kaggle or other platforms where the problems require experience and a better machine to implement the solution” – Devesh shared his experience.

Get the complete code here.

#2: Vedant Thapa

Vedant is currently pursuing his Master’s Degree in Computer Science from Mithibai College, Mumbai. He first came across the term data science during the final year project of his Bachelor’s program. He was impressed by the applications of data science and started exploring more about it by reading blogs. As he grew more curious, he decided to take a course on Udemy. Later, he joined a data science program at GreyAtom School of data science, where he came across mentors who guided him in building a strong foundation with mathematics and statistics while focusing on practical aspects of data science. Since then, he has been honing and perfecting his data science skills through projects and hackathons.

Approach To Solving The Problem 

Vedant explains his approach briefly as follows:

  1. Started off with EDA and univariate visualisations of the independent variables in the training and testing sets. These visualisations suggested that the training and testing sets were from the same distribution
  2. I also found a rule in the ‘x_component’ columns: the few instances where none of the ‘x_component’ columns was 1 could be directly classified as 1 (target)
  3. The ‘grade_A_component’ and ‘x_component’ columns were found to be one-hot-encoded features
  4. I reversed the one-hot encoding in the ‘x_component’ columns and applied frequency encoding to the result
  5. I engineered some features from the numerical independent variables using basic arithmetic operations like addition, multiplication, subtraction and division
  6. Started training both linear and tree-based models. Tree-based models outperformed linear models by a significant margin. This was expected, as there was a poor linear relationship between the independent and dependent variables
  7. Finally, after trying out different bagging and boosting models like CatBoost, XGBoost and LightGBM, the ExtraTrees classifier with Stratified 5-Fold CV gave the lowest log loss and standard deviation
  8. I applied the rule found during EDA to the final predictions and submitted them
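Steps 2, 4 and 8 above can be illustrated with a small sketch. The column names, rows and model probabilities below are made up for illustration and are not taken from Vedant's code; in practice the frequencies would typically be computed on the training set and reused for the test set.

```python
from collections import Counter

# Hypothetical one-hot 'x_component' columns (names are assumptions).
cols = ["x_component_1", "x_component_2", "x_component_3"]
rows = [
    {"x_component_1": 1, "x_component_2": 0, "x_component_3": 0},
    {"x_component_1": 0, "x_component_2": 1, "x_component_3": 0},
    {"x_component_1": 1, "x_component_2": 0, "x_component_3": 0},
    {"x_component_1": 0, "x_component_2": 0, "x_component_3": 0},  # none set
]

def reverse_one_hot(row):
    """Return the name of the active column, or 'none' if every column is 0."""
    for col in cols:
        if row[col] == 1:
            return col
    return "none"

# Step 4a: collapse the one-hot columns back into one categorical value.
categories = [reverse_one_hot(r) for r in rows]

# Step 4b: frequency encoding replaces each category with its share of rows.
counts = Counter(categories)
freq_encoded = [counts[c] / len(rows) for c in categories]

# Steps 2 and 8: rows with no active 'x_component' are set to class 1,
# overriding whatever the model predicted.
model_probs = [0.20, 0.70, 0.35, 0.60]  # made-up model outputs, P(class 1)
final_probs = [1.0 if c == "none" else p
               for c, p in zip(categories, model_probs)]

print(categories)
print(freq_encoded)
print(final_probs)
```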

“MachineHack platform has been invaluable for my learning journey. Each and every hackathon ends up teaching something new. MachineHack has a lot of talented participants and healthy competition, which really forces one to his/her limits on solving the given problem. Any concerns related to the hackathons are also addressed in quick time by the organisers, which has helped me a lot as a beginner. Their articles on AIM are really inspiring and informative and should definitely be followed. I intend to continue participating and honing my skills through this amazing platform.” – Vedant shared his experience.

Get the complete code here.

#3: G Mothy

G Mothy is a final-year student of Computer Engineering at Army Institute of Technology, Pune. His data science journey started with his internship at IIT Madras under the guidance of IIT professors.

Since then, he has never looked back and has been exploring different areas of data science with the help of his seniors, using Kaggle and other platforms that host hackathons. He likes exploring various types of hackathons to experiment and acquire new skills.

Approach To Solving The Problem 

He explains his approach briefly as follows:

On inspecting the data, grade_A and x_component turned out to be one-hot-encoded features. One of the grade_A columns was a redundant dummy variable, so the feature grade_A_component_1 was removed. This was not the case for the x_component columns, however.

The integer part of pixel_area and log_area was the same, so pixel_area was removed. New features were created from xmax, xmin, ymin and ymax by applying some arithmetic operations.
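As an illustration of the kind of arithmetic features that can be derived from xmax, xmin, ymin and ymax, here is a small sketch. The specific features (width, height, box area, aspect ratio) and the helper name box_features are assumptions made for this example; the article does not say which features were actually created.

```python
def box_features(xmin, xmax, ymin, ymax):
    """Derive simple arithmetic features from bounding-box coordinates."""
    width = xmax - xmin           # subtraction
    height = ymax - ymin
    return {
        "width": width,
        "height": height,
        "box_area": width * height,                          # multiplication
        "aspect_ratio": width / height if height else 0.0,   # guarded division
    }

# Made-up coordinates for a single row.
feats = box_features(xmin=10, xmax=30, ymin=5, ymax=15)
print(feats)
```

The division is guarded because a degenerate box with zero height would otherwise raise a ZeroDivisionError.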

On plotting count plots for the x_component features, a few clear classification conditions emerged.

x_component == 1 -> it is class 1

With these features, ExtraTreesClassifier provided the best local 5-fold cross-validation score after a few features were removed based on feature importance and the log_loss metric.
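For readers unfamiliar with the log_loss metric mentioned here, the sketch below computes binary log loss by hand. The labels and probabilities are made up, and the clipping constant eps is an assumption chosen to keep the logarithm finite when a prediction is exactly 0 or 1.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary log loss: mean of -[y*log(p) + (1-y)*log(1-p)]."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)   # clip so log() never sees 0
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # correct and fairly sure
uncertain = [0.6, 0.4, 0.6, 0.4]   # correct but hesitant

print(log_loss(y_true, confident))
print(log_loss(y_true, uncertain))
```

Lower is better: the metric rewards confident correct probabilities and heavily penalises confident mistakes, which is why hard rules applied to predictions only help when they are almost always right.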

“The competitions organised by MachineHack are good for beginners to try and learn the concepts in a competitive environment. Overall experience was mostly filled with learnings, and I would love to explore and participate in more such challenges in future.” – he shared his experience.

Get the complete code here.

Check out new hackathons here.

Copyright Analytics India Magazine Pvt Ltd
