An ‘MLDS 2022’ exclusive hackathon ‘Ode to Code: Predicting weather using alien fruit properties,’ backed by machine learning hackathon platform MachineHack and data science and AI engineering company Tredence, successfully concluded on January 24, 2022.
The hackathon drew more than 580 participants, and over 300 solutions were posted on the leaderboard, from which three winners were selected. The winners received cash prizes worth INR 1 lakh.
The candidates who topped the leaderboard analysed data on growing fruits (similar to those on Earth) on an imaginary exoplanet. They also found a way to help scientists accurately identify, from the data collected, the Earth-like season in which a fruit must have grown.
Let’s meet the winners and look at their journeys into data science, their winning approaches, and their overall experience at MachineHack.
Rank 01: Suman Sahoo
A mathematics and computer science student, Suman Sahoo is in his pre-final year of college. He told AIM that he started learning data science in his second year. As a data science enthusiast, Sahoo enjoys participating in ML competitions. He is also a chess hobbyist.
Sahoo’s winning solution relied on several data preprocessing steps: one-hot encoding of categorical features for all models except LightGBM, where he used label encoding; scaling of numerical features (cap-diameter, stem-height, stem-width); feature generation using ratios of numerical features; imputation of missing categorical values with a new category; and outlier removal from numerical features, among others.
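The preprocessing steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Sahoo's actual code: the three numerical column names come from the article, while the categorical columns (`cap-shape`, `habitat`), the ratio feature names, the demo data, and the percentile clipping rule are all hypothetical. In particular, clipping is swapped in here for the outlier removal he mentions, since the article does not say how outliers were handled.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

NUM_COLS = ["cap-diameter", "stem-height", "stem-width"]  # named in the article
CAT_COLS = ["cap-shape", "habitat"]  # hypothetical categorical columns


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()

    # Impute missing categorical values with a new, explicit category.
    out[CAT_COLS] = out[CAT_COLS].fillna("missing")

    # Tame numeric outliers by clipping to the 1st-99th percentile range
    # (an assumed rule, used here in place of dropping rows).
    for col in NUM_COLS:
        low, high = out[col].quantile([0.01, 0.99])
        out[col] = out[col].clip(low, high)

    # Generate ratio features; the small epsilon avoids division by zero.
    eps = 1e-6
    out["height_to_width"] = out["stem-height"] / (out["stem-width"] + eps)
    out["cap_to_width"] = out["cap-diameter"] / (out["stem-width"] + eps)

    # Scale all numeric features to zero mean and unit variance.
    numeric = NUM_COLS + ["height_to_width", "cap_to_width"]
    out[numeric] = StandardScaler().fit_transform(out[numeric])

    # One-hot encode the categorical features.
    return pd.get_dummies(out, columns=CAT_COLS)


# Tiny made-up sample, only to show the function running end to end.
demo = pd.DataFrame({
    "cap-diameter": [5.0, 7.5, 6.1, 40.0],
    "stem-height": [4.0, 6.0, 5.5, 5.0],
    "stem-width": [1.0, 1.5, 1.2, 1.1],
    "cap-shape": ["x", None, "b", "x"],
    "habitat": ["d", "g", None, "d"],
})
features = preprocess(demo)
```

For a LightGBM model, the final one-hot step would be replaced by label encoding, as Sahoo describes.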
For cross-validation, he used five-fold StratifiedKFold along with multiple-seed averaging to reduce variance across models. In the modelling stage, he said, he used an ensemble of neural networks (NNs) and gradient-boosted trees (GBTs), with the NNs carrying more weight in the ensemble than the GBTs. “Default Catboost and TabNet worked best,” he added. For hyperparameter tuning of LightGBM and HistGradientBoostingClassifier, he used Optuna. Finally, in the post-processing stage, he soft-averaged the probabilities from all models.
Sahoo said he enjoys solving MachineHack problems. “MachineHack leaderboard helped to improve my skills, squeezing every bit of performance out of ML models. Looking forward to new contents,” added Sahoo.
Check out the code here.
Rank 02: Hrishi Morde
Hrishi Morde is a computer science graduate who currently works as an SDE at ZiMetrics. Besides development, Morde likes the analytics field. He said a friend introduced him to it in the third year of his bachelor's degree. Since then, he has been continuously learning and practising, and has also taken online courses to learn more about the field.
“Currently, I am trying to learn advanced concepts in deep learning, along with my work. And, I feel that I am not just learning new things, but these new things are helping me to approach the development programmes better,” said Morde.
He said he started solving ML problems by understanding the problem statement and the attributes in the dataset. After that, he did an extensive EDA (exploratory data analysis) to find insights and pure samples that belong to a specific class. This EDA drew on various analytical methods, such as hypothesis generation, scrutinising every variable to understand the data, and univariate, bivariate, and multivariate analysis.
After EDA, he started with feature engineering, implemented various features, and eliminated the ones causing metrics to go down. “I tried numerous models using different features and did the post modelling EDA, where I got some pure samples. After that, I tried to tune various models and implement a cross-validation strategy, but none gave me better accuracy locally. Hence, I skipped them and predicted the target variable,” said Morde.
“There are very few platforms that are giving an opportunity to machine learning practitioners to solve such incredible problem statements, and I would say that MachineHack is on top of that list,” said Morde. He said he hopes MachineHack will continue this great work and keep giving people the opportunity to hone their skills and learn more.
Check out the code here.
Rank 03: Eric Vos
Interestingly, Eric Vos is not a data scientist. He said he studied industrial IT and robotics 20 years ago, and the basics of traditional AI were covered as part of his course. “A few years ago, I was curious about new ML techniques like neural networks and deep learning. So I followed some great MOOCs (Andrew Ng, Geoffrey Hinton, etc.).” Since then, he has actively participated in various data science competitions and hackathons to improve his skills.
Vos said he started with plain EDA and figured out that the dataset is probably synthetic. “I tried to encode categorical features using various methods and worked on outliers in numeric features. However, the results of my modelling experiments were not good and induced huge overfitting. With no clear signal in the data set, I decided to try the MLJAR package. My best result was produced by an ‘Ensemble_Stacked’ MLJAR model trained in ‘perform’ mode,” he added.
Vos said he has participated in several MachineHack hackathons and learned a lot from the solutions published by top machine hackers. “It’s a great place to improve my machine learning skills and play with various original datasets. Thanks for all the learning I have gained out of these competitive hackathons,” said Vos.
Check out the code here.
It’s a wrap!
This one-of-a-kind hackathon, conducted by Tredence, was open to data scientists, machine learning practitioners, and analytics professionals keen to take a crack at predicting the weather using alien fruit properties. It presented the unique challenge of analysing data samples of fruits growing on an exoplanet and identifying the climate from their properties, with the added challenge of missing data.
As a data science and AI engineering company, Tredence focuses on solving the last-mile problem in the analytics space. The company has 1,500+ employees and a presence in Foster City, Chicago, London, Toronto and Bangalore, with clients across industries, including retail, CPG, hi-tech, telecom, travel and others.
Check out MachineHack for exciting new hackathons!