Genpact, in collaboration with Formula E team Envision Racing and MachineHack, successfully completed the Dare in Reality hackathon for data scientists and machine learning professionals on 22 November. The goal? To help the racing team improve its performance in the all-electric, international single-seater world championship. The hackathon welcomed more than 5,200 participants and over 10,000 submissions within just two weeks.
“The idea for organising the Dare in Reality hackathon was to let data science professionals, machine learning engineers, AI practitioners, and other tech enthusiasts work on a real-world problem statement,” said Krishna Rastogi, the Product Lead and Technical Architect at MachineHack. “The hackathon has had one of the highest numbers of participants and submissions at MachineHack, where the rankings were based on the RMSLE metric to predict drivers’ lap times in qualifying rounds ahead of a race. Our participants have solved the problem in many innovative ways.”
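For context, the RMSLE metric used to rank submissions can be sketched in a few lines. This is a minimal illustration of the metric itself, not competition code, and the lap-time values are made up:

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error: penalises relative (ratio)
    errors, so errors on fast and slow laps are weighted comparably."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

# Example: predicted vs. actual lap times in seconds (illustrative values)
print(rmsle([80.0, 82.5], [81.0, 82.0]))
```

A lower RMSLE means predictions that are proportionally closer to the true lap times.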
Let’s take a look at the winners who impressed the judges with their data skills and took home highly coveted cash prizes and goodies.
Rank 01: Igor Maleev
Maleev was crowned the winner of the Dare in Reality hackathon. Maleev became interested in data science while studying for a PhD in mathematics and statistics. He has experience working as a data scientist in the advertising and retail space and is a data science consultant right now.
Winning Approach
Fig 1: Data Distribution of Free Practices and Qualifying groups
Maleev says the main idea behind his win was to train the model only on the green segment of Fig 1. After training and testing on this data, his local scores closely matched the leaderboard scores during the competition. The rest was largely technical: feature engineering, data cleaning, and model training and tuning.
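The core idea, training only on the rows whose distribution matches what must be predicted, can be sketched as below. The column names and values are assumptions for illustration, not from the actual competition dataset:

```python
import pandas as pd

# Hypothetical sketch: restrict training to the segment of the data that
# matches the qualifying-group distribution (the "green segment" in Fig 1).
train = pd.DataFrame({
    "EVENT": ["Free Practice 1", "Qualifying Group A",
              "Qualifying Group B", "Free Practice 2"],
    "LAP_TIME": [92.1, 88.4, 88.9, 91.7],
})

# Keep only qualifying-group laps: the distribution the model must predict
qualifying = train[train["EVENT"].str.startswith("Qualifying")]
print(qualifying["LAP_TIME"].mean())
```

Filtering like this trades training-set size for a closer match between training and evaluation distributions.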
Check out the code here.
Rank 02: Mahesh Yadav and Vakada Naveen
Yadav became interested in machine learning when he saw intelligent virtual assistants, but his full exposure came after joining IIT Madras as a research scholar in September 2020.
Naveen has always been fascinated by the portrayal of how AI could do wonders in futuristic movies like I, Robot. Even his final year B Tech project on machine learning was focused on this area. He has secured a place to pursue an MS through research at IIT Madras in the area of Vision and Language transformers.
Winning Approach
Yadav and Naveen followed a three-phase approach, which included:
- Data preprocessing
- Model building
- Ensembling methods
Data Preprocessing
The team preprocessed the time columns by converting them to float values, then separated the features into categorical and numerical columns and normalised the skewed features. The categorical columns were one-hot encoded, and the dataset was scaled using MinMaxScaler. Finally, Principal Component Analysis (PCA) was applied to reduce the dimensionality.
Model building
Yadav and Naveen tried a variety of models, including neural networks with different architectures, light gradient boosting, XGBoost, support vector regression, gradient boosting, and random forests. Neural networks performed best. They tuned the networks' hyperparameters and identified the best-performing architectures to use for ensembling.
Ensembling Methods
Yadav and Naveen ran various neural network architectures, sampled different training datasets each time, and kept track of the best models. The ensemble approaches they tried included stacking with neural networks as meta learners, stacking with machine learning models as meta learners, simple averaging, and weighted averaging. They said their best submission came from simple averaging of predictions from the best neural network models.
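The simple-averaging ensemble that produced their best submission reduces to an element-wise mean over each model's predictions. A minimal sketch, with the per-model predictions stubbed as arrays:

```python
import numpy as np

# Stubbed predictions from three trained networks (illustrative values);
# in practice these would come from model.predict() on the test set.
preds_model_1 = np.array([88.2, 90.1, 87.5])
preds_model_2 = np.array([88.6, 89.7, 87.9])
preds_model_3 = np.array([88.4, 90.3, 87.1])

# Simple averaging: element-wise mean across models
ensemble = np.mean([preds_model_1, preds_model_2, preds_model_3], axis=0)
print(ensemble)
```

Averaging works well when the individual models are similarly accurate but make uncorrelated errors, which diverse architectures and resampled training sets encourage.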
Check out the code here.
Rank 03: Sylas John Rathinaraj
Initially, Rathinaraj was a SAS developer and got interested in predictive analytics in 2017. He focused on learning courses in statistics, exploratory data analysis (EDA), machine learning, data science and deep learning, from Coursera and Udemy. This is the first time he has ranked in the top five in a hackathon.
Winning Approach
Rathinaraj converted all the time-based information into seconds and label-encoded the categorical features. The target variable was log-transformed to reduce skewness in its distribution. After that, redundant features were eliminated, along with highly correlated features.
He then created a new feature: the total time taken across all three sectors, minus the pit time. He added further features capturing the improvement across the sectors. Rathinaraj also derived a variable from the ‘event’ column by keeping only the Free Practice or Qualifying Group label and stripping the numeric suffixes, and created one additional feature with frequency-encoded values for each categorical variable.
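Two of these steps, stripping the numeric suffix from the event column and frequency-encoding the result, can be sketched as follows. The column names and rows are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the dataset (column names are assumed)
df = pd.DataFrame({
    "event": ["Free Practice 1", "Free Practice 2", "Qualifying Group 1"],
    "lap_time": [92.3, 91.8, 88.6],
})

# Log-transform the target to reduce skewness
df["lap_time_log"] = np.log1p(df["lap_time"])

# Strip trailing numeric suffixes, keeping only the session type
df["session"] = df["event"].str.replace(r"\s*\d+$", "", regex=True)

# Frequency encoding: replace each category with how often it occurs
df["session_freq"] = df["session"].map(df["session"].value_counts())
print(df[["session", "session_freq"]])
```

Frequency encoding gives tree-based models a numeric signal for category prevalence without blowing up the feature count the way one-hot encoding can.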
Model Building
In this step, feature deletion, addition, and selection were performed to avoid overfitting, as the test data contained only qualifying-group laps from locations 6, 7, and 8. He evaluated LightGBM, CatBoost, and XGBoost models, but for the final prediction used the CatBoost model with five-fold cross-validation.
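The five-fold cross-validation scheme can be sketched as below. A scikit-learn gradient boosting model stands in for CatBoost so the example stays self-contained, and the synthetic data is purely illustrative:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic regression data standing in for the engineered features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=100)

# Five-fold CV: each fold trains on 4/5 of the data, predicts the held-out 1/5
fold_preds = []
kf = KFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, val_idx in kf.split(X):
    model = GradientBoostingRegressor(random_state=0)
    model.fit(X[train_idx], y[train_idx])
    fold_preds.append(model.predict(X[val_idx]))

# Out-of-fold predictions cover every training row exactly once
print(sum(len(p) for p in fold_preds))
```

Averaging the five fold models' test-set predictions is a common way to turn such a CV loop into a final submission.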
Talking about his experience, Rathinaraj said, “It’s been some time since I started regularly participating in MachineHack hackathons. It has been an extremely exciting journey for me and indeed very useful for my learning.”
Check out the code here.
Rank 04: Praveen Kumar Bandla
Bandla came across the term ‘data science’ when he took a business data mining course while studying for his MBA at IIM Calcutta. He was instantly hooked by the way maths and programming can be employed to help solve complex business problems. While working at EXL, he got the opportunity to work with the analytics team of a US insurance client. He went on to pursue a PGP in data science, offered by Simplilearn in association with Purdue University and IBM. Since then, he has been participating in ML hackathons and has learnt a lot from these competitions.
Winning Approach
To start with, Bandla worked to understand the dataset and the features that were provided. He researched the context of the problem statement to get a better understanding of the task. After this, he performed EDA to explore the distribution of features and their relation to the target variable. After he figured out the features he wanted to use, he trained basic models to get an idea of where he stood on the leaderboard.
He said, “Then, I would experiment with feature transformation, feature engineering, model tuning, boosting, stacking and so on. This would give me an idea as to how complex models are performing with the given dataset compared to simpler ones. In this competition, I found that simpler models perform better than complex models.”
Check out the code here.
Rank 05: Mahima Arora
Arora holds a bachelor’s degree in mathematics and a master’s in operations research. She has only been working for a year, but the experience has exposed her to new concepts, a variety of tools, and vast possibilities to explore and learn more.
Winning Approach
After exploring the data, Arora started with some data cleaning that included fixing the formats of different variables and converting them into a usable form. Then, Arora performed univariate and bivariate analysis to understand the data better. In the next step, she merged weather data with the original dataset and aggregated it on the combination of location, event and source of data. With this, she calculated the mean for each of the columns and merged it with the original dataset. The imputation was carried out on columns with 60-70% missing data.
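The weather-data merge described above, aggregating to a mean per location/event combination and joining back onto the lap data, can be sketched as follows. The column names and values are assumptions for illustration:

```python
import pandas as pd

# Toy lap and weather tables (column names are assumed)
laps = pd.DataFrame({
    "LOCATION": ["Berlin", "Berlin", "Rome"],
    "EVENT": ["FP1", "FP1", "FP2"],
    "LAP_TIME": [91.2, 90.8, 88.4],
})
weather = pd.DataFrame({
    "LOCATION": ["Berlin", "Berlin", "Rome", "Rome"],
    "EVENT": ["FP1", "FP1", "FP2", "FP2"],
    "AIR_TEMP": [21.0, 23.0, 27.0, 29.0],
})

# Aggregate weather to one mean row per (location, event), then left-join
weather_mean = weather.groupby(["LOCATION", "EVENT"], as_index=False).mean()
merged = laps.merge(weather_mean, on=["LOCATION", "EVENT"], how="left")
print(merged["AIR_TEMP"].tolist())
```

Aggregating before the merge avoids duplicating lap rows when the weather table has multiple readings per session.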
After this, Arora converted categorical variables into dummy variables and dropped the irrelevant columns. She split the data into train and validation and started building a model using XGBoost regressor, random forest and gradient boost algorithms. She used k-fold cross-validation to tune her models and fine-tuned XGBoost Regressor with “Mean Squared Log Error” as an objective function, which gave the best performance on her validation data.
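Training against a mean-squared-log-error objective can be sketched as below. XGBoost exposes this objective directly (`reg:squaredlogerror`); here a random forest fitted on the log1p-transformed target stands in so the example needs only scikit-learn, and the synthetic data is illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic lap-time-like target (illustrative, strictly positive)
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 3))
y = 80 + 20 * X[:, 0] + rng.normal(scale=0.5, size=200)

# Fitting squared error in log space approximates an MSLE objective
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, np.log1p(y))           # fit on the log-transformed target
preds = np.expm1(model.predict(X))  # invert the transform for predictions

print(preds.min() > 0)
```

A log-space objective penalises relative rather than absolute errors, which matches how the competition's RMSLE metric scores submissions.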
Arora’s experience at MachineHack has been enriching and fulfilling. She stated, “From cleaning the data and applying different algorithms to fine-tuning the model, the process has increased my overall understanding of this field. These hackathons provide a great platform to learn as well as compete in a healthy environment to improve and enhance your existing knowledge.”
Check out the code here.
Out-of-the-box solutions, high degree of skills displayed
Participants brought out-of-the-box solutions to the innovative problem they were presented with, and the high level of skill on show made the Dare in Reality hackathon a resounding success.
“We were amazed at the number of carefully considered solutions the hackathon received to our challenge,” said the Envision Racing team. “With the data science community demonstrating such a high level of innovation, the five winners should be particularly proud of their success. We’re already exploring how we can adapt their ideas to help the team gain an edge in qualifying.”