MachineHack successfully concluded its ninth installment of the weekend hackathon series on June 22. The ODI Match Winner Prediction hackathon was welcomed by data science enthusiasts with over 400 registrations and active participation from close to 218 practitioners.
Out of the 213 competitors, three topped our leaderboard. In this article, we will introduce you to the winners and describe the approach they took to solve the problem:
Mrutyunjaya Rath
Although Mrutyunjaya is a graduate in Mechanical Engineering, data and coding have always excited him. He started his data science journey by taking a course with upGrad in association with IIIT-B. Although he found it a little difficult at the beginning, continuous practice gave him the confidence and skills to pursue this path.
He spends most of his time participating in hackathons and acquiring new skills by learning new techniques. “You will succeed in some of the approaches, while in some, you will fail miserably; that is something which is exciting about data science,” he said.
Approach To Solving The Problem
Mrutyunjaya briefly explains his approach as follows:
After giving a lot of thought to how to approach the problem, I stacked both the Team 1 and Team 2 features and performed one-hot encoding. Similarly, I one-hot encoded the Host country feature. The rest of the categorical variables were handled through Label Encoder. I tried grouping by some categorical variables to create statistical features, but it did not work. After that, I went on to build models and tuned the parameters using early stopping. XGBoost gave me the best result for this particular problem.
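The encoding step described above can be illustrated with a minimal, standard-library sketch. It assumes that "stacking" means pooling the Team 1 and Team 2 values so both columns share one vocabulary; the column names (`Team1`, `Team2`, `Host`) and the toy rows are hypothetical and not taken from the hackathon dataset or from the winner's code.

```python
# Hypothetical toy rows; the column names are assumptions, not the hackathon schema.
matches = [
    {"Team1": "India", "Team2": "Australia", "Host": "India"},
    {"Team1": "England", "Team2": "India", "Host": "England"},
    {"Team1": "Australia", "Team2": "England", "Host": "Australia"},
]


def build_vocab(values):
    """Map each distinct value to a column index, in sorted order."""
    return {value: index for index, value in enumerate(sorted(set(values)))}


def one_hot(value, vocab):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    vector = [0] * len(vocab)
    vector[vocab[value]] = 1
    return vector


# "Stack" both team columns so that they share a single vocabulary.
team_vocab = build_vocab(
    [m["Team1"] for m in matches] + [m["Team2"] for m in matches]
)
host_vocab = build_vocab(m["Host"] for m in matches)


def encode(match):
    """Concatenate the one-hot vectors for Team1, Team2 and Host."""
    return (
        one_hot(match["Team1"], team_vocab)
        + one_hot(match["Team2"], team_vocab)
        + one_hot(match["Host"], host_vocab)
    )


features = [encode(m) for m in matches]
```

Sharing one vocabulary means a given team always maps to the same position, whether it appears as Team 1 or Team 2. The resulting rows could then be passed to a gradient-boosting model such as the XGBoost model mentioned above.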
“MachineHack is one of the best platforms for any data science enthusiast. Not only can you compete here, but you also get to know your participants, which leads to an increase in your connections, and you get to talk and interact with like-minded people. I would like to thank MachineHack and Analytics India Magazine for organizing this hackathon, and also for their contribution towards the data science and machine learning community. I would also like to congratulate my fellow participants who managed to put a score on the leaderboard,” he said.
Get the complete code here.
Anil Betta
Anil had no interest in programming during his engineering days, but he had a great deal of interest in mathematics, especially statistics. He soon came across a domain in Computer Science that involved a lot of Math and Statistics, and thus he started his journey in Data Analytics. He took some online courses on the subject and later came across Kaggle, where he built his skills through open kernels and blogs.
The ongoing pandemic led him to spend his time participating in hackathons on MachineHack, among other platforms. Anil is currently working as an ML Engineer at Opteamix. “Thanks to MachineHack for organising these interesting weekend hackathons from which I am learning continuously,” he said.
Approach To Solving The Problem
Anil explains his approach as follows:
- The key step was to convert the multi-class classification into a binary classification, because predicting the winning probability for other teams when Team-A is playing against Team-B doesn’t make any sense. This approach helped massively: my validation score improved from 0.68 to 0.61 (lower is better).
- I created a Team Rank feature based on the value counts of MatchWinner with respect to Team.
- I added a number of group-by aggregate features on both numerical and categorical features.
- Then, I created group-by n-unique features.
- I created interaction features (combining 2 features).
- Frequency encoding on categorical features gave a decent boost in score.
- I created a Team Experience feature based on the number of unique stadiums each team played in.
- I used Stratified K-Fold validation, whose scores matched the leaderboard (LB) score.
- The final model was an LGBM model.
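A few of the steps in the list above (the binary target, frequency encoding, and a Team Experience count) can be sketched in plain Python. This is only an illustration under stated assumptions: the column names (`Team1`, `Team2`, `Venue`, `MatchWinner`) and the derived feature names are hypothetical, the toy rows are made up, and it is assumed that every match is won by one of the two teams playing.

```python
from collections import Counter

# Hypothetical toy rows; the column names are assumptions, not the hackathon schema.
matches = [
    {"Team1": "India", "Team2": "Australia", "Venue": "Mumbai", "MatchWinner": "India"},
    {"Team1": "England", "Team2": "India", "Venue": "Lord's", "MatchWinner": "India"},
    {"Team1": "Australia", "Team2": "England", "Venue": "Sydney", "MatchWinner": "Australia"},
    {"Team1": "India", "Team2": "England", "Venue": "Mumbai", "MatchWinner": "England"},
]

# Multi-class -> binary target: 1 if Team1 won, 0 if Team2 won.
# Assumes the winner is always one of the two teams in the row.
for m in matches:
    m["Team1Won"] = int(m["MatchWinner"] == m["Team1"])

# Frequency encoding: replace a category with its share of the rows.
venue_counts = Counter(m["Venue"] for m in matches)
for m in matches:
    m["VenueFreq"] = venue_counts[m["Venue"]] / len(matches)

# Group-by n-unique: number of distinct venues each team has played at,
# counting appearances as either Team1 or Team2.
venues_by_team = {}
for m in matches:
    for team in (m["Team1"], m["Team2"]):
        venues_by_team.setdefault(team, set()).add(m["Venue"])

for m in matches:
    m["Team1Experience"] = len(venues_by_team[m["Team1"]])
    m["Team2Experience"] = len(venues_by_team[m["Team2"]])
```

With a binary target like `Team1Won`, the model only has to output one probability per match, and the probability for the other team is simply its complement.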
The following approaches did not work:
- Target Encoding
- Target-based Feature Engineering
Get the complete code here.