The Winners Of Last Hacker Standing – Soccer Fever Challenge

The Weekend Hackathon Edition #2 – The Last Hacker Standing Soccer Fever challenge concluded successfully on 26 August 2021. The challenge involved predicting the outcomes of soccer games played over a period of time. The problem statement was a classic study in decision-making and in understanding the odds stacked against a particular situation. We are pleased to share that this hackathon drew the highest participation of the edition, and we owe it to our learners and their love for soccer.

Based on the leaderboard score, we have the top three winners of the Soccer Fever Challenge, who will get free passes to the virtual Deep Learning DevCon 2021, to be held on 23-24 Sept 2021. Let’s get to know the winners’ journeys, solution approaches, and experiences at MachineHack. 


First Rank – Thomas Deregnaucourt

Thomas holds a PhD in Data Science and currently works as a Data Scientist at Ausy. In his free time, he likes to challenge himself in competitions such as Kaggle and, now, MachineHack to improve his knowledge.

Approach

He says, “The problem was unique since some features contained only missing values in the test set despite having values in the training set. It was important to deal with this issue by dropping those features, in order not to feed false information to the predictive model.”

The main preprocessing techniques used (a sketch of these steps follows the list):

  • dropping the unusable features that contain only missing values in the test set
  • engineering three new features: the month and the weekday of the match, and the sign of the difference between the projected scores
  • dropping the detected outliers based on the last engineered feature
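
A minimal sketch of those steps, assuming pandas DataFrames with a date column and projected-score columns named proj_score1 and proj_score2 (the column names are illustrative, not taken from Thomas's code):

```python
import numpy as np
import pandas as pd

def preprocess(train: pd.DataFrame, test: pd.DataFrame):
    # Drop features that contain only missing values in the test set
    unusable = [c for c in test.columns if test[c].isna().all()]
    train = train.drop(columns=unusable)
    test = test.drop(columns=unusable)

    for df in (train, test):
        # Month and weekday of the match
        match_date = pd.to_datetime(df["date"])
        df["month"] = match_date.dt.month
        df["weekday"] = match_date.dt.weekday
        # Sign of the difference between the projected scores
        df["proj_diff_sign"] = np.sign(df["proj_score1"] - df["proj_score2"])

    return train, test
```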

In EDA, the difference between the projected scores was very informative. For observations with a strictly positive difference between the projected scores (5051 observations), the Outcome was 1 in 99.88% of cases (only 6 observations had an Outcome of 0). Likewise, observations with a strictly negative difference (2344 observations) had an Outcome of 0 in 99.87% of cases (only 3 observations had an Outcome of 1). The feature created from this was the goal difference between the two opponents, i.e. team 1 and team 2, for the day.
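
That pattern is easy to reproduce with a quick groupby; a short sketch, reusing the assumed column names from the snippet above:

```python
import numpy as np

# Mean Outcome by the sign of the projected-score difference: a mean
# near 1.0 for sign = +1 and near 0.0 for sign = -1 matches the
# 99.88% / 99.87% split described above.
diff_sign = np.sign(train["proj_score1"] - train["proj_score2"])
print(train.groupby(diff_sign)["Outcome"].agg(["mean", "count"]))
```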

Thomas tested a few ensemble learning classification models, in particular RandomForest, LGBM, CatBoost and XGBoost. After incorporating the “goal difference” feature, the model that worked best was a Decision Tree.
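
A sketch of how that final model might look; the feature list and hyperparameters here are illustrative, not Thomas's exact setup:

```python
from sklearn.tree import DecisionTreeClassifier

# With a near-perfect separating feature, a shallow tree is enough;
# heavier ensembles mostly add complexity without accuracy gains.
features = ["goal_diff", "month", "weekday"]  # assumes the engineered columns above
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(train[features], train["Outcome"])
predictions = clf.predict(test[features])
```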

Experience

He says, “This challenge was the first competition I have participated in with MachineHack, but it is certainly not the last one.”

Check out his solution here

Second Rank – Gaurav Chaudhary

Gaurav is a data science aficionado with 5.8 years of professional experience across AI, data supply chain and big data. He currently works as a Data & AI Consultant at Accenture Technology, catering to projects, clients and business development.

Approach

In EDA, Gaurav found that if the difference between proj_score1 and proj_score2 is greater than 0, the outcome is 1; otherwise it is 0. A single feature was sufficient: the difference between proj_score1 and proj_score2. As per him, there was no need for a machine learning algorithm at all. That was definitely the “googly” in this problem statement; a straightforward approach was all that was required.
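
That observation collapses the whole task into a single comparison; a minimal sketch (file and column names assumed for illustration):

```python
import pandas as pd

test = pd.read_csv("test.csv")  # file name assumed

# No model needed: predict 1 whenever team 1's projected score
# exceeds team 2's, and 0 otherwise.
test["Outcome"] = (test["proj_score1"] - test["proj_score2"] > 0).astype(int)
test[["Outcome"]].to_csv("submission.csv", index=False)
```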

Experience

Gaurav says, “The overall experience was good. I would still insist on having metadata provided, if possible, for a better understanding of the problem statement at hand.”

Check out his solution here

Third Rank – Ajay Raja

Ajay Raja started off as an intern at Latentview and grew there to the senior analyst level. He then shifted gears and moved into analytics consulting at BCG, where he worked across practice areas such as retail, industrial goods and the public sector. He currently serves as a team leader at BCG GAMMA, the analytics arm of BCG.

Approach

He found very few features to model with; the output depended heavily on model tuning.

Preprocessing steps followed (a sketch of the variable-importance step follows the list):

  • cleaning up dates
  • removing fields that did not have a value in the test set
  • feature selection based on random forest variable importance
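
A hedged sketch of the variable-importance step, assuming a numeric feature matrix X and an Outcome target y:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Fit a random forest purely to rank features, then keep the most
# informative ones for the downstream model.
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X, y)
importances = pd.Series(rf.feature_importances_, index=X.columns)
selected = importances.sort_values(ascending=False).head(10).index.tolist()
```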

Insights drawn included:

Projected scores were the most important predictors; the delta between the two projected scores was also a highly significant variable. Feature engineering included linear combinations of the two projected scores (sum, product, ratio, delta), sketched below. Delta was the one that stood out most, followed by ratio.
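
Those combinations are one-liners in pandas; a sketch with the same assumed column names:

```python
import numpy as np

# Linear and multiplicative combinations of the two projected scores.
# Replacing 0 with NaN guards the ratio against division by zero.
train["score_sum"] = train["proj_score1"] + train["proj_score2"]
train["score_product"] = train["proj_score1"] * train["proj_score2"]
train["score_ratio"] = train["proj_score1"] / train["proj_score2"].replace(0, np.nan)
train["score_delta"] = train["proj_score1"] - train["proj_score2"]
```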

Models used:

Gradient boosted trees, CatBoost, Naive Bayes and proprietary AutoML code were used across a number of experiments. Gradient boosted trees gave decent results, comparable to AutoML.
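
A sketch of one such experiment with CatBoost, assuming prepared splits X_train, y_train, X_val, y_val and X_test; the hyperparameters are illustrative, not Ajay's actual values:

```python
from catboost import CatBoostClassifier

# A gradient-boosted model on the selected features, with a
# validation set for early feedback during training.
model = CatBoostClassifier(iterations=500, depth=6, random_seed=0, verbose=0)
model.fit(X_train, y_train, eval_set=(X_val, y_val))
predictions = model.predict(X_test)
```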

Experience 

Ajay says, “It was nice; there are a lot of resources for learning and development, which I plan to visit whenever I get time.”

Check out his solution here. 

Heartiest congratulations to the winners of the Soccer Fever challenge of The Weekend Hackathon Edition #2 – The Last Hacker Standing.
