The Weekend Hackathon Edition #2 – The Last Hacker Standing Music Genre Classification concluded successfully on 12 Aug 2021. The challenge involved creating a scalable music genre classification model that generalizes well on unseen data. It drew more than 300 participants, with 120+ actively competing on the leaderboard.
Based on the leaderboard score, we have the top 3 winners of the Music Genre Classification Challenge, who will get free passes to the virtual Deep Learning DevCon 2021, to be held on 23-24 Sept 2021. Here, we look at the winners’ journeys, solution approaches and experiences at MachineHack.
Please note that winners share their solutions voluntarily within a stipulated time frame, so we bring you the best three solutions in the order of their leaderboard rank.
Rank 2 – Eric Vos
Eric learned Industrial IT and Robotics 30 years ago, and the basics of traditional AI were covered as part of his course. A couple of years ago, he got curious about newer machine learning techniques like neural networks and deep learning and started following relevant courses by Andrew Ng, Geoffrey Hinton and others. Eric now participates in most data science competitions and hackathons to practice his newly acquired machine learning skills.
Approach
Eric is happy to have worked on such a unique problem statement and dataset. He started with exploratory data analysis using AutoViz and AutoViML and, in the process, realized that the duration field's dual encoding in minutes and milliseconds was not a simple unit-conversion job, so he decided to keep track of the original time format in a separate feature. In addition to the existing features, he added standard NLP and language features, bringing the total to 29 features for modelling. After several experiments with various models, a single CatBoost model produced by AutoViML delivered the best result.
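Eric's write-up does not include code, but the idea of preserving the original duration encoding while normalising the values could look something like the pandas sketch below. The column name and the 30-unit threshold are assumptions for illustration, not part of his published solution.

```python
import pandas as pd

# Hypothetical column name; the actual dataset column may be labelled differently.
DURATION_COL = "duration_in_min_ms"

def add_duration_features(df: pd.DataFrame) -> pd.DataFrame:
    """Normalise duration to minutes while keeping a flag for the original encoding."""
    out = df.copy()
    # Heuristic (assumption): values above 30 are treated as milliseconds, otherwise minutes.
    is_ms = out[DURATION_COL] > 30
    out["duration_was_ms"] = is_ms.astype(int)  # remember the original format
    out["duration_min"] = out[DURATION_COL].where(~is_ms, out[DURATION_COL] / 60000.0)
    return out
```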
Experience
Eric is a serial MachineHacker and has learned a lot from published solutions shared by top machine hackers. He says, “MachineHack is a great place to improve my machine learning skills and play with various original datasets. I like the ‘weekend’ format; it’s now my weekly brain sport.”
Check out his solution here.
Rank 3 – Anand Kumar
Anand completed his engineering degree in Electronics and Communication at Anna University, Chennai. He has close to 10 years of experience in data science and machine learning and currently works as an Associate Manager at a leading analytics research firm.
Approach
Anand notes that the multiclass nature of the problem statement makes it slightly different from the usual challenges. In particular, the Track Name variable had 15,000 unique values, and such a high-cardinality variable can be tricky to handle.
He tried label encoding the categorical features and different techniques for imputing missing values, such as mean/median imputation. What worked best was converting the two categorical features, 'Artist Name' and 'Track Name', to strings and filling the missing values in 'Popularity', 'key' and 'instrumentalness' with zeros.
Anand used FLAML (Fast and Lightweight AutoML) to find the best model and for further hyperparameter tuning. He converted the two categorical features to strings and then, while fitting the CatBoostClassifier model, passed them as "cat_features". He also used loss_function='MultiClass' in this case.
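His solution file has the full details, but a minimal sketch of the CatBoost part, assuming the column names mentioned above and a target column called 'Class' (an assumption), might look like this:

```python
import pandas as pd
from catboost import CatBoostClassifier

def train_genre_model(train: pd.DataFrame, target_col: str = "Class") -> CatBoostClassifier:
    X = train.drop(columns=[target_col]).copy()
    y = train[target_col]

    # Cast the two high-cardinality categoricals to strings so CatBoost handles them natively.
    cat_features = ["Artist Name", "Track Name"]
    X[cat_features] = X[cat_features].fillna("missing").astype(str)

    # Fill the remaining missing values with zero, as described in the write-up.
    for col in ["Popularity", "key", "instrumentalness"]:
        X[col] = X[col].fillna(0)

    model = CatBoostClassifier(loss_function="MultiClass", verbose=0)
    model.fit(X, y, cat_features=cat_features)
    return model
```

FLAML would then sit on top of this to search for the best model family and hyperparameters; the sketch above only covers the final CatBoost fit.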
Experience
Anand says, “It was a great experience and learning at the same time”.
Check out his solution here.
Rank 4 – Harshad Patil
Harshad started his data science journey about five years ago. He used to work as a Business Analyst but always found deep learning fascinating, and he started learning from online sources that helped him with the basics. He then switched roles from Analyst to Data Scientist and began competing in hackathons on different websites such as MachineHack, HackerEarth, Zindi-Africa, etc. He keenly followed the solutions of the top three in every hackathon and incorporated some of their approaches. He eventually won a competition and was awarded a cash prize of $500.
Approach
Harshad decided to use tree-based models because of the vast number of unique values in Artist Name and Track Name. First, he segregated the outliers and label encoded all categorical data, which gave a good baseline score. Then, using a Winsorizer to cap the outliers alone improved the evaluation score. Next, he engineered a total of 55 features, many of which were aggregation features, but after performing Principal Component Analysis he kept only 20 features, which got him a good leaderboard score. Finally, he used a CatBoost model, which is robust in handling categorical features.
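Harshad's notebook has the full pipeline; the sketch below only illustrates the outlier-capping and dimensionality-reduction steps he describes, assuming feature-engine's Winsorizer and scikit-learn's PCA, with illustrative column lists and parameters.

```python
import pandas as pd
from feature_engine.outliers import Winsorizer
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder, StandardScaler

def build_features(df: pd.DataFrame, numeric_cols: list, cat_cols: list,
                   n_components: int = 20) -> pd.DataFrame:
    out = df.copy()

    # Cap outliers in the numeric columns with an IQR rule instead of dropping rows.
    capper = Winsorizer(capping_method="iqr", tail="both", fold=1.5,
                        variables=list(numeric_cols))
    out[numeric_cols] = capper.fit_transform(out[numeric_cols])

    # Label encode high-cardinality categoricals such as Artist Name and Track Name.
    for col in cat_cols:
        out[col] = LabelEncoder().fit_transform(out[col].astype(str))

    # Compress the engineered feature set (55 -> 20 in the write-up) with PCA.
    scaled = StandardScaler().fit_transform(out[list(numeric_cols) + list(cat_cols)])
    components = PCA(n_components=n_components).fit_transform(scaled)
    return pd.DataFrame(components, index=out.index)
```

The reduced feature matrix would then feed a CatBoost classifier, as in the other winning solutions.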
Experience
Harshad says, “It’s a great website to challenge your knowledge. And, nowadays, companies are looking into applicants’ ranks to get a glimpse of the candidate. So, keeping a higher rank on this website can give an edge to the person.”
Check out his solution here.
Once again, join us in congratulating the winners of this exciting hackathon – who indeed were the “Last Hackers Standing” of Music Genre Classification – Weekend Hackathon Edition-2. We will be back next week with the winning solutions of the ongoing challenge – Tea Story.