MachineHack Winners: How These Data Science Enthusiasts Solved MachineHack’s ‘Chartbusters Prediction’ Hackathon

MachineHack, backed by its parent Analytics India Magazine, helps the machine learning and data science community grow by conducting hackathons that challenge aspirants.

Recently, we concluded the 20th edition of our data science hackathons by announcing the champions of the Chartbusters Prediction: Foretell The Popularity Of Songs hackathon, which was well received by the community.

Out of the 190 participants, Nikhil Kumar Mishra, Nitesh Yadav and Snehan Kekre won the first, second and third places respectively on the hackathon leaderboard. 

Analytics India Magazine introduces you to the winners and their approach to the solution.

#1: Nikhil Kumar Mishra

Nikhil is currently a final year Computer Science Engineering student at Pesit South Campus, Bangalore.

He started his data science journey in his second year, after being inspired by a YouTube video on self-driving cars. The technology intrigued him and drew him into the world of machine learning. He started with Andrew Ng’s famous course and applied what he learnt in the hackathons he participated in.

“Whenever I learnt a new technique, I was always eager to apply it myself, and competitions gave me a chance to do just that” – he said when asked about his Data Science Journey.

Kaggle’s Microsoft Malware Prediction hackathon, in which he finished 25th, was a turning point in his data science journey. It gave him the confidence to take it further and challenge himself with more hackathons on platforms like Kaggle, MachineHack and Analytics Vidhya.

Approach To Solving The Problem 

Nikhil explains his approach to solving the problem as follows:

1. Model Selection: Unlike most online hackathons, this problem was interesting in that state-of-the-art gradient boosting models did not work well here. They were easily beaten by simple models such as linear regression. The real challenge lay in identifying this magic. I used a single neural network with a categorical embeddings architecture for this competition.

2. Feature Engineering: The second part of the challenge was feature engineering, and this competition had a lot of scope for it. Simple aggregations of the numerical columns for each categorical column (for example, Likes, Popularity, Comments and Followers aggregated by Artist Name) were a starting point, followed by interaction features formed as products of the numerical features. Timestamp was also an important column for feature engineering. It was used to extract features such as:

  • Year, month, day, time, etc. for each song.
  • How many days before or after the current song, did an artist release a song?
  • How many days before or after the current song, did a song of the same genre come out?
  • What was the previous number of likes, comments, popularity or followers for an artist?
  • What was the change in popularity of two successive songs released by the same artist?
  • What were the yearly mean and max of likes, comments, popularity or followers?
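The timestamp features listed above can be sketched with pandas. This is a minimal illustration on toy data, not Nikhil's actual code: the column names (`Artist`, `Timestamp`, `Likes`, `Popularity`) and the helper `add_time_features` are assumptions made for the example.

```python
import pandas as pd

# Toy data; the column names are assumptions, not the competition's schema.
df = pd.DataFrame({
    "Artist": ["a", "a", "b", "a", "b"],
    "Timestamp": pd.to_datetime([
        "2019-01-01", "2019-01-11", "2019-02-01", "2019-03-02", "2019-02-04",
    ]),
    "Likes": [100, 150, 20, 400, 35],
    "Popularity": [10, 12, 3, 30, 5],
})


def add_time_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: derive calendar and per-artist lag features."""
    out = frame.sort_values(["Artist", "Timestamp"]).copy()
    # Calendar parts of the release date.
    out["year"] = out["Timestamp"].dt.year
    out["month"] = out["Timestamp"].dt.month
    out["day"] = out["Timestamp"].dt.day
    by_artist = out.groupby("Artist")
    # Days since the same artist's previous release (NaN for the first song).
    out["days_since_prev"] = by_artist["Timestamp"].diff().dt.days
    # Days until the same artist's next release (NaN for the last song).
    out["days_to_next"] = -by_artist["Timestamp"].diff(-1).dt.days
    # Likes of the artist's previous song, and the change in popularity
    # between two successive songs by the same artist.
    out["prev_likes"] = by_artist["Likes"].shift(1)
    out["popularity_change"] = by_artist["Popularity"].diff()
    # Yearly mean and max of likes across all songs.
    out["year_likes_mean"] = out.groupby("year")["Likes"].transform("mean")
    out["year_likes_max"] = out.groupby("year")["Likes"].transform("max")
    return out.sort_index()  # restore the original row order


features = add_time_features(df)
```

The same pattern applies to the genre-based gaps by grouping on a genre column instead of the artist.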

Next, the categorical columns were frequency-encoded. Log transformations were also applied to each numerical column. Generally, such transformations work well with models such as linear regression, logistic regression and neural networks, but not with tree-based models.
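As a rough sketch of the frequency encoding, log transformation, per-artist aggregation and product interaction described here, assuming toy data and illustrative column names (`Artist`, `Genre`, `Likes`, `Followers`); the helper `encode` is hypothetical, and `log1p` is used on the assumption that the counts can be zero:

```python
import numpy as np
import pandas as pd

# Toy data; the column names are assumptions for illustration only.
df = pd.DataFrame({
    "Artist": ["a", "a", "b", "a", "c"],
    "Genre": ["rock", "pop", "rock", "rock", "pop"],
    "Likes": [0, 9, 99, 999, 9999],
    "Followers": [10, 10, 500, 10, 70],
})


def encode(frame, cat_cols, num_cols):
    """Hypothetical helper combining the encodings described in the text."""
    out = frame.copy()
    for col in cat_cols:
        # Frequency encoding: how often each category occurs in the data.
        out[f"{col}_freq"] = out[col].map(out[col].value_counts())
    for col in num_cols:
        # log1p compresses long right tails and is defined at zero.
        out[f"{col}_log"] = np.log1p(out[col])
        # Per-artist mean of each numerical column.
        out[f"{col}_artist_mean"] = out.groupby("Artist")[col].transform("mean")
    # One product interaction between two numerical columns.
    out["Likes_x_Followers"] = out["Likes"] * out["Followers"]
    return out


encoded = encode(df, cat_cols=["Artist", "Genre"], num_cols=["Likes", "Followers"])
```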

3. Model Building: The final model was a deep neural network with five hidden layers, utilizing categorical embeddings. The model architecture can still be tweaked, which may yield better or worse scores.
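To make the idea of categorical embeddings concrete, here is a forward-pass-only sketch in NumPy. It is not the winning model: the embedding size, layer widths, ReLU activation and the helper names `init_params` and `forward` are all assumptions, and training is omitted. Each categorical column gets its own lookup table; the looked-up vectors are concatenated with the numerical features and passed through five hidden layers.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_params(cardinalities, emb_dim, n_numeric, hidden_sizes):
    """Hypothetical initializer: one embedding table per categorical column."""
    params = {"emb": [rng.normal(0, 0.1, (c, emb_dim)) for c in cardinalities]}
    sizes = [len(cardinalities) * emb_dim + n_numeric, *hidden_sizes, 1]
    params["layers"] = [
        (rng.normal(0, 0.1, (m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]
    return params


def forward(params, cat_idx, numeric):
    # Look up one embedding row per categorical column for every sample.
    embedded = [table[cat_idx[:, j]] for j, table in enumerate(params["emb"])]
    # Concatenate the embeddings with the numerical features.
    h = np.concatenate(embedded + [numeric], axis=1)
    *hidden, (w_out, b_out) = params["layers"]
    for w, b in hidden:
        h = np.maximum(h @ w + b, 0.0)  # ReLU hidden layer
    return (h @ w_out + b_out).ravel()  # one regression output per sample


# Two categorical columns (4 and 3 levels), 3 numerical columns, 5 hidden layers.
params = init_params([4, 3], emb_dim=2, n_numeric=3, hidden_sizes=[16] * 5)
cat_idx = np.array([[0, 1], [3, 2], [1, 0], [2, 2], [0, 0], [3, 1]])
numeric = rng.normal(size=(6, 3))
preds = forward(params, cat_idx, numeric)
```

In practice the embedding tables and layer weights would be learned jointly by a deep learning framework; the sketch only shows how the inputs flow through the network.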

Get the complete code here.

“MachineHack is an amazing platform, especially for beginners. The problems here are simpler, giving a chance for new learners to get their hands dirty at machine learning problems. MachineHack team is very helpful in understanding and interacting with the participants, to get their doubts resolved. Also, the community is ever-growing and challenged with new and brilliant participants coming up every competition. I intend to continue using MachineHack to practice and refresh my knowledge on Data Science” says Nikhil about his experience with MachineHack.

#2: Nitesh Yadav

Nitesh is currently pursuing an MSc in Computer Science at the University of Delhi. With a good grounding in mathematics and statistics, he enjoys playing with data and machine learning models.

He sticks to MOOCs, articles and other free online sources to gain knowledge and uses hackathons as a means of perfecting his skills.

Approach To Solving The Problem 

Nitesh describes his approach as follows:

  1. Started with basic EDA and handling of NaNs, and performed normalization on the training set to reduce the effect of outliers.
  2. Extracted features from the timestamp and other features by applying feature engineering.
  3. The main focus was on outliers, as they affect the evaluation metric (RMSE) a lot. However, removing outliers can sometimes result in the loss of significant information.
  4. Applied simple algorithms such as linear regression, decision tree regressors, etc.
  5. After applying some ensemble algorithms, I tried data augmentation with the XGBoost regressor.
  6. Performed hyperparameter tuning of XGBoost using GridSearchCV.
  7. Analysis of the test set also helped in training the model to handle outliers.
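The grid search in step 6 might look roughly like the sketch below. To keep the example dependent on scikit-learn alone, it swaps in `GradientBoostingRegressor` where Nitesh used XGBoost; the synthetic data and the parameter grid are assumptions, not his settings. Scoring uses negative RMSE so that it matches the competition metric.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for the competition features.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=120)

# Illustrative grid; these values are assumptions, not the tuned settings.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, so RMSE is negated
    cv=3,
)
search.fit(X, y)

best_rmse = -search.best_score_  # flip the sign back to a plain RMSE
print(search.best_params_, round(best_rmse, 3))
```

Since xgboost's `XGBRegressor` follows the scikit-learn estimator interface, it can be passed to `GridSearchCV` in the same way, with a grid over its own parameter names.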

On being asked about the experience on MachineHack, he said – “MachineHack is a fabulous platform for data scientists to practice and learn. I heard a lot about MachineHack on different online forums or websites while searching for the best platform for Data science competitions. It was my first ever competition on MachineHack and learned so much in winter vacations.”

He also added that his willingness to learn and never give up has helped him a lot in upskilling, even when he failed to top the leaderboards.

Get the complete code here.

#3: Snehan Kekre

Snehan was introduced to data science and machine learning while studying Computer Science and AI at Minerva Schools in San Francisco. He was impressed by the power of applied mathematics and algorithms to solve tangible, real-world problems. This ultimately led him to Rhyme.com, which was later acquired by Coursera, where he works as a machine learning instructor tasked with creating hands-on, project-based courses for learners from around the world.

Approach To Solving The Problem

Snehan explains his approach as follows:

I found this to be the most challenging hackathon that I’ve attempted. There was a lot of experimentation involved in arriving at the final solution. Most of my intuition about the problem led me astray (for example, log-transforming the skewed target led to significantly worse performance). It took me a good two weeks to look at the problem from a fresh perspective.

As is the case in machine learning hackathons, feature selection and feature engineering were crucial. Working with the Song_Name feature was especially challenging for me. I tried various techniques to extract information from this feature, including TF-IDF statistics and GloVe word embeddings. The winning solution involved extracting the number of collaborators on each song, after which I had to drop the column or risk hurting model performance. The choice of estimator was also crucial. After trying various tree-based and non-tree-based methods, the one that proved superior was, surprisingly, scikit-learn’s GradientBoostingRegressor. The final solution was an ensemble of predictions to reduce variance.
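One way to count collaborators from a song title, and to average predictions from several models, is sketched below. The article does not say how Snehan parsed Song_Name or how he combined his predictions, so the "feat."/"ft." heuristic, the separator list and both helper functions are assumptions.

```python
import re

# Assumed markers that introduce extra artists in a track title.
_FEATURE_TAG = re.compile(r"\b(?:feat|ft|featuring)\b\.?", re.IGNORECASE)
# Assumed separators between the credited names.
_SPLIT = re.compile(r"\s*(?:,|&|\band\b)\s*", re.IGNORECASE)


def count_collaborators(song_name: str) -> int:
    """Hypothetical heuristic: count the names credited after 'feat.'/'ft.'."""
    parts = _FEATURE_TAG.split(song_name, maxsplit=1)
    if len(parts) < 2:
        return 0  # no featured-artist marker in the title
    credited = parts[1].strip(" ()[]-")
    names = [n for n in _SPLIT.split(credited) if n.strip(" ()[]-")]
    return len(names)


def average_predictions(prediction_sets):
    """Simple mean ensemble: average each sample's predictions across models."""
    return [sum(vals) / len(vals) for vals in zip(*prediction_sets)]


print(count_collaborators("Song (feat. Alice & Bob)"))
print(count_collaborators("Plain Title"))
print(average_predictions([[1.0, 2.0], [3.0, 4.0]]))
```

Averaging the outputs of several fitted models is one common way to reduce variance; weighted blends or stacking are alternatives.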

“Competitive data science is a whole other ball game. The winning solutions of most of these hackathons involve techniques that are seldom taught in academia but are used in some production systems. Platforms like MachineHack, Kaggle, Analytics Vidhya and others fill this gap and allow anyone, regardless of background or prior experience to compete on a level playing field, where often the only thing that matters is optimizing a metric. Winning solutions from previous hackathons are an invaluable learning resource that I highly encourage aspiring participants to leverage. This was my third time participating in a data science hackathon, and I’m excited about where the future will lead.”- he added while sharing his experience on MachineHack.

Get the complete code here.


Amal Nair

A Computer Science Engineer turned Data Scientist who is passionate about AI and all related technologies. Contact: amal.nair@analyticsindiamag.com