MachineHack wrapped up its 16th edition by announcing the winners of the Predict The News Category Hackathon.
Saurabh Kumar, Chetan Ambi and Mohammed Abdul Qavi won the first, second and third places respectively on the hackathon leaderboard. Analytics India Magazine introduces you to the winners and their approach to the solution.
#1: Saurabh Kumar
A skilled and experienced Data Scientist in a reputed firm, Kumar has shown his expertise multiple times by topping several hackathons at MachineHack.
Kumar’s interest in Data Science and Machine Learning grew out of a single algorithm. His hands-on experience with the Random Forest algorithm and its capabilities motivated him to pursue and advance his skills in the field. Kumar said he is inspired by the ability of ML algorithms to solve a variety of real-world problems.
Kumar’s Approach To Solving The Problem
He started with traditional NLP techniques such as bag-of-words (BOW) and TF-IDF features, combined with the LightGBM and XGBoost algorithms. As the hackathon grew more competitive, he turned to more sophisticated methods to improve his leaderboard score. He tried transformer-based models such as BERT, GPT-2 and XLNet, and achieved his winning score with XLNet.
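The bag-of-words/TF-IDF starting point mentioned above can be illustrated with a minimal sketch. This is not Kumar’s code: it is a plain-Python approximation, the smoothed IDF formula is one common variant chosen here as an assumption, and the two headlines are made up for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (term frequency * smoothed IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


# Made-up headlines, for illustration only.
docs = [
    "markets rally as tech stocks surge",
    "team wins final as fans cheer",
]
vectors = tfidf(docs)
```

Terms that occur in every document (here, "as") receive a lower weight than terms that occur in only one. In practice, vectors like these would be fed to a classifier such as the gradient-boosting models named above.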
“MachineHack is a great platform for both aspiring and current data scientist as it provides real-world problems. I have been active on MachineHack platform since their first hackathon and really enjoy competing here. MachineHack team is very cooperative and is willing to work on feedback,” he said.
Get Kumar’s solution code here.
#2: Chetan Ambi
Currently working as a Technology Lead at Infosys Ltd, Chetan Ambi is a regular participant with a strong winning record on MachineHack. With almost a decade of experience in the IT industry, Mysore-based Ambi has made MachineHack his favourite ML playground.
Despite being relatively new to the Data Science and Machine Learning domain, Ambi has proven many times that he is more than just an amateur. He gained his knowledge and skills from online resources. His favourite destinations for data science learning include Udemy, Coursera, Analytics India Magazine, Machinelearningmastery, Pyimagesearch and Kaggle.
Ambi’s Approach To Solving The Problem
Ambi started with the usual approach of data cleaning, stemming/lemmatization, a count vectorizer and TF-IDF, which got him a score a little lower than 98. To improve on this, Ambi turned to the Fast.ai library. After a week of fine-tuning, it helped him achieve his highest single-model score of 99.05. He then used an ensemble of the three highest-scoring seeds/random states of the same model to achieve a final score of 0.99163027656 (roughly 99.16 on the same scale), which gave him second place on the leaderboard.
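A seed ensemble of this kind can be sketched as follows. The article does not say how the three runs were combined, so averaging the predicted class probabilities is an assumption made here, and the function names and numbers are hypothetical.

```python
def average_probabilities(runs):
    """Average per-class probabilities across runs.

    runs[r][i][c] is the probability of class c for sample i from run r.
    """
    n_runs = len(runs)
    n_samples = len(runs[0])
    n_classes = len(runs[0][0])
    return [
        [sum(run[i][c] for run in runs) / n_runs for c in range(n_classes)]
        for i in range(n_samples)
    ]


def predict_labels(probs):
    """Pick the index of the highest averaged probability for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


# Hypothetical outputs of the same model trained with three different seeds
# (two samples, three classes).
seed_runs = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]],
    [[0.7, 0.2, 0.1], [0.2, 0.3, 0.5]],
]
averaged = average_probabilities(seed_runs)
labels = predict_labels(averaged)
```

In this toy data the first seed alone would assign the second sample to class 1, while the averaged probabilities favour class 2, which is the kind of run-to-run variance that averaging across seeds is meant to smooth out.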
“MachineHack has become my favourite ML playground and it really is a wonderful platform for everyone from beginners to experts to showcase their Machine Learning skills. I am really enjoying solving industry curated problems on MachineHack. Previously, I have won Author Identification problem, Predict Data Scientist Salary Hackathon, Predict A Doctor’s Consultation Fee Hackathon & Predict The Flight Ticket Price Hackathon. I am expecting more challenging problems in the future from MachineHack,” he said.
“I have attended MLDS 2019 and The Rising events organized by AIM. It was a delightful experience at both the events. I am looking forward to a similar experience from Cypher 2019,” he added.
Get Ambi’s solution code here.
#3: Mohammed Abdul Qavi
A Senior Data Scientist at ADP, Mohammed Abdul Qavi solves various problems in the human capital management (HCM) domain. He started his career working on basic statistical models, and his interest in mathematics drew him to the data science space. Mohammed earned his Master’s in Industrial Engineering and Operations Research (IEOR) from IIT Bombay in 2013. He acquires new skills through MOOCs and by reading articles on websites such as Analytics India Magazine, Medium, Kaggle and LinkedIn.
Qavi’s Approach To Solving The Problem
Qavi divided his approach into two parts. In the first, he trained several types of models using basic machine learning techniques to establish a baseline score. In the second, he used more sophisticated deep learning models to push the score higher.
He explained his approach as follows:
Part 1: Basic ML Approach
- Started with basic data cleaning and applied various tokenization and stemming methods. After a few attempts, I found the word punctuation tokenizer to be the best option
- Created separate features based on Bag-of-words and TF-IDF techniques
- Trained Logistic Regression, Naive Bayes (multinomial and Bernoulli) and SVM models
- Performed grid search based hyperparameter tuning
- Ensembled all the models using the arithmetic mean
Basic ML models placed me in the top 10 positions.
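For readers unfamiliar with the word punctuation tokenizer mentioned in Part 1, the sketch below shows the general idea: runs of word characters and runs of punctuation become separate tokens. The article does not name the library Qavi used, so this regex-based helper is an illustrative assumption rather than his implementation.

```python
import re

# Match either a run of word characters or a run of non-space punctuation.
WORD_PUNCT = re.compile(r"\w+|[^\w\s]+")


def word_punct_tokenize(text):
    """Split text into alphanumeric tokens and punctuation tokens."""
    return WORD_PUNCT.findall(text)


tokens = word_punct_tokenize("U.S. stocks rise!")
```

Splitting punctuation away from words keeps forms such as "rise" and "rise!" from becoming different vocabulary entries in the bag-of-words and TF-IDF features.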
Part 2: Deep Learning Approach
- Used the fastai library to build a language model, starting from one pre-trained on the WikiText corpus. Fine-tuned the language model learner by adjusting the learning rate and the number of epochs. The predictions gave good accuracy but the score was still under 99%
- Then I fine-tuned the BERT model (base uncased) and trained it over 6 epochs
- I also used an amazing deep learning library, ‘fast-bert’, to train XLNet models (base and large). I tuned various model parameters such as epochs, learning rate and batch size
- In the end, I spent some time on the recent RoBERTa model but did not have much time to fine-tune it
After trying various models, I made sure that their predictions were not highly correlated. My final submission was a mode-based ensemble of all the model predictions.
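A mode-based ensemble with a simple diversity check can be sketched as below. This is an illustration under assumptions, not Qavi’s code: pairwise agreement rate stands in for whatever correlation measure he used, and the model names and predictions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mode_ensemble(predictions):
    """Return the most common label per sample.

    predictions[m][i] is the label predicted by model m for sample i.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def agreement(a, b):
    """Fraction of samples on which two models predict the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical label predictions from three models on four samples.
model_preds = {
    "fastai": [0, 1, 2, 1],
    "bert": [0, 1, 1, 1],
    "xlnet": [0, 2, 2, 1],
}

# Pairwise agreement: values well below 1.0 suggest the models make
# different mistakes, which is when voting is most likely to help.
pairwise = {
    (m1, m2): agreement(model_preds[m1], model_preds[m2])
    for m1, m2 in combinations(model_preds, 2)
}

final = mode_ensemble(list(model_preds.values()))
```

Each model in this toy data disagrees with the vote on at most one sample, and the majority label overrides those individual disagreements.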
“MachineHack is an amazing platform for data scientists to learn and win exciting prizes. Also, the members were very kind and approachable. My concerns related to the hackathon were resolved in no time. I recommend MachineHack to various fresh graduates who are interested to solve various problems across industries. Thank you for the amazing experience, keep up the good work and looking forward to future competitions,” he shared.
“I pursued my master’s based on my own interest, and back then terms like data science, AI and machine learning were not very popular. We all see that things have changed a lot in recent years. I want to bring this to everyone’s notice: it is very important to decide your career based on your own interest rather than what the market demands,” he added.
Get Mohammed’s solution code here.