MachineHack successfully concluded the latest installment of its weekend hackathon series this Monday. The Product Sentiment Classification: Weekend Hackathon #19 gave contestants the opportunity to develop a machine learning model that accurately classifies products into four sentiment classes based on raw user review text. Data science enthusiasts warmly welcomed the hackathon, with over 257 registrations and active participation from close to 92 practitioners.
Out of the 257 competitors, three topped our leaderboard. In this article, we will introduce you to the winners and describe the approach they took to solve the problem.
#1| Prashant Arora
Prashant has had an amazing journey, having participated in several competitions and hackathons on platforms similar to MachineHack. While he has learned many things along the way, his main takeaway was to try and execute different code snippets across different use cases and datasets. After much practice and perseverance, he can now analyze and generalize over a dataset far more effectively. Prashant entered his first competition on the HackerEarth platform when he was still new to the field and received a poor ranking. But with each additional competition, he gradually picked up the tools and techniques that finally led to this feat on MachineHack.
Approach to solve the problem
The task was to predict sentiment from the product description. Since the provided dataset was small, using pre-trained models was the only practical approach for the competition. Searching the web, he found an excellent library named sentence-transformers and used it to generate embeddings from the pre-trained RoBERTa-large and RoBERTa-base models developed by Facebook AI Research.
He created two simple data frames of embeddings, one from the large model and one from the base model. He then trained a separate CatBoost model on each, and a simple average ensemble of the two resulted in the best score.
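The averaging step of this ensemble can be sketched as below. The class-probability matrices are hypothetical stand-ins for the outputs of the two CatBoost models (the actual pipeline would obtain them via `CatBoostClassifier.predict_proba` on the two embedding data frames):

```python
import numpy as np

# Hypothetical class-probability outputs from the two CatBoost models
# (one trained on RoBERTa-large embeddings, one on RoBERTa-base);
# shape: (n_samples, n_sentiment_classes).
probs_large = np.array([[0.70, 0.10, 0.10, 0.10],
                        [0.20, 0.50, 0.20, 0.10]])
probs_base = np.array([[0.60, 0.20, 0.10, 0.10],
                       [0.10, 0.60, 0.20, 0.10]])

# Simple average ensemble: mean of the two probability matrices,
# then argmax over the four sentiment classes.
ensemble_probs = (probs_large + probs_base) / 2
predictions = ensemble_probs.argmax(axis=1)
print(predictions)  # [0 1]
```

Averaging probabilities (rather than hard labels) lets the two models' confidences blend, which often stabilizes predictions when the underlying embeddings differ.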
“MachineHack has always been at the top of my work list, especially as it’s weekend hackathons. These small-time competitions have raised a competitive feeling in our minds and have helped to improve ourselves much more.” – Prashant shared his opinion about MachineHack.
#2| Yash Kashyap
Yash was introduced to Data Science during his second year of college, almost a year ago. He had a gradual start, familiar with just three terms: Data Science, Machine Learning, and Linear Regression. Fortunately, he found a helpful senior who guided him to resources for learning the basics. During the lockdown period, he went from zero to where he is today. He began participating in competitions in May and started to feel confident about his skills; since then, he has not looked back. He is currently exploring neural networks and deep learning techniques and deeply enjoys what he does.
Approach to solve the problem
He used the RoBERTa-large model to generate word embeddings, then reduced them to 10 PCA components. Using these alongside some newly created features, he trained a CatBoost model to reach his current score on the leaderboard.
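The dimensionality-reduction step can be sketched with plain numpy. The random matrix below is a stand-in for the RoBERTa-large embeddings (the real pipeline would use `sklearn.decomposition.PCA` or similar on actual embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for RoBERTa-large sentence embeddings: 100 reviews x 1024 dims.
embeddings = rng.normal(size=(100, 1024))

# PCA via SVD on the mean-centered matrix: keep the top 10 components,
# mirroring the 10 PCA features fed to CatBoost.
centered = embeddings - embeddings.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
pca_features = centered @ Vt[:10].T  # shape: (100, 10)
print(pca_features.shape)  # (100, 10)
```

Compressing a 1024-dimensional embedding into 10 components keeps the directions of highest variance while giving a tree-based model like CatBoost a small, dense feature set to split on.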
“I have had a great experience with MachineHack so far. I started at the bottom and learned a lot from the solutions posted on GitHub. It feels great to now find myself in a respectable position on the leaderboard.” – Yash shared his opinion.
#3| Snehan Kekre
Snehan started his data science journey in his sophomore year of undergrad at Minerva Schools in San Francisco. After learning about AI and ML in academia, he joined Rhyme.com as a subject matter expert in data science, tasked with creating hands-on educational content on data science and machine learning. Since the acquisition of Rhyme.com by Coursera in 2019, he has been a Data Science and ML instructor at Coursera with over 70,000 learners. He occasionally participates in data science competitions when the topics are interesting, provided he can make time.
Approach to solve the problem
His approach was bare-bones and minimal. The network architecture was inspired by “Wide & Deep Learning: Better Together with TensorFlow”, where the text features pass through the deep part of the network while the categorical features make up the wide part. Leveraging transfer learning, he made use of the pre-trained Universal Sentence Encoder model from TensorFlow Hub.
It encodes text into high-dimensional vectors that can be used for text classification and other downstream NLP tasks. It takes care of all the text preprocessing and was trained on a very large corpus. So rather than learning embeddings from scratch, he leveraged compute already spent by others and simply loaded the Universal Sentence Encoder as an ordinary Keras layer. The raw sentences are fed into this embedding layer, generating 512-dimensional sentence embeddings, which are then passed through a couple of dense layers with dropout regularisation.
The categorical feature was one-hot encoded and concatenated with the output of the embedding-to-dense path. This model was then fit on 90% of the training data and validated on the remaining 10%. After noting the evaluation metrics, he re-trained the model on all the data, setting the number of training epochs to the one that had produced the lowest validation loss in the previous run. Using this trained model, he obtained predictions on the test data and submitted them. He was working over the weekend, so he couldn't make time for hyperparameter optimization or feature engineering.
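The wide-and-deep forward pass described above can be sketched in numpy. To keep the example self-contained, a random matrix stands in for the 512-dimensional Universal Sentence Encoder output, and the layer sizes and three-category feature are illustrative assumptions (the actual model was built with Keras and a TensorFlow Hub layer):

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Deep part: 512-dim sentence embeddings (stand-in for the pre-trained
# Universal Sentence Encoder output) passed through two dense layers.
sentence_emb = rng.normal(size=(4, 512))           # 4 sample reviews
W1, b1 = rng.normal(size=(512, 64)) * 0.05, np.zeros(64)
W2, b2 = rng.normal(size=(64, 16)) * 0.05, np.zeros(16)
deep = relu(relu(sentence_emb @ W1 + b1) @ W2 + b2)

# Wide part: the one-hot encoded categorical feature, concatenated
# directly with the deep representation.
categorical = np.eye(3)[[0, 2, 1, 0]]              # 3 hypothetical categories
combined = np.concatenate([deep, categorical], axis=1)

# Output head: softmax over the four sentiment classes.
W_out, b_out = rng.normal(size=(combined.shape[1], 4)) * 0.05, np.zeros(4)
probs = softmax(combined @ W_out + b_out)
print(probs.shape)  # (4, 4)
```

The key design point is that the categorical signal bypasses the deep stack entirely, so the classifier head sees it raw, while the text signal is first compressed through the dense layers.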
“MachineHack is an amazing platform, especially for beginners. The community is constantly growing, with new and brilliant participants competing. I intend to continue using MachineHack to practice and refresh my knowledge of data science.” – Snehan shared his opinion.