The Weekend Hackathon Edition #2 – The Last Hacker Standing Sentiment Analysis challenge concluded successfully on 5 August 2021. The challenge involved building a scalable sentiment analysis model that generalizes well on unseen data. It drew over 250 participants, with 90+ solutions posted on the leaderboard.
Based on the leaderboard score, we have the top 3 winners of the Sentiment Analysis Challenge, who will get free passes to the virtual Deep Learning DevCon 2021, to be held on 23-24 Sept 2021. Here, we look at the winners’ journeys, solution approaches and experiences at MachineHack.
First prize – Chandrashekhar Kandhooru
Chandrashekhar is a serial hacker and a Grand Master with global rank two on the MachineHack platform. He says he started participating in hackathons to strengthen his fundamentals. He has already secured a top-3 position in eight machine learning competitions and loves solving complex problems.
Approach
He approached the Sentiment Analysis challenge by first preprocessing the text with the BERT tokenizer. He then trained his model using Keras with a pre-trained BERT layer, which takes the preprocessed data directly as input. He first experimented with a padded sequence length of 500 and two hidden layers of 64 and 32 neurons, using “tanh” as the activation function; this gave a log-loss score of 0.608. His second experiment tried different sequence lengths (250, 350, 450) over different combinations of hidden layers ((64, 64), (128, 128), (128, 480, 384)), among others. Averaging all the predicted probabilities from these combinations gave him the winning log-loss score of 0.5934.
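The ensembling step above can be sketched as follows: average the predicted class probabilities from several model configurations and score the blend with log loss. The arrays are toy stand-ins (assuming a 3-class sentiment problem); the actual architectures and predictions from the winning solution are not reproduced here.

```python
# Minimal sketch of probability averaging across model runs, assuming
# a 3-class problem. Run names and numbers are illustrative only.
import numpy as np

def log_loss(y_true, probs, eps=1e-15):
    """Multi-class log loss: mean negative log probability of the true class."""
    probs = np.clip(probs, eps, 1 - eps)
    probs = probs / probs.sum(axis=1, keepdims=True)  # renormalise after clipping
    return -np.mean(np.log(probs[np.arange(len(y_true)), y_true]))

# Toy predictions from three hypothetical runs (e.g. sequence lengths 250/350/450).
run_a = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
run_b = np.array([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]])
run_c = np.array([[0.8, 0.1, 0.1], [0.1, 0.6, 0.3]])

ensemble = (run_a + run_b + run_c) / 3  # simple mean of probabilities
y_true = np.array([0, 1])

score = log_loss(y_true, ensemble)
print(score)
```

Averaging probabilities this way tends to smooth out the variance of individual configurations, which is why it often beats any single run.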
Experience
He says, “MachineHack is an amazing platform to learn more skills through hackathons.”
Check out his solution here.
Second prize – Sachin Yadav
Sachin Yadav forayed into data science by participating in many competitive hackathons. He draws approaches and insights from the winning solutions of fellow participants and from the latest research in the AI/ML field. Referring to this hackathon, he felt the main challenge was handling a dataset whose samples varied in length from 1 to 1,000 words; the preprocessing techniques applied therefore had to be effective in retaining important information.
Approach
Sachin’s approach to preprocessing the text involved:
Replacing all tagged users in the datasets with a generic word, for example, ‘@xyz’ to ‘@user’
Retaining all references to sites while removing the hyperlinks, to keep the dataset enriched when a link was shared within the review columns
Replacing anything that was not a word with blanks
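The three rules above can be sketched with regular expressions. The exact patterns are assumptions for illustration; the original solution's regexes are not shown in the write-up.

```python
# Hedged sketch of the preprocessing rules: generic user tokens, site names
# kept without hyperlinks, and non-word characters blanked out.
import re

def clean_review(text: str) -> str:
    text = re.sub(r"@\w+", "@user", text)                # tagged users -> generic token
    # keep the site reference, drop the hyperlink part (assumed pattern)
    text = re.sub(r"https?://(?:www\.)?([\w.-]+)\S*", r"\1", text)
    text = re.sub(r"[^A-Za-z0-9@\s]", " ", text)         # anything not a word -> blank
    return re.sub(r"\s+", " ", text).strip()

print(clean_review("@xyz loved it!! see https://example.com/review?id=5"))
```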
He did feature engineering by extracting word embeddings, which were fed into a simple artificial neural network for prediction. Two models were used to extract four embeddings (cased and uncased datasets):
“cardiffnlp/twitter-roberta-base-sentiment”: one run with the dataset cased and another with it uncased
“paraphrase-mpnet-base-v2”: one run with the dataset cased and another with it uncased
The final output was a blend of all the four prediction files from the simple artificial neural network.
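The blending step can be sketched as an element-wise average of the four prediction sets (two embedding models, each on cased and uncased text). The run names and numbers below are placeholders, not the actual prediction files.

```python
# Sketch of blending four prediction sets into one final output.
# Keys and values are illustrative assumptions.
from statistics import mean

preds = {
    "roberta_cased":   [0.82, 0.10],
    "roberta_uncased": [0.78, 0.14],
    "mpnet_cased":     [0.80, 0.20],
    "mpnet_uncased":   [0.76, 0.16],
}

# Average the four predictions row by row.
blend = [mean(vals) for vals in zip(*preds.values())]
print(blend)
```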
Experience
“The experience with MachineHack has always been very exciting as it always provides a wide variety of business cases across different domains. Highly recommended for upskilling in the Data Science Domain.”
Check out his solution here.
Third prize – Gyan Kumar
Gyan Kumar works as a BI analyst. His job involves vast amounts of data crunching, drawing insights and using a descriptive analytics approach. For about a year, he was hooked on learning about machine learning and deep learning. He follows Andrew Ng’s videos on YouTube and spends a lot of time on Kaggle and Github. He loves researching computer vision and image processing using GANs. He aspires to be a data scientist.
Referring to the hackathon, he found the diversity of the dataset, a mix of tweets and reviews, fascinating. In addition, having only a week to build and improve a model was quite challenging and exhilarating.
Approach
Gyan preprocessed his data using the regex library and used the emoji library to convert emojis into words. During initial data exploration, he built bigrams and word clouds from the raw data to find the most frequent words and decide which to remove. He then defined a cleaning function to take care of frequently occurring words.
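A minimal sketch of such a cleaning function is shown below. A tiny hand-made emoji map stands in for the emoji library's demojize step (to keep the sketch self-contained), and the list of frequent words is an assumed example of what word clouds might surface, not the words Gyan actually removed.

```python
# Hedged sketch: emoji-to-word conversion plus removal of frequent words.
# EMOJI_MAP and FREQUENT_WORDS are illustrative assumptions.
import re

EMOJI_MAP = {"😊": " smiling_face ", "👍": " thumbs_up "}  # stand-in for emoji.demojize
FREQUENT_WORDS = {"the", "a", "is"}                        # assumed examples

def clean(text: str) -> str:
    for emo, word in EMOJI_MAP.items():
        text = text.replace(emo, word)                     # emoji -> word
    text = re.sub(r"[^A-Za-z_\s]", " ", text.lower())      # strip non-letters
    tokens = [t for t in text.split() if t not in FREQUENT_WORDS]
    return " ".join(tokens)

print(clean("The movie is great 😊👍"))
```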
He used BERT word embeddings to generate ids, masks and tokens. He found that the word count in some of the sentences exceeded the number of tokens that can be encoded with BERT. He experimented with the maximum number of tokens for encoding and found that model performance stayed consistent anywhere in the range of 160-384.
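The ids/masks mechanics discussed above can be illustrated with a toy whitespace tokenizer (an assumption standing in for BERT's WordPiece, to keep the sketch self-contained): sentences longer than the maximum length are truncated, shorter ones padded, and the attention mask marks which positions hold real tokens.

```python
# Toy illustration of encoding to a fixed max length, producing
# input ids and an attention mask, as BERT-style tokenizers do.
def encode(sentence: str, vocab: dict, max_len: int):
    ids = [vocab.get(w, 1) for w in sentence.split()][:max_len]  # 1 = [UNK]; truncate
    mask = [1] * len(ids)                                        # 1 marks real tokens
    pad = max_len - len(ids)
    return ids + [0] * pad, mask + [0] * pad                     # 0 = [PAD]

vocab = {"great": 5, "movie": 6}  # hypothetical vocabulary
ids, mask = encode("great movie really", vocab, max_len=5)
print(ids, mask)
```

Raising max_len costs compute quadratically in a transformer's attention, which is why probing the 160-384 range for a performance plateau is a sensible trade-off.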
He used three variants of BERT:
The heavyweight “bert_en_uncased_L-24_H-1024_A-16/1”
TFDistilBertModel
TFBertModel
Results across the three were almost identical, but the lighter models had better latency than the heavyweight one. He went with TFBertModel for its speed and optimal accuracy.
Experience
“MachineHack is an amazing platform for machine learning enthusiasts. I wanted to compete in the 2020 November Great Indian Hiring Hackathon but I was not that prepared to enter in a competition since I had recently started learning. This is my second outing at MachineHack this year.”
Check out his solution here.
Once again, join us in congratulating the winners of this exciting hackathon, who indeed were the “Last Hackers Standing” of the Sentiment Analysis – Weekend Hackathon Edition-2. Next week, we will be back with the winning solutions of the ongoing challenge – Music Genre Classification.