- The first challenge in MachineHack's Weekend Hackathon Edition #2 is to build a highly scalable sentiment analysis model that classifies customers' emotions and sentiments from raw unstructured data.
MachineHack is back with Weekend Hackathon Edition #2 — The Last Hacker Standing.
The Weekend Hackathon Edition #2 will run for six weeks, from 30th July to 9th September 2021. As part of the competition, we will release a new problem statement every Friday for participants to solve within seven days and win exciting prizes.
Week 01: Problem statement & description
Understanding customer emotions is key for businesses to gauge their brand reputation and improve customer experience. Companies use sentiment analysis to understand customer data and how customers perceive their brands, products or services. Although businesses collect massive amounts of data every day from emails, complaints, queries, support tickets, social media, executive chats, surveys, articles etc., 80-90% of such data is unstructured. Hence, businesses look for ways to manage the raw data and build models that automatically measure customer sentiment, so they can make informed decisions about their brand.
In this weekend hackathon, MachineHack challenges data scientists and machine learning practitioners to create a highly scalable Sentiment Analysis Model that accurately classifies customer sentiments about a certain brand (Company ABC) based on the data.
The challenge will start on 30th July 2021 at 8:00 PM (IST).
In this challenge, the participants will work on a data mix of Reviews and Tweets.
The training dataset has 44,100 rows and four columns: tweet_id, author, content and sentiment. The sentiment column is the target variable and takes three values (0: Negative, 1: Neutral, 2: Positive). The test dataset has 18,900 rows and three columns, and does not include the target variable.
The hackathon calls for a few prerequisite skills: text pre-processing (lemmatization, tokenization, N-Grams and other relevant methods), multi-class classification and optimising Log Loss.
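As a rough illustration of how these skills fit together, here is a minimal sketch of a three-class text classifier in scikit-learn. It is not the organisers' baseline. The toy sentences are invented for illustration and stand in for the content column, and the choice of TF-IDF features with logistic regression is an assumption; lemmatization is left out to keep the example dependency-free.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy sentences invented for illustration; they stand in for the "content" column.
texts = [
    "terrible service, never again",
    "worst purchase I have made",
    "the parcel arrived on tuesday",
    "store opens at nine tomorrow",
    "absolutely love this product",
    "great support and fast delivery",
]
# Label encoding from the problem statement: 0 = Negative, 1 = Neutral, 2 = Positive.
labels = [0, 0, 1, 1, 2, 2]

# TfidfVectorizer tokenizes and lowercases the text; ngram_range=(1, 2)
# adds word bigrams on top of single words (the N-Grams step).
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# Log Loss is computed on probabilities, so use predict_proba, not predict.
# Each row holds one probability per class, in the order 0, 1, 2.
probabilities = model.predict_proba(["love the fast delivery", "never again"])
print(probabilities.round(3))
```

Because the metric rewards well-calibrated probabilities rather than hard labels, a model that exposes `predict_proba` is a natural starting point.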
Submissions will be evaluated using the Log Loss metric, and to generate a valid submission file, participants must use scikit-learn models. The hackathon also supports private and public leaderboards. The public leaderboard will be evaluated on 30% of the test data, and the private leaderboard, which will be made available at the end of the hackathon, on 100% of the test data.
Each participant is limited to one account for this hackathon.
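To make the metric concrete, the sketch below computes multi-class Log Loss by hand: the average negative log of the probability assigned to the true class. The function name `multiclass_log_loss`, the clipping constant `eps` and the example probabilities are illustrative assumptions, and the platform's exact scoring code is not shown in this article.

```python
import math


def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-probability assigned to the true class.

    y_true: list of integer labels (0, 1 or 2).
    y_prob: list of per-class probability rows, indexed by label.
    eps:    clipping bound that avoids log(0) for overconfident predictions.
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        p = min(max(probs[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


# Made-up predictions for three samples (columns: Negative, Neutral, Positive).
y_true = [2, 0, 1]
y_prob = [
    [0.1, 0.2, 0.7],
    [0.8, 0.1, 0.1],
    [0.3, 0.4, 0.3],
]
score = multiclass_log_loss(y_true, y_prob)

# A model that always predicts 1/3 for every class scores ln(3).
uniform = multiclass_log_loss(y_true, [[1 / 3] * 3 for _ in y_true])
print(f"example: {score:.4f}, uniform baseline: {uniform:.4f}")
```

Lower is better. Confident wrong predictions are penalised heavily, which is why a uniform guess is a useful baseline to compare against.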
The advanced-level challenge will allow data scientists and machine learning practitioners to get a glimpse into real-life sentiment analysis modelling.
The hackathon will end on 5th August 2021 at 6:00 PM (IST).
The top three winners of this hackathon will get free passes to the Deep Learning DevCon 2021 (DLDC), scheduled to be held on 23-24 September 2021. In addition, the winners will also get a chance to improve their Global Leader-Board Rankings & be the ultimate MachineHack Grand Master.
- Train.csv — 44100 rows x 4 columns (Includes Sentiment as target variable)
- Test.csv — 18900 rows x 3 columns (Doesn’t include the target variable)
- Text Pre-processing – Lemmatization, Tokenization, N-Grams and other relevant methods
- Multi-Class Classification
- Optimizing Log Loss