We are back with another weekend hackathon, and this weekend we are challenging the MachineHack community to build an NLP model that analyzes sentiment in reviews of various electronic products.
Analyzing sentiment about products such as tablets, mobiles, and other gizmos can be both fun and difficult, especially when the reviews are collected across demographics around the world. In this weekend hackathon, we challenge the MachineHack community to develop a machine learning model that accurately classifies product reviews into 4 sentiment classes based on the raw review text provided by the user. Analyzing these sentiments will not only help us serve customers better but can also reveal customer traits, stated or hidden, in the reviews.
The challenge will start on 4th Sep Friday at 6 pm IST.
Problem Statement & Description
Sentiment analysis requires a lot to be taken into account, mainly because of the preprocessing needed to turn raw text into a machine-understandable representation. Usually, we stem and lemmatize the raw text and then represent it using TF-IDF, word embeddings, etc. However, with state-of-the-art NLP models such as Transformer-based BERT, one can skip manual feature engineering like TF-IDF and count vectorizers.
The dataset has close to 9000 rows and 4 columns, and the reviews are in the form of raw text. Each training review is labeled with one of four sentiments: positive, negative, no sentiment, or cannot say (a neutral sentence).
In this short span of time, we encourage you to leverage NLP's ImageNet moment (transfer learning) by using pre-trained models to classify the product reviews correctly. Submissions are scored with multi-class log loss.
You are given raw customer reviews of various types of products, each belonging to one of 4 sentiment classes. Your objective as a data scientist is to build a natural language processing model that predicts the sentiment class of each review as accurately as possible.
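As a rough illustration of the classical route described above, here is a minimal baseline sketch that represents text with TF-IDF and feeds it to a linear classifier. The toy reviews are invented for illustration, and the choice of LogisticRegression and the n-gram range are our assumptions, not something the hackathon prescribes. Stemming and lemmatization are left out to keep the sketch short.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy reviews invented for illustration; the real text comes from Train.csv.
texts = [
    "not sure how I feel about this tablet",
    "the phone battery died in a week, terrible",
    "love this tablet, the screen is great",
    "picked up the new phone at the store today",
    "maybe good, maybe not, hard to say",
    "awful speaker, waste of money",
    "great camera and a fast processor",
    "the event starts at noon near the booth",
]
# 0 = Cannot Say, 1 = Negative, 2 = Positive, 3 = No Sentiment
labels = [0, 1, 2, 3, 0, 1, 2, 3]

# TF-IDF turns each review into a sparse vector of weighted word and bigram counts.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# predict_proba returns one probability per class, in the order of model.classes_.
probs = model.predict_proba(["the screen is great but the battery is terrible"])
print(model.classes_)
print(probs.round(3))
```

A pre-trained Transformer would replace the TfidfVectorizer step with learned representations, but the output you need is the same: one probability per class for every review.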
The unzipped folder will have the following files.
- Train.csv – 6364 rows x 4 columns (Includes Sentiment Column as Target)
- Test.csv – 2728 rows x 3 columns
- Sample Submission.csv – sample format for submission file.
How to Generate a valid Submission File
Sklearn models support the predict_proba() method, which generates the probabilities for every class.
You should submit a .csv/.xlsx file with exactly 2728 rows and 4 columns (one column per class). Your submission will return an Invalid Score if you have extra columns or rows.
The file should have exactly 4 columns, named 0 to 3, one per sentiment class.
The dataset has the following attributes:
- Text_ID – Unique Identifier
- Product_Description – Description of the product review by a user
- Product_Type – Different types of product (9 unique products)
- Class – Represents various sentiments
  - 0 – Cannot Say
  - 1 – Negative
  - 2 – Positive
  - 3 – No Sentiment
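To make the submission format concrete, here is a minimal sketch of turning a matrix of class probabilities into a file with 2728 rows and 4 columns. The column names 0 to 3 are our reading of the format description, so please check them against Sample Submission.csv. The helper name build_submission and the output file name are hypothetical, and the random probabilities stand in for your model's predict_proba() output on Test.csv.

```python
import numpy as np
import pandas as pd

N_TEST_ROWS = 2728  # number of rows in Test.csv
N_CLASSES = 4       # 0 Cannot Say, 1 Negative, 2 Positive, 3 No Sentiment


def build_submission(probs):
    """Wrap an (N_TEST_ROWS, N_CLASSES) probability array in a DataFrame.

    Hypothetical helper: it rejects extra or missing rows and columns,
    since those would lead to an Invalid Score.
    """
    probs = np.asarray(probs, dtype=float)
    if probs.shape != (N_TEST_ROWS, N_CLASSES):
        raise ValueError(
            f"expected shape {(N_TEST_ROWS, N_CLASSES)}, got {probs.shape}"
        )
    # Assumption: one column per class, named 0..3 in class order.
    return pd.DataFrame(probs, columns=[0, 1, 2, 3])


# Placeholder probabilities; replace with model.predict_proba(test_features).
rng = np.random.default_rng(0)
fake_probs = rng.dirichlet(np.ones(N_CLASSES), size=N_TEST_ROWS)

submission = build_submission(fake_probs)
# index=False keeps the row index from becoming an extra column.
submission.to_csv("submission.csv", index=False)
```

Writing with index=False matters here, because an index column would count as an extra column.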
Skills
- NLP, Sentiment Analysis
- Feature extraction from raw text using TF-IDF, CountVectorizer
- Using Word Embedding to represent words as vectors
- Using Pretrained models like Transformers, BERT
- Optimizing multi-class log loss to generalize well on unseen data
The datasets will be made available for download on Sep 4th, Friday at 6 pm IST.
This hackathon and the bounty will expire on Sep 7th, Monday at 7 am IST.
The top 3 competitors in this competition will receive a free pass to the Deep Learning DevCon 2020.
We have also introduced a new set of prizes going forward.
- Participants who finish in the Top 3 of the private leaderboard in 3 consecutive Weekend Hackathons will be interviewed for #HackeroftheMonth.
- Stand a chance to get an exclusive interview with Analytics India Magazine about your Data Science/Machine Learning journey.
Who is the #hackerofthemonth?
Any participant can become #hackerofthemonth by proving their mettle on the weekend hackathon leaderboards. We will award the #hackerofthemonth community recognition to participants who finish in the Top 3 in 3 consecutive weekend hackathons. Yes, you got it right, it's a hat-trick!
Stand a chance to get interviewed by the biggest AI/ML media house in the country about your Data Science and Machine Learning journey.
Please note this PRIZE is only for the Weekend Hackathon series of competitions.
- One account per participant. Submissions from multiple accounts will lead to disqualification
- The submission limit for the hackathon is 10 per day after which the submission will not be evaluated
- All registered participants are eligible to compete in the hackathon
- This competition counts towards your overall ranking points
- We ask that you respect the spirit of the competition and do not cheat
- This hackathon will expire on 7th September, Monday at 7 am IST
- Use of any external dataset is prohibited and doing so will lead to disqualification
- The submission will be evaluated using the Log Loss metric. You can use sklearn.metrics.log_loss to calculate it
- This hackathon supports private and public leaderboards
- The public leaderboard is evaluated on 30% of Test data
- The private leaderboard will be made available at the end of the hackathon which will be evaluated on 100% Test data
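To get a feel for the Log Loss metric named in the rules, here is a small sketch that computes multi-class log loss directly with NumPy. The labels and probabilities are made up for illustration, and the function name multiclass_log_loss is ours. For well-formed probabilities, sklearn.metrics.log_loss should give the same values when called with labels=[0, 1, 2, 3].

```python
import numpy as np


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    # Clip so that a predicted probability of exactly 0 does not give log(0).
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    y_true = np.asarray(y_true)
    true_class_probs = probs[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(true_class_probs)))


# Made-up labels: one review per class.
y_true = [2, 1, 0, 3]

# A model that puts 0.85 on the correct class every time.
confident = np.array([
    [0.05, 0.05, 0.85, 0.05],
    [0.05, 0.85, 0.05, 0.05],
    [0.85, 0.05, 0.05, 0.05],
    [0.05, 0.05, 0.05, 0.85],
])

# A model that always predicts 0.25 for every class.
uniform = np.full((4, 4), 0.25)

print(multiclass_log_loss(y_true, confident))  # -ln(0.85), about 0.163
print(multiclass_log_loss(y_true, uniform))    # ln(4), about 1.386
```

Lower is better. Predicting 0.25 for every class scores ln(4), so a useful model should land below that, and confident wrong predictions are penalized heavily.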