Product Sentiment Classification: Weekend Hackathon #19

We are back with another weekend hackathon, and this weekend we are challenging the MachineHack community to build an NLP model that analyzes sentiment in reviews of various electronic products.

Analyzing sentiment about products such as tablets, mobiles and other gizmos can be fun but difficult, especially when the reviews are collected across demographics around the world. In this weekend hackathon, we challenge the MachineHack community to develop a machine learning model that accurately classifies product reviews into 4 sentiment classes based on the raw review text provided by the user. Analyzing these sentiments will not only help us serve customers better but can also reveal customer traits that are present or hidden in the reviews.

The challenge will start on 4th Sep Friday at 6 pm IST.

Problem Statement & Description

Sentiment analysis requires a lot to be taken into account, mainly because of the preprocessing needed to turn raw text into a machine-understandable representation. Usually, we stem and lemmatize the raw text and then represent it using TF-IDF, word embeddings, etc. However, with state-of-the-art NLP models such as Transformer-based BERT models, one can skip manual feature engineering like TF-IDF and count vectorizers.
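As a rough illustration of the manual feature-engineering route, the sketch below fits a TF-IDF plus logistic regression baseline with scikit-learn. The toy reviews and labels are invented for this example and are not taken from the competition data, and the hyperparameters are arbitrary starting points rather than recommended settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy reviews standing in for the Product_Description column
# (invented for illustration, not real competition rows).
texts = [
    "not sure what to make of this tablet",
    "the battery died after one day, terrible phone",
    "love the new screen, great tablet",
    "the store opens at nine tomorrow",
    "hard to say whether the update helps",
    "awful app, it keeps crashing",
    "fantastic camera and great sound",
    "the launch event is on friday",
]
# Class codes as described in the post: 0 Cannot Say, 1 Negative,
# 2 Positive, 3 No Sentiment.
labels = [0, 1, 2, 3, 0, 1, 2, 3]

# TF-IDF over unigrams and bigrams feeding a linear classifier.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# One row per review, one probability column per class (0-3).
probabilities = model.predict_proba(["great phone, love it"])
print(probabilities.shape)
```

A baseline like this is mainly useful as a reference score to compare against a fine-tuned pre-trained model.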

The dataset has close to 9000 rows and 4 columns, and the reviews are in the form of raw text. Each review in the training data is labeled with one of four sentiments: positive, negative, no sentiment, or cannot say (a neutral sentence).

In this short span of time, we encourage you to leverage NLP's ImageNet moment (transfer learning) by using pre-trained models to classify the product reviews correctly. Submissions are scored using multi-class log loss.

You are given raw customer reviews of various types of products, each belonging to one of 4 sentiment classes. Your objective as a data scientist is to build a natural language processing model that classifies the sentiment of each review as accurately as possible.

Dataset Description:

The unzipped folder will have the following files.

  • Train.csv – 6364 rows x 4 columns (Includes Sentiment Column as Target)
  • Test.csv – 2728 rows x 3 columns
  • Sample Submission.csv – sample format for submission file.

How to Generate a valid Submission File

Most sklearn classifiers support the predict_proba() method, which generates the probabilities for every class.

You should submit a .csv/.xlsx file with exactly 2728 rows and 4 columns, one per class (0-3). Your submission will return an Invalid Score if it has extra columns or rows.
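A minimal sketch of writing such a file with pandas is shown here. The random probabilities are placeholders for the output of a fitted model's predict_proba() on the test reviews, the output file name is hypothetical, and the column headers 0 to 3 are an assumption that should be checked against Sample Submission.csv.

```python
import numpy as np
import pandas as pd

N_TEST_ROWS = 2728  # number of rows in Test.csv, per the dataset description
N_CLASSES = 4       # classes 0-3

# Placeholder probabilities. In practice this array would come from
# something like model.predict_proba(test_texts).
rng = np.random.default_rng(0)
raw = rng.random((N_TEST_ROWS, N_CLASSES))
probabilities = raw / raw.sum(axis=1, keepdims=True)  # each row sums to 1

# Assumed header layout: one column per class, named 0 to 3.
submission = pd.DataFrame(probabilities, columns=[0, 1, 2, 3])

# Guard against the extra rows or columns that would give an Invalid Score.
assert submission.shape == (N_TEST_ROWS, N_CLASSES)

# index=False keeps pandas from adding an extra index column.
submission.to_csv("submission.csv", index=False)
```

Passing index=False matters here because the default index would be written as a fifth column.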

Attribute Description:

  • Text_ID – Unique Identifier
  • Product_Description – Description of the product review by a user
  • Product_Type – Different types of product (9 unique products)
  • Class – Represents various sentiments
    • 0 – Cannot Say
    • 1 – Negative
    • 2 – Positive
    • 3 – No Sentiment

Skills:

  • NLP, Sentiment Analysis
  • Feature extraction from raw text using TF-IDF, CountVectorizer
  • Using Word Embedding to represent words as vectors
  • Using Pretrained models like Transformers, BERT
  • Optimizing multi-class log loss to generalize well on unseen data

The datasets will be made available for download on Sep 4th, Friday at 6 pm IST.

This hackathon and the bounty will expire on Sep 7th, Monday at 7 am IST.

Bounties

The top 3 competitors in this competition will receive a free pass to the Deep Learning DevCon 2020.

We have also introduced a new set of prizes going forward.

  • Participants who finish in the Top 3 on the private leaderboard in 3 consecutive Weekend Hackathons will be interviewed for #HackeroftheMonth.
  • Stand a chance to get an exclusive interview about your Data Science/Machine Learning journey with Analytics India Magazine.

Who is the #hackerofthemonth?

Any participant can become #hackerofthemonth by proving their mettle on the weekend hackathon leaderboards. We will award the #hackerofthemonth community recognition to participants who finish in the Top 3 in 3 consecutive weekend hackathons. Yes, you got it right, it's a hat-trick!

Stand a chance to get interviewed by the biggest AI/ML media house in the country about your Data Science and Machine Learning journey.

Please note this PRIZE is only for the Weekend Hackathon series of competitions.

Rules

  1. One account per participant. Submissions from multiple accounts will lead to disqualification
  2. The submission limit for the hackathon is 10 per day after which the submission will not be evaluated
  3. All registered participants are eligible to compete in the hackathon
  4. This competition counts towards your overall ranking points
  5. We ask that you respect the spirit of the competition and do not cheat
  6. This hackathon will expire on 7th September, Monday at 7 am IST
  7. Use of any external dataset is prohibited and doing so will lead to disqualification

Evaluation

  • The submission will be evaluated using the Log Loss metric. One can use sklearn.metrics.log_loss to calculate the same
  • This hackathon supports private and public leaderboards
  • The public leaderboard is evaluated on 30% of Test data
  • The private leaderboard will be made available at the end of the hackathon which will be evaluated on 100% Test data
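To make the metric concrete, the sketch below computes multi-class log loss on a handful of made-up predictions, once with sklearn.metrics.log_loss and once by hand as the mean negative log of the probability assigned to the true class. The labels and probabilities are invented for illustration; for local validation you would hold out part of Train.csv instead.

```python
import numpy as np
from sklearn.metrics import log_loss

# Invented true classes and predicted probabilities (columns are classes 0-3).
y_true = [0, 1, 2, 3, 2]
y_prob = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.20, 0.50, 0.20, 0.10],
    [0.10, 0.10, 0.60, 0.20],
    [0.25, 0.25, 0.25, 0.25],
    [0.30, 0.40, 0.20, 0.10],  # a wrong guess: true class 2 gets only 0.20
])

# labels= fixes the column order even if a class is missing from y_true.
score = log_loss(y_true, y_prob, labels=[0, 1, 2, 3])

# The same quantity by hand: average of -log(p) for the true class of each row.
true_class_prob = y_prob[np.arange(len(y_true)), y_true]
manual = -np.mean(np.log(true_class_prob))

print(score, manual)
```

Because the penalty grows as the probability of the true class approaches zero, confident wrong predictions hurt far more than cautious ones, which is why well-calibrated probabilities matter for this leaderboard.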


Anurag Upadhyaya

Experienced Data Scientist with a demonstrated history of working in Industrial IOT (IIOT), Industry 4.0, Power Systems and Manufacturing domain. I have experience in designing robust solutions for various clients using Machine Learning, Artificial Intelligence, and Deep Learning. I have been instrumental in developing end to end solutions from scratch and deploying them independently at scale.