We are back with another weekend hackathon, and this weekend we are challenging the MachineHack community to build an NLP model that detects fake content in raw text from social media and other news platforms.
Fake content is everywhere, from social media to news platforms, and the list keeps growing. With recent advances in NLP, research institutes are putting a lot of blood, sweat, and tears into detecting fake content across these platforms.
Fake news, defined by the New York Times as “a made-up story with an intention to deceive”, often for a secondary gain, is arguably one of the most serious challenges facing the news industry today. In a December Pew Research poll, 64% of US adults said that “made-up news” has caused a “great deal of confusion” about the facts of current events.
In this weekend hackathon, we challenge the MachineHack community to develop a machine learning model that accurately classifies content into 6 classes: True, False, Half-True, Barely-True, Mostly-True, and Not-Known.
The challenge will start on 11th Sep Friday at 6 pm IST.
Problem Statement & Description
In this hackathon, your goal as a data scientist is to create an NLP model to combat the fake content menace. We believe that AI holds promise for automating significant parts of the procedure human fact-checkers use today to determine whether a story is real or a hoax.
The training dataset has 10240 rows and 3 columns, and the content is raw text. Each row is labeled with one of six classes: True, False, Half-True, Barely-True, Mostly-True, or Not-Known.
In this short span of time, we encourage you to leverage NLP's ImageNet moment (transfer learning) by using pre-trained models to classify the content correctly, with multi-class log loss as the metric.
You are given raw content collected from various social media and news platforms, labeled with 6 different classes. Your objective as a data scientist is to build a natural language processing model that predicts the class labels on the test data as accurately as possible.
The unzipped folder will have the following files.
- Train.csv – 10240 rows x 3 columns (Includes Labels Column as Target)
- Test.csv – 1267 rows x 2 columns
- Sample Submission.csv – sample format for submission file.
How to Generate a valid Submission File
Sklearn models support the predict_proba() method, which generates the probabilities for every class.
You should submit a .csv/.xlsx file with exactly 1267 rows and 6 columns (one column per class). Your submission will return an Invalid Score if it has extra columns or rows.
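A minimal sketch of producing such a file, assuming a TF-IDF plus logistic regression baseline from scikit-learn. The tiny in-memory frames stand in for Train.csv and Test.csv, the column names Text and Labels are taken from the attribute list, and the output name submission.csv is an assumption. Check the header against Sample Submission.csv before uploading.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

N_CLASSES = 6  # labels are encoded 0-5

# Toy stand-ins for pd.read_csv("Train.csv") and pd.read_csv("Test.csv").
train = pd.DataFrame({
    "Text": ["taxes went up", "taxes went down", "crime is rising",
             "crime is falling", "jobs were added", "jobs were lost"],
    "Labels": [0, 1, 2, 3, 4, 5],
})
test = pd.DataFrame({"Text": ["taxes and jobs", "crime and taxes"]})

# Baseline model (an assumption, not the organizers' method).
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train["Text"], train["Labels"])

# predict_proba returns one column per class seen in training, ordered as
# model.classes_. Reindex so all six columns 0-5 exist even if a class is
# missing from the training data.
proba = pd.DataFrame(model.predict_proba(test["Text"]), columns=model.classes_)
submission = proba.reindex(columns=range(N_CLASSES), fill_value=0.0)

# One row per test example, six probability columns, no index column.
submission.to_csv("submission.csv", index=False)
```

With the real Test.csv, the resulting frame should have 1267 rows; each row holds the predicted probabilities for classes 0 to 5 and sums to 1.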
The file should have exactly 6 columns, numbered 0 to 5, one per class label.
The dataset has the following attributes:
- Text – Raw content from social media/news platforms
- Text_Tag – Different types of content tags
- Labels – Represents various classes of labels
The classes in Labels are encoded as:
- Barely-True – 0
- False – 1
- Half-True – 2
- Mostly-True – 3
- Not-Known – 4
- True – 5
This hackathon exercises the following skills:
- NLP, Sentiment Analysis
- Feature extraction from raw text using TF-IDF, CountVectorizer
- Using Word Embedding to represent words as vectors
- Using Pretrained models like Transformers, BERT
- Optimizing multi-class log loss to generalize well on unseen data
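To illustrate the feature extraction step named in the list above, here is a small sketch comparing scikit-learn's CountVectorizer and TfidfVectorizer. The three sentences are made-up examples, not rows from the dataset, and the bigram setting is only one possible choice.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Made-up example sentences (not taken from Train.csv).
texts = [
    "the senator voted for the bill",
    "the senator voted against the bill",
    "unemployment fell last year",
]

# CountVectorizer: raw token counts per document.
counts = CountVectorizer().fit(texts)
X_counts = counts.transform(texts)

# TfidfVectorizer: counts reweighted so that terms appearing in many
# documents (e.g. "the") contribute less; rows are L2-normalized by default.
# ngram_range=(1, 2) adds word pairs to the vocabulary.
tfidf = TfidfVectorizer(ngram_range=(1, 2)).fit(texts)
X_tfidf = tfidf.transform(texts)

print(X_counts.shape)  # (documents, vocabulary size)
print(X_tfidf.shape)   # wider, because bigrams are included
```

Either sparse matrix can be passed directly to a scikit-learn classifier that supports predict_proba().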
The datasets will be made available for download on Sep 11th, Friday at 6 pm IST.
This hackathon and the bounty will expire on Sep 14th, Monday at 7 am IST.
The top 3 competitors in this competition will receive a free pass to the Deep Learning DevCon 2020.
We have also introduced a new set of prizes going forward.
- Participants with 3 consecutive Top-3 finishes on the private leaderboard in Weekend Hackathons will be interviewed for #HackeroftheMonth.
- Stand a chance to get an exclusive interview about your Data Science/Machine Learning journey with Analytics India Magazine
Who is the #hackerofthemonth?
Any participant can become #hackerofthemonth by proving their mettle on the weekend hackathon leaderboards. We will award the #hackerofthemonth community recognition to participants who finish in the Top-3 in 3 consecutive weekend hackathons. Yes, you got it right, it’s a hat-trick!
Stand a chance to get interviewed by the biggest AI/ML media house in the country about your Data Science and Machine Learning journey.
Please note this PRIZE is only for the Weekend Hackathon series of competitions.
- One account per participant. Submissions from multiple accounts will lead to disqualification
- The submission limit for the hackathon is 10 per day, after which further submissions will not be evaluated
- All registered participants are eligible to compete in the hackathon
- This competition counts towards your overall ranking points
- We ask that you respect the spirit of the competition and do not cheat
- This hackathon will expire on 14th September, Monday at 7 am IST
- Use of any external dataset is prohibited and doing so will lead to disqualification
- The submission will be evaluated using the Log Loss metric. One can use sklearn.metrics.log_loss to calculate the same
- This hackathon supports private and public leaderboards
- The public leaderboard is evaluated on 30% of Test data
- The private leaderboard will be made available at the end of the hackathon which will be evaluated on 100% Test data
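For intuition about the Log Loss metric named in the rules above, the following sketch computes multi-class log loss by hand using only the standard library. The function name multiclass_log_loss and the example probabilities are illustrative assumptions; for rows that sum to 1, sklearn.metrics.log_loss should give the same value.

```python
import math

def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        p = min(max(probs[label], eps), 1 - eps)  # clip to avoid log(0)
        total += -math.log(p)
    return total / len(y_true)

# Three examples whose true classes are 0, 5 and 2.
y_true = [0, 5, 2]

# A confident, correct model puts 0.90 on the true class each time.
confident = [
    [0.90, 0.02, 0.02, 0.02, 0.02, 0.02],
    [0.02, 0.02, 0.02, 0.02, 0.02, 0.90],
    [0.02, 0.02, 0.90, 0.02, 0.02, 0.02],
]

# A model that knows nothing spreads probability evenly over 6 classes.
uniform = [[1 / 6] * 6 for _ in y_true]

print(multiclass_log_loss(y_true, confident))  # -log(0.9), about 0.105
print(multiclass_log_loss(y_true, uniform))    # log(6), about 1.792
```

Lower is better: a submission scoring above log(6) on this 6-class problem is doing worse than predicting equal probability for every class.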