Weekend Hackathons are becoming more competitive, so we are back with a tougher one this time. In this weekend hackathon, we are providing an open UCI dataset, but the target values have been predicted by our own machine learning model. Yes, you heard that right! We are challenging all MachineHackers to design a machine learning model that predicts the popularity of a news article from various statistics associated with its raw text.
The challenge will start on July 31st, Friday at 6 pm IST.
Problem Statement & Description
The provided dataset summarizes a heterogeneous set of features about articles published by Mashable (www.mashable.com) over a period of two years. Mashable owns the content of these articles as well as the rights to reproduce it, so this dataset does not share the original text, only statistics derived from it. The features were extracted as is typical in any NLP use case. Noise in the extracted features makes it difficult to reach a good score using the provided attributes alone, so the dataset offers ample scope for feature engineering, and we are looking forward to some serious competition this time.
The dataset provides 58 features that can be used to predict the popularity of news articles. Your objective as a data scientist is to build a machine learning model that predicts each article's popularity as accurately as possible.
Data Description
The unzipped folder contains the following files:
- Train.csv – 7928 rows x 59 columns
- Test.csv – 31716 rows x 58 columns
- Sample_Submission.xlsx – Sample format for the submission
Target Variable: shares (the number of shares of a news article, used as the measure of its popularity)
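Since Train.csv carries the 58 feature columns plus the `shares` target while Test.csv carries only the features, a typical first step is to separate the target from the features and confirm that the remaining columns line up with the test file. The sketch below does this with Python's standard `csv` module. It assumes the Train.csv header literally names the target column `shares`; the helper names `split_train` and `read_header` and the column names `feature_1`/`feature_2` are hypothetical, used only for illustration.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

TARGET = "shares"  # target column named in the problem statement (assumed header)


def split_train(f: TextIO, target: str = TARGET) -> Tuple[List[Dict[str, str]], List[float]]:
    """Read Train.csv-style rows and separate the feature columns from the target.

    Feature values are kept as raw strings; only the target is parsed as a float.
    """
    reader = csv.DictReader(f)
    if reader.fieldnames is None or target not in reader.fieldnames:
        raise ValueError(f"column {target!r} not found in header")
    features: List[Dict[str, str]] = []
    targets: List[float] = []
    for row in reader:
        targets.append(float(row.pop(target)))  # pop removes the target from the row
        features.append(row)
    return features, targets


def read_header(f: TextIO) -> List[str]:
    """Return the column names from the first line of a CSV file (e.g. Test.csv)."""
    return next(csv.reader(f))


if __name__ == "__main__":
    # Tiny made-up sample; feature_1/feature_2 are hypothetical column names.
    train = io.StringIO("feature_1,feature_2,shares\n1,2,1400\n3,4,800\n")
    test = io.StringIO("feature_1,feature_2\n5,6\n")
    X, y = split_train(train)
    print(y)                                # [1400.0, 800.0]
    print(list(X[0]) == read_header(test))  # True
```

With the real files, opening Train.csv with `open("Train.csv", newline="")` and passing the file object to `split_train` would apply the same logic. Feature values are deliberately left as strings so that type conversion can be handled as part of feature engineering.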
The datasets will be made available for download on July 31st, Friday at 6 pm IST.
This hackathon and the bounty will expire on August 3rd, Monday at 7 am IST.
Below are the file formats of the provided data:
Train.csv – Glimpse of the training data (not all columns shown)
Test.csv – Glimpse of the test data (not all columns shown)
Sample_Submission.xlsx – Accepted format for submissions
Bounties
The top 3 competitors will receive a free pass to the Computer Vision DevCon 2020.
Rules
- One account per participant. Submissions from multiple accounts will lead to disqualification
- The submission limit for the hackathon is 10 per day; submissions beyond this limit will not be evaluated
- All registered participants are eligible to compete in the hackathon
- This competition counts towards your overall ranking points
- We ask that you respect the spirit of the competition and do not cheat
- This hackathon will expire on August 3rd, Monday at 7 am IST
- The use of any external dataset is prohibited and will lead to disqualification
Evaluation
The leaderboard is evaluated using the Mean Absolute Error (MAE) between the predicted and actual shares in each participant's submission; a lower MAE is better.
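MAE is the average absolute difference between predicted and actual values, MAE = (1/n) Σ |yᵢ − ŷᵢ|, so each error counts in proportion to its size. Below is a minimal sketch, assuming the leaderboard uses this standard definition (its exact implementation is not described here); the share counts are made up. It also compares two constant baselines: for MAE, the best constant prediction is a median of the targets, not the mean.

```python
from statistics import mean, median
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of |actual - predicted| over all rows; lower is better."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must be non-empty")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    actual = [800, 1400, 2100, 10000]  # made-up share counts with one large value
    median_baseline = [median(actual)] * len(actual)  # constant 1750.0
    mean_baseline = [mean(actual)] * len(actual)      # constant 3575
    print(mean_absolute_error(actual, median_baseline))  # 2475.0
    print(mean_absolute_error(actual, mean_baseline))    # 3212.5
```

The single large value pulls the mean upward and worsens its MAE, while the median baseline is unaffected by how large that value is. Scoring local validation folds with the same metric keeps model selection consistent with the leaderboard.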