Weekend hackathons are becoming more competitive, so we are back with a tougher one this time. For this weekend hackathon, we are providing an open UCI dataset, but the target has been predicted by our machine learning model. Yes, you heard that right. We are challenging all MachineHackers to design a machine learning model that predicts the popularity of a news article from various statistics extracted from its raw text. The goal is to predict each article’s popularity as closely as possible.
The challenge will start on July 31st, Friday at 6 pm IST.
Problem Statement & Description
The provided dataset summarizes a heterogeneous set of features about articles published by Mashable (www.mashable.com) over a period of two years. The content and the rights to reproduce it belong to Mashable, so this dataset does not share the original content, only some statistics associated with it. The features were extracted from the text in the way that is typical for an NLP use case. The noise in the extracted features makes it difficult to reach a good score using the provided attributes alone. The dataset also offers plenty of scope for feature engineering, and we are looking forward to some serious competition this time.
You are given 58 features that can help predict the popularity of a news article. Your objective as a data scientist is to build a machine learning model that predicts each article’s popularity as closely as possible.
The unzipped folder will have the following files.
- Train.csv – 7928 rows x 59 columns
- Test.csv – 31716 rows x 58 columns
- Sample Submission – Sample format for the submission.
Target Variable: shares (popularity of news titles)
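Once the files are downloaded, it can be worth confirming that they match the shapes listed above before modelling. The following is a minimal sketch using only the Python standard library. It assumes that each CSV has a single header row, that the stated row counts exclude that header, and that the files are UTF-8 encoded; none of this is confirmed by the announcement. The helper names `csv_shape` and `find_shape_mismatches` are hypothetical.

```python
import csv
import os

# Shapes as stated in the announcement: (rows, columns).
# Assumption: the row counts do not include the header row.
EXPECTED_SHAPES = {"Train.csv": (7928, 59), "Test.csv": (31716, 58)}


def csv_shape(path):
    """Return (data_rows, columns) for a CSV file with one header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        # Skip completely blank lines, which csv.reader yields as [].
        n_rows = sum(1 for row in reader if row)
    return n_rows, len(header)


def find_shape_mismatches(folder="."):
    """Return {file_name: (expected, actual)} for files whose shape differs."""
    mismatches = {}
    for name, expected in EXPECTED_SHAPES.items():
        actual = csv_shape(os.path.join(folder, name))
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches
```

Calling `find_shape_mismatches()` in the unzipped folder should return an empty dictionary if the files match the stated shapes under these assumptions. The one extra column in Train.csv is presumably the target, shares.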
The datasets will be made available for download on July 31st, Friday at 6 pm IST.
This hackathon and the bounty will expire on August 3rd, Monday at 7 am IST.
Below are the file formats for the provided data
Train.csv – Glimpse of Train data, not all columns included
Test.csv – Glimpse of Test data, not all columns included
Sample_Submission.xlsx – Accepted Format of submissions
The top 3 competitors will receive a free pass to the Computer Vision DevCon 2020
- One account per participant. Submissions from multiple accounts will lead to disqualification
- The submission limit for the hackathon is 10 per day; submissions beyond that limit will not be evaluated
- All registered participants are eligible to compete in the hackathon
- This competition counts towards your overall ranking points
- We ask that you respect the spirit of the competition and do not cheat
- This hackathon will expire on August 3rd, Monday at 7 am IST
- Use of any external dataset is prohibited and doing so will lead to disqualification
Leaderboard scores are computed as the Mean Absolute Error (MAE) of each participant’s submission.
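For reference, MAE is the average of the absolute differences between the predicted and actual values, so lower is better. The sketch below shows the calculation in plain Python. The share counts are made up for illustration and are not taken from the dataset, and the exact scoring code used by the leaderboard is not published here, so treat this as an approximation of the metric rather than the official scorer.

```python
from statistics import mean, median


def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all rows."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one value is required")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up share counts, for illustration only.
actual = [500, 1200, 800, 3000]
predicted = [700, 1000, 800, 2000]
print(mean_absolute_error(actual, predicted))  # 350.0

# Two constant baselines: predict the median or the mean for every row.
median_mae = mean_absolute_error(actual, [median(actual)] * len(actual))
mean_mae = mean_absolute_error(actual, [mean(actual)] * len(actual))
print(median_mae, mean_mae)  # 725.0 812.5
```

Among constant predictions, the median of the target minimizes MAE, which is why the median baseline scores better than the mean here. A skewed target such as share counts tends to widen that gap, so a median baseline on the training target is a reasonable first number to beat.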
Experienced Data Scientist with a demonstrated history of working in Industrial IOT (IIOT), Industry 4.0, Power Systems and Manufacturing domain. I have experience in designing robust solutions for various clients using Machine Learning, Artificial Intelligence, and Deep Learning. I have been instrumental in developing end to end solutions from scratch and deploying them independently at scale.