News Popularity Prediction: Weekend Hackathon #14

Weekend Hackathons are becoming more competitive, so we are back with a tougher one this time. For this weekend hackathon we are providing an open UCI dataset, but the target has been predicted by our machine learning model. Yes, you heard it right. We are challenging all MachineHackers to design a machine learning model that predicts the popularity of a news article from various statistics associated with the article’s raw text. The goal is to predict each article’s popularity as closely as possible.

The challenge will start on July 31st, Friday at 6 pm IST.

Problem Statement & Description

The provided dataset summarizes a heterogeneous set of features about articles published by Mashable over a period of two years. The rights to reproduce the articles’ content belong to Mashable, so this dataset does not share the original content, only some statistics associated with it. The features were extracted the way they would be for any NLP use case. Noise in the extracted features makes it difficult to reach a good score using the provided attributes alone. The dataset therefore offers huge scope for feature engineering, and we are looking forward to some serious competition this time.

You are given 58 distinguishing factors that can be used to predict the popularity of news titles. Your objective as a data scientist is to build a machine learning model that predicts each news article’s popularity as accurately as possible.

Data Description

The unzipped folder will have the following files.

  • Train.csv – 7928 rows x 59 columns
  • Test.csv – 31716 rows x 58 columns
  • Sample Submission – Sample format for the submission

Target Variable: shares (popularity of news titles)
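As a rough illustration of the layout described above, the sketch below separates the target from the features. It assumes the CSV files carry a header row and that the target column is literally named `shares`. The feature names `feat_a` and `feat_b` and the helper functions are hypothetical stand-ins, not part of the dataset.

```python
import csv
import io

TARGET = "shares"  # target column named in the data description (assumed header name)


def read_rows(handle):
    """Read CSV rows from an open text handle into a list of dicts."""
    return list(csv.DictReader(handle))


def split_features_target(rows, target=TARGET):
    """Separate the target column from the feature columns."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    targets = [float(row[target]) for row in rows]
    return features, targets


# Tiny in-memory stand-in for Train.csv; the feature names are hypothetical.
sample = io.StringIO("feat_a,feat_b,shares\n0.5,12,1200\n0.1,7,593\n")
rows = read_rows(sample)
X, y = split_features_target(rows)
```

With the real file, `read_rows(open("Train.csv", newline=""))` would take the place of the in-memory sample. If the description above holds, that should give 7928 rows with 58 feature columns once `shares` is removed.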

The datasets will be made available for download on July 31st, Friday at 6 pm IST.

This hackathon and the bounty will expire on August 3rd, Monday at 7 am IST.

Below are the file formats for the provided data:

Train.csv – Glimpse of Train data, not all columns included

Test.csv – Glimpse of Test data, not all columns included

Sample_Submission.xlsx – Accepted Format of submissions


The top 3 competitors will receive a free pass to the Computer Vision DevCon 2020.

Know more about the Computer Vision DevCon 2020.


  1. One account per participant. Submissions from multiple accounts will lead to disqualification.
  2. The submission limit for the hackathon is 10 per day; further submissions that day will not be evaluated.
  3. All registered participants are eligible to compete in the hackathon.
  4. This competition counts towards your overall ranking points.
  5. We ask that you respect the spirit of the competition and do not cheat.
  6. This hackathon will expire on August 3rd, Monday at 7 am IST.
  7. Use of any external dataset is prohibited and will lead to disqualification.


Submissions are scored on the leaderboard using Mean Absolute Error (MAE).
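For reference, MAE is the average of the absolute differences between actual and predicted values: MAE = (1/n) * sum(|y_i - yhat_i|). The sketch below computes it in plain Python on made-up numbers. It is an illustration only, not the organisers’ scoring code, and the function name and sample values are assumptions. It also scores a constant median prediction, since the median is the constant that minimises MAE and so gives a simple baseline to beat.

```python
from statistics import median


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one value is required")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up share counts, for illustration only.
y_true = [1200.0, 593.0, 2100.0, 800.0]
y_pred = [1000.0, 700.0, 1500.0, 800.0]
mae = mean_absolute_error(y_true, y_pred)

# Baseline: predict the median of the known targets for every article.
baseline_value = median(y_true)
baseline_mae = mean_absolute_error(y_true, [baseline_value] * len(y_true))

print(f"model MAE: {mae:.2f}, median-baseline MAE: {baseline_mae:.2f}")
```

Lower is better. Because MAE penalises errors linearly, a few very large share counts pull the score less than they would under a squared-error metric.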

Anurag Upadhyaya
Experienced Data Scientist with a demonstrated history of working in Industrial IOT (IIOT), Industry 4.0, Power Systems and Manufacturing domain. I have experience in designing robust solutions for various clients using Machine Learning, Artificial Intelligence, and Deep Learning. I have been instrumental in developing end to end solutions from scratch and deploying them independently at scale.
