Classifying Movie Scripts: Predict The Movie Genre Hackathon

MachineHack is launching yet another hackathon to keep the data science and machine learning community occupied during the quarantine period amid the Covid-19 outbreak. To help the community use this time to expand their knowledge, MachineHack and Analytics India Magazine bring you the Classifying Movie Scripts: Predict The Movie Genre Hackathon.

Problem Statement & Description

Given the entire script of a movie, can your ML model classify it into the right genre?

Labelling text data can be hard. Using the available information to automatically create or predict labels is an interesting machine learning task. With Natural Language Processing (NLP), unstructured text data can be leveraged to automatically assign the right classes to new, unseen data.

To accomplish this, we have scraped close to 2000 movie scripts and the respective genres.

Because some of the scripts are very long, it is worth exploring new approaches to feature extraction and different NLP techniques.
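One common way to handle very long scripts is to split each script into overlapping windows, score each window separately (for example with a fixed-length model such as BERT, which accepts at most 512 subword tokens per input), and average the window-level genre probabilities into one prediction per script. The sketch below is an illustration under stated assumptions, not the hackathon's prescribed approach: `chunk_words` and `average_probs` are hypothetical helpers, tokens are split on whitespace, and the default window sizes are arbitrary.

```python
from typing import List


def chunk_words(text: str, window: int = 200, stride: int = 150) -> List[str]:
    """Split text into overlapping windows of at most `window` whitespace tokens.

    Consecutive windows start `stride` tokens apart, so they overlap by
    `window - stride` tokens. Whitespace tokens only approximate the subword
    tokens a transformer counts, so the (arbitrary) defaults leave headroom.
    """
    if window <= 0 or not 0 < stride <= window:
        raise ValueError("need window > 0 and 0 < stride <= window")
    words = text.split()
    if not words:
        return []
    chunks = []
    start = 0
    while True:
        chunks.append(" ".join(words[start:start + window]))
        # Stop once the current window reaches the end of the script.
        if start + window >= len(words):
            break
        start += stride
    return chunks


def average_probs(chunk_probs: List[List[float]]) -> List[float]:
    """Mean-pool per-chunk class probabilities into one script-level vector."""
    n = len(chunk_probs)
    return [sum(col) / n for col in zip(*chunk_probs)]
```

Mean pooling keeps the output a valid probability distribution (each row sums to 1, so their average does too); max pooling or a learned aggregator are alternatives worth comparing.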

In this hackathon, participants are challenged to use the movie scripts to design a natural language processing system that can help the customer classify new scripts into the right genre.

The current platform struggles to classify movies with an accuracy above 90%. However, we at MachineHack feel that state-of-the-art NLP models such as BERT and OpenAI's GPT have paved the way for more robust systems that can understand the context of the provided text.

Data Description

The participants will have access to the following files:

  • Train.csv – 1978 script file names with the class labels.
  • Test.csv – 849 script file names without the class labels.
  • Scripts – Folder with 2,827 script .txt files.
  • Sample Submission – Sample format for the submission.
  • Starter Notebook – A simple benchmark notebook.
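Since Train.csv and Test.csv list file names rather than the scripts themselves, the first step is to join each row to its text in the Scripts folder. A minimal sketch using only the standard library; the column names `File_Name` and `Labels` are hypothetical placeholders (check the actual CSV headers), and it assumes the CSV stores each file name exactly as it appears in the folder.

```python
import csv
from pathlib import Path
from typing import List, Optional, Tuple

# ASSUMPTION: hypothetical column names; check the header row of
# Train.csv / Test.csv and change these to match.
FILE_COL = "File_Name"
LABEL_COL = "Labels"


def load_split(csv_path, scripts_dir,
               label_col: Optional[str] = LABEL_COL
               ) -> Tuple[List[str], Optional[List[str]]]:
    """Read one split and return (script_texts, labels).

    Pass label_col=None for Test.csv, which has no labels; labels is then None.
    """
    texts: List[str] = []
    labels: List[str] = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # ASSUMPTION: the CSV stores the script's file name exactly as it
            # appears in the Scripts folder (including the .txt extension).
            path = Path(scripts_dir) / row[FILE_COL]
            # errors="replace" keeps stray non-UTF-8 bytes in scraped text
            # from aborting the whole load.
            texts.append(path.read_text(encoding="utf-8", errors="replace"))
            if label_col is not None:
                labels.append(row[label_col])
    return texts, (labels if label_col is not None else None)
```

Usage would be along the lines of `load_split("Train.csv", "Scripts")` for training data and `load_split("Test.csv", "Scripts", label_col=None)` for the test set.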

Refer to the starter notebook provided with the data; simply run it to generate a benchmark submission.
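For comparison with the benchmark, here is a minimal bag-of-words baseline that outputs a probability per genre, which is what a log-loss leaderboard needs. This is a swapped-in technique (multinomial Naive Bayes with Laplace smoothing, written in pure Python), not necessarily what the starter notebook does; the class name `NaiveBayesBaseline` and the lowercase whitespace tokenisation are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


class NaiveBayesBaseline:
    """Multinomial Naive Bayes over lowercase whitespace tokens (illustrative)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace/Lidstone smoothing strength

    def fit(self, texts: List[str], labels: List[str]) -> "NaiveBayesBaseline":
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        doc_counts = Counter(labels)
        n = len(labels)
        self.log_prior = {c: math.log(doc_counts[c] / n) for c in self.classes}
        return self

    def predict_proba(self, text: str) -> Dict[str, float]:
        # Tokens never seen in training are ignored.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            denom = self.totals[c] + self.alpha * v
            s = self.log_prior[c]
            for t in tokens:
                s += math.log((self.word_counts[c][t] + self.alpha) / denom)
            scores[c] = s
        # Softmax over log-scores, shifted by the max for numerical stability.
        m = max(scores.values())
        exps = {c: math.exp(s - m) for c, s in scores.items()}
        z = sum(exps.values())
        return {c: e / z for c, e in exps.items()}
```

On long documents Naive Bayes tends to produce near-0 or near-1 probabilities, and log loss punishes confident mistakes severely, so clipping or calibrating these outputs is usually worthwhile before submitting.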


The hackathon provides participants with an exclusive opportunity to win free passes to Cypher 2020

Top 3 competitors will receive a free pass to Cypher 2020.

Cypher is India’s largest Analytics & AI summit. In its sixth year, Cypher has emerged as the ideal platform to network and learn from leading industry experts, companies and startups in the fields of analytics, data science and artificial intelligence.

By bringing together transformative thinkers and like-minded innovators, Cypher provides a platform that challenges you to push yourself in data-driven processes while drawing inspiration from those thriving in the industry.


  1. There can only be one account per participant. Submissions from multiple accounts will lead to disqualification.
  2. The submission limit is three per day; any further submissions that day will not be evaluated.
  3. This hackathon closes on May 15 at 16:00 IST.
  4. All registered users are eligible to participate in the hackathon.
  5. This competition counts towards our overall ranking points.
  6. You will not be able to submit once you click the “Complete Hackathon” button. You may ignore this feature.
  7. We ask that you respect the spirit of the competition and do not cheat.


Submissions are ranked on the leaderboard using multi-class log loss (cross-entropy loss).
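Multi-class log loss averages, over the N scored scripts, the negative log of the probability a submission assigns to each script's true genre: logloss = -(1/N) Σ_i log p(i, y_i), where y_i is the true genre of script i. Lower is better. A sketch for checking a validation split locally follows; the clipping constant `eps` and the row renormalisation are assumptions borrowed from common practice, since the exact leaderboard implementation is not given here.

```python
import math
from typing import Sequence


def multiclass_log_loss(y_true: Sequence[int],
                        probs: Sequence[Sequence[float]],
                        eps: float = 1e-15) -> float:
    """Mean cross-entropy: -(1/N) * sum_i log probs[i][y_true[i]].

    ASSUMPTION: probabilities are clipped to [eps, 1 - eps] and each row is
    renormalised to sum to 1, a common convention for this metric.
    """
    if not y_true or len(y_true) != len(probs):
        raise ValueError("y_true and probs must be non-empty and equal length")
    total = 0.0
    for label, row in zip(y_true, probs):
        clipped = [min(max(p, eps), 1 - eps) for p in row]
        z = sum(clipped)
        total += -math.log(clipped[label] / z)
    return total / len(y_true)
```

The clipping shows why hedged predictions pay off: putting probability 0 on the true genre costs about -ln(1e-15) ≈ 34.5 for that script, whereas a uniform guess over two genres costs only ln 2 ≈ 0.69.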

Amal Nair
A Computer Science Engineer turned Data Scientist who is passionate about AI and all related technologies.