
Classifying Movie Scripts: Predict The Movie Genre Hackathon


MachineHack is launching yet another hackathon to keep the data science and machine learning community occupied during the quarantine period amid the Covid-19 outbreak. To help the community use this time to expand their knowledge, MachineHack and Analytics India Magazine bring you the Classifying Movie Scripts: Predict The Movie Genre Hackathon.

Problem Statement & Description

Given the entire script of a movie, can your ML model classify it into the right genre?

Labelling text data can be hard. Using the available information to create or predict the labels automatically can be an interesting machine learning task. With Natural Language Processing (NLP), unstructured text data can be leveraged to generate the right classes for future test data automatically.

To accomplish this, we have scraped close to 2000 movie scripts and their respective genres.

As some of the scripts are huge, it would be interesting to explore new approaches to feature extraction and different NLP techniques.
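One possible way to handle very long scripts, offered only as a sketch: split each script into overlapping windows of words, run a model on each window, and average the per-window class probabilities into one prediction per script. This matters for models with a fixed input length, such as BERT. The function names `chunk_words` and `average_probabilities`, the window size, and the overlap are all illustrative assumptions, not part of the hackathon materials.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping windows of whitespace-separated words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def average_probabilities(chunk_probs: List[List[float]]) -> List[float]:
    """Average per-chunk class probabilities into one script-level prediction."""
    n = len(chunk_probs)
    return [sum(column) / n for column in zip(*chunk_probs)]
```

The overlap keeps a scene that straddles a window boundary visible in full to at least one window. Averaging is only one aggregation choice; taking the maximum or training a second model on the per-window outputs are alternatives.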

In this hackathon, participants are challenged to use the movie scripts to design a natural language processing system that can help the customer classify a script into the right genre in the future.

The current platform struggles to classify movies with an accuracy above 90%. However, we at MachineHack feel that current state-of-the-art NLP models such as BERT and OpenAI GPT have paved the way for more robust systems that can understand the context of the provided text.

Data Description

The participants will have access to the following files:

  • Train.csv – 1978 script file names with the class labels.
  • Test.csv – 849 script file names without the class labels.
  • Scripts – Folder with 2827 script .txt files.
  • Sample Submission – Sample format for the submission.
  • Starter Notebook – A simple benchmark notebook.
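The files listed above could be combined roughly as follows. This is a minimal sketch, not the starter notebook: the column names `File_Name` and `Labels` are hypothetical placeholders (the actual headers are not given here), and the code assumes each file name in the CSV matches a file inside the Scripts folder exactly.

```python
import csv
from pathlib import Path
from typing import List, Tuple


def load_labelled_scripts(
    csv_path: str,
    scripts_dir: str,
    file_col: str = "File_Name",  # hypothetical column name
    label_col: str = "Labels",    # hypothetical column name
) -> Tuple[List[str], List[str]]:
    """Return (texts, labels) for every row of a Train.csv-style file."""
    texts: List[str] = []
    labels: List[str] = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            script_path = Path(scripts_dir) / row[file_col]
            # Scraped scripts may contain odd bytes; replace them rather than fail.
            texts.append(script_path.read_text(encoding="utf-8", errors="replace"))
            labels.append(row[label_col])
    return texts, labels
```

A call such as `load_labelled_scripts("Train.csv", "Scripts")` would then return the script texts alongside their genre labels, once the column names are adjusted to match the real file.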

Data Preview

Train.csv

Test.csv

Movie_Scripts_Sample_Submission.xlsx

Refer to the starter notebook below; just run it to generate a benchmark submission.

Bounties

The hackathon provides participants with an exclusive opportunity to win free passes to Cypher 2020.

Top 3 competitors will receive a free pass to Cypher 2020.

Cypher is India’s largest Analytics & AI summit. In its sixth year, Cypher has emerged as the ideal platform to network and learn from leading industry experts, companies and startups in the fields of analytics, data science and artificial intelligence.

Learning from transformative thinkers and connecting with like-minded innovators, Cypher provides a platform where you will be challenged to push yourself in data-driven processes while drawing inspiration from those thriving in the industry.

Rules

  1. There can only be one account per participant. Submissions from multiple accounts will lead to disqualification.
  2. The submission limit for the hackathon is three per day; any further submissions that day will not be evaluated.
  3. This hackathon will close on May 15 at 16:00 IST.
  4. All registered users are eligible to participate in the hackathon.
  5. This competition counts towards our overall ranking points.
  6. You will not be able to submit once you click the “Complete Hackathon” button. You may ignore this feature.
  7. We ask that you respect the spirit of the competition and do not cheat.

Evaluation

The leaderboard is scored using multi-class log loss (cross-entropy loss) on each participant's submission.
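For reference, multi-class log loss is the average, over all samples, of the negative log of the probability the model assigned to the true class. The sketch below shows that calculation in plain Python. The function name `multiclass_log_loss`, the row normalisation, and the clipping value `eps` are assumptions made for illustration; the exact settings used by the leaderboard are not stated.

```python
import math
from typing import Sequence


def multiclass_log_loss(
    y_true: Sequence[int],
    probs: Sequence[Sequence[float]],
    eps: float = 1e-15,  # assumed clipping value, avoids log(0)
) -> float:
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Normalise so each row sums to 1, then pick the true-class probability.
        p = row[label] / sum(row)
        # Clip away from 0 and 1 so the logarithm stays finite.
        p = min(max(p, eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)
```

Lower is better. A model that spreads its probability evenly over K classes scores log(K), and a confident wrong prediction is penalised heavily, so well-calibrated probabilities matter more than hard class labels.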


Amal Nair

A Computer Science Engineer turned Data Scientist who is passionate about AI and all related technologies. Contact: amal.nair@analyticsindiamag.com