Detecting Anomalies in Wafer Manufacturing: Weekend Hackathon #18

Weekend Hackathons are becoming more competitive, so we are back with a tougher one this time. This weekend's hackathon lets you flex your machine learning classification skills by building an anomaly detection model that separates good and anomalous products into 2 classes for one of India’s leading wafer manufacturers.

Detecting anomalies can be a difficult task, especially with labeled datasets, because some human bias is introduced when the final product is labeled as anomalous or good. These giant manufacturing systems need to be monitored every 10 milliseconds to capture their behavior, which produces a large amount of information; this is what we call the Industrial IoT (IIoT). Also, hardly any manufacturer wants to create an anomalous product. Hence, the anomalies are like a needle in a haystack, which leaves a dataset that is significantly imbalanced and has very few rows.
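One common way to compensate for this kind of imbalance is to weight each class inversely to its frequency. The snippet below is a minimal sketch of that heuristic, not a prescribed approach for this hackathon. The function name `balanced_class_weights` and the toy labels are hypothetical, and it assumes 0 marks a good product and 1 an anomalous one.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight}, with weight = n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so their errors count for more
    when the weights are passed to a model that supports them.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical toy labels (assumption: 0 = good, 1 = anomalous).
# This is NOT the real class ratio of the competition data.
toy_labels = [0] * 9 + [1] * 1
weights = balanced_class_weights(toy_labels)
print(weights)  # the rare class (1) gets weight 5.0
```

Many classifiers accept per-class weights like these; whether they help here is something to verify by validation rather than assume.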

The challenge will start on 28th Aug Friday at 6 pm IST.

Problem Statement & Description

Modeling such a dataset with machine learning and making the model generalize can be fun. In this competition, we bring such a use case from one of India’s leading manufacturers of wafers (semiconductors). The collected dataset was anonymized to hide the feature names. It also has 1558 features, which would require some serious domain knowledge to understand.

However, in the era of deep learning, we are challenging the data science community to come up with an anomaly detection model that generalizes well to the unseen data (test data). In this hackathon, you will create a machine learning/deep learning model to classify the anomalies correctly, using area under the curve (AUC) as the metric.
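To make the metric concrete, here is a minimal sketch that computes AUC from scratch, assuming the metric meant is the area under the ROC curve and that label 1 is the anomalous (positive) class. It uses the pairwise interpretation: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name `roc_auc` and the toy scores are illustrative only.

```python
def roc_auc(y_true, y_score):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, y_score))  # 0.75
```

Because AUC depends only on how the scores rank the examples, submitting predicted probabilities rather than hard 0/1 labels usually gives a more informative score. Check the sample submission file for the format the organizers actually expect.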

This dataset also provides huge scope for feature engineering and dimensionality reduction, and we are looking forward to some serious competition this time.
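As one simple illustration of dimensionality reduction, the sketch below drops features whose variance does not exceed a threshold, since a constant column cannot help separate the classes. It is only one of many possible techniques, and the helper name `drop_low_variance`, the threshold, and the toy matrix are assumptions made for the example.

```python
from statistics import pvariance


def drop_low_variance(rows, threshold=0.0):
    """Keep only the columns whose population variance exceeds `threshold`.

    rows: list of equal-length lists of numbers (one list per sample).
    Returns (kept_column_indices, filtered_rows).
    """
    columns = list(zip(*rows))  # transpose: one tuple per feature
    kept = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    filtered = [[row[i] for i in kept] for row in rows]
    return kept, filtered


# Hypothetical toy matrix: the first feature is constant and gets dropped.
toy = [
    [1.0, 0.0, 3.0],
    [1.0, 1.0, 2.0],
    [1.0, 0.0, 5.0],
]
kept, reduced = drop_low_variance(toy)
print(kept)     # [1, 2]
print(reduced)  # [[0.0, 3.0], [1.0, 2.0], [0.0, 5.0]]
```

The same column indices should be applied to both the training and the test data so that the two stay aligned.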

Given are 1558 distinguishing factors that can predict the right class of a product. Your objective as a data scientist is to build a machine learning model that classifies both good and anomalous products as accurately as possible.

Dataset Description:

The unzipped folder will have the following files.

  • Train.csv – 1763 rows x 1559 columns (includes Class as target column)
  • Test.csv – 756 rows x 1558 columns
  • Sample Submission.csv – sample format for submission file.

Attribute Description:

  • Feature_1 – Feature_1558 – Represents the various attributes that were collected from the manufacturing machine
  • Class – (0 or 1) – Represents Good/Anomalous class labels for the products
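Before modeling, it can be worth confirming that the downloaded files match the shapes listed above. The following is a small sketch using only the standard library. The helper `csv_shape` is hypothetical, it assumes each file has a single header row, and the in-memory CSV is a toy stand-in rather than real competition data.

```python
import csv
import io


def csv_shape(file_obj):
    """Return (n_data_rows, n_columns, header) for a CSV with one header line."""
    reader = csv.reader(file_obj)
    header = next(reader)
    n_rows = sum(1 for row in reader if row)  # skip completely blank lines
    return n_rows, len(header), header


# Toy stand-in for Train.csv (hypothetical values).
toy_csv = io.StringIO("Feature_1,Feature_2,Class\n0.1,3,0\n0.7,1,1\n")
shape = csv_shape(toy_csv)
print(shape[:2])  # (2, 3)
```

With the real data, opening the file with `open("Train.csv", newline="")` and passing it to the helper should report 1763 rows and 1559 columns if the download matches the description above.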


  • High Dimensionality Data, Overfitting-vs-Underfitting
  • Advanced Classification Techniques, Gradient Boosting, Neural Nets, etc
  • Feature engineering, Feature Selection Techniques
  • Optimizing area under the curve (AUC) to generalize well on unseen data
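With few rows and rare anomalies, a validation scheme that keeps the class ratio similar in every fold gives a steadier AUC estimate than a plain random split. The sketch below shows one hypothetical way to build stratified folds by hand; the function name, fold count, and seed are all assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Split sample indices into n_folds lists with similar class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


# Toy labels: 20 good (0) and 5 anomalous (1) samples.
toy_labels = [0] * 20 + [1] * 5
for fold in stratified_folds(toy_labels, n_folds=5):
    n_anomalies = sum(toy_labels[i] for i in fold)
    print(len(fold), n_anomalies)  # each fold: 5 samples, 1 anomaly
```

Each fold can then serve once as the validation set while the remaining folds are used for training, and the per-fold AUC values can be averaged.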

The datasets will be made available for download on Aug 28th, Friday at 6 pm IST.

This hackathon and the bounty will expire on Aug 31st, Monday at 7 am IST.


We have introduced a new set of prizes going forward.

  • Participants who finish in the Top-3 on the private leaderboard in 3 consecutive Weekend Hackathons will be interviewed for #HackeroftheMonth.
  • Stand a chance to get an exclusive interview about your Data Science/Machine Learning journey with Analytics India Magazine

Who is the #hackerofthemonth?

Any participant can become #hackerofthemonth by proving their mettle on the weekend hackathon leaderboards. We will award the #hackerofthemonth community recognition to participants who finish in the Top-3 in 3 consecutive weekend hackathons. Yes, you got it right, it’s a hat-trick!

Stand a chance to get interviewed by the biggest AI/ML media house in the country about your Data Science and Machine Learning journey.

Please note this PRIZE is only for the Weekend Hackathon series of competitions.


  1. One account per participant. Submissions from multiple accounts will lead to disqualification
  2. The submission limit for the hackathon is 10 per day after which the submission will not be evaluated
  3. All registered participants are eligible to compete in the hackathon
  4. This competition counts towards your overall ranking points
  5. We ask that you respect the spirit of the competition and do not cheat
  6. This hackathon will expire on 31st August, Monday at 7 am IST
  7. Use of any external dataset is prohibited and doing so will lead to disqualification


Anurag Upadhyaya
Experienced Data Scientist with a demonstrated history of working in Industrial IOT (IIOT), Industry 4.0, Power Systems and Manufacturing domain. I have experience in designing robust solutions for various clients using Machine Learning, Artificial Intelligence, and Deep Learning. I have been instrumental in developing end to end solutions from scratch and deploying them independently at scale.