Weekend Hackathons are becoming more competitive, so we are back with a tougher one this time. This is another exciting weekend hackathon to flex your machine learning classification skills: build an anomaly detection model that separates the good and anomalous products of one of India’s leading wafer manufacturers into 2 classes.
Detecting anomalies can be difficult, especially with labeled datasets, because some human bias is introduced when the final product is labeled as anomalous or good. These giant manufacturing systems need to be monitored every 10 milliseconds to capture their behavior, which produces a large volume of data; this is what we call the Industrial IoT (IIoT). Also, hardly any manufacturer wants to create an anomalous product, so anomalies are like a needle in a haystack: the resulting dataset is significantly imbalanced and has very few rows.
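Because anomalies are rare, a classifier trained on the raw labels can look good simply by predicting "good" almost everywhere. One common counterweight (not something the organizers prescribe) is to weight each class inversely to its frequency. The following is a minimal sketch, assuming a 0/1 label array like the Class column; the helper name `balanced_class_weights` and the 95/5 toy split are hypothetical:

```python
import numpy as np

def balanced_class_weights(y):
    """Hypothetical helper: weight each class by n_samples / (n_classes * class_count).

    Rare classes get proportionally larger weights, so both classes contribute
    the same total weight to the training loss.
    """
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Made-up toy labels: 95 good (0) and 5 anomalous (1) products.
y_toy = np.array([0] * 95 + [1] * 5)
print(balanced_class_weights(y_toy))  # class 1 -> 10.0, class 0 -> about 0.526
```

The returned dictionary matches the form that many scikit-learn estimators accept through their `class_weight` parameter, and scikit-learn's built-in `class_weight="balanced"` option applies the same formula.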
The challenge will start on 28th Aug Friday at 6 pm IST.
Problem Statement & Description
Modeling such a dataset with machine learning and making the model generalize can be fun. In this competition, we bring such a use case from one of India’s leading manufacturers of wafers (semiconductors). The dataset has been anonymized to hide the feature names, and its 1558 features would require serious domain knowledge to understand.
However, in the era of deep learning, we are challenging the data science community to come up with an anomaly detection model that generalizes well to unseen data (the test data). In this hackathon, you will build a machine learning or deep learning model to classify the anomalies correctly, evaluated with the Area Under the ROC Curve (AUC) as the metric.
This dataset also offers huge scope for feature engineering and dimensionality reduction, and we are looking forward to some serious competition this time.
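As one illustration of that scope, the sketch below chains a simple feature-selection step (dropping constant columns with scikit-learn's `VarianceThreshold`) with PCA-based dimensionality reduction. It runs on random stand-in data with the competition's 1558-column width; the 95% variance target and the ten artificially constant columns are arbitrary choices for the demo, not recommendations:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Random stand-in data with the competition's width (1558 features) but
# fewer rows than Train.csv; the values carry no real signal.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 1558))
X_demo[:, :10] = 0.0  # make 10 columns constant so the selector has work to do

reducer = make_pipeline(
    VarianceThreshold(threshold=0.0),           # feature selection: drop zero-variance columns
    StandardScaler(),                           # scale remaining features to unit variance
    PCA(n_components=0.95, svd_solver="full"),  # keep components explaining 95% of variance
)
X_reduced = reducer.fit_transform(X_demo)

kept = reducer.named_steps["variancethreshold"].get_support().sum()
print("after selection:", kept, "features; after PCA:", X_reduced.shape[1], "components")
```

On the real data, fit the pipeline on the Train.csv features only and call `reducer.transform` on Test.csv, so that nothing from the test set leaks into the fitted components. On pure noise, as here, PCA has to keep most components; correlated sensor readings may compress much further.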
You are given 1558 features that can help predict the correct class of a product. Your objective as a data scientist is to build a machine learning model that classifies both good and anomalous products as accurately as possible.
The unzipped folder will have the following files.
- Train.csv – 1763 rows x 1559 columns (includes Class as target column)
- Test.csv – 756 rows x 1558 columns
- Sample Submission.csv – sample format for submission file.
Data Description
- Feature_1 – Feature_1558 – Represents the various attributes that were collected from the manufacturing machine
- Class – (0 or 1) – Represents the Good/Anomalous class label of each product
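A minimal loading sketch with pandas, assuming Train.csv and Test.csv sit in the working directory and that the columns are exactly Feature_1 to Feature_1558 plus Class, as described above. The function name `load_competition_data` is hypothetical:

```python
import pandas as pd

def load_competition_data(train_path="Train.csv", test_path="Test.csv"):
    """Hypothetical helper: load train/test and split off the Class target.

    The default paths assume the unzipped files are in the working directory.
    """
    train = pd.read_csv(train_path)
    test = pd.read_csv(test_path)

    y = train["Class"]                 # 0/1 target column
    X = train.drop(columns=["Class"])  # Feature_1 ... Feature_1558

    # Sanity check: train and test should share exactly the same feature columns.
    mismatched = set(X.columns) ^ set(test.columns)
    if mismatched:
        raise ValueError(f"Column mismatch between train and test: {sorted(mismatched)}")
    # Reorder the test columns to match the training order.
    return X, y, test[X.columns]
```

If the files match the description above, `X` should be 1763 rows x 1558 columns, `y` should have 1763 entries, and the returned test frame should be 756 rows x 1558 columns.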
Skills
- High Dimensionality Data, Overfitting-vs-Underfitting
- Advanced Classification Techniques, Gradient Boosting, Neural Nets, etc.
- Feature Engineering, Feature Selection Techniques
- Optimizing Area Under the Curve (AUC) to generalize well on unseen data
The datasets will be made available for download on Aug 28th, Friday at 6 pm IST.
This hackathon and the bounty will expire on Aug 31st, Monday at 7 am IST.
We have introduced a new set of prizes going forward.
- Participants who finish in the Top-3 on the private leaderboard in 3 consecutive Weekend Hackathons will be interviewed for #HackeroftheMonth.
- Stand a chance to get an exclusive interview by Analytics India Magazine about your Data Science/Machine Learning journey
Who is the #hackerofthemonth?
Any participant can become #hackerofthemonth by proving their mettle on the weekend hackathon leaderboards. We will award the #hackerofthemonth community recognition to participants who finish in the Top-3 in 3 consecutive weekend hackathons. Yes, you got it right, it’s a hat-trick!
Stand a chance to get interviewed by the biggest AI/ML media house in the country about your Data Science and Machine Learning journey.
Please note that this prize applies only to the Weekend Hackathon series of competitions.
- One account per participant. Submissions from multiple accounts will lead to disqualification
- The submission limit for the hackathon is 10 per day; submissions beyond this limit will not be evaluated
- All registered participants are eligible to compete in the hackathon
- This competition counts towards your overall ranking points
- We ask that you respect the spirit of the competition and do not cheat
- This hackathon will expire on 31st August, Monday at 7 am IST
- Use of any external dataset is prohibited and doing so will lead to disqualification
- The submissions will be evaluated using the ROC-AUC (Receiver Operating Characteristic – Area Under the Curve) metric. One can use roc_auc_score(actual, predicted) to compute it
- This hackathon supports private and public leaderboards
- The public leaderboard is evaluated on 30% of Test data
- The private leaderboard will be made available at the end of the hackathon which will be evaluated on 100% Test data
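The metric and leaderboard rules above can be illustrated with a short sketch. It assumes `roc_auc_score` refers to scikit-learn's `sklearn.metrics.roc_auc_score(y_true, y_score)` and that class 1 is the positive class. The first part shows that ROC-AUC depends only on how the predictions rank the rows, so thresholded 0/1 predictions can discard information; the hypothetical helper `pairwise_auc` recomputes the metric from its pairwise-ranking definition. The second part simulates why a score on 30% of the test data is noisier than one on 100%. The organizers have not described how the public subset is chosen, so the uniform random split, the 10% anomaly rate and the scores are all made up:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pairwise_auc(y_true, y_score):
    """Hypothetical helper: ROC-AUC from its ranking definition, i.e. the share
    of (positive, negative) pairs where the positive scores higher, ties
    counting half. Quadratic in the row count, so only for small sanity checks."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 1) Scores versus hard labels (made-up toy numbers).
actual = [0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4]                         # both positives ranked on top
hard_labels = [1 if s >= 0.5 else 0 for s in scores]  # all 0 at a 0.5 threshold

print(roc_auc_score(actual, scores))       # 1.0: perfect ranking
print(roc_auc_score(actual, hard_labels))  # 0.5: thresholding erased the ranking
print(pairwise_auc(actual, scores))        # 1.0, matching scikit-learn

# 2) Public (30%) versus private (100%) scoring on simulated data.
# Hypothetical: 756 rows as in Test.csv, ~10% anomalies, a noisy but
# informative score, and a public subset drawn uniformly at random.
rng = np.random.default_rng(42)
n_test = 756
y_sim = (rng.random(n_test) < 0.1).astype(int)
s_sim = y_sim + rng.normal(scale=1.0, size=n_test)
public_idx = rng.choice(n_test, size=int(0.3 * n_test), replace=False)

public_auc = roc_auc_score(y_sim[public_idx], s_sim[public_idx])
private_auc = roc_auc_score(y_sim, s_sim)
print(f"public (30%): {public_auc:.3f}  private (100%): {private_auc:.3f}")
```

Rerunning part 2 with different seeds shows the public AUC scattering around the full-data AUC, which is one reason to weigh a solid cross-validation score above small public-leaderboard gains. Whether a submission should contain probabilities or labels is defined by the Sample Submission.csv format, not by this sketch.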