Another weekend, another new hackathon for data scientists!
This time, MachineHack is challenging the data science community to predict the weekly working hours needed to earn a desired salary.
This challenge is part of the new MachineHack Fortnight Hackathon Series, in which we pose unique problem statements every week to test your data science skills.
Problem Statement & Description
With the rise in different career opportunities, it has become challenging for many of us to balance work and personal life. To grow in this competitive world, we not only learn new things but also upskill ourselves to stay relevant. In fact, the wages we earn over our careers are linked to the hours we work and the skills we build over time. The working hours needed to earn the same income may also vary from one location to another.
In this hackathon, the MachineHack community needs to build an ML model that predicts the weekly working hours required at different locations to earn a salary in the desired range, using attributes such as work class, education, marital status, occupation, capital-gain, capital-loss, etc.
The hackathon will start on 17th Sept 2021 at 6:00 PM (IST).
MachineHack has created a training dataset of 18944 rows and 15 columns, with “hours-per-week” as the target variable. The test dataset consists of 8119 rows and 14 columns.
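Before modelling, it can be worth confirming that the downloaded files match these dimensions. The sketch below uses only Python's standard `csv` module and assumes each file is comma-separated with a single header row; the helper names `csv_shape` and `check_shape` are hypothetical, not part of any MachineHack tooling.

```python
import csv

# Dimensions stated in the problem description: (data rows, columns)
EXPECTED_SHAPES = {
    "Train.csv": (18944, 15),
    "Test.csv": (8119, 14),
}


def csv_shape(path):
    """Return (data_rows, columns) for a CSV file whose first line is a header."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)            # assumed: exactly one header row
        n_rows = sum(1 for _ in reader)  # count the remaining data rows
    return n_rows, len(header)


def check_shape(path, expected):
    """Raise ValueError if the file's shape differs from the expected one."""
    actual = csv_shape(path)
    if actual != expected:
        raise ValueError(f"{path}: expected {expected} (rows, cols), got {actual}")
    return actual
```

For example, `check_shape("Train.csv", EXPECTED_SHAPES["Train.csv"])` should pass silently if the file has the advertised 18944 rows and 15 columns.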
The prerequisite skills for this hackathon include optimising RMSE, forecasting, and knowledge of time series and machine learning approaches.
Participants must submit a .csv or .xlsx file containing exactly 8119 rows and a single column with the heading “hours-per-week”. Any extra columns or rows will result in an ‘Invalid Score’.
Scikit-learn models provide a predict() method for generating the predicted values.
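Since extra or missing rows invalidate a submission, it can help to validate the predictions before writing the file. The following is a minimal sketch using the standard `csv` module; `write_submission` and the default file name `submission.csv` are hypothetical choices, and the function assumes it receives one numeric prediction per test row, in test-set order.

```python
import csv

EXPECTED_ROWS = 8119          # number of rows in Test.csv, per the rules
TARGET = "hours-per-week"     # required column heading


def write_submission(predictions, path="submission.csv"):
    """Write one prediction per row under a single 'hours-per-week' header.

    Raises ValueError if the number of predictions differs from EXPECTED_ROWS,
    since extra or missing rows produce an 'Invalid Score'.
    """
    preds = [float(p) for p in predictions]  # also accepts NumPy arrays
    if len(preds) != EXPECTED_ROWS:
        raise ValueError(f"expected {EXPECTED_ROWS} predictions, got {len(preds)}")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([TARGET])            # the single required column heading
        for p in preds:
            writer.writerow([p])
    return path
```

With a fitted scikit-learn estimator, usage might look like `write_submission(model.predict(X_test))`, where `model` and `X_test` are placeholders for your own estimator and test features. If you already use pandas, `pd.DataFrame({"hours-per-week": preds}).to_csv("submission.csv", index=False)` produces an equivalent file, though without the row-count check.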
The submission limit for this hackathon is three per day, with one account per participant.
Submissions will be evaluated using the Root Mean Squared Error (RMSE) metric. It can be calculated by taking the square root of the mean squared error, for example with ‘np.sqrt(mean_squared_error(y_true, y_pred))’.
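To check a model locally against this metric, RMSE can be computed directly with NumPy. This is a minimal sketch for a local validation split, not the platform's scoring code; the `rmse` helper is a hypothetical name, and it should agree with `np.sqrt(sklearn.metrics.mean_squared_error(y_true, y_pred))`.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root Mean Squared Error: sqrt(mean((y_true - y_pred) ** 2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Errors are 2, 0 and -4, so MSE = (4 + 0 + 16) / 3 and RMSE = sqrt(20/3) ≈ 2.582
print(rmse([40, 50, 30], [38, 50, 34]))
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones, which is worth keeping in mind when choosing a loss to optimise.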
The hackathon will also support private and public leaderboards. While the public leaderboard will be evaluated on 30% of the test data, the private leaderboard will be assessed on 100% of the test data and made available at the end of the hackathon.
The final score will be based on the ‘Best Score’ on the public leaderboard.
The hackathon will end on 1st Oct 2021 at 6:00 PM (IST).
Dataset description:
- Train.csv: 18944 rows x 15 columns (target column: “hours-per-week”)
- Test.csv: 8119 rows x 14 columns

Skills:
- Optimise RMSE
- Machine Learning Approach