MachineHack, in association with Analytics India Magazine, is back with another hackathon: the Merchandise Popularity Prediction Challenge.
Understanding the popularity of merchandise to gain insight into consumer market dynamics is crucial for eCommerce business intelligence. Big brands spend a significant amount of time and money on popularising a product. Nevertheless, these efforts can go in vain when establishing the merchandise in a hyperlocal market: depending on geographical conditions, the same attributes can convey very different information about the customer. Hence, such insights are a must for any brand owner who wants to understand consumer economics.
In this hackathon, MachineHack has gathered data from one of India’s top apparel brands, with merchandise details like category, score, and presence in the store, and challenged the participants to develop a machine learning model that can predict the popularity level of the products.
The challenge will start on 22nd January, Friday at 6 PM IST.
The fortnight-long hackathon challenges the community to build a machine learning model that forecasts the popularity of merchandise for an eCommerce company. The popularity class indicates how popular a product is, given attributes that a store owner can control.
The training dataset has 18208 rows with 12 columns, including the ‘popularity’ column as the target variable, while the test dataset has 12140 rows with 11 columns. The attributes are store_ratio, basket_ratio, category_1, store_score, category_2, store_presence, score_1, score_2, score_3, score_4 and time, with the class of popularity in the target column. The hackathon also calls for a few prerequisite skills, such as multi-class classification modelling, advanced feature engineering, and the ability to optimise the multi-class log loss score as a metric to generalise well on unseen data.
Submissions will be evaluated using the ‘Log Loss’ metric, which participants can compute with ‘log_loss(y_true, y_pred)’. The hackathon will have both public and private leaderboards: the public leaderboard will be evaluated on 70% of the test data, while the private leaderboard will be evaluated on 100% of the test data and will be available at the end of the hackathon. However, the final score will be calculated based on the ‘best score’ on the public leaderboard.
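To make the metric concrete, here is a minimal sketch of multi-class log loss written with the Python standard library only. It is not the organisers' scoring code. It assumes the class labels are the integers 0 to 4 (matching the five submission columns) and that each row of predicted probabilities already sums to 1. The function name `multiclass_log_loss` and the toy data are hypothetical, chosen only for illustration.

```python
import math


def multiclass_log_loss(y_true, y_pred, eps=1e-15):
    """Average negative log of the probability given to the true class.

    y_true: list of integer class labels (assumed here to be 0..4).
    y_pred: list of rows, one probability per class, each row summing to 1.
    """
    total = 0.0
    for label, probs in zip(y_true, y_pred):
        # Clip the probability so that log(0) can never occur.
        p = min(max(probs[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(y_true)


# Toy data (made up for illustration): three products, five classes.
y_true = [0, 2, 4]
y_pred = [
    [0.70, 0.10, 0.10, 0.05, 0.05],  # confident and correct
    [0.20, 0.20, 0.20, 0.20, 0.20],  # uniform guess
    [0.10, 0.10, 0.10, 0.10, 0.60],  # fairly confident and correct
]

score = multiclass_log_loss(y_true, y_pred)
print(f"log loss: {score:.4f}")
```

A lower score is better. A model that always predicts a uniform 0.2 for each of the five classes scores ln(5), roughly 1.609, which is a useful baseline to beat.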
To generate a valid submission file, participants must submit a .csv/.xlsx file with exactly 12140 rows and five columns (i.e. 0, 1, 2, 3, 4). The submission will return an ‘invalid score’ if it contains any extra columns or rows. For the metric, ‘y_true’ holds the class labels, while ‘y_pred’ holds the predicted probabilities per class, as produced by a model’s predict_proba() method. The submission limit for this hackathon is three per day, after which further submissions will not be evaluated.
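The submission rules above can be sketched as a small writer that checks the shape before producing the file. This is an illustration under assumptions, not the official format specification: it assumes the header row is literally `0,1,2,3,4` and that each row holds the five class probabilities for one test product, in test-set order. The helper name `write_submission` is hypothetical, and the uniform probabilities stand in for real `predict_proba()` output.

```python
import csv
import io

EXPECTED_ROWS = 12140                      # test-set size stated in the article
CLASS_COLUMNS = ["0", "1", "2", "3", "4"]  # assumed header, one column per class


def write_submission(prob_rows, out, expected_rows=EXPECTED_ROWS):
    """Validate the shape of prob_rows, then write them as CSV to `out`."""
    if len(prob_rows) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(prob_rows)}")
    for i, row in enumerate(prob_rows):
        if len(row) != len(CLASS_COLUMNS):
            raise ValueError(f"row {i} has {len(row)} columns, expected 5")

    writer = csv.writer(out)
    writer.writerow(CLASS_COLUMNS)
    writer.writerows(prob_rows)


# Placeholder predictions: a uniform guess for every test row.
uniform_rows = [[0.2] * 5 for _ in range(EXPECTED_ROWS)]

buffer = io.StringIO()  # swap for open("submission.csv", "w", newline="")
write_submission(uniform_rows, buffer)
lines = buffer.getvalue().splitlines()
print(len(lines), "lines written, header:", lines[0])
```

Validating before writing means a malformed prediction array fails locally rather than using up one of the three daily submissions on an ‘invalid score’.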
With this challenge, data scientists and the machine learning community will have the opportunity to get firsthand experience in creating a machine learning model and gain exposure to solving use cases at the organisational level. The top three winners will get free passes to this year’s Machine Learning Developer’s Summit (MLDS), India’s leading conference exclusively designed for machine learning practitioners.
The hackathon will end on 8th February, Monday at 7 AM IST.
- Train.csv – 18208 rows x 12 columns (Includes popularity Column as Target variable)
- Test.csv – 12140 rows x 11 columns
- Sample Submission.csv – Please check the Evaluation section for more details on how to generate a valid submission
- popularity – Class of popularity (Target Column)
- Multi-class Classification Modeling
- Advanced Feature Engineering
- Optimising Multi-Class log loss score as a metric to generalise well on unseen data