MachineHack is back with a new hackathon that challenges the machine learning community to build a model that predicts the time a buyer will spend on an eCommerce platform.
The COVID-19 pandemic has massively changed consumer behaviour and buying patterns. Amid the pandemic, buyers spent a significant amount of time browsing eCommerce stores, which led to a massive surge in the number of users on eCommerce platforms across domains. Meanwhile, store owners are working to attract customers by using various algorithms that leverage customer behaviour patterns.
Tracking customer activity has turned out to be an effective way of understanding customer behaviour and analysing it to serve customers better. To facilitate this, machine learning and artificial intelligence have played a significant role in designing recommendation engines that attract customers by predicting their buying patterns.
In this hackathon, MachineHack is challenging the participants to develop a regression algorithm to predict the time a buyer will spend on an eCommerce platform.
The challenge will start on Friday, 18th December, at 6 PM IST.
Problem Statement & Description
In this hackathon, the participants' goal is to create a machine learning model that forecasts the time a buyer spends on an eCommerce platform. AI and ML technologies are believed to hold significant promise not only for predicting buyers’ time on a platform but also for customer segmentation and product recommendation.
The training dataset has 5429 rows and nine columns, including the time_spent column as the target variable. The test dataset has 2327 rows and eight columns. The attributes include session identifiers (session_id and session_number); client software and device details (client_agent and device_details); the date stamp of the session; binary values for any purchase made, cart activity and check-out activity; and the total time spent in seconds. The hackathon calls for a few prerequisite skills: regression modelling, advanced feature engineering, and the ability to optimise the RMSLE score.
Submissions will be evaluated using the RMSLE metric, which can be calculated with ‘np.sqrt(mean_squared_log_error(actual, predicted))’. The hackathon will also support public and private leaderboards: the public leaderboard is evaluated on 70% of the test data, while the private leaderboard will be made available at the end of the hackathon and is evaluated on 100% of the test data. The final scores will be based on the ‘best score’ on the public leaderboard.
With this hackathon, data scientists will get hands-on experience creating an ML model and exposure to solving use cases at the organisational level. The top three winners will also get passes to MLDS 2021, a conference designed exclusively for the machine learning practitioners ecosystem.
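To make the metric concrete, here is a minimal standard-library sketch of RMSLE. The function name `rmsle` is our own for illustration; for non-negative inputs it should agree with the scikit-learn expression quoted above, which also compares log(1 + value) terms.

```python
import math


def rmsle(actual, predicted):
    """Root mean squared logarithmic error for non-negative values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")
    if any(v < 0 for v in actual) or any(v < 0 for v in predicted):
        raise ValueError("RMSLE is undefined for negative values")
    # Squared difference of log(1 + value) for each pair of values.
    squared_log_errors = [
        (math.log1p(a) - math.log1p(p)) ** 2
        for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the hackathon data.
    print(rmsle([30.0, 600.0, 1200.0], [45.0, 500.0, 1500.0]))
```

Because the error is measured on a log scale, the metric penalises relative rather than absolute differences, so a 10-second miss on a 20-second session costs more than the same miss on a 2000-second session.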
To generate a valid submission file, participants must submit a .csv/.xlsx file with exactly 2327 rows and one column, time_spent. The submission will return an ‘Invalid Score’ if any extra columns or rows are present.
The hackathon will close on Monday, 4th January, at 7 AM IST.
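The submission rules can be checked locally before uploading. The sketch below uses only the standard library and rests on one assumption: the file has a single `time_spent` header line followed by 2327 prediction rows. The helper names `write_submission` and `is_valid_submission` are hypothetical and not part of any MachineHack tooling.

```python
import csv
import io

EXPECTED_ROWS = 2327          # row count stated in the submission rules
TARGET_COLUMN = "time_spent"  # the only column allowed


def write_submission(predictions, handle):
    """Write one prediction per row under a single time_spent header."""
    if len(predictions) != EXPECTED_ROWS:
        raise ValueError(
            f"expected {EXPECTED_ROWS} predictions, got {len(predictions)}"
        )
    writer = csv.writer(handle)
    writer.writerow([TARGET_COLUMN])
    for value in predictions:
        writer.writerow([value])


def is_valid_submission(handle):
    """Return True if the CSV has exactly one column and the expected rows."""
    rows = list(csv.reader(handle))
    # Assumption: the first line is the header and is not counted as a row.
    if not rows or rows[0] != [TARGET_COLUMN]:
        return False
    body = rows[1:]
    return len(body) == EXPECTED_ROWS and all(len(row) == 1 for row in body)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_submission([0.0] * EXPECTED_ROWS, buffer)  # placeholder predictions
    buffer.seek(0)
    print(is_valid_submission(buffer))
```

To produce a real file, pass a handle opened with `open("submission.csv", "w", newline="")` instead of the in-memory buffer.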
Dataset Description
- Train.json – 5429 rows x 9 columns (Includes time_spent Column as Target variable)
- Test.json – 2327 rows x 8 columns
- Sample Submission.csv – Please check the Evaluation section for more details on how to generate a valid submission
Attribute Description
- session_id – Unique identifier for every row
- session_number – Session type identifier
- client_agent – Client-side software details
- device_details – Client-side device details
- date – Datestamp of the session
- purchased – Binary value for any purchase done
- added_in_cart – Binary value for cart activity
- checked_out – Binary value for checking out successfully
- time_spent – Total time spent in seconds (Target Column)
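As an illustration of how these attributes might be turned into model inputs, the sketch below derives a few simple features from a single session record. The record layout, the `%Y-%m-%d` date format and the sample values are assumptions made for this example (the actual formats in Train.json may differ), and `session_features` is a hypothetical helper.

```python
from datetime import datetime


def session_features(record, date_format="%Y-%m-%d"):
    """Derive simple numeric features from one session record (a dict).

    Assumption: `date` is a string matching `date_format`; adjust the
    format to whatever the real data uses.
    """
    stamp = datetime.strptime(record["date"], date_format)
    agent = (record.get("client_agent") or "").lower()
    return {
        "day_of_week": stamp.weekday(),            # Monday = 0, Sunday = 6
        "is_weekend": int(stamp.weekday() >= 5),
        "month": stamp.month,
        "agent_is_mobile": int("mobile" in agent),  # crude text flag
        "agent_length": len(agent),
        "purchased": int(record["purchased"]),
        "added_in_cart": int(record["added_in_cart"]),
        "checked_out": int(record["checked_out"]),
    }


# Hypothetical record, invented for illustration only.
example = {
    "date": "2020-01-22",
    "client_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_3 like Mac OS X) Mobile/15E148",
    "purchased": 0,
    "added_in_cart": 1,
    "checked_out": 0,
}

if __name__ == "__main__":
    print(session_features(example))
```

Calendar fields and keyword flags like these are only a starting point; richer parsing of client_agent and device_details would depend on what the real strings look like.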
Skills
- Regression modelling.
- Advanced feature engineering, with date stamps and text data types.
- Optimising RMSLE score as a metric to generalise well on unseen data.
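One common way to optimise for RMSLE, offered here as a general technique rather than the organisers' recommended method, is to fit the model on log(1 + time_spent) and convert predictions back afterwards, because ordinary squared error in log space is exactly the squared log error. The sketch below shows the idea with the simplest possible "model", a constant prediction; all function names and sample values are illustrative.

```python
import math


def to_log_space(values):
    """Map non-negative targets to log(1 + value)."""
    return [math.log1p(v) for v in values]


def from_log_space(values):
    """Invert to_log_space, clipping at zero so predictions stay valid."""
    return [max(0.0, math.expm1(v)) for v in values]


def rmsle(actual, predicted):
    """RMSLE, computed as RMSE between the log-space values."""
    diffs = [
        (a - p) ** 2
        for a, p in zip(to_log_space(actual), to_log_space(predicted))
    ]
    return math.sqrt(sum(diffs) / len(diffs))


def best_constant_prediction(times):
    """Constant that minimises RMSLE: the mean taken in log space."""
    logs = to_log_space(times)
    return math.expm1(sum(logs) / len(logs))


if __name__ == "__main__":
    # Illustrative session lengths in seconds, not real data.
    times = [5.0, 20.0, 80.0, 1200.0]
    log_const = best_constant_prediction(times)
    mean_const = sum(times) / len(times)
    print("log-space constant:", rmsle(times, [log_const] * len(times)))
    print("arithmetic mean:   ", rmsle(times, [mean_const] * len(times)))
```

The same pattern carries over to any regressor: train on the transformed target with a squared-error loss, then apply `from_log_space` to the predictions before writing the submission.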