In yet another Weekend Hackathon, MachineHack is back with an exciting challenge for the data scientists’ community. The challenge is to analyse the temporal nature of tea prices and forecast the weekly average price of tea.
This challenge is a part of MachineHack Weekend Hackathon Edition #2 — The Last Hacker Standing, where we pose unique problem statements every week, from 30 July to 9 Sept 2021.
PARTICIPATE & STAND A CHANCE TO WIN FREE PASSES TO THE DLDC 2021!!!
Problem Statement and Description
Although tea has its origins in China, we Indians, in particular, share a great bond with it. We all love our cup of “Chai”, don’t we? Whether it’s black, green, spiced or with cream/sugar, the unique tea culture is a phenomenon to celebrate. Like other essential commodities such as wheat, paddy and sugar, tea is part of the Wholesale Price Index under the Manufactured Products category. It is auctioned on a daily/weekly basis by a regulatory authority, and its price has a massive impact on the Indian economy.
So what is the story behind this ubiquitous beverage, or how does it make its way from the plantation to the umpteen packaged variants, which we are asked to choose from?
In this hackathon, we are challenging the MachineHack community to analyse the temporal nature of tea prices in the training dataset and forecast the weekly average tea price for the 29 weeks mentioned in the test set.
The hackathon will start on 13 Aug 2021 at 8:00 PM (IST).
Real-world data is not always legible or easy to understand. A lot of work goes into interpreting certain fields and making sense of the data through data wrangling, EDA, imputation and so on. For this challenge, we present unprocessed data for you to get a flavour of basic data engineering.
The participants need to create an ML model that can account for the temporal nature of tea prices in the training dataset and forecast the weekly average tea price for the 29 weeks mentioned in the test set.
MachineHack has created a training dataset of 544 rows with 15 columns: ‘WeekEnding_Date’; “Average Prices” across the auction places Kolkata, Bangalore, Cochin, Darjeeling, Ernakulam, Siliguri and Guwahati; and “Ref_Price” across the auction places Kolkata, Bangalore, Cochin, Darjeeling, Ernakulam, Siliguri and Guwahati. It also includes “Average” as the target variable. The dataset for testing, on the other hand, consists of 29 rows with 15 columns.
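As a rough illustration of the wrangling and imputation this kind of unprocessed weekly data tends to need, here is a minimal sketch on a tiny made-up table. Only ‘WeekEnding_Date’ and ‘Average’ are column names taken from the description; the column `Kolkata_Average_Price`, the date format, the values and the choice of forward-fill are all assumptions for the example, not the actual layout of Train.csv.

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-in for the training data (values are invented).
# "Kolkata_Average_Price" is a hypothetical column name.
df = pd.DataFrame({
    "WeekEnding_Date": ["2021-01-16", "2021-01-02", "2021-01-09", "2021-01-23"],
    "Kolkata_Average_Price": [152.0, 150.0, np.nan, 155.0],
    "Average": [151.0, 149.0, 150.0, 154.0],
})

# Parse the week-ending dates and put the rows in chronological order.
df["WeekEnding_Date"] = pd.to_datetime(df["WeekEnding_Date"])
df = df.sort_values("WeekEnding_Date").reset_index(drop=True)

# One simple imputation choice: carry the last known price forward.
df["Kolkata_Average_Price"] = df["Kolkata_Average_Price"].ffill()

# A lag feature: the previous week's target, usable as a model input.
df["Average_lag1"] = df["Average"].shift(1)

print(df)
```

Forward-fill is only one option; interpolation or a per-centre median may suit the real data better once the gaps have been inspected.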
The skills required to take part in the hackathon include forecasting, time series analysis and machine learning approaches.
The participants must submit a .csv/.xlsx file with exactly 29 rows and 1 column, ‘Average’ (the target). The submission will return an ‘Invalid Score’ if it contains extra columns or rows.
Scikit-learn models support the predict() method to generate the predicted values.
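A minimal sketch of producing a file in the required shape with predict() might look like the following. The data here is synthetic, the linear trend model is just a placeholder rather than a recommended approach, and the column header `Average` and file name `submission.csv` are assumptions based on the description above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the real data: 544 training weeks, 29 test weeks.
rng = np.random.default_rng(0)
n_train, n_test = 544, 29
week_index = np.arange(n_train + n_test).reshape(-1, 1)
y_train = 100 + 0.05 * week_index[:n_train, 0] + rng.normal(0, 2, n_train)

# Placeholder model: a straight-line trend over the week index.
model = LinearRegression()
model.fit(week_index[:n_train], y_train)

# predict() returns one value per test row.
preds = model.predict(week_index[n_train:])

# Exactly 29 rows and 1 column; the header name is an assumption.
submission = pd.DataFrame({"Average": preds})
assert submission.shape == (29, 1)

# index=False keeps the row index out of the file, avoiding an extra column.
submission.to_csv("submission.csv", index=False)
```

Leaving out `index=False` would write the row index as a second column, which is the kind of extra column the rules say will be rejected.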
The submission limit for this hackathon is one account per participant.
The evaluation of the hackathon will be done using the Root Mean Squared Error (RMSE) metric. One can use ‘np.sqrt(Mean Squared Error)’, that is, the square root of the mean squared error, to calculate it.
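For local validation, RMSE can be computed with NumPy alone. The sketch below uses a hypothetical helper named `rmse` and invented numbers; it is not the organisers' scoring code.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Square the errors, average them, then take the square root.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Invented weekly prices and forecasts, for illustration only.
actual = [100.0, 102.0, 98.0, 101.0]
forecast = [101.0, 100.0, 98.0, 104.0]

score = rmse(actual, forecast)
print(round(score, 4))  # squared errors 1, 4, 0, 9 -> mean 3.5 -> 1.8708
```

Because the errors are squared before averaging, a single badly missed week weighs more heavily on the score than several small misses.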
The hackathon will also support private and public leaderboards. While the public leaderboard will be evaluated on 30% of the test data, the private leaderboard will be made available at the end of the hackathon and will be assessed on 100% of the test data.
The final score will be based on the ‘Best Score’ on the public leaderboard.
The hackathon will end on 19 Aug 2021 at 6:00 PM (IST).
The top three winners will get free passes to the Deep Learning DevCon 2021 (DLDC), scheduled to be held on 23-24 Sept 2021. In addition, the winners will also get a chance to improve their Global Leaderboard Rankings and become the ultimate MachineHack Grand Master.
- Train.csv — 544 rows x 15 columns (includes ‘Average’ as a target variable)
- Test.csv — 29 rows x 15 columns
- Machine Learning Approach