MachineHack, in association with Analytics India Magazine, has come up with yet another hackathon for the machine learning community — the Workation Price Prediction Challenge.
In light of the new normal, several websites have started offering packages for working from different locations. The concept of a workation, a portmanteau of work and vacation, is gaining currency. However, it is challenging to find a good place that offers all the amenities, including high-speed internet and a comfortable stay, within budget.
Thus, to solve the real-world problem of finding the best deals for workations, MachineHack is challenging the machine learning community to build a model for predicting the price per person for workation trips.
To facilitate this, MachineHack has collected workation packages in and around India, from Kashmir to Kanyakumari and from Gujarat to Assam. The data has more than 18,000 rows of different packages, with details such as start location, hotel type, cost per person, destination, itinerary, and more. Using this dataset, along with their knowledge of machine learning, deep learning, and model building, participants need to create a model that can efficiently and accurately predict the expense of a workation trip.
The challenge will start on 26th March, Friday at 6 PM IST.
Overview Of The Hackathon
The seventeen-day advanced hackathon challenges machine learning practitioners to develop a prediction model that can forecast the budget required for a workation trip.
The training dataset has 21,000 rows and 15 columns, with the per-person price column as the target variable. The test dataset has 9,000 rows and 14 columns and does not include the target variable. The attributes are a unique identifier per row sample, package name, package type, destination, complete itinerary, places covered, travel date, hotel details, start city, airline (flight details), flight stops, meals, sightseeing places covered, cancellation rules, and the price of the tour package per person. The hackathon also expects a few prerequisite skills, such as advanced regression modelling, feature engineering, and ensemble modelling.
Since the hackathon is evaluated using the Root Mean Squared Log Error (RMSLE) metric, participants must know how to optimise it so that their models generalise well on unseen data. The score can be calculated with np.sqrt(mean_squared_log_error(actual, predicted)), using NumPy and scikit-learn's sklearn.metrics.mean_squared_log_error. The hackathon has both a public and a private leaderboard. The public leaderboard is evaluated on 70% of the test data, while the private leaderboard is evaluated on 100% of the test data and will be made available at the end of the hackathon. The final score will be decided based on the score achieved on the public leaderboard.
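As a quick check on the metric, the sketch below computes RMSLE directly from its definition, sqrt(mean((log(1 + predicted) - log(1 + actual))^2)), and compares it with the scikit-learn expression quoted above. The prices are made-up values for illustration only, not taken from the challenge data.

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error


def rmsle(actual, predicted):
    """Root Mean Squared Log Error computed directly from its definition."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # log1p(x) = log(1 + x), so a price of zero does not hit log(0)
    return np.sqrt(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2))


# Hypothetical per-person prices, for illustration only
actual = np.array([12000.0, 25000.0, 8000.0])
predicted = np.array([11000.0, 27000.0, 8500.0])

manual = rmsle(actual, predicted)
via_sklearn = np.sqrt(mean_squared_log_error(actual, predicted))
print(manual, via_sklearn)  # the two values agree
```

Because the metric works on log(1 + price), it penalises relative rather than absolute errors: being off by 1,000 on a cheap package costs more than the same miss on an expensive one.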
To generate a valid submission file, participants must use scikit-learn models that support the predict() method to generate the predicted values. Participants should submit a .csv or .xlsx file with exactly 9,000 rows and a single column, “Per Person Price.” A submission with any extra rows or columns will return an ‘invalid score’. Each participant may use only one account and make up to three submissions per day; any further submissions that day will not be evaluated.
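A minimal sketch of producing a submission in the required shape, assuming a scikit-learn regressor and synthetic features in place of the real Train.csv and Test.csv. The Ridge model and the file name submission.csv are arbitrary choices for illustration; with the real test set, the frame would have 9,000 rows.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

# Synthetic stand-ins for engineered features; shapes are illustrative only
rng = np.random.default_rng(0)
X_train = rng.random((200, 5))
y_train = 5000 + 20000 * rng.random(200)  # hypothetical per-person prices
X_test = rng.random((50, 5))              # the real Test.csv has 9,000 rows

# Any scikit-learn estimator with predict() works; Ridge is just an example
model = Ridge().fit(X_train, np.log1p(y_train))
preds = np.expm1(model.predict(X_test))
# RMSLE is undefined for negative values, so clip to zero as a safeguard
preds = np.clip(preds, 0, None)

# Exactly one column named as required, one row per test sample, no index
submission = pd.DataFrame({"Per Person Price": preds})
assert submission.shape == (len(X_test), 1)
submission.to_csv("submission.csv", index=False)
```

Writing with index=False matters here: the default pandas behaviour adds an index column, which would count as an extra column.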
The advanced-level challenge gives data scientists and the machine learning community hands-on experience in building a model that addresses one of the major real-world problems of 2021. The top three winners will get a free pass to The RISING 2021, the biggest meeting of women data science leaders.
The hackathon will end on 12th April, Monday at 7 AM IST.
Dataset Description
- Train.csv – 21,000 rows x 15 columns (includes the Per Person Price column as the target variable)
- Test.csv – 9,000 rows x 14 columns (does not include the target variable)
- Sample Submission.csv – Please check the Evaluation section for more details on how to generate a valid submission
Attribute Description
- Uniq Id – Unique identifier per row sample
- Package Name – Name of the tour package
- Package Type – Type of the tour package
- Destination – Destination of the tour
- Itinerary – Complete itinerary
- Places Covered – Places covered in the itinerary
- Travel Date – Date of travel
- Hotel Details – Details of the hotel stay
- Start City – Starting city for the trip
- Airline – Flight details
- Flight Stops – Intermediate stops, if any
- Meals – In-flight meals or services
- Sightseeing Places Covered – Sightseeing details in the itinerary
- Cancellation Rules – Cancellation policy of the travel company
- Per Person Price – Price of the tour package per person (target column)
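As a hedged illustration of feature engineering on these attributes, the sketch below derives a few numeric features with pandas. It uses column names from the list above, but the two rows and their formats (travel dates as DD-MM-YYYY, places separated by "|") are hypothetical; the real Train.csv may store these fields differently, so the parsing would need to be adapted.

```python
import pandas as pd

# Two hypothetical rows; column names come from the attribute list, but the
# values and formats (DD-MM-YYYY dates, "|" separators) are assumptions.
df = pd.DataFrame({
    "Uniq Id": ["a1", "b2"],
    "Package Type": ["Standard", "Luxury"],
    "Start City": ["Mumbai", "New Delhi"],
    "Travel Date": ["26-03-2021", "10-04-2021"],
    "Places Covered": ["Manali|Shimla", "Srinagar|Gulmarg|Pahalgam"],
    "Flight Stops": [0, 1],
    "Per Person Price": [15000.0, 42000.0],
})

features = pd.DataFrame(index=df.index)

# Calendar features from the travel date (month and day of week, Monday=0)
travel_date = pd.to_datetime(df["Travel Date"], format="%d-%m-%Y")
features["travel_month"] = travel_date.dt.month
features["travel_dayofweek"] = travel_date.dt.dayofweek

# Number of places in the trip, assuming "|" separates them
features["n_places"] = df["Places Covered"].str.split("|").str.len()
features["flight_stops"] = df["Flight Stops"]

# One-hot encode low-cardinality categorical columns
features = features.join(pd.get_dummies(df[["Package Type", "Start City"]]))

# The target stays separate and is never used as a feature
target = df["Per Person Price"]
print(features)
```

Free-text fields such as Itinerary, Hotel Details, and Cancellation Rules would need their own parsing or text features; they are left out here because their format is not described.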
Skills
- Advanced Regression Modelling
- Feature Engineering, Ensemble Modelling
- Optimising RMSLE (Root Mean Squared Log Error) as a metric to generalise well on unseen data
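These skills can be combined in a short sketch. Because RMSLE is simply RMSE computed on log(1 + y), one common approach (an assumption here, not a method prescribed by the organisers) is to train regressors on np.log1p of the price and map predictions back with np.expm1. The example averages a random forest and a gradient boosting model in log space as a basic ensemble. The data is synthetic, and the model choices and hyperparameters are illustrative only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_log_error
from sklearn.model_selection import train_test_split

# Synthetic data standing in for engineered workation features
rng = np.random.default_rng(42)
X = rng.random((500, 6))
# Hypothetical prices with multiplicative structure; always positive
y = 8000 * np.exp(1.5 * X[:, 0] + X[:, 1]) * rng.lognormal(0.0, 0.1, 500)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Minimising squared error on log1p(y) is the same as minimising RMSLE
y_tr_log = np.log1p(y_tr)
models = [
    RandomForestRegressor(n_estimators=200, random_state=0),
    GradientBoostingRegressor(random_state=0),
]
log_preds = []
for m in models:
    m.fit(X_tr, y_tr_log)
    log_preds.append(m.predict(X_val))

# Simple ensemble: average in log space, then map back with expm1
blend = np.expm1(np.mean(log_preds, axis=0))


def rmsle(actual, predicted):
    return np.sqrt(mean_squared_log_error(actual, predicted))


for m, lp in zip(models, log_preds):
    print(type(m).__name__, rmsle(y_val, np.expm1(lp)))
print("Blend", rmsle(y_val, blend))
```

Averaging in log space corresponds to a geometric mean of the price predictions. Whether the blend beats the individual models depends on the data, so it is worth comparing the printed validation scores rather than assuming an improvement.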