Last updated September 9, 2020

MachineHack Winners: How A Data Scientist & An Analyst Secured Leaderboard Rank In MH Predicting The Restaurant Food Cost Hackathon


Illustration by Dikshant Agarwal and Saurabh Kumar

Published on May 28, 2019

by Disha Misal

MachineHack recently concluded its Predicting Restaurant Food Cost Hackathon. Analytics India Magazine talked to the leaderboard rank holders of the hackathon to learn about their data science journeys and how they approached the problem.

Dikshant Agarwal

Journey In Data Science:

Presently working as a data scientist at a fintech startup, Dikshant started his career as a product designer at a robotics company. A year later, he enrolled in the Young India Fellowship, a liberal arts program at Ashoka University. A few months before graduating, he had to choose which industry he wanted to work in. “I loved tech and the dynamic nature of it,” he said. After a few discussions with his engineering seniors and friends, he decided to give data science a shot.

He taught himself through MOOCs such as Andrew Ng's machine learning course and books such as Introduction to Machine Learning with Python by Andreas C. Müller. He also spent a significant amount of time learning the basics of Python programming. Once he had the basics down, he picked up diverse projects from online sources such as Kaggle and gradually grew comfortable with the analytical mindset and the data science approach. Recently, Dikshant started taking part in hackathons to explore how different industries use data and to try novel methods for modelling their problem statements.

How did he solve this MachineHack problem:

Dikshant said that the hackathon's dataset was particularly interesting because its restaurant data was about as raw as data gets. He started by exploring how the categorical values of the available features correlated with cost. This initial exploration helped him understand how to clean and transform the data before modelling. He spent a significant amount of time on cleaning and then on testing the performance of different models. He also tried wrangling the time data but could not extract enough signal from it to use in his final model. He finished by tuning his algorithms and stacking them together: his final model was a stack of Random Forest, XGBoost, Gradient Boosting and LightGBM. Here is the code on GitHub that Dikshant used for the hackathon.

Experience on MachineHack:

It was his first MachineHack hackathon, and he said it was great to see the spontaneity and enthusiasm of fellow data scientists. Dikshant said, “The data was also, as mentioned before, quite ‘raw’ and interesting to explore and make sense of. It seemed like a really good sample to model the original data and it definitely flexed my data exploration and munging skills. Really excited about what MachineHack offers next!”

Saurabh Kumar

Journey In Data Science:

Saurabh Kumar is a Group Lead working on Financial Surveillance Analytics at Ameriprise Financial Services Inc. An avid data scientist, he first became interested in the subject in 2014, when he learned about the random forest algorithm and how it performed on classification tasks compared with traditional classifiers. He was inspired and impressed by the ability of ML algorithms to solve a wide variety of real-world problems, and since then he has kept up his curiosity and consistent learning by participating in hackathons on various platforms.

How did he solve this MachineHack problem:

This challenge had many unstructured features, such as cuisines and time, and Saurabh used TF-IDF to create features from them. Because there were few raw features, he also created many interaction features, which helped his model pick up hidden signals in the data. For modelling, he took the log transform of the target (y) and fitted the model on the transformed variable, which helped reduce the variance of the residuals. His final model was a LightGBM regressor. Here is the code on GitHub that Saurabh used for the hackathon.

Experience on MachineHack:

Saurabh has participated in multiple MachineHack hackathons in the past and secured top leaderboard ranks in several of them, including “Predict The Data Scientists Salary In India Hackathon” and “Who Let The Dogs Out: Pets Breed Classification Hackathon”. Talking about his experience on MachineHack, Saurabh says, “I love Machine Hack platform, you guys post interesting problems and now the competition has increased here so it is fun to compete with some of the top minds in data science.”
