With the successful conclusion of another MachineHack hackathon, the House Price Prediction Challenge, on 26 October, Analytics India Magazine spoke to the winners to understand how they approached this complex problem statement and what their experience of participating and winning was like.
The month-long House Price Prediction Hackathon was both complex and demanding, since identifying the right set of attributes that influence buyers’ behaviour is difficult. Participants were challenged to build a regression model that uses 12 influencing factors to accurately predict house prices in India. The challenge was well received by data scientists, drawing active participation from close to 700 machine learning practitioners.
After careful evaluation using the Root Mean Squared Logarithmic Error (RMSLE) metric, along with the leaderboard score, three participants topped our leaderboard. Here, we introduce the champions of this House Price Prediction Hackathon and the approaches they used to solve the problem.
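For reference, RMSLE is usually defined as the square root of the mean of (log(1 + predicted) − log(1 + actual))² over all houses. The article does not describe MachineHack's exact scoring code, so the function below is only a minimal sketch of that standard definition in plain Python. Because it compares logarithms, it penalises relative rather than absolute errors: the same absolute miss of 10 costs far more on a cheap house than on an expensive one.

```python
import math

def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error between two equal-length sequences.

    Sketch of the standard definition; predictions must be greater than -1.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # log1p(x) = log(1 + x): keeps a price of zero defined and compresses large values
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Same absolute error of 10 on a cheap and on an expensive house
print(rmsle([20.0], [30.0]))    # larger penalty
print(rmsle([200.0], [210.0]))  # much smaller penalty
```

A practical consequence is that many competitors train on log1p(price) and convert predictions back with expm1, so that an ordinary squared-error objective lines up with this metric.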
Winner 01: Shiv Kumar
Shiv Kumar, a final-year electrical engineering student at IIT Bhilai, is interested in machine learning, deep learning and artificial intelligence. He believes in expanding his knowledge through machine learning competitions and hackathons on platforms such as MachineHack and HackerEarth. He is currently looking for a job in artificial intelligence, where he hopes to gain industry experience while applying what he has learned.
To solve the House Price Prediction Challenge, Shiv relied on feature engineering, which proved key to reaching the top ranks. “I did some data analysis and applied the same in the model,” said Shiv. “Ensembling and GBM can also give good results like LGBM, XGBoost and AdaBoost.”
Shiv combined all of these models to reach the top of the leaderboard. To improve results further, he suggested making use of the addresses in the data, which are useful for feature engineering and can improve model accuracy.
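The article does not say how Shiv combined his models. A common approach, shown here only as a hedged sketch, is a weighted average of each model's predictions. The `blend` helper, the weights and the prediction values are hypothetical, and averaging in log space (log1p, then expm1) is an assumed design choice that matches an RMSLE-style metric; it is not taken from the article.

```python
import math

def blend(predictions, weights=None, log_space=True):
    """Combine per-model prediction lists into one list by weighted averaging.

    predictions: list of lists, one inner list per model, all the same length.
    weights: optional per-model weights; defaults to an equal-weight average.
    log_space: average log1p(prediction) and map back with expm1
               (assumed choice that suits an RMSLE-style metric).
    """
    n_models = len(predictions)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    weights = [w / total for w in weights]  # normalise so the weights sum to 1
    blended = []
    for row in zip(*predictions):  # row holds every model's prediction for one house
        if log_space:
            avg = sum(w * math.log1p(p) for w, p in zip(weights, row))
            blended.append(math.expm1(avg))
        else:
            blended.append(sum(w * p for w, p in zip(weights, row)))
    return blended

# Hypothetical outputs of three already-trained models on the same two houses
lgbm_preds = [50.0, 120.0]
xgb_preds = [55.0, 110.0]
ada_preds = [48.0, 130.0]
print(blend([lgbm_preds, xgb_preds, ada_preds], weights=[0.4, 0.4, 0.2]))
```

In practice the weights would be chosen on validation data rather than fixed by hand.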
Winner 02: Sunil Sanjay Hule
A computer science student, Sunil Hule began learning Python at the beginning of 2019 from ‘Automate The Boring Stuff With Python’ by Al Sweigart, which introduced him to data structures and led him to competitive programming. Though he never reached a podium, he gained a good command of Python. He learned machine learning, deep learning and artificial intelligence from YouTube videos and Udemy’s machine learning courses.
Wanting to advance his career further, Sunil signed up for a mentorship programme during the COVID-19 pandemic, which proved to be a huge advantage and led him to start practising on machine learning hackathons. Now in his final year of engineering, he has participated in more than ten hackathons and completed more than five projects. He has also started creating online content to help machine learning beginners.
To predict house prices in the hackathon, Sunil began with exploratory data analysis (EDA): a brief look at the dataset, hypothesis generation and testing, and univariate and bivariate analysis. With this, he built a basic model. He then moved on to feature engineering, extracting features based on the EDA as well as through feature grouping. The rest of his pipeline covered feature preprocessing, setting a cross-validation strategy, hyperparameter tuning, and level-2 stacking and blending of the predictions.
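The article lists the steps but not Sunil's code, models or settings. Below is a minimal, dependency-free sketch of two of those steps: a shuffled K-fold cross-validation strategy and level-2 stacking on out-of-fold predictions. The base models `fit_mean` and `fit_line`, the single-weight level-2 model and the toy area/price data are all hypothetical stand-ins for real regressors such as LightGBM or XGBoost.

```python
import random

def kfold_indices(n, k, seed=0):
    """Yield (train_idx, valid_idx) pairs for shuffled K-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

def fit_mean(xs, ys):
    """Hypothetical base model 1: always predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Hypothetical base model 2: least-squares line y = a + b * x on one feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def out_of_fold(fit, xs, ys, k):
    """Level 1: each row is predicted by a model that never saw it in training."""
    oof = [0.0] * len(xs)
    for train, valid in kfold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in valid:
            oof[i] = model(xs[i])
    return oof

def fit_blend_weight(p1, p2, ys):
    """Level 2: least-squares weight w for w * p1 + (1 - w) * p2, clipped to [0, 1]."""
    num = sum((y - b) * (a - b) for a, b, y in zip(p1, p2, ys))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    w = num / den if den else 0.5
    return min(1.0, max(0.0, w))

# Toy data (made up): price grows roughly linearly with area
area = [500, 650, 800, 900, 1000, 1200, 1400, 1600, 1800, 2000]
price = [25, 31, 40, 44, 52, 60, 71, 79, 90, 101]
oof_line = out_of_fold(fit_line, area, price, k=5)
oof_mean = out_of_fold(fit_mean, area, price, k=5)
w = fit_blend_weight(oof_line, oof_mean, price)
print(f"level-2 weight on the line model: {w:.3f}")
```

Training the level-2 model only on out-of-fold predictions is the point of the design: it learns how much to trust each base model on data that model did not see, which avoids rewarding overfit base models.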
When asked about his experience at MachineHack, he said, “This was my first ever competition on MachineHack, and I really liked the community and diverse problem statements it features.”
Winner 03: Saurav Mishra
Saurav’s data science journey began with a few machine learning courses in college, which were enough to land him a placement in a data science role. After getting placed, however, he stopped learning and simply waited for his job to start. In July this year, Saurav learned that his joining date had been postponed to March 2021 because of the COVID-19 pandemic, and this prompted him to restart his learning.
He started with a few MOOCs and some online blogs. A friend then introduced him to hackathons and competitions, which led to lengthy discussions about various methods and algorithms and gave him in-depth knowledge of how to solve such complex problems. “I have only participated in five competitions so far, and I believe these competitions are the best way to learn,” said Saurav.
To solve this House Price Prediction problem, Saurav started with basic exploratory data analysis, removed duplicate rows from the training dataset and quickly built a baseline LightGBM (LGBM) model. He then moved on to feature engineering: he extracted city and locality information from the address column and created further features by grouping the data (GroupBy) on city and locality, which improved the model’s performance. Finally, he ensembled XGBoost, LightGBM and CatBoost models trained with K-fold cross-validation to produce the final prediction.
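The sketch below illustrates Saurav's data-cleaning and feature-engineering steps in plain Python: drop duplicate rows, split the address into locality and city, and attach GroupBy aggregates. The article does not give the address format or column names, so `ADDRESS` and `SQUARE_FT`, the assumption that an address reads "Locality,City" with the city last, and the sample rows are all hypothetical. In pandas, the same steps would typically use `DataFrame.drop_duplicates()`, `Series.str.split(",")` and `groupby(...)[col].transform("mean")`.

```python
from collections import defaultdict

def drop_duplicate_rows(rows):
    """Keep the first occurrence of each identical row (rows are dicts)."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def split_address(address):
    """Assumed format 'Locality,City': the last comma-separated part is the city."""
    parts = [p.strip() for p in address.split(",")]
    city = parts[-1]
    locality = ", ".join(parts[:-1]) or city  # fall back to the city if no locality
    return locality, city

def add_group_features(rows, group_key, value_key):
    """Attach each group's row count and mean of value_key to every row (GroupBy transform)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
        counts[row[group_key]] += 1
    for row in rows:
        g = row[group_key]
        row[f"{group_key}_count"] = counts[g]
        row[f"{group_key}_mean_{value_key}"] = totals[g] / counts[g]
    return rows

def build_features(raw_rows):
    rows = [dict(r) for r in drop_duplicate_rows(raw_rows)]  # copy: don't mutate input
    for row in rows:
        row["LOCALITY"], row["CITY"] = split_address(row["ADDRESS"])
    add_group_features(rows, "CITY", "SQUARE_FT")
    add_group_features(rows, "LOCALITY", "SQUARE_FT")
    return rows

# Made-up rows for illustration only
sample = [
    {"ADDRESS": "Whitefield,Bangalore", "SQUARE_FT": 1300.0},
    {"ADDRESS": "Whitefield,Bangalore", "SQUARE_FT": 1300.0},  # exact duplicate, dropped
    {"ADDRESS": "Andheri,Mumbai", "SQUARE_FT": 900.0},
    {"ADDRESS": "Jayanagar,Bangalore", "SQUARE_FT": 1100.0},
]
for row in build_features(sample):
    print(row)
```

Group statistics like these give tree models such as LightGBM a compact summary of a location, which helps when individual localities have too few rows to learn from directly.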
This was Saurav’s first MachineHack competition, and hopefully there will be many more. “I liked the interface of MachineHack very much, and I believe this is a great place to apply and improve my machine learning skills,” said Saurav. “With such competitions, we also get to learn from top machine hackers through their published solutions and can even connect with them. Looking forward to more competitions.”