
Meet The MachineHack Champions Who Cracked The ‘Used Electronics Price Prediction’ Hackathon


MachineHack successfully concluded the seventh instalment of its weekend hackathon series last Monday. The Used Electronics Price Prediction hackathon was well received by data science enthusiasts, drawing over 300 registrations and active participation from more than 100 practitioners.

Out of the 117 competitors, three topped our leaderboard. In this article, we introduce the winners and describe the approaches they took to solve the problem.

#1| Atif Hassan

Atif has always been fascinated with the idea of building something intelligent out of one’s own logic. As a high school student, he worked on fusing games with evolutionary algorithms. During his undergraduate studies, he worked on various ML-based projects such as topic recommendation systems and the classification of scientific articles.

His curiosity led him to choose data science as a career while pursuing his Masters at IIT Kharagpur. He is an active participant in hackathons across platforms such as MachineHack and HackerEarth. He has also published two novel algorithms in the data-mining and bioinformatics fields, with a third, on NLP, currently under review.

With all his skills and determination, he hopes to join an R&D department in industry or government and use data science to simplify the lives of his fellow Indians.

Approach To Solving The Problem 

Atif explains his approach as follows:

I treated this competition as an aspect-based extraction problem. My approach can be summarized in seven steps.

  1. I thoroughly cleaned the dataset, including many product names that appeared in the data without any space separation. I also dropped the City, State and Additional_Description columns.
  2. Finding out what each number in the Brand column represented was the most important part of this competition for me. Once I realised that a Brand value of 0 corresponded to Huawei Honor products, 1 to Apple, 2 to Lenovo and 3 to LG, I built five separate features, one for each brand, with Apple products receiving two features (iPhone and iWatch). These features were a relative ranking of all the different products based on their costs and ended up being the most important set of features in the final model (a sketch of this idea follows the list).
  3. I then engineered five more features: two captured the amount of RAM and ROM of each phone mentioned in the dataset, and the other three were boolean features representing whether the phone carried a warranty, whether payment was to be made online or in cash, and whether the phone was in working condition.
  4. Once my feature engineering was complete, I applied sklearn's CountVectorizer to the model_info column to generate a set of sparse features representing each sentence.
  5. I also added the un-normalized sum of IDF-weighted word vectors as a dense representation of each sentence in the model_info column (see the sketch after this list).
  6. These features alone put me in 2nd place with a default CatBoost model (no hyper-parameter tuning).
  7. Finally, I applied a weighted average of the default CatBoost and XGBoost models to reach first place (a blending sketch follows the list).
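
To make step 2 concrete, here is a minimal sketch of the brand-decoding and relative price-rank idea. The column names (Brand, Model_Info, Price), the file name and the dense-rank scheme are assumptions for illustration, and splitting Apple products into separate iPhone and iWatch features is omitted; Atif's exact implementation is in the linked code.

```python
# A sketch of step 2: decode the anonymised Brand codes and build a relative
# price rank per brand. Column and file names are assumptions, not the
# competition's exact schema.
import pandas as pd

train = pd.read_csv("train.csv")  # hypothetical file name

# Brand codes as identified during the competition.
brand_map = {0: "honor", 1: "apple", 2: "lenovo", 3: "lg"}
train["brand_name"] = train["Brand"].map(brand_map)

def relative_price_rank(df, brand):
    """Dense-rank one brand's models by their mean price (1 = cheapest)."""
    mask = df["brand_name"] == brand
    mean_price = df.loc[mask].groupby("Model_Info")["Price"].mean()
    return df.loc[mask, "Model_Info"].map(mean_price.rank(method="dense"))

for brand in brand_map.values():
    train[f"{brand}_rank"] = relative_price_rank(train, brand)

# Rows belonging to other brands get 0 instead of NaN in each rank column.
train = train.fillna({f"{b}_rank": 0 for b in brand_map.values()})
```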
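
Steps 4 and 5 can be sketched as follows, continuing from the train DataFrame in the previous sketch. Training a gensim Word2Vec model on the column itself and taking IDF weights from a TfidfVectorizer are assumptions; any source of word vectors and IDF scores would serve the same purpose.

```python
# Sparse CountVectorizer features (step 4) plus an un-normalized sum of
# IDF-weighted word vectors per Model_Info entry (step 5).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from gensim.models import Word2Vec

texts = train["Model_Info"].astype(str).tolist()

# Sparse bag-of-words representation of each entry.
count_vec = CountVectorizer()
X_sparse = count_vec.fit_transform(texts)

# IDF weight per word, taken from a TfidfVectorizer fitted on the same column.
tfidf = TfidfVectorizer().fit(texts)
idf = dict(zip(tfidf.get_feature_names_out(), tfidf.idf_))

# 100-dimensional word vectors trained on the tokenised column (an assumption).
tokens = [t.split() for t in texts]
w2v = Word2Vec(tokens, vector_size=100, min_count=1, seed=42)

def idf_weighted_sum(words):
    """Un-normalized sum of IDF-weighted vectors for one entry's words."""
    vecs = [w2v.wv[w] * idf.get(w, 1.0) for w in words if w in w2v.wv]
    return np.sum(vecs, axis=0) if vecs else np.zeros(100)

X_dense = np.vstack([idf_weighted_sum(t) for t in tokens])
```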
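
Finally, a hedged sketch of steps 6 and 7, reusing the sparse and dense features built above. The 0.6/0.4 weights, the log-transformed Price target and the simple feature stacking are illustrative assumptions rather than Atif's exact setup.

```python
# Default CatBoost and XGBoost regressors blended by a weighted average.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from catboost import CatBoostRegressor
from xgboost import XGBRegressor

X = hstack([X_sparse, csr_matrix(X_dense)]).tocsr()  # engineered features
y = np.log1p(train["Price"])                         # assumed target column

cat = CatBoostRegressor(verbose=0, random_state=42)  # default hyper-parameters
xgb = XGBRegressor(random_state=42)
cat.fit(X, y)
xgb.fit(X, y)

def blend(X_test, w_cat=0.6, w_xgb=0.4):
    """Weighted average of the two models' predictions (weights illustrative)."""
    return w_cat * cat.predict(X_test) + w_xgb * xgb.predict(X_test)
```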

Get the complete code here.

“MachineHack is a great platform where one can put theoretical concepts into practice and greatly extend one’s practical knowledge. With the talented competitors on the leaderboard, one is really pushed to the limit to solve the problem, which helps a person improve a lot. The hosting, submission and leaderboard of every challenge are seamlessly put together by the platform, allowing participants to quickly take part in competitions and iterate on their models rapidly.” – Atif shared his views on MachineHack.

#2| Sayantan Basu

Inspired by his ML professors and a few of his seniors at IIT Kharagpur, Sayantan began working in data science and machine learning in 2015. With no formal courses on the subject available at his college, he worked through a variety of online resources on his own. In 2017, he got a chance to work on an NLP-based project, and he has never looked back since. In 2018, he joined IIT Guwahati as an M.Tech student, where he took formal courses and built a solid foundation in machine learning.

Even though he has been in the field for over half a decade, he still feels he has barely scratched the surface of this fast-paced field, where there is a new advancement every day. He credits the seniors and friends who have inspired him and helped him improve over the years.

Approach To Solving The Problem 

Sayantan explains his approach briefly as follows.

Feature Engineering

  1. Removed additional_description column
  2. Removed City and State columns
  3. Identified brand 0 as ‘Honor’, brand 1 as ‘iPhone’, brand 2 as ‘Lenovo’ and brand 3 as ‘LG’.
  4. Proceeded on the common-sense assumption that phones with a higher MRP in the market would also fetch a higher resale price.
  5. Found the different categories of models present in the dataset and made a relative-importance feature table. For example, the iPhone 11 carries more importance than the iPhone 7, so the iPhone 11 can be assigned 2 and the iPhone 7 assigned 1; similarly for Honor, Lenovo and LG. This feature boosted my model’s performance significantly.
  6. Extracted ROM sizes like 32gb, 64gb, etc., and used them as a feature.
  7. Used Word2vec and IDF scores to combine the words in Model_Info into a 100-dimensional vector per record.
  8. Used CountVectorizer with unigrams and bigrams as features.
  9. Finally, I selected a few “good words” like “new”, “unused”, “excellent”, etc., and added them as binary features (a sketch covering items 6 and 9 follows this list).
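
Below is a minimal sketch of items 6 and 9, the ROM-size extraction and the “good words” binary flags. The column name, file name and exact word list are assumptions for illustration; Sayantan's complete solution is in the linked code.

```python
# Extract ROM size with a regular expression and flag a few "good words"
# as binary features. Names here are assumptions, not the exact schema.
import re
import pandas as pd

train = pd.read_csv("train.csv")  # hypothetical file name

def extract_rom_gb(text):
    """Return the ROM size in GB for patterns like '32gb' or '64 gb', else 0."""
    match = re.search(r"(\d+)\s*gb", str(text).lower())
    return int(match.group(1)) if match else 0

train["rom_gb"] = train["Model_Info"].apply(extract_rom_gb)

good_words = ["new", "unused", "excellent"]  # illustrative subset of the list
for word in good_words:
    train[f"has_{word}"] = (
        train["Model_Info"].str.lower().str.contains(word, regex=False).astype(int)
    )
```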

Model

  • CatBoost Regressor with no hyperparameter tuning

“The competitive platform really motivates people to push towards gaining more knowledge.” – Sayantan shared his opinion of MachineHack.

Get the complete code here.

#3| Nikhil Kumar Mishra

Nikhil is currently a final-year Computer Science Engineering student at PESIT South Campus, Bangalore. He started his data science journey in his second year after being inspired by a YouTube video on self-driving cars. The technology intrigued him and drew him into the world of ML. He started with Andrew Ng’s famous course and applied what he learnt in the hackathons he participated in.

Kaggle’s Microsoft Malware Prediction hackathon — in which he finished 25th — was a turning point in his data science journey as it gave him the confidence to take it further and challenge himself with more hackathons on MachineHack and other similar platforms. 

Nikhil has been actively participating in hackathons amidst the lockdown, winning or finishing in the top three of several competitions. He hopes to keep contributing and making an impact in the data science community.

Approach To Solving The Problem 

He explains his approach briefly as follows:

  1. I tried SOTA architectures like RoBERTa at first, but they did not help, so I decided to stick with a TfidfVectorizer.
  2. Apart from the word frequencies that the vectorizer extracted, I added one more feature, the memory of a particular phone, which proved to be quite significant.
  3. The ‘Additional Description’ feature was mostly noise; removing it did not affect the cross-validation score.
  4. Other categorical features like Location were also removed, as they caused overfitting on the training data.
  5. For the final model, I used an ensemble of 20-fold LightGBM, 10-fold CatBoost and XGBoost (a simplified k-fold sketch follows this list).
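
A simplified sketch of the fold-averaged ensemble in step 5 is shown below, using TF-IDF features from step 1 and a 20-fold LightGBM only; folding in CatBoost and XGBoost the same way, and the memory feature from step 2, are omitted for brevity. Column names, file names and the log-transformed target are assumptions.

```python
# TF-IDF features plus 20-fold LightGBM with test predictions averaged
# across folds. Schema and target transform are assumptions for illustration.
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import KFold
from lightgbm import LGBMRegressor

train = pd.read_csv("train.csv")  # hypothetical file names
test = pd.read_csv("test.csv")

tfidf = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
X = tfidf.fit_transform(train["Model_Info"].astype(str))
X_test = tfidf.transform(test["Model_Info"].astype(str))
y = np.log1p(train["Price"])  # assumed target column and transform

kf = KFold(n_splits=20, shuffle=True, random_state=42)
test_preds = np.zeros(X_test.shape[0])
for train_idx, _ in kf.split(X):
    model = LGBMRegressor(random_state=42)
    model.fit(X[train_idx], y.iloc[train_idx])
    test_preds += model.predict(X_test) / kf.n_splits  # average over folds

final_prices = np.expm1(test_preds)  # back-transform the assumed log target
```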

Get the complete code here.

“MachineHack is an amazing platform, especially for beginners. The MachineHack team is very helpful in understanding and interacting with participants to get doubts resolved. Also, the community is ever-growing, with new and brilliant participants coming up in every competition. I intend to continue using MachineHack to practice and refresh my knowledge on data science,” says Nikhil about his experience with MachineHack.


Amal Nair

A Computer Science Engineer turned Data Scientist who is passionate about AI and all related technologies. Contact: amal.nair@analyticsindiamag.com