MachineHack recently concluded the 18th edition of its Machine Learning hackathons by announcing the winners of its Predict The Price Of Books challenge.
Shravan Kumar, Divyanshu Suri and Saurabh Kumar secured the first, second and third places respectively on the hackathon leaderboard. Analytics India Magazine introduces you to the winners and their approach to the solution.
#1: Shravan Kumar
Shravan Kumar is a Senior Manager of Advanced Analytics at Novartis. Though Shravan has been working in the analytics field for some time, he always had a keen interest in predictive analytics, a domain he wished to explore. He started with MOOCs on platforms like Coursera, edX and Udacity, and acquired the skills needed for his area of interest. Having learned the essentials, Shravan then focused on perfecting his skills through online competitions. He is an active participant in hackathons conducted by MachineHack, Kaggle and other platforms.
Shravan’s Approach To Solving The Problem
Shravan explains his approach as follows:
Pre-processing steps:
- Checked the number of rows, columns, data types of variables, missing values and observed word clouds of text data.
- Applied basic text pre-processing, with a major focus on two columns: Title and Synopsis. Created meta features such as ‘number of words’, ‘number of unique words’ and ‘number of characters’.
- Reviews and Ratings values are converted into numeric values.
- The Edition variable was split into two variables, Edition Type and Date, and Month was extracted from Date.
- Created features with TF-IDF and Count Vectorizer for both the Synopsis and Title variables.
- More features were created by using GloVe vectors for the Synopsis, Genre and Book Category values.
- Converted the variables ‘Title’, ‘Author’, ‘Synopsis’, ‘Genre’, ‘Year’ and ‘Month’ to label-encoded values.
- Converted the price of the book into Log(Price), because the log-transformed price is approximately normally distributed.
- Created count and mean encoded features for all categorical variables.
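A few of the steps above can be sketched in plain Python. This is a minimal illustration and not Shravan's code: the function names and the toy rows are hypothetical, and only the meta features, the log transform of the price, and the count and mean encodings are shown.

```python
import math
from collections import defaultdict


def text_meta_features(text):
    """Simple meta features for one text field (e.g. Title or Synopsis)."""
    words = text.split()
    return {
        "num_words": len(words),
        "num_unique_words": len({w.lower() for w in words}),
        "num_chars": len(text),
    }


def count_and_mean_encode(categories, targets):
    """Return, per row, how often its category occurs and the mean target of that category."""
    counts = defaultdict(int)
    sums = defaultdict(float)
    for cat, target in zip(categories, targets):
        counts[cat] += 1
        sums[cat] += target
    count_enc = [counts[cat] for cat in categories]
    mean_enc = [sums[cat] / counts[cat] for cat in categories]
    return count_enc, mean_enc


# Hypothetical toy rows, not taken from the competition data.
titles = ["The Silent Patient", "A Brief History of Time", "Gone Girl"]
genres = ["Thriller", "Science", "Thriller"]
prices = [200.0, 350.0, 400.0]

meta = [text_meta_features(t) for t in titles]
log_prices = [math.log(p) for p in prices]  # the model is trained on Log(Price)
genre_count, genre_mean = count_and_mean_encode(genres, log_prices)
```

Mean encoding computed on the full training set, as here, uses each row's own target value. In practice it is commonly computed out-of-fold to limit leakage; the article does not say which variant was used.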
Model building steps:
- Used 5-fold cross-validation with LightGBM as the algorithm and RMSE as the metric.
- Experimented with hyperparameter tuning to achieve better scores, especially by changing the learning rate and seed values.
- Choosing the right cross-validation technique and feature preparation helped me achieve 1st rank on the leaderboard.
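The evaluation loop described above can be sketched as follows. This is a standard-library illustration under stated assumptions: a mean-predicting baseline stands in for LightGBM, and the names `cross_val_rmse`, `kfold_indices` and `fit_mean_baseline` are hypothetical. The `seed` argument shows why changing seed values changes the fold split and therefore the score.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error, the metric named in the article."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def kfold_indices(n, k=5, seed=0):
    """Shuffle row indices with a fixed seed and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_rmse(X, y, fit, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for val_idx in kfold_indices(len(y), k, seed):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [model(X[i]) for i in val_idx]
        scores.append(rmse([y[i] for i in val_idx], preds))
    return sum(scores) / k, scores


def fit_mean_baseline(X_train, y_train):
    """Stand-in model (an assumption): always predicts the training mean."""
    mean = sum(y_train) / len(y_train)
    return lambda x: mean


# Hypothetical toy data: one numeric feature and a log-price target.
X = [float(i) for i in range(20)]
y = [5.0 + 0.1 * x for x in X]
mean_score, fold_scores = cross_val_rmse(X, y, fit_mean_baseline, k=5, seed=42)
```

Swapping `fit_mean_baseline` for a function that trains a gradient-boosted model would leave the rest of the loop unchanged.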
Click here to view the code.
“MachineHack is a great learning platform. The articles by Analytics India Magazine writers are very helpful and keep all our industry-relevant people updated with news in this industry. Truly MachineHack is one of the best hackathon organisers and data science knowledge portals in India,” he said.
#2: Divyanshu Suri
Now a Senior Manager of Machine Learning at AXA XL, Divyanshu Suri is not new to MachineHack and has won multiple hackathons. Having completed a Bachelor's in Statistics at Delhi University and a Master's in Applied Statistics at IIT Bombay, Divyanshu was amazed by the real power of data science in his second job at EXL Service, where he worked in insurance analytics. He then went on to participate in many online hackathons, gaining knowledge and improving his skills.
Now, as a Senior Manager, he applies predictive analytics to solve a variety of data science problems in commercial and speciality lines.
Divyanshu’s Approach To Solving The Problem
Divyanshu started the competition with exploratory analysis, trying to understand the data.
He then proceeded with data cleaning and feature engineering. To find the best-fitting model for the problem, he tried different algorithms, compared their performance, and then combined the better-performing models.
He built many models using different sets of variables, different transformations, different variable creation algorithms and different ML algorithms, and finally used stacking to produce the final model.
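The article does not describe Divyanshu's base models or meta-learner, so the following is only a sketch of the general stacking idea in plain Python. Two toy base models (a mean predictor and a one-feature least-squares line, both assumptions) produce out-of-fold predictions, and a single blend weight is then fitted on those predictions. A one-weight linear blend is a deliberately simplified meta-model; all names are hypothetical.

```python
def fit_mean(X, y):
    """Base model 1 (assumption): predicts the training mean."""
    m = sum(y) / len(y)
    return lambda x: m


def fit_line(X, y):
    """Base model 2 (assumption): one-feature least-squares line y = a + b*x."""
    n = len(y)
    mx, my = sum(X) / n, sum(y) / n
    sxx = sum((x - mx) ** 2 for x in X)
    b = sum((x - mx) * (t - my) for x, t in zip(X, y)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def out_of_fold_predictions(X, y, fit, k=5):
    """Predict each row with a model that never saw that row during training."""
    n = len(y)
    oof = [0.0] * n
    for fold in range(k):
        val_idx = list(range(fold, n, k))
        held_out = set(val_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in val_idx:
            oof[i] = model(X[i])
    return oof


def fit_blend_weight(y, p1, p2):
    """Least-squares w for the blend w*p1 + (1-w)*p2 (the meta-model)."""
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    if den == 0:
        return 0.5  # the base models agree everywhere, so any weight works
    return sum((t - b) * (a - b) for t, a, b in zip(y, p1, p2)) / den


def fit_stack(X, y, k=5):
    """Fit the blend weight on out-of-fold predictions, then refit base models on all rows."""
    w = fit_blend_weight(
        y,
        out_of_fold_predictions(X, y, fit_mean, k),
        out_of_fold_predictions(X, y, fit_line, k),
    )
    m1, m2 = fit_mean(X, y), fit_line(X, y)
    return lambda x: w * m1(x) + (1 - w) * m2(x)


# Hypothetical toy data with an exactly linear target.
X = [float(i) for i in range(10)]
y = [2.0 * x + 1.0 for x in X]
stacked = fit_stack(X, y)
```

Fitting the meta-model on out-of-fold predictions rather than in-sample predictions is what keeps it from simply rewarding the base model that overfits most.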
Click here to view the code.
“MachineHack is a great platform to learn and apply new data science techniques and ML algorithms and improve your own skillset. It is also a great platform to compete with the other industry experts in the data science community,” he said.
#3: Saurabh Kumar
A skilled and experienced Data Scientist at a reputed firm, Kumar has shown his expertise multiple times by topping several hackathons at MachineHack.
Kumar’s interest in Data Science and Machine Learning grew out of a single algorithm. His personal experience with the Random Forest algorithm and its capabilities motivated him to pursue the field and advance his skills. Kumar said he is inspired by the ability of ML algorithms to solve a variety of real-world problems.
Kumar’s Approach To Solving The Problem
Saurabh Kumar used basic feature engineering, traditional NLP techniques such as bag-of-words (BOW) and TF-IDF, and LightGBM to crack the hackathon.
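To illustrate the two text representations named above, here is a minimal standard-library sketch. It is not Saurabh's code: the whitespace tokenizer, the function names and the toy documents are assumptions, and the IDF formula used, log(N / df) + 1, is only one common variant (library implementations typically add smoothing and row normalization).

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return the sorted vocabulary and one row of raw term counts per document."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tfidf(docs):
    """Weight each term count by idf = log(N / df) + 1."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df) + 1.0 for df in doc_freq]
    weights = [[row[j] * idf[j] for j in range(len(vocab))] for row in counts]
    return vocab, weights


# Hypothetical toy documents standing in for book titles.
docs = ["red book", "blue book"]
vocab, bow_rows = bag_of_words(docs)
_, tfidf_rows = tfidf(docs)
```

Rows like these, one per book, are the kind of numeric features that a tree-based model such as LightGBM can be trained on.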
Click here to view the code.
“I am active on the MachineHack platform since their first hackathon and really enjoy competing here. MachineHack team is very cooperative and is willing to work on feedbacks,” he said.