The Academy of Continuing Education at Shiv Nadar University, Delhi-NCR, in partnership with MachineHack, launched the annual Analytics Olympiad for data scientists and machine learning professionals. It was an opportunity for participants to showcase their technical skills and potential in business analytics to prospective employers.
The competition began on October 21 and concluded on November 27. The participant who cracked Phase 3 of the challenge won the ₹1 lakh grand prize.
The event was a roaring success. The Olympiad received over 1,000 registrations from across the country. After a stringent qualifying round, 250 participants were selected, of whom the top ten got an opportunity to give an in-person presentation at Shiv Nadar University, Delhi-NCR, before an eminent jury panel. Dr Swati Jain, Vice President, Analytics at EXL; Dr Anish Agarwal, Director (Data and Analytics), NatWest Group; Megha Sinha, Vice President Digital (AI/ML/MLOps), Genpact; and Manoj Madhusudanan, Head of Dunnhumby India, formed the jury for the Analytics Olympiad.
The Big Day
The Analytics Olympiad was conducted as part of the university’s tenth anniversary. Welcoming the participants, jury, and other guests, Dr Bibek Banerjee, Dean – Academy of Continuing Education, and Dean – School of Management & Entrepreneurship, Shiv Nadar University, Delhi-NCR, said, “We went through a deep exercise of identifying our vision for 2030. Data, analytics, and ML is going to inform and drive the vision for the School of Management & Entrepreneurship. The institute is working on analytics as a very important discipline domain where we want to co-invest with industry and young people to create stellar value. In that context, the idea for the Analytics Olympiad was conceived.”
For Analytics Olympiad 2021, the participants were challenged to build an ML model and predict the sales of each product from each outlet. The participants were also required to use the model to analyse the properties of the product in the stores and find ways to increase sales.
The ten qualifying participants were: Vivek Kumar, Technical Manager at NatWest Group; Akash Gupta, Data Scientist at HSBC; Mukul Sharma, Assistant Manager at Niva Bupa Health Insurance; Rupesh Prasad, Manager at TCS; Rajat Ranjan, Data Scientist at TheMathCompany; Indrashis Das, Product Analyst at HighRadius Technologies; Jhagrut Pradeep Lalwani, student at Veermata Jijabai Technological Institute; Arpita Saggar, MCA student at Delhi University; Yash Khandelwal, student at Birla Institute of Technology in Ranchi; and Rahul Pednekar, Deputy General Manager at Vodafone Intelligent Solutions.
Each of the ten participants was given 20 minutes: 15 minutes to present their solution, followed by a five-minute Q&A session. Arpita Saggar was adjudged the winner, while Yash Khandelwal and Rahul Pednekar secured second and third places, respectively.
Rank 01: Arpita Saggar
To prepare the data, Arpita first selected all the features and evaluated the model. She then removed the less relevant features using permutation importance, which measures how model performance changes when a single column of the test data is randomly shuffled while the target and all other columns are left in place. She further used the mean feature importance (over repeated calculations) to remove less important features. She scaled the numerical columns to the range [0, 1] and encoded the categorical columns as a one-hot numeric array. The categorical features were ignored, and she partitioned the data by splitting it into ten consecutive folds (without shuffling). Each fold was then used once as a test set while the nine remaining folds formed the training set.
Arpita used a Voting Regressor model comprising CatBoostRegressor, LGBMRegressor and XGBRegressor, since the combined model outperformed the individual models (lowest RMSE over ten folds of training data).
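Permutation importance as described above can be sketched in a few lines. This is a minimal illustration and not Arpita's actual code: the toy model, the data, and the helper names `rmse` and `permutation_importance` are assumptions made for the example.

```python
import random
from statistics import mean

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in RMSE when one column is shuffled, for each column.

    The target and all other columns are left in place; the shuffle is
    repeated n_repeats times and the increases are averaged.
    """
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            increases.append(rmse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(mean(increases))
    return importances

# Hypothetical toy model that only looks at column 0.
def toy_predict(row):
    return 3.0 * row[0]

X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [3.0 * row[0] for row in X]
scores = permutation_importance(toy_predict, X, y)
```

In this toy setup, column 1 has an importance of zero because the model never reads it, so it would be a candidate for removal, while shuffling column 0 clearly hurts the RMSE.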
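The two ideas above, consecutive unshuffled folds and a voting ensemble that averages its base models, can be sketched as follows. This is an illustrative sketch only: the base models here are hypothetical pre-fitted stand-ins rather than the gradient-boosting regressors named above, and the function names are assumptions.

```python
from statistics import mean

def consecutive_folds(n_samples, n_folds=10):
    """Yield (train_idx, test_idx) pairs for consecutive folds, no shuffling.

    If n_samples is not divisible by n_folds, the first folds get one
    extra sample each.
    """
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        stop = start + base + (1 if fold < extra else 0)
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop

def voting_predict(models, row):
    """Unweighted average of the base models' predictions for one row."""
    return mean(model(row) for model in models)

# Hypothetical stand-in models: one biased high, one biased low.
def model_high(row):
    return 2.0 * row[0] + 1.0

def model_low(row):
    return 2.0 * row[0] - 1.0

folds = list(consecutive_folds(25, n_folds=10))
combined = voting_predict([model_high, model_low], [3.0])
```

Averaging lets the opposite errors of the two stand-in models cancel, which is the intuition behind a combined model scoring a lower RMSE than its individual members.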
Check out her solution here.
Rank 02: Yash Khandelwal
He first split the training data into training and validation sets in an 80-20 ratio. Each of the ensemble's models was first fit on the training set, and its RMSE on the validation set was used for model weighting. The ensemble was then fit on the whole training data, and inference was carried out on the test data.
He used extreme gradient boosting (XGBoost) and light gradient boosting (LightGBM), both boosting algorithms known for strong predictive performance, as the models, with L1 and L2 regularisation. Khandelwal concluded that the weighted average ensemble, along with the stacking ensemble, achieved the lowest RMSE.
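The article does not say exactly how validation RMSE was turned into model weights. One common choice, assumed here purely for illustration, is to weight each model by the inverse of its validation RMSE so that more accurate models count for more. The function names and numbers below are hypothetical.

```python
def inverse_rmse_weights(val_rmses):
    """Turn validation RMSEs into weights that sum to 1 (lower RMSE, higher weight).

    Assumption: inverse-RMSE weighting; the source does not specify the scheme.
    """
    inverse = [1.0 / r for r in val_rmses]
    total = sum(inverse)
    return [v / total for v in inverse]

def weighted_average(predictions, weights):
    """Blend per-model prediction lists into one list using the given weights."""
    n_rows = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions))
        for i in range(n_rows)
    ]

# Hypothetical validation RMSEs for two models and their test predictions.
weights = inverse_rmse_weights([1.0, 3.0])
blended = weighted_average([[10.0, 20.0], [14.0, 24.0]], weights)
```

With these made-up numbers the first model, whose validation RMSE is three times lower, receives three times the weight of the second.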
Check out his solution here.
Rank 03: Rahul Pednekar
Rahul Pednekar used two strategies. He created dummy variables in the train and test sets. He then split the training data into 80 per cent (training) and 20 per cent (testing) and obtained RMSE values for XGB, CatBoost, LGBM, Lasso, Ridge, Elastic Net and Linear Regression to gauge each algorithm's performance without any hyperparameter tuning.
To get the best values, hyperparameter tuning was performed using the Optuna library with K-fold cross-validation. The prediction was performed using each of the algorithms ten times on the test set using K-fold cross-validation. The ten predictions were averaged to get the final prediction for each algorithm. To create the ensemble, Pednekar defined weights from 0.0 to 1.0 at intervals of 0.05, i.e., 0.0, 0.05, 0.1, 0.15, 0.2, 0.25 … 0.95, 1.0. He then found the weights that gave the minimum RMSE on the out-of-fold predictions and used those weights to predict on the final test dataset.
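The weight search over 0.0, 0.05, … 1.0 can be sketched for a two-way blend as below. This is a simplified illustration, not Pednekar's code: the function name `best_blend_weight` and the toy out-of-fold predictions are assumptions, constructed so that the best weight happens to fall on the grid.

```python
def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5

def best_blend_weight(y_true, oof_a, oof_b, n_steps=20):
    """Grid-search w in {0, 1/n_steps, ..., 1} minimising the RMSE of
    w * oof_a + (1 - w) * oof_b on out-of-fold predictions.

    n_steps=20 gives the 0.05 spacing listed in the text.
    """
    best_w, best_score = None, float("inf")
    for k in range(n_steps + 1):
        w = k / n_steps
        blended = [w * a + (1 - w) * b for a, b in zip(oof_a, oof_b)]
        score = rmse(y_true, blended)
        if score < best_score:
            best_w, best_score = w, score
    return best_w, best_score

# Hypothetical out-of-fold predictions whose errors point in opposite directions.
y_true = [1.0, 2.0, 3.0, 4.0]
signs = [1.0, -1.0, 1.0, -1.0]
oof_a = [t + 0.35 * s for t, s in zip(y_true, signs)]
oof_b = [t - 0.65 * s for t, s in zip(y_true, signs)]
best_w, best_score = best_blend_weight(y_true, oof_a, oof_b)
```

The chosen weight would then be applied unchanged to the two models' test-set predictions.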
Part-1 Ensemble = LGBM_XGB_Catboost ensemble * 0.65 + Lasso Prediction * 0.35
All the steps mentioned above were repeated to arrive at the Part-2 prediction. He used Part-2 Ensemble = LGBM_XGB_Catboost_Lasso ensemble
The final prediction was made using the following formula: 0.5 * Part-1 Predictions + 0.5 * Part-2 Predictions
Check out his solution here.
Data science and analytics are not just buzzwords but have penetrated some of the most crucial facets of our lives. The requirement now is to recognise these critical fields and devise effective solutions. Competitions like the Analytics Olympiad are great opportunities to showcase such solutions.
About Shiv Nadar University, Delhi-NCR:
Shiv Nadar University, Delhi NCR is a student-centric, multidisciplinary and research-focused University offering a wide range of academic programs at the Undergraduate, Master’s, and Doctoral levels. The University was set up in 2011 by the Shiv Nadar Foundation, a philanthropic foundation established by Mr. Shiv Nadar, founder of HCL. The University is in the quest to become a globally acclaimed center for learning and innovation in the fields of Engineering, Natural Sciences, Humanities & Social Sciences, and Management. The core of the University consists of a select, world-class faculty with doctoral and postdoctoral experiences from ranked universities all over the world.
Shiv Nadar University has been recognized as one of the ten private ‘Institutions of Eminence’ by the Government. In the NIRF (Government’s National Institutional Ranking Framework), the University has been the youngest institution in the ‘top 100’ Overall list. In NIRF-2021, it ranked 56 in the ‘University’ category. Shiv Nadar University has been accredited with Grade ‘A’ by NAAC (National Assessment and Accreditation Council), valid for a period of five years from 26 November 2019. It is also among a select group of institutions in the country which were awarded the prestigious Atal Incubation Center grant by the NITI Aayog, Government of India, in the very first round in 2017.
The University’s Academy of Continuing Education aims to democratise access to best-in-class knowledge, practices, and skill development courses for all. Alongside, through a unique certification program in Data Sciences and Analytics for Business, it aims to help learners prepare for current and future careers. To know more about the data science & analytics for business program offered by Shiv Nadar University, Delhi-NCR, visit: https://sme.snu.edu.in/admissions/certificate-programs/DSAB