MachineHack Winners: How These Data Science Enthusiasts Solved The Recent ‘Women In AI’ Hackathon

MachineHack last week concluded its 24th hackathon, Food Quality Assessment: Women in AI Hackathon, organised to encourage women in the analytics and machine learning space.

The hackathon gave participants the opportunity to win exclusive passes to AIM’s upcoming Women In Analytics Conference, The Rising 2020.

The hackathon was well received by the Data Science and Machine Learning community with over 530 registrations and close to 90 active participants, including both male and female enthusiasts.



Out of the 89 submissions, three women in AI topped our leaderboard. Here we introduce them and the approaches they took to solve the problem.

#1: Indira Raju

Indira has over 4 years of experience in the IT industry. In the course of these 4 years, Indira has been a PHP coder, a Data Analyst, a Data Scientist and a Business Analyst. She started out as a Research Assistant at IIT Kharagpur, where she worked as a PHP coder and soon entered the analytics space to work as a Data Analyst in the same institute. Her growing interest in the analytics field led her to enrol in a Certification Program in Big Data Analytics/Optimisation with INSOFE, after which she joined Nielsen as an Executive Data Scientist. Indira now works as a Business Analyst for HP Inc.

Indira often participates in hackathons, which she believes help her stay abreast of the latest techniques in Machine Learning and Data Science.

Approach To Solving The Problem 

Indira explains her approach briefly as follows:

  1. Created multiple features to get to the target.
  2. Since the dates in the train and test sets overlap, past and future inspection results and reasons can be used to predict the correct result. However, in production or real-world scenarios, future results cannot be used.
  3. An aggregation of 15 selected models gave the best leaderboard scores:
    • 6-fold cross-validation with LightGBM
    • 4-fold cross-validation with LightGBM
    • 5-fold cross-validation with CatBoost

This resulted in an average cross-validation score of 0.13 using the features based on future results.
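The aggregation of fold models described above can be sketched in plain Python. This is only a minimal illustration, not Indira's code: the smoothed class-frequency "model" is a stand-in for LightGBM and CatBoost, the labels are invented, and reading the list as 6 + 4 + 5 = 15 fold models is an assumption.

```python
from collections import Counter


def k_fold_indices(n_rows, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train, valid
        start += size


def fit_prior_model(labels, classes):
    """Stand-in model (assumption): add-one smoothed class frequencies.

    Returns the probability vector this 'model' predicts for any row.
    """
    counts = Counter(labels)
    total = len(labels) + len(classes)
    return [(counts[c] + 1) / total for c in classes]


def average_predictions(predictions):
    """Blend models by averaging their per-class probabilities."""
    n_models = len(predictions)
    n_classes = len(predictions[0])
    return [sum(p[j] for p in predictions) / n_models for j in range(n_classes)]


classes = [0, 1, 2]
y = [0, 1, 2, 0, 1, 0, 2, 0, 1, 0, 0, 1]  # toy labels, invented for illustration

fold_models = []
for k in (6, 4, 5):  # three k-fold runs, one model per training fold
    for train_idx, _ in k_fold_indices(len(y), k):
        fold_models.append(fit_prior_model([y[i] for i in train_idx], classes))

blended = average_predictions(fold_models)
```

Averaging probabilities across fold models is one common way to aggregate; the article does not say whether the winning solution used a plain or a weighted average.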

Without future results, a cross-validation score of 0.19 was obtained, which scored 0.18 on the leaderboard.
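To make the distinction between past and future results concrete, here is a hedged sketch of a feature that looks only backwards in time. The field names `facility`, `date` and `result` are hypothetical (the article does not list the dataset's columns), and the sketch assumes at most one inspection per facility per date.

```python
from datetime import date


def add_previous_result(rows):
    """Return copies of rows, sorted by facility and date, with 'prev_result'.

    'prev_result' is the result of the same facility's most recent earlier
    inspection, or None when there is none. Later inspections are never used.
    """
    last_seen = {}
    enriched_rows = []
    for row in sorted(rows, key=lambda r: (r["facility"], r["date"])):
        enriched = dict(row)
        enriched["prev_result"] = last_seen.get(row["facility"])
        last_seen[row["facility"]] = row["result"]
        enriched_rows.append(enriched)
    return enriched_rows


# Toy records, invented for illustration.
rows = [
    {"facility": "A", "date": date(2019, 3, 1), "result": "pass"},
    {"facility": "A", "date": date(2019, 1, 1), "result": "fail"},
    {"facility": "B", "date": date(2019, 2, 1), "result": "pass"},
]
featured = add_previous_result(rows)
```

A "next result" feature would be built the same way with the sort order reversed, which is exactly the kind of information that is unavailable at prediction time in production.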

Get the complete code here.

#2: Varshinee Venkatesan

Varshinee has over 2.5 years of experience in the computer science domain working on technologies like web development, AR/VR, game development and big data. It was when she got into the realm of big data that she became fascinated with handling and preprocessing colossal amounts of data. Since then, Varshinee has been keenly interested in the Data Science domain. She also shared her data science experience and spoke of how helpful her current organisation, Sirius Computer Solutions, has been.

Approach To Solving The Problem 

Varshinee explains her approach to the problem as follows:

“This particular problem was so interesting. There is a saying that data science is about 80% handling data and 20% algorithms. I deny that. I would say that data science is about 90% handling data and 10% algorithms,” says Varshinee. 

  1. Since it was a multiclass classification problem, a deep neural network (DNN) with softmax activation at the output layer was used.
  2. Visualised the data to gauge the importance of the features.
  3. Trained the DNN on the data, which did not produce the expected results.
  4. Calculated the loss for each class and found that two classes had a poor cross-entropy loss.
  5. Checked the relevance of, and the differences between, the classes and added a few extra features.
  6. Generated new features by combining ‘date’ with other features, which drastically improved the score.
  7. Tried different classification algorithms and compared the performance of each model.
  8. Tuned and optimised the algorithm to prevent overfitting and underfitting.
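The softmax output and the per-class loss check from the steps above can be sketched as follows. This is an illustrative sketch rather than Varshinee's code: the logits and labels are invented, and grouping the cross-entropy by true class is one assumed way of finding the classes a model handles poorly.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def per_class_log_loss(y_true, probs, eps=1e-15):
    """Mean cross-entropy of the true class, grouped by true class label."""
    losses = defaultdict(list)
    for label, p in zip(y_true, probs):
        p_true = min(max(p[label], eps), 1.0)  # clip to avoid log(0)
        losses[label].append(-math.log(p_true))
    return {c: sum(v) / len(v) for c, v in sorted(losses.items())}


# Toy network outputs for four samples and three classes (invented).
logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.9, 1.0, 0.8], [0.1, 0.2, 2.5]]
y_true = [0, 1, 2, 2]

probs = [softmax(z) for z in logits]
class_loss = per_class_log_loss(y_true, probs)
worst_class = max(class_loss, key=class_loss.get)
```

Classes with a noticeably higher mean loss are natural candidates for targeted feature engineering.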

“I would like to thank the Analytics India Magazine and MachineHack community a lot for coming up with this wonderful hackathon with a cool problem statement and encouraging all women data science enthusiasts to participate in this Hackathon. Hackathons like this will kindle more interest among data science enthusiasts, which in turn paves the way in creating more data scientists,” she said of her experience on MachineHack.

Get the complete code here.

#3: Mayuri Lashkare

Mayuri is a final-year Bachelor of Computer Science student. Having been introduced to Data Science by her brother, she started her data science journey by diving into OpenIntro Statistics, an open-source textbook for introductory statistics. She then expanded her knowledge through educational platforms like Coursera and Kaggle MicroCourses.

Approach To Solving The Problem 

Here is an outline of Mayuri’s approach to solving the problem:

  1. Started with basic EDA.
  2. Preprocessed the data to handle NaN values.
  3. Performed feature selection and feature generation.
  4. Tried a number of algorithms like RandomForest, XGBoost and CatBoost.
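As an illustration of the NaN-handling step, here is a minimal sketch of median imputation with a missing-value indicator. The article does not say which strategy Mayuri used, so both the technique and the sample values are assumptions.

```python
import math
from statistics import median


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_median(values):
    """Fill missing entries with the median of the observed ones.

    Also returns a 0/1 flag per row so a model can still see that the
    original value was missing.
    """
    observed = [v for v in values if not is_missing(v)]
    fill = median(observed)
    filled = [fill if is_missing(v) else v for v in values]
    flags = [int(is_missing(v)) for v in values]
    return filled, flags


# Toy numeric column, invented for illustration.
scores = [3.0, float("nan"), 5.0, None, 4.0]
filled, was_missing = impute_median(scores)
```

Tree-based libraries such as XGBoost and CatBoost can also handle missing numeric values natively, so explicit imputation matters most for models that cannot.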

“This was a wonderful experience for me because I got a lot to learn and at the same time apply my knowledge in this competition. Thanks to the AIM and MachineHack team for this opportunity; I look forward to participating in more hackathons. I also enjoy reading AIM articles and find them very helpful and informative,” she said of her experience with MachineHack.

Get the complete code here.


Copyright Analytics India Magazine Pvt Ltd
