For years, people have been comparing machine learning and data science hackathons with real-world projects. Yet the debate is never-ending and often ambiguous.
Take online hackathon platforms such as Kaggle or MachineHack. These platforms allow users to find and publish data sets, explore and build models in a web-based data science environment, collaborate with other data scientists and machine learning engineers, and enter competitions to solve data science and machine learning challenges across experience levels, from beginner to intermediate and expert.
Hackathon platforms serve as a test bed for data scientists and machine learning professionals. According to Kaggle, more than 55 per cent of data scientists have less than three years of experience, while six per cent have been using machine learning for more than a decade.
Participating in hackathons brings far more gains than losses. Some of the benefits include:
- Opportunities to learn and collaborate. Participants get to network with like-minded people and discuss their approaches to the problems. Working in groups also helps them approach a problem from new perspectives.
- Experimenting with many state-of-the-art (SOTA) approaches and datasets.
- At times, you end up making great contacts and landing an awesome job by showcasing your passion and skills to the world.
- It is fun to participate and see how you fare on the leaderboard.
- If you win, the prize money is always a bonus, but that should not be the only reason to take part in hackathons.
In this article, we look at the differences between hackathon platforms and real-world machine learning projects and draw a clear distinction between the two.
Before we delve into the difference between hackathons and real-world machine learning projects, let’s look at the lifecycle of a machine learning project. As explained by Steve Nouri, founder, AI4Diversity, it typically involves:
- Scoping the project
- Collecting the data
- Training the model
- Deploying in production
- Repeating the previous three steps (collecting the data, training the model and deploying in production)
Many industry experts believe that hackathon platforms are an amazing way to experiment and learn. Still, they align with only a single stage of the ML lifecycle: training the model. When a data scientist builds a model in the real world and optimises a metric, they also need to consider the RoI, inference cost, re-training cost and costs in general. That piece of the puzzle is completely missing on hackathon platforms.
“To drive the adoption of an ML model within the business stakeholders, it is important we think about ‘interpretability’ as well,” said Sushanth Dasari, data scientist at Trust. He added that interpretability drives many key decisions at each step of the lifecycle, which is never the case in a hackathon.
“In real-world ML projects, 90 per cent of the time is spent on acquiring, cleaning and processing the data, often querying different databases and merging this data. The quality of the input data needs to be carefully assessed and checked for correctness, integrity, and consistency,” said Daniele Gadler, data scientist at ONE LOGIC GmbH.
Further, he said that once the ML model has been developed and deployed, a lot of time goes into monitoring the model and re-training it on newly ingested data (MLOps). In hackathons, by contrast, the data is already provided and is generally cleaner than in real-world projects. There are also no concerns about real-world issues such as model stability, maintainability or deployability. “You can just focus on developing a super-complex ‘unmaintainable’ huge model with the goal of obtaining the best performance on the data provided for the competition, hoping it will generalise on newly unseen data,” said Gadler.
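To make the data checks concrete, here is a minimal sketch of the kind of correctness, integrity and consistency tests that real-world inputs tend to need and competition data usually does not. It is not taken from any of the experts quoted; the function name `find_data_issues`, the field names and the sample records are all made up for illustration.

```python
def find_data_issues(customers, orders):
    """Return a list of human-readable problems found in two raw tables.

    Both arguments are lists of dicts; the field names are hypothetical.
    """
    issues = []

    # Integrity: every customer id should appear exactly once.
    seen_ids = set()
    for row in customers:
        if row["id"] in seen_ids:
            issues.append(f"duplicate customer id {row['id']}")
        seen_ids.add(row["id"])

    for order in orders:
        # Consistency: each order must point at a known customer.
        if order["customer_id"] not in seen_ids:
            issues.append(
                f"order {order['order_id']} references unknown customer "
                f"{order['customer_id']}"
            )
        # Correctness: amounts must be present and non-negative.
        amount = order.get("amount")
        if amount is None:
            issues.append(f"order {order['order_id']} has no amount")
        elif amount < 0:
            issues.append(f"order {order['order_id']} has negative amount {amount}")

    return issues


# Made-up sample data standing in for two databases that need merging.
customers = [
    {"id": 1, "name": "Ana"},
    {"id": 2, "name": "Raj"},
    {"id": 2, "name": "Raj"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 25.0},
    {"order_id": 11, "customer_id": 3, "amount": None},
    {"order_id": 12, "customer_id": 2, "amount": -5.0},
]

for issue in find_data_issues(customers, orders):
    print(issue)
```

In practice such checks would likely run on every new batch of ingested data before any re-training, which is part of why they take up so much project time.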
Joseph Wehbe, co-founder and CEO of DAIMLAS.com, said that on hackathon platforms time is wasted chasing accuracy improvements of 0.000001, which nobody does in the real world. Hackathons focus on only one performance metric, whereas real-world projects focus on scalability, speed, deployment and cost. “You don’t learn how to clean raw data. You don’t learn understanding the business problem, deployment skills, team skills interacting with leadership, and analysis to understand what ‘business problem’ you are trying to solve,” he added.
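The gap between a single leaderboard metric and real-world constraints can be illustrated with a small sketch. Everything in it is an assumption made for the example: the `Candidate` fields, the `pick_for_production` helper, the budgets and all the numbers are invented, and a real project would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    accuracy: float      # the leaderboard-style metric
    latency_ms: float    # time per prediction
    monthly_cost: float  # serving plus re-training, in any currency


def pick_for_production(candidates, max_latency_ms, max_monthly_cost):
    """Pick the most accurate candidate that fits the latency and cost budgets."""
    feasible = [
        c for c in candidates
        if c.latency_ms <= max_latency_ms and c.monthly_cost <= max_monthly_cost
    ]
    if not feasible:
        return None  # nothing fits the budgets; the constraints need revisiting
    return max(feasible, key=lambda c: c.accuracy)


# Invented figures, for illustration only.
candidates = [
    Candidate("huge_ensemble", accuracy=0.9312, latency_ms=850.0, monthly_cost=12000.0),
    Candidate("gradient_boosting", accuracy=0.9287, latency_ms=40.0, monthly_cost=900.0),
    Candidate("logistic_regression", accuracy=0.9010, latency_ms=2.0, monthly_cost=100.0),
]

# A hackathon ranks on accuracy alone.
leaderboard_winner = max(candidates, key=lambda c: c.accuracy)

# A production team also has speed and cost budgets to respect.
production_choice = pick_for_production(
    candidates, max_latency_ms=100.0, max_monthly_cost=2000.0
)

print("Leaderboard winner:", leaderboard_winner.name)
print("Production choice:", production_choice.name)
```

With these made-up numbers, the most accurate model tops the leaderboard but exceeds both budgets, so a slightly less accurate model is the one that ships.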
Hackathon platforms like Kaggle and MachineHack push users to explore new problems, and they also help them understand the science well enough to do real-world work.
Hackathon platforms can be as real as the real world; only the environments differ. ‘What a gym is for athletes, hackathon platforms are for data scientists and machine learning professionals’: a great place to practise and learn.