Are you one of those Kaggle ninjas who have aced machine learning competitions but are struggling to get noticed by recruiters? Being a data scientist by day and a Kaggler by night is not a new phenomenon. You will find scores of ML enthusiasts who have aced the Titanic competition (one of Kaggle’s easiest) and also survived Digit Recognizer or First Steps with Julia. But there’s a major downside to Kaggle that many data science professionals have stressed.
Here are several missing pieces in Kaggle that can undermine your understanding of machine learning:
- Understanding the business problem: In a Kaggle competition you don’t spend time understanding how a model will fit into the business problem or how accurate it needs to be.
- Types of problems: Among the vast selection of business problems that a data scientist faces in real life, Kaggle highlights only a subset, as Ria Chakraborty, a data scientist at IBM, noted on Quora.
- Convincing C-level executives about your analysis: Kaggle competitions often overlook another challenging part: convincing C-level executives to follow through on the solution with a strong, data-backed analysis.
- Working on clean datasets: The real difference between Kaggle and real-world data science is that you’ll likely never be handed a pre-determined and cleaned dataset. Data wrangling forms a huge part of the data science business: joining datasets, cleaning up missing values and transforming data.
- Putting models in production: You never put Kaggle models in production, which is the real test of their performance.
While Kaggle is a great source of competitions and forums for ML hackathons and helps one get started on practical machine learning, it is also important to build a solid theoretical background. Here’s what we think: treat Kaggle as a starting point, and strengthen your theory alongside it to fill any gaps in your machine learning knowledge.
First up, let’s define a Kaggler proficient in ML: a) you have more than a basic knowledge of machine learning; b) you know how to use machine learning libraries/packages in R, Python, Java, etc.
AIM gives a lowdown on how you can make the most of your Kaggle machine learning experience:
Build a theoretical foundation: Data science practitioners believe machine learning is a life-long commitment, and most MOOCs don’t cover all the algorithms. You should go deeper on your own to fill the gaps in your learning.
Microsoft Azure Machine Learning: If you don’t want to sign up for an MS, you can try the free resources on Azure ML Studio. You can also try out solutions from the Cortana Intelligence Gallery, or even push your models into production in Azure ML.
Build a machine learning portfolio: Kaggle competitions are often panned for presenting clean datasets. Data wrangling is the missing piece in the puzzle: in a business setting, it forms a huge part of data science, from joining datasets and cleaning up missing values to transforming data and creating new features. A portfolio lets you show that work. Create a semi-formal work product for each project to share what was done, how it was done and what was learned (use GitHub README files on each repo, write blog posts, create PDF tech reports, PowerPoint slides, whatever).
Here are a few project ideas: a project where you had to collect the data yourself, e.g. scraping product reviews from a website, or a project where you dealt with missing or messy data, e.g. cases where some people provided their location and some didn’t.
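To make the second project idea concrete, here is a minimal sketch of the kind of wrangling it involves: joining two small datasets and handling users who did not provide a location. The records, field names and the `clean_location` and `wrangle` helpers are all made up for illustration, and the sketch sticks to plain Python so it runs without any extra packages.

```python
# Hypothetical records for illustration only; none of this data comes from a real source.
users = [
    {"user_id": 1, "name": "Asha", "location": "Bengaluru"},
    {"user_id": 2, "name": "Ravi", "location": ""},          # location left blank
    {"user_id": 3, "name": "Meera", "location": None},       # location never provided
    {"user_id": 4, "name": "John", "location": " mumbai "},  # messy casing and whitespace
]
reviews = [
    {"user_id": 1, "rating": 5},
    {"user_id": 2, "rating": 3},
    {"user_id": 2, "rating": 4},
    {"user_id": 4, "rating": 2},
    {"user_id": 9, "rating": 1},  # review with no matching user record
]


def clean_location(value):
    """Normalise a raw location string; return None when it is missing or blank."""
    if value is None:
        return None
    value = value.strip()
    return value.title() if value else None


def wrangle(users, reviews):
    """Join reviews to users and make missing locations explicit."""
    # Index users by id so each review can be matched to its author.
    by_id = {user["user_id"]: user for user in users}
    rows = []
    for review in reviews:
        user = by_id.get(review["user_id"])
        if user is None:
            continue  # inner join: drop reviews whose author is unknown
        location = clean_location(user["location"])
        rows.append({
            "user_id": review["user_id"],
            "rating": review["rating"],
            # Keep a placeholder plus a flag, so a model can still use the row.
            "location": location or "Unknown",
            "location_missing": location is None,
        })
    return rows


rows = wrangle(users, reviews)
# Share of joined reviews written by users with no usable location.
missing_share = sum(row["location_missing"] for row in rows) / len(rows)
```

On a real project you would more likely reach for a data-frame library such as pandas for the join and the missing-value handling. Either way, choices like dropping unmatched reviews or flagging missing locations instead of discarding them are exactly what is worth explaining in the project's README.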
Building a machine learning portfolio will go a long way in establishing that you can complete projects. It will also equip you with the confidence to take on more interesting projects as you apply your ML learning, and it will show your skills and capabilities to recruiters once you start looking for a job.
Prioritize your learning based on the application: Since machine learning is a broad field, it is better to select a specific area of study, and a Master’s degree can also help in landing a job interview. If you are geared towards a post-doctorate, which is also a good idea, there are a few application areas you can focus on. From Natural Language Processing to Computer Vision (think setting up GPU instances in AWS) and Deep Reinforcement Learning, ML offers several application areas for research.
Move up the ladder with Neural Networks for ML: Want to become technically stronger? Deepen your neural network knowledge with a course or free resources that cover artificial neural networks and how they’re being used for machine learning in areas such as speech recognition, image segmentation and object recognition.
Kick-start your career in machine learning: Now that you have mastered some of the basics of ML and become a more well-rounded machine learning practitioner, you will want to make it a full-time job. If you are proficient at ML, you will probably land a job as a data scientist and progressively move into doing more ML.