Kaggle is home to data scientists and machine learning practitioners from all over the globe. This large online community brings together everyone from beginners to seasoned professionals and lets them connect and learn. Kaggle is now a subsidiary of Google. It was founded in April 2010 by Anthony Goldbloom (founder and CEO), with Jeremy Howard (best known for creating the fastai library) and Nicholas Gruen among its early team. Back then, it only hosted machine learning competitions. Over the years, it has grown into a cloud-based public data platform that also teaches people about artificial intelligence. Kaggle has four main sections: Competitions, Datasets, Notebooks and Discussions.
In this article, we walk through the features and services Kaggle provides that have proven to be a boon for machine learning and data science practitioners.
The home feed is the first page that appears when you open Kaggle. It works much like a social media feed: notebooks from people you follow, and notebooks they have upvoted, appear here. Upvoting is a way of appreciating someone else's work, and upvotes count toward the medals and ranks I'll discuss later.
Each Kaggle user has a profile page with basic information that helps others get to know them: the current workplace and job title for professionals, or the university for students. Links to other public profiles can also be added; on my profile, the GitHub and LinkedIn links are shown as logos. Users can add a short description of themselves. The numbers of followers and people followed are displayed, with the full lists further down the page. The profile also shows progress and achievements in each of the four sections. At the bottom of the page, an activity log shows day-by-day activity.
In my view, Competitions is the most important section of Kaggle. Kaggle's data science competitions are considered benchmarks in the data science world. Each competition provides a real-world dataset and problem statement, along with the expected submission format, the evaluation metric and the submission deadline. Every submission is scored and ranked on a public leaderboard. Importantly, winners are decided by the private leaderboard, which is revealed only after the competition ends and is computed on a separate, held-out portion of the test data. Many competitions offer large cash prizes. There are also basic competitions for beginners to learn and practise on, such as house price prediction and flower classification, and most of these have stayed open since they were launched. Careful data analysis and modelling are key to a good solution, and validation accuracy often plays a vital role too. Finding what works takes a lot of experimentation. I consider this the best way to practise data science; by iterating, one can become an expert. You can compete individually or as part of a team.
A competition page
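The public/private leaderboard split can be illustrated with a small sketch. It assumes a classification task scored by accuracy and a fixed random 30/70 split of the test rows; real competitions define their own metric and split fraction, and the function names `accuracy` and `leaderboard_scores` are hypothetical, not part of any Kaggle API.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def leaderboard_scores(y_true, y_pred, public_fraction=0.3, seed=42):
    """Score one submission on a public/private split of the test set.

    The split is fixed by `seed`, so every submission is scored on the same
    rows. `public_fraction` and `seed` are illustrative values (assumptions),
    not Kaggle's actual settings. Returns (public_score, private_score).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    indices = list(range(len(y_true)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * public_fraction)
    if cut == 0 or cut == len(indices):
        raise ValueError("both the public and private splits must be non-empty")
    public_idx, private_idx = indices[:cut], indices[cut:]

    def score(idx):
        # Evaluate the metric on the chosen subset of rows only
        return accuracy([y_true[i] for i in idx], [y_pred[i] for i in idx])

    return score(public_idx), score(private_idx)


if __name__ == "__main__":
    labels = [i % 2 for i in range(100)]
    submission = [0] * 100  # a naive submission that always predicts class 0
    public, private = leaderboard_scores(labels, submission)
    print(f"public: {public:.3f}  private: {private:.3f}")
```

Because the public score is computed on only part of the test data, a model tuned heavily against it can overfit that subset and lose rank once the private leaderboard is revealed.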
Datasets is another important section. Users can upload their own datasets in the specified format, together with a clear description of the data and its possible use cases. Choosing a licence is important, because it states how others may use the data under copyright. For research and project work, existing datasets can be downloaded easily. A Kaggle starter kernel showing basic data analysis is also available alongside datasets.
Kaggle Notebooks or Kernels
In this section, people share their work as Kaggle notebooks, which are essentially Jupyter notebooks combining code with Markdown explanations. A lot can be learnt here about approaches and workflows, step by step. As you work, you can save versions of a notebook to keep track of each improvement or addition. Notebooks can also be forked, giving you your own copy to modify.
To add a dataset to a notebook, use the Add data button in the upper-right corner. In the dialog that pops up, you can either choose a Kaggle dataset or upload your own data from your local system, a GitHub repository, an external link or cloud storage.
The Discussions section is useful for interacting in the forums and responding to other people's work. We can add comments to clear up doubts or point others to useful resources.
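Once attached, the data appears under the `/kaggle/input` directory inside the notebook, and Kaggle's default notebook template starts with a short `os.walk` loop that prints those files. A sketch in the same spirit, wrapped in a hypothetical helper `list_input_files` whose root directory is a parameter so it can also run outside Kaggle:

```python
import os


def list_input_files(root="/kaggle/input"):
    """Return the path of every file under root, in sorted order.

    On Kaggle, attached datasets appear under /kaggle/input; outside Kaggle
    that directory usually does not exist, so an empty list is returned.
    """
    paths = []
    # os.walk yields nothing (and raises no error) if root does not exist
    for dirname, _, filenames in os.walk(root):
        for filename in filenames:
            paths.append(os.path.join(dirname, filename))
    return sorted(paths)


if __name__ == "__main__":
    for path in list_input_files():
        print(path)
```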
Kaggle Progression System
Kaggle has a progression system that awards medals and ranks. Medals come in three types: bronze, silver and gold. In Competitions they depend on leaderboard position, while in Datasets, Notebooks and Discussions they are based on upvotes. The system keeps users motivated by rewarding hard work in a competitive spirit. In each of the four sections, users progress through five tiers: Novice (for recently joined users), Contributor, Expert, Master and Grandmaster. Kaggle Grandmasters reach that tier through enormous effort and are recognised globally for their contributions.
Kaggle also offers free, hands-on courses on data science topics, from programming basics in Python and R to data analysis, data visualisation, machine learning algorithms, deep learning, computer vision, NLP, SQL and reinforcement learning. Each course is divided into topics, each with an exercise notebook, and a progress bar tracks your progress as you complete each topic. On completing a course, you receive a free certificate from Kaggle. These courses are taught by data science professionals to a high standard and are especially helpful for beginners.
Kaggle can be called a full-stack community for data scientists, as it supports them end to end, from learning and practice to job opportunities. It has tie-ups with many companies and lists vacancies for roles such as machine learning engineer, data analyst, data visualisation specialist, data modeller, data engineer and data scientist. Companies can post openings specifying the salary, role, required experience and qualifications. Some competitions are even held to recruit for top firms.
Since its founding, Kaggle has earned global recognition for its high-standard competitions, whose solutions have addressed real-world problems and have been used by organisations such as Microsoft, CERN, Merck and Adzuna. Many researchers have published peer-reviewed papers based on winning solutions from Kaggle competitions. Notable successful competitions include gesture recognition, chess ratings, HIV research and traffic forecasting. Kaggle also publishes blog posts on a range of topics, including write-ups of winning solutions.