Kaggle, a Google LLC subsidiary and online community of data scientists and machine learning practitioners, has recently released the fifth edition of its in-depth survey, Kaggle State of Data Science and Machine Learning 2021.
More than 25,000 data scientists and ML engineers submitted responses about their backgrounds and day-to-day experience, covering everything from education and salaries to preferred technologies and techniques. However, the report focuses only on the 14 per cent of respondents who are currently employed with the job title 'data scientist.'
The State of Data Science and Machine Learning 2021 report is organised into five sections, including:
- Data scientist profile
- Data science and machine learning experience
Today, we list some of the key insights from the survey:
Data scientist profile
- Gender gap persists: Data science still suffers from a large gender gap in the workplace, with 82 per cent of respondents identifying as men. There has been no significant change in gender distribution over the past five years.
- Attracting the young crowd: Data science remains a 'fairly young profession', with more than half of all respondents aged between 22 and 34.
- Data scientists are spread across the world. More than 40 per cent of the data scientists live outside the ten countries where Kaggle had the most respondents.
- About 24.4 per cent of Kaggle data scientists are from India, 12.2 per cent from the US and under 4.3 per cent from Brazil.
- Most data scientists (over 62 per cent) hold an advanced degree, either a Master's or a doctorate, while fewer than five per cent have only a high school diploma.
- Although about 64 per cent of participating data scientists have an advanced degree, it has become increasingly common over the years for data scientists not to have one.
- Because the field is constantly evolving, most data scientists and machine learning practitioners keep learning to stay at the forefront of their field. Coursera remains the most popular data science learning platform.
- Kaggle Learn Courses has seen nine per cent growth since 2020.
Data science and ML experience
- A larger share of Kaggle data scientists took up programming within the past year (14.6 per cent in 2021, compared with nine per cent in 2020).
- More than 55 per cent of the Kaggle data scientists have less than three years of experience in ML.
- Less than six per cent have been using ML for more than a decade.
- US-based data scientists have more ML experience than their global counterparts.
- US-based companies are the most likely to pay data scientists a six-figure salary. Companies elsewhere have lower and more evenly distributed salary ranges.
- After the US, companies in Germany and Japan are the most likely to pay higher salaries.
- Most US-based data scientists make over $100,000 per annum.
- Less than three per cent of India-based data scientists make over $100,000 per annum.
- In India, nearly 90 per cent of the data scientists make less than $50,000 per year.
- As in the previous year, over half of the survey respondents work at companies with five or fewer people in the data science department. However, one in every five respondents works on a team with more than 20 data scientists.
- More than a quarter of respondents claim to have spent no money on ML and cloud computing projects. However, one in every 10 data scientists has spent over $100,000 on projects in the last five years.
- Interestingly, data scientists from the US spend more money in the cloud than their global counterparts.
- Around three-quarters of Kaggle data scientists continue to use Jupyter-based IDEs as their go-to tool, followed by Visual Studio Code, which is used by 38 per cent of data scientists.
- Similar to last year, linear and logistic regression continue to be the most commonly used algorithms, followed by decision trees and random forests.
- Among more complex methods, gradient boosting machines and convolutional neural networks are the most popular approaches.
- There has been an increase in the use of large language models such as BERT and GPT-3.
- Python-based tools dominate the ML frameworks, with Scikit-learn topping the list (used by 80 per cent of data scientists). It is followed by TensorFlow, Keras, and XGBoost.
- Hugging Face is the most popular of the new tools added to the survey, with 10 per cent of data scientists using it.
- PyTorch has been growing strongly.
- Amazon Web Services, Google Cloud Platform and Microsoft Azure continue to lead the enterprise cloud computing game.
- Amazon Elastic Compute Cloud is the most popular cloud computing product. However, Google Cloud Compute Engine and Azure Virtual Machines also have strong adoption. Similarly, Amazon’s Simple Storage Service (S3) is the most popular data storage product among data scientists; however, Google Cloud Storage and Azure Data Lake Storage are also popular.
- Amazon SageMaker is the most popular choice for enterprise ML customers, followed by Databricks, which is almost on par with Azure ML Studio (at 13 per cent). Google Cloud Vertex AI is preferred by eight per cent of data scientists.
- MySQL, PostgreSQL and Microsoft SQL Server are the most popular databases.
- TensorBoard is the most popular tool for managing ML experiments, used by 22.3 per cent of data scientists, followed by MLflow at 18 per cent.
- Google Cloud AutoML remains the most popular product in the AutoML category, and its adoption continues to grow steadily.
Kaggle has shared the complete dataset of survey responses for its community to review. It is also running a competition until November 28 to gain more insights into data science practitioners in 2021.
You can access the survey results here.
At AIM, we have published a report on the ‘State of Artificial Intelligence in India 2021’; you can access that here.
After diving deep into the Indian startup ecosystem, Debolina is now a Technology Journalist. When not writing, she is found reading or playing with paint brushes and palette knives. She can be reached at firstname.lastname@example.org