The Data Science Skills study is a survey-based report highlighting the skills that industry professionals consider to be in high demand. The report identifies the tools, technologies, and skills across categories that are currently in use or that are essential to learn for a career in data science. It further identifies the suitability of different skills by years of experience and by sector. It also discusses the time that practising and non-practising data science professionals spend learning these skills through different formats.
Data science and its applications are becoming more common in a rapidly digitising world. As a result, many students/professionals from different disciplines seek sources that can help them understand the key skill sets required to kickstart/stay relevant for a career in data science. Recruiters or industry professionals also need to gauge what tools are in higher demand and why. This report presents a comprehensive view to all the stakeholders — students, professionals, recruiters, and others — about the different key data science tools or skillsets required to start or advance a career in the data science industry.
The report has been developed after rigorous primary research through a survey distributed to data scientists and leading AI/ML practitioners. This was complemented by direct discussions with job-seekers to understand and gauge their perspective on the in-demand skills in this domain.
Key findings:
- 84.4% of professionals mentioned that recruiters look for Machine Learning as the most crucial skill at the time of hiring, followed by Statistics at 78.9%.
- More than one in two (55.7%) professionals spend time upskilling every week.
- 61.7% of Data Science professionals are learning Cloud Technologies to upskill.
- Almost nine in ten (87.8%) Data Science professionals mentioned that knowledge of programming languages (R, Python, SAS) is one of the most basic skills to kickstart a career in Data Science.
- More than nine in ten (90.6%) professionals use Python as a programming language for Statistical Modelling.
- MS Excel (63.3%), Tableau (56.7%), and MS Power BI (43.9%) are the three most used tools for data visualisation.
- More than three in four (77.8%) professionals use Conventional ML Models like Regression, Logistic Regression, Decision Tree, SVM, Naive Bayes, etc.
Common skills looked at by recruiters
84.4% of professionals mentioned that recruiters look for Machine Learning as the most crucial skill during hiring
Almost two in three professionals with less than 3 years of experience said recruiters consider Data Visualisation as a must-have skill when hiring—this number reduces for respondents with more years of experience
Nine in ten professionals from the BFSI and Pharma & Healthcare sectors said recruiters look for Statistics as one of the core skills during hiring
According to 84.4% of respondents (more than 4 out of 5), recruiters consider Machine Learning a top skill when hiring data scientists. This is followed by proficiency in Statistics (78.9%) and Communication (72.8%), which respondents cited slightly more often than Programming Knowledge (70.0%). 62.2% of respondents (3 in 5) stated that recruiters look for Data Wrangling and Preprocessing skills, whereas 55.6% (more than 1 in 2) said recruiters look for Data Visualisation as a skill set.
92.3% (9 out of 10) professionals with more than 10 years of experience think recruiters consider Machine Learning a common skill, compared to 81.9% of respondents with less than 3 years of experience. The share of professionals with more than 10 years of experience who agreed that Communication and Big Data skills are in demand was 1.4 and 1.2 times higher, respectively, than the share among those with less than 3 years of experience.
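The "times higher" comparisons above are ratios between the shares reported by two experience brackets. The following is a minimal sketch of that arithmetic, using the only pair of shares quoted in this paragraph (Machine Learning, 92.3% versus 81.9%). The helper name `share_ratio` is hypothetical and not part of the report; the underlying shares for Communication and Big Data are not quoted here, so they are not reproduced.

```python
def share_ratio(share_a: float, share_b: float) -> float:
    """Return how many times larger share_a is than share_b (both in percent)."""
    if share_b <= 0:
        raise ValueError("share_b must be positive")
    return share_a / share_b


# Shares quoted in the paragraph above (percent of respondents who said
# recruiters look for Machine Learning).
ml_over_10_years = 92.3
ml_under_3_years = 81.9

ratio = share_ratio(ml_over_10_years, ml_under_3_years)
print(f"Machine Learning: {ratio:.2f}x")
```

Read this way, a figure such as 1.4 means the share in the more experienced bracket is 40% larger in relative terms, not 40 percentage points larger.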
4 out of 5 IT professionals said that recruiters prioritise critical skills such as Machine Learning (84.3%), Statistics (81.4%), Communication (81.4%) and Programming Knowledge (81.4%). Similarly, 9 out of 10 (90.0%) BFSI and Pharma & Healthcare professionals said that Statistics is one of the core skills that recruiters seek. Respondents from the BFSI sector also agreed that Machine Learning is one of the most desired skills.
The share of professionals who agreed that domain knowledge was important was the highest (60.0%) in Pharma & Healthcare. Presentation skills were considered noticeably more important in Pharma & Healthcare (70.0%) and Retail, CPG, & E-commerce (73.7%) compared to other industries.
Need for upskilling
Data Science professionals are critical to a company’s development, innovation, and decision-making processes, and they must be able to adapt to an ever-changing digital world.
Upskilling therefore helps professionals broaden the abilities and knowledge required for future employment, opportunities and success. 98.6% of respondents agree with the need for continuous upskilling in the field.
Time invested in upskilling
One in two Data Science professionals spend time upskilling themselves weekly
Almost two in three Data Science professionals in the Retail, CPG and E-Commerce industry upskill weekly
3 in 4 Data Science professionals with less than 3 years of work experience engage in upskilling weekly, while more than half of the professionals in the 3-6 year work experience bracket upskill weekly
According to the survey responses, 55.7% professionals spend time upskilling weekly. Around 22.8% spend time every month, while 11.9% do it quarterly. A meagre 5.9% do it annually, and 3.7% never upskill.
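As a quick consistency check on the figures above, the five frequency categories should account for all respondents, and cumulative shares follow by simple addition. This is a minimal sketch using only the percentages quoted in this paragraph; the variable names are illustrative.

```python
# Upskilling frequency shares quoted above (percent of respondents).
upskilling_frequency = {
    "weekly": 55.7,
    "monthly": 22.8,
    "quarterly": 11.9,
    "annually": 5.9,
    "never": 3.7,
}

# The categories are mutually exclusive, so they should sum to about 100%.
total = sum(upskilling_frequency.values())

# Cumulative shares: respondents who upskill at least monthly / at least quarterly.
at_least_monthly = upskilling_frequency["weekly"] + upskilling_frequency["monthly"]
at_least_quarterly = at_least_monthly + upskilling_frequency["quarterly"]

print(f"total: {total:.1f}%")
print(f"at least monthly: {at_least_monthly:.1f}%")
print(f"at least quarterly: {at_least_quarterly:.1f}%")
```

The shares add up to 100%, and roughly four in five respondents upskill at least once a month.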
Professionals with less than 3 years of experience are the most active in upskilling themselves. 72.2% (3 out of 4) Data Science professionals with less than 3 years of experience upskill weekly. 56.6% professionals with 3-6 years of experience also upskill weekly, but a significant share of these professionals (28.3%) upskill on a monthly basis. Similarly, 31.0% (1 out of 3) professionals with 6-10 years of experience prefer to upgrade their skills quarterly.
Professionals from the Retail, CPG and E-Commerce sectors are the most active, with 63.6% updating their skills weekly. On the other hand, 35.1% of Data Science professionals from the BFSI sector upskill monthly.
New skills data scientists are learning
Three out of five Data Science professionals are learning Cloud technologies to upskill
70% professionals working in BFSI stated that they have upskilled in MLOps
Cloud technologies, MLOps, and Advanced Deep Learning Models like Transformers are the top 3 new skills Data Scientists/Analysts are trying to learn or upskill in
To remain relevant to the industry’s current needs, Data Science professionals continuously update their skills. As per the survey, 61.7% (3 out of 5) professionals said they are upgrading their skills in Cloud technologies (Azure, AWS, GCP). Following that, 56.1% of professionals are learning MLOps and 55.0% are learning Transformers.
The most popular skill to acquire among professionals with more than 10 years of experience is MLOps, with 73.1% (almost 3 out of 4) learning techniques to scale ML models, one of the most pressing concerns in the industry. This is followed by Reinforcement Learning (57.7%), Cloud Technologies (57.7%), Transformers (57.7%) and others. Professionals with 3-6 years of experience are more inclined towards acquiring Cloud technologies (71.7%) as a core new skill, followed by MLOps (62.3%), Transformers (60.4%) and others.
Professionals working in the Retail, CPG and E-Commerce sectors are more inclined towards learning Cloud technologies (73.7%) as a new skill. On the other hand, professionals in the BFSI sector are more likely to learn MLOps (70.0%) as a new skill set. Similarly, professionals in the Pharma & Healthcare sector are interested in learning Transformers (70.0%) and Computer Vision (60.0%) as core skills.
Cloud for data analysis is in high demand and that is reflected in the high share of professionals choosing to upskill in the technology.
Basic skills needed for a data science career
Nine out of ten Data Science professionals mentioned that knowledge of programming languages (R, Python, SAS) is the most basic skill to start a career in Data Science
Four in five professionals said that Statistics is an important basic skill to start a Data Science career
Programming (in R, Python, SAS), Statistics, and a basic understanding of Machine Learning are considered to be the top 3 basic skills for a career in Data Science
According to the survey, 87.8% (9 in 10) respondents said that knowledge of programming languages like Python, R, or SQL is the most basic skill to kickstart a career in Data Science/Analytics. This is followed by knowledge of statistics (80.6%) and a basic understanding of ML (75.6%).
All (100.0%) respondents with more than 10 years of experience said that the ability to code in statistical programming languages is a must-have skill to start a career in Data Science. This is followed by knowledge of statistics and basic Machine Learning concepts, both at 80.8%. Similarly, five in six (83.3%) Data Science professionals with less than 3 years of experience think that knowledge of statistics is a must. A significantly higher percentage of professionals with 3 to 6 years of experience (77.4%) said that Data Wrangling and Preprocessing skills are important, compared to professionals in other experience brackets.
In terms of industries, 94.7% (9 out of 10) survey respondents in the Retail, CPG, & E-Commerce sector said that knowledge of ML concepts is the most basic skill to start a career in Data Science. The demand for Statistics (86.7%) is the highest among BFSI professionals, and the demand for Data Visualisation skills is highest in Pharma & Healthcare (70.0%). By and large, respondents across all industries agreed that knowledge of a programming language is the most important skill to start a career in Data Science.
More than three in four professionals claiming that basic ML understanding is a must-have skill for a career in Data Science is indicative of increasing maturity in the field.
Languages used for statistical modelling
Nine in ten professionals use Python for statistical modelling
Python, SQL, R are the top three languages preferred by Data Scientists
Data science professionals with more than 10 years of experience are 3.3 times more likely to use SAS than those with less than 3 years of experience
Python is the most popular programming language in Data Science, with nine in ten (90.6%) Data Science professionals saying they use it for statistical modelling. After that, SQL and R were preferred by 52.8% and 38.3% of participants, respectively.
Years of experience play a prominent role in the choice of some languages used by Data Science professionals. For instance, data scientists with more than 10 years of experience are 3.3 times more likely to use SAS than those with less than 3 years of experience. Similarly, they are 1.8 times more likely to use R.
Python remains the most used programming language across all the sectors, with at least eight out of ten professionals in every industry surveyed saying they use it. Apart from that, the use of SQL (68.4%) is highest in Retail, CPG and E-commerce, followed by IT at 62.9%. R is the most commonly used programming language in the Pharma & Healthcare sector, with three in five (60.0%) professionals claiming they use it for statistical modelling.
Enterprises prefer languages like Python and R over SAS, not just because of the cost factor but also because technologies are often first released on open source.
Despite the cost factor, SAS is still widely used in Pharma & Healthcare (20.0%) and BFSI (23.3%), since it is a preferred tool for clinical trial data analysis and is seen as offering better security.
Data Visualisation tools
MS Excel is the most widely used visualisation tool, with two in three analytics professionals using it
MS Excel, Tableau, and MS Power BI are the three most used tools for Data Visualisation
MS Excel is used by 84.6% professionals with more than 10 years of experience
Despite all the technological advancements in Data Science, the use of MS Excel remains high, especially when building data visualisations. 63.3% (2 in 3) analytics professionals said that they use MS Excel. This is followed by Tableau (56.7%), Power BI (43.9%), and QlikView (12.2%).
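One way to read these figures: unlike mutually exclusive categories, the tool shares add up to well over 100%, which can only happen if respondents reported using more than one tool. The sketch below ranks the tools and sums their shares using only the percentages quoted above. The multi-select interpretation is an inference from the arithmetic, not something the report states.

```python
# Visualisation tool usage shares quoted above (percent of respondents).
tool_usage = {
    "MS Excel": 63.3,
    "Tableau": 56.7,
    "Power BI": 43.9,
    "QlikView": 12.2,
}

# Rank tools from most to least used.
ranked_tools = sorted(tool_usage, key=tool_usage.get, reverse=True)

# A total above 100% implies overlapping answers (respondents naming
# several tools), since a single-choice question could not exceed 100%.
total_share = sum(tool_usage.values())
multiple_tools_reported = total_share > 100.0

print(ranked_tools)
print(f"sum of shares: {total_share:.1f}%")
```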
The utilisation of MS Excel (84.6%) is especially high among people with more than 10 years of experience. On the other hand, Tableau is the preferred choice for professionals between 3-6 years (50.9%), followed by MS Excel (45.3%) and Power BI (34.0%). Similarly, Data Science professionals with 6-10 years of experience also prefer Tableau.
People with 3-10 years of experience are more hands-on and use comparatively more complex tools like Tableau for dashboards than just MS Excel.
By sectors, Tableau is the most popular tool in Pharma & Healthcare according to four out of five (80.0%) professionals who said they use it for data visualisation. Similarly, 65.7% of IT respondents said they use Tableau compared to 61.4% who use Power BI and 58.6% that use Excel. On the other hand, MS Excel remains the most used tool for Data Visualisation in all the other surveyed sectors.
Data Science models
Three out of four Data Science professionals use Conventional Machine Learning models on a regular basis
Two in five data science professionals use Convolutional Neural Networks
Five out of six professionals with 10+ years of experience said they use RNNs
Conventional Machine Learning models like Linear Regression, Logistic Regression, Decision Tree, SVM, Naive Bayes, etc. are the most utilised ML techniques among Data Science professionals: more than three out of four (77.8%) respondents said they use them on a regular basis. This is followed by CNN at 40.0%, LSTM at 31.7%, and RNN at 28.3%.
Data Science professionals who are in the early stage of their careers prefer using Conventional Machine Learning models since they are just starting out. 61.1% (3 out of 5) respondents with less than 3 years of experience use Conventional Machine Learning models. However, with more experience, data scientists venture into complex models. You can observe an increased use of Neural Networks and Deep Learning models among professionals with 3-6 years of experience. Around 77.4% of them use CNN, 47.2% use RNN, and 47.2% use LSTM. In the 6-10 years experience bracket, the use of these models is lower. However, utilisation goes up again for professionals with more than 10 years of experience, since they need to keep up to date with the latest technologies and experiment with state-of-the-art and complex models for research.
Conventional Machine Learning models are the preferred choice of professionals across sectors. Following that, specific industries show a preference for certain models. For instance, CNN is widely used in the IT (44.3%) and BFSI sectors (43.3%) since both these industries see a wide array of applications in segmentation or classification.
Similarly, LSTM (60.0%) and RNN (50.0%) models are widely used in Pharma & Healthcare. 15.8% (1 in 6) data scientists working in Retail, CPG and E-Commerce use Multilayer Perceptrons (MLPs) and 13.3% (1 in 8) professionals working in the BFSI sector use Generative Adversarial Networks (GANs).
Freshers start out with Conventional ML Models but soon experiment with complex Deep Learning Models or Neural Networks as they gain work experience.