Data Scientists Spend 45% Of Their Time In Data Wrangling

The demand for data science has grown sharply in recent years, and even with the economic downturn caused by the COVID outbreak, organisations are investing more in building data science capabilities. However, despite significant investments of time, money, effort and human resources, data science still fails to deliver sustained business value.

In fact, a recent report, The State of Data Science 2020, revealed that one of the significant concerns in data science is getting models into production, a step that is usually delayed or stopped along the way. One of the primary reasons is the slow data collection process: according to the report, data scientists spend on average 45% of their time getting data ready before they can utilise it for developing models. So much time spent on loading and cleansing data takes valuable time away from data scientists, which in turn impacts their overall productivity.

Further, the report stated that even after these struggles, once models are ready for production, data scientists face several other challenges, such as dependencies on management and skill gaps, before projects actually go into production. These production struggles prevent approximately 50% of respondents from showcasing the true impact of data science on business outcomes.

Peter Wang, CEO and co-founder of Anaconda, stated in the report that data science has limitless possibilities to transform businesses. However, the numbers highlight that organisations as well as professionals are still in the developing phase and need to do more work in order to get the maximum value out of data science. “From broadening the data science educational curriculum to being more intentional with open-source security, there are clear learnings here for the industry at large to implement in order to improve,” said Wang.

The Efficiency Gap 

The COVID pandemic has disrupted industries and forced companies to rely on data science capabilities to develop strategies for getting through this difficult period. However, most organisations struggle to maximise the value of their data science. Much of this can be attributed to the siloed data that many companies work with, which forces data scientists to spend most of their time collecting and consolidating data from disparate sources and then cleaning it to make it usable for building models.

According to the report, “the data analysis work done in isolation can offer important insights, but falls short of the discipline’s full potential to transform businesses and offer a competitive advantage.”

Most data today resides in different formats, including video, audio, text files and images, which makes it challenging for data scientists to implement AI, as all the data needs to be gathered in a single place and cleaned to make it usable. In fact, according to the report, 80% of the world’s data is unstructured, which allows businesses to gain visibility into only a small portion of it. Such insufficient data can lead to the failure of AI projects.

“Getting the data is the key challenge,” said Bill Inmon, the father of data warehousing. He believes that once the data is collected, the rest of the data science is not trivial. “… we spend 98% of our time gathering, finding and cleansing data. I don’t understand why they don’t have a regular part of the curriculum that focuses on getting the data,” said Inmon. 

This can also be a major reason for data scientists to lose motivation in their jobs and look for a change, which in turn raises attrition rates at companies. Along with siloed data, data science teams also lack collaboration, and work on ML pipelines takes place in isolation, which further delays putting a model into production.

To resolve this, according to the report, data scientists should learn to communicate the value of their work, and business leaders should remove the barriers to data science deployment in their organisations.

Bridging The Skills Gap

The report further explained that the skills gap is another critical reason the majority of models fail to reach production. Although many universities and organisations are expanding their data science courses and upskilling offerings, according to the report there is a significant gap between the skills graduates have and what enterprises need today. In fact, the report stated that two of the most significant gaps are in big data management (38%) and engineering skills (26%), as these are not among the top 10 skills offered by university courses.

Thus, enterprises lack data science professionals with the skills that businesses actually require to stay relevant in this uncertain time. In fact, the report found that experience (40%), technical skills (26%) and soft skills (18%) are the key obstacles data scientists face in obtaining their ideal job.

To resolve this, the report suggested that edtech companies should collaborate more with corporates and enterprises to create courses that are relevant to businesses. These courses should ensure that learning goes beyond resume enhancement and provides hands-on experience with technical skills. This can be reinforced with internships, which help new professionals face the real challenges of the data science industry. “Serving as a ‘data translator,’ demonstrating business impact from their work, and influencing colleagues cross-functionally to address production roadblocks and secure access to resources,” the report stated.

Alongside this, business leaders should rethink their strategies for retaining data science talent by providing the necessary upskilling opportunities and ensuring clarity in career paths. The report also stated that cross-training employees can bring many benefits for companies. Here, business leaders should encourage their employees to learn and reskill by cross-training with domain experts to improve outcomes. In fact, experts believe that data scientists should be trained across multiple domains in order to continue their professional development and increase their relevance in organisations.

Furthermore, even though ethics and business knowledge have been critical concerns in the data science industry, the report revealed that only 15% of universities actually offer any programmes or training on ethics and business knowledge. Experts believe that without sufficient understanding of business fundamentals and the ethical practices of data science, no professional can become a data science unicorn.

Wrapping Up

To summarise, the report stated that despite organisations relying on data science capabilities to run their businesses amid the COVID pandemic, the data science discipline still has a long way to go before it fully matures. The report’s experts believe that data science continues to become more integrated into the core business functions of organisations. However, to realise its full potential, business leaders must focus their attention on data collection and on bridging the skills gap for faster model production.

Sejuti Das
Sejuti currently works as Associate Editor at Analytics India Magazine (AIM).