An honest revelation of a data scientist

Time flies. It seems like only yesterday that I walked out of college as a novice statistician and into the big leagues. Yet, six years and a lot of mistakes later, I have grown from a junior analyst into a data science consultant. A lot has changed in the world of learning data science in these six years, but some misconceptions still linger.

In this article, I will address these misconceptions and draw a realistic picture of the world of data science.

Data Science Myths:

Most online data science courses and articles do an amazing job of giving a brief understanding of the technicalities of data science. But they also plant some well-known myths about data science in learners’ minds as if they were reality. It’s about time we burst these bubbles once and for all.

#1 Data science is all about building models.

If you spend enough time consuming data science content online, you will inevitably stumble upon terms like machine learning, artificial intelligence, neural networks and data modelling being thrown around. Unfortunately, the internet tends to overhype these keywords. In reality, data science requires one to thoroughly understand the data, identify its patterns, and create signals that support those patterns. Real-world data is messy and unstructured. It takes a lot of toil to bring the data up to a standard where any noticeable pattern can be found, let alone modelled. During my early days, I did almost no modelling at all; instead I was busy sourcing, validating and cleaning data, i.e. the mundane and unattractive part of data science. Only later did I understand why these mundane things really matter and why they are the most crucial part of any data science solution.
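To make the "validating and cleaning" part concrete, here is a minimal sketch in plain Python. The records, field names and validation rules are all invented for illustration; a real project would have its own schema and far more checks.

```python
# Hypothetical raw records: field names and values are invented for illustration.
RAW_ROWS = [
    {"customer_id": "101", "age": "34", "city": " Kolkata "},
    {"customer_id": "102", "age": "", "city": "Mumbai"},       # missing age
    {"customer_id": "101", "age": "34", "city": "Kolkata"},    # repeated id
    {"customer_id": "103", "age": "-5", "city": "delhi"},      # impossible age
    {"customer_id": "104", "age": "41", "city": "DELHI"},
]


def clean_rows(rows):
    """Trim and normalise text, reject invalid ages, and drop repeated ids."""
    cleaned, rejected, seen_ids = [], [], set()
    for row in rows:
        customer_id = row["customer_id"].strip()
        age_text = row["age"].strip()
        city = row["city"].strip().title()

        # Validate: age must be a whole number in a plausible range.
        if not age_text.isdigit() or not 0 < int(age_text) < 120:
            rejected.append((customer_id, "invalid age"))
            continue
        # De-duplicate on the identifier, keeping the first occurrence.
        if customer_id in seen_ids:
            rejected.append((customer_id, "duplicate id"))
            continue

        seen_ids.add(customer_id)
        cleaned.append({"customer_id": customer_id, "age": int(age_text), "city": city})
    return cleaned, rejected


cleaned, rejected = clean_rows(RAW_ROWS)
print(f"kept {len(cleaned)} of {len(RAW_ROWS)} rows")
for customer_id, reason in rejected:
    print(f"rejected {customer_id}: {reason}")
```

Even in this toy case, more than half of the rows never reach a model, and each rejection comes with a reason that can be reviewed with whoever owns the data.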

#2 We have all-powerful algorithms that can do everything.

In reality, every algorithm has its own advantages and drawbacks. One must carefully balance these trade-offs to get the best out of it. A deep understanding of an algorithm’s background, assumptions and workings helps in evaluating whether it applies to a particular situation. Moreover, judicious tuning of hyperparameters in even the most basic algorithms can give statistically better and more stable results than the default configuration of a high-end algorithm. I learnt the hard way that it’s better to stick to one algorithm and extract the best out of it than to bombard the data with every algorithm known to humankind.
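As a rough illustration of what tuning a basic algorithm looks like, the sketch below picks the number of neighbours for a k-nearest-neighbours regressor using leave-one-out cross-validation. Everything here is an assumption made for the example: the synthetic data, the candidate grid, and the choice of k = 1 as the "untuned" setting. It shows only the tuning step and does not compare against any high-end algorithm.

```python
import math
import random


def make_data(n=60, seed=0):
    """Synthetic 1-D regression data (an assumption, not from the article)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points closest to the query."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return sum(train_y[i] for i in nearest) / k


def loo_mse(xs, ys, k):
    """Leave-one-out mean squared error for a given k."""
    total = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        total += (knn_predict(train_x, train_y, xs[i], k) - ys[i]) ** 2
    return total / len(xs)


xs, ys = make_data()
DEFAULT_K = 1                   # hypothetical "untuned" setting
grid = [1, 3, 5, 7, 9, 15]      # candidate values to try
scores = {k: loo_mse(xs, ys, k) for k in grid}
best_k = min(scores, key=scores.get)

print(f"default k={DEFAULT_K}: LOO MSE = {scores[DEFAULT_K]:.3f}")
print(f"tuned   k={best_k}: LOO MSE = {scores[best_k]:.3f}")
```

Because the default sits inside the grid, the tuned score can never be worse than the default one. How much it improves depends entirely on the data, which is why the search has to be rerun for every new problem.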

#3 Data Science is a one-man army.

Online courses provide “real-like live projects” but leave out a key skill for any data science project: collaboration. Typically, a data science team consists of: i) a lead data scientist who provides overall guidance and manages the progress of the project, ii) a couple of senior data scientists who work on complex pattern recognition and solution design, iii) a bunch of junior data scientists who are still on the learning curve, and iv) some data engineers who get the data into the right format. You will need to communicate regularly with your team about what you are doing, how you are doing it, and what the results are. Your work will be evaluated and reviewed by the senior folks. You will have to work on a bunch of different tasks, be it data cleaning or pattern finding.

#4 There is a one-size-fits-all type of approach.

Sadly, each solution is different. The approach you work on depends on how the solution is designed, and designs vary widely because senior data scientists differ in their skills and in their understanding of data science approaches.

To this day, there are no standard operating procedures (SOPs) for how a data science project should be approached or how its intermediate processes should be handled. Even the internet is clueless about this: every other website offers a very different approach to the same problem. The lack of SOPs, combined with how differently data scientists interpret a problem, makes it extremely difficult to work in a team setting. The different skill levels make the team effort disjointed, and the project’s success rests solely on the capabilities of the most experienced and talented members.

Problems due to Non-Standard Operating Procedures:

The lack of standardization is fast becoming a major work-stopper for data science. Every solution passes through some basic steps, and because there is no standard procedure you will inevitably run into issues at these steps. Some data science professionals have devised ways to tackle them from their own experience, but not everyone has access to that knowledge. So a lot of time is spent looking for solutions on StackOverflow and similar platforms. Moreover, not every solution presented online is relevant, and it takes a lot of trial and error to find the one that fits.

An internal survey was conducted to measure the approximate time that data science practitioners across various Wipro teams allocate to the different steps of the data science workflow. The results were compiled and averaged to even out the differences in skill among the practitioners. They are shown in the table below.

| Sl. No. | Step | Description | Time (in mins) |
|---|---|---|---|
| 1 | Explore | Exploration of the best approach to the ML model | 235 |
| 2 | Fit | Testing whether the approach fits the problem | 80 |
| 3 | Implement | Implementing the best approach to the problem | 80 |
| 4 | Understand | Deep understanding of the data & creating features using data wrangling techniques | 50 |
| 5 | Model | Create and validate ML model | 50 |
| 6 | Production | Productionalize the ML Model using MLOps | 30 |
| 7 | Research | Ideate new use cases, Brainstorm, read about new technologies | 0 |

Table 1: Average time a data scientist allocates to each redefined step of the ML process in an 8.75-hour working day
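The figures in Table 1 are easier to compare as shares of the working day. The short sketch below copies the minutes from the table, checks that they add up to the 8.75 hours stated in the caption, and prints each step's percentage. The variable names are my own.

```python
# Minutes per step, copied from Table 1.
MINUTES = {
    "Explore": 235,
    "Fit": 80,
    "Implement": 80,
    "Understand": 50,
    "Model": 50,
    "Production": 30,
    "Research": 0,
}

total_minutes = sum(MINUTES.values())
total_hours = total_minutes / 60

print(f"total: {total_minutes} min = {total_hours:.2f} h")
for step, minutes in MINUTES.items():
    share = 100 * minutes / total_minutes
    print(f"{step:<11}{minutes:>4} min  {share:5.1f}%")
```

The steps sum to 525 minutes, which is exactly 8.75 hours, and “Explore” alone takes roughly 45% of the day.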

It was observed that while the time spent on “Understand”, “Model” and “Production” remains more or less the same for every data scientist, the make-or-break moment arrives in the “Explore” stage. The time spent here sets a novice apart from a master data scientist, and it exposes the massive skill gap between data scientists and how that gap affects the implementation of the whole project. Beyond this, data scientists nowadays have almost no “working minutes” left to learn about new technologies or brainstorm new ideas, so they inevitably extend their working hours to compensate. They are also cutting into the time dedicated to “Understand” and “Model”, which hampers the model’s quality and stability.


Standardization of ML processes is the need of the hour for any data scientist. The industry has of late started to understand the perils of a people-dependent approach to data science rather than a process-dependent one. A sufficiently well-equipped standardized ML process can reduce the burden on data scientists and let them use their resources better. A standardized procedure will also help democratize the ML modelling framework and help create ML models that meet higher benchmarks.

Siladitya Sen
Siladitya Sen is a Business Analyst at Wipro Limited. He received his M.Sc. in Statistics from Presidency University, Kolkata. He has close to 7 years of experience in the field of data science. He is proficient in building classical statistical models as well as machine learning and AI models.
