The popular consensus is that data scientists need algorithmic knowledge, coding skills and the ability to visualise and interpret data. While these technical skills are at the core of data science, non-technical skills are just as important to the role. Chief among them is problem-solving, arguably the most important ability a data scientist can have.
In the real world, problems are often stated vaguely and broadly. For instance, data scientists often get requests from organisations such as “We want to increase the profits for our business” or “We want to predict loan defaulters”. Translating a vague problem into a clear, measurable and precise problem statement is the first and most important step for business consultants and data scientists alike. Conversely, a wrong problem statement can derail an entire data science project.
Data scientists can use a simple three-step process to define a problem:
- Understand the landscape
- Break the broad problem into smaller bits
- Narrow the options and finalise the problem statement
Let’s explore this process in more detail.
Understand the landscape
The first step, before focusing on the actual problem, is understanding its context: who the stakeholders are and where the boundaries of the problem lie. Data scientists can build this understanding by answering key questions about the industry landscape and the current scope, partly by reading up on relevant material and partly through stakeholder interviews. The essential questions are listed below.
- What is the industry and company context?
You need to understand how the industry operates and what drives it in order to pinpoint the context of the specific problem statement. For instance, someone working in the retail industry should know how the retail supply chain works, how stores are organised, and how customers behave and respond to pricing and promotions. It is equally crucial to have a clear understanding of how these trends are evolving with the post-pandemic behavioural shift towards online shopping.
- Who are the stakeholders that can provide me with information?
These are the business owners who are clear about their goals and metrics, and they can provide you with the necessary data and clarity. It is essential to navigate the organisation to understand who the stakeholders are and how accessible they are to you. Your first point of contact or your key buyer can help you greatly with this navigation.
- What are the scope boundaries?
Sometimes, data scientists do not realise that they are working with the wrong scope until it is too late. It is therefore essential to get clarity on the boundaries upfront. When taking up a project, ask questions such as: Is the request at a city, regional, country or global level? Which business units and products are we talking about? What should be excluded?
- What are the key success criteria?
The key success criteria need to be agreed upon with the stakeholders. These might include the acceptable level of profitability and the timeframe for the project. Stakeholders may also come up with success criteria you had not thought of. In some situations, it is fine if stakeholders do not clearly agree on some of these criteria: note the points of disagreement and revisit them once the problem is defined more specifically.
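One way to keep the answers to these four questions in one place is a small structured record, sketched below in Python. The `ProblemCharter` class, its field names and the example values are hypothetical illustrations, not a standard template; the only logic is a helper that lists which landscape questions still lack an answer, plus a field for stakeholder disagreements to revisit later.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProblemCharter:
    """Hypothetical record of what was learnt while understanding the landscape."""

    industry_context: str
    stakeholders: list[str]
    scope: dict[str, list[str]]       # e.g. geography, business units, products
    excluded: list[str]               # what is explicitly out of scope
    success_criteria: dict[str, str]  # criterion -> agreed target
    open_disagreements: list[str] = field(default_factory=list)  # revisit later

    def unanswered(self) -> list[str]:
        """Return the landscape questions that still have no recorded answer."""
        gaps = []
        if not self.industry_context.strip():
            gaps.append("What is the industry and company context?")
        if not self.stakeholders:
            gaps.append("Who are the stakeholders that can provide me with information?")
        if not self.scope:
            gaps.append("What are the scope boundaries?")
        if not self.success_criteria:
            gaps.append("What are the key success criteria?")
        return gaps


# Hypothetical example: success criteria have not been agreed yet.
charter = ProblemCharter(
    industry_context="Retail chain seeing a post-pandemic shift to online shopping",
    stakeholders=["Head of Pricing", "Regional Operations Lead"],
    scope={"geography": ["country level"], "business_units": ["online grocery"]},
    excluded=["wholesale"],
    success_criteria={},
    open_disagreements=["Timeframe: one quarter or one year?"],
)
print(charter.unanswered())  # ['What are the key success criteria?']
```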
Once you have developed a detailed understanding of the landscape, the next stage is to break the actual problem down into smaller parts and understand each of them.
Break the problem into smaller bits
There are many frameworks that people use for a deep dive into a broadly stated problem. However, the one that has always worked best for me is a derivative of the famous ‘Five Whys’ technique.
The trick is to keep asking ‘why’ until you reach a point where it no longer makes sense to ask any further. It is akin to a child’s inquisitiveness: we are all familiar with young children who keep asking ‘why’, followed by another ‘why’, followed by yet another. That is exactly what we need to do here. Shed your inhibitions and go deeper into the problem to understand its underlying dimensions.
Three things are important to follow at this stage:
1. Explore all the potential ‘why’s.
The traditional ‘Five Whys’ technique asks ‘why’ in quick succession, with each question following from the previous answer, going deeper and deeper until you are satisfied. When crafting a problem statement, I prefer a modified version of this technique. Here is how it works: ask ‘why’ and explore every potential driver, rather than moving on to the next ‘why’ as soon as you get one satisfactory response. Then, for each of these responses, go deeper still and ask ‘why’ again. This leads us to the MECE principle.
2. Follow the MECE principle.
MECE (pronounced ‘mee-see’) stands for Mutually Exclusive, Collectively Exhaustive. To illustrate the principle, potential answers when asking why profits have decreased could be higher input costs, higher production costs, higher sales and marketing costs, higher administrative and overhead costs, or lower sales revenues. Together, these answers are collectively exhaustive, and because no driver overlaps with another, they are also mutually exclusive.
3. Map these into a tree diagram for clarity.
The last step in understanding the problem is organising it. Breaking the problem into smaller parts is a reliable way to get a broader and clearer sense of the problem at hand. You can do this by creating a tree diagram, with the broad problem statement as the first node. Each branch below it explores the problem in more detail by asking a further ‘why’, and the branches at every level should follow the MECE principle. Plotting the potential reasons this way gives you clarity on the various dimensions of the problem.
For example, the root node of the tree could be ‘profits are lower than desired’. Its branches would be drivers such as ‘higher input costs’, ‘higher production costs’ and ‘lower sales revenues’, and each driver, for instance ‘lower sales revenues’, would in turn have its own potential reasons plotted beneath it.
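The modified ‘Five Whys’ and the tree diagram can be sketched as a small data structure. In this Python sketch, `IssueNode`, `add_whys`, `render` and `repeated_siblings` are hypothetical names chosen for illustration. `add_whys` records every candidate answer at one level at once (breadth before depth, as in the modified technique), while `repeated_siblings` is only a crude, literal check for overlapping siblings; judging mutual exclusivity and exhaustiveness properly still needs human judgement. Splitting ‘lower sales revenues’ into volume and price is an illustrative assumption, not part of the original example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IssueNode:
    """A node of an issue tree: the problem statement or one candidate answer to 'why?'."""

    statement: str
    children: list[IssueNode] = field(default_factory=list)

    def add_whys(self, *answers: str) -> list[IssueNode]:
        """Record every candidate answer to 'why?' at this level before going deeper."""
        new_nodes = [IssueNode(answer) for answer in answers]
        self.children.extend(new_nodes)
        return new_nodes


def render(node: IssueNode, depth: int = 0) -> list[str]:
    """Return the tree as indented text lines, walking each branch depth-first."""
    lines = ["    " * depth + node.statement]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


def repeated_siblings(node: IssueNode) -> list[str]:
    """Flag sibling answers that repeat (ignoring case) anywhere in the tree.

    This only catches literal overlap; whether two differently worded
    drivers overlap still needs human review.
    """
    seen: set[str] = set()
    repeats: list[str] = []
    for child in node.children:
        key = child.statement.strip().lower()
        if key in seen:
            repeats.append(child.statement)
        seen.add(key)
        repeats.extend(repeated_siblings(child))
    return repeats


root = IssueNode("Profits are lower than desired")
*_, revenue = root.add_whys(
    "Higher input costs",
    "Higher production costs",
    "Higher sales and marketing costs",
    "Higher administrative and overhead costs",
    "Lower sales revenues",
)
# Illustrative split (revenue = volume x price); an assumption, not the article's.
revenue.add_whys("Fewer units sold", "Lower price per unit")

print("\n".join(render(root)))
print(repeated_siblings(root))  # []
```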
It is also important to ensure that the ‘why’s at each stage are grounded in reality and not wishful thinking. This is where the understanding of the industry, as well as discussions with stakeholders, becomes key.
Narrow the options & finalise the problem statement
After creating a detailed issue tree that looks at the problem comprehensively, the next step is to knock out nodes that are not relevant. This is where data starts to play a key role. Although a detailed data analysis is often not needed, a quick summarisation of high-level data does a good job of indicating which nodes are more relevant than others. This can be further validated through stakeholder interviews.
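As a sketch of this quick summarisation, the Python below takes two periods of high-level figures for the first-level drivers of the profit tree, works out how much each driver contributed to the drop in profit, and keeps only the nodes above a cut-off. All figures are invented for illustration, the field names are hypothetical, and the 20% cut-off (`min_share`) is an arbitrary assumption; nodes knocked out this way should still be checked with stakeholders. Because profit here is defined as revenue minus the listed costs, the contributions add up exactly to the total drop, which is the arithmetic counterpart of the first level being collectively exhaustive.

```python
from __future__ import annotations

COST_DRIVERS = ("input", "production", "sales_marketing", "admin_overhead")

# Hypothetical high-level figures for two periods, invented for illustration only.
last_year = {"revenue": 1000.0, "input": 400.0, "production": 200.0,
             "sales_marketing": 100.0, "admin_overhead": 100.0}
this_year = {"revenue": 970.0, "input": 440.0, "production": 205.0,
             "sales_marketing": 110.0, "admin_overhead": 101.0}


def profit(period: dict[str, float]) -> float:
    """Profit as revenue minus every cost driver in the tree."""
    return period["revenue"] - sum(period[cost] for cost in COST_DRIVERS)


def driver_contributions(before: dict[str, float],
                         after: dict[str, float]) -> dict[str, float]:
    """How much each first-level node reduced profit (positive = made profit worse)."""
    contributions = {"revenue": before["revenue"] - after["revenue"]}
    for cost in COST_DRIVERS:
        contributions[cost] = after[cost] - before[cost]
    return contributions


def nodes_to_keep(contributions: dict[str, float], total_drop: float,
                  min_share: float = 0.2) -> list[str]:
    """Keep nodes explaining at least `min_share` of the drop, largest first."""
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    return [name for name, value in ranked if value / total_drop >= min_share]


drop = profit(last_year) - profit(this_year)
contributions = driver_contributions(last_year, this_year)
# Sanity check: profit is built from these same drivers, so their
# contributions must add up to the total drop.
assert abs(sum(contributions.values()) - drop) < 1e-9
print(drop)                                # 86.0
print(nodes_to_keep(contributions, drop))  # ['input', 'revenue']
```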
It is essential to apply critical thinking and question the data at this stage, because the data you receive cannot always be trusted upfront and can at times be misleading and distracting. This step therefore also includes reducing noise and removing unreliable or irrelevant features to ensure data quality. Once the data is sorted, you can use it to generate high-level insights into where the problem might lie.
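A first-pass data quality screen of this kind can be sketched in plain Python. The records, column names and thresholds (`max_missing=0.5`, and treating negative `weekly_sales` as implausible) are hypothetical assumptions; flagged columns and rows are candidates for removal or for follow-up questions to stakeholders, not automatic deletions.

```python
from __future__ import annotations

from typing import Any

# Hypothetical store-level records, invented for illustration only.
rows: list[dict[str, Any]] = [
    {"store": "S1", "weekly_sales": 1200.0, "promo_flag": 1},
    {"store": "S2", "weekly_sales": None, "promo_flag": None},
    {"store": "S3", "weekly_sales": -50.0, "promo_flag": None},
    {"store": "S4", "weekly_sales": 980.0, "promo_flag": None},
]


def missing_share(records: list[dict[str, Any]], column: str) -> float:
    """Fraction of records where the column is absent or None."""
    return sum(record.get(column) is None for record in records) / len(records)


def screen(records: list[dict[str, Any]], max_missing: float = 0.5,
           non_negative: tuple[str, ...] = ("weekly_sales",)) -> dict[str, list]:
    """Flag mostly-empty columns to drop and rows with implausible negative values."""
    columns = sorted({key for record in records for key in record})
    drop_columns = [c for c in columns if missing_share(records, c) > max_missing]
    suspect_rows = [
        i for i, record in enumerate(records)
        if any(record.get(c) is not None and record[c] < 0 for c in non_negative)
    ]
    return {"drop_columns": drop_columns, "suspect_rows": suspect_rows}


print(screen(rows))  # {'drop_columns': ['promo_flag'], 'suspect_rows': [2]}
```

With pandas, `df.isna().mean()` gives the same per-column missing share in one line; the plain-Python version simply keeps the sketch dependency-free.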
Data scientists need to resist the temptation to jump into solution mode and form a hypothesis as soon as they get a problem statement. Not everyone in a business is good at articulating the problem precisely, and chasing a poorly defined problem will lead to poor results at the end of the road. It is therefore of utmost importance that data scientists take the time to define the problem precisely, making it specific and unambiguous. This initial effort will pay off greatly in the long run.
This article is written by a member of the AIM Leaders Council. AIM Leaders Council is an invitation-only forum of senior executives in the Data Science and Analytics industry.