Council Post: A checklist manifesto to build data science teams

A data science team can be thought of in terms of a cricket team. The team's success depends on batsmen, bowlers, a captain, a coach, and performance analysts. All must work in tandem to ensure success. 


AI, ML and data science have changed the game for companies across domains, and organisations are now committing huge resources to building the perfect data science team. Building any team is a daunting task; building one in data science is harder still.

A data science team does not have a flat composition. Many designations, roles, and responsibilities fall under its purview. While the structure of a data science team depends on the organisation’s final goals, some requirements remain the same, such as the need for innovation and scalability.



Building highly efficient and scalable data science teams 

An AI-based organisation uses artificial intelligence to achieve its end goal. However, alignment with that goal is often lost while setting up data science teams, and the focus shifts to capability development alone. Even some Fortune 500 companies struggle with data science. The problem is more challenging still for service providers, mainly because of their complex and dynamic portfolios.

Hence, it is important to start with a clear vision and mission. A data science team should not be set up as a cost centre; sooner or later, such a team will struggle to prove its RoI. The best place to start is a pipeline of priority problems the business wants to solve: identify the problem; evaluate the feasibility of a data science-driven solution; define success criteria; and monitor RoI.
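As a rough illustration, a pipeline of priority problems can be kept as a simple ranked backlog. The sketch below is only one possible way to do it: the `Initiative` fields, the risk-adjusted RoI formula, the feasibility cut-off and all the figures are assumptions made for the example, not something prescribed here.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    """One business problem in the pipeline (all fields are illustrative)."""
    name: str
    feasibility: float       # 0.0-1.0, judged chance a data science solution works
    expected_value: float    # estimated annual benefit if the solution succeeds
    cost: float              # estimated cost to build and run the solution
    success_criterion: str   # agreed up front with the business owner

    def expected_roi(self) -> float:
        # Risk-adjusted net benefit per unit of cost (an assumed formula).
        return (self.expected_value * self.feasibility - self.cost) / self.cost


def prioritise(backlog: list[Initiative], min_feasibility: float = 0.3) -> list[Initiative]:
    """Drop ideas below the feasibility cut-off, then rank by expected RoI."""
    feasible = [i for i in backlog if i.feasibility >= min_feasibility]
    return sorted(feasible, key=Initiative.expected_roi, reverse=True)


# Hypothetical backlog with made-up numbers.
backlog = [
    Initiative("churn prediction", 0.8, 500_000, 100_000, "reduce churn by 2 points"),
    Initiative("demand forecasting", 0.6, 900_000, 300_000, "cut forecast error by 10%"),
    Initiative("fully automated pricing", 0.2, 2_000_000, 400_000, "margin uplift of 1 point"),
]

ranked = prioritise(backlog)
for item in ranked:
    print(f"{item.name}: expected RoI {item.expected_roi():.2f} ({item.success_criterion})")
```

Recording the success criterion next to each problem keeps the later RoI review honest: the team is measured against what was agreed before the work started.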

It is important to ensure organisational alignment and to integrate the team with strategic initiatives. Without these, the data science team may end up as just another shared services team. Finally, the team must maintain a balance between innovation and experimentation on one side and scalability and delivery on the other.


Team composition

People tend to assume, incorrectly, that a data science centre of excellence (CoE) consists only of data scientists. A data science team can be thought of in terms of a cricket team. The team’s success depends on batsmen, bowlers, a captain, a coach, and performance analysts. All must work in tandem to ensure success.

Data science has many niche roles. For example, a machine learning engineer may struggle to act as an end-to-end technical solution architect, while a solution architect may have the technical skill but struggle to decode a business problem and its associated complications.

The solution is to draft a team structure based on these roles. Between them, the recruits must be able to identify opportunities, pitch, design the solution, execute algorithms, build scalable solutions with end-user modules, fix bugs, improve accuracy, maintain what has been built, and translate data science outputs into insights and actions.

Each task requires a different set of skills. The team should aim to achieve scalability by capitalising on these skill cohorts, subject to the effort and bandwidth that priority initiatives require.

Providing the right ammunition

The next step is to identify the platforms each role needs to carry out its responsibilities. For example, the cohort focusing on innovation may need the latest GPUs to run advanced deep learning models. The models can be built from scratch on-premise, or a cloud platform module can be used instead.

The key considerations when selecting the right platforms are:

  1. Building a comprehensive end-state CI/CD pipeline architecture (data warehouse, data lake, AI/ML development and deployment, model management, visualisation and business consumption layers, including automation through RPA), and working toward that ideal-state architecture, is key to onboarding the right tools and platforms. Most cloud providers, such as AWS and Azure, have adopted this comprehensive approach and hence offer numerous modules that cater to these areas.
  2. Development vs production: the technology ecosystem and architecture depend on the volume and frequency of niche experimentation (by SMEs) versus scaled production (by BAU data scientists and engineers).
  3. Build vs buy, especially given the number of off-the-shelf, use-case-driven platforms and tools available.
  4. Cloud vs on-premise is a separate debate, but cloud is winning on cost, flexibility and scalability, except where security concerns apply.

Processes and governance

In many cases, companies hire experts and make huge investments in tools and platforms, only to see the team flatline. More often than not, this results from paying more attention to people and platforms than to the process itself. For example, MS Dhoni, one of the most successful captains the cricket world has ever seen, emphasised the ‘process’ when preparing the team for big matches. Similarly, there are templates, frameworks, and processes built around data science to help the team succeed. Some of the critical ones include:

  • Problem-solving framework: Having the right framework helps ensure the right problem is being solved, with data science as the means. Most analytics and consulting-driven leaders have their own frameworks: McKinsey has the McKinsey Way; Mu Sigma calls its framework muPDNA; eClerx has the Analytics Consulting framework.
  • Innovation roadmaps: Building roadmaps that balance cost-centre-style operations against work that delivers value to the business, so that the team can demonstrate RoI.
  • Talent: Attracting and retaining talent through individual and team performance evaluations. To retain talent, companies offer rewards and recognition on the basis of innovation, efficiency, delivery, etc.
  • Operating model: Teams must focus on choosing the right type of operating model – push vs pull, centralised vs hub and spoke, etc.
  • Maturity evaluation framework and progress tracking.

Important questions to ask

Now you have the players, the equipment and the mechanisms needed to make them ‘match ready’, but one cannot rule out the possibility of an unexpected event on the match day. To ensure a ‘match-winning team’, one must ask the following questions:

For the data science team

  • Are business leaders coming to you with problems?
  • Are you able to demonstrate RoI/uplift from solutions built?
  • Is there a continuous increase in the number of different solutions developed, deployed, and most importantly consumed in BAU processes?
  • How easy is it for you to scale up?
  • Is efficiency improving continuously? Are you able to deliver solutions faster?
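Several of these questions can be tracked with very simple arithmetic. The sketch below is a minimal illustration only: the function names, the strict period-over-period definition of "continuous" improvement, and the quarterly figures are all assumptions made for the example.

```python
def uplift(baseline_kpi: float, current_kpi: float) -> float:
    """Relative change in a KPI versus its pre-solution baseline."""
    if baseline_kpi == 0:
        raise ValueError("baseline_kpi must be non-zero")
    return (current_kpi - baseline_kpi) / baseline_kpi


def is_improving(values: list[float], lower_is_better: bool = False) -> bool:
    """True if every period is strictly better than the one before it."""
    pairs = list(zip(values, values[1:]))
    if lower_is_better:
        return all(later < earlier for earlier, later in pairs)
    return all(later > earlier for earlier, later in pairs)


# Hypothetical quarterly figures for one team.
solutions_in_bau = [3, 5, 8, 12]          # solutions consumed in BAU processes
delivery_weeks = [14.0, 12.0, 9.5, 9.0]   # average time to deliver a solution

checklist = {
    "KPI uplift demonstrated": uplift(200.0, 230.0) > 0,
    "more solutions consumed in BAU": is_improving(solutions_in_bau),
    "delivering faster": is_improving(delivery_weeks, lower_is_better=True),
}

for question, answer in checklist.items():
    print(f"{question}: {'yes' if answer else 'no'}")
```

A strict quarter-over-quarter test is deliberately harsh; a team might reasonably prefer a rolling average so that a single slow quarter does not flip the answer.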

For a data science start-up/service provider 

  • Are your capabilities aligned with what clients are looking for? 
  • Are you at a reasonably high state of maturity compared to where the market stands?
  • Do you have a good track record of converting prototypes and pilots into long term engagements and solutions? 
  • Are you liked by clients, business teams and in-house data science teams (or perhaps resented by them, seen from a different perspective)?
  • Are you adaptive and flexible, yet scalable enough to integrate with clients across people, processes and platforms?
  • Can you show an uplift in KPIs to business stakeholders driven by solutions and insights provided by your team?
  • Do you have a high share of repeat / long term client engagements vs one-off developments?
  • Are you able to maintain the balance between cost, quality and time to build?

Wrapping up

While the above cheatsheet can get you started in building and scaling a data science team, each of the pointers requires careful consideration, evaluation and planning to make the right decision. The advantage is that you can use the evaluation criteria to check whether the team is on track. Adapting rapidly, course-correcting, staying on top of the latest developments in the industry, and being aware of risks and success drivers are all critical to building successful data teams.


Ruble Joseph
Ruble heads the Centre of Excellence for Data Science, Analytics and Consulting at eClerx. Over the last 4+ years, Ruble has built this team from the ground up and today drives growth for eClerx by building successful client partnerships in Data Science.
