Every organisation, big or small, has a ton of data to deal with. Many people are moving from other sectors into data science, either because of the demand for it or because of the challenging work it offers.
But starting a data science team is not like forming a usual IT team. Behind the scenes in corporate offices, these data science teams are chosen carefully and brought together strategically.
It is very important for data science teams to have a clear idea of their goal from the very beginning. Team structure, and even the hierarchy, differs from company to company. Some organisations place the data science team under the CTO, some under the CFO and some under the CMO. Some distribute this expertise across different groups so that each group has its own data science talent, whereas others have just a single data scientist.
Organising A Data Science Team
There are six popular ways in which an organisation can form its data science team:
1. Decentralised: In this approach, analytics is applied only occasionally across the organisation, which leads to decentralised reporting.
2. Functional: Data science professionals work within a single department, typically the one where data science is used most extensively.
3. Consulting: Different teams call on the data scientists to solve their department's problems; the data scientists are not tied to any fixed team of their own.
4. Centralised: Analytics is used for strategic tasks, and a single data science team solves problems for the whole organisation.
5. Centre of Excellence (CoE): Data scientists are allocated to different units across the organisation, but a central coordinating structure remains intact. This is the most balanced structure, with highly coordinated analytics activities.
6. Federated: A central analytics team works from one point and takes on complex cross-functional tasks within the company.
Data Science Team Management
How the team works depends entirely on the company profile. The crux of the job, however, is to build mathematical models for prediction and provide the company with the solutions it needs. Amazon, for example, uses survival analysis models to increase marketing and advertising efficiency.
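Survival analysis models how long it takes until an event happens, such as a customer churning, while correctly handling records where the event has not happened yet (censored records). The article does not say which models Amazon uses, so the following is only a generic illustration: a minimal Kaplan-Meier estimator in plain Python, run on hypothetical churn data. The function name and the data are made up for this sketch.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each time an event occurred.

    durations: time until the event (e.g. days until a customer churns),
               or until the record was censored (still active at cutoff).
    observed:  True if the event happened, False if the record is censored.
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")

    # Number of events (churns) at each time point.
    events = Counter(t for t, e in zip(durations, observed) if e)
    # Everyone, churned or censored, leaves the risk set at their own time.
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Hypothetical data: days until churn (True) or until data cutoff (False).
    days = [5, 6, 6, 2, 4, 4]
    churned = [True, False, True, True, True, False]
    for t, s in kaplan_meier(days, churned):
        print(t, round(s, 3))  # prints 2 0.833, 4 0.667, 5 0.444, 6 0.222
```

Censored customers still count as "at risk" up to their cutoff time, which is what distinguishes this from simply averaging churn times.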
The way of working also differs between startups and well-established companies. In large companies, the work is usually well defined and focused on a specific area, and the basic infrastructure is already in place. In startups, by contrast, data scientists have to build the infrastructure and machine learning systems from scratch. They do most of the backend coding themselves and are expected to be all-rounders. In a talk, Airbnb data scientist Martin Daniel summarised the experimentation-centric culture of the data science team at Airbnb.
Carly Fiorina, former CEO of Hewlett-Packard, once said, “The goal is to turn data into information and information into insight”, and the remark captures well how a data science team functions within an organisation. Such a team typically includes a data scientist, data engineer, business analyst, visualisation expert, team leader, stakeholder representative and deployment engineer.
Team leader: The team leader’s job is to make sure everything stays on track while new projects and problems are being envisioned. The leader coordinates the quants, developers and analysts in the team. Team leaders also need expertise in their company’s industry: a steel company, for example, will need a leader with steel-processing expertise.
Data engineers: The line between the jobs of data scientists and data engineers is very thin. The major difference is that data scientists focus more on statistics and mathematics, whereas data engineers focus on building infrastructure and architecture. Data engineers handle data warehouse design, extract, transform, load (ETL) processes, data bounds checking and database tuning. Data warehouse design has to ensure that the data is easily accessible to the team. The ETL process has to ensure that the accumulated data stays consistent and that everything keeps running smoothly when additional data is added; it also covers data bounds checking and other QA tasks. Ingesting, processing and storing data is the job of data engineers, who lay the main framework for the team. They need to be skilled in languages and tools such as C++, SQL, Java, Perl and Hive.
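To make the ETL and data bounds checking described above more concrete, here is a minimal sketch using only Python's standard library, with an in-memory SQLite database standing in for a real data warehouse. The column names, the bounds and the `orders` table are hypothetical; a production pipeline would read from real sources and load into a proper warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical bounds for each numeric column: (min, max), inclusive.
BOUNDS = {"quantity": (1, 1000), "unit_price": (0.01, 10000.0)}


def extract(csv_text):
    """Extract: read raw rows from CSV text (a file or an API in practice)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: cast types and run data bounds checks.

    Returns (clean_records, rejected_rows) so bad rows can be reviewed (QA).
    """
    clean, rejected = [], []
    for row in rows:
        try:
            record = {
                "order_id": row["order_id"].strip(),
                "quantity": int(row["quantity"]),
                "unit_price": float(row["unit_price"]),
            }
        except (KeyError, ValueError):
            rejected.append(row)  # unparseable values fail QA
            continue
        if all(lo <= record[col] <= hi for col, (lo, hi) in BOUNDS.items()):
            clean.append(record)
        else:
            rejected.append(row)  # out-of-bounds values fail QA
    return clean, rejected


def load(conn, records):
    """Load: upsert into the warehouse table, so re-running the job when
    additional data arrives does not create duplicate rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id TEXT PRIMARY KEY, quantity INTEGER, unit_price REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO orders VALUES (:order_id, :quantity, :unit_price)",
        records,
    )
    conn.commit()


if __name__ == "__main__":
    raw = "order_id,quantity,unit_price\nA1,3,9.99\nA2,-5,4.50\nA3,two,1.00\nA4,10,20.00\n"
    conn = sqlite3.connect(":memory:")
    clean, rejected = transform(extract(raw))
    load(conn, clean)
    print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # prints 2
    print(len(rejected))  # prints 2 (A2 is out of bounds, A3 is not a number)
```

Keeping rejected rows instead of silently dropping them is one simple way to support the QA side of ETL: someone can inspect why records failed.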
Business analyst: The role of a business analyst, or translator, is to explain what the data and the problem mean, for instance by giving insight into the real-world significance of the raw data that the data engineers provide and the team works on. Business analysts create key performance indicators (KPIs) and new features from this raw data and provide real-time reporting using Power BI or Tableau. In short, their job is to give the team insight into the problem. The preferred skills for this position are data visualisation, business intelligence and SQL.
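As a rough illustration of turning raw data into KPIs, the sketch below computes a few common per-region indicators (revenue, order count, average order value and conversion rate) from hypothetical order records. The field names and KPI definitions are assumptions chosen for the example; in practice the resulting table is the kind of output a tool such as Power BI or Tableau would then visualise.

```python
from collections import defaultdict


def kpis_by_region(orders, visitors):
    """Compute simple per-region KPIs from raw order records.

    orders:   iterable of dicts with 'region', 'customer' and 'revenue' keys
    visitors: dict mapping region -> number of site visitors in the period
    """
    revenue = defaultdict(float)
    order_count = defaultdict(int)
    buyers = defaultdict(set)
    for o in orders:
        revenue[o["region"]] += o["revenue"]
        order_count[o["region"]] += 1
        buyers[o["region"]].add(o["customer"])

    report = {}
    for region, n_visitors in visitors.items():
        n_orders = order_count[region]
        report[region] = {
            "revenue": revenue[region],
            "orders": n_orders,
            # Average order value: revenue per order (0 if there were no orders).
            "avg_order_value": revenue[region] / n_orders if n_orders else 0.0,
            # Conversion rate: share of visitors who bought at least once.
            "conversion_rate": len(buyers[region]) / n_visitors if n_visitors else 0.0,
        }
    return report


if __name__ == "__main__":
    # Hypothetical raw records, as a data engineer might deliver them.
    orders = [
        {"region": "North", "customer": "c1", "revenue": 120.0},
        {"region": "North", "customer": "c1", "revenue": 80.0},
        {"region": "North", "customer": "c2", "revenue": 50.0},
        {"region": "South", "customer": "c3", "revenue": 200.0},
    ]
    for region, kpis in kpis_by_region(orders, {"North": 20, "South": 10}).items():
        print(region, kpis)
```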
Data scientists: Data scientists interact constantly with the data infrastructure that data engineers build. They are mainly involved in building and deploying mathematical models. They are skilled in programming languages and tools such as R, SAS, Python, MATLAB, SQL and Hadoop, as well as in statistics. Some organisations divide data scientists into machine learning engineers and data journalists. The former take care of everything that goes into training and modelling, for example which data should go into which model, while the latter explain the objective to other team members in simple terms. Machine learning engineers also deploy the ML models. An ML engineer, also called a deployment engineer, works with management to develop deployment specifications and configurations, and works with the team to troubleshoot deployment issues. The programming skills that both groups require are more or less the same.
Data analyst: The data visualisation analyst, or expert, makes the received data meaningful by presenting key information in many different ways. Data analysts also monitor and test model performance, supervising the model and suggesting any corrections that need to be made. Their job is to make large datasets understandable and usable, integrating data according to business requirements and delivering it in a useful and appealing way.
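Monitoring model performance often comes down to comparing recent errors against a baseline. The sketch below is one simple way to do that, under stated assumptions: it computes the mean absolute error (MAE) on a recent batch and flags the model for review when that error exceeds a baseline, such as the validation MAE, by more than a chosen tolerance. The 20% default tolerance and the function names are hypothetical choices for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def check_model(actual, predicted, baseline_mae, tolerance=0.2):
    """Flag the model for review if its error on a recent batch exceeds
    the baseline MAE by more than `tolerance` (0.2 means 20%)."""
    mae = mean_absolute_error(actual, predicted)
    needs_review = mae > baseline_mae * (1 + tolerance)
    return mae, needs_review


if __name__ == "__main__":
    # Hypothetical recent batch: observed values and the model's predictions.
    mae, flag = check_model([10, 12, 14], [11, 12, 13], baseline_mae=0.5)
    print(round(mae, 3), flag)  # prints 0.667 True
```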
Importance Of A Good Data Science Team
Harvard Business Review famously called data scientist the sexiest job of the 21st century. This reflects the enormous demand the role has gained over the past few years and the realisation that data science teams are one of the major pillars of an organisation’s development. Because the main objective is to connect their work to the business, and because the team will help shape the organisation’s biggest decisions, it is important for the whole data science team to understand the process clearly in order to evolve. Coursera also offers a course from Johns Hopkins University on how to build a data science team. Data is a precious asset that will outlast the systems that hold it, so the profession deserves the demand it enjoys.