Neeraj Gehani, Product Director at dunnhumby, made a case for having a product mindset at the fourth edition of the Machine Learning Developers Summit (MLDS) during his session titled ‘Emergence of Data Products.’ He unpacked the reasons for data products becoming increasingly important, types of data products, the framework for building data products and the unique challenges in building data products.
“Value delivery is about personalisation. From a competitive advantage perspective, it is about customer retention, profitability, or using the data to create a competitive edge around the business. And from a strategic differentiation perspective, companies have unique data assets, unique algorithms or business models, and all these things come together to create the flavor of data products,” said Gehani.
Data products fall into three types: raw data, aggregated data and data from ML models. Raw data is any transactional data meant for internal use, for example the data that gets stored in Google Cloud, Amazon S3 or Microsoft Azure. This data is built or maintained by data engineers. The next level of evolution is aggregated data, which feeds dashboards that offer insights or reports for decision making. Such datasets are also internal, and are typically built and maintained by business analysts with support from developers. At the third level, automated platforms like Optimizely produce data in an automated fashion for testing, model telemetry, etc. Such platforms are built and maintained by data scientists and machine learning engineers.
Of late, vendors have been coming up with a wide range of tools, spanning infrastructure, machine learning, analytics, enterprise applications, security data sets and more.
“The developer community is heavily invested in learning the relevant skills to capitalise on the demand in the market. There are a lot of people undertaking courses in data analysis, SQL, Python, and ML. In fact, from May 2019 to May 2020, there has been a 300% increase in total enrolments for machine learning courses. But, according to a survey, less than 40% of organisations are managing data as a business asset and creating a data-driven organisation,” said Gehani.
The disconnect is palpable. Vendors are rolling out data products and training the community. But when it comes to creating a data-driven organisation, forging a data-driven culture, or managing data as a business asset, the numbers are not positive. Something is off: businesses are not getting the kind of value they would expect from their datasets.
It is essential to build a rapport with business teams, rather than work in silos. From a solutioning perspective, data scientists, developers and machine learning engineers need to start operating with a product mindset.
Once developers have built a model and a dashboard, it is essential to keep iterating on the product to keep it relevant. This comes from a product mindset: thinking through the value chain end to end. “I think this is something our community is missing. It starts from defining business problems at a very high level in terms of what are the objectives, what are the use cases? What is the benchmarking success criteria?” said Gehani.
Even once data scientists have defined the business problem, it is essential to understand the data: check whether the right data is available, identify gaps in quality, clean up the data, and handle steps such as missing value imputation, labeling and feature engineering. Then, for predictive products where model development matters, the work is about selecting modeling techniques, building the models and evaluating them.
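The data preparation and modeling steps described above can be sketched in a few lines of Python. This is a minimal illustration, not anything shown in the talk: the churn dataset, the field names (`spend`, `visits`, `churned`) and the single-threshold "model" are all hypothetical stand-ins chosen so the example runs with the standard library alone.

```python
from statistics import mean

# Hypothetical toy records (made-up numbers). None marks a missing value.
TRAIN = [
    {"spend": 120.0, "visits": 4, "churned": 0},
    {"spend": None,  "visits": 3, "churned": 0},
    {"spend": 30.0,  "visits": 1, "churned": 1},
    {"spend": 20.0,  "visits": 2, "churned": 1},
]
TEST = [
    {"spend": 90.0, "visits": 3, "churned": 0},
    {"spend": 25.0, "visits": 1, "churned": 1},
]

def impute(rows, field, fill):
    """Missing value imputation: replace None in `field` with `fill`."""
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]

def add_features(rows):
    """Feature engineering: derive spend per visit from the cleaned fields."""
    return [{**r, "spend_per_visit": r["spend"] / r["visits"]} for r in rows]

def accuracy(rows, feature, threshold):
    """Share of rows where 'feature below threshold' matches the churn label."""
    hits = sum((r[feature] < threshold) == bool(r["churned"]) for r in rows)
    return hits / len(rows)

def fit_threshold(rows, feature):
    """Model building: pick the cut-off with the best training accuracy."""
    candidates = sorted({r[feature] for r in rows})
    return max(candidates, key=lambda t: accuracy(rows, feature, t))

def run_pipeline():
    # Compute the fill value on training data only, then reuse it for test data.
    fill = mean(r["spend"] for r in TRAIN if r["spend"] is not None)
    train = add_features(impute(TRAIN, "spend", fill))
    test = add_features(impute(TEST, "spend", fill))

    # Model selection: compare candidate features by training accuracy.
    fitted = {f: fit_threshold(train, f) for f in ("spend", "spend_per_visit")}
    best = max(fitted, key=lambda f: accuracy(train, f, fitted[f]))

    # Model evaluation: score the chosen model on held-out rows.
    return {
        "feature": best,
        "threshold": fitted[best],
        "test_accuracy": accuracy(test, best, fitted[best]),
    }

if __name__ == "__main__":
    print(run_pipeline())
```

Even in a sketch this small, the imputation value is computed from the training rows only and then reused on the held-out rows, which is the kind of end-to-end detail that is easy to miss when each step is owned by a different team.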
“The point I’m trying to make is that as a community, I think we are very, very siloed. So we have people who are focused on understanding data, and making sure that data is set up for success. They are focused on model development, but people who think end-to-end value chain are limited,” said Gehani. “So just building a science model alone will not help. You have to think through this end-to-end and think about all the areas where integrations will happen.”