If you want to know how to build great data science-enabled products, you needn’t look beyond LinkedIn, now owned by Microsoft. The Mountain View-headquartered professional networking company has over 433 million members across 200 countries, giving its data scientists access to structured datasets that spawned cutting-edge data-driven products. The most notable of these are “People You May Know”, “Social Graph Visualizations”, “Matching” and “Collaborative Filtering”, which drove the company’s success.
According to Mandar Parikh, VP Product & Engineering at Entytle Inc, “LinkedIn is one of the early adopters of data science and is at the forefront of modern-day data science. It was one of the first companies to put together a strong data science team. In fact, LinkedIn’s graph search and Recruiter product have data science behind them.”
The rise of data-centric products at LinkedIn was overseen by DJ Patil, former US chief data scientist, who joined LinkedIn as Chief Scientist and Senior Director of Product Analytics in 2008. Interestingly, Patil and Jeff Hammerbacher (co-founder of Cloudera, who earlier led Facebook’s data team) coined the term ‘data scientist’ and began hiring under that title. Some of the world’s most successful data scientists have since come out of LinkedIn.
Rewind: Products that revolutionized social networking and e-commerce
People You May Know (PYMK) became the new way to connect: In an earlier post, Patil wrote that data products are at the heart of social networks. In other words, a “social network is huge datasets of users, with connections to each other, forming a graph”. PYMK, a LinkedIn invention, went on to become a critical part of Facebook, Twitter and Google+, all of which have reportedly trademarked their friend-suggestion algorithms.
The PYMK feature is built on a recommendation engine that uses clustering and classification algorithms to find the people we interact with most. It also draws on location and common-friends data to generate suggestions.
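The article does not publish LinkedIn’s algorithm, so what follows is only a minimal sketch of the common-connections signal it mentions. It treats the network as a graph, using a hypothetical dict that maps each member to their connections. It then ranks the people a user is not yet connected to by how many mutual connections they share. Clustering, classification and location signals are deliberately left out.

```python
from collections import Counter

# Hypothetical toy network: each member maps to the set of members they are connected to.
example_graph = {
    "ana": {"ben", "cai"},
    "ben": {"ana", "cai", "dev"},
    "cai": {"ana", "ben", "dev", "eli"},
    "dev": {"ben", "cai"},
    "eli": {"cai"},
}


def suggest_people(graph: dict[str, set[str]], user: str, k: int = 3) -> list[tuple[str, int]]:
    """Rank members not yet connected to `user` by how many mutual connections they share.

    Hypothetical helper: it models only the common-connections signal,
    not LinkedIn's actual PYMK system.
    """
    direct = graph.get(user, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user themself and anyone they are already connected to.
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most mutual connections first; ties broken alphabetically for stable output.
    ranked = sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


print(suggest_people(example_graph, "ana"))
```

Here “dev” ranks above “eli” for “ana” because dev shares two connections with her (ben and cai), while eli shares only one.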
Amazon did something similar for e-commerce, using item-based collaborative filtering for its “People Who Viewed this Item” feature. Purchase logs were converted into TSV (tab-separated values) files containing customer and item IDs, with each record showing whether the product was viewed or bought.
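Here is a minimal sketch of item-based collaborative filtering over such a log. It assumes a hypothetical three-column TSV layout (customer ID, item ID, action). Each item is represented by the set of customers who interacted with it, and items are compared with cosine similarity, a common choice for item-to-item similarity. For simplicity, views and purchases are treated alike; a real system would likely weight them differently.

```python
import csv
import io
import math
from collections import defaultdict

# Hypothetical log layout: customer_id <TAB> item_id <TAB> action ("view" or "buy").
SAMPLE_LOG = (
    "c1\tbook\tview\n"
    "c1\tlamp\tbuy\n"
    "c2\tbook\tbuy\n"
    "c2\tlamp\tview\n"
    "c3\tbook\tview\n"
    "c3\tmug\tbuy\n"
)


def customers_per_item(tsv_text: str) -> dict[str, set[str]]:
    """Map each item_id to the set of customers who viewed or bought it."""
    result = defaultdict(set)
    for customer_id, item_id, _action in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        result[item_id].add(customer_id)
    return dict(result)


def similar_items(customers_by_item: dict[str, set[str]], item_id: str, k: int = 3) -> list[tuple[str, float]]:
    """Rank other items by cosine similarity between their customer sets."""
    target = customers_by_item.get(item_id, set())
    scores = []
    for other, customers in customers_by_item.items():
        if other == item_id:
            continue
        overlap = len(target & customers)
        if overlap:
            # Cosine similarity of the two items' binary customer vectors.
            scores.append((other, overlap / math.sqrt(len(target) * len(customers))))
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


print(similar_items(customers_per_item(SAMPLE_LOG), "book"))
```

“lamp” scores higher than “mug” for “book” because two of the book’s three customers also interacted with the lamp.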
Netflix’s widely popular recommendation engine drives online engagement: According to the Netflix blog, the viewing experience is driven by various algorithms that together make up the Netflix recommender system, its most valued asset. Among the most popular are ‘Instant Search’ and ‘Page Generation’, which go a long way toward personalization, and it all starts with the homepage.
So how does Netflix personalize the homepage algorithmically? Through a rules-based approach. The Netflix blog describes using a set of rules to define a template that dictates, for all members, which types of rows can go in which positions on the page. This template is then refined through A/B testing to better understand where rows should be placed for all members.
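The article does not give Netflix’s actual rules, so this is only a toy sketch of the template idea. A hypothetical template lists, for each page position, the row types the rules allow there. Each slot is then filled greedily with the highest-scoring eligible row that has not already been used. The row titles, types and scores are invented for illustration.

```python
# Hypothetical template: for each page position, the row types the rules allow there.
TEMPLATE = [
    {"continue_watching"},
    {"top_picks", "trending"},
    {"genre", "trending"},
    {"genre"},
]


def build_homepage(candidates: list[dict], template: list[set[str]] = TEMPLATE) -> list[str]:
    """Fill each slot with the highest-scoring unused row whose type the template allows."""
    page: list[str] = []
    used: set[str] = set()
    for allowed_types in template:
        eligible = [row for row in candidates
                    if row["type"] in allowed_types and row["title"] not in used]
        if not eligible:
            continue  # No row satisfies this slot's rule, so the slot is skipped.
        best = max(eligible, key=lambda row: row["score"])
        used.add(best["title"])
        page.append(best["title"])
    return page


# Hypothetical candidate rows with per-member relevance scores.
candidates = [
    {"title": "Continue Watching", "type": "continue_watching", "score": 0.9},
    {"title": "Top Picks for You", "type": "top_picks", "score": 0.8},
    {"title": "Trending Now", "type": "trending", "score": 0.85},
    {"title": "Sci-Fi", "type": "genre", "score": 0.7},
    {"title": "Comedies", "type": "genre", "score": 0.6},
]
print(build_homepage(candidates))
```

In this framing, an A/B test would compare alternative `TEMPLATE` lists on member engagement. That evaluation step is not shown.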
New-age companies Uber, Salesforce and Airbnb reinventing business through data science-enabled products
Data science is at the heart of Uber’s philosophy, and ‘Surge Pricing’, ‘Fare Estimates’, ‘Driver Positioning’ and ‘Matching’ are some of the most popular data science products from its stable.
Parikh cites Uber as a use case. In his view, Uber’s success is driven by data-centric products such as surge pricing, ETAs, heat maps and, most importantly, driver positioning. Driver positioning tells drivers where to wait for customers so as to maximize their ride revenue, and it is driven by data science algorithms in the back end. Another use case is the ‘Matching algorithm’, which pairs riders with drivers automatically. Here a Supplier Pick Model makes the nearest cab available for each request.
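The article does not describe how Uber’s Supplier Pick Model works, so this is only a minimal nearest-available-driver sketch under stated assumptions. Drivers are hypothetical IDs with (latitude, longitude) positions. Distance is straight-line great-circle distance computed with the haversine formula, not road travel time.

```python
import math
from typing import Optional

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (latitude, longitude) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def match_nearest(request: tuple[float, float],
                  drivers: dict[str, tuple[float, float]]) -> Optional[str]:
    """Return the ID of the available driver closest to the pickup point, or None if nobody is free."""
    if not drivers:
        return None
    return min(drivers, key=lambda driver_id: haversine_km(request, drivers[driver_id]))


# Hypothetical available drivers around a San Francisco pickup point.
drivers = {
    "d1": (37.8044, -122.2712),  # Oakland
    "d2": (37.7790, -122.4180),  # a few blocks from the pickup
    "d3": (37.3382, -121.8863),  # San Jose
}
print(match_nearest((37.7749, -122.4194), drivers))  # d2
```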
Airbnb’s highly publicized matching algorithm gets host preferences right: The startup that turned the idea of hospitality on its head is also known for being very data-centric. Interestingly, news reports suggest that data science helped propel the startup’s valuation to $25.5 billion. However, what made the startup popular was its matching algorithm, which facilitates interaction between hosts and guests. The model is built on an estimated conditional probability of booking in a particular location, given where the person searched. In an older post, the California startup described using “personalized search results to promote results that would fit the unique preferences of the searcher — the guest”.
The location relevance signal in Airbnb’s search is built entirely from data on users’ behavior, and it points future guests to locations where they can have great experiences. The same model has been applied uniformly across the world, enabling hosts everywhere to open up their homes to guests.
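The article describes Airbnb’s model only as an estimated conditional probability of booking in a location, given where the person searched. A counting sketch of that idea estimates P(booked neighborhood | search location) as count(search, booked) / count(search). It assumes a hypothetical log of (search location, booked neighborhood) pairs, plain frequency estimates with no smoothing, and listings that carry a neighborhood field. Airbnb’s production signal is not reproduced here.

```python
from collections import Counter, defaultdict

# Hypothetical search log: (where the guest searched, neighborhood they eventually booked).
SEARCH_LOG = [
    ("Paris", "Le Marais"),
    ("Paris", "Le Marais"),
    ("Paris", "Le Marais"),
    ("Paris", "Montmartre"),
    ("Rome", "Trastevere"),
]


def booking_probabilities(log: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Estimate P(booked neighborhood | search location) as count(search, booked) / count(search)."""
    counts = defaultdict(Counter)
    for search_location, booked in log:
        counts[search_location][booked] += 1
    return {
        search_location: {hood: n / sum(booked.values()) for hood, n in booked.items()}
        for search_location, booked in counts.items()
    }


def rank_listings(listings: list[dict], search_location: str,
                  probabilities: dict[str, dict[str, float]]) -> list[dict]:
    """Order listings by estimated chance of a booking in their neighborhood (unseen counts as 0)."""
    p = probabilities.get(search_location, {})
    return sorted(listings, key=lambda listing: p.get(listing["neighborhood"], 0.0), reverse=True)


probs = booking_probabilities(SEARCH_LOG)
listings = [
    {"id": 1, "neighborhood": "Montmartre"},
    {"id": 2, "neighborhood": "Le Marais"},
    {"id": 3, "neighborhood": "La Defense"},
]
print([listing["id"] for listing in rank_listings(listings, "Paris", probs)])  # [2, 1, 3]
```

A neighborhood that never appears in the log gets probability 0. A real system would smooth these estimates so that new areas are not permanently buried.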
Salesforce transitioned from contact manager to data-centric company: When it first started, Salesforce was just a contact manager, shared Parikh. “It was just a data entry system and there was absolutely no data science in there. Over the last few years they have built up a data science team and they brought data science into the product itself with Salesforce Einstein,” he said. The CRM company now helps salespeople around the world close deals faster with predictive scoring, courtesy of Einstein, its AI assistant.
How to build data science enabled products
So what is data science, asks Parikh. At its core, data science builds models that work on large datasets and then uses them to make predictions. But there is a secret to great data science: the art lies in figuring out which features to use, and when. “If you look at datasets, they are rows of data stored in a table. Every column is called a feature, and the model that we build needs to shortlist the features. Based on the features, one makes predictions, and this shortlisting of features is called feature selection,” shared Parikh.
Citing a use case of feature selection, Parikh explained: say you have a dataset about customers and you want to know which product customers are most likely to buy. You have to figure out which features are important in making those decisions. We might decide to include the customer’s age in our analysis, and hair colour too. For some reason, though, we might leave out the zip code they live in. This is what we call feature selection, and once we have our features we build different types of models that fall into a couple of broad categories.
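In Parikh’s example, people decide which features to keep. One common automated alternative is a filter method: score each candidate column by how strongly it correlates with the target, then keep the columns above a threshold. This sketch assumes numeric columns, a binary “bought” target and a hypothetical 0.2 cut-off, and the customer rows are invented. Categorical columns such as hair colour or zip code would need encoding first, and correlation only captures linear relationships.

```python
import math

# Hypothetical customer rows; "bought" is the target (1 = bought the product).
CUSTOMERS = [
    {"age": 25, "visits": 1, "shoe_size": 9, "bought": 0},
    {"age": 32, "visits": 4, "shoe_size": 7, "bought": 1},
    {"age": 47, "visits": 6, "shoe_size": 9, "bought": 1},
    {"age": 51, "visits": 2, "shoe_size": 7, "bought": 0},
    {"age": 38, "visits": 5, "shoe_size": 8, "bought": 1},
    {"age": 29, "visits": 0, "shoe_size": 8, "bought": 0},
]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_features(rows: list[dict], target: str, threshold: float = 0.2) -> list[tuple[str, float]]:
    """Keep features whose absolute correlation with the target meets the threshold, strongest first."""
    ys = [row[target] for row in rows]
    scored = [(name, abs(pearson([row[name] for row in rows], ys)))
              for name in rows[0] if name != target]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: -pair[1])


print(select_features(CUSTOMERS, "bought"))
```

On this toy data, “visits” and “age” survive and “shoe_size” (zero correlation) is dropped. The threshold itself is a judgement call, which is the point Parikh makes about the art of feature selection.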
Adaptive and self-learning, the features that best define data science products: According to Sean McClure, Director of Data Science at Space-Time Insight, the next generation of products essentially requires data science to be at the core of product development. Data science goes beyond trend spotting: it finds ways to automate the learning required to connect an organization’s data to its decisions. And within data science, machine learning is crucial to building great products.
Parikh outlines two types of machine learning techniques for building models
Unsupervised learning: Unsupervised learning is a set of algorithms that figure out what patterns exist in the data. “Essentially, when one doesn’t know what types of patterns exist, we can figure out where to look for those patterns through this technique,” he said.
Supervised learning: In this approach, one uses patterns already known from labeled data to make predictions.
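Here is a minimal sketch of the distinction on one-dimensional data, using hypothetical spending figures. K-means clustering, an unsupervised method, finds groups in unlabeled numbers. A nearest-centroid classifier, a supervised method, learns from examples that already carry labels and then predicts labels for new values. Both are illustrative choices, not techniques Parikh names.

```python
from collections import defaultdict


def kmeans_1d(values: list[float], k: int = 2, iterations: int = 10) -> list[float]:
    """Unsupervised: find k cluster centres in unlabeled numbers (Lloyd's algorithm)."""
    # Spread the initial centroids across the sorted values.
    centroids = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iterations):
        clusters: list[list[float]] = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to its cluster mean; leave it in place if the cluster is empty.
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


def train_nearest_centroid(examples: list[tuple[float, str]]) -> dict[str, float]:
    """Supervised: learn the mean value for each label from labeled examples."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, label in examples:
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_label(centroids: dict[str, float], value: float) -> str:
    """Assign the label whose learned mean is closest to the new value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


# Unsupervised: no labels, the algorithm discovers two groups of monthly spend.
print(kmeans_1d([1, 2, 3, 10, 11, 12]))
# Supervised: hypothetical labeled examples of monthly spend.
model = train_nearest_centroid([(5, "casual"), (8, "casual"), (40, "loyal"), (55, "loyal")])
print(predict_label(model, 30))
```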
Key takeaways for product managers suggested by Parikh
First, Parikh sets the record straight on data science, machine learning and AI, terms that are often used interchangeably. “Data science models extract patterns and make predictions. Machine learning automatically calibrates the models and improves the predictions by taking the results and feeding them back into the model, so the predictions keep improving over time,” he said.
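Parikh’s description of feeding results back into the model can be illustrated with online learning. This sketch swaps in one specific technique: a logistic regression updated by a stochastic gradient step each time an outcome is observed. The class name, features and learning rate are hypothetical and not taken from any product mentioned here.

```python
import math


class OnlineLogisticModel:
    """Hypothetical logistic-regression model that learns from each observed outcome."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: list[float]) -> float:
        """Probability of a positive outcome (for example, a click or a closed deal)."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def feed_back(self, x: list[float], outcome: int) -> None:
        """One stochastic gradient step on log-loss, nudging predictions toward the outcome (0 or 1)."""
        error = outcome - self.predict(x)
        self.bias += self.learning_rate * error
        self.weights = [w + self.learning_rate * error * xi for w, xi in zip(self.weights, x)]


model = OnlineLogisticModel(n_features=2)
print(model.predict([1.0, 0.0]))  # 0.5 before any feedback
for _ in range(50):
    model.feed_back([1.0, 0.0], outcome=1)  # each observed result is fed back into the model
print(model.predict([1.0, 0.0]))  # now well above 0.5
```

Each call to `feed_back` is the “taking results and feeding them back” step. The model never needs to be retrained from scratch.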
When it comes to building great products, nothing is more important than a) business metrics. “This is an important point for product managers to bear in mind: when you are building a product, you must focus on solving the customer’s pain points,” he said. When designing products with data science, b) product management fundamentals don’t change. “At the core of it, product management stays the same: be obsessed with your customer and have deep empathy and customer centricity,” he added. Lastly, c) focus on solving the use case at hand.
What is the low-hanging fruit for data science? Often it is choosing the simplest model rather than overthinking it. “Don’t let your data scientists tell you that the models are not ready yet, or ‘let’s refine it further before we actually get it out’,” he notes, arguing that such delays are a mistake that can lead to analysis paralysis. Sometimes the simplest, most basic algorithm can take one very far. And in most situations, big data won’t necessarily be viable.
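To make the “simplest model first” advice concrete, here is one common baseline. It assumes a hypothetical purchase log of (customer, item) pairs and recommends the most-bought items a customer does not already own. Treat it as a starting point to ship and measure against, not a claim about how any company mentioned here recommends products.

```python
from collections import Counter

# Hypothetical purchase log of (customer, item) pairs.
PURCHASES = [
    ("u1", "tea"), ("u2", "tea"), ("u3", "tea"),
    ("u1", "mug"), ("u2", "mug"),
    ("u3", "kettle"), ("u4", "spoon"),
]


def most_popular_baseline(purchases: list[tuple[str, str]], user: str, k: int = 2) -> list[str]:
    """Recommend the k most-bought items that `user` has not bought yet."""
    popularity = Counter(item for _, item in purchases)
    owned = {item for buyer, item in purchases if buyer == user}
    # Most purchases first; ties broken alphabetically so results are deterministic.
    ranked = sorted(popularity, key=lambda item: (-popularity[item], item))
    return [item for item in ranked if item not in owned][:k]


print(most_popular_baseline(PURCHASES, "u3"))  # ['mug', 'spoon']
```

If a more sophisticated model cannot beat a baseline like this on the business metric that matters, that is worth knowing before investing further.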
It is a view echoed by McClure, who says, “Data science is less about finding the most predictive model and more about discovering ways to make analysis work with people.”
In the same vein, Patil emphasized in an older post that quality assurance (QA) of data products requires a completely different approach. What’s crucial in building great products is the ability to adapt and iterate quickly throughout the product life cycle. “To ensure agility, we build small groups to work on specific products, projects, or analyses. Building test datasets is nontrivial, and it is often impossible to test all of the use cases,” he added.