For this week’s practitioner’s series, Analytics India Magazine got in touch with Karthik Kumar, Director of Data Science for Auto Practice at Epsilon. He leads the data science capabilities charter, where his team develops advanced analytics solutions, products, and strategies focused on the acquisition, engagement, and retention of customers for top automotive OEMs.
AIM: Tell us a bit about your educational background.
Karthik: I received my Master’s in Management Information Systems from the University at Buffalo, SUNY. Before that, my undergraduate degree was in Science. My passion for data science led me to learn from top institutions such as MIT, Stanford, and the Indian Statistical Institute. These educational experiences and courses, together with curiosity and an urge for continuous learning, helped me build my career in data science.
AIM: How did your data science journey begin?
Karthik: Data science has fascinated me right from my academic years. What has always intrigued me is how data can be converted into valuable insights with a magic sauce algorithm. At an early stage in my career with a leading consulting and tech firm, I started to apply these learnings to real use cases.
In one of my initial projects, I was asked to solve a human capital challenge: proactively identifying high performers at risk of attrition, which was a key focus area for the business. The reports and dashboards we had developed were only reactive, so the focus shifted to a proactive solution. This was my first step towards developing a machine learning model: it used an advanced algorithm to predict the probability of high performers attriting in the next “x” number of days. This acted as an early warning tool that allowed the business to take the necessary actions to de-risk attrition among high performers.
This solution was also presented at the Indian Statistical Institute during an analytics summit, where it won the “Best Predictive Analytics Paper” award. This catalyst project propelled my data science journey, and I went on to develop ML solutions for Retail, Human Capital, Telco, Finance and Media clients.
AIM: What were the initial challenges and how did you address them?
Karthik: The initial challenges were about understanding the industry and domain. For example, I was roped in for an industry project for a leading global mining group that focuses on finding, mining and processing the Earth’s mineral resources. We were supposed to build a proof of concept to identify anomalies in the ores after the crushing process, so I had to spend extensive time understanding the mining process in order to clearly define the target variable for the machine learning solution.
The next challenge was to work across various teams to gather the right data, for which we had to translate a lot of people’s knowledge into a process and then turn it into quality data. Given the high model development time, the real challenge was to develop a near-real-time solution with a continuous data feed. This is where we roped in the engineering team to help us with a data architecture that fed input data to the machine learning model, which scored it and identified the anomalies.
AIM: What does a typical day look like for you at Epsilon?
Karthik: Each day the endeavor is to get better by at least 1% through innovation, stakeholder engagement, and the execution of data science projects that lead to high business impact and customer satisfaction. I connect with my team to ask them the right questions and to provide the guidance and support they need to build new-age data science solutions and products. The talented team of data scientists, engineers and analysts in turn helps me learn continuously.
To stay abreast of the latest developments in machine learning and AI, we have a strong team engagement culture: we meet every month to deep dive on key topics, discuss a research paper in detail, or have colleagues share their best practices with the team. These knowledge-sharing sessions help us collaborate and share ideas. Especially during these testing times of the pandemic, they keep our innovative, creative minds challenged even in a virtual, hybrid workplace!
AIM: How do you approach a data science problem?
Karthik: As they say, “Data is the new code”. The machine learning code is only a small piece of the puzzle and is not enough on its own to take a model from the POC stage to production. Deployment involves a continuous flow of data and continuous learning, which makes ML an iterative process. Hence, maintaining high quality in all phases of the ML life cycle is the most important task.
The first step is to understand the business problem and to translate it into a statistical/machine learning problem. In this expedition, the quality of the data is critical. This is where data scientists have to spend most of their effort: comprehending and transforming the data and understanding its characteristics, in order to build a robust machine learning solution that leads to successful business outcomes.
Mining the right data, then improving and understanding it, is the most important step and the one I emphasise in my projects. Extensive feature engineering does more to build a strong data science model than iterating on models over a fixed data set. My tip to budding data scientists would be to invest maximum time in gathering the right data, exploring it, and creating features innovatively.
80% of data science work is about data, which is the food for AI. High-quality data with the right preparation is the most important ingredient of an AI system. Though it seems less fancy to explore features, create multivariate plots, transform data and create data features, these are the building blocks of a solid model.
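To make the point about feature engineering concrete, here is a minimal sketch in Python. It is not Epsilon's method: the customer IDs, visit dates, spend values, and feature names are all hypothetical, invented only to show how raw event rows can be turned into model-ready features.

```python
from datetime import date

# Hypothetical raw records: (customer_id, visit_date, spend).
# All values are invented for illustration only.
VISITS = [
    ("c1", date(2021, 1, 10), 120.0),
    ("c1", date(2021, 6, 5), 80.0),
    ("c2", date(2020, 11, 20), 300.0),
]


def build_features(visits, as_of):
    """Aggregate raw visit rows into one feature row per customer."""
    grouped = {}
    for customer_id, visit_date, spend in visits:
        grouped.setdefault(customer_id, []).append((visit_date, spend))

    features = {}
    for customer_id, rows in grouped.items():
        dates = [d for d, _ in rows]
        spends = [s for _, s in rows]
        features[customer_id] = {
            # How often the customer engaged.
            "visit_count": len(rows),
            # Recency: days between the last visit and the reference date.
            "days_since_last_visit": (as_of - max(dates)).days,
            # Typical spend per visit.
            "avg_spend": sum(spends) / len(spends),
        }
    return features


FEATURES = build_features(VISITS, as_of=date(2021, 7, 1))
```

Each derived column (frequency, recency, average spend) encodes something the raw rows only imply, which is the kind of work the answer above argues deserves most of the effort.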
AIM: What does your machine learning toolkit look like?
Karthik: At Epsilon, we are tool-agnostic: our engagement models are flexible so that they stay aligned with our customers’ requirements, infrastructure, and use cases. Our offerings are actionable tools that facilitate improvements in all marketing mix elements across the customer journey to drive acquisition, engagement, and retention. We develop our advanced analytics offerings, products, services, and machine learning capabilities with a wide spectrum of analytics tools, languages, and platforms, such as R, Python, SAS, Spark, PySpark, H2O.ai, Databricks, Dataiku, and the AWS and Azure cloud ML platforms, including AWS SageMaker. With our Data Science COE community, we explore DS applications, tools, and techniques to continuously evolve with new AI/ML developments.
AIM: There is a lot of hype around AI and ML. Which domain of AI, do you think, will come out on top in the next 10 years?
Karthik: The hype is real. I would like to borrow a couple of quotes here:
“AI is the new electricity” – Andrew Ng.
“AI is probably the most important thing humanity has ever worked on. I think of it as something more profound than electricity or fire” – Sundar Pichai
There are a lot of runaway horses, but the promise is real if you are methodical about the data you have and the use cases you solve for; there’s a pot of gold at the end of the rainbow for sure.
A McKinsey study estimates 13 trillion US dollars’ worth of global economic growth and value creation due to AI. Though AI investment is growing fast at present, it is dominated by digital giants such as Google and Baidu. The exciting part of AI will be its development and adoption across industries, and we are already seeing that on a huge scale in the automotive sector.
Today 99% of the economic value created by AI comes from one type of technique called supervised learning, which learns an input-to-output relationship. An automotive example is the technology used in autonomous cars, where the input is a picture of what is in front of the car and the output is a decision that feeds into the car’s systems. With the right commercial use cases, in the next decade AI is going to transform industries such as Retail, Media, Education, Healthcare, and Travel, which at present have low to medium adoption of AI.
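The input-to-output idea behind supervised learning can be sketched in a few lines of Python. This is a toy, not how autonomous driving systems work: real systems learn from images with far more sophisticated models, whereas here the two numeric input features, the labels, and the data are hypothetical, and the technique is a plain 1-nearest-neighbour classifier.

```python
import math

# Labelled examples: (input features, output label).
# The two features are hypothetical stand-ins, e.g. (obstacle_closeness,
# lane_clearance); all values are made up for illustration.
TRAINING = [
    ((0.9, 0.1), "brake"),
    ((0.8, 0.3), "brake"),
    ((0.1, 0.9), "continue"),
    ((0.2, 0.7), "continue"),
]


def predict(x, training=TRAINING):
    """Return the label of the closest training example (1-nearest neighbour)."""
    _, nearest_label = min(training, key=lambda pair: math.dist(pair[0], x))
    return nearest_label
```

Calling `predict((0.85, 0.2))` returns the label of the nearest labelled example. The model has no rules of its own; everything it knows comes from input-output pairs, which is why the quality of the labelled data matters so much.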
AIM: What does it take to land a machine learning job at Epsilon?
Karthik: At Epsilon, we put our people at the center of what we do and provide them with challenging opportunities to work with some of our world-class data assets, technologies, and platforms across industries. We also have a vibrant internal community of analytics and data science practitioners that works closely with top talent to learn, collaborate, and get inspired by the analytics work across the various practice areas, through analytics events such as the hackathons and roadshows that we host regularly. We extend these hackathon programs externally to identify top talent from around India and the APAC marketplaces. We also run yearly campus graduate and internship programs, where we work with top colleges to hire freshers into our Epsilon family. They bring the creativity and new approaches needed to solve complex data science use cases.
Aspiring candidates for ML roles should be curious, display a high level of passion, and showcase examples where they have solved machine learning use cases. Machine learning is an evolving field, so candidates should always be on a continuous learning path. I would recommend showcasing their contributions on platforms such as Kaggle, GitHub, and LinkedIn.
AIM: What books and other resources have you used in your journey? A few words for the aspirants as well.
Karthik: It is important for aspiring data scientists to develop a strong foundation in statistics. I would highly recommend An Introduction to Statistical Learning, a book that helps build the basics of statistics knowledge. Once you graduate to an intermediate or advanced level, the next book I would recommend is The Elements of Statistical Learning, which covers advanced topics like neural networks and matrix factorization.
Also, one needs to build a strong command of at least one programming language. Given its data handling capabilities, parallel computation, broad integration capabilities, and communities, I would recommend Python, an open-source language with some of the most advanced libraries for solving new-age data science problems. The Applied Data Science with Python Specialization on Coursera helped me gain good applied knowledge of Python and get hands-on with toolkits such as pandas, matplotlib, scikit-learn, and nltk. These are just a few of the resources that have helped me in my data science journey.
As a beginner, one might be overwhelmed by the various options available for developing data science skills in programming, mathematics and statistics. Be patient, stick to one data science programming language, and develop a deep understanding of mathematics and statistics. This will help beginners build a strong foundation for their data science careers. Lastly, hands-on experience is crucial: it helps you apply theory to practice and see tangible outcomes. I would suggest new data scientists participate in competitions on platforms like Kaggle, which help in the practical application of knowledge and in learning from the strong data science community.