The ML Expert Who Hated Mathematics: Interview With Dipanjan Sarkar

Published on May 25, 2020
by Ram Sagar

Every week, Analytics India Magazine reaches out to developers, practitioners and experts from the machine learning community to gain insights into their journey in data science, and the tools and skills essential for their day-to-day operations.

For this week’s column, Analytics India Magazine got in touch with Dipanjan Sarkar, a well-known face in the machine learning community. In this story, we take you through Dipanjan’s journey and how he became an ML expert.

How It All Began

Dipanjan currently works as a Data Science Lead at Applied Materials, where he leads a team of data scientists solving problems in the manufacturing and semiconductor domain using machine learning, deep learning, computer vision and natural language processing. He provides much-needed technical expertise, AI strategy, solutioning and architecture, and works with stakeholders globally.

He has a bachelor’s degree in computer science & engineering and a master’s in data science from IIIT Bangalore. Currently, he is pursuing a PG Diploma in ML and AI from Columbia University and an executive education certification course in AI Strategy from Northwestern University – Kellogg School of Management.

Apart from academia, Dipanjan is a big fan of MOOCs. He also beta-tests new courses for Coursera before they are made public.

Dipanjan is also a Google Developer Expert in Machine Learning and has worked with several Fortune 500 companies. For an expert in ML, mathematics is a prerequisite, so we were surprised to learn that Dipanjan actually hated mathematics at school. This continued until ninth grade, when he picked up statistics, linear algebra and calculus, the three pillars of machine learning.

I always loved the way you could program a computer to do specific tasks and make a machine actually learn with data!

Dipanjan’s renewed interest in mathematics was followed by a fascination with computer programming. As his interests grew from mathematics to statistics and traditional computer programming, his career choice became almost obvious.

On Becoming An ML Expert

Reminiscing about his initial days, when the term ‘data science’ wasn’t worshipped yet, Dipanjan spoke about how the field was more conceptual and theoretical. Back then, there was no active ecosystem of tools, languages and frameworks dedicated to data science. Theoretical concepts therefore took longer to learn, since it took more effort to implement them or see them in practice.

With the advent of Python, R and a whole suite of tools and libraries, he believes that it has become easier to tame the learning curve of data science. However, he also warns that this can be a double-edged sword if one focuses only on hands-on work without diving deep into the math and concepts behind algorithms and techniques to understand how they work or why they are used.

I have always been a strong advocate of self-learning, and I believe that is where you get maximum value

Lacking the mentors and guides that are now plentiful on LinkedIn and other forums, Dipanjan had no option but to teach himself with the help of the web and books.

For aspirants, he recommends the following books:

Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
Pattern Recognition and Machine Learning by Christopher Bishop
An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome H. Friedman

To dive deep into the concepts and to get hands-on, he recommends Deep Learning with Keras, Python Machine Learning and Hands-On Machine Learning as practical books with examples. Dipanjan has also written a handful of books on practical machine learning.

When it comes to practising and deploying ML models, Dipanjan makes extensive use of the CRISP-DM (Cross-Industry Standard Process for Data Mining) framework, which he considers to be one of the best frameworks for tackling any data science problem.

Before diving into models or data, he also insists on identifying and articulating the business problem in the right manner. For conceptualising an AI use-case, Dipanjan recommends the AI Canvas, which he learnt about at the Kellogg School of Management:

Business Problem and Value
Key Objective Function
Data Strategy
Modelling Approach
Model Training Strategy
Customer Value

Use the right tools for the job without waging wars of Python vs R or PyTorch vs TensorFlow

When asked about his favourite tools, Dipanjan stressed the importance of not getting caught up in debates such as Python vs R or PyTorch vs TensorFlow, and of using the tools that get the job done.

For instance, he and his team very frequently use the ecosystem of tools and libraries centred around Python. This includes the usual pandas, matplotlib, seaborn and plotly for data wrangling and exploratory data analysis. For statistical modelling, he prefers libraries like scikit-learn, statsmodels and pyod.

Dipanjan’s toolkit looks as follows:

Statistical Modeling: scikit-learn, statsmodels and pyod
Deep Learning Frameworks: both TensorFlow (tf.keras) and PyTorch depending on the problem at hand
Computer Vision: OpenCV, MATLAB
NLP: scikit-learn, spaCy, gensim and transformers
Transfer learning: pre-trained models from TensorFlow Hub
Building baselines: AutoML frameworks
Explainable AI: LIME and SHAP
Languages: Python; in the past, also R and Java, both for data analysis and for building pipelines and web interfaces

Along with picking the right tools, he recommends that practitioners always go with the simplest solution unless added complexity brings substantial value. Last but not least, he urges people not to ignore documentation.
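The "simplest solution first" advice can be turned into a small selection rule. The sketch below is only an illustration and not something described in the interview: the function name, the `min_gain` threshold and the validation scores are all hypothetical, and higher scores are assumed to be better.

```python
def pick_simplest_adequate(candidates, min_gain=0.02):
    """Prefer simpler models unless a more complex one is clearly better.

    `candidates` is ordered from simplest to most complex; each entry is a
    (name, validation_score) pair. A more complex model replaces the current
    choice only if it improves the score by at least `min_gain`.
    """
    best_name, best_score = candidates[0]
    for name, score in candidates[1:]:
        if score - best_score >= min_gain:
            best_name, best_score = name, score
    return best_name


# Made-up validation scores, for illustration only.
candidates = [
    ("logistic_regression", 0.86),
    ("gradient_boosting", 0.87),  # +0.01: not enough to justify the complexity
    ("deep_network", 0.91),       # +0.05: a substantial gain
]
chosen = pick_simplest_adequate(candidates)
print(chosen)
```

What counts as "substantial" depends on the business problem, so the threshold is a choice for the team rather than a fixed constant.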

To those looking to break into the world of data science, Dipanjan suggests following a hybrid approach, i.e. learn the concepts, write code and apply them to real-world datasets.

First, learn all the math and concepts and then try to actually apply the methods you have learnt

In the long, tedious process of learning, Dipanjan warns that people might lose focus and start wondering why they are even learning a certain method. To remedy this, he insists on both learning and applying, without deviating from the goal, for anyone who aims to become a good data scientist.

On ML Hype And Its Future

Addressing the overwhelming hype around AI and ML, Dipanjan says that he is already seeing the dust settle, with companies now starting to realise both the limitations and the value of AI. Deep learning and deep transfer learning are starting to provide value for companies working on complex problems involving unstructured data such as images, audio, video and text, and he expects things to get bigger and better with more advanced tools and hardware. However, he admits that there is still a fair bit of hype out there.

Traditional machine learning models like linear and logistic regression will never go out of fashion

No matter how advanced the field gets, he believes that traditional machine learning models like linear and logistic regression will never go out of fashion, since they are the bread and butter of many organisations and use-cases. Models that are easy to explain, including linear models and decision trees, will also continue to be used extensively.
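To see why linear models count as easy to explain, consider a minimal sketch of simple linear regression fitted by ordinary least squares. The function name and the toy data are assumptions made for this illustration; nothing here comes from Dipanjan's own work.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.1f} * x + {intercept:.1f}")  # prints "y = 2.0 * x + 1.0"
```

The whole model is two numbers: the slope says how much the prediction changes per unit of input, and the intercept is the prediction at zero. That is the kind of explanation a stakeholder can check by hand.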

Going forward, he is optimistic that use-cases such as optimising manufacturing, predicting demand and sales, inventory planning, logistics and routing, infrastructure management optimisation, and enhancing customer support and experience will continue to be key drivers for almost all major organisations over the next decade.

When it comes to breakthroughs, Dipanjan expects something big to happen in newer domains like self-learning, continuous-learning, meta-learning and reinforcement learning.

Always remember to challenge others’ opinions with a healthy mindset because a good data scientist doesn’t just follow instructions blindly.

Talking about his tireless efforts to guide youngsters, he recollects how not having a mentor had been a major hindrance and how he had to unlearn and relearn over time to correct his misconceptions. To help aspirants avoid the same mistakes, he mentors them whenever possible.

On a concluding note, Dipanjan said that he is mightily impressed by the relentless efforts of the data science community to share ideas through blogs, vlogs and online forums. Confessing his love for Analytics India Magazine, Dipanjan spoke about how AIM has been fostering a rich analytics ecosystem in India by reaching out to the global community.

Dipanjan will be speaking at Analytics India Magazine’s inaugural virtual conference, Plugin, on 28 May 2020.

Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.
