Last updated January 24, 2021

Why Data Analysis & Not Math Is A Prerequisite For Machine Learning

Published on February 12, 2018

by Richa Bhatia

If you are an absolute machine learning beginner wondering whether data analysis is a prerequisite, here is the hard fact: data analysis, meaning the task of gathering, cleaning, exploring and visualizing data, is an absolute must before you get started on machine learning.

However, let’s also get one thing clear: machine learning is as much about linear algebra, probability theory and statistics (especially graphical models), and information theory as it is about data analysis. Data analysis is nonetheless an important part of understanding ML. Algorithms are applied to real-world data, and since that data rarely arrives in a structured, labeled format, you won’t get far with algorithms unless you know how to process it. According to a section of ML practitioners, data science and machine learning are essentially two sides of the same field.

Let’s see how data analysis will help you level up on ML

First, you won’t be able to build a good enough model if you don’t have solid data analysis skills.
Even if you use packaged tools like Python’s scikit-learn, which end up performing the hard math, you need a solid understanding of your data to make these tools work effectively. Without a solid grounding in exploratory data analysis and data visualization, you can’t get far in machine learning.
Even to apply tools such as caret and scikit-learn, you’ll need to be able to gather, prepare, and explore your data. In short, you need a solid understanding of data analysis.

Let’s enumerate how one can use data science as a platform to dive into the basics of machine learning

1) 80% of data science work involves data prep

By now, it is commonly said that roughly 80% of data science work involves data preparation, exploratory data analysis (EDA), and visualization. For most data scientists, data organization and manipulation is still a much-needed skill, and it underpins the machine learning algorithms they go on to implement with tools such as scikit-learn.

This means that when you build machine learning models, roughly 80% of your time is spent gathering data, exploring it, cleaning it, and analyzing results with data visualization.
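To make the gather, clean and explore steps concrete, here is a minimal sketch using pandas. The dataset, its column names and the choice of median imputation are all assumptions made up for illustration; real projects will need their own cleaning rules.

```python
import io

import pandas as pd

# Hypothetical raw export (invented for this example): it contains an
# inconsistent city label, two missing values and one duplicated row.
RAW_CSV = """customer_id,city,age,monthly_spend
1,Bangalore,34,1200
2,bangalore ,29,
3,Mumbai,41,2300
3,Mumbai,41,2300
4,Delhi,,800
5,Mumbai,38,2100
"""

# Gather: io.StringIO stands in for a real file or database query.
raw = pd.read_csv(io.StringIO(RAW_CSV))

# Explore: count missing values per column before changing anything.
missing_before = raw.isna().sum()

# Clean: drop exact duplicate rows, normalize text labels, and fill
# numeric gaps with the column median (one simple choice among many).
clean = raw.drop_duplicates().copy()
clean["city"] = clean["city"].str.strip().str.title()
for column in ["age", "monthly_spend"]:
    clean[column] = clean[column].fillna(clean[column].median())

# Summarize: average spend per city, computed on the cleaned data.
spend_by_city = clean.groupby("city")["monthly_spend"].mean()

print(missing_before)
print(clean)
print(spend_by_city)
```

Note that the cleaning happens before any model is involved. Grouping by city on the raw data would have treated "Bangalore" and "bangalore " as two different cities.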

2) Knowing how to manipulate data is critical

For beginning ML practitioners, manipulating data is more critical than understanding the math underlying the algorithm. While linear algebra is a building block of machine learning and the key to understanding the statistics applied in ML, most data science practitioners have only a working understanding of calculus or linear algebra.

However, they are excellent data analysts who usually start with the minimum required math and fill in the gaps on the job. According to a data science practitioner from the financial sector, if you want to be able to write an algorithm from scratch, you need a very strong understanding of linear algebra. If you simply want to be a data science practitioner, on the other hand, you don’t need a high-level knowledge of calculus to understand how an algorithm behaves.

In the long run, advanced math is an absolute must. In the short term, though, one should focus on the data visualization and data manipulation stack in R or Python.

These are the most widely recommended packages to get started with visualization, wrangling and analysis:

R: ggplot2, dplyr, tidyr, stringr

Python: numpy, pandas, matplotlib, seaborn

3) Before you dive into ML, you need to master visualization

The job description of an entry-level data scientist involves a lot of data aggregation and data visualization, both of which feed directly into exploratory data analysis. Professionals who prefer R can learn ggplot2 for data visualization, starting with basic plots like scatterplots, histograms and bar charts, and then learn how to use ggplot2 and dplyr together for exploratory data analysis. Python users can learn to combine pandas with a plotting library such as matplotlib or seaborn for the same purpose.
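As a rough illustration of aggregation and visualization working together in Python, the sketch below uses pandas for the aggregation and matplotlib for the plots. The sales data, the column names and the `plot_overview` helper are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration.
sales = pd.DataFrame(
    {
        "region": ["North", "North", "South", "South", "East", "East"],
        "units": [10, 14, 7, 9, 20, 16],
        "price": [5.0, 4.5, 6.0, 6.5, 3.0, 3.5],
    }
)
sales["revenue"] = sales["units"] * sales["price"]

# Aggregate: total and average revenue per region.
by_region = sales.groupby("region")["revenue"].agg(["sum", "mean"])

# Relationship between two numeric columns (Pearson correlation).
units_price_corr = sales["units"].corr(sales["price"])


def plot_overview(frame, path="overview.png"):
    """Save a bar chart and a scatterplot of the data to an image file."""
    # matplotlib is imported here so the aggregation above runs without it.
    import matplotlib

    matplotlib.use("Agg")  # render to a file; no display is needed
    import matplotlib.pyplot as plt

    fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))
    totals = frame.groupby("region")["revenue"].sum()
    left.bar(totals.index, totals.values)
    left.set_title("Revenue by region")
    right.scatter(frame["price"], frame["units"])
    right.set_xlabel("price")
    right.set_ylabel("units")
    right.set_title("Units sold vs. price")
    fig.savefig(path)
    plt.close(fig)


print(by_region)
print(round(units_price_corr, 3))
```

Calling `plot_overview(sales)` writes both charts to `overview.png`. The scatterplot shows at a glance what the correlation figure only hints at: in this toy data, cheaper products sell in larger quantities.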

4) Linear algebra is the workhorse of Machine Learning

That said, linear algebra is important if you want to understand the inner workings of machine learning algorithms and of gradient descent. One can’t emphasize enough the importance of grasping the essential concepts of statistics and probability, given that machine learning is often dubbed statistical learning.
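To see where linear algebra shows up inside gradient descent, here is a small NumPy sketch that fits a straight line by minimizing mean squared error and then compares the result with the closed-form least-squares solution. The synthetic data, the learning rate and the iteration count are assumptions chosen for illustration, not recommended defaults.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for this example): y = 3x + 2 plus noise.
x = rng.uniform(-1.0, 1.0, size=200)
y = 3.0 * x + 2.0 + rng.normal(0.0, 0.1, size=200)

# Design matrix: one column for x, one column of ones for the intercept.
X = np.column_stack([x, np.ones_like(x)])

# Gradient descent on the mean squared error L(w) = ||Xw - y||^2 / n,
# whose gradient is (2 / n) * X^T (Xw - y).
w = np.zeros(2)
learning_rate = 0.1
for _ in range(2000):
    residual = X @ w - y
    gradient = 2.0 * X.T @ residual / len(y)
    w -= learning_rate * gradient

# Closed-form least-squares solution, for comparison.
w_exact, *_ = np.linalg.lstsq(X, y, rcond=None)

print("gradient descent:", w)
print("least squares:   ", w_exact)
```

Every step of the loop is a matrix-vector product, which is why a working grasp of matrix multiplication and transposes is enough to follow what the optimizer is doing.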

The field is so vast that it is difficult to follow a focused learning plan, and most entry-level data scientists struggle to cover all the essential concepts in a short span of time. A deeper understanding of the algorithms requires statistics and stochastic processes, and this is the point where it becomes difficult, since those in turn require knowledge of calculus and linear algebra.

For an absolute beginner, it can be difficult to understand all of these aspects at once, and that’s why a foundation in data analysis can help one build machine learning models that work. One must also remember that, in a machine learning workflow, the experience gained from exploratory data analysis feeds directly into the “data transformation” step.
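Here is one hedged example of how an observation made during exploratory data analysis can feed the “data transformation” step. The income feature is synthetic, and the log transform followed by standardization is just one common choice for a right-skewed column, not the only option.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical right-skewed feature (synthetic log-normal "incomes").
income = pd.Series(rng.lognormal(mean=10.0, sigma=1.0, size=1000), name="income")

# EDA finding: the distribution has a long right tail (large positive skew).
skew_before = income.skew()

# Transformation informed by that finding: compress the tail with log1p.
log_income = np.log1p(income)
skew_after = log_income.skew()

# Standardize to zero mean and unit variance before modeling.
standardized = (log_income - log_income.mean()) / log_income.std()

print("skewness before:", round(skew_before, 2))
print("skewness after: ", round(skew_after, 2))
```

Without looking at the distribution first, there would be no reason to reach for the log transform. That is the sense in which EDA acts as an input to the transformation step.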

Outlook

Not everybody has the rigorous quantitative background needed to work through the math required for machine learning. Given the rising interest in the field and a lack of formal training, most beginners who follow the self-learning path find it challenging and frustrating to master the concepts completely. That’s why beginners can use data analysis as a platform to dive into machine learning without completely mastering linear algebra or calculus.

Meanwhile, here’s a guide to ML by Jason Brownlee in which he talks about how to get a handle on linear algebra for ML. According to Brownlee, there are a minimum of three topics one must cover: a) notation, which allows one to piece things together; b) operations, which means learning how to perform simple operations such as multiplying and transposing matrices; and c) matrix factorization, which requires a deep dive into concepts like SVD and QR. Together, these form the bedrock of machine learning.
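The three topics above can be tried out in a few lines of NumPy. This is a minimal sketch with small matrices made up for illustration; it is not taken from Brownlee’s guide.

```python
import numpy as np

# Notation: A is a 3x2 matrix, B is a 2x2 matrix.
A = np.array([[2.0, 0.0], [1.0, 3.0], [0.0, 1.0]])
B = np.array([[1.0, 2.0], [0.0, 1.0]])

# Operations: transpose and matrix multiplication.
A_T = A.T   # shape (2, 3)
AB = A @ B  # shape (3, 2)

# Matrix factorization: QR splits A into a matrix Q with orthonormal
# columns and an upper-triangular matrix R, so that A = Q R.
Q, R = np.linalg.qr(A)

# Matrix factorization: SVD writes A = U diag(s) V^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_from_svd = U @ np.diag(s) @ Vt

print("A @ B =\n", AB)
print("QR reconstructs A: ", np.allclose(Q @ R, A))
print("SVD reconstructs A:", np.allclose(A_from_svd, A))
```

Checking that each factorization multiplies back to the original matrix is a quick way to confirm you have read the notation and shapes correctly.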

Besides, don’t forget to brush up on the basics with these books on ML: Elements of Statistical Learning by Hastie, Tibshirani and Friedman, and Information Theory, Inference, and Learning Algorithms by David MacKay. For linear algebra, check out Linear Algebra, Theory, and Applications by Kuttler.



Richa Bhatia

Richa Bhatia is a seasoned journalist with six years of experience in reportage and news coverage, and has had stints at Times of India and The Indian Express. She is an avid reader, mum to a feisty two-year-old, and loves writing about the next-gen technology that is shaping our world.
