In today’s world, where data science and machine learning have become some of the most desirable career options, everyone from college students to mid-career professionals is looking to switch careers into data science. But the first prerequisite that comes to mind when pursuing data science or machine learning is that you must be able to wade through heavy mathematical mumbo-jumbo to make your way into this career.
But what about those who are weak in mathematics or have no solid background in it? Should they lose all hope and stop dreaming of learning data science or machine learning?
Of course not. In this article, we give you the real picture of whether it is mandatory to know mathematics before stepping into the world of data science and machine learning.
Reasons Why Mathematics Is A Prerequisite
- Data science is more about picking things up quickly and accurately than about any particular university major. With rigorous math training, you pick up intuitions and techniques quickly and accurately, and generally have an easier learning curve than most.
- Data scientists are essentially statisticians, and most have graduate-level knowledge of math and statistics. Such knowledge is expected for positions in the field, and it is essential for correctly applying algorithms and hypothesis tests.
- Standard tools such as logistic regression, decision trees, and confidence intervals are math-heavy, and most employers use standard tools. As a result, hiring managers look for candidates with a strong math background, mostly for historical reasons.
- Academic training for data scientists is math-heavy, also for historical reasons: it is often delivered by the professors who used to teach statistics classes.
- This is the primary reason why one needs to be genuinely math-savvy to land a standard data science job, so sticking to standard, math-heavy training and standard tools works for people who want to become hardcore data scientists.
What Does A Functional Data Scientist Need To Know In Reality?
- A good data scientist or engineer must have extensive knowledge of databases and engineering best practices.
- These include handling and logging errors, monitoring the system, building pipelines that tolerate human error, understanding what it takes to scale up, setting up continuous integration, database administration, keeping data clean, and ensuring that the pipeline is deterministic.
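Two of the practices above, logging errors instead of failing silently and keeping a pipeline deterministic, can be illustrated with a small Python sketch. The function names (`parse_amount`, `run_step`) and the record format are hypothetical and chosen only for illustration; a real pipeline would add monitoring, retries, and persistent storage on top of this.

```python
import logging
import random

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


def parse_amount(raw):
    """Convert a raw field to float; log and return None on bad input (hypothetical helper)."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        log.warning("Skipping unparseable amount: %r", raw)
        return None


def run_step(raw_records, seed=42, sample_size=3):
    """Clean records, then draw a reproducible sample (hypothetical pipeline step).

    Sorting the cleaned values and using a fixed seed on a local RNG means
    the same input always produces the same output.
    """
    cleaned = [a for a in (parse_amount(r) for r in raw_records) if a is not None]
    cleaned.sort()
    rng = random.Random(seed)  # local RNG: no hidden global random state
    k = min(sample_size, len(cleaned))
    return cleaned, rng.sample(cleaned, k)


if __name__ == "__main__":
    records = ["10.5", "oops", "3", None, "7.25"]
    cleaned, sample = run_step(records)
    print(cleaned)  # [3.0, 7.25, 10.5]
```

Bad records are logged as warnings and skipped rather than crashing the run, and calling `run_step` twice on the same input returns the same sample.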
Is Serious Mathematical Knowledge Required?
- In reality, there is a set of techniques covering all aspects of machine learning, the statistical engine behind data science, that does not use any mathematics or statistical theory beyond high school level.
- Anyone with a strong background in working with data and programming can learn data science very quickly.
- Yet this set of techniques, developed by hardcore, mathematically oriented data scientists, uses neither advanced mathematics nor statistical theory.
- These techniques work just as well, and some of them have been proven equivalent to their math-heavy counterparts, with the added bonus of generally being more robust.
- These techniques are also easy to understand and lead to easier interpretation, as they are based on years of experience processing large volumes of diverse data in automated mode.
How Much Math Does A Data Scientist Actually Do?
Busting the myth and revealing the reality
- Entry-level to intermediate data scientists spend less than 5% of their time doing mathematics. The same goes for machine learning: building a model involves very little math.
- For machine learning beginners, the real prerequisite skill is data analysis; there is no need to know calculus and linear algebra to build a model that makes accurate predictions.
- Mathematics plays a particularly significant role only for those doing machine learning research in an academic setting, or for a small subset of more advanced data scientists.
- There are also people at high levels in industry who use advanced math on a regular basis: those pushing the boundaries of machine learning and working on bleeding-edge tools.
- For example, people at companies like Google and Facebook certainly use calculus, linear algebra, and more advanced math routinely in their work.
The bottom line is that in industry, data scientists just don’t do much higher-level math. What they really do is spend a huge amount of their time getting data, cleaning data, and exploring data. The truth is that 80% of what people do is data munging and data visualization.
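As a rough illustration of what that data munging looks like in practice, here is a minimal Python sketch that cleans a few messy records and then aggregates them. The record layout, the city names, and the rule that exact duplicates are dropped after normalising are assumptions made for the example; whether duplicates should be removed depends on the data.

```python
from collections import defaultdict


def clean(rows):
    """Normalise city names, drop incomplete rows, and remove exact duplicates."""
    seen = set()
    out = []
    for city, sales in rows:
        if city is None or sales in (None, ""):
            continue  # drop rows with a missing field
        key = (city.strip().title(), float(sales))
        if key not in seen:  # drop duplicates that only differed in spacing/case
            seen.add(key)
            out.append(key)
    return out


def total_by_city(rows):
    """Aggregate: total sales per city."""
    totals = defaultdict(float)
    for city, sales in rows:
        totals[city] += sales
    return dict(totals)


raw = [(" pune", "120"), ("Pune ", "120"), ("DELHI", "80.5"),
       ("delhi", "19.5"), (None, "5"), ("Mumbai", "")]
print(total_by_city(clean(raw)))  # {'Pune': 120.0, 'Delhi': 100.0}
```

None of this needs more than basic arithmetic, but each step (what counts as missing, what counts as a duplicate) is a judgement about the data.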
9 Math-Free Techniques Covering A Good Chunk Of Data Science
- Advanced Machine Learning with Basic Excel: This is a light implementation of the technique, with basic Excel versions that are very simple to understand. It is currently available in Python, Perl, Julia, and R, and an SQL implementation is planned for the future.
- Machine Learning Automation with HDT: This method blends two traditional techniques, decision trees and regression. However, this implementation involves no node splitting and no traditional regression model; the regression part is the math-free Jack-knife regression. An earlier version used logistic regression, but because simple data transformations and fewer parameters resulted in better performance, logistic regression was replaced by Jack-knife regression.
- Model-Free Confidence Intervals: Traditionally, understanding confidence intervals requires a basic grasp of random variables and probability distributions. These confidence interval methods are instead based on percentiles, which are very easy to understand, math-free, and highly reliable for predictive analytics.
- Tests of Hypotheses: This is one of the most difficult topics for students taking statistics classes. Here, it is replaced by a simple variant of the model-free confidence intervals above, so the concept is straightforward to understand.
- Jack-knife Regression with Excel: These regression techniques are so simple and efficient that they can easily be implemented in Excel or SQL.
- Jack-knife Regression: Theory - This is regression without statistical theory behind it, not even linear algebra, yet it comes with confidence intervals. Even though the method uses only a few meta-parameters, its loss of accuracy compared with classic regression is minimal. It works well in the presence of outliers, highly correlated features, or other violations of the assumptions that one’s data set must satisfy when using traditional regression.
- Indexation, Cataloguing, and NLP: This method is a math-free approach to supervised clustering.
- Fast Combinatorial Feature Selection: Traditional feature selection techniques are based on some variance reduction principle, which usually requires understanding the concept of a random variable; this method avoids that requirement.
- Variance, Clustering, and Density Estimation: In these methods, no mathematics is involved.
- The key takeaway is that for beginning data scientists and ML practitioners, data expertise beats math expertise: one will get much farther by really knowing one’s way around a dataset than by knowing calculus or college-level math.
- So, if your goal is to get a job in business or industry, your first milestone should be mastering data analysis, not calculus. It is not at all about being able to write proofs or grind through math problems; it is analysis that matters.
- One needs to master how to gather data, explore it, and prepare it. Above all, mastering data visualization and data wrangling, including aggregation, is key, so that one can combine the two to perform exploratory data analysis.
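The percentile idea behind the model-free confidence intervals and hypothesis tests in the list above can be sketched with the standard percentile bootstrap. This is a swapped-in, well-known technique, not necessarily the exact method the list refers to; the resample count, seed, and sample values are arbitrary assumptions. The test simply rejects a hypothesised mean when it falls outside the interval.

```python
import random
import statistics


def percentile(sorted_vals, q):
    """Simple index-based percentile (q in [0, 100]) of an already sorted list."""
    idx = round(q / 100 * (len(sorted_vals) - 1))
    return sorted_vals[max(0, min(len(sorted_vals) - 1, idx))]


def bootstrap_ci(data, level=95, n_resamples=2000, seed=0):
    """Percentile bootstrap CI for the mean: resample, average, read off percentiles."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    tail = (100 - level) / 2
    return percentile(means, tail), percentile(means, 100 - tail)


def reject_mean(data, hypothesised_mean, level=95):
    """Test H0: mean == hypothesised_mean by checking whether it lies outside the CI."""
    low, high = bootstrap_ci(data, level)
    return not (low <= hypothesised_mean <= high)


sample = [12.1, 9.8, 11.4, 10.9, 13.2, 10.1, 11.8, 12.5, 9.9, 11.0]
print(bootstrap_ci(sample))
print(reject_mean(sample, 11.0), reject_mean(sample, 15.0))
```

The only concepts needed are resampling, averaging, and sorting; the interval is read directly off the sorted resampled means.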
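The regression items in the list above stress that the computations fit in Excel or SQL. The sketch below is not the Jack-knife regression method referred to there. It shows ordinary least-squares slope and intercept computed from simple running sums (the kind of columns a spreadsheet can hold), plus an approximate confidence interval for the slope from the classic leave-one-out jackknife, a different, standard technique that happens to share the name. The multiplier `z=1.96` for a roughly 95% interval is an assumption.

```python
import statistics


def fit_line(xs, ys):
    """Least-squares slope and intercept from running sums only (Excel/SQL-friendly)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # needs at least two distinct x values
    intercept = (sy - slope * sx) / n
    return slope, intercept


def jackknife_slope_ci(xs, ys, z=1.96):
    """Approximate CI for the slope via the classic leave-one-out jackknife."""
    n = len(xs)
    loo = [fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])[0] for i in range(n)]
    mean_loo = statistics.fmean(loo)
    se = ((n - 1) / n * sum((s - mean_loo) ** 2 for s in loo)) ** 0.5
    slope = fit_line(xs, ys)[0]
    return slope - z * se, slope + z * se


xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
slope, intercept = fit_line(xs, ys)
low, high = jackknife_slope_ci(xs, ys)
print(f"slope={slope:.3f} intercept={intercept:.3f} CI=({low:.3f}, {high:.3f})")
```

Each quantity is a sum or a ratio of sums, so the same steps map onto spreadsheet columns or SQL aggregates.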
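Indexation and cataloguing, as listed above, can be approached without statistical machinery. The following is a hypothetical keyword-voting sketch of supervised clustering, not the specific method the list refers to: an index records which labelled categories each keyword has appeared under, and a new text is assigned to the category its keywords vote for most often. The naive whitespace tokenisation and the example categories are assumptions.

```python
from collections import Counter, defaultdict


def tokens(text):
    """Lowercase word tokens with surrounding punctuation stripped (naive tokeniser)."""
    words = (w.strip(".,;:!?()").lower() for w in text.split())
    return [w for w in words if w]


def build_index(labelled_docs):
    """Map each keyword to a Counter of the categories it appeared under."""
    index = defaultdict(Counter)
    for text, category in labelled_docs:
        for word in set(tokens(text)):
            index[word][category] += 1
    return index


def assign(text, index):
    """Each known keyword adds its category counts; the highest total wins."""
    votes = Counter()
    for word in set(tokens(text)):
        votes.update(index.get(word, Counter()))
    return votes.most_common(1)[0][0] if votes else None


training = [
    ("cheap flights to Paris", "travel"),
    ("hotel deals in Rome", "travel"),
    ("python pandas tutorial", "tech"),
    ("machine learning with python", "tech"),
]
index = build_index(training)
print(assign("python tips", index))        # tech
print(assign("flights and hotel", index))  # travel
```

Texts with no known keywords return `None`, which is a natural place to route items for manual cataloguing.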
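A combinatorial approach to feature selection can be sketched as an exhaustive search over small feature subsets. In this hypothetical version (not necessarily the method the list refers to), features are assumed to be categorical, each subset splits the rows into cells by their feature values, and a subset is scored by how often the majority label of a row’s cell matches the row’s own label. Because the score is computed on the same rows it is fitted to, larger subsets look better than they are, so a held-out set would be needed in practice.

```python
from collections import Counter, defaultdict
from itertools import combinations


def subset_score(rows, labels, subset):
    """Fraction of rows whose label equals the majority label of their cell.

    A 'cell' is one distinct combination of values of the features in `subset`.
    """
    cells = defaultdict(Counter)
    for row, label in zip(rows, labels):
        cells[tuple(row[i] for i in subset)][label] += 1
    correct = sum(c.most_common(1)[0][1] for c in cells.values())
    return correct / len(rows)


def best_subset(rows, labels, max_size=2):
    """Score every feature subset up to max_size; smaller subsets win ties."""
    n_features = len(rows[0])
    best, best_score = (), -1.0
    for size in range(1, max_size + 1):
        for subset in combinations(range(n_features), size):
            score = subset_score(rows, labels, subset)
            if score > best_score:  # strict: a larger subset must do strictly better
                best, best_score = subset, score
    return best, best_score


# Label is the XOR of features 0 and 1; feature 2 is noise.
rows = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 0), (1, 1, 0)]
labels = [0, 1, 1, 0, 0, 0]
print(best_subset(rows, labels))  # ((0, 1), 1.0)
```

Here no single feature explains the label, but the search finds the pair that does, using nothing beyond counting.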
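For the density estimation item in the list above, the simplest math-light estimator is a histogram: count the points in each bin and divide by the number of points times the bin width. This sketch assumes fixed, equal-width bins over a user-chosen range (with `high > low`) and is an illustration, not the specific method the list refers to.

```python
def histogram_density(values, n_bins, low, high):
    """Per-bin density estimate: share of in-range points in the bin divided by bin width.

    Values outside [low, high] are ignored; `high` itself falls in the last bin.
    Assumes high > low and n_bins >= 1.
    """
    width = (high - low) / n_bins
    counts = [0] * n_bins
    inside = 0
    for v in values:
        if low <= v <= high:
            idx = min(int((v - low) / width), n_bins - 1)
            counts[idx] += 1
            inside += 1
    if inside == 0:
        return [0.0] * n_bins
    return [c / (inside * width) for c in counts]


print(histogram_density([0.1, 0.2, 0.3, 0.9], n_bins=2, low=0.0, high=1.0))  # [1.5, 0.5]
```

Multiplying each density by the bin width and summing gives 1, so the bars behave like a probability density over the chosen range.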
Even though these math-free techniques show that math-free data science or ML is possible, they in no way guarantee that you will meet industry expectations or land a job, as that depends entirely on what exactly you do as a data scientist and on the company you work for.
It is possible to be a functional data scientist without being a mathematical wizard, but based on experience, without a certain level of concrete mathematical literacy, one will struggle to be an effective practitioner in the long term on projects at the math-heavier end.
The reference for this article is taken from here.
Martin F.R. works as a Technology Journalist at Analytics India Magazine. He likes to write detail-oriented, well-researched articles in clearly articulated formats. Besides covering updates on analytics, artificial intelligence and data science, his interests include politics, economics, finance, consumer electronics, global affairs and public policy. When not writing articles, he usually reads biographies of successful entrepreneurs or experiments with new culinary ideas.