5 Fundamental Theorems Of Machine Learning


The last century has seen tremendous innovation in mathematics. New theories have been postulated and traditional theorems have been made robust by persistent mathematicians, and we are still reaping the benefits of their exhaustive endeavours as we build intelligent machines.

Here is a list of five theorems that act as cornerstones of standard machine learning models:

The Gauss-Markov Theorem

Carl Friedrich Gauss gave the first part of this theorem in 1821, and Andrey Markov contributed to it in 1900. F. A. Graybill gave the theorem its modern notation in 1976.

Statement: When the probability distribution of the errors in a linear model is unknown, then, amongst all linear unbiased estimators of the model's parameters, the estimator obtained by the method of least squares is the one with the smallest variance. The errors are assumed to be uncorrelated, the mathematical expectation of each error is assumed to be zero, and all of them have the same (unknown) variance.

Application: Linear Regression models
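
The claim can be checked empirically with a small simulation. The sketch below is only an illustration under stated assumptions: the true intercept and slope, the uniform noise and the competing "endpoint" estimator are all hypothetical choices, not part of the theorem. Both estimators are linear in the observations and unbiased, so the theorem predicts that the least-squares slope has the smaller variance.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Least-squares slope for the model y = a + b*x + error."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def endpoint_slope(xs, ys):
    """A competing linear unbiased estimator: slope through the first and last points."""
    return (ys[-1] - ys[0]) / (xs[-1] - xs[0])


def simulate(n_trials=5000, seed=0):
    """Return {name: (mean, variance)} of both slope estimators over repeated samples."""
    rng = random.Random(seed)
    xs = [float(i) for i in range(10)]
    true_intercept, true_slope = 1.0, 2.0  # hypothetical ground truth
    ols, endpoint = [], []
    for _ in range(n_trials):
        # Errors: zero mean, equal variance, uncorrelated, deliberately non-Gaussian.
        ys = [true_intercept + true_slope * x + rng.uniform(-1.0, 1.0) for x in xs]
        ols.append(ols_slope(xs, ys))
        endpoint.append(endpoint_slope(xs, ys))
    return {
        "ols": (statistics.mean(ols), statistics.pvariance(ols)),
        "endpoint": (statistics.mean(endpoint), statistics.pvariance(endpoint)),
    }


results = simulate()
for name, (mean, var) in results.items():
    print(f"{name:>8}: mean slope = {mean:.4f}, variance = {var:.5f}")
```

Both means should sit close to the true slope, while the least-squares variance should come out lower. Note that the noise here is uniform rather than Gaussian, which matches the theorem's point that no particular error distribution is required.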

Universal Approximation Theorem

Statement: A feed-forward network with a single hidden layer containing a finite number of neurons can approximate any continuous function on compact subsets of R^n to any desired accuracy, under mild assumptions on the activation function.

Application: Artificial neural networks
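
The theorem only says that a good network exists; it does not say how to find it. For a one-dimensional input, though, such a network can be written down by hand. The sketch below is an illustration under assumptions that are not in the theorem itself: it uses the ReLU activation, the interval [0, 1] and a hypothetical target function sin(2πx). Each hidden neuron computes ReLU(x − t) for a knot t, and the output weights are chosen so that the network interpolates the target at the knots.

```python
import math


def relu(z):
    return max(0.0, z)


def build_relu_network(f, n_hidden, a=0.0, b=1.0):
    """One hidden layer of n_hidden ReLU neurons that interpolates f on [a, b]."""
    h = (b - a) / n_hidden
    knots = [a + k * h for k in range(n_hidden)]  # hidden neuron k computes relu(x - knots[k])
    slopes = [(f(t + h) - f(t)) / h for t in knots]  # slope of the interpolant on each piece
    # Output weight of neuron k is the change of slope at its knot.
    weights = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n_hidden)]
    bias = f(a)

    def network(x):
        return bias + sum(w * relu(x - t) for w, t in zip(weights, knots))

    return network


def max_error(network, f, a=0.0, b=1.0, n_points=1001):
    """Largest absolute gap between network and f on an evenly spaced grid."""
    grid = [a + (b - a) * i / (n_points - 1) for i in range(n_points)]
    return max(abs(network(x) - f(x)) for x in grid)


def target(x):
    """Hypothetical continuous function to approximate."""
    return math.sin(2 * math.pi * x)


for n_hidden in (5, 20, 80):
    net = build_relu_network(target, n_hidden)
    print(f"{n_hidden:>3} hidden neurons: max error = {max_error(net, target):.5f}")
```

The error shrinks as hidden neurons are added, which is the behaviour the theorem guarantees is possible. In practice the weights are learned by training rather than constructed this way.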

Singular Value Decomposition

It generalises the eigendecomposition of a symmetric matrix with non-negative eigenvalues to any m × n matrix, via an extension of the polar decomposition.

Statement: Suppose M is an m × n matrix whose entries come from the field K, which is either the field of real numbers or the field of complex numbers. Then there exists a factorisation, called a ‘singular value decomposition’ of M, of the form

M = UΣV*

where

  • U is an m × m unitary matrix over K (when K is the field of real numbers, unitary matrices are orthogonal matrices),
  • Σ is a diagonal m × n matrix with non-negative real numbers on the diagonal,
  • V is an n × n unitary matrix over K, and V* is the conjugate transpose of V.

Application: Principal Component Analysis
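
To make the factorisation concrete, the sketch below computes the singular values and vectors of a small real matrix using only the standard library. It is an illustration, not a production routine: it uses power iteration with deflation (a swapped-in, simple technique), works for real matrices only, and returns the reduced form with just the non-zero singular values. The example matrix is a hypothetical choice whose singular values are 5 and 3. For real work, a library routine such as numpy.linalg.svd is the usual choice.

```python
import math


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def norm(x):
    return math.sqrt(sum(v * v for v in x))


def top_singular_triplet(A, n_iter=500):
    """Power iteration on A^T A: returns (sigma, u, v) for the largest singular value."""
    At = transpose(A)
    v = [1.0 / (j + 1) for j in range(len(A[0]))]  # arbitrary non-zero start vector
    length = norm(v)
    v = [c / length for c in v]
    for _ in range(n_iter):
        w = matvec(At, matvec(A, v))
        length = norm(w)
        if length == 0.0:
            break
        v = [c / length for c in w]
    Av = matvec(A, v)
    sigma = norm(Av)
    u = [c / sigma for c in Av] if sigma > 0.0 else Av
    return sigma, u, v


def svd_by_deflation(M, tol=1e-10):
    """Reduced SVD of a real matrix as a list of (sigma, u, v) triplets."""
    residual = [row[:] for row in M]
    triplets = []
    for _ in range(min(len(M), len(M[0]))):
        sigma, u, v = top_singular_triplet(residual)
        if sigma < tol:
            break
        triplets.append((sigma, u, v))
        # Remove the rank-one piece sigma * u * v^T that was just found.
        residual = [
            [residual[i][j] - sigma * u[i] * v[j] for j in range(len(v))]
            for i in range(len(u))
        ]
    return triplets


def reconstruct(triplets, m, n):
    """Sum of sigma * u * v^T over all triplets, i.e. U Σ V^T."""
    return [[sum(s * u[i] * v[j] for s, u, v in triplets) for j in range(n)] for i in range(m)]


M = [[3.0, 2.0, 2.0], [2.0, 3.0, -2.0]]  # hypothetical example matrix
triplets = svd_by_deflation(M)
print("singular values:", [round(s, 6) for s, _, _ in triplets])
print("reconstruction:", reconstruct(triplets, 2, 3))
```

Summing the rank-one pieces recovers M, which is the factorisation in the statement written out term by term. Keeping only the largest singular values gives the low-rank approximation that Principal Component Analysis relies on.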

Mercer’s Theorem

Presented by Mercer in 1909, this theorem represents a symmetric positive-definite function on a square as the sum of a convergent sequence of product functions.

Statement: Suppose K is a continuous symmetric non-negative definite kernel on [a, b] × [a, b]. Then there is an orthonormal basis {ei}i of L2[a, b] consisting of eigenfunctions of the integral operator defined by K, such that the corresponding sequence of eigenvalues {λi}i is non-negative. The eigenfunctions corresponding to non-zero eigenvalues are continuous on [a, b], and K has the representation

K(s, t) = Σ_{i=1}^{∞} λi ei(s) ei(t),

where the convergence is absolute and uniform.

Application: Support Vector Machines.
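
A worked example may help make the expansion tangible. The sketch below assumes one specific kernel, K(s, t) = min(s, t) on [0, 1], chosen purely for illustration because its eigenfunctions and eigenvalues are known in closed form: ei(t) = √2 sin((i − 1/2)πt) with λi = 1 / ((i − 1/2)π)^2. Summing more and more terms of λi ei(s) ei(t) should bring the result closer to min(s, t).

```python
import math


def kernel(s, t):
    """Assumed example kernel on [0, 1] x [0, 1]."""
    return min(s, t)


def eigenvalue(i):
    """Closed-form eigenvalue of the min kernel, for i = 1, 2, ..."""
    return 1.0 / (((i - 0.5) * math.pi) ** 2)


def eigenfunction(i, t):
    """Closed-form orthonormal eigenfunction of the min kernel."""
    return math.sqrt(2.0) * math.sin((i - 0.5) * math.pi * t)


def mercer_sum(s, t, n_terms):
    """Truncated Mercer expansion: sum of lambda_i * e_i(s) * e_i(t) over the first n_terms."""
    return sum(
        eigenvalue(i) * eigenfunction(i, s) * eigenfunction(i, t)
        for i in range(1, n_terms + 1)
    )


s, t = 0.3, 0.7
for n_terms in (1, 10, 100, 1000):
    approx = mercer_sum(s, t, n_terms)
    print(f"{n_terms:>4} terms: {approx:.6f}  (kernel value {kernel(s, t):.6f})")
```

Support Vector Machines use the same idea in reverse: because a valid kernel can be written as an inner product of (possibly infinitely many) features, the algorithm can work with kernel values alone and never has to compute those features.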

Representer Theorem

Statement: Among all functions that admit an infinite representation in terms of eigenfunctions because of Mercer’s theorem, the one that minimises the regularised risk always has a finite representation in the basis formed by the kernel evaluated at the ‘n’ training points:

f*(·) = Σ_{i=1}^{n} αi k(·, xi)

Here f* is the minimiser over the Hilbert space H, k is the reproducing kernel of H, x1, ..., xn are the training points and the αi are real coefficients.

Application: Kernel methods (a class of algorithms for pattern analysis, such as Support Vector Machines)
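
Kernel ridge regression is a convenient way to see the theorem at work. The sketch below is an illustration under assumptions not stated in the article: it uses the squared loss, a Gaussian (RBF) kernel and a handful of made-up training points. With the risk (1/n) Σ (f(xi) − yi)^2 + λ‖f‖^2, the minimiser is a weighted sum of the kernel evaluated at the n training points, and the weights solve the linear system (G + λnI)α = y, where G is the matrix of kernel values between training points.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian kernel (assumed choice for this sketch)."""
    return math.exp(-gamma * (x - z) ** 2)


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_kernel_ridge(xs, ys, lam=1e-3, kernel=rbf_kernel):
    """Return (alphas, predict) with predict(x) = sum_i alphas[i] * kernel(x, xs[i])."""
    n = len(xs)
    gram = [[kernel(a, b) for b in xs] for a in xs]
    system = [
        [gram[i][j] + (lam * n if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    alphas = solve_linear_system(system, ys)

    def predict(x):
        # Finite representation: one coefficient per training point.
        return sum(a * kernel(x, x_i) for a, x_i in zip(alphas, xs))

    return alphas, predict


train_x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]  # made-up training inputs
train_y = [math.sin(x) for x in train_x]
alphas, predict = fit_kernel_ridge(train_x, train_y)
print("number of coefficients:", len(alphas))
print("prediction at 1.25:", predict(1.25), "target:", math.sin(1.25))
```

The search over an infinite-dimensional space of functions has collapsed to solving for seven numbers, one per training point. This is what makes kernel methods computable.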


Copyright Analytics India Magazine Pvt Ltd