5 Fundamental Theorems Of Machine Learning

The last century has seen tremendous innovation in mathematics. New theories have been postulated, and traditional theorems have been made more robust by persistent mathematicians. We are still reaping the benefits of their exhaustive endeavours as we build intelligent machines.

Here is a list of five theorems that serve as cornerstones of standard machine learning models:

The Gauss-Markov Theorem

This theorem was given in part by Carl Friedrich Gauss in 1821 and by Andrey Markov in 1900. Its modern notation was given by F. A. Graybill in 1976.

Statement: In a linear model whose error probability distribution is unknown, among all linear unbiased estimators of the model parameters, the estimator obtained by the method of least squares is the one with minimum variance. The errors are assumed to have zero expectation, to be uncorrelated with one another, and to share the same (unknown) finite variance.

Application: Linear Regression models
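The statement can be checked numerically with a small simulation. The sketch below is only an illustration under stated assumptions: the design matrix, the true coefficients, the uniform noise and the arbitrary weights used for the competing estimator are all made up for the example. It compares ordinary least squares with another linear unbiased estimator (a weighted least-squares fit with fixed, arbitrary weights) and measures the spread of both over many simulated data sets.

```python
import numpy as np

def ols(X, y):
    # Ordinary least squares: minimises ||y - X b||^2
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def weighted_ls(X, y, w):
    # A competing linear unbiased estimator: (X^T W X)^{-1} X^T W y
    # with fixed weights w that do not depend on y
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])  # intercept + one feature
beta_true = np.array([1.0, 2.0])      # hypothetical true parameters
w = np.linspace(0.1, 5.0, n)          # arbitrary weights, chosen for illustration

ols_estimates, alt_estimates = [], []
for _ in range(2000):
    # Errors: zero mean, equal variance, uncorrelated, deliberately non-Gaussian
    eps = rng.uniform(-1.0, 1.0, size=n)
    y = X @ beta_true + eps
    ols_estimates.append(ols(X, y))
    alt_estimates.append(weighted_ls(X, y, w))

ols_estimates = np.array(ols_estimates)
alt_estimates = np.array(alt_estimates)

# Both estimators are unbiased, but least squares has the smaller variance
ols_var = ols_estimates.var(axis=0)
alt_var = alt_estimates.var(axis=0)
print("OLS variance per coefficient:        ", ols_var)
print("Alternative variance per coefficient:", alt_var)
```

Both estimators average out close to the true coefficients, but the least-squares estimates should scatter less around them, which is what the theorem predicts.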

Universal Approximation Theorem

Statement: A feed-forward network with a single hidden layer containing a finite number of neurons can approximate any continuous function on compact subsets of R^n to arbitrary accuracy, under mild assumptions on the activation function. The theorem guarantees that such a network exists; it does not say how to learn its weights.

Application: Artificial neural networks
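As a rough illustration, the sketch below builds a network with one tanh hidden layer and uses it to approximate sin(x) on the compact interval [-π, π]. The training shortcut is an assumption made for brevity: the hidden weights are drawn at random and kept fixed, and only the output layer is fitted by least squares. The target function, the hidden-layer sizes and the helper name `fit_single_hidden_layer` are likewise chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_single_hidden_layer(x, y, n_hidden):
    # Hidden layer: random, fixed weights and biases with a tanh activation
    w = rng.normal(0.0, 1.0, size=n_hidden)
    b = rng.uniform(-3.0, 3.0, size=n_hidden)

    def hidden(x_in):
        # One column per hidden neuron, plus a constant column for the output bias
        h = np.tanh(np.outer(x_in, w) + b)
        return np.column_stack([h, np.ones(len(x_in))])

    # Output layer: linear weights fitted by least squares
    out_weights, *_ = np.linalg.lstsq(hidden(x), y, rcond=None)
    return lambda x_new: hidden(x_new) @ out_weights

x_train = np.linspace(-np.pi, np.pi, 200)
y_train = np.sin(x_train)
x_dense = np.linspace(-np.pi, np.pi, 1000)

max_error = {}
for n_hidden in (2, 10, 50):
    network = fit_single_hidden_layer(x_train, y_train, n_hidden)
    max_error[n_hidden] = np.max(np.abs(network(x_dense) - np.sin(x_dense)))
    print(n_hidden, "hidden neurons, max error:", max_error[n_hidden])
```

With enough hidden neurons the worst-case error on the interval becomes small, which is the behaviour the theorem promises for continuous targets on a compact set.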

Singular Value Decomposition

It generalises the eigendecomposition of a symmetric matrix with non-negative eigenvalues to any m × n matrix, via an extension of the polar decomposition.

Statement: Suppose M is an m × n matrix whose entries come from the field K, which is either the field of real numbers or the field of complex numbers. Then there exists a factorisation, called a ‘singular value decomposition’ of M, of the form

M = UΣV*

where

  • U is an m × m unitary matrix over K (when K is the field of real numbers, unitary matrices are orthogonal matrices),
  • Σ is a diagonal m × n matrix with non-negative real numbers on the diagonal,
  • V is an n × n unitary matrix over K, and V* is the conjugate transpose of V.

Application: Principal Component Analysis
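The factorisation and its link to Principal Component Analysis can be sketched with NumPy's `np.linalg.svd`. The random 6 × 3 real matrix below is a stand-in for real data, and the variable names are assumptions made for the example. For a real matrix the conjugate transpose V* is simply the transpose, which NumPy returns directly as `Vh`.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(6, 3))           # m = 6 rows, n = 3 columns (illustrative data)

# Full SVD: U is m x m, Vh is V* (n x n), s holds the singular values
U, s, Vh = np.linalg.svd(M, full_matrices=True)

# Rebuild the m x n "diagonal" matrix Sigma from the singular values
Sigma = np.zeros(M.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
reconstructed = U @ Sigma @ Vh

# PCA sketch: centre the columns, then take the SVD of the centred data
centred = M - M.mean(axis=0)
_, s_c, Vh_c = np.linalg.svd(centred, full_matrices=False)
components = Vh_c                                  # rows are principal directions
explained_variance = s_c ** 2 / (len(M) - 1)       # variance along each direction
scores = centred @ components.T                    # data in the principal basis

print("singular values:", s)
print("explained variance:", explained_variance)
```

The rows of `components` are the principal directions, and the squared singular values of the centred data, divided by the number of samples minus one, give the variance captured along each of them.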

Mercer’s Theorem

Published by James Mercer in 1909, this theorem represents a symmetric positive-definite function on a square as the sum of a convergent sequence of product functions.

Statement: Suppose K is a continuous symmetric non-negative definite kernel on [a, b] × [a, b]. Then there is an orthonormal basis {e_i}_i of L^2[a, b] consisting of eigenfunctions of the integral operator associated with K, such that the corresponding sequence of eigenvalues {λ_i}_i is non-negative. The eigenfunctions corresponding to non-zero eigenvalues are continuous on [a, b], and K has the representation

K(s, t) = ∑_i λ_i e_i(s) e_i(t),

where the convergence is absolute and uniform.

Application: Support Vector Machines.
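A discretised analogue of this expansion can be checked numerically. The sketch below assumes a Gaussian kernel with an arbitrary length scale of 0.2 and samples it on a grid over [0, 1]; both choices are made purely for illustration. The eigenvectors of the resulting symmetric matrix play the role of the eigenfunctions, and a truncated sum of eigenvalue-weighted outer products stands in for the series.

```python
import numpy as np

def gaussian_kernel(s, t, length_scale=0.2):
    # A continuous, symmetric, non-negative definite kernel (illustrative choice)
    return np.exp(-(s - t) ** 2 / (2.0 * length_scale ** 2))

grid = np.linspace(0.0, 1.0, 100)
K = gaussian_kernel(grid[:, None], grid[None, :])   # kernel sampled on the grid

# Eigendecomposition of the symmetric matrix; sort eigenvalues in descending order
eigvals, eigvecs = np.linalg.eigh(K)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

def truncated_sum(r):
    # Sum of the first r terms  lambda_i * e_i e_i^T
    return (eigvecs[:, :r] * eigvals[:r]) @ eigvecs[:, :r].T

errors = {r: np.max(np.abs(K - truncated_sum(r))) for r in (2, 5, 20)}
print("smallest eigenvalue:", eigvals.min())
print("max truncation error by number of terms:", errors)
```

The eigenvalues are non-negative up to rounding error, and the truncation error shrinks quickly as more terms are kept, mirroring the uniform convergence in the theorem.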

Representer Theorem

Statement: Among all functions in the Hilbert space H, which admit an infinite representation in terms of eigenfunctions because of Mercer’s theorem, the one that minimises the regularised risk always has a finite representation in the basis formed by the kernel evaluated at the n training points x_1, ..., x_n:

f*(x) = ∑_{i=1}^{n} α_i k(x, x_i)

where H is the reproducing kernel Hilbert space, k is its reproducing kernel, and the α_i are real coefficients.

Application: The kernel trick (used in kernel methods, a class of algorithms for pattern analysis that includes Support Vector Machines)
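Kernel ridge regression is one setting where this finite representation can be written down directly. The sketch below assumes a squared-error loss, a Gaussian kernel with length scale 1, a regularisation strength of 1e-3 and noise-free samples of sin(x); none of these come from the theorem itself. Under those assumptions the minimiser of the regularised risk has coefficients α = (K + λI)^{-1} y, and predictions only ever use the kernel evaluated at the n training points.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Gaussian kernel matrix between two sets of 1-D points (illustrative choice)
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * length_scale ** 2))

x_train = np.linspace(0.0, 2.0 * np.pi, 30)   # n = 30 training points
y_train = np.sin(x_train)
lam = 1e-3                                    # regularisation strength (assumed)

# Minimiser of  sum_i (y_i - f(x_i))^2 + lam * ||f||_H^2
K = rbf_kernel(x_train, x_train)
alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)

def f(x_new):
    # Finite representation: f(x) = sum_i alpha_i * k(x, x_i)
    return rbf_kernel(np.atleast_1d(x_new), x_train) @ alpha

print("number of coefficients:", alpha.size)
print("f(pi/2):", f(np.pi / 2.0)[0])
```

Although the function space is infinite-dimensional, the fitted function is described completely by the n numbers in `alpha`, one per training point.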



Copyright Analytics India Magazine Pvt Ltd
