 # Complete Guide to Linear Algebra for Data Scientists with NumPy

Data Science relies heavily on Linear Algebra. NumPy offers array-like data structures & dedicated operations and methods for Linear Algebra.

NumPy is an essential Python library to perform mathematical and scientific computations. NumPy offers Python’s array-like data structures with exclusive operations and methods. Many data science libraries and frameworks, including Pandas, Scikit-Learn, Statsmodels, Matplotlib and SciPy, are built on top of NumPy with Numpy arrays in their building blocks. Some frameworks, including TensorFlow and PyTorch, introduce NumPy arrays or NumPy-alike arrays as their fundamental data structure in the name of tensors.

Data Science relies heavily on Linear Algebra. NumPy is famous for its Linear Algebra operations. This article discusses methods available in the NumPy library to perform various Linear Algebra operations with examples. These examples assume that the readers have a basic understanding of NumPy arrays. Check out the following articles to have a better understanding of NumPy fundamentals:

#### THE BELAMY

Import the NumPy library for Linear Algebra functions and Matplotlib for some plotting functions.

``` import numpy as np
import matplotlib.pyplot as plt ```

## Transpose of Matrix

Matrix transpose is performed with the `transpose` method on a nested list or a Python array, or a higher-dimensional Numpy array.

``` # Transpose of a Matrix (as nested list)
a = [[1,2,3,4],[2,3,4,5]]
b = np.transpose(a)
print('a\n',a)
print('b\n',b) ```

Output:

If the matrix is a NumPy array, it can be treated as an object and method `T` can be applied over it as follows.

``` # Transpose of a Matrix (as NumPy array)
a = np.array([[1,2,3,4],[2,3,4,5]])
b = a.T
print('a\n',a)
print('b\n',b) ```

Output:

## Dot Product

The `dot` method of NumPy performs dot-matrix product (scalar product) for 1D or higher dimensional arrays. If the inputs are scalars (numbers), it performs multiplication.

``` # scalars
a = 5
b = 3
z = np.dot(a,b)
print(z) ```

Output:

The above NumPy operation is equivalent to simple multiplication.

``` z = a * b
print(z) ```

Output:

In the case of one- or higher-dimensional arrays, the inputs can be either NumPy arrays, Python arrays, Python lists or Python’s nested lists.

``` # 1D arrays or vectors
a = np.array([1,2,3])  # or a = [1,2,3]
b = np.array([2,3,4])  # or b = [2,3,4]
z = np.dot(a,b)
print(z) ```

Output:

``` # 2D arrays or matrices
a = [[1,2,3],[2,0,3],[7,-5,1]]
b = [[3,-1,5],[-2,-6,4], [0,4,4]]
z = np.dot(a,b)
print(z) ```

Output:

A NumPy array is a Numpy object upon which the `dot` method can be performed as below. However, this method accepts only NumPy arrays to operate on.

``` # convert lists into NumPy arrays
a = np.array(a)
b = np.array(b)
z = a.dot(b)
print(z) ```

Output:

## The multi_dot method

It performs dot (scalar) product with 2 or more input matrices. First and last arrays can be either 1D or 2D arrays. However, the dimensions of the matrices must suit subsequent scalar matrix multiplication.

``` # matrices with random integers: entries ranging from -4 to 4
a = np.random.randint(-4,4,(500,5))
b = np.random.randint(-4,4,(5,1000))
c = np.random.randint(-4,4,(1000,10))
d = np.random.randint(-4,4,(10,2000))
e = np.random.randint(-4,4,(2000,200))
# Perform multiple matrix multiplication
z = np.linalg.multi_dot([a,b,c,d,e]) ```

The result of this method can be obtained with successive dot products of matrices but `multi_dot` functions in an optimized manner. It decides the order of dot multiplication to complete the entire process efficiently.

``` %%time
z = np.linalg.multi_dot([a,b,c,d,e])
print(z, '\n') ```

Output:

``` %%time
z = a.dot(b).dot(c).dot(d).dot(e)
print(z, '\n') ```

Output:

We can observe that `multi_dot` consumes around 9 ms compute time, whereas successive `dot` methods consume around 335 ms compute time to arrive at the same solution.

## Inner Product

The inner product is the scalar multiplication of one vector (or matrix) and the transpose of another vector (or matrix). If both arrays are 1D arrays, their dimensions should be identical. If either or both arrays are higher-dimensional, then the last dimensions of both arrays should be identical.

``` a = np.array([[1,2,3], [4,-1,0]])
b = np.array([6,3,2])
z = np.inner(a,b)
print(z) ```

Output:

The same results can be obtained using the `dot` method as follows.

`a.dot(b.T)`

Output:

## Outer Product

Outer product is the dot product of a column vector of size M*1 and a row vector of size 1*N. The resulting array is a matrix of size M*N.

``` a = np.array([1,2,3,4,5])
b = np.array([6,3,2])
z = np.outer(a,b)
print(z) ```

Output:

## Matrix Multiplication

In NumPy, matrix multiplication is performed only with matrices, i.e. higher-dimensional arrays. If a vector is passed as an array, a row or a column will be added to that vector to temporarily convert it into a matrix. Once the matrix multiplication is finished, that row or column will be removed automatically.

``` a = np.arange(6).reshape(2,3)
b = np.arange(-3,3).reshape(3,2)
z = np.matmul(a, b)
print(a)
print(b)
print(z) ```

Output:

The last dimension of the second array and the last but-one dimension of the first array should be identical to perform matrix multiplication. Further, the symbol `@` is also used to perform matrix multiplication.

``` a = np.random.random([8,7,5,2])
b = np.random.random([8,7,2,3])
z = a @ b
print(z.shape) ```

Output:

## Matrix Determinant

Matrix determinant can be calculated using the method `det` available in the `linalg` module.

``` # generate a random integer matrix of size 3 by 3
a = np.random.randint(1,10,[3,3])
det = np.linalg.det(a)
print(int(det)) ```

Output:

For a higher-dimensional matrix, determinants are calculated with the entries at the last two dimensions.

``` # Determinant of a 3D array
a = np.random.randint(-5,5,[3,4,4])
det = np.linalg.det(a)
print(a)
print(det) ```

Output:

## Matrix Inverse

Inverse of a square matrix can be derived using the `inv` method of the `linalg` module.

``` a = np.random.randint(1,10,[3,3])
inv = np.linalg.inv(a)
print(a)
print()
print(inv) ```

Output:

## Matrix Power

Matrix Power is a general method to obtain either positive or negative powers of a given square matrix. The first negative power of a matrix is technically termed its inverse. Thus, the `matrix_power` method can be used to find the inverse or any power of a matrix.

``` a = np.random.random([4,4])
# positive powers of matrix
a_2 = np.linalg.matrix_power(a, 2)
a_7 = np.linalg.matrix_power(a, 7)
# inverse of matrix
a_inv_1 = np.linalg.matrix_power(a, -1)
a_inv_3 = np.linalg.matrix_power(a, -3)
print('matrix \n', a)
print('\n matrix to the power 2\n', a_2)
print('\n matrix to the power 7\n', a_7)
print('\n matrix inverse \n', a_inv_1)
print('\n matrix cubic inverse \n', a_inv_3) ```

Output:

## Eigenvalues and Eigenvectors

Eigenvalues and Eigenvectors of a matrix can be determined as follows. If Eigen values cannot be determined, the method throws an error (Eg. Singular matrix).

``` a = np.arange(9).reshape(3,3)
eig_val, eig_vec = np.linalg.eig(a)
print('Eigenvalues are: \n', eig_val)
print('\nEigenvectors are: \n', eig_vec) ```

Output:

Eigenvalues alone can be determined using the method `eigvals` as shown below.

``` a = np.arange(9).reshape(3,3)
eigenvalues = np.linalg.eigvals(a)
print(eigenvalues) ```

Output:

## Traces of a Matrix

Traces of a square matrix is the summation of its diagonal elements.

``` a = np.eye(5)
print(a)
z = np.trace(a)
print('\nTrace of matrix is: ',z) ```

Output:

Offset can be provided to calculate the summation of elements lying parallel to the diagonal. Offsets above the diagonal are marked with positive integers, and those below the diagonal are marked with negative integers.

``` z = np.trace(a, offset=1)
print(z) ```

Output:

## Matrix Norm

Matrix or vector norm is calculated using the `norm` method of the `linalg` module.

``` a = np.arange(12).reshape(4,3)
z = np.linalg.norm(a)
print(a)
print('\n Frobenius Norm of above matrix:')
print(z) ```

Output:

Axis-wise norm determination is also possible by specifying the axis as an integer.

``` # Norm along axis 0
a = np.arange(12).reshape(4,3)
z = np.linalg.norm(a, axis=0)
print(z) ```

Output:

``` # Norm along axis 1
a = np.arange(12).reshape(4,3)
z = np.linalg.norm(a, axis=1)
# dim = 4
print(z) ```

Output:

For higher-dimensional arrays, axes must be specified in a tuple to calculate the norms.

``` # Norms of 3D arrays
a = np.random.random([3,2,4])
print(a)
print('\nNorm along axes (0,1)')
z = np.linalg.norm(a, axis=(0,1))
print(z)
print('\nNorm along axes (1,2)')
z = np.linalg.norm(a, axis=(1,2))
print(z) ```

Output:

## Solving System of Equations

When we think of Linear Algebra, the system of linear equations comes to our mind first, as it is tedious, time-consuming and error-prone. NumPy solves systems of linear equations in a fraction of seconds!

``` # Coefficient Matrix
a = np.random.randint(1,20,[4,4])
# Dependent variable vector
b = np.array([4,9,12,7])
# solution
x = np.linalg.solve(a,b)
print('Coefficient Matrix')
print(a)
print('\nDependent Variable vector')
print(b)
print('\nSolution')
print(x) ```

Output:

Check for correctness in the solution by performing dot product of the coefficient matrix and the solution vector.

``` # Check for correctness
B = a.dot(x)
print(B) ```

Output:

This ‘B’ array is identical to the input ‘b’ array. Hence, our solution is correct.

## Singular Value Decomposition

Singular Value Decomposition (SVD) is one of the great dimension-reduction algorithms in machine learning. It identifies the principal components and arranges them by rank. The top ranked components contribute greatly to the original array. Here, we explore SVD with an image to get better understanding.

``` from skimage import data
image = data.coffee()
print(image.dtype, image.min(), image.max(), image.shape)
plt.imshow(image)
plt.show() ```

Output:

Normalize the image by dividing it by the maximum value, 255 and reorder the axes to be (3,400,600).

``` # normalize image
img = image/255.0
# reorder the axes to have colour channels as the first dimension
img = np.transpose(img, axes=(2,0,1)) ```

Perform SVD on the image. It decomposes the original image into three components: U matrix, Sigma vector, and V matrix. The Sigma vector is the diagonal entries of the Sigma matrix. Hence, it is advisable that the Sigma vector may be reformed into a diagonal matrix. It should be noted that the first dimension 3 refers to the three colour channels.

``` U,S,V = np.linalg.svd(img)
print(U.shape, S.shape, V.shape) ```

Output:

``` # S matrix should have dimensions suitable for matrix multiplication
Sigma = np.zeros((3,400,600))
for i in range(3):
np.fill_diagonal(Sigma[i,:,:], S[i,:])
print(Sigma.shape) ```

Output:

Reconstruct the original image without any dimension reduction.

``` reconst = U @ Sigma @ V
reconst = np.transpose(reconst, axes=(1,2,0))
plt.imshow(reconst)
plt.show() ```

Output:

Reconstruct the data by reducing the common dimensions from 400 to 50.

``` k = 50
reconst = U @ Sigma[:,:,:k] @ V[:,:k,:]
reconst = np.transpose(reconst, axes=(1,2,0))
plt.imshow(reconst)
plt.show() ```

Output:

It is amazing that we reconstructed the image with most details, even at a reduction of ¼.

We can once again reconstruct the same image by reducing the data points from 400 to 20.

``` k = 20
reconst = U @ Sigma[:,:,:k] @ V[:,:k,:]
reconst = np.transpose(reconst, axes=(1,2,0))
plt.imshow(reconst)
plt.show() ```

Output:

Out of 400 data points, merely 20 data points can reconstruct the image with key feature details! This is why the old mathematical algorithm- SVD is still popular in machine learning.

Find here the Colab Notebook with the above code implementation.

Note: There are no seeds set for the above code implementation. Therefore, the values generated and outputs obtained are not reproducible.

## Wrapping Up

In this article, we discussed how Linear Algebra is performed with the NumPy library. We discussed popular methods and algorithms with examples and hands-on practice. However, there are few more methods available in NumPy’s linalg module that are left undiscussed. Interested readers can find more examples at NumPy’s official documentation.

## More Great AIM Stories

### OpenAI Launches \$100 Mn Fund To Catch AI Startups Young A geek in Machine Learning with a Master's degree in Engineering and a passion for writing and exploring new things. Loves reading novels, cooking, practicing martial arts, and occasionally writing novels and poems.

## Our Upcoming Events

Conference, in-person (Bangalore)
MachineCon 2022
24th Jun

Conference, Virtual
Deep Learning DevCon 2022
30th Jul

Conference, in-person (Bangalore)
Cypher 2022
21-23rd Sep

### Discord Server

Stay Connected with a larger ecosystem of data science and ML Professionals

### Telegram Channel

Discover special offers, top stories, upcoming events, and more.  