# How Does PCA Dimension Reduction Work For Images?

In this article, we will demonstrate how to work on larger datasets and images using a well-known dimension reduction technique, PCA (principal component analysis).

In machine learning, we need lots of data to build an efficient model, but dealing with a large dataset is not an easy task: preprocessing takes serious effort, and as data scientists we often face datasets with a large number of variables. PCA (principal component analysis) is a dimension reduction technique that helps with exactly these problems.

In this article, we will cover:

• How does PCA work?
• How does PCA work on Image compression?
• How does PCA work on a normal Dataset?
• Limitations of PCA

## How does PCA work?

PCA is a dimensionality reduction technique often used to compress the variables of a large dataset into a smaller set of components that retains most of the information needed to build an efficient model. In a real-world scenario, reducing the number of variables usually means compromising on model accuracy, but PCA keeps that loss small. The idea of PCA is to reduce the number of variables while preserving as much of the data's variance as possible.
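As a quick illustration (a toy sketch with made-up correlated data, using scikit-learn):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two correlated features: most of the variance lies along one direction.
x = rng.normal(size=200)
data = np.column_stack([x, 0.5 * x + rng.normal(scale=0.1, size=200)])

# A single principal component captures almost all of the variance,
# so one column can stand in for two with little information lost.
pca = PCA(n_components=1).fit(data)
print(pca.explained_variance_ratio_)
```

Here `explained_variance_ratio_` comes out close to 1.0, which is exactly the sense in which PCA "preserves the data" while dropping variables.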

## How does PCA work on Image Compression?

An image is a grid of pixels arranged in rows, where each pixel value represents an intensity. If you have multiple images, you can form a matrix by treating each image's pixels as a row vector. Storing many images this way requires huge amounts of space, so we use PCA to compress them while preserving as much of the information as possible.
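As a sketch of that row-vector idea, with small random arrays standing in for real images:

```python
import numpy as np

# Ten hypothetical 8x8 grayscale images.
images = np.random.rand(10, 8, 8)

# Flatten each image into a 64-pixel row vector: one row per image.
matrix = images.reshape(10, -1)
print(matrix.shape)  # (10, 64)
```

PCA can then be applied to this matrix exactly as it would be to any tabular dataset.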

## Hands-on implementation of image compression using PCA

Importing the libraries:

```
import matplotlib.image as mplib
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA
```

Reading an image and printing its shape (the file path is illustrative; here the image is 225 x 225 with 3 colour channels):

```
img = mplib.imread('image.png')  # illustrative path
print(img.shape)  # (225, 225, 3)
plt.imshow(img)
```

Reshaping the image to 2 dimensions by folding the colour depth into the columns, so 225 x 3 = 675 columns:

```
img_r = np.reshape(img, (225, 675))
print(img_r.shape)  # (225, 675)
```

Applying PCA to compress the image; the reduced dimension is shown in the output.

```
pca = PCA(32).fit(img_r)
img_transformed = pca.transform(img_r)
print(img_transformed.shape)  # (225, 32)
print(np.sum(pca.explained_variance_ratio_))
```

Retrieving the image after dimension reduction:

```
temp = pca.inverse_transform(img_transformed)
print(temp.shape)  # (225, 675)
temp = np.reshape(temp, (225, 225, 3))
print(temp.shape)  # (225, 225, 3)
plt.imshow(temp)
```

As you can see in the output, we compressed the image using PCA.
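To quantify how much information the compression loses, one can compare the reconstruction against the original matrix. A minimal sketch, with a random matrix standing in for the reshaped image:

```python
import numpy as np
from sklearn.decomposition import PCA

# A synthetic 225x675 matrix stands in for the reshaped image above.
rng = np.random.default_rng(0)
img_r = rng.random((225, 675))

# Keep 32 components, then project back to the original space.
pca = PCA(32).fit(img_r)
approx = pca.inverse_transform(pca.transform(img_r))

# Mean squared reconstruction error: lower means less information lost.
mse = np.mean((img_r - approx) ** 2)
print(mse)
```

On a real photograph (where pixels are highly correlated) the error is far smaller than on random data, which is why PCA compresses images so well.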

## How does PCA work on a Machine Learning Dataset?

As data scientists, we often need to work with large datasets of nearly a thousand columns, which are very hard to preprocess and visualize. With PCA we can compress that multidimensional data into a handful of components. In the code snippets below we will implement PCA on a dataset and run the k-means algorithm on the compressed data to divide it into clusters.

## Implementation of PCA on a machine learning dataset

```
#Importing libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Load the wine dataset (the path is illustrative).
wine = pd.read_csv('wine.csv')
wine.describe()
```

Applying PCA to compress the data.

```
pca = PCA(n_components=13)
pca_values = pca.fit_transform(wine)
var = pca.explained_variance_ratio_
pca.components_[0]
```

How the compressed data is distributed (cumulative percentage of variance explained):

```
var1 = np.cumsum(np.round(var, decimals=4) * 100)
var1
```
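A common way to pick the number of components is to plot this cumulative variance and look for the point where the curve flattens. A sketch, using scikit-learn's built-in wine dataset as a stand-in:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA

wine = load_wine().data  # stand-in for the wine data loaded above
pca = PCA(n_components=13).fit(wine)
cum_var = np.cumsum(pca.explained_variance_ratio_) * 100

# Scree-style plot: cumulative % of variance vs. number of components.
plt.plot(range(1, 14), cum_var, marker="o")
plt.xlabel("Number of components")
plt.ylabel("Cumulative explained variance (%)")
plt.savefig("scree.png")
print(cum_var[-1])  # all 13 components together explain 100% of the variance
```

If the first few components already reach, say, 95%, the remaining ones can be dropped with little loss.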

We store the first three principal components as the compressed dataset.

`z = pca_values[:, :3]`

We now test the compressed data with the k-means algorithm.

```
new_df = pd.DataFrame(z)
new_df
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3)
kmeans.fit(new_df)
kmeans.labels_
```

As you can see in the output, k-means divided the PCA-compressed data into 3 clusters.
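One way to sanity-check such a clustering (not part of the original walkthrough) is the silhouette score. A sketch, again using scikit-learn's built-in wine dataset as a stand-in:

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

wine = load_wine().data  # stand-in for the wine data used above
z = PCA(n_components=3).fit_transform(wine)  # compress to 3 components

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(z)
score = silhouette_score(z, kmeans.labels_)
print(score)  # closer to 1 means better-separated clusters
```

A score well above 0 suggests the clusters found on the compressed data are genuinely separated, not an artifact of the compression.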

## Limitations

• PCA cannot be applied directly to categorical features; we need to create dummy variables for them first.
• The principal components are linear combinations of the original variables, so we can no longer interpret them as the original features.
• If every value in the data is important, PCA may not be the right choice, because some information is lost in the process of dimension reduction.
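For the first point, a minimal sketch of creating dummy variables with pandas (the `colour` column is made up):

```python
import pandas as pd

df = pd.DataFrame({"colour": ["red", "green", "red"], "size": [1, 2, 3]})

# PCA needs numeric input, so expand the categorical column into dummies first.
encoded = pd.get_dummies(df, columns=["colour"])
print(list(encoded.columns))  # ['size', 'colour_green', 'colour_red']
```

The fully numeric `encoded` frame can then be passed to `PCA` like any other dataset.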

## Conclusion

In the above demonstration, we discussed how PCA is used for dimension reduction in image compression and on a machine learning dataset, and how to run the k-means algorithm on the compressed data.
