A Guide to Locally Linear Embedding for Dimensionality Reduction

Locally-Linear Embedding is an approach to non-linear dimensionality reduction that preserves the local neighbourhood structure of high-dimensional data.

The performance of any machine learning model depends strongly on the quality of the data used to train it. When the training data is very large, its size often needs to be scaled down. For this purpose, there are many methods to reduce the dimensionality of the data; some of them work when the relations between the features are linear, while others work when those relations are non-linear. In this article, we are going to discuss locally-linear embedding, an approach to non-linear dimensionality reduction that produces neighbourhood-preserving embeddings of high-dimensional data. The major points to be discussed in this article are listed below.

Table of Contents

  1. Non-Linear Dimensionality Reduction
  2. Manifold Learning
  3. Locally-Linear Embedding (LLE)
    1. Hessian Locally-Linear Embedding (Hessian LLE)
    2. Modified Locally-Linear Embedding (MLLE)
    3. Local Tangent Space Alignment (LTSA)
  4. Implementation of LLE in Python

Let us start the discussion by understanding the non-linear dimensionality reduction approach. 

Non-Linear Dimensionality Reduction

High-dimensional data requires more than two or three dimensions to represent, which makes it difficult to understand how the data is distributed in space or to interpret it at all. Non-linear dimensionality reduction is an approach that simplifies such data by assuming that the interesting structure lies in a lower-dimensional space embedded in the high-dimensional one. Once the data is mapped into that lower-dimensional space, it becomes much easier to interpret through visualization. Manifold learning is one way to perform non-linear dimensionality reduction.

Manifold Learning

Manifold learning is a family of approaches to non-linear dimensionality reduction. Most of these approaches rest on the idea that the dimensionality of the data is only artificially high: although each observation has many features, the data effectively lies on a much lower-dimensional surface. Such high-dimensional datasets are difficult to interpret through visualization; plots of data with few dimensions give a proper overview of its structure, whereas visualizations of high-dimensional data are much less intuitive. To obtain a proper view of the structure of the data, a reduction of dimension is required.

The simplest way to reduce dimension is to take a random projection of the data. This lets us visualize the data structure to some extent, but a random projection discards much of the interesting structure in the data. To deal with this problem, methods such as Principal Component Analysis (PCA), Independent Component Analysis, and Linear Discriminant Analysis were introduced. These are very powerful methods for dimensionality reduction, but they are generally not suited to data whose structure is not linear.

The manifold learning approach can be considered a generalization of PCA that makes it possible to reduce the dimensionality of data with a non-linear structure. A typical manifold learning method is unsupervised: the algorithm learns the structure of the data without using any predetermined classification. There are many approaches in the manifold learning family, such as Isomap, locally linear embedding, and local tangent space alignment. In this article, we are going to learn about the Locally-Linear Embedding approach to dimension reduction.
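To make the difference concrete, here is a small illustrative sketch (not part of the original article) that compares a linear projection (PCA) with LLE on scikit-learn's S-curve toy dataset; the dataset choice and parameter values below are our own assumptions.

from sklearn.datasets import make_s_curve
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding
import matplotlib.pyplot as plt

# toy non-linear manifold: points lying on an S-shaped surface in 3D
X, color = make_s_curve(n_samples=1000, random_state=0)

# linear projection: PCA keeps the directions of maximal variance
X_pca = PCA(n_components=2).fit_transform(X)

# non-linear embedding: LLE preserves local neighbourhood geometry
X_lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2).fit_transform(X)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(X_pca[:, 0], X_pca[:, 1], c=color, cmap=plt.cm.Spectral, s=5)
ax1.set_title("PCA (linear)")
ax2.scatter(X_lle[:, 0], X_lle[:, 1], c=color, cmap=plt.cm.Spectral, s=5)
ax2.set_title("LLE (non-linear)")
plt.show()

In such a comparison the linear projection typically folds the S-curve onto itself, while the LLE embedding unrolls it.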

Locally-Linear Embedding (LLE)

There can be several advantages of using Locally-Linear Embedding over Isomap and other approaches: because it works with sparse matrix algorithms, it can be optimized faster and gives better results on many problems. The LLE algorithm starts by finding a set of nearest neighbours for each point. It then computes a set of weights for each point that describe the point as a linear combination of its neighbours. In the final step it finds a low-dimensional embedding of the points using an eigenvector-based optimization, such that each point is still described by the same linear combination of its neighbours. LLE has no internal model.

To relate any point Xi to its neighbours Xj, LLE uses the barycentric coordinates of Xi: the algorithm reconstructs the original point as a linear combination of its neighbours, with the combination given by the weight matrix Wij. The error in reconstructing the original points is given by the following function.

E(W) = Σi | Xi − Σj Wij Xj |²

Where E(W) can also be called the cost function.

In reconstructing the original point Xi, the contribution of point Xj is given by the weight Wij. The above cost function is minimized subject to two constraints (a small sketch of the resulting weight computation follows this list):

  • Each original point is reconstructed only from the points in its neighbourhood; the algorithm sets Wij to zero whenever Xj is not a neighbour of the original point Xi.
  • The sum of every row of the weight matrix Wij equals 1.
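The sketch below (our own illustration, not the article's code) shows how the reconstruction weights for a single point could be computed under these two constraints: a regularized least-squares solve over the point's k nearest neighbours, rescaled so the weights sum to one.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def reconstruction_weights(X, i, k=10, reg=1e-3):
    """Weights that best reconstruct X[i] from its k nearest neighbours."""
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    # kneighbors returns the query point itself as the first neighbour, so drop it
    _, idx = nbrs.kneighbors(X[i].reshape(1, -1))
    neighbours = idx[0][1:]

    # local Gram matrix of the neighbours centred on X[i]
    Z = X[neighbours] - X[i]
    C = Z @ Z.T
    # small regularization keeps the solve stable when k > n_features
    C = C + np.eye(k) * reg * np.trace(C)

    # solve C w = 1 and rescale so the weights sum to one (second constraint)
    w = np.linalg.solve(C, np.ones(k))
    return neighbours, w / w.sum()

X_toy = np.random.rand(100, 5)                      # toy data for illustration
nbr_idx, w = reconstruction_weights(X_toy, i=0)
print(round(w.sum(), 6))                            # 1.0, as required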

The main goal of the algorithm is to map data of dimension D to dimension d, where D >> d. The idea behind generating neighbourhood-preserving maps is that the same weight matrix that reconstructs the original points in the D-dimensional data can also be used to reconstruct them in the d-dimensional data. Each point Xi in the D-dimensional data is mapped onto a point Yi in the d-dimensional data by minimizing the following embedding cost function.

Φ(Y) = Σi | Yi − Σj Wij Yj |²

Unlike in the first step, here the algorithm keeps the weight matrix fixed and optimizes the coordinates of the embedding: the cost function is minimized over the points Yi.
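To illustrate this second step, here is a simplified NumPy sketch (our own, not the article's code): with the weight matrix W held fixed, minimizing the embedding cost reduces to an eigenvalue problem on M = (I − W)ᵀ(I − W), as the next paragraph explains. The sketch uses a dense eigensolver for simplicity, whereas practical implementations use sparse matrices.

import numpy as np

def lle_embedding(W, d=2):
    """Embed N points in d dimensions given a row-stochastic weight matrix W (N x N)."""
    N = W.shape[0]
    I = np.eye(N)
    # minimizing sum_i |Y_i - sum_j W_ij Y_j|^2 over Y is equivalent to
    # minimizing trace(Y^T M Y) with M = (I - W)^T (I - W)
    M = (I - W).T @ (I - W)

    # eigh returns eigenvalues in ascending order for the symmetric matrix M
    eigvals, eigvecs = np.linalg.eigh(M)

    # the smallest eigenvalue (~0) belongs to the constant eigenvector,
    # so skip it and use the next d eigenvectors as embedding coordinates
    return eigvecs[:, 1:d + 1]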

If the number of data points is N, the minimization problem can be solved as an eigenvalue problem on a sparse N x N matrix: its bottom d non-zero eigenvectors provide an orthogonal set of embedding coordinates. Three LLE variants are commonly distinguished:

  • Hessian Locally-Linear Embedding (Hessian LLE) – like basic LLE, it is based on sparse matrix techniques; it tends to give much higher-quality results than basic LLE and likewise has no internal model.
  • Modified Locally-Linear Embedding (MLLE) – another LLE variant that addresses a weakness of the local weight matrix: when the weights are not uniquely determined, the resulting LLE maps can be distorted. MLLE deals with this by using multiple weight vectors in each neighbourhood; more formally, the weights generated by basic LLE are orthogonally projected to obtain these multiple weights.
  • Local Tangent Space Alignment (LTSA) – LTSA can also be considered a variant of LLE because its algorithm is quite similar to LLE's. The main difference is that LLE focuses on preserving distances within each neighbourhood, whereas LTSA characterizes the local geometry of each neighbourhood through its tangent space.

In the next section of the article, we will see how to implement LLE and its different variants using Python. The implementation below is inspired by the example given by scikit-learn.

Implementation of LLE in Python

In this section, we are going to apply LLE and its variants to the load_digits dataset provided by the sklearn library of Python.

  • Importing data set:
from sklearn.datasets import load_digits
from matplotlib import offsetbox
import matplotlib.pyplot as plt
import numpy as np

data = load_digits()
X = data.data
y = data.target
n_samples = 1797     # number of images in the digits dataset
n_features = 64      # each image has 8 x 8 = 64 pixels
n_neighbors = 30     # neighbourhood size used by the LLE variants
fig, axs = plt.subplots(nrows=10, ncols=10, figsize=(7, 7))
for idx, ax0 in enumerate(axs.ravel()):
    ax0.imshow(X[idx].reshape((8, 8)), cmap=plt.cm.binary)
    ax0.axis("off")

_ = fig.suptitle("digits dataset", fontsize=18)

Output:

The above image is a representation of the digits dataset, where all ten digit classes available in the dataset have been loaded.
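Before embedding, it can help to confirm the shape of the data we just loaded; this quick check is not part of the original walkthrough.

# quick sanity check of the loaded data (not part of the original walkthrough)
print(X.shape)        # (1797, 64): 1797 images, 8 x 8 = 64 pixel features
print(np.unique(y))   # [0 1 2 3 4 5 6 7 8 9]: the ten digit classes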

  • Defining a helper function for plotting the LLE and its variants:
def helper_function(X, title, ax):
    from sklearn.preprocessing import MinMaxScaler

    # scale the embedding to [0, 1] so the annotations fit the axes
    X = MinMaxScaler().fit_transform(X)
    images = load_digits(n_class=10).images
    shown_images = np.array([[1.0, 1.0]])  # a point far from the others
    for i in range(X.shape[0]):
        # plot every digit of the embedding as coloured text
        ax.text(
            X[i, 0],
            X[i, 1],
            str(y[i]),
            color=plt.cm.Dark2(y[i]),
            fontdict={"weight": "bold", "size": 9},
        )

        # show a thumbnail image only for points that are not too close
        # to an already annotated point
        dist = np.sum((X[i] - shown_images) ** 2, 1)
        if np.min(dist) < 4e-3:
            continue
        shown_images = np.concatenate([shown_images, [X[i]]], axis=0)

        imagebox = offsetbox.AnnotationBbox(
            offsetbox.OffsetImage(images[i], cmap=plt.cm.gray_r), X[i]
        )
        ax.add_artist(imagebox)
    ax.set_title(title)
    ax.axis("off")

The above function plots a given embedding of the digits dataset: it draws the projection of each data point, annotated with its digit, so that by visualizing the result we can see whether the classes are grouped well or scattered across the embedding space.

  • Comparison of different LLE techniques:

We can use the LocallyLinearEmbedding module provided by the scikit-learn library to perform LLE and each of its variants simply by changing the method argument.

from sklearn.manifold import LocallyLinearEmbedding

# one estimator per LLE variant, selected via the `method` argument
embeddings = {
    "Standard LLE": LocallyLinearEmbedding(
        n_neighbors=n_neighbors, n_components=2, method="standard"
    ),
    "Modified LLE": LocallyLinearEmbedding(
        n_neighbors=n_neighbors, n_components=2, method="modified"
    ),
    "Hessian LLE": LocallyLinearEmbedding(
        n_neighbors=n_neighbors, n_components=2, method="hessian"
    ),
    "LTSA LLE": LocallyLinearEmbedding(
        n_neighbors=n_neighbors, n_components=2, method="ltsa"
    ),
}

By declaring these methods we can now compute the projections of the original data. We can also compare the computation time of each LLE variant using the following code.



from time import time

projections, timing = {}, {}
for name, transformer in embeddings.items():

    if name.startswith("Standard LLE"):
        data = X.copy()
        data.flat[:: X.shape[1] + 1] += 0.01  # Make X invertible
    else:
        data = X
    print(f"{name}...")
    start_time = time()

    projections[name] = transformer.fit_transform(data, y)
    timing[name] = time() - start_time

Output:

Finally, we can plot the resulting projection given by each method.



from itertools import zip_longest

fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(17, 24))

for name, ax0 in zip_longest(timing, axs.ravel()):
    if name is None:
        ax0.axis("off")
        continue
    title = f"{name} (time {timing[name]:.3f}s)"
    helper_function(projections[name], title, ax0)
plt.show()

Output:

In the above output we can see the differences between the LLE variants: standard LLE takes the least computation time, and judging from the visualizations, the Hessian LLE and Modified LLE variants give the best-grouped results.

Final Words

In this article, we have seen how the locally-linear embedding algorithm works, applied LLE and all of its variants to the digits data, and compared the results. These LLE variants are a good option when the data is high-dimensional and has a non-linear structure.

Yugesh Verma
Yugesh is a graduate in automobile engineering and worked as a data analyst intern. He completed several Data Science projects. He has a strong interest in Deep Learning and writing blogs on data science and machine learning.
