Introductory Guide to Linear Discriminant Analysis

Advances in technology and the growing use of modern devices generate huge amounts of data. Feeding such large data directly to predictive algorithms does not always give accurate results, because many parts of the data may be irrelevant to what we want to predict or analyze. Data reduction approaches are therefore needed to shrink the data. Here, data reduction means reducing the dimensions of the data, that is, reducing the number of variables on a statistical basis. In contrast to unsupervised dimensionality reduction methods, this article discusses a supervised method of dimensionality reduction, Linear Discriminant Analysis (LDA), and compares it with others. Below is a list of the points we will cover in this article.

1. What is Dimensionality Reduction?
2. What is Linear Discriminant Analysis?
• Assumptions of LDA
• Objectives of LDA
3. Statistics associated with Discriminant Analysis
4. LDA vs PCA
5. Applications of LDA

Let’s proceed with these topics.

What is Dimensionality Reduction?

Dimensionality reduction is the transformation of data from a high-dimensional space into a low-dimensional space such that the low-dimensional representation retains nearly all of the information (ideally all of it) while reducing only the width of the data, that is, the number of variables. Working in a high-dimensional space can be undesirable for many reasons: raw data is often sparse, and computational cost is high. Dimensionality reduction is therefore common in fields that deal with large numbers of instances and columns.

Methods of dimensionality reduction are divided into linear and non-linear approaches. Dimensionality reduction can also be used for noise reduction, data visualization, cluster analysis, and as an intermediate step while building predictive models.

Some of the commonly used dimensionality reduction techniques are,

What is Linear Discriminant Analysis?

Linear discriminant analysis was formulated in 1936 by Ronald A. Fisher, who showed some of its practical uses as a classifier; it was initially described as a two-class problem. Later, in 1948, C. R. Rao generalized it to multi-class linear discriminant analysis.

In most cases, linear discriminant analysis is used as a dimensionality reduction technique for supervised problems: it projects features from a higher-dimensional space into a lower-dimensional space. Many engineers and scientists use it as a preprocessing step before finalizing a model. With LDA we try to answer two questions: which set of parameters best describes the group membership of a class, and which classification model best separates those groups.
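To make the projection idea concrete, here is a minimal NumPy sketch of Fisher's two-class criterion on synthetic Gaussian data. The data, the helper name `fisher_direction`, and the midpoint threshold are illustrative assumptions, not taken from any library. The direction w = S_w^-1 (m1 - m0) maps each 2-D point to a single number, and thresholding that number turns the same projection into a linear classifier.

```python
import numpy as np


def fisher_direction(X0, X1):
    """Two-class Fisher discriminant: w is proportional to S_w^-1 (m1 - m0)."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: each class's scatter matrix around its own mean, summed
    S_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(S_w, m1 - m0)
    # Hypothetical decision rule: midpoint of the two projected class means
    threshold = w @ (m0 + m1) / 2
    return w, threshold


rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))  # class 0, 2 features
X1 = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(100, 2))  # class 1, 2 features
X = np.vstack([X0, X1])
y = np.repeat([0, 1], 100)

w, threshold = fisher_direction(X0, X1)
z = X @ w                             # 2-D points projected onto a 1-D axis
y_pred = (z > threshold).astype(int)  # the same projection used as a linear classifier
print("projected shape:", z.shape)
print("training accuracy:", (y_pred == y).mean())
```

The midpoint threshold is only a reasonable choice when the two classes have similar sizes and similar spread; otherwise the cut-off would need to account for class priors.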

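LDA works by finding a linear combination of features that characterizes or separates two or more classes, and the resulting combination is used either as a linear classifier or for dimensionality reduction. LDA is closely related to ANOVA and regression analysis, which also express one dependent variable as a linear combination of other variables. The key difference lies in the variable types: LDA has continuous (metric) independent variables and a categorical dependent variable, ANOVA has categorical independent variables and a continuous dependent variable, and regression typically has continuous variables on both sides. Below is a basic comparison table for LDA, regression, and ANOVA.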

Assumptions of LDA

• Each feature (column) in the dataset follows a Gaussian distribution; in simple words, the data points are normally distributed with a bell-shaped curve.
• The independent variables are normally distributed within each level of the grouping variable, and the groups share the same variance-covariance matrix.
• Predictive power can decrease as the correlation between predictor variables increases.
• All instances are assumed to be randomly sampled, and a case's score on a variable is assumed to be independent of the other cases' scores on that variable.

Linear discriminant analysis has been observed to be relatively robust to slight violations of all of the above assumptions.
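A rough way to screen the data before fitting LDA is to test each feature for normality within each class and to compare the class covariance matrices. The sketch below assumes SciPy's `stats.shapiro` (Shapiro-Wilk test); the helper `screen_lda_assumptions` and the synthetic data are hypothetical. A per-feature test is only a univariate check (multivariate normality is a stronger condition), and eyeballing covariance matrices is not a formal test such as Box's M.

```python
import numpy as np
from scipy import stats


def screen_lda_assumptions(X, y, alpha=0.05):
    """Per-class screening: Shapiro-Wilk normality per feature plus the class covariance matrix."""
    report = {}
    for label in np.unique(y):
        Xc = X[y == label]
        # Shapiro-Wilk p-value for each feature within this class;
        # p <= alpha flags a feature whose values look non-normal in this class
        pvalues = [stats.shapiro(Xc[:, j])[1] for j in range(X.shape[1])]
        report[int(label)] = {
            "looks_normal": [bool(p > alpha) for p in pvalues],
            # Compare these matrices across classes to judge whether the spread is similar
            "covariance": np.cov(Xc, rowvar=False),
        }
    return report


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (50, 3)), rng.normal(2.0, 1.0, (50, 3))])
y = np.repeat([0, 1], 50)

report = screen_lda_assumptions(X, y)
for label, result in report.items():
    print("class", label, "normal-looking features:", result["looks_normal"])
    print(np.round(result["covariance"], 2))
```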

Objectives of LDA

• Develop discriminant functions, i.e. linear combinations of the predictor (independent) variables that best discriminate between the categories of the criterion (dependent) variable.
• Check whether there are significant differences between the predefined groups.
• Determine which predictor variables contribute most to the inter-group differences.
• Classify cases into one of the groups based on the values of the predictor variables.

Statistics associated with LDA

Canonical Correlation

It measures the degree of association between the discriminant scores and the groups. In other words, it indicates how well a single discriminant function characterizes group membership, where group membership is represented by a set of dummy variables.

Centroid

The centroid is the mean value of the discriminant scores for a particular group. There are as many centroids as there are groups, one for each. The group centroids are the means of the groups across all discriminant functions.
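A minimal sketch of computing group centroids, assuming scikit-learn's `LinearDiscriminantAnalysis` and its bundled Iris dataset as an example: project the data onto the discriminant functions, then average the scores within each group.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Iris: 150 flowers, 4 measurements, 3 species (groups)
X, y = load_iris(return_X_y=True)

# With 3 groups, LDA yields at most 3 - 1 = 2 discriminant functions
scores = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

# One centroid per group: that group's mean score on each discriminant function
centroids = np.array([scores[y == k].mean(axis=0) for k in np.unique(y)])
print(np.round(centroids, 2))  # rows = groups, columns = discriminant functions
```

The result has one row per group and one column per function, matching the "one centroid per group" description above.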

Discriminant function Coefficients

The unstandardized discriminant function coefficients are the multipliers of the variables when the variables are expressed in their original units of measurement.

Discriminant scores

Each variable's value is multiplied by its unstandardized coefficient. The discriminant score is obtained by summing these products and adding the constant term.
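The calculation is a plain linear combination, D = b0 + b1*x1 + b2*x2 + ... + bp*xp. The sketch below uses made-up coefficients and cases purely for illustration; in practice the coefficients come from the fitted discriminant function.

```python
import numpy as np

# Hypothetical discriminant function with three predictors (illustrative numbers only)
constant = -2.5                             # b0, the constant term
coefficients = np.array([0.8, -0.3, 1.2])   # b1..b3, unstandardized (original units)

# Two cases measured on the same three predictors, in original units
cases = np.array([
    [1.0, 2.0, 0.5],
    [3.0, 1.0, 2.0],
])

# D = b0 + b1*x1 + b2*x2 + b3*x3: multiply, sum the products, add the constant
scores = constant + cases @ coefficients
print(scores)
```

For the first case the arithmetic is -2.5 + 0.8 - 0.6 + 0.6 = -1.7; for the second it is -2.5 + 2.4 - 0.3 + 2.4 = 2.0.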

Eigenvalue

For each discriminant function, the eigenvalue is the ratio of the between-group sum of squares to the within-group sum of squares. A large eigenvalue indicates a superior function, one that separates the groups better.
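A NumPy sketch of where these eigenvalues come from: they are the eigenvalues of S_w^-1 S_b, where S_w and S_b are the within-group and between-group sums-of-squares-and-cross-products matrices. The sketch also derives each function's canonical correlation as sqrt(eigenvalue / (1 + eigenvalue)). The helper name and synthetic data are assumptions for illustration.

```python
import numpy as np


def discriminant_eigenvalues(X, y):
    """Eigenvalues of S_w^-1 S_b and the matching canonical correlations."""
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    S_w = np.zeros((p, p))  # within-group sums of squares and cross-products
    S_b = np.zeros((p, p))  # between-group sums of squares and cross-products
    for k in classes:
        Xk = X[y == k]
        mk = Xk.mean(axis=0)
        S_w += (Xk - mk).T @ (Xk - mk)
        d = (mk - grand_mean).reshape(-1, 1)
        S_b += len(Xk) * (d @ d.T)
    # Each eigenvalue = between-group SS / within-group SS of one discriminant function
    eigvals = np.linalg.eigvals(np.linalg.solve(S_w, S_b)).real
    eigvals = np.sort(eigvals)[::-1][: len(classes) - 1]  # at most g - 1 can be non-zero
    # Canonical correlation of each function with group membership
    canonical_corr = np.sqrt(eigvals / (1.0 + eigvals))
    return eigvals, canonical_corr


rng = np.random.default_rng(3)
X = np.vstack([
    rng.normal([0.0, 0.0, 0.0], 1.0, (40, 3)),
    rng.normal([2.0, 0.0, 0.0], 1.0, (40, 3)),
    rng.normal([0.0, 2.0, 0.0], 1.0, (40, 3)),
])
y = np.repeat([0, 1, 2], 40)

eigvals, canonical_corr = discriminant_eigenvalues(X, y)
print("eigenvalues:", np.round(eigvals, 3))
print("canonical correlations:", np.round(canonical_corr, 3))
```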

Total Correlation matrix

The total correlation matrix is obtained by treating all cases as if they came from a single sample and computing the correlations.
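In NumPy this is simply `np.corrcoef` over all cases with the group labels ignored. For comparison only, the sketch also computes a pooled within-group correlation by centering each group on its own mean first; that second matrix and the synthetic data are illustrative additions.

```python
import numpy as np

rng = np.random.default_rng(2)
# Two groups whose three features are uncorrelated within each group
X = np.vstack([rng.normal(0.0, 1.0, (40, 3)), rng.normal(3.0, 1.0, (40, 3))])
y = np.repeat([0, 1], 40)

# Total correlation matrix: ignore the group labels, treat all 80 cases as one sample
total_corr = np.corrcoef(X, rowvar=False)

# For comparison: pooled within-group correlation (center each group on its own mean first)
X_centered = np.vstack([X[y == k] - X[y == k].mean(axis=0) for k in np.unique(y)])
within_corr = np.corrcoef(X_centered, rowvar=False)

print(np.round(total_corr, 2))
print(np.round(within_corr, 2))
```

Because both groups shift all three features together, the total correlations should come out clearly positive even though the features are independent inside each group, which is why the total matrix is not a measure of within-group relationships.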

F-Scores and their significance

These are calculated from a one-way ANOVA, with the grouping variable serving as the categorical independent variable. Each predictor, in turn, serves as the metric dependent variable in the ANOVA.
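A minimal sketch of these per-predictor F-scores, computed directly from the between-group and within-group sums of squares, with p-values from SciPy's F distribution (`scipy.stats.f.sf`). SciPy's `stats.f_oneway` gives the same F for a single predictor. The helper name and data are hypothetical.

```python
import numpy as np
from scipy import stats


def predictor_f_scores(X, y):
    """One-way ANOVA F and p-value for each predictor (column) against the grouping variable."""
    groups = np.unique(y)
    n, g = len(y), len(groups)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for k in groups:
        Xk = X[y == k]
        ss_between += len(Xk) * (Xk.mean(axis=0) - grand_mean) ** 2
        ss_within += ((Xk - Xk.mean(axis=0)) ** 2).sum(axis=0)
    f = (ss_between / (g - 1)) / (ss_within / (n - g))
    p = stats.f.sf(f, g - 1, n - g)  # upper-tail probability of the F distribution
    return f, p


rng = np.random.default_rng(4)
# Predictor 0 differs between the three groups; predictor 1 does not
X = np.column_stack([
    np.concatenate([rng.normal(m, 1.0, 30) for m in (0.0, 1.5, 3.0)]),
    rng.normal(0.0, 1.0, 90),
])
y = np.repeat([0, 1, 2], 30)

f, p = predictor_f_scores(X, y)
print("F:", np.round(f, 2), "p:", np.round(p, 4))
```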

Wilks' Lambda

Sometimes also called the U statistic, Wilks' lambda for each predictor is the ratio of the within-group sum of squares to the total sum of squares. Its value varies between 0 and 1. Large values (near 1) indicate that the group means do not seem to differ, while small values indicate that the group means do seem to differ.
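A NumPy sketch of the per-predictor (univariate) Wilks' lambda described above, on synthetic data where one predictor separates the groups and the other is noise. The helper name and data are assumptions for illustration.

```python
import numpy as np


def predictor_wilks_lambda(X, y):
    """Univariate Wilks' lambda per predictor: within-group SS / total SS."""
    ss_total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    ss_within = np.zeros(X.shape[1])
    for k in np.unique(y):
        Xk = X[y == k]
        ss_within += ((Xk - Xk.mean(axis=0)) ** 2).sum(axis=0)
    return ss_within / ss_total


rng = np.random.default_rng(5)
# Predictor 0 separates the groups; predictor 1 is pure noise
X = np.column_stack([
    np.concatenate([rng.normal(m, 1.0, 30) for m in (0.0, 2.0, 4.0)]),
    rng.normal(0.0, 1.0, 90),
])
y = np.repeat([0, 1, 2], 30)

wilks = predictor_wilks_lambda(X, y)
print(np.round(wilks, 3))  # small for predictor 0, close to 1 for predictor 1
```

For a single predictor, lambda and the ANOVA F-score carry the same information: lambda = 1 / (1 + F * (g - 1) / (n - g)), where g is the number of groups and n the number of cases.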

LDA vs PCA

From the discussion so far, we can see that LDA is in general very similar to principal component analysis (PCA): both are linear transformation techniques for dimensionality reduction. They also differ in several ways, however, the most fundamental being that PCA is unsupervised and ignores class labels, while LDA is supervised and uses them. The differences are listed below:
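The supervised/unsupervised difference is visible directly in code. This minimal sketch assumes scikit-learn's `PCA` and `LinearDiscriminantAnalysis` and its bundled Iris dataset: PCA is fitted on the features alone, while LDA also needs the labels and is capped at one fewer component than the number of classes.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

# PCA is unsupervised: fit only on X, keeping directions of maximum total variance
X_pca = PCA(n_components=2).fit_transform(X)

# LDA is supervised: fit on X and y, keeping directions that best separate the classes
# (at most n_classes - 1 = 2 components, whereas PCA allows up to n_features = 4)
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print("PCA:", X_pca.shape, "LDA:", X_lda.shape)
```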

Applications of LDA

Linear discriminant analysis has been used successfully in many applications. Whenever a problem is, or can be transformed into, a classification problem, this technique can be applied. Discriminant analysis can also be used for novel applications involving new combinations of features and objects that may never have been considered before.

Some examples of real-world application areas of LDA:

Face recognition

Face recognition is a widely used computer vision application in which every face is represented by a large number of pixel values. Here LDA reduces the number of features before the classification task is performed. A template is created from the newly produced dimensions, which are linear combinations of pixel values.

In the medical field

Here LDA is used to classify the severity of a patient's disease as mild, moderate, or severe based on a small number of parameters, which helps guide the patient's treatment.

Robotics

Robots are trained to replicate human tasks and behavior, and these can be treated as classification tasks. Here LDA can be used to assign inputs to groups of similar items based on parameters such as frequency, pitch, sound, and tune.

Conclusion

In this article, we have seen what dimensionality reduction is and why it is significant. We then discussed a supervised method of dimensionality reduction called LDA, which can also be used as a classifier, for example when logistic regression falls short or when we are dealing with two or more classes. Where R² is a deciding measure in regression analysis, in LDA that role is played by Wilks' lambda.

Vijaysinh is an enthusiast in machine learning and deep learning. He is skilled in ML algorithms, data manipulation, handling and visualization, and model building.
