Advances in technology and the growing use of modern devices generate huge volumes of data. Feeding such large data directly to predictive algorithms does not always give accurate results, because many parts of the data may be irrelevant to what we want to predict or analyze. There is therefore a need for data reduction approaches that reduce the size of the data. Here, data reduction means reducing the dimensions of the data, that is, reducing the number of variables on a statistical basis. In contrast to unsupervised dimensionality reduction methods, this article covers a supervised method of dimensionality reduction, Linear Discriminant Analysis (LDA), and compares it with other methods. Below is a list of the points that we will cover in this article.
Table of Contents
- What is Dimensionality Reduction?
- What is Linear Discriminant Analysis?
- Assumptions of LDA
- Objectives of LDA
- Statistics associated with LDA
- LDA vs PCA
- Applications of LDA
Let’s proceed with these topics.
What is Dimensionality Reduction?
Dimensionality reduction is the transformation of data from a high-dimensional space into a low-dimensional space such that the low-dimensional representation retains nearly all, and ideally all, of the information in the original data while reducing its width. Working in a high-dimensional space can be undesirable for many reasons: raw data is often sparse, and analyzing it carries a high computational cost. Dimensionality reduction is common in fields that deal with large numbers of instances and columns.
Methods of dimensionality reduction are divided into linear and non-linear approaches. Dimensionality reduction can also be used for noise reduction, data visualization, cluster analysis, and as an intermediate step while building predictive models.
Some of the commonly used dimensionality reduction techniques are:
- Principal component analysis (PCA)
- Non-negative matrix factorization
- Autoencoders
- Graph-based kernel PCA
- Linear discriminant analysis (LDA)
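As a rough illustration of "reducing the width of the data while retaining nearly all the information", the sketch below applies PCA (the first technique in the list) through a singular value decomposition. The data is synthetic and is an assumption made purely for illustration: five observed columns generated from two hidden factors plus a little noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): 5 columns driven by only 2 latent factors.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 5))

# Center the data, then take the SVD; rows of Vt are the principal directions.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2                               # target number of dimensions
X_reduced = Xc @ Vt[:k].T           # 500 x 2 instead of 500 x 5
explained = (s[:k] ** 2).sum() / (s ** 2).sum()

print("reduced shape:", X_reduced.shape)
print("fraction of variance retained:", explained)
```

Because the five columns were built from two factors, two dimensions are enough to keep almost all of the variance. Real data rarely has such a clean structure, so the retained fraction should be checked case by case.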
What is Linear Discriminant Analysis?
Linear discriminant analysis was formulated in 1936 by Ronald A. Fisher, who showed some practical uses of it as a classifier. Initially it was described for a two-class problem. Later, in 1948, C. R. Rao generalized it to multi-class linear discriminant analysis.
In most cases, linear discriminant analysis is used as a dimensionality reduction technique for supervised problems: it projects features from a higher-dimensional space onto a lower-dimensional space. Many engineers and scientists use it as a preprocessing step before finalizing a model. With LDA we try to determine which set of parameters best describes group membership, and which classification model best separates those groups.
LDA works by finding a linear combination of features that characterizes or separates two or more classes or outcomes; the resulting combination is used as a linear classifier or for dimensionality reduction. LDA is closely related to ANOVA and regression, which also express a dependent variable as a linear combination of independent variables. Below is a basic comparison table for LDA, regression, and ANOVA.
Features | Regression | ANOVA | LDA |
No. of dependent variables | 1 | 1 | 1 |
No. of independent variables | Multiple | Multiple | Multiple |
Nature of dependent variable | Metric | Metric | Categorical |
Nature of independent variables | Metric | Categorical | Metric |
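The idea of a linear combination of features that separates classes can be sketched for the two-class case. The code below is a minimal illustration, not a reference implementation: it computes the classical Fisher direction, proportional to Sw^{-1}(m1 - m0), where Sw is the pooled within-class scatter matrix and m0, m1 are the class means. The function name `fisher_direction` and the synthetic data are assumptions introduced for this example.

```python
import numpy as np

def fisher_direction(X0, X1):
    """Return a unit vector proportional to Sw^{-1} (m1 - m0)."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
# Synthetic data (assumption): two Gaussian classes with a shared covariance.
cov = np.array([[3.0, 2.0], [2.0, 3.0]])
X0 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X1 = rng.multivariate_normal([2.0, -2.0], cov, size=200)

w = fisher_direction(X0, X1)
z0, z1 = X0 @ w, X1 @ w                      # one-dimensional projections
threshold = (z0.mean() + z1.mean()) / 2      # midpoint between projected means
accuracy = ((z0 < threshold).mean() + (z1 >= threshold).mean()) / 2

print("direction:", w)
print("training accuracy with a midpoint threshold:", accuracy)
```

The same vector `w` serves both purposes mentioned above: `X @ w` is a one-dimensional representation of the data, and comparing it with a threshold gives a linear classifier.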
Assumptions of LDA
- Each feature (column) in the dataset follows a Gaussian distribution; in simple words, the data points are normally distributed with a bell-shaped curve.
- The independent variables are normally distributed for each level of the grouping variable.
- Predictive power can decrease as the correlation between predictor variables increases.
- All instances are assumed to be randomly sampled, and an instance's score on one variable is assumed to be independent of the scores of all other instances on that variable.
Linear discriminant analysis has been observed to be relatively robust to slight violations of the above assumptions.
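A quick, informal way to look at the normality and correlation assumptions is sketched below. It is only a rough screen, not a formal test: it reports per-group sample skewness and excess kurtosis (both close to 0 for normally distributed data) and the correlation matrix of the predictors. The helper name `group_shape_summary` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def group_shape_summary(X, y):
    """Per-group sample skewness and excess kurtosis for each column of X."""
    summary = {}
    for g in np.unique(y):
        Xg = X[y == g]
        z = (Xg - Xg.mean(axis=0)) / Xg.std(axis=0)   # standardize within group
        summary[int(g)] = {
            "skewness": (z ** 3).mean(axis=0),
            "excess_kurtosis": (z ** 4).mean(axis=0) - 3.0,
        }
    return summary

rng = np.random.default_rng(0)
# Synthetic data (assumption): two groups, two independent normal predictors.
X = rng.normal(size=(2000, 2))
y = np.repeat([0, 1], 1000)

summary = group_shape_summary(X, y)
corr = np.corrcoef(X, rowvar=False)   # predictor-by-predictor correlations

print(summary)
print(corr)
```

Values far from 0 in the summary, or large off-diagonal entries in `corr`, suggest that the corresponding assumption deserves a closer look.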
Objectives of LDA
- Develop discriminant functions, that is, linear combinations of the predictor (independent) variables, that best discriminate between the categories of the criterion (dependent) variable.
- Examine whether significant differences exist among the groups in terms of the predictor variables.
- Determine which predictor variables contribute most to the inter-group differences.
- Classify cases into one of the groups based on the values of the predictor variables.
Statistics associated with LDA
Canonical Correlation
It measures the extent of association between the discriminant scores and the groups. It is a measure of association between a single discriminant function and the set of dummy variables that define group membership.
Centroid
The centroid is the mean of the discriminant scores for a particular group. There are as many centroids as there are groups, one for each group. The means for a group on all the functions are the group centroids.
Discriminant function coefficients
The discriminant function coefficients (unstandardized) are the multipliers of the variables when the variables are in their original units of measurement.
Discriminant scores
The unstandardized coefficients are multiplied by the values of the variables. These products are summed and added to the constant term to obtain the discriminant score.
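The calculation of discriminant scores and group centroids can be sketched in a few lines. The coefficients, constant, and data below are hypothetical values chosen only to show the arithmetic; they do not come from a fitted model.

```python
import numpy as np

# Hypothetical unstandardized coefficients and constant (assumptions).
coefficients = np.array([0.8, -0.5])
constant = -1.2

# Four cases measured on two predictors, in their original units.
X = np.array([[2.0, 1.0],
              [3.0, 4.0],
              [5.0, 2.0],
              [6.0, 5.0]])
groups = np.array([0, 0, 1, 1])

# Discriminant score = constant + sum(coefficient * variable value).
scores = X @ coefficients + constant

# Group centroid = mean discriminant score of the cases in that group.
centroids = {int(g): scores[groups == g].mean() for g in np.unique(groups)}

print("scores:", scores)
print("centroids:", centroids)
```

A new case can then be assigned to the group whose centroid is closest to its score, which is one simple way to turn the scores into a classification.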
Eigenvalue
For each discriminant function, the eigenvalue is the ratio of the between-group to the within-group sum of squares of the discriminant scores. A large eigenvalue indicates a superior function.
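The sketch below ties the eigenvalue to the sums of squares described above. It solves the generalized eigenproblem Sb w = λ Sw w (Sb and Sw being the between-group and within-group scatter matrices) and then checks numerically that the largest λ equals the between-to-within ratio of the resulting discriminant scores. It also computes the canonical correlation from the standard relation sqrt(λ / (1 + λ)). Function names and the synthetic three-group data are assumptions for illustration.

```python
import numpy as np

def discriminant_functions(X, y):
    """Solve Sb w = lambda Sw w; return eigenvalues (descending) and directions."""
    p = X.shape[1]
    grand_mean = X.mean(axis=0)
    Sw = np.zeros((p, p))   # within-group scatter
    Sb = np.zeros((p, p))   # between-group scatter
    for g in np.unique(y):
        Xg = X[y == g]
        mg = Xg.mean(axis=0)
        Sw += (Xg - mg).T @ (Xg - mg)
        d = (mg - grand_mean)[:, None]
        Sb += len(Xg) * (d @ d.T)
    # Whiten with the Cholesky factor of Sw so a symmetric solver can be used.
    L_inv = np.linalg.inv(np.linalg.cholesky(Sw))
    eigvals, V = np.linalg.eigh(L_inv @ Sb @ L_inv.T)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], L_inv.T @ V[:, order]

def ss_ratio(scores, y):
    """Between-group over within-group sum of squares of one score vector."""
    grand = scores.mean()
    labels = np.unique(y)
    between = sum((y == g).sum() * (scores[y == g].mean() - grand) ** 2 for g in labels)
    within = sum(((scores[y == g] - scores[y == g].mean()) ** 2).sum() for g in labels)
    return between / within

rng = np.random.default_rng(0)
# Synthetic data (assumption): three groups, three predictors, unit variance.
means = np.array([[0.0, 0.0, 0.0], [3.0, 1.0, 0.0], [1.0, 4.0, 0.0]])
X = np.vstack([rng.normal(loc=m, scale=1.0, size=(100, 3)) for m in means])
y = np.repeat([0, 1, 2], 100)

eigvals, W = discriminant_functions(X, y)
scores = X @ W[:, 0]                                   # first discriminant function
canonical_corr = np.sqrt(eigvals[0] / (1.0 + eigvals[0]))

print("eigenvalues:", eigvals)
print("SS ratio of first function:", ss_ratio(scores, y))
print("canonical correlation:", canonical_corr)
```

With three groups there are at most two useful discriminant functions, so the third eigenvalue comes out as numerically zero.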
Total Correlation matrix
A total correlation matrix is generated by treating the cases as if they were from a single sample and computing correlation.
F-Scores and their significance
These are calculated from a one-way ANOVA, with the grouping variable serving as the categorical independent variable. Each predictor, in turn, serves as the metric dependent variable in the ANOVA.
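A minimal sketch of the one-way ANOVA F statistic for a single predictor follows. The function name `one_way_f` and the toy data are assumptions for illustration. Only the F value is computed here; its significance would be read from an F distribution with (k - 1, n - k) degrees of freedom, where k is the number of groups and n the number of cases.

```python
import numpy as np

def one_way_f(x, groups):
    """One-way ANOVA F statistic of predictor x across the given groups."""
    labels = np.unique(groups)
    n, k = len(x), len(labels)
    grand = x.mean()
    ss_between = sum((groups == g).sum() * (x[groups == g].mean() - grand) ** 2
                     for g in labels)
    ss_within = sum(((x[groups == g] - x[groups == g].mean()) ** 2).sum()
                    for g in labels)
    # Mean square between divided by mean square within.
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy predictor (assumption): three groups of three cases each.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
groups = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

f_value = one_way_f(x, groups)
print("F =", f_value)
```

Running the function once per predictor column gives the set of F-scores described above.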
Wilks' Lambda
Sometimes also called the U statistic, Wilks' lambda for each predictor is the ratio of the within-group sum of squares to the total sum of squares. Its value varies between 0 and 1. Large values (near 1) indicate that the group means do not seem to be different. Small values (near 0) indicate that the group means seem to be different.
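The per-predictor ratio can be sketched as below, under the definition of within-group sum of squares divided by total sum of squares. The function name `wilks_lambda_univariate` and the toy data are assumptions for illustration.

```python
import numpy as np

def wilks_lambda_univariate(x, groups):
    """Within-group sum of squares divided by total sum of squares."""
    ss_total = ((x - x.mean()) ** 2).sum()
    ss_within = sum(((x[groups == g] - x[groups == g].mean()) ** 2).sum()
                    for g in np.unique(groups))
    return ss_within / ss_total

groups = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

# Predictor whose group means differ clearly: lambda is small.
separated = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
# Predictor whose group means are identical: lambda equals 1.
overlapping = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0])

lambda_separated = wilks_lambda_univariate(separated, groups)
lambda_overlapping = wilks_lambda_univariate(overlapping, groups)
print(lambda_separated, lambda_overlapping)
```

The two toy predictors sit at opposite ends of the scale: one has well separated group means, the other has none of its variation explained by the grouping.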
LDA vs PCA
From the discussion so far, we know that LDA is very similar to principal component analysis: both are linear transformation techniques for dimensionality reduction. There are also some differences between them, which are listed below.
Features | Principal Component Analysis | Linear Discriminant Analysis |
Method of learning | Unsupervised | Supervised |
Focus | Searches for the directions of largest variance | Maximizes the ratio of between-class variation to within-class variation |
Computation for large dataset | Requires fewer computations | Requires more computation than PCA for a large dataset |
Discrimination between classes | Deals without paying any particular attention to the class structure | Directly deals with discrimination between classes |
Well distributed classes in a small dataset | Tends to perform worse than LDA | Tends to perform better than PCA |
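The difference in focus can be made concrete with a small experiment. In the synthetic data below (an assumption chosen to make the contrast visible), the direction of largest variance carries almost no class information, so the first principal component separates the classes poorly while the LDA direction separates them well. The helper `midpoint_accuracy` is a hypothetical name introduced for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two classes sharing a covariance that is elongated along (1, 1),
# while the class means differ along (1, -1).
cov = np.array([[5.0, 4.0], [4.0, 5.0]])
X0 = rng.multivariate_normal([0.0, 0.0], cov, size=300)
X1 = rng.multivariate_normal([2.0, -2.0], cov, size=300)
X = np.vstack([X0, X1])
y = np.array([0] * 300 + [1] * 300)

# PCA (unsupervised): leading eigenvector of the total scatter, labels ignored.
Xc = X - X.mean(axis=0)
_, vecs = np.linalg.eigh(Xc.T @ Xc)      # eigenvalues in ascending order
pca_dir = vecs[:, -1]

# LDA (supervised): direction proportional to Sw^{-1} (m1 - m0).
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
lda_dir = np.linalg.solve(Sw, m1 - m0)
lda_dir /= np.linalg.norm(lda_dir)

def midpoint_accuracy(direction):
    """Project onto a direction and split at the midpoint of the class means."""
    z = X @ direction
    c0, c1 = z[y == 0].mean(), z[y == 1].mean()
    threshold = (c0 + c1) / 2
    predicted_one = (z > threshold) if c1 > c0 else (z < threshold)
    return (predicted_one == (y == 1)).mean()

pca_acc = midpoint_accuracy(pca_dir)
lda_acc = midpoint_accuracy(lda_dir)
print("PCA direction:", pca_dir, "accuracy:", pca_acc)
print("LDA direction:", lda_dir, "accuracy:", lda_acc)
```

This is a deliberately unfavorable case for PCA. When the class means happen to differ along the direction of largest variance, the two methods can give similar projections.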
Applications of LDA
Linear discriminant analysis has been used successfully in many applications. As long as a problem is, or can be transformed into, a classification problem, we can apply this technique. Discriminant analysis can also be used for novel applications involving new combinations of features and objects that may never have been considered before.
Some examples of real-world application areas of LDA:
Face recognition
Face recognition is a widely used application in computer vision, where every face is represented by a large number of pixel values. Here LDA reduces the number of features before the classification task is carried out. A template is created from the newly produced dimensions, which are linear combinations of pixel values.
Medical field
Here LDA is used to classify a patient's disease state as mild, moderate, or severe based on a small number of parameters, which helps doctors adjust the pace of the treatment.
Robotics
Robots are trained to replicate human tasks and behavior, and these can be treated as classification tasks. Here LDA can be used to form similar groups based on various parameters such as frequencies, pitches, sounds, and tunes.
Conclusion
In this article, we have seen what dimensionality reduction is and why it is significant. We then discussed LDA, a supervised method of dimensionality reduction that can also be used as a classifier when logistic regression fails and when we are dealing with two or more classes. Where R-squared is the deciding factor in regression analysis, in LDA it is Wilks' lambda.