Latent class analysis (LCA) is a statistical technique for analysing multivariate categorical data. Observed data often take the form of a series of categorical responses, as in public opinion surveys, individual-level voting data, or studies of consumer behaviour and decision-making. In such cases it is often useful to investigate sources of confounding between the observed variables, to identify and characterize clusters of similar cases, and to approximate the distribution of observations across the many variables of interest. Latent class models are a useful tool for these goals.
Here, the data record how each TD (member of Dáil Éireann) voted on six motions held on different dates. Every vote is binary: a TD voted Yes (coded 2) or No (coded 1). LCA, fitted with the poLCA function from the R package of the same name, is used to cluster the TDs and uncover groups with similar voting patterns. The analysis then examines the membership of the clusters found, in particular the political affiliation of each TD, and the cluster-specific parameters.
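poLCA expects each manifest variable to be coded as consecutive integers starting at 1. That is why the votes are stored as 1 (No) and 2 (Yes) rather than 0/1. Below is a minimal sketch of that recoding. It assumes a hypothetical data frame `raw.votes` in which each vote is coded 0 (No) / 1 (Yes). The helper `recode_for_polca` is also hypothetical and not part of poLCA.

```r
# Hypothetical example data: three TDs, votes coded 0 = No, 1 = Yes.
raw.votes <- data.frame(
  Environment   = c(1, 0, 1),
  RentFreeze    = c(0, 1, 1),
  SocialWelfare = c(0, 1, 0)
)

# poLCA needs categories coded 1, 2, ..., so shift 0/1 to 1/2 (1 = No, 2 = Yes).
recode_for_polca <- function(df) {
  stopifnot(all(unlist(df) %in% c(0, 1)))           # only 0/1 codes expected
  df[] <- lapply(df, function(x) as.integer(x) + 1L) # keep the data.frame shape
  df
}

bin.votes.example <- recode_for_polca(raw.votes)
print(bin.votes.example)
```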
Link to Dataset: VotingPattern
METHOD AND OUTPUT ANALYSIS:
The voting data consist of six manifest variables: Environment, RentFreeze, SocialWelfare, GamingAndLotteries, HousingMinister and FirstTimeBuyers. The aim is to cluster the TDs and uncover groups with similar voting patterns. The poLCA function from the poLCA package is applied to a formula and a data set containing these six variables. Models with different numbers of classes are fitted (the code at the end of this article fits two, three and four classes) to see whether adding classes improves the model. Below is the graph for the two-class model.
Figure 1 shows a screen capture of the estimation of model lc2 with the graphs option set to TRUE. The two estimated latent classes correspond to one class that tends to vote No (class 1, population share of about 37%) and one that tends to vote Yes (class 2, about 63%). The full output from the estimation of model lc2 is given below. First, the estimated class-conditional response probabilities π are reported for each of the six votes (Environment, RentFreeze, SocialWelfare, GamingAndLotteries, HousingMinister and FirstTimeBuyers). Each row corresponds to a latent class and each column to a response: No in the first column and Yes in the second.
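For reference, the standard latent class model behind these estimates is stated below, following the notation of the poLCA documentation. Let $p_r$ be the population share of class $r$ and $\pi_{jrk}$ the probability that a member of class $r$ gives response $k$ on vote $j$. Let $Y_{ijk}$ be an indicator equal to 1 if TD $i$ gave response $k$ on vote $j$. Here $R = 2$ classes, $J = 6$ votes and $K_j = 2$ responses ($k = 1$ No, $k = 2$ Yes). The second line is the posterior class probability that underlies the modal-posterior class assignment.

```latex
P(\mathbf{Y}_i \mid \boldsymbol{\pi}, \mathbf{p})
  = \sum_{r=1}^{R} p_r \prod_{j=1}^{J} \prod_{k=1}^{K_j} \pi_{jrk}^{\,Y_{ijk}}

\hat{P}(r \mid \mathbf{Y}_i)
  = \frac{\hat{p}_r \prod_{j=1}^{J} \prod_{k=1}^{K_j} \hat{\pi}_{jrk}^{\,Y_{ijk}}}
         {\sum_{q=1}^{R} \hat{p}_q \prod_{j=1}^{J} \prod_{k=1}^{K_j} \hat{\pi}_{jqk}^{\,Y_{ijk}}}
```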
For example, in class 1 the probability of a No vote on Environment is 0.345 (34.5%), so the taller bar in Figure 1 is the Yes probability of 65.5%. For the remaining votes in class 1, the No/Yes probabilities are 97.4%/2.6% for RentFreeze, 100%/0% for SocialWelfare, 35.7%/64.3% for GamingAndLotteries, 96.7%/3.3% for HousingMinister and 100%/0% for FirstTimeBuyers.
Summing the No probabilities of class 1 over the six votes gives 34.5 + 97.4 + 100 + 35.7 + 96.7 + 100 = 464.3 percentage points, against 135.7 for Yes. That is an average No probability of about 77.4% per vote, so class 1 votes mostly No. In class 2, the No/Yes probabilities are 100%/0% for Environment, 16.8%/83.2% for RentFreeze, 16.3%/83.7% for SocialWelfare, 67.65%/32.35% for GamingAndLotteries, 47.9%/52.1% for HousingMinister and 26.5%/73.5% for FirstTimeBuyers. Its Yes probabilities sum to about 324.8 percentage points, against 275.2 for No, an average Yes probability of about 54.1%. Class 2 therefore votes Yes more often than No. These are the same values that appear in Figure 1.
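The sums and averages above can be reproduced directly from the class-conditional probabilities in the LCA output table. The sketch below copies those values by hand; it does not read them from a fitted poLCA object. Because each vote is binary, Pr(Yes) is taken as 1 − Pr(No).

```r
# Class-conditional Pr(No) from the LCA output table (rows = classes).
p_no <- rbind(
  class1 = c(Environment = 0.345, RentFreeze = 0.9739, SocialWelfare = 1.0000,
             GamingAndLotteries = 0.3571, HousingMinister = 0.9670,
             FirstTimeBuyers = 1.0000),
  class2 = c(Environment = 1.000, RentFreeze = 0.1684, SocialWelfare = 0.1631,
             GamingAndLotteries = 0.6765, HousingMinister = 0.4786,
             FirstTimeBuyers = 0.2652)
)
p_yes <- 1 - p_no  # binary votes: Pr(Yes) = 1 - Pr(No)

# Summed probabilities (percentage points) and average probability per vote (%).
summary_tab <- data.frame(
  sum_no   = 100 * rowSums(p_no),
  sum_yes  = 100 * rowSums(p_yes),
  mean_no  = 100 * rowMeans(p_no),
  mean_yes = 100 * rowMeans(p_yes),
  row.names = rownames(p_no)
)
print(round(summary_tab, 1))
```

Averaging rather than summing keeps each figure on the 0-100% scale, which is easier to compare across classes.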
Finally, poLCA outputs a number of goodness-of-fit statistics. For the two-class model these include the Pearson chi-square statistic X² = 67.9 and the likelihood-ratio statistic G² = 59.0. For choosing the number of classes, the information criteria give the following values:

- Two classes: AIC 950.0, BIC 989.7.
- Three classes: the AIC decreases to 937.05 and the BIC increases to 998.05.
- Four classes: the AIC decreases to 934.04 and the BIC increases to 1016.39.

BIC is therefore lowest for the two-class model, while AIC keeps falling, only modestly, as classes are added. BIC penalizes extra parameters more heavily than AIC, and the gain in AIC is small. The two-class model is therefore chosen as the most parsimonious.
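A short sketch shows why AIC and BIC disagree. It uses the usual definitions AIC = −2·logLik + 2k and BIC = −2·logLik + ln(N)·k, which match the values quoted above. The helper names (`n_params`, `aic`, `bic`) are hypothetical. Only the two-class log-likelihood appears in the output, so for three and four classes −2·logLik is backed out of the reported AIC.

```r
# Free parameters for R classes and J votes with K categories each (hypothetical
# helper): R * J * (K - 1) response probabilities plus R - 1 class shares.
n_params <- function(R, J = 6, K = 2) R * J * (K - 1) + (R - 1)

aic <- function(loglik, k) -2 * loglik + 2 * k
bic <- function(loglik, k, N) -2 * loglik + log(N) * k

N   <- 156                    # number of TDs in the output
ll2 <- -462.0166              # maximum log-likelihood of lc2
k   <- sapply(2:4, n_params)  # parameters for 2, 3 and 4 classes

aic2 <- aic(ll2, k[1])
bic2 <- bic(ll2, k[1], N)

# For 3 and 4 classes only AIC is quoted: -2*logLik = AIC - 2k, then add log(N) * k.
aic34 <- c(937.0493, 934.0441)
bic34 <- aic34 - 2 * k[2:3] + log(N) * k[2:3]

print(data.frame(classes = 2:4, params = k,
                 AIC = c(aic2, aic34), BIC = c(bic2, bic34)))
```

With N = 156, BIC charges ln(156) ≈ 5.05 per parameter versus 2 for AIC. Each extra class adds 7 parameters, which outweighs the improvement in log-likelihood under BIC but not under AIC.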
CONCLUSIONS (PARTY / CLUSTER MEMBERSHIP):
| Party | Class 1 | Class 2 |
The table above shows that most FG (Fine Gael) TDs are in class 1 (48 TDs) and most FF (Fianna Fáil) TDs are in class 2 (43 TDs). FG TDs therefore make up the bulk of class 1, the class that mostly voted No.
Similarly, FF TDs belong mainly to class 2, the class that mostly voted Yes. The other parties (AAA-PBP, Green, I4C, Ind, Lab, SD, SF), like FF, have few or no members in class 1. TDs from these parties instead make up a substantial part of class 2. In total, class 1 has 56 members and class 2 has 100, so the predicted cluster memberships are 36% for class 1 and 64% for class 2.
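The predicted memberships come from assigning each TD to the class with the highest posterior probability. That posterior is computed by Bayes' rule from the class shares and the class-conditional probabilities. The sketch below applies this to two hypothetical voting patterns. It uses the Pr(Yes) values from the LCA output table and approximate shares of 0.37 and 0.63, as read from Figure 1 rather than from poLCA's exact estimates. The function `posterior_class` is hypothetical and not the poLCA implementation.

```r
# Class-conditional Pr(Yes) for each vote, copied from the LCA output table.
p_yes <- rbind(
  class1 = c(0.655, 0.0261, 0.0000, 0.6429, 0.0330, 0.0000),
  class2 = c(0.000, 0.8316, 0.8369, 0.3235, 0.5214, 0.7348)
)
colnames(p_yes) <- c("Environment", "RentFreeze", "SocialWelfare",
                     "GamingAndLotteries", "HousingMinister", "FirstTimeBuyers")
shares <- c(class1 = 0.37, class2 = 0.63)  # approximate shares (Figure 1)

# Posterior class probabilities for one response pattern y (1 = No, 2 = Yes):
# share_r * prod_j Pr(y_j | class r), normalised so the classes sum to 1.
posterior_class <- function(y, p_yes, shares) {
  is_yes <- (y == 2)
  lik <- apply(p_yes, 1, function(p) prod(ifelse(is_yes, p, 1 - p)))
  w <- shares * lik
  w / sum(w)
}

all_no <- c(1, 1, 1, 1, 1, 1)  # hypothetical TD voting No on all six motions
mixed  <- c(1, 2, 2, 1, 2, 2)  # hypothetical TD voting Yes on four motions
post_no <- posterior_class(all_no, p_yes, shares)
post_mx <- posterior_class(mixed, p_yes, shares)

print(round(rbind(all_no = post_no, mixed = post_mx), 3))
print(names(which.max(post_no)))  # modal (most probable) class
```

Because class 1 has Pr(Yes) = 0 on SocialWelfare, any TD who voted Yes on that motion gets a class 1 posterior of exactly 0 and is assigned to class 2.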
LCA OUTPUT TABLE:
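Conditional item response (column) probabilities, by outcome variable, for each class (row):

| Vote | Class | Pr(1) = No | Pr(2) = Yes |
|---|---|---|---|
| Environment | class 1 | 0.345 | 0.655 |
| Environment | class 2 | 1.000 | 0.000 |
| RentFreeze | class 1 | 0.9739 | 0.0261 |
| RentFreeze | class 2 | 0.1684 | 0.8316 |
| SocialWelfare | class 1 | 1.0000 | 0.0000 |
| SocialWelfare | class 2 | 0.1631 | 0.8369 |
| GamingAndLotteries | class 1 | 0.3571 | 0.6429 |
| GamingAndLotteries | class 2 | 0.6765 | 0.3235 |
| HousingMinister | class 1 | 0.9670 | 0.0330 |
| HousingMinister | class 2 | 0.4786 | 0.5214 |
| FirstTimeBuyers | class 1 | 1.0000 | 0.0000 |
| FirstTimeBuyers | class 2 | 0.2652 | 0.7348 |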
Estimated class population shares
Predicted class memberships (by modal posterior prob.)
Fit for 2 latent classes:
number of observations: 156
number of estimated parameters: 13
residual degrees of freedom: 50
maximum log-likelihood: -462.0166
G^2(2): 58.95794 (Likelihood ratio/deviance statistic)
X^2(2): 67.89261 (Chi-square goodness of fit)
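The residual degrees of freedom in this output can be checked by hand. With six binary votes there are 2^6 = 64 possible response patterns. The residual df is therefore 64 − 1 − 13 = 50. The sketch below also shows how the G² and X² statistics would be compared with a chi-square reference distribution using base R's `pchisq`. One caveat: 156 TDs spread over 64 patterns leaves many sparse cells, so this chi-square approximation is only a rough guide.

```r
# Two-class model with J = 6 binary votes.
J    <- 6
R    <- 2
npar <- R * J + (R - 1)        # response probabilities + free class shares
df_resid <- 2^J - 1 - npar     # response patterns - 1 - parameters

G2 <- 58.95794                 # likelihood-ratio statistic from the output
X2 <- 67.89261                 # Pearson chi-square statistic from the output
p_G2 <- pchisq(G2, df = df_resid, lower.tail = FALSE)
p_X2 <- pchisq(X2, df = df_resid, lower.tail = FALSE)

print(c(npar = npar, df = df_resid, p_G2 = p_G2, p_X2 = p_X2))
```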
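# Load the TD names and parties.
voting2 <- read.csv("TDs_names_parties.csv", header = TRUE)
# Load the voting data: bin.votes must be a data frame holding the six votes,
# coded 1 = No and 2 = Yes.
# Latent class analysis.
# Install the package once with install.packages("poLCA"), then load it.
library(poLCA)
# Create the formula: six manifest variables and no covariates
# (assumes the columns of bin.votes carry these names).
f <- cbind(Environment, RentFreeze, SocialWelfare, GamingAndLotteries,
           HousingMinister, FirstTimeBuyers) ~ 1
# Apply the latent class analysis function with 2, 3 and 4 classes.
lc2 <- poLCA(f, bin.votes, nclass = 2, graphs = TRUE)
lc3 <- poLCA(f, bin.votes, nclass = 3, graphs = TRUE)
lc4 <- poLCA(f, bin.votes, nclass = 4, graphs = TRUE)
# Create the data frame with party data.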
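The party-by-class table in the conclusions is a cross-tabulation of party against predicted class. Below is a minimal sketch using base R's `table` and `prop.table` on hypothetical toy vectors. With the fitted model, the class vector would be `lc2$predclass` and the party vector would be the party column of `voting2`, whose exact name is not shown here. This assumes the rows of `voting2` and `bin.votes` are in the same order. Also note that poLCA drops cases with missing values by default (`na.rm = TRUE`), which would break that alignment. `party_by_class` is a hypothetical helper.

```r
# Cross-tabulate party against predicted class, plus the share of TDs per class.
party_by_class <- function(party, predclass) {
  list(
    counts      = table(Party = party, Class = predclass),
    class_share = prop.table(table(predclass))
  )
}

# Hypothetical toy data; not the real TD assignments.
party     <- c("FG", "FG", "FF", "SF", "FF", "Lab")
predclass <- c(1, 1, 2, 2, 2, 2)

res <- party_by_class(party, predclass)
print(res$counts)
print(round(100 * res$class_share, 1))  # percentage of TDs in each class
```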
An Engineering graduate with a Master's degree in Data Science and expertise in machine learning, deep learning and data visualization. Outside work, I am a fun-loving person whose hobbies include reading, music and sports.