How to use logistic regression for image classification? 

This article is an attempt to showcase the capability of Logistic Regression as a machine learning algorithm for image classification.
Image classification is the task of assigning an image to the label or category it belongs to. It is most often done with Convolutional Neural Networks (CNNs), but this article is an attempt to show that even logistic regression can classify images efficiently, with less computation time and without the tedious work of building a complex model.

Table of Contents

  1. An overview of Logistic Regression
  2. Case Study for Image Classification with Logistic Regression
  3. Summary

An overview of Logistic Regression

Logistic Regression is a supervised machine learning algorithm used mainly for binary classification problems, where each sample is assigned to one of two categories. It passes a linear combination of the input features through a sigmoid function, which produces a value between 0 and 1 that is interpreted as the probability of the positive class.
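
As a minimal sketch of the sigmoid step (not code from the original case study), the snippet below maps a few hypothetical linear scores to probabilities and thresholds them at 0.5. The function name sigmoid and the example scores are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear scores (w.x + b) for three samples
scores = np.array([-4.0, 0.0, 4.0])
probabilities = sigmoid(scores)

# Threshold at 0.5 to turn probabilities into class labels 0 or 1
labels = (probabilities >= 0.5).astype(int)

print(probabilities)
print(labels)  # [0 1 1]
```

A score of 0 maps to exactly 0.5, which is why 0.5 is the usual decision boundary between the two classes.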

Case Study for Image Classification with Logistic Regression

As mentioned earlier, this article focuses on using Logistic Regression for image classification. We use the Hand Sign Digit Classification dataset with two categories of images, showing hand signs for 0 and 1.

The dataset used for this article is stored in NumPy format, so the input and output arrays can be loaded directly into the working environment. NumPy data was chosen for ease of computation, since array operations on NumPy data are generally faster than on plain Python data structures. The steps to load the NumPy data are shown below.

import numpy as np
# Load the input images and their labels from the .npy files
inp_df=np.load('/content/drive/MyDrive/Colab notebooks/Image classificatiob using LOGREG/inp.npy')
out_df=np.load('/content/drive/MyDrive/Colab notebooks/Image classificatiob using LOGREG/op.npy')

Once the dataset was loaded, the shape of each array was checked to determine how many samples it contains. The input data holds 410 images of size (64,64), and the output data holds the 410 corresponding labels. The shapes can be printed as shown below.

print('Input Dataframe shape',inp_df.shape)
print('Output Dataframe shape',out_df.shape)

The output of the shape command will be as shown below.

The dataset was then split into training and testing sets in an 80:20 ratio using the scikit-learn model selection module, as shown below.

from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test=train_test_split(inp_df,out_df,test_size=0.2,random_state=42)
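
To see what this call produces, here is a small sketch on synthetic data (an assumption, not the hand sign dataset). The arrays demo_images and demo_ids are hypothetical stand-ins with the same number of samples as the article's data. The sketch checks the resulting split sizes and that two calls with the same random_state give the same shuffle.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 410 blank 64x64 images,
# with sample ids in place of labels so the shuffle order is visible
demo_images = np.zeros((410, 64, 64))
demo_ids = np.arange(410)

# Two calls with the same random_state produce exactly the same shuffle
Xa_train, Xa_test, ida_train, ida_test = train_test_split(
    demo_images, demo_ids, test_size=0.2, random_state=42)
Xb_train, Xb_test, idb_train, idb_test = train_test_split(
    demo_images, demo_ids, test_size=0.2, random_state=42)

print(Xa_train.shape, Xa_test.shape)       # (328, 64, 64) (82, 64, 64)
print(np.array_equal(ida_test, idb_test))  # True
```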

It is good practice to pass a fixed integer to the random_state parameter when splitting the data, so that the shuffle, and therefore the training and testing sets, are reproducible across runs. The training data was then visualized using subplots to check that the images and their labels still match after the split, as shown below.

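import matplotlib.pyplot as plt

# Show the first five training images with their labels in one row
plt.figure(figsize=(15,5))
for i in range(5):
    plt.subplot(1,5,i+1)
    plt.imshow(X_train[i,:,:],cmap='gray')
    plt.title('Sign language of {}'.format(Y_train[i]))
    plt.axis('off')
# Adjust spacing once, after all subplots have been drawn
plt.tight_layout()
plt.show()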

The figure size here was chosen arbitrarily to make the images clearly visible. The initial images of the training data are shown below.

Logistic Regression in scikit-learn expects a two-dimensional input of shape (samples, features). The image arrays of the training and testing sets are three-dimensional (samples, height, width), so the independent component of each set has to be reshaped so that every 64x64 image becomes a single row of 4096 pixel values, as shown below.

X_train=X_train.reshape(X_train.shape[0],64*64)
X_test=X_test.reshape(X_test.shape[0],64*64)
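
To see what this reshape does to each image, here is a small sketch on a synthetic array (an assumption, not the article's data). The names demo_images and demo_flat are hypothetical. Passing -1 lets NumPy infer the flattened length instead of writing 64*64 by hand.

```python
import numpy as np

# Synthetic stand-in: 5 images of 64x64 pixels with distinct pixel values
demo_images = np.arange(5 * 64 * 64, dtype=float).reshape(5, 64, 64)

# Keep the sample axis and let NumPy infer the flattened length (64 * 64 = 4096)
demo_flat = demo_images.reshape(demo_images.shape[0], -1)

print(demo_images.shape)  # (5, 64, 64)
print(demo_flat.shape)    # (5, 4096)

# Row-major flattening: pixel (row r, column c) lands at index r * 64 + c
print(demo_flat[0, 2 * 64 + 3] == demo_images[0, 2, 3])  # True
```

No pixel values are changed by the reshape. Only the layout changes, so the spatial arrangement of pixels is no longer visible to the model.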

Once the preprocessing was complete, the LogisticRegression class was imported from the scikit-learn linear model package, as shown below.

from sklearn.linear_model import LogisticRegression

The LogisticRegression model was then fitted to the training data, as shown below.

logreg =  LogisticRegression()
logreg.fit(X_train,Y_train)

The fitted model was then used to predict labels for the test data, and it yielded the right predictions for the test scenarios tried.

y_pred=logreg.predict(X_test)
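
The fit and predict steps can be tried end to end without the original files. The sketch below is an illustration under stated assumptions: it uses small synthetic 16x16 images (bright on the left half for class 0, bright on the right half for class 1) rather than the hand sign data, and all demo_ names are hypothetical. The max_iter value is also an assumption, raised from the default to give the solver room to converge.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for two image classes (not the hand sign data):
# class 0 is bright on the left half, class 1 is bright on the right half
n_per_class = 50
left = rng.random((n_per_class, 16, 16)) * 0.1
left[:, :, :8] += 0.9
right = rng.random((n_per_class, 16, 16)) * 0.1
right[:, :, 8:] += 0.9

# Flatten each image to one row, as required by LogisticRegression
demo_X = np.concatenate([left, right]).reshape(2 * n_per_class, -1)
demo_y = np.array([0] * n_per_class + [1] * n_per_class)

demo_model = LogisticRegression(max_iter=1000)
demo_model.fit(demo_X, demo_y)

# predict returns hard labels; predict_proba returns one probability per class
demo_pred = demo_model.predict(demo_X[:3])
demo_proba = demo_model.predict_proba(demo_X[:3])
print(demo_pred)
print(demo_proba.shape)  # (3, 2)
```

The predict method applies the 0.5 threshold internally, while predict_proba exposes the underlying sigmoid outputs, which is useful when a different threshold is needed.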

One of the image classification results from the implemented Logistic Regression model is shown below, where the model’s ability to classify the image samples correctly can be observed.

The accuracy score of the model was then computed on the test data to evaluate how well it generalizes to data it has not seen. The Logistic Regression model achieved an overall accuracy score of 98% on the test data. The steps to obtain the accuracy score from a logistic regression model are shown in the figure below.
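
As a hedged sketch of one common way to compute this metric, the snippet below uses scikit-learn's accuracy_score on a small hypothetical set of labels. Whether the original figure used accuracy_score or another method is an assumption, and the arrays demo_true and demo_pred are made up for illustration. With the article's variables the call would take the form accuracy_score(Y_test, y_pred).

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true and predicted labels for ten test images
demo_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
demo_pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

# Accuracy is the fraction of predictions that match the true labels
manual_accuracy = np.mean(demo_true == demo_pred)
library_accuracy = accuracy_score(demo_true, demo_pred)

print(manual_accuracy)   # 0.9
print(library_accuracy)  # 0.9
```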

However, relying on accuracy alone is not always right, as it can lead to misinterpreting results, for example when one class is much more frequent than the other. For this reason, the other performance metrics of the implemented model were evaluated through a classification report, which lists precision, recall, and F1-score for each class.

Of these, the F1 score, which is the harmonic mean of precision and recall, is close to 98% for the ‘0’ class and 97% for the ‘1’ class, which is an indicator of a reliable model. For better understanding, the classification report for the implemented logistic regression model is shown below.
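
To make the harmonic mean concrete, here is a small sketch on hypothetical labels (not the article's results). It computes precision and recall for class 1 with scikit-learn, combines them by hand, and compares the result with f1_score. With the article's variables, the report itself would be produced by a call of the form classification_report(Y_test, y_pred).

```python
import numpy as np
from sklearn.metrics import (classification_report, f1_score,
                             precision_score, recall_score)

# Hypothetical true and predicted labels for ten test images
demo_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
demo_pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

# For class 1: 5 true positives, 1 false positive, 0 false negatives,
# so precision = 5/6 and recall = 1.0
precision = precision_score(demo_true, demo_pred)
recall = recall_score(demo_true, demo_pred)

# F1 is the harmonic mean of precision and recall (10/11 here)
manual_f1 = 2 * precision * recall / (precision + recall)
library_f1 = f1_score(demo_true, demo_pred)

print(round(manual_f1, 4), round(library_f1, 4))
print(classification_report(demo_true, demo_pred))
```

Because the harmonic mean is pulled toward the smaller of the two values, a high F1 score requires both precision and recall to be high.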

Summary

Image classification is an application in the domain of Deep Learning and Image Processing where multi-class classification is often taken up with models like Convolutional Neural Networks, in which the data has to propagate through many layers. However, when the requirement is binary image classification, even a simple yet effective supervised machine learning algorithm like Logistic Regression can classify images appropriately, as described in this article.

Darshan M
Darshan holds a Master's degree in Data Science and Machine Learning and is an everyday learner of the latest trends in both fields. He is always keen to learn new things, put them into practice, and curate rich content on Data Science, Machine Learning, NLP and AI.
