Today, image classification has found widespread use across industries and verticals. Many teams are taking up image classification tasks to benefit from neural networks. Image classification is usually done with Convolutional Neural Networks (CNNs), which stack convolution layers, pooling layers, batch normalization and fully connected layers to construct the network. Under the hood, most of the computation comes down to matrix multiplication. As time advances, image classification keeps getting simpler, and it no longer takes much effort to get an image classification application into place.
With the development of Python libraries like TensorFlow and Keras, building an image classifier has become much simpler. These frameworks provide ready-made implementations of the various layers, so creating them takes very little code. When you come across a problem statement that involves image classification, your instinctive approach would be to create your own model containing some number ‘x’ of layers.
On checking the performance and other metrics of the model, you realise that the network you created doesn’t perform very well. You try tweaking the usual parameters and chasing the hyperparameters, but in vain. This happened to me recently. I realised that creating your own network is important, but you also have to save time and work in the most efficient manner to get results.
Transfer learning is one way to save time and start experimenting as soon as your dataset is ready to be modelled. Transfer learning involves the use of pre-trained models. These are models that have already been trained on large benchmark datasets and perform well on them. They have well-defined architectures, and the features they have learned often carry over to new image datasets.
These pre-trained models can be used as-is, or their layers can be modified depending on your purpose. Their weights are already saved, so any layers you freeze (keep fixed) need no further training, which saves time. For example, the VGG-16 weights that ship with Keras were trained on the ImageNet dataset and predict 1,000 classes. So if we want to use the VGG-16 architecture for our own application, we need to change the number of classes in the last layer. To do that, you replace the final classification layer with one sized for your classes and, if needed, unfreeze some of the earlier layers so they can be fine-tuned.
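As a rough illustration of changing the last layer, the sketch below loads VGG-16 from Keras without its original 1,000-class head, freezes the convolutional layers and attaches a new output layer. The function name build_vgg16_classifier, the 224x224 input size, the global average pooling layer and the Adam optimizer are assumptions made for this example, not details of the project described in this article.

```python
import tensorflow as tf


def build_vgg16_classifier(num_classes, weights="imagenet", input_shape=(224, 224, 3)):
    """Return VGG-16 with a frozen convolutional base and a new softmax head."""
    # include_top=False drops the original fully connected ImageNet classifier.
    base = tf.keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # freeze every pre-trained layer

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)  # run the frozen base in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    # New output layer sized for our own classes (hypothetical head design).
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling build_vgg16_classifier(num_classes=20) would download the ImageNet weights on first use. Input images should be passed through tf.keras.applications.vgg16.preprocess_input before training, since the pre-trained weights expect that preprocessing.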
The TensorFlow and Keras frameworks provide the capability of using pre-trained models. These models are already part of Keras and can be used directly by importing them from the keras.applications module. After importing a model, you can decide which layers you want to unfreeze and which layers you want to use as-is. Keras includes well-known pre-trained models such as VGG, Inception, Xception and ResNet. These architectures were developed by research groups, largely in the context of the ImageNet benchmark. Their learned features tend to transfer well to many image datasets, although good performance on a new dataset is not guaranteed. Hence these pre-trained models are worth trying in any image classification task.
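The import-then-choose-layers workflow can be sketched as follows. The BACKBONES dictionary, the load_backbone helper and its trainable_last parameter are hypothetical names introduced for this example; only the tf.keras.applications constructors come from Keras itself.

```python
import tensorflow as tf

# A few of the architectures available in tf.keras.applications.
BACKBONES = {
    "vgg16": tf.keras.applications.VGG16,
    "inception_v3": tf.keras.applications.InceptionV3,
    "xception": tf.keras.applications.Xception,
    "resnet50": tf.keras.applications.ResNet50,
}


def load_backbone(name, weights="imagenet", input_shape=(224, 224, 3), trainable_last=0):
    """Load a headless backbone; only its last `trainable_last` layers stay trainable."""
    base = BACKBONES[name](include_top=False, weights=weights, input_shape=input_shape)
    base.trainable = True
    cutoff = len(base.layers) - trainable_last
    for index, layer in enumerate(base.layers):
        # Layers before the cutoff are frozen, the rest can be fine-tuned.
        layer.trainable = index >= cutoff
    return base
```

With trainable_last=0 the whole backbone is used as a fixed feature extractor. Raising it unfreezes progressively more of the top layers, which is usually done with a low learning rate so the pre-trained weights are not disturbed too much.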
I recently had to deal with an image classification problem that involved classifying various raw food ingredients from a supplied image. My approach was also instinctive: I went ahead and wrote my own convolution layers, max pooling layers, dropout layers and so on, but even with hyperparameter optimization I wasn’t getting good results. So I thought of experimenting with the various pre-trained models. I started with VGG-16, as it is one of the oldest and most famous architectures, with 16 weight layers.
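For reference, a hand-written network of the kind mentioned above might look like the sketch below. It is not the network used in this project: the layer sizes, dropout rates and the 128x128 input are placeholder assumptions chosen only to show convolution, max pooling and dropout layers stacked together.

```python
import tensorflow as tf


def build_baseline_cnn(num_classes, input_shape=(128, 128, 3)):
    """A small hand-written CNN: convolution, max pooling, dropout, dense."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Dropout(0.25),  # hypothetical dropout rate
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
```

Every weight in such a model starts from random initialisation, which is one reason it can need far more data and tuning than a pre-trained network.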
The VGG model gave me a decent accuracy of 88-89% on the training dataset but only around 75% on the test dataset. A gap that large between the two accuracies points to overfitting. Next I tried the Inception model. It was known to reach around 80% accuracy on the ImageNet dataset but, to my despair, there was a gap of around 10% between its accuracy on the training and test datasets. The next one I tried was the Xception model, which had been reported to perform better than Inception.
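The train/test comparison used here is simple arithmetic, sketched below. The helper names and the 5-point threshold are assumptions for illustration; 0.885 is the midpoint of the 88-89% training accuracy quoted for VGG, and 0.75 is the approximate test accuracy.

```python
def generalization_gap(train_accuracy, test_accuracy):
    """Difference between training and test accuracy, as a fraction."""
    return train_accuracy - test_accuracy


def looks_overfit(train_accuracy, test_accuracy, max_gap=0.05):
    """Flag a model whose gap exceeds max_gap (a hypothetical threshold)."""
    return generalization_gap(train_accuracy, test_accuracy) > max_gap


# VGG figures quoted above: roughly 88-89% on training data, about 75% on test data.
vgg_gap = generalization_gap(0.885, 0.75)
print(f"VGG gap: {vgg_gap:.3f}, overfit: {looks_overfit(0.885, 0.75)}")
```

What counts as an acceptable gap depends on the dataset and the application, so the threshold should be treated as a tunable choice rather than a rule.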
The Xception model uses depthwise separable convolution layers. It gave an accuracy of around 90% on the training dataset and around 12-15% lower accuracy on the test dataset. I then moved on to another model, ResNet-50. The ResNet-50 model gave a very high accuracy of around 94%, with a difference of barely 2% between training and test set performance. ResNet-50 was chosen as the model for this dataset. It has the added advantage of skip connections, where the input of a block is added directly to that block's output, which makes deep networks easier to train. It’s a strong lesson learned.
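The skip mechanism in ResNet can be illustrated with a toy residual block. This is a simplified sketch in NumPy: real ResNet-50 blocks use convolutions and batch normalization, whereas here two plain matrix multiplications stand in for them, and the names residual_block, w1 and w2 are made up for the example.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """Compute relu(x + F(x)), where F is two dense layers standing in for convolutions."""
    fx = relu(x @ w1) @ w2
    # The skip connection: the block's input is added back to its output.
    return relu(x + fx)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
zero = np.zeros((8, 8))

# With all-zero weights F(x) is 0, so the block simply passes relu(x) through.
print(np.allclose(residual_block(x, zero, zero), relu(x)))
```

Because the input always has a direct path to the output, a block that has learned nothing useful can still pass its input along, which is part of why very deep residual networks remain trainable.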
So whenever you are faced with an image classification task, always check whether you can apply transfer learning to it. Do not jump straight into building your own architecture.