Building a computer vision model takes considerable effort, both in choosing an architecture and in training it on large datasets. Transfer learning reduces this effort by reusing a pre-trained model to solve a different problem. PyTorchCV is a framework that provides a large collection of pre-trained computer vision models. In this article, we discuss how to use these models for transfer learning in PyTorch with PyTorchCV. The major points to be discussed in the article are listed below.
Table of contents
- What is transfer learning?
- Transfer learning in computer vision
- What is PyTorchCV?
- Transfer learning with PyTorchCV
Let’s start by understanding what transfer learning is.
What is transfer learning?
Transfer learning is a machine learning technique in which a model trained on one task or dataset is reused, in whole or in part, for a different but related task. For example, a network trained to classify ImageNet photographs can be adapted to classify medical images by retraining only its final layers.
As we know, machine learning algorithms learn from historical data, and the outcome we require from them is a prediction, such as a class label or a future value. We usually design these models to perform one isolated task, so in a conventional machine learning workflow the knowledge a model learns stays with the task it was trained on.
In transfer learning, we still train a model in the usual way, but once it is trained we reuse the knowledge it has gained on different data. Simply put, transfer learning is the process of applying a trained model to data or tasks it was not originally trained on, usually after adapting part of it. This saves much of the cost of building and training a model from scratch.
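The idea can be sketched in a few lines of plain PyTorch. This is a minimal illustration under stated assumptions, not a recipe from any particular library: `backbone` is a hypothetical stand-in for a model already trained on a source task, `head` is a new layer for the target task, and the data is random. The reused part is frozen, and only the new layer is updated in one training step.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for a model that was already trained on a source task.
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU())

# Freeze the reused part so its learned weights are not updated.
for param in backbone.parameters():
    param.requires_grad = False

# New, trainable layer for the target task (3 classes, chosen arbitrarily).
head = nn.Linear(16, 3)
model = nn.Sequential(backbone, head)

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Random data standing in for a small target dataset.
inputs = torch.randn(32, 8)
labels = torch.randint(0, 3, (32,))

backbone_before = backbone[0].weight.clone()
head_before = head.weight.clone()

# One training step on the target task.
optimizer.zero_grad()
loss = loss_fn(model(inputs), labels)
loss.backward()
optimizer.step()

# The frozen weights should be untouched; the new layer should have moved.
backbone_unchanged = torch.equal(backbone_before, backbone[0].weight)
head_changed = not torch.equal(head_before, head.weight)
print(backbone_unchanged, head_changed)
```

Freezing everything except the new layer is only one option. With enough target data, some or all of the reused layers can also be left trainable, usually with a smaller learning rate.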
In recent years, a wide variety of pre-trained models have become available for transfer learning, distributed through various frameworks. A good example is Hugging Face, which hosts models for many areas of data science and artificial intelligence. Besides this, large companies such as Microsoft, Google, and Facebook also publish pre-trained models. In this article, our focus is on a framework that provides models for computer vision tasks in a transfer learning setting. Before looking at the framework, we need to know about the models that can give state-of-the-art performance on computer vision tasks.
Transfer learning in computer vision
Computer vision is an important part of data science and artificial intelligence, and also a difficult one, because it deals mainly with image and video data. It is hard to extract information from such data and to make a machine learn patterns from it. Instead of building such a model from scratch, we can often get better performance by starting from a pre-trained model.
Pre-trained models are available for most areas of computer vision. Some of them are as follows:
- VGG (Visual Geometry Group): this model is a deep convolutional neural network for image classification.
- ResNet (residual network): this model consists of several stages of convolutional and identity blocks joined by skip connections.
- DenseNet: this model has densely connected convolutional blocks.
- DeepLabV3: this model is a semantic segmentation model developed by researchers at Google.
- PSPNet (pyramid scene parsing network): this model uses a pyramid pooling module for semantic segmentation.
- DenseASPP (Densely connected Atrous Spatial Pyramid Pooling): this model connects several atrous convolution blocks in a dense fashion for semantic segmentation.
- SSD (Single Shot MultiBox Detector): this model performs object detection using a single deep neural network.
- Faster R-CNN: this model is used for near real-time object detection and includes a region proposal network that suggests regions of the image likely to contain objects.
- FPN (feature pyramid network): this model builds a multi-scale pyramid of feature maps to detect objects of different sizes.
- CPM (convolutional pose machine): this model uses a sequence of convolutional networks for human pose estimation.
- OpenPose: this model uses Part Affinity Fields for real-time multi-person pose estimation.
Here we have looked at some of the models we can use for computer vision problems. Let’s take a look at PyTorchCV, a framework designed specifically for transfer learning in computer vision.
What is PyTorchCV?
PyTorchCV is a framework built on the PyTorch library that provides pre-trained models exclusively for computer vision. It offers implementations of many high-performing deep learning models together with their pre-trained weights.
The source code of the framework is available in its GitHub repository, which includes implementations of various state-of-the-art computer vision models.
Since this framework is built on PyTorch, anyone familiar with PyTorch can pick it up easily. The models in its collection are trained on datasets such as ImageNet-1K, CIFAR-10/100, SVHN, CUB-200-2011, Pascal VOC2012, ADE20K, Cityscapes, and COCO. The full list of implemented models is given in the repository. We can use models from this framework after installing it in our environment.
The installation can be done in the following way.
!pip install pytorchcv
The developers recommend using this framework with torch version 0.4.1 or later. We can install both at the same time using the following line of code (the quotes stop the shell from treating > as output redirection):
!pip install pytorchcv "torch>=0.4.1"
After installation, we can use the pre-trained models that are available in the framework.
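Before loading any models, it can help to confirm what was actually installed. The sketch below uses only the Python standard library (`importlib.metadata`, available from Python 3.8). The helper names `parse_release`, `meets_minimum`, and `describe` are hypothetical and introduced here for illustration; the minimum of 0.4.1 is the recommended torch version mentioned above.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string):
    """Return the leading numeric part of a version, e.g. '1.13.1+cu117' -> (1, 13, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def meets_minimum(version_string, minimum=(0, 4, 1)):
    """Check a version string against the recommended minimum torch release."""
    return parse_release(version_string) >= minimum


def describe(package):
    """Return a one-line status for an installed (or missing) package."""
    try:
        return f"{package} {version(package)}"
    except PackageNotFoundError:
        return f"{package} is not installed"


# Report what is present in the current environment.
for name in ("pytorchcv", "torch"):
    print(describe(name))
```

If torch is installed, `meets_minimum(version("torch"))` tells us whether it satisfies the recommendation.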
Transfer learning with PyTorchCV
This section covers the basics of using a model provided by the PyTorchCV framework. For example, if we want to use ResNet-18 as a transfer learning model, we can load it in the following way:
from pytorchcv.model_provider import get_model as ptcv_get_model
net = ptcv_get_model("resnet18", pretrained=True)
net
Evaluating net in a notebook cell prints the structure of ResNet-18. The output is too long to reproduce here.
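When the printed structure is too long to read comfortably, a compact summary can be a useful alternative. The sketch below is one way to do this with standard `torch.nn.Module` methods. `summarize` is a hypothetical helper name, and a small stand-in network is used so that the example runs without downloading any weights.

```python
from torch import nn


def summarize(model):
    """Return (total parameters, trainable parameters, top-level block names)."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    blocks = [name for name, _ in model.named_children()]
    return total, trainable, blocks


# Small stand-in network; any nn.Module can be passed instead.
demo = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8, 2),
)
print(summarize(demo))
```

The same call should work on the loaded model, for example `summarize(net)`.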
Defining an image
import torch
# A random tensor stands in for one 224x224 RGB image; wrapping it in Variable is unnecessary since PyTorch 0.4
x = torch.randn(1, 3, 224, 224)
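A random tensor only confirms that the shapes line up. For a real photo, the pixel values usually need to be scaled and normalized first. The sketch below assumes the pretrained weights expect the standard ImageNet channel statistics, which is an assumption worth checking against the framework's own evaluation code. `to_model_input` is a hypothetical helper name, and random pixels stand in for a decoded image.

```python
import torch

# Standard ImageNet channel statistics (RGB); assumed to match the pretrained weights.
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)


def to_model_input(image_hwc_uint8):
    """Convert an H x W x 3 uint8 RGB tensor to a normalized 1 x 3 x H x W float batch."""
    # Move channels first and scale pixel values to [0, 1].
    chw = image_hwc_uint8.permute(2, 0, 1).float() / 255.0
    normalized = (chw - IMAGENET_MEAN) / IMAGENET_STD
    # Add the batch dimension expected by the model.
    return normalized.unsqueeze(0)


# Random pixels standing in for a decoded 224 x 224 photo.
fake_image = torch.randint(0, 256, (224, 224, 3), dtype=torch.uint8)
x = to_model_input(fake_image)
print(x.shape)
```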
Running the model on the image
y = net(x)
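In the above step, y holds the output of the instantiated ResNet-18 model for the image: a tensor of shape (1, 1000) containing one score for each ImageNet-1K class. To obtain extracted features rather than class scores, we can call the model's feature extractor, net.features, on the same input.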
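For an ImageNet-1K classifier, the output of the forward pass has one raw score per class, and these scores are usually turned into probabilities before being read. The sketch below shows one way to pick the most likely classes. `top_k_classes` is a hypothetical helper name, and a hand-written four-class tensor stands in for the real output.

```python
import torch


def top_k_classes(logits, k=5):
    """Return (probabilities, class indices) of the k highest-scoring classes per image."""
    probabilities = torch.softmax(logits, dim=1)
    return torch.topk(probabilities, k=k, dim=1)


# Stand-in scores for one image and four classes; the real output has 1000 columns.
example_logits = torch.tensor([[0.5, 2.0, -1.0, 1.0]])
probs, indices = top_k_classes(example_logits, k=2)
print(indices.tolist())
```

With the model loaded earlier, it is common to call `net.eval()` and run the forward pass inside `torch.no_grad()` before passing `y` to a helper like this, so that layers such as batch normalization behave as they do at inference time and no gradients are tracked.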
Here we have seen how to use this framework and one of its pre-trained models. One thing that I like about this framework is that its developers focus solely on implementing computer vision models, which keeps it well suited to projects in this field. Since most transfer learning models in computer vision are built on convolutional neural networks, the framework does not spread itself across other families of networks such as RNNs and LSTMs, and this focus keeps it lightweight. Another good thing about the framework is that it uses PyTorch as its base library, so a proficient PyTorch user who reads the source code can easily follow how a pre-trained model is constructed and loaded. All these qualities make the framework a good choice for transfer learning in computer vision.
In this article, we have discussed transfer learning in general and transfer learning in computer vision. Along with this, we looked at a list of models that can be used in transfer learning settings for computer vision problems, and at the usage of PyTorchCV, a framework that includes only computer vision models.