With recent advances in deep learning, building ever deeper convolutional neural networks (CNNs) has become the trend for solving visual recognition problems. Although these networks give accurate results, they require billions of FLOPs. ShuffleNet is a computation-efficient CNN architecture that addresses this issue; it is designed especially for mobile devices, drones and robots, and aims to deliver the best possible accuracy within a very limited computational budget.
This article demonstrates how to implement a deep learning model with the ShuffleNet architecture to classify images from the CIFAR-10 dataset. We define the CNN model in PyTorch, train it, and then test it to check the reduction in computational cost and the accuracy obtained.
Architecture of ShuffleNet
This architecture uses pointwise group convolutions and channel shuffling to reduce the computational cost. When group convolutions are simply stacked (the first case in the figure), information is blocked, because the outputs of a group depend only on the inputs within that group. The channel shuffle operation, illustrated in the second case of the figure, solves this issue: the channels of each group are redistributed so that every group in the next group convolution layer receives information from all groups.
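The channel shuffle described above can be sketched on a plain list of channel labels. This is only an illustration in pure Python; the helper name `channel_shuffle` and the labels are hypothetical and are not part of the article's model code, which performs the same reordering on tensors.

```python
def channel_shuffle(channels, groups):
    """Reorder a flat list of channels the way a channel shuffle does.

    The list is viewed as a (groups, channels_per_group) grid, the grid is
    transposed, and the result is flattened again.
    """
    if len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = len(channels) // groups
    # Row g holds the channels produced by group g.
    grid = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]
    # Reading the grid column by column interleaves the groups.
    return [grid[g][i] for i in range(per_group) for g in range(groups)]


# Six channels from three groups: a* from group 0, b* from group 1, c* from group 2.
before = ["a0", "a1", "b0", "b1", "c0", "c1"]
after = channel_shuffle(before, groups=3)
print(after)
```

After the shuffle, each consecutive pair of channels contains entries from different source groups, so a following group convolution no longer sees a single group in isolation.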
About the Dataset
The CIFAR-10 dataset contains 60,000 32×32 colour images in 10 different classes, with 6,000 images per class. The classes are aeroplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. The dataset is split into 50,000 training images and 10,000 test images.
Implementation
We will use Google Colab for image classification. Start by mounting the drive.
After signing in to your Google account you receive an authorization code; enter it in the text box that is displayed. The drive should now be mounted.
Once the drive is mounted, we define the methods for loading the dataset, initializing the CNN model, training and testing. First of all, we import the required libraries.
# Importing libraries
import os
import sys
import time
import math
import glob
import tarfile
import argparse
import warnings
import datetime as dt

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tqdm
from PIL import Image  # used below to display raw image arrays

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data
import torchvision
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torchvision import models
from torchsummary import summary  # third-party package

warnings.filterwarnings("ignore")
Next, we display a few sample images.
Exploring the image dataset
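from PIL import Image
import matplotlib.pyplot as plt
import torch
import torchvision

def show_imgs(X):
    """Display the first nine images of X in a 3x3 grid."""
    plt.figure(figsize=(5, 5), dpi=150, facecolor='w', edgecolor='k')
    k = 0
    for i in range(3):
        for j in range(3):
            plt.subplot2grid((3, 3), (i, j))
            plt.imshow(Image.fromarray(X[k]))
            k += 1
    # show the plot
    plt.show()

# The original snippet called cifar10.load_data() (a Keras helper) without
# importing it; here the raw test images are read from torchvision instead.
# .data is a uint8 array of shape (10000, 32, 32, 3).
x_test = torchvision.datasets.CIFAR10(root='./data', train=False, download=True).data
show_imgs(x_test[:9])

# Initialize values
device = 'cuda' if torch.cuda.is_available() else 'cpu'
best_acc = 0     # best test accuracy
start_epoch = 0  # start from epoch 0 or last checkpoint epoch
batch_size = 128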
Transforming the Data
Data augmentation usually improves accuracy. We pad the training images, apply a random horizontal flip and a random crop, and normalize the data before training.
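To make the normalization step concrete, here is a minimal pure-Python sketch of what per-channel normalization does to a single RGB pixel, and how it is undone before plotting. The mean and standard deviation values are the ones used in this article's transforms; the helper names `normalize` and `unnormalize` are hypothetical.

```python
# Per-channel statistics used in this article's transforms.
MEAN = (0.4914, 0.4822, 0.4465)
STD = (0.2023, 0.1994, 0.2010)


def normalize(pixel):
    """Apply (value - mean) / std to each channel of an RGB pixel in [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(pixel, MEAN, STD))


def unnormalize(pixel):
    """Invert normalize(): value * std + mean for each channel."""
    return tuple(v * s + m for v, m, s in zip(pixel, MEAN, STD))


pixel = (0.5, 0.5, 0.5)            # a mid-grey pixel
normalized = normalize(pixel)
restored = unnormalize(normalized)
print(normalized)
print(restored)
```

A pixel equal to the channel means maps to zero in every channel, and an image has to be unnormalized with the same statistics before it is displayed, otherwise the colours look wrong.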
print('==> Preparing data..')
# Per-channel CIFAR-10 statistics used for normalization
cifar_mean = (0.4914, 0.4822, 0.4465)
cifar_std = (0.2023, 0.1994, 0.2010)

# Training: pad, flip and crop at random, then normalize
transform_train = transforms.Compose([
    transforms.Pad(4),
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32),
    transforms.ToTensor(),
    transforms.Normalize(cifar_mean, cifar_std),
])

# Testing: no random augmentation, so that evaluation is deterministic
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(cifar_mean, cifar_std),
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
train_loader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=2)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)
test_loader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# Function to show an image
def imshow(img):
    # Undo the per-channel normalization before plotting
    mean = torch.tensor(cifar_mean).view(3, 1, 1)
    std = torch.tensor(cifar_std).view(3, 1, 1)
    npimg = (img * std + mean).clamp(0, 1).numpy()
    plt.figure(figsize=(8, 8), dpi=150, facecolor='w', edgecolor='k')
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# Get a batch of random training images
dataiter = iter(train_loader)
images, labels = next(dataiter)

# Show the first five images and print their labels
imshow(torchvision.utils.make_grid(images[:5]))
print(' '.join('%5s' % classes[labels[j]] for j in range(5)))
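Defining ShuffleNet for Our Work
The code snippet below defines the ShuffleNet architecture. In the original ImageNet design, a 224×224 image is passed to a convolution layer with a 3×3 filter and stride 2, and the final fully connected layer classifies the image into one of 1000 classes. The CIFAR-10 variant used here takes 32×32 images instead: it starts with a 1×1 convolution that produces 24 channels, followed by three stages of bottleneck blocks built from pointwise group convolutions, a channel shuffle and a 3×3 depthwise convolution. The spatial size after each convolution follows the formula floor((n + 2p − f)/s) + 1, where n is the input height or width, p is the padding, f is the kernel size and s is the stride. After average pooling, the features are passed to a fully connected layer that classifies the image into one of the 10 CIFAR-10 classes.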
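The output-size formula can be checked with a few lines of Python. This is a small sketch rather than part of the model; the helper name `conv_output_size` is hypothetical, and the padding of 1 in the 224×224 example is an assumption chosen so that the size halves exactly.

```python
def conv_output_size(n, f, p=0, s=1):
    """Spatial size after a convolution or pooling layer.

    n: input height or width, f: kernel size, p: padding, s: stride.
    Implements floor((n + 2p - f) / s) + 1.
    """
    return (n + 2 * p - f) // s + 1


# 224x224 input, 3x3 kernel, stride 2 (padding 1 is an assumption).
print(conv_output_size(224, f=3, p=1, s=2))

# 32x32 CIFAR-10 input through the 1x1 stem convolution: size is unchanged.
print(conv_output_size(32, f=1))

# Each stage of the CIFAR-10 model starts with a stride-2 3x3 layer with
# padding 1, so the feature map halves three times.
sizes = [32]
for _ in range(3):
    sizes.append(conv_output_size(sizes[-1], f=3, p=1, s=2))
print(sizes)
```

The final 4×4 feature map is what the 4×4 average pooling at the end of the network reduces to a single value per channel.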
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class ShuffleBlock(nn.Module):
    def __init__(self, groups):
        super(ShuffleBlock, self).__init__()
        self.groups = groups

    def forward(self, x):
        '''Channel shuffle: [N,C,H,W] -> [N,g,C/g,H,W] -> [N,C/g,g,H,W] -> [N,C,H,W]'''
        N, C, H, W = x.size()
        g = self.groups
        return x.view(N, g, C // g, H, W).permute(0, 2, 1, 3, 4).reshape(N, C, H, W)


class Bottleneck(nn.Module):
    def __init__(self, in_planes, out_planes, stride, groups):
        super(Bottleneck, self).__init__()
        self.stride = stride
        mid_planes = out_planes // 4
        # No grouping on the first block, whose input has only 24 channels
        g = 1 if in_planes == 24 else groups
        # 1x1 pointwise group convolution
        self.conv1 = nn.Conv2d(in_planes, mid_planes, kernel_size=1, groups=g, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_planes)
        self.shuffle1 = ShuffleBlock(groups=g)
        # 3x3 depthwise convolution (one group per channel)
        self.conv2 = nn.Conv2d(mid_planes, mid_planes, kernel_size=3, stride=stride,
                               padding=1, groups=mid_planes, bias=False)
        self.bn2 = nn.BatchNorm2d(mid_planes)
        # 1x1 pointwise group convolution
        self.conv3 = nn.Conv2d(mid_planes, out_planes, kernel_size=1, groups=groups, bias=False)
        self.bn3 = nn.BatchNorm2d(out_planes)
        self.shortcut = nn.Sequential()
        if stride == 2:
            self.shortcut = nn.Sequential(nn.AvgPool2d(3, stride=2, padding=1))

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.shuffle1(out)
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        res = self.shortcut(x)
        # Stride-2 blocks concatenate with the pooled input; others add it
        out = F.relu(torch.cat([out, res], 1)) if self.stride == 2 else F.relu(out + res)
        return out


class ShuffleNet(nn.Module):
    def __init__(self, cfg):
        super(ShuffleNet, self).__init__()
        out_planes = cfg['out_planes']
        num_blocks = cfg['num_blocks']
        groups = cfg['groups']
        self.conv1 = nn.Conv2d(3, 24, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(24)
        self.in_planes = 24
        self.layer1 = self._make_layer(out_planes[0], num_blocks[0], groups)
        self.layer2 = self._make_layer(out_planes[1], num_blocks[1], groups)
        self.layer3 = self._make_layer(out_planes[2], num_blocks[2], groups)
        self.linear = nn.Linear(out_planes[2], 10)

    def _make_layer(self, out_planes, num_blocks, groups):
        layers = []
        for i in range(num_blocks):
            stride = 2 if i == 0 else 1
            # The first block concatenates its input, so it produces fewer new channels
            cat_planes = self.in_planes if i == 0 else 0
            layers.append(Bottleneck(self.in_planes, out_planes - cat_planes,
                                     stride=stride, groups=groups))
            self.in_planes = out_planes
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out


def ShuffleNetG2():
    cfg = {'out_planes': [200, 400, 800], 'num_blocks': [4, 8, 4], 'groups': 2}
    return ShuffleNet(cfg)


def ShuffleNetG3():
    cfg = {'out_planes': [240, 480, 960], 'num_blocks': [4, 8, 4], 'groups': 3}
    return ShuffleNet(cfg)


# Quick check with a random 32x32 input, then print the architecture
net = ShuffleNetG2()
x = torch.randn(1, 3, 32, 32)
y = net(x)
print(y)
print(net)

# Setting the model on CUDA
if torch.cuda.is_available():
    net.cuda()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
Training, Testing and Making Predictions
Now we are all set to train and test the model on the CIFAR-10 dataset. The model has already been moved to CUDA (when available) and paired with a stochastic gradient descent optimizer. For better accuracy, we train and test the model for 60 epochs.
# Training the model
def train_net():
    net.train()
    train_loss = 0
    n_correct = 0
    n_total = 0
    for batch_idx, (inputs, targets) in enumerate(train_loader):
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        _, predicted = outputs.max(1)
        n_correct += predicted.eq(targets).sum().item()
        n_total += targets.shape[0]
    # Mean loss per batch and overall accuracy
    return train_loss / (batch_idx + 1), n_correct / n_total


def get_loss_acc(is_test_dataset=True):
    net.eval()
    dataloader = test_loader if is_test_dataset else train_loader
    n_correct = 0
    n_total = 0
    test_loss = 0
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(dataloader):
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = net(inputs)
            test_loss += criterion(outputs, targets).item()
            _, predicted = outputs.max(1)
            n_correct += predicted.eq(targets).sum().item()
            n_total += targets.shape[0]
    return test_loss / (batch_idx + 1), n_correct / n_total


# Testing the model and saving the best checkpoint
def test(epoch):
    global best_acc
    net.eval()
    test_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(test_loader):
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = net(inputs)
            loss = criterion(outputs, targets)
            test_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()
            print(batch_idx, len(test_loader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
                  % (test_loss / (batch_idx + 1), 100. * correct / total, correct, total))
    # Save checkpoint.
    acc = 100. * correct / total
    if acc > best_acc:
        print('Saving..')
        state = {'net': net.state_dict(), 'acc': acc, 'epoch': epoch}
        if not os.path.isdir('checkpoint'):
            os.mkdir('checkpoint')
        torch.save(state, './checkpoint/ckpt.pth')
        torch.save(net, './checkpoint/net.pth')
        best_acc = acc


EPOCH = 60
start_epoch = 0
start = dt.datetime.now()
for epochi in range(start_epoch, start_epoch + EPOCH):
    # scheduler.step()
    cur_lr = [i['lr'] for i in optimizer.param_groups][0]
    print("Batch Size", batch_size, '(%.2fs)\n\nEpoch: %d/%d | cur_lr:%.4f ' % (
        (dt.datetime.now() - start).seconds, epochi + 1, EPOCH + start_epoch, cur_lr))
    start = dt.datetime.now()
    # Train first so that the test metrics describe the epoch just completed
    train_loss, train_acc = train_net()
    test_loss, test_acc = get_loss_acc()
    # hist.append([train_loss, train_acc, test_loss, test_acc])
    print('train Loss: %.3f | Acc: %.3f%% \ntest Loss: %.3f | Acc: %.3f%% ' % (
        train_loss, train_acc * 100, test_loss, test_acc * 100))


# Count parameters
def count_parameters(model):
    pytorch_total_params = sum(p.numel() for p in model.parameters())
    print("Total_params", pytorch_total_params)
    pytorch_total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print("Trainable_params", pytorch_total_params)

count_parameters(net)
Compared with many other CNN architectures, ShuffleNet has fewer parameters.
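One reason for the lower parameter count is the group convolution itself: each output channel connects only to the input channels of its own group. The sketch below counts the weights of a bias-free convolution for different group settings. The helper name `conv_params` and the 240-channel example are illustrative assumptions, not measurements of the model trained above.

```python
def conv_params(c_in, c_out, kernel_size, groups=1):
    """Number of weights in a bias-free 2D convolution.

    Each of the c_out filters sees only c_in / groups input channels.
    """
    if c_in % groups != 0 or c_out % groups != 0:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * kernel_size * kernel_size


standard = conv_params(240, 240, kernel_size=1)            # ordinary 1x1 convolution
grouped = conv_params(240, 240, kernel_size=1, groups=3)   # pointwise group convolution
depthwise = conv_params(240, 240, kernel_size=3, groups=240)  # 3x3 depthwise convolution

print(standard, grouped, depthwise)
```

With g groups, a pointwise convolution needs 1/g of the weights of its ungrouped counterpart, and a depthwise convolution needs only one k×k filter per channel.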
# Model accuracy
net.eval()  # use evaluation behaviour for batch normalization
total_correct = 0
total_images = 0
# Rows are actual classes, columns are predicted classes
confusion_matrix = np.zeros([10, 10], int)
with torch.no_grad():
    for data in test_loader:
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total_images += labels.size(0)
        total_correct += (predicted == labels).sum().item()
        for i, l in enumerate(labels):
            confusion_matrix[l.item(), predicted[i].item()] += 1

model_accuracy = total_correct / total_images * 100
print('Model accuracy on {0} test images: {1:.2f}%'.format(total_images, model_accuracy))
Results of the Model
# Result: accuracy per category
print('{0:10s} - {1}'.format('Category', 'Accuracy'))
for i, r in enumerate(confusion_matrix):
    print('{0:10s} - {1:.1f}'.format(classes[i], r[i] / np.sum(r) * 100))

# Plot the confusion matrix
fig, ax = plt.subplots(1, 1, figsize=(8, 6))
ax.matshow(confusion_matrix, aspect='auto', vmin=0, vmax=1000, cmap=plt.get_cmap('Blues'))
plt.ylabel('Actual Category')
plt.yticks(range(10), classes)
plt.xlabel('Predicted Category')
plt.xticks(range(10), classes)
plt.show()
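The per-category accuracy printed above is the diagonal entry of each confusion-matrix row divided by the row sum. A tiny worked example makes the arithmetic explicit; the 3-class matrix below is made-up illustrative data, not a result from the model, and the helper name `per_class_accuracy` is hypothetical.

```python
def per_class_accuracy(matrix):
    """Accuracy per actual class: diagonal entry divided by the row sum.

    Rows are actual classes and columns are predicted classes.
    """
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


# Made-up 3-class confusion matrix with 10 samples per actual class.
toy_matrix = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 3, 7],
]

accuracies = per_class_accuracy(toy_matrix)
overall = sum(toy_matrix[i][i] for i in range(3)) / sum(sum(row) for row in toy_matrix)
print(accuracies)
print(overall)
```

Off-diagonal entries show which categories are confused with each other, which the overall accuracy alone does not reveal.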
Conclusion
As the results above show, the model reaches high accuracy on both the training and test sets. It also has fewer parameters, which reduces the computational complexity. We can therefore conclude that the model classifies images from the CIFAR-10 dataset accurately. Training for more epochs may improve the accuracy further, and we can experiment with additional augmentation such as colour jitter and brightness changes.
The complete code for the above implementation is available at AIM’s GitHub repository. Please go through this link to check the notebook with the code.