Complete Guide To ShuffleNet V1 With Implementation In Multiclass Image Classification


Ankit Das

With recent advances in deep learning, building ever deeper convolutional neural networks (CNNs) has become a common way to tackle visual recognition problems. Although such networks are accurate, they can require billions of floating-point operations (FLOPs) per image. ShuffleNet is a computation-efficient CNN architecture designed for devices with limited computing power, such as mobile phones, drones and robots. It aims to deliver high accuracy within a very limited computational budget.

This article demonstrates how to implement a deep learning model with the ShuffleNet architecture to classify images from the CIFAR-10 dataset. We define the CNN in PyTorch, train it, and then test it to check the reduction in computational cost and the accuracy obtained.

Architecture of ShuffleNet


This architecture uses pointwise group convolutions and channel shuffling to reduce the computational cost. When group convolutions are simply stacked (the first case in the figure), information is blocked, because the outputs of a group depend only on the inputs within that group. The channel shuffle operation (the second case in the figure) solves this issue: the output channels of each group are redistributed so that every group in the next group convolution layer receives information from all groups.
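The channel shuffle described above can be illustrated without PyTorch. The following is a minimal plain-Python sketch; the helper names `channel_shuffle_order` and `channel_shuffle` are hypothetical and only serve this illustration. It lays the channels out as a grid with one row per group and reads the grid column by column, which is the same reordering a reshape followed by a transpose performs on a tensor.

```python
def channel_shuffle_order(num_channels, groups):
    """For each output position, return the index of the input channel placed there."""
    if num_channels % groups != 0:
        raise ValueError("num_channels must be divisible by groups")
    per_group = num_channels // groups
    # One row per group, each row holding that group's consecutive channel indices.
    grid = [list(range(g * per_group, (g + 1) * per_group)) for g in range(groups)]
    # Reading the grid column by column interleaves the groups.
    return [grid[g][i] for i in range(per_group) for g in range(groups)]


def channel_shuffle(channels, groups):
    """Reorder a list of channels according to the shuffle order."""
    order = channel_shuffle_order(len(channels), groups)
    return [channels[i] for i in order]


# Two groups, "a" and "b", with three channels each.
labels = ["a0", "a1", "a2", "b0", "b1", "b2"]
print(channel_shuffle(labels, groups=2))
```

After the shuffle, each consecutive block of channels contains channels from both groups, so the next group convolution sees information from every group.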

About the Dataset

The CIFAR-10 dataset contains 60,000 32×32 colour images in 10 classes, with 6,000 images per class. The classes are aeroplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. The dataset is split into 50,000 training images and 10,000 test images.


We will use Google Colab for image classification. Start by mounting Google Drive.

After signing in to your Google account you get an authorization code; enter it in the text box that is displayed. The drive should now be mounted.
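The mounting step is not shown in the text, so here is a minimal sketch of how it is usually done. It assumes the standard `google.colab` helper and the conventional mount point `/content/drive`; the wrapper function `mount_drive_if_colab` is a hypothetical name added so that the snippet also runs harmlessly outside Colab.

```python
def mount_drive_if_colab(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; return True if it was mounted."""
    try:
        from google.colab import drive  # only importable inside a Colab runtime
    except ImportError:
        print("Not running in Colab; skipping the Drive mount.")
        return False
    drive.mount(mount_point)  # triggers the authorization prompt in the notebook
    return True


mounted = mount_drive_if_colab()
```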

Once the drive is mounted we will proceed to define the methods for loading the data set, initializing the CNN model, training and testing. First of all, we will import the required libraries.

#Importing Libraries
import os
import sys
import time
import math
import glob
import tarfile
import argparse
import warnings
import datetime as dt

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tqdm

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torchvision import models
from torchsummary import summary

Next, we display a few images from the dataset.

Exploring the image dataset

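# Assumed import: the original snippet does not show where cifar10 comes from;
# load_data() matches the Keras dataset API, used here only to preview raw images
from tensorflow.keras.datasets import cifar10
def show_imgs(X):
    # plot the first nine images in a 3x3 grid
    plt.figure(num=None, figsize=(5, 5), dpi=150, facecolor='w', edgecolor='k')
    k = 0
    for i in range(0,3):
        for j in range(0,3):
            plt.subplot2grid((3,3), (i,j))
            plt.imshow(X[k])
            k = k+1
    # show the plot
    plt.show()
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
show_imgs(x_test[:9])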

#Initialize values
device = 'cuda' if torch.cuda.is_available() else 'cpu'
best_acc = 0  # best test accuracy
start_epoch = 0  # start from epoch 0 or last checkpoint epoch
batch_size = 128


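Data augmentation usually improves accuracy. We pad each training image and take a random crop (RandomCrop), flip it horizontally at random (RandomHorizontalFlip), and normalize both the training and the test images with per-channel means and standard deviations before training.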

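print('==> Preparing data..')
# Training images: random crop after padding, random horizontal flip, then normalization.
# padding=4 is an assumed value; the original value is missing from the source.
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])
# Test images: conversion to tensor and normalization only
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
train_loader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=2)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)
test_loader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=2)
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')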
# function to show an image
def imshow(img):
    img = img / 2 + 0.5     # rough unnormalize for display; colours are approximate
    npimg = img.numpy()
    plt.figure(num=None, figsize=(8, 8), dpi=150, facecolor='w', edgecolor='k')
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()
# get some random training images
dataiter = iter(train_loader)
images, labels = next(dataiter)
# show images (the first five of the batch, matching the labels printed below)
imshow(torchvision.utils.make_grid(images[:5]))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(5)))
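The `imshow` helper above undoes the normalization as if the mean and standard deviation were both 0.5, which is only approximate for the statistics passed to `Normalize`. The following plain-Python sketch shows the per-channel arithmetic and its exact inverse for a single pixel; the function names are hypothetical and the grey test pixel is made up for illustration.

```python
# Per-channel statistics copied from the Normalize transform used in this article
MEAN = (0.4914, 0.4822, 0.4465)
STD = (0.2023, 0.1994, 0.2010)


def normalize_pixel(rgb):
    """Map an RGB pixel with values in [0, 1] to normalized values: (x - mean) / std."""
    return tuple((v - m) / s for v, m, s in zip(rgb, MEAN, STD))


def denormalize_pixel(rgb_norm):
    """Exact inverse of normalize_pixel: x * std + mean."""
    return tuple(v * s + m for v, m, s in zip(rgb_norm, MEAN, STD))


pixel = (0.5, 0.5, 0.5)  # a mid-grey pixel, for illustration only
normalized = normalize_pixel(pixel)
restored = denormalize_pixel(normalized)
print(normalized)
print(restored)
```

Applying the same inverse channel by channel to an image tensor would restore the original colours before plotting.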

Defining ShuffleNet for Our Work

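The code snippet below defines the ShuffleNet architecture. In the original ImageNet design, a 224×224 image is passed to a convolution layer with a 3×3 filter and stride 2, and a final fully connected layer classifies the image into one of 1000 classes. The CIFAR-10 variant used here takes 32×32 images, starts with a 1×1 convolution, and ends with a fully connected layer with 10 outputs. ShuffleNet uses pointwise (1×1) group convolutions, which split the channels into groups and convolve each group separately, to reduce computation. The spatial size of a layer's output is given by the formula ⌊(n + 2p − f)/s⌋ + 1, where n is the input height or width, p is the padding, f is the kernel size and s is the stride.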
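The output-size formula can be checked with a few lines of plain Python before reading the model definition. This is a small sketch; the function name `conv_output_size` is hypothetical, and the padding of 1 for the 224×224 case is an assumption chosen so that the output is halved.

```python
def conv_output_size(n, f, s=1, p=0):
    """Spatial output size of a convolution or pooling layer: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1


# 224x224 input, 3x3 kernel, stride 2, assumed padding 1
print(conv_output_size(224, f=3, s=2, p=1))  # 112
# 32x32 CIFAR-10 input through the 1x1 convolution used in this article
print(conv_output_size(32, f=1))             # 32
# 3x3 kernel with stride 2 and padding 1 halves a 32x32 feature map
print(conv_output_size(32, f=3, s=2, p=1))   # 16
```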


import torch
import torch.nn as nn
import torch.nn.functional as F
class ShuffleBlock(nn.Module):
    def __init__(self, groups):
        super(ShuffleBlock, self).__init__()
        self.groups = groups
    def forward(self, x):
        '''Channel shuffle: [N,C,H,W] -> [N,g,C/g,H,W] -> [N,C/g,g,H,w] -> [N,C,H,W]'''
        N,C,H,W = x.size()
        g = self.groups
        return x.view(N,g,C//g,H,W).permute(0,2,1,3,4).reshape(N,C,H,W)
class Bottleneck(nn.Module):
    def __init__(self, in_planes, out_planes, stride, groups):
        super(Bottleneck, self).__init__()
        self.stride = stride
        mid_planes =int(out_planes/4)
        g = 1 if in_planes==24 else groups
        self.conv1 = nn.Conv2d(in_planes, mid_planes, kernel_size=1, groups=g, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_planes)
        self.shuffle1 = ShuffleBlock(groups=g)
        self.conv2 = nn.Conv2d(mid_planes, mid_planes, kernel_size=3, stride=stride, padding=1, groups=mid_planes, bias=False)
        self.bn2 = nn.BatchNorm2d(mid_planes)
        self.conv3 = nn.Conv2d(mid_planes, out_planes, kernel_size=1, groups=groups, bias=False)
        self.bn3 = nn.BatchNorm2d(out_planes)
        self.shortcut = nn.Sequential()
        if stride == 2:
            self.shortcut = nn.Sequential(nn.AvgPool2d(3, stride=2, padding=1))
    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.shuffle1(out)
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        res = self.shortcut(x)
        # stride 2: concatenate with the pooled shortcut; stride 1: add the identity shortcut
        out = F.relu(torch.cat([out,res], 1)) if self.stride==2 else F.relu(out+res)
        return out
class ShuffleNet(nn.Module):
    def __init__(self, cfg):
        super(ShuffleNet, self).__init__()
        out_planes = cfg['out_planes']
        num_blocks = cfg['num_blocks']
        groups = cfg['groups']
        self.conv1 = nn.Conv2d(3, 24, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(24)
        self.in_planes = 24
        self.layer1 = self._make_layer(out_planes[0], num_blocks[0], groups)
        self.layer2 = self._make_layer(out_planes[1], num_blocks[1], groups)
        self.layer3 = self._make_layer(out_planes[2], num_blocks[2], groups)
        self.linear = nn.Linear(out_planes[2], 10)
    def _make_layer(self, out_planes, num_blocks, groups):
        layers = []
        for i in range(num_blocks):
            stride = 2 if i == 0 else 1
            cat_planes = self.in_planes if i == 0 else 0
            layers.append(Bottleneck(self.in_planes, out_planes-cat_planes, stride=stride, groups=groups))
            self.in_planes = out_planes
        return nn.Sequential(*layers)
    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out
def ShuffleNetG2():
    cfg = {
        'out_planes': [200,400,800],
        'num_blocks': [4,8,4],
        'groups': 2
    }
    return ShuffleNet(cfg)
def ShuffleNetG3():
    cfg = {
        'out_planes': [240,480,960],
        'num_blocks': [4,8,4],
        'groups': 3
    }
    return ShuffleNet(cfg)
net = ShuffleNetG2()
# Sanity check: forward one random 32x32 RGB image through the network
x = torch.randn(1,3,32,32)
y = net(x)
print(y.shape)  # torch.Size([1, 10])
#Setting the model on CUDA
if torch.cuda.is_available():
    net = net.cuda()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
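To make the channel bookkeeping in `_make_layer` easier to follow, the sketch below traces the three stages in plain Python. It assumes the structure defined above: the first block of each stage uses stride 2, its bottleneck branch produces `out_planes - in_planes` channels, and the average-pooled shortcut (3×3, stride 2, padding 1) supplies the remaining channels through concatenation. The function name `stage_shapes` is hypothetical.

```python
def stage_shapes(out_planes, in_planes=24, size=32):
    """Return (branch_channels, total_channels, spatial_size) after each stage."""
    shapes = []
    for planes in out_planes:
        # Channels produced by the bottleneck branch of the first (stride-2) block
        branch = planes - in_planes
        # 3x3 window, stride 2, padding 1 halves the feature map
        size = (size + 2 * 1 - 3) // 2 + 1
        # Concatenating branch and shortcut gives the stage's full channel count
        shapes.append((branch, branch + in_planes, size))
        in_planes = planes
    return shapes


print(stage_shapes([200, 400, 800]))  # configuration of ShuffleNetG2
print(stage_shapes([240, 480, 960]))  # configuration of ShuffleNetG3
```

For both configurations the last stage ends with a 4×4 feature map, which is why the forward pass applies a 4×4 average pool before the linear layer.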

Training, Testing and Making Predictions

Now we are all set to train and test the model on the CIFAR-10 dataset. The model has been moved to CUDA where available, and we use the stochastic gradient descent (SGD) optimizer. For better accuracy, we train and test the model for 60 epochs.

# Training the model
def train_net():
    net.train()  # training mode: batch-norm statistics are updated
    train_loss = 0
    n_correct = 0
    n_total = 0
    for batch_idx, (inputs, targets) in enumerate(train_loader):
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()  # clear gradients from the previous step
        outputs = net(inputs)
        loss = criterion(outputs, targets)
        loss.backward()  # backpropagate
        optimizer.step()  # update the weights
        train_loss += loss.item()
        _, predicted = outputs.max(1)
        n_correct += predicted.eq(targets).sum().item()
        n_total += targets.shape[0]
    # mean loss per batch and overall accuracy
    return train_loss/(batch_idx+1),n_correct/n_total
def get_loss_acc(is_test_dataset = True):
    net.eval()  # evaluation mode: use running batch-norm statistics
    dataloader = test_loader if is_test_dataset else train_loader
    n_correct = 0
    n_total = 0
    test_loss = 0
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(dataloader):
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = net(inputs)
            test_loss += criterion(outputs, targets).item()
            _, predicted = outputs.max(1)
            n_correct += predicted.eq(targets).sum().item()
            n_total += targets.shape[0]
    return test_loss/(batch_idx+1),n_correct/n_total
#Testing the model
def test(epoch):
    global best_acc
    net.eval()  # evaluation mode
    test_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(test_loader):
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = net(inputs)
            loss = criterion(outputs, targets)
            test_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()
            print(batch_idx, len(test_loader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
                % (test_loss/(batch_idx+1), 100.*correct/total, correct, total))
    # Save checkpoint when the accuracy improves.
    acc = 100.*correct/total
    if acc > best_acc:
        state = {
            'net': net.state_dict(),
            'acc': acc,
            'epoch': epoch,
        }
        if not os.path.isdir('checkpoint'):
            os.mkdir('checkpoint')
        torch.save(state, './checkpoint/ckpt.pth')
        # The call that wrote net.pth is garbled in the source; saving the weights is an assumption
        torch.save(net.state_dict(), './checkpoint/net.pth')
        best_acc = acc
EPOCH = 60
start = dt.datetime.now()
for epochi in range(start_epoch,start_epoch + EPOCH):
    cur_lr = [i['lr'] for i in optimizer.param_groups][0]
    # the elapsed time refers to the previous epoch (or to setup, for the first one)
    print("Batch Size",batch_size,'(%.2fs)\n\nEpoch: %d/%d | cur_lr:%.4f ' % (
       (dt.datetime.now() - start).total_seconds(), epochi+1,EPOCH+start_epoch,cur_lr))
    start = dt.datetime.now()
    # train first, then evaluate, so both metrics describe the same epoch
    train_loss , train_acc = train_net()
    test_loss , test_acc = get_loss_acc()
    #hist.append([train_loss , train_acc,test_loss , test_acc])
    print( 'train Loss: %.3f | Acc: %.3f%% \ntest Loss: %.3f | Acc: %.3f%% ' % (
        train_loss, train_acc*100,test_loss, test_acc*100))
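The optimizer used in this article is SGD with a learning rate of 0.01, momentum 0.9 and weight decay 5e-4. As a rough illustration of what one optimizer step does with these settings, here is a plain-Python sketch for a single scalar parameter. It assumes the common formulation without dampening or Nesterov momentum; the function name `sgd_momentum_step` and the constant gradient of 0.5 are made up for the example.

```python
def sgd_momentum_step(param, grad, velocity, lr=0.01, momentum=0.9, weight_decay=5e-4):
    """One SGD update for a single scalar parameter (no dampening, no Nesterov)."""
    grad = grad + weight_decay * param      # weight decay folded into the gradient
    velocity = momentum * velocity + grad   # decayed running sum of gradients
    param = param - lr * velocity           # step against the accumulated direction
    return param, velocity


p, v = 1.0, 0.0
for step in range(3):
    # a constant, made-up gradient keeps the arithmetic easy to follow
    p, v = sgd_momentum_step(p, grad=0.5, velocity=v)
    print(step, round(p, 6), round(v, 6))
```

With a constant gradient the velocity grows from step to step, so the parameter moves faster than it would with plain SGD.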

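#Count Parameters
def count_parameters(model):
    # total number of parameters, and the subset that is trainable
    total_params = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total_params, trainable_params
total_params, trainable_params = count_parameters(net)
print('Total parameters: %d | Trainable parameters: %d' % (total_params, trainable_params))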

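Compared with larger CNN architectures, the number of parameters is small, largely because grouped and depthwise convolutions need fewer weights than standard convolutions.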
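The saving from grouped convolutions can be checked with simple arithmetic. The sketch below counts the weights of a bias-free 2D convolution for different group settings; the function name `conv_params` is hypothetical and the channel counts are illustrative rather than taken from a specific layer of the model.

```python
def conv_params(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights in a bias-free 2D convolution."""
    if in_channels % groups != 0 or out_channels % groups != 0:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only connects to in_channels / groups input channels
    return (in_channels // groups) * out_channels * kernel_size * kernel_size


print(conv_params(240, 240, 1))             # standard 1x1 convolution: 57600
print(conv_params(240, 240, 1, groups=3))   # pointwise group convolution: 19200
print(conv_params(60, 60, 3))               # standard 3x3 convolution: 32400
print(conv_params(60, 60, 3, groups=60))    # depthwise 3x3 convolution: 540
```

Using g groups divides the weight count of a layer by g, and the depthwise case, where the number of groups equals the number of channels, is the extreme version of the same idea.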

#Model Accuracy
total_correct = 0
total_images = 0
# rows are actual classes, columns are predicted classes
confusion_matrix = np.zeros([10,10], int)
net.eval()  # evaluation mode
with torch.no_grad():
    for data in test_loader:
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total_images += labels.size(0)
        total_correct += (predicted == labels).sum().item()
        for i, l in enumerate(labels):
            confusion_matrix[l.item(), predicted[i].item()] += 1
model_accuracy = total_correct / total_images * 100
print('Model accuracy on {0} test images: {1:.2f}%'.format(total_images, model_accuracy))

Results of the Model

print('{0:10s} - {1}'.format('Category','Accuracy'))
for i, r in enumerate(confusion_matrix):
    print('{0:10s} - {1:.1f}'.format(classes[i], r[i]/np.sum(r)*100))

#Plot the confusion Matrix
fig, ax = plt.subplots(1,1,figsize=(8,6))
ax.matshow(confusion_matrix, aspect='auto', vmin=0, vmax=1000, cmap=plt.get_cmap('Blues'))
plt.ylabel('Actual Category')
plt.yticks(range(10), classes)
plt.xlabel('Predicted Category')
plt.xticks(range(10), classes)
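The per-category accuracy printed above is the diagonal entry of each row divided by the row sum, which is the recall of that class. The plain-Python sketch below shows this on a toy 3-class matrix with made-up counts, and adds the column-wise counterpart (precision) for comparison; the function names are hypothetical.

```python
# Toy confusion matrix with made-up counts: rows are actual classes, columns are predictions
toy_matrix = [
    [9, 1, 0],
    [2, 6, 2],
    [1, 1, 8],
]


def per_class_recall(matrix):
    """Correct predictions of a class divided by all actual samples of that class."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def per_class_precision(matrix):
    """Correct predictions of a class divided by all samples predicted as that class."""
    return [matrix[i][i] / sum(row[i] for row in matrix) for i in range(len(matrix))]


def overall_accuracy(matrix):
    """Sum of the diagonal divided by the total number of samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


print(per_class_recall(toy_matrix))     # [0.9, 0.6, 0.8]
print(per_class_precision(toy_matrix))  # [0.75, 0.75, 0.8]
print(overall_accuracy(toy_matrix))
```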


As the results above show, the model achieves high accuracy on both the training and the test set. It also has relatively few parameters, which reduces the computational cost. We can therefore conclude that the model classifies CIFAR-10 images accurately. Training for more epochs may improve accuracy further, and we can experiment with additional augmentation such as colour jitter and brightness changes.
The complete code for the above implementation is available in AIM's GitHub repository, where the notebook can be found.

Copyright Analytics India Magazine Pvt Ltd
