A Guide to Chainer: A Flexible Toolkit For Neural Networks


Implementing neural networks requires a variety of specialized building blocks, such as multidimensional arrays, activation functions, and automatic differentiation. To avoid the complexity of developing these from scratch, we rely on frameworks such as TensorFlow, PyTorch, Theano, and others. In this post, we will look at a framework called Chainer and see how its approach improves on that of traditional frameworks. We will then explore this in practice by building a model with Chainer in Python.

In this post, we will focus on the following two points:


  1. Traditional Frameworks Vs Chainer Framework
  2. Implementing Chainer Framework in Python

Let us begin by looking at how the approach of existing frameworks differs from Chainer’s.

Traditional Frameworks Vs Chainer Framework

In typical neural network frameworks, models are built in two phases, using a Define-and-Run scheme as shown in Figure 1(a). During the Define phase, the model’s computational graph is constructed. This stage instantiates a neural network object from a model definition that specifies the inter-layer data flow graph, the initial weights, and the activation functions.

The computations for both the forward and backward passes are commonly defined using automatic differentiation, with optional graph optimizations. The actual forward and backward computation of the graph is done in the Run phase. In this phase, given a set of training examples, the model is trained by minimizing the loss function with an optimization algorithm such as stochastic gradient descent.

Fig 1: (a) Define-and-Run; (b) Define-by-Run

For static models like CNNs, the Define-and-Run paradigm works well, since having the whole computational graph available allows for graph optimizations that improve memory efficiency and/or runtime performance. However, the paradigm has two key problems when applied to other types of NN models.

  1. The first is that supporting general dynamic graphs, such as neural networks with control flow, is cumbersome. Control flow decisions in frameworks such as TensorFlow are defined in the data flow graph using special operators such as Switch and Merge rather than the host language’s control flow syntax.
  2. The second is that the user does not have access to the neural network’s inner workings under the Define-and-Run paradigm. This creates a variety of difficulties in building an appropriate model. For instance, in order to successfully debug and optimize a model, users need to be able to see what is happening inside the network.

Chainer, on the other hand, uses a “Define-by-Run” approach, shown in Figure 1(b), in which the network is defined dynamically by the actual forward computation. Rather than storing programming logic, Chainer stores the history of the computation. This lets us use the full power of Python’s own control flow: adding conditionals and loops to a network definition requires no special operators. Define-by-Run is the key idea behind Chainer.
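To make the idea of "storing the history of computation" concrete, here is a minimal sketch in plain Python. It is not Chainer's implementation: the `Var` class and the `forward` function are hypothetical names invented for this illustration, and only scalar addition and multiplication are supported. The point is that an ordinary Python loop and `if` statement decide which operations get recorded, and gradients are then computed by walking that record backwards.

```python
class Var:
    """A scalar value that remembers how it was computed (illustrative only)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each parent is a pair: (input Var, local derivative d(self)/d(input)).
        self.parents = parents

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        # Collect the recorded history in topological order.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for parent, _ in v.parents:
                    visit(parent)
                order.append(v)

        visit(self)
        # Propagate gradients from the output back to the inputs.
        self.grad = 1.0
        for v in reversed(order):
            for parent, local in v.parents:
                parent.grad += local * v.grad


def forward(x, w, n_steps):
    # Ordinary Python control flow decides the shape of the graph.
    h = x
    for _ in range(n_steps):
        h = h * w
        if h.value > 10.0:
            h = h + x
    return h


x, w = Var(2.0), Var(3.0)
y = forward(x, w, n_steps=2)   # here y = x * w * w + x
y.backward()
print(y.value, x.grad, w.grad)
```

Calling `forward` with a different `n_steps`, or with inputs that never cross the threshold, records a different history, and `backward` follows whatever was actually executed.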

Implementing Chainer Framework in Python

In this section, we will implement a CNN-based image classifier on the CIFAR10 dataset and explore some of the functionality that Chainer provides.

Create model

Our model is defined as a subclass of Chain. The CNN consists of three convolutional layers followed by two fully connected layers, which still makes it a modest CNN. In Chainer, each layer of a neural network is one of two kinds of functions (really, function objects): ‘Link’ and ‘Function’.

  • A Function is a function with no learnable parameters.
  • A Link is a function with (learnable) parameters.

A Link can be thought of as a wrapper around a Function that supplies it with parameters. That is, when a Link is called, it also calls the associated Function.
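The distinction can be sketched in plain Python, without Chainer. The names `relu`, `linear`, and `LinearLink` below are made up for this illustration and only mimic the roles of a Function and a Link: the functions hold no state, while the link-like class owns the parameters and passes them to the function when it is called.

```python
def relu(xs):
    """Function-like: no learnable parameters, just a computation."""
    return [max(0.0, x) for x in xs]


def linear(xs, W, b):
    """The parameter-free computation behind a fully connected layer."""
    return [sum(w * x for w, x in zip(row, xs)) + bias
            for row, bias in zip(W, b)]


class LinearLink:
    """Link-like: owns the parameters W and b and calls `linear` with them."""

    def __init__(self, W, b):
        self.W = W  # in a real framework these would be learned
        self.b = b

    def __call__(self, xs):
        return linear(xs, self.W, self.b)


layer = LinearLink(W=[[1.0, -1.0], [0.5, 0.5]], b=[0.0, -2.0])
out = relu(layer([2.0, 3.0]))
print(out)
```

Because the parameters live on the object, a training procedure only has to collect the link-like objects to find everything that needs updating.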

To describe a model, we then write the code that performs the “forward pass” computation. This code calls various links and chains (recall that Link and Chain are callable objects). Chainer handles the “backward pass” automatically, so we don’t have to worry about it unless we wish to write custom functions.

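import chainer
import chainer.functions as F
import chainer.links as L


class MyModel(chainer.Chain):
    def __init__(self, n_out):
        super(MyModel, self).__init__()
        with self.init_scope():
            # Positional arguments: in_channels, out_channels, ksize, stride, pad.
            # in_channels=None lets Chainer infer the input size on the first call.
            self.conv1 = L.Convolution2D(None, 32, 3, 3, 1)
            self.conv2 = L.Convolution2D(32, 64, 3, 3, 1)
            self.conv3 = L.Convolution2D(64, 128, 3, 3, 1)
            self.fc4 = L.Linear(None, 1000)
            self.fc5 = L.Linear(1000, n_out)

    def __call__(self, x):
        # Forward pass: three conv + ReLU stages, then two fully connected layers.
        h = F.relu(self.conv1(x))
        h = F.relu(self.conv2(h))
        h = F.relu(self.conv3(h))
        h = F.relu(self.fc4(h))
        h = self.fc5(h)  # raw class scores; no activation on the output
        return h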
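As a rough check on what this network does to a CIFAR10 image, the spatial size after each convolution can be worked out with the standard formula out = floor((in + 2*pad - ksize) / stride) + 1. The sketch below assumes the positional arguments above are read as ksize=3, stride=3, pad=1 and that the input is a 32x32 image; `conv_out_size` is a helper written for this example, not a Chainer function.

```python
def conv_out_size(size, ksize, stride, pad):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * pad - ksize) // stride + 1


size = 32  # CIFAR10 images are 32x32 pixels
sizes = []
for _ in range(3):  # conv1, conv2, conv3
    size = conv_out_size(size, ksize=3, stride=3, pad=1)
    sizes.append(size)

# conv3 has 128 output channels, so fc4 receives this many values per image.
flat = 128 * size * size
print(sizes, flat)
```

Under these assumptions the feature maps shrink from 32 to 11, 4, and finally 2 pixels per side, so the first fully connected layer sees 512 inputs, which is the value that `L.Linear(None, 1000)` infers on the first forward pass.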

Training Phase

Let’s create a ‘train’ function that we can use to quickly train other models in the future. This function accepts a model object and trains it to categorize the 10 CIFAR10 classes before returning the trained model. This train function will be used to train the MyModel network described previously.

We need to construct batches of our dataset before we can train. Chainer already provides an Iterator class and various subclasses that can be used for this purpose, and users can easily create their own as well.

In this example, we’ll use the SerialIterator Iterator subclass. The SerialIterator can either return the examples in the same order as they occur in the dataset (that is, sequentially) or it can shuffle the instances and return them in random order.
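The batching behaviour described here can be illustrated with a small stand-in written in plain Python. `SimpleSerialIterator` is a hypothetical class for this sketch and is deliberately simpler than Chainer's own SerialIterator (for example, it does not top up the last batch of an epoch from the next one); it only shows the difference between sequential and shuffled order and between repeating and stopping after one pass.

```python
import random


class SimpleSerialIterator:
    """Illustrative stand-in for a batching iterator, not Chainer's implementation."""

    def __init__(self, dataset, batch_size, repeat=True, shuffle=True, seed=0):
        self.dataset = dataset
        self.batch_size = batch_size
        self.repeat = repeat
        self.shuffle = shuffle
        self._rng = random.Random(seed)
        self._new_order()

    def _new_order(self):
        # Start a new pass over the data, reshuffling if requested.
        self._order = list(range(len(self.dataset)))
        if self.shuffle:
            self._rng.shuffle(self._order)
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._order):
            if not self.repeat:
                raise StopIteration
            self._new_order()
        idx = self._order[self._pos:self._pos + self.batch_size]
        self._pos += self.batch_size
        return [self.dataset[i] for i in idx]


data = list(range(10))

# Evaluation-style iterator: one sequential pass, then stop.
test_batches = list(SimpleSerialIterator(data, 4, repeat=False, shuffle=False))

# Training-style iterator: shuffled, and it keeps going past the first epoch.
train_iter = SimpleSerialIterator(data, 4)
first_epoch = [next(train_iter) for _ in range(3)]
print(test_batches, first_epoch)
```

The non-repeating, unshuffled configuration is the one that suits evaluation, because every test example is then seen exactly once.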

The complete training loop can be written as follows:

import chainer.links as L
from chainer import iterators
from chainer import optimizers
from chainer import training
from chainer.datasets import cifar
from chainer.training import extensions


def train(model_object_, batch_size=64, gpu_id=0, max_epoch=20):
    # 1. Get the dataset
    train_set, test_set = cifar.get_cifar10()

    # 2. Create serial iterators for the training and test data.
    #    The test iterator makes a single, unshuffled pass.
    train_iter_set = iterators.SerialIterator(train_set, batch_size)
    test_iter_set = iterators.SerialIterator(test_set, batch_size, repeat=False, shuffle=False)

    # 3. Wrap the network in Classifier, which by default adds a
    #    softmax cross-entropy loss and an accuracy metric
    model_ = L.Classifier(model_object_)
    if gpu_id >= 0:
        model_.to_gpu(gpu_id)

    # 4. Set up the optimizer and attach it to the model
    opti_ = optimizers.Adam()
    opti_.setup(model_)

    # 5. The updater runs one optimization step per batch
    updater_ = training.StandardUpdater(train_iter_set, opti_, device=gpu_id)

    # 6. The trainer drives the training loop for max_epoch epochs
    trainer = training.Trainer(updater_, (max_epoch, 'epoch'), out='{}_cifar10_result'.format(model_object_.__class__.__name__))

    # 7. Evaluate on the test set and report/plot the results.
    #    LogReport collects the values that PrintReport and PlotReport display.
    trainer.extend(extensions.LogReport())
    trainer.extend(extensions.Evaluator(test_iter_set, model_, device=gpu_id))
    trainer.extend(extensions.PrintReport(['epoch', 'main/loss', 'main/accuracy', 'validation/main/loss', 'validation/main/accuracy', 'elapsed_time']))
    trainer.extend(extensions.PlotReport(['main/loss', 'validation/main/loss'], x_key='epoch', file_name='loss.png'))
    trainer.extend(extensions.PlotReport(['main/accuracy', 'validation/main/accuracy'], x_key='epoch', file_name='accuracy.png'))

    # 8. Run the training loop
    trainer.run()
    del trainer
    return model_


gpu_id = 0  # Set to -1 if you don't have a GPU
model = train(MyModel(10), gpu_id=gpu_id)

We trained the network for 20 epochs; the loss and accuracy curves are saved as loss.png and accuracy.png in the output directory.

There is a large gap between the training accuracy and the validation accuracy, which suggests that the model has overfitted the training data.

Let’s examine how well our CNN performs as we add more layers to it. We’ll also make our model modular by writing it as a combination of three chains, which improves readability and reduces code duplication:

  • ConvBlock: a single convolutional neural net.
  • LinearBlock: a single fully connected neural net.
  • A full model made by chaining together many of these two blocks.

class ConvBlock(chainer.Chain):
    def __init__(self, n_ch, pool_drop=False):
        w = chainer.initializers.HeNormal()
        super(ConvBlock, self).__init__()
        with self.init_scope():
            # 3x3 convolution, stride 1, pad 1: the spatial size is preserved.
            self.conv = L.Convolution2D(None, n_ch, 3, 1, 1,
                                        nobias=True, initialW=w)
            self.bn = L.BatchNormalization(n_ch)
        self.pool_drop = pool_drop

    def __call__(self, x):
        h = F.relu(self.bn(self.conv(x)))
        if self.pool_drop:
            # Halve the spatial size, then apply dropout.
            h = F.max_pooling_2d(h, 2, 2)
            h = F.dropout(h, ratio=0.25)
        return h


class LinearBlock(chainer.Chain):
    def __init__(self):
        w = chainer.initializers.HeNormal()
        super(LinearBlock, self).__init__()
        with self.init_scope():
            self.fc = L.Linear(None, 1024, initialW=w)

    def __call__(self, x):
        # Fully connected layer followed by ReLU and dropout.
        return F.dropout(F.relu(self.fc(x)), ratio=0.5)

ConvBlock is defined as a subclass of Chain. It has a single convolution layer and a batch normalization layer, both registered in the constructor. The __call__ method receives the data, applies the convolution and batch normalization, and then applies the ReLU activation. If pool_drop is set to True, max pooling and dropout are applied as well.

Let’s now stack the component blocks to define the deeper CNN network.

class DeepCNN(chainer.ChainList):
    def __init__(self, n_output):
        super(DeepCNN, self).__init__(
            ConvBlock(64, True),
            ConvBlock(128, True),
            ConvBlock(256, True),
            L.Linear(None, n_output)
        )

    def __call__(self, x):
        # Apply the registered blocks in the order they were added.
        for f in self.children():
            x = f(x)
        return x

Now that the deeper CNN is defined, let’s train it and observe the accuracy and loss.

gpu_id = 0  # Set to -1 if you don't have a GPU
model = train(DeepCNN(10), gpu_id=gpu_id)

Compared with the earlier, smaller CNN, the accuracy on the test set has improved considerably: it was around 61 percent before and is now over 86 percent. To boost accuracy even further, we could not only improve the model’s architecture but also enlarge the training data (data augmentation) or combine several models (the ensemble method).
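As a small illustration of the ensemble idea, one common variant averages the class probabilities predicted by several models and picks the class with the highest mean. The sketch below is plain Python and independent of Chainer; `ensemble_predict` is a hypothetical helper, and the probabilities are made-up numbers for a single image with three classes, not results from the models trained above.

```python
def ensemble_predict(prob_lists):
    """Average per-class probabilities across models and return (label, mean)."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    mean = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    label = max(range(n_classes), key=mean.__getitem__)
    return label, mean


# Made-up class probabilities from three hypothetical models for one image.
probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
]
label, mean = ensemble_predict(probs)
print(label, mean)
```

In this toy case the first model alone would choose class 0, while the averaged prediction chooses class 1, which is how an ensemble can smooth out the mistakes of individual models.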

Final Words 

In this post we looked at Chainer, a deep learning framework that lets us build both simple and complex DL applications. Its support is broad and deep: Chainer is used for most current neural network approaches (CNN, RNN, RL, etc.), adds new approaches as they appear, and supports a wide range of hardware, including parallelization across several GPUs. We only covered a simple use case, training a CNN on the CIFAR10 dataset; more advanced, state-of-the-art applications can be implemented as well.


Vijaysinh Lendave
Vijaysinh is an enthusiast in machine learning and deep learning. He is skilled in ML algorithms, data manipulation, handling and visualization, model building.
