Implementing neural networks requires a variety of specialized building blocks, such as multidimensional arrays, activation functions, and automatic differentiation. Frameworks such as TensorFlow, PyTorch, and Theano spare us the complexity of developing these pieces from scratch. In this post, we will look at a framework called Chainer and see how its approach differs from, and improves on, that of traditional frameworks. We will then put this into practice by using Chainer in Python.
This post focuses on the following two points:
- Traditional Frameworks Vs Chainer Framework
- Implementing Chainer Framework in Python
Let us begin with the difference between the approach of existing frameworks and Chainer’s approach.
Traditional Frameworks Vs Chainer Framework
In typical neural network frameworks, models are built in two phases, using a Define-and-Run scheme as shown in Figure 1(a). During the Define phase, the model’s computational graph is constructed. This stage generates a neural network object from a model definition that includes the inter-layer data flow graph, initial weights, and activation functions.

The computations for both the forward and backward passes are commonly defined in this phase using automatic differentiation, with optional graph optimizations. The actual forward and backward computation of the graph is performed in the Run phase. In this phase, the model is trained by minimizing the loss function with an optimization algorithm such as stochastic gradient descent, given a set of training examples.

Fig 1 (Source)
For static models like CNNs, the Define-and-Run paradigm works well, since having the whole computational graph available up front allows graph optimizations that improve memory efficiency and/or runtime performance. However, it has two key problems when applied to other types of neural network models.
- The first is that supporting general dynamic graphs, such as neural networks with control flow, is cumbersome. In frameworks such as TensorFlow, control flow decisions are defined in the data flow graph using special operators such as Switch and Merge rather than the host language’s control flow syntax.
- The second is that the user does not have access to the neural network’s inner workings under the Define-and-Run paradigm. This creates a variety of difficulties in constructing an appropriate model. For instance, in order to debug and tune a model effectively, the user needs to see what is going on inside it, which is hard when the whole model is wrapped in one large graph object.
Chainer, on the other hand, uses a “Define-by-Run” approach, shown in Figure 1(b), in which the network is defined dynamically by the actual forward computation. Rather than storing programming logic, Chainer stores the history of computation. This lets us use Python’s own control flow to the full: adding conditionals and loops to a network definition requires no special operators. The Define-by-Run scheme is Chainer’s core concept.
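To make the Define-by-Run idea concrete, here is a minimal sketch in plain Python. It is not Chainer's implementation: the `Var` class and the `forward` function are hypothetical names invented for this illustration, and the sketch handles only scalar addition and multiplication. Each operation records its inputs as it runs, so an ordinary Python loop and `if` statement decide the shape of the graph.

```python
class Var:
    """Hypothetical scalar variable that records how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each entry is (parent_var, local_gradient_wrt_that_parent).
        self.parents = parents

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self, upstream=1.0):
        # Walk the recorded history backwards, applying the chain rule
        # along every path from the output to each input.
        self.grad += upstream
        for parent, local_grad in self.parents:
            parent.backward(upstream * local_grad)


def forward(x, n_steps):
    # Ordinary Python control flow decides which operations are recorded.
    h = x
    for _ in range(n_steps):
        if h.value < 10:
            h = h * x  # multiply while the value is small
        else:
            h = h + x  # switch to addition once it has grown
    return h


x = Var(2.0)
y = forward(x, 3)  # records x * x * x * x
y.backward()
print(y.value, x.grad)  # 16.0 32.0
```

With `n_steps=3` the recorded history is x⁴, so the gradient at x = 2 is 4·2³ = 32. Changing `n_steps` or the input value changes which branch runs, and therefore which graph is differentiated, without any special graph operators.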
Implementing Chainer Framework in Python
In this section, we will implement a CNN-based image classifier on the CIFAR10 dataset and explore some of the functionality Chainer provides.
Create model
Our model is defined as a subclass of Chain. It consists of three convolutional layers followed by two fully connected layers, which still makes it a modest CNN. In Chainer, each layer of a neural network is one of two kinds of functions (really, function objects): ‘Link’ and ‘Function’.
- A function with no learnable parameters is called a Function.
- A function with (learnable) parameters is called a Link.
Chainer’s Link can be thought of as a wrapper around a Function that supplies it with parameters. That is, when a Link is called, it also calls the associated Function.
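The relationship between the two can be sketched in plain Python. The names below (`linear_function`, `relu_function`, `LinearLink`) are hypothetical stand-ins written for this illustration, not Chainer classes; in Chainer itself the corresponding pair would be something like `F.linear` and `L.Linear`.

```python
def linear_function(x, W, b):
    """Stateless 'Function': every input, including the weights, is passed in."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def relu_function(x):
    """Another stateless 'Function'; it has nothing to learn."""
    return [max(0.0, v) for v in x]


class LinearLink:
    """Hypothetical 'Link': owns W and b and delegates to the function."""

    def __init__(self, W, b):
        self.W = W  # learnable parameters live on the link
        self.b = b

    def __call__(self, x):
        # Calling the link calls the associated function with its parameters.
        return linear_function(x, self.W, self.b)


layer = LinearLink(W=[[1.0, -1.0], [0.5, 0.5]], b=[0.0, -2.0])
out = relu_function(layer([3.0, 1.0]))
print(out)  # [2.0, 0.0]
```

The caller of `layer` never handles the weights directly, which is what lets an optimizer later find and update every parameter registered on the links of a model.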
To describe a model, we then write the code that performs the “forward pass” computation. This code calls various links and chains (recall that Link and Chain are callable objects). Chainer handles the “backward pass” automatically, so we do not need to worry about it unless we wish to write custom functions.
    import chainer
    import chainer.functions as F
    import chainer.links as L


    class MyModel(chainer.Chain):

        def __init__(self, n_out):
            super(MyModel, self).__init__()
            with self.init_scope():
                # Convolution2D(in_channels, out_channels, ksize, stride, pad);
                # None defers the input size to the first forward pass.
                self.conv1 = L.Convolution2D(None, 32, 3, 3, 1)
                self.conv2 = L.Convolution2D(32, 64, 3, 3, 1)
                self.conv3 = L.Convolution2D(64, 128, 3, 3, 1)
                self.fc4 = L.Linear(None, 1000)
                self.fc5 = L.Linear(1000, n_out)

        def __call__(self, x):
            # Forward pass: three conv + ReLU stages, then two linear layers.
            h = F.relu(self.conv1(x))
            h = F.relu(self.conv2(h))
            h = F.relu(self.conv3(h))
            h = F.relu(self.fc4(h))
            h = self.fc5(h)
            return h
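Because the first linear layer is created with `None` as its input size, the code never spells out how many features reach `fc4`. The short calculation below works it out for 32x32 CIFAR10 images. It is a sketch that assumes the standard floor-based convolution size formula with no dilation; `conv_out_size` is a hypothetical helper written for this example, not a Chainer function.

```python
def conv_out_size(size, ksize, stride, pad):
    """Output length of a convolution along one spatial axis (floor formula)."""
    return (size + 2 * pad - ksize) // stride + 1


size = 32  # CIFAR10 images are 32x32 pixels
sizes = []
for _ in range(3):  # conv1, conv2 and conv3 all use ksize=3, stride=3, pad=1
    size = conv_out_size(size, ksize=3, stride=3, pad=1)
    sizes.append(size)

fc4_inputs = 128 * size * size  # channels * height * width after conv3
print(sizes, fc4_inputs)  # [11, 4, 2] 512
```

The stride of 3 shrinks the feature map quickly (32, then 11, 4, 2), so only 512 values are flattened into the first fully connected layer.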
Training Phase
Let’s create a ‘train’ function that we can use to quickly train other models in the future. This function accepts a model object and trains it to categorize the 10 CIFAR10 classes before returning the trained model. This train function will be used to train the MyModel network described previously.
We need to construct batches of our dataset before we can train. Chainer already provides an Iterator class and various subclasses that can be used for this purpose, and users can easily create their own as well.
In this example, we’ll use the SerialIterator Iterator subclass. The SerialIterator can either return the examples in the same order as they occur in the dataset (that is, sequentially) or it can shuffle the instances and return them in random order.
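The batching behaviour described here can be sketched in a few lines of plain Python. The generator below is a hypothetical illustration of the idea, not Chainer's `SerialIterator`: it covers a single pass over the data, whereas the real class also tracks epochs and can repeat indefinitely. The `seed` parameter is an assumption added only to make the shuffled order reproducible.

```python
import random


def serial_batches(dataset, batch_size, shuffle=True, seed=None):
    """Yield one epoch of mini-batches from an indexable dataset."""
    order = list(range(len(dataset)))
    if shuffle:
        # Visit the examples in random order instead of dataset order.
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [dataset[i] for i in order[start:start + batch_size]]


data = list(range(10))
sequential = list(serial_batches(data, batch_size=4, shuffle=False))
shuffled = list(serial_batches(data, batch_size=4, shuffle=True, seed=0))
print(sequential)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Shuffling is usually wanted for training batches, while evaluation data can be read sequentially and only once.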
The complete training function can be written as follows:
    import chainer.links as L
    from chainer import iterators, optimizers, training
    from chainer.datasets import cifar
    from chainer.training import extensions


    def train(model_object_, batch_size=64, gpu_id=0, max_epoch=20):
        # 1. Get the dataset
        train_set, test_set = cifar.get_cifar10()

        # 2. Create serial iterators for the train and test data
        train_iter_set = iterators.SerialIterator(train_set, batch_size)
        test_iter_set = iterators.SerialIterator(
            test_set, batch_size, repeat=False, shuffle=False)

        # 3. Wrap the model in Chainer's Classifier link (adds loss and accuracy)
        model_ = L.Classifier(model_object_)
        if gpu_id >= 0:
            model_.to_gpu(gpu_id)

        # 4. Set up the optimizer
        opti_ = optimizers.Adam()
        opti_.setup(model_)

        # 5. Create the updater that applies the weight updates
        updater_ = training.StandardUpdater(train_iter_set, opti_, device=gpu_id)

        # 6. Set up the trainer
        trainer = training.Trainer(
            updater_, (max_epoch, 'epoch'),
            out='{}_cifar10_result'.format(model_object_.__class__.__name__))

        # 7. Evaluate the network on the test set
        class Test_Mode_Evaluator(extensions.Evaluator):
            def evaluate(self):
                # Delegates to the stock Evaluator.
                return super(Test_Mode_Evaluator, self).evaluate()

        trainer.extend(extensions.LogReport())
        trainer.extend(Test_Mode_Evaluator(test_iter_set, model_, device=gpu_id))
        trainer.extend(extensions.PrintReport(
            ['epoch', 'main/loss', 'main/accuracy',
             'validation/main/loss', 'validation/main/accuracy', 'elapsed_time']))
        trainer.extend(extensions.PlotReport(
            ['main/loss', 'validation/main/loss'],
            x_key='epoch', file_name='loss.png'))
        trainer.extend(extensions.PlotReport(
            ['main/accuracy', 'validation/main/accuracy'],
            x_key='epoch', file_name='accuracy.png'))
        trainer.run()
        del trainer
        return model_


    gpu_id = 0  # Set to -1 if you don't have a GPU
    model = train(MyModel(10), gpu_id=gpu_id)
We trained the network for 20 epochs; the resulting loss and accuracy are shown below.

As you can see, there is a large gap between the training accuracy and the validation accuracy, which suggests that the model has overfitted the training set.
Let’s examine how well our CNN performs as we add more layers to it. We’ll also make the model modular by writing it as a combination of three chains, which improves readability and reduces code duplication:
- ConvBlock: a single convolutional neural net block
- LinearBlock: a single fully connected neural net block
- DeepCNN: a full model made by chaining together many of these two blocks
    class ConvBlock(chainer.Chain):

        def __init__(self, n_ch, pool_drop=False):
            w = chainer.initializers.HeNormal()
            super(ConvBlock, self).__init__()
            with self.init_scope():
                self.conv = L.Convolution2D(None, n_ch, 3, 1, 1,
                                            nobias=True, initialW=w)
                self.bn = L.BatchNormalization(n_ch)
            self.pool_drop = pool_drop

        def __call__(self, x):
            h = F.relu(self.bn(self.conv(x)))
            if self.pool_drop:
                h = F.max_pooling_2d(h, 2, 2)
                h = F.dropout(h, ratio=0.25)
            return h


    class LinearBlock(chainer.Chain):

        def __init__(self):
            w = chainer.initializers.HeNormal()
            super(LinearBlock, self).__init__()
            with self.init_scope():
                self.fc = L.Linear(None, 1024, initialW=w)

        def __call__(self, x):
            return F.dropout(F.relu(self.fc(x)), ratio=0.5)
ConvBlock is defined as a subclass of Chain. It has a single convolution layer and a Batch Normalization layer, both registered in the constructor. The __call__ method receives the data, applies the convolution and batch normalization, and passes the result through the ReLU activation function. If pool_drop is set to True, max pooling and dropout are applied as well.
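The dropout step used in both blocks can be illustrated with a short sketch. The function below is a hypothetical plain-Python stand-in, not Chainer's `F.dropout`. It assumes the common "inverted dropout" behaviour: during training each value is dropped with probability `ratio` and the survivors are rescaled by 1 / (1 - ratio), while at evaluation time the values pass through unchanged. The `train` and `rng` parameters are assumptions made for this example.

```python
import random


def dropout(values, ratio, train=True, rng=random):
    """Inverted dropout on a list of floats (illustrative only)."""
    if not train:
        return list(values)  # evaluation: pass the values through unchanged
    scale = 1.0 / (1.0 - ratio)
    # Keep each element with probability (1 - ratio) and rescale it so the
    # expected value of the output matches the input.
    return [v * scale if rng.random() >= ratio else 0.0 for v in values]


h = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(h, ratio=0.25, rng=random.Random(0))
eval_out = dropout(h, ratio=0.25, train=False)
print(train_out, eval_out)
```

Every element of `train_out` is either zero or the input divided by 0.75, which is why no extra scaling is needed when the model is later evaluated.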
Let’s now stack the component blocks to define the deeper CNN network.
    class DeepCNN(chainer.ChainList):

        def __init__(self, n_output):
            super(DeepCNN, self).__init__(
                ConvBlock(64),
                ConvBlock(64, True),
                ConvBlock(128),
                ConvBlock(128, True),
                ConvBlock(256),
                ConvBlock(256, True),
                LinearBlock(),
                LinearBlock(),
                L.Linear(None, n_output)
            )

        def __call__(self, x):
            # Apply the registered child links in the order they were added.
            for f in self.children():
                x = f(x)
            return x
Now that the deeper CNN is defined, let’s train it and observe the accuracy and loss.
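To see what the stacked blocks do to a 32x32 CIFAR10 image, the sketch below traces the spatial size through the six ConvBlock entries of the DeepCNN constructor. It is an illustration under stated assumptions: `out_size` is a hypothetical helper using the floor-based size formula, which is applied to the 2x2 pooling as well (the sizes here are all even, so the rounding mode does not matter).

```python
def out_size(size, ksize, stride, pad):
    """Output length along one spatial axis (floor formula)."""
    return (size + 2 * pad - ksize) // stride + 1


# (output channels, pool_drop) for each ConvBlock, in constructor order
blocks = [(64, False), (64, True), (128, False),
          (128, True), (256, False), (256, True)]

size = 32
trace = []
for n_ch, pool_drop in blocks:
    size = out_size(size, ksize=3, stride=1, pad=1)  # 3x3 conv keeps the size
    if pool_drop:
        size = out_size(size, ksize=2, stride=2, pad=0)  # 2x2 pooling halves it
    trace.append((n_ch, size))

flat_features = trace[-1][0] * size * size
print(trace)
print(flat_features)  # 4096
```

Only the three pooling blocks reduce the resolution (32, 16, 8, 4), so the first LinearBlock receives 256 x 4 x 4 = 4096 features, compared with 512 in the smaller model.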
    gpu_id = 0  # Set to -1 if you don't have a GPU
    model = train(DeepCNN(10), gpu_id=gpu_id)

Compared with the previous, smaller CNN, the accuracy on the test set has improved considerably: it was around 61 percent before and is now over 86 percent. To boost accuracy even further, we should not only improve the model’s layers but also increase the training data (data augmentation) or combine several models (ensemble method).
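As a rough illustration of the ensemble idea, the sketch below averages the class probabilities of several models and picks the most likely class. It is a generic plain-Python example, not code from this tutorial: `softmax`, `ensemble_predict`, and the three-class logits are hypothetical, made-up values rather than outputs of the networks trained above.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(logits_per_model):
    """Average the per-model probabilities and return (label, averaged probs)."""
    probs = [softmax(logits) for logits in logits_per_model]
    n_models = len(probs)
    avg = [sum(p[c] for p in probs) / n_models for c in range(len(probs[0]))]
    label = max(range(len(avg)), key=avg.__getitem__)
    return label, avg


# Made-up logits from three hypothetical models for a single image
logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3], [0.2, 2.2, 0.1]]
label, avg_probs = ensemble_predict(logits)
print(label)  # 1
```

Here the first model alone would choose class 0, but the averaged probabilities favour class 1, which is how an ensemble can correct the mistakes of an individual model.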
Final Words
In this post, we have seen the Chainer deep learning framework, which enables us to build both simple and complex DL applications. Chainer offers broad and deep support: it is actively used for most current neural net approaches (CNN, RNN, RL, etc.), adds new approaches quickly as they are developed, and supports a wide range of hardware as well as parallelization across multiple GPUs. We only covered a simple use case, training a CNN on the CIFAR10 dataset; far more advanced, state-of-the-art applications can be implemented as well.