# A Guide to Chainer: A Flexible Toolkit For Neural Networks

Implementing neural networks requires a variety of specialized building blocks, such as multidimensional arrays, activation functions, and automatic differentiation. Frameworks such as TensorFlow, PyTorch, and Theano exist so that we do not have to build these pieces from scratch. In this post, we look at a framework called Chainer and at how its approach improves on that of traditional frameworks. We then put this into practice by building and training models with Chainer in Python.

This post focuses on the following two points:

1. Traditional Frameworks Vs Chainer Framework
2. Implementing Chainer Framework in Python

Let us begin with the difference between the approach taken by traditional frameworks and the approach taken by Chainer.

#### Traditional Frameworks Vs Chainer Framework

In typical neural network frameworks, models are built in two phases, using the Define-and-Run scheme shown in Figure 1(a). In the Define phase, the model’s computational graph is constructed. This stage instantiates a neural network object from a model definition that specifies the inter-layer data flow graph, initial weights, and activation functions.

Also in the Define phase, the computations for both the forward and backward passes are defined using automatic differentiation, with optional graph optimizations. The actual forward and backward computation of the graph is carried out in the Run phase. In this phase, given a set of training examples, the model is trained by minimizing the loss function with an optimization algorithm such as stochastic gradient descent.

Fig 1: (a) Define-and-Run and (b) Define-by-Run (Source)

For static models like CNNs, the Define-and-Run paradigm works well, since having the whole computational graph available allows for graph optimizations that improve memory efficiency and/or runtime performance. However, there are two key issues when applying it to other types of NN models.

1. The first is that supporting general dynamic graphs, such as neural networks with control flow, can be cumbersome. In frameworks such as TensorFlow, control flow decisions are expressed in the data flow graph using special operators such as Switch and Merge rather than the host language’s control flow syntax.
2. The second is that under the Define-and-Run paradigm the user has little access to the inner workings of the neural network, which makes it harder to build a suitable model. For instance, in order to debug and tune a model effectively, a user needs to be able to see what is happening inside the network.

Chainer, on the other hand, uses the “Define-by-Run” approach shown in Figure 1(b), in which the network is defined dynamically by the actual forward computation. Rather than storing programming logic, Chainer stores the history of the computation. This lets us use Python’s own programming constructs to the full: for example, conditionals and loops can be added to a network definition without any special operators. The Define-by-Run scheme is the key idea behind Chainer.
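As a rough illustration of the idea, the toy sketch below records the history of a computation while ordinary Python code runs, then walks that history backwards to obtain gradients. It is not Chainer’s implementation: the `Var` class and the `forward` function are hypothetical names invented for this example, and only scalar addition and multiplication are supported.

```python
# Toy illustration of Define-by-Run; this is NOT how Chainer is implemented.
# `Var` is a hypothetical scalar type that remembers how it was computed.

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (input Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self, upstream=1.0):
        # Walk the recorded history backwards, accumulating gradients
        # along every path (simple, but not efficient for large graphs).
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)


def forward(x, w, n_steps):
    # Plain Python control flow decides which operations get recorded.
    h = x
    for _ in range(n_steps):
        if h.value < 100.0:
            h = h * w
        else:
            h = h + w
    return h


x, w = Var(2.0), Var(3.0)
y = forward(x, w, n_steps=3)  # records y = x * w * w * w
y.backward()
print(y.value, x.grad, w.grad)  # 54.0 27.0 54.0
```

The graph here exists only because the loop and the `if` actually ran, so a different input can produce a different graph without any change to the model code.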

#### Implementing Chainer Framework in Python

In this section, we implement a CNN-based image classifier for the CIFAR10 dataset and explore some of the functionality that Chainer provides along the way.

###### Create model

Our model is defined as a Chain subclass. It is still a modest CNN, with three convolutional layers followed by two fully connected layers. In Chainer, each layer of a neural network is one of two kinds of functions (strictly speaking, function objects): ‘Link’ and ‘Function’.

• A function with no learnable parameters is called a Function.
• A function with (learnable) parameters is called a Link.

A Link in Chainer can be thought of as a wrapper around a Function that holds the parameters and supplies them to it. That is, when a Link is called, it in turn calls the associated Function.
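The distinction can be sketched in plain Python. The names below (`relu`, `linear`, `ToyLinear`) are hypothetical stand-ins written for this example, not Chainer classes; they only mirror the relationship between a parameter-free function and a link that owns parameters.

```python
# Hypothetical stand-ins for illustration; these are not Chainer classes.

def relu(xs):
    """A 'function': no learnable parameters, output depends only on input."""
    return [max(0.0, x) for x in xs]


def linear(xs, W, b):
    """A parameter-free function that receives the weights as arguments."""
    return [sum(w * x for w, x in zip(row, xs)) + bias
            for row, bias in zip(W, b)]


class ToyLinear:
    """A 'link': owns W and b and passes them to `linear` when called."""

    def __init__(self, W, b):
        self.W = W
        self.b = b

    def __call__(self, xs):
        return linear(xs, self.W, self.b)


layer = ToyLinear(W=[[1.0, -1.0], [0.5, 0.5]], b=[0.0, -2.0])
out = relu(layer([2.0, 3.0]))
print(out)  # [0.0, 0.5]
```

Because the parameters live on the link object, an optimizer has a single place to find and update them, while functions such as `relu` stay stateless.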

To describe a model, we then write the code that performs the “forward pass” computation. This code calls the various links and chains (recall that Link and Chain are callable objects). Chainer handles the “backward pass” automatically, so we don’t have to worry about it unless we want to write our own custom functions.

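    import chainer
    import chainer.functions as F
    import chainer.links as L


    class MyModel(chainer.Chain):

        def __init__(self, n_out):
            super(MyModel, self).__init__()
            with self.init_scope():
                # Convolution2D arguments: in_channels, out_channels,
                # ksize, stride, pad. None defers the input size until
                # the first forward pass.
                self.conv1 = L.Convolution2D(None, 32, 3, 3, 1)
                self.conv2 = L.Convolution2D(32, 64, 3, 3, 1)
                self.conv3 = L.Convolution2D(64, 128, 3, 3, 1)
                self.fc4 = L.Linear(None, 1000)
                self.fc5 = L.Linear(1000, n_out)

        def __call__(self, x):
            # Forward pass: three conv layers, then two fully connected layers.
            h = F.relu(self.conv1(x))
            h = F.relu(self.conv2(h))
            h = F.relu(self.conv3(h))
            h = F.relu(self.fc4(h))
            h = self.fc5(h)  # raw class scores, no activation
            return h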
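To see what these layer arguments imply for CIFAR10’s 32x32 images, the sketch below traces the spatial size through the three convolutions. It assumes the standard convolution output-size formula and reads `L.Convolution2D(None, 32, 3, 3, 1)` as kernel size 3, stride 3, and padding 1; `conv_out_size` is a helper written for this example, not a Chainer function.

```python
def conv_out_size(size, ksize, stride, pad):
    """Spatial output size of a convolution along one axis (standard formula)."""
    return (size + 2 * pad - ksize) // stride + 1


size = 32  # CIFAR10 images are 32x32 pixels
sizes = []
for _ in range(3):  # conv1, conv2, conv3 all use ksize=3, stride=3, pad=1
    size = conv_out_size(size, ksize=3, stride=3, pad=1)
    sizes.append(size)

n_features = 128 * size * size  # channels * height * width fed into fc4
print(sizes, n_features)  # [11, 4, 2] 512
```

Under these assumptions, the first fully connected layer receives 512 input features. Passing `None` as the input size of `L.Linear` means we do not have to work this number out by hand.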

###### Training Phase

Let’s create a ‘train’ function that we can use to quickly train other models in the future. This function accepts a model object and trains it to categorize the 10 CIFAR10 classes before returning the trained model. This train function will be used to train the MyModel network described previously.

We need to construct batches of our dataset before we can train. Chainer already provides an Iterator class and various subclasses that can be used for this purpose, and users can easily create their own as well.

In this example, we’ll use the SerialIterator Iterator subclass. The SerialIterator can either return the examples in the same order as they occur in the dataset (that is, sequentially) or it can shuffle the instances and return them in random order.
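The behaviour described above can be sketched in a few lines of plain Python. `serial_batches` is a hypothetical helper written for this example, not Chainer’s class, and it yields a single pass over the data (comparable to running without repetition), either in dataset order or in shuffled order.

```python
import random


def serial_batches(dataset, batch_size, shuffle=True, seed=None):
    """Yield one epoch of batches (hypothetical helper, not Chainer's class)."""
    order = list(range(len(dataset)))
    if shuffle:
        # Shuffle the indices, not the dataset itself.
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [dataset[i] for i in order[start:start + batch_size]]


data = list(range(10))
in_order = list(serial_batches(data, batch_size=4, shuffle=False))
shuffled = list(serial_batches(data, batch_size=4, shuffle=True, seed=0))
print(in_order)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Training data is usually shuffled, whereas the test iterator is typically kept in order because shuffling has no effect on the evaluation result.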

The complete training loop can be coded as follows:

    import chainer.links as L
    from chainer import iterators
    from chainer import optimizers
    from chainer import training
    from chainer.datasets import cifar
    from chainer.training import extensions


    def train(model_object_, batch_size=64, gpu_id=0, max_epoch=20):

        # 1. Get the dataset
        train_set, test_set = cifar.get_cifar10()

        # 2. Create serial iterators for the training and test data
        train_iter_set = iterators.SerialIterator(train_set, batch_size)
        test_iter_set = iterators.SerialIterator(
            test_set, batch_size, repeat=False, shuffle=False)

        # 3. Wrap the model in Chainer's Classifier link,
        #    which computes the loss and accuracy
        model_ = L.Classifier(model_object_)
        if gpu_id >= 0:
            model_.to_gpu(gpu_id)

        # 4. Set up the optimizer (Adam is an assumption here;
        #    swap in another optimizer if preferred)
        opti_ = optimizers.Adam()
        opti_.setup(model_)

        # 5. Create the updater that applies the weight updates
        updater_ = training.StandardUpdater(train_iter_set, opti_, device=gpu_id)

        # 6. Set up the trainer
        trainer = training.Trainer(
            updater_, (max_epoch, 'epoch'),
            out='{}_cifar10_result'.format(model_object_.__class__.__name__))

        # 7. Evaluate the network on the test set
        class Test_Mode_Evaluator(extensions.Evaluator):
            # Adds no behavior of its own; it delegates to Evaluator.evaluate().

            def evaluate(self):
                ret = super(Test_Mode_Evaluator, self).evaluate()
                return ret

        # Log, evaluate, print, and plot once per epoch
        trainer.extend(extensions.LogReport())
        trainer.extend(Test_Mode_Evaluator(test_iter_set, model_, device=gpu_id))
        trainer.extend(extensions.PrintReport(
            ['epoch', 'main/loss', 'main/accuracy',
             'validation/main/loss', 'validation/main/accuracy', 'elapsed_time']))
        trainer.extend(extensions.PlotReport(
            ['main/loss', 'validation/main/loss'],
            x_key='epoch', file_name='loss.png'))
        trainer.extend(extensions.PlotReport(
            ['main/accuracy', 'validation/main/accuracy'],
            x_key='epoch', file_name='accuracy.png'))
        trainer.run()
        del trainer

        return model_


    gpu_id = 0  # Set to -1 if you don't have a GPU

    model = train(MyModel(10), gpu_id=gpu_id)


We trained the network for 20 epochs; the resulting loss and accuracy are shown below (the PlotReport extensions also save the curves as loss.png and accuracy.png in the output directory).

As you can see, there is a large gap between the training and validation accuracies, which suggests that the model has overfitted the training data.

Let’s see how our CNN performs as we add more layers to it. We’ll also make the model modular by writing it as a combination of three chains, which improves readability and reduces code duplication:

• ConvBlock: a single convolutional block.
• LinearBlock: a single fully connected block.
• DeepCNN: the full model, made by chaining together several of these two blocks.

    class ConvBlock(chainer.Chain):

        def __init__(self, n_ch, pool_drop=False):
            w = chainer.initializers.HeNormal()
            super(ConvBlock, self).__init__()
            with self.init_scope():
                # 3x3 convolution, stride 1, pad 1: keeps the spatial size
                self.conv = L.Convolution2D(None, n_ch, 3, 1, 1,
                                            nobias=True, initialW=w)
                self.bn = L.BatchNormalization(n_ch)
            self.pool_drop = pool_drop

        def __call__(self, x):
            h = F.relu(self.bn(self.conv(x)))
            if self.pool_drop:
                # Halve the spatial size, then apply dropout
                h = F.max_pooling_2d(h, 2, 2)
                h = F.dropout(h, ratio=0.25)
            return h


    class LinearBlock(chainer.Chain):

        def __init__(self):
            w = chainer.initializers.HeNormal()
            super(LinearBlock, self).__init__()
            with self.init_scope():
                self.fc = L.Linear(None, 1024, initialW=w)

        def __call__(self, x):
            return F.dropout(F.relu(self.fc(x)), ratio=0.5)


ConvBlock is defined as a subclass of Chain. It holds a single convolution layer and a batch normalization layer, both registered in the constructor. The __call__ method receives the data and applies the convolution, batch normalization, and a ReLU activation to it. If pool_drop is set to True, max pooling and dropout are applied as well.

Let’s now stack the component blocks to define the deeper CNN network.

    class DeepCNN(chainer.ChainList):

        def __init__(self, n_output):
            super(DeepCNN, self).__init__(
                ConvBlock(64),
                ConvBlock(64, True),
                ConvBlock(128),
                ConvBlock(128, True),
                ConvBlock(256),
                ConvBlock(256, True),
                LinearBlock(),
                LinearBlock(),
                L.Linear(None, n_output)
            )

        def __call__(self, x):
            # Apply the registered blocks one after another
            for f in self.children():
                x = f(x)
            return x


Now that the deeper CNN is defined, let’s train it and observe the accuracy and loss.

    gpu_id = 0  # Set to -1 if you don't have a GPU
    model = train(DeepCNN(10), gpu_id=gpu_id)


Compared with the previous, smaller CNN, accuracy on the test set has improved considerably: it was around 61 percent before and is now over 86 percent. To boost accuracy even further, we can not only improve the model architecture but also increase the amount of training data (data augmentation) or combine several models to get the best result (ensemble method).
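One simple way to combine models, shown here as a minimal sketch rather than the method used in this post, is to average the class probabilities predicted by each model and pick the class with the highest average. The helper names and the probabilities below are made up purely for illustration.

```python
def average_probabilities(per_model_probs):
    """Average the class probabilities that several models give one input."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [sum(p[c] for p in per_model_probs) / n_models
            for c in range(n_classes)]


def predict(per_model_probs):
    """Return the index of the class with the highest averaged probability."""
    avg = average_probabilities(per_model_probs)
    return max(range(len(avg)), key=avg.__getitem__)


# Made-up probabilities for a 3-class problem, purely for illustration.
probs = [
    [0.6, 0.3, 0.1],  # model A prefers class 0
    [0.2, 0.7, 0.1],  # model B prefers class 1
    [0.3, 0.6, 0.1],  # model C prefers class 1
]
label = predict(probs)
print(label)  # 1
```

Averaging tends to help most when the individual models make different mistakes, for example because they were trained with different initializations or architectures.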

#### Final Words

In this post, we looked at Chainer, a deep learning framework that lets us build both simple and complex DL applications. Chainer has broad and deep support: it is used for most current neural network approaches (CNN, RNN, RL, etc.), adds new approaches as they are developed, and supports a wide range of hardware as well as parallelization across multiple GPUs. We covered only a simple use case, training a CNN on the CIFAR10 dataset, but far more advanced, state-of-the-art applications can be implemented as well.

Vijaysinh is an enthusiast in machine learning and deep learning. He is skilled in ML algorithms, data manipulation, handling and visualization, model building.
