Computer Vision (CV) is a field of study that aims to develop techniques enabling computers to “see” and understand the content of digital images such as photographs and videos. Images and text are all around us these days: smartphones can capture high-resolution images with a single touch, and sharing photos and videos has never been easier thanks to social media platforms like Instagram and Facebook and messaging apps like WhatsApp and Telegram. The internet itself is composed of text and images. With the emergence of numerous search engines, text is relatively easy to index and search. But to index and search images, algorithms need to know what those images contain.
For a long time, the content of photos and videos remained opaque to machines, described only by the captions provided by the uploader. To get the most out of image and video data, we need computers to “see” an image and comprehend its content. Image classification is the task of identifying what an image depicts and assigning it to one of several distinct classes. Image recognition software can describe what is depicted in a picture and distinguish one object from another.
As one of the core computer vision tasks, image classification serves as the foundation for solving many other CV problems. Deep learning models are widely used for computer vision tasks. Deep learning is a machine learning technique that teaches machines to learn from examples. Deep learning methods use neural network architectures, which is why deep learning models are also known as deep neural networks.
Neural networks are computing systems designed to recognize patterns. Their architecture is inspired by the structure of the human brain, which is composed of interconnected neurons, hence the name. A network consists of three types of layers: the input layer, the hidden layers, and the output layer. The input layer receives the signal, the hidden layers process it, and the output layer produces the prediction. Each layer consists of interconnected nodes, also known as artificial neurons, that perform the computation.
But what makes a neural network deep? The number of hidden layers! While traditional neural networks generally have up to three hidden layers, deep neural networks may contain hundreds. When recognizing someone, we subconsciously analyze their appearance and distinct inherent features such as face shape, eye color, hairstyle, and body type, and we identify the individual by virtue of memory. In the same way, to recognize faces, a system must first learn their features. Then it must be trained to predict whether an object is X, Y, or Z. Deep learning models learn these characteristics in a different way from classical machine learning models, which is why the training approaches differ as well.
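To make the layer structure concrete, here is a minimal sketch of a forward pass through a network with an input layer, two hidden layers, and an output layer. It is written in plain NumPy rather than any deep learning framework, and all sizes and weights are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # A common nonlinearity applied inside hidden layers
    return np.maximum(0, x)

# Illustrative sizes: 4 inputs -> 8 -> 8 -> 3 output classes
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
W3, b3 = rng.normal(size=(8, 3)), np.zeros(3)

def forward(x):
    h1 = relu(x @ W1 + b1)   # first hidden layer
    h2 = relu(h1 @ W2 + b2)  # second hidden layer
    logits = h2 @ W3 + b3    # output layer
    # softmax turns the raw outputs into class probabilities
    e = np.exp(logits - logits.max())
    return e / e.sum()

probs = forward(rng.normal(size=4))
print(probs)  # three probabilities that sum to 1
```

A "deeper" network would simply stack more hidden layers between the input and the output.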
What is Lucid?
Lucid is a library that provides a collection of infrastructure and tools for researching neural networks and understanding how they make interpretations and decisions based on their input. It is a step up from DeepDream and provides flexible abstractions that support a wide range of interpretability research. Lucid helps us understand the how and why of a given prediction, so the end user can see the reasons behind it. There is growing interest in making neural networks interpretable to humans, both for research purposes and for better understanding.
The field of neural network interpretability has formed to address these concerns. Lucid works with convolutional neural networks, which contain many convolutional layers. The early layers look for basic lines, simple shapes, and patterns in the input image. Their results propagate forward, with later layers responding to increasingly complex features, until the final layers generate the output. The strength of each neuron’s response to an input is used to understand its behavior. Multiple neurons together form a channel. The whole channel receives the same input from the previous layer, but each neuron processes the information slightly differently and looks for different features. All of these outputs are then combined and passed on to the next layer, and so on. When an image is passed through the network this way, known as a forward pass, each neuron produces an activation number: the greater the number, the stronger the response.
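As a toy illustration of an "activation number" (plain NumPy, not Lucid's API), a single convolutional filter's response can be computed as an elementwise product between the filter and an image patch, summed and passed through a nonlinearity. A hand-made vertical-edge filter responds strongly to a patch containing an edge and not at all to a flat patch:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

# A 3x3 vertical-edge detector (illustrative, hand-chosen weights)
edge_filter = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]], dtype=float)

# One patch with a strong vertical edge, one that is uniformly flat
edge_patch = np.array([[0, 0, 9],
                       [0, 0, 9],
                       [0, 0, 9]], dtype=float)
flat_patch = np.full((3, 3), 5.0)

def activation(patch, filt):
    # The "activation number": filter response after a ReLU nonlinearity
    return relu(np.sum(patch * filt))

print(activation(edge_patch, edge_filter))  # → 27.0 (strong response)
print(activation(flat_patch, edge_filter))  # → 0.0 (no response)
```

A real convolutional layer slides many such filters over the whole image, producing one channel of activations per filter.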
As the field matures, two major threads of research have emerged: feature visualization and attribution. While feature visualization is a powerful tool, actually getting it to work involves a number of details. Optimisation is an interesting approach to understanding what a model is looking for, because it separates the things actually causing a behavior from things that merely correlate with the causes.
Lucid also makes use of feature visualization by optimization, where it:
- Holds the network static.
- Chooses which part of the network to visualize.
- Passes an image as input.
- Optimizes the image to excite the chosen features.
- Repeats the process over again.
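The loop above can be sketched as gradient ascent on the input image. In this toy example (plain NumPy, not Lucid's actual implementation), the "network" is a single fixed linear neuron, and we repeatedly nudge the input in the direction that increases that neuron's activation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hold the network static: a single fixed linear neuron (toy stand-in)
w = rng.normal(size=16)

def activation(x):
    # The part of the network we chose to visualize
    return float(w @ x)

x = rng.normal(size=16) * 0.01  # start from a near-blank "image"
before = activation(x)

# Optimize the image to excite the chosen feature, repeating over again
lr = 0.1
for _ in range(50):
    grad = w           # d(activation)/dx for a linear neuron
    x = x + lr * grad  # gradient ascent step on the input, not the weights

after = activation(x)
print(before, "->", after)  # the activation grows as the input is optimized
```

The key idea is that the weights never change; only the input is optimized, so the final image shows what the chosen neuron "wants to see."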
Getting started with Lucid
Here we will walk through a small implementation of an interpretable neural network, observing how the network takes an input image and identifies its features, and how the input changes as it passes through each layer. The following is based on one of the official implementations from Lucid’s creators, which you can find through the link here.
Installing The Library
First, we will install the Lucid library by running the following code:
# Install Lucid
!pip install --quiet lucid==0.2.3
Importing the dependencies:
# Importing the dependencies
import numpy as np
import tensorflow as tf
assert tf.__version__.startswith('1')

import lucid.modelzoo.vision_models as models
from lucid.misc.io import show
import lucid.optvis.objectives as objectives
import lucid.optvis.param as param
import lucid.optvis.render as render
import lucid.optvis.transform as transform
Next, we will import our model from the Lucid model zoo. We will use the InceptionV1 model to create dreamy, DeepDream-style output images.
# Let's import a model from the Lucid modelzoo!
model = models.InceptionV1()
model.load_graphdef()
We will now visualize a single neuron. Lucid starts from a randomly initialized input image and optimizes it so that the chosen neuron in a hidden layer responds as strongly as possible.
# Visualizing a neuron
# selecting a single neuron in the mixed4a layer
_ = render.render_vis(model, "mixed4a_pre_relu:476")
We can see the resulting image in the output:
Let’s visualize another neuron using a more explicit objective:
# Defining an explicit channel objective on a hidden layer
obj = objectives.channel("mixed4a_pre_relu", 465)
_ = render.render_vis(model, obj)
Processed output :
As we can observe, the further the image passes forward through the network, the more features the network detects and identifies. This gives us more context about how the neural network works to detect features.
We can also combine channel objectives to generate output:
# Or we could do something weirder:
# (Technically, objectives are a class that implements addition.)
channel = lambda n: objectives.channel("mixed4a_pre_relu", n)
obj = channel(476) + channel(465)
_ = render.render_vis(model, obj)
Here is our final combined output. As we can now observe, the neural network has identified the features from the input, and the two objectives are blended into a single visualization!
We can also control the transformations applied to the input image during optimization; here we disable transformation robustness entirely:
# No transformation robustness
transforms = []
_ = render.render_vis(model, "mixed4a_pre_relu:476", transforms=transforms)
We can also change how the image itself is parameterized:

# Using alternate parameterizations is one of the primary ingredients for
# effective visualization
param_f = lambda: param.image(128, fft=False, decorrelate=False)
_ = render.render_vis(model, "mixed4a_pre_relu:2", param_f)
Dreamy Transformed Image :
In this article, we explored the Lucid library, which helps us understand how neural networks make decisions. We also implemented a small example through which we observed how the input changes as it passes through the network. You can find the implementation colab notebook here.