# Hands-on Guide to Cockpit: A Debugging Tool for Deep Learning Models

It is important to understand the limitations that prevent machine learning adoption in many industries.

1. Why Debugging is Necessary
2. Debugging Tools in Deep Learning
3. Debugging with Cockpit
4. Implementing the Debugging with Cockpit

Let’s start the discussion by understanding why debugging is so important when it comes to building a robust application.

#### Why Debugging is Necessary

Debuggers are essential in traditional software development. When things go wrong, they give you access to the code’s inner workings, allowing you to see “inside the box.” This is far more efficient than rerunning the program with new inputs. Deep learning practice, on the other hand, is arguably closer to the latter: when training fails, we usually just run it again with different settings.

If a deep net training attempt fails, we must decide whether to alter the training hyperparameters (how?), the optimizer (to which?), the model (how?), or just re-run with a different seed. Machine learning toolboxes don’t offer much in the way of guidance for these decisions.

#### Debugging Tools in Deep Learning

Deep learning can be debugged with standard debuggers. They give us access to each and every weight in a neural network, as well as the individual pixels in the training data. However, this rarely yields information that helps training succeed. To extract useful information, you need a statistical approach that condenses this overwhelming complexity into a manageable summary.
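As a rough illustration of what "condensing" means, the sketch below reduces a flat list of gradient entries to a few summary numbers. The function name `summarize` and the example values are made up for illustration; Cockpit computes far richer quantities than these.

```python
import math

def summarize(values):
    """Condense a flat list of numbers (e.g. gradient entries) into a few statistics."""
    n = len(values)
    return {
        "l2_norm": math.sqrt(sum(v * v for v in values)),
        "mean": sum(values) / n,
        "max_abs": max(abs(v) for v in values),
    }

# Made-up gradient entries, flattened across all layers
grad = [3.0, -4.0, 0.0, 0.0]
stats = summarize(grad)
print(stats)
```

Three numbers are much easier to plot over training steps than thousands of raw weights, which is the basic idea behind every instrument discussed later.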

TensorBoard and Weights & Biases were created in part to make this visualization easier. However, because they do not show the network’s internal state, the widely monitored quantities (primarily train/test loss and accuracy) explain the relative differences between several training runs only coarsely.

These tools, as you may have seen, provide learning curves that characterize the model’s current state, whether it is performing well or not, but no information about the state and dynamics of training. They tell the user whether things are going well, but not why. It’s like piloting a plane with your eyes closed and no cockpit instruments. As a result, it should come as no surprise that achieving cutting-edge deep learning performance requires specialist skill or plain trial and error.

#### Debugging With Cockpit

Here comes Cockpit, a visual and statistical debugging tool for the deep learning pipeline that employs both newly proposed and established observables. According to the official publication, it uses and augments recent extensions to automatic differentiation (i.e. BackPACK for PyTorch) to efficiently access second-order statistical (e.g. gradient variances) and geometric (e.g. Hessian) information.

In their work, the authors explain how these quantities can help deep learning engineers with tasks like learning rate selection and spotting common problems in data processing or model architectures. We will look at these in practice. Cockpit is open-source, extensible code that integrates seamlessly with existing PyTorch training loops.

#### Implementing the Debugging with Cockpit

In this section, we will see practically how we can access the various internal parameters of a particular model and will discuss the meaning of each.

The example below is taken from the official Cockpit documentation. To follow along, copy the _utils_examples.py file from the repository into your working directory so that the data imports succeed. Before going further, install Cockpit with a simple pip command: ! pip install cockpit-for-pytorch

In addition to PyTorch, we import BackPACK, which is installed automatically along with Cockpit. We also import the Cockpit and CockpitPlotter classes, which allow us to track and visualize useful quantities.

In the next lines of code, we import from a utils file that contains the Fashion-MNIST data.

import torch
from _utils_examples import fmnist_data
from backpack import extend

# The package name is lowercase; only the class names are capitalized
from cockpit import Cockpit, CockpitPlotter
from cockpit.utils.configuration import configuration


Then, for our Fashion-MNIST data set, we create a basic classifier. The main difference from a standard training loop is that we must use BackPACK to extend both the model and the loss function. It’s as simple as wrapping the standard model and loss function with BackPACK’s extend() function. This tells BackPACK that extra quantities (such as individual gradients) should be computed for these modules.

We also need access to the individual loss values for the Alpha quantity (discussed shortly). These can be computed inexpensively but aren’t generally part of a traditional training loop. We create this function in the same way as the standard loss function, just with reduction="none". There is no need to inform BackPACK of its existence because these losses will not be differentiated.

# Build Fashion-MNIST classifier
fmnist_data = fmnist_data()
model = extend(torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 10)))
loss_fn = extend(torch.nn.CrossEntropyLoss(reduction="mean"))
individual_loss_fn = torch.nn.CrossEntropyLoss(reduction="none")
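To make the relationship between the two loss functions concrete, here is a small pure-Python sketch (no PyTorch required) of per-sample cross-entropy and its mean. The helper `cross_entropy_per_sample` and the toy logits are illustrative assumptions, not part of Cockpit or PyTorch.

```python
import math

def cross_entropy_per_sample(logits, labels):
    """Per-sample cross-entropy losses, the analogue of reduction="none"."""
    losses = []
    for row, label in zip(logits, labels):
        shift = max(row)  # subtract the max for numerical stability
        log_norm = shift + math.log(sum(math.exp(z - shift) for z in row))
        losses.append(log_norm - row[label])
    return losses

# Toy batch: two samples, three classes
logits = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
labels = [0, 2]

individual = cross_entropy_per_sample(logits, labels)
mean_loss = sum(individual) / len(individual)  # analogue of reduction="mean"
print(individual, mean_loss)
```

The mean loss is the value that gets differentiated, while the individual losses are only read, which is why a single forward pass can serve both.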


The Cockpit class is in charge of computing the quantities and storing the results. We must provide the model parameters as well as a list of quantities indicating what should be tracked and when. Cockpit offers three configurations of increasing computational complexity: “economy,” “business,” and “full” (see also configuration()). To keep track of all possible quantities, we use the provided utility function with the “full” configuration.

# Create SGD Optimizer
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

# Create Cockpit and a plotter
# (lowercase instance name so the Cockpit class is not shadowed)
cockpit = Cockpit(model.parameters(), quantities=configuration("full"))
plotter = CockpitPlotter()


Now, let’s move to the training loop. The training itself is simple. We draw a mini-batch at each iteration, compute the model predictions and losses, then conduct a backward pass and update the parameters. The main difference from a standard training loop is that the backward call is wrapped in a cockpit(…) context, which manages the extra computations during the backward pass. The info parameter passes additional information required by certain quantities.

# Main training loop
max_steps, global_step = 5, 0
for inputs, labels in iter(fmnist_data):
    # reset gradients left over from the previous iteration
    opt.zero_grad()

    # forward pass
    outputs = model(inputs)
    loss = loss_fn(outputs, labels)
    losses = individual_loss_fn(outputs, labels)

    # backward pass, wrapped so Cockpit can compute its quantities
    with cockpit(
        global_step,
        info={
            "batch_size": inputs.shape[0],
            "individual_losses": losses,
            "loss": loss,
            "optimizer": opt,
        },
    ):
        loss.backward(create_graph=cockpit.create_graph(global_step))

    # optimizer step
    opt.step()
    global_step += 1

    print(f"Step: {global_step:5d} | Loss: {loss.item():.4f}")

    # update the live view after every iteration
    plotter.plot(cockpit)

    if global_step >= max_steps:
        break

# keep the final plot window open
plotter.plot(cockpit, block=True)
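If the "call an object, then use the result in a with statement" pattern looks unfamiliar, the following toy sketch shows how such an interface can be built in plain Python. `ToyTracker` is a hypothetical stand-in written for this article; it is not how Cockpit is implemented internally and records only a single made-up loss value.

```python
class ToyTracker:
    """Hypothetical stand-in that mimics the `with tracker(step, info=...)` pattern."""

    def __init__(self, track_every=2):
        self.track_every = track_every
        self.records = {}
        self._pending = None

    def __call__(self, global_step, info):
        # Remember what to record, then hand back self as the context manager
        self._pending = (global_step, info)
        return self

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        step, info = self._pending
        # Record only on scheduled steps and only if the block succeeded
        if exc_type is None and step % self.track_every == 0:
            self.records[step] = info["loss"]
        return False  # never swallow exceptions

tracker = ToyTracker(track_every=2)
for step in range(5):
    loss = 1.0 / (step + 1)  # made-up loss value
    with tracker(step, info={"loss": loss}):
        pass  # the backward pass would go here

print(tracker.records)
```

The design keeps the training loop almost unchanged: all bookkeeping happens when the with block exits, and the schedule decides which steps are recorded.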

The computed metrics can be viewed at any point during training by calling the CockpitPlotter’s plot() method, which we do in every iteration. The full Cockpit view after the final iteration is shown here.

Let us interpret each of these plots in detail.

###### Alpha

Alpha is built from individual loss and gradient observations at the start and end of each iteration. From these, Cockpit fits a noise-informed univariate quadratic approximation along the step direction (i.e. the loss as a function of the step size) and assesses which point on this parabola the optimizer moves to.

The authors standardize this value so that stepping to the valley floor is assigned α = 0, the starting point is assigned α = −1, and updates to the point exactly opposite the starting point are assigned α = 1.
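The standardization can be sketched with a noise-free parabola. This is a simplification: Cockpit's fit also accounts for the noise in the observations, while the sketch below assumes exact values for the loss at both ends of the step and the slope at the start. The function name `alpha_from_parabola` is made up for illustration.

```python
def alpha_from_parabola(f0, df0, f1):
    """Fit f(t) = f0 + df0*t + c*t**2 through the start (t=0) and end (t=1)
    of a step and report where the step lands relative to the minimum."""
    c = f1 - f0 - df0  # curvature term implied by the three observations
    if c <= 0 or df0 >= 0:
        raise ValueError("needs a descent direction and positive curvature")
    t_min = -df0 / (2 * c)  # location of the valley floor along the step
    # -1 at the start, 0 at the valley floor, +1 at the mirrored point
    return 1.0 / t_min - 1.0

# f(t) = (t - 2)**2: the step covers half the distance to the minimum
print(alpha_from_parabola(f0=4.0, df0=-4.0, f1=1.0))
# f(t) = (t - 1)**2: the step lands exactly on the valley floor
print(alpha_from_parabola(f0=1.0, df0=-2.0, f1=0.0))
# f(t) = 4 * (t - 0.5)**2: the step overshoots to the opposite side
print(alpha_from_parabola(f0=1.0, df0=-4.0, f1=1.0))
```

Negative values therefore indicate understepping and positive values overshooting, which is what makes this gauge useful for learning rate selection.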

The update size shows when a trajectory has stalled, as the orange trajectory does here. But why does it stall? Slowing down can be caused by both a small learning rate and plateaus in the loss landscape. The gradient norm distinguishes these two causes.
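For plain SGD the update is the learning rate times the gradient, so the two causes can be told apart with a simple rule of thumb. The function `diagnose_stall` and its threshold `small` are hypothetical and chosen only for illustration; Cockpit shows the raw quantities and leaves the judgment to the user.

```python
import math

def diagnose_stall(grad, lr, small=1e-3):
    """Toy rule of thumb for plain SGD, where the update is lr * grad."""
    grad_norm = math.sqrt(sum(g * g for g in grad))
    update_size = lr * grad_norm
    if update_size >= small:
        return "moving"
    if grad_norm < small:
        return "plateau: the gradient itself is tiny"
    return "learning rate too small: gradient is large but steps are tiny"

print(diagnose_stall([3.0, 4.0], lr=0.1))
print(diagnose_stall([3.0, 4.0], lr=1e-6))
print(diagnose_stall([1e-5, 0.0], lr=0.1))
```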

The norm, inner product, and orthogonality tests use a standardized radius and two bandwidths (parallel and orthogonal to the gradient mean) that indicate how strongly individual gradients scatter around the mean.

In the original works, these tests are used to adapt batch sizes. Cockpit instead combines all three tests into a single gauge (top centre plot) that visualizes the standardized noise radius and bandwidths.

These noise signals can be used to guide batch size adaptation both online and offline, as well as to investigate the impact of gradient alignment on training speed and generalization.
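As a concrete example of one of these tests, the sketch below computes the quantity behind the norm test: the estimated variance of the mini-batch gradient relative to its squared norm. The formulation (sample variance with a 1/(n−1) factor, compared against a user-chosen threshold θ²) follows the usual textbook form of the test and is an assumption here; Cockpit's gauge may standardize it differently.

```python
def norm_test_ratio(per_sample_grads):
    """Variance of the mean gradient relative to its squared norm.

    Expects at least two per-sample gradients of equal length.
    """
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    mean = [sum(g[j] for g in per_sample_grads) / n for j in range(dim)]
    mean_sq_norm = sum(m * m for m in mean)
    if mean_sq_norm == 0.0:
        return float("inf")  # pure noise: no signal in the mean gradient
    spread = sum(
        (g[j] - mean[j]) ** 2 for g in per_sample_grads for j in range(dim)
    )
    return spread / (n * (n - 1)) / mean_sq_norm

# Made-up per-sample gradients for a batch of two
print(norm_test_ratio([[1.0, 0.0], [1.0, 0.0]]))   # identical gradients, no noise
print(norm_test_ratio([[2.0, 0.0], [0.0, 0.0]]))   # noise as large as the signal
```

A batch would pass the test when this ratio stays below θ² for some chosen θ; a ratio that keeps growing suggests the batch is too small for the current stage of training.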

###### Hessian EigenValue

In convex optimization, the largest Hessian eigenvalue determines the appropriate step size. Cockpit therefore provides the Hessian’s largest eigenvalue and its trace (top and centre plots). The former resembles the sharpest valley on the loss surface and hence may indicate training instabilities. The trace describes a notion of “average curvature.”
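The link between the largest eigenvalue and the step size can be checked on a tiny quadratic. The sketch below uses power iteration with matrix-vector products only, in the spirit of Hessian-vector products; the 2×2 matrix `H` is made up, and this is not a description of how Cockpit computes the eigenvalue.

```python
import math

def largest_eigenvalue(matvec, dim, iters=100):
    """Power iteration using only matrix-vector products."""
    v = [1.0] + [0.0] * (dim - 1)  # start vector; must not be orthogonal to the top eigenvector
    lam = 0.0
    for _ in range(iters):
        w = matvec(v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(v)))  # Rayleigh quotient
    return lam

H = [[3.0, 1.0], [1.0, 3.0]]  # made-up Hessian of a convex quadratic

def hvp(v):
    """Hessian-vector product for the toy Hessian above."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]

lam_max = largest_eigenvalue(hvp, dim=2)
trace = sum(H[i][i] for i in range(2))
max_stable_lr = 2.0 / lam_max  # gradient descent on this quadratic diverges above this
print(lam_max, trace, max_stable_lr)
```

For this toy problem the eigenvalues are 4 and 2, so the trace is 6 and gradient descent is only stable for learning rates below 2/4 = 0.5.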

###### TIC

The Takeuchi Information Criterion (TIC) uses a ratio between the Hessian and the non-central second moment of the gradients to estimate the generalization gap. It also gives insight into the changes in the objective function implied by gradient noise. Cockpit provides mini-batch estimates of the TIC.

Cockpit also offers a univariate histogram of the gradient elements. In addition, a combined histogram of parameter-gradient pairs provides a two-dimensional look into the network’s parameter and gradient values in a mini-batch.
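The text only states that the TIC is a ratio between the Hessian and the second gradient moment. The sketch below shows two cheap ways such a ratio is commonly approximated, a trace ratio and a diagonal ratio; treating these as the relevant formulas is an assumption made here, and the inputs are made-up numbers rather than quantities from a real network.

```python
def tic_estimates(per_sample_grads, hessian_diag):
    """Two cheap approximations of a Hessian / second-moment ratio.

    Assumes strictly positive Hessian diagonal entries.
    """
    n = len(per_sample_grads)
    dim = len(hessian_diag)
    # Diagonal of the non-central second moment: mean of squared per-sample gradients
    c_diag = [sum(g[j] ** 2 for g in per_sample_grads) / n for j in range(dim)]
    tic_trace = sum(c_diag) / sum(hessian_diag)                  # Tr(C) / Tr(H)
    tic_diag = sum(c / h for c, h in zip(c_diag, hessian_diag))  # sum of C_jj / H_jj
    return tic_trace, tic_diag

grads = [[1.0, 2.0], [3.0, 0.0]]  # made-up per-sample gradients
h_diag = [2.0, 4.0]               # made-up Hessian diagonal
tic_trace, tic_diag = tic_estimates(grads, h_diag)
print(tic_trace, tic_diag)
```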
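A two-dimensional histogram of this kind can be sketched in a few lines. The helper names, bin count, and value ranges below are arbitrary choices for illustration and do not reflect Cockpit's defaults.

```python
from collections import Counter

def bin_index(value, lo, hi, bins):
    """Index of the equal-width bin in [lo, hi] that contains value (clamped)."""
    if value >= hi:
        return bins - 1
    if value <= lo:
        return 0
    return int((value - lo) / (hi - lo) * bins)

def histogram_2d(params, grads, bins=2, p_range=(-1.0, 1.0), g_range=(-1.0, 1.0)):
    """Count (parameter, gradient) pairs per 2D bin."""
    counts = Counter()
    for p, g in zip(params, grads):
        counts[(bin_index(p, *p_range, bins), bin_index(g, *g_range, bins))] += 1
    return counts

# Made-up parameter values and their gradient entries
params = [-0.5, 0.5, 0.6, -0.9]
grads = [0.1, -0.2, 0.3, 0.4]
hist = histogram_2d(params, grads)
print(dict(hist))
```

Summing the counts over the parameter axis recovers the univariate gradient histogram, so the 2D view strictly adds information.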

#### Final Words

Deep learning is mostly a black-box approach. High dimensionality, stochasticity, and non-convexity necessitate ongoing tracking and tweaking, which can be a time-consuming and uncomfortable procedure. To address this difficulty, we looked at Cockpit, a practical visual debugging tool for deep learning. It provides tools for real-time monitoring of the network’s internal dynamics during training.

Vijaysinh is an enthusiast in machine learning and deep learning. He is skilled in ML algorithms, data manipulation, handling and visualization, model building.