Hands-on Guide to Cockpit: A Debugging Tool for Deep Learning Models

It is important to understand the limitations that prevent machine learning adoption in many industries. Machine learning models are excellent at certain jobs, but they can also make a lot of mistakes. Understanding how your model can fail, and preparing adequate responses ahead of time, is key to a successful project. Several frameworks help with understanding model behaviour, specifically for deep learning, such as TensorBoard and Weights & Biases. In this article, we will look at Cockpit, a framework that exposes much of the information we need about a model during training. We will cover the following topics to understand this debugging framework.

Table of Contents

  1. Why Debugging is Necessary
  2. Debugging Tools in Deep Learning
  3. Debugging with Cockpit
  4. Implementing the Debugging with Cockpit 

Let’s start the discussion by understanding why debugging is so important when it comes to building a robust application.

Why Debugging is Necessary

Debuggers are essential in the development of traditional software. When things go wrong, they give you access to the code’s inner workings, allowing you to see “inside the box.” This is far more efficient than rerunning the program with new inputs. Deep learning, however, is arguably closer to the latter: when training fails, we mostly rerun it with different settings.

If a deep net training attempt fails, we must decide whether to alter the training hyperparameters (how?), the optimizer (to which?), the model (how?), or just re-run with a different seed. Machine learning toolboxes don’t offer much in the way of guidance for these decisions.

Debugging Tools in Deep Learning

Deep learning can be debugged with standard debuggers. They’ll offer us access to each and every weight in a neural network, as well as the individual pixels in the training data. However, this rarely offers useful information for effective training. To extract useful data, you’ll need to use a statistical method and condense the confusing complexity into a simple summary. 

TensorBoard and Weights & Biases were created in part to make this visualization easier. However, the commonly monitored quantities (primarily train/test loss and accuracy) say little about the network’s internal state, so they give only a rudimentary explanation of the differences between training runs.

These tools, as you may have seen, provide learning curves that characterize the model’s current state, whether it is performing well or not, but no information about the training state and dynamics. They tell the user if things are going well, but not why. It’s like piloting a plane with your eyes closed and no feedback from the cockpit instruments. As a result, it should come as no surprise that achieving cutting-edge deep learning performance requires either specialist skill or plain trial and error.

Debugging With Cockpit

This is where Cockpit comes in. It adds a visual and statistical debugging tool to the deep learning pipeline, built on both newly proposed and established observables. According to the official publication, it uses and extends recent additions to automatic differentiation (i.e. BackPACK for PyTorch) to efficiently access higher-order statistical (e.g. gradient variances) and geometric (e.g. Hessian) information.

In their work, the authors explain how these quantities can help deep learning engineers with tasks like learning rate selection and spotting common problems in data processing or model architectures. We will see these quantities in practice below. Cockpit is open-source, extensible code that integrates with existing PyTorch training loops.

Implementing the Debugging with Cockpit 

In this section, we will see in practice how to access the various internal quantities of a model and discuss the meaning of each.

Below we implement the example taken from the official documentation of Cockpit. To follow along, you need to copy the _utils_examples.py file from the repository to your working directory so that the data import succeeds. Before going further, install Cockpit with a simple pip command: !pip install cockpit-for-pytorch

In addition to PyTorch, we import BackPACK, which is installed automatically when Cockpit is installed. We also import the Cockpit and CockpitPlotter classes, which allow us to track and visualize useful quantities.

In the next lines of code, we also import a helper from the utils file that loads the Fashion-MNIST data.

import torch
from _utils_examples import fmnist_data
from backpack import extend
from cockpit import Cockpit, CockpitPlotter
from cockpit.utils.configuration import configuration

Then, for our Fashion-MNIST data set, we create a basic classifier. The main difference from a standard training loop is that we must use BackPACK to extend both the model and the loss function. This is as simple as wrapping the standard model and loss function with BackPACK’s extend() function. It tells BackPACK that extra quantities (such as individual gradients) should be computed for these modules.

We also need access to the individual loss values for the Alpha quantity (discussed shortly). They can be computed inexpensively but are not usually part of a traditional training loop. We create this function in the same way as the standard loss function, but with reduction="none". There is no need to tell BackPACK about it, because these losses are not differentiated.

# Build Fashion-MNIST classifier
fmnist_data = fmnist_data()
model = extend(torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 10)))
loss_fn = extend(torch.nn.CrossEntropyLoss(reduction="mean"))
individual_loss_fn = torch.nn.CrossEntropyLoss(reduction="none")
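
To make the relationship between the two loss functions concrete, here is a minimal pure-Python sketch (no PyTorch required) of what reduction="none" and reduction="mean" compute for cross-entropy. The helper names and the toy logits are illustrative assumptions, not part of Cockpit or PyTorch.

```python
import math


def cross_entropy_per_sample(logits, labels):
    """Per-sample cross-entropy, analogous to reduction="none"."""
    losses = []
    for row, label in zip(logits, labels):
        # log-sum-exp with the maximum subtracted for numerical stability
        row_max = max(row)
        log_norm = row_max + math.log(sum(math.exp(z - row_max) for z in row))
        losses.append(log_norm - row[label])
    return losses


def cross_entropy_mean(logits, labels):
    """Batch-averaged cross-entropy, analogous to reduction="mean"."""
    losses = cross_entropy_per_sample(logits, labels)
    return sum(losses) / len(losses)


# Hypothetical logits for a batch of 3 samples and 3 classes
logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0], [1.0, 1.0, 1.0]]
labels = [0, 2, 1]

individual_losses = cross_entropy_per_sample(logits, labels)
mean_loss = cross_entropy_mean(logits, labels)
print("individual losses:", individual_losses)
print("mean loss:", mean_loss)
```

The mean loss is simply the average of the individual losses, which is why the per-sample values carry strictly more information than the single number a standard training loop tracks.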

The Cockpit class is in charge of computing the quantities and storing the results. We must provide the model parameters as well as a list of quantities indicating what should be tracked and when. Cockpit offers three configurations of increasing computational cost: “economy,” “business,” and “full” (see also configuration()). To keep track of all available quantities, we use this utility function with the “full” configuration.

# Create SGD Optimizer
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
# Create Cockpit and a plotter (lowercase name, so the Cockpit class is not shadowed)
cockpit = Cockpit(model.parameters(), quantities=configuration("full"))
plotter = CockpitPlotter()

Now, let’s move to the training loop. The training itself is simple. At each iteration we draw a mini-batch, compute the model predictions and losses, then perform a backward pass and update the parameters. The main difference from a standard training loop is that the backward call is wrapped in a cockpit(…) context, which manages the extra computations during the backward pass. The info parameter is used to pass additional information required by specific quantities.

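# Main training loop
max_steps, global_step = 5, 0
for inputs, labels in iter(fmnist_data):
    opt.zero_grad()
    # forward pass
    outputs = model(inputs)
    loss = loss_fn(outputs, labels)
    losses = individual_loss_fn(outputs, labels)
    # backward pass, wrapped in the cockpit context
    with cockpit(
        global_step,
        extend(loss_fn),
        info={
            "batch_size": inputs.shape[0],
            "individual_losses": losses,
            "loss": loss,
            "optimizer": opt,
        },
    ):
        loss.backward(create_graph=cockpit.create_graph(global_step))
    # optimizer step
    opt.step()
    global_step += 1
    print(f"Step: {global_step:5d} | Loss: {loss.item():.4f}")
    # update the Cockpit view after every iteration
    plotter.plot(cockpit)
    if global_step >= max_steps:
        break
# keep the final Cockpit view open
plotter.plot(cockpit, block=True)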

The computed quantities can be viewed at any point during training, which we do in every iteration, by calling the CockpitPlotter’s plot() method. The whole Cockpit view is shown here after the final iteration.

Let us interpret each of these plots in detail.


Alpha

Cockpit builds a noise-informed univariate quadratic approximation of the loss along the step direction (i.e. the loss as a function of the step size), using individual loss and gradient observations at the start and end of each iteration. It then assesses which point on this parabola the optimizer moves to.

The authors normalize this value so that stepping to the valley floor is assigned α = 0, the starting point is assigned α = -1, and updates to the point exactly opposite the starting point are assigned α = 1.
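
The following is a simplified, noise-free sketch of this idea, not Cockpit's actual estimator (which also accounts for noise in the observations). It assumes the convention that the starting point maps to -1, the valley floor to 0, and the mirror point on the other side of the valley to 1. The function name and the toy loss are hypothetical.

```python
def alpha_from_quadratic_fit(loss_start, slope_start, loss_end, step_size):
    """Locate a step on a parabola fitted along the step direction.

    The parabola q(s) = loss_start + slope_start * s + c * s**2 is fitted to
    the loss and directional derivative at the start and the loss at the end.
    """
    curvature = (loss_end - loss_start - slope_start * step_size) / step_size**2
    if curvature <= 0 or slope_start >= 0:
        raise ValueError("the fit has no minimum ahead along this direction")
    step_to_minimum = -slope_start / (2.0 * curvature)
    # -1 at the start, 0 at the minimum, +1 at the mirror point of the start
    return (step_size - step_to_minimum) / step_to_minimum


def toy_loss(s):
    """Hypothetical 1D loss along the step direction, with its minimum at s = 2."""
    return (s - 2.0) ** 2


loss_at_start, slope_at_start = toy_loss(0.0), -4.0  # derivative of toy_loss at 0
alpha_undershoot = alpha_from_quadratic_fit(loss_at_start, slope_at_start, toy_loss(1.0), 1.0)
alpha_minimum = alpha_from_quadratic_fit(loss_at_start, slope_at_start, toy_loss(2.0), 2.0)
alpha_overshoot = alpha_from_quadratic_fit(loss_at_start, slope_at_start, toy_loss(4.0), 4.0)
print(alpha_undershoot, alpha_minimum, alpha_overshoot)  # -0.5 0.0 1.0
```

Negative values indicate steps that stop short of the valley floor, while positive values indicate overshooting, which is what makes this quantity useful when tuning the learning rate.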

Gradient Norm

Suppose the update size shows that a trajectory (the orange one in the plot) has stalled. But why? Both a small learning rate and a plateau in the loss landscape can cause such a slowdown. The gradient norm distinguishes these two causes.
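
A small sketch of this reasoning for plain SGD, where the update is the learning rate times the gradient. The two scenarios and their numbers are invented for illustration only.

```python
import math


def l2_norm(vector):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(v * v for v in vector))


def sgd_update_size(learning_rate, gradient):
    """Length of a plain SGD step: ||lr * g|| = lr * ||g||."""
    return learning_rate * l2_norm(gradient)


# Two hypothetical situations that produce the same small update size
small_lr_case = {"learning_rate": 1e-5, "gradient": [3.0, 4.0]}
plateau_case = {"learning_rate": 1e-1, "gradient": [3e-4, 4e-4]}

for name, case in [("small learning rate", small_lr_case), ("plateau", plateau_case)]:
    print(
        f"{name}: update size = {sgd_update_size(**case):.1e}, "
        f"gradient norm = {l2_norm(case['gradient']):.1e}"
    )
```

Both cases move the parameters by the same tiny amount, but only the gradient norm reveals whether the cause is the learning rate (large gradient) or a flat region of the loss landscape (small gradient).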

Gradient Tests

The norm, inner product, and orthogonality tests describe how the individual gradients scatter around their mean, using a standardized radius and two band widths (parallel and orthogonal to the gradient mean).

The original works use these tests to adapt batch sizes. Cockpit instead combines all three tests into a single gauge (top centre plot) that visualizes the standardized noise radius and band widths.

These noise signals can be used to guide batch size adaptation both online and offline, as well as to investigate the impact of gradient alignment on training speed and generalization.
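
As a rough illustration of the radius idea, the sketch below measures how far individual gradients spread around their mean, relative to the length of the mean gradient. It is a simplified form of a norm-style test; the exact normalization that Cockpit uses may differ, and the function names are hypothetical.

```python
import math


def mean_gradient(individual_gradients):
    """Average the per-sample gradients of a mini-batch."""
    batch_size = len(individual_gradients)
    dim = len(individual_gradients[0])
    return [sum(g[d] for g in individual_gradients) / batch_size for d in range(dim)]


def noise_radius(individual_gradients):
    """Spread of individual gradients around their mean, relative to the mean's norm.

    Requires at least two samples and a non-zero mean gradient.
    """
    batch_size = len(individual_gradients)
    mean = mean_gradient(individual_gradients)
    squared_deviation = sum(
        sum((g_d - m_d) ** 2 for g_d, m_d in zip(g, mean))
        for g in individual_gradients
    )
    mean_norm_squared = sum(m * m for m in mean)
    variance_of_mean = squared_deviation / (batch_size * (batch_size - 1))
    return math.sqrt(variance_of_mean / mean_norm_squared)


aligned = [[1.0, 0.0], [1.0, 0.0]]  # identical gradients: no noise
noisy = [[2.0, 0.0], [0.0, 0.0]]    # same mean direction, large scatter
print(noise_radius(aligned), noise_radius(noisy))  # 0.0 1.0
```

A radius near zero means the mini-batch gradient is a reliable estimate, while a radius near or above one means the noise is as large as the signal, which is the kind of situation where a larger batch size may help.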

Hessian Eigenvalues

In convex optimization, the largest Hessian eigenvalue determines the appropriate step size. Cockpit therefore provides the Hessian’s largest eigenvalue and its trace (top and centre plots). The former corresponds to the sharpest valley of the loss surface and hence may indicate training instabilities. The trace describes a notion of “average curvature,” since it equals the sum of all eigenvalues.
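
To illustrate both quantities, here is a minimal sketch using power iteration on a small explicit matrix. This is only a toy: for real networks the Hessian is never formed explicitly and is accessed through Hessian-vector products instead. The 2x2 matrix is a made-up example.

```python
import math


def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) with a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def largest_eigenvalue(matrix, num_iterations=200):
    """Power iteration; returns the Rayleigh quotient of the final iterate.

    Converges to the eigenvalue of largest magnitude, assuming the start
    vector is not orthogonal to the corresponding eigenvector.
    """
    vector = [1.0] + [0.0] * (len(matrix) - 1)
    for _ in range(num_iterations):
        product = matvec(matrix, vector)
        norm = math.sqrt(sum(p * p for p in product))
        vector = [p / norm for p in product]
    return sum(v * p for v, p in zip(vector, matvec(matrix, vector)))


def trace(matrix):
    """Sum of the diagonal, which equals the sum of all eigenvalues."""
    return sum(matrix[i][i] for i in range(len(matrix)))


hessian = [[3.0, 1.0], [1.0, 3.0]]  # hypothetical Hessian with eigenvalues 4 and 2
top_eigenvalue = largest_eigenvalue(hessian)
hessian_trace = trace(hessian)
print(f"largest eigenvalue: {top_eigenvalue:.6f}, trace: {hessian_trace}")
```

Here the largest eigenvalue (4) captures the sharpest direction, while the trace (6) divided by the dimension gives the average curvature (3).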


TIC

The Takeuchi Information Criterion (TIC) estimates the generalization gap using a ratio between the Hessian and the non-central second moment of the gradients. It also gives insight into the changes in the objective function that gradient noise implies. Cockpit provides mini-batch estimates of the TIC.
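
One simple scalar version of such a ratio divides the trace of the non-central second gradient moment by the trace of the Hessian. The sketch below computes it for toy values. Whether this matches Cockpit's estimator exactly is an assumption, and the function name and inputs are hypothetical.

```python
def tic_trace(individual_gradients, hessian_diagonal):
    """Trace-based TIC-style ratio: Tr(C) / Tr(H).

    C is the non-central second moment of the per-sample gradients,
    C = (1 / B) * sum_i g_i g_i^T, so Tr(C) = (1 / B) * sum_i ||g_i||^2.
    Only the Hessian's diagonal is needed to form Tr(H).
    """
    batch_size = len(individual_gradients)
    second_moment_trace = (
        sum(sum(x * x for x in g) for g in individual_gradients) / batch_size
    )
    hessian_trace = sum(hessian_diagonal)
    return second_moment_trace / hessian_trace


# Hypothetical per-sample gradients and Hessian diagonal
per_sample_gradients = [[1.0, 0.0], [0.0, 1.0]]
hessian_diag = [2.0, 2.0]
tic_value = tic_trace(per_sample_gradients, hessian_diag)
print(tic_value)  # 0.25
```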

Gradient and Parameter Histogram

Cockpit shows a univariate histogram of the gradient elements. In addition, a combined histogram of parameter-gradient pairs provides a two-dimensional view of the network’s gradient and parameter values in a mini-batch.
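
The combined histogram can be pictured as counting how many (parameter, gradient) pairs fall into each cell of a grid. The following sketch does this with the standard library; the bin ranges, bin count, and values are made up for illustration and do not reflect Cockpit's defaults.

```python
from collections import Counter


def bin_index(value, lower, upper, num_bins):
    """Map a value to a bin in [0, num_bins - 1], clamping values outside the range."""
    if value >= upper:
        return num_bins - 1
    if value <= lower:
        return 0
    return int((value - lower) / (upper - lower) * num_bins)


def histogram_2d(parameters, gradients, param_range, grad_range, num_bins):
    """Count (parameter, gradient) pairs per grid cell."""
    counts = Counter()
    for p, g in zip(parameters, gradients):
        cell = (
            bin_index(p, *param_range, num_bins),
            bin_index(g, *grad_range, num_bins),
        )
        counts[cell] += 1
    return counts


# Hypothetical flattened parameter values and their gradients
parameters = [-0.9, -0.8, 0.1, 0.9]
gradients = [0.05, 0.06, 0.02, -0.08]

counts = histogram_2d(parameters, gradients, (-1.0, 1.0), (-0.1, 0.1), num_bins=2)
for cell, count in sorted(counts.items()):
    print(f"parameter bin {cell[0]}, gradient bin {cell[1]}: {count}")
```

Cells with many entries show where parameters and gradients concentrate, for example many parameters near zero with vanishing gradients.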

Final Words

Deep learning is mostly a black-box approach. High dimensionality, stochasticity, and non-convexity necessitate ongoing monitoring and tweaking, which can be a time-consuming and uncomfortable procedure. To address this, we explored Cockpit, a practical visual debugging tool for deep learning. It provides tools for real-time monitoring of the network’s internal dynamics during training.


Vijaysinh Lendave
Vijaysinh is an enthusiast in machine learning and deep learning. He is skilled in ML algorithms, data manipulation, handling and visualization, model building.
