Whether it is a market crash or a wrong diagnosis, the after-effects of a bad prediction can be irreversible. Tracking the development of a machine learning algorithm throughout its life cycle therefore becomes crucial. Neural network activations have an underlying compositional, combinatorial structure.

Visualising the behaviour of neural networks has been of great interest lately for two reasons: to glimpse what intelligence in its fundamental form looks like, and to analyse neurons in order to improve the network (for instance, by avoiding misclassification).

Previously, AI researchers started with individual neurons. In this method, details are gradually added to a noisy image until a noticeable excitement of the neuron is observed. But this method does not show how neurons interact with each other.

The past couple of years have seen a surge of interest in interpreting machine learning model representations and the decisions made by these models, with profound implications for research into explainable ML, causality, safe AI, social science, automatic scientific discovery, human-computer interaction (HCI), crowdsourcing, machine teaching, and AI ethics.

If safer AI systems are to be deployed, for example on self-driving cars, straightforward black-box models might not suffice.

Companies like Uber use neural networks for a variety of purposes, including detecting and predicting object motion for self-driving vehicles, responding more quickly to customers, and building better maps.

The machine learning team at Uber has tried to make neural networks more transparent by introducing a new metric to assess the learning routines of a network. They call it loss change allocation (LCA).
This work has also been accepted for the prestigious NeurIPS conference.

What Is Loss Change Allocation

The objective here is to measure how much each trainable parameter of the neural network "learns" at any point in time.

Think of "learning" as the changes to the network that drive the training-set loss down. Although individual batches drive the parameter updates in stochastic gradient descent, learning is measured with respect to the loss on the entire training set, not just a batch.

LCA allocates changes in loss over individual parameters, thereby measuring how much each parameter learns.

This measurement is accomplished by decomposing an approximate path integral along the training trajectory into per-parameter components, computed with a Runge-Kutta integrator.

This rich view shows which parameters are responsible for decreasing or increasing the loss during training, that is, which parameters "help" or "hurt" the network's learning, respectively. LCA may be summed over training iterations and/or over neurons, channels, or layers for increasingly coarse views.

Here are a few properties of LCA:

- If a parameter has zero gradient or does not move, it has zero LCA
- If a parameter has a non-zero gradient and moves in the negative gradient direction, it has a negative LCA. Such parameters are called "helping" because they decrease the loss at that iteration
- If a parameter moves in the positive direction of the gradient, it is "hurting" by increasing the loss.
This could be caused by a noisy mini-batch or by momentum carrying the parameter in the wrong direction.

Observations made using LCA:

- Barely over 50% of parameters help during any given iteration
- Some entire layers hurt overall, moving on average against the training gradient, possibly due to a phase lag in an oscillatory training process
- Increments in learning proceed in a synchronized manner across layers, often peaking on identical iterations

Complex machine learning models like deep neural networks have recently achieved outstanding predictive performance in a wide range of applications, including visual object recognition, speech perception, language modeling, and information retrieval.

Earlier research on these questions has discerned broad patterns of convergence in layer representations, but LCA makes it possible to study layerwise learning at a much finer scale.

A useful property of the LCA method is that it can be applied to any loss function, which gives a more granular view into the training process and makes it possible to identify when each layer learns concepts useful for classification.

Know more about LCA here.
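To make the decomposition concrete, here is a minimal sketch of the idea behind LCA: integrate each parameter's gradient along its own update path and accumulate the result per parameter, so that the per-parameter totals sum to the overall change in loss. This is an illustration only, not Uber's implementation: the quadratic loss, learning rate, and training loop are hypothetical choices, and a simple trapezoid rule stands in for the Runge-Kutta integrator used in the paper.

```python
import numpy as np

def loss(theta):
    # Toy stand-in for the full training-set loss: a quadratic bowl.
    return 0.5 * np.sum(theta ** 2)

def grad(theta):
    # Gradient of the quadratic loss above.
    return theta

rng = np.random.default_rng(0)
theta = rng.normal(size=5)   # hypothetical trainable parameters
lr = 0.1
lca = np.zeros_like(theta)   # one running LCA value per parameter

loss_start = loss(theta)
for _ in range(100):
    g = grad(theta)
    delta = -lr * g          # plain gradient-descent update
    # Allocate this step's loss change to each parameter by integrating
    # grad_i * d(theta_i) along the step (trapezoid rule here; the paper
    # uses a Runge-Kutta integrator for higher accuracy).
    lca += 0.5 * (g + grad(theta + delta)) * delta
    theta = theta + delta
loss_end = loss(theta)

# The per-parameter LCAs sum to the total change in loss; negative
# entries are parameters that "helped" (decreased the loss).
print(np.allclose(np.sum(lca), loss_end - loss_start))  # True
```

Because every parameter here moves exactly along its negative gradient with no noise or momentum, all five LCA values come out negative — every parameter "helps". In real stochastic training, noisy mini-batches and momentum produce the positive, "hurting" entries described above.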