Deep learning applications usually involve complex optimisation problems that are difficult to solve analytically. Often the objective function has no closed-form expression, so it can only be evaluated at chosen points and its gradients cannot be computed directly. This is where Zeroth-Order optimisation comes in.
Problems of this kind fall into the category of Zeroth-Order (ZO) optimisation of black-box models, where explicit expressions of the gradients are difficult to compute or infeasible to obtain.
Researchers from IBM Research and MIT-IBM Watson AI Lab discussed the topic of Zeroth-Order optimisation at the ongoing Computer Vision and Pattern Recognition (CVPR) 2020 conference.
In this article, we will take a dive into what Zeroth-Order optimisation is and how this method can be applied in complex deep learning applications.
Behind ZO Optimisation
Zeroth-Order (ZO) optimisation is a subset of gradient-free optimisation that arises in many signal processing and machine learning applications. ZO optimisation methods are the gradient-free counterparts of first-order (FO) optimisation techniques: they approximate the full or stochastic gradients using estimates built only from function values.
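As a rough illustration of a function-value-based gradient estimate, the sketch below assumes a random-direction forward-difference scheme, which is one common choice; the article does not say which estimator the researchers use. The names `zo_gradient`, `mu` and `num_directions` are hypothetical and not taken from any library.

```python
import random

def zo_gradient(f, x, mu=1e-4, num_directions=20, rng=None):
    """Estimate the gradient of f at x using function values only.

    Assumption: Gaussian random directions with a forward difference.
    """
    rng = rng or random.Random(0)
    fx = f(x)  # one query at the current point
    grad = [0.0] * len(x)
    for _ in range(num_directions):
        # Draw a random direction u and query f at a slightly shifted point.
        u = [rng.gauss(0.0, 1.0) for _ in x]
        shifted = [xi + mu * ui for xi, ui in zip(x, u)]
        slope = (f(shifted) - fx) / mu  # directional derivative estimate
        # Average slope * u over all directions.
        grad = [g + slope * ui / num_directions for g, ui in zip(grad, u)]
    return grad

# Toy objective treated as a black box; its true gradient is 2 * x.
def sphere(x):
    return sum(xi * xi for xi in x)

estimate = zo_gradient(sphere, [1.0, -2.0, 3.0], num_directions=20000)
```

The estimate is noisy for a small number of directions and gets closer to the true gradient as more directions are averaged, at the cost of one extra function query per direction.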
Derivative-Free methods for black-box optimisation have been studied by the optimisation community for many years. However, conventional Derivative-Free optimisation methods have two main shortcomings: they are difficult to scale to large problems, and they lack convergence rate analysis.
ZO optimisation has three main advantages over conventional Derivative-Free optimisation methods:
- Ease of implementation, with only small modifications to commonly used gradient-based algorithms
- Computationally efficient approximations to derivatives when they are difficult to compute
- Comparable convergence rates to first-order algorithms
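To make the first point concrete, here is a minimal sketch of plain gradient descent in which the exact gradient is swapped for an estimate built from function queries. It assumes a coordinate-wise central-difference estimator and a toy quadratic objective; `coordinate_zo_gradient`, `zo_gradient_descent` and `black_box` are hypothetical names chosen for illustration.

```python
def coordinate_zo_gradient(f, x, mu=1e-5):
    """Central-difference estimate: two function queries per coordinate."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += mu
        backward[i] -= mu
        grad.append((f(forward) - f(backward)) / (2.0 * mu))
    return grad

def zo_gradient_descent(f, x0, lr=0.1, steps=200, mu=1e-5):
    """Same update rule as first-order gradient descent,
    except the gradient comes from function values."""
    x = list(x0)
    for _ in range(steps):
        g = coordinate_zo_gradient(f, x, mu)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

# Toy objective we pretend we can only evaluate; its minimum is at (3, -1).
def black_box(x):
    return (x[0] - 3.0) ** 2 + 2.0 * (x[1] + 1.0) ** 2

x_min = zo_gradient_descent(black_box, [0.0, 0.0])
```

The only change from a first-order loop is the line that computes `g`. The coordinate-wise estimator needs two queries per dimension, which is why randomised estimators are usually preferred for high-dimensional problems.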
Applications Of ZO Optimisation
ZO optimisation has drawn increasing attention due to its success in solving emerging problems in signal processing, machine learning and deep learning. It serves as a powerful and practical tool for evaluating the adversarial robustness of deep learning systems.
According to Pin-Yu Chen, a researcher at IBM Research, Zeroth-order (ZO) optimisation achieves gradient-free optimisation by approximating the full gradient via efficient gradient estimators.
Some important recent applications include the generation of prediction-evasive, black-box adversarial attacks on deep neural networks, the generation of model-agnostic explanations from machine learning systems, and the design of gradient or curvature regularised robust ML systems in a computationally efficient manner. In addition, the use cases span automated ML and meta-learning, online network management with limited computation capacity, parameter inference of black-box or complex systems, and bandit optimisation, in which a player receives partial feedback in terms of loss function values revealed by her adversary.
ZO Optimisation For Adversarial Robustness In Deep Learning
On the application of ZO optimisation to generating prediction-evasive adversarial examples that fool DL models, the researchers stated that most studies of the adversarial vulnerability of deep learning had been restricted to the white-box setting, where the adversary has complete access to and knowledge of the target system, such as a deep neural network.
In most cases, however, the internal states, configurations and operating mechanisms of deep learning systems are not revealed to practitioners; Google Cloud Vision API is one example. This gives rise to black-box adversarial attacks, where the adversary's only mode of interaction with the system is submitting inputs and receiving the corresponding predicted outputs.
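The query-only setting can be sketched on a toy problem. The example below is an assumption-laden illustration, not the researchers' attack: the "model" is a small logistic classifier whose weights are hidden inside a closure, and the attacker uses only its output probabilities to estimate a gradient and take signed steps within a perturbation budget `epsilon`. All names (`make_black_box`, `zo_attack`, `sign`) and parameter values are hypothetical.

```python
import math
import random

def make_black_box(weights, bias):
    """Return a query-only model: input list -> probability of class 1.
    The attacker never reads weights or bias directly."""
    def query(x):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return query

def sign(v):
    return (v > 0) - (v < 0)

def zo_attack(query, x, epsilon=0.5, lr=0.1, steps=60,
              mu=1e-3, num_directions=50, seed=0):
    """Lower the predicted probability using input/output queries only."""
    rng = random.Random(seed)
    x_adv = list(x)
    for _ in range(steps):
        base = query(x_adv)
        grad = [0.0] * len(x)
        for _ in range(num_directions):
            # Random-direction finite difference on the model output.
            u = [rng.gauss(0.0, 1.0) for _ in x]
            shifted = [a + mu * ui for a, ui in zip(x_adv, u)]
            slope = (query(shifted) - base) / mu
            grad = [g + slope * ui / num_directions for g, ui in zip(grad, u)]
        # Signed descent step, then clip back into the epsilon-box around x.
        x_adv = [
            min(max(a - lr * sign(g), xi - epsilon), xi + epsilon)
            for a, g, xi in zip(x_adv, grad, x)
        ]
    return x_adv

model = make_black_box([2.0, -1.0, 0.5], 0.1)  # hidden parameters
x_clean = [0.4, -0.3, 0.2]
x_adv = zo_attack(model, x_clean)
p_clean, p_adv = model(x_clean), model(x_adv)
```

Each step costs `num_directions + 1` queries, so the query budget is usually the main practical constraint for attacks of this kind against real services.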
ZO optimisation serves as a powerful and practical tool for evaluating adversarial robustness of deep learning as well as machine learning systems. ZO-based methods for exploring vulnerabilities of deep learning to black-box adversarial attacks are able to reveal the most susceptible features.
Such ZO-based methods can be as effective as state-of-the-art white-box attacks, despite having access only to the inputs and outputs of the targeted deep neural networks. ZO optimisation can also generate explanations and provide interpretations of prediction results in a gradient-free and model-agnostic manner.
The interest in ZO optimisation has grown rapidly over the last few decades. According to the researchers, ZO optimisation has been increasingly embraced for solving big data and machine learning problems when explicit expressions of the gradients are difficult to compute or infeasible to obtain.