Pre-trained models, free datasets, APIs and all things open source have made ML development more affordable. The flip side is that these freely available datasets and pre-trained models are also open to malicious players. Data poisoning, weight poisoning and similar attacks are well-documented phenomena in the machine learning space.
Detecting neural Trojan attacks in an unknown DNN is difficult: backdoors are stealthy, which makes them hard to identify through functional testing (which uses test accuracy as the detection criterion), and only limited information about the queried model can be obtained during Trojan detection. A clean training dataset or a gold reference model might not be available in real-world settings.
According to a survey by researchers at the University of Maryland and other top institutes, dataset creators often rely on data-hungry neural network models built by harvesting information from anonymous and unverified sources on the web.
“Outsiders can passively manipulate datasets by placing corrupted data on the web and waiting for data harvesting bots to collect them.”
In their report, they list top defense mechanisms to avoid various types of dataset security attacks:
Trigger Reconstruction
Backdoors can stay hidden indefinitely until activated by an input, and they present a serious security risk to biometric authentication systems, self-driving cars and other safety-critical applications. A backdoored model is trained to assign an adversarial target label whenever the trigger is present, even if only a small number of pixels are manipulated. Trigger reconstruction recovers the backdoor trigger by computing adversarial perturbations for each target label, and then selecting the smallest perturbation out of all labels.
Popular methods: Neural Cleanse, Deep Inspect, TABOR.
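The selection step can be sketched as an outlier test on the reverse-engineered perturbation norms, in the spirit of Neural Cleanse's anomaly index. Everything below, including the norms, the MAD-based test and the threshold, is an illustrative assumption rather than the survey's actual implementation:

```python
# Toy sketch of the detection step: for each candidate target label we assume
# we already have the L1 norm of the smallest perturbation that flips clean
# inputs to that label. A label whose norm is an abnormally small outlier,
# measured with the median absolute deviation (MAD), is flagged as backdoored.

def flag_backdoored_labels(trigger_norms, threshold=2.0):
    """Return labels whose reverse-engineered trigger norm is anomalously small."""
    norms = sorted(trigger_norms.values())
    median = norms[len(norms) // 2]
    # Consistency constant 1.4826 makes MAD comparable to a standard deviation.
    mad = 1.4826 * sorted(abs(n - median) for n in norms)[len(norms) // 2]
    flagged = {}
    for label, norm in trigger_norms.items():
        anomaly_index = (median - norm) / mad if mad > 0 else 0.0
        if anomaly_index > threshold:  # far smaller than the typical label
            flagged[label] = round(anomaly_index, 2)
    return flagged

# Label 3 needs a far smaller perturbation than the rest: likely backdoored.
norms = {0: 95.0, 1: 102.5, 2: 88.0, 3: 12.0, 4: 99.0, 5: 91.5}
print(flag_backdoored_labels(norms))  # only label 3 stands out
```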
Trigger Agnostic Strategies
The researchers wrote that combining pruning with fine-tuning can remove backdoors while preserving the overall model accuracy, which is still an open problem for many strategies. Trigger-agnostic methods also promise to maintain accuracy even when an adversary tries to manipulate pruning. To better preserve the model's accuracy on clean data, the researchers point to the watermark removal framework REfiT, which leverages elastic weight consolidation: this defensive fine-tuning slows the learning of weights important to the original task while updating the other weights that are likely responsible for memorizing watermarks.
Popular methods: REfiT, WILD.
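The pruning half of the pruning-plus-fine-tuning idea can be illustrated with a small sketch. The intuition behind fine-pruning-style defenses is that backdoor behaviour tends to live in neurons that stay dormant on clean inputs, so pruning the least-activated neurons removes the backdoor while barely touching clean accuracy. The activations and prune ratio below are made-up assumptions, not values from the survey:

```python
# Rank neurons by their mean activation on clean data and zero out the most
# dormant fraction; the resulting mask would be applied to the layer's
# outputs before fine-tuning the model on clean data.

def prune_dormant_neurons(clean_activations, prune_ratio=0.25):
    """clean_activations: per-neuron lists of activations on clean inputs.
    Returns a 0/1 mask with the most dormant neurons zeroed out."""
    n = len(clean_activations)
    mean_act = [sum(a) / len(a) for a in clean_activations]
    n_prune = int(n * prune_ratio)
    # Indices of the neurons with the lowest mean clean-data activation.
    dormant = set(sorted(range(n), key=lambda i: mean_act[i])[:n_prune])
    return [0 if i in dormant else 1 for i in range(n)]

acts = [
    [0.9, 1.1, 0.8],    # active on clean data
    [0.0, 0.02, 0.01],  # dormant on clean data: likely a backdoor neuron
    [0.7, 0.6, 0.9],
    [1.2, 1.0, 1.1],
]
print(prune_dormant_neurons(acts))  # neuron 1 is pruned
```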
Randomized Smoothing
Randomized smoothing was originally proposed to defend against evasion attacks. The technique is to replace the base model's prediction on each data point with the most popular prediction in the vicinity of that data point, yielding a smoothed version of the model. The researchers state that the outputs of the smoothed model can be computed efficiently regardless of the perturbations.
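The "popular prediction in the vicinity" idea can be sketched as a majority vote over Gaussian-noised copies of the input. The brittle base classifier and all constants below are toy assumptions for illustration only:

```python
import random
from collections import Counter

def base_classifier(x):
    # Brittle toy model: a narrow "pocket" around x = 0.2 flips the label,
    # standing in for a small adversarial or backdoor perturbation region.
    if abs(x - 0.2) < 0.02:
        return 0
    return 1 if x > 0.0 else 0

def smoothed_classifier(x, sigma=0.5, n_samples=1000, seed=0):
    # Replace the base prediction at x with the most popular prediction
    # among Gaussian-perturbed neighbours of x.
    rng = random.Random(seed)
    votes = Counter(base_classifier(x + rng.gauss(0.0, sigma))
                    for _ in range(n_samples))
    return votes.most_common(1)[0][0]

print(base_classifier(0.2))      # 0: the narrow pocket flips the base model
print(smoothed_classifier(0.2))  # the neighbourhood majority vote recovers 1
```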
Differential Privacy
Differential privacy offers a mathematical framework for anonymising data: a high-assurance, analytic means of ensuring that sensitive data is used in a privacy-preserving manner. Even companies like Apple leverage differential privacy techniques to collect feedback from their users safely. In the dataset security setting, the objective of differential privacy is to ensure models will not be disproportionately affected by poisoned samples. The core idea is that if the output of the algorithm remains essentially unchanged when one individual input point is added or removed, the privacy of each individual is preserved.
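The standard way this idea is applied to training is DP-SGD-style gradient processing: clip each per-example gradient so no single (possibly poisoned) point can dominate, then add Gaussian noise to the sum. The clip norm, noise multiplier and gradients below are illustrative hyperparameters, not values from the survey:

```python
import math
import random

def clip_l2(grad, clip_norm=1.0):
    # Scale the gradient down so its L2 norm is at most clip_norm, bounding
    # any single example's influence on the update.
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def private_gradient(per_example_grads, clip_norm=1.0, noise_mult=1.1, seed=0):
    rng = random.Random(seed)
    clipped = [clip_l2(g, clip_norm) for g in per_example_grads]
    summed = [sum(col) for col in zip(*clipped)]
    # Gaussian noise calibrated to the clipping bound.
    sigma = noise_mult * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [x / len(per_example_grads) for x in noisy]  # averaged update

grads = [[3.0, 4.0], [0.1, -0.2], [-5.0, 12.0]]  # one gradient per example
print(private_gradient(grads))
```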
Defenses For Federated Learning
Federated learning pushes the boundary further by sharing model updates across devices rather than the raw data itself. The main drivers behind FL are privacy and confidentiality concerns and regulatory compliance requirements. To avoid attacks on FL regimes, the researchers discuss robust federated aggregation algorithms that attempt to nullify the effects of attacks while aggregating client updates.
These algorithms are designed to identify malicious updates and reduce their weights (avoiding weight poisoning). Another method is to compute aggregates in a way that is resistant to poisons. In addition to robust federated aggregation to mitigate poisoning attacks, the researchers also recommend clipping the norm of model updates and adding Gaussian noise to mitigate backdoor attacks based on the model-replacement paradigm.
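These recommendations can be combined in one short sketch: clip each client update's norm, aggregate with a coordinate-wise median (a poison-resistant aggregate), and add Gaussian noise. The client updates and hyperparameters below are illustrative assumptions, and the coordinate-wise median is just one example of a robust aggregation rule:

```python
import math
import random

def clip_update(update, max_norm=1.0):
    # Norm clipping blunts "boosted" updates from model-replacement attacks.
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]

def robust_aggregate(client_updates, max_norm=1.0, noise_std=0.01, seed=0):
    rng = random.Random(seed)
    clipped = [clip_update(u, max_norm) for u in client_updates]
    aggregated = []
    for coords in zip(*clipped):
        # Coordinate-wise median resists a minority of poisoned clients.
        median = sorted(coords)[len(coords) // 2]
        aggregated.append(median + rng.gauss(0.0, noise_std))
    return aggregated

honest = [[0.10, -0.20], [0.12, -0.18], [0.09, -0.22]]
malicious = [[50.0, 50.0]]  # boosted update from a model-replacement attack
print(robust_aggregate(honest + malicious))  # stays close to the honest updates
```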
While the aforementioned defense strategies have been promising, the researchers point to persisting problems they expect future work to resolve:
- Defenses on problems other than image classification.
- Avoiding poisoning while maintaining accuracy.
- Defense without access to training protocol.
Read the full survey here.