Prakash A, Moran N, Garber S, et al. Deflecting adversarial attacks with pixel deflection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 8571-8580.
1. Overview
1.1. Motivation
- image classifiers tend to be robust to natural noise
- adversarial attacks tend to be agnostic to object location
- most attacks search the entire image plane for adversarial perturbations, without regard for the location of the image content
This paper proposes pixel deflection + wavelet denoising to defend against adversarial attacks
- force the image to match natural image statistics
- use semantic maps to obtain a better pixel to update
- locally corrupt the image by redistributing pixel values, a process the authors term pixel deflection
- a wavelet-based denoising operation then softens the corruption
1.2. Related Work
1.2.1. Attack
- FGSM
- IGSM
- L-BFGS. minimize L2 distance between the image and adversarial example
- Jacobian-based Saliency Map Attack (JSMA). modify the pixels which are most salient, targeted attack
- DeepFool. untargeted; approximates the classifier as a linear decision boundary, then finds the smallest perturbation needed to cross that boundary
- Carlini & Wagner (C&W). formulated over Z_k, the logits of the model for a given class k (see the formulas after this list)
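For reference, the two best-known formulations can be written out explicitly (standard notation, not copied from the paper): FGSM takes a single step of size ε along the sign of the loss gradient, and the targeted C&W loss pushes the target logit Z_t above every other logit by a margin κ.

```latex
% FGSM: single-step perturbation along the sign of the loss gradient
x_{adv} = x + \epsilon \cdot \operatorname{sign}\!\big(\nabla_x J(\theta, x, y)\big)

% C&W (targeted variant): make the target class t win by margin \kappa
f(x') = \max\Big(\max_{k \neq t} Z_k(x') - Z_t(x'),\ -\kappa\Big)
```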
1.2.2. Defense
- ensemble
- distillation
- transformation
- image quilting + total variation minimization (TVM)
- foveation-based mechanism. crop the image around the object and then scale it back to the original size
- random crop + random pad
2. Pixel Deflection
- most deep classifiers are robust to the presence of natural noise, such as sensor noise
2.1. Algorithm
- randomly sample a pixel
- replace it with another randomly selected pixel from within a small square neighbourhood
- even changing as much as 1% of the original pixels does not alter the classification of a clean image (see the sketch below)
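A minimal NumPy sketch of this procedure; the parameter names (n_deflections, window) are illustrative rather than the paper's.

```python
import numpy as np

def pixel_deflection(img, n_deflections=500, window=10, rng=None):
    """Replace randomly chosen pixels with random pixels drawn from a small
    square neighbourhood around them.

    img: H x W x C array; a deflected copy is returned.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = img.copy()
    h, w = img.shape[:2]
    for _ in range(n_deflections):
        # pixel to overwrite
        r, c = rng.integers(0, h), rng.integers(0, w)
        # random neighbour inside a (2*window+1)^2 square, clipped to the image
        rr = int(np.clip(r + rng.integers(-window, window + 1), 0, h - 1))
        cc = int(np.clip(c + rng.integers(-window, window + 1), 0, w - 1))
        out[r, c] = img[rr, cc]
    return out
```

On a 224x224 input, 500 deflections touch roughly 1% of the pixels, which matches the observation above that such a change leaves clean classifications intact.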
2.2. Distribution of Attacks
2.3. Targeted Pixel Deflection
- In natural images, many pixels do not correspond to a relevant semantic object and are therefore not salient to classification
2.3.1. Robust Activation Map
- an adversary which successfully changes the most likely class tends to leave the rest of the top-k classes unchanged
- 38% of the time, the predicted class of an adversarial image is the second-highest class of the model on the clean image (a sketch of the resulting map follows below)
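Since the lower-ranked predictions are mostly preserved, a map built from several top-k classes is more stable under attack than the map of the (possibly flipped) top-1 class. Below is a rough NumPy sketch of that idea using plain class activation maps combined with exponentially decaying weights; the helper names and the exact weighting are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """CAM for one class: classifier-weighted sum of the final conv features.

    features:   C x H x W activations of the last convolutional layer
    fc_weights: num_classes x C weight matrix of the classification layer
    """
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))  # H x W
    cam -= cam.min()
    return cam / (cam.max() + 1e-8)

def robust_activation_map(features, fc_weights, topk_classes, decay=0.5):
    """Combine the maps of the top-k predicted classes, giving lower-ranked
    classes exponentially smaller weight (the decay value is an assumption)."""
    maps = np.stack([class_activation_map(features, fc_weights, k)
                     for k in topk_classes])
    weights = decay ** np.arange(len(topk_classes))
    rmap = np.tensordot(weights / weights.sum(), maps, axes=([0], [0]))
    return rmap / (rmap.max() + 1e-8)
```

For targeted pixel deflection, pixels with low values in this map (background) can then be deflected with higher probability than pixels in salient regions.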
3. Wavelet Denoising
3.1. Hard Thresholding
- all coefficients with magnitude below the threshold are set to zero
3.2. Soft Thresholding
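Soft thresholding keeps the same support as the hard rule but additionally shrinks the surviving coefficients toward zero, which tends to give smoother reconstructions. Written coefficient-wise for a threshold T:

```latex
\text{hard: } \hat{w} =
  \begin{cases} w, & |w| \ge T \\ 0, & |w| < T \end{cases}
\qquad
\text{soft: } \hat{w} = \operatorname{sign}(w)\,\max(|w| - T,\ 0)
```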
3.3. Adaptive Thresholding
- VisuShrink. a single universal threshold σ·sqrt(2 ln N) for an image with N pixels and noise level σ
- BayesShrink. models the wavelet coefficients of each subband as a Generalized Gaussian Distribution (GGD) and derives a separate threshold per subband (see the sketch below)
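A small NumPy sketch of the two threshold rules; σ is estimated from the finest-scale diagonal detail coefficients with the usual median estimator, and the function names are mine.

```python
import numpy as np

def estimate_noise_sigma(detail_hh):
    """Robust noise estimate from the finest diagonal subband: median(|d|)/0.6745."""
    return np.median(np.abs(detail_hh)) / 0.6745

def visushrink_threshold(sigma, n_pixels):
    """Universal threshold sigma * sqrt(2 ln N), shared by the whole image."""
    return sigma * np.sqrt(2.0 * np.log(n_pixels))

def bayesshrink_threshold(subband, sigma):
    """Subband-adaptive threshold sigma^2 / sigma_x, where sigma_x estimates
    the std of the noise-free signal in this subband."""
    sigma_x = np.sqrt(max(np.var(subband) - sigma ** 2, 1e-12))
    return sigma ** 2 / sigma_x
```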
4. Methods
- corrupt the image with pixel deflection
- soften the impact of pixel deflection
- convert the image to YCbCr, which works better for wavelet denoising than RGB
- project the image into the wavelet domain (db1 is used, but db2 and haar give similar results)
- soft-threshold the wavelet coefficients using BayesShrink
- convert the image back to RGB (the end-to-end pipeline is sketched below)
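Putting the steps together, a minimal sketch built on scikit-image: `denoise_wavelet` with `convert2ycbcr=True` performs the YCbCr round trip and BayesShrink soft thresholding internally, and the deflection loop mirrors the sketch from Section 2.1. Parameter values are illustrative, not the paper's tuned settings.

```python
import numpy as np
from skimage.restoration import denoise_wavelet

def pixel_deflection(img, n_deflections=500, window=10, rng=None):
    """Replace randomly chosen pixels with random neighbours (Sec. 2.1 sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    out = img.copy()
    h, w = img.shape[:2]
    for _ in range(n_deflections):
        r, c = rng.integers(0, h), rng.integers(0, w)
        rr = int(np.clip(r + rng.integers(-window, window + 1), 0, h - 1))
        cc = int(np.clip(c + rng.integers(-window, window + 1), 0, w - 1))
        out[r, c] = img[rr, cc]
    return out

def defend(img_rgb):
    """Pixel deflection followed by wavelet denoising in YCbCr.

    img_rgb: H x W x 3 float array with values in [0, 1].
    """
    deflected = pixel_deflection(img_rgb)
    denoised = denoise_wavelet(
        deflected,
        wavelet="db1",          # db2 / haar behave similarly (see above)
        mode="soft",            # soft thresholding
        method="BayesShrink",   # subband-adaptive thresholds
        convert2ycbcr=True,     # denoise in YCbCr, convert back to RGB internally
        channel_axis=-1,        # colour channels last (older skimage: multichannel=True)
    )
    return np.clip(denoised, 0.0, 1.0)
```

The defended image is then passed to the unmodified classifier; the defence itself requires no retraining.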