Song Y, Kim T, Nowozin S, et al. PixelDefend: Leveraging generative models to understand and defend against adversarial examples[J]. arXiv preprint arXiv:1710.10766, 2017.
- adversarial examples mainly lie in low-probability regions of the training distribution
This paper proposes the PixelDefend method
- using statistical hypothesis testing, they find modern neural density models are good at detecting imperceptible perturbations
- PixelDefend improves accuracy from 63% to 84% on Fashion MNIST
- and from 32% to 70% on CIFAR-10
- show that generative models can be used to detect adversarially perturbed images, and observe that most adversarial examples lie in low-probability regions
- introduce a novel family of defense methods; PixelDefend is one member of this family
- CIFAR-10 performance
- Random Perturbation
- BIM (Basic Iterative Method)
- Adversarial Training.
- FGSM adversarial examples (most commonly used)
- training with BIM has seen success on small datasets, but failures have been reported on larger ones
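BIM is just iterated FGSM with clipping to an ε-ball. A minimal sketch on a toy logistic-regression model (the model, step size `alpha`, and `eps` are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def bim_attack(x, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """Basic Iterative Method on a toy logistic-regression model:
    repeated signed-gradient ascent on the loss, clipped both to an
    eps-ball around the original input and to the valid pixel range."""
    x_adv = x.astype(float).copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))  # sigmoid prediction
        grad = (p - y) * w                          # d(cross-entropy)/dx
        x_adv = x_adv + alpha * np.sign(grad)       # move up the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)    # stay in the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)            # valid pixel range
    return x_adv
```

For a deep network the gradient comes from backprop instead of the closed form above, but the update loop is identical.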
- Label Smoothing (defensive distillation).
- convert one-hot labels to soft targets
- correct class 1-ε; wrong class ε/(N-1)
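The soft-target conversion can be sketched directly (`eps` and the array shapes are illustrative):

```python
import numpy as np

def smooth_labels(y_onehot, eps=0.1):
    """Label smoothing: the correct class gets 1 - eps and each of the
    N - 1 wrong classes gets eps / (N - 1)."""
    n_classes = y_onehot.shape[-1]
    return y_onehot * (1.0 - eps) + (1.0 - y_onehot) * eps / (n_classes - 1)
```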
- Feature Squeezing.
- reduces the color range from [0, 255] to a smaller value, then smooths the image with a median filter
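A minimal sketch of the two squeezing steps (the bit-depth choice and the naive filter loop are assumptions for illustration; a real implementation would use e.g. `scipy.ndimage.median_filter`):

```python
import numpy as np

def squeeze_bits(img, bits=3):
    """Reduce 8-bit color depth ([0, 255]) to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(img / 255.0 * levels) / levels * 255.0

def median_smooth(img, k=3):
    """Naive k x k median filter (border pixels use a cropped window)."""
    r = k // 2
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            win = img[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            out[i, j] = np.median(win, axis=(0, 1))
    return out
```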
- Fashion MNIST
- bits per dimension
the distribution of log-likelihoods
p-values (computed with the PixelCNN)
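The detection statistic can be sketched as a rank-based p-value over training log-likelihoods (the helper names and the nats-to-bits conversion are my assumptions; the log-likelihoods themselves would come from a trained PixelCNN):

```python
import numpy as np

def bits_per_dim(log_likelihood_nats, num_dims):
    """Convert a log-likelihood in nats to bits per dimension."""
    return -log_likelihood_nats / (num_dims * np.log(2.0))

def detection_p_value(test_ll, train_lls):
    """Rank-based p-value: the fraction of training images whose
    log-likelihood under the density model is at or below the test
    image's. A small p-value flags a low-probability, likely
    adversarial input."""
    train_lls = np.asarray(train_lls)
    rank = np.sum(train_lls <= test_ll)
    return (rank + 1.0) / (train_lls.size + 1.0)
```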
- trade-off: choose ε_defend to overestimate ε_attack
- ε_defend = 0 if the input image's probability clears a threshold (likely clean)
- otherwise ε_defend uses the manually chosen setting
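The purification step itself can be sketched as greedy per-pixel decoding (the `pixelcnn_probs` interface is a hypothetical stand-in for a trained PixelCNN; the real model decodes each color channel in raster-scan order):

```python
import numpy as np

def pixel_defend(x, pixelcnn_probs, eps_defend=16):
    """Greedy purification sketch: visit pixels in raster order and
    replace each value with the most likely value under the model,
    restricted to [x - eps_defend, x + eps_defend] around the ORIGINAL
    pixel value. `pixelcnn_probs(x_so_far, i)` is an assumed interface
    returning the 256-way distribution for flat pixel index i,
    conditioned on the already-purified preceding pixels."""
    x_pur = x.copy()
    for i in range(x.size):
        probs = pixelcnn_probs(x_pur, i)
        lo = max(0, int(x.flat[i]) - eps_defend)
        hi = min(255, int(x.flat[i]) + eps_defend)
        x_pur.flat[i] = lo + int(np.argmax(probs[lo:hi + 1]))
    return x_pur
```

Restricting the argmax to the ε_defend window is what keeps the purified image visually close to the input while moving it toward high-probability regions.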
- Attacking PixelDefend with BIM: unrolling the PixelCNN makes the computation graph very deep, leading to vanishing gradients; it is also time-consuming (about 10 hours to generate 100 adversarial images on one TITAN Xp GPU)
- the optimization problem was not amenable to gradient descent
- the PixelCNN and the classifier are trained separately and have independent parameters
p-values after defense