(ICLR 2018) PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples

Song Y, Kim T, Nowozin S, et al. Pixeldefend: Leveraging generative models to understand and defend against adversarial examples[J]. arXiv preprint arXiv:1710.10766, 2017.

1. Overview

1.1. Motivation

  • adversarial examples mainly lie in the low probability regions of the training distribution

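The paper proposes PixelDefend, a defense that purifies adversarial inputs by moving them back toward the training distribution.

  • statistical hypothesis testing shows that modern neural density models are good at detecting imperceptible perturbations
  • accuracy under the strongest attack rises from 63% to 84% on Fashion MNIST
  • and from 32% to 70% on CIFAR-10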

1.2. Contribution

  • show that generative models can be used to detect adversarially perturbed images, and observe that most adversarial examples lie in low probability regions
  • introduce a novel family of defense methods; PixelDefend is one member of this family
  • CIFAR-10 performance: accuracy under the strongest attack improves from 32% to 70%

1.3. Attack Methods

  • Random Perturbation
  • FGSM
  • BIM (Basic Iterative Method)
  • DeepFool
  • CW (Carlini-Wagner)
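
The two gradient-sign attacks in this list (FGSM and BIM) can be illustrated on a toy model. The sketch below is not from the paper: it assumes a linear logistic classifier as a stand-in for a real network, inputs scaled to [0, 1], and hypothetical function names. FGSM takes one step of size eps along the sign of the input gradient; BIM repeats smaller steps of size alpha and clips back into the eps-ball after each one.

```python
import numpy as np

def loss_and_grad(w, x, y):
    """Logistic loss of a toy linear classifier and its gradient w.r.t. the input x.

    y is +1 or -1. The linear model is only a stand-in for a real network.
    """
    margin = y * np.dot(w, x)
    loss = np.log1p(np.exp(-margin))
    grad_x = -y * w / (1.0 + np.exp(margin))
    return loss, grad_x

def fgsm(w, x, y, eps):
    """Single-step attack: move every input dimension by eps in the loss-increasing direction."""
    _, grad = loss_and_grad(w, x, y)
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

def bim(w, x, y, eps, alpha, steps):
    """Iterative attack: repeated small sign steps, clipped to the eps-ball around x."""
    x_adv = x.copy()
    for _ in range(steps):
        _, grad = loss_and_grad(w, x_adv, y)
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within eps of the clean input
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay a valid image
    return x_adv
```

Both attacks keep the perturbation within eps in the max norm, which is the quantity written as ε_attack later in these notes.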

1.4. Defense Methods

1.4.1. Change Network & training procedure

  • Adversarial Training.
    • FGSM adversarial examples (most commonly used)
    • training with BIM examples has succeeded on small datasets, but failures have been reported on larger ones
  • Label Smoothing (defensive distillation).
    • convert one-hot labels to soft targets
    • the correct class gets 1-ε; each of the other N-1 classes gets ε/(N-1)
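
The soft-target rule above can be written in a few lines. This is a minimal sketch with a hypothetical helper name (`smooth_labels`); N is the number of classes and ε is the smoothing amount.

```python
import numpy as np

def smooth_labels(labels, num_classes, eps):
    """Turn integer class labels into soft targets.

    The correct class gets 1 - eps; each of the other N - 1 classes gets eps / (N - 1).
    """
    labels = np.asarray(labels)
    targets = np.full((labels.shape[0], num_classes), eps / (num_classes - 1))
    targets[np.arange(labels.shape[0]), labels] = 1.0 - eps
    return targets

# Two examples with labels 0 and 2, ten classes, eps = 0.1.
targets = smooth_labels([0, 2], num_classes=10, eps=0.1)
```

Each row still sums to 1, so the targets remain a valid probability distribution.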

1.4.2. Modify Adversarial Examples

  • Feature Squeezing.
    • reduces the color range from [0, 255] to a smaller range (lower bit depth), then smooths the image with a median filter
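
A rough sketch of the two squeezing steps, using only NumPy. The bit depth, the 3x3 window, and the function names are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np

def reduce_bit_depth(image, bits):
    """Quantize a uint8 image to 2**bits levels per channel, mapped back to [0, 255]."""
    levels = 2 ** bits - 1
    scaled = np.round(image.astype(np.float64) / 255.0 * levels)
    return np.round(scaled / levels * 255.0).astype(np.uint8)

def median_filter_3x3(image):
    """3x3 median filter for an (H, W) or (H, W, C) array, with edge-replicated borders."""
    pad = [(1, 1), (1, 1)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad, mode="edge")
    h, w = image.shape[:2]
    # Collect the nine shifted views of the image and take the per-pixel median.
    shifts = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.median(np.stack(shifts, axis=0), axis=0).astype(image.dtype)

def feature_squeeze(image, bits=4):
    """Reduce color depth first, then smooth with the median filter."""
    return median_filter_3x3(reduce_bit_depth(image, bits))
```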

1.5. Datasets

  • Fashion MNIST
  • CIFAR-10

1.6. Model

  • ResNet
  • VGG

1.7. Detecting Adversarial Examples

  • bits per dimension

  • the distribution of log-likelihood

  • p-values (computed from PixelCNN likelihoods)
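
The detection quantities listed above can be sketched as follows. The sketch assumes that the log-likelihoods (in nats) come from a trained PixelCNN, which is not included here, and that the p-value is a rank-based one: the share of training images that are at most as likely as the test image, with a +1 correction. The function names are hypothetical.

```python
import numpy as np

def bits_per_dimension(log_likelihood_nats, num_dims):
    """Convert a total log-likelihood in nats into bits per dimension (lower = more likely)."""
    return -log_likelihood_nats / (num_dims * np.log(2.0))

def rank_p_value(train_log_likelihoods, test_log_likelihood):
    """Rank-based p-value of a test image against the training log-likelihoods.

    Counts the training images whose log-likelihood is no larger than the test
    image's, then applies a +1 correction so the result is never exactly zero.
    """
    train = np.asarray(train_log_likelihoods)
    n = train.shape[0]
    return (np.sum(train <= test_log_likelihood) + 1) / (n + 1)
```

A small p-value means the image is less likely than almost every training image, which is the signature expected of an adversarial example lying in a low probability region.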

1.8. PixelDefend

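  • trade-off: a large ε_defend may change the semantics of the image, while a small ε_defend may not move it back to the training distribution; in practice, choose ε_defend to overestimate ε_attack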
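
A minimal sketch of the greedy purification loop, assuming the procedure visits pixels in raster order and, for each one, picks the most likely value within ε_defend of the input under the density model. The `toy_conditional` function is a hypothetical stand-in for PixelCNN's 256-way conditional distribution (it only looks at two neighbors); a real run would query a trained PixelCNN instead.

```python
import numpy as np

def toy_conditional(image, i, j, k):
    """Hypothetical stand-in for PixelCNN's 256-way distribution over pixel (i, j, k).

    Prefers values close to the mean of the already-visited left and upper neighbors.
    """
    neighbors = []
    if j > 0:
        neighbors.append(float(image[i, j - 1, k]))
    if i > 0:
        neighbors.append(float(image[i - 1, j, k]))
    center = np.mean(neighbors) if neighbors else 128.0
    logits = -np.abs(np.arange(256) - center)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

def pixel_defend(image, eps_defend, conditional=toy_conditional):
    """Greedy purification: move each pixel to its most likely value within eps_defend."""
    purified = image.astype(np.int64).copy()
    height, width, channels = purified.shape
    for i in range(height):
        for j in range(width):
            for k in range(channels):
                x = int(purified[i, j, k])
                lo, hi = max(x - eps_defend, 0), min(x + eps_defend, 255)
                # Conditioned on the pixels that have already been purified.
                probs = conditional(purified, i, j, k)
                purified[i, j, k] = lo + int(np.argmax(probs[lo:hi + 1]))
    return purified.astype(np.uint8)
```

With `eps_defend = 0` the allowed range collapses to the original value and the image is returned unchanged; larger values give the model more room to alter each pixel, which is the trade-off noted above.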

1.9. Adaptive PixelDefend

  • ε_defend = 0 when the input image's probability under the generative model is already above a threshold value (the image is likely clean)
  • otherwise ε_defend = a manually chosen setting

1.10. Defense against End-to-End Attacks

  • Attacking with BIM: the unrolled PixelCNN is too deep, which leads to vanishing gradients. The attack is also time-consuming (10 hours to generate 100 attacking images with one TITAN Xp GPU)
  • the optimization problem is not amenable to gradient descent
  • PixelCNN and the classifier are trained separately and have independent parameters

1.11. Experiments

  • p-values after defense

  • Comparison