(ICLR 2018) PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples

Posted on 2018-12-21 in Paper Note, Adversarial Attack

Song Y, Kim T, Nowozin S, et al. PixelDefend: Leveraging generative models to understand and defend against adversarial examples[J]. arXiv preprint arXiv:1710.10766, 2017.

1. Overview

1.1. Motivation

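Adversarial examples mainly lie in low-probability regions of the training distribution.

The paper proposes PixelDefend, which purifies a perturbed image by moving it back toward the training distribution before classification.

Using statistical hypothesis testing, the authors find that modern neural density models are good at detecting imperceptible perturbations.
PixelDefend improves accuracy against the strongest attack:
- from 63% to 84% on Fashion MNIST
- from 32% to 70% on CIFAR-10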

1.2. Contribution

Show that generative models can be used to detect adversarially perturbed images, and observe that most adversarial examples lie in low-probability regions.
Introduce a novel family of defense methods; PixelDefend is one member of this family.
Report the defense's performance on CIFAR-10.

1.3. Attack Methods

Random Perturbation
FGSM
BIM (Basic Iterative Method)
DeepFool
CW
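
To make the two gradient-based attacks in the list concrete, here is a minimal sketch of FGSM and BIM. It assumes a toy logistic-regression classifier (not the ResNet or VGG models from the paper) so that the input gradient can be written analytically; the function names and the parameters `eps`, `alpha`, and `steps` are illustrative choices, not taken from the paper's code.

```python
import numpy as np

def loss_grad_wrt_input(x, y, w, b):
    """Gradient of the binary cross-entropy loss w.r.t. the input x for a
    toy logistic-regression classifier p(y=1|x) = sigmoid(w.x + b)."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return (p - y) * w

def fgsm(x, y, w, b, eps):
    """Single-step attack: move eps along the sign of the loss gradient."""
    x_adv = x + eps * np.sign(loss_grad_wrt_input(x, y, w, b))
    return np.clip(x_adv, 0.0, 1.0)  # keep pixels in the valid range

def bim(x, y, w, b, eps, alpha, steps):
    """Iterative attack: repeated small signed-gradient steps, each followed
    by a projection back into the L-infinity ball of radius eps around x."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(loss_grad_wrt_input(x_adv, y, w, b))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within eps of x
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay a valid image
    return x_adv
```

With a deep classifier the analytic gradient would be replaced by automatic differentiation; the signed step and the clipping stay the same.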

1.4. Defense Methods

1.4.1. Change Network & training procedure

Adversarial Training.
- training on FGSM adversarial examples is the most common variant
- training with BIM has succeeded on small datasets but has been reported to fail on larger ones
Label Smoothing (related to defensive distillation).
- converts one-hot labels to soft targets
- the correct class gets 1-ε; each wrong class gets ε/(N-1), where N is the number of classes
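
The label-smoothing rule above can be sketched in a few lines. The function name `smooth_labels` and its arguments are illustrative assumptions; only the 1-ε and ε/(N-1) targets come from the note.

```python
import numpy as np

def smooth_labels(labels, num_classes, eps):
    """Turn integer class labels into soft targets:
    1 - eps for the correct class, eps / (N - 1) for every other class."""
    labels = np.asarray(labels)
    targets = np.full((labels.shape[0], num_classes), eps / (num_classes - 1))
    targets[np.arange(labels.shape[0]), labels] = 1.0 - eps
    return targets
```

Each row still sums to 1, so the soft targets remain a valid distribution for a cross-entropy loss.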

1.4.2. Modify Adversarial Examples

Feature Squeezing.
- reduces the color bit depth, shrinking the range [0, 255] to fewer distinct values, then smooths the image with a median filter
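
A rough sketch of the two squeezing steps, assuming a single-channel `uint8` image and an odd filter size. The helper names, the `bits` parameter, and the edge-replicating padding are illustrative choices rather than details from the paper.

```python
import numpy as np

def reduce_bit_depth(img, bits):
    """Quantize a uint8 image to 2**bits levels, then map back to [0, 255]."""
    levels = 2 ** bits - 1
    scaled = np.round(img.astype(np.float64) / 255.0 * levels)
    return np.round(scaled / levels * 255.0).astype(np.uint8)

def median_filter(img, size=3):
    """size x size median filter on a 2-D image (size must be odd),
    using edge-replicating padding at the borders."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            out[r, c] = np.median(padded[r:r + size, c:c + size])
    return out
```

Applying `median_filter(reduce_bit_depth(img, bits))` gives the squeezed image that is then passed to the classifier.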

1.5. Datasets

Fashion MNIST
CIFAR-10

1.6. Model

ResNet
VGG

1.7. Detecting Adversarial Examples

bits per dimension (negative log-likelihood in bits, divided by the number of image dimensions)

the distribution of log-likelihoods
p-values (computed from PixelCNN likelihoods)
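
The two detection statistics can be sketched as follows. Bits per dimension is the standard conversion of a log-likelihood in nats. The p-value here is a rank-based statistic that compares the test image's likelihood with the training likelihoods; treat its exact form as an assumption, since the note does not spell out the formula. Function names are illustrative.

```python
import numpy as np

def bits_per_dim(log_likelihood_nats, num_dims):
    """Convert a log-likelihood in nats into bits per dimension."""
    return -log_likelihood_nats / (num_dims * np.log(2.0))

def p_value(train_log_likelihoods, test_log_likelihood):
    """Rank-based p-value (assumed form): the fraction of training images
    whose likelihood is no higher than the test image's, with a +1
    correction so the result is never exactly zero."""
    train = np.asarray(train_log_likelihoods)
    t = np.sum(train <= test_log_likelihood)
    return (t + 1) / (train.size + 1)
```

A small p-value means the image is less likely than almost every training image, which is the signature of an adversarial example under the note's hypothesis.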

1.8. PixelDefend

Trade-off: a larger ε_defend can remove larger perturbations but may also change clean images. Choose ε_defend so that it overestimates ε_attack.
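
A minimal sketch of how purification within an ε_defend ball might look, assuming a greedy raster-scan procedure on a single-channel image. `cond_probs` is a hypothetical stand-in for a trained PixelCNN, and `toy_cond_probs` is a toy model used only to make the example runnable; neither is the authors' implementation.

```python
import numpy as np

def purify(image, cond_probs, eps_defend):
    """Greedy purification of a 2-D uint8 image in raster-scan order.

    cond_probs(purified, r, c) is a hypothetical stand-in for a PixelCNN:
    it returns a length-256 probability vector for pixel (r, c) given the
    already-purified pixels that precede it.
    """
    purified = image.astype(np.int64).copy()
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            # Only values within eps_defend of the original pixel are allowed.
            lo = max(int(image[r, c]) - eps_defend, 0)
            hi = min(int(image[r, c]) + eps_defend, 255)
            probs = cond_probs(purified, r, c)
            # Pick the most likely value inside the allowed range.
            purified[r, c] = lo + int(np.argmax(probs[lo:hi + 1]))
    return purified.astype(np.uint8)

def toy_cond_probs(purified, r, c):
    """Toy model (not a PixelCNN): favours values near the left neighbour."""
    prev = purified[r, c - 1] if c > 0 else 128
    scores = -np.abs(np.arange(256) - prev)
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()
```

With `eps_defend = 0` the image is returned unchanged, which illustrates the trade-off: the allowed range has to be at least as wide as the attack's perturbation for the purified pixel to reach its clean value.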

1.9. Adaptive PixelDefend

Set ε_defend = 0 when the probability of the input image under the generative model is below a threshold value;
otherwise use the manually chosen ε_defend.

1.10. Defensive

Attacking end to end with BIM: unrolling the PixelCNN makes the network too deep, which leads to vanishing gradients. The attack is also time-consuming (10 hours to generate 100 attack images on one TITAN Xp GPU).
The resulting optimization problem is not amenable to gradient descent.
The PixelCNN and the classifier are trained separately and have independent parameters.

1.11. Experiments

p-values after defense
Comparison