
(CVPR 2018) Deflecting adversarial attacks with pixel deflection

Prakash A, Moran N, Garber S, et al. Deflecting adversarial attacks with pixel deflection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 8571-8580.



1. Overview


1.1. Motivation

  • image classifiers tend to be robust to natural noise
  • adversarial attacks tend to be agnostic to object location
  • most attacks search the entire image plane for adversarial perturbations without regard for the location of the image content

This paper proposes pixel deflection + wavelet denoising to defend against attacks:

  • force the image to match natural image statistics
  • use semantic maps to pick better pixels to update
  • locally corrupt the image by redistributing pixel values via a process termed pixel deflection
  • then a wavelet-based denoising operation softens the corruption

1.2.1. Attack

  • FGSM (see the sketch after this list)
  • IGSM. iterative variant of FGSM
  • L-BFGS. minimizes the L2 distance between the image and the adversarial example
  • Jacobian-based Saliency Map Attack (JSMA). modifies the pixels which are most salient; targeted attack
  • DeepFool. untargeted; approximates the classifier as a linear decision boundary, then finds the smallest perturbation needed to cross that boundary
  • Carlini & Wagner (C&W). formulated in terms of Z_k, the logits of the model for a given class k
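
A minimal PyTorch sketch of the gradient-sign attacks above (FGSM and its iterative variant IGSM), assuming a classifier `model`, inputs in [0, 1], and cross-entropy loss; `epsilon`, `alpha`, and `steps` are illustrative values, not ones from the paper:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon):
    """One-step FGSM: move x by epsilon in the direction of the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0, 1).detach()          # assumes pixel range [0, 1]

def igsm(model, x, y, epsilon, alpha=1/255, steps=10):
    """Iterative FGSM: repeated small steps, projected back into an epsilon ball around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv = fgsm(model, x_adv, y, alpha)
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
    return x_adv
```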


1.2.2. Defense

  • ensemble
  • distillation
  • transformation
  • quilting + TVM
  • foveation-based mechanism. crop the image around the object and then scale it back to the original size
  • random crop + random pad



2. Pixel Deflection


  • most deep classifiers are robust to the presence of natural noise, such as sensor noise

2.1. Algorithm



  • randomly sample a pixel
  • replace it with another pixel selected at random from within a small square neighbourhood (see the sketch below)


  • even changing as much as 1% of the original pixels does not alter the classification of a clean image
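
A minimal NumPy sketch of the deflection step described above; the window size and number of deflections are illustrative, not the paper's tuned values:

```python
import numpy as np

def pixel_deflection(img, n_deflections=200, window=10, rng=None):
    """Randomly pick pixels and replace each with another pixel drawn
    uniformly from a small square neighbourhood around it."""
    rng = np.random.default_rng() if rng is None else rng
    out = img.copy()
    h, w = img.shape[:2]
    for _ in range(n_deflections):
        r, c = rng.integers(h), rng.integers(w)
        # offset within the window, clipped to stay inside the image
        r2 = np.clip(r + rng.integers(-window, window + 1), 0, h - 1)
        c2 = np.clip(c + rng.integers(-window, window + 1), 0, w - 1)
        out[r, c] = img[r2, c2]
    return out
```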

2.2. Distribution of Attacks



2.3. Targeted Pixel Deflection

  • In natural images, many pixels do not correspond to a relevant semantic object and are therefore not salient to classification

2.3.1. Robust Activation Map





  • an adversary which successfully changes the most likely class tends to leave the rest of the top-k classes unchanged
  • 38% of the time, the predicted class of an adversarial image is the second-highest class of the model for the clean image
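
A rough sketch of how a robust activation map can bias the deflection step: `robust_map` is assumed to be a per-pixel saliency map in [0, 1] (in the paper it is built from the class activation maps of the top-k classes), and low-saliency pixels are deflected more often. The rejection-sampling form below illustrates the idea and is not the paper's exact procedure:

```python
import numpy as np

def targeted_pixel_deflection(img, robust_map, n_deflections=200, window=10, rng=None):
    """Deflect pixels preferentially in low-saliency regions: a candidate pixel
    is deflected only if a uniform draw exceeds its saliency value."""
    rng = np.random.default_rng() if rng is None else rng
    out = img.copy()
    h, w = img.shape[:2]
    done = 0
    while done < n_deflections:
        r, c = rng.integers(h), rng.integers(w)
        if rng.random() < robust_map[r, c]:
            continue  # salient pixel: skip and try another candidate
        r2 = np.clip(r + rng.integers(-window, window + 1), 0, h - 1)
        c2 = np.clip(c + rng.integers(-window, window + 1), 0, w - 1)
        out[r, c] = img[r2, c2]
        done += 1
    return out
```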



3. Wavelet Denoising


3.1. Hard Thresholding

  • all coefficients with magnitude below the threshold are set to zero
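
In symbols, with d a wavelet coefficient and λ the threshold:

$$
\hat{d} = \begin{cases} d, & |d| \ge \lambda \\ 0, & |d| < \lambda \end{cases}
$$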

3.2. Soft Thresholding

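For reference, the standard soft-thresholding rule also shrinks the surviving coefficients toward zero by the threshold, rather than only zeroing the small ones (this is the mode paired with BayesShrink in Section 4):

$$
\hat{d} = \operatorname{sign}(d)\,\max(|d| - \lambda,\ 0)
$$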


3.3. Adaptive Thresholding

  • VisuShrink. a single universal threshold computed from the number of pixels N and the noise level σ
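
The universal threshold used by VisuShrink:

$$
\lambda_{V} = \sigma \sqrt{2 \ln N}
$$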



  • BayesShrink. models the wavelet coefficients in each subband as a Generalized Gaussian Distribution (GGD) and derives a per-subband threshold
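
In standard BayesShrink the per-subband threshold is the ratio of the estimated noise variance to the estimated signal standard deviation, with the latter obtained from the variance of the observed coefficients in that subband:

$$
\lambda_{B} = \frac{\hat{\sigma}^2}{\hat{\sigma}_x}, \qquad \hat{\sigma}_x = \sqrt{\max\!\big(\hat{\sigma}_y^2 - \hat{\sigma}^2,\ 0\big)}
$$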





4. Methods


  1. corrupt the image with pixel deflection
  2. soften the impact of pixel deflection
  • convert the image to YCbCr, which gives better results for wavelet denoising
  • project the image into the wavelet domain (db1 is used, but db2 and haar give similar results)
  • soft-threshold the wavelet coefficients using BayesShrink
  • convert the image back to RGB (see the end-to-end sketch below)
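
A minimal end-to-end sketch of the two steps, assuming a float RGB image in [0, 1]; the deflection parameters are illustrative, and scikit-image's denoise_wavelet stands in for the paper's own denoising code (older scikit-image versions take multichannel=True instead of channel_axis=-1):

```python
import numpy as np
from skimage.restoration import denoise_wavelet

def pixel_deflection(img, n_deflections=200, window=10, rng=None):
    """Step 1: locally corrupt the image by redistributing pixel values."""
    rng = np.random.default_rng() if rng is None else rng
    out = img.copy()
    h, w = img.shape[:2]
    for _ in range(n_deflections):
        r, c = rng.integers(h), rng.integers(w)
        r2 = np.clip(r + rng.integers(-window, window + 1), 0, h - 1)
        c2 = np.clip(c + rng.integers(-window, window + 1), 0, w - 1)
        out[r, c] = img[r2, c2]
    return out

def defend(img):
    """img: float RGB image in [0, 1], shape (H, W, 3)."""
    deflected = pixel_deflection(img)
    # Step 2: soften the corruption with wavelet denoising in YCbCr space,
    # soft-thresholding db1 coefficients with BayesShrink, then back to RGB.
    return denoise_wavelet(deflected, wavelet='db1', mode='soft',
                           method='BayesShrink', convert2ycbcr=True,
                           channel_axis=-1)
```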



5. Experiments