Frosst N, Sabour S, Hinton G. DARCCC: Detecting Adversaries by Reconstruction from Class Conditional Capsules[J]. arXiv preprint arXiv:1811.06969, 2018.
1. Overview
In this paper
- present a simple technique (DARCCC) that allows capsule models to detect adversarial images
- set a threshold on the L2 distance between the input image and the reconstruction from the winning capsule (see the detection sketch after this list)
- same technique works well for CNNs
- explore a stronger white-box attack (R-BIM) that takes the reconstruction error into account; however, the generated adversarial examples no longer look like the original image with a small amount of added noise
- experiments on MNIST, Fashion-MNIST and SVHN
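A minimal sketch of the detection rule, assuming a hypothetical `model.predict_and_reconstruct(x)` interface that returns the predicted class and the reconstruction conditioned on the winning capsule (the paper does not prescribe this exact interface):

```python
import numpy as np

def l2_reconstruction_distance(x, x_recon):
    """L2 distance between an input image and its reconstruction, flattened to vectors."""
    return np.linalg.norm(x.ravel() - x_recon.ravel())

def darccc_detect(x, model, threshold):
    """Flag x as adversarial if the reconstruction from the winning
    (highest-activation) capsule is too far from the input.

    `model.predict_and_reconstruct` is a placeholder interface assumed
    for this sketch, not the authors' actual code.
    """
    pred_class, x_recon = model.predict_and_reconstruct(x)
    dist = l2_reconstruction_distance(x, x_recon)
    is_adversarial = dist > threshold
    return pred_class, is_adversarial
```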
1.1. Related Works
- CapsuleNet has been shown to be more robust to white-box attacks while being as weak as CNNs against black-box attacks (this paper addresses that shortcoming)
2. Methods
2.1. Histogram of L2 Distance
2.2. Network
2.3. Threshold
- set it as the 95th percentile of the L2 distances on real validation data, which means the false positive rate (FPR) on real validation images is 5% (see the sketch below)
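A small sketch of the threshold calibration, assuming `val_distances` holds the L2 reconstruction distances computed on real (non-adversarial) validation images:

```python
import numpy as np

def calibrate_threshold(val_distances, fpr=0.05):
    """Pick the threshold as the (1 - fpr) quantile of validation distances,
    so roughly `fpr` of real images are flagged as adversarial."""
    return np.percentile(val_distances, 100 * (1 - fpr))

# threshold = calibrate_threshold(val_distances)  # 95th percentile -> ~5% FPR
```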
3. Experiments
3.1. Black Box Attack
- CapsuleNet is as weak as CNNs
3.2. White Box Attack
- CapsuleNet is more robust than CNNs
3.3. R-BIM
- R-BIM is significantly less successful than a standard BIM attack in changing the classification
- CapsuleNet in particular exhibits significant resilience to this attack
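A hedged PyTorch sketch of an R-BIM-style update, assuming a hypothetical `model(x)` that returns `(logits, reconstruction)`; the exact loss formulation and weighting in the paper may differ:

```python
import torch
import torch.nn.functional as F

def r_bim(x, y_true, model, eps=0.1, alpha=0.01, steps=50, recon_weight=1.0):
    """Reconstruction-aware BIM sketch: each step increases the classification
    loss (to cause a misclassification) while decreasing the L2 reconstruction
    error, so the perturbed image stays below the detector's distance threshold.

    `model(x)` returning (logits, reconstruction) is an assumed interface.
    """
    x_orig = x.detach()
    x_adv = x_orig.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits, recon = model(x_adv)
        cls_loss = F.cross_entropy(logits, y_true)   # push away from the true class
        recon_loss = F.mse_loss(recon, x_adv)        # keep the reconstruction close
        loss = cls_loss - recon_weight * recon_loss
        grad, = torch.autograd.grad(loss, x_adv)
        # BIM step: signed-gradient ascent, then project back into the eps-ball
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.clamp(x_adv, x_orig - eps, x_orig + eps).clamp(0, 1)
    return x_adv
```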
3.4. Visualization
- some of the generated adversarial examples look like a '0'; these are not adversarial images at all, since they resemble their predicted class to the human eye