Frosst N, Sabour S, Hinton G. DARCCC: Detecting Adversaries by Reconstruction from Class Conditional Capsules[J]. arXiv preprint arXiv:1811.06969, 2018.
1. Overview
In this paper
- present a simple technique (DARCCC) that allows capsule models to detect adversarial images
- set a threshold on the L2 distance between the input image and the reconstruction from the winning capsule (see the detection sketch after this list)
- same technique works well for CNNs
- explore a stronger white-box attack (R-BIM) that takes the reconstruction error into account; however, the generated adversarial examples no longer look like the original image with a small amount of added noise
- experiments on MNIST, Fashion-MNIST and SVHN
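A minimal sketch of the detection rule, assuming a hypothetical `model.predict_and_reconstruct(x)` interface that returns the predicted class and the reconstruction conditioned on the winning capsule (the paper does not prescribe this exact interface):

```python
import numpy as np

def l2_reconstruction_distance(x, x_recon):
    """L2 distance between an input image and its reconstruction, flattened to vectors."""
    return np.linalg.norm(x.ravel() - x_recon.ravel())

def darccc_detect(x, model, threshold):
    """Flag x as adversarial if the reconstruction from the winning
    (highest-activation) capsule is too far from the input.

    `model.predict_and_reconstruct` is a placeholder interface assumed
    for this sketch, not the authors' actual code.
    """
    pred_class, x_recon = model.predict_and_reconstruct(x)
    dist = l2_reconstruction_distance(x, x_recon)
    is_adversarial = dist > threshold
    return pred_class, is_adversarial
```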
1.1. Related Works
- CapsuleNet has been shown to be more robust to white-box attacks while being as weak as CNNs against black-box attacks (this paper addresses that shortcoming)
2. Methods
2.1. Histogram of L2 Distance
2.2. Network
2.3. Threshold
- set it as the 95th percentile of the L2 distances on real validation data, which means the false positive rate (FPR) on real validation images is 5% (see the sketch below)
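A small sketch of the threshold calibration, assuming `val_distances` holds the L2 reconstruction distances computed on real (non-adversarial) validation images:

```python
import numpy as np

def calibrate_threshold(val_distances, fpr=0.05):
    """Pick the threshold as the (1 - fpr) quantile of validation distances,
    so roughly `fpr` of real images are flagged as adversarial."""
    return np.percentile(val_distances, 100 * (1 - fpr))

# threshold = calibrate_threshold(val_distances)  # 95th percentile -> ~5% FPR
```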
3. Experiments
3.1. Black Box Attack
- CapsuleNet is as weak as CNNs
3.2. White Box Attack
- CapsuleNet is more robust than CNNs
3.3. R-BIM
- R-BIM is significantly less successful than a standard BIM attack in changing the classification
- CapsuleNet in particular exhibits significant resilience to this attack
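A hedged PyTorch sketch of an R-BIM-style update, assuming a hypothetical `model(x)` that returns `(logits, reconstruction)`; the exact loss formulation and weighting in the paper may differ:

```python
import torch
import torch.nn.functional as F

def r_bim(x, y_true, model, eps=0.1, alpha=0.01, steps=50, recon_weight=1.0):
    """Reconstruction-aware BIM sketch: each step increases the classification
    loss (to cause a misclassification) while decreasing the L2 reconstruction
    error, so the perturbed image stays below the detector's distance threshold.

    `model(x)` returning (logits, reconstruction) is an assumed interface.
    """
    x_orig = x.detach()
    x_adv = x_orig.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits, recon = model(x_adv)
        cls_loss = F.cross_entropy(logits, y_true)   # push away from the true class
        recon_loss = F.mse_loss(recon, x_adv)        # keep the reconstruction close
        loss = cls_loss - recon_weight * recon_loss
        grad, = torch.autograd.grad(loss, x_adv)
        # BIM step: signed-gradient ascent, then project back into the eps-ball
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.clamp(x_adv, x_orig - eps, x_orig + eps).clamp(0, 1)
    return x_adv
```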
3.4. Visualization
- some of the generated adversarial examples look like a '0'; these are not adversarial images at all, since they resemble their predicted class to the human eye