Baluja S, Fischer I. Adversarial transformation networks: Learning to generate adversarial examples[J]. arXiv preprint arXiv:1703.09387, 2017.
1. Overview
1.1. Motivation
- existing methods either directly compute gradients or solve an optimization on the image pixels
This paper proposes the Adversarial Transformation Network (ATN)
- trained in a self-supervised manner to generate adversarial examples
- fast to execute
2. Methods
2.1. ATN
- transforms an input into an adversarial example against a target network or set of networks
- the paper focuses on targeted, white-box ATNs
- f. target network
- θ. parameter vector of the ATN g
2.2. Training
- L_X. loss function in the input space, e.g. L2 or a perceptual loss
- L_Y. specially-formed loss on the output space
- β. weight balancing the two loss terms
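The three terms above combine into a single training objective. The following is a sketch reconstructed from those definitions; the subscript notation g_{f,θ} (the ATN g trained against target f with parameters θ) and the sum over the training set are assumptions of this sketch.

```latex
\operatorname*{arg\,min}_{\theta} \sum_{x_i \in \mathcal{X}}
  \beta \, L_{\mathcal{X}}\big(g_{f,\theta}(x_i),\, x_i\big)
  + L_{\mathcal{Y}}\big(f(g_{f,\theta}(x_i)),\, f(x_i)\big)
```

The first term keeps the generated example close to the original input; the second pushes the target network's output on the generated example toward the desired (adversarial) output.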
2.3. Inference
- even faster than the single-step gradient-based methods, so long as
2.4. Loss Functions
- r(·). reranking function, applied to the original output y and target class t to build the output-space training target
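A minimal sketch of how the two loss terms might be evaluated for one example, assuming both L_X and L_Y are squared L2 distances (an assumption of this sketch; the notes do not fix the choice). The names `l2` and `atn_loss` are hypothetical, and `target_dist` stands for the already-reranked target r(y, t).

```python
def l2(a, b):
    """Squared L2 distance between two equal-length vectors (assumed loss form)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def atn_loss(x, x_adv, y_adv, target_dist, beta):
    """Per-example loss: beta * L_X + L_Y.

    x           -- original input (flattened)
    x_adv       -- ATN output g(x) (flattened)
    y_adv       -- target network output on x_adv, i.e. f(g(x))
    target_dist -- reranked target r(y, t), where y = f(x)
    beta        -- weight on the input-space term
    """
    loss_x = l2(x_adv, x)            # stay close to the original input
    loss_y = l2(y_adv, target_dist)  # match the reranked output
    return beta * loss_x + loss_y
```

A larger `beta` favors examples that stay closer to the input, at the cost of a weaker push toward the target class.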
2.5. Reranking Function
simplest way. set r(y, t) = onehot(t)
α > 1. specifies how much larger y_t should be than the current maximum classification score
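The two reranking options above can be sketched as follows. This is an illustration under stated assumptions: the target entry is set to α times the current maximum of y, all other entries are kept, and the result is renormalized by dividing by its sum (the normalization choice and the function names `onehot` and `rerank` are assumptions of this sketch).

```python
def onehot(t, n):
    """Simplest reranking: all mass on target class t out of n classes."""
    return [1.0 if k == t else 0.0 for k in range(n)]


def rerank(y, t, alpha=1.5):
    """Alpha-based reranking of the output distribution y toward class t.

    Sets the target entry to alpha * max(y), keeps the other entries,
    then rescales so the result sums to 1 (assumed normalization).
    """
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    peak = alpha * max(y)
    boosted = [peak if k == t else y_k for k, y_k in enumerate(y)]
    total = sum(boosted)
    return [v / total for v in boosted]


# Example: class 0 is currently the top prediction; class 2 is the target.
y = [0.7, 0.2, 0.1]
target = rerank(y, t=2, alpha=1.5)
```

Unlike the one-hot target, this variant keeps the relative order of the non-target classes, so the training target stays close to the network's original output.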