Reed S E, Akata Z, Mohan S, et al. Learning what and where to draw[C]//Advances in Neural Information Processing Systems. 2016: 217-225.
1. Overview
This paper addresses semantic image synthesis: disentangle the semantic information from two modalities (a source image and a text description) and generate new images from the combined semantics. The synthesized image should meet two requirements:

- be realistic while matching the target text description
- maintain the other image features that are irrelevant to the text description
1.1. Related Work
- deterministic networks
- VAE
- autoregressive models
- GAN
2. Methods

2.1. Architecture
- conditioning augmentation (CA) technique from StackGAN
- residual blocks: the output image retains a structure similar to that of the source image
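
The two ingredients above can be illustrated with a minimal sketch. It assumes the StackGAN form of conditioning augmentation (sample the conditioning vector from a Gaussian whose mean and log standard deviation come from the text embedding, with a KL penalty toward a standard normal) and a plain skip connection for the residual part. The function names are hypothetical, and plain Python lists stand in for tensors.

```python
import math
import random


def conditioning_augmentation(mu, log_sigma, rng=random):
    """Sample c = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mu, log_sigma)]


def kl_to_standard_normal(mu, log_sigma):
    """KL divergence between N(mu, diag(sigma^2)) and N(0, I), used as a regularizer."""
    return sum(
        0.5 * (math.exp(2.0 * ls) + m * m - 1.0 - 2.0 * ls)
        for m, ls in zip(mu, log_sigma)
    )


def residual(features, transform):
    """Skip connection: add the transformed features back onto the input features."""
    return [f + t for f, t in zip(features, transform(features))]
```

Because the residual path adds to the encoded source features instead of replacing them, a transform that outputs zeros leaves the features unchanged, which is one way to read why the output keeps the structure of the source image.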
2.2. Adaptive Loss for Semantic Image Synthesis

- +: positive examples
- -: negative examples
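
As a rough sketch of how positive and negative examples could enter the discriminator objective, the code below assumes the common matching-aware GAN formulation: a real image with matching text is the positive case, while a real image with mismatching text and a synthesized image are the negative cases. The notes do not give the formula, so the exact terms, the function names, and the `eps` constant are assumptions.

```python
import math

EPS = 1e-12  # numerical guard against log(0); an illustrative choice


def discriminator_loss(s_real_match, s_real_mismatch, s_fake):
    """Negative log-likelihood over one positive and two negative scores.

    Each score is a discriminator output in [0, 1]:
      s_real_match    : real image, matching text     (positive, "+")
      s_real_mismatch : real image, mismatching text  (negative, "-")
      s_fake          : synthesized image with text   (negative, "-")
    """
    return -(
        math.log(s_real_match + EPS)
        + math.log(1.0 - s_real_mismatch + EPS)
        + math.log(1.0 - s_fake + EPS)
    )


def generator_loss(s_fake):
    """The generator tries to make the discriminator score its output as positive."""
    return -math.log(s_fake + EPS)
```

Under this assumption, the mismatching-text term is what forces the discriminator to check text-image agreement and not only realism.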
2.3. Loss Function

2.4. Improving Image Feature Representation
- conv4 features of a pretrained VGG
2.5. Visual-Semantic Text Embedding
- pair-wise ranking loss
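
A pair-wise ranking loss for a joint image-text embedding can be sketched as below. This assumes the standard bidirectional hinge form with cosine similarity, where `image_embs[i]` and `text_embs[i]` form a matching pair and every other combination is a mismatch. The margin value and function names are illustrative, not taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pairwise_ranking_loss(image_embs, text_embs, margin=0.2):
    """Hinge loss pushing matching pairs above mismatching pairs by `margin`."""
    loss = 0.0
    n = len(image_embs)
    for i in range(n):
        pos = cosine(image_embs[i], text_embs[i])
        for k in range(n):
            if k == i:
                continue
            # Image i against a mismatching text k.
            loss += max(0.0, margin - pos + cosine(image_embs[i], text_embs[k]))
            # Text i against a mismatching image k.
            loss += max(0.0, margin - pos + cosine(image_embs[k], text_embs[i]))
    return loss
```

The loss is zero once every matching pair scores at least `margin` higher than all mismatching pairs that share its image or its text.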
3. Experiments
3.1. Details
- Adam optimizer with learning rate 0.0002 and momentum 0.5; the learning rate is decreased by a factor of 0.5
- batch size 64
- data augmentation: flipping, rotating, zooming, cropping
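
The settings above can be collected into a small configuration sketch. The notes do not say how often the learning rate is halved, so `decay_every` is a hypothetical parameter with a placeholder value, and the dictionary keys are illustrative names only.

```python
TRAIN_CONFIG = {
    "optimizer": "adam",
    "base_lr": 0.0002,
    "momentum": 0.5,  # first-moment coefficient (beta1) in Adam terms
    "lr_decay": 0.5,
    "batch_size": 64,
    "augmentations": ["flip", "rotate", "zoom", "crop"],
}


def learning_rate(epoch, config=TRAIN_CONFIG, decay_every=100):
    """Step decay: multiply the base rate by `lr_decay` once per `decay_every` epochs.

    `decay_every` is an assumed placeholder; the notes do not state the interval.
    """
    return config["base_lr"] * config["lr_decay"] ** (epoch // decay_every)
```

For example, with the placeholder interval, `learning_rate(0)` returns the base rate and the rate halves each time another `decay_every` epochs have passed.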
3.2. Comparison


3.3. Interpolation


3.4. Variety
