Shaham T R, Dekel T, Michaeli T. Singan: Learning a generative model from a single natural image[C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 4570-4580.

# 1. Overview

This paper proposes SinGAN, an unconditional generative model learned from a single natural image.

1) A pyramid of fully convolutional GANs, each responsible for learning the patch distribution at a different scale of the image.

2) Generates new samples of arbitrary size and aspect ratio.

# 2. Method

## 2.1. Formulation

1) Coarsest scale: $\tilde{x}_N = G_N(z_N)$.
2) Finer scales: $\tilde{x}_n = G_n(z_n, (\tilde{x} _{n+1}) \uparrow^r), n \lt N$, where $G_n$ is residual: $\tilde{x}_n = (\tilde{x} _{n+1}) \uparrow^r + \psi_n(z_n + (\tilde{x} _{n+1}) \uparrow^r )$.
3) $\psi_n$ is a fully convolutional network with 5 Conv(3×3)-BatchNorm-LeakyReLU blocks.
4) Start with 32 kernels per block at the coarsest scale and increase the number by a factor of 2 every 4 scales.
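
The residual update in item 2 and the kernel schedule in item 4 can be sketched in a few lines. This is a toy illustration on 1-D lists rather than images: `generator_step`, `kernels_per_block`, and `toy_psi` are hypothetical names, and `toy_psi` is a stand-in for the convolutional network $\psi_n$, not the architecture from the paper.

```python
def generator_step(psi, z, prev_up):
    """One finer-scale step: x_n = prev_up + psi(z + prev_up).

    prev_up is the already-upsampled output of the coarser scale,
    z is the noise map for this scale (same length as prev_up).
    """
    residual = psi([zi + ui for zi, ui in zip(z, prev_up)])
    return [ui + ri for ui, ri in zip(prev_up, residual)]


def kernels_per_block(scales_above_coarsest, base=32):
    """Kernel count per conv block: `base` at the coarsest scale,
    doubled every 4 scales as we move towards finer scales."""
    return base * 2 ** (scales_above_coarsest // 4)


# Hypothetical stand-in for psi_n: a fixed pointwise map.
toy_psi = lambda values: [0.5 * v for v in values]

x = generator_step(toy_psi, z=[0.2, -0.2], prev_up=[1.0, 2.0])
schedule = [kernels_per_block(i) for i in range(9)]
```

Because the generator only adds a residual to the upsampled input, a $\psi_n$ that outputs zeros passes the coarser image through unchanged, which is why each scale only needs to learn the details missing at that resolution.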

## 2.2. Loss Function

$\min_{G_n} \max_{D_n} L_{adv} (G_n, D_n) + \alpha L_{rec}(G_n)$.

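Reconstruction Loss
1) Ensures that there exists a specific set of input noise maps that reproduces the original image:
$\lbrace z_N^{rec}, z_{N-1}^{rec}, \ldots, z_0^{rec} \rbrace = \lbrace z^{*}, 0, \ldots, 0 \rbrace$.
2) $z^{*}$ is a noise map that is kept fixed during training.
3) For $n \lt N$: $L_{rec} = || G_n(0, ( \tilde{x} _{n+1}^{rec} ) \uparrow^r) - x_n ||^2$, where $\tilde{x} _{n+1}^{rec}$ is the image generated at scale $n+1$ from these noise maps.
4) For $n = N$: $L_{rec} = || G_N(z^{*}) - x_N ||^2$.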
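
As a rough sketch of how the per-scale reconstruction terms chain together, the snippet below walks the pyramid with the noise set $\lbrace z^{*}, 0, \ldots, 0 \rbrace$ on 1-D lists. All names here are hypothetical. Two simplifying assumptions: every generator is a callable `G(z, prev_up)` (the coarsest one receives an all-zero `prev_up`), and nearest-neighbour upsampling stands in for the image upsampling $\uparrow^r$.

```python
def upsample(signal, r):
    """Nearest-neighbour upsampling of a 1-D list by factor r (toy stand-in)."""
    new_len = max(1, round(len(signal) * r))
    return [signal[min(len(signal) - 1, int(i / r))] for i in range(new_len)]


def sq_dist(a, b):
    """Squared L2 distance ||a - b||^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def reconstruction_losses(generators, z_star, real_pyramid, r):
    """Return [L_rec at scale N, ..., L_rec at scale 0].

    generators[0] plays G_N (coarsest), generators[-1] plays G_0.
    real_pyramid[0] is x_N, real_pyramid[-1] is x_0.
    """
    # Scale N: the only scale that receives non-zero noise (z*).
    x_rec = generators[0](z_star, [0.0] * len(z_star))
    losses = [sq_dist(x_rec, real_pyramid[0])]
    # Scales n < N: zero noise, previous reconstruction upsampled by r.
    for G, x_real in zip(generators[1:], real_pyramid[1:]):
        up = upsample(x_rec, r)
        x_rec = G([0.0] * len(up), up)
        losses.append(sq_dist(x_rec, x_real))
    return losses


def generator_objective(adv_loss, rec_loss, alpha):
    """Generator side of the objective: L_adv + alpha * L_rec."""
    return adv_loss + alpha * rec_loss
```

With a toy generator `G(z, prev_up) = prev_up + z` and a pyramid built by upsampling $z^{*}$ itself, every term is zero, which is a quick sanity check that the noise set is wired up as intended.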

# 3. Experiments

## 3.1. Explore SinGAN

1) Starting the generation from a finer scale, rather than from the coarsest scale $N$, keeps the global structure intact.

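2) Train with different numbers of scales $N$: with few scales only fine textures are captured, while more scales also capture the global arrangement.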
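
Starting generation at a chosen scale amounts to skipping the coarser generators and feeding in a downsampled version of the real image instead. The sketch below illustrates this on 1-D lists. `generate_from` and `upsample` are hypothetical names, the list `psis` is ordered from coarsest to finest, and each `psi` is a toy stand-in for the trained network at that scale.

```python
import random


def upsample(signal, r):
    """Nearest-neighbour upsampling of a 1-D list by factor r (toy stand-in)."""
    new_len = max(1, round(len(signal) * r))
    return [signal[min(len(signal) - 1, int(i / r))] for i in range(new_len)]


def generate_from(start, injected, psis, r, noise_std=1.0, seed=0):
    """Run only the scales psis[start:], coarse to fine.

    `injected` takes the place of the output of the scale just above
    psis[start], e.g. a downsampled copy of the training image.
    """
    rng = random.Random(seed)
    x = list(injected)
    for psi in psis[start:]:
        up = upsample(x, r)
        z = [rng.gauss(0.0, noise_std) for _ in up]
        # Residual update: x_n = up + psi(z_n + up)
        residual = psi([zi + ui for zi, ui in zip(z, up)])
        x = [ui + ri for ui, ri in zip(up, residual)]
    return x
```

The larger `start` is, the fewer generators get to alter the injected signal, so more of its coarse layout survives in the output.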

## 3.3. Application

### 3.3.1. Super-Resolution

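1) Reconstruction loss weight $\alpha = 100$.
2) For an upscaling factor $s$, the pyramid scale factor is $r = \sqrt[k]{s}$ for some $k \in \mathbb{N}$.
3) Train on the LR image. At test time, upsample the LR image by $r$ and inject it (together with noise) into $G_0$; repeat $k$ times to obtain the high-resolution output.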
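
The relation between $s$, $k$, and $r$ and the repeated pass through $G_0$ can be sketched as follows. The code works on 1-D lists. `sr_scale_factor`, `super_resolve`, and `upsample` are hypothetical names, and `g0` is any callable standing in for the finest-scale generator (noise injection is assumed to happen inside it).

```python
def upsample(signal, r):
    """Nearest-neighbour upsampling of a 1-D list by factor r (toy stand-in)."""
    new_len = max(1, round(len(signal) * r))
    return [signal[min(len(signal) - 1, int(i / r))] for i in range(new_len)]


def sr_scale_factor(s, k):
    """Per-step factor r = s^(1/k), so that k steps give a total factor of s."""
    return s ** (1.0 / k)


def super_resolve(lr, g0, s, k):
    """Upsample by r and refine with g0, repeated k times."""
    r = sr_scale_factor(s, k)
    x = list(lr)
    for _ in range(k):
        x = g0(upsample(x, r))
    return x


# Example: 4x super-resolution in k = 2 steps of r = 2 each.
identity_g0 = lambda v: v  # placeholder for the trained G_0
hr = super_resolve([1.0, 2.0, 3.0, 4.0], identity_g0, s=4, k=2)
```

For a non-integer $r$, the intermediate sizes are rounded at each step, so the final size may differ from $s$ times the input size by a sample or two in this toy version.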

### 3.3.2. Paint-to-Image

1) Downsample the clipart image, then feed it into one of the coarse scales (e.g. $N-1$ or $N-2$).

### 3.3.3. Harmonization

1) Train on the background image, then inject a downsampled version of the naively pasted composite at test time.

### 3.3.4. Editing

1) Inject a downsampled version of the composite into one of the coarse scales.
2) Combine SinGAN’s output in the edited regions with the original image.
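
Step 2 is a mask-based composite. A minimal sketch, assuming images are given as nested lists of equal shape and the mask holds 1 inside the edited regions and 0 elsewhere (`blend_edit` is a hypothetical helper name):

```python
def blend_edit(original, generated, mask):
    """Take `generated` where mask is 1 and `original` where mask is 0.

    Fractional mask values give a soft transition between the two.
    """
    return [
        [m * g + (1 - m) * o for o, g, m in zip(row_o, row_g, row_m)]
        for row_o, row_g, row_m in zip(original, generated, mask)
    ]


original = [[1, 1], [1, 1]]
generated = [[5, 5], [5, 5]]
mask = [[1, 0], [0, 1]]
result = blend_edit(original, generated, mask)
```

Pixels outside the mask are copied from the original image, so only the edited regions are affected by the generator.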

### 3.3.5. Single Image Animation

1) Perform a random walk in $z$-space, starting from $z^{rec}$ for the first frame at all generation scales.
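
The idea can be illustrated with a plain Gaussian random walk over the per-scale noise maps. This is a simplification for illustration and not necessarily the exact walk used in the paper. `animate_noise`, `step_std`, and `seed` are hypothetical names, and noise maps are represented as 1-D lists.

```python
import random


def animate_noise(z_rec_per_scale, n_frames, step_std=0.1, seed=0):
    """Return frames[t][scale]: one noise map per scale for each frame.

    Frame 0 uses z_rec at every scale; each later frame adds a small
    Gaussian step to the previous frame's noise maps.
    """
    rng = random.Random(seed)
    frames = [[list(z) for z in z_rec_per_scale]]
    for _ in range(n_frames - 1):
        prev = frames[-1]
        frames.append(
            [[v + rng.gauss(0.0, step_std) for v in z] for z in prev]
        )
    return frames


# Two scales: a coarse map of length 2 and a finer map of length 4.
z_rec = [[0.5, -0.5], [0.0, 0.0, 0.0, 0.0]]
frames = animate_noise(z_rec, n_frames=4)
```

Feeding each `frames[t]` through the generators yields one image per frame. Since the first frame uses $z^{rec}$, the clip starts from the reconstruction of the training image.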