(ICCV 2019) An Internal Learning Approach to Video Inpainting

Keyword [Deep Image Prior]

Zhang H, Mai L, Xu N, et al. An Internal Learning Approach to Video Inpainting[J]. arXiv preprint arXiv:1909.07957, 2019.

1. Overview

This paper proposes a video inpainting method (DIP-Vid-Flow) that is
1) Based on Deep Image Prior.
2) Based on internal learning on the input video itself (with several loss functions).

2. Algorithm

2.1. Loss Function

$L=\omega_r L_r + \omega_f L_f + \omega_c L_c + \omega_p L_p$

1) $\omega_r=1$. Weight of the image generation loss.
2) $\omega_f=0.1$. Weight of the flow generation loss.
3) $\omega_c=1$. Weight of the consistency loss.
4) $\omega_p=0.01$. Weight of the perceptual loss.
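
As a minimal sketch of how the four terms combine, the weighted sum can be written as below. The dictionary keys, the function name `total_loss` and the example values are illustrative assumptions, not taken from the paper or its code.

```python
# Weights as listed above; the dictionary keys are illustrative names.
WEIGHTS = {"image": 1.0, "flow": 0.1, "consistency": 1.0, "perceptual": 0.01}


def total_loss(terms, weights=WEIGHTS):
    """Weighted sum L = sum_k w_k * L_k over the four loss terms."""
    return sum(weights[name] * value for name, value in terms.items())


# Made-up per-term values, only to show the call.
example = {"image": 0.5, "flow": 2.0, "consistency": 0.3, "perceptual": 10.0}
print(total_loss(example))
```

With these weights, the flow and perceptual terms contribute far less per unit of raw loss than the image and consistency terms.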

2.2. Image Generation Loss

$L_r(\hat{I}_i)=||M_i \odot (\hat{I}_i - I_i)||_2^2$

1) $M_i$. Binary mask of the known regions (1 for known pixels, 0 inside the hole).
2) $\hat{I}_i$, $I_i$. Generated frame and input frame.
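
A minimal NumPy sketch of this masked squared error follows. It reads the norm literally as a sum over pixels and channels; the array shapes and the function name are assumptions made for illustration.

```python
import numpy as np


def image_generation_loss(pred, target, mask):
    """Squared L2 error restricted to known pixels.

    pred, target: float arrays of shape (H, W, C).
    mask: array of shape (H, W), 1 for known pixels and 0 inside the hole.
    """
    diff = mask[..., None] * (pred - target)  # broadcast the mask over channels
    return float(np.sum(diff ** 2))


target = np.zeros((2, 2, 3))
pred = np.ones((2, 2, 3))
mask = np.array([[1.0, 0.0], [1.0, 1.0]])  # one hole pixel at (0, 1)
loss = image_generation_loss(pred, target, mask)
```

Pixels inside the hole receive no direct supervision from this term, so their content has to come from the network prior and the other losses.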

2.3. Flow Generation Loss

$L_f(\hat{F}_{i,j})=||O_{i,j}\odot M^f_{i,j}\odot (\hat{F}_{i,j}- F_{i,j}) ||_2^2$

1) $F_{i,j}$. Optical flow from frame $I_i$ to frame $I_j$.
2) $M^f_{i,j} = M_i \cap M_j (F_{i,j})$. Reliable flow region, computed as the intersection of the mask of frame $i$ and the mask of frame $j$ aligned to frame $i$.
3) 6 adjacent frames: $j \in \{i \pm 1, i \pm 3, i \pm 5\}$.
4) $O_{i,j}, F_{i,j}$. Occlusion map and flow estimated by PWC-Net; $\hat{F}_{i,j}$ is the flow generated by the network.
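
The sketch below illustrates the neighbor-frame selection and the masked flow error. Two points are assumptions rather than details from the notes: neighbors that fall outside the video are simply dropped, and the occlusion map uses 1 for non-occluded pixels. The reliable-flow mask is taken as a precomputed input.

```python
import numpy as np


def neighbor_indices(i, num_frames, offsets=(1, 3, 5)):
    """Indices j in {i±1, i±3, i±5} that fall inside the video."""
    candidates = [i + sign * d for d in offsets for sign in (-1, 1)]
    # Assumption: out-of-range neighbors are dropped at the video boundaries.
    return [j for j in candidates if 0 <= j < num_frames]


def flow_generation_loss(flow_pred, flow_ref, occlusion, flow_mask):
    """Squared L2 flow error where the reference flow is considered reliable.

    flow_pred, flow_ref: arrays of shape (H, W, 2).
    occlusion, flow_mask: arrays of shape (H, W) with values in {0, 1}.
    """
    weight = (occlusion * flow_mask)[..., None]  # broadcast over the 2 flow channels
    return float(np.sum((weight * (flow_pred - flow_ref)) ** 2))
```

For example, `neighbor_indices(0, 10)` keeps only the forward neighbors, since frames before the start of the video do not exist.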

2.4. Consistency Loss

$L_c(\hat{I}_j, \hat{F}_{i,j}) = || (1-M_{i,j}^f) \odot ( \hat{I}_j(\hat{F}_{i,j}) - \hat{I}_i) ||_2^2$

1) $I(F)$. Image $I$ warped by flow $F$.
2) $1 - M_{i,j}^f$. Encourages the training to focus on propagating information inside the hole.
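
To make the warp-and-compare step concrete, here is a NumPy sketch. It uses nearest-neighbor backward warping with border clamping, which is a simplification chosen for readability; a trainable implementation would need a differentiable (for example bilinear) sampler. The `(dx, dy)` flow layout and all function names are assumptions.

```python
import numpy as np


def warp_nearest(image, flow):
    """Backward-warp `image` (H, W, C) with `flow` (H, W, 2) holding (dx, dy).

    Output pixel (y, x) is read from image[y + dy, x + dx], rounded to the
    nearest pixel and clamped to the image border.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return image[src_y, src_x]


def consistency_loss(pred_i, pred_j, flow_pred_ij, flow_mask):
    """Squared error between frame i and warped frame j, outside the reliable region."""
    warped_j = warp_nearest(pred_j, flow_pred_ij)
    weight = (1.0 - flow_mask)[..., None]  # 1 where the flow mask is 0
    return float(np.sum((weight * (warped_j - pred_i)) ** 2))
```

Because the weight is the complement of the reliable-flow mask, this term only acts where the flow generation loss does not.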

2.5. Perceptual Loss

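$L_p(\hat{I}_i) = \sum_{k \in K} || \psi_k (M_i) \odot (\phi_k (\hat{I}_i) - \phi_k(I_i)) ||_2^2$

1) $K$. The 3 layers {relu1_2, relu2_2, relu3_3} of a pre-trained VGG16; $\phi_k$ is the feature map of layer $k$.
2) $\psi_k(M_i)$. The mask resized to the spatial size of the layer-$k$ feature map.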
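
The following sketch shows only the masking and summation over layers. It assumes the VGG16 feature maps have already been extracted and are passed in as arrays, and that $\psi_k$ resizes the mask to each feature map's spatial size; the nearest-neighbor resize is an illustrative choice, not necessarily what the authors use.

```python
import numpy as np


def resize_mask_nearest(mask, out_h, out_w):
    """Nearest-neighbor resize of a (H, W) mask to (out_h, out_w)."""
    h, w = mask.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return mask[rows][:, cols]


def perceptual_loss(feats_pred, feats_target, mask):
    """Sum over layers of the masked squared feature difference.

    feats_pred, feats_target: lists of (H_k, W_k, C_k) feature maps, one per layer.
    mask: (H, W) array, 1 for known pixels and 0 inside the hole.
    """
    total = 0.0
    for f_pred, f_tgt in zip(feats_pred, feats_target):
        m = resize_mask_nearest(mask, f_pred.shape[0], f_pred.shape[1])[..., None]
        total += float(np.sum((m * (f_pred - f_tgt)) ** 2))
    return total
```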

2.6. Details

1) Pick $N$ frames at a fixed frame interval of $t$ as a batch. The authors find that this helps propagate information more consistently across the frames in the batch.
2) The authors find that 50-100 updates per batch work best.
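
A small sketch of the batch selection in 1): pick $N$ indices spaced $t$ frames apart. Choosing the start frame uniformly at random is an assumption, since the notes do not say how it is picked; the function name and the `rng` parameter are hypothetical.

```python
import random


def sample_batch(num_frames, n, t, rng=random):
    """Return n frame indices spaced t apart, starting at a random valid frame."""
    span = (n - 1) * t  # distance between the first and last frame of the batch
    if span >= num_frames:
        raise ValueError("video too short for this n and t")
    start = rng.randrange(num_frames - span)
    return [start + k * t for k in range(n)]


batch = sample_batch(num_frames=100, n=5, t=3, rng=random.Random(0))
```

Each such batch would then be used for the 50-100 updates mentioned in 2) before the next batch is drawn.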

3. Experiments

3.1. Ablation Study

3.2. Window Length