Keyword [ESPCN] [Pixel Shuffle] [Optical Flow] [FlowNet]
Shi W, Caballero J, Huszár F, et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 1874-1883.
1. Overview
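Earlier work first upscales the low-resolution (LR) image to high-resolution (HR) space with bicubic interpolation and then feeds it into the neural network. Because every convolution then runs at HR resolution, this approach is computationally expensive.
The paper proposes ESPCN (efficient sub-pixel convolutional neural network), which:
- Reduces computational complexity and increases speed.
The input is an image in LR space (3 x h x w). The last Conv layer of the network is the efficient sub-pixel convolution layer, whose output has shape 3r*r x h x w, where r is the upscaling factor. This output is finally rearranged into HR space (3 x rh x rw).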
Pipeline
No bicubic upscaling is needed, and accuracy improves.
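The rearrangement from 3r*r x h x w to 3 x rh x rw can be sketched in NumPy as below. This is a minimal illustration, not the authors' code; the function name `pixel_shuffle` and the channel ordering (the same one `torch.nn.PixelShuffle` uses) are assumptions.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (c*r*r, h, w) array into (c, r*h, r*w).

    Assumed channel ordering: input channel c*r*r + i*r + j supplies the
    output pixel at row h*r + i, column w*r + j of output channel c.
    """
    c_rr, h, w = x.shape
    c = c_rr // (r * r)
    x = x.reshape(c, r, r, h, w)    # split channels into (c, i, j)
    x = x.transpose(0, 3, 1, 4, 2)  # reorder to (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)

# Four LR feature maps of size 2 x 2 become one 4 x 4 HR channel (r = 2).
lr_features = np.arange(16).reshape(4, 2, 2)
hr_channel = pixel_shuffle(lr_features, 2)
print(hr_channel.shape)  # (1, 4, 4)
```

Each r x r block of the output is filled from the r*r feature maps at the same LR position, which is why no interpolation step is needed.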
1.1. Results
- Real-time SR of 1080p videos on single K2 GPU
- r*r times faster
- Perform better (+0.15dB on Images, +0.39dB on Videos)
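The r*r factor can be illustrated by counting multiply-accumulates for a single convolution layer at LR versus HR resolution. This is only a per-layer sketch: the helper `conv_macs`, the frame size, and the layer widths are illustrative assumptions, and measured speedups depend on more than this count.

```python
def conv_macs(h, w, k, c_in, c_out):
    """Multiply-accumulates for a stride-1, same-padded k x k convolution."""
    return h * w * k * k * c_in * c_out

r = 3
h, w = 360, 640  # hypothetical LR frame; with r = 3 the HR frame is 1080 x 1920

lr_cost = conv_macs(h, w, 3, 64, 32)          # layer applied in LR space
hr_cost = conv_macs(r * h, r * w, 3, 64, 32)  # same layer applied in HR space
speedup = hr_cost / lr_cost
print(speedup)  # 9.0, i.e. r*r
```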
1.2. SR Problem
- ill-posed problem
- multiple solutions (one-to-many mapping).
- Key assumption: much of the high-frequency data (edges) is redundant and thus can be accurately reconstructed from low-frequency components.
1.3. Related Work
- Edge-based
- Image statistics-based
- Patch-based
- Sparsity-based (sparse coding): a dictionary (prior) captures the correspondence between LR and HR. Computationally expensive.
- Random forest
- Auto-encoder
- SRCNN
1.4. Dataset
1.4.1. Image
- Timofte (widely used in SISR papers)
91 training images and 2 test sets (Set5 and Set14, with 5 and 14 images respectively).
- Berkeley segmentation dataset (BSD300, BSD500)
- Super texture dataset
136 texture images.
- ImageNet
5000 randomly selected images.
1.4.2. Video
- Xiph
8 videos at 1920x1080, each ≈10s long.
- Ultra Video Group
7 videos at 1920x1080, each 5s long.
1.5. Future Work
- Neighbouring video frames
- Spatial-temporal network
2. Experiments
2.1. Network Architecture
ESPCN architecture used in the experiments:
- Input (b, 3, h, w)
- Conv_1 (5x5, 64, 1s) –> (b, 64, h, w)
- Conv_2 (3x3, 32, 1s) –> (b, 32, h, w)
- Conv_3 (3x3, 3r*r, 1s) –> (b, 3r*r, h, w)
- PixelShuffle (r) –> (b, 3, rh, rw)
The model uses tanh activations; the experiments compare them against ReLU.
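The layer list above can be sketched in PyTorch as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the class name `ESPCN`, the "same" padding values (chosen so that h and w are preserved, as the shapes above require), and the absence of an activation after Conv_3 are all assumptions.

```python
import torch
import torch.nn as nn

class ESPCN(nn.Module):
    """Three convolutions in LR space followed by a pixel shuffle."""

    def __init__(self, upscale: int, channels: int = 3):
        super().__init__()
        self.body = nn.Sequential(
            # Conv_1: 5x5, 64 filters, stride 1; padding=2 keeps h x w (assumed)
            nn.Conv2d(channels, 64, kernel_size=5, stride=1, padding=2),
            nn.Tanh(),
            # Conv_2: 3x3, 32 filters, stride 1
            nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1),
            nn.Tanh(),
            # Conv_3: 3x3, 3*r*r filters, stride 1; no activation (assumed)
            nn.Conv2d(32, channels * upscale * upscale,
                      kernel_size=3, stride=1, padding=1),
        )
        # Rearranges (b, 3*r*r, h, w) into (b, 3, r*h, r*w)
        self.shuffle = nn.PixelShuffle(upscale)

    def forward(self, x):
        return self.shuffle(self.body(x))

model = ESPCN(upscale=3)
lr_batch = torch.randn(2, 3, 16, 20)  # (b, 3, h, w)
with torch.no_grad():
    hr_batch = model(lr_batch)
print(hr_batch.shape)  # torch.Size([2, 3, 48, 60])
```

Swapping `nn.Tanh()` for `nn.ReLU()` gives the variant used in the tanh versus ReLU comparison.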
2.2. Loss Function
MSE.
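Written out, a pixel-wise MSE over the HR output takes the following form. The symbols are assumptions introduced here for illustration: $I^{HR}$ is the ground-truth HR image, $I^{LR}$ the LR input of size $H \times W$, $f$ the network, and $r$ the upscaling factor.

$$
\ell = \frac{1}{r^2 H W} \sum_{x=1}^{rH} \sum_{y=1}^{rW} \left( I^{HR}_{x,y} - f_{x,y}(I^{LR}) \right)^2
$$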
2.3. Evaluation Metric
PSNR of luminance in YCbCr.
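A minimal NumPy sketch of this metric is shown below. The notes do not say which RGB-to-YCbCr conversion is used, so the ITU-R BT.601 coefficients here are an assumption, as are the function names `rgb_to_luma` and `psnr_y` and the 8-bit peak value of 255.

```python
import numpy as np

def rgb_to_luma(img):
    """Y channel of YCbCr for an 8-bit RGB image of shape (h, w, 3).

    Uses ITU-R BT.601 coefficients (an assumption; other conversions exist).
    """
    img = img.astype(np.float64)
    return 16.0 + (65.481 * img[..., 0]
                   + 128.553 * img[..., 1]
                   + 24.966 * img[..., 2]) / 255.0

def psnr_y(sr, hr, max_val=255.0):
    """PSNR in dB between the luminance channels of two RGB images."""
    mse = np.mean((rgb_to_luma(sr) - rgb_to_luma(hr)) ** 2)
    if mse == 0:
        return float("inf")  # identical luminance
    return 10.0 * np.log10(max_val ** 2 / mse)

# Usage with random stand-ins for a super-resolved image and its ground truth.
rng = np.random.default_rng(0)
hr_img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
sr_img = np.clip(hr_img.astype(np.int64) + rng.integers(-5, 6, size=hr_img.shape),
                 0, 255).astype(np.uint8)
score = psnr_y(sr_img, hr_img)
```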