(arXiv 2017) Rethinking Atrous Convolution for Semantic Image Segmentation

Keywords [DeepLabv3] [Dilated Conv] [ASPP]

Chen L, Papandreou G, Schroff F, et al. Rethinking Atrous Convolution for Semantic Image Segmentation[J]. arXiv: Computer Vision and Pattern Recognition, 2017.

1. Overview

This paper proposes DeepLabv3 for semantic image segmentation.

  • Design modules that apply Atrous Conv in cascade or in parallel, capturing multi-scale context with multiple atrous rates (multi-grid).
  • Augment the Atrous Spatial Pyramid Pooling (ASPP) module with image-level features from global average pooling.
  • Remove the CRF post-processing step.
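
As a reminder of what "atrous rate" means in the points above, here is a minimal 1-D sketch in plain Python. It is an illustration, not the paper's implementation: the function names `atrous_conv1d` and `effective_kernel_size` are hypothetical, the filter is centred on each output position, and zero padding is assumed.

```python
def atrous_conv1d(x, w, rate):
    """1-D atrous convolution: y[i] = sum_k x[i + rate * (k - c)] * w[k].

    Assumptions (not from the notes): odd-length filter centred on i,
    zero padding outside the signal.
    """
    c = len(w) // 2  # index of the centre tap
    y = []
    for i in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            j = i + rate * (k - c)  # taps are spaced `rate` samples apart
            if 0 <= j < len(x):     # out-of-range taps read zero padding
                acc += x[j] * wk
        y.append(acc)
    return y


def effective_kernel_size(kernel_size, rate):
    """Span of a filter after inserting (rate - 1) holes between taps."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5]
    print(atrous_conv1d(signal, [1, 1, 1], rate=1))  # standard convolution
    print(atrous_conv1d(signal, [1, 1, 1], rate=2))  # same 3 weights, wider span
    print(effective_kernel_size(3, 2))               # 5
```

With `rate=1` this reduces to an ordinary convolution; larger rates widen the field of view without adding weights.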

1.1. Multi-scale Methods

2. Details

2.1. Cascaded Modules

2.2. Parallel Modules

2.3. Multi-grid Method

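1) Apply different atrous rates to the three Convs within each of $block4$ to $block7$.
2) The final rate of each Conv is the block's unit rate multiplied by the corresponding multi-grid value. For example, if $MultiGrid=(1,2,4)$ and the unit rate is $2$, then $rates=2 \cdot (1,2,4)=(2,4,8)$.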
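
The rate arithmetic in the two points above can be sketched in a few lines of Python. The helper name `multi_grid_rates` is hypothetical, and the unit rates in the usage example (doubling from 2 at block4) are an assumption for illustration only.

```python
def multi_grid_rates(unit_rate, multi_grid=(1, 2, 4)):
    """Final atrous rates for the three convolutions of one block.

    Each rate is the block's unit rate times the matching multi-grid value.
    """
    return tuple(unit_rate * g for g in multi_grid)


if __name__ == "__main__":
    # Assumed unit rates per block, for illustration only.
    assumed_unit_rates = {"block4": 2, "block5": 4, "block6": 8, "block7": 16}
    for name, unit_rate in assumed_unit_rates.items():
        print(name, multi_grid_rates(unit_rate))
    # block4 -> (2, 4, 8)
```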

2.4. ASPP

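1) Contains: three Dilated $3 \times 3$ Convs, one $1 \times 1$ Conv, and Global AVGPool (image-level features).
2) When the rate is too large (close to the feature-map size), most taps of the $3 \times 3$ filter fall into the zero padding and only the center weight is effective, so the Dilated Conv degenerates to a $1 \times 1$ Conv.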
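
The degeneration in point 2 can be checked by counting, for every output position, how many of the nine filter taps land inside the feature map. The sketch below is an illustration under stated assumptions: `valid_tap_histogram` is a hypothetical helper, zero padding is assumed, and the 65 x 65 map size is just an example.

```python
from collections import Counter


def valid_tap_histogram(size, rate, kernel_size=3):
    """Count output positions of a size x size map by number of in-bounds taps.

    Returns a Counter mapping "number of valid taps" -> "number of positions".
    Assumes a square, centred kernel and zero padding.
    """
    c = kernel_size // 2
    offsets = [rate * (k - c) for k in range(kernel_size)]
    # In-bounds taps along one axis for each coordinate; the 2-D count is
    # the product of the row count and the column count.
    per_axis = [sum(0 <= i + o < size for o in offsets) for i in range(size)]
    hist = Counter()
    for rows in per_axis:
        for cols in per_axis:
            hist[rows * cols] += 1
    return hist


if __name__ == "__main__":
    for rate in (1, 6, 30, 65):
        hist = valid_tap_histogram(65, rate)
        total = sum(hist.values())
        print(rate, {taps: round(n / total, 3) for taps, n in sorted(hist.items())})
```

At a rate of 1 almost every position sees all nine weights; once the rate reaches the map size, every position sees only the centre weight, which is exactly a $1 \times 1$ Conv.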

3. Experiments

3.1. Cascaded Modules

3.1.1. Output Stride

3.1.2. Deeper

3.1.3. Multi-Grid

3.1.4. Inference Strategy

3.2. Parallel Modules

3.3. Comparison