Keyword [Group Normalization]

Wu Y, He K. Group normalization[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 3-19.

1. Overview

1.1. Motivation

Normalizing along the batch dimension causes problems: as the batch size gets smaller, BN’s error increases rapidly
BN helps convergence (the stochastic uncertainty of the batch statistics acts as a regularizer, which benefits generalization), but it performs worse with small batches
SIFT, HOG. Group-wise features with group-wise normalization
BN’s statistics are computed per GPU and are not synchronized across GPUs

This paper proposes Group Normalization (GN):

independent of the batch dimension
divides the channels into groups and computes μ, σ within each group

1.2. Related Works

1.2.1. Normalization

Local Response Normalization (LRN). Computes the statistics in a small neighbourhood around each pixel
Batch Normalization (BN)
Layer Normalization (LN)
Instance Normalization (IN)
Weight Normalization (WN)
LN, IN, WN. Independent of the batch dimension.
LN, IN. Successful in RNN and GAN models.

1.2.2. Addressing Small Batch

Batch Renormalization (BR). Introduces two extra parameters that constrain the estimated μ, σ of BN
Synchronized BN. μ, σ are computed across multiple GPUs
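
A minimal sketch of the idea behind Synchronized BN, assuming each device reports per-channel partial sums that are then combined. The function names and the list standing in for cross-GPU communication are hypothetical and not taken from any framework.

```python
import numpy as np


def local_stats(x):
    """Per-device partial sums over (N, H, W) for each channel of an (N, C, H, W) batch."""
    count = x.shape[0] * x.shape[2] * x.shape[3]
    return count, x.sum(axis=(0, 2, 3)), (x ** 2).sum(axis=(0, 2, 3))


def synchronized_stats(per_device):
    """Combine the partial sums of all devices into global per-channel mean and variance."""
    total = sum(count for count, _, _ in per_device)
    sum_x = sum(s for _, s, _ in per_device)
    sum_x2 = sum(q for _, _, q in per_device)
    mean = sum_x / total
    var = sum_x2 / total - mean ** 2  # E[x^2] - E[x]^2
    return mean, var


rng = np.random.default_rng(0)
# Four hypothetical devices, two images each, three channels.
shards = [rng.normal(size=(2, 3, 4, 4)) for _ in range(4)]
mean, var = synchronized_stats([local_stats(s) for s in shards])
```

The combined statistics match those of one batch of 8 images, which is why synchronization enlarges the effective batch size for BN.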

1.2.3. Group-wise Computation

Group convolution. AlexNet
ResNeXt
Depth-wise convolution. MobileNet, Xception
ShuffleNet
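
To make the group-wise idea concrete, here is a small sketch that counts convolution weights for different group numbers. The helper `conv_weight_count` and the channel sizes are illustrative assumptions, not values from the paper.

```python
def conv_weight_count(c_in, c_out, kernel_size, groups=1):
    """Number of weights (bias excluded) in a 2D convolution split into `groups` groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees the c_in / groups input channels of its own group.
    return (c_in // groups) * c_out * kernel_size * kernel_size


dense = conv_weight_count(64, 64, 3)                 # ordinary convolution
grouped = conv_weight_count(64, 64, 3, groups=32)    # group convolution
depthwise = conv_weight_count(64, 64, 3, groups=64)  # depth-wise: one group per channel
```

Depth-wise convolution is the extreme case where the number of groups equals the number of channels.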

1.3. Dataset

ImageNet. Classification
COCO. Object detection, segmentation
Kinetics. Video classification

1.4. Group Normalization

Relations that can lead channels to form a group

horizontal flipping
orientation
frequency
shape
illumination
texture

General formulation (axes over which μ, σ are computed)
BN (along NHW)
LN (along CHW)
IN (along HW)
GN (along HW and a group of C/G channels)

G. Number of groups; C/G. Number of channels per group
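
The general formulation can be written out as below. This is a sketch in the paper's notation as far as these notes allow: i = (i_N, i_C, i_H, i_W) indexes one feature, S_i is the set of features over which the statistics are computed, m is the size of S_i, and ε is a small constant.

```latex
\hat{x}_i = \frac{1}{\sigma_i}\,(x_i - \mu_i), \qquad
\mu_i = \frac{1}{m}\sum_{k \in \mathcal{S}_i} x_k, \qquad
\sigma_i = \sqrt{\frac{1}{m}\sum_{k \in \mathcal{S}_i} (x_k - \mu_i)^2 + \epsilon}, \qquad
y_i = \gamma\,\hat{x}_i + \beta

\text{BN: } \mathcal{S}_i = \{\, k \mid k_C = i_C \,\}
\text{LN: } \mathcal{S}_i = \{\, k \mid k_N = i_N \,\}
\text{IN: } \mathcal{S}_i = \{\, k \mid k_N = i_N,\ k_C = i_C \,\}
\text{GN: } \mathcal{S}_i = \Big\{\, k \;\Big|\; k_N = i_N,\
  \Big\lfloor \frac{k_C}{C/G} \Big\rfloor = \Big\lfloor \frac{i_C}{C/G} \Big\rfloor \Big\}
```

The methods differ only in the choice of S_i; γ and β are learned per channel in all of them.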

(G=1) → LN (assumes all channels make similar contributions; more restrictive than GN)
(G=C) → IN (does not exploit channel dependence)
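
A minimal NumPy sketch of GN under the assumption of an (N, C, H, W) layout. The function name, the default `eps`, and the toy shapes are illustrative choices, not the paper's reference code. It also shows the two special cases G=1 and G=C.

```python
import numpy as np


def group_norm(x, num_groups, gamma=None, beta=None, eps=1e-5):
    """Group Normalization for a batch of feature maps in (N, C, H, W) layout."""
    n, c, h, w = x.shape
    if c % num_groups != 0:
        raise ValueError("C must be divisible by the number of groups G")

    # Split the C channels into G groups of C/G channels each.
    grouped = x.reshape(n, num_groups, c // num_groups, h, w)

    # Statistics are computed per sample and per group, over (C/G, H, W).
    mean = grouped.mean(axis=(2, 3, 4), keepdims=True)
    var = grouped.var(axis=(2, 3, 4), keepdims=True)
    normed = ((grouped - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)

    # Per-channel scale and shift (identity when not provided).
    if gamma is None:
        gamma = np.ones(c)
    if beta is None:
        beta = np.zeros(c)
    return normed * gamma.reshape(1, c, 1, 1) + beta.reshape(1, c, 1, 1)


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8, 4, 4))

y_gn = group_norm(x, num_groups=4)  # C/G = 2 channels per group
y_ln = group_norm(x, num_groups=1)  # G = 1: same statistics as LN
y_in = group_norm(x, num_groups=8)  # G = C: same statistics as IN
```

No statistic is taken over the batch axis, so the output for one sample does not depend on the batch size.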

1.5. Future Works

investigate GN in reinforcement learning (RL)

2. Experiments

2.1. Ablation Study

2.1.1. Batch Size (Classification)

Linear learning rate scaling rule. LR is 0.1 for a batch size of 32, and 0.1N/32 for a batch size of N.
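
The scaling rule can be sketched as a one-line helper. The function name and the list of batch sizes are assumptions made for illustration.

```python
def scaled_learning_rate(batch_size, base_lr=0.1, base_batch_size=32):
    """Linear scaling rule: the learning rate is proportional to the batch size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return base_lr * batch_size / base_batch_size


# Learning rates for a few example batch sizes.
for n in (32, 16, 8, 4, 2):
    print(n, scaled_learning_rate(n))
```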

2.1.2. Batch Size (Video Classification)

2.1.3. Group & Channel Number

2.1.4. Distribution

2.2. Comparison

2.2.1. Classification

2.2.2. Detection & Segmentation

Replace BN* (frozen BN) with GN. When fine-tuning, a weight decay of 0 for γ and β is important for good detection results
RoIs sampled from the same image are not i.i.d., which degrades BN’s batch statistics estimation

2.2.3. Video Classification
