Keyword [Pre-activation] [Identity Mapping]
He K, Zhang X, Ren S, et al. Identity mappings in deep residual networks[C]//European Conference on Computer Vision. Springer, Cham, 2016: 630-645.
1. Overview
This paper analyzes the propagation formulations behind the residual building blocks and reports two findings:
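- identity mapping on the skip connection is better than the alternatives tested (scaling, gating, 1x1 convolution, dropout)
- pre-activation (BN + ReLU before each weight layer) is better than post-activation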
1.1. Identity Mapping
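The argument for the identity shortcut can be written out in a few lines. This is a sketch in the paper's notation as best I can reconstruct it: x_l is the input to unit l, F is the residual function with weights W_l, h is the shortcut, f is the function applied after the addition, and E is the loss.

```latex
\begin{align}
y_l &= h(x_l) + \mathcal{F}(x_l, W_l), \qquad x_{l+1} = f(y_l) \\
x_{l+1} &= x_l + \mathcal{F}(x_l, W_l) \quad \text{(when $h$ and $f$ are both identity)} \\
x_L &= x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, W_i) \\
\frac{\partial E}{\partial x_l} &= \frac{\partial E}{\partial x_L}
  \left(1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} \mathcal{F}(x_i, W_i)\right)
\end{align}
```

The additive 1 means the gradient at x_L reaches any shallower unit directly, so it is unlikely to vanish even when the weights are small. If the shortcut is scaled instead, h(x_l) = λ x_l, that term becomes a product of λ factors, which can shrink or grow exponentially with depth.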
1.2. Pre-Activation
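- in the original (post-activation) unit, BN normalizes F(x), but the sum x + F(x) is not normalized before it is passed to the next unit
- with pre-activation (BN + ReLU before each weight layer), the input to every weight layer is normalized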
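The difference in layer ordering can be sketched in plain Python. This is a toy illustration, not the paper's implementation: it uses fully connected layers on a single vector instead of convolutions, `normalize` is a per-vector stand-in for batch normalization (no learned scale or shift), and all function names are made up for this note.

```python
import math


def normalize(v, eps=1e-5):
    """Stand-in for BN: zero mean, unit variance over one vector's features."""
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]


def relu(v):
    return [max(0.0, a) for a in v]


def linear(w, v):
    """Matrix-vector product; w is a list of rows."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def post_activation_unit(x, w1, w2):
    # Original ordering: weight -> BN -> ReLU -> weight -> BN -> add -> ReLU
    f = normalize(linear(w2, relu(normalize(linear(w1, x)))))
    return relu(add(x, f))  # ReLU after the addition alters the shortcut path


def pre_activation_unit(x, w1, w2):
    # Pre-activation ordering: BN -> ReLU -> weight -> BN -> ReLU -> weight -> add
    f = linear(w2, relu(normalize(linear(w1, relu(normalize(x))))))
    return add(x, f)  # nothing is applied after the addition


x = [-2.0, 0.5, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
print(pre_activation_unit(x, zeros, zeros))   # [-2.0, 0.5, 3.0]
print(post_activation_unit(x, zeros, zeros))  # [0.0, 0.5, 3.0]
```

With all weights set to zero the residual branch contributes nothing, so the pre-activation unit returns x unchanged, while the post-activation unit still clips the negative entry because of the ReLU after the addition. In the pre-activation unit every call to `linear` also receives an input that has just been normalized.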