Keyword [Bilinear CNN]
Lin T Y, RoyChowdhury A, Maji S. Bilinear cnn models for fine-grained visual recognition[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 1449-1457.
1. Overview
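The two-streams hypothesis of the human brain. It holds that the brain has two visual pathways:
- Ventral stream ("what" pathway): involved in object recognition.
- Dorsal stream ("where" pathway): processes the spatial location of objects relative to the viewer.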
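Inspired by this hypothesis, the paper proposes a bilinear model. The model can be trained end-to-end and is aimed at fine-grained classification.
The model has two streams, intended to handle, respectively: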
- localization (where) [part detector]
- appearance modeling (what) [feature extractor]
However, the experiments show no clear division of labor between the two streams: both tend to activate on specific semantic parts.
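2. Computation
Start from the feature maps output by the two streams, with shapes (h, w, c1) and (h, w, c2):
- First, at each spatial location, take the outer product of the two feature vectors. This models part-feature interaction. (h, w, c1*c2)
- Next, sum-pool over all spatial locations. (c1*c2)
- Then, apply the signed square root followed by L2 normalization.
- Finally, feed the normalized vector to a classifier.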
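The steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code. It assumes both streams have already produced feature maps with the same spatial size. The function name `bilinear_pool` and the `eps` guard are hypothetical. The classifier is left out. One detail is worth noting: summing the per-location outer products f_A(l)^T f_B(l) over all locations l equals the single matrix product A^T B, where the rows of A and B are the flattened locations.

```python
import numpy as np


def bilinear_pool(feat_a: np.ndarray, feat_b: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Hypothetical helper: bilinear pooling of two (h, w, c) maps into a c1*c2 vector.

    feat_a: (h, w, c1) output of stream A (assumed precomputed)
    feat_b: (h, w, c2) output of stream B; spatial size must match feat_a
    """
    h, w, c1 = feat_a.shape
    h2, w2, c2 = feat_b.shape
    if (h, w) != (h2, w2):
        raise ValueError("both streams must share the same spatial size")

    # Flatten spatial positions: row l holds the feature vector at location l.
    a = feat_a.reshape(h * w, c1)
    b = feat_b.reshape(h * w, c2)

    # Outer product at every location + sum pooling == A^T B, shape (c1, c2).
    x = (a.T @ b).reshape(c1 * c2)

    # Signed square root: sign(x) * sqrt(|x|).
    y = np.sign(x) * np.sqrt(np.abs(x))

    # L2 normalization; eps (an assumption) avoids dividing by zero.
    return y / max(np.linalg.norm(y), eps)
```

Computing `a.T @ b` avoids materializing the (h, w, c1*c2) tensor of per-location outer products. The result is identical to summing `np.outer(feat_a[i, j], feat_b[i, j])` over every location (i, j).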
3. Datasets
- (bird) CUB-200-2011: 11,788 images, 200 bird species
- (aircraft) FGVC-aircraft: 10,000 images, 100 aircraft types
- (car) Cars: 16,185 images, 196 car types