Xiao T, Li S, Wang B, et al. Joint detection and identification feature learning for person search[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 3376-3385.
1. Overview
1.1. Motivation
- existing methods mainly focus on matching cropped pedestrian images between queries and candidates (assuming perfect detection)
This paper proposes a framework for person search:
- jointly handle pedestrian detection and person re-identification
- proposal net. focuses more on recall than on precision
- proposal misalignments can be further adjusted by the identification net
- Online Instance Matching (OIM) loss function
- collect and annotate a large-scale benchmark dataset
1.2. Comparison of Loss Functions
- pairwise or triplet loss. the number of candidate pairs/triplets grows as O(N^2), so an efficient sampling strategy is needed but difficult to design (see the sketch after this list)
- softmax. compares each sample against all classes at the same time; as the number of classes increases, training the big softmax classifier matrix becomes much slower or may even fail to converge
- OIM.
- compare samples of mini-batch with all registered entries
- unlabeled identities can serve as negatives for labeled identities
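A quick back-of-the-envelope illustration of the O(N^2) point above (the sample counts are made up, not from the paper):

```python
from math import comb

# Hypothetical dataset sizes, just to illustrate why pairwise/triplet losses are hard
# to scale: the number of candidate pairs grows roughly as N^2 / 2.
for n in (1_000, 10_000, 100_000):
    pairs = comb(n, 2)
    print(f"N={n:>7,}  candidate pairs={pairs:>16,}")
# Most of these pairs are easy negatives, so an effective hard-example sampling
# strategy is required, and such a strategy is difficult to design.
```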
1.3. Contribution
- joint optimization of pedestrian detection and person re-identification in a single network
- OIM loss function
- dataset
1.4. Related Work
1.4.1. Person Re-identification
- manually design discriminative features
- learn feature transforms across camera views
- learning distance metrics
- CNN
- triplet samples
- classify
- on abnormal images. low-resolution and partially occluded images
1.4.2. Pedestrian Detection
- hand-crafted features. DPM, ACF and Checkerboards
1.5. Dataset
- CUHK03
- Market-1501
- Duke
2. Method
2.1. Structure
- output.
2048-d pooled feature → L2-normalized 256-d id feature → cosine similarities between query and gallery features (see the sketch below)
2048-d pooled feature → proposal refinement, so that misaligned proposals can be further adjusted
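A minimal PyTorch sketch of the id-feature branch described above; only the 2048→256 projection with L2 normalization and cosine matching is from the paper, everything else (names, toy input shapes) is my assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IdFeatureHead(nn.Module):
    """Project the 2048-d pooled feature to an L2-normalized 256-d id feature."""
    def __init__(self, in_dim: int = 2048, out_dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (num_proposals, 2048) pooled features from the identification net
        return F.normalize(self.proj(feats), dim=1)   # unit L2-norm, 256-d

# With unit-norm features, cosine similarity reduces to a dot product.
head = IdFeatureHead()
query = head(torch.randn(1, 2048))        # one query person
gallery = head(torch.randn(100, 2048))    # detected gallery proposals
scores = gallery @ query.t()              # (100, 1) cosine similarities for ranking
```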
2.2. Online Instance Matching Loss
- only the labeled and unlabeled identities are considered, while the other proposals are left untouched
the lookup table (LUT). L: the size of the table (number of labeled identities); D: feature vector dimension
forward. compute cosine similarities between the mini-batch sample and all the labeled identities.
x. the features of a labeled identity inside a mini-batch
backward. if the target class-id is t, update the t-th column of the LUT with the new feature, and then scale it to unit L2-norm
many unlabeled identities can safely be used as negative classes for all the labeled identities; their features are stored in a circular queue. Q: the size of the queue
forward. cosine similarities are also computed between the mini-batch sample and all entries of the circular queue (a sketch of both memory structures follows below)
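A rough sketch of the two memory structures (LUT and circular queue), assuming PyTorch tensors; the variable names and the momentum value are my assumptions, not from the paper:

```python
import torch
import torch.nn.functional as F
from typing import Optional

D, L, Q = 256, 5000, 5000           # feature dim, #labeled identities, queue size
lut = torch.zeros(L, D)             # lookup table: one feature per labeled identity
queue = torch.zeros(Q, D)           # circular queue for unlabeled identities

def update_memory(x: torch.Tensor, t: Optional[int]) -> None:
    """Backward-pass bookkeeping for one L2-normalized proposal feature x of shape (D,)."""
    global queue
    if t is not None:
        # labeled identity: blend x into the t-th entry, then rescale to unit L2-norm
        momentum = 0.5                                        # assumed value
        lut[t] = F.normalize(momentum * lut[t] + (1 - momentum) * x, dim=0)
    else:
        # unlabeled identity: push x into the circular queue, popping the oldest entry
        queue = torch.cat([queue[1:], x.unsqueeze(0)], dim=0)
```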
The probability of x being recognized as the labeled identity with class-id i:

$$p_i = \frac{\exp(v_i^{\top} x / \tau)}{\sum_{j=1}^{L} \exp(v_j^{\top} x / \tau) + \sum_{k=1}^{Q} \exp(u_k^{\top} x / \tau)}$$

where $v_j$ denotes the j-th LUT entry and $u_k$ the k-th entry of the circular queue
- L. the number of different target people
- Q. the size of the circular queue that stores unlabeled people
- τ. higher temperature leads to softer probability distribution
The probability of x being recognized as the i-th unlabeled identity is defined with the same denominator:

$$q_i = \frac{\exp(u_i^{\top} x / \tau)}{\sum_{j=1}^{L} \exp(v_j^{\top} x / \tau) + \sum_{k=1}^{Q} \exp(u_k^{\top} x / \tau)}$$
Maximization. The OIM objective is to maximize the expected log-likelihood $\mathcal{L} = \mathbb{E}_x[\log p_t]$; its gradient w.r.t. $x$ is

$$\frac{\partial \mathcal{L}}{\partial x} = \frac{1}{\tau}\Big[(1 - p_t)\,v_t - \sum_{j \neq t} p_j v_j - \sum_{k=1}^{Q} q_k u_k\Big]$$

so x is effectively compared against all labeled and unlabeled entries during training
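Putting $p_i$ and $q_i$ together, a simplified sketch of the OIM forward computation (not the authors' implementation; the gradient is left to autograd here instead of the hand-derived update above):

```python
import torch
import torch.nn.functional as F

def oim_loss(x: torch.Tensor, targets: torch.Tensor,
             lut: torch.Tensor, queue: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """
    x:       (B, D) L2-normalized features of labeled proposals in the mini-batch
    targets: (B,)   class-ids in [0, L)
    lut:     (L, D) labeled-identity features; queue: (Q, D) unlabeled features
    """
    sims = x @ torch.cat([lut, queue], dim=0).t()   # (B, L+Q) cosine similarities
    logits = sims / tau                             # temperature-scaled logits
    # Softmax over all L+Q entries gives exactly p_i for the labeled part, with the
    # Q unlabeled entries acting as extra negative classes; cross_entropy then
    # maximizes log p_t for the target identity t.
    return F.cross_entropy(logits, targets)
```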
2.3. Drawback of Softmax
- the classifier matrix suffers from large variance of gradients and cannot be learned effectively
- there is a large number of identities, each with only several instances, and each image contains only a few identities
- we need to learn more than 5,000 discriminant functions simultaneously, but during each SGD iteration we only have positive samples from tens of classes
- the unlabeled identities cannot be exploited with a softmax loss
- OIM is non-parametric.
- potential drawback. being non-parametric, it may overfit more easily; the authors find that projecting features into an L2-normalized low-dimensional subspace helps reduce overfitting
2.4. Scalability
- when the number of identities increases, computing OIM over all entries could become time-consuming
- approximate it by sub-sampling the labeled and unlabeled identities (a possible realization is sketched below)
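One possible realization of the sub-sampled approximation, reusing the PyTorch sketch above; the sampling scheme (keep the targets, add random labeled/unlabeled entries) is my guess at a reasonable choice, not the paper's exact procedure:

```python
import torch
import torch.nn.functional as F

def subsampled_oim_loss(x, targets, lut, queue, tau: float = 0.1, m: int = 100):
    """Approximate OIM by comparing against only ~m labeled and m unlabeled entries."""
    L = lut.size(0)
    # always keep the target identities, fill up with randomly chosen labeled ones
    keep = torch.unique(torch.cat([targets, torch.randperm(L)[:m]]))
    sub_queue = queue[torch.randperm(queue.size(0))[:m]]
    logits = x @ torch.cat([lut[keep], sub_queue], dim=0).t() / tau
    # remap the original class-ids to their positions inside the sub-sampled LUT
    remapped = (targets.unsqueeze(1) == keep.unsqueeze(0)).float().argmax(dim=1)
    return F.cross_entropy(logits, remapped)
```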
3. Dataset
3.1. Come From
- street snapshots taken with hand-held cameras
- movie snapshots
3.2. Processing
- pedestrians whose heights are smaller than 50 pixels are ignored
3.3. Evaluation
- no overlapping images or labeled identities between the training and test sets
3.4. Metrics
- cumulative matching characteristics (CMC top-K)
- mean average precision (mAP); a sketch of both metrics follows below
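A small numpy sketch of both metrics for a single query, given gallery similarity scores and binary match labels (the full protocol additionally checks overlap with the ground-truth box, which is omitted here):

```python
import numpy as np

def cmc_topk(scores: np.ndarray, matches: np.ndarray, k: int = 1) -> float:
    """CMC top-k for one query: 1 if a true match is among the k highest-scoring boxes."""
    order = np.argsort(-scores)
    return float(matches[order][:k].any())

def average_precision(scores: np.ndarray, matches: np.ndarray) -> float:
    """AP for one query; mAP is the mean of AP over all queries."""
    order = np.argsort(-scores)
    m = matches[order]
    if m.sum() == 0:
        return 0.0
    precision_at_hits = np.cumsum(m) / (np.arange(len(m)) + 1)
    return float((precision_at_hits * m).sum() / m.sum())

# toy usage with made-up similarities and labels
scores = np.array([0.9, 0.2, 0.75, 0.4])
matches = np.array([0, 0, 1, 1])    # 1 = gallery box shows the query identity
print(cmc_topk(scores, matches, k=1), average_precision(scores, matches))
```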
4. Experiments
4.1. Details
- τ. 0.1
- size of circular queue. 5,000
- mini-batch. 2 images
- learning rate. 0.001, decayed to 0.0001 after 40k iterations
4.2. Detection
4.3. Search
4.4. OIM
- converges faster
- consistently improves the test performance
4.5. Sub-sample of OIM
- using a smaller sub-sample size makes training converge faster
4.6. Low-dimensional Subspace
- projecting features into a proper low-dimensional subspace is important to regularize the network training
4.7. Detection Recall
- a higher detection recall does not necessarily lead to higher person search performance, since the re-id part can still be confused by false alarms
- we should not only focus on training re-id methods with manually cropped pedestrians, but should also consider the detections jointly under the person search problem setting
4.8. Gallery Size
- larger gallery, more difficult
- all methods may suffer from some common hard samples