Keyword [CLEVR-Ref+] [IEP-Ref] [IEP] [Neural Module Networks]
Liu R, Liu C, Bai Y, et al. CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions[J]. 2019.
1) Current benchmark datasets suffer from bias
2) Current SOTA models cannot be easily evaluated on their intermediate reasoning steps
This paper builds CLEVR-Ref+ (converted from the CLEVR VQA dataset) and proposes IEP-Ref:
- control over dataset bias
- the segmentation module at the end of IEP-Ref can be attached to any intermediate module to reveal the entire reasoning process
- IEP-Ref can correctly predict no foreground, even though every training expression refers to at least one object
1) construct the CLEVR-Ref+ dataset
2) test several SOTA models on CLEVR-Ref+, including IEP-Ref
3) the segmentation module trained in IEP-Ref can be trivially plugged into all intermediate steps
At test time, simply attach the trained Segment module to the output of every intermediate module.
The model is robust enough to return zero foreground when no object is referred.
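A minimal sketch of this test-time probing idea, assuming toy NumPy feature maps and a hypothetical linear-plus-sigmoid segmentation head (the paper's actual head is a trained CNN module):

```python
import numpy as np

def segment_head(feature, w):
    """Hypothetical shared segmentation head: projects a (C, H, W)
    feature map to a per-pixel foreground probability map (H, W)."""
    logits = np.tensordot(w, feature, axes=([0], [0]))  # weighted sum over channels
    return 1.0 / (1.0 + np.exp(-logits))               # sigmoid

def reveal_reasoning(intermediate_features, w, thresh=0.5):
    """Attach the same trained head to every intermediate module output,
    yielding one binary mask per reasoning step (as IEP-Ref does at test time).
    A mask may legitimately be all-background (zero foreground)."""
    return [segment_head(f, w) > thresh for f in intermediate_features]
```

Because the head is shared and only sees one feature map at a time, it can be applied to any step of the program without retraining, which is what makes the step-by-step visualization possible.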
1) Preprocess. Its output is the input to the Scene module.
2) Unary. Transforms one feature into another (Scene, Filter X, Unique, Relate, Same X modules)
3) Binary. Transforms two features into one (And, Or)
4) Postprocess. The segmentation module
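The four module types above can be sketched as a tiny program executor. This is an illustrative assumption: the module names and the toy "features" (sets of object IDs instead of CNN feature maps) are mine, not the paper's implementation.

```python
def run_program(program, scene_objects):
    """Execute a postfix-ordered functional program built from
    unary modules (scene, filter_*) and binary modules (and, or)."""
    modules = {
        "scene": lambda: set(scene_objects),                        # zero-arg: full scene
        "filter_red": lambda x: {o for o in x if "red" in o},       # toy unary filter
        "filter_cube": lambda x: {o for o in x if "cube" in o},     # toy unary filter
        "and": lambda a, b: a & b,                                  # binary: intersection
        "or": lambda a, b: a | b,                                   # binary: union
    }
    stack = []
    for op in program:
        fn = modules[op]
        arity = fn.__code__.co_argcount        # how many features the module consumes
        args = [stack.pop() for _ in range(arity)]
        stack.append(fn(*args))                # each step's output is an inspectable feature
    return stack.pop()
```

Every value pushed onto the stack is an intermediate feature; in IEP-Ref, those are exactly the tensors the Segment module is attached to.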