(CVPR 2019) CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions

Keyword [CLEVR-Ref+] [IEP-Ref] [IEP] [Neural Module Networks]

Liu R., Liu C., Bai Y., et al. CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions. CVPR 2019.

1. Overview

1.1. Motivation

1) Current benchmark datasets suffer from bias
2) Current SOTA models cannot be easily evaluated on their intermediate reasoning process

This paper builds CLEVR-Ref+ (converted from the CLEVR VQA dataset) and proposes IEP-Ref:

  • control over dataset bias
  • the segmentation module at the end of IEP-Ref can be attached to any intermediate module to reveal the entire reasoning process
  • IEP-Ref can correctly predict no foreground, even though every training example has at least one referred object

1.2. Contribution

1) construct the CLEVR-Ref+ dataset
2) evaluate several SOTA models on CLEVR-Ref+, including IEP-Ref
3) the segmentation module trained in IEP-Ref can be trivially plugged into all intermediate steps

2. CLEVR-Ref+

3. Experiments

3.1. Step-By-Step Inspection of Visual Reasoning

At test time, simply attach the trained Segment module to the output of every intermediate module; each reasoning step can then be decoded into a segmentation mask.
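A minimal sketch of this inspection loop (not the paper's code: features are toy sets of (color, shape) objects, the module and function names are illustrative, and `segment` stands in for the trained Segment module):

```python
def filter_attr(index, value):
    """Toy unary module: keep objects whose attribute `index` equals `value`."""
    return lambda feat: {obj for obj in feat if obj[index] == value}

def segment(feat):
    """Stand-in for the trained Segment module: decode features into a 'mask'."""
    return sorted(feat)

def run_with_inspection(program, img_feat):
    """Execute the program, attaching Segment to every intermediate output."""
    feat, trace = img_feat, []
    for module in program:
        feat = module(feat)
        trace.append(segment(feat))  # decode this reasoning step into a mask
    return trace

# "the red spheres": Filter[red] -> Filter[sphere], inspected step by step.
img_feat = {("red", "cube"), ("blue", "sphere"), ("red", "sphere")}
program = [filter_attr(0, "red"), filter_attr(1, "sphere")]
for step, mask in enumerate(run_with_inspection(program, img_feat), 1):
    print(step, mask)
```

The point mirrors the paper's setup: because Segment is trained only on final features, plugging it into earlier steps is what reveals whether each intermediate module attends to the right objects.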

3.2. False-Premise Referring Expressions

IEP-Ref is robust enough to return zero foreground when the expression refers to no object (a false premise), even though it never saw such cases during training.

4. IEP-Ref

1) Preprocess. Extracts image features; its output is the input to the Scene module.
2) Unary. Transforms one feature map into another (Scene, Filter X, Unique, Relate, Same X modules).
3) Binary. Transforms two feature maps into one (And, Or modules).
4) Postprocess. The segmentation module.
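The four categories above compose into a program as sketched below (a toy version, not the paper's code: features are sets of (color, shape) objects rather than CNN feature maps, and all names are illustrative):

```python
def preprocess(image):
    """Preprocess: extract 'features'; its output feeds the Scene module."""
    return set(image)

def scene(feat):
    """Unary Scene module: the start of every program, passes features on."""
    return feat

def filter_color(color):
    """Unary Filter module: keep objects of the given color."""
    return lambda feat: {o for o in feat if o[0] == color}

def filter_shape(shape):
    """Unary Filter module: keep objects of the given shape."""
    return lambda feat: {o for o in feat if o[1] == shape}

def or_module(a, b):
    """Binary Or module: merge two program branches into one feature."""
    return a | b

def segment(feat):
    """Postprocess: the segmentation module, decoding features into a mask."""
    return sorted(feat)

image = [("red", "cube"), ("blue", "sphere"), ("green", "cylinder")]
feat = scene(preprocess(image))
# "red cubes or blue spheres": two unary branches joined by a binary Or.
merged = or_module(filter_shape("cube")(filter_color("red")(feat)),
                   filter_shape("sphere")(filter_color("blue")(feat)))
print(segment(merged))
```

Note the tree shape: binary modules (And/Or) are the only points where two branches meet, which is what lets the Segment head be applied to either branch independently.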