(CVPR 2019) Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks

Keyword [LGRAN]

Wang P, Wu Q, Cao J, et al. Neighbourhood watch: Referring expression comprehension via language-guided graph attention networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 1960-1968.

1. Overview

1.1. Motivation

  • MAttNet. Language and region features are learned or designed independently, without being informed by each other.

This paper proposes the Language-guided Graph Attention Network (LGRAN), which has two modules:
1) Language Self-attention Module
2) Language-guided Graph Attention (node attention & edge attention)

This makes the referring expression decision both visualisable and explainable.

  • directed graph
  • node. an object from the set of proposals or ground-truth (GT) boxes
  • edge
    • intra-class edge. spatial relationship
    • inter-class edge. spatial relationship + other objects’ visual feature

1.2. Dataset

  • RefCOCO
  • RefCOCO+
  • RefCOCOg


2.1. Problem

1) Given an image $I$ and a referring expression $r$, localise the object $o'$ referred to by $r$ from the object set $O=\{o_i\}_{i=1}^{N}$ of $I$.
2) $O$ is given as GT or obtained by a proposal generation method.

2.2. Language Self-Attention Module

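1) subject description. representation $s_{sub}$, weight $w_{sub}$
2) intra-class relationship. representation $s_{intra}$, weight $w_{intra}$
3) inter-class relationship. representation $s_{inter}$, weight $w_{inter}$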
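
The notes above list only the symbols, so the following is a minimal sketch rather than the paper's formulation. It assumes each component representation is a softmax-weighted sum of word vectors and that the three component weights come from a softmax over three scores. The word vectors, the logits, and the helper names `softmax` and `attend_words` are all hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend_words(word_vecs, logits):
    """Return (attention weights, attention-weighted sum of the word vectors)."""
    attn = softmax(logits)
    dim = len(word_vecs[0])
    pooled = [sum(a * vec[d] for a, vec in zip(attn, word_vecs)) for d in range(dim)]
    return attn, pooled

# Hypothetical 2-d word vectors for the expression "man left of car".
words = ["man", "left", "of", "car"]
word_vecs = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.5], [1.0, 1.0]]

# Hypothetical per-word logits; in a trained model these would be predicted.
component_logits = {
    "sub": [3.0, 0.0, 0.0, 0.0],
    "intra": [0.0, 1.0, 1.0, 0.0],
    "inter": [0.0, 2.0, 1.0, 3.0],
}

# One representation per component (s_sub, s_intra, s_inter).
s = {name: attend_words(word_vecs, lg)[1] for name, lg in component_logits.items()}

# Component weights (w_sub, w_intra, w_inter) from hypothetical scores.
w_sub, w_intra, w_inter = softmax([1.0, 0.2, 0.8])
```

In this toy setup the subject logits concentrate on "man", so `s["sub"]` stays close to that word's vector.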

2.3. Language-guided Graph Attention Module

2.3.1. Graph Construction

1) Node set. $V=\{v_i\}_{i=1}^{N}$
2) Edge set. $E^{intra}$, $E^{inter}$

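3) intra-class edge set. each node is linked to its top $k$ nearest objects of the same category, ranked by distance.
4) inter-class edge set. each node is linked to its top $k$ nearest objects of other categories, ranked by distance.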
5) $k=5$

6) Node $v_i$. image feature $v_i$ (512-d) + spatial feature $l_i$ (5-d)

7) $e_{ij}$

8) $x_c$. centre coordinate
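
The edge construction above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: "distance" is taken to be the Euclidean distance between centre coordinates, "intra-class" is taken to mean the same category label, and the function name `build_edges` and the `(i, j)` edge encoding are hypothetical.

```python
import math

def build_edges(centres, labels, k=5):
    """Return (intra, inter) directed edge lists as (i, j) index pairs.

    For each node i, link it to its k nearest same-category nodes (intra)
    and its k nearest other-category nodes (inter), by centre distance.
    """
    intra, inter = [], []
    n = len(centres)
    for i in range(n):
        # All other nodes, nearest first.
        by_dist = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(centres[i], centres[j]),
        )
        same = [j for j in by_dist if labels[j] == labels[i]][:k]
        other = [j for j in by_dist if labels[j] != labels[i]][:k]
        intra.extend((i, j) for j in same)
        inter.extend((i, j) for j in other)
    return intra, inter

# Toy scene: three people and one car, with k=1 for readability.
centres = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (0.0, 2.0)]
labels = ["person", "person", "car", "person"]
intra_edges, inter_edges = build_edges(centres, labels, k=1)
```

The car has no same-category neighbour in this toy scene, so it contributes no intra-class edge. With the notes' setting of $k=5$, each node would have at most five edges of each type.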

2.3.2. Language-guided Graph attention

  • the Node Attention

  • the Intra-class Edge Attention

  • the Inter-class Edge Attention
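
As a rough illustration of language-guided attention over graph nodes, the sketch below scores every node against a language vector and normalises the scores with a softmax. The dot-product scoring is an assumption made for brevity (the notes do not record the scoring function), and `node_attention` and the toy features are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def node_attention(lang_vec, node_feats):
    """Attention over nodes: softmax of dot(lang_vec, node feature)."""
    scores = [sum(a * b for a, b in zip(lang_vec, feat)) for feat in node_feats]
    return softmax(scores)

# Hypothetical subject representation and three node features.
s_sub = [1.0, 0.0]
node_feats = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
attn = node_attention(s_sub, node_feats)
```

Under the same assumption, the two edge attentions could reuse this pattern by scoring edge features against $s_{intra}$ and $s_{inter}$ respectively.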

2.3.3. The Attended Graph Representation

2.3.4. Matching Module


3. Experiments

  • IoU > 0.5. A prediction counts as correct when its intersection-over-union (IoU) with the ground-truth box exceeds 0.5.
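
The criterion can be checked with a short IoU routine. This sketch assumes boxes are given as `(x1, y1, x2, y2)` corner coordinates; the function names `iou` and `is_correct` are illustrative.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_correct(pred_box, gt_box, thresh=0.5):
    """A prediction is correct when its IoU with the GT box exceeds thresh."""
    return iou(pred_box, gt_box) > thresh
```

For example, `is_correct((0, 0, 10, 10), (0, 0, 10, 8))` holds because the IoU is 0.8, while two 2x2 boxes shifted by one unit overlap with IoU 1/3 and are rejected.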