Keyword [DQN]

Bellver M, Giró-i-Nieto X, Marques F, et al. Hierarchical Object Detection with Deep Reinforcement Learning[J]. Advances in Parallel Computing, 2016

1. Overview

This paper trains an intelligent agent to decide where to look in an image

2. Methods

state. descriptor of the current region + memory vector of the last 4 actions (6 actions * 4 = 24 dimensions)
action. 5 movement actions + 1 terminal action
reward.
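
A minimal sketch of how the state above could be assembled. The one-hot encoding, the slot order (oldest action first), and the names `memory_vector` and `build_state` are assumptions for illustration, not taken from the paper; the descriptor length is left to the caller.

```python
import numpy as np

NUM_ACTIONS = 6   # 5 movement actions + 1 terminal action
HISTORY_LEN = 4   # number of past actions kept in memory

def memory_vector(last_actions):
    """Encode up to the last 4 actions as a 6 * 4 = 24-dim vector.

    Assumption: each action is one-hot encoded in its own 6-dim slot,
    oldest action first. Unused slots stay all-zero.
    """
    vec = np.zeros(NUM_ACTIONS * HISTORY_LEN, dtype=np.float32)
    recent = list(last_actions)[-HISTORY_LEN:]
    for slot, action in enumerate(recent):
        vec[slot * NUM_ACTIONS + action] = 1.0
    return vec

def build_state(region_descriptor, last_actions):
    """Concatenate the region descriptor with the action-memory vector."""
    descriptor = np.asarray(region_descriptor, dtype=np.float32).ravel()
    return np.concatenate([descriptor, memory_vector(last_actions)])
```

With a hypothetical 10-dim descriptor, `build_state(np.zeros(10), [1, 2])` gives a 34-dim state: 10 descriptor values followed by the 24-dim memory vector.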

ε-greedy policy. start with ε=1 and decrease it in steps of 0.1 until ε=0.1
the agent starts with random actions, and at each epoch it relies more on the already learnt policy
to help the agent learn the terminal action, we force it whenever the current region has an IoU>0.5 with the ground truth
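
A rough sketch of the exploration scheme above. It assumes ε drops by 0.1 once per epoch and that the terminal action has index 5; both are illustrative choices, as are the names `epsilon_for_epoch` and `select_action`.

```python
import random

NUM_ACTIONS = 6        # 5 movement actions + 1 terminal action
TERMINAL = 5           # assumed index of the terminal action
IOU_THRESHOLD = 0.5

def epsilon_for_epoch(epoch):
    """Start at 1.0, drop 0.1 per epoch (assumed), floor at 0.1."""
    return max(0.1, round(1.0 - 0.1 * epoch, 1))

def select_action(q_values, epsilon, iou, training=True, rng=random):
    """Epsilon-greedy choice with the forced terminal action during training."""
    # Force the terminal action when the current region already overlaps
    # the ground truth well enough, so the agent sees this case often.
    if training and iou > IOU_THRESHOLD:
        return TERMINAL
    # Explore: uniformly random action with probability epsilon.
    if rng.random() < epsilon:
        return rng.randrange(NUM_ACTIONS)
    # Exploit: action with the highest predicted Q-value.
    return max(range(NUM_ACTIONS), key=lambda a: q_values[a])
```

The forcing step only makes sense during training, since it needs the ground-truth box to compute the IoU; at test time the call would use `training=False` and a small or zero ε.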

experience. (s, a, r, s’): state, action, reward, next state

consecutive experiences are highly correlated, which leads to inefficient and unstable learning
to solve it, we sample random minibatches from a replay memory. the memory holds 1000 experiences and the batch size is 100
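
The random minibatches can be sketched with a small buffer like the one below. The class name `ReplayMemory` and the choice to drop the oldest experience once the buffer is full are assumptions; only the sizes 1000 and 100 come from the notes.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size store of (s, a, r, s') tuples with uniform random sampling."""

    def __init__(self, capacity=1000):
        # A deque with maxlen silently discards the oldest item when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=100):
        # Sampling without replacement breaks the correlation between
        # consecutive experiences from the same episode.
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)
```

Each training step would push the newest experience and then fit the Q-network on one call to `sample()`, instead of on the most recent transitions in order.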