The proposed model is a Fully Convolutional Network (FCN) with soft attention on the patch representations (Contextual Attention-based Memory Network, or CAMN). The attention network iteratively refines its output using an RNN, which makes it an Episodic-CAMN.
In short, the model is VGG with a recurrent soft-attention module inserted between FC6 and FC7.
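To make the "recurrent soft attention over patch representations" idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `episodic_attention`, the weight matrices `W_q`, `W_h`, `W_c`, the dot-product scoring, and the plain tanh recurrent update are all assumptions chosen for illustration. `feats` stands in for the FC6 outputs at N spatial locations, and the returned state is what would be passed on to FC7.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def episodic_attention(feats, W_q, W_h, W_c, steps=3):
    """Iteratively refine per-patch states with soft attention (illustrative only).

    feats: (N, D) patch features, a stand-in for FC6 outputs at N locations.
    W_q, W_h, W_c: (D, D) hypothetical weight matrices.
    Returns the refined states (N, D) and the last attention map (N, N).
    """
    h = feats.copy()  # initial state: the raw patch features
    alpha = None
    for _ in range(steps):
        # Each patch state queries every patch feature (dot-product scoring).
        scores = (h @ W_q) @ feats.T        # (N, N)
        alpha = softmax(scores, axis=1)     # each row sums to 1
        context = alpha @ feats             # (N, D) attention-weighted context
        # Simple RNN-style update mixing the previous state with the context.
        h = np.tanh(h @ W_h + context @ W_c)
    return h, alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, D = 6, 8
    feats = rng.normal(size=(N, D))
    W_q, W_h, W_c = (0.1 * rng.normal(size=(D, D)) for _ in range(3))
    h, alpha = episodic_attention(feats, W_q, W_h, W_c, steps=3)
    print(h.shape, alpha.shape)
```

Running more `steps` corresponds to more refinement "episodes"; with `steps=1` the sketch reduces to a single pass of soft attention.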
Experiments and Results
- SIFT Flow
- PASCAL VOC 2011
The authors compare only against VGG-based networks with similar settings.