Summary

The proposed model is a Fully Convolutional Network (FCN) with soft attention over the patch representations, called a Contextual Attention-based Memory Network (CAMN). The attention network iteratively refines its output using an RNN, which makes it an Episodic-CAMN.

In short, the model is VGG with a recurrent soft-attention module inserted between FC6 and FC7.
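To make the idea concrete, here is a minimal sketch of recurrent soft attention over patch features, such as the FC6 outputs. It is not the authors' implementation. The dot-product scoring, the mean-initialised memory, the tanh update without learned weights, and the names `attend` and `episodic_attention` are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(patches, memory):
    """One soft-attention step: weight every patch by its similarity to the memory."""
    # Assumption: dot-product scoring; the paper may use a learned scoring function.
    weights = softmax([dot(p, memory) for p in patches])
    dim = len(memory)
    context = [sum(w * p[d] for w, p in zip(weights, patches)) for d in range(dim)]
    return weights, context


def episodic_attention(patches, episodes=3):
    """Refine a memory vector over several attention episodes."""
    dim = len(patches[0])
    count = len(patches)
    # Assumption: the memory starts as the mean patch representation.
    memory = [sum(p[d] for p in patches) / count for d in range(dim)]
    weights = [1.0 / count] * count
    for _ in range(episodes):
        weights, context = attend(patches, memory)
        # Assumption: a parameter-free tanh update stands in for the RNN cell.
        memory = [math.tanh(m + c) for m, c in zip(memory, context)]
    # `weights` are the attention weights from the final episode.
    return memory, weights


# Three toy 2-D "patch representations" standing in for FC6 features.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
memory, weights = episodic_attention(patches)
```

In a full model, the refined memory would presumably be combined with each patch feature before FC7; that step is omitted here because the notes do not specify it.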

Experiments and Results

Datasets:

  • PASCAL-Context
  • SIFT Flow
  • PASCAL VOC 2011

The authors compare only against VGG-based networks with similar settings.