Oriented Response Networks

Model

This papers provides a new way to use convolutions. In conventional convolutional neural networks, the filters are only invariant to translation, but the authors propose a new technique to also make them invariant to rotations. This method is called Oriented Response Network (ORN). The ORN is built on top of Active Rotating Filters (ARF).

ARF

An ARF \(\mathcal{F}\) is a filter of size \(W \times W \times N\) where \(N\) is the number of rotations during the convolution. These produce a feature map of \(N-1\) orientations. The rotated variant of \(\mathcal{F}\) called \(\mathcal{F}_\theta\) are constructed in two steps: coordinate rotation, and orientation spin.

To be efficient, the coordinate rotation and orientation spin are calculated by the circular shift operator in the fourier domain.

Oriented Response Convolution (ORConv)

The ORConv can be seen as a composition of the ARF \(\mathcal{F}\) and an N-channel input feature map \(\mathcal{M}\). This combination is denoted as \(\hat\mathcal{M} = \mathbf{ORConv}(\mathcal{F}, \mathcal{M})\), where \(\hat\mathcal{M}\) are the output feature maps with \(\mathcal{N}\) orientations.

Rotation invariant Feature encoding

By default, the feature maps of ORN are not rotation invariant. When the task needs to be within-class rotation invariant they use two strategies, ORAlign and ORPooling.

Notation: \(\hat\mathcal{M}\{i\}^{(d)}\) where \(i\) is the i-th feature map of the ORConv layer, and \(d\) is the orientation.

ORAlign

The ORAlign simply computes the dominant orientation \(\mathcal{D} = \text{argmax}_d \hat\mathcal{M}\{i\}^{(d)}\) and rotate the feature by \(-\mathcal{D}\frac{2\pi}{N}\).

ORPooling

This pooling consists of simply extracting the maximum orientation of a given feature map \(\text{max}(\hat\mathcal{M}\{i\}^{(d)})\)

Results

They built an MNIST dataset with rotation from \([-\frac{\pi}{2}, \frac{\pi}{2}]\). They report results and show the feature encoding using tSNE for numbers that are similar, like \(6\) and \(9\).

They also report results on CIFAR10 dataset and produce better results with less parameters.