# Iterative Projection and Matching

# Highlights

The authors propose the data-selection method ‘*Iterative Projection and Matching*’ which is computationally
efficient and features a high ‘*selection accuracy*’.

The approach is evaluated in different experiments on active learning, learning using representatives, training of a GAN or video summarization.

# Introduction

Data selection, i.e. choosing \(K\) representatives from a set of \(N\) samples, can become a non-trivial problem when \(N\) gets large. There are two main approaches of convex relaxation:

- D-optimal solutions (stochastic: volume sampling (VS))
- A-optimal solutions.

Disadvantages of these methods are:

- lack of guarantee that ‘
*un-selected samples are well-covered by the selected ones*’ - ‘
*outliers are selected with a high probability*’ due to favouring**diversity**of samples.

An alternative are ‘*representative approaches*’ which have the disadvantages that

- they have a high time complexity, and
- rely on proper fine-tuning of parameters.

The authors propose ‘*Iterative Projection and Matching*’ (IPM) to address these issues.
Their particular contributions are:

- linear time complexity
- no parameters
- theoretical underpinning of properties of IPM.

# Methods

‘*Let \(\mathbf{a}_1, \ldots, \mathbf{a}_M \in \mathbb{R}^N\) be \(M\) given data points of dimension \(N\).
We define an \(M \times N\) matrix, \(\mathbf{A}\), such that \(\mathbf{a}_m^T\) is the \(m^th\) row of \(\mathbf{A}\),
for \(m = 1, \ldots M.\) The goal is to reduce this matrix into a \(K \times N\) matrix, \(\mathbf{A}_R,\) based
on an optimal metric.*’

### Representative selection

The central problem is formulated as

\[\arg\min_{|\mathbb{T}| = K} \|\mathbf{A} - \pi_\mathbb{T}(\mathbf{A})\|_F^2,\]where \(\mathbb{T} \subset \{1, \ldots, M\}\), \(\pi_\mathbb{T}(\mathbf{A})\) is the matrix of rows of \(\mathbf{A}\) projected onto the span of selected rows indexed by \(\mathbb{T}\), and \(\|\cdot\|_F\) denotes the Frobenius norm.

### IPM

The problem can be reformulated in terms of

\[\arg\min_{\mathbf{U},\mathbf{V}} \| \mathbf{A} - \mathbf{UV}^T \|_F^2 \; \mathrm{s.t.} \; \mathbf{v}_k \in \mathbb{A},\]where \(\mathbb{A}\) contains the normalised rows of \(\mathbf{A}\), and \(\mathbf{v}_k\) is the \(k\)-th column of \(\mathbf{V}\).

This is further simplified to

\[(\mathbf{u}, \mathbf{v}) = \arg\min_{\mathbf{u}, \mathbf{v}} \| \mathbf{A} - \mathbf{uv}^T \|_F^2 \; \mathrm{s.t.} \; \|\mathbf{v}\| = 1,\] \[m^{(1)} = \arg\max_m |\mathbf{v}^T\tilde{\mathbf{a}}_m|,\]where \(\tilde{\mathbf{a}}_m\) is the normalised \(m\)-th row of \(\mathbf{A}\).

### Properties of IPM

#### Lower bound on maximum correlation

The authors show theoretically, that the lower bound for the maximum correlation between
\(\mathbf{v}\) and a data point \(\mathbf{a}_i\) is given by the ‘*rank-oneness measure \(\mathrm{ROM}(\mathbf{A})\)*’.
(See paper for details.) This ensures that they always find a data point which is sufficiently similar to
the right singular vector \(\mathbf{v}\) of \(\mathbf{A}\).

#### Robustness to outliers

In another analysis, it is shown that the first singular vector \(\mathbf{v}\) of \(\mathbf{A}\) ‘*is the most
robust spectral component against changes in the data.*’ (See paper for details.)

# Results

#### Active learning

Fine-tuning a 3D *ResNet18* (pretrained on *Kinetics-400*) on *UCF-101* (human action dataset) the authors compare
different data selection methods in a classification experiment.

At the first active learning cylce, the training set consists of **one sample per class** in the dataset.

At each following cycle, ‘*one sample per class is selected, without the knowledge of the labels, and added
to the training set*’. In each cycle, ‘*the fully connected layer […] is fine-tuned for 60 epochs.*’

The authors draw the following conclusions:

- in early cycles, uncertainty-based methods perform worse than random selection,
- IPM is outperforming all other methods in early cycles,
- adding uncertainty by \(m^\ast = \arg\max_m \alpha |\mathbf{v}^T\tilde{\mathbf{a}}_m| + (1 - \alpha) q(\mathbf{a}_m)\) letting alpha decay with rate 0.95 (from \(\alpha = 1\) in the first cycle) is allowing for best performance.

#### Learning using representatives

The strategy here is to select a subset of representative samples from each class of a dataset and train only on the reduced dataset.

The ‘*CMU Muli-PIE Face Database*’ (249 subjects, 13 poses, 20 illuminations, 2 expressions) is reduced to 10 images
per subject using different selection methods.

IPM slects the best representatives in terms of different view angles.

Also, the results of a GAN for multi-view generation from single-view input images are best using IPM compared to other methods.

#### Visualising efficiency of sampled representatives

Using the 3D ResNet18 classifier pretrained on Kinetics-400, the output features of the last convolutional layer are being used to select representatives from UCF-101. Using t-SNE visualization, the authors show that IPM enables the fine-tuned classifier to learn the separation between two random classes better compared to other selection methods.

#### Other experiments

Further results on

- finding representatives for
*ImageNet*, and - video summarization are shown. (See paper for details.)

# Conclusions

The authors could show that their approach for data selection is

- simple (and therefore easily extensible)
- time-efficient, and
- data-efficient.

In their experiments, the approach outperforms the competitors in terms of

- performance of the trained networks, and
- performance of data selection (computation cost, representativity of the dataset).