Highlights

Simple regularization method that encourages neural networks to predict less confidently on interpolations of hidden representations
Smoother decision boundaries
Learned representations with fewer directions of variance
Better generalization than other competitive regularizers (e.g. dropout)
Improved robustness to adversarial attacks

Method

Manifold mixup is based on Input Mixup, where pairs of inputs and targets are interpolated to create new data.

“[…], mixup regularizes the neural network to favor simple linear behavior in-between training examples.”
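As an illustration of the interpolation described above, here is a minimal NumPy sketch of Input Mixup. It assumes the mixing coefficient is drawn from a Beta(alpha, alpha) distribution and that pairs are formed by shuffling the batch against itself; the function name `input_mixup` and the default `alpha` are illustrative choices, not taken from these notes.

```python
import numpy as np

def input_mixup(x, y, alpha=1.0, rng=None):
    """Mix a batch with a shuffled copy of itself.

    x: (batch, features) inputs; y: (batch, classes) one-hot labels.
    alpha is the Beta(alpha, alpha) parameter (an assumed hyperparameter).
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)        # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))      # random pairing of examples
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix, lam

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))             # toy inputs
y = np.eye(2)[[0, 1, 1, 0]]             # toy one-hot labels
x_mix, y_mix, lam = input_mixup(x, y, alpha=1.0, rng=rng)
```

Because inputs and labels share the same coefficient, each mixed label remains a valid probability vector (its entries still sum to 1).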

Manifold mixup is a simple data augmentation method that interpolates pairs of hidden activations and their corresponding labels (one-hot encodings for classification).

Procedure:

Select a random layer in the network (may include the input layer)
Sample two minibatches of data, and run them through the network up to the selected layer
Apply Input Mixup at that layer: pair up the hidden activations and labels from the two minibatches, and interpolate between them to build a “mixed minibatch”
Run the mixed minibatch through the rest of the network to produce the output
Compute the loss between the output and the “mixed label” (interpolation between the original labels)
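
The procedure above can be sketched on a toy network as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the network is a bias-free ReLU MLP, the layer is chosen uniformly among all layers including the input, the coefficient is drawn from Beta(alpha, alpha), and the loss is cross-entropy against the mixed label. The name `manifold_mixup_loss` and the default `alpha` are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)     # shift for numerical stability
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def manifold_mixup_loss(weights, xa, ya, xb, yb, alpha=2.0, rng=None):
    """Loss for one manifold mixup step on a toy MLP without biases.

    weights: list of matrices; layer i computes h @ weights[i].
    (xa, ya), (xb, yb): two minibatches with one-hot labels.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = int(rng.integers(0, len(weights)))   # k = 0 mixes at the input layer
    lam = rng.beta(alpha, alpha)             # assumed Beta(alpha, alpha) coefficient

    ha, hb = xa, xb
    for w in weights[:k]:                    # run both batches up to layer k
        ha, hb = relu(ha @ w), relu(hb @ w)

    h = lam * ha + (1.0 - lam) * hb          # mixed minibatch
    y = lam * ya + (1.0 - lam) * yb          # mixed label

    for i, w in enumerate(weights[k:], start=k):   # rest of the network
        h = h @ w
        if i < len(weights) - 1:
            h = relu(h)                      # no activation on the final logits

    loss = -(y * log_softmax(h)).sum(axis=1).mean()   # soft-label cross-entropy
    return loss, k, lam

rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.5, size=s) for s in [(5, 8), (8, 8), (8, 3)]]
xa, xb = rng.normal(size=(4, 5)), rng.normal(size=(4, 5))
ya, yb = np.eye(3)[[0, 1, 2, 0]], np.eye(3)[[2, 2, 1, 0]]
loss, k, lam = manifold_mixup_loss(weights, xa, ya, xb, yb, rng=rng)
```

When `k` is 0 the sketch reduces to Input Mixup, which matches the note that the selected layer may be the input layer. In actual training the loss would be differentiated with respect to the weights by an autodiff framework; that part is omitted here.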

Smoother decision boundaries
Concentration of the hidden states, with regions of low-confidence predictions between them
Flattening effect on the hidden states

Experiments

Results

CIFAR-10 and CIFAR-100
SVHN
MNIST learned representations