Highlights

The authors report two properties of neural networks:

  • There is no apparent distinction between the semantic properties of individual high-level units and those of random linear combinations of high-level units.
  • Adversarial examples exist and transfer across models trained with different architectures and data.

On the units

Let $x \in \mathbb{R}^m$ be an input image and $\phi(x)$ the activation vector of some layer. One can look at which images from a held-out set $\mathcal{I}$ maximize a given coordinate (feature) of $\phi(x)$, that is:

$$x' = \arg\max_{x \in \mathcal{I}} \, \langle \phi(x), e_i \rangle,$$

where $e_i$ is the $i$-th natural basis vector. The authors find that the images satisfying

$$x' = \arg\max_{x \in \mathcal{I}} \, \langle \phi(x), v \rangle,$$

where $v$ is a random vector, are just as semantically related to each other.

  • This calls into question the notion that neural networks disentangle factors of variation across individual coordinates (a minimal ranking sketch follows).
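
Both selections are the same inner-product ranking over a held-out set; only the direction changes. Below is a minimal PyTorch sketch of that ranking, assuming a trained feature extractor `phi` that returns a `(batch, d)` feature matrix and a data loader `heldout_loader` over the held-out images; these names and the feature dimension are placeholders, not the paper's code.

```python
import torch

@torch.no_grad()
def top_activating_images(phi, heldout_loader, direction, k=8):
    """Return the k held-out images whose features have the largest
    inner product <phi(x), direction>."""
    scores, images = [], []
    for x, _ in heldout_loader:
        feats = phi(x)                    # assumed shape: (batch, d)
        scores.append(feats @ direction)  # projection onto the chosen direction
        images.append(x)
    scores, images = torch.cat(scores), torch.cat(images)
    return images[scores.topk(k).indices]

d = 512                                   # assumed feature dimension
e_i = torch.zeros(d); e_i[3] = 1.0        # a natural basis direction
v = torch.randn(d); v /= v.norm()         # a random direction
# basis_images  = top_activating_images(phi, heldout_loader, e_i)
# random_images = top_activating_images(phi, heldout_loader, v)
```

The observation is that both `basis_images` and `random_images` tend to form semantically coherent sets.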

Adversarial examples

Let $f : \mathbb{R}^m \to \{1, \ldots, k\}$ be a classifier with an associated continuous loss function $\mathrm{loss}_f$. For a given input $x \in \mathbb{R}^m$ and target label $l \in \{1, \ldots, k\}$, the aim is to solve

$$\min_r \; \|r\|_2 \quad \text{subject to} \quad f(x + r) = l, \quad x + r \in [0, 1]^m.$$

The minimizer is denoted $D(x, l)$. This task is non-trivial only if $f(x) \neq l$. The authors approximate $D(x, l)$ by performing a line search to find the minimum $c > 0$ for which the minimizer $r$ of the following box-constrained problem satisfies $f(x + r) = l$:

$$\min_r \; c\,\|r\| + \mathrm{loss}_f(x + r, l) \quad \text{subject to} \quad x + r \in [0, 1]^m.$$

  • In the convex case this penalty method would yield the exact solution of $D(x, l)$; for non-convex neural networks it only gives an approximation (a minimal sketch follows).
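
Below is a minimal PyTorch sketch of this penalty formulation, under stated assumptions rather than the paper's exact procedure: the paper uses box-constrained L-BFGS, while here plain Adam with clamping to $[0, 1]$ stands in for it, cross-entropy stands in for $\mathrm{loss}_f$, and a small grid over $c$ stands in for the line search, keeping the smallest perturbation that reaches the target label. `model` (returning logits), the image tensor `x` in $[0, 1]$, and the integer label `target` are placeholders.

```python
import torch
import torch.nn.functional as F

def penalty_minimizer(model, x, target, c, steps=200, lr=1e-2):
    """Minimize c*||r|| + loss_f(x + r, target) while keeping x + r in [0, 1]^m."""
    # Small random init avoids an ill-defined gradient of ||r|| at r = 0.
    r = (1e-3 * torch.randn_like(x)).requires_grad_(True)
    opt = torch.optim.Adam([r], lr=lr)
    label = torch.tensor([target])
    for _ in range(steps):
        opt.zero_grad()
        adv = (x + r).clamp(0.0, 1.0)                    # box constraint
        loss = c * r.norm() + F.cross_entropy(model(adv.unsqueeze(0)), label)
        loss.backward()
        opt.step()
    return ((x + r).clamp(0.0, 1.0) - x).detach()        # the perturbation actually applied

def adversarial_perturbation(model, x, target, cs=(1e-3, 1e-2, 1e-1, 1.0, 10.0)):
    """Grid search over c; keep the smallest perturbation that reaches `target`."""
    best = None
    for c in cs:
        r = penalty_minimizer(model, x, target, c)
        with torch.no_grad():
            pred = model((x + r).unsqueeze(0)).argmax(dim=1).item()
        if pred == target and (best is None or r.norm() < best.norm()):
            best = r
    return best                                           # None if no c in the grid succeeded
```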

Experiments

  • Adversarial examples can be generated for every network the authors study.
  • Adversarial examples transfer to other architectures trained from scratch with different hyperparameters.
  • Adversarial examples transfer to networks trained on a disjoint training set.
  • Adding random noise of comparable magnitude to the input images is far less effective at causing misclassification than adversarial perturbations (a quick check is sketched below).
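
As a quick, hedged illustration of that last comparison: rescale Gaussian noise to the same $\ell_2$ norm as an adversarial perturbation and check whether either changes the prediction. This reuses `adversarial_perturbation` from the sketch above; `model`, `x`, and `target` are again placeholders.

```python
import torch

def prediction(model, image):
    with torch.no_grad():
        return model(image.clamp(0.0, 1.0).unsqueeze(0)).argmax(dim=1).item()

r = adversarial_perturbation(model, x, target)              # from the sketch above
noise = torch.randn_like(x)
noise = noise * (r.norm() / noise.norm())                   # match the adversarial L2 magnitude

print("clean prediction:      ", prediction(model, x))
print("adversarial prediction:", prediction(model, x + r))       # typically flips to `target`
print("noisy prediction:      ", prediction(model, x + noise))   # typically unchanged
```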

Theoretical analysis

Conclusion

  • Individual unit activations carry no privileged semantic meaning, since random directions in feature space exhibit similar semantic properties.
  • Adversarial examples can be found for every network studied, and they transfer across architectures trained with different hyperparameters and even across disjoint training sets.