Intriguing properties of neural networks
Highlights
The authors report two counter-intuitive properties of deep neural networks.
- Individual high-level units are no more semantically meaningful than random linear combinations of high-level units.
- Existence and transferability of adversarial examples.
On the units
Let $x \in \mathbb{R}^m$ be an input image and $\phi(x)$ the activation vector of some layer. One can look at which inputs maximize an individual feature (coordinate) of $\phi(x)$, that is:

$$x' = \arg\max_{x \in \mathcal{I}} \, \langle \phi(x), e_i \rangle$$

where $\mathcal{I}$ is a held-out set of images and $e_i$ is the natural basis vector of the $i$-th coordinate. The authors find that the images satisfying

$$x' = \arg\max_{x \in \mathcal{I}} \, \langle \phi(x), v \rangle,$$

where $v$ is a random vector, are just as semantically related to each other.
- This calls into question the notion that neural networks disentangle factors of variation across individual coordinates.
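As an illustration, here is a small sketch (not from the paper) of this retrieval on a precomputed feature matrix; the name `feats`, the layer width of 256, and the unit index 42 are all made up.

```python
import numpy as np

def top_activating(feats: np.ndarray, direction: np.ndarray, k: int = 8):
    """Indices of the k held-out images x maximizing <phi(x), direction>."""
    return np.argsort(feats @ direction)[-k:][::-1]

rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 256))  # stand-in for phi(x) over 1000 images

e_i = np.eye(256)[42]                 # natural basis direction (single unit)
v = rng.normal(size=256)              # random direction in feature space

print(top_activating(feats, e_i))     # images "for" one unit
print(top_activating(feats, v))       # images for a random combination
```

With real features, inspecting the two retrieved sets side by side is the comparison the authors make: both sets tend to look equally coherent.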
Adversarial examples
Let $f : \mathbb{R}^m \to \{1, \dots, k\}$ be a classifier with an associated continuous loss function $\text{loss}_f$. For a given input $x \in \mathbb{R}^m$ and target label $l \in \{1, \dots, k\}$, the aim is to solve

$$\min_r \|r\|_2 \quad \text{subject to} \quad f(x + r) = l, \quad x + r \in [0, 1]^m.$$

The minimizer is denoted $D(x, l)$. This task is non-trivial only if $f(x) \neq l$. The authors approximate $D(x, l)$ by line-search, finding the minimum $c > 0$ for which the minimizer $r$ of the following box-constrained problem (solved with L-BFGS) satisfies $f(x + r) = l$:

$$\min_r \; c \|r\| + \text{loss}_f(x + r, l) \quad \text{subject to} \quad x + r \in [0, 1]^m.$$
- In the convex case this yields the exact solution.
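A minimal PyTorch sketch of this procedure, not the authors' implementation: `model` (returning logits for a one-image batch), the grid of `c` values, and the use of `torch.optim.LBFGS` with clamping as a stand-in for box-constrained L-BFGS are all assumptions.

```python
import torch
import torch.nn.functional as F

def perturb(model, x, l, c, steps=50):
    """Minimize c*||r|| + loss_f(x + r, l), keeping x + r in [0, 1]^m.
    x: (1, C, H, W) image, l: LongTensor target label of shape (1,)."""
    r = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.LBFGS([r], max_iter=steps)

    def closure():
        opt.zero_grad()
        adv = (x + r).clamp(0.0, 1.0)  # approximates x + r ∈ [0, 1]^m
        loss = c * r.norm() + F.cross_entropy(model(adv), l)
        loss.backward()
        return loss

    opt.step(closure)
    return (x + r).clamp(0.0, 1.0).detach()

def approximate_D(model, x, l, c_grid=(1e-3, 1e-2, 1e-1, 1.0, 10.0)):
    """Line-search over c: return the perturbed image for the smallest
    c in the grid whose minimizer is classified as the target l."""
    for c in c_grid:
        adv = perturb(model, x, l, c)
        if model(adv).argmax(dim=1).item() == l.item():
            return adv, c
    return None, None
```

A fixed geometric grid stands in here for a proper line-search over $c$; the core trade-off between perturbation size and classification loss is unchanged.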
Experiments
- Adversarial examples exist for every network and training set tested.
- Adversarial examples transfer to other architectures trained from scratch with different hyperparameters.
- Adversarial examples transfer to architectures trained on a disjoint training set.
- Adding random noise of comparable magnitude to the input images causes far fewer misclassifications than adversarial perturbations (see the sketch below).
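Here is a sketch of that control experiment, assuming a trained `model`, clean inputs `x` with labels `y`, and adversarial inputs `x_adv` produced as in the previous section (all hypothetical names): Gaussian noise is rescaled per image to the L2 norm of the corresponding adversarial perturbation before comparing error rates.

```python
import torch

@torch.no_grad()
def error_rate(model, inputs, labels):
    """Fraction of inputs the model misclassifies."""
    return (model(inputs).argmax(dim=1) != labels).float().mean().item()

@torch.no_grad()
def matched_noise(x, x_adv):
    """Gaussian noise rescaled, per image, to the L2 norm of the
    adversarial perturbation r = x_adv - x; inputs are (N, C, H, W)."""
    r = x_adv - x
    noise = torch.randn_like(x)
    norm = lambda t: t.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
    return (x + noise * norm(r) / norm(noise)).clamp(0.0, 1.0)

# The adversarial batch is typically misclassified almost entirely, while
# the norm-matched noise control barely moves the error rate:
# error_rate(model, x_adv, y) vs. error_rate(model, matched_noise(x, x_adv), y)
```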
Theoretical analysis
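- The paper bounds each layer's instability via upper Lipschitz estimates (operator norms of the weight matrices); large per-layer bounds show that small input perturbations can be amplified through the network, though they do not by themselves prove that adversarial examples exist.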
Conclusion
- Activations of individual units carry no special semantic meaning, since random directions in feature space exhibit similar properties.
- Adversarial examples can be found for any neural network; they transfer across architectures trained with different hyperparameters and even across different data sets.