Intriguing properties of neural networks
Highlights
The authors highlight two counter-intuitive properties of deep neural networks.
- There is no distinction between individual high-level units and random linear combinations of high-level units.
- Existence and transferability of adversarial examples.
On the units
Let \(x \in \mathbb{R}^m\) be an input image and \(\phi(x)\) the activation values of some layer. One can look for the inputs that maximize a given coordinate of \(\phi(x)\), that is
\[
x' = \arg\max_{x \in \mathcal{I}} \langle \phi(x), e_i \rangle,
\]
where \(\mathcal{I}\) is a held-out set of images and \(e_i\) is the \(i\)-th natural basis vector.
The authors find that the images satisfying
\[
x' = \arg\max_{x \in \mathcal{I}} \langle \phi(x), v \rangle,
\]
where \(v\) is a random vector, are just as semantically related to each other as those obtained with the natural basis.
- This calls into question the notion that neural networks disentangle factors of variation across individual coordinates; a sketch of this inspection follows below.
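A minimal sketch of the inspection above, assuming a PyTorch model exposing a hypothetical `features(x)` method that returns \(\phi(x)\) and a tensor `images` holding a held-out image set (both names are assumptions, not from the paper). Passing a natural basis vector \(e_i\) as `v` reproduces the per-unit inspection; a random unit-norm `v` probes a random direction in feature space.

```python
import torch

def top_activating_images(model, images, v, k=8):
    """Return the k images x in `images` that maximize <phi(x), v>."""
    model.eval()
    scores = []
    with torch.no_grad():
        for x in images:
            # phi(x): activations of the chosen layer, flattened to a vector
            phi = model.features(x.unsqueeze(0)).flatten()
            scores.append(torch.dot(phi, v))
    scores = torch.stack(scores)
    topk = torch.topk(scores, k).indices
    return images[topk]

# Natural-basis direction e_i vs. a random direction v (d = dim of phi(x)):
# e_i = torch.zeros(d); e_i[i] = 1.0
# v = torch.randn(d); v = v / v.norm()
```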
Adversarial examples
Let \(f:\mathbb{R}^m \to \{1,\dots,k\}\) be a classifier with an associated continuous loss function \(\mathrm{loss}_f\). For a given input \(x \in \mathbb{R}^m\) and target label \(l \in \{1,\dots,k\}\), the aim is to solve
\[
\min_{r} \; \|r\|_2 \quad \text{subject to} \quad f(x+r) = l, \quad x + r \in [0,1]^m.
\]
The minimizer is denoted \(D(x,l)\). This task is non-trivial only if \(f(x) \neq l\). Since exact minimization is hard, the authors approximate \(D(x,l)\) by line-search: find the minimum \(c>0\) for which the minimizer \(r\) of the following problem satisfies \(f(x+r)=l\):
\[
\min_{r} \; c\,|r| + \mathrm{loss}_f(x+r, l) \quad \text{subject to} \quad x + r \in [0,1]^m.
\]
- In the convex case this penalty formulation yields the exact solution; a sketch of the procedure follows below.
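A minimal sketch of the penalty formulation, assuming a PyTorch classifier `model` that returns logits for a batch of images in \([0,1]^m\). Plain gradient descent with clamping stands in for the box-constrained L-BFGS optimizer used in the paper, and a coarse grid over `c` stands in for the line search; `steps`, `lr`, and the grid of `c` values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def penalized_perturbation(model, x, l, c, steps=200, lr=0.01):
    """Approximately minimize c*|r| + loss_f(x + r, l) with x + r kept in [0, 1]^m."""
    r = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.SGD([r], lr=lr)
    target = torch.tensor([l])
    for _ in range(steps):
        opt.zero_grad()
        adv = (x + r).clamp(0.0, 1.0)  # box constraint on x + r
        loss = c * r.abs().sum() + F.cross_entropy(model(adv.unsqueeze(0)), target)
        loss.backward()
        opt.step()
    return r.detach()

def find_adversarial(model, x, l, cs=(10.0, 1.0, 0.1, 0.01, 0.001)):
    """Coarse stand-in for the line search over the penalty weight c."""
    for c in cs:  # scan from large to small c so the returned perturbation stays small
        r = penalized_perturbation(model, x, l, c)
        adv = (x + r).clamp(0.0, 1.0)
        if model(adv.unsqueeze(0)).argmax(dim=1).item() == l:
            return adv, r
    return None, None
```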
Experiments
- Existence of adversarial examples
- Adversarial examples transfer to other architectures trained from scratch with different hyperparameters.
- Adversarial examples transfer to models trained on a disjoint training set.
- Adding random noise to the input images is far less effective at causing misclassification than generating adversarial examples; a comparison is sketched below.
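A minimal sketch of that comparison, assuming a batch `xs` of images in \([0,1]^m\), their labels `ys`, and adversarial perturbations `rs` obtained as in the previous sketch. Rescaling the random noise to match the \(L_2\) norm of each adversarial perturbation is an assumption about the comparison protocol, not a detail taken from these notes.

```python
import torch

def misclassification_rate(model, xs, ys):
    """Fraction of inputs whose predicted label differs from the true label."""
    preds = model(xs).argmax(dim=1)
    return (preds != ys).float().mean().item()

def compare_noise_vs_adversarial(model, xs, ys, rs):
    # Error rate under adversarial perturbations.
    adv_rate = misclassification_rate(model, (xs + rs).clamp(0.0, 1.0), ys)
    # Random noise rescaled to the same L2 norm as each adversarial perturbation.
    noise = torch.randn_like(rs)
    scale = rs.flatten(1).norm(dim=1) / noise.flatten(1).norm(dim=1)
    noise = noise * scale.view(-1, *([1] * (rs.dim() - 1)))
    rand_rate = misclassification_rate(model, (xs + noise).clamp(0.0, 1.0), ys)
    return adv_rate, rand_rate
```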
Theoretical analysis
Conclusion
- Individual neuron activations carry no privileged semantic meaning, since random directions in feature space exhibit similar properties.
- Adversarial examples can be found for any neural network, and they transfer across architectures trained with different hyperparameters and even across disjoint training sets.