Universal adversarial perturbations
Summary
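The authors propose a systematic algorithm for computing universal (image-agnostic) perturbations: a single small perturbation that, when added to most natural images, causes an image classification network to misclassify them. The perturbations are shown to fool several state-of-the-art networks and to transfer well from one network architecture to another.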
The proposed algorithm has two parameters:
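- The maximum allowed norm of the perturbation to be added to images (an upper bound ξ on its ℓ_p norm)
- The desired fooling rate (1 − δ) on the images used to compute the perturbation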
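The idea is to iterate over the images and build the “universal perturbation” v incrementally: whenever the current v does not yet fool the classifier on an image, compute the minimal additional change to v that causes that image to be misclassified, add it to v, and rescale the result so that its norm stays within the allowed bound. The passes over the images repeat until the desired fooling rate is reached.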
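The loop described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the classifier is a plain linear model (scores = W x + b) so that the minimal step to the decision boundary has a closed form, the norm is ℓ2, and all function names (predict, minimal_step, project_l2, fooling_rate, universal_perturbation) are hypothetical. For deep networks the paper computes the inner step with DeepFool instead.

```python
import numpy as np

def predict(W, b, x):
    """Label assigned by the toy linear classifier (assumption: scores = W x + b)."""
    return int(np.argmax(W @ x + b))

def minimal_step(W, b, x, overshoot=0.02):
    """Smallest l2 step that moves x just across the nearest decision boundary.
    Closed form for an affine classifier; the overshoot pushes slightly past it."""
    scores = W @ x + b
    k0 = int(np.argmax(scores))
    best_dist, best_step = np.inf, np.zeros_like(x)
    for k in range(W.shape[0]):
        if k == k0:
            continue
        w = W[k] - W[k0]
        norm = np.linalg.norm(w)
        if norm == 0:
            continue
        gap = scores[k0] - scores[k]  # non-negative, since k0 is the argmax
        if gap / norm < best_dist:
            best_dist, best_step = gap / norm, (gap / norm**2) * w
    return (1 + overshoot) * best_step

def project_l2(v, xi):
    """Project v onto the l2 ball of radius xi."""
    n = np.linalg.norm(v)
    return v if n <= xi else v * (xi / n)

def fooling_rate(W, b, X, clean_labels, v):
    """Fraction of images whose label changes when v is added."""
    flipped = [predict(W, b, x + v) != y for x, y in zip(X, clean_labels)]
    return float(np.mean(flipped))

def universal_perturbation(W, b, X, xi, target_rate, max_passes=10):
    """Accumulate per-image minimal steps into one shared perturbation v."""
    v = np.zeros(X.shape[1])
    clean_labels = [predict(W, b, x) for x in X]
    rate = 0.0
    for _ in range(max_passes):
        for x, y in zip(X, clean_labels):
            if predict(W, b, x + v) == y:  # v does not fool this image yet
                dv = minimal_step(W, b, x + v)
                v = project_l2(v + dv, xi)  # keep the norm within the bound
        rate = fooling_rate(W, b, X, clean_labels, v)
        if rate >= target_rate:
            break
    return v, rate

# Toy usage with random data (illustrative only)
rng = np.random.default_rng(0)
W, b = rng.normal(size=(5, 8)), rng.normal(size=5)
X = rng.normal(size=(50, 8))
v, rate = universal_perturbation(W, b, X, xi=2.0, target_rate=0.8)
print("norm of v:", np.linalg.norm(v), "fooling rate on X:", rate)
```

The two arguments xi and target_rate correspond to the two parameters listed above. The max_passes cap is an added safeguard for the toy setting, since a tight norm bound may make the target rate unreachable.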
Experiments and Results
Dataset: ILSVRC 2012 validation set (50,000 images)
Note that in Table 1, “X” is the training set on which the universal perturbation is computed.