Universal adversarial perturbations
The authors propose a systematic algorithm for computing universal perturbations to fool image classification networks. The perturbations are shown to work very well across neural networks.
The proposed algorithm has two parameters:
- The norm of the perturbation to be added to images
- The desired fooling rate
The idea is to iteratively go over images and build the “universal perturbation” \(v\) by computing the minimal modification to \(v\) that causes each image to be misclassified.
Experiments and Results
Dataset: ILSVRC 2012 validation set (50,000 images)
Note that in Table 1, “X” is the training set on which the universal perturbation is computed.