Summary

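The authors propose an iterative algorithm for computing universal perturbations: a single, image-agnostic perturbation that causes an image classification network to misclassify most of the images it is added to. The perturbations are shown to generalize well not only across images but also across neural network architectures.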

The proposed algorithm has two parameters:

  1. An upper bound on the norm of the perturbation to be added to images
  2. The desired fooling rate, i.e. the fraction of images whose predicted label should change
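
As a reading aid, the two parameters can be written as constraints on the perturbation v. The symbols below are assumptions introduced for this summary (ξ for the norm bound, δ for the tolerated failure rate so that 1 − δ is the desired fooling rate, k̂ for the classifier's predicted label, μ for the image distribution); they may not match the paper's notation exactly.

```latex
\[
  \|v\|_p \le \xi,
  \qquad
  \Pr_{x \sim \mu}\!\left[\hat{k}(x + v) \ne \hat{k}(x)\right] \ge 1 - \delta
\]
```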

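The idea is to iterate over the images and build the “universal perturbation” v incrementally: for each image that the current v does not yet fool, compute the minimal modification to v that causes that image to be misclassified, then rescale v so that it stays within the norm bound. Passes over the images repeat until the desired fooling rate is reached.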
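
The loop described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a linear classifier (weights `W`, bias `b`), for which the minimal l2 step across the nearest decision boundary has a closed form, in place of whatever minimal-perturbation routine is used for deep networks. All function names, the `overshoot` factor, and the `max_passes` cap are hypothetical choices made for this example.

```python
import numpy as np

def predict(W, b, x):
    """Predicted class of a linear classifier (toy stand-in for a network)."""
    return int(np.argmax(W @ x + b))

def project_l2(v, xi):
    """Rescale v so that its l2 norm does not exceed xi."""
    norm = np.linalg.norm(v)
    return v if norm <= xi else v * (xi / norm)

def minimal_step(W, b, x, overshoot=0.02):
    """Smallest l2 step that moves x just across the nearest decision
    boundary. Closed form for a linear classifier; assumes W has at
    least two distinct rows."""
    scores = W @ x + b
    k = int(np.argmax(scores))
    best_dist, best_dir = None, None
    for j in range(W.shape[0]):
        if j == k:
            continue
        w = W[j] - W[k]
        w_norm = np.linalg.norm(w)
        if w_norm == 0:
            continue
        dist = abs(scores[j] - scores[k]) / w_norm
        if best_dist is None or dist < best_dist:
            best_dist, best_dir = dist, w / w_norm
    # Slight overshoot so the point lands strictly on the other side.
    return (1 + overshoot) * best_dist * best_dir

def fooling_rate(X, labels, W, b, v):
    """Fraction of images whose predicted label changes when v is added."""
    fooled = [predict(W, b, x + v) != y for x, y in zip(X, labels)]
    return float(np.mean(fooled))

def universal_perturbation(X, W, b, xi, target_rate, max_passes=10):
    """Accumulate one perturbation v over all images, keeping ||v||_2 <= xi."""
    labels = [predict(W, b, x) for x in X]  # labels on the clean images
    v = np.zeros(X.shape[1])
    for _ in range(max_passes):
        if fooling_rate(X, labels, W, b, v) >= target_rate:
            break
        for x, y in zip(X, labels):
            if predict(W, b, x + v) == y:  # current v does not fool x yet
                dv = minimal_step(W, b, x + v)
                v = project_l2(v + dv, xi)
    return v, fooling_rate(X, labels, W, b, v)

# Toy usage: two classes split by the sign of the first coordinate.
W = np.array([[1.0, 0.0], [-1.0, 0.0]])
b = np.zeros(2)
X = np.array([[0.5, 0.3], [1.0, -0.2], [0.2, 0.9]])
v, rate = universal_perturbation(X, W, b, xi=2.0, target_rate=0.8)
```

The norm bound and the target fooling rate play exactly the roles of the two parameters listed earlier: the first caps the size of `v` after every update, the second decides when to stop making passes over the data.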

Experiments and Results

Dataset: ILSVRC 2012 validation set (50,000 images)

Note that in Table 1, “X” is the training set on which the universal perturbation is computed.