Evasion attacks against machine learning at test time
Highlights
 The authors propose an optimization method to generate adversarial examples for SVM and neural nets.
 The method has a regularization method that allows to generate adversarial examples in the blackbox and whitebox settings separately.
 They apply the method to malware detection in PDF files.
Introduction
Given a classification algorithm \(f: X \mapsto Y\) that assigns to a feature \(x \in X\) a label \(y \in Y = \{1,+1 \}\), where \(1\) represents the legitimate class and \(1\) the adversarial or malicious class. The label \(y^c=f(x)\) given by a classifier is obtained by thresholding a discriminant function \(g : X \mapsto \mathbb{R}\), so we assume \(f(x)=1\) if \(g(x) < 0\) and \(+1\) otherwise.
Adversary model

Adversary’s goal: Find a sample \(x\) such that \(g(x)<\epsilon\) for every \(\epsilon>0\). This means the sample just crosses the decision boundary. It can also be asked to find a sample where the classifier is highly confident on a wrong decision.

Adversary’s knowledge: Depending on information about the classifier, data or training the adversary may find different strategies.

Adversary’s capability: The authors restrict to scenarios where the adversary can only modify the test data.
Attack scenarios

Perfect knowledge (PK): The adversary knows everything about the classifier: architecture, data, training.

Limited knowledge (LK): The authors assume the adversary can generate a new dataset \(D'= \{ (\hat{x}, \hat{y}) \}\) on which to train a new classifier \(\hat{f}\) and discriminat function \(\hat{g}\) that approximates the original discriminant function \(g\) of the classifier \(f\).
Attack strategy
For an adversarial target sample \(x^0\), an optimal attack strategy finds sample a \(x^*\) to minimize \(g\) or its estimate \(\hat{g}\), subject to a bound on its distance from \(x^0\)
But this is hard, so they regularize it as follows:
They approach this via gradient descent: