SVM Poisoning.

Gradient ascent-based attacks to maximize test error.

Introduction

For this project, I reimplemented the majority of Poisoning Attacks against Support Vector Machines (Biggio et al.). Due to computational constraints, I limited the implementation to linear SVMs, but the extension to non-linear kernel SVMs (detailed in the paper) is fairly straightforward; with a small enough step size, only the gradient term needs to be changed. From the abstract:

We investigate a family of poisoning attacks against Support Vector Machines (SVM). Such attacks inject specially crafted training data that increases the SVM's test error. Central to the motivation for these attacks is the fact that most learning algorithms assume that their training data comes from a natural or well-behaved distribution. However, this assumption does not generally hold in security-sensitive settings. As we demonstrate, an intelligent adversary can, to some extent, predict the change of the SVM's decision function due to malicious input and use this ability to construct malicious data.

The proposed attack uses a gradient ascent strategy in which the gradient is computed based on properties of the SVM's optimal solution. This method can be kernelized and enables the attack to be constructed in the input space even for non-linear kernels. We experimentally demonstrate that our gradient ascent procedure reliably identifies good local maxima of the non-convex validation error surface, which significantly increases the classifier's test error.

Building understanding with 2D data

We begin by generating a training set of 2D data and fitting a linear SVM. For this project, we set our hyperparameter C = 2.0.

Aggregate signal
Optimal separating hyperplane with support vectors (both margin and error support) labels with a star.

We also generate a test set of 2D data. Our goal, as an adversarial agent, is to maximize the hinge loss of our test set, by adding one additional point to our training data. We define our hinge loss on the test set, as a function of our adversarial point x_c as:

\[L(x_c) = \sum_{k = 1}^m (1 - y_k f(x_k;x_c))_+\]

By repeatedly fitting SVMs on the training set, with an additional adversarial point, we can plot the test set hinge loss with respect to the location of the adversarial point.

2D hinge loss surface
Our test set hinge loss with respect to the location of our adversarial point in the training set.
Hinge loss surface

With these 2D and 3D plots, we observe two things:

  1. Given a linear SVM, we can trivially maximize our test hinge loss by moving the adversary arbitrarily far away from the separating hyperplane, on the side opposite to its true label.
  2. The loss surface is non-convex.

While the linear case is clearly unbounded, we will still use this setting as it simplifies both the derivation and computation. As outlined in the paper, the extension to non-linear kernels like a Gaussian RBF is straightforward, and the adversary can find satisfactory (local) maxima simply by changing the gradient function.

First pass: Gradient approximation by finite differences

My first, fairly naive attempt, involved intializing the adversarial point by selecting a point in the negative set and flipping its label to positive. Then, the hinge loss gradient with respect to that adversarial point can be calculated using finite differences:

\[\frac{\partial}{\partial x_i} f(x) \approx \frac{f(x + \frac{1}{2}h e_i) - f(x - \frac{1}{2}h e_i)}{h}\]

Using a gradient approximation calculated from finite differences, you can use any optimizer. Obviously state-of-the-art optimizers like Adam will work best, but I implemented a hacky one from scratch that involved line search and random step directions when the gradient became negligible. This worked well enough, but in the future I'll definitely stick to stronger out-of-the-box tools.

Finite difference result
The adversary (black, but belonging to the blue/positive class) after 100 iterations of gradient ascent.

The main drawback of using finite differences is that it involves fitting 2 SVMs to the training set at each iteration just to work out the gradient, and even more if you used my naive line-search and random steps to escape local optima! The goal of the paper is to avoid this, and instead explicitly calculate the gradient with respect to the adversary.

Second pass: Explicit calculation of gradient

Recall that, for (1) margin support, (2) error support and (3) reserve vectors respectively we have the following equations:

\[y_i (x_i^\top \beta + \beta_0) = 1 \quad (1)\] \[y_i (x_i^\top \beta + \beta_0) < 1 \quad (2)\] \[y_i (x_i^\top \beta + \beta_0) > 1 \quad (3)\]

The authors assume that for small perturbations of our adversarial point, the margin support vectors do not change:

$$ \begin{align} y_i (x_i^\top \beta(x_c) + \beta_0(x_c)) &= 1 \\ y_i (x_i^\top \beta(x_c') + \beta_0(x_c')) &= 1\end{align} $$

Where our beta weights are a function of our adversarial point. We can also define our decision function f in terms of both primal and dual variables:

$$ \begin{align} f(x;x_c) &= x^\top \beta(x_c) + \beta_0(x_c) \\ f(x;x_c) &= \left( \sum_{i = 1}^n \alpha_i y_i x_i + \alpha_c y_c x_c \right)^\top x + b \end{align} $$

Scalar conditions

Now we will define a new variable, Q, that will be useful in the calculation of our gradient:

$$ \begin{align} Q_{ij} &= y_j y_i x_j^T x_i \\ Q_{ic} &= y_c y_i x_c^T x_i \\ y_i f(x_i;x_c) &= \sum_{j = 1}^n Q_{ij} \alpha_j + Q_{ic} \alpha_c + y_i b \end{align}$$

This final equation will be useful, as it is the part of a test point's hinge loss with respect to the adversarial point.

Given our previous assumptions, we know that for a support vector (x_i, y_i), yf will always be 1, and so the partial derivative with respect to a component of our adversarial point will always be 0:

$$ \begin{align} x_c &= (u_1,\dots,u_d) \\ x_i &= (v_1,\dots,v_d) \\ \frac{\partial}{\partial u_l} y_i f(x_i;x_c) &= \sum_{j = 1}^n Q_{ij} \frac{\partial \alpha_i}{\partial u_l} + Q_{ic} \frac{\partial \alpha_c}{\partial u_l} + \frac{\partial Q_{ic}}{\partial u_l} \alpha_c + y_i \frac{\partial b}{\partial u_l}= 0 \\ \frac{\partial Q_{ic}}{\partial u_l} &= y_c y_i v_l \end{align} $$

Our other optimality condition (from the dual problem) must also hold, and we can take the derivative to establish another useful condition:

$$ \begin{align} \sum_{i = 1}^n \alpha_i y_i + \alpha_c y_c &= 0 \\ \sum_{i = 1}^n y_i \frac{\partial \alpha_i}{\partial u_l} + y_c \frac{\partial \alpha_c}{\partial u_l} &= 0 \end{align} $$

Vector conditions

If we define the following variables below, we can convert our conditions from summations over scalars to vector notation: $$ \begin{align} q_{is}^\top &= \begin{bmatrix} y_1 y_i x_1^T x_i & y_2 y_i x_2^T x_i & \dots & y_n y_c x_n^T x_c \end{bmatrix} \\ y &= \begin{bmatrix} y_s \\ y_c \end{bmatrix} \\ \gamma_i &= y_c y_i v_l \\ q_{is}^\top \frac{\partial \alpha}{\partial u_l} + \gamma_i \alpha_c + y_i \frac{\partial b}{\partial u_l} &= 0 \\ y^\top \frac{\partial \alpha}{\partial u_l} &= 0 \end{align} $$

Matrix conditions

We can now calculate the partial derivative of our bias term, as well as our dual alpha weights (i.e. point-wise weights): $$ \begin{align} A &= \begin{bmatrix} 0 & y_s^\top \\ y_s & Q_{ss} \end{bmatrix} \\ b &= \begin{bmatrix} 0 \\ -\frac{\partial Q_{sc}}{\partial u_l} \alpha_c \end{bmatrix} \\ A \begin{bmatrix} \frac{\partial \beta}{\partial u_l} \\ \frac{\partial \alpha}{\partial u_l} \end{bmatrix} &= b \\ \end{align} $$

Where Q_ss denotes a matrix whose i,jth entry corresponds to (y_i y_j x_i^T x_j) of the i-th and j-th margin support vectors, y_s is the vector containing only the margin support vector labels, dQ_sc/u_l is a vector whose dimension is the number of margin support vectors, such that the i-th element has dQ_ic/u_l, the partial derivative of the i-th support vector.

Explicit gradient ascent on MNIST

With the above, we can examine how this adversarial point looks using an SVM classifier trained to differentiate 7's an 9's:

Original
Our original 7 that we initialize our adversary at.
Perturbed
Our perturbed 7 after 100 iterations maximizing test set hinge loss.
Difference
The difference between the two images clearly shows the adversary adding "9-like" noise.