The authors propose a method for obtaining resource-efficient neural networks by pruning. The method lets you specify a constraint, such as respecting a neuron budget, and it is inspired by the Method of Auxiliary Coordinates (see my other post on that paper).

# Neural network pruning as an optimization problem

The authors study 3 pruning costs: $$C(w) = \|w\|_p$$ where $$p \in \{0, 1, 2\}$$. These costs are used with 2 different approaches: the constraint form and the penalty form.

## Constraint form

$$w$$ are the trainable weights of the network, and $$\theta$$ is an auxiliary set of trainable parameters, with one parameter paired to each entry of $$w$$.

• $$w$$ is trained to lower the cross entropy loss
• $$\theta$$ is optimized to respect the pruning constraint
• $$w, \theta$$ are optimized to be equal

Alternating optimization is used: there is a Learning step and a Compression step. In the Learning step, we minimize the following:

$$L(w) + \frac{\mu}{2} \|w - \theta\|^2$$
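In code, the Learning step is just ordinary gradient descent on the loss plus the quadratic coupling to $$\theta$$. A minimal NumPy sketch, using a hypothetical quadratic loss in place of the cross entropy:

```python
import numpy as np

def learning_step(w, theta, mu, grad_loss, lr=0.1, steps=100):
    """Minimize L(w) + (mu/2)*||w - theta||^2 over w, with theta held fixed."""
    for _ in range(steps):
        g = grad_loss(w) + mu * (w - theta)  # gradient of the penalized objective
        w = w - lr * g
    return w

# Toy stand-in for the cross entropy: L(w) = 0.5*||w - t||^2 with target t.
t = np.array([1.0, -2.0, 0.5])
grad_loss = lambda w: w - t
theta = np.zeros(3)
w = learning_step(np.zeros(3), theta, mu=1.0, grad_loss=grad_loss)
# For this toy loss the minimizer is w* = (t + mu*theta) / (1 + mu),
# so with mu=1 and theta=0 the result converges to t/2.
```

Note how larger $$\mu$$ pulls the solution closer to $$\theta$$, which is exactly why driving $$\mu \to \infty$$ forces the two sets of parameters to agree.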

We will drive $$\mu \to \infty$$ over the course of training. In the Compression step, we minimize

$$\|w - \theta\|^2 \quad \text{s.t.} \quad C(\theta) \le \kappa.$$

In practice, for the $$\ell_0$$ cost the Compression step is done algorithmically by keeping the top-$$\kappa$$ largest-magnitude weights in $$\theta$$ and zeroing out the rest (remember that only $$\theta$$ is changed here, not $$w$$). See the paper for all the crunchy details, formulas, theorems and proofs.
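For the $$\ell_0$$ constraint $$\|\theta\|_0 \le \kappa$$, this projection has the closed form just described: copy $$w$$ into $$\theta$$, keep the $$\kappa$$ largest-magnitude entries, and zero the rest. A sketch:

```python
import numpy as np

def compress_l0(w, kappa):
    """Project w onto {theta : ||theta||_0 <= kappa}: keep the kappa
    largest-magnitude entries of w, zero out everything else."""
    theta = np.zeros_like(w)
    if kappa > 0:
        idx = np.argsort(np.abs(w))[-kappa:]  # indices of the top-kappa magnitudes
        theta[idx] = w[idx]
    return theta

w = np.array([0.3, -1.5, 0.02, 0.9, -0.1])
print(compress_l0(w, kappa=2))  # keeps -1.5 and 0.9, zeros the rest
```

This is cheap (a sort), which is what makes the alternation practical: each Compression step is closed-form rather than another inner optimization.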

## Penalty form

The penalty form is very similar, and the authors prove it equivalent. The Learning step is the same; the Compression step replaces the hard constraint with a penalty term to minimize.

## Global vs. Local sparsity

With this method, it is possible to find networks respecting a global budget with a different compression ratio for each layer, and this per-layer allocation emerges naturally from the global constraint instead of being fixed by hand.
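A small sketch of why the per-layer ratios emerge on their own, assuming the $$\ell_0$$ budget is applied over the concatenation of all layers (the layer names and weight distributions here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two hypothetical layers with different weight-magnitude distributions.
layers = {"conv1": rng.normal(0, 1.0, 100), "fc1": rng.normal(0, 0.1, 100)}

# Global top-kappa over the concatenation of all weights.
flat = np.concatenate(list(layers.values()))
kappa = 50
threshold = np.sort(np.abs(flat))[-kappa]  # magnitude of the kappa-th largest weight

for name, w in layers.items():
    kept = np.sum(np.abs(w) >= threshold)
    print(f"{name}: kept {kept}/{w.size} weights")
# conv1, whose weights are larger on average, keeps far more of its weights
# than fc1, even though no per-layer ratio was ever specified.
```

Compare this with magnitude pruning done layer by layer, where you would have to pick each layer's ratio yourself.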

# Results

Results are pretty good; see the paper for the exact numbers and comparisons against other pruning methods.

Good luck.