“Learning-Compression” Algorithms for Neural Net Pruning
The authors propose a method for obtaining resource-efficient neural networks. The method lets one specify a constraint, such as a budget on the number of nonzero weights or neurons. It is inspired by the Method of Auxiliary Coordinates (see my other post on that paper).
Neural network pruning as an optimization problem
The authors study three pruning costs, \(C(w) = \|w\|_p\) with \(p \in \{0, 1, 2\}\). These costs are used with two different formulations: the constraint form and the penalty form.
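As a quick illustration (not from the paper), this is what the three costs look like on a small weight vector in NumPy:

```python
import numpy as np

w = np.array([0.0, -0.3, 1.2, 0.0, 0.05])

l0 = np.count_nonzero(w)   # ||w||_0: number of nonzero weights
l1 = np.sum(np.abs(w))     # ||w||_1: sum of absolute values
l2 = np.linalg.norm(w)     # ||w||_2: Euclidean norm
print(l0, l1, l2)
```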
Constraint form
\(w\) are the trainable weights of the network, and \(\theta\) is an auxiliary set of parameters with one entry for each weight in \(w\).
- \(w\) is trained to minimize the cross-entropy loss \(L(w)\)
- \(\theta\) is optimized to satisfy the pruning constraint
- \(w\) and \(\theta\) are pushed to be equal (the combined constrained problem is written out below)
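Put together, the constrained problem (as I read it from the paper, with \(\kappa\) the budget) is:
\[\min_{w,\theta} \; L(w) \quad \text{s.t.} \quad C(\theta) \le \kappa, \quad w = \theta\]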
Alternating optimization is used: there is a Learning (L) step and a Compression (C) step. In the Learning step, we minimize over \(w\):
\[L(w) + \frac{\mu}{2} \|w - \theta\|^2\]
while driving \(\mu \to \infty\) over the outer iterations. In the Compression step, we minimize over \(\theta\):
\[\|w - \theta\|^2 \quad \text{s.t.} \quad C(\theta) \le \kappa.\]
In practice, for the \(\ell_0\) budget the Compression step is done algorithmically by keeping the \(\kappa\) largest-magnitude weights in \(\theta\) and zeroing out the rest (remember that only \(\theta\) is changed here, not \(w\)). See the paper for all the crunchy details, formulas, theorems and proofs; a rough sketch of the loop follows.
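Here is a minimal sketch of the alternating loop, assuming an \(\ell_0\) budget of \(\kappa\) nonzero weights, a flattened weight vector, and a caller-supplied `loss_grad` returning \(\nabla L(w)\); the names, schedule and hyperparameters are mine, not the paper's:

```python
import numpy as np

def l_step(w, theta, mu, loss_grad, lr=1e-2, steps=100):
    """Learning step: a few gradient steps on L(w) + mu/2 * ||w - theta||^2."""
    for _ in range(steps):
        w = w - lr * (loss_grad(w) + mu * (w - theta))
    return w

def c_step_l0(w, kappa):
    """Compression step for the L0 budget: keep the kappa
    largest-magnitude entries of w, zero out the rest."""
    theta = np.zeros_like(w)
    keep = np.argsort(np.abs(w))[-kappa:]   # indices of the kappa largest |w_i|
    theta[keep] = w[keep]
    return theta

def lc_prune(w0, kappa, loss_grad, mus=(1e-3, 1e-2, 1e-1, 1.0, 10.0)):
    """Alternate L and C steps while increasing mu (driving it towards infinity)."""
    w = w0.copy()
    theta = c_step_l0(w, kappa)        # initialize theta by directly compressing w0
    for mu in mus:
        w = l_step(w, theta, mu, loss_grad)
        theta = c_step_l0(w, kappa)
    return theta                       # the pruned weights
```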
Penalty form
The penalty form is very similar and is proved to be equivalent. The Learning step is the same, and the Compression step replaces the hard constraint with a penalty term on \(C(\theta)\) added to the quadratic objective, which again has a simple element-wise solution.
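For example, with the \(\ell_1\) penalty the Compression step reduces to element-wise soft thresholding. A minimal sketch, where `alpha` is my name for the penalty weight:

```python
import numpy as np

def c_step_l1_penalty(w, alpha, mu):
    """Compression step for the penalty form with C(theta) = ||theta||_1:
    minimize mu/2 * ||w - theta||^2 + alpha * ||theta||_1 over theta.
    The minimizer is element-wise soft thresholding of w."""
    return np.sign(w) * np.maximum(np.abs(w) - alpha / mu, 0.0)
```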
Global vs. Local sparsity
With this method, the network can meet a global budget while each layer ends up with its own compression ratio; the per-layer ratios emerge naturally from the optimization instead of being set by hand, as the toy example below illustrates.
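A toy illustration of this effect: a global top-\(\kappa\) selection over two hypothetical layers keeps a different fraction of each layer's weights (layer names and sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
layers = {"conv1": rng.normal(0, 1.0, 500),    # hypothetical layer weight vectors
          "fc1":   rng.normal(0, 0.1, 5000)}

w = np.concatenate(list(layers.values()))
kappa = 1000
keep = np.zeros_like(w, dtype=bool)
keep[np.argsort(np.abs(w))[-kappa:]] = True    # global top-kappa selection

start = 0
for name, lw in layers.items():
    kept = keep[start:start + lw.size].sum()
    print(f"{name}: kept {kept}/{lw.size}")    # different ratio per layer
    start += lw.size
```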
Results
Results are pretty good: the paper reports experiments on MNIST and CIFAR-10 networks in which the LC algorithm compares favorably with magnitude-based pruning baselines.
Reproducing the results
Good luck.