Survey: Network compression and speedup
Nice PowerPoint presentation made by three students at the University of Illinois at Urbana-Champaign, surveying some of the most widely used methods for compressing and speeding up neural networks.
The presentation goes through the following approaches:
- Matrix Factorization
  - Singular Value Decomposition (SVD) (sketched below)
  - Flattened Convolutions
- Weight Pruning
  - Magnitude-based method (sketched below)
  - Iterative pruning + Retraining
  - Pruning with rehabilitation
  - Hessian-based method
- Quantization method
  - Full Quantization
    - Fixed-point format (sketched below)
    - Code book (sketched below)
  - Quantization with full-precision copy
- Pruning + Quantization + Encoding
- Design small architecture: SqueezeNet
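
As a rough illustration of the SVD idea, the sketch below factorizes a dense layer's weight matrix into two low-rank factors. The matrix, its shape, and the chosen rank are illustrative assumptions, not values from the slides.

```python
import numpy as np

# Hypothetical dense-layer weight matrix (shape chosen for illustration only).
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 1024))

# Truncated SVD: keep the top-k singular values, so W ~ U_k diag(s_k) Vt_k.
k = 64
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :k] * s[:k]   # shape (512, k)
B = Vt[:k, :]          # shape (k, 1024)

# The single layer y = x @ W.T becomes two smaller layers (x @ B.T) @ A.T,
# cutting parameters from 512*1024 down to k*(512 + 1024).
W_approx = A @ B
print("relative error:", np.linalg.norm(W - W_approx) / np.linalg.norm(W))
```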
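
A minimal sketch of magnitude-based pruning, assuming a single weight matrix and a target sparsity level. In iterative pruning + retraining, this step would alternate with fine-tuning while the mask is held fixed; only the pruning step itself is shown.

```python
import numpy as np

def magnitude_prune(W, sparsity):
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    threshold = np.quantile(np.abs(W), sparsity)
    mask = np.abs(W) >= threshold
    return W * mask, mask

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))

W_pruned, mask = magnitude_prune(W, sparsity=0.9)
print("fraction zeroed:", 1.0 - mask.mean())
```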
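
A minimal sketch of uniform, fixed-point-style quantization with a single per-tensor scale; the bit width and matrix are arbitrary choices here. Schemes that keep a full-precision copy quantize only for the forward pass and apply weight updates to the float copy.

```python
import numpy as np

def quantize_uniform(W, num_bits=8):
    """Uniform quantization: store small integer codes plus one float scale."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(W)) / qmax
    codes = np.clip(np.round(W / scale), -qmax, qmax).astype(np.int8)
    return codes, scale

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 128))
codes, scale = quantize_uniform(W)

# Dequantize on the fly at inference time.
W_hat = codes.astype(np.float32) * scale
print("max abs error:", np.abs(W - W_hat).max())
```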
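
A small weight-sharing sketch in the spirit of code-book quantization: cluster the weights with a toy k-means and store only cluster indices plus the centroids. The cluster count and iteration count are arbitrary assumptions for illustration.

```python
import numpy as np

def codebook_quantize(W, num_clusters=16, iters=10):
    """Cluster weights into a small codebook; store per-weight cluster indices."""
    flat = W.ravel()
    # Initialize centroids evenly over the weight range.
    centroids = np.linspace(flat.min(), flat.max(), num_clusters)
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        # Move each centroid to the mean of its assigned weights.
        for c in range(num_clusters):
            if np.any(idx == c):
                centroids[c] = flat[idx == c].mean()
    return idx.reshape(W.shape), centroids

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
indices, codebook = codebook_quantize(W)
W_hat = codebook[indices]   # reconstruct weights from 4-bit indices
print("max abs error:", np.abs(W - W_hat).max())
```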