Survey: Network compression and speedup
Nice PowerPoint presentation made by three students at the University of Illinois at Urbana-Champaign, surveying some of the most widely used methods for compressing and speeding up neural networks.
The presentation goes through the following approaches:
- Matrix Factorization
  - Singular Value Decomposition (SVD) (sketched below)
  - Flattened Convolutions
- Weight Pruning
  - Magnitude-based method (sketched below)
  - Iterative pruning + Retraining
  - Pruning with rehabilitation
  - Hessian-based method
- Quantization method
  - Full Quantization
    - Fixed-point format (sketched below)
    - Code book (sketched below)
  - Quantization with full-precision copy
- Pruning + Quantization + Encoding
- Design small architecture: SqueezeNet
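
As a rough illustration of the SVD idea, the sketch below factorizes a dense layer's weight matrix into two low-rank factors. The matrix, its shape, and the chosen rank are illustrative assumptions, not values from the slides.

```python
import numpy as np

# Hypothetical dense-layer weight matrix (shape chosen for illustration only).
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 1024))

# Truncated SVD: keep the top-k singular values, so W ~ U_k diag(s_k) Vt_k.
k = 64
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :k] * s[:k]   # shape (512, k)
B = Vt[:k, :]          # shape (k, 1024)

# The single layer y = x @ W.T becomes two smaller layers (x @ B.T) @ A.T,
# cutting parameters from 512*1024 down to k*(512 + 1024).
W_approx = A @ B
print("relative error:", np.linalg.norm(W - W_approx) / np.linalg.norm(W))
```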
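
A minimal sketch of magnitude-based pruning, assuming a single weight matrix and a target sparsity level. In iterative pruning + retraining, this step would alternate with fine-tuning while the mask is held fixed; only the pruning step itself is shown.

```python
import numpy as np

def magnitude_prune(W, sparsity):
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    threshold = np.quantile(np.abs(W), sparsity)
    mask = np.abs(W) >= threshold
    return W * mask, mask

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))

W_pruned, mask = magnitude_prune(W, sparsity=0.9)
print("fraction zeroed:", 1.0 - mask.mean())
```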
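
A minimal sketch of uniform, fixed-point-style quantization with a single per-tensor scale; the bit width and matrix are arbitrary choices here. Schemes that keep a full-precision copy quantize only for the forward pass and apply weight updates to the float copy.

```python
import numpy as np

def quantize_uniform(W, num_bits=8):
    """Uniform quantization: store small integer codes plus one float scale."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(W)) / qmax
    codes = np.clip(np.round(W / scale), -qmax, qmax).astype(np.int8)
    return codes, scale

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 128))
codes, scale = quantize_uniform(W)

# Dequantize on the fly at inference time.
W_hat = codes.astype(np.float32) * scale
print("max abs error:", np.abs(W - W_hat).max())
```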
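
A small weight-sharing sketch in the spirit of code-book quantization: cluster the weights with a toy k-means and store only cluster indices plus the centroids. The cluster count and iteration count are arbitrary assumptions for illustration.

```python
import numpy as np

def codebook_quantize(W, num_clusters=16, iters=10):
    """Cluster weights into a small codebook; store per-weight cluster indices."""
    flat = W.ravel()
    # Initialize centroids evenly over the weight range.
    centroids = np.linspace(flat.min(), flat.max(), num_clusters)
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        # Move each centroid to the mean of its assigned weights.
        for c in range(num_clusters):
            if np.any(idx == c):
                centroids[c] = flat[idx == c].mean()
    return idx.reshape(W.shape), centroids

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
indices, codebook = codebook_quantize(W)
W_hat = codebook[indices]   # reconstruct weights from 4-bit indices
print("max abs error:", np.abs(W - W_hat).max())
```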