A nice PowerPoint presentation by three students at the University of Illinois at Urbana-Champaign, surveying some of the most widely used methods for compressing and speeding up neural networks.

The presentation covers the following approaches:

  • Matrix Factorization
    • Singular Value Decomposition (SVD)
    • Flattened Convolutions
  • Weight Pruning
    • Magnitude-based method
    • Iterative pruning + Retraining
    • Pruning with rehabilitation
    • Hessian-based method
  • Quantization
    • Full Quantization
    • Fixed-point format
    • Codebook
    • Quantization with full-precision copy
  • Pruning + Quantization + Encoding
  • Designing a small architecture: SqueezeNet
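To make the matrix-factorization idea concrete, here is a minimal NumPy sketch (not from the presentation; all sizes and the retained rank are illustrative) of compressing a fully connected layer's weight matrix with a truncated SVD: the dense matrix is replaced by the product of two much smaller factors.

```python
import numpy as np

# Illustrative sizes: a 256 x 512 weight matrix, compressed to rank 32.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))  # original weight matrix

k = 32  # retained rank (a hyperparameter)
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :k] * s[:k]  # 256 x k factor (singular values folded in)
B = Vt[:k, :]         # k x 512 factor
W_approx = A @ B      # rank-k approximation of W

original = W.size
compressed = A.size + B.size
print(f"parameters: {original} -> {compressed}")
```

At inference time the layer computes `x @ A @ B` instead of `x @ W`, which is both smaller and faster whenever `k` is well below the matrix dimensions.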
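The magnitude-based pruning bullet can likewise be sketched in a few lines (again an illustrative toy, not the presentation's code): weights whose absolute value falls below a percentile threshold are zeroed out, leaving a sparse matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((128, 128))  # toy weight matrix

sparsity = 0.9  # fraction of weights to remove (illustrative choice)
threshold = np.quantile(np.abs(W), sparsity)
mask = np.abs(W) >= threshold  # keep only large-magnitude weights
W_pruned = W * mask

print(f"nonzero fraction: {mask.mean():.2f}")
```

In the iterative pruning + retraining scheme, this mask is held fixed while the surviving weights are fine-tuned, and the prune/retrain cycle is repeated at increasing sparsity levels.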
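Finally, a minimal sketch of codebook quantization (weight sharing), under the assumption of a 16-entry codebook: the weights are clustered with a simple hand-rolled k-means, and each weight is then stored as a 4-bit index into the codebook instead of a 32-bit float. A real pipeline would use a library clustering routine; this toy loop just shows the idea.

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal(1024)  # flattened toy weight vector

n_clusters = 16  # 16 entries -> 4 bits per weight index
codebook = np.linspace(W.min(), W.max(), n_clusters)  # initial centroids

# A few rounds of plain k-means on the weight values.
for _ in range(20):
    idx = np.argmin(np.abs(W[:, None] - codebook[None, :]), axis=1)
    for c in range(n_clusters):
        if np.any(idx == c):
            codebook[c] = W[idx == c].mean()

W_quant = codebook[idx]  # every weight snapped to its centroid
print(f"unique values: {np.unique(W_quant).size}")
```

The "quantization with full-precision copy" variant keeps the float weights around during training and quantizes only in the forward pass, so gradient updates are not lost to rounding.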