Summary

PolyInception modules generalize Inception-ResNet modules. Each module is a polynomial composition of Inception blocks, some of which may share parameters within the module.

PolyInception modules

Notation:

  • \(I\) is the identity operator (it passes the input through unchanged), so \((I + F)\,x = x + F(x)\)
  • \(F, G, H\) are Inception “blocks”
  • Same letter = shared parameters

Proposed modules:

  • poly-2: \(I + F + F^2\)
  • mpoly-2: \(I + F + GF\)
  • 2-way: \(I + F + G\)
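
As a rough illustration of how the three expressions differ, the sketch below reads each term as an operator applied to an input x. It is a toy under stated assumptions, not the paper's implementation: the blocks `F` and `G` are hypothetical scalar functions standing in for convolutional Inception blocks, and reusing the same Python callable stands in for parameter sharing.

```python
# Toy sketch: "blocks" are plain callables standing in for Inception blocks.
# Calling the same callable twice models shared parameters.

def poly2(f):
    """I + F + F^2: x + F(x) + F(F(x)), the same block applied twice."""
    def module(x):
        fx = f(x)
        return x + fx + f(fx)
    return module

def mpoly2(f, g):
    """I + F + GF: x + F(x) + G(F(x)), two distinct blocks in cascade."""
    def module(x):
        fx = f(x)
        return x + fx + g(fx)
    return module

def two_way(f, g):
    """I + F + G: x + F(x) + G(x), two distinct blocks in parallel."""
    def module(x):
        return x + f(x) + g(x)
    return module

# Hypothetical stand-in blocks (assumption: real blocks are sub-networks).
def F(x):
    return 2.0 * x

def G(x):
    return x + 1.0

x = 3.0
print(poly2(F)(x))       # 3 + 6 + 12 = 21.0
print(mpoly2(F, G)(x))   # 3 + 6 + 7  = 16.0
print(two_way(F, G)(x))  # 3 + 6 + 4  = 13.0
```

Note that poly-2 and mpoly-2 both compute \(F(x)\) once and feed it to a second block; they differ only in whether that second block is \(F\) again or a separate \(G\).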

NOTE: “mpoly-2” and “2-way” modules possess stronger expressive power than “poly-2” but increase the parameter count, since \(G\) has its own parameters.

The same patterns extend to higher orders:

  • poly-3: \(I + F + F^2 + F^3\)
  • mpoly-3: \(I + F + GF + HGF\)
  • 3-way: \(I + F + G + H\)
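
The order-3 modules follow the same pattern, so all of them can be written generically. The sketch below is again a toy with hypothetical scalar blocks `F`, `G`, `H`; the helper names (`mpoly`, `poly`, `n_way`, `distinct_blocks`) are made up for illustration, and counting distinct blocks is only a crude proxy for parameter size.

```python
def mpoly(blocks):
    """I + B1 + B2*B1 + ...: cascade the blocks and add every intermediate output."""
    def module(x):
        out, h = x, x
        for block in blocks:
            h = block(h)     # output of the cascade so far
            out = out + h    # accumulate this polynomial term
        return out
    return module

def poly(f, order):
    """I + F + ... + F^order: a cascade that reuses one shared block."""
    return mpoly([f] * order)

def n_way(blocks):
    """I + B1 + B2 + ...: every block sees the same input, in parallel."""
    def module(x):
        return x + sum(block(x) for block in blocks)
    return module

def distinct_blocks(blocks):
    """Number of separately parameterized blocks (proxy for parameter size)."""
    return len({id(block) for block in blocks})

# Hypothetical stand-in blocks.
def F(x):
    return 2.0 * x

def G(x):
    return x + 1.0

def H(x):
    return 0.5 * x

x = 1.0
print(poly(F, 3)(x))         # 1 + 2 + 4 + 8   = 15.0
print(mpoly([F, G, H])(x))   # 1 + 2 + 3 + 1.5 = 7.5
print(n_way([F, G, H])(x))   # 1 + 2 + 2 + 0.5 = 5.5
print(distinct_blocks([F] * 3), distinct_blocks([F, G, H]))  # 1 3
```

In this toy, poly-3 uses one distinct block while mpoly-3 and 3-way use three, which mirrors the expressiveness versus parameter-size trade-off noted above.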

Inception-ResNet vs PolyNet structure

Inception-ResNet is composed of 3 “stages” (A-B-C) that operate on different spatial resolutions.

Experiments and Results

Dataset: ILSVRC (ImageNet - 1000 classes)

Replacing stage B of Inception-ResNet-v2 with PolyInception modules yields larger performance gains than replacing either of the other two stages.

Using mixed PolyInception modules provides the best results.

PolyNet performance scales better with depth than ResNet/Inception-ResNet.

Performance gains over the previous state of the art (SotA) exceed 1%.

Conclusion

Take-home message: structural diversity is recommended when deepening a CNN.