PolyNet: A Pursuit of Structural Diversity in Very Deep Networks

Summary

Generalization of Inception-ResNet modules. A PolyInception module is represented as a polynomial composition of Inception blocks, which may share parameters inside the same module.

PolyInception modules

Notation:

I is Input
\(F, G, H\) are inception “blocks”
Same letter = Shared parameters

Proposed modules:

poly-2: \(I + F + F^2\)
mpoly-2: \(I + F + GF\)
2-way: \(I + F + G\)

NOTE: “mpoly-2” and “2-way” modules posess stronger expressive power but increase parameter size.

Could go even further:

poly-3: \(I + F + F^2 + F^3\)
mpoly-3: \(I + F + GF + HGF\)
3-way: \(I + F + G + H\)

Inception-ResNet vs PolyNet structure

Inception-ResNet is composed of 3 “stages” (A-B-C) that operate on different spatial resolutions.

Experiments and Results

Dataset: ILSVRC (ImageNet - 1000 classes)

Replacing Inception-ResNet-v2 stage B with PolyInceptions results in greater performance gains (compared to replacing the other 2 stages).

Using mixed PolyInception modules provides the best results.

PolyNet performance scales better with depth than ResNet/Inception-ResNet.

Performance gains over SotA are of the order of >1%.

Conclusion

Take home message: Structural diversity is recommended when deepening a CNN.