PixelSNAIL: An Improved Autoregressive Generative Model

Code: https://github.com/neocxi/pixelsnail-public

Summary

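PixelSNAIL is an autoregressive generative model, i.e. it factorizes the joint distribution into a product of conditionals:

\[ p(x) = \prod_{i=1}^{n} p(x_i \mid x_1, \ldots, x_{i-1}) \]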

In this case, \((x_1, ..., x_n)\) are the pixels of an image.
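
As a minimal sketch of what this means in practice, the log-likelihood of an image is the sum of per-pixel conditional log-probabilities, each conditioned only on earlier pixels. The `conditional` function below is a hypothetical stand-in for the network (it simply favours repeating the previous pixel) and is not the PixelSNAIL model.

```python
import math

def conditional(prefix, num_values=4):
    """Hypothetical stand-in for the network: p(x_i | x_1, ..., x_{i-1}).

    Returns a distribution over `num_values` pixel intensities that puts
    half of its mass on repeating the previous pixel.
    """
    if not prefix:
        return [1.0 / num_values] * num_values
    probs = [0.5 / (num_values - 1)] * num_values
    probs[prefix[-1]] = 0.5
    return probs

def log_likelihood(pixels):
    """log p(x) = sum over i of log p(x_i | x_1, ..., x_{i-1})."""
    total = 0.0
    for i, value in enumerate(pixels):
        total += math.log(conditional(pixels[:i])[value])
    return total

image = [2, 2, 2, 1]  # a flattened 2x2 toy "image" with 4 intensity levels
ll = log_likelihood(image)
```

Because every factor is a normalized distribution, the likelihood is exact and tractable, which is what allows the comparison with other likelihood-based methods.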

Advantages of using an autoregressive generative model:

Possible conditional models, and why each one falls short on its own:

Traditional RNNs struggle to model very long-range dependencies
Causal convolutions (see PixelCNN) have a finite-size receptive field
Self-attention (Attention Is All You Need/Transformer) requires keeping access to all previously generated elements
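
To make the receptive-field contrast concrete, here is a rough sketch under simplifying assumptions: a 1D stack of causal convolutions with a fixed kernel size and dilation, versus causal self-attention over a flattened sequence. Both helper functions are illustrative and do not come from the PixelSNAIL code.

```python
def causal_conv_receptive_field(num_layers, kernel_size, dilation=1):
    """Positions (current one included) visible to one output of a stack
    of 1D causal convolutions. Grows linearly with depth, so it is finite."""
    return 1 + num_layers * (kernel_size - 1) * dilation

def attention_receptive_field(position):
    """Positions visible to causal self-attention at 0-based `position`,
    counting the current one: everything generated so far."""
    return position + 1

sequence_length = 32 * 32 * 3  # values in one CIFAR-10 image
conv_view = causal_conv_receptive_field(num_layers=10, kernel_size=3)
attn_view = attention_receptive_field(sequence_length - 1)
```

With these numbers, ten convolution layers see 21 positions while attention at the last position sees all 3072, at the cost of keeping every earlier element around.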

The ordering of the pixels is an arbitrary choice. Usually, a raster scan is chosen: rows from top to bottom, and left to right within each row.
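
A raster scan can be sketched as a mapping between 2D pixel coordinates and positions in the flattened sequence. The function names are illustrative only.

```python
def raster_index(row, col, width):
    """Position of pixel (row, col) in the flattened raster-scan sequence."""
    return row * width + col

def raster_order(height, width):
    """All pixel coordinates in raster-scan order: row by row, left to right."""
    return [(r, c) for r in range(height) for c in range(width)]

def context(row, col, width):
    """Coordinates of every pixel generated before (row, col)."""
    return raster_order(row + 1, width)[: raster_index(row, col, width)]

order = raster_order(2, 3)
```

For a 2x3 image, the context of pixel (1, 1) is the whole first row plus pixel (1, 0).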

For example, causal convolutions (PixelCNN) are designed around a raster scan ordering: each output depends only on the pixels above it and to its left.
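
One common way to enforce this is to multiply the convolution kernel by a binary mask, as in the original PixelCNN. The sketch below builds such a mask. It is an illustration of the masking idea, not necessarily how the linked repository implements causality, and `include_center` is a name chosen here to distinguish the two usual variants (first layer versus later layers).

```python
def causal_mask(kernel_size, include_center):
    """Binary mask for a square kernel under raster-scan ordering.

    Keeps weights on rows above the centre and on the centre row to the
    left of the centre. The centre itself is kept only if `include_center`
    is True (later layers); the first layer must exclude it so a pixel
    never sees its own value.
    """
    center = kernel_size // 2
    mask = [[0] * kernel_size for _ in range(kernel_size)]
    for r in range(kernel_size):
        for c in range(kernel_size):
            before = r < center or (r == center and c < center)
            at_center = r == center and c == center
            if before or (at_center and include_center):
                mask[r][c] = 1
    return mask

first_layer_mask = causal_mask(3, include_center=False)
later_layer_mask = causal_mask(3, include_center=True)
```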

The idea of PixelSNAIL is to combine a residual block and a self-attention block.
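
The self-attention half of this combination can be sketched as single-head dot-product attention with a causal mask. This is a simplified illustration, not the architecture from the paper: it assumes that position i reads only strictly earlier positions, and it omits the convolutional residual block and the way the two outputs are merged.

```python
import math

def causal_attention(queries, keys, values):
    """Single-head dot-product attention over a flattened pixel sequence.

    Assumption: position i attends only to positions j < i, so the first
    position has no context and returns zeros.
    """
    dim = len(values[0])
    outputs = []
    for i, q in enumerate(queries):
        if i == 0:
            outputs.append([0.0] * dim)  # nothing has been generated yet
            continue
        scale = math.sqrt(len(q))
        scores = [sum(a * b for a, b in zip(q, keys[j])) / scale for j in range(i)]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        norm = sum(weights)
        outputs.append(
            [sum(w * values[j][d] for j, w in enumerate(weights)) / norm for d in range(dim)]
        )
    return outputs

qk = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_attention(qk, qk, [[1.0], [2.0], [3.0]])
```

Changing the value at the last position leaves every output unchanged, which is the causality property the model needs in order to remain a valid autoregressive factorization.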

Receptive field for a randomly initialized model (derivative of the predicted yellow pixel w.r.t. the input):

The authors compare results with other tractable likelihood methods on CIFAR-10, ImageNet 32x32 and ImageNet 64x64.