PixelSNAIL: An Improved Autoregressive Generative Model
Code: https://github.com/neocxi/pixelsnail-public
Summary
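PixelSNAIL is an autoregressive generative model: it factorizes the joint distribution into a product of conditionals, \(p(x) = \prod_{i=1}^{n} p(x_i \mid x_1, ..., x_{i-1})\).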
In this case, \((x_1, ..., x_n)\) are the pixels of an image.
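An autoregressive model scores an image with the chain rule, \(\log p(x) = \sum_i \log p(x_i \mid x_1, ..., x_{i-1})\). The sketch below illustrates this on binary "pixels". The conditional `conditional_prob_one` is a hypothetical toy rule chosen for illustration, not the network used by PixelSNAIL.

```python
import math
from itertools import product


def conditional_prob_one(prefix):
    """Hypothetical toy conditional p(x_i = 1 | x_1, ..., x_{i-1}) for binary pixels."""
    # Assumption: a 1 becomes more likely the more 1s have been seen so far.
    return (1 + sum(prefix)) / (2 + len(prefix))


def log_likelihood(pixels):
    """Chain rule: log p(x) is the sum of the conditional log-probabilities."""
    total = 0.0
    for i, x in enumerate(pixels):
        p_one = conditional_prob_one(pixels[:i])
        total += math.log(p_one if x == 1 else 1.0 - p_one)
    return total


# Every conditional is normalized, so the joint sums to 1 over all 2^4 "images".
total_mass = sum(
    math.exp(log_likelihood(list(bits))) for bits in product([0, 1], repeat=4)
)
```

The likelihood is exact and cheap to evaluate, which is what makes maximum-likelihood training straightforward for this model family.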
Advantages of using an autoregressive generative model:
- Tractable likelihood and easy training (as opposed to GANs)
- Strong density estimation results, often better than latent variable models
Possible conditional models, and their limitations:
- Traditional RNNs struggle to capture long-range dependencies
- Causal convolutions (see PixelCNN) have a finite receptive field
- Self-attention (Attention Is All You Need/Transformer) requires keeping access to all previously generated elements
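To make the self-attention point concrete, here is a minimal single-head causal attention sketch in plain Python. It is an illustration under simplifying assumptions, not the PixelSNAIL implementation: position `i` attends to positions `j <= i`, whereas a real pixel model would also shift or mask its inputs so that the prediction for a pixel never sees that pixel.

```python
import math


def causal_self_attention(queries, keys, values):
    """Single-head attention where position i attends only to positions j <= i.

    Each argument is a list of n float vectors (lists of equal length).
    """
    n, d = len(queries), len(queries[0])
    outputs = []
    for i in range(n):
        # Scaled dot-product scores against the current and earlier positions only.
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible positions.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        norm = sum(weights)
        weights = [w / norm for w in weights]
        # Weighted average of the visible value vectors.
        outputs.append([
            sum(w * values[j][c] for j, w in enumerate(weights))
            for c in range(len(values[0]))
        ])
    return outputs
```

Position `i` reads every earlier key and value, so all of them must stay available during generation and the work per position grows with the sequence length.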
The pixel ordering is an arbitrary choice. Usually, a raster scan is chosen: rows from top to bottom, and pixels from left to right within each row.
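As a small illustration, a raster scan maps each pixel `(row, col)` to a position in the flattened sequence. The helper names below are hypothetical, and indices are 0-based.

```python
def raster_scan_order(height, width):
    """Return pixel coordinates (row, col) in raster scan order."""
    return [(row, col) for row in range(height) for col in range(width)]


def raster_index(row, col, width):
    """0-based position of pixel (row, col) in the flattened sequence."""
    return row * width + col


order = raster_scan_order(2, 3)
```

For a 2x3 image, this visits the whole first row before moving to the second.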
For example, causal convolutions (PixelCNN) are designed around a raster scan ordering: each output depends only on pixels above it, or to its left in the same row.
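One common way to enforce this is to multiply the convolution kernel by a binary mask. The sketch below builds such a mask for a square kernel. The function name and the `include_center` flag are assumptions made for illustration; they correspond roughly to the two mask types described for PixelCNN (first layer without the center pixel, later layers with it).

```python
def causal_mask(kernel_size, include_center=False):
    """Build a 0/1 mask for a raster-scan causal convolution kernel.

    A weight is kept (1) only if it looks at a pixel that comes earlier in
    raster order: any row above the center, or the center row strictly to
    the left. The center itself is kept only when include_center is True.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the kernel has a center")
    center = kernel_size // 2
    mask = []
    for row in range(kernel_size):
        mask_row = []
        for col in range(kernel_size):
            above = row < center
            left = row == center and col < center
            at_center = row == center and col == center
            keep = above or left or (include_center and at_center)
            mask_row.append(1 if keep else 0)
        mask.append(mask_row)
    return mask
```

For a 3x3 kernel, the mask without the center keeps the full top row and the left neighbor only, so no information flows from pixels that come later in the raster scan.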
The idea of PixelSNAIL is to combine a residual block and a self-attention block.
The paper visualizes the receptive field of a randomly initialized model as the derivative of the predicted (yellow) pixel w.r.t. the input.
Results
The authors compare their results with other tractable likelihood methods on CIFAR-10, ImageNet 32x32, and ImageNet 64x64.