## Summary

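PixelSNAIL is an autoregressive generative model: the joint distribution is factorized into a product of conditionals,

$$p(x) = \prod_{i=1}^{n} p(x_i \mid x_1, ..., x_{i-1})$$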

In this case, $$(x_1, ..., x_n)$$ are the pixels of an image.
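An autoregressive model scores an image by summing the log of each pixel's conditional probability given the pixels before it, which is why the likelihood is tractable. The sketch below illustrates this on a toy example. The function name `autoregressive_log_likelihood` and the table `cond_probs` are hypothetical and stand in for the output of a trained model; they do not come from the paper.

```python
import numpy as np

def autoregressive_log_likelihood(cond_probs, x):
    """Sum of log p(x_i | x_1, ..., x_{i-1}) over all positions.

    cond_probs[i, v] is assumed to hold the model's probability that
    pixel i takes value v, given the true preceding pixels.
    x[i] is the observed value of pixel i.
    """
    picked = cond_probs[np.arange(len(x)), x]  # probability of each observed value
    return float(np.sum(np.log(picked)))

# Toy "image" of 3 binary pixels with made-up conditional probabilities.
cond_probs = np.array([
    [0.50, 0.50],   # p(x_1)
    [0.25, 0.75],   # p(x_2 | x_1)
    [0.90, 0.10],   # p(x_3 | x_1, x_2)
])
x = np.array([1, 1, 0])

log_p = autoregressive_log_likelihood(cond_probs, x)
print(log_p, np.exp(log_p))
```

Training maximizes this quantity directly, which is what makes it simpler than the adversarial objective of a GAN.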

Advantages of using an autoregressive generative model:

• Tractable likelihood and easy training (as opposed to GANs)
• Outperforms latent variable models

Candidate models for the conditionals, and their limitations:

• Traditional RNNs struggle to capture very long-range dependencies
• Causal convolutions (see PixelCNN) have a finite size receptive field
• Self-attention (Attention Is All You Need/Transformer) requires keeping access to all previously generated elements

The ordering of the pixels is an arbitrary choice. Usually, a raster scan is used: pixels are visited row by row, from left to right within each row.
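As a minimal illustration of a raster scan, the sketch below lists the pixel coordinates of a small image in generation order and maps a coordinate to its position in the sequence. The helper names `raster_order` and `raster_index` are hypothetical.

```python
def raster_order(height, width):
    """Pixel coordinates (row, col) in raster-scan order."""
    return [(r, c) for r in range(height) for c in range(width)]

def raster_index(row, col, width):
    """0-based position of pixel (row, col) in the raster-scan sequence."""
    return row * width + col

order = raster_order(2, 3)
print(order)
```

With this ordering, the pixels that precede `(row, col)` are all pixels in the rows above it, plus the pixels to its left in the same row.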

For example, the causal convolutions of PixelCNN are designed around a raster scan ordering: each output position only depends on the pixels above it and to its left in the same row.
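One common way to make a convolution causal with respect to a raster scan is to multiply its kernel by a binary mask that zeroes the weights on the current and future pixels. The sketch below builds such a mask for a single channel. The function name `causal_conv_mask` and the `include_center` flag are assumptions for illustration, and the ordering between color channels is ignored.

```python
import numpy as np

def causal_conv_mask(kernel_size, include_center=False):
    """Binary mask for a square kernel under raster-scan ordering.

    Ones mark kernel taps that look at already-generated pixels:
    every row above the center, and the taps left of the center in
    the center row. The center itself is kept only if include_center.
    """
    k = kernel_size
    c = k // 2
    mask = np.zeros((k, k), dtype=np.float32)
    mask[:c, :] = 1.0      # rows above the current pixel
    mask[c, :c] = 1.0      # pixels to the left in the current row
    if include_center:
        mask[c, c] = 1.0   # only safe when the input is not the raw current pixel
    return mask

mask = causal_conv_mask(3)
print(mask)
```

Because each layer only sees a `kernel_size`-wide window, stacking such layers grows the receptive field slowly, which is the finite receptive field limitation mentioned above.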

The idea of PixelSNAIL is to combine a residual block and a self-attention block.
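To make the attention half of this idea concrete, here is a minimal NumPy sketch of single-head causal self-attention over pixels flattened in raster-scan order. It is not the paper's exact block: the function name `causal_self_attention` is hypothetical, the projections that produce `q`, `k` and `v` are omitted, and each position is allowed to attend to itself, which assumes the input features at position i do not already contain the value of pixel i.

```python
import numpy as np

def causal_self_attention(q, k, v):
    """Scaled dot-product attention where position i only sees positions <= i.

    q, k, v: arrays of shape (n, d), one row per pixel in raster order.
    Returns the attended values (n, d) and the attention weights (n, n).
    """
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)          # hide future positions
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
n, d = 6, 4
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
out, weights = causal_self_attention(q, k, v)
print(np.round(weights, 2))
```

Unlike a causal convolution, every earlier position is reachable in a single step, so the receptive field is not bounded by the kernel size. The price is an `n x n` weight matrix over all previously generated elements.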

Receptive field of a randomly initialized model (derivative of the predicted yellow pixel w.r.t. the input):

## Results

The authors compare their results with other tractable-likelihood methods on CIFAR-10, ImageNet 32x32 and ImageNet 64x64.