## Summary

Conditional GANs are stacked and trained to generate an encoder’s internal representations hierarchically: the top GAN conditions on the class label (or top-level encoding), and each GAN below it generates the next lower representation, down to the input image.

The training objective combines three types of losses:

• Adversarial loss: Each GAN-generated representation should be able to fool a discriminator when compared to the encoder-generated representations (purple arrows)
• Conditional loss: An encoder should produce the same higher representation whether its input was generated by a GAN or an encoder (blue arrows)
• Entropy loss: A new deep model $$Q_i$$ (which shares parameters with $$D_i$$) tries to predict the GAN input noise given the GAN’s output, $$Q_i(z_i \mid \hat{h}_i)$$ (green arrows)
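The three per-level losses can be sketched with toy linear stand-ins. Everything below (the linear "networks", dimensions, and squared-error forms for the conditional and entropy terms) is an illustrative assumption, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions): tiny linear "networks" acting on vectors.
def generator(h_next, z, W_h, W_z):
    # G_i: generated representation conditioned on h_{i+1} and noise z_i
    return np.tanh(W_h @ h_next + W_z @ z)

def discriminator(h, w):
    # D_i: probability that h is an encoder-produced representation
    return 1.0 / (1.0 + np.exp(-(w @ h)))

def encoder(h, W_e):
    # E_i: maps a representation up one level
    return np.tanh(W_e @ h)

def q_net(h_hat, W_q):
    # Q_i: estimate of the noise z_i given the generated h_hat_i
    return W_q @ h_hat

# Toy dimensions and parameters
d_h, d_z = 4, 2
W_h = rng.normal(size=(d_h, d_h)); W_z = rng.normal(size=(d_h, d_z))
W_e = rng.normal(size=(d_h, d_h)); W_q = rng.normal(size=(d_z, d_h))
w_d = rng.normal(size=d_h)

h_real = rng.normal(size=d_h)           # encoder representation h_i
h_next = encoder(h_real, W_e)           # conditioning representation h_{i+1}
z = rng.normal(size=d_z)                # noise z_i
h_hat = generator(h_next, z, W_h, W_z)  # GAN-generated representation

# Adversarial loss (generator side): h_hat should fool D_i
loss_adv = -np.log(discriminator(h_hat, w_d) + 1e-8)
# Conditional loss: E_i applied to h_hat should recover h_{i+1}
loss_cond = np.sum((encoder(h_hat, W_e) - h_next) ** 2)
# Entropy loss: Q_i should recover the input noise from h_hat
loss_ent = np.sum((q_net(h_hat, W_q) - z) ** 2)

loss_total = loss_adv + loss_cond + loss_ent
```

In practice each term carries its own weight and the adversarial part alternates with a discriminator update; this sketch only shows how the three signals attach to the same generated representation.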

Training is done in two stages to avoid excessive noise in the outputs at the start of training.

• Each GAN is trained separately (conditioned on the true last representation)
• All GANs are trained end-to-end (conditioned on the previous GAN’s generated representation)
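The two-stage setup above can be sketched structurally. The "GANs" here are hypothetical stand-in functions of (condition, noise); only the conditioning pattern, independent on true representations first, then chained end-to-end, reflects the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n_levels = 3

# Hypothetical stand-in: each "GAN" is just a function of (condition, noise).
def make_gan():
    W = rng.normal(size=(4, 4))
    V = rng.normal(size=(4, 2))
    return lambda cond, z: np.tanh(W @ cond + V @ z)

# gans[i] generates the level-i representation h_hat_i from (h_{i+1}, z_i)
gans = [make_gan() for _ in range(n_levels)]
# Encoder representations h_0 (image level) .. h_n (label/encoding level)
true_reps = [rng.normal(size=4) for _ in range(n_levels + 1)]

# Stage 1: each GAN runs independently, conditioned on the TRUE
# higher-level encoder representation h_{i+1}.
stage1_outputs = [g(true_reps[i + 1], rng.normal(size=2))
                  for i, g in enumerate(gans)]

# Stage 2: end-to-end — each GAN is conditioned on the previous GAN's
# generated representation, starting from the top-level label/encoding.
h = true_reps[-1]
for g in reversed(gans):   # generate downward, top level to image level
    h = g(h, rng.normal(size=2))
sample = h                 # final generated image-level representation
```

The point of stage 1 is that early in training the upper GANs' outputs are too noisy to condition on, so each level first learns against clean encoder representations before the chain is joined.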

## Experiments and results

Datasets used: MNIST, SVHN, CIFAR-10

The authors show that without the entropy loss term, samples tend to be invariant to the input noise.

Their method beats the state-of-the-art Inception score on CIFAR-10. They also get a better “Turing test” score on AMT than DC-GAN on MNIST and CIFAR-10 (24.4% human error rate for SGAN with the three combined losses vs. 15.6% for DC-GAN; a higher error rate means workers mistook generated samples for real ones more often).

## Notes

SGANs depend on a good discriminator/encoder to learn good representations. I’d like to see results using unsupervised learning (perhaps using something like an auto-encoder).