Image-to-Image Translation with Conditional Adversarial Networks

Summary

Image-to-Image (or Pix-2-Pix) is a type of conditional GAN that does not use noise to generate its output. Instead, it uses a label map as an input to the generator. The discriminator receives the label map and the generated image to make its decision.

Generator

The generator is basically an U-Net which takes the label map as the input and produces an image.

Discriminator

The discriminator is a normal CNN with a sigmoid activation layer at the end. It also uses LeakyReLU for every convolutional layer. The input of the discriminator is both the label map and the generated image.

Loss

Pix-2-Pix is using the standard conditional GAN loss plus a regularisation term L1.

Where lambda is a hyperparameter (100 is suggested).

Examples

Implementations

Pix2pix Caffe (Original Code)
Pix2pix Tensorflow
Pix2pix Keras