Image-to-Image (or Pix-2-Pix) is a type of conditional GAN that does not use noise to generate its output. Instead, it uses a label map as an input to the generator. The discriminator receives the label map and the generated image to make its decision.


The generator is basically an U-Net which takes the label map as the input and produces an image.


The discriminator is a normal CNN with a sigmoid activation layer at the end. It also uses LeakyReLU for every convolutional layer. The input of the discriminator is both the label map and the generated image.


Pix-2-Pix is using the standard conditional GAN loss plus a regularisation term L1.

Where lambda is a hyperparameter (100 is suggested).