## Summary

The refiner $$R_{\theta}$$ is a fully convolutional network without striding or pooling.
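The toy NumPy sketch below shows why this design keeps the output the same size as the input. It uses two 3x3 convolutions with stride 1 and zero "same" padding, a ReLU, and a residual skip connection. The layer count, the channel widths, the residual connection, and the names `conv2d_same` and `refiner` are assumptions for illustration. This is not the authors' architecture. Because no layer downsamples, the same weights also accept images of any height and width.

```python
import numpy as np


def conv2d_same(x, w, b):
    """Stride-1 convolution (cross-correlation) with zero 'same' padding.

    x: (H, W, C_in), w: (k, k, C_in, C_out) with odd k, b: (C_out,).
    Returns an (H, W, C_out) array: the spatial size is unchanged.
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))  # zero padding
    height, width = x.shape[:2]
    out = np.zeros((height, width, w.shape[3]))
    for i in range(k):
        for j in range(k):
            # Add the contribution of kernel tap (i, j) at every pixel at once.
            out += xp[i:i + height, j:j + width, :] @ w[i, j]
    return out + b


def refiner(x, params):
    """Hypothetical toy refiner: conv -> ReLU -> conv, plus a residual skip.

    There is no striding or pooling, so the output keeps the input's H x W.
    """
    (w1, b1), (w2, b2) = params
    hidden = np.maximum(conv2d_same(x, w1, b1), 0.0)
    return x + conv2d_same(hidden, w2, b2)


rng = np.random.default_rng(0)
params = [
    (rng.normal(scale=0.1, size=(3, 3, 1, 8)), np.zeros(8)),  # 1 -> 8 channels
    (rng.normal(scale=0.1, size=(3, 3, 8, 1)), np.zeros(1)),  # 8 -> 1 channel
]

# The same weights work on single-channel images of different sizes.
for shape in [(35, 55, 1), (64, 80, 1)]:
    print(shape, "->", refiner(np.ones(shape), params).shape)
```

The residual form means an all-zero second convolution would return the synthetic image unchanged. That makes it easy for such a network to make only small, local edits.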

In addition to the usual adversarial loss ($$l_{real}$$), a regularization loss ($$l_{reg}$$) is used to preserve the “annotation” information from the simulator, such as the gaze direction of a synthetic eye image.
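Written out, the refiner objective combines the two terms roughly as follows. This is our reading of the paper's formulation, not a quotation. Here $$x_i$$ is a synthetic image and $$D_{\phi}$$ is the discriminator's estimated probability that its input is a refined (rather than real) image. $$\lambda$$ weights the regularization term, and $$\psi$$ is a feature-space mapping. The symbols $$\mathcal{L}_R$$, $$D_{\phi}$$, $$x_i$$ and $$\lambda$$ are introduced here for illustration.

$$
\mathcal{L}_R(\theta) = \sum_i \Big[ \underbrace{-\log\big(1 - D_{\phi}(R_{\theta}(x_i))\big)}_{l_{real}} + \lambda \, \underbrace{\big\| \psi(R_{\theta}(x_i)) - \psi(x_i) \big\|_1}_{l_{reg}} \Big]
$$

Minimizing the first term pushes refined images toward being classified as real. The second term penalizes the refiner for moving too far from its synthetic input in feature space.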

The function $$\psi$$ in the regularization term is a mapping to feature space. It is usually the identity function, but in some cases the authors use other features, such as the mean of the color channels or the output of a convnet.
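The NumPy sketch below computes $$l_{reg}$$ as an L1 distance in the feature space $$\psi$$, with two of the mappings mentioned above. The L1 norm follows the paper as we understand it. The function names are hypothetical, and images are assumed to be (height, width, channels) arrays. A convnet feature extractor (not shown) could be passed as `psi` in the same way.

```python
import numpy as np


def psi_identity(img):
    # Compare images pixel by pixel.
    return img


def psi_channel_mean(img):
    # Average over the last (color) axis: (H, W, C) -> (H, W).
    return img.mean(axis=-1)


def l_reg(refined, synthetic, psi=psi_identity):
    """Hypothetical L1 self-regularization: sum of |psi(refined) - psi(synthetic)|."""
    return np.abs(psi(refined) - psi(synthetic)).sum()


synthetic = np.zeros((2, 2, 3))
refined = np.zeros((2, 2, 3))
refined[..., 0] = 3.0   # push the first color channel up
refined[..., 1] = -3.0  # and the second one down by the same amount

print(l_reg(refined, synthetic))                    # identity features
print(l_reg(refined, synthetic, psi_channel_mean))  # channel-mean features
```

The choice of $$\psi$$ decides which changes count as violations. In this example the color shift changes every pixel, so the identity mapping penalizes it. The channel mean is unchanged, however, so `psi_channel_mean` gives zero penalty.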

### Generated images history for discriminator training

The training set for each discriminator update is built from 50% real images, 25% refined images generated by the latest generator, and 25% refined images generated by past versions of the generator. This improves the stability of adversarial training. Because the discriminator keeps seeing older refined images, it does not forget the artifacts that earlier generators produced, which the latest generator could otherwise reintroduce. The authors note that this method is complementary to using a running average of the model parameters.
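Below is a minimal sketch of how such a history of refined images could be kept and mixed into discriminator batches. The buffer capacity, the random-replacement policy, the top-up rule for the first iterations, and the names `ImageHistoryBuffer` and `discriminator_batch` are assumptions for illustration. Strings stand in for images.

```python
import random


class ImageHistoryBuffer:
    """Hypothetical pool of previously refined images (any objects can stand in)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.images = []
        self.rng = random.Random(seed)

    def sample(self, n):
        # Draw up to n past refined images without replacement.
        return self.rng.sample(self.images, min(n, len(self.images)))

    def add(self, new_images):
        # Fill the buffer first; once it is full, overwrite random slots.
        for img in new_images:
            if len(self.images) < self.capacity:
                self.images.append(img)
            else:
                self.images[self.rng.randrange(self.capacity)] = img


def discriminator_batch(real, refined_now, buffer, batch_size):
    """Build a batch of 50% real, 25% current and 25% past refined images.

    Labels: 0 = real, 1 = refined. Assumes batch_size is divisible by 4.
    """
    half, quarter = batch_size // 2, batch_size // 4
    past = buffer.sample(quarter)  # sample history before adding new images
    # While the history is still short, top up with current refiner output.
    fresh = refined_now[: half - len(past)]
    images = real[:half] + fresh + past
    labels = [0] * half + [1] * (len(fresh) + len(past))
    buffer.add(refined_now[:quarter])  # keep some current images for later
    return images, labels


buffer = ImageHistoryBuffer(capacity=100, seed=0)
real = [f"real_{i}" for i in range(8)]
for step in range(3):
    refined = [f"step{step}_{i}" for i in range(8)]
    images, labels = discriminator_batch(real, refined, buffer, batch_size=8)
    print(step, images[4:])  # the refined half of the batch
```

Sampling from the buffer before adding the current images ensures the "past" quarter really comes from earlier generator versions.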

## Experiments and Results

Datasets:

- Appearance-based gaze estimation on the MPIIGaze dataset
- Hand pose estimation on the NYU hand pose dataset of depth images

Visual Turing test: in a “Visual Turing test”, human subjects classified images as real or refined with an accuracy of only 51.7%, close to chance level. This shows that refined images are almost indistinguishable from real images.

For gaze estimation, training on refined synthetic data outperforms training on purely synthetic data by 22.3%.

Comparison to other methods