In this paper, the authors present a conditional VAE capable of encoding and decoding manual sketches. The goal is to teach a machine to draw a sketch like a human would do.
A sketch is a list of points, and each point is a vector consisting of 5 elements: \(S_i=(\Delta x, \Delta y, p_1, p_2, p_3)\) where \((\Delta x, \Delta y)\) is the offset wrt the previous point and \((p_1, p_2, p_3)\) are binary variables encoding the pen status (down, up, EndOfSketch).
The method is illustrated in the top figure. The encoder is a bidirectional RNN which generates a latent point, iid of a Gaussian distribution. The decoder is an RNN whose input is both the previous point \(S_i\) and the latent vector \(Z\) and the output is a GMM for \((\Delta x, \Delta y)\)
and a Softmax categorical distribution for \((p_1, p_2, p_3)\). The loss has the usual VAE shape, i.e. a reconstruction Loss plus a Kullback-Leibler Divergence Loss. The reconstruction loss has two terms namely
At test time, they use a “temperature” variable \(\tau\) to add more or less randomness to their method.
They first show how their method works without an encoder (so only training the decoder). This leads to an unconditioned RNN generator.
They then show that their VAE can correct abnormalities.
and interpolate between shapes
Even more interesting, their system can predict a sketch from an incomplete sketch.
You shall find more information on the follwing BLOG