http://karpathy.github.io/2016/05/31/rl/

Yet another cool post by A. Karpathy. Potentially the best introduction to reinforcement learning and policy gradients, with the game of Pong as an example and simple Python code.