Super famous 2012 AlexNet paper which presented the first CNN model applied to ImageNet. The networks comes with:

  • 5 conv layers with maxpooling and ReLU;
  • 2 FC layers + softmax of 1000 classes;
  • Multi-GPU
  • Local response normalization (“This sort of response normalization implements a form of lateral inhibition inspired by the type found in real neurons, creating competition for big activities amongst neuron outputs computed using different kernels.”)
  • Overlapping pooling
  • Data augmentation
  • Dropout