Summary

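SSD (Single Shot MultiBox Detector) can be seen as an adaptation of YOLO that adds prior boxes. Prior boxes (called default boxes in the paper) are pre-computed boxes with different aspect ratios and scales. At prediction time, SSD predicts, for every prior box, offsets that refine its position and size, together with a score for each class. SSD also makes predictions from multiple feature maps of different resolutions, which improves its performance on objects of different sizes.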
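As a minimal sketch of how a prior box is "refined", the snippet below applies predicted offsets to one default box using the center-size encoding described in the SSD paper: center offsets are scaled by the default box's width and height, and width and height offsets are in log space. The function name `decode_box` and the example numbers are hypothetical, and many implementations additionally divide offsets by "variance" constants, which this sketch omits.

```python
import math

def decode_box(default_box, offsets):
    """Apply predicted offsets to a default box (hypothetical helper).

    default_box: (cx, cy, w, h) in normalized image coordinates.
    offsets: (dx, dy, dw, dh) as predicted by the network.
    Returns the refined box as (cx, cy, w, h).
    """
    d_cx, d_cy, d_w, d_h = default_box
    dx, dy, dw, dh = offsets
    # Center offsets are expressed relative to the default box size.
    cx = d_cx + dx * d_w
    cy = d_cy + dy * d_h
    # Width and height offsets are in log space, so exp() keeps them positive.
    w = d_w * math.exp(dw)
    h = d_h * math.exp(dh)
    return (cx, cy, w, h)

# A default box centered in the image; zero offsets leave it unchanged.
print(decode_box((0.5, 0.5, 0.2, 0.4), (0.0, 0.0, 0.0, 0.0)))  # (0.5, 0.5, 0.2, 0.4)
```

Predicting offsets instead of raw coordinates means the network only has to learn small corrections to a box that already has roughly the right shape.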

Model            mAP (%)   FPS   Input size
Faster R-CNN     73.2      7     1000x600*
YOLO (VGG-16)    66.4      21    448x448
SSD512           76.8      22    512x512
SSD300           74.3      59    300x300
Fast-YOLO        52.7      155   448x448

* Faster R-CNN accepts inputs of any size; 1000x600 is the typical setting.

Model

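SSD uses VGG-16, truncated before its classification layers, as its base network. It then adds extra convolutional layers and produces its output from several feature maps of decreasing resolution, so that large and small objects are detected on different maps.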
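To see how predicting from several feature maps adds up, the sketch below counts the default boxes for SSD300. The feature-map sizes (38, 19, 10, 5, 3, 1) and the number of boxes per location (4 or 6) are the commonly cited SSD300 configuration and should be treated as an assumption here; with them, the total is 8732 default boxes per image.

```python
# Assumed SSD300 configuration: (feature map side length, default boxes per location).
SSD300_LAYERS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

def count_default_boxes(layers):
    """Total default boxes: every cell of every feature map gets k boxes."""
    return sum(size * size * k for size, k in layers)

for size, k in SSD300_LAYERS:
    print(f"{size}x{size} map, {k} boxes/cell -> {size * size * k}")
print("total:", count_default_boxes(SSD300_LAYERS))  # total: 8732
```

Most boxes come from the large 38x38 map, which is the one responsible for small objects.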

It is a more complex model than YOLO, but SSD300 is faster, in part because its 300x300 input is smaller than YOLO's 448x448.

[Figure: SSD model architecture]

Using atrous (dilated) convolution in the layer converted from VGG-16's fc6 makes the model about 20% faster, with roughly the same accuracy.
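Atrous (dilated) convolution spaces the kernel taps `dilation` positions apart, so a small kernel covers a wider span without extra weights. The 1-D, pure-Python sketch below only illustrates the idea; the function names are hypothetical and this is not how SSD's layers are actually implemented. The last line shows that a 3-tap kernel with dilation 6 (the value often cited for SSD's fc6 layer, an assumption here) spans 13 positions.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (no padding) 1-D convolution with dilated kernel taps.

    Written as cross-correlation, which is what deep-learning libraries compute.
    """
    span = (len(kernel) - 1) * dilation + 1  # effective kernel size
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + i * dilation] for i, w in enumerate(kernel)))
    return out

def effective_kernel_size(k, dilation):
    """Width covered by a k-tap kernel whose taps are `dilation` apart."""
    return k + (k - 1) * (dilation - 1)

x = list(range(10))
print(dilated_conv1d(x, [1, 1, 1], dilation=1))  # [3, 6, 9, 12, 15, 18, 21, 24]
print(dilated_conv1d(x, [1, 1, 1], dilation=2))  # [6, 9, 12, 15, 18, 21]
print(effective_kernel_size(3, 6))  # 13
```

Both calls use the same three weights; only the spacing between the inputs they read changes.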

Default boxes

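Default boxes are similar to the anchor boxes in Faster R-CNN. They are not learned from the training set; they are fixed in advance, with several boxes of chosen scales and aspect ratios placed at every feature map location. They give the network a starting shape for each aspect ratio, so it only has to predict offsets from a default box.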
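A minimal sketch of default box generation, assuming the linear scale rule given in the SSD paper: for m feature maps, s_k = s_min + (s_max - s_min)(k - 1)/(m - 1) with s_min = 0.2 and s_max = 0.9, box width s_k·sqrt(a) and height s_k/sqrt(a) for aspect ratio a, plus one extra square box of scale sqrt(s_k·s_{k+1}). The function names are hypothetical, and extrapolating one extra scale beyond the last map is an assumption of this sketch.

```python
import itertools
import math

def box_scales(m, s_min=0.2, s_max=0.9):
    """Scale s_k for feature maps k = 1..m, spaced linearly.

    One extra scale (k = m + 1) is extrapolated so the last map also gets the
    additional square box; this extrapolation is an assumption.
    """
    step = (s_max - s_min) / (m - 1)
    return [s_min + step * (k - 1) for k in range(1, m + 2)]

def default_boxes(fmap_size, k, scales, aspect_ratios=(1.0, 2.0, 0.5)):
    """Default boxes (cx, cy, w, h), normalized to [0, 1], for feature map k (1-based)."""
    s_k, s_next = scales[k - 1], scales[k]
    boxes = []
    for i, j in itertools.product(range(fmap_size), repeat=2):
        # Box centers sit in the middle of each feature map cell.
        cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
        for a in aspect_ratios:
            boxes.append((cx, cy, s_k * math.sqrt(a), s_k / math.sqrt(a)))
        # Extra square box at the geometric mean of this scale and the next.
        s_extra = math.sqrt(s_k * s_next)
        boxes.append((cx, cy, s_extra, s_extra))
    return boxes

scales = box_scales(m=6)
print([round(s, 2) for s in scales[:6]])  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
boxes = default_boxes(fmap_size=3, k=5, scales=scales)
print(len(boxes))  # 36 (3 * 3 cells, 4 boxes per cell)
```

Coarser feature maps get larger scales, which is how each map specializes in a range of object sizes.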

Loss

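The loss function is similar in spirit to YOLO's: a weighted sum of a localization loss (smooth L1 on the predicted box offsets) and a confidence loss (softmax cross-entropy over the classes). Instead of predicting a fixed number of boxes per grid cell, SSD makes one prediction per prior box. Only prior boxes matched to a ground-truth box, those with a Jaccard overlap (IoU) greater than 0.5, contribute to the localization loss. Because several prior boxes at the same location can pass this threshold, one object can produce multiple predictions per cell.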
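The matching step can be sketched as follows. This is a minimal, pure-Python illustration, assuming boxes in corner format (x_min, y_min, x_max, y_max); the function names are hypothetical. Following the SSD paper, each ground-truth box is also matched to its single best default box, so every object gets at least one positive even when no overlap exceeds 0.5.

```python
def iou(a, b):
    """Jaccard overlap of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match(default_boxes, ground_truths, threshold=0.5):
    """For each default box, return the index of its matched ground truth, or -1."""
    overlaps = [[iou(d, gt) for d in default_boxes] for gt in ground_truths]
    matches = [-1] * len(default_boxes)
    best_iou = [0.0] * len(default_boxes)
    # Any default box whose overlap with a ground truth exceeds the threshold is positive.
    for g, row in enumerate(overlaps):
        for d, o in enumerate(row):
            if o > threshold and o > best_iou[d]:
                matches[d], best_iou[d] = g, o
    # Each ground truth also keeps its single best default box, even below the threshold.
    for g, row in enumerate(overlaps):
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] > 0:
            matches[best] = g
    return matches

defaults = [(0, 0, 2, 2), (1, 1, 3, 3), (4, 4, 6, 6)]
gts = [(0, 0, 2, 2), (5, 5, 7, 7)]
print(match(defaults, gts))  # [0, -1, 1]
```

The second ground truth overlaps the third default box with an IoU of only 1/7, yet it is still matched through the best-box rule; default boxes left at -1 are treated as background.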

Implementations

SSD Tensorflow

SSD Keras