SSD is an adaptation of YOLO that supports prior boxes. Prior boxes (called default boxes in the paper) are pre-computed boxes with different aspect ratios and scales. At prediction time, SSD predicts, for each prior box, the associated class and an adjustment to the box. SSD also uses multiple feature maps, which helps it detect objects at different scales.
| Model | Mean avg precision | FPS | Input size |
Faster R-CNN works on any input size.
SSD uses VGG-16 as its base network. It then adds several feature maps of decreasing size, and each of them contributes to the output.
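To get a feel for how many predictions the extra feature maps add up to, here is a minimal sketch that counts prior boxes. The feature map sizes and the number of boxes per location are assumptions based on the commonly cited SSD300 configuration; they are not given in the text above, and `count_prior_boxes` is a made-up helper name.

```python
# Assumed SSD300 layout: (feature map side length, prior boxes per location).
# These values are an assumption for illustration, not taken from this text.
FEATURE_MAPS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def count_prior_boxes(feature_maps):
    """Total number of prior boxes across all square feature maps."""
    return sum(side * side * boxes for side, boxes in feature_maps)


total = count_prior_boxes(FEATURE_MAPS)
print(total)  # 8732
```

Under these assumptions most of the boxes come from the largest feature map, which is the one responsible for small objects.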
It’s a more complex model than YOLO, but it’s faster because its input is smaller (300×300 for SSD300 versus 448×448 for YOLO).
Using atrous (dilated) convolution speeds up the model by about 20%.
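For readers unfamiliar with the term, the sketch below shows what an atrous (dilated) convolution does in one dimension: the kernel taps are spread apart by a dilation factor, so the same number of weights covers a wider span of the input. This is only an illustration of the operation; the function name and the sample values are made up, and it is not how SSD itself is implemented.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid 1D convolution (no kernel flip) with gaps between kernel taps."""
    # Width of input covered by one application of the kernel.
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]
print(dilated_conv1d(signal, kernel, dilation=1))  # [6, 9, 12, 15, 18]
print(dilated_conv1d(signal, kernel, dilation=2))  # [9, 12, 15]
```

With a dilation of 2, the three weights cover five input positions instead of three, at the same cost per output value.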
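Default boxes are similar to the anchor boxes from Faster R-CNN. In the paper they are not learned from the training set; they are fixed ahead of time from a set of scales and aspect ratios chosen for each feature map. They help the network get the right aspect ratio.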
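A minimal sketch of how a grid of prior boxes with different aspect ratios could be laid out over one square feature map. The function name, the scale value and the list of ratios are illustrative assumptions. Boxes are centered on each cell, and a ratio `a` at scale `s` gives a width of `s * sqrt(a)` and a height of `s / sqrt(a)`, so every box at one scale has the same area.

```python
import math


def prior_boxes(fmap_size, scale, aspect_ratios):
    """Return (cx, cy, w, h) boxes in [0, 1] image coordinates."""
    boxes = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Center of the current feature map cell.
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for ratio in aspect_ratios:
                w = scale * math.sqrt(ratio)
                h = scale / math.sqrt(ratio)
                boxes.append((cx, cy, w, h))
    return boxes


boxes = prior_boxes(fmap_size=3, scale=0.5, aspect_ratios=[1.0, 2.0, 0.5])
print(len(boxes))  # 3 * 3 cells * 3 ratios = 27
print(boxes[0])
```

The sketch leaves out details such as the extra square box the paper adds at an intermediate scale and the clipping of boxes that extend past the image border.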
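The loss function is similar to YOLO’s loss function. Instead of a fixed number of detections per cell, SSD predicts one box per prior box. A prior box is matched to a ground-truth box, and so counts as a positive in the loss, when their Jaccard overlap (intersection over union) is bigger than 0.5. Because several prior boxes in the same cell can pass this threshold, a cell can produce multiple predictions.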
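The matching rule can be sketched as follows. This is a simplified illustration that assumes boxes in `(xmin, ymin, xmax, ymax)` form and a single ground-truth box; the function names and sample boxes are made up, and the loss computation itself is not shown.

```python
def jaccard(box_a, box_b):
    """Jaccard overlap (IoU) of two (xmin, ymin, xmax, ymax) boxes."""
    # Width and height of the intersection, clamped at zero when disjoint.
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_priors(priors, ground_truth, threshold=0.5):
    """Indices of priors whose overlap with the ground truth exceeds threshold."""
    return [i for i, prior in enumerate(priors)
            if jaccard(prior, ground_truth) > threshold]


priors = [(0.0, 0.0, 0.5, 0.5), (0.1, 0.1, 0.6, 0.6), (0.5, 0.5, 1.0, 1.0)]
ground_truth = (0.0, 0.0, 0.6, 0.6)
print(match_priors(priors, ground_truth))  # [0, 1]
```

Here two prior boxes pass the 0.5 threshold for the same object, so both would be trained as positives, while the third barely touches the object and is left out.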