The best object detectors are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors (like Yolo and ssd) that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors.

Authors argue that one stage methods suffer from class imbalance. They propose a solution by using a new loss call the Focal Loss as in Fig.1.

The main advantage of the focal loss is to give a near zero loss to the well classified samples whose probability are not close to one.

The proposed architecture is a pyramid net, somehow similar to a unet but with skipconnections and with an output at each level of the pyramid.


Results show that their method is more accurate and faster than previous methods.