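The basic idea of this paper is to train a sequence of detection heads, each with a higher IoU threshold than the one before. The boxes output by one detector are fed to the next as its proposals, which acts as a resampling mechanism: each stage sees better-localized inputs than the previous one.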
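The resampling idea can be illustrated with a toy sketch. This is not the authors' implementation: `toy_refine` is a hypothetical stand-in for a trained box regressor (it simply moves a box halfway toward the ground truth), the proposals are random jitters of a single ground-truth box, and the thresholds 0.5, 0.6, 0.7 are assumed here for illustration.

```python
# Toy sketch of cascade resampling; not the authors' implementation.
import random


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def toy_refine(box, gt, step=0.5):
    """Hypothetical stand-in for a trained regressor: move each
    coordinate a fraction `step` of the way toward the ground truth."""
    return tuple(b + step * (g - b) for b, g in zip(box, gt))


def run_cascade(proposals, gt, thresholds=(0.5, 0.6, 0.7)):
    """At each stage, count boxes that qualify as positives under that
    stage's IoU threshold, then refine all boxes for the next stage."""
    boxes = list(proposals)
    positives_per_stage = []
    for t in thresholds:
        positives_per_stage.append(sum(iou(b, gt) >= t for b in boxes))
        # The refined boxes become the next stage's proposals.
        boxes = [toy_refine(b, gt) for b in boxes]
    return positives_per_stage, boxes


random.seed(0)
gt = (20.0, 20.0, 80.0, 80.0)
# Noisy proposals: each coordinate of the ground truth jittered by up to 15 px.
proposals = [tuple(c + random.uniform(-15, 15) for c in gt) for _ in range(200)]
counts, final_boxes = run_cascade(proposals, gt)
print(counts)  # positives available to each stage at its own threshold
```

Because every stage receives boxes already refined by the previous one, later stages still have positives to train on even though their thresholds are stricter. Applying the 0.7 threshold directly to the raw proposals would typically leave far fewer.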

This method gives better detection results than a single-threshold detector, with the largest gains at high IoU thresholds.

This approach is somewhat expensive: it adds about 100M parameters and slows inference on an FPN detector by roughly 0.03 seconds (0.14 s vs 0.115 s for the baseline, about 22% slower).

Code is available here: https://github.com/zhaoweicai/cascade-rcnn