This paper presents a semi-automatic, deep-learning-based segmentation method. The idea is quite simple, but the results beat state-of-the-art solutions.

To segment an object, the user is asked to select 4 keypoints: the topmost, bottommost, leftmost and rightmost points of the object. Each point is then associated with a Gaussian kernel rendered into a 2D image. This 2D image is concatenated to the RGB input image, leading to a 4-channel input image. To improve results, the authors feed the network a cropped window around the 4 points, enlarged by a margin.
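The input construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the Gaussian width `sigma`, the crop `margin`, and the choice to merge the four Gaussians with a pixel-wise maximum are all assumptions made for the example.

```python
import numpy as np


def extreme_points_heatmap(points, height, width, sigma=10.0):
    """Render one 2D Gaussian per (x, y) point into a single heatmap.

    Assumption: overlapping Gaussians are merged with a pixel-wise maximum.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((height, width), dtype=np.float32)
    for px, py in points:
        dist_sq = (xs - px) ** 2 + (ys - py) ** 2
        gaussian = np.exp(-dist_sq / (2.0 * sigma ** 2)).astype(np.float32)
        heatmap = np.maximum(heatmap, gaussian)
    return heatmap


def crop_box(points, height, width, margin=50):
    """Bounding box of the points, enlarged by `margin` and clipped to the image."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0 = max(min(xs) - margin, 0)
    y0 = max(min(ys) - margin, 0)
    x1 = min(max(xs) + margin, width - 1)
    y1 = min(max(ys) + margin, height - 1)
    return x0, y0, x1, y1


def build_network_input(image, points, sigma=10.0, margin=50):
    """Stack the heatmap onto an H x W x 3 RGB image, then crop around the points."""
    height, width = image.shape[:2]
    heatmap = extreme_points_heatmap(points, height, width, sigma)
    # RGB (3 channels) + heatmap (1 channel) -> 4-channel input.
    stacked = np.concatenate(
        [image.astype(np.float32), heatmap[..., None]], axis=2
    )
    x0, y0, x1, y1 = crop_box(points, height, width, margin)
    return stacked[y0:y1 + 1, x0:x1 + 1]


if __name__ == "__main__":
    rgb = np.zeros((200, 300, 3), dtype=np.uint8)
    # Hypothetical clicks: top, bottom, left, right, given as (x, y).
    clicks = [(150, 40), (150, 160), (60, 100), (240, 100)]
    net_input = build_network_input(rgb, clicks, margin=20)
    print(net_input.shape)
```

In practice the crop would also be resized to a fixed resolution before being passed to the network; that step is omitted here.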

The proposed network is a modified ResNet101 without the last layers, without max-pooling layers, and with some dilated convolutions to make sure the output has the same size as the input.
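To see why removing downsampling and using dilated convolutions preserves spatial resolution, the standard convolution output-size formula can be checked numerically. The sketch below is generic arithmetic and does not reproduce the layer configuration used in the paper; the kernel sizes, paddings and dilation rates are illustrative assumptions.

```python
def conv_output_size(size, kernel=3, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution along one dimension."""
    # A dilated kernel spans dilation * (kernel - 1) + 1 input positions.
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


if __name__ == "__main__":
    # Stride-2 convolution: resolution is halved.
    print(conv_output_size(64, kernel=3, stride=2, padding=1))
    # Stride-1 convolution with dilation 2 and matching padding:
    # resolution is kept while the kernel covers a wider context.
    print(conv_output_size(64, kernel=3, stride=1, padding=2, dilation=2))
```

With stride 1 and padding equal to the dilation rate, a 3x3 convolution keeps the feature map size unchanged, which is the usual way to enlarge the receptive field without pooling or striding.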


The authors report a series of ablation experiments and, overall, results that surpass the state of the art.