The authors propose a novel loss for multi-class segmentation. The core idea is to force the network to minimize the overall loss per class rather than per point (as in a standard cross-entropy loss).

The final loss is a Minkowski normalization over all valid patches. As $$k$$ increases, the difficult areas are weighted more heavily. The authors argue that this helps in segmentation, since there are often classes with only a few samples. They found that $$k=5$$ gives the best results on Pascal VOC 2012.
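
To make the effect of $$k$$ concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the exact formulation is not given in this summary, so the sketch assumes the aggregation is a power mean of order $$k$$ over a set of non-negative loss values (here, per-class mean losses). The function names `per_class_mean_loss` and `minkowski_aggregate` are hypothetical and chosen only for illustration.

```python
def per_class_mean_loss(point_losses, labels):
    """Average the per-point losses within each class label."""
    sums, counts = {}, {}
    for loss, label in zip(point_losses, labels):
        sums[label] = sums.get(label, 0.0) + loss
        counts[label] = counts.get(label, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


def minkowski_aggregate(values, k):
    """Power mean of order k: (mean(v ** k)) ** (1 / k).

    Assumed form, for illustration only. Values must be non-negative.
    k = 1 is the plain mean; larger k moves the result toward the maximum.
    """
    values = list(values)
    return (sum(v ** k for v in values) / len(values)) ** (1.0 / k)


# Toy example: four easy points of class 0 and one hard point of class 1.
point_losses = [0.1, 0.1, 0.1, 0.1, 2.0]
labels = [0, 0, 0, 0, 1]

class_losses = per_class_mean_loss(point_losses, labels)
mean_over_points = sum(point_losses) / len(point_losses)
agg_k1 = minkowski_aggregate(class_losses.values(), k=1)
agg_k5 = minkowski_aggregate(class_losses.values(), k=5)
```

In this toy case the rare class contributes one point out of five to the per-point average, but half of the per-class average, and raising $$k$$ pushes the aggregate further toward the loss of the hardest class. This matches the intuition in the summary that larger $$k$$ emphasizes difficult, under-represented regions.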