This paper proposes two procedures for loss correction in the presence of noisy labels; both are agnostic to application domain and network architecture. The method amounts to at most a matrix inversion and a matrix multiplication.

They assume that the joint distribution of inputs and noisy labels is modeled as

\[p(\vec x,\hat y) = \sum_y p(\hat y|y) p(y|\vec x)p(\vec x)\]


where

\[p(\hat y|y) = T \in [0, 1]^{c \times c}\]

is the noise transition matrix over the c classes; it specifies the probability of one label being flipped to another, i.e.:

\[\forall i, j, T_{ij} = p(\hat y = e^j |y = e^i).\]

To compensate for the noise, they propose two solutions: a forward correction procedure and a backward correction procedure. The forward correction amounts to multiplying the output of the network by T (transposed, so that the prediction becomes a distribution over noisy labels) before computing the loss.
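
A minimal sketch of the forward correction, assuming a cross-entropy loss, a known T, and that the product is taken as the transpose of T times the softmax output, so that entry j is the predicted probability of observing noisy label j. The function name `forward_corrected_loss` and the 2-class matrix are illustrative assumptions, not taken from the paper's code.

```python
import math

def forward_corrected_loss(probs, noisy_label, T):
    """Cross-entropy on T^T @ probs, evaluated at the observed noisy label.

    probs: softmax output p(y|x), a list of c floats.
    noisy_label: index j of the observed (possibly flipped) label.
    T: c x c row-stochastic matrix, T[i][j] = p(noisy = j | clean = i).
    """
    c = len(probs)
    # Predicted distribution over noisy labels: q_j = sum_i T[i][j] * p_i.
    noisy_probs = [sum(T[i][j] * probs[i] for i in range(c)) for j in range(c)]
    return -math.log(noisy_probs[noisy_label])

# Hypothetical example: 20% of class-0 labels are flipped to class 1.
T = [[0.8, 0.2],
     [0.0, 1.0]]
probs = [0.9, 0.1]          # the network is confident in class 0
loss_plain = -math.log(probs[1])                    # uncorrected loss for noisy label 1
loss_forward = forward_corrected_loss(probs, 1, T)  # corrected loss for noisy label 1
```

Here the corrected loss is smaller than the plain one: a confident class-0 prediction paired with an observed label 1 is penalized less, because T says such a flip is plausible.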

The backward correction amounts to multiplying the vector of per-class losses by the inverse of T.
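
A minimal sketch of the backward correction, assuming a cross-entropy loss and a known, invertible T. The corrected loss vector is obtained by solving a linear system rather than forming the inverse explicitly. The helper `solve`, the function name `backward_corrected_losses`, and the example matrix are hypothetical; `solve` is written in plain Python only to keep the example dependency-free.

```python
import math

def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in A[r]] + [float(b[r])] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("T is singular; backward correction is undefined")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]

def backward_corrected_losses(probs, T):
    """Return T^{-1} @ l, where l_i = -log p_i is the loss for clean class i.

    Entry j is the corrected loss to use when the observed noisy label is j.
    """
    losses = [-math.log(p) for p in probs]
    return solve(T, losses)

# Hypothetical 2-class noise: T[i][j] = p(noisy = j | clean = i).
T = [[0.8, 0.2],
     [0.3, 0.7]]
probs = [0.9, 0.1]
corrected = backward_corrected_losses(probs, T)
clean = [-math.log(p) for p in probs]
# Averaging the corrected loss over the label noise recovers the clean loss.
expected = [sum(T[i][j] * corrected[j] for j in range(2)) for i in range(2)]
```

The last line checks the property that motivates the construction: in expectation over the noisy label, the corrected loss equals the loss that would have been computed on the clean label. Note that individual corrected losses can be negative.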

The algorithm to compute T is very simple and amounts to


You know what, it works!