The idea

Batch normalization works very well, but it is slow, even with parallel optimizations.

The authors suggest speeding up normalization by sampling only part of the batch and estimating the mean and variance from that sample instead of from the full batch. Models C and D are ResNet-18, and Model E is ResNet-50.

The method

Several sampling methods are proposed, but the authors mostly use Batch Sampling (BS) and Feature Sampling (FS). See the image above for an explanation of both approaches. They also use Virtual Dataset Normalization, in which the statistics are estimated from both virtual data and sampled data, combined according to a predetermined ratio.
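To make the sampling idea concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes that BS means estimating the statistics from a random subset of the samples in the batch, and that FS means estimating them from a small spatial patch of every feature map. The function names, the `(N, C, H, W)` layout, and the choice of one shared patch location for all samples are assumptions made for illustration only.

```python
import numpy as np

def full_batch_stats(x):
    """Per-channel mean and variance over batch and spatial axes of x (N, C, H, W)."""
    return x.mean(axis=(0, 2, 3)), x.var(axis=(0, 2, 3))

def batch_sampling_stats(x, num_samples, rng):
    """Assumed BS: use only `num_samples` randomly chosen samples of the batch."""
    idx = rng.choice(x.shape[0], size=num_samples, replace=False)
    return full_batch_stats(x[idx])

def feature_sampling_stats(x, patch, rng):
    """Assumed FS: use one random `patch` x `patch` window of every feature map.

    Simplification: the same window location is shared by all samples and channels.
    """
    _, _, h, w = x.shape
    top = rng.integers(0, h - patch + 1)
    left = rng.integers(0, w - patch + 1)
    return full_batch_stats(x[:, :, top:top + patch, left:left + patch])

def normalize(x, mean, var, eps=1e-5):
    """Normalize the whole batch with the (possibly sampled) per-channel statistics."""
    return (x - mean[None, :, None, None]) / np.sqrt(var[None, :, None, None] + eps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(loc=2.0, scale=3.0, size=(128, 16, 32, 32))

    mean_full, var_full = full_batch_stats(x)
    mean_bs, var_bs = batch_sampling_stats(x, num_samples=16, rng=rng)
    mean_fs, var_fs = feature_sampling_stats(x, patch=8, rng=rng)

    # Largest per-channel deviation of each sampled estimate from the full-batch mean.
    print("BS mean error:", np.abs(mean_bs - mean_full).max())
    print("FS mean error:", np.abs(mean_fs - mean_full).max())

    y = normalize(x, mean_bs, var_bs)  # full batch normalized with sampled statistics
```

In both cases the whole batch is still normalized; only the reduction that computes the mean and variance runs over fewer elements, which is where any saving would come from.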

The results

The results above show that the model converges as fast as with full batch normalization, with a potentially substantial speedup and negligible accuracy loss.