The presented approach aims at learning a set of filters for denoising bursts of images taken by hand-held cameras (e.g. in smartphones).


  • Synthetic data creation from general purpose images simulating characteristics of real cameras.
  • Pixel-wise 3D kernel prediction for denoising of burst image sequence.
  • Adaptive, weighted (‘annealing’) loss.
  • Generalisation to multiple noise levels.

Conceptual overview

Basic concept

  • input

    • set of series of N images (burst)
    • one image of pixel-wise noise estimates
  • encoder-decoder structure

  • output

    • N filters per pixel in input space (filter size: \(K\) by \(K\))
    • synthesis of output image at pixel \(p\) by

      \[\hat{Y}^p = \frac{1}{N} \sum_{i=1}^N \, <f_i^p, V^p(X_i)>,\]

      where \(f^p_i\) denotes the learned filter at pixel \(p\) in input image \(i\).


  • main objective

    • \(L^2\)-term on gamma-corrected images
    • \(L^1\)-term on gradients of gamma-corrected images

      \[\ell(\hat{Y}, Y^\ast) = \lambda_2 \, \left\lVert\Gamma(\hat{Y}) - \Gamma(Y^\ast)\right\rVert_2^2 + \lambda_1 \, \left\lVert\nabla\Gamma(\hat{Y}) - \nabla\Gamma(Y^\ast)\right\rVert_1^2\]
  • annealed loss (This is the actual loss term!)

    • time dependent individual image loss term
    • idea: steer training in the beginning to avoid convergence to local minima

      \[\mathcal{L}(X; Y^\ast, t) = \ell\left(\frac{1}{N}\sum_{i=1}^N \, f_i(X_i), Y^\ast\right) + \beta\alpha^t \sum_{i=1}^N \, \ell(f_i(X_i), Y^\ast)\]

Synthetic data creation

  • The authors develop an approach to model several artifacts involved in creation of raw data from real camera sensors, including e.g.

    • explicit model for signal noise

    • simulated misalignment due to sensor movement



  • filter size: K = 5
  • number of input images: N = 8

Nomenclature (KPN methods)

  • 1-frame: N = 1
  • no ann: basic loss function only (@see main objective)
  • sigma blind: no noise estimate as input
  • direct: directly synthesise output pixel values (by adding three additional conv layers)

Synthetic data set

  • KPN always outperforms state of the art
  • multi-frame info, annealing loss and noise estimate are all helpful

Real data set

Predicted kernels

  • The approach is robust to object movement (the mouse).
  • The authors claim that their ‘annealing’ approach helps focusing on a single frame where movement occurs, while taking advantage of all frames in static parts of the image (i.e. background).

Adaptation to noise levels

  • Noise estimation

    • did not improve training loss
    • but helps for generalisation to larger (unseen) noise levels
  • Learned filters adapt to the noise estimate in the input

    • (a) to (e): scalar multiples of noise estimate for same image burst