
Burst Denoising with Kernel Prediction Networks
Ben Mildenhall¹,²,∗  Jonathan T. Barron²  Jiawen Chen²  Dillon Sharlet²  Ren Ng¹  Robert Carroll²
¹UC Berkeley  ²Google Research
Abstract
We present a technique for jointly denoising bursts of im-
ages taken from a handheld camera. In particular, we pro-
pose a convolutional neural network architecture for pre-
dicting spatially varying kernels that can both align and de-
noise frames, a synthetic data generation approach based
on a realistic noise formation model, and an optimization
guided by an annealed loss function to avoid undesirable
local minima. Our model matches or outperforms the state-
of-the-art across a wide range of noise levels on both real
and synthetic data.
1. Introduction
The task of image denoising is foundational to the study
of imaging and computer vision. Traditionally, the prob-
lem of single-image denoising has been addressed as one
of statistical inference using analytical priors [20, 22], but
recent work has built on the success of deep learning by us-
ing convolutional neural networks that learn mappings from
noisy images to noiseless images by training on millions of
examples [29]. These networks appear to learn the likely
appearance of “ground truth” noiseless images in addition
to the statistical properties of the noise present in the input
images.
Multiple-image denoising has also traditionally been ap-
proached through the lens of classical statistical inference,
under the assumption that averaging multiple noisy and in-
dependent samples of a signal will result in a more accu-
rate estimate of the true underlying signal. However, when
denoising image bursts taken with handheld cameras, sim-
ple temporal averaging yields poor results because of scene
and camera motion. Many techniques attempt to first align
the burst or include some notion of translation-invariance
within the denoising operator itself [8]. The idea of de-
noising by combining multiple aligned image patches is
also key to many of the most successful single-image tech-
niques [3, 4], which rely on the self-similarity of a single
image to allow some degree of denoising via averaging.
∗ Work done while interning at Google.
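The averaging argument above can be illustrated with a small numerical sketch. It is not taken from the paper: the checkerboard scene, the noise level `sigma`, the frame count, and the fixed per-frame shifts standing in for hand shake are all assumptions chosen for illustration. With perfectly aligned frames, the error of the temporal mean should fall roughly as 1/sqrt(N); with unaligned frames, the same mean blurs edges and the error grows.

```python
import numpy as np

def average_burst(burst):
    """Temporal mean of a burst with shape (N, H, W)."""
    return burst.mean(axis=0)

def rmse(a, b):
    """Root-mean-square error between two images of equal shape."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

rng = np.random.default_rng(0)

# Hypothetical scene: a 64x64 checkerboard with 8-pixel blocks (strong edges).
yy, xx = np.mgrid[0:64, 0:64]
clean = ((yy // 8 + xx // 8) % 2).astype(np.float64)

sigma = 0.1      # assumed noise standard deviation (additive Gaussian)
n_frames = 8     # assumed burst length

# Case 1: static scene and camera, independent noise in every frame.
static_burst = clean + rng.normal(0.0, sigma, size=(n_frames, 64, 64))

# Case 2: each frame is translated by a few pixels (a crude stand-in for
# hand shake), then corrupted by independent noise.
shifts = [(0, 0), (1, 2), (-2, 1), (3, -1), (0, -3), (-1, -2), (2, 3), (-3, 0)]
moving_burst = np.stack(
    [np.roll(clean, shift=s, axis=(0, 1)) for s in shifts]
) + rng.normal(0.0, sigma, size=(n_frames, 64, 64))

err_single = rmse(static_burst[0], clean)              # one noisy frame
err_static = rmse(average_burst(static_burst), clean)  # aligned average
err_moving = rmse(average_burst(moving_burst), clean)  # misaligned average

print("single frame:      ", err_single)
print("aligned average:   ", err_static)
print("misaligned average:", err_moving)
```

The gap between the last two numbers is what motivates aligning the burst first or building tolerance to misalignment into the denoising operator itself.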
We propose a method for burst denoising with the signal-
to-noise ratio benefits of multi-image denoising and the
large capacity and generality of convolutional neural net-
works. Our model is capable of matching or outperforming
the state-of-the-art at all noise levels on both synthetic and
real data. Our contributions include:
1. A procedure for converting post-processed images
taken from the internet into data with the character-
istics of raw linear data captured by real cameras. This
lets us train a model that generalizes to real images
and circumvents the difficulties in acquiring ground
truth data for our task from a camera.
2. A network architecture that outperforms the state-
of-the-art on synthetic and real data by predicting a
unique 3D denoising kernel to produce each pixel of
the output image. This provides both a performance
improvement over a network that synthesizes pixels di-
rectly, and a way to visually inspect how each burst
image is being used.
3. A training procedure for our kernel prediction network
that allows it to predict filter kernels that use infor-
mation from multiple images even in the presence of
small unknown misalignments.
4. A demonstration that a network that takes the noise
level of the input imagery as input during training and
testing generalizes to a much wider range of noise lev-
els than a blind denoising network.
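To make the second contribution concrete, the following is a minimal sketch of how a per-pixel 3D kernel could be applied to a burst. It is not the authors' implementation: the function name `apply_per_pixel_kernels`, the tensor layout `(H, W, N, K, K)`, the odd kernel width, and the edge padding are assumptions made for illustration, and the network that would predict the kernels is omitted (the kernels here are built by hand). Each output pixel is a weighted sum over a K×K window in every one of the N frames.

```python
import numpy as np

def apply_per_pixel_kernels(burst, kernels):
    """Filter a burst with a distinct 3D kernel at every output pixel.

    burst:   (N, H, W) stack of frames.
    kernels: (H, W, N, K, K) weights, K odd; kernels[y, x] is the 3D kernel
             used to produce output pixel (y, x).
    Returns an (H, W) image.
    """
    n, h, w = burst.shape
    k = kernels.shape[-1]
    r = k // 2
    # Assumed boundary handling: replicate edge pixels.
    padded = np.pad(burst, ((0, 0), (r, r), (r, r)), mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            # window[n, y, x] is the neighbor of (y, x) at offset (dy - r, dx - r).
            window = padded[:, dy:dy + h, dx:dx + w]
            # Multiply by the matching tap of every pixel's kernel, sum over frames.
            out += np.einsum("nhw,hwn->hw", window, kernels[:, :, :, dy, dx])
    return out

rng = np.random.default_rng(0)
base = rng.random((16, 16))
# Two-frame burst: frame 1 is frame 0 shifted one pixel to the right.
burst = np.stack([base, np.roll(base, 1, axis=1)])
n, h, w = burst.shape
k, r = 3, 1

# Kernel that copies the center pixel of frame 0 (identity on the reference).
identity = np.zeros((h, w, n, k, k))
identity[:, :, 0, r, r] = 1.0
out_identity = apply_per_pixel_kernels(burst, identity)

# Kernel that averages frame 0 with the pixel one step to the right in
# frame 1, which undoes the shift: alignment and averaging in one filter.
aligning = np.zeros((h, w, n, k, k))
aligning[:, :, 0, r, r] = 0.5
aligning[:, :, 1, r, r + 1] = 0.5
out_aligned = apply_per_pixel_kernels(burst, aligning)
```

Because the weights are explicit, the kernel slice for each frame can be displayed to inspect how much that frame contributes at a given pixel. In the hand-built `aligning` example, the off-center tap in frame 1 is what compensates for the one-pixel shift.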
2. Related work
Single-image denoising is a longstanding problem, origi-
nating with classical methods like anisotropic diffusion [20]
or total variation denoising [22], which used analytical pri-
ors and non-linear optimization to recover a signal from a
noisy image. These ideas were built upon to develop multi-
image or video denoising techniques such as VBM4D [17]
and non-local means [3, 14], which group similar patches
across time and jointly filter them under the assumption
that multiple noisy observations can be averaged to better
estimate the true underlying signal. Recently these ideas
arXiv:1712.02327v2 [cs.CV] 29 Mar 2018