
Figure 2. Examples of 16× super-resolution. (a) LR input. (b) ESRGAN [45], which trains a simple end-to-end GAN and loses inherent image information. (c) GLEAN [4], which achieves more realistic details through additional StyleGAN [16] priors but still produces unnatural textures and GAN-specific artifacts. (d) IDM, whose implicit continuous representation with a scale-adaptive conditioning mechanism generates output with high-fidelity details and retains the identity of the ground truth. (e) The ground truth.
Therefore, they resort to a complicated cascaded structure [13] or two-stage training strategies [10, 33, 34] to achieve multiple combined magnifications, or retrain the model for each specific resolution [35], which incurs extra training cost.
To address these issues, this paper presents a novel Implicit
Diffusion Model (IDM) for high-fidelity image SR across a
continuous range of resolutions. We exploit the strength of diffusion models in synthesizing fine image details to improve the fidelity of SR results, and introduce an implicit image function to overcome the fixed-resolution limitation. In particular, we formulate continuous image super-resolution as a denoising diffusion process and leverage the appealing property of implicit neural representations: encoding an image as a function over a continuous space. When incorporated into the diffusion model, this function is parameterized by a coordinate-based Multi-Layer Perceptron (MLP) to better capture resolution-continuous image representations.
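The idea of a coordinate-based MLP can be sketched as follows. This is a minimal, hypothetical illustration (the layer sizes, names, and the choice of ReLU/tanh are assumptions, not the paper's architecture): the network maps a continuous 2D pixel coordinate plus a conditioning feature vector to an RGB value, so the same representation can be queried on a grid of any resolution.

```python
import numpy as np

# Hypothetical sizes for illustration only.
FEAT_DIM, HIDDEN = 16, 32
rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.1, (2 + FEAT_DIM, HIDDEN))
W2 = rng.normal(0, 0.1, (HIDDEN, 3))

def implicit_mlp(coord, feat):
    """coord: (N, 2) pixel coordinates in [-1, 1]; feat: (N, FEAT_DIM) features."""
    x = np.concatenate([coord, feat], axis=-1)
    h = np.maximum(x @ W1, 0.0)   # ReLU hidden layer
    return np.tanh(h @ W2)        # RGB values in (-1, 1)

def grid(n):
    """Build an n x n coordinate grid in [-1, 1]^2, flattened to (n*n, 2)."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, n), np.linspace(-1, 1, n))
    return np.stack([xs.ravel(), ys.ravel()], axis=-1)

# Query the representation at a 32 x 32 resolution; any other grid size
# could be used with the same MLP, which is the point of the formulation.
feat32 = rng.normal(size=(32 * 32, FEAT_DIM))
rgb32 = implicit_mlp(grid(32), feat32)
print(rgb32.shape)  # (1024, 3)
```

Because the coordinates are continuous, changing `grid(32)` to `grid(100)` renders the same underlying function at a different resolution without retraining.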
At a high level, IDM iteratively leverages the denoising dif-
fusion model and the implicit image function, which is im-
plemented in the upsampling layers of the U-Net architecture.
Fig. 1(d) illustrates that IDM achieves continuously modu-
lated results within a wide range of resolutions. Accordingly,
we develop a scale-adaptive conditioning mechanism consist-
ing of an LR conditioning network and a scaling factor. The
LR conditioning network can encode LR images without pri-
ors and provide multi-resolution features for the iterative de-
noising steps. The scaling factor controls the output resolution continuously: through an adaptive MLP, it adjusts how strongly the encoded LR features and the generated features are expressed. It is worth noting that, unlike previous methods that rely on two-stage synthesis pipelines [9, 13, 33] or additional priors [4, 26, 44], IDM enjoys an elegant end-to-end training framework. As shown in Fig. 2, IDM outperforms previous works in synthesizing photographic image details.
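The gating behavior of the scale-adaptive conditioning can be sketched as below. All names and layer sizes here are hypothetical: a small "adaptive MLP" maps the scaling factor to per-channel gates in (0, 1), which then blend the encoded LR features with the generated diffusion features as a convex combination.

```python
import numpy as np

# Hypothetical channel count and MLP sizes for illustration only.
C = 8
rng = np.random.default_rng(1)
W1 = rng.normal(0, 0.5, (1, 16))
W2 = rng.normal(0, 0.5, (16, C))

def gates(s):
    """Map a scalar scaling factor s to per-channel gates in (0, 1)."""
    h = np.maximum(np.array([[s]]) @ W1, 0.0)   # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2)))      # sigmoid gate

def blend(lr_feat, gen_feat, s):
    """Convex combination of LR-encoded and generated features, gated by s."""
    w = gates(s)
    return w * lr_feat + (1.0 - w) * gen_feat

lr = rng.normal(size=(1, C))
gen = rng.normal(size=(1, C))
out_small = blend(lr, gen, s=2.0)    # different magnifications produce
out_large = blend(lr, gen, s=16.0)   # different feature mixtures
print(out_small.shape, out_large.shape)
```

Because the gate lies in (0, 1), each output channel stays between the corresponding LR and generated feature values, so the factor only redistributes how much each source is expressed.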
The main contributions of this paper are summarized as fol-
lows:
• We develop an Implicit Diffusion Model (IDM) for
continuous image super-resolution to reconstruct photo-
realistic images in an end-to-end manner. Iterative im-
plicit denoising diffusion is performed to learn resolution-
continuous representations that enhance the high-fidelity
details of SR images.
• We design a scale-adaptive conditioning mechanism that dynamically adjusts the ratio between the realistic information from LR features and the generated fine details in the diffusion process. An adaptive MLP realizes this adjustment when SR outputs of varying sizes are required.
• We conduct extensive experiments on key benchmarks for natural and facial image SR. IDM achieves state-of-the-art qualitative and quantitative results compared with previous works and yields high-fidelity, resolution-continuous outputs.
2. Related Work
Implicit Neural Representation. In recent years, implicit
neural representations have shown extraordinary capability in
modeling 3D object shapes, synthesizing 3D surfaces of the
scene, and capturing complicated 3D structures [3, 27–29, 36–
38]. Particularly, methods based on Neural Radiance Fields
(NeRF) [2, 28] utilize Multi-Layer Perceptrons (MLPs) to ren-
der 3D-consistent images with refined texture details. Owing to their outstanding performance in 3D tasks, implicit neural representations have been extended to 2D images. Instead of parameterizing 2D shapes with ReLU MLPs as in early works [31, 40], SIREN [37] employs periodic activation functions to model high-quality image representations with fast convergence. LIIF [6] significantly improves the representation of natural and complex images with local latent codes, which can restore images at arbitrary resolutions. However, the high-resolution results generated by LIIF are constrained by the prior LR information, resulting in over-smoothed outputs that lose high-frequency details.
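The periodic activation in SIREN [37] amounts to a linear layer followed by a scaled sine, sin(ω₀(Wx + b)). The sketch below is illustrative only: ω₀ = 30 follows the SIREN paper, but the weight initialization here is simplified rather than SIREN's principled scheme.

```python
import numpy as np

OMEGA_0 = 30.0  # frequency scale used by SIREN

def siren_layer(x, W, b):
    """A SIREN-style layer: sine activation applied to a scaled affine map."""
    return np.sin(OMEGA_0 * (x @ W + b))

rng = np.random.default_rng(2)
coords = np.linspace(-1, 1, 5).reshape(-1, 1)  # (5, 1) 1-D coordinates
W = rng.uniform(-1, 1, (1, 4))                 # simplified init, not SIREN's
b = rng.uniform(-1, 1, (4,))

out = siren_layer(coords, W, b)
print(out.shape)  # (5, 4), all values bounded in [-1, 1] by the sine
```

The bounded, infinitely differentiable sine activation is what lets such layers represent fine high-frequency image content that ReLU MLPs tend to smooth away.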