Generative Adversarial Nets

Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
Département d'informatique et de recherche opérationnelle
Université de Montréal
Montréal, QC H3C 3J7
Abstract
We propose a new framework for estimating generative models via an adversar-
ial process, in which we simultaneously train two models: a generative model G
that captures the data distribution, and a discriminative model D that estimates
the probability that a sample came from the training data rather than G. The train-
ing procedure for G is to maximize the probability of D making a mistake. This
framework corresponds to a minimax two-player game. In the space of arbitrary
functions G and D, a unique solution exists, with G recovering the training data
distribution and D equal to 1/2 everywhere. In the case where G and D are defined
by multilayer perceptrons, the entire system can be trained with backpropagation.
There is no need for any Markov chains or unrolled approximate inference net-
works during either training or generation of samples. Experiments demonstrate
the potential of the framework through qualitative and quantitative evaluation of
the generated samples.
1 Introduction
The promise of deep learning is to discover rich, hierarchical models [2] that represent probability
distributions over the kinds of data encountered in artificial intelligence applications, such as natural
images, audio waveforms containing speech, and symbols in natural language corpora. So far, the
most striking successes in deep learning have involved discriminative models, usually those that
map a high-dimensional, rich sensory input to a class label [14, 22]. These striking successes have
primarily been based on the backpropagation and dropout algorithms, using piecewise linear units
[19, 9, 10], which have a particularly well-behaved gradient. Deep generative models have had less of an impact, due to the difficulty of approximating many intractable probabilistic computations that arise in maximum likelihood estimation and related strategies, and due to the difficulty of leveraging the benefits of piecewise linear units in the generative context. We propose a new generative model estimation procedure that sidesteps these difficulties.¹
In the proposed adversarial nets framework, the generative model is pitted against an adversary: a
discriminative model that learns to determine whether a sample is from the model distribution or the
data distribution. The generative model can be thought of as analogous to a team of counterfeiters,
trying to produce fake currency and use it without detection, while the discriminative model is
analogous to the police, trying to detect the counterfeit currency. Competition in this game drives
both teams to improve their methods until the counterfeits are indistinguishable from the genuine
articles.
Jean Pouget-Abadie is visiting Université de Montréal from Ecole Polytechnique.
Sherjil Ozair is visiting Université de Montréal from Indian Institute of Technology Delhi.
Yoshua Bengio is a CIFAR Senior Fellow.
¹ All code and hyperparameters available at http://www.github.com/goodfeli/adversarial
This framework can yield specific training algorithms for many kinds of model and optimization
algorithm. In this article, we explore the special case when the generative model generates samples
by passing random noise through a multilayer perceptron, and the discriminative model is also a
multilayer perceptron. We refer to this special case as adversarial nets. In this case, we can train
both models using only the highly successful backpropagation and dropout algorithms [17] and
sample from the generative model using only forward propagation. No approximate inference or
Markov chains are necessary.
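As a concrete illustration of this special case, the sketch below builds G and D as small multilayer perceptrons and draws samples from G with a single forward pass. It assumes PyTorch, a 100-dimensional Gaussian noise prior, and arbitrary layer sizes; it is not the authors' released implementation (see footnote 1), only a sketch of the structure described here.

```python
# Minimal sketch of the adversarial nets special case: G and D are both MLPs,
# and sampling from G needs only forward propagation.
# Assumptions (not from the paper): PyTorch, 100-d noise, 256 hidden units,
# 784-dimensional data (e.g. flattened 28x28 images).
import torch
import torch.nn as nn

NOISE_DIM, DATA_DIM = 100, 784

# Generator G: maps a noise vector z to a point in data space.
G = nn.Sequential(
    nn.Linear(NOISE_DIM, 256), nn.ReLU(),
    nn.Linear(256, DATA_DIM), nn.Sigmoid(),   # outputs lie in [0, 1]
)

# Discriminator D: maps a data-space point to the probability that it came
# from the training data rather than from G; dropout as mentioned in the text above.
D = nn.Sequential(
    nn.Linear(DATA_DIM, 256), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(256, 1), nn.Sigmoid(),
)

# Sampling from the generative model is a single forward pass:
z = torch.randn(16, NOISE_DIM)   # z ~ p_z(z), here a standard normal
fake_samples = G(z)              # 16 generated samples; no Markov chain, no approximate inference
```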
2 Related work
An alternative to directed graphical models with latent variables is undirected graphical models
with latent variables, such as restricted Boltzmann machines (RBMs) [27, 16], deep Boltzmann
machines (DBMs) [26] and their numerous variants. The interactions within such models are
represented as the product of unnormalized potential functions, normalized by a global summa-
tion/integration over all states of the random variables. This quantity (the partition function) and
its gradient are intractable for all but the most trivial instances, although they can be estimated by
Markov chain Monte Carlo (MCMC) methods. Mixing poses a significant problem for learning
algorithms that rely on MCMC [3, 5].
Deep belief networks (DBNs) [16] are hybrid models containing a single undirected layer and sev-
eral directed layers. While a fast approximate layer-wise training criterion exists, DBNs incur the
computational difficulties associated with both undirected and directed models.
Alternative criteria that do not approximate or bound the log-likelihood have also been proposed,
such as score matching [18] and noise-contrastive estimation (NCE) [13]. Both of these require the
learned probability density to be analytically specified up to a normalization constant. Note that
in many interesting generative models with several layers of latent variables (such as DBNs and
DBMs), it is not even possible to derive a tractable unnormalized probability density. Some models
such as denoising auto-encoders [30] and contractive autoencoders have learning rules very similar
to score matching applied to RBMs. In NCE, as in this work, a discriminative training criterion is
employed to fit a generative model. However, rather than fitting a separate discriminative model, the
generative model itself is used to discriminate generated data from samples of a fixed noise distribution.
Because NCE uses a fixed noise distribution, learning slows dramatically after the model has learned
even an approximately correct distribution over a small subset of the observed variables.
Finally, some techniques do not involve defining a probability distribution explicitly, but rather train
a generative machine to draw samples from the desired distribution. This approach has the advantage
that such machines can be designed to be trained by back-propagation. Prominent recent work in this
area includes the generative stochastic network (GSN) framework [5], which extends generalized
denoising auto-encoders [4]: both can be seen as defining a parameterized Markov chain, i.e., one
learns the parameters of a machine that performs one step of a generative Markov chain. Compared
to GSNs, the adversarial nets framework does not require a Markov chain for sampling. Because
adversarial nets do not require feedback loops during generation, they are better able to leverage
piecewise linear units [19, 9, 10], which improve the performance of backpropagation but have
problems with unbounded activation when used in a feedback loop. More recent examples of training
a generative machine by back-propagating into it include recent work on auto-encoding variational
Bayes [20] and stochastic backpropagation [24].
3 Adversarial nets
The adversarial modeling framework is most straightforward to apply when the models are both multilayer perceptrons. To learn the generator's distribution p_g over data x, we define a prior on input noise variables p_z(z), then represent a mapping to data space as G(z; θ_g), where G is a differentiable function represented by a multilayer perceptron with parameters θ_g. We also define a second multilayer perceptron D(x; θ_d) that outputs a single scalar. D(x) represents the probability that x came from the data rather than p_g. We train D to maximize the probability of assigning the correct label to both training examples and samples from G. We simultaneously train G to minimize log(1 − D(G(z))):
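In other words, D and G play the following two-player minimax game with value function V(D, G):

\[
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big].
\]

A minimal sketch of one alternating gradient step on this objective follows, again assuming PyTorch; the network sizes, batch handling, and plain SGD optimizer are illustrative assumptions, not the hyperparameters used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative networks (assumed sizes, matching the earlier sketch).
NOISE_DIM, DATA_DIM = 100, 784
G = nn.Sequential(nn.Linear(NOISE_DIM, 256), nn.ReLU(),
                  nn.Linear(256, DATA_DIM), nn.Sigmoid())
D = nn.Sequential(nn.Linear(DATA_DIM, 256), nn.ReLU(),
                  nn.Linear(256, 1), nn.Sigmoid())
opt_D = torch.optim.SGD(D.parameters(), lr=0.01)
opt_G = torch.optim.SGD(G.parameters(), lr=0.01)

def train_step(real_batch):
    """One alternating update: ascend V(D, G) in θ_d, then descend it in θ_g."""
    m = real_batch.size(0)
    ones, zeros = torch.ones(m, 1), torch.zeros(m, 1)

    # Discriminator step: maximize E[log D(x)] + E[log(1 - D(G(z)))].
    # binary_cross_entropy(D(x), 1) = -log D(x), so minimizing this loss
    # ascends the value function with respect to θ_d.
    fake = G(torch.randn(m, NOISE_DIM)).detach()   # do not backprop into G here
    loss_D = (F.binary_cross_entropy(D(real_batch), ones)
              + F.binary_cross_entropy(D(fake), zeros))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Generator step: minimize E[log(1 - D(G(z)))], exactly as stated above.
    # -binary_cross_entropy(D(G(z)), 0) = log(1 - D(G(z))).
    loss_G = -F.binary_cross_entropy(D(G(torch.randn(m, NOISE_DIM))), zeros)
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
```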