2 Related work
Several recent papers focus on improving the stability of training and the resulting perceptual quality
of GAN samples [2, 3, 5, 6]. We build on some of these techniques in this work. For instance, we
use some of the “DCGAN” architectural innovations proposed in Radford et al. [3], as discussed
below.
One of our proposed techniques, feature matching, discussed in Sec. 3.1, is similar in spirit to
approaches that use maximum mean discrepancy [7, 8, 9] to train generator networks [10, 11].
Another of our proposed techniques, minibatch features, is based in part on ideas used for batch
normalization [12], while our proposed virtual batch normalization is a direct extension of batch
normalization.
One of the primary goals of this work is to improve the effectiveness of generative adversarial
networks for semi-supervised learning (improving the performance of a supervised task, in this case,
classification, by learning on additional unlabeled examples). Like many deep generative models,
GANs have previously been applied to semi-supervised learning [13, 14], and our work can be seen
as a continuation and refinement of this effort.
3 Toward Convergent GAN Training
Training GANs consists in finding a Nash equilibrium to a two-player non-cooperative game.
Each player wishes to minimize its own cost function: $J^{(D)}(\theta^{(D)}, \theta^{(G)})$ for the discriminator and
$J^{(G)}(\theta^{(D)}, \theta^{(G)})$ for the generator. A Nash equilibrium is a point $(\theta^{(D)}, \theta^{(G)})$ such that $J^{(D)}$ is at a
minimum with respect to $\theta^{(D)}$ and $J^{(G)}$ is at a minimum with respect to $\theta^{(G)}$. Unfortunately, finding
Nash equilibria is a very difficult problem. Algorithms exist for specialized cases, but we are not
aware of any that are feasible to apply to the GAN game, where the cost functions are non-convex,
the parameters are continuous, and the parameter space is extremely high-dimensional.
The idea that a Nash equilibrium occurs when each player has minimal cost seems to intuitively mo-
tivate the idea of using traditional gradient-based minimization techniques to minimize each player’s
cost simultaneously. Unfortunately, a modification to $\theta^{(D)}$ that reduces $J^{(D)}$ can increase $J^{(G)}$, and
a modification to $\theta^{(G)}$ that reduces $J^{(G)}$ can increase $J^{(D)}$. Gradient descent thus fails to converge
for many games. For example, when one player minimizes xy with respect to x and another player
minimizes −xy with respect to y, gradient descent enters a stable orbit, rather than converging to
x = y = 0, the desired equilibrium point [15]. Previous approaches to GAN training have thus
applied gradient descent on each player’s cost simultaneously, despite the lack of guarantee that this
procedure will converge. We introduce the following techniques that are heuristically motivated to
encourage convergence:
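The oscillating behavior of simultaneous gradient descent on the $xy$ game can be checked numerically. The following sketch (illustrative, not from the paper) runs simultaneous updates for both players; with a finite step size the iterates spiral away from the equilibrium at $(0, 0)$ rather than converging to it:

```python
# Simultaneous gradient descent on the two-player game where player 1
# minimizes x*y over x and player 2 minimizes -x*y over y. The joint
# gradient field rotates around the equilibrium (0, 0), so the iterates
# orbit instead of converging (and spiral outward for finite step sizes).
def simultaneous_gd(x, y, lr=0.1, steps=1000):
    for _ in range(steps):
        grad_x = y    # d/dx of x*y
        grad_y = -x   # d/dy of -x*y
        x, y = x - lr * grad_x, y - lr * grad_y
    return x, y

x0, y0 = 1.0, 1.0
x, y = simultaneous_gd(x0, y0)
# The squared distance from (0, 0) never decreases: each step
# multiplies it by (1 + lr**2).
print(x**2 + y**2 > x0**2 + y0**2)  # True
```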
3.1 Feature matching
Feature matching addresses the instability of GANs by specifying a new objective for the generator
that prevents it from overtraining on the current discriminator. Instead of directly maximizing the
output of the discriminator, the new objective requires the generator to generate data that matches
the statistics of the real data, where we use the discriminator only to specify the statistics that we
think are worth matching. Specifically, we train the generator to match the expected value of the
features on an intermediate layer of the discriminator. This is a natural choice of statistics for the
generator to match, since by training the discriminator we ask it to find those features that are most
discriminative of real data versus data generated by the current model.
Letting $f(x)$ denote activations on an intermediate layer of the discriminator, our new objective for
the generator is defined as $\|\mathbb{E}_{x \sim p_{\text{data}}} f(x) - \mathbb{E}_{z \sim p_z(z)} f(G(z))\|_2^2$. The discriminator, and hence
f (x), are trained in the usual way. As with regular GAN training, the objective has a fixed point
where G exactly matches the distribution of training data. We have no guarantee of reaching this
fixed point in practice, but our empirical results indicate that feature matching is indeed effective in
situations where regular GAN training becomes unstable.
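As a concrete illustration, the feature-matching loss can be sketched as follows. This is a minimal sketch, not the paper's code: `real_feats` and `fake_feats` stand in for the intermediate-layer activations $f(x)$ and $f(G(z))$, and the expectations are approximated by minibatch means, as is standard in practice:

```python
import numpy as np

# Feature-matching loss: squared L2 distance between the minibatch mean
# of discriminator features on real data, f(x), and on generated data,
# f(G(z)). Inputs have shape (batch_size, num_features).
def feature_matching_loss(real_feats, fake_feats):
    diff = real_feats.mean(axis=0) - fake_feats.mean(axis=0)
    return float(np.sum(diff ** 2))

real = np.array([[1.0, 2.0], [3.0, 4.0]])  # f(x) for two real examples
fake = np.array([[0.0, 2.0], [4.0, 4.0]])  # f(G(z)) with matching means
print(feature_matching_loss(real, fake))   # 0.0: the fixed point
```

Note that only the feature means need to match for the loss to reach zero, which is why the fixed point where $G$ matches the data distribution is not guaranteed to be reached, or to be unique, in practice.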