Artificial Intelligence · 2014

Generative Adversarial Nets

Ian Goodfellow et al. (Université de Montréal)

Train two networks against each other — a forger and a detective — until the fakes pass for real.


What if you could teach a computer to paint not by showing it the right answer, but by setting two of them to compete — one to forge, one to spot forgeries?

The idea, unpacked

Most machine learning is taught by correction: here is the right answer, get closer to it. But how do you correct a painting? There is no single right pixel. This paper sidestepped the problem with a clever trick — instead of grading the output directly, it had two networks grade each other.

One network, the generator, tries to make fake examples — say, images of faces. A second network, the discriminator, is shown a mix of real faces and the generator's fakes, and has to guess which is which. The two are locked in a contest: the discriminator gets better at catching fakes, which forces the generator to make better fakes, which forces the discriminator to look harder, and on it goes. Neither is ever told what a good face looks like; they teach each other by trying to win.

Where it came from

The paper has a well-worn origin story. In 2014 Ian Goodfellow, then a PhD student in Yoshua Bengio's lab at the University of Montreal, was arguing with friends in a Montreal pub about how to make computers generate realistic images. He sketched the adversarial idea that night and coded it when he got home, and, as the legend goes, it worked on the first try. The eight authors posted the paper as a preprint that June, and it was presented at the NIPS conference in Montreal that December under a modest title: "Generative Adversarial Nets."

The first results were small grey digits and blurry faces. But other researchers saw what the framework could become, and within a few years GANs were producing photographs of people who do not exist.

Why it mattered

Before this, getting a machine to create convincing new images was clumsy: the leading generative models depended on slow Markov-chain sampling or approximate inference, and their results were smudgy. The adversarial setup produced sharper, more believable samples, and it did so using only the standard training machinery of backpropagation, with no Markov chains or approximate inference required. That combination helped turn "generative AI" into a serious, fast-moving field. The deepfakes, the AI portraits, the photo-editing tools that invent plausible detail: the modern conversation about machines that create largely begins here.

An everyday analogy

Picture an art forger and a museum detective who train together for years. Every time the detective learns to spot a tell — the wrong varnish, a too-modern pigment — the forger studies the catch and fixes it. Each makes the other sharper. After enough rounds, the forger's paintings are so good the detective can do no better than flip a coin. At that exact point, the forgeries are, by any test the detective can devise, indistinguishable from the real thing. That coin-flip moment is the mathematical goal of the whole system — try reaching it yourself below.

An interactive plot: a fixed blue bell curve is the real data distribution; an orange bell curve is the generator, with sliders for its centre and spread; a dashed green curve is the optimal discriminator's output between 0 and 1. As you slide the orange curve to overlap the blue one, the green curve flattens onto the D = 1/2 line and a readout shows the game value approaching minus log four. A train button steps the generator toward the data automatically.
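The curves in that plot can be reproduced numerically. The sketch below assumes one-dimensional Gaussian distributions for both the data and the generator, as in the plot; the function names (`optimal_discriminator`, `game_value`) and the integration range are our own illustrative choices, not the paper's. It uses the optimal-discriminator formula from the paper's theoretical section, D*(x) = p_data(x) / (p_data(x) + p_g(x)), and integrates the value of the game with a simple midpoint sum.

```python
import math

def log_gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Log-density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))

def log_add_exp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def optimal_discriminator(x: float, data_mean: float, data_std: float,
                          gen_mean: float, gen_std: float) -> float:
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)), the best detective against a fixed generator."""
    log_ratio = log_gaussian_pdf(x, gen_mean, gen_std) - log_gaussian_pdf(x, data_mean, data_std)
    # Clamp so exp() cannot overflow far out in the tails.
    return 1.0 / (1.0 + math.exp(min(log_ratio, 700.0)))

def game_value(data_mean: float, data_std: float, gen_mean: float, gen_std: float,
               lo: float = -12.0, hi: float = 12.0, steps: int = 20_000) -> float:
    """C(G) = E_data[log D*(x)] + E_g[log(1 - D*(x))], via a midpoint Riemann sum on [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        lp_data = log_gaussian_pdf(x, data_mean, data_std)
        lp_gen = log_gaussian_pdf(x, gen_mean, gen_std)
        lp_sum = log_add_exp(lp_data, lp_gen)
        # log D*(x) and log(1 - D*(x)), computed in log space so neither can hit log(0).
        log_d = lp_data - lp_sum
        log_not_d = lp_gen - lp_sum
        total += (math.exp(lp_data) * log_d + math.exp(lp_gen) * log_not_d) * dx
    return total
```

With matching curves, `optimal_discriminator(0.7, 0.0, 1.0, 0.0, 1.0)` returns 0.5 and `game_value(0.0, 1.0, 0.0, 1.0)` comes out at about −1.3863, which is −log 4; moving the generator's centre away from the data's raises the value.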

Where it sits

GANs were the engine of image generation for about five years, and they sit in the same lineage as the other AI documents in this Library: deep networks trained by backpropagation, the same machinery that powers the Transformer. For image and video they have since been largely overtaken by diffusion models, which proved more stable to train and better at capturing the full variety of the training data. But the core move, learning to create by being judged, outlived the specific recipe and reshaped how the field thinks about generation.

The original document
Original source text
I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio · NIPS 27 (2014), pp. 2672–2680
Abstract
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
The abstract goes on to state that the training procedure for G is to maximize the probability of D making a mistake, that this corresponds to a minimax two-player game, and that, when G and D are multilayer perceptrons, the whole system can be trained by backpropagation, with no Markov chains or unrolled approximate inference networks needed during either training or sample generation.
Introduction — the counterfeiters and the police
The generative model can be thought of as analogous to a team of counterfeiters, trying to produce fake currency and use it without detection, while the discriminative model is analogous to the police, trying to detect the counterfeit currency. Competition in this game drives both teams to improve their methods until the counterfeits are indistinguishable from the genuine articles.
Adversarial nets — the value function
The two networks play a minimax game over a single value function: D is trained to assign the correct label to both real and generated samples, while G is trained to fool it.
In other words, D and G play the following two-player minimax game with value function V(G, D): min_G max_D V(D, G) = E_{x∼p_data}[log D(x)] + E_{z∼p_z}[log(1 − D(G(z)))].
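Both expectations in V(D, G) can be estimated by averaging over samples. The sketch below is a minimal Monte Carlo estimator under toy assumptions of our own: scalar data drawn from N(0, 1), noise from N(0, 1), and a hypothetical generator that simply shifts the noise by 3. The names `estimate_value`, `shift_generator`, `coin_flip` and `sharp_detective` are illustrative, not from the paper.

```python
import math
import random
from typing import Callable

def estimate_value(
    discriminator: Callable[[float], float],
    generator: Callable[[float], float],
    sample_data: Callable[[random.Random], float],
    sample_noise: Callable[[random.Random], float],
    n: int = 10_000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].

    The discriminator must return probabilities strictly between 0 and 1,
    otherwise one of the logarithms is undefined.
    """
    rng = random.Random(seed)
    # First expectation: how confidently D labels real samples as real.
    real_term = sum(math.log(discriminator(sample_data(rng))) for _ in range(n)) / n
    # Second expectation: how confidently D labels generated samples as fake.
    fake_term = sum(math.log(1.0 - discriminator(generator(sample_noise(rng)))) for _ in range(n)) / n
    return real_term + fake_term

def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))

# Hypothetical toy setup: real data ~ N(0, 1), noise ~ N(0, 1).
def data(rng: random.Random) -> float:
    return rng.gauss(0.0, 1.0)

def noise(rng: random.Random) -> float:
    return rng.gauss(0.0, 1.0)

def shift_generator(z: float) -> float:
    return z + 3.0  # generated samples are therefore ~ N(3, 1)

def coin_flip(x: float) -> float:
    return 0.5  # a detective that always guesses

def sharp_detective(x: float) -> float:
    # log(p_data / p_g) = 4.5 - 3x for N(0, 1) vs N(3, 1), so this is the optimal D here.
    return sigmoid(4.5 - 3.0 * x)
```

With the coin-flip detective the estimate is −log 4 (up to floating-point rounding) whatever the generator does; the sharper detective scores higher. Pushing this number up is D's job, and pushing it back down is G's.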
[ … ]
Theoretical results
Theorem 1. The global minimum of the virtual training criterion C(G) is achieved if and only if p_g = p_data. At that point, C(G) achieves the value −log 4.
At the optimum the generator's distribution exactly matches the data, and the best the discriminator can do is output 1/2 everywhere — it can no longer tell real from fake.
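The result follows from a short argument in the paper's theoretical section, sketched here. For a fixed generator, rewriting the expectation over z as an expectation over the generator's distribution p_g turns the value function into a single integral, which can be maximised pointwise in D:

```latex
V(G, D) = \int_x \Big[ p_{\text{data}}(x) \log D(x) + p_g(x) \log\big(1 - D(x)\big) \Big] \, dx

% For a, b > 0, the function y \mapsto a \log y + b \log(1 - y) peaks at y = a / (a + b), so
D^*_G(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}

% Substituting D^*_G back in, with m = (p_{\text{data}} + p_g) / 2:
C(G) = \max_D V(G, D)
     = \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}\right]
     + \mathbb{E}_{x \sim p_g}\!\left[\log \frac{p_g(x)}{p_{\text{data}}(x) + p_g(x)}\right]
     = -\log 4 + \mathrm{KL}(p_{\text{data}} \,\|\, m) + \mathrm{KL}(p_g \,\|\, m)
     = -\log 4 + 2\,\mathrm{JSD}(p_{\text{data}} \,\|\, p_g)
```

Because the Jensen–Shannon divergence is non-negative and zero only when the two distributions coincide, C(G) bottoms out at −log 4 exactly when p_g = p_data, and there D*_G(x) = 1/2.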
Experiments
The authors train adversarial nets on MNIST, the Toronto Face Database (TFD) and CIFAR-10. For MNIST and TFD they report log-likelihood estimates obtained by fitting a Gaussian Parzen window to samples from G and scoring held-out test data under it. Figures show generated samples beside their nearest training examples, to demonstrate that the model has not simply memorised the data.
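The Parzen-window estimate works, as far as we understand the paper's protocol, like this: place a Gaussian kernel of width σ on every generated sample, average them into a density, report the mean log-likelihood of test data under that density, and choose σ by cross-validation on a validation set. The sketch below is one-dimensional for readability; the function names and the simple grid search over σ are our simplifications, not the paper's code.

```python
import math
from typing import Sequence

def parzen_log_likelihood(test_points: Sequence[float], samples: Sequence[float], sigma: float) -> float:
    """Mean log-likelihood of test_points under a Gaussian Parzen window centred on the samples.

    One-dimensional for readability; the paper applies the same idea to whole images.
    """
    n = len(samples)
    # Each kernel is N(s, sigma^2) with weight 1/n.
    log_norm = -math.log(n) - math.log(sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for x in test_points:
        exponents = [-0.5 * ((x - s) / sigma) ** 2 for s in samples]
        m = max(exponents)  # log-sum-exp trick keeps distant points from underflowing to log(0)
        total += m + math.log(sum(math.exp(e - m) for e in exponents)) + log_norm
    return total / len(test_points)

def choose_sigma(validation: Sequence[float], samples: Sequence[float], grid: Sequence[float]) -> float:
    """Pick the bandwidth from a candidate grid that maximises validation log-likelihood."""
    return max(grid, key=lambda sigma: parzen_log_likelihood(validation, samples, sigma))
```

For example, a single generated sample at 0 with σ = 1 gives a test point at 0 a log-likelihood of −½ log 2π ≈ −0.919. In practice one would call `choose_sigma` on validation data first and then report `parzen_log_likelihood` on the test set with the chosen width.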
Advantages and disadvantages
The disadvantages are primarily that there is no explicit representation of p_g(x), and that D must be synchronized well with G during training (in particular, G must not be trained too much without updating D …).
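The synchronisation caveat is reflected in how the paper schedules training: its Algorithm 1 takes k discriminator updates for every generator update, and the authors used k = 1 in their experiments. A minimal sketch of that schedule follows, assuming hypothetical callbacks `d_step` and `g_step` that each perform one minibatch gradient update; the networks, losses and optimiser are deliberately left out.

```python
from typing import Callable

def train_adversarially(
    d_step: Callable[[], None],
    g_step: Callable[[], None],
    iterations: int,
    k: int = 1,
) -> None:
    """Alternate k discriminator updates with one generator update per iteration.

    d_step and g_step are hypothetical callbacks, each performing one minibatch update.
    """
    for _ in range(iterations):
        for _ in range(k):
            d_step()  # sharpen the detective against the current forger
        g_step()      # then let the forger respond to the updated detective
```

Recording the calls shows the interleaving: with `iterations=2` and `k=3`, the order is D, D, D, G, D, D, D, G. Keeping G's updates interleaved with D's this way is what stops the generator from running too far ahead of its judge.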
Université de Montréal · June 2014