Mastering the Game of Go with Deep Neural Networks and Tree Search
Teach a machine the intuition of Go — then let it search. AlphaGo beats a champion.
A machine learned the feel of Go from human masters, then learned to look ahead on its own — and beat a champion at the game thought hardest for computers to crack.
The idea, unpacked
Go is a game of placing black and white stones on a 19×19 grid. It looks simple, but the number of possible games is astronomical — more than the atoms in the observable universe — so a computer cannot win by simply trying every move. For decades that left Go far beyond machines, even after computers had mastered chess.
AlphaGo's answer was to give the machine two kinds of judgement, the way a strong human player has them. One part, the policy network, looks at a board and suggests the few moves worth considering — that is intuition about where to play. The other part, the value network, looks at a board and guesses who is winning — that is judgement about how good a position is. With those two senses, the program no longer has to examine everything; it can search like a person, only much faster.
Where it came from
The work came out of Google DeepMind in London, led by David Silver and Aja Huang under Demis Hassabis, and was published in the journal Nature in 2016 with twenty authors. They first showed AlphaGo by letting it crush every other Go program. Then, in October 2015, they brought in Fan Hui — the reigning European champion and a professional player — for five formal games behind closed doors. AlphaGo won all five. It was the first time a computer had beaten a professional at full-board Go without any handicap, something experts had said was still a decade away.
Why it mattered
Go had been a symbol of the gap between human intuition and machine calculation. Beating a professional showed that a single recipe — learn the intuition with deep networks, then refine it with deliberate search — could close that gap on a problem no one had cracked by brute force. The same recipe soon reached well beyond games, and the win arrived years ahead of what the field had expected, recalibrating how fast such learning systems might progress.
A way to picture it
Imagine planning a chess-like move with a wise coach at your shoulder. Out of hundreds of legal moves, the coach quietly points at the three or four worth thinking about — that is the policy network, narrowing your options. Then, instead of playing each one all the way to the end of the game, the coach glances at the resulting board and says "this looks good for you, this looks bad" — that is the value network, judging a position at a glance. You spend your limited thinking time only where it counts. The widget below lets you turn up that thinking time and watch the search settle on the best move.
Where it sits
AlphaGo built on two streams already in the Library. Its networks are deep convolutional networks of the kind that AlexNet (2012) launched into the mainstream; the search idea descends from Monte Carlo tree search developed in the 2000s. The year after this paper, DeepMind let the program throw away the human games and learn entirely from self-play (AlphaGo Zero), and then generalized the method to chess and shogi (AlphaZero). The pattern — learn good guesses, then search to sharpen them — now turns up in protein folding and even in how today's reasoning models think through a problem step by step.
The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves.
Here we introduce a new approach to computer Go that uses 'value networks' to evaluate board positions and 'policy networks' to select moves.
We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks.
Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0.
This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.