Biology 2021

Highly Accurate Protein Structure Prediction with AlphaFold

John Jumper, Demis Hassabis et al. (DeepMind)

An AI learned to predict how proteins fold, solving a 50-year grand challenge of biology.

An AI called AlphaFold learned to predict the folded shape of a protein from its raw sequence — cracking a puzzle that had stumped biology for half a century.

The idea, unpacked

Proteins are the tiny machines that run your body: they digest food, fight infection, carry oxygen. Each one starts as a long chain of chemical beads (amino acids) that crumples up into a specific, intricate 3-D shape, and that shape is everything: it decides what the protein can do. The catch is that working out the shape from the bead sequence was fiendishly hard. Even a modest chain could in principle twist into more arrangements than there are atoms in the universe, and pinning down the right one in the lab can take months or years per protein.
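To get a feel for the "more than atoms in the universe" claim, here is a back-of-envelope sketch. It is not from the paper. It assumes each bead can take just 3 local arrangements (an illustrative figure) and uses the commonly quoted rough estimate of 10^80 atoms in the observable universe. The function names are made up for this example.

```python
# Back-of-envelope count of chain arrangements (illustrative assumptions only).
STATES_PER_BEAD = 3          # assumed number of local arrangements per amino acid
ATOMS_IN_UNIVERSE = 10**80   # commonly quoted order-of-magnitude estimate


def arrangements(n_beads, states=STATES_PER_BEAD):
    """Number of arrangements if every bead independently picks one of `states`."""
    return states ** n_beads


def shortest_chain_exceeding(limit, states=STATES_PER_BEAD):
    """Smallest chain length whose arrangement count is greater than `limit`."""
    n = 0
    while arrangements(n, states) <= limit:
        n += 1
    return n


if __name__ == "__main__":
    n = shortest_chain_exceeding(ATOMS_IN_UNIVERSE)
    print(f"Under these assumptions, a chain of {n} beads already exceeds 10^80 arrangements.")
    print(f"A 300-bead chain has roughly 10^{len(str(arrangements(300))) - 1} arrangements.")
```

Under these assumptions the count overtakes the atom estimate at well under 200 beads, and many real proteins are longer than that. The exact numbers depend entirely on the assumed 3 states per bead; the point is only how fast the total grows.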

Where it came from

AlphaFold cracked it by learning from examples. Scientists had painstakingly mapped around 170,000 protein structures over decades; AlphaFold's neural network trained on them, picked up the statistical patterns that link sequence to shape, and could then predict a brand-new protein's shape in minutes, often about as accurately as a lab experiment. At CASP14, the 2020 round of a long-running blind prediction contest, it left every rival far behind, and the organizers described the problem as largely solved. DeepMind then ran it on nearly every known protein, over 200 million of them, and gave the entire database away for free. For this work, Demis Hassabis and John Jumper shared in the 2024 Nobel Prize in Chemistry.

Why it mattered

This solved one of biology's oldest grand challenges and handed every researcher on Earth a near-instant map of life's machinery. It's already speeding up the search for new medicines, the design of enzymes, and our understanding of diseases. It's also the clearest sign yet that AI can crack real scientific mysteries, not just play games.

How a shape becomes a fingerprint

Here's a way to picture the shape the AI predicts. Lay the protein's beads out in order and ask, for every pair, “do these two end up touching once it folds?” Write the answers in a grid: that's a contact map. A coil leaves a stripe running alongside the grid's diagonal; a chain that folds back on itself leaves a stripe crossing it; a straight chain leaves the grid blank. The pattern of touches is a fingerprint of the shape: get it right and you've basically got the fold.
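The three patterns described above can be reproduced with a small sketch. It is a toy illustration, not AlphaFold's code or its output format. The three chain builders, the 8 Å "touching" cutoff, and the rule of ignoring beads fewer than 3 steps apart are all assumptions chosen for this example. The helix uses textbook-style values (about 1.5 Å rise and 100° turn per bead).

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=3):
    """Grid of 0/1: 1 where beads i and j are within `cutoff` of each other.

    Pairs fewer than `min_separation` steps apart along the chain are skipped,
    since chain neighbours are always close and carry no information about the fold.
    """
    n = len(coords)
    grid = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                grid[i][j] = grid[j][i] = 1
    return grid


def straight_chain(n, step=3.8):
    """Beads laid out on a straight line (hypothetical toy shape)."""
    return [(step * i, 0.0, 0.0) for i in range(n)]


def helix_chain(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """Beads on an idealised coil (hypothetical toy shape)."""
    points = []
    for i in range(n):
        angle = math.radians(turn_deg * i)
        points.append((radius * math.cos(angle), radius * math.sin(angle), rise * i))
    return points


def hairpin_chain(n, step=3.8, gap=5.0):
    """Beads running out along a line, then doubling back beside it (toy shape)."""
    half = n // 2
    return [
        (step * i, 0.0, 0.0) if i < half else (step * (n - 1 - i), gap, 0.0)
        for i in range(n)
    ]


def render(grid):
    """Draw the grid as text: '#' for a contact, '.' for none."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in grid)


if __name__ == "__main__":
    for name, build in [("straight", straight_chain),
                        ("coil", helix_chain),
                        ("hairpin", hairpin_chain)]:
        print(name)
        print(render(contact_map(build(12))))
        print()
```

With these toy shapes, the coil's contacts all sit 3 or 4 steps off the diagonal, the hairpin's contacts line up across the diagonal, and the straight chain's grid stays empty. Real contact maps are usually computed from specific atoms in each amino acid, and the cutoff varies between studies.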

What came next

AlphaFold was only the beginning. Newer versions predict not just single proteins but how proteins clasp onto each other, onto DNA and RNA, and onto the small molecules that become drugs. The same ideas now drive “protein design” — building brand-new proteins that never existed in nature, to act as medicines, vaccines, or tiny factories. A tool that started by reading the shapes life already invented is helping invent new ones.

The original document

Original source text

J. Jumper, R. Evans, A. Pritzel … D. Hassabis · Nature 596 (2021): 583–589

The problem

Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort, the structures of around 100,000 unique proteins have been determined, but this represents a small fraction of the billions of known protein sequences.

The challenge of predicting the three-dimensional structure of a protein based solely on its amino acid sequence — the structure prediction component of the “protein folding problem” — has been an important open research problem for more than 50 years.

The result

Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known.

We validated an entirely redesigned version of our neural network–based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14), demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods.

The full paper — with the Evoformer and structure-module architecture, the CASP14 accuracy distributions, the analysis of the per-residue pLDDT confidence measure, and the ablation studies — runs to many pages with an extensive supplement, and is available at the source below.

DeepMind, London · 2021