A Short History of AI

A summer that started it all

In the summer of 1956, a small group of researchers gathered at Dartmouth College for a workshop on a bold idea: that "every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it." That meeting gave the field its name — artificial intelligence — and its founding optimism. You already met AI as an idea in the previous guide; here we watch that idea actually grow up, and learn why it grew in fits and starts.

The mood was electric. Leading researchers predicted that a machine matching the full human mind was perhaps a generation away. They were wrong about the timeline by more than half a century, a pattern of over-promising that, as you will see, repeats across AI's whole history. But they were right that something real had begun. The story from here is not a straight climb. It is a series of waves: a thrilling new idea, soaring expectations, a hard collision with reality, then a quiet stretch of patient work before the next wave.

The age of rules: symbolic AI and expert systems

The first big bet was that intelligence is, at heart, manipulating symbols by logical rules — the same kind of reasoning you do when you follow a chain of "if this, then that." This approach is called symbolic AI, sometimes nicknamed "good old-fashioned AI." The plan was elegant: write down human knowledge as explicit facts and rules, and let the machine reason over them. Early programs proved mathematical theorems and played checkers, and it genuinely felt like the door to thought had cracked open.

By the 1970s and 1980s this matured into the expert system: a program that captured a specialist's know-how as hundreds or thousands of hand-written rules. A medical expert system could suggest diagnoses; a configuration system could work out the parts needed for a computer order. For a while these systems made real money, and the boom was genuine. The underlying idea was that knowledge lives in rules someone writes down, and that if you write enough of them, the machine grows smart.
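To make "facts plus hand-written rules" concrete, here is a minimal sketch in Python of one common way such a system can reason, often called forward chaining: keep applying rules until nothing new can be concluded. The facts, the two rules, and the function name forward_chain are all invented for illustration. They are not taken from any real expert system, and they are certainly not medical advice.

```python
# Toy forward-chaining sketch. Facts are plain strings; each rule reads
# "if all of these facts hold, conclude this new fact".
# Every rule and fact below is hypothetical, made up for this example.

def forward_chain(facts, rules):
    """Apply the rules repeatedly until no rule adds a new fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and all(c in known for c in conditions):
                known.add(conclusion)
                changed = True
    return known

RULES = [
    (("fever", "cough"), "possible_flu"),
    (("possible_flu", "short_of_breath"), "refer_to_doctor"),
]

derived = forward_chain({"fever", "cough", "short_of_breath"}, RULES)
print(sorted(derived))
# -> ['cough', 'fever', 'possible_flu', 'refer_to_doctor', 'short_of_breath']
```

Notice that the second rule can only fire after the first has added its conclusion. That chaining is the "reasoning" step, and everything the program knows had to be typed in by a person.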

When the funding froze: the AI winters

When grand promises met stubborn reality, enthusiasm collapsed — and so did the money. These downturns have a name: an AI winter. There were two big ones, roughly in the mid-1970s and again in the late 1980s into the 1990s. Funders, burned by claims that never arrived, pulled back; "artificial intelligence" became an embarrassing phrase that careful researchers avoided in grant applications.

It is worth being honest about what a winter really was. The science did not stop — quiet, important work continued in labs all along. What froze was the hype and the funding, not the field. And the winters had a cause beyond broken promises: the ideas of the time were starving for two things they could not yet get enough of — data and computing power. Remember that pairing; it is the key that unlocks the rest of this story.

The other tradition: learning from examples

All along, a rival idea had been growing in the shadows. Instead of writing rules by hand, what if a machine could learn the patterns itself from examples? This is connectionism: building simplified networks loosely inspired by the brain's neurons, and letting them adjust their connections through experience. Its earliest spark was the perceptron in the late 1950s, a tiny learning machine that could be trained to tell simple categories apart.
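As a rough illustration of what "learning from examples" means, here is a small Python sketch of the classic perceptron update rule: guess, compare with the right answer, and nudge the weights when the guess is wrong. The training data (the logical AND of two inputs), the learning rate of 1, and the number of passes are assumptions chosen to keep the example tiny. It is a sketch of the idea, not a reconstruction of the original 1950s machine.

```python
# Toy perceptron sketch. It learns a yes/no boundary from labeled examples.
# The data, learning rate, and epoch count are illustrative assumptions.

def predict(weights, bias, x):
    """Return 1 if the weighted sum is positive, otherwise 0."""
    total = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if total > 0 else 0

def train_perceptron(examples, epochs=20, lr=1.0):
    """Start from zero weights and nudge them after every wrong guess."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in examples:
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Each example is (inputs, correct answer); the answer is 1 only for (1, 1).
AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_EXAMPLES)
print([predict(weights, bias, x) for x, _ in AND_EXAMPLES])
# -> [0, 0, 0, 1]
```

Nobody wrote a rule saying "answer 1 only when both inputs are 1". The program arrived at suitable weights purely by being corrected on examples, which is the heart of the connectionist bet.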

The perceptron was overhyped, then mathematically shown to have real limits: a single-layer perceptron can only separate categories with a straight line (or its higher-dimensional equivalent), so it cannot learn a pattern as simple as exclusive-or. The connectionist idea then slept for years. It revived in the 1980s when researchers worked out how to train networks with many layers, a method you will study later under the name backpropagation. In parallel, a more sober, statistics-flavored kind of machine learning quietly took over real applications in the 1990s and 2000s: spam filters, credit scoring, web search. This was the field learning a humbler, more honest lesson: fit models to data, measure carefully, promise only what you can show.

2012: the deep-learning ignition

The modern era has a surprisingly precise spark date: 2012. In a yearly contest where programs competed to label millions of photographs, a deep neural network crushed the competition so decisively that the result reshaped the field overnight. This was the public arrival of deep learning — stacking many layers of artificial neurons so the network learns its own features, from edges up to whole objects, instead of relying on features people hand-designed.

But here is the honest twist: the core ideas were decades old. The deep network that won relied largely on techniques dating back to the 1980s. What changed was the two things the winters had starved for. The internet had produced oceans of labeled data, and gamers had accidentally funded a perfect tool: the graphics chip (GPU), which happens to be very good at running many matrix multiplications in parallel, the exact math neural networks need. The recipe was an old idea plus huge data plus cheap parallel compute. The wave broke not because of one lone flash of genius, but because the whole supply chain of intelligence finally lined up.

There is a famous, slightly painful summary of what the field kept relearning, called the bitter lesson: across decades, methods that simply scale up general learning with more data and compute have tended to beat methods that lean on clever hand-built human knowledge. It is bitter because researchers love their clever ideas — yet again and again, raw scale won. The deep-learning years were that lesson arriving with full force.

The foundation-model era — and why now

The latest wave pushed scale further than anyone expected to work. Instead of training a fresh model for each task, researchers trained enormous networks on vast swaths of text and images, producing a single, general-purpose foundation model that can then be adapted to many jobs. The chatbots and image generators you have used are the visible tip of this. Crucially, this wave rode a 2017 architecture (the transformer) that made it possible to train these models efficiently on huge amounts of data.

So why now, and not in some past summer of optimism? Because the same recipe finally matured: an idea that scales, data at internet size, and compute cheap enough to train models with billions of adjustable numbers inside them. The waves were never really about a single eureka. They were about all three ingredients ripening together — and for most of AI's history, at least one was missing.

each wave needs all three:
  idea_that_scales  +  enough_data  +  cheap_compute

1956-70s symbolic   : strong idea, ~no data,    weak compute  -> stalls
1980s connectionism : good idea,   little data, weak compute  -> stalls
2012 deep learning  : old idea,    big data,    GPUs          -> ignites
2020s foundation    : scaled idea, web data,    huge compute  -> booms

Why progress came in waves: in each era, AI advanced only when an idea, data, and compute all arrived at once — and stalled whenever one was missing.