"Learning" is not one thing
You already know the big move that makes AI different from ordinary software: instead of a person spelling out every rule, a program improves by being shown data. That is machine learning. But "shown data" hides a real question — *shown what, exactly, and what is it trying to get better at?* The answer splits machine learning into three broad families, and almost every system you hear about lives in one of them (or stitches a few together).
The three families are supervised, unsupervised, and reinforcement learning. The cleanest way to tell them apart is to ask what kind of feedback the machine gets. Supervised learning gets a *right answer* for each example. Unsupervised learning gets *no answers at all* — just the raw data. Reinforcement learning gets *occasional rewards* for the consequences of its actions. That single distinction — what the feedback looks like — does most of the work.
Supervised learning: learning from answered examples
Supervised learning is the most common kind, and the easiest to picture. You give the machine many examples that each come with the correct answer attached. Each example has features (the inputs it can see) and a label (the answer you want it to produce). The machine's whole job is to find a model — a rule — that maps features to labels well enough to handle examples it has never seen.
Take a spam filter. You collect thousands of emails, each already marked *spam* or *not spam* by people. The features might be which words appear, who sent it, how many links it has; the label is the spam/not-spam verdict. The filter studies these answered examples and learns a rule, then applies that rule to brand-new mail. The hope is [[generalization|generalization]] — that a pattern learned from old mail still works on tomorrow's mail.
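As a minimal sketch of the spam-filter idea, the example below learns from a handful of labelled emails and then guesses a label for new mail. It assumes a deliberately crude rule: count how often each word appeared in spam versus non-spam training mail. The emails, the function names `train` and `predict`, and the scoring rule are all illustrative assumptions, not how a production filter works.

```python
from collections import Counter

# Labelled training examples: the text supplies the features, the second item is the label.
TRAINING = [
    ("win a free prize click here", "spam"),
    ("lunch at noon", "not_spam"),
    ("verify your account now", "spam"),
    ("notes from the noon meeting", "not_spam"),
]

def train(examples):
    """Count how often each word appears under each label."""
    counts = {"spam": Counter(), "not_spam": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    """Guess a label for unseen mail by comparing shared-word counts."""
    words = text.lower().split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["not_spam"][w] for w in words)
    return "spam" if spam_score > ham_score else "not_spam"

model = train(TRAINING)
print(predict(model, "click here to win"))  # spam
print(predict(model, "meeting at noon"))    # not_spam
```

Neither test email appears in the training set; the rule carries over only because the new mail shares words with the old. That is generalization in miniature.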
Supervised learning powers a huge share of practical AI: detecting fraud, reading handwriting, flagging tumours on a scan, predicting tomorrow's demand. Its great strength is also its great cost — it needs labelled answers, and someone (often a person) has to produce them. A model that has merely memorised its examples instead of learning the real pattern is said to be [[overfitting|overfitting]]; it looks brilliant on the data it studied and falls apart on anything new.
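To make the memorising failure concrete, here is a small sketch of a "model" that only recalls the exact emails it studied. The data, the `memoriser` function, and its fallback guess of `not_spam` are hypothetical, chosen only to show why a model is judged on examples it has not seen.

```python
# Emails the model studied, and fresh emails it has never seen (all made up).
TRAIN = [
    ("win a free prize", "spam"),
    ("lunch at noon", "not_spam"),
    ("verify your account", "spam"),
    ("see you at lunch", "not_spam"),
]
NEW = [
    ("claim your free prize", "spam"),
    ("lunch tomorrow", "not_spam"),
    ("verify account today", "spam"),
]

def memoriser(text):
    """Overfit 'model': recalls exact training emails, else guesses not_spam."""
    return dict(TRAIN).get(text, "not_spam")

def accuracy(model, examples):
    """Fraction of examples where the model's guess matches the label."""
    hits = sum(model(text) == label for text, label in examples)
    return hits / len(examples)

train_acc = accuracy(memoriser, TRAIN)
new_acc = accuracy(memoriser, NEW)
print(f"studied mail: {train_acc:.2f}, new mail: {new_acc:.2f}")
```

The gap between the two scores is the warning sign: perfect on the studied mail, mostly wrong on new mail.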
# supervised: every example carries its own answer
train on:
("win a free prize, click here", spam)
("lunch at noon?", not_spam)
("verify your account now!!!", spam)
learn rule: features -> label
new email -> rule -> guess: spam / not_spam
Unsupervised learning: finding structure with no answer key
Unsupervised learning removes the answer key entirely. You hand the machine a pile of data with no labels and ask a softer question: *what structure is hiding in here?* There is no "correct" output to match — only patterns to surface. That makes it less tidy than supervised learning, but it fits the common case where you have plenty of data and almost no labels.
The classic example is grouping customers. A shop has records of what each person buys, but nobody has sorted them into "types." A clustering method like k-means looks at the dataset and bundles shoppers who behave alike — bargain hunters here, weekend bulk-buyers there — without ever being told those groups exist. The machine *proposes* the groups; a human then looks and decides whether they mean anything.
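The grouping step can be sketched in a few lines of plain Python. This is a bare-bones k-means under stated assumptions: the shopper data is invented, each shopper is described by just two numbers, and the starting centroids are picked by hand so the run is repeatable (real implementations usually pick them at random and rescale the features first).

```python
from math import dist

# Hypothetical shoppers: (visits per month, average spend per visit).
shoppers = [(12, 8), (10, 11), (11, 9), (2, 90), (1, 110), (3, 95)]

def kmeans(points, centroids, rounds=10):
    """Assign each point to its nearest centroid, move each centroid to the
    mean of its assigned points, and repeat for a fixed number of rounds."""
    for _ in range(rounds):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        # An empty group keeps its old centroid.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids, groups

centroids, groups = kmeans(shoppers, centroids=[shoppers[0], shoppers[3]])
for group in groups:
    print(group)
```

Note that the code never sees a name like "bargain hunter". It returns two bundles of points, and deciding what they mean is left to a person.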
Beyond clustering, unsupervised ideas power recommendation, anomaly detection (a transaction unlike all the others may be fraud), and the compression that squeezes data down to its essentials. There is a catch worth naming: with no answer key, you cannot simply measure "accuracy." Judging whether the discovered structure is *useful* takes human eyes and domain sense — the machine can group, but it cannot tell you the groups matter.
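The anomaly-detection idea, spotting the transaction unlike all the others, can be sketched without any labels. The amounts below are invented, and the rule (flag anything more than five median absolute deviations from the median) is one simple assumption among many possible ones, not a standard fraud check.

```python
from statistics import median

# Hypothetical transaction amounts; no labels say which one, if any, is fraud.
amounts = [12, 15, 11, 14, 13, 950, 16, 12]

def find_outliers(values, threshold=5.0):
    """Flag values far from the median, measured in units of the
    median absolute deviation (MAD), a robust measure of spread."""
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    return [v for v in values if abs(v - centre) > threshold * mad]

outliers = find_outliers(amounts)
print(outliers)  # [950]
```

The method only says that 950 is unusual. Whether it is fraud or simply a large legitimate purchase is still a human judgement.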
Reinforcement learning: learning by trial and reward
Reinforcement learning looks the least like a classroom and the most like training a dog or learning to ride a bike. There is no stack of answered examples. Instead an agent acts in an environment, and now and then it gets a reward — a number saying *that went well* or *that went badly*. Over many tries it adjusts its policy (its way of choosing actions) to earn more reward over the long run.
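The agent, environment, reward, and policy can be seen working together in a toy sketch. The example uses tabular Q-learning, one standard reinforcement learning method picked here for illustration; the five-cell corridor, the single reward at the right-hand end, and the constants `ALPHA`, `GAMMA`, and `EPSILON` are all assumptions made for the example.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N_STATES = 5          # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

# Q[state][action_index]: the agent's running estimate of long-run reward.
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def choose_action(state):
    """Policy: usually take the best-looking action, sometimes explore."""
    if random.random() < EPSILON:
        return random.randrange(2)
    best = max(Q[state])
    return random.choice([a for a in range(2) if Q[state][a] == best])

for _ in range(500):                       # 500 episodes of trial and error
    state = 0
    while state != N_STATES - 1:
        action = choose_action(state)
        # Environment: move, but never step outside the corridor.
        next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # Nudge the estimate toward reward plus discounted future value.
        target = reward + GAMMA * max(Q[next_state])
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = next_state

policy = ["left" if q[0] > q[1] else "right" for q in Q[:-1]]
print(policy)  # ['right', 'right', 'right', 'right']
```

Nobody tells the agent that "right" is correct. The only signal is a reward on arrival, and the preference for stepping right spreads backwards from the goal, cell by cell, over repeated episodes.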
The famous example is game-playing. To learn Go, a system plays huge numbers of games inside an act-then-observe loop: it moves, the board changes, and only at the very end does it learn whether it won. From that thin, delayed signal it gradually discovers strong play. This is roughly how DeepMind's AlphaGo reached a level that beat the world's best human players, a genuine landmark and not hype. (The original AlphaGo was first primed on records of human expert games, a supervised step, before self-play took over; its successor, AlphaGo Zero, learned from self-play alone.)
Reinforcement learning shines where decisions unfold over time and feedback is sparse: games, robot control, steering data-centre cooling, tuning recommendations. It is also the hungriest and most finicky of the three — it can need millions of trials, which is fine in a simulator but dangerous in the real world, and a badly designed reward can teach exactly the wrong lesson. "Make the number go up" is not the same as "do what I meant."
Where the lines blur — and what learning still can't do
In real systems the three families mix freely. The large language models behind today's chatbots are trained first by predicting the next word in oceans of text — a *self-supervised* trick that conjures labels out of the raw data itself — and then polished with reinforcement learning from human preferences. So when you meet a sleek modern AI, it is usually not one pure method but a recipe layering several. The three families are the alphabet, not the whole literature.
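The self-supervised trick of getting labels from raw text can be shown directly. In this minimal sketch, the function name `next_word_examples`, the three-word context window, and the sample sentence are illustrative assumptions; real language models work on far longer contexts and on sub-word tokens rather than whole words.

```python
def next_word_examples(text, context_size=3):
    """Turn raw text into (context, next_word) training pairs.
    The 'label' for each example is simply the word that came next."""
    words = text.split()
    pairs = []
    for i in range(context_size, len(words)):
        context = tuple(words[i - context_size:i])
        pairs.append((context, words[i]))
    return pairs

pairs = next_word_examples("the cat sat on the mat")
for context, label in pairs:
    print(context, "->", label)
# ('the', 'cat', 'sat') -> on
# ('cat', 'sat', 'on') -> the
# ('sat', 'on', 'the') -> mat
```

No person labelled anything here: the text supplies both the question and the answer, which is why this approach scales to enormous amounts of unlabelled writing.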
Note too what all three share — and where they stop. Every one of them learns from the past and bets that the future resembles it; shift the world far enough and even a brilliant model goes wrong. None of them "understands" in the human sense; a spam filter has no idea what email *is*. And whichever family a system belongs to, it is still [[narrow-ai|narrow AI]] — superb at the one slice it was trained on, blank outside it. AlphaGo could crush a Go champion and not so much as tell you the rules of checkers.
Hold on to the core idea instead of the labels: a machine learns when its behaviour improves from experience, and the *kind* of experience (answered examples, unlabelled data, or rewarded actions) names the family. Carry that idea into the next guide, where we look squarely at what today's AI genuinely can and cannot do, separating the real progress from the breathless promises.