From a brain cell to a cartoon
A real neuron in your head collects tiny electrical nudges from thousands of neighbors. When the total nudge crosses a threshold, it fires a spike down its wire; otherwise it stays quiet. That is the entire image early researchers borrowed — not the messy biochemistry, just the slogan *add up your inputs, and fire if the sum is big enough*. The artificial neuron is a deliberate cartoon of that slogan, and admitting it is a cartoon is the honest place to start.
This idea sits squarely in the tradition called connectionism: the bet that intelligence emerges not from hand-written rules but from many simple units wired together. It is the opposite philosophy to the symbolic AI of logic and explicit rules you met earlier on this ladder. Neither camp is the whole truth, but the neuron is connectionism's atom.
Weighted sum and bias
Give the neuron a handful of numbers as input — say the brightness of pixels, or the features of a house. Each input gets multiplied by its own weight, a number saying how much that input matters and in which direction. Add those products together and you have the weighted sum. A large positive weight says "pay attention to this input"; a negative weight says "this input pushes me the other way"; a weight near zero says "ignore it."
Then we add one more number that belongs to the neuron alone: the bias. It does not multiply any input; it simply shifts the whole sum up or down, setting how eager the neuron is to fire before any input arrives. Think of the weights as the *slope* of a decision and the bias as where you *plant* that decision along the number line. If you met linear regression earlier, you have already seen exactly this shape: weights times inputs plus an intercept.
    z = (w1*x1 + w2*x2 + ... + wn*xn) + b

    x = inputs (given to the neuron)
    w = weights (learned, one per input)
    b = bias (learned, one per neuron)
    z = the weighted sum, also called the 'pre-activation'
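The formula above is short enough to write out directly. Here is a minimal sketch in plain Python; the function name `pre_activation` and the house-style numbers are made up for illustration, not taken from any library or dataset.

```python
def pre_activation(inputs, weights, bias):
    """Return z = w1*x1 + ... + wn*xn + b for a single neuron."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# Hypothetical inputs and weights, chosen only so the arithmetic is easy to follow.
x = [1.0, 3.0, 2.0]
w = [2.0, 1.0, -1.0]   # the negative weight pushes the sum the other way
b = 1.0

z = pre_activation(x, w, b)
print(z)  # 2*1 + 1*3 + (-1)*2 + 1 = 4.0
```

Try setting one weight to zero and notice that the matching input no longer affects z at all.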
The activation: why we bend the line
So far the neuron is purely linear — a straight line, or a flat sheet in higher dimensions. The final step pushes the weighted sum z through an activation function, a fixed, non-linear curve. The earliest neurons used a hard step: fire 1 if z crosses zero, else 0. Smoother choices are now standard — the S-shaped sigmoid squashes any number into the range 0 to 1, while the wildly popular ReLU simply outputs z when positive and 0 otherwise.
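The three activations named above fit in a few lines each. This is a sketch rather than a reference implementation: the choice that the step returns 0 exactly at z = 0 is an assumption (the text only says the neuron fires once z crosses zero), and the sigmoid is split into two branches simply to avoid overflow for very negative z.

```python
import math

def step(z):
    # Hard threshold: fire only when z is strictly above zero (assumed convention).
    return 1.0 if z > 0 else 0.0

def sigmoid(z):
    # Squashes any real number into the range 0 to 1.
    # Two branches so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def relu(z):
    # Passes positive values through unchanged, clips the rest to 0.
    return max(0.0, z)

for z in (-2.0, 0.0, 3.0):
    print(z, step(z), round(sigmoid(z), 3), relu(z))
```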
Why bother bending the line at all? Because stacking linear steps on top of linear steps just gives you another line — a hundred straight layers collapse into a single straight one, learning nothing a single layer could not. The non-linear kink is what lets stacked neurons carve curved, intricate boundaries through data. The activation is the small ingredient that makes depth worth having.
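The collapse is easy to check with two one-input units. In the sketch below, `affine` is a hypothetical helper that builds a unit with no activation, and the specific weights are arbitrary. Algebraically, -3*(2x + 1) + 4 = -6x + 1, so the two stacked units are exactly one line; putting a ReLU between them breaks that.

```python
def affine(w, b):
    """Build a one-input unit x -> w*x + b with no activation."""
    def unit(x):
        return w * x + b
    return unit

def relu(z):
    return max(0.0, z)

layer1 = affine(2.0, 1.0)
layer2 = affine(-3.0, 4.0)

# The single unit the stack collapses to: weight w2*w1, bias w2*b1 + b2.
collapsed = affine(-3.0 * 2.0, -3.0 * 1.0 + 4.0)

for x in (-1.0, 0.0, 2.5):
    assert layer2(layer1(x)) == collapsed(x)

def bent(x):
    # Same two units, but with a ReLU kink in between.
    return layer2(relu(layer1(x)))

print([bent(x) for x in (-2.0, -1.0, 0.0)])  # [4.0, 4.0, 1.0]
```

The output is flat between x = -2 and x = -1 and then drops, so the three points do not lie on one straight line. No single weight and bias can reproduce it.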
Each activation has a personality. Sigmoid gives a clean probability-like output but saturates: far from zero its curve goes nearly flat, and a flat curve carries almost no gradient, a foreshadowing of the vanishing gradient trouble you will meet when training deep nets. ReLU keeps a constant slope of 1 for positive inputs and is cheap to compute, which is one reason deep networks became practical to train; its price is a slope of exactly 0 for negative inputs, so a unit stuck there stops learning. There is no single best choice; this is a small instance of the broader no-free-lunch truth.
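Saturation is easier to believe with numbers. The sketch below uses the standard identity that the sigmoid's slope is s * (1 - s), where s is the sigmoid's output; the function names are illustrative.

```python
import math

def sigmoid(z):
    # Two branches so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sigmoid_grad(z):
    # Slope of the sigmoid at z: s * (1 - s).
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    # Slope of ReLU: 1 for positive inputs, 0 otherwise.
    return 1.0 if z > 0 else 0.0

for z in (0.0, 2.0, 5.0, 10.0):
    print(f"z={z:5.1f}  sigmoid slope={sigmoid_grad(z):.6f}  relu slope={relu_grad(z):.1f}")
```

The sigmoid's slope peaks at 0.25 when z = 0 and shrinks rapidly as z moves away, while ReLU's slope stays at 1 for every positive z.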
The perceptron, and how it learns
Put a step activation on a single neuron and you have the perceptron, Frank Rosenblatt's 1958 machine and the first artificial neuron that could *learn its own weights*. The recipe is touchingly simple: show it an example, see whether it fires correctly, and if it is wrong, nudge each weight a little toward the right answer. Repeat over many examples and the weights drift into a setting that separates the two classes.
- Start with small random weights and bias (for a single perceptron, all zeros also works).
- Take one labelled example, compute the weighted sum, apply the step to get a guess.
- If the guess matches the label, change nothing.
- If it is wrong, add (or subtract) a small multiple of each input to its weight, and shift the bias the same way, pushing toward the correct side.
- Loop over the data until it stops making mistakes (if it can).
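The steps above can be sketched in a few lines of plain Python. Several details here are assumptions rather than part of the recipe: the weights start at zero, the step outputs 0 exactly at z = 0, the learning rate is 1.0 (so every number stays a whole number and is easy to trace by hand), and training stops after at most `max_epochs` passes. The AND gate is used as the dataset only because a straight line can separate it.

```python
def step(z):
    return 1 if z > 0 else 0

def predict(weights, bias, inputs):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)

def train_perceptron(examples, learning_rate=1.0, max_epochs=100):
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, label in examples:
            error = label - predict(weights, bias, inputs)  # +1, 0 or -1
            if error != 0:
                mistakes += 1
                # Nudge each weight by a multiple of its input, toward the label.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:  # a full pass with no errors: stop
            break
    return weights, bias

# AND gate: fire only when both inputs are on.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND)
predictions = [predict(weights, bias, x) for x, _ in AND]
print(predictions)  # [0, 0, 0, 1]
```

Printing `weights` and `bias` after each pass is a good way to watch them drift into place.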
This is your first taste of supervised learning: a label tells the unit what the answer should have been, and error drives the update. It is also a baby version of gradient descent (roll downhill on the mistakes), the same engine that, generalised through backpropagation, trains nearly every modern network. The perceptron is not a quaint relic; it is the seed of the whole subject.
What one neuron cannot do
Honesty time. A single neuron draws exactly one straight boundary: a line in 2D, a plane in 3D, a flat hyperplane beyond that. It can split apples from oranges only when a straight cut suffices. Classic counterexample: the XOR pattern, where you must say yes when exactly one of two inputs is on. No straight line separates those four points, and so no single neuron, however cleverly trained, can ever learn XOR.
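You can watch this failure happen. The sketch below runs the perceptron update on XOR for 1000 passes, an arbitrary number chosen for illustration, with zero starting weights and a learning rate of 1 as assumptions. The comments spell out why no setting of the weights can work, so the error count can never reach zero.

```python
def step(z):
    return 1 if z > 0 else 0

def predict(weights, bias, inputs):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)

# XOR: fire when exactly one input is on.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# Why it must fail: we would need
#   b <= 0,  w2 + b > 0,  w1 + b > 0,  w1 + w2 + b <= 0.
# Adding the two middle conditions gives w1 + w2 + 2b > 0,
# so w1 + w2 + b > -b >= 0, which contradicts the last one.

weights, bias = [0.0, 0.0], 0.0
for _ in range(1000):
    for inputs, label in XOR:
        error = label - predict(weights, bias, inputs)
        weights = [w + error * x for w, x in zip(weights, inputs)]
        bias += error

mistakes = sum(predict(weights, bias, x) != y for x, y in XOR)
print(mistakes)  # never 0, however long the loop runs
```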
Here is the payoff and the cliffhanger. Wire several neurons side by side into a layer, then feed one layer into the next, and the straight cuts combine into bent, folded regions that *can* solve XOR and far harder problems. This is a multilayer perceptron. A famous result, the universal approximation theorem, even promises that a single hidden layer, if it is wide enough and uses a suitable non-linear activation, can approximate any continuous function on a bounded region of input space as closely as you like.
Read that promise carefully, though. The theorem says a suitable network *exists*; it never says training will find it, how big it must be, or that it will generalise to new data. Existence is not a recipe. That honest gap — between what is possible and what is learnable — is exactly the territory the rest of this rung will explore, one neuron at a time.
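To see concretely how stacked neurons handle XOR, here is a tiny two-layer network wired by hand. The weights below are picked by inspection, not learned, and are only one of many settings that work: one hidden neuron acts as OR, the other as AND, and the output neuron fires when OR is on but AND is off. It also illustrates the gap just described, since knowing that such weights exist is a separate matter from having a procedure that finds them.

```python
def step(z):
    return 1 if z > 0 else 0

def neuron(inputs, weights, bias):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_net(x1, x2):
    # Hidden layer: two straight cuts through the same inputs.
    h_or = neuron((x1, x2), (1.0, 1.0), -0.5)    # fires if at least one input is on
    h_and = neuron((x1, x2), (1.0, 1.0), -1.5)   # fires only if both inputs are on
    # Output layer: "OR but not AND" is exactly XOR.
    return neuron((h_or, h_and), (1.0, -1.0), -0.5)

outputs = [xor_net(a, b) for a, b in ((0, 0), (0, 1), (1, 0), (1, 1))]
print(outputs)  # [0, 1, 1, 0]
```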