What independence actually says
The first guide in this rung built conditional probability around one picture: learning that B happened shrinks the sample space down to B, and P(A given B) measures how much of that smaller world is also A. Independence is the special, peaceful case where the shrinking does nothing. Two events A and B are independent when knowing B occurred leaves your assessment of A completely unchanged: P(A given B) = P(A). The information in B is, for the purpose of predicting A, worthless. That is the whole intuition, and everything else in this guide is just bookkeeping around it.
There's a small wrinkle with that definition: P(A given B) = P(A and B) / P(B) only makes sense when P(B) is not zero. So the textbook definition of independent events is stated in a symmetric, division-free form that always works: A and B are independent exactly when P(A and B) = P(A) · P(B). The probability of both happening factorizes into the product of the separate probabilities. When P(B) > 0 you can divide through and recover P(A given B) = P(A); when P(A) > 0 you get P(B given A) = P(B). The product form is the clean, official definition; the conditional statements are how it feels.
Definition (always valid): P(A and B) = P(A) · P(B)
Equivalent when P(B) > 0: P(A given B) = P(A)
Equivalent when P(A) > 0: P(B given A) = P(B)
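The product form can be checked by direct counting. The sketch below is a minimal illustration, assuming a single roll of a fair six-sided die; the events A and B and the helpers `prob` and `is_independent` are hypothetical names chosen for this example, not part of the guide.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair die, all six outcomes equally likely.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Exact probability of an event (a set of outcomes) by counting."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def is_independent(a, b):
    """Product-form test: P(A and B) == P(A) * P(B). Needs no division."""
    return prob(a & b) == prob(a) * prob(b)

A = {2, 4, 6}      # "the roll is even"
B = {1, 2, 3, 4}   # "the roll is at most 4"

print(prob(A), prob(B), prob(A & B))   # 1/2 2/3 1/3
print(is_independent(A, B))            # True
print(prob(A & B) / prob(B))           # 1/2, i.e. P(A given B) equals P(A)
print(is_independent(A, set()))        # True: the product form still works when P(B) = 0
```

The last line is the point of the division-free definition: the empty event has probability zero, so P(A given B) is undefined for it, yet the product test still gives an answer.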
Independent is not disjoint — they are nearly opposites
Here is the confusion this guide exists to kill. Mutually exclusive (disjoint) events are ones that cannot both happen: A and B share no outcomes, so P(A and B) = 0. Independent events are ones where one happening tells you nothing about the other. People hear both as a vague "unrelated" and assume they're the same thing. They are not just different — for events of positive probability, they are almost incompatible. If A and B are disjoint and both have positive probability, then learning B happened tells you A definitely did not happen. That is the strongest possible dependence, the opposite of telling you nothing.
The algebra nails it. Suppose A and B are disjoint, so P(A and B) = 0. For them to also be independent we would need P(A and B) = P(A) · P(B), i.e. 0 = P(A) · P(B). That can only hold if at least one of P(A), P(B) is zero. So two events with positive probability cannot be both disjoint and independent — ever. Disjointness is a statement about the outcomes (no overlap); independence is a statement about the probabilities (they multiply). Confusing them is like confusing "these two roads never meet" with "these two roads tell me nothing about each other's traffic".
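A concrete pair makes the clash visible. This is a small sketch, again assuming one roll of a fair die; the two events are hypothetical and chosen only because they share no outcomes.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Exact probability of an event by counting outcomes."""
    return Fraction(len(event & OMEGA), len(OMEGA))

A = {1, 2}   # "the roll is 1 or 2"
B = {3, 4}   # "the roll is 3 or 4"

disjoint = len(A & B) == 0
independent = prob(A & B) == prob(A) * prob(B)
p_a_given_b = prob(A & B) / prob(B)

print(disjoint)                      # True: no shared outcomes
print(prob(A & B), prob(A) * prob(B))  # 0 1/9
print(independent)                   # False: 0 is not 1/9
print(prob(A), p_a_given_b)          # 1/3 0, so learning B drops A from 1/3 to impossible
```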
Why the product rule is so useful
In the previous guides the multiplication rule gave the general factorization P(A and B) = P(A) · P(B given A), and the conditional factor was the price you paid for the events influencing each other. Independence is precisely the case where that price drops to nothing: P(B given A) collapses to P(B), and the chain rule becomes a plain product. This is what makes independence the workhorse of modeling. Toss a fair coin three times and the eight sequences are governed by one rule: P(H, H, T) = (1/2)(1/2)(1/2) = 1/8, because the tosses don't talk to each other. No conditional gymnastics, just multiply.
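The three-toss claim can be confirmed by listing all eight sequences. The sketch assumes the eight sequences are equally likely, which is the model of a fair coin tossed independently; `prob` is a hypothetical helper that takes a predicate on a sequence.

```python
from fractions import Fraction
from itertools import product

# All 8 sequences of three tosses, assumed equally likely.
sequences = list(product("HT", repeat=3))
p_each = Fraction(1, len(sequences))

def prob(predicate):
    """Sum the probability of every sequence that satisfies the predicate."""
    return sum(p_each for s in sequences if predicate(s))

p_hht = prob(lambda s: s == ("H", "H", "T"))

# Probabilities of the three single-toss events, computed separately.
p_first_h = prob(lambda s: s[0] == "H")
p_second_h = prob(lambda s: s[1] == "H")
p_third_t = prob(lambda s: s[2] == "T")

print(p_hht)                                # 1/8
print(p_first_h * p_second_h * p_third_t)   # 1/8, the plain product
```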
Independence is also exactly what the gambler's fallacy denies. Each spin of a fair wheel is independent of all the spins before it, which by definition means the past spins carry no information: P(red on spin 6 given five reds) is still P(red). The wheel has no memory, no obligation to "balance out". The gambler's fallacy is, at heart, the refusal to believe that P(A given B) = P(A) when it really does hold. And the honest counterweight is not "it must even out soon" but the law of large numbers, several rungs ahead: the long-run average settles down, while any particular short run is free to stay lopsided.
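The "no memory" claim can be checked exactly on a toy model. The sketch below assumes a simplified wheel with only red and black, equally likely, and no green pocket; that simplification is an assumption of this example, not a description of a real wheel. Treating all 64 six-spin sequences as equally likely is how the model encodes independent fair spins, and the code confirms that conditioning on a streak changes nothing.

```python
from fractions import Fraction
from itertools import product

# Simplified wheel (assumption): each spin is "R" or "B" with probability 1/2.
spins = list(product("RB", repeat=6))   # all 64 six-spin sequences, equally likely

# Condition on the first five spins all being red.
five_reds = [s for s in spins if s[:5] == ("R",) * 5]
five_reds_then_red = [s for s in five_reds if s[5] == "R"]

p_red_given_streak = Fraction(len(five_reds_then_red), len(five_reds))
p_red = Fraction(sum(1 for s in spins if s[5] == "R"), len(spins))

print(p_red_given_streak)   # 1/2
print(p_red)                # 1/2, the streak carried no information
```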
Three or more events: pairwise is not enough
Once you have three or more events, "independent" splits into two strengths, and missing the difference is a classic trap. Pairwise independence means every pair multiplies: P(A and B) = P(A)P(B), and likewise for A,C and B,C. Mutual (full) independence demands more: every sub-collection multiplies, so all three pair equations hold and so does the triple, P(A and B and C) = P(A)P(B)P(C). Mutual independence implies pairwise, but the reverse fails, and that gap is real, not a technicality. You can have three events that are perfectly independent two at a time yet tightly linked when you look at all three together.
The cleanest example: toss two fair coins, so the four outcomes HH, HT, TH, TT each have probability 1/4. Let A = "first coin is heads", B = "second coin is heads", C = "the two coins match". Each has probability 1/2 (C is HH or TT). Check the pairs. A and C: the first is heads and they match means both heads, probability 1/4 = (1/2)(1/2), independent. By symmetry B and C are independent too, and A and B are independent because P(both heads) = 1/4 = (1/2)(1/2). So all three pairs are independent. But P(A and B and C) is just P(both heads) = 1/4, while P(A)P(B)P(C) = 1/8. Not equal, so the triple fails. And it must: once you know A and B, C is completely determined, so the three together are anything but independent.
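The two-coin example is small enough to verify by enumeration. This is a minimal sketch; the function names `first_heads`, `second_heads`, `coins_match` and the helper `both` are hypothetical labels for the events A, B and C.

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes of two fair coins.
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Exact probability of an event given as a predicate on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def first_heads(o):    # event A
    return o[0] == "H"

def second_heads(o):   # event B
    return o[1] == "H"

def coins_match(o):    # event C
    return o[0] == o[1]

def both(*events):
    """Intersection: the outcome must satisfy every listed event."""
    return lambda o: all(e(o) for e in events)

pairs = [(first_heads, second_heads),
         (first_heads, coins_match),
         (second_heads, coins_match)]
pairwise = all(prob(both(x, y)) == prob(x) * prob(y) for x, y in pairs)

triple_joint = prob(both(first_heads, second_heads, coins_match))
triple_product = prob(first_heads) * prob(second_heads) * prob(coins_match)

print(pairwise)                      # True: every pair multiplies
print(triple_joint, triple_product)  # 1/4 1/8: the triple does not
```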
Two more honest cautions
First, independence can appear or vanish when you condition. Two events independent overall can become dependent once you fix a third, and the reverse happens too. The clean version of this is conditional independence: A and B are conditionally independent given C when P(A and B given C) = P(A given C) · P(B given C). This is its own relationship, neither implying nor implied by plain independence. A standard picture: two students' exam scores may look correlated overall, because they share the conditions of each sitting (the day, the cafeteria, how easy the questions were), yet be independent once you condition on the specific exam they sat. Independence is always relative to the information you're holding fixed.
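A small exact model shows conditioning creating independence that is absent overall. The setup is a hypothetical one invented for this sketch, not the exam example: a bag holds one coin with P(heads) = 9/10 and one with P(heads) = 1/10, a coin is picked at random, and that coin is tossed twice. Given the coin, the tosses are independent by construction; without knowing the coin, the first toss carries information about the second.

```python
from fractions import Fraction

# Hypothetical bag of two biased coins; each is picked with probability 1/2.
coins = {"heads-heavy": Fraction(9, 10), "tails-heavy": Fraction(1, 10)}
p_coin = Fraction(1, 2)

# Joint distribution over (coin, first toss, second toss).
joint = {}
for name, p_heads in coins.items():
    for t1 in "HT":
        for t2 in "HT":
            p1 = p_heads if t1 == "H" else 1 - p_heads
            p2 = p_heads if t2 == "H" else 1 - p_heads
            joint[(name, t1, t2)] = p_coin * p1 * p2

def prob(pred):
    """Probability of all (coin, t1, t2) triples satisfying the predicate."""
    return sum(p for key, p in joint.items() if pred(*key))

def prob_given_coin(name, pred):
    """P(pred on the two tosses, given which coin was picked)."""
    p_c = prob(lambda c, a, b: c == name)
    return prob(lambda c, a, b: c == name and pred(a, b)) / p_c

# Overall: the two tosses are dependent.
p_h1 = prob(lambda c, a, b: a == "H")
p_h2 = prob(lambda c, a, b: b == "H")
p_both = prob(lambda c, a, b: a == "H" and b == "H")
print(p_both, p_h1 * p_h2)   # 41/100 1/4

# Given the coin: the product rule holds for each coin.
conditionally_independent = all(
    prob_given_coin(n, lambda a, b: a == "H" and b == "H")
    == prob_given_coin(n, lambda a, b: a == "H") * prob_given_coin(n, lambda a, b: b == "H")
    for n in coins
)
print(conditionally_independent)   # True
```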
Second, when we move from events to random variables, independence has a numerical cousin that is weaker, not equal: zero correlation. Independent variables (with finite variances) are always uncorrelated, but uncorrelated does not imply independent. Correlation only sees straight-line, linear association; a variable can determine another through a curve and still show Cov(X, Y) = 0. The textbook case is X uniform on (-1, 1) with Y = X^2: knowing X pins Y down exactly, so they are about as dependent as possible, yet Cov(X, Y) = E[XY] - E[X]E[Y] = E[X^3] - E[X]E[X^2] = 0, because E[X] and E[X^3] both vanish by symmetry. Zero correlation rules out the linear shadow of dependence, never dependence itself.
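The same effect shows up exactly in a discrete stand-in. The sketch below assumes X is uniform on the five points -2, -1, 0, 1, 2 rather than on the interval (-1, 1); that substitution is made here only so every quantity is an exact fraction, and the symmetry argument is unchanged.

```python
from fractions import Fraction

# Discrete analogue (assumption): X uniform on a symmetric set of integers, Y = X^2.
xs = [-2, -1, 0, 1, 2]
p = Fraction(1, len(xs))

def expect(f):
    """Expected value of f(X) under the uniform distribution on xs."""
    return sum(p * f(x) for x in xs)

e_x = expect(lambda x: x)         # E[X]
e_y = expect(lambda x: x ** 2)    # E[Y] = E[X^2]
e_xy = expect(lambda x: x ** 3)   # E[XY] = E[X^3]
cov = e_xy - e_x * e_y

# Independence would need P(X = 1 and Y = 1) == P(X = 1) * P(Y = 1).
p_x1 = sum(p for x in xs if x == 1)
p_y1 = sum(p for x in xs if x ** 2 == 1)
p_x1_y1 = sum(p for x in xs if x == 1 and x ** 2 == 1)

print(cov)                    # 0: uncorrelated
print(p_x1_y1, p_x1 * p_y1)   # 1/5 2/25: not independent
```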