Epidemics: The SIR Model

Three boxes the whole population flows through

In the previous guide, two species pushed each other around: the rabbits fed the foxes, the foxes thinned the rabbits, and the Lotka-Volterra equations turned that tug-of-war into closed loops circling forever. An epidemic has the same flavour — populations interacting through a rate of contact — but a crucial difference. Here the 'species' are not different animals; they are the *same people* in different conditions, and a person flows one way through the conditions and (in the simplest story) never comes back. That one-way flow is the whole engine of the SIR model.

Picture the population sorted into three boxes. S holds the *susceptible* — people who have not yet caught the disease and could. I holds the *infected* — people who have it right now and can pass it on. R holds the *recovered* (or removed) — people who have had it and are now immune, or who have died; either way they no longer spread it and cannot catch it again. Everybody sits in exactly one box. Over time, people drain from S into I as they get infected, and from I into R as they get better. S only ever empties, R only ever fills; I swells and then subsides. This three-box bookkeeping is the simplest member of a whole family called compartment models, and you will meet it again, in different clothes, when we reach drug dosing.

Writing down the flow as three equations

Now we turn the picture into rates of change. Two parameters govern everything. Let beta be the transmission rate: roughly, how often an infected person makes contact close enough to pass the disease, times the chance it actually passes. Let gamma be the recovery rate: the rate at which infected people leave I for R. The flow from I to R is the easy one — it does not care about anyone else. A fraction gamma of the infected recover per unit time, so that outflow is gamma I, giving the third equation R' = gamma I. Notice that 1/gamma is the *average time a person stays infectious*: if gamma = 1/7 per day, people are contagious for about a week.

The infection flow is the heart of the model, and it is the one genuinely nonlinear piece. A new infection needs an infected person *and* a susceptible person to meet. If you mix the population well, the rate of S-meets-I encounters is proportional to the product S times I — double the infected, double the encounters; double the susceptible, double them again. So the rate at which people pour out of S and into I is beta S I. That single product term, beta S I, is what makes an epidemic curve and not a straight line; it is the same 'two things must meet' structure that drove predator-prey, wearing a new hat.

  S' = - beta S I            (susceptible only ever drain away)
  I' =   beta S I - gamma I  (gained from S, lost to recovery)
  R' =            gamma I    (recovered only ever fill up)

  with   S + I + R = 1   constant   (nobody is created or destroyed)

The classic SIR system, with S, I, and R measured as fractions of the total population. Add the three equations and the right-hand sides cancel, so S' + I' + R' = 0, confirming that the total stays fixed: the bookkeeping never leaks. Because S + I + R = 1 is built in, you really only need to track two of the three; the third is whatever is left over.
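As a minimal sketch, the three right-hand sides can be written as one small Python function and the conservation law checked numerically. The function name sir_rhs and the parameter values are illustrative assumptions, not part of the model itself.

```python
def sir_rhs(s, i, r, beta, gamma):
    """Return (S', I', R') for the SIR equations, with S, I, R as population fractions."""
    new_infections = beta * s * i   # flow out of S and into I
    recoveries = gamma * i          # flow out of I and into R
    return -new_infections, new_infections - recoveries, recoveries


# Illustrative values only (assumed, not taken from any real outbreak).
beta, gamma = 0.5, 0.2
ds, di, dr = sir_rhs(0.9, 0.1, 0.0, beta, gamma)
print(ds, di, dr)      # the three rates of change at this state
print(ds + di + dr)    # zero up to floating-point rounding
```

Note that r never appears on the right-hand side, which is another way of seeing that two of the three variables carry all the information.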

These three coupled equations form a first-order system of the sort this whole rung is built to handle. There is no elementary closed-form solution for S(t) and I(t) as tidy formulas in t — and that is the rule, not the exception, for nonlinear ODEs, exactly as you were warned earlier. So how do we say anything? Two ways, both of which you already own. We can run a numerical solver to draw the curves, and we can reason *qualitatively* about the system without ever solving it — and that qualitative reasoning is where the model gives up its single most famous secret.
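One way the numerical route might look is sketched below, using a hand-written classical fourth-order Runge-Kutta step so that nothing beyond the standard library is needed. The helper names, the step size, the initial infected fraction, and the values of beta and gamma are all assumptions chosen for illustration; a library solver would serve equally well.

```python
def sir_rhs(state, beta, gamma):
    """Right-hand side of the SIR system for state = (S, I, R)."""
    s, i, r = state
    new_infections = beta * s * i
    recoveries = gamma * i
    return (-new_infections, new_infections - recoveries, recoveries)


def rk4_step(state, dt, beta, gamma):
    """Advance (S, I, R) by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))

    k1 = sir_rhs(state, beta, gamma)
    k2 = sir_rhs(shifted(state, k1, dt / 2), beta, gamma)
    k3 = sir_rhs(shifted(state, k2, dt / 2), beta, gamma)
    k4 = sir_rhs(shifted(state, k3, dt), beta, gamma)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(beta, gamma, i0=0.001, days=160.0, dt=0.1):
    """Return the list of (S, I, R) states from t = 0 to t = days."""
    state = (1.0 - i0, i0, 0.0)
    history = [state]
    for _ in range(round(days / dt)):
        state = rk4_step(state, dt, beta, gamma)
        history.append(state)
    return history


# Assumed illustrative parameters, in units of 1/day.
history = simulate(beta=0.5, gamma=0.2)
for day in (0, 20, 40, 80, 160):
    s, i, r = history[day * 10]   # 10 steps per day because dt = 0.1
    print(f"day {day:3d}: S={s:.3f} I={i:.3f} R={r:.3f}")
```

Printing a few rows like this is usually enough to see the qualitative shape: S falling, R rising, and I rising and then subsiding.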

R0: the one number that decides everything

Here is the question every health official asks first: drop a single infected person into a fully susceptible population — will the disease take off, or fizzle out? Look at the middle equation right at the start, when S is essentially 1 (almost everyone is susceptible). Then I' = beta S I - gamma I is approximately (beta - gamma) I. The infected count grows when beta - gamma is positive and shrinks when it is negative. Rearranged, the disease spreads precisely when beta/gamma is bigger than 1. That ratio has a name and a fame: it is the basic reproduction number, R0 = beta/gamma.

R0 has a beautifully concrete meaning. Recall beta is roughly new infections caused per unit time by one infected person in a sea of susceptibles, and 1/gamma is how long that person stays infectious. Multiply them: beta times (1/gamma) = beta/gamma = R0 is *the average number of fresh cases one infected person generates before recovering*, in an otherwise untouched population. If R0 = 3, the first patient infects three, who each infect three, and the outbreak roars. If R0 = 0.8, each patient infects fewer than one other person on average, the chain of transmission starves, and the outbreak fizzles out after a handful of cases. The dividing line at R0 = 1 is the epidemic threshold, the single sharpest 'go / no-go' switch in all of mathematical epidemiology.
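The threshold arithmetic is small enough to check directly. The sketch below uses two assumed parameter pairs, picked only so that R0 comes out at 3 and 0.8 as in the examples above; the function names are hypothetical.

```python
def basic_reproduction_number(beta, gamma):
    """R0 = beta / gamma: new cases per unit time multiplied by the infectious period."""
    return beta / gamma


def early_growth_rate(beta, gamma):
    """Per-capita growth rate of I while S is close to 1, from I' ~ (beta - gamma) I."""
    return beta - gamma


# Illustrative parameter pairs (per day), chosen to give R0 = 3 and R0 = 0.8.
for beta, gamma in [(0.6, 0.2), (0.16, 0.2)]:
    r0 = basic_reproduction_number(beta, gamma)
    verdict = "takes off" if r0 > 1 else "fizzles out"
    # Expected cases in each early generation, starting from one patient.
    # Only meaningful while almost everyone is still susceptible.
    generations = [round(r0 ** n, 2) for n in range(4)]
    print(f"R0 = {r0:.2f}: outbreak {verdict}; cases per generation {generations}")
```

The sign of beta - gamma and the comparison of R0 with 1 are the same test written two ways, which is why either can be used as the threshold.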

Why the curve peaks and why people are left untouched

Even when R0 > 1 and the outbreak takes off, it does not consume everyone. The infected curve rises, reaches a peak, and falls back toward zero, and crucially it does so before S hits zero. Why? Factor the middle equation as I' = (beta S - gamma) I. The infected population grows only while beta S - gamma > 0, that is, while S > gamma/beta = 1/R0. Every new infection spends one more susceptible, so S keeps dropping. The very moment S slides down past the value 1/R0, the bracket flips negative, I' turns down, and the epidemic is past its peak. The peak is not when the disease runs out of people; it is the instant the susceptible pool thins below the threshold needed to sustain growth.
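The claim that the peak sits where S crosses 1/R0 can be checked with a rough simulation. The sketch below uses plain Euler steps, a deliberately crude scheme chosen for brevity, with assumed values for beta, gamma, the initial infected fraction, and the step size.

```python
def peak_of_infection(beta, gamma, i0=0.001, dt=0.01, days=200.0):
    """Euler-step the SIR equations and return (time, S, I) at the highest I seen."""
    s, i = 1.0 - i0, i0
    t = 0.0
    best = (t, s, i)
    for _ in range(round(days / dt)):
        new_infections = beta * s * i * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
        t += dt
        if i > best[2]:
            best = (t, s, i)
    return best


# Assumed illustrative parameters (per day), giving R0 = 2.5 and 1/R0 = 0.4.
beta, gamma = 0.5, 0.2
t_peak, s_peak, i_peak = peak_of_infection(beta, gamma)
print(f"peak at t = {t_peak:.1f}, I = {i_peak:.3f}")
print(f"S at the peak = {s_peak:.3f}, threshold 1/R0 = {gamma / beta:.3f}")
```

The two numbers on the last line should agree to within about one step's worth of change in S, and a large share of the population is still susceptible at that moment.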

After the peak, I keeps falling, but S keeps falling too, just more slowly, because there are fewer infected left to do the infecting. The outbreak burns out as I decays toward zero, and when it is over some susceptibles are *still in the S box, never infected at all*. This is the famous and slightly counter-intuitive result: an epidemic stops not because it ran out of susceptibles, but because it ran out of *infected* people fast enough that the remaining susceptibles were never reached. The leftover fraction is real and predictable (it is fixed by the so-called final size of the epidemic, the total fraction ever infected), and it is why even a fierce outbreak can leave a sizeable minority who simply got lucky.

There is a clean way to see the whole trajectory without solving for time at all, the qualitative trick this rung keeps rewarding. Divide the S equation by the R equation to eliminate t; the factor of I cancels, leaving dS/dR = (-beta S I)/(gamma I) = -R0 S. That little separable relation integrates to S = S0 e^(-R0 R), taking R = 0 at the start: a tidy law linking how much susceptibility remains to how much has recovered. Setting I = 0 at the end, so that R = 1 - S, turns it into one equation for the final susceptible fraction S_inf, the *final-size relation* S_inf = S0 e^(-R0 (1 - S_inf)). You never needed S(t) as a formula in t; you only needed the *shape* of the orbit in the S-R plane, read off the way you would read a phase-plane picture.
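Setting I = 0 and R = 1 - S in S = S0 e^(-R0 R) gives an equation in which the final susceptible fraction (called s_inf here) appears on both sides: s_inf = S0 e^(-R0 (1 - s_inf)). It has no closed-form solution, but a root-finder handles it easily. The sketch below uses simple bisection; the function name, the iteration count, and the sample values are assumptions for illustration, and it takes R = 0 at the start with S0 strictly less than 1.

```python
import math


def final_susceptible(r0, s0, iterations=60):
    """Solve x = s0 * exp(-r0 * (1 - x)) for x in (0, s0) by bisection.

    Assumes R = 0 at the start and 0 < s0 < 1, so the residual changes
    sign exactly once between 0 and s0.
    """
    def residual(x):
        return x - s0 * math.exp(-r0 * (1.0 - x))

    low, high = 0.0, s0   # residual(low) < 0 < residual(high)
    for _ in range(iterations):
        mid = (low + high) / 2
        if residual(mid) < 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# Assumed illustrative values: R0 = 2.5, with 0.1% of people infected at the start.
s_inf = final_susceptible(r0=2.5, s0=0.999)
print(f"never infected: {s_inf:.3f}, ever infected: {1 - s_inf:.3f}")
```

For these values roughly a tenth of the population is never infected, even though R0 is well above 1.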

What the simple model gets wrong on purpose

The honest part now: every assumption baked into those three lines is a simplification, and knowing them is the difference between using SIR and misusing it. The biggest is well-mixed homogeneity — the beta S I term assumes everyone contacts everyone with equal chance, like ideal-gas molecules. Real people live in households, schools, and cities; contact is clustered and patchy, so a single average beta smears over structure that genuinely matters. The model also assumes a *closed* population: no births, no deaths from other causes, no travel in or out — fine for a fast outbreak over weeks, wrong for a disease lingering over years.

Two more assumptions deserve flags. First, SIR grants *permanent immunity* — once in R, you never return to S. For diseases where immunity wanes or the pathogen mutates, that is false, and you need an SIRS variant that lets R leak back to S, which can produce recurring waves instead of a single bump. Second, SIR has *no incubation delay*: a person becomes infectious the instant they are infected. Many real diseases have a latent stretch first, and that is exactly the gap the SEIR model fills by inserting an Exposed box, E, between S and I — infected but not yet contagious. That one extra compartment delays and reshapes the curve, and it is the natural next step up in realism from the model you have here.
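To see the effect of the Exposed box, here is a rough comparison of when the infected curve peaks with and without it. The SEIR equations used are the standard ones, S' = -beta S I, E' = beta S I - sigma E, I' = sigma E - gamma I, R' = gamma I, where sigma (the rate of leaving E, so 1/sigma is the average latent period) is a parameter introduced here and not defined in the text above. All numerical values are assumptions, and the Euler stepping is a crude scheme used only for brevity.

```python
def peak_time(beta, gamma, sigma=None, i0=0.001, dt=0.01, days=400.0):
    """Euler-step SIR (sigma=None) or SEIR and return the time at which I is largest."""
    s, e, i = 1.0 - i0, 0.0, i0
    t, t_peak, i_peak = 0.0, 0.0, i0
    for _ in range(round(days / dt)):
        new_infections = beta * s * i * dt
        recoveries = gamma * i * dt
        s -= new_infections
        if sigma is None:
            # SIR: newly infected people are infectious immediately.
            i += new_infections - recoveries
        else:
            # SEIR: newly infected people wait in E before becoming infectious.
            becoming_infectious = sigma * e * dt
            e += new_infections - becoming_infectious
            i += becoming_infectious - recoveries
        t += dt
        if i > i_peak:
            t_peak, i_peak = t, i
    return t_peak


# Assumed illustrative parameters (per day); sigma = 0.2 means a 5-day latent period.
sir_peak = peak_time(beta=0.5, gamma=0.2)
seir_peak = peak_time(beta=0.5, gamma=0.2, sigma=0.2)
print(f"SIR peak near day {sir_peak:.0f}, SEIR peak near day {seir_peak:.0f}")
```

With the same beta and gamma, the SEIR peak arrives noticeably later, which is the delay the extra compartment is meant to capture.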

None of this makes SIR wrong; it makes it a *first model*, and a superb one. It explains, from three honest lines, why outbreaks are explosive then self-limiting, why R0 = 1 is a knife-edge, why herd immunity is a fraction not a totality (growth stops once S falls below 1/R0, not once S reaches zero), and why some people escape entirely. When you fit beta and gamma to real case data, the model-fitting step that closes the modeling cycle, you get genuine, decision-grade estimates of R0 and the likely peak. The craft is to start here, see clearly what the simple model captures, and add compartments only where the data show the simplification has broken. That disciplined layering is the whole spirit of this final rung.