Why the Plain Kalman Filter Breaks
The classic Kalman filter is wonderfully tidy, but it rests on one strict assumption: every step must be a straight-line relationship. The way the robot moves from one moment to the next (its motion model) and the way a sensor reading depends on the true state (its observation model) both have to be linear — double the input, double the output, with nothing curved in between. When that holds, a Gaussian bell-shaped belief stays a perfect bell after every prediction and update, and the math closes neatly.
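To make "the math closes neatly" concrete, here is a minimal one-dimensional sketch of a linear predict and update step. The function names, the scalar model x' = a*x + u, and every number are illustrative assumptions, not taken from any particular library.

```python
def kf_predict(mean, var, a, u, q):
    """Linear motion x' = a*x + u, with process noise variance q."""
    return a * mean + u, a * a * var + q


def kf_update(mean, var, z, h, r):
    """Linear observation z = h*x, with measurement noise variance r."""
    gain = var * h / (h * h * var + r)        # Kalman gain
    new_mean = mean + gain * (z - h * mean)   # pull the mean toward the reading
    new_var = (1.0 - gain * h) * var          # the update never widens the bell
    return new_mean, new_var


mean, var = 0.0, 1.0                                      # prior belief N(0, 1)
mean, var = kf_predict(mean, var, a=1.0, u=1.0, q=0.5)    # belief is now N(1.0, 1.5)
mean, var = kf_update(mean, var, z=2.0, h=1.0, r=1.5)     # belief is now N(1.5, 0.75)
print(mean, var)
```

A mean and a variance go in, a mean and a variance come out, and nothing else is needed to describe the belief. That closure is exactly what the linear assumption buys.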
Real robots rarely cooperate. The moment a wheeled robot turns, its new x–y position depends on the sine and cosine of its heading — curved functions, not straight lines. Drive forward while rotating, and your future position bends around an arc. That is nonlinear motion, and it already breaks the assumption.
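A few lines of code show the failure of "double the input, double the output" directly. This is only a sketch: the `drive` function and its straight-segment motion are simplifying assumptions chosen for illustration.

```python
import math


def drive(x, y, heading, distance):
    """Move `distance` metres along `heading` (radians) starting from (x, y)."""
    return (x + distance * math.cos(heading),
            y + distance * math.sin(heading))


# If motion were linear in heading, doubling the heading would double the result.
x1, y1 = drive(0.0, 0.0, math.pi / 4, 1.0)   # heading of 45 degrees
x2, y2 = drive(0.0, 0.0, math.pi / 2, 1.0)   # heading of 90 degrees

print((x1, y1))   # roughly (0.707, 0.707)
print((x2, y2))   # roughly (0.0, 1.0), nowhere near (1.414, 1.414)
```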
Sensing is just as bad. A bearing-only sensor — say a camera that reports only the *angle* to a landmark, not the distance — relates angle to position through an arctangent. Range from a beacon involves a square root. Feed a nice Gaussian cloud of guesses through such a curved function and the cloud comes out lopsided, banana-shaped, no longer a clean bell. The plain filter has no honest way to represent that, so it quietly misstates its own confidence.
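The distortion is easy to see numerically. The sketch below pushes a Gaussian cloud of position guesses through a range measurement and compares the average output with the output at the average input. The beacon location, the belief centre, the noise level, and the sample count are all made-up values for illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

BEACON = (0.0, 0.0)
MEAN = (1.0, 0.0)   # hypothetical belief centre, 1 m from the beacon
SIGMA = 0.5         # hypothetical position uncertainty per axis (m)


def measured_range(x, y):
    """Distance from the beacon: a square root, so a curved function."""
    return math.hypot(x - BEACON[0], y - BEACON[1])


samples = [(random.gauss(MEAN[0], SIGMA), random.gauss(MEAN[1], SIGMA))
           for _ in range(20000)]
ranges = [measured_range(x, y) for x, y in samples]

range_of_mean = measured_range(*MEAN)        # 1.0: what a straight-line view expects
mean_of_ranges = sum(ranges) / len(ranges)   # what the cloud actually produces

print(range_of_mean, mean_of_ranges)
```

With these settings the average measured range comes out noticeably larger than 1.0, because the curved function stretches one side of the cloud more than the other. A filter that assumes the two numbers agree is already slightly wrong before any sensor noise enters.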
The Extended Kalman Filter: Pretend It's Straight, Locally
The oldest fix is the extended Kalman filter (EKF), and its trick is easy to picture. A curve, if you zoom in close enough, looks almost like a straight line — the tangent at that point. So the EKF takes the robot's current best guess, draws the tangent to the curved motion and sensor functions right there, and uses that straight-line approximation to run the ordinary Kalman math for one step. Next step, new guess, new tangent. It re-linearizes every cycle.
Mechanically, that tangent is captured by a Jacobian: a matrix of partial derivatives, evaluated at the current estimate, that says how much each output shifts when each input is nudged. Plug those Jacobians in where the linear filter expected fixed matrices, and the whole predict-and-update loop carries on almost unchanged. That familiarity is exactly why the EKF became the workhorse of GPS receivers, early drones, and countless localization systems.
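As a hedged illustration, here is the Jacobian of a range-and-bearing observation, with a finite-difference check that does the "nudge each input" experiment literally. The function names and the choice of a landmark seen from the origin are assumptions made for this sketch.

```python
import math


def observe(x, y):
    """Range and bearing from the origin to a landmark at relative position (x, y)."""
    return math.hypot(x, y), math.atan2(y, x)


def observe_jacobian(x, y):
    """Analytic partial derivatives of observe() evaluated at (x, y)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],        # d(range)/dx,   d(range)/dy
            [-y / r2, x / r2]]     # d(bearing)/dx, d(bearing)/dy


def numeric_jacobian(x, y, eps=1e-6):
    """Finite-difference check: nudge each input, watch each output."""
    r0, b0 = observe(x, y)
    rx, bx = observe(x + eps, y)
    ry, by = observe(x, y + eps)
    return [[(rx - r0) / eps, (ry - r0) / eps],
            [(bx - b0) / eps, (by - b0) / eps]]


H = observe_jacobian(3.0, 4.0)      # close to [[0.6, 0.8], [-0.16, 0.12]]
H_check = numeric_jacobian(3.0, 4.0)
print(H)
print(H_check)
```

The two matrices should agree to several decimal places. A check like this is a cheap way to catch sign errors in hand-derived Jacobians.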
But a tangent is only a good stand-in near the point where you drew it. If the function curves sharply, or your current guess is far from the truth, the straight-line approximation can be badly off — and then the filter confidently steps in the wrong direction. Errors compound, the reported covariance shrinks while reality wanders, and the estimate can diverge entirely, never to recover. The EKF also demands that you can actually compute those derivatives, which is painful or impossible for messy, non-smooth models.
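The sketch below measures how quickly a tangent goes stale. It linearizes a bearing measurement around a guess and compares the straight-line prediction with the true value as the real position moves away from that guess. The geometry and the offsets are arbitrary illustrative choices.

```python
import math


def bearing(x, y):
    """True (curved) bearing measurement to a point at (x, y)."""
    return math.atan2(y, x)


def tangent_bearing(x, y, x0, y0):
    """First-order, EKF-style approximation of bearing() drawn at (x0, y0)."""
    r2 = x0 * x0 + y0 * y0
    return bearing(x0, y0) + (-y0 / r2) * (x - x0) + (x0 / r2) * (y - y0)


def tangent_error(offset, x0=1.0, y0=0.0):
    """Gap between tangent and truth when the real point is `offset` metres off."""
    true_value = bearing(x0, y0 + offset)
    approx = tangent_bearing(x0, y0 + offset, x0, y0)
    return abs(approx - true_value)


for offset in (0.05, 0.5, 2.0):
    print(offset, tangent_error(offset))
```

For a 5 cm offset the error is a few hundred-thousandths of a radian. At 2 m it is most of a radian, which is the kind of gap that sends an EKF confidently in the wrong direction.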
The Unscented Kalman Filter: Sample, Don't Differentiate
The unscented Kalman filter (UKF) starts from a sharp insight: it is easier to approximate a probability distribution than to approximate an arbitrary nonlinear function. Instead of flattening the curve into a tangent, the UKF leaves the true curved function alone and picks a small, carefully chosen set of sample points — called sigma points — that together capture the mean and spread of the current belief.
Then it does the obvious, honest thing: it pushes each sigma point through the *real* nonlinear motion or observation function — no tangent, no shortcut. On the far side it has a transformed scatter of points, and it simply reads off their new mean and covariance to form the next Gaussian. This is called the unscented transform, and for the same amount of curvature it typically tracks the true mean and spread far more faithfully than the EKF's tangent.
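Here is a minimal scalar sketch of the unscented transform, using the basic three-point form with a single spread parameter `kappa`. The value `kappa = 2` (so that n + kappa = 3) is a common heuristic for Gaussian beliefs, assumed here rather than taken from the text. The test function is the forward progress of a robot that drives one unit with an uncertain heading, which is the cosine of that heading.

```python
import math


def unscented_transform_1d(mean, var, f, kappa=2.0):
    """Propagate a scalar Gaussian N(mean, var) through f using 3 sigma points."""
    n = 1                                        # state dimension
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa),
               0.5 / (n + kappa),
               0.5 / (n + kappa)]
    outputs = [f(p) for p in points]             # the real function, no tangent
    new_mean = sum(w * y for w, y in zip(weights, outputs))
    new_var = sum(w * (y - new_mean) ** 2 for w, y in zip(weights, outputs))
    return new_mean, new_var


sigma = 0.5                                      # heading uncertainty (rad)
ut_mean, ut_var = unscented_transform_1d(0.0, sigma ** 2, math.cos)
ekf_mean = math.cos(0.0)                         # tangent at the mean predicts 1.0
true_mean = math.exp(-sigma ** 2 / 2)            # exact E[cos(theta)] for a Gaussian

print(ekf_mean, ut_mean, true_mean)
```

The tangent at zero heading is flat, so the linearized answer insists the robot makes full forward progress. The three sigma points land within a fraction of a percent of the exact mean.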
There is a big practical bonus: no derivatives. You never have to work out a Jacobian by hand or worry that your model is too jagged to differentiate; you just need to be able to *run* the function on a point. That makes the UKF a natural drop-in upgrade for the EKF, usually at only a modest extra cost (typically 2n + 1 sigma points for an n-dimensional state, instead of one linearization).
The UKF's limit, though, has the same shape as the EKF's: at the end of the day it still summarizes the answer as a single Gaussian — one bell with one peak. If the true belief has become genuinely banana-shaped, or worse, split into two separate clumps of possibility, no single bell can describe it. For that, you need a fundamentally different representation.
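A tiny numeric sketch shows what goes wrong when one bell is forced onto two clumps. The hypothesis values are invented for illustration.

```python
import statistics

# Two rival clumps of hypotheses about position along a corridor (metres).
hypotheses = [-2.1, -2.0, -1.9, 1.9, 2.0, 2.1]

mean = statistics.fmean(hypotheses)     # lands midway between the clumps
std = statistics.pstdev(hypotheses)     # about 2.0: one very wide bell

# How far is the nearest actual hypothesis from the Gaussian's peak?
nearest = min(abs(h - mean) for h in hypotheses)

print(mean, std, nearest)
```

The fitted Gaussian puts its peak at the one place no hypothesis supports, almost two metres from the nearest one. The summary is not merely imprecise; its best guess is a location the evidence has ruled out.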
Particle Filters: A Cloud of Guesses That Can Be Any Shape
The particle filter throws out the bell curve entirely. Instead of describing belief with a mean and a covariance, it represents belief as a crowd of hundreds or thousands of individual guesses — particles — each one a complete hypothesis about where the robot might be. The density of the crowd *is* the probability: where particles cluster thickly, the robot probably is; where they thin out, it probably isn't. Because the crowd can take any shape, it can capture a banana, two clumps, a ring — whatever the truth demands.
Each cycle of the filter runs the same short loop:
- Predict: move every particle forward through the motion model, sprinkling in a little random noise so the cloud spreads to reflect uncertainty in how the robot actually moved.
- Weight: take the latest sensor reading and score each particle — the ones whose location would have produced something close to what was actually seen get a high weight, the ones that disagree get a low one.
- Resample: breed a fresh crowd by copying high-weight particles many times and dropping low-weight ones, so survivors concentrate where the evidence points.
- Repeat every cycle; over time the cloud chases the truth and tightens as readings accumulate.
This recipe is exactly Monte Carlo localization: the particle filter applied to the classic problem of figuring out where a robot is on a known map. Its great party trick is localizing from scratch: scatter particles uniformly across the whole map, and as the robot drives and senses, clumps that disagree with the world starve and vanish while the right clump fattens, until the cloud collapses onto the true location. The harder kidnapped-robot case, where an already localized robot is picked up and moved, is usually handled by sprinkling a few fresh random particles into the crowd each cycle so the cloud can re-form somewhere else. A single Gaussian, which must commit to one guess and one spread from the start, simply cannot do either.
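The predict, weight, resample loop can be sketched in a few dozen lines. Everything specific here is an assumption made to keep the example short: a one-dimensional corridor as the map, a sensor that returns a noisy position along it, and the particular noise levels and particle count.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is repeatable

CORRIDOR = 10.0       # hypothetical 1-D map length (m)
STEP = 0.5            # commanded forward motion per cycle (m)
MOTION_NOISE = 0.1    # std dev of motion error (m)
SENSOR_NOISE = 0.3    # std dev of the noisy position reading (m)


def likelihood(reading, particle):
    """Unnormalised Gaussian score: how well this particle explains the reading."""
    err = reading - particle
    return math.exp(-0.5 * (err / SENSOR_NOISE) ** 2)


def pf_step(particles, reading):
    # Predict: move every particle forward, sprinkling in motion noise.
    moved = [p + STEP + random.gauss(0.0, MOTION_NOISE) for p in particles]
    # Weight: score each particle against the latest sensor reading.
    weights = [likelihood(reading, p) for p in moved]
    # Resample: draw a fresh crowd, copying particles in proportion to weight.
    return random.choices(moved, weights=weights, k=len(moved))


# Start with no idea where the robot is: particles spread over the whole map.
particles = [random.uniform(0.0, CORRIDOR) for _ in range(500)]
true_pos = 2.0
for _ in range(10):
    true_pos += STEP
    reading = true_pos + random.gauss(0.0, SENSOR_NOISE)
    particles = pf_step(particles, reading)

estimate = sum(particles) / len(particles)
print(true_pos, estimate)
```

After ten cycles the particle average should sit close to the true position, even though the filter was never told where the robot started. A real map-based system would replace `likelihood` with a comparison between each particle's expected sensor reading and the actual one.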
Choosing: Cost Versus Flexibility
Line the three up and a clear spectrum appears, trading computational cost against how much messiness each can swallow. The EKF is cheapest and simplest, the UKF is a modest step up that handles curvature more gracefully, and the particle filter is the heavyweight that can represent any belief shape at all — but demands the most compute.
Filter  Belief shape     Handles nonlinearity  Cost    Best when
------  ---------------  --------------------  ------  ------------------------------
EKF     one Gaussian     tangent (linearize)   low     mild curve, good initial guess
UKF     one Gaussian     sample sigma points   medium  sharper curve, no derivatives
PF/MCL  any-shape cloud  run true function     high    multi-modal, kidnapped robot
In practice the choice follows the problem. A drone fusing an IMU with GPS, where the estimate stays near the truth and updates run hundreds of times a second, leans EKF or UKF for speed. A ground robot that must figure out its place on a map from scratch, or recover after being picked up and moved, leans on a particle filter precisely because it can hold many rival hypotheses at once. Many real systems even mix them — and all three are members of the same family, recursive Bayes filters that loop predict-then-update forever, differing only in how they draw the belief.