Looking Ahead: LQR, Optimal Control, and MPC

From tuning knobs to stating a goal

In earlier rungs you tuned a controller by hand — nudge a gain up, watch it wobble, nudge it back. That works for one motor, but it gets exhausting for a robot with many joints that all push and pull on each other. Optimal control flips the problem around: instead of fiddling with knobs, you write down what you actually want, and let the math hand you the controller that achieves it best.

First you describe the robot as a state-space model: a compact bookkeeping of its state (the positions and velocities of every joint) and of how today's state plus the control you apply rolls into tomorrow's state. It draws on the same ingredient as inverse dynamics, a model of how the machine moves, but runs that model forward: given the state and the control, it predicts what happens next.
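To make the bookkeeping concrete, here is a minimal sketch of a state-space model for a single joint. The unit inertia, the 10 ms tick, and the names A, B, and step are assumptions chosen for illustration, not something a real robot hands you.

```python
import numpy as np

DT = 0.01  # control period in seconds (assumed)

# Hypothetical single joint with unit inertia.
# State x = [angle, angular velocity], control u = commanded acceleration.
A = np.array([[1.0, DT],
              [0.0, 1.0]])
B = np.array([[0.5 * DT**2],
              [DT]])


def step(x, u):
    """Roll today's state x and control u into tomorrow's state."""
    return A @ x + B @ u


x = np.array([0.0, 0.0])   # start at rest at angle zero
u = np.array([2.0])        # constant acceleration of 2 rad/s^2
for _ in range(100):       # simulate one second
    x = step(x, u)
print("state after 1 s:", x)
```

After one second the state should sit close to an angle of 1 rad and a speed of 2 rad/s, which is what constant acceleration predicts.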

Then you write a cost — a single number you want as small as possible. It usually trades off two things you can never fully have at once: tracking error (how far you are from the set-point you want) and control effort (how hard the motors strain). Want tight tracking? Penalize error heavily. Want a gentle, energy-sipping robot? Penalize effort. The cost is where your intentions live.
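A small sketch of such a cost, assuming one scalar weight on error and one on effort. The two candidate behaviours and all the numbers are made up purely to show how the weights change which one looks cheaper.

```python
import numpy as np


def step_cost(x, x_target, u, q_weight, r_weight):
    """Cost of one time step: weighted error squared plus weighted effort squared."""
    error = x - x_target
    return q_weight * float(error @ error) + r_weight * float(u @ u)


x_target = np.array([1.0, 0.0])   # desired [angle, angular velocity]

# Two hypothetical situations to compare.
snappy_x, snappy_u = np.array([0.9, 0.0]), np.array([5.0])   # near the target, motors straining
gentle_x, gentle_u = np.array([0.5, 0.0]), np.array([1.0])   # further away, motors relaxed

for label, q_weight, r_weight in [("tight tracking", 100.0, 0.1),
                                  ("energy saving", 1.0, 1.0)]:
    snappy = step_cost(snappy_x, x_target, snappy_u, q_weight, r_weight)
    gentle = step_cost(gentle_x, x_target, gentle_u, q_weight, r_weight)
    print(label, "prefers", "snappy" if snappy < gentle else "gentle")
```

With error punished heavily the snappy behaviour scores lower; with effort punished as much as error, the gentle one wins.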

LQR: the best linear feedback, in closed form

When the model is linear and the cost is quadratic (weighted error squared plus weighted effort squared), the problem has a beautiful, exact answer. It's called the linear-quadratic regulator, or LQR. "Quadratic" means the cost is built from squared terms, which is what makes the math solvable in one clean shot; "regulator" means it drives the state back toward zero (or, after shifting the coordinates, toward your target).

What LQR hands you is a single gain matrix K. The control law is just u = −Kx: measure the full state x, multiply by K, and that's your command. It is full-state feedback — every joint's command can depend on every other joint's position and speed, which is exactly the coupling that hand-tuning a separate PID loop per joint can never quite capture.
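As a rough sketch of where K comes from, the snippet below iterates the discrete-time Riccati equation until it stops changing and then reads the gain off the result. The joint model, the weights, and the function name lqr_gain are assumptions for illustration; in practice you would more likely call a dedicated solver such as SciPy's solve_discrete_are.

```python
import numpy as np

DT = 0.01
A = np.array([[1.0, DT], [0.0, 1.0]])     # hypothetical unit-inertia joint
B = np.array([[0.5 * DT**2], [DT]])
Q = np.diag([100.0, 1.0])                 # assumed weight on [angle, velocity] error
R = np.array([[0.1]])                     # assumed weight on control effort


def lqr_gain(A, B, Q, R, tol=1e-10, max_iters=100_000):
    """Iterate the discrete Riccati equation to a fixed point and return K."""
    P = Q.copy()
    for _ in range(max_iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) <= tol * np.max(np.abs(P_next))
        P = P_next
        if converged:
            break
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


K = lqr_gain(A, B, Q, R)
x = np.array([0.2, 0.0])                  # 0.2 rad of error, at rest
u = -K @ x                                # the whole control law
print("K =", K, "u =", u)
```

Once K exists, the last two lines are all that runs on the robot: measure x, multiply, send u.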

The knobs you still turn are two weight matrices, usually called Q and R. Q is how much you punish state error; R is how much you punish control effort. Crank Q up and the robot becomes aggressive and snappy but works the motors hard. Crank R up and it becomes calm and frugal but lazier about tracking. You are no longer guessing gains one at a time — you are tuning two dials that express a single honest tradeoff, and the gains fall out automatically.
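The tradeoff is easiest to see on a one-state toy system, where Q and R shrink to plain numbers q and r. This is only a sketch: the model x_next = x + 0.1 * u and the chosen weights are assumptions, and the gain is found by iterating the scalar Riccati equation.

```python
A_COEF, B_COEF = 1.0, 0.1   # hypothetical one-state model: x_next = x + 0.1 * u


def lqr_gain(q, r, iters=5000):
    """Scalar LQR gain from repeated Riccati updates."""
    p = q
    for _ in range(iters):
        k = A_COEF * B_COEF * p / (r + B_COEF**2 * p)
        p = q + A_COEF * p * (A_COEF - B_COEF * k)
    return A_COEF * B_COEF * p / (r + B_COEF**2 * p)


def simulate(k, x0=1.0, steps=30):
    """Run u = -k * x and report the leftover error and the peak effort."""
    x, peak_effort = x0, 0.0
    for _ in range(steps):
        u = -k * x
        peak_effort = max(peak_effort, abs(u))
        x = A_COEF * x + B_COEF * u
    return abs(x), peak_effort


results = {}
for r in (0.01, 1.0):                      # cheap effort versus expensive effort
    k = lqr_gain(q=1.0, r=r)
    error, peak = simulate(k)
    results[r] = (k, error, peak)
    print(f"r={r}: gain={k:.3f}, error after 30 ticks={error:.2e}, peak effort={peak:.3f}")
```

The small r should give the larger gain, the smaller leftover error, and the larger peak command; raising r reverses all three.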

MPC: plan a little, act, repeat

LQR is brilliant but blind in one way: it has no notion of limits. It will happily ask for more torque than a motor can deliver if the math says so. Model predictive control, or MPC, fixes this by thinking ahead — like a chess player, only a few moves at a time.

1. Predict: at this instant, use the model to simulate the robot forward over a short horizon, say the next 1–2 seconds, for a candidate sequence of moves.
2. Optimize: search over those sequences to find the one with the lowest cost (best tracking, least effort) while obeying constraints you bake in as hard rules.
3. Act: apply only the very first move of that plan, and throw the rest away.
4. Repeat: one tick later, measure the real state again and redo the whole thing. This constant re-planning is what makes MPC robust to surprises.

The superpower is a single word in step two: constraints. MPC can be told, as iron law, that a motor saturates at its torque ceiling, that a joint can't bend past its mechanical stop, that a legged robot's foot must stay inside its support area, or that a drone must not tilt beyond a safe angle. Where a PID loop fights actuator saturation after the fact, MPC simply never plans a move it can't actually make.

If this feels like a cousin of trajectory optimization, it is — MPC is essentially solving a small trajectory-optimization problem from scratch at every single control tick, dozens or hundreds of times a second.
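The predict, optimize, act, repeat cycle can be sketched in a few dozen lines. Everything specific here is an assumption: the unit-inertia joint, the weights, the limit U_MAX, and above all the solver, a plain projected-gradient loop picked because it is short. Production MPC typically relies on a dedicated QP or nonlinear solver instead.

```python
import numpy as np

DT = 0.1       # control period in seconds (assumed)
N = 10         # horizon in steps, i.e. 1 second of look-ahead
U_MAX = 1.0    # hypothetical actuator limit on |u|

# Hypothetical joint: state = [angle, angular velocity], u = acceleration.
A = np.array([[1.0, DT], [0.0, 1.0]])
B = np.array([[0.5 * DT**2], [DT]])
Q = np.diag([10.0, 1.0])   # weight on state error
R = 0.1                    # weight on control effort

# Predict: stacked future states X = [x_1; ...; x_N] = Sx @ x0 + Su @ U.
Sx = np.vstack([np.linalg.matrix_power(A, i + 1) for i in range(N)])
Su = np.zeros((2 * N, N))
for i in range(N):
    for j in range(i + 1):
        Su[2 * i:2 * i + 2, j:j + 1] = np.linalg.matrix_power(A, i - j) @ B
Qbar = np.kron(np.eye(N), Q)
H = Su.T @ Qbar @ Su + R * np.eye(N)                # quadratic part of the cost in U
STEP = 1.0 / (2.0 * np.linalg.eigvalsh(H).max())    # safe gradient step size


def plan(x0, iters=300):
    """Optimize: minimize the horizon cost over U subject to |u_k| <= U_MAX."""
    g = Su.T @ Qbar @ (Sx @ x0)                     # linear part of the cost in U
    U = np.zeros(N)                                 # start from scratch every tick
    for _ in range(iters):
        grad = 2.0 * (H @ U + g)
        U = np.clip(U - STEP * grad, -U_MAX, U_MAX)  # project back inside the limits
    return U


x = np.array([1.0, 0.0])       # start 1 rad from the target, at rest
u_log = []
for _ in range(100):           # Repeat: 10 seconds of closed-loop ticks
    u = plan(x)[0]             # Act: keep only the first planned move
    u_log.append(u)
    x = A @ x + B[:, 0] * u    # the "real" robot, here simply the same model

print("final state:", x)
print("largest command:", max(abs(u) for u in u_log))
```

The first command should come out pinned at the limit, and no command ever exceeds it, because the limit is part of the optimization rather than a clamp applied afterwards.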

Choosing between them, and where they show up

The tradeoff is mostly about cost — computational cost this time. LQR solves its math once, offline, then runs forever as a trivial matrix multiply: blazing fast, almost free per tick, but it cannot honor a single constraint. MPC re-solves an optimization live, every tick: vastly more capable, but hungry for computation and for a trustworthy model.
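Both halves of LQR's side of that bargain fit in a few lines. The gain K and the torque ceiling U_MAX below are hypothetical numbers, with K assumed to have been solved offline.

```python
import numpy as np

K = np.array([[30.0, 8.0]])   # hypothetical gain, assumed precomputed offline
U_MAX = 5.0                   # hypothetical torque ceiling


def lqr_tick(x):
    """Everything LQR does online: one small matrix multiply."""
    return -K @ x


x = np.array([0.5, 0.0])      # half a radian from the target, at rest
u = lqr_tick(x)
print("requested:", u[0], "deliverable:", np.clip(u, -U_MAX, U_MAX)[0])
```

The tick is nearly free, but the requested command is three times what the motor can deliver, and any clipping happens outside the math that chose K.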

So engineers reach for LQR when the robot stays near a known operating point and constraints rarely bite — balancing a quadrotor in hover, steadying an inverted pendulum, fast inner loops where every microsecond counts. They reach for MPC when limits are the whole game — a legged robot placing feet on rough terrain without toppling, a self-driving car threading traffic while respecting acceleration and lane bounds, a drone flying aggressively right up against its thrust limits.

Both methods assume you can see the full state x, but real sensors only give you pieces of it. That gap is filled by a state estimator — a state observer or a Kalman filter — which reconstructs the missing state from measurements and feeds it in. Optimal control and good estimation are two halves of the same machine; the next track explores the estimation half.
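As a taste of that estimation half, here is a minimal sketch of a state observer that rebuilds velocity from angle measurements alone. The joint model and the hand-picked gain L are assumptions; a Kalman filter would instead derive its gain from how noisy the sensor and the model are.

```python
import numpy as np

DT = 0.01
A = np.array([[1.0, DT], [0.0, 1.0]])   # hypothetical unit-inertia joint
B = np.array([[0.5 * DT**2], [DT]])
C = np.array([[1.0, 0.0]])              # the sensor reports the angle only
L = np.array([[0.4], [4.0]])            # hand-picked observer gain (assumption)

x_true = np.array([1.0, -0.5])          # real state, unknown to the controller
x_hat = np.zeros(2)                     # the estimator starts with a wrong guess
u = np.array([0.0])                     # no control applied in this demo

for _ in range(200):                    # two seconds of measurements
    y = C @ x_true                      # what the encoder actually reports
    # Predict with the model, then correct by the measurement mismatch.
    x_hat = A @ x_hat + B @ u + L @ (y - C @ x_hat)
    x_true = A @ x_true + B @ u

print("true:", x_true, "estimate:", x_hat)
```

The estimate should lock onto the true state, velocity included, even though velocity was never measured. That reconstructed x is what u = −Kx or the MPC planner would consume.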

loop every tick:
    x    = estimate_state()          # from sensors, via the estimator
    plan = optimize(model, cost,
                    x,               # start the prediction from the current state
                    constraints,     # torque, joint, safety limits
                    horizon = N)     # simulate N steps ahead
    u    = plan.first_move           # apply only step 1
    send_to_motors(u)
    # next tick: re-measure, re-plan from scratch

The MPC loop in pseudo-code: estimate, optimize over a horizon under constraints, apply the first move, then redo it all next tick.