Frames: Every Robot's "Where Am I?"

The same cup, two different answers

Picture a robot arm sitting at a kitchen table with a coffee cup in front of it. Ask the robot "where is the cup?" and it says, "30 centimeters straight ahead of me." Ask the room the same question and it says, "over by the window, on the left half of the table." Both answers are completely correct. They describe the very same cup. Yet the numbers are totally different — and neither one is more "true" than the other.

This is the single most important idea in robot geometry: a location is never just three numbers floating in the air. Three numbers only mean something once you say what you measured them from. "30 cm forward" needs a thing that has a "forward." "By the window" needs a room with walls. Change the thing you measure from, and the numbers change, even though nothing physically moved.

What a frame actually is: an origin and three arrows

The "thing you measure from" has a proper name: a coordinate frame. A frame is just two ingredients glued together. First, an origin: a single chosen point that counts as "zero," the spot you start measuring from. Second, three axes: three arrows pointing out from that origin, usually labeled x, y, and z, each at a right angle to the other two. Together they form a little ruler-set planted in space: one arrow for "forward–back," one for "left–right," one for "up–down."

Once a frame is planted, describing any point becomes a simple recipe: stand at the origin, walk so far along the x arrow, so far along y, so far along z, and you arrive at the point. Those three "how far along each arrow" distances are the point's coordinates in that frame. Move the origin or spin the arrows, and the same physical point earns a brand-new set of three numbers.
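The recipe above is just a little arithmetic, and it can be sketched in a few lines of Python. Everything named here (the `Frame` class, the `point_from_coordinates` helper, and the sample numbers) is an assumption made up for illustration, not part of any robotics library.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Frame:
    """Hypothetical frame: an origin plus three unit-length arrows,
    all written in one common set of background coordinates."""
    origin: Vec3
    x_axis: Vec3
    y_axis: Vec3
    z_axis: Vec3


def point_from_coordinates(frame: Frame, coords: Vec3) -> Vec3:
    """Stand at the origin, then walk coords along each arrow in turn."""
    cx, cy, cz = coords
    return tuple(
        o + cx * ax + cy * ay + cz * az
        for o, ax, ay, az in zip(
            frame.origin, frame.x_axis, frame.y_axis, frame.z_axis
        )
    )


# Two frames sharing an origin; frame_b has its arrows spun a quarter
# turn about z, so its x arrow points where frame_a's y arrow points.
frame_a = Frame((1.0, 2.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
frame_b = Frame((1.0, 2.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 1.0))

# Different coordinates in each frame...
p_from_a = point_from_coordinates(frame_a, (0.5, 0.0, 0.0))
p_from_b = point_from_coordinates(frame_b, (0.0, -0.5, 0.0))

# ...that land on the very same physical point.
print(p_from_a == p_from_b)
```

Spinning the arrows changed the coordinates from (0.5, 0, 0) to (0, -0.5, 0), yet both recipes end at the same spot.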

The room stays put; the robot carries its own

Robots almost always live with at least two frames at once, and the contrast between them is the key to everything that follows. The first is the world frame: a frame bolted to the room itself. Maybe its origin is a corner of the floor, its x arrow runs along one wall, its z arrow points up at the ceiling. It never moves. It is the shared map everyone can agree on.

The second is the body frame: a frame glued to the robot itself, riding along wherever the robot goes. Its origin might sit at the robot's chest or at the base of the arm; its x arrow points "forward," the way the robot faces. When the robot rolls across the room, the body frame rolls with it. When the robot turns, the body frame's arrows turn too. This is exactly the frame that lets the robot say "the cup is 30 cm in front of me" — "in front of me" only has meaning inside a frame that travels with the body.
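To see a body frame "riding along," here is a minimal flat-floor sketch. It assumes the robot's pose is given as a position plus a heading angle in the world frame; the function names and the sample poses are invented for this example.

```python
import math


def body_frame_in_world(x, y, heading_rad):
    """Return (origin, forward_arrow, left_arrow) of a hypothetical
    planar body frame, all expressed in world coordinates."""
    origin = (x, y)
    forward = (math.cos(heading_rad), math.sin(heading_rad))
    left = (-math.sin(heading_rad), math.cos(heading_rad))
    return origin, forward, left


def point_ahead(x, y, heading_rad, distance):
    """World position of the spot `distance` meters in front of the robot."""
    origin, forward, _ = body_frame_in_world(x, y, heading_rad)
    return (origin[0] + distance * forward[0],
            origin[1] + distance * forward[1])


# "30 cm in front of me" at three moments (poses are made up):
ahead_start = point_ahead(1.0, 1.0, 0.0, 0.30)           # facing along world x
ahead_rolled = point_ahead(2.0, 1.0, 0.0, 0.30)          # rolled 1 m forward
ahead_turned = point_ahead(2.0, 1.0, math.pi / 2, 0.30)  # turned a quarter turn

for label, p in [("start", ahead_start), ("rolled", ahead_rolled), ("turned", ahead_turned)]:
    print(label, round(p[0], 3), round(p[1], 3))
```

The phrase "30 cm in front of me" stays the same in the body frame every time, while the world-frame spot it refers to moves whenever the robot rolls or turns.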

Real robots don't stop at two. The camera has its own frame, each joint of the arm has one, the gripper has one, every wheel has one. A single machine can carry a dozen frames, all moving relative to each other moment by moment. Robotics software keeps them organized in a branching structure called a transform tree — but the seed of that whole tree is just this world-versus-body distinction.
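The branching structure can be pictured with nothing more than a table that records, for each frame, which frame it is attached to. The sketch below is a toy, not how any particular robotics package stores its tree, and the frame names are hypothetical.

```python
# Each frame (key) is attached to a parent frame (value).
# Frame names are made up for illustration; "world" is the root.
PARENT = {
    "body": "world",
    "camera": "body",
    "arm_base": "body",
    "gripper": "arm_base",
    "left_wheel": "body",
}


def path_to_root(frame, parent=PARENT):
    """List the frames passed through when climbing from `frame` to the root."""
    path = [frame]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


print(path_to_root("gripper"))
print(path_to_root("camera"))
```

Every frame eventually climbs back to the world frame through the body frame, which is why the world-versus-body distinction sits at the base of the whole tree.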

One point, many numbers — and why conversion matters

Let's make this concrete with a position vector — the formal name for that list of three numbers (x, y, z) that pins down a point relative to a frame. The cup never moves on the table. But here is how it gets described from two frames at the same instant:

Same cup, same instant, two frames:

  in BODY frame  (origin = robot, x = forward):
      cup = ( 0.30, 0.00, 0.00 )   # 30 cm straight ahead

  in WORLD frame (origin = room corner, x = along wall):
      cup = ( 1.85, 2.40, 0.75 )   # near window, table height

The cup did not move. Only the frame we asked in changed.

The numbers disagree because they answer "measured from where?" differently — not because the cup is in two places.
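The two sets of numbers above can be tied together with a short check. The text does not say where the robot sits, so this sketch makes two loud assumptions: the body frame's origin is at (1.55, 2.40, 0.75) in the world frame, and the body arrows happen to point the same way as the world arrows. The helper name is also invented.

```python
import math

# Numbers taken from the example above.
cup_in_body = (0.30, 0.00, 0.00)
cup_in_world = (1.85, 2.40, 0.75)

# ASSUMPTION (not stated in the text): where the body frame's origin
# sits in the world frame, with body arrows parallel to world arrows.
body_origin_in_world = (1.55, 2.40, 0.75)


def body_to_world_aligned(point_in_body, origin_in_world):
    """Valid only when the body arrows are parallel to the world arrows."""
    return tuple(o + p for o, p in zip(origin_in_world, point_in_body))


recovered = body_to_world_aligned(cup_in_body, body_origin_in_world)
matches = all(
    math.isclose(r, w, abs_tol=1e-9) for r, w in zip(recovered, cup_in_world)
)
print(matches)
```

Under those assumptions the two descriptions differ only by the offset between the two origins. As soon as the robot turns, plain addition stops being enough, because the arrows no longer line up.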

Now the practical headache appears. The robot's camera spots the cup and reports its position in the camera's frame. But the arm's controller only accepts targets phrased in the arm's own base frame. And the human supervisor drew a "keep-out zone" using the world frame. Three frames, three different sets of numbers for the same cup, and they cannot be used together until they are all translated into one shared frame.
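One common defensive habit is to store the frame's name alongside the three numbers, so that mixing frames fails loudly instead of silently producing nonsense. The sketch below is one possible way to do that; `FramedPoint`, `distance`, the camera reading, and the keep-out corner are all made up for illustration. Only the world-frame cup position comes from the example above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FramedPoint:
    """Three numbers that always remember which frame they belong to."""
    frame: str
    x: float
    y: float
    z: float


def distance(a: FramedPoint, b: FramedPoint) -> float:
    """Distance between two points, refusing to mix frames."""
    if a.frame != b.frame:
        raise ValueError(
            f"cannot compare a point in '{a.frame}' with a point in "
            f"'{b.frame}'; convert both to a shared frame first"
        )
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


cup_in_world = FramedPoint("world", 1.85, 2.40, 0.75)     # from the example
keep_out_corner = FramedPoint("world", 2.00, 2.50, 0.75)  # hypothetical
cup_in_camera = FramedPoint("camera", 0.05, -0.02, 0.40)  # hypothetical

print(round(distance(cup_in_world, keep_out_corner), 3))

try:
    distance(cup_in_camera, keep_out_corner)
    mixed_frames_rejected = False
except ValueError:
    mixed_frames_rejected = True
print(mixed_frames_rejected)
```

The label does not perform any conversion by itself; it only guarantees that a conversion cannot be forgotten.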

That translation between frames — taking the cup's numbers in one frame and computing its numbers in another — is the engine of almost everything a robot does: seeing, reaching, grasping, navigating. The rest of this track is really one long, careful answer to the question this guide raises: given a point's numbers in one frame, how do we find its numbers in another? Doing it cleanly also needs the frame's twist and tilt — its orientation — not just where its origin sits, and that is exactly where we head next.