Index Notation & the Summation Convention

Why the bold-letter notation runs out of road

Welcome to the first rung of tensor calculus. The earlier guides on vector fields, gradient, divergence and curl gave you a powerful machine — but every formula there quietly assumed flat, square, Cartesian coordinates. The moment the ground curves, or you switch to polar or spherical axes, those tidy formulas grow extra terms and the bold-arrow notation v stops telling you what you need to know. This rung's promise is calculus on a curved surface, even curved spacetime, and to keep that promise you first need a notation honest enough to survive a change of coordinates. That notation is [[calc-index-notation|index notation]].

The idea is to stop hiding a vector's insides inside a single bold letter and instead name its components out loud. Rather than v you write v^i, which means 'the i-th component', with i running over 1, 2, ..., n. A matrix becomes A^i_j; a more elaborate object simply grows more indices, like T^{ij}_k. The whole craft of tensor calculus is then carried out by pushing these indexed symbols around according to a few strict rules — and once those rules are in your hands, the algebra of curved space becomes almost mechanical, which is exactly the relief you want when the geometry itself is hard.

Upstairs and downstairs: contravariant and covariant

Why does the position of an index matter — why is v^i (index up) a different beast from v_i (index down)? Because there are genuinely two natural ways to attach numbers to a vector, and in a skewed or curved frame they disagree. An [[contravariant-and-covariant-components|upper index]] marks a contravariant component: how many of each basis arrow you stack to build the vector — the readings that go with displacements, velocities, ordinary arrows. A lower index marks a covariant component: the perpendicular shadow the vector casts onto each basis direction — the readings that go with gradients and planes of constant value. The names sound mystical but describe something plain: under a change of coordinates the two kinds of numbers move in opposite directions.

Here is the mental picture that fixes 'contra'. Suppose you shrink your ruler — you halve the length of each basis vector. The arrow in the world has not changed, yet to describe it you now need twice as many basis units, so the contravariant components double. The components moved opposite to the basis: basis down, numbers up. That is contravariance. A covariant quantity like the gradient does the reverse — its lower-index components scale the same way as the basis, hence 'co'. Recall from Volume I that the gradient's entries are partial derivatives df/dx^i; partial derivatives carry their coordinate downstairs, which is the deep reason a gradient is born with a lower index and lives covariantly.
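The ruler-shrinking picture can be checked with a few lines of arithmetic. The sketch below is only an illustration: the arrow, the two bases and the helper names `components` and `dot` are made up for the example, and the covariant reading is taken to be the dot product of the arrow with each basis vector.

```python
# Illustrative 2-D check of "basis down, numbers up" (all numbers are made up).

def components(v, e1, e2):
    """Contravariant components (c1, c2) with v = c1*e1 + c2*e2, via Cramer's rule."""
    det = e1[0] * e2[1] - e1[1] * e2[0]
    c1 = (v[0] * e2[1] - v[1] * e2[0]) / det
    c2 = (e1[0] * v[1] - e1[1] * v[0]) / det
    return (c1, c2)

def dot(a, b):
    """Ordinary Cartesian dot product of two 2-D arrows."""
    return a[0] * b[0] + a[1] * b[1]

v = (3.0, 2.0)                     # the fixed arrow "in the world"
e1, e2 = (1.0, 0.0), (0.0, 1.0)    # original basis
h1, h2 = (0.5, 0.0), (0.0, 0.5)    # same directions, every basis arrow halved

contra_before = components(v, e1, e2)     # (3.0, 2.0)
contra_after = components(v, h1, h2)      # (6.0, 4.0): doubled
co_before = (dot(v, e1), dot(v, e2))      # (3.0, 2.0)
co_after = (dot(v, h1), dot(v, h2))       # (1.5, 1.0): halved, like the basis

print(contra_before, contra_after)
print(co_before, co_after)
```

The upper-index readings double while the lower-index readings halve, which is the opposite-direction behaviour the names 'contra' and 'co' refer to.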

In an ordinary orthonormal Cartesian frame the two kinds of numbers happen to coincide — v^i and v_i hold the same values — which is exactly why your first courses never bothered to distinguish them and why you can be forgiven for thinking up and down are decoration. They are not. The instant the basis is non-orthonormal or position-dependent (polar, spherical, a curved surface), the two readings split apart, and confusing them produces wrong formulas. The dictionary that translates between up and down is the [[metric-tensor|metric tensor]] g_{ij}, and the operation of raising and lowering indices, v_i = g_{ij} v^j, is the next guide's business; for now just hold that up and down are two faces of one arrow.
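To see the two readings split apart, here is a minimal sketch with a deliberately skewed basis. The basis vectors and components are invented for the illustration, and the metric is assumed to be built the usual way, as the table of dot products of the basis vectors, g_{ij} = e_i . e_j.

```python
# Skewed (non-orthonormal) 2-D basis; numbers chosen only for illustration.
e = [(1.0, 0.0), (1.0, 1.0)]
v_up = [1.0, 2.0]                  # contravariant components v^j

def dot(a, b):
    """Ordinary Cartesian dot product."""
    return sum(x * y for x, y in zip(a, b))

# The arrow itself: v = v^j e_j
v = tuple(sum(v_up[j] * e[j][k] for j in range(2)) for k in range(2))

# Metric from the basis: g_ij = e_i . e_j
g = [[dot(e[i], e[j]) for j in range(2)] for i in range(2)]

# Lowering the index: v_i = g_ij v^j
v_down = [sum(g[i][j] * v_up[j] for j in range(2)) for i in range(2)]

# The same numbers obtained directly as shadows on the basis: v_i = v . e_i
v_down_direct = [dot(v, e[i]) for i in range(2)]

print(v)              # (3.0, 2.0)
print(g)              # [[1.0, 1.0], [1.0, 2.0]]
print(v_up, v_down)   # [1.0, 2.0] versus [3.0, 5.0]
```

With an orthonormal basis the metric computed this way is the identity table, and the same code returns v_down equal to v_up, which is the Cartesian coincidence described above.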

Einstein's one-line rule

Now the centerpiece. Physics is awash in sums that all look alike. A dot product is u^1 v_1 + u^2 v_2 + u^3 v_3. The i-th entry of a matrix times a vector is A^i_1 x^1 + A^i_2 x^2 + A^i_3 x^3, that is, a sum of A^i_k x^k over k. Go to four dimensions and curved space and these summation signs sprout everywhere, cluttering the page. The [[einstein-summation-convention|Einstein summation convention]] is a single bookkeeping rule that lets you erase almost every sigma. The story goes that Einstein joked that suppressing the summation sign was his great discovery in mathematics, and only half in jest, because the rule does real work.

The rule is exactly this: whenever an index appears twice in a single term — once upstairs and once downstairs — you automatically sum over it across its whole range, with no sigma written. So a^i b_i secretly means a^1 b_1 + a^2 b_2 + ... + a^n b_n. A repeated index summed away like this is called a dummy (or summed) index, and its name does not matter at all: a^i b_i and a^k b_k denote the very same number, so you may rename a dummy freely to avoid a clash. An index that appears only once in a term is a free index; it labels which component you mean, and it must appear, in the same up-or-down slot, in every term and on both sides of an equation.

READING AN INDEXED EXPRESSION

  a^i b_i   ( n = 3 )   ->   a^1 b_1 + a^2 b_2 + a^3 b_3     ( one hidden sum )

  i is a DUMMY index : up once, down once, summed, name irrelevant
     a^i b_i  =  a^k b_k  =  a^m b_m            ( all the same number )

  In   c^i = A^i_j x^j :
     j  appears up-and-down  -> DUMMY  (summed: matrix times vector)
     i  appears once each side -> FREE   (labels the 3 components of c)

  ds^2 = g_{ij} dx^i dx^j   ->   TWO dummy pairs (i and j)
     a double sum: 9 terms in 3-D, written as 4 symbols

  INDEX-BALANCE CHECK (do this on every line):
     free indices on the LEFT must match free indices on the RIGHT,
     same names, same up/down slots.  If they don't, you slipped.

How to read repeated (dummy) versus single (free) indices, and the one-second index-balance check that catches most algebra slips.
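If it helps to see the hidden sums written out, each dummy index corresponds to one explicit loop and each free index to one slot in the output. The following sketch uses made-up numbers, and Python lists counting from 0 stand in for indices running from 1 to n.

```python
n = 3

# a^i b_i : i is a dummy index, so it is summed over its whole range.
a_up = [1, 2, 3]
b_down = [4, 5, 6]
s = sum(a_up[i] * b_down[i] for i in range(n))          # 4 + 10 + 18 = 32
s_renamed = sum(a_up[k] * b_down[k] for k in range(n))  # renaming the dummy changes nothing

# c^i = A^i_j x^j : j is summed (dummy), i survives (free) and labels the output.
A = [[1, 0, 2],
     [0, 1, 0],
     [3, 0, 1]]
x_up = [1, 2, 3]
c_up = [sum(A[i][j] * x_up[j] for j in range(n)) for i in range(n)]   # [7, 2, 6]

# ds^2 = g_ij dx^i dx^j : two dummy pairs, so a double sum with n*n terms.
g = [[1, 0, 0],
     [0, 2, 1],
     [0, 1, 3]]                     # an arbitrary symmetric table, for illustration only
dx = [1, 1, 1]
terms = [g[i][j] * dx[i] * dx[j] for i in range(n) for j in range(n)]
ds2 = sum(terms)                    # 8, built from 9 terms

print(s, s_renamed, c_up, len(terms), ds2)
```

No index is left over in `s` or `ds2` (they are plain numbers), while `c_up` keeps exactly one slot, matching its one free index.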

The two glue symbols: delta and epsilon

Two special symbols turn the rules above into a working algebra. The first is the Kronecker delta delta^i_j, defined as 1 when i = j and 0 otherwise. It is just the n-by-n identity matrix wearing indices (in two dimensions, [1, 0; 0, 1]). Its job is to rename and contract: in a sum like delta^i_j v^j, the delta is nonzero only when j equals i, so the whole sum collapses to v^i. In other words delta^i_j acts as a 'substitution operator' that hunts down a dummy index and replaces it with the delta's other index. Whenever a calculation produces a Kronecker delta contracted with another factor, you can simplify on the spot by letting it eat that dummy index.
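The substitution behaviour is easy to confirm by brute force. This is a small sketch with an invented vector; `delta` is a helper written for the example, not a library function.

```python
def delta(i, j):
    """Kronecker delta: 1 when the two indices agree, 0 otherwise."""
    return 1 if i == j else 0

n = 3
v_up = [7, 11, 13]     # made-up components v^j

# delta^i_j v^j : the sum over j keeps only the j == i term, leaving v^i.
collapsed = [sum(delta(i, j) * v_up[j] for j in range(n)) for i in range(n)]

# Contracting the delta with itself counts the dimension: delta^i_i = n.
trace = sum(delta(i, i) for i in range(n))

print(collapsed, trace)   # [7, 11, 13] 3
```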

The second is the Levi-Civita symbol epsilon_{ijk}, which is +1 for an even permutation of (1, 2, 3), -1 for an odd permutation, and 0 whenever any two indices are equal. This little antisymmetric object is exactly what encodes cross products and determinants. The cross product, which you met inside the curl nabla cross F earlier in this volume, becomes in components c^i = epsilon^{ijk} a_j b_k (in Cartesian coordinates, where epsilon^{ijk} takes the same values as epsilon_{ijk}). Watch the index balance: i is the lone free index labelling the three components of the output, while j and k are each summed-away dummy pairs. The determinant of a 3-by-3 matrix is likewise a single epsilon-contraction, det A = epsilon_{ijk} A^i_1 A^j_2 A^k_3. So between them the two symbols capture identity, swapping, rotation and orientation: a large part of the working vocabulary of linear algebra, packed into delta and epsilon.
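Both uses of epsilon can be spelled out as explicit triple sums. The sketch below works in Cartesian components, uses zero-based indices 0, 1, 2 as stand-ins for 1, 2, 3, and takes the determinant in the column form det M = epsilon_{ijk} M^i_1 M^j_2 M^k_3; the vectors and the matrix are invented for the example.

```python
EVEN = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}   # even permutations of (0, 1, 2)
ODD = {(0, 2, 1), (2, 1, 0), (1, 0, 2)}    # odd permutations of (0, 1, 2)

def epsilon(i, j, k):
    """Levi-Civita symbol: +1 even permutation, -1 odd, 0 if any index repeats."""
    if (i, j, k) in EVEN:
        return 1
    if (i, j, k) in ODD:
        return -1
    return 0

R = range(3)

# Cross product: c^i = epsilon^{ijk} a_j b_k  (i free; j and k summed)
a = [1, 2, 3]
b = [4, 5, 6]
c = [sum(epsilon(i, j, k) * a[j] * b[k] for j in R for k in R) for i in R]

# Determinant: det M = epsilon_{ijk} M[i][0] M[j][1] M[k][2]  (all three summed)
M = [[2, 0, 1],
     [1, 3, 2],
     [0, 1, 4]]
det = sum(epsilon(i, j, k) * M[i][0] * M[j][1] * M[k][2]
          for i in R for j in R for k in R)

print(c, det)   # [-3, 6, -3] 21
```

Of the 27 terms in each triple sum only the 6 permutations survive, which is why the epsilon form reproduces the familiar six-term expansions.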

Contraction, free indices, and what it earns you

The single most important move in the whole notation is contraction: pairing an upper index with a lower index of the same name and summing, as in v^i w_i. This is not idle shorthand — it is precisely the operation that builds coordinate-independent quantities. Because an upper index transforms by the Jacobian of the coordinate change and a lower index by the inverse Jacobian, the two transformation factors are inverses of each other; contracting them makes the factors cancel exactly, and what survives is a true geometric scalar that every observer, in every frame, agrees on. The dot product v^i w_i is the cleanest example: one number, frame-independent, born from a single contraction.
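The cancellation can be watched numerically. In this sketch a constant matrix J plays the role of the Jacobian of a linear coordinate change; J, its inverse and the component values are all chosen for illustration (integers with determinant 1, so the arithmetic is exact).

```python
# An illustrative linear change of coordinates and its inverse.
J = [[2, 1],
     [1, 1]]          # stands in for the Jacobian; det = 1
J_inv = [[1, -1],
         [-1, 2]]     # inverse of J

v_up = [3, 2]         # v^j, upper index
w_down = [1, 4]       # w_j, lower index
R = range(2)

# Upper index transforms with J:             v'^i = J^i_j v^j
v_up_new = [sum(J[i][j] * v_up[j] for j in R) for i in R]

# Lower index transforms with the inverse:   w'_i = (J^-1)^j_i w_j
w_down_new = [sum(J_inv[j][i] * w_down[j] for j in R) for i in R]

# Up-down contraction before and after the change of coordinates.
scalar_old = sum(v_up[i] * w_down[i] for i in R)
scalar_new = sum(v_up_new[i] * w_down_new[i] for i in R)

# For contrast: pairing two UPPER indices is not a legal contraction.
naive_old = sum(v_up[i] * v_up[i] for i in R)
naive_new = sum(v_up_new[i] * v_up_new[i] for i in R)

print(scalar_old, scalar_new)   # 11 11
print(naive_old, naive_new)     # 13 89
```

The up-down pairing gives 11 in both coordinate systems, while the up-up pairing changes from 13 to 89, which is why the convention insists on one index upstairs and one downstairs.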

This is also where the notation earns its keep as a tensor language. A genuine [[tensor-transformation-law|tensor equation]] is one written as 'tensor = tensor' with the same free indices, in the same up/down slots, on both sides. The transformation law then guarantees something remarkable: if such an equation holds in one coordinate system, it holds in every coordinate system at once, so a single derivation establishes a law of nature in all frames. That is the principle of general covariance, and it is the reason relativity, continuum mechanics and electromagnetism are all written this way. Be honest about one limit, though: having indices does NOT make an object a tensor. The connection coefficients you will soon meet (the Christoffel symbols) carry indices but fail the transformation law. So does the plain partial derivative of a tensor's components, and the covariant derivative was invented to pair the two so that their non-tensorial pieces cancel.
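For reference, here is what 'failing the transformation law' looks like on paper. The block is a sketch in standard textbook form; the primed coordinates x' and the placement of indices on Gamma are notational assumptions that a later guide may write differently.

```latex
% A contravariant vector: homogeneous in the Jacobian, so it is a tensor.
v'^{k} = \frac{\partial x'^{k}}{\partial x^{m}}\, v^{m}

% The Christoffel symbols: the tensor-like pattern PLUS an extra term
% containing second derivatives of the coordinate change.
\Gamma'^{k}{}_{ij}
  = \frac{\partial x'^{k}}{\partial x^{m}}
    \frac{\partial x^{p}}{\partial x'^{i}}
    \frac{\partial x^{q}}{\partial x'^{j}}\, \Gamma^{m}{}_{pq}
  + \frac{\partial x'^{k}}{\partial x^{m}}
    \frac{\partial^{2} x^{m}}{\partial x'^{i}\, \partial x'^{j}}
```

The second term vanishes only when the change of coordinates is linear, so in general the Gammas can be zero in one coordinate system and nonzero in another, something no genuine tensor can do.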

1. Write every component with a named index, upper for contravariant slots (arrows, v^i) and lower for covariant slots (gradients, w_i). The position is data, not decoration.
2. Spot the dummy pairs (each index that is up once and down once) and read them as hidden sums via the Einstein convention; rename any dummy that would collide.
3. Run the index-balance check: the free indices left and right must match in name and in up/down position. A mismatch, or any index appearing three times, means an error; fix it before going on.
4. Use delta to substitute and simplify, use epsilon for cross products and determinants, and contract an up with a down whenever you want a quantity all coordinate systems will agree on.
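The index-balance check is mechanical enough to automate. The sketch below is one possible way to do it, not a standard tool: each factor is encoded as a hypothetical (symbol, upper indices, lower indices) triple, so ("A", "i", "j") stands for A^i_j, and the function names `classify` and `balanced` are invented for the example.

```python
from collections import Counter

def classify(factors):
    """Split the indices of one term into (free, dummy).

    `factors` is a list of (symbol, upper, lower) triples, where `upper` and
    `lower` are strings of single-letter index names.  Raises ValueError for
    an index used twice in the same position or more than twice overall.
    """
    ups = Counter(ch for _, upper, _ in factors for ch in upper)
    downs = Counter(ch for _, _, lower in factors for ch in lower)
    free, dummy = set(), set()
    for name in set(ups) | set(downs):
        u, d = ups[name], downs[name]
        if u + d == 1:
            free.add((name, "up" if u else "down"))   # appears once: free
        elif u == 1 and d == 1:
            dummy.add(name)                           # up once, down once: summed
        else:
            raise ValueError(
                f"index {name!r} appears {u} time(s) up and {d} time(s) down")
    return free, dummy

def balanced(lhs, rhs):
    """True when both sides carry the same free indices in the same slots."""
    return classify(lhs)[0] == classify(rhs)[0]

# c^i = A^i_j x^j : j is a dummy, i is free and upstairs on both sides.
rhs = [("A", "i", "j"), ("x", "j", "")]
print(classify(rhs))                      # free {('i', 'up')}, dummy {'j'}
print(balanced([("c", "i", "")], rhs))    # True
print(balanced([("c", "", "i")], rhs))    # False: i is downstairs on the left
```

Calling `classify([("a", "i", ""), ("b", "i", "")])` raises a ValueError, because a^i b^i repeats an index without pairing it up and down.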