Uniqueness: When a Transform Pins Down a Distribution

The quiet promise behind every transform trick

Step back and notice what you have been doing all rung. You took a sum of independent variables, multiplied their transforms, recognized the product as some known transform, and then declared, "so the sum has that distribution." That last step is the whole game, and it quietly assumes something deep: that recognizing the transform is the same as recognizing the distribution. A transform like the moment generating function or the characteristic function squashes an entire distribution down into one tidy function of a dummy variable. The uniqueness theorem is the guarantee that no information was lost in the squashing — that you can always unsquash back to exactly one distribution.

Why is this not obvious? A transform is a single function, and a distribution is also a single object, but it is far from clear that the map between them is one-to-one. Plenty of natural summaries are not: two very different distributions can share the same mean, or even the same mean and variance, or — as you saw two guides ago — the very same first ten moments while still differing. So if knowing a handful of moments does not pin down a distribution, why should knowing a transform? The answer is that a transform is not a handful of numbers; it is a whole function, an uncountable infinity of values, and that turns out to be exactly enough information to recover the distribution completely.

Why a whole function is enough, when a few moments are not

Hold the discrete case in your hand first, because there the magic is transparent. The probability generating function of a non-negative integer variable is G(s) = E[s^X] = P(X=0) + P(X=1) s + P(X=2) s^2 + .... This is just an ordinary power series whose coefficients ARE the probabilities. Two power series that agree on an interval around s = 0 must have identical coefficients, term by term; that is the rigidity of polynomials and power series. So if two count variables have the same pgf, then P(X=k) = P(Y=k) for every k, and they are the same distribution. Uniqueness here is not a miracle; it is the statement that a power series remembers each of its coefficients.
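To watch a power series "remember" its coefficients, here is a minimal Python sketch that recovers the probabilities from a pgf supplied only as a function. The method (averaging over points on the unit circle, a discrete form of Cauchy's coefficient formula) is one convenient choice for illustration, not something this guide relies on. The names pmf_from_pgf and poisson_pgf and the rate lam = 2.0 are assumptions made up for the example.

```python
import cmath
import math

def pmf_from_pgf(pgf, n_terms, n_points=64):
    """Approximate P(X=k), k = 0..n_terms-1, from values of the pgf alone.

    Averages G(s) * s^(-k) over n_points equally spaced points on the unit
    circle, which isolates the coefficient of s^k (up to a tiny aliasing
    error from the coefficients at k + n_points, k + 2*n_points, ...).
    """
    probs = []
    for k in range(n_terms):
        total = 0j
        for m in range(n_points):
            s = cmath.exp(2j * math.pi * m / n_points)
            total += pgf(s) * s ** (-k)
        probs.append((total / n_points).real)
    return probs

lam = 2.0  # assumed Poisson rate, chosen only for illustration

def poisson_pgf(s):
    # pgf of a Poisson(lam) count: G(s) = exp(lam * (s - 1))
    return cmath.exp(lam * (s - 1))

recovered = pmf_from_pgf(poisson_pgf, 6)
exact = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(6)]
for k, (r, e) in enumerate(zip(recovered, exact)):
    print(k, round(r, 10), round(e, 10))
```

The function never sees the Poisson formula for P(X=k); it only evaluates G, and the probabilities fall out anyway.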

The continuous case works the same way in spirit, just with an integral instead of a sum. The Fourier inversion formula does the unsquashing explicitly: given the characteristic function phi(t) = E[e^(itX)], you can reconstruct the density by integrating phi(t) e^(-itx) over all t and dividing by 2 pi. This version works whenever phi is absolutely integrable, which also guarantees that a density exists; a companion inversion formula recovers the cumulative distribution function in every other case. You do not need to ever run this integral by hand; its mere existence is the point. It says the recipe from distribution to transform is reversible, so the map is one-to-one, so the transform is a faithful fingerprint. This connects straight to a theme from much earlier in the ladder: the distribution is the complete description of a random variable, and a transform is just that same complete description written in a different alphabet.
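If you would like to see the unsquashing happen once, the sketch below runs the inversion integral numerically for the standard normal, whose characteristic function is exp(-t^2/2). It is a rough illustration under stated assumptions: the helper name density_from_cf, the cutoff t_max = 12 and the trapezoid rule are all choices made for this example.

```python
import cmath
import math

def density_from_cf(cf, x, t_max=12.0, n=4000):
    """Approximate (1 / (2*pi)) * integral of cf(t) * e^(-i t x) dt over [-t_max, t_max].

    Uses the trapezoid rule; assumes cf is negligible outside the cutoff.
    """
    h = 2.0 * t_max / n
    total = 0j
    for j in range(n + 1):
        t = -t_max + j * h
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * cf(t) * cmath.exp(-1j * t * x)
    return (total * h / (2.0 * math.pi)).real

def normal_cf(t):
    # Characteristic function of the standard normal.
    return math.exp(-t * t / 2.0)

for x in (0.0, 0.5, 2.0):
    recovered = density_from_cf(normal_cf, x)
    exact = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    print(x, recovered, exact)
```

The two printed columns should agree closely, which is the reversibility claim in action: the density was rebuilt from the transform alone.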

The mgf version and its fine print

The cleanest practical statement is the mgf uniqueness theorem: if two random variables have moment generating functions that are equal and finite on some open interval of t containing zero, then they have the same distribution. The phrase "on an open interval around zero" is the load-bearing fine print. It is not enough that the mgfs match at a few scattered points, and it is not enough that they match only at zero (every mgf equals 1 at t = 0, which tells you nothing). You need a whole little neighborhood of agreement: an mgf that is finite near zero equals its own power series there, with the moments as coefficients, so agreement on the neighborhood forces agreement of every moment and of the function itself.

But the theorem has a real precondition that you must respect: the mgf has to exist near zero in the first place, and for some perfectly ordinary distributions it does not. The lognormal distribution has all its moments finite, yet its mgf is infinite for every positive t, so the mgf simply is not available to apply the theorem to. Worse, the lognormal is the textbook example of a distribution that is NOT determined by its moments — there is a whole family of distinct distributions sharing the lognormal's every moment. This is the sharpest possible warning that an mgf, when it exists, is far stronger than a moment list, and that an absent mgf is a genuine wall, not a formality.
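A small calculation makes the "infinite for every positive t" claim concrete. For a non-negative variable and t > 0, the mgf equals the series of terms t^n E[X^n] / n! (every term is non-negative, so sum and expectation can be swapped). For the standard lognormal, assumed here to be exp of a standard normal, the moments are E[X^n] = exp(n^2 / 2). The sketch below prints the logarithm of each term; the helper name log_series_term and the choice t = 0.01 are illustrative.

```python
import math

def log_series_term(t, n):
    """Natural log of t^n * E[X^n] / n! for X standard lognormal.

    Uses the lognormal moments E[X^n] = exp(n^2 / 2); lgamma(n + 1) = log(n!).
    Working on the log scale avoids floating-point overflow.
    """
    return n * math.log(t) + n * n / 2.0 - math.lgamma(n + 1)

t = 0.01  # a small positive t, chosen for illustration
for n in (1, 5, 10, 20, 40, 80):
    print(n, round(log_series_term(t, n), 2))
```

The early terms are tiny, but the n^2 / 2 in the exponent eventually overwhelms both n log(t) and log(n!), so the terms grow without bound and the series cannot converge, however small t is.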

This is exactly the moment when the previous guide pays off. The characteristic function carries an even stronger uniqueness theorem — characteristic functions uniquely determine distributions, full stop — and it comes with no precondition, because the characteristic function of any random variable always exists (it is the expectation of e^(itX), whose magnitude is bounded by 1, so the integral can never blow up). So the lognormal that defeats the mgf is still uniquely pinned down by its characteristic function. The transform you reach for can fail; the underlying uniqueness, properly stated, never does.
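The "can never blow up" remark can be checked on simulated data. The sketch below estimates the characteristic function of a standard lognormal by a sample average; since each e^(itx) sits on the unit circle, the average cannot leave the unit disc. The helper name empirical_cf, the seed and the sample size are assumptions for the example.

```python
import cmath
import math
import random

def empirical_cf(samples, t):
    """Sample average of e^(i t x), an estimate of the characteristic function."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)

random.seed(0)
# Standard lognormal draws: exp of a standard normal (parameters assumed).
samples = [math.exp(random.gauss(0.0, 1.0)) for _ in range(20000)]

for t in (0.0, 0.5, 2.0, 10.0):
    print(t, round(abs(empirical_cf(samples, t)), 4))
```

At t = 0 the magnitude is 1, and at every other t it is at most 1, heavy right tail or not.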

Uniqueness as a proof engine

Now the payoff in practice. Uniqueness converts every identity between transforms into an identity between distributions, and that is how transform methods actually prove things. Whenever you compute a transform and recognize the answer, you are allowed to read off the distribution — no integration of densities, no convolution, no clever change of variables. The pattern is always the same three beats: (1) write the transform of the thing you care about, (2) simplify using sums-become-products or known forms, (3) match the simplified transform to a catalog entry and invoke uniqueness to name the distribution.

Claim to prove: the sum of two independent normals is normal. Let X ~ Normal(mu1, sigma1^2) and Y ~ Normal(mu2, sigma2^2) be independent, and look at S = X + Y.
Use the sums-to-products rule from earlier in this rung: the mgf of an independent sum is the product, so M_S(t) = M_X(t) * M_Y(t).
Plug in the known normal mgf, M(t) = exp(mu t + sigma^2 t^2 / 2), and multiply. Adding the exponents gives exp((mu1+mu2) t + (sigma1^2+sigma2^2) t^2 / 2).
Recognize the result: it is exactly the mgf of a Normal(mu1+mu2, sigma1^2+sigma2^2). By the mgf uniqueness theorem, S must HAVE that distribution. Done — no density integral was ever computed.
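The algebra in this proof can be sanity-checked numerically. The sketch below confirms that the product of two normal mgfs equals the mgf of the claimed sum distribution at several values of t, and then compares that formula with a simulated estimate of E[e^(tS)]. The parameter values, the seed and the name normal_mgf are assumptions chosen for illustration; the simulation is a plausibility check, not a proof.

```python
import math
import random

def normal_mgf(t, mu, var):
    """mgf of Normal(mu, var): exp(mu*t + var*t^2/2)."""
    return math.exp(mu * t + var * t * t / 2.0)

mu1, var1 = 1.0, 4.0     # assumed parameters for X
mu2, var2 = -0.5, 2.25   # assumed parameters for Y

# Product of the two mgfs versus the mgf of Normal(mu1+mu2, var1+var2).
for t in (-1.0, -0.3, 0.2, 0.7):
    product = normal_mgf(t, mu1, var1) * normal_mgf(t, mu2, var2)
    claimed = normal_mgf(t, mu1 + mu2, var1 + var2)
    assert math.isclose(product, claimed, rel_tol=1e-12)

# Simulation check: estimate E[e^(tS)] from draws of S = X + Y.
random.seed(1)
n = 200_000
t = 0.3
draws = (
    random.gauss(mu1, math.sqrt(var1)) + random.gauss(mu2, math.sqrt(var2))
    for _ in range(n)
)
empirical = sum(math.exp(t * s) for s in draws) / n
print(empirical, normal_mgf(t, mu1 + mu2, var1 + var2))
```

The first loop is the multiplication step done by machine; the simulation adds independent evidence that the named distribution is the right one.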

That argument is the engine behind the fact that independent normals add to a normal, and the very same machine, run with characteristic functions instead of mgfs, is what makes the transform proof of the Central Limit Theorem possible: you show the characteristic function of a standardized sum converges to e^(-t^2/2), and then uniqueness, in its limiting form, lets you conclude the distribution converges to the standard normal. Uniqueness is the hinge on which the whole subject swings from 'these functions match' to 'these distributions are the same'.
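The convergence in that Central Limit Theorem argument can be watched directly in one easy case. For fair coin flips taking values +1 and -1, each flip has characteristic function cos(t), mean 0 and variance 1, so the standardized sum of n flips has characteristic function cos(t / sqrt(n))^n. The choice of coin flips and the name cf_standardized_sum are assumptions made for this sketch.

```python
import math

def cf_standardized_sum(t, n):
    """Characteristic function of (X_1 + ... + X_n) / sqrt(n) for fair +/-1 flips.

    Independence turns the sum into a product of n copies of cos(t / sqrt(n)).
    """
    return math.cos(t / math.sqrt(n)) ** n

t = 1.0
target = math.exp(-t * t / 2.0)  # standard normal characteristic function at t
for n in (1, 10, 100, 10_000):
    print(n, cf_standardized_sum(t, n), target)
```

As n grows, the first printed column closes in on the second. The limiting form of uniqueness is what upgrades that convergence of functions into convergence of distributions.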

Reading the fingerprint honestly

It helps to fix the right picture: a transform is a fingerprint of a distribution. Match the fingerprint everywhere it is defined and you have caught the unique suspect; but you must be careful about what 'everywhere' means and which fingerprint you used. A few traps catch beginners. Matching mgfs only at t = 0 proves nothing, since all mgfs agree there. Matching at a scatter of isolated points proves nothing, since two different analytic functions can agree at isolated points and still differ everywhere in between. And matching mgfs over an interval is only valid if both mgfs actually exist on that interval; otherwise you are comparing a fingerprint to a blank.
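Here is a concrete instance of the scattered-points trap, as a short sketch. The two normals below were picked (as an assumption for this example) so that mu + sigma^2 / 2 equals 1/2 for both, which makes their mgfs agree at t = 1 as well as at t = 0, even though the distributions are plainly different.

```python
import math

def normal_mgf(t, mu, var):
    """mgf of Normal(mu, var): exp(mu*t + var*t^2/2)."""
    return math.exp(mu * t + var * t * t / 2.0)

def mgf_a(t):
    return normal_mgf(t, 0.0, 1.0)    # Normal(0, 1)

def mgf_b(t):
    return normal_mgf(t, 0.25, 0.5)   # Normal(0.25, 0.5)

for t in (0.0, 0.5, 1.0, 1.5):
    print(t, mgf_a(t), mgf_b(t))
```

The rows for t = 0 and t = 1 match; the rows for t = 0.5 and t = 1.5 do not. Two points of agreement caught nobody.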

Two further honest cautions are worth carrying. First, uniqueness is about the distribution, not the variable: two random variables with the same transform are identically distributed, which does not mean they are equal as functions on the sample space, nor that they are independent or dependent in any particular way. It means only that their distributions coincide. Second, do not over-read the moments. Even when an mgf exists and uniquely determines a distribution, that determination lives in the whole function; reconstructing a distribution from a truncated list of its moments is a different and often ill-posed problem. And when no mgf exists, even the complete, infinite moment list can fail to identify the distribution, which is why the lognormal can be moment-twinned by impostors while its characteristic function still tells the truth.
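The lognormal impostors can be exhibited numerically. A standard textbook construction (usually credited to Heyde, and not taken from this guide) multiplies the standard lognormal density f(x) by 1 + a sin(2 pi ln x) with |a| <= 1. The sketch below computes moments of that perturbed density on the log scale; the name twin_moment, the value a = 0.5, the integration limits and the step count are assumptions for the example.

```python
import math

def twin_moment(n, a, y_lo=-12.0, y_hi=20.0, steps=64000):
    """n-th moment of the density f(x) * (1 + a*sin(2*pi*ln x)), f = standard lognormal.

    Substituting y = ln x turns this into an integral of
    e^(n*y) * (1 + a*sin(2*pi*y)) against the standard normal density,
    approximated here with the trapezoid rule.
    """
    h = (y_hi - y_lo) / steps
    norm_const = 1.0 / math.sqrt(2.0 * math.pi)
    total = 0.0
    for j in range(steps + 1):
        y = y_lo + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        bump = 1.0 + a * math.sin(2.0 * math.pi * y)
        total += weight * math.exp(n * y - y * y / 2.0) * bump * norm_const
    return total * h

a = 0.5  # any value in [-1, 1] keeps the perturbed density non-negative
for n in range(5):
    print(n, twin_moment(n, a), math.exp(n * n / 2.0))

# The densities themselves differ: at x = e^0.25 the twin is scaled by 1 + a.
print(1.0 + a * math.sin(2.0 * math.pi * 0.25))
```

The n = 0 row confirms the twin is a genuine probability density, and the remaining rows match the lognormal moments exp(n^2 / 2), yet the last line shows the two densities disagree at a specific point. Same moments, different distributions.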

Pull the rung together. The mgf and pgf are the friendly, computational transforms: easy to differentiate for moments, easy to multiply for sums, and unique whenever they comfortably exist. The characteristic function is the rigorous backstop: always there, always unique, the tool that survives the Cauchy and the lognormal where the mgf gives up. Uniqueness is the single theorem that makes all of them earn their keep — it is the reason 'I recognized the transform' is a complete and rigorous answer to 'what is the distribution?'