Mixers, Frequency Translation and Linearity (IP3)

Why a radio needs to move a signal in frequency

A piano tuner doesn't tune a string to a perfect 440 Hz by guessing — she plays a reference 440 Hz fork next to it and listens for the slow *beat*, the wah-wah-wah that appears whenever two close tones mix. Speed up the beat, the string is sharp; slow it, it's flat. That beat is two frequencies multiplying and producing a *new* frequency — their difference — out of thin air. A radio mixer is exactly that trick, made deliberate and put to work. It takes the high-frequency signal coming off the antenna and beats it against a clean reference tone, producing a copy of the whole channel sitting at a brand-new, more convenient frequency.

Why bother? Because a 2.4 GHz Wi-Fi signal or a 28 GHz 5G beam is brutal to process directly. Sharp, narrow filters are far easier to build at a low, fixed frequency than at gigahertz. Analog-to-digital converters run out of speed long before 2.4 GHz. And amplifiers with high gain are cheap at low frequency, expensive at high frequency. So nearly every receiver ever built does the same thing: catch the signal high, translate it *down* to a low intermediate frequency (IF) or all the way to baseband near DC, and do the hard work — filtering, gain, digitising — down there where silicon is comfortable. On transmit, the mixer runs in reverse, lifting a baseband signal *up* to the radio frequency the antenna will actually launch.

Multiply two tones, get two new ones

The mathematics behind the beat is one of the most useful identities in all of engineering. Multiply two cosines and a product-to-sum formula splits them into a *sum* frequency and a *difference* frequency — nothing in between, just those two. Feed in the RF signal at f_RF and a local reference (the local oscillator, or LO) at f_LO, and the multiplier hands you copies at f_RF + f_LO and f_RF − f_LO. Keep the difference, throw away the sum with a filter, and you have moved the entire channel down to f_IF = f_RF − f_LO.

The product-to-sum identity is the whole mixer:

  cos(w_RF t) x cos(w_LO t)
      = 1/2 cos((w_RF - w_LO)t)   <- DIFFERENCE  (the IF we keep)
      + 1/2 cos((w_RF + w_LO)t)   <- SUM         (filtered away)

Downconversion example (a Wi-Fi receiver):
  f_RF = 2400 MHz   (signal off the antenna)
  f_LO = 2000 MHz   (clean tone we generate on-chip)
  ----------------------------------------------
  f_IF = 2400 - 2000 = 400 MHz   <- keep this
  sum  = 2400 + 2000 = 4400 MHz  <- toss this

Zero-IF (direct conversion): set f_LO = f_RF exactly
  f_IF = 0  -> the channel lands centred on DC (baseband)

One trig identity does the entire job. The mixer outputs the sum and difference of its two inputs; the radio keeps whichever one it wants and filters the other.

Two architectures fall straight out of this. A superheterodyne receiver picks an LO that lands the signal on a fixed, well-chosen IF (say 70 MHz or 400 MHz), where a fixed crystal or SAW filter does superb channel selection — the dominant radio design for almost a century. A zero-IF or direct-conversion receiver sets f_LO = f_RF exactly, so f_IF = 0 and the channel lands centred on DC. Zero-IF needs no bulky IF filter and integrates beautifully onto a single CMOS die, which is why it dominates modern phone and Wi-Fi transceivers — at the cost of new headaches like DC offset and even-order distortion folding right onto your signal.

The Gilbert cell: a mixer built from a differential pair

Multiplication is a tall order for a silicon circuit — but the differential pair you met in the LNA rung is already most of the way there. Recall its core behaviour: a tail current splits between two transistors in a ratio set by the voltage between their gates. Drive that input small-signal, and the pair acts as a linear amplifier with a transconductance set by the bias. But here is the move that makes a mixer: take the *tail current* of one differential pair and make it the *signal* you want to translate, while a second, faster pair on top switches that current left-and-right in lockstep with the LO. Switching a current with the LO is the same as multiplying it by a square wave that flips between +1 and −1 — and a square wave is, by Fourier, a sum of tones at the LO frequency and its odd harmonics.

       Gilbert-cell active mixer (the CMOS workhorse)

        +Vdd                +Vdd
          |                   |
         [R]                 [R]      <- load: IF appears here (differential)
          |   IF+      IF-    |
          +-------+   +-------+
                  |   |
     LO+ --|[M3]   [M4]|-- LO-        SWITCHING QUAD
               \  /  \  /             (commutates current at LO rate)
     LO- --|[M5]   [M6]|-- LO+
              |        |
              +---++---+
         RF+ --|[M1] [M2]|-- RF-      TRANSCONDUCTOR (gm) stage
                  \    /              (turns RF voltage into a current)
                   |
                 [Itail]              tail current source / bias
                   |
                  GND

  Bottom pair (M1,M2): RF voltage -> signal current  (the gm cell)
  Top quad (M3..M6):   LO flips that current +/-1    (the multiplier)
  Output current x R:   IF voltage = gm * R * (RF x LO)

The double-balanced Gilbert cell: a transconductor at the bottom turns RF voltage into current, and an LO-driven switching quad on top multiplies it — a differential pair stacked on a differential pair.

This is the Gilbert cell, patented by Barrie Gilbert in 1968 and still the default active mixer in nearly every CMOS transceiver today. Stacking it *double-balanced* — symmetric on both the RF and LO sides — buys a priceless property: the strong LO tone and any DC from the RF side cancel out differentially, so they don't appear at the IF output to swamp your faint signal. Unlike a passive diode or switch mixer, the Gilbert cell offers real conversion gain instead of loss, because its bottom stage actively amplifies before the switching. That gain helps the system noise figure — though, per Friis from the last rung, the LNA in front still dominates the noise budget.

Linearity: when multiplying breeds monsters

A perfect mixer or amplifier would obey one rule: double the input, exactly double the output, no matter how big the signal. Real silicon never does. Push a transistor hard and its output starts to bend — the gain sags at large swings, the transfer curve develops a slight S-shape. Mathematically, the output is no longer just a clean copy of the input but a polynomial: a linear term you want, plus a squared term, a cubed term, and so on. Those higher-order terms are tiny when the signal is small, which is why everything seems linear at low levels — but they grow faster than the signal, and at large amplitudes they erupt into spurious tones that were never in the input. This is distortion, and in a receiver it is every bit as deadly as noise.

Two symptoms matter. The first is gain compression: as the input grows, the real output curve droops below the ideal straight line. The input level where the actual gain has dropped by 1 dB from its small-signal value is the famous 1-dB compression point (P1dB) — the practical ceiling on the strongest single signal a block can handle before it starts to crush. The second symptom is subtler and far nastier: when *two* signals are present at once, the cubed term mixes them together to create intermodulation products at frequencies that can land directly on top of your wanted channel. You cannot filter those away, because they are born *inside the band you care about*.

IP3: the number that bounds the strongest signal

We need a single figure of merit for how cleanly a block behaves, and the cleverest one is the third-order intercept point (IP3). It comes from a beautiful slope argument. In a two-tone test, the *wanted* output rises 1 dB for every 1 dB you raise the input — slope 1 on a log-log plot. But the IM3 product, born of a cubic term, rises 3 dB for every 1 dB of input — slope 3. Two lines with different slopes must eventually cross. Extrapolate both straight lines upward and the point where they meet is the IP3: a single power level (input-referred IIP3 or output-referred OIP3) that compactly captures the block's whole linearity.

TWO-TONE TEST: drive f1 and f2, watch the output spectrum

  Output power (dBm)
    ^
    |                                        . IP3 (the
    |                                   . *    intercept,
    |                              .   *       extrapolated --
    |  wanted tone  --->      .   *            never reached!)
    |  (slope = 1)        .       *
    |                 .          *  <--- IM3 product (slope = 3)
    |             .           *
    |         .            *
    |     .             *
    +--.-------------*--------------------> Input power (dBm)
              ^IIP3 (input-referred)

Spectrum at the output (the four tones that matter):

         IM3        f1   f2        IM3
          |          |    |          |
    -----[|]--------[|]--[|]--------[|]-----> freq
       2f1-f2                     2f2-f1
        ^ falls INSIDE the band -> cannot be filtered

Handy rule of thumb (input-referred, same units):
  IIP3  ~=  P_in  +  (delta / 2)
  where delta = (wanted output) - (IM3 output), in dB,
  measured at one convenient input power well below IP3.

The two-tone picture: the wanted tone climbs at slope 1, the IM3 product at slope 3. Their extrapolated intersection is IP3 — the higher it is, the more linear the block.

Drive the block with two equal tones at f₁ and f₂, close together but distinct, at a power safely below compression.
On a spectrum analyser, read the wanted tone power and the IM3 product power (at 2f₁−f₂ or 2f₂−f₁) in dBm.
Take their difference Δ in dB — the more linear the block, the larger this gap.
Project to the intercept: because the slopes are 1 and 3, the lines close the gap Δ at a rate of 2 dB per 1 dB of input, so IIP3 ≈ P_in + Δ/2.
IP3 is a fiction you never actually reach — the block compresses long before it. It is purely an extrapolated yardstick for comparing linearity.

A worked number makes it concrete. Suppose you drive an LNA-plus-mixer front end with two tones at −30 dBm each, and on the analyser the wanted tones sit at −10 dBm output while the IM3 products sit at −70 dBm. The gap Δ is 60 dB, so the *output*-referred OIP3 is −10 + 60/2 = +20 dBm, and the input-referred IIP3 is −30 + 60/2 = 0 dBm. That IIP3 of 0 dBm now sets the ceiling: combined with the receiver's noise floor from the previous rung, it defines the spurious-free dynamic range — the span between the faintest signal noise will let you hear and the strongest interferer distortion will let you tolerate. Gain, noise figure, and linearity, together, are the three legs the whole front-end stands on.

The whole front-end, in tension

Step back and the three rungs of this track snap into one picture. The LNA gave you gain with low noise so the faintest signals survive. The mixer translates frequency so the rest of the radio can do its work cheaply. And now linearity — IP3 and P1dB — sets the other end of the range: how loud the world can get before strong interferers manufacture phantom signals on top of your channel. These three pull against each other constantly. Crank up gain and you compress sooner, lowering IP3. Bias a transistor for the lowest noise and you often sacrifice linearity. Burn more current and you can buy back IP3 — but battery life and heat say no.

This is why a modern phone front-end is a small symphony of compromise: an LNA chosen for noise, a Gilbert-cell mixer chosen for conversion gain and linearity, an LO from a frequency synthesiser chosen for purity, all co-designed so that gain, noise figure, and IP3 land inside the link budget at once. Get any one wrong and the radio either goes deaf to weak signals or chokes on strong ones. Master all three together and you have built the front-end of every wireless device on Earth.