Pulling Signal from Noise: Matched Filters and Error-Correcting Codes

The receiver's problem: a known shape buried in hiss

Picture a lighthouse keeper at night watching for one specific flash pattern from a ship — three short, one long. The sea is full of random glints: moonlight on waves, a distant lamp, the keeper's own tired eyes. The job is not to see *light*, it is to see *that pattern*. A radio receiver faces exactly this. The transmitter sends a known pulse shape p(t) for each bit; the channel buries it under thermal noise. The receiver already knows the shape it is hunting for. The deep question of this rung is: given that you know the shape, what is the single best way to look for it?

From rung 5 you carry one hard number: the bit error rate (BER). For the workhorse modulation BPSK, BER falls as a steep function of the ratio Eb/N0 — the energy per bit divided by the noise power spectral density. Push Eb/N0 up by a few dB and BER drops by orders of magnitude. So every technique in this guide is, at heart, a machine for raising the *effective* Eb/N0 your decision circuit sees — without turning up the transmit power.

The matched filter: correlate with the shape you expect

Here is the keeper's trick made mathematical. Instead of staring at the raw waveform r(t) at any single instant, multiply it point-by-point by a copy of the known pulse p(t) and add up the result over the whole symbol. That running sum is a correlation — it asks, "how much does what I received look like the shape I sent?" When the received energy lines up with the template, the products are all positive and the sum swells to a peak. Random noise, having no allegiance to the shape, pushes the products up and down in equal measure and largely cancels itself. The matched filter is exactly this correlator, built as a filter whose impulse response is the time-reversed pulse, h(t) = p(T − t).

Transmitted pulse p(t)        Received r(t) = p(t) + noise
   ___                            .--._.-'`-.__.'\ . _.
  /   \                       .'\/      `.    `'  `
 _/     \_                  _.'             noise everywhere
                            (which sample do you trust?)

Matched filter = slide a time-reversed copy of p(t) along r(t),
multiply & integrate (this IS correlation):

   y(t) = INTEGRAL  r(tau) * p(tau - (t - T))  d tau

        noise-only stretch        SIGNAL aligns -> PEAK
          ___   _   __              |
   y(t): /   \_/ \_/  \___ . . . __/ \__ . . .
        (small, wobbly)            ^
                            SAMPLE EXACTLY HERE  (t = T)
                            -> maximum SNR, minimum BER

Correlating r(t) with the known pulse makes the signal pile up into one tall peak while noise stays low. Sample at that peak.

Why is this *the* optimum and not just *a* good idea? The matched filter is provably the linear filter that maximizes the signal-to-noise ratio at one chosen sampling instant for additive white Gaussian noise. The proof is the Cauchy–Schwarz inequality: among all possible weightings of the incoming waveform, the one that weights each piece in exact proportion to the signal energy there — and nowhere else — extracts the most signal per unit of noise. The peak SNR it delivers is 2·Eb/N0, and remarkably it depends only on the pulse's *energy*, not on its shape. A wide gentle pulse and a narrow sharp one give identical performance if they carry the same energy.

When the filter isn't enough: errors still slip through

The matched filter is optimal, but optimal is not the same as perfect. Even at its best, a deep noise spike can still shove the sampled value across the decision threshold and the slicer calls a 0 a 1. At Eb/N0 = 9.6 dB, BPSK still flips roughly one bit in 100,000. For a 1 Gbit/s link that is ten thousand errors every second — every photo, every payment, every line of code peppered with flipped bits. You cannot filter your way out of this; the matched filter has already done everything a filter can do.

The old answer was *ask again*: detect that something arrived garbled and request a retransmission (ARQ). That works for a web page, but it is a disaster for a Mars probe 20 light-minutes away, for a live video call, or for a Blu-ray disc that cannot phone home. The modern answer is to send the cure *along with* the disease. Add carefully structured redundant bits so the receiver can not only detect that an error happened but locate and fix it on its own. This is forward error correction (FEC) — the second great SNR-buying machine.

Worked example: a Hamming code that fixes one flipped bit

Richard Hamming, fed up with weekend computer runs ruined by a single bad bit, asked in 1950: how few extra bits do I need to *find and fix* one error? His answer, the Hamming(7,4) code, takes 4 data bits and adds 3 parity bits to make a 7-bit codeword. The trick is placement. Put the parity bits at positions 1, 2, 4 (the powers of two) and the data bits at 3, 5, 6, 7. Each parity bit guards exactly the positions whose binary index includes its bit. Then any single-bit error makes those parity checks fail in a pattern that spells out, in binary, the exact position that went wrong.

Position:   1    2    3    4    5    6    7
Role:      p1   p2   d1   p4   d2   d3   d4

Parity coverage (position's binary index has that bit set):
  p1 (001) checks positions 1,3,5,7   -> bit 0 of the index
  p2 (010) checks positions 2,3,6,7   -> bit 1 of the index
  p4 (100) checks positions 4,5,6,7   -> bit 2 of the index
Each p is set so its group has EVEN parity.

--- ENCODE data 1011 (d1 d2 d3 d4) ---
  p1 over {d1,d2,d4}=1,0,1 -> need even -> p1=0
  p2 over {d1,d3,d4}=1,1,1 -> need even -> p2=1
  p4 over {d2,d3,d4}=0,1,1 -> need even -> p4=0
  codeword = 0 1 1 0 0 1 1   (pos 1..7)

--- CHANNEL flips bit 5 ---  received = 0 1 1 0 1 1 1

--- DECODE: recompute the 3 checks ---
  c1 = parity of pos 1,3,5,7 = 0,1,1,1 = ODD  -> 1
  c2 = parity of pos 2,3,6,7 = 1,1,1,1 = EVEN -> 0
  c4 = parity of pos 4,5,6,7 = 0,1,1,1 = ODD  -> 1
  syndrome (c4 c2 c1) = 1 0 1 (binary) = 5
  ==> bit 5 is wrong. FLIP IT BACK. Error corrected, no resend.

Hamming(7,4): the three failed-check bits read out, in binary, the exact position to flip. Syndrome 101 = position 5.

Sit with how elegant that is. Three checks give a 3-bit syndrome with 8 possible values: 000 means "no error," and the other 7 point at exactly which of the 7 positions to flip. No table lookup of the data is needed; the syndrome *is* the address of the fault. The cost is overhead: you sent 7 bits to carry 4, a code rate of 4/7 ≈ 0.57. Spend a bit of bandwidth, get back the power to self-heal one error per block.

From Hamming to the codes in your pocket

Hamming codes are the friendly first step; real systems lean on heavier machinery, each tuned to the kind of errors it faces. The thread connecting them is always the same: structured redundancy plus a clever decoder.

Reed–Solomon works on whole symbols (bytes), not single bits, so it shrugs off *bursts* of damage — a scratch on a CD, a fading dropout. It ruled CDs, DVDs, QR codes, and deep-space links for decades. A Voyager image survives because RS quietly repairs the smear.
Convolutional codes + Viterbi decoding spread each bit's influence across a sliding window and find the most likely transmitted sequence. Pair them with an outer RS code (a concatenated scheme) and you get robust, layered protection — the classic deep-space and early-cellular recipe.
Turbo codes (1993) and LDPC codes decode *iteratively*, passing soft probability estimates back and forth until they converge. They march to within a fraction of a dB of the theoretical limit. LDPC now guards Wi‑Fi 6, 5G data channels, and modern storage; turbo codes carried 3G/4G.

That phrase "the theoretical limit" is not loose talk. It is Shannon's channel capacity — the hard ceiling on error-free bits per second a noisy channel can carry. Shannon proved in 1948 that codes achieving reliable communication *up to* capacity must exist; he just didn't say how to build them. The 65-year hunt from Hamming to LDPC was the engineering quest to actually reach the line he drew. Today's best codes sit within tenths of a dB of Shannon — the chase is essentially won.

Coding gain: the decibels you buy back

How do we score a code in a way an engineer can spend? With coding gain: the reduction in Eb/N0 needed to hit a target BER once the code is switched on. Pick a goal, say BER = 10⁻⁵. Uncoded BPSK needs about 9.6 dB to get there. A decent convolutional code reaches the same BER at perhaps 5 dB — a coding gain of ~4.6 dB. Modern LDPC at a good rate can buy well over 7 dB. Remember from earlier that 3 dB is a doubling of power; so a 6 dB coding gain is like having a transmitter four times more powerful, conjured purely from arithmetic at the receiver.

BER
 1e-2 |* uncoded BPSK
      | *\
 1e-3 |  * \        coded (e.g. LDPC)
      |   *  \        .
 1e-4 |    *   \      .
      |     *   \    .
 1e-5 |......*....\..*................. <- target BER
      |       *    \  .
 1e-6 |        *    \  .
      +--+----+----+--+----+---- Eb/N0 (dB)
         3    5    7  9.6  11
                |<-- ~4-7 dB -->|
                  CODING GAIN  (= cheaper transmitter)

Coding shifts the whole BER-vs-Eb/N0 curve left. The horizontal gap at the target BER is the coding gain.

Stack the two and you have the modern receiver's spine. The matched filter pulls the cleanest possible decision out of the analog haze, handing the slicer the best Eb/N0 physics allows. FEC then mops up the residual errors and buys several more decibels on top. Every modem in your phone, every Wi‑Fi chip, every satellite ground station runs this exact one-two punch — squeeze the signal, then heal the bits — to live as close to Shannon's line as silicon can manage.