Chapter 16: Classical Error Correction Refresher
Quantum computers are extraordinarily fragile. A single stray photon, a tiny temperature fluctuation, or an unwanted magnetic field can corrupt a qubit's state. Before we can understand how to protect quantum information - the subject of the next several chapters - we need to understand how classical error correction works. The ideas are older, simpler, and remarkably elegant. They also contain the seeds of every quantum error-correcting code we will encounter.
This chapter is entirely classical. We start with the most obvious strategy - make copies - and build up to linear codes and the Hamming codes that correct any single-bit error with minimal overhead. We close with Shannon's noisy channel coding theorem, the absolute limit of what error correction can achieve. Along the way, we develop the vocabulary of parity checks, syndromes, and generator matrices that will carry directly into the quantum setting.
16.1 Redundancy and Repetition Codes
The simplest idea in error correction is also the most ancient: say it more than once. If you shout "yes, yes, yes!" across a noisy field, your listener can take the majority vote of what they hear and correctly decode your message even if one repetition is garbled. This is the essence of a repetition code.
The 3-Bit Repetition Code
Suppose we want to send a single bit - either 0 or 1 - through a noisy channel where each bit has some probability $p$ of being flipped. Instead of sending the bare bit, we encode it by tripling:
$$0 \longrightarrow 000, \qquad 1 \longrightarrow 111$$

These two strings, $000$ and $111$, are our codewords. The receiver gets three bits, some of which may have been flipped, and decodes by taking the majority vote: whichever value appears at least twice is declared the original bit.
A repetition code encodes one logical bit into $n$ identical copies. The receiver decodes by majority vote: the value held by more than half the bits wins. The 3-bit repetition code can correct any single-bit error, since flipping one bit still leaves two copies of the correct value.
If the channel flips each bit independently with probability $p$, the probability of a decoding failure (two or more flips) is:
$$P_{\text{fail}} = 3p^2(1-p) + p^3 = 3p^2 - 2p^3$$

This beats the uncoded error rate $p$ whenever $p < 1/2$. For $p = 0.01$, the failure probability drops from $0.01$ to roughly $0.0003$ - an improvement by a factor of 33.
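A quick Monte Carlo simulation confirms this formula. Here is a minimal Python sketch (the function name and trial count are our own choices):

```python
import random

def simulate_repetition(p, trials=100_000):
    """Estimate the decoding failure rate of the 3-bit repetition code
    over a channel that flips each bit independently with probability p."""
    failures = 0
    for _ in range(trials):
        # Encode the bit 0 as 000, then flip each copy with probability p.
        received = [1 if random.random() < p else 0 for _ in range(3)]
        # Majority vote: decoding fails if two or more copies were flipped.
        if sum(received) >= 2:
            failures += 1
    return failures / trials

p = 0.01
print(f"simulated: {simulate_repetition(p):.5f}")
print(f"predicted: {3*p**2 - 2*p**3:.5f}")   # 3p^2(1-p) + p^3
```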
Hamming Distance
Why does the repetition code work? The two codewords $000$ and $111$ differ in every position. The number of positions in which two binary strings differ is called their Hamming distance, named after Richard Hamming, who introduced it in 1950.
$$d(000, 111) = 3$$

The minimum distance of a code is the smallest Hamming distance between any pair of distinct codewords. For the 3-bit repetition code, there are only two codewords and their distance is 3, so $d_{\min} = 3$. The general rule is:
A code with minimum distance $d$ can:
- Detect up to $d - 1$ errors (any error pattern of weight less than $d$ moves a codeword to a non-codeword).
- Correct up to $\lfloor (d-1)/2 \rfloor$ errors (the received word is still closer to the original codeword than to any other).
For the 3-bit repetition code with $d = 3$: it can detect up to 2 errors and correct up to 1 error. This matches our majority-vote analysis perfectly.
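Both quantities are straightforward to compute. A minimal Python sketch, with illustrative helper names:

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def minimum_distance(codewords):
    """Smallest pairwise Hamming distance over all distinct codewords."""
    return min(hamming_distance(a, b)
               for i, a in enumerate(codewords)
               for b in codewords[i + 1:])

print(hamming_distance("000", "111"))    # 3
print(minimum_distance(["000", "111"]))  # 3: detect up to 2, correct 1
```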
The Cost of Repetition
The 3-bit repetition code uses 3 physical bits for 1 logical bit - a rate of $R = k/n = 1/3$. A 5-bit repetition code corrects up to 2 errors but has rate $1/5$. In general, the rate is $1/n$, going to zero as we demand more protection. Can we protect multiple data bits simultaneously at a reasonable rate? Yes - and this leads us to linear codes.
Repetition Code in a Quantum Sandbox
Although this chapter is classical, we can illustrate the repetition code concept using CNOT gates on qubits. The circuit below encodes the state of q[0] into three qubits by "copying" it to q[1] and q[2] via CNOT. An X gate on q[1] simulates a single-bit-flip error. Then we decode by majority vote using two CNOT gates and a Toffoli, extracting the corrected bit into q[0]. Run the circuit and verify that the error is corrected.
The corrected logical bit is in q[0], which should read 1 with 100% probability. The other two qubits contain syndrome information (not the original encoded values), so the full output will be 100 for no error, 110 for error on q[1], etc. The key point is that q[0] always recovers the correct logical value. Try moving the error to q[0] or q[2] - any single-bit flip is corrected in q[0].
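For readers following along in code, here is one way to realize the circuit just described - a minimal sketch, assuming Qiskit and its Aer simulator are installed (API details vary across Qiskit versions, and the initial X on q[0] is our assumption to match the logical value 1 described above):

```python
from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator

qc = QuantumCircuit(3)
qc.x(0)          # prepare the logical bit 1 on q[0] (assumed, to match the text)
qc.cx(0, 1)      # encode: copy q[0] onto q[1] ...
qc.cx(0, 2)      # ... and onto q[2], giving 111
qc.x(1)          # simulate a single bit-flip error on q[1]
qc.cx(0, 1)      # decode: q[1] and q[2] now hold syndrome bits
qc.cx(0, 2)
qc.ccx(1, 2, 0)  # majority vote: flip q[0] only if both syndrome bits are 1
qc.measure_all()

counts = AerSimulator().run(qc, shots=1000).result().get_counts()
print(counts)    # expect {'011': 1000}: Qiskit orders strings q[2]q[1]q[0],
                 # so this is q[0]=1 (corrected), q[1]=1, q[2]=0 - "110" above
```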
In the quantum world, this repetition code only corrects bit-flip errors (X errors). It cannot correct phase-flip errors (Z errors), which have no classical analog. We will confront this limitation directly in Chapter 17 when we introduce quantum error correction.
16.2 Linear Codes: Parity Checks and Hamming Codes
Linear codes are far more sophisticated than repetition. They protect $k$ data bits using $n$ total bits ($n > k$), adding only $r = n - k$ parity check bits. A single parity bit can protect multiple data bits simultaneously.
The Parity Check Idea
A parity check is a single bit that records whether a certain subset of bits has an even or odd number of 1s. For example, given three data bits $d_1, d_2, d_3$, we might define a parity bit:
$$p = d_1 \oplus d_2 \oplus d_3$$

where $\oplus$ denotes addition modulo 2 (XOR). If $d_1 = 1, d_2 = 0, d_3 = 1$, then $p = 1 \oplus 0 \oplus 1 = 0$. The four-bit string $d_1 d_2 d_3 p = 1010$ satisfies the constraint $d_1 \oplus d_2 \oplus d_3 \oplus p = 0$. If any single bit is flipped, this parity sum becomes 1, alerting us that an error occurred. One parity bit can detect a single error but cannot locate it - it only tells us that something went wrong, not where.
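A few lines of Python make the detect-but-not-locate behavior concrete (a minimal sketch; names are illustrative):

```python
def parity(bits):
    """Parity of a list of bits: 0 if the number of 1s is even, else 1."""
    p = 0
    for b in bits:
        p ^= b
    return p

data = [1, 0, 1]
codeword = data + [parity(data)]  # [1, 0, 1, 0]
print(parity(codeword))           # 0: the parity check passes

codeword[1] ^= 1                  # flip one bit in transit
print(parity(codeword))           # 1: an error is detected, but not located
```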
Arithmetic in GF(2)
Classical coding theory takes place over $\text{GF}(2)$ (also written $\mathbb{F}_2$) - the field with two elements, 0 and 1. Addition is XOR and multiplication is AND. The crucial rule is $1 + 1 = 0$ (no carrying). Subtraction equals addition since $-1 = 1$. All the matrix algebra from earlier chapters carries over - we just work mod 2.
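As a quick illustration (using NumPy, our choice here), GF(2) arithmetic is ordinary integer arithmetic followed by reduction mod 2:

```python
import numpy as np

a = np.array([1, 0, 1, 1], dtype=int)
b = np.array([1, 1, 0, 1], dtype=int)
print((a + b) % 2)  # [0 1 1 0] - addition is XOR, since 1 + 1 = 0
print((a * b) % 2)  # [1 0 0 1] - multiplication is AND
print((a - b) % 2)  # [0 1 1 0] - same as addition, since -1 = 1 mod 2
```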
The Generator Matrix
A linear code is a code whose set of codewords forms a vector subspace of $\text{GF}(2)^n$. In practice, this means we can describe the entire code using a generator matrix $G$ of size $k \times n$. To encode a message $\mathbf{m} = (m_1, \ldots, m_k)$, we compute:
$$\mathbf{c} = \mathbf{m} \cdot G$$

where all arithmetic is in $\text{GF}(2)$. The result $\mathbf{c}$ is an $n$-bit codeword. Because the encoding is a linear map, the sum of any two codewords is also a codeword - this is the defining property of a linear code.
A generator matrix in systematic form looks like $G = [I_k \mid P]$, where $I_k$ is the $k \times k$ identity matrix and $P$ is a $k \times r$ matrix that determines the parity bits. In systematic form, the first $k$ bits of the codeword are the original data bits, and the last $r$ bits are the parity checks.
The Parity Check Matrix
Dual to $G$ is the parity check matrix $H$ of size $r \times n$ ($r = n - k$). A vector $\mathbf{c}$ is a valid codeword if and only if:
$$H \mathbf{c}^T = \mathbf{0}$$

When the generator matrix is in systematic form $G = [I_k \mid P]$, the parity check matrix is:

$$H = [P^T \mid I_r]$$

The key identity $G H^T = \mathbf{0}$ guarantees every codeword satisfies all parity checks. Each row of $H$ is a parity equation that every valid codeword must satisfy.
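We can verify this duality on the single-parity-check code from above ($k = 3$, $r = 1$, our own choice of toy example). A minimal NumPy sketch:

```python
import numpy as np

P = np.array([[1], [1], [1]], dtype=int)    # k x r: each data bit feeds the parity bit
G = np.hstack([np.eye(3, dtype=int), P])    # G = [I_k | P], 3 x 4
H = np.hstack([P.T, np.eye(1, dtype=int)])  # H = [P^T | I_r], 1 x 4

print((G @ H.T) % 2)  # all zeros: every codeword passes every parity check

m = np.array([1, 0, 1])
c = (m @ G) % 2       # encode: c = mG = [1 0 1 0]
print((H @ c) % 2)    # [0]: c is a valid codeword
```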
Syndrome Decoding
Suppose we send a codeword $\mathbf{c}$ and receive $\mathbf{r} = \mathbf{c} + \mathbf{e}$, where $\mathbf{e}$ is the error vector (a string with 1s in the corrupted positions). The syndrome of the received word is:
$$\mathbf{s} = H \mathbf{r}^T = H(\mathbf{c} + \mathbf{e})^T = H\mathbf{c}^T + H\mathbf{e}^T = \mathbf{0} + H\mathbf{e}^T = H\mathbf{e}^T$$

The syndrome depends only on the error, not on which codeword was sent. This is the magic of linear codes. If $\mathbf{s} = \mathbf{0}$, no error is detected. If $\mathbf{s} \neq \mathbf{0}$, the syndrome tells us exactly which error pattern occurred (for correctable errors).

The decoder computes $\mathbf{s} = H\mathbf{r}^T$ directly from the received word, which by the identity above equals $H\mathbf{e}^T$. For a single-bit error at position $i$, the error vector $\mathbf{e}$ has a single 1 in position $i$, so the syndrome is simply the $i$-th column of $H$. If all columns of $H$ are distinct and nonzero, every single-bit error produces a unique syndrome, and the error can be corrected by flipping the bit whose column matches the syndrome.
The Hamming(7,4) Code
Richard Hamming's insight (1950) was to design $H$ so that its columns are all the nonzero binary strings of a given length. For the Hamming(7,4) code ($n=7$, $k=4$, $r=3$), the $2^3 - 1 = 7$ nonzero binary strings of length 3 become the columns of $H$:
$$H = \begin{pmatrix} 1 & 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 & 1 \end{pmatrix}$$

Reading off the systematic form, the last three columns form $I_3$, and the first four columns form $P^T$. The corresponding generator matrix in systematic form is:

$$G = \begin{pmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix}$$

The first four columns of $G$ are the identity (systematic form), so encoding preserves the four data bits and appends three parity bits.
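Putting the pieces together, here is an end-to-end Python sketch of Hamming(7,4) encoding and syndrome decoding, using the matrices above (the message and error position are arbitrary choices for illustration):

```python
import numpy as np

H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]], dtype=int)
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]], dtype=int)

m = np.array([1, 0, 1, 1])  # four data bits
c = (m @ G) % 2             # encode: 7-bit codeword

r = c.copy()
r[2] ^= 1                   # flip bit 2 (0-indexed) in transit

s = (H @ r) % 2             # syndrome
print(s)                    # [0 1 1]: column 2 of H

# Correct: flip the bit whose column of H matches the syndrome.
for i in range(7):
    if np.array_equal(H[:, i], s):
        r[i] ^= 1
print(np.array_equal(r, c))  # True: the error is corrected
print(r[:4])                 # first 4 bits are the data: [1 0 1 1]
```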
Parameters of Hamming Codes
The general Hamming code is parametrized by an integer $r \geq 2$:
- Block length: $n = 2^r - 1$
- Data bits: $k = 2^r - 1 - r$
- Parity bits: $r$
- Minimum distance: $d = 3$ (corrects 1 error, detects 2)
| $r$ | $n$ | $k$ | Rate $k/n$ | Name |
|---|---|---|---|---|
| 2 | 3 | 1 | 0.333 | Hamming(3,1) = repetition |
| 3 | 7 | 4 | 0.571 | Hamming(7,4) |
| 4 | 15 | 11 | 0.733 | Hamming(15,11) |
| 5 | 31 | 26 | 0.839 | Hamming(31,26) |
As $r$ grows, the rate approaches 1 - the overhead of parity bits becomes negligible relative to the data. This is a dramatic improvement over the repetition code. The Hamming(7,4) code protects 4 data bits with only 3 parity bits, achieving a rate of $4/7 \approx 0.571$ while still correcting any single-bit error.
Hamming developed his codes after growing frustrated with errors on the Bell Labs Model V relay computer. When errors occurred on weekends (when no operators were present), the machine simply halted and waited. Hamming decided that machines should fix their own errors, and the result was one of the foundational contributions to coding theory.
The Hamming Bound
How efficient can a single-error-correcting code be? The Hamming bound (also called the sphere-packing bound) gives a limit. A code with $n$-bit codewords that corrects $t$ errors must satisfy:
$$2^k \cdot \sum_{i=0}^{t} \binom{n}{i} \leq 2^n$$

For $t = 1$: $2^k \cdot (1 + n) \leq 2^n$, which gives $k \leq n - \log_2(1 + n)$. When $n = 2^r - 1$, we get $k \leq 2^r - 1 - r$, which is exactly the number of data bits in a Hamming code. In other words, Hamming codes are perfect codes - they achieve the Hamming bound with equality. Every possible received word is within Hamming distance 1 of exactly one codeword.
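A few lines of Python confirm the equality for the first several Hamming codes (a minimal sketch; the helper name is our own):

```python
from math import comb

def hamming_bound_filled(n, k, t=1):
    """Check whether spheres of radius t around codewords exactly fill {0,1}^n."""
    volume = sum(comb(n, i) for i in range(t + 1))
    return 2**k * volume == 2**n

for r in range(2, 6):
    n, k = 2**r - 1, 2**r - 1 - r
    print(f"Hamming({n},{k}): perfect = {hamming_bound_filled(n, k)}")
```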
Interactive Parity Check Encoder/Decoder
Use the widget above to explore the Hamming(7,4) code. Enter four data bits to see the encoded 7-bit codeword. Then flip any single bit in the received word to watch syndrome decoding identify and correct the error.
Beyond Single-Error Correction
Hamming codes only correct single-bit errors. For applications requiring more protection, the coding theory toolbox includes:
- Extended Hamming codes: Add an overall parity bit to get distance 4, enabling single-error correction and double-error detection (SECDED). The extended Hamming(8,4) code is used in ECC memory.
- BCH codes: A generalization of Hamming codes that can correct multiple errors.
- Reed-Solomon codes: Work over larger fields; used in CDs, DVDs, QR codes, and deep-space communication.
- LDPC and Turbo codes: Modern codes approaching Shannon's theoretical limit, used in 5G cellular and Wi-Fi.
16.3 The Limits of Classical Error Correction
Is there a fundamental limit to how well error correction can perform? The answer, given by Claude Shannon in 1948, is one of the most profound results in information theory.
The Binary Symmetric Channel
The simplest noise model is the binary symmetric channel (BSC): each bit is flipped independently with probability $p$ (the crossover probability):
$$0 \xrightarrow{1-p} 0 \qquad 0 \xrightarrow{p} 1$$

$$1 \xrightarrow{1-p} 1 \qquad 1 \xrightarrow{p} 0$$

When $p = 0$ the channel is perfect; when $p = 1/2$ the output is independent of the input. For $0 < p < 1/2$, some information survives the noise and error correction helps.
Channel Capacity
Shannon defined the capacity of a channel as the maximum rate at which information can be transmitted with arbitrarily low error probability. For the binary symmetric channel, the capacity is:
$$C = 1 - H(p)$$

where $H(p)$ is the binary entropy function:

$$H(p) = -p \log_2 p - (1-p) \log_2(1-p)$$

Shannon's Noisy Channel Coding Theorem (1948): For any rate $R < C$ (where $C$ is the channel capacity), there exist codes that achieve arbitrarily low error probability. Conversely, for any rate $R > C$, reliable communication is impossible regardless of the code used.
Let us unpack this. The theorem has two parts:
- Achievability (positive part): If you want to send data at any rate below capacity, you can do it reliably. There exist codes (possibly with very long block lengths) that make the error probability as small as you wish.
- Converse (negative part): If you try to send data at any rate above capacity, you cannot avoid errors. No code, no matter how clever, can achieve reliable communication above the capacity limit.
Numerical Examples
For a BSC with $p = 0.01$ (1% error rate): $H(0.01) \approx 0.081$, so $C \approx 0.919$. Even with a 1% error rate, we can reliably transmit at up to 91.9% of the raw channel rate. The Hamming(7,4) code at rate $4/7 \approx 0.571$ is reliable but far from optimal. Modern LDPC codes operate within a fraction of a percent of Shannon's limit.
| $p$ | $H(p)$ | $C = 1 - H(p)$ | Interpretation |
|---|---|---|---|
| 0 | 0 | 1 | No noise, full capacity |
| 0.01 | 0.081 | 0.919 | Mild noise |
| 0.1 | 0.469 | 0.531 | Moderate noise |
| 0.25 | 0.811 | 0.189 | Heavy noise |
| 0.5 | 1.0 | 0 | Complete randomness |
At $p = 0.5$, every output bit is equally likely to be 0 or 1 regardless of the input, so the channel carries zero useful information.
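The entries in this table are easy to reproduce. A minimal Python sketch (the function name is our own):

```python
from math import log2

def binary_entropy(p):
    """H(p) = -p log2(p) - (1-p) log2(1-p), with H(0) = H(1) = 0."""
    if p in (0, 1):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0, 0.01, 0.1, 0.25, 0.5):
    print(f"p = {p:<5}  H(p) = {binary_entropy(p):.3f}  C = {1 - binary_entropy(p):.3f}")
```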
Shannon's Theorem Is Existential
Shannon's theorem proves that good codes exist, but does not tell you how to find them. His proof used random coding: pick a code at random and, with high probability, it will be good. The quest for explicit, efficiently decodable codes near Shannon's limit drove decades of research - from Hamming (1950) through turbo codes (1993) and LDPC codes (1990s). Modern polar codes (Arikan, 2009) provably achieve capacity, and LDPC codes in 5G standards operate within 0.1 dB of the Shannon limit.
Why This Matters for Quantum EC
Shannon's framework carries over to the quantum setting, but with key complications:
- Richer error models. Classical bits only flip. Qubits suffer bit flips (X), phase flips (Z), or both (Y), plus continuous rotations.
- No-cloning forbids copying. The repetition code works by copying bits. Quantum mechanics forbids copying unknown states. Quantum EC must protect information without duplicating it - achievable through entanglement.
- Syndrome decoding survives. Stabilizer codes - the quantum generalization of linear codes - use parity measurements that extract syndrome information without collapsing the quantum state.
The no-cloning theorem does not make quantum error correction impossible. Quantum EC does not copy the state - it spreads information across multiple qubits using entanglement. The logical qubit lives in the correlations among physical qubits, not in any single one. In Chapter 17 we will see exactly how this works.
Classical-to-Quantum Dictionary
The classical vocabulary we built in this chapter maps directly to quantum error correction:
| Classical Concept | Definition | Quantum Analog (Preview) |
|---|---|---|
| Codeword | A valid encoded bit string satisfying all parity checks | Code state (lives in a code subspace) |
| Parity check matrix $H$ | Matrix whose null space is the code; rows are parity constraints | Stabilizer generators |
| Syndrome $\mathbf{s} = H\mathbf{e}^T$ | Vector revealing the error pattern | Stabilizer measurement outcomes |
| Minimum distance $d$ | Smallest Hamming weight of a nonzero codeword | Minimum weight of undetectable error |
| Code rate $R = k/n$ | Fraction of bits carrying data | Ratio of logical to physical qubits |
| Shannon capacity | Maximum reliable transmission rate | Quantum channel capacity |