Chapter 29: Quantum Shannon Theory
Shannon's classical theory of communication established that every noisy channel has a well-defined capacity - a maximum rate at which information can be transmitted reliably. The quantum version of this theory is richer and more surprising. A quantum channel has not one but several capacities, depending on whether we want to send classical bits, quantum bits, or use entanglement assistance. This chapter develops the four pillars of quantum Shannon theory: the HSW theorem for classical communication, the LSD theorem for quantum communication, entanglement-assisted capacity, and Schumacher compression for quantum data.
29.1 Classical Information over Quantum Channels
The most natural question about a quantum channel is: how much classical information can it transmit? Alice encodes a classical message $x$ into a quantum state $\rho_x$ and sends it through the channel $\mathcal{N}$. Bob receives $\mathcal{N}(\rho_x)$ and performs a measurement to decode the message. The maximum reliable transmission rate is the classical capacity $C(\mathcal{N})$.
The Holevo Bound Revisited
As we saw in Chapter 26, the Holevo bound limits the mutual information between Alice's message and Bob's measurement outcome in a single use of the channel. For an ensemble $\{p_x, \rho_x\}$:
$$I(X:Y) \leq \chi(\{p_x, \mathcal{N}(\rho_x)\}) = S\!\left(\sum_x p_x \mathcal{N}(\rho_x)\right) - \sum_x p_x S(\mathcal{N}(\rho_x))$$

But the Holevo bound applies to a single channel use. In Shannon theory, we are interested in what happens when we use the channel many times and are allowed to encode across multiple uses (block coding). Can we do better than the single-use Holevo bound? The answer, in a precise asymptotic sense, is given by the HSW theorem.
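To make the $\chi$-quantity concrete, here is a minimal numpy sketch that evaluates it for a two-state ensemble. The choice of ensemble ($|0\rangle$ and $|+\rangle$ with equal probability, sent through a noiseless channel) is an illustrative assumption, not from the text:

```python
import numpy as np

def vn_entropy(rho):
    """Von Neumann entropy S(rho) in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]  # drop zero eigenvalues (0 log 0 = 0)
    return float(-np.sum(evals * np.log2(evals)))

def holevo_chi(probs, states):
    """chi = S(sum_x p_x rho_x) - sum_x p_x S(rho_x)."""
    avg = sum(p * r for p, r in zip(probs, states))
    return vn_entropy(avg) - sum(p * vn_entropy(r) for p, r in zip(probs, states))

# Equal mixture of the pure states |0> and |+> (second terms vanish)
zero = np.array([[1.0, 0.0], [0.0, 0.0]])
plus = 0.5 * np.ones((2, 2))
chi = holevo_chi([0.5, 0.5], [zero, plus])
print(round(chi, 3))  # 0.601: non-orthogonal pure states carry less than 1 bit
```

Because the two states are non-orthogonal, $\chi < 1$: even a noiseless channel cannot convey a full bit per use with this ensemble.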
The HSW Theorem
The Holevo-Schumacher-Westmoreland (HSW) Theorem. The classical capacity of a quantum channel $\mathcal{N}$ is: $$C(\mathcal{N}) = \lim_{n \to \infty} \frac{1}{n} \chi(\mathcal{N}^{\otimes n})$$ where $\chi(\mathcal{N}^{\otimes n}) = \max_{\{p_x, \rho_x\}} \chi(\{p_x, \mathcal{N}^{\otimes n}(\rho_x)\})$ is the Holevo information maximized over all input ensembles for $n$ uses of the channel. The regularization (limit over $n$) is necessary because the Holevo information can be superadditive.
The HSW theorem was proved independently by Holevo (1998) and by Schumacher and Westmoreland (1997). The achievability proof constructs random codebooks of product states and shows that a typicality-based decoding measurement (a quantum generalization of joint typicality decoding) can reliably distinguish the codewords. The converse follows from the Holevo bound.
The Additivity Problem
The regularization in the HSW formula - the limit $\lim_{n \to \infty} \frac{1}{n}\chi(\mathcal{N}^{\otimes n})$ - is a serious complication. If the Holevo information were additive ($\chi(\mathcal{N}^{\otimes n}) = n \cdot \chi(\mathcal{N})$), the formula would simplify to $C(\mathcal{N}) = \chi(\mathcal{N})$, a single-letter formula that can be computed from one use of the channel.
For many years, it was conjectured that the Holevo information is always additive. In 2009, Hastings disproved this by showing that there exist channels for which entangled input states across multiple channel uses yield strictly higher Holevo information than product inputs. This was a landmark result: it means that entanglement between channel inputs is a genuine resource for classical communication, and computing the exact classical capacity is in general intractable.
It is tempting to think that "the Holevo bound gives the capacity." This is true only for channels whose Holevo information is additive - a property many common channels enjoy; King (2003) proved it for the depolarizing channel in every dimension. For general channels, the capacity requires the regularized formula, which involves an optimization over an unbounded number of channel uses. The single-use Holevo quantity $\chi(\mathcal{N})$ is always an achievable rate and therefore a lower bound on the capacity, but it need not be tight.
Proof Strategy: Typical Sequences Meet Quantum States
The achievability proof of the HSW theorem follows the same conceptual structure as Shannon's original proof, adapted to the quantum setting. Alice constructs a random codebook by choosing $2^{nR}$ codewords, each a product state of $n$ signal states drawn from the optimal input ensemble. Bob uses a POVM constructed from projections onto the typical subspaces associated with each codeword. The key steps are:
- Random coding: Generate a random codebook $\{x^n(m)\}$ by drawing each symbol independently from the optimal distribution.
- Gentle measurement lemma: Bob's POVM can be designed so that correctly identifying the codeword disturbs the state only slightly. This is a uniquely quantum ingredient with no classical analog.
- Union bound on error: The probability that Bob mistakes one codeword for another decays exponentially in $n$ when the rate $R < \chi$.
The converse (that rates above $\chi$ per use are impossible) follows from the Holevo bound applied to $n$ uses of the channel, combined with the Fano inequality.
Examples
For several important channels, the classical capacity is known in single-letter form:
- Noiseless qubit channel: $C = 1$ bit per use.
- Quantum erasure channel (erasure probability $\epsilon$): $C = 1 - \epsilon$ bits per use.
- Qubit depolarizing channel (error probability $p$): $C = 1 - H\!\bigl(\tfrac{2p}{3}\bigr)$, where $H(x) = -x\log_2 x - (1-x)\log_2(1-x)$ is the binary entropy function and $p$ here denotes the total Pauli error probability (each of $X$, $Y$, $Z$ applied with probability $p/3$), related to the replacement probability $p_r$ used in Chapter 27 by $p = \frac{3}{4}p_r$. The capacity decreases because the output of any pure input state has bit-flip probability $2p/3$ (the $Z$ error leaves basis states unchanged, while $X$ and $Y$ errors flip them).
For the depolarizing channel, the optimal input ensemble uses orthogonal states with equal probabilities, and the capacity decreases from 1 bit (no noise) to 0 as $p$ increases toward the fully depolarizing limit $p = 3/4$ (where every input is mapped to $I/2$).
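As a sanity check on the capacity formula above, a few lines of Python trace how $C = 1 - H(2p/3)$ falls from 1 bit (no noise) to 0 at $p = 3/4$. The sample points are arbitrary illustrative choices:

```python
import numpy as np

def H2(x):
    """Binary entropy in bits, with H2(0) = H2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return float(-x * np.log2(x) - (1 - x) * np.log2(1 - x))

for p in [0.0, 0.1, 0.25, 0.5, 0.75]:
    C = 1 - H2(2 * p / 3)  # effective bit-flip probability is 2p/3
    print(f"p = {p:4.2f}  C = {C:.4f}")
```

The printed values decrease monotonically, reaching exactly zero at the fully depolarizing point $p = 3/4$, where $2p/3 = 1/2$ and the induced bit-flip channel is useless.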
29.2 Quantum Capacity
The quantum capacity $Q(\mathcal{N})$ is the maximum rate at which quantum information (qubits) can be reliably transmitted through a noisy quantum channel. This is a fundamentally quantum question with no classical analog: we are asking not just about transmitting bits, but about preserving quantum coherence and entanglement.
Coherent Information
The key quantity for quantum capacity is the coherent information, introduced by Schumacher and Nielsen (1996):
$$I_c(\rho, \mathcal{N}) = S(\mathcal{N}(\rho)) - S(\mathcal{N}^c(\rho))$$where $\mathcal{N}^c$ is the complementary channel (Chapter 27). This can be rewritten as:
$$I_c(\rho, \mathcal{N}) = S(B)_\sigma - S(AB)_\sigma$$where $\sigma_{AB} = (\mathcal{I}_A \otimes \mathcal{N}_B)(|\psi\rangle\langle\psi|_{AB})$ and $|\psi\rangle_{AB}$ is a purification of $\rho$. Note that $I_c = -S(A|B)$, the negative of the conditional entropy. The coherent information measures how much more the receiver knows about the full state than the environment does.
The coherent information $I_c(\rho, \mathcal{N})$ can be negative - this happens when the environment captures more information than the receiver. A positive coherent information indicates that the channel can support some quantum communication. Unlike classical mutual information, the coherent information is not symmetric and can be negative, reflecting the fundamental asymmetry between sender and receiver in quantum communication.
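The coherent information can be evaluated directly from a Kraus representation: if $\mathcal{N}(\rho) = \sum_j K_j \rho K_j^\dagger$, the complementary channel has matrix elements $\mathcal{N}^c(\rho)_{jk} = \mathrm{Tr}(K_j \rho K_k^\dagger)$. The sketch below uses the amplitude damping channel with $\gamma = 0.2$ and the maximally mixed input as illustrative choices:

```python
import numpy as np

def vn_entropy(rho):
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def apply_channel(rho, kraus):
    return sum(K @ rho @ K.conj().T for K in kraus)

def apply_complement(rho, kraus):
    # Environment's output: N^c(rho)[j, k] = Tr(K_j rho K_k^dagger)
    d = len(kraus)
    out = np.empty((d, d), dtype=complex)
    for j in range(d):
        for k in range(d):
            out[j, k] = np.trace(kraus[j] @ rho @ kraus[k].conj().T)
    return out

def coherent_info(rho, kraus):
    return vn_entropy(apply_channel(rho, kraus)) - vn_entropy(apply_complement(rho, kraus))

g = 0.2  # damping probability (illustrative)
kraus_ad = [np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex),
            np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)]
rho = 0.5 * np.eye(2)
print(round(coherent_info(rho, kraus_ad), 3))  # 0.502: positive, quantum communication possible
```

Raising $\gamma$ above $1/2$ makes the value negative: the environment then captures more than the receiver, exactly the sign change described above.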
The LSD Theorem
The quantum capacity is given by the Lloyd-Shor-Devetak (LSD) theorem:
$$Q(\mathcal{N}) = \lim_{n \to \infty} \frac{1}{n} \max_\rho I_c(\rho, \mathcal{N}^{\otimes n})$$

The achievability was proved with increasing rigor by Lloyd (1997), Shor (2002), and Devetak (2005). Devetak's proof is the most complete and uses a clever connection to private classical communication: a code that transmits classical bits privately (hidden from the environment) can be upgraded into a code that transmits qubits coherently.
Like the HSW theorem, the LSD formula requires regularization. The coherent information is known to be superadditive for some channels: there exist channels with zero single-use coherent information ($\max_\rho I_c(\rho, \mathcal{N}) = 0$) but positive quantum capacity. Two such channels used together can transmit quantum information even though neither can alone - a phenomenon called superactivation of quantum capacity, discovered by Smith and Yard (2008).
Degradable Channels
For degradable channels - channels where the complementary channel can be obtained by applying a degrading map to the output - the coherent information is additive, and the quantum capacity simplifies to a single-letter formula:
$$Q(\mathcal{N}) = \max_\rho I_c(\rho, \mathcal{N})$$

Important degradable channels include the quantum erasure channel and the amplitude damping channel. For the erasure channel with parameter $\epsilon$: $Q = \max(1 - 2\epsilon, 0)$. Note that $Q > 0$ only for $\epsilon < 1/2$ - a clear threshold effect.
Worked Example: Quantum Capacity of the Erasure Channel
The quantum erasure channel with parameter $\epsilon$ transmits the qubit perfectly with probability $1 - \epsilon$ and erases it (replacing it with a known flag state) with probability $\epsilon$. Because it is degradable, the quantum capacity has the single-letter form:
$$Q = \max_\rho I_c(\rho, \mathcal{N})$$

For any input state $\rho$, the coherent information of the erasure channel evaluates to $(1 - 2\epsilon)S(\rho)$, which is maximized by $\rho = I/2$ (the maximally mixed state). This gives:
$$Q = \max(1 - 2\epsilon, 0)$$

The capacity is positive only when $\epsilon < 1/2$ - the receiver must get more than half the transmitted qubits on average. At $\epsilon = 1/2$, the channel is at the threshold: the environment captures exactly as much as the receiver, and no quantum information survives. Compare with the classical capacity $C = 1 - \epsilon$, which is positive for all $\epsilon < 1$. Quantum information is harder to protect than classical information.
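The claim $I_c = (1 - 2\epsilon)S(\rho)$ can be checked numerically. In the sketch below (with the illustrative values $\epsilon = 0.3$ and $\rho = I/2$), the receiver's output is $\rho$ in the qubit block with weight $1 - \epsilon$ plus the flag state with weight $\epsilon$, while the environment's output is the same with $\epsilon$ and $1 - \epsilon$ exchanged:

```python
import numpy as np

def vn_entropy(rho):
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def erasure_outputs(rho, eps):
    """Receiver (B) and environment (E) outputs; basis {|0>, |1>, |e>}."""
    out_B = np.zeros((3, 3))
    out_E = np.zeros((3, 3))
    out_B[:2, :2] = (1 - eps) * rho
    out_B[2, 2] = eps          # erasure flag reaches the receiver
    out_E[:2, :2] = eps * rho  # the environment keeps the erased qubit
    out_E[2, 2] = 1 - eps
    return out_B, out_E

eps, rho = 0.3, 0.5 * np.eye(2)
out_B, out_E = erasure_outputs(rho, eps)
Ic = vn_entropy(out_B) - vn_entropy(out_E)
print(round(Ic, 6))  # (1 - 2*eps) * S(I/2) = 0.4
```

Running the same computation at $\epsilon = 0.5$ gives exactly zero: receiver and environment see mirror-image outputs, so the two entropies cancel.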
Predict: Erasure Channel at $\epsilon = 0.5$
The quantum erasure channel with $\epsilon = 0.5$ erases half the qubits on average. Its classical capacity is $C = 1 - \epsilon = 0.5$ bits per use (positive!), while its quantum capacity is $Q = \max(1 - 2\epsilon, 0) = 0$ qubits per use.
Question: Why does the classical capacity remain positive at $\epsilon = 0.5$ while the quantum capacity drops to zero? Write your prediction below.
Observe
At $\epsilon = 0.5$: the channel erases each qubit independently with probability 1/2. The receiver gets roughly half the transmitted states. The erasure is heralded - both parties know which qubits were erased. For classical bits, the surviving half still carries full information. For quantum states, the environment now holds as much quantum information as the receiver.
$C(\epsilon = 0.5) = 0.5$ bits > 0 (classical information survives).
$Q(\epsilon = 0.5) = 0$ qubits (quantum coherence is lost).
Explain
The asymmetry comes from the no-cloning theorem. Classical bits can be copied, so even if half are lost, the remaining half suffices. Quantum information cannot be cloned: at $\epsilon = 0.5$, the environment receives as much quantum information as the receiver (the channel becomes anti-degradable). If quantum information could still be sent, both receiver and environment would effectively share it - violating no-cloning.
This illustrates a deep principle: quantum information is a zero-sum resource between receiver and environment. The coherent information $I_c = S(B) - S(AB)$ equals zero precisely when the environment knows as much as the receiver.
Anti-Degradable Channels and Zero Capacity
A channel is anti-degradable if the receiver's output can be obtained from the environment's output. For such channels, the environment has at least as much information as the receiver, so no quantum information can be protected: $Q = 0$. The depolarizing channel is anti-degradable (and hence has zero quantum capacity) when the noise parameter $p$ exceeds a certain threshold.
The No-Cloning Bound
A beautiful connection between quantum capacity and the no-cloning theorem: if a channel $\mathcal{N}$ has $Q > 0$, then its complementary channel $\mathcal{N}^c$ must have $Q^c = 0$. If both $\mathcal{N}$ and $\mathcal{N}^c$ had positive quantum capacity, we could use them together to create two copies of quantum information (one at the receiver, one at the environment), violating no-cloning. This fundamental constraint means quantum information is a zero-sum game between receiver and environment: it cannot be shared.
The quantum capacity landscape is much richer than the classical one. Classical channel capacity is given by a simple single-letter formula (Shannon's noisy channel coding theorem). Quantum capacity requires regularization, can be superadditive, can be superactivated, and the question of whether a given channel has zero or positive capacity can be undecidable. This complexity reflects the fundamentally richer structure of quantum information.
29.3 Entanglement-Assisted Communication
What if Alice and Bob share entanglement before communication begins? Pre-shared entanglement is a powerful resource: recall that superdense coding uses one shared Bell pair to send two classical bits through one qubit channel. The entanglement-assisted classical capacity $C_E(\mathcal{N})$ quantifies this advantage.
The Bennett-Shor-Smolin-Thapliyal (BSST) Theorem
The BSST Theorem. The entanglement-assisted classical capacity of a quantum channel $\mathcal{N}$ is: $$C_E(\mathcal{N}) = \max_\rho I(A:B)_\sigma$$ where $\sigma_{AB} = (\mathcal{I}_A \otimes \mathcal{N}_B)(|\psi\rangle\langle\psi|_{AB})$, $|\psi\rangle_{AB}$ is a purification of $\rho$, and $I(A:B) = S(A) + S(B) - S(AB)$ is the quantum mutual information. This is a single-letter formula - no regularization is needed.
The BSST theorem is remarkable for several reasons:
- No regularization: Unlike $C$ and $Q$, the entanglement-assisted capacity is given by a single-letter formula. Entanglement between channel inputs provides no additional benefit beyond what the mutual information formula already captures.
- Always at least as large as $C$: $C_E(\mathcal{N}) \geq C(\mathcal{N})$, since pre-shared entanglement can only help. For a noiseless qubit channel, $C_E = 2$ bits (superdense coding) compared to $C = 1$ bit.
- Quantum mutual information as capacity: The quantum mutual information, which we studied as a measure of correlations in Chapter 28, now acquires an operational meaning as a communication rate.
Entanglement-Assisted Quantum Capacity
With pre-shared entanglement, quantum communication also becomes easier. The entanglement-assisted quantum capacity is:
$$Q_E(\mathcal{N}) = \frac{1}{2} C_E(\mathcal{N}) = \frac{1}{2} \max_\rho I(A:B)_\sigma$$

The factor of $1/2$ comes from teleportation and superdense coding: with free entanglement, teleportation converts two classical bits into one qubit, and superdense coding converts one qubit into two classical bits, locking the two entanglement-assisted capacities into an exact 2:1 ratio.
The Capacity Hierarchy
We can now compare the various capacities. For a quantum channel $\mathcal{N}$:
$$Q(\mathcal{N}) \leq C(\mathcal{N}) \leq C_E(\mathcal{N})$$

The first inequality says quantum information is harder to transmit than classical information (protecting coherence is harder than protecting bit values). The second says entanglement assistance can only help. For a noiseless qubit channel: $Q = 1$ qubit, $C = 1$ bit, $C_E = 2$ bits.
The gaps between these capacities can be dramatic. There exist channels with $Q = 0$ (no quantum communication possible) but $C > 0$ (classical communication still works). Entanglement assistance can turn a channel with low classical capacity into one with high capacity.
There is also a private classical capacity $C_p(\mathcal{N})$ - the rate of classical communication secure against an eavesdropper who controls the environment. It satisfies $Q(\mathcal{N}) \leq C_p(\mathcal{N}) \leq C(\mathcal{N})$. Devetak (2005) proved the coding theorems for $Q$ and $C_p$ together, obtaining quantum codes by making private classical codes coherent. The first inequality can be strict: there exist channels with $Q = 0$ but $C_p > 0$, so sending private classical information is a genuinely weaker task than sending quantum information.
29.4 Quantum Data Compression
Shannon's source coding theorem says that a classical source with entropy $H$ can be compressed to $H$ bits per symbol and no fewer. The quantum analog is Schumacher compression, proved by Schumacher in 1995, which established the von Neumann entropy as the fundamental limit of quantum data compression.
The Setup
Alice has a quantum source that produces states from an ensemble $\{p_x, |\psi_x\rangle\}$, giving rise to the average state $\rho = \sum_x p_x |\psi_x\rangle\langle\psi_x|$. She wants to compress $n$ copies of this source into the fewest qubits possible, transmit them through a noiseless quantum channel, and have Bob decompress with high fidelity.
Typical Subspace
The key idea is the typical subspace, the quantum analog of the typical set from classical information theory. Consider the spectral decomposition $\rho = \sum_j \lambda_j |j\rangle\langle j|$. For $n$ copies, the Hilbert space of $\rho^{\otimes n}$ has dimension $d^n$, but most of the probability weight is concentrated on a subspace of dimension approximately $2^{nS(\rho)}$ - the typical subspace $T_{n,\delta}$.
More precisely, the typical subspace is spanned by tensor products $|j_1\rangle \otimes \cdots \otimes |j_n\rangle$ whose empirical eigenvalue frequencies are close to the true eigenvalue distribution $\{\lambda_j\}$. By the law of large numbers:
$$\text{dim}(T_{n,\delta}) \approx 2^{nS(\rho)}$$and the probability of the state lying outside the typical subspace vanishes as $n \to \infty$.
Schumacher's Theorem
Schumacher's Quantum Source Coding Theorem (1995). A quantum source with density operator $\rho$ can be reliably compressed at rate $R$ qubits per source state if and only if $R \geq S(\rho)$. The von Neumann entropy $S(\rho)$ is the optimal compression rate - the quantum analog of Shannon entropy for source coding.
The compression protocol works as follows:
- Encode: Project the $n$-qubit source state onto the typical subspace $T_{n,\delta}$ (dimension $\approx 2^{nS(\rho)}$). This projection succeeds with probability approaching 1 as $n \to \infty$.
- Compress: The typical subspace has dimension $2^{nS(\rho)}$, so it can be encoded into $nS(\rho)$ qubits using a unitary mapping followed by discarding the remaining qubits.
- Decompress: Bob applies the inverse unitary to recover the state in the typical subspace.
The fidelity of this protocol approaches 1 as $n \to \infty$ for any rate $R > S(\rho)$. Conversely, any protocol with rate $R < S(\rho)$ has fidelity approaching 0. The von Neumann entropy thus has a direct operational interpretation: it is the number of qubits per copy needed to faithfully represent a quantum source.
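Step 1 of the protocol - the typical projection succeeds with probability approaching 1 - can be illustrated classically when $\rho$ is diagonal: sample which eigenvector each of the $n$ factors lands in, and check whether the frequency of the smaller eigenvalue is within $\delta$ of its true value. All parameters below ($\lambda_1 = 0.1$, $n = 1000$, $\delta = 0.02$, the seed) are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(7)
lam1, n, delta, trials = 0.1, 1000, 0.02, 2000

hits = 0
for _ in range(trials):
    k = rng.binomial(n, lam1)       # how many of the n factors fall on eigenvector |1>
    if abs(k / n - lam1) <= delta:  # empirical frequency close to the eigenvalue?
        hits += 1

print(hits / trials)  # success rate of the typical projection: well above 0.9
```

Increasing $n$ at fixed $\delta$ drives the printed rate to 1, which is exactly the sense in which the projection "succeeds with probability approaching 1 as $n \to \infty$."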
Comparison with Classical Compression
| Feature | Classical (Shannon) | Quantum (Schumacher) |
|---|---|---|
| Source | i.i.d. random variable $X$ | i.i.d. quantum states from $\rho$ |
| Optimal rate | $H(X)$ bits/symbol | $S(\rho)$ qubits/state |
| Key tool | Typical set | Typical subspace |
| Compression map | Variable-length coding | Projection + unitary |
| Entropy meaning | Average surprise | Quantum uncertainty / mixedness |
Worked Example: Compressing a Biased Qubit Source
Suppose a source produces $|0\rangle$ with probability $0.9$ and $|1\rangle$ with probability $0.1$. The average state is $\rho = 0.9|0\rangle\langle 0| + 0.1|1\rangle\langle 1|$, which is already diagonal, with eigenvalues $0.9$ and $0.1$. The von Neumann entropy is:
$$S(\rho) = -0.9\log 0.9 - 0.1\log 0.1 = h(0.1) \approx 0.469 \text{ qubits}$$

So $n$ copies of this source can be compressed into approximately $0.469n$ qubits - less than one qubit per source state. The typical subspace of $\rho^{\otimes n}$ has dimension $\approx 2^{0.469n}$, which is exponentially smaller than the full Hilbert space of dimension $2^n$. Most of the probability weight is concentrated on sequences with approximately $0.9n$ zeros and $0.1n$ ones.
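Because $\rho$ is diagonal, these numbers can be checked by brute-force counting: the typical subspace is spanned by basis strings whose fraction of ones is within $\delta$ of $0.1$. A sketch with the illustrative parameters $n = 5000$, $\delta = 0.01$:

```python
from math import comb, exp, lgamma, log, log2

p1, n, delta = 0.1, 5000, 0.01
lo, hi = round((p1 - delta) * n), round((p1 + delta) * n)

# Dimension of the typical subspace: number of strings with lo..hi ones
dim = sum(comb(n, k) for k in range(lo, hi + 1))

# Probability weight carried by those strings (log-space to avoid overflow)
def log_prob(k):
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p1) + (n - k) * log(1 - p1))

weight = sum(exp(log_prob(k)) for k in range(lo, hi + 1))

print(log2(dim) / n)  # ~0.50 qubits/state, slightly above S(rho) ~ 0.469 (-> S as delta -> 0)
print(weight)         # close to 1: almost all probability weight is typical
```

The subspace occupies a $2^{\log_2(\dim) - n} \approx 2^{-0.5 n}$ fraction of the full Hilbert space yet carries nearly all the weight - the essence of Schumacher compression.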
Because the two signal states are orthogonal, this source is effectively classical: Shannon's theorem compresses the corresponding classical bit source to $h(0.1) \approx 0.469$ bits per symbol, and the quantum rate is identical. Schumacher compression reduces to Shannon compression when the signal states are orthogonal - the quantum theory properly generalizes the classical one.
Beyond i.i.d. Sources
Schumacher's theorem assumes the source produces independent and identically distributed (i.i.d.) quantum states. For correlated quantum sources, the compression limit is given by the quantum asymptotic equipartition property (AEP): the compression rate equals $\lim_{n \to \infty} \frac{1}{n} S(\rho^{(n)})$, where $\rho^{(n)}$ is the joint state of $n$ source outputs. For i.i.d. sources, this reduces to $S(\rho)$, but for correlated sources it can be strictly less, reflecting the redundancy in the correlations.
Visible vs. Blind Compression
In visible compression, the encoder knows which state $|\psi_x\rangle$ was produced in each round (she has classical side information). In blind compression, she holds only the quantum state, not the classical label. For pure-state ensembles, the optimal rate is $S(\rho)$ in both settings: the classical side information does not help. For ensembles of mixed states the situation is subtler - visible compression can sometimes beat blind compression, and the optimal rates are not given by a simple formula.
Schumacher compression is sometimes called "quantum noiseless coding" by analogy with Shannon's noiseless coding theorem. Just as Shannon's theorem established that entropy is the right measure of classical information content, Schumacher's theorem established that the von Neumann entropy is the right measure of quantum information content. This was a crucial conceptual advance: it gave the von Neumann entropy, originally introduced in the context of quantum statistical mechanics in 1927, a precise information-theoretic operational interpretation.
Sandbox: Channel Capacity Exploration
This sandbox demonstrates how noise degrades information transmission. We send a known qubit state through different noise levels and measure how well the state is preserved. With no noise, the measurement outcome is deterministic. As noise increases, the outcomes become more random, reflecting the loss of channel capacity.
Experiments to try:
- No noise: Set the rotation to `ry(0) q[1];`. The environment qubit stays in $|0\rangle$, the CNOT has no effect, and q[0] is measured as 0 with 100% probability. The channel is noiseless with capacity 1.
- Light noise: Use `ry(0.3) q[1];`. A small fraction of runs flip q[0]. The channel capacity is reduced but still positive.
- Maximum noise: Use `ry(1.5708) q[1];` ($\pi/2$). The environment qubit is in an equal superposition, causing maximum decoherence. The data qubit is now 50/50 - the channel is fully depolarizing and capacity approaches zero.
- Encoding protection: Add `h q[0];` before the noise and `h q[0];` after. Does changing the encoding basis affect the error rate?
The Quantum Shannon Theory Landscape
The following table summarizes the major capacity theorems of quantum Shannon theory. Each row gives a communication task, the capacity formula, and whether the formula is single-letter or requires regularization.
| Task | Capacity | Formula | Single-letter? |
|---|---|---|---|
| Classical bits over quantum channel | $C(\mathcal{N})$ | $\lim_n \frac{1}{n}\chi(\mathcal{N}^{\otimes n})$ (HSW) | No (in general) |
| Qubits over quantum channel | $Q(\mathcal{N})$ | $\lim_n \frac{1}{n}\max_\rho I_c(\rho, \mathcal{N}^{\otimes n})$ (LSD) | No (in general) |
| Classical bits with entanglement | $C_E(\mathcal{N})$ | $\max_\rho I(A:B)_\sigma$ (BSST) | Yes |
| Qubits with entanglement | $Q_E(\mathcal{N})$ | $\frac{1}{2}\max_\rho I(A:B)_\sigma$ | Yes |
| Quantum source compression | - | $S(\rho)$ qubits/state (Schumacher) | Yes |
The striking feature of this landscape is the asymmetry between the entanglement-assisted and unassisted scenarios. Without entanglement, the capacity formulas are regularized (involving a limit over many channel uses) and generally intractable to compute. With entanglement assistance, the formulas become single-letter and computable. Entanglement does not just increase capacity - it simplifies the theory. This deep connection between entanglement and the mathematical tractability of quantum information theory remains one of the most fascinating aspects of the field.
Quantum Shannon theory reveals that a quantum channel is not characterized by a single number. It has a classical capacity $C$, a quantum capacity $Q$, an entanglement-assisted capacity $C_E$, a private capacity $C_p$, and more. These capacities are generally different and can have surprising relationships (e.g., $Q = C_p$, superactivation of $Q$). The resource of pre-shared entanglement is special: it always simplifies the capacity formula to a single-letter expression.