Chapter 14: Noise, Decoherence, and the NISQ Era

In the preceding chapters we built an impressive toolkit: qubits, gates, entanglement, teleportation, algorithms that factor integers and search databases. But every circuit we ran in the sandbox produced textbook-perfect results - histograms that matched the mathematics exactly. Real quantum hardware is not so obliging. Every physical qubit is under constant assault from its environment. Energy leaks away. Phases drift. Gates occasionally do the wrong thing. Measurements sometimes lie. In this chapter we confront the messy reality of quantum computing and develop the mathematical framework to describe it precisely.

The news is not all bad. Understanding noise is the first step toward fighting it, and the techniques we develop here - density matrices, quantum channels, benchmarking protocols - form the foundation for the quantum error correction we will study in later chapters. Meanwhile, the current generation of noisy intermediate-scale quantum (NISQ) devices is already pushing the boundaries of what classical computers can simulate, even without full error correction. By the end of this chapter you will understand what makes real qubits imperfect, how to model that imperfection mathematically, how experimentalists measure it, and what today's quantum hardware can and cannot accomplish.

14.1 Why Real Qubits Are Imperfect

A qubit, as we defined it in Chapter 4, is a two-level quantum system described by a state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$. In theory, $\alpha$ and $\beta$ are fixed until we deliberately apply a gate or measurement. In practice, the qubit is a physical object - a superconducting circuit, a trapped ion, a photon's polarization - embedded in a larger physical environment. That environment is constantly interacting with the qubit, and those interactions corrupt its state. We classify these corruptions into four broad categories.

T1 Relaxation: Energy Decay

Every physical qubit has a ground state $|0\rangle$ and an excited state $|1\rangle$ separated by some energy gap. Left alone, a qubit in the excited state will eventually decay to the ground state, releasing energy into the environment - much like an atom emitting a photon. This process is called energy relaxation or T1 decay.

The characteristic time $T_1$ is defined so that the probability of finding the qubit still in $|1\rangle$ after time $t$ decays exponentially:

$$P(|1\rangle, t) = P(|1\rangle, 0) \, e^{-t/T_1}$$

For modern superconducting transmon qubits, $T_1$ values typically range from 50 to 200 microseconds on production processors. In optimized research devices, individual qubits have reached several hundred microseconds. Trapped-ion qubits enjoy much longer $T_1$ times - often seconds or more - because their energy levels are more isolated from the environment. However, even trapped-ion qubits are not immune: collisions with background gas molecules and stray electric fields eventually cause decay.
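
The decay law is simple enough to check with a few lines of arithmetic. Below is a minimal sketch in Python/numpy (the helper name and the example duration are ours, and the $T_1$ values are just the representative figures quoted above, not measurements from any specific device):

```python
import numpy as np

def p_excited(t_us: float, T1_us: float) -> float:
    """Survival probability of |1> after time t: P(t) = P(0) * exp(-t/T1),
    assuming the qubit started fully in |1> (so P(0) = 1)."""
    return np.exp(-t_us / T1_us)

# How much excited-state population survives a 10-microsecond circuit?
for T1 in (50, 100, 200):
    print(f"T1 = {T1:3d} us -> P(|1>) after 10 us = {p_excited(10, T1):.3f}")
# T1 =  50 us -> 0.819;  T1 = 100 us -> 0.905;  T1 = 200 us -> 0.951
```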

Key Concept.

$T_1$ (energy relaxation time) measures how long a qubit retains its excited state before decaying to the ground state. It sets a hard upper bound on the useful computation time: any quantum circuit must complete before its qubits have had time to relax appreciably.

T2 Dephasing: Loss of Phase Coherence

Even if a qubit does not lose energy, its relative phase can drift unpredictably. Consider the superposition $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$. On the Bloch sphere (Chapter 5), this state lies on the equator, pointing along the $+x$ direction. Random fluctuations in the qubit's energy splitting cause this phase to wander, rotating the Bloch vector around the $z$-axis unpredictably. Over time, the phase becomes uniformly random, and the qubit's state effectively collapses to a classical mixture of $|0\rangle$ and $|1\rangle$. This process is called dephasing or T2 decay.

The dephasing time $T_2$ characterizes how quickly phase information is lost. There are actually two related timescales:

  • $T_2^*$ (T2 star): the free induction decay time, measured by a simple Ramsey experiment. This includes the effects of both intrinsic dephasing and slow fluctuations in the qubit frequency.
  • $T_2^{\text{echo}}$ (T2 echo): measured using a spin-echo or Hahn echo sequence, which refocuses slow frequency drifts. This is typically longer than $T_2^*$ because it filters out low-frequency noise.

A fundamental constraint relates these timescales:

$$T_2 \leq 2\,T_1$$

This inequality arises because energy relaxation also destroys phase coherence - if the qubit decays from $|1\rangle$ to $|0\rangle$, any superposition it was in is certainly destroyed. The factor of 2 reflects the fact that $T_1$ processes destroy both the population and the phase, while pure dephasing affects only the phase. When $T_2 = 2T_1$, the qubit is said to be T1-limited - there is no additional dephasing beyond what energy relaxation causes. When $T_2 \ll 2T_1$, additional dephasing mechanisms (charge noise, flux noise, magnetic field fluctuations) are significant.

Gate Errors

Quantum gates are implemented by applying precisely calibrated control pulses - microwave pulses for superconducting qubits, laser pulses for trapped ions, and so on. Any imprecision in these pulses introduces errors. A gate that should apply a rotation of exactly $\pi$ radians might instead apply $\pi + \epsilon$, with $\epsilon$ varying slightly from shot to shot.

Gate errors are typically quantified by their error rate or equivalently their fidelity, where fidelity $= 1 - \text{error rate}$. On current hardware:

  • Single-qubit gates achieve fidelities of 99.5% to 99.99% (error rates from $5 \times 10^{-3}$ down to $10^{-4}$). These are relatively easy to calibrate because they involve only one qubit.
  • Two-qubit gates (CNOT, CZ) achieve fidelities of 99% to 99.9% (error rates $10^{-2}$ to $10^{-3}$). These are harder because they require precise interactions between two qubits, and any coupling to the environment during the interaction causes errors.

The critical insight is that errors accumulate. If each two-qubit gate has an error rate of 1%, then a circuit with 100 two-qubit gates has a total error probability of roughly $1 - (0.99)^{100} \approx 0.63$ - meaning the final result is wrong more often than it is right. This exponential decay of the success probability with gate count is the central challenge of quantum computing.

Measurement Errors

Even the act of measuring a qubit can go wrong. A qubit in $|0\rangle$ might be reported as $|1\rangle$, or vice versa. Measurement errors typically run from 0.5% to 5% on current hardware, depending on the platform and the readout technology. Like gate errors, measurement errors corrupt the final output of a computation. Unlike gate errors, they occur at the very end and do not propagate through subsequent gates (unless we are doing mid-circuit measurements for error correction, as we will see in later chapters).

Note.

On many platforms, measurement errors are asymmetric: the probability of misreading $|1\rangle$ as $|0\rangle$ differs from the probability of misreading $|0\rangle$ as $|1\rangle$. This asymmetry arises because the physical signals corresponding to $|0\rangle$ and $|1\rangle$ have different characteristics (e.g., different voltage levels in a superconducting readout resonator).

Putting It All Together

A real quantum computation is a race against time. The circuit must complete - all gates applied and measurements performed - before decoherence ($T_1$ and $T_2$) has corrupted the qubits beyond repair. But making gates faster introduces its own problems: faster pulses tend to be less precise, increasing gate errors. Hardware engineers navigate this tension every day, balancing speed against fidelity.

| Error Source | Typical Scale (Superconducting) | Physical Origin |
| --- | --- | --- |
| $T_1$ relaxation | 50-500 microseconds | Energy exchange with environment |
| $T_2$ dephasing | 50-500 microseconds | Phase fluctuations from noise |
| Single-qubit gate error | $10^{-4}$ to $10^{-2}$ | Pulse calibration imprecision |
| Two-qubit gate error | $10^{-3}$ to $10^{-2}$ | Coupling imprecision, crosstalk |
| Measurement error | $5 \times 10^{-3}$ to $5 \times 10^{-2}$ | Readout signal discrimination |

Simulation: T1/T2 Decay on the Bloch Sphere

Watch how $T_1$ relaxation and $T_2$ dephasing affect a qubit on the Bloch sphere. $T_1$ pulls the state toward $|0\rangle$ (north pole), while $T_2$ shrinks the equatorial components. Adjust the decay time ratio to see how the Bloch vector evolves.

14.2 The Density Matrix: Describing Noisy Quantum States

So far in this textbook, we have described quantum states as kets: $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$. This formalism assumes we know the exact quantum state of our system - a situation called a pure state. But noise introduces a new kind of uncertainty. If a qubit has undergone random dephasing, we might know that it is either $|0\rangle$ with probability $p$ or $|1\rangle$ with probability $1-p$, but we do not know which. This is not a superposition - it is classical ignorance about which quantum state the system is in. We need a more general mathematical object to describe it.

From Kets to Density Matrices

The density matrix (or density operator) $\rho$ generalizes the state vector formalism. For a pure state $|\psi\rangle$, the density matrix is simply the outer product:

$$\rho = |\psi\rangle\langle\psi|$$

For a single qubit in state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, this gives:

$$\rho = \begin{pmatrix} |\alpha|^2 & \alpha\beta^* \\ \beta\alpha^* & |\beta|^2 \end{pmatrix}$$

The diagonal entries $|\alpha|^2$ and $|\beta|^2$ are the probabilities of measuring $|0\rangle$ and $|1\rangle$. The off-diagonal entries $\alpha\beta^*$ and $\beta\alpha^*$ are called coherences - they encode the phase relationship between the two basis states. The presence of non-zero coherences is what distinguishes a quantum superposition from a classical mixture.

Mixed States

Now suppose our qubit is in state $|\psi_1\rangle$ with probability $p_1$ or state $|\psi_2\rangle$ with probability $p_2$, where $p_1 + p_2 = 1$. This statistical mixture (or mixed state) is described by:

$$\rho = p_1 |\psi_1\rangle\langle\psi_1| + p_2 |\psi_2\rangle\langle\psi_2|$$

More generally, for a mixture of many states:

$$\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$$

where $p_i \geq 0$ and $\sum_i p_i = 1$. This is the most general description of a quantum state. The density matrix has several important properties:

  • Hermitian: $\rho = \rho^\dagger$ (the matrix equals its conjugate transpose).
  • Positive semidefinite: all eigenvalues are non-negative.
  • Unit trace: $\text{Tr}(\rho) = 1$ (probabilities sum to one).
  • Purity test: $\text{Tr}(\rho^2) = 1$ if and only if $\rho$ is a pure state. For mixed states, $\text{Tr}(\rho^2) < 1$. The quantity $\text{Tr}(\rho^2)$ is called the purity.

Key Concept.

A pure state has $\text{Tr}(\rho^2) = 1$ and can be written as a single ket $|\psi\rangle$. A mixed state has $\text{Tr}(\rho^2) < 1$ and represents classical uncertainty about which quantum state the system is in. The density matrix $\rho$ describes both cases within a single framework.

Examples: Pure vs. Mixed

Consider the state $|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$. Its density matrix is:

$$\rho_{+} = |+\rangle\langle+| = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$$

The off-diagonal entries are $\frac{1}{2}$ - full coherence. Now consider the maximally mixed state, where the qubit is $|0\rangle$ or $|1\rangle$ each with probability $\frac{1}{2}$:

$$\rho_{\text{mix}} = \frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|1\rangle\langle 1| = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \frac{I}{2}$$

The diagonal entries are identical in both cases - measuring in the computational basis gives 50/50 outcomes either way. But the off-diagonal entries are zero in $\rho_{\text{mix}}$, reflecting the absence of any phase relationship. The two states are physically distinct and give different results when measured in other bases (e.g., the $|+\rangle / |-\rangle$ basis), even though they look the same in the computational basis.
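
These two density matrices are easy to build and compare numerically. The following sketch (Python/numpy; the variable names are ours) verifies the properties listed above - Hermiticity, unit trace, and the purity test:

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
plus = (ket0 + ket1) / np.sqrt(2)

rho_plus = np.outer(plus, plus.conj())                                   # |+><+|
rho_mix = 0.5 * np.outer(ket0, ket0.conj()) + 0.5 * np.outer(ket1, ket1.conj())

for name, rho in [("|+><+|", rho_plus), ("I/2", rho_mix)]:
    print(name,
          "Hermitian:", np.allclose(rho, rho.conj().T),
          "trace:", np.trace(rho).real,
          "purity:", np.trace(rho @ rho).real)
# |+><+| has purity 1.0 (pure); I/2 has purity 0.5 (maximally mixed)
```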

Common Misconception.

A common confusion: "50% probability of $|0\rangle$ and 50% probability of $|1\rangle$" is not the same as the superposition $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$. The former is a mixed state with no coherences; the latter is a pure state with full coherence. The difference is experimentally measurable - apply a Hadamard gate and measure. The pure state $|+\rangle$ always gives $|0\rangle$; the mixed state gives 50/50 outcomes.

The Bloch Sphere Interior

In Chapter 5, we saw that every pure single-qubit state maps to a point on the surface of the Bloch sphere. The density matrix formalism reveals what lives inside the sphere. Any single-qubit density matrix can be written as:

$$\rho = \frac{1}{2}(I + r_x X + r_y Y + r_z Z)$$

where $X$, $Y$, $Z$ are the Pauli matrices and $\vec{r} = (r_x, r_y, r_z)$ is the Bloch vector. The constraint is that $|\vec{r}| \leq 1$:

  • $|\vec{r}| = 1$: pure state (on the surface of the sphere).
  • $0 < |\vec{r}| < 1$: mixed state (inside the sphere).
  • $|\vec{r}| = 0$: maximally mixed state $\frac{I}{2}$ (at the center of the sphere).

Noise shrinks the Bloch vector toward the origin. $T_1$ relaxation pulls the vector toward the north pole ($|0\rangle$, the ground state). Dephasing shrinks the $x$ and $y$ components while leaving $z$ unchanged, collapsing the state toward the $z$-axis. Depolarizing noise uniformly shrinks the vector toward the center. The Bloch sphere provides a powerful geometric picture: noise takes states on the surface and pushes them inward, destroying quantum information.
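
The decomposition above is easy to invert: each Bloch component is a Pauli expectation value, $r_i = \text{Tr}(\rho\,\sigma_i)$. A minimal numpy sketch (the function name is ours) extracts the Bloch vector and its length:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def bloch_vector(rho: np.ndarray) -> np.ndarray:
    """Return (r_x, r_y, r_z) with r_i = Tr(rho * sigma_i)."""
    return np.array([np.trace(rho @ P).real for P in (X, Y, Z)])

rho_plus = 0.5 * np.array([[1, 1], [1, 1]], dtype=complex)   # |+><+|
rho_mix  = 0.5 * np.eye(2, dtype=complex)                    # I/2

print(bloch_vector(rho_plus), np.linalg.norm(bloch_vector(rho_plus)))  # [1 0 0], length 1 (surface)
print(bloch_vector(rho_mix),  np.linalg.norm(bloch_vector(rho_mix)))   # [0 0 0], length 0 (center)
```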

The Partial Trace: Mixed States from Entanglement

There is a second, more fundamental way that mixed states arise: from entanglement. Consider the Bell state $|\Phi^+\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, which is a pure state of two qubits. What is the state of just the first qubit? We obtain it by tracing out (averaging over) the second qubit:

$$\rho_A = \text{Tr}_B(|\Phi^+\rangle\langle\Phi^+|) = \frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|1\rangle\langle 1| = \frac{I}{2}$$

The first qubit, taken alone, is in the maximally mixed state - even though the two-qubit system as a whole is in a perfectly pure state. This is entanglement at work: the information about qubit A's state is encoded in its correlations with qubit B, not in A alone. The partial trace is the mathematical operation that formalizes "looking at part of a larger system."
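
The partial trace is mechanical to compute once the two-qubit density matrix is reshaped into a four-index tensor. A sketch for the Bell state (numpy; the reshape convention assumes qubit A is the first index):

```python
import numpy as np

# |Phi+> = (|00> + |11>)/sqrt(2) as a 4-vector in the basis 00, 01, 10, 11
phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_AB = np.outer(phi_plus, phi_plus.conj())

# Reshape to indices (a, b, a', b') and trace out qubit B (sum over b = b')
rho_A = np.einsum('abcb->ac', rho_AB.reshape(2, 2, 2, 2))

print(rho_A.real)                      # [[0.5 0. ] [0.  0.5]]  ->  I/2
print(np.trace(rho_A @ rho_A).real)    # purity 0.5: maximally mixed
```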

This is deeply relevant to noise. When a qubit interacts with its environment, the qubit-plus-environment system evolves unitarily (as all quantum mechanics demands). But we do not have access to the environment's state. Tracing out the environment degrees of freedom leaves the qubit in a mixed state. Decoherence is entanglement with the environment, viewed from the qubit's perspective.

Expectation Values and Measurement

How do we extract physical predictions from a density matrix? The expectation value of an observable $O$ is:

$$\langle O \rangle = \text{Tr}(\rho \, O)$$

The probability of measurement outcome $|m\rangle$ is:

$$P(m) = \text{Tr}(\rho \, |m\rangle\langle m|) = \langle m|\rho|m\rangle$$

And the state after a unitary gate $U$ is:

$$\rho \mapsto U\rho U^\dagger$$

This last equation is the density matrix analogue of $|\psi\rangle \mapsto U|\psi\rangle$. It works equally well for pure and mixed states, which is why the density matrix formalism is indispensable for describing noisy quantum systems.
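
All three rules are one-liners in numpy. A quick sketch (matrix values are the standard gate definitions; everything else is ours) checks $\langle Z\rangle$ for $|+\rangle$ and then evolves the state with a Hadamard:

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
rho = 0.5 * np.array([[1, 1], [1, 1]], dtype=complex)   # |+><+|

print(np.trace(rho @ Z).real)        # <Z> = 0 for |+>
print(rho[0, 0].real)                # P(0) = <0|rho|0> = 0.5

rho_after = H @ rho @ H.conj().T     # rho -> U rho U^dagger
print(np.trace(rho_after @ Z).real)  # <Z> = 1, since H|+> = |0>
```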

14.3 Quantum Channels: Modeling Noise Mathematically

We now have the language to describe noisy states (density matrices). Next we need a way to describe noisy processes - the operations that turn a clean state into a noisy one. In quantum information theory, these are called quantum channels.

The Kraus Representation

A quantum channel $\mathcal{E}$ maps an input density matrix $\rho$ to an output density matrix $\mathcal{E}(\rho)$. The most common mathematical representation uses Kraus operators $\{K_i\}$:

$$\mathcal{E}(\rho) = \sum_i K_i \rho K_i^\dagger$$

The Kraus operators must satisfy the completeness relation:

$$\sum_i K_i^\dagger K_i = I$$

This ensures that probabilities still sum to one (the channel is trace-preserving). Each term $K_i \rho K_i^\dagger$ can be thought of as one possible "error branch" - error $K_i$ occurs, and the resulting state is $K_i \rho K_i^\dagger$ (up to normalization). The channel sums over all possible errors, weighted by their probabilities.

A perfect (noiseless) gate $U$ is itself a quantum channel with a single Kraus operator $K_0 = U$. Real gates are described by channels with additional Kraus operators representing the noise.
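
In code, a channel is just a list of matrices. A minimal sketch (numpy; `apply_channel` and the completeness check are names of our own invention) that implements the Kraus sum and verifies the trace-preserving condition:

```python
import numpy as np

def apply_channel(rho, kraus_ops):
    """E(rho) = sum_i K_i rho K_i^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

def is_trace_preserving(kraus_ops):
    """Check the completeness relation sum_i K_i^dagger K_i = I."""
    d = kraus_ops[0].shape[0]
    return np.allclose(sum(K.conj().T @ K for K in kraus_ops), np.eye(d))

# A noiseless gate is a channel with a single Kraus operator K0 = U:
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
print(is_trace_preserving([H]))                    # True
rho0 = np.array([[1, 0], [0, 0]], dtype=complex)   # |0><0|
print(apply_channel(rho0, [H]).real)               # |+><+|
```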

Key Concept.

A quantum channel is a completely positive, trace-preserving (CPTP) map on density matrices. Its Kraus representation $\mathcal{E}(\rho) = \sum_i K_i \rho K_i^\dagger$ expresses the channel as a sum over possible error processes. Any physical noise process can be written in this form.

The Depolarizing Channel

The simplest noise model treats all errors as equally likely. The depolarizing channel with parameter $p$ replaces the qubit state with the maximally mixed state $\frac{I}{2}$ with probability $p$, and leaves it unchanged with probability $1-p$:

$$\mathcal{E}_{\text{depol}}(\rho) = (1-p)\,\rho + p\,\frac{I}{2}$$

Equivalently, this can be written using the Pauli matrices as Kraus-like error channels:

$$\mathcal{E}_{\text{depol}}(\rho) = \left(1 - \frac{3p}{4}\right)\rho + \frac{p}{4}\left(X\rho X + Y\rho Y + Z\rho Z\right)$$

Here, with probability $\frac{p}{4}$ each, one of the three Pauli errors ($X$, $Y$, or $Z$) is applied, and with probability $1 - \frac{3p}{4}$ nothing happens. (These two expressions are equivalent because $\frac{1}{4}(I\rho I + X\rho X + Y\rho Y + Z\rho Z) = \frac{I}{2}$ for any $\rho$.)

On the Bloch sphere, the depolarizing channel uniformly shrinks the Bloch vector: $\vec{r} \mapsto (1-p)\vec{r}$. Every direction is affected equally. This makes it a useful "worst-case" or "structureless" noise model when you do not know the specific nature of the noise in your system.
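
It is worth confirming numerically that the two expressions for the depolarizing channel agree, and that the Bloch vector shrinks by exactly $(1-p)$. A sketch with numpy (the function names are ours):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def depolarize(rho, p):
    return (1 - p) * rho + p * I2 / 2

def depolarize_pauli(rho, p):
    return (1 - 3*p/4) * rho + (p/4) * (X @ rho @ X + Y @ rho @ Y + Z @ rho @ Z)

rho = 0.5 * np.array([[1, 1], [1, 1]], dtype=complex)   # |+><+|, Bloch vector (1, 0, 0)
p = 0.2
print(np.allclose(depolarize(rho, p), depolarize_pauli(rho, p)))  # True: the forms agree
print(np.trace(depolarize(rho, p) @ X).real)                      # r_x = 1 - p = 0.8
```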

The Dephasing Channel

The dephasing channel (also called the phase damping channel) models $T_2$-type noise that destroys phase coherence without affecting populations. With parameter $p$:

$$\mathcal{E}_{\text{deph}}(\rho) = (1-p)\,\rho + p\,Z\rho Z$$

The Kraus operators are $K_0 = \sqrt{1-p}\,I$ and $K_1 = \sqrt{p}\,Z$. Acting on a general single-qubit density matrix:

$$\begin{pmatrix} \rho_{00} & \rho_{01} \\ \rho_{10} & \rho_{11} \end{pmatrix} \mapsto \begin{pmatrix} \rho_{00} & (1-2p)\rho_{01} \\ (1-2p)\rho_{10} & \rho_{11} \end{pmatrix}$$

The diagonal entries (populations) are untouched, but the off-diagonal entries (coherences) are multiplied by $(1-2p)$. When $p = \frac{1}{2}$, the coherences vanish entirely and the qubit becomes a classical mixture. On the Bloch sphere, the dephasing channel shrinks only the $x$ and $y$ components of the Bloch vector, while leaving the $z$ component unchanged - the sphere is squeezed toward the $z$-axis and eventually collapses to the line segment joining the poles.
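
A quick numerical check of the $(1-2p)$ coherence factor (numpy sketch; names are ours):

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=complex)

def dephase(rho, p):
    return (1 - p) * rho + p * Z @ rho @ Z

rho = 0.5 * np.array([[1, 1], [1, 1]], dtype=complex)   # |+><+|
for p in (0.1, 0.25, 0.5):
    out = dephase(rho, p)
    print(f"p = {p}: populations {out[0,0].real:.2f}/{out[1,1].real:.2f}, "
          f"coherence {out[0,1].real:.2f}")   # coherence = 0.5 * (1 - 2p)
# At p = 0.5 the coherence is exactly zero: a classical mixture.
```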

The Amplitude Damping Channel

The amplitude damping channel models $T_1$ relaxation - the decay of $|1\rangle$ to $|0\rangle$ with probability $\gamma$. Its Kraus operators are:

$$K_0 = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-\gamma} \end{pmatrix}, \qquad K_1 = \begin{pmatrix} 0 & \sqrt{\gamma} \\ 0 & 0 \end{pmatrix}$$

You can verify that $K_0^\dagger K_0 + K_1^\dagger K_1 = I$. The parameter $\gamma$ is related to time and $T_1$ by $\gamma = 1 - e^{-t/T_1}$. Acting on a density matrix:

$$\begin{pmatrix} \rho_{00} & \rho_{01} \\ \rho_{10} & \rho_{11} \end{pmatrix} \mapsto \begin{pmatrix} \rho_{00} + \gamma\,\rho_{11} & \sqrt{1-\gamma}\,\rho_{01} \\ \sqrt{1-\gamma}\,\rho_{10} & (1-\gamma)\,\rho_{11} \end{pmatrix}$$

Population flows from $|1\rangle$ to $|0\rangle$ (the $\rho_{00}$ entry grows by $\gamma\rho_{11}$), and coherences are multiplied by $\sqrt{1-\gamma}$. When $\gamma = 1$ (the long-time limit), the qubit is in $|0\rangle$ regardless of its initial state. On the Bloch sphere, amplitude damping shrinks the sphere toward the north pole ($|0\rangle$).
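
The sketch below (numpy; names are ours) verifies the completeness relation, computes $\gamma$ from $t$ and $T_1$, and demonstrates the non-unital behavior discussed in the note that follows:

```python
import numpy as np

def amplitude_damping_kraus(gamma):
    K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    K1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return [K0, K1]

def apply_channel(rho, kraus_ops):
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

gamma = 1 - np.exp(-10 / 100)        # t = 10 us on a qubit with T1 = 100 us
K = amplitude_damping_kraus(gamma)
print(np.allclose(sum(k.conj().T @ k for k in K), np.eye(2)))  # completeness: True

rho_mix = np.eye(2, dtype=complex) / 2
print(apply_channel(rho_mix, K).real)   # not I/2: population flows toward |0>
```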

Note.

Unlike the depolarizing and dephasing channels, amplitude damping is not a unital channel - it does not preserve the identity: $\mathcal{E}(I/2) \neq I/2$. Instead, it drives every state toward $|0\rangle$. This asymmetry reflects the physical reality that relaxation has a preferred direction.

Composing Channels: Why More Gates Means More Noise

When two noisy gates are applied in sequence, the noise compounds. If gate 1 is described by channel $\mathcal{E}_1$ and gate 2 by $\mathcal{E}_2$, the combined process is $\mathcal{E}_2 \circ \mathcal{E}_1$. For depolarizing noise with parameter $p$ after each gate, the Bloch vector shrinks by factor $(1-p)$ per gate. After $n$ gates:

$$|\vec{r}_n| = (1-p)^n |\vec{r}_0|$$

This is an exponential decay. For $p = 0.01$ (99% gate fidelity), the Bloch vector has shrunk to $37\%$ of its original length after 100 gates, and to just $0.004\%$ after 1000 gates. The quantum information is effectively destroyed. This exponential decay of signal quality with circuit depth is the fundamental reason why uncorrected quantum computers cannot run arbitrarily deep circuits.
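
The numbers quoted above come straight from $(1-p)^n$; a three-line sketch reproduces them:

```python
import numpy as np

p = 0.01                                  # 1% depolarizing error per gate
for n in (10, 100, 1000):
    print(f"{n:4d} gates -> Bloch vector length {(1 - p)**n:.6f}")
# 10 -> 0.904382, 100 -> 0.366032, 1000 -> 0.000043 (about 0.004%)
```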

Simulation: Noise Channel Explorer

Compare ideal and noisy circuit execution side-by-side. Select a noise channel, adjust its strength, and observe how the histogram and quantum state degrade.

Simulation: Error Accumulation

See how noise compounds with circuit depth. Adjust the depth slider to add more identity layers (pairs of CNOTs that cancel) and watch the fidelity degrade exponentially.

Sandbox: Ideal vs. Noisy Behavior

The simulator in this textbook runs ideal circuits - no noise. The sandbox below creates a Bell state $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. On a perfect simulator, you will see exactly 50% $|00\rangle$ and 50% $|11\rangle$, with no other outcomes. On a real noisy device, you would also see small but non-zero probabilities for $|01\rangle$ and $|10\rangle$ due to gate errors, measurement errors, and decoherence. Run the ideal circuit below, then consider: what would change on real hardware?

On real hardware (such as IBM Quantum devices), running this same circuit typically produces results like 47% $|00\rangle$, 46% $|11\rangle$, 4% $|01\rangle$, and 3% $|10\rangle$. The "leakage" into $|01\rangle$ and $|10\rangle$ comes from a combination of CNOT gate errors, measurement errors, and decoherence during the circuit execution. The deeper the circuit, the worse the leakage becomes.

Sandbox: Longer Circuits Accumulate More Noise

To illustrate how noise compounds with depth, consider this circuit that applies an identity operation in the noisiest possible way: it applies 10 pairs of CNOT gates that should cancel each other out, leaving the Bell state unchanged. On a perfect simulator, the result is identical to the simple Bell circuit above. On real hardware, each CNOT pair would introduce additional error, significantly degrading the final state.

On a perfect simulator, both circuits produce identical results - the 20 extra CNOT gates cancel in pairs and have no effect. On a real device with 1% CNOT error rate, the additional 20 CNOT gates would reduce the fidelity by roughly $(0.99)^{20} \approx 0.82$, meaning about 18% of the signal has been lost to noise. This is why quantum algorithm designers work hard to minimize circuit depth and gate count.

14.4 Interactive: Explore Noise Channels

Now you can see noise in action. The simulator below runs a Bell state circuit through Goqu's density matrix simulator with a real noise model applied. Choose a noise channel and adjust its strength to see how the ideal probability distribution degrades. The ideal histogram shows the exact theoretical probabilities, while the noisy histogram shows sampled measurement outcomes from the noisy simulation. The purity (1.0 for a pure state, 0.25 for the maximally mixed state of two qubits) and the fidelity (overlap with the ideal state) quantify the degradation.

Try increasing the noise parameter toward 0.5 and watch the purity drop and the noisy histogram spread. Compare depolarizing noise (which affects all Pauli directions equally) with amplitude damping (which preferentially pushes the state toward $|00\rangle$) and phase damping (which destroys coherence but preserves populations). Each noise channel has a distinct physical signature.

Challenge: Finding the Noise Threshold

Using the noise explorer above, find the depolarizing noise parameter at which the Bell state fidelity drops below 0.5. At this point, the noisy state is closer to the maximally mixed state than to the intended Bell state - a critical threshold in quantum error correction theory. (Hint: try values around p = 0.4)

14.5 Characterizing Noise: Tomography and Benchmarking

To fight noise, we first need to measure it. How do experimentalists determine the error rates and noise properties of their quantum hardware? Several techniques exist, ranging from the comprehensive (but resource-intensive) to the practical (but less detailed).

Quantum State Tomography

Quantum state tomography reconstructs the full density matrix of a quantum state from repeated measurements. The idea is straightforward: measure the state in multiple bases (e.g., the $X$, $Y$, and $Z$ bases for a single qubit) and use the statistics to determine all the entries of $\rho$.

For a single qubit, $\rho$ is a $2 \times 2$ Hermitian matrix with unit trace, so it has 3 independent real parameters (the Bloch vector components $r_x$, $r_y$, $r_z$). We can determine these by measuring the expectation values of the three Pauli operators:

$$r_x = \text{Tr}(\rho X), \quad r_y = \text{Tr}(\rho Y), \quad r_z = \text{Tr}(\rho Z)$$

Each expectation value requires many repeated preparations and measurements to estimate accurately. For $n$ qubits, the density matrix is $2^n \times 2^n$ with $4^n - 1$ independent parameters. The measurement effort scales as $O(4^n)$ - exponential in the number of qubits. Even at 10 qubits, this requires over a million measurement settings, making full state tomography impractical for large systems.
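
With exact expectation values the reconstruction is immediate: $\rho = \frac{1}{2}(I + r_x X + r_y Y + r_z Z)$. In practice each $r_i$ would be estimated from finite measurement statistics; the sketch below (numpy; the example state is made up) uses exact values to show the round trip:

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Some "unknown" state (here: a slightly dephased |+>)
rho_true = np.array([[0.5, 0.4], [0.4, 0.5]], dtype=complex)

# "Measure" the three Pauli expectation values...
r = [np.trace(rho_true @ P).real for P in (X, Y, Z)]

# ...and reconstruct the density matrix from them.
rho_rec = 0.5 * (I2 + r[0]*X + r[1]*Y + r[2]*Z)
print(np.allclose(rho_rec, rho_true))   # True
```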

Quantum Process Tomography

Quantum process tomography (QPT) goes further: instead of characterizing a state, it characterizes an entire operation (gate or channel). The procedure involves:

  1. Prepare a complete set of input states (at least $4^n$ for $n$ qubits).
  2. Apply the unknown process to each input state.
  3. Perform state tomography on each output state.
  4. Reconstruct the full channel description from the input-output pairs.

The resources required scale as $O(16^n)$ - exponential growth that makes QPT infeasible beyond 2-3 qubits. Moreover, QPT is sensitive to state preparation and measurement (SPAM) errors: if your preparation or measurement is imperfect, those errors contaminate the process reconstruction. For these reasons, experimentalists have developed more scalable alternatives.

Randomized Benchmarking

Randomized benchmarking (RB) is the industry-standard technique for measuring average gate error rates. The protocol is elegant:

  1. Choose a random sequence of $m$ Clifford gates (gates from the Clifford group, which we introduced in Chapter 9).
  2. Compute the single Clifford gate that inverts the entire sequence, and append it. If all gates were perfect, the net operation would be the identity.
  3. Prepare $|0\rangle$, apply the sequence plus its inverse, and measure. A perfect system always returns $|0\rangle$.
  4. Repeat for many random sequences of length $m$, and average the probability of returning to $|0\rangle$.
  5. Repeat steps 1-4 for increasing values of $m$.

The key result: the survival probability decays exponentially with sequence length:

$$P(|0\rangle) = A \cdot p^m + B$$

where $p$ is the depolarizing parameter (not to be confused with the noise strength used earlier) and $A$, $B$ absorb SPAM errors. The average error per Clifford gate is extracted as:

$$r = \frac{(d-1)(1-p)}{d}$$

where $d = 2^n$ is the Hilbert space dimension. The crucial advantage of RB is its insensitivity to SPAM errors: because the exponential decay rate $p$ is determined by the slope of the curve (not its offset), imperfect state preparation or measurement shifts $A$ and $B$ but does not affect the extracted error rate. Additionally, RB scales efficiently - it can characterize systems with many qubits without the exponential overhead of tomography.
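
The heart of RB is the exponential fit. The sketch below generates synthetic survival probabilities from the model $A \cdot p^m + B$, fits them, and extracts the error per Clifford via $r = (d-1)(1-p)/d$ (Python with numpy and scipy assumed available; all parameter values are made up for illustration, not taken from any real device):

```python
import numpy as np
from scipy.optimize import curve_fit

def rb_model(m, A, p, B):
    return A * p**m + B

# Synthetic data: depolarizing parameter p = 0.995, SPAM absorbed into A and B
rng = np.random.default_rng(0)
depths = np.arange(1, 400, 20)
survival = rb_model(depths, A=0.45, p=0.995, B=0.5)
survival += rng.normal(0, 0.005, size=depths.shape)     # shot noise

(A, p, B), _ = curve_fit(rb_model, depths, survival, p0=[0.5, 0.99, 0.5])
d = 2                                                    # single qubit
r = (d - 1) * (1 - p) / d
print(f"fitted p = {p:.4f}, error per Clifford r = {r:.2e}")  # ~2.5e-3
```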

Key Concept.

Randomized benchmarking measures the average gate error rate by fitting an exponential decay curve to the survival probability as a function of circuit depth. It is insensitive to state preparation and measurement errors, making it the most reliable and scalable method for quantifying gate quality.

Quantum Volume

Individual gate fidelities do not tell the whole story. A processor might have excellent single-qubit gates but terrible connectivity, or fast gates but short coherence times. Quantum volume (QV), introduced by IBM in 2019, attempts to capture the overall capability of a quantum processor in a single number.

The protocol works as follows. For a given integer $m$, construct random circuits on $m$ qubits with depth $m$, where each layer consists of random two-qubit gates between randomly paired qubits. Compile each circuit to the native gate set of the device and run it. The quantum volume is defined as:

$$\text{QV} = 2^m$$

where $m$ is the largest value for which the device can execute these random circuits with greater than two-thirds probability of producing the correct heavy output (outputs in the heavier half of the ideal probability distribution).

Quantum volume captures the interplay between number of qubits, gate fidelity, qubit connectivity, and compiler efficiency. A device with QV $= 2^m$ can roughly be trusted to correctly execute circuits on $m$ qubits with depth $m$. Leading devices have achieved quantum volumes ranging from $2^5 = 32$ to $2^{25}$ and beyond, with ion-trap systems from Quantinuum achieving some of the highest published values.
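
The "heavy output" test is simple to state in code: the heavy set contains the bitstrings whose ideal probability exceeds the median, and the device passes at width $m$ if its samples land in that set more than two-thirds of the time. A schematic sketch (numpy; this is only the scoring step, not the full QV protocol, which also requires circuit generation, compilation, and statistical confidence bounds; all data below is made up):

```python
import numpy as np

def heavy_output_probability(ideal_probs, sampled_counts):
    """Fraction of device samples that fall in the heavy set
    (bitstrings whose ideal probability exceeds the median)."""
    median = np.median(ideal_probs)
    heavy = {i for i, q in enumerate(ideal_probs) if q > median}
    total = sum(sampled_counts.values())
    return sum(c for i, c in sampled_counts.items() if i in heavy) / total

# Toy 2-qubit example: ideal distribution and hypothetical device counts
ideal = np.array([0.40, 0.10, 0.15, 0.35])        # outcomes 00, 01, 10, 11
counts = {0: 380, 1: 120, 2: 160, 3: 340}          # 1000 shots
print(heavy_output_probability(ideal, counts))     # 0.72 > 2/3: passes at this width
```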

Note.

Quantum volume has limitations as a benchmark. It tests only square circuits (width equals depth), does not capture performance on specific algorithms, and can be sensitive to compiler optimizations. Other benchmarks - such as CLOPS (circuit layer operations per second), algorithmic benchmarks, and application-specific metrics - provide complementary information. No single number fully captures the capability of a quantum processor.

14.6 NISQ: Noisy Intermediate-Scale Quantum Computing

In 2018, physicist John Preskill coined the term NISQ - Noisy Intermediate-Scale Quantum - to describe the current era of quantum computing. NISQ devices have enough qubits to be difficult for classical computers to simulate (the "intermediate-scale" part, roughly 50 to 1000+ qubits) but too much noise to run the deep, precise circuits that algorithms like Shor's factoring require (the "noisy" part).

What NISQ Devices Can Do

Despite their limitations, NISQ devices have demonstrated several important capabilities:

  • Quantum simulation. Simulating the behavior of other quantum systems - molecules, materials, condensed-matter models - is one of the most natural applications. In 2023, IBM used its 127-qubit Eagle processor to simulate a 127-spin Ising model with 60 layers of two-qubit gates (approximately 2,880 CNOT gates total), producing results that could not be efficiently replicated by brute-force classical simulation. This landmark experiment used sophisticated error mitigation techniques to extract accurate expectation values from noisy hardware.
  • Variational algorithms. Algorithms like VQE (Variational Quantum Eigensolver) and QAOA (Quantum Approximate Optimization Algorithm) use short-depth quantum circuits as subroutines within a classical optimization loop. These hybrid quantum-classical algorithms are designed to be noise-resilient because they use circuits shallow enough to complete before decoherence destroys the signal.
  • Random circuit sampling. Google's 2019 Sycamore experiment demonstrated that a 53-qubit processor could sample from random quantum circuits in 200 seconds - a task estimated to take 10,000 years on the world's fastest classical supercomputer at the time (though improved classical algorithms have since narrowed this gap).

What NISQ Devices Cannot Do (Yet)

The noise limitations of NISQ hardware impose clear boundaries:

  • No Shor's algorithm at useful scale. Factoring a cryptographically relevant number (e.g., a 2048-bit RSA key) would require thousands of logical qubits, each encoded using thousands of physical qubits for error correction - far beyond current hardware.
  • Limited circuit depth. With two-qubit gate errors around 0.1-1%, useful circuits are limited to hundreds of two-qubit gates at most before the output becomes dominated by noise.
  • No proven quantum advantage for practical problems. While quantum computers have demonstrated computational tasks that are hard to replicate classically, these tasks (like random circuit sampling) are not yet solving practical problems faster than classical alternatives. The search for "quantum utility" - useful quantum computation on real-world problems - is an active and intensely competitive area of research.

Error Mitigation vs. Error Correction

NISQ researchers have developed a toolbox of error mitigation techniques that reduce the impact of noise without the full overhead of quantum error correction. These include:

  • Zero-noise extrapolation: Run the circuit at multiple noise levels (by intentionally adding noise), then extrapolate the results back to the zero-noise limit; a numerical sketch appears below.
  • Probabilistic error cancellation: Decompose the ideal operation into a linear combination of noisy operations that can be physically implemented, then reconstruct the ideal result through post-processing.
  • Measurement error mitigation: Characterize the measurement error matrix by preparing known states, then invert it to correct the measured probability distribution.
  • Dynamical decoupling: Insert sequences of pulses during idle periods to refocus dephasing errors, analogous to the spin-echo technique from NMR.

Error mitigation is not error correction. Mitigation techniques typically trade sampling overhead (more shots needed) for reduced bias in expectation values. They work well for estimating expectation values of observables but generally do not help with tasks that require sampling from the correct output distribution. Crucially, the sampling overhead of most mitigation techniques grows exponentially with circuit depth, which means they cannot indefinitely extend the reach of NISQ hardware.
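
As a concrete illustration of zero-noise extrapolation, the first technique in the list above: suppose an observable's expectation value has been measured at noise scale factors 1, 2, and 3 (e.g., by gate folding), and we extrapolate back to scale 0. A minimal sketch (numpy; the decay model and numbers are invented for illustration):

```python
import numpy as np

# Assume the noisy expectation value decays with noise scale c as
# E(c) = E_ideal * exp(-lambda * c); "measure" it at scales 1, 2, 3.
scales = np.array([1.0, 2.0, 3.0])
measured = 0.85 * np.exp(-0.3 * scales)           # made-up data: E_ideal = 0.85

# Richardson-style extrapolation: fit a quadratic in c, evaluate at c = 0.
coeffs = np.polyfit(scales, measured, deg=2)
print(np.polyval(coeffs, 0.0))                     # ~0.84, close to the ideal 0.85
```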

Common Misconception.

Error mitigation and error correction are often confused, but they are fundamentally different. Error correction (covered in later chapters) uses redundant qubits to detect and actively fix errors during computation, enabling arbitrarily long computations in principle. Error mitigation uses classical post-processing to reduce the bias in measurement results, but the computation itself is still noisy and the sampling cost grows with circuit size.

The Road Ahead

The NISQ era is a transitional period. The long-term goal of the field is fault-tolerant quantum computing - machines with enough qubits and low enough error rates to implement full quantum error correction, enabling arbitrarily deep circuits. The threshold theorem (which we will study in later chapters) guarantees that this is possible in principle, as long as the physical error rate is below a certain threshold (roughly $10^{-3}$ to $10^{-2}$, depending on the error correction code).

Current hardware is approaching and in some cases crossing this threshold for individual gates, but the overhead of error correction remains daunting: estimates suggest that factoring a 2048-bit number with Shor's algorithm would require on the order of 20 million physical qubits. The transition from NISQ to fault tolerance will likely be gradual, with increasingly powerful error-corrected "logical qubits" being deployed alongside noisy physical qubits in hybrid architectures.

In the next chapter, we shift from the physics of quantum hardware to the software side: how do you actually program a quantum computer? We will explore quantum programming languages, compilers, and the software stack that bridges the gap between high-level algorithms and the physical pulse sequences that manipulate qubits.

Predict-Observe-Explain: Pure vs. Mixed States

The state $|+\rangle$ and the maximally mixed state both give 50/50 results in the $Z$-basis. Can you tell them apart?

Predict
Observe
Explain

Predict

Both $|+\rangle$ and the maximally mixed state $I/2$ give 50% $|0\rangle$ and 50% $|1\rangle$ when measured in the $Z$-basis. If we apply a Hadamard gate before measuring, what will the difference be?

Observe

Run this circuit that prepares $|+\rangle$, then applies $H$ and measures. You should see $|0\rangle$ with 100% probability since $H|+\rangle = |0\rangle$:

Now consider: if the qubit were in the maximally mixed state $I/2$ instead, applying $H$ would still give 50/50 (since $H(I/2)H = I/2$). The Hadamard reveals the coherence that the pure state has but the mixed state lacks.

Explain

The pure state $|+\rangle$ has off-diagonal elements (coherences) in its density matrix: $\rho_+ = \frac{1}{2}\begin{pmatrix}1&1\\1&1\end{pmatrix}$. The mixed state has none: $\rho_{\text{mix}} = \frac{1}{2}\begin{pmatrix}1&0\\0&1\end{pmatrix}$. The Hadamard gate converts these coherences into measurable population differences. This is why dephasing noise (which destroys coherences) is so damaging - it makes pure states look mixed, erasing the quantum information encoded in phases.
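
The claim is two lines of linear algebra; a numpy sketch confirming both halves of the experiment:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
rho_plus = 0.5 * np.array([[1, 1], [1, 1]], dtype=complex)   # |+><+|
rho_mix = np.eye(2, dtype=complex) / 2                        # I/2

print((H @ rho_plus @ H.conj().T).real)   # |0><0|: measurement gives 0 with certainty
print((H @ rho_mix @ H.conj().T).real)    # still I/2: 50/50 in every basis
```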

Interactive: Channel Representations

Explore the three mathematical representations of a noise channel. Adjust the noise strength parameter and see how the Kraus operators, effect on the Bloch sphere, and key metrics change together.
