Chapter 25: Quantum Error Mitigation

Throughout this textbook, we have encountered the fundamental problem of quantum computing: qubits are fragile. Every quantum gate introduces a small error. Measurements are imperfect. Environmental noise causes decoherence. In previous chapters, we studied quantum error correction (QEC), which uses redundant encoding to protect quantum information. But QEC requires substantial overhead - many physical qubits per logical qubit, complex syndrome extraction circuits, and deep fault-tolerant gate constructions. Current quantum hardware is years away from supporting full QEC at meaningful scale.

This chapter addresses a pragmatic question: what can we do right now, on noisy hardware with tens to hundreds of qubits and no error correction, to improve the quality of our quantum computations? The answer is quantum error mitigation - a family of techniques that reduce the impact of noise on computed results without the overhead of full error correction. Error mitigation does not fix errors at the physical level; instead, it uses clever classical post-processing, noise characterization, and controlled noise amplification to extract more accurate expectation values from noisy measurements. It is the essential toolkit for the NISQ (Noisy Intermediate-Scale Quantum) era.

25.1 Why Error Mitigation Matters in the NISQ Era

The term "NISQ" was coined by John Preskill in 2018 to describe the current generation of quantum processors: devices with 50-1000+ qubits that are too noisy for fault-tolerant computation but large enough to explore computations that challenge classical simulation. NISQ devices sit in an awkward middle ground - powerful enough to be interesting, noisy enough to be frustrating.

The Noise Budget

Consider a quantum circuit with $d$ layers of two-qubit gates, each with error rate $\epsilon$. If the circuit acts on $n$ qubits, the total number of two-qubit gates is roughly $O(nd)$. The probability that the entire circuit executes without any error is approximately:

$$P_{\text{success}} \approx (1 - \epsilon)^{nd} \approx e^{-\epsilon n d}$$

For current hardware with $\epsilon \approx 10^{-3}$ (0.1% two-qubit gate error), a circuit on $n = 100$ qubits with $d = 100$ layers has $P_{\text{success}} \approx e^{-10} \approx 0.00005$ - effectively zero. Even a modest circuit with $n = 20$ qubits and $d = 50$ layers has $P_{\text{success}} \approx e^{-1} \approx 0.37$. The noise budget is tight.
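
To make the scaling concrete, the short Python sketch below evaluates the $e^{-\epsilon n d}$ estimate for a few circuit sizes. The numbers are illustrative only and assume the simple independent-error model above.

```python
import math

def success_probability(epsilon, n, d):
    """Probability that ~n*d two-qubit gates all succeed, assuming
    independent errors of rate epsilon (the e^{-epsilon*n*d} estimate)."""
    return math.exp(-epsilon * n * d)

for n, d in [(20, 50), (50, 50), (100, 100)]:
    p = success_probability(1e-3, n, d)
    print(f"n={n:4d} qubits, d={d:4d} layers -> P_success ~ {p:.2e}")
```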

This is why algorithms like QAOA (Chapter 23) and VQE use shallow circuits with few layers. But even shallow circuits accumulate errors, and the computed expectation values are biased toward random (maximally mixed) values. Error mitigation aims to correct this bias.

Error Mitigation vs. Error Correction

It is essential to understand the difference between these two approaches:

| Property | Error Mitigation | Error Correction |
| --- | --- | --- |
| Physical qubits needed | Same as unmitigated circuit | $10\times$-$1000\times$ overhead |
| What it corrects | Expectation values (statistical) | Quantum states (physical) |
| Sampling overhead | Moderate to exponential | Polynomial (once threshold met) |
| Scalability | Degrades with circuit size | Scales to arbitrary computation |
| Noise model knowledge | Often required | Not required (threshold-based) |
| Available today | Yes | Partial (small codes, limited) |

Error mitigation is a stopgap, not a permanent solution. As quantum hardware improves and error correction becomes practical, mitigation techniques will become less necessary. But for the current era, they are indispensable - and understanding them is essential for anyone running algorithms on real quantum hardware.

Key Concept.

Error mitigation corrects expectation values through classical post-processing, without adding physical qubits. Error correction corrects quantum states at the physical level, requiring significant qubit overhead. Mitigation trades sampling overhead (more measurements) for accuracy, while correction trades qubit overhead (more physical qubits) for protection. In the NISQ era, mitigation is the practical choice.

25.2 Zero-Noise Extrapolation

Zero-Noise Extrapolation (ZNE) is the most widely used error mitigation technique, prized for its simplicity and generality. The core idea is beautifully intuitive: if you cannot eliminate noise, measure how the result changes as you increase the noise, then extrapolate backward to estimate what the result would be at zero noise.

The ZNE Protocol

ZNE proceeds in two steps:

  1. Noise amplification: Run the circuit at several different noise levels $\lambda_1 < \lambda_2 < \cdots < \lambda_k$, where $\lambda_1 = 1$ corresponds to the physical noise level and $\lambda_j > 1$ corresponds to artificially amplified noise. Measure the expectation value $\langle O \rangle_{\lambda_j}$ at each noise level.
  2. Extrapolation to zero noise: Fit a model (linear, polynomial, or exponential) to the data points $\{(\lambda_j, \langle O \rangle_{\lambda_j})\}$ and extrapolate to $\lambda = 0$ to estimate the noise-free expectation value $\langle O \rangle_0$.

How to Amplify Noise: Unitary Folding

The most common method for controlled noise amplification is unitary folding. The idea exploits the fact that inserting $U^\dagger U$ (a gate followed by its inverse) into a circuit does not change the ideal computation (since $U^\dagger U = I$), but each additional gate introduces real physical noise. To amplify noise by a factor $\lambda$:

  • Global folding: Replace the circuit $U$ with $U (U^\dagger U)^k$, where $k$ controls the noise amplification. Since the ideal unitary is unchanged ($U \cdot I^k = U$), the noiseless result is the same, but each round of $U^\dagger U$ adds noise from $2n_{\text{gates}}$ additional gate operations. The noise scale factor is $\lambda = 1 + 2k$ (each fold adds twice the original gate count). A minimal code sketch of global folding follows this list.
  • Gate-level folding: Instead of folding the entire circuit, fold individual gates: replace each gate $G$ with $G \, G^\dagger G$. This provides finer control over the noise level and can achieve non-integer scale factors.
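
As a concrete illustration, here is a minimal Python sketch of global folding on a circuit represented as a plain list of gate labels. The representation (strings, with a "_dg" suffix marking inverses) is purely illustrative and not tied to any particular framework.

```python
def fold_globally(gates, k):
    """Global unitary folding: replace the circuit U by U (U^dagger U)^k.
    `gates` is a list of gate labels in execution order; the inverse circuit
    reverses the order and daggers each gate ('_dg' suffix, for illustration)."""
    inverse = [g + "_dg" for g in reversed(gates)]
    folded = list(gates)
    for _ in range(k):
        folded += inverse + list(gates)   # one U^dagger U pair per fold
    return folded

bell = ["h q0", "cx q0 q1"]
for k in range(3):
    folded = fold_globally(bell, k)
    scale = len(folded) / len(bell)       # noise scale factor lambda = 1 + 2k
    print(f"k={k}: {len(folded)} gates, lambda={scale:.0f}")
```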

Extrapolation Models

The choice of extrapolation model affects the accuracy of ZNE:

  • Linear extrapolation: Fit $\langle O \rangle_\lambda = a + b\lambda$ and extrapolate to $\lambda = 0$. Simple and requires only two noise levels, but assumes noise affects the expectation value linearly - a rough approximation.
  • Polynomial extrapolation: Fit a degree-$d$ polynomial to $d+1$ noise levels. More flexible but can be sensitive to statistical noise in the data points.
  • Exponential extrapolation: Fit $\langle O \rangle_\lambda = a \, e^{-b\lambda} + c$. Physically motivated by depolarizing noise models, where expectation values decay exponentially with the noise strength. Often the most accurate model in practice (see the sketch after this list).
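
The sketch below, using numpy and scipy, generates synthetic expectation values from an assumed exponential decay plus shot noise, then compares the zero-noise estimates from a linear fit and an exponential fit. The decay constant and noise level are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

# Synthetic "measurements": assume the true behaviour is <O>_lambda = e^{-0.3*lambda},
# so the zero-noise value is 1.0, and add a little shot noise.
rng = np.random.default_rng(1)
lambdas = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
measured = np.exp(-0.3 * lambdas) + rng.normal(0, 0.01, lambdas.size)

# Linear model: <O>_lambda = a + b*lambda, extrapolated to lambda = 0.
slope, intercept = np.polyfit(lambdas, measured, 1)
linear_estimate = intercept

# Exponential model: <O>_lambda = a*exp(-b*lambda) + c.
def exp_model(lam, a, b, c):
    return a * np.exp(-b * lam) + c

popt, _ = curve_fit(exp_model, lambdas, measured, p0=(1.0, 0.3, 0.0))
exp_estimate = exp_model(0.0, *popt)

print("true zero-noise value : 1.000")
print(f"linear extrapolation  : {linear_estimate:.3f}")
print(f"exponential fit       : {exp_estimate:.3f}")
```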

The simulation below demonstrates ZNE on a simple circuit. The slider controls the "noise scale factor" $\lambda$, which determines how many times the circuit is folded. Watch how the measurement distribution degrades as noise increases. In a real ZNE workflow, you would record the expectation value at each noise level and fit a curve to extrapolate to $\lambda = 0$.

At $\lambda = 0$ (no noise), you should see a clean Bell state: roughly 50% 00 and 50% 11. As you increase the noise parameter, the distribution spreads to include 01 and 10 outcomes. In a real ZNE experiment, you would measure the expectation value $\langle Z_0 Z_1 \rangle$ at each noise level (which should be +1 for a perfect Bell state) and extrapolate the trend back to zero noise.

Key Concept.

Zero-Noise Extrapolation works by running the circuit at intentionally amplified noise levels (using unitary folding), measuring the expectation value at each level, and extrapolating to the zero-noise limit. It requires no knowledge of the specific noise model and adds no extra qubits - only additional circuit runs. The main cost is the sampling overhead needed to estimate expectation values at multiple noise levels with sufficient precision.

Strengths and Limitations of ZNE

ZNE's greatest strength is its simplicity and generality - it works for any circuit and any observable, requires no detailed noise characterization, and can be implemented as a pure classical post-processing step around the quantum computation. Its main limitations are:

  • Extrapolation model uncertainty: The true relationship between noise level and expectation value may not match the assumed model (linear, polynomial, exponential), leading to systematic errors in the extrapolated value.
  • Statistical cost: Amplified noise levels produce noisier estimates, so more shots are needed at higher $\lambda$ to maintain precision. The total number of shots can grow significantly.
  • Limited scalability: As circuits get larger and noisier, the expectation values at amplified noise levels approach the trivial (maximally mixed) value, making extrapolation increasingly unreliable. ZNE is most effective for circuits that are only moderately noisy.

Interactive: Zero-Noise Extrapolation

ZNE amplifies noise at several scale factors, measures the noisy expectation value at each, and extrapolates back to the zero-noise limit. Adjust the noise level and see how well linear extrapolation recovers the ideal $\langle ZZ \rangle$ value for a Bell state.

OPENQASM 2.0;
include "qelib1.inc";
qreg q[2];
creg c[2];
h q[0];
cx q[0], q[1];
measure q[0] -> c[0];
measure q[1] -> c[1];
ZNE Extrapolation Visualizer

The ideal Bell state has $\langle ZZ \rangle = 1$. Under depolarizing noise at rate $p$ per gate, the expectation value decays as $(1-2p)^{2\lambda}$, where $\lambda$ is the noise scale factor (the circuit has 2 gates, each folded $\lambda$ times). ZNE measures at $\lambda = 1, 2, 3$ and extrapolates to $\lambda = 0$. Drag the slider to see how noise degrades the signal and how extrapolation recovers it.

25.3 Probabilistic Error Cancellation

While ZNE estimates the zero-noise result by extrapolation, Probabilistic Error Cancellation (PEC) takes a more direct approach: it mathematically inverts the noise channel and implements the inverse as a probabilistic mixture of noisy operations. PEC can, in principle, exactly recover the noiseless expectation value - but at a cost.

The Idea: Inverting the Noise

Suppose each noisy gate in our circuit implements the ideal gate $\mathcal{U}$ followed by a noise channel $\mathcal{N}$, so the actual operation is $\mathcal{N} \circ \mathcal{U}$. If we knew $\mathcal{N}$ exactly, we could apply the inverse channel $\mathcal{N}^{-1}$ to recover the ideal operation. The problem: $\mathcal{N}^{-1}$ is generally not a physical quantum channel (it is not completely positive - its Choi matrix can have negative eigenvalues), so it cannot be implemented as a single quantum operation.

PEC resolves this by decomposing $\mathcal{N}^{-1}$ as a quasiprobability distribution over physically implementable operations:

$$\mathcal{N}^{-1} = \sum_i q_i \, \mathcal{B}_i$$

where $\{\mathcal{B}_i\}$ is a set of noisy basis operations (such as the noisy gate itself, or the noisy gate preceded by Pauli gates), and $q_i$ are real coefficients that can be negative. The coefficients $q_i$ form a quasiprobability distribution: they sum to 1 but are not all non-negative. This is the mathematical signature of a non-physical operation being decomposed into physical ones.

The PEC Protocol

  1. Characterize the noise: Use gate set tomography or other characterization techniques to determine the noise channel $\mathcal{N}$ for each gate in the circuit.
  2. Compute the quasiprobability decomposition: For each noisy gate, find the coefficients $q_i$ such that $\mathcal{N}^{-1} = \sum_i q_i \mathcal{B}_i$. The normalization factor $C = \sum_i |q_i|$ is called the one-norm of the decomposition.
  3. Sample and execute: For each circuit execution, randomly replace each noisy gate with one of the basis operations $\mathcal{B}_i$ sampled with probability $|q_i|/C$. Record the measurement outcome and weight it by the sign $\text{sgn}(q_i)$ and the normalization $C$.
  4. Average: The weighted average of the measurement outcomes over many samples converges to the noiseless expectation value. A minimal single-qubit demonstration of this sampling loop follows this list.
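
The following sketch runs this loop end to end for a single noisy gate: an $X$ gate followed by single-qubit depolarizing noise of known strength. The quasiprobability coefficients of the inverse depolarizing channel are written in closed form, the basis operations are the noisy gate followed by one of $I, X, Y, Z$, and the weighted average recovers the ideal $\langle Z \rangle = -1$. This is a plain numpy illustration under an assumed, perfectly characterized noise model, not a hardware workflow.

```python
import numpy as np

# Pauli matrices and the initial state |0><0|
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [I2, X, Y, Z]
rho0 = np.array([[1, 0], [0, 0]], dtype=complex)

p = 0.05                 # assumed (perfectly characterized) depolarizing rate
f = 1 - 4 * p / 3        # Bloch-vector shrinking factor of the channel

def depolarize(rho):
    return (1 - p) * rho + (p / 3) * sum(P @ rho @ P for P in (X, Y, Z))

def noisy_x(rho):
    """Ideal X gate followed by the depolarizing noise channel."""
    return depolarize(X @ rho @ X)

# Quasiprobability decomposition of the inverse channel:
#   N^{-1} = q_I (I . I) + q_P (P . P),  P in {X, Y, Z}
q = np.array([(3 + f) / (4 * f), (f - 1) / (4 * f),
              (f - 1) / (4 * f), (f - 1) / (4 * f)])
C = np.abs(q).sum()      # one-norm: the sampling-overhead factor

rng = np.random.default_rng(0)
shots = 200_000
samples = []
for _ in range(shots):
    i = rng.choice(4, p=np.abs(q) / C)         # pick a basis operation
    rho = paulis[i] @ noisy_x(rho0) @ paulis[i]  # noisy gate, then extra Pauli
    z = np.real(np.trace(Z @ rho))             # <Z> of this branch
    outcome = rng.choice([1, -1], p=[(1 + z) / 2, (1 - z) / 2])
    samples.append(C * np.sign(q[i]) * outcome)  # weight by sign and one-norm

print(f"unmitigated <Z> : {np.real(np.trace(Z @ noisy_x(rho0))):+.3f}")
print(f"PEC estimate    : {np.mean(samples):+.3f} "
      f"(std err {np.std(samples) / np.sqrt(shots):.3f})")
print("ideal value     : -1.000")
```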

The Sampling Overhead

The catch is the sampling overhead. The variance of the PEC estimator scales as $C^2$, where $C = \sum_i |q_i|$ for a single gate. For a circuit with $g$ noisy gates, each with one-norm $C_k$, the total overhead is:

$$C_{\text{total}} = \prod_{k=1}^{g} C_k$$

Since each $C_k \geq 1$ (with equality only for noise-free gates), the total overhead grows exponentially with the number of noisy gates. To achieve the same statistical precision as a noiseless computation, we need $C_{\text{total}}^2$ times as many samples. For moderately noisy circuits, this overhead can be manageable (say, $C_{\text{total}} \approx 10$, requiring $100\times$ more samples). For highly noisy circuits, the overhead becomes prohibitive.
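
The compounding is easy to tabulate. Assuming, purely for illustration, that every gate has the same one-norm, the snippet below shows how the total overhead and the required shot multiplier grow with gate count.

```python
c_gate = 1.03   # assumed one-norm per noisy gate (a fairly good gate)

for g in (10, 50, 100, 500):
    c_total = c_gate ** g            # one-norms multiply across gates
    shot_multiplier = c_total ** 2   # extra samples for the same precision
    print(f"{g:4d} gates: C_total = {c_total:10.3g}, shots x {shot_multiplier:10.3g}")
```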

Key Concept.

Probabilistic Error Cancellation inverts the noise channel using a quasiprobability decomposition over physical operations. It can exactly recover noiseless expectation values (given perfect noise characterization) but requires sampling overhead that grows exponentially with the number of noisy gates. The one-norm $C = \sum_i |q_i|$ quantifies the cost: achieving precision $\epsilon$ requires $O(C_{\text{total}}^2/\epsilon^2)$ samples.

PEC in Practice

PEC requires detailed knowledge of the noise model - specifically, the noise channel for each gate must be characterized accurately. This is obtained through gate set tomography (GST) or randomized benchmarking, which are themselves resource-intensive characterization procedures. Errors in the noise model propagate into errors in the mitigated expectation value, so PEC is only as good as the noise characterization.

Despite its exponential scaling, PEC has been demonstrated successfully on real quantum hardware for small circuits. It is often combined with ZNE in hybrid approaches: PEC handles the coherent (systematic) errors while ZNE handles the incoherent (stochastic) errors, or PEC is applied to the most error-prone gates while ZNE is used as a global correction.

Common Misconception.

PEC does not reduce noise in the quantum state - it corrects the statistics of the measurement outcomes through weighted sampling. The quantum computer still runs the same noisy circuits; the correction happens entirely in classical post-processing. This means PEC cannot improve the fidelity of individual quantum states, only the accuracy of estimated expectation values.

25.4 Clifford Data Regression and Other Techniques

Beyond ZNE and PEC, a growing toolkit of error mitigation techniques addresses different aspects of the noise problem. We survey the most important ones here.

Clifford Data Regression (CDR)

Clifford Data Regression takes a machine-learning approach to error mitigation. The key insight: Clifford circuits (composed entirely of gates from the Clifford group - H, S, CNOT, and their combinations) can be efficiently simulated classically by the Gottesman-Knill theorem. This means we can compute the exact noiseless result for any Clifford circuit, even one that runs on hundreds of qubits.

The CDR protocol exploits this:

  1. Generate training circuits: Create a set of "near-Clifford" circuits that resemble the circuit of interest but use only (or mostly) Clifford gates. These training circuits should have similar structure, depth, and gate composition to the target circuit, so they experience similar noise.
  2. Collect training data: Run each training circuit on the noisy quantum hardware to get the noisy expectation value $\langle O \rangle_{\text{noisy}}$. Also compute the exact noiseless expectation value $\langle O \rangle_{\text{ideal}}$ using a classical Clifford simulator.
  3. Fit a correction model: Train a regression model (typically linear: $\langle O \rangle_{\text{ideal}} = a \cdot \langle O \rangle_{\text{noisy}} + b$) on the training data pairs $\{(\langle O \rangle_{\text{noisy}}, \langle O \rangle_{\text{ideal}})\}$.
  4. Apply the correction: Run the actual (non-Clifford) circuit of interest on the noisy hardware, measure $\langle O \rangle_{\text{noisy}}$, and apply the trained correction model to estimate the noiseless result.

CDR works because the noise experienced by the training circuits is similar to the noise experienced by the target circuit (since they have similar structure), so the learned noise-correction relationship transfers. The method is especially effective when the target circuit is "almost Clifford" - that is, most of its gates are Clifford gates, with only a few non-Clifford (typically T or parameterized rotation) gates.
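
A minimal sketch of the regression step is shown below. The "hardware" data here is synthetic - we assume an affine distortion plus shot noise purely to have something to fit - but the fitting and correction steps are the ones CDR actually performs.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic training data: ideal expectation values of near-Clifford training
# circuits (computed exactly with a Clifford simulator in a real workflow) and
# the corresponding noisy values. We *assume* the hardware applies an unknown
# affine distortion plus shot noise, just to generate data for this sketch.
ideal_train = rng.uniform(-1, 1, size=30)
noisy_train = 0.7 * ideal_train - 0.05 + rng.normal(0, 0.02, size=30)

# Fit the linear correction  <O>_ideal ~ a * <O>_noisy + b
A = np.vstack([noisy_train, np.ones_like(noisy_train)]).T
(a, b), *_ = np.linalg.lstsq(A, ideal_train, rcond=None)

# Apply it to the (non-Clifford) circuit of interest
noisy_target = 0.7 * 0.82 - 0.05      # what the hardware would report
mitigated = a * noisy_target + b
print(f"noisy value   : {noisy_target:.3f}")
print(f"CDR-corrected : {mitigated:.3f}   (true value 0.820)")
```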

Key Concept.

Clifford Data Regression uses efficiently simulable Clifford circuits as training data to learn a noise correction model. By comparing noisy quantum hardware results with exact classical simulations on structurally similar Clifford circuits, CDR learns the relationship between noisy and ideal expectation values and applies this correction to the circuit of interest. It requires no explicit noise model - the correction is learned empirically.

Measurement Error Mitigation

One of the simplest and most effective mitigation techniques addresses measurement errors specifically. The idea: characterize the measurement confusion matrix $M$, where $M_{ij}$ is the probability of measuring outcome $i$ when the true state is $j$. Then invert this matrix to correct the measured probability distribution:

$$\mathbf{p}_{\text{ideal}} = M^{-1} \mathbf{p}_{\text{measured}}$$

For $n$ qubits, the full confusion matrix is $2^n \times 2^n$, which is exponentially large and impractical to characterize for large systems. However, if measurement errors are approximately independent across qubits, the matrix factorizes as a tensor product of $n$ individual $2 \times 2$ confusion matrices, requiring only $O(n)$ calibration circuits. This "tensored" measurement error mitigation is widely used and often provides significant improvement with minimal overhead.
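
Here is a minimal numpy sketch of the tensored approach for two qubits: per-qubit confusion matrices (with assumed error rates) are combined with a Kronecker product and inverted to correct a measured distribution.

```python
import numpy as np

def qubit_confusion(p01, p10):
    """2x2 confusion matrix: column j = true state, row i = measured outcome.
    p01 = P(measure 1 | prepared 0), p10 = P(measure 0 | prepared 1)."""
    return np.array([[1 - p01, p10],
                     [p01, 1 - p10]])

# Assumed per-qubit readout error rates (from calibration circuits)
M = np.kron(qubit_confusion(0.02, 0.05), qubit_confusion(0.03, 0.04))

# A measured distribution over 00, 01, 10, 11 for an ideal Bell state
p_ideal = np.array([0.5, 0.0, 0.0, 0.5])
p_measured = M @ p_ideal

# Apply M^{-1}; in practice a constrained least-squares fit is often used
# instead, since direct inversion can produce small negative probabilities.
p_corrected = np.linalg.solve(M, p_measured)
for label, meas, corr in zip(["00", "01", "10", "11"], p_measured, p_corrected):
    print(f"{label}: measured {meas:.3f} -> corrected {corr:.3f}")
```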

Symmetry Verification

Many quantum algorithms compute states that satisfy known symmetries - for example, particle number conservation in quantum chemistry, or parity conservation in certain optimization problems. Symmetry verification exploits this by measuring the symmetry operator and discarding (post-selecting on) measurement outcomes that violate the expected symmetry. Since noise tends to break symmetries, this filtering removes many error-corrupted shots and improves the quality of the remaining data.

The cost is a reduction in the effective number of samples (shots that fail the symmetry check are discarded), which increases the statistical uncertainty of the estimate. However, for circuits where the noise rate is moderate and the symmetry is well-preserved, the improvement in accuracy far outweighs the loss of statistics.
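
The sketch below illustrates the filtering step on synthetic Z-basis shots for a circuit whose ideal output is $|01\rangle$, with particle number (the count of 1s) as the conserved symmetry. The error rates generating the fake shots are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Fake Z-basis shots for a circuit whose ideal output is |01> (one excitation).
# Noise leaks population into the wrong particle-number sectors (assumed rates).
shots = rng.choice(["01", "00", "11", "10"], size=20_000,
                   p=[0.80, 0.08, 0.07, 0.05])

def z0(bits):                    # <Z> on qubit 0: +1 for '0', -1 for '1'
    return 1 if bits[0] == "0" else -1

def excitations(bits):           # the conserved quantity: number of 1s
    return bits.count("1")

raw = np.mean([z0(s) for s in shots])

# Symmetry verification: keep only shots in the correct (one-excitation) sector
kept = [s for s in shots if excitations(s) == 1]
verified = np.mean([z0(s) for s in kept])

print(f"raw <Z_0>         : {raw:+.3f}")      # pulled away from +1 by noise
print(f"symmetry-verified : {verified:+.3f}") # closer to the ideal +1
print(f"fraction kept     : {len(kept) / len(shots):.2f}")
```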

Twirling and Randomized Compiling

Randomized compiling (also called Pauli twirling) does not reduce noise but converts it into a simpler form. The technique inserts random Pauli gates before and after each CNOT gate (with corresponding corrections to maintain the ideal circuit unitary). The effect is to convert arbitrary coherent errors into incoherent (stochastic) Pauli noise, which is easier to analyze and mitigate with techniques like ZNE and PEC.

The intuition: coherent errors can interfere constructively and accumulate faster than stochastic errors. By randomizing the error, we eliminate the worst-case coherent buildup and ensure the noise behaves like a depolarizing channel, which decays predictably and is well-modeled by exponential extrapolation.
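
The Pauli corrections that must follow the CNOT can be derived numerically rather than memorized. The sketch below conjugates each candidate pre-CNOT Pauli pair through an ideal CNOT and finds the post-CNOT pair that restores the original unitary up to an irrelevant global phase. It is a standalone numpy check, not a compiler pass.

```python
import numpy as np
from itertools import product

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = {"I": I2, "X": X, "Y": Y, "Z": Z}

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)   # control = first qubit

def equal_up_to_phase(A, B):
    """True if A = e^{i*phi} B for some global phase phi."""
    idx = np.unravel_index(np.argmax(np.abs(B)), B.shape)
    phase = A[idx] / B[idx]
    return np.allclose(A, phase * B)

# For each pair (Pc, Pt) inserted before the CNOT, find the pair (Qc, Qt)
# to insert after it so the overall unitary is still a CNOT.
table = {}
for (nc, Pc), (nt, Pt) in product(paulis.items(), repeat=2):
    before = np.kron(Pc, Pt)
    for (mc, Qc), (mt, Qt) in product(paulis.items(), repeat=2):
        after = np.kron(Qc, Qt)
        if equal_up_to_phase(after @ CNOT @ before, CNOT):
            table[(nc, nt)] = (mc, mt)
            break

for pre, post in sorted(table.items()):
    print(f"before CNOT: {pre}  ->  after CNOT: {post}")
```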

Note.

Randomized compiling is a preprocessing step that makes other mitigation techniques work better. It is routinely used in combination with ZNE and PEC on IBM, Google, and other quantum hardware platforms. The overhead is minimal - it only adds single-qubit Pauli gates, which are among the fastest and most accurate operations on superconducting hardware.

25.5 Combining Mitigation with Correction

Error mitigation and error correction are not mutually exclusive - they are complementary tools that can be combined to push the boundaries of what noisy quantum hardware can achieve. As quantum devices improve and small error-correcting codes become practical, hybrid strategies that use both approaches simultaneously will become increasingly important.

Mitigation on Logical Qubits

Even with error correction, logical error rates are not zero. A distance-$d$ surface code has a logical error rate per round that scales as $(p/p_{\text{th}})^{(d+1)/2}$, where $p$ is the physical error rate and $p_{\text{th}}$ is the threshold. For moderate code distances and physical error rates just below threshold, the logical error rate can still be substantial. Error mitigation techniques (ZNE, PEC) can be applied at the logical level to further reduce the effective logical error rate without increasing the code distance (and thus the qubit count).
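
As a rough illustration, the snippet below evaluates the $(p/p_{\text{th}})^{(d+1)/2}$ scaling for an assumed physical error rate and threshold, treating the relation as an equality and ignoring the prefactors that real codes have.

```python
def logical_error_rate(p, p_th, d):
    """Rough scaling estimate: (p / p_th)^((d + 1) / 2), ignoring prefactors."""
    return (p / p_th) ** ((d + 1) / 2)

p, p_th = 3e-3, 1e-2   # assumed physical error rate and threshold
for d in (3, 5, 7, 11):
    print(f"d = {d:2d}: logical error per round ~ {logical_error_rate(p, p_th, d):.1e}")
```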

The Mitigation-Correction Spectrum

In practice, the choice between mitigation and correction (and the blend of the two) depends on the specific computation and available resources:

  • Short circuits, few qubits: Error mitigation alone may suffice. ZNE and measurement error mitigation can bring expectation values close to the ideal result with modest sampling overhead.
  • Medium circuits: Combine randomized compiling (to simplify the noise) with ZNE or PEC. Symmetry verification can further improve results for structured problems.
  • Long circuits, many qubits: Error correction becomes necessary. But mitigation applied on top of correction can reduce the required code distance, saving physical qubits.
  • Fault-tolerant era: Full error correction with high-distance codes. Error mitigation may still be useful for reducing the overhead of magic state distillation or for running circuits just beyond the code's protection capability.

Recent Milestones

The practical importance of error mitigation was dramatically demonstrated in 2023, when IBM combined Pauli twirling with ZNE built on a learned Pauli noise model (the same characterization machinery that underlies PEC) to execute a 127-qubit circuit with up to 60 layers of CNOT gates on their Eagle processor - far beyond what would be meaningful without mitigation. On instances that could be checked classically, the mitigated expectation values agreed with the reference values - obtained at enormous classical cost with exact and tensor-network simulations - to within statistical error. This was a landmark result, showing that error mitigation can extend the useful reach of noisy hardware by a significant margin.

However, it is important to note that the sampling overhead was substantial (millions of circuit executions) and the classical post-processing required detailed noise characterization. As circuits grow larger, this overhead grows, and at some point, error correction becomes the more efficient approach. The crossover point depends on the hardware error rate, the circuit structure, and the precision required.

Key Concept.

Error mitigation and error correction are complementary, not competing approaches. In the near term, mitigation is the practical choice for NISQ devices. In the medium term, hybrid strategies combining partial correction with mitigation will extend the reach of both. In the long term, fault-tolerant error correction will be the primary approach, with mitigation playing a supporting role for reducing overhead.

A Practitioner's Decision Tree

When running a quantum algorithm on real hardware, the following decision tree provides a practical guide:

  1. Always apply measurement error mitigation. It is cheap, effective, and improves virtually every computation.
  2. Use randomized compiling to convert coherent errors into stochastic Pauli noise. This makes subsequent mitigation techniques more effective.
  3. If the circuit has known symmetries, apply symmetry verification to discard error-corrupted shots.
  4. Apply ZNE as a general-purpose correction. Use exponential extrapolation with 3-5 noise scale factors for best results.
  5. If high accuracy is needed and noise characterization is available, use PEC or CDR for the most critical parts of the circuit.
  6. If the circuit exceeds the mitigation budget, consider reducing circuit depth (fewer QAOA layers, shallower ansatz) or using error correction for the most noise-sensitive subcircuits.

Note.

Software frameworks for error mitigation are mature and freely available. Mitiq (from Unitary Fund), Qiskit's built-in mitigation tools, and Cirq's error mitigation modules all provide production-ready implementations of ZNE, PEC, CDR, and measurement error mitigation. Using these tools is now standard practice for any serious quantum computation on real hardware.

The Sandbox: ZNE in Action

The exercise below asks you to construct a circuit and observe the effect of noise. While our simulator does not model realistic noise, the exercise illustrates the principle of ZNE: adding extra (identity-equivalent) gates to a circuit increases the "noise exposure" without changing the ideal result. In a real experiment, you would measure the expectation value at each fold level and extrapolate.

In the simulator (which is noiseless), the folded circuit produces the same result as the unfolded circuit - confirming that unitary folding is an identity operation in the ideal case. On real noisy hardware, each additional fold would degrade the result, providing the data points needed for zero-noise extrapolation.

Dynamical Decoupling

Dynamical decoupling (DD) suppresses low-frequency noise by inserting rapid pulse sequences during idle periods in a circuit. The key idea is that if an unwanted interaction accumulates a phase error during idle time, a carefully timed sequence of pulses can refocus (cancel) that error - much like a spin echo in NMR.

Common DD sequences include:

  • Hahn echo: a single X pulse at the midpoint of the idle window, which refocuses quasi-static dephasing.
  • CPMG: a train of equally spaced X pulses, extending the echo idea to slowly fluctuating noise.
  • XY4 (and longer variants such as XY8): alternating X and Y pulses, which suppress errors along more than one axis and are more robust to pulse imperfections.

DD is one of the simplest and most broadly applicable mitigation techniques. It requires no additional qubits and only modest pulse overhead, making it a default strategy on most quantum computing platforms. The widget below shows how DD pulses are inserted into a circuit's idle slots.
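
Alongside the widget, the following sketch shows the insertion step in code: one qubit's schedule is represented as a list of time slots, and any sufficiently long idle window is filled with an XY4 sequence. The schedule representation and the minimum-idle threshold are illustrative assumptions, not a scheduler API.

```python
def insert_xy4(schedule, min_idle=4):
    """Fill idle windows of length >= min_idle with an XY4 sequence.
    `schedule` is one qubit's timeline: gate names or None for idle slots.
    XY4 = X, Y, X, Y composes to the identity (up to global phase), so the
    ideal circuit is unchanged while idle-time dephasing is refocused."""
    out = list(schedule)
    i = 0
    while i < len(out):
        if out[i] is None:
            j = i
            while j < len(out) and out[j] is None:
                j += 1                       # j marks the end of the idle window
            if j - i >= min_idle:
                pulses = ["X", "Y", "X", "Y"]
                for k, pulse in enumerate(pulses):
                    # spread the four pulses evenly across the idle window
                    out[i + (k * (j - i)) // len(pulses)] = pulse
            i = j
        else:
            i += 1
    return out

timeline = ["H", None, None, None, None, None, None, "CX", None, "M"]
print(insert_xy4(timeline))
```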

Unitary Folding Visualizer

Unitary folding amplifies noise by inserting $U^\dagger U$ pairs. Each fold adds gates without changing the ideal computation, but doubles the noise exposure. Toggle fold levels to see the circuit grow and the effective noise increase.

Readout Error Confusion Matrix

Measurement errors flip outcomes with some probability. The confusion matrix $M$ maps true states to measured outcomes. Adjust the error rates to see how the histogram is distorted, then apply the inverse correction $M^{-1}$.

Mitigation vs Correction: The Full Spectrum

Error mitigation and error correction occupy different points on a spectrum from near-term to fault-tolerant quantum computing. This comparison shows how each approach handles the same noisy circuit.

OPENQASM 2.0;
include "qelib1.inc";
qreg q[2];
creg c[2];
h q[0];
cx q[0], q[1];
measure q[0] -> c[0];
measure q[1] -> c[1];

OPENQASM 2.0;
include "qelib1.inc";
qreg q[2];
creg c[2];
h q[0];
cx q[0], q[1];
measure q[0] -> c[0];
measure q[1] -> c[1];

The left panel shows the ideal (error-corrected) result: a clean Bell state with only 00 and 11 outcomes. The right panel shows the same circuit under noise, where 01 and 10 leak in. Error mitigation corrects the expectation values statistically; error correction prevents the errors at the physical level.

Interactive: Pauli Twirling

Pauli twirling inserts random Pauli gates before and after each CNOT, converting coherent errors into simpler stochastic Pauli noise. Compare the original circuit with a twirled version - the ideal computation is unchanged, but the noise structure is randomized.