Chapter 23: Quantum Approximate Optimization (QAOA)

Throughout this textbook, we have built up the mathematical machinery of quantum computing - qubits, gates, entanglement, interference. Now we put that machinery to work on one of the most practically important classes of problems in all of computer science: combinatorial optimization. These are problems where you must find the best solution from a vast, discrete set of possibilities - routing delivery trucks, scheduling airline crews, assigning frequencies to cell towers, or designing integrated circuits. Many of these problems are NP-hard, meaning no known classical algorithm can solve them efficiently in the worst case.

In 2014, Edward Farhi, Jeffrey Goldstone, and Sam Gutmann proposed the Quantum Approximate Optimization Algorithm (QAOA), a hybrid quantum-classical algorithm designed to tackle these problems on near-term quantum hardware. QAOA does not promise exact solutions - the word "approximate" is right there in the name - but it offers a structured, principled approach to finding good solutions using shallow quantum circuits. In this chapter, we will build QAOA from the ground up, apply it to the canonical MaxCut problem, connect it to the deeper theory of adiabatic quantum computation, and explore D-Wave's radically different approach to quantum optimization.

23.1 Combinatorial Optimization

A combinatorial optimization problem asks: given a finite set of candidate solutions, each with an associated cost (or objective value), find the solution that maximizes or minimizes that cost. The challenge is that the number of candidates grows exponentially with the problem size, making exhaustive search impractical.

MaxCut: The Canonical Example

Given an undirected graph $G = (V, E)$ with $n$ vertices and $m$ edges, the MaxCut problem asks: partition the vertices into two groups (call them $S$ and $\bar{S}$) so that the number of edges crossing between the two groups is maximized. An edge $(i, j)$ is "cut" if vertex $i$ is in one group and vertex $j$ is in the other.

We encode the partition using binary variables $z_i \in \{0, 1\}$ for each vertex $i$. Vertex $i$ goes into group $S$ if $z_i = 0$ and into $\bar{S}$ if $z_i = 1$. An edge $(i,j)$ is cut whenever $z_i \neq z_j$, which we can express mathematically as $z_i + z_j - 2 z_i z_j$ (this equals 1 when the bits differ and 0 when they match). The total number of cut edges is:

$$C(z) = \sum_{(i,j) \in E} \left( z_i + z_j - 2 z_i z_j \right)$$

For a graph with $n$ vertices, there are $2^n$ possible partitions. A graph with just 50 vertices has over $10^{15}$ possible cuts - far too many to check one by one. MaxCut is NP-hard: no known classical algorithm solves it exactly in polynomial time for all graphs.
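To make the counting concrete, here is a brute-force evaluator for $C(z)$ - a plain Python sketch whose exhaustive scan is exactly what becomes infeasible beyond roughly 20 vertices (the function names are illustrative choices):

```python
# Brute-force MaxCut for a small graph: evaluate C(z) for every partition.
from itertools import product

def cut_value(z, edges):
    """Number of edges whose endpoints fall in different groups,
    using the identity z_i + z_j - 2*z_i*z_j from the text."""
    return sum(z[i] + z[j] - 2 * z[i] * z[j] for i, j in edges)

def brute_force_maxcut(n, edges):
    """Scan all 2^n bitstrings; return (best cut value, best bitstring)."""
    best = max(product([0, 1], repeat=n), key=lambda z: cut_value(z, edges))
    return cut_value(best, edges), best

# Triangle graph: any non-trivial partition cuts 2 of the 3 edges.
value, partition = brute_force_maxcut(3, [(0, 1), (1, 2), (0, 2)])
print(value)  # 2
```

The runtime doubles with every added vertex, which is the whole point: at $n = 50$ this loop would take decades, while for $n = 20$ it still finishes in seconds.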

Key Concept.

MaxCut encodes a graph partitioning problem into binary variables. The objective function counts edges between the two groups. Finding the partition that maximizes this count is NP-hard, making it an ideal benchmark for quantum optimization algorithms.

From Classical Bits to Quantum Spins

To bring MaxCut into the quantum world, we replace each classical bit $z_i \in \{0, 1\}$ with a qubit and use the Pauli-Z operator to encode the partition. Recall that $Z|0\rangle = +|0\rangle$ and $Z|1\rangle = -|1\rangle$, so the eigenvalue of $Z_i$ tells us which group vertex $i$ belongs to. The classical cost function translates to a quantum operator called the cost Hamiltonian:

$$H_C = \frac{1}{2} \sum_{(i,j) \in E} \left( I - Z_i Z_j \right)$$

When qubits $i$ and $j$ are in different computational basis states (one $|0\rangle$, one $|1\rangle$), the term $Z_i Z_j$ evaluates to $-1$, so $(I - Z_i Z_j)/2 = 1$ - the edge contributes to the cut. When they are in the same state, $Z_i Z_j = +1$ and the term vanishes. The eigenvalues of $H_C$ are exactly the cut values $C(z)$ for each bitstring $z$. Maximizing the cut is equivalent to finding the highest eigenvalue of $H_C$.
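The correspondence between eigenvalues and cut values can be verified directly: $H_C$ is diagonal in the computational basis, so its spectrum can be listed entry by entry. A numpy sketch (the function name and bit ordering are illustrative choices):

```python
import numpy as np

def cost_diagonal(n, edges):
    """Diagonal of H_C = (1/2) * sum_{(i,j)} (I - Z_i Z_j).
    Entry k is the cut value of the bitstring labeling basis state |k>."""
    diag = np.zeros(2 ** n)
    for k in range(2 ** n):
        bits = [(k >> (n - 1 - q)) & 1 for q in range(n)]  # qubit 0 = leftmost bit
        diag[k] = sum(1 for i, j in edges if bits[i] != bits[j])
    return diag

# Triangle graph: the eigenvalues of H_C are exactly the cut values.
d = cost_diagonal(3, [(0, 1), (1, 2), (0, 2)])
print(d)  # [0. 2. 2. 2. 2. 2. 2. 0.]
```

The two zero entries correspond to 000 and 111 (nothing cut); every other partition cuts two edges, matching the brute-force count.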

Beyond MaxCut: SAT and Scheduling

The QAOA framework extends far beyond MaxCut. Any combinatorial optimization problem whose objective function can be written as a sum of terms involving a bounded number of variables can be encoded as a cost Hamiltonian:

  • Satisfiability (SAT): Given a Boolean formula in conjunctive normal form (a conjunction of clauses, each a disjunction of literals), find an assignment that satisfies the maximum number of clauses. Each clause becomes a term in $H_C$ using products of $Z$ operators and identity shifts.
  • Graph coloring: Assign colors to vertices so that no two adjacent vertices share a color, using the minimum number of colors. The penalty for adjacent same-color vertices becomes the cost Hamiltonian.
  • Scheduling: Assign jobs to time slots and machines to minimize total completion time or conflicts. Constraints become penalty terms in the Hamiltonian; the objective becomes the cost to minimize.
  • Portfolio optimization: Select a subset of financial assets to maximize expected return while minimizing risk, subject to budget constraints. The quadratic risk-return trade-off maps naturally to a QUBO (Quadratic Unconstrained Binary Optimization) formulation.

The general pattern is always the same: encode the problem as a cost Hamiltonian $H_C$ whose ground state (or highest eigenstate) encodes the optimal solution, then use QAOA to prepare a quantum state with high overlap on that optimal state.

23.2 The QAOA Circuit

The QAOA circuit has an elegant, layered structure. It alternates between two types of unitary operations - the problem unitary (which encodes the cost function) and the mixing unitary (which explores the solution space) - applied $p$ times, where $p$ is the number of layers (also called the circuit depth). Each layer is parameterized by two angles, $\gamma_k$ and $\beta_k$, which are optimized classically.

Step 1: Initial State

QAOA begins by preparing an equal superposition over all $2^n$ computational basis states. This is achieved by applying a Hadamard gate to each of the $n$ qubits:

$$|s\rangle = H^{\otimes n} |0\rangle^{\otimes n} = \frac{1}{\sqrt{2^n}} \sum_{z \in \{0,1\}^n} |z\rangle$$

This initial state gives every possible solution an equal amplitude. The subsequent QAOA layers will reshape these amplitudes, concentrating probability on high-quality solutions.

Step 2: The Problem Unitary $U(H_C, \gamma)$

The problem unitary applies the cost Hamiltonian as a phase rotation:

$$U(H_C, \gamma) = e^{-i \gamma H_C}$$

For MaxCut, the cost Hamiltonian is a sum of commuting $Z_i Z_j$ terms (diagonal operators commute with each other), so the exponential factors cleanly:

$$U(H_C, \gamma) = \prod_{(i,j) \in E} e^{-i \gamma (I - Z_i Z_j)/2} = e^{-i \gamma m / 2} \prod_{(i,j) \in E} e^{i \gamma Z_i Z_j / 2}$$

where $m = |E|$, so up to a global phase each edge contributes the factor $e^{i \gamma Z_i Z_j / 2}$. Each factor is implemented by an $R_{ZZ}$ gate - a two-qubit gate that applies a relative phase depending on whether qubits $i$ and $j$ agree or disagree. (The sign of the exponent is a convention: since $\gamma$ is a variational parameter optimized over a symmetric range, the sign can be absorbed into $\gamma$, and in QASM the factor is conventionally written as rzz(gamma) q[i], q[j];.) The problem unitary stamps the cost function into the phases of the quantum state: bitstrings with higher cut values accumulate different phases than those with lower cut values.
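Because the cost Hamiltonian is diagonal, a statevector simulator applies the problem unitary as elementwise phases rather than a matrix exponential. A numpy sketch, applying $e^{-i\gamma H_C}$ directly (the helper name is an illustrative choice):

```python
import numpy as np

def apply_problem_unitary(state, gamma, cut_diag):
    """U(H_C, gamma) = exp(-i*gamma*H_C) acts diagonally: each basis state |z>
    picks up the phase exp(-i*gamma*C(z)), where C(z) is its cut value."""
    return np.exp(-1j * gamma * cut_diag) * state

# Phases alone never change measurement probabilities:
n = 3
cut_diag = np.array([0, 2, 2, 2, 2, 2, 2, 0], dtype=float)  # triangle graph
state = np.ones(2 ** n, dtype=complex) / np.sqrt(2 ** n)    # equal superposition
phased = apply_problem_unitary(state, 0.7, cut_diag)
print(np.allclose(np.abs(phased) ** 2, np.abs(state) ** 2))  # True
```

The final line is worth dwelling on: after the problem unitary alone, the measurement distribution is still uniform. The mixer in the next step is what converts these phase differences into amplitude differences.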

Step 3: The Mixing Unitary $U(H_B, \beta)$

The mixing unitary is generated by a "mixer" Hamiltonian $H_B$ that does not commute with $H_C$. The standard choice is the transverse-field mixer:

$$H_B = \sum_{i=1}^{n} X_i$$

which gives the mixing unitary:

$$U(H_B, \beta) = e^{-i \beta H_B} = \prod_{i=1}^{n} e^{-i \beta X_i} = \prod_{i=1}^{n} R_X(2\beta)_i$$

Since the individual $X_i$ operators commute with each other (they act on different qubits), the product factorizes into single-qubit $R_X$ rotations. The mixer's role is to create transitions between computational basis states - it "mixes" the amplitudes, allowing the algorithm to explore the solution space. Without the mixer, the problem unitary would only modify phases, never moving amplitude between different bitstrings.
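Because the mixer factorizes into single-qubit rotations, a statevector simulator can apply it one qubit at a time. A numpy sketch (the helper name is an illustrative choice):

```python
import numpy as np

def apply_mixer(state, beta, n):
    """U(H_B, beta) = prod_i exp(-i*beta*X_i): an RX(2*beta) on every qubit,
    applied as n successive single-qubit rotations on the statevector."""
    c, s = np.cos(beta), -1j * np.sin(beta)
    rx = np.array([[c, s], [s, c]])          # exp(-i*beta*X)
    psi = state.reshape([2] * n)
    for q in range(n):
        psi = np.tensordot(rx, psi, axes=([1], [q]))  # rotate qubit q
        psi = np.moveaxis(psi, 0, q)                  # restore axis order
    return psi.reshape(-1)

# At beta = pi/2, exp(-i*(pi/2)*X) = -i*X, which flips every qubit:
n = 2
state = np.zeros(4, dtype=complex); state[0] = 1.0   # |00>
flipped = apply_mixer(state, np.pi / 2, n)
print(np.abs(flipped[3]) ** 2)  # 1.0 -- all amplitude now on |11>
```

The extreme case $\beta = \pi/2$ illustrates the mixer's job: it moves amplitude between bitstrings, which the diagonal problem unitary can never do.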

Step 4: Layer the Unitaries

A depth-$p$ QAOA circuit applies $p$ alternating layers of problem and mixing unitaries:

$$|\gamma, \beta\rangle = U(H_B, \beta_p) \, U(H_C, \gamma_p) \cdots U(H_B, \beta_1) \, U(H_C, \gamma_1) \, |s\rangle$$

The full parameter set is $(\gamma_1, \ldots, \gamma_p, \beta_1, \ldots, \beta_p)$ - a total of $2p$ real numbers. The quantum state $|\gamma, \beta\rangle$ depends on these parameters, and the goal is to choose them so that measuring the final state yields high-quality solutions with high probability.

Step 5: Measure and Optimize

After preparing $|\gamma, \beta\rangle$, we measure all qubits in the computational basis. Each measurement outcome is a bitstring $z$ representing a candidate partition. We evaluate the classical cost $C(z)$ for this bitstring. Repeating many times, we estimate the expected cost:

$$\langle C \rangle = \langle \gamma, \beta | H_C | \gamma, \beta \rangle$$

A classical optimizer (such as COBYLA, Nelder-Mead, or gradient-based methods) then adjusts the $2p$ parameters to maximize $\langle C \rangle$. This classical-quantum feedback loop continues until the parameters converge. The best bitstring observed during the entire optimization process is returned as the approximate solution.
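Steps 1 through 5 fit in a few dozen lines of simulation code. The sketch below assembles the full depth-$p$ loop and, for simplicity, replaces the classical optimizer with a coarse grid scan - a stand-in for COBYLA or Nelder-Mead that is workable only because depth-1 has just two parameters:

```python
import numpy as np

def qaoa_expectation(params, edges, n):
    """Statevector simulation of depth-p QAOA for MaxCut; returns <C>.
    params = (gamma_1..gamma_p, beta_1..beta_p). A sketch for small n only."""
    p = len(params) // 2
    gammas, betas = params[:p], params[p:]
    # Cut value of each basis state (bit i of index k encodes vertex i).
    cut = np.array([sum(((k >> i) ^ (k >> j)) & 1 for i, j in edges)
                    for k in range(2 ** n)], dtype=float)
    state = np.ones(2 ** n, dtype=complex) / np.sqrt(2 ** n)  # H^n |0...0>
    for g, b in zip(gammas, betas):
        state = np.exp(-1j * g * cut) * state            # problem unitary
        c, s = np.cos(b), -1j * np.sin(b)
        rx = np.array([[c, s], [s, c]])                  # exp(-i*b*X)
        psi = state.reshape([2] * n)
        for q in range(n):                               # mixer on every qubit
            psi = np.moveaxis(np.tensordot(rx, psi, axes=([1], [q])), 0, q)
        state = psi.reshape(-1)
    return float(cut @ (np.abs(state) ** 2))

# Depth-1 on the 4-vertex path graph; a coarse grid scan stands in for
# the classical optimizer.
edges, n = [(0, 1), (1, 2), (2, 3)], 4
grid = np.linspace(0.0, np.pi, 40)
best = max((qaoa_expectation([g, b], edges, n), g, b)
           for g in grid for b in grid)
print(best[0] > 1.5)  # True: beats the random-guess average of m/2 = 1.5
```

At $\gamma = \beta = 0$ the state stays uniform and $\langle C \rangle = m/2 = 1.5$; the scan finds parameters well above that baseline, which is the whole content of the quantum-classical loop.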

Key Concept.

QAOA is a hybrid quantum-classical algorithm. The quantum computer prepares a parameterized state by alternating problem and mixing unitaries. The classical computer optimizes the parameters to maximize the expected cost. Increasing the depth $p$ improves the approximation quality, and in the limit $p \to \infty$, QAOA can find the exact optimum.

The sandbox below implements a single-layer ($p = 1$) QAOA circuit for MaxCut on a triangle graph (3 vertices, 3 edges). Each edge contributes an rzz gate for the problem unitary, and each qubit gets an rx gate for the mixer. Try varying the angles to see how the output distribution changes.

For the triangle graph, the optimal MaxCut partitions cut exactly 2 out of 3 edges (you cannot cut all three edges of a triangle). The optimal bitstrings are 001, 010, 100, 110, 101, and 011 - any partition where the groups are not trivially all-zero or all-one. A well-tuned QAOA circuit should concentrate probability on these six bitstrings while suppressing 000 and 111.

Note.

The rx gate implements $R_X(\theta) = e^{-i \theta X / 2}$, so rx(2*beta) corresponds to the mixing unitary $e^{-i \beta X}$. In the sandbox above, rx(pi/4) corresponds to $\beta = \pi/8$. Similarly, rzz(gamma) implements $e^{-i \gamma Z_i Z_j / 2}$, which matches the per-edge problem unitary factor up to a global phase and a sign convention for $\gamma$ that the classical optimizer absorbs.

23.3 From QAOA to the Adiabatic Theorem

QAOA is not just a clever heuristic - it has deep roots in one of quantum mechanics' most powerful results: the adiabatic theorem. Understanding this connection explains why QAOA works, when it fails, and how to improve it.

The Adiabatic Theorem

The adiabatic theorem, first formulated by Max Born and Vladimir Fock in 1928, states: if a quantum system starts in the ground state of a Hamiltonian $H(0)$, and the Hamiltonian is changed slowly enough to a final Hamiltonian $H(T)$, then the system will remain in the ground state of the instantaneous Hamiltonian throughout the evolution. "Slowly enough" means the evolution time $T$ must be large compared to the inverse of the minimum energy gap squared between the ground state and the first excited state:

$$T \gg \frac{1}{\Delta_{\min}^2}$$

where $\Delta_{\min} = \min_{0 \leq t \leq T} \left( E_1(t) - E_0(t) \right)$ is the minimum spectral gap along the entire evolution path.

Adiabatic Quantum Computation

Adiabatic quantum computation (AQC) exploits this theorem directly. The idea: start with a simple Hamiltonian $H_B$ whose ground state is easy to prepare (the equal superposition $|s\rangle$ is the ground state of $H_B = -\sum_i X_i$), and slowly interpolate to the problem Hamiltonian $H_C$ whose ground state encodes the solution:

$$H(t) = \left(1 - \frac{t}{T}\right) H_B + \frac{t}{T} H_C$$

If the interpolation is slow enough, the adiabatic theorem guarantees that the system ends in the ground state of $H_C$ - the optimal solution. The required evolution time depends on the minimum gap, which is problem-dependent and can be exponentially small for hard instances.
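The interpolation above can be simulated directly for a tiny instance. The sketch below evolves a 2-qubit system along a linear schedule, exponentiating the instantaneous Hamiltonian exactly at each small time step; the single-edge $H_C$ is negated so that the max cut sits in the ground state, matching the minimization convention of AQC:

```python
import numpy as np

# Minimal adiabatic-evolution sketch: 2 qubits, one edge.
I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.diag([1., -1.])
H_B = -(np.kron(X, I2) + np.kron(I2, X))   # ground state: equal superposition
H_C = -0.5 * (np.eye(4) - np.kron(Z, Z))   # ground space: |01> and |10>

T, steps = 50.0, 5000                      # total anneal time, schedule steps
dt = T / steps
psi = np.ones(4, dtype=complex) / 2.0      # start in the ground state of H_B
for k in range(steps):
    s = (k + 0.5) / steps                  # linear schedule s = t/T
    w, v = np.linalg.eigh((1 - s) * H_B + s * H_C)
    psi = v @ (np.exp(-1j * w * dt) * (v.conj().T @ psi))  # exact step evolution

p_cut = abs(psi[1]) ** 2 + abs(psi[2]) ** 2  # probability of |01> or |10>
print(p_cut > 0.99)  # True: the slow ramp tracks the ground state
```

For this instance the minimum gap is of order 1, so $T = 50$ easily satisfies the adiabatic condition; shrinking $T$ by an order of magnitude visibly leaks probability into excited states, which is a worthwhile experiment.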

QAOA as Trotterized Adiabatic Evolution

Here is the key insight: QAOA can be viewed as a Trotterized (discretized) version of adiabatic evolution. In the Trotterization of the continuous-time evolution under $H(t)$, we break the evolution into $p$ discrete steps. At step $k$, the system evolves under $H_C$ for a short time $\gamma_k$ and then under $H_B$ for a short time $\beta_k$. This is exactly the QAOA circuit structure:

$$|\psi\rangle = \prod_{k=1}^{p} e^{-i \beta_k H_B} \, e^{-i \gamma_k H_C} \, |s\rangle$$

If we choose $\gamma_k$ and $\beta_k$ to mimic a linear adiabatic schedule - small at the beginning (mostly mixer) and large at the end (mostly problem) - the QAOA circuit approximates the continuous adiabatic evolution. As $p \to \infty$ with an appropriate choice of parameters, QAOA converges to exact adiabatic evolution and finds the optimal solution.

Key Concept.

QAOA at depth $p$ can be understood as a $p$-step Trotterization of adiabatic quantum computation. In the limit $p \to \infty$, QAOA recovers the exact adiabatic evolution and is guaranteed to find the ground state of $H_C$. At finite $p$, the variational freedom to optimize $\gamma$ and $\beta$ allows QAOA to outperform a naive Trotter discretization of the same depth.

Why Variational Freedom Helps

A crucial advantage of QAOA over a fixed Trotter schedule: the parameters $\gamma_k$ and $\beta_k$ are variationally optimized, not fixed by a predetermined schedule. This means QAOA can compensate for the discretization error inherent in using only $p$ layers. Research has shown that the "Trotter errors" in QAOA can actually be beneficial - they act like counterdiabatic driving terms that suppress transitions out of the ground state, allowing QAOA to achieve better performance than a naive discretization of adiabatic evolution at the same circuit depth.

This also explains a practical observation: optimal QAOA parameters often follow a smooth, monotonic pattern. The $\gamma$ values tend to increase across layers (the problem Hamiltonian gets stronger) while the $\beta$ values tend to decrease (the mixer gets weaker) - precisely mimicking an adiabatic ramp. This pattern has been exploited to develop parameter initialization strategies that reduce the classical optimization burden.
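This adiabatic-inspired pattern is easy to turn into an initialization heuristic. The sketch below generates a linear ramp; the function name and the overall scale dt are illustrative choices, not tuned values:

```python
import numpy as np

def linear_ramp_init(p, dt=0.75):
    """Adiabatic-inspired initial QAOA parameters for depth p:
    gamma ramps up across layers, beta ramps down. dt sets the scale."""
    k = np.arange(1, p + 1)
    gammas = (k - 0.5) / p * dt
    betas = (1 - (k - 0.5) / p) * dt
    return gammas, betas

g, b = linear_ramp_init(4)
print(np.all(np.diff(g) > 0) and np.all(np.diff(b) < 0))  # True
```

Handing these values to the classical optimizer as a starting point typically converges far faster than random initialization, precisely because the optimum tends to lie near such a ramp.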

The Gap Sets the Difficulty

The adiabatic perspective reveals why some optimization problems are harder than others for QAOA. The minimum spectral gap $\Delta_{\min}$ along the interpolation path determines how many QAOA layers are needed. Problems with exponentially small gaps require exponentially many layers - and thus exponentially long circuits - to solve exactly. This is consistent with computational complexity theory: if QAOA could efficiently solve NP-hard problems with polynomial-depth circuits, it would imply P = NP.

Common Misconception.

QAOA does not solve NP-hard problems efficiently. For worst-case instances of NP-hard problems, the required circuit depth $p$ can grow exponentially with problem size. What QAOA offers is a structured, quantum-enhanced heuristic that may find good approximate solutions faster than classical heuristics for certain problem classes - but rigorous quantum advantage for QAOA on practically relevant problems remains an active area of research.

23.4 Solving MaxCut with QAOA on a Simulator

Let us now put QAOA into practice. We will solve MaxCut on a small graph, exploring how the parameters $\gamma$ and $\beta$ affect the quality of the solution. The simulation below implements a depth-1 QAOA circuit for MaxCut on a 4-vertex path graph with edges $(0,1)$, $(1,2)$, and $(2,3)$. The optimal MaxCut for this graph cuts all 3 edges, achieved by the bitstrings 0101 and 1010.

Interactive Parameter Exploration

Use the sliders below to adjust the QAOA parameters $\gamma$ and $\beta$. Watch how the measurement distribution changes as you sweep through different parameter values. The goal is to find parameters that concentrate probability on the optimal bitstrings.

As you explore the parameter landscape, notice several features. When $\gamma = 0$, the problem unitary is the identity and you get a uniform distribution over all 16 bitstrings. When $\beta = 0$, the mixer is the identity and the problem unitary only adds phases, which on their own never affect computational-basis measurement probabilities. The interesting behavior emerges when both parameters are nonzero. Try values near $\gamma \approx 0.6$ and $\beta \approx 1.0$ - you should see the optimal bitstrings 0101 and 1010 begin to dominate the distribution.

The Optimization Landscape

The function $\langle C \rangle(\gamma, \beta)$ defines a two-dimensional landscape for depth-1 QAOA. This landscape has the following properties:

  • It is periodic in both $\gamma$ (period $2\pi$) and $\beta$ (period $\pi$), because the $R_{ZZ}$ and $R_X$ gates are periodic in their angles.
  • It can have multiple local maxima, making the classical optimization non-trivial. However, for small graphs, the landscape is smooth and the global optimum is easy to find.
  • For depth-1 QAOA on MaxCut, analytical formulas exist for the expected cost as a function of $\gamma$ and $\beta$, enabling exact optimization without sampling noise.

Exercise: Build a QAOA Circuit for a Square Graph

Now it is your turn. Construct a depth-1 QAOA circuit for MaxCut on a 4-vertex cycle (square) graph with edges $(0,1)$, $(1,2)$, $(2,3)$, and $(3,0)$. The optimal MaxCut for a square cuts all 4 edges, achieved by the bitstrings 0101 and 1010. Use $\gamma = \pi/4$ and $\beta = \pi/4$ as starting parameters.

Scaling Up: Deeper Circuits

Depth-1 QAOA provides a limited approximation. For 3-regular graphs (graphs in which every vertex has exactly three neighbors), Farhi et al. proved that depth-1 QAOA achieves a worst-case approximation ratio (ratio of expected cut to optimal cut) of at least 0.6924 - provably above the trivial 0.5 ratio of random guessing, but below the 0.878 ratio guaranteed on general graphs by the best known classical algorithm (the Goemans-Williamson semidefinite programming relaxation).

Increasing $p$ improves the approximation ratio. At depth $p = 2$, the QAOA state depends on 4 parameters ($\gamma_1, \gamma_2, \beta_1, \beta_2$), and the circuit applies two rounds of problem and mixing unitaries. Each additional layer adds more parameters and more expressive power, but also increases the circuit depth and the difficulty of classical parameter optimization. In practice, QAOA circuits with $p = 5$ to $p = 20$ are explored on near-term devices, with parameter initialization strategies (such as linear ramp schedules inspired by the adiabatic connection) helping the classical optimizer converge.
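The claim that greater depth can only help is easy to check in simulation: a depth-2 circuit with the second layer's angles set to zero reproduces the depth-1 state exactly, so the depth-2 optimum can never be worse. A self-contained sketch on the square graph from the exercise above (the compact simulator mirrors the earlier one):

```python
import numpy as np

def qaoa_expectation(params, edges, n):
    """Compact depth-p QAOA statevector simulator for MaxCut (sketch).
    params = (gamma_1..gamma_p, beta_1..beta_p)."""
    p = len(params) // 2
    cut = np.array([sum(((k >> i) ^ (k >> j)) & 1 for i, j in edges)
                    for k in range(2 ** n)], dtype=float)
    state = np.ones(2 ** n, dtype=complex) / np.sqrt(2 ** n)
    for g, b in zip(params[:p], params[p:]):
        state = np.exp(-1j * g * cut) * state            # problem unitary
        c, s = np.cos(b), -1j * np.sin(b)
        rx = np.array([[c, s], [s, c]])
        psi = state.reshape([2] * n)
        for q in range(n):                               # mixer
            psi = np.moveaxis(np.tensordot(rx, psi, axes=([1], [q])), 0, q)
        state = psi.reshape(-1)
    return float(cut @ (np.abs(state) ** 2))

# Square graph: zeroing out layer 2 makes p = 2 reproduce p = 1 exactly.
edges, n = [(0, 1), (1, 2), (2, 3), (3, 0)], 4
g1, b1 = 0.7, 0.9                          # arbitrary illustrative angles
p1 = qaoa_expectation([g1, b1], edges, n)
p2 = qaoa_expectation([g1, 0.0, b1, 0.0], edges, n)
print(np.isclose(p1, p2))  # True
```

This nesting of parameter spaces is why the optimal $\langle C \rangle$ is monotone non-decreasing in $p$; whether the extra layers are worth their hardware noise cost is a separate, empirical question.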

Note.

The approximation ratio of depth-1 QAOA for MaxCut on 3-regular graphs is exactly $0.6924$, proven by Farhi, Goldstone, and Gutmann in their original 2014 paper. Whether QAOA at any fixed depth $p$ can beat the Goemans-Williamson ratio of $0.878$ remains an important open question. Some results suggest depth-$p$ QAOA on bounded-degree graphs cannot surpass classical algorithms for fixed $p$, but the performance at growing $p$ (as a function of problem size) may tell a different story.

23.5 Quantum Annealing: D-Wave's Alternative

While QAOA runs on gate-based quantum computers, there is an entirely different approach to quantum optimization: quantum annealing. The Canadian company D-Wave Systems has built a series of special-purpose quantum processors based on this principle, with their latest systems containing over 5,000 qubits - far more than any gate-based quantum computer, though with a much more restricted set of operations.

How Quantum Annealing Works

Quantum annealing is a physical implementation of adiabatic quantum computation (the same principle that inspired QAOA). The processor implements a time-dependent Hamiltonian:

$$H(t) = -A(t) \sum_i X_i + B(t) \left[ \sum_i h_i Z_i + \sum_{i<j} J_{ij} Z_i Z_j \right]$$

At the start of the anneal ($t = 0$), $A(0)$ is large and $B(0) \approx 0$, so the system is dominated by the transverse-field term and sits in its ground state, the equal superposition of all bitstrings. Over a time period (typically microseconds), $A(t)$ decreases to zero while $B(t)$ increases, gradually turning on the problem Hamiltonian defined by the biases $h_i$ and couplings $J_{ij}$. If the anneal is slow enough relative to the spectral gap, the system follows the ground state and ends in the solution to the optimization problem.

The Ising Model and QUBO

D-Wave processors natively solve problems expressed as an Ising model (with spin variables $s_i \in \{-1, +1\}$) or equivalently as a Quadratic Unconstrained Binary Optimization (QUBO) problem (with binary variables $x_i \in \{0, 1\}$). The user specifies the coefficients $h_i$ (local biases on each qubit) and $J_{ij}$ (coupling strengths between pairs of qubits), and the annealer searches for the spin configuration that minimizes the total energy.

Many practical optimization problems - MaxCut, graph coloring, scheduling, portfolio optimization, logistics routing - can be formulated as QUBO problems. The translation from a high-level problem to QUBO coefficients is a well-studied field with available software libraries.
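The QUBO-to-Ising translation itself is a mechanical change of variables, $x_i = (1 - s_i)/2$. Libraries in D-Wave's Ocean SDK perform it automatically; the from-scratch sketch below (the helper name and dictionary format are illustrative choices) shows the bookkeeping:

```python
def qubo_to_ising(Q):
    """Q: dict {(i, j): coeff} with i <= j (i == j marks a linear term).
    Substituting x_i = (1 - s_i)/2 gives (h, J, offset) such that
    E_qubo(x) = sum_i h_i s_i + sum_{i<j} J_ij s_i s_j + offset."""
    h, J, offset = {}, {}, 0.0
    for (i, j), q in Q.items():
        if i == j:                       # q*x_i = q*(1 - s_i)/2
            h[i] = h.get(i, 0.0) - q / 2
            offset += q / 2
        else:                            # q*x_i*x_j = q*(1-s_i)*(1-s_j)/4
            J[(i, j)] = J.get((i, j), 0.0) + q / 4
            h[i] = h.get(i, 0.0) - q / 4
            h[j] = h.get(j, 0.0) - q / 4
            offset += q / 4
    return h, J, offset

# MaxCut on a single edge, written as a QUBO to *minimize*: -(x0 + x1 - 2*x0*x1)
h, J, off = qubo_to_ising({(0, 0): -1.0, (1, 1): -1.0, (0, 1): 2.0})
print(h, J, off)  # {0: 0.0, 1: 0.0} {(0, 1): 0.5} -0.5
```

The result is the single-edge antiferromagnetic Ising term $0.5\, s_0 s_1 - 0.5$: minimized when the spins disagree, i.e. when the edge is cut, exactly as intended.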

Hardware Topology: Chimera and Pegasus

A critical constraint of D-Wave processors is that qubits are not fully connected. Each qubit couples to only a limited number of other qubits, determined by the hardware topology:

  • Chimera topology (used in earlier D-Wave systems): a grid of unit cells, each containing 8 qubits arranged as a $K_{4,4}$ bipartite graph. Each qubit connects to at most 6 others. This sparse connectivity means that fully connected problems must be "embedded" into the hardware graph using chains of physical qubits to represent a single logical qubit.
  • Pegasus topology (used in D-Wave Advantage): a denser connectivity graph where each qubit connects to 15 others. This significantly reduces the embedding overhead, allowing larger problems to be solved with fewer physical qubits.

The embedding overhead is a practical bottleneck: a fully connected $n$-variable problem may require $O(n^2)$ physical qubits on a Chimera graph. The Pegasus topology improves this significantly but does not eliminate it. This is fundamentally different from gate-based quantum computers, where any two qubits can interact (though with SWAP gate overhead on hardware with limited connectivity).

QAOA vs. Quantum Annealing

Feature            | QAOA (Gate-Based)                | Quantum Annealing (D-Wave)
-------------------|----------------------------------|---------------------------------------
Hardware           | Universal gate-based QPU         | Special-purpose annealing QPU
Qubit count        | ~100-1000 (current)              | ~5000+ (current)
Flexibility        | Any quantum algorithm            | Optimization problems only
Error correction   | Compatible with QEC              | No standard QEC; uses thermal sampling
Parameters         | Variational ($\gamma, \beta$)    | Anneal schedule, chain strengths
Connectivity       | All-to-all (with SWAP overhead)  | Limited by hardware topology
Theoretical basis  | Circuit model + adiabatic limit  | Adiabatic theorem directly

Key Concept.

Quantum annealing and QAOA both exploit the connection between optimization and finding ground states of quantum Hamiltonians. Quantum annealing uses continuous-time adiabatic evolution on special-purpose hardware. QAOA discretizes this evolution into gate-based circuits that run on universal quantum computers. QAOA's variational parameters give it flexibility beyond a fixed annealing schedule, while D-Wave's higher qubit count allows it to tackle larger problem instances today.

The Quantum Advantage Debate

Whether quantum annealing provides a genuine speedup over classical optimization algorithms remains hotly debated. D-Wave systems have been benchmarked extensively against classical solvers (simulated annealing, parallel tempering, semidefinite programming), and the results are mixed. For many problem types and sizes tested, state-of-the-art classical solvers match or outperform the quantum annealer. However, D-Wave has demonstrated competitive performance on certain structured problems, and the technology continues to improve.

The same uncertainty applies to QAOA on gate-based hardware. Demonstrating unambiguous quantum advantage for optimization - not just for contrived, worst-case problems, but for practical instances at scale - remains one of the grand challenges of quantum computing research.

MaxCut Graph Editor

Click to add vertices, then click two vertices to add an edge. The widget evaluates all possible partitions and highlights the best cut. Vertices are colored by partition.

Interactive: QAOA Circuit Step-Through

Walk through a depth-1 QAOA circuit gate by gate for a triangle graph. The Hadamards create the initial superposition; the CNOTs stand in for the ZZ interactions of the problem unitary (a full implementation would use a CNOT-$R_Z$-CNOT sandwich per edge); and the final Hadamards play the role of the mixer, standing in for the $R_X$ rotations.

OPENQASM 2.0;
include "qelib1.inc";
qreg q[3];
h q[0];
h q[1];
h q[2];
cx q[0], q[1];
cx q[1], q[2];
cx q[0], q[2];
h q[0];
h q[1];
h q[2];

Interactive: QAOA Energy Landscape

Explore the 2D QAOA energy landscape. Adjust both $\gamma$ and $\beta$ to see how the measurement distribution changes for a depth-1 QAOA on the 4-vertex path graph. The optimal bitstrings are 0101 and 1010.

OPENQASM 2.0;
include "qelib1.inc";
qreg q[4];
creg c[4];
h q[0];
h q[1];
h q[2];
h q[3];
rzz({gamma}) q[0], q[1];
rzz({gamma}) q[1], q[2];
rzz({gamma}) q[2], q[3];
rx(2*{beta}) q[0];
rx(2*{beta}) q[1];
rx(2*{beta}) q[2];
rx(2*{beta}) q[3];
measure q[0] -> c[0];
measure q[1] -> c[1];
measure q[2] -> c[2];
measure q[3] -> c[3];

Interactive: QAOA MaxCut Solver

Run the QAOA algorithm to find approximate solutions to the MaxCut problem. Select the number of QAOA layers $p$ and a graph topology. The solver optimizes the parameters and returns the best bitstring found.

QAOA Solver