Chapter 22: Variational Quantum Eigensolver (VQE)

In Chapter 21 we built the variational quantum computing framework: parameterized circuits, classical optimizers, and the measurement machinery that connects them. Now we put that framework to work on one of the most important problems in science - finding the ground-state energy of a quantum system. The Variational Quantum Eigensolver (VQE), proposed by Peruzzo et al. in 2014, was the first variational quantum algorithm to be experimentally demonstrated, and it remains the most studied application of near-term quantum hardware. In this chapter, we will see how molecules are mapped onto qubits, run VQE on simple molecules, explore an adaptive variant that builds its own ansatz, and confront the fundamental limitations that stand between today's experiments and practical quantum advantage.

22.1 The Ground State Problem in Chemistry and Physics

At the heart of quantum chemistry lies a deceptively simple question: given the Hamiltonian $H$ of a molecule (which encodes all the forces between its electrons and nuclei), what is its lowest-energy eigenvalue $E_0$ and the corresponding eigenstate $|\psi_0\rangle$? This is the ground-state problem, and solving it would unlock the ability to predict chemical reaction rates, design new materials, discover drugs, and understand catalysis at a fundamental level.

Why the Problem Is Hard

The electronic Hamiltonian of a molecule with $N$ electrons occupying $M$ spin-orbitals takes the form (in second quantization):

$$H = \sum_{p,q} h_{pq}\, a_p^\dagger a_q + \frac{1}{2}\sum_{p,q,r,s} h_{pqrs}\, a_p^\dagger a_q^\dagger a_r a_s$$

where $a_p^\dagger$ and $a_p$ are fermionic creation and annihilation operators, and the coefficients $h_{pq}$ (one-electron integrals) and $h_{pqrs}$ (two-electron integrals) encode the kinetic energy, electron-nuclear attraction, and electron-electron repulsion. The difficulty is that the Hilbert space grows exponentially: $M$ spin-orbitals give rise to a Hilbert space of dimension $\binom{M}{N}$, which scales roughly as $M^N/N!$. For a modestly-sized molecule with 50 orbitals and 25 electrons, the Hilbert space has dimension $\binom{50}{25} \approx 1.26 \times 10^{14}$ - far too large to diagonalize exactly on any classical computer.

Classical quantum chemistry has developed powerful approximation methods - Hartree-Fock, density functional theory (DFT), configuration interaction (CI), coupled cluster - but all involve trade-offs between accuracy and computational cost. Full Configuration Interaction (FCI), which gives the exact answer within a chosen basis set, scales exponentially and is limited to small molecules. Coupled cluster with singles, doubles, and perturbative triples (CCSD(T)) - the "gold standard" of quantum chemistry - scales as $O(M^7)$ and fails for strongly correlated systems where the Hartree-Fock reference state is a poor starting point.

Key Concept.

The ground-state problem asks for the lowest eigenvalue $E_0$ of a molecular Hamiltonian $H$. Classical methods face an exponential wall: exactly representing the quantum state of $N$ electrons in $M$ orbitals requires storing $\binom{M}{N}$ amplitudes. Quantum computers can represent such states natively using $M$ qubits, suggesting an exponential advantage in memory. VQE exploits this advantage while keeping circuit depths manageable for near-term hardware.

The Variational Principle

VQE exploits the variational principle of quantum mechanics: for any trial state $|\psi\rangle$, the expectation value of the Hamiltonian provides an upper bound on the true ground-state energy:

$$\langle \psi | H | \psi \rangle \geq E_0$$

with equality if and only if $|\psi\rangle = |\psi_0\rangle$ (the true ground state). By parameterizing the trial state as $|\psi(\vec{\theta})\rangle$ and minimizing $\langle \psi(\vec{\theta})| H |\psi(\vec{\theta})\rangle$ over all parameters $\vec{\theta}$, VQE systematically approaches $E_0$ from above. The minimum value found is guaranteed to be an upper bound, providing a rigorous certificate on the quality of the approximation.

Chemical Accuracy

In quantum chemistry, the standard benchmark for a useful calculation is chemical accuracy: an error of no more than $1.6 \times 10^{-3}$ Hartree (approximately 1 kcal/mol or 4.2 kJ/mol). This is the threshold below which computed energies can reliably predict which of two chemical reactions is more favorable, or whether a particular molecular geometry is stable. For VQE to be practically useful, it must achieve chemical accuracy on problems that classical methods struggle with.

For small molecules, classical methods easily achieve chemical accuracy. The value of VQE lies in strongly correlated systems - molecules where the electrons interact in complex, many-body ways that cannot be captured by a single reference determinant. Examples include transition metal complexes, molecules with stretched or breaking bonds, and certain excited states. These are precisely the systems where Hartree-Fock and standard coupled cluster methods fail, and where the exponential memory advantage of a quantum computer could make a real difference.

Historical Note.

The Born-Oppenheimer approximation, which separates nuclear and electronic motion, is assumed throughout VQE. The nuclei are treated as fixed classical particles (defining the molecular geometry), and VQE computes the electronic energy for each geometry. By repeating VQE at many different geometries, one can map out the potential energy surface of the molecule - the energy as a function of nuclear positions - which determines reaction pathways, equilibrium structures, and vibrational frequencies.

22.2 Mapping Molecules to Qubits

Before running VQE on a quantum computer, we must translate the fermionic Hamiltonian (expressed in terms of creation and annihilation operators $a_p^\dagger$, $a_p$) into a qubit Hamiltonian (expressed in terms of Pauli operators $X$, $Y$, $Z$, $I$). This translation is non-trivial because electrons are fermions: swapping two electrons introduces a minus sign in the wavefunction. Qubits, by contrast, are distinguishable - swapping two qubits does not inherently introduce any sign. The mapping must encode the fermionic antisymmetry into the qubit representation.

The Jordan-Wigner Transformation

The Jordan-Wigner (JW) transformation, originally developed in 1928 for a different purpose, provides the most straightforward mapping. The idea is simple: each spin-orbital is assigned to one qubit. Qubit $p$ being in state $|1\rangle$ means orbital $p$ is occupied; $|0\rangle$ means it is empty. The creation and annihilation operators become:

$$a_p^\dagger \mapsto \frac{1}{2}(X_p - iY_p) \otimes Z_{p-1} \otimes Z_{p-2} \otimes \cdots \otimes Z_0$$ $$a_p \mapsto \frac{1}{2}(X_p + iY_p) \otimes Z_{p-1} \otimes Z_{p-2} \otimes \cdots \otimes Z_0$$

The string of $Z$ operators on all qubits with index less than $p$ is called the Jordan-Wigner string. It encodes the fermionic sign: each $Z$ gate contributes $+1$ if the corresponding orbital is empty and $-1$ if it is occupied, effectively counting the parity of occupied orbitals below orbital $p$.

The JW transformation is conceptually simple and preserves the occupation number representation directly - you can read off which orbitals are occupied by looking at which qubits are $|1\rangle$. However, the $Z$-strings make it non-local: a single fermionic operator acting on orbital $p$ maps to a qubit operator that acts on all qubits $0$ through $p$. For the last orbital in a system with $M$ orbitals, this means a Pauli string of length $M$. This non-locality increases the circuit depth required to implement the corresponding gates.

Key Concept.

The Jordan-Wigner transformation maps each spin-orbital to one qubit: $|1\rangle$ = occupied, $|0\rangle$ = empty. Fermionic antisymmetry is encoded by strings of $Z$ operators on all lower-indexed qubits. The mapping is exact but non-local - a single fermionic operator on orbital $p$ becomes a Pauli string of weight up to $O(p)$.

The Bravyi-Kitaev Transformation

The Bravyi-Kitaev (BK) transformation, developed in 2002, offers a more balanced encoding. Instead of storing occupation numbers directly (as JW does), BK stores a mixture of occupation and parity information in each qubit. The encoding is based on a binary tree structure: some qubits store the occupation of individual orbitals, while others store the parity (even or odd number of electrons) of groups of orbitals.

The key advantage: whereas the Jordan-Wigner transformation produces Pauli strings of weight up to $O(M)$ (where $M$ is the number of orbitals), the Bravyi-Kitaev transformation produces strings of weight at most $O(\log M)$. This logarithmic scaling means that for large molecules, BK-transformed Hamiltonians have shorter Pauli strings, which translates to shallower circuits when implementing VQE.

The trade-off is complexity: the BK encoding is harder to interpret physically (you cannot simply read off orbital occupations from qubit states), and the transformation itself is more involved. For small molecules (fewer than about 20 qubits), the difference in circuit depth between JW and BK is modest. For larger systems, the BK advantage becomes significant.

Comparing the Transformations

Property Jordan-Wigner Bravyi-Kitaev
Qubit meaning Occupation of orbital $p$ Mixture of occupation and parity
Pauli string weight $O(M)$ worst case $O(\log M)$ worst case
Number of qubits $M$ (one per spin-orbital) $M$ (same)
Physical interpretability Direct (read off occupations) Indirect (requires decoding)
Best for Small systems, pedagogy Large systems ($M > 20$)

A third option, the parity transformation, stores the parity of all orbitals up to and including orbital $p$ in qubit $p$. It has similar Pauli weight to Jordan-Wigner but enables a useful qubit reduction: the total parity of all electrons is a conserved quantity, so the last qubit can be removed, saving one qubit. For H$_2$ with four spin-orbitals, this reduces the problem from four qubits to two.

From Fermionic to Qubit Hamiltonian: H$_2$ Example

Let us trace the mapping concretely for molecular hydrogen (H$_2$) in the minimal STO-3G basis set. The molecule has 2 electrons in 4 spin-orbitals (2 spatial orbitals, each with spin-up and spin-down). After applying the Jordan-Wigner transformation and using symmetry reductions (freezing the core orbital and exploiting parity conservation), the Hamiltonian reduces to an effective two-qubit operator:

$$H_{\text{H}_2} = c_0\, I \otimes I + c_1\, Z \otimes I + c_2\, I \otimes Z + c_3\, Z \otimes Z + c_4\, X \otimes X + c_5\, Y \otimes Y$$

At the equilibrium bond length of $0.735$ angstroms and using a minimal STO-3G basis with parity mapping and two-qubit tapering, the coefficients are approximately $c_0 \approx -1.05$, $c_1 \approx 0.39$, $c_2 \approx -0.39$, $c_3 \approx -0.01$, $c_4 \approx 0.18$, and $c_5 \approx 0.18$ (all in Hartree). This six-term Hamiltonian can be measured with just three measurement settings (grouping commuting terms), making H$_2$ an ideal test case for VQE.

Historical Note.

The first experimental VQE computation was performed by Peruzzo et al. in 2014 on a photonic quantum processor, computing the ground-state energy of helium hydride (HeH$^+$). In 2017, Kandala et al. at IBM demonstrated VQE on a superconducting processor for H$_2$, LiH, and BeH$_2$ using a hardware-efficient ansatz with up to six qubits. These experiments established VQE as the leading near-term quantum chemistry algorithm.

22.3 VQE in Practice: Simulating H$_2$ and LiH

Let us now run VQE in practice. We start with the simplest possible case: a two-qubit parameterized circuit applied to a two-qubit Hamiltonian representing H$_2$. The ansatz is a single layer of $R_y$ rotations followed by a CNOT gate and another rotation - the same circuit structure we explored in Chapter 21.

H$_2$ Energy Landscape

The simulation below lets you manually explore the VQE energy landscape for H$_2$. The circuit prepares a parameterized two-qubit state, and the output distribution corresponds to the trial state $|\psi(\vec{\theta})\rangle$. In a full VQE implementation, the classical optimizer would measure $\langle H_{\text{H}_2} \rangle$ in this state and adjust the parameters to minimize the energy. Here, you can play the role of the optimizer: sweep the angles and look for the parameter values that concentrate the probability on the states corresponding to lowest energy.

Try setting $\theta_0 \approx \pi$ (3.14) and $\theta_1 \approx \pi$ (3.14) with $\theta_2 = 0$. Before the CNOT, both qubits are in $|1\rangle$; the CNOT flips the target, producing $|10\rangle$. Now slowly adjust $\theta_2$ - you should see probability flow between the $|01\rangle$ and $|10\rangle$ states, which correspond to the bonding and antibonding configurations of H$_2$. The ground state of H$_2$ is dominated by the $|01\rangle$ and $|10\rangle$ components (representing the two electrons occupying the bonding orbital), with the exact mixture determined by electron correlation.

The exact ground-state energy of H$_2$ at equilibrium bond length in the STO-3G basis is approximately $-1.137$ Hartree. A well-optimized VQE with even this simple two-qubit ansatz can achieve chemical accuracy (error less than $1.6 \times 10^{-3}$ Hartree, or about 1 kcal/mol), which is the threshold at which computed energies are useful for predicting chemical properties.

VQE Sandbox

The sandbox below provides a starting circuit for a two-qubit VQE experiment. Your task: modify the rotation angles to prepare a state that approximates the H$_2$ ground state. The ground state has significant probability on both $|01\rangle$ and $|10\rangle$, with small contributions from $|00\rangle$ and $|11\rangle$. Try to find angles that produce a distribution close to this pattern.

In the starting code, the x q[1]; gate prepares the $|01\rangle$ state (one electron in the bonding orbital), and the $R_y(0.22)$ rotation on qubit 0 followed by a CNOT introduces a small admixture of the $|10\rangle$ configuration. This mimics the effect of electron correlation in H$_2$. Experiment with different angles - the approximate ground state should show roughly 98-99% probability on $|01\rangle$ and $|10\rangle$ combined, with the dominant component on $|01\rangle$.

Beyond H$_2$: Lithium Hydride

Lithium hydride (LiH) is the next step up in complexity. In a minimal basis, LiH requires 12 spin-orbitals. After symmetry reductions and orbital freezing (removing the core $1s$ orbital of lithium, which does not participate meaningfully in bonding), the problem can be reduced to approximately 4-6 qubits, depending on the encoding.

The LiH Hamiltonian in the Jordan-Wigner transformation contains roughly 100 Pauli terms (compared to 6 for H$_2$), requiring more measurement settings and more shots per VQE iteration. The ansatz also needs more parameters and layers to capture the stronger electron correlation effects in LiH. IBM's 2017 experiment used a hardware-efficient ansatz with 6 qubits and achieved results within chemical accuracy across a range of bond lengths, mapping out the potential energy surface of the molecule.

The ground-state energy of LiH at its equilibrium bond length (approximately 1.595 angstroms) in the STO-3G basis is about $-7.882$ Hartree. Computing this value with VQE requires carefully balancing the depth of the ansatz (more layers improve accuracy but introduce more noise on real hardware) against the shot budget and optimization convergence.

The Potential Energy Surface

One of VQE's most important applications is computing the potential energy surface (PES) - the ground-state energy as a function of nuclear geometry. For H$_2$, the PES is a curve: the energy as a function of the bond length $R$ between the two hydrogen atoms. At short distances ($R < 0.5$ angstroms), the nuclei repel strongly and the energy is high. At the equilibrium distance ($R \approx 0.735$ angstroms), the energy reaches its minimum of approximately $-1.137$ Hartree. At large distances ($R > 3$ angstroms), the molecule dissociates into two separate hydrogen atoms.

The dissociation regime is particularly interesting because it is where classical methods struggle. At large bond lengths, the H$_2$ wavefunction becomes strongly correlated - the two electrons are far apart and their quantum superposition cannot be described by a single Slater determinant. Hartree-Fock fails dramatically in this regime, predicting the wrong dissociation limit. Restricted coupled cluster also struggles. VQE with a sufficiently expressive ansatz handles this regime naturally, because the quantum computer represents the correlated wavefunction directly.

Common Misconception.

It is sometimes claimed that VQE can only handle trivially small molecules. While current quantum hardware limits practical VQE to small systems, the algorithm itself scales polynomially with system size (the number of Pauli terms grows as $O(M^4)$). The real bottleneck is not the algorithm but the hardware: circuit depth, gate fidelity, and qubit count. As hardware improves, VQE will naturally scale to larger molecules.

Exercise: Build a VQE Trial State

Interactive: Expectation Value of a Variational Circuit

The core of VQE is computing the expectation value $\langle H \rangle = \langle \psi(\theta) | H | \psi(\theta) \rangle$ for a parameterized circuit. Adjust the rotation angle below to see how the expectation value of the $ZZ$ observable changes for a simple 2-qubit ansatz.

Pauli Decomposition of H$_2$ Hamiltonian

The H$_2$ Hamiltonian is a weighted sum of Pauli terms. Click any bar to see the measurement circuit needed to evaluate that term. The magnitude of each coefficient determines its contribution to the total energy.

22.4 ADAPT-VQE: Letting the Algorithm Choose Its Ansatz

A fundamental tension in VQE is the choice of ansatz. A hardware-efficient ansatz is easy to implement but may not be expressive enough (or may be too expressive, leading to barren plateaus). A UCCSD ansatz is physically motivated but can require very deep circuits. Is there a way to let the algorithm itself discover the best ansatz for a given problem?

The ADAPT-VQE algorithm (Adaptive Derivative-Assembled Pseudo-Trotter ansatz VQE), proposed by Grimsley et al. in 2019, answers this question. Instead of fixing the circuit structure before optimization, ADAPT-VQE grows the ansatz one operator at a time, choosing at each step the operator that most reduces the energy.

The ADAPT-VQE Protocol

  1. Define an operator pool. Start with a set of candidate operators $\{A_k\}$ - typically single and double excitation operators from the UCCSD framework, or even individual Pauli strings. Each operator $A_k$ generates a parameterized unitary $e^{i\theta_k A_k}$.
  2. Initialize. Begin with a reference state (usually the Hartree-Fock state) and an empty ansatz.
  3. Compute gradients. For each operator $A_k$ in the pool, compute the energy gradient $\frac{\partial E}{\partial \theta_k}\big|_{\theta_k=0}$ that would result from appending $e^{i\theta_k A_k}$ to the current ansatz. This gradient measures how much operator $A_k$ would reduce the energy if added.
  4. Select and append. Choose the operator with the largest gradient magnitude and append it to the ansatz with a new variational parameter.
  5. Optimize. Re-optimize all parameters in the (now longer) ansatz.
  6. Check convergence. If the norm of the gradient vector $||\vec{g}|| = \sqrt{\sum_k |\partial E/\partial \theta_k|^2}$ is below a threshold, stop. Otherwise, return to step 3.
Key Concept.

ADAPT-VQE builds the ansatz iteratively: at each step, it selects the operator from a predefined pool that has the largest energy gradient and appends it to the circuit. This produces a compact, problem-tailored ansatz that typically requires fewer parameters and shallower circuits than a fixed UCCSD ansatz, while achieving comparable or better accuracy.

Advantages of ADAPT-VQE

ADAPT-VQE has several compelling properties:

  • Compact ansatze. By adding only the most relevant operators, ADAPT-VQE typically produces ansatze with far fewer parameters than full UCCSD. For H$_2$, a single operator suffices. For LiH, ADAPT-VQE may need 10-20 operators compared to the hundreds in full UCCSD.
  • Systematic convergence. Under mild conditions, ADAPT-VQE converges to the exact ground state as the ansatz grows. The algorithm is provably exact for a complete operator pool.
  • Barren plateau resistance. Because each new operator is chosen based on a non-zero gradient, the algorithm avoids regions of the parameter landscape where gradients vanish. It starts from a structured state (Hartree-Fock) and makes controlled perturbations, rather than exploring an unstructured landscape.

Qubit-ADAPT-VQE

A variant called qubit-ADAPT-VQE (Tang et al., 2021) uses individual Pauli strings as the operator pool instead of fermionic excitation operators. This makes each appended operator simpler to implement (typically a single CNOT ladder plus rotations), further reducing circuit depth at the cost of potentially requiring more operators to converge. Qubit-ADAPT-VQE is particularly attractive for near-term hardware where every CNOT gate introduces significant noise.

ADAPT-VQE for H$_2$: A Walkthrough

For H$_2$ in the minimal basis, the ADAPT-VQE algorithm proceeds as follows. The reference state is the Hartree-Fock state $|0101\rangle$ (in the four-qubit Jordan-Wigner representation, the two occupied spin-orbitals are orbitals 0 and 2). The operator pool contains single and double excitation operators. In the first iteration, the algorithm evaluates the gradient of each operator and finds that the double excitation $a_1^\dagger a_3^\dagger a_2 a_0 - \text{h.c.}$ (promoting both electrons from the bonding to the antibonding orbital) has the largest gradient. This single operator is appended to the ansatz, and its parameter is optimized.

Remarkably, for H$_2$, this one operator is sufficient to converge to chemical accuracy. The resulting ansatz has just one parameter and requires a circuit of modest depth. A fixed UCCSD ansatz would include the same operator, but also all single excitation operators (which contribute negligibly for H$_2$), resulting in unnecessary parameters and circuit depth. ADAPT-VQE discovers automatically that only the double excitation matters.

Practical Note.

The gradient evaluation in step 3 of ADAPT-VQE requires measuring $\langle [H, A_k] \rangle$ for each operator in the pool. If the pool has $K$ operators, this adds $K$ measurements per ADAPT iteration - a significant overhead. Efficient strategies for gradient screening (quickly identifying which operators have negligible gradients) are an active area of research.

22.5 Limitations and the Barren Plateau Problem

Variational quantum algorithms are the most practical path to useful near-term quantum computation, but they face serious challenges. Understanding these limitations is essential for assessing when - and whether - VQE will deliver a genuine quantum advantage.

The Barren Plateau Problem

In 2018, McClean et al. published a seminal paper demonstrating that random parameterized quantum circuits suffer from barren plateaus: regions of the parameter landscape where the gradient of the cost function vanishes exponentially with the number of qubits. Formally, for a circuit drawn from a unitary 2-design (a distribution of unitaries that reproduces certain statistical properties of the full unitary group):

$$\text{Var}\left[\frac{\partial C}{\partial \theta_k}\right] \leq F(n)$$

where $F(n)$ decreases exponentially in $n$, the number of qubits. This means that for a random circuit on $n$ qubits, the typical gradient magnitude is exponentially small - of order $2^{-n}$. An optimizer relying on gradients cannot distinguish the direction of steepest descent from statistical noise, even with an exponential number of measurement shots.

Key Concept.

Barren plateaus occur when the variance of cost function gradients vanishes exponentially with qubit count: $\text{Var}[\partial C/\partial \theta_k] \sim O(2^{-n})$. On a barren plateau, the cost function landscape is exponentially flat, and no gradient-based optimizer can find the minimum without exponentially many function evaluations. This was proven by McClean et al. (2018) for sufficiently expressive random circuits.

When Do Barren Plateaus Arise?

Subsequent research has identified several conditions that trigger barren plateaus:

  • Deep, unstructured circuits. Hardware-efficient ansatze with many layers approach a unitary 2-design, producing barren plateaus. The critical depth scales linearly with the number of qubits.
  • Global cost functions. Cost functions that depend on measurements of all qubits simultaneously (like the full energy $\langle H \rangle$) are more susceptible to barren plateaus than local cost functions that depend on only a few qubits. Cerezo et al. (2021) showed that the locality of the cost function determines the onset of barren plateaus in shallow circuits.
  • Excessive entanglement. Circuits that generate too much entanglement too quickly tend to produce states that are locally indistinguishable from random states, leading to vanishing gradients.
  • Noise. Wang et al. (2021) showed that hardware noise can itself induce barren plateaus, even in circuits that would be trainable in the noiseless case. As noise accumulates, the quantum state approaches the maximally mixed state, which has zero gradient in every direction.

A Concrete Example

To illustrate the severity of barren plateaus, consider a hardware-efficient ansatz with $n$ qubits, $L$ layers, and $3nL$ parameters (three rotation angles per qubit per layer). McClean et al. showed that for $L \geq O(n)$ layers, the variance of any partial derivative is bounded by:

$$\text{Var}\left[\frac{\partial C}{\partial \theta_k}\right] \leq \frac{1}{2^n - 1}$$

For $n = 20$ qubits, this gives a variance bound of roughly $10^{-6}$, meaning the typical gradient magnitude is about $10^{-3}$. With finite shot noise (say 1024 shots per measurement), the gradient is completely buried in statistical noise. For $n = 50$ qubits, the variance bound drops to approximately $10^{-15}$ - a gradient so small that no physically realizable number of measurements could detect it. The landscape is, for all practical purposes, perfectly flat.

Strategies for Avoiding Barren Plateaus

The barren plateau problem is not necessarily fatal. Several mitigation strategies have been developed:

  • Structured ansatze. Problem-specific ansatze (UCCSD, Hamiltonian variational, symmetry-preserving) have much less expressive power than random circuits, which is precisely what makes them trainable. By constraining the circuit to a physically relevant subspace, the effective landscape avoids barren plateaus.
  • Layerwise training. Instead of optimizing all parameters at once, train one layer at a time. Add a new layer only after the previous layers are optimized. This keeps the effective circuit depth small during optimization.
  • Smart initialization. Initialize parameters near the identity transformation (small random angles near zero) rather than uniformly at random. This ensures the initial circuit is close to a known state with non-zero gradients.
  • ADAPT-VQE. The adaptive approach inherently avoids barren plateaus by growing the ansatz from a structured starting point and only adding operators with demonstrably large gradients.
  • Local cost functions. Reformulate the optimization objective to depend on local measurements when possible, reducing susceptibility to barren plateaus in shallow circuits.

Other Limitations

Beyond barren plateaus, VQE faces additional challenges:

  • Measurement overhead. The number of Pauli terms in the Hamiltonian scales as $O(M^4)$ for $M$ orbitals, and each must be measured with enough shots to achieve the desired precision. For a molecule with 100 orbitals, this can mean millions of circuit executions per VQE iteration.
  • Classical optimization difficulty. Even without barren plateaus, the VQE cost landscape can have many local minima. There is no general guarantee that the classical optimizer will find the global minimum, and the problem of distinguishing local from global minima is itself computationally hard.
  • Noise accumulation. On real hardware, gate errors, decoherence, and measurement errors corrupt the estimated energy. Error mitigation techniques (zero-noise extrapolation, probabilistic error cancellation) can reduce but not eliminate this effect. The achievable accuracy depends on the quality of the hardware and the depth of the circuit.
  • The quantum advantage question. For VQE to be useful, it must solve problems that classical methods cannot. Current VQE experiments are on systems small enough to solve exactly on classical computers. Demonstrating a genuine quantum advantage in chemistry - solving a problem beyond classical reach with VQE on quantum hardware - remains an open challenge.
Common Misconception.

Barren plateaus do not mean variational quantum algorithms are doomed. They mean that naive approaches - random circuits with random initialization - will fail at scale. The solution is to build problem structure into the ansatz, use smart initialization strategies, and employ adaptive methods like ADAPT-VQE. The barren plateau results are a guide for algorithm design, not a no-go theorem for the entire variational paradigm.

The Road Ahead

Despite these challenges, VQE remains one of the most promising applications of near-term quantum computers. Active research directions include:

  • Better ansatz design informed by both chemistry and information theory.
  • Improved classical optimizers specifically designed for noisy quantum cost functions.
  • Error mitigation techniques that extend the reach of noisy hardware.
  • Hybrid approaches that combine VQE with classical post-processing (e.g., quantum subspace expansion) to extract more information from shallow circuits.
  • Scaling demonstrations on increasingly large molecules, working toward the "quantum advantage" frontier where classical methods begin to fail.

The variational quantum eigensolver exemplifies the NISQ era: it is an algorithm designed not for a perfect quantum computer, but for the imperfect machines we have today. Its ultimate success or failure will be determined not by any single theoretical result, but by the co-evolution of hardware capabilities, algorithmic innovations, and our understanding of where the classical-quantum boundary truly lies.

Interactive: VQE Energy Landscape

Before running the optimizer, explore the energy landscape by sweeping a single parameter. This reveals the terrain that VQE must navigate to find the ground-state energy of the hydrogen molecule.

OPENQASM 2.0; include "qelib1.inc"; qreg q[2]; creg c[2]; x q[1]; ry(0.22) q[0]; cx q[0], q[1]; c[0] = measure q[0]; c[1] = measure q[1];
H$_2$ Energy vs Bond Length

This plot shows three energy curves for H$_2$: the exact (FCI) result, the Hartree-Fock approximation, and a simulated VQE curve. Notice how Hartree-Fock fails at large bond lengths (the dissociation regime) while VQE tracks the exact curve throughout.

-- Exact (FCI) -- Hartree-Fock o VQE

Interactive: VQE Manual Optimizer

Run the full VQE algorithm on a 2-qubit Hamiltonian representing molecular hydrogen at equilibrium bond distance. The optimizer searches the parameter space to find the ground state energy. Watch the energy converge toward the minimum.

OPENQASM 2.0; include "qelib1.inc"; qreg q[2]; creg c[2]; ry({theta0}) q[0]; ry({theta1}) q[1]; cx q[0], q[1]; c[0] = measure q[0]; c[1] = measure q[1];

Try to find the parameter values that concentrate probability on $|01\rangle$ and $|10\rangle$ - these correspond to the bonding configurations of H$_2$. The ground state has $\theta_0 \approx \pi$ and $\theta_1 \approx 0.22$. The expectation value $\langle ZZ \rangle$ reports how correlated the two qubits are.

Interactive: ADAPT-VQE Walkthrough

Watch the ADAPT-VQE algorithm build its ansatz one operator at a time. At each iteration, the algorithm evaluates the gradient of every operator in the pool, selects the one with the largest gradient, appends it to the circuit, and re-optimizes all parameters. The energy drops toward the exact ground state.

Step 0: Hartree-Fock Reference

We begin with the Hartree-Fock state $|01\rangle$ - the simplest approximation where both electrons occupy the bonding orbital. Energy: $-1.117$ Ha (above the exact value of $-1.137$ Ha by 0.020 Ha).

|00>
|01>
|10>
|11>

Step 1: Evaluate Gradients

Compute $|\partial E / \partial \theta_k|$ for each operator in the pool. The double excitation $a_1^\dagger a_3^\dagger a_2 a_0$ has gradient magnitude $0.362$ - by far the largest. Single excitations have near-zero gradients for H$_2$.

s01
s23
d0123
s02

Gradient magnitudes for operator pool

Step 2: Append and Optimize

The double excitation operator is appended to the ansatz. Its parameter $\theta_1$ is optimized, yielding $\theta_1 \approx 0.11$. Energy drops to $-1.137$ Ha - within chemical accuracy of the exact result. Circuit depth: just 1 CNOT layer.

|00>
|01>
|10>
|11>

Step 3: Convergence Check

All remaining gradient magnitudes are below the convergence threshold ($10^{-4}$). ADAPT-VQE terminates with a 1-parameter ansatz. For H$_2$, a single double excitation captures essentially all the electron correlation. Final energy: $-1.1372$ Ha (error: $0.0001$ Ha, well within chemical accuracy).

Converged!
E = -1.1372 Ha
Error = 0.0001 Ha < 0.0016 Ha (chemical accuracy)
Parameters: 1 | CNOT depth: 1