Chapter 10: The Quantum Fourier Transform

Throughout this textbook, we have built up an increasingly powerful toolkit for quantum computation - from single-qubit gates and superposition to multi-qubit entanglement and oracle-based algorithms. Now we arrive at one of the most important building blocks in all of quantum computing: the Quantum Fourier Transform (QFT). It is the engine inside Shor's factoring algorithm, the backbone of quantum phase estimation, and the key to an entire family of quantum speedups.

But before we dive into the quantum version, we need to understand what a Fourier transform does in the first place. The core idea is surprisingly intuitive: a Fourier transform finds hidden patterns in data. It takes a signal that looks complicated in one representation and reveals the simple, repeating components underneath. This is one of the most useful ideas in all of mathematics and engineering - and its quantum counterpart inherits that power while operating exponentially faster in the right circumstances.

10.1 Classical Fourier Transforms: Finding Hidden Patterns

Decomposing Signals into Frequencies

Imagine you are listening to a chord played on a piano - say, middle C, E, and G struck simultaneously. Your ear hears a single complex sound wave, a jagged, irregular pressure pattern oscillating in the air. Yet somehow your brain effortlessly separates it into three distinct notes. How? Your inner ear is essentially performing a Fourier transform: it decomposes the complex wave into its constituent frequencies.

This is precisely what the Fourier transform does mathematically. Given a signal - any function of time - it produces a new function of frequency that tells you how much of each frequency is present in the original signal. The original signal lives in the time domain; the transformed version lives in the frequency domain.

Key Concept.

The Fourier transform converts a signal from the time domain (what happens at each moment) to the frequency domain (how much of each frequency is present). Any signal can be decomposed into a sum of simple sinusoidal waves at different frequencies.

This idea has staggeringly broad applications. Audio engineers use it to isolate instruments in a recording. Doctors use it to reconstruct images in MRI scanners. Telecommunications systems use it to pack multiple phone calls onto a single cable. Astronomers use it to detect the faint periodic signals of exoplanets orbiting distant stars.

The Discrete Fourier Transform (DFT)

In the digital world, we work with discrete data - a finite list of $N$ numbers rather than a continuous function. The Discrete Fourier Transform (DFT) takes a vector of $N$ complex numbers $(x_0, x_1, \ldots, x_{N-1})$ and produces a new vector $(y_0, y_1, \ldots, y_{N-1})$ defined by:

$$y_k = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} x_j \, \omega^{jk}$$

where $\omega = e^{2\pi i / N}$ is a primitive $N$-th root of unity. Recall from Chapter 2 that $e^{i\theta} = \cos\theta + i\sin\theta$, so $\omega$ is a complex number that sits on the unit circle at angle $2\pi/N$. The powers $\omega^{jk}$ sweep around the unit circle, creating the sinusoidal patterns that probe each frequency.

We can write the DFT compactly as a matrix-vector multiplication $\mathbf{y} = F_N \mathbf{x}$, where the DFT matrix $F_N$ has entries:

$$(F_N)_{jk} = \frac{1}{\sqrt{N}} \omega^{jk}$$

For $N = 4$ with $\omega = e^{2\pi i/4} = i$, the DFT matrix is:

$$F_4 = \frac{1}{2} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & i^2 & i^3 \\ 1 & i^2 & i^4 & i^6 \\ 1 & i^3 & i^6 & i^9 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{pmatrix}$$

The key player here is $\omega = e^{2\pi i/N}$, the primitive $N$-th root of unity. Its powers $\omega^0, \omega^1, \omega^2, \ldots, \omega^{N-1}$ are $N$ equally spaced points on the unit circle in the complex plane, and they satisfy the crucial property $\omega^N = 1$. The DFT essentially measures how well the input data "resonates" with each of these $N$ rotational frequencies.

Interactive: Roots of Unity on the Unit Circle

Computing this matrix-vector product directly requires $N$ multiplications for each of the $N$ output values, giving a total of $O(N^2)$ arithmetic operations. For large $N$, this becomes prohibitively expensive. For example, processing one second of CD-quality audio ($N = 44{,}100$ samples) would require nearly two billion operations with the naive approach.

The Fast Fourier Transform (FFT)

In 1965, James Cooley and John Tukey published an algorithm that changed scientific computing forever: the Fast Fourier Transform (FFT). The FFT computes exactly the same result as the DFT but exploits a divide-and-conquer strategy to reduce the cost from $O(N^2)$ to $O(N \log N)$.

The key insight is that the DFT of size $N$ can be split into two DFTs of size $N/2$ - one on the even-indexed elements and one on the odd-indexed elements - and then the two halves can be combined with only $O(N)$ additional work. Applying this recursion $\log_2 N$ times yields the $O(N \log N)$ bound.

To put this speedup in perspective: for $N = 2^{20} \approx 10^6$ data points, the naive DFT requires about $10^{12}$ operations, while the FFT requires only about $2 \times 10^7$. That is a factor of $50{,}000$ faster. The FFT is sometimes called one of the most important algorithms of the 20th century, and for good reason - it made digital signal processing practical.

Note.

The FFT algorithm was actually discovered earlier by Gauss around 1805, but his notes were not published until long after his death. Cooley and Tukey independently rediscovered it in the context of Cold War-era signal processing for detecting nuclear tests.

As impressive as the FFT is, can we do even better? Classically, $O(N \log N)$ appears to be essentially optimal for computing the full Fourier transform. But on a quantum computer, we are about to see something remarkable: the same mathematical transformation can be performed using only $O(\log^2 N)$ elementary operations - an exponential speedup in the number of gates.

10.2 The Quantum Fourier Transform Circuit

Definition: From Computational Basis to Fourier Basis

The Quantum Fourier Transform is the quantum-mechanical analogue of the classical DFT. It is a unitary operation that acts on a quantum register of $n$ qubits and transforms computational basis states into Fourier basis states. Specifically, the QFT maps:

$$\text{QFT} |j\rangle = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} e^{2\pi i jk/N} |k\rangle$$

where $N = 2^n$ and $j$ ranges from $0$ to $N - 1$. Comparing this to the classical DFT, we see the same formula: each computational basis state $|j\rangle$ is mapped to a superposition of all basis states, with the amplitude of $|k\rangle$ being $\frac{1}{\sqrt{N}} e^{2\pi i jk/N}$.

More generally, the QFT acts linearly on any input state:

$$\text{QFT} \left( \sum_{j=0}^{N-1} x_j |j\rangle \right) = \sum_{k=0}^{N-1} y_k |k\rangle$$

where $(y_0, y_1, \ldots, y_{N-1})$ is the DFT of $(x_0, x_1, \ldots, x_{N-1})$. The QFT matrix is exactly the DFT matrix $F_N$ - but implemented as a quantum gate.

Key Concept.

The QFT is a unitary gate that performs the Discrete Fourier Transform on the amplitudes of a quantum state. It maps computational basis states to Fourier basis states: $|j\rangle \mapsto \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} e^{2\pi i jk/N} |k\rangle$.

Product Representation: The Key to the Circuit

The crucial insight that makes the QFT efficient is that the output state can be written as a tensor product of single-qubit states. Using the binary fraction notation where $0.j_1 j_2 \cdots j_m = j_1/2 + j_2/4 + \cdots + j_m/2^m$, the QFT of a basis state $|j\rangle = |j_1 j_2 \cdots j_n\rangle$ can be written as:

$$\text{QFT}|j_1 j_2 \cdots j_n\rangle = \frac{1}{\sqrt{2^n}} \left( |0\rangle + e^{2\pi i \, 0.j_n} |1\rangle \right) \otimes \left( |0\rangle + e^{2\pi i \, 0.j_{n-1}j_n} |1\rangle \right) \otimes \cdots \otimes \left( |0\rangle + e^{2\pi i \, 0.j_1 j_2 \cdots j_n} |1\rangle \right)$$

This product form is remarkable for several reasons. First, the output is not entangled - it factors into a tensor product of individual qubit states. Second, each factor has the form $\frac{1}{\sqrt{2}}(|0\rangle + e^{i\phi}|1\rangle)$ for some phase $\phi$, which is a state we can produce using Hadamard gates and controlled phase rotations. Third, and most importantly, the phase on the $m$-th qubit from the right depends only on the $m$ least significant bits of the input $j$. This locality of dependence is what makes the circuit efficient.

Let us verify the product formula for a small case. For $n = 2$ with input $|j\rangle = |j_1 j_2\rangle = |11\rangle = |3\rangle$:

$$\text{QFT}|3\rangle = \frac{1}{2}\sum_{k=0}^{3} e^{2\pi i \cdot 3k/4}|k\rangle = \frac{1}{2}\left(|0\rangle + e^{i3\pi/2}|1\rangle + e^{i3\pi}|2\rangle + e^{i9\pi/2}|3\rangle\right)$$ $$= \frac{1}{2}(|0\rangle - i|1\rangle - |2\rangle + i|3\rangle)$$

The product form gives: $\frac{1}{\sqrt{2}}(|0\rangle + e^{2\pi i \cdot 0.1}|1\rangle) \otimes \frac{1}{\sqrt{2}}(|0\rangle + e^{2\pi i \cdot 0.11}|1\rangle)$. Since $0.1_2 = 1/2$ we get $e^{i\pi} = -1$, and $0.11_2 = 3/4$ gives $e^{i3\pi/2} = -i$. So the product is $\frac{1}{2}(|0\rangle - |1\rangle)(|0\rangle - i|1\rangle) = \frac{1}{2}(|00\rangle - i|01\rangle - |10\rangle + i|11\rangle)$, which matches after relabeling $|00\rangle = |0\rangle$, $|01\rangle = |1\rangle$, $|10\rangle = |2\rangle$, $|11\rangle = |3\rangle$.

Building the Circuit: Hadamards and Controlled Phase Gates

The QFT circuit is constructed from two types of gates:

Hadamard gate (H): Creates the superposition $|0\rangle \mapsto \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $|1\rangle \mapsto \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$.
Controlled phase gate $\text{CP}(\theta)$: Applies a phase shift of $e^{i\theta}$ to the target qubit only when the control qubit is $|1\rangle$. In the QFT, the rotation angles are always $\theta = \pi/2^k$ for various values of $k$.

The circuit processes qubits one at a time, from the most significant (qubit 0) to the least significant (qubit $n-1$). For each qubit $q_m$:

Apply a Hadamard gate to $q_m$.
For each subsequent qubit $q_k$ where $k > m$, apply a controlled phase gate $\text{CP}(\pi/2^{k-m})$ with $q_k$ as control and $q_m$ as target.

For a 3-qubit QFT ($n = 3$, $N = 8$), the circuit proceeds as follows:

Apply H to $q_0$.
Apply $\text{CP}(\pi/2)$ controlled by $q_1$, targeting $q_0$.
Apply $\text{CP}(\pi/4)$ controlled by $q_2$, targeting $q_0$.
Apply H to $q_1$.
Apply $\text{CP}(\pi/2)$ controlled by $q_2$, targeting $q_1$.
Apply H to $q_2$.
Apply SWAP between $q_0$ and $q_2$ (bit reversal).

Let us trace through why each gate is needed. After the Hadamard on $q_0$, the first qubit is in the state $\frac{1}{\sqrt{2}}(|0\rangle + e^{2\pi i \, 0.j_1}|1\rangle)$, since the Hadamard introduces a phase of $\pi$ (which equals $e^{2\pi i \cdot j_1/2}$) when $j_1 = 1$. The subsequent controlled phase gates add the finer phase contributions: the $\text{CP}(\pi/2)$ gate controlled by $q_1$ adds the $j_2/4$ term, and the $\text{CP}(\pi/4)$ gate controlled by $q_2$ adds the $j_3/8$ term. Together, these produce the factor $(|0\rangle + e^{2\pi i \, 0.j_1 j_2 j_3}|1\rangle)/\sqrt{2}$.

The Bit Reversal Step

There is one subtlety. The product representation shows that the first qubit of the output should carry the phase $e^{2\pi i \, 0.j_n}$ (depending only on the least significant input bit), while the last qubit should carry the full phase $e^{2\pi i \, 0.j_1 j_2 \cdots j_n}$ (depending on all input bits). But the circuit naturally produces the qubits in the opposite order - the first qubit processed ends up with the most complex phase.

The fix is simple: after all the Hadamard and controlled-phase operations, we apply SWAP gates to reverse the order of the qubits. For $n$ qubits, this requires $\lfloor n/2 \rfloor$ SWAP gates: swap qubit 0 with qubit $n-1$, qubit 1 with qubit $n-2$, and so on.

Worked Example: 3-Qubit QFT on $|5\rangle = |101\rangle$

Let us trace the 3-qubit QFT circuit on the input state $|5\rangle = |101\rangle$, so $j_1 = 1$, $j_2 = 0$, $j_3 = 1$. The expected output is:

$$\text{QFT}|5\rangle = \frac{1}{\sqrt{8}} \sum_{k=0}^{7} e^{2\pi i \cdot 5k/8} |k\rangle$$

In product form, the output should be:

$$\frac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i \cdot 0.1}|1\rangle\right) \otimes \frac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i \cdot 0.01}|1\rangle\right) \otimes \frac{1}{\sqrt{2}}\left(|0\rangle + e^{2\pi i \cdot 0.101}|1\rangle\right)$$

Computing the binary fractions: $0.1_2 = 1/2$, so $e^{2\pi i \cdot 1/2} = e^{i\pi} = -1$. Next, $0.01_2 = 1/4$, so $e^{2\pi i \cdot 1/4} = e^{i\pi/2} = i$. Finally, $0.101_2 = 1/2 + 1/8 = 5/8$, so $e^{2\pi i \cdot 5/8} = e^{i \cdot 5\pi/4}$.

So the output (after bit reversal) is:

$$\frac{1}{2\sqrt{2}} \left(|0\rangle - |1\rangle\right) \otimes \left(|0\rangle + i|1\rangle\right) \otimes \left(|0\rangle + e^{i5\pi/4}|1\rangle\right)$$

Now let us verify this step by step through the circuit. The initial state is $|q_0 q_1 q_2\rangle = |1\,0\,1\rangle$.

Step 1: H on $q_0$. Since $q_0 = |1\rangle$, the Hadamard gives $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$. State becomes $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) \otimes |0\rangle \otimes |1\rangle$.

Step 2: CP($\pi/2$) with $q_1$ controlling $q_0$. Since $q_1 = |0\rangle$, the controlled phase does nothing. State unchanged.

Step 3: CP($\pi/4$) with $q_2$ controlling $q_0$. Since $q_2 = |1\rangle$, a phase of $e^{i\pi/4}$ is applied to $q_0$ when $q_0 = |1\rangle$. State becomes $\frac{1}{\sqrt{2}}(|0\rangle - e^{i\pi/4}|1\rangle) \otimes |0\rangle \otimes |1\rangle$. Note: the existing $-1$ phase combines with $e^{i\pi/4}$ to give $-e^{i\pi/4} = e^{i\pi} \cdot e^{i\pi/4} = e^{i5\pi/4}$. So $q_0$ is now $\frac{1}{\sqrt{2}}(|0\rangle + e^{i5\pi/4}|1\rangle)$.

Step 4: H on $q_1$. Since $q_1 = |0\rangle$, the Hadamard gives $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$.

Step 5: CP($\pi/2$) with $q_2$ controlling $q_1$. Since $q_2 = |1\rangle$, a phase of $e^{i\pi/2} = i$ is applied to $q_1$ when $q_1 = |1\rangle$. So $q_1$ becomes $\frac{1}{\sqrt{2}}(|0\rangle + i|1\rangle)$.

Step 6: H on $q_2$. Since $q_2 = |1\rangle$, the Hadamard gives $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$.

Before the SWAP, the state is:

$$\frac{1}{2\sqrt{2}} \left(|0\rangle + e^{i5\pi/4}|1\rangle\right) \otimes \left(|0\rangle + i|1\rangle\right) \otimes \left(|0\rangle - |1\rangle\right)$$

Step 7: SWAP $q_0$ and $q_2$. This reverses the qubit order, giving:

$$\frac{1}{2\sqrt{2}} \left(|0\rangle - |1\rangle\right) \otimes \left(|0\rangle + i|1\rangle\right) \otimes \left(|0\rangle + e^{i5\pi/4}|1\rangle\right)$$

This matches our expected product form exactly, confirming the circuit is correct.

Interactive: QFT Step-Through

Step through the 3-qubit QFT circuit gate by gate. At each step, the stabilizer tableau shows how the state evolves through Hadamard and controlled-phase gates.

Step 0

Sandbox: 3-Qubit QFT Circuit

Run the 3-qubit QFT circuit below and observe the output distribution. The input state $|101\rangle = |5\rangle$ is prepared first, then the QFT is applied. Because the QFT produces a uniform superposition with different phases on each basis state, all 8 outcomes should appear with roughly equal probability (about 12.5% each). Try changing the input state by modifying the X gates.

OPENQASM 3.0;
qubit[3] q; bit[3] c;
// Prepare |101> = |5>
x q[0]; x q[2];
// 3-qubit QFT
h q[0];
cp(pi/2) q[1], q[0];
cp(pi/4) q[2], q[0];
h q[1];
cp(pi/2) q[2], q[1];
h q[2];
swap q[0], q[2];
c = measure q;

Note.

The QFT of any computational basis state produces a uniform probability distribution - every output state has probability $1/N$. The information is encoded entirely in the phases of the amplitudes, which are invisible to measurement. This is why the QFT is useful as a subroutine (where subsequent operations can exploit the phase information) rather than as a standalone algorithm.

The General $n$-Qubit QFT Circuit

Generalizing the 3-qubit example, the QFT circuit for $n$ qubits follows a systematic pattern. We can write it as a sequence of $n$ stages, where stage $m$ (for $m = 0, 1, \ldots, n-1$) processes qubit $q_m$:

Apply H to $q_m$.
For $k = m+1, m+2, \ldots, n-1$: apply $\text{CP}(\pi/2^{k-m})$ with control $q_k$ and target $q_m$.

After all $n$ stages, apply $\lfloor n/2 \rfloor$ SWAP gates to reverse the qubit order. The rotation angles decrease as the controlling qubit gets farther from the target: the nearest neighbor contributes $\pi/2$, the next contributes $\pi/4$, then $\pi/8$, and so on. The $k$-th controlled rotation in stage $m$ applies the phase $\pi/2^{k-m}$, which corresponds to adding the bit $j_{k+1}$ at position $2^{-(k-m+1)}$ in the binary fraction $0.j_{m+1} j_{m+2} \cdots j_n$.

Gate Count

For $n$ qubits, the QFT circuit requires:

$n$ Hadamard gates (one per qubit)
$n(n-1)/2$ controlled phase gates (for qubit $m$, there are $n - m - 1$ controlled rotations from subsequent qubits)
$\lfloor n/2 \rfloor$ SWAP gates for bit reversal

The total gate count is $n + n(n-1)/2 + \lfloor n/2 \rfloor = O(n^2)$ gates. This quadratic scaling in the number of qubits is the source of the QFT's exponential advantage, as we will see in the next section.

10.3 Why the QFT Is Exponentially Faster

Comparing Gate Counts

Let us be very precise about the comparison between the QFT and its classical counterparts. Suppose we want to compute the Fourier transform of $N = 2^n$ data points:

Algorithm	Number of Operations	In Terms of $n$ Qubits
Naive DFT	$O(N^2)$	$O(2^{2n})$
Classical FFT	$O(N \log N)$	$O(n \cdot 2^n)$
Quantum QFT	$O(n^2)$ gates	$O(n^2)$

The difference is dramatic. For $n = 30$ qubits ($N \approx 10^9$ data points):

Classical FFT: approximately $30 \times 10^9 = 3 \times 10^{10}$ operations
QFT: approximately $30^2 = 900$ gates

That is a speedup factor of over 30 million. And the gap grows exponentially with $n$: for each additional qubit, the classical cost roughly doubles while the quantum cost grows only linearly.

Key Concept.

The QFT requires $O(n^2)$ quantum gates to transform $N = 2^n$ amplitudes, compared to $O(n \cdot 2^n)$ classical operations for the FFT. This is an exponential speedup in the number of gates - from exponential to polynomial in $n$.

The Catch: Input and Output Encoding

If the QFT is so much faster, why not use it to replace the classical FFT in all applications? There is a fundamental catch, and understanding it is essential for appreciating what quantum algorithms can and cannot do.

The catch is twofold:

Loading classical data is expensive. The QFT operates on the amplitudes of a quantum state. If you have a classical dataset of $N$ numbers and want to load them as amplitudes of an $n$-qubit state, you generally need $O(N)$ gates to prepare that state - wiping out the exponential advantage. There is no known general method to load $N$ arbitrary classical values into $n = \log_2 N$ qubits in fewer than $O(N)$ steps.
Reading out the full result is expensive. Even after the QFT transforms the amplitudes, you cannot directly read all $N$ output amplitudes. Measuring the quantum state collapses it, giving you a single sample. To reconstruct all $N$ output values to reasonable precision, you would need to repeat the entire computation $O(N)$ times.

Common Misconception.

The QFT does not let you compute the Fourier transform of arbitrary classical data exponentially faster than the FFT. The exponential speedup in gate count applies only to the transformation itself, not to the input preparation or output readout. The QFT is powerful when used as a subroutine within a larger quantum algorithm where the input is already in quantum form.

When the QFT Shines

The QFT becomes genuinely powerful in situations where:

The input is already quantum. In algorithms like Shor's and quantum phase estimation, the data to be Fourier-transformed is produced by prior quantum operations. There is no classical loading step.
You do not need the full output. Typically, the QFT is followed by a measurement, and the goal is to extract a single piece of information (such as the period of a function or the phase of an eigenvalue) rather than all $N$ Fourier coefficients.
The structure of the problem aligns with Fourier analysis. Problems involving periodicity, hidden subgroups, or phase estimation are natural fits for the QFT.

This is a recurring theme in quantum computing: quantum speedups are not universal accelerators. They arise from the interplay between quantum superposition, interference, and the mathematical structure of specific problems. The QFT is the tool that converts phase information (encoded in amplitudes) into measurable outcomes (encoded in basis state probabilities), and it does so with extraordinary efficiency.

Approximate QFT

For practical quantum computing, there is an additional optimization. In the exact QFT circuit, the controlled phase rotations become extremely small for distant qubits - for example, $\text{CP}(\pi/2^{n-1})$ involves a phase angle that is exponentially small in $n$. On a real quantum computer, such tiny rotations are both difficult to implement precisely and contribute negligibly to the final result.

The approximate QFT simply omits all controlled rotations with angles below a chosen threshold. If we keep only rotations of size $\pi/2^m$ or larger (for some cutoff $m$), the gate count drops from $O(n^2)$ to $O(n \cdot m)$, and for $m = O(\log n)$ the approximation error is negligible. This gives an $O(n \log n)$ circuit that is essentially as accurate as the exact QFT for practical purposes - matching the classical FFT's $O(N \log N)$ in a different sense, but still operating on quantum amplitudes rather than classical data.

10.4 Quantum Phase Estimation: Extracting Eigenvalues

The Problem

Quantum Phase Estimation (QPE) is arguably the most important application of the QFT and one of the most consequential algorithms in quantum computing. It solves the following problem:

Key Concept.

Phase Estimation Problem: Given a unitary operator $U$ and one of its eigenvectors $|\psi\rangle$ satisfying $U|\psi\rangle = e^{2\pi i \theta}|\psi\rangle$, estimate the eigenvalue phase $\theta$ to $n$ bits of precision.

Why is this important? Many quantum algorithms reduce their core computation to estimating eigenvalues of unitary operators. Shor's factoring algorithm uses QPE to find the period of a modular exponentiation function. The HHL algorithm for solving linear systems uses QPE to extract eigenvalues of a matrix. Quantum simulation algorithms use QPE to measure the energy levels of quantum systems. Estimating phases is, in a deep sense, what quantum computers do best.

The QPE Circuit

The QPE circuit uses two quantum registers:

Counting register: $n$ ancilla qubits, initialized to $|0\rangle^{\otimes n}$. These will hold the binary representation of $\theta$ at the end.
Eigenstate register: Contains the eigenvector $|\psi\rangle$ of $U$.

The algorithm proceeds in three stages:

Stage 1: Superposition. Apply Hadamard gates to all $n$ ancilla qubits, creating an equal superposition:

$$|0\rangle^{\otimes n} \xrightarrow{H^{\otimes n}} \frac{1}{\sqrt{2^n}} \sum_{k=0}^{2^n - 1} |k\rangle$$

Stage 2: Controlled unitary powers. Apply controlled-$U^{2^k}$ operations, where the $k$-th ancilla qubit controls $U^{2^k}$ acting on the eigenstate register. Since $U|\psi\rangle = e^{2\pi i \theta}|\psi\rangle$, we have $U^{2^k}|\psi\rangle = e^{2\pi i \cdot 2^k \theta}|\psi\rangle$. The controlled operation kicks this phase back onto the control qubit (this is the phase kickback trick from Chapter 9):

$$\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)|\psi\rangle \xrightarrow{C\text{-}U^{2^k}} \frac{1}{\sqrt{2}}(|0\rangle + e^{2\pi i \cdot 2^k \theta}|1\rangle)|\psi\rangle$$

After all controlled-$U$ operations, the state of the counting register is:

$$\frac{1}{\sqrt{2^n}} \sum_{k=0}^{2^n - 1} e^{2\pi i \theta k} |k\rangle$$

Look carefully at this expression. It is a superposition over all basis states $|k\rangle$ with amplitudes proportional to $e^{2\pi i \theta k}$. This is exactly the state you would get by applying the QFT to $|2^n \theta\rangle$ (assuming $\theta$ can be written exactly in $n$ bits as $\theta = \theta_1/2 + \theta_2/4 + \cdots + \theta_n/2^n$). The eigenstate register remains unchanged in the state $|\psi\rangle$ throughout, since it is always acted upon by powers of $U$ for which $|\psi\rangle$ is an eigenvector.

This is the deep connection: the controlled unitary operations encode $\theta$ into the phases of the counting register in exactly the Fourier basis. The inverse QFT then converts from the Fourier basis back to the computational basis, revealing $\theta$ directly.

Stage 3: Inverse QFT. Apply the inverse QFT ($\text{QFT}^{\dagger}$) to the counting register. Since the state is the QFT of $|2^n \theta\rangle$, the inverse QFT recovers $|2^n \theta\rangle$ - the binary representation of $\theta$ stored in the ancilla qubits.

$$\frac{1}{\sqrt{2^n}} \sum_{k=0}^{2^n-1} e^{2\pi i \theta k} |k\rangle \xrightarrow{\text{QFT}^{\dagger}} |\tilde{\theta}\rangle$$

where $\tilde{\theta}$ is the $n$-bit binary approximation to $2^n \theta$. Measuring the counting register then gives $\tilde{\theta}$, from which we obtain $\theta \approx \tilde{\theta}/2^n$.

Precision and Success Probability

If $\theta$ can be represented exactly in $n$ bits (i.e., $2^n \theta$ is an integer), then QPE produces the exact answer with certainty. When $\theta$ is not exactly representable, the measurement outcome is the closest $n$-bit approximation, with success probability at least $4/\pi^2 \approx 0.405$.

To boost the success probability to $1 - \epsilon$, we can use $O(\log(1/\epsilon))$ additional ancilla qubits. With $n + \lceil\log_2(2 + 1/(2\epsilon))\rceil$ ancilla qubits, the probability of obtaining an $n$-bit accurate estimate is at least $1 - \epsilon$.

Ancilla Qubits	Precision	Example
3	$1/8 = 0.125$	Distinguish 8 evenly-spaced phases
5	$1/32 \approx 0.031$	About 2 decimal digits
10	$1/1024 \approx 0.001$	About 3 decimal digits
20	$1/1048576 \approx 10^{-6}$	About 6 decimal digits

How QPE Works: A Summary

It is worth pausing to appreciate the elegance of QPE. The algorithm has three conceptual stages, each exploiting a different quantum phenomenon:

Superposition (Hadamard gates): Creates a uniform superposition over all $2^n$ values of the counting register, allowing the circuit to "test" all possible phases simultaneously.
Phase kickback (controlled-$U^{2^k}$ gates): The eigenvalue phases of $U$ are kicked back onto the ancilla qubits, encoding $\theta$ into the relative phases of the superposition. This is the same phase kickback mechanism that powered the Deutsch-Jozsa and Bernstein-Vazirani algorithms in Chapter 9, but here it is applied to each ancilla qubit with exponentially increasing powers of $U$.
Interference (inverse QFT): The inverse QFT causes constructive interference at the state $|2^n \theta\rangle$ and destructive interference elsewhere, concentrating the probability amplitude onto the correct answer. This is the Fourier transform doing what it does best: extracting a hidden frequency from a signal.

Example: Estimating the Phase of the T Gate

Let us work through a concrete example. The T gate (also known as the $\pi/8$ gate from Chapter 5) has the matrix:

$$T = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}$$

Its eigenstates are $|0\rangle$ with eigenvalue 1 and $|1\rangle$ with eigenvalue $e^{i\pi/4}$. Let us use QPE to estimate the phase of the eigenvalue $e^{i\pi/4}$ associated with eigenvector $|1\rangle$.

We have $T|1\rangle = e^{i\pi/4}|1\rangle = e^{2\pi i \cdot (1/8)}|1\rangle$, so $\theta = 1/8 = 0.001_2$ in binary. With 3 ancilla qubits, we can represent $\theta$ exactly.

The circuit uses 3 ancilla qubits and 1 eigenstate qubit:

Initialize the eigenstate qubit to $|1\rangle$ with an X gate.
Apply H to each ancilla qubit.
Apply controlled-$T^{2^k}$ from ancilla $k$ to the eigenstate qubit:
- Ancilla 0 controls $T^{4} = T^4$. Since $T^8 = I$ (the T gate has order 8), $T^4 = Z$ (which applies phase $e^{i\pi}$).
- Ancilla 1 controls $T^{2}= S$ (the S gate, applying phase $e^{i\pi/2}$).
- Ancilla 2 controls $T^{1} = T$ (applying phase $e^{i\pi/4}$).
Apply the inverse QFT to the 3 ancilla qubits.
Measure the ancilla qubits to read off $\theta$.

After the controlled unitaries, the ancilla register is in the state:

$$\frac{1}{2\sqrt{2}} \sum_{k=0}^{7} e^{2\pi i k/8} |k\rangle$$

The inverse QFT maps this to $|001\rangle = |1\rangle$, which represents $\tilde{\theta} = 1$. Since $\theta = \tilde{\theta}/2^n = 1/8$, we recover the correct phase.

QPE for the T Gate

A QPE circuit with 3 ancilla qubits can estimate the phase of the T gate acting on $|1\rangle$. The expected measurement result on the ancilla qubits is $|001\rangle$, corresponding to $\theta = 1/8$. You can run this circuit in the interactive QPE simulation later in this chapter and verify that $|001\rangle$ appears with probability close to 100%.

Note.

In the inverse QFT, every gate from the forward QFT is replaced by its conjugate transpose (Hermitian adjoint) and applied in reverse order. For controlled phase gates, this means replacing $\text{CP}(\phi)$ with $\text{CP}(-\phi)$. The Hadamard gate is its own inverse ($H^{\dagger} = H$), so Hadamards remain unchanged.

QPE as a Building Block

QPE is not just an algorithm - it is a fundamental primitive that powers many of the most important quantum algorithms:

Shor's algorithm (Chapter 11): Uses QPE to find the period of modular exponentiation, which is then used to factor integers. The unitary $U$ implements multiplication by a constant modulo $N$, and QPE extracts the period from the eigenvalue phases.
HHL algorithm: Solves systems of linear equations $A\mathbf{x} = \mathbf{b}$ exponentially faster than classical methods (under certain conditions). QPE extracts the eigenvalues of $A$, which are then inverted to construct the solution.
Quantum simulation: To find the energy levels of a quantum system with Hamiltonian $H$, one constructs $U = e^{iHt}$ and uses QPE to estimate the eigenvalues of $U$, which encode the energies. This was Feynman's original motivation for quantum computing.
Quantum counting: Combines QPE with Grover's algorithm (Chapter 12) to estimate the number of solutions to a search problem, rather than just finding one.

The pattern is always the same: encode the problem as a unitary operator, prepare or approximate its eigenvector, and use QPE to extract the eigenvalue phase that contains the answer. The QFT (in its inverse form) is the final step that converts the phase information into a measurable bit string.

Key Concept.

Quantum Phase Estimation combines controlled unitary powers with the inverse QFT to convert eigenvalue phases into measurable bit strings. It is the central subroutine in Shor's algorithm, HHL, quantum simulation, and quantum counting. Using $n$ ancilla qubits, it estimates the phase $\theta$ to $n$ bits of precision.

The Inverse QFT: Running the Circuit Backwards

Since the QFT is a unitary operation, its inverse $\text{QFT}^{\dagger}$ exists and can be implemented by reversing the circuit: apply each gate's conjugate transpose in the opposite order. Concretely, for the inverse QFT:

First perform the SWAP gates (bit reversal) - since SWAP is its own inverse, this is unchanged.
Then process qubits in reverse order (from qubit $n-1$ down to qubit 0).
For each qubit, apply the controlled phase gates with negated angles ($-\pi/2^k$ instead of $\pi/2^k$) in reverse order, followed by a Hadamard.

The inverse QFT has exactly the same gate count as the forward QFT: $O(n^2)$ gates. It converts Fourier basis states back into computational basis states, making the encoded phase information accessible to measurement.

With the QFT and QPE in hand, we have the two central tools needed for the crown jewel of quantum algorithms. In the next chapter, we will see how Peter Shor combined QPE with modular arithmetic to create an algorithm that can factor large integers in polynomial time - a feat that threatens the very foundations of modern cryptography and remains one of the strongest motivations for building large-scale quantum computers.

Interactive: DFT vs FFT vs QFT Complexity

Interactive: Quantum Phase Estimation

QPE estimates the eigenvalue phase of a unitary operator. Select a gate (whose eigenvalue on $|1\rangle$ is $e^{2\pi i \phi}$), choose the number of precision bits, and watch QPE recover the phase from the measurement histogram.

Gate:

Exercise: Build QPE for the S Gate

Exercise: QPE for S Gate Phase Estimation

The S gate has eigenvalue $e^{i\pi/2} = e^{2\pi i \cdot 1/4}$ on $|1\rangle$, so $\theta = 1/4 = 0.01_2$. With 3 ancilla qubits, QPE should output $|010\rangle$ (decimal 2, since $2/8 = 1/4$). Complete the circuit below.

OPENQASM 3.0;
qubit[4] q; bit[3] c;
// QPE for S gate: eigenstate = |1>
x q[3];
// TODO: Apply H to ancilla qubits q[0], q[1], q[2]

// Controlled powers of S:
// q[0] controls S^4 = I (do nothing)
// q[1] controls S^2 = Z
// q[2] controls S^1 = S

// TODO: Apply controlled unitaries

// TODO: Apply inverse QFT to q[0..2]

// Measure ancillas
c[0] = measure q[0]; c[1] = measure q[1]; c[2] = measure q[2];