Chapter 15: Programming Quantum Computers
Throughout this textbook, we have built up quantum computing from first principles - bits, qubits, gates, circuits, and algorithms. But how do you actually program a quantum computer? How does an abstract algorithm become instructions that run on real hardware? This chapter bridges theory and practice. We will explore the quantum software stack, write circuits in major frameworks, submit jobs to real processors in the cloud, and learn how transpilation maps ideal circuits onto physical hardware constraints.
15.1 The Quantum Software Stack
Classical computers have a well-defined software stack: application code compiles to machine instructions, which the CPU executes as electrical signals. Quantum computers have an analogous stack with five layers:
- Algorithm layer. The high-level quantum algorithm (Shor's, Grover's, VQE) expressed in mathematical terms.
- Circuit layer. The algorithm translated into gates acting on qubits, expressed in an intermediate representation like OpenQASM.
- Transpilation layer. The abstract circuit adapted to hardware constraints: gate decomposition, qubit routing, and optimization.
- Pulse layer. Native gates translated into timed microwave pulses (for superconducting qubits), laser pulses (for trapped ions), or other control signals.
- Hardware layer. Physical execution and measurement. Results come back as bitstrings from many repeated "shots."
The quantum software stack transforms an abstract algorithm into physical control signals through five layers: algorithm, circuit, transpilation, pulse scheduling, and hardware. Understanding each layer helps you write better programs and diagnose unexpected results.
OpenQASM: The Assembly Language of Quantum Computing
OpenQASM (Open Quantum Assembly Language) is the common intermediate representation across much of the quantum ecosystem. All sandbox circuits in this textbook use OpenQASM 3.0 - you have been writing quantum assembly language since Chapter 3. Here is a GHZ state circuit demonstrating the structure of an OpenQASM program:
You should see roughly 50% 000 and 50% 111. The Hadamard puts
q[0] into superposition, and the CNOTs propagate it to create
$\tfrac{1}{\sqrt{2}}(|000\rangle + |111\rangle)$.
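To see where the 50/50 split comes from without a quantum SDK, the GHZ circuit can be traced with a tiny state-vector simulation in plain Python. This is only an illustrative sketch, not the sandbox's implementation: the helper names `apply_h` and `apply_cx` are hypothetical, and qubit `q` is assumed to be bit `q` of the basis-state index.

```python
import math
import random

def apply_h(state, q):
    """Apply a Hadamard to qubit q (assumed to be bit q of the basis index)."""
    s = 1 / math.sqrt(2)
    new = [0j] * len(state)
    for i, amp in enumerate(state):
        i0 = i & ~(1 << q)   # same index with bit q cleared
        i1 = i | (1 << q)    # same index with bit q set
        if i & (1 << q):     # H|1> = (|0> - |1>)/sqrt(2)
            new[i0] += s * amp
            new[i1] -= s * amp
        else:                # H|0> = (|0> + |1>)/sqrt(2)
            new[i0] += s * amp
            new[i1] += s * amp
    return new

def apply_cx(state, control, target):
    """Apply a CNOT: flip the target bit wherever the control bit is 1."""
    new = [0j] * len(state)
    for i, amp in enumerate(state):
        j = i ^ (1 << target) if i & (1 << control) else i
        new[j] = amp
    return new

n = 3
state = [0j] * (2 ** n)
state[0] = 1 + 0j                 # start in |000>
state = apply_h(state, 0)         # superposition on q[0]
state = apply_cx(state, 0, 1)     # propagate to q[1]
state = apply_cx(state, 1, 2)     # propagate to q[2]

# Nonzero outcome probabilities, keyed by bitstring
probs = {format(i, "03b"): abs(a) ** 2 for i, a in enumerate(state) if abs(a) > 1e-12}

# Draw 1024 shots from that distribution (seed fixed only for repeatability)
rng = random.Random(7)
outcomes = rng.choices(list(probs), weights=list(probs.values()), k=1024)
counts = {b: outcomes.count(b) for b in probs}
print(probs, counts)
```

Only `000` and `111` carry probability, each one half; the sampled counts scatter around 512 from run to run.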
15.2 Introduction to Qiskit
Qiskit is IBM's open-source quantum SDK and the most widely adopted quantum framework.
Qiskit 2.0 (2025) rewrote core data structures in Rust for roughly 2x speedup in circuit
construction and 20% faster transpilation, and standardized on the primitives-based
execution model with Sampler and Estimator.
The Qiskit Workflow
A typical workflow has four stages: Build a circuit, Transpile it for a backend, Execute via primitives, and Analyze results. In Python:
from qiskit import QuantumCircuit
from qiskit.primitives import StatevectorSampler
qc = QuantumCircuit(2, 2)
qc.h(0)
qc.cx(0, 1)
qc.measure([0, 1], [0, 1])
sampler = StatevectorSampler()
result = sampler.run([qc], shots=1024).result()
print(result[0].data.c.get_counts()) # {'00': ~512, '11': ~512}
The equivalent OpenQASM circuit in our sandbox:
Gate Library and U3
The most general single-qubit gate is U3, parameterized by three Euler angles:
$$U(\theta, \phi, \lambda) = \begin{pmatrix} \cos(\theta/2) & -e^{i\lambda}\sin(\theta/2) \\ e^{i\phi}\sin(\theta/2) & e^{i(\phi+\lambda)}\cos(\theta/2) \end{pmatrix}$$
Every single-qubit gate is a special case: H is $U(\pi/2, 0, \pi)$, X is $U(\pi, 0, \pi)$. The sandbox below shows U3 preparing a specific Bloch sphere state:
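The two special cases quoted above can be checked numerically by building the matrix directly from its definition. The following is a minimal sketch in plain Python; the function names `u3` and `close` are illustrative choices, not part of any framework.

```python
import cmath
import math

def u3(theta, phi, lam):
    """Build the 2x2 matrix U(theta, phi, lambda) from the formula above."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [
        [c, -cmath.exp(1j * lam) * s],
        [cmath.exp(1j * phi) * s, cmath.exp(1j * (phi + lam)) * c],
    ]

def close(a, b, tol=1e-12):
    """Entry-wise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]
X = [[0, 1], [1, 0]]

h_ok = close(u3(math.pi / 2, 0, math.pi), H)   # H = U(pi/2, 0, pi)
x_ok = close(u3(math.pi, 0, math.pi), X)       # X = U(pi, 0, pi)
print(h_ok, x_ok)
```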
A comprehensive gate showcase using Clifford gates (H, S, X, Y, Z) and the non-Clifford T:
Simulating $n$ qubits requires storing $2^n$ complex amplitudes. At 16 bytes per amplitude, 30 qubits needs ~16 GB, 40 qubits ~16 TB. This exponential scaling is precisely why we build quantum computers.
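The arithmetic behind these figures is short enough to write out. The sketch below assumes the 16 bytes per amplitude stated above (two 64-bit floats) and reports sizes in binary units, so "16 GB" appears as 16 GiB; `statevector_bytes` is a hypothetical helper name.

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory needed to store all 2**n complex amplitudes."""
    return bytes_per_amplitude * 2 ** n_qubits

GIB = 2 ** 30
TIB = 2 ** 40

sizes = {n: statevector_bytes(n) for n in (20, 30, 40, 50)}
for n, size in sizes.items():
    print(f"{n} qubits: {size / GIB:,.3f} GiB")
```

Each additional qubit doubles the requirement, so ten more qubits multiply it by 1024.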
15.3 Introduction to Cirq and PennyLane
Cirq: Hardware-Aware Circuit Design
Cirq, developed by Google Quantum AI, emphasizes hardware-aware circuit construction. Qubits are defined with grid coordinates matching physical chip layout, circuits are organized into parallel "moments," and Google's native gates (including the Sycamore gate) are first-class citizens.
import cirq
q0, q1 = cirq.GridQubit(0, 0), cirq.GridQubit(0, 1)
circuit = cirq.Circuit([
    cirq.Moment([cirq.H(q0)]),
    cirq.Moment([cirq.CNOT(q0, q1)]),
    cirq.Moment([cirq.measure(q0, q1, key='result')])
])
result = cirq.Simulator().run(circuit, repetitions=1024)
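The idea of a moment, a set of gates that act on disjoint qubits and can therefore run at the same time, can be illustrated without Cirq. The sketch below is a simple greedy "as early as possible" scheduler written in plain Python; it is not Cirq's implementation, and `schedule_into_moments` is a hypothetical name.

```python
def schedule_into_moments(gates):
    """Place each gate in the earliest time slice where all its qubits are free.

    gates: list of (name, qubits) tuples in program order.
    Returns a list of moments, each a list of gates on disjoint qubits.
    """
    moments = []
    next_free = {}  # qubit -> index of the first moment in which it is idle
    for name, qubits in gates:
        t = max((next_free.get(q, 0) for q in qubits), default=0)
        while len(moments) <= t:
            moments.append([])
        moments[t].append((name, qubits))
        for q in qubits:
            next_free[q] = t + 1
    return moments

gates = [
    ("H", (0,)),
    ("H", (2,)),
    ("CNOT", (0, 1)),
    ("CNOT", (2, 3)),
    ("CNOT", (1, 2)),
]
moments = schedule_into_moments(gates)
for t, moment in enumerate(moments):
    print(t, moment)
```

The five gates fit into three moments: both Hadamards run together, then the two independent CNOTs, then the CNOT that depends on both.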
The Sycamore gate combines an iSWAP-like interaction with a controlled phase. Try it:
PennyLane: Differentiable Quantum Computing
PennyLane (Xanadu) treats quantum circuits as differentiable programs. Circuits are wrapped as QNodes that support automatic differentiation via the parameter-shift rule, enabling gradient-based optimization of variational circuits. Version 0.44 (2026) added integration with the Munich Quantum Toolkit for advanced compilation.
import pennylane as qml
from pennylane import numpy as np  # PennyLane's NumPy wrapper marks arrays as trainable
dev = qml.device("default.qubit", wires=2)
@qml.qnode(dev)
def circuit(params):
    qml.RY(params[0], wires=0)
    qml.CNOT(wires=[0, 1])
    qml.RZ(params[1], wires=1)
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))
grad_fn = qml.grad(circuit)
gradients = grad_fn(np.array([0.5, 0.3], requires_grad=True))
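The parameter-shift rule itself is easy to check by hand on a one-parameter example. The sketch below does not use PennyLane; it assumes the simplest case, a single $R_y(\theta)$ on $|0\rangle$ followed by a $Z$ measurement, for which $\langle Z\rangle = \cos\theta$. The function names are illustrative.

```python
import math

def expval_z(theta):
    """<Z> after RY(theta) on |0>: P(0) - P(1) = cos(theta)."""
    p0 = math.cos(theta / 2) ** 2
    p1 = math.sin(theta / 2) ** 2
    return p0 - p1

def parameter_shift(f, theta):
    """Gradient from two shifted evaluations, using the pi/2 shift for rotation gates."""
    shift = math.pi / 2
    return (f(theta + shift) - f(theta - shift)) / 2

theta = 0.5
grad = parameter_shift(expval_z, theta)
exact = -math.sin(theta)   # analytic derivative of cos(theta)
print(grad, exact)
```

Unlike a finite-difference estimate, the two evaluations are far apart and the result is exact, which is why the rule works on noisy hardware where small differences would be swamped.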
A variational ansatz in the sandbox with parametric rotations and entangling layers:
Framework Comparison
| Feature | Qiskit | Cirq | PennyLane |
|---|---|---|---|
| Developer | IBM | Google | Xanadu |
| Focus | General-purpose | Hardware-aware | Differentiable QC |
| Native hardware | IBM Heron/Nighthawk | Google Sycamore/Willow | Multi-backend |
| Gradient support | Via qiskit-algorithms | Manual | Built-in |
| Version (2026) | 2.2 | 1.4+ | 0.44 |
The frameworks are increasingly interoperable. PennyLane executes on IBM and Google hardware via plugins. Qiskit exports OpenQASM that Cirq can import. Amazon Braket wraps hardware from multiple vendors. Choose the framework that fits your workflow.
15.4 Running on Real Hardware
Running on real hardware is fundamentally different from simulation. Simulators give exact probabilities; real processors give noisy samples. The typical IBM submission workflow:
from qiskit_ibm_runtime import QiskitRuntimeService, SamplerV2
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager
service = QiskitRuntimeService(channel="ibm_quantum_platform")  # older qiskit-ibm-runtime releases used channel="ibm_quantum"
backend = service.least_busy(operational=True, simulator=False)
pm = generate_preset_pass_manager(optimization_level=2, backend=backend)
transpiled = pm.run(qc)  # qc: the Bell circuit built in Section 15.2 (classical register "c")
sampler = SamplerV2(mode=backend)
job = sampler.run([transpiled], shots=4096)
counts = job.result()[0].data.c.get_counts()
Shot Noise
Even a perfect quantum computer produces probabilistic results. For an outcome with probability $p$ measured over $N$ shots, the standard error is $\Delta p = \sqrt{p(1-p)/N}$, so the relative uncertainty is $\Delta p / p = \sqrt{(1-p)/(Np)} \approx 1/\sqrt{Np}$ for small $p$. Run the Bell state below multiple times with 1024 shots - each count will fluctuate around 512 with standard deviation ~16, even with no hardware noise.
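The spread of about 16 counts can be reproduced classically, since an ideal Bell measurement over 1024 shots behaves like 1024 fair coin flips. The following sketch uses only Python's standard library; the seed and the number of repeated experiments (400) are arbitrary choices made for illustration.

```python
import math
import random
import statistics

def sample_counts(p, shots, rng):
    """Number of shots (out of `shots`) that land on an outcome of probability p."""
    return sum(1 for _ in range(shots) if rng.random() < p)

rng = random.Random(2024)          # fixed seed only for repeatability
shots, p, trials = 1024, 0.5, 400

counts = [sample_counts(p, shots, rng) for _ in range(trials)]
mean = statistics.mean(counts)
spread = statistics.stdev(counts)
predicted = math.sqrt(shots * p * (1 - p))   # binomial standard deviation

print(f"mean={mean:.1f}  observed std={spread:.1f}  predicted std={predicted:.1f}")
```

Quadrupling the shot count doubles the absolute spread but halves the relative one, which is the $1/\sqrt{N}$ behaviour described above.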
Hardware Errors
Real processors suffer from four main error types:
- Gate errors. Two-qubit gates have 0.1-1% error rates; single-qubit gates are 10-100x more reliable. IBM Heron r2 achieves ~0.3% two-qubit error.
- Measurement errors. A $|0\rangle$ might be read as 1 with 0.5-2% probability.
- Decoherence. Qubits lose quantum properties over time. $T_1$ (relaxation) and $T_2$ (dephasing) are typically 100-300 microseconds for superconducting qubits.
- Crosstalk. Operations on one qubit inadvertently affect neighbors.
Seeing unexpected outcomes does not mean your circuit is wrong. On real hardware, noise
always produces some incorrect outcomes. If you expect 50/50 between 00 and
11 but get 47/46/4/3%, that is typical noise, not a bug.
Error Mitigation
Before full quantum error correction (Chapters 17-20), several mitigation techniques help: measurement error mitigation (characterize and invert the confusion matrix), zero-noise extrapolation (run at amplified noise levels, extrapolate to zero), probabilistic error cancellation (represent ideal circuit as combination of noisy circuits), and dynamical decoupling (insert pulse sequences during idle periods to suppress low-frequency noise).
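As a concrete illustration of the first technique, measurement error mitigation for a single qubit reduces to inverting a 2x2 confusion matrix. The sketch below is a minimal version under stated assumptions: the two readout error rates are made-up example values, they are taken to be known exactly from calibration, and `mitigate_readout` is a hypothetical helper rather than a library function.

```python
def mitigate_readout(observed, e01, e10):
    """Invert the single-qubit confusion matrix.

    observed: (P_obs(0), P_obs(1))
    e01: probability of reading 1 when |0> was prepared
    e10: probability of reading 0 when |1> was prepared
    """
    # The confusion matrix maps ideal probabilities to observed ones:
    #   [p0_obs]   [1 - e01    e10  ] [p0]
    #   [p1_obs] = [  e01    1 - e10] [p1]
    a, b = 1 - e01, e10
    c, d = e01, 1 - e10
    det = a * d - b * c
    p0_obs, p1_obs = observed
    p0 = (d * p0_obs - b * p1_obs) / det
    p1 = (-c * p0_obs + a * p1_obs) / det
    return p0, p1

# Example values (assumptions): an ideal 50/50 state read out with asymmetric errors
ideal = (0.5, 0.5)
e01, e10 = 0.01, 0.03
observed = (
    (1 - e01) * ideal[0] + e10 * ideal[1],
    e01 * ideal[0] + (1 - e10) * ideal[1],
)
recovered = mitigate_readout(observed, e01, e10)
print(observed, recovered)
```

With finite shots the observed probabilities are themselves noisy, so the inverted values can fall slightly outside the range 0 to 1; practical implementations usually project back onto valid distributions.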
15.5 Transpilation and Circuit Optimization
Transpilation transforms an abstract circuit into one that runs on specific hardware. Three challenges drive it: gate decomposition, qubit routing, and optimization.
Gate Decomposition
Real processors implement only a small native gate set. IBM Heron uses {SX, RZ, CZ}. Any other gate must be decomposed. The Hadamard decomposes as $H = RZ(\pi/2) \cdot SX \cdot RZ(\pi/2)$, up to a global phase. A Toffoli requires 6 CNOTs plus single-qubit gates:
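The Hadamard identity can be confirmed by multiplying the three 2x2 matrices. The product equals $H$ only up to a global phase, which has no observable effect, so the sketch below divides that phase out before comparing. It assumes the usual conventions $RZ(\theta) = \mathrm{diag}(e^{-i\theta/2}, e^{i\theta/2})$ and $SX = \sqrt{X}$; the helper names are illustrative.

```python
import cmath
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def rz(angle):
    """RZ(angle) = diag(exp(-i*angle/2), exp(+i*angle/2))."""
    return [[cmath.exp(-1j * angle / 2), 0], [0, cmath.exp(1j * angle / 2)]]

SX = [[0.5 + 0.5j, 0.5 - 0.5j], [0.5 - 0.5j, 0.5 + 0.5j]]   # square root of X
r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]

product = matmul(rz(math.pi / 2), matmul(SX, rz(math.pi / 2)))

# Read the global phase off one entry, then compare every entry against phase * H
phase = product[0][0] / H[0][0]
same_up_to_phase = all(
    abs(product[i][j] - phase * H[i][j]) < 1e-12 for i in range(2) for j in range(2)
)
print(same_up_to_phase, phase)
```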
Qubit Routing and SWAP Networks
Most processors lack all-to-all connectivity. If your circuit needs a CNOT between non-adjacent qubits, the transpiler inserts SWAP gates (each costing 3 CNOTs). The SWAP decomposition:
Verify with the native SWAP gate:
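Because CNOT only permutes computational basis states, the three-CNOT construction can be checked classically on all four inputs; linearity then extends the result to superpositions. This plain-Python sketch uses hypothetical helper names and represents a basis state as a tuple of bits.

```python
def cnot(bits, control, target):
    """Classical action of CNOT on a basis state: flip target if control is 1."""
    bits = list(bits)
    if bits[control]:
        bits[target] ^= 1
    return tuple(bits)

def swap_via_cnots(bits, a, b):
    """SWAP built from three alternating CNOTs."""
    bits = cnot(bits, a, b)
    bits = cnot(bits, b, a)
    bits = cnot(bits, a, b)
    return bits

# Truth table over all two-qubit basis states
table = {(x, y): swap_via_cnots((x, y), 0, 1) for x in (0, 1) for y in (0, 1)}
for inp, out in table.items():
    print(inp, "->", out)
```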
Optimization Levels
Qiskit offers four optimization levels trading compilation time for circuit quality:
| Level | Strategy | Use Case |
|---|---|---|
| 0 | Minimal: map and decompose only | Debugging |
| 1 | Light: basic gate cancellation | Prototyping |
| 2 | Medium: noise-aware layout, commutation (default since Qiskit 1.3) | Production |
| 3 | Heavy: unitary resynthesis, multiple trials | Best quality |
Optimization Patterns
Gate cancellation ($XX = I$, $HH = I$), rotation merging ($R_z(\alpha)R_z(\beta) = R_z(\alpha+\beta)$), and commutation analysis are core techniques. Verify gate cancellation:
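A toy peephole pass shows how the first two patterns work mechanically. This is a deliberately simplified sketch, not Qiskit's optimizer: the gate tuple format `(name, qubits, angle)` is an assumption made for the example, and only gates that are adjacent in the list are compared, which is conservative (it can miss cancellations but never applies a wrong one).

```python
SELF_INVERSE = {"x", "y", "z", "h", "cx", "cz", "swap"}

def peephole(gates):
    """Cancel adjacent self-inverse pairs and merge adjacent RZ rotations.

    gates: list of (name, qubits, angle) with angle=None for non-parametric gates.
    """
    out = []  # used as a stack so that cancellations can cascade
    for name, qubits, angle in gates:
        if out:
            prev_name, prev_qubits, prev_angle = out[-1]
            # G followed by G on the same qubits is the identity
            if name in SELF_INVERSE and (prev_name, prev_qubits) == (name, qubits):
                out.pop()
                continue
            # RZ(a) followed by RZ(b) on the same qubit is RZ(a + b)
            if name == "rz" and prev_name == "rz" and prev_qubits == qubits:
                out.pop()
                total = prev_angle + angle
                if abs(total) > 1e-12:
                    out.append(("rz", qubits, total))
                continue
        out.append((name, qubits, angle))
    return out

circuit = [
    ("h", (0,), None),
    ("x", (1,), None),
    ("x", (1,), None),    # cancels with the previous X
    ("h", (0,), None),    # now adjacent to the first H, so it cancels too
    ("rz", (0,), 0.3),
    ("rz", (0,), 0.4),    # merges into a single RZ
    ("cx", (0, 1), None),
]
optimized = peephole(circuit)
print(optimized)
```

Seven gates shrink to two. Real transpilers add commutation analysis on top, so that gates separated by operations on other qubits, or by gates they commute with, can also be brought together.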
Multi-Qubit Gate Decomposition Costs
| Gate | CNOTs | Notes |
|---|---|---|
| CNOT | 1 | Already two-qubit |
| CZ | 1 | CNOT + Hadamards |
| SWAP | 3 | Three CNOTs |
| iSWAP | 2 | Two CNOTs + single-qubit |
| Toffoli (CCX) | 6 | Standard decomposition |
| Fredkin (CSWAP) | 8 | Or 5 with Toffoli |
| General 2-qubit | $\leq 3$ | KAK decomposition |
The iSWAP gate, native to some Google processors, and the ECR gate, native to IBM:
Advanced Gate Patterns
Controlled-Hadamard (CH), controlled-SX (CSX), DCX (double-CNOT), and CCZ:
Two-qubit Ising rotation gates (RXX, RYY, RZZ) and controlled rotations (CRX, CRY, CRZ, CP) are essential for variational ansatze and phase estimation:
Building a Quantum Fourier Transform
The QFT, core of Shor's algorithm and phase estimation, uses Hadamard gates and controlled phase rotations with decreasing angles. Each controlled-phase decomposes to 2 CNOTs on hardware, making transpilation cost significant:
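A rough gate count makes the cost concrete. The sketch below assumes the textbook QFT layout (one Hadamard per qubit, one controlled phase per qubit pair, and optional final SWAPs to reverse the qubit order) together with the per-gate costs quoted in this chapter: 2 CNOTs per controlled phase and 3 per SWAP. It ignores routing, so the numbers are lower bounds on hardware with limited connectivity; `qft_costs` is a hypothetical helper.

```python
def qft_costs(n, include_swaps=True):
    """Gate counts for an n-qubit textbook QFT, assuming all-to-all connectivity."""
    hadamards = n
    controlled_phases = n * (n - 1) // 2      # one per pair of qubits
    swaps = n // 2 if include_swaps else 0    # final qubit-order reversal
    cnots = 2 * controlled_phases + 3 * swaps
    return {
        "hadamards": hadamards,
        "controlled_phases": controlled_phases,
        "swaps": swaps,
        "cnots": cnots,
    }

for n in (2, 4, 8, 16):
    print(n, qft_costs(n))
```

The CNOT count grows quadratically with the register size, which is why approximate QFTs that drop the smallest rotation angles are common in practice.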
15.6 Quantum Cloud Platforms
Quantum computers operate at millikelvin temperatures or under ultra-high vacuum - no one has one on their desk. Quantum computing is delivered as a cloud service.
IBM Quantum
IBM operates the largest public quantum fleet: 156-qubit Heron r2/r3, the newer 120-qubit Nighthawk optimized for lower errors, and a roadmap including the 1,386-qubit Kookaburra multi-chip processor. Native gates: {SX, RZ, CZ}. Qiskit 2.2 is the primary SDK. Free tier provides simulator and small processor access.
Amazon Braket
Braket provides multi-vendor access through AWS: superconducting (IQM Garnet/Emerald, Rigetti), trapped ion (IonQ Aria/Forte, AQT IBEX), and neutral atom (QuEra) processors. Includes on-demand simulators (SV1, DM1, TN1) and a free local simulator. In 2026, Braket expanded Qiskit integration for cross-platform job submission.
Azure Quantum
Microsoft's platform offers IonQ (25-qubit Aria, 36-qubit Forte) and Quantinuum (System Model H2) trapped-ion processors. The 2026 QDK update added chemistry-aware algorithms. Microsoft and Quantinuum are developing a 24-logical-qubit system.
Google Quantum AI
Google's Willow processor demonstrated below-threshold error correction in 2024. Access is primarily through research partnerships. Native gates include the Sycamore gate and $\sqrt{\text{iSWAP}}$.
Platform Comparison
| Feature | IBM Quantum | Amazon Braket | Azure Quantum | Google QAI |
|---|---|---|---|---|
| SDK | Qiskit | Braket SDK | QDK / Q# | Cirq |
| Hardware | IBM only (SC) | Multi-vendor | Multi-vendor | Google only (SC) |
| Max qubits | 156 | Varies | 36 (Forte) | 105 (Willow) |
| Connectivity | Heavy-hex | Varies | All-to-all (ions) | Grid |
| Free tier | Yes | Simulator only | Credits | Research |
Different qubit technologies have different strengths. Superconducting qubits (IBM, Google) offer fast gates (~tens of nanoseconds) but limited connectivity. Trapped ions (IonQ, Quantinuum) provide all-to-all connectivity and higher fidelity but slower gates (~microseconds). Neutral atoms (QuEra) can scale to hundreds of qubits with reconfigurable connectivity. The best platform depends on your algorithm's requirements.
Complete Workflow: Grover's Algorithm
This example traces an algorithm from idea to hardware-ready circuit. Grover's search for $|11\rangle$ among 4 items uses a CZ oracle and a diffusion operator:
One Grover iteration finds 11 with 100% probability for 4 items. On IBM Heron,
CZ is native, H decomposes to SX+RZ, and the full circuit needs ~2 two-qubit gates.
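The 100% success probability can be verified with a four-amplitude simulation in plain Python. This is a sketch, not the sandbox circuit: it assumes the common diffusion construction H, X, CZ, X, H on both qubits, which matches the count of two CZ gates in total, and the helper names are illustrative.

```python
def hadamard_all(state):
    """H on both qubits: the 4x4 Walsh-Hadamard transform."""
    return [
        sum((-1) ** bin(i & j).count("1") * state[j] for j in range(4)) / 2
        for i in range(4)
    ]

def cz(state):
    """CZ flips the sign of the |11> amplitude (index 3)."""
    return [state[0], state[1], state[2], -state[3]]

def x_all(state):
    """X on both qubits maps index i to 3 - i, i.e. reverses the amplitude list."""
    return state[::-1]

state = [1.0, 0.0, 0.0, 0.0]     # |00>
state = hadamard_all(state)      # uniform superposition over 4 items
state = cz(state)                # oracle: mark |11> with a phase flip

# Diffusion operator (reflection about the uniform state, up to a global sign)
state = hadamard_all(state)
state = x_all(state)
state = cz(state)
state = x_all(state)
state = hadamard_all(state)

probs = [a * a for a in state]   # amplitudes are real in this example
print(probs)
```

All of the probability ends up on index 3, the state $|11\rangle$, after a single iteration.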
A more complex example: quantum teleportation, transferring a state using entanglement:
Exercises
Exercise 15.1. Write a circuit that creates the 4-qubit GHZ state $\tfrac{1}{\sqrt{2}}(|0000\rangle + |1111\rangle)$. How many CNOTs does it need?
Exercise 15.2. Decompose SWAP using three CZ gates and Hadamard gates. Verify it swaps $|10\rangle$ to $|01\rangle$. (Hint: $\text{CX} = (I \otimes H) \cdot \text{CZ} \cdot (I \otimes H)$.)
Exercise 15.3. Implement a 2-qubit phase estimation circuit. Use an eigenstate $|1\rangle$ on the target, a controlled-T gate, and inverse QFT on the estimation register. What outcomes do you expect?
Exercise 15.4. Create the W state: $\tfrac{1}{\sqrt{3}}(|001\rangle + |010\rangle + |100\rangle)$. Unlike GHZ, the W state remains entangled if one qubit is lost. (Hint: use $R_y$ rotations and controlled gates.)
Exercise 15.5. Consider H on q[0], CNOT(q[0],q[1]), CZ(q[1],q[2]) on a linear chain q[0]-q[1]-q[2]. How many SWAPs are needed? What about triangular connectivity?
Exercise 15.6. Decompose H into IBM's native gates (RZ and SX only). Verify:
Exercise 15.7. Suppose a circuit requires a two-qubit gate between every pair of 5 qubits. Estimate the minimum number of SWAPs needed on a heavy-hex (max 3 neighbors) vs. a grid (max 4 neighbors) topology.
Exercise 15.8. Run this circuit and explain the measurement statistics: