Chapter 15: Programming Quantum Computers

Throughout this textbook, we have built up quantum computing from first principles: bits, qubits, gates, circuits, and algorithms. But how do you actually program a quantum computer? How does an abstract algorithm become instructions that run on real hardware? This chapter bridges theory and practice. We will explore the quantum software stack, write circuits in major frameworks, submit jobs to real processors in the cloud, and learn how transpilation maps ideal circuits onto physical hardware constraints.

15.1 The Quantum Software Stack

Classical computers have a well-defined software stack: application code compiles to machine instructions, which the CPU executes as electrical signals. Quantum computers have an analogous stack with five layers:

Algorithm layer. The high-level quantum algorithm (Shor's, Grover's, VQE) expressed in mathematical terms.
Circuit layer. The algorithm translated into gates acting on qubits, expressed in an intermediate representation like OpenQASM.
Transpilation layer. The abstract circuit adapted to hardware constraints: gate decomposition, qubit routing, and optimization.
Pulse layer. Native gates translated into timed microwave pulses (for superconducting qubits), laser pulses (for trapped ions), or other control signals.
Hardware layer. Physical execution and measurement. Results come back as bitstrings from many repeated "shots."

Key Concept.

The quantum software stack transforms an abstract algorithm into physical control signals through five layers: algorithm, circuit, transpilation, pulse, and hardware. Understanding each layer helps you write better programs and diagnose unexpected results.

OpenQASM: The Assembly Language of Quantum Computing

OpenQASM (Open Quantum Assembly Language) is the common intermediate representation across much of the quantum ecosystem. All sandbox circuits in this textbook use OpenQASM 3.0, so you have been writing quantum assembly language since Chapter 3. Here is a GHZ state circuit demonstrating the structure of an OpenQASM program:

You should see roughly 50% 000 and 50% 111. The Hadamard puts q[0] into superposition, and the CNOTs propagate it to create $\tfrac{1}{\sqrt{2}}(|000\rangle + |111\rangle)$.
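The 50/50 split can also be reproduced outside the sandbox with a few lines of plain Python. The sketch below is a minimal statevector simulation, not how any particular SDK works internally; the helper names `apply_h` and `apply_cx` are made up for this illustration, and q[0] is assumed to be the least significant bit of the basis-state index.

```python
import math

def apply_h(state, q):
    """Apply a Hadamard to qubit q of a statevector (list of complex amplitudes)."""
    s = 1 / math.sqrt(2)
    new = [0j] * len(state)
    for i, amp in enumerate(state):
        bit = (i >> q) & 1
        i0 = i & ~(1 << q)   # same basis index with qubit q set to 0
        i1 = i | (1 << q)    # same basis index with qubit q set to 1
        new[i0] += s * amp
        new[i1] += s * amp * (-1 if bit else 1)
    return new

def apply_cx(state, control, target):
    """Apply a CNOT: flip the target bit wherever the control bit is 1."""
    new = [0j] * len(state)
    for i, amp in enumerate(state):
        j = i ^ (1 << target) if (i >> control) & 1 else i
        new[j] = amp
    return new

n = 3
state = [0j] * (2 ** n)
state[0] = 1 + 0j                 # start in |000>
state = apply_h(state, 0)         # h q[0]
state = apply_cx(state, 0, 1)     # cx q[0], q[1]
state = apply_cx(state, 0, 2)     # cx q[0], q[2]

probs = {format(i, "03b"): round(abs(a) ** 2, 3)
         for i, a in enumerate(state) if abs(a) > 1e-12}
print(probs)  # {'000': 0.5, '111': 0.5}
```

These are exact probabilities; sampling a finite number of shots from them gives the "roughly 50%" counts seen in practice.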

15.2 Introduction to Qiskit

Qiskit is IBM's open-source quantum SDK and the most widely adopted quantum framework. Qiskit 2.0 (2025) rewrote core data structures in Rust for roughly 2x speedup in circuit construction and 20% faster transpilation, and standardized on the primitives-based execution model with Sampler and Estimator.

The Qiskit Workflow

A typical workflow has four stages: Build a circuit, Transpile it for a backend, Execute via primitives, and Analyze results. In Python:

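from qiskit import QuantumCircuit
from qiskit.primitives import StatevectorSampler

# Build: a Bell-state circuit with 2 qubits and 2 classical bits
qc = QuantumCircuit(2, 2)
qc.h(0)                      # put qubit 0 into superposition
qc.cx(0, 1)                  # entangle qubit 1 with qubit 0
qc.measure([0, 1], [0, 1])   # results land in the default classical register "c"

# Execute: the local statevector sampler needs no transpilation step
sampler = StatevectorSampler()
result = sampler.run([qc], shots=1024).result()

# Analyze: counts are keyed by bitstring
print(result[0].data.c.get_counts())  # e.g. {'00': ~512, '11': ~512}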

The equivalent OpenQASM circuit in our sandbox:

Gate Library and U3

The most general single-qubit gate is U3, parameterized by three Euler angles:

$$U(\theta, \phi, \lambda) = \begin{pmatrix} \cos(\theta/2) & -e^{i\lambda}\sin(\theta/2) \\ e^{i\phi}\sin(\theta/2) & e^{i(\phi+\lambda)}\cos(\theta/2) \end{pmatrix}$$

Every single-qubit gate is a special case, up to a global phase: H is $U(\pi/2, 0, \pi)$, X is $U(\pi, 0, \pi)$. The sandbox below shows U3 preparing a specific Bloch sphere state:
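As a cross-check of the matrix above, the following sketch builds $U(\theta, \phi, \lambda)$ directly from its definition and compares it with H and X. The function names `u3` and `close` are chosen for this example only. The last lines use the fact that $U|0\rangle$ is the first column of the matrix, $\cos(\theta/2)|0\rangle + e^{i\phi}\sin(\theta/2)|1\rangle$.

```python
import cmath
import math

def u3(theta, phi, lam):
    """Return the 2x2 matrix U(theta, phi, lambda) as nested lists."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [
        [c, -cmath.exp(1j * lam) * s],
        [cmath.exp(1j * phi) * s, cmath.exp(1j * (phi + lam)) * c],
    ]

def close(a, b, tol=1e-12):
    """Entry-wise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]
X = [[0, 1], [1, 0]]

print(close(u3(math.pi / 2, 0, math.pi), H))  # True
print(close(u3(math.pi, 0, math.pi), X))      # True

# State preparation: U(pi/3, pi/4, 0)|0> has P(1) = sin^2(pi/6) = 0.25
col0 = [row[0] for row in u3(math.pi / 3, math.pi / 4, 0)]
p1 = abs(col0[1]) ** 2
print(round(p1, 3))  # 0.25
```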

A comprehensive gate showcase using Clifford gates (H, S, X, Y, Z) and the non-Clifford T:

Simulation Limits.

Simulating $n$ qubits requires storing $2^n$ complex amplitudes. At 16 bytes per amplitude, 30 qubits need ~16 GB and 40 qubits need ~16 TB. This exponential scaling is precisely why we build quantum computers.
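The arithmetic behind these figures is short enough to check directly. This sketch assumes a dense statevector with one double-precision complex number (16 bytes) per amplitude and ignores any workspace a real simulator would need; `statevector_bytes` is a hypothetical helper name.

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory needed to store all 2**n amplitudes of an n-qubit state."""
    return (2 ** n_qubits) * bytes_per_amplitude

for n in (30, 40, 50):
    gib = statevector_bytes(n) / 2 ** 30
    print(f"{n} qubits: {gib:,.0f} GiB")
# 30 qubits: 16 GiB
# 40 qubits: 16,384 GiB
# 50 qubits: 16,777,216 GiB
```

Every additional qubit doubles the requirement, so ten more qubits cost a factor of 1024.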

15.3 Introduction to Cirq and PennyLane

Cirq: Hardware-Aware Circuit Design

Cirq, developed by Google Quantum AI, emphasizes hardware-aware circuit construction. Qubits are defined with grid coordinates matching physical chip layout, circuits are organized into parallel "moments," and Google's native gates (including the Sycamore gate) are first-class citizens.

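import cirq

# Qubits are addressed by (row, column) on the chip grid
q0, q1 = cirq.GridQubit(0, 0), cirq.GridQubit(0, 1)

# Each Moment holds operations that execute in the same time slice
circuit = cirq.Circuit([
    cirq.Moment([cirq.H(q0)]),
    cirq.Moment([cirq.CNOT(q0, q1)]),
    cirq.Moment([cirq.measure(q0, q1, key='result')])
])

# Sample the circuit 1024 times on the local simulator
result = cirq.Simulator().run(circuit, repetitions=1024)
print(result.histogram(key='result'))  # Counter keyed by integer outcome (0 = 00, 3 = 11)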
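To make the idea of moments concrete, here is a small sketch of how a flat list of gates can be packed into parallel time slices. It is a plain-Python illustration of greedy "earliest slot" packing, not Cirq's implementation; `schedule_moments` and the tuple format for operations are assumptions made for this example.

```python
def schedule_moments(ops):
    """Place each op in the earliest moment after the last one touching its qubits.

    ops is a list of (gate_name, qubits) tuples; returns a list of moments,
    where each moment is a list of ops acting on disjoint qubits.
    """
    moments = []
    last_used = {}  # qubit -> index of the last moment that touches it
    for name, qubits in ops:
        slot = max(last_used.get(q, -1) for q in qubits) + 1
        if slot == len(moments):
            moments.append([])
        moments[slot].append((name, qubits))
        for q in qubits:
            last_used[q] = slot
    return moments

ops = [
    ("H", (0,)), ("H", (2,)),
    ("CNOT", (0, 1)), ("CNOT", (2, 3)),
    ("CNOT", (1, 2)),
]
moments = schedule_moments(ops)
for i, moment in enumerate(moments):
    print(i, moment)
# 0 [('H', (0,)), ('H', (2,))]
# 1 [('CNOT', (0, 1)), ('CNOT', (2, 3))]
# 2 [('CNOT', (1, 2))]
```

Five gates fit into three moments because gates on disjoint qubits share a time slice. Circuit depth, not gate count, determines how long the qubits must stay coherent.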

The Sycamore gate combines an iSWAP-like interaction with a controlled phase. Try it:

PennyLane: Differentiable Quantum Computing

PennyLane (Xanadu) treats quantum circuits as differentiable programs. Circuits are wrapped as QNodes that support automatic differentiation via the parameter-shift rule, enabling gradient-based optimization of variational circuits. Version 0.44 (2026) added integration with the Munich Quantum Toolkit for advanced compilation.

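import pennylane as qml
from pennylane import numpy as np  # PennyLane's NumPy wrapper tracks trainable arrays

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def circuit(params):
    qml.RY(params[0], wires=0)
    qml.CNOT(wires=[0, 1])
    qml.RZ(params[1], wires=1)
    # <X0 X1> = sin(params[0]) * cos(params[1]) for this circuit;
    # <Z0 Z1> would equal 1 for all parameters and give a zero gradient
    return qml.expval(qml.PauliX(0) @ qml.PauliX(1))

grad_fn = qml.grad(circuit)
gradients = grad_fn(np.array([0.5, 0.3], requires_grad=True))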
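The parameter-shift rule itself can be illustrated without any quantum library. The sketch below uses the simplest possible case, a single $R_y(\theta)$ rotation followed by a measurement of $\langle Z \rangle$, for which the expectation value is $\cos\theta$. The function names are hypothetical, and the shift of $\pi/2$ with prefactor $1/2$ is the form of the rule that applies to standard Pauli rotation gates.

```python
import math

def expval_z_after_ry(theta):
    """<Z> for the state RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>."""
    a0, a1 = math.cos(theta / 2), math.sin(theta / 2)
    return a0 ** 2 - a1 ** 2

def parameter_shift(f, theta):
    """Gradient from two evaluations of the circuit at shifted parameters."""
    shift = math.pi / 2
    return (f(theta + shift) - f(theta - shift)) / 2

theta = 0.5
grad = parameter_shift(expval_z_after_ry, theta)
print(abs(grad - (-math.sin(theta))) < 1e-12)  # True: matches d/dtheta cos(theta)
```

Unlike a finite-difference estimate, the two shifted evaluations give the exact derivative, which is why the rule works on hardware where each evaluation is itself a noisy average over shots.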

A variational ansatz in the sandbox with parametric rotations and entangling layers:

Framework Comparison

Feature	Qiskit	Cirq	PennyLane
Developer	IBM	Google	Xanadu
Focus	General-purpose	Hardware-aware	Differentiable QC
Native hardware	IBM Heron/Nighthawk	Google Sycamore/Willow	Multi-backend
Gradient support	Via qiskit-algorithms	Manual	Built-in
Version (2026)	2.2	1.4+	0.44

Interoperability.

The frameworks are increasingly interoperable. PennyLane executes on IBM and Google hardware via plugins. Qiskit exports OpenQASM that Cirq can import. Amazon Braket wraps hardware from multiple vendors. Choose the framework that fits your workflow.

15.4 Running on Real Hardware

Running on real hardware is fundamentally different from simulation. Simulators give exact probabilities; real processors give noisy samples. The typical IBM submission workflow:

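from qiskit_ibm_runtime import QiskitRuntimeService, SamplerV2
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager

# Assumes credentials were saved beforehand with QiskitRuntimeService.save_account();
# the legacy "ibm_quantum" channel has been retired in favor of "ibm_quantum_platform"
service = QiskitRuntimeService(channel="ibm_quantum_platform")
backend = service.least_busy(operational=True, simulator=False)

# Transpile for the chosen backend; `circuit` is any measured QuantumCircuit,
# for example the Bell circuit `qc` from Section 15.2
pm = generate_preset_pass_manager(optimization_level=2, backend=backend)
transpiled = pm.run(circuit)

# Submit the job and wait for results; `.c` is the circuit's classical register name
sampler = SamplerV2(mode=backend)
job = sampler.run([transpiled], shots=4096)
counts = job.result()[0].data.c.get_counts()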

Shot Noise

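Even a perfect quantum computer produces probabilistic results. For an outcome with probability $p$ estimated from $N$ shots, the standard error is $\Delta p = \sqrt{p(1-p)/N}$, so the relative uncertainty is $\Delta p / p = \sqrt{(1-p)/(Np)}$, which reduces to $\approx 1/\sqrt{Np}$ for rare outcomes ($p \ll 1$). Run the Bell state below multiple times with 1024 shots: each count will fluctuate around 512 with standard deviation $\sqrt{1024 \cdot 0.5 \cdot 0.5} = 16$, even with no hardware noise.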
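The size of these fluctuations can be reproduced with a classical coin-flip model of an ideal Bell measurement, where each shot yields 00 with probability 0.5. This is a sketch under that assumption; the seed and the number of repeated runs are arbitrary choices, and `sample_counts` is a name invented here.

```python
import math
import random

def sample_counts(p, shots, rng):
    """Number of shots (out of `shots`) that land on an outcome of probability p."""
    return sum(1 for _ in range(shots) if rng.random() < p)

rng = random.Random(7)          # fixed seed so the experiment is repeatable
shots, p = 1024, 0.5

# Repeat the 1024-shot experiment many times and look at the spread of the counts
runs = [sample_counts(p, shots, rng) for _ in range(2000)]
mean = sum(runs) / len(runs)
std = math.sqrt(sum((c - mean) ** 2 for c in runs) / len(runs))

predicted = math.sqrt(shots * p * (1 - p))   # binomial standard deviation
print(predicted)                             # 16.0
print(round(mean, 1), round(std, 1))         # close to 512 and 16
```

Quadrupling the number of shots halves the relative uncertainty, which is the usual trade-off between run time and statistical precision.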

Hardware Errors

Real processors suffer from four main error types:

Gate errors. Two-qubit gates have 0.1-1% error rates; single-qubit gates are 10-100x more reliable. IBM Heron r2 achieves ~0.3% two-qubit error.
Measurement errors. A $|0\rangle$ might be read as 1 with 0.5-2% probability.
Decoherence. Qubits lose quantum properties over time. $T_1$ (relaxation) and $T_2$ (dephasing) are typically 100-300 microseconds for superconducting qubits.
Crosstalk. Operations on one qubit inadvertently affect neighbors.

Common Misconception.

Seeing unexpected outcomes does not mean your circuit is wrong. On real hardware, noise always produces some incorrect outcomes. If you expect a 50/50 split between 00 and 11 but measure 47% and 46%, with the remaining 4% and 3% landing on 01 and 10, that is typical noise, not a bug.

Error Mitigation

Before full quantum error correction (Chapters 17-20), several mitigation techniques help:

Measurement error mitigation. Characterize the readout confusion matrix and invert it.
Zero-noise extrapolation. Run the circuit at amplified noise levels and extrapolate to zero noise.
Probabilistic error cancellation. Represent the ideal circuit as a combination of noisy circuits.
Dynamical decoupling. Insert pulse sequences during idle periods to suppress low-frequency noise.
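Measurement error mitigation is the easiest of these to sketch. For one qubit, the confusion matrix $M$ has entries $M_{ij} = P(\text{read } i \mid \text{prepared } j)$, the observed distribution is $M$ times the true one, and mitigation applies $M^{-1}$. The calibration numbers below are hypothetical values chosen for illustration, and the helper names are not from any library.

```python
def invert_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mitigate(observed, confusion):
    """Apply the inverse confusion matrix to an observed [P(0), P(1)] vector."""
    inv = invert_2x2(confusion)
    return [
        inv[0][0] * observed[0] + inv[0][1] * observed[1],
        inv[1][0] * observed[0] + inv[1][1] * observed[1],
    ]

# Hypothetical calibration: P(read 1 | prepared 0) = 0.02, P(read 0 | prepared 1) = 0.05
confusion = [[0.98, 0.05],
             [0.02, 0.95]]

true_probs = [0.5, 0.5]
# What the noisy readout would report on average for the true distribution
observed = [
    confusion[0][0] * true_probs[0] + confusion[0][1] * true_probs[1],
    confusion[1][0] * true_probs[0] + confusion[1][1] * true_probs[1],
]
mitigated = mitigate(observed, confusion)

print([round(x, 6) for x in observed])   # [0.515, 0.485]
print([round(x, 6) for x in mitigated])  # [0.5, 0.5]
```

With finite shots the inverted result can contain small negative entries, so practical tools typically follow the inversion with a projection back onto valid probabilities. The matrix also grows as $2^n \times 2^n$ for $n$ qubits, which is why scalable variants assume errors factorize across qubits.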

15.5 Transpilation and Circuit Optimization

Transpilation transforms an abstract circuit into one that runs on specific hardware. Three challenges drive it: gate decomposition, qubit routing, and optimization.

Circuit:

OPENQASM 3.0;
include "stdgates.inc";
qubit[3] q;
bit[3] c;

h q[0];
cx q[0], q[1];
cx q[0], q[2];

c = measure q;

Gate Decomposition

Real processors implement only a small native gate set. IBM Heron uses {SX, RZ, CZ}. Any other gate must be decomposed. The Hadamard decomposes as $H = RZ(\pi/2) \cdot SX \cdot RZ(\pi/2)$, up to an unobservable global phase. A Toffoli requires 6 CNOTs plus single-qubit gates in the standard decomposition:
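The Hadamard identity can be checked numerically by multiplying the three native-gate matrices. The sketch assumes the usual conventions $RZ(\theta) = \mathrm{diag}(e^{-i\theta/2}, e^{i\theta/2})$ and $SX = \sqrt{X}$; the helper names are invented for this example.

```python
import cmath
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rz(theta):
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]

SX = [[(1 + 1j) / 2, (1 - 1j) / 2],
      [(1 - 1j) / 2, (1 + 1j) / 2]]
r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]

product = matmul(rz(math.pi / 2), matmul(SX, rz(math.pi / 2)))

# The two matrices should agree up to one overall complex factor of modulus 1
phase = product[0][0] / H[0][0]
same = all(abs(product[i][j] - phase * H[i][j]) < 1e-12
           for i in range(2) for j in range(2))

print(same, round(abs(phase), 6))              # True 1.0
print(round(cmath.phase(phase) / math.pi, 6))  # -0.25, i.e. a global phase of exp(-i*pi/4)
```

Because a global phase never affects measurement probabilities, the transpiler is free to ignore it when substituting the decomposition.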

Qubit Routing and SWAP Networks

Most processors lack all-to-all connectivity. If your circuit needs a CNOT between non-adjacent qubits, the transpiler inserts SWAP gates (each costing 3 CNOTs). The SWAP decomposition:

Verify with the native SWAP gate:
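The three-CNOT construction can also be checked classically. CNOT and SWAP are both permutations of the computational basis, so confirming the identity on the four basis states is enough to confirm it for every superposition. The function names below are illustrative only.

```python
def cx(bits, control, target):
    """CNOT on a tuple of classical bits: flip target if control is 1."""
    bits = list(bits)
    if bits[control]:
        bits[target] ^= 1
    return tuple(bits)

def swap_via_cnots(bits):
    """SWAP on qubits 0 and 1 built from three alternating CNOTs."""
    bits = cx(bits, 0, 1)
    bits = cx(bits, 1, 0)
    bits = cx(bits, 0, 1)
    return bits

# Check all four basis states: (a, b) must become (b, a)
for a in (0, 1):
    for b in (0, 1):
        assert swap_via_cnots((a, b)) == (b, a)

print(swap_via_cnots((1, 0)))  # (0, 1)
```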

Optimization Levels

Qiskit offers four optimization levels trading compilation time for circuit quality:

Level	Strategy	Use Case
0	Minimal: map and decompose only	Debugging
1	Light: basic gate cancellation	Prototyping
2	Medium: noise-aware layout, commutation (default since Qiskit 1.3)	Production
3	Heavy: unitary resynthesis, multiple trials	Best quality

Optimization Patterns

Gate cancellation ($XX = I$, $HH = I$), rotation merging ($R_z(\alpha)R_z(\beta) = R_z(\alpha+\beta)$), and commutation analysis are core techniques. Verify gate cancellation:
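A toy version of the first two patterns fits in a few lines. The sketch below is a simplified peephole pass written for this chapter, not Qiskit's optimizer: it only compares gates that are neighbors in the list, so it performs no commutation analysis, and the tuple format `(name, qubits, angle)` is an assumption of the example.

```python
SELF_INVERSE = {"x", "y", "z", "h", "cx", "cz", "swap"}

def peephole(ops):
    """Cancel adjacent self-inverse pairs and merge adjacent rz rotations.

    ops is a list of (name, qubits, angle) tuples; angle is None for fixed gates.
    """
    out = []  # used as a stack so that cancellations can cascade
    for name, qubits, angle in ops:
        if out and out[-1][0] == name and out[-1][1] == qubits:
            if name in SELF_INVERSE:
                out.pop()                      # G followed by G is the identity
                continue
            if name == "rz":
                total = out.pop()[2] + angle   # Rz(a) Rz(b) = Rz(a + b)
                if abs(total) > 1e-12:
                    out.append((name, qubits, total))
                continue
        out.append((name, qubits, angle))
    return out

circuit = [
    ("h", (0,), None),
    ("x", (1,), None), ("x", (1,), None),   # XX = I
    ("h", (0,), None),                      # becomes adjacent to the first H
    ("rz", (0,), 0.25), ("rz", (0,), 0.5),  # merge into rz(0.75)
    ("cx", (0, 1), None),
]
optimized = peephole(circuit)
print(optimized)  # [('rz', (0,), 0.75), ('cx', (0, 1), None)]
```

Seven gates shrink to two. Removing the X pair exposes the H pair, which is why optimizers apply such passes repeatedly until nothing changes.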

Multi-Qubit Gate Decomposition Costs

Gate	CNOTs	Notes
CNOT	1	Already two-qubit
CZ	1	CNOT + Hadamards
SWAP	3	Three CNOTs
iSWAP	2	Two CNOTs + single-qubit
Toffoli (CCX)	6	Standard decomposition
Fredkin (CSWAP)	8	Or 5 with Toffoli
General 2-qubit	$\leq 3$	KAK decomposition

The iSWAP gate, native to some Google processors, and the ECR gate, native to IBM:

Advanced Gate Patterns

Controlled-Hadamard (CH), controlled-SX (CSX), DCX (double-CNOT), and CCZ:

Two-qubit Ising rotation gates (RXX, RYY, RZZ) and controlled rotations (CRX, CRY, CRZ, CP) are essential for variational ansatze and phase estimation:

Building a Quantum Fourier Transform

The QFT, core of Shor's algorithm and phase estimation, uses Hadamard gates and controlled phase rotations with decreasing angles. Each controlled-phase decomposes to 2 CNOTs on hardware, making transpilation cost significant:
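To see how that cost grows, the sketch below lists the gates of a textbook $n$-qubit QFT and tallies CNOTs using the figures from this section: 2 per controlled-phase and 3 per SWAP. It assumes all-to-all connectivity, so routing SWAPs would add to these totals. Qubit-ordering conventions differ between frameworks, and `qft_gate_list` and `cnot_cost` are names made up for this example.

```python
import math

def qft_gate_list(n):
    """Gate sequence of an n-qubit QFT: H, controlled phases, final bit reversal."""
    ops = []
    for target in range(n):
        ops.append(("h", target))
        # Controlled-phase angles halve with each step: pi/2, pi/4, ...
        for k, control in enumerate(range(target + 1, n), start=1):
            ops.append(("cp", control, target, math.pi / 2 ** k))
    for i in range(n // 2):
        ops.append(("swap", i, n - 1 - i))
    return ops

def cnot_cost(ops):
    """CNOT count after decomposing each gate into CNOTs and single-qubit gates."""
    cost = {"h": 0, "cp": 2, "swap": 3}
    return sum(cost[op[0]] for op in ops)

for n in (3, 5, 10):
    ops = qft_gate_list(n)
    n_cp = sum(1 for op in ops if op[0] == "cp")
    print(n, n_cp, cnot_cost(ops))
# 3 3 9
# 5 10 26
# 10 45 105
```

The number of controlled-phase gates is $n(n-1)/2$, so the two-qubit gate count grows quadratically with register size.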

15.6 Quantum Cloud Platforms

Quantum computers operate at millikelvin temperatures or under ultra-high vacuum, so no one has one on their desk. Instead, quantum computing is delivered as a cloud service.

IBM Quantum

IBM operates the largest public quantum fleet: 156-qubit Heron r2/r3, the newer 120-qubit Nighthawk optimized for lower errors, and a roadmap including the 1,386-qubit Kookaburra multi-chip processor. Native gates: {SX, RZ, CZ}. Qiskit 2.2 is the primary SDK. Free tier provides simulator and small processor access.

Amazon Braket

Braket provides multi-vendor access through AWS: superconducting (IQM Garnet/Emerald, Rigetti), trapped ion (IonQ Aria/Forte, AQT IBEX), and neutral atom (QuEra) processors. Includes on-demand simulators (SV1, DM1, TN1) and a free local simulator. In 2026, Braket expanded Qiskit integration for cross-platform job submission.

Azure Quantum

Microsoft's platform offers IonQ (25-qubit Aria, 36-qubit Forte) and Quantinuum (System Model H2) trapped-ion processors. The 2026 QDK update added chemistry-aware algorithms. Microsoft and Quantinuum are developing a 24-logical-qubit system.

Google Quantum AI

Google's Willow processor demonstrated below-threshold error correction in 2024. Access is primarily through research partnerships. Native gates include the Sycamore gate and $\sqrt{\text{iSWAP}}$.

Platform Comparison

Feature	IBM Quantum	Amazon Braket	Azure Quantum	Google QAI
SDK	Qiskit	Braket SDK	QDK / Q#	Cirq
Hardware	IBM only (SC)	Multi-vendor	Multi-vendor	Google only (SC)
Max qubits	156	Varies	36 (Forte)	105 (Willow)
Connectivity	Heavy-hex	Varies	All-to-all (ions)	Grid
Free tier	Yes	Simulator only	Credits	Research

Key Concept.

Different qubit technologies have different strengths. Superconducting qubits (IBM, Google) offer fast gates (~tens of nanoseconds) but limited connectivity. Trapped ions (IonQ, Quantinuum) provide all-to-all connectivity and higher fidelity but slower gates (~microseconds). Neutral atoms (QuEra) can scale to hundreds of qubits with reconfigurable connectivity. The best platform depends on your algorithm's requirements.

Complete Workflow: Grover's Algorithm

This section traces an algorithm from idea to hardware-ready circuit. Grover's search for $|11\rangle$ among 4 items uses a CZ oracle and a diffusion operator:

On an ideal, noiseless device, one Grover iteration finds 11 with 100% probability for 4 items. On IBM Heron, CZ is native, H decomposes to SX+RZ, and the full circuit needs 2 two-qubit gates: one CZ in the oracle and one in the diffusion operator.
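The 100% figure follows from a short amplitude calculation, sketched here in plain Python. The oracle flips the sign of the marked amplitude, and the diffusion operator is modeled as a reflection of every amplitude about the mean, which matches the gate-level diffusion circuit up to a global sign. The function name is hypothetical.

```python
def grover_two_qubits(marked=3):
    """One Grover iteration on 2 qubits; returns outcome probabilities for 00..11."""
    amps = [0.5] * 4                      # H on both qubits: uniform superposition
    amps[marked] = -amps[marked]          # oracle: CZ flips the sign of |11> (index 3)
    mean = sum(amps) / 4
    amps = [2 * mean - a for a in amps]   # diffusion: reflect about the mean
    return [a * a for a in amps]

probs = grover_two_qubits()
print(probs)  # [0.0, 0.0, 0.0, 1.0]
```

After the oracle the mean amplitude is 0.25, so the three unmarked amplitudes reflect to exactly 0 and the marked one to 1. This exact cancellation is special to the 4-item case.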

A more complex example: quantum teleportation, transferring a state using entanglement:

Exercises

Exercise 15.1. Write a circuit that creates the 4-qubit GHZ state $\tfrac{1}{\sqrt{2}}(|0000\rangle + |1111\rangle)$. How many CNOTs does it need?

OPENQASM 3.0;
include "stdgates.inc";
qubit[4] q;
bit[4] c;

// Build a 4-qubit GHZ state
h q[0];
cx q[0], q[1];
cx q[1], q[2];
cx q[2], q[3];

c = measure q;

Exercise 15.2. Decompose SWAP using three CZ gates and Hadamard gates. Verify it swaps $|10\rangle$ to $|01\rangle$. (Hint: $\text{CX} = (I \otimes H) \cdot \text{CZ} \cdot (I \otimes H)$.)

Exercise 15.3. Implement a 2-qubit phase estimation circuit. Use an eigenstate $|1\rangle$ on the target, a controlled-T gate, and inverse QFT on the estimation register. What outcomes do you expect?

Exercise 15.4. Create the W state: $\tfrac{1}{\sqrt{3}}(|001\rangle + |010\rangle + |100\rangle)$. Unlike GHZ, the W state remains entangled if one qubit is lost. (Hint: use $R_y$ rotations and controlled gates.)

Exercise 15.5. Consider H on q[0], CNOT(q[0],q[1]), CZ(q[1],q[2]) on a linear chain q[0]-q[1]-q[2]. How many SWAPs are needed? What about triangular connectivity?

Exercise 15.6. Decompose H into IBM's native gates (RZ and SX only). Verify:

Exercise 15.7. Suppose a circuit requires a two-qubit gate between every pair of 5 qubits (all-to-all interactions). Estimate the minimum number of SWAPs needed on a heavy-hex topology (max 3 neighbors) versus a grid topology (max 4 neighbors).

Exercise 15.8. Run this circuit and explain the measurement statistics: