1 The NISQ Era
The term NISQ — Noisy Intermediate-Scale Quantum — was coined by John Preskill in 2018 to describe the quantum hardware available today and in the near future: devices with 50–1000 qubits that are too large to simulate classically yet too noisy to run full fault-tolerant quantum algorithms.
"We have entered a new era of quantum technology. The most advanced quantum computers now have more than 50 qubits — enough to be beyond the reach of exact classical simulation. But they are very noisy. The term 'NISQ' captures both aspects."
— John Preskill, 2018
1.1 Three Generations of Quantum Hardware
1.2 Coherence and Decoherence
The central physical challenge of NISQ hardware is decoherence — the loss of quantum coherence (superposition and entanglement) due to the qubit's interaction with its environment. Qubits are quantum systems that must remain isolated to maintain their quantum state, but perfect isolation is physically impossible.
Maintaining a quantum state is like trying to keep a bucket of water full when it has tiny holes in it.
- $T_1$ (The Leak): This is like water leaking out of the bucket. Your $\ket{1}$ state (energy) "leaks" back down to the $\ket{0}$ state (ground). Once it leaks, the energy is gone.
- $T_2$ (The Shakiness): This is like someone shaking the bucket. Even if the water doesn't leak out, the waves (relative phases) get all messed up and random. In quantum computing, shaking the phase is usually faster than leaking the energy, so $T_2$ is shorter than $T_1$.
For superconducting qubits: $T_1 \approx 100\,\mu\text{s}$, $T_2 \approx 50\,\mu\text{s}$. If a gate takes 100ns, you can only do about 500 operations before your "bucket" is empty or too shaky to use.
The physical picture: a qubit in state $\ket{1}$ can spontaneously lose energy to the environment (dropping to $\ket{0}$, governed by $T_1$), and the relative phase between $\ket{0}$ and $\ket{1}$ can randomise due to fluctuating electromagnetic fields (governed by $T_2$). Both processes are irreversible — they cannot be undone by applying a unitary gate.
Advanced: the Lindblad master equation (formal description of decoherence)
For readers familiar with density matrices ($\rho$): the evolution of an open quantum system is described by the Lindblad equation: $$\frac{d\rho}{dt} = -\frac{i}{\hbar}[H,\rho] + \sum_k \left(L_k \rho L_k^\dagger - \frac{1}{2}\{L_k^\dagger L_k, \rho\}\right)$$ where $[\cdot,\cdot]$ is the commutator, $\{\cdot,\cdot\}$ is the anti-commutator ($\{A,B\}=AB+BA$), and the $L_k$ are jump operators (Lindblad operators) representing noise channels: amplitude damping ($L_1 = \sqrt{\gamma_1}\ket{0}\bra{1}$, rate $\gamma_1 = 1/T_1$), and phase damping ($L_2 = \sqrt{\gamma_2/2}\sigma_z$, rate related to $T_2$). The first term is ordinary unitary evolution; the second term captures irreversible noise. Don't worry if this notation is unfamiliar — the density matrix formalism is covered in advanced quantum mechanics courses. The key takeaway for us is simply that $T_1$ and $T_2$ set the timescales for energy loss and phase randomisation, respectively.
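To make the jump-operator picture concrete, here is a minimal numerical sketch (plain NumPy with naive Euler integration; the helper `lindblad_step` and the step size are illustrative choices, not a production integrator). It evolves a qubit prepared in $\ket{1}$ under the amplitude-damping channel alone and recovers the expected $e^{-t/T_1}$ population decay:

```python
import numpy as np

def lindblad_step(rho, H, jump_ops, dt, hbar=1.0):
    """One Euler step of the Lindblad master equation."""
    drho = -1j / hbar * (H @ rho - rho @ H)
    for L in jump_ops:
        LdL = L.conj().T @ L
        drho += L @ rho @ L.conj().T - 0.5 * (LdL @ rho + rho @ LdL)
    return rho + dt * drho

T1 = 100e-6                       # relaxation time (s)
gamma1 = 1 / T1
ket0 = np.array([[1], [0]], dtype=complex)
ket1 = np.array([[0], [1]], dtype=complex)
L1 = np.sqrt(gamma1) * (ket0 @ ket1.conj().T)   # amplitude damping |0><1|
H = np.zeros((2, 2), dtype=complex)             # no coherent drive

rho = ket1 @ ket1.conj().T        # start in the excited state |1>
dt = 1e-7
for _ in range(1000):             # evolve for 100 us, i.e. one T1
    rho = lindblad_step(rho, H, [L1], dt)

# After one T1 the excited-state population should be ~ e^{-1} = 0.368
print(round(rho[1, 1].real, 3))   # → 0.368
```

Adding the phase-damping jump operator $L_2$ would additionally shrink the off-diagonal elements of $\rho$ on the $T_2$ timescale, without changing the populations.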
| Parameter | Physical meaning | Effect | Typical (IBM superconducting, 2024) |
|---|---|---|---|
| $T_1$ (relaxation) | Time for $\ket{1}\to\ket{0}$ spontaneous decay | Amplitude damping; sets lifetime of amplitude information | 100–500 μs |
| $T_2$ (dephasing) | Time for relative phase between $\ket{0}$ and $\ket{1}$ to randomise | Phase damping; always $T_2 \leq 2T_1$ (fundamental bound) | 50–300 μs |
| $t_{\text{gate}}$ (2-qubit) | Time to execute one CNOT/CZ gate | Circuit depth limit: $\approx T_2 / t_{\text{gate}}$ gates before dephasing | ~100–500 ns |
| Circuit depth limit | $T_2 / t_{\text{gate}}$ | Approx. max number of 2-qubit gates before SNR → 1 | ~500–3000 gates |
1.4 Interactive Decoherence Simulator
Observe how a qubit state loses its energy ($T_1$ relaxation towards $\ket{0}$) and loses its phase information ($T_2$ dephasing, shrinking to the Z-axis) over time. Adjust the $T_1$ and $T_2$ times and click "Play".
As a concrete illustration of why NISQ devices can't run deep quantum algorithms, consider running Grover's search on $n=20$ qubits ($N=2^{20} \approx 10^6$ elements). The ideal Grover circuit needs $k_\text{opt} = \lfloor\frac{\pi}{4}\sqrt{N}\rfloor \approx 804$ iterations.
Each Grover iteration on $n=20$ qubits requires roughly $2n = 40$ two-qubit gates (for the oracle and diffuser). With a typical 2-qubit gate error rate $\epsilon \approx 0.5\%$, errors accumulate rapidly.
After $k$ iterations the circuit has executed $\sim 40k$ two-qubit gates, so the expected number of errors is $\sim 40\,k\,\epsilon$ and the probability of an error-free run is $\approx (1-\epsilon)^{40k} \approx e^{-40k\epsilon}$. For $k=804$ and $\epsilon=0.005$: about $160$ expected errors, an error-free probability of $e^{-160} \approx 10^{-70}$. The output is essentially random noise, providing no advantage over a classical coin flip.
Even at $k=10$ iterations (far from optimal), the expected error count is $40 \times 10 \times 0.005 = 2$, so only $e^{-2} \approx 14\%$ of runs survive without a single gate error. Conclusion: Grover on 20 qubits is completely infeasible on current NISQ hardware. This is the fundamental motivation for VQAs: they use $O(1)$ or $O(\text{polylog}(n))$ depth circuits, keeping error accumulation manageable.
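The back-of-envelope budget can be checked in a few lines. The helper name `grover_success_budget`, the 40-gates-per-iteration figure, and the independent-gate-error model are the same rough assumptions as in the text; the code tallies the probability that a run finishes with no gate error at all:

```python
import math

def grover_success_budget(n_qubits, eps_2q, gates_per_iter=40):
    """Probability that an optimal-length Grover run survives error-free,
    assuming independent two-qubit gate errors (a crude but standard model)."""
    N = 2 ** n_qubits
    k_opt = math.floor(math.pi / 4 * math.sqrt(N))
    total_gates = k_opt * gates_per_iter
    p_survive = (1 - eps_2q) ** total_gates
    return k_opt, total_gates, p_survive

k, g, p = grover_success_budget(20, 0.005)
print(k)   # → 804 optimal iterations
print(g)   # → 32160 two-qubit gates
print(p)   # astronomically small survival probability
```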
1.3 DiVincenzo Criteria for Quantum Computing
In 2000, David DiVincenzo articulated five (later expanded to seven) criteria that any physical system must satisfy to be a viable quantum computer. These provide a systematic checklist against which all qubit platforms are evaluated.
| # | Criterion | Description & Challenge |
|---|---|---|
| 1 | Scalable with well-characterised qubits | The system must have a scalable Hilbert space and each qubit must be identifiable and characterisable (we must know the Hamiltonian). Challenge: cross-talk between qubits grows with scale. |
| 2 | Ability to initialise qubits to a simple reference state | Must reliably prepare $\ket{0^n}$ before every computation. Achieved by: active reset (measure-and-flip), thermalization (wait for $T_1$), or feedback control. Current fidelity: ~99.9%. |
| 3 | Long relevant decoherence times | $T_1, T_2$ must be much longer than gate operation times. The ratio $T_2/t_{\text{gate}}$ must be large enough for useful computation. Current best: ~$10^4$–$10^6$ for trapped ions. |
| 4 | Universal set of quantum gates | Any single-qubit gate + CNOT is sufficient for universality. Physical gate sets: rotations $R_x, R_y, R_z$ + CNOT (superconducting), or $R_z$ + CZ (Google). Gate fidelities: 99.9% single-qubit, ~99.5% two-qubit. |
| 5 | Qubit-specific measurement ability | Read out individual qubits without disturbing unmeasured qubits. Achieved via: dispersive readout (superconducting), fluorescence detection (trapped ions). Readout fidelities: 98–99.5%. |
2 Variational Quantum Algorithms (VQA)
Given that NISQ devices can only run shallow circuits reliably, the community developed a class of hybrid quantum-classical algorithms called Variational Quantum Algorithms (VQAs). The idea is elegant: use a shallow parameterised quantum circuit to prepare a trial quantum state (cheaply, on the noisy device), estimate a cost function from measurements (also on the quantum device), and then update the parameters using a classical optimiser (on a classical computer). The quantum device does what it's good at (state preparation and expectation values); the classical computer does what it's good at (parameter updates).
2.1 The Hybrid Loop — "Co-stewarding" the Problem
Think of a VQA as a constant conversation between a quantum device and a classical computer.
Imagine you are trying to tune an old-style radio to a very faint station. You have a knob (the quantum parameters $\theta$).
- Your hand turns the knob a little bit (the classical computer).
- You listen to hear if the static gets quieter (the quantum measurement).
You repeat this over and over until you find the perfect position. The quantum computer's job is to play the music (calculate the energy), but the classical computer's job is to do the "thinking" (turning the knob).
- Quantum device: prepares $| \psi(\theta) \rangle$ by "turning the knobs".
- Classical optimizer: decides how to turn the knobs next.
2.2 The Cost Function
A good VQA cost function must satisfy three properties:
- Faithful: The minimum of $C(\theta)$ corresponds to the solution of the problem. For VQE: $C(\theta) = \langle\psi(\theta)|H|\psi(\theta)\rangle$, minimised when $\ket{\psi(\theta)}$ is the ground state of $H$, by the variational principle.
- Estimable: $C(\theta)$ can be estimated efficiently from measurements on the quantum device. This typically requires expressing $H$ as a sum of Pauli strings and measuring each term separately.
- Trainable: The gradient $\nabla_\theta C(\theta)$ must be efficiently computable (not exponentially small — the barren plateau problem, discussed below).
2.3 Ansatz Design
An Ansatz (German for "approach") is the blueprint for your quantum circuit. It's a circuit with "empty slots" (the parameters $\theta$) that you haven't filled in yet.
Think of it like a recipe where you haven't decided the amount of salt or sugar yet. Your job is to find the perfect amounts ($\theta$) to make the dish (the quantum state) taste the best (have the lowest energy).
In quantum chemistry, the state of electrons in a molecule is described using creation ($a_i^\dagger$) and annihilation ($a_i$) operators. Think of them as "place" and "remove" operators for electrons:
- $a_i^\dagger$ creates an electron in orbital $i$: maps $\ket{0}_i \to \ket{1}_i$ (empty → occupied)
- $a_i$ annihilates an electron from orbital $i$: maps $\ket{1}_i \to \ket{0}_i$ (occupied → empty)
- $a_a^\dagger a_i$ moves an electron from orbital $i$ to orbital $a$ (a "single excitation")
- $a_a^\dagger a_b^\dagger a_j a_i$ moves two electrons at once (a "double excitation")
These operators obey anti-commutation relations (swapping two fermions picks up a minus sign), which is why mapping them to qubits requires special transforms (Jordan–Wigner, Bravyi–Kitaev — see §3.2 below).
Problem-Inspired Ansatz
Motivated by the physics/chemistry of the target problem. Example: UCCSD (Unitary Coupled Cluster with Singles and Doubles) for quantum chemistry — mimics the classical coupled cluster expansion in a unitary form.
$$U_{\text{UCCSD}} = e^{T-T^\dagger}, \qquad T = \sum_{ia}t_i^a\, a_a^\dagger a_i + \sum_{ijab}t_{ij}^{ab}\, a_a^\dagger a_b^\dagger a_j a_i$$
(Here $t_i^a, t_{ij}^{ab}$ are the trainable parameters: "how much to excite electrons between orbitals.")
Pro: Physically motivated, often works well.
Con: Deep circuit; may be too noisy for NISQ.
Hardware-Efficient Ansatz (HEA)
Layers of single-qubit rotations followed by native 2-qubit gates that match the hardware's connectivity graph. Uses only gates the device can execute natively and shallowly.
$U(\theta) = \prod_{l=1}^{L} \left[\bigotimes_i R_y(\theta_i^l) R_z(\theta_i^{'l})\right] \cdot \text{Entangling layer}$
Pro: Shallow, hardware-friendly.
Con: May not explore the right part of Hilbert space for the problem.
| Ansatz type | Depth | Parameters | Pros | Cons | Best for |
|---|---|---|---|---|---|
| UCCSD | $O(n^4)$ gates | $O(n^4)$ | Systematically improvable; chemical accuracy achievable | Extremely deep; intractable on NISQ for $n > 10$ | Small molecules on fault-tolerant hardware |
| Hardware-Efficient (HEA) | $O(nL)$ gates for $L$ layers | $O(nL)$ | Shallow; uses native hardware gates; NISQ-compatible | Not problem-inspired; barren plateaus at large $L$ | NISQ experiments; benchmarking |
| QAOA ansatz | $2p$ layers of $U_C, U_B$ | $2p$ | Problem-inspired; adiabatic limit at $p\to\infty$ | Many shots per layer; classical optimisation hard | Combinatorial optimisation (MaxCut, TSP) |
| Tensor Network (MPS, MERA) | $O(n\chi^2)$ for bond dim. $\chi$ | $O(n\chi^2)$ | Classically simulable; captures area-law entanglement | Limited entanglement; may miss volume-law states | 1D physics, strongly correlated systems |
| Quantum Convolutional (QCNN) | $O(\log n)$ | $O(\log n)$ | Avoids barren plateaus; exponentially few parameters | Limited expressibility; requires translationally invariant problems | Quantum phase recognition, image classification |
2.4 The Parameter-Shift Rule: Computing Gradients on Quantum Hardware
A key challenge: naive numerical derivatives (finite differences) are unreliable on a quantum computer, because the small difference between two noisy, shot-limited estimates is swamped by statistical error. The parameter-shift rule instead provides an exact gradient formula using only two circuit evaluations per parameter.
Imagine you are blindfolded on a mountain and want to know which way is "down." You can't see the slope directly, but you can take two steps.
- Take a step forward (+π/2) and measure the height.
- Take a step backward (-π/2) and measure the height.
The difference between those two heights tells you the exact slope at the center point where you started.
The Mathematical Catch: This works because for quantum gates with Pauli generators (eigenvalues $\pm 1/2$), the cost function is a simple sine wave. For a sine wave, you can always find the exact slope at any point by checking two points exactly $\pi$ apart.
The key insight: when the gate $U(\theta) = e^{-i\theta G}$ has a Pauli generator $G$ (with eigenvalues $\pm \frac{1}{2}$), the expectation value $C(\theta) = \langle\hat{O}\rangle_\theta$ is a sinusoidal function of $\theta$ — it has the form $C(\theta) = A + B\cos\theta + D\sin\theta$.
For any pure sinusoid $f(\theta) = A\sin(\theta + \phi)$, its derivative $f'(\theta) = A\cos(\theta+\phi) = A\sin(\theta+\phi+\frac{\pi}{2})$, so: $$f'(\theta) = \frac{f(\theta+\pi/2) - f(\theta-\pi/2)}{2}$$ This is the central difference formula, but for a sinusoid it is exact (no truncation error!). The parameter-shift rule is simply this fact applied to quantum circuits — classical finite differences are approximate for general functions, but quantum gates produce pure sinusoids, so the "shift trick" gives exact gradients.
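Because the statement is exact, it is easy to verify numerically. The sketch below (plain NumPy statevector math, no quantum SDK) computes $C(\theta) = \bra{0}R_y(\theta)^\dagger Z\, R_y(\theta)\ket{0} = \cos\theta$ and checks that the shift rule reproduces the analytic derivative $-\sin\theta$ to machine precision:

```python
import numpy as np

def ry(theta):
    """Single-qubit rotation R_y(theta) = exp(-i theta Y / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

Z = np.diag([1.0, -1.0])
ket0 = np.array([1.0, 0.0])

def cost(theta):
    """C(theta) = <0| R_y(theta)^dag Z R_y(theta) |0> = cos(theta)."""
    psi = ry(theta) @ ket0
    return psi @ Z @ psi

theta = 0.7
grad_shift = (cost(theta + np.pi / 2) - cost(theta - np.pi / 2)) / 2
print(np.isclose(grad_shift, -np.sin(theta)))   # → True (exact, up to float error)
```

A finite-difference estimate with a small step would agree only approximately, and its error would blow up once shot noise is added; the $\pm\pi/2$ shift has no such truncation error.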
2.5 Classical Optimizers for VQAs
The classical optimizer takes gradient information (from the parameter-shift rule) and updates the parameters. The choice of optimizer significantly affects VQA performance.
- Gradient Descent (SGD): $\theta_{t+1} = \theta_t - \eta \nabla C(\theta_t)$. Simple but slow convergence; sensitive to learning rate $\eta$. Stochastic version (mini-batches of measurements) helps.
- SPSA (Simultaneous Perturbation Stochastic Approximation): Approximates the gradient using only 2 measurements (regardless of the number of parameters) by simultaneously perturbing all parameters randomly. Particularly useful when measurement cost is the bottleneck. $$\hat{g}_k(\theta) = \frac{C(\theta+c_k\Delta_k) - C(\theta-c_k\Delta_k)}{2c_k}\Delta_k^{-1}$$ where $\Delta_k$ is a random $\pm 1$ vector (the inverse $\Delta_k^{-1}$ is element-wise, and for $\pm 1$ entries equals $\Delta_k$ itself) and $c_k\to 0$.
- Quantum Natural Gradient (QNG): The quantum analogue of natural gradient descent, which uses the quantum geometric tensor (related to the Fubini–Study metric on the space of quantum states) as a preconditioning matrix: $$\theta_{t+1} = \theta_t - \eta \,g^{-1}(\theta_t)\,\nabla C(\theta_t)$$ where $g_{ij} = \text{Re}\left[\langle\partial_i\psi|\partial_j\psi\rangle - \langle\partial_i\psi|\psi\rangle\langle\psi|\partial_j\psi\rangle\right]$ is the quantum Fisher information metric. QNG is more expensive per step but often converges in far fewer iterations.
- Adam / RMSProp: Adaptive moment-based optimizers from classical ML, widely used in practice. Adam maintains moving averages of gradients and squared gradients, providing adaptive per-parameter learning rates.
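As an illustration, here is a bare-bones SPSA loop on a toy two-parameter quadratic "landscape". The helper name `spsa_minimise` and the gain constants are illustrative; the gain-schedule exponents 0.602 and 0.101 are the standard Spall recommendations. Note that each step costs exactly two cost evaluations, however many parameters there are:

```python
import numpy as np

rng = np.random.default_rng(0)

def spsa_minimise(cost, theta0, n_iter=200, a=0.2, c=0.1):
    """Minimal SPSA: two cost evaluations per step, regardless of dimension."""
    theta = np.asarray(theta0, dtype=float)
    for k in range(1, n_iter + 1):
        ck = c / k ** 0.101          # perturbation size, shrinks slowly
        ak = a / k ** 0.602          # step size, shrinks faster
        delta = rng.choice([-1.0, 1.0], size=theta.shape)  # random +-1 vector
        g_hat = (cost(theta + ck * delta)
                 - cost(theta - ck * delta)) / (2 * ck) * delta
        theta = theta - ak * g_hat
    return theta

# Toy "energy landscape" with its minimum at theta = (1, -2)
cost = lambda t: (t[0] - 1.0) ** 2 + (t[1] + 2.0) ** 2
theta_star = spsa_minimise(cost, [0.0, 0.0])
print(cost(theta_star))   # close to 0: near the minimum
```

Since the entries of `delta` are $\pm 1$, multiplying by `delta` is the same as the element-wise inverse $\Delta_k^{-1}$ in the formula above.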
3 Variational Quantum Eigensolver (VQE)
3.1 Quantum Chemistry Motivation
Imagine you are designing a new battery. To make it hold more charge, you need to know which molecules are the most stable. This depends on the Ground State Energy — the "comfiest" way for electrons to sit inside a molecule.
Classical computers can guess this for simple molecules, but as soon as the electrons start interacting strongly (like in a complex protein or a new metal alloy), the classical math becomes impossible. VQE is our quantum way of finding that "comfiest" energy level.
To find the ground state, we need the lowest energy of the second-quantised electronic Hamiltonian:
$$H = \sum_{pq} h_{pq}\, a_p^\dagger a_q + \frac{1}{2}\sum_{pqrs} h_{pqrs}\, a_p^\dagger a_q^\dagger a_r a_s$$
What do these terms mean?
- $a_i^\dagger, a_i$: These are the "creation" and "annihilation" operators (placing or removing an electron).
- $h_{pq}$: These are the energy costs calculated from classical physics (kinetic energy and nuclear attraction).
- The Hard Part: The second term (repulsion) is what makes this exponentially hard. Each electron's position depends on every other electron's position.
Why can't we just use a bigger supercomputer? Because every time you add one spin-orbital (one qubit) to your simulation, the dimension of the Hilbert space doubles.
For a small molecule like water (H₂O), with its 10 electrons distributed over 20 spin-orbitals, the relevant Hilbert space already has $\binom{20}{10} \approx 184{,}756$ dimensions. For larger molecules, the count quickly exceeds the number of atoms in the observable universe.
For any Hermitian Hamiltonian $H$ with ground state energy $E_0$ and normalised trial state $\ket{\psi(\theta)}$:
$$E_0 \le \langle\psi(\theta)|H|\psi(\theta)\rangle \quad \forall\,\theta$$
Equality holds if and only if $\ket{\psi(\theta)}$ is the ground state $\ket{\psi_0}$. Therefore, minimising $\langle H \rangle_\theta = \langle\psi(\theta)|H|\psi(\theta)\rangle$ over $\theta$ provides an upper bound on $E_0$, and the optimal $\theta^*$ gives the best approximation to $\ket{\psi_0}$ within the variational manifold defined by the ansatz.
Proof of the variational principle (3 lines of linear algebra)
Since $H$ is Hermitian, it has an orthonormal eigenbasis $\{\ket{E_n}\}$ with $H\ket{E_n} = E_n\ket{E_n}$ and $E_0 \le E_1 \le E_2 \le \cdots$. Expand the trial state: $\ket{\psi} = \sum_n c_n\ket{E_n}$ with $\sum_n|c_n|^2 = 1$ (normalisation).
Now compute the expectation value: $$\langle\psi|H|\psi\rangle = \sum_n |c_n|^2 E_n$$ Since $E_n \ge E_0$ for all $n$: $$\sum_n |c_n|^2 E_n \ge \sum_n |c_n|^2 E_0 = E_0 \underbrace{\sum_n |c_n|^2}_{=\,1} = E_0 \quad \square$$
Equality holds iff $c_n = 0$ for all $n \ne 0$, i.e., $\ket{\psi} = \ket{E_0}$. This is why VQE works: by minimising $\langle H\rangle$ over the parameter space, we are guaranteed to approach the true ground state energy from above.
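The bound is easy to sanity-check numerically: draw a random Hermitian matrix, compute its lowest eigenvalue, and confirm that no random trial state ever dips below it (a NumPy sketch; the 4×4 size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

# Random 4x4 Hermitian "Hamiltonian"
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (A + A.conj().T) / 2
E0 = np.linalg.eigvalsh(H)[0]          # exact ground state energy

# Every normalised trial state gives an energy >= E0
for _ in range(1000):
    psi = rng.normal(size=4) + 1j * rng.normal(size=4)
    psi /= np.linalg.norm(psi)
    energy = np.real(psi.conj() @ H @ psi)
    assert energy >= E0 - 1e-12        # the variational bound holds

print("variational bound verified")
```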
3.2 Fermion-to-Qubit Encodings
Qubits are spin-$1/2$ objects. Fermions (electrons) obey anti-commutation relations. We need a mapping from fermionic operators to qubit operators. There are three main encodings:
Jordan–Wigner (JW) Transform
Maps fermion operators directly to Pauli strings, preserving anti-commutation via a "string" of $Z$ operators: $$a_j^\dagger = \underbrace{Z\otimes\cdots\otimes Z}_{j-1}\otimes\frac{X-iY}{2}\otimes I\otimes\cdots$$
Pro: Conceptually simple; the occupation of orbital $j$ maps directly onto the state of qubit $j$.
Con: Operators become non-local (long Pauli strings) for high-index orbitals, which is bad for hardware with limited connectivity.
Bravyi–Kitaev (BK) Transform
A recursive encoding that balances locality between occupation numbers and parity information. Pauli string length: $O(\log N)$ instead of $O(N)$ for JW.
Pro: More hardware-efficient (shorter circuits).
Con: More complex to implement and understand.
Parity Mapping
Stores partial parity sums instead of occupation numbers, allowing two-qubit reduction via symmetry ($Z_2$ symmetries of the Hamiltonian). Reduces qubit count by 2.
Pro: Minimal qubit count.
Con: Less intuitive; requires careful symmetry analysis.
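A quick numerical check that the Jordan–Wigner strings really reproduce fermionic statistics. This builds the $2^n \times 2^n$ matrices explicitly (so it is only feasible for tiny $n$); the helper `jw_create` is an illustrative name:

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)

def jw_create(j, n):
    """Jordan-Wigner a_j^dagger on n qubits: j Z's, then (X - iY)/2."""
    ops = [Z] * j + [(X - 1j * Y) / 2] + [I2] * (n - j - 1)
    return reduce(np.kron, ops)

n = 3
for i in range(n):
    for j in range(n):
        ai_dag, aj_dag = jw_create(i, n), jw_create(j, n)
        ai = ai_dag.conj().T
        # Fermionic relations: {a_i, a_j^dag} = delta_ij, {a_i^dag, a_j^dag} = 0
        acomm = ai @ aj_dag + aj_dag @ ai
        target = np.eye(2 ** n) if i == j else np.zeros((2 ** n, 2 ** n))
        assert np.allclose(acomm, target)
        assert np.allclose(ai_dag @ aj_dag + aj_dag @ ai_dag, 0)

print("JW anticommutation relations verified")
```

The $Z$ string is exactly what supplies the minus sign when two fermions are swapped; dropping it would give bosonic (commuting) operators instead.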
3.3 The VQE Hybrid Loop — Step by Step
- Hartree-Fock (HF) Initial State: Prepare the reference state $\ket{\text{HF}}$ (a product state — the quantum chemistry starting approximation) by initialising certain qubits to $\ket{1}$ (representing occupied orbitals) and others to $\ket{0}$ (unoccupied).
- Apply Ansatz: Run the parameterised circuit $U(\theta)$ to prepare $\ket{\psi(\theta)} = U(\theta)\ket{\text{HF}}$.
- Pauli Decomposition: The Hamiltonian $H = \sum_k c_k P_k$ where each $P_k$ is a tensor product of Pauli operators. Measure $\langle P_k\rangle = \langle\psi(\theta)|P_k|\psi(\theta)\rangle$ for each $k$ by appropriate basis rotations.
- Estimate Energy: $E(\theta) = \sum_k c_k \langle P_k\rangle$.
- Compute Gradient: Apply the parameter-shift rule for each $\theta_i$: run two shifted circuits, compute $\partial E / \partial \theta_i$.
- Classical Update: Update $\theta$ using chosen optimizer (Adam, SPSA, QNG).
- Repeat until convergence ($|\Delta E| < \epsilon$).
```python
# Requires qiskit, qiskit-nature (0.6-style API) and pyscf
from qiskit_nature.second_q.drivers import PySCFDriver
from qiskit_nature.second_q.mappers import JordanWignerMapper
from qiskit_nature.second_q.circuit.library import UCCSD, HartreeFock
from qiskit.algorithms.minimum_eigensolvers import VQE
from qiskit.algorithms.optimizers import SPSA
from qiskit.primitives import Estimator

# 1. Define the molecule
driver = PySCFDriver(atom='H .0 .0 .0; H .0 .0 0.735', basis='sto3g')
problem = driver.run()

# 2. Map to qubit Hamiltonian (Jordan-Wigner)
mapper = JordanWignerMapper()
qubit_op = mapper.map(problem.second_q_ops()[0])

# 3. Set up ansatz and HF reference state
num_particles = problem.num_particles
num_spatial_orbitals = problem.num_spatial_orbitals
hf_state = HartreeFock(num_spatial_orbitals, num_particles, mapper)
ansatz = UCCSD(num_spatial_orbitals, num_particles, mapper,
               initial_state=hf_state)

# 4. Classical optimizer
optimizer = SPSA(maxiter=300)

# 5. Run VQE
estimator = Estimator()
vqe = VQE(estimator, ansatz, optimizer)
result = vqe.compute_minimum_eigenvalue(qubit_op)

# The qubit operator covers only the electronic energy; add the constant
# nuclear repulsion before comparing against the total FCI energy.
total_energy = result.eigenvalue.real + problem.nuclear_repulsion_energy
print(f"VQE ground state energy: {total_energy:.6f} Ha")
print(f"Exact ground state energy (FCI): -1.137270 Ha")
print(f"Error: {abs(total_energy - (-1.137270))*1000:.2f} mHa")
```
3.4 VQE Cost Landscape Visualizer
For a single-parameter VQE (e.g., $H_2$ with UCCSD giving one free parameter $\theta$), the cost function $E(\theta) = \langle\psi(\theta)|H|\psi(\theta)\rangle$ has a single-valley landscape. Below: drag to explore how the energy depends on $\theta$.
4 NISQ Challenges
4.1 Barren Plateaus — The Gradient Vanishing Problem
Imagine you are on a quest to find the lowest valley in a mountain range, but the entire world is covered in a thick, infinite mist.
- Everywhere you look, the ground feels perfectly flat.
- There might be a deep valley just 10 feet away, but because the slope is so tiny, your "compass" (the gradient) doesn't move.
In quantum computing, as you add more qubits, the "mist" gets thicker: the gradient's variance shrinks exponentially, as $O(1/2^n)$. This is the Barren Plateau: the optimizer gets lost because it can't feel which way is "down."
Geometrically: the cost function is like a nearly flat landscape with a tiny, exponentially deep well at the minimum. With exponentially small gradients, the optimizer cannot tell which direction to step.
Why does this happen? A sufficiently deep random parameterised circuit generates states that are, for all practical purposes, uniformly random in the exponentially large Hilbert space. (Technically, the circuit forms a unitary 2-design — it mimics drawing a unitary matrix uniformly at random from the group $U(2^n)$. The uniform distribution over this group is called the Haar measure, so the output state is called "Haar-random".) In such a uniformly random state, the expectation value of any observable is essentially constant ($\approx \mathrm{tr}(H)/2^n$) almost everywhere, and fluctuations are exponentially suppressed by the Hilbert space dimension $d=2^n$. In plain terms: after enough random rotations, you're equally likely to be anywhere in Hilbert space, so the landscape looks flat from every direction.
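The concentration phenomenon can be seen directly by sampling Haar-random states (approximated here by normalised complex Gaussian vectors, a standard trick) and watching the variance of a single-qubit observable collapse as roughly $1/2^n$. Since parameter-shift gradients are differences of such expectation values, their variance collapses the same way:

```python
import numpy as np

rng = np.random.default_rng(1)

def haar_state(n):
    """Haar-random pure state on n qubits (normalised complex Gaussian)."""
    v = rng.normal(size=2 ** n) + 1j * rng.normal(size=2 ** n)
    return v / np.linalg.norm(v)

def z0_expectation(psi, n):
    """<Z on qubit 0>: +1 weight on the first half of amplitudes, -1 on the rest."""
    probs = np.abs(psi) ** 2
    half = 2 ** (n - 1)
    return probs[:half].sum() - probs[half:].sum()

variances = {}
for n in [2, 4, 6, 8]:
    vals = [z0_expectation(haar_state(n), n) for _ in range(2000)]
    variances[n] = np.var(vals)
    print(n, round(variances[n], 4))   # variance shrinks roughly as 1/2^n
```

(The exact value for a Haar-random state is $1/(2^n+1)$, which the samples approach.) Several strategies are known to mitigate barren plateaus: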
- Layer-wise training: Train one layer at a time, fixing previously trained parameters. Avoids the exponential landscape of the full circuit.
- Problem-inspired ansatz: Use physically motivated circuits that don't span the full unitary group. Restricts search to a relevant subspace.
- Local cost functions: Use cost functions that depend on local (few-qubit) observables rather than global ones. Local costs avoid barren plateaus for shallow circuits.
- Warm starting: Initialise parameters near the Hartree-Fock or classically computed solution, rather than randomly.
- Quantum natural gradient: Better-conditioned descent in the space of quantum states (uses Fisher information metric).
Even without the expressibility/random-circuit issue, hardware noise introduces its own distinct barren plateau.
Practical implication: For a noise rate of $\epsilon=0.005$ (0.5%), $n=20$ qubits, and $L=10$ layers: gradient variance is suppressed by $e^{-0.005 \times 20 \times 10} = e^{-1} \approx 0.37$. At $L=100$: suppressed by $e^{-10} \approx 4.5\times10^{-5}$. This is why VQA circuits must be kept very shallow — not just for hardware depth constraints, but to preserve non-zero gradients. Error mitigation (ZNE, PEC) partially compensates, but cannot fully overcome the noise-induced barren plateau.
4.2 Interactive VQA Optimizer Simulator
The simulator below lets you run gradient descent on a toy cost landscape $C(\theta) = \cos\theta + 0.3\sin 2\theta$ — representative of a single-parameter VQE. Adjust the learning rate and noise level (simulating shot noise on gradient estimates via the parameter-shift rule) and watch the optimizer trajectory.
4.3 Measurement Efficiency
A Hamiltonian with $N_s$ spin-orbitals decomposes into $O(N_s^4)$ Pauli terms after Jordan-Wigner encoding. For $N_s = 100$: that's $10^8$ Pauli terms, each requiring many shots (measurements) to estimate. This is a severe practical bottleneck.
Pauli Grouping
Commuting Pauli operators can be measured simultaneously (in the same circuit run). Group the Hamiltonian terms into sets of mutually commuting Paulis. Reduces measurement cost from $O(N_s^4)$ to $O(N_s^3)$ or better with clever grouping.
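A minimal sketch of the grouping idea, using the simple qubit-wise commutation criterion (full commutation allows larger groups but needs more machinery); the helper names `qubitwise_commute` and `greedy_group` and the toy term list are illustrative:

```python
def qubitwise_commute(p, q):
    """Two Pauli strings qubit-wise commute if, on every qubit position,
    they are equal or at least one of them is the identity."""
    return all(a == b or a == 'I' or b == 'I' for a, b in zip(p, q))

def greedy_group(paulis):
    """Greedily pack Pauli strings into qubit-wise commuting groups,
    each of which can be measured in a single circuit setting."""
    groups = []
    for p in paulis:
        for g in groups:
            if all(qubitwise_commute(p, q) for q in g):
                g.append(p)
                break
        else:
            groups.append([p])
    return groups

# Toy 2-qubit Hamiltonian terms
terms = ['II', 'ZI', 'IZ', 'ZZ', 'XX', 'YY']
groups = greedy_group(terms)
print(groups)   # → [['II', 'ZI', 'IZ', 'ZZ'], ['XX'], ['YY']]
```

Six terms collapse into three measurement settings; on real chemistry Hamiltonians the savings are far larger.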
Shadow Tomography
A randomised protocol (Huang et al. 2020) that estimates $M$ expectation values from $O(\log M)$ random Clifford measurements. Exponential improvement in the number of observables vs standard tomography.
Shot Budget
Each Pauli expectation value requires $O(1/\epsilon^2)$ measurements to achieve precision $\epsilon$. For chemical accuracy ($\epsilon = 1.6\times 10^{-3}$ Ha), each term needs ~$4\times 10^5$ shots. Total: astronomically many shots for large molecules.
4.4 Hardware Noise and Error Mitigation
Unlike error correction (which removes errors entirely but requires many physical qubits per logical qubit), error mitigation reduces the impact of noise on expectation values without the full error-correction overhead. These techniques combine extra or modified circuit executions with classical post-processing of the measurement outcomes.
Zero-Noise Extrapolation (ZNE)
Run the circuit at amplified noise levels $\lambda_0 < \lambda_1 < \lambda_2$ (by stretching gate times or adding intentional noise), measure $E(\lambda_0), E(\lambda_1), E(\lambda_2)$, then extrapolate to zero noise $\lambda\to 0$. Linear extrapolation: $E(0) \approx \frac{\lambda_1 E(\lambda_0) - \lambda_0 E(\lambda_1)}{\lambda_1 - \lambda_0}$.
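The extrapolation step itself is ordinary curve fitting. Here is a toy version with a synthetic linear noise model standing in for real hardware data (the helper name `zne_linear` and the numbers are illustrative):

```python
import numpy as np

def zne_linear(lams, energies):
    """Fit E(lambda) = a + b*lambda and return the zero-noise intercept a."""
    b, a = np.polyfit(lams, energies, 1)   # polyfit returns [slope, intercept]
    return a

# Synthetic noise model: measured energy drifts linearly with the noise scale
E_ideal = -1.137
lams = np.array([1.0, 2.0, 3.0])           # noise amplification factors
measured = E_ideal + 0.05 * lams           # E(lambda) = E_ideal + 0.05*lambda
print(round(zne_linear(lams, measured), 3))   # → -1.137
```

On hardware the dependence on $\lambda$ is not exactly linear, so quadratic or exponential fits are also common; the trade-off is that higher-order extrapolation amplifies statistical noise in the fitted intercept.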
Probabilistic Error Cancellation (PEC)
Decompose the ideal (noise-free) operation as a quasi-probabilistic linear combination of noisy operations. Sample circuits from this decomposition and average the results (with appropriate signs). Overhead: $e^{2\gamma T}$ where $\gamma$ is the noise rate and $T$ is circuit depth — exponential in depth, but works for shallow circuits.
Readout Error Mitigation
Characterise the readout error matrix $\mathcal{M}$ (probability of reading $j$ given prepared state $i$) by preparing known states. Invert $\mathcal{M}$ to correct measured probability distributions. This is a simple and widely used technique.
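For a single qubit the correction is just a 2×2 linear solve; the matrix entries below are made-up but typical of reported readout fidelities:

```python
import numpy as np

# Readout confusion matrix M[j, i] = P(read j | prepared i), estimated by
# repeatedly preparing |0> and |1> (single-qubit example)
M = np.array([[0.97, 0.05],
              [0.03, 0.95]])

# Noisy measured distribution from some experiment
p_measured = np.array([0.60, 0.40])

# Correct by solving M @ p_true = p_measured. (For many qubits, constrained
# least squares is preferred, since a bare inverse can yield small negative
# probabilities.)
p_corrected = np.linalg.solve(M, p_measured)
print(np.round(p_corrected, 3))
```

Because the columns of $\mathcal{M}$ each sum to 1, the corrected vector automatically remains normalised.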
5 Quantum Generative Adversarial Networks (qGANs)
5.1 Classical GAN Recap
If you haven't encountered machine learning before: a neural network is a parameterised function (think of it like a parameterised quantum circuit, but classical — layers of simple operations with trainable weights). A GAN (Generative Adversarial Network) trains two such networks in a game:
- The generator tries to produce fake data that looks real (like a counterfeiter)
- The discriminator tries to tell real data from fake (like a detective)
They improve against each other until the generator's fakes are indistinguishable from real data. The quantum version replaces the classical generator with a quantum circuit — you don't need deep ML knowledge to follow the quantum part.
More formally, a classical GAN (Goodfellow et al., 2014) consists of two neural networks trained adversarially:
- Generator $G$: Takes random noise $z \sim p_z$ and outputs fake data $G(z)$ that tries to imitate the true data distribution $p_\text{data}$.
- Discriminator $D$: Takes a sample (real or fake) and outputs a probability that it's real. Tries to distinguish $p_\text{data}$ from $G(z)$.
They are trained with a minimax objective:
$$\min_G \max_D \; \mathbb{E}_{x\sim p_\text{data}}[\log D(x)] + \mathbb{E}_{z\sim p_z}[\log(1-D(G(z)))]$$
At equilibrium (so-called Nash equilibrium — neither player can improve by changing strategy), the generator perfectly replicates the data distribution and the discriminator cannot distinguish real from fake ($D = 1/2$ everywhere).
5.2 Quantum GAN Structure
A qGAN (Zoufal et al. 2019, IBM Research) replaces the classical generator with a parameterised quantum circuit — the quantum generator. The discriminator can be classical or quantum (most practical implementations use a classical neural network discriminator).
- Quantum Generator: A VQA circuit $G(\theta)$ that prepares a quantum state $\ket{\psi(\theta)}$. Measuring this state produces samples from a distribution $p_G(x;\theta)$ over bitstrings $x \in \{0,1\}^n$.
- Classical Discriminator $D_\phi$: A neural network (parameterised by $\phi$) that takes a sample (from either $p_\text{data}$ or $p_G$) and outputs a probability that it's real.
The training alternates:
- Fix $\theta$, update $\phi$ to maximise $D$'s ability to discriminate (classical gradient ascent on the discriminator objective).
- Fix $\phi$, update $\theta$ (using parameter-shift rule on the quantum circuit) to minimise the discriminator's success — i.e., make the generator's samples indistinguishable from real data.
The hypothesis is that quantum circuits can represent certain distributions (those arising from quantum physics — e.g., quantum state distributions, many-body correlation functions) more efficiently than classical neural networks. The quantum generator's distribution is inherently quantum mechanical; correlations can be exponentially complex to represent classically. Additionally, the quantum generator's Hilbert space grows exponentially with qubits — a 20-qubit generator represents distributions over $2^{20} \approx 10^6$ dimensional space with only $O(20)$ parameters.
5.3 Financial Application: Option Pricing
An option is a financial contract that gives you the right (but not the obligation) to buy an asset (say, a stock) at a pre-agreed price $K$ (the strike price) at some future date $T$. If the stock ends up worth more than $K$, you exercise the option and profit; if not, you walk away.
The big question: how much should this contract cost today? The answer depends on how likely the stock is to be worth more than $K$, which requires computing an expectation value over many possible future price scenarios. Classically, this is done via Monte Carlo simulation — generating millions of random price paths and averaging. The quantum speedup comes from replacing this sampling with quantum amplitude estimation, which achieves the same result with quadratically fewer samples.
More precisely: a European call option on an asset $S$ pays $\max(S_T - K, 0)$ at expiry $T$, where $K$ is the strike price and $S_T$ is the asset price at expiry. The option price is the discounted expected payoff: $$C = e^{-rT}\,\mathbb{E}[{\max(S_T - K,\, 0)}]$$ where $r$ is the risk-free rate (the return you'd get from a "safe" investment like government bonds — it sets the baseline for discounting future cash flows to their present value).
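For reference, the classical Monte Carlo estimate that QAE competes with takes only a few lines of NumPy. This is a plain sketch using the same market parameters as the text's example (spot 2.0, strike 2.0, volatility 0.4, rate 0.05, 40 days):

```python
import numpy as np

rng = np.random.default_rng(7)

# Market parameters matching the text's example
S, K, r, vol, T = 2.0, 2.0, 0.05, 0.4, 40 / 365

# Classical Monte Carlo: sample log-normal terminal prices, average the payoff
n_paths = 1_000_000
Z = rng.standard_normal(n_paths)
S_T = S * np.exp((r - 0.5 * vol**2) * T + vol * np.sqrt(T) * Z)
payoff = np.maximum(S_T - K, 0.0)
price_mc = np.exp(-r * T) * payoff.mean()
print(round(price_mc, 3))   # close to the Black-Scholes value (~0.11)
```

The statistical error here shrinks as $O(1/\sqrt{n_\text{paths}})$, i.e. $O(1/\epsilon^2)$ samples for precision $\epsilon$; QAE's $O(1/\epsilon)$ scaling is exactly the quadratic speedup claimed above.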
In the qGAN approach (Zoufal et al. 2019, Stamatopoulos et al. 2020):
- Train a qGAN to reproduce the log-normal return distribution of the underlying asset.
- Load the trained quantum generator's output state into an amplitude estimation circuit.
- Use Quantum Amplitude Estimation (QAE) — a quantum algorithm that estimates $\mathbb{E}[f(X)]$ for a quantum-prepared distribution $p(X)$ in $O(1/\epsilon)$ queries (vs $O(1/\epsilon^2)$ for Monte Carlo). This is a quadratic speedup.
import numpy as np
from qiskit_finance.circuit.library import LogNormalDistribution
from qiskit_finance.applications.estimation import EuropeanCallPricing
from qiskit.algorithms import IterativeAmplitudeEstimation
from qiskit.primitives import Sampler

# Asset model parameters (log-normal distribution)
S = 2.0        # spot price
vol = 0.4      # volatility
r = 0.05       # risk-free rate
T = 40 / 365   # time to expiry (40 days)
K = 2.0        # strike price

# Encode the log-normal distribution into a quantum circuit
num_uncertainty_qubits = 3               # 3 qubits → 8 discretisation points
mu = (r - 0.5 * vol**2) * T + np.log(S)  # parameters of ln(S_T)
sigma = vol * np.sqrt(T)
mean = np.exp(mu + sigma**2 / 2)
var = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)
low, high = 0, mean + 3 * np.sqrt(var)   # truncate three standard deviations out
distribution = LogNormalDistribution(num_uncertainty_qubits, mu=mu, sigma=sigma**2,
                                     bounds=(low, high))

# Set up the option pricing circuit
european_call = EuropeanCallPricing(num_state_qubits=num_uncertainty_qubits,
                                    strike_price=K,
                                    rescaling_factor=0.25,
                                    bounds=(low, high),
                                    uncertainty_model=distribution)
problem = european_call.to_estimation_problem()

# Quantum Amplitude Estimation for the expected payoff
iae = IterativeAmplitudeEstimation(epsilon_target=0.01, alpha=0.05,
                                   sampler=Sampler())
result = iae.estimate(problem)
payoff = european_call.interpret(result)  # undo the payoff rescaling
price = payoff * np.exp(-r * T)           # discount to present value
print(f"Estimated option price: {price:.4f}")
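As a classical sanity check (a sketch, independent of the quantum pipeline above), the same option can be priced by plain Monte Carlo under the same geometric-Brownian-motion model — this is exactly the sampling that QAE aims to beat quadratically:

```python
import numpy as np

rng = np.random.default_rng(7)
S, vol, r, T, K = 2.0, 0.4, 0.05, 40 / 365, 2.0   # same parameters as above

# Sample terminal prices S_T under geometric Brownian motion
n_paths = 1_000_000
Z = rng.standard_normal(n_paths)
S_T = S * np.exp((r - 0.5 * vol**2) * T + vol * np.sqrt(T) * Z)

# Discounted average payoff — the classical estimator
payoff = np.maximum(S_T - K, 0.0)
price = np.exp(-r * T) * payoff.mean()
print(f"Monte Carlo price: {price:.4f}")   # ≈ 0.11 for these parameters
```

The Monte Carlo standard error shrinks as $O(1/\sqrt{N})$, i.e. $O(1/\epsilon^2)$ samples for precision $\epsilon$, which is the baseline the $O(1/\epsilon)$ QAE scaling improves upon.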
5.4 QAOA and the Full VQA Family
The Quantum Approximate Optimisation Algorithm (QAOA), introduced by Farhi, Goldstone, and Gutmann (2014), is the leading VQA for combinatorial optimisation problems — finding the assignment of binary variables $x \in \{0,1\}^n$ that optimises a cost function $C(x)$ (for MaxCut below, maximising it).
QAOA alternates between two unitaries for $p$ layers:
- Cost unitary: $U_C(\gamma) = e^{-i\gamma C}$ — applies the cost Hamiltonian (encodes the problem)
- Mixing unitary: $U_B(\beta) = e^{-i\beta B}$ where $B = \sum_j X_j$ — explores the solution space
The $2p$ parameters $(\gamma_1,\ldots,\gamma_p,\beta_1,\ldots,\beta_p)$ are classically optimised to maximise $\langle C\rangle$. As $p\to\infty$, QAOA converges to the true optimal solution (it can reproduce adiabatic evolution in this limit). For finite $p$, it produces an approximation with a provable approximation ratio for some problems (e.g., MaxCut on 3-regular graphs: ratio $\ge 0.6924$ at $p=1$).
MaxCut: given a graph $G=(V,E)$, partition vertices into two sets to maximise the number of edges between them. Cost Hamiltonian: $$C = \frac{1}{2}\sum_{(i,j)\in E}(I - Z_i Z_j)$$
But why does this formula count cut edges? Let's check. Each vertex $i$ is assigned to partition 0 or 1, which we encode as $Z_i = +1$ (for $\ket{0}$) or $Z_i = -1$ (for $\ket{1}$). Then for each edge $(i,j)$:
- Same partition ($Z_i = Z_j$): $Z_iZ_j = +1$, so $\frac{1}{2}(I-Z_iZ_j) = \frac{1}{2}(1-1) = 0$ — edge is not cut.
- Different partitions ($Z_i \neq Z_j$): $Z_iZ_j = -1$, so $\frac{1}{2}(I-Z_iZ_j) = \frac{1}{2}(1-(-1)) = 1$ — edge is cut. ✓
So $C$ simply counts the number of edges between the two partitions. Maximising $C$ = maximising the cut! QAOA applies $e^{-i\gamma Z_i Z_j/2}$ for each edge (a ZZ-rotation) and $e^{-i\beta X_j}$ for each vertex (an X-rotation). After measurement, the bit string $x \in \{0,1\}^n$ is a candidate cut — the QAOA distribution concentrates on high-quality cuts.
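This bookkeeping is easy to verify numerically. The sketch below (using a small illustrative graph of our own choosing) checks that the $ZZ$ formula agrees with a direct edge count on every assignment:

```python
from itertools import product

# A small test graph: a triangle (0,1,2) plus a pendant vertex 3
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
n = 4

def cut_value(bits):
    """Directly count edges whose endpoints land in different partitions."""
    return sum(1 for i, j in edges if bits[i] != bits[j])

def cost_from_zz(bits):
    """Evaluate C = (1/2) * sum_{(i,j)} (1 - z_i z_j), with z = +1 for bit 0, -1 for bit 1."""
    z = [1 - 2 * b for b in bits]
    return sum(0.5 * (1 - z[i] * z[j]) for i, j in edges)

# The two formulas agree on all 2^n assignments
assert all(cut_value(b) == cost_from_zz(b) for b in product([0, 1], repeat=n))

best = max(product([0, 1], repeat=n), key=cut_value)
print(best, cut_value(best))   # the maximum cut of this graph has 3 edges
```

Brute force over all $2^n$ bit strings is of course only feasible for tiny $n$ — which is precisely why QAOA's sampling of high-quality cuts is interesting at scale.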
The VQA family covers a wide range of problem domains:
| Algorithm | Problem Domain | Ansatz/Structure | Key Reference |
|---|---|---|---|
| VQE | Quantum chemistry, materials science | UCCSD or hardware-efficient; energy expectation | Peruzzo et al. 2014 |
| QAOA | Combinatorial optimisation (MaxCut, TSP, scheduling) | Alternating cost $U_C$ + mixer $U_B$, $p$ layers | Farhi et al. 2014 |
| qGAN | Generative modelling, Monte Carlo, finance | Quantum generator + classical discriminator | Zoufal et al. 2019 |
| VQC (Classifier) | Quantum machine learning, classification | Data re-uploading layers + measurement for labels | Pérez-Salinas et al. 2020 |
| Quantum CNN | Image classification, pattern recognition | Local unitary layers + pooling (trace-out) layers | Cong et al. 2019 |
| VQLS | Linear systems $Ax=b$ | Hadamard test + variational preparation of solution | Bravo-Prieto et al. 2023 |
6 Quantum Error Correction (QEC)
Variational algorithms (VQE, QAOA, qGANs) try to work with noise — using shallow circuits and error mitigation to reduce its effects. But for fault-tolerant quantum computing (Shor's algorithm, full quantum simulation), we need to eliminate errors entirely — or rather, suppress them to arbitrarily low levels using quantum error correction.
- No-Cloning Theorem: Classical repetition codes store 3 copies of each bit. We cannot copy an arbitrary quantum state — $\ket{\psi}\to\ket{\psi}\ket{\psi}\ket{\psi}$ is forbidden by the linearity of quantum mechanics — so the naive strategy is illegal from the start.
- Continuum of Errors: Classical bits can only flip ($0\leftrightarrow1$). A qubit can undergo any rotation $R_\mathbf{n}(\epsilon)$ for arbitrarily small $\epsilon$ — infinitely many distinct errors. We need a scheme that handles this continuous set, not just two outcomes.
- Measurement Destroys Superposition: Classical error correction reads the codeword to find errors. Directly measuring a qubit collapses $\alpha\ket{0}+\beta\ket{1}$ — we lose $\alpha$ and $\beta$ forever. We need to extract error information without ever learning the encoded quantum data.
QEC solves all three: (1) use entanglement instead of copying, (2) rely on the fact that syndrome measurement discretises continuous errors into a correctable Pauli set, and (3) measure only stabilizer operators that commute with the logical subspace, revealing errors but nothing about $\alpha,\beta$.
6.1 Quantum Noise Model — Kraus Operators
Any physical noise process on a qubit can be described as a quantum channel — a completely positive, trace-preserving (CPTP) map on the density matrix $\rho$, expressed in terms of Kraus operators $\{K_i\}$: $$\mathcal{E}(\rho) = \sum_i K_i \rho K_i^\dagger, \qquad \sum_i K_i^\dagger K_i = I$$
| Channel | Kraus Operators | Effect on Bloch vector | Physical source |
|---|---|---|---|
| Bit-flip (prob $p$) | $K_0=\sqrt{1-p}\,I,\; K_1=\sqrt{p}\,X$ | $\ket{0}\leftrightarrow\ket{1}$ with prob $p$; $y,z\to(1-2p)\,y,z$, $x$ unchanged | Charge noise, gate errors |
| Phase-flip (prob $p$) | $K_0=\sqrt{1-p}\,I,\; K_1=\sqrt{p}\,Z$ | $\ket{+}\leftrightarrow\ket{-}$ with prob $p$; $x,y\to(1-2p)\,x,y$, $z$ unchanged | Dephasing, $T_2$ processes |
| Depolarizing (prob $p$) | $K_0=\sqrt{1-p}\,I,\; K_{1,2,3}=\sqrt{p/3}\,\{X,Y,Z\}$ | Bloch sphere shrinks by $(1-4p/3)$; state → $I/2$ as $p\to3/4$ | Generic qubit errors; worst case |
| Amplitude damping ($T_1$ decay) | $K_0=\bigl(\begin{smallmatrix}1&0\\0&\sqrt{1-\gamma}\end{smallmatrix}\bigr),\; K_1=\bigl(\begin{smallmatrix}0&\sqrt{\gamma}\\0&0\end{smallmatrix}\bigr)$ | $\ket{1}\to\ket{0}$ with prob $\gamma=1-e^{-t/T_1}$; north pole stable | Spontaneous emission, $T_1$ relaxation |
| Phase damping ($T_2$ dephasing) | $K_0=\sqrt{1-p/2}\,I,\; K_1=\sqrt{p/2}\,Z$ | Off-diagonal elements of $\rho$ decay; $z$-axis stable | Low-frequency noise, $T_\phi$ processes |
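To make the table concrete, here is a minimal sketch (the step size is illustrative) that applies the amplitude-damping Kraus operators repeatedly and recovers the expected $e^{-t/T_1}$ decay of the excited-state population:

```python
import numpy as np

def amplitude_damp(rho, gamma):
    """Apply the amplitude-damping channel E(rho) = K0 rho K0† + K1 rho K1†."""
    K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    K1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T

# Start in |1><1| and damp in 10 steps of duration t = 0.1 * T1 each
rho = np.array([[0, 0], [0, 1]], dtype=complex)
gamma = 1 - np.exp(-0.1)          # per-step decay probability
for _ in range(10):
    rho = amplitude_damp(rho, gamma)

print(rho[1, 1].real)             # ≈ e^{-1} ≈ 0.368: population left after one T1
```

Composing the channel $n$ times multiplies the excited population by $(1-\gamma)^n$, so ten steps of $0.1\,T_1$ reproduce the continuous $e^{-t/T_1}$ law exactly.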
6.2 Pauli Decomposition of Errors
Any single-qubit error can be decomposed in the Pauli basis $\{I, X, Y, Z\}$:
- $X$ error (bit flip): $\ket{0}\leftrightarrow\ket{1}$ — the quantum analogue of a classical bit flip
- $Z$ error (phase flip): $\ket{0}\to\ket{0}$, $\ket{1}\to-\ket{1}$ — no classical analogue
- $Y = iXZ$ error: simultaneously a bit flip and a phase flip
- $I$ error: no error
The most common NISQ noise model is the depolarizing channel: $$\mathcal{E}(\rho) = (1-p)\rho + \frac{p}{3}(X\rho X + Y\rho Y + Z\rho Z)$$ With probability $1-p$, nothing happens. With probability $p/3$ each, an $X$, $Y$, or $Z$ error occurs.
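A short numerical check of this channel (a sketch in plain NumPy): the Kraus operators satisfy the completeness relation, and a $\ket{+}$ state's Bloch $x$-component contracts by exactly $1-4p/3$:

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

p = 0.1
kraus = [np.sqrt(1 - p) * I2] + [np.sqrt(p / 3) * P for P in (X, Y, Z)]

# Trace preservation: sum_i K_i† K_i = I
assert np.allclose(sum(K.conj().T @ K for K in kraus), I2)

# Apply the channel to |+><+| (Bloch vector (1, 0, 0))
plus = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)
out = sum(K @ plus @ K.conj().T for K in kraus)
x_after = np.real(np.trace(out @ X))
print(x_after)   # 1 - 4p/3 ≈ 0.8667 for p = 0.1
```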
Quantum Hamming bound: An $[[n, k, d]]$ quantum code encodes $k$ logical qubits into $n$ physical qubits and can correct any error on up to $t = \lfloor(d-1)/2\rfloor$ qubits. The quantum Hamming bound requires: $$2^n \ge 2^k \sum_{j=0}^{t}\binom{n}{j}3^j$$ For $k=1$ logical qubit and $t=1$ (single-qubit error correction), the minimum is $n=5$ physical qubits. The 5-qubit perfect code saturates this bound exactly.
However, the 5-qubit code is complex to implement. We first study the simpler 3-qubit bit-flip code, which corrects only $X$ errors (bit flips) but not $Z$ errors (phase flips). Codes that correct both include the 5-qubit perfect code, the 7-qubit Steane code, and the 9-qubit Shor code, which we build up to below.
6.3 The 3-Qubit Bit-Flip Code — Complete Analysis
Encoding: A single logical qubit $\alpha\ket{0}+\beta\ket{1}$ is encoded into 3 physical qubits: $$\alpha\ket{0}_L + \beta\ket{1}_L \equiv \alpha\ket{000} + \beta\ket{111}$$ The logical basis states are $\ket{0}_L = \ket{000}$ and $\ket{1}_L = \ket{111}$.
Encoding circuit: Start with $\alpha\ket{0}+\beta\ket{1}$ in qubit 1, and $\ket{0}$ in qubits 2 and 3. Apply CNOT(1→2) and CNOT(1→3): $$(\alpha\ket{0}+\beta\ket{1})\ket{00} \xrightarrow{\text{CNOT}_{12}} (\alpha\ket{00}+\beta\ket{11})\ket{0} \xrightarrow{\text{CNOT}_{13}} \alpha\ket{000}+\beta\ket{111}$$
Error model: Each qubit independently suffers a bit-flip ($X$) error with probability $p$.
We cannot measure the data qubits directly (that would collapse $\alpha\ket{000}+\beta\ket{111}$). Instead, we measure two stabilizers: $$Z_1Z_2 \quad\text{and}\quad Z_2Z_3$$ These operators have eigenvalues $\pm 1$ and commute with the logical operators.
A state with no error: $Z_1Z_2(\alpha\ket{000}+\beta\ket{111}) = \alpha\ket{000}+\beta\ket{111}$, eigenvalue $+1$.
A state with error on qubit 2: $\alpha\ket{010}+\beta\ket{101}$. $Z_1Z_2$ gives eigenvalue $-1$ (they differ). $Z_2Z_3$ gives eigenvalue $-1$ (they differ). Syndrome: $(-1,-1)$ → error on qubit 2.
| Error | State | Syndrome Z₁Z₂ | Syndrome Z₂Z₃ | Diagnosis | Correction |
|---|---|---|---|---|---|
| None ($I$) | α|000⟩+β|111⟩ | +1 | +1 | No error | Nothing |
| $X_1$ | α|100⟩+β|011⟩ | −1 | +1 | Qubit 1 flipped | $X_1$ |
| $X_2$ | α|010⟩+β|101⟩ | −1 | −1 | Qubit 2 flipped | $X_2$ |
| $X_3$ | α|001⟩+β|110⟩ | +1 | −1 | Qubit 3 flipped | $X_3$ |
In the classical world, a bit only flips ($0 \leftrightarrow 1$). In the quantum world, a qubit's state can drift by any tiny amount. How can we fix an infinite number of possible errors?
The miracle of syndrome measurement is that it forces this continuous drift into a digital choice. When we measure the syndrome, the state collapses so that:
- Either no error has happened (with high probability — say, 99% of the time).
- Or a full bit-flip has happened (say, 1% of the time).
The quantum world digitises itself the moment we look! We don't have to fix a 0.001% drift; we only have to fix a definite flip, which is easy once the syndrome tells us which qubit it was.
If each physical qubit has bit-flip error rate $p$ (independently), the 3-qubit code fails only when 2 or more qubits are simultaneously wrong: $$p_L = \binom{3}{2}p^2(1-p) + \binom{3}{3}p^3 = 3p^2 - 2p^3$$
For small $p$: $p_L = 3p^2(1-p) + p^3 \approx 3p^2 \ll p$. The code is beneficial whenever $p_L < p$, i.e., $3p^2 - 2p^3 < p$, which holds exactly when $p < 1/2$. For $p = 0.01$: $p_L \approx 3\times 10^{-4}$ — a 33× improvement in error rate. For $p = 0.001$: $p_L \approx 3\times 10^{-6}$ — a 333× improvement. The code makes things worse if $p > 1/2$ (the noise is so severe that two simultaneous errors become likely, and majority voting misleads more often than it helps).
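The break-even point can be checked directly (a minimal sketch of the formula above):

```python
def logical_error_rate(p):
    """Failure probability of majority voting: two or three of three qubits flip."""
    return 3 * p**2 * (1 - p) + p**3

for p in (0.001, 0.01, 0.1):
    print(f"p = {p}: p_L = {logical_error_rate(p):.2e}")

# The code helps exactly when p_L < p, i.e. for p below 1/2
assert logical_error_rate(0.4) < 0.4     # still (barely) beneficial
assert logical_error_rate(0.6) > 0.6     # past break-even: worse than no code
```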
6.4 Interactive Syndrome Demo
Choose the logical state and inject an error. The syndrome measurements diagnose which qubit was flipped, and the recovery operation restores the state — all without measuring the logical qubit directly.
from qiskit import QuantumCircuit, QuantumRegister, ClassicalRegister

def bit_flip_code_circuit(error_qubit=None):
    """
    3-qubit bit-flip error correction code.
    error_qubit: 0, 1, or 2 (which qubit to flip), or None.
    Returns the full encode → error → syndrome → correct circuit.
    """
    # Registers
    data = QuantumRegister(3, 'data')
    ancilla = QuantumRegister(2, 'anc')   # syndrome ancillas
    c_syn = ClassicalRegister(2, 'syn')   # syndrome measurements
    qc = QuantumCircuit(data, ancilla, c_syn)

    # ── Step 1: Prepare logical |+⟩_L (for testing)
    qc.h(data[0])                  # data[0] = α|0⟩ + β|1⟩

    # ── Step 2: Encode into the logical code space
    qc.cx(data[0], data[1])        # |ψ⟩|0⟩ → spread onto qubit 1
    qc.cx(data[0], data[2])        # complete encoding: α|000⟩ + β|111⟩
    qc.barrier()

    # ── Step 3: Inject error (simulating noise)
    if error_qubit is not None:
        qc.x(data[error_qubit])    # bit flip on the specified qubit
    qc.barrier()

    # ── Step 4: Syndrome measurement
    # Measure Z₁Z₂: parity of data[0] and data[1]
    qc.cx(data[0], ancilla[0])
    qc.cx(data[1], ancilla[0])
    # Measure Z₂Z₃: parity of data[1] and data[2]
    qc.cx(data[1], ancilla[1])
    qc.cx(data[2], ancilla[1])
    qc.measure(ancilla, c_syn)
    qc.barrier()

    # ── Step 5: Classical feedback correction
    # (c_syn[0], c_syn[1]) = (1,0) → qubit 0; (1,1) → qubit 1; (0,1) → qubit 2
    with qc.if_test((c_syn[0], 1)):
        with qc.if_test((c_syn[1], 0)):
            qc.x(data[0])          # syndrome (1,0) → flip qubit 0
    with qc.if_test((c_syn[0], 1)):
        with qc.if_test((c_syn[1], 1)):
            qc.x(data[1])          # syndrome (1,1) → flip qubit 1
    with qc.if_test((c_syn[0], 0)):
        with qc.if_test((c_syn[1], 1)):
            qc.x(data[2])          # syndrome (0,1) → flip qubit 2

    return qc

# Test all error cases
for eq in [None, 0, 1, 2]:
    qc = bit_flip_code_circuit(error_qubit=eq)
    print(f"Error on qubit {eq}: circuit depth {qc.depth()}")
6.5 The Phase-Flip Code and the Shor 9-Qubit Code
The 3-qubit bit-flip code corrects $X$ errors (bit flips) but is powerless against $Z$ errors (phase flips). Why? Because $\ket{000}$ and $\ket{111}$ are eigenstates of $Z_1Z_2$ and $Z_2Z_3$ — a $Z$ error doesn't change any parity syndrome.
To correct phase errors, we work in the $X$-basis ($\ket{+},\ket{-}$) instead of the $Z$-basis. Encode: $$\ket{0}_L = \ket{+++}, \qquad \ket{1}_L = \ket{---}$$ Encoding circuit: apply the two CNOTs of the bit-flip encoder (giving $\alpha\ket{000}+\beta\ket{111}$), then $H$ on all three qubits (giving $\alpha\ket{+++}+\beta\ket{---}$).
Syndrome: measure $X_1X_2$ and $X_2X_3$. A $Z$ error on qubit $i$ flips the sign of that qubit in the $X$-basis, giving a detectable syndrome — exactly the same table as the bit-flip code, but with $X\leftrightarrow Z$.
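The basis change underlying this $X\leftrightarrow Z$ swap is the identity $HZH = X$ (and $HXH = Z$), easily checked numerically:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

# Conjugating by Hadamards swaps the roles of bit flips and phase flips,
# so the phase-flip code is just the bit-flip code viewed in the X-basis.
assert np.allclose(H @ Z @ H, X)
assert np.allclose(H @ X @ H, Z)
print("HZH = X and HXH = Z verified")
```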
Key insight: The phase-flip code corrects $Z$ errors but not $X$ errors. The bit-flip code corrects $X$ errors but not $Z$ errors. Can we combine them?
Peter Shor's 1995 code (the first complete quantum error correcting code) combines both codes — it protects against both $X$ and $Z$ errors on any single qubit, hence against any single-qubit error.
Encoding: $1$ logical qubit into $9$ physical qubits:
$$\ket{0}_L = \frac{(\ket{000}+\ket{111})^{\otimes 3}}{2\sqrt{2}}, \qquad \ket{1}_L = \frac{(\ket{000}-\ket{111})^{\otimes 3}}{2\sqrt{2}}$$
The code has a nested structure: within each block of 3 qubits, the bit-flip code provides $X$-error protection. Across the three blocks, the $\pm$ sign encodes the logical qubit, providing $Z$-error protection (the phase-flip code in the $X$-basis).
Performance: Corrects any single-qubit error ($X$, $Y$, or $Z$) on any of the 9 qubits. Total: $9$ physical qubits, $8$ stabilizer generators, $1$ logical qubit. Distance $d=3$.
Any single-qubit error $E$ can be decomposed as $E = e_0 I + e_1 X + e_2 Y + e_3 Z$. When the syndrome is measured, this superposition of errors collapses to exactly one of $\{I, X, Y, Z\}$ (with probabilities $|e_0|^2, |e_1|^2, |e_2|^2, |e_3|^2$). Since $Y = iXZ$, a code that corrects both $X$ and $Z$ automatically corrects $Y$ as well. Therefore, correcting these 4 discrete Pauli errors is sufficient to correct any physical error that can be decomposed in the Pauli basis — including small rotations, decoherence, crosstalk, and so on. This is the discretisation miracle: QEC converts an infinite-dimensional continuous error space into a finite, correctable discrete set.
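For example, a tiny coherent over-rotation $R_x(\epsilon)$ decomposes into just $I$ and $X$ components, and the syndrome measurement collapses it to one or the other (a sketch; $\epsilon$ is an illustrative value):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

eps = 0.01                                            # a tiny over-rotation angle
E = np.cos(eps / 2) * I2 - 1j * np.sin(eps / 2) * X   # R_x(eps) = e0*I + e1*X

# Pauli coefficients via the Hilbert-Schmidt inner product: e_P = Tr(P E) / 2
e_0 = np.trace(E) / 2
e_1 = np.trace(X @ E) / 2

# Syndrome measurement collapses the error to "nothing" or "a full X flip"
print(f"P(I) = {abs(e_0)**2:.6f}, P(X) = {abs(e_1)**2:.2e}")
```

The probabilities $\cos^2(\epsilon/2)$ and $\sin^2(\epsilon/2)$ sum to one: almost always the syndrome reports "no error" and the collapse itself removes the drift; rarely it reports a clean $X$ flip, which the recovery undoes.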
The Shor code is not the most efficient (it uses 9 qubits; the [[5,1,3]] perfect code does the same with just 5), but it was the first to demonstrate that fault-tolerant quantum computation is theoretically possible — paving the way for the entire field of quantum error correction.
6.6 Fault Tolerance and the Threshold Theorem
If the physical error rate $p$ per gate is below a threshold value $p_\text{th}$ (typically $\sim 10^{-3}$ to $10^{-2}$ depending on the code and architecture), then arbitrarily accurate quantum computation can be performed using a fault-tolerant scheme with overhead polynomial in the circuit size and logarithmic in the desired error rate.
Concretely: for a concatenated code that corrects one error, each level of concatenation squares the rescaled error rate, $p^{(k)} = p_\text{th}\,(p/p_\text{th})^{2^k}$, so reaching a target logical rate $p_L$ needs only $k = O(\log\log(1/p_L))$ levels. The total overhead in physical qubits and gates is polynomial in the circuit size and polylogarithmic in $1/p_L$.
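With a code that corrects one error, each concatenation level roughly squares the rescaled error rate, $p^{(k)} = p_\text{th}(p/p_\text{th})^{2^k}$; the required depth can then be found by direct iteration (a sketch with illustrative numbers):

```python
def levels_needed(p, p_th, p_target):
    """Smallest k with p_th * (p/p_th)**(2**k) <= p_target (error squares per level)."""
    k = 0
    while p_th * (p / p_th) ** (2 ** k) > p_target:
        k += 1
    return k

# Physical error rate 10x below a 1% threshold, targeting p_L = 1e-15
print(levels_needed(1e-3, 1e-2, 1e-15))   # → 4 levels suffice
```

The doubly-exponential suppression is why a modest factor below threshold buys fifteen orders of magnitude in just a handful of levels.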
Current leading architectures for fault-tolerant quantum computation:
- Surface Code: A 2D array of physical qubits with nearest-neighbor interactions only. Threshold $\sim 1\%$ — the highest known for practically implementable codes. Requires ~1000 physical qubits per logical qubit for $p_L \sim 10^{-15}$. Currently the frontrunner for superconducting qubit platforms (Google, IBM).
- Steane Code [[7,1,3]]: Encodes 1 logical qubit in 7 physical qubits with distance 3 (corrects any single-qubit error). A CSS (Calderbank-Shor-Steane) code — simpler to understand theoretically, but lower threshold (~$10^{-4}$).
- 5-Qubit Perfect Code [[5,1,3]]: The smallest code that corrects any single-qubit error. Uses 5 physical qubits; saturates the quantum Hamming bound.
6.7 The Roadmap to Fault Tolerance
The path from today's NISQ machines to a truly useful quantum computer involves one massive hurdle: the Physical-to-Logical Ratio.
To run Shor's algorithm on RSA-2048, we need about 4,000 logical qubits. But because physical qubits are noisy, we cannot simply use 4,000 physical superconducting qubits.
Currently, the Surface Code (the gold standard for error correction) requires roughly 1,000 physical qubits to create 1 logical qubit that is reliable enough for long calculations.
This is why hardware companies (IBM, Google, IonQ) are so focused on scaling. We aren't just building a computer; we're building a redundant factory where 99.9% of the components are there just to watch for errors in the other 0.1%.
In late 2024, Google announced results with their Willow processor demonstrating that logical error rates decrease exponentially as the surface code distance increases — a key requirement for scalable fault tolerance. Specifically, going from distance-3 to distance-5 to distance-7 surface codes, the logical error rate dropped by a factor of $\sim 2.14$ at each step ($d \to d+2$) — consistent with operating below the surface code threshold. This was the first experimental demonstration of a quantum processor operating in the "below threshold" regime for a topological quantum error correcting code.