On December 9, 2024, Google published a paper in Nature announcing results from their 105-qubit Willow chip. The press release led with the number that immediately spread across every technology news outlet on the planet: a computation that would take today’s fastest classical supercomputers $10^{25}$ years. For reference, the age of the universe is roughly $1.4 \times 10^{10}$ years, which makes the claimed classical runtime about $10^{15}$ times longer than the universe has existed.

Impressive. Also: almost entirely a distraction from what actually matters.

The real result is buried in the middle of the paper, requires knowing what the threshold theorem says to appreciate, and generated a fraction of the press coverage. Google’s Willow chip demonstrated quantum error correction operating below the threshold for the first time in a superconducting processor. This is the result that will matter in twenty years. The $10^{25}$-year number will have been forgotten by then — or quietly revised as classical simulation algorithms improve.

Let me explain why the threshold result is the one worth understanding.

Why quantum errors are a fundamental obstacle

When I first encountered quantum error correction as a student, my reaction was something like: surely you just copy the qubit a few times and take a majority vote, the way classical error correction works. This is wrong, and the reason it is wrong is elegant.

Quantum computation depends on superposition and entanglement. A quantum state like

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$$

encodes information in the complex amplitudes $\alpha$ and $\beta$. But real physical systems do not exist in isolation. They interact with their environment — thermal fluctuations, stray electromagnetic fields, unwanted couplings to neighbouring systems. This coupling causes decoherence: the quantum state entangles with environmental degrees of freedom, and the superposition is effectively destroyed as the relative phase between $|0\rangle$ and $|1\rangle$ becomes random. For superconducting qubits of the type used in Willow, coherence times are on the order of microseconds to hundreds of microseconds. Every gate operation takes tens to hundreds of nanoseconds. After a few hundred gate operations, the accumulated decoherence and gate errors have corrupted the quantum state beyond use.
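
The arithmetic behind that depth budget can be sketched in a few lines. The coherence time and gate duration below are assumed round numbers within the ranges just quoted, not Willow's measured specs:

```python
# Back-of-envelope circuit budget. T2 and gate time are assumed round
# numbers within the ranges quoted above, not Willow's measured specs.
T2 = 100e-6       # coherence time: 100 microseconds (assumed)
t_gate = 50e-9    # two-qubit gate duration: 50 nanoseconds (assumed)

# Crude model: per-gate decoherence error ~ t_gate / T2.
p_decoherence = t_gate / T2        # ~5e-4
budget = T2 / t_gate               # ~2000 gates before decoherence alone is O(1)

# Folding in a per-gate control error of ~0.1-1% shrinks the usable depth
# to a few hundred gates, matching the estimate in the text.
print(f"per-gate decoherence error ~ {p_decoherence:.0e}")
print(f"depth budget (decoherence only) ~ {budget:.0f} gates")
```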

The classical remedy — store each bit three times and take the majority vote — fails for a fundamental reason. The no-cloning theorem states that there is no unitary operation $U$ such that

$$U(|\psi\rangle \otimes |0\rangle) = |\psi\rangle \otimes |\psi\rangle$$

for all states $|\psi\rangle$. The proof is a one-liner: unitary operators are linear, so if $U$ correctly copies $|0\rangle$ and $|1\rangle$, it maps $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)|0\rangle$ to $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, which is an entangled state, not the product state $\frac{1}{2}(|0\rangle + |1\rangle)^{\otimes 2}$. You cannot copy an arbitrary quantum state. Classical redundancy, applied naively, is forbidden by the linearity of quantum mechanics.
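
The linearity argument can be checked numerically. A minimal numpy sketch: a CNOT copies the two basis states, but applied to a superposition it produces a Bell state rather than two independent copies.

```python
import numpy as np

# CNOT copies basis states (|0>|0> -> |0>|0>, |1>|0> -> |1>|1>) but, being
# linear, maps a superposition to an entangled state, not two copies.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)   # (|0>+|1>)/sqrt(2)
zero = np.array([1, 0], dtype=complex)

out = CNOT @ np.kron(plus, zero)        # what CNOT actually produces
two_copies = np.kron(plus, plus)        # what a cloner would need to produce

bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
print(np.allclose(out, bell))           # True: we got a Bell state...
print(np.allclose(out, two_copies))     # False: ...not two copies
```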

So quantum error correction requires a genuinely different idea.

The key insight: measure the error, not the state

The idea that unlocks quantum error correction is this: you can extract information about which error occurred without learning anything about the logical state the qubit is encoding.

Peter Shor (1995) showed that one logical qubit can be protected using 9 physical qubits. The construction is worth understanding in some detail because it reveals the structure that all subsequent codes share.

Bit-flip protection

First, consider only bit-flip errors: physical processes that flip $|0\rangle \leftrightarrow |1\rangle$ with probability $p$. Encode:

$$|0\rangle_L = |000\rangle, \quad |1\rangle_L = |111\rangle$$

A logical superposition $\alpha|0\rangle_L + \beta|1\rangle_L = \alpha|000\rangle + \beta|111\rangle$ is an entangled state — it cannot be factored — but it is not a copy of $|\psi\rangle$; it is a different encoding.

If qubit 1 flips, the state becomes $\alpha|100\rangle + \beta|011\rangle$. We detect this by measuring the syndrome operators $Z_1 Z_2$ and $Z_2 Z_3$ — products of Pauli-Z operators. The eigenvalue of $Z_1 Z_2$ is $+1$ if qubits 1 and 2 have the same value, $-1$ if they differ. Crucially, measuring $Z_1 Z_2$ does not collapse the logical state: the corrupted state is an eigenstate of both syndrome operators, with the same eigenvalue on its $\alpha$ and $\beta$ components, so the measurement reveals which error occurred without revealing $\alpha$ or $\beta$. This is the central trick of quantum error correction.

The syndrome outcome $(Z_1 Z_2, Z_2 Z_3) = (-1, +1)$ tells you qubit 1 flipped; apply $X_1$ to fix it. The logical state is restored.
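
The syndrome extraction can be verified with a small statevector simulation (plain numpy; qubit 1 is taken as the leftmost tensor factor here):

```python
import numpy as np

I = np.eye(2); X = np.array([[0, 1], [1, 0]]); Z = np.diag([1, -1])

def kron(*ops):
    out = np.array([[1.0]])
    for op in ops:
        out = np.kron(out, op)
    return out

# Logical state a|000> + b|111>; a, b are arbitrary, and the syndrome
# outcome is independent of them.
a, b = 0.6, 0.8
psi = np.zeros(8); psi[0b000] = a; psi[0b111] = b

# A bit flip on qubit 1 (the leftmost).
psi_err = kron(X, I, I) @ psi

# The corrupted state is an eigenstate of Z1Z2 and Z2Z3, so the expectation
# value is exactly the eigenvalue +/-1.
s12 = psi_err @ kron(Z, Z, I) @ psi_err
s23 = psi_err @ kron(I, Z, Z) @ psi_err
print(s12, s23)   # (-1, +1): qubit 1 flipped

# Recovery: apply X on qubit 1 and check the logical state is restored.
psi_fixed = kron(X, I, I) @ psi_err
print(np.allclose(psi_fixed, psi))   # True
```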

Phase-flip protection

Phase errors flip the relative sign: $|+\rangle \leftrightarrow |-\rangle$ where $|\pm\rangle = (|0\rangle \pm |1\rangle)/\sqrt{2}$. Apply a Hadamard to rotate to the X basis, where phase flips look like bit flips, and apply the same three-qubit code.

Concatenation: the Shor 9-qubit code

Protect each of the three qubits in the phase-flip code with its own bit-flip code. The result: 9 physical qubits per logical qubit, protected against any single-qubit error. If the physical error rate is $p$, the logical error rate for the 9-qubit code scales as $p^2$ (you need at least two errors to fool the code), rather than $p$. Already an improvement.
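
The quadratic scaling is easy to check for the simplest ingredient, the three-qubit bit-flip code: the majority vote fails only when two or more qubits flip.

```python
# Majority-vote failure probability for the three-qubit bit-flip code with
# independent flip probability p: at least two of the three qubits flip.
def p_logical(p):
    return 3 * p**2 * (1 - p) + p**3    # ~3p^2 for small p

for p in (1e-2, 1e-3, 1e-4):
    print(f"p = {p:.0e}  ->  p_L = {p_logical(p):.1e}  (~3p^2)")
```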

The threshold theorem

Shor’s code is a proof of principle. The threshold theorem, established independently by Aharonov and Ben-Or (1997) and Knill, Laflamme, and Zurek (1998), shows something much stronger. For a level-$k$ concatenated code — a code of codes of codes, $k$ levels deep — the logical error rate scales as

$$p_L \sim \left(\frac{p}{p_{\text{th}}}\right)^{2^k}$$

where $p_{\text{th}}$ is the threshold: a critical physical error rate that depends on the code family. Below threshold ($p < p_{\text{th}}$), adding more code levels exponentially suppresses logical errors. Above threshold ($p > p_{\text{th}}$), more qubits make things worse — each additional physical qubit introduces more errors than the code can correct.
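
A quick numerical sketch of this behaviour, using the formula above with an assumed threshold of $p_{\text{th}} = 10^{-2}$ (prefactors omitted, as in the formula):

```python
# Concatenation scaling p_L ~ (p/p_th)^(2^k), assumed threshold p_th = 1e-2.
p_th = 1e-2

def p_logical(p, k):
    return (p / p_th) ** (2 ** k)

below = [p_logical(5e-3, k) for k in range(4)]   # p below threshold
above = [p_logical(2e-2, k) for k in range(4)]   # p above threshold
print("below:", [f"{r:.1e}" for r in below])     # shrinks doubly exponentially
print("above:", [f"{r:.1e}" for r in above])     # blows up instead
```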

The threshold is not a continuous improvement. It is a phase transition. Below it, the system is in the error-correctable regime; above it, it is not. Getting a physical quantum processor into the sub-threshold regime is a necessary condition for scalable fault-tolerant quantum computing.

Surface codes: the practical path

Concatenated codes require encoding at multiple levels, which multiplies overhead rapidly. Surface codes, introduced by Kitaev (1997) and analysed comprehensively by Dennis, Kitaev, Landahl, and Preskill (Dennis et al., 2002), offer a more practical architecture and have become the leading candidate for fault-tolerant quantum computing.

The geometry

A distance-$d$ surface code arranges $d^2$ physical data qubits on the vertices of a $d \times d$ grid, interleaved with $(d^2 - 1)$ ancilla qubits used for syndrome measurement. The stabilizers are products of $Z$ operators on groups of four data qubits surrounding each face (detecting bit-flip errors) and products of $X$ operators on groups of four data qubits surrounding each vertex (detecting phase-flip errors). Measuring these stabilizers without disturbing the logical qubit is the workhorse operation of the code.

The code distance $d$ is the minimum number of physical errors required to produce an undetectable logical error. An error chain of length $d$ connecting opposite boundaries of the code is the smallest error pattern that corrupts the logical qubit without triggering a syndrome. Larger $d$: longer chains required, lower logical error rates.

The scaling

The logical error rate per error-correction round for a surface code is approximately (Fowler et al., 2012):

$$p_L \approx A \left(\frac{p}{p_{\text{th}}}\right)^{\lfloor (d+1)/2 \rfloor}$$

where $p_{\text{th}} \approx 1\%$ for surface codes (established by threshold simulations, and robust across reasonable noise models), $p$ is the physical error rate for two-qubit gates, $A$ is a code-specific constant of order unity, and $\lfloor (d+1)/2 \rfloor$ is the exponent that grows with code size.

This is the critical expression. For fixed physical error rate $p < p_{\text{th}}$:

  • Increasing $d$ by 2 (one step in the code distance ladder) increases the exponent by 1 — a multiplicative suppression of $p/p_{\text{th}}$.
  • If $p/p_{\text{th}} = 0.18$, each distance step multiplies the logical error rate by roughly 0.18 — almost a factor of 6 suppression per step.
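
Plugging the section's numbers into the formula, with the order-unity constant $A$ set to 1 for illustration:

```python
# Surface-code logical error per round, p_L = A * (p/p_th)^floor((d+1)/2).
# A = 1 is an illustrative choice; the text only says "order unity".
p, p_th, A = 0.18e-2, 1e-2, 1.0

def p_L(d):
    return A * (p / p_th) ** ((d + 1) // 2)

for d in (3, 5, 7):
    print(d, f"{p_L(d):.2e}")

# Each step d -> d+2 multiplies p_L by p/p_th = 0.18, i.e. ~5.6x suppression.
print(f"suppression per step: {p_L(3) / p_L(5):.2f}")
```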

For $p > p_{\text{th}}$: increasing $d$ makes $p_L$ worse. The code spends more effort chasing more errors than it eliminates.

The behaviour below versus above threshold is qualitatively different. Exponential suppression versus exponential growth. The dividing line is $p = p_{\text{th}}$.

What Willow demonstrated

Google’s Willow chip (Acharya et al., 2024) is a 105-qubit superconducting processor. The physical two-qubit gate error rate achieved is approximately $p \approx 0.18\%$ — well below the surface code threshold of $\sim 1\%$.

The experiment is direct. Implement surface codes at distances $d = 3$, $d = 5$, and $d = 7$, corresponding to 9, 25, and 49 physical data qubits per logical qubit (plus ancillas). Measure the logical error rate per error correction cycle for each code size.

The result:

| Code distance $d$ | Physical qubits | Logical error rate per cycle |
|---|---|---|
| 3 | 17 | $\approx 0.143\%$ |
| 5 | 49 | $\approx 0.067\%$ |
| 7 | 97 | $\approx 0.032\%$ |

Each step in code distance approximately halves the logical error rate. The measured suppression factor is $\Lambda \approx 2.1$ per distance step, consistent across both steps ($0.143/0.067 \approx 0.067/0.032 \approx 2.1$): the signature of exponential suppression in $d$. The naive formula with $p/p_{\text{th}} \approx 0.18$ would predict stronger suppression, roughly a factor of 5 to 6 per step; the gap reflects the crudeness of that approximation, which hides detail in the prefactor $A$ and collapses all error channels into a single two-qubit gate error rate. The qualitative point stands: the processor is below threshold, and each increase in distance reliably buys down the logical error rate.
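
The suppression factor can be read directly off the quoted rates:

```python
# Suppression factor Lambda per code-distance step, computed from the
# Willow logical error rates quoted above.
rates = {3: 0.143e-2, 5: 0.067e-2, 7: 0.032e-2}

lam_35 = rates[3] / rates[5]
lam_57 = rates[5] / rates[7]
print(f"Lambda(3->5) = {lam_35:.2f}")   # ~2.13
print(f"Lambda(5->7) = {lam_57:.2f}")   # ~2.09
# The two steps agree to within a few percent: the hallmark of exponential
# suppression with code distance.
```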

This is the first time a superconducting quantum processor has demonstrated below-threshold error correction with the correct exponential scaling. Previous experiments showed that quantum error correction works in principle — syndromes can be measured, errors can be corrected. What had not been demonstrated was the exponential suppression with code size that the threshold theorem predicts. Without that scaling, error correction merely shifts the error rate; it cannot drive it to arbitrarily small values by increasing code size. With it, the path to fault tolerance is open in principle.

I want to be precise about what “for the first time” means here, because the claims in quantum computing tend to sprawl. Earlier work had demonstrated below-threshold error correction in other qubit modalities and at smaller scales. What Willow adds is the combination: a superconducting processor, three distinct code distances, clean exponential scaling, and a physical qubit count sufficient to demonstrate $d=7$ without other dominant error sources overwhelming the measurement. The data is convincing.

The random circuit sampling result and its limits

Now the $10^{25}$ years.

Random circuit sampling (RCS) is a computational task defined as follows: apply a sequence of randomly chosen quantum gates to a collection of qubits and sample from the resulting output distribution. The output distribution of a deep random circuit over $n$ qubits is believed to be classically hard to simulate: the best known classical algorithms scale exponentially in $n$. Google’s Willow chip performed RCS on a 105-qubit circuit in under 5 minutes. The $10^{25}$-year figure is an estimate of the time required for a Frontier-class supercomputer to simulate the same computation classically, extrapolated from benchmarks on smaller circuit sizes.

I will not contest the $10^{25}$-year number specifically — the extrapolation is defensible given current knowledge of classical simulation algorithms. But several things should be said about what the benchmark means.

RCS has no known practical application. It is not a step toward factoring integers, simulating molecules, or solving optimisation problems. It is a task designed to be hard for classical computers while being easy for quantum ones — a benchmark of quantum hardness, not quantum usefulness.

Classical simulation of random circuits is an active research area. The best classical algorithms for this task have improved substantially over the past five years. A result that seems to require $10^{25}$ years today may require $10^{10}$ years after a better classical algorithm is published. This has happened before: Google’s 2019 “quantum supremacy” claim was significantly eroded by subsequent classical simulation improvements. I expect the same here, to some degree.

Extrapolation is hard. The $10^{25}$-year estimate involves scaling classical simulation costs across many orders of magnitude in circuit size, from regimes where simulation is feasible to regimes where it is not. The uncertainty in the estimate is correspondingly large.

None of this makes the RCS result fraudulent or uninteresting. It is a genuine demonstration that a quantum processor can perform a specific task at a scale that classical computers plausibly cannot match. But calling this “quantum advantage” in the sense that matters — useful computation performed faster than any classical alternative — overstates it considerably.

The threshold result, by contrast, does not depend on classical simulation hardness arguments. It depends on measuring $p_L$ at $d = 3, 5, 7$ and checking whether the sequence is decreasing and consistent with the theoretical prediction. The data are directly interpretable without extrapolation. That is why I find the threshold result more significant.

Where we actually are — and the gap

The Willow chip has 105 physical qubits. The threshold result was demonstrated at $d = 7$, using approximately 101 physical qubits for one logical qubit (including ancillas). In other words: Willow demonstrated roughly one logical qubit operating below threshold.

A cryptographically relevant quantum computer — one capable of breaking RSA-2048 using Shor’s algorithm — requires approximately (Gidney & Ekerå, 2021):

  • $\sim 2048$ logical qubits for the factoring computation itself (with optimised circuits)
  • $\sim 1000$ to $10{,}000$ physical qubits per logical qubit, depending on the target logical error rate and physical error rate
  • Total: roughly 20 million physical qubits, operating through millions of error correction rounds
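
The multiplication behind those bullet points, as a sanity check (illustrative only; the careful accounting is Gidney & Ekerå's):

```python
# Rough consistency check of the resource estimates quoted above.
logical_qubits = 2048
phys_per_logical_lo = 1_000
phys_per_logical_hi = 10_000

lo = logical_qubits * phys_per_logical_lo     # ~2 million
hi = logical_qubits * phys_per_logical_hi     # ~20 million
print(f"{lo:,} to {hi:,} physical qubits")

willow = 105
print(f"gap vs Willow: {hi / willow:.0e}x")   # ~5 orders of magnitude
```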

The gap between “one logical qubit at $d=7$, 105 physical qubits” and “20 million physical qubits, millions of coherent error-correction rounds” is not a gap of ten percent or a factor of two. It is four to five orders of magnitude in qubit count alone, and the engineering challenges of maintaining coherence, connectivity, and calibration across millions of physical qubits are qualitatively different from maintaining them across 105.

We are on the right curve. We are not close to the destination.

The relevant trajectory — assuming continued improvement in physical error rates, qubit counts, and error correction overhead — places cryptographically relevant quantum computing somewhere between ten and thirty years away. Those estimates are uncertain by design; the field has repeatedly surprised in both directions. But I am confident about the qualitative picture: Willow demonstrates that the physics works at small scale; the engineering challenge of scaling it up is immense.

NIST post-quantum cryptography standards

Here is where I become impatient with the framing that says “don’t worry, quantum computers are decades away.”

In August 2024, NIST released three finalised post-quantum cryptographic standards:

  • FIPS 203 (ML-KEM, formerly CRYSTALS-Kyber): key encapsulation based on Module Learning With Errors (Module-LWE), a lattice problem
  • FIPS 204 (ML-DSA, formerly CRYSTALS-Dilithium): digital signatures based on Module-LWE
  • FIPS 205 (SLH-DSA, formerly SPHINCS+): hash-based digital signatures — the conservative option, with no known vulnerability to either classical or quantum attacks, at the cost of larger signature sizes

These standards exist because the cryptographic community understands something that the “decades away” framing obscures: for confidentiality, the relevant date is not when a quantum computer is built, but when the encrypted data was captured. The attack is called “Harvest Now, Decrypt Later.” State-level adversaries — and I do not think it is paranoid to assume that multiple state intelligence agencies are collecting encrypted internet traffic today — archive encrypted data with the intention of decrypting it once cryptographically relevant quantum computing becomes available.

For data encrypted today with RSA or elliptic curve cryptography, the protection window is however long it takes for a cryptographically relevant quantum computer to be built. If that is fifteen years, data encrypted with RSA-2048 today and collected by a patient adversary is vulnerable within fifteen years. For most data, fifteen-year confidentiality is adequate — a credit card number from 2025 is not sensitive in 2040. But for state secrets, medical records, long-term financial instruments, and critical infrastructure keys, fifteen-year confidentiality is not even close to adequate.
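
The protection-window logic reduces to a single comparison per data class. The lifetimes and the fifteen-year horizon below are illustrative assumptions, not predictions:

```python
# Data encrypted today is exposed if its required confidentiality lifetime
# outlasts the arrival of a CRQC. All numbers are illustrative assumptions.
years_to_crqc = 15

lifetimes = {"credit card number": 5, "medical record": 50, "state secret": 75}
exposed = {name: years > years_to_crqc for name, years in lifetimes.items()}

for name, years in lifetimes.items():
    verdict = "EXPOSED to harvest-now-decrypt-later" if exposed[name] else "ok"
    print(f"{name:20s} needs {years:2d}y of secrecy: {verdict}")
```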

The migration to post-quantum cryptography should be happening now. In many places, it is not.

The mathematical security of the NIST standards rests on lattice problems. The security of ML-KEM reduces to the hardness of Module-LWE: recover a secret from noisy linear equations over a structured lattice. The best known classical attacks, lattice reduction with sieving, run in time exponential in the lattice dimension; the best known quantum algorithms improve the constant in that exponent only modestly, and Shor's algorithm provides no advantage at all, because Module-LWE is neither a hidden subgroup problem nor a discrete logarithm problem. The security reduction is to worst-case lattice problems; this is a strong theoretical foundation compared with the purely conjectured average-case hardness of many classical cryptographic assumptions.
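
To make the object concrete, here is a toy LWE instance. Everything about it is illustrative: the dimensions are far below anything cryptographic, and it is plain LWE rather than the module-structured variant ML-KEM actually uses.

```python
import numpy as np

# Toy LWE instance: b = A s + e (mod q). Recovering s from (A, b) is easy
# at this size but becomes hard as the dimension grows. Parameters are
# illustrative only, nowhere near real ML-KEM sizes.
rng = np.random.default_rng(0)
q, n, m = 97, 8, 16

A = rng.integers(0, q, size=(m, n))       # public matrix
s = rng.integers(0, q, size=n)            # the secret
e = rng.integers(-2, 3, size=m)           # small noise
b = (A @ s + e) % q                       # public samples

# Without e, s falls out of ordinary linear algebra; with it, the attacker
# faces a noisy linear system -- the source of LWE's hardness.
print(A.shape, b.shape)
```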

Whether post-quantum cryptographic standards will still look secure in twenty years is an empirical question that the cryptanalytic community will continue to probe. But the alternative — remaining on RSA while a sufficiently patient adversary harvests encrypted traffic — is worse.

The existence of Willow is not an argument that RSA is broken. It is an argument that the threshold for “good enough to be a real threat” has moved from “theoretical possibility” to “demonstrated at small scale with correct exponential scaling.” The curve is real. Act accordingly. (And if you are responsible for cryptographic infrastructure at an institution and have not yet read the public money, public code argument for open, auditable cryptographic implementations — it applies doubly here.)

The cat qubit alternative

I have written elsewhere about bosonic cat qubits as an alternative approach to error correction. It is worth briefly noting the contrast with the surface code philosophy.

Willow’s surface codes take a universal approach to errors: both bit-flip and phase-flip errors are corrected by the same 2D stabilizer code, requiring large 2D arrays of physical qubits with all-to-neighbour connectivity. The code distance $d$ drives both error types down together.

The cat qubit approach, exemplified by Alice & Bob’s recent result (Reglade et al., 2024), encodes a logical qubit in superpositions of coherent states $|\pm\alpha\rangle$ of a harmonic oscillator. Engineered stabilisation of the oscillator (two-photon dissipation, in Alice & Bob’s implementation) suppresses bit-flip errors exponentially in $|\alpha|^2$ — the mean photon number — at the hardware level. Phase-flip errors remain, but they are the only dominant error mode, and they can be corrected with a simpler one-dimensional outer code rather than a full 2D surface code.

The overhead reduction could be substantial. If bit-flip errors are already exponentially suppressed by the hardware, you do not need a 2D code with overhead scaling as $d^2$ physical qubits per logical qubit. A 1D repetition code over cat qubits might achieve the same logical error rate with far fewer physical qubits. Alice & Bob have demonstrated cat qubits with bit-flip times exceeding ten seconds — many orders of magnitude longer than the phase-flip time. The gamble is that this asymmetry persists as the system scales and that the phase-flip outer code is tractable.
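
A toy illustration of what “exponential in $\alpha^2$” buys. The rate constant here is a placeholder, not a measured device parameter:

```python
import math

# Bit-flip suppression exponential in the mean photon number nbar =
# |alpha|^2: rate ~ exp(-c * nbar). c = 2 is purely illustrative.
c = 2.0
rates = {nbar: math.exp(-c * nbar) for nbar in (2, 4, 6, 8)}
for nbar, r in rates.items():
    print(f"nbar = {nbar}:  relative bit-flip rate ~ {r:.1e}")
# Every added photon multiplies the bit-flip rate by a constant factor,
# while the phase-flip rate grows only linearly in nbar -- hence the 1D
# outer code for phase flips only.
```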

Surface codes and cat qubits are different bets on the same fundamental problem: how do you make the overhead of fault-tolerant quantum computing manageable? Surface codes are the more conservative bet — they work with any qubit that meets the error rate threshold, regardless of error anisotropy. Cat qubits are the more speculative bet — they require maintaining a specific nonlinear oscillator regime at scale, but the payoff in overhead reduction could be decisive. Both approaches are credible. Neither has been demonstrated at the scale where the comparison becomes definitive.

What the threshold result actually means

Let me close by saying precisely what I think the Willow result establishes and what it does not.

It establishes that below-threshold quantum error correction exists outside of theory. The threshold theorem says that below $p_{\text{th}}$, logical error rates decrease exponentially as code size grows. Willow demonstrates this behaviour at $d = 3, 5, 7$ in a superconducting processor. The theoretical prediction and the experimental observation are consistent. This is not a small thing. The threshold theorem has been the theoretical backbone of fault-tolerant quantum computing since the late 1990s; it is genuinely satisfying to see its core prediction — exponential scaling — confirmed experimentally.

It establishes that the physical error rates of superconducting qubits can be brought below the surface code threshold. $p \approx 0.18\%$ against $p_{\text{th}} \approx 1\%$ gives a comfortable margin. The ratio $p/p_{\text{th}} \approx 0.18$ is the suppression factor per unit increase in the exponent $\lfloor (d+1)/2 \rfloor$. That is a factor of roughly 5–6 per code distance step — enough to drive logical error rates to useful levels at moderate code distances, without requiring physical error rates of $10^{-4}$ or lower.

It does not establish that a cryptographically relevant quantum computer is imminent, near-term, or easy to build. The gap from one below-threshold logical qubit to 20 million physical qubits is real and large. The engineering challenges of superconducting quantum computers at scale — refrigeration, wiring, control electronics, cross-talk, calibration drift — are not solved by demonstrating $d=7$.

And the $10^{25}$-year benchmark is technically defensible and strategically irrelevant. Classical simulation of random circuits is an interesting research problem. It is not the problem that quantum computers are being built to solve.

The result that matters is the curve: as code distance grows, logical error rates fall exponentially. We are on that curve. We have not arrived anywhere yet. But for the first time in the history of quantum computing, the evidence says we are moving in the right direction with the right scaling. That is, genuinely, a significant step.

Start migrating your cryptographic infrastructure.


References


Changelog

  • 2026-02-17: Updated the Fowler et al. (2012) author list to “Fowler, A. G., Mariantoni, M., Martinis, J. M., & Cleland, A. N.” — the previous list had been mixed with a different 2012 Fowler paper.
  • 2026-02-17: Updated the closing section to “20 million physical qubits,” matching the Gidney & Ekerå (2021) figure cited earlier in the article.