<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Physics on Sebastian Spicker</title>
    <link>https://sebastianspicker.github.io/tags/physics/</link>
    <description>Recent content in Physics on Sebastian Spicker</description>
    <image>
      <title>Sebastian Spicker</title>
      <url>https://sebastianspicker.github.io/og-image.png</url>
      <link>https://sebastianspicker.github.io/og-image.png</link>
    </image>
    <generator>Hugo -- 0.160.0</generator>
    <language>en</language>
    <lastBuildDate>Tue, 03 Mar 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://sebastianspicker.github.io/tags/physics/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Oppenheimer Didn&#39;t Have an Acceptable Use Policy</title>
      <link>https://sebastianspicker.github.io/posts/ai-warfare-anthropic-atom-bomb/</link>
      <pubDate>Tue, 03 Mar 2026 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/ai-warfare-anthropic-atom-bomb/</guid>
      <description>Anthropic has drawn a public line on military use of its models. The physics community spent the better part of the twentieth century working out what it means to draw that line after you have already built the thing. As a physicist watching this unfold, I find the parallels clarifying and the differences more unsettling than the parallels.</description>
      <content:encoded><![CDATA[<p><em>Physicists inherit, along with the formalism and the problem sets, a particular
kind of guilt. The profession has been working
through its relationship to weapons, state violence, and the gap between
scientific capability and ethical readiness since August 1945. This post is about
why I think the current moment in AI closely resembles that history, and why
Anthropic&rsquo;s decision to draw a line matters even if — especially if — you think
the line is imperfect.</em></p>
<hr>
<h2 id="what-just-happened">What Just Happened</h2>
<p>The news this week involves Anthropic and the question of whether and how large
language models should be available for military applications. Anthropic has stepped
back from a path toward unrestricted military use and restated a position: there
are things their models will not be used for, weapons development and autonomous
lethal systems among them. The response from parts of the defence and national
security community has been predictable — naïve, idealistic, unilateral disarmament,
your adversaries will not make the same choice.</p>
<p>These are not stupid objections. I want to take them seriously. But I also want
to explain why, as someone who spent years studying physics in the shadow of the
Manhattan Project&rsquo;s legacy, the framing of those objections sounds very familiar,
and why that familiarity is not reassuring.</p>
<hr>
<h2 id="what-the-physicists-thought-they-were-doing">What the Physicists Thought They Were Doing</h2>
<p>The scientists who built the atomic bomb were not, for the most part, indifferent
to what they were building. Many of them were refugees from European fascism.
They understood what a Nazi atomic weapon would mean. The urgency was real, the
moral reasoning was coherent, and the conclusion — build it before the other side
does — followed from the premises.</p>
<p>What the premises did not include was adequate weight for what happens after the
technical problem is solved.</p>
<p>By the time the Trinity test succeeded in July 1945, Germany had already
surrendered. The original justification — prevent the Nazis from getting there
first — had evaporated. What remained was a weapon, an infrastructure for building
more weapons, and a strategic and political logic that had largely moved beyond
the scientists&rsquo; control. The Franck Report, written by a group of Manhattan Project
scientists in June 1945, argued against using the bomb on a Japanese city without
prior demonstration. It was ignored. Oppenheimer, who chaired the Interim
Committee&rsquo;s scientific panel, signed off on the Hiroshima target recommendation.
He spent the rest of his life with that.</p>
<p>The lesson most physics students absorb from this history is something like: the
scientists were not the decision-makers, the decision was going to be made anyway,
and the presence of principled scientists in the room was better than their absence.
The system was going to do what it was going to do; all you could influence was
the margin.</p>
<p>I believed this for a long time. I am less sure of it now.</p>
<hr>
<h2 id="the-analogy-and-its-limits">The Analogy and Its Limits</h2>
<p>The comparison between the atom bomb and artificial general intelligence — or even
current large language models at the capability frontier — is made often enough
that it has become a cliché, which is usually the point at which people stop
thinking carefully about it. Let me try to be specific about where the analogy
holds and where it breaks.</p>
<p><strong>Where it holds:</strong></p>
<p>The core structural similarity is this: a small number of researchers, working
at the frontier of a capability that most people do not understand, are making
decisions that will constrain or enable uses they cannot fully anticipate, in
contexts they will not control. The physics community in 1942 had a clearer view
of what fission could do than any political or military decision-maker. The AI
research community in 2026 has a clearer view of what large language models can
do — and of what more capable successors will do — than most of the people who
will deploy them.</p>
<p>That epistemic position is not morally neutral. Knowing more than the decision-makers
does not mean you have unlimited responsibility, but it does mean you have more
responsibility than someone who does not know. Pleading ignorance about downstream
applications is not available to you.</p>
<p>The second similarity: once the capability exists and is demonstrated, the
normative landscape changes. Before Trinity, the question of whether to build nuclear
weapons was still open. After Trinity, it was no longer open in the same way — the
knowledge existed, the infrastructure existed, the geopolitical expectations had
already been set. The arms race was not caused by the bomb, but the bomb&rsquo;s existence
changed what the arms race meant and how fast it moved. We are somewhere in the
vicinity of that transition with frontier AI systems. The question of whether to
build them is still formally open for any given company or research group, but the
landscape is already different from what it was five years ago.</p>
<p><strong>Where it breaks:</strong></p>
<p>The atom bomb was a single-use physical object whose primary function was destroying
things. Large language models are general-purpose cognitive tools with a very wide
range of applications, the majority of which are not weapons-relevant. This matters
because it changes the policy space. You could, in principle, have not built the
atom bomb. You cannot, in principle, not build language models while still having
language models for medicine, education, scientific research, and the other
applications that are clearly beneficial. The dual-use problem for AI is more
severe, not less severe, than it was for physics.</p>
<p>The other important difference: the Manhattan Project was conducted in secret, under
wartime conditions, with a relatively well-defined adversarial structure. The current
AI landscape involves many organisations, many countries, public publication of
research, and no clear equivalent of the Axis/Allied framing. The game theory
of &ldquo;if we don&rsquo;t do it, they will&rdquo; is more complicated when &ldquo;they&rdquo; is not a single
identifiable adversary with symmetric interests.</p>
<hr>
<h2 id="what-anthropics-line-actually-says">What Anthropic&rsquo;s Line Actually Says</h2>
<p>Setting aside for a moment whether the line is in the right place, there is something
worth examining in the act of drawing it at all.</p>
<p>The standard criticism — that a unilateral ethical commitment in a competitive
field simply advantages less scrupulous actors — assumes that ethical commitments
are pure costs with no countervailing benefits. This is the argument the weapons
lobby has made about every arms control proposal in the history of arms control,
and it has sometimes been right. Unilateral disarmament without reciprocal
commitments can leave you worse off. This is not a trivial point.</p>
<p>But it smuggles in an assumption that deserves scrutiny: that the relevant
competition is primarily between AI companies, and that the only variable that
matters is relative capability. If you accept that framing, then any ethical
constraint is a handicap and the only rational strategy is to develop as fast as
possible with as few restrictions as possible.</p>
<p>That framing has a name in physics. It is called the arms race equilibrium, and
the physics community spent thirty years understanding what it produces. It produces
capability accumulation without a corresponding development of the normative
frameworks, institutional safeguards, and mutual verification mechanisms that
make the capability survivable. It produces Hiroshima, then the hydrogen bomb,
then MIRV, then the point at which the accumulated arsenal is large enough to
end complex life on Earth several times over, at which point you negotiate the
first real arms limitation treaties — from a starting position of vastly more
deployed capability than anyone needed and vastly less trust than anyone wanted.</p>
<p>The question Anthropic is implicitly asking is whether there is a path that does
not look like that. The answer is not obvious. But I think it is worth asking.</p>
<hr>
<h2 id="what-the-physicists-should-have-done">What the Physicists Should Have Done</h2>
<p>Here is the counterfactual that haunts the Manhattan Project&rsquo;s legacy: what if
the scientific community had treated the ethics of the bomb as seriously as the
physics, from the beginning?</p>
<p>Not naïvely. Not by refusing to work on it and ceding the possibility of influencing
it. But by making the ethical analysis parallel to the technical analysis, by
treating the question of use as a scientific question with as much rigour as the
question of yield, and by using the epistemic authority that came from being the
people who understood the capability to push, hard, for the normative frameworks
that did not yet exist.</p>
<p>Some scientists did this. Szilard circulated a petition, signed by 70 Manhattan
Project scientists, against the use of the bomb on Japanese cities without prior
warning. It did not work. But the effort was real, and the record of the effort
matters — both as evidence that the scientific community was not unanimous in its
acquiescence and as a model for what engaged dissent looks like from inside a
project that is going to proceed regardless.</p>
<p>What most scientists did not do, and what the profession largely did not do in the
decades that followed, was treat the ethical work as primary. Physics built its
identity around the technical capability — the extraordinary achievement of
understanding nature at the deepest level — and treated the ethical consequences
as someone else&rsquo;s department. The bomb was the military&rsquo;s problem. The cold war was
the politicians&rsquo; problem. The physicists kept doing physics.</p>
<p>This was comfortable and it was wrong.</p>
<hr>
<h2 id="what-i-want-from-ai-researchers">What I Want From AI Researchers</h2>
<p>I want AI researchers to do what the physicists did not, and to do it now, while
the critical decisions are still open.</p>
<p>Anthropic drawing a line is one version of this. It is imperfect — the line is
in a particular place, the enforcement mechanisms are limited, the competitive
dynamics are real. But it is a claim that the people who built the capability
have ongoing responsibility for how it is used, and that some uses are outside
the bounds of what should happen regardless of what is technically possible.</p>
<p>That claim is not naïve. It is, in fact, the claim the Franck Report was making
in 1945: that capability does not determine use, that scientists have a voice in
the normative question, and that using that voice is part of the job rather than
a distraction from it.</p>
<p>What I want beyond that is for the AI research community to treat the ethics
as primary rather than as footnotes. Not ethics review boards that approve research
post hoc. Not responsible AI teams that are consulted after the capability has
been developed. A genuine integration of the normative analysis into the research
process itself — asking, at each stage, what this capability makes possible and
who benefits from that possibility and who pays the cost.</p>
<p>The physics community got to August 1945 before it had that conversation in earnest.
The conversation has been going on ever since, and it has produced important
institutional frameworks — the Bulletin of the Atomic Scientists, the arms control
treaties, the export control regimes, the norms against first use. These things
matter. But they were built in reaction to a capability that had already been
deployed, and the shape of everything that followed was constrained by that
starting point.</p>
<p>The AI community is not there yet. The starting point is still being established.
That is what makes this moment consequential, and what makes Anthropic&rsquo;s line —
wherever exactly it is drawn — worth defending as an act of principle rather than
dismissing as an act of commercial positioning.</p>
<hr>
<h2 id="a-note-on-the-of-our-time-framing">A Note on the &ldquo;Of Our Time&rdquo; Framing</h2>
<p>I am aware that comparisons to the atom bomb are sometimes used to generate
unwarranted urgency, to short-circuit careful reasoning by invoking the most
extreme case. I want to be clear about what I am and am not claiming.</p>
<p>I am not claiming that current large language models are as immediately dangerous
as nuclear weapons. They are not.</p>
<p>I am claiming that the structural situation — researchers at the capability
frontier, ahead of the policy frameworks, making decisions that will constrain
future options, in a competitive environment with adversarial dynamics — is
similar enough that the lessons of the Manhattan Project period are directly
relevant. Not as prophecy. As a guide to the kind of mistakes that are available
to make.</p>
<p>The physicists had plenty of warning. Szilard had been worried since 1933.
Einstein wrote to Roosevelt in 1939. The Franck Report was written before
Hiroshima. The warnings were on the record. What was not on the record was
a scientific community that treated those warnings as actionable constraints
on its own behaviour rather than as advisories for policymakers.</p>
<p>That is the thing I want to be different this time.</p>
<hr>
<h2 id="references">References</h2>
<p>Franck, J. et al. (1945). <em>Report of the Committee on Political and Social Problems
(The Franck Report).</em> National Archives, Record Group 77.</p>
<p>Oppenheimer, J. R. (1965). Interview on <em>The Decision to Drop the Bomb</em> (NBC
documentary). Recorded 1965.</p>
<p>Rhodes, R. (1986). <em>The Making of the Atomic
Bomb.</em> Simon &amp; Schuster.</p>
<p>Russell, B., &amp; Einstein, A. (1955). <em>The Russell–Einstein Manifesto.</em>
Pugwash Conferences on Science and World Affairs.</p>
<p>Szilard, L. (1945). <em>A Petition to the President of the United States.</em> July 17,
1945. Available via the Atomic Heritage Foundation.</p>
<p>Bulletin of the Atomic Scientists (1945–present). <em>Doomsday Clock statements.</em>
<a href="https://thebulletin.org/doomsday-clock/">https://thebulletin.org/doomsday-clock/</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>If You Think This Is Written by AI, You Are Both Right and Wrong</title>
      <link>https://sebastianspicker.github.io/posts/ai-detectors-systematic-minds/</link>
      <pubDate>Wed, 18 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/ai-detectors-systematic-minds/</guid>
      <description>AI detectors flag the US Constitution as machine-generated. They also flag technical papers, legal prose, and — with striking consistency — writing produced by autistic minds and physics-trained ones. The error is not in the measurement. It is in the baseline assumption: that systematic, precise writing is inhuman.</description>
      <content:encoded><![CDATA[<p>I use AI tools in my writing. This post, like several others on this blog,
was written with LLM assistance — research, structure, drafting,
revision. If you run any of these posts through an AI writing detector, you
will likely receive a high probability-of-AI score. The detector will be
picking up something real.</p>
<p>It will also be wrong about what that means.</p>
<hr>
<h2 id="the-constitution-problem">The Constitution Problem</h2>
<p>In 2023, as universities began deploying AI detection tools at scale,
educators started testing them on texts that were definitively not
AI-generated. The results were instructive. The United States Constitution
received high AI-probability scores from multiple commercial detectors.
GPTZero returned a rating of 92% likely AI-written. The Federalist Papers
fared similarly. So did sections of the King James Bible and Kant&rsquo;s <em>Critique
of Pure Reason</em>. Historical documents, written by humans, for human purposes,
in an era when no AI existed — flagged as machine-generated.</p>
<p>This was not a marginal edge case. It was consistent across tools and across
documents. And while it was widely reported as evidence that the detectors
were broken, there is a more precise reading available: the detectors were
working correctly, and we had misunderstood what they were measuring.</p>
<hr>
<h2 id="what-the-detectors-actually-measure">What the Detectors Actually Measure</h2>
<p>Most commercial AI detectors — GPTZero, Turnitin&rsquo;s detection layer,
Copyleaks — use some combination of two statistical signals.</p>
<p><strong>Perplexity.</strong> A language model assigns a probability to each token given
the preceding tokens. Low perplexity means the text was, token by token,
what the model expected — it sits close to the centre of the probability
distribution. AI-generated text tends to have low perplexity because that
is precisely what generation does: it samples from the high-probability
region of the distribution <a href="#ref-1">[1]</a>. Human text, on average,
has higher perplexity, because humans write for specific contexts with
idiosyncratic word choices, rhetorical effects that require the unexpected,
and the accumulated noise of composing for a real reader.</p>
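<p>To make the statistic concrete, here is a minimal sketch of a perplexity
calculation using GPT-2 through the Hugging Face <code>transformers</code> library.
It illustrates the quantity the detectors build on, not any commercial
detector&rsquo;s actual pipeline; the model choice and any threshold you would apply
to the output are assumptions of the sketch.</p>
<pre><code class="language-python"># Minimal perplexity estimate with GPT-2 (illustration only, not a detector).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Score each token given the preceding tokens; low perplexity means the
    # text sat close to the centre of the model's predictive distribution.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels returns the mean cross-entropy over the sequence.
        out = model(**enc, labels=enc["input_ids"])
    return float(torch.exp(out.loss))

print(perplexity("We the People of the United States, in Order to form a more perfect Union ..."))
</code></pre>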
<p><strong>Burstiness.</strong> A term popularised in this context by Edward Tian, GPTZero&rsquo;s creator: human
writing has high burstiness — sentence lengths vary widely, vocabulary
density shifts, complex constructions alternate with simple ones. AI writing
is more uniform. The statistical distribution of sentence lengths in LLM
output is narrower than in most human prose <a href="#ref-2">[2]</a>.</p>
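<p>A crude proxy, for illustration only: treat burstiness as the spread of
sentence lengths. GPTZero&rsquo;s actual scoring is proprietary and uses more than
this, and the sentence-splitting rule below is a naive assumption.</p>
<pre><code class="language-python"># Crude burstiness proxy: variation in sentence length across a text.
import re
import statistics

def sentence_length_spread(text: str) -> float:
    # Split on sentence-ending punctuation and count words per sentence.
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    # Standard deviation of sentence length: higher means "burstier".
    return statistics.pstdev(lengths) if len(lengths) > 1 else 0.0
</code></pre>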
<p>The underlying assumption these tools share: human writing is variable,
contextually messy, idiosyncratic. AI writing is smooth and predictable.</p>
<p>This is accurate for a large class of human writing — casual prose, personal
essays, social media, student writing in informal registers. It is wrong
about a different and well-defined class of human writing. The Constitution
sits in that class. So does a lot of other text.</p>
<hr>
<h2 id="the-systemising-brain">The Systemising Brain</h2>
<p>Simon Baron-Cohen&rsquo;s empathising–systemising (E-S) theory distinguishes two
cognitive orientations. Empathising involves attending to social and emotional
cues, inferring mental states, navigating the pragmatic, implicit layer of
communication — what is meant rather than what is said. Systemising involves
attending to rules, patterns, and underlying regularities — the drive to
understand how things work and to represent them in explicit, transferable,
internally consistent terms <a href="#ref-3">[3]</a>.</p>
<p>Both orientations are distributed across the human population. They are not
exclusive, and neither is pathological. But autism spectrum conditions are
robustly associated with high systemising and relatively lower empathising —
not because autistic people lack emotions or care about others, but because
the cognitive mode that comes naturally to them is one of rules, structures,
and explicit representation rather than social inference and pragmatic
implication. The intense world theory <a href="#ref-4">[4]</a> adds a
complementary perspective: autistic brains may be characterised by
hyper-reactivity and hyper-plasticity, with pattern-seeking and systematising
serving partly as a way of making a too-intense world navigable. The
systematicity is not a deficit. It is an adaptation.</p>
<p>This has direct consequences for writing.</p>
<p>High-systemising writing tends toward:</p>
<ul>
<li>
<p><strong>Consistent vocabulary.</strong> The same term is used for the same concept
throughout, because substituting a synonym introduces ambiguity about
whether the referent is actually the same. Neurotypical writing freely
uses synonyms for stylistic variety; systemising writing resists this
on principle.</p>
</li>
<li>
<p><strong>Explicit logical structure.</strong> Claims are supported by stated reasons
rather than left to pragmatic inference. If there are three conditions,
all three are named. Nothing is &ldquo;needless to say.&rdquo;</p>
</li>
<li>
<p><strong>Low social hedging.</strong> Phrases like &ldquo;as everyone knows&rdquo; or &ldquo;obviously&rdquo;
are avoided, because they perform social alignment rather than convey
information — and they depend on shared assumptions the writer is not
confident are actually shared. (This connects to a point I made in the
<a href="/posts/car-wash-walk/">car-wash-walk post</a> about Gricean pragmatics:
autistic communication often violates the maxim of quantity in the
direction of over-informing, because nothing is assumed implicit.)</p>
</li>
<li>
<p><strong>Grammatical parallelism.</strong> Parallel logical content takes parallel
grammatical form. This is not stylistic affectation; it is a natural
consequence of representing structure explicitly.</p>
</li>
<li>
<p><strong>Minimal rhetorical noise.</strong> The prose does not meander, warm up, or
perform relatability. It states what needs to be stated.</p>
</li>
</ul>
<p>Now run text with these properties through an AI detector. Consistent
vocabulary reads as low lexical diversity. Explicit structure reads as low
burstiness. Minimal rhetorical noise reads as smooth, generated output. The
detector is measuring these properties accurately. The attribution to machine
generation is where it goes wrong.</p>
<p>Liang et al. <a href="#ref-5">[5]</a> demonstrated a closely related failure empirically: AI
detectors are significantly more likely to flag writing by non-native English
speakers as AI-generated. Non-native writers at advanced levels of formal
English tend to write more carefully, more consistently, and more in
accordance with explicit grammar rules — because they learned the language
as a system of explicit rules rather than acquiring it through immersive
social exposure. More systematic writing: higher AI probability score. The
mechanism is the same. The population is different.</p>
<hr>
<h2 id="the-physicist-brain">The Physicist Brain</h2>
<p>Physics writing has its own conventions, independently developed but pointing
in the same direction.</p>
<p>Scientific prose requires defined terms used consistently: in a paper about
quantum error correction, &ldquo;logical qubit,&rdquo; &ldquo;physical qubit,&rdquo; and &ldquo;syndrome&rdquo;
each mean exactly one thing, used identically in section 2 and section 5.
It requires explicit assumptions: &ldquo;We assume the noise is Markovian.&rdquo; &ldquo;In
the limit of large N.&rdquo; These are not vague hedges; they are precise
statements about the domain of validity of the results. It requires logical
derivation over rhetorical persuasion: the connectives are &ldquo;since,&rdquo;
&ldquo;therefore,&rdquo; &ldquo;it follows that&rdquo; — explicit logical operators, not narrative
bridges. And the passive construction of &ldquo;the signal was measured&rdquo; rather
than &ldquo;I measured the signal&rdquo; removes the individual from the result,
because the result should be reproducible regardless of who performs the
measurement.</p>
<p>The outcome is prose that is systematic, consistent, and structurally
predictable. From the outside — and from the vantage point of an AI
detector — it looks machine-generated.</p>
<p>Paul Dirac is the physicist who comes to mind first here. His 1928 paper
deriving the relativistic wave equation for the electron contains almost no
rhetorical apparatus. Motivation, equation, consequence: each stated once,
clearly, with no warm-up and no elaboration beyond what the argument
requires. It is not warm. It is not discursive. It is beautiful in the way
that a proof is beautiful: every element earns its place. Run it through
GPTZero and see what you get.</p>
<p>This connection between the physicist&rsquo;s prose style and the autistic cognitive
mode is not accidental. Baron-Cohen et al. <a href="#ref-6">[6]</a> surveyed Cambridge students
by academic discipline and found that physical scientists and mathematicians
scored consistently higher on the Autism Quotient (AQ) than humanities
students and controls, with mathematicians scoring highest of all. The
systemising orientation associated with autism spectrum conditions is also
overrepresented — and presumably selected for — in quantitative scientific
disciplines. The physicist&rsquo;s prose reflects this. So does the writing of a
high-systemising person who has never studied physics.</p>
<p>The categories overlap without being identical. What they share is a
cognitive preference for explicit structure, consistent vocabulary, and
logical transparency over social performance and rhetorical persuasion. The
writing that emerges from that preference looks, to an AI detector, like it
was generated by a machine.</p>
<p>It was not.</p>
<hr>
<h2 id="the-category-error">The Category Error</h2>
<p>The error AI detectors make is not a measurement error. It is a category
error.</p>
<p>They are trained to distinguish two things: output generated by a
contemporary LLM, and a specific subset of human writing — typically casual,
personal, or student prose collected from online sources. When they encounter
text outside either of those training categories — systematic and precise but
human-generated — the classifier has no good option. The text does not match
the &ldquo;AI&rdquo; training data exactly, and it does not match the &ldquo;human&rdquo; baseline
either. It gets assigned to the bin it fits least badly.</p>
<p>What is happening when the Constitution is flagged: it is systematic,
definitional, prescriptive, and internally consistent. It was written by
lawyers and statesmen who understood that ambiguity in foundational documents
creates legal chaos. They wrote to be unambiguous. The result is text with
low perplexity and low burstiness — the statistical signature the detector
associates with AI.</p>
<p>GPTZero&rsquo;s creator Edward Tian acknowledged this problem when it was reported:
the Constitution appears so frequently in LLM training data that it registers
as &ldquo;already known&rdquo; to the model, which artificially lowers its perplexity
score. That is a real and specific issue. But it is secondary. The deeper
issue is that the Constitution would score low-perplexity even without the
training-data contamination effect, because systematic, definitional prose
is intrinsically low-perplexity. Precise language is predictable language.
That is partly the point of precise language.</p>
<p>The baseline assumption — that human writing is variable and idiosyncratic —
holds for much human writing. It does not hold for legal drafting, technical
documentation, scientific papers, sacred and historical texts written to be
durable and precise, writing by people with high systemising orientation, or
writing by non-native speakers at formal registers. That is not a small
population of edge cases. It is a substantial fraction of all written
material that exists.</p>
<hr>
<h2 id="right-and-wrong-at-the-same-time">Right and Wrong at the Same Time</h2>
<p>So: if you think these posts are AI-generated, you are right and wrong at
the same time.</p>
<p>Right, in two ways. First: yes, I use AI tools. LLM assistance is part of
my writing process — not an occasional aid, but a regular part of how
research notes and half-formed arguments become structured posts. Second:
the writing style of these posts is systematic and precise in ways that
detectors register as machine-generated. That systematicity is real, and
if a detector picks it up, it is measuring something.</p>
<p>Wrong, also in two ways. First: the ideas, judgments, and connections in
these posts are mine. The decisions about what to include and what to leave
out, which papers to cite and how to frame their implications, where the
interesting tension lies between neurodiversity research and the assumptions
baked into AI detection tools — those are not outputs of a language model
working in isolation. They are the product of someone who works at the
intersection of these fields and has thought about them for a while. An LLM
cannot generate these posts without a human who has already decided what
to say.</p>
<p>Second, and more important for the argument here: the systematic, precise
character of this writing is not evidence of machine generation. It is a
cognitive signature — one associated with physics training, with high
systemising orientation, with the <a href="/posts/inner-echo/">overlap between those two things that I
have written about elsewhere</a> in the context of
neurodiversity more broadly.</p>
<p>The detector is measuring a real property of the text. It is misattributing
the origin of that property.</p>
<p>The interesting question this opens is not &ldquo;did AI write this?&rdquo; That question
is increasingly poorly posed in an era where thinking and writing are already
deeply entangled with machine assistance, in ways that differ sharply from
person to person and task to task. The better question is: <em>whose judgment
is in the text?</em> Whose choices about what to include, what to connect, what
to leave out?</p>
<p>The systematicity in this writing is mine. The recognition that AI detectors
systematically disadvantage autistic writers, physicist writers, and
non-native speakers is a judgment I made, not one a language model was
prompted to produce. The connection to the Constitution — a document written
to be maximally unambiguous, flagged as maximally AI-like — is a connection
I found worth drawing.</p>
<p>Whether that makes this text &ldquo;human&rdquo; is a philosophical question I am happy
to leave open. What it is not is AI hallucination.</p>
<hr>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Mitchell, E., Lee, Y., Khazatsky, A., Manning, C. D., &amp; Finn, C. (2023). DetectGPT: Zero-shot machine-generated text detection using probability curvature. <em>Proceedings of the 40th International Conference on Machine Learning (ICML 2023)</em>. <a href="https://arxiv.org/abs/2301.11305">https://arxiv.org/abs/2301.11305</a></p>
<p><span id="ref-2"></span>[2] Gehrmann, S., Strobelt, H., &amp; Rush, A. M. (2019). GLTR: Statistical detection and visualization of generated text. <em>Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</em>, 111–116. <a href="https://doi.org/10.18653/v1/P19-3019">https://doi.org/10.18653/v1/P19-3019</a></p>
<p><span id="ref-3"></span>[3] Baron-Cohen, S. (2009). Autism: The empathising–systemising (E-S) theory. <em>Annals of the New York Academy of Sciences</em>, 1156(1), 68–80. <a href="https://doi.org/10.1111/j.1749-6632.2009.04467.x">https://doi.org/10.1111/j.1749-6632.2009.04467.x</a></p>
<p><span id="ref-4"></span>[4] Markram, K., &amp; Markram, H. (2010). The intense world theory — a unifying theory of the neurobiology of autism. <em>Frontiers in Human Neuroscience</em>, 4, 224. <a href="https://doi.org/10.3389/fnhum.2010.00224">https://doi.org/10.3389/fnhum.2010.00224</a></p>
<p><span id="ref-5"></span>[5] Liang, W., Yuksekgonul, M., Mao, Y., Wu, E., &amp; Zou, J. (2023). GPT detectors are biased against non-native English writers. <em>Patterns</em>, 4(7), 100779. <a href="https://doi.org/10.1016/j.patter.2023.100779">https://doi.org/10.1016/j.patter.2023.100779</a></p>
<p><span id="ref-6"></span>[6] Baron-Cohen, S., Wheelwright, S., Skinner, R., Martin, J., &amp; Clubley, E. (2001). The autism-spectrum quotient (AQ): Evidence from Asperger syndrome/high-functioning autism, males and females, scientists and mathematicians. <em>Journal of Autism and Developmental Disorders</em>, 31(1), 5–17. <a href="https://doi.org/10.1023/A:1005653411471">https://doi.org/10.1023/A:1005653411471</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>Try to Relax — and Other Things That Prevent Themselves</title>
      <link>https://sebastianspicker.github.io/posts/try-to-relax-ironic-process-wormholes/</link>
      <pubDate>Thu, 15 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/try-to-relax-ironic-process-wormholes/</guid>
      <description>&amp;ldquo;Try to relax&amp;rdquo; is a paradox with a precise psychological mechanism. So is the traversable wormhole: the geometry you need to cross spacetime closes the moment you try to use it. The grandfather paradox, Wegner&amp;rsquo;s ironic monitoring process, and Rick Sanchez&amp;rsquo;s nihilism problem all share the same deep structure — and understanding that structure is more interesting than any of the individual cases.</description>
      <content:encoded><![CDATA[<p>Someone, at some point in your life, has told you to relax. They may have
specified that you should <em>try</em> to relax — as though relaxation were an
effortful goal you could pursue with sufficient will. If you have ever
received this advice and found it made things worse, you were not imagining
it. You were experiencing a phenomenon with a name, a precise mechanism,
and — it turns out — a surprising structural analogue in the geometry of
spacetime.</p>
<hr>
<h2 id="the-ironic-process">The Ironic Process</h2>
<p>In 1994, the social psychologist Daniel Wegner published a paper that
formalised what most people already suspected: trying not to think of
something makes you think of it more <a href="#ref-1">[1]</a>. The theoretical
model behind this has two components.</p>
<p>The first is an <strong>operating process</strong>: it actively generates mental content
consistent with the intended state. You are trying to relax — the operating
process searches for calming thoughts, slows your attention, tries to find
the mood.</p>
<p>The second is a <strong>monitoring process</strong>: it runs in parallel, searching for
evidence that the goal has <em>not</em> been achieved. Am I relaxed yet? No.
Checking again. Still no. Its function is to detect failure early so the
operating process can correct course.</p>
<p>Under normal conditions, the operating process dominates. You try to relax,
the monitor runs quietly in the background, and eventually you converge on
the intended state. Under conditions of cognitive load, stress, or
self-consciousness — precisely the conditions under which someone might
urgently need to relax — the balance shifts. The monitoring process,
searching for signs of not-relaxing, finds them everywhere. The monitor
activates the very content it is supposed to prevent. The harder you try,
the louder the monitor, the further from the goal.</p>
<p>This is Wegner&rsquo;s ironic process: the mechanism recruited to achieve a goal
becomes the primary obstacle to that goal. It is not failure of will. It
is a structural property of the system — and it applies to any goal whose
target state is the <em>absence</em> of effortful activity. Trying to fall asleep.
Trying not to feel anxious about a performance. Trying to be spontaneous.
Trying, in the most purely paradoxical formulation, to relax.</p>
<p>The instruction &ldquo;try to relax&rdquo; is not bad advice because the advice-giver
lacks empathy. It is bad advice because it is a <em>category error</em>: it applies
an effort-based tool to a goal defined by the absence of effort. The
monitoring process required to track progress toward the goal is precisely
the kind of activity that constitutes not having reached it.</p>
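<p>The load-dependence is the structural point, and it can be shown with a
deliberately crude toy simulation. This is not Wegner&rsquo;s formalisation, just a
two-process caricature with made-up probabilities: an effortful operating
process that weakens under load, and an automatic monitor whose checking
re-activates the unwanted content.</p>
<pre><code class="language-python"># Toy two-process caricature of the ironic process (illustration, not a model fit).
import random

def fraction_tense(load: float, steps: int = 10000, seed: int = 0) -> float:
    """Fraction of time spent on the unwanted thought, for load in [0, 1]."""
    rng = random.Random(seed)
    thought = "tense"
    tense_steps = 0
    for _ in range(steps):
        # Operating process: effortful search for goal-consistent content,
        # weakened by cognitive load.
        if rng.random() < 0.8 * (1.0 - load):
            thought = "calm"
        # Monitoring process: automatic check for failure. The check itself
        # primes the unwanted content, regardless of load.
        if rng.random() < 0.3:
            thought = "tense"
        tense_steps += thought == "tense"
    return tense_steps / steps

print(fraction_tense(load=0.1), fraction_tense(load=0.9))  # roughly 0.37 vs 0.84
</code></pre>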
<hr>
<h2 id="a-geometry-that-does-the-same-thing">A Geometry That Does the Same Thing</h2>
<p>The analogy I want to draw requires a brief detour into general relativity.</p>
<p>In 1988, Michael Morris and Kip Thorne published a paper with the
unpromising title &ldquo;Wormholes in spacetime and their use for interstellar
travel: A tool for teaching general relativity&rdquo; <a href="#ref-2">[2]</a>.
It is, in the field&rsquo;s understated way, one of the more consequential papers
in the subject. Morris and Thorne asked: what would a traversable wormhole —
one you could actually pass through — require, physically and mathematically?</p>
<p>The spacetime metric of a traversable wormhole in their formulation is:</p>
$$ds^2 = -e^{2\Phi(r)}\,dt^2 + \frac{dr^2}{1 - b(r)/r} + r^2\,d\Omega^2$$<p>where $\Phi(r)$ is the redshift function and $b(r)$ is the shape function.
The throat of the wormhole sits at $r = r_0$, where $b(r_0) = r_0$.
For anything to pass through in finite proper time, $\Phi$ must remain
finite — no infinite redshift — and $b(r)/r$ must remain less than one
away from the throat.</p>
<p>So far this is just geometry. The physics enters through the Einstein field
equations, which connect the geometry to the matter and energy present.
To maintain the wormhole throat against collapse — to hold it open — the
stress-energy tensor of whatever matter fills the throat must satisfy:</p>
$$T_{\mu\nu}\, k^\mu k^\nu < 0$$<p>for null vectors $k^\mu$ — what is called a <em>violation of the null energy
condition</em>. In plain terms: the matter holding the wormhole open must have
negative energy density. Not small energy density. <em>Negative</em> — less than
nothing.</p>
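<p>Where that statement comes from, in one step (with $G = c = 1$, following
Morris and Thorne&rsquo;s notation): the Einstein equations for the metric above give
an energy density $\rho = b'/(8\pi r^2)$ and a radial tension
$\tau = \left[\,b/r - 2(r-b)\Phi'\,\right]/(8\pi r^2)$. At the throat $b(r_0) = r_0$,
and the flaring-out condition that keeps the throat from pinching off requires
$b'(r_0) \lt 1$, so</p>
$$\tau_0 - \rho_0 \;=\; \frac{1 - b'(r_0)}{8\pi r_0^2} \;>\; 0.$$<p>The tension at the throat exceeds the energy density, and for a radial null
vector this is exactly the statement $T_{\mu\nu}\, k^\mu k^\nu \propto \rho - \tau < 0$.
The geometrised units are the only assumption added here.</p>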
<p>This is exotic matter. It does not appear in any tabletop experiment.
Classical general relativity does not rule it out, but it does not provide
it either.</p>
<p>Quantum mechanics is slightly more helpful: the Casimir effect produces
measurable negative energy density between closely spaced conducting plates.
The Hawking radiation calculation involves transient negative energy near
black hole horizons. So quantum field theory permits negative energy — in
principle. But Ford and Roman <a href="#ref-3">[3]</a> showed that quantum field theory also
strictly <em>limits</em> it: the integrated negative energy over any region is
bounded by a quantum inequality. The shorter the burst of negative energy,
the smaller it must be; the larger the region, the more constrained the
magnitude. The result is that any realistic traversable wormhole would be
either Planck-scale (far too small for anything but quantum information to
traverse) or would require negative energy concentrated in a band many
orders of magnitude thinner than the throat itself — an engineering
requirement that borders on the physically absurd.</p>
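<p>The original Ford-Roman bound, for a free massless scalar field in flat
spacetime sampled with a Lorentzian weighting of width $\tau_0$ (and with
$\hbar = c = 1$), has the form</p>
$$\frac{\tau_0}{\pi}\int_{-\infty}^{\infty}\frac{\langle T_{\mu\nu}u^\mu u^\nu\rangle}{t^2 + \tau_0^2}\,dt \;\ge\; -\frac{3}{32\pi^2\tau_0^4},$$<p>so the sampled negative energy density scales as $\tau_0^{-4}$: quadruple
the sampling time and the permitted magnitude drops by a factor of 256. The
curved-spacetime generalisations used for wormholes modify the details but not
the trade-off.</p>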
<p>The wormhole, in other words, does something structurally similar to the
monitoring process in Wegner&rsquo;s model: the condition required to make it
traversable actively resists being satisfied. The geometry that would allow
passage tends toward collapse. The more you want the wormhole to be open
and stable, the more the energy conditions conspire against you.</p>
<hr>
<h2 id="what-the-2022-wormhole-actually-was">What the 2022 &ldquo;Wormhole&rdquo; Actually Was</h2>
<p>In late 2022, a team including Daniel Jafferis, Alexander Zlokapa, and
colleagues at Caltech and Google published a paper in <em>Nature</em> with the
title &ldquo;Traversable wormhole dynamics on a quantum processor&rdquo; <a href="#ref-4">[4]</a>. Several major news outlets reported that scientists had
created a wormhole. This was not accurate.</p>
<p>What the team actually did was implement a quantum circuit on Google&rsquo;s
Sycamore processor that simulates the Sachdev-Ye-Kitaev (SYK) model —
a quantum mechanical system of randomly interacting fermions that is
holographically dual, via Maldacena&rsquo;s AdS/CFT correspondence, to a
nearly two-dimensional anti-de Sitter black hole geometry. Two coupled
SYK systems are dual to a two-sided eternal black hole, which is connected
in the bulk by an Einstein-Rosen bridge — a wormhole.</p>
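<p>For readers who want the object itself: a single SYK system is $N$ Majorana
fermions $\chi_i$, satisfying $\{\chi_i, \chi_j\} = \delta_{ij}$, coupled in
groups of four through independent Gaussian random couplings,</p>
$$H_{\mathrm{SYK}} = \sum_{i \lt j \lt k \lt l} J_{ijkl}\,\chi_i\chi_j\chi_k\chi_l, \qquad \overline{J_{ijkl}^{\,2}} = \frac{3!\,J^2}{N^3}.$$<p>This is the textbook form of the model. The 2022 experiment implemented a
heavily simplified, sparsified variant of two such coupled systems on a small
number of qubits; the Hamiltonian above is the theory being approximated, not
the circuit that ran on the processor.</p>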
<p>By coupling the two systems with a specific negative coupling (which
corresponds, in the holographic dictionary, to injecting negative energy into the wormhole),
the team made the bridge traversable in the holographic sense: information
encoded in one quantum system propagated and was recovered in the other,
consistent with traversal of the dual gravitational wormhole.</p>
<p>This is genuinely interesting physics. It is not a wormhole through our
spacetime. The wormhole lives in the holographic dual geometry — a
mathematical construct in a lower-dimensional theory of gravity, not a
tunnel between two points in the universe you inhabit. Quantum teleportation
occurred on a quantum chip via the ordinary mechanism of quantum
entanglement. The gravitational language is a description of the
same physics in a dual frame, not a shortcut through space.</p>
<p>The media confusion is itself instructive: &ldquo;wormhole&rdquo; has drifted far from
its original meaning. In current physics, the word can refer to a
Morris-Thorne traversable tunnel through spacetime, to the Einstein-Rosen
bridge of an eternal black hole, to a holographic dual of quantum
entanglement <a href="#ref-5">[5]</a>, or to saddle points in the
Euclidean gravitational path integral relevant to the black hole information
paradox. These are related by mathematics but quite different in what they
physically represent. None of the last three are traversable shortcuts
through the universe. The first is, in principle, but barely, and only at
the cost of exotic matter physics that nobody knows how to achieve.</p>
<p>The harder physicists have worked to make the wormhole genuinely traversable
and macroscopic, the more the mathematics has resisted. This is, at minimum,
a suggestive pattern.</p>
<hr>
<h2 id="what-2025-added">What 2025 Added</h2>
<p>The field did not stand still after 2022. Three independent lines of work
published in 2024 and 2025 have further complicated what a wormhole is —
and in each case the complication pushes in the same direction: the geometry
keeps refusing to be a shortcut.</p>
<p><strong>The wormhole that does not connect two things.</strong> Maloney, Meruliya, and Van Raamsdonk <a href="#ref-7">[7]</a> showed that Euclidean wormholes — saddle points in
the gravitational path integral — appear generically in ordinary
higher-dimensional gravity, without any special setup. The striking
implication is that these wormholes do not bridge two separate universes
or two separate theories; they encode statistical fluctuations <em>within a
single theory</em>. The replica wormholes that resolved the Page curve for
black hole radiation — one of the central recent results in the black hole
information paradox — are of this type. The wormhole is not a connection
between two things. It is a feature of how the theory sums over histories,
a bookkeeping structure for correlations within one system. The physical
picture of two mouths joined by a throat does not apply.</p>
<p><strong>The wormhole that is not smooth.</strong> Magán, Sasieta, and Swingle <a href="#ref-8">[8]</a> studied the interior geometry of the Einstein-Rosen bridge connecting
typical entangled black holes — the configuration that is supposed, under
ER=EPR, to be the gravitational dual of quantum entanglement. Their result,
published in <em>Physical Review Letters</em>, is that this interior is not a
smooth tunnel. It is long, irregular, and chaotic — an Einstein-Rosen
caterpillar, as they call it. The quantum randomness of the entangled state
maps directly onto geometric disorder in the interior: the more thermalized
the state, the more disordered the bridge. A traversing observer, if one
could exist, would not glide through a clean throat. They would navigate a
geometry shaped by quantum chaos, growing longer and more disordered as
the system evolves. This is ER=EPR taken seriously at the level of typical
states rather than special ones, and the result is inhospitable to any
ordinary notion of passage.</p>
<p><strong>The wormhole that is not a tunnel at all.</strong> Gaztañaga, Kumar, and Marto <a href="#ref-9">[9]</a> proposed a more radical reinterpretation: the Einstein-Rosen bridge,
they argue, is not a connection between two separate spaces but a
representation of time-reversal symmetry within a single quantum description.
On this reading, there is only one space, and the bridge is an artefact
of how you describe the time-symmetric structure of the quantum state. The
paper, published in <em>Classical and Quantum Gravity</em>, attracted considerable
press coverage. It sits somewhat outside the mainstream of holographic
quantum gravity research, and the proposal has not yet been widely
integrated into the community&rsquo;s working framework — the language of two
entangled systems and a connecting geometry remains the dominant picture
in AdS/CFT calculations. But the direction it points is consistent with
the other two results.</p>
<p>Taken together, these papers suggest that the word &ldquo;wormhole&rdquo; has been
quietly revised from a noun into an adjective. Not a thing that exists
somewhere, but a property of certain mathematical structures — one that
describes correlation, disorder, or symmetry depending on which context
you are working in. Each attempt to pin down what a wormhole <em>is</em> in
practice finds something less traversable, less connected, and less
tunnel-like than the previous attempt.</p>
<p>This is, to put it plainly, consistent with the theme of this article.</p>
<hr>
<h2 id="causation-eating-its-own-tail">Causation Eating Its Own Tail</h2>
<p>The wormhole&rsquo;s physical problems become even sharper when you add time.
A traversable wormhole connecting two different spacetime regions can in
principle connect not just two different places but two different <em>times</em> —
creating a closed timelike curve (CTC), a path through spacetime that loops
back on itself. You leave on Tuesday and arrive last Thursday.</p>
<p>The standard paradoxes then apply. The grandfather paradox: you travel back
in time, prevent an event that was a necessary precondition of your journey.
The causal chain that produced the journey destroys the causal chain that
produced the journey. The bootstrap paradox: an object or piece of
information exists with no origin — passed back in time repeatedly, it has
always already existed, created by nothing, caused by itself.</p>
<p>Friedman, Morris, Novikov and colleagues formalised what has become known
as the Novikov self-consistency principle: the only physically admissible
solutions are those in which the causal structure is globally consistent <a href="#ref-6">[6]</a>. No grandfather paradox — not because you cannot
go back, but because if you do, it turns out you were always part of the
causal chain you thought you were disrupting. The time-traveller cannot
prevent an event; they can only be the mechanism by which it occurred.</p>
<p>This is not resolution. It is constraint. The universe selects only the
self-consistent loops, filtering out everything else. The causal structure
enforces a particular kind of conservatism: only actions that were always
going to happen can happen. There is no freedom in a closed timelike curve.
Trying to change the loop from inside it is exactly like trying to relax
by monitoring whether you have relaxed: the mechanism of change is part
of the thing you are trying to change.</p>
<hr>
<h2 id="rick-sanchezs-particular-problem">Rick Sanchez&rsquo;s Particular Problem</h2>
<p>Rick and Morty is, among other things, a sustained meditation on
this structure — without ever calling it that.</p>
<p>Rick Sanchez is the smartest being in every universe. His portal gun
creates traversable wormholes instantaneously and at negligible energy
cost, which is exactly what general relativity and quantum field theory
suggest should be impossible. The show waves this away; what it does not
wave away is the <em>psychological</em> consequence of Rick&rsquo;s capability.</p>
<p>Rick has thought his way to the conclusion that nothing matters. Infinite
universes, infinite timelines, infinite Ricks: every moment is replaceable,
every loss is recoverable somewhere else, every moral weight dissolves
in the face of the combinatorial enormity of everything that exists. This
is Rick&rsquo;s version of relaxation — the nihilism that should follow from
taking the multiverse seriously.</p>
<p>But the monitoring process runs. Rick checks whether he has achieved
not-caring, finds that he cares (about Morty, about Beth, about being
the smartest one in the room), and the caring becomes more vivid for
having been suppressed. His nihilism is not peace. It is a performance of
peace that is constantly undermined by the monitoring process watching
for cracks.</p>
<p>Rick&rsquo;s portal gun solves every spatial and temporal problem. It does not
solve the ironic process. No level of intelligence, and no number of
traversable wormholes, provides a shortcut past Wegner&rsquo;s monitor.
This is, I think, what makes the character work: the show&rsquo;s impossible
physics is the premise, but the <em>actually</em> impossible thing — the one the
show treats as genuinely intractable — is the psychological paradox.</p>
<hr>
<h2 id="the-common-structure">The Common Structure</h2>
<p>These cases — the relaxation paradox, the traversable wormhole, the closed
timelike curve — share a formal structure.</p>
<p>In each case, there is a desired end state (relaxation, passage through the
wormhole, a changed past) and a mechanism for pursuing it (effortful
monitoring, exotic matter, time travel). In each case, the mechanism
required to pursue the end state is incompatible with the end state itself.
The monitoring process that tracks &ldquo;am I relaxed?&rdquo; is the activity of not
being relaxed. The exotic matter that holds the wormhole open is the
physical condition that makes the geometry so extreme that traversal is
barely possible. The attempt to change the past is always already part
of the past you were trying to change.</p>
<p>The physicist&rsquo;s version of this is the quantum measurement problem: the act
of observing a system disturbs it. The observer cannot step outside the
measurement. The psychologist&rsquo;s version is the ironic process. The
relativist&rsquo;s version is the closed timelike curve. The narrative version
is Rick Sanchez.</p>
<hr>
<h2 id="what-actually-works">What Actually Works</h2>
<p>Wegner&rsquo;s answer to the ironic process is not to try harder with the
operating process. It is to release the monitoring process — to stop checking
whether the goal has been achieved. This is the core insight behind
Acceptance and Commitment Therapy: you cannot think your way to not-thinking.
The goal of not-thinking requires not-monitoring, which means not having the
goal in the active, effortful sense at all.</p>
<p>This is harder than it sounds. It is a second-order intervention: instead
of trying to relax, you try to stop trying to relax — which, done badly,
just adds another monitoring process. But done well, it is the correct
diagnosis: the category error was treating relaxation as an effortful goal
in the first place.</p>
<p>For wormholes, the physics community has arrived at a related answer. The
question &ldquo;how do we make a macroscopic traversable wormhole in our
spacetime?&rdquo; may be the wrong question. The ER=EPR framework suggests that
wormholes and quantum entanglement are two descriptions of the same thing.
The question is not how to build a tunnel; it is what the entanglement
structure of spacetime already is, and how information is already being
transferred through it. The shortcut was never a shortcut. It was always
just the ordinary geometry of entangled quantum systems, described in
a language that made it look exotic.</p>
<p>For Rick Sanchez, the show has not found an answer. Which is, probably,
the correct narrative decision.</p>
<hr>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Wegner, D. M. (1994). Ironic processes of mental control. <em>Psychological Review</em>, 101(1), 34–52. <a href="https://doi.org/10.1037/0033-295X.101.1.34">https://doi.org/10.1037/0033-295X.101.1.34</a></p>
<p><span id="ref-2"></span>[2] Morris, M. S., &amp; Thorne, K. S. (1988). Wormholes in spacetime and their use for interstellar travel: A tool for teaching general relativity. <em>American Journal of Physics</em>, 56(5), 395–412. <a href="https://doi.org/10.1119/1.15620">https://doi.org/10.1119/1.15620</a></p>
<p><span id="ref-3"></span>[3] Ford, L. H., &amp; Roman, T. A. (1996). Quantum field theory constrains traversable wormhole geometries. <em>Physical Review D</em>, 53(10), 5496–5507. <a href="https://doi.org/10.1103/PhysRevD.53.5496">https://doi.org/10.1103/PhysRevD.53.5496</a></p>
<p><span id="ref-4"></span>[4] Jafferis, D., Zlokapa, A., Lykken, J. D., Kolchmeyer, D. K., Davis, S. I., Lauk, N., Neven, H., &amp; Spiropulu, M. (2022). Traversable wormhole dynamics on a quantum processor. <em>Nature</em>, 612, 51–55. <a href="https://doi.org/10.1038/s41586-022-05424-3">https://doi.org/10.1038/s41586-022-05424-3</a></p>
<p><span id="ref-5"></span>[5] Maldacena, J., &amp; Susskind, L. (2013). Cool horizons for entangled black holes. <em>Fortschritte der Physik</em>, 61(9), 781–811. <a href="https://doi.org/10.1002/prop.201300020">https://doi.org/10.1002/prop.201300020</a></p>
<p><span id="ref-6"></span>[6] Friedman, J., Morris, M. S., Novikov, I. D., Echeverria, F., Klinkhammer, G., Thorne, K. S., &amp; Yurtsever, U. (1990). Cauchy problem in spacetimes with closed timelike curves. <em>Physical Review D</em>, 42(6), 1915–1930. <a href="https://doi.org/10.1103/PhysRevD.42.1915">https://doi.org/10.1103/PhysRevD.42.1915</a></p>
<p><span id="ref-7"></span>[7] Maloney, A., Meruliya, V., &amp; Van Raamsdonk, M. (2025). arXiv:2503.12227. <a href="https://arxiv.org/abs/2503.12227">https://arxiv.org/abs/2503.12227</a></p>
<p><span id="ref-8"></span>[8] Magán, J. M., Sasieta, M., &amp; Swingle, B. (2025). Einstein-Rosen caterpillar. <em>Physical Review Letters</em>, 135. <a href="https://doi.org/10.1103/btw6-44ry">https://doi.org/10.1103/btw6-44ry</a></p>
<p><span id="ref-9"></span>[9] Gaztañaga, E., Kumar, A., &amp; Marto, J. (2025). <em>Classical and Quantum Gravity</em>. <a href="https://doi.org/10.1088/1361-6382/ae3044">https://doi.org/10.1088/1361-6382/ae3044</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>A Gas at Temperature T: Xenakis and the Physics of Stochastic Music</title>
      <link>https://sebastianspicker.github.io/posts/xenakis-stochastic-music/</link>
      <pubDate>Tue, 14 Oct 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/xenakis-stochastic-music/</guid>
      <description>Iannis Xenakis applied the Maxwell-Boltzmann velocity distribution, Markov chains, and game theory to orchestral composition. In Pithoprakta (1955–56), 46 string parts are molecules of a gas, each following the kinetic theory distribution. In Duel and Stratégie (1959–62), two conductors play a zero-sum game with payoff matrices on stage. This post works through the physics and mathematics, and asks what it means when a composer treats an orchestra as a thermodynamic system.</description>
      <content:encoded><![CDATA[<p><em>Iannis Xenakis (1922–2001) was trained as a civil engineer at the Athens
Polytechnic, joined the Greek Resistance during the Second World War and the
subsequent Greek Civil War, survived a British army tank shell in January 1945
that cost him the sight in his left eye and part of his jaw, was sentenced to
death in absentia by the Greek military government, fled to Paris in 1947, and
worked for twelve years as an architect in Le Corbusier&rsquo;s atelier — where he
contributed structural engineering to the Unité d&rsquo;Habitation in Marseille and
designed the Philips Pavilion for Expo 58. In parallel, already in his thirties,
he taught himself composition — approaching Honegger (who was too ill to teach) and then studying with Messiaen
— and became one of the central figures of the post-war avant-garde. I mention
the biography not as background colour but because it bears on the physics. A
person who has been through what Xenakis had been through by 1950 is not likely
to be intimidated by the kinetic theory of gases.</em></p>
<p><em>He was not. In 1955–56 he composed</em> Pithoprakta <em>— &ldquo;actions through
probability&rdquo; — for 46 strings, each of which is, in his own account, a
molecule of an ideal gas. This post works through the mathematics he
used and asks what it means when a composer takes statistical mechanics
seriously as a compositional tool.</em></p>
<hr>
<h2 id="the-problem-with-post-war-serialism">The Problem with Post-War Serialism</h2>
<p>To understand why Xenakis did what he did, it helps to know what everyone
else was doing. By the early 1950s, the dominant tendency in European
new music was total serialism: the systematic extension of Schoenberg&rsquo;s
twelve-tone technique to rhythm, dynamics, articulation, and register. Every
parameter of every note was determined by a series. Messiaen had sketched
this direction in <em>Mode de valeurs et d&rsquo;intensités</em> (1949); Boulez and
Stockhausen had taken it to its logical extreme.</p>
<p>The result, as Xenakis observed with characteristic bluntness in <em>Formalized
Music</em> (1963/1992), was a kind of sonic indistinguishability: because every
parameter varied according to independent deterministic series, the textures
produced by total serialism sounded essentially like random noise. The
maximum of local determinism had produced the appearance of global chaos.</p>
<p>His diagnosis was precise and, I think, correct: if the perceptual result of
maximum determinism and maximum randomness is the same, then the path forward
is not to find a better deterministic scheme but to embrace randomness
explicitly, at the level that governs the <em>macroscopic</em> structure. Control the
distribution; let the individual events vary within it. This is exactly what
statistical mechanics does for a gas: it does not track every molecule, but
it knows with great precision what the distribution of velocities will be.</p>
<hr>
<h2 id="statistical-mechanics-in-brief">Statistical Mechanics in Brief</h2>
<p>In a classical ideal gas of $N$ molecules at thermal equilibrium with
temperature $T$, the molecules move in all directions with speeds distributed
according to the Maxwell-Boltzmann speed distribution:</p>
$$f(v) = \sqrt{\frac{2}{\pi}}\, \frac{v^2}{a^3}\, \exp\!\left(-\frac{v^2}{2a^2}\right), \qquad a = \sqrt{\frac{k_B T}{m}},$$<p>where $m$ is the molecular mass and $k_B$ is Boltzmann&rsquo;s constant. The
parameter $a$ sets the characteristic speed scale: it grows with temperature
(hotter gas means faster molecules) and shrinks with molecular mass (heavier
molecules move more slowly at the same temperature).</p>
<p>The distribution has a characteristic shape: it rises as $v^2$ for small
speeds (few molecules are nearly stationary), peaks at the most probable
speed $v_p = a\sqrt{2}$, and falls off as $e^{-v^2/2a^2}$ for large speeds
(very fast molecules are exponentially rare). The three characteristic
speeds are:</p>
$$v_p = a\sqrt{2}, \qquad \langle v \rangle = a\sqrt{\tfrac{8}{\pi}}, \qquad v_\mathrm{rms} = a\sqrt{3}.$$<p>No individual molecule is tracked. The distribution is everything: once you
know $f(v)$, you know all macroscopic properties of the gas — pressure,
mean kinetic energy, thermal conductivity — without knowing the trajectory
of a single molecule. The individual is sacrificed to the ensemble.</p>
<hr>
<h2 id="pithoprakta-and-the-orchestra-as-gas"><em>Pithoprakta</em> and the Orchestra as Gas</h2>
<p>In <em>Pithoprakta</em> (1955–56), Xenakis treats each of the 46 string instruments
as a molecule of a gas. The musical analogue of molecular speed is the
<em>velocity of a glissando</em>: the rate at which a glissando moves through
pitch, measured in semitones per second. Slow glissandi are cold molecules;
fast glissandi are hot ones.</p>
<p>For a given passage with a specified musical &ldquo;temperature&rdquo; (an
intensity-and-density parameter he could set as a compositional choice),
the 46 glissando speeds are drawn from the Maxwell-Boltzmann distribution
for that temperature. No two strings play the same glissando at the same
speed. The effect, to a listener, is a dense sound-mass — a shimmer or
a roar — whose internal texture varies but whose overall character (the
temperature, the density) is under the composer&rsquo;s control at exactly the
level that matters perceptually.</p>
<p>Xenakis worked out the velocities numerically by hand. The score of
<em>Pithoprakta</em> was among the first in which the individual parts were derived
from a statistical distribution rather than from a melody, a row, or an
improvisation instruction. The calculation is tedious but not difficult:
for each time window, choose a temperature, compute $f(v)$ for the 46
values of $v$ that tile the distribution, and assign one speed to each
instrument.</p>
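<p>To make the procedure concrete, here is a minimal sketch in Python (not Xenakis&rsquo;s hand calculation, and with an arbitrary scale parameter standing in for the musical temperature) that tiles the Maxwell-Boltzmann distribution at 46 equally spaced quantiles and assigns one glissando speed to each part.</p>
<pre><code class="language-python"># A sketch, not Xenakis's method: tile the Maxwell-Boltzmann speed
# distribution at equally spaced quantiles and give one speed to each part.
import numpy as np
from scipy.stats import maxwell

N_PARTS = 46        # string parts in Pithoprakta
a = 4.0             # scale parameter, a stand-in for the musical "temperature"

# Mid-point quantiles 1/(2N), 3/(2N), ... tile the distribution evenly.
quantiles = (np.arange(N_PARTS) + 0.5) / N_PARTS
speeds = maxwell.ppf(quantiles, scale=a)   # glissando speeds, semitones/second

for part, v in enumerate(speeds, start=1):
    print(f"part {part:2d}: {v:5.2f} semitones per second")

# Sanity check against the closed-form characteristic speeds.
print("most probable:", a * np.sqrt(2))
print("mean         :", a * np.sqrt(8 / np.pi))
print("rms          :", a * np.sqrt(3))
</code></pre>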
<p>The connection between macroscopic structure and microscopic liberty is
deliberately preserved. The shape of the sound-mass — its brightness,
its turbulence, its rate of change — is controlled. Each individual line
is unpredictable. This is, structurally, the same trade-off that makes
thermodynamics work: you give up on the individual trajectory and gain
exact knowledge of the aggregate.</p>
<hr>
<h2 id="musical-temperature-as-a-compositional-parameter">Musical Temperature as a Compositional Parameter</h2>
<p>The analogy is worth making precise. In the physical gas, raising the
temperature $T$ increases $a = \sqrt{k_B T / m}$, which shifts the
peak of $f(v)$ to the right and widens the distribution. More molecules
have high speeds; the variance of speeds increases.</p>
<p>In <em>Pithoprakta</em>, raising the musical &ldquo;temperature&rdquo; has the same
effect: more instruments perform rapid glissandi; the pitch-space
trajectories are more varied; the texture becomes more active and
more turbulent. Lowering the temperature concentrates the glissando
speeds near zero — slow motion, near-stasis, long sustained tones
that change pitch only gradually. The orchestra cools.</p>
<p>This mapping is not metaphorical. Xenakis computed it. The score
contains numerically derived glissando speeds; the connection between the
perceptual temperature of the texture and the statistical parameter $T$ is
quantitative. When musicians speak of a passage &ldquo;heating up,&rdquo; they are
usually using a figure of speech. In <em>Pithoprakta</em>, they are describing
a thermodynamic fact.</p>
<hr>
<h2 id="the-poisson-distribution-and-event-density">The Poisson Distribution and Event Density</h2>
<p><em>Pithoprakta</em> uses a second physical model alongside the Maxwell-Boltzmann
distribution: the Poisson process, which governs the density of
independent, randomly occurring events.</p>
<p>If musical events (pizzicato attacks, bow changes, individual note entries)
occur at a mean rate of $\lambda$ events per second, the probability of
exactly $k$ events occurring in a time window of length $T$ (a duration here,
not the temperature of the earlier sections) is:</p>
$$P(N = k) = \frac{(\lambda T)^k\, e^{-\lambda T}}{k!}.$$<p>The Poisson distribution has a single parameter $\lambda$ that controls
both the mean and the variance (they are equal: $\langle N \rangle =
\mathrm{Var}(N) = \lambda T$). A high $\lambda$ produces a dense cluster
of events; a low $\lambda$ produces sparse, widely spaced events.</p>
<p>Xenakis used this to control the density of pizzicato attacks independently
of the glissando texture. A passage can be cool (slow glissandi) and dense
(many pizzicati), or hot and sparse, or any combination. The two
distributions operate on independent musical parameters — pitch motion and
event density — giving the composer a two-dimensional thermodynamic control
space over the texture.</p>
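<p>A corresponding sketch for the event-density side, with illustrative values rather than anything taken from the score: draw the number of attacks in a window from the Poisson distribution, then spread them uniformly across the window.</p>
<pre><code class="language-python"># A sketch with illustrative parameters, not values from the score: place
# pizzicato attacks in a window as a homogeneous Poisson process of rate lam.
import numpy as np

rng = np.random.default_rng(0)
lam = 8.0      # mean attacks per second (the "density" knob)
T = 5.0        # window length in seconds

n_events = rng.poisson(lam * T)                 # how many attacks fall in the window
attack_times = np.sort(rng.uniform(0.0, T, size=n_events))

print(f"{n_events} attacks (mean and variance are both {lam * T:.0f})")
print(np.round(attack_times, 2))
</code></pre>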
<hr>
<h2 id="markov-chains-analogique-a-and-analogique-b">Markov Chains: <em>Analogique A</em> and <em>Analogique B</em></h2>
<p>In <em>Analogique A</em> (for string orchestra, 1958–59) and its companion
<em>Analogique B</em> (for sinusoidal tones, same year), Xenakis moved to a
different stochastic framework: Markov chains.</p>
<p>A Markov chain is a sequence of states where the probability of
transitioning to the next state depends only on the current state. The
chain is specified by a transition matrix $P$, where $P_{ij}$ is the
probability of moving from state $i$ to state $j$:</p>
$$P_{ij} \geq 0, \qquad \sum_j P_{ij} = 1 \quad \forall\, i.$$<p>Under mild conditions (irreducibility and aperiodicity), the chain
converges to a unique stationary distribution $\pi$ satisfying:</p>
$$\pi P = \pi, \qquad \sum_i \pi_i = 1.$$<p>The convergence is geometric: if $\lambda_2$ is the second-largest eigenvalue
of $P$ in absolute value, then after $n$ steps the distribution $\pi^{(n)}$
satisfies $\|\pi^{(n)} - \pi\| \leq C |\lambda_2|^n$ for some constant $C$.
The gap $1 - |\lambda_2|$ — the <em>spectral gap</em> — controls how quickly the
chain forgets its initial state. A transition matrix with a large spectral
gap produces rapid convergence; one with $|\lambda_2| \approx 1$ produces
long-memory dependence between distant states. This is a compositional
choice: the spectral gap determines how quickly a piece&rsquo;s texture changes
character.</p>
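<p>The computation involved is small enough to show directly. The sketch below uses a hypothetical three-state transition matrix (Xenakis&rsquo;s actual matrices for <em>Analogique A</em> are larger) and extracts the stationary distribution and the spectral gap.</p>
<pre><code class="language-python"># A sketch with a hypothetical 3-state transition matrix, not Xenakis's own:
# compute the stationary distribution and the spectral gap.
import numpy as np

P = np.array([[0.6, 0.3, 0.1],   # rows: current state, columns: next state
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])

# Stationary distribution: left eigenvector of P with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
idx = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, idx])
pi = pi / pi.sum()

# Spectral gap: one minus the modulus of the second-largest eigenvalue.
moduli = np.sort(np.abs(eigvals))[::-1]
gap = 1.0 - moduli[1]

print("stationary distribution:", np.round(pi, 3))
print("spectral gap           :", round(gap, 3))
</code></pre>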
<p>In <em>Analogique A</em>, Xenakis divided the sonic space into a grid of
cells defined by pitch register (high/middle/low), density
(sparse/medium/dense), and dynamic (soft/loud). Each &ldquo;screen&rdquo; — a brief
time window — occupies one cell in this grid. The progression of screens
through the piece is governed by transition probabilities: from a
high/dense/loud screen, there is some probability of moving to each
adjacent cell, specified by Xenakis&rsquo;s chosen transition matrix.</p>
<p>This is a Markov chain on a discrete state space of sonic textures. The
macroscopic trajectory of the piece — its overall movement through
sound-quality space — is determined by the transition matrix, which the composer
sets. The details of each screen are filled in stochastically, within the
parameters of the current state. Again, the individual is sacrificed to the
aggregate; control is exercised at the level of the distribution rather
than the event.</p>
<hr>
<h2 id="game-theory-duel-and-stratégie">Game Theory: <em>Duel</em> and <em>Stratégie</em></h2>
<p>The most extreme and, to my mind, most interesting of Xenakis&rsquo;s
formalisations is the use of game theory in <em>Duel</em> (1959) and <em>Stratégie</em>
(1962).</p>
<p>A <strong>two-player zero-sum game</strong> is specified by a payoff matrix $A \in
\mathbb{R}^{m \times n}$. Player 1 (the &ldquo;maximiser&rdquo;) chooses a row $i$;
Player 2 (the &ldquo;minimiser&rdquo;) chooses a column $j$; Player 1 receives payoff
$A_{ij}$ and Player 2 receives $-A_{ij}$. In a pure-strategy game, each
player selects a single action. In a <strong>mixed-strategy game</strong>, each player
chooses a probability distribution over their actions: Player 1 uses
$\mathbf{x} \in \Delta_m$ and Player 2 uses $\mathbf{y} \in \Delta_n$,
where $\Delta_k$ denotes the standard $(k-1)$-simplex.</p>
<p>The expected payoff to Player 1 under mixed strategies is:</p>
$$E(\mathbf{x}, \mathbf{y}) = \mathbf{x}^\top A\, \mathbf{y}.$$<p>Von Neumann&rsquo;s minimax theorem (1928) guarantees that:</p>
$$\max_{\mathbf{x} \in \Delta_m} \min_{\mathbf{y} \in \Delta_n}
\mathbf{x}^\top A\, \mathbf{y}
\;=\;
\min_{\mathbf{y} \in \Delta_n} \max_{\mathbf{x} \in \Delta_m}
\mathbf{x}^\top A\, \mathbf{y}
\;=\; v^*,$$<p>where $v^*$ is the <strong>value</strong> of the game. The pair $(\mathbf{x}^*,
\mathbf{y}^*)$ that achieves this saddle point is the Nash equilibrium:
neither player can improve their expected payoff by unilaterally deviating
from their equilibrium strategy.</p>
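<p>Finding the equilibrium mixed strategy is a linear programme, and for matrices of the size Xenakis used it is a routine computation today. The sketch below solves a small hypothetical $3 \times 3$ game with SciPy&rsquo;s linear-programming routine; the same code applies unchanged to a $19 \times 19$ matrix.</p>
<pre><code class="language-python"># A sketch: solve a small zero-sum game by linear programming. The payoff
# matrix here is hypothetical; Xenakis's Strategie matrix is 19 x 19.
import numpy as np
from scipy.optimize import linprog

A = np.array([[ 0,  2, -1],
              [-2,  0,  3],
              [ 1, -3,  0]])
m, n = A.shape

# Variables: the row player's mixed strategy x_1..x_m, plus the game value v.
# Maximise v subject to (A^T x)_j >= v for every column j and sum(x) = 1.
c = np.zeros(m + 1)
c[-1] = -1.0                                # linprog minimises, so minimise -v
A_ub = np.hstack([-A.T, np.ones((n, 1))])   # v minus (A^T x)_j must not exceed 0
b_ub = np.zeros(n)
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
b_eq = np.array([1.0])
bounds = [(0, None)] * m + [(None, None)]   # probabilities nonnegative, v free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x_star, v_star = res.x[:m], res.x[-1]
print("optimal mixed strategy:", np.round(x_star, 3))
print("value of the game     :", round(v_star, 3))
</code></pre>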
<p>In <em>Stratégie</em>, each conductor leads one orchestra. Each has nineteen
&ldquo;tactics&rdquo; — six basic musical textures (e.g., sustained chords, staccato
pizzicati, glissandi masses, silence) plus thirteen combinatorial tactics
that combine two or three of the basics. The payoff matrix is a
$19 \times 19$ integer matrix, also defined by Xenakis, specifying how
many points Conductor 1 scores when their orchestra plays tactic $i$ against
Conductor 2&rsquo;s tactic $j$. A referee tracks the score.</p>
<p>The conductors make decisions in real time during the performance, choosing
tactics based on what the other conductor is doing and on the evolving score.
The piece ends when one conductor reaches a predetermined score threshold.</p>
<p>The Nash equilibrium of the payoff matrix tells each conductor, in principle,
the optimal <em>distribution</em> over tactics to play: if both play optimally, the
expected score trajectory is determined. In practice, conductors are not
expected to compute mixed strategies on the podium; Xenakis&rsquo;s point is
structural. The game-theoretic formalism is used to design the payoff matrix
so that no tactic dominates — every choice has consequences that depend on
the opponent&rsquo;s choice — guaranteeing that the piece will always contain
genuine strategic tension regardless of who is conducting.</p>
<p><em>Duel</em> (1959) is the earlier, simpler version for two chamber orchestras.
<em>Stratégie</em> (1962) was premiered in April 1963 at the Venice Biennale with two conductors
competing live. The audience was aware of the game, of the score, and of
the payoff matrix. The premiere was by most accounts a success, though the
practical complications of running a zero-sum game in a concert hall
(including the question of whether conductors were actually computing Nash
equilibria or just following intuition) were never fully resolved.</p>
<hr>
<h2 id="formalized-music"><em>Formalized Music</em></h2>
<p>Xenakis assembled his theoretical framework in <em>Musiques formelles</em> (1963),
translated and expanded as <em>Formalized Music</em> (1971; revised edition 1992).
The book is one of the strangest documents in twentieth-century music theory:
part treatise, part manifesto, part mathematical appendix. It covers
stochastic composition, Markov chains, game theory, set theory, group theory,
and symbolic logic — all presented with the confidence of someone who is
equally at home in the engineering faculty and the concert hall, and with
the occasional obscurity of someone writing simultaneously for two audiences
who share almost no vocabulary.</p>
<p>The core argument is that musical composition can and should be treated as
the application of mathematical structures to sonic material, not because
mathematics makes music &ldquo;better&rdquo; but because mathematical structures are
the most powerful available tools for controlling relationships between
sounds at multiple scales simultaneously. The statistical distributions
control the macroscopic; the individual values vary within them. The
game-theoretic payoff matrix controls the strategic interaction; the individual
tactics fill in the details. Mathematics operates at the structural level
and leaves the acoustic surface free.</p>
<p>This is a different relationship between mathematics and music from the
ones in my earlier posts on <a href="/posts/messiaen-modes-group-theory/">group theory and Messiaen</a>
or <a href="/posts/euclidean-rhythms/">the Euclidean algorithm and world rhythms</a>.
In those cases, mathematics describes structure that already exists in the
music — structure the composers arrived at by ear. In Xenakis, mathematics
is the generative tool: the score is derived from the calculation.</p>
<hr>
<h2 id="what-the-analogy-does-and-does-not-do">What the Analogy Does and Does Not Do</h2>
<p>The Maxwell-Boltzmann analogy in <em>Pithoprakta</em> is exact in one direction
and approximate in another.</p>
<p>It is exact in the following sense: the glissando speeds Xenakis computed
for his 46 strings genuinely follow the Maxwell-Boltzmann distribution with
the parameters he chose. The score is a realisation of that distribution.
If you collect the glissando speeds from the score and plot their histogram,
you will find the characteristic $v^2 e^{-v^2/2a^2}$ shape.</p>
<p>It is approximate — or rather, it is analogical — in the sense that strings
in an orchestra are not molecules of a gas. They do not collide. They have
mass and inertia in a physical sense that has no direct mapping to
musical parameters. The temperature $T$ is not a temperature in any
thermodynamic sense; it is a compositional variable that Xenakis chose to
parameterise with the same symbol because the formal relationship is the
same. The analogy is structural, not ontological.</p>
<p>This is worth saying plainly because it is easy to be misled in both
directions: either to over-claim (the orchestra <em>is</em> a gas) or to dismiss
(the orchestra is <em>merely</em> labelled with physical vocabulary). The actual
claim is more modest and more interesting: the mathematical structure of the
Maxwell-Boltzmann distribution is the right tool for specifying a certain
kind of orchestral texture, namely one where individual elements vary
stochastically around a controlled macroscopic envelope. The physics
provides the formalism; the music provides the application. This is how
mathematics works in engineering, too.</p>
<hr>
<h2 id="the-centenary-and-what-remains">The Centenary and What Remains</h2>
<p>Xenakis died in 2001, by then partially deaf and with dementia. His centenary
in 2022 produced a wave of new performances, recordings, and scholarship
— including the <em>Meta-Xenakis</em> volume (Open Book Publishers, 2022), which
collects analyses of his compositional mathematics, his architectural work
(he designed the Philips Pavilion for Le Corbusier&rsquo;s Expo 58 in Brussels
using the same ruled-surface geometry he was using in <em>Metastaseis</em>), and
his political biography.</p>
<p>What remains resonant about his project is not the specific distributions
he chose — the Maxwell-Boltzmann is not the only or even necessarily the
best distribution for many musical applications — but the epistemological
position it represents. Xenakis insisted that the right question to ask
about a musical texture is not &ldquo;what is the note at beat 3 of bar 47?&rdquo; but
&ldquo;what is the distribution from which the events in this section are drawn?&rdquo;
This shift from individual determination to statistical control is precisely
the shift that makes thermodynamics possible as a science, and Xenakis was
the first composer to apply it deliberately and systematically.</p>
<p>When a composer writes &ldquo;let the
orchestra be a gas at temperature $T$&rdquo; and then actually computes the
consequences with Boltzmann&rsquo;s constant in front of him, I do not feel that
physics has been appropriated. I feel that it has been recognised — seen,
from a different direction, as the same thing it always was: a set of tools
for thinking about ensembles of interacting elements whose individual
behaviour is too complex to track but whose collective behaviour is not.</p>
<p>The orchestra is not a gas. But the Maxwell-Boltzmann distribution describes
it anyway.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Ames, C. (1989). The Markov process as a compositional model: A survey and
tutorial. <em>Leonardo</em>, 22(2), 175–187. <a href="https://doi.org/10.2307/1575226">https://doi.org/10.2307/1575226</a></p>
</li>
<li>
<p>Jedrzejewski, F. (2006). <em>Mathematical Theory of Music.</em> Delatour France /
IRCAM.</p>
</li>
<li>
<p>Nash, J. F. (1950). Equilibrium points in $n$-person games. <em>Proceedings of
the National Academy of Sciences</em>, 36(1), 48–49.
<a href="https://doi.org/10.1073/pnas.36.1.48">https://doi.org/10.1073/pnas.36.1.48</a></p>
</li>
<li>
<p>Nierhaus, G. (2009). <em>Algorithmic Composition: Paradigms of Automated Music
Generation.</em> Springer.</p>
</li>
<li>
<p>Matossian, N. (2005). <em>Xenakis</em> (revised ed.). Moufflon Publications.</p>
</li>
<li>
<p>Solomos, M. (Ed.). (2022). <em>Meta-Xenakis.</em> Open Book Publishers.
<a href="https://doi.org/10.11647/OBP.0313">https://doi.org/10.11647/OBP.0313</a></p>
</li>
<li>
<p>von Neumann, J. (1928). Zur Theorie der Gesellschaftsspiele. <em>Mathematische
Annalen</em>, 100(1), 295–320. <a href="https://doi.org/10.1007/BF01448847">https://doi.org/10.1007/BF01448847</a></p>
</li>
<li>
<p>von Neumann, J., &amp; Morgenstern, O. (1944). <em>Theory of Games and Economic
Behavior.</em> Princeton University Press.</p>
</li>
<li>
<p>Xenakis, I. (1992). <em>Formalized Music: Thought and Mathematics in
Composition</em> (revised ed.). Pendragon Press.
(Originally published as <em>Musiques formelles</em>, La Revue Musicale, 1963.)</p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-01-14</strong>: Corrected the description of <em>Stratégie</em> (1962): each conductor has nineteen tactics (six basic plus thirteen combinatorial), with a 19 x 19 payoff matrix — not six tactics and a 6 x 6 matrix. The six-tactic, 6 x 6 description applies to the earlier <em>Duel</em> (1959).</li>
<li><strong>2026-01-14</strong>: Added &ldquo;in April 1963&rdquo; to the <em>Stratégie</em> premiere sentence. The composition date is 1962; the premiere took place on 25 April 1963 at the Venice Biennale.</li>
<li><strong>2026-01-14</strong>: Changed &ldquo;studying briefly with Honegger&rdquo; to &ldquo;approaching Honegger (who was too ill to teach).&rdquo; Xenakis sought instruction from Honegger circa 1949, but Honegger was in declining health and did not take him as a student.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>From Oxide to Oversampling: The Physics of Recorded Sound</title>
      <link>https://sebastianspicker.github.io/posts/tape-saturation-delta-sigma-adc-physics/</link>
      <pubDate>Fri, 15 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/tape-saturation-delta-sigma-adc-physics/</guid>
      <description>&amp;lsquo;Analogue warmth&amp;rsquo; and &amp;lsquo;digital coldness&amp;rsquo; are not aesthetic preferences — they are different physics. Ferromagnetic hysteresis generates even harmonics. Delta-sigma modulators push quantisation noise to ultrasonic frequencies. Both effects are calculable.</description>
      <content:encoded><![CDATA[<p>There is an argument that has been running in recording studios since roughly 1982, when the first commercially mastered compact discs appeared. On one side: analogue tape has warmth, depth, something the ear likes. On the other: digital audio is more accurate, lower noise, the measurements say so. The argument produces more heat than light, because most participants treat it as an aesthetic question — a matter of feeling, taste, preference. It is not. The difference between tape and digital audio is a physics difference, and the physics is specific enough to calculate.</p>
<p>The physics here turns out to be some of my favourite kind: it sits at the intersection of condensed matter, signal processing, and Fourier analysis, and it connects directly to why certain sounds are perceived as pleasant. This post walks through both sides. Part I is the ferromagnetic physics of magnetic tape and the harmonic structure of saturation distortion. Part II is the delta-sigma modulator and the engineering trick that achieves 24-bit dynamic range from a 1-bit comparator. Neither side of the debate is as simple as its partisans claim, and the physics of both is more interesting than the aesthetics argument they have been stuck in for forty years.</p>
<hr>
<h2 id="part-i-the-physics-of-magnetic-tape">Part I: The Physics of Magnetic Tape</h2>
<h3 id="ferromagnetic-recording">Ferromagnetic Recording</h3>
<p>Magnetic recording tape is a thin polymer substrate coated with a layer of ferromagnetic particles suspended in a binder. For most of the twentieth century those particles were iron oxide — specifically $\gamma\text{-Fe}_2\text{O}_3$, gamma-phase ferric oxide — though chromium dioxide ($\text{CrO}_2$) and later metal-particle formulations with pure iron or iron-cobalt alloys were developed for higher coercivity and better high-frequency response. What all of these materials share is the key property of ferromagnetism: each particle is a small permanent magnet, a magnetic domain with a net magnetic moment that can be oriented by an external field and that will retain that orientation when the field is removed.</p>
<p>The recording process exploits this directly. The recording head is a toroidal electromagnet with a narrow gap. When audio-frequency current flows through the head&rsquo;s coil, the field at the gap follows the current, and as the tape moves past at a fixed speed, successive particles along the tape length are aligned according to the instantaneous field at the moment they pass the gap. The result is a spatial encoding of the time-domain audio signal along the tape. On playback, the inverse process occurs: the moving pattern of magnetised particles generates a time-varying flux in the playback head&rsquo;s core, which induces a voltage in the coil by Faraday&rsquo;s law, reproducing the original current waveform.</p>
<p>So far this description is entirely linear. The head current maps to a field, the field maps to a magnetisation, the magnetisation maps back to a voltage. If all three relationships were linear, tape would be a near-perfect recording medium — limited only by particle noise and head gap frequency response. The nonlinearity comes from the second relationship in that chain, and it comes from the fundamental physics of how ferromagnetic materials respond to an applied field.</p>
<h3 id="the-b-h-curve-and-hysteresis">The B-H Curve and Hysteresis</h3>
<p>The relationship between the applied magnetic field intensity $H$ (from the recording head, measured in A/m) and the resulting magnetic flux density $B$ in the tape (measured in tesla) is not linear. It follows a curve — actually a family of nested curves — known as the hysteresis loop, and its shape determines almost everything interesting about tape recording <a href="#ref-3">[3]</a>.</p>
<p>Starting from a demagnetised state and increasing $H$ from zero, the initial slope $dB/dH$ — the magnetic permeability $\mu$ — is relatively low. The domains in the material are oriented randomly and require a threshold of energy to begin reorienting. As $H$ increases further, the permeability rises, and there is a region of steep, approximately linear increase in $B$. Then, as $H$ continues to increase, the material saturates: progressively fewer unaligned domains remain, the slope falls, and eventually $dB/dH \to 0$ as all domains are aligned. The $B$-$H$ curve is S-shaped, and the saturation is irreversible in a specific sense: if you now reduce $H$ back toward zero, $B$ does not retrace the original path. It remains at a higher value — the remanence $B_r$ — and you must apply a reverse field of magnitude $H_c$, the coercivity, to bring $B$ back to zero. The loop formed by this cycle of magnetisation and demagnetisation is the hysteresis loop, and its area is proportional to the energy dissipated as heat per cycle.</p>
<p>The crucial feature for audio recording is what happens near the origin. A small audio signal, sitting near $H = 0$, does not experience a nicely linear region of the $B$-$H$ curve. The initial permeability is low, and there is an inflection point near zero: the slope increases as you move away from zero before the saturation region brings it back down again. This means that even at low recording levels, the transfer function from head current to tape magnetisation is nonlinear, and in a particular way — the transfer curve is antisymmetric under $H \to -H$ (reversing the field reverses the magnetisation), so the distortion it generates is odd-order, dominated by the third harmonic, together with crossover-like distortion around the zero crossings. Without some remedy, even a gentle sine wave would emerge from the playback head with significant odd-harmonic content added. The signal would also sit in a region of the curve where the effective permeability depends on signal amplitude, making the recording level-dependent in an uncontrolled way. Something needed to be done about this, and the solution found in the 1940s is one of the more elegant pieces of applied physics in the history of the recording industry.</p>
<h3 id="the-bias-signal">The Bias Signal</h3>
<p>The solution is called AC bias, and its discovery is usually credited to Braunmühl and Weber at the German Reichs-Rundfunk-Gesellschaft around 1940, though there are earlier related patents. The idea is simple once stated: add a high-frequency signal — typically between 50 kHz and 150 kHz, well above the audio band — to the recording current before it drives the head. This bias signal has an amplitude large enough to drive the tape through multiple cycles of its B-H curve on each audio cycle, but it is filtered out of the playback signal by the tape&rsquo;s own limited high-frequency response and by subsequent low-pass filtering.</p>
<p>The effect on the recording process is to linearise the transfer function. The operating point is no longer stationary near the inflection point at $H = 0$. Instead, it rides up and down the B-H curve rapidly many times per audio period, driven by the bias. The audio signal merely modulates the envelope of this rapid oscillation. The net magnetisation that remains after the tape leaves the head gap is the time average of many rapid traversals of the hysteresis loop, and this average tracks the audio signal with good linearity provided the signal level is modest. The bias amplitude and frequency are tuned carefully for each tape formulation — too little bias and the linearisation is incomplete; too much and the signal is undermodulated and the high-frequency response suffers as the bias begins to erase fine spatial patterns written by high-frequency audio. Getting the bias right is part of the alignment procedure for every analogue tape machine and part of why different tape formulations require different machine settings.</p>
<p>The result, for moderate recording levels, is a remarkably clean and linear recording medium. The nonlinear character of the B-H curve is effectively tamed by the bias trick, and the remaining imperfections are mostly second-order: azimuth errors, print-through, head bump, self-demagnetisation at short wavelengths. For practical purposes, a well-aligned analogue tape machine at moderate recording levels is a linear system.</p>
<h3 id="harmonic-generation-at-high-levels">Harmonic Generation at High Levels</h3>
<p>At high recording levels — when the audio signal is large enough to push the operating point into the saturation region even after the bias has done its linearising work — the picture changes. The transfer function from input current to output magnetisation becomes genuinely nonlinear, and the harmonic content of the distortion becomes the central question.</p>
<p>The standard framework is a Taylor expansion of the transfer function around the operating point:</p>
$$y(t) = a_1 x(t) + a_2 x^2(t) + a_3 x^3(t) + a_4 x^4(t) + \cdots$$<p>where $x(t)$ is the input signal (the audio current), $y(t)$ is the output (the magnetisation recorded on tape), and the coefficients $a_n$ are determined by the shape of the B-H curve near saturation. For a pure tone $x(t) = A \sin(\omega t)$, the higher-order terms generate harmonics in a calculable way.</p>
<p>The second-order term gives:</p>
$$a_2 x^2(t) = a_2 A^2 \sin^2(\omega t) = \frac{a_2 A^2}{2}\bigl(1 - \cos 2\omega t\bigr)$$<p>This is a DC offset plus a component at $2\omega$ — the second harmonic, one octave above the fundamental.</p>
<p>The third-order term gives:</p>
$$a_3 x^3(t) = a_3 A^3 \sin^3(\omega t) = a_3 A^3 \left(\frac{3}{4}\sin\omega t - \frac{1}{4}\sin 3\omega t\right)$$<p>The $\frac{3}{4}$ piece adds to (or subtracts from) the fundamental depending on the sign of $a_3$; the $-\frac{1}{4}$ piece is a third harmonic at $3\omega$, one octave and a fifth above the fundamental.</p>
<p>Carrying through to fourth order:</p>
$$a_4 x^4(t) = \frac{a_4 A^4}{8}\bigl(3 - 4\cos 2\omega t + \cos 4\omega t\bigr)$$<p>which contributes additional DC, a component at $2\omega$, and a fourth harmonic at $4\omega$.</p>
<p>Collecting the terms through fourth order, the output is approximately:</p>
$$y(t) \approx \left(a_1 + \frac{3a_3 A^2}{4}\right)A\sin\omega t - \frac{a_2 A^2}{2}\cos 2\omega t - \frac{a_3 A^3}{4}\sin 3\omega t + \cdots$$<p>The important observation is about which harmonics dominate and what they sound like. The B-H curve of a ferromagnetic material near saturation is approximately symmetric: the saturation behaviour for positive $H$ mirrors that for negative $H$. A symmetric nonlinearity has $a_2 = a_4 = 0$ (all even coefficients vanish by symmetry), and only odd harmonics are generated. But at moderate levels, just before full saturation, the symmetry of the B-H loop as traversed by the biased signal is not perfect, and the even-order terms are nonzero — though small. This gives tape its characteristic distortion signature: at moderate saturation levels, the even harmonics ($2\omega$, $4\omega$) dominate; at heavy saturation, the odd harmonics ($3\omega$, $5\omega$) appear more strongly.</p>
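<p>The even/odd split is easy to verify numerically. The sketch below (with illustrative coefficients, not measured tape parameters) passes a sine through the truncated Taylor nonlinearity and reads the harmonic levels off an FFT: the purely odd-symmetric curve produces only odd harmonics, and adding a small $x^2$ term introduces the second harmonic.</p>
<pre><code class="language-python"># A sketch with illustrative coefficients, not measured tape parameters:
# pass a sine through y = a1*x + a2*x**2 + a3*x**3 and inspect the harmonics.
import numpy as np

fs, f0 = 48000, 1000.0            # sample rate and test-tone frequency in Hz
t = np.arange(fs) / fs            # exactly one second, so FFT bins are 1 Hz wide
x = 0.8 * np.sin(2 * np.pi * f0 * t)

def harmonic_levels(y, n_harm=5):
    """Levels of the first n_harm harmonics of f0, in dB relative to the fundamental."""
    spec = np.abs(np.fft.rfft(y)) / len(y)
    mags = np.maximum(spec[[int(k * f0) for k in range(1, n_harm + 1)]], 1e-20)
    return 20 * np.log10(mags / mags[0])

symmetric  = 1.0 * x - 0.2 * x**3                # odd-symmetric transfer curve
asymmetric = 1.0 * x + 0.1 * x**2 - 0.2 * x**3   # adds a small even-order term

print("symmetric :", np.round(harmonic_levels(symmetric), 1))
print("asymmetric:", np.round(harmonic_levels(asymmetric), 1))
# The symmetric curve puts energy only at 1f and 3f; the asymmetric one adds
# a component at 2f, the octave that the "warmth" story is about.
</code></pre>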
<p>The perceptual consequence of this is the crux of the &ldquo;analogue warmth&rdquo; story. The second harmonic is the octave of the fundamental. The fourth harmonic is the double octave. These are, in Western harmonic practice and in the physics of vibrating strings, the most consonant possible intervals. Adding even harmonics at low amplitude to a fundamental makes the sound fuller and richer without introducing beating or dissonance. Odd harmonics — particularly the fifth (at $5\omega$, a major third above the double octave) and the seventh (a flattened seventh above the double octave) — are less consonant relative to the fundamental and at high amplitude produce the harsh, buzzy character associated with heavy distortion or the deliberate aggression of a fuzz pedal.</p>
<p>There is one more effect worth naming: the saturation is a soft knee. The B-H curve does not have a sharp corner at saturation — it curves gradually from the linear region into the flat-topped saturation region. This means that transient signals — percussive attacks, consonant onsets — that briefly exceed the nominal recording level are not hard-clipped but gently compressed. Their peaks are rounded by the shape of the B-H curve. Engineers and producers who record through tape often describe this as the machine &ldquo;breathing&rdquo; or as a pleasing &ldquo;gluing&rdquo; of transients. The physics is simple: the soft-knee transfer function applies more gain reduction to instantaneous peaks than to the sustained body of the signal, functioning as a fast, musically transparent dynamic compressor for any material that approaches saturation.</p>
<hr>
<h2 id="part-ii-the-physics-of-delta-sigma-conversion">Part II: The Physics of Delta-Sigma Conversion</h2>
<h3 id="nyquist-rate-adc-and-its-limits">Nyquist-Rate ADC and Its Limits</h3>
<p>The straightforward approach to analogue-to-digital audio conversion samples the signal at a rate just above twice the highest audio frequency — the Nyquist rate — using a quantiser with enough bits to achieve the desired dynamic range. For CD-quality audio, the sampling rate is 44.1 kHz (slightly above $2 \times 20{,}000$ Hz) and the word length is 16 bits. The dynamic range of a $b$-bit PCM system is, to a good approximation:</p>
$$\text{SNR} \approx 6.02b + 1.76 \text{ dB}$$<p>so 16 bits gives approximately $6.02 \times 16 + 1.76 \approx 98$ dB, which matches the dynamic range of the best analogue tape and is well above the approximately 70 dB achievable with the noise floor of typical studio tape at 15 ips <a href="#ref-4">[4]</a>.</p>
<p>The engineering problem with a straightforward Nyquist-rate ADC is the anti-aliasing filter. Before sampling, all content above $f_s/2 = 22.05$ kHz must be removed. If it is not, energy at frequency $f > f_s/2$ aliases into the audio band as a spurious component at $f_s - f$, which is inaudible in origin but very much audible in its alias. To achieve 98 dB of alias suppression — matching the 16-bit dynamic range — the filter must attenuate signals at 22.05 kHz by 98 dB relative to signals at 20 kHz. The transition band is only 2.05 kHz wide. That requires a very high-order analogue filter — typically seventh-order elliptic or Chebyshev — and such filters have significant phase distortion within the audio band, particularly at frequencies near the passband edge. In 1982, building this filter precisely, cheaply, and repeatably in consumer hardware was a genuine engineering challenge. The filters introduced audible phase and amplitude ripple that the original measurements had not anticipated and that contributed to early criticisms of the CD sound.</p>
<h3 id="oversampling">Oversampling</h3>
<p>The delta-sigma ($\Sigma\Delta$) ADC architecture was developed to sidestep the steep-filter problem entirely, and its adoption in consumer audio from the late 1980s onwards largely resolved the anti-aliasing filter debate <a href="#ref-1">[1]</a>. The core idea is oversampling: instead of sampling at 44.1 kHz with 16 bits, the $\Sigma\Delta$ converter samples at $M \times 44.1$ kHz — where $M$ is the oversampling ratio, typically 64 in early audio converters, giving $64 \times 44.1 = 2.8224$ MHz — with a 1-bit quantiser. The anti-aliasing filter now needs to attenuate everything above 1.4112 MHz before sampling. Its transition band runs from 20 kHz to 1.4112 MHz, a ratio of roughly 70:1. This is easy: a simple, cheap, first- or second-order RC filter suffices, with negligible phase distortion anywhere in the audio band. The price paid is that the quantiser is now only 1 bit, and a 1-bit quantiser has terrible resolution on its own.</p>
<p>To understand what oversampling buys even before any clever signal processing, consider the quantisation noise floor. For a uniform quantiser with step size $\Delta$, the quantisation noise power is $P_q = \Delta^2/12$, and this noise is spread approximately uniformly from 0 to $f_s/2$. The noise power spectral density is $P_q / (f_s/2)$. After oversampling by a factor of $M$ — so that the effective Nyquist band runs from 0 to $f_{\text{audio}} = f_s/(2M)$ — the in-band noise power is:</p>
$$P_{\text{in-band}} = \frac{P_q}{f_s/2} \cdot f_{\text{audio}} = \frac{P_q}{f_s/2} \cdot \frac{f_s}{2M} = \frac{P_q}{M}$$<p>Each doubling of $M$ halves the in-band noise power, an improvement of 3 dB, equivalent to half a bit of resolution. At 64× oversampling this gives 18 dB, or three extra bits — useful, but not enough to get from a 1-bit quantiser to 16-bit performance. We need something more.</p>
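<p>The arithmetic is quick to check. As a minimal sketch, the loop below prints the oversampling-only gain for a few ratios:</p>
<pre><code class="language-python"># Oversampling alone: in-band quantisation noise power falls as 1/M,
# i.e. 3 dB (half a bit) per doubling of the oversampling ratio M.
import math

for M in (1, 2, 4, 16, 64):
    gain_db = 10 * math.log10(M)
    print(f"M = {M:3d}: {gain_db:5.1f} dB, roughly {gain_db / 6.02:.1f} extra bits")
</code></pre>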
<h3 id="noise-shaping">Noise Shaping</h3>
<p>The second ingredient — and the one that makes $\Sigma\Delta$ conversion genuinely remarkable — is noise shaping. Rather than spreading quantisation noise uniformly in frequency, we can engineer its spectral distribution so that almost all the noise power sits above the audio band, where it is removed by a digital low-pass filter (the decimation filter) at the output.</p>
<p>A first-order $\Sigma\Delta$ modulator achieves this by a feedback loop. At each sample step, an integrator accumulates the difference between the input signal and the previous quantised output, and the 1-bit quantiser decides the new output from the sign of that accumulated value. Writing the quantisation error as $e_n = y_n - \hat{x}_n$ (where $\hat{x}_n$ is the input to the quantiser and $y_n$ is the 1-bit output), the loop arranges matters so that the error reaches the output only as a first difference, $e_n - e_{n-1}$, while the signal passes through essentially unchanged apart from a one-sample delay. This is the integrator-feedback structure that gives the modulator its name: $\Sigma$ for the integrating summation, $\Delta$ for the difference.</p>
<p>In the $z$-domain, this feedback structure gives the quantisation noise a transfer function of:</p>
$$N(z) = 1 - z^{-1}$$<p>that is, the noise at time $n$ is the current error minus the previous error — a first-difference operation. In the frequency domain, substituting $z = e^{j 2\pi f / f_s}$:</p>
$$\bigl|N(f)\bigr|^2 = \left|1 - e^{-j 2\pi f / f_s}\right|^2 = 4\sin^2\!\left(\frac{\pi f}{f_s}\right)$$<p>For frequencies well below the sampling rate, $f \ll f_s$, the small-angle approximation gives:</p>
$$\bigl|N(f)\bigr|^2 \approx \left(\frac{2\pi f}{f_s}\right)^2$$<p>The noise power spectral density rises as $f^2$ — it is heavily suppressed at low frequencies and pushed up toward $f_s/2$. Integrating this shaped noise over the audio band $[0, f_{\text{audio}}]$ and comparing to the flat-spectrum case, the in-band SNR improvement for a first-order modulator scales as $M^3$ rather than $M^1$: every doubling of oversampling ratio gives 9 dB improvement (1.5 bits) instead of 3 dB. At 64× oversampling — six doublings — a first-order modulator recovers approximately 54 dB, or 9 effective bits.</p>
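<p>The behaviour is easy to reproduce in simulation. The sketch below is an idealised first-order modulator (no circuit non-idealities, arbitrary test-tone level) that simply measures how much of the quantisation error power ends up above the audio band.</p>
<pre><code class="language-python"># A sketch of an idealised first-order delta-sigma modulator: a 1-bit
# quantiser inside an integrator feedback loop, driven at 64x oversampling.
import numpy as np

M = 64                          # oversampling ratio
f_audio = 20000.0               # audio bandwidth (Hz)
fs = 2 * f_audio * M            # modulator sample rate (2.56 MHz here)
n = 65536                       # number of samples

t = np.arange(n) / fs
x = 0.5 * np.sin(2 * np.pi * 1000.0 * t)    # 1 kHz test tone at half scale

integ, prev_y = 0.0, 0.0
y = np.empty(n)
for i in range(n):
    integ += x[i] - prev_y                   # Sigma: integrate the Delta
    y[i] = 1.0 if integ >= 0.0 else -1.0     # 1-bit quantiser
    prev_y = y[i]

# Where does the quantisation error end up? Almost entirely above the audio band.
err = y - x
spec = np.abs(np.fft.rfft(err * np.hanning(n)))**2
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
above = freqs > f_audio
print("fraction of error power pushed above 20 kHz:",
      round(spec[above].sum() / spec.sum(), 6))
</code></pre>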
<p>A second-order modulator applies the noise-shaping filter twice, giving $|N(f)|^2 \propto f^4$ and an SNR gain scaling as $M^5$: 15 dB per octave of oversampling. At 64× — again six doublings — this recovers approximately 90 dB, or 15 effective bits. Modern high-performance audio ADCs use fifth- to seventh-order modulators operating at 128× oversampling or higher. The in-band noise floor drops to levels corresponding to 20–24 effective bits — entirely from a 1-bit hardware comparator, with all the resolution coming from the noise shaping and the subsequent digital decimation filter.</p>
<p>The following table illustrates the SNR gain achievable at practical oversampling ratios:</p>
<table>
  <thead>
      <tr>
          <th>Modulator order</th>
          <th>Oversampling ratio</th>
          <th>SNR gain</th>
          <th>Effective bits gained</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>1st order</td>
          <td>64×</td>
          <td>54 dB</td>
          <td>9</td>
      </tr>
      <tr>
          <td>2nd order</td>
          <td>64×</td>
          <td>90 dB</td>
          <td>15</td>
      </tr>
      <tr>
          <td>5th order</td>
          <td>128×</td>
          <td>~120 dB</td>
          <td>~20</td>
      </tr>
  </tbody>
</table>
<p>The 5th-order row deserves a moment&rsquo;s attention. A single-bit comparator — a device that outputs only 1 or 0, with no analogue subtlety whatsoever — combined with oversampling and noise shaping, achieves the resolution of a 20-bit Nyquist-rate ADC and does so using a simple digital feedback loop and an analogue integrator that can be fabricated cheaply on a CMOS chip. This is, I think, one of the more quietly stunning pieces of engineering in consumer electronics, and it goes entirely unnoticed because the CD player it lives inside is now considered mundane.</p>
<p>There is a subtlety worth adding for completeness. Real $\Sigma\Delta$ modulators of order three and above are only conditionally stable — the noise-shaping loop can overload at large input signals, producing limit cycles or tonal artefacts. Managing this stability is a significant part of the design problem and involves either restricting the input range, adding nonlinear stability control, or using multi-bit internal quantisers (which reduce the quantisation step and ease the stability constraint while retaining most of the noise-shaping benefit). The multi-bit approach also addresses a related issue: the ideal 1-bit DAC in the feedback loop is inherently linear (there are only two levels, so there is no differential nonlinearity), but multi-bit internal DACs must be trimmed or calibrated to avoid nonlinearity in the feedback path corrupting the noise shaping. These engineering details are discussed thoroughly in Norsworthy, Schreier, and Temes <a href="#ref-5">[5]</a>, which remains the standard reference.</p>
<p>The digital audio infrastructure that delta-sigma conversion enabled — clean, cheap, phase-linear converters without steep analogue filters — also made digital audio workable in latency-sensitive applications like live performance. For a discussion of why latency matters so much in network music performance and how it shapes system design, see my earlier post on <a href="/posts/nmp-latency-lola-mvtp/">NMP latency and the physics of musical timing</a>.</p>
<hr>
<h2 id="the-irony-of-the-comparison">The Irony of the Comparison</h2>
<p>Both tape saturation and delta-sigma conversion are, at root, about the same problem: how to manage the relationship between a signal and the finite resolution of the medium storing it. Tape manages the problem physically and somewhat accidentally — the ferromagnetic B-H curve happens to generate even harmonics that are consonant with the recorded signal, and the bias trick linearises the response well enough that the distortion only becomes audible when the engineer deliberately pushes into saturation. Delta-sigma manages the problem mathematically and deliberately — quantisation noise is redistributed in frequency by a designed feedback loop so that it falls outside the audible band.</p>
<p>Neither approach is perfect, and neither is neutral. Tape adds signal-correlated harmonic distortion whose spectral content depends on recording level and which compresses transients in a way that changes the perceived dynamics. Digital audio, even with delta-sigma conversion, has its own imperfections: idle-channel noise from the modulator, potential for tonal limit-cycle artefacts at specific input levels, and the abrupt onset of hard clipping at full scale — which, unlike tape saturation, is symmetrical and rapid and adds all harmonics simultaneously, giving the harsh, unpleasant character that digital overloads are known for. The soft-knee vs. hard-clip distinction is real and audible, and it is probably the most defensible technical basis for the claim that analogue tape handles transient overloads more graciously.</p>
<p>What is not defensible is the claim that one medium is inherently more musical than the other, or that digital audio lacks something fundamental that tape possesses. They are differently imperfect. The imperfections of tape happen to sit at harmonic relationships that Western ears, shaped by a tradition of music built on those same harmonic intervals, find pleasing. The imperfections of digital audio are not at pleasing harmonic intervals; they are wideband quantisation noise (before shaping) or ultrasonic shaped noise (after), and a sharp cliff at full scale. Different physics, different perceptual character.</p>
<hr>
<h2 id="a-personal-note">A Personal Note</h2>
<p>I spent a long time thinking the tape versus digital debate was mostly audiophile mythology — a community of enthusiasts rationalising the warmth of nostalgia as the warmth of oxide particles. The physics is more interesting than that, and doing the calculation changed my view. The second-harmonic content of tape saturation is not an accident or a romantic story; it is what you get when you push a symmetric nonlinearity with an audio sine wave, and the reason it sounds pleasant is not arbitrary but is grounded in the physics of consonance and the harmonic series. The delta-sigma converter is not a mundane commodity chip but a genuinely elegant solution to an otherwise intractable filter-design problem, and the fact that it achieves 24-bit resolution from a 1-bit comparator by spectral redistribution of noise is the kind of result that should get more attention in physics education.</p>
<p>Both technologies deserve better than the aesthetics argument they have been fighting in for forty years. The tools to understand them are not exotic — Taylor series, Fourier analysis, the z-transform, and the basic physics of ferromagnetism — and the reward is a clear-eyed picture of what is actually going on inside two of the most consequential inventions in the history of recorded music. If you are interested in related mathematics underlying other aspects of music, the posts on <a href="/posts/euclidean-rhythms/">Euclidean rhythms</a> and <a href="/posts/messiaen-modes-group-theory/">Messiaen&rsquo;s modes and group theory</a> cover the combinatorial and algebraic structures in rhythm and pitch that sit alongside the physics discussed here.</p>
<hr>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Candy, J. C., &amp; Temes, G. C. (Eds.). (1992). <em>Oversampling Delta-Sigma Data Converters: Theory, Design, and Simulation</em>. IEEE Press.</p>
<p><span id="ref-2"></span>[2] Reiss, J. D., &amp; McPherson, A. (2015). <em>Audio Effects: Theory, Implementation and Application</em>. CRC Press.</p>
<p><span id="ref-3"></span>[3] Bertram, H. N. (1994). <em>Theory of Magnetic Recording</em>. Cambridge University Press.</p>
<p><span id="ref-4"></span>[4] Pohlmann, K. C. (2010). <em>Principles of Digital Audio</em> (6th ed.). McGraw-Hill.</p>
<p><span id="ref-5"></span>[5] Norsworthy, S. R., Schreier, R., &amp; Temes, G. C. (Eds.). (1997). <em>Delta-Sigma Data Converters: Theory, Design, and Simulation</em>. IEEE Press.</p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-01-14</strong>: Updated the interval description for the 7th harmonic to &ldquo;above the double octave.&rdquo; The 7th harmonic (7f) sits between the double octave (4f) and the triple octave (8f).</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>The Cat&#39;s Eye: Slit Pupils, Thin-Film Mirrors, and 135-Fold Dynamic Range</title>
      <link>https://sebastianspicker.github.io/posts/cat-eyes-slit-pupils-tapetum/</link>
      <pubDate>Mon, 23 Jun 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/cat-eyes-slit-pupils-tapetum/</guid>
      <description>A cat&amp;rsquo;s eye contains two distinct optical technologies that human engineers have copied — one consciously, one not. The slit pupil achieves a dynamic range of 135:1 in light transmission, nearly ten times that of the human circular pupil. The tapetum lucidum is a multilayer thin-film reflector of crystalline rodlets, producing constructive interference at the peak of scotopic sensitivity and sending light through the retina twice. Banks et al. (Science Advances, 2015) showed why the slit geometry specifically evolved in ambush predators; Percy Shaw&amp;rsquo;s 1934 Catseye road reflector borrowed the principle directly.</description>
      <content:encoded><![CDATA[<p><em>Flash photography of cats produces glowing eyes. This is familiar enough that
most people do not find it strange. But the physics that produces it — a
biological multilayer interference reflector built from crystalline rodlets of
riboflavin and zinc, tuned to the peak of night-vision sensitivity, sending returning photons through
the retina for a second pass — is not familiar at all. I started thinking about
this after photographing our cats at dusk — through the doorway; they are indoor
cats now, for health reasons — and finding their eyes lit up a colour
that depends on the angle: greenish from straight ahead, golden from the side.
The angle-dependence is a direct consequence of the thin-film interference
condition, and the different colours correspond to different constructive
interference wavelengths at different angles of incidence.</em></p>
<p><em>The eye contains two optical solutions — pupil geometry and tapetum — that
address different aspects of the same problem: how to function across a very
large range of light levels, from bright midday sun to the dim luminance of a
starlit field.</em></p>
<hr>
<h2 id="the-dynamic-range-problem">The Dynamic Range Problem</h2>
<p>A crepuscular predator — active around dawn and dusk — must function visually
across a light-level range of roughly $10^8$:$1$. The sun on a bright day
produces retinal illuminance of around $10^5\,\mathrm{photons}/(\mu\mathrm{m}^2\cdot\mathrm{s})$;
a moonless night produces roughly $10^{-3}$ in the same units. The ratio is
approximately $10^8$.</p>
<p>The pupil is the variable aperture that controls how much light reaches the
retina. The larger the pupil area, the more light admitted; the smaller the
area, the less. For the human eye, the pupil diameter ranges from approximately
$2\,\mathrm{mm}$ (bright light) to $8\,\mathrm{mm}$ (darkness), giving a
maximum area ratio of:</p>
$$\frac{A_\mathrm{max}}{A_\mathrm{min}} = \left(\frac{8}{2}\right)^2 = 16.$$<p>This is a dynamic range of 16:1 from the pupil alone. The remaining
$10^8 / 16 \approx 6 \times 10^6$ factor in adaptation comes from neural
and photochemical mechanisms in the retina itself (photopigment bleaching,
dark adaptation of rods vs. cones, lateral inhibition).</p>
<p>For a domestic cat, the same measurement gives something different.</p>
<hr>
<h2 id="the-slit-pupil-1351-dynamic-range">The Slit Pupil: 135:1 Dynamic Range</h2>
<p>Banks, Sprague, Schmoll, Parnell, and Love published &ldquo;Why do animal eyes have
pupils of different shapes?&rdquo; in <em>Science Advances</em> in 2015 (1:7, e1500391).
They analysed pupil shape and size data from 214 terrestrial species and
correlated pupil geometry with ecological niche.</p>
<p>Their principal finding for slit pupils: the domestic cat pupil, a vertical
slit, achieves an area ratio of approximately <strong>135:1</strong> between maximum dilation
and maximum constriction. Numerically:</p>
$$\frac{A_\mathrm{max}}{A_\mathrm{min}} \approx 135.$$<p>The mechanism that makes this possible is geometrical. A circular pupil&rsquo;s
minimum area is limited by diffraction: constricting a circular aperture below
about $2\,\mathrm{mm}$ diameter produces diffraction rings that degrade image
quality. A slit, by contrast, can be made arbitrarily narrow in one direction
while retaining a larger dimension in the other, limiting diffraction in only
one axis. The vertical slit in a cat pupil can constrict to a width of
$\sim 0.3\,\mathrm{mm}$ while keeping a much larger vertical extent, whereas the
fully dilated pupil opens to a nearly circular aperture; the measured ratio
between the two areas, in Banks et al.&rsquo;s data, is the factor of 135 quoted
above.</p>
<p>The 135:1 ratio is nearly <strong>nine times</strong> the dynamic range achievable by the
human circular pupil (16:1). This allows the cat&rsquo;s pupil to do substantially
more of the work of light adaptation, reducing the load on the slower neural
and photochemical mechanisms.</p>
<hr>
<h2 id="why-vertical-the-ecological-correlation">Why Vertical? The Ecological Correlation</h2>
<p>Banks et al. found a striking correlation between pupil geometry and predator
ecology:</p>
<ul>
<li><strong>Vertical slit pupils</strong> correlate with <em>ambush predators whose eyes are
close to the ground</em> — animals with shoulder height below approximately
$42\,\mathrm{cm}$.</li>
<li><strong>Horizontal slit pupils</strong> correlate with <em>prey animals and grazing
herbivores</em> (horses, goats, sheep, deer). The horizontal slit, when the
animal lowers its head to graze, rotates to remain approximately horizontal
(the eye counterrotates in the orbit), providing a wide panoramic field
of view for detecting approaching predators.</li>
<li><strong>Circular pupils</strong> correlate with <em>pursuit predators</em> (humans, dogs, large
raptors) that hunt at larger distances where the precise vertical depth
cues provided by the slit geometry are less critical.</li>
</ul>
<p>The functional advantage of a <strong>vertical slit for a low-to-the-ground ambush
predator</strong> is depth estimation by <em>blur circles</em>. The slit geometry produces
strong defocus blur in the horizontal direction but sharp focus in the vertical
direction. An ambush predator lying in grass needs to estimate the horizontal
distance to prey accurately; the defocus differential between horizontal and
vertical blur provides a stereoscopic-like depth cue even with one eye. This
is a form of <strong>astigmatic blur ranging</strong>: the degree of horizontal blur for a
given focal setting encodes the object&rsquo;s distance.</p>
<p>The correlation across 214 species is not perfect, but it is statistically
robust: the association of slit pupils with ground-level ambush predators is
not a coincidence; it is selection pressure.</p>
<hr>
<h2 id="the-tapetum-lucidum-a-biological-dielectric-mirror">The Tapetum Lucidum: A Biological Dielectric Mirror</h2>
<p>Behind the retina, most nocturnal and crepuscular mammals possess a reflective
layer called the <em>tapetum lucidum</em> (literally: &ldquo;bright carpet&rdquo;). Light that
passes through the retina without being absorbed by a photoreceptor strikes
the tapetum and is reflected back through the retina for a second absorption
opportunity. This roughly doubles the effective optical path length through
the photoreceptor layer, substantially increasing the probability of photon
capture at low light levels.</p>
<p>The cat tapetum is a <strong>tapetum cellulosum</strong>: a layer of specialised cells
whose cytoplasm contains dense arrays of rod-shaped crystalline inclusions
composed primarily of riboflavin (vitamin B$_2$) and zinc. (This is distinct
from the guanine-crystal tapeta found in fish and some reptiles.) The
crystalline rodlets have a refractive index of approximately $n_1 \approx 1.8$;
they alternate with layers of cytoplasm with refractive index $n_2 \approx
1.33$ (close to water). The rodlet arrays form a multilayer thin-film
reflector.</p>
<hr>
<h2 id="thin-film-interference-the-physics-of-the-reflection">Thin-Film Interference: The Physics of the Reflection</h2>
<p>The physics of the tapetum is identical to the physics of anti-reflection
coatings on camera lenses and dielectric mirrors in laser cavities.</p>
<p>Consider a single thin film of thickness $d$ and refractive index $n_1$
embedded between media of index $n_2 < n_1$. Light of wavelength $\lambda$
(in vacuum) incident at angle $\theta$ to the normal undergoes partial
reflection at both interfaces. The two reflected beams interfere
constructively when their optical path difference is a multiple of the
wavelength:</p>
$$\Delta = 2 n_1 d \cos\theta = m\lambda, \quad m = 1, 2, 3, \ldots$$<p>For the tapetum, typical rodlet diameter is $d \approx 100$–$120\,\mathrm{nm}$.
With $n_1 \approx 1.8$ and $\theta \approx 0°$ (normal incidence), the first
constructive interference maximum for a single layer occurs at:</p>
$$\lambda_\mathrm{peak} = 2 n_1 d = 2 \times 1.8 \times 100\,\mathrm{nm}
\approx 360\,\mathrm{nm}.$$<p>Wait — that is in the ultraviolet. The tapetum must have multiple layers.</p>
<p>For a stack of $N$ rodlet layers, the reflectance is strongly enhanced
(approaching unity for large $N$) and the peak wavelength of the fundamental
reflection maximum shifts. The relevant periodicity is the combined optical
thickness of one rodlet layer plus one cytoplasm layer:</p>
$$d_\mathrm{eff} = n_1 d_1 + n_2 d_2,$$<p>where $d_1 \approx 100\,\mathrm{nm}$ is the rodlet diameter and
$d_2 \approx 50$–$100\,\mathrm{nm}$ is the cytoplasm spacing. Taking
$d_2 \approx 60\,\mathrm{nm}$:</p>
$$d_\mathrm{eff} = 1.8 \times 100 + 1.33 \times 60 \approx 180 + 80
= 260\,\mathrm{nm}.$$<p>Constructive interference (quarter-wave condition for a multilayer stack) at
$m = 1$:</p>
$$\lambda_\mathrm{peak} = 2 d_\mathrm{eff} \approx 520\,\mathrm{nm}.$$<p>This is green — close to the peak of the scotopic (rod) sensitivity
curve at $\lambda_\mathrm{max,rod} = 498\,\mathrm{nm}$. The tapetum is tuned
to reflect the wavelengths that the night-vision photoreceptors are most
sensitive to. (The exact peak depends on rodlet spacing, which varies across
the tapetum; this produces the observed variation from green to yellow.)</p>
<p>The angle-dependence of the peak wavelength follows from the interference
condition: at angle $\theta$ to the normal (measured inside the stack),
$\lambda_\mathrm{peak}(\theta) = 2 d_\mathrm{eff} \cos\theta$. At $\theta = 30°$,
$\cos 30° \approx 0.87$, giving $\lambda_\mathrm{peak} \approx 450\,\mathrm{nm}$ — blue.
At $\theta = 60°$, $\cos 60° = 0.5$, giving $\lambda \approx 260\,\mathrm{nm}$ —
ultraviolet, invisible. (In practice refraction at the cornea and lens keeps the
internal angle modest, so the extreme ultraviolet shift is an idealised limit.)
The colour of eyeshine in a flash photograph therefore depends on the angle
between the camera and the eye, exactly as observed.</p>
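<p>The arithmetic above is easy to reproduce. A minimal sketch of the idealised Bragg condition for the bilayer stack, using the layer values quoted in the text and treating $\theta$ as the angle inside the stack:</p>
<pre><code class="language-python"># Idealised first-order Bragg condition for the rodlet/cytoplasm bilayer.
import math

n1, d1 = 1.8, 100.0     # rodlet: refractive index, thickness in nm
n2, d2 = 1.33, 60.0     # cytoplasm layer
d_eff = n1 * d1 + n2 * d2                 # optical thickness of one bilayer

def peak_wavelength(theta_deg):
    """First-order reflection peak; theta measured inside the stack."""
    return 2 * d_eff * math.cos(math.radians(theta_deg))

for theta in (0, 30, 60):
    print(f"theta = {theta:2d} deg  ->  lambda_peak = {peak_wavelength(theta):.0f} nm")
# 0 deg: ~520 nm (green), 30 deg: ~450 nm (blue), 60 deg: ~260 nm (ultraviolet)
</code></pre>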
<hr>
<h2 id="reflectance-of-a-multilayer-stack">Reflectance of a Multilayer Stack</h2>
<p>For $N$ identical bilayers (each of optical thickness $n_1 d_1 + n_2 d_2$),
the reflectance at the design wavelength is given by the transfer matrix
method. For the cat tapetum with $N \approx 10$–$15$ bilayers:</p>
$$R = \left(\frac{1 - (n_2/n_1)^{2N}}{1 + (n_2/n_1)^{2N}}\right)^2
\approx 1 - 4\left(\frac{n_2}{n_1}\right)^{2N}.$$<p>With $n_2/n_1 = 1.33/1.8 \approx 0.739$ and $N = 15$:</p>
$$(0.739)^{30} \approx 1.1 \times 10^{-4}.$$<p>The reflectance is approximately $1 - 4 \times 1.1 \times 10^{-4} \approx
0.9996$ — essentially $100\%$ at the design wavelength for a sufficiently thick
stack. The tapetum is a near-perfect reflector in a narrow wavelength band,
a biological dielectric mirror.</p>
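<p>The reflectance estimate is just as quick to check. A sketch that simply evaluates the closed-form stack expression quoted above as a function of the number of bilayers:</p>
<pre><code class="language-python"># Reflectance of an idealised quarter-wave stack at its design wavelength,
# using the closed-form expression from the text (indices as above).
n1, n2 = 1.8, 1.33

def stack_reflectance(N):
    r = (n2 / n1) ** (2 * N)
    return ((1 - r) / (1 + r)) ** 2

for N in (1, 5, 10, 15):
    print(f"N = {N:2d} bilayers  ->  R = {stack_reflectance(N):.4f}")
# by N = 15 the reflectance is essentially unity at the design wavelength
</code></pre>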
<hr>
<h2 id="photon-statistics-at-low-light">Photon Statistics at Low Light</h2>
<p>The tapetum&rsquo;s function becomes clearest when framed in terms of photon
statistics. A single rod photoreceptor has an absorption probability of
approximately $\eta_\mathrm{single} \approx 25\%$ for a photon passing through
it once at $\lambda = 500\,\mathrm{nm}$.</p>
<p>With the tapetum reflecting the photon back for a second pass, the total
absorption probability becomes:</p>
$$\eta_\mathrm{total} = \eta + (1 - \eta)\, R\, \eta,$$<p>where $\eta$ is the single-pass absorption probability and $R$ is the tapetum
reflectance. For $\eta = 0.25$ and $R = 0.98$:</p>
$$\eta_\mathrm{total} = 0.25 + (0.75)(0.98)(0.25) = 0.25 + 0.184 \approx 0.43.$$<p>The double pass increases the photon detection efficiency from $25\%$ to
approximately $43\%$ — a factor of $1.7\times$.</p>
<p>At extremely low light levels, photon detection becomes a counting problem
governed by Poisson statistics. If a mean of $\bar{n}$ photons reaches a
single photoreceptor per integration time, the probability of detecting at
least one photon (and hence registering the presence of light) is:</p>
$$P(\text{detection}) = 1 - e^{-\bar{n}\,\eta_\mathrm{total}}.$$<p>For very dim stimuli where $\bar{n} \approx 1$–$3$ photons per rod per
integration time (close to the absolute threshold of cat vision at around
$7 \times 10^{-7}\,\mathrm{lux}$), increasing $\eta$ by a factor of $\sim
1.7$ has a significant effect on detection probability. The tapetum is not a
luxury
at low light levels; it is a biophysical necessity for sub-threshold light
detection.</p>
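<p>Here is a short numerical sketch of the double-pass gain and what it does to Poisson detection probabilities, using the figures from the text:</p>
<pre><code class="language-python"># Double-pass absorption and Poisson detection probability (figures as above).
import math

eta = 0.25          # single-pass absorption probability of a rod
R = 0.98            # tapetum reflectance
eta_total = eta + (1 - eta) * R * eta
print(f"eta_total = {eta_total:.3f}  (gain {eta_total / eta:.2f}x)")

def p_detect(n_mean, efficiency):
    """Probability of absorbing at least one photon, Poisson statistics."""
    return 1 - math.exp(-n_mean * efficiency)

for n_mean in (0.5, 1, 2, 3):
    print(f"n_mean = {n_mean}: without tapetum {p_detect(n_mean, eta):.2f}, "
          f"with tapetum {p_detect(n_mean, eta_total):.2f}")
# at a mean of 1-3 photons per rod, the tapetum lifts the detection
# probability substantially, which is the whole point at threshold
</code></pre>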
<hr>
<h2 id="percy-shaw-and-the-road-catseye">Percy Shaw and the Road Catseye</h2>
<p>In 1934, Percy Shaw, a road-mender from Halifax, applied for a British patent
for a retroreflective road stud that he called the &ldquo;Catseye.&rdquo; Shaw&rsquo;s stated
inspiration was the reflection of his car headlights from a cat&rsquo;s eyes while
driving on an unlit road at night. Whether this story is entirely accurate is
unclear, but the name and the inspiration are both documented in period sources.</p>
<p>Shaw&rsquo;s device uses a different retroreflection mechanism from the tapetum. The
tapetum produces specular (mirror-like) reflection in the back-focal plane of
the eye&rsquo;s lens — light returning along its incident path because the lens
refocuses it. Shaw&rsquo;s Catseye uses glass hemisphere retroreflectors (or, in
later versions, corner-cube retroreflectors) that return light toward its
source by total internal reflection rather than thin-film interference.</p>
<p>The corner-cube geometry guarantees retroreflection: any ray entering a trihedral
corner (three mutually perpendicular surfaces) reflects from all three surfaces
and exits antiparallel to the incident direction, back toward the source,
regardless of the angle of incidence. The mathematical proof is that the product
of three reflections in mutually perpendicular planes is the inversion $-I$: each
reflection flips one Cartesian component of the direction vector, so $\hat{v}$
exits as $-\hat{v}$, which is exactly retroreflection.</p>
$$\hat{v}_\mathrm{out} = -\hat{v}_\mathrm{in}.$$<p>Shaw&rsquo;s road Catseye became standard equipment on British roads during the Second World War,
credited with a significant reduction in road fatalities during blackouts and
foggy conditions. The biological original was a multilayer interference mirror;
the engineering copy is a corner-cube retroreflector. Different physics, same
function, same name.</p>
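<p>The corner-cube identity is a one-liner to verify numerically. A minimal sketch: reflect a random direction off three mutually perpendicular mirrors, each reflection flipping one Cartesian component, and confirm that the ray exits reversed.</p>
<pre><code class="language-python"># Numerical check of corner-cube retroreflection: three reflections in
# mutually perpendicular planes send v to -v, whatever the incident direction.
import numpy as np

rng = np.random.default_rng(42)
v = rng.normal(size=3)
v /= np.linalg.norm(v)                     # random incident unit direction

def reflect(direction, normal):
    """Reflect a direction vector off a mirror with the given unit normal."""
    return direction - 2 * np.dot(direction, normal) * normal

out = v
for normal in np.eye(3):                   # x, y, z mirror normals
    out = reflect(out, normal)

print(np.allclose(out, -v))                # True: exact retroreflection
</code></pre>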
<hr>
<h2 id="two-optical-solutions-to-one-problem">Two Optical Solutions to One Problem</h2>
<p>The cat&rsquo;s eye contains two distinct optical technologies:</p>
<ol>
<li>
<p><strong>The slit pupil</strong> — a variable aperture with 135:1 dynamic range, optimised
for depth estimation by astigmatic blur in a low-to-the-ground ambush predator.</p>
</li>
<li>
<p><strong>The tapetum lucidum</strong> — a multilayer thin-film reflector of riboflavin
crystalline rodlets, tuned to the scotopic sensitivity peak, achieving
near-100% reflectance at design wavelength and increasing photon detection
efficiency by a factor of approximately $1.7\times$.</p>
</li>
</ol>
<p>Both solutions were arrived at by natural selection over millions of years of
low-light hunting. Both have been copied — one consciously (Shaw&rsquo;s road
reflectors), one as a model for engineered multilayer reflectors in telescopes,
laser cavities, and narrowband optical filters.</p>
<p>When I photograph our cats at dusk and their eyes glow green, I am seeing
the thin-film interference of a biological photonic crystal — riboflavin
rodlets in cytoplasm — wavelength-selected to send green photons back through
rod cells for a second chance at absorption.
The green is not cosmetic. It is functional, and it is physics.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Banks, M.S., Sprague, W.W., Schmoll, J., Parnell, J.A.Q., &amp; Love, G.D.
(2015). Why do animal eyes have pupils of different shapes? <em>Science Advances</em>,
1(7), e1500391. <a href="https://doi.org/10.1126/sciadv.1500391">https://doi.org/10.1126/sciadv.1500391</a></p>
</li>
<li>
<p>Ollivier, F.J., Samuelson, D.A., Brooks, D.E., Lewis, P.A., Kallberg, M.E.,
&amp; Komaromy, A.M. (2004). Comparative morphology of the tapetum lucidum
(among selected species). <em>Veterinary Ophthalmology</em>, 7(1), 11–22.
<a href="https://doi.org/10.1111/j.1463-5224.2004.00318.x">https://doi.org/10.1111/j.1463-5224.2004.00318.x</a></p>
</li>
<li>
<p>Born, M., &amp; Wolf, E. (1999). <em>Principles of Optics</em> (7th ed.). Cambridge
University Press. (Chapters 1, 7 on thin-film interference and multilayer
coatings.)</p>
</li>
<li>
<p>Shaw, P. (1934). <em>Improvements in Studs for Roads and like Surfaces.</em> British
Patent 436,290. Applied 3 April 1934.</p>
</li>
<li>
<p>Warrant, E.J. (1999). Seeing better at night: Life style, eye design and the
optimum strategy of spatial and temporal summation. <em>Vision Research</em>, 39(9),
1611–1630. <a href="https://doi.org/10.1016/S0042-6989(98)00262-4">https://doi.org/10.1016/S0042-6989(98)00262-4</a></p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-12-15</strong>: Corrected the adoption date of Percy Shaw&rsquo;s road Catseyes from &ldquo;from 1945 onward&rdquo; to &ldquo;during the Second World War&rdquo; (widespread adoption began under wartime blackout conditions, not after the war ended). Removed the Machan, Gu, &amp; Bharthuar (2020) reference, which could not be confirmed in available databases.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>From Thought Experiment to Qubit: Schrödinger&#39;s Cat at Ninety</title>
      <link>https://sebastianspicker.github.io/posts/schrodinger-cat-qubits/</link>
      <pubDate>Mon, 27 Jan 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/schrodinger-cat-qubits/</guid>
      <description>In 1935, Schrödinger introduced the cat as a reductio ad absurdum of quantum superposition. Ninety years later, &amp;ldquo;cat states&amp;rdquo; — superpositions of coherent states with opposite phases — are a practical tool in quantum computing. Bosonic cat qubits have bit-flip times exceeding minutes, scaling exponentially with photon number, and are among the leading architectures for fault-tolerant quantum computation. The cat is no longer a paradox. It is a qubit.</description>
      <content:encoded><![CDATA[<p><em>I have two live cats — indoor-only now, for health reasons, a fact they register
as an ongoing injustice. This already puts me in a better epistemic position than
Schrödinger, who had one hypothetical dead-or-alive one. I want to use this
advantage to say something substantive about what the thought experiment actually
claimed, why it was not a paradox but a critique, and what has happened in the
ninety years since — because what has happened is extraordinary. The cat state
is now an engineering specification.</em></p>
<hr>
<h2 id="the-1935-thought-experiment">The 1935 Thought Experiment</h2>
<p>Erwin Schrödinger introduced the cat in a paper titled &ldquo;Die gegenwärtige
Situation in der Quantenmechanik&rdquo; (<em>Naturwissenschaften</em>, 1935). The paper is
a critique of the Copenhagen interpretation of quantum mechanics, not an
endorsement of macroscopic superposition.</p>
<p>The setup is familiar: a cat is placed in a sealed chamber with a radioactive
atom, a Geiger counter, a hammer, and a vial of poison. If the atom decays in
one hour, the counter fires, the hammer falls, the vial breaks, and the cat
dies. If the atom does not decay, the cat lives. The atom is a quantum system;
after one hour it is in a superposition of decayed and undecayed states.</p>
<p>Quantum mechanics — specifically, the Schrödinger equation, applied without
any special rule for measurement — says the entire system (atom + counter +
hammer + vial + cat) evolves into a superposition:</p>
<p>$$|\Psi\rangle = \frac{1}{\sqrt{2}}\bigl(|\text{decayed}\rangle|\text{cat dead}\rangle + |\text{undecayed}\rangle|\text{cat alive}\rangle\bigr).$$</p>
<p>Schrödinger&rsquo;s point was that this is <em>absurd</em>: the cat is either dead or alive,
not a superposition of both, and any interpretation of quantum mechanics that
predicts otherwise is failing at the level of macroscopic physical reality. He
intended the cat as a <em>reductio ad absurdum</em> — a demonstration that taking
the wave function literally at macroscopic scales leads to nonsense.</p>
<p>He was not proposing that cats are literally in superposition. He was proposing
that the theory was incomplete.</p>
<hr>
<h2 id="what-actually-resolves-the-cat">What Actually Resolves the Cat</h2>
<p>The resolution that modern physics offers is <strong>decoherence</strong> — the process by
which a quantum superposition is destroyed through entanglement with the
environment.</p>
<p>A macroscopic object — a cat, a hammer, a Geiger counter — is coupled to an
enormous number of environmental degrees of freedom: air molecules, photons,
phonons in its own structure. Each of these interactions entangles the
macroscopic system with the environment, and the entanglement effectively
destroys the coherence between branches of the superposition. What starts as</p>
<p>$$|\Psi\rangle = \frac{1}{\sqrt{2}}(|\text{decayed}\rangle|\text{dead}\rangle + |\text{undecayed}\rangle|\text{alive}\rangle)$$</p>
<p>rapidly becomes, after environmental entanglement (tracing over environmental
degrees of freedom $|E\rangle$):</p>
<p>$$\rho = \frac{1}{2}|\text{decayed}\rangle\langle\text{decayed}|
\otimes |\text{dead}\rangle\langle\text{dead}|
+ \frac{1}{2}|\text{undecayed}\rangle\langle\text{undecayed}|
\otimes |\text{alive}\rangle\langle\text{alive}|.$$</p>
<p>This is a <em>mixed state</em>, not a superposition. The off-diagonal terms (the
interference terms that distinguish a superposition from a classical mixture)
vanish on a timescale</p>
$$\tau_\mathrm{decoherence} \sim \frac{\hbar}{E_\mathrm{int}} \cdot \frac{1}{N},$$<p>where $E_\mathrm{int}$ is the interaction energy with each environmental degree
of freedom and $N$ is the number of such degrees of freedom. For a macroscopic
object at room temperature, $\tau_\mathrm{decoherence}$ is of order
$10^{-20}$–$10^{-30}$ seconds — unmeasurably short. The cat is never in a
superposition for any observable duration. The superposition collapses before
any measurement can resolve it.</p>
<p>This is not a philosophical solution to the measurement problem — it does not
explain <em>why</em> a particular measurement outcome is obtained, only why we never
observe interference between macroscopic branches — but it does explain why
Schrödinger&rsquo;s setup does not produce an observable macroscopic superposition.
The cat&rsquo;s entanglement with its own environment (the box, the air, its own
thermal photons) destroys the coherence long before any observation.</p>
<hr>
<h2 id="what-a-cat-state-actually-is">What a Cat State Actually Is</h2>
<p>In quantum optics, a <strong>cat state</strong> is not a cat in a superposition. It is a
specific quantum state of a harmonic oscillator (typically a mode of the
electromagnetic field) that was named in honour of Schrödinger&rsquo;s thought
experiment.</p>
<p>A <strong>coherent state</strong> $|\alpha\rangle$ is the quantum state that most closely
resembles a classical oscillating electromagnetic field with amplitude $\alpha
\in \mathbb{C}$. Coherent states are eigenstates of the annihilation operator:
$\hat{a}|\alpha\rangle = \alpha|\alpha\rangle$. The mean photon number is
$\bar{n} = |\alpha|^2$.</p>
<p>A <strong>cat state</strong> is a superposition of two coherent states with opposite
phases:</p>
$$|\mathrm{cat}_\pm\rangle = \mathcal{N}_\pm\bigl(|\alpha\rangle \pm |-\alpha\rangle\bigr),$$<p>where $\mathcal{N}_\pm = 1/\sqrt{2(1 \pm e^{-2|\alpha|^2})}$ is the
normalisation constant. For large $|\alpha|$, the two coherent states are
nearly orthogonal: $\langle -\alpha | \alpha \rangle = e^{-2|\alpha|^2} \approx 0$.</p>
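<p>The near-orthogonality of $|\alpha\rangle$ and $|-\alpha\rangle$ is easy to check numerically in a truncated Fock basis. A minimal sketch (the truncation at 60 photons is an arbitrary choice, ample for these amplitudes):</p>
<pre><code class="language-python"># Overlap of |alpha> and |-alpha> in a truncated Fock basis, compared with
# the exact value exp(-2|alpha|^2).
import numpy as np
from math import factorial, exp

def coherent(alpha, dim=60):
    n = np.arange(dim)
    c = np.array([alpha**k / np.sqrt(factorial(k)) for k in n], dtype=complex)
    return np.exp(-abs(alpha)**2 / 2) * c

for alpha in (1.0, 2.0, 3.0):
    ca, cma = coherent(alpha), coherent(-alpha)
    overlap = np.vdot(cma, ca).real
    print(f"alpha = {alpha}: numeric {overlap:.6f}, exact {exp(-2 * abs(alpha)**2):.6f}")
# already at alpha = 2 the overlap is ~3e-4: the two branches are effectively
# orthogonal, which is what makes them usable as qubit basis states
</code></pre>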
<p>The Wigner quasi-probability distribution of a cat state is revealing. The
Wigner function of a coherent state $|\alpha\rangle$ is a Gaussian peaked at
$(x, p) = (\sqrt{2}\,\mathrm{Re}\,\alpha, \sqrt{2}\,\mathrm{Im}\,\alpha)$.
The cat state Wigner function is:</p>
<p>$$W_{\mathrm{cat}_+}(x,p) = \mathcal{N}_+^2\bigl[W_{|\alpha\rangle}(x,p) + W_{|-\alpha\rangle}(x,p) + 2W_\mathrm{int}(x,p)\bigr],$$</p>
<p>where the interference term $W_\mathrm{int}$ has <em>negative values</em> in the
region between the two Gaussian peaks. Negative regions of the Wigner function
are a signature of non-classical states; they cannot arise from any classical
probability distribution. The cat state is quantum mechanical in a way that
coherent states are not.</p>
<hr>
<h2 id="haroche-and-the-nobel-prize">Haroche and the Nobel Prize</h2>
<p>Serge Haroche (ENS Paris) spent two decades developing techniques to create,
control, and observe cat states of the electromagnetic field in real time.
His experiment used a <strong>superconducting microwave cavity</strong> — a polished copper
box cooled to near absolute zero — in which single microwave photons could be
trapped for hundreds of milliseconds, and a beam of single Rydberg atoms to
probe the field non-destructively.</p>
<p>Haroche created cat states of cavity photons and, crucially, watched their
<strong>decoherence in real time</strong>: as the quantum coherence between the two branches
$|\alpha\rangle$ and $|-\alpha\rangle$ was progressively destroyed by coupling
to the environment, the Wigner function&rsquo;s negative region (the interference
fringe) smoothed out and disappeared, leaving a classical mixture. The
decoherence rate was proportional to $|\alpha|^2$ — the mean photon number,
which measures how &ldquo;macroscopic&rdquo; the cat state is:</p>
$$\Gamma_\mathrm{decoherence} \propto |\alpha|^2 \cdot \kappa,$$<p>where $\kappa$ is the photon loss rate of the cavity. A larger cat (larger
$|\alpha|^2$) decoheres faster, as Schrödinger&rsquo;s argument implicitly requires.</p>
<p>Haroche shared the 2012 Nobel Prize in Physics with David Wineland &ldquo;for
ground-breaking experimental methods that enable measuring and manipulation
of individual quantum systems.&rdquo;</p>
<hr>
<h2 id="cat-qubits-from-paradox-to-engineering">Cat Qubits: From Paradox to Engineering</h2>
<p>The step from fundamental physics to quantum computing was taken when
researchers noted that the two coherent states $|\alpha\rangle$ and
$|-\alpha\rangle$ can serve as the two computational basis states of a qubit:</p>
$$|0\rangle_L \equiv |\alpha\rangle, \quad |1\rangle_L \equiv |-\alpha\rangle.$$<p>The <strong>cat qubit</strong> encodes a logical qubit in this pair of coherent states.
Its remarkable property is an intrinsic asymmetry between error types.</p>
<h3 id="bit-flip-suppression">Bit-Flip Suppression</h3>
<p>A bit-flip error ($|0\rangle_L \leftrightarrow |1\rangle_L$, i.e.,
$|\alpha\rangle \leftrightarrow |-\alpha\rangle$) requires flipping the
amplitude of the oscillator from $+\alpha$ to $-\alpha$. For a stabilised
cat qubit (confined to the cat-state manifold by a parametric drive), this
requires overcoming an energy barrier proportional to $|\alpha|^2$. The
bit-flip time scales exponentially:</p>
$$T_\mathrm{bit-flip} \sim T_1 \cdot e^{2|\alpha|^2},$$<p>where $T_1$ is the single-photon loss time. For modest values of $|\alpha|^2$
(mean photon numbers of 5–10), the bit-flip time can exceed minutes.</p>
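<p>To get a feel for what the exponential buys, here is a back-of-the-envelope sketch. The single-photon loss time $T_1$ below is an assumed illustrative value, not a number taken from the papers cited further down.</p>
<pre><code class="language-python"># Back-of-the-envelope scaling T_bitflip ~ T1 * exp(2*n), n = |alpha|^2.
import math

T1 = 10e-6    # assumed single-photon loss time: 10 microseconds (illustrative)

for n_bar in (2, 4, 6, 8, 10):
    t_bitflip = T1 * math.exp(2 * n_bar)
    print(f"|alpha|^2 = {n_bar:2d}  ->  T_bitflip ~ {t_bitflip:.3g} s")
# a few photons buy milliseconds; ten photons buy over an hour at this assumed T1
</code></pre>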
<p>A <strong>phase-flip error</strong> (the other error type) is not suppressed — the cat qubit
is still vulnerable to dephasing at a rate proportional to $|\alpha|^2$. This
creates a strongly biased noise channel: only one of the two error types is
relevant.</p>
<h3 id="the-engineering-consequence">The Engineering Consequence</h3>
<p>Biased noise is useful because it allows the error-correcting code to focus
its resources on only one error type. A repetition code (a string of cat
qubits where phase errors are corrected by majority vote) can suppress the
phase-flip error arbitrarily while the exponential bit-flip suppression handles
the other. The hardware overhead for fault tolerance — the ratio of physical
qubits to logical qubits — is dramatically reduced compared to codes that must
handle both error types equally.</p>
<p>Between 2020 and 2024, a sequence of experiments demonstrated cat qubits with
steadily longer bit-flip times, culminating in seconds to minutes:</p>
<ul>
<li><strong>Grimm et al. (2020, <em>Nature</em> 584, 205)</strong>: Kerr cat qubit with exponential
bit-flip suppression demonstrated in a superconducting circuit.</li>
<li><strong>Berdou et al. (2023, <em>PRX Quantum</em> 4, 020350)</strong>: Cat qubit with $T_X$
exceeding $100$ seconds.</li>
<li><strong>Reglade et al. (2024, <em>Nature</em> 629, 778–783)</strong>: Cat qubits from Alice &amp;
Bob demonstrating exponential scaling $T_\mathrm{bit-flip} \propto
  e^{2|\alpha|^2}$ with mean photon numbers up to $|\alpha|^2 \approx 10$,
pushing bit-flip times beyond $10$ seconds in the laboratory and, in
subsequent chip demonstrations, beyond several minutes.</li>
</ul>
<p>This is the state of the art as of early 2025: the cat qubit is no longer
a curiosity but a competitive architecture for fault-tolerant quantum computing,
with bit-flip coherence times exceeding the best alternative approaches.</p>
<hr>
<h2 id="the-wigner-function-and-quantum-non-classicality">The Wigner Function and Quantum Non-Classicality</h2>
<p>The Wigner quasi-probability distribution provides the most informative picture
of a quantum state&rsquo;s non-classicality. For a state with density matrix $\rho$,
the Wigner function is:</p>
$$W(x, p) = \frac{1}{\pi\hbar} \int_{-\infty}^{\infty}
\langle x + y | \rho | x - y \rangle\, e^{2ipy/\hbar}\, dy.$$<p>For the cat state $|\mathrm{cat}_+\rangle$ with $|\alpha|^2 = 4$ (four mean
photons in each coherent component), the Wigner function has two positive
Gaussian peaks at $(x, p) = (\pm\sqrt{2}|\alpha|, 0)$ and an oscillating
interference fringe between them whose negative regions reach down toward the
bound $-1/(\pi\hbar)$. Negativity of the Wigner function is a sufficient witness
of non-classicality: no classical probability distribution over phase space can
reproduce it.</p>
<p>As decoherence proceeds (e.g., through photon loss in a cavity), the negative
regions shrink and eventually vanish — the Wigner function becomes everywhere
non-negative, and the state becomes classically describable as a mixture of
coherent states. This is the quantum-to-classical transition, made visible in
phase space.</p>
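<p>One way to see the negativity, and its disappearance, without computing the full Wigner function is the standard identity that the Wigner function at the phase-space origin equals $(1/\pi\hbar)\,\langle\text{parity}\rangle$. A minimal sketch in a truncated Fock basis (with $\hbar = 1$), comparing the odd cat state with the fully decohered 50/50 mixture of the two coherent branches:</p>
<pre><code class="language-python"># W(0,0) = (1/pi) * &lt;parity&gt; in hbar = 1 units. The odd cat state has
# W(0,0) = -1/pi (maximally negative); the decohered 50/50 mixture of
# |alpha> and |-alpha> has W(0,0) ~ 0.
import numpy as np
from math import factorial, pi

DIM = 60
def coherent(alpha):
    n = np.arange(DIM)
    c = np.array([alpha**k / np.sqrt(factorial(k)) for k in n], dtype=complex)
    return np.exp(-abs(alpha)**2 / 2) * c

alpha = 2.0
ca, cma = coherent(alpha), coherent(-alpha)
cat_odd = ca - cma
cat_odd /= np.linalg.norm(cat_odd)

parity = (-1.0) ** np.arange(DIM)          # diagonal of the parity operator

w_cat = np.sum(parity * np.abs(cat_odd) ** 2) / pi
w_mix = np.sum(parity * 0.5 * (np.abs(ca) ** 2 + np.abs(cma) ** 2)) / pi

print(f"odd cat:  W(0,0) = {w_cat:+.4f}   (-1/pi = {-1 / pi:+.4f})")
print(f"mixture:  W(0,0) = {w_mix:+.4f}   (negativity gone)")
</code></pre>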
<p>Haroche&rsquo;s team measured this process directly, frame by frame, in real time.
It is one of the most dramatic experimental visualisations of decoherence ever
achieved.</p>
<hr>
<h2 id="what-schrödinger-would-make-of-this">What Schrödinger Would Make of This</h2>
<p>Schrödinger was a physicist, not a philosopher of language. If told in 1935
that ninety years later, the superposition of two distinguishable states of a
harmonic oscillator — named after his cat, with the same formal structure as
his thought experiment — would be the leading candidate for the basic unit of
a fault-tolerant quantum computer, he would have had two questions.</p>
<p>The first: how do you maintain the superposition against decoherence? The
answer is that you work at millikelvin temperatures in superconducting circuits,
and you use an active parametric drive to confine the state to the cat-state
manifold.</p>
<p>The second, I think, would have been: does this resolve the measurement
problem? And the honest answer remains: no, not fully. Decoherence explains
why macroscopic superpositions are unobservable, but it does not explain why
any particular measurement outcome occurs. That question is as open as it was
in 1935.</p>
<p>What has changed is the practical relationship between quantum theory and
technology. The uncertainty Schrödinger was pointing at — the strangeness of
superposition, the fragility of coherence, the role of the environment — is
now a resource to be engineered, not a conceptual embarrassment to be
resolved. The cat qubit works precisely <em>because</em> the decoherence is
asymmetric: bit flips are exponentially suppressed while phase flips are
correctable. The asymmetry is exploited, not apologised for.</p>
<p>My two cats, meanwhile, are in definite classical states. One is on the
radiator. The other is on the keyboard.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Grimm, A., Frattini, N.E., Puri, S., Mundhada, S.O., Touzard, S.,
Mirrahimi, M., Girvin, S.M., Shankar, S., &amp; Devoret, M.H. (2020). Stabilization
and operation of a Kerr-cat qubit. <em>Nature</em>, 584, 205–209.
<a href="https://doi.org/10.1038/s41586-020-2587-z">https://doi.org/10.1038/s41586-020-2587-z</a></p>
</li>
<li>
<p>Haroche, S., &amp; Raimond, J.-M. (2006). <em>Exploring the Quantum: Atoms,
Cavities, and Photons.</em> Oxford University Press.</p>
</li>
<li>
<p>Reglade, U., Bocquet, A., Gautier, R., et al. (2024). Quantum control of a
cat qubit with bit-flip times exceeding ten seconds. <em>Nature</em>, 629, 778–783.
<a href="https://doi.org/10.1038/s41586-024-07294-3">https://doi.org/10.1038/s41586-024-07294-3</a></p>
</li>
<li>
<p>Mirrahimi, M., Leghtas, Z., Albert, V.V., Touzard, S., Schoelkopf, R.J.,
Jiang, L., &amp; Devoret, M.H. (2014). Dynamically protected cat-qubits: A new
paradigm for universal quantum computation. <em>New Journal of Physics</em>, 16,
045014. <a href="https://doi.org/10.1088/1367-2630/16/4/045014">https://doi.org/10.1088/1367-2630/16/4/045014</a></p>
</li>
<li>
<p>Schrödinger, E. (1935). Die gegenwärtige Situation in der Quantenmechanik.
<em>Naturwissenschaften</em>, 23(48), 807–812; 23(49), 823–828; 23(50), 844–849.
<a href="https://doi.org/10.1007/BF01491891">https://doi.org/10.1007/BF01491891</a></p>
</li>
<li>
<p>Walls, D.F., &amp; Milburn, G.J. (2008). <em>Quantum Optics</em> (2nd ed.). Springer.</p>
</li>
<li>
<p>Zurek, W.H. (2003). Decoherence, einselection, and the quantum origins of
the classical. <em>Reviews of Modern Physics</em>, 75(3), 715–775.
<a href="https://doi.org/10.1103/RevModPhys.75.715">https://doi.org/10.1103/RevModPhys.75.715</a></p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-02-17</strong>: Updated &ldquo;bit-flip times exceeding seven minutes&rdquo; in the summary to &ldquo;exceeding minutes,&rdquo; aligning with the sourced figures: the body text reports &ldquo;beyond several minutes&rdquo; and Reglade et al. (2024) report &ldquo;exceeding ten seconds.&rdquo;</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Primes Are Energy Levels: The Montgomery-Odlyzko Conjecture</title>
      <link>https://sebastianspicker.github.io/posts/riemann-primes-quantum-chaos/</link>
      <pubDate>Mon, 18 Nov 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/riemann-primes-quantum-chaos/</guid>
      <description>In October 2024, the largest known prime was discovered — 41 million digits, found by a GPU cluster. But the deepest prime story is not about record-breaking numbers. It is about a 1972 teatime conversation at the Institute for Advanced Study, a pair correlation formula, and the suspicion — numerically confirmed to extraordinary precision — that the zeros of the Riemann zeta function are the energy levels of an undiscovered quantum system.</description>
      <content:encoded><![CDATA[<h2 id="a-very-large-prime">A Very Large Prime</h2>
<p>On 12 October 2024, a retired NVIDIA engineer named Luke Durant found the 52nd known Mersenne prime (the public announcement followed later that month). The number is $2^{136{,}279{,}841} - 1$, and writing it out in decimal requires 41,024,320 digits. Durant had organised a cloud network of GPU servers spread across 17 countries — essentially repurposing the hardware that normally trains language models to instead do modular arithmetic on numbers with tens of millions of digits. The verification alone took about 51 days of computation.</p>
<p>This is the kind of thing that makes headlines, and it deserves them. Mersenne primes are rare and verifying them is genuinely hard. But if I am honest, the more interesting prime story of the last half-century is not about the record-breaking number. It is about a conversation over tea in Princeton in 1972, and the increasingly hard-to-dismiss suspicion that the prime numbers are, in a precise statistical sense, quantum energy levels.</p>
<p>When I say &ldquo;quantum energy levels,&rdquo; I mean it almost literally — not as a metaphor. Let me explain.</p>
<h2 id="the-riemann-zeta-function-encodes-the-primes">The Riemann Zeta Function Encodes the Primes</h2>
<p>Start with the most famous function in number theory. For $\operatorname{Re}(s) > 1$, the Riemann zeta function is defined by the series</p>
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}.$$<p>This converges nicely and defines an analytic function. But the real reason to care about it is Euler&rsquo;s product formula:</p>
$$\zeta(s) = \prod_{p \text{ prime}} \frac{1}{1 - p^{-s}}.$$<p>This is not obvious — it follows from unique prime factorisation, essentially — but its implications are enormous. The product runs over <em>all</em> primes, and each prime contributes a factor. The primes are encoded in the analytic structure of $\zeta$. If you know $\zeta$, you know the primes; if you understand the zeros of $\zeta$, you understand their distribution.</p>
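<p>The Euler product is easy to sanity-check numerically. A quick sketch, truncating both sides at $s = 2$ and comparing with the exact value $\zeta(2) = \pi^2/6$ (the cutoffs are arbitrary choices):</p>
<pre><code class="language-python"># Truncate the Dirichlet series and the Euler product at s = 2 and compare
# with the exact value pi^2/6.
from math import pi

def primes_up_to(n):
    sieve = bytearray([1]) * (n + 1)
    sieve[:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = b"\x00" * len(sieve[i * i :: i])
    return [i for i in range(n + 1) if sieve[i]]

s = 2
series = sum(1 / n ** s for n in range(1, 100_000))
product = 1.0
for p in primes_up_to(10_000):
    product *= 1 / (1 - p ** -s)

print(series, product, pi ** 2 / 6)   # all three agree to about four decimals
</code></pre>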
<p>Riemann&rsquo;s 1859 paper made this explicit (<a href="#ref-Riemann1859">Riemann, 1859</a>). He showed that $\zeta$ extends analytically to the whole complex plane (minus a simple pole at $s = 1$), and he wrote down an explicit formula connecting the prime-counting function</p>
$$\pi(x) = \#\{p \leq x : p \text{ prime}\}$$<p>to the zeros of $\zeta$. The formula is</p>
$$\pi(x) \approx \operatorname{Li}(x) - \sum_{\rho} \operatorname{Li}(x^{\rho}) + \text{(lower-order terms)},$$<p>where $\operatorname{Li}(x) = \int_2^x \frac{dt}{\ln t}$ is the logarithmic integral and the sum runs over the <em>non-trivial zeros</em> $\rho$ of $\zeta$.</p>
<p>What are the non-trivial zeros? The zeta function has trivial zeros at the negative even integers $-2, -4, -6, \ldots$ — boring, understood. The non-trivial zeros lie in the <em>critical strip</em> $0 < \operatorname{Re}(s) < 1$, and their imaginary parts are what drive the oscillatory corrections to $\pi(x)$. Each zero $\rho = \frac{1}{2} + it_n$ contributes a term that oscillates like $x^{1/2} \cos(t_n \ln x)$. The prime distribution is a superposition of these oscillations, one per zero.</p>
<p>The Riemann Hypothesis is the claim that all non-trivial zeros lie on the <em>critical line</em> $\operatorname{Re}(s) = \frac{1}{2}$. It has been verified numerically for the first $10^{13}$ zeros (Gourdon, 2004; building on earlier high-height computations by <a href="#ref-Odlyzko1987">Odlyzko, 1987</a>). It has not been proved. It remains, after 165 years, the most important unsolved problem in mathematics.</p>
<h2 id="tea-with-dyson">Tea with Dyson</h2>
<p>In 1972, Hugh Montgomery was visiting the Institute for Advanced Study in Princeton. He was working on a specific question: if you take the imaginary parts of the non-trivial zeros of $\zeta$ and normalise them so that their mean spacing is 1, what is the distribution of spacings between them?</p>
<p>More precisely, he was computing the <em>pair correlation function</em> of the normalised zeros. If $\tilde{\gamma}_n$ are the normalised imaginary parts (ordered $\tilde{\gamma}_1 \leq \tilde{\gamma}_2 \leq \cdots$), the pair correlation function $R_2(r)$ measures the density of pairs $(\tilde{\gamma}_m, \tilde{\gamma}_n)$ with $\tilde{\gamma}_n - \tilde{\gamma}_m \approx r$.</p>
<p>Montgomery found — subject to certain assumptions about the behaviour of $\zeta$ — that</p>
$$R_2(r) = 1 - \left(\frac{\sin \pi r}{\pi r}\right)^2.$$<p>(<a href="#ref-Montgomery1973">Montgomery, 1973</a>)</p>
<p>He mentioned this to Freeman Dyson over tea. Dyson — who had spent years on quantum mechanics and random matrix theory — recognised the formula immediately. That expression, $1 - (\sin \pi r / \pi r)^2$, is exactly the pair correlation function of eigenvalues of random matrices drawn from the Gaussian Unitary Ensemble.</p>
<p>Montgomery had not been thinking about quantum mechanics. Dyson had not been thinking about primes. The formula matched.</p>
<h2 id="the-gaussian-unitary-ensemble">The Gaussian Unitary Ensemble</h2>
<p>Let me say a few words about where that formula comes from in physics, because it is not obvious.</p>
<p>The Gaussian Unitary Ensemble (GUE) is a probability distribution over $N \times N$ Hermitian matrices. Specifically, it is the distribution proportional to $e^{-\operatorname{tr}(H^2)}$ on the space of Hermitian matrices, which is invariant under conjugation $H \mapsto U H U^\dagger$ for any unitary $U$. The entries on the diagonal are real Gaussians; the off-diagonal entries are complex Gaussians with independent real and imaginary parts.</p>
<p>In the limit $N \to \infty$, the eigenvalues of a GUE matrix distribute globally according to Wigner&rsquo;s semicircle law. But the local statistics — the fine-grained distribution of spacings between nearby eigenvalues — follow a universal law. The pair correlation function is</p>
$$R_2^{\text{GUE}}(r) = 1 - \left(\frac{\sin \pi r}{\pi r}\right)^2.$$<p>This distribution has a crucial qualitative feature called <em>level repulsion</em>: as $r \to 0$, $R_2(r) \to 0$. Eigenvalues of random Hermitian matrices strongly avoid each other. A Poisson distribution — which is what you would get for eigenvalues that were statistically independent — would give $R_2(r) = 1$ everywhere, with no such repulsion. The GUE formula suppresses small gaps quadratically: $R_2(r) \sim \pi^2 r^2 / 3$ for small $r$.</p>
<p>Why does GUE statistics arise in physics? This is the content of the Bohigas-Giannoni-Schmit conjecture (1984), which by now has overwhelming numerical support: quantum systems whose classical limit is chaotic and which lack time-reversal symmetry have energy level statistics described by the GUE. Systems with time-reversal symmetry fall into the Gaussian Orthogonal Ensemble (GOE), which has a different but related formula. Nuclear energy levels, quantum billiards with the right shapes, molecular spectra — all of them, when appropriately normalised, show GUE or GOE statistics.</p>
<p>The universality is the point. It does not matter what the specific Hamiltonian is. If the system is sufficiently chaotic, the eigenvalue statistics are universal.</p>
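<p>The level repulsion is something you can see for yourself with a few lines of code. The following is a rough numerical sketch, not a careful computation: sample GUE matrices, keep eigenvalues near the centre of the semicircle where the density is roughly flat, rescale to unit mean spacing, and histogram the pair separations against $1 - (\sin \pi r / \pi r)^2$. Matrix size, sample counts, and the crude unfolding are all arbitrary choices.</p>
<pre><code class="language-python"># Rough empirical pair correlation of GUE eigenvalues (takes ~10-20 s).
import numpy as np

rng = np.random.default_rng(0)
N, trials = 400, 200
seps = []
for _ in range(trials):
    A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
    H = (A + A.conj().T) / 2                    # a GUE matrix, up to scale
    ev = np.sort(np.linalg.eigvalsh(H))
    mid = ev[3 * N // 8 : 5 * N // 8]           # central quarter: density ~ flat
    mid = mid / np.mean(np.diff(mid))           # crude unfolding to unit spacing
    d = mid[None, :] - mid[:, None]
    seps.append(np.abs(d[np.triu_indices_from(d, k=1)]))
seps = np.concatenate(seps)

edges = np.linspace(0.0, 3.0, 31)
centres = 0.5 * (edges[1:] + edges[:-1])
counts, _ = np.histogram(seps, bins=edges)
empirical = counts / (trials * (N // 4) * np.diff(edges))   # pairs per point per unit r
gue = 1 - (np.sin(np.pi * centres) / (np.pi * centres)) ** 2
for r, e, g in zip(centres[::6], empirical[::6], gue[::6]):
    print(f"r = {r:4.2f}   empirical = {e:4.2f}   GUE = {g:4.2f}")
# small separations are strongly suppressed (level repulsion), and the
# histogram climbs toward 1, tracking the GUE curve
</code></pre>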
<h2 id="odlyzkos-computation">Odlyzko&rsquo;s Computation</h2>
<p>Montgomery&rsquo;s result was conditional and covered only a limited range of $r$. The natural next step was numerical verification: actually compute a large number of Riemann zeros and measure their pair correlation.</p>
<p>Andrew Odlyzko did exactly this, in a series of computations beginning in the 1980s. The results were striking (<a href="#ref-Odlyzko1987">Odlyzko, 1987</a>). He computed millions of zeros with high precision and compared their empirical pair correlation to the GUE prediction. The agreement was not merely qualitative — it was quantitatively exact, to within the statistical error of the sample.</p>
<p>Odlyzko then pushed further. He computed zeros near the $10^{20}$-th zero, far out on the critical line. Same statistics. He computed zeros near the $10^{22}$-th zero. Same statistics. The agreement held regardless of how far up the critical line one went. This is not a small-sample artifact and it is not coincidence, or at least it would be an extraordinary coincidence of a kind that mathematics has never before encountered.</p>
<p>The plots from Odlyzko&rsquo;s computations are, in my view, some of the most beautiful images in mathematics. You draw the GUE prediction — a smooth curve, starting at zero, rising to approach 1 — and you overlay the empirical histogram from the Riemann zeros. They are the same curve.</p>
<h2 id="berry-keating-and-the-missing-hamiltonian">Berry, Keating, and the Missing Hamiltonian</h2>
<p>If the zeros of $\zeta$ are energy levels, there should be a Hamiltonian $H$ — a self-adjoint operator — whose spectrum is exactly $\{t_n\}$, the imaginary parts of the non-trivial zeros (assuming the Riemann Hypothesis, so that all zeros are of the form $\frac{1}{2} + it_n$).</p>
<p>In 1999, Michael Berry and Jon Keating proposed a candidate (<a href="#ref-BerryKeating1999">Berry &amp; Keating, 1999</a>). Their suggestion was the classical Hamiltonian</p>
$$H_{\text{cl}} = xp,$$<p>where $x$ is position and $p$ is momentum, quantized with appropriate symmetrization:</p>
$$\hat{H} = \frac{1}{2}(\hat{x}\hat{p} + \hat{p}\hat{x}).$$<p>Classically, $H = xp$ describes a system in which the phase-space trajectories are hyperbolas $xp = E = \text{const}$, and the motion is $x(t) = x_0 e^t$, $p(t) = p_0 e^{-t}$ — exponential expansion in position, contraction in momentum. This is essentially the dynamics of an unstable fixed point: not chaotic in the strict sense (a single degree of freedom cannot be), but uniformly hyperbolic, which is the ingredient the quantum-chaos correspondence actually needs.</p>
<p>The semiclassical (WKB) approximation gives an eigenvalue counting function</p>
$$N(E) \approx \frac{E}{2\pi} \ln \frac{E}{2\pi} - \frac{E}{2\pi} + \frac{7}{8} + \cdots,$$<p>which matches Riemann&rsquo;s formula for the number of zeros of $\zeta$ with imaginary part up to $T$:</p>
$$N(T) = \frac{T}{2\pi} \ln \frac{T}{2\pi} - \frac{T}{2\pi} + \frac{7}{8} + O(\ln T).$$<p>This is not a coincidence: the correspondence is exact at the level of the smooth counting function. The hard part is the oscillatory corrections — and those require the specific eigenvalues, which in turn requires knowing the boundary conditions.</p>
<p>The problem is that $\hat{H} = \frac{1}{2}(\hat{x}\hat{p} + \hat{p}\hat{x})$ as an operator on $L^2(\mathbb{R})$ is not bounded below and has a continuous spectrum, not a discrete one. Turning it into an operator with a discrete spectrum matching the Riemann zeros requires boundary conditions that have not been found. This is the crux: Berry and Keating have the right classical system, but the quantum boundary conditions are missing.</p>
<p>What would be profound about finding $\hat{H}$? If $\hat{H}$ is self-adjoint and bounded below ($\hat{H} \geq 0$), its eigenvalues are all non-negative real numbers. If those eigenvalues are the imaginary parts of the zeros, then all zeros have real part exactly $\frac{1}{2}$ — which is the Riemann Hypothesis. A proof of the existence of such a Hamiltonian would, in one stroke, resolve the most important open problem in mathematics.</p>
<h2 id="primes-as-periodic-orbits-the-gutzwiller-analogy">Primes as Periodic Orbits: The Gutzwiller Analogy</h2>
<p>The quantum chaos connection goes deeper than pair correlations. In semiclassical quantum mechanics, the Gutzwiller trace formula relates the density of quantum energy levels to a sum over classical periodic orbits:</p>
$$d(E) = \bar{d}(E) + \sum_{\gamma} A_\gamma \cos\!\left(\frac{S_\gamma}{\hbar} - \phi_\gamma\right),$$<p>where the sum runs over all classical periodic orbits $\gamma$, $S_\gamma$ is the classical action of the orbit, $A_\gamma$ is an amplitude, and $\phi_\gamma$ is a phase (Maslov index correction). The smooth part $\bar{d}(E)$ comes from the Thomas-Fermi approximation; the oscillatory part encodes quantum interference between orbits.</p>
<p>The direct analogue in number theory is the <em>explicit formula</em> for the prime-counting function. Written as a formula for the oscillatory part of the zero-counting function, it reads</p>
$$\psi(x) = x - \sum_{\rho} \frac{x^\rho}{\rho} - \ln(2\pi) - \frac{1}{2}\ln(1 - x^{-2}),$$<p>where $\psi(x) = \sum_{p^k \leq x} \ln p$ is the Chebyshev function and the sum is over non-trivial zeros $\rho$.</p>
<p>Comparing these two formulas term by term: the zeros $\rho$ of $\zeta$ play the role of the quantum energy levels $E_n$; the primes $p$ — and their prime powers $p^k$ — play the role of the classical periodic orbits $\gamma$. The &ldquo;action&rdquo; of the orbit corresponding to $p^k$ is $k \ln p$. The primes are the primitive periodic orbits; $p^k$ is the $k$-th traversal of that orbit.</p>
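<p>The smooth part of this correspondence can be checked directly: Chebyshev's $\psi(x)$, summed over prime powers, already tracks $x$, with the zeros supplying the oscillating remainder. A quick sketch:</p>
<pre><code class="language-python"># Chebyshev's psi(x) computed directly from prime powers, compared with x.
from math import log

def psi(x):
    sieve = bytearray([1]) * (x + 1)
    sieve[:2] = b"\x00\x00"
    total = 0.0
    for p in range(2, x + 1):
        if sieve[p]:
            sieve[2 * p :: p] = b"\x00" * len(sieve[2 * p :: p])
            pk = p
            while pk &lt;= x:        # sum log p over all prime powers p^k &lt;= x
                total += log(p)
                pk *= p
    return total

for x in (100, 1_000, 10_000, 100_000):
    v = psi(x)
    print(f"x = {x:6d}   psi(x) = {v:9.1f}   psi(x)/x = {v / x:.3f}")
# psi(x)/x tends to 1; the wiggles around 1 are exactly what the zero sum encodes
</code></pre>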
<p>This is not a metaphor or a loose analogy. The Selberg trace formula — developed for the Laplacian on hyperbolic surfaces — makes this correspondence rigorous in a related setting: the periodic geodesics on a hyperbolic surface play the role of primes, and the eigenvalues of the Laplacian play the role of Riemann zeros (<a href="#ref-RudnickSarnak1996">Rudnick &amp; Sarnak, 1996</a>). The Riemann zeta function is the limit of a family of such systems, in some sense that is still being made precise.</p>
<p>I find it remarkable that the logarithms of primes — the most elementary sequence in arithmetic — appear as lengths of orbits in what would be a quantum chaotic system. Each prime contributes an oscillation to $\psi(x)$ with &ldquo;frequency&rdquo; proportional to its logarithm. You are, in a sense, hearing the primes as quantum interference.</p>
<p>This connects to a theme that comes up elsewhere on this blog. The <a href="/posts/falling-cat-geometric-phase/">falling cat problem</a> involves Berry phase and geometric holonomy — again a situation where deep structure emerges from symmetry and topology. The <a href="/posts/schrodinger-cat-qubits/">Schrödinger cat in quantum computing</a> involves the spectacular fragility of quantum coherence. The Riemann zeros are, if the conjecture is right, a quantum system that has never decohered — a perfectly coherent spectrum hiding inside the most ancient problem in mathematics.</p>
<h2 id="a-brief-detour-maynard-and-primes-without-digits">A Brief Detour: Maynard and Primes Without Digits</h2>
<p>While we are talking about primes, I cannot resist a detour through two results of James Maynard, who received the Fields Medal in 2022.</p>
<p>The first concerns bounded gaps. Euclid proved that there are infinitely many primes. The Twin Prime Conjecture says there are infinitely many pairs of primes $(p, p+2)$. This remains open. But in 2013, Yitang Zhang proved something extraordinary: there are infinitely many pairs of primes differing by at most 70,000,000 (<a href="#ref-Zhang2014">Zhang, 2014</a>). The bound is large, but the qualitative statement — that gaps between primes are bounded infinitely often — was completely new. Shortly thereafter, Maynard independently proved a much stronger result using the Maynard-Tao sieve: infinitely many prime pairs with gap at most 600 (<a href="#ref-Maynard2015">Maynard, 2015</a>). A crowdsourced effort (Polymath8b) brought the bound down to 246. The Twin Prime Conjecture remains open, but 246 is a long way from 70,000,000.</p>
<p>The second result is stranger. Maynard proved in 2016 that for any decimal digit $d \in \{0, 1, \ldots, 9\}$, there are infinitely many primes whose decimal representation contains no instance of $d$. There are infinitely many primes with no $7$ in their decimal expansion. There are infinitely many primes with no $3$. The proof uses techniques from analytic number theory, specifically exponential sum estimates and sieve methods, and it extends to sufficiently large bases; very small bases lead straight to open problems (a base-2 prime avoiding the digit 0 would be a Mersenne prime, and nobody knows whether there are infinitely many of those).</p>
<p>This is one of those results that sounds impossible on first hearing. Surely removing an entire digit should make most large numbers unavailable, so the primes run out? Not so. The density of such &ldquo;digitless&rdquo; numbers thins out, but not fast enough to eliminate infinitely many primes.</p>
<h2 id="the-52nd-mersenne-prime-and-what-we-do-not-know">The 52nd Mersenne Prime and What We Do Not Know</h2>
<p>Return to $M_{136{,}279{,}841} = 2^{136{,}279{,}841} - 1$. Mersenne primes have the form $2^p - 1$ where $p$ is a prime (though not all such numbers are prime — $2^{11} - 1 = 2047 = 23 \times 89$). They are tested via the Lucas-Lehmer primality test: define the sequence</p>
$$s_0 = 4, \qquad s_{n+1} = s_n^2 - 2.$$<p>Then $M_p = 2^p - 1$ is prime if and only if $s_{p-2} \equiv 0 \pmod{M_p}$.</p>
<p>The test requires $p - 2$ squarings modulo $M_p$. Each squaring involves numbers with roughly $p$ digits, and modular reduction modulo $M_p = 2^p - 1$ is cheap because it reduces to bit-shifts. This is why GPU parallelism helps enormously: each squaring can be broken into many parallel multiplications of sub-blocks of digits. Durant&rsquo;s cloud network was, in effect, a massively distributed modular arithmetic engine.</p>
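<p>The test itself fits in a few lines. A minimal sketch, fine for small exponents; the record-scale searches use FFT-based big-integer squaring on GPUs, which this does not attempt:</p>
<pre><code class="language-python"># Minimal Lucas-Lehmer test for Mersenne numbers M_p = 2^p - 1 (p an odd prime).
def lucas_lehmer(p):
    m = 2 ** p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

# Recover the first few Mersenne prime exponents:
print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 61, 89, 107, 127)
       if lucas_lehmer(p)])
# -> [3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127]
</code></pre>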
<p>We do not know if there are infinitely many Mersenne primes. The heuristic Lenstra-Pomerance-Wagstaff conjecture says yes: the expected number of Mersenne primes $2^p - 1$ with $p \leq x$ is approximately</p>
$$e^\gamma \ln x / \ln 2 \approx 1.78 \cdot \log_2 x,$$<p>where $\gamma \approx 0.5772$ is the Euler-Mascheroni constant. This predicts roughly logarithmic growth in the count — consistent with the 52 known examples — but is nowhere near proved.</p>
<p>The known Mersenne primes do not form a sequence with obviously regular gaps. The exponents $p$ are: 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, &hellip; and then larger, less predictable values. Whether their distribution has GUE-like statistics is not a standard research question (the sample is too small), but the question of whether the primes $p$ for which $2^p - 1$ is prime have any special structure is an active one. For now, the answer is: we do not know.</p>
<h2 id="why-this-matters-and-why-it-does-not-prove-anything">Why This Matters, and Why It Does Not Prove Anything</h2>
<p>Let me be precise about what has and has not been established.</p>
<p>What has been established:</p>
<ul>
<li>Montgomery proved (conditionally, assuming a form of the generalised Riemann Hypothesis) that the pair correlation of Riemann zeros, for a certain range of $r$, is given by $1 - (\sin \pi r / \pi r)^2$.</li>
<li>Odlyzko verified numerically — to extraordinary precision, over billions of zeros — that the full empirical pair correlation matches the GUE prediction.</li>
<li>The Gutzwiller/Selberg analogy between periodic orbits and primes is mathematically precise in related settings (hyperbolic surfaces, function fields over finite fields).</li>
<li>Rudnick and Sarnak proved that the $n$-point correlation functions of Riemann zeros match GUE for all $n$, subject to a plausible conjecture about $\zeta$ (<a href="#ref-RudnickSarnak1996">Rudnick &amp; Sarnak, 1996</a>).</li>
</ul>
<p>What has not been established:</p>
<ul>
<li>There is no known Hamiltonian $\hat{H}$ whose spectrum is the set of Riemann zeros.</li>
<li>The Riemann Hypothesis remains open.</li>
<li>There is no proof that the Montgomery-Odlyzko connection is anything more than an extraordinary numerical coincidence.</li>
</ul>
<p>The broader context is the Langlands program — a still-hypothetical grand unification of number theory, algebraic geometry, and representation theory, sometimes described as a &ldquo;grand unified theory of mathematics.&rdquo; The Langlands correspondence predicts deep connections between $L$-functions (generalisations of $\zeta$) and representations of algebraic groups. The spectral interpretation of Riemann zeros — if it could be made precise — would fit naturally into this framework. Some researchers believe that a proof of the Riemann Hypothesis will come from the Langlands side, not from analytic number theory or quantum mechanics. Others think the quantum chaos connection is the right road. Nobody knows.</p>
<p>What would it mean if the connection is real? It would mean that the prime numbers — discovered by Euclid, studied for two and a half millennia, used today in every TLS handshake and RSA key — are the eigenvalues of a physical Hamiltonian. The abstract number-theoretic structure and the physical quantum mechanical structure would be not merely analogous but identical. That is a claim of the same depth as the unexpected appearance of the same partial differential equations in heat flow, diffusion, and Brownian motion: a discovery that what seemed to be different phenomena are manifestations of the same underlying law.</p>
<p>Or it could be a very surprising coincidence. Mathematics has a long history of producing such coincidences — the same numbers appearing in unrelated contexts for reasons that, when understood, turned out not to be coincidences at all. I suspect this is not a coincidence. But suspicion is not proof.</p>
<h2 id="a-closing-reflection">A Closing Reflection</h2>
<p>I started this post with the 52nd Mersenne prime because it is the news item that prompted me to write. GPU clusters finding 41-million-digit primes are genuinely impressive technology. But I keep returning to the image of Montgomery and Dyson at tea in 1972, and the formula $1 - (\sin \pi r / \pi r)^2$ connecting two conversations that had nothing to do with each other.</p>
<p>I have spent some time with random matrix theory, and separately with the zeta function, and the thing that still strikes me is how <em>clean</em> the connection is. This is not a numerical coincidence of the form &ldquo;these two quantities agree to 3 decimal places.&rdquo; Odlyzko&rsquo;s plots show agreement across many orders of magnitude, for zeros computed billions of entries into the sequence. The GUE curve and the empirical histogram are, visually, the same curve.</p>
<p>As someone trained as a physicist, I find this both encouraging and slightly unsettling. Encouraging because it suggests that the primes are not random — they have a structure, one that matches the eigenvalue repulsion of quantum chaotic systems, and that structure might be the key to proving the Riemann Hypothesis. Unsettling because it means that the quantum mechanical formalism — which I always thought was a description of a physical world — seems to be reaching into pure arithmetic, where there is no wave function, no Hilbert space, no measurement. The primes do not know they are supposed to be energy levels. And yet, statistically, they are.</p>
<p>If you find a flaw in this picture, or know of a result I have missed, I am genuinely interested. Peer review is welcome — open an issue on <a href="https://github.com/sebastianspicker/sebastianspicker.github.io/issues">GitHub</a>.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p><span id="ref-Riemann1859"></span>Riemann, B. (1859). Über die Anzahl der Primzahlen unter einer gegebenen Grösse. <em>Monatsberichte der Berliner Akademie</em>.</p>
</li>
<li>
<p><span id="ref-Montgomery1973"></span>Montgomery, H. L. (1973). The pair correlation of zeros of the zeta function. <em>Analytic Number Theory</em>, Proc. Symp. Pure Math., 24, 181–193.</p>
</li>
<li>
<p><span id="ref-Odlyzko1987"></span>Odlyzko, A. M. (1987). On the distribution of spacings between zeros of the zeta function. <em>Mathematics of Computation</em>, 48, 273–308. <a href="https://doi.org/10.2307/2007890">DOI: 10.2307/2007890</a></p>
</li>
<li>
<p><span id="ref-BerryKeating1999"></span>Berry, M. V., &amp; Keating, J. P. (1999). The Riemann zeros and eigenvalue asymptotics. <em>SIAM Review</em>, 41(2), 236–266. <a href="https://doi.org/10.1137/S0036144598347497">DOI: 10.1137/S0036144598347497</a></p>
</li>
<li>
<p><span id="ref-Zhang2014"></span>Zhang, Y. (2014). Bounded gaps between primes. <em>Annals of Mathematics</em>, 179(3), 1121–1174. <a href="https://doi.org/10.4007/annals.2014.179.3.7">DOI: 10.4007/annals.2014.179.3.7</a></p>
</li>
<li>
<p><span id="ref-Maynard2015"></span>Maynard, J. (2015). Small gaps between primes. <em>Annals of Mathematics</em>, 181(1), 383–413. <a href="https://doi.org/10.4007/annals.2015.181.1.7">DOI: 10.4007/annals.2015.181.1.7</a></p>
</li>
<li>
<p><span id="ref-RudnickSarnak1996"></span>Rudnick, Z., &amp; Sarnak, P. (1996). Zeros of principal L-functions and random matrix theory. <em>Duke Mathematical Journal</em>, 81(2), 269–322. <a href="https://doi.org/10.1215/S0012-7094-96-08115-6">DOI: 10.1215/S0012-7094-96-08115-6</a></p>
</li>
<li>
<p><span id="ref-GIMPS2024"></span>GIMPS (2024). 2^136279841-1 is Prime! Great Internet Mersenne Prime Search. Retrieved from <a href="https://www.mersenne.org/primes/?press=M136279841">https://www.mersenne.org/primes/?press=M136279841</a></p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-02-17</strong>: Corrected the date of the Montgomery-Dyson meeting from 1973 to 1972 (the paper was published in the 1973 proceedings volume, but the meeting at the IAS took place in April 1972).</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>The Hamiltonian of Intelligence: From Spin Glasses to Neural Networks</title>
      <link>https://sebastianspicker.github.io/posts/spin-glass-hopfield-ai-physics-lineage/</link>
      <pubDate>Mon, 21 Oct 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/spin-glass-hopfield-ai-physics-lineage/</guid>
      <description>On October 8, 2024, Hopfield and Hinton were awarded the Nobel Prize in Physics. The physics community reacted with irritation: is machine learning really physics? The irritation is wrong. The energy function of a Hopfield network is literally the Ising Hamiltonian. The lineage runs from Giorgio Parisi&amp;rsquo;s disordered iron alloys in 1979 to the model that predicted the structures of 200 million proteins.</description>
      <content:encoded><![CDATA[<p>On October 8, 2024, the Royal Swedish Academy of Sciences announced that the Nobel Prize in Physics would go to John Hopfield and Geoffrey Hinton &ldquo;for foundational discoveries and inventions that enable machine learning with artificial neural networks.&rdquo; Within hours, the physics corner of the internet had an episode. Thermodynamics Twitter — yes, that is a thing — asked whether gradient descent is really physics in the sense that the Higgs mechanism is physics. The condensed matter community, who have been doing disordered systems since before most ML practitioners were born, oscillated between pride (&ldquo;finally, they noticed us&rdquo;) and bafflement (&ldquo;why is Hinton here and not Parisi?&rdquo;). There were takes. There were dunks. Someone made a graph of Nobel prizes versus average journal impact factor and it was not flattering to this year&rsquo;s winner.</p>
<p>I understand the irritation. I do not share it.</p>
<p>The argument I want to make is stronger than &ldquo;machine learning uses some physics concepts by analogy.&rdquo; The energy function that Hopfield wrote down in 1982 is not <em>inspired by</em> the Ising Hamiltonian. It <em>is</em> the Ising Hamiltonian. The machine that Hinton and Sejnowski built in 1985 is not named after Boltzmann as a cute metaphor. It is a physical system whose equilibrium distribution is the Boltzmann distribution, and whose learning algorithm is derived from statistical mechanics. The lineage from disordered magnets to protein structure prediction is not a convenient narrative; it is a sequence of mathematical identities.</p>
<p>Let me trace it properly.</p>
<h2 id="the-2021-nobel-parisi-and-the-frozen-magnet">The 2021 Nobel: Parisi and the frozen magnet</h2>
<p>Before we get to 2024, we need 2021. Giorgio Parisi received half the Nobel Prize in Physics that year for work done between 1979 and 1983 on spin glasses. The other half went to Syukuro Manabe and Klaus Hasselmann for climate modelling — an interesting pairing that provoked its own set of takes, though rather fewer.</p>
<p>A spin glass is a disordered magnetic system. The canonical physical realisation is a dilute alloy: a small concentration of manganese atoms dissolved in copper. Each manganese atom carries a magnetic moment — a spin — that can point in one of two directions, which we label $\sigma_i \in \{-1, +1\}$. The spins interact with each other via exchange interactions mediated by the conduction electrons. The crucial feature is that these interactions are random: some spin pairs prefer to align (ferromagnetic coupling, $J_{ij} > 0$) and others prefer to anti-align (antiferromagnetic coupling, $J_{ij} < 0$), and there is no spatial pattern to which is which.</p>
<p>The Hamiltonian of the system is</p>
$$H = -\sum_{i < j} J_{ij} \sigma_i \sigma_j$$<p>where the $J_{ij}$ are random variables drawn from some distribution. In the Sherrington-Kirkpatrick (SK) model (<a href="#ref-Sherrington1975">Sherrington &amp; Kirkpatrick, 1975</a>), all $N$ spins interact with all other spins — a mean-field model — and the couplings are drawn from a Gaussian distribution with mean zero and variance $J^2/N$:</p>
$$J_{ij} \sim \mathcal{N}\!\left(0,\, \frac{J^2}{N}\right)$$<p>The factor of $1/N$ in the variance is essential for extensivity: without it, the free energy and the ground-state energy would grow as $N^{3/2}$ rather than linearly in $N$, which is unphysical.</p>
<p>Now here is the key phenomenon. At high temperature, the spins fluctuate freely and the system is paramagnetic. Cool it below the glass transition temperature $T_g$, and the system &ldquo;freezes&rdquo; — but not into a ferromagnet with all spins aligned, and not into a simple antiferromagnet. It freezes into one of an astronomically large number of disordered, metastable states. The system is not in its true ground state; it is trapped. It cannot find its way down because the energy landscape is rugged: every path toward lower energy is blocked by a barrier.</p>
<p>This rugged landscape is the central object. It has exponentially many local minima, separated by barriers that grow with system size. Different initial conditions lead to different frozen states. The system has memory of its history — hence &ldquo;glass&rdquo; rather than &ldquo;crystal.&rdquo;</p>
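<p>To make the extensivity and the trapping concrete, here is a minimal numerical sketch of my own (not from any of the original papers): it draws SK couplings with the $J^2/N$ scaling and runs a zero-temperature single-spin-flip quench from a random start. The energy per spin it finds is roughly constant as $N$ grows — the landscape is extensive — but it sits above the true ground-state value of about $-0.76\,J$, because the quench gets stuck in one of the many local minima; different random seeds land in different frozen states.</p>
<pre><code class="language-python">import numpy as np

def sk_couplings(n, j=1.0, rng=None):
    """Symmetric Gaussian couplings with variance J^2/N and zero diagonal."""
    rng = rng or np.random.default_rng(0)
    upper = np.triu(rng.normal(0.0, j / np.sqrt(n), size=(n, n)), k=1)
    return upper + upper.T

def greedy_quench(couplings, seed=1):
    """Zero-temperature single-spin-flip descent to a local energy minimum."""
    rng = np.random.default_rng(seed)
    n = couplings.shape[0]
    spins = rng.choice([-1, 1], size=n)
    improved = True
    while improved:
        improved = False
        for i in rng.permutation(n):
            if spins[i] * (couplings[i] @ spins) < 0:   # flipping lowers the energy
                spins[i] = -spins[i]
                improved = True
    return spins

for n in (100, 200, 400, 800):
    J = sk_couplings(n)
    s = greedy_quench(J)
    energy = -0.5 * s @ J @ s      # H = -sum_{i<j} J_ij s_i s_j
    print(f"N={n:4d}   E/N = {energy / n:+.3f}")   # roughly constant, around -0.7 J
</code></pre>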
<p>Computing thermodynamic quantities in this system requires averaging over the disorder (the random $J_{ij}$), which means computing the quenched average of the free energy:</p>
$$\overline{F} = -T\, \overline{\ln Z}$$<p>The overline denotes an average over the distribution of couplings. The problem is that $\ln Z$ is hard to average because $Z$ is a sum of exponentially many terms. Parisi&rsquo;s solution — the replica trick — is a mathematical device worth describing, because it is beautifully strange.</p>
<p>The trick exploits the identity $\ln Z = \lim_{n \to 0} (Z^n - 1)/n$. We compute $\overline{Z^n}$ for integer $n$, which is feasible because $Z^n$ is a product of $n$ copies (replicas) of the partition function, so the disorder average becomes a Gaussian integral that can be carried out explicitly — at the price of coupling the replicas to one another. We then analytically continue in $n$ to $n \to 0$. The result is an effective action in terms of order parameters $q^{ab}$, which describe the overlap between spin configurations in replica $a$ and replica $b$.</p>
<p>The naive assumption is replica symmetry: all $q^{ab}$ are equal. This assumption turns out to be wrong. Parisi showed that the correct solution breaks replica symmetry in a hierarchical way — the overlap matrix $q^{ab}$ has a nested structure, described by a function $q(x)$ for $x \in [0,1]$. This is replica symmetry breaking (RSB).</p>
<p>RSB has a beautiful physical interpretation. The phase space of the spin glass is organised into an ultrametric tree: exponentially many states, arranged in nested clusters. States in the same cluster are similar (high overlap); states in different clusters are very different (low overlap). The hierarchy has infinitely many levels. Parisi proposed this structure as the exact solution of the SK model (<a href="#ref-Parisi1979">Parisi, 1979</a>) and spent the following years working out its physical meaning; a fully rigorous proof of the Parisi free energy came only decades later, through the work of Guerra and Talagrand.</p>
<p>This is not an abstraction. RSB predicts specific, measurable properties of real spin glass alloys, and experiments have confirmed them. It is also, I want to emphasise, not a result that anyone expected. The mathematics forced it.</p>
<p>Three years after Parisi solved the SK model, a physicist at Bell Labs wrote a paper about memory.</p>
<h2 id="hopfield-1982-memory-as-energy-minimisation">Hopfield (1982): memory as energy minimisation</h2>
<p>John Hopfield was a condensed matter physicist who had drifted toward biophysics — electron transfer in proteins, neural computation. In 1982 he published a paper in PNAS with the title &ldquo;Neural networks and physical systems with emergent collective computational abilities&rdquo; (<a href="#ref-Hopfield1982">Hopfield, 1982</a>). Most biologists read it as a neuroscience paper. It is a statistical mechanics paper.</p>
<p>Hopfield defined a network of $N$ binary &ldquo;neurons&rdquo; $s_i \in \{-1, +1\}$ with symmetric weights $W_{ij} = W_{ji}$, and an energy function:</p>
$$E = -\frac{1}{2} \sum_{i \neq j} W_{ij}\, s_i s_j$$<p>Readers who have seen the SK Hamiltonian above will notice something. This is it. The $J_{ij}$ of the spin glass are the $W_{ij}$ of the neural network. The Ising spins $\sigma_i$ are the neuron states $s_i$. The Hopfield network energy function is the Ising model Hamiltonian with symmetric, fixed (non-random) couplings. This is not a metaphor. This is the same equation.</p>
<p>The dynamics: at each step, choose a neuron $i$ at random and update it according to</p>
$$s_i \leftarrow \text{sgn}\!\left(\sum_{j} W_{ij} s_j\right)$$<p>This update always decreases or leaves unchanged the energy $E$ (because the weights are symmetric). The network is a gradient descent machine on $E$. It will always converge to a local minimum — a fixed point.</p>
<p>The innovation is in how Hopfield chose the weights. To store a set of $p$ binary patterns $\xi^\mu \in \{-1,+1\}^N$ (for $\mu = 1, \ldots, p$), use Hebb&rsquo;s rule:</p>
$$W_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} \xi^\mu_i\, \xi^\mu_j$$<p>This is the outer product rule. Each stored pattern contributes a rank-1 matrix to $W$. You can verify that if $s = \xi^\mu$, then the local field at neuron $i$ is</p>
$$h_i = \sum_j W_{ij} s_j = \frac{1}{N}\sum_j \sum_{\nu} \xi^\nu_i \xi^\nu_j \xi^\mu_j = \xi^\mu_i + \frac{1}{N}\sum_{\nu \neq \mu} \xi^\nu_i \underbrace{\left(\sum_j \xi^\nu_j \xi^\mu_j\right)}_{\text{cross-talk}}$$<p>The first term reinforces pattern $\mu$. The second term is noise from the other stored patterns. When the patterns are random and uncorrelated, each cross-talk term has zero mean and typical size of order $\sqrt{p/N}$, so for $p \ll N$ the signal term dominates and the stored patterns are stable fixed points of the dynamics. A noisy or incomplete input — a partial pattern — will evolve under the dynamics toward the nearest stored pattern. This is associative memory: content-addressable retrieval.</p>
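<p>A minimal sketch of the whole mechanism — Hebbian storage, asynchronous sign updates, retrieval from a corrupted cue — fits in a few lines. The sizes here ($N = 400$, $p = 20$, 20% of bits flipped) are illustrative choices of mine, picked to sit comfortably below the capacity limit discussed next:</p>
<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_patterns = 400, 20          # p/N = 0.05, well below the ~0.14 N limit

# Random binary patterns and Hebbian (outer-product) weights
patterns = rng.choice([-1, 1], size=(n_patterns, n_neurons))
weights = (patterns.T @ patterns) / n_neurons
np.fill_diagonal(weights, 0.0)

def recall(state, sweeps=10):
    """Asynchronous sign updates; each step lowers (or keeps) the energy."""
    state = state.copy()
    for _ in range(sweeps):
        for i in rng.permutation(n_neurons):
            state[i] = 1 if weights[i] @ state >= 0 else -1
    return state

# Corrupt 20% of a stored pattern and let the dynamics clean it up
probe = patterns[0].copy()
flipped = rng.choice(n_neurons, size=n_neurons // 5, replace=False)
probe[flipped] *= -1

overlap = recall(probe) @ patterns[0] / n_neurons
print(f"overlap with the stored pattern: {overlap:+.2f}")   # close to +1.0
</code></pre>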
<p>The capacity limit follows from the same analysis. As $p$ grows, the cross-talk grows. When $p$ exceeds approximately $0.14N$, the cross-talk overwhelms the signal, and the network begins to form spurious minima — states that are not any of the stored patterns but are mixtures or corruptions of them. The network has entered a spin-glass phase.</p>
<p>This is not a rough analogy. Amit, Gutfreund, and Sompolinsky showed in 1985 that the Hopfield model can be solved with <em>exactly</em> the replica machinery developed for the SK model — it is, in effect, an SK-type spin glass with $p$ planted minima (<a href="#ref-Amit1985">Amit, Gutfreund, &amp; Sompolinsky, 1985</a>). The phase diagram of the Hopfield model — paramagnetic phase, memory phase, spin-glass phase — mirrors that of the SK model. The capacity limit $p \approx 0.14N$ is the phase boundary between the memory phase and the spin-glass phase: the replica-symmetric calculation gives $\alpha_c \approx 0.138$, and replica symmetry breaking shifts the boundary only slightly.</p>
<p>The 2021 Nobel and the 2024 Nobel are, mathematically, about the same model.</p>
<h2 id="boltzmann-machines-hinton--sejnowski-1985">Boltzmann machines (Hinton &amp; Sejnowski, 1985)</h2>
<p>The Hopfield model is deterministic and shallow — one layer of visible neurons, no hidden structure. Geoffrey Hinton and Terry Sejnowski, in a collaboration that began at the Cognitive Science summer school in Pittsfield in 1983 and culminated in a 1985 paper (<a href="#ref-Ackley1985">Ackley, Hinton, &amp; Sejnowski, 1985</a>), added two things: hidden units and stochastic dynamics.</p>
<p>Hidden units $h_j$ are neurons not connected to any input or output. They do not correspond to observable quantities; they model latent structure in the data. The energy of the system is:</p>
$$E(\mathbf{v}, \mathbf{h}) = -\sum_{i,j} W_{ij}\, v_i h_j - \sum_i a_i v_i - \sum_j b_j h_j$$<p>where $v_i$ are the visible (data) units, $h_j$ are the hidden units, and $a_i$ and $b_j$ are biases. Note that this is still an Ising-type energy; the $W_{ij}$ are now inter-layer weights. (The general Boltzmann machine also allows couplings within each layer; the bipartite form written here is the restricted special case that later became the practical workhorse.)</p>
<p>The stochastic dynamics replace deterministic gradient descent with a Markov chain. Each unit is updated probabilistically:</p>
$$P(s_k = 1 \mid \text{rest}) = \sigma\!\left(\beta\left[\sum_j W_{kj} s_j + \text{bias}_k\right]\right)$$<p>where $\sigma(x) = 1/(1 + e^{-x})$ is the logistic sigmoid and $\beta = 1/T$ is the inverse temperature. Under these update probabilities the chain equilibrates to a distribution in which the probability of any complete configuration is</p>
$$P(\mathbf{v}, \mathbf{h}) = \frac{1}{Z}\, e^{-\beta E(\mathbf{v}, \mathbf{h})}$$<p>This is the Boltzmann distribution. The machine is named after Ludwig Boltzmann because the equilibrium distribution of its states is the Boltzmann distribution. Not analogously. Literally.</p>
<p>Learning amounts to adjusting the weights to make the model distribution $P(\mathbf{v}, \mathbf{h})$ match the data distribution $P_{\text{data}}(\mathbf{v})$. The objective is to minimise the Kullback-Leibler divergence:</p>
$$\mathcal{L} = D_{\mathrm{KL}}(P_{\text{data}} \| P_{\text{model}}) = \sum_{\mathbf{v}} P_{\text{data}}(\mathbf{v}) \ln \frac{P_{\text{data}}(\mathbf{v})}{P_{\text{model}}(\mathbf{v})}$$<p>The gradient with respect to the weight $W_{ij}$ is</p>
$$\frac{\partial \mathcal{L}}{\partial W_{ij}} = -\langle v_i h_j \rangle_{\text{data}} + \langle v_i h_j \rangle_{\text{model}}$$<p>The first term is the empirical correlation between visible unit $i$ and hidden unit $j$ when the visible units are clamped to data. The second term is the correlation in the model&rsquo;s free-running equilibrium. The learning rule says: increase $W_{ij}$ if the data sees these two units co-active more than the model does, and decrease it otherwise. This is Hebbian learning with a contrastive correction — the physics of equilibration drives the learning.</p>
<p>The computational difficulty is the second term. Computing $\langle v_i h_j \rangle_{\text{model}}$ requires the Markov chain to reach equilibrium, which takes exponentially long in general. Hinton&rsquo;s later invention of contrastive divergence — run the chain for only a few steps rather than to equilibrium — made training feasible, at the cost of a biased gradient estimate. This engineering compromise is part of why the physics purists are uncomfortable: the original derivation is rigorous statistical mechanics, but the algorithm that actually works in practice is an approximation whose convergence properties are poorly understood.</p>
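<p>For readers who want to see the learning rule in motion, here is a sketch of one-step contrastive divergence for a tiny restricted Boltzmann machine. It follows the usual machine-learning convention of $\{0, 1\}$ units rather than the $\{-1, +1\}$ spins used above, and the toy data set and hyperparameters are my own illustrative choices, not anything from the 1985 paper:</p>
<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden, lr = 6, 3, 0.1
W = 0.01 * rng.normal(size=(n_visible, n_hidden))
a = np.zeros(n_visible)                  # visible biases
b = np.zeros(n_hidden)                   # hidden biases

# Toy data: the first three bits are always the complement of the last three
bit = rng.integers(0, 2, size=(500, 1)).astype(float)
data = np.hstack([np.repeat(bit, 3, axis=1), 1.0 - np.repeat(bit, 3, axis=1)])

for epoch in range(50):
    for v0 in data:
        # Positive phase: hidden probabilities with visibles clamped to data
        p_h0 = sigmoid(v0 @ W + b)
        h0 = (rng.random(n_hidden) < p_h0).astype(float)
        # One Gibbs step back and forth: the CD-1 stand-in for the model phase
        v1 = (rng.random(n_visible) < sigmoid(h0 @ W.T + a)).astype(float)
        p_h1 = sigmoid(v1 @ W + b)
        # Learning rule: data correlation minus (approximate) model correlation
        W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
        a += lr * (v0 - v1)
        b += lr * (p_h0 - p_h1)
</code></pre>
<p>The two outer products are exactly the two terms of the gradient above; everything else is bookkeeping.</p>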
<p>I find this charming rather than damning. Physics itself is full of approximations whose convergence properties are poorly understood but which happen to give right answers. Perturbation theory beyond leading order, the replica trick itself — these are not rigorous mathematics. They are informed guesses that happen to be correct. The history of theoretical physics is mostly the history of getting away with things.</p>
<h2 id="from-boltzmann-machines-to-transformers">From Boltzmann machines to transformers</h2>
<p>The Boltzmann machine was computationally difficult but conceptually foundational. The restricted Boltzmann machine (RBM) — with no within-layer connections, so that hidden units are conditionally independent given the visible units and vice versa — made training via contrastive divergence practical.</p>
<p>Hinton, Osindero, and Teh&rsquo;s 2006 paper on deep belief networks showed that stacking RBMs and pre-training them greedily could initialise deep networks well enough to fine-tune with backpropagation. This was the breakthrough that restarted deep learning after the winter of the 1990s. It is fair to say that without the Boltzmann machine as conceptual foundation and the RBM as practical building block, the deep learning revolution that gave us <a href="/posts/strawberry-tokenisation/">large language models that fail to count letters in words</a> would not have happened in the form it did.</p>
<p>The connection between Hopfield networks and modern attention mechanisms is more recent and more surprising. Ramsauer et al. (2020) showed that modern Hopfield networks — a generalisation of the original with continuous states and a different energy function — have exponential storage capacity (<a href="#ref-Ramsauer2020">Ramsauer et al., 2020</a>). More strikingly, the update rule of the modern Hopfield network is:</p>
$$\mathbf{s}^{\text{new}} = \mathbf{X}\, \text{softmax}\!\left(\beta \mathbf{X}^\top \mathbf{s}\right)$$<p>where $\mathbf{X}$ is the matrix of stored patterns and $\mathbf{s}$ is the query. This is the attention mechanism of the transformer, up to notation. The transformer&rsquo;s multi-head self-attention is, formally, a generalised Hopfield retrieval step. The architecture that powers GPT and everything descended from it is, at one level of abstraction, an associative memory performing energy minimisation on a Hopfield energy landscape.</p>
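<p>The correspondence is easy to play with numerically. In the sketch below the dimensions and the inverse temperature $\beta$ are illustrative choices of mine: store a handful of continuous patterns as the columns of $\mathbf{X}$, feed in a noisy query, apply the update above a few times, and the state snaps onto the stored pattern the query most resembles — just as an attention head snaps onto the keys that best match its query.</p>
<pre><code class="language-python">import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

rng = np.random.default_rng(0)
d, n_stored, beta = 64, 10, 2.0
X = rng.normal(size=(d, n_stored))            # columns are the stored patterns
query = X[:, 3] + 0.5 * rng.normal(size=d)    # noisy version of pattern 3

state = query
for _ in range(3):                            # a few retrieval steps suffice
    state = X @ softmax(beta * X.T @ state)   # the modern Hopfield update

cosines = X.T @ state / (np.linalg.norm(X, axis=0) * np.linalg.norm(state))
print("retrieved pattern index:", int(np.argmax(cosines)))   # -> 3
</code></pre>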
<p>I do not want to overstate this. The connection is formal and the interpretation is contested. But it is not nothing. The physicists who built the Hopfield network in 1982 were working on the same mathematical object that is now used to process language, images, and protein sequences at industrial scale.</p>
<h2 id="the-protein-folding-connection">The protein folding connection</h2>
<p>The 2024 Nobel Prize in Chemistry went to David Baker for computational protein design and to Demis Hassabis and John Jumper for protein structure prediction — specifically for AlphaFold2 (<a href="#ref-Jumper2021">Jumper et al., 2021</a>). This made October 2024 a remarkable month for Nobel Prizes in fields adjacent to artificial intelligence, and it is not a coincidence.</p>
<p>Protein folding is a spin-glass problem. A protein is a polymer of amino acids, each with different chemical properties and steric constraints. The protein folds into a unique three-dimensional structure — its native conformation — determined by its sequence. The energy landscape of the folding process is precisely the kind of rugged landscape that Parisi described for spin glasses: exponentially many misfolded states, separated by barriers, with the native structure as the global minimum (or close to it).</p>
<p>Levinthal&rsquo;s paradox, formulated in 1969, makes the absurdity quantitative. A modest protein of 100 amino acids might have $3^{100} \approx 10^{47}$ possible conformations (allowing three dihedral angle states per residue). Random search of this space, at the rate of one conformation per picosecond, would take about $5 \times 10^{35}$ seconds — more than $10^{28}$ years, vastly longer than the age of the universe. Yet proteins fold in milliseconds to seconds. They do not search randomly; the energy landscape is funnel-shaped, channelling the dynamics toward the native state. But predicting <em>which</em> state is the native one from sequence alone remained one of the hard problems of structural biology for fifty years.</p>
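<p>The arithmetic is worth doing once by hand, or in three lines of Python:</p>
<pre><code class="language-python">conformations = 3 ** 100                  # about 5.2e47 backbone conformations
seconds = conformations * 1e-12           # one conformation per picosecond
print(f"{seconds:.1e} s  =  {seconds / 3.15e7:.1e} years")   # ~5.2e35 s, ~1.6e28 years
</code></pre>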
<p>AlphaFold2 uses a transformer architecture — descended from the Boltzmann machine lineage — trained on millions of known protein structures. It does not simulate the folding dynamics; it has learned, from data, a mapping from sequence to structure that encodes the statistical mechanics of the folding funnel. The Nobel committee gave it the Chemistry prize because it is transforming biochemistry. But the conceptual machinery is pure statistical physics: representation of a high-dimensional energy landscape, approximation of the minimum, learned from the distribution of solved instances.</p>
<p>The three Nobels of 2021–2024 form the most coherent consecutive triple I can remember: Parisi showed how disordered energy landscapes behave; Hopfield and Hinton showed how to use energy landscapes as memory and learning machines; Hassabis and Jumper showed how to apply the resulting architecture to the most consequential outstanding problem in molecular biology. Each step is a mathematical consequence of the one before it.</p>
<h2 id="the-controversy-did-the-committee-err">The controversy: did the committee err?</h2>
<p>I said I understand the irritation. Here is what is right about it.</p>
<p>Hinton&rsquo;s work after the Boltzmann machine — backpropagation, dropout, convolutional networks, deep learning at ImageNet scale — is primarily engineering and empirical machine learning. The 2012 AlexNet result that restarted the field was not a theoretical physics contribution; it was a demonstration that known methods work very well on very large datasets with very large GPUs. The fact that it works is not explained by statistical mechanics. The scaling laws of neural networks (loss scales as a power law with compute, parameters, and data) are empirical observations that physicists have tried, with mixed success, to explain using renormalisation-group arguments.</p>
<p>If the Nobel Prize in Physics were awarded for &ldquo;the work that most influenced technology in the past decade,&rdquo; the case for Hinton is strong. If it were awarded for &ldquo;the most important contribution to the science of physics,&rdquo; the case is weaker. There is a version of the Nobel announcement that emphasises the Boltzmann machine specifically — the 1985 paper that is literally named after a physicist and uses his distribution — and that version sits cleanly within physics. There is a broader version that encompasses all of Hinton&rsquo;s career, and that version includes a great deal of empirical machine learning that the physics community is reasonably reluctant to claim.</p>
<p>My view, for what it is worth from someone who has been <a href="/posts/ai-warfare-anthropic-atom-bomb/">thinking about AI ethics and consequences</a> for rather longer than feels comfortable: the Nobel correctly identifies that the foundational conceptual contributions — the Ising Hamiltonian as associative memory, the Boltzmann distribution as a learning target, the connection between statistical mechanics and computation — are physics. They came from physicists, they use physics mathematics, they extend physics intuition into a new domain. The subsequent scaling of these ideas using TPUs and transformer architectures is engineering. Valuable engineering, world-changing engineering, but engineering. The Nobel is for the former. If the citation had been more specific — &ldquo;for the Boltzmann machine and its demonstration that physical principles govern neural computation&rdquo; — the physics community would have been less irritated and equally correct.</p>
<p>What the irritation reveals is something slightly uncomfortable about disciplinary identity. Physicists are proud of universality: the idea that the same mathematical structures appear in wildly different physical systems. RSB in spin glasses, replica methods in random matrices, the Parisi–Sourlas correspondence between disordered systems and supersymmetric field theories — the joy of physics is precisely that these deep structural similarities cross domain boundaries. When that universality reaches into machine learning and says &ldquo;your transformer attention layer is a Hopfield retrieval step,&rdquo; physicists should be delighted, not affronted.</p>
<p>The <a href="/posts/ralph-loop/">agentic systems</a> that are being built right now on top of transformer architectures are doing something that looks, from a sufficiently abstract distance, like what the Hopfield network was designed to do: find stored patterns that match a query, and use them to generate a response. The <a href="/posts/car-wash-grounding/">failures of grounding</a> that I have written about elsewhere are, in this view, failures of the energy landscape — the model finds a metastable state that is not the correct minimum, and the dynamics cannot escape. Spin glass physics does not explain these failures in detail, but it gives a language for thinking about them. That is what physics is for.</p>
<h2 id="the-universality-argument">The universality argument</h2>
<p>Let me make the deeper claim explicit. Why should disordered magnets, associative memory networks, and protein folding all live in the same mathematical family?</p>
<p>Because they all have the same structure: many interacting degrees of freedom with competing constraints, a combinatorially large configuration space, an energy landscape with exponentially many metastable states, and dynamics that search for — and frequently fail to find — global minima. This is a universality class. The specific details (magnetic moments versus neuron states versus dihedral angles) are irrelevant at the level of the energy landscape topology.</p>
<p>Parisi&rsquo;s contribution was to show that this class has a specific, exactly-solvable structure in mean field theory, characterised by replica symmetry breaking and the ultrametric organisation of states. This was not a solution to one model. It was a description of a universality class. The fact that the Hopfield model is in this class is not a coincidence requiring explanation; it is a mathematical identity requiring verification.</p>
<p>The <a href="/posts/kuramoto-ensemble-sync/">Kuramoto model for coupled oscillators</a> — which I have written about in the context of ensemble synchronisation and neural phase coupling — is another member of this extended family. The synchronisation transition in the Kuramoto model, the glass transition in the SK model, and the memory phase transition in the Hopfield model are all mean-field phase transitions in disordered many-body systems. The mathematics is more similar than the physics syllabi suggest.</p>
<p>When I teach physics and occasionally venture into questions about what the AI tools my students are using actually do, I find myself reaching for this framework. Not because it gives engineering insight into how to train a better model — it does not, particularly — but because it gives honest insight into <em>what kind of thing</em> a neural network is. It is a physical system. It has an energy landscape. Its failures are phase transitions. Its successes are energy minimisation. The vocabulary of statistical mechanics is not a metaphor; it is the correct description.</p>
<p>The Nobel committee noticed. They were right to notice.</p>
<hr>
<p><em>The 2021 and 2024 Nobel Prizes in Physics have now officially bridged the gap between condensed matter physics and machine learning in the public record. For anyone who wants to understand either field more deeply than the press releases suggest, the SK model and the Hopfield network are the right place to start. Both papers are short by modern standards — Parisi&rsquo;s 1979 letter is three pages; Hopfield&rsquo;s 1982 PNAS paper is five — and both repay close reading.</em></p>
<h2 id="references">References</h2>
<ul>
<li>
<p><span id="ref-Sherrington1975"></span>Sherrington, D., &amp; Kirkpatrick, S. (1975). Solvable model of a spin-glass. <em>Physical Review Letters</em>, 35(26), 1792–1796. <a href="https://doi.org/10.1103/PhysRevLett.35.1792">DOI: 10.1103/PhysRevLett.35.1792</a></p>
</li>
<li>
<p><span id="ref-Parisi1979"></span>Parisi, G. (1979). Infinite number of order parameters for spin-glasses. <em>Physical Review Letters</em>, 43(23), 1754–1756. <a href="https://doi.org/10.1103/PhysRevLett.43.1754">DOI: 10.1103/PhysRevLett.43.1754</a></p>
</li>
<li>
<p><span id="ref-Hopfield1982"></span>Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. <em>Proceedings of the National Academy of Sciences</em>, 79(8), 2554–2558. <a href="https://doi.org/10.1073/pnas.79.8.2554">DOI: 10.1073/pnas.79.8.2554</a></p>
</li>
<li>
<p><span id="ref-Ackley1985"></span>Ackley, D. H., Hinton, G. E., &amp; Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. <em>Cognitive Science</em>, 9(1), 147–169. <a href="https://doi.org/10.1207/s15516709cog0901_7">DOI: 10.1207/s15516709cog0901_7</a></p>
</li>
<li>
<p><span id="ref-Amit1985"></span>Amit, D. J., Gutfreund, H., &amp; Sompolinsky, H. (1985). Storing infinite numbers of patterns in a spin-glass model of neural networks. <em>Physical Review Letters</em>, 55(14), 1530–1533. <a href="https://doi.org/10.1103/PhysRevLett.55.1530">DOI: 10.1103/PhysRevLett.55.1530</a></p>
</li>
<li>
<p><span id="ref-Jumper2021"></span>Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., … Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. <em>Nature</em>, 596, 583–589. <a href="https://doi.org/10.1038/s41586-021-03819-2">DOI: 10.1038/s41586-021-03819-2</a></p>
</li>
<li>
<p><span id="ref-Ramsauer2020"></span>Ramsauer, H., Schäfl, B., Lehner, J., Seidl, P., Widrich, M., Adler, T., Gruber, L., Holzleitner, M., Pavlović, M., Sandve, G. K., Greiff, V., Kreil, D., Kopp, M., Klambauer, G., Brandstetter, J., &amp; Hochreiter, S. (2020). Hopfield networks is all you need. <em>arXiv:2008.02217</em>. Retrieved from <a href="https://arxiv.org/abs/2008.02217">https://arxiv.org/abs/2008.02217</a></p>
</li>
<li>
<p><span id="ref-Nobel2024"></span>Nobel Prize Committee. (2024). Scientific background: Machine learning and physical systems. The Royal Swedish Academy of Sciences. Retrieved from <a href="https://www.nobelprize.org/prizes/physics/2024/advanced-information/">https://www.nobelprize.org/prizes/physics/2024/advanced-information/</a></p>
</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Why Cats Purr at 25 Hz: Vocal Fold Pads and the Physics of Self-Sustained Oscillation</title>
      <link>https://sebastianspicker.github.io/posts/purring-physics-vocal-fold-pads/</link>
      <pubDate>Mon, 09 Sep 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/purring-physics-vocal-fold-pads/</guid>
      <description>For decades, the mechanism of purring was disputed. A 2023 paper in Current Biology showed that cat larynges purr without any neural input: airflow alone drives a self-sustained oscillation. The secret is connective tissue pads embedded in the vocal folds that increase effective mass and lower the resonant frequency to 25–30 Hz — the same range used clinically for bone- density stimulation and fracture healing under Wolff&amp;rsquo;s law.</description>
      <content:encoded><![CDATA[<p><em>The first thing either of our cats did when I sat still long enough was purr.
Not after food, not during play — the purr arrived when I sat down and held
still and they settled against me, and it arrived as a physical fact, a vibration
felt through the sternum and the ribs, not merely heard. The frequency was low:
around 25–30 cycles per second, which you can feel as a buzz rather than hear
as a tone. This is, I later confirmed, not far from the frequency at which
clinical devices stimulate bone growth. They are indoor cats now, on our vet&rsquo;s
recommendation — they find this unreasonable, but sitting still and being purred
on has become a regular feature of working from home.</em></p>
<p><em>The physics of how the larynx produces that frequency is, as of 2023, finally
resolved — and the mechanism is more elegant than anyone suspected.</em></p>
<hr>
<h2 id="the-frequency-and-its-peculiarity">The Frequency and Its Peculiarity</h2>
<p>Domestic cats purr at approximately $25$–$30\,\mathrm{Hz}$. This is
remarkably low for an animal of cat size. A human vocal fold — if anything
larger than a cat&rsquo;s — vibrates at $85$–$255\,\mathrm{Hz}$ for normal speech.
A cat&rsquo;s larynx is smaller than a human&rsquo;s, which makes the low
frequency surprising: in a simple spring-mass oscillator model, smaller and
lighter vocal folds should vibrate <em>faster</em>, not slower.</p>
<p>The frequency range $25$–$50\,\mathrm{Hz}$ has clinical significance in a
different field. Therapeutic vibration platforms used in sports medicine and
osteoporosis treatment operate in exactly this range, exploiting Wolff&rsquo;s law
(bone remodelling under mechanical stress) to increase bone density and
accelerate fracture repair. The coincidence is suggestive. It was first
noted quantitatively by von Muggenthaler (2001, <em>Journal of the Acoustical
Society of America</em> 110, 2666), who recorded purrs from 44 felids and
found that all produced dominant frequencies between $25$ and $150\,\mathrm{Hz}$.</p>
<p>Whether cats deliberately exploit this frequency for self-healing is a separate
biological question. The physics question is simpler: how does the larynx
produce it?</p>
<hr>
<h2 id="flow-induced-vocal-fold-oscillation">Flow-Induced Vocal Fold Oscillation</h2>
<p>Vocal fold oscillation in mammals is a flow-induced, self-sustained mechanical
phenomenon. The Bernoulli effect and elastic restoring forces create a
feedback loop that keeps the folds oscillating as long as subglottal air
pressure is maintained.</p>
<p>The mechanism is as follows. The lungs supply a steady subglottal pressure
$p_\mathrm{sub}$. This drives airflow through the glottis (the gap between the
vocal folds). The subglottal pressure pushes the folds apart; while the glottis
is still narrow, the air rushing through it moves fast, and by Bernoulli&rsquo;s principle,</p>
$$p + \tfrac{1}{2}\rho v^2 = \mathrm{const},$$<p>the pressure drops, drawing the folds back together. The folds&rsquo; elastic
restoring force adds to this: they spring back when displaced. The result is
an oscillation — the folds open and close periodically, chopping the airflow
into pressure pulses that we perceive as sound (or vibration, for low
frequencies).</p>
<p>The fundamental frequency is approximately:</p>
$$f_0 \approx \frac{1}{2L}\sqrt{\frac{T}{\rho_s}},$$<p>where $L$ is the vibrating length of the vocal fold, $T$ is the longitudinal
tension, and $\rho_s$ is the surface density (mass per unit area). This is
the same formula as for a vibrating string — and the physics is closely
related.</p>
<p>For a cat-sized larynx with $L \approx 1\,\mathrm{cm}$, realistic tissue
tension, and tissue density $\rho_s \sim 1\,\mathrm{kg/m}^2$, this formula
gives $f_0$ in the hundreds of hertz — far above the observed purring
frequency of $25$–$30\,\mathrm{Hz}$.</p>
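<p>A quick order-of-magnitude check: the length and surface density below are the figures above, while the tension is my own illustrative value, of the size implied by tissue stresses of tens of kilopascals acting over a roughly millimetre-thick fold.</p>
<pre><code class="language-python">import math

L = 0.01        # vibrating length, m
rho_s = 1.0     # surface density, kg/m^2
T = 40.0        # longitudinal tension per unit width, N/m (assumed illustrative value)

f0 = math.sqrt(T / rho_s) / (2.0 * L)
print(f"unloaded aeroelastic estimate: {f0:.0f} Hz")   # ~316 Hz, nowhere near 25-30 Hz
</code></pre>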
<p>Something is missing from the model.</p>
<hr>
<h2 id="the-long-standing-controversy">The Long-Standing Controversy</h2>
<p>Until 2023, the dominant explanation for the low purring frequency was the
<strong>Active Muscular Contraction (AMC) hypothesis</strong>: the laryngeal muscles
contract rhythmically at the purring frequency, mechanically driving the
vocal folds rather than relying on passive aeroelastic oscillation. On this
view, purring is more like drumming than singing — the neural drive at
$25$–$30\,\mathrm{Hz}$ sets the frequency, overriding the natural aeroelastic
frequency.</p>
<p>The AMC hypothesis was difficult to test directly because the larynx is
inaccessible in a live, purring cat without interfering with the purr.
Electromyographic recordings from laryngeal muscles of purring cats showed
rhythmic activity consistent with the AMC hypothesis, but causality was unclear:
were the muscles driving the oscillation, or responding to it?</p>
<p>The alternative hypothesis — that purring is passive, driven purely by
aeroelastic forces — faced the problem noted above: the aeroelastic frequency
of a cat-sized larynx should be far too high to explain $25$–$30\,\mathrm{Hz}$.
Unless something was being added to the vocal folds to lower their effective
resonant frequency.</p>
<hr>
<h2 id="herbst-et-al-2023-the-mass-loading-mechanism">Herbst et al. 2023: The Mass-Loading Mechanism</h2>
<p>In October 2023, Christian Herbst and colleagues at the University of Vienna
published &ldquo;Domestic cat larynges can produce purring frequencies without neural
input&rdquo; (<em>Current Biology</em> 33, 4727–4732). The experiment was decisive.</p>
<p>The team excised larynges from domestic cats (post-mortem, within a short time
window to preserve tissue properties) and mounted them in a flow bench: a
controlled airflow was supplied to the subglottal side, and the larynges were
held at physiologically realistic tension and hydration.</p>
<p><strong>The result</strong>: all eight excised larynges produced self-sustained oscillations
at $25$–$30\,\mathrm{Hz}$ — the normal purring frequency — without any neural
input whatsoever. No muscular contraction was present (no motor neurons, no
calcium signalling, no ATP). The oscillation was purely passive, driven by the
airflow and maintained by the tissue mechanics.</p>
<p>This ruled out the AMC hypothesis. The neural drive is not needed to sustain
the oscillation; it may modulate it, start or stop it, but the fundamental
frequency is set by the tissue mechanics, not the neural firing rate.</p>
<p>The follow-up finding was the key to the physics: histological analysis of the
vocal fold tissue revealed <strong>connective tissue pads</strong> embedded in the vocal
fold mucosa, up to $4\,\mathrm{mm}$ thick. These pads are not present in the
vocal folds of humans or other mammals that do not purr. They increase the
effective mass of the oscillating tissue significantly, without adding
corresponding stiffness.</p>
<hr>
<h2 id="the-mass-loading-physics">The Mass-Loading Physics</h2>
<p>The fundamental frequency of a harmonic oscillator is:</p>
$$f_0 = \frac{1}{2\pi}\sqrt{\frac{k}{m}},$$<p>where $k$ is the effective stiffness and $m$ is the effective mass. Adding mass
(at constant stiffness) lowers the frequency as $f_0 \propto m^{-1/2}$.</p>
<p>For the vocal folds, the spring constant $k$ is set by tissue tension and
elasticity — properties that the tissue pads do not significantly alter. But
the pads add a substantial mass $\Delta m$ to the oscillating system. The
purring frequency becomes:</p>
$$f_\mathrm{purr} = \frac{1}{2\pi}\sqrt{\frac{k}{m_0 + \Delta m}},$$<p>where $m_0$ is the baseline vocal fold mass and $\Delta m$ is the added mass
from the pads.</p>
<p>As a rough estimate: if the unloaded aeroelastic frequency were in the
range $f_\mathrm{normal} \approx 200$–$400\,\mathrm{Hz}$ (the range of
cat meow fundamental frequencies), lowering it to $f_\mathrm{purr} \approx
25\,\mathrm{Hz}$ would require a mass increase by a factor of</p>
$$\frac{m_0 + \Delta m}{m_0} = \left(\frac{f_\mathrm{normal}}{f_\mathrm{purr}}\right)^2
\approx 64\text{–}256.$$<p>This is a large factor, but not implausible for pads up to 4 mm thick
embedded in a mucosal membrane that is itself very thin. The simple
harmonic oscillator model is an idealisation — the actual frequency reduction
also involves changes in vibration mode shape, tissue coupling, and
aerodynamic loading — but the mass-loading effect is the dominant mechanism.
The tissue pads are, in effect, frequency dividers: they convert a
high-frequency aeroelastic oscillator into a low-frequency vibration
generator.</p>
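<p>The scaling is easy to verify numerically. In the sketch below the absolute values of $k$ and $m_0$ are arbitrary — I simply pick them so the unloaded oscillator sits at $300\,\mathrm{Hz}$, a typical meow fundamental — because only the mass ratio matters:</p>
<pre><code class="language-python">import math

def f_osc(k, m):
    """Natural frequency of a harmonic oscillator: f = (1/2pi) sqrt(k/m)."""
    return math.sqrt(k / m) / (2.0 * math.pi)

m0 = 1e-4                                   # baseline oscillating mass, kg (arbitrary)
k = (2.0 * math.pi * 300.0) ** 2 * m0       # stiffness chosen so f_osc(k, m0) = 300 Hz

for mass_factor in (1, 16, 64, 144):
    print(f"m/m0 = {mass_factor:3d}   ->   f0 = {f_osc(k, mass_factor * m0):6.1f} Hz")
# m/m0 = 144 brings 300 Hz down to 25 Hz, matching the (300/25)^2 estimate above
</code></pre>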
<p>This is the same principle used in engineering to lower the natural frequency
of mechanical structures: add mass without changing stiffness. Tuned mass
dampers in skyscrapers work on the same principle. So do the heavy flywheel
weights added to engines to suppress rotational vibration.</p>
<p>The cat&rsquo;s larynx evolved this solution independently, and with a mass ratio
that would impress a structural engineer.</p>
<hr>
<h2 id="the-self-sustained-oscillation-criterion">The Self-Sustained Oscillation Criterion</h2>
<p>Not every mass-loaded oscillator will self-sustain under airflow. The
Bernoulli-elastic feedback loop must overcome the viscous damping of the
tissue. A dimensional scaling estimate for the critical subglottal pressure is:</p>
$$p^* \sim \eta_\mathrm{tissue} \cdot \frac{v}{L} \sim \eta_\mathrm{tissue} \cdot f_0,$$<p>where $\eta_\mathrm{tissue}$ is the tissue viscosity, $v \sim f_0 L$ is the
characteristic mucosal wave velocity, and $L$ is the fold length. (The full
phonation threshold pressure, as derived by Titze (2006), depends on
additional geometric and aerodynamic parameters.) For typical laryngeal tissue properties and the observed purring
frequency, this critical pressure is of order $100$–$200\,\mathrm{Pa}$ —
low enough to be sustained by the respiratory system without extraordinary
effort.</p>
<p>This is consistent with the observation that cats can purr both during
inhalation and exhalation, maintaining a continuous acoustic output throughout
the breathing cycle. The oscillation threshold is low enough that normal
respiration can maintain it.</p>
<hr>
<h2 id="wolffs-law-and-the-25-hz-coincidence">Wolff&rsquo;s Law and the 25 Hz Coincidence</h2>
<p>Julius Wolff (1892) proposed that bone remodels in response to mechanical
loading: osteoblasts (bone-building cells) are stimulated by cyclic compressive
stress, while osteoclasts (bone-resorbing cells) dominate in the absence of
loading. This principle — now called Wolff&rsquo;s law — underpins the use of
therapeutic vibration in orthopaedics.</p>
<p>The optimal frequency for osteoblast stimulation, determined empirically in
clinical studies, is $20$–$50\,\mathrm{Hz}$. Vibration at these frequencies,
applied at amplitudes of $0.2$–$1.0\,g$ (where $g$ is gravitational
acceleration), produces measurable increases in bone mineral density, accelerates
fracture healing, and reduces bone loss in microgravity. The frequency range
is not a narrow resonance; it reflects the natural frequencies of cellular
mechanotransduction pathways involving focal adhesion kinase (FAK) and
integrin signalling.</p>
<p>Cat purring produces vibration in the frequency range $25$–$50\,\mathrm{Hz}$
at the body surface. Whether this is sufficient to produce meaningful bone
stimulation — and whether cats evolved purring partly as a bone-maintenance
mechanism — is not yet resolved by controlled experiments. The hypothesis is
physiologically plausible: cats conserve metabolic energy by resting for up
to 16 hours per day, and during this rest period, bone would normally be
unstressed and subject to resorption. A continuous low-frequency vibration
during rest could counteract this.</p>
<p>This is speculative at the level of evolutionary causation. What is not
speculative is that the purring frequency overlaps precisely with the
therapeutic vibration range, and that this overlap is not obviously accidental.</p>
<hr>
<h2 id="across-felid-species">Across Felid Species</h2>
<p>Von Muggenthaler&rsquo;s 2001 survey of 44 felids found that most domestic
cats purr in the range $25$–$30\,\mathrm{Hz}$, with harmonics at $50$,
$75\,\mathrm{Hz}$, and so on. Cheetahs purr at $20$–$25\,\mathrm{Hz}$;
pumas (mountain lions) at $20$–$30\,\mathrm{Hz}$; servals and ocelots at
$22$–$28\,\mathrm{Hz}$.</p>
<p>The large roaring cats — lions, tigers, leopards, jaguars — do not purr in
the continuous sense that domestic cats do. Their enlarged hyoid apparatus
allows roaring by a different mechanism (a modified laryngeal pad that
allows very low-frequency, high-intensity sound production). Some large cats
produce purr-like sounds during exhalation but not the continuous through-
inhalation-and-exhalation purring of smaller felids.</p>
<p>The vocal fold pad mechanism appears to be specific to the non-roaring felids,
though detailed histological comparisons across species are still sparse.</p>
<hr>
<h2 id="what-i-hear">What I Hear</h2>
<p>When one of our cats purrs while settled against me, what I am feeling is the
mechanical resonance of a mass-loaded aeroelastic oscillator at approximately
$25\,\mathrm{Hz}$, the frequency having been lowered by connective tissue pads
from a natural aeroelastic frequency several hundred hertz higher. The pads
evolved, we think, to produce exactly this frequency — sustained under normal
respiratory airflow pressure with no additional muscular energy. The acoustic
output is a byproduct of a vibration.</p>
<p>Whether the vibration serves a direct physiological function in the cat&rsquo;s own
bones is, as of this writing, still an open question. What seems clear is that
the 2023 paper settled the mechanism question conclusively: the frequency is
set by mass loading, not neural drive. The larynx purrs by itself when you
blow air through it.</p>
<p>I find this reassuring. The physics is in the cat, not in its nervous system.
The cat purrs the way a tuning fork rings — not because it decides to, but
because that is what it does when the conditions are right.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Herbst, C.T., Prigge, T., Garcia, M., Hampala, V., Hofer, R., Weissengruber,
G.E., Svec, J.G., &amp; Fitch, W.T. (2023). Domestic cat larynges can produce
purring frequencies without neural input. <em>Current Biology</em>, 33(22),
4727–4732.e4. <a href="https://doi.org/10.1016/j.cub.2023.09.014">https://doi.org/10.1016/j.cub.2023.09.014</a></p>
</li>
<li>
<p>von Muggenthaler, E. (2001). The felid purr: A healing mechanism?
<em>Journal of the Acoustical Society of America</em>, 110(5), 2666.
<a href="https://doi.org/10.1121/1.4777098">https://doi.org/10.1121/1.4777098</a></p>
</li>
<li>
<p>Titze, I.R. (2006). <em>The Myoelastic Aerodynamic Theory of Phonation.</em>
National Center for Voice and Speech.</p>
</li>
<li>
<p>Wolff, J. (1892). <em>Das Gesetz der Transformation der Knochen.</em> A. Hirschwald.
(English translation: Maquet, P., &amp; Furlong, R., 1986. <em>The Law of Bone
Remodelling.</em> Springer.)</p>
</li>
<li>
<p>Rubin, C.T., &amp; Lanyon, L.E. (1984). Regulation of bone formation by applied
dynamic loads. <em>Journal of Bone and Joint Surgery</em>, 66(3), 397–402.
<a href="https://doi.org/10.2106/00004623-198466030-00012">https://doi.org/10.2106/00004623-198466030-00012</a></p>
</li>
<li>
<p>Christiansen, P. (2008). Evolution of skull and mandible shape in cats
(Carnivora: Felidae). <em>PLOS ONE</em>, 3(7), e2807.
<a href="https://doi.org/10.1371/journal.pone.0002807">https://doi.org/10.1371/journal.pone.0002807</a></p>
</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Why 44,100? The Accidental Physics of the CD Sampling Rate</title>
      <link>https://sebastianspicker.github.io/posts/why-44100-hz-cd-sampling-rate/</link>
      <pubDate>Mon, 05 Aug 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/why-44100-hz-cd-sampling-rate/</guid>
      <description>The CD sampling rate is not a round number chosen by committee. It is the direct output of 1970s NTSC and PAL video engineering — and both standards, designed on different continents, converge on exactly the same number.</description>
      <content:encoded><![CDATA[<p><em>44,100 Hz. Not 44,000. Not 48,000. Not even 40,000 or 50,000, which would at least have the virtue of roundness. The number that defines CD-quality audio is specific in a way that invites a question most people never think to ask: why that number?</em></p>
<hr>
<h2 id="the-puzzle">The Puzzle</h2>
<p>When a physical constant turns out to be $1.6 \times 10^{-19}$ coulombs, that is just nature being nature — no further explanation is needed or available. But when an engineering standard settles on 44,100 Hz rather than, say, 44,000 Hz or 45,000 Hz, there is a story hiding in the specificity.</p>
<p>The standard answer — the one you find on Wikipedia and in most popular accounts — is that 44.1 kHz satisfies the Nyquist criterion for 20 kHz audio, and so it was chosen to preserve the full range of human hearing. This is true. It is also almost completely uninformative. The Nyquist criterion for 20 kHz audio requires only that the sampling rate exceed 40 kHz. That constraint is satisfied by 40,001 Hz as much as by 44,100 Hz. The specific value requires a different explanation entirely.</p>
<p>That explanation involves a Sony engineer, a consumer videocassette recorder, and the accidental convergence of two television standards developed independently on different continents. The number 44,100 is not an optimisation. It is an archaeological deposit. And like most archaeological deposits, it is still with us long after the civilisation that created it has disappeared.</p>
<p>I want to work through the physics first, because the Nyquist theorem is genuinely beautiful and is often presented in a way that obscures what it actually says. Then I want to show you the arithmetic that makes 44,100 inevitable given 1970s constraints — and the way NTSC and PAL, designed for completely different reasons, conspire to produce the same number. If you enjoy &ldquo;hidden mathematics in music,&rdquo; you might also find it in <a href="/posts/euclidean-rhythms/">Euclidean Rhythms</a>, where a 2,300-year-old algorithm turns out to encode the structure of West African and Cuban percussion.</p>
<hr>
<h2 id="the-nyquistshannon-sampling-theorem">The Nyquist–Shannon Sampling Theorem</h2>
<p>Before the archaeology, the physics.</p>
<p>In 1928, Harry Nyquist published a paper on telegraph transmission theory that contained, somewhat incidentally, the germ of what would become one of the most consequential theorems in applied mathematics <a href="#ref-4">[4]</a>. Claude Shannon formalised and generalised it in 1949 <a href="#ref-5">[5]</a>. The theorem states: a continuous bandlimited signal whose highest frequency component is $f_{\max}$ can be perfectly reconstructed from discrete samples taken at rate $f_s$ if and only if</p>
$$f_s > 2 f_{\max}.$$<p>The quantity $f_s / 2$ is called the Nyquist frequency. Sampling below it causes <em>aliasing</em>: high-frequency components fold back into the spectrum and appear as spurious low-frequency artefacts that are indistinguishable from genuine signal. Once you have aliased a signal, the damage is permanent. Sampling at or above the Nyquist rate, the theorem says, causes no information loss at all — the original continuous waveform can be recovered exactly, in principle, from the discrete sample sequence.</p>
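<p>A small numerical illustration of the fold-back (the 25 kHz tone is an arbitrary example of mine): sampled at 44.1 kHz, a 25 kHz tone produces exactly the same sample values as a 19.1 kHz tone, because $44{,}100 - 25{,}000 = 19{,}100$. Once the samples exist, no amount of processing can tell the two apart.</p>
<pre><code class="language-python">import numpy as np

fs = 44_100.0                       # sampling rate, Hz
f_high = 25_000.0                   # a tone above the Nyquist frequency fs/2
f_alias = fs - f_high               # 19,100 Hz: where it folds back to

t = np.arange(64) / fs              # 64 sample instants
high_tone = np.cos(2 * np.pi * f_high * t)
alias_tone = np.cos(2 * np.pi * f_alias * t)

print(np.allclose(high_tone, alias_tone))   # True: identical samples
</code></pre>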
<p>Human hearing extends from roughly 20 Hz to 20 kHz (and, for most adults over thirty, substantially less at the top end, but 20 kHz is the canonical engineering requirement). Setting $f_{\max} = 20$ kHz, the Nyquist criterion requires $f_s > 40$ kHz.</p>
<p>But here is the subtlety that the Wikipedia summary tends to skip. The theorem assumes that the signal is <em>perfectly</em> bandlimited before sampling — meaning that all energy above $f_{\max}$ has been removed. This requires an <em>anti-aliasing filter</em>: a low-pass filter applied to the analogue signal before the analogue-to-digital converter samples it. If your anti-aliasing filter passes everything up to 20 kHz and blocks everything above it with perfect sharpness, then 40,001 Hz would suffice. The problem is that such a filter is physically unrealisable.</p>
<p>Real filters do not have vertical cutoffs. They have a <em>transition band</em>: a frequency range over which attenuation increases gradually from zero to full suppression. The steeper you want the transition, the higher the filter order, and for practical filter hardware in 1979 — op-amps, capacitors, inductors, no DSP to speak of — a &ldquo;steep enough&rdquo; filter meant a transition band of roughly 10% of the passband edge frequency. For a 20 kHz passband edge, that is about 2 kHz of transition band.</p>
<p>So the actual engineering requirement is not just $f_s > 40$ kHz. It is $f_s > 40$ kHz <em>plus enough headroom for a realisable anti-aliasing filter</em>. With $f_s = 44.1$ kHz, the Nyquist limit sits at $f_s/2 = 22.05$ kHz. The gap between the top of the audio band and the Nyquist limit is</p>
$$22{,}050 - 20{,}000 = 2{,}050 \text{ Hz},$$<p>which is just over 10% of 20 kHz. This is enough to build a practical anti-aliasing filter with 1970s and early 1980s analogue components. Had the sampling rate been 41 kHz, the gap would have been only 500 Hz — far too narrow for affordable hardware. Had it been 50 kHz, the gap would have been more comfortable, but you would be storing about 13% more data per second for no audible benefit.</p>
<p>So 44.1 kHz is in the right <em>neighbourhood</em> given real-world filter constraints. But it is still a specific number. The question of why 44,100 rather than 44,000 or 43,500 or 44,800 is still open. That is where the VCRs come in.</p>
<hr>
<h2 id="the-vcr-problem">The VCR Problem</h2>
<p>In the late 1970s, Sony was developing what would eventually become the Compact Disc. One of the fundamental engineering problems was storage: where do you put the digital audio data? A 74-minute stereo recording at 16 bits and 44.1 kHz generates roughly 780 megabytes. In 1979, that was an absurd quantity of data. Hard drives with that capacity existed but cost tens of thousands of dollars and weighed as much as a washing machine. Dedicated digital tape formats existed in professional studios but were exotic and expensive <a href="#ref-1">[1]</a>.</p>
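<p>The figure is simple to reproduce — 74 minutes of 16-bit stereo at 44.1 kHz:</p>
<pre><code class="language-python">fs, bytes_per_sample, channels = 44_100, 2, 2     # 16-bit stereo PCM
total_bytes = fs * bytes_per_sample * channels * 74 * 60
print(f"{total_bytes / 1e6:.0f} MB")              # 783 MB
</code></pre>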
<p>The only affordable high-bandwidth magnetic recording medium available to consumer-facing engineers in 1979 was the VCR — the videocassette recorder. VHS and Betamax had recently become consumer products, and the tape and drive mechanism was cheap, reliable, and capable of storing several hours of high-bandwidth video signal. That video signal bandwidth was substantial: enough, in principle, to carry digital audio if you could get it onto the tape in the right form.</p>
<p>Sony&rsquo;s solution was elegant to the point of audacity. Rather than inventing a new tape format, they encoded digital audio samples as a black-and-white pseudo-video signal — patterns of light and dark pixels that a standard VCR recorded without modification, because as far as the VCR was concerned it was just receiving a monochrome video feed. The resulting device, the Sony PCM-1600 (1979), was a standalone unit that sat between a microphone preamplifier and a VCR, converting audio to fake video for recording and back to audio for playback <a href="#ref-3">[3]</a>.</p>
<p>The sampling rate of the audio was now determined not by any audio engineering consideration but by the geometry of the video signal. And the geometry of the video signal was fixed by the television broadcast standard — which brought entirely different historical contingencies into the calculation.</p>
<hr>
<h2 id="the-ntsc-arithmetic">The NTSC Arithmetic</h2>
<p>The NTSC standard — developed in North America and Japan — specifies 30 frames per second and 525 total scan lines per frame. Of those 525 lines, 35 are consumed by the vertical blanking interval (the time needed for the electron beam in a CRT to return from the bottom of the screen to the top). That leaves 490 active lines per frame actually carrying picture information.</p>
<p>Sony packed 3 audio samples into each active scan line. The audio sampling rate is then:</p>
$$f_s = \underbrace{30}_{\text{frames/s}} \times \underbrace{490}_{\text{active lines/frame}} \times \underbrace{3}_{\text{samples/line}} = 44{,}100 \text{ Hz}.$$<p>There it is. 44,100 Hz, emerging not from any consideration of human hearing or filter design, but from the frame rate and line count of the North American television standard.</p>
<hr>
<h2 id="the-pal-arithmetic">The PAL Arithmetic</h2>
<p>Now the European video standard, PAL, which was developed in the 1960s independently of NTSC and optimised for different priorities. PAL uses 25 frames per second and 625 total scan lines per frame. The vertical blanking interval consumes 37 lines, leaving 588 active lines per frame.</p>
<p>Sony packed 3 audio samples into each active PAL scan line as well. The sampling rate:</p>
$$f_s = \underbrace{25}_{\text{frames/s}} \times \underbrace{588}_{\text{active lines/frame}} \times \underbrace{3}_{\text{samples/line}} = 44{,}100 \text{ Hz}.$$<p>The same number.</p>
<p>Let that settle for a moment. NTSC: 30 frames per second, 490 active lines. PAL: 25 frames per second, 588 active lines. Different frame rates. Different line counts. Developed on different continents for different broadcast environments. And yet $30 \times 490 = 25 \times 588 = 14{,}700$, so multiplying by 3 gives 44,100 in both cases.</p>
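<p>The whole argument fits in two lines:</p>
<pre><code class="language-python">ntsc = 30 * (525 - 35) * 3    # frames/s x active lines/frame x samples/line
pal  = 25 * (625 - 37) * 3
print(ntsc, pal)              # 44100 44100
</code></pre>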
<p>This is not coincidence in any deep sense — NTSC and PAL were both designed to fill approximately the same video bandwidth, just with different tradeoffs between temporal resolution (frame rate) and spatial resolution (line count). But for Sony&rsquo;s VCR encoding scheme, the numerical convergence was enormously convenient: a single PCM processor running at 44.1 kHz could record to either NTSC or PAL video equipment without any change to the audio electronics. The same master machine could work in Tokyo and in Frankfurt.</p>
<p>The arithmetic is, I think, one of those moments where a coincidence that is perfectly explicable in hindsight still feels satisfying in the way that a physical derivation feels satisfying. You set up the constraints — fill the video bandwidth, pack an integer number of samples per line, keep the number of samples small enough to fit in a line&rsquo;s worth of data — and the number 44,100 falls out of two independent calculations like a constant of nature. It is not a constant of nature. It is a contingent product of mid-twentieth-century broadcast engineering. But the mathematics does not care.</p>
<hr>
<h2 id="from-tape-to-disc">From Tape to Disc</h2>
<p>When Philips and Sony sat down to negotiate the Red Book standard — the technical specification for the Compact Disc, finalised in 1980 and commercially launched in 1982 — both companies brought existing infrastructure to the table <a href="#ref-3">[3]</a>. Both had been building digital audio equipment for several years. Both had PCM processors running in professional studios. Both had catalogues of digital masters recorded on VCR tape. And all of that equipment ran at 44.1 kHz, because all of it had been built to interface with the video tape standard that made digital audio recording practically affordable in the first place.</p>
<p>Changing the sampling rate for the CD would have required rebuilding the entire mastering chain: new PCM processors, new format conversion hardware, new master tape libraries. The economic and logistical cost would have been enormous. The 44.1 kHz rate was not chosen for the CD because it was optimal in any absolute engineering sense. It was chosen because it was already there <a href="#ref-1">[1]</a>, <a href="#ref-2">[2]</a>.</p>
<p>This is a pattern worth recognising. Major technical standards are rarely chosen by optimisation from first principles. They are chosen by consolidating what already exists. The QWERTY keyboard layout was optimised for typewriter mechanisms that no longer exist. The 60 Hz AC frequency in North America was set by Westinghouse generators installed in the 1890s. The 44.1 kHz CD sampling rate was set by VCR tape recorders that were obsolete within a decade of the CD&rsquo;s launch.</p>
<hr>
<h2 id="the-other-rates">The Other Rates</h2>
<p>Not all digital audio runs at 44.1 kHz, and the coexistence of different rates in the modern audio industry is the direct legacy of 44.1 kHz&rsquo;s awkward origins.</p>
<p><strong>48 kHz</strong> is the professional broadcast and studio standard. It is used in digital video, in DAT tape, in most professional audio interfaces, and in the digital audio embedded in broadcast television signals — including, as a matter of course, in the digital television infrastructure described in the context of university video platforms like <a href="/posts/educast-nrw-hochschul-youtube/">educast.nrw</a>. Why 48? Broadcast infrastructure needed a rate that had clean integer relationships with the 32 kHz rate used in early satellite and ISDN broadcast systems. The relationship $48 = \frac{3}{2} \times 32$ is exact, making synchronisation straightforward. 44.1 kHz has no such clean relationship with anything in broadcast engineering.</p>
<p>The ratio between the two dominant rates is $48 / 44.1 = 160 / 147$. This fraction — irreducible, inelegant, non-obvious — is the source of essentially every sample-rate conversion problem in audio post-production. When a CD master (44.1 kHz) is prepared for broadcast (48 kHz), a sample-rate converter must interpolate 147 samples up to 160 samples, or downsample 160 samples to 147, at every moment. The process introduces small errors, and doing it well requires significant computational effort. Every time a musician&rsquo;s recording moves between the consumer and professional audio worlds, it passes through this fractional bottleneck. Two standards that could have been made compatible were instead set by completely independent historical processes, and we have been paying the computational tax ever since.</p>
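<p>The awkwardness of that ratio is visible the moment you reduce the two rates to lowest terms:</p>
<pre><code class="language-python">from fractions import Fraction

ratio = Fraction(48_000, 44_100)
print(ratio)    # 160/147 — the smallest integer block a rate converter can work with
</code></pre>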
<p><strong>96 kHz and 192 kHz</strong> are marketed as &ldquo;high-resolution audio.&rdquo; Here the physics gets genuinely murky and the claims made by the audio industry deserve some scepticism. Human hearing above 20 kHz is, for most adults, genuinely absent — not reduced, but absent, because the outer hair cells in the cochlea that respond to those frequencies progressively die from the teenage years onward and are not replaced. The argument for high sampling rates is typically one of two things: first, that ultrasonic content can cause <em>intermodulation distortion</em>, where sum and difference frequencies of ultrasonic components fall back into the audible band; second, that a higher sampling rate allows for a more relaxed anti-aliasing filter with better phase behaviour within the audible band.</p>
<p>Both effects are real and measurable in laboratory conditions. Whether they are <em>audible</em> under controlled double-blind listening conditions is a separate and more contested question. The published evidence is not strong. What is not contested is that 96 kHz files are twice the size of 44.1 kHz files, and 192 kHz files are more than four times the size, for the same bit depth and the same number of audio channels. Whether that storage cost buys anything audible is, as of the current state of the literature, an open question.</p>
<hr>
<h2 id="the-irony">The Irony</h2>
<p>Here is the situation we are actually in. The canonical digital audio format — 16-bit, 44.1 kHz PCM, the format that defined CD quality for a generation and that remains the standard for music distribution — is physically a photograph of analogue video tape. The digitisation of music was made possible by television engineering. The specific number that defines the fidelity of every CD ever pressed is determined by the frame rates and line counts of 1970s broadcast television standards, which were themselves determined by the capabilities of 1940s CRT technology and the political negotiations of early broadcast licensing bodies.</p>
<p>When someone tells you that 44.1 kHz is the &ldquo;natural&rdquo; or &ldquo;perfect&rdquo; sampling rate for audio, they are, without knowing it, paying tribute to the NTSC standards committee of 1941 and the PAL engineers of the 1960s. The number carries history in it the way a fossil carries the structure of a long-dead organism. It is the right number, in the sense that it works. Its rightness has nothing to do with the reasons it was chosen.</p>
<p>I find this genuinely satisfying rather than disappointing. The history of physics and engineering is full of contingent numbers that turned out to be good enough, and whose goodness was only rationalised after the fact. The metre was originally defined as one ten-millionth of the distance from the equator to the North Pole along the Paris meridian — an arbitrary geodetic choice that turned out to produce a unit of length that is remarkably convenient for human-scale physics. The kilogram was a cylinder of platinum-iridium alloy in a vault outside Paris for over a century. 44,100 Hz is in good company.</p>
<hr>
<h2 id="the-archaeology-of-a-number">The Archaeology of a Number</h2>
<p>The numbers we inherit from engineering history are rarely arbitrary at every level simultaneously. 44,100 Hz is not arbitrary at the level of sampling theory: it satisfies the Nyquist criterion with enough headroom for a physically realisable anti-aliasing filter, given 1970s component technology. That is a genuine constraint, and the number sits in the right region of parameter space for it.</p>
<p>But it is arbitrary at a deeper level: it is the specific number that happened to fit a video tape format that happened to be affordable in 1979, a format that was itself determined by broadcast standards that were set for entirely unrelated reasons decades earlier. The chain of contingencies runs: 1940s television engineering defines NTSC and PAL frame rates and line counts; 1970s consumer VCR technology makes those tape formats cheap; 1979 Sony engineers encode digital audio as fake video; the arithmetic of the video formats fixes the sampling rate at 44,100 Hz; that rate gets locked into the CD standard in 1980; 44.1 kHz becomes the defining frequency of a digital music format that ships billions of units over the following four decades.</p>
<p>Science and engineering produce exact numbers from messy contingencies. The number 44,100 is simultaneously a theorem output (it satisfies a well-defined engineering constraint), a historical accident (it is determined by the specific video tape hardware that existed in 1979), and an institutional fossil (it outlasted the VCRs that created it by four decades and counting). All three things are true at the same time.</p>
<p>The VCRs are gone. The sampling rate remains.</p>
<hr>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Pohlmann, K. C. (2010). <em>Principles of Digital Audio</em> (6th ed.). McGraw-Hill.</p>
<p><span id="ref-2"></span>[2] Watkinson, J. (2001). <em>The Art of Digital Audio</em> (3rd ed.). Focal Press.</p>
<p><span id="ref-3"></span>[3] Immink, K. A. S. (1998). The compact disc story. <em>Journal of the AES</em>, 46(5), 458–465.</p>
<p><span id="ref-4"></span>[4] Nyquist, H. (1928). Certain topics in telegraph transmission theory. <em>Transactions of the AIEE</em>, 47(2), 617–644.</p>
<p><span id="ref-5"></span>[5] Shannon, C. E. (1949). Communication in the presence of noise. <em>Proceedings of the IRE</em>, 37(1), 10–21.</p>
]]></content:encoded>
    </item>
    <item>
      <title>How Cats Drink: Inertia, Gravity, and the Froude Number at the Tip of a Tongue</title>
      <link>https://sebastianspicker.github.io/posts/how-cats-drink-froude-number/</link>
      <pubDate>Mon, 22 Jul 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/how-cats-drink-froude-number/</guid>
      <description>Cats do not scoop water with their tongues — they exploit a delicate balance between inertia and gravity at the air-water interface. The tip of the tongue just touches the surface; rapid withdrawal pulls a fluid column upward; the jaw closes at exactly the moment the column peaks. Reis, Jung, Aristoff, and Stocker (Science, 2010) showed that the lapping frequency of all felids — from domestic cats to lions — is tuned so that the Froude number at the tongue tip is approximately unity.</description>
      <content:encoded><![CDATA[<p><em>I have spent a non-trivial amount of time watching our cats drink — they are
indoor-only cats, on our vet&rsquo;s advice, which gives them few distractions and
gives me ample opportunity to observe. This is not entirely voluntary. Once you have noticed that something is happening at the
water bowl that does not look right — the tongue moves too fast, the water
column is pulled upward rather than scooped, the jaw closes before the tongue
returns — you find yourself crouching beside the bowl with your phone propped
against a chair, filming at 240 frames per second and feeling that you have
perhaps chosen an unusual way to spend a Tuesday morning.</em></p>
<p><em>Pedro Reis, Sunghwan Jung, Jeffrey Aristoff, and Roman Stocker had the same
impulse, with better equipment. Their 2010 paper in Science, &ldquo;How Cats Lap:
Water Uptake by Felis catus,&rdquo; is one of the more elegant pieces of dimensional
analysis in recent biology.</em></p>
<hr>
<h2 id="how-cats-do-not-drink">How Cats Do Not Drink</h2>
<p>The simplest hypothesis — that cats curl the tongue into a spoon and scoop
water into the mouth — is false. High-speed photography shows that the cat&rsquo;s
tongue does not form a cup shape. Instead, the cat extends the tongue tip
downward toward the water surface and then rapidly retracts it. The motion is
fast — too fast for normal video — and the tongue barely contacts the surface.</p>
<p>The contrast with dogs is instructive. Dogs <em>do</em> scoop: the tongue curls
backward (not forward), forming a ladle shape that scoops water upward and
backwards into the mouth. The mechanism is vigorous and inefficient — a
significant fraction of the water misses the mouth entirely, which is why
drinking dogs produce splashing and dogs often have wet chins. The mechanism
works but is inelegant.</p>
<p>Cats produce almost no splash. The mechanism is different in kind.</p>
<hr>
<h2 id="the-physical-mechanism">The Physical Mechanism</h2>
<p>Reis et al. (2010) used high-speed photography (1000 frames per second) to
resolve the cat&rsquo;s lapping motion. Their observations:</p>
<ol>
<li>
<p>The cat extends the tongue tip downward until the <em>dorsal surface</em> (the top
side) just touches the water surface. The ventral surface (the smooth
underside) does not contact the water.</p>
</li>
<li>
<p>The cat then rapidly retracts the tongue upward. The tongue tip is moving
at roughly $v \approx 0.7\,\mathrm{m/s}$ during this retraction.</p>
</li>
<li>
<p>As the tongue tip pulls away from the surface, a column of liquid is pulled
upward by the adhesion between the liquid and the retreating tongue. The
column rises against gravity.</p>
</li>
<li>
<p>The column eventually stalls — inertia is overcome by gravity — and begins
to fall back. The cat closes its jaw at exactly the moment of maximum column
height, capturing the peak volume of water.</p>
</li>
<li>
<p>The cat then extends the tongue for the next lap.</p>
</li>
</ol>
<p>The cat closes its jaw before the tongue fully retracts. This is important:
the jaw closure captures the water column, not the water adhering to the tongue.
The tongue is the mechanism that <em>creates</em> the column; the jaw captures it.</p>
<hr>
<h2 id="dimensional-analysis-the-froude-number">Dimensional Analysis: The Froude Number</h2>
<p>The relevant competition is between <strong>inertia</strong> (which drives the column
upward) and <strong>gravity</strong> (which pulls it back down). Surface tension plays a
role in stabilising the column but is not the primary factor governing the
column height.</p>
<p>The balance between inertia and gravity for a fluid column moving at speed
$v$ and of characteristic length scale $L$ (here, the diameter of the tongue
tip, $L \approx 5\,\mathrm{mm}$ for a domestic cat) is captured by the
<strong>Froude number</strong>:</p>
$$\mathrm{Fr} = \frac{v}{\sqrt{gL}},$$<p>where $g = 9.81\,\mathrm{m/s}^2$ is gravitational acceleration.</p>
<p>When $\mathrm{Fr} \ll 1$: gravity dominates, inertia is insufficient to pull a
significant column of water upward. Very slow tongue motion would lift almost
no water.</p>
<p>When $\mathrm{Fr} \gg 1$: inertia dominates, the column rises far above the
surface but the jaw must be closed quickly before the large amount of water
falls back. Very fast tongue motion wastes water and requires rapid jaw closure.</p>
<p>The optimal lapping frequency — maximising captured volume per lap — occurs
near $\mathrm{Fr} \approx 1$, where inertial and gravitational forces are
comparable and the column height is matched to the jaw closure dynamics.</p>
<h3 id="checking-the-numbers-for-a-domestic-cat">Checking the Numbers for a Domestic Cat</h3>
<p>For a domestic cat:</p>
<ul>
<li>Tongue tip diameter: $L \approx 5\,\mathrm{mm} = 5 \times 10^{-3}\,\mathrm{m}$</li>
<li>Characteristic tongue tip speed: $v \approx 0.7\,\mathrm{m/s}$</li>
</ul>
$$\mathrm{Fr} = \frac{0.7}{\sqrt{9.81 \times 5 \times 10^{-3}}}
= \frac{0.7}{\sqrt{0.049}} = \frac{0.7}{0.22} \approx 3.2.$$<p>Reis et al. found Fr of order unity — inertial and gravitational forces
comparable — confirming that the lapping speed is tuned to the inertia-gravity
balance. (The exact numerical value depends on the choice of characteristic
length scale; using the tongue tip diameter as above gives Fr in the range
1–3, squarely in the regime where neither force dominates.)</p>
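<p>The estimate is quick to reproduce. A minimal Python sketch, using the same assumed values for tongue-tip diameter and retraction speed as above:</p>
<pre><code class="language-python">import math

g = 9.81   # m/s^2, gravitational acceleration
L = 5e-3   # m, tongue-tip diameter (assumed, as in the text)
v = 0.7    # m/s, tongue-tip retraction speed (assumed, as in the text)

froude = v / math.sqrt(g * L)
print(f"Fr = {froude:.2f}")  # about 3.2 with these inputs, i.e. order unity
</code></pre>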
<hr>
<h2 id="scaling-across-felids">Scaling Across Felids</h2>
<p>The Froude number prediction yields a scaling law for lapping frequency across
felid species of different sizes. If all felids lap at $\mathrm{Fr} \approx 1$,
then the characteristic speed scales as $v \sim \sqrt{gL}$, and the lapping
frequency scales as:</p>
$$f = \frac{v}{d} \sim \frac{\sqrt{gL}}{d},$$<p>where $d$ is the distance the tongue travels per lap (roughly proportional to
tongue length, which scales with body size). Since $L \sim d$ scales with body
size, we get:</p>
$$f \sim \frac{\sqrt{g \cdot d}}{d} = \sqrt{\frac{g}{d}} \propto d^{-1/2}.$$<p>Larger cats have longer tongues and lap more slowly. The prediction is that
lapping frequency scales as the square root of inverse tongue length — or,
equivalently, as the inverse square root of body mass (since linear dimensions
scale as mass$^{1/3}$):</p>
$$f \propto m^{-1/6}.$$<p>Reis et al. tested this against high-speed footage of large felids. A domestic
cat laps at approximately $4\,\mathrm{Hz}$; a lion laps at approximately
$1.5\,\mathrm{Hz}$; a tiger at roughly $1\,\mathrm{Hz}$. The scaling is
consistent with $f \propto m^{-1/6}$ across nearly two orders of magnitude in
body mass.</p>
<p>The table below shows the predicted versus observed scaling:</p>
<table>
  <thead>
      <tr>
          <th>Species</th>
          <th>Body mass (kg)</th>
          <th>Predicted $f$ relative to cat</th>
          <th>Predicted $f$ (Hz)</th>
          <th>Observed $f$ (Hz)</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Domestic cat</td>
          <td>4</td>
          <td>1.0</td>
          <td>4.0</td>
          <td>~4.0</td>
      </tr>
      <tr>
          <td>Jaguar</td>
          <td>80</td>
          <td>$\left(\frac{4}{80}\right)^{1/6} \approx 0.61$</td>
          <td>2.4</td>
          <td>~2.0</td>
      </tr>
      <tr>
          <td>Lion</td>
          <td>200</td>
          <td>$\left(\frac{4}{200}\right)^{1/6} \approx 0.52$</td>
          <td>2.1</td>
          <td>~1.5</td>
      </tr>
      <tr>
          <td>Tiger</td>
          <td>220</td>
          <td>$\left(\frac{4}{220}\right)^{1/6} \approx 0.51$</td>
          <td>2.1</td>
          <td>~1.0</td>
      </tr>
  </tbody>
</table>
<p>The $m^{-1/6}$ scaling captures the correct trend — larger cats lap more
slowly — though the predicted frequencies for the largest cats somewhat
overestimate the observed values. The discrepancy may reflect the limitations
of the simple allometric assumption (that all linear dimensions scale as
$m^{1/3}$) and the fact that tongue geometry does not scale isometrically
across the full range of felid body sizes.</p>
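<p>The predicted column of the table takes a few lines to reproduce. A minimal Python sketch, anchored, as in the table, to a 4 kg domestic cat lapping at 4 Hz:</p>
<pre><code class="language-python"># Predicted lapping frequency from the m^(-1/6) scaling, anchored to a
# 4 kg domestic cat lapping at 4 Hz (the reference values used in the table).
masses = {"domestic cat": 4, "jaguar": 80, "lion": 200, "tiger": 220}  # kg

f_cat, m_cat = 4.0, masses["domestic cat"]
for species, m in masses.items():
    f_pred = f_cat * (m_cat / m) ** (1 / 6)
    print(f"{species:12s}  {m:5.0f} kg   predicted f = {f_pred:.1f} Hz")
</code></pre>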
<hr>
<h2 id="why-not-just-lick">Why Not Just Lick?</h2>
<p>A natural question: why not simply submerge the tongue and let the papillae
soak up water, since the tongue contacts the water during lapping anyway?
Several answers:</p>
<ol>
<li>
<p><strong>Papillae are not sponges.</strong> Feline papillae are hollow and scoop-shaped
(filiform papillae with hollow tips), optimised for grooming and food
manipulation rather than for soaking up water; the volume they can wick is small.</p>
</li>
<li>
<p><strong>The cat cannot breathe with its mouth submerged.</strong> A lapping mechanism
that keeps the mouth mostly closed except for the brief jaw-closure moment
allows continuous breathing through the nose during drinking.</p>
</li>
<li>
<p><strong>Speed and efficiency.</strong> The inertial column mechanism delivers significantly
more water per jaw movement than surface tension adhesion alone. At 4 laps
per second, a domestic cat takes in roughly $0.14\,\mathrm{mL}$ per lap,
for a total of roughly $34\,\mathrm{mL/min}$ — comparable to sipping rates
in animals that use more direct intake mechanisms.</p>
</li>
</ol>
<p>The cat has converged on a hydrodynamically optimal strategy under the
constraint of keeping the oral cavity mostly sealed during the intake cycle.</p>
<hr>
<h2 id="the-robotic-tongue">The Robotic Tongue</h2>
<p>Reis et al. constructed a robotic cat tongue to verify the mechanism: a smooth
glass disc lowered to the water surface and retracted at controlled speeds.
The column height as a function of speed followed the predicted inertia-gravity
balance, confirming that the mechanism does not depend on any specifically
biological property of the tongue — it is a fluid dynamics result that applies
to any surface moving away from a water interface at the right speed.</p>
<p>The robot lapped at the same Froude number as the cat.</p>
<hr>
<h2 id="dogs-horses-and-the-comparison">Dogs, Horses, and the Comparison</h2>
<p>Dogs cup the tongue <em>caudally</em> (backwards) rather than ventrally, forming a
ladle. The mechanism is faster and delivers more water per stroke but is
messy — the ladle is formed outside the mouth, and water sloshes freely. Dogs
lap at roughly $3\,\mathrm{Hz}$ with a tongue tip speed significantly higher
than cats, producing Fr well above unity. The excess inertia is why dog
drinking generates splashing.</p>
<p>Horses, by contrast, create a near-seal with their lips and use suction —
a fundamentally different mechanism that requires no tongue projection at all.
The lapping mechanism of felids is phylogenetically specific and appears to
have evolved under selection pressure for both efficiency and noise suppression,
consistent with the ambush-predator lifestyle. A cat that splashed while
drinking would alert prey at a water source. A cat that laps near-silently
does not.</p>
<hr>
<h2 id="a-note-on-the-measurement">A Note on the Measurement</h2>
<p>Getting reliable high-speed footage of a cat drinking is harder than it sounds.
Our cats drink at different times of day, in different moods, and the presence
of a camera tripod next to the water bowl is regarded as grounds for drinking
elsewhere. Reis and colleagues solved this by filming Cutta Cutta,
Stocker&rsquo;s cat, in a controlled setting. Their footage is available online and is
genuinely beautiful: a slow-motion waterfall in miniature, rising improbably
from the tongue tip and held there by the balance between upward momentum and
downward gravity, until the jaw swings shut.</p>
<p>The physics is in the timing.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Reis, P.M., Jung, S., Aristoff, J.M., &amp; Stocker, R. (2010). How cats lap:
Water uptake by <em>Felis catus</em>. <em>Science</em>, 330(6008), 1231–1234.
<a href="https://doi.org/10.1126/science.1195421">https://doi.org/10.1126/science.1195421</a></p>
</li>
<li>
<p>Aristoff, J.M., Stocker, R., Jung, S., &amp; Reis, P.M. (2011). On the water
lapping of felines and the water running of lizards. <em>Communicative &amp;
Integrative Biology</em>, 4(2), 213–215.</p>
</li>
<li>
<p>Vogel, S. (1994). <em>Life in Moving Fluids: The Physical Biology of Flow</em>
(2nd ed.). Princeton University Press.</p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-12-15</strong>: Updated water intake per lap from 0.04 mL to 0.14 mL (Reis et al. report ~0.14 +/- 0.04 mL per lap; the previous value was the standard deviation), and updated the intake rate accordingly (~34 mL/min). Updated the papillae location from ventral to dorsal surface. Updated the Aristoff et al. reference to the correct 2011 <em>Communicative &amp; Integrative Biology</em> article. Removed the Jung &amp; Kim (2012) PRL reference (article number 034501 resolves to a different paper).</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Are Cats Liquid? The Deborah Number and the Rheology of Cats</title>
      <link>https://sebastianspicker.github.io/posts/liquid-cats-deborah-number/</link>
      <pubDate>Wed, 03 Apr 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/liquid-cats-deborah-number/</guid>
      <description>Marc-Antoine Fardin won the 2017 Ig Nobel Prize in Physics for proving, rigorously, that cats are liquid. The argument rests on the Deborah number De = τ/T: if the material&amp;rsquo;s relaxation time τ is shorter than the observation time T, the material behaves as a fluid. A cat filling a sink (De ≈ 0.008) is a liquid. A cat bouncing off a table (De ≫ 1) is a solid. The classification is not a joke — it is standard rheology, applied to an unusual substrate.</description>
      <content:encoded><![CDATA[<p><em>One of our strays discovered, sometime in her first winter indoors — they are
strictly indoor cats now, on our vet&rsquo;s recommendation — that she could fit into
a salad bowl. Not sit beside it, not rest her head on its rim: fit into it,
curled into a precise sphere with her tail tucked under her chin and her ears
folded flat, filling the bowl as liquid fills a container. The bowl has a
diameter of 22 centimetres. I did not find this as surprising as perhaps I
should have: there is a quantity in materials science that determines, rigorously,
whether a given material in a given situation should be classified as a solid or
a liquid. For a cat in a bowl, this quantity is comfortably below one.</em></p>
<p><em>The material is a liquid. The material is also a cat.</em></p>
<hr>
<h2 id="the-definition-of-a-fluid">The Definition of a Fluid</h2>
<p>The intuitive distinction between solids and liquids is that solids hold their
shape and liquids conform to their container. But this distinction is one of
timescale, not of material identity.</p>
<p>A classic demonstration: place a ball of silly putty on a table. Over the
course of an hour, it flows slowly outward, taking the shape of the table
surface — clearly a liquid. Strike it sharply with a hammer and it shatters —
clearly a solid. The material has not changed. The timescale of the
interaction has.</p>
<p>The same principle applies to glass (contrary to popular myth, medieval window
glass is not thicker at the bottom because it has flowed — the variation is
from the manufacturing process, and the relaxation time of soda-lime glass at
room temperature is of order $10^{23}$ years — but at elevated temperatures
near the glass transition, silicate glass flows readily). It applies
to mantle rock, which is solid on the scale of earthquake waves and liquid on
the scale of continental drift. It applies to pitch, to ice sheets, to asphalt
on a hot day.</p>
<p>The formal tool for capturing this is the <strong>Deborah number</strong>.</p>
<hr>
<h2 id="the-deborah-number">The Deborah Number</h2>
<p>The Deborah number was introduced by Markus Reiner in 1964, in a short note
in <em>Physics Today</em> (Reiner 1964). It is defined as:</p>
$$\mathrm{De} = \frac{\tau}{T},$$<p>where $\tau$ is the <strong>relaxation time</strong> of the material — roughly, the
characteristic time over which it can rearrange its internal structure and
relieve stress — and $T$ is the <strong>observation time</strong> or the timescale of the
imposed deformation.</p>
<ul>
<li>$\mathrm{De} \ll 1$: The material relaxes quickly relative to the timescale
of observation. Internal stresses are continuously relieved. The material
behaves as a <strong>fluid</strong>.</li>
<li>$\mathrm{De} \gg 1$: The material relaxes slowly relative to the observation
timescale. Internal stresses persist. The material behaves as a <strong>solid</strong>.</li>
<li>$\mathrm{De} \sim 1$: The material is in a viscoelastic regime — partly
fluid, partly solid, exhibiting time-dependent behaviour that is neither.</li>
</ul>
<p>The name comes from the prophetess Deborah, who sang in Judges 5:5: <em>&ldquo;The
mountains flowed before the Lord.&rdquo;</em> At the timescale of a divine perspective,
mountains are liquid. At the timescale of a human lifetime, they are not.
Reiner&rsquo;s point was that the solid-liquid distinction is not a property of
the material but of the relationship between the material&rsquo;s internal
dynamics and the observer&rsquo;s timescale.</p>
<p>For Newtonian fluids (water, air at ordinary conditions), $\tau \to 0$ and
$\mathrm{De} \to 0$ for any finite observation time — they are always liquid.
For a perfectly elastic solid (an ideal spring), $\tau \to \infty$ and
$\mathrm{De} \to \infty$ for any finite observation time — always solid. Real
materials lie between these extremes.</p>
<hr>
<h2 id="the-maxwell-viscoelastic-model">The Maxwell Viscoelastic Model</h2>
<p>The simplest model of a material with a finite relaxation time is the Maxwell
element: a spring (elastic, spring constant $G$) in series with a dashpot
(viscous, viscosity $\eta$). Under a step stress $\sigma_0$ applied at time
$t = 0$, the strain evolves as:</p>
$$\epsilon(t) = \frac{\sigma_0}{G} + \frac{\sigma_0}{\eta}\,t,$$<p>where $\tau = \eta / G$ is the Maxwell relaxation time. The first term is the
instantaneous elastic deformation of the spring; the second is the linear
viscous creep of the dashpot. For $t \ll \tau$, the elastic strain dominates
and the material behaves as a solid; for $t \gg \tau$, the viscous flow
dominates and the material behaves as a liquid. The material &ldquo;decides&rdquo; whether
to be solid or liquid depending on the ratio of $\tau$ to the duration of the
applied stress — which is precisely the Deborah number.</p>
<p>The <strong>creep compliance</strong> $J(t) = \epsilon(t)/\sigma_0 = t/\eta + 1/G$ grows
linearly with time for $t \gg \tau$, confirming liquid behaviour on long
timescales. The <strong>relaxation modulus</strong> $G(t) = \sigma(t)/\epsilon_0 = G
e^{-t/\tau}$ decays exponentially to zero, confirming that the material
cannot sustain a permanent stress — again, liquid behaviour on long timescales.</p>
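<p>A minimal numerical sketch of the Maxwell element, with illustrative values for the modulus and viscosity chosen only to show the crossover around $t \approx \tau$:</p>
<pre><code class="language-python">import numpy as np

# Maxwell element: spring (modulus G0) in series with a dashpot (viscosity eta).
G0, eta = 1.0, 10.0        # illustrative values, arbitrary units
tau = eta / G0             # Maxwell relaxation time

t = np.array([0.01, 0.1, 1.0, 10.0, 100.0]) * tau

J = t / eta + 1.0 / G0       # creep compliance: grows linearly once t exceeds tau
G_t = G0 * np.exp(-t / tau)  # relaxation modulus: decays exponentially to zero

for ti, Ji, Gi in zip(t / tau, J, G_t):
    print(f"t/tau = {ti:6.2f}   J(t) = {Ji:7.2f}   G(t) = {Gi:7.4f}")
</code></pre>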
<hr>
<h2 id="on-the-rheology-of-cats">On the Rheology of Cats</h2>
<p>In 2014, Marc-Antoine Fardin, a physicist at the ENS Lyon,
published &ldquo;On the Rheology of Cats&rdquo; in the <em>Rheology Bulletin</em> 83(2), 16–17.
The paper asked whether cats satisfy the defining rheological criterion for
liquids, using the Deborah number as the test. Fardin was awarded the 2017
Ig Nobel Prize in Physics — which is awarded for research that &ldquo;makes you
laugh, then makes you think&rdquo; — for this work.</p>
<p>The paper is not a joke. It is standard rheology applied to an unusual material,
with appropriately hedged conclusions and correct citations to the primary
literature on viscoelastic flow. The humour is in the application; the physics
is serious.</p>
<h3 id="estimating-the-cats-relaxation-time">Estimating the Cat&rsquo;s Relaxation Time</h3>
<p>The relaxation time $\tau$ of a cat is the time scale over which the cat&rsquo;s
body deforms to fill a container. This is observable. A cat placed near a
suitable container — a salad bowl, a cardboard box, a bathroom sink —
adopts a conformed shape on a timescale of roughly 5–30 seconds. The initial
posture (stiff, alert) gives way to a relaxed conformation as the cat
assesses the container and adjusts. Fardin estimated $\tau \approx 1$–$30$
seconds, with the exact value depending on the container&rsquo;s attractiveness
to the specific cat.</p>
<p>This is the material&rsquo;s characteristic relaxation time. The fact that it is
finite — that the cat does eventually conform to the container — is the
essential observation.</p>
<h3 id="computing-the-deborah-number-for-various-situations">Computing the Deborah Number for Various Situations</h3>
<p><strong>Scenario 1: Cat in a sink.</strong>
A cat taking ten minutes to settle into a bathroom sink. Observation time
$T = 600\,\mathrm{s}$, relaxation time $\tau \approx 5\,\mathrm{s}$.</p>
$$\mathrm{De}_\mathrm{sink} = \frac{5}{600} \approx 0.008 \ll 1.$$<p>The cat is unambiguously a <strong>liquid</strong>.</p>
<p><strong>Scenario 2: Cat in a cardboard box.</strong>
Conformation over approximately 30 minutes, $\tau \approx 20\,\mathrm{s}$.</p>
$$\mathrm{De}_\mathrm{box} = \frac{20}{1800} \approx 0.011 \ll 1.$$<p><strong>Liquid.</strong></p>
<p><strong>Scenario 3: Cat dropping from a bookshelf.</strong>
Contact time during a jump approximately $T \approx 0.05\,\mathrm{s}$,
relaxation time still $\tau \approx 5\,\mathrm{s}$.</p>
$$\mathrm{De}_\mathrm{jump} = \frac{5}{0.05} = 100 \gg 1.$$<p><strong>Solid.</strong> The cat does not deform into the shape of the bookshelf during the
jump; it rebounds elastically.</p>
<p><strong>Scenario 4: Cat startled by a loud noise.</strong>
Reaction time $T \approx 0.3\,\mathrm{s}$, $\tau \approx 5\,\mathrm{s}$.</p>
$$\mathrm{De}_\mathrm{startle} = \frac{5}{0.3} \approx 17 \gg 1.$$<p><strong>Solid.</strong> On short timescales, cats behave as elastic materials — they spring,
they bounce, they do not flow.</p>
<p>The cat is neither permanently solid nor permanently liquid. It is a
<strong>viscoelastic material</strong> whose phase classification depends on the timescale
of the interaction. This is not a loose analogy; it is the definition of
viscoelasticity.</p>
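<p>The four estimates are simple ratios, but a short sketch makes the classification explicit. The cutoffs for &ldquo;clearly liquid&rdquo; and &ldquo;clearly solid&rdquo; are arbitrary illustration values of mine, not part of Fardin&rsquo;s analysis:</p>
<pre><code class="language-python"># Deborah numbers for the four scenarios above (tau and T in seconds).
scenarios = {
    "cat settling into a sink":     (5.0, 600.0),
    "cat settling into a box":      (20.0, 1800.0),
    "cat landing from a bookshelf": (5.0, 0.05),
    "cat startled by a loud noise": (5.0, 0.3),
}

for name, (tau, T) in scenarios.items():
    De = tau / T
    regime = "liquid" if De &lt; 0.1 else ("solid" if De &gt; 10 else "viscoelastic")
    print(f"{name:30s} De = {De:8.3f}  {regime}")
</code></pre>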
<hr>
<h2 id="non-newtonian-behaviour-and-flow-instabilities">Non-Newtonian Behaviour and Flow Instabilities</h2>
<p>Fardin noted an additional complication: cat flow is not Newtonian. A Newtonian
fluid has a viscosity $\eta$ that is independent of the applied shear rate
$\dot\gamma$. Many real materials are <strong>shear-thinning</strong> (viscosity decreases
with increasing shear rate — ketchup, blood, many polymer solutions) or
<strong>shear-thickening</strong> (viscosity increases with increasing shear rate —
cornstarch suspension, some dense suspensions). Cats, Fardin observed, appear
to be shear-thinning: the more rapidly you attempt to move a relaxed cat from
its current position, the more &ldquo;liquid&rdquo; (accommodating, compliant) it becomes,
up to a point at which the cat transitions to solid behaviour (claws, teeth).</p>
<p>This is, formally, the behaviour of a <strong>yield-stress fluid</strong>: a material that
behaves as a solid below a critical stress $\sigma_y$ and flows above it. The
Herschel–Bulkley model describes such fluids:</p>
$$\sigma = \sigma_y + k \dot\gamma^n, \quad \sigma > \sigma_y,$$<p>where $k$ is the flow consistency index and $n < 1$ for shear-thinning. The
challenge of fitting $k$, $n$, and $\sigma_y$ for a specific cat is
experimental, and Fardin acknowledged this was left to future work.</p>
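<p>For concreteness, here is what a Herschel–Bulkley flow curve looks like numerically. The parameter values are invented for illustration; as noted above, fitting them for an actual cat remains future work:</p>
<pre><code class="language-python">import numpy as np

# Herschel-Bulkley flow curve with made-up parameters.
sigma_y = 2.0    # yield stress
k, n = 1.5, 0.5  # consistency index and exponent (n below 1: shear-thinning)

gamma_dot = np.array([0.1, 1.0, 10.0, 100.0])  # shear rate, 1/s
sigma = sigma_y + k * gamma_dot**n              # valid only above the yield stress

for gd, s in zip(gamma_dot, sigma):
    print(f"shear rate {gd:7.1f} 1/s   stress {s:6.2f} (arbitrary units)")
</code></pre>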
<p>The <strong>Deborah number</strong> and the <strong>yield stress</strong> together provide a two-parameter
phase diagram for cat rheology:</p>
<ul>
<li>Low stress, short timescale: solid (De ≫ 1 or σ &lt; σ_y)</li>
<li>Low stress, long timescale: liquid (De ≪ 1)</li>
<li>High stress: yield, followed by flow</li>
</ul>
<hr>
<h2 id="flow-instabilities-the-rayleigh-plateau-connection">Flow Instabilities: The Rayleigh-Plateau Connection</h2>
<p>Fardin also noted that cats confined to containers thinner than their body
diameter can exhibit flow instabilities. A cat attempting to fit into a glass
too narrow for its body will sometimes adopt a helical or coiled configuration —
an instability reminiscent of the <strong>Rayleigh–Plateau instability</strong> of a liquid
jet.</p>
<p>The Rayleigh–Plateau instability occurs when a cylindrical fluid jet of radius
$r_0$ is subject to perturbations of wavelength $\lambda > 2\pi r_0$. Modes
with wavelength longer than the cylinder&rsquo;s circumference are unstable and grow,
breaking the jet into droplets. The dispersion relation for growth rate $\sigma$
as a function of wavenumber $k = 2\pi/\lambda$ (for an inviscid jet) is:</p>
$$\sigma^2 = \frac{\gamma}{\rho r_0^3}\, k r_0 \bigl(1 - k^2 r_0^2\bigr)\,
\frac{I_1(k r_0)}{I_0(k r_0)},$$<p>where $\gamma$ is surface tension and $I_0, I_1$ are modified Bessel functions.
The analogy with a cat is inexact — surface tension is not the dominant
restoring force — but the qualitative instability mechanism (a long cylinder of
material is unstable to perturbations whose wavelength exceeds the cylinder&rsquo;s
circumference) appears to apply, suggesting that very elongated cats in very
narrow containers should be unstable to coiling. This is, again, left to future
experimental work.</p>
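<p>The fastest-growing mode of the dispersion relation above can be located numerically in a few lines; the classical result is $k r_0 \approx 0.7$, i.e. a wavelength of roughly $9\,r_0$. A minimal Python sketch:</p>
<pre><code class="language-python">import numpy as np
from scipy.special import iv  # modified Bessel functions of the first kind

# Dimensionless growth rate sigma^2 * (rho * r0^3 / gamma), as a function of x = k * r0.
x = np.linspace(1e-4, 0.9999, 2000)
growth2 = x * (1 - x**2) * iv(1, x) / iv(0, x)

x_max = x[np.argmax(growth2)]
print(f"fastest-growing mode at k*r0 = {x_max:.3f}")               # roughly 0.70
print(f"corresponding wavelength = {2 * np.pi / x_max:.2f} * r0")  # roughly 9 r0
</code></pre>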
<hr>
<h2 id="why-the-deborah-number-matters-outside-of-cat-physics">Why the Deborah Number Matters (Outside of Cat Physics)</h2>
<p>The Deborah number is not a curiosity; it is a central dimensionless number
in engineering and materials science.</p>
<p><strong>Polymer processing</strong>: The flow of polymer melts through injection-moulding
channels involves De in the range $10^{-2}$–$10^2$. Too high a De leads to
elastic instabilities, melt fracture, and surface defects in the finished part.</p>
<p><strong>Blood rheology</strong>: Blood is a non-Newtonian viscoelastic fluid. In the large
arteries (low shear rate), red blood cells aggregate into <em>rouleaux</em> and
blood behaves as a shear-thinning fluid. In the capillaries (high shear rate),
rouleaux break up and individual cells deform to fit through vessels smaller
than their resting diameter — liquid behaviour on short length scales.</p>
<p><strong>Geophysics</strong>: The mantle is an elastic solid for seismic waves ($T \sim$
seconds, De ≫ 1) and a viscous fluid for convection ($T \sim 10^8$–$10^9$
years, De ≪ 1). The same material. Different Deborah numbers.</p>
<p><strong>Glaciology</strong>: Ice is an elastic solid for rapid fracture (calving of icebergs)
and a viscous fluid for glacier flow. The transition occurs at timescales of
years to decades, depending on temperature and stress.</p>
<p>The cat is in good company.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Fardin, M.-A. (2014). On the rheology of cats. <em>Rheology Bulletin</em>, 83(2),
16–17.</p>
</li>
<li>
<p>Reiner, M. (1964). The Deborah number. <em>Physics Today</em>, 17(1), 62.
<a href="https://doi.org/10.1063/1.3051374">https://doi.org/10.1063/1.3051374</a></p>
</li>
<li>
<p>Barnes, H.A., Hutton, J.F., &amp; Walters, K. (1989). <em>An Introduction to
Rheology.</em> Elsevier (Rheology Series, Vol. 3).</p>
</li>
<li>
<p>Bird, R.B., Armstrong, R.C., &amp; Hassager, O. (1987). <em>Dynamics of Polymeric
Liquids, Vol. 1: Fluid Mechanics</em> (2nd ed.). Wiley-Interscience.</p>
</li>
<li>
<p>Eggers, J. (1997). Nonlinear dynamics and breakup of free-surface flows.
<em>Reviews of Modern Physics</em>, 69(3), 865–930.
<a href="https://doi.org/10.1103/RevModPhys.69.865">https://doi.org/10.1103/RevModPhys.69.865</a></p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-12-15</strong>: Fixed Deborah number in summary from 0.08 to 0.008 (matching the body calculation: 5/600 = 0.00833).</li>
<li><strong>2025-12-15</strong>: Corrected Fardin&rsquo;s institutional affiliation from &ldquo;Paris Diderot University&rdquo; to &ldquo;ENS Lyon&rdquo; — his affiliation on the 2014 <em>Rheology Bulletin</em> paper is Université de Lyon / ENS Lyon (CNRS UMR 5672). He moved to Paris Diderot later in 2014, after the paper was published.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>When Musicians Lock In: Coupled Oscillators and the Physics of Ensemble Synchronisation</title>
      <link>https://sebastianspicker.github.io/posts/kuramoto-ensemble-sync/</link>
      <pubDate>Thu, 08 Feb 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/kuramoto-ensemble-sync/</guid>
      <description>Every ensemble faces the same physical problem: N oscillators with slightly different natural frequencies trying to synchronise through a shared coupling channel. The Kuramoto model — developed by a statistical physicist to describe fireflies, neurons, and power grids — applies directly to musicians. It predicts a phase transition between incoherence and synchrony, quantifies why latency destroys networked ensemble performance, and connects to recent EEG studies of inter-brain synchronisation.</description>
      <content:encoded><![CDATA[<p><em>The problem is ancient and the language for it is recent. In any ensemble — a
string quartet, a jazz rhythm section, an orchestra — musicians with slightly
different internal tempos must stay together. They do this by listening to each
other. But what, exactly, does &ldquo;listening to each other&rdquo; do to their timing? And
what happens when the listening channel is imperfect — delayed by the speed of
sound across a wide stage, or by a network cable crossing a continent? The answer
involves a differential equation that was not written to describe music.</em></p>
<p><em>This post extends the latency analysis in <a href="/posts/nmp-latency-lola-mvtp/">Latency in Networked Music
Performance</a> with the dynamical systems framework
that underlies it.</em></p>
<hr>
<h2 id="two-clocks-on-a-board">Two Clocks on a Board</h2>
<p>The first documented observation of coupled-oscillator synchronisation was made
not by a musician but by a physicist. In 1665, Christiaan Huygens, confined to
bed with illness, was watching two pendulum clocks mounted on the same wooden
beam. Over the course of the night, the pendulums had synchronised into
<em>anti-phase</em> oscillation — swinging in opposite directions in exact unison.
He reported it to his father:</p>
<blockquote>
<p>&ldquo;I have noticed a remarkable effect which no-one has observed before&hellip; two
clocks on the same board always end up in mutual synchrony.&rdquo;</p>
</blockquote>
<p>The mechanism was mechanical coupling through the beam. Each pendulum&rsquo;s swing
imparted a small impulse to the wood; the other pendulum felt this as a
perturbation to its rhythm. Small perturbations, accumulated over hours, drove
the clocks into a shared frequency and a fixed phase relationship.</p>
<p>This is the prototype of every ensemble synchronisation problem. Each musician
is a clock. The acoustic environment — the air in the room, the reflected sound
from the walls, the vibrations through the stage floor — is the wooden beam.</p>
<hr>
<h2 id="the-kuramoto-model">The Kuramoto Model</h2>
<p>Yoshiki Kuramoto formalised the mathematics of coupled oscillators in 1975,
motivated by biological synchronisation problems: firefly flashing, circadian
rhythms, cardiac pacemakers. His model considers $N$ oscillators, each with a
phase $\theta_i(t)$ evolving according to:</p>
$$\frac{d\theta_i}{dt} = \omega_i + \frac{K}{N} \sum_{j=1}^{N} \sin(\theta_j - \theta_i), \qquad i = 1, \ldots, N.$$<p>The first term, $\omega_i$, is the oscillator&rsquo;s <em>natural frequency</em> — the tempo it
would maintain in isolation. These are drawn from a distribution $g(\omega)$, which
in a real ensemble reflects the spread of individual preferred tempos among the
players. The second term is the coupling: each oscillator is attracted toward the
phases of all others, with strength $K/N$. The factor $1/N$ keeps the total
coupling intensive (independent of ensemble size) as $N$ grows large.</p>
<p>Musically: $\theta_i$ is the phase of musician $i$&rsquo;s internal pulse at a given
moment, $\omega_i$ is their preferred tempo if playing alone, and $K$ is the
coupling strength — how much they adjust their tempo in response to what they
hear from the others.</p>
<hr>
<h2 id="the-order-parameter-and-the-phase-transition">The Order Parameter and the Phase Transition</h2>
<p>To measure the degree of synchronisation, Kuramoto introduced the complex order
parameter:</p>
$$r(t)\, e^{i\psi(t)} = \frac{1}{N} \sum_{j=1}^{N} e^{i\theta_j(t)},$$<p>where $r(t) \in [0, 1]$ is the <em>coherence</em> of the ensemble and $\psi(t)$ is the
collective mean phase. When $r = 0$, the phases are uniformly spread around the
unit circle — the ensemble is incoherent. When $r = 1$, all phases coincide —
perfect synchrony. In a live ensemble, $r$ is a direct measure of rhythmic
cohesion, though of course not one you can read off a score.</p>
<p>Substituting the order parameter into the equation of motion:</p>
$$\frac{d\theta_i}{dt} = \omega_i + K r \sin(\psi - \theta_i).$$<p>Each oscillator now interacts only with the mean-field quantities $r$ and $\psi$,
not with every other oscillator individually. The coupling pulls each musician
toward the collective mean phase with a force proportional to both $K$ (how
attentively they listen) and $r$ (how coherent the group already is).</p>
<p>This mean-field form reveals the essential physics. For small $K$, oscillators
with widely differing $\omega_i$ cannot follow the mean field — they drift at
their own frequencies, and $r \approx 0$. At a critical coupling strength $K_c$,
a macroscopic fraction of oscillators suddenly locks to a shared frequency, and
$r$ begins to grow continuously from zero. For a unimodal,
symmetric frequency distribution $g(\omega)$ with density $g(\bar\omega)$ at the
mean:</p>
$$K_c = \frac{2}{\pi\, g(\bar\omega)}.$$<p>Above $K_c$, the coherence grows as:</p>
$$r \approx \sqrt{\frac{K - K_c}{K_c}}, \qquad K \gtrsim K_c.$$<p>This is a <strong>second-order (continuous) phase transition</strong> — the same
mathematical structure as a ferromagnet approaching the Curie temperature,
where spontaneous magnetisation appears continuously above a critical coupling.
The musical ensemble and the magnetic material belong to the same universality
class, governed by the same mean-field exponent $\frac{1}{2}$.</p>
<p>Above $K_c$, the fraction of oscillators that are <em>locked</em> (synchronised to the
mean-field frequency) can be computed explicitly. An oscillator with natural
frequency $\omega_i$ locks to the mean field if $|\omega_i - \bar\omega| \leq
Kr$. For a Lorentzian distribution $g(\omega) = \frac{\gamma/\pi}{(\omega -
\bar\omega)^2 + \gamma^2}$, this yields:</p>
$$r = \sqrt{1 - \frac{K_c}{K}}, \qquad K_c = 2\gamma,$$<p>which is the exact self-consistency equation for the Kuramoto model with
Lorentzian frequency spread (Strogatz, 2000).</p>
<p>The physical reading is direct: whether an ensemble locks into a shared pulse or
drifts apart is a threshold phenomenon. A group of musicians with similar
preferred tempos has a sharply peaked $g(\omega)$, hence a large $g(\bar\omega)$ and a low $K_c$: they
synchronise easily with minimal attentive listening. A group with widely varying
individual tempos needs stronger, more sustained coupling to cross the threshold.
This is not a matter of musical discipline; it is a material property of the
ensemble.</p>
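<p>The transition is easy to see in a direct simulation of the mean-field form above. A minimal Python sketch with Lorentzian-distributed natural frequencies, so that the measured coherence can be compared against the exact $r = \sqrt{1 - K_c/K}$; every parameter choice here ($N$, $\gamma$, $K$, the step size) is mine and purely illustrative:</p>
<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(0)

N = 2000            # oscillators ("musicians")
gamma = 0.5         # half-width of the Lorentzian g(omega)
K = 2.0             # coupling strength
Kc = 2.0 * gamma    # critical coupling for a Lorentzian spread
dt, steps = 0.01, 10000

# Natural frequencies from a Lorentzian (Cauchy) distribution, via the inverse CDF.
omega = gamma * np.tan(np.pi * (rng.random(N) - 0.5))
theta = 2.0 * np.pi * rng.random(N)

for _ in range(steps):
    z = np.mean(np.exp(1j * theta))        # complex order parameter r * exp(i*psi)
    r, psi = np.abs(z), np.angle(z)
    theta += dt * (omega + K * r * np.sin(psi - theta))  # mean-field Kuramoto step

print(f"simulated coherence r = {np.abs(np.mean(np.exp(1j * theta))):.3f}")
print(f"predicted coherence r = {np.sqrt(1.0 - Kc / K):.3f}")  # valid above Kc
</code></pre>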
<hr>
<h2 id="concert-hall-applause-neda-et-al-2000">Concert Hall Applause: Neda et al. (2000)</h2>
<p>The Kuramoto model is not only a theoretical construction. Neda et al. (2000)
applied it to concert hall applause — one of the most direct real-world
demonstrations of coupled-oscillator dynamics in a musical context.</p>
<p>They recorded applause in Romanian and Hungarian theaters and found that audiences
spontaneously alternate between two distinct states. In the <em>incoherent</em> regime,
each audience member claps at their own preferred rate (typically 2–3 Hz). Through
acoustic coupling — each person hears the room-averaged sound and adjusts their
clapping — the audience gradually synchronises to a shared, slower frequency
(around 1.5 Hz): the <em>synchronised</em> regime.</p>
<p>The transitions between the two regimes are quantitatively consistent with the
Kuramoto phase transition: the emergence of synchrony corresponds to $K$ crossing
$K_c$ as people progressively pay more attention to the collective sound.
Furthermore, Neda et al. document a characteristic phenomenon: synchrony is
repeatedly destroyed when the individual clapping frequency approximately
<em>doubles</em>, as audience members speed up to raise the average sound intensity.
The faster clapping comes with a relatively wider spread of natural frequencies
(a flatter $g(\omega)$), which pushes $K_c$ above the available coupling, and the
synchrony collapses until the audience slows down and locks in again.</p>
<p>The paper is a useful pedagogical artefact: every music student has experienced
concert hall applause, and hearing that it undergoes a physically measurable phase
transition makes the connection between physics and musical experience concrete.</p>
<hr>
<h2 id="latency-and-the-limits-of-networked-ensemble-performance">Latency and the Limits of Networked Ensemble Performance</h2>
<p>In standard acoustic ensemble playing, the coupling delay is the propagation time
for sound to cross the ensemble: at $343\ \text{m/s}$, across a ten-metre stage,
roughly 30 ms. This is why orchestral seating is arranged with attention to who
needs to hear whom first.</p>
<p>In networked music performance (NMP), the coupling delay $\tau$ is much larger:
tens to hundreds of milliseconds depending on geographic distance and network
infrastructure. The Kuramoto model generalises naturally to include this delay:</p>
$$\frac{d\theta_i}{dt} = \omega_i + \frac{K}{N} \sum_{j=1}^{N} \sin\!\bigl(\theta_j(t - \tau) - \theta_i(t)\bigr).$$<p>Each musician hears the others&rsquo; phases as they were $\tau$ seconds ago, not as
they are now.</p>
<p>In a synchronised state where all oscillators share the collective frequency
$\bar\omega$ and phase $\psi(t) = \bar\omega t$, the delayed phase signal is
$\psi(t - \tau) = \bar\omega t - \bar\omega\tau$. The effective coupling
force contains a factor $\cos(\bar\omega\tau)$: the delay introduces a phase
shift that reduces the useful component of the coupling. The critical coupling
with delay is therefore:</p>
$$K_c(\tau) = \frac{K_c(0)}{\cos(\bar\omega \tau)}.$$<p>As $\tau$ increases, $K_c(\tau)$ grows: synchronisation requires progressively
stronger coupling (more attentive adjustment) to compensate for the information
lag. The denominator $\cos(\bar\omega\tau)$ reaches zero when
$\bar\omega\tau = \pi/2$. At this point $K_c(\tau) \to \infty$: no finite coupling
strength can maintain synchrony. The critical delay is:</p>
$$\tau_c = \frac{\pi}{2\bar\omega}.$$<p>For an ensemble performing at 120 BPM, the beat frequency is
$\bar\omega = 2\pi \times 2\ \text{Hz} = 4\pi\ \text{rad/s}$:</p>
$$\tau_c = \frac{\pi}{2 \times 4\pi} = \frac{1}{8}\ \text{s} = 125\ \text{ms}.$$<p>This is a remarkably clean result. The Kuramoto model with delay predicts that
ensemble synchronisation collapses at around 125 ms one-way delay for a standard
performance tempo. The empirical literature on NMP — from LoLa deployments across
European conservatories to controlled latency studies in the lab — consistently
finds that rhythmic coherence degrades noticeably above 50–80 ms and becomes
essentially unworkable above 100–150 ms one-way. The model and the data agree.</p>
<p>The derivation also shows why faster tempos are harder in NMP: $\tau_c \propto
1/\bar\omega$, so doubling the tempo halves the tolerable latency. An ensemble
performing at 240 BPM in a distributed setting faces a theoretical ceiling of
62 ms — which rules out transcontinental performance for most repertoire.</p>
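<p>The tempo dependence is worth tabulating explicitly. A minimal sketch of $\tau_c = \pi / (2\bar\omega)$ for a few common tempos:</p>
<pre><code class="language-python">import math

# Critical one-way delay tau_c = pi / (2 * omega_bar), with omega_bar the beat
# frequency in rad/s, following the delayed-Kuramoto argument above.
for bpm in (60, 120, 180, 240):
    omega_bar = 2.0 * math.pi * (bpm / 60.0)   # beats per minute to rad/s
    tau_c_ms = 1000.0 * math.pi / (2.0 * omega_bar)
    print(f"{bpm:3d} BPM   tau_c = {tau_c_ms:6.1f} ms")
</code></pre>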
<hr>
<h2 id="brains-in-sync-eeg-hyperscanning">Brains in Sync: EEG Hyperscanning</h2>
<p>The Kuramoto framework has recently been applied at a neural level.
EEG hyperscanning — simultaneous EEG recording from multiple participants during
a shared musical activity — has shown that musicians performing together exhibit
<em>inter-brain synchronisation</em>: coherent cortical oscillations at the frequency of
the music are measurable between players (Lindenberger et al., 2009; Müller et
al., 2013). The phase coupling between brains during joint performance is
significantly higher than during solo performance and higher than for musicians
playing simultaneously but without acoustic coupling.</p>
<p>This suggests that the Kuramoto coupling operates at two levels: the acoustic
(each musician hears the other and adjusts physical timing) and the neural (each
musician&rsquo;s cortical oscillators entrain to the shared musical pulse). The
question of which level is primary — whether neural synchrony causes or follows
from acoustic synchrony — remains open.</p>
<p>A 2023 review by Demos and Palmer argues that pairwise Kuramoto-type coupling is
insufficient to capture full ensemble dynamics. Group-level effects — the
differentiation between leader and follower roles, the emergence of collective
timing that no individual would produce alone — require nonlinear dynamical
frameworks that go beyond mean-field averaging. The model that adequately
describes a string quartet may need to be richer than the one that describes a
population of identical fireflies.</p>
<hr>
<h2 id="what-this-means-for-teaching">What This Means for Teaching</h2>
<p>The Kuramoto model reframes standard rehearsal intuitions in physical terms.</p>
<p><strong>&ldquo;Listen more&rdquo;</strong> translates to &ldquo;increase your effective coupling constant $K$.&rdquo;
A musician who plays without attending to others has set $K \approx 0$ and will
drift freely according to their own $\omega_i$. Listening — actively adjusting
tempo in response to what you hear — is not metaphorical. It is the physical
mechanism of coupling, and its effect is to pull you toward the mean phase $\psi$
with a force $Kr\sin(\psi - \theta_i)$.</p>
<p><strong>&ldquo;Our tempos are too different&rdquo;</strong> is a claim about the width of $g(\omega)$ and therefore
about $K_c$. A group with a wide spread of natural tempos needs more and stronger
listening to synchronise. This is not a moral failing but a parameter; it
suggests that ensemble warm-up time or explicit tempo negotiation before a
performance serves to reduce the spread of natural frequencies before the coupling
has to do all the work.</p>
<p><strong>Latency as a rehearsal experiment</strong> can be made explicit. Artificially delaying
the acoustic return to one musician in an ensemble — via headphone monitoring with
variable delay — allows students to experience directly how the coordination
degrades as $\tau$ increases toward $\tau_c$. They feel the system approaching
the phase transition without the theoretical framework, but the framework makes
the experience interpretable afterward.</p>
<p><strong>The click track</strong> replaces peer-to-peer Kuramoto coupling with an external
forcing term: each musician locks to a shared reference with fixed $\omega$
rather than adjusting dynamically to the group mean. This eliminates the phase
transition but also eliminates the adaptive dynamics — the micro-timing
fluctuations and expressive rubato — that characterise live ensemble playing. It
is a pedagogically important distinction, even if studios routinely make the
pragmatic choice.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Demos, A. P., &amp; Palmer, C.
(2023). Social and nonlinear dynamics unite: Musical group synchrony. <em>Trends
in Cognitive Sciences</em>, 27(11), 1008–1018.
<a href="https://doi.org/10.1016/j.tics.2023.08.005">https://doi.org/10.1016/j.tics.2023.08.005</a></p>
</li>
<li>
<p>Huygens, C. (1665). Letter to his father Constantijn Huygens, 26 February
1665. In <em>Œuvres complètes de Christiaan Huygens</em>, Vol. 5, p. 243. Martinus
Nijhoff, 1893.</p>
</li>
<li>
<p>Kuramoto, Y. (1975). Self-entrainment of a population of coupled non-linear
oscillators. In H. Araki (Ed.), <em>International Symposium on Mathematical
Problems in Theoretical Physics</em> (Lecture Notes in Physics, Vol. 39,
pp. 420–422). Springer.</p>
</li>
<li>
<p>Kuramoto, Y. (1984). <em>Chemical Oscillations, Waves, and Turbulence.</em> Springer.</p>
</li>
<li>
<p>Lindenberger, U., Li, S.-C., Gruber, W., &amp; Müller, V. (2009). Brains swinging
in concert: Cortical phase synchronization while playing guitar.
<em>BMC Neuroscience</em>, 10, 22. <a href="https://doi.org/10.1186/1471-2202-10-22">https://doi.org/10.1186/1471-2202-10-22</a></p>
</li>
<li>
<p>Müller, V., Sänger, J., &amp; Lindenberger, U. (2013). Intra- and inter-brain
synchronization during musical improvisation on the guitar. <em>PLOS ONE</em>, 8(9),
e73852. <a href="https://doi.org/10.1371/journal.pone.0073852">https://doi.org/10.1371/journal.pone.0073852</a></p>
</li>
<li>
<p>Neda, Z., Ravasz, E., Vicsek, T., Brechet, Y., &amp; Barabási, A.-L. (2000).
Physics of the rhythmic applause. <em>Physical Review E</em>, 61(6), 6987–6992.
<a href="https://doi.org/10.1103/PhysRevE.61.6987">https://doi.org/10.1103/PhysRevE.61.6987</a></p>
</li>
<li>
<p>Strogatz, S. H. (2000). From Kuramoto to Crawford: Exploring the onset of
synchronization in populations of coupled oscillators. <em>Physica D: Nonlinear
Phenomena</em>, 143(1–4), 1–20.
<a href="https://doi.org/10.1016/S0167-2789(00)00094-4">https://doi.org/10.1016/S0167-2789(00)00094-4</a></p>
</li>
<li>
<p>Strogatz, S. H. (2003). <em>Sync: How Order Emerges from Chaos in the Universe,
Nature, and Daily Life.</em> Hyperion.</p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-01-14</strong>: Updated the author list for the Demos (2023) <em>Trends in Cognitive Sciences</em> reference to the published two authors (Demos &amp; Palmer). The five names previously listed were from a different Demos paper.</li>
<li><strong>2026-01-14</strong>: Changed &ldquo;period-doubling&rdquo; to &ldquo;frequency-doubling.&rdquo; When the clapping frequency doubles, the period halves; &ldquo;frequency-doubling&rdquo; is the precise term in this context.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>LK-99: Six Weeks That Showed How Physics Works</title>
      <link>https://sebastianspicker.github.io/posts/lk99-preprint-physics-sociology/</link>
      <pubDate>Mon, 09 Oct 2023 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/lk99-preprint-physics-sociology/</guid>
      <description>On July 22, 2023, a Korean preprint claimed that LK-99 — a copper-doped lead apatite — was a room-temperature, ambient-pressure superconductor. Within six weeks, the claim was definitively falsified. The episode is more interesting for what it revealed about the sociology of science than for the compound itself: how a global community self-corrected at extraordinary speed, and how the media managed to fail at conveying uncertainty despite watching it happen in real time.</description>
      <content:encoded><![CDATA[<h2 id="july-22-2023">July 22, 2023</h2>
<p>On a Saturday morning in late July 2023, two preprints appeared on arXiv. They were submitted by researchers affiliated with the Quantum Energy Research Centre in Seoul — Sukbae Lee, Ji-Hoon Kim, and colleagues — and they claimed something that condensed matter physicists have been chasing for over a century: a material that superconducts at room temperature and ambient pressure.</p>
<p>The compound was called LK-99. It was a copper-doped lead apatite, synthesized from common precursors using a procedure that, on paper, any moderately equipped laboratory could attempt. The claimed critical temperature was above 400 K — well above 293 K, which is room temperature, which is roughly the temperature of a warm afternoon in Seoul in July.</p>
<p>A video circulated almost immediately. A small, grey, irregular piece of LK-99 appeared to be partially levitating — tilting up, one end raised — above a permanent neodymium magnet. In the video it wobbles slightly, like something caught between gravity and an invisible hand.</p>
<p>Physics Twitter — I will use that name; it was still recognizably that in July 2023 — detonated. Within 72 hours, laboratories across the world were racing to synthesize LK-99. Discord servers formed. GitHub repositories appeared with shared synthesis protocols. Preprints from independent groups began accumulating before the original authors had likely had a good night&rsquo;s sleep.</p>
<p>Six weeks later, the claim was dead.</p>
<p>I want to write about what happened in those six weeks, because I think the episode is more interesting as sociology of science than as condensed matter physics. LK-99 turned out to be a modest semiconductor with a ferromagnetic impurity. But the speed and the manner of that determination — the way a globally distributed community of physicists organized itself, shared data in real time, converged on a falsification, and then moved on — that is genuinely remarkable, and worth examining carefully.</p>
<h2 id="why-room-temperature-superconductivity-is-the-grail">Why Room-Temperature Superconductivity Is the Grail</h2>
<p>Let me be precise about why this particular claim generates the response it does.</p>
<p>Superconductivity is the phenomenon in which certain materials, below a critical temperature T<sub>c</sub>, carry electrical current with exactly zero resistance. Not very low resistance — zero. A current established in a superconducting loop will, in principle, continue flowing indefinitely without any driving voltage. This is not a small quantitative improvement over ordinary conductors; it is a qualitatively different regime of physics.</p>
<p>The trouble is that essentially all known superconductors require extreme cooling. Conventional metallic superconductors — the ones Heike Kamerlingh Onnes discovered in mercury in 1911 — become superconducting below about 30 K at best. That is liquid helium temperature, which is expensive, logistically demanding, and entirely impractical for large-scale applications. The discovery of high-temperature cuprate superconductors in 1986 (Bednorz and Müller, Nobel Prize 1987) was genuinely revolutionary: some cuprates superconduct up to about 138 K. But 138 K is still −135°C. It requires liquid nitrogen cooling, which is cheaper than liquid helium but still not something you install in a power grid without substantial infrastructure.</p>
<p>The most spectacular recent claims involve a class of hydrogen-rich compounds under extreme pressure — carbonaceous sulfur hydride at roughly 15°C, but requiring about 267 GPa of pressure (a claim that has since been retracted, as discussed below). Even the better-established hydride results require pressures of well over a hundred gigapascals. For context, the pressure at the center of the Earth is about 360 GPa. You cannot run a power cable through a diamond anvil cell.</p>
<p>Room-temperature, ambient-pressure superconductivity would be transformative in a way that very few material discoveries are. Electrical grids currently lose somewhere between 5 and 10 percent of all transmitted energy to resistive heating — a staggering quantity of energy, simply dissipated as heat in cables. Zero-resistance transmission would eliminate that loss. Magnetically levitated transport would become feasible without the cryogenic infrastructure that makes current Maglev systems enormously expensive to build and maintain. Compact, affordable MRI machines would become possible. Effects on computing, on energy storage, on medical technology — the list runs long. It would be one of the most consequential material discoveries in the history of technology.</p>
<p>This is why the response to the LK-99 preprints was not hysteria but rather the entirely rational behavior of a community that understood exactly what was at stake if the claim were true.</p>
<h2 id="what-lk-99-was-and-what-it-claimed">What LK-99 Was and What It Claimed</h2>
<p>LK-99 is chemically expressed as Pb₁₀₋ₓCuₓ(PO₄)₆O, where x is approximately 0.9 to 1.1. It is a lead apatite — the same crystal family as the mineral in tooth enamel — with a fraction of the lead atoms replaced by copper.</p>
<p>The proposed mechanism, as sketched in the preprints, involved Cu²⁺ substituting for Pb²⁺. Because copper has a slightly smaller ionic radius than lead, this substitution induces a local structural distortion. The claim was that this distortion produces a flat electronic band at the Fermi level — and flat bands are associated with strong electronic correlations that can, in principle, give rise to unconventional superconductivity. The analogy to twisted bilayer graphene was implicit in the discussion, though the mechanism is quite different and magic-angle twisted bilayer graphene superconducts only at around 1–2 K.</p>
<p>Reading the preprints in late July 2023 was, I confess, a slightly uncomfortable experience. The writing was rushed. The two preprints — submitted by different author subsets from the same group — were internally inconsistent in places. The resistance measurements showed a large drop with temperature, but not zero resistance. The synthesis protocol was described in enough detail to be reproducible, which was good, but the characterization was incomplete in ways that mattered.</p>
<p>Red flags were present from the beginning, and many physicists noted them immediately. The levitation video showed a piece of LK-99 that was tilted and wobbling — not the stable, complete expulsion of magnetic flux you would expect from a true Meissner effect. A perfect superconductor placed above a magnet would levitate horizontally and stably. This piece was doing something, but the something was not obviously Meissner levitation.</p>
<p>And yet. The synthesis was simple. The claim was specific and testable. If there was even a small chance it was real, the imperative to check was overwhelming. So labs checked.</p>
<h2 id="the-replication-wave">The Replication Wave</h2>
<p>What happened over the following weeks was, as far as I am aware, unprecedented in condensed matter physics.</p>
<p>Normally, a replication in physics looks like this: a group reads a paper, decides it is interesting enough to attempt, orders precursor materials, synthesizes the compound (which takes weeks to months), characterizes it with appropriate instruments (more weeks), writes up the results, submits them (more weeks), and eventually publishes — often six months to a year after the original claim, sometimes much longer. The feedback cycle is slow by design: slowness is a feature, not a bug, because it allows careful work rather than hasty work.</p>
<p>The LK-99 replication did not look like this.</p>
<p>Within a week, preprints from independent groups — China, India, the United States, Germany — were appearing on arXiv. Discord servers with hundreds of members were organizing synthesis attempts in real time, sharing thermograms, resistance measurements, and microscope images as they came off instruments. Twitter threads tracked emerging results with the urgency of a live event. A GitHub repository maintained by the community accumulated synthesis protocols, shared data files, and links to new preprints as they appeared.</p>
<p>Some groups reported partial levitation. Others reported anomalous resistance drops. Others — starting almost immediately — reported synthesizing the material and finding nothing unusual at all.</p>
<p>The speed of this was extraordinary not because of any particular organizational effort, but because the incentive structure happened to align with the infrastructure that now exists. Preprints made sharing immediate. Social media made results public the moment they existed. The synthesis was simple enough to attempt in any reasonably equipped solid-state chemistry lab. And the motivation — the prize, if it were real — was enormous. You would not need to tell anyone to work on this. You would have to tell people to stop.</p>
<p>By mid-August 2023 — three weeks after the original preprints — the key debunking papers had appeared. By late August, there was no serious scientific debate remaining.</p>
<h2 id="the-mechanism-of-falsification">The Mechanism of Falsification</h2>
<p>The levitating video was explained first, and the explanation is both mundane and instructive.</p>
<p>The LK-99 synthesis produces, as an essentially unavoidable impurity, copper sulfide — Cu₂S. Copper sulfide is interesting in its own right: it undergoes a structural phase transition at roughly 105°C (378 K) from a low-temperature chalcocite form to a high-temperature superionic conductor. This transition is accompanied by a large, sharp drop in electrical resistance — exactly the kind of anomalous feature that, in a sample of mixed composition, might be misidentified as a superconducting transition.</p>
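<p>To see how an impurity step can masquerade as a &ldquo;transition&rdquo;, here is a deliberately crude numerical sketch: a mixed-phase sample modelled as a semiconducting apatite resistance in series with a Cu₂S resistance that drops sharply at its 378 K structural transition. Every number in it is an illustrative placeholder of my own choosing, not a fit to any published measurement.</p>
<pre><code class="language-python">import numpy as np

def r_apatite(T):
    """Smooth, activated (semiconductor-like) resistance of the apatite matrix, arb. units."""
    return 5.0 * np.exp(1500.0 / T)

def r_cu2s(T):
    """Cu2S-like resistance with a sharp drop at the ~378 K structural transition, arb. units."""
    step = 0.5 * (1.0 + np.tanh((378.0 - T) / 1.0))   # roughly 1 below 378 K, roughly 0 above
    return 20.0 + 400.0 * step

T = np.linspace(330.0, 430.0, 11)
r_total = r_apatite(T) + r_cu2s(T)     # the two phases in series along the measured path

for Ti, Ri in zip(T, r_total):
    print(f"T = {Ti:5.1f} K   R_total = {Ri:7.1f} (arb. units)")
# The curve shows a large, sharp drop centred near 378 K on top of a smooth
# semiconducting background. It never approaches zero: neither phase superconducts.
</code></pre>
<p>In a real sample the geometry is messier, of course, but the qualitative point survives: a minority phase in the current path can produce a dramatic resistance anomaly that has nothing to do with superconductivity.</p>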
<p>More importantly for the levitation: the LK-99 synthesis products ubiquitously contain ferromagnetic impurity phases. A ferromagnetic material will interact with a permanent magnet. Partial levitation, tilted and unstable, is entirely consistent with a ferromagnetic-diamagnetic competition — not with the Meissner effect.</p>
<p>Several groups published debunking papers in rapid succession. Kumar and colleagues (<a href="#ref-Senapati2023">Kumar et al., 2023</a>) reported the absence of superconductivity in LK-99 samples; other groups synthesized Cu₂S independently, confirmed its resistance anomaly near 380 K, and showed quantitatively that the LK-99 observations were fully consistent with Cu₂S contamination and ferromagnetic impurities. Liu and Meng (<a href="#ref-LiuMeng2023">Liu &amp; Meng, 2023</a>) provided a complementary symmetry analysis explaining why the structural distortion mechanism did not actually predict superconductivity.</p>
<p>Several Chinese groups with high-quality synthesis capabilities — and, frankly, strong motivation to find a positive result — produced very pure LK-99 samples and found what you would expect of a clean lead apatite: a semiconductor with modest diamagnetism. Nothing anomalous. When you removed the Cu₂S impurity, you removed the anomaly.</p>
<p>Daniel Garisto summarized the consensus in a <em>Nature</em> news piece in August 2023 (<a href="#ref-Garisto2023">Garisto, 2023</a>): LK-99 is not a superconductor. The case was closed, with an efficiency that the scientific community should be proud of.</p>
<h2 id="a-useful-contrast-ranga-dias">A Useful Contrast: Ranga Dias</h2>
<p>The LK-99 episode does not exist in isolation. The preceding years had seen other extraordinary claims of room-temperature or near-room-temperature superconductivity, and the most prominent involved Ranga Dias at the University of Rochester.</p>
<p>Dias published two papers in <em>Nature</em> claiming superconductivity at or near room temperature: one in 2020, describing carbonaceous sulfur hydride at roughly 15°C under 267 GPa (<a href="#ref-Snider2020">Snider et al., 2020</a>), and one in 2023, describing nitrogen-doped lutetium hydride under much lower pressure. I note that the earlier Dias and Silvera <em>Science</em> paper on metallic hydrogen (<a href="#ref-DiasSilvera2017">Dias &amp; Silvera, 2017</a>) had already received a significant erratum and been widely questioned — establishing a pattern. Both <em>Nature</em> papers were eventually retracted — the 2020 paper in 2022, the 2023 paper in November 2023 — amid serious and credible allegations of data manipulation. The criticisms included statistical anomalies in background signals, apparent image duplication across different experimental conditions, and raw data that did not match the published figures. Hirsch, who had been following these claims closely, documented many of the irregularities (<a href="#ref-Hirsch2021">Hirsch, 2021</a>).</p>
<p>The contrast with LK-99 is worth sitting with. The Korean team appears to have been guilty of honest overreach: genuine excitement about anomalous observations, insufficient characterization before posting, motivated interpretation of ambiguous data. This happens in science. Extraordinary rewards for being right create extraordinary pressure to believe you are right. The LK-99 researchers may have seen something they genuinely could not explain and convinced themselves it was what they hoped it was.</p>
<p>The Dias case, if the allegations of data manipulation are accurate — and the retractions, and the University of Rochester investigation that followed, suggest they have merit — is something different: not motivated misinterpretation but deliberate fabrication. The scientific outcomes are superficially similar: both sets of claims were false, both caused the community to expend significant effort on falsification, both damaged the credibility of the field. But the causes, and the appropriate institutional and moral responses, differ substantially.</p>
<p>How do you tell them apart in real time? In both cases, you had extraordinary claims that passed initial peer review at prestigious venues. In both cases, independent replication failed. The LK-99 falsification came faster, partly because the synthesis was simpler and partly because the community mobilized more broadly. The Dias case took years, and the data manipulation allegations required access to raw data that the research group was slow to provide.</p>
<p>I do not have a clean answer. The difference in mechanism — honest error versus alleged fraud — is not directly observable from the outside. What you can observe is willingness to share data, consistency of results across different instruments and laboratories, and whether the research group facilitates or obstructs independent verification. On those criteria, the LK-99 group and the Dias group look quite different.</p>
<h2 id="the-sociology-of-what-happened">The Sociology of What Happened</h2>
<p>Let me step back from the physics and say something about what the LK-99 episode reveals about how science actually functions.</p>
<p>The first thing it reveals is that community self-correction works, and now works at extraordinary speed when the incentive is high enough. The coordinated global replication was not organized by any institution, any journal, any funding body. It emerged spontaneously from a community that understood what was at stake and had the tools — preprint servers, social media, Discord, GitHub — to coordinate without central direction. The result was a falsification that, in a previous era, might have taken two to five years, completed in six weeks. That is remarkable.</p>
<p>The second thing it reveals is that the preprint revolution is real and consequential. The LK-99 preprints bypassed traditional peer review entirely. That could be bad — and in principle, a false claim could propagate further and faster without peer review as a gate. In practice, in this case, removing the gate allowed not just the false claim but its falsification to move at the same speed. Peer review, as it is normally practiced, is too slow to respond to a claim like this on a timescale that matters. The community replaced it with something faster: immediate, distributed, adversarial review by people with direct experimental access to the question.</p>
<p>This is not an argument against peer review. It is an argument that peer review in the traditional sense — two or three reviewers reading a manuscript over a few weeks — is not the only form that meaningful scientific scrutiny takes.</p>
<p>The third thing the episode reveals is that social media&rsquo;s role in science communication is deeply ambivalent. Twitter accelerated the spread of both the original claim and the debunking. The community of physicists on Twitter was, on the whole, appropriately skeptical from the first day — I saw many threads on July 22 and 23 that noted the red flags I mentioned above: the tilted levitation, the non-zero resistance, the inconsistencies between the two preprints. But that skepticism was invisible to most science journalists, who were looking at the same videos and preprints and reading the excitement rather than the caveats.</p>
<h2 id="the-media-and-the-calibration-problem">The Media, and the Calibration Problem</h2>
<p>I want to be specific about the media failure, because I think it matters.</p>
<p>The appropriate headline on July 23, 2023 was something like: &ldquo;Korean researchers post preprints claiming room-temperature superconductivity; claim is extraordinary and unverified; replication underway.&rdquo; That headline is accurate. It conveys the genuine excitement — because the claim, if true, would be extraordinary — while conveying the appropriate uncertainty about an unverified preprint from a single group.</p>
<p>The headlines that actually appeared, across outlets that should know better, included &ldquo;Room-temperature superconductor discovered&rdquo; and &ldquo;Scientists may have created the holy grail of energy.&rdquo; These are not accurate. They convey neither the uncertainty nor the specific nature of the claim. They treat a preprint as a discovery.</p>
<p>This is a calibration failure — the same kind of failure I have written about in other contexts. On this blog, I have discussed how LLMs can fail catastrophically when they lack the context to assess whether their confident-sounding output is grounded in anything real (<a href="/posts/car-wash-grounding/">see the car-wash post</a>, and more generally the discussion of context and grounding in <a href="/posts/more-context-not-always-better/">more context is not always better</a>). The mechanism in journalism is different but the structure is the same: confidence that is not appropriately calibrated to evidence.</p>
<p>The Bayesian structure of the situation was, or should have been, clear. The prior probability of a room-temperature, ambient-pressure superconductor being found in any given week is very small — not because room-temperature superconductors are impossible, but because such discoveries do not happen often and many previous claims have failed. Call that prior probability low. Against that prior, what evidence did we have on July 23? A video showing partial, unstable levitation — which, as I noted, is not what Meissner levitation looks like. Two rushed preprints that disagreed with each other in some details. No independent replication. P(levitation video | not a superconductor) was not particularly small, as the Cu₂S explanation would later demonstrate. So the posterior probability that LK-99 was a room-temperature superconductor, given the evidence available on July 23, was not meaningfully higher than the prior — which was low.</p>
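<p>To make the arithmetic explicit, here is the update as a few lines of code. The prior and both likelihoods are assumptions of mine, chosen only to illustrate the structure of the argument, not numbers anyone should defend to two significant figures.</p>
<pre><code class="language-python"># Minimal Bayes update for "LK-99 is a room-temperature superconductor",
# given the evidence available on July 23, 2023. All numbers are illustrative
# assumptions, not measured quantities.

prior = 1e-4                 # P(a genuine RT superconductor is announced in a given week): very low
p_evidence_if_true = 0.9     # P(tilted levitation video + rushed preprints | true claim)
p_evidence_if_false = 0.3    # P(same evidence | not a superconductor): Cu2S plus ferromagnetic
                             # impurities make this kind of footage quite easy to produce

# Bayes' theorem: P(true | evidence) = P(evidence | true) P(true) / P(evidence)
evidence = p_evidence_if_true * prior + p_evidence_if_false * (1.0 - prior)
posterior = p_evidence_if_true * prior / evidence

print(f"prior     = {prior:.6f}")
print(f"posterior = {posterior:.6f}")   # about 0.0003: three times the prior, still tiny
</code></pre>
<p>The particular values do not matter. What matters is the shape of the result: when the evidence is nearly as likely under the boring hypothesis as under the exciting one, a small prior stays small.</p>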
<p>A well-calibrated science journalist would not have written &ldquo;Room-temperature superconductor discovered.&rdquo; A well-calibrated scientist — and many of them said exactly this — would have written &ldquo;interesting claim, requires replication, maintain high skepticism.&rdquo; The scientific community was, on the whole, well-calibrated. The journalism was not.</p>
<p>This is not a new observation. Science journalists have been criticized for overclaiming since there have been science journalists. But the LK-99 episode is a particularly clean example because the timescale was so short: the calibration failure in the media and the calibration success in the scientific community happened simultaneously, in full public view, and could be compared directly.</p>
<p>I write occasionally about AI systems and their tendency to produce confident outputs that are not grounded in evidence — a form of miscalibration that is particularly dangerous because the confident tone is not a signal of accuracy (<a href="/posts/ai-detectors-systematic-minds/">a theme that runs through recent posts on this blog</a>). The LK-99 episode is a reminder that miscalibration is not unique to neural networks. It is a general failure mode in any system that needs to estimate uncertainty about claims — human, institutional, or artificial. The cure in all cases is the same: track confidence to evidence, update on data, resist the pull of exciting priors.</p>
<h2 id="what-the-scientific-community-actually-did">What the Scientific Community Actually Did</h2>
<p>I want to be careful not to end on a note of pure cynicism about the media and leave the scientific community looking saintly. The community is not saintly.</p>
<p>There were preprints from independent groups that claimed positive results before the falsification was clear — groups that perhaps saw anomalies and wanted to be part of the story. There was social pressure, documented in real time on Twitter, to share exciting results before they were fully analyzed. The Discord servers and GitHub repositories that were genuinely useful for coordination were also, occasionally, vectors for misinformation and premature interpretation.</p>
<p>The community self-corrected. That is the important thing. The noise in the system resolved into a clear answer, in six weeks, through a process that was adversarial in the best scientific sense: many people trying to verify or refute a specific testable claim, sharing data openly, calling out methodological problems in public. The answer that emerged was correct.</p>
<p>I find this genuinely impressive. It is easy to be cynical about institutional science — about publication bias, about the replication crisis in psychology and medicine, about the incentive structures that reward novelty over rigor. The LK-99 episode is a counter-example. It is evidence that, when a question is clear and testable and the stakes are high, the system works. Not perfectly, not without noise, but functionally.</p>
<p>Peer review in the classical sense was absent. Peer review in a broader sense — global, immediate, public, adversarial — worked faster than any journal could have managed, and reached a correct conclusion.</p>
<h2 id="the-next-extraordinary-claim">The Next Extraordinary Claim</h2>
<p>LK-99 is over. The compound will appear in future textbooks, probably in a sidebar about famous failed claims in condensed matter physics, alongside Schön and Dias and others. The researchers who synthesized and characterized it honestly will get some credit for the negative result; the original Korean team will, I imagine, have a difficult few years professionally.</p>
<p>The question I am left with is what happens next time.</p>
<p>Room-temperature superconductivity will, almost certainly, be claimed again. The prize is too large and the search too active. Possibly the claim will be correct — I would not put that probability at zero. More likely it will be another false positive, another Cu₂S lurking in the impurity profile.</p>
<p>Will the media learn from LK-99? I am genuinely uncertain. The incentive structure for science journalism rewards excitement over accuracy, and &ldquo;extraordinary claim requires replication&rdquo; is a less clickable headline than &ldquo;room-temperature superconductor discovered.&rdquo; The journalists who wrote those headlines were not stupid; they were responding rationally to the incentives of their profession.</p>
<p>Will the scientific community respond as effectively? I think so, at least for claims of this kind: testable, synthesis-based, with enough labs in the world capable of attempting replication. The infrastructure — preprints, Discord, shared repositories — exists and is now demonstrated to work. The speed of the LK-99 falsification sets a kind of benchmark.</p>
<p>What the episode showed, in the end, is not that science is infallible or that the system is without problems. It showed that, under the right conditions — a clear empirical question, a distributed community with the tools and motivation to address it, and a culture of open data sharing — science can self-correct at remarkable speed. The failure was in communication, not in the science. That is a meaningful distinction.</p>
<p>Whether the media will have learned anything by the time the next extraordinary claim appears — that, I confess, I doubt.</p>
<h2 id="references">References</h2>
<ul>
<li>
<p><span id="ref-LeeKim2023"></span>Lee, S., Kim, J. H., &amp; Kwon, Y.-W. (2023). The First Room-Temperature Ambient-Pressure Superconductor. <em>arXiv</em>:2307.12008. Retrieved from <a href="https://arxiv.org/abs/2307.12008">https://arxiv.org/abs/2307.12008</a></p>
</li>
<li>
<p><span id="ref-Senapati2023"></span>Kumar, K., Surface, N. B., &amp; Baral, B. (2023). Absence of superconductivity in LK-99 at ambient conditions. <em>arXiv</em>:2308.03544. Retrieved from <a href="https://arxiv.org/abs/2308.03544">https://arxiv.org/abs/2308.03544</a></p>
</li>
<li>
<p><span id="ref-LiuMeng2023"></span>Liu, S., &amp; Meng, S. (2023). Symmetry-breaking and the origin of the anomalous properties of LK-99. <em>arXiv</em>:2308.05135. Retrieved from <a href="https://arxiv.org/abs/2308.05135">https://arxiv.org/abs/2308.05135</a></p>
</li>
<li>
<p><span id="ref-Garisto2023"></span>Garisto, D. (2023). LK-99 isn&rsquo;t a superconductor — how science sleuths solved the mystery. <em>Nature</em>, 620, 705–706. <a href="https://doi.org/10.1038/d41586-023-02585-7">DOI: 10.1038/d41586-023-02585-7</a></p>
</li>
<li>
<p><span id="ref-Snider2020"></span>Snider, E., Dasenbrock-Gammon, N., McBride, R., Debessai, M., Vindana, H., Vencatasamy, K., Lawler, K. V., Salamat, A., &amp; Dias, R. P. (2020). Room-temperature superconductivity in a carbonaceous sulfur hydride. <em>Nature</em>, 586, 373–377. <a href="https://doi.org/10.1038/s41586-020-2801-z">DOI: 10.1038/s41586-020-2801-z</a> (Retracted 2022.)</p>
</li>
<li>
<p><span id="ref-DiasSilvera2017"></span>Dias, R. P., &amp; Silvera, I. F. (2017). Observation of the Wigner-Huntington transition to metallic hydrogen. <em>Science</em>, 355, 715–718. <a href="https://doi.org/10.1126/science.aal1579">DOI: 10.1126/science.aal1579</a> (Erratum published 2017; widely questioned.)</p>
</li>
<li>
<p><span id="ref-Hirsch2021"></span>Hirsch, J. E. (2021). Rejoinder to &ldquo;Comment on &lsquo;Absence of magnetic evidence for superconductivity in hydride compounds&rsquo;&rdquo; by Dias and Salamat. <em>Physica C</em>, 590, 1353964. <a href="https://doi.org/10.1016/j.physc.2021.1353964">DOI: 10.1016/j.physc.2021.1353964</a></p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-09-14</strong>: Updated the Cu₂S characterisation: pure Cu₂S is diamagnetic; the ferromagnetism in LK-99 samples comes from impurity phases. Updated the Dias &amp; Silvera 2017 <em>Science</em> paper status: it received an erratum but was not formally retracted (unlike the 2020 and 2023 <em>Nature</em> papers). Updated the Senapati et al. reference to the correct LK-99 debunking literature (the previous arXiv ID resolved to a different paper).</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Zero Angular Momentum: The Falling Cat and the Geometry of Shape Space</title>
      <link>https://sebastianspicker.github.io/posts/falling-cat-geometric-phase/</link>
      <pubDate>Tue, 03 Oct 2023 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/falling-cat-geometric-phase/</guid>
      <description>A cat dropped upside-down rotates 180° and lands on its feet, despite having zero angular momentum throughout. This is not a trick and not a violation of physics. The explanation took physicists from 1894 to 1993 to fully work out, and the answer — a geometric phase arising from the holonomy of a fiber bundle — is the same mathematics that governs the Berry phase in quantum mechanics and the Aharonov-Bohm effect in electrodynamics. We adopted two strays this year. They fall beautifully.</description>
      <content:encoded><![CDATA[<p><em>We adopted two stray cats in 2023. They had been living under a garden shed and
had strong opinions about most things, including the correct height from which to
leap onto a bookshelf and whether landing was optional. They are indoor cats now,
for health reasons — a vet&rsquo;s recommendation they find unconvincing but have largely
accepted. Watching one of them drop
from a windowsill — always feet-first, always orientated correctly, from heights
that would leave me reconsidering my life choices — I found myself thinking about
a problem I had first encountered in a mechanics course and had never fully
resolved to my satisfaction.</em></p>
<p><em>How does a cat rotate with zero angular momentum?</em></p>
<hr>
<h2 id="the-problem">The Problem</h2>
<p>When a cat is dropped from an inverted position — upside-down, held by a
practised experimenter, then released — it rotates approximately 180° and
lands on its feet. The drop takes around 0.3 seconds. The cat begins with
negligible angular momentum (the experimenter can release it with almost no
spin), and there are no external torques during free fall. By conservation of
angular momentum, the total angular momentum of the cat must remain constant
throughout the fall.</p>
<p>The total angular momentum is therefore approximately zero throughout the
fall.</p>
<p>And yet the cat rotates 180°.</p>
<p>This is the falling cat problem. It was first documented quantitatively by
Étienne-Jules Marey in 1894 using chronophotography — among the first
high-speed photography of any biological motion — and it has
occupied physicists, mathematicians, neuroscientists, and roboticists ever
since.</p>
<p>The problem is not exotic. Every cat owner has seen it. What requires
explanation is why our intuitions about angular momentum fail here, and what
replaces them.</p>
<hr>
<h2 id="why-the-obvious-answers-do-not-work">Why the Obvious Answers Do Not Work</h2>
<p>There are two naive explanations for the cat&rsquo;s righting reflex, both wrong.</p>
<p><strong>Explanation 1: The cat uses initial angular momentum.</strong> The experimenter
gives the cat a small spin before releasing it; the cat amplifies this to
achieve the full 180°. This fails because controlled experiments (and Marey&rsquo;s
original photographs) confirm that cats can right themselves even when
released with zero initial spin. Careful experimenters have verified this
explicitly.</p>
<p><strong>Explanation 2: The cat pushes against the air.</strong> A falling cat could, in
principle, use aerodynamic forces to push against the air and generate a
reaction. This fails because the angular impulse from air drag over 0.3
seconds is far too small to account for the observed 180° rotation. Marey&rsquo;s
chronophotographs already showed that the motion begins immediately on
release, before air resistance could contribute meaningfully.</p>
<p>Both explanations appeal to external torques. The correct explanation requires
none.</p>
<hr>
<h2 id="marey-and-the-photographic-evidence">Marey and the Photographic Evidence</h2>
<p>Étienne-Jules Marey published his chronophotographic sequence of a falling
cat in <em>La Nature</em> on 10 November 1894. The images, taken at 60 frames per
second, show the following clearly:</p>
<ol>
<li>The front and rear halves of the cat move <em>asymmetrically</em>. The front half
rotates in one direction; the rear half rotates by a smaller angle in the
opposite direction.</li>
<li>The cat pulls its front legs in close to its body (reducing the moment of
inertia of the front half) while extending its rear legs (increasing the
moment of inertia of the rear half).</li>
<li>The front half then rotates rapidly (large angle, small moment of inertia);
the rear half rotates slowly in the opposite direction (small angle, large
moment of inertia).</li>
<li>The cat then extends its front legs and pulls in its rear legs, and reverses
the process.</li>
</ol>
<p>The net effect: the cat&rsquo;s body orientation rotates by 180° even though the
<em>total</em> angular momentum — computed as the sum of both halves — remains
constant. The key word is <em>sum</em>. Individual parts can exchange angular momentum
through internal torques; the sum is conserved.</p>
<p>This mechanism — internal redistribution of angular momentum without changing
its total — is correct but not complete. It explains <em>that</em> rotation is
possible, not <em>how much</em> rotation is achieved per cycle of shape change. For
that, we need the mathematics.</p>
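<p>Before the full machinery, a toy version of the bookkeeping is worth writing down. The sketch below is not the Kane–Scher model of the next section (there is no bend and no no-twist constraint); it is just two coaxial rotors with zero total angular momentum at every instant, whose moments of inertia are alternately made small and large while they twist against each other. All the parameters are invented for illustration.</p>
<pre><code class="language-python">import numpy as np

# Two coaxial rotors ("front" and "rear") with zero total angular momentum:
#   I_f * omega_f + I_r * omega_r = 0,   relative twist psi = theta_f - theta_r.
# Eliminating time, each increment of relative twist d(psi) advances the halves by
#   d(theta_f) = + I_r / (I_f + I_r) * d(psi)
#   d(theta_r) = - I_f / (I_f + I_r) * d(psi)

I_SMALL, I_LARGE = 0.2, 1.0          # arb. units: legs tucked vs. legs extended
D_PSI = np.radians(1.0)              # integration step for the relative twist
TWIST = np.radians(120.0)            # relative twist accumulated in each phase

def run_phase(theta_f, theta_r, I_f, I_r, d_psi_total):
    """Integrate one phase of the cycle at fixed moments of inertia."""
    steps = int(round(abs(d_psi_total) / D_PSI))
    d_psi = d_psi_total / steps
    for _ in range(steps):
        theta_f += I_r / (I_f + I_r) * d_psi
        theta_r -= I_f / (I_f + I_r) * d_psi
    return theta_f, theta_r

theta_f = theta_r = 0.0
# Phase 1: front tucked, rear extended; twist the front forward relative to the rear.
theta_f, theta_r = run_phase(theta_f, theta_r, I_SMALL, I_LARGE, +TWIST)
# Phase 2: swap (front extended, rear tucked); untwist back to the original shape.
theta_f, theta_r = run_phase(theta_f, theta_r, I_LARGE, I_SMALL, -TWIST)

print(f"net rotation, front half: {np.degrees(theta_f):6.1f} deg")
print(f"net rotation, rear half:  {np.degrees(theta_r):6.1f} deg")
print(f"residual relative twist:  {np.degrees(theta_f - theta_r):6.1f} deg")
# Both halves end up rotated by the same net angle, here
# 120 deg * (I_LARGE - I_SMALL) / (I_LARGE + I_SMALL) = 80 deg, with the shape
# (the relative twist) restored and the total angular momentum zero throughout.
</code></pre>
<p>Note what the toy model cannot do: reaching a full 180° this way would demand an anatomically absurd relative twist between the two halves. The point of the real manoeuvre, and of the Kane–Scher analysis below, is that bending the body lets the cat achieve a large net rotation without a large twist at the spine.</p>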
<hr>
<h2 id="kane-and-scher-the-two-cylinder-model">Kane and Scher: The Two-Cylinder Model</h2>
<p>The first rigorous mechanical model was published by T.R. Kane and M.P. Scher
in 1969 (<em>International Journal of Solids and Structures</em> 5, 663–670).</p>
<p>They modelled the cat as two rigid axisymmetric cylinders — a front half and
a rear half — connected at a joint that allows relative bending and twisting.
The joint constraint imposes that the relative twist between the two halves is
zero (a &ldquo;no-twist&rdquo; condition: the cylinders cannot spin relative to each other
at their connection). The total angular momentum of the system is held fixed
at zero.</p>
<p>Let the two cylinders have moments of inertia $I_1$ and $I_2$ about their
symmetry axes, and let $\phi$ be the bend angle between them and $\psi$ the
twist angle. The zero-angular-momentum constraint, combined with the no-twist
condition, gives a system of equations that can be integrated numerically to
find the net body rotation as a function of the shape-change trajectory
$(\phi(t), \psi(t))$.</p>
<p>Kane and Scher showed that a specific sequence of shape changes — one complete
cycle in the $(\phi, \psi)$ plane — produces a net rotation of approximately
90–100°. A second cycle gives the rest. The calculation was the first to
confirm, from mechanics alone, that the righting manoeuvre requires no external
torques and is entirely consistent with conservation of angular momentum.</p>
<p>What the Kane–Scher model does not explain is <em>why</em> the net rotation per cycle
depends on the area enclosed by the trajectory in shape space — or why the
same mathematical structure appears in quantum mechanics. For that, we need
Montgomery&rsquo;s formulation.</p>
<hr>
<h2 id="montgomery-fiber-bundles-and-geometric-holonomy">Montgomery: Fiber Bundles and Geometric Holonomy</h2>
<p>In 1993, Richard Montgomery published a reformulation of the falling cat problem
using gauge theory (<em>Dynamics and Control of Mechanical Systems</em>, Fields
Institute Communications, AMS, pp. 193–218). The reformulation is the
definitive mathematical treatment, and it connects the cat to one of the deepest
structures in modern physics.</p>
<h3 id="the-configuration-space">The Configuration Space</h3>
<p>The full configuration space of the cat — the space of all possible positions
and orientations — is</p>
$$Q = SO(3) \times \mathcal{S},$$<p>where $SO(3)$ is the rotation group (describing the cat&rsquo;s overall orientation
in space) and $\mathcal{S}$ is the <em>shape space</em> (describing the internal
geometry: the bend angle, the twist, the position of each limb relative to the
body).</p>
<p>The angular momentum constraint $\mathbf{L} = 0$ defines a <em>horizontal
distribution</em> on $Q$ — a preferred subspace of tangent vectors at each point
that correspond to shape changes at zero angular momentum. This distribution is
not integrable (it does not come from a foliation), which is the mathematical
signature that holonomy is possible.</p>
<h3 id="the-fiber-bundle">The Fiber Bundle</h3>
<p>The projection</p>
$$\pi \colon Q \to \mathcal{S}, \qquad (R, s) \mapsto s,$$<p>makes $Q$ into a principal fiber bundle over $\mathcal{S}$ with structure group
$SO(3)$. The fiber above each shape $s \in \mathcal{S}$ is the set of all
orientations the cat can have with that shape.</p>
<p>A <em>connection</em> on this bundle is a rule for &ldquo;lifting&rdquo; paths in the base
$\mathcal{S}$ to horizontal paths in the total space $Q$ — that is, paths
along which the angular momentum constraint is satisfied. This connection
$\mathcal{A}$ is a one-form on $\mathcal{S}$ taking values in the Lie algebra
$\mathfrak{so}(3)$.</p>
<h3 id="holonomy-the-geometric-phase">Holonomy: The Geometric Phase</h3>
<p>When the cat executes a closed loop $\gamma$ in shape space — a sequence of
shape changes that returns it to its initial shape — the <em>holonomy</em> of the
connection $\mathcal{A}$ around $\gamma$ gives the net rotation:</p>
$$R_\gamma = \mathrm{Hol}_\mathcal{A}(\gamma) \in SO(3).$$<p>For the full non-Abelian case ($SO(3)$), the holonomy is a path-ordered
exponential along $\gamma$ and its relationship to the curvature involves
non-Abelian corrections. But the essential geometric intuition is captured
by the Abelian case — rotation about a single axis — where Stokes&rsquo;s theorem
gives the net rotation directly:</p>
$$\theta_\gamma = \iint_{\Sigma} F,$$<p>where $\Sigma$ is a surface bounded by $\gamma$ and $F = d\mathcal{A}$
is the curvature 2-form. The cat&rsquo;s net rotation per cycle is the integral
of the curvature over the area enclosed by its shape-change loop in
$\mathcal{S}$. For small loops, the curvature $F_\mathcal{A} = d\mathcal{A} + \mathcal{A} \wedge \mathcal{A}$ determines the holonomy to leading order in both the Abelian and non-Abelian cases.</p>
<p>The rotation is <em>geometric</em>: it depends on the shape of the loop, not on the
speed at which the loop is traversed. A cat executing the same shape-change
sequence twice as fast achieves the same rotation in half the time.</p>
<hr>
<h2 id="the-connection-to-berry-phase">The Connection to Berry Phase</h2>
<p>The gauge structure of the falling cat problem is not an isolated curiosity.
It is the same mathematical structure that governs several central phenomena
in modern physics.</p>
<p><strong>The Berry phase</strong> (Berry 1984, <em>Proceedings of the Royal Society A</em>) arises
when a quantum system is transported adiabatically around a closed loop $C$ in
parameter space. The state acquires a phase</p>
$$\gamma_B = \oint_C \mathbf{A} \cdot d\mathbf{R},$$<p>where $\mathbf{A} = i\langle n(\mathbf{R}) | \nabla_\mathbf{R} | n(\mathbf{R}) \rangle$
is the Berry connection — a gauge field on parameter space. The Berry phase is
the holonomy of this connection, which is to say: the cat righting itself and
a quantum state accumulating a geometric phase are instances of the <em>same
mathematical theorem</em>.</p>
<p>Shapere and Wilczek (1989) made this connection explicit for deformable bodies,
noting that the net rotation of a swimming microorganism or a falling cat is
the holonomy of a gauge connection on shape space — exactly the Berry phase,
expressed in the language of classical mechanics.</p>
<p><strong>The Foucault pendulum</strong> precesses at a rate of $2\pi\sin\phi$ per sidereal
day, where $\phi$ is the latitude. The holonomy of the Levi-Civita connection
on $S^2$ for parallel transport around the circle of latitude is the solid
angle of the enclosed polar cap, $\Omega = 2\pi(1 - \sin\phi)$. The
lab-frame precession $2\pi\sin\phi = 2\pi - \Omega$ is the complementary
angle — the two sum to a full rotation because the local frame itself
completes one circuit per sidereal day. It is another geometric phase.</p>
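<p>The solid-angle statement is easy to check numerically. The sketch below parallel-transports a tangent vector around a circle of latitude on the unit sphere, by repeatedly projecting the vector onto the local tangent plane as the base point moves, and compares the rotation it picks up with the enclosed solid angle $2\pi(1 - \sin\phi)$. It is a toy verification at a latitude of my own choosing, nothing more.</p>
<pre><code class="language-python">import numpy as np

# Parallel-transport a tangent vector around a circle of latitude on the unit
# sphere and compare the resulting rotation with the enclosed solid angle
# 2*pi*(1 - sin(latitude)), the holonomy behind the Foucault precession.

LAT = np.radians(51.0)                       # an arbitrary example latitude
N = 20_000                                   # number of small transport steps

def point(t):
    """Point on the circle of latitude, parametrized by longitude t."""
    return np.array([np.cos(LAT) * np.cos(t),
                     np.cos(LAT) * np.sin(t),
                     np.sin(LAT)])

ts = np.linspace(0.0, 2.0 * np.pi, N + 1)
v = np.array([0.0, 1.0, 0.0])                # initial tangent vector, pointing east
v = v - np.dot(v, point(ts[0])) * point(ts[0])   # ensure it is tangent at the start
v = v / np.linalg.norm(v)
v0 = v.copy()

for t in ts[1:]:
    n = point(t)                             # unit normal at the new base point
    v = v - np.dot(v, n) * n                 # project onto the new tangent plane
    v = v / np.linalg.norm(v)                # keep unit length

transported_angle = np.degrees(np.arccos(np.clip(np.dot(v, v0), -1.0, 1.0)))
solid_angle = np.degrees(2.0 * np.pi * (1.0 - np.sin(LAT)))

print(f"rotation of the transported vector: {transported_angle:6.2f} deg")
print(f"enclosed solid angle:               {solid_angle:6.2f} deg")
# Both print roughly 80 deg at 51 deg latitude: the holonomy of the loop equals
# the solid angle it encloses, which is the content of the Foucault result above.
</code></pre>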
<p><strong>The Aharonov-Bohm effect</strong> (1959) produces a phase shift for electrons
circling a solenoid, even when the electrons travel only through field-free
regions. The phase is the holonomy of the electromagnetic vector potential
$\mathbf{A}$ around the loop — a Berry phase for the electromagnetic field.</p>
<p>All four phenomena — the falling cat, the Berry phase, the Foucault pendulum,
the Aharonov-Bohm effect — are manifestations of the same structure: a
connection on a fiber bundle, and holonomy as the geometric consequence of
traversing a closed loop.</p>
<p>Batterman (2003, <em>Studies in History and Philosophy of Modern Physics</em> 34,
527–557) gives a particularly clear account of this unification, drawing out
the common mathematical skeleton and its physical implications.</p>
<hr>
<h2 id="high-rise-syndrome-terminal-velocity-and-the-parachute-cat">High-Rise Syndrome: Terminal Velocity and the Parachute Cat</h2>
<p>There is a grounding empirical footnote to the elegant geometry above. Whitney
and Mehlhaff (1987, <em>Journal of the American Veterinary Medical Association</em>
191, 1399–1403) analysed 132 cats brought to a Manhattan veterinary clinic after
falling from buildings of two to thirty-two stories. Their finding was
counterintuitive:</p>
<p>Cats falling from above seven stories had a <em>lower</em> injury rate than cats
falling two to six stories. Overall, 90% of the cats in the study survived,
with injuries paradoxically less severe at greater heights.</p>
<p>The explanation involves two phases. Below seven stories, the cat is still
accelerating: it is tense, its legs are extended to brace for impact, and it
absorbs the force of landing poorly. Above seven stories, the cat reaches
terminal velocity — approximately $100\,\mathrm{km/h}$ for a falling cat — and
then, apparently, <em>relaxes</em>. The vestibular system, having identified that the
fall is not ending imminently, switches from the righting reflex to a
parachute posture: legs spread horizontally, body flattened, increasing the
cross-sectional area and hence air resistance.</p>
<p>Terminal velocity is reached when the drag force equals the gravitational force:</p>
$$mg = \frac{1}{2} C_D \rho A v_t^2, \qquad
v_t = \sqrt{\frac{2mg}{C_D \rho A}}.$$<p>For a spread-eagle cat ($m \approx 4\,\mathrm{kg}$, $A \approx 0.06\,\mathrm{m}^2$,
$C_D \approx 1.0$, $\rho_\mathrm{air} \approx 1.2\,\mathrm{kg/m}^3$):</p>
$$v_t \approx \sqrt{\frac{2 \times 4 \times 9.8}{1.0 \times 1.2 \times 0.06}}
\approx 33\,\mathrm{m/s} \approx 120\,\mathrm{km/h}.$$<p>(The exact value depends on posture and fur drag; empirical estimates for
cats in the parachute posture are lower, roughly $25$–$30\,\mathrm{m/s}$,
because the effective drag area (limbs fully spread, plus the fur) is larger
than the bare cross-section assumed above.)</p>
<p>A human in free-fall has terminal velocity around $55\,\mathrm{m/s}$
($200\,\mathrm{km/h}$) — faster, because the mass-to-area ratio is higher.
The cat, with its low mass and high drag relative to body weight, hits a
gentler terminal velocity and distributes the impact more effectively.</p>
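<p>The two estimates are easy to reproduce. In the snippet below, the cat parameters are the ones used above; the human parameters are round numbers of my own choosing, picked to land near the commonly quoted belly-down figure rather than taken from any particular measurement.</p>
<pre><code class="language-python">import math

def terminal_velocity(mass_kg, area_m2, drag_coeff, rho_air=1.2, g=9.8):
    """v_t = sqrt(2 m g / (C_D rho A)): the speed at which drag balances weight."""
    return math.sqrt(2.0 * mass_kg * g / (drag_coeff * rho_air * area_m2))

# Spread-eagle cat, with the same parameters as the estimate above:
v_cat = terminal_velocity(mass_kg=4.0, area_m2=0.06, drag_coeff=1.0)

# Belly-down human: round numbers chosen to land near the commonly quoted figure.
v_human = terminal_velocity(mass_kg=80.0, area_m2=0.7, drag_coeff=0.6)

print(f"cat:   v_t = {v_cat:4.1f} m/s  ({v_cat * 3.6:5.1f} km/h)")
print(f"human: v_t = {v_human:4.1f} m/s  ({v_human * 3.6:5.1f} km/h)")
# Prints roughly 33 m/s (119 km/h) for the cat and 56 m/s (201 km/h) for the human.
</code></pre>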
<p>The study is sometimes cited as evidence that cats are invincible. A significant
caveat is <strong>survivorship bias</strong>: cats that died on impact were likely not brought
to the veterinary clinic, so the dataset underrepresents fatal outcomes,
especially for higher falls. The apparent decrease in injury rate above seven
stories may partly reflect the fact that the most severely injured cats from
those heights never entered the study. The aerodynamic posture explanation is
plausible, but the data do not cleanly separate it from the sampling bias.</p>
<hr>
<h2 id="robotics-and-spacecraft">Robotics and Spacecraft</h2>
<p>The falling cat problem has practical applications beyond veterinary statistics.</p>
<p><strong>Spacecraft attitude control</strong>: Astronauts in free fall can change their
body orientation without thrusters, using the same gauge-theoretic mechanism
as the cat. NASA and ESA have studied cat-inspired reorientation manoeuvres
for astronauts and satellites.</p>
<p><strong>Robotics</strong>: The two-cylinder model inspired early robot designs capable of
reorienting in free fall — useful for robots deployed from aircraft or
spacecraft. Subsequent work (including a 2022 review in <em>IEEE Transactions on
Robotics</em>) has produced legged robots that can right themselves after being
knocked over using shape-change sequences derived from the Montgomery connection.</p>
<p><strong>Gymnastics and diving</strong>: Human athletes performing somersaults and twists
exploit the same gauge structure, though without articulating the mathematics.
A tuck increases rotation rate (smaller $I$, constant $L$ → larger $\omega$);
a layout decreases it. Changing the tuck–layout timing mid-rotation produces
a net twist — holonomy in the shape space of a human body.</p>
<hr>
<h2 id="the-view-from-a-windowsill">The View from a Windowsill</h2>
<p>My cats have no opinion about fiber bundles. When one of them drops from the
top of the bookcase, she is not solving the variational problem</p>
$$\min_{\gamma \in \Omega} \int_\gamma |\dot{s}|^2 \, dt,
\quad \text{subject to } \mathrm{Hol}_\mathcal{A}(\gamma) = R_{180°},$$<p>she is executing a motor program refined over millions of years of feline
evolution. The vestibular system provides continuous feedback on body
orientation; the cerebellum coordinates the shape-change sequence; the whole
manoeuvre is over in a third of a second.</p>
<p>What physics tells us is that the manoeuvre is <em>possible</em> — that no law of
nature forbids a body with zero angular momentum from reorienting — and gives
the precise geometric reason: the curvature of a connection on shape space is
non-zero, which means the holonomy of closed loops is non-trivial.</p>
<p>The same curvature that allows a cat to right itself allows a quantum state to
accumulate a geometric phase, allows the Foucault pendulum to precess, and
allows the Aharonov-Bohm effect to shift an interference fringe without a local
field. These are not analogies. They are the same theorem, applied to different
physical systems in different mathematical languages.</p>
<p>I find this more remarkable than the cat.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Batterman, R.W. (2003). Falling cats, parallel parking, and polarized light.
<em>Studies in History and Philosophy of Modern Physics</em>, 34(4), 527–557.
<a href="https://doi.org/10.1016/S1355-2198(03)00062-5">https://doi.org/10.1016/S1355-2198(03)00062-5</a></p>
</li>
<li>
<p>Berry, M.V. (1984). Quantal phase factors accompanying adiabatic changes.
<em>Proceedings of the Royal Society A</em>, 392, 45–57.
<a href="https://doi.org/10.1098/rspa.1984.0023">https://doi.org/10.1098/rspa.1984.0023</a></p>
</li>
<li>
<p>Gbur, G.J. (2019). <em>Falling Felines and Fundamental Physics.</em> Yale University
Press.</p>
</li>
<li>
<p>Kane, T.R., &amp; Scher, M.P. (1969). A dynamical explanation of the falling cat
phenomenon. <em>International Journal of Solids and Structures</em>, 5(7), 663–670.
<a href="https://doi.org/10.1016/0020-7683(69)90086-9">https://doi.org/10.1016/0020-7683(69)90086-9</a></p>
</li>
<li>
<p>Marey, É.-J. (1894). Des mouvements que certains animaux exécutent pour
retomber sur leurs pieds lorsqu&rsquo;ils sont précipités d&rsquo;un lieu élevé. <em>La
Nature</em>, 10 November 1894.</p>
</li>
<li>
<p>Montgomery, R. (1993). Gauge theory of the falling cat. In M. Enos (Ed.),
<em>Dynamics and Control of Mechanical Systems</em> (Fields Institute Communications,
Vol. 1, pp. 193–218). American Mathematical Society.</p>
</li>
<li>
<p>Shapere, A., &amp; Wilczek, F. (Eds.). (1989). <em>Geometric Phases in Physics.</em>
World Scientific.</p>
</li>
<li>
<p>Whitney, W.O., &amp; Mehlhaff, C.J. (1987). High-rise syndrome in cats. <em>Journal
of the American Veterinary Medical Association</em>, 191(11), 1399–1403.</p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-12-15</strong>: Corrected the Marey publication date from 22 November 1894 to 10 November 1894 (in text and in reference). Updated the Whitney &amp; Mehlhaff (1987) statistics to reflect that the 90% survival rate applies to all cats in the study, as reported in the paper, rather than specifically to those falling from above seven stories.</li>
</ul>
]]></content:encoded>
    </item>
  </channel>
</rss>
