<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Physics on Sebastian Spicker</title>
    <link>https://sebastianspicker.github.io/tags/physics/</link>
    <description>Recent content in Physics on Sebastian Spicker</description>
    <image>
      <title>Sebastian Spicker</title>
      <url>https://sebastianspicker.github.io/og-image.png</url>
      <link>https://sebastianspicker.github.io/og-image.png</link>
    </image>
    <generator>Hugo -- 0.160.0</generator>
    <language>en</language>
    <lastBuildDate>Tue, 03 Mar 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://sebastianspicker.github.io/tags/physics/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Oppenheimer Didn&#39;t Have an Acceptable Use Policy</title>
      <link>https://sebastianspicker.github.io/posts/ai-warfare-anthropic-atom-bomb/</link>
      <pubDate>Tue, 03 Mar 2026 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/ai-warfare-anthropic-atom-bomb/</guid>
      <description>Anthropic has drawn a public line on military use of its models. The physics community spent the better part of the twentieth century working out what it means to draw that line after you have already built the thing. As a physicist watching this unfold, I find the parallels clarifying and the differences more unsettling than the parallels.</description>
      <content:encoded><![CDATA[<p><em>Physicists inherit, along with the formalism and the problem sets, a particular
kind of guilt. The profession has been working
through its relationship to weapons, state violence, and the gap between
scientific capability and ethical readiness since August 1945. This post is about
why I think the current moment in AI closely resembles that history, and why
Anthropic&rsquo;s decision to draw a line matters even if — especially if — you think
the line is imperfect.</em></p>
<hr>
<h2 id="what-just-happened">What Just Happened</h2>
<p>The news this week involves Anthropic and the question of whether and how large
language models should be available for military applications. Anthropic has stepped
back from a path toward unrestricted military use and restated a position: there
are things their models will not be used for, weapons development and autonomous
lethal systems among them. The response from parts of the defence and national
security community has been predictable — naïve, idealistic, unilateral disarmament,
your adversaries will not make the same choice.</p>
<p>These are not stupid objections. I want to take them seriously. But I also want
to explain why, as someone who spent years studying physics in the shadow of the
Manhattan Project&rsquo;s legacy, the framing of those objections sounds very familiar,
and why that familiarity is not reassuring.</p>
<hr>
<h2 id="what-the-physicists-thought-they-were-doing">What the Physicists Thought They Were Doing</h2>
<p>The scientists who built the atomic bomb were not, for the most part, indifferent
to what they were building. Many of them were refugees from European fascism.
They understood what a Nazi atomic weapon would mean. The urgency was real, the
moral reasoning was coherent, and the conclusion — build it before the other side
does — followed from the premises.</p>
<p>What the premises did not include was adequate weight for what happens after the
technical problem is solved.</p>
<p>By the time the Trinity test succeeded in July 1945, Germany had already
surrendered. The original justification — prevent the Nazis from getting there
first — had evaporated. What remained was a weapon, an infrastructure for building
more weapons, and a strategic and political logic that had largely moved beyond
the scientists&rsquo; control. The Franck Report, written by a group of Manhattan Project
scientists in June 1945, argued against using the bomb on a Japanese city without
prior demonstration. It was ignored. Oppenheimer, who chaired the Interim
Committee&rsquo;s scientific panel, signed off on the Hiroshima target recommendation.
He spent the rest of his life with that.</p>
<p>The lesson most physics students absorb from this history is something like: the
scientists were not the decision-makers, the decision was going to be made anyway,
and the presence of principled scientists in the room was better than their absence.
The system was going to do what it was going to do; all you could influence was
the margin.</p>
<p>I believed this for a long time. I am less sure of it now.</p>
<hr>
<h2 id="the-analogy-and-its-limits">The Analogy and Its Limits</h2>
<p>The comparison between the atom bomb and artificial general intelligence — or even
current large language models at the capability frontier — is made often enough
that it has become a cliché, which is usually the point at which people stop
thinking carefully about it. Let me try to be specific about where the analogy
holds and where it breaks.</p>
<p><strong>Where it holds:</strong></p>
<p>The core structural similarity is this: a small number of researchers, working
at the frontier of a capability that most people do not understand, are making
decisions that will constrain or enable uses they cannot fully anticipate, in
contexts they will not control. The physics community in 1942 had a clearer view
of what fission could do than any political or military decision-maker. The AI
research community in 2026 has a clearer view of what large language models can
do — and of what more capable successors will do — than most of the people who
will deploy them.</p>
<p>That epistemic position is not morally neutral. Knowing more than the decision-makers
does not mean you have unlimited responsibility, but it does mean you have more
responsibility than someone who does not know. Pleading ignorance about downstream
applications is not available to you.</p>
<p>The second similarity: once the capability exists and is demonstrated, the
normative landscape changes. Before Trinity, the question of whether to build nuclear
weapons was still open. After Trinity, it was no longer open in the same way — the
knowledge existed, the infrastructure existed, the geopolitical expectations had
already been set. The arms race was not caused by the bomb, but the bomb&rsquo;s existence
changed what the arms race meant and how fast it moved. We are somewhere in the
vicinity of that transition with frontier AI systems. The question of whether to
build them is still formally open for any given company or research group, but the
landscape is already different from what it was five years ago.</p>
<p><strong>Where it breaks:</strong></p>
<p>The atom bomb was a single-use physical object whose primary function was destroying
things. Large language models are general-purpose cognitive tools with a very wide
range of applications, the majority of which are not weapons-relevant. This matters
because it changes the policy space. You could, in principle, have not built the
atom bomb. You cannot, in principle, not build language models while still having
language models for medicine, education, scientific research, and the other
applications that are clearly beneficial. The dual-use problem for AI is more
severe, not less severe, than it was for physics.</p>
<p>The other important difference: the Manhattan Project was conducted in secret, under
wartime conditions, with a relatively well-defined adversarial structure. The current
AI landscape involves many organisations, many countries, public publication of
research, and no clear equivalent of the Axis/Allied framing. The game theory
of &ldquo;if we don&rsquo;t do it, they will&rdquo; is more complicated when &ldquo;they&rdquo; is not a single
identifiable adversary with symmetric interests.</p>
<hr>
<h2 id="what-anthropics-line-actually-says">What Anthropic&rsquo;s Line Actually Says</h2>
<p>Setting aside for a moment whether the line is in the right place, there is something
worth examining in the act of drawing it at all.</p>
<p>The standard criticism — that a unilateral ethical commitment in a competitive
field simply advantages less scrupulous actors — assumes that ethical commitments
are pure costs with no countervailing benefits. This is the argument the weapons
lobby has made about every arms control proposal in the history of arms control,
and it has sometimes been right. Unilateral disarmament without reciprocal
commitments can leave you worse off. This is not a trivial point.</p>
<p>But it smuggles in an assumption that deserves scrutiny: that the relevant
competition is primarily between AI companies, and that the only variable that
matters is relative capability. If you accept that framing, then any ethical
constraint is a handicap and the only rational strategy is to develop as fast as
possible with as few restrictions as possible.</p>
<p>That framing has a name in physics. It is called the arms race equilibrium, and
the physics community spent thirty years understanding what it produces. It produces
capability accumulation without a corresponding development of the normative
frameworks, institutional safeguards, and mutual verification mechanisms that
make the capability survivable. It produces Hiroshima, then the hydrogen bomb,
then MIRV, then the point at which the accumulated arsenal is large enough to
end complex life on Earth several times over, at which point you negotiate the
first real arms limitation treaties — from a starting position of vastly more
deployed capability than anyone needed and vastly less trust than anyone wanted.</p>
<p>The question Anthropic is implicitly asking is whether there is a path that does
not look like that. The answer is not obvious. But I think it is worth asking.</p>
<hr>
<h2 id="what-the-physicists-should-have-done">What the Physicists Should Have Done</h2>
<p>Here is the counterfactual that haunts the Manhattan Project&rsquo;s legacy: what if
the scientific community had treated the ethics of the bomb as seriously as the
physics, from the beginning?</p>
<p>Not naïvely. Not by refusing to work on it and ceding the possibility of influencing
it. But by making the ethical analysis parallel to the technical analysis, by
treating the question of use as a scientific question with as much rigour as the
question of yield, and by using the epistemic authority that came from being the
people who understood the capability to push, hard, for the normative frameworks
that did not yet exist.</p>
<p>Some scientists did this. Szilard circulated a petition, signed by 70 Manhattan
Project scientists, against the use of the bomb on Japanese cities without prior
warning. It did not work. But the effort was real, and the record of the effort
matters — both as evidence that the scientific community was not unanimous in its
acquiescence and as a model for what engaged dissent looks like from inside a
project that is going to proceed regardless.</p>
<p>What most scientists did not do, and what the profession largely did not do in the
decades that followed, was treat the ethical work as primary. Physics built its
identity around the technical capability — the extraordinary achievement of
understanding nature at the deepest level — and treated the ethical consequences
as someone else&rsquo;s department. The bomb was the military&rsquo;s problem. The cold war was
the politicians&rsquo; problem. The physicists kept doing physics.</p>
<p>This was comfortable and it was wrong.</p>
<hr>
<h2 id="what-i-want-from-ai-researchers">What I Want From AI Researchers</h2>
<p>I want AI researchers to do what the physicists did not, and to do it now, while
the critical decisions are still open.</p>
<p>Anthropic drawing a line is one version of this. It is imperfect — the line is
in a particular place, the enforcement mechanisms are limited, the competitive
dynamics are real. But it is a claim that the people who built the capability
have ongoing responsibility for how it is used, and that some uses are outside
the bounds of what should happen regardless of what is technically possible.</p>
<p>That claim is not naïve. It is, in fact, the claim the Franck Report was making
in 1945: that capability does not determine use, that scientists have a voice in
the normative question, and that using that voice is part of the job rather than
a distraction from it.</p>
<p>What I want beyond that is for the AI research community to treat the ethics
as primary rather than as footnotes. Not ethics review boards that approve research
post hoc. Not responsible AI teams that are consulted after the capability has
been developed. A genuine integration of the normative analysis into the research
process itself — asking, at each stage, what this capability makes possible and
who benefits from that possibility and who pays the cost.</p>
<p>The physics community got to August 1945 before it had that conversation in earnest.
The conversation has been going on ever since, and it has produced important
institutional frameworks — the Bulletin of the Atomic Scientists, the arms control
treaties, the export control regimes, the norms against first use. These things
matter. But they were built in reaction to a capability that had already been
deployed, and the shape of everything that followed was constrained by that
starting point.</p>
<p>The AI community is not there yet. The starting point is still being established.
That is what makes this moment consequential, and what makes Anthropic&rsquo;s line —
wherever exactly it is drawn — worth defending as an act of principle rather than
dismissing as an act of commercial positioning.</p>
<hr>
<h2 id="a-note-on-the-of-our-time-framing">A Note on the &ldquo;Of Our Time&rdquo; Framing</h2>
<p>I am aware that comparisons to the atom bomb are sometimes used to generate
unwarranted urgency, to short-circuit careful reasoning by invoking the most
extreme case. I want to be clear about what I am and am not claiming.</p>
<p>I am not claiming that current large language models are as immediately dangerous
as nuclear weapons. They are not.</p>
<p>I am claiming that the structural situation — researchers at the capability
frontier, ahead of the policy frameworks, making decisions that will constrain
future options, in a competitive environment with adversarial dynamics — is
similar enough that the lessons of the Manhattan Project period are directly
relevant. Not as prophecy. As a guide to the kind of mistakes that are available
to make.</p>
<p>The physicists had plenty of warning. Szilard had been worried since 1933.
Einstein wrote to Roosevelt in 1939. The Franck Report was written before
Hiroshima. The warnings were on the record. What was not on the record was
a scientific community that treated those warnings as actionable constraints
on its own behaviour rather than as advisories for policymakers.</p>
<p>That is the thing I want to be different this time.</p>
<hr>
<h2 id="references">References</h2>
<p>Franck, J. et al. (1945). <em>Report of the Committee on Political and Social Problems
(The Franck Report).</em> National Archives, Record Group 77.</p>
<p>Oppenheimer, J. R. (1965). Interview on <em>The Decision to Drop the Bomb</em> (NBC
documentary). Recorded 1965.</p>
<p>Rhodes, R. (1986). <em>The Making of the Atomic
Bomb.</em> Simon &amp; Schuster.</p>
<p>Russell, B., &amp; Einstein, A. (1955). <em>The Russell–Einstein Manifesto.</em>
Pugwash Conferences on Science and World Affairs.</p>
<p>Szilard, L. (1945). <em>A Petition to the President of the United States.</em> July 17,
1945. Available via the Atomic Heritage Foundation.</p>
<p>Bulletin of the Atomic Scientists (1945–present). <em>Doomsday Clock statements.</em>
<a href="https://thebulletin.org/doomsday-clock/">https://thebulletin.org/doomsday-clock/</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>If You Think This Is Written by AI, You Are Both Right and Wrong</title>
      <link>https://sebastianspicker.github.io/posts/ai-detectors-systematic-minds/</link>
      <pubDate>Wed, 18 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/ai-detectors-systematic-minds/</guid>
      <description>AI detectors flag the US Constitution as machine-generated. They also flag technical papers, legal prose, and — with striking consistency — writing produced by autistic minds and physics-trained ones. The error is not in the measurement. It is in the baseline assumption: that systematic, precise writing is inhuman.</description>
      <content:encoded><![CDATA[<p>I use AI tools in my writing. This post, like several others on this blog,
was written with LLM assistance — research, structure, drafting,
revision. If you run any of these posts through an AI writing detector, you
will likely receive a high probability-of-AI score. The detector will be
picking up something real.</p>
<p>It will also be wrong about what that means.</p>
<hr>
<h2 id="the-constitution-problem">The Constitution Problem</h2>
<p>In 2023, as universities began deploying AI detection tools at scale,
educators started testing them on texts that were definitively not
AI-generated. The results were instructive. The United States Constitution
received high AI-probability scores from multiple commercial detectors.
GPTZero returned a rating of 92% likely AI-written. The Federalist Papers
fared similarly. So did sections of the King James Bible and Kant&rsquo;s <em>Critique
of Pure Reason</em>. Historical documents, written by humans, for human purposes,
in an era when no AI existed — flagged as machine-generated.</p>
<p>This was not a marginal edge case. It was consistent across tools and across
documents. And while it was widely reported as evidence that the detectors
were broken, there is a more precise reading available: the detectors were
working correctly, and we had misunderstood what they were measuring.</p>
<hr>
<h2 id="what-the-detectors-actually-measure">What the Detectors Actually Measure</h2>
<p>Most commercial AI detectors — GPTZero, Turnitin&rsquo;s detection layer,
Copyleaks — use some combination of two statistical signals.</p>
<p><strong>Perplexity.</strong> A language model assigns a probability to each token given
the preceding tokens. Low perplexity means the text was, token by token,
what the model expected — it sits close to the centre of the probability
distribution. AI-generated text tends to have low perplexity because that
is precisely what generation does: it samples from the high-probability
region of the distribution <a href="#ref-1">[1]</a>. Human text, on average,
has higher perplexity, because humans write for specific contexts with
idiosyncratic word choices, rhetorical effects that require the unexpected,
and the accumulated noise of composing for a real reader.</p>
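<p>To make the statistic concrete, here is a minimal sketch of a perplexity
calculation using GPT-2 through the Hugging Face <code>transformers</code> library.
It illustrates the quantity the detectors build on, not any commercial
detector&rsquo;s actual pipeline; the model choice and any threshold you would apply
to the output are assumptions of the sketch.</p>
<pre><code class="language-python"># Minimal perplexity estimate with GPT-2 (illustration only, not a detector).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Score each token given the preceding tokens; low perplexity means the
    # text sat close to the centre of the model's predictive distribution.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels returns the mean cross-entropy over the sequence.
        out = model(**enc, labels=enc["input_ids"])
    return float(torch.exp(out.loss))

print(perplexity("We the People of the United States, in Order to form a more perfect Union ..."))
</code></pre>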
<p><strong>Burstiness.</strong> A term popularised in this context by Edward Tian, GPTZero&rsquo;s creator: human
writing has high burstiness — sentence lengths vary widely, vocabulary
density shifts, complex constructions alternate with simple ones. AI writing
is more uniform. The statistical distribution of sentence lengths in LLM
output is narrower than in most human prose <a href="#ref-2">[2]</a>.</p>
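<p>A crude proxy, for illustration only: treat burstiness as the spread of
sentence lengths. GPTZero&rsquo;s actual scoring is proprietary and uses more than
this, and the sentence-splitting rule below is a naive assumption.</p>
<pre><code class="language-python"># Crude burstiness proxy: variation in sentence length across a text.
import re
import statistics

def sentence_length_spread(text: str) -> float:
    # Split on sentence-ending punctuation and count words per sentence.
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    # Standard deviation of sentence length: higher means "burstier".
    return statistics.pstdev(lengths) if len(lengths) > 1 else 0.0
</code></pre>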
<p>The underlying assumption these tools share: human writing is variable,
contextually messy, idiosyncratic. AI writing is smooth and predictable.</p>
<p>This is accurate for a large class of human writing — casual prose, personal
essays, social media, student writing in informal registers. It is wrong
about a different and well-defined class of human writing. The Constitution
sits in that class. So does a lot of other text.</p>
<hr>
<h2 id="the-systemising-brain">The Systemising Brain</h2>
<p>Simon Baron-Cohen&rsquo;s empathising–systemising (E-S) theory distinguishes two
cognitive orientations. Empathising involves attending to social and emotional
cues, inferring mental states, navigating the pragmatic, implicit layer of
communication — what is meant rather than what is said. Systemising involves
attending to rules, patterns, and underlying regularities — the drive to
understand how things work and to represent them in explicit, transferable,
internally consistent terms <a href="#ref-3">[3]</a>.</p>
<p>Both orientations are distributed across the human population. They are not
exclusive, and neither is pathological. But autism spectrum conditions are
robustly associated with high systemising and relatively lower empathising —
not because autistic people lack emotions or care about others, but because
the cognitive mode that comes naturally to them is one of rules, structures,
and explicit representation rather than social inference and pragmatic
implication. The intense world theory <a href="#ref-4">[4]</a> adds a
complementary perspective: autistic brains may be characterised by
hyper-reactivity and hyper-plasticity, with pattern-seeking and systematising
serving partly as a way of making a too-intense world navigable. The
systematicity is not a deficit. It is an adaptation.</p>
<p>This has direct consequences for writing.</p>
<p>High-systemising writing tends toward:</p>
<ul>
<li>
<p><strong>Consistent vocabulary.</strong> The same term is used for the same concept
throughout, because substituting a synonym introduces ambiguity about
whether the referent is actually the same. Neurotypical writing freely
uses synonyms for stylistic variety; systemising writing resists this
on principle.</p>
</li>
<li>
<p><strong>Explicit logical structure.</strong> Claims are supported by stated reasons
rather than left to pragmatic inference. If there are three conditions,
all three are named. Nothing is &ldquo;needless to say.&rdquo;</p>
</li>
<li>
<p><strong>Low social hedging.</strong> Phrases like &ldquo;as everyone knows&rdquo; or &ldquo;obviously&rdquo;
are avoided, because they perform social alignment rather than convey
information — and they depend on shared assumptions the writer is not
confident are actually shared. (This connects to a point I made in the
<a href="/posts/car-wash-walk/">car-wash-walk post</a> about Gricean pragmatics:
autistic communication often violates the maxim of quantity in the
direction of over-informing, because nothing is assumed implicit.)</p>
</li>
<li>
<p><strong>Grammatical parallelism.</strong> Parallel logical content takes parallel
grammatical form. This is not stylistic affectation; it is a natural
consequence of representing structure explicitly.</p>
</li>
<li>
<p><strong>Minimal rhetorical noise.</strong> The prose does not meander, warm up, or
perform relatability. It states what needs to be stated.</p>
</li>
</ul>
<p>Now run text with these properties through an AI detector. Consistent
vocabulary reads as low lexical diversity. Explicit structure reads as low
burstiness. Minimal rhetorical noise reads as smooth, generated output. The
detector is measuring these properties accurately. The attribution to machine
generation is where it goes wrong.</p>
<p>Liang et al. <a href="#ref-5">[5]</a> demonstrated a closely related failure empirically: AI
detectors are significantly more likely to flag writing by non-native English
speakers as AI-generated. Non-native writers at advanced levels of formal
English tend to write more carefully, more consistently, and more in
accordance with explicit grammar rules — because they learned the language
as a system of explicit rules rather than acquiring it through immersive
social exposure. More systematic writing: higher AI probability score. The
mechanism is the same. The population is different.</p>
<hr>
<h2 id="the-physicist-brain">The Physicist Brain</h2>
<p>Physics writing has its own conventions, independently developed but pointing
in the same direction.</p>
<p>Scientific prose requires defined terms used consistently: in a paper about
quantum error correction, &ldquo;logical qubit,&rdquo; &ldquo;physical qubit,&rdquo; and &ldquo;syndrome&rdquo;
each mean exactly one thing, used identically in section 2 and section 5.
It requires explicit assumptions: &ldquo;We assume the noise is Markovian.&rdquo; &ldquo;In
the limit of large N.&rdquo; These are not vague hedges; they are precise
statements about the domain of validity of the results. It requires logical
derivation over rhetorical persuasion: the connectives are &ldquo;since,&rdquo;
&ldquo;therefore,&rdquo; &ldquo;it follows that&rdquo; — explicit logical operators, not narrative
bridges. And the passive construction of &ldquo;the signal was measured&rdquo; rather
than &ldquo;I measured the signal&rdquo; removes the individual from the result,
because the result should be reproducible regardless of who performs the
measurement.</p>
<p>The outcome is prose that is systematic, consistent, and structurally
predictable. From the outside — and from the vantage point of an AI
detector — it looks machine-generated.</p>
<p>Paul Dirac is the physicist who comes to mind first here. His 1928 paper
deriving the relativistic wave equation for the electron contains almost no
rhetorical apparatus. Motivation, equation, consequence: each stated once,
clearly, with no warm-up and no elaboration beyond what the argument
requires. It is not warm. It is not discursive. It is beautiful in the way
that a proof is beautiful: every element earns its place. Run it through
GPTZero and see what you get.</p>
<p>This connection between the physicist&rsquo;s prose style and the autistic cognitive
mode is not accidental. Baron-Cohen et al. <a href="#ref-6">[6]</a> surveyed Cambridge students
by academic discipline and found that physical scientists and mathematicians
scored consistently higher on the Autism Quotient (AQ) than humanities
students and controls, with mathematicians scoring highest of all. The
systemising orientation associated with autism spectrum conditions is also
overrepresented — and presumably selected for — in quantitative scientific
disciplines. The physicist&rsquo;s prose reflects this. So does the writing of a
high-systemising person who has never studied physics.</p>
<p>The categories overlap without being identical. What they share is a
cognitive preference for explicit structure, consistent vocabulary, and
logical transparency over social performance and rhetorical persuasion. The
writing that emerges from that preference looks, to an AI detector, like it
was generated by a machine.</p>
<p>It was not.</p>
<hr>
<h2 id="the-category-error">The Category Error</h2>
<p>The error AI detectors make is not a measurement error. It is a category
error.</p>
<p>They are trained to distinguish two things: output generated by a
contemporary LLM, and a specific subset of human writing — typically casual,
personal, or student prose collected from online sources. When they encounter
text outside either of those training categories — systematic and precise but
human-generated — the classifier has no good option. The text does not match
the &ldquo;AI&rdquo; training data exactly, and it does not match the &ldquo;human&rdquo; baseline
either. It gets assigned to the bin it fits least badly.</p>
<p>What is happening when the Constitution is flagged: it is systematic,
definitional, prescriptive, and internally consistent. It was written by
lawyers and statesmen who understood that ambiguity in foundational documents
creates legal chaos. They wrote to be unambiguous. The result is text with
low perplexity and low burstiness — the statistical signature the detector
associates with AI.</p>
<p>GPTZero&rsquo;s creator Edward Tian acknowledged this problem when it was reported:
the Constitution appears so frequently in LLM training data that it registers
as &ldquo;already known&rdquo; to the model, which artificially lowers its perplexity
score. That is a real and specific issue. But it is secondary. The deeper
issue is that the Constitution would score low-perplexity even without the
training-data contamination effect, because systematic, definitional prose
is intrinsically low-perplexity. Precise language is predictable language.
That is partly the point of precise language.</p>
<p>The baseline assumption — that human writing is variable and idiosyncratic —
holds for much human writing. It does not hold for legal drafting, technical
documentation, scientific papers, sacred and historical texts written to be
durable and precise, writing by people with high systemising orientation, or
writing by non-native speakers at formal registers. That is not a small
population of edge cases. It is a substantial fraction of all written
material that exists.</p>
<hr>
<h2 id="right-and-wrong-at-the-same-time">Right and Wrong at the Same Time</h2>
<p>So: if you think these posts are AI-generated, you are right and wrong at
the same time.</p>
<p>Right, in two ways. First: yes, I use AI tools. LLM assistance is part of
my writing process — not an occasional aid, but a regular part of how
research notes and half-formed arguments become structured posts. Second:
the writing style of these posts is systematic and precise in ways that
detectors register as machine-generated. That systematicity is real, and
if a detector picks it up, it is measuring something.</p>
<p>Wrong, also in two ways. First: the ideas, judgments, and connections in
these posts are mine. The decisions about what to include and what to leave
out, which papers to cite and how to frame their implications, where the
interesting tension lies between neurodiversity research and the assumptions
baked into AI detection tools — those are not outputs of a language model
working in isolation. They are the product of someone who works at the
intersection of these fields and has thought about them for a while. An LLM
cannot generate these posts without a human who has already decided what
to say.</p>
<p>Second, and more important for the argument here: the systematic, precise
character of this writing is not evidence of machine generation. It is a
cognitive signature — one associated with physics training, with high
systemising orientation, with the <a href="/posts/inner-echo/">overlap between those two things that I
have written about elsewhere</a> in the context of
neurodiversity more broadly.</p>
<p>The detector is measuring a real property of the text. It is misattributing
the origin of that property.</p>
<p>The interesting question this opens is not &ldquo;did AI write this?&rdquo; That question
is increasingly poorly posed in an era where thinking and writing are already
deeply entangled with machine assistance, in ways that differ sharply from
person to person and task to task. The better question is: <em>whose judgment
is in the text?</em> Whose choices about what to include, what to connect, what
to leave out?</p>
<p>The systematicity in this writing is mine. The recognition that AI detectors
systematically disadvantage autistic writers, physicist writers, and
non-native speakers is a judgment I made, not one a language model was
prompted to produce. The connection to the Constitution — a document written
to be maximally unambiguous, flagged as maximally AI-like — is a connection
I found worth drawing.</p>
<p>Whether that makes this text &ldquo;human&rdquo; is a philosophical question I am happy
to leave open. What it is not is AI hallucination.</p>
<hr>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Mitchell, E., Lee, Y., Khazatsky, A., Manning, C. D., &amp; Finn, C. (2023). DetectGPT: Zero-shot machine-generated text detection using probability curvature. <em>Proceedings of the 40th International Conference on Machine Learning (ICML 2023)</em>. <a href="https://arxiv.org/abs/2301.11305">https://arxiv.org/abs/2301.11305</a></p>
<p><span id="ref-2"></span>[2] Gehrmann, S., Strobelt, H., &amp; Rush, A. M. (2019). GLTR: Statistical detection and visualization of generated text. <em>Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</em>, 111–116. <a href="https://doi.org/10.18653/v1/P19-3019">https://doi.org/10.18653/v1/P19-3019</a></p>
<p><span id="ref-3"></span>[3] Baron-Cohen, S. (2009). Autism: The empathising–systemising (E-S) theory. <em>Annals of the New York Academy of Sciences</em>, 1156(1), 68–80. <a href="https://doi.org/10.1111/j.1749-6632.2009.04467.x">https://doi.org/10.1111/j.1749-6632.2009.04467.x</a></p>
<p><span id="ref-4"></span>[4] Markram, K., &amp; Markram, H. (2010). The intense world theory — a unifying theory of the neurobiology of autism. <em>Frontiers in Human Neuroscience</em>, 4, 224. <a href="https://doi.org/10.3389/fnhum.2010.00224">https://doi.org/10.3389/fnhum.2010.00224</a></p>
<p><span id="ref-5"></span>[5] Liang, W., Yuksekgonul, M., Mao, Y., Wu, E., &amp; Zou, J. (2023). GPT detectors are biased against non-native English writers. <em>Patterns</em>, 4(7), 100779. <a href="https://doi.org/10.1016/j.patter.2023.100779">https://doi.org/10.1016/j.patter.2023.100779</a></p>
<p><span id="ref-6"></span>[6] Baron-Cohen, S., Wheelwright, S., Skinner, R., Martin, J., &amp; Clubley, E. (2001). The autism-spectrum quotient (AQ): Evidence from Asperger syndrome/high-functioning autism, males and females, scientists and mathematicians. <em>Journal of Autism and Developmental Disorders</em>, 31(1), 5–17. <a href="https://doi.org/10.1023/A:1005653411471">https://doi.org/10.1023/A:1005653411471</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>Try to Relax — and Other Things That Prevent Themselves</title>
      <link>https://sebastianspicker.github.io/posts/try-to-relax-ironic-process-wormholes/</link>
      <pubDate>Thu, 15 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/try-to-relax-ironic-process-wormholes/</guid>
      <description>&amp;ldquo;Try to relax&amp;rdquo; is a paradox with a precise psychological mechanism. So is the traversable wormhole: the geometry you need to cross spacetime closes the moment you try to use it. The grandfather paradox, Wegner&amp;rsquo;s ironic monitoring process, and Rick Sanchez&amp;rsquo;s nihilism problem all share the same deep structure — and understanding that structure is more interesting than any of the individual cases.</description>
      <content:encoded><![CDATA[<p>Someone, at some point in your life, has told you to relax. They may have
specified that you should <em>try</em> to relax — as though relaxation were an
effortful goal you could pursue with sufficient will. If you have ever
received this advice and found it made things worse, you were not imagining
it. You were experiencing a phenomenon with a name, a precise mechanism,
and — it turns out — a surprising structural analogue in the geometry of
spacetime.</p>
<hr>
<h2 id="the-ironic-process">The Ironic Process</h2>
<p>In 1994, the social psychologist Daniel Wegner published a paper that
formalised what most people already suspected: trying not to think of
something makes you think of it more <a href="#ref-1">[1]</a>. The theoretical
model behind this has two components.</p>
<p>The first is an <strong>operating process</strong>: it actively generates mental content
consistent with the intended state. You are trying to relax — the operating
process searches for calming thoughts, slows your attention, tries to find
the mood.</p>
<p>The second is a <strong>monitoring process</strong>: it runs in parallel, searching for
evidence that the goal has <em>not</em> been achieved. Am I relaxed yet? No.
Checking again. Still no. Its function is to detect failure early so the
operating process can correct course.</p>
<p>Under normal conditions, the operating process dominates. You try to relax,
the monitor runs quietly in the background, and eventually you converge on
the intended state. Under conditions of cognitive load, stress, or
self-consciousness — precisely the conditions under which someone might
urgently need to relax — the balance shifts. The monitoring process,
searching for signs of not-relaxing, finds them everywhere. The monitor
activates the very content it is supposed to prevent. The harder you try,
the louder the monitor, the further from the goal.</p>
<p>This is Wegner&rsquo;s ironic process: the mechanism recruited to achieve a goal
becomes the primary obstacle to that goal. It is not failure of will. It
is a structural property of the system — and it applies to any goal whose
target state is the <em>absence</em> of effortful activity. Trying to fall asleep.
Trying not to feel anxious about a performance. Trying to be spontaneous.
Trying, in the most purely paradoxical formulation, to relax.</p>
<p>The instruction &ldquo;try to relax&rdquo; is not bad advice because the advice-giver
lacks empathy. It is bad advice because it is a <em>category error</em>: it applies
an effort-based tool to a goal defined by the absence of effort. The
monitoring process required to track progress toward the goal is precisely
the kind of activity that constitutes not having reached it.</p>
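<p>The load-dependence is the structural point, and it can be shown with a
deliberately crude toy simulation. This is not Wegner&rsquo;s formalisation, just a
two-process caricature with made-up probabilities: an effortful operating
process that weakens under load, and an automatic monitor whose checking
re-activates the unwanted content.</p>
<pre><code class="language-python"># Toy two-process caricature of the ironic process (illustration, not a model fit).
import random

def fraction_tense(load: float, steps: int = 10000, seed: int = 0) -> float:
    """Fraction of time spent on the unwanted thought, for load in [0, 1]."""
    rng = random.Random(seed)
    thought = "tense"
    tense_steps = 0
    for _ in range(steps):
        # Operating process: effortful search for goal-consistent content,
        # weakened by cognitive load.
        if rng.random() < 0.8 * (1.0 - load):
            thought = "calm"
        # Monitoring process: automatic check for failure. The check itself
        # primes the unwanted content, regardless of load.
        if rng.random() < 0.3:
            thought = "tense"
        tense_steps += thought == "tense"
    return tense_steps / steps

print(fraction_tense(load=0.1), fraction_tense(load=0.9))  # roughly 0.37 vs 0.84
</code></pre>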
<hr>
<h2 id="a-geometry-that-does-the-same-thing">A Geometry That Does the Same Thing</h2>
<p>The analogy I want to draw requires a brief detour into general relativity.</p>
<p>In 1988, Michael Morris and Kip Thorne published a paper with the
unpromising title &ldquo;Wormholes in spacetime and their use for interstellar
travel: A tool for teaching general relativity&rdquo; <a href="#ref-2">[2]</a>.
It is, in the field&rsquo;s understated way, one of the more consequential papers
in the subject. Morris and Thorne asked: what would a traversable wormhole —
one you could actually pass through — require, physically and mathematically?</p>
<p>The spacetime metric of a traversable wormhole in their formulation is:</p>
$$ds^2 = -e^{2\Phi(r)}\,dt^2 + \frac{dr^2}{1 - b(r)/r} + r^2\,d\Omega^2$$<p>where $\Phi(r)$ is the redshift function and $b(r)$ is the shape function.
The throat of the wormhole sits at $r = r_0$, where $b(r_0) = r_0$.
For anything to pass through in finite proper time, $\Phi$ must remain
finite — no infinite redshift — and $b(r)/r$ must remain less than one
away from the throat.</p>
<p>So far this is just geometry. The physics enters through the Einstein field
equations, which connect the geometry to the matter and energy present.
To maintain the wormhole throat against collapse — to hold it open — the
stress-energy tensor of whatever matter fills the throat must satisfy:</p>
$$T_{\mu\nu}\, k^\mu k^\nu < 0$$<p>for null vectors $k^\mu$ — what is called a <em>violation of the null energy
condition</em>. In plain terms: the matter holding the wormhole open must have
negative energy density. Not small energy density. <em>Negative</em> — less than
nothing.</p>
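<p>Where that statement comes from, in one step (with $G = c = 1$, following
Morris and Thorne&rsquo;s notation): the Einstein equations for the metric above give
an energy density $\rho = b'/(8\pi r^2)$ and a radial tension
$\tau = \left[\,b/r - 2(r-b)\Phi'\,\right]/(8\pi r^2)$. At the throat $b(r_0) = r_0$,
and the flaring-out condition that keeps the throat from pinching off requires
$b'(r_0) \lt 1$, so</p>
$$\tau_0 - \rho_0 \;=\; \frac{1 - b'(r_0)}{8\pi r_0^2} \;>\; 0.$$<p>The tension at the throat exceeds the energy density, and for a radial null
vector this is exactly the statement $T_{\mu\nu}\, k^\mu k^\nu \propto \rho - \tau < 0$.
The geometrised units are the only assumption added here.</p>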
<p>This is exotic matter. It does not appear in any tabletop experiment.
Classical general relativity does not rule it out, but it does not provide
it either.</p>
<p>Quantum mechanics is slightly more helpful: the Casimir effect produces
measurable negative energy density between closely spaced conducting plates.
The Hawking radiation calculation involves transient negative energy near
black hole horizons. So quantum field theory permits negative energy — in
principle. But Ford and Roman <a href="#ref-3">[3]</a> showed that quantum field theory also
strictly <em>limits</em> it: the integrated negative energy over any region is
bounded by a quantum inequality. The shorter the burst of negative energy,
the smaller it must be; the larger the region, the more constrained the
magnitude. The result is that any realistic traversable wormhole would be
either Planck-scale (far too small for anything but quantum information to
traverse) or would require negative energy concentrated in a band many
orders of magnitude thinner than the throat itself — an engineering
requirement that borders on the physically absurd.</p>
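<p>The original Ford-Roman bound, for a free massless scalar field in flat
spacetime sampled with a Lorentzian weighting of width $\tau_0$ (and with
$\hbar = c = 1$), has the form</p>
$$\frac{\tau_0}{\pi}\int_{-\infty}^{\infty}\frac{\langle T_{\mu\nu}u^\mu u^\nu\rangle}{t^2 + \tau_0^2}\,dt \;\ge\; -\frac{3}{32\pi^2\tau_0^4},$$<p>so the sampled negative energy density scales as $\tau_0^{-4}$: quadruple
the sampling time and the permitted magnitude drops by a factor of 256. The
curved-spacetime generalisations used for wormholes modify the details but not
the trade-off.</p>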
<p>The wormhole, in other words, does something structurally similar to the
monitoring process in Wegner&rsquo;s model: the condition required to make it
traversable actively resists being satisfied. The geometry that would allow
passage tends toward collapse. The more you want the wormhole to be open
and stable, the more the energy conditions conspire against you.</p>
<hr>
<h2 id="what-the-2022-wormhole-actually-was">What the 2022 &ldquo;Wormhole&rdquo; Actually Was</h2>
<p>In late 2022, a team including Daniel Jafferis, Alexander Zlokapa, and
colleagues at Caltech and Google published a paper in <em>Nature</em> with the
title &ldquo;Traversable wormhole dynamics on a quantum processor&rdquo; <a href="#ref-4">[4]</a>. Several major news outlets reported that scientists had
created a wormhole. This was not accurate.</p>
<p>What the team actually did was implement a quantum circuit on Google&rsquo;s
Sycamore processor that simulates the Sachdev-Ye-Kitaev (SYK) model —
a quantum mechanical system of randomly interacting fermions that is
holographically dual, via Maldacena&rsquo;s AdS/CFT correspondence, to a
nearly two-dimensional anti-de Sitter black hole geometry. Two coupled
SYK systems are dual to a two-sided eternal black hole, which is connected
in the bulk by an Einstein-Rosen bridge — a wormhole.</p>
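<p>For readers who want the object itself: a single SYK system is $N$ Majorana
fermions $\chi_i$, satisfying $\{\chi_i, \chi_j\} = \delta_{ij}$, coupled in
groups of four through independent Gaussian random couplings,</p>
$$H_{\mathrm{SYK}} = \sum_{i \lt j \lt k \lt l} J_{ijkl}\,\chi_i\chi_j\chi_k\chi_l, \qquad \overline{J_{ijkl}^{\,2}} = \frac{3!\,J^2}{N^3}.$$<p>This is the textbook form of the model. The 2022 experiment implemented a
heavily simplified, sparsified variant of two such coupled systems on a small
number of qubits; the Hamiltonian above is the theory being approximated, not
the circuit that ran on the processor.</p>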
<p>By coupling the two systems with a specific negative coupling (which
corresponds, in the holographic dictionary, to injecting negative energy into the wormhole),
the team made the bridge traversable in the holographic sense: information
encoded in one quantum system propagated and was recovered in the other,
consistent with traversal of the dual gravitational wormhole.</p>
<p>This is genuinely interesting physics. It is not a wormhole through our
spacetime. The wormhole lives in the holographic dual geometry — a
mathematical construct in a lower-dimensional theory of gravity, not a
tunnel between two points in the universe you inhabit. Quantum teleportation
occurred on a quantum chip via the ordinary mechanism of quantum
entanglement. The gravitational language is a description of the
same physics in a dual frame, not a shortcut through space.</p>
<p>The media confusion is itself instructive: &ldquo;wormhole&rdquo; has drifted far from
its original meaning. In current physics, the word can refer to a
Morris-Thorne traversable tunnel through spacetime, to the Einstein-Rosen
bridge of an eternal black hole, to a holographic dual of quantum
entanglement <a href="#ref-5">[5]</a>, or to saddle points in the
Euclidean gravitational path integral relevant to the black hole information
paradox. These are related by mathematics but quite different in what they
physically represent. None of the last three are traversable shortcuts
through the universe. The first is, in principle, but barely, and only at
the cost of exotic matter physics that nobody knows how to achieve.</p>
<p>The harder physicists have worked to make the wormhole genuinely traversable
and macroscopic, the more the mathematics has resisted. This is, at minimum,
a suggestive pattern.</p>
<hr>
<h2 id="what-2025-added">What 2025 Added</h2>
<p>The field did not stand still after 2022. Three independent lines of work
published in 2024 and 2025 have further complicated what a wormhole is —
and in each case the complication pushes in the same direction: the geometry
keeps refusing to be a shortcut.</p>
<p><strong>The wormhole that does not connect two things.</strong> Maloney, Meruliya, and Van Raamsdonk <a href="#ref-7">[7]</a> showed that Euclidean wormholes — saddle points in
the gravitational path integral — appear generically in ordinary
higher-dimensional gravity, without any special setup. The striking
implication is that these wormholes do not bridge two separate universes
or two separate theories; they encode statistical fluctuations <em>within a
single theory</em>. The replica wormholes that resolved the Page curve for
black hole radiation — one of the central recent results in the black hole
information paradox — are of this type. The wormhole is not a connection
between two things. It is a feature of how the theory sums over histories,
a bookkeeping structure for correlations within one system. The physical
picture of two mouths joined by a throat does not apply.</p>
<p><strong>The wormhole that is not smooth.</strong> Magán, Sasieta, and Swingle <a href="#ref-8">[8]</a> studied the interior geometry of the Einstein-Rosen bridge connecting
typical entangled black holes — the configuration that is supposed, under
ER=EPR, to be the gravitational dual of quantum entanglement. Their result,
published in <em>Physical Review Letters</em>, is that this interior is not a
smooth tunnel. It is long, irregular, and chaotic — an Einstein-Rosen
caterpillar, as they call it. The quantum randomness of the entangled state
maps directly onto geometric disorder in the interior: the more thermalized
the state, the more disordered the bridge. A traversing observer, if one
could exist, would not glide through a clean throat. They would navigate a
geometry shaped by quantum chaos, growing longer and more disordered as
the system evolves. This is ER=EPR taken seriously at the level of typical
states rather than special ones, and the result is inhospitable to any
ordinary notion of passage.</p>
<p><strong>The wormhole that is not a tunnel at all.</strong> Gaztañaga, Kumar, and Marto <a href="#ref-9">[9]</a> proposed a more radical reinterpretation: the Einstein-Rosen bridge,
they argue, is not a connection between two separate spaces but a
representation of time-reversal symmetry within a single quantum description.
On this reading, there is only one space, and the bridge is an artefact
of how you describe the time-symmetric structure of the quantum state. The
paper, published in <em>Classical and Quantum Gravity</em>, attracted considerable
press coverage. It sits somewhat outside the mainstream of holographic
quantum gravity research, and the proposal has not yet been widely
integrated into the community&rsquo;s working framework — the language of two
entangled systems and a connecting geometry remains the dominant picture
in AdS/CFT calculations. But the direction it points is consistent with
the other two results.</p>
<p>Taken together, these papers suggest that the word &ldquo;wormhole&rdquo; has been
quietly revised from a noun into an adjective. Not a thing that exists
somewhere, but a property of certain mathematical structures — one that
describes correlation, disorder, or symmetry depending on which context
you are working in. Each attempt to pin down what a wormhole <em>is</em> in
practice finds something less traversable, less connected, and less
tunnel-like than the previous attempt.</p>
<p>This is, to put it plainly, consistent with the theme of this article.</p>
<hr>
<h2 id="causation-eating-its-own-tail">Causation Eating Its Own Tail</h2>
<p>The wormhole&rsquo;s physical problems become even sharper when you add time.
A traversable wormhole connecting two different spacetime regions can in
principle connect not just two different places but two different <em>times</em> —
creating a closed timelike curve (CTC), a path through spacetime that loops
back on itself. You leave on Tuesday and arrive last Thursday.</p>
<p>The standard paradoxes then apply. The grandfather paradox: you travel back
in time, prevent an event that was a necessary precondition of your journey.
The causal chain that produced the journey destroys the causal chain that
produced the journey. The bootstrap paradox: an object or piece of
information exists with no origin — passed back in time repeatedly, it has
always already existed, created by nothing, caused by itself.</p>
<p>Friedman, Morris, Novikov and colleagues formalised what has become known
as the Novikov self-consistency principle: the only physically admissible
solutions are those in which the causal structure is globally consistent <a href="#ref-6">[6]</a>. No grandfather paradox — not because you cannot
go back, but because if you do, it turns out you were always part of the
causal chain you thought you were disrupting. The time-traveller cannot
prevent an event; they can only be the mechanism by which it occurred.</p>
<p>This is not resolution. It is constraint. The universe selects only the
self-consistent loops, filtering out everything else. The causal structure
enforces a particular kind of conservatism: only actions that were always
going to happen can happen. There is no freedom in a closed timelike curve.
Trying to change the loop from inside it is exactly like trying to relax
by monitoring whether you have relaxed: the mechanism of change is part
of the thing you are trying to change.</p>
<hr>
<h2 id="rick-sanchezs-particular-problem">Rick Sanchez&rsquo;s Particular Problem</h2>
<p>Rick and Morty is, among other things, a sustained meditation on
this structure — without ever calling it that.</p>
<p>Rick Sanchez is the smartest being in every universe. His portal gun
creates traversable wormholes instantaneously and at negligible energy
cost, which is exactly what general relativity and quantum field theory
suggest should be impossible. The show waves this away; what it does not
wave away is the <em>psychological</em> consequence of Rick&rsquo;s capability.</p>
<p>Rick has thought his way to the conclusion that nothing matters. Infinite
universes, infinite timelines, infinite Ricks: every moment is replaceable,
every loss is recoverable somewhere else, every moral weight dissolves
in the face of the combinatorial enormity of everything that exists. This
is Rick&rsquo;s version of relaxation — the nihilism that should follow from
taking the multiverse seriously.</p>
<p>But the monitoring process runs. Rick checks whether he has achieved
not-caring, finds that he cares (about Morty, about Beth, about being
the smartest one in the room), and the caring becomes more vivid for
having been suppressed. His nihilism is not peace. It is a performance of
peace that is constantly undermined by the monitoring process watching
for cracks.</p>
<p>Rick&rsquo;s portal gun solves every spatial and temporal problem. It does not
solve the ironic process. No level of intelligence, and no number of
traversable wormholes, provides a shortcut past Wegner&rsquo;s monitor.
This is, I think, what makes the character work: the show&rsquo;s impossible
physics is the premise, but the <em>actually</em> impossible thing — the one the
show treats as genuinely intractable — is the psychological paradox.</p>
<hr>
<h2 id="the-common-structure">The Common Structure</h2>
<p>These cases — the relaxation paradox, the traversable wormhole, the closed
timelike curve — share a formal structure.</p>
<p>In each case, there is a desired end state (relaxation, passage through the
wormhole, a changed past) and a mechanism for pursuing it (effortful
monitoring, exotic matter, time travel). In each case, the mechanism
required to pursue the end state is incompatible with the end state itself.
The monitoring process that tracks &ldquo;am I relaxed?&rdquo; is the activity of not
being relaxed. The exotic matter that holds the wormhole open is the
physical condition that makes the geometry so extreme that traversal is
barely possible. The attempt to change the past is always already part
of the past you were trying to change.</p>
<p>The physicist&rsquo;s version of this is the quantum measurement problem: the act
of observing a system disturbs it. The observer cannot step outside the
measurement. The psychologist&rsquo;s version is the ironic process. The
relativist&rsquo;s version is the closed timelike curve. The narrative version
is Rick Sanchez.</p>
<hr>
<h2 id="what-actually-works">What Actually Works</h2>
<p>Wegner&rsquo;s answer to the ironic process is not to try harder with the
operating process. It is to release the monitoring process — to stop checking
whether the goal has been achieved. This is the core insight behind
Acceptance and Commitment Therapy: you cannot think your way to not-thinking.
The goal of not-thinking requires not-monitoring, which means not having the
goal in the active, effortful sense at all.</p>
<p>This is harder than it sounds. It is a second-order intervention: instead
of trying to relax, you try to stop trying to relax — which, done badly,
just adds another monitoring process. But done well, it is the correct
diagnosis: the category error was treating relaxation as an effortful goal
in the first place.</p>
<p>For wormholes, the physics community has arrived at a related answer. The
question &ldquo;how do we make a macroscopic traversable wormhole in our
spacetime?&rdquo; may be the wrong question. The ER=EPR framework suggests that
wormholes and quantum entanglement are two descriptions of the same thing.
The question is not how to build a tunnel; it is what the entanglement
structure of spacetime already is, and how information is already being
transferred through it. The shortcut was never a shortcut. It was always
just the ordinary geometry of entangled quantum systems, described in
a language that made it look exotic.</p>
<p>For Rick Sanchez, the show has not found an answer. Which is, probably,
the correct narrative decision.</p>
<hr>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Wegner, D. M. (1994). Ironic processes of mental control. <em>Psychological Review</em>, 101(1), 34–52. <a href="https://doi.org/10.1037/0033-295X.101.1.34">https://doi.org/10.1037/0033-295X.101.1.34</a></p>
<p><span id="ref-2"></span>[2] Morris, M. S., &amp; Thorne, K. S. (1988). Wormholes in spacetime and their use for interstellar travel: A tool for teaching general relativity. <em>American Journal of Physics</em>, 56(5), 395–412. <a href="https://doi.org/10.1119/1.15620">https://doi.org/10.1119/1.15620</a></p>
<p><span id="ref-3"></span>[3] Ford, L. H., &amp; Roman, T. A. (1996). Quantum field theory constrains traversable wormhole geometries. <em>Physical Review D</em>, 53(10), 5496–5507. <a href="https://doi.org/10.1103/PhysRevD.53.5496">https://doi.org/10.1103/PhysRevD.53.5496</a></p>
<p><span id="ref-4"></span>[4] Jafferis, D., Zlokapa, A., Lykken, J. D., Kolchmeyer, D. K., Davis, S. I., Lauk, N., Neven, H., &amp; Spiropulu, M. (2022). Traversable wormhole dynamics on a quantum processor. <em>Nature</em>, 612, 51–55. <a href="https://doi.org/10.1038/s41586-022-05424-3">https://doi.org/10.1038/s41586-022-05424-3</a></p>
<p><span id="ref-5"></span>[5] Maldacena, J., &amp; Susskind, L. (2013). Cool horizons for entangled black holes. <em>Fortschritte der Physik</em>, 61(9), 781–811. <a href="https://doi.org/10.1002/prop.201300020">https://doi.org/10.1002/prop.201300020</a></p>
<p><span id="ref-6"></span>[6] Friedman, J., Morris, M. S., Novikov, I. D., Echeverria, F., Klinkhammer, G., Thorne, K. S., &amp; Yurtsever, U. (1990). Cauchy problem in spacetimes with closed timelike curves. <em>Physical Review D</em>, 42(6), 1915–1930. <a href="https://doi.org/10.1103/PhysRevD.42.1915">https://doi.org/10.1103/PhysRevD.42.1915</a></p>
<p><span id="ref-7"></span>[7] Maloney, A., Meruliya, V., &amp; Van Raamsdonk, M. (2025). arXiv:2503.12227. <a href="https://arxiv.org/abs/2503.12227">https://arxiv.org/abs/2503.12227</a></p>
<p><span id="ref-8"></span>[8] Magán, J. M., Sasieta, M., &amp; Swingle, B. (2025). Einstein-Rosen caterpillar. <em>Physical Review Letters</em>, 135. <a href="https://doi.org/10.1103/btw6-44ry">https://doi.org/10.1103/btw6-44ry</a></p>
<p><span id="ref-9"></span>[9] Gaztañaga, E., Kumar, A., &amp; Marto, J. (2025). <em>Classical and Quantum Gravity</em>. <a href="https://doi.org/10.1088/1361-6382/ae3044">https://doi.org/10.1088/1361-6382/ae3044</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>A Gas at Temperature T: Xenakis and the Physics of Stochastic Music</title>
      <link>https://sebastianspicker.github.io/posts/xenakis-stochastic-music/</link>
      <pubDate>Tue, 14 Oct 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/xenakis-stochastic-music/</guid>
      <description>Iannis Xenakis applied the Maxwell-Boltzmann velocity distribution, Markov chains, and game theory to orchestral composition. In Pithoprakta (1955–56), 46 string parts are molecules of a gas, each following the kinetic theory distribution. In Duel and Stratégie (1959–62), two conductors play a zero-sum game with payoff matrices on stage. This post works through the physics and mathematics, and asks what it means when a composer treats an orchestra as a thermodynamic system.</description>
      <content:encoded><![CDATA[<p><em>Iannis Xenakis (1922–2001) was trained as a civil engineer at the Athens
Polytechnic, joined the Greek Resistance during the Second World War and the
subsequent Greek Civil War, survived a British army tank shell in January 1945
that cost him the sight in his left eye and part of his jaw, was sentenced to
death in absentia by the Greek military government, fled to Paris in 1947, and
worked for twelve years as an architect in Le Corbusier&rsquo;s atelier — where he
contributed structural engineering to the Unité d&rsquo;Habitation in Marseille and
designed the Philips Pavilion for Expo 58. In parallel, already in his thirties,
he taught himself composition — approaching Honegger (who was too ill to teach) and then studying with Messiaen
— and became one of the central figures of the post-war avant-garde. I mention
the biography not as background colour but because it bears on the physics. A
person who has been through what Xenakis had been through by 1950 is not likely
to be intimidated by the kinetic theory of gases.</em></p>
<p><em>He was not. In 1955–56 he composed</em> Pithoprakta <em>— &ldquo;actions through
probability&rdquo; — for 46 strings, each of which is, in his own account, a
molecule of an ideal gas. This post works through the mathematics he
used and asks what it means when a composer takes statistical mechanics
seriously as a compositional tool.</em></p>
<hr>
<h2 id="the-problem-with-post-war-serialism">The Problem with Post-War Serialism</h2>
<p>To understand why Xenakis did what he did, it helps to know what everyone
else was doing. By the early 1950s, the dominant tendency in European
new music was total serialism: the systematic extension of Schoenberg&rsquo;s
twelve-tone technique to rhythm, dynamics, articulation, and register. Every
parameter of every note was determined by a series. Messiaen had sketched
this direction in <em>Mode de valeurs et d&rsquo;intensités</em> (1949); Boulez and
Stockhausen had taken it to its logical extreme.</p>
<p>The result, as Xenakis observed with characteristic bluntness in <em>Formalized
Music</em> (1963/1992), was a kind of sonic indistinguishability: because every
parameter varied according to independent deterministic series, the textures
produced by total serialism sounded essentially like random noise. The
maximum of local determinism had produced the appearance of global chaos.</p>
<p>His diagnosis was precise and, I think, correct: if the perceptual result of
maximum determinism and maximum randomness is the same, then the path forward
is not to find a better deterministic scheme but to embrace randomness
explicitly, at the level that governs the <em>macroscopic</em> structure. Control the
distribution; let the individual events vary within it. This is exactly what
statistical mechanics does for a gas: it does not track every molecule, but
it knows with great precision what the distribution of velocities will be.</p>
<hr>
<h2 id="statistical-mechanics-in-brief">Statistical Mechanics in Brief</h2>
<p>In a classical ideal gas of $N$ molecules at thermal equilibrium with
temperature $T$, the molecules move in all directions with speeds distributed
according to the Maxwell-Boltzmann speed distribution:</p>
$$f(v) = \sqrt{\frac{2}{\pi}}\, \frac{v^2}{a^3}\, \exp\!\left(-\frac{v^2}{2a^2}\right), \qquad a = \sqrt{\frac{k_B T}{m}},$$<p>where $m$ is the molecular mass and $k_B$ is Boltzmann&rsquo;s constant. The
parameter $a$ sets the characteristic speed scale: it grows with temperature
(hotter gas means faster molecules) and shrinks with molecular mass (heavier
molecules move more slowly at the same temperature).</p>
<p>The distribution has a characteristic shape: it rises as $v^2$ for small
speeds (few molecules are nearly stationary), peaks at the most probable
speed $v_p = a\sqrt{2}$, and falls off as $e^{-v^2/2a^2}$ for large speeds
(very fast molecules are exponentially rare). The three characteristic
speeds are:</p>
$$v_p = a\sqrt{2}, \qquad \langle v \rangle = a\sqrt{\tfrac{8}{\pi}}, \qquad v_\mathrm{rms} = a\sqrt{3}.$$<p>No individual molecule is tracked. The distribution is everything: once you
know $f(v)$, you know all macroscopic properties of the gas — pressure,
mean kinetic energy, thermal conductivity — without knowing the trajectory
of a single molecule. The individual is sacrificed to the ensemble.</p>
<hr>
<h2 id="pithoprakta-and-the-orchestra-as-gas"><em>Pithoprakta</em> and the Orchestra as Gas</h2>
<p>In <em>Pithoprakta</em> (1955–56), Xenakis treats each of the 46 string instruments
as a molecule of a gas. The musical analogue of molecular speed is the
<em>velocity of a glissando</em>: the rate at which a glissando moves through
pitch, measured in semitones per second. Slow glissandi are cold molecules;
fast glissandi are hot ones.</p>
<p>For a given passage with a specified musical &ldquo;temperature&rdquo; (an
intensity-and-density parameter he could set as a compositional choice),
the 46 glissando speeds are drawn from the Maxwell-Boltzmann distribution
for that temperature. No two strings play the same glissando at the same
speed. The effect, to a listener, is a dense sound-mass — a shimmer or
a roar — whose internal texture varies but whose overall character (the
temperature, the density) is under the composer&rsquo;s control at exactly the
level that matters perceptually.</p>
<p>Xenakis worked out the velocities numerically by hand. The score of
<em>Pithoprakta</em> was among the first in which the individual parts were derived
from a statistical distribution rather than from a melody, a row, or an
improvisation instruction. The calculation is tedious but not difficult:
for each time window, choose a temperature, compute $f(v)$ for the 46
values of $v$ that tile the distribution, and assign one speed to each
instrument.</p>
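<p>To make the procedure concrete, here is a minimal sketch in Python (not Xenakis&rsquo;s hand calculation, and with an arbitrary scale parameter standing in for the musical temperature) that tiles the Maxwell-Boltzmann distribution at 46 equally spaced quantiles and assigns one glissando speed to each part.</p>
<pre><code class="language-python"># A sketch, not Xenakis's method: tile the Maxwell-Boltzmann speed
# distribution at equally spaced quantiles and give one speed to each part.
import numpy as np
from scipy.stats import maxwell

N_PARTS = 46        # string parts in Pithoprakta
a = 4.0             # scale parameter, a stand-in for the musical "temperature"

# Mid-point quantiles 1/(2N), 3/(2N), ... tile the distribution evenly.
quantiles = (np.arange(N_PARTS) + 0.5) / N_PARTS
speeds = maxwell.ppf(quantiles, scale=a)   # glissando speeds, semitones/second

for part, v in enumerate(speeds, start=1):
    print(f"part {part:2d}: {v:5.2f} semitones per second")

# Sanity check against the closed-form characteristic speeds.
print("most probable:", a * np.sqrt(2))
print("mean         :", a * np.sqrt(8 / np.pi))
print("rms          :", a * np.sqrt(3))
</code></pre>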
<p>The connection between macroscopic structure and microscopic liberty is
deliberately preserved. The shape of the sound-mass — its brightness,
its turbulence, its rate of change — is controlled. Each individual line
is unpredictable. This is, structurally, the same trade-off that makes
thermodynamics work: you give up on the individual trajectory and gain
exact knowledge of the aggregate.</p>
<hr>
<h2 id="musical-temperature-as-a-compositional-parameter">Musical Temperature as a Compositional Parameter</h2>
<p>The analogy is worth making precise. In the physical gas, raising the
temperature $T$ increases $a = \sqrt{k_B T / m}$, which shifts the
peak of $f(v)$ to the right and widens the distribution. More molecules
have high speeds; the variance of speeds increases.</p>
<p>In <em>Pithoprakta</em>, raising the musical &ldquo;temperature&rdquo; has the same
effect: more instruments perform rapid glissandi; the pitch-space
trajectories are more varied; the texture becomes more active and
more turbulent. Lowering the temperature concentrates the glissando
speeds near zero — slow motion, near-stasis, long sustained tones
that change pitch only gradually. The orchestra cools.</p>
<p>This mapping is not metaphorical. Xenakis computed it. The score
contains numerically derived glissando speeds; the connection between the
perceptual temperature of the texture and the statistical parameter $T$ is
quantitative. When musicians speak of a passage &ldquo;heating up,&rdquo; they are
usually using a figure of speech. In <em>Pithoprakta</em>, they are describing
a thermodynamic fact.</p>
<hr>
<h2 id="the-poisson-distribution-and-event-density">The Poisson Distribution and Event Density</h2>
<p><em>Pithoprakta</em> uses a second physical model alongside the Maxwell-Boltzmann
distribution: the Poisson process, which governs the density of
independent, randomly occurring events.</p>
<p>If musical events (pizzicato attacks, bow changes, individual note entries)
occur at a mean rate of $\lambda$ events per second, the probability of
exactly $k$ events occurring in a time window of length $T$ (a duration here,
not the temperature of the earlier sections) is:</p>
$$P(N = k) = \frac{(\lambda T)^k\, e^{-\lambda T}}{k!}.$$<p>The Poisson distribution has a single parameter $\lambda$ that controls
both the mean and the variance (they are equal: $\langle N \rangle =
\mathrm{Var}(N) = \lambda T$). A high $\lambda$ produces a dense cluster
of events; a low $\lambda$ produces sparse, widely spaced events.</p>
<p>Xenakis used this to control the density of pizzicato attacks independently
of the glissando texture. A passage can be cool (slow glissandi) and dense
(many pizzicati), or hot and sparse, or any combination. The two
distributions operate on independent musical parameters — pitch motion and
event density — giving the composer a two-dimensional thermodynamic control
space over the texture.</p>
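<p>A corresponding sketch for the event-density side, with illustrative values rather than anything taken from the score: draw the number of attacks in a window from the Poisson distribution, then spread them uniformly across the window.</p>
<pre><code class="language-python"># A sketch with illustrative parameters, not values from the score: place
# pizzicato attacks in a window as a homogeneous Poisson process of rate lam.
import numpy as np

rng = np.random.default_rng(0)
lam = 8.0      # mean attacks per second (the "density" knob)
T = 5.0        # window length in seconds

n_events = rng.poisson(lam * T)                 # how many attacks fall in the window
attack_times = np.sort(rng.uniform(0.0, T, size=n_events))

print(f"{n_events} attacks (mean and variance are both {lam * T:.0f})")
print(np.round(attack_times, 2))
</code></pre>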
<hr>
<h2 id="markov-chains-analogique-a-and-analogique-b">Markov Chains: <em>Analogique A</em> and <em>Analogique B</em></h2>
<p>In <em>Analogique A</em> (for string orchestra, 1958–59) and its companion
<em>Analogique B</em> (for sinusoidal tones, same year), Xenakis moved to a
different stochastic framework: Markov chains.</p>
<p>A Markov chain is a sequence of states where the probability of
transitioning to the next state depends only on the current state. The
chain is specified by a transition matrix $P$, where $P_{ij}$ is the
probability of moving from state $i$ to state $j$:</p>
$$P_{ij} \geq 0, \qquad \sum_j P_{ij} = 1 \quad \forall\, i.$$<p>Under mild conditions (irreducibility and aperiodicity), the chain
converges to a unique stationary distribution $\pi$ satisfying:</p>
$$\pi P = \pi, \qquad \sum_i \pi_i = 1.$$<p>The convergence is geometric: if $\lambda_2$ is the second-largest eigenvalue
of $P$ in absolute value, then after $n$ steps the distribution $\pi^{(n)}$
satisfies $\|\pi^{(n)} - \pi\| \leq C |\lambda_2|^n$ for some constant $C$.
The gap $1 - |\lambda_2|$ — the <em>spectral gap</em> — controls how quickly the
chain forgets its initial state. A transition matrix with a large spectral
gap produces rapid convergence; one with $|\lambda_2| \approx 1$ produces
long-memory dependence between distant states. This is a compositional
choice: the spectral gap determines how quickly a piece&rsquo;s texture changes
character.</p>
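<p>The computation involved is small enough to show directly. The sketch below uses a hypothetical three-state transition matrix (Xenakis&rsquo;s actual matrices for <em>Analogique A</em> are larger) and extracts the stationary distribution and the spectral gap.</p>
<pre><code class="language-python"># A sketch with a hypothetical 3-state transition matrix, not Xenakis's own:
# compute the stationary distribution and the spectral gap.
import numpy as np

P = np.array([[0.6, 0.3, 0.1],   # rows: current state, columns: next state
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])

# Stationary distribution: left eigenvector of P with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
idx = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, idx])
pi = pi / pi.sum()

# Spectral gap: one minus the modulus of the second-largest eigenvalue.
moduli = np.sort(np.abs(eigvals))[::-1]
gap = 1.0 - moduli[1]

print("stationary distribution:", np.round(pi, 3))
print("spectral gap           :", round(gap, 3))
</code></pre>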
<p>In <em>Analogique A</em>, Xenakis divided the sonic space into a grid of
cells defined by pitch register (high/middle/low), density
(sparse/medium/dense), and dynamic (soft/loud). Each &ldquo;screen&rdquo; — a brief
time window — occupies one cell in this grid. The progression of screens
through the piece is governed by transition probabilities: from a
high/dense/loud screen, there is some probability of moving to each
adjacent cell, specified by Xenakis&rsquo;s chosen transition matrix.</p>
<p>This is a Markov chain on a discrete state space of sonic textures. The
macroscopic trajectory of the piece — its overall movement through
sound-quality space — is determined by the transition matrix, which the composer
sets. The details of each screen are filled in stochastically, within the
parameters of the current state. Again, the individual is sacrificed to the
aggregate; control is exercised at the level of the distribution rather
than the event.</p>
<hr>
<h2 id="game-theory-duel-and-stratégie">Game Theory: <em>Duel</em> and <em>Stratégie</em></h2>
<p>The most extreme and, to my mind, most interesting of Xenakis&rsquo;s
formalisations is the use of game theory in <em>Duel</em> (1959) and <em>Stratégie</em>
(1962).</p>
<p>A <strong>two-player zero-sum game</strong> is specified by a payoff matrix $A \in
\mathbb{R}^{m \times n}$. Player 1 (the &ldquo;maximiser&rdquo;) chooses a row $i$;
Player 2 (the &ldquo;minimiser&rdquo;) chooses a column $j$; Player 1 receives payoff
$A_{ij}$ and Player 2 receives $-A_{ij}$. In a pure-strategy game, each
player selects a single action. In a <strong>mixed-strategy game</strong>, each player
chooses a probability distribution over their actions: Player 1 uses
$\mathbf{x} \in \Delta_m$ and Player 2 uses $\mathbf{y} \in \Delta_n$,
where $\Delta_k$ denotes the standard $(k-1)$-simplex.</p>
<p>The expected payoff to Player 1 under mixed strategies is:</p>
$$E(\mathbf{x}, \mathbf{y}) = \mathbf{x}^\top A\, \mathbf{y}.$$<p>Von Neumann&rsquo;s minimax theorem (1928) guarantees that:</p>
$$\max_{\mathbf{x} \in \Delta_m} \min_{\mathbf{y} \in \Delta_n}
\mathbf{x}^\top A\, \mathbf{y}
\;=\;
\min_{\mathbf{y} \in \Delta_n} \max_{\mathbf{x} \in \Delta_m}
\mathbf{x}^\top A\, \mathbf{y}
\;=\; v^*,$$<p>where $v^*$ is the <strong>value</strong> of the game. The pair $(\mathbf{x}^*,
\mathbf{y}^*)$ that achieves this saddle point is the Nash equilibrium:
neither player can improve their expected payoff by unilaterally deviating
from their equilibrium strategy.</p>
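<p>Finding the equilibrium mixed strategy is a linear programme, and for matrices of the size Xenakis used it is a routine computation today. The sketch below solves a small hypothetical $3 \times 3$ game with SciPy&rsquo;s linear-programming routine; the same code applies unchanged to a $19 \times 19$ matrix.</p>
<pre><code class="language-python"># A sketch: solve a small zero-sum game by linear programming. The payoff
# matrix here is hypothetical; Xenakis's Strategie matrix is 19 x 19.
import numpy as np
from scipy.optimize import linprog

A = np.array([[ 0,  2, -1],
              [-2,  0,  3],
              [ 1, -3,  0]])
m, n = A.shape

# Variables: the row player's mixed strategy x_1..x_m, plus the game value v.
# Maximise v subject to (A^T x)_j >= v for every column j and sum(x) = 1.
c = np.zeros(m + 1)
c[-1] = -1.0                                # linprog minimises, so minimise -v
A_ub = np.hstack([-A.T, np.ones((n, 1))])   # v minus (A^T x)_j must not exceed 0
b_ub = np.zeros(n)
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
b_eq = np.array([1.0])
bounds = [(0, None)] * m + [(None, None)]   # probabilities nonnegative, v free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x_star, v_star = res.x[:m], res.x[-1]
print("optimal mixed strategy:", np.round(x_star, 3))
print("value of the game     :", round(v_star, 3))
</code></pre>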
<p>In <em>Stratégie</em>, each conductor leads one orchestra. Each has nineteen
&ldquo;tactics&rdquo; — six basic musical textures (e.g., sustained chords, staccato
pizzicati, glissandi masses, silence) plus thirteen combinatorial tactics
that combine two or three of the basics. The payoff matrix is a
$19 \times 19$ integer matrix, also defined by Xenakis, specifying how
many points Conductor 1 scores when their orchestra plays tactic $i$ against
Conductor 2&rsquo;s tactic $j$. A referee tracks the score.</p>
<p>The conductors make decisions in real time during the performance, choosing
tactics based on what the other conductor is doing and on the evolving score.
The piece ends when one conductor reaches a predetermined score threshold.</p>
<p>The Nash equilibrium of the payoff matrix tells each conductor, in principle,
the optimal <em>distribution</em> over tactics to play: if both play optimally, the
expected score trajectory is determined. In practice, conductors are not
expected to compute mixed strategies on the podium; Xenakis&rsquo;s point is
structural. The game-theoretic formalism is used to design the payoff matrix
so that no tactic dominates — every choice has consequences that depend on
the opponent&rsquo;s choice — guaranteeing that the piece will always contain
genuine strategic tension regardless of who is conducting.</p>
<p><em>Duel</em> (1959) is the earlier, simpler version for two chamber orchestras.
<em>Stratégie</em> (1962) was premiered in April 1963 at the Venice Biennale with two conductors
competing live. The audience was aware of the game, of the score, and of
the payoff matrix. The premiere was by most accounts a success, though the
practical complications of running a zero-sum game in a concert hall
(including the question of whether conductors were actually computing Nash
equilibria or just following intuition) were never fully resolved.</p>
<hr>
<h2 id="formalized-music"><em>Formalized Music</em></h2>
<p>Xenakis assembled his theoretical framework in <em>Musiques formelles</em> (1963),
translated and expanded as <em>Formalized Music</em> (1971; revised edition 1992).
The book is one of the strangest documents in twentieth-century music theory:
part treatise, part manifesto, part mathematical appendix. It covers
stochastic composition, Markov chains, game theory, set theory, group theory,
and symbolic logic — all presented with the confidence of someone who is
equally at home in the engineering faculty and the concert hall, and with
the occasional obscurity of someone writing simultaneously for two audiences
who share almost no vocabulary.</p>
<p>The core argument is that musical composition can and should be treated as
the application of mathematical structures to sonic material, not because
mathematics makes music &ldquo;better&rdquo; but because mathematical structures are
the most powerful available tools for controlling relationships between
sounds at multiple scales simultaneously. The statistical distributions
control the macroscopic; the individual values vary within them. The
game-theoretic payoff matrix controls the strategic interaction; the individual
tactics fill in the details. Mathematics operates at the structural level
and leaves the acoustic surface free.</p>
<p>This is a different relationship between mathematics and music from the
ones in my earlier posts on <a href="/posts/messiaen-modes-group-theory/">group theory and Messiaen</a>
or <a href="/posts/euclidean-rhythms/">the Euclidean algorithm and world rhythms</a>.
In those cases, mathematics describes structure that already exists in the
music — structure the composers arrived at by ear. In Xenakis, mathematics
is the generative tool: the score is derived from the calculation.</p>
<hr>
<h2 id="what-the-analogy-does-and-does-not-do">What the Analogy Does and Does Not Do</h2>
<p>The Maxwell-Boltzmann analogy in <em>Pithoprakta</em> is exact in one direction
and approximate in another.</p>
<p>It is exact in the following sense: the glissando speeds Xenakis computed
for his 46 strings genuinely follow the Maxwell-Boltzmann distribution with
the parameters he chose. The score is a realisation of that distribution.
If you collect the glissando speeds from the score and plot their histogram,
you will find the characteristic $v^2 e^{-v^2/2a^2}$ shape.</p>
<p>It is approximate — or rather, it is analogical — in the sense that strings
in an orchestra are not molecules of a gas. They do not collide. They have
mass and inertia in a physical sense that has no direct mapping to
musical parameters. The temperature $T$ is not a temperature in any
thermodynamic sense; it is a compositional variable that Xenakis chose to
parameterise with the same symbol because the formal relationship is the
same. The analogy is structural, not ontological.</p>
<p>This is worth saying plainly because it is easy to be misled in both
directions: either to over-claim (the orchestra <em>is</em> a gas) or to dismiss
(the orchestra is <em>merely</em> labelled with physical vocabulary). The actual
claim is more modest and more interesting: the mathematical structure of the
Maxwell-Boltzmann distribution is the right tool for specifying a certain
kind of orchestral texture, namely one where individual elements vary
stochastically around a controlled macroscopic envelope. The physics
provides the formalism; the music provides the application. This is how
mathematics works in engineering, too.</p>
<hr>
<h2 id="the-centenary-and-what-remains">The Centenary and What Remains</h2>
<p>Xenakis died in 2001, by then partially deaf and with dementia. His centenary
in 2022 produced a wave of new performances, recordings, and scholarship
— including the <em>Meta-Xenakis</em> volume (Open Book Publishers, 2022), which
collects analyses of his compositional mathematics, his architectural work
(he designed the Philips Pavilion for Le Corbusier&rsquo;s Expo 58 in Brussels
using the same ruled-surface geometry he was using in <em>Metastaseis</em>), and
his political biography.</p>
<p>What remains resonant about his project is not the specific distributions
he chose — the Maxwell-Boltzmann is not the only or even necessarily the
best distribution for many musical applications — but the epistemological
position it represents. Xenakis insisted that the right question to ask
about a musical texture is not &ldquo;what is the note at beat 3 of bar 47?&rdquo; but
&ldquo;what is the distribution from which the events in this section are drawn?&rdquo;
This shift from individual determination to statistical control is precisely
the shift that makes thermodynamics possible as a science, and Xenakis was
the first composer to apply it deliberately and systematically.</p>
<p>When a composer writes &ldquo;let the
orchestra be a gas at temperature $T$&rdquo; and then actually computes the
consequences with Boltzmann&rsquo;s constant in front of him, I do not feel that
physics has been appropriated. I feel that it has been recognised — seen,
from a different direction, as the same thing it always was: a set of tools
for thinking about ensembles of interacting elements whose individual
behaviour is too complex to track but whose collective behaviour is not.</p>
<p>The orchestra is not a gas. But the Maxwell-Boltzmann distribution describes
it anyway.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Ames, C. (1989). The Markov process as a compositional model: A survey and
tutorial. <em>Leonardo</em>, 22(2), 175–187. <a href="https://doi.org/10.2307/1575226">https://doi.org/10.2307/1575226</a></p>
</li>
<li>
<p>Jedrzejewski, F. (2006). <em>Mathematical Theory of Music.</em> Delatour France /
IRCAM.</p>
</li>
<li>
<p>Nash, J. F. (1950). Equilibrium points in $n$-person games. <em>Proceedings of
the National Academy of Sciences</em>, 36(1), 48–49.
<a href="https://doi.org/10.1073/pnas.36.1.48">https://doi.org/10.1073/pnas.36.1.48</a></p>
</li>
<li>
<p>Nierhaus, G. (2009). <em>Algorithmic Composition: Paradigms of Automated Music
Generation.</em> Springer.</p>
</li>
<li>
<p>Matossian, N. (2005). <em>Xenakis</em> (revised ed.). Moufflon Publications.</p>
</li>
<li>
<p>Solomos, M. (Ed.). (2022). <em>Meta-Xenakis.</em> Open Book Publishers.
<a href="https://doi.org/10.11647/OBP.0313">https://doi.org/10.11647/OBP.0313</a></p>
</li>
<li>
<p>von Neumann, J. (1928). Zur Theorie der Gesellschaftsspiele. <em>Mathematische
Annalen</em>, 100(1), 295–320. <a href="https://doi.org/10.1007/BF01448847">https://doi.org/10.1007/BF01448847</a></p>
</li>
<li>
<p>von Neumann, J., &amp; Morgenstern, O. (1944). <em>Theory of Games and Economic
Behavior.</em> Princeton University Press.</p>
</li>
<li>
<p>Xenakis, I. (1992). <em>Formalized Music: Thought and Mathematics in
Composition</em> (revised ed.). Pendragon Press.
(Originally published as <em>Musiques formelles</em>, La Revue Musicale, 1963.)</p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-01-14</strong>: Corrected the description of <em>Stratégie</em> (1962): each conductor has nineteen tactics (six basic plus thirteen combinatorial), with a 19 x 19 payoff matrix — not six tactics and a 6 x 6 matrix. The six-tactic, 6 x 6 description applies to the earlier <em>Duel</em> (1959).</li>
<li><strong>2026-01-14</strong>: Added &ldquo;in April 1963&rdquo; to the <em>Stratégie</em> premiere sentence. The composition date is 1962; the premiere took place on 25 April 1963 at the Venice Biennale.</li>
<li><strong>2026-01-14</strong>: Changed &ldquo;studying briefly with Honegger&rdquo; to &ldquo;approaching Honegger (who was too ill to teach).&rdquo; Xenakis sought instruction from Honegger circa 1949, but Honegger was in declining health and did not take him as a student.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>From Oxide to Oversampling: The Physics of Recorded Sound</title>
      <link>https://sebastianspicker.github.io/posts/tape-saturation-delta-sigma-adc-physics/</link>
      <pubDate>Fri, 15 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/tape-saturation-delta-sigma-adc-physics/</guid>
      <description>&amp;lsquo;Analogue warmth&amp;rsquo; and &amp;lsquo;digital coldness&amp;rsquo; are not aesthetic preferences — they are different physics. Ferromagnetic hysteresis generates even harmonics. Delta-sigma modulators push quantisation noise to ultrasonic frequencies. Both effects are calculable.</description>
      <content:encoded><![CDATA[<p>There is an argument that has been running in recording studios since roughly 1982, when the first commercially mastered compact discs appeared. On one side: analogue tape has warmth, depth, something the ear likes. On the other: digital audio is more accurate, lower noise, the measurements say so. The argument produces more heat than light, because most participants treat it as an aesthetic question — a matter of feeling, taste, preference. It is not. The difference between tape and digital audio is a physics difference, and the physics is specific enough to calculate.</p>
<p>The physics here turns out to be some of my favourite kind: it sits at the intersection of condensed matter, signal processing, and Fourier analysis, and it connects directly to why certain sounds are perceived as pleasant. This post walks through both sides. Part I is the ferromagnetic physics of magnetic tape and the harmonic structure of saturation distortion. Part II is the delta-sigma modulator and the engineering trick that achieves 24-bit dynamic range from a 1-bit comparator. Neither side of the debate is as simple as its partisans claim, and the physics of both is more interesting than the aesthetics argument they have been stuck in for forty years.</p>
<hr>
<h2 id="part-i-the-physics-of-magnetic-tape">Part I: The Physics of Magnetic Tape</h2>
<h3 id="ferromagnetic-recording">Ferromagnetic Recording</h3>
<p>Magnetic recording tape is a thin polymer substrate coated with a layer of ferromagnetic particles suspended in a binder. For most of the twentieth century those particles were iron oxide — specifically $\gamma\text{-Fe}_2\text{O}_3$, gamma-phase ferric oxide — though chromium dioxide ($\text{CrO}_2$) and later metal-particle formulations with pure iron or iron-cobalt alloys were developed for higher coercivity and better high-frequency response. What all of these materials share is the key property of ferromagnetism: each particle is a small permanent magnet, a magnetic domain with a net magnetic moment that can be oriented by an external field and that will retain that orientation when the field is removed.</p>
<p>The recording process exploits this directly. The recording head is a toroidal electromagnet with a narrow gap. When audio-frequency current flows through the head&rsquo;s coil, the field at the gap follows the current, and as the tape moves past at a fixed speed, successive particles along the tape length are aligned according to the instantaneous field at the moment they pass the gap. The result is a spatial encoding of the time-domain audio signal along the tape. On playback, the inverse process occurs: the moving pattern of magnetised particles generates a time-varying flux in the playback head&rsquo;s core, which induces a voltage in the coil by Faraday&rsquo;s law, reproducing the original current waveform.</p>
<p>So far this description is entirely linear. The head current maps to a field, the field maps to a magnetisation, the magnetisation maps back to a voltage. If all three relationships were linear, tape would be a near-perfect recording medium — limited only by particle noise and head gap frequency response. The nonlinearity comes from the second relationship in that chain, and it comes from the fundamental physics of how ferromagnetic materials respond to an applied field.</p>
<h3 id="the-b-h-curve-and-hysteresis">The B-H Curve and Hysteresis</h3>
<p>The relationship between the applied magnetic field intensity $H$ (from the recording head, measured in A/m) and the resulting magnetic flux density $B$ in the tape (measured in tesla) is not linear. It follows a curve — actually a family of nested curves — known as the hysteresis loop, and its shape determines almost everything interesting about tape recording <a href="#ref-3">[3]</a>.</p>
<p>Starting from a demagnetised state and increasing $H$ from zero, the initial slope $dB/dH$ — the magnetic permeability $\mu$ — is relatively low. The domains in the material are oriented randomly and require a threshold of energy to begin reorienting. As $H$ increases further, the permeability rises, and there is a region of steep, approximately linear increase in $B$. Then, as $H$ continues to increase, the material saturates: progressively fewer unaligned domains remain, the slope falls, and eventually $dB/dH \to 0$ as all domains are aligned. The $B$-$H$ curve is S-shaped, and the saturation is irreversible in a specific sense: if you now reduce $H$ back toward zero, $B$ does not retrace the original path. It remains at a higher value — the remanence $B_r$ — and you must apply a reverse field of magnitude $H_c$, the coercivity, to bring $B$ back to zero. The loop formed by this cycle of magnetisation and demagnetisation is the hysteresis loop, and its area is proportional to the energy dissipated as heat per cycle.</p>
<p>The crucial feature for audio recording is what happens near the origin. A small audio signal, sitting near $H = 0$, does not experience a nicely linear region of the $B$-$H$ curve. The initial permeability is low, and there is an inflection point near zero: the slope increases as you move away from zero before the saturation region brings it back down again. This means that even at low recording levels, the transfer function from head current to tape magnetisation is nonlinear, and in a particular way — the transfer curve is antisymmetric under $H \to -H$ (reversing the field reverses the magnetisation), so the distortion it generates is odd-order, dominated by the third harmonic, together with crossover-like distortion around the zero crossings. Without some remedy, even a gentle sine wave would emerge from the playback head with significant odd-harmonic content added. The signal would also sit in a region of the curve where the effective permeability depends on signal amplitude, making the recording level-dependent in an uncontrolled way. Something needed to be done about this, and the solution found in the 1940s is one of the more elegant pieces of applied physics in the history of the recording industry.</p>
<h3 id="the-bias-signal">The Bias Signal</h3>
<p>The solution is called AC bias, and its discovery is usually credited to Braunmühl and Weber at the German Reichs-Rundfunk-Gesellschaft around 1940, though there are earlier related patents. The idea is simple once stated: add a high-frequency signal — typically between 50 kHz and 150 kHz, well above the audio band — to the recording current before it drives the head. This bias signal has an amplitude large enough to drive the tape through multiple cycles of its B-H curve on each audio cycle, but it is filtered out of the playback signal by the tape&rsquo;s own limited high-frequency response and by subsequent low-pass filtering.</p>
<p>The effect on the recording process is to linearise the transfer function. The operating point is no longer stationary near the inflection point at $H = 0$. Instead, it rides up and down the B-H curve rapidly many times per audio period, driven by the bias. The audio signal merely modulates the envelope of this rapid oscillation. The net magnetisation that remains after the tape leaves the head gap is the time average of many rapid traversals of the hysteresis loop, and this average tracks the audio signal with good linearity provided the signal level is modest. The bias amplitude and frequency are tuned carefully for each tape formulation — too little bias and the linearisation is incomplete; too much and the signal is undermodulated and the high-frequency response suffers as the bias begins to erase fine spatial patterns written by high-frequency audio. Getting the bias right is part of the alignment procedure for every analogue tape machine and part of why different tape formulations require different machine settings.</p>
<p>The result, for moderate recording levels, is a remarkably clean and linear recording medium. The nonlinear character of the B-H curve is effectively tamed by the bias trick, and the remaining imperfections are mostly second-order: azimuth errors, print-through, head bump, self-demagnetisation at short wavelengths. For practical purposes, a well-aligned analogue tape machine at moderate recording levels is a linear system.</p>
<h3 id="harmonic-generation-at-high-levels">Harmonic Generation at High Levels</h3>
<p>At high recording levels — when the audio signal is large enough to push the operating point into the saturation region even after the bias has done its linearising work — the picture changes. The transfer function from input current to output magnetisation becomes genuinely nonlinear, and the harmonic content of the distortion becomes the central question.</p>
<p>The standard framework is a Taylor expansion of the transfer function around the operating point:</p>
$$y(t) = a_1 x(t) + a_2 x^2(t) + a_3 x^3(t) + a_4 x^4(t) + \cdots$$<p>where $x(t)$ is the input signal (the audio current), $y(t)$ is the output (the magnetisation recorded on tape), and the coefficients $a_n$ are determined by the shape of the B-H curve near saturation. For a pure tone $x(t) = A \sin(\omega t)$, the higher-order terms generate harmonics in a calculable way.</p>
<p>The second-order term gives:</p>
$$a_2 x^2(t) = a_2 A^2 \sin^2(\omega t) = \frac{a_2 A^2}{2}\bigl(1 - \cos 2\omega t\bigr)$$<p>This is a DC offset plus a component at $2\omega$ — the second harmonic, one octave above the fundamental.</p>
<p>The third-order term gives:</p>
$$a_3 x^3(t) = a_3 A^3 \sin^3(\omega t) = a_3 A^3 \left(\frac{3}{4}\sin\omega t - \frac{1}{4}\sin 3\omega t\right)$$<p>The $\frac{3}{4}$ piece adds to (or subtracts from) the fundamental depending on the sign of $a_3$; the $-\frac{1}{4}$ piece is a third harmonic at $3\omega$, one octave and a fifth above the fundamental.</p>
<p>Carrying through to fourth order:</p>
$$a_4 x^4(t) = \frac{a_4 A^4}{8}\bigl(3 - 4\cos 2\omega t + \cos 4\omega t\bigr)$$<p>which contributes additional DC, a component at $2\omega$, and a fourth harmonic at $4\omega$.</p>
<p>Collecting the terms through fourth order, the output is approximately:</p>
$$y(t) \approx \left(a_1 + \frac{3a_3 A^2}{4}\right)A\sin\omega t - \frac{a_2 A^2}{2}\cos 2\omega t - \frac{a_3 A^3}{4}\sin 3\omega t + \cdots$$<p>The important observation is about which harmonics dominate and what they sound like. The B-H curve of a ferromagnetic material near saturation is approximately symmetric: the saturation behaviour for positive $H$ mirrors that for negative $H$. A symmetric nonlinearity has $a_2 = a_4 = 0$ (all even coefficients vanish by symmetry), and only odd harmonics are generated. But at moderate levels, just before full saturation, the symmetry of the B-H loop as traversed by the biased signal is not perfect, and the even-order terms are nonzero — though small. This gives tape its characteristic distortion signature: at moderate saturation levels, the even harmonics ($2\omega$, $4\omega$) dominate; at heavy saturation, the odd harmonics ($3\omega$, $5\omega$) appear more strongly.</p>
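<p>The even/odd split is easy to verify numerically. The sketch below (with illustrative coefficients, not measured tape parameters) passes a sine through the truncated Taylor nonlinearity and reads the harmonic levels off an FFT: the purely odd-symmetric curve produces only odd harmonics, and adding a small $x^2$ term introduces the second harmonic.</p>
<pre><code class="language-python"># A sketch with illustrative coefficients, not measured tape parameters:
# pass a sine through y = a1*x + a2*x**2 + a3*x**3 and inspect the harmonics.
import numpy as np

fs, f0 = 48000, 1000.0            # sample rate and test-tone frequency in Hz
t = np.arange(fs) / fs            # exactly one second, so FFT bins are 1 Hz wide
x = 0.8 * np.sin(2 * np.pi * f0 * t)

def harmonic_levels(y, n_harm=5):
    """Levels of the first n_harm harmonics of f0, in dB relative to the fundamental."""
    spec = np.abs(np.fft.rfft(y)) / len(y)
    mags = np.maximum(spec[[int(k * f0) for k in range(1, n_harm + 1)]], 1e-20)
    return 20 * np.log10(mags / mags[0])

symmetric  = 1.0 * x - 0.2 * x**3                # odd-symmetric transfer curve
asymmetric = 1.0 * x + 0.1 * x**2 - 0.2 * x**3   # adds a small even-order term

print("symmetric :", np.round(harmonic_levels(symmetric), 1))
print("asymmetric:", np.round(harmonic_levels(asymmetric), 1))
# The symmetric curve puts energy only at 1f and 3f; the asymmetric one adds
# a component at 2f, the octave that the "warmth" story is about.
</code></pre>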
<p>The perceptual consequence of this is the crux of the &ldquo;analogue warmth&rdquo; story. The second harmonic is the octave of the fundamental. The fourth harmonic is the double octave. These are, in Western harmonic practice and in the physics of vibrating strings, the most consonant possible intervals. Adding even harmonics at low amplitude to a fundamental makes the sound fuller and richer without introducing beating or dissonance. Odd harmonics — particularly the fifth (at $5\omega$, a major third above the double octave) and the seventh (a flattened seventh above the double octave) — are less consonant relative to the fundamental and at high amplitude produce the harsh, buzzy character associated with heavy distortion or the deliberate aggression of a fuzz pedal.</p>
<p>There is one more effect worth naming: the saturation is a soft knee. The B-H curve does not have a sharp corner at saturation — it curves gradually from the linear region into the flat-topped saturation region. This means that transient signals — percussive attacks, consonant onsets — that briefly exceed the nominal recording level are not hard-clipped but gently compressed. Their peaks are rounded by the shape of the B-H curve. Engineers and producers who record through tape often describe this as the machine &ldquo;breathing&rdquo; or as a pleasing &ldquo;gluing&rdquo; of transients. The physics is simple: the soft-knee transfer function applies more gain reduction to instantaneous peaks than to the sustained body of the signal, functioning as a fast, musically transparent dynamic compressor for any material that approaches saturation.</p>
<hr>
<h2 id="part-ii-the-physics-of-delta-sigma-conversion">Part II: The Physics of Delta-Sigma Conversion</h2>
<h3 id="nyquist-rate-adc-and-its-limits">Nyquist-Rate ADC and Its Limits</h3>
<p>The straightforward approach to analogue-to-digital audio conversion samples the signal at a rate just above twice the highest audio frequency — the Nyquist rate — using a quantiser with enough bits to achieve the desired dynamic range. For CD-quality audio, the sampling rate is 44.1 kHz (slightly above $2 \times 20{,}000$ Hz) and the word length is 16 bits. The dynamic range of a $b$-bit PCM system is, to a good approximation:</p>
$$\text{SNR} \approx 6.02b + 1.76 \text{ dB}$$<p>so 16 bits gives approximately $6.02 \times 16 + 1.76 \approx 98$ dB, which matches the dynamic range of the best analogue tape and is well above the approximately 70 dB achievable with the noise floor of typical studio tape at 15 ips <a href="#ref-4">[4]</a>.</p>
<p>The engineering problem with a straightforward Nyquist-rate ADC is the anti-aliasing filter. Before sampling, all content above $f_s/2 = 22.05$ kHz must be removed. If it is not, energy at frequency $f > f_s/2$ aliases into the audio band as a spurious component at $f_s - f$, which is inaudible in origin but very much audible in its alias. To achieve 98 dB of alias suppression — matching the 16-bit dynamic range — the filter must attenuate signals at 22.05 kHz by 98 dB relative to signals at 20 kHz. The transition band is only 2.05 kHz wide. That requires a very high-order analogue filter — typically seventh-order elliptic or Chebyshev — and such filters have significant phase distortion within the audio band, particularly at frequencies near the passband edge. In 1982, building this filter precisely, cheaply, and repeatably in consumer hardware was a genuine engineering challenge. The filters introduced audible phase and amplitude ripple that the original measurements had not anticipated and that contributed to early criticisms of the CD sound.</p>
<h3 id="oversampling">Oversampling</h3>
<p>The delta-sigma ($\Sigma\Delta$) ADC architecture was developed to sidestep the steep-filter problem entirely, and its adoption in consumer audio from the late 1980s onwards largely resolved the anti-aliasing filter debate <a href="#ref-1">[1]</a>. The core idea is oversampling: instead of sampling at 44.1 kHz with 16 bits, the $\Sigma\Delta$ converter samples at $M \times 44.1$ kHz — where $M$ is the oversampling ratio, typically 64 in early audio converters, giving $64 \times 44.1 = 2.8224$ MHz — with a 1-bit quantiser. The anti-aliasing filter now needs to attenuate everything above 1.4112 MHz before sampling. Its transition band runs from 20 kHz to 1.4112 MHz, a ratio of roughly 70:1. This is easy: a simple, cheap, first- or second-order RC filter suffices, with negligible phase distortion anywhere in the audio band. The price paid is that the quantiser is now only 1 bit, and a 1-bit quantiser has terrible resolution on its own.</p>
<p>To understand what oversampling buys even before any clever signal processing, consider the quantisation noise floor. For a uniform quantiser with step size $\Delta$, the quantisation noise power is $P_q = \Delta^2/12$, and this noise is spread approximately uniformly from 0 to $f_s/2$. The noise power spectral density is $P_q / (f_s/2)$. After oversampling by a factor of $M$ — so that the effective Nyquist band runs from 0 to $f_{\text{audio}} = f_s/(2M)$ — the in-band noise power is:</p>
$$P_{\text{in-band}} = \frac{P_q}{f_s/2} \cdot f_{\text{audio}} = \frac{P_q}{f_s/2} \cdot \frac{f_s}{2M} = \frac{P_q}{M}$$<p>Each doubling of $M$ halves the in-band noise power, an improvement of 3 dB, equivalent to half a bit of resolution. At 64× oversampling this gives 18 dB, or three extra bits — useful, but not enough to get from a 1-bit quantiser to 16-bit performance. We need something more.</p>
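<p>The arithmetic is quick to check. As a minimal sketch, the loop below prints the oversampling-only gain for a few ratios:</p>
<pre><code class="language-python"># Oversampling alone: in-band quantisation noise power falls as 1/M,
# i.e. 3 dB (half a bit) per doubling of the oversampling ratio M.
import math

for M in (1, 2, 4, 16, 64):
    gain_db = 10 * math.log10(M)
    print(f"M = {M:3d}: {gain_db:5.1f} dB, roughly {gain_db / 6.02:.1f} extra bits")
</code></pre>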
<h3 id="noise-shaping">Noise Shaping</h3>
<p>The second ingredient — and the one that makes $\Sigma\Delta$ conversion genuinely remarkable — is noise shaping. Rather than spreading quantisation noise uniformly in frequency, we can engineer its spectral distribution so that almost all the noise power sits above the audio band, where it is removed by a digital low-pass filter (the decimation filter) at the output.</p>
<p>A first-order $\Sigma\Delta$ modulator achieves this by a feedback loop. At each sample step, an integrator accumulates the difference between the input signal and the previous quantised output, and the 1-bit quantiser decides the new output from the sign of that accumulated value. Writing the quantisation error as $e_n = y_n - \hat{x}_n$ (where $\hat{x}_n$ is the input to the quantiser and $y_n$ is the 1-bit output), the loop arranges matters so that the error reaches the output only as a first difference, $e_n - e_{n-1}$, while the signal passes through essentially unchanged apart from a one-sample delay. This is the integrator-feedback structure that gives the modulator its name: $\Sigma$ for the integrating summation, $\Delta$ for the difference.</p>
<p>In the $z$-domain, this feedback structure gives the quantisation noise a transfer function of:</p>
$$N(z) = 1 - z^{-1}$$<p>that is, the noise at time $n$ is the current error minus the previous error — a first-difference operation. In the frequency domain, substituting $z = e^{j 2\pi f / f_s}$:</p>
$$\bigl|N(f)\bigr|^2 = \left|1 - e^{-j 2\pi f / f_s}\right|^2 = 4\sin^2\!\left(\frac{\pi f}{f_s}\right)$$<p>For frequencies well below the sampling rate, $f \ll f_s$, the small-angle approximation gives:</p>
$$\bigl|N(f)\bigr|^2 \approx \left(\frac{2\pi f}{f_s}\right)^2$$<p>The noise power spectral density rises as $f^2$ — it is heavily suppressed at low frequencies and pushed up toward $f_s/2$. Integrating this shaped noise over the audio band $[0, f_{\text{audio}}]$ and comparing to the flat-spectrum case, the in-band SNR improvement for a first-order modulator scales as $M^3$ rather than $M^1$: every doubling of oversampling ratio gives 9 dB improvement (1.5 bits) instead of 3 dB. At 64× oversampling — six doublings — a first-order modulator recovers approximately 54 dB, or 9 effective bits.</p>
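<p>The behaviour is easy to reproduce in simulation. The sketch below is an idealised first-order modulator (no circuit non-idealities, arbitrary test-tone level) that simply measures how much of the quantisation error power ends up above the audio band.</p>
<pre><code class="language-python"># A sketch of an idealised first-order delta-sigma modulator: a 1-bit
# quantiser inside an integrator feedback loop, driven at 64x oversampling.
import numpy as np

M = 64                          # oversampling ratio
f_audio = 20000.0               # audio bandwidth (Hz)
fs = 2 * f_audio * M            # modulator sample rate (2.56 MHz here)
n = 65536                       # number of samples

t = np.arange(n) / fs
x = 0.5 * np.sin(2 * np.pi * 1000.0 * t)    # 1 kHz test tone at half scale

integ, prev_y = 0.0, 0.0
y = np.empty(n)
for i in range(n):
    integ += x[i] - prev_y                   # Sigma: integrate the Delta
    y[i] = 1.0 if integ >= 0.0 else -1.0     # 1-bit quantiser
    prev_y = y[i]

# Where does the quantisation error end up? Almost entirely above the audio band.
err = y - x
spec = np.abs(np.fft.rfft(err * np.hanning(n)))**2
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
above = freqs > f_audio
print("fraction of error power pushed above 20 kHz:",
      round(spec[above].sum() / spec.sum(), 6))
</code></pre>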
<p>A second-order modulator applies the noise-shaping filter twice, giving $|N(f)|^2 \propto f^4$ and an SNR gain scaling as $M^5$: 15 dB per octave of oversampling. At 64× — again six doublings — this recovers approximately 90 dB, or 15 effective bits. Modern high-performance audio ADCs use fifth- to seventh-order modulators operating at 128× oversampling or higher. The in-band noise floor drops to levels corresponding to 20–24 effective bits — entirely from a 1-bit hardware comparator, with all the resolution coming from the noise shaping and the subsequent digital decimation filter.</p>
<p>The following table illustrates the SNR gain achievable at practical oversampling ratios:</p>
<table>
  <thead>
      <tr>
          <th>Modulator order</th>
          <th>Oversampling ratio</th>
          <th>SNR gain</th>
          <th>Effective bits gained</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>1st order</td>
          <td>64×</td>
          <td>54 dB</td>
          <td>9</td>
      </tr>
      <tr>
          <td>2nd order</td>
          <td>64×</td>
          <td>90 dB</td>
          <td>15</td>
      </tr>
      <tr>
          <td>5th order</td>
          <td>128×</td>
          <td>~120 dB</td>
          <td>~20</td>
      </tr>
  </tbody>
</table>
<p>The 5th-order row deserves a moment&rsquo;s attention. A single-bit comparator — a device that outputs only 1 or 0, with no analogue subtlety whatsoever — combined with oversampling and noise shaping, achieves the resolution of a 20-bit Nyquist-rate ADC and does so using a simple digital feedback loop and an analogue integrator that can be fabricated cheaply on a CMOS chip. This is, I think, one of the more quietly stunning pieces of engineering in consumer electronics, and it goes entirely unnoticed because the CD player it lives inside is now considered mundane.</p>
<p>There is a subtlety worth adding for completeness. Real $\Sigma\Delta$ modulators of order three and above are only conditionally stable — the noise-shaping loop can overload at large input signals, producing limit cycles or tonal artefacts. Managing this stability is a significant part of the design problem and involves either restricting the input range, adding nonlinear stability control, or using multi-bit internal quantisers (which reduce the quantisation step and ease the stability constraint while retaining most of the noise-shaping benefit). The multi-bit approach also addresses a related issue: the ideal 1-bit DAC in the feedback loop is inherently linear (there are only two levels, so there is no differential nonlinearity), but multi-bit internal DACs must be trimmed or calibrated to avoid nonlinearity in the feedback path corrupting the noise shaping. These engineering details are discussed thoroughly in Norsworthy, Schreier, and Temes <a href="#ref-5">[5]</a>, which remains the standard reference.</p>
<p>The digital audio infrastructure that delta-sigma conversion enabled — clean, cheap, phase-linear converters without steep analogue filters — also made digital audio workable in latency-sensitive applications like live performance. For a discussion of why latency matters so much in network music performance and how it shapes system design, see my earlier post on <a href="/posts/nmp-latency-lola-mvtp/">NMP latency and the physics of musical timing</a>.</p>
<hr>
<h2 id="the-irony-of-the-comparison">The Irony of the Comparison</h2>
<p>Both tape saturation and delta-sigma conversion are, at root, about the same problem: how to manage the relationship between a signal and the finite resolution of the medium storing it. Tape manages the problem physically and somewhat accidentally — the ferromagnetic B-H curve happens to generate even harmonics that are consonant with the recorded signal, and the bias trick linearises the response well enough that the distortion only becomes audible when the engineer deliberately pushes into saturation. Delta-sigma manages the problem mathematically and deliberately — quantisation noise is redistributed in frequency by a designed feedback loop so that it falls outside the audible band.</p>
<p>Neither approach is perfect, and neither is neutral. Tape adds signal-correlated harmonic distortion whose spectral content depends on recording level and which compresses transients in a way that changes the perceived dynamics. Digital audio, even with delta-sigma conversion, has its own imperfections: idle-channel noise from the modulator, potential for tonal limit-cycle artefacts at specific input levels, and the abrupt onset of hard clipping at full scale — which, unlike tape saturation, is symmetrical and rapid and adds all harmonics simultaneously, giving the harsh, unpleasant character that digital overloads are known for. The soft-knee vs. hard-clip distinction is real and audible, and it is probably the most defensible technical basis for the claim that analogue tape handles transient overloads more graciously.</p>
<p>What is not defensible is the claim that one medium is inherently more musical than the other, or that digital audio lacks something fundamental that tape possesses. They are differently imperfect. The imperfections of tape happen to sit at harmonic relationships that Western ears, shaped by a tradition of music built on those same harmonic intervals, find pleasing. The imperfections of digital audio are not at pleasing harmonic intervals; they are wideband quantisation noise (before shaping) or ultrasonic shaped noise (after), and a sharp cliff at full scale. Different physics, different perceptual character.</p>
<hr>
<h2 id="a-personal-note">A Personal Note</h2>
<p>I spent a long time thinking the tape versus digital debate was mostly audiophile mythology — a community of enthusiasts rationalising the warmth of nostalgia as the warmth of oxide particles. The physics is more interesting than that, and doing the calculation changed my view. The second-harmonic content of tape saturation is not an accident or a romantic story; it is what you get when you push a symmetric nonlinearity with an audio sine wave, and the reason it sounds pleasant is not arbitrary but is grounded in the physics of consonance and the harmonic series. The delta-sigma converter is not a mundane commodity chip but a genuinely elegant solution to an otherwise intractable filter-design problem, and the fact that it achieves 24-bit resolution from a 1-bit comparator by spectral redistribution of noise is the kind of result that should get more attention in physics education.</p>
<p>Both technologies deserve better than the aesthetics argument they have been fighting in for forty years. The tools to understand them are not exotic — Taylor series, Fourier analysis, the z-transform, and the basic physics of ferromagnetism — and the reward is a clear-eyed picture of what is actually going on inside two of the most consequential inventions in the history of recorded music. If you are interested in related mathematics underlying other aspects of music, the posts on <a href="/posts/euclidean-rhythms/">Euclidean rhythms</a> and <a href="/posts/messiaen-modes-group-theory/">Messiaen&rsquo;s modes and group theory</a> cover the combinatorial and algebraic structures in rhythm and pitch that sit alongside the physics discussed here.</p>
<hr>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Candy, J. C., &amp; Temes, G. C. (Eds.). (1992). <em>Oversampling Delta-Sigma Data Converters: Theory, Design, and Simulation</em>. IEEE Press.</p>
<p><span id="ref-2"></span>[2] Reiss, J. D., &amp; McPherson, A. (2015). <em>Audio Effects: Theory, Implementation and Application</em>. CRC Press.</p>
<p><span id="ref-3"></span>[3] Bertram, H. N. (1994). <em>Theory of Magnetic Recording</em>. Cambridge University Press.</p>
<p><span id="ref-4"></span>[4] Pohlmann, K. C. (2010). <em>Principles of Digital Audio</em> (6th ed.). McGraw-Hill.</p>
<p><span id="ref-5"></span>[5] Norsworthy, S. R., Schreier, R., &amp; Temes, G. C. (Eds.). (1997). <em>Delta-Sigma Data Converters: Theory, Design, and Simulation</em>. IEEE Press.</p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-01-14</strong>: Updated the interval description for the 7th harmonic to &ldquo;above the double octave.&rdquo; The 7th harmonic (7f) sits between the double octave (4f) and the triple octave (8f).</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>The Cat&#39;s Eye: Slit Pupils, Thin-Film Mirrors, and 135-Fold Dynamic Range</title>
      <link>https://sebastianspicker.github.io/posts/cat-eyes-slit-pupils-tapetum/</link>
      <pubDate>Mon, 23 Jun 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/cat-eyes-slit-pupils-tapetum/</guid>
      <description>A cat&amp;rsquo;s eye contains two distinct optical technologies that human engineers have copied — one consciously, one not. The slit pupil achieves a dynamic range of 135:1 in light transmission, nearly ten times that of the human circular pupil. The tapetum lucidum is a multilayer thin-film reflector of crystalline rodlets, producing constructive interference at the peak of scotopic sensitivity and sending light through the retina twice. Banks et al. (Science Advances, 2015) showed why the slit geometry specifically evolved in ambush predators; Percy Shaw&amp;rsquo;s 1934 Catseye road reflector borrowed the principle directly.</description>
      <content:encoded><![CDATA[<p><em>Flash photography of cats produces glowing eyes. This is familiar enough that
most people do not find it strange. But the physics that produces it — a
biological multilayer interference reflector built from crystalline rodlets of
riboflavin and zinc, tuned to the peak of night-vision sensitivity, sending returning photons through
the retina for a second pass — is not familiar at all. I started thinking about
this after photographing our cats at dusk — through the doorway; they are indoor
cats now, for health reasons — and finding their eyes lit up a colour
that depends on the angle: greenish from straight ahead, golden from the side.
The angle-dependence is a direct consequence of the thin-film interference
condition, and the different colours correspond to different constructive
interference wavelengths at different angles of incidence.</em></p>
<p><em>The eye contains two optical solutions — pupil geometry and tapetum — that
address different aspects of the same problem: how to function across a very
large range of light levels, from bright midday sun to the dim luminance of a
starlit field.</em></p>
<hr>
<h2 id="the-dynamic-range-problem">The Dynamic Range Problem</h2>
<p>A crepuscular predator — active around dawn and dusk — must function visually
across a light-level range of roughly $10^8$:$1$. The sun on a bright day
produces retinal illuminance of around $10^5\,\mathrm{photons}/(\mu\mathrm{m}^2\cdot\mathrm{s})$;
a moonless night produces roughly $10^{-3}$ in the same units. The ratio is
approximately $10^8$.</p>
<p>The pupil is the variable aperture that controls how much light reaches the
retina. The larger the pupil area, the more light admitted; the smaller the
area, the less. For the human eye, the pupil diameter ranges from approximately
$2\,\mathrm{mm}$ (bright light) to $8\,\mathrm{mm}$ (darkness), giving a
maximum area ratio of:</p>
$$\frac{A_\mathrm{max}}{A_\mathrm{min}} = \left(\frac{8}{2}\right)^2 = 16.$$<p>This is a dynamic range of 16:1 from the pupil alone. The remaining
$10^8 / 16 \approx 6 \times 10^6$ factor in adaptation comes from neural
and photochemical mechanisms in the retina itself (photopigment bleaching,
dark adaptation of rods vs. cones, lateral inhibition).</p>
<p>For a domestic cat, the same measurement gives something different.</p>
<hr>
<h2 id="the-slit-pupil-1351-dynamic-range">The Slit Pupil: 135:1 Dynamic Range</h2>
<p>Banks, Sprague, Schmoll, Parnell, and Love published &ldquo;Why do animal eyes have
pupils of different shapes?&rdquo; in <em>Science Advances</em> in 2015 (1:7, e1500391).
They analysed pupil shape and size data from 214 terrestrial species and
correlated pupil geometry with ecological niche.</p>
<p>Their principal finding for slit pupils: the domestic cat pupil, a vertical
slit, achieves an area ratio of approximately <strong>135:1</strong> between maximum dilation
and maximum constriction. Numerically:</p>
$$\frac{A_\mathrm{max}}{A_\mathrm{min}} \approx 135.$$<p>The mechanism that makes this possible is geometrical. A circular pupil&rsquo;s
minimum area is limited by diffraction: constricting a circular aperture below
about $2\,\mathrm{mm}$ diameter produces diffraction rings that degrade image
quality. A slit, by contrast, can be made arbitrarily narrow in one direction
while retaining a larger dimension in the other, limiting diffraction in only
one axis. The vertical slit in a cat pupil can constrict to a width of
$\sim 0.3\,\mathrm{mm}$ while keeping a much larger vertical extent, whereas the
fully dilated pupil opens to a nearly circular aperture; the measured ratio
between the two areas, in Banks et al.&rsquo;s data, is the factor of 135 quoted
above.</p>
<p>The 135:1 ratio is nearly <strong>nine times</strong> the dynamic range achievable by the
human circular pupil (16:1). This allows the cat&rsquo;s pupil to do substantially
more of the work of light adaptation, reducing the load on the slower neural
and photochemical mechanisms.</p>
<hr>
<h2 id="why-vertical-the-ecological-correlation">Why Vertical? The Ecological Correlation</h2>
<p>Banks et al. found a striking correlation between pupil geometry and predator
ecology:</p>
<ul>
<li><strong>Vertical slit pupils</strong> correlate with <em>ambush predators whose eyes are
close to the ground</em> — animals with shoulder height below approximately
$42\,\mathrm{cm}$.</li>
<li><strong>Horizontal slit pupils</strong> correlate with <em>prey animals and grazing
herbivores</em> (horses, goats, sheep, deer). The horizontal slit, when the
animal lowers its head to graze, rotates to remain approximately horizontal
(the eye counterrotates in the orbit), providing a wide panoramic field
of view for detecting approaching predators.</li>
<li><strong>Circular pupils</strong> correlate with <em>pursuit predators</em> (humans, dogs, large
raptors) that hunt at larger distances where the precise vertical depth
cues provided by the slit geometry are less critical.</li>
</ul>
<p>The functional advantage of a <strong>vertical slit for a low-to-the-ground ambush
predator</strong> is depth estimation by <em>blur circles</em>. The slit geometry produces
strong defocus blur in the horizontal direction but sharp focus in the vertical
direction. An ambush predator lying in grass needs to estimate the horizontal
distance to prey accurately; the defocus differential between horizontal and
vertical blur provides a stereoscopic-like depth cue even with one eye. This
is a form of <strong>astigmatic blur ranging</strong>: the degree of horizontal blur for a
given focal setting encodes the object&rsquo;s distance.</p>
<p>The correlation across 214 species is not perfect, but it is statistically
robust: the association of slit pupils with ground-level ambush predators is
not a coincidence; it is selection pressure.</p>
<hr>
<h2 id="the-tapetum-lucidum-a-biological-dielectric-mirror">The Tapetum Lucidum: A Biological Dielectric Mirror</h2>
<p>Behind the retina, most nocturnal and crepuscular mammals possess a reflective
layer called the <em>tapetum lucidum</em> (literally: &ldquo;bright carpet&rdquo;). Light that
passes through the retina without being absorbed by a photoreceptor strikes
the tapetum and is reflected back through the retina for a second absorption
opportunity. This roughly doubles the effective optical path length through
the photoreceptor layer, substantially increasing the probability of photon
capture at low light levels.</p>
<p>The cat tapetum is a <strong>tapetum cellulosum</strong>: a layer of specialised cells
whose cytoplasm contains dense arrays of rod-shaped crystalline inclusions
composed primarily of riboflavin (vitamin B$_2$) and zinc. (This is distinct
from the guanine-crystal tapeta found in fish and some reptiles.) The
crystalline rodlets have a refractive index of approximately $n_1 \approx 1.8$;
they alternate with layers of cytoplasm with refractive index $n_2 \approx
1.33$ (close to water). The rodlet arrays form a multilayer thin-film
reflector.</p>
<hr>
<h2 id="thin-film-interference-the-physics-of-the-reflection">Thin-Film Interference: The Physics of the Reflection</h2>
<p>The physics of the tapetum is identical to the physics of anti-reflection
coatings on camera lenses and dielectric mirrors in laser cavities.</p>
<p>Consider a single thin film of thickness $d$ and refractive index $n_1$
embedded between media of index $n_2 < n_1$. Light of wavelength $\lambda$
(in vacuum) incident at angle $\theta$ to the normal undergoes partial
reflection at both interfaces. The two reflected beams interfere
constructively when their optical path difference is a multiple of the
wavelength:</p>
$$\Delta = 2 n_1 d \cos\theta = m\lambda, \quad m = 1, 2, 3, \ldots$$<p>For the tapetum, typical rodlet diameter is $d \approx 100$–$120\,\mathrm{nm}$.
With $n_1 \approx 1.8$ and $\theta \approx 0°$ (normal incidence), the first
constructive interference maximum for a single layer occurs at:</p>
$$\lambda_\mathrm{peak} = 2 n_1 d = 2 \times 1.8 \times 100\,\mathrm{nm}
\approx 360\,\mathrm{nm}.$$<p>Wait — that is in the ultraviolet. The tapetum must have multiple layers.</p>
<p>For a stack of $N$ rodlet layers, the reflectance is strongly enhanced
(approaching unity for large $N$) and the peak wavelength of the fundamental
reflection maximum shifts. The relevant periodicity is the combined optical
thickness of one rodlet layer plus one cytoplasm layer:</p>
$$d_\mathrm{eff} = n_1 d_1 + n_2 d_2,$$<p>where $d_1 \approx 100\,\mathrm{nm}$ is the rodlet diameter and
$d_2 \approx 50$–$100\,\mathrm{nm}$ is the cytoplasm spacing. Taking
$d_2 \approx 60\,\mathrm{nm}$:</p>
$$d_\mathrm{eff} = 1.8 \times 100 + 1.33 \times 60 \approx 180 + 80
= 260\,\mathrm{nm}.$$<p>Constructive interference (quarter-wave condition for a multilayer stack) at
$m = 1$:</p>
$$\lambda_\mathrm{peak} = 2 d_\mathrm{eff} \approx 520\,\mathrm{nm}.$$<p>This is green — close to the peak of the scotopic (rod) sensitivity
curve at $\lambda_\mathrm{max,rod} = 498\,\mathrm{nm}$. The tapetum is tuned
to reflect the wavelengths that the night-vision photoreceptors are most
sensitive to. (The exact peak depends on rodlet spacing, which varies across
the tapetum; this produces the observed variation from green to yellow.)</p>
<p>The angle-dependence of the peak wavelength follows from the interference
condition: at angle $\theta$ to the normal (measured inside the stack),
$\lambda_\mathrm{peak}(\theta) = 2 d_\mathrm{eff} \cos\theta$. At $\theta = 30°$,
$\cos 30° \approx 0.87$, giving $\lambda_\mathrm{peak} \approx 450\,\mathrm{nm}$ — blue.
At $\theta = 60°$, $\cos 60° = 0.5$, giving $\lambda \approx 260\,\mathrm{nm}$ —
ultraviolet, invisible. (In practice refraction at the cornea and lens keeps the
internal angle modest, so the extreme ultraviolet shift is an idealised limit.)
The colour of eyeshine in a flash photograph therefore depends on the angle
between the camera and the eye, exactly as observed.</p>
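<p>The arithmetic above is easy to reproduce. A minimal sketch of the idealised Bragg condition for the bilayer stack, using the layer values quoted in the text and treating $\theta$ as the angle inside the stack:</p>
<pre><code class="language-python"># Idealised first-order Bragg condition for the rodlet/cytoplasm bilayer.
import math

n1, d1 = 1.8, 100.0     # rodlet: refractive index, thickness in nm
n2, d2 = 1.33, 60.0     # cytoplasm layer
d_eff = n1 * d1 + n2 * d2                 # optical thickness of one bilayer

def peak_wavelength(theta_deg):
    """First-order reflection peak; theta measured inside the stack."""
    return 2 * d_eff * math.cos(math.radians(theta_deg))

for theta in (0, 30, 60):
    print(f"theta = {theta:2d} deg  ->  lambda_peak = {peak_wavelength(theta):.0f} nm")
# 0 deg: ~520 nm (green), 30 deg: ~450 nm (blue), 60 deg: ~260 nm (ultraviolet)
</code></pre>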
<hr>
<h2 id="reflectance-of-a-multilayer-stack">Reflectance of a Multilayer Stack</h2>
<p>For $N$ identical bilayers (each of optical thickness $n_1 d_1 + n_2 d_2$),
the reflectance at the design wavelength is given by the transfer matrix
method. For the cat tapetum with $N \approx 10$–$15$ bilayers:</p>
$$R = \left(\frac{1 - (n_2/n_1)^{2N}}{1 + (n_2/n_1)^{2N}}\right)^2
\approx 1 - 4\left(\frac{n_2}{n_1}\right)^{2N}.$$<p>With $n_2/n_1 = 1.33/1.8 \approx 0.739$ and $N = 15$:</p>
$$(0.739)^{30} \approx 1.1 \times 10^{-4}.$$<p>The reflectance is approximately $1 - 4 \times 1.1 \times 10^{-4} \approx
0.9996$ — essentially $100\%$ at the design wavelength for a sufficiently thick
stack. The tapetum is a near-perfect reflector in a narrow wavelength band,
a biological dielectric mirror.</p>
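<p>The reflectance estimate is just as quick to check. A sketch that simply evaluates the closed-form stack expression quoted above as a function of the number of bilayers:</p>
<pre><code class="language-python"># Reflectance of an idealised quarter-wave stack at its design wavelength,
# using the closed-form expression from the text (indices as above).
n1, n2 = 1.8, 1.33

def stack_reflectance(N):
    r = (n2 / n1) ** (2 * N)
    return ((1 - r) / (1 + r)) ** 2

for N in (1, 5, 10, 15):
    print(f"N = {N:2d} bilayers  ->  R = {stack_reflectance(N):.4f}")
# by N = 15 the reflectance is essentially unity at the design wavelength
</code></pre>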
<hr>
<h2 id="photon-statistics-at-low-light">Photon Statistics at Low Light</h2>
<p>The tapetum&rsquo;s function becomes clearest when framed in terms of photon
statistics. A single rod photoreceptor has an absorption probability of
approximately $\eta_\mathrm{single} \approx 25\%$ for a photon passing through
it once at $\lambda = 500\,\mathrm{nm}$.</p>
<p>With the tapetum reflecting the photon back for a second pass, the total
absorption probability becomes:</p>
$$\eta_\mathrm{total} = \eta + (1 - \eta)\, R\, \eta,$$<p>where $\eta$ is the single-pass absorption probability and $R$ is the tapetum
reflectance. For $\eta = 0.25$ and $R = 0.98$:</p>
$$\eta_\mathrm{total} = 0.25 + (0.75)(0.98)(0.25) = 0.25 + 0.184 \approx 0.43.$$<p>The double pass increases the photon detection efficiency from $25\%$ to
approximately $43\%$ — a factor of $1.7\times$.</p>
<p>At extremely low light levels, photon detection becomes a counting problem
governed by Poisson statistics. If a mean of $\bar{n}$ photons reaches a
single photoreceptor per integration time, the probability of detecting at
least one photon (and hence registering the presence of light) is:</p>
$$P(\text{detection}) = 1 - e^{-\bar{n}\,\eta_\mathrm{total}}.$$<p>For very dim stimuli where $\bar{n} \approx 1$–$3$ photons per rod per
integration time (close to the absolute threshold of cat vision at around
$7 \times 10^{-7}\,\mathrm{lux}$), increasing $\eta$ by a factor of $\sim
1.7$ has a significant effect on detection probability. The tapetum is not a
luxury
at low light levels; it is a biophysical necessity for sub-threshold light
detection.</p>
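<p>Here is a short numerical sketch of the double-pass gain and what it does to Poisson detection probabilities, using the figures from the text:</p>
<pre><code class="language-python"># Double-pass absorption and Poisson detection probability (figures as above).
import math

eta = 0.25          # single-pass absorption probability of a rod
R = 0.98            # tapetum reflectance
eta_total = eta + (1 - eta) * R * eta
print(f"eta_total = {eta_total:.3f}  (gain {eta_total / eta:.2f}x)")

def p_detect(n_mean, efficiency):
    """Probability of absorbing at least one photon, Poisson statistics."""
    return 1 - math.exp(-n_mean * efficiency)

for n_mean in (0.5, 1, 2, 3):
    print(f"n_mean = {n_mean}: without tapetum {p_detect(n_mean, eta):.2f}, "
          f"with tapetum {p_detect(n_mean, eta_total):.2f}")
# at a mean of 1-3 photons per rod, the tapetum lifts the detection
# probability substantially, which is the whole point at threshold
</code></pre>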
<hr>
<h2 id="percy-shaw-and-the-road-catseye">Percy Shaw and the Road Catseye</h2>
<p>In 1934, Percy Shaw, a road-mender from Halifax, applied for a British patent
for a retroreflective road stud that he called the &ldquo;Catseye.&rdquo; Shaw&rsquo;s stated
inspiration was the reflection of his car headlights from a cat&rsquo;s eyes while
driving on an unlit road at night. Whether this story is entirely accurate is
unclear, but the name and the inspiration are both documented in period sources.</p>
<p>Shaw&rsquo;s device uses a different retroreflection mechanism from the tapetum. The
tapetum produces specular (mirror-like) reflection in the back-focal plane of
the eye&rsquo;s lens — light returning along its incident path because the lens
refocuses it. Shaw&rsquo;s Catseye uses glass hemisphere retroreflectors (or, in
later versions, corner-cube retroreflectors) that return light toward its
source by total internal reflection rather than thin-film interference.</p>
<p>The corner-cube geometry guarantees retroreflection: any ray entering a trihedral
corner (three mutually perpendicular surfaces) reflects from all three surfaces
and exits antiparallel to the incident direction, back toward the source,
regardless of the angle of incidence. The mathematical proof is that the product
of three reflections in mutually perpendicular planes is the inversion $-I$: each
reflection flips one Cartesian component of the direction vector, so $\hat{v}$
exits as $-\hat{v}$, which is exactly retroreflection.</p>
$$\hat{v}_\mathrm{out} = -\hat{v}_\mathrm{in}.$$<p>Shaw&rsquo;s road Catseye became standard equipment on British roads during the Second World War,
credited with a significant reduction in road fatalities during blackouts and
foggy conditions. The biological original was a multilayer interference mirror;
the engineering copy is a corner-cube retroreflector. Different physics, same
function, same name.</p>
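<p>The corner-cube identity is a one-liner to verify numerically. A minimal sketch: reflect a random direction off three mutually perpendicular mirrors, each reflection flipping one Cartesian component, and confirm that the ray exits reversed.</p>
<pre><code class="language-python"># Numerical check of corner-cube retroreflection: three reflections in
# mutually perpendicular planes send v to -v, whatever the incident direction.
import numpy as np

rng = np.random.default_rng(42)
v = rng.normal(size=3)
v /= np.linalg.norm(v)                     # random incident unit direction

def reflect(direction, normal):
    """Reflect a direction vector off a mirror with the given unit normal."""
    return direction - 2 * np.dot(direction, normal) * normal

out = v
for normal in np.eye(3):                   # x, y, z mirror normals
    out = reflect(out, normal)

print(np.allclose(out, -v))                # True: exact retroreflection
</code></pre>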
<hr>
<h2 id="two-optical-solutions-to-one-problem">Two Optical Solutions to One Problem</h2>
<p>The cat&rsquo;s eye contains two distinct optical technologies:</p>
<ol>
<li>
<p><strong>The slit pupil</strong> — a variable aperture with 135:1 dynamic range, optimised
for depth estimation by astigmatic blur in a low-to-the-ground ambush predator.</p>
</li>
<li>
<p><strong>The tapetum lucidum</strong> — a multilayer thin-film reflector of riboflavin
crystalline rodlets, tuned to the scotopic sensitivity peak, achieving
near-100% reflectance at design wavelength and increasing photon detection
efficiency by a factor of approximately $1.7\times$.</p>
</li>
</ol>
<p>Both solutions were arrived at by natural selection over millions of years of
low-light hunting. Both have been copied — one consciously (Shaw&rsquo;s road
reflectors), one as a model for engineered multilayer reflectors in telescopes,
laser cavities, and narrowband optical filters.</p>
<p>When I photograph our cats at dusk and their eyes glow green, I am seeing
the thin-film interference of a biological photonic crystal — riboflavin
rodlets in cytoplasm — wavelength-selected to send green photons back through
rod cells for a second chance at absorption.
The green is not cosmetic. It is functional, and it is physics.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Banks, M.S., Sprague, W.W., Schmoll, J., Parnell, J.A.Q., &amp; Love, G.D.
(2015). Why do animal eyes have pupils of different shapes? <em>Science Advances</em>,
1(7), e1500391. <a href="https://doi.org/10.1126/sciadv.1500391">https://doi.org/10.1126/sciadv.1500391</a></p>
</li>
<li>
<p>Ollivier, F.J., Samuelson, D.A., Brooks, D.E., Lewis, P.A., Kallberg, M.E.,
&amp; Komaromy, A.M. (2004). Comparative morphology of the tapetum lucidum
(among selected species). <em>Veterinary Ophthalmology</em>, 7(1), 11–22.
<a href="https://doi.org/10.1111/j.1463-5224.2004.00318.x">https://doi.org/10.1111/j.1463-5224.2004.00318.x</a></p>
</li>
<li>
<p>Born, M., &amp; Wolf, E. (1999). <em>Principles of Optics</em> (7th ed.). Cambridge
University Press. (Chapters 1, 7 on thin-film interference and multilayer
coatings.)</p>
</li>
<li>
<p>Shaw, P. (1934). <em>Improvements in Studs for Roads and like Surfaces.</em> British
Patent 436,290. Applied 3 April 1934.</p>
</li>
<li>
<p>Warrant, E.J. (1999). Seeing better at night: Life style, eye design and the
optimum strategy of spatial and temporal summation. <em>Vision Research</em>, 39(9),
1611–1630. <a href="https://doi.org/10.1016/S0042-6989(98)00262-4">https://doi.org/10.1016/S0042-6989(98)00262-4</a></p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-12-15</strong>: Corrected the adoption date of Percy Shaw&rsquo;s road Catseyes from &ldquo;from 1945 onward&rdquo; to &ldquo;during the Second World War&rdquo; (widespread adoption began under wartime blackout conditions, not after the war ended). Removed the Machan, Gu, &amp; Bharthuar (2020) reference, which could not be confirmed in available databases.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>From Thought Experiment to Qubit: Schrödinger&#39;s Cat at Ninety</title>
      <link>https://sebastianspicker.github.io/posts/schrodinger-cat-qubits/</link>
      <pubDate>Mon, 27 Jan 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/schrodinger-cat-qubits/</guid>
      <description>In 1935, Schrödinger introduced the cat as a reductio ad absurdum of quantum superposition. Ninety years later, &amp;ldquo;cat states&amp;rdquo; — superpositions of coherent states with opposite phases — are a practical tool in quantum computing. Bosonic cat qubits have bit-flip times exceeding minutes, scaling exponentially with photon number, and are among the leading architectures for fault-tolerant quantum computation. The cat is no longer a paradox. It is a qubit.</description>
      <content:encoded><![CDATA[<p><em>I have two live cats — indoor-only now, for health reasons, a fact they register
as an ongoing injustice. This already puts me in a better epistemic position than
Schrödinger, who had one hypothetical dead-or-alive one. I want to use this
advantage to say something substantive about what the thought experiment actually
claimed, why it was not a paradox but a critique, and what has happened in the
ninety years since — because what has happened is extraordinary. The cat state
is now an engineering specification.</em></p>
<hr>
<h2 id="the-1935-thought-experiment">The 1935 Thought Experiment</h2>
<p>Erwin Schrödinger introduced the cat in a paper titled &ldquo;Die gegenwärtige
Situation in der Quantenmechanik&rdquo; (<em>Naturwissenschaften</em>, 1935). The paper is
a critique of the Copenhagen interpretation of quantum mechanics, not an
endorsement of macroscopic superposition.</p>
<p>The setup is familiar: a cat is placed in a sealed chamber with a radioactive
atom, a Geiger counter, a hammer, and a vial of poison. If the atom decays in
one hour, the counter fires, the hammer falls, the vial breaks, and the cat
dies. If the atom does not decay, the cat lives. The atom is a quantum system;
after one hour it is in a superposition of decayed and undecayed states.</p>
<p>Quantum mechanics — specifically, the Schrödinger equation, applied without
any special rule for measurement — says the entire system (atom + counter +
hammer + vial + cat) evolves into a superposition:</p>
<p>$$|\Psi\rangle = \frac{1}{\sqrt{2}}\bigl(|\text{decayed}\rangle|\text{cat dead}\rangle + |\text{undecayed}\rangle|\text{cat alive}\rangle\bigr).$$</p>
<p>Schrödinger&rsquo;s point was that this is <em>absurd</em>: the cat is either dead or alive,
not a superposition of both, and any interpretation of quantum mechanics that
predicts otherwise is failing at the level of macroscopic physical reality. He
intended the cat as a <em>reductio ad absurdum</em> — a demonstration that taking
the wave function literally at macroscopic scales leads to nonsense.</p>
<p>He was not proposing that cats are literally in superposition. He was proposing
that the theory was incomplete.</p>
<hr>
<h2 id="what-actually-resolves-the-cat">What Actually Resolves the Cat</h2>
<p>The resolution that modern physics offers is <strong>decoherence</strong> — the process by
which a quantum superposition is destroyed through entanglement with the
environment.</p>
<p>A macroscopic object — a cat, a hammer, a Geiger counter — is coupled to an
enormous number of environmental degrees of freedom: air molecules, photons,
phonons in its own structure. Each of these interactions entangles the
macroscopic system with the environment, and the entanglement effectively
destroys the coherence between branches of the superposition. What starts as</p>
<p>$$|\Psi\rangle = \frac{1}{\sqrt{2}}(|\text{decayed}\rangle|\text{dead}\rangle + |\text{undecayed}\rangle|\text{alive}\rangle)$$</p>
<p>rapidly becomes, after environmental entanglement (tracing over environmental
degrees of freedom $|E\rangle$):</p>
<p>$$\rho = \frac{1}{2}|\text{decayed}\rangle\langle\text{decayed}|
\otimes |\text{dead}\rangle\langle\text{dead}|
+ \frac{1}{2}|\text{undecayed}\rangle\langle\text{undecayed}|
\otimes |\text{alive}\rangle\langle\text{alive}|.$$</p>
<p>This is a <em>mixed state</em>, not a superposition. The off-diagonal terms (the
interference terms that distinguish a superposition from a classical mixture)
vanish on a timescale</p>
$$\tau_\mathrm{decoherence} \sim \frac{\hbar}{E_\mathrm{int}} \cdot \frac{1}{N},$$<p>where $E_\mathrm{int}$ is the interaction energy with each environmental degree
of freedom and $N$ is the number of such degrees of freedom. For a macroscopic
object at room temperature, $\tau_\mathrm{decoherence}$ is of order
$10^{-20}$–$10^{-30}$ seconds — unmeasurably short. The cat is never in a
superposition for any observable duration. The superposition collapses before
any measurement can resolve it.</p>
<p>This is not a philosophical solution to the measurement problem — it does not
explain <em>why</em> a particular measurement outcome is obtained, only why we never
observe interference between macroscopic branches — but it does explain why
Schrödinger&rsquo;s setup does not produce an observable macroscopic superposition.
The cat&rsquo;s entanglement with its own environment (the box, the air, its own
thermal photons) destroys the coherence long before any observation.</p>
<hr>
<h2 id="what-a-cat-state-actually-is">What a Cat State Actually Is</h2>
<p>In quantum optics, a <strong>cat state</strong> is not a cat in a superposition. It is a
specific quantum state of a harmonic oscillator (typically a mode of the
electromagnetic field) that was named in honour of Schrödinger&rsquo;s thought
experiment.</p>
<p>A <strong>coherent state</strong> $|\alpha\rangle$ is the quantum state that most closely
resembles a classical oscillating electromagnetic field with amplitude $\alpha
\in \mathbb{C}$. Coherent states are eigenstates of the annihilation operator:
$\hat{a}|\alpha\rangle = \alpha|\alpha\rangle$. The mean photon number is
$\bar{n} = |\alpha|^2$.</p>
<p>A <strong>cat state</strong> is a superposition of two coherent states with opposite
phases:</p>
$$|\mathrm{cat}_\pm\rangle = \mathcal{N}_\pm\bigl(|\alpha\rangle \pm |-\alpha\rangle\bigr),$$<p>where $\mathcal{N}_\pm = 1/\sqrt{2(1 \pm e^{-2|\alpha|^2})}$ is the
normalisation constant. For large $|\alpha|$, the two coherent states are
nearly orthogonal: $\langle -\alpha | \alpha \rangle = e^{-2|\alpha|^2} \approx 0$.</p>
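<p>The near-orthogonality of $|\alpha\rangle$ and $|-\alpha\rangle$ is easy to check numerically in a truncated Fock basis. A minimal sketch (the truncation at 60 photons is an arbitrary choice, ample for these amplitudes):</p>
<pre><code class="language-python"># Overlap of |alpha> and |-alpha> in a truncated Fock basis, compared with
# the exact value exp(-2|alpha|^2).
import numpy as np
from math import factorial, exp

def coherent(alpha, dim=60):
    n = np.arange(dim)
    c = np.array([alpha**k / np.sqrt(factorial(k)) for k in n], dtype=complex)
    return np.exp(-abs(alpha)**2 / 2) * c

for alpha in (1.0, 2.0, 3.0):
    ca, cma = coherent(alpha), coherent(-alpha)
    overlap = np.vdot(cma, ca).real
    print(f"alpha = {alpha}: numeric {overlap:.6f}, exact {exp(-2 * abs(alpha)**2):.6f}")
# already at alpha = 2 the overlap is ~3e-4: the two branches are effectively
# orthogonal, which is what makes them usable as qubit basis states
</code></pre>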
<p>The Wigner quasi-probability distribution of a cat state is revealing. The
Wigner function of a coherent state $|\alpha\rangle$ is a Gaussian peaked at
$(x, p) = (\sqrt{2}\,\mathrm{Re}\,\alpha, \sqrt{2}\,\mathrm{Im}\,\alpha)$.
The cat state Wigner function is:</p>
<p>$$W_{\mathrm{cat}_+}(x,p) = \mathcal{N}_+^2\bigl[W_{|\alpha\rangle}(x,p) + W_{|-\alpha\rangle}(x,p) + 2W_\mathrm{int}(x,p)\bigr],$$</p>
<p>where the interference term $W_\mathrm{int}$ has <em>negative values</em> in the
region between the two Gaussian peaks. Negative regions of the Wigner function
are a signature of non-classical states; they cannot arise from any classical
probability distribution. The cat state is quantum mechanical in a way that
coherent states are not.</p>
<hr>
<h2 id="haroche-and-the-nobel-prize">Haroche and the Nobel Prize</h2>
<p>Serge Haroche (ENS Paris) spent two decades developing techniques to create,
control, and observe cat states of the electromagnetic field in real time.
His experiment used a <strong>superconducting microwave cavity</strong> — a polished copper
box cooled to near absolute zero — in which single microwave photons could be
trapped for hundreds of milliseconds, and a beam of single Rydberg atoms to
probe the field non-destructively.</p>
<p>Haroche created cat states of cavity photons and, crucially, watched their
<strong>decoherence in real time</strong>: as the quantum coherence between the two branches
$|\alpha\rangle$ and $|-\alpha\rangle$ was progressively destroyed by coupling
to the environment, the Wigner function&rsquo;s negative region (the interference
fringe) smoothed out and disappeared, leaving a classical mixture. The
decoherence rate was proportional to $|\alpha|^2$ — the mean photon number,
which measures how &ldquo;macroscopic&rdquo; the cat state is:</p>
$$\Gamma_\mathrm{decoherence} \propto |\alpha|^2 \cdot \kappa,$$<p>where $\kappa$ is the photon loss rate of the cavity. A larger cat (larger
$|\alpha|^2$) decoheres faster, as Schrödinger&rsquo;s argument implicitly requires.</p>
<p>Haroche shared the 2012 Nobel Prize in Physics with David Wineland &ldquo;for
ground-breaking experimental methods that enable measuring and manipulation
of individual quantum systems.&rdquo;</p>
<hr>
<h2 id="cat-qubits-from-paradox-to-engineering">Cat Qubits: From Paradox to Engineering</h2>
<p>The step from fundamental physics to quantum computing was taken when
researchers noted that the two coherent states $|\alpha\rangle$ and
$|-\alpha\rangle$ can serve as the two computational basis states of a qubit:</p>
$$|0\rangle_L \equiv |\alpha\rangle, \quad |1\rangle_L \equiv |-\alpha\rangle.$$<p>The <strong>cat qubit</strong> encodes a logical qubit in this pair of coherent states.
Its remarkable property is an intrinsic asymmetry between error types.</p>
<h3 id="bit-flip-suppression">Bit-Flip Suppression</h3>
<p>A bit-flip error ($|0\rangle_L \leftrightarrow |1\rangle_L$, i.e.,
$|\alpha\rangle \leftrightarrow |-\alpha\rangle$) requires flipping the
amplitude of the oscillator from $+\alpha$ to $-\alpha$. For a stabilised
cat qubit (confined to the cat-state manifold by a parametric drive), this
requires overcoming an energy barrier proportional to $|\alpha|^2$. The
bit-flip time scales exponentially:</p>
$$T_\mathrm{bit-flip} \sim T_1 \cdot e^{2|\alpha|^2},$$<p>where $T_1$ is the single-photon loss time. For modest values of $|\alpha|^2$
(mean photon numbers of 5–10), the bit-flip time can exceed minutes.</p>
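<p>To get a feel for what the exponential buys, here is a back-of-the-envelope sketch. The single-photon loss time $T_1$ below is an assumed illustrative value, not a number taken from the papers cited further down.</p>
<pre><code class="language-python"># Back-of-the-envelope scaling T_bitflip ~ T1 * exp(2*n), n = |alpha|^2.
import math

T1 = 10e-6    # assumed single-photon loss time: 10 microseconds (illustrative)

for n_bar in (2, 4, 6, 8, 10):
    t_bitflip = T1 * math.exp(2 * n_bar)
    print(f"|alpha|^2 = {n_bar:2d}  ->  T_bitflip ~ {t_bitflip:.3g} s")
# a few photons buy milliseconds; ten photons buy over an hour at this assumed T1
</code></pre>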
<p>A <strong>phase-flip error</strong> (the other error type) is not suppressed — the cat qubit
is still vulnerable to dephasing at a rate proportional to $|\alpha|^2$. This
creates a strongly biased noise channel: only one of the two error types is
relevant.</p>
<h3 id="the-engineering-consequence">The Engineering Consequence</h3>
<p>Biased noise is useful because it allows the error-correcting code to focus
its resources on only one error type. A repetition code (a string of cat
qubits where phase errors are corrected by majority vote) can suppress the
phase-flip error arbitrarily while the exponential bit-flip suppression handles
the other. The hardware overhead for fault tolerance — the ratio of physical
qubits to logical qubits — is dramatically reduced compared to codes that must
handle both error types equally.</p>
<p>Between 2020 and 2024, a sequence of experiments demonstrated cat qubits with
steadily longer bit-flip times, culminating in seconds to minutes:</p>
<ul>
<li><strong>Grimm et al. (2020, <em>Nature</em> 584, 205)</strong>: Kerr cat qubit with exponential
bit-flip suppression demonstrated in a superconducting circuit.</li>
<li><strong>Berdou et al. (2023, <em>PRX Quantum</em> 4, 020350)</strong>: Cat qubit with $T_X$
exceeding $100$ seconds.</li>
<li><strong>Reglade et al. (2024, <em>Nature</em> 629, 778–783)</strong>: Cat qubits from Alice &amp;
Bob demonstrating exponential scaling $T_\mathrm{bit-flip} \propto
  e^{2|\alpha|^2}$ with mean photon numbers up to $|\alpha|^2 \approx 10$,
pushing bit-flip times beyond $10$ seconds in the laboratory and, in
subsequent chip demonstrations, beyond several minutes.</li>
</ul>
<p>This is the state of the art as of early 2025: the cat qubit is no longer
a curiosity but a competitive architecture for fault-tolerant quantum computing,
with bit-flip coherence times exceeding the best alternative approaches.</p>
<hr>
<h2 id="the-wigner-function-and-quantum-non-classicality">The Wigner Function and Quantum Non-Classicality</h2>
<p>The Wigner quasi-probability distribution provides the most informative picture
of a quantum state&rsquo;s non-classicality. For a state with density matrix $\rho$,
the Wigner function is:</p>
$$W(x, p) = \frac{1}{\pi\hbar} \int_{-\infty}^{\infty}
\langle x + y | \rho | x - y \rangle\, e^{2ipy/\hbar}\, dy.$$<p>For the cat state $|\mathrm{cat}_+\rangle$ with $|\alpha|^2 = 4$ (four mean
photons in each coherent component), the Wigner function has two positive
Gaussian peaks at $(x, p) = (\pm\sqrt{2}|\alpha|, 0)$ and an oscillating
interference fringe between them whose negative regions reach down toward the
bound $-1/(\pi\hbar)$. Negativity of the Wigner function is a sufficient witness
of non-classicality: no classical probability distribution over phase space can
reproduce it.</p>
<p>As decoherence proceeds (e.g., through photon loss in a cavity), the negative
regions shrink and eventually vanish — the Wigner function becomes everywhere
non-negative, and the state becomes classically describable as a mixture of
coherent states. This is the quantum-to-classical transition, made visible in
phase space.</p>
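<p>One way to see the negativity, and its disappearance, without computing the full Wigner function is the standard identity that the Wigner function at the phase-space origin equals $(1/\pi\hbar)\,\langle\text{parity}\rangle$. A minimal sketch in a truncated Fock basis (with $\hbar = 1$), comparing the odd cat state with the fully decohered 50/50 mixture of the two coherent branches:</p>
<pre><code class="language-python"># W(0,0) = (1/pi) * &lt;parity&gt; in hbar = 1 units. The odd cat state has
# W(0,0) = -1/pi (maximally negative); the decohered 50/50 mixture of
# |alpha> and |-alpha> has W(0,0) ~ 0.
import numpy as np
from math import factorial, pi

DIM = 60
def coherent(alpha):
    n = np.arange(DIM)
    c = np.array([alpha**k / np.sqrt(factorial(k)) for k in n], dtype=complex)
    return np.exp(-abs(alpha)**2 / 2) * c

alpha = 2.0
ca, cma = coherent(alpha), coherent(-alpha)
cat_odd = ca - cma
cat_odd /= np.linalg.norm(cat_odd)

parity = (-1.0) ** np.arange(DIM)          # diagonal of the parity operator

w_cat = np.sum(parity * np.abs(cat_odd) ** 2) / pi
w_mix = np.sum(parity * 0.5 * (np.abs(ca) ** 2 + np.abs(cma) ** 2)) / pi

print(f"odd cat:  W(0,0) = {w_cat:+.4f}   (-1/pi = {-1 / pi:+.4f})")
print(f"mixture:  W(0,0) = {w_mix:+.4f}   (negativity gone)")
</code></pre>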
<p>Haroche&rsquo;s team measured this process directly, frame by frame, in real time.
It is one of the most dramatic experimental visualisations of decoherence ever
achieved.</p>
<hr>
<h2 id="what-schrödinger-would-make-of-this">What Schrödinger Would Make of This</h2>
<p>Schrödinger was a physicist, not a philosopher of language. If told in 1935
that ninety years later, the superposition of two distinguishable states of a
harmonic oscillator — named after his cat, with the same formal structure as
his thought experiment — would be the leading candidate for the basic unit of
a fault-tolerant quantum computer, he would have had two questions.</p>
<p>The first: how do you maintain the superposition against decoherence? The
answer is that you work at millikelvin temperatures in superconducting circuits,
and you use an active parametric drive to confine the state to the cat-state
manifold.</p>
<p>The second, I think, would have been: does this resolve the measurement
problem? And the honest answer remains: no, not fully. Decoherence explains
why macroscopic superpositions are unobservable, but it does not explain why
any particular measurement outcome occurs. That question is as open as it was
in 1935.</p>
<p>What has changed is the practical relationship between quantum theory and
technology. The uncertainty Schrödinger was pointing at — the strangeness of
superposition, the fragility of coherence, the role of the environment — is
now a resource to be engineered, not a conceptual embarrassment to be
resolved. The cat qubit works precisely <em>because</em> the decoherence is
asymmetric: bit flips are exponentially suppressed while phase flips are
correctable. The asymmetry is exploited, not apologised for.</p>
<p>My two cats, meanwhile, are in definite classical states. One is on the
radiator. The other is on the keyboard.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Grimm, A., Frattini, N.E., Puri, S., Mundhada, S.O., Touzard, S.,
Mirrahimi, M., Girvin, S.M., Shankar, S., &amp; Devoret, M.H. (2020). Stabilization
and operation of a Kerr-cat qubit. <em>Nature</em>, 584, 205–209.
<a href="https://doi.org/10.1038/s41586-020-2587-z">https://doi.org/10.1038/s41586-020-2587-z</a></p>
</li>
<li>
<p>Haroche, S., &amp; Raimond, J.-M. (2006). <em>Exploring the Quantum: Atoms,
Cavities, and Photons.</em> Oxford University Press.</p>
</li>
<li>
<p>Reglade, U., Bocquet, A., Gautier, R., et al. (2024). Quantum control of a
cat qubit with bit-flip times exceeding ten seconds. <em>Nature</em>, 629, 778–783.
<a href="https://doi.org/10.1038/s41586-024-07294-3">https://doi.org/10.1038/s41586-024-07294-3</a></p>
</li>
<li>
<p>Mirrahimi, M., Leghtas, Z., Albert, V.V., Touzard, S., Schoelkopf, R.J.,
Jiang, L., &amp; Devoret, M.H. (2014). Dynamically protected cat-qubits: A new
paradigm for universal quantum computation. <em>New Journal of Physics</em>, 16,
045014. <a href="https://doi.org/10.1088/1367-2630/16/4/045014">https://doi.org/10.1088/1367-2630/16/4/045014</a></p>
</li>
<li>
<p>Schrödinger, E. (1935). Die gegenwärtige Situation in der Quantenmechanik.
<em>Naturwissenschaften</em>, 23(48), 807–812; 23(49), 823–828; 23(50), 844–849.
<a href="https://doi.org/10.1007/BF01491891">https://doi.org/10.1007/BF01491891</a></p>
</li>
<li>
<p>Walls, D.F., &amp; Milburn, G.J. (2008). <em>Quantum Optics</em> (2nd ed.). Springer.</p>
</li>
<li>
<p>Zurek, W.H. (2003). Decoherence, einselection, and the quantum origins of
the classical. <em>Reviews of Modern Physics</em>, 75(3), 715–775.
<a href="https://doi.org/10.1103/RevModPhys.75.715">https://doi.org/10.1103/RevModPhys.75.715</a></p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-02-17</strong>: Updated &ldquo;bit-flip times exceeding seven minutes&rdquo; in the summary to &ldquo;exceeding minutes,&rdquo; aligning with the sourced figures: the body text reports &ldquo;beyond several minutes&rdquo; and Reglade et al. (2024) report &ldquo;exceeding ten seconds.&rdquo;</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Primes Are Energy Levels: The Montgomery-Odlyzko Conjecture</title>
      <link>https://sebastianspicker.github.io/posts/riemann-primes-quantum-chaos/</link>
      <pubDate>Mon, 18 Nov 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/riemann-primes-quantum-chaos/</guid>
      <description>In October 2024, the largest known prime was discovered — 41 million digits, found by a GPU cluster. But the deepest prime story is not about record-breaking numbers. It is about a 1972 teatime conversation at the Institute for Advanced Study, a pair correlation formula, and the suspicion — numerically confirmed to extraordinary precision — that the zeros of the Riemann zeta function are the energy levels of an undiscovered quantum system.</description>
      <content:encoded><![CDATA[<h2 id="a-very-large-prime">A Very Large Prime</h2>
<p>On 12 October 2024, a retired NVIDIA engineer named Luke Durant found the 52nd known Mersenne prime (the public announcement followed later that month). The number is $2^{136{,}279{,}841} - 1$, and writing it out in decimal requires 41,024,320 digits. Durant had organised a cloud network of GPU servers spread across 17 countries — essentially repurposing the hardware that normally trains language models to instead do modular arithmetic on numbers with tens of millions of digits. The verification alone took about 51 days of computation.</p>
<p>This is the kind of thing that makes headlines, and it deserves them. Mersenne primes are rare and verifying them is genuinely hard. But if I am honest, the more interesting prime story of the last half-century is not about the record-breaking number. It is about a conversation over tea in Princeton in 1972, and the increasingly hard-to-dismiss suspicion that the prime numbers are, in a precise statistical sense, quantum energy levels.</p>
<p>When I say &ldquo;quantum energy levels,&rdquo; I mean it almost literally — not as a metaphor. Let me explain.</p>
<h2 id="the-riemann-zeta-function-encodes-the-primes">The Riemann Zeta Function Encodes the Primes</h2>
<p>Start with the most famous function in number theory. For $\operatorname{Re}(s) > 1$, the Riemann zeta function is defined by the series</p>
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}.$$<p>This converges nicely and defines an analytic function. But the real reason to care about it is Euler&rsquo;s product formula:</p>
$$\zeta(s) = \prod_{p \text{ prime}} \frac{1}{1 - p^{-s}}.$$<p>This is not obvious — it follows from unique prime factorisation, essentially — but its implications are enormous. The product runs over <em>all</em> primes, and each prime contributes a factor. The primes are encoded in the analytic structure of $\zeta$. If you know $\zeta$, you know the primes; if you understand the zeros of $\zeta$, you understand their distribution.</p>
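<p>The Euler product is easy to sanity-check numerically. A quick sketch, truncating both sides at $s = 2$ and comparing with the exact value $\zeta(2) = \pi^2/6$ (the cutoffs are arbitrary choices):</p>
<pre><code class="language-python"># Truncate the Dirichlet series and the Euler product at s = 2 and compare
# with the exact value pi^2/6.
from math import pi

def primes_up_to(n):
    sieve = bytearray([1]) * (n + 1)
    sieve[:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = b"\x00" * len(sieve[i * i :: i])
    return [i for i in range(n + 1) if sieve[i]]

s = 2
series = sum(1 / n ** s for n in range(1, 100_000))
product = 1.0
for p in primes_up_to(10_000):
    product *= 1 / (1 - p ** -s)

print(series, product, pi ** 2 / 6)   # all three agree to about four decimals
</code></pre>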
<p>Riemann&rsquo;s 1859 paper made this explicit (<a href="#ref-Riemann1859">Riemann, 1859</a>). He showed that $\zeta$ extends analytically to the whole complex plane (minus a simple pole at $s = 1$), and he wrote down an explicit formula connecting the prime-counting function</p>
$$\pi(x) = \#\{p \leq x : p \text{ prime}\}$$<p>to the zeros of $\zeta$. The formula is</p>
$$\pi(x) \approx \operatorname{Li}(x) - \sum_{\rho} \operatorname{Li}(x^{\rho}) + \text{(lower-order terms)},$$<p>where $\operatorname{Li}(x) = \int_2^x \frac{dt}{\ln t}$ is the logarithmic integral and the sum runs over the <em>non-trivial zeros</em> $\rho$ of $\zeta$.</p>
<p>What are the non-trivial zeros? The zeta function has trivial zeros at the negative even integers $-2, -4, -6, \ldots$ — boring, understood. The non-trivial zeros lie in the <em>critical strip</em> $0 < \operatorname{Re}(s) < 1$, and their imaginary parts are what drive the oscillatory corrections to $\pi(x)$. Each zero $\rho = \frac{1}{2} + it_n$ contributes a term that oscillates like $x^{1/2} \cos(t_n \ln x)$. The prime distribution is a superposition of these oscillations, one per zero.</p>
<p>The Riemann Hypothesis is the claim that all non-trivial zeros lie on the <em>critical line</em> $\operatorname{Re}(s) = \frac{1}{2}$. It has been verified numerically for the first $10^{13}$ zeros (Gourdon, 2004; building on earlier high-height computations by <a href="#ref-Odlyzko1987">Odlyzko, 1987</a>). It has not been proved. It remains, after 165 years, the most important unsolved problem in mathematics.</p>
<h2 id="tea-with-dyson">Tea with Dyson</h2>
<p>In 1972, Hugh Montgomery was visiting the Institute for Advanced Study in Princeton. He was working on a specific question: if you take the imaginary parts of the non-trivial zeros of $\zeta$ and normalise them so that their mean spacing is 1, what is the distribution of spacings between them?</p>
<p>More precisely, he was computing the <em>pair correlation function</em> of the normalised zeros. If $\tilde{\gamma}_n$ are the normalised imaginary parts (ordered $\tilde{\gamma}_1 \leq \tilde{\gamma}_2 \leq \cdots$), the pair correlation function $R_2(r)$ measures the density of pairs $(\tilde{\gamma}_m, \tilde{\gamma}_n)$ with $\tilde{\gamma}_n - \tilde{\gamma}_m \approx r$.</p>
<p>Montgomery found — subject to certain assumptions about the behaviour of $\zeta$ — that</p>
$$R_2(r) = 1 - \left(\frac{\sin \pi r}{\pi r}\right)^2.$$<p>(<a href="#ref-Montgomery1973">Montgomery, 1973</a>)</p>
<p>He mentioned this to Freeman Dyson over tea. Dyson — who had spent years on quantum mechanics and random matrix theory — recognised the formula immediately. That expression, $1 - (\sin \pi r / \pi r)^2$, is exactly the pair correlation function of eigenvalues of random matrices drawn from the Gaussian Unitary Ensemble.</p>
<p>Montgomery had not been thinking about quantum mechanics. Dyson had not been thinking about primes. The formula matched.</p>
<h2 id="the-gaussian-unitary-ensemble">The Gaussian Unitary Ensemble</h2>
<p>Let me say a few words about where that formula comes from in physics, because it is not obvious.</p>
<p>The Gaussian Unitary Ensemble (GUE) is a probability distribution over $N \times N$ Hermitian matrices. Specifically, it is the distribution proportional to $e^{-\operatorname{tr}(H^2)}$ on the space of Hermitian matrices, which is invariant under conjugation $H \mapsto U H U^\dagger$ for any unitary $U$. The entries on the diagonal are real Gaussians; the off-diagonal entries are complex Gaussians with independent real and imaginary parts.</p>
<p>In the limit $N \to \infty$, the eigenvalues of a GUE matrix distribute globally according to Wigner&rsquo;s semicircle law. But the local statistics — the fine-grained distribution of spacings between nearby eigenvalues — follow a universal law. The pair correlation function is</p>
$$R_2^{\text{GUE}}(r) = 1 - \left(\frac{\sin \pi r}{\pi r}\right)^2.$$<p>This distribution has a crucial qualitative feature called <em>level repulsion</em>: as $r \to 0$, $R_2(r) \to 0$. Eigenvalues of random Hermitian matrices strongly avoid each other. A Poisson distribution — which is what you would get for eigenvalues that were statistically independent — would give $R_2(r) = 1$ everywhere, with no such repulsion. The GUE formula suppresses small gaps quadratically: $R_2(r) \sim \pi^2 r^2 / 3$ for small $r$.</p>
<p>Why does GUE statistics arise in physics? This is the content of the Bohigas-Giannoni-Schmit conjecture (1984), which by now has overwhelming numerical support: quantum systems whose classical limit is chaotic and which lack time-reversal symmetry have energy level statistics described by the GUE. Systems with time-reversal symmetry fall into the Gaussian Orthogonal Ensemble (GOE), which has a different but related formula. Nuclear energy levels, quantum billiards with the right shapes, molecular spectra — all of them, when appropriately normalised, show GUE or GOE statistics.</p>
<p>The universality is the point. It does not matter what the specific Hamiltonian is. If the system is sufficiently chaotic, the eigenvalue statistics are universal.</p>
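<p>The level repulsion is something you can see for yourself with a few lines of code. The following is a rough numerical sketch, not a careful computation: sample GUE matrices, keep eigenvalues near the centre of the semicircle where the density is roughly flat, rescale to unit mean spacing, and histogram the pair separations against $1 - (\sin \pi r / \pi r)^2$. Matrix size, sample counts, and the crude unfolding are all arbitrary choices.</p>
<pre><code class="language-python"># Rough empirical pair correlation of GUE eigenvalues (takes ~10-20 s).
import numpy as np

rng = np.random.default_rng(0)
N, trials = 400, 200
seps = []
for _ in range(trials):
    A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
    H = (A + A.conj().T) / 2                    # a GUE matrix, up to scale
    ev = np.sort(np.linalg.eigvalsh(H))
    mid = ev[3 * N // 8 : 5 * N // 8]           # central quarter: density ~ flat
    mid = mid / np.mean(np.diff(mid))           # crude unfolding to unit spacing
    d = mid[None, :] - mid[:, None]
    seps.append(np.abs(d[np.triu_indices_from(d, k=1)]))
seps = np.concatenate(seps)

edges = np.linspace(0.0, 3.0, 31)
centres = 0.5 * (edges[1:] + edges[:-1])
counts, _ = np.histogram(seps, bins=edges)
empirical = counts / (trials * (N // 4) * np.diff(edges))   # pairs per point per unit r
gue = 1 - (np.sin(np.pi * centres) / (np.pi * centres)) ** 2
for r, e, g in zip(centres[::6], empirical[::6], gue[::6]):
    print(f"r = {r:4.2f}   empirical = {e:4.2f}   GUE = {g:4.2f}")
# small separations are strongly suppressed (level repulsion), and the
# histogram climbs toward 1, tracking the GUE curve
</code></pre>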
<h2 id="odlyzkos-computation">Odlyzko&rsquo;s Computation</h2>
<p>Montgomery&rsquo;s result was conditional and covered only a limited range of $r$. The natural next step was numerical verification: actually compute a large number of Riemann zeros and measure their pair correlation.</p>
<p>Andrew Odlyzko did exactly this, in a series of computations beginning in the 1980s. The results were striking (<a href="#ref-Odlyzko1987">Odlyzko, 1987</a>). He computed millions of zeros with high precision and compared their empirical pair correlation to the GUE prediction. The agreement was not merely qualitative — it was quantitatively exact, to within the statistical error of the sample.</p>
<p>Odlyzko then pushed further. He computed zeros near the $10^{20}$-th zero, far out on the critical line. Same statistics. He computed zeros near the $10^{22}$-th zero. Same statistics. The agreement held regardless of how far up the critical line one went. This is not a small-sample artifact and it is not coincidence, or at least it would be an extraordinary coincidence of a kind that mathematics has never before encountered.</p>
<p>The plots from Odlyzko&rsquo;s computations are, in my view, some of the most beautiful images in mathematics. You draw the GUE prediction — a smooth curve, starting at zero, rising to approach 1 — and you overlay the empirical histogram from the Riemann zeros. They are the same curve.</p>
<h2 id="berry-keating-and-the-missing-hamiltonian">Berry, Keating, and the Missing Hamiltonian</h2>
<p>If the zeros of $\zeta$ are energy levels, there should be a Hamiltonian $H$ — a self-adjoint operator — whose spectrum is exactly $\{t_n\}$, the imaginary parts of the non-trivial zeros (assuming the Riemann Hypothesis, so that all zeros are of the form $\frac{1}{2} + it_n$).</p>
<p>In 1999, Michael Berry and Jon Keating proposed a candidate (<a href="#ref-BerryKeating1999">Berry &amp; Keating, 1999</a>). Their suggestion was the classical Hamiltonian</p>
$$H_{\text{cl}} = xp,$$<p>where $x$ is position and $p$ is momentum, quantized with appropriate symmetrization:</p>
$$\hat{H} = \frac{1}{2}(\hat{x}\hat{p} + \hat{p}\hat{x}).$$<p>Classically, $H = xp$ describes a system in which the phase-space trajectories are hyperbolas $xp = E = \text{const}$, and the motion is $x(t) = x_0 e^t$, $p(t) = p_0 e^{-t}$ — exponential expansion in position, contraction in momentum. This is essentially the dynamics of an unstable fixed point: not chaotic in the strict sense (a single degree of freedom cannot be), but uniformly hyperbolic, which is the ingredient the quantum-chaos correspondence actually needs.</p>
<p>The semiclassical (WKB) approximation gives an eigenvalue counting function</p>
$$N(E) \approx \frac{E}{2\pi} \ln \frac{E}{2\pi} - \frac{E}{2\pi} + \frac{7}{8} + \cdots,$$<p>which matches Riemann&rsquo;s formula for the number of zeros of $\zeta$ with imaginary part up to $T$:</p>
$$N(T) = \frac{T}{2\pi} \ln \frac{T}{2\pi} - \frac{T}{2\pi} + \frac{7}{8} + O(\ln T).$$<p>This is not a coincidence: the correspondence is exact at the level of the smooth counting function. The hard part is the oscillatory corrections — and those require the specific eigenvalues, which in turn requires knowing the boundary conditions.</p>
<p>The problem is that $\hat{H} = \frac{1}{2}(\hat{x}\hat{p} + \hat{p}\hat{x})$ as an operator on $L^2(\mathbb{R})$ is not bounded below and has a continuous spectrum, not a discrete one. Turning it into an operator with a discrete spectrum matching the Riemann zeros requires boundary conditions that have not been found. This is the crux: Berry and Keating have the right classical system, but the quantum boundary conditions are missing.</p>
<p>What would be profound about finding $\hat{H}$? If $\hat{H}$ is self-adjoint and bounded below ($\hat{H} \geq 0$), its eigenvalues are all non-negative real numbers. If those eigenvalues are the imaginary parts of the zeros, then all zeros have real part exactly $\frac{1}{2}$ — which is the Riemann Hypothesis. A proof of the existence of such a Hamiltonian would, in one stroke, resolve the most important open problem in mathematics.</p>
<h2 id="primes-as-periodic-orbits-the-gutzwiller-analogy">Primes as Periodic Orbits: The Gutzwiller Analogy</h2>
<p>The quantum chaos connection goes deeper than pair correlations. In semiclassical quantum mechanics, the Gutzwiller trace formula relates the density of quantum energy levels to a sum over classical periodic orbits:</p>
$$d(E) = \bar{d}(E) + \sum_{\gamma} A_\gamma \cos\!\left(\frac{S_\gamma}{\hbar} - \phi_\gamma\right),$$<p>where the sum runs over all classical periodic orbits $\gamma$, $S_\gamma$ is the classical action of the orbit, $A_\gamma$ is an amplitude, and $\phi_\gamma$ is a phase (Maslov index correction). The smooth part $\bar{d}(E)$ comes from the Thomas-Fermi approximation; the oscillatory part encodes quantum interference between orbits.</p>
<p>The direct analogue in number theory is the <em>explicit formula</em> for the prime-counting function. Written as a formula for the oscillatory part of the zero-counting function, it reads</p>
$$\psi(x) = x - \sum_{\rho} \frac{x^\rho}{\rho} - \ln(2\pi) - \frac{1}{2}\ln(1 - x^{-2}),$$<p>where $\psi(x) = \sum_{p^k \leq x} \ln p$ is the Chebyshev function and the sum is over non-trivial zeros $\rho$.</p>
<p>Comparing these two formulas term by term: the zeros $\rho$ of $\zeta$ play the role of the quantum energy levels $E_n$; the primes $p$ — and their prime powers $p^k$ — play the role of the classical periodic orbits $\gamma$. The &ldquo;action&rdquo; of the orbit corresponding to $p^k$ is $k \ln p$. The primes are the primitive periodic orbits; $p^k$ is the $k$-th traversal of that orbit.</p>
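<p>The smooth part of this correspondence can be checked directly: Chebyshev's $\psi(x)$, summed over prime powers, already tracks $x$, with the zeros supplying the oscillating remainder. A quick sketch:</p>
<pre><code class="language-python"># Chebyshev's psi(x) computed directly from prime powers, compared with x.
from math import log

def psi(x):
    sieve = bytearray([1]) * (x + 1)
    sieve[:2] = b"\x00\x00"
    total = 0.0
    for p in range(2, x + 1):
        if sieve[p]:
            sieve[2 * p :: p] = b"\x00" * len(sieve[2 * p :: p])
            pk = p
            while pk &lt;= x:        # sum log p over all prime powers p^k &lt;= x
                total += log(p)
                pk *= p
    return total

for x in (100, 1_000, 10_000, 100_000):
    v = psi(x)
    print(f"x = {x:6d}   psi(x) = {v:9.1f}   psi(x)/x = {v / x:.3f}")
# psi(x)/x tends to 1; the wiggles around 1 are exactly what the zero sum encodes
</code></pre>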
<p>This is not a metaphor or a loose analogy. The Selberg trace formula — developed for the Laplacian on hyperbolic surfaces — makes this correspondence rigorous in a related setting: the periodic geodesics on a hyperbolic surface play the role of primes, and the eigenvalues of the Laplacian play the role of Riemann zeros (<a href="#ref-RudnickSarnak1996">Rudnick &amp; Sarnak, 1996</a>). The Riemann zeta function is the limit of a family of such systems, in some sense that is still being made precise.</p>
<p>I find it remarkable that the logarithms of primes — the most elementary sequence in arithmetic — appear as lengths of orbits in what would be a quantum chaotic system. Each prime contributes an oscillation to $\psi(x)$ with &ldquo;frequency&rdquo; proportional to its logarithm. You are, in a sense, hearing the primes as quantum interference.</p>
<p>This connects to a theme that comes up elsewhere on this blog. The <a href="/posts/falling-cat-geometric-phase/">falling cat problem</a> involves Berry phase and geometric holonomy — again a situation where deep structure emerges from symmetry and topology. The <a href="/posts/schrodinger-cat-qubits/">Schrödinger cat in quantum computing</a> involves the spectacular fragility of quantum coherence. The Riemann zeros are, if the conjecture is right, a quantum system that has never decohered — a perfectly coherent spectrum hiding inside the most ancient problem in mathematics.</p>
<h2 id="a-brief-detour-maynard-and-primes-without-digits">A Brief Detour: Maynard and Primes Without Digits</h2>
<p>While we are talking about primes, I cannot resist a detour through two results of James Maynard, who received the Fields Medal in 2022.</p>
<p>The first concerns bounded gaps. Euclid proved that there are infinitely many primes. The Twin Prime Conjecture says there are infinitely many pairs of primes $(p, p+2)$. This remains open. But in 2013, Yitang Zhang proved something extraordinary: there are infinitely many pairs of primes differing by at most 70,000,000 (<a href="#ref-Zhang2014">Zhang, 2014</a>). The bound is large, but the qualitative statement — that gaps between primes are bounded infinitely often — was completely new. Shortly thereafter, Maynard independently proved a much stronger result using the Maynard-Tao sieve: infinitely many prime pairs with gap at most 600 (<a href="#ref-Maynard2015">Maynard, 2015</a>). A crowdsourced effort (Polymath8b) brought the bound down to 246. The Twin Prime Conjecture remains open, but 246 is a long way from 70,000,000.</p>
<p>The second result is stranger. Maynard proved in 2016 that for any decimal digit $d \in \{0, 1, \ldots, 9\}$, there are infinitely many primes whose decimal representation contains no instance of $d$. There are infinitely many primes with no $7$ in their decimal expansion. There are infinitely many primes with no $3$. The proof uses techniques from analytic number theory, specifically exponential sum estimates and sieve methods, and it extends to sufficiently large bases; very small bases lead straight to open problems (a base-2 prime avoiding the digit 0 would be a Mersenne prime, and nobody knows whether there are infinitely many of those).</p>
<p>This is one of those results that sounds impossible on first hearing. Surely removing an entire digit should make most large numbers unavailable, so the primes run out? Not so. The density of such &ldquo;digitless&rdquo; numbers thins out, but not fast enough to eliminate infinitely many primes.</p>
<h2 id="the-52nd-mersenne-prime-and-what-we-do-not-know">The 52nd Mersenne Prime and What We Do Not Know</h2>
<p>Return to $M_{136{,}279{,}841} = 2^{136{,}279{,}841} - 1$. Mersenne primes have the form $2^p - 1$ where $p$ is a prime (though not all such numbers are prime — $2^{11} - 1 = 2047 = 23 \times 89$). They are tested via the Lucas-Lehmer primality test: define the sequence</p>
$$s_0 = 4, \qquad s_{n+1} = s_n^2 - 2.$$<p>Then $M_p = 2^p - 1$ is prime if and only if $s_{p-2} \equiv 0 \pmod{M_p}$.</p>
<p>The test requires $p - 2$ squarings modulo $M_p$. Each squaring involves numbers with roughly $p$ digits, and modular reduction modulo $M_p = 2^p - 1$ is cheap because it reduces to bit-shifts. This is why GPU parallelism helps enormously: each squaring can be broken into many parallel multiplications of sub-blocks of digits. Durant&rsquo;s cloud network was, in effect, a massively distributed modular arithmetic engine.</p>
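<p>The test itself fits in a few lines. A minimal sketch, fine for small exponents; the record-scale searches use FFT-based big-integer squaring on GPUs, which this does not attempt:</p>
<pre><code class="language-python"># Minimal Lucas-Lehmer test for Mersenne numbers M_p = 2^p - 1 (p an odd prime).
def lucas_lehmer(p):
    m = 2 ** p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

# Recover the first few Mersenne prime exponents:
print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 61, 89, 107, 127)
       if lucas_lehmer(p)])
# -> [3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127]
</code></pre>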
<p>We do not know if there are infinitely many Mersenne primes. The heuristic Lenstra-Pomerance-Wagstaff conjecture says yes: the expected number of Mersenne primes $2^p - 1$ with $p \leq x$ is approximately</p>
$$e^\gamma \ln x / \ln 2 \approx 1.78 \cdot \log_2 x,$$<p>where $\gamma \approx 0.5772$ is the Euler-Mascheroni constant. This predicts roughly logarithmic growth in the count — consistent with the 52 known examples — but is nowhere near proved.</p>
<p>The known Mersenne primes do not form a sequence with obviously regular gaps. The exponents $p$ are: 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, &hellip; and then larger, less predictable values. Whether their distribution has GUE-like statistics is not a standard research question (the sample is too small), but the question of whether the primes $p$ for which $2^p - 1$ is prime have any special structure is an active one. For now, the answer is: we do not know.</p>
<h2 id="why-this-matters-and-why-it-does-not-prove-anything">Why This Matters, and Why It Does Not Prove Anything</h2>
<p>Let me be precise about what has and has not been established.</p>
<p>What has been established:</p>
<ul>
<li>Montgomery proved (conditionally, assuming a form of the generalised Riemann Hypothesis) that the pair correlation of Riemann zeros, for a certain range of $r$, is given by $1 - (\sin \pi r / \pi r)^2$.</li>
<li>Odlyzko verified numerically — to extraordinary precision, over billions of zeros — that the full empirical pair correlation matches the GUE prediction.</li>
<li>The Gutzwiller/Selberg analogy between periodic orbits and primes is mathematically precise in related settings (hyperbolic surfaces, function fields over finite fields).</li>
<li>Rudnick and Sarnak proved that the $n$-point correlation functions of Riemann zeros match GUE for all $n$, subject to a plausible conjecture about $\zeta$ (<a href="#ref-RudnickSarnak1996">Rudnick &amp; Sarnak, 1996</a>).</li>
</ul>
<p>What has not been established:</p>
<ul>
<li>There is no known Hamiltonian $\hat{H}$ whose spectrum is the set of Riemann zeros.</li>
<li>The Riemann Hypothesis remains open.</li>
<li>There is no proof that the Montgomery-Odlyzko connection is anything more than an extraordinary numerical coincidence.</li>
</ul>
<p>The broader context is the Langlands program — a still-hypothetical grand unification of number theory, algebraic geometry, and representation theory, sometimes described as a &ldquo;grand unified theory of mathematics.&rdquo; The Langlands correspondence predicts deep connections between $L$-functions (generalisations of $\zeta$) and representations of algebraic groups. The spectral interpretation of Riemann zeros — if it could be made precise — would fit naturally into this framework. Some researchers believe that a proof of the Riemann Hypothesis will come from the Langlands side, not from analytic number theory or quantum mechanics. Others think the quantum chaos connection is the right road. Nobody knows.</p>
<p>What would it mean if the connection is real? It would mean that the prime numbers — discovered by Euclid, studied for two and a half millennia, used today in every TLS handshake and RSA key — are the eigenvalues of a physical Hamiltonian. The abstract number-theoretic structure and the physical quantum mechanical structure would be not merely analogous but identical. That is a claim of the same depth as the unexpected appearance of the same partial differential equations in heat flow, diffusion, and Brownian motion: a discovery that what seemed to be different phenomena are manifestations of the same underlying law.</p>
<p>Or it could be a very surprising coincidence. Mathematics has a long history of producing such coincidences — the same numbers appearing in unrelated contexts for reasons that, when understood, turned out not to be coincidences at all. I suspect this is not a coincidence. But suspicion is not proof.</p>
<h2 id="a-closing-reflection">A Closing Reflection</h2>
<p>I started this post with the 52nd Mersenne prime because it is the news item that prompted me to write. GPU clusters finding 41-million-digit primes are genuinely impressive technology. But I keep returning to the image of Montgomery and Dyson at tea in 1972, and the formula $1 - (\sin \pi r / \pi r)^2$ connecting two conversations that had nothing to do with each other.</p>
<p>I have spent some time with random matrix theory, and separately with the zeta function, and the thing that still strikes me is how <em>clean</em> the connection is. This is not a numerical coincidence of the form &ldquo;these two quantities agree to 3 decimal places.&rdquo; Odlyzko&rsquo;s plots show agreement across many orders of magnitude, for zeros computed billions of entries into the sequence. The GUE curve and the empirical histogram are, visually, the same curve.</p>
<p>As someone trained as a physicist, I find this both encouraging and slightly unsettling. Encouraging because it suggests that the primes are not random — they have a structure, one that matches the eigenvalue repulsion of quantum chaotic systems, and that structure might be the key to proving the Riemann Hypothesis. Unsettling because it means that the quantum mechanical formalism — which I always thought was a description of a physical world — seems to be reaching into pure arithmetic, where there is no wave function, no Hilbert space, no measurement. The primes do not know they are supposed to be energy levels. And yet, statistically, they are.</p>
<p>If you find a flaw in this picture, or know of a result I have missed, I am genuinely interested. Peer review is welcome — open an issue on <a href="https://github.com/sebastianspicker/sebastianspicker.github.io/issues">GitHub</a>.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p><span id="ref-Riemann1859"></span>Riemann, B. (1859). Über die Anzahl der Primzahlen unter einer gegebenen Grösse. <em>Monatsberichte der Berliner Akademie</em>.</p>
</li>
<li>
<p><span id="ref-Montgomery1973"></span>Montgomery, H. L. (1973). The pair correlation of zeros of the zeta function. <em>Analytic Number Theory</em>, Proc. Symp. Pure Math., 24, 181–193.</p>
</li>
<li>
<p><span id="ref-Odlyzko1987"></span>Odlyzko, A. M. (1987). On the distribution of spacings between zeros of the zeta function. <em>Mathematics of Computation</em>, 48, 273–308. <a href="https://doi.org/10.2307/2007890">DOI: 10.2307/2007890</a></p>
</li>
<li>
<p><span id="ref-BerryKeating1999"></span>Berry, M. V., &amp; Keating, J. P. (1999). The Riemann zeros and eigenvalue asymptotics. <em>SIAM Review</em>, 41(2), 236–266. <a href="https://doi.org/10.1137/S0036144598347497">DOI: 10.1137/S0036144598347497</a></p>
</li>
<li>
<p><span id="ref-Zhang2014"></span>Zhang, Y. (2014). Bounded gaps between primes. <em>Annals of Mathematics</em>, 179(3), 1121–1174. <a href="https://doi.org/10.4007/annals.2014.179.3.7">DOI: 10.4007/annals.2014.179.3.7</a></p>
</li>
<li>
<p><span id="ref-Maynard2015"></span>Maynard, J. (2015). Small gaps between primes. <em>Annals of Mathematics</em>, 181(1), 383–413. <a href="https://doi.org/10.4007/annals.2015.181.1.7">DOI: 10.4007/annals.2015.181.1.7</a></p>
</li>
<li>
<p><span id="ref-RudnickSarnak1996"></span>Rudnick, Z., &amp; Sarnak, P. (1996). Zeros of principal L-functions and random matrix theory. <em>Duke Mathematical Journal</em>, 81(2), 269–322. <a href="https://doi.org/10.1215/S0012-7094-96-08115-6">DOI: 10.1215/S0012-7094-96-08115-6</a></p>
</li>
<li>
<p><span id="ref-GIMPS2024"></span>GIMPS (2024). 2^136279841-1 is Prime! Great Internet Mersenne Prime Search. Retrieved from <a href="https://www.mersenne.org/primes/?press=M136279841">https://www.mersenne.org/primes/?press=M136279841</a></p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-02-17</strong>: Corrected the date of the Montgomery-Dyson meeting from 1973 to 1972 (the paper was published in the 1973 proceedings volume, but the meeting at the IAS took place in April 1972).</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>The Hamiltonian of Intelligence: From Spin Glasses to Neural Networks</title>
      <link>https://sebastianspicker.github.io/posts/spin-glass-hopfield-ai-physics-lineage/</link>
      <pubDate>Mon, 21 Oct 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/spin-glass-hopfield-ai-physics-lineage/</guid>
      <description>On October 8, 2024, Hopfield and Hinton were awarded the Nobel Prize in Physics. The physics community reacted with irritation: is machine learning really physics? The irritation is wrong. The energy function of a Hopfield network is literally the Ising Hamiltonian. The lineage runs from Giorgio Parisi&amp;rsquo;s disordered iron alloys in 1979 to the model that predicted the structures of 200 million proteins.</description>
      <content:encoded><![CDATA[<p>On October 8, 2024, the Royal Swedish Academy of Sciences announced that the Nobel Prize in Physics would go to John Hopfield and Geoffrey Hinton &ldquo;for foundational discoveries and inventions that enable machine learning with artificial neural networks.&rdquo; Within hours, the physics corner of the internet had an episode. Thermodynamics Twitter — yes, that is a thing — asked whether gradient descent is really physics in the sense that the Higgs mechanism is physics. The condensed matter community, who have been doing disordered systems since before most ML practitioners were born, oscillated between pride (&ldquo;finally, they noticed us&rdquo;) and bafflement (&ldquo;why is Hinton here and not Parisi?&rdquo;). There were takes. There were dunks. Someone made a graph of Nobel prizes versus average journal impact factor and it was not flattering to this year&rsquo;s winner.</p>
<p>I understand the irritation. I do not share it.</p>
<p>The argument I want to make is stronger than &ldquo;machine learning uses some physics concepts by analogy.&rdquo; The energy function that Hopfield wrote down in 1982 is not <em>inspired by</em> the Ising Hamiltonian. It <em>is</em> the Ising Hamiltonian. The machine that Hinton and Sejnowski built in 1985 is not named after Boltzmann as a cute metaphor. It is a physical system whose equilibrium distribution is the Boltzmann distribution, and whose learning algorithm is derived from statistical mechanics. The lineage from disordered magnets to protein structure prediction is not a convenient narrative; it is a sequence of mathematical identities.</p>
<p>Let me trace it properly.</p>
<h2 id="the-2021-nobel-parisi-and-the-frozen-magnet">The 2021 Nobel: Parisi and the frozen magnet</h2>
<p>Before we get to 2024, we need 2021. Giorgio Parisi received half the Nobel Prize in Physics that year for work done between 1979 and 1983 on spin glasses. The other half went to Syukuro Manabe and Klaus Hasselmann for climate modelling — an interesting pairing that provoked its own set of takes, though rather fewer.</p>
<p>A spin glass is a disordered magnetic system. The canonical physical realisation is a dilute alloy: a small concentration of manganese atoms dissolved in copper. Each manganese atom carries a magnetic moment — a spin — that can point in one of two directions, which we label $\sigma_i \in \{-1, +1\}$. The spins interact with each other via exchange interactions mediated by the conduction electrons. The crucial feature is that these interactions are random: some spin pairs prefer to align (ferromagnetic coupling, $J_{ij} > 0$) and others prefer to anti-align (antiferromagnetic coupling, $J_{ij} < 0$), and there is no spatial pattern to which is which.</p>
<p>The Hamiltonian of the system is</p>
$$H = -\sum_{i < j} J_{ij} \sigma_i \sigma_j$$<p>where the $J_{ij}$ are random variables drawn from some distribution. In the Sherrington-Kirkpatrick (SK) model (<a href="#ref-Sherrington1975">Sherrington &amp; Kirkpatrick, 1975</a>), all $N$ spins interact with all other spins — a mean-field model — and the couplings are drawn from a Gaussian distribution with mean zero and variance $J^2/N$:</p>
$$J_{ij} \sim \mathcal{N}\!\left(0,\, \frac{J^2}{N}\right)$$<p>The factor of $1/N$ in the variance is essential for extensivity: without it, the free energy and the ground-state energy would grow as $N^{3/2}$ rather than linearly in $N$, which is unphysical.</p>
<p>Now here is the key phenomenon. At high temperature, the spins fluctuate freely and the system is paramagnetic. Cool it below the glass transition temperature $T_g$, and the system &ldquo;freezes&rdquo; — but not into a ferromagnet with all spins aligned, and not into a simple antiferromagnet. It freezes into one of an astronomically large number of disordered, metastable states. The system is not in its true ground state; it is trapped. It cannot find its way down because the energy landscape is rugged: every path toward lower energy is blocked by a barrier.</p>
<p>This rugged landscape is the central object. It has exponentially many local minima, separated by barriers that grow with system size. Different initial conditions lead to different frozen states. The system has memory of its history — hence &ldquo;glass&rdquo; rather than &ldquo;crystal.&rdquo;</p>
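<p>To make the extensivity and the trapping concrete, here is a minimal numerical sketch of my own (not from any of the original papers): it draws SK couplings with the $J^2/N$ scaling and runs a zero-temperature single-spin-flip quench from a random start. The energy per spin it finds is roughly constant as $N$ grows — the landscape is extensive — but it sits above the true ground-state value of about $-0.76\,J$, because the quench gets stuck in one of the many local minima; different random seeds land in different frozen states.</p>
<pre><code class="language-python">import numpy as np

def sk_couplings(n, j=1.0, rng=None):
    """Symmetric Gaussian couplings with variance J^2/N and zero diagonal."""
    rng = rng or np.random.default_rng(0)
    upper = np.triu(rng.normal(0.0, j / np.sqrt(n), size=(n, n)), k=1)
    return upper + upper.T

def greedy_quench(couplings, seed=1):
    """Zero-temperature single-spin-flip descent to a local energy minimum."""
    rng = np.random.default_rng(seed)
    n = couplings.shape[0]
    spins = rng.choice([-1, 1], size=n)
    improved = True
    while improved:
        improved = False
        for i in rng.permutation(n):
            if spins[i] * (couplings[i] @ spins) < 0:   # flipping lowers the energy
                spins[i] = -spins[i]
                improved = True
    return spins

for n in (100, 200, 400, 800):
    J = sk_couplings(n)
    s = greedy_quench(J)
    energy = -0.5 * s @ J @ s      # H = -sum_{i<j} J_ij s_i s_j
    print(f"N={n:4d}   E/N = {energy / n:+.3f}")   # roughly constant, around -0.7 J
</code></pre>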
<p>Computing thermodynamic quantities in this system requires averaging over the disorder (the random $J_{ij}$), which means computing the quenched average of the free energy:</p>
$$\overline{F} = -T\, \overline{\ln Z}$$<p>The overline denotes an average over the distribution of couplings. The problem is that $\ln Z$ is hard to average because $Z$ is a sum of exponentially many terms. Parisi&rsquo;s solution — the replica trick — is a mathematical device worth describing, because it is beautifully strange.</p>
<p>The trick exploits the identity $\ln Z = \lim_{n \to 0} (Z^n - 1)/n$. We compute $\overline{Z^n}$ for integer $n$, which is feasible because $Z^n$ is a product of $n$ copies (replicas) of the partition function, so the disorder average becomes a Gaussian integral that can be carried out explicitly — at the price of coupling the replicas to one another. We then analytically continue in $n$ to $n \to 0$. The result is an effective action in terms of order parameters $q^{ab}$, which describe the overlap between spin configurations in replica $a$ and replica $b$.</p>
<p>The naive assumption is replica symmetry: all $q^{ab}$ are equal. This assumption turns out to be wrong. Parisi showed that the correct solution breaks replica symmetry in a hierarchical way — the overlap matrix $q^{ab}$ has a nested structure, described by a function $q(x)$ for $x \in [0,1]$. This is replica symmetry breaking (RSB).</p>
<p>RSB has a beautiful physical interpretation. The phase space of the spin glass is organised into an ultrametric tree: exponentially many states, arranged in nested clusters. States in the same cluster are similar (high overlap); states in different clusters are very different (low overlap). The hierarchy has infinitely many levels. Parisi proposed this structure as the exact solution of the SK model (<a href="#ref-Parisi1979">Parisi, 1979</a>) and spent the following years working out its physical meaning; a fully rigorous proof of the Parisi free energy came only decades later, through the work of Guerra and Talagrand.</p>
<p>This is not an abstraction. RSB predicts specific, measurable properties of real spin glass alloys, and experiments have confirmed them. It is also, I want to emphasise, not a result that anyone expected. The mathematics forced it.</p>
<p>Three years after Parisi solved the SK model, a physicist at Bell Labs wrote a paper about memory.</p>
<h2 id="hopfield-1982-memory-as-energy-minimisation">Hopfield (1982): memory as energy minimisation</h2>
<p>John Hopfield was a condensed matter physicist who had drifted toward biophysics — electron transfer in proteins, neural computation. In 1982 he published a paper in PNAS with the title &ldquo;Neural networks and physical systems with emergent collective computational abilities&rdquo; (<a href="#ref-Hopfield1982">Hopfield, 1982</a>). Most biologists read it as a neuroscience paper. It is a statistical mechanics paper.</p>
<p>Hopfield defined a network of $N$ binary &ldquo;neurons&rdquo; $s_i \in \{-1, +1\}$ with symmetric weights $W_{ij} = W_{ji}$, and an energy function:</p>
$$E = -\frac{1}{2} \sum_{i \neq j} W_{ij}\, s_i s_j$$<p>Readers who have seen the SK Hamiltonian above will notice something. This is it. The $J_{ij}$ of the spin glass are the $W_{ij}$ of the neural network. The Ising spins $\sigma_i$ are the neuron states $s_i$. The Hopfield network energy function is the Ising model Hamiltonian with symmetric, fixed (non-random) couplings. This is not a metaphor. This is the same equation.</p>
<p>The dynamics: at each step, choose a neuron $i$ at random and update it according to</p>
$$s_i \leftarrow \text{sgn}\!\left(\sum_{j} W_{ij} s_j\right)$$<p>This update always decreases or leaves unchanged the energy $E$ (because the weights are symmetric). The network is a gradient descent machine on $E$. It will always converge to a local minimum — a fixed point.</p>
<p>The innovation is in how Hopfield chose the weights. To store a set of $p$ binary patterns $\xi^\mu \in \{-1,+1\}^N$ (for $\mu = 1, \ldots, p$), use Hebb&rsquo;s rule:</p>
$$W_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} \xi^\mu_i\, \xi^\mu_j$$<p>This is the outer product rule. Each stored pattern contributes a rank-1 matrix to $W$. You can verify that if $s = \xi^\mu$, then the local field at neuron $i$ is</p>
$$h_i = \sum_j W_{ij} s_j = \frac{1}{N}\sum_j \sum_{\nu} \xi^\nu_i \xi^\nu_j \xi^\mu_j = \xi^\mu_i + \frac{1}{N}\sum_{\nu \neq \mu} \xi^\nu_i \underbrace{\left(\sum_j \xi^\nu_j \xi^\mu_j\right)}_{\text{cross-talk}}$$<p>The first term reinforces pattern $\mu$. The second term is noise from the other stored patterns. When the patterns are random and uncorrelated, each cross-talk term has zero mean and typical size of order $\sqrt{p/N}$, so for $p \ll N$ the signal term dominates and the stored patterns are stable fixed points of the dynamics. A noisy or incomplete input — a partial pattern — will evolve under the dynamics toward the nearest stored pattern. This is associative memory: content-addressable retrieval.</p>
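<p>A minimal sketch of the whole mechanism — Hebbian storage, asynchronous sign updates, retrieval from a corrupted cue — fits in a few lines. The sizes here ($N = 400$, $p = 20$, 20% of bits flipped) are illustrative choices of mine, picked to sit comfortably below the capacity limit discussed next:</p>
<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_patterns = 400, 20          # p/N = 0.05, well below the ~0.14 N limit

# Random binary patterns and Hebbian (outer-product) weights
patterns = rng.choice([-1, 1], size=(n_patterns, n_neurons))
weights = (patterns.T @ patterns) / n_neurons
np.fill_diagonal(weights, 0.0)

def recall(state, sweeps=10):
    """Asynchronous sign updates; each step lowers (or keeps) the energy."""
    state = state.copy()
    for _ in range(sweeps):
        for i in rng.permutation(n_neurons):
            state[i] = 1 if weights[i] @ state >= 0 else -1
    return state

# Corrupt 20% of a stored pattern and let the dynamics clean it up
probe = patterns[0].copy()
flipped = rng.choice(n_neurons, size=n_neurons // 5, replace=False)
probe[flipped] *= -1

overlap = recall(probe) @ patterns[0] / n_neurons
print(f"overlap with the stored pattern: {overlap:+.2f}")   # close to +1.0
</code></pre>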
<p>The capacity limit follows from the same analysis. As $p$ grows, the cross-talk grows. When $p$ exceeds approximately $0.14N$, the cross-talk overwhelms the signal, and the network begins to form spurious minima — states that are not any of the stored patterns but are mixtures or corruptions of them. The network has entered a spin-glass phase.</p>
<p>This is not a rough analogy. Amit, Gutfreund, and Sompolinsky showed in 1985 that the Hopfield model can be solved with <em>exactly</em> the replica machinery developed for the SK model — it is, in effect, an SK-type spin glass with $p$ planted minima (<a href="#ref-Amit1985">Amit, Gutfreund, &amp; Sompolinsky, 1985</a>). The phase diagram of the Hopfield model — paramagnetic phase, memory phase, spin-glass phase — mirrors that of the SK model. The capacity limit $p \approx 0.14N$ is the phase boundary between the memory phase and the spin-glass phase: the replica-symmetric calculation gives $\alpha_c \approx 0.138$, and replica symmetry breaking shifts the boundary only slightly.</p>
<p>The 2021 Nobel and the 2024 Nobel are, mathematically, about the same model.</p>
<h2 id="boltzmann-machines-hinton--sejnowski-1985">Boltzmann machines (Hinton &amp; Sejnowski, 1985)</h2>
<p>The Hopfield model is deterministic and shallow — one layer of visible neurons, no hidden structure. Geoffrey Hinton and Terry Sejnowski, in a collaboration that began at the Cognitive Science summer school in Pittsfield in 1983 and culminated in a 1985 paper (<a href="#ref-Ackley1985">Ackley, Hinton, &amp; Sejnowski, 1985</a>), added two things: hidden units and stochastic dynamics.</p>
<p>Hidden units $h_j$ are neurons not connected to any input or output. They do not correspond to observable quantities; they model latent structure in the data. The energy of the system is:</p>
$$E(\mathbf{v}, \mathbf{h}) = -\sum_{i,j} W_{ij}\, v_i h_j - \sum_i a_i v_i - \sum_j b_j h_j$$<p>where $v_i$ are the visible (data) units, $h_j$ are the hidden units, and $a_i$ and $b_j$ are biases. Note that this is still an Ising-type energy; the $W_{ij}$ are now inter-layer weights. (The general Boltzmann machine also allows couplings within each layer; the bipartite form written here is the restricted special case that later became the practical workhorse.)</p>
<p>The stochastic dynamics replace deterministic gradient descent with a Markov chain. Each unit is updated probabilistically:</p>
$$P(s_k = 1 \mid \text{rest}) = \sigma\!\left(\beta\left[\sum_j W_{kj} s_j + \text{bias}_k\right]\right)$$<p>where $\sigma(x) = 1/(1 + e^{-x})$ is the logistic sigmoid and $\beta = 1/T$ is the inverse temperature. Under these update probabilities the chain equilibrates to a distribution in which the probability of any complete configuration is</p>
$$P(\mathbf{v}, \mathbf{h}) = \frac{1}{Z}\, e^{-\beta E(\mathbf{v}, \mathbf{h})}$$<p>This is the Boltzmann distribution. The machine is named after Ludwig Boltzmann because the equilibrium distribution of its states is the Boltzmann distribution. Not analogously. Literally.</p>
<p>Learning amounts to adjusting the weights to make the model distribution $P(\mathbf{v}, \mathbf{h})$ match the data distribution $P_{\text{data}}(\mathbf{v})$. The objective is to minimise the Kullback-Leibler divergence:</p>
$$\mathcal{L} = D_{\mathrm{KL}}(P_{\text{data}} \| P_{\text{model}}) = \sum_{\mathbf{v}} P_{\text{data}}(\mathbf{v}) \ln \frac{P_{\text{data}}(\mathbf{v})}{P_{\text{model}}(\mathbf{v})}$$<p>The gradient with respect to the weight $W_{ij}$ is</p>
$$\frac{\partial \mathcal{L}}{\partial W_{ij}} = -\langle v_i h_j \rangle_{\text{data}} + \langle v_i h_j \rangle_{\text{model}}$$<p>The first term is the empirical correlation between visible unit $i$ and hidden unit $j$ when the visible units are clamped to data. The second term is the correlation in the model&rsquo;s free-running equilibrium. The learning rule says: increase $W_{ij}$ if the data sees these two units co-active more than the model does, and decrease it otherwise. This is Hebbian learning with a contrastive correction — the physics of equilibration drives the learning.</p>
<p>The computational difficulty is the second term. Computing $\langle v_i h_j \rangle_{\text{model}}$ requires the Markov chain to reach equilibrium, which takes exponentially long in general. Hinton&rsquo;s later invention of contrastive divergence — run the chain for only a few steps rather than to equilibrium — made training feasible, at the cost of a biased gradient estimate. This engineering compromise is part of why the physics purists are uncomfortable: the original derivation is rigorous statistical mechanics, but the algorithm that actually works in practice is an approximation whose convergence properties are poorly understood.</p>
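<p>For readers who want to see the learning rule in motion, here is a sketch of one-step contrastive divergence for a tiny restricted Boltzmann machine. It follows the usual machine-learning convention of $\{0, 1\}$ units rather than the $\{-1, +1\}$ spins used above, and the toy data set and hyperparameters are my own illustrative choices, not anything from the 1985 paper:</p>
<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden, lr = 6, 3, 0.1
W = 0.01 * rng.normal(size=(n_visible, n_hidden))
a = np.zeros(n_visible)                  # visible biases
b = np.zeros(n_hidden)                   # hidden biases

# Toy data: the first three bits are always the complement of the last three
bit = rng.integers(0, 2, size=(500, 1)).astype(float)
data = np.hstack([np.repeat(bit, 3, axis=1), 1.0 - np.repeat(bit, 3, axis=1)])

for epoch in range(50):
    for v0 in data:
        # Positive phase: hidden probabilities with visibles clamped to data
        p_h0 = sigmoid(v0 @ W + b)
        h0 = (rng.random(n_hidden) < p_h0).astype(float)
        # One Gibbs step back and forth: the CD-1 stand-in for the model phase
        v1 = (rng.random(n_visible) < sigmoid(h0 @ W.T + a)).astype(float)
        p_h1 = sigmoid(v1 @ W + b)
        # Learning rule: data correlation minus (approximate) model correlation
        W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
        a += lr * (v0 - v1)
        b += lr * (p_h0 - p_h1)
</code></pre>
<p>The two outer products are exactly the two terms of the gradient above; everything else is bookkeeping.</p>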
<p>I find this charming rather than damning. Physics itself is full of approximations whose convergence properties are poorly understood but which happen to give right answers. Perturbation theory beyond leading order, the replica trick itself — these are not rigorous mathematics. They are informed guesses that happen to be correct. The history of theoretical physics is mostly the history of getting away with things.</p>
<h2 id="from-boltzmann-machines-to-transformers">From Boltzmann machines to transformers</h2>
<p>The Boltzmann machine was computationally difficult but conceptually foundational. The restricted Boltzmann machine (RBM) — with no within-layer connections, so that hidden units are conditionally independent given the visible units and vice versa — made training via contrastive divergence practical.</p>
<p>Hinton, Osindero, and Teh&rsquo;s 2006 paper on deep belief networks showed that stacking RBMs and pre-training them greedily could initialise deep networks well enough to fine-tune with backpropagation. This was the breakthrough that restarted deep learning after the winter of the 1990s. It is fair to say that without the Boltzmann machine as conceptual foundation and the RBM as practical building block, the deep learning revolution that gave us <a href="/posts/strawberry-tokenisation/">large language models that fail to count letters in words</a> would not have happened in the form it did.</p>
<p>The connection between Hopfield networks and modern attention mechanisms is more recent and more surprising. Ramsauer et al. (2020) showed that modern Hopfield networks — a generalisation of the original with continuous states and a different energy function — have exponential storage capacity (<a href="#ref-Ramsauer2020">Ramsauer et al., 2020</a>). More strikingly, the update rule of the modern Hopfield network is:</p>
$$\mathbf{s}^{\text{new}} = \mathbf{X}\, \text{softmax}\!\left(\beta \mathbf{X}^\top \mathbf{s}\right)$$<p>where $\mathbf{X}$ is the matrix of stored patterns and $\mathbf{s}$ is the query. This is the attention mechanism of the transformer, up to notation. The transformer&rsquo;s multi-head self-attention is, formally, a generalised Hopfield retrieval step. The architecture that powers GPT and everything descended from it is, at one level of abstraction, an associative memory performing energy minimisation on a Hopfield energy landscape.</p>
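<p>The correspondence is easy to play with numerically. In the sketch below the dimensions and the inverse temperature $\beta$ are illustrative choices of mine: store a handful of continuous patterns as the columns of $\mathbf{X}$, feed in a noisy query, apply the update above a few times, and the state snaps onto the stored pattern the query most resembles — just as an attention head snaps onto the keys that best match its query.</p>
<pre><code class="language-python">import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

rng = np.random.default_rng(0)
d, n_stored, beta = 64, 10, 2.0
X = rng.normal(size=(d, n_stored))            # columns are the stored patterns
query = X[:, 3] + 0.5 * rng.normal(size=d)    # noisy version of pattern 3

state = query
for _ in range(3):                            # a few retrieval steps suffice
    state = X @ softmax(beta * X.T @ state)   # the modern Hopfield update

cosines = X.T @ state / (np.linalg.norm(X, axis=0) * np.linalg.norm(state))
print("retrieved pattern index:", int(np.argmax(cosines)))   # -> 3
</code></pre>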
<p>I do not want to overstate this. The connection is formal and the interpretation is contested. But it is not nothing. The physicists who built the Hopfield network in 1982 were working on the same mathematical object that is now used to process language, images, and protein sequences at industrial scale.</p>
<h2 id="the-protein-folding-connection">The protein folding connection</h2>
<p>The 2024 Nobel Prize in Chemistry went to David Baker for computational protein design and to Demis Hassabis and John Jumper for protein structure prediction — specifically for AlphaFold2 (<a href="#ref-Jumper2021">Jumper et al., 2021</a>). This made October 2024 a remarkable month for Nobel Prizes in fields adjacent to artificial intelligence, and it is not a coincidence.</p>
<p>Protein folding is a spin-glass problem. A protein is a polymer of amino acids, each with different chemical properties and steric constraints. The protein folds into a unique three-dimensional structure — its native conformation — determined by its sequence. The energy landscape of the folding process is precisely the kind of rugged landscape that Parisi described for spin glasses: exponentially many misfolded states, separated by barriers, with the native structure as the global minimum (or close to it).</p>
<p>Levinthal&rsquo;s paradox, formulated in 1969, makes the absurdity quantitative. A modest protein of 100 amino acids might have $3^{100} \approx 10^{47}$ possible conformations (allowing three dihedral angle states per residue). Random search of this space, at the rate of one conformation per picosecond, would take about $5 \times 10^{35}$ seconds — more than $10^{28}$ years, vastly longer than the age of the universe. Yet proteins fold in milliseconds to seconds. They do not search randomly; the energy landscape is funnel-shaped, channelling the dynamics toward the native state. But predicting <em>which</em> state is the native one from sequence alone remained one of the hard problems of structural biology for fifty years.</p>
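<p>The arithmetic is worth doing once by hand, or in three lines of Python:</p>
<pre><code class="language-python">conformations = 3 ** 100                  # about 5.2e47 backbone conformations
seconds = conformations * 1e-12           # one conformation per picosecond
print(f"{seconds:.1e} s  =  {seconds / 3.15e7:.1e} years")   # ~5.2e35 s, ~1.6e28 years
</code></pre>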
<p>AlphaFold2 uses a transformer architecture — descended from the Boltzmann machine lineage — trained on millions of known protein structures. It does not simulate the folding dynamics; it has learned, from data, a mapping from sequence to structure that encodes the statistical mechanics of the folding funnel. The Nobel committee gave it the Chemistry prize because it is transforming biochemistry. But the conceptual machinery is pure statistical physics: representation of a high-dimensional energy landscape, approximation of the minimum, learned from the distribution of solved instances.</p>
<p>The three Nobels of 2021–2024 form the most coherent consecutive triple I can remember: Parisi showed how disordered energy landscapes behave; Hopfield and Hinton showed how to use energy landscapes as memory and learning machines; Hassabis and Jumper showed how to apply the resulting architecture to the most consequential outstanding problem in molecular biology. Each step is a mathematical consequence of the one before it.</p>
<h2 id="the-controversy-did-the-committee-err">The controversy: did the committee err?</h2>
<p>I said I understand the irritation. Here is what is right about it.</p>
<p>Hinton&rsquo;s work after the Boltzmann machine — backpropagation, dropout, convolutional networks, deep learning at ImageNet scale — is primarily engineering and empirical machine learning. The 2012 AlexNet result that restarted the field was not a theoretical physics contribution; it was a demonstration that known methods work very well on very large datasets with very large GPUs. The fact that it works is not explained by statistical mechanics. The scaling laws of neural networks (loss scales as a power law with compute, parameters, and data) are empirical observations that physicists have tried, with mixed success, to explain using renormalisation-group arguments.</p>
<p>If the Nobel Prize in Physics were awarded for &ldquo;the work that most influenced technology in the past decade,&rdquo; the case for Hinton is strong. If it were awarded for &ldquo;the most important contribution to the science of physics,&rdquo; the case is weaker. There is a version of the Nobel announcement that emphasises the Boltzmann machine specifically — the 1985 paper that is literally named after a physicist and uses his distribution — and that version sits cleanly within physics. There is a broader version that encompasses all of Hinton&rsquo;s career, and that version includes a great deal of empirical machine learning that the physics community is reasonably reluctant to claim.</p>
<p>My view, for what it is worth from someone who has been <a href="/posts/ai-warfare-anthropic-atom-bomb/">thinking about AI ethics and consequences</a> for rather longer than feels comfortable: the Nobel correctly identifies that the foundational conceptual contributions — the Ising Hamiltonian as associative memory, the Boltzmann distribution as a learning target, the connection between statistical mechanics and computation — are physics. They came from physicists, they use physics mathematics, they extend physics intuition into a new domain. The subsequent scaling of these ideas using TPUs and transformer architectures is engineering. Valuable engineering, world-changing engineering, but engineering. The Nobel is for the former. If the citation had been more specific — &ldquo;for the Boltzmann machine and its demonstration that physical principles govern neural computation&rdquo; — the physics community would have been less irritated and equally correct.</p>
<p>What the irritation reveals is something slightly uncomfortable about disciplinary identity. Physicists are proud of universality: the idea that the same mathematical structures appear in wildly different physical systems. RSB in spin glasses, replica methods in random matrices, the Parisi–Sourlas correspondence between disordered systems and supersymmetric field theories — the joy of physics is precisely that these deep structural similarities cross domain boundaries. When that universality reaches into machine learning and says &ldquo;your transformer attention layer is a Hopfield retrieval step,&rdquo; physicists should be delighted, not affronted.</p>
<p>The <a href="/posts/ralph-loop/">agentic systems</a> that are being built right now on top of transformer architectures are doing something that looks, from a sufficiently abstract distance, like what the Hopfield network was designed to do: find stored patterns that match a query, and use them to generate a response. The <a href="/posts/car-wash-grounding/">failures of grounding</a> that I have written about elsewhere are, in this view, failures of the energy landscape — the model finds a metastable state that is not the correct minimum, and the dynamics cannot escape. Spin glass physics does not explain these failures in detail, but it gives a language for thinking about them. That is what physics is for.</p>
<h2 id="the-universality-argument">The universality argument</h2>
<p>Let me make the deeper claim explicit. Why should disordered magnets, associative memory networks, and protein folding all live in the same mathematical family?</p>
<p>Because they all have the same structure: many interacting degrees of freedom with competing constraints, a combinatorially large configuration space, an energy landscape with exponentially many metastable states, and dynamics that search for — and frequently fail to find — global minima. This is a universality class. The specific details (magnetic moments versus neuron states versus dihedral angles) are irrelevant at the level of the energy landscape topology.</p>
<p>Parisi&rsquo;s contribution was to show that this class has a specific, exactly-solvable structure in mean field theory, characterised by replica symmetry breaking and the ultrametric organisation of states. This was not a solution to one model. It was a description of a universality class. The fact that the Hopfield model is in this class is not a coincidence requiring explanation; it is a mathematical identity requiring verification.</p>
<p>The <a href="/posts/kuramoto-ensemble-sync/">Kuramoto model for coupled oscillators</a> — which I have written about in the context of ensemble synchronisation and neural phase coupling — is another member of this extended family. The synchronisation transition in the Kuramoto model, the glass transition in the SK model, and the memory phase transition in the Hopfield model are all mean-field phase transitions in disordered many-body systems. The mathematics is more similar than the physics syllabi suggest.</p>
<p>When I teach physics and occasionally venture into questions about what the AI tools my students are using actually do, I find myself reaching for this framework. Not because it gives engineering insight into how to train a better model — it does not, particularly — but because it gives honest insight into <em>what kind of thing</em> a neural network is. It is a physical system. It has an energy landscape. Its failures are phase transitions. Its successes are energy minimisation. The vocabulary of statistical mechanics is not a metaphor; it is the correct description.</p>
<p>The Nobel committee noticed. They were right to notice.</p>
<hr>
<p><em>The 2021 and 2024 Nobel Prizes in Physics have now officially bridged the gap between condensed matter physics and machine learning in the public record. For anyone who wants to understand either field more deeply than the press releases suggest, the SK model and the Hopfield network are the right place to start. Both papers are short by modern standards — Parisi&rsquo;s 1979 letter is three pages; Hopfield&rsquo;s 1982 PNAS paper is five — and both repay close reading.</em></p>
<h2 id="references">References</h2>
<ul>
<li>
<p><span id="ref-Sherrington1975"></span>Sherrington, D., &amp; Kirkpatrick, S. (1975). Solvable model of a spin-glass. <em>Physical Review Letters</em>, 35(26), 1792–1796. <a href="https://doi.org/10.1103/PhysRevLett.35.1792">DOI: 10.1103/PhysRevLett.35.1792</a></p>
</li>
<li>
<p><span id="ref-Parisi1979"></span>Parisi, G. (1979). Infinite number of order parameters for spin-glasses. <em>Physical Review Letters</em>, 43(23), 1754–1756. <a href="https://doi.org/10.1103/PhysRevLett.43.1754">DOI: 10.1103/PhysRevLett.43.1754</a></p>
</li>
<li>
<p><span id="ref-Hopfield1982"></span>Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. <em>Proceedings of the National Academy of Sciences</em>, 79(8), 2554–2558. <a href="https://doi.org/10.1073/pnas.79.8.2554">DOI: 10.1073/pnas.79.8.2554</a></p>
</li>
<li>
<p><span id="ref-Ackley1985"></span>Ackley, D. H., Hinton, G. E., &amp; Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. <em>Cognitive Science</em>, 9(1), 147–169. <a href="https://doi.org/10.1207/s15516709cog0901_7">DOI: 10.1207/s15516709cog0901_7</a></p>
</li>
<li>
<p><span id="ref-Amit1985"></span>Amit, D. J., Gutfreund, H., &amp; Sompolinsky, H. (1985). Storing infinite numbers of patterns in a spin-glass model of neural networks. <em>Physical Review Letters</em>, 55(14), 1530–1533. <a href="https://doi.org/10.1103/PhysRevLett.55.1530">DOI: 10.1103/PhysRevLett.55.1530</a></p>
</li>
<li>
<p><span id="ref-Jumper2021"></span>Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., … Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. <em>Nature</em>, 596, 583–589. <a href="https://doi.org/10.1038/s41586-021-03819-2">DOI: 10.1038/s41586-021-03819-2</a></p>
</li>
<li>
<p><span id="ref-Ramsauer2020"></span>Ramsauer, H., Schäfl, B., Lehner, J., Seidl, P., Widrich, M., Adler, T., Gruber, L., Holzleitner, M., Pavlović, M., Sandve, G. K., Greiff, V., Kreil, D., Kopp, M., Klambauer, G., Brandstetter, J., &amp; Hochreiter, S. (2020). Hopfield networks is all you need. <em>arXiv:2008.02217</em>. Retrieved from <a href="https://arxiv.org/abs/2008.02217">https://arxiv.org/abs/2008.02217</a></p>
</li>
<li>
<p><span id="ref-Nobel2024"></span>Nobel Prize Committee. (2024). Scientific background: Machine learning and physical systems. The Royal Swedish Academy of Sciences. Retrieved from <a href="https://www.nobelprize.org/prizes/physics/2024/advanced-information/">https://www.nobelprize.org/prizes/physics/2024/advanced-information/</a></p>
</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Why Cats Purr at 25 Hz: Vocal Fold Pads and the Physics of Self-Sustained Oscillation</title>
      <link>https://sebastianspicker.github.io/posts/purring-physics-vocal-fold-pads/</link>
      <pubDate>Mon, 09 Sep 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/purring-physics-vocal-fold-pads/</guid>
      <description>For decades, the mechanism of purring was disputed. A 2023 paper in Current Biology showed that cat larynges purr without any neural input: airflow alone drives a self-sustained oscillation. The secret is connective tissue pads embedded in the vocal folds that increase effective mass and lower the resonant frequency to 25–30 Hz — the same range used clinically for bone- density stimulation and fracture healing under Wolff&amp;rsquo;s law.</description>
      <content:encoded><![CDATA[<p><em>The first thing either of our cats did when I sat still long enough was purr.
Not after food, not during play — the purr arrived when I sat down and held
still and they settled against me, and it arrived as a physical fact, a vibration
felt through the sternum and the ribs, not merely heard. The frequency was low:
around 25–30 cycles per second, which you can feel as a buzz rather than hear
as a tone. This is, I later confirmed, not far from the frequency at which
clinical devices stimulate bone growth. They are indoor cats now, on our vet&rsquo;s
recommendation — they find this unreasonable, but sitting still and being purred
on has become a regular feature of working from home.</em></p>
<p><em>The physics of how the larynx produces that frequency is, as of 2023, finally
resolved — and the mechanism is more elegant than anyone suspected.</em></p>
<hr>
<h2 id="the-frequency-and-its-peculiarity">The Frequency and Its Peculiarity</h2>
<p>Domestic cats purr at approximately $25$–$30\,\mathrm{Hz}$. This is
remarkably low for an animal of cat size. A human vocal fold — if anything
larger than a cat&rsquo;s — vibrates at $85$–$255\,\mathrm{Hz}$ for normal speech.
A cat&rsquo;s larynx is smaller than a human&rsquo;s, which makes the low
frequency surprising: in a simple spring-mass oscillator model, smaller and
lighter vocal folds should vibrate <em>faster</em>, not slower.</p>
<p>The frequency range $25$–$50\,\mathrm{Hz}$ has clinical significance in a
different field. Therapeutic vibration platforms used in sports medicine and
osteoporosis treatment operate in exactly this range, exploiting Wolff&rsquo;s law
(bone remodelling under mechanical stress) to increase bone density and
accelerate fracture repair. The coincidence is suggestive. It was first
noted quantitatively by von Muggenthaler (2001, <em>Journal of the Acoustical
Society of America</em> 110, 2666), who recorded purrs from 44 felids and
found that all produced dominant frequencies between $25$ and $150\,\mathrm{Hz}$.</p>
<p>Whether cats deliberately exploit this frequency for self-healing is a separate
biological question. The physics question is simpler: how does the larynx
produce it?</p>
<hr>
<h2 id="flow-induced-vocal-fold-oscillation">Flow-Induced Vocal Fold Oscillation</h2>
<p>Vocal fold oscillation in mammals is a flow-induced, self-sustained mechanical
phenomenon. The Bernoulli effect and elastic restoring forces create a
feedback loop that keeps the folds oscillating as long as subglottal air
pressure is maintained.</p>
<p>The mechanism is as follows. The lungs supply a steady subglottal pressure
$p_\mathrm{sub}$. This drives airflow through the glottis (the gap between the
vocal folds). The subglottal pressure pushes the folds apart; while the glottis
is still narrow, the air rushing through it moves fast, and by Bernoulli&rsquo;s principle,</p>
$$p + \tfrac{1}{2}\rho v^2 = \mathrm{const},$$<p>the pressure drops, drawing the folds back together. The folds&rsquo; elastic
restoring force adds to this: they spring back when displaced. The result is
an oscillation — the folds open and close periodically, chopping the airflow
into pressure pulses that we perceive as sound (or vibration, for low
frequencies).</p>
<p>The fundamental frequency is approximately:</p>
$$f_0 \approx \frac{1}{2L}\sqrt{\frac{T}{\rho_s}},$$<p>where $L$ is the vibrating length of the vocal fold, $T$ is the longitudinal
tension, and $\rho_s$ is the surface density (mass per unit area). This is
the same formula as for a vibrating string — and the physics is closely
related.</p>
<p>For a cat-sized larynx with $L \approx 1\,\mathrm{cm}$, realistic tissue
tension, and tissue density $\rho_s \sim 1\,\mathrm{kg/m}^2$, this formula
gives $f_0$ in the hundreds of hertz — far above the observed purring
frequency of $25$–$30\,\mathrm{Hz}$.</p>
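<p>A quick order-of-magnitude check: the length and surface density below are the figures above, while the tension is my own illustrative value, of the size implied by tissue stresses of tens of kilopascals acting over a roughly millimetre-thick fold.</p>
<pre><code class="language-python">import math

L = 0.01        # vibrating length, m
rho_s = 1.0     # surface density, kg/m^2
T = 40.0        # longitudinal tension per unit width, N/m (assumed illustrative value)

f0 = math.sqrt(T / rho_s) / (2.0 * L)
print(f"unloaded aeroelastic estimate: {f0:.0f} Hz")   # ~316 Hz, nowhere near 25-30 Hz
</code></pre>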
<p>Something is missing from the model.</p>
<hr>
<h2 id="the-long-standing-controversy">The Long-Standing Controversy</h2>
<p>Until 2023, the dominant explanation for the low purring frequency was the
<strong>Active Muscular Contraction (AMC) hypothesis</strong>: the laryngeal muscles
contract rhythmically at the purring frequency, mechanically driving the
vocal folds rather than relying on passive aeroelastic oscillation. On this
view, purring is more like drumming than singing — the neural drive at
$25$–$30\,\mathrm{Hz}$ sets the frequency, overriding the natural aeroelastic
frequency.</p>
<p>The AMC hypothesis was difficult to test directly because the larynx is
inaccessible in a live, purring cat without interfering with the purr.
Electromyographic recordings from laryngeal muscles of purring cats showed
rhythmic activity consistent with the AMC hypothesis, but causality was unclear:
were the muscles driving the oscillation, or responding to it?</p>
<p>The alternative hypothesis — that purring is passive, driven purely by
aeroelastic forces — faced the problem noted above: the aeroelastic frequency
of a cat-sized larynx should be far too high to explain $25$–$30\,\mathrm{Hz}$.
Unless something was being added to the vocal folds to lower their effective
resonant frequency.</p>
<hr>
<h2 id="herbst-et-al-2023-the-mass-loading-mechanism">Herbst et al. 2023: The Mass-Loading Mechanism</h2>
<p>In October 2023, Christian Herbst and colleagues at the University of Vienna
published &ldquo;Domestic cat larynges can produce purring frequencies without neural
input&rdquo; (<em>Current Biology</em> 33, 4727–4732). The experiment was decisive.</p>
<p>The team excised larynges from domestic cats (post-mortem, within a short time
window to preserve tissue properties) and mounted them in a flow bench: a
controlled airflow was supplied to the subglottal side, and the larynges were
held at physiologically realistic tension and hydration.</p>
<p><strong>The result</strong>: all eight excised larynges produced self-sustained oscillations
at $25$–$30\,\mathrm{Hz}$ — the normal purring frequency — without any neural
input whatsoever. No muscular contraction was present (no motor neurons, no
calcium signalling, no ATP). The oscillation was purely passive, driven by the
airflow and maintained by the tissue mechanics.</p>
<p>This ruled out the AMC hypothesis. The neural drive is not needed to sustain
the oscillation; it may modulate it, start or stop it, but the fundamental
frequency is set by the tissue mechanics, not the neural firing rate.</p>
<p>The follow-up finding was the key to the physics: histological analysis of the
vocal fold tissue revealed <strong>connective tissue pads</strong> embedded in the vocal
fold mucosa, up to $4\,\mathrm{mm}$ thick. These pads are not present in the
vocal folds of humans or other mammals that do not purr. They increase the
effective mass of the oscillating tissue significantly, without adding
corresponding stiffness.</p>
<hr>
<h2 id="the-mass-loading-physics">The Mass-Loading Physics</h2>
<p>The fundamental frequency of a harmonic oscillator is:</p>
$$f_0 = \frac{1}{2\pi}\sqrt{\frac{k}{m}},$$<p>where $k$ is the effective stiffness and $m$ is the effective mass. Adding mass
(at constant stiffness) lowers the frequency as $f_0 \propto m^{-1/2}$.</p>
<p>For the vocal folds, the spring constant $k$ is set by tissue tension and
elasticity — properties that the tissue pads do not significantly alter. But
the pads add a substantial mass $\Delta m$ to the oscillating system. The
purring frequency becomes:</p>
$$f_\mathrm{purr} = \frac{1}{2\pi}\sqrt{\frac{k}{m_0 + \Delta m}},$$<p>where $m_0$ is the baseline vocal fold mass and $\Delta m$ is the added mass
from the pads.</p>
<p>As a rough estimate: if the unloaded aeroelastic frequency were in the
range $f_\mathrm{normal} \approx 200$–$400\,\mathrm{Hz}$ (the range of
cat meow fundamental frequencies), lowering it to $f_\mathrm{purr} \approx
25\,\mathrm{Hz}$ would require a mass increase by a factor of</p>
$$\frac{m_0 + \Delta m}{m_0} = \left(\frac{f_\mathrm{normal}}{f_\mathrm{purr}}\right)^2
\approx 64\text{–}256.$$<p>This is a large factor, but not implausible for pads up to 4 mm thick
embedded in a mucosal membrane that is itself very thin. The simple
harmonic oscillator model is an idealisation — the actual frequency reduction
also involves changes in vibration mode shape, tissue coupling, and
aerodynamic loading — but the mass-loading effect is the dominant mechanism.
The tissue pads are, in effect, frequency dividers: they convert a
high-frequency aeroelastic oscillator into a low-frequency vibration
generator.</p>
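<p>The scaling is easy to verify numerically. In the sketch below the absolute values of $k$ and $m_0$ are arbitrary — I simply pick them so the unloaded oscillator sits at $300\,\mathrm{Hz}$, a typical meow fundamental — because only the mass ratio matters:</p>
<pre><code class="language-python">import math

def f_osc(k, m):
    """Natural frequency of a harmonic oscillator: f = (1/2pi) sqrt(k/m)."""
    return math.sqrt(k / m) / (2.0 * math.pi)

m0 = 1e-4                                   # baseline oscillating mass, kg (arbitrary)
k = (2.0 * math.pi * 300.0) ** 2 * m0       # stiffness chosen so f_osc(k, m0) = 300 Hz

for mass_factor in (1, 16, 64, 144):
    print(f"m/m0 = {mass_factor:3d}   ->   f0 = {f_osc(k, mass_factor * m0):6.1f} Hz")
# m/m0 = 144 brings 300 Hz down to 25 Hz, matching the (300/25)^2 estimate above
</code></pre>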
<p>This is the same principle used in engineering to lower the natural frequency
of mechanical structures: add mass without changing stiffness. Tuned mass
dampers in skyscrapers work on the same principle. So do the heavy flywheel
weights added to engines to suppress rotational vibration.</p>
<p>The cat&rsquo;s larynx evolved this solution independently, and with a mass ratio
that would impress a structural engineer.</p>
<hr>
<h2 id="the-self-sustained-oscillation-criterion">The Self-Sustained Oscillation Criterion</h2>
<p>Not every mass-loaded oscillator will self-sustain under airflow. The
Bernoulli-elastic feedback loop must overcome the viscous damping of the
tissue. A dimensional scaling estimate for the critical subglottal pressure is:</p>
$$p^* \sim \eta_\mathrm{tissue} \cdot \frac{v}{L} \sim \eta_\mathrm{tissue} \cdot f_0,$$<p>where $\eta_\mathrm{tissue}$ is the tissue viscosity, $v \sim f_0 L$ is the
characteristic mucosal wave velocity, and $L$ is the fold length. (The full
phonation threshold pressure, as derived by Titze (2006), depends on
additional geometric and aerodynamic parameters.) For typical laryngeal tissue properties and the observed purring
frequency, this critical pressure is of order $100$–$200\,\mathrm{Pa}$ —
low enough to be sustained by the respiratory system without extraordinary
effort.</p>
<p>This is consistent with the observation that cats can purr both during
inhalation and exhalation, maintaining a continuous acoustic output throughout
the breathing cycle. The oscillation threshold is low enough that normal
respiration can maintain it.</p>
<hr>
<h2 id="wolffs-law-and-the-25-hz-coincidence">Wolff&rsquo;s Law and the 25 Hz Coincidence</h2>
<p>Julius Wolff (1892) proposed that bone remodels in response to mechanical
loading: osteoblasts (bone-building cells) are stimulated by cyclic compressive
stress, while osteoclasts (bone-resorbing cells) dominate in the absence of
loading. This principle — now called Wolff&rsquo;s law — underpins the use of
therapeutic vibration in orthopaedics.</p>
<p>The optimal frequency for osteoblast stimulation, determined empirically in
clinical studies, is $20$–$50\,\mathrm{Hz}$. Vibration at these frequencies,
applied at amplitudes of $0.2$–$1.0\,g$ (where $g$ is gravitational
acceleration), produces measurable increases in bone mineral density, accelerates
fracture healing, and reduces bone loss in microgravity. The frequency range
is not a narrow resonance; it reflects the natural frequencies of cellular
mechanotransduction pathways involving focal adhesion kinase (FAK) and
integrin signalling.</p>
<p>Cat purring produces vibration in the frequency range $25$–$50\,\mathrm{Hz}$
at the body surface. Whether this is sufficient to produce meaningful bone
stimulation — and whether cats evolved purring partly as a bone-maintenance
mechanism — is not yet resolved by controlled experiments. The hypothesis is
physiologically plausible: cats conserve metabolic energy by resting for up
to 16 hours per day, and during this rest period, bone would normally be
unstressed and subject to resorption. A continuous low-frequency vibration
during rest could counteract this.</p>
<p>This is speculative at the level of evolutionary causation. What is not
speculative is that the purring frequency overlaps precisely with the
therapeutic vibration range, and that this overlap is not obviously accidental.</p>
<hr>
<h2 id="across-felid-species">Across Felid Species</h2>
<p>Von Muggenthaler&rsquo;s 2001 survey of 44 felids found that most domestic
cats purr in the range $25$–$30\,\mathrm{Hz}$, with harmonics at $50$,
$75\,\mathrm{Hz}$, and so on. Cheetahs purr at $20$–$25\,\mathrm{Hz}$;
pumas (mountain lions) at $20$–$30\,\mathrm{Hz}$; servals and ocelots at
$22$–$28\,\mathrm{Hz}$.</p>
<p>The large roaring cats — lions, tigers, leopards, jaguars — do not purr in
the continuous sense that domestic cats do. Their enlarged hyoid apparatus
allows roaring by a different mechanism (a modified laryngeal pad that
allows very low-frequency, high-intensity sound production). Some large cats
produce purr-like sounds during exhalation but not the continuous through-
inhalation-and-exhalation purring of smaller felids.</p>
<p>The vocal fold pad mechanism appears to be specific to the non-roaring felids,
though detailed histological comparisons across species are still sparse.</p>
<hr>
<h2 id="what-i-hear">What I Hear</h2>
<p>When one of our cats purrs while settled against me, what I am feeling is the
mechanical resonance of a mass-loaded aeroelastic oscillator at approximately
$25\,\mathrm{Hz}$, the frequency having been lowered by connective tissue pads
from a natural aeroelastic frequency several hundred hertz higher. The pads
evolved, we think, to produce exactly this frequency — sustained under normal
respiratory airflow pressure with no additional muscular energy. The acoustic
output is a byproduct of a vibration.</p>
<p>Whether the vibration serves a direct physiological function in the cat&rsquo;s own
bones is, as of this writing, still an open question. What seems clear is that
the 2023 paper settled the mechanism question conclusively: the frequency is
set by mass loading, not neural drive. The larynx purrs by itself when you
blow air through it.</p>
<p>I find this reassuring. The physics is in the cat, not in its nervous system.
The cat purrs the way a tuning fork rings — not because it decides to, but
because that is what it does when the conditions are right.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Herbst, C.T., Prigge, T., Garcia, M., Hampala, V., Hofer, R., Weissengruber,
G.E., Svec, J.G., &amp; Fitch, W.T. (2023). Domestic cat larynges can produce
purring frequencies without neural input. <em>Current Biology</em>, 33(22),
4727–4732.e4. <a href="https://doi.org/10.1016/j.cub.2023.09.014">https://doi.org/10.1016/j.cub.2023.09.014</a></p>
</li>
<li>
<p>von Muggenthaler, E. (2001). The felid purr: A healing mechanism?
<em>Journal of the Acoustical Society of America</em>, 110(5), 2666.
<a href="https://doi.org/10.1121/1.4777098">https://doi.org/10.1121/1.4777098</a></p>
</li>
<li>
<p>Titze, I.R. (2006). <em>The Myoelastic Aerodynamic Theory of Phonation.</em>
National Center for Voice and Speech.</p>
</li>
<li>
<p>Wolff, J. (1892). <em>Das Gesetz der Transformation der Knochen.</em> A. Hirschwald.
(English translation: Maquet, P., &amp; Furlong, R., 1986. <em>The Law of Bone
Remodelling.</em> Springer.)</p>
</li>
<li>
<p>Rubin, C.T., &amp; Lanyon, L.E. (1984). Regulation of bone formation by applied
dynamic loads. <em>Journal of Bone and Joint Surgery</em>, 66(3), 397–402.
<a href="https://doi.org/10.2106/00004623-198466030-00012">https://doi.org/10.2106/00004623-198466030-00012</a></p>
</li>
<li>
<p>Christiansen, P. (2008). Evolution of skull and mandible shape in cats
(Carnivora: Felidae). <em>PLOS ONE</em>, 3(7), e2807.
<a href="https://doi.org/10.1371/journal.pone.0002807">https://doi.org/10.1371/journal.pone.0002807</a></p>
</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Why 44,100? The Accidental Physics of the CD Sampling Rate</title>
      <link>https://sebastianspicker.github.io/posts/why-44100-hz-cd-sampling-rate/</link>
      <pubDate>Mon, 05 Aug 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/why-44100-hz-cd-sampling-rate/</guid>
      <description>The CD sampling rate is not a round number chosen by committee. It is the direct output of 1970s NTSC and PAL video engineering — and both standards, designed on different continents, converge on exactly the same number.</description>
      <content:encoded><![CDATA[<p><em>44,100 Hz. Not 44,000. Not 48,000. Not even 40,000 or 50,000, which would at least have the virtue of roundness. The number that defines CD-quality audio is specific in a way that invites a question most people never think to ask: why that number?</em></p>
<hr>
<h2 id="the-puzzle">The Puzzle</h2>
<p>When a physical constant turns out to be $1.6 \times 10^{-19}$ coulombs, that is just nature being nature — no further explanation is needed or available. But when an engineering standard settles on 44,100 Hz rather than, say, 44,000 Hz or 45,000 Hz, there is a story hiding in the specificity.</p>
<p>The standard answer — the one you find on Wikipedia and in most popular accounts — is that 44.1 kHz satisfies the Nyquist criterion for 20 kHz audio, and so it was chosen to preserve the full range of human hearing. This is true. It is also almost completely uninformative. The Nyquist criterion for 20 kHz audio requires only that the sampling rate exceed 40 kHz. That constraint is satisfied by 40,001 Hz as much as by 44,100 Hz. The specific value requires a different explanation entirely.</p>
<p>That explanation involves a Sony engineer, a consumer videocassette recorder, and the accidental convergence of two television standards developed independently on different continents. The number 44,100 is not an optimisation. It is an archaeological deposit. And like most archaeological deposits, it is still with us long after the civilisation that created it has disappeared.</p>
<p>I want to work through the physics first, because the Nyquist theorem is genuinely beautiful and is often presented in a way that obscures what it actually says. Then I want to show you the arithmetic that makes 44,100 inevitable given 1970s constraints — and the way NTSC and PAL, designed for completely different reasons, conspire to produce the same number. If you enjoy &ldquo;hidden mathematics in music,&rdquo; you might also find it in <a href="/posts/euclidean-rhythms/">Euclidean Rhythms</a>, where a 2,300-year-old algorithm turns out to encode the structure of West African and Cuban percussion.</p>
<hr>
<h2 id="the-nyquistshannon-sampling-theorem">The Nyquist–Shannon Sampling Theorem</h2>
<p>Before the archaeology, the physics.</p>
<p>In 1928, Harry Nyquist published a paper on telegraph transmission theory that contained, somewhat incidentally, the germ of what would become one of the most consequential theorems in applied mathematics <a href="#ref-4">[4]</a>. Claude Shannon formalised and generalised it in 1949 <a href="#ref-5">[5]</a>. The theorem states: a continuous bandlimited signal whose highest frequency component is $f_{\max}$ can be perfectly reconstructed from discrete samples taken at rate $f_s$ if and only if</p>
$$f_s > 2 f_{\max}.$$<p>The quantity $f_s / 2$ is called the Nyquist frequency. Sampling below it causes <em>aliasing</em>: high-frequency components fold back into the spectrum and appear as spurious low-frequency artefacts that are indistinguishable from genuine signal. Once you have aliased a signal, the damage is permanent. Sampling at or above the Nyquist rate, the theorem says, causes no information loss at all — the original continuous waveform can be recovered exactly, in principle, from the discrete sample sequence.</p>
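<p>A small numerical illustration of the fold-back (the 25 kHz tone is an arbitrary example of mine): sampled at 44.1 kHz, a 25 kHz tone produces exactly the same sample values as a 19.1 kHz tone, because $44{,}100 - 25{,}000 = 19{,}100$. Once the samples exist, no amount of processing can tell the two apart.</p>
<pre><code class="language-python">import numpy as np

fs = 44_100.0                       # sampling rate, Hz
f_high = 25_000.0                   # a tone above the Nyquist frequency fs/2
f_alias = fs - f_high               # 19,100 Hz: where it folds back to

t = np.arange(64) / fs              # 64 sample instants
high_tone = np.cos(2 * np.pi * f_high * t)
alias_tone = np.cos(2 * np.pi * f_alias * t)

print(np.allclose(high_tone, alias_tone))   # True: identical samples
</code></pre>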
<p>Human hearing extends from roughly 20 Hz to 20 kHz (and, for most adults over thirty, substantially less at the top end, but 20 kHz is the canonical engineering requirement). Setting $f_{\max} = 20$ kHz, the Nyquist criterion requires $f_s > 40$ kHz.</p>
<p>But here is the subtlety that the Wikipedia summary tends to skip. The theorem assumes that the signal is <em>perfectly</em> bandlimited before sampling — meaning that all energy above $f_{\max}$ has been removed. This requires an <em>anti-aliasing filter</em>: a low-pass filter applied to the analogue signal before the analogue-to-digital converter samples it. If your anti-aliasing filter passes everything up to 20 kHz and blocks everything above it with perfect sharpness, then 40,001 Hz would suffice. The problem is that such a filter is physically unrealisable.</p>
<p>Real filters do not have vertical cutoffs. They have a <em>transition band</em>: a frequency range over which attenuation increases gradually from zero to full suppression. The steeper you want the transition, the higher the filter order, and for practical filter hardware in 1979 — op-amps, capacitors, inductors, no DSP to speak of — a &ldquo;steep enough&rdquo; filter meant a transition band of roughly 10% of the passband edge frequency. For a 20 kHz passband edge, that is about 2 kHz of transition band.</p>
<p>So the actual engineering requirement is not just $f_s > 40$ kHz. It is $f_s > 40$ kHz <em>plus enough headroom for a realisable anti-aliasing filter</em>. With $f_s = 44.1$ kHz, the Nyquist limit sits at $f_s/2 = 22.05$ kHz. The gap between the top of the audio band and the Nyquist limit is</p>
$$22{,}050 - 20{,}000 = 2{,}050 \text{ Hz},$$<p>which is just over 10% of 20 kHz. This is enough to build a practical anti-aliasing filter with 1970s and early 1980s analogue components. Had the sampling rate been 41 kHz, the gap would have been only 500 Hz — far too narrow for affordable hardware. Had it been 50 kHz, the gap would have been more comfortable, but you would be storing about 13% more data per second for no audible benefit.</p>
<p>So 44.1 kHz is in the right <em>neighbourhood</em> given real-world filter constraints. But it is still a specific number. The question of why 44,100 rather than 44,000 or 43,500 or 44,800 is still open. That is where the VCRs come in.</p>
<hr>
<h2 id="the-vcr-problem">The VCR Problem</h2>
<p>In the late 1970s, Sony was developing what would eventually become the Compact Disc. One of the fundamental engineering problems was storage: where do you put the digital audio data? A 74-minute stereo recording at 16 bits and 44.1 kHz generates roughly 780 megabytes. In 1979, that was an absurd quantity of data. Hard drives with that capacity existed but cost tens of thousands of dollars and weighed as much as a washing machine. Dedicated digital tape formats existed in professional studios but were exotic and expensive <a href="#ref-1">[1]</a>.</p>
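<p>The figure is simple to reproduce — 74 minutes of 16-bit stereo at 44.1 kHz:</p>
<pre><code class="language-python">fs, bytes_per_sample, channels = 44_100, 2, 2     # 16-bit stereo PCM
total_bytes = fs * bytes_per_sample * channels * 74 * 60
print(f"{total_bytes / 1e6:.0f} MB")              # 783 MB
</code></pre>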
<p>The only affordable high-bandwidth magnetic recording medium available to consumer-facing engineers in 1979 was the VCR — the videocassette recorder. VHS and Betamax had recently become consumer products, and the tape and drive mechanism was cheap, reliable, and capable of storing several hours of high-bandwidth video signal. That video signal bandwidth was substantial: enough, in principle, to carry digital audio if you could get it onto the tape in the right form.</p>
<p>Sony&rsquo;s solution was elegant to the point of audacity. Rather than inventing a new tape format, they encoded digital audio samples as a black-and-white pseudo-video signal — patterns of light and dark pixels that a standard VCR recorded without modification, because as far as the VCR was concerned it was just receiving a monochrome video feed. The resulting device, the Sony PCM-1600 (1979), was a standalone unit that sat between a microphone preamplifier and a VCR, converting audio to fake video for recording and back to audio for playback <a href="#ref-3">[3]</a>.</p>
<p>The sampling rate of the audio was now determined not by any audio engineering consideration but by the geometry of the video signal. And the geometry of the video signal was fixed by the television broadcast standard — which brought entirely different historical contingencies into the calculation.</p>
<hr>
<h2 id="the-ntsc-arithmetic">The NTSC Arithmetic</h2>
<p>The NTSC standard — developed in North America and Japan — specifies 30 frames per second and 525 total scan lines per frame. Of those 525 lines, 35 are consumed by the vertical blanking interval (the time needed for the electron beam in a CRT to return from the bottom of the screen to the top). That leaves 490 active lines per frame actually carrying picture information.</p>
<p>Sony packed 3 audio samples into each active scan line. The audio sampling rate is then:</p>
$$f_s = \underbrace{30}_{\text{frames/s}} \times \underbrace{490}_{\text{active lines/frame}} \times \underbrace{3}_{\text{samples/line}} = 44{,}100 \text{ Hz}.$$<p>There it is. 44,100 Hz, emerging not from any consideration of human hearing or filter design, but from the frame rate and line count of the North American television standard.</p>
<hr>
<h2 id="the-pal-arithmetic">The PAL Arithmetic</h2>
<p>Now the European video standard, PAL, which was developed in the 1960s independently of NTSC and optimised for different priorities. PAL uses 25 frames per second and 625 total scan lines per frame. The vertical blanking interval consumes 37 lines, leaving 588 active lines per frame.</p>
<p>Sony packed 3 audio samples into each active PAL scan line as well. The sampling rate:</p>
$$f_s = \underbrace{25}_{\text{frames/s}} \times \underbrace{588}_{\text{active lines/frame}} \times \underbrace{3}_{\text{samples/line}} = 44{,}100 \text{ Hz}.$$<p>The same number.</p>
<p>Let that settle for a moment. NTSC: 30 frames per second, 490 active lines. PAL: 25 frames per second, 588 active lines. Different frame rates. Different line counts. Developed on different continents for different broadcast environments. And yet $30 \times 490 = 25 \times 588 = 14{,}700$, so multiplying by 3 gives 44,100 in both cases.</p>
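<p>The whole argument fits in two lines:</p>
<pre><code class="language-python">ntsc = 30 * (525 - 35) * 3    # frames/s x active lines/frame x samples/line
pal  = 25 * (625 - 37) * 3
print(ntsc, pal)              # 44100 44100
</code></pre>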
<p>This is not coincidence in any deep sense — NTSC and PAL were both designed to fill approximately the same video bandwidth, just with different tradeoffs between temporal resolution (frame rate) and spatial resolution (line count). But for Sony&rsquo;s VCR encoding scheme, the numerical convergence was enormously convenient: a single PCM processor running at 44.1 kHz could record to either NTSC or PAL video equipment without any change to the audio electronics. The same master machine could work in Tokyo and in Frankfurt.</p>
<p>The arithmetic is, I think, one of those moments where a coincidence that is perfectly explicable in hindsight still feels satisfying in the way that a physical derivation feels satisfying. You set up the constraints — fill the video bandwidth, pack an integer number of samples per line, keep the number of samples small enough to fit in a line&rsquo;s worth of data — and the number 44,100 falls out of two independent calculations like a constant of nature. It is not a constant of nature. It is a contingent product of mid-twentieth-century broadcast engineering. But the mathematics does not care.</p>
<hr>
<h2 id="from-tape-to-disc">From Tape to Disc</h2>
<p>When Philips and Sony sat down to negotiate the Red Book standard — the technical specification for the Compact Disc, finalised in 1980 and commercially launched in 1982 — both companies brought existing infrastructure to the table <a href="#ref-3">[3]</a>. Both had been building digital audio equipment for several years. Both had PCM processors running in professional studios. Both had catalogues of digital masters recorded on VCR tape. And all of that equipment ran at 44.1 kHz, because all of it had been built to interface with the video tape standard that made digital audio recording practically affordable in the first place.</p>
<p>Changing the sampling rate for the CD would have required rebuilding the entire mastering chain: new PCM processors, new format conversion hardware, new master tape libraries. The economic and logistical cost would have been enormous. The 44.1 kHz rate was not chosen for the CD because it was optimal in any absolute engineering sense. It was chosen because it was already there <a href="#ref-1">[1]</a>, <a href="#ref-2">[2]</a>.</p>
<p>This is a pattern worth recognising. Major technical standards are rarely chosen by optimisation from first principles. They are chosen by consolidating what already exists. The QWERTY keyboard layout was optimised for typewriter mechanisms that no longer exist. The 60 Hz AC frequency in North America was set by Westinghouse generators installed in the 1890s. The 44.1 kHz CD sampling rate was set by VCR tape recorders that were obsolete within a decade of the CD&rsquo;s launch.</p>
<hr>
<h2 id="the-other-rates">The Other Rates</h2>
<p>Not all digital audio runs at 44.1 kHz, and the coexistence of different rates in the modern audio industry is the direct legacy of 44.1 kHz&rsquo;s awkward origins.</p>
<p><strong>48 kHz</strong> is the professional broadcast and studio standard. It is used in digital video, in DAT tape, in most professional audio interfaces, and in the digital audio embedded in broadcast television signals — including, as a matter of course, in the digital television infrastructure described in the context of university video platforms like <a href="/posts/educast-nrw-hochschul-youtube/">educast.nrw</a>. Why 48? Broadcast infrastructure needed a rate that had clean integer relationships with the 32 kHz rate used in early satellite and ISDN broadcast systems. The relationship $48 = \frac{3}{2} \times 32$ is exact, making synchronisation straightforward. 44.1 kHz has no such clean relationship with anything in broadcast engineering.</p>
<p>The ratio between the two dominant rates is $48 / 44.1 = 160 / 147$. This fraction — irreducible, inelegant, non-obvious — is the source of essentially every sample-rate conversion problem in audio post-production. When a CD master (44.1 kHz) is prepared for broadcast (48 kHz), a sample-rate converter must interpolate 147 samples up to 160 samples, or downsample 160 samples to 147, at every moment. The process introduces small errors, and doing it well requires significant computational effort. Every time a musician&rsquo;s recording moves between the consumer and professional audio worlds, it passes through this fractional bottleneck. Two standards that could have been made compatible were instead set by completely independent historical processes, and we have been paying the computational tax ever since.</p>
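<p>The awkwardness of that ratio is visible the moment you reduce the two rates to lowest terms:</p>
<pre><code class="language-python">from fractions import Fraction

ratio = Fraction(48_000, 44_100)
print(ratio)    # 160/147 — the smallest integer block a rate converter can work with
</code></pre>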
<p><strong>96 kHz and 192 kHz</strong> are marketed as &ldquo;high-resolution audio.&rdquo; Here the physics gets genuinely murky and the claims made by the audio industry deserve some scepticism. Human hearing above 20 kHz is, for most adults, genuinely absent — not reduced, but absent, because the outer hair cells in the cochlea that respond to those frequencies progressively die from the teenage years onward and are not replaced. The argument for high sampling rates is typically one of two things: first, that ultrasonic content can cause <em>intermodulation distortion</em>, where sum and difference frequencies of ultrasonic components fall back into the audible band; second, that a higher sampling rate allows for a more relaxed anti-aliasing filter with better phase behaviour within the audible band.</p>
<p>Both effects are real and measurable in laboratory conditions. Whether they are <em>audible</em> under controlled double-blind listening conditions is a separate and more contested question. The published evidence is not strong. What is not contested is that 96 kHz files are twice the size of 44.1 kHz files, and 192 kHz files are more than four times the size, for the same bit depth and the same number of audio channels. Whether that storage cost buys anything audible is, as of the current state of the literature, an open question.</p>
<hr>
<h2 id="the-irony">The Irony</h2>
<p>Here is the situation we are actually in. The canonical digital audio format — 16-bit, 44.1 kHz PCM, the format that defined CD quality for a generation and that remains the standard for music distribution — is physically a photograph of analogue video tape. The digitisation of music was made possible by television engineering. The specific number that defines the fidelity of every CD ever pressed is determined by the frame rates and line counts of 1970s broadcast television standards, which were themselves determined by the capabilities of 1940s CRT technology and the political negotiations of early broadcast licensing bodies.</p>
<p>When someone tells you that 44.1 kHz is the &ldquo;natural&rdquo; or &ldquo;perfect&rdquo; sampling rate for audio, they are, without knowing it, paying tribute to the NTSC standards committee of 1941 and the PAL engineers of the 1960s. The number carries history in it the way a fossil carries the structure of a long-dead organism. It is the right number, in the sense that it works. Its rightness has nothing to do with the reasons it was chosen.</p>
<p>I find this genuinely satisfying rather than disappointing. The history of physics and engineering is full of contingent numbers that turned out to be good enough, and whose goodness was only rationalised after the fact. The metre was originally defined as one ten-millionth of the distance from the equator to the North Pole along the Paris meridian — an arbitrary geodetic choice that turned out to produce a unit of length that is remarkably convenient for human-scale physics. The kilogram was a cylinder of platinum-iridium alloy in a vault outside Paris for over a century. 44,100 Hz is in good company.</p>
<hr>
<h2 id="the-archaeology-of-a-number">The Archaeology of a Number</h2>
<p>The numbers we inherit from engineering history are rarely arbitrary at every level simultaneously. 44,100 Hz is not arbitrary at the level of sampling theory: it satisfies the Nyquist criterion with enough headroom for a physically realisable anti-aliasing filter, given 1970s component technology. That is a genuine constraint, and the number sits in the right region of parameter space for it.</p>
<p>But it is arbitrary at a deeper level: it is the specific number that happened to fit a video tape format that happened to be affordable in 1979, a format that was itself determined by broadcast standards that were set for entirely unrelated reasons decades earlier. The chain of contingencies runs: 1940s television engineering defines NTSC and PAL frame rates and line counts; 1970s consumer VCR technology makes those tape formats cheap; 1979 Sony engineers encode digital audio as fake video; the arithmetic of the video formats fixes the sampling rate at 44,100 Hz; that rate gets locked into the CD standard in 1980; 44.1 kHz becomes the defining frequency of a digital music format that ships billions of units over the following four decades.</p>
<p>Science and engineering produce exact numbers from messy contingencies. The number 44,100 is simultaneously a theorem output (it satisfies a well-defined engineering constraint), a historical accident (it is determined by the specific video tape hardware that existed in 1979), and an institutional fossil (it outlasted the VCRs that created it by four decades and counting). All three things are true at the same time.</p>
<p>The VCRs are gone. The sampling rate remains.</p>
<hr>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Pohlmann, K. C. (2010). <em>Principles of Digital Audio</em> (6th ed.). McGraw-Hill.</p>
<p><span id="ref-2"></span>[2] Watkinson, J. (2001). <em>The Art of Digital Audio</em> (3rd ed.). Focal Press.</p>
<p><span id="ref-3"></span>[3] Immink, K. A. S. (1998). The compact disc story. <em>Journal of the AES</em>, 46(5), 458–465.</p>
<p><span id="ref-4"></span>[4] Nyquist, H. (1928). Certain topics in telegraph transmission theory. <em>Transactions of the AIEE</em>, 47(2), 617–644.</p>
<p><span id="ref-5"></span>[5] Shannon, C. E. (1949). Communication in the presence of noise. <em>Proceedings of the IRE</em>, 37(1), 10–21.</p>
]]></content:encoded>
    </item>
    <item>
      <title>How Cats Drink: Inertia, Gravity, and the Froude Number at the Tip of a Tongue</title>
      <link>https://sebastianspicker.github.io/posts/how-cats-drink-froude-number/</link>
      <pubDate>Mon, 22 Jul 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/how-cats-drink-froude-number/</guid>
      <description>Cats do not scoop water with their tongues — they exploit a delicate balance between inertia and gravity at the air-water interface. The tip of the tongue just touches the surface; rapid withdrawal pulls a fluid column upward; the jaw closes at exactly the moment the column peaks. Reis, Jung, Aristoff, and Stocker (Science, 2010) showed that the lapping frequency of all felids — from domestic cats to lions — is tuned so that the Froude number at the tongue tip is approximately unity.</description>
      <content:encoded><![CDATA[<p><em>I have spent a non-trivial amount of time watching our cats drink — they are
indoor-only cats, on our vet&rsquo;s advice, which gives them few distractions and
gives me ample opportunity to observe. This is not entirely voluntary. Once you have noticed that something is happening at the
water bowl that does not look right — the tongue moves too fast, the water
column is pulled upward rather than scooped, the jaw closes before the tongue
returns — you find yourself crouching beside the bowl with your phone propped
against a chair, filming at 240 frames per second and feeling that you have
perhaps chosen an unusual way to spend a Tuesday morning.</em></p>
<p><em>Pedro Reis, Sunghwan Jung, Jeffrey Aristoff, and Roman Stocker had the same
impulse, with better equipment. Their 2010 paper in Science, &ldquo;How Cats Lap:
Water Uptake by Felis catus,&rdquo; is one of the more elegant pieces of dimensional
analysis in recent biology.</em></p>
<hr>
<h2 id="how-cats-do-not-drink">How Cats Do Not Drink</h2>
<p>The simplest hypothesis — that cats curl the tongue into a spoon and scoop
water into the mouth — is false. High-speed photography shows that the cat&rsquo;s
tongue does not form a cup shape. Instead, the cat extends the tongue tip
downward toward the water surface and then rapidly retracts it. The motion is
fast — too fast for normal video — and the tongue barely contacts the surface.</p>
<p>The contrast with dogs is instructive. Dogs <em>do</em> scoop: the tongue curls
backward (not forward), forming a ladle shape that scoops water upward and
backwards into the mouth. The mechanism is vigorous and inefficient — a
significant fraction of the water misses the mouth entirely, which is why
drinking dogs produce splashing and dogs often have wet chins. The mechanism
works but is inelegant.</p>
<p>Cats produce almost no splash. The mechanism is different in kind.</p>
<hr>
<h2 id="the-physical-mechanism">The Physical Mechanism</h2>
<p>Reis et al. (2010) used high-speed photography (1000 frames per second) to
resolve the cat&rsquo;s lapping motion. Their observations:</p>
<ol>
<li>
<p>The cat extends the tongue tip downward until the <em>dorsal surface</em> (the top
side) just touches the water surface. The ventral surface (the smooth
underside) does not contact the water.</p>
</li>
<li>
<p>The cat then rapidly retracts the tongue upward. The tongue tip is moving
at roughly $v \approx 0.7\,\mathrm{m/s}$ during this retraction.</p>
</li>
<li>
<p>As the tongue tip pulls away from the surface, a column of liquid is pulled
upward by the adhesion between the liquid and the retreating tongue. The
column rises against gravity.</p>
</li>
<li>
<p>The column eventually stalls — inertia is overcome by gravity — and begins
to fall back. The cat closes its jaw at exactly the moment of maximum column
height, capturing the peak volume of water.</p>
</li>
<li>
<p>The cat then extends the tongue for the next lap.</p>
</li>
</ol>
<p>The cat closes its jaw before the tongue fully retracts. This is important:
the jaw closure captures the water column, not the water adhering to the tongue.
The tongue is the mechanism that <em>creates</em> the column; the jaw captures it.</p>
<hr>
<h2 id="dimensional-analysis-the-froude-number">Dimensional Analysis: The Froude Number</h2>
<p>The relevant competition is between <strong>inertia</strong> (which drives the column
upward) and <strong>gravity</strong> (which pulls it back down). Surface tension plays a
role in stabilising the column but is not the primary factor governing the
column height.</p>
<p>The balance between inertia and gravity for a fluid column moving at speed
$v$ and of characteristic length scale $L$ (here, the diameter of the tongue
tip, $L \approx 5\,\mathrm{mm}$ for a domestic cat) is captured by the
<strong>Froude number</strong>:</p>
$$\mathrm{Fr} = \frac{v}{\sqrt{gL}},$$<p>where $g = 9.81\,\mathrm{m/s}^2$ is gravitational acceleration.</p>
<p>When $\mathrm{Fr} \ll 1$: gravity dominates, inertia is insufficient to pull a
significant column of water upward. Very slow tongue motion would lift almost
no water.</p>
<p>When $\mathrm{Fr} \gg 1$: inertia dominates, the column rises far above the
surface but the jaw must be closed quickly before the large amount of water
falls back. Very fast tongue motion wastes water and requires rapid jaw closure.</p>
<p>The optimal lapping frequency — maximising captured volume per lap — occurs
near $\mathrm{Fr} \approx 1$, where inertial and gravitational forces are
comparable and the column height is matched to the jaw closure dynamics.</p>
<h3 id="checking-the-numbers-for-a-domestic-cat">Checking the Numbers for a Domestic Cat</h3>
<p>For a domestic cat:</p>
<ul>
<li>Tongue tip diameter: $L \approx 5\,\mathrm{mm} = 5 \times 10^{-3}\,\mathrm{m}$</li>
<li>Characteristic tongue tip speed: $v \approx 0.7\,\mathrm{m/s}$</li>
</ul>
$$\mathrm{Fr} = \frac{0.7}{\sqrt{9.81 \times 5 \times 10^{-3}}}
= \frac{0.7}{\sqrt{0.049}} = \frac{0.7}{0.22} \approx 3.2.$$<p>Reis et al. found Fr of order unity — inertial and gravitational forces
comparable — confirming that the lapping speed is tuned to the inertia-gravity
balance. (The exact numerical value depends on the choice of characteristic
length scale; using the tongue tip diameter as above gives Fr in the range
1–3, squarely in the regime where neither force dominates.)</p>
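<p>The estimate is quick to reproduce. A minimal Python sketch, using the same assumed values for tongue-tip diameter and retraction speed as above:</p>
<pre><code class="language-python">import math

g = 9.81   # m/s^2, gravitational acceleration
L = 5e-3   # m, tongue-tip diameter (assumed, as in the text)
v = 0.7    # m/s, tongue-tip retraction speed (assumed, as in the text)

froude = v / math.sqrt(g * L)
print(f"Fr = {froude:.2f}")  # about 3.2 with these inputs, i.e. order unity
</code></pre>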
<hr>
<h2 id="scaling-across-felids">Scaling Across Felids</h2>
<p>The Froude number prediction yields a scaling law for lapping frequency across
felid species of different sizes. If all felids lap at $\mathrm{Fr} \approx 1$,
then the characteristic speed scales as $v \sim \sqrt{gL}$, and the lapping
frequency scales as:</p>
$$f = \frac{v}{d} \sim \frac{\sqrt{gL}}{d},$$<p>where $d$ is the distance the tongue travels per lap (roughly proportional to
tongue length, which scales with body size). Since $L \sim d$ scales with body
size, we get:</p>
$$f \sim \frac{\sqrt{g \cdot d}}{d} = \sqrt{\frac{g}{d}} \propto d^{-1/2}.$$<p>Larger cats have longer tongues and lap more slowly. The prediction is that
lapping frequency scales as the square root of inverse tongue length — or,
equivalently, as the inverse square root of body mass (since linear dimensions
scale as mass$^{1/3}$):</p>
$$f \propto m^{-1/6}.$$<p>Reis et al. tested this against high-speed footage of large felids. A domestic
cat laps at approximately $4\,\mathrm{Hz}$; a lion laps at approximately
$1.5\,\mathrm{Hz}$; a tiger at roughly $1\,\mathrm{Hz}$. The scaling is
consistent with $f \propto m^{-1/6}$ across nearly two orders of magnitude in
body mass.</p>
<p>The table below shows the predicted versus observed scaling:</p>
<table>
  <thead>
      <tr>
          <th>Species</th>
          <th>Body mass (kg)</th>
          <th>Predicted $f$ relative to cat</th>
          <th>Predicted $f$ (Hz)</th>
          <th>Observed $f$ (Hz)</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Domestic cat</td>
          <td>4</td>
          <td>1.0</td>
          <td>4.0</td>
          <td>~4.0</td>
      </tr>
      <tr>
          <td>Jaguar</td>
          <td>80</td>
          <td>$\left(\frac{4}{80}\right)^{1/6} \approx 0.61$</td>
          <td>2.4</td>
          <td>~2.0</td>
      </tr>
      <tr>
          <td>Lion</td>
          <td>200</td>
          <td>$\left(\frac{4}{200}\right)^{1/6} \approx 0.52$</td>
          <td>2.1</td>
          <td>~1.5</td>
      </tr>
      <tr>
          <td>Tiger</td>
          <td>220</td>
          <td>$\left(\frac{4}{220}\right)^{1/6} \approx 0.51$</td>
          <td>2.1</td>
          <td>~1.0</td>
      </tr>
  </tbody>
</table>
<p>The $m^{-1/6}$ scaling captures the correct trend — larger cats lap more
slowly — though the predicted frequencies for the largest cats somewhat
overestimate the observed values. The discrepancy may reflect the limitations
of the simple allometric assumption (that all linear dimensions scale as
$m^{1/3}$) and the fact that tongue geometry does not scale isometrically
across the full range of felid body sizes.</p>
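<p>The predicted column of the table takes a few lines to reproduce. A minimal Python sketch, anchored, as in the table, to a 4 kg domestic cat lapping at 4 Hz:</p>
<pre><code class="language-python"># Predicted lapping frequency from the m^(-1/6) scaling, anchored to a
# 4 kg domestic cat lapping at 4 Hz (the reference values used in the table).
masses = {"domestic cat": 4, "jaguar": 80, "lion": 200, "tiger": 220}  # kg

f_cat, m_cat = 4.0, masses["domestic cat"]
for species, m in masses.items():
    f_pred = f_cat * (m_cat / m) ** (1 / 6)
    print(f"{species:12s}  {m:5.0f} kg   predicted f = {f_pred:.1f} Hz")
</code></pre>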
<hr>
<h2 id="why-not-just-lick">Why Not Just Lick?</h2>
<p>A natural question: why not simply submerge the tongue and let the papillae
soak up water, since the tongue contacts the water during lapping anyway?
Several answers:</p>
<ol>
<li>
<p><strong>Papillae are not sponges.</strong> Feline papillae are hollow and scoop-shaped
(filiform papillae with hollow tips), optimised for grooming and food
manipulation rather than for soaking up water; the volume they can wick is small.</p>
</li>
<li>
<p><strong>The cat cannot breathe with its mouth submerged.</strong> A lapping mechanism
that keeps the mouth mostly closed except for the brief jaw-closure moment
allows continuous breathing through the nose during drinking.</p>
</li>
<li>
<p><strong>Speed and efficiency.</strong> The inertial column mechanism delivers significantly
more water per jaw movement than surface tension adhesion alone. At 4 laps
per second, a domestic cat takes in roughly $0.14\,\mathrm{mL}$ per lap,
for a total of roughly $34\,\mathrm{mL/min}$ — comparable to sipping rates
in animals that use more direct intake mechanisms.</p>
</li>
</ol>
<p>The cat has converged on a hydrodynamically optimal strategy under the
constraint of keeping the oral cavity mostly sealed during the intake cycle.</p>
<hr>
<h2 id="the-robotic-tongue">The Robotic Tongue</h2>
<p>Reis et al. constructed a robotic cat tongue to verify the mechanism: a smooth
glass disc lowered to the water surface and retracted at controlled speeds.
The column height as a function of speed followed the predicted inertia-gravity
balance, confirming that the mechanism does not depend on any specifically
biological property of the tongue — it is a fluid dynamics result that applies
to any surface moving away from a water interface at the right speed.</p>
<p>The robot lapped at the same Froude number as the cat.</p>
<hr>
<h2 id="dogs-horses-and-the-comparison">Dogs, Horses, and the Comparison</h2>
<p>Dogs cup the tongue <em>caudally</em> (backwards) rather than ventrally, forming a
ladle. The mechanism is faster and delivers more water per stroke but is
messy — the ladle is formed outside the mouth, and water sloshes freely. Dogs
lap at roughly $3\,\mathrm{Hz}$ with a tongue tip speed significantly higher
than cats, producing Fr well above unity. The excess inertia is why dog
drinking generates splashing.</p>
<p>Horses, by contrast, create a near-seal with their lips and use suction —
a fundamentally different mechanism that requires no tongue projection at all.
The lapping mechanism of felids is phylogenetically specific and appears to
have evolved under selection pressure for both efficiency and noise suppression,
consistent with the ambush-predator lifestyle. A cat that splashed while
drinking would alert prey at a water source. A cat that laps near-silently
does not.</p>
<hr>
<h2 id="a-note-on-the-measurement">A Note on the Measurement</h2>
<p>Getting reliable high-speed footage of a cat drinking is harder than it sounds.
Our cats drink at different times of day, in different moods, and the presence
of a camera tripod next to the water bowl is regarded as grounds for drinking
elsewhere. Reis and colleagues solved this by filming Cutta Cutta,
Stocker&rsquo;s cat, in a controlled setting. Their footage is available online and is
genuinely beautiful: a slow-motion waterfall in miniature, rising improbably
from the tongue tip and held there by the balance between upward momentum and
downward gravity, until the jaw swings shut.</p>
<p>The physics is in the timing.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Reis, P.M., Jung, S., Aristoff, J.M., &amp; Stocker, R. (2010). How cats lap:
Water uptake by <em>Felis catus</em>. <em>Science</em>, 330(6008), 1231–1234.
<a href="https://doi.org/10.1126/science.1195421">https://doi.org/10.1126/science.1195421</a></p>
</li>
<li>
<p>Aristoff, J.M., Stocker, R., Jung, S., &amp; Reis, P.M. (2011). On the water
lapping of felines and the water running of lizards. <em>Communicative &amp;
Integrative Biology</em>, 4(2), 213–215.</p>
</li>
<li>
<p>Vogel, S. (1994). <em>Life in Moving Fluids: The Physical Biology of Flow</em>
(2nd ed.). Princeton University Press.</p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-12-15</strong>: Updated water intake per lap from 0.04 mL to 0.14 mL (Reis et al. report ~0.14 +/- 0.04 mL per lap; the previous value was the standard deviation), and updated the intake rate accordingly (~34 mL/min). Updated the papillae location from ventral to dorsal surface. Updated the Aristoff et al. reference to the correct 2011 <em>Communicative &amp; Integrative Biology</em> article. Removed the Jung &amp; Kim (2012) PRL reference (article number 034501 resolves to a different paper).</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Are Cats Liquid? The Deborah Number and the Rheology of Cats</title>
      <link>https://sebastianspicker.github.io/posts/liquid-cats-deborah-number/</link>
      <pubDate>Wed, 03 Apr 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/liquid-cats-deborah-number/</guid>
      <description>Marc-Antoine Fardin won the 2017 Ig Nobel Prize in Physics for proving, rigorously, that cats are liquid. The argument rests on the Deborah number De = τ/T: if the material&amp;rsquo;s relaxation time τ is shorter than the observation time T, the material behaves as a fluid. A cat filling a sink (De ≈ 0.008) is a liquid. A cat bouncing off a table (De ≫ 1) is a solid. The classification is not a joke — it is standard rheology, applied to an unusual substrate.</description>
      <content:encoded><![CDATA[<p><em>One of our strays discovered, sometime in her first winter indoors — they are
strictly indoor cats now, on our vet&rsquo;s recommendation — that she could fit into
a salad bowl. Not sit beside it, not rest her head on its rim: fit into it,
curled into a precise sphere with her tail tucked under her chin and her ears
folded flat, filling the bowl as liquid fills a container. The bowl has a
diameter of 22 centimetres. I did not find this as surprising as perhaps I
should have: there is a quantity in materials science that determines, rigorously,
whether a given material in a given situation should be classified as a solid or
a liquid. For a cat in a bowl, this quantity is comfortably below one.</em></p>
<p><em>The material is a liquid. The material is also a cat.</em></p>
<hr>
<h2 id="the-definition-of-a-fluid">The Definition of a Fluid</h2>
<p>The intuitive distinction between solids and liquids is that solids hold their
shape and liquids conform to their container. But this distinction is one of
timescale, not of material identity.</p>
<p>A classic demonstration: place a ball of silly putty on a table. Over the
course of an hour, it flows slowly outward, taking the shape of the table
surface — clearly a liquid. Strike it sharply with a hammer and it shatters —
clearly a solid. The material has not changed. The timescale of the
interaction has.</p>
<p>The same principle applies to glass (contrary to popular myth, medieval window
glass is not thicker at the bottom because it has flowed — the variation is
from the manufacturing process, and the relaxation time of soda-lime glass at
room temperature is of order $10^{23}$ years — but at elevated temperatures
near the glass transition, silicate glass flows readily). It applies
to mantle rock, which is solid on the scale of earthquake waves and liquid on
the scale of continental drift. It applies to pitch, to ice sheets, to asphalt
on a hot day.</p>
<p>The formal tool for capturing this is the <strong>Deborah number</strong>.</p>
<hr>
<h2 id="the-deborah-number">The Deborah Number</h2>
<p>The Deborah number was introduced by Markus Reiner in 1964, in a short note
in <em>Physics Today</em> (Reiner 1964). It is defined as:</p>
$$\mathrm{De} = \frac{\tau}{T},$$<p>where $\tau$ is the <strong>relaxation time</strong> of the material — roughly, the
characteristic time over which it can rearrange its internal structure and
relieve stress — and $T$ is the <strong>observation time</strong> or the timescale of the
imposed deformation.</p>
<ul>
<li>$\mathrm{De} \ll 1$: The material relaxes quickly relative to the timescale
of observation. Internal stresses are continuously relieved. The material
behaves as a <strong>fluid</strong>.</li>
<li>$\mathrm{De} \gg 1$: The material relaxes slowly relative to the observation
timescale. Internal stresses persist. The material behaves as a <strong>solid</strong>.</li>
<li>$\mathrm{De} \sim 1$: The material is in a viscoelastic regime — partly
fluid, partly solid, exhibiting time-dependent behaviour that is neither.</li>
</ul>
<p>The name comes from the prophetess Deborah, who sang in Judges 5:5: <em>&ldquo;The
mountains flowed before the Lord.&rdquo;</em> At the timescale of a divine perspective,
mountains are liquid. At the timescale of a human lifetime, they are not.
Reiner&rsquo;s point was that the solid-liquid distinction is not a property of
the material but of the relationship between the material&rsquo;s internal
dynamics and the observer&rsquo;s timescale.</p>
<p>For Newtonian fluids (water, air at ordinary conditions), $\tau \to 0$ and
$\mathrm{De} \to 0$ for any finite observation time — they are always liquid.
For a perfectly elastic solid (an ideal spring), $\tau \to \infty$ and
$\mathrm{De} \to \infty$ for any finite observation time — always solid. Real
materials lie between these extremes.</p>
<hr>
<h2 id="the-maxwell-viscoelastic-model">The Maxwell Viscoelastic Model</h2>
<p>The simplest model of a material with a finite relaxation time is the Maxwell
element: a spring (elastic, spring constant $G$) in series with a dashpot
(viscous, viscosity $\eta$). Under a step stress $\sigma_0$ applied at time
$t = 0$, the strain evolves as:</p>
$$\epsilon(t) = \frac{\sigma_0}{G} + \frac{\sigma_0}{\eta}\,t,$$<p>where $\tau = \eta / G$ is the Maxwell relaxation time. The first term is the
instantaneous elastic deformation of the spring; the second is the linear
viscous creep of the dashpot. For $t \ll \tau$, the elastic strain dominates
and the material behaves as a solid; for $t \gg \tau$, the viscous flow
dominates and the material behaves as a liquid. The material &ldquo;decides&rdquo; whether
to be solid or liquid depending on the ratio of $\tau$ to the duration of the
applied stress — which is precisely the Deborah number.</p>
<p>The <strong>creep compliance</strong> $J(t) = \epsilon(t)/\sigma_0 = t/\eta + 1/G$ grows
linearly with time for $t \gg \tau$, confirming liquid behaviour on long
timescales. The <strong>relaxation modulus</strong> $G(t) = \sigma(t)/\epsilon_0 = G
e^{-t/\tau}$ decays exponentially to zero, confirming that the material
cannot sustain a permanent stress — again, liquid behaviour on long timescales.</p>
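<p>A minimal numerical sketch of the Maxwell element, with illustrative values for the modulus and viscosity chosen only to show the crossover around $t \approx \tau$:</p>
<pre><code class="language-python">import numpy as np

# Maxwell element: spring (modulus G0) in series with a dashpot (viscosity eta).
G0, eta = 1.0, 10.0        # illustrative values, arbitrary units
tau = eta / G0             # Maxwell relaxation time

t = np.array([0.01, 0.1, 1.0, 10.0, 100.0]) * tau

J = t / eta + 1.0 / G0       # creep compliance: grows linearly once t exceeds tau
G_t = G0 * np.exp(-t / tau)  # relaxation modulus: decays exponentially to zero

for ti, Ji, Gi in zip(t / tau, J, G_t):
    print(f"t/tau = {ti:6.2f}   J(t) = {Ji:7.2f}   G(t) = {Gi:7.4f}")
</code></pre>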
<hr>
<h2 id="on-the-rheology-of-cats">On the Rheology of Cats</h2>
<p>In 2014, Marc-Antoine Fardin, a physicist at the ENS Lyon,
published &ldquo;On the Rheology of Cats&rdquo; in the <em>Rheology Bulletin</em> 83(2), 16–17.
The paper asked whether cats satisfy the defining rheological criterion for
liquids, using the Deborah number as the test. Fardin was awarded the 2017
Ig Nobel Prize in Physics — which is awarded for research that &ldquo;makes you
laugh, then makes you think&rdquo; — for this work.</p>
<p>The paper is not a joke. It is standard rheology applied to an unusual material,
with appropriately hedged conclusions and correct citations to the primary
literature on viscoelastic flow. The humour is in the application; the physics
is serious.</p>
<h3 id="estimating-the-cats-relaxation-time">Estimating the Cat&rsquo;s Relaxation Time</h3>
<p>The relaxation time $\tau$ of a cat is the time scale over which the cat&rsquo;s
body deforms to fill a container. This is observable. A cat placed near a
suitable container — a salad bowl, a cardboard box, a bathroom sink —
adopts a conformed shape on a timescale of roughly 5–30 seconds. The initial
posture (stiff, alert) gives way to a relaxed conformation as the cat
assesses the container and adjusts. Fardin estimated $\tau \approx 1$–$30$
seconds, with the exact value depending on the container&rsquo;s attractiveness
to the specific cat.</p>
<p>This is the material&rsquo;s characteristic relaxation time. The fact that it is
finite — that the cat does eventually conform to the container — is the
essential observation.</p>
<h3 id="computing-the-deborah-number-for-various-situations">Computing the Deborah Number for Various Situations</h3>
<p><strong>Scenario 1: Cat in a sink.</strong>
A cat taking ten minutes to settle into a bathroom sink. Observation time
$T = 600\,\mathrm{s}$, relaxation time $\tau \approx 5\,\mathrm{s}$.</p>
$$\mathrm{De}_\mathrm{sink} = \frac{5}{600} \approx 0.008 \ll 1.$$<p>The cat is unambiguously a <strong>liquid</strong>.</p>
<p><strong>Scenario 2: Cat in a cardboard box.</strong>
Conformation over approximately 30 minutes, $\tau \approx 20\,\mathrm{s}$.</p>
$$\mathrm{De}_\mathrm{box} = \frac{20}{1800} \approx 0.011 \ll 1.$$<p><strong>Liquid.</strong></p>
<p><strong>Scenario 3: Cat dropping from a bookshelf.</strong>
Contact time during a jump approximately $T \approx 0.05\,\mathrm{s}$,
relaxation time still $\tau \approx 5\,\mathrm{s}$.</p>
$$\mathrm{De}_\mathrm{jump} = \frac{5}{0.05} = 100 \gg 1.$$<p><strong>Solid.</strong> The cat does not deform into the shape of the bookshelf during the
jump; it rebounds elastically.</p>
<p><strong>Scenario 4: Cat startled by a loud noise.</strong>
Reaction time $T \approx 0.3\,\mathrm{s}$, $\tau \approx 5\,\mathrm{s}$.</p>
$$\mathrm{De}_\mathrm{startle} = \frac{5}{0.3} \approx 17 \gg 1.$$<p><strong>Solid.</strong> On short timescales, cats behave as elastic materials — they spring,
they bounce, they do not flow.</p>
<p>The cat is neither permanently solid nor permanently liquid. It is a
<strong>viscoelastic material</strong> whose phase classification depends on the timescale
of the interaction. This is not a loose analogy; it is the definition of
viscoelasticity.</p>
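<p>The four estimates are simple ratios, but a short sketch makes the classification explicit. The cutoffs for &ldquo;clearly liquid&rdquo; and &ldquo;clearly solid&rdquo; are arbitrary illustration values of mine, not part of Fardin&rsquo;s analysis:</p>
<pre><code class="language-python"># Deborah numbers for the four scenarios above (tau and T in seconds).
scenarios = {
    "cat settling into a sink":     (5.0, 600.0),
    "cat settling into a box":      (20.0, 1800.0),
    "cat landing from a bookshelf": (5.0, 0.05),
    "cat startled by a loud noise": (5.0, 0.3),
}

for name, (tau, T) in scenarios.items():
    De = tau / T
    regime = "liquid" if De &lt; 0.1 else ("solid" if De &gt; 10 else "viscoelastic")
    print(f"{name:30s} De = {De:8.3f}  {regime}")
</code></pre>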
<hr>
<h2 id="non-newtonian-behaviour-and-flow-instabilities">Non-Newtonian Behaviour and Flow Instabilities</h2>
<p>Fardin noted an additional complication: cat flow is not Newtonian. A Newtonian
fluid has a viscosity $\eta$ that is independent of the applied shear rate
$\dot\gamma$. Many real materials are <strong>shear-thinning</strong> (viscosity decreases
with increasing shear rate — ketchup, blood, many polymer solutions) or
<strong>shear-thickening</strong> (viscosity increases with increasing shear rate —
cornstarch suspension, some dense suspensions). Cats, Fardin observed, appear
to be shear-thinning: the more rapidly you attempt to move a relaxed cat from
its current position, the more &ldquo;liquid&rdquo; (accommodating, compliant) it becomes,
up to a point at which the cat transitions to solid behaviour (claws, teeth).</p>
<p>This is, formally, the behaviour of a <strong>yield-stress fluid</strong>: a material that
behaves as a solid below a critical stress $\sigma_y$ and flows above it. The
Herschel–Bulkley model describes such fluids:</p>
$$\sigma = \sigma_y + k \dot\gamma^n, \quad \sigma > \sigma_y,$$<p>where $k$ is the flow consistency index and $n < 1$ for shear-thinning. The
challenge of fitting $k$, $n$, and $\sigma_y$ for a specific cat is
experimental, and Fardin acknowledged this was left to future work.</p>
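<p>For concreteness, here is what a Herschel–Bulkley flow curve looks like numerically. The parameter values are invented for illustration; as noted above, fitting them for an actual cat remains future work:</p>
<pre><code class="language-python">import numpy as np

# Herschel-Bulkley flow curve with made-up parameters.
sigma_y = 2.0    # yield stress
k, n = 1.5, 0.5  # consistency index and exponent (n below 1: shear-thinning)

gamma_dot = np.array([0.1, 1.0, 10.0, 100.0])  # shear rate, 1/s
sigma = sigma_y + k * gamma_dot**n              # valid only above the yield stress

for gd, s in zip(gamma_dot, sigma):
    print(f"shear rate {gd:7.1f} 1/s   stress {s:6.2f} (arbitrary units)")
</code></pre>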
<p>The <strong>Deborah number</strong> and the <strong>yield stress</strong> together provide a two-parameter
phase diagram for cat rheology:</p>
<ul>
<li>Low stress, short timescale: solid (De ≫ 1 or σ &lt; σ_y)</li>
<li>Low stress, long timescale: liquid (De ≪ 1)</li>
<li>High stress: yield, followed by flow</li>
</ul>
<hr>
<h2 id="flow-instabilities-the-rayleigh-plateau-connection">Flow Instabilities: The Rayleigh-Plateau Connection</h2>
<p>Fardin also noted that cats confined to containers thinner than their body
diameter can exhibit flow instabilities. A cat attempting to fit into a glass
too narrow for its body will sometimes adopt a helical or coiled configuration —
an instability reminiscent of the <strong>Rayleigh–Plateau instability</strong> of a liquid
jet.</p>
<p>The Rayleigh–Plateau instability occurs when a cylindrical fluid jet of radius
$r_0$ is subject to perturbations of wavelength $\lambda > 2\pi r_0$. Modes
with wavelength longer than the cylinder&rsquo;s circumference are unstable and grow,
breaking the jet into droplets. The dispersion relation for growth rate $\sigma$
as a function of wavenumber $k = 2\pi/\lambda$ (for an inviscid jet) is:</p>
$$\sigma^2 = \frac{\gamma}{\rho r_0^3}\, k r_0 \bigl(1 - k^2 r_0^2\bigr)\,
\frac{I_1(k r_0)}{I_0(k r_0)},$$<p>where $\gamma$ is surface tension and $I_0, I_1$ are modified Bessel functions.
The analogy with a cat is inexact — surface tension is not the dominant
restoring force — but the qualitative instability mechanism (a long cylinder of
material is unstable to perturbations whose wavelength exceeds the cylinder&rsquo;s
circumference) appears to apply, suggesting that very elongated cats in very
narrow containers should be unstable to coiling. This is, again, left to future
experimental work.</p>
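<p>The fastest-growing mode of the dispersion relation above can be located numerically in a few lines; the classical result is $k r_0 \approx 0.7$, i.e. a wavelength of roughly $9\,r_0$. A minimal Python sketch:</p>
<pre><code class="language-python">import numpy as np
from scipy.special import iv  # modified Bessel functions of the first kind

# Dimensionless growth rate sigma^2 * (rho * r0^3 / gamma), as a function of x = k * r0.
x = np.linspace(1e-4, 0.9999, 2000)
growth2 = x * (1 - x**2) * iv(1, x) / iv(0, x)

x_max = x[np.argmax(growth2)]
print(f"fastest-growing mode at k*r0 = {x_max:.3f}")               # roughly 0.70
print(f"corresponding wavelength = {2 * np.pi / x_max:.2f} * r0")  # roughly 9 r0
</code></pre>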
<hr>
<h2 id="why-the-deborah-number-matters-outside-of-cat-physics">Why the Deborah Number Matters (Outside of Cat Physics)</h2>
<p>The Deborah number is not a curiosity; it is a central dimensionless number
in engineering and materials science.</p>
<p><strong>Polymer processing</strong>: The flow of polymer melts through injection-moulding
channels involves De in the range $10^{-2}$–$10^2$. Too high a De leads to
elastic instabilities, melt fracture, and surface defects in the finished part.</p>
<p><strong>Blood rheology</strong>: Blood is a non-Newtonian viscoelastic fluid. In the large
arteries (low shear rate), red blood cells aggregate into <em>rouleaux</em> and
blood behaves as a shear-thinning fluid. In the capillaries (high shear rate),
rouleaux break up and individual cells deform to fit through vessels smaller
than their resting diameter — liquid behaviour on short length scales.</p>
<p><strong>Geophysics</strong>: The mantle is an elastic solid for seismic waves ($T \sim$
seconds, De ≫ 1) and a viscous fluid for convection ($T \sim 10^8$–$10^9$
years, De ≪ 1). The same material. Different Deborah numbers.</p>
<p><strong>Glaciology</strong>: Ice is an elastic solid for rapid fracture (calving of icebergs)
and a viscous fluid for glacier flow. The transition occurs at timescales of
years to decades, depending on temperature and stress.</p>
<p>The cat is in good company.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Fardin, M.-A. (2014). On the rheology of cats. <em>Rheology Bulletin</em>, 83(2),
16–17.</p>
</li>
<li>
<p>Reiner, M. (1964). The Deborah number. <em>Physics Today</em>, 17(1), 62.
<a href="https://doi.org/10.1063/1.3051374">https://doi.org/10.1063/1.3051374</a></p>
</li>
<li>
<p>Barnes, H.A., Hutton, J.F., &amp; Walters, K. (1989). <em>An Introduction to
Rheology.</em> Elsevier (Rheology Series, Vol. 3).</p>
</li>
<li>
<p>Bird, R.B., Armstrong, R.C., &amp; Hassager, O. (1987). <em>Dynamics of Polymeric
Liquids, Vol. 1: Fluid Mechanics</em> (2nd ed.). Wiley-Interscience.</p>
</li>
<li>
<p>Eggers, J. (1997). Nonlinear dynamics and breakup of free-surface flows.
<em>Reviews of Modern Physics</em>, 69(3), 865–930.
<a href="https://doi.org/10.1103/RevModPhys.69.865">https://doi.org/10.1103/RevModPhys.69.865</a></p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-12-15</strong>: Fixed Deborah number in summary from 0.08 to 0.008 (matching the body calculation: 5/600 = 0.00833).</li>
<li><strong>2025-12-15</strong>: Corrected Fardin&rsquo;s institutional affiliation from &ldquo;Paris Diderot University&rdquo; to &ldquo;ENS Lyon&rdquo; — his affiliation on the 2014 <em>Rheology Bulletin</em> paper is Université de Lyon / ENS Lyon (CNRS UMR 5672). He moved to Paris Diderot later in 2014, after the paper was published.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>When Musicians Lock In: Coupled Oscillators and the Physics of Ensemble Synchronisation</title>
      <link>https://sebastianspicker.github.io/posts/kuramoto-ensemble-sync/</link>
      <pubDate>Thu, 08 Feb 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/kuramoto-ensemble-sync/</guid>
      <description>Every ensemble faces the same physical problem: N oscillators with slightly different natural frequencies trying to synchronise through a shared coupling channel. The Kuramoto model — developed by a statistical physicist to describe fireflies, neurons, and power grids — applies directly to musicians. It predicts a phase transition between incoherence and synchrony, quantifies why latency destroys networked ensemble performance, and connects to recent EEG studies of inter-brain synchronisation.</description>
      <content:encoded><![CDATA[<p><em>The problem is ancient and the language for it is recent. In any ensemble — a
string quartet, a jazz rhythm section, an orchestra — musicians with slightly
different internal tempos must stay together. They do this by listening to each
other. But what, exactly, does &ldquo;listening to each other&rdquo; do to their timing? And
what happens when the listening channel is imperfect — delayed by the speed of
sound across a wide stage, or by a network cable crossing a continent? The answer
involves a differential equation that was not written to describe music.</em></p>
<p><em>This post extends the latency analysis in <a href="/posts/nmp-latency-lola-mvtp/">Latency in Networked Music
Performance</a> with the dynamical systems framework
that underlies it.</em></p>
<hr>
<h2 id="two-clocks-on-a-board">Two Clocks on a Board</h2>
<p>The first documented observation of coupled-oscillator synchronisation was made
not by a musician but by a physicist. In 1665, Christiaan Huygens, confined to
bed with illness, was watching two pendulum clocks mounted on the same wooden
beam. Over the course of the night, the pendulums had synchronised into
<em>anti-phase</em> oscillation — swinging in opposite directions in exact unison.
He reported it to his father:</p>
<blockquote>
<p>&ldquo;I have noticed a remarkable effect which no-one has observed before&hellip; two
clocks on the same board always end up in mutual synchrony.&rdquo;</p>
</blockquote>
<p>The mechanism was mechanical coupling through the beam. Each pendulum&rsquo;s swing
imparted a small impulse to the wood; the other pendulum felt this as a
perturbation to its rhythm. Small perturbations, accumulated over hours, drove
the clocks into a shared frequency and a fixed phase relationship.</p>
<p>This is the prototype of every ensemble synchronisation problem. Each musician
is a clock. The acoustic environment — the air in the room, the reflected sound
from the walls, the vibrations through the stage floor — is the wooden beam.</p>
<hr>
<h2 id="the-kuramoto-model">The Kuramoto Model</h2>
<p>Yoshiki Kuramoto formalised the mathematics of coupled oscillators in 1975,
motivated by biological synchronisation problems: firefly flashing, circadian
rhythms, cardiac pacemakers. His model considers $N$ oscillators, each with a
phase $\theta_i(t)$ evolving according to:</p>
$$\frac{d\theta_i}{dt} = \omega_i + \frac{K}{N} \sum_{j=1}^{N} \sin(\theta_j - \theta_i), \qquad i = 1, \ldots, N.$$<p>The first term, $\omega_i$, is the oscillator&rsquo;s <em>natural frequency</em> — the tempo it
would maintain in isolation. These are drawn from a distribution $g(\omega)$, which
in a real ensemble reflects the spread of individual preferred tempos among the
players. The second term is the coupling: each oscillator is attracted toward the
phases of all others, with strength $K/N$. The factor $1/N$ keeps the total
coupling intensive (independent of ensemble size) as $N$ grows large.</p>
<p>Musically: $\theta_i$ is the phase of musician $i$&rsquo;s internal pulse at a given
moment, $\omega_i$ is their preferred tempo if playing alone, and $K$ is the
coupling strength — how much they adjust their tempo in response to what they
hear from the others.</p>
<hr>
<h2 id="the-order-parameter-and-the-phase-transition">The Order Parameter and the Phase Transition</h2>
<p>To measure the degree of synchronisation, Kuramoto introduced the complex order
parameter:</p>
$$r(t)\, e^{i\psi(t)} = \frac{1}{N} \sum_{j=1}^{N} e^{i\theta_j(t)},$$<p>where $r(t) \in [0, 1]$ is the <em>coherence</em> of the ensemble and $\psi(t)$ is the
collective mean phase. When $r = 0$, the phases are uniformly spread around the
unit circle — the ensemble is incoherent. When $r = 1$, all phases coincide —
perfect synchrony. In a live ensemble, $r$ is a direct measure of rhythmic
cohesion, though of course not one you can read off a score.</p>
<p>Substituting the order parameter into the equation of motion:</p>
$$\frac{d\theta_i}{dt} = \omega_i + K r \sin(\psi - \theta_i).$$<p>Each oscillator now interacts only with the mean-field quantities $r$ and $\psi$,
not with every other oscillator individually. The coupling pulls each musician
toward the collective mean phase with a force proportional to both $K$ (how
attentively they listen) and $r$ (how coherent the group already is).</p>
<p>This mean-field form reveals the essential physics. For small $K$, oscillators
with widely differing $\omega_i$ cannot follow the mean field — they drift at
their own frequencies, and $r \approx 0$. At a critical coupling strength $K_c$,
a macroscopic fraction of oscillators suddenly locks to a shared frequency, and
$r$ begins to grow continuously from zero. For a unimodal,
symmetric frequency distribution $g(\omega)$ with density $g(\bar\omega)$ at the
mean:</p>
$$K_c = \frac{2}{\pi\, g(\bar\omega)}.$$<p>Above $K_c$, the coherence grows as:</p>
$$r \approx \sqrt{\frac{K - K_c}{K_c}}, \qquad K \gtrsim K_c.$$<p>This is a <strong>second-order (continuous) phase transition</strong> — the same
mathematical structure as a ferromagnet approaching the Curie temperature,
where spontaneous magnetisation appears continuously above a critical coupling.
The musical ensemble and the magnetic material belong to the same universality
class, governed by the same mean-field exponent $\frac{1}{2}$.</p>
<p>Above $K_c$, the fraction of oscillators that are <em>locked</em> (synchronised to the
mean-field frequency) can be computed explicitly. An oscillator with natural
frequency $\omega_i$ locks to the mean field if $|\omega_i - \bar\omega| \leq
Kr$. For a Lorentzian distribution $g(\omega) = \frac{\gamma/\pi}{(\omega -
\bar\omega)^2 + \gamma^2}$, this yields:</p>
$$r = \sqrt{1 - \frac{K_c}{K}}, \qquad K_c = 2\gamma,$$<p>which is the exact self-consistency equation for the Kuramoto model with
Lorentzian frequency spread (Strogatz, 2000).</p>
<p>The physical reading is direct: whether an ensemble locks into a shared pulse or
drifts apart is a threshold phenomenon. A group of musicians with similar
preferred tempos has a sharply peaked $g(\omega)$, hence a large $g(\bar\omega)$ and a low $K_c$: they
synchronise easily with minimal attentive listening. A group with widely varying
individual tempos needs stronger, more sustained coupling to cross the threshold.
This is not a matter of musical discipline; it is a material property of the
ensemble.</p>
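<p>The transition is easy to see in a direct simulation of the mean-field form above. A minimal Python sketch with Lorentzian-distributed natural frequencies, so that the measured coherence can be compared against the exact $r = \sqrt{1 - K_c/K}$; every parameter choice here ($N$, $\gamma$, $K$, the step size) is mine and purely illustrative:</p>
<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(0)

N = 2000            # oscillators ("musicians")
gamma = 0.5         # half-width of the Lorentzian g(omega)
K = 2.0             # coupling strength
Kc = 2.0 * gamma    # critical coupling for a Lorentzian spread
dt, steps = 0.01, 10000

# Natural frequencies from a Lorentzian (Cauchy) distribution, via the inverse CDF.
omega = gamma * np.tan(np.pi * (rng.random(N) - 0.5))
theta = 2.0 * np.pi * rng.random(N)

for _ in range(steps):
    z = np.mean(np.exp(1j * theta))        # complex order parameter r * exp(i*psi)
    r, psi = np.abs(z), np.angle(z)
    theta += dt * (omega + K * r * np.sin(psi - theta))  # mean-field Kuramoto step

print(f"simulated coherence r = {np.abs(np.mean(np.exp(1j * theta))):.3f}")
print(f"predicted coherence r = {np.sqrt(1.0 - Kc / K):.3f}")  # valid above Kc
</code></pre>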
<hr>
<h2 id="concert-hall-applause-neda-et-al-2000">Concert Hall Applause: Neda et al. (2000)</h2>
<p>The Kuramoto model is not only a theoretical construction. Neda et al. (2000)
applied it to concert hall applause — one of the most direct real-world
demonstrations of coupled-oscillator dynamics in a musical context.</p>
<p>They recorded applause in Romanian and Hungarian theaters and found that audiences
spontaneously alternate between two distinct states. In the <em>incoherent</em> regime,
each audience member claps at their own preferred rate (typically 2–3 Hz). Through
acoustic coupling — each person hears the room-averaged sound and adjusts their
clapping — the audience gradually synchronises to a shared, slower frequency
(around 1.5 Hz): the <em>synchronised</em> regime.</p>
<p>The transitions between the two regimes are quantitatively consistent with the
Kuramoto phase transition: the emergence of synchrony corresponds to $K$ crossing
$K_c$ as people progressively pay more attention to the collective sound.
Furthermore, Neda et al. document a characteristic phenomenon: synchrony is
repeatedly destroyed when the individual clapping frequency approximately
<em>doubles</em>, as audience members speed up to raise the average sound intensity.
The faster clapping comes with a relatively wider spread of natural frequencies
(a flatter $g(\omega)$), which pushes $K_c$ above the available coupling, and the
synchrony collapses until the audience slows down and locks in again.</p>
<p>The paper is a useful pedagogical artefact: every music student has experienced
concert hall applause, and hearing that it undergoes a physically measurable phase
transition makes the connection between physics and musical experience concrete.</p>
<hr>
<h2 id="latency-and-the-limits-of-networked-ensemble-performance">Latency and the Limits of Networked Ensemble Performance</h2>
<p>In standard acoustic ensemble playing, the coupling delay is the propagation time
for sound to cross the ensemble: at $343\ \text{m/s}$, across a ten-metre stage,
roughly 30 ms. This is why orchestral seating is arranged with attention to who
needs to hear whom first.</p>
<p>In networked music performance (NMP), the coupling delay $\tau$ is much larger:
tens to hundreds of milliseconds depending on geographic distance and network
infrastructure. The Kuramoto model generalises naturally to include this delay:</p>
$$\frac{d\theta_i}{dt} = \omega_i + \frac{K}{N} \sum_{j=1}^{N} \sin\!\bigl(\theta_j(t - \tau) - \theta_i(t)\bigr).$$<p>Each musician hears the others&rsquo; phases as they were $\tau$ seconds ago, not as
they are now.</p>
<p>In a synchronised state where all oscillators share the collective frequency
$\bar\omega$ and phase $\psi(t) = \bar\omega t$, the delayed phase signal is
$\psi(t - \tau) = \bar\omega t - \bar\omega\tau$. The effective coupling
force contains a factor $\cos(\bar\omega\tau)$: the delay introduces a phase
shift that reduces the useful component of the coupling. The critical coupling
with delay is therefore:</p>
$$K_c(\tau) = \frac{K_c(0)}{\cos(\bar\omega \tau)}.$$<p>As $\tau$ increases, $K_c(\tau)$ grows: synchronisation requires progressively
stronger coupling (more attentive adjustment) to compensate for the information
lag. The denominator $\cos(\bar\omega\tau)$ reaches zero when
$\bar\omega\tau = \pi/2$. At this point $K_c(\tau) \to \infty$: no finite coupling
strength can maintain synchrony. The critical delay is:</p>
$$\tau_c = \frac{\pi}{2\bar\omega}.$$<p>For an ensemble performing at 120 BPM, the beat frequency is
$\bar\omega = 2\pi \times 2\ \text{Hz} = 4\pi\ \text{rad/s}$:</p>
$$\tau_c = \frac{\pi}{2 \times 4\pi} = \frac{1}{8}\ \text{s} = 125\ \text{ms}.$$<p>This is a remarkably clean result. The Kuramoto model with delay predicts that
ensemble synchronisation collapses at around 125 ms one-way delay for a standard
performance tempo. The empirical literature on NMP — from LoLa deployments across
European conservatories to controlled latency studies in the lab — consistently
finds that rhythmic coherence degrades noticeably above 50–80 ms and becomes
essentially unworkable above 100–150 ms one-way. The model and the data agree.</p>
<p>The derivation also shows why faster tempos are harder in NMP: $\tau_c \propto
1/\bar\omega$, so doubling the tempo halves the tolerable latency. An ensemble
performing at 240 BPM in a distributed setting faces a theoretical ceiling of
62 ms — which rules out transcontinental performance for most repertoire.</p>
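<p>The tempo dependence is worth tabulating explicitly. A minimal sketch of $\tau_c = \pi / (2\bar\omega)$ for a few common tempos:</p>
<pre><code class="language-python">import math

# Critical one-way delay tau_c = pi / (2 * omega_bar), with omega_bar the beat
# frequency in rad/s, following the delayed-Kuramoto argument above.
for bpm in (60, 120, 180, 240):
    omega_bar = 2.0 * math.pi * (bpm / 60.0)   # beats per minute to rad/s
    tau_c_ms = 1000.0 * math.pi / (2.0 * omega_bar)
    print(f"{bpm:3d} BPM   tau_c = {tau_c_ms:6.1f} ms")
</code></pre>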
<hr>
<h2 id="brains-in-sync-eeg-hyperscanning">Brains in Sync: EEG Hyperscanning</h2>
<p>The Kuramoto framework has recently been applied at a neural level.
EEG hyperscanning — simultaneous EEG recording from multiple participants during
a shared musical activity — has shown that musicians performing together exhibit
<em>inter-brain synchronisation</em>: coherent cortical oscillations at the frequency of
the music are measurable between players (Lindenberger et al., 2009; Müller et
al., 2013). The phase coupling between brains during joint performance is
significantly higher than during solo performance and higher than for musicians
playing simultaneously but without acoustic coupling.</p>
<p>This suggests that the Kuramoto coupling operates at two levels: the acoustic
(each musician hears the other and adjusts physical timing) and the neural (each
musician&rsquo;s cortical oscillators entrain to the shared musical pulse). The
question of which level is primary — whether neural synchrony causes or follows
from acoustic synchrony — remains open.</p>
<p>A 2023 review by Demos and Palmer argues that pairwise Kuramoto-type coupling is
insufficient to capture full ensemble dynamics. Group-level effects — the
differentiation between leader and follower roles, the emergence of collective
timing that no individual would produce alone — require nonlinear dynamical
frameworks that go beyond mean-field averaging. The model that adequately
describes a string quartet may need to be richer than the one that describes a
population of identical fireflies.</p>
<hr>
<h2 id="what-this-means-for-teaching">What This Means for Teaching</h2>
<p>The Kuramoto model reframes standard rehearsal intuitions in physical terms.</p>
<p><strong>&ldquo;Listen more&rdquo;</strong> translates to &ldquo;increase your effective coupling constant $K$.&rdquo;
A musician who plays without attending to others has set $K \approx 0$ and will
drift freely according to their own $\omega_i$. Listening — actively adjusting
tempo in response to what you hear — is not metaphorical. It is the physical
mechanism of coupling, and its effect is to pull you toward the mean phase $\psi$
with a force $Kr\sin(\psi - \theta_i)$.</p>
<p><strong>&ldquo;Our tempos are too different&rdquo;</strong> is a claim about the width of $g(\omega)$ and therefore
about $K_c$. A group with a wide spread of natural tempos needs more and stronger
listening to synchronise. This is not a moral failing but a parameter; it
suggests that ensemble warm-up time or explicit tempo negotiation before a
performance serves to reduce the spread of natural frequencies before the coupling
has to do all the work.</p>
<p><strong>Latency as a rehearsal experiment</strong> can be made explicit. Artificially delaying
the acoustic return to one musician in an ensemble — via headphone monitoring with
variable delay — allows students to experience directly how the coordination
degrades as $\tau$ increases toward $\tau_c$. They feel the system approaching
the phase transition without the theoretical framework, but the framework makes
the experience interpretable afterward.</p>
<p><strong>The click track</strong> replaces peer-to-peer Kuramoto coupling with an external
forcing term: each musician locks to a shared reference with fixed $\omega$
rather than adjusting dynamically to the group mean. This eliminates the phase
transition but also eliminates the adaptive dynamics — the micro-timing
fluctuations and expressive rubato — that characterise live ensemble playing. It
is a pedagogically important distinction, even if studios routinely make the
pragmatic choice.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Demos, A. P., &amp; Palmer, C.
(2023). Social and nonlinear dynamics unite: Musical group synchrony. <em>Trends
in Cognitive Sciences</em>, 27(11), 1008–1018.
<a href="https://doi.org/10.1016/j.tics.2023.08.005">https://doi.org/10.1016/j.tics.2023.08.005</a></p>
</li>
<li>
<p>Huygens, C. (1665). Letter to his father Constantijn Huygens, 26 February
1665. In <em>Œuvres complètes de Christiaan Huygens</em>, Vol. 5, p. 243. Martinus
Nijhoff, 1893.</p>
</li>
<li>
<p>Kuramoto, Y. (1975). Self-entrainment of a population of coupled non-linear
oscillators. In H. Araki (Ed.), <em>International Symposium on Mathematical
Problems in Theoretical Physics</em> (Lecture Notes in Physics, Vol. 39,
pp. 420–422). Springer.</p>
</li>
<li>
<p>Kuramoto, Y. (1984). <em>Chemical Oscillations, Waves, and Turbulence.</em> Springer.</p>
</li>
<li>
<p>Lindenberger, U., Li, S.-C., Gruber, W., &amp; Müller, V. (2009). Brains swinging
in concert: Cortical phase synchronization while playing guitar.
<em>BMC Neuroscience</em>, 10, 22. <a href="https://doi.org/10.1186/1471-2202-10-22">https://doi.org/10.1186/1471-2202-10-22</a></p>
</li>
<li>
<p>Müller, V., Sänger, J., &amp; Lindenberger, U. (2013). Intra- and inter-brain
synchronization during musical improvisation on the guitar. <em>PLOS ONE</em>, 8(9),
e73852. <a href="https://doi.org/10.1371/journal.pone.0073852">https://doi.org/10.1371/journal.pone.0073852</a></p>
</li>
<li>
<p>Neda, Z., Ravasz, E., Vicsek, T., Brechet, Y., &amp; Barabási, A.-L. (2000).
Physics of the rhythmic applause. <em>Physical Review E</em>, 61(6), 6987–6992.
<a href="https://doi.org/10.1103/PhysRevE.61.6987">https://doi.org/10.1103/PhysRevE.61.6987</a></p>
</li>
<li>
<p>Strogatz, S. H. (2000). From Kuramoto to Crawford: Exploring the onset of
synchronization in populations of coupled oscillators. <em>Physica D: Nonlinear
Phenomena</em>, 143(1–4), 1–20.
<a href="https://doi.org/10.1016/S0167-2789(00)00094-4">https://doi.org/10.1016/S0167-2789(00)00094-4</a></p>
</li>
<li>
<p>Strogatz, S. H. (2003). <em>Sync: How Order Emerges from Chaos in the Universe,
Nature, and Daily Life.</em> Hyperion.</p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-01-14</strong>: Updated the author list for the Demos (2023) <em>Trends in Cognitive Sciences</em> reference to the published two authors (Demos &amp; Palmer). The five names previously listed were from a different Demos paper.</li>
<li><strong>2026-01-14</strong>: Changed &ldquo;period-doubling&rdquo; to &ldquo;frequency-doubling.&rdquo; When the clapping frequency doubles, the period halves; &ldquo;frequency-doubling&rdquo; is the precise term in this context.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>LK-99: Six Weeks That Showed How Physics Works</title>
      <link>https://sebastianspicker.github.io/posts/lk99-preprint-physics-sociology/</link>
      <pubDate>Mon, 09 Oct 2023 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/lk99-preprint-physics-sociology/</guid>
      <description>On July 22, 2023, a Korean preprint claimed that LK-99 — a copper-doped lead apatite — was a room-temperature, ambient-pressure superconductor. Within six weeks, the claim was definitively falsified. The episode is more interesting for what it revealed about the sociology of science than for the compound itself: how a global community self-corrected at extraordinary speed, and how the media managed to fail at conveying uncertainty despite watching it happen in real time.</description>
      <content:encoded><![CDATA[<h2 id="july-22-2023">July 22, 2023</h2>
<p>On a Saturday morning in late July 2023, two preprints appeared on arXiv. They were submitted by researchers affiliated with the Quantum Energy Research Centre in Seoul — Sukbae Lee, Ji-Hoon Kim, and colleagues — and they claimed something that condensed matter physicists have been chasing for over a century: a material that superconducts at room temperature and ambient pressure.</p>
<p>The compound was called LK-99. It was a copper-doped lead apatite, synthesized from common precursors using a procedure that, on paper, any moderately equipped laboratory could attempt. The claimed critical temperature was above 400 K — well above 293 K, which is room temperature, which is roughly the temperature of a warm afternoon in Seoul in July.</p>
<p>A video circulated almost immediately. A small, grey, irregular piece of LK-99 appeared to be partially levitating — tilting up, one end raised — above a permanent neodymium magnet. In the video it wobbles slightly, like something caught between gravity and an invisible hand.</p>
<p>Physics Twitter — I will use that name; it was still recognizably that in July 2023 — detonated. Within 72 hours, laboratories across the world were racing to synthesize LK-99. Discord servers formed. GitHub repositories appeared with shared synthesis protocols. Preprints from independent groups began accumulating before the original authors had likely had a good night&rsquo;s sleep.</p>
<p>Six weeks later, the claim was dead.</p>
<p>I want to write about what happened in those six weeks, because I think the episode is more interesting as sociology of science than as condensed matter physics. LK-99 turned out to be a modest semiconductor with a ferromagnetic impurity. But the speed and the manner of that determination — the way a globally distributed community of physicists organized itself, shared data in real time, converged on a falsification, and then moved on — that is genuinely remarkable, and worth examining carefully.</p>
<h2 id="why-room-temperature-superconductivity-is-the-grail">Why Room-Temperature Superconductivity Is the Grail</h2>
<p>Let me be precise about why this particular claim generates the response it does.</p>
<p>Superconductivity is the phenomenon in which certain materials, below a critical temperature T<sub>c</sub>, carry electrical current with exactly zero resistance. Not very low resistance — zero. A current established in a superconducting loop will, in principle, continue flowing indefinitely without any driving voltage. This is not a small quantitative improvement over ordinary conductors; it is a qualitatively different regime of physics.</p>
<p>The trouble is that essentially all known superconductors require extreme cooling. Conventional metallic superconductors — the ones Heike Kamerlingh Onnes discovered in mercury in 1911 — become superconducting below about 30 K at best. That is liquid helium temperature, which is expensive, logistically demanding, and entirely impractical for large-scale applications. The discovery of high-temperature cuprate superconductors in 1986 (Bednorz and Müller, Nobel Prize 1987) was genuinely revolutionary: some cuprates superconduct up to about 138 K. But 138 K is still −135°C. It requires liquid nitrogen cooling, which is cheaper than liquid helium but still not something you install in a power grid without substantial infrastructure.</p>
<p>The most spectacular recent claims involve a class of hydrogen-rich compounds under extreme pressure — carbonaceous sulfur hydride at roughly 15°C, but requiring about 267 GPa of pressure (a claim that has since been retracted, as discussed below). Even the better-established hydride results require pressures of well over a hundred gigapascals. For context, the pressure at the center of the Earth is about 360 GPa. You cannot run a power cable through a diamond anvil cell.</p>
<p>Room-temperature, ambient-pressure superconductivity would be transformative in a way that very few material discoveries are. Electrical grids currently lose somewhere between 5 and 10 percent of all transmitted energy to resistive heating — a staggering quantity of energy, simply dissipated as heat in cables. Zero-resistance transmission would eliminate that loss. Magnetically levitated transport would become feasible without the cryogenic infrastructure that makes current Maglev systems enormously expensive to build and maintain. Compact, affordable MRI machines would become possible. Effects on computing, on energy storage, on medical technology — the list runs long. It would be one of the most consequential material discoveries in the history of technology.</p>
<p>This is why the response to the LK-99 preprints was not hysteria but rather the entirely rational behavior of a community that understood exactly what was at stake if the claim were true.</p>
<h2 id="what-lk-99-was-and-what-it-claimed">What LK-99 Was and What It Claimed</h2>
<p>LK-99 is chemically expressed as Pb₁₀₋ₓCuₓ(PO₄)₆O, where x is approximately 0.9 to 1.1. It is a lead apatite — the same crystal family as the mineral in tooth enamel — with a fraction of the lead atoms replaced by copper.</p>
<p>The proposed mechanism, as sketched in the preprints, involved Cu²⁺ substituting for Pb²⁺. Because copper has a slightly smaller ionic radius than lead, this substitution induces a local structural distortion. The claim was that this distortion produces a flat electronic band at the Fermi level — and flat bands are associated with strong electronic correlations that can, in principle, give rise to unconventional superconductivity. The analogy to twisted bilayer graphene was implicit in the discussion, though the mechanism is quite different and magic-angle twisted bilayer graphene superconducts only at around 1–2 K.</p>
<p>Reading the preprints in late July 2023 was, I confess, a slightly uncomfortable experience. The writing was rushed. The two preprints — submitted by different author subsets from the same group — were internally inconsistent in places. The resistance measurements showed a large drop with temperature, but not zero resistance. The synthesis protocol was described in enough detail to be reproducible, which was good, but the characterization was incomplete in ways that mattered.</p>
<p>Red flags were present from the beginning, and many physicists noted them immediately. The levitation video showed a piece of LK-99 that was tilted and wobbling — not the stable, complete expulsion of magnetic flux you would expect from a true Meissner effect. A perfect superconductor placed above a magnet would levitate horizontally and stably. This piece was doing something, but the something was not obviously Meissner levitation.</p>
<p>And yet. The synthesis was simple. The claim was specific and testable. If there was even a small chance it was real, the imperative to check was overwhelming. So labs checked.</p>
<h2 id="the-replication-wave">The Replication Wave</h2>
<p>What happened over the following weeks was, as far as I am aware, unprecedented in condensed matter physics.</p>
<p>Normally, a replication in physics looks like this: a group reads a paper, decides it is interesting enough to attempt, orders precursor materials, synthesizes the compound (which takes weeks to months), characterizes it with appropriate instruments (more weeks), writes up the results, submits them (more weeks), and eventually publishes — often six months to a year after the original claim, sometimes much longer. The feedback cycle is slow by design: slowness is a feature, not a bug, because it allows careful work rather than hasty work.</p>
<p>The LK-99 replication did not look like this.</p>
<p>Within a week, preprints from independent groups — China, India, the United States, Germany — were appearing on arXiv. Discord servers with hundreds of members were organizing synthesis attempts in real time, sharing thermograms, resistance measurements, and microscope images as they came off instruments. Twitter threads tracked emerging results with the urgency of a live event. A GitHub repository maintained by the community accumulated synthesis protocols, shared data files, and links to new preprints as they appeared.</p>
<p>Some groups reported partial levitation. Others reported anomalous resistance drops. Others — starting almost immediately — reported synthesizing the material and finding nothing unusual at all.</p>
<p>The speed of this was extraordinary not because of any particular organizational effort, but because the incentive structure happened to align with the infrastructure that now exists. Preprints made sharing immediate. Social media made results public the moment they existed. The synthesis was simple enough to attempt in any reasonably equipped solid-state chemistry lab. And the motivation — the prize, if it were real — was enormous. You would not need to tell anyone to work on this. You would have to tell people to stop.</p>
<p>By mid-August 2023 — three weeks after the original preprints — the key debunking papers had appeared. By late August, there was no serious scientific debate remaining.</p>
<h2 id="the-mechanism-of-falsification">The Mechanism of Falsification</h2>
<p>The levitating video was explained first, and the explanation is both mundane and instructive.</p>
<p>The LK-99 synthesis produces, as an essentially unavoidable impurity, copper sulfide — Cu₂S. Copper sulfide is interesting in its own right: it undergoes a structural phase transition at roughly 105°C (378 K) from a low-temperature chalcocite form to a high-temperature superionic conductor. This transition is accompanied by a large, sharp drop in electrical resistance — exactly the kind of anomalous feature that, in a sample of mixed composition, might be misidentified as a superconducting transition.</p>
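<p>To see how an impurity step can masquerade as a &ldquo;transition&rdquo;, here is a deliberately crude numerical sketch: a mixed-phase sample modelled as a semiconducting apatite resistance in series with a Cu₂S resistance that drops sharply at its 378 K structural transition. Every number in it is an illustrative placeholder of my own choosing, not a fit to any published measurement.</p>
<pre><code class="language-python">import numpy as np

def r_apatite(T):
    """Smooth, activated (semiconductor-like) resistance of the apatite matrix, arb. units."""
    return 5.0 * np.exp(1500.0 / T)

def r_cu2s(T):
    """Cu2S-like resistance with a sharp drop at the ~378 K structural transition, arb. units."""
    step = 0.5 * (1.0 + np.tanh((378.0 - T) / 1.0))   # roughly 1 below 378 K, roughly 0 above
    return 20.0 + 400.0 * step

T = np.linspace(330.0, 430.0, 11)
r_total = r_apatite(T) + r_cu2s(T)     # the two phases in series along the measured path

for Ti, Ri in zip(T, r_total):
    print(f"T = {Ti:5.1f} K   R_total = {Ri:7.1f} (arb. units)")
# The curve shows a large, sharp drop centred near 378 K on top of a smooth
# semiconducting background. It never approaches zero: neither phase superconducts.
</code></pre>
<p>In a real sample the geometry is messier, of course, but the qualitative point survives: a minority phase in the current path can produce a dramatic resistance anomaly that has nothing to do with superconductivity.</p>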
<p>More importantly for the levitation: the LK-99 synthesis products ubiquitously contain ferromagnetic impurity phases. A ferromagnetic material will interact with a permanent magnet. Partial levitation, tilted and unstable, is entirely consistent with a ferromagnetic-diamagnetic competition — not with the Meissner effect.</p>
<p>Several groups published debunking papers in rapid succession. Kumar and colleagues (<a href="#ref-Senapati2023">Kumar et al., 2023</a>) reported the absence of superconductivity in LK-99 samples; other groups synthesized Cu₂S independently, confirmed its resistance anomaly near 380 K, and showed quantitatively that the LK-99 observations were fully consistent with Cu₂S contamination and ferromagnetic impurities. Liu and Meng (<a href="#ref-LiuMeng2023">Liu &amp; Meng, 2023</a>) provided a complementary symmetry analysis explaining why the structural distortion mechanism did not actually predict superconductivity.</p>
<p>Several Chinese groups with high-quality synthesis capabilities — and, frankly, strong motivation to find a positive result — produced very pure LK-99 samples and found what you would expect of a clean lead apatite: a semiconductor with modest diamagnetism. Nothing anomalous. When you removed the Cu₂S impurity, you removed the anomaly.</p>
<p>Daniel Garisto summarized the consensus in a <em>Nature</em> news piece in August 2023 (<a href="#ref-Garisto2023">Garisto, 2023</a>): LK-99 is not a superconductor. The case was closed, with an efficiency that the scientific community should be proud of.</p>
<h2 id="a-useful-contrast-ranga-dias">A Useful Contrast: Ranga Dias</h2>
<p>The LK-99 episode does not exist in isolation. The preceding years had seen other extraordinary claims of room-temperature or near-room-temperature superconductivity, and the most prominent involved Ranga Dias at the University of Rochester.</p>
<p>Dias published two papers in <em>Nature</em> claiming superconductivity at or near room temperature: one in 2020, describing carbonaceous sulfur hydride at roughly 15°C under 267 GPa (<a href="#ref-Snider2020">Snider et al., 2020</a>), and one in 2023, describing nitrogen-doped lutetium hydride under much lower pressure. I note that the earlier Dias and Silvera <em>Science</em> paper on metallic hydrogen (<a href="#ref-DiasSilvera2017">Dias &amp; Silvera, 2017</a>) had already received a significant erratum and been widely questioned — establishing a pattern. Both <em>Nature</em> papers were eventually retracted — the 2020 paper in 2022, the 2023 paper in November 2023 — amid serious and credible allegations of data manipulation. The criticisms included statistical anomalies in background signals, apparent image duplication across different experimental conditions, and raw data that did not match the published figures. Hirsch, who had been following these claims closely, documented many of the irregularities (<a href="#ref-Hirsch2021">Hirsch, 2021</a>).</p>
<p>The contrast with LK-99 is worth sitting with. The Korean team appears to have been guilty of honest overreach: genuine excitement about anomalous observations, insufficient characterization before posting, motivated interpretation of ambiguous data. This happens in science. Extraordinary rewards for being right create extraordinary pressure to believe you are right. The LK-99 researchers may have seen something they genuinely could not explain and convinced themselves it was what they hoped it was.</p>
<p>The Dias case, if the allegations of data manipulation are accurate — and the retractions, and the University of Rochester investigation that followed, suggest they have merit — is something different: not motivated misinterpretation but deliberate fabrication. The scientific outcomes are superficially similar: both sets of claims were false, both caused the community to expend significant effort on falsification, both damaged the credibility of the field. But the causes, and the appropriate institutional and moral responses, differ substantially.</p>
<p>How do you tell them apart in real time? In both cases, you had extraordinary claims that passed initial peer review at prestigious venues. In both cases, independent replication failed. The LK-99 falsification came faster, partly because the synthesis was simpler and partly because the community mobilized more broadly. The Dias case took years, and the data manipulation allegations required access to raw data that the research group was slow to provide.</p>
<p>I do not have a clean answer. The difference in mechanism — honest error versus alleged fraud — is not directly observable from the outside. What you can observe is willingness to share data, consistency of results across different instruments and laboratories, and whether the research group facilitates or obstructs independent verification. On those criteria, the LK-99 group and the Dias group look quite different.</p>
<h2 id="the-sociology-of-what-happened">The Sociology of What Happened</h2>
<p>Let me step back from the physics and say something about what the LK-99 episode reveals about how science actually functions.</p>
<p>The first thing it reveals is that community self-correction works, and now works at extraordinary speed when the incentive is high enough. The coordinated global replication was not organized by any institution, any journal, any funding body. It emerged spontaneously from a community that understood what was at stake and had the tools — preprint servers, social media, Discord, GitHub — to coordinate without central direction. The result was a falsification that, in a previous era, might have taken two to five years, completed in six weeks. That is remarkable.</p>
<p>The second thing it reveals is that the preprint revolution is real and consequential. The LK-99 preprints bypassed traditional peer review entirely. That could be bad — and in principle, a false claim could propagate further and faster without peer review as a gate. In practice, in this case, removing the gate allowed not just the false claim but its falsification to move at the same speed. Peer review, as it is normally practiced, is too slow to respond to a claim like this on a timescale that matters. The community replaced it with something faster: immediate, distributed, adversarial review by people with direct experimental access to the question.</p>
<p>This is not an argument against peer review. It is an argument that peer review in the traditional sense — two or three reviewers reading a manuscript over a few weeks — is not the only form that meaningful scientific scrutiny takes.</p>
<p>The third thing the episode reveals is that social media&rsquo;s role in science communication is deeply ambivalent. Twitter accelerated the spread of both the original claim and the debunking. The community of physicists on Twitter was, on the whole, appropriately skeptical from the first day — I saw many threads on July 22 and 23 that noted the red flags I mentioned above: the tilted levitation, the non-zero resistance, the inconsistencies between the two preprints. But that skepticism was invisible to most science journalists, who were looking at the same videos and preprints and reading the excitement rather than the caveats.</p>
<h2 id="the-media-and-the-calibration-problem">The Media, and the Calibration Problem</h2>
<p>I want to be specific about the media failure, because I think it matters.</p>
<p>The appropriate headline on July 23, 2023 was something like: &ldquo;Korean researchers post preprints claiming room-temperature superconductivity; claim is extraordinary and unverified; replication underway.&rdquo; That headline is accurate. It conveys the genuine excitement — because the claim, if true, would be extraordinary — while conveying the appropriate uncertainty about an unverified preprint from a single group.</p>
<p>The headlines that actually appeared, across outlets that should know better, included &ldquo;Room-temperature superconductor discovered&rdquo; and &ldquo;Scientists may have created the holy grail of energy.&rdquo; These are not accurate. They convey neither the uncertainty nor the specific nature of the claim. They treat a preprint as a discovery.</p>
<p>This is a calibration failure — the same kind of failure I have written about in other contexts. On this blog, I have discussed how LLMs can fail catastrophically when they lack the context to assess whether their confident-sounding output is grounded in anything real (<a href="/posts/car-wash-grounding/">see the car-wash post</a>, and more generally the discussion of context and grounding in <a href="/posts/more-context-not-always-better/">more context is not always better</a>). The mechanism in journalism is different but the structure is the same: confidence that is not appropriately calibrated to evidence.</p>
<p>The Bayesian structure of the situation was, or should have been, clear. The prior probability of a room-temperature, ambient-pressure superconductor being found in any given week is very small — not because room-temperature superconductors are impossible, but because such discoveries do not happen often and many previous claims have failed. Call that prior probability low. Against that prior, what evidence did we have on July 23? A video showing partial, unstable levitation — which, as I noted, is not what Meissner levitation looks like. Two rushed preprints that disagreed with each other in some details. No independent replication. P(levitation video | not a superconductor) was not particularly small, as the Cu₂S explanation would later demonstrate. So the posterior probability that LK-99 was a room-temperature superconductor, given the evidence available on July 23, was not meaningfully higher than the prior — which was low.</p>
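<p>To make the arithmetic explicit, here is the update as a few lines of code. The prior and both likelihoods are assumptions of mine, chosen only to illustrate the structure of the argument, not numbers anyone should defend to two significant figures.</p>
<pre><code class="language-python"># Minimal Bayes update for "LK-99 is a room-temperature superconductor",
# given the evidence available on July 23, 2023. All numbers are illustrative
# assumptions, not measured quantities.

prior = 1e-4                 # P(a genuine RT superconductor is announced in a given week): very low
p_evidence_if_true = 0.9     # P(tilted levitation video + rushed preprints | true claim)
p_evidence_if_false = 0.3    # P(same evidence | not a superconductor): Cu2S plus ferromagnetic
                             # impurities make this kind of footage quite easy to produce

# Bayes' theorem: P(true | evidence) = P(evidence | true) P(true) / P(evidence)
evidence = p_evidence_if_true * prior + p_evidence_if_false * (1.0 - prior)
posterior = p_evidence_if_true * prior / evidence

print(f"prior     = {prior:.6f}")
print(f"posterior = {posterior:.6f}")   # about 0.0003: three times the prior, still tiny
</code></pre>
<p>The particular values do not matter. What matters is the shape of the result: when the evidence is nearly as likely under the boring hypothesis as under the exciting one, a small prior stays small.</p>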
<p>A well-calibrated science journalist would not have written &ldquo;Room-temperature superconductor discovered.&rdquo; A well-calibrated scientist — and many of them said exactly this — would have written &ldquo;interesting claim, requires replication, maintain high skepticism.&rdquo; The scientific community was, on the whole, well-calibrated. The journalism was not.</p>
<p>This is not a new observation. Science journalists have been criticized for overclaiming since there have been science journalists. But the LK-99 episode is a particularly clean example because the timescale was so short: the calibration failure in the media and the calibration success in the scientific community happened simultaneously, in full public view, and could be compared directly.</p>
<p>I write occasionally about AI systems and their tendency to produce confident outputs that are not grounded in evidence — a form of miscalibration that is particularly dangerous because the confident tone is not a signal of accuracy (<a href="/posts/ai-detectors-systematic-minds/">a theme that runs through recent posts on this blog</a>). The LK-99 episode is a reminder that miscalibration is not unique to neural networks. It is a general failure mode in any system that needs to estimate uncertainty about claims — human, institutional, or artificial. The cure in all cases is the same: track confidence to evidence, update on data, resist the pull of exciting priors.</p>
<h2 id="what-the-scientific-community-actually-did">What the Scientific Community Actually Did</h2>
<p>I want to be careful not to end on a note of pure cynicism about the media and leave the scientific community looking saintly. The community is not saintly.</p>
<p>There were preprints from independent groups that claimed positive results before the falsification was clear — groups that perhaps saw anomalies and wanted to be part of the story. There was social pressure, documented in real time on Twitter, to share exciting results before they were fully analyzed. The Discord servers and GitHub repositories that were genuinely useful for coordination were also, occasionally, vectors for misinformation and premature interpretation.</p>
<p>The community self-corrected. That is the important thing. The noise in the system resolved into a clear answer, in six weeks, through a process that was adversarial in the best scientific sense: many people trying to verify or refute a specific testable claim, sharing data openly, calling out methodological problems in public. The answer that emerged was correct.</p>
<p>I find this genuinely impressive. It is easy to be cynical about institutional science — about publication bias, about the replication crisis in psychology and medicine, about the incentive structures that reward novelty over rigor. The LK-99 episode is a counter-example. It is evidence that, when a question is clear and testable and the stakes are high, the system works. Not perfectly, not without noise, but functionally.</p>
<p>Peer review in the classical sense was absent. Peer review in a broader sense — global, immediate, public, adversarial — worked faster than any journal could have managed, and reached a correct conclusion.</p>
<h2 id="the-next-extraordinary-claim">The Next Extraordinary Claim</h2>
<p>LK-99 is over. The compound will appear in future textbooks, probably in a sidebar about famous failed claims in condensed matter physics, alongside Schön and Dias and others. The researchers who synthesized and characterized it honestly will get some credit for the negative result; the original Korean team will, I imagine, have a difficult few years professionally.</p>
<p>The question I am left with is what happens next time.</p>
<p>Room-temperature superconductivity will, almost certainly, be claimed again. The prize is too large and the search too active. Possibly the claim will be correct — I would not put that probability at zero. More likely it will be another false positive, another Cu₂S lurking in the impurity profile.</p>
<p>Will the media learn from LK-99? I am genuinely uncertain. The incentive structure for science journalism rewards excitement over accuracy, and &ldquo;extraordinary claim requires replication&rdquo; is a less clickable headline than &ldquo;room-temperature superconductor discovered.&rdquo; The journalists who wrote those headlines were not stupid; they were responding rationally to the incentives of their profession.</p>
<p>Will the scientific community respond as effectively? I think so, at least for claims of this kind: testable, synthesis-based, with enough labs in the world capable of attempting replication. The infrastructure — preprints, Discord, shared repositories — exists and is now demonstrated to work. The speed of the LK-99 falsification sets a kind of benchmark.</p>
<p>What the episode showed, in the end, is not that science is infallible or that the system is without problems. It showed that, under the right conditions — a clear empirical question, a distributed community with the tools and motivation to address it, and a culture of open data sharing — science can self-correct at remarkable speed. The failure was in communication, not in the science. That is a meaningful distinction.</p>
<p>Whether the media will have learned anything by the time the next extraordinary claim appears — that, I confess, I doubt.</p>
<h2 id="references">References</h2>
<ul>
<li>
<p><span id="ref-LeeKim2023"></span>Lee, S., Kim, J. H., &amp; Kwon, Y.-W. (2023). The First Room-Temperature Ambient-Pressure Superconductor. <em>arXiv</em>:2307.12008. Retrieved from <a href="https://arxiv.org/abs/2307.12008">https://arxiv.org/abs/2307.12008</a></p>
</li>
<li>
<p><span id="ref-Senapati2023"></span>Kumar, K., Surface, N. B., &amp; Baral, B. (2023). Absence of superconductivity in LK-99 at ambient conditions. <em>arXiv</em>:2308.03544. Retrieved from <a href="https://arxiv.org/abs/2308.03544">https://arxiv.org/abs/2308.03544</a></p>
</li>
<li>
<p><span id="ref-LiuMeng2023"></span>Liu, S., &amp; Meng, S. (2023). Symmetry-breaking and the origin of the anomalous properties of LK-99. <em>arXiv</em>:2308.05135. Retrieved from <a href="https://arxiv.org/abs/2308.05135">https://arxiv.org/abs/2308.05135</a></p>
</li>
<li>
<p><span id="ref-Garisto2023"></span>Garisto, D. (2023). LK-99 isn&rsquo;t a superconductor — how science sleuths solved the mystery. <em>Nature</em>, 620, 705–706. <a href="https://doi.org/10.1038/d41586-023-02585-7">DOI: 10.1038/d41586-023-02585-7</a></p>
</li>
<li>
<p><span id="ref-Snider2020"></span>Snider, E., Dasenbrock-Gammon, N., McBride, R., Debessai, M., Vindana, H., Vencatasamy, K., Lawler, K. V., Salamat, A., &amp; Dias, R. P. (2020). Room-temperature superconductivity in a carbonaceous sulfur hydride. <em>Nature</em>, 586, 373–377. <a href="https://doi.org/10.1038/s41586-020-2801-z">DOI: 10.1038/s41586-020-2801-z</a> (Retracted 2022.)</p>
</li>
<li>
<p><span id="ref-DiasSilvera2017"></span>Dias, R. P., &amp; Silvera, I. F. (2017). Observation of the Wigner-Huntington transition to metallic hydrogen. <em>Science</em>, 355, 715–718. <a href="https://doi.org/10.1126/science.aal1579">DOI: 10.1126/science.aal1579</a> (Erratum published 2017; widely questioned.)</p>
</li>
<li>
<p><span id="ref-Hirsch2021"></span>Hirsch, J. E. (2021). Rejoinder to &ldquo;Comment on &lsquo;Absence of magnetic evidence for superconductivity in hydride compounds&rsquo;&rdquo; by Dias and Salamat. <em>Physica C</em>, 590, 1353964. <a href="https://doi.org/10.1016/j.physc.2021.1353964">DOI: 10.1016/j.physc.2021.1353964</a></p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-09-14</strong>: Updated the Cu₂S characterisation: pure Cu₂S is diamagnetic; the ferromagnetism in LK-99 samples comes from impurity phases. Updated the Dias &amp; Silvera 2017 <em>Science</em> paper status: it received an erratum but was not formally retracted (unlike the 2020 and 2023 <em>Nature</em> papers). Updated the Senapati et al. reference to the correct LK-99 debunking literature (the previous arXiv ID resolved to a different paper).</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Zero Angular Momentum: The Falling Cat and the Geometry of Shape Space</title>
      <link>https://sebastianspicker.github.io/posts/falling-cat-geometric-phase/</link>
      <pubDate>Tue, 03 Oct 2023 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/falling-cat-geometric-phase/</guid>
      <description>A cat dropped upside-down rotates 180° and lands on its feet, despite having zero angular momentum throughout. This is not a trick and not a violation of physics. The explanation took physicists from 1894 to 1993 to fully work out, and the answer — a geometric phase arising from the holonomy of a fiber bundle — is the same mathematics that governs the Berry phase in quantum mechanics and the Aharonov-Bohm effect in electrodynamics. We adopted two strays this year. They fall beautifully.</description>
      <content:encoded><![CDATA[<p><em>We adopted two stray cats in 2023. They had been living under a garden shed and
had strong opinions about most things, including the correct height from which to
leap onto a bookshelf and whether landing was optional. They are indoor cats now,
for health reasons — a vet&rsquo;s recommendation they find unconvincing but have largely
accepted. Watching one of them drop
from a windowsill — always feet-first, always orientated correctly, from heights
that would leave me reconsidering my life choices — I found myself thinking about
a problem I had first encountered in a mechanics course and had never fully
resolved to my satisfaction.</em></p>
<p><em>How does a cat rotate with zero angular momentum?</em></p>
<hr>
<h2 id="the-problem">The Problem</h2>
<p>When a cat is dropped from an inverted position — upside-down, held by a
practised experimenter, then released — it rotates approximately 180° and
lands on its feet. The drop takes around 0.3 seconds. The cat begins with
negligible angular momentum (the experimenter can release it with almost no
spin), and there are no external torques during free fall. By conservation of
angular momentum, the total angular momentum of the cat must remain constant
throughout the fall.</p>
<p>The total angular momentum is therefore approximately zero throughout the
fall.</p>
<p>And yet the cat rotates 180°.</p>
<p>This is the falling cat problem. It was first documented quantitatively by
Étienne-Jules Marey in 1894 using chronophotography — among the first
high-speed photography of any biological motion — and it has
occupied physicists, mathematicians, neuroscientists, and roboticists ever
since.</p>
<p>The problem is not exotic. Every cat owner has seen it. What requires
explanation is why our intuitions about angular momentum fail here, and what
replaces them.</p>
<hr>
<h2 id="why-the-obvious-answers-do-not-work">Why the Obvious Answers Do Not Work</h2>
<p>There are two naive explanations for the cat&rsquo;s righting reflex, both wrong.</p>
<p><strong>Explanation 1: The cat uses initial angular momentum.</strong> The experimenter
gives the cat a small spin before releasing it; the cat amplifies this to
achieve the full 180°. This fails because controlled experiments (and Marey&rsquo;s
original photographs) confirm that cats can right themselves even when
released with zero initial spin. Careful experimenters have verified this
explicitly.</p>
<p><strong>Explanation 2: The cat pushes against the air.</strong> A falling cat could, in
principle, use aerodynamic forces to push against the air and generate a
reaction. This fails because the angular impulse from air drag over 0.3
seconds is far too small to account for the observed 180° rotation. Marey&rsquo;s
chronophotographs already showed that the motion begins immediately on
release, before air resistance could contribute meaningfully.</p>
<p>Both explanations appeal to external torques. The correct explanation requires
none.</p>
<hr>
<h2 id="marey-and-the-photographic-evidence">Marey and the Photographic Evidence</h2>
<p>Étienne-Jules Marey published his chronophotographic sequence of a falling
cat in <em>La Nature</em> on 10 November 1894. The images, taken at 60 frames per
second, show the following clearly:</p>
<ol>
<li>The front and rear halves of the cat move <em>asymmetrically</em>. The front half
rotates in one direction; the rear half rotates by a smaller angle in the
opposite direction.</li>
<li>The cat pulls its front legs in close to its body (reducing the moment of
inertia of the front half) while extending its rear legs (increasing the
moment of inertia of the rear half).</li>
<li>The front half then rotates rapidly (large angle, small moment of inertia);
the rear half rotates slowly in the opposite direction (small angle, large
moment of inertia).</li>
<li>The cat then extends its front legs and pulls in its rear legs, and reverses
the process.</li>
</ol>
<p>The net effect: the cat&rsquo;s body orientation rotates by 180° even though the
<em>total</em> angular momentum — computed as the sum of both halves — remains
constant. The key word is <em>sum</em>. Individual parts can exchange angular momentum
through internal torques; the sum is conserved.</p>
<p>This mechanism — internal redistribution of angular momentum without changing
its total — is correct but not complete. It explains <em>that</em> rotation is
possible, not <em>how much</em> rotation is achieved per cycle of shape change. For
that, we need the mathematics.</p>
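<p>Before the full machinery, a toy version of the bookkeeping is worth writing down. The sketch below is not the Kane–Scher model of the next section (there is no bend and no no-twist constraint); it is just two coaxial rotors with zero total angular momentum at every instant, whose moments of inertia are alternately made small and large while they twist against each other. All the parameters are invented for illustration.</p>
<pre><code class="language-python">import numpy as np

# Two coaxial rotors ("front" and "rear") with zero total angular momentum:
#   I_f * omega_f + I_r * omega_r = 0,   relative twist psi = theta_f - theta_r.
# Eliminating time, each increment of relative twist d(psi) advances the halves by
#   d(theta_f) = + I_r / (I_f + I_r) * d(psi)
#   d(theta_r) = - I_f / (I_f + I_r) * d(psi)

I_SMALL, I_LARGE = 0.2, 1.0          # arb. units: legs tucked vs. legs extended
D_PSI = np.radians(1.0)              # integration step for the relative twist
TWIST = np.radians(120.0)            # relative twist accumulated in each phase

def run_phase(theta_f, theta_r, I_f, I_r, d_psi_total):
    """Integrate one phase of the cycle at fixed moments of inertia."""
    steps = int(round(abs(d_psi_total) / D_PSI))
    d_psi = d_psi_total / steps
    for _ in range(steps):
        theta_f += I_r / (I_f + I_r) * d_psi
        theta_r -= I_f / (I_f + I_r) * d_psi
    return theta_f, theta_r

theta_f = theta_r = 0.0
# Phase 1: front tucked, rear extended; twist the front forward relative to the rear.
theta_f, theta_r = run_phase(theta_f, theta_r, I_SMALL, I_LARGE, +TWIST)
# Phase 2: swap (front extended, rear tucked); untwist back to the original shape.
theta_f, theta_r = run_phase(theta_f, theta_r, I_LARGE, I_SMALL, -TWIST)

print(f"net rotation, front half: {np.degrees(theta_f):6.1f} deg")
print(f"net rotation, rear half:  {np.degrees(theta_r):6.1f} deg")
print(f"residual relative twist:  {np.degrees(theta_f - theta_r):6.1f} deg")
# Both halves end up rotated by the same net angle, here
# 120 deg * (I_LARGE - I_SMALL) / (I_LARGE + I_SMALL) = 80 deg, with the shape
# (the relative twist) restored and the total angular momentum zero throughout.
</code></pre>
<p>Note what the toy model cannot do: reaching a full 180° this way would demand an anatomically absurd relative twist between the two halves. The point of the real manoeuvre, and of the Kane–Scher analysis below, is that bending the body lets the cat achieve a large net rotation without a large twist at the spine.</p>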
<hr>
<h2 id="kane-and-scher-the-two-cylinder-model">Kane and Scher: The Two-Cylinder Model</h2>
<p>The first rigorous mechanical model was published by T.R. Kane and M.P. Scher
in 1969 (<em>International Journal of Solids and Structures</em> 5, 663–670).</p>
<p>They modelled the cat as two rigid axisymmetric cylinders — a front half and
a rear half — connected at a joint that allows relative bending and twisting.
The joint constraint imposes that the relative twist between the two halves is
zero (a &ldquo;no-twist&rdquo; condition: the cylinders cannot spin relative to each other
at their connection). The total angular momentum of the system is held fixed
at zero.</p>
<p>Let the two cylinders have moments of inertia $I_1$ and $I_2$ about their
symmetry axes, and let $\phi$ be the bend angle between them and $\psi$ the
twist angle. The zero-angular-momentum constraint, combined with the no-twist
condition, gives a system of equations that can be integrated numerically to
find the net body rotation as a function of the shape-change trajectory
$(\phi(t), \psi(t))$.</p>
<p>Kane and Scher showed that a specific sequence of shape changes — one complete
cycle in the $(\phi, \psi)$ plane — produces a net rotation of approximately
90–100°. A second cycle gives the rest. The calculation was the first to
confirm, from mechanics alone, that the righting manoeuvre requires no external
torques and is entirely consistent with conservation of angular momentum.</p>
<p>What the Kane–Scher model does not explain is <em>why</em> the net rotation per cycle
depends on the area enclosed by the trajectory in shape space — or why the
same mathematical structure appears in quantum mechanics. For that, we need
Montgomery&rsquo;s formulation.</p>
<hr>
<h2 id="montgomery-fiber-bundles-and-geometric-holonomy">Montgomery: Fiber Bundles and Geometric Holonomy</h2>
<p>In 1993, Richard Montgomery published a reformulation of the falling cat problem
using gauge theory (<em>Dynamics and Control of Mechanical Systems</em>, Fields
Institute Communications, AMS, pp. 193–218). The reformulation is the
definitive mathematical treatment, and it connects the cat to one of the deepest
structures in modern physics.</p>
<h3 id="the-configuration-space">The Configuration Space</h3>
<p>The full configuration space of the cat — the space of all possible positions
and orientations — is</p>
$$Q = SO(3) \times \mathcal{S},$$<p>where $SO(3)$ is the rotation group (describing the cat&rsquo;s overall orientation
in space) and $\mathcal{S}$ is the <em>shape space</em> (describing the internal
geometry: the bend angle, the twist, the position of each limb relative to the
body).</p>
<p>The angular momentum constraint $\mathbf{L} = 0$ defines a <em>horizontal
distribution</em> on $Q$ — a preferred subspace of tangent vectors at each point
that correspond to shape changes at zero angular momentum. This distribution is
not integrable (it does not come from a foliation), which is the mathematical
signature that holonomy is possible.</p>
<h3 id="the-fiber-bundle">The Fiber Bundle</h3>
<p>The projection</p>
$$\pi \colon Q \to \mathcal{S}, \qquad (R, s) \mapsto s,$$<p>makes $Q$ into a principal fiber bundle over $\mathcal{S}$ with structure group
$SO(3)$. The fiber above each shape $s \in \mathcal{S}$ is the set of all
orientations the cat can have with that shape.</p>
<p>A <em>connection</em> on this bundle is a rule for &ldquo;lifting&rdquo; paths in the base
$\mathcal{S}$ to horizontal paths in the total space $Q$ — that is, paths
along which the angular momentum constraint is satisfied. This connection
$\mathcal{A}$ is a one-form on $\mathcal{S}$ taking values in the Lie algebra
$\mathfrak{so}(3)$.</p>
<h3 id="holonomy-the-geometric-phase">Holonomy: The Geometric Phase</h3>
<p>When the cat executes a closed loop $\gamma$ in shape space — a sequence of
shape changes that returns it to its initial shape — the <em>holonomy</em> of the
connection $\mathcal{A}$ around $\gamma$ gives the net rotation:</p>
$$R_\gamma = \mathrm{Hol}_\mathcal{A}(\gamma) \in SO(3).$$<p>For the full non-Abelian case ($SO(3)$), the holonomy is a path-ordered
exponential along $\gamma$ and its relationship to the curvature involves
non-Abelian corrections. But the essential geometric intuition is captured
by the Abelian case — rotation about a single axis — where Stokes&rsquo;s theorem
gives the net rotation directly:</p>
$$\theta_\gamma = \iint_{\Sigma} F,$$<p>where $\Sigma$ is a surface bounded by $\gamma$ and $F = d\mathcal{A}$
is the curvature 2-form. The cat&rsquo;s net rotation per cycle is the integral
of the curvature over the area enclosed by its shape-change loop in
$\mathcal{S}$. For small loops, the curvature $F_\mathcal{A} = d\mathcal{A} + \mathcal{A} \wedge \mathcal{A}$ determines the holonomy to leading order in both the Abelian and non-Abelian cases.</p>
<p>The rotation is <em>geometric</em>: it depends on the shape of the loop, not on the
speed at which the loop is traversed. A cat executing the same shape-change
sequence twice as fast achieves the same rotation in half the time.</p>
<hr>
<h2 id="the-connection-to-berry-phase">The Connection to Berry Phase</h2>
<p>The gauge structure of the falling cat problem is not an isolated curiosity.
It is the same mathematical structure that governs several central phenomena
in modern physics.</p>
<p><strong>The Berry phase</strong> (Berry 1984, <em>Proceedings of the Royal Society A</em>) arises
when a quantum system is transported adiabatically around a closed loop $C$ in
parameter space. The state acquires a phase</p>
$$\gamma_B = \oint_C \mathbf{A} \cdot d\mathbf{R},$$<p>where $\mathbf{A} = i\langle n(\mathbf{R}) | \nabla_\mathbf{R} | n(\mathbf{R}) \rangle$
is the Berry connection — a gauge field on parameter space. The Berry phase is
the holonomy of this connection, which is to say: the cat righting itself and
a quantum state accumulating a geometric phase are instances of the <em>same
mathematical theorem</em>.</p>
<p>Shapere and Wilczek (1989) made this connection explicit for deformable bodies,
noting that the net rotation of a swimming microorganism or a falling cat is
the holonomy of a gauge connection on shape space — exactly the Berry phase,
expressed in the language of classical mechanics.</p>
<p><strong>The Foucault pendulum</strong> precesses at a rate of $2\pi\sin\phi$ per sidereal
day, where $\phi$ is the latitude. The holonomy of the Levi-Civita connection
on $S^2$ for parallel transport around the circle of latitude is the solid
angle of the enclosed polar cap, $\Omega = 2\pi(1 - \sin\phi)$. The
lab-frame precession $2\pi\sin\phi = 2\pi - \Omega$ is the complementary
angle — the two sum to a full rotation because the local frame itself
completes one circuit per sidereal day. It is another geometric phase.</p>
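<p>The solid-angle statement is easy to check numerically. The sketch below parallel-transports a tangent vector around a circle of latitude on the unit sphere, by repeatedly projecting the vector onto the local tangent plane as the base point moves, and compares the rotation it picks up with the enclosed solid angle $2\pi(1 - \sin\phi)$. It is a toy verification at a latitude of my own choosing, nothing more.</p>
<pre><code class="language-python">import numpy as np

# Parallel-transport a tangent vector around a circle of latitude on the unit
# sphere and compare the resulting rotation with the enclosed solid angle
# 2*pi*(1 - sin(latitude)), the holonomy behind the Foucault precession.

LAT = np.radians(51.0)                       # an arbitrary example latitude
N = 20_000                                   # number of small transport steps

def point(t):
    """Point on the circle of latitude, parametrized by longitude t."""
    return np.array([np.cos(LAT) * np.cos(t),
                     np.cos(LAT) * np.sin(t),
                     np.sin(LAT)])

ts = np.linspace(0.0, 2.0 * np.pi, N + 1)
v = np.array([0.0, 1.0, 0.0])                # initial tangent vector, pointing east
v = v - np.dot(v, point(ts[0])) * point(ts[0])   # ensure it is tangent at the start
v = v / np.linalg.norm(v)
v0 = v.copy()

for t in ts[1:]:
    n = point(t)                             # unit normal at the new base point
    v = v - np.dot(v, n) * n                 # project onto the new tangent plane
    v = v / np.linalg.norm(v)                # keep unit length

transported_angle = np.degrees(np.arccos(np.clip(np.dot(v, v0), -1.0, 1.0)))
solid_angle = np.degrees(2.0 * np.pi * (1.0 - np.sin(LAT)))

print(f"rotation of the transported vector: {transported_angle:6.2f} deg")
print(f"enclosed solid angle:               {solid_angle:6.2f} deg")
# Both print roughly 80 deg at 51 deg latitude: the holonomy of the loop equals
# the solid angle it encloses, which is the content of the Foucault result above.
</code></pre>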
<p><strong>The Aharonov-Bohm effect</strong> (1959) produces a phase shift for electrons
circling a solenoid, even when the electrons travel only through field-free
regions. The phase is the holonomy of the electromagnetic vector potential
$\mathbf{A}$ around the loop — a Berry phase for the electromagnetic field.</p>
<p>All four phenomena — the falling cat, the Berry phase, the Foucault pendulum,
the Aharonov-Bohm effect — are manifestations of the same structure: a
connection on a fiber bundle, and holonomy as the geometric consequence of
traversing a closed loop.</p>
<p>Batterman (2003, <em>Studies in History and Philosophy of Modern Physics</em> 34,
527–557) gives a particularly clear account of this unification, drawing out
the common mathematical skeleton and its physical implications.</p>
<hr>
<h2 id="high-rise-syndrome-terminal-velocity-and-the-parachute-cat">High-Rise Syndrome: Terminal Velocity and the Parachute Cat</h2>
<p>There is a grounding empirical footnote to the elegant geometry above. Whitney
and Mehlhaff (1987, <em>Journal of the American Veterinary Medical Association</em>
191, 1399–1403) analysed 132 cats brought to a Manhattan veterinary clinic after
falling from buildings of two to thirty-two stories. Their finding was
counterintuitive:</p>
<p>Cats falling from above seven stories had a <em>lower</em> injury rate than cats
falling two to six stories. Overall, 90% of the cats in the study survived,
with injuries paradoxically less severe at greater heights.</p>
<p>The explanation involves two phases. Below seven stories, the cat is still
accelerating: it is tense, its legs are extended to brace for impact, and it
absorbs the force of landing poorly. Above seven stories, the cat reaches
terminal velocity — approximately $100\,\mathrm{km/h}$ for a falling cat — and
then, apparently, <em>relaxes</em>. The vestibular system, having identified that the
fall is not ending imminently, switches from the righting reflex to a
parachute posture: legs spread horizontally, body flattened, increasing the
cross-sectional area and hence air resistance.</p>
<p>Terminal velocity is reached when the drag force equals the gravitational force:</p>
$$mg = \frac{1}{2} C_D \rho A v_t^2, \qquad
v_t = \sqrt{\frac{2mg}{C_D \rho A}}.$$<p>For a spread-eagle cat ($m \approx 4\,\mathrm{kg}$, $A \approx 0.06\,\mathrm{m}^2$,
$C_D \approx 1.0$, $\rho_\mathrm{air} \approx 1.2\,\mathrm{kg/m}^3$):</p>
$$v_t \approx \sqrt{\frac{2 \times 4 \times 9.8}{1.0 \times 1.2 \times 0.06}}
\approx 33\,\mathrm{m/s} \approx 120\,\mathrm{km/h}.$$<p>(The exact value depends on posture and fur drag; empirical estimates for
cats in the parachute posture are lower, roughly $25$–$30\,\mathrm{m/s}$,
because the effective drag area (limbs fully spread, plus the fur) is larger
than the bare cross-section assumed above.)</p>
<p>A human in free-fall has terminal velocity around $55\,\mathrm{m/s}$
($200\,\mathrm{km/h}$) — faster, because the mass-to-area ratio is higher.
The cat, with its low mass and high drag relative to body weight, hits a
gentler terminal velocity and distributes the impact more effectively.</p>
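<p>The two estimates are easy to reproduce. In the snippet below, the cat parameters are the ones used above; the human parameters are round numbers of my own choosing, picked to land near the commonly quoted belly-down figure rather than taken from any particular measurement.</p>
<pre><code class="language-python">import math

def terminal_velocity(mass_kg, area_m2, drag_coeff, rho_air=1.2, g=9.8):
    """v_t = sqrt(2 m g / (C_D rho A)): the speed at which drag balances weight."""
    return math.sqrt(2.0 * mass_kg * g / (drag_coeff * rho_air * area_m2))

# Spread-eagle cat, with the same parameters as the estimate above:
v_cat = terminal_velocity(mass_kg=4.0, area_m2=0.06, drag_coeff=1.0)

# Belly-down human: round numbers chosen to land near the commonly quoted figure.
v_human = terminal_velocity(mass_kg=80.0, area_m2=0.7, drag_coeff=0.6)

print(f"cat:   v_t = {v_cat:4.1f} m/s  ({v_cat * 3.6:5.1f} km/h)")
print(f"human: v_t = {v_human:4.1f} m/s  ({v_human * 3.6:5.1f} km/h)")
# Prints roughly 33 m/s (119 km/h) for the cat and 56 m/s (201 km/h) for the human.
</code></pre>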
<p>The study is sometimes cited as evidence that cats are invincible. A significant
caveat is <strong>survivorship bias</strong>: cats that died on impact were likely not brought
to the veterinary clinic, so the dataset underrepresents fatal outcomes,
especially for higher falls. The apparent decrease in injury rate above seven
stories may partly reflect the fact that the most severely injured cats from
those heights never entered the study. The aerodynamic posture explanation is
plausible, but the data do not cleanly separate it from the sampling bias.</p>
<hr>
<h2 id="robotics-and-spacecraft">Robotics and Spacecraft</h2>
<p>The falling cat problem has practical applications beyond veterinary statistics.</p>
<p><strong>Spacecraft attitude control</strong>: Astronauts in free fall can change their
body orientation without thrusters, using the same gauge-theoretic mechanism
as the cat. NASA and ESA have studied cat-inspired reorientation manoeuvres
for astronauts and satellites.</p>
<p><strong>Robotics</strong>: The two-cylinder model inspired early robot designs capable of
reorienting in free fall — useful for robots deployed from aircraft or
spacecraft. Subsequent work (including a 2022 review in <em>IEEE Transactions on
Robotics</em>) has produced legged robots that can right themselves after being
knocked over using shape-change sequences derived from the Montgomery connection.</p>
<p><strong>Gymnastics and diving</strong>: Human athletes performing somersaults and twists
exploit the same gauge structure, though without articulating the mathematics.
A tuck increases rotation rate (smaller $I$, constant $L$ → larger $\omega$);
a layout decreases it. Changing the tuck–layout timing mid-rotation produces
a net twist — holonomy in the shape space of a human body.</p>
<hr>
<h2 id="the-view-from-a-windowsill">The View from a Windowsill</h2>
<p>My cats have no opinion about fiber bundles. When one of them drops from the
top of the bookcase, she is not solving the variational problem</p>
$$\min_{\gamma \in \Omega} \int_\gamma |\dot{s}|^2 \, dt,
\quad \text{subject to } \mathrm{Hol}_\mathcal{A}(\gamma) = R_{180°},$$<p>she is executing a motor program refined over millions of years of feline
evolution. The vestibular system provides continuous feedback on body
orientation; the cerebellum coordinates the shape-change sequence; the whole
manoeuvre is over in a third of a second.</p>
<p>What physics tells us is that the manoeuvre is <em>possible</em> — that no law of
nature forbids a body with zero angular momentum from reorienting — and gives
the precise geometric reason: the curvature of a connection on shape space is
non-zero, which means the holonomy of closed loops is non-trivial.</p>
<p>The same curvature that allows a cat to right itself allows a quantum state to
accumulate a geometric phase, allows the Foucault pendulum to precess, and
allows the Aharonov-Bohm effect to shift an interference fringe without a local
field. These are not analogies. They are the same theorem, applied to different
physical systems in different mathematical languages.</p>
<p>I find this more remarkable than the cat.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Batterman, R.W. (2003). Falling cats, parallel parking, and polarized light.
<em>Studies in History and Philosophy of Modern Physics</em>, 34(4), 527–557.
<a href="https://doi.org/10.1016/S1355-2198(03)00062-5">https://doi.org/10.1016/S1355-2198(03)00062-5</a></p>
</li>
<li>
<p>Berry, M.V. (1984). Quantal phase factors accompanying adiabatic changes.
<em>Proceedings of the Royal Society A</em>, 392, 45–57.
<a href="https://doi.org/10.1098/rspa.1984.0023">https://doi.org/10.1098/rspa.1984.0023</a></p>
</li>
<li>
<p>Gbur, G.J. (2019). <em>Falling Felines and Fundamental Physics.</em> Yale University
Press.</p>
</li>
<li>
<p>Kane, T.R., &amp; Scher, M.P. (1969). A dynamical explanation of the falling cat
phenomenon. <em>International Journal of Solids and Structures</em>, 5(7), 663–670.
<a href="https://doi.org/10.1016/0020-7683(69)90086-9">https://doi.org/10.1016/0020-7683(69)90086-9</a></p>
</li>
<li>
<p>Marey, É.-J. (1894). Des mouvements que certains animaux exécutent pour
retomber sur leurs pieds lorsqu&rsquo;ils sont précipités d&rsquo;un lieu élevé. <em>La
Nature</em>, 10 November 1894.</p>
</li>
<li>
<p>Montgomery, R. (1993). Gauge theory of the falling cat. In M. Enos (Ed.),
<em>Dynamics and Control of Mechanical Systems</em> (Fields Institute Communications,
Vol. 1, pp. 193–218). American Mathematical Society.</p>
</li>
<li>
<p>Shapere, A., &amp; Wilczek, F. (Eds.). (1989). <em>Geometric Phases in Physics.</em>
World Scientific.</p>
</li>
<li>
<p>Whitney, W.O., &amp; Mehlhaff, C.J. (1987). High-rise syndrome in cats. <em>Journal
of the American Veterinary Medical Association</em>, 191(11), 1399–1403.</p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-12-15</strong>: Corrected the Marey publication date from 22 November 1894 to 10 November 1894 (in text and in reference). Updated the Whitney &amp; Mehlhaff (1987) statistics to reflect that the 90% survival rate applies to all cats in the study, as reported in the paper, rather than specifically to those falling from above seven stories.</li>
</ul>
]]></content:encoded>
    </item>
  </channel>
</rss>
