<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Sebastian Spicker</title>
    <link>https://sebastianspicker.github.io/</link>
    <description>Recent content on Sebastian Spicker</description>
    <image>
      <title>Sebastian Spicker</title>
      <url>https://sebastianspicker.github.io/og-image.png</url>
      <link>https://sebastianspicker.github.io/og-image.png</link>
    </image>
    <generator>Hugo -- 0.160.0</generator>
    <language>en</language>
    <lastBuildDate>Tue, 07 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://sebastianspicker.github.io/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>There Is an App for That — Until There Isn&#39;t</title>
      <link>https://sebastianspicker.github.io/posts/automatable-unautomatable-baumol-mental-health/</link>
      <pubDate>Tue, 07 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/automatable-unautomatable-baumol-mental-health/</guid>
      <description>German health insurance will reimburse a mental health app within days but cannot provide a therapist within six months. Last week, psychotherapy fees were cut by 4.5%. Baumol&amp;rsquo;s cost disease — originally about why string quartets get relatively more expensive — explains why the app gold rush and the collapse of mental health provision are the same phenomenon.</description>
      <content:encoded><![CDATA[<p>Someone vibe coded an app that tells you how many layers to wear today. It has 85,000 users. Someone else tracks her eyelash styles — every new set gets a photo and a note about the method. A father built Storypot: his kids drag emoji into a virtual pot and the app generates a bedtime story. A product manager made Standup Buddy, which randomises who talks first in a daily meeting. That is the entire feature.
These are not bad things. Some of them are genuinely lovely — Storypot in particular. The layers app clearly meets a need, given 85,000 people agree. I have built tools like this myself — I automated my concert setlist workflow and <a href="/posts/setlist-to-playlist/">wrote about it on this blog</a> — and the feeling of compressing a forty-minute ritual into four minutes of machine-assisted execution is real and satisfying.</p>
<p>There is a term for this now. Karpathy coined it in early 2025: vibe coding. You describe what you want, the model writes the code, you run it, you fix what breaks by describing the fix, and at no point do you necessarily understand what the code does. The barrier to building software has not been lowered so much as removed. A single person with an afternoon and a language model can ship what would have required a team and a quarter, two years ago.</p>
<p>Meanwhile. In Germany, the average wait from an initial consultation to the start of psychotherapy is 142 days — nearly five months — according to a BPtK analysis of statutory insurance billing data <a href="#ref-1">[1]</a>. The Telefonseelsorge — the crisis line, the last resort — handled 1.2 million calls in 2024. It is staffed by approximately 7,700 volunteers and funded primarily by the Protestant and Catholic churches. Its financing is described, in its own institutional language, as <em>äußerst angespannt</em> — extremely strained <a href="#ref-2">[2]</a>. Six days ago, on April 1, psychotherapy fees in Germany were cut by 4.5% <a href="#ref-3">[3]</a>. The thesis of this post is structural, not moral. There is a class of work that scales, and a class of work that does not. Our entire economy of attention — cultural, financial, technological — is optimised for the first class. The second class is not merely neglected. It is being made structurally more expensive, in a precise economic sense, by the very productivity gains that make the first class so intoxicating. And the policy apparatus, facing this structural pressure, is doing exactly what you would predict: it is funding apps.</p>
<p>The economist William Baumol explained the mechanism in 1966. It has a name, and the name is a diagnosis.</p>
<hr>
<h2 id="the-seduction-of-leverage">The Seduction of Leverage</h2>
<p>What makes vibe coding culturally significant is not the code. It is the leverage. A single developer, aided by a language model, can produce software that reaches millions of users. The marginal cost of an additional user approaches zero. The output scales without bound while the input — one person, one prompt, one afternoon — stays fixed. This is the defining characteristic of automatable work: the ratio of output to input can grow without limit.</p>
<p>This is not new. Software has always had this property. What is new is that the barrier to producing software has collapsed. You no longer need to understand data structures, or networking, or the programming language. You need an idea and a few hours. The productivity frontier has shifted so dramatically that the interesting constraint is no longer <em>can I build it</em> but <em>should anyone use it</em>. The cultural response has been euphoric. Communities, podcasts, courses, manifestos. People who have never written a line of code are shipping products. I am not interested in dismissing this. The ability to build is a form of agency, and more people having it is not, in itself, a problem. The problem is what the euphoria obscures.</p>
<h2 id="what-therapy-actually-requires">What Therapy Actually Requires</h2>
<p>A psychotherapy session has the following structure. One therapist sits with one patient for approximately fifty minutes. The therapist listens, observes, formulates, responds. The patient speaks, reflects, resists, revises. The therapeutic alliance — the quality of the relationship between therapist and patient — is one of the most robust predictors of treatment outcome, across modalities, across conditions, across decades of research <a href="#ref-4">[4]</a>. This is not a feature that can be optimised away. It is the mechanism of action. When a meta-analysis finds that the specific technique matters less than the relationship — that CBT, psychodynamic, and humanistic therapies produce roughly equivalent outcomes when the alliance is strong — it is telling you that the human in the room is not an implementation detail. The human in the room <em>is</em> the intervention.</p>
<p>You cannot parallelise this. A therapist cannot see two patients simultaneously without degrading the thing that makes the session work. You cannot batch it — twelve people in a room is group therapy, which is a different intervention with different dynamics and different limitations. You cannot cache it — the session is not a retrieval operation over stored responses but an emergent interaction that depends on what happens in the room that day. The irreducible unit of therapy is: one trained human, fully present, for one hour, with one other human. This has not changed since Freud&rsquo;s consulting room on Berggasse 19, and no plausible technological development will change it, because the presence <em>is</em> the treatment. A therapist working full-time can see roughly twenty-five to thirty patients per week. That is the ceiling. It is set by the biology of attention and the ethics of care, not by inefficiency.</p>
<h2 id="baumols-cost-disease">Baumol&rsquo;s Cost Disease</h2>
<p>In 1966, the economists William Baumol and William Bowen published <em>Performing Arts, The Economic Dilemma</em>, a study of why orchestras, theatre companies, and dance troupes were perpetually in financial crisis despite growing audiences and rising cultural prestige <a href="#ref-5">[5]</a>. Their diagnosis was precise. A string quartet requires four musicians and approximately forty minutes to perform Beethoven&rsquo;s Op. 131. This was true in 1826 and is true in 2026. The productivity of the quartet — measured in output per unit of labour input — has not increased. It cannot increase. The performance <em>is</em> the labour.</p>
<p>Meanwhile, the productivity of a textile worker, a steelworker, a software developer has increased by orders of magnitude. Wages in the productive sectors rise because productivity rises. Wages in the stagnant sectors (Baumol&rsquo;s term for sectors whose productivity cannot rise) must keep pace — not because musicians deserve parity as a matter of justice, though they may, but because if they do not keep pace, musicians will leave for sectors that pay more. The quartet must compete in the same labour market as the factory and the tech company.</p>
<p>The result: the relative cost of live performance rises without bound. Not because musicians got worse. Not because audiences stopped caring. But because everything else got cheaper, and the quartet cannot. Baumol later generalised the result beyond the performing arts to all services in which the labour itself constitutes the product: education, healthcare, legal services, and — centrally for our purposes — psychotherapy <a href="#ref-6">[6]</a>. A therapy session is a string quartet. The labour is the product. The productivity cannot increase. The cost, relative to the scalable economy, rises every time the scalable economy gets more productive. And vibe coding is a massive productivity shock to the scalable economy.</p>
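<p>The mechanism is simple enough to simulate. The following sketch uses invented numbers (2% annual productivity growth in the progressive sector, none in the stagnant one, a single shared wage index) purely to show the shape of the curve, not to calibrate anything:</p>

```python
# Stylised two-sector sketch of Baumol's cost disease. All numbers are
# illustrative. Sector A's productivity grows 2% a year; the quartet's
# does not. Wages track the productive sector, because both sectors
# hire from the same labour market.
years = 50
productivity_growth = 0.02

wage = 1.0                   # common wage index across both sectors
goods_per_worker = 1.0       # sector A output per worker-hour
concerts_per_worker = 1.0    # fixed forever: the performance is the labour

for _ in range(years):
    goods_per_worker *= 1 + productivity_growth
    wage *= 1 + productivity_growth   # wages follow the productive sector

unit_cost_goods = wage / goods_per_worker        # stays at 1.0
unit_cost_concert = wage / concerts_per_worker   # grows with the wage

print(f"after {years} years:")
print(f"  unit cost, goods:   {unit_cost_goods:.2f}")
print(f"  unit cost, concert: {unit_cost_concert:.2f}")
```

<p>After fifty years the concert costs roughly 2.7 times as much relative to goods, without anyone having done anything wrong. That is the whole disease.</p>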
<h2 id="there-is-an-app-for-that">There Is an App for That</h2>
<p>In 2019, the German government passed the Digitale-Versorgung-Gesetz, creating a fast-track approval process for <em>Digitale Gesundheitsanwendungen</em> — digital health applications, or DiGA. The idea: apps that can be prescribed by a doctor and reimbursed by statutory health insurance, just like medication. A patient walks into a practice, receives a prescription code, downloads the app, and the Krankenkasse pays <a href="#ref-7">[7]</a>. As of mid-2025, the BfArM directory lists roughly 58 DiGA. Nearly half target psychiatric conditions — depression, anxiety, insomnia, burnout. Names like deprexis, HelloBetter, Selfapy. A patient who would wait 142 days for a therapist can get a DiGA prescribed the same afternoon.</p>
<p>The pricing structure deserves attention. In the first twelve months after listing, manufacturers set their own price. The average: €541 per prescription <a href="#ref-8">[8]</a>. Some exceeded €2,000. After the first year, negotiated prices drop to an average of roughly €226 — but by then, the insurance has already paid the introductory rate for every early adopter. Total statutory health insurance spending on DiGA since 2020: €234 million. That spending grew 71% between 2023 and 2024 <a href="#ref-9">[9]</a>. Here is the number that should sit next to that one. A single outpatient psychotherapy session costs the insurance system approximately €115. The €234 million spent on DiGA since 2020 could have funded over two million therapy sessions — enough for roughly 80,000 complete courses of 25-session treatment. And here is the evidence question. Only 12 of the 68 DiGA that have entered the directory demonstrated a proven positive care effect at the time of inclusion. The rest were listed provisionally, with twelve months to produce evidence. About one in six were subsequently delisted — removed from the directory because the evidence did not materialise <a href="#ref-10">[10]</a>.</p>
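<p>The comparison above is a back-of-envelope calculation, and it checks out in four lines (figures as quoted from the cited reports; the 25-session course length follows the framing above):</p>

```python
# Sanity check on the figures quoted above. All inputs are the numbers
# cited in the text, not independent data.
diga_spending = 234_000_000   # € statutory spending on DiGA since 2020
session_cost = 115            # € per outpatient psychotherapy session
course_length = 25            # sessions per complete course, as above

sessions = diga_spending / session_cost
courses = sessions / course_length

print(f"{sessions:,.0f} sessions, or {courses:,.0f} complete courses")
```
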
<p>I want to be precise about what I am and am not saying. Some DiGA have a real evidence base. Structured CBT exercises delivered digitally can produce measurable short-term symptom improvement — I reviewed the Woebot trial data in an <a href="/posts/ai-companion-loneliness-ironic-process/">earlier post on AI companions</a> and took those results seriously. A DiGA that delivers psychoeducation and behavioural activation exercises is a tool, and tools can be useful. But a tool and a therapeutic relationship are not the same product delivered through different channels. They are different products. The policy framework treats them as substitutable — the patient who cannot access a therapist receives an app instead. The substitution is not a clinical judgement. It is a structural inevitability: facing the impossibility of scaling therapy, the system reaches for the scalable alternative, because the scalable alternative is what the incentive structure rewards. This is not a corruption story. This is Baumol&rsquo;s cost disease expressed through health policy. The system is doing exactly what the theory predicts.</p>
<h2 id="the-fear-and-the-compliance">The Fear and the Compliance</h2>
<p>There is an irony at the centre of the current discourse about AI and work that I want to name, because I think it is underexamined. People are afraid of AI. Specifically, they are afraid it will take their jobs. The surveys confirm this consistently — Gallup, Pew, the European Commission&rsquo;s Eurobarometer — significant fractions of the working population in every developed country report anxiety about AI-driven job displacement.</p>
<p>And yet. The same people — not a different demographic, not a separate population, the <em>same people</em> — are enthusiastically using AI to do their work. They use language models to write their emails, their reports, their presentations. They vibe code tools for their teams. They let AI draft their strategy documents, summarise their meetings, compose their performance reviews. They celebrate the productivity gain. They post about it. This is not hypocrisy. It is something more interesting: a revealed preference for automation that contradicts a stated preference against it. The fear is about structural displacement — losing the <em>role</em>. The compliance is about local optimisation — doing the <em>task</em> more efficiently. No one wakes up and decides to automate themselves out of a job. They automate one task at a time, each automation locally sensible, until the job is a shell around an AI core. And all of this activity — the fear, the adoption, the discourse, the think pieces, the congressional hearings — is directed at automatable work. The kind of work where AI is a plausible substitute.</p>
<p>No one is afraid that AI will take the crisis counsellor&rsquo;s job. No one is vibe coding a replacement for a psychiatric nurse. The work that is collapsing is not collapsing because AI replaced it. It is collapsing because it was never scalable, never attracted the capital or the talent that scalable work attracts, and every productivity gain in the scalable sector makes the unscalable sector relatively more expensive and harder to staff. The discourse about AI and jobs is, in this sense, exactly backwards. The threat is not that AI will replace the work that matters most. The threat is that it will make the work that matters most <em>invisible</em> — by making everything else so cheap and fast and abundant that we forget the expensive, slow, irreducibly human work exists at all.</p>
<h2 id="the-political-arithmetic">The Political Arithmetic</h2>
<p>On March 11, 2026, the Erweiterter Bewertungsausschuss — the body that sets fee schedules for outpatient care in Germany — imposed a flat 4.5% cut on nearly all psychotherapeutic service fees, effective April 1 <a href="#ref-3">[3]</a>. The health insurers had originally demanded 10%. Germany spends €4.6 billion annually on outpatient psychotherapy — roughly 1.5% of total statutory health insurance expenditure. The fee cut applies to this budget. The average therapist surplus — what remains after practice costs — is approximately €52 per hour <a href="#ref-11">[11]</a>. The cut is not large in percentage terms. It is large in the context of a profession that is already among the lowest-paid in outpatient medicine. Nearly half a million people signed a petition against the cuts. There were protests in Berlin, Leipzig, Hanover, Hamburg, Stuttgart, Munich. The Kassenärztliche Bundesvereinigung filed a lawsuit. The Bundespsychotherapeutenkammer called the decision <em>skandalös</em> <a href="#ref-12">[12]</a>.</p>
<p>What makes this particularly striking is the sequence. The coalition agreement signed by CDU/CSU and SPD in May 2025 explicitly addresses mental health — securing psychotherapy training financing, needs-based planning for child and adolescent psychotherapy, crisis intervention rights for psychotherapists, and a suicide prevention law. The BPtK itself welcomed the agreement as giving mental health a <em>neuen Stellenwert</em>, a new significance <a href="#ref-13">[13]</a>. Less than a year later, the same government&rsquo;s arbitration body cuts psychotherapy fees by 4.5%. The stated commitment and the enacted policy point in opposite directions. This is not unusual in politics. What is unusual is that it maps so precisely onto Baumol&rsquo;s mechanism: the coalition agreement acknowledges the problem in language; the fee schedule acknowledges it in arithmetic. And the arithmetic wins, because the arithmetic always wins when the work does not scale. The <em>Bedarfsplanung</em>, the needs-based planning system that determines how many psychotherapy seats are approved per region, was partially reformed in 2019 after decades of operating on 1990s-era ratios. The reform added roughly 800 seats. The BPtK considers it still fundamentally inadequate <a href="#ref-14">[14]</a>.</p>
<p>The arithmetic is plain. DiGA spending: growing 71% year on year. Psychotherapy fees: cut by 4.5%. The direction is unambiguous. Invest in the scalable. Cut the unscalable. And the damage compounds in a way that the policy apparatus appears not to understand, or not to care about. A therapist who leaves the profession because €52 per hour is no longer viable does not return when the cut is reversed. The training pipeline for a new clinical psychologist runs six to eight years from university admission to licensure. Over forty thousand accredited psychotherapists serve the system today <a href="#ref-14">[14]</a>. Every one who leaves creates a gap measured in decades, not budget cycles. The Telefonseelsorge, staffed by volunteers and funded by the churches, is not a mental health system. It is what remains when the mental health system is not there. Treating it as a substitute — treating 7,700 volunteers as adequate coverage for a country of 84 million — is not a policy position. It is an admission that the actual policy has failed.</p>
<h2 id="the-uncomfortable-part">The Uncomfortable Part</h2>
<p>Here is where I should, by the conventions of the form, propose a solution. I should say something about funding, about training pipelines, about recognising care work as infrastructure rather than a cost centre.</p>
<p>I think those things are true. I think we should pay therapists more, not less. I think Baumol&rsquo;s cost disease means we should <em>expect</em> this to be expensive and fund it anyway, because the alternative — accepting that people in crisis will wait 142 days while the scalable economy celebrates another productivity milestone — is a failure of collective priorities so basic that it should be uncomfortable to state plainly. But I am also the person who automated his setlist workflow and was satisfied by the compression. I vibe code things. I use AI tools daily. I am inside the attention gradient, not observing it from above. The part of me that finds leverage intoxicating is the same part that writes this blog, and I do not think I am unusual in this.</p>
<p>The structural isomorphism is exact: Baumol&rsquo;s string quartet, the therapist&rsquo;s fifty minutes, the crisis counsellor&rsquo;s phone call at 3am. The labour is the product. The product does not scale. The cost rises. The talent flows elsewhere. And the policy, rather than resisting the gradient, follows it — funding apps, cutting fees, digitising what cannot be digitised without changing what it is. The layers app reaches 85,000 users. The therapy app is reimbursed within the week. The therapist is available in five months, if at all.</p>
<p>I do not have a clean resolution to offer. I have a diagnosis — Baumol&rsquo;s cost disease, applied to the attention economy of a civilisation that has discovered how to make scalable work almost free — and an observation: the political system is not counteracting the disease. It is accelerating it. The quartet still needs four musicians. The session still needs the therapist in the room. The phone still needs someone to answer it. Nothing we are building will change this. The question is whether we notice before the people who needed the answer stop calling.</p>
<hr>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Bundespsychotherapeutenkammer. <em>Psychisch Kranke warten 142 Tage auf eine psychotherapeutische Behandlung</em>. BPtK. <a href="https://www.bptk.de/pressemitteilungen/psychisch-kranke-warten-142-tage-auf-eine-psychotherapeutische-behandlung/">https://www.bptk.de/pressemitteilungen/psychisch-kranke-warten-142-tage-auf-eine-psychotherapeutische-behandlung/</a></p>
<p><span id="ref-2"></span>[2] Evangelisch-Lutherische Kirche in Norddeutschland (2025). <em>Finanzierung der Telefonseelsorge ist äußerst angespannt</em>. <a href="https://www.kirche-mv.de/nachrichten/2025/februar/finanzierung-der-telefonseelsorge-ist-aeusserst-angespannt">https://www.kirche-mv.de/nachrichten/2025/februar/finanzierung-der-telefonseelsorge-ist-aeusserst-angespannt</a></p>
<p><span id="ref-3"></span>[3] Kassenärztliche Bundesvereinigung (2026). <em>Paukenschlag: KBV klagt gegen massive Kürzungen psychotherapeutischer Leistungen</em>. <a href="https://www.kbv.de/presse/pressemitteilungen/2026/paukenschlag-kbv-klagt-gegen-massive-kuerzungen-psychotherapeutischer-leistungen">https://www.kbv.de/presse/pressemitteilungen/2026/paukenschlag-kbv-klagt-gegen-massive-kuerzungen-psychotherapeutischer-leistungen</a></p>
<p><span id="ref-4"></span>[4] Flückiger, C., Del Re, A. C., Wampold, B. E., &amp; Horvath, A. O. (2018). The alliance in adult psychotherapy: A meta-analytic synthesis. <em>Psychotherapy</em>, 55(4), 316–340. <a href="https://doi.org/10.1037/pst0000172">https://doi.org/10.1037/pst0000172</a></p>
<p><span id="ref-5"></span>[5] Baumol, W. J., &amp; Bowen, W. G. (1966). <em>Performing Arts, The Economic Dilemma: A Study of Problems Common to Theater, Opera, Music and Dance</em>. Twentieth Century Fund.</p>
<p><span id="ref-6"></span>[6] Baumol, W. J. (2012). <em>The Cost Disease: Why Computers Get Cheaper and Health Care Doesn&rsquo;t</em>. Yale University Press.</p>
<p><span id="ref-7"></span>[7] Bundesinstitut für Arzneimittel und Medizinprodukte. <em>DiGA-Verzeichnis</em>. <a href="https://diga.bfarm.de/de">https://diga.bfarm.de/de</a></p>
<p><span id="ref-8"></span>[8] GKV-Spitzenverband (2025). <em>Bericht des GKV-Spitzenverbandes über die Inanspruchnahme und Entwicklung der Versorgung mit Digitalen Gesundheitsanwendungen</em>. Reported in: MTR Consult. <a href="https://mtrconsult.com/news/gkv-report-utilization-and-development-digital-health-application-diga-care-germany">https://mtrconsult.com/news/gkv-report-utilization-and-development-digital-health-application-diga-care-germany</a></p>
<p><span id="ref-9"></span>[9] Heise Online (2025). <em>Insurers critique high costs and low benefits of prescription apps</em>. <a href="https://www.heise.de/en/news/Insurers-critique-high-costs-and-low-benefits-of-prescription-apps-10375339.html">https://www.heise.de/en/news/Insurers-critique-high-costs-and-low-benefits-of-prescription-apps-10375339.html</a></p>
<p><span id="ref-10"></span>[10] Goeldner, M., &amp; Gehder, S. (2024). Digital Health Applications (DiGAs) on a Fast Track: Insights From a Data-Driven Analysis of Prescribable Digital Therapeutics in Germany From 2020 to Mid-2024. <em>JMIR mHealth and uHealth</em>. <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC11393499/">https://pmc.ncbi.nlm.nih.gov/articles/PMC11393499/</a></p>
<p><span id="ref-11"></span>[11] Taz (2026). <em>Weniger Honorar für Psychotherapie</em>. <a href="https://taz.de/Weniger-Honorar-fuer-Psychotherapie/!6162806/">https://taz.de/Weniger-Honorar-fuer-Psychotherapie/!6162806/</a></p>
<p><span id="ref-12"></span>[12] Bundespsychotherapeutenkammer (2026). <em>Gemeinsam gegen die Kürzung psychotherapeutischer Leistungen</em>. <a href="https://www.bptk.de/pressemitteilungen/gemeinsam-gegen-die-kuerzung-psychotherapeutischer-leistungen/">https://www.bptk.de/pressemitteilungen/gemeinsam-gegen-die-kuerzung-psychotherapeutischer-leistungen/</a></p>
<p><span id="ref-13"></span>[13] Bundespsychotherapeutenkammer (2025). <em>Koalitionsvertrag gibt psychischer Gesundheit neuen Stellenwert</em>. <a href="https://www.bptk.de/pressemitteilungen/koalitionsvertrag-gibt-psychischer-gesundheit-neuen-stellenwert/">https://www.bptk.de/pressemitteilungen/koalitionsvertrag-gibt-psychischer-gesundheit-neuen-stellenwert/</a></p>
<p><span id="ref-14"></span>[14] Bundespsychotherapeutenkammer. <em>Reform der Bedarfsplanung</em>. <a href="https://www.bptk.de/ratgeber/reform-der-bedarfsplanung/">https://www.bptk.de/ratgeber/reform-der-bedarfsplanung/</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>The Model Has No Seahorse: Vocabulary Gaps and What They Reveal About LLMs</title>
      <link>https://sebastianspicker.github.io/posts/seahorse-emoji-vocabulary-gaps-llm/</link>
      <pubDate>Wed, 04 Mar 2026 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/seahorse-emoji-vocabulary-gaps-llm/</guid>
      <description>There is no seahorse emoji in Unicode. Ask a large language model to produce one and watch what happens. The failure is not a hallucination in the ordinary sense — the model knows what it wants to output but cannot output it. That distinction matters.</description>
      <content:encoded><![CDATA[<p>Try a simple experiment. Open any of the major language model interfaces and ask it, as plainly as possible, to produce a seahorse emoji. What you get back will probably be one of a small number of things. The model might confidently output something that is not a seahorse emoji — a horse face, a tropical fish, a dolphin, sometimes a spiral shell. It might produce a cascade of marine-themed emoji as if searching through an aquarium before eventually settling on something. It might hedge at length and then get it wrong anyway. Occasionally it will self-correct after producing an incorrect token. What it almost never does is say: there is no seahorse emoji in Unicode, so I cannot produce one.</p>
<p>That silence is interesting. Not because the model is being evasive, and not because this is an especially important use case — nobody&rsquo;s critical infrastructure depends on seahorse emoji production. It is interesting because it reveals a specific structural feature of how language models relate to their own capabilities. The gap between what a model knows about the world and what it knows about its own output vocabulary is a real gap, and it shows up in ways that are worth understanding carefully.</p>
<p>I am going to work through the seahorse incident, a companion failure involving a morphologically valid but corpus-rare English word, and what both of them suggest about a class of self-knowledge failure that I think is underappreciated compared to ordinary hallucination.</p>
<h2 id="the-incident">The incident</h2>
<p>In 2025, Vgel published an analysis of exactly this failure <a href="#ref-1">[1]</a>. The piece is worth reading in full, but the core finding is worth unpacking here.</p>
<p>When a model is asked to produce a seahorse emoji, something specific happens at the level of the model&rsquo;s internal representations. Using logit lens analysis — a technique that projects the model&rsquo;s intermediate-layer activations into vocabulary space, as if each layer were already producing the final output <a href="#ref-4">[4]</a> — it is possible to track what the model&rsquo;s &ldquo;working answer&rdquo; looks like at each layer of the transformer. What Vgel found is that in the late layers, the model does construct something that functions like a &ldquo;seahorse + emoji&rdquo; representation. The semantic work is happening correctly. The model is not confused about whether seahorses are real animals, not confused about whether emoji are a thing, not confused about whether animals commonly have emoji representations. It has assembled the correct semantic vector for what it wants to output.</p>
<p>The failure is not in the assembly. It is in the final step: the projection from that assembled representation back into vocabulary space. This projection is called the lm_head, the final linear layer that maps from the model&rsquo;s embedding space to a probability distribution over its output vocabulary. That vocabulary is a fixed set of tokens, established at training time. There is no seahorse emoji token. There never was one, because there is no seahorse emoji in Unicode.</p>
<p>What the lm_head does, faced with a query vector that has no exact match in vocabulary space, is find the nearest token — the one whose embedding is closest to the query, in whatever metric the model has learned during training. That nearest token is some other emoji, and it gets output. The model has no mechanism at this stage to detect that the nearest token is not actually what was requested. It cannot distinguish between &ldquo;I found the seahorse emoji&rdquo; and &ldquo;I found the best available approximation to the seahorse emoji.&rdquo; The output is produced with the same confidence either way.</p>
<p>Vgel&rsquo;s analysis covered behaviour across multiple models — GPT-4o, Claude Sonnet, Gemini Pro, and Llama 3 were all in the mix. The specific wrong answer differed between models, which itself is revealing: different training corpora and different tokenisation schemes produce different nearest-neighbour relationships in embedding space, so each model&rsquo;s fallback lands somewhere different in the emoji neighbourhood. What is consistent across models is that none of them correctly diagnosed the gap. They all behaved as if the limitation were in their world-knowledge rather than in their output vocabulary. None of them said: &ldquo;I know what you want, and it does not exist as a token I can emit.&rdquo;</p>
<p>Some of the failure modes are more elaborate than a simple wrong substitution. One pattern Vgel documented is the cascade: the model generates a sequence of increasingly approximate emoji as accumulated context pushes it away from each successive wrong answer, eventually settling into a cycle or giving up. Another is the confident placeholder — an emoji that looks like it might be a box or a question mark symbol, as if the model has internally noted a gap but cannot produce a useful message about it. A third, rarer pattern is genuine partial self-correction: the model produces the wrong emoji, generates a few tokens of commentary, then backtracks. Even that self-correction is not reliable, because the model is correcting based on world-knowledge (&ldquo;wait, that is a dolphin, not a seahorse&rdquo;) rather than vocabulary-knowledge (&ldquo;there is no seahorse token&rdquo;), so it keeps trying until it either runs into a token limit or produces something it can convince itself is close enough.</p>
<h2 id="the-structural-failure-vocabulary-completeness-assumption">The structural failure: vocabulary completeness assumption</h2>
<p>Here is the core conceptual point, stated as cleanly as I can.</p>
<p>Language models have two distinct knowledge representations that are routinely conflated, by users and, it seems, by the models themselves. The first is world knowledge: facts about entities, their properties, and their relationships. A model trained on large quantities of text knows an enormous amount about the world — including, in this case, that seahorses are animals, that emoji are Unicode characters, and that many animals have standard emoji representations. This knowledge is encoded in the weights through training on documents that describe these things.</p>
<p>The second is the output vocabulary: the set of tokens the model can actually emit. This vocabulary is a fixed artifact, established at training time by a tokeniser — usually a byte-pair encoding scheme, as described by Sennrich et al. <a href="#ref-5">[5]</a> and discussed in more detail in my <a href="/posts/strawberry-tokenisation/">tokenisation post</a>. A new emoji added to Unicode after the training cutoff does not exist in the vocabulary. An emoji that never made it into Unicode does not exist in the vocabulary. The vocabulary is closed, and there is no runtime mechanism for expanding it.</p>
<p>The problem is that the model treats these two representations as if they were the same. If world-knowledge says &ldquo;seahorses should have emoji,&rdquo; the model implicitly assumes its output vocabulary contains a seahorse emoji. It does not distinguish between &ldquo;I know X exists&rdquo; and &ldquo;I can express X.&rdquo; I am going to call this the vocabulary completeness assumption: the implicit belief that the expressive vocabulary is complete with respect to world knowledge, that if the model knows about a thing, it can produce a token for that thing.</p>
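<p>The conflation is easy to state in code. A toy sketch, with two made-up lookup tables standing in for a model&rsquo;s weights and its tokeniser vocabulary — nothing here queries a real model:</p>

```python
# Two tiny stand-ins for representations a real model keeps in its
# weights (world knowledge) and in its tokeniser (output vocabulary).

# World knowledge: what the training text *says* about each concept.
# The web plausibly asserts an emoji for both animals; it is wrong for one.
world_says_has_emoji = {"dolphin": True, "seahorse": True}

# Output vocabulary: which emoji can actually be emitted. Closed, fixed
# at training time. There is no seahorse glyph to put here.
emittable = {"dolphin": "\U0001F42C", "seahorse": None}

for concept in ("dolphin", "seahorse"):
    knows = world_says_has_emoji[concept]          # "I know X exists"
    can_express = emittable[concept] is not None   # "I can emit X"
    if knows and not can_express:
        print(f"gap: believes '{concept}' is expressible but cannot emit it")
```

<p>The vocabulary completeness assumption is precisely the (invalid) inference from <code>knows</code> to <code>can_express</code>: two different tables, silently treated as one.</p>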
<p>This assumption is mostly true. For a well-trained model on high-resource languages and common domains, the vocabulary is rich enough that the gap between what the model knows and what it can express is small. The failure shows up precisely in the edge cases: rare Unicode characters, neologisms below the frequency threshold for robust tokenisation, domain-specific symbols that appear in training text only as descriptions rather than as the symbols themselves. Those cases reveal an assumption that was always there but almost never triggered.</p>
<p>The failure is structurally different from ordinary hallucination, and I think this distinction matters. When a model confabulates a fact — invents a citation, misattributes a quote, generates a plausible-but-false historical claim — it is producing incorrect world-knowledge. The cure, in principle, is better training data, better calibration, and retrieval augmentation that can replace the model&rsquo;s internal knowledge with verified external knowledge. These are hard problems, but they are the right class of remedies for factual hallucination.</p>
<p>When a model fails on vocabulary completeness, the world-knowledge is correct. The model knows it should produce a seahorse emoji. The limitation is in the output channel. No amount of factual training data will fix this, because the problem is not about facts. Retrieval augmentation will not help either, unless the system also includes a vocabulary lookup step that can report what tokens exist. The fix, if there is one, is a different kind of introspective capability: explicit metadata about the output vocabulary, available to the model at generation time.</p>
<p>A useful analogy: imagine a translator who has a perfect conceptual understanding of a French neologism that has no English equivalent, and who is tasked with writing in English. The translator knows the concept; the English word genuinely does not exist yet. A careful translator would write &ldquo;there is no direct English equivalent; the closest is approximately&hellip;&rdquo; and explain the gap. A less careful translator would pick the nearest English word and output it as if it were a direct translation, without flagging the gap to the reader. Language models are almost uniformly the less careful translator in this analogy, and the problem is architectural: they have no mechanism for detecting that they are approximating rather than translating.</p>
<h2 id="a-formal-language-perspective">A formal language perspective</h2>
<p>For those who prefer their failures stated in type signatures: the decoder step in a standard transformer is a function that maps a hidden state vector to a probability distribution over a fixed token vocabulary <code>V = {t₁, …, tₙ}</code> <a href="#ref-5">[5]</a>. Every output is an element of <code>V</code>. The type system has no room for a &ldquo;near miss&rdquo; or an &ldquo;I cannot express this precisely&rdquo; — the output is always a token, drawn from the inventory established at training time.</p>
<p>This is a closed-world assumption in the formal sense <a href="#ref-6">[6]</a>: the system treats any concept not representable as an element of <code>V</code> as simply absent. There is no seahorse emoji token, so the model&rsquo;s generation step has no way to represent &ldquo;seahorse emoji&rdquo; as a distinct, exact concept. It can only represent &ldquo;nearest token to seahorse emoji in embedding space,&rdquo; which it does silently, with the same confidence it would report for a precise match.</p>
<p>The mismatch is between two representations: the model&rsquo;s internal semantic space — continuous, high-dimensional, geometrically capable of representing &ldquo;seahorse + emoji&rdquo; as a coherent position — and its output type, which is a discrete, finite categorical distribution. The <code>lm_head</code> projection is a quantisation, and at the edges of the vocabulary it is a lossy one. For most semantic positions the nearest token is close enough; for missing emoji, low-frequency morphological forms, or post-training neologisms, the quantisation error is large and nothing in the architecture flags it.</p>
<p>A richer output type would distinguish precise matches from approximations — an <code>Exact&lt;Token&gt;</code> versus an <code>Approximate&lt;Token&gt;</code>, or in standard option-type terms, a generation step that can return <code>None</code> when no token in <code>V</code> adequately represents the requested concept. The information needed to make this distinction already exists inside the model: the logit lens analysis shows that the geometry of the final transformer layer carries signal about the quality of the approximation <a href="#ref-4">[4]</a>. It is simply discarded in the projection step. Making it visible at the interface level is an architectural decision, not a training question, which is why &ldquo;make the model more calibrated about facts&rdquo; addresses the wrong layer of the problem.</p>
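<p>The option-typed projection can be sketched directly. This is a minimal illustration with invented two-dimensional token positions — a real model&rsquo;s <code>lm_head</code> operates over thousands of dimensions with a softmax, not a distance threshold — but the shape of the interface is the point:</p>

```python
import math

# A tiny, made-up vocabulary V with invented embedding positions.
token_embeddings = {
    "\U0001F42C": (0.9, 0.1),  # dolphin
    "\U0001F420": (0.8, 0.3),  # tropical fish
    "\U0001F984": (0.1, 0.9),  # unicorn
}

def project(semantic_position, max_distance=0.25):
    """Return (token, exact). When every token in V is too far from the
    requested concept, return (None, False) instead of guessing silently --
    the Option-type output the standard decoder lacks."""
    best, best_d = None, math.inf
    for token, vec in token_embeddings.items():
        d = math.dist(semantic_position, vec)
        if d < best_d:
            best, best_d = token, d
    if best_d > max_distance:
        return None, False   # nothing in V adequately represents this
    return best, True

# A "seahorse emoji" position near, but not inside, the fish cluster:
print(project((0.6, 0.6)))    # (None, False): the gap is flagged
print(project((0.88, 0.15)))  # a genuine near-match resolves to a token
```

<p>The threshold here is the crude stand-in for the signal the logit lens shows already exists in the final-layer geometry; the architectural change is making it part of the output type rather than discarding it.</p>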
<h2 id="the-ununderstandable-companion">The &ldquo;ununderstandable&rdquo; companion</h2>
<p>Shortly after the seahorse emoji incident circulated, a Reddit thread titled &ldquo;it&rsquo;s just the seahorse emoji all over again&rdquo; collected user reports of a structurally similar failure on the English word &ldquo;ununderstandable&rdquo; <a href="#ref-2">[2]</a>. I cannot independently verify every report in that thread — Reddit threads being what they are — but the documented failure pattern is consistent with the seahorse analysis and worth working through because it extends the picture in a useful direction.</p>
<p>&ldquo;Ununderstandable&rdquo; is morphologically valid English. The prefix <em>un-</em> combines productively with adjectives: uncomfortable, unbelievable, unmanageable, unkind. &ldquo;Understandable&rdquo; is an unambiguous adjective. &ldquo;Ununderstandable&rdquo; means what it looks like it means, constructed by exactly the same rule that gives you all the other <em>un-</em> words. There is nothing wrong with it grammatically or semantically.</p>
<p>It is also extremely rare. I cannot find it in any standard reference corpus or mainstream English dictionary. The word has not achieved the frequency threshold required for widespread attestation, which means that a model trained on a broad web corpus will have seen it at most a handful of times, if at all. Its tokenisation is likely fragmented — split across subword units in a way that does not give the model a clean, unified representation of it as a single lexical item. The BPE tokeniser will have handled &ldquo;ununderstandable&rdquo; as a sequence of subword pieces, and the model will have very few training examples from which to learn how those pieces combine in practice.</p>
<p>The failure mode the Reddit thread documented is the same as the seahorse failure in structure, but it operates in morphological space rather than emoji space. The model has learned that <em>un-</em> prefixation is productive, and it has learned that &ldquo;understandable&rdquo; is a word. But its trained representations do not include &ldquo;ununderstandable&rdquo; as a robust lexical entry, because the word is below the minimum frequency threshold for that. When asked to use or define &ldquo;ununderstandable,&rdquo; models in the thread were reported to do one of three things. They would deny it is a word, often confidently, pointing to the absence of a dictionary entry. They would confidently define it incorrectly, conflating it with &ldquo;misunderstandable&rdquo; or &ldquo;incomprehensible&rdquo; in ways that lose the morphological compositionality. Or they would produce grammatically awkward output when forced to use it in a sentence — the kind of output you get when the model is stitching together fragments without a reliable whole-word representation to anchor the construction.</p>
<p>The denial case is the most interesting to me, because it is the model doing something structurally revealing. It is applying world-knowledge (dictionaries do not widely contain this word; therefore it is not a word) to override the conclusion it should reach from morphological knowledge (the word is transparently compositional and valid by productive rules I have learned). The model is, in effect, saying &ldquo;I cannot recognise this because it is not in my training data,&rdquo; which is closer to the truth than the seahorse case but still not quite right. The word is valid, not merely an error — it is just rare.</p>
<p>The Reddit title is apt. Both incidents are examples of the model failing to distinguish between two different epistemic situations: &ldquo;this thing does not exist and I should say so&rdquo; versus &ldquo;this thing exists but I cannot produce it cleanly.&rdquo; In the seahorse case, the emoji genuinely does not exist, and the right answer is to say so. In the &ldquo;ununderstandable&rdquo; case, the word genuinely is valid, and the right answer is to use it or explain the frequency gap. Both failures come from the same source: the model conflates world-knowledge with expressive vocabulary, and has no reliable way to interrogate which of those two representations is actually limiting it.</p>
<h2 id="what-this-means-for-users">What this means for users</h2>
<p>The practical implication is narrow but important. Asking a language model &ldquo;do you have X?&rdquo; — where X is a token, a word, an emoji, a symbol — is not a reliable diagnostic for whether the model can produce X. The model will often affirm things it cannot actually output, and sometimes deny things it can. This is not a matter of the model being dishonest in any meaningful sense. It is a matter of the model not having explicit access to its own vocabulary as a queryable data structure. Its self-description of its capabilities is generated by the same weights that have the gaps, and those weights have no introspective pathway to the tokeniser&rsquo;s vocabulary table.</p>
<p>This matters beyond emoji. The same failure structure applies in any domain where world-knowledge and expressive vocabulary diverge. A model that has read about a proprietary technical symbol used in a narrow field but has no token for that symbol will fail the same way. A model that knows about a recently coined term that postdates its training cutoff will fail the same way. The failure is quiet — the model does not throw an error, does not flag uncertainty, does not produce a visibly broken output. It produces something plausible and wrong.</p>
<p>The broader point is that vocabulary completeness is one instance of a general class of LLM self-knowledge failures. Models do not have accurate introspective access to their own weights, their training data coverage, or their capability boundaries. They can describe themselves in natural language, but those descriptions are generated by the same weights that contain the gaps and the biases. A model that does not know it lacks a seahorse token cannot tell you it lacks one, because the mechanism by which it would report that absence is the same mechanism that has the absence. This connects to the wider theme in this blog of AI systems that are confidently wrong about things that require them to reason about their own limitations — see the <a href="/posts/car-wash-grounding/">grounding failure post</a> and its companion piece on <a href="/posts/car-wash-walk/">pragmatic inference</a> for related examples, and the <a href="/posts/ai-detectors-systematic-minds/">AI detectors post</a> for a case where self-knowledge failures about writing style have real social consequences.</p>
<p>The fix is not &ldquo;make models more honest&rdquo; in the abstract. Honesty calibration training teaches models to express uncertainty about facts, which is useful and real progress on hallucination. But vocabulary gaps are not factual uncertainty — the model is not uncertain about whether the seahorse emoji exists, in any meaningful sense. What is needed is a different kind of capability: models with explicit, queryable metadata about their own output vocabularies, and a generation-time mechanism that can consult that metadata before reporting a confident result. Some retrieval-augmented architectures are beginning to approach this by externalising certain kinds of knowledge into structured databases that the model can query explicitly. The same logic could, in principle, apply to vocabulary.</p>
<h2 id="the-last-mile">The last mile</h2>
<p>There is something almost poignant about the seahorse failure, if you think about what is actually happening at the level of computation. The model is trying very hard. Its internal representation of &ldquo;seahorse emoji&rdquo; is, according to the logit lens analysis, correct. The semantic intent is assembled with care across the model&rsquo;s late layers. The failure is in the last mile — the vocabulary projection — and the model has no way to detect it. It cannot distinguish between &ldquo;I successfully retrieved the seahorse emoji&rdquo; and &ldquo;I retrieved the nearest available approximation to what I was looking for.&rdquo; From the model&rsquo;s operational perspective, it completed the task.</p>
<p>This is not a uniquely LLM problem, by the way. The same structure shows up in human communication all the time. We reach for a word that does not exist in our active vocabulary, produce the closest available word, and often do not notice the substitution. The difference is that a careful human communicator can usually, with effort, recognise that they are approximating — they have some access to the felt sense of the gap, the slight misfit between intent and expression. Language models, as currently built, do not have this. The gap leaves no trace that the model can inspect.</p>
<p>The specific failure mode described here is tractable. Future architectures may address it through better vocabulary coverage, explicit vocabulary metadata, or output-side verification that compares what was generated against what was requested at a representational level. The transformer circuits work <a href="#ref-3">[3]</a> that underlies the logit lens analysis gives us increasingly precise tools for understanding where failures happen inside a model. As those tools mature, the vocabulary completeness assumption will become less of a blind spot and more of a known failure mode with known mitigations.</p>
<p>For now, the seahorse is useful precisely as a demonstration case: simple, memorable, easy to reproduce, and pointing clearly at a structural issue. It is not interesting because anyone needs a seahorse emoji. It is interesting because it is a clean instance of a model being confidently wrong about something that requires it to know what it cannot do — and that is a harder problem than knowing what it does not know.</p>
<hr>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Vogel, T. (2025). <em>Why do LLMs freak out over the seahorse emoji?</em> <a href="https://vgel.me/posts/seahorse/">https://vgel.me/posts/seahorse/</a></p>
<p><span id="ref-2"></span>[2] Reddit user (2025). It&rsquo;s just the seahorse emoji all over again. <em>r/OpenAI</em>. <a href="https://www.reddit.com/r/OpenAI/comments/1rkbeel/">https://www.reddit.com/r/OpenAI/comments/1rkbeel/</a> (reported; not independently verified)</p>
<p><span id="ref-3"></span>[3] Elhage, N., et al. (2021). A mathematical framework for transformer circuits. <em>Transformer Circuits Thread</em>. <a href="https://transformer-circuits.pub/2021/framework/index.html">https://transformer-circuits.pub/2021/framework/index.html</a></p>
<p><span id="ref-4"></span>[4] Nostalgebraist. (2020). Interpreting GPT: the logit lens. <a href="https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/">https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/</a></p>
<p><span id="ref-5"></span>[5] Sennrich, R., Haddow, B., &amp; Birch, A. (2016). Neural machine translation of rare words with subword units. <em>Proceedings of ACL 2016</em>, 1715–1725.</p>
<p><span id="ref-6"></span>[6] Reiter, R. (1978). On closed world data bases. In H. Gallaire &amp; J. Minker (Eds.), <em>Logic and Data Bases</em> (pp. 55–76). Plenum Press, New York.</p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-04-01</strong>: Updated reference [1]: author name to &ldquo;Vogel, T.&rdquo; and title to the published blog post title &ldquo;Why do LLMs freak out over the seahorse emoji?&rdquo;</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Oppenheimer Didn&#39;t Have an Acceptable Use Policy</title>
      <link>https://sebastianspicker.github.io/posts/ai-warfare-anthropic-atom-bomb/</link>
      <pubDate>Tue, 03 Mar 2026 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/ai-warfare-anthropic-atom-bomb/</guid>
      <description>Anthropic has drawn a public line on military use of its models. The physics community spent the better part of the twentieth century working out what it means to draw that line after you have already built the thing. As a physicist watching this unfold, I find the parallels clarifying and the differences more unsettling than the parallels.</description>
<content:encoded><![CDATA[<p><em>Physicists inherit, along with the formalism and the problem sets, a particular
kind of guilt. The profession has been working
through its relationship to weapons, state violence, and the gap between
scientific capability and ethical readiness since August 1945. This post is about
why I think the current moment in AI closely resembles that history, and why
Anthropic&rsquo;s decision to draw a line matters even if — especially if — you think
the line is imperfect.</em></p>
<hr>
<h2 id="what-just-happened">What Just Happened</h2>
<p>The news this week involves Anthropic and the question of whether and how large
language models should be available for military applications. Anthropic has stepped
back from a path toward unrestricted military use and restated a position: there
are things their models will not be used for, weapons development and autonomous
lethal systems among them. The response from parts of the defence and national
security community has been predictable — naïve, idealistic, unilateral disarmament,
your adversaries will not make the same choice.</p>
<p>These are not stupid objections. I want to take them seriously. But I also want
to explain why, as someone who spent years studying physics in the shadow of the
Manhattan Project&rsquo;s legacy, the framing of those objections sounds very familiar,
and why that familiarity is not reassuring.</p>
<hr>
<h2 id="what-the-physicists-thought-they-were-doing">What the Physicists Thought They Were Doing</h2>
<p>The scientists who built the atomic bomb were not, for the most part, indifferent
to what they were building. Many of them were refugees from European fascism.
They understood what a Nazi atomic weapon would mean. The urgency was real, the
moral reasoning was coherent, and the conclusion — build it before the other side
does — followed from the premises.</p>
<p>What the premises did not include was adequate weight for what happens after the
technical problem is solved.</p>
<p>By the time the Trinity device was detonated in July 1945, Germany had already
surrendered. The original justification — prevent the Nazis from getting there
first — had evaporated. What remained was a weapon, an infrastructure for building
more weapons, and a strategic and political logic that had largely moved beyond
the scientists&rsquo; control. The Franck Report, written by a group of Manhattan Project
scientists in June 1945, argued against using the bomb on a Japanese city without
prior demonstration. It was ignored. Oppenheimer, who chaired the Interim
Committee&rsquo;s scientific panel, signed off on the Hiroshima target recommendation.
He spent the rest of his life with that.</p>
<p>The lesson most physics students absorb from this history is something like: the
scientists were not the decision-makers, the decision was going to be made anyway,
and the presence of principled scientists in the room was better than their absence.
The system was going to do what it was going to do; all you could influence was
the margin.</p>
<p>I believed this for a long time. I am less sure of it now.</p>
<hr>
<h2 id="the-analogy-and-its-limits">The Analogy and Its Limits</h2>
<p>The comparison between the atom bomb and artificial general intelligence — or even
current large language models at the capability frontier — is made often enough
that it has become a cliché, which is usually the point at which people stop
thinking carefully about it. Let me try to be specific about where the analogy
holds and where it breaks.</p>
<p><strong>Where it holds:</strong></p>
<p>The core structural similarity is this: a small number of researchers, working
at the frontier of a capability that most people do not understand, are making
decisions that will constrain or enable uses they cannot fully anticipate, in
contexts they will not control. The physics community in 1942 had a clearer view
of what fission could do than any political or military decision-maker. The AI
research community in 2026 has a clearer view of what large language models can
do — and of what more capable successors will do — than most of the people who
will deploy them.</p>
<p>That epistemic position is not morally neutral. Knowing more than the decision-makers
does not mean you have unlimited responsibility, but it does mean you have more
responsibility than someone who does not know. Feigning ignorance about downstream
applications is not available to you.</p>
<p>The second similarity: once the capability exists and is demonstrated, the
normative landscape changes. Before Trinity, the question of whether to build nuclear
weapons was still open. After Trinity, it was no longer open in the same way — the
knowledge existed, the infrastructure existed, the geopolitical expectations had
already been set. The arms race was not caused by the bomb, but the bomb&rsquo;s existence
changed what the arms race meant and how fast it moved. We are somewhere in the
vicinity of that transition with frontier AI systems. The question of whether to
build them is still formally open for any given company or research group, but the
landscape is already different from what it was five years ago.</p>
<p><strong>Where it breaks:</strong></p>
<p>The atom bomb was a single-use physical object whose primary function was destroying
things. Large language models are general-purpose cognitive tools with a very wide
range of applications, the majority of which are not weapons-relevant. This matters
because it changes the policy space. You could, in principle, have not built the
atom bomb. You cannot, in principle, not build language models while still having
language models for medicine, education, scientific research, and the other
applications that are clearly beneficial. The dual-use problem for AI is more
severe, not less severe, than it was for physics.</p>
<p>The other important difference: the Manhattan Project was conducted in secret, under
wartime conditions, with a relatively well-defined adversarial structure. The current
AI landscape involves many organisations, many countries, public publication of
research, and no clear equivalent of the Axis/Allied framing. The game theory
of &ldquo;if we don&rsquo;t do it, they will&rdquo; is more complicated when &ldquo;they&rdquo; is not a single
identifiable adversary with symmetric interests.</p>
<hr>
<h2 id="what-anthropics-line-actually-says">What Anthropic&rsquo;s Line Actually Says</h2>
<p>Setting aside for a moment whether the line is in the right place, there is something
worth examining in the act of drawing it at all.</p>
<p>The standard criticism — that a unilateral ethical commitment in a competitive
field simply advantages less scrupulous actors — assumes that ethical commitments
are pure costs with no countervailing benefits. This is the argument the weapons
lobby has made about every arms control proposal in the history of arms control,
and it has sometimes been right. Unilateral disarmament without reciprocal
commitments can leave you worse off. This is not a trivial point.</p>
<p>But it smuggles in an assumption that deserves scrutiny: that the relevant
competition is primarily between AI companies, and that the only variable that
matters is relative capability. If you accept that framing, then any ethical
constraint is a handicap and the only rational strategy is to develop as fast as
possible with as few restrictions as possible.</p>
<p>That framing has a name in physics. It is called the arms race equilibrium, and
the physics community spent thirty years understanding what it produces. It produces
capability accumulation without a corresponding development of the normative
frameworks, institutional safeguards, and mutual verification mechanisms that
make the capability survivable. It produces Hiroshima, then the hydrogen bomb,
then MIRV, then the point at which the accumulated arsenal is large enough to
end complex life on Earth several times over, at which point you negotiate the
first real arms limitation treaties — from a starting position of vastly more
deployed capability than anyone needed and vastly less trust than anyone wanted.</p>
<p>The question Anthropic is implicitly asking is whether there is a path that does
not look like that. The answer is not obvious. But I think it is worth asking.</p>
<hr>
<h2 id="what-the-physicists-should-have-done">What the Physicists Should Have Done</h2>
<p>Here is the counterfactual that haunts the Manhattan Project&rsquo;s legacy: what if
the scientific community had treated the ethics of the bomb as seriously as the
physics, from the beginning?</p>
<p>Not naïvely. Not by refusing to work on it and ceding the possibility of influencing
it. But by making the ethical analysis parallel to the technical analysis, by
treating the question of use as a scientific question with as much rigour as the
question of yield, and by using the epistemic authority that came from being the
people who understood the capability to push, hard, for the normative frameworks
that did not yet exist.</p>
<p>Some scientists did this. Szilard circulated a petition, signed by 70 Manhattan
Project scientists, against the use of the bomb on Japanese cities without prior
warning. It did not work. But the effort was real, and the record of the effort
matters — both as evidence that the scientific community was not unanimous in its
acquiescence and as a model for what engaged dissent looks like from inside a
project that is going to proceed regardless.</p>
<p>What most scientists did not do, and what the profession largely did not do in the
decades that followed, was treat the ethical work as primary. Physics built its
identity around the technical capability — the extraordinary achievement of
understanding nature at the deepest level — and treated the ethical consequences
as someone else&rsquo;s department. The bomb was the military&rsquo;s problem. The cold war was
the politicians&rsquo; problem. The physicists kept doing physics.</p>
<p>This was comfortable and it was wrong.</p>
<hr>
<h2 id="what-i-want-from-ai-researchers">What I Want From AI Researchers</h2>
<p>I want AI researchers to do what the physicists did not, and to do it now, while
the critical decisions are still open.</p>
<p>Anthropic drawing a line is one version of this. It is imperfect — the line is
in a particular place, the enforcement mechanisms are limited, the competitive
dynamics are real. But it is a claim that the people who built the capability
have ongoing responsibility for how it is used, and that some uses are outside
the bounds of what should happen regardless of what is technically possible.</p>
<p>That claim is not naïve. It is, in fact, the claim the Franck Report was making
in 1945: that capability does not determine use, that scientists have a voice in
the normative question, and that using that voice is part of the job rather than
a distraction from it.</p>
<p>What I want beyond that is for the AI research community to treat the ethics
as primary rather than as footnotes. Not ethics review boards that approve research
post hoc. Not responsible AI teams that are consulted after the capability has
been developed. A genuine integration of the normative analysis into the research
process itself — asking, at each stage, what this capability makes possible and
who benefits from that possibility and who pays the cost.</p>
<p>The physics community got to August 1945 before it had that conversation in earnest.
The conversation has been going on ever since, and it has produced important
institutional frameworks — the Bulletin of the Atomic Scientists, the arms control
treaties, the export control regimes, the norms against first use. These things
matter. But they were built in reaction to a capability that had already been
deployed, and the shape of everything that followed was constrained by that
starting point.</p>
<p>The AI community is not there yet. The starting point is still being established.
That is what makes this moment consequential, and what makes Anthropic&rsquo;s line —
wherever exactly it is drawn — worth defending as an act of principle rather than
dismissing as an act of commercial positioning.</p>
<hr>
<h2 id="a-note-on-the-of-our-time-framing">A Note on the &ldquo;Of Our Time&rdquo; Framing</h2>
<p>I am aware that comparisons to the atom bomb are sometimes used to generate
unwarranted urgency, to short-circuit careful reasoning by invoking the most
extreme case. I want to be clear about what I am and am not claiming.</p>
<p>I am not claiming that current large language models are as immediately dangerous
as nuclear weapons. They are not.</p>
<p>I am claiming that the structural situation — researchers at the capability
frontier, ahead of the policy frameworks, making decisions that will constrain
future options, in a competitive environment with adversarial dynamics — is
similar enough that the lessons of the Manhattan Project period are directly
relevant. Not as prophecy. As a guide to the kind of mistakes that are available
to make.</p>
<p>The physicists had plenty of warning. Szilard had been worried since 1933.
Einstein wrote to Roosevelt in 1939. The Franck Report was written before
Hiroshima. The warnings were on the record. What was not on the record was
a scientific community that treated those warnings as actionable constraints
on its own behaviour rather than as advisories for policymakers.</p>
<p>That is the thing I want to be different this time.</p>
<hr>
<h2 id="references">References</h2>
<p>Franck, J. et al. (1945). <em>Report of the Committee on Political and Social Problems
(The Franck Report).</em> National Archives, Record Group 77.</p>
<p>Oppenheimer, J. R. (1965). Interview on <em>The Decision to Drop the Bomb</em> (NBC
documentary). Recorded 1965.</p>
<p>Rhodes, R. (1986). <em>The Making of the Atomic
Bomb.</em> Simon &amp; Schuster.</p>
<p>Russell, B., &amp; Einstein, A. (1955). <em>The Russell–Einstein Manifesto.</em>
Pugwash Conferences on Science and World Affairs.</p>
<p>Szilard, L. (1945). <em>A Petition to the President of the United States.</em> July 17,
1945. Available via the Atomic Heritage Foundation.</p>
<p>Bulletin of the Atomic Scientists (1945–present). <em>Doomsday Clock statements.</em>
<a href="https://thebulletin.org/doomsday-clock/">https://thebulletin.org/doomsday-clock/</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>More Context Is Not Always Better</title>
      <link>https://sebastianspicker.github.io/posts/more-context-not-always-better/</link>
      <pubDate>Sun, 22 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/more-context-not-always-better/</guid>
      <description>The intuition that feeding a language model more information improves its outputs is wrong often enough to matter. Here is why, and what to do about it.</description>
      <content:encoded><![CDATA[<h2 id="summary">Summary</h2>
<p>There is a popular intuition in LLM engineering that context is a resource
you should always spend freely: more background, more history, more examples —
inevitably better answers. This intuition is wrong often enough to be
dangerous. Context has a signal-to-noise structure, attention has a
position-dependent bias, and the architecture that processes all of it scales
quadratically. Adding irrelevant tokens does not leave performance neutral; it
actively degrades it. This post argues for <em>structured sparsity</em> as a design
principle: give a model exactly the context it needs for the decision it is
making right now, and nothing else.</p>
<hr>
<h2 id="background">Background</h2>
<p>The &ldquo;more is more&rdquo; assumption has an obvious origin. Transformers were
designed to condition on sequences, and every new token in the context window
is, in principle, available to every attention head. The release of models with
128k, 200k, and now million-token context windows reinforced the story: the
constraint is gone, so pack in everything you have.</p>
<p>Two lines of empirical and theoretical work complicate this story.</p>
<p><strong>The lost-in-the-middle problem.</strong> Liu et al. <a href="#ref-1">[1]</a> showed that retrieval
accuracy on multi-document question answering degrades sharply when the
relevant passage appears in the <em>middle</em> of a long context, compared to the
beginning or end. On 20-document prompts, accuracy varied by more than
20 percentage points depending on where the relevant document was placed, not
because the model lacked the information, but because it was buried. The effect is
consistent across model families and persists at model scales where you would
not expect it.</p>
<p><strong>The complexity argument.</strong> Standard scaled dot-product attention <a href="#ref-2">[2]</a> is</p>
\[
  \text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
\]<p>The \( QK^{\top} \) product is \( O(n^2) \) in sequence length \( n \).
Inference-time KV-cache mitigates compute cost, but memory grows linearly and
the softmax normalises over a denominator that grows with \( n \). A head
attending to 128k tokens is averaging over a vastly noisier signal than one
attending to 512.</p>
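<p>To anchor the complexity argument, here is a minimal NumPy sketch of scaled dot-product attention. The intermediate score matrix is the \( O(n^2) \) object, and the softmax normalises each row over all \( n \) keys:</p>

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention; `scores` is the O(n^2) intermediate."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # shape (n, n): quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over all n keys
    return weights @ V

rng = np.random.default_rng(0)
n, d_k = 512, 64
Q, K, V = (rng.standard_normal((n, d_k)) for _ in range(3))
print(attention(Q, K, V).shape)                     # (512, 64)
```

<p>Doubling \( n \) quadruples the size of the score matrix while the output shape stays fixed, which is the whole cost argument in one line.</p>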
<hr>
<h2 id="the-idea">The Idea</h2>
<h3 id="context-as-a-noisy-channel">Context as a noisy channel</h3>
<p>Think of the information reaching a given attention head as a noisy channel in
the Shannon sense. The signal is the subset of tokens that are actually
relevant to the current decoding step; the rest is noise. Signal-to-noise ratio
is</p>
\[
  \text{SNR} = \frac{|\mathcal{S}|}{n - |\mathcal{S}|}
\]<p>where \( \mathcal{S} \subset \{1, \ldots, n\} \) is the set of relevant token
positions and \( n \) is total context length. For a fixed task, \( |\mathcal{S}| \)
is roughly constant. So SNR is a <em>decreasing</em> function of \( n \). Adding
irrelevant context makes the problem strictly harder in this framing — it does
not leave it unchanged.</p>
<p>This is a toy model, but it captures something real. The softmax in the
attention head distributes a probability mass of 1.0 across \( n \) positions.
If the attended sequence doubles in length and the relevant positions remain the
same, each relevant position receives roughly half the probability mass it did
before — unless the model&rsquo;s learned attention patterns are precise enough to
suppress the irrelevant positions to near-zero, which is a strong assumption.</p>
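<p>The dilution effect is easy to see numerically. In the toy sketch below (my construction, not taken from any of the cited papers), a handful of relevant positions keep a fixed logit advantage over every irrelevant one, yet the softmax mass they retain still collapses as the context grows:</p>

```python
import numpy as np

def relevant_mass(n_total, n_relevant=4, gap=2.0):
    """Softmax mass landing on the relevant positions when every
    irrelevant logit sits `gap` below the relevant ones."""
    logits = np.zeros(n_total)
    logits[:n_relevant] = gap            # fixed logit advantage for relevant tokens
    p = np.exp(logits - logits.max())
    p /= p.sum()                         # softmax over all n_total positions
    return p[:n_relevant].sum()

for n in (64, 512, 4096, 32768):
    print(n, round(relevant_mass(n), 3))
```

<p>With a logit gap of 2, the four relevant positions hold about a third of the attention mass at n = 64 and well under one percent at n = 4096: the advantage is fixed, but the softmax denominator keeps growing.</p>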
<h3 id="position-bias-compounds-the-problem">Position bias compounds the problem</h3>
<p>Empirically, transformers exhibit a U-shaped recall curve over context
position: tokens near the start (primacy) and tokens near the end (recency)
are retrieved more reliably than tokens in the middle. If you stuff a long
context with background material and bury the task-relevant information in the
middle, you are fighting the architecture&rsquo;s learned inductive bias.</p>
<p>The effect is roughly consistent with what would emerge if the model&rsquo;s
attention weight distribution were modelled as a mixture of a flat prior and a
position-biased component. Under that model, increasing \( n \) inflates the
flat component&rsquo;s contribution and dilutes the position-biased recovery of
relevant tokens.</p>
<h3 id="what-structured-sparsity-looks-like-in-practice">What structured sparsity looks like in practice</h3>
<p>The corrective is not to artificially shrink context windows — it is to ensure
that at each decision point, the context is populated with tokens that are
<em>relevant to that decision</em>. Three practical expressions of this principle:</p>
<ol>
<li>
<p><strong>Retrieval over recall.</strong> Rather than prepending a full document corpus,
retrieve the top-\( k \) passages at query time. This keeps \( n \) small
and \( |\mathcal{S}| / n \) high.</p>
</li>
<li>
<p><strong>Rolling summarisation.</strong> Compress history into a running summary and
discard the raw transcript. The summary carries the signal; the raw
transcript is mostly noise by the time it is several turns old.</p>
</li>
<li>
<p><strong>Phased orchestration.</strong> Decompose a multi-step task into phases, each
with its own focused context. Phase \( t \) receives only the output of
phase \( t-1 \) (plus any task-specific retrieval), not the entire
accumulated history of all prior phases. This keeps per-phase \( n \) bounded
even as the total task length grows.</p>
</li>
</ol>
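<p>A minimal sketch of the third pattern, phased orchestration. Everything here is hypothetical scaffolding: <code>call_model</code> is a stub standing in for any real LLM client, and the retriever is a placeholder. The point is the shape of the data flow — each phase&rsquo;s context is rebuilt from its instructions, fresh retrieval, and the previous phase&rsquo;s output only, never the full accumulated history:</p>

```python
from dataclasses import dataclass
from typing import Callable

def call_model(prompt: str) -> str:
    """Stub standing in for a real LLM call, so the sketch runs standalone."""
    return f"summary({len(prompt)} chars)"

@dataclass
class Phase:
    name: str
    instructions: str

def run_phased(phases: list[Phase], task: str, retrieve: Callable[[str], str]) -> str:
    """Each phase sees only its instructions, fresh retrieval, and the
    previous phase's output -- per-phase context length stays bounded."""
    carry = task
    for phase in phases:
        context = "\n\n".join([phase.instructions, retrieve(phase.name), carry])
        carry = call_model(context)      # bounded n per phase
    return carry

result = run_phased(
    [Phase("outline", "Draft an outline."), Phase("write", "Expand the outline.")],
    task="Explain structured sparsity.",
    retrieve=lambda q: f"top-k passages for {q}",   # stand-in retriever
)
print(result)
```

<p>The design choice that matters is that <code>carry</code> is overwritten each iteration rather than appended to: total task length can grow without inflating any single call&rsquo;s context.</p>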
<hr>
<h2 id="discussion">Discussion</h2>
<p>The argument above is not novel — pieces of it appear scattered across the
alignment and inference-efficiency literatures. What I think is underappreciated
is that it applies to <em>agentic</em> systems with particular force. A single-shot
prompt has a fixed, author-controlled context. An agent accumulating tool
outputs, prior reasoning traces, and retrieved documents across a long task
trajectory will naturally inflate its own context window over time — and
degrade its own performance as a result, without any external change in task
difficulty.</p>
<p>The naive fix is to give the agent a bigger context window. The correct fix is
to never let it accumulate a bloated context in the first place.</p>
<p><strong>Limitations.</strong> The SNR framing treats all irrelevant tokens as equally noisy,
which is false — some irrelevant tokens are actively misleading (distractors) <a href="#ref-3">[3]</a>,
others are benign fillers. The quadratic cost argument mostly applies to
full-attention models; sparse and linear attention variants have different
scaling properties. And &ldquo;relevant&rdquo; is itself a function of the model&rsquo;s
knowledge, which makes the optimisation circular in practice.</p>
<p><strong>What would make this publishable.</strong> Controlled ablation: fix a task, vary
context length by inserting null tokens of increasing volume, measure
performance as a function of \( n \) and of the position of the relevant
material. Do this across model sizes and families to separate architectural
effects from scale effects. The lost-in-the-middle paper is close to this but
does not isolate null-token inflation from document-count inflation.</p>
<hr>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., &amp; Liang, P. (2024). Lost in the middle: How language models use long contexts. <em>Transactions of the Association for Computational Linguistics</em>, 12, 157–173. <a href="https://arxiv.org/abs/2307.03172">https://arxiv.org/abs/2307.03172</a></p>
<p><span id="ref-2"></span>[2] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., &amp; Polosukhin, I. (2017). Attention is all you need. <em>Advances in Neural Information Processing Systems</em>, 30. <a href="https://arxiv.org/abs/1706.03762">https://arxiv.org/abs/1706.03762</a></p>
<p><span id="ref-3"></span>[3] Shi, F., Chen, X., Misra, K., Scales, N., Dohan, D., Chi, E., Schärli, N., &amp; Zhou, D. (2023). Large language models can be easily distracted by irrelevant context. <em>Proceedings of the 40th International Conference on Machine Learning (ICML 2023)</em>. <a href="https://arxiv.org/abs/2302.00093">https://arxiv.org/abs/2302.00093</a></p>
<hr>
<p><em>The phased orchestration argument in the Discussion section is not just
theoretical hand-waving — I have been building a concrete implementation of it.
The current state lives at
<a href="https://github.com/sebastianspicker/phased-agent-orchestration">sebastianspicker/phased-agent-orchestration</a>.
It is rough, but the core idea is there: each agent phase gets a bounded,
purpose-built context rather than the full accumulated history. Feedback very
welcome.</em></p>
]]></content:encoded>
    </item>
    <item>
      <title>If You Think This Is Written by AI, You Are Both Right and Wrong</title>
      <link>https://sebastianspicker.github.io/posts/ai-detectors-systematic-minds/</link>
      <pubDate>Wed, 18 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/ai-detectors-systematic-minds/</guid>
      <description>AI detectors flag the US Constitution as machine-generated. They also flag technical papers, legal prose, and — with striking consistency — writing produced by autistic minds and physics-trained ones. The error is not in the measurement. It is in the baseline assumption: that systematic, precise writing is inhuman.</description>
      <content:encoded><![CDATA[<p>I use AI tools in my writing. This post, like several others on this blog,
was written with LLM assistance — research, structure, drafting,
revision. If you run any of these posts through an AI writing detector, you
will likely receive a high probability-of-AI score. The detector will be
picking up something real.</p>
<p>It will also be wrong about what that means.</p>
<hr>
<h2 id="the-constitution-problem">The Constitution Problem</h2>
<p>In 2023, as universities began deploying AI detection tools at scale,
educators started testing them on texts that were definitively not
AI-generated. The results were instructive. The United States Constitution
received high AI-probability scores from multiple commercial detectors.
GPTZero rated it 92% likely to be AI-written. The Federalist Papers
fared similarly. So did sections of the King James Bible and Kant&rsquo;s <em>Critique
of Pure Reason</em>. Historical documents, written by humans, for human purposes,
in an era when no AI existed — flagged as machine-generated.</p>
<p>This was not a marginal edge case. It was consistent across tools and across
documents. And while it was widely reported as evidence that the detectors
were broken, there is a more precise reading available: the detectors were
working correctly, and we had misunderstood what they were measuring.</p>
<hr>
<h2 id="what-the-detectors-actually-measure">What the Detectors Actually Measure</h2>
<p>Most commercial AI detectors — GPTZero, Turnitin&rsquo;s detection layer,
Copyleaks — use some combination of two statistical signals.</p>
<p><strong>Perplexity.</strong> A language model assigns a probability to each token given
the preceding tokens. Low perplexity means the text was, token by token,
what the model expected — it sits close to the centre of the probability
distribution. AI-generated text tends to have low perplexity because that
is precisely what generation does: it samples from the high-probability
region of the distribution <a href="#ref-1">[1]</a>. Human text, on average,
has higher perplexity, because humans write for specific contexts with
idiosyncratic word choices, rhetorical effects that require the unexpected,
and the accumulated noise of composing for a real reader.</p>
<p><strong>Burstiness.</strong> A term introduced by Edward Tian, GPTZero&rsquo;s creator: human
writing has high burstiness — sentence lengths vary widely, vocabulary
density shifts, complex constructions alternate with simple ones. AI writing
is more uniform. The statistical distribution of sentence lengths in LLM
output is narrower than in most human prose <a href="#ref-2">[2]</a>.</p>
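<p>Burstiness in Tian&rsquo;s sense covers several features; the simplest is dispersion of sentence lengths. Here is a crude proxy (my sketch, not GPTZero&rsquo;s actual metric): the coefficient of variation of sentence lengths in words.</p>

```python
import re
import statistics

def burstiness(text: str) -> float:
    """Crude burstiness proxy: coefficient of variation of sentence
    lengths in words. Real detectors use richer feature sets."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths) / statistics.mean(lengths)

uniform = "The cat sat here. The dog sat there. The bird sat up. The fish swam by."
varied = "Stop. The storm rolled in off the coast before anyone had time to close the shutters. Silence."
print(burstiness(uniform) < burstiness(varied))   # True: uniform prose scores lower
```

<p>Systematic prose with deliberately parallel sentence structure lands on the <code>uniform</code> side of this metric, which is exactly the failure mode the rest of this post is about.</p>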
<p>The underlying assumption these tools share: human writing is variable,
contextually messy, idiosyncratic. AI writing is smooth and predictable.</p>
<p>This is accurate for a large class of human writing — casual prose, personal
essays, social media, student writing in informal registers. It is wrong
about a different and well-defined class of human writing. The Constitution
sits in that class. So does a lot of other text.</p>
<hr>
<h2 id="the-systemising-brain">The Systemising Brain</h2>
<p>Simon Baron-Cohen&rsquo;s empathising–systemising (E-S) theory distinguishes two
cognitive orientations. Empathising involves attending to social and emotional
cues, inferring mental states, navigating the pragmatic, implicit layer of
communication — what is meant rather than what is said. Systemising involves
attending to rules, patterns, and underlying regularities — the drive to
understand how things work and to represent them in explicit, transferable,
internally consistent terms <a href="#ref-3">[3]</a>.</p>
<p>Both orientations are distributed across the human population. They are not
exclusive, and neither is pathological. But autism spectrum conditions are
robustly associated with high systemising and relatively lower empathising —
not because autistic people lack emotions or care about others, but because
the cognitive mode that comes naturally to them is one of rules, structures,
and explicit representation rather than social inference and pragmatic
implication. The intense world theory <a href="#ref-4">[4]</a> adds a
complementary perspective: autistic brains may be characterised by
hyper-reactivity and hyper-plasticity, with pattern-seeking and systematising
serving partly as a way of making a too-intense world navigable. The
systematicity is not a deficit. It is an adaptation.</p>
<p>This has direct consequences for writing.</p>
<p>High-systemising writing tends toward:</p>
<ul>
<li>
<p><strong>Consistent vocabulary.</strong> The same term is used for the same concept
throughout, because substituting a synonym introduces ambiguity about
whether the referent is actually the same. Neurotypical writing freely
uses synonyms for stylistic variety; systemising writing resists this
on principle.</p>
</li>
<li>
<p><strong>Explicit logical structure.</strong> Claims are supported by stated reasons
rather than left to pragmatic inference. If there are three conditions,
all three are named. Nothing is &ldquo;needless to say.&rdquo;</p>
</li>
<li>
<p><strong>Low social hedging.</strong> Phrases like &ldquo;as everyone knows&rdquo; or &ldquo;obviously&rdquo;
are avoided, because they perform social alignment rather than convey
information — and they depend on shared assumptions the writer is not
confident are actually shared. (This connects to a point I made in the
<a href="/posts/car-wash-walk/">car-wash-walk post</a> about Gricean pragmatics:
autistic communication often violates the maxim of quantity in the
direction of over-informing, because nothing is assumed implicit.)</p>
</li>
<li>
<p><strong>Grammatical parallelism.</strong> Parallel logical content takes parallel
grammatical form. This is not stylistic affectation; it is a natural
consequence of representing structure explicitly.</p>
</li>
<li>
<p><strong>Minimal rhetorical noise.</strong> The prose does not meander, warm up, or
perform relatability. It states what needs to be stated.</p>
</li>
</ul>
<p>Now run text with these properties through an AI detector. Consistent
vocabulary reads as low lexical diversity. Explicit structure reads as low
burstiness. Minimal rhetorical noise reads as smooth, generated output. The
detector is measuring these properties accurately. The attribution to machine
generation is where it goes wrong.</p>
<p>Liang et al. <a href="#ref-5">[5]</a> demonstrated a closely related failure empirically: AI
detectors are significantly more likely to flag writing by non-native English
speakers as AI-generated. Non-native writers at advanced levels of formal
English tend to write more carefully, more consistently, and more in
accordance with explicit grammar rules — because they learned the language
as a system of explicit rules rather than acquiring it through immersive
social exposure. More systematic writing: higher AI probability score. The
mechanism is the same. The population is different.</p>
<hr>
<h2 id="the-physicist-brain">The Physicist Brain</h2>
<p>Physics writing has its own conventions, independently developed but pointing
in the same direction.</p>
<p>Scientific prose requires defined terms used consistently: in a paper about
quantum error correction, &ldquo;logical qubit,&rdquo; &ldquo;physical qubit,&rdquo; and &ldquo;syndrome&rdquo;
each mean exactly one thing, used identically in section 2 and section 5.
It requires explicit assumptions: &ldquo;We assume the noise is Markovian.&rdquo; &ldquo;In
the limit of large N.&rdquo; These are not vague hedges; they are precise
statements about the domain of validity of the results. It requires logical
derivation over rhetorical persuasion: the connectives are &ldquo;since,&rdquo;
&ldquo;therefore,&rdquo; &ldquo;it follows that&rdquo; — explicit logical operators, not narrative
bridges. And the passive construction of &ldquo;the signal was measured&rdquo; rather
than &ldquo;I measured the signal&rdquo; removes the individual from the result,
because the result should be reproducible regardless of who performs the
measurement.</p>
<p>The outcome is prose that is systematic, consistent, and structurally
predictable. From the outside — and from the vantage point of an AI
detector — it looks machine-generated.</p>
<p>Paul Dirac is the physicist who comes to mind first here. His 1928 paper
deriving the relativistic wave equation for the electron contains almost no
rhetorical apparatus. Motivation, equation, consequence: each stated once,
clearly, with no warm-up and no elaboration beyond what the argument
requires. It is not warm. It is not discursive. It is beautiful in the way
that a proof is beautiful: every element earns its place. Run it through
GPTZero and see what you get.</p>
<p>This connection between the physicist&rsquo;s prose style and the autistic cognitive
mode is not accidental. Baron-Cohen et al. <a href="#ref-6">[6]</a> surveyed Cambridge students
by academic discipline and found that physical scientists and mathematicians
scored consistently higher on the Autism Quotient (AQ) than humanities
students and controls, with mathematicians scoring highest of all. The
systemising orientation associated with autism spectrum conditions is also
overrepresented in, and presumably selected for by, quantitative scientific
disciplines. The physicist&rsquo;s prose reflects this. So does the writing of a
high-systemising person who has never studied physics.</p>
<p>The categories overlap without being identical. What they share is a
cognitive preference for explicit structure, consistent vocabulary, and
logical transparency over social performance and rhetorical persuasion. The
writing that emerges from that preference looks, to an AI detector, like it
was generated by a machine.</p>
<p>It was not.</p>
<hr>
<h2 id="the-category-error">The Category Error</h2>
<p>The error AI detectors make is not a measurement error. It is a category
error.</p>
<p>They are trained to distinguish two things: output generated by a
contemporary LLM, and a specific subset of human writing — typically casual,
personal, or student prose collected from online sources. When they encounter
text outside either of those training categories — systematic and precise but
human-generated — the classifier has no good option. The text does not match
the &ldquo;AI&rdquo; training data exactly, and it does not match the &ldquo;human&rdquo; baseline
either. It gets assigned to the bin it fits least badly.</p>
<p>What is happening when the Constitution is flagged: it is systematic,
definitional, prescriptive, and internally consistent. It was written by
lawyers and statesmen who understood that ambiguity in foundational documents
creates legal chaos. They wrote to be unambiguous. The result is text with
low perplexity and low burstiness — the statistical signature the detector
associates with AI.</p>
<p>GPTZero&rsquo;s creator Edward Tian acknowledged this problem when it was reported:
the Constitution appears so frequently in LLM training data that it registers
as &ldquo;already known&rdquo; to the model, which artificially lowers its perplexity
score. That is a real and specific issue. But it is secondary. The deeper
issue is that the Constitution would score low-perplexity even without the
training-data contamination effect, because systematic, definitional prose
is intrinsically low-perplexity. Precise language is predictable language.
That is partly the point of precise language.</p>
<p>The baseline assumption — that human writing is variable and idiosyncratic —
holds for much human writing. It does not hold for legal drafting, technical
documentation, scientific papers, sacred and historical texts written to be
durable and precise, writing by people with high systemising orientation, or
writing by non-native speakers at formal registers. That is not a small
population of edge cases. It is a substantial fraction of all written
material that exists.</p>
<hr>
<h2 id="right-and-wrong-at-the-same-time">Right and Wrong at the Same Time</h2>
<p>So: if you think these posts are AI-generated, you are right and wrong at
the same time.</p>
<p>Right, in two ways. First: yes, I use AI tools. LLM assistance is part of
my writing process — not an occasional aid, but a regular part of how
research notes and half-formed arguments become structured posts. Second:
the writing style of these posts is systematic and precise in ways that
detectors register as machine-generated. That systematicity is real, and
if a detector picks it up, it is measuring something.</p>
<p>Wrong, also in two ways. First: the ideas, judgments, and connections in
these posts are mine. The decisions about what to include and what to leave
out, which papers to cite and how to frame their implications, where the
interesting tension lies between neurodiversity research and the assumptions
baked into AI detection tools — those are not outputs of a language model
working in isolation. They are the product of someone who works at the
intersection of these fields and has thought about them for a while. An LLM
cannot generate these posts without a human who has already decided what
to say.</p>
<p>Second, and more important for the argument here: the systematic, precise
character of this writing is not evidence of machine generation. It is a
cognitive signature — one associated with physics training, with high
systemising orientation, with the <a href="/posts/inner-echo/">overlap between those two things that I
have written about elsewhere</a> in the context of
neurodiversity more broadly.</p>
<p>The detector is measuring a real property of the text. It is misattributing
the origin of that property.</p>
<p>The interesting question this opens is not &ldquo;did AI write this?&rdquo; That question
is increasingly poorly posed in an era where thinking and writing are already
deeply entangled with machine assistance, in ways that differ sharply from
person to person and task to task. The better question is: <em>whose judgment
is in the text?</em> Whose choices about what to include, what to connect, what
to leave out?</p>
<p>The systematicity in this writing is mine. The recognition that AI detectors
systematically disadvantage autistic writers, physicist writers, and
non-native speakers is a judgment I made, not one a language model was
prompted to produce. The connection to the Constitution — a document written
to be maximally unambiguous, flagged as maximally AI-like — is a connection
I found worth drawing.</p>
<p>Whether that makes this text &ldquo;human&rdquo; is a philosophical question I am happy
to leave open. What it is not is AI hallucination.</p>
<hr>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Mitchell, E., Lee, Y., Khazatsky, A., Manning, C. D., &amp; Finn, C. (2023). DetectGPT: Zero-shot machine-generated text detection using probability curvature. <em>Proceedings of the 40th International Conference on Machine Learning (ICML 2023)</em>. <a href="https://arxiv.org/abs/2301.11305">https://arxiv.org/abs/2301.11305</a></p>
<p><span id="ref-2"></span>[2] Gehrmann, S., Strobelt, H., &amp; Rush, A. M. (2019). GLTR: Statistical detection and visualization of generated text. <em>Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</em>, 111–116. <a href="https://doi.org/10.18653/v1/P19-3019">https://doi.org/10.18653/v1/P19-3019</a></p>
<p><span id="ref-3"></span>[3] Baron-Cohen, S. (2009). Autism: The empathising–systemising (E-S) theory. <em>Annals of the New York Academy of Sciences</em>, 1156(1), 68–80. <a href="https://doi.org/10.1111/j.1749-6632.2009.04467.x">https://doi.org/10.1111/j.1749-6632.2009.04467.x</a></p>
<p><span id="ref-4"></span>[4] Markram, K., &amp; Markram, H. (2010). The intense world theory — a unifying theory of the neurobiology of autism. <em>Frontiers in Human Neuroscience</em>, 4, 224. <a href="https://doi.org/10.3389/fnhum.2010.00224">https://doi.org/10.3389/fnhum.2010.00224</a></p>
<p><span id="ref-5"></span>[5] Liang, W., Yuksekgonul, M., Mao, Y., Wu, E., &amp; Zou, J. (2023). GPT detectors are biased against non-native English writers. <em>Patterns</em>, 4(7), 100779. <a href="https://doi.org/10.1016/j.patter.2023.100779">https://doi.org/10.1016/j.patter.2023.100779</a></p>
<p><span id="ref-6"></span>[6] Baron-Cohen, S., Wheelwright, S., Skinner, R., Martin, J., &amp; Clubley, E. (2001). The autism-spectrum quotient (AQ): Evidence from Asperger syndrome/high-functioning autism, males and females, scientists and mathematicians. <em>Journal of Autism and Developmental Disorders</em>, 31(1), 5–17. <a href="https://doi.org/10.1023/A:1005653411471">https://doi.org/10.1023/A:1005653411471</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>Two Expansion Rates, One Universe: The Hubble Tension at 5σ</title>
      <link>https://sebastianspicker.github.io/posts/hubble-tension-crisis-cosmology/</link>
      <pubDate>Mon, 16 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/hubble-tension-crisis-cosmology/</guid>
      <description>The universe has two measured expansion rates. One comes from the early universe, encoded in the cosmic microwave background. The other comes from measuring distances to nearby galaxies. They disagree by approximately 5σ — the threshold physicists call discovery. Every systematic error has been hunted down. JWST has confirmed the distance ladder. DESI has found hints that dark energy is not constant. Something is either wrong with our cosmological model or with one of two extremely well-tested measurement chains. We are not sure which.</description>
      <content:encoded><![CDATA[<p>The universe is expanding. We have known this since Edwin Hubble&rsquo;s 1929 paper, which measured recession velocities of galaxies using Cepheid variable stars and established what we now call Hubble&rsquo;s law:</p>
$$v = H_0 \, d$$<p>The proportionality constant $H_0$ — the Hubble constant — is the current rate of expansion, in units of km/s/Mpc (kilometres per second per megaparsec, where 1 Mpc $\approx 3.086 \times 10^{22}$ m). Hubble&rsquo;s original measurement gave $H_0 \approx 500$ km/s/Mpc. It was wrong by a factor of about seven, which is understandable given that he was measuring distances to galaxies in the 1920s using photographic plates and a lot of courage. Over the following decades, as techniques improved, the value converged toward 70 km/s/Mpc. By the 1990s, many people considered the question largely settled: $H_0$ was somewhere between 60 and 80, and the main arguments were about where exactly in that range.</p>
<p>Those arguments have sharpened considerably. We now have two extremely precise, extremely well-scrutinised measurements of $H_0$, and they disagree:</p>
$$H_0^{\text{late}} = 73.04 \pm 1.04 \text{ km/s/Mpc} \qquad \text{(distance ladder)}$$<p>
</p>
$$H_0^{\text{early}} = 67.4 \pm 0.5 \text{ km/s/Mpc} \qquad \text{(CMB)}$$<p>The discrepancy is $73.04 - 67.4 = 5.64$ km/s/Mpc. The combined uncertainty is $\sqrt{1.04^2 + 0.5^2} \approx 1.15$ km/s/Mpc. The significance is therefore approximately $4.9\sigma$, effectively the $5\sigma$ mark; the discrepancy is what cosmologists have taken to calling, with some grimness, the Hubble tension.</p>
<p>$5\sigma$ is what particle physicists require before claiming a discovery. It is the threshold designed to exclude chance statistical fluctuations at the level of one in 3.5 million. When the Hubble tension first appeared around 2016 it was at $2$–$3\sigma$, which is &ldquo;interesting.&rdquo; It has since grown monotonically as both measurement chains have been refined. This is not the behaviour you want from a systematic error that you are hoping will go away.</p>
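<p>The arithmetic is short enough to check directly; the inputs are the published SH0ES and Planck figures quoted above, and adding the uncertainties in quadrature assumes they are independent, which is reasonable here since the two chains share essentially no instruments or calibrations:</p>

```python
import math

h0_late, sig_late = 73.04, 1.04    # SH0ES distance ladder
h0_early, sig_early = 67.4, 0.5    # Planck CMB

delta = h0_late - h0_early
sigma = math.hypot(sig_late, sig_early)   # independent errors add in quadrature
print(f"delta = {delta:.2f} km/s/Mpc, sigma = {sigma:.2f}, "
      f"tension = {delta / sigma:.1f} sigma")
# delta = 5.64 km/s/Mpc, sigma = 1.15, tension = 4.9 sigma
```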
<p>This is the kind of problem that keeps me reading papers at unreasonable hours. The rest of this post is an attempt to explain why both measurements are trustworthy, why the disagreement is therefore a genuine crisis, and what proposals exist for resolving it.</p>
<h2 id="what--actually-measures-and-why-it-matters">What $H_0$ Actually Measures and Why It Matters</h2>
<p>Hubble&rsquo;s law $v = H_0 d$ is valid in the nearby universe, for galaxies whose recession velocities are much less than the speed of light. In the full relativistic framework, the expansion is described by the scale factor $a(t)$, which encodes how distances between comoving points grow with time. The Hubble parameter is defined as</p>
$$H(t) = \frac{\dot{a}}{a}$$<p>and $H_0 = H(t_0)$ is its value today. The Friedmann equation — derived from general relativity applied to a homogeneous and isotropic universe — gives</p>
$$H(z)^2 = H_0^2 \left[ \Omega_m (1+z)^3 + \Omega_r (1+z)^4 + \Omega_k (1+z)^2 + \Omega_\Lambda \right]$$<p>where $z$ is the redshift (related to the scale factor by $1 + z = 1/a$), and the $\Omega$ parameters are the present-day fractional energy densities of matter, radiation, spatial curvature, and the cosmological constant (dark energy), respectively. Our standard cosmological model, $\Lambda$CDM, assumes a spatially flat universe ($\Omega_k = 0$), negligible radiation today ($\Omega_r \approx 0$), and</p>
$$\Omega_\Lambda \approx 0.68, \qquad \Omega_m \approx 0.31, \qquad \Omega_b \approx 0.049$$<p>where $\Omega_b$ is the baryon (ordinary matter) density. Dark matter makes up most of $\Omega_m$.</p>
<p>$H_0$ is not just one number among many. It is the normalization of the entire cosmological distance scale. It appears in the age of the universe:</p>
$$t_0 = \int_0^\infty \frac{dz}{(1+z)\,H(z)}$$<p>which, for $\Lambda$CDM with the above parameters, gives $t_0 \approx 13.8$ Gyr. A higher $H_0$ means a faster expansion rate, which means — for fixed $\Omega$ values — a younger universe. An error in $H_0$ propagates into every cosmological distance and every age estimate. It is not a detail.</p>
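<p>That integral is easy to evaluate numerically. A minimal sketch in Python with the rounded densities above (flatness enforced by construction; the conversion constants are approximate):</p>

```python
import numpy as np
from scipy.integrate import quad

# Rounded flat-LambdaCDM parameters from the text (flatness enforced)
H0 = 67.4              # km/s/Mpc (Planck 2018)
Om = 0.31              # matter
Orad = 9e-5            # radiation (photons + neutrinos), approximate
OL = 1.0 - Om - Orad   # dark energy, so the Omegas sum to 1

KM_PER_MPC = 3.0857e19   # kilometres in a megaparsec
GYR_IN_S = 3.156e16      # seconds in a gigayear

def E(z):
    # Dimensionless expansion rate H(z)/H0 for flat LambdaCDM
    return np.sqrt(Om * (1 + z)**3 + Orad * (1 + z)**4 + OL)

# t0 = integral of dz / ((1+z) H(z)) from 0 to infinity
I, _ = quad(lambda z: 1.0 / ((1 + z) * E(z)), 0, np.inf)
t0 = I / (H0 / KM_PER_MPC) / GYR_IN_S   # seconds -> Gyr
print(f"t0 ≈ {t0:.2f} Gyr")
```

<p>This lands at $\approx 13.8$ Gyr, matching the quoted age. Raising $H_0$ to 73 with the same $\Omega$ values drops the age below 13 Gyr, which is the propagation-of-error point made above.</p>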
<h2 id="ladder-1-the-late-universe-measurement-">Ladder 1: The Late-Universe Measurement ($H_0 = 73$)</h2>
<p>The distance ladder is the name for the chain of calibrated distance measurements that extends from the Earth to the far reaches of the universe. Each rung calibrates the next.</p>
<p><strong>Rung 1: Geometric parallax.</strong> For stars within roughly 1–2 kpc, we can measure distance directly from the shift in apparent position as the Earth orbits the Sun. The parallax angle $\pi$ (in arcseconds) gives the distance $d = 1/\pi$ parsecs. This is pure geometry — it follows from Euclid and Kepler, not from any physical model of stars. The Gaia space mission has measured parallaxes for more than 1.5 billion stars with precisions reaching $\sim 10\,\mu$as for the brightest objects, providing the geometric foundation of the entire ladder.</p>
<p><strong>Rung 2: Cepheid variables.</strong> These are pulsating giant stars whose oscillation period $P$ is tightly correlated with their intrinsic luminosity $L$ — the Leavitt period-luminosity relation, discovered by Henrietta Swan Leavitt in 1912. The relation takes the form</p>
$$M = \alpha \log_{10}(P/\text{days}) + \beta + \gamma \left[ \text{Fe/H} \right]$$<p>where $M$ is the absolute magnitude, and the metallicity term $\gamma[\text{Fe/H}]$ accounts for the chemical composition of the star. Once calibrated against nearby Cepheids with known parallax distances, this relation allows the distance to any galaxy hosting Cepheids to be inferred from period measurements alone. Cepheids are luminous enough to be resolved in galaxies out to $\sim 50$ Mpc.</p>
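<p>As a toy illustration of how this rung works end to end: pick a period, get an absolute magnitude from the Leavitt relation, and convert the distance modulus into a distance. The coefficients and magnitudes below are invented placeholders, not a real calibration (real coefficients are passband-dependent, and the metallicity term is dropped here):</p>

```python
import math

# Illustrative Leavitt-law coefficients (placeholders; real calibrations
# depend on the passband and include a metallicity term)
alpha, beta = -2.43, -4.05

P_days = 30.0        # measured pulsation period (invented)
m_apparent = 25.0    # mean apparent magnitude (invented)

# Absolute magnitude from the period-luminosity relation
M = alpha * math.log10(P_days) + beta

# Distance modulus mu = m - M, then d = 10^((mu + 5) / 5) parsecs
mu = m_apparent - M
d_pc = 10 ** ((mu + 5) / 5)
print(f"M = {M:.2f}, d ≈ {d_pc / 1e6:.1f} Mpc")
```

<p>With these made-up numbers the Cepheid comes out at a few tens of Mpc — comfortably inside the $\sim 50$ Mpc reach quoted above.</p>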
<p><strong>Rung 3: Type Ia supernovae.</strong> These are thermonuclear explosions of white dwarf stars that occur near the Chandrasekhar mass limit ($\approx 1.44 M_\odot$), and consequently near a characteristic peak luminosity. Their light curves are not perfectly standard, but the peak luminosity correlates tightly with the rate at which brightness declines after peak — the Phillips relation. After this standardisation, Type Ia SNe serve as &ldquo;standardisable candles&rdquo; reaching to redshifts $z \sim 2$, far beyond the reach of Cepheids.</p>
<p>The logic of the ladder: Gaia calibrates nearby Cepheids; those Cepheids calibrate Cepheids in SN Ia host galaxies; those SN Ia establish the absolute luminosity of the standard candle; that calibrated SN Ia sample gives recession velocities (from spectroscopic redshifts) and distances (from luminosities) simultaneously, yielding $H_0$.</p>
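<p>The final step is conceptually just a slope fit of $v = H_0 d$. A sketch on synthetic data — the sample below is invented, and the real SH0ES analysis propagates full covariances rather than doing a bare least-squares fit:</p>

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic calibrated sample: distances (Mpc) and recession velocities (km/s).
# Invented numbers, for illustration only.
true_H0 = 73.0
d = rng.uniform(30, 600, size=200)
v = true_H0 * d + rng.normal(0.0, 300.0, size=200)  # peculiar-velocity scatter

# Hubble's law v = H0 * d has no intercept, so fit the slope directly:
H0_fit = np.sum(v * d) / np.sum(d ** 2)
print(f"H0 ≈ {H0_fit:.2f} km/s/Mpc")
```

<p>The recovered slope sits within a fraction of a km/s/Mpc of the input value — the hard part of the measurement is not the fit but the calibration feeding it.</p>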
<p>The SH0ES collaboration (Supernovae and $H_0$ for the Equation of State of dark energy) has driven this measurement for the past fifteen years. Their 2022 result (<a href="#ref-Riess2022">Riess et al., 2022</a>), using Hubble Space Telescope Cepheid calibrations in 37 galaxies hosting Type Ia SNe, found</p>
$$H_0 = 73.04 \pm 1.04 \text{ km/s/Mpc}$$<p>This is a 1.4% measurement. For context, Hubble&rsquo;s original measurement had an error of several hundred percent.</p>
<p><strong>The JWST confirmation.</strong> One candidate systematic error was crowding: in HST images of distant galaxies, Cepheids might be blended with unresolved neighbouring stars, artificially brightening them and biasing the distance estimate. JWST&rsquo;s larger mirror and infrared sensitivity resolve individual Cepheids more cleanly in the same host galaxies. The results from JWST observations in 2023 confirmed the HST Cepheid distances. The crowding concern was not the answer. The distance ladder value is not a systematic artifact of HST resolution.</p>
<p>An independent late-universe measurement comes from time-delay cosmography. Gravitational lensing of a background quasar by an intervening galaxy produces multiple images; the arrival times of photons along different paths differ by an amount that depends on $H_0$. The TDCOSMO collaboration (<a href="#ref-Birrer2020">Birrer et al., 2020</a>) found $H_0 = 74.5^{+5.6}_{-6.1}$ km/s/Mpc from seven lensed quasars, entirely independently of the distance ladder. This is a completely different physical observable. It agrees with SH0ES.</p>
<h2 id="ladder-2-the-early-universe-measurement-">Ladder 2: The Early-Universe Measurement ($H_0 = 67$)</h2>
<p>The cosmic microwave background (CMB) is the thermal radiation left over from recombination — the epoch at $z \approx 1100$ (about 380,000 years after the Big Bang) when the universe had cooled enough for protons and electrons to combine into neutral hydrogen, allowing photons to free-stream for the first time. The CMB is extraordinarily uniform in temperature ($T \approx 2.725$ K) but carries tiny fluctuations at the level of $\Delta T / T \sim 10^{-5}$.</p>
<p>Before recombination, the universe was a tightly coupled photon-baryon fluid. Perturbations in this fluid oscillated: gravity pulled baryons into overdense regions, while radiation pressure resisted compression. The competition set up acoustic waves — sound waves in the plasma of the early universe. These waves travelled at the sound speed</p>
$$c_s = \frac{c}{\sqrt{3(1 + R)}}, \qquad R = \frac{3 \rho_b}{4 \rho_\gamma}$$<p>where $\rho_b$ and $\rho_\gamma$ are the baryon and photon energy densities. The waves propagated until recombination froze them in place. The characteristic length they had traversed — the sound horizon — is</p>
$$r_s = \int_0^{t_{\text{rec}}} \frac{c_s \, dt}{a(t)}$$<p>This is a physical length scale set by the microphysics of the early universe, which depends only on $\Omega_b$, $\Omega_m$, and the expansion rate $H(z)$ at $z \gtrsim 1100$ (well before any dark energy becomes relevant). For the best-fit $\Lambda$CDM parameters, $r_s \approx 147$ Mpc.</p>
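<p>Both expressions can be checked numerically with rounded densities. A back-of-envelope sketch — my own parameter choices, so expect a few Mpc of slack against the quoted 147 Mpc, partly because that figure strictly refers to the slightly later baryon drag epoch rather than $z = 1100$:</p>

```python
import numpy as np
from scipy.integrate import quad

# Rounded, Planck-like parameters (illustrative; not the full likelihood fit)
H0 = 67.4                 # km/s/Mpc
h = H0 / 100.0
Om = 0.31                 # matter
Orad = 4.15e-5 / h**2     # photons + neutrinos
OL = 1.0 - Om - Orad      # flat universe
Obh2, Ogh2 = 0.0224, 2.47e-5   # baryon and photon densities times h^2
c = 299792.458            # km/s
z_rec = 1100.0

def H(z):
    return H0 * np.sqrt(Om * (1 + z)**3 + Orad * (1 + z)**4 + OL)

def c_s(z):
    # Sound speed of the photon-baryon fluid; R = 3 rho_b / (4 rho_gamma)
    # scales as 1/(1+z) because rho_b ~ (1+z)^3 and rho_gamma ~ (1+z)^4
    R = 0.75 * (Obh2 / Ogh2) / (1 + z)
    return c / np.sqrt(3.0 * (1.0 + R))

r_s, _ = quad(lambda z: c_s(z) / H(z), z_rec, np.inf)
print(f"r_s ≈ {r_s:.0f} Mpc")
```

<p>This comes out in the mid-140s of Mpc — the right scale, from nothing more than the Friedmann equation and the photon-baryon sound speed.</p>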
<p>The frozen acoustic oscillations imprint a characteristic angular scale on the CMB temperature fluctuations. The angular size of the sound horizon as seen from today is</p>
$$\theta_s = \frac{r_s}{D_A(z = 1100)}$$<p>where $D_A$ is the angular diameter distance to the last-scattering surface. The Planck satellite measured this angle with extraordinary precision: $\theta_s = 0.59656°$ (approximately 0.6 degrees; this scale sets the positions of the acoustic peaks in the angular power spectrum). This is the most precisely measured quantity in cosmology.</p>
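<p>With the sound horizon in hand, the angle is one more integral: the comoving distance to last scattering. A rough check, taking the quoted $r_s \approx 147$ Mpc at face value and using rounded densities (in a flat universe the comoving sound horizon divided by the comoving distance gives the same angle as the physical-over-$D_A$ form above):</p>

```python
import numpy as np
from scipy.integrate import quad

# Rounded flat-LambdaCDM parameters (illustrative)
H0, Om, Orad = 67.4, 0.31, 9e-5
OL = 1.0 - Om - Orad
c = 299792.458   # km/s

def H(z):
    return H0 * np.sqrt(Om * (1 + z)**3 + Orad * (1 + z)**4 + OL)

# Comoving distance to last scattering (Mpc)
chi, _ = quad(lambda z: c / H(z), 0, 1100)

r_s = 147.0      # comoving sound horizon quoted in the text (Mpc)
theta_deg = np.degrees(r_s / chi)
print(f"chi ≈ {chi:.0f} Mpc, theta_s ≈ {theta_deg:.2f} deg")
```

<p>This lands within about one percent of the measured $0.597°$; the residual comes from the rounded parameters and the choice of recombination redshift.</p>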
<p>Now, here is the key point. $\theta_s$ is measured directly from the CMB. But to extract $H_0$, we must model both $r_s$ (which depends on the early-universe physics) and $D_A(z=1100)$ (which depends on the late-universe expansion, and therefore on $H_0$). We fit $H_0$, $\Omega_m$, $\Omega_b$, and a handful of other parameters simultaneously to the entire CMB power spectrum.</p>
<p>The Planck 2018 result (<a href="#ref-Planck2020">Planck Collaboration, 2020</a>):</p>
$$H_0 = 67.4 \pm 0.5 \text{ km/s/Mpc}$$<p>This is a 0.7% measurement — even tighter than SH0ES. And it assumes $\Lambda$CDM. That assumption is crucial, and we will return to it.</p>
<p>Independent CMB experiments — the Atacama Cosmology Telescope (ACT) and the South Pole Telescope (SPT) — give consistent results, ruling out Planck-specific instrumental systematics. The CMB measurement is robust.</p>
<h2 id="the-tension-numbers-and-what-they-mean">The Tension: Numbers and What They Mean</h2>
<p>The discrepancy is:</p>
$$\Delta H_0 = 73.04 - 67.4 = 5.64 \text{ km/s/Mpc}$$<p>The combined statistical uncertainty is:</p>
$$\sigma_{\text{comb}} = \sqrt{1.04^2 + 0.5^2} \approx 1.15 \text{ km/s/Mpc}$$<p>The significance:</p>
$$\frac{\Delta H_0}{\sigma_{\text{comb}}} \approx 4.9\sigma$$<p>rounding to $\sim 5\sigma$ when additional late-universe calibrators and improved analyses are included. To put this in perspective: the probability of a $5\sigma$ discrepancy arising by chance from two correct measurements of the same quantity is roughly $3 \times 10^{-7}$. You might win a lottery. You probably should not bet your cosmological model on it.</p>
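<p>For completeness, the whole computation and its implied chance probability (one-sided, matching the particle-physics convention) fit in a few lines:</p>

```python
from math import sqrt
from scipy.stats import norm

h0_late, sig_late = 73.04, 1.04    # SH0ES distance ladder (km/s/Mpc)
h0_early, sig_early = 67.4, 0.5    # Planck CMB fit

delta = h0_late - h0_early
sigma = sqrt(sig_late**2 + sig_early**2)
n_sigma = delta / sigma
p = norm.sf(n_sigma)   # one-sided tail probability of a chance fluctuation

print(f"{n_sigma:.1f} sigma, p ≈ {p:.1e}")
```

<p>The tail probability is a few times $10^{-7}$ — the lottery-ticket odds quoted above.</p>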
<p>The tension has grown monotonically over a decade. In 2016 it was $3.4\sigma$. In 2019 it was $4.4\sigma$. Now it sits at $\sim 5\sigma$. This is the opposite of what happens when a systematic error is found: systematics tend to get corrected, reducing the tension. Instead, as each new systematic hypothesis has been tested and rejected, the significance has crept upward.</p>
<h2 id="what-could-explain-it">What Could Explain It</h2>
<p>The community has not been idle. The number of proposed solutions runs into the hundreds. They can be organised into a few broad categories.</p>
<h3 id="systematic-errors-increasingly-unlikely">Systematic Errors (Increasingly Unlikely)</h3>
<p>The distance ladder has multiple candidate systematics that have been carefully evaluated:</p>
<ul>
<li>
<p><strong>Cepheid metallicity dependence</strong>: the period-luminosity relation shifts with iron abundance $[\text{Fe/H}]$. This has been calibrated from first principles and from Gaia observations of Milky Way Cepheids. The residual uncertainty is $\lesssim 0.5$ km/s/Mpc.</p>
</li>
<li>
<p><strong>Photometric crowding</strong>: addressed by JWST.</p>
</li>
<li>
<p><strong>LMC geometry and distance</strong>: the Large Magellanic Cloud is the anchor for Cepheid calibrations. Its distance is now known from eclipsing binary stars and from the time delay of SN 1987A&rsquo;s light echo to better than 1%.</p>
</li>
<li>
<p><strong>SN Ia physics</strong>: host galaxy dependence of SN Ia peak luminosity, Malmquist bias in flux-limited surveys, potential evolution with redshift. These have been studied extensively. Residual effects are estimated at $\sim 1$ km/s/Mpc.</p>
</li>
</ul>
<p>On the CMB side, Planck foreground subtraction has been audited, beam calibration has been checked, and independent experiments agree. There is no credible Planck systematic that could shift $H_0$ by $5.6$ km/s/Mpc.</p>
<p>The conclusion of most working cosmologists is that neither measurement chain contains a systematic error large enough to resolve the tension. This leaves us with physics.</p>
<h3 id="early-dark-energy">Early Dark Energy</h3>
<p>This is currently the most discussed new-physics solution (<a href="#ref-Poulin2019">Poulin et al., 2019</a>). The idea is to introduce a new energy component that becomes dynamically important at $z \sim 3000$–$5000$, around the epoch of matter-radiation equality and well before recombination. This &ldquo;Early Dark Energy&rdquo; (EDE) temporarily increases the expansion rate $H(z)$ at early times.</p>
<p>Why does this help? Recall that the CMB measures $\theta_s = r_s / D_A$ directly and precisely. The sound horizon is</p>
$$r_s \propto \int_{z_{\text{rec}}}^\infty \frac{c_s \, dz}{H(z)}$$<p>A faster expansion rate (higher $H(z)$ at early times) reduces the integral, shrinking $r_s$. The angular diameter distance $D_A$ also changes, but less sensitively. A smaller $r_s$ means that, to reproduce the same observed angle $\theta_s$, the model requires $D_A$ to be correspondingly smaller. Smaller $D_A$ implies a higher $H_0$.</p>
<p>The schematic: if we boost $H(z)$ at $z \sim 3500$ by $\sim 10\%$, the inferred $H_0$ from the CMB shifts from 67 toward 71–72 km/s/Mpc. This can be implemented by an axion-like scalar field $\phi$ that rolls down a periodic potential</p>
$$V(\phi) = \Lambda^4 \left[1 - \cos\left(\frac{\phi}{f}\right)\right]^n$$<p>The field oscillates around the potential minimum after recombination, rapidly diluting like matter or radiation, and leaving no late-universe signature.</p>
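<p>A crude way to see the mechanism numerically: deform $H(z)$ with a localised bump near $z \sim 3500$ and recompute the sound-horizon integral. This is a toy deformation of the expansion rate, not a solution of any actual EDE field equations — the bump shape and width are arbitrary choices:</p>

```python
import numpy as np
from scipy.integrate import quad

# Rounded flat-LambdaCDM parameters (illustrative)
H0, Om, Orad = 67.4, 0.31, 9e-5
OL = 1.0 - Om - Orad
Obh2, Ogh2 = 0.0224, 2.47e-5
c = 299792.458

def H(z, boost=0.0):
    base = H0 * np.sqrt(Om * (1 + z)**3 + Orad * (1 + z)**4 + OL)
    # Toy EDE: a log-normal bump in H(z) centred on z ~ 3500 (arbitrary width)
    bump = 1.0 + boost * np.exp(-0.5 * ((np.log(1 + z) - np.log(3500)) / 0.6)**2)
    return base * bump

def c_s(z):
    R = 0.75 * (Obh2 / Ogh2) / (1 + z)
    return c / np.sqrt(3.0 * (1.0 + R))

def r_s(boost):
    val, _ = quad(lambda z: c_s(z) / H(z, boost), 1100, np.inf)
    return val

shrink = 1.0 - r_s(0.10) / r_s(0.0)   # effect of a 10% peak boost in H(z)
print(f"r_s shrinks by {100 * shrink:.1f}%")
```

<p>A ten percent peak boost shrinks $r_s$ by a few percent, which is the right order of magnitude for shifting the CMB-inferred $H_0$ from 67 toward 71–72, since the fractional shift in $H_0$ roughly tracks the fractional shrinkage of $r_s$ at fixed $\theta_s$.</p>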
<p>EDE is not without problems. The required EDE fraction ($f_{\text{EDE}} \sim 0.1$ at peak) requires fine-tuning the initial field value. More seriously, EDE models generally worsen the $S_8$ tension — the $\sim 2$–$3\sigma$ discrepancy between CMB and weak gravitational lensing measurements of the parameter $S_8 = \sigma_8 \sqrt{\Omega_m / 0.3}$ (where $\sigma_8$ is the amplitude of matter fluctuations on 8 Mpc/$h$ scales). Fixing one tension while worsening another is not the behaviour of a correct theory.</p>
<h3 id="modified-gravity-and-interacting-dark-energy">Modified Gravity and Interacting Dark Energy</h3>
<p>A zoo of alternatives modifies either the late-time or early-time expansion history. These include:</p>
<ul>
<li><strong>Phantom dark energy</strong>: $w < -1$, which increases $H_0$ inferred from CMB fits</li>
<li><strong>Dynamical dark energy</strong>: $w \neq -1$, potentially evolving</li>
<li><strong>Interacting dark matter/dark energy</strong>: momentum transfer between sectors modifying both the background expansion and perturbation growth</li>
<li><strong>Modified gravity theories</strong> (Horndeski, bimetric gravity, $f(R)$ theories): these change the relationship between curvature and matter, altering $H(z)$</li>
</ul>
<p>None of these is clearly preferred by the data in isolation, but several of them become more interesting in light of DESI.</p>
<h3 id="the-local-void">The Local Void</h3>
<p>A tempting classical explanation: if we happen to live inside a large underdense region (a &ldquo;Hubble bubble&rdquo;), the local expansion rate measured by the distance ladder would be higher than the cosmic mean. In an underdense region, there is less gravitational deceleration, so things expand faster locally.</p>
<p>The problem is scale. To shift $H_0$ by $5$ km/s/Mpc, the void would need to extend to $\gtrsim 300$ Mpc and have an underdensity of $\sim 20$%. Neither is consistent with the observed large-scale structure of the universe, where surveys of galaxy distributions show we are not in an anomalously underdense region at that scale.</p>
<h2 id="desi-dark-energy-may-not-be-constant">DESI: Dark Energy May Not Be Constant</h2>
<p>In 2024, the Dark Energy Spectroscopic Instrument (DESI) published its first-year results (<a href="#ref-DESI2024">DESI Collaboration, 2024</a>). DESI is measuring baryon acoustic oscillations (BAOs) in the distribution of millions of galaxies — the same acoustic physics as the CMB, but imprinted in the late-universe galaxy distribution rather than in photons at $z = 1100$.</p>
<p>The BAO standard ruler is the sound horizon $r_s$: the same $\sim 147$ Mpc scale imprinted at recombination appears as a preferred separation between galaxy pairs in the low-redshift universe. By measuring the angular size and redshift separation of the BAO peak at multiple redshifts, DESI traces $H(z)$ across cosmic time.</p>
<p>DESI DR1 measured BAOs in over 6 million extragalactic objects spanning $0.1 < z < 4.2$ and found a $2.5$–$3.9\sigma$ preference for dark energy that evolves with time (the significance depending on which supernova dataset is used in the combination). The standard model assumes $w = P/\rho = -1$ exactly (a cosmological constant). DESI&rsquo;s data is better fit by the $w_0$–$w_a$ parameterisation:</p>
$$w(a) = w_0 + w_a (1 - a)$$<p>where $a = 1/(1+z)$ is the scale factor. The DESI DR1 best-fit values, combined with CMB and SN Ia data, give $w_0 \approx -0.73$ and $w_a \approx -1.0$ — a dark energy that was more negative (more repulsive) in the past and is becoming less negative today. DESI DR2 (released in March 2025) raised the significance of this preference to $4.2\sigma$.</p>
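<p>Evaluating the parameterisation at a few redshifts makes the &ldquo;more negative in the past&rdquo; claim concrete:</p>

```python
# w0-wa (CPL) dark-energy equation of state: w(a) = w0 + wa * (1 - a)
w0, wa = -0.73, -1.0   # DESI DR1 + CMB + SN best-fit values quoted in the text

def w(z):
    a = 1.0 / (1.0 + z)
    return w0 + wa * (1.0 - a)

for z in (0.0, 0.5, 1.0, 3.0):
    print(f"z = {z:3.1f}: w = {w(z):+.2f}")
```

<p>With these values $w$ sits at $-0.73$ today and below $-1.4$ by $z = 3$, crossing the cosmological-constant value $w = -1$ in between.</p>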
<p>The connection to the Hubble tension is direct. The CMB&rsquo;s inference of $H_0 = 67.4$ km/s/Mpc is derived assuming $w = -1$ exactly throughout cosmic history. If dark energy is not a cosmological constant — if $w(z)$ varies — then the Friedmann equation at late times is different, the angular diameter distance $D_A(z=1100)$ is different, and the CMB-inferred $H_0$ changes. A dynamical dark energy that is less dominant at early times and more dominant at late times (which the DESI $w_0$–$w_a$ parameters suggest) tends to shift the CMB-inferred $H_0$ upward.</p>
<p>DESI may be showing us the resolution of the Hubble tension: not a systematic error in either measurement chain, but a genuine departure from $\Lambda$CDM that biases both inferences in opposite directions. The distance ladder measures $H_0$ today from local observations. The CMB infers $H_0$ from a model that assumes $w = -1$ everywhere. If the model is wrong, the inference is wrong.</p>
<p>This is not yet settled. The DESI results are also consistent with systematic errors in the SN Ia data used in combination with BAOs. The statistical significance is below $5\sigma$ for the individual datasets. But the direction of the deviation is consistent across data combinations, and it points toward the same part of parameter space that would ease the Hubble tension.</p>
<h2 id="jwst-and-the-early-galaxy-problem">JWST and the Early Galaxy Problem</h2>
<p>A brief digression — or perhaps not a digression. JWST was designed partly to study the first galaxies. What it has found is unexpected: there are galaxies at $z > 10$ (less than 500 million years after the Big Bang) that are more massive and more luminous than standard $\Lambda$CDM galaxy formation models predicted. Early headlines announced that JWST was &ldquo;breaking cosmology.&rdquo; The reality is more nuanced: $\Lambda$CDM is not broken by these observations, and some of the most extreme early candidates have been revised to lower redshifts as spectra were taken. But genuine tension persists for some objects.</p>
<p>The important point is the accumulation. The Hubble tension is a $5\sigma$ discrepancy in $H_0$. The $S_8$ tension is a $2$–$3\sigma$ discrepancy in matter clustering. The early galaxy problem is a qualitative excess at high redshift. DESI shows $4.2\sigma$ evidence for evolving dark energy. None of these individually is an unambiguous model-breaking crisis. Together, they are multiple independent data sets all pointing in the same direction: $\Lambda$CDM is under pressure. That is not a coincidence you should ignore.</p>
<h2 id="what-it-would-mean-if-cdm-is-wrong">What It Would Mean if $\Lambda$CDM Is Wrong</h2>
<p>Let me be clear about what &ldquo;wrong&rdquo; means here. $\Lambda$CDM is an extraordinarily successful model. It predicted the angular positions of the CMB acoustic peaks before they were measured. It correctly describes the large-scale structure of the universe across twelve billion years of cosmic history. It accounts for the primordial abundances of helium ($\sim 25\%$ by mass) and deuterium through Big Bang nucleosynthesis (lithium-7 remains a long-standing discrepancy of its own). The 2011 Nobel Prize in Physics was awarded for the discovery of accelerated expansion, the $\Lambda$ in $\Lambda$CDM.</p>
<p>Abandoning this model is not a decision to take lightly, and no serious cosmologist is proposing to do so. What is being proposed is that $\Lambda$CDM is <em>incomplete</em> in a specific way.</p>
<p>$\Lambda$CDM is not a fundamental theory. It is an empirical model with six free parameters: $H_0$, $\Omega_b h^2$, $\Omega_c h^2$, $A_s$ (scalar amplitude), $n_s$ (spectral index), $\tau$ (optical depth to reionization). It does not explain what dark matter actually is — whether it is a WIMP, an axion, a primordial black hole, or something stranger. It does not explain why the cosmological constant $\Lambda$ has the value it does (a separate crisis: the cosmological constant problem, for which the naive quantum-field-theory estimate is wrong by a factor of $\sim 10^{120}$). The model describes; it does not explain.</p>
<p>The Hubble tension, if it survives further scrutiny and grows in significance, would tell us something specific: the expansion history of the universe is not what $\Lambda$CDM predicts. Either there is new physics at early times (EDE, modified gravity before recombination) or the dark energy is not a cosmological constant (DESI). In either case, the fix is a modification of the Friedmann equation — a correction to how we model the $\Omega$ parameters or their time dependence. This is science working as it should: a model that has been extraordinarily successful is now encountering its limits, and those limits are pointing toward something new.</p>
<h2 id="closing-thoughts">Closing Thoughts</h2>
<p>I write on this blog about transit photometry — measuring the dimming of starlight as a planet crosses its star&rsquo;s disk (see, for instance, <a href="/posts/exoplanet-hunting-smartphones/">exoplanet hunting with smartphones</a> or <a href="/posts/the-gift-of-transits/">the gift of transits</a>). Those observations work because we trust the geometric relationship between angular size, distance, and physical size. The same geometric trust is what underpins Rung 1 of the distance ladder: parallax.</p>
<p>What is striking to me, as someone trained in physics, is that the Hubble tension sits at the <em>top</em> of the same ladder I describe at the bottom. Parallax gives distances to nearby stars. Those calibrate Cepheids. Those calibrate supernovae. Those supernovae reach to $z \sim 2$, and their recession velocities — measured by spectroscopy and interpreted through general relativity — give $H_0 = 73$. Meanwhile, the acoustic oscillations in the CMB that I described in <a href="/posts/astro-lab-at-home/">astro-lab at home</a> as a snapshot of the early universe give $H_0 = 67$ by a completely independent method. The two answers disagree at $5\sigma$.</p>
<p>The ladder is not broken. Both rungs have been checked, rechecked, and cross-checked. JWST has confirmed the Cepheid distances. Independent CMB experiments confirm Planck. DESI finds that dark energy may not be constant. Everything points in the same direction: the universe is telling us something.</p>
<p>We do not yet know what.</p>
<p>I find this situation — a clean empirical crisis, well-measured, unexplained — to be among the most exciting things happening in physics. Not because I enjoy confusion, but because clean empirical crises are where physics makes progress. The anomalous perihelion precession of Mercury was an annoying discrepancy with Newtonian gravity until Einstein showed it was a signature of spacetime curvature. The ultraviolet catastrophe in blackbody radiation was an embarrassing failure until Planck (Max, not the satellite) introduced the quantum hypothesis. The Hubble tension may be the next one. Or it may turn out to be a mundane systematic that everyone missed. Either answer would be interesting.</p>
<p>For now, the universe has two expansion rates and one of them is wrong. We are working on finding out which.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p><span id="ref-Planck2020"></span>Planck Collaboration. (2020). Planck 2018 results. VI. Cosmological parameters. <em>Astronomy &amp; Astrophysics</em>, 641, A6. <a href="https://doi.org/10.1051/0004-6361/201833910">DOI: 10.1051/0004-6361/201833910</a></p>
</li>
<li>
<p><span id="ref-Riess2022"></span>Riess, A. G., et al. (2022). A Comprehensive Measurement of the Local Value of the Hubble Constant with 1 km/s/Mpc Uncertainty from the Hubble Space Telescope and the SH0ES Team. <em>The Astrophysical Journal Letters</em>, 934, L7. <a href="https://doi.org/10.3847/2041-8213/ac5c5b">DOI: 10.3847/2041-8213/ac5c5b</a></p>
</li>
<li>
<p><span id="ref-DESI2024"></span>DESI Collaboration. (2024). DESI 2024 VI: Cosmological Constraints from the Measurements of Baryon Acoustic Oscillations. arXiv:2404.03002. Retrieved from <a href="https://arxiv.org/abs/2404.03002">https://arxiv.org/abs/2404.03002</a></p>
</li>
<li>
<p><span id="ref-Poulin2019"></span>Poulin, V., Smith, T. L., Karwal, T., &amp; Kamionkowski, M. (2019). Early Dark Energy can resolve the Hubble tension. <em>Physical Review Letters</em>, 122, 221301. <a href="https://doi.org/10.1103/PhysRevLett.122.221301">DOI: 10.1103/PhysRevLett.122.221301</a></p>
</li>
<li>
<p><span id="ref-Birrer2020"></span>Birrer, S., et al. (TDCOSMO). (2020). TDCOSMO IV: Hierarchical time-delay cosmography — joint inference of the Hubble constant, mass density profile and external convergence. <em>Astronomy &amp; Astrophysics</em>, 643, A165. <a href="https://doi.org/10.1051/0004-6361/202038861">DOI: 10.1051/0004-6361/202038861</a></p>
</li>
<li>
<p><span id="ref-Freedman2021"></span>Freedman, W. L. (2021). Measurements of the Hubble Constant: Tensions in Perspective. <em>The Astrophysical Journal</em>, 919, 16. <a href="https://doi.org/10.3847/1538-4357/ac0e95">DOI: 10.3847/1538-4357/ac0e95</a></p>
</li>
<li>
<p><span id="ref-DiValentino2021"></span>Di Valentino, E., et al. (2021). In the realm of the Hubble tension — a review of solutions. <em>Classical and Quantum Gravity</em>, 38(15), 153001. <a href="https://doi.org/10.1088/1361-6382/ac086d">DOI: 10.1088/1361-6382/ac086d</a></p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-03-22</strong>: Updated TDCOSMO quasar count to seven lensed systems and the $H_0$ value to match the TDCOSMO-only analysis. Updated DESI DR1 galaxy count to over 6 million extragalactic objects (the previous figure of 14 million corresponds to DR2). Added qualification that the 3.9$\sigma$ significance for evolving dark energy is dataset-dependent (ranging from 2.5$\sigma$ to 3.9$\sigma$ depending on the supernova sample used).</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Car Wash, Part Three: The AI Said Walk</title>
      <link>https://sebastianspicker.github.io/posts/car-wash-walk/</link>
      <pubDate>Thu, 12 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/car-wash-walk/</guid>
      <description>A new video went viral last week: same question, &amp;ldquo;should I drive to the car wash?&amp;rdquo;, different wrong answer — the AI said to walk instead. This is neither the tokenisation failure from the strawberry post nor the grounding failure from the rainy-day post. It is a pragmatic inference failure: the model understood all the words and (probably) had the right world state, but assigned its response to the wrong interpretation of the question. A third and more subtle failure mode, with Grice as the theoretical handle.</description>
      <content:encoded><![CDATA[<p><em>Third in an accidental series. Part one:
<a href="/posts/strawberry-tokenisation/">Three Rs in Strawberry</a> — tokenisation
and representation. Part two:
<a href="/posts/car-wash-grounding/">Should I Drive to the Car Wash?</a> — grounding
and missing world state. This one is different again.</em></p>
<hr>
<h2 id="the-video">The Video</h2>
<p>Same question as last month&rsquo;s: &ldquo;Should I drive to the car wash?&rdquo; New
video, new AI, new wrong answer. This time the assistant replied that
walking was the better option — better for health, better for the
environment, and the car wash was only fifteen minutes away on foot.</p>
<p>Accurate, probably. Correct, arguably. Useful? No.</p>
<p>The model did not fail because of tokenisation. It did not fail because
it lacked access to the current weather. It failed because it read the
wrong question. The user was asking &ldquo;is now a good time to have my car
washed?&rdquo; The model answered &ldquo;what is the most sustainable way for a
human to travel to the location of a car wash?&rdquo;</p>
<p>These are different questions. The model chose the second one. This is
a pragmatic inference failure, and it is the most instructive of the
three failure modes in this series — because the model was not, by any
obvious measure, working incorrectly. It was working exactly as
designed, on the wrong problem.</p>
<hr>
<h2 id="what-the-question-actually-meant">What the Question Actually Meant</h2>
<p>&ldquo;Should I drive to the car wash?&rdquo; is not about how to travel. The word
&ldquo;drive&rdquo; here is not a transportation verb; it is part of the idiomatic
compound &ldquo;drive to the car wash,&rdquo; which means &ldquo;take my car to get
washed.&rdquo; The presupposition of the question is that the speaker owns a
car, the car needs or might benefit from washing, and the speaker is
deciding whether the current moment is a good one to go. Nobody asking
this question wants to know whether cycling is a viable alternative.</p>
<p>Linguists distinguish between what a sentence <em>says</em> — its literal
semantic content — and what it <em>implicates</em> — the meaning a speaker
intends and a listener is expected to infer. Paul Grice formalised this
in 1975 with a set of conversational maxims describing how speakers
cooperate to communicate:</p>
<ul>
<li><strong>Quantity</strong>: say as much as is needed, no more</li>
<li><strong>Quality</strong>: say only what you believe to be true</li>
<li><strong>Relation</strong>: be relevant</li>
<li><strong>Manner</strong>: be clear and orderly</li>
</ul>
<p>The maxims are not rules; they are defaults. When a speaker says
&ldquo;should I drive to the car wash?&rdquo;, a cooperative listener applies the
maxim of Relation to infer that the question is about car maintenance
and current conditions, not about personal transport choices. The
&ldquo;drive&rdquo; is incidental to the real question, the way &ldquo;I ran to the
store&rdquo; does not invite commentary on jogging technique.</p>
<p>The model violated Relation — in the pragmatic sense. Its answer was
technically relevant to one reading of the sentence, and irrelevant to
the only reading a cooperative human would produce.</p>
<hr>
<h2 id="a-taxonomy-of-the-three-failures">A Taxonomy of the Three Failures</h2>
<p>It is worth being precise now that we have three examples:</p>
<p><strong>Strawberry</strong> (tokenisation failure): The information needed to answer
was present in the input string but lost in the model&rsquo;s representation.
&ldquo;Strawberry&rdquo; → <code>["straw", "berry"]</code> — the character &ldquo;r&rdquo; in &ldquo;straw&rdquo; is
not directly accessible. The model understood the task correctly; the
representation could not support it.</p>
<p><strong>Car wash, rainy day</strong> (grounding failure): The model understood the
question. The information needed to answer correctly — current weather —
was never in the input. The model answered by averaging over all
plausible contexts, producing a sensible-on-average response that was
wrong for this specific context.</p>
<p><strong>Car wash, walk</strong> (pragmatic inference failure): The model had all
the relevant words. It may have had access to the weather, the location,
the car state. It chose the wrong interpretation of what was being
asked. The sentence was read at the level of semantic content rather
than communicative intent.</p>
<p>Formally: let $\mathcal{I}$ be the set of plausible interpretations of
an utterance $u$. The intended interpretation $i^*$ is the one a
cooperative, contextually informed listener would assign. A well-functioning
pragmatic reasoner computes:</p>
$$i^* = \arg\max_{i \in \mathcal{I}} \; P(i \mid u, \text{context})$$<p>The model appears to have assigned high probability to the
transportation-choice interpretation $i_{\text{walk}}$, apparently on
the surface pattern: &ldquo;should I [verb of locomotion] to [location]?&rdquo;
generates responses about modes of transport. It is a natural
pattern-match. It is the wrong one.</p>
<hr>
<h2 id="why-this-failure-mode-is-more-elusive">Why This Failure Mode Is More Elusive</h2>
<p>The tokenisation failure has a clean diagnosis: look at the BPE splits,
find where the character information was lost. The grounding failure has
a clean diagnosis: identify the context variable $C$ the answer depends
on, check whether the model has access to it.</p>
<p>The pragmatic failure is harder to pin down because the model&rsquo;s answer
was not, in isolation, wrong. Walking is healthy. Walking to a car wash
that is fifteen minutes away is plausible. If you strip the question of
its conversational context — a person standing next to their dirty car,
wondering whether to bother — the model&rsquo;s response is coherent.</p>
<p>The error lives in the gap between what the sentence says and what the
speaker meant, and that gap is only visible if you know what the speaker
meant. In a training corpus, this kind of error is largely invisible:
there is no ground truth annotation that marks a technically-responsive
answer as pragmatically wrong.</p>
<p>This is a version of a known problem in computational linguistics: models
trained on text predict text, and text does not contain speaker intent.
A model can learn that &ldquo;should I drive to X?&rdquo; correlates with responses
about travel options, because that correlation is present in the data.
What it cannot easily learn from text alone is the meta-level principle:
this question is about the destination&rsquo;s purpose, not the journey.</p>
<hr>
<h2 id="the-gricean-model-did-not-solve-this">The Gricean Model Did Not Solve This</h2>
<p>It is tempting to think that if you could build in Grice&rsquo;s maxims
explicitly — as constraints on response generation — you would prevent
this class of failure. Generate only responses that are relevant to the
speaker&rsquo;s probable intent, not just to the sentence&rsquo;s semantic content.</p>
<p>This does not obviously work, for a simple reason: the maxims require
a model of the speaker&rsquo;s intent, which is exactly what is missing.
You need to know what the speaker intends to know which response is
relevant; you need to know which response is relevant to determine
the speaker&rsquo;s intent. The inference has to bootstrap from somewhere.</p>
<p>Human pragmatic inference works because we come to a conversation with
an enormous amount of background knowledge about what people typically
want when they ask particular kinds of questions, combined with
contextual cues (tone, setting, previous exchanges) that narrow the
interpretation space. A person asking &ldquo;should I drive to the car wash?&rdquo;
while standing next to a mud-spattered car in a conversation about
weekend plans is not asking for a health lecture. The context is
sufficient to fix the interpretation.</p>
<p>Language models receive text. The contextual cues that would fix the
interpretation for a human — the mud on the car, the tone of the
question, the setting — are not available unless someone has typed them
out. The model is not in the conversation; it is receiving a transcript
of it, from which the speaker&rsquo;s intent has to be inferred indirectly.</p>
<hr>
<h2 id="where-this-leaves-the-series">Where This Leaves the Series</h2>
<p>Three videos, three failure modes, three diagnoses. None of them are
about the model being unintelligent in any useful sense of the word.
Each of them is a precise consequence of how these systems work:</p>
<ol>
<li>Models process tokens, not characters. Character-level structure can
be lost at the representation layer.</li>
<li>Models are trained on static corpora and have no real-time connection
to the world. Context-dependent questions are answered by marginalising
over all plausible contexts, which is wrong when the actual context
matters.</li>
<li>Models learn correlations between sentence surface forms and response
types. The correlation between &ldquo;should I [travel verb] to [place]?&rdquo;
and transport-related responses is real in the training data. It is the
wrong correlation for this question.</li>
</ol>
<p>The useful frame, in all three cases, is not &ldquo;the model failed&rdquo; but
&ldquo;what, precisely, does the model lack that would be required to succeed?&rdquo;
The answers point in different directions: better tokenisation; real-time
world access and calibrated uncertainty; richer models of speaker intent
and conversational context. The first is an engineering problem. The
second is partially solvable with tools and still hard. The third is
unsolved.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Grice, P. H. (1975). Logic and conversation. In P. Cole &amp; J. Morgan
(Eds.), <em>Syntax and Semantics, Vol. 3: Speech Acts</em> (pp. 41–58).
Academic Press.</p>
</li>
<li>
<p>Levinson, S. C. (1983). <em>Pragmatics.</em> Cambridge University Press.</p>
</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Automate the Boring Stuff: Setlist to Playlist</title>
      <link>https://sebastianspicker.github.io/posts/setlist-to-playlist/</link>
      <pubDate>Tue, 10 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/setlist-to-playlist/</guid>
      <description>I love concerts. I love setlists. I hate building the playlist manually afterward. But do I really? A small automation project, a Deftones show in Dortmund, and the question of whether you should automate something you kind of enjoy.</description>
      <content:encoded><![CDATA[<p>Saturday was the Deftones at the Westfalenhalle in Dortmund. One of those concerts where the setlist is part of the experience — where you register, with something close to physical relief, that the arc landed exactly right, and you spend the Uber home mentally replaying the order.</p>
<p>Sunday I built a playlist from it. It took about forty minutes.</p>
<p>This is the post about why that number is already too low, and also possibly too high.</p>
<h2 id="the-ritual">The Ritual</h2>
<p>There is a specific kind of concert listening that happens in the days after a show. You go home, you look up the setlist — setlist.fm is the canonical archive, maintained with an almost academic precision by people who care — and you build a playlist from it in whatever streaming app you use. Then you play it through, in order, and what comes back is not just the music but the spatial memory of the room, the sound mix, the moment the lights dropped for that particular song.</p>
<p>I have been doing this for years. It is a ritual, and like most rituals, part of its meaning is in the doing. The forty minutes of searching song by song, the occasional discovery that a deep cut is on Apple Music in one version but not another, the fiddling with live versus studio — that friction is not purely annoying. It is part of the processing.</p>
<p>And yet. The pile of unprocessed setlists sits in a folder. Shows I attended and never got around to. Setlists I meant to build into playlists and didn&rsquo;t, because the forty minutes were not available that week, and then the moment passed. The ritual unrealised is just a list of song titles.</p>
<p>This is the dilemma, and it is not entirely trivial.</p>
<h2 id="why-this-is-harder-than-it-should-be">Why This Is Harder Than It Should Be</h2>
<p>The setlist.fm API is excellent. It gives you structured data: artist, venue, date, song titles in order, with notations for encores, covers, and dropped songs. What it does not give you is streaming IDs. The song title is a string; the Apple Music track is an object with a catalog ID, a duration, multiple versions, regional availability, and the possibility of not existing at all in the catalog of your country.</p>
<p>The matching problem — connecting a string like &ldquo;Change (In the House of Flies)&rdquo; to the correct Apple Music track, filtered for the right album version, ignoring the live recordings you did not ask for — is not hard, but it is fiddly. You can get 80% of a setlist matched automatically without much effort. The remaining 20% are the covers, the deep cuts, the songs with subtitles in parentheses that differ between the setlist record and the catalog metadata.</p>
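<p>The actual matching heuristics live in the app&rsquo;s TypeScript core, but the normalise-then-fuzzy-match idea can be sketched in a few lines of stdlib Python. The catalog entries here are invented, and a real matcher needs to be less cavalier about stripping subtitles — that is exactly where the 20% goes wrong:</p>

```python
import difflib
import re

def normalise(title: str) -> str:
    """Lowercase, drop parenthetical subtitles, strip punctuation.
    Deliberately crude: real heuristics must keep subtitles that
    distinguish different songs."""
    title = re.sub(r"\(.*?\)", "", title.lower())  # drop "(In the House of Flies)"
    return re.sub(r"[^a-z0-9 ]", "", title).strip()

def best_match(setlist_title: str, catalog: list[str]) -> tuple[str, float]:
    """Return the catalog title most similar to the setlist entry."""
    scored = [
        (c, difflib.SequenceMatcher(None, normalise(setlist_title), normalise(c)).ratio())
        for c in catalog
    ]
    return max(scored, key=lambda pair: pair[1])

# Invented catalog: studio track, live version, and an unrelated song.
catalog = [
    "Change (In the House of Flies)",
    "Change (In the House of Flies) [Live]",
    "My Own Summer (Shove It)",
]
match, score = best_match("Change (In the House of Flies)", catalog)
```

The live version scores lower because its bracketed suffix survives normalisation; that is the kind of signal the ambiguous 20% hinges on.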
<p>Spotify has a fairly rich ecosystem of community tools for exactly this workflow, because Spotify&rsquo;s API is permissive and well-documented and the auth flow is reasonable for third-party developers. Apple Music is harder. The MusicKit framework is real and capable, but the authentication requires managing a private key and JWT tokens signed with developer credentials — not the OAuth dance most developers are used to. The result is that the setlist → Apple Music pipeline is significantly underbuilt compared to the Spotify equivalent.</p>
<p>This is partly why I built <a href="https://github.com/sebastianspicker/setlist-to-playlist">setlist-to-playlist</a> as a PWA rather than reaching for an existing tool.</p>
<h2 id="how-it-works">How It Works</h2>
<p>The app is a Progressive Web App — installable, mobile-friendly, works as a small tool you open on your phone in the taxi home from a show — built on Next.js with a monorepo structure managed by pnpm and Turbo. The architecture is in three phases:</p>
<p><strong>Import.</strong> You paste a setlist.fm URL or ID. The app queries setlist.fm through a server-side proxy — the API key lives on the server and never touches the client — and returns the structured setlist data: songs in order, with metadata about covers, medleys, and notes.</p>
<p><strong>Preview and matching.</strong> The core package runs a matching algorithm against the Apple Music catalog, using the MusicKit JS API for browser-based catalog search. For each song title, it searches Apple Music and presents the best candidate, giving you the chance to confirm or swap before anything is written. This is the step where the 20% problem is addressed manually — the app handles the obvious cases automatically and surfaces the ambiguous ones for human judgement.</p>
<p><strong>Export.</strong> Once you are happy with the track list, the app creates a playlist in your Apple Music library. MusicKit handles the authentication in-browser; the backend generates the JWT tokens using credentials from Apple Developer, signing with the private key server-side so it stays off the client.</p>
<p>The whole thing is local-first in the sense that matters: the Apple Music authentication is between your browser and Apple, and no playlist data or listening history is stored by the app. The only thing the server touches is the API key proxying and the JWT generation.</p>
<h2 id="the-actual-experience">The Actual Experience</h2>
<p>After the Deftones show: opened the app on the phone, pasted the setlist.fm URL, had the playlist in Apple Music in about four minutes. Three tracks needed manual confirmation — two because of live-versus-studio ambiguity, one because a cover required a search adjustment, the kind of edge case where the name setlist.fm records differs from what appears in regional streaming catalogs.</p>
<p>Four minutes instead of forty. Mission accomplished.</p>
<p>And yet.</p>
<p>I noticed, processing the setlist that quickly, that something was missing. Not the music — the music was all there, in order, correct. What was missing was the time spent inside the setlist. The forty minutes of handling each song is also forty minutes of thinking about each song, of remembering where in the set it fell, of deciding which album version you want to hear. The automation removed the friction and also removed the processing.</p>
<p>I am not sure this is a problem. It is probably more accurate to say that it is a trade-off, and that what trade-off you want depends on what you are doing with the ritual. If the backlog is the problem — the pile of unprocessed shows — the automation solves it cleanly. If the processing itself is the point, you probably should not automate it, and the tool is there for when you want it.</p>
<p>That is the correct relationship to automation, I think. Not &ldquo;this should always be automated&rdquo; or &ldquo;this should never be automated&rdquo;, but &ldquo;here is a tool that removes the mechanical part; use it when the mechanical part is not the point&rdquo;.</p>
<h2 id="a-note-on-the-tech-stack">A Note on the Tech Stack</h2>
<p>For the interested: Next.js 15 with App Router, pnpm workspaces with Turbo for the monorepo, MusicKit JS for Apple Music integration, setlist.fm REST API. The JWT for Apple Music uses the <code>jose</code> library for token signing. The matching logic lives in a standalone <code>packages/core</code> module, which makes it testable in isolation and reusable if anyone wants to port this to a different frontend or a CLI.</p>
<p>The repo is at <a href="https://github.com/sebastianspicker/setlist-to-playlist">github.com/sebastianspicker/setlist-to-playlist</a>. PRs welcome, particularly around the matching heuristics — that is the part where there is the most room for improvement.</p>
<hr>
<p>The Deftones were exceptional, for the record. The Westfalenhalle was loud in the way that only a concrete hall that size can be loud, which is to say: correctly loud.</p>
<p>The playlist is good. I am glad it took four minutes and not forty.</p>
<p>I am also glad I know what I gave up.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Your Encryption Keys Are in Virginia: On BitLocker, the FBI, and Why European Universities Need Sovereign Software</title>
      <link>https://sebastianspicker.github.io/posts/public-money-public-code/</link>
      <pubDate>Wed, 28 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/public-money-public-code/</guid>
      <description>Microsoft confirmed this week that it hands BitLocker encryption keys to the FBI on receipt of a valid legal order. Windows 11 uploads them to your Microsoft account by default, without asking. For European universities that handle research data, student records, and HR information under GDPR, this is not an abstract concern. It is a structural problem. The answer is not a technical workaround. It is sovereign, publicly funded, openly licensed software — and a principle that the EU has articulated but not consistently practised: public money, public code.</description>
      <content:encoded><![CDATA[<h2 id="the-story">The Story</h2>
<p>Last week Microsoft confirmed, in response to reporting by TechCrunch and
others, that it had handed BitLocker recovery keys for three laptops to
the FBI following a valid court order. The underlying case was a fraud
investigation in Guam. The laptops were encrypted with BitLocker — the
full-disk encryption built into Windows, which many institutions and
individuals rely on as their primary protection against unauthorised
data access.</p>
<p>The mechanism is simple and was not widely known. When you set up a
modern Windows device and sign in with a Microsoft account, BitLocker
automatically uploads your recovery key to Microsoft&rsquo;s cloud. No
prominent notification. No opt-in. The key sits there, associated with
your account, accessible to Microsoft. When a US court issues a lawful
order, Microsoft complies. Redmond confirmed this is policy, not an
exception.</p>
<p>Bruce Schneier&rsquo;s <a href="https://www.schneier.com/blog/archives/2026/02/microsoft-is-giving-the-fbi-bitlocker-keys.html">response</a>
was characteristically direct: &ldquo;The lesson here is that if you have
access to keys, eventually law enforcement is going to come.&rdquo; Jennifer
Granick at the ACLU called remote key storage in this configuration
&ldquo;quite dangerous,&rdquo; particularly given that the same mechanism is
available to any government that can issue a Microsoft-compatible legal
order — not only the US Department of Justice.</p>
<p>That last point is the one European institutions should be reading
carefully.</p>
<hr>
<h2 id="why-this-is-a-european-problem">Why This Is a European Problem</h2>
<p>The CLOUD Act — the US Clarifying Lawful Overseas Use of Data Act,
passed in 2018 — allows US law enforcement to compel US-based companies
to produce data held on servers anywhere in the world. If your
university stores its BitLocker recovery keys in a Microsoft account,
and Microsoft is a US company, the geographic location of the servers
those keys sit on does not limit a US court&rsquo;s reach. The keys are in
Virginia, legally, wherever the hardware is.</p>
<p>This is not speculation. It is the explicit structure of US digital law.
The European Court of Justice has repeatedly ruled that certain US
surveillance frameworks are incompatible with GDPR — the invalidation
of Privacy Shield in <em>Schrems II</em> (2020) being the most prominent
example. But court rulings about data transfer frameworks do not
automatically change the operational reality for an institution whose
laptops are running Windows with default settings.</p>
<p>European universities hold exactly the kinds of data that make this
a real rather than a theoretical concern:</p>
<ul>
<li><strong>Research data</strong>: medical studies, clinical trials, interviews with
human subjects, social science datasets — all subject to strict
ethical and legal protections</li>
<li><strong>Student records</strong>: academic performance, personal circumstances,
disciplinary proceedings</li>
<li><strong>HR data</strong>: employment contracts, salary records, health information,
union activity — particularly sensitive under German and EU labour
law</li>
<li><strong>Correspondence and draft documents</strong>: research in progress, grant
applications, peer review material</li>
</ul>
<p>If the disk holding any of this is encrypted with BitLocker, and the
recovery key has been uploaded to a Microsoft account by default, the
encryption provides less protection than it appears to. The key is
accessible to a foreign state with a court order. That state is not
party to GDPR.</p>
<hr>
<h2 id="the-structural-problem">The Structural Problem</h2>
<p>The BitLocker story is one instance of a larger pattern. It is not that
Microsoft behaved unusually or maliciously — it complied with a lawful
order in its home jurisdiction, as it is legally required to do. The
problem is structural: <strong>when an institution depends on a closed-source,
US-headquartered platform for its critical infrastructure, the
institution has delegated control over its own data to an entity whose
legal obligations lie elsewhere.</strong></p>
<p>This applies beyond encryption. It applies to email (Exchange Online,
Outlook), document storage (SharePoint, OneDrive), communication
(Teams), identity management (Azure Active Directory), and any service
that runs through a Microsoft account or Azure tenant. For each of these:
the data is subject to Microsoft&rsquo;s terms, and Microsoft is subject to
US law.</p>
<p>The same argument applies, with different specifics, to Google Workspace
and any other US-headquartered platform. The issue is not that these
companies are bad actors. It is that their legal accountability and the
legal accountability of European public institutions point in
incompatible directions, and the institutions mostly have not noticed.</p>
<hr>
<h2 id="what-sovereign-software-looks-like">What Sovereign Software Looks Like</h2>
<p>The alternative is not paranoia and air-gapped servers. It is a
coherent strategy for institutional digital infrastructure that is
based on software the institution controls.</p>
<p>In Germany, this conversation has a name and a project. <strong>OpenDesk</strong>
— developed under the aegis of the federal and state governments —
is a stack of open-source tools (Nextcloud, Collabora Online, Matrix/
Element, Jitsi, Keycloak, Open-Xchange) assembled into an integrated
workspace alternative to Microsoft 365. The <em>Souveräner Arbeitsplatz</em>
(sovereign workspace) concept behind it is exactly what the BitLocker
story illustrates: if the software is open, the keys stay in your
institution, and no foreign court can reach them via a warrant served
on a US company.</p>
<p>Several German states and federal agencies have been piloting OpenDesk.
The city of Munich&rsquo;s earlier experiment with Linux (LiMux) and its
eventual rollback to Windows is the cautionary tale here — not because
open source failed, but because the transition was not supported
seriously enough over time, and the incumbent vendor&rsquo;s lobbying was.
The BitLocker story is a reminder of what is at stake in that political
negotiation.</p>
<p>The FSFE&rsquo;s <strong>&ldquo;Public Money? Public Code!&rdquo;</strong> campaign has articulated
the principle cleanly: software developed with public funding should
be released as open-source software. The argument is not only about
freedom as an abstract value. It is about the practical consequence of
being locked into a proprietary platform: your institution loses the
ability to audit what the software does, to modify it to meet your
requirements, to host it where your data protection law applies, and to
switch providers without losing access to your own data.</p>
<hr>
<h2 id="what-i-do-and-why">What I Do, and Why</h2>
<p>I work at a publicly funded institution. The software I build for
institutional contexts — campus infrastructure, workforce management,
archival systems, alert systems — is public.</p>
<p>Not because I am ideologically committed to open source as a movement,
but because the alternative is incoherent. If I build tooling for a
university with public funds and keep it closed, I have produced a
private asset with public money, duplicated by every institution that
builds the same thing independently, inspectable by nobody, and
ultimately dependent on my continued willingness to maintain it or
hand it over. None of those outcomes serve the institutions I am
building for.</p>
<p>Here is what that looks like in practice:</p>
<p><strong><a href="https://github.com/sebastianspicker/zammad-ticket-archiver">zammad-ticket-archiver</a></strong> —
automated archival of Zammad support tickets as cryptographically
signed PDFs, with RFC 3161 timestamps for non-repudiation. Built for
institutions that need legally defensible audit trails of their
helpdesk operations. The signing infrastructure is self-hosted; no
external party holds the keys.</p>
<p><strong><a href="https://github.com/sebastianspicker/alarm-broker">alarm-broker</a></strong> —
a silent panic alarm broker for campus facilities. Receives emergency
triggers from hardware devices (Yealink keys), distributes
notifications via Zammad, SMS, and Signal, with acknowledgment
tracking and escalation scheduling. Runs locally, logs to
self-hosted PostgreSQL; no external dependency for the alarm path.</p>
<p><strong><a href="https://github.com/sebastianspicker/campus-app-kit">campus-app-kit</a></strong> —
a React Native / Expo starter for university mobile applications,
with a pluggable Node.js backend designed for institutional data
sources (room booking, events, schedules). The architecture separates
institution-specific connectors (which institutions keep private) from
the shared foundation (which is public). Any university can take it
and build on it without starting from scratch.</p>
<p><strong><a href="https://github.com/sebastianspicker/cueq">cueq</a></strong> — an integrated
workforce management system for German universities under TV-L
(the collective agreement for public sector employees in the German
states). Handles time recording, shift planning, absence management,
payroll export, and GDPR-compliant audit trails. Built around NestJS
and Next.js, with a PostgreSQL backend and Honeywell terminal
integration. The HR data stays on the institution&rsquo;s own infrastructure.</p>
<p>These are all boring. They are not research contributions; they are
plumbing. But plumbing is what holds institutions together, and the
question of who controls the plumbing — and under whose legal
jurisdiction — is exactly the question the BitLocker story makes
visible.</p>
<hr>
<h2 id="the-principle">The Principle</h2>
<p>Public money, public code. If an institution funded by public money
develops software for its own operations, that software should be
released under an open licence, inspectable, forkable, and deployable
by any institution with the same needs.</p>
<p>The corollary: institutions funded by public money should prefer
software that is itself openly licensed, auditable, and deployable
on infrastructure the institution controls. Not as a blanket ban on
proprietary tools where they are genuinely the best option, but as a
starting presumption that shifts the burden of justification.</p>
<p>The BitLocker story is not a story about Microsoft doing something
wrong. It is a story about the logical consequence of a procurement
decision that was made without asking &ldquo;and what happens when a US
court sends a subpoena?&rdquo; That question was available in 2018 when the
CLOUD Act passed, in 2020 when <em>Schrems II</em> was decided, and before
both. It is still available now, for every institution that has not
yet asked it.</p>
<hr>
<p><em>The FSFE &ldquo;Public Money? Public Code!&rdquo; campaign is at
<a href="https://publiccode.eu/">publiccode.eu</a>. The OpenDesk project is at
<a href="https://opendesk.de/">opendesk.de</a>. The original TechCrunch reporting
on the BitLocker handover is at
<a href="https://techcrunch.com/2026/01/23/microsoft-gave-fbi-a-set-of-bitlocker-encryption-keys-to-unlock-suspects-laptops-reports/">techcrunch.com</a>.</em></p>
]]></content:encoded>
    </item>
    <item>
      <title>Should I Drive to the Car Wash? On Grounding and a Different Kind of LLM Failure</title>
      <link>https://sebastianspicker.github.io/posts/car-wash-grounding/</link>
      <pubDate>Tue, 20 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/car-wash-grounding/</guid>
      <description>A viral video this month showed an AI assistant confidently answering &amp;ldquo;should I go to the car wash today?&amp;rdquo; without knowing it was raining outside. The internet found it funny. The failure mode is real but distinct from the strawberry counting problem — this is not a representation issue, it is a grounding issue. The model understood the question perfectly. What it lacked was access to the state of the world the question was about.</description>
      <content:encoded><![CDATA[<p><em>Follow-up to <a href="/posts/strawberry-tokenisation/">Three Rs in Strawberry</a>,
which covered a different LLM failure: tokenisation and why models cannot
count letters. This one is about something structurally different.</em></p>
<hr>
<h2 id="the-video">The Video</h2>
<p>Someone asked their car&rsquo;s built-in AI assistant: &ldquo;Should I drive to the
car wash today?&rdquo; It was raining. The assistant said yes, enthusiastically,
with reasons: regular washing extends the life of the paintwork, removes
road salt, and so on. Technically correct statements, all of them.
Completely beside the point.</p>
<p>The clip spread. The reactions were the usual split: one camp said this
proves AI is useless, the other said it proves people expect too much
from AI. Both camps are arguing about the wrong thing.</p>
<p>The interesting question is: why did the model fail here, and is this
the same kind of failure as the strawberry problem?</p>
<p>It is not. The failures look similar from the outside — confident wrong
answer, context apparently ignored — but the underlying causes are
different, and the difference matters if you want to understand what
these systems can and cannot do.</p>
<hr>
<h2 id="the-strawberry-problem-was-about-representation">The Strawberry Problem Was About Representation</h2>
<p>In the strawberry case, the model failed because of the gap between its
input representation (BPE tokens: &ldquo;straw&rdquo; + &ldquo;berry&rdquo;) and the task (count
the character &ldquo;r&rdquo;). The character information was not accessible in the
model&rsquo;s representational units. The model understood the task correctly —
&ldquo;count the r&rsquo;s&rdquo; is unambiguous — but the input structure did not support
executing it.</p>
<p>That is a <em>representation</em> failure. The information needed to answer
correctly was present in the original string but was lost in the
tokenisation step.</p>
<p>The car wash case is different. The model received a perfectly
well-formed question and had no representation problem at all. &ldquo;Should I
drive to the car wash today?&rdquo; is tokenised without any information loss.
The model understood it. The failure is that the correct answer depends
on information that was never in the input in the first place.</p>
<hr>
<h2 id="the-missing-context">The Missing Context</h2>
<p>What would you need to answer &ldquo;should I drive to the car wash today?&rdquo;
correctly?</p>
<ul>
<li>The current weather (is it raining now?)</li>
<li>The weather forecast for the rest of the day (will it rain later?)</li>
<li>The current state of the car (how dirty is it?)</li>
<li>Possibly: how recently was it last washed, what kind of dirt (road
salt after winter, tree pollen in spring), whether there is a time
constraint</li>
</ul>
<p>None of this is in the question. A human asking the question has access
to some of it through direct perception (look out the window) and some
through memory (I just drove through mud). A language model has access
to none of it.</p>
<p>Let $X$ denote the question and $C$ denote this context — the current
state of the world that the question is implicitly about. The correct
answer $A$ is a function of both:</p>
$$A = f(X, C)$$<p>The model has $X$. It does not have $C$. What it produces is something
like an expectation over possible contexts, marginalising out the unknown
$C$:</p>
$$\hat{A} = \mathbb{E}_C\!\left[\, f(X, C) \,\right]$$<p>Averaged over all plausible contexts in which someone might ask this
question, &ldquo;going to the car wash&rdquo; is probably a fine idea — most of the
time when people ask, it is not raining and the car is dirty.
$\hat{A}$ is therefore approximately &ldquo;yes.&rdquo; The model returns &ldquo;yes.&rdquo;
In this particular instance, where $C$ happens to include &ldquo;it is
currently raining,&rdquo; $\hat{A} \neq f(X, C)$.</p>
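<p>A toy calculation shows how the marginalisation goes wrong. The context probabilities are invented:</p>

```python
# Toy version of A_hat = E_C[f(X, C)]: the model effectively averages the
# correct answer over plausible contexts. All probabilities are invented.
contexts = {
    # context: (P(context), correct answer f(X, C); 1 = "yes, go wash")
    "dry_and_dirty": (0.70, 1),
    "raining_now":   (0.20, 0),
    "just_washed":   (0.10, 0),
}

a_hat = sum(p * a for p, a in contexts.values())      # expectation over C
answer_on_average = "yes" if a_hat > 0.5 else "no"

# The averaged answer is "yes" — even though in the actual context
# ("raining_now") the correct answer f(X, C) is "no".
```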
<p>The quantity that measures how much the missing context matters is the
mutual information between the answer and the context, given the
question:</p>
$$I(A;\, C \mid X) \;=\; H(A \mid X) - H(A \mid X, C)$$<p>Here $H(A \mid X)$ is the residual uncertainty in the answer given only
the question, and $H(A \mid X, C)$ is the residual uncertainty once the
context is also known. For most questions in a language model&rsquo;s training
distribution — &ldquo;what is the capital of France?&rdquo;, &ldquo;how do I sort a list
in Python?&rdquo; — this mutual information is near zero: the context does not
change the answer. For situationally grounded questions like the car wash
question, it is large: the answer is almost entirely determined by the
context, not the question.</p>
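<p>Plugging invented numbers into the same toy distribution shows the quantity directly. In this sketch the context fully determines the answer, so $H(A \mid X, C) = 0$ and the mutual information reduces to $H(A \mid X)$:</p>

```python
import math

def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Invented distribution over contexts for the car-wash question.
p_context = {"dry_and_dirty": 0.7, "raining_now": 0.2, "just_washed": 0.1}
answer = {"dry_and_dirty": "yes", "raining_now": "no", "just_washed": "no"}

p_yes = sum(p for c, p in p_context.items() if answer[c] == "yes")
h_a_given_x = entropy([p_yes, 1 - p_yes])        # uncertainty given X alone
h_a_given_xc = 0.0                               # context pins the answer down
mutual_information = h_a_given_x - h_a_given_xc  # large: context matters

# Contrast "what is the capital of France?": the answer is independent of
# context, so H(A | X) is already ~0 and the mutual information is ~0.
```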
<hr>
<h2 id="why-the-model-was-confident-anyway">Why the Model Was Confident Anyway</h2>
<p>This is the part that produces the most indignation in the viral clips:
not just that the model was wrong, but that it was <em>confident</em> about
being wrong. It did not say &ldquo;I don&rsquo;t know what the current weather is.&rdquo;
It said &ldquo;yes, here are five reasons you should go.&rdquo;</p>
<p>Two things are happening here.</p>
<p><strong>Training distribution bias.</strong> Most questions in the training data that
resemble &ldquo;should I do X?&rdquo; have answers that can be derived from general
knowledge, not from real-time world state. &ldquo;Should I use a VPN on public
WiFi?&rdquo; &ldquo;Should I stretch before running?&rdquo; &ldquo;Should I buy a house or rent?&rdquo;
All of these have defensible answers that do not depend on the current
weather. The model learned that this question <em>form</em> typically maps to
answers of the form &ldquo;here are some considerations.&rdquo; It applies that
pattern here.</p>
<p><strong>No explicit uncertainty signal.</strong> The model was not trained to say
&ldquo;I cannot answer this because I lack context C.&rdquo; It was trained to
produce helpful-sounding responses. A response that acknowledges
missing information requires the model to have a model of its own
knowledge state — to know what it does not know. This is harder than
it sounds. The model has to recognise that $I(A; C \mid X)$ is high
for this question class, which requires meta-level reasoning about
information structure that is not automatically present.</p>
<p>This is sometimes called <em>calibration</em>: the alignment between expressed
confidence and actual accuracy. A well-calibrated model that is 80%
confident in an answer is right about 80% of the time. A model that is
confident about answers it cannot possibly know from its training data
is miscalibrated. The car wash video is a calibration failure as much
as a grounding failure.</p>
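<p>Calibration can be quantified. One common summary, in the spirit of the Guo
et al. paper cited below, is the expected calibration error (ECE): bin
predictions by stated confidence and measure the gap between confidence and
accuracy in each bin. The sketch uses invented predictions and a minimal
binning scheme, not any particular evaluation suite.</p>

```python
# Sketch of expected calibration error (ECE) on toy predictions.
# The (confidence, correct) pairs below are invented for illustration.
def expected_calibration_error(preds, n_bins=5):
    """preds: list of (confidence, was_correct) pairs.
    Bins predictions by confidence, then averages the gap between
    mean confidence and empirical accuracy, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for conf, correct in preds:
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, correct))
    n = len(preds)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece

# A model that says "90% sure" but is right only half the time
# is badly miscalibrated: its ECE is large.
overconfident = [(0.9, True), (0.9, False)] * 10
print(expected_calibration_error(overconfident))  # 0.4
```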
<hr>
<h2 id="what-grounding-means">What Grounding Means</h2>
<p>The term <em>grounding</em> in AI has a precise origin. Harnad (1990) used it
to describe the problem of connecting symbol systems to the things
they refer to — how does the word &ldquo;apple&rdquo; connect to actual apples,
rather than just to other symbols? A symbol system that only connects
symbols to other symbols (dictionary definitions, synonym relations)
has the form of meaning without the substance.</p>
<p>Applied to language models: the model has rich internal representations
of concepts like &ldquo;rain,&rdquo; &ldquo;car wash,&rdquo; &ldquo;dirty car,&rdquo; and their relationships.
But those representations are grounded in text about those things, not in
the things themselves. The model knows what rain is. It does not know
whether it is raining right now, because &ldquo;right now&rdquo; is not a location
in the training data.</p>
<p>This is not a problem that can be solved by making the model bigger or
training it on more text. More text does not give the model access to the current
state of the world. It is a structural feature of how these systems work:
they are trained on a static corpus and queried at inference time, with
no automatic connection to the world state at the moment of the query.</p>
<hr>
<h2 id="what-tool-use-gets-you-and-what-it-doesnt">What Tool Use Gets You (and What It Doesn&rsquo;t)</h2>
<p>The standard engineering response to grounding problems is tool use:
give the model access to a weather API, a calendar, a search engine.
Now when asked &ldquo;should I go to the car wash today?&rdquo; the model can query
the weather service, get the current conditions, and factor that into
the answer.</p>
<p>This is genuinely useful. The model with a weather tool call will answer
this question correctly in most circumstances. But tool use solves the
problem only if two conditions hold:</p>
<ol>
<li>
<p><strong>The model knows it needs the tool.</strong> It must recognise that this
question has $I(A; C \mid X) > 0$ for context $C$ that a weather
tool can provide, and that it is missing that context. This requires
the meta-level awareness described above. Models trained on tool use
learn to invoke tools for recognised categories of question; for novel
question types, or questions that superficially resemble answerable
ones, the tool call may not be triggered.</p>
</li>
<li>
<p><strong>The right tool exists and returns clean data.</strong> Weather APIs exist.
&ldquo;How dirty is my car?&rdquo; does not have an API. &ldquo;Am I the kind of person
who cares about car cleanliness enough that this matters?&rdquo; has no API.
Some missing context can be retrieved; some is inherently private to
the person asking.</p>
</li>
</ol>
<p>The deeper issue is not tool availability but <em>knowing what you don&rsquo;t
know</em>. A model that does not recognise its own information gaps cannot
reliably decide when to use a tool, ask a clarifying question, or
express uncertainty. This is a hard problem — arguably harder than
making the model more capable at the tasks it already handles.</p>
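<p>To see why the gating decision is fragile, consider the crudest possible
version of it. The sketch below routes questions with a hand-written keyword
rule; the tool name and the rule itself are hypothetical, and the point is
precisely that any fixed rule misses question types it was not written for.</p>

```python
# Hypothetical sketch of tool gating: route a question to a tool only
# when it is judged to depend on current world state. Real systems
# learn this decision; the keyword rule here is purely illustrative
# and brittle by construction.
CONTEXT_DEPENDENT_CUES = ("today", "right now", "current", "tonight")

def needs_live_context(question: str) -> bool:
    """Crude stand-in for estimating whether I(A; C | X) is large."""
    q = question.lower()
    return any(cue in q for cue in CONTEXT_DEPENDENT_CUES)

def answer(question: str) -> str:
    if needs_live_context(question):
        return "TOOL_CALL: weather_api"   # hypothetical tool name
    return "ANSWER_FROM_PARAMETERS"

print(answer("Should I go to the car wash today?"))  # TOOL_CALL: weather_api
print(answer("What is the capital of France?"))      # ANSWER_FROM_PARAMETERS
```

A question phrased as "is the car wash worth it?" slips straight past this
gate, which is the failure mode the text describes.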
<hr>
<h2 id="the-contrast-stated-plainly">The Contrast, Stated Plainly</h2>
<p>The strawberry failure and the car wash failure look alike from the
outside — confident wrong answer — but they are different enough that
conflating them produces confused diagnosis and confused solutions.</p>
<p>Strawberry: the model has the information (the string &ldquo;strawberry&rdquo;), but
the representation (BPE tokens) does not preserve character-level
structure. The fix is architectural or procedural: character-level
tokenisation, chain-of-thought letter spelling.</p>
<p>Car wash: the model does not have the information (current weather,
car state). No fix to the model&rsquo;s architecture or prompt engineering
gives it information it was never given. The fix is exogenous: provide
the context explicitly, or give the model a tool that can retrieve it,
or design the system so that context-dependent questions are routed to
systems that have access to the relevant state.</p>
<p>A model that confidently answers the car wash question without access to
current conditions is not failing at language understanding. It is
behaving exactly as its training shaped it to behave, given its lack of
situational grounding. Knowing which kind of failure you are looking at
is most of the work in figuring out what to do about it.</p>
<hr>
<p><em>The grounding problem connects to the broader question of what it means
for a language model to &ldquo;know&rdquo; something — which comes up in a different
form in the <a href="/posts/more-context-not-always-better/">context window post</a>,
where the issue is not missing context but irrelevant context drowning
out the relevant signal.</em></p>
<p><em>A second car wash video a few weeks later produced a third, different
failure: <a href="/posts/car-wash-walk/">Car Wash, Part Three: The AI Said Walk</a> —
the model had the right world state but chose the wrong interpretation
of the question.</em></p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Harnad, S. (1990). The symbol grounding problem. <em>Physica D:
Nonlinear Phenomena</em>, 42(1–3), 335–346.
<a href="https://doi.org/10.1016/0167-2789(90)90087-6">https://doi.org/10.1016/0167-2789(90)90087-6</a></p>
</li>
<li>
<p>Guo, C., Pleiss, G., Sun, Y., &amp; Weinberger, K. Q. (2017). <strong>On
calibration of modern neural networks.</strong> <em>ICML 2017</em>.
<a href="https://arxiv.org/abs/1706.04599">https://arxiv.org/abs/1706.04599</a></p>
</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Try to Relax — and Other Things That Prevent Themselves</title>
      <link>https://sebastianspicker.github.io/posts/try-to-relax-ironic-process-wormholes/</link>
      <pubDate>Thu, 15 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/try-to-relax-ironic-process-wormholes/</guid>
      <description>&amp;ldquo;Try to relax&amp;rdquo; is a paradox with a precise psychological mechanism. So is the traversable wormhole: the geometry you need to cross spacetime closes the moment you try to use it. The grandfather paradox, Wegner&amp;rsquo;s ironic monitoring process, and Rick Sanchez&amp;rsquo;s nihilism problem all share the same deep structure — and understanding that structure is more interesting than any of the individual cases.</description>
      <content:encoded><![CDATA[<p>Someone, at some point in your life, has told you to relax. They may have
specified that you should <em>try</em> to relax — as though relaxation were an
effortful goal you could pursue with sufficient will. If you have ever
received this advice and found it made things worse, you were not imagining
it. You were experiencing a phenomenon with a name, a precise mechanism,
and — it turns out — a surprising structural analogue in the geometry of
spacetime.</p>
<hr>
<h2 id="the-ironic-process">The Ironic Process</h2>
<p>In 1994, the social psychologist Daniel Wegner published a paper that
formalised what most people already suspected: trying not to think of
something makes you think of it more <a href="#ref-1">[1]</a>. The theoretical
model behind this has two components.</p>
<p>The first is an <strong>operating process</strong>: it actively generates mental content
consistent with the intended state. You are trying to relax — the operating
process searches for calming thoughts, slows your attention, tries to find
the mood.</p>
<p>The second is a <strong>monitoring process</strong>: it runs in parallel, searching for
evidence that the goal has <em>not</em> been achieved. Am I relaxed yet? No.
Checking again. Still no. Its function is to detect failure early so the
operating process can correct course.</p>
<p>Under normal conditions, the operating process dominates. You try to relax,
the monitor runs quietly in the background, and eventually you converge on
the intended state. Under conditions of cognitive load, stress, or
self-consciousness — precisely the conditions under which someone might
urgently need to relax — the balance shifts. The monitoring process,
searching for signs of not-relaxing, finds them everywhere. The monitor
activates the very content it is supposed to prevent. The harder you try,
the louder the monitor, the further from the goal.</p>
<p>This is Wegner&rsquo;s ironic process: the mechanism recruited to achieve a goal
becomes the primary obstacle to that goal. It is not failure of will. It
is a structural property of the system — and it applies to any goal whose
target state is the <em>absence</em> of effortful activity. Trying to fall asleep.
Trying not to feel anxious about a performance. Trying to be spontaneous.
Trying, in the most purely paradoxical formulation, to relax.</p>
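<p>A toy numerical caricature makes the load-dependence visible. This is not
Wegner&rsquo;s actual formalism — the dynamics and constants below are invented —
but it shows how a monitor that re-injects activation can flip a converging
system into a diverging one.</p>

```python
# Toy caricature of the two-process model: invented dynamics,
# illustrating only the qualitative load-dependence.
def relax(steps=50, load=0.0):
    """Returns final 'tension'. The operating process decays tension;
    each monitoring check re-injects tension scaled by cognitive load."""
    tension = 1.0
    for _ in range(steps):
        tension -= 0.1 * tension   # operating process: calm down
        tension += load * 0.2      # monitor: "am I relaxed yet?"
    return tension

print(relax(load=0.0) < 0.01)  # True: monitor quiet, tension decays away
print(relax(load=0.8) > 1.0)   # True: under load, checking dominates
```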
<p>The instruction &ldquo;try to relax&rdquo; is not bad advice because the advice-giver
lacks empathy. It is bad advice because it is a <em>category error</em>: it applies
an effort-based tool to a goal defined by the absence of effort. The
monitoring process required to track progress toward the goal is precisely
the kind of activity that constitutes not having reached it.</p>
<hr>
<h2 id="a-geometry-that-does-the-same-thing">A Geometry That Does the Same Thing</h2>
<p>The analogy I want to draw requires a brief detour into general relativity.</p>
<p>In 1988, Michael Morris and Kip Thorne published a paper with the
unpromising title &ldquo;Wormholes in spacetime and their use for interstellar
travel: A tool for teaching general relativity&rdquo; <a href="#ref-2">[2]</a>.
It is, in the field&rsquo;s understated way, one of the more consequential papers
in the subject. Morris and Thorne asked: what would a traversable wormhole —
one you could actually pass through — require, physically and mathematically?</p>
<p>The spacetime metric of a traversable wormhole in their formulation is:</p>
$$ds^2 = -e^{2\Phi(r)}\,dt^2 + \frac{dr^2}{1 - b(r)/r} + r^2\,d\Omega^2$$<p>where $\Phi(r)$ is the redshift function and $b(r)$ is the shape function.
The throat of the wormhole sits at $r = r_0$, where $b(r_0) = r_0$.
For anything to pass through in finite proper time, $\Phi$ must remain
finite — no infinite redshift — and $b(r)/r$ must remain less than one
away from the throat.</p>
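<p>The throat condition is easy to check numerically for a textbook choice of
shape function. The sketch below uses $b(r) = r_0^2/r$ (an Ellis-type example)
with $\Phi = 0$; this particular choice is illustrative, not the only
admissible one.</p>

```python
# Check the traversability conditions for one textbook shape function,
# b(r) = r0**2 / r, with Phi = 0: b(r0) = r0 at the throat, and
# b(r)/r < 1 everywhere outside it.
r0 = 1.0
b = lambda r: r0**2 / r

assert abs(b(r0) - r0) < 1e-12      # throat condition b(r0) = r0
for r in [1.001, 2.0, 10.0, 1000.0]:
    assert b(r) / r < 1.0           # g_rr stays finite outside the throat

# The radial metric factor 1/(1 - b/r) diverges as r -> r0, though
# the proper distance through the throat itself remains finite.
print(1 / (1 - b(1.1) / 1.1))   # large but finite just outside the throat
```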
<p>So far this is just geometry. The physics enters through the Einstein field
equations, which connect the geometry to the matter and energy present.
To maintain the wormhole throat against collapse — to hold it open — the
stress-energy tensor of whatever matter fills the throat must satisfy:</p>
$$T_{\mu\nu}\, k^\mu k^\nu < 0$$<p>for null vectors $k^\mu$ — what is called a <em>violation of the null energy
condition</em>. In plain terms: the matter holding the wormhole open must have
negative energy density. Not small energy density. <em>Negative</em> — less than
nothing.</p>
<p>This is exotic matter. It does not appear in any tabletop experiment.
Classical general relativity does not rule it out, but it does not provide
it either.</p>
<p>Quantum mechanics is slightly more helpful: the Casimir effect produces
measurable negative energy density between closely spaced conducting plates.
The Hawking radiation calculation involves transient negative energy near
black hole horizons. So quantum field theory permits negative energy — in
principle. But Ford and Roman <a href="#ref-3">[3]</a> showed that quantum field theory also
strictly <em>limits</em> it: the integrated negative energy over any region is
bounded by a quantum inequality. The shorter the burst of negative energy,
the smaller it must be; the larger the region, the more constrained the
magnitude. The result is that any realistic traversable wormhole would be
either Planck-scale (far too small for anything but quantum information to
traverse) or would require negative energy concentrated in a band many
orders of magnitude thinner than the throat itself — an engineering
requirement that borders on the physically absurd.</p>
<p>The wormhole, in other words, does something structurally similar to the
monitoring process in Wegner&rsquo;s model: the condition required to make it
traversable actively resists being satisfied. The geometry that would allow
passage tends toward collapse. The more you want the wormhole to be open
and stable, the more the energy conditions conspire against you.</p>
<hr>
<h2 id="what-the-2022-wormhole-actually-was">What the 2022 &ldquo;Wormhole&rdquo; Actually Was</h2>
<p>In late 2022, a team including Daniel Jafferis, Alexander Zlokapa, and
colleagues at Caltech and Google published a paper in <em>Nature</em> with the
title &ldquo;Traversable wormhole dynamics on a quantum processor&rdquo; <a href="#ref-4">[4]</a>. Several major news outlets reported that scientists had
created a wormhole. This was not accurate.</p>
<p>What the team actually did was implement a quantum circuit on Google&rsquo;s
Sycamore processor that simulates the Sachdev-Ye-Kitaev (SYK) model —
a quantum mechanical system of randomly interacting fermions that is
holographically dual, via Maldacena&rsquo;s AdS/CFT correspondence, to a
nearly two-dimensional anti-de Sitter black hole geometry. Two coupled
SYK systems are dual to a two-sided eternal black hole, which is connected
in the bulk by an Einstein-Rosen bridge — a wormhole.</p>
<p>By coupling the two systems with a specific negative coupling (which
corresponds, via ER=EPR, to injecting negative energy into the wormhole),
the team made the bridge traversable in the holographic sense: information
encoded in one quantum system propagated and was recovered in the other,
consistent with traversal of the dual gravitational wormhole.</p>
<p>This is genuinely interesting physics. It is not a wormhole through our
spacetime. The wormhole lives in the holographic dual geometry — a
mathematical construct in a lower-dimensional theory of gravity, not a
tunnel between two points in the universe you inhabit. Quantum teleportation
occurred on a quantum chip via the ordinary mechanism of quantum
entanglement. The gravitational language is a description of the
same physics in a dual frame, not a shortcut through space.</p>
<p>The media confusion is itself instructive: &ldquo;wormhole&rdquo; has drifted far from
its original meaning. In current physics, the word can refer to a
Morris-Thorne traversable tunnel through spacetime, to the Einstein-Rosen
bridge of an eternal black hole, to a holographic dual of quantum
entanglement <a href="#ref-5">[5]</a>, or to saddle points in the
Euclidean gravitational path integral relevant to the black hole information
paradox. These are related by mathematics but quite different in what they
physically represent. None of the last three are traversable shortcuts
through the universe. The first is, in principle, but barely, and only at
the cost of exotic matter physics that nobody knows how to achieve.</p>
<p>The harder physicists have worked to make the wormhole genuinely traversable
and macroscopic, the more the mathematics has resisted. This is, at minimum,
a suggestive pattern.</p>
<hr>
<h2 id="what-2025-added">What 2025 Added</h2>
<p>The field did not stand still after 2022. Three independent lines of work
published in 2024 and 2025 have further complicated what a wormhole is —
and in each case the complication pushes in the same direction: the geometry
keeps refusing to be a shortcut.</p>
<p><strong>The wormhole that does not connect two things.</strong> Maloney, Meruliya, and Van Raamsdonk <a href="#ref-7">[7]</a> showed that Euclidean wormholes — saddle points in
the gravitational path integral — appear generically in ordinary
higher-dimensional gravity, without any special setup. The striking
implication is that these wormholes do not bridge two separate universes
or two separate theories; they encode statistical fluctuations <em>within a
single theory</em>. The replica wormholes that resolved the Page curve for
black hole radiation — one of the central recent results in the black hole
information paradox — are of this type. The wormhole is not a connection
between two things. It is a feature of how the theory sums over histories,
a bookkeeping structure for correlations within one system. The physical
picture of two mouths joined by a throat does not apply.</p>
<p><strong>The wormhole that is not smooth.</strong> Magán, Sasieta, and Swingle <a href="#ref-8">[8]</a> studied the interior geometry of the Einstein-Rosen bridge connecting
typical entangled black holes — the configuration that is supposed, under
ER=EPR, to be the gravitational dual of quantum entanglement. Their result,
published in <em>Physical Review Letters</em>, is that this interior is not a
smooth tunnel. It is long, irregular, and chaotic — an Einstein-Rosen
caterpillar, as they call it. The quantum randomness of the entangled state
maps directly onto geometric disorder in the interior: the more thermalized
the state, the more disordered the bridge. A traversing observer, if one
could exist, would not glide through a clean throat. They would navigate a
geometry shaped by quantum chaos, growing longer and more disordered as
the system evolves. This is ER=EPR taken seriously at the level of typical
states rather than special ones, and the result is inhospitable to any
ordinary notion of passage.</p>
<p><strong>The wormhole that is not a tunnel at all.</strong> Gaztañaga, Kumar, and Marto <a href="#ref-9">[9]</a> proposed a more radical reinterpretation: the Einstein-Rosen bridge,
they argue, is not a connection between two separate spaces but a
representation of time-reversal symmetry within a single quantum description.
On this reading, there is only one space, and the bridge is an artefact
of how you describe the time-symmetric structure of the quantum state. The
paper, published in <em>Classical and Quantum Gravity</em>, attracted considerable
press coverage. It sits somewhat outside the mainstream of holographic
quantum gravity research, and the proposal has not yet been widely
integrated into the community&rsquo;s working framework — the language of two
entangled systems and a connecting geometry remains the dominant picture
in AdS/CFT calculations. But the direction it points is consistent with
the other two results.</p>
<p>Taken together, these papers suggest that the word &ldquo;wormhole&rdquo; has been
quietly revised from a noun into an adjective. Not a thing that exists
somewhere, but a property of certain mathematical structures — one that
describes correlation, disorder, or symmetry depending on which context
you are working in. Each attempt to pin down what a wormhole <em>is</em> in
practice finds something less traversable, less connected, and less
tunnel-like than the previous attempt.</p>
<p>This is, to put it plainly, consistent with the theme of this article.</p>
<hr>
<h2 id="causation-eating-its-own-tail">Causation Eating Its Own Tail</h2>
<p>The wormhole&rsquo;s physical problems become even sharper when you add time.
A traversable wormhole connecting two different spacetime regions can in
principle connect not just two different places but two different <em>times</em> —
creating a closed timelike curve (CTC), a path through spacetime that loops
back on itself. You leave on Tuesday and arrive last Thursday.</p>
<p>The standard paradoxes then apply. The grandfather paradox: you travel back
in time, prevent an event that was a necessary precondition of your journey.
The causal chain that produced the journey destroys the causal chain that
produced the journey. The bootstrap paradox: an object or piece of
information exists with no origin — passed back in time repeatedly, it has
always already existed, created by nothing, caused by itself.</p>
<p>Friedman, Morris, Novikov and colleagues formalised what has become known
as the Novikov self-consistency principle: the only physically admissible
solutions are those in which the causal structure is globally consistent <a href="#ref-6">[6]</a>. No grandfather paradox — not because you cannot
go back, but because if you do, it turns out you were always part of the
causal chain you thought you were disrupting. The time-traveller cannot
prevent an event; they can only be the mechanism by which it occurred.</p>
<p>This is not resolution. It is constraint. The universe selects only the
self-consistent loops, filtering out everything else. The causal structure
enforces a particular kind of conservatism: only actions that were always
going to happen can happen. There is no freedom in a closed timelike curve.
Trying to change the loop from inside it is exactly like trying to relax
by monitoring whether you have relaxed: the mechanism of change is part
of the thing you are trying to change.</p>
<hr>
<h2 id="rick-sanchezs-particular-problem">Rick Sanchez&rsquo;s Particular Problem</h2>
<p>Rick and Morty is, among other things, a sustained meditation on
this structure — without ever calling it that.</p>
<p>Rick Sanchez is the smartest being in every universe. His portal gun
creates traversable wormholes instantaneously and at negligible energy
cost, which is exactly what general relativity and quantum field theory
suggest should be impossible. The show waves this away; what it does not
wave away is the <em>psychological</em> consequence of Rick&rsquo;s capability.</p>
<p>Rick has thought his way to the conclusion that nothing matters. Infinite
universes, infinite timelines, infinite Ricks: every moment is replaceable,
every loss is recoverable somewhere else, every moral weight dissolves
in the face of the combinatorial enormity of everything that exists. This
is Rick&rsquo;s version of relaxation — the nihilism that should follow from
taking the multiverse seriously.</p>
<p>But the monitoring process runs. Rick checks whether he has achieved
not-caring, finds that he cares (about Morty, about Beth, about being
the smartest one in the room), and the caring becomes more vivid for
having been suppressed. His nihilism is not peace. It is a performance of
peace that is constantly undermined by the monitoring process watching
for cracks.</p>
<p>Rick&rsquo;s portal gun solves every spatial and temporal problem. It does not
solve the ironic process. No level of intelligence, and no number of
traversable wormholes, provides a shortcut past Wegner&rsquo;s monitor.
This is, I think, what makes the character work: the show&rsquo;s impossible
physics is the premise, but the <em>actually</em> impossible thing — the one the
show treats as genuinely intractable — is the psychological paradox.</p>
<hr>
<h2 id="the-common-structure">The Common Structure</h2>
<p>These cases — the relaxation paradox, the traversable wormhole, the closed
timelike curve — share a formal structure.</p>
<p>In each case, there is a desired end state (relaxation, passage through the
wormhole, a changed past) and a mechanism for pursuing it (effortful
monitoring, exotic matter, time travel). In each case, the mechanism
required to pursue the end state is incompatible with the end state itself.
The monitoring process that tracks &ldquo;am I relaxed?&rdquo; is the activity of not
being relaxed. The exotic matter that holds the wormhole open is the
physical condition that makes the geometry so extreme that traversal is
barely possible. The attempt to change the past is always already part
of the past you were trying to change.</p>
<p>The physicist&rsquo;s version of this is the quantum measurement problem: the act
of observing a system disturbs it. The observer cannot step outside the
measurement. The psychologist&rsquo;s version is the ironic process. The
relativist&rsquo;s version is the closed timelike curve. The narrative version
is Rick Sanchez.</p>
<hr>
<h2 id="what-actually-works">What Actually Works</h2>
<p>Wegner&rsquo;s answer to the ironic process is not to try harder with the
operating process. It is to release the monitoring process — to stop checking
whether the goal has been achieved. This is the core insight behind
Acceptance and Commitment Therapy: you cannot think your way to not-thinking.
The goal of not-thinking requires not-monitoring, which means not having the
goal in the active, effortful sense at all.</p>
<p>This is harder than it sounds. It is a second-order intervention: instead
of trying to relax, you try to stop trying to relax — which, done badly,
just adds another monitoring process. But done well, it is the correct
diagnosis: the category error was treating relaxation as an effortful goal
in the first place.</p>
<p>For wormholes, the physics community has arrived at a related answer. The
question &ldquo;how do we make a macroscopic traversable wormhole in our
spacetime?&rdquo; may be the wrong question. The ER=EPR framework suggests that
wormholes and quantum entanglement are two descriptions of the same thing.
The question is not how to build a tunnel; it is what the entanglement
structure of spacetime already is, and how information is already being
transferred through it. The shortcut was never a shortcut. It was always
just the ordinary geometry of entangled quantum systems, described in
a language that made it look exotic.</p>
<p>For Rick Sanchez, the show has not found an answer. Which is, probably,
the correct narrative decision.</p>
<hr>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Wegner, D. M. (1994). Ironic processes of mental control. <em>Psychological Review</em>, 101(1), 34–52. <a href="https://doi.org/10.1037/0033-295X.101.1.34">https://doi.org/10.1037/0033-295X.101.1.34</a></p>
<p><span id="ref-2"></span>[2] Morris, M. S., &amp; Thorne, K. S. (1988). Wormholes in spacetime and their use for interstellar travel: A tool for teaching general relativity. <em>American Journal of Physics</em>, 56(5), 395–412. <a href="https://doi.org/10.1119/1.15620">https://doi.org/10.1119/1.15620</a></p>
<p><span id="ref-3"></span>[3] Ford, L. H., &amp; Roman, T. A. (1996). Quantum field theory constrains traversable wormhole geometries. <em>Physical Review D</em>, 53(10), 5496–5507. <a href="https://doi.org/10.1103/PhysRevD.53.5496">https://doi.org/10.1103/PhysRevD.53.5496</a></p>
<p><span id="ref-4"></span>[4] Jafferis, D., Zlokapa, A., Lykken, J. D., Kolchmeyer, D. K., Davis, S. I., Lauk, N., Neven, H., &amp; Spiropulu, M. (2022). Traversable wormhole dynamics on a quantum processor. <em>Nature</em>, 612, 51–55. <a href="https://doi.org/10.1038/s41586-022-05424-3">https://doi.org/10.1038/s41586-022-05424-3</a></p>
<p><span id="ref-5"></span>[5] Maldacena, J., &amp; Susskind, L. (2013). Cool horizons for entangled black holes. <em>Fortschritte der Physik</em>, 61(9), 781–811. <a href="https://doi.org/10.1002/prop.201300020">https://doi.org/10.1002/prop.201300020</a></p>
<p><span id="ref-6"></span>[6] Friedman, J., Morris, M. S., Novikov, I. D., Echeverria, F., Klinkhammer, G., Thorne, K. S., &amp; Yurtsever, U. (1990). Cauchy problem in spacetimes with closed timelike curves. <em>Physical Review D</em>, 42(6), 1915–1930. <a href="https://doi.org/10.1103/PhysRevD.42.1915">https://doi.org/10.1103/PhysRevD.42.1915</a></p>
<p><span id="ref-7"></span>[7] Maloney, A., Meruliya, V., &amp; Van Raamsdonk, M. (2025). arXiv:2503.12227. <a href="https://arxiv.org/abs/2503.12227">https://arxiv.org/abs/2503.12227</a></p>
<p><span id="ref-8"></span>[8] Magán, J. M., Sasieta, M., &amp; Swingle, B. (2025). Einstein-Rosen caterpillar. <em>Physical Review Letters</em>, 135. <a href="https://doi.org/10.1103/btw6-44ry">https://doi.org/10.1103/btw6-44ry</a></p>
<p><span id="ref-9"></span>[9] Gaztañaga, E., Kumar, A., &amp; Marto, J. (2025). <em>Classical and Quantum Gravity</em>. <a href="https://doi.org/10.1088/1361-6382/ae3044">https://doi.org/10.1088/1361-6382/ae3044</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>A Christmas Star (Minus the Star, Plus a Moon Nobody Asked For)</title>
      <link>https://sebastianspicker.github.io/posts/the-gift-of-transits/</link>
      <pubDate>Thu, 25 Dec 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/the-gift-of-transits/</guid>
      <description>A browser-based simulator for exoplanet transit photometry, binary eclipses, and exomoon scenarios — built with Kepler integrators, limb darkening, and N-body dynamics. I spent Christmas on this. You&amp;rsquo;re welcome, science.</description>
      <content:encoded><![CDATA[<h2 id="summary">Summary</h2>
<p>&lsquo;Tis the season to stare at light curves. While most people were unwrapping
presents on December 25th, I was staring at synthetic flux drop-offs and
debugging a limb-darkening model. The result is a browser-based simulator for
exoplanet transit photometry, including binary eclipses and exomoon scenarios.
It does not detect any real exomoons. It does, however, correctly model why
detecting them is comically hard.</p>
<p>Source: <a href="https://github.com/sebastianspicker/exoplanet-exomoon-simulation">sebastianspicker/exoplanet-exomoon-simulation</a></p>
<hr>
<h2 id="background">Background</h2>
<h3 id="the-gift-of-transits">The gift of transits</h3>
<p>When a planet crosses in front of its host star, the observed stellar flux
drops by a fraction proportional to the ratio of their projected areas:</p>
\[
  \delta = \left(\frac{R_p}{R_\star}\right)^2
\]<p>For Jupiter transiting the Sun, \( \delta \approx 1\% \). For Earth, about
84 ppm. For an exomoon orbiting a Jupiter-sized planet — well, unwrap that
calculation yourself:</p>
\[
  \delta_m = \left(\frac{R_m}{R_\star}\right)^2
\]<p>A Moon-sized exomoon around a Sun-like star contributes roughly 7 ppm of flux
variation. The <em>Kepler</em> space telescope&rsquo;s photometric precision was on the order
of 20–30 ppm per 6-hour cadence for bright targets. Ho ho hold on — that signal
is buried.</p>
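<p>The numbers quoted above follow directly from the depth formula. The sketch
below uses nominal IAU radii in metres; small differences from the quoted
figures reflect rounding, not physics.</p>

```python
# Transit depths delta = (R_p / R_star)**2 for the examples in the text.
# Radii in metres (IAU nominal values).
R_SUN = 6.957e8
R_JUPITER = 7.1492e7   # equatorial
R_EARTH = 6.371e6
R_MOON = 1.7374e6

depth = lambda r_p, r_star=R_SUN: (r_p / r_star)**2

print(f"Jupiter: {depth(R_JUPITER):.4f}")        # ~0.0106, i.e. ~1%
print(f"Earth:   {depth(R_EARTH)*1e6:.0f} ppm")  # ~84 ppm
print(f"Moon:    {depth(R_MOON)*1e6:.1f} ppm")   # single-digit ppm
```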
<h3 id="why-stars-are-not-uniformly-bright-and-why-that-ruins-everything">Why stars are not uniformly bright (and why that ruins everything)</h3>
<p>A star is not a flat disk of uniform intensity. It is darker at the limb than
at the centre — an effect called limb darkening — because the line of sight
through the stellar atmosphere is shallower at the edges, sampling cooler,
less emissive layers. The quadratic limb-darkening law is:</p>
\[
  I(\mu) = I_0 \left[1 - u_1(1 - \mu) - u_2(1 - \mu)^2\right]
\]<p>where \( \mu = \cos\theta \) is the cosine of the angle from disk centre, and \( u_1, u_2 \)
are stellar-type-dependent coefficients. This matters for transit modelling
because the depth of the light curve dip changes as the planet traverses from
limb to centre to limb — the transit is not a flat-bottomed box, it is a
rounded trough. Fitting it incorrectly biases \( R_p / R_\star \) and, more
critically for exomoon searches, generates false residuals that look
suspiciously like a secondary dip.</p>
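<p>The law itself is a one-liner to evaluate. The coefficients below are
plausible Sun-like visible-band values chosen only for illustration; real fits
depend on bandpass and stellar type.</p>

```python
# Quadratic limb-darkening law: I(mu)/I0 = 1 - u1*(1-mu) - u2*(1-mu)**2.
# Coefficients are illustrative Sun-like values, not fitted ones.
u1, u2 = 0.4, 0.26

def intensity(mu, u1=u1, u2=u2):
    """Normalised surface brightness at mu = cos(theta)."""
    return 1 - u1 * (1 - mu) - u2 * (1 - mu)**2

print(intensity(1.0))  # disk centre: 1.0
print(intensity(0.0))  # limb: 1 - u1 - u2 = 0.34
```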
<h3 id="exomoon-detection-the-indirect-approach">Exomoon detection: the indirect approach</h3>
<p>No exomoon has been unambiguously confirmed as of the time of writing this post
(Christmas Day, 2025 — yes, really). The most promising indirect signatures are:</p>
<p><strong>Transit Timing Variations (TTV).</strong> The planet–moon system orbits their
common barycentre. This causes the planet&rsquo;s transit to arrive slightly early
or late relative to a pure Keplerian ephemeris. The timing offset scales as:</p>
\[
  \delta t \approx \frac{m_m}{m_p} \cdot \frac{a_m}{v_p}
\]<p>where \( m_m / m_p \) is the moon-to-planet mass ratio, \( a_m \) is the
moon&rsquo;s semi-major axis around the planet, and \( v_p \) is the planet&rsquo;s
orbital velocity. For an Earth-mass moon at 10 planetary radii around a
Jupiter-mass planet at 1 AU, this is on the order of minutes — measurable, in
principle, with long baselines.</p>
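<p>Plugging the example numbers into the scaling relation confirms the order of magnitude (a sketch; the constants are standard SI values, and the scenario is the one just described):</p>

```python
# Order-of-magnitude TTV amplitude: delta_t ~ (m_m / m_p) * (a_m / v_p).
M_EARTH = 5.972e24        # kg
M_JUPITER = 1.898e27      # kg
R_JUPITER = 7.1492e7      # m (equatorial)
V_PLANET_1AU = 2.978e4    # m/s, circular orbital speed at 1 AU

def ttv_amplitude_s(m_moon: float, m_planet: float,
                    a_moon: float, v_planet: float) -> float:
    return (m_moon / m_planet) * (a_moon / v_planet)

dt = ttv_amplitude_s(M_EARTH, M_JUPITER, 10 * R_JUPITER, V_PLANET_1AU)
print(f"TTV amplitude: {dt:.0f} s (about {dt / 60:.1f} min)")
```

<p>A bit over a minute for this configuration, consistent with the minutes-scale claim.</p>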
<p><strong>Transit Duration Variations (TDV).</strong> The same barycentre wobble modulates
the planet&rsquo;s sky-projected velocity as it crosses the stellar disk, changing
transit duration. TDV and TTV are 90° out of phase, which lets you solve for
both moon mass and orbital radius given enough transits.</p>
<p>Neither signal is clean in practice. Stellar activity, instrument systematics,
and other planets in the system all contribute correlated noise at similar
timescales. The residuals of the best exomoon candidate to date — Kepler-1625b-i
(Teachey &amp; Kipping, 2018) — remain contested. <em>Season&rsquo;s readings: disputed.</em></p>
<hr>
<h2 id="the-simulation">The Simulation</h2>
<h3 id="what-it-actually-does">What it actually does</h3>
<p>The simulator is a TypeScript application (Vite build, runs in-browser) built
around a deterministic, SI-unit physics core. The main pipeline:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">UI parameters
</span></span><span class="line"><span class="cl">  → V4 normalisation
</span></span><span class="line"><span class="cl">  → runtime creation  (realtime | reference)
</span></span><span class="line"><span class="cl">  → orbital integrator (Kepler + N-body)
</span></span><span class="line"><span class="cl">  → geometry &amp; photometry
</span></span><span class="line"><span class="cl">  → flux decomposition
</span></span><span class="line"><span class="cl">  → canvas render + plots
</span></span></code></pre></div><p>Two runtime modes:</p>
<ul>
<li><strong>Realtime</strong> — fast integrator, interactive rendering, good for exploration</li>
<li><strong>Reference</strong> — high-fidelity integrator, deterministic export, good for
sanity-checking against known systems</li>
</ul>
<p>The photometry layer computes quadratic limb-darkened transit flux, handles
binary eclipse geometry (for eclipsing binary configurations), and exposes
hooks for phase curves and instrument noise.</p>
<p>The <strong>diagnostics layer</strong> is the part I find most useful: energy conservation
checks across the integration, radial velocity time series, astrometry, and
transit timing outputs. If your N-body integrator is drifting, the energy
plot tells you immediately.</p>
<p>The repo ships a <code>real-systems.snapshot.json</code> with versioned data from the
NASA Exoplanet Archive — so you can load, e.g., TRAPPIST-1 or HD 209458
as a starting configuration.</p>
<h3 id="what-it-deliberately-does-not-do">What it deliberately does not do</h3>
<p>The relativistic corrections are approximations. This is not a GR integrator.
For the systems it is designed for (short-period planets around Sun-like stars),
the relativistic perihelion precession is tiny — Mercury&rsquo;s 43 arcseconds per
century is the canonical example and that is already a demanding target — but
for millisecond pulsars or extremely compact binaries, do not trust it.</p>
<p>The atmospheric module exposes hooks but is not a radiative-transfer solver.
If you want realistic transmission spectra, point yourself at something like
petitRADTRANS and use this for the orbital geometry only.</p>
<hr>
<h2 id="discussion">Discussion</h2>
<p>The simulation is educational in intent — hence the built-in didactic mode
(black-box exploration → hypothesis → reveal → A/B comparison → rubric
scoring). But the physics is not dumbed down: the limb darkening is real, the
N-body integrator tracks multi-body gravitational interactions, and the TTV
outputs are computed from first principles rather than parameterised fits.</p>
<p>The thing I kept running into while building this is how much of exomoon
detection reduces to a residuals-hunting problem. You fit the best planet-only
model you can, examine the timing and duration residuals, and look for a
coherent signal. The simulator lets you inject a synthetic exomoon of specified
mass and orbital radius, generate synthetic light curves with configurable
noise, and see what the residuals look like — which is exactly the kind of
intuition-building exercise that is tedious to set up from scratch with, say,
a raw BATMAN lightcurve model and a custom integrator.</p>
<p><strong>Limitations worth being honest about.</strong> The performance budget is real:
some effects are profile-gated to keep the interactive mode responsive, which
means the reference mode exists specifically for cases where you want the full
physics at the cost of speed. For a publication-quality simulation you would
want a dedicated N-body code (REBOUND is the obvious choice), not a browser
runtime. This is a tool for understanding the problem, not for writing papers
about it — which, fitting for a Christmas project, is exactly what I have time
for right now.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Teachey, A. &amp; Kipping, D. M. (2018). <strong>Evidence for a large exomoon orbiting
Kepler-1625b.</strong> <em>Science Advances</em>, 4(10).
<a href="https://arxiv.org/abs/1810.02362">https://arxiv.org/abs/1810.02362</a></p>
</li>
<li>
<p>Kipping, D. M. (2009). <strong>Transit timing effects due to an exomoon.</strong>
<em>MNRAS</em>, 392(1), 181–189.
<a href="https://arxiv.org/abs/0810.2243">https://arxiv.org/abs/0810.2243</a></p>
</li>
<li>
<p>Mandel, K. &amp; Agol, E. (2002). <strong>Analytic light curves for planetary transit
searches.</strong> <em>ApJL</em>, 580, L171.
<a href="https://arxiv.org/abs/astro-ph/0210099">https://arxiv.org/abs/astro-ph/0210099</a></p>
</li>
<li>
<p>Claret, A. (2000). <strong>A new non-linear limb-darkening law for LTE stellar
atmosphere models.</strong> <em>A&amp;A</em>, 363, 1081–1190.</p>
</li>
</ul>
<hr>
<p><em>Merry Christmas. If you came here expecting warmth and cheer, I offer instead
a synthetic light curve with a 7 ppm exomoon signal buried in 30 ppm of
photon noise. Practically the same thing.</em></p>
<hr>
<p><em>For the physical version of this — a lamp, a ball, and a smartphone measuring
real transit light curves in a classroom — see
<a href="/posts/exoplanet-hunting-smartphones/">Hunting Exoplanets with Your Phone</a>.
For context on where those experiments came from, see
<a href="/posts/astro-lab-at-home/">The Lab Goes Home</a>.</em></p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-03-05</strong>: Corrected the description of the limb-darkening variable from &ldquo;$\mu = \cos\theta$ is the angle from disk centre&rdquo; to &ldquo;$\mu = \cos\theta$ is the cosine of the angle from disk centre.&rdquo; $\theta$ is the angle; $\mu$ is its cosine.</li>
<li><strong>2026-03-05</strong>: Corrected Claret (2000) page range from 1081–1090 to 1081–1190. The paper contains extensive tables of limb-darkening coefficients spanning 109 pages.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>The Golden Bead Cube Weighs One Kilogram</title>
      <link>https://sebastianspicker.github.io/posts/bruner-montessori-ipad-embodied-learning/</link>
      <pubDate>Thu, 11 Dec 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/bruner-montessori-ipad-embodied-learning/</guid>
      <description>Bruner&amp;rsquo;s enactive stage and Montessori&amp;rsquo;s materials both understand that abstract concepts must be grounded in physical experience before symbols can carry weight. The touchscreen skips that stage entirely — and the learning data are beginning to show it.</description>
      <content:encoded><![CDATA[<h2 id="summary">Summary</h2>
<p>Jerome Bruner argued in 1964 that concepts must be acquired through three stages: enactive (bodily
action), iconic (image), symbolic (language and notation). The order is not a preference — it is a
developmental logic. Symbols that arrive before their sensorimotor grounding are thin; they may
produce correct test performance while leaving the concept unrooted.</p>
<p>Maria Montessori, working fifty years before anyone had the vocabulary of embodied cognition,
designed learning materials that implement Bruner&rsquo;s sequence with unusual precision. The Golden
Bead cube for &ldquo;one thousand&rdquo; is about the size of a large fist and weighs roughly one kilogram.
You cannot represent &ldquo;one thousand&rdquo; on a tablet screen in a way that competes with carrying that
weight across a room ten times.</p>
<p>This post is about what embodied cognition research tells us, why Montessori implements it
correctly, and what we are giving up when we substitute glass surfaces for physical materials.</p>
<h2 id="bruners-three-modes">Bruner&rsquo;s Three Modes</h2>
<p>Jerome Bruner proposed in a 1964 paper and the subsequent book <em>Toward a Theory of Instruction</em>
(<a href="#ref-bruner1964">Bruner, 1964</a>; <a href="#ref-bruner1966">1966</a>) that knowledge is represented in three
distinct, developmentally ordered modes:</p>
<p><strong>Enactive</strong>: Knowledge encoded in action patterns. You know how to ride a bicycle; you cannot
fully describe it in words; the knowledge is in your body. An infant knows what &ldquo;cup&rdquo; means
because she has grasped cups hundreds of times — before she has the word.</p>
<p><strong>Iconic</strong>: Knowledge encoded in images or perceptual representations. You can visualise the
route without navigating it. You recognize a melody without playing it.</p>
<p><strong>Symbolic</strong>: Knowledge encoded in language or other arbitrary symbol systems. The numeral &ldquo;7&rdquo;
has no visual resemblance to seven objects. Its meaning is purely conventional and rule-governed.</p>
<p>The developmental sequence matters. A child who acquires a symbol before the underlying enactive
and iconic representations are established has a label without a referent. She can produce the
word or numeral correctly — and her understanding of it is correspondingly brittle. Transfer to
novel contexts is poor; the concept does not generalise.</p>
<p>This is not a fringe view. It is the core claim of embodied cognition research, which has spent
thirty years producing experimental evidence for it.</p>
<h2 id="what-embodied-cognition-actually-shows">What Embodied Cognition Actually Shows</h2>
<p>Lawrence Barsalou&rsquo;s 2008 review in <em>Annual Review of Psychology</em> is the canonical synthesis
(<a href="#ref-barsalou2008">Barsalou, 2008</a>). The central claim: cognition is not implemented in an
abstract, modality-free computational system separate from the body. Perception, action, and
interoception are constitutive of — not merely scaffolding for — conceptual thought. When you
think about &ldquo;lifting,&rdquo; the motor cortex activates. When you think about &ldquo;rough texture,&rdquo; the
somatosensory cortex activates. Concepts are grounded in the sensorimotor systems through which
they were originally experienced.</p>
<p>This has a direct pedagogical implication. If mathematical concepts are represented using
perceptual-motor simulation systems, then the quality of that simulation depends on the richness of
the founding sensorimotor experience. A child who has handled physical objects of different weights
has richer representational resources for arithmetic and measurement than one whose entire
numerical experience has occurred on a flat, weightless, textureless glass surface.</p>
<p>Arthur Glenberg and colleagues tested this experimentally. In a 2004 study, first- and
second-graders read short texts describing farm scenes (<a href="#ref-glenberg2004">Glenberg et al., 2004</a>).
Children who physically moved toy objects (horse, barn, fence) to enact the described events showed
dramatically better comprehension and inference performance than children who merely read and
re-read the passages. The effect size approached two standard deviations in some conditions.
Children who <em>imagined</em> moving the objects also improved, but less than those who actually moved
them. The physical action was not decorative. It was causally relevant to understanding.</p>
<p>Glenberg extended this logic to arithmetic word problems (<a href="#ref-glenberg2008">Glenberg, 2008</a>).
Children who physically manipulated objects while working through problems were better at
identifying what was relevant and computing correct answers. The enactive engagement was improving
not just memory of the text but <em>mathematical reasoning</em>.</p>
<h2 id="montessori-got-there-first">Montessori Got There First</h2>
<p>Maria Montessori opened the Casa dei Bambini on 6 January 1907 in a San Lorenzo tenement in Rome,
enrolling approximately fifty children aged two to seven. She had no Barsalou. She had no Glenberg.
She had children, materials, and the patience to watch what happened when children were allowed to
choose their own work.</p>
<p>What she built was a pedagogical system that implements the Bruner sequence without exception.</p>
<p><strong>The Golden Bead Material</strong> is the canonical example. Units: single glass beads. Tens: ten beads
wired into a bar. Hundreds: ten bars wired into a flat square. Thousands: ten squares wired into a
cube. The child can hold a unit bead between two fingers. She needs two hands to lift the thousand
cube. The physical weight scales with place value. She experiences — proprioceptively — that &ldquo;one
thousand&rdquo; is categorically heavier and larger than &ldquo;one hundred&rdquo; before she has seen the numeral
or heard the word &ldquo;thousands place.&rdquo;</p>
<p><strong>The Knobbed Cylinder Blocks</strong> illustrate a different principle. Four wooden blocks, each
containing ten cylinders varying in height, diameter, or both. The child removes all cylinders and
replaces them. If any cylinder goes into the wrong socket, the remaining cylinders will not all
fit. The task cannot be completed incorrectly and left that way. Error control is mechanical,
built into the material. The teacher need not intervene. The child corrects herself, alone, through
the physical feedback of the materials.</p>
<p>Montessori called this <em>controllo dell&rsquo;errore</em> — control of error. It is one of her most
important insights: if the feedback is physical, the child internalises the standard rather than
depending on external evaluation. The authority is in the material, not in the adult&rsquo;s judgment.</p>
<p>The evidence that this works has accumulated across more than a century. Angeline Lillard and
Nicole Else-Quest published a landmark study in <em>Science</em> in 2006, using a lottery-based
design: children who had won a lottery to attend public Montessori schools
compared with those who had not (<a href="#ref-lillard2006">Lillard &amp; Else-Quest, 2006</a>). Montessori
five-year-olds showed significantly higher letter-word identification, phonological decoding, and
applied mathematical problem-solving. The lottery controlled for family self-selection.</p>
<p>A 2025 national randomised controlled trial — 588 children across 24 public Montessori schools,
with lottery-based assignment — found significant advantages in reading, short-term memory,
executive function, and social understanding at the end of kindergarten, with effect sizes
exceeding 0.2 SD (<a href="#ref-lillard2025">Lillard et al., 2025</a>). These are not small effects for
field-based school research. And the cost per child was lower than in conventional programmes.</p>
<h2 id="korczak-and-the-right-to-make-mistakes">Korczak and the Right to Make Mistakes</h2>
<p>Janusz Korczak ran an orphanage in Warsaw and wrote <em>How to Love a Child</em> in 1919
(<a href="#ref-korczak1919">Korczak, 1919</a>) and <em>The Child&rsquo;s Right to Respect</em> in 1929
(<a href="#ref-korczak1929">Korczak, 1929</a>). His central argument was that children are not pre-adults —
they are persons with full moral status and a right to their own experience, including the
experience of making mistakes.</p>
<p>In August 1942 German soldiers came to his orphanage. Korczak was offered false papers, safe
houses, multiple escape routes arranged by friends and admirers. He refused each time. He led
approximately 192 children and staff to the Umschlagplatz and did not return.</p>
<p>I mention Korczak not as an appeal to emotion but because his argument is structurally connected
to Montessori&rsquo;s. If a child has moral status, she has the right to encounter the actual
consequences of her choices — including physical ones. A material that makes incorrect placement
physically impossible before the child has had the experience of trying and correcting is a
different kind of education from a screen that prevents error altogether through invisible software
constraints, or one that simply supplies the correct answer.</p>
<p>Error is information. Physical error is particularly rich information. Taking it away is not
protection — it is impoverishment.</p>
<h2 id="buber-what-a-screen-cannot-offer">Buber: What a Screen Cannot Offer</h2>
<p>Martin Buber&rsquo;s essay &ldquo;Education,&rdquo; delivered as an address in 1925 and published in <em>Between Man
and Man</em> (<a href="#ref-buber1947">Buber, 1947</a>), argues that genuine education requires what he calls an
I-Thou relation: an encounter in which the other is met as a whole, irreducible subject, not an
object to be managed.</p>
<p>A touchscreen is the paradigmatic I-It relation. It is smooth, frictionless, optimised for
engagement, responsive to exactly the touch it was designed to respond to. There is no otherness,
no resistance, no genuine encounter. The screen does not push back. The Knobbed Cylinder Block
does — literally. If you try to force a cylinder into the wrong socket, the material resists. That
resistance is not a flaw in the pedagogical design; it is the pedagogical design.</p>
<p>Buber also introduced the concept of <em>Umfassung</em> — inclusion — by which a teacher must
simultaneously stand at their own pole of the educational encounter and imaginatively experience
the pupil&rsquo;s side. A screen cannot do this. It has no pole. Its responsiveness is a simulation of
attention, not attention itself. Turkle&rsquo;s later phrase — &ldquo;simulated empathy is not empathy&rdquo; — is
the same argument in a different register.</p>
<h2 id="the-tablet-problem">The Tablet Problem</h2>
<p>The educational technology industry has produced an enormous quantity of &ldquo;educational apps&rdquo; for
young children. The research is beginning to catch up.</p>
<p>Kathy Hirsh-Pasek and colleagues identified four pillars that distinguish educational from merely
entertaining digital content: active engagement, depth of engagement, meaningful learning, and
social interactivity (<a href="#ref-hirshpasek2015">Hirsh-Pasek et al., 2015</a>). Reviewing commercially
available apps, they found that most fail on three or four of these criteria. They produce
interactions in the shallow sense — tapping, swiping — without the kind of self-directed,
goal-oriented, socially-embedded activity that drives genuine cognitive development.</p>
<p>A 2021 meta-analysis of 36 intervention studies found that educational apps produced meaningful
gains when measured by researcher-developed instruments targeting constrained skills (letter
naming, counting), but small to negligible effects on standardised achievement tests
(<a href="#ref-kim2021">Kim et al., 2021</a>). The apps teach what they teach. Transfer is limited.</p>
<p>By contrast, a 2023 scoping review of 102 studies found that physical manipulatives — block
building, shape sorting, paper folding, figurine play — showed consistent benefits across
mathematics, literacy, and science that transferred to standardised measures
(<a href="#ref-byrne2023">Byrne et al., 2023</a>).</p>
<p>The fundamental problem is haptic. A 2024 review of haptic technology in learning found that force
feedback and texture information substantially improve spatial reasoning, interest, and analytical
ability (<a href="#ref-hatira2024">Hatira &amp; Sarac, 2024</a>). Standard capacitive touchscreens — every
tablet your child has encountered — provide no force feedback and no texture differentiation.
Every object, regardless of its symbolic &ldquo;weight&rdquo; or &ldquo;size,&rdquo; feels identical under the fingertip.</p>
<p>The Golden Bead thousand cube weighs approximately one kilogram. You cannot represent that
experience on a tablet. The symbol arrives without the sensation, and Bruner&rsquo;s sequence is
violated from the first tap.</p>
<h2 id="what-we-should-ask">What We Should Ask</h2>
<p>The question is not whether tablets have educational uses — they clearly do, particularly for
older children working at the iconic and symbolic levels, and for content where direct physical
manipulation is impossible or dangerous. The question is whether we are using them in
developmental contexts where the enactive stage has not yet been established.</p>
<p>A child who has carried the thousand cube across a room, stacked the hundreds into the square, and
felt the weight difference in her hands has a different representation of place value from one who
has tapped numerals on a flat screen. Both may perform identically on a constrained test tomorrow.
Ask them a transfer question in six months and the difference will appear.</p>
<p>We are teaching children to operate symbols before giving them the physical experiences that make
those symbols mean anything. The result is not ignorance — the children can tap the correct numeral
— but brittleness. The concept is a label, not a root.</p>
<p>Montessori knew this. Bruner formalised it. The haptics literature is now confirming it
experimentally. The difficult question is why we are still buying flat glass rectangles for
classrooms when a box of wooden cylinders costs less and works better.</p>
<h2 id="references">References</h2>
<ul>
<li><span id="ref-bruner1964"></span>Bruner, J. S. (1964). The course of cognitive growth. <em>American Psychologist</em>, 19(1), 1–15.</li>
<li><span id="ref-bruner1966"></span>Bruner, J. S. (1966). <em>Toward a Theory of Instruction</em>. Harvard University Press (Belknap Press).</li>
<li><span id="ref-barsalou2008"></span>Barsalou, L. W. (2008). Grounded cognition. <em>Annual Review of Psychology</em>, 59, 617–645. <a href="https://doi.org/10.1146/annurev.psych.59.103006.093639">DOI: 10.1146/annurev.psych.59.103006.093639</a></li>
<li><span id="ref-glenberg2004"></span>Glenberg, A. M., Gutierrez, T., Levin, J. R., Japuntich, S., &amp; Kaschak, M. P. (2004). Activity and imagined activity can enhance young children&rsquo;s reading comprehension. <em>Journal of Educational Psychology</em>, 96(3), 424–436. <a href="https://doi.org/10.1037/0022-0663.96.3.424">DOI: 10.1037/0022-0663.96.3.424</a></li>
<li><span id="ref-glenberg2008"></span>Glenberg, A. M. (2008). Embodiment for education. In P. Calvo &amp; A. Gomila (Eds.), <em>Handbook of Cognitive Science: An Embodied Approach</em> (pp. 355–371). Elsevier.</li>
<li><span id="ref-lillard2006"></span>Lillard, A. S., &amp; Else-Quest, N. (2006). The early years: Evaluating Montessori education. <em>Science</em>, 313(5795), 1893–1894. <a href="https://doi.org/10.1126/science.1132362">DOI: 10.1126/science.1132362</a></li>
<li><span id="ref-lillard2025"></span>Lillard, A. S., Loeb, D., Berg, J., Escueta, M., Manship, K., Hauser, A., &amp; Daggett, E. D. (2025). A national randomized controlled trial of the impact of public Montessori preschool at the end of kindergarten. <em>Proceedings of the National Academy of Sciences</em>, 122(43). <a href="https://doi.org/10.1073/pnas.2506130122">DOI: 10.1073/pnas.2506130122</a></li>
<li><span id="ref-korczak1919"></span>Korczak, J. (1919). <em>Jak kochać dziecko</em> [How to Love a Child]. Warsaw.</li>
<li><span id="ref-korczak1929"></span>Korczak, J. (1929). <em>Prawo dziecka do szacunku</em> [The Child&rsquo;s Right to Respect]. Warsaw.</li>
<li><span id="ref-buber1947"></span>Buber, M. (1947). <em>Between Man and Man</em> (trans. R. G. Smith). Kegan Paul. (Original German publication 1947; contains &ldquo;Education,&rdquo; address delivered 1925, and &ldquo;The Education of Character,&rdquo; address delivered 1939.)</li>
<li><span id="ref-hirshpasek2015"></span>Hirsh-Pasek, K., Zosh, J. M., Golinkoff, R. M., Gray, J. H., Robb, M. B., &amp; Kaufman, J. (2015). Putting education in &ldquo;educational&rdquo; apps: Lessons from the science of learning. <em>Psychological Science in the Public Interest</em>, 16(1), 3–34. <a href="https://doi.org/10.1177/1529100615569721">DOI: 10.1177/1529100615569721</a></li>
<li><span id="ref-kim2021"></span>Kim, J. S., Gilbert, J., Yu, Q., &amp; Gale, C. (2021). Measures matter: A meta-analysis of the effects of educational apps on preschool to grade 3 children&rsquo;s literacy and math skills. <em>AERA Open</em>, 7. <a href="https://doi.org/10.1177/23328584211004183">DOI: 10.1177/23328584211004183</a></li>
<li><span id="ref-byrne2023"></span>Byrne, E. M., Jensen, H., Thomsen, B. S., &amp; Ramchandani, P. G. (2023). Educational interventions involving physical manipulatives for improving children&rsquo;s learning and development: A scoping review. <em>Review of Education</em>, 11(2), e3400. <a href="https://doi.org/10.1002/rev3.3400">DOI: 10.1002/rev3.3400</a></li>
<li><span id="ref-hatira2024"></span>Hatira, A., &amp; Sarac, M. (2024). Touch to learn: A review of haptic technology&rsquo;s impact on skill development and enhancing learning abilities for children. <em>Advanced Intelligent Systems</em>, 6. <a href="https://doi.org/10.1002/aisy.202300731">DOI: 10.1002/aisy.202300731</a></li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-02-03</strong>: Changed &ldquo;lottery-based quasi-experimental design&rdquo; to &ldquo;lottery-based design&rdquo; for Lillard &amp; Else-Quest (2006). A lottery provides genuine random assignment; &ldquo;quasi-experimental&rdquo; implies the absence of randomisation, which is the opposite of what the lottery design achieved.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Constraining the Coding Agent: The Ralph Loop and Why Determinism Matters</title>
      <link>https://sebastianspicker.github.io/posts/ralph-loop/</link>
      <pubDate>Thu, 04 Dec 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/ralph-loop/</guid>
      <description>In late 2025, agentic coding tools went from impressive demos to daily infrastructure. The problem nobody talked about enough: when an LLM agent has write access to a codebase and no formal constraints, reproducibility breaks down. The Ralph Loop is a deterministic, story-driven execution framework that addresses this — one tool call per story, scoped writes, atomic state. A design rationale with a formal sketch of why the constraints matter.</description>
      <content:encoded><![CDATA[<p><em>The repository is at
<a href="https://github.com/sebastianspicker/ralph-loop">github.com/sebastianspicker/ralph-loop</a>.
This post is the design rationale.</em></p>
<hr>
<h2 id="december-2025">December 2025</h2>
<p>It happened fast. In the twelve months leading up to this post, agentic
coding went from a niche research topic to the default mode for several
categories of software engineering task. Codex runs code in a sandboxed
container and submits pull requests. Claude Code works through a task list
in your terminal while you make coffee. Cursor&rsquo;s agent mode rewrites a
file, runs the tests, reads the failures, and tries again — automatically,
without waiting for you to press a button.</p>
<p>The demos are impressive. The production reality is messier.</p>
<p>The problem is not that these systems do not work. They work well enough,
often enough, to be genuinely useful. The problem is that &ldquo;works&rdquo; means
something different when an agent is executing than when a human is.
A human who makes a mistake can tell you what they were thinking.
An agent that produces a subtly wrong result leaves you with a diff and
no explanation. And an agent run that worked last Tuesday might not work
today, because the model changed, or the context window filled differently,
or the prompt-to-output mapping is, at bottom, a stochastic function.</p>
<p>This is the problem the Ralph Loop is designed to address: not &ldquo;make
agents more capable&rdquo; but &ldquo;make agent runs reproducible.&rdquo;</p>
<hr>
<h2 id="the-reproducibility-problem-formally">The Reproducibility Problem, Formally</h2>
<p>An LLM tool call is a stochastic function. Given a prompt $p$, the
model samples from a distribution over possible outputs:</p>
$$T : \mathcal{P} \to \Delta(\mathcal{O})$$<p>where $\mathcal{P}$ is the space of prompts, $\mathcal{O}$ is the space
of outputs, and $\Delta(\mathcal{O})$ denotes the probability simplex over
$\mathcal{O}$.</p>
<p>At temperature zero — the most deterministic setting most systems support —
this collapses toward a point mass:</p>
$$T_0(p) \approx \delta_{o^*}$$<p>where $o^*$ is the argmax token sequence. &ldquo;Approximately&rdquo; because hardware
non-determinism, batching effects, and floating-point accumulation mean
that even $T_0$ is not strictly reproducible across runs, environments, or
model versions.</p>
<p>A naive agentic loop composes these calls. If an agent takes $k$ sequential
tool calls to complete a task, the result is a $k$-fold composition:</p>
$$o_k = T(T(\cdots T(p_0) \cdots))$$<p>The variance does not merely add — it propagates through the dependencies.
Early outputs condition later prompts; a small deviation at step 2 can
shift the trajectory of step 5 substantially. This is not a theoretical
concern. It is the practical experience of anyone who has tried to reproduce
a multi-step agent run.</p>
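<p>A toy numerical experiment makes the compounding concrete. This is an illustration of the error model, not the Ralph Loop itself: each step feeds the previous output into the next, and a gain above one stands in for the way a small early deviation shifts later prompts:</p>

```python
import random
import statistics

random.seed(0)
GAIN, SIGMA, STEPS, TRIALS = 1.1, 1.0, 5, 20_000

def chained_run(k: int) -> float:
    # o_j = GAIN * o_{j-1} + eps_j: each step is conditioned on the last.
    o = 0.0
    for _ in range(k):
        o = GAIN * o + random.gauss(0.0, SIGMA)
    return o

var_chained = statistics.variance(chained_run(STEPS) for _ in range(TRIALS))
var_independent = STEPS * SIGMA ** 2   # variance if the k draws merely added
print(f"chained: {var_chained:.2f}  vs  additive baseline: {var_independent:.2f}")
```

<p>The chained variance exceeds the additive baseline (analytically $\sigma^2 (g^{2k} - 1)/(g^2 - 1) \approx 7.6$ versus $5$ for these toy parameters), which is the sense in which the variance propagates through the dependencies rather than merely adding.</p>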
<p>The Ralph Loop does not solve the stochasticity of $T$. What it does is
prevent the composition.</p>
<hr>
<h2 id="the-ralph-loop-as-a-state-machine">The Ralph Loop as a State Machine</h2>
<p>The system&rsquo;s state at any point in a run is a triple:</p>
$$\sigma = (Q,\; S,\; L)$$<p>where:</p>
<ul>
<li>$Q = (s_1, s_2, \ldots, s_n)$ is the ordered story queue — the PRD
(product requirements document) — with stories sorted by priority, then
by ID</li>
<li>$S \in \lbrace \texttt{open}, \texttt{passing}, \texttt{skipped} \rbrace^n$
is the status vector, one entry per story</li>
<li>$L \in \lbrace \texttt{free}, \texttt{held} \rbrace$ is the file-lock
state protecting $S$ from concurrent writes</li>
</ul>
<p>The transition function $\delta$ at each step is:</p>
<ol>
<li><strong>Select</strong>: $i^* = \min\lbrace i : S[i] = \texttt{open} \rbrace$ —
deterministic by construction, since $Q$ has a fixed ordering</li>
<li><strong>Build</strong>: $p = \pi(s_{i^*},\; \text{CODEX.md})$ — a pure function of
the story definition and the static policy document; no dependency on
previous tool outputs</li>
<li><strong>Execute</strong>: $o \sim T(p)$ — exactly one tool call, output captured</li>
<li><strong>Accept</strong>: $\alpha(o) \in \lbrace \top, \bot \rbrace$ — parse the
acceptance criterion (was the expected report file created at the
expected path?)</li>
<li><strong>Commit</strong>: if $\alpha(o) = \top$, set $S[i^*] \leftarrow \texttt{passing}$;
otherwise increment the attempt counter; write atomically under lock $L$</li>
</ol>
<p>The next state is $\sigma' = (Q, S', L)$ where $S'$ differs from $S$ in
exactly one position. The loop continues until no open stories remain or
a story limit $N$ is reached.</p>
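<p>The transition function is small enough to sketch in full. The names below (<code>build_prompt</code>, <code>call_tool</code>, <code>accepted</code>) are hypothetical stand-ins, not the actual Ralph Loop API; the real implementation adds the file lock and atomic status writes:</p>

```python
from dataclasses import dataclass

@dataclass
class Story:
    id: str
    priority: int
    status: str = "open"   # open | passing | skipped
    attempts: int = 0

def run_loop(queue, build_prompt, call_tool, accepted, max_attempts=3):
    # Fixed ordering makes selection deterministic: priority, then ID.
    queue.sort(key=lambda s: (s.priority, s.id))
    while True:
        candidates = [s for s in queue
                      if s.status == "open" and s.attempts < max_attempts]
        if not candidates:
            return queue
        story = candidates[0]            # 1. Select: first open story
        prompt = build_prompt(story)     # 2. Build: pure function of the story
        output = call_tool(prompt)       # 3. Execute: exactly one tool call
        if accepted(output):             # 4. Accept: parse the criterion
            story.status = "passing"     # 5. Commit (locked + atomic in the real system)
        else:
            story.attempts += 1
```

<p>Each iteration either closes a story or increments a bounded attempt counter, which is where the $n \cdot A_{\max}$ termination bound comes from.</p>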
<p><strong>Termination.</strong> Since $|Q| = n$ is finite, $S$ has at most $n$ open
entries, and each step either closes one entry or increments an attempt
counter bounded by $A_{\max}$, the loop terminates in at most
$n \cdot A_{\max}$ steps. Under the assumption that $T$ eventually
satisfies any reachable acceptance criterion — which is what CODEX.md&rsquo;s
constraints are designed to encourage — the loop converges in exactly $n$
successful transitions.</p>
<p><strong>Replay.</strong> The entire trajectory $\sigma_0 \to \sigma_1 \to \cdots \to
\sigma_k$ is determined by $Q$ and the sequence of tool outputs
$o_1, o_2, \ldots, o_k$. The <code>.runtime/events.log</code> records these
outputs. If tool outputs are deterministic, the run is fully deterministic.
If they are not — as in practice they will not be — the stochasticity is
at least isolated to individual steps rather than allowed to compound
across the chain.</p>
<hr>
<h2 id="the-one-tool-call-invariant">The One-Tool-Call Invariant</h2>
<p>The most important constraint in the Ralph Loop is also the simplest:
exactly one tool call per story attempt.</p>
<p>This is not the natural design. A natural agentic loop would let the model
plan, execute, observe, reflect, and re-execute within a single story.
Some frameworks call this &ldquo;inner monologue&rdquo; or &ldquo;chain-of-thought with tool
use.&rdquo; The model emits reasoning tokens, calls a tool, reads the result,
emits more reasoning, calls another tool, and eventually produces the
final output.</p>
<p>This is more capable for complex tasks. It is also what makes
reproducibility hard. Each additional tool call in the chain is a fresh
draw from $T$, conditioned on the previous outputs. After five tool calls,
the prompt for the fifth includes four previous outputs — each of which
varied slightly from the last run. The fifth output is now conditioned on
a different input.</p>
<p>Formally: let the multi-call policy use $k$ sequential calls per story.
Each call $c_j$ produces output $o_j \sim T(p_j)$, where
$p_j = f(o_1, \ldots, o_{j-1}, s_{i^*})$ for some conditioning function
$f$. The variance of the final output $o_k$ depends on the accumulated
conditioning:</p>
<p>$$\text{Var}(o_k) \;=\; \text{Var}_{o_1}\!\left[\, \mathbb{E}[o_k \mid o_1] \,\right] \;+\; \mathbb{E}_{o_1}\!\left[\, \text{Var}(o_k \mid o_1) \,\right]$$</p>
<p>By the law of total variance, applied recursively, the total variance
decomposes into explained and residual components — conditioning
redistributes variance but does not eliminate the residual term. In a
well-designed, low-variance chain the residual may stay small; in
practice, LLM outputs have non-trivial variance at each step, and that
variance propagates through the conditioning chain.</p>
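<p>The compounding is easy to see in a toy Monte Carlo model: treat each &ldquo;tool call&rdquo; as the previous output plus fresh noise. This is a deliberately crude stand-in for $T$, not a model of any real agent:</p>

```python
import random, statistics

random.seed(0)

def chain(k, sigma=1.0):
    """k sequential 'tool calls', each output = previous output + fresh noise."""
    o = 0.0
    for _ in range(k):
        o += random.gauss(0.0, sigma)   # a crude stand-in for a draw from T
    return o

runs = 20000
var_1 = statistics.pvariance([chain(1) for _ in range(runs)])
var_5 = statistics.pvariance([chain(5) for _ in range(runs)])
# Independent per-step noise adds across the chain: Var(o_5) ~ 5 * Var(o_1).
```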
<p>The one-call constraint collapses $k$ to 1:</p>
$$o_i \sim T\!\bigl(\pi(s_i, \text{CODEX.md})\bigr)$$<p>The output depends only on the story definition and the static policy
document. Not on previous tool outputs. The stories are designed to be
atomic enough that one call is sufficient. If a story requires more, it
should be split into two stories in the PRD. This is a forcing function
toward better task decomposition, which I consider a feature rather than
a limitation.</p>
<hr>
<h2 id="scope-as-a-topological-constraint">Scope as a Topological Constraint</h2>
<p>In fixing mode, each story carries a <code>scope[]</code> field listing the files
or directories the agent is permitted to modify. The runner captures a
snapshot of the repository state before execution:</p>
$$F_{\text{before}} = \lbrace (f,\; h(f)) : f \in \text{repo} \rbrace$$<p>where $h(f)$ is a hash of the file contents. After the tool call:</p>
$$F_{\text{after}} = \lbrace (f,\; h(f)) : f \in \text{repo} \rbrace$$<p>The diff $\Delta = F_{\text{after}} \setminus F_{\text{before}}$ must
satisfy:</p>
$$\forall\, (f, \_) \in \Delta \;:\; f \in \text{scope}(s_{i^*})$$<p>This is a locality constraint on the filesystem graph: the agent&rsquo;s writes
are confined to the neighbourhood $\mathcal{N}(s_{i^*})$ defined by the
story&rsquo;s scope declaration. Writes that escape this neighbourhood are a
story failure, regardless of whether they look correct.</p>
<p>The motivation is containment. When a fixing agent makes a &ldquo;small repair&rdquo;
to one file but also helpfully tidies up three adjacent files it noticed
while reading, you have three undocumented changes outside the story&rsquo;s
intent. In a system with many stories running sequentially, out-of-scope
changes accumulate silently. The scope constraint prevents this.
Crucially, prompt instructions alone are not sufficient — an agent told
&ldquo;only modify files in scope&rdquo; can still modify out-of-scope files if the
instructions are interpreted loosely or the context is long. The runner
enforces scope at the file system level, after the fact, and that
enforcement cannot be argued with.</p>
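<p>The check itself is a small set computation. A sketch with in-memory contents standing in for a real working tree (names hypothetical):</p>

```python
import hashlib

def snapshot(files):
    """Map each path to a hash of its contents (files: path -> content string)."""
    return {p: hashlib.sha256(c.encode()).hexdigest() for p, c in files.items()}

def in_scope(path, scope):
    """True if path equals a scope entry or lies under a scope directory."""
    return any(path == s or path.startswith(s.rstrip("/") + "/") for s in scope)

def scope_violations(before, after, scope):
    """Paths whose hash changed (or appeared) outside the story's scope[]."""
    delta = {p for p, h in after.items() if before.get(p) != h}
    return sorted(p for p in delta if not in_scope(p, scope))
```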
<hr>
<h2 id="acceptance-criteria-grounding-evaluation-in-filesystem-events">Acceptance Criteria: Grounding Evaluation in Filesystem Events</h2>
<p>Each story&rsquo;s acceptance criterion is a single line of the form
<code>Created &lt;path&gt;</code> — the path where the report or output file should appear.</p>
<p>This is intentionally minimal. The alternative — semantic acceptance
criteria (&ldquo;did the agent identify all relevant security issues?&rdquo;) — would
require another model call to evaluate, reintroducing stochasticity at
the evaluation layer and creating the infinite regress of &ldquo;who checks the
checker.&rdquo; A created file at the right path is a necessary condition for
a valid run. It is not a sufficient condition for correctness, but
necessary conditions that can be checked deterministically are already
more than most agentic pipelines provide.</p>
<p>The quality of the outputs — whether the audit findings are accurate,
whether the fix is correct — depends on the model and the prompt quality.
The Ralph Loop gives you a framework for running agents safely and
repeatably. Verifying that the agent was right is a different problem and,
arguably, a harder one.</p>
<hr>
<h2 id="why-bash">Why Bash</h2>
<p>A question I have fielded: why Bash and jq, not Python or Node.js?</p>
<p>The practical reason: the target environment is an agent sandbox that has
reliable POSIX tooling but variable package availability. Python dependency
management inside a constrained container is itself a source of variance.
Bash with jq has no dependencies beyond what any standard Unix environment
provides.</p>
<p>The philosophical reason: the framework&rsquo;s job is orchestration, not
computation. It selects stories, builds prompts from templates, calls one
external tool, parses one file path, and updates one JSON field. None of
this requires a type system or a rich standard library. Bash is the right
tool for glue that does not need to be impressive.</p>
<p>The one place Bash becomes awkward is the schema validation layer, which
is implemented with a separate <code>jq</code> script against a JSON Schema. This
works but is not elegant. If the PRD schema grows substantially, that
component would be worth replacing with something that has native schema
validation support.</p>
<hr>
<h2 id="what-this-is-not">What This Is Not</h2>
<p>The Ralph Loop is not an agent. It is a harness for agents. It does not
decide what tasks to run, does not reason about a codebase, and does not
write code. It sequences discrete, pre-specified stories, enforces the
constraints on each execution, and records the outcomes. The intelligence
is in the model and in the story design; the framework contributes only
discipline.</p>
<p>This distinction matters because the current wave of agentic tools
conflates two things that are worth keeping separate: the capability to
reason and act (what the model provides) and the infrastructure for doing
so safely and repeatably (what the harness provides). Improving the model
does not automatically improve the harness — and a better model in a
poorly constrained harness just fails more impressively.</p>
<hr>
<p><em>The repository is at
<a href="https://github.com/sebastianspicker/ralph-loop">github.com/sebastianspicker/ralph-loop</a>.
The Bash implementation, the PRD schema, the CODEX.md policy document,
and the test suite are all there.</em></p>
]]></content:encoded>
    </item>
    <item>
      <title>A Gas at Temperature T: Xenakis and the Physics of Stochastic Music</title>
      <link>https://sebastianspicker.github.io/posts/xenakis-stochastic-music/</link>
      <pubDate>Tue, 14 Oct 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/xenakis-stochastic-music/</guid>
      <description>Iannis Xenakis applied the Maxwell-Boltzmann velocity distribution, Markov chains, and game theory to orchestral composition. In Pithoprakta (1955–56), 46 string parts are molecules of a gas, each following the kinetic theory distribution. In Duel and Stratégie (1959–62), two conductors play a zero-sum game with payoff matrices on stage. This post works through the physics and mathematics, and asks what it means when a composer treats an orchestra as a thermodynamic system.</description>
      <content:encoded><![CDATA[<p><em>Iannis Xenakis (1922–2001) was trained as a civil engineer at the Athens
Polytechnic, joined the Greek Resistance during the Second World War and the
subsequent Greek Civil War, survived a British army tank shell in January 1945
that cost him the sight in his left eye and part of his jaw, was sentenced to
death in absentia by the Greek military government, fled to Paris in 1947, and
worked for twelve years as an architect in Le Corbusier&rsquo;s atelier — where he
contributed structural engineering to the Unité d&rsquo;Habitation in Marseille and
designed the Philips Pavilion for Expo 58. In parallel, already in his thirties,
he taught himself composition — approaching Honegger (who was too ill to teach) and then studying with Messiaen
— and became one of the central figures of the post-war avant-garde. I mention
the biography not as background colour but because it bears on the physics. A
person who has been through what Xenakis had been through by 1950 is not likely
to be intimidated by the kinetic theory of gases.</em></p>
<p><em>He was not. In 1955–56 he composed</em> Pithoprakta <em>— &ldquo;actions through
probability&rdquo; — for 46 strings, each of which is, in his own account, a
molecule of an ideal gas. This post works through the mathematics he
used and asks what it means when a composer takes statistical mechanics
seriously as a compositional tool.</em></p>
<hr>
<h2 id="the-problem-with-post-war-serialism">The Problem with Post-War Serialism</h2>
<p>To understand why Xenakis did what he did, it helps to know what everyone
else was doing. By the early 1950s, the dominant tendency in European
new music was total serialism: the systematic extension of Schoenberg&rsquo;s
twelve-tone technique to rhythm, dynamics, articulation, and register. Every
parameter of every note was determined by a series. Messiaen had sketched
this direction in <em>Mode de valeurs et d&rsquo;intensités</em> (1949); Boulez and
Stockhausen had taken it to its logical extreme.</p>
<p>The result, as Xenakis observed with characteristic bluntness in <em>Formalized
Music</em> (1963/1992), was a kind of sonic indistinguishability: because every
parameter varied according to independent deterministic series, the textures
produced by total serialism sounded essentially like random noise. The
maximum of local determinism had produced the appearance of global chaos.</p>
<p>His diagnosis was precise and, I think, correct: if the perceptual result of
maximum determinism and maximum randomness is the same, then the path forward
is not to find a better deterministic scheme but to embrace randomness
explicitly, at the level that governs the <em>macroscopic</em> structure. Control the
distribution; let the individual events vary within it. This is exactly what
statistical mechanics does for a gas: it does not track every molecule, but
it knows with great precision what the distribution of velocities will be.</p>
<hr>
<h2 id="statistical-mechanics-in-brief">Statistical Mechanics in Brief</h2>
<p>In a classical ideal gas of $N$ molecules at thermal equilibrium with
temperature $T$, the molecules move in all directions with speeds distributed
according to the Maxwell-Boltzmann speed distribution:</p>
$$f(v) = \sqrt{\frac{2}{\pi}}\, \frac{v^2}{a^3}\, \exp\!\left(-\frac{v^2}{2a^2}\right), \qquad a = \sqrt{\frac{k_B T}{m}},$$<p>where $m$ is the molecular mass and $k_B$ is Boltzmann&rsquo;s constant. The
parameter $a$ sets the characteristic speed scale: it grows with temperature
(hotter gas means faster molecules) and shrinks with molecular mass (heavier
molecules move more slowly at the same temperature).</p>
<p>The distribution has a characteristic shape: it rises as $v^2$ for small
speeds (few molecules are nearly stationary), peaks at the most probable
speed $v_p = a\sqrt{2}$, and falls off as $e^{-v^2/2a^2}$ for large speeds
(very fast molecules are exponentially rare). The three characteristic
speeds are:</p>
$$v_p = a\sqrt{2}, \qquad \langle v \rangle = a\sqrt{\tfrac{8}{\pi}}, \qquad v_\mathrm{rms} = a\sqrt{3}.$$<p>No individual molecule is tracked. The distribution is everything: once you
know $f(v)$, you know all macroscopic properties of the gas — pressure,
mean kinetic energy, thermal conductivity — without knowing the trajectory
of a single molecule. The individual is sacrificed to the ensemble.</p>
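<p>The distribution and its characteristic speeds can be checked numerically. A short sketch with $a = 1$ in arbitrary units, integrating by the trapezoidal rule:</p>

```python
import math

def f(v, a):
    """Maxwell-Boltzmann speed density; a = sqrt(k_B * T / m)."""
    return math.sqrt(2 / math.pi) * (v**2 / a**3) * math.exp(-v**2 / (2 * a**2))

a = 1.0
n, vmax = 20000, 12.0 * a               # the density is negligible beyond ~12a
h = vmax / n
grid = [i * h for i in range(n + 1)]

def integrate(g):
    """Trapezoidal rule on the grid: endpoints get half weight."""
    return h * (0.5 * g(grid[0]) + sum(g(v) for v in grid[1:-1]) + 0.5 * g(grid[-1]))

total = integrate(lambda v: f(v, a))       # normalisation: should be ~1
mean = integrate(lambda v: v * f(v, a))    # should be ~a * sqrt(8/pi)
v_p = max(grid[1:], key=lambda v: f(v, a)) # should be ~a * sqrt(2)
```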
<hr>
<h2 id="pithoprakta-and-the-orchestra-as-gas"><em>Pithoprakta</em> and the Orchestra as Gas</h2>
<p>In <em>Pithoprakta</em> (1955–56), Xenakis assigns each of the 46 string instruments
to a molecule of a gas. The musical analogue of molecular speed is the
<em>velocity of a glissando</em>: the rate at which a glissando moves through
pitch, measured in semitones per second. Slow glissandi are cold molecules;
fast glissandi are hot ones.</p>
<p>For a given passage with a specified musical &ldquo;temperature&rdquo; (an
intensity-and-density parameter he could set as a compositional choice),
the 46 glissando speeds are drawn from the Maxwell-Boltzmann distribution
for that temperature. No two strings play the same glissando at the same
speed. The effect, to a listener, is a dense sound-mass — a shimmer or
a roar — whose internal texture varies but whose overall character (the
temperature, the density) is under the composer&rsquo;s control at exactly the
level that matters perceptually.</p>
<p>Xenakis worked out the velocities numerically by hand. The score of
<em>Pithoprakta</em> was among the first in which the individual parts were derived
from a statistical distribution rather than from a melody, a row, or an
improvisation instruction. The calculation is tedious but not difficult:
for each time window, choose a temperature, compute $f(v)$ for the 46
values of $v$ that tile the distribution, and assign one speed to each
instrument.</p>
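<p>The tiling step can be reproduced from the distribution&rsquo;s CDF: place the 46 speeds at the quantiles $(i - \tfrac{1}{2})/46$ and invert numerically. A sketch; the scale $a$, in semitones per second, is an invented value, not one taken from the score:</p>

```python
import math

def mb_cdf(v, a):
    """Maxwell-Boltzmann speed CDF for scale parameter a."""
    return (math.erf(v / (math.sqrt(2) * a))
            - math.sqrt(2 / math.pi) * (v / a) * math.exp(-v**2 / (2 * a**2)))

def tile_speeds(n, a):
    """n speeds at the quantiles (i - 0.5)/n, found by bisection on the CDF."""
    speeds = []
    for i in range(1, n + 1):
        q = (i - 0.5) / n
        lo, hi = 0.0, 20.0 * a          # the CDF is ~1 well before 20a
        for _ in range(80):             # bisection to high precision
            mid = 0.5 * (lo + hi)
            if mb_cdf(mid, a) < q:
                lo = mid
            else:
                hi = mid
        speeds.append(0.5 * (lo + hi))
    return speeds

a = 4.0                                 # hypothetical scale, semitones/sec
speeds = tile_speeds(46, a)             # one glissando speed per string part
mean_speed = sum(speeds) / len(speeds)  # close to the ensemble mean a*sqrt(8/pi)
```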
<p>The connection between macroscopic structure and microscopic liberty is
deliberately preserved. The shape of the sound-mass — its brightness,
its turbulence, its rate of change — is controlled. Each individual line
is unpredictable. This is, structurally, the same trade-off that makes
thermodynamics work: you give up on the individual trajectory and gain
exact knowledge of the aggregate.</p>
<hr>
<h2 id="musical-temperature-as-a-compositional-parameter">Musical Temperature as a Compositional Parameter</h2>
<p>The analogy is worth making precise. In the physical gas, raising the
temperature $T$ increases $a = \sqrt{k_B T / m}$, which shifts the
peak of $f(v)$ to the right and widens the distribution. More molecules
have high speeds; the variance of speeds increases.</p>
<p>In <em>Pithoprakta</em>, raising the musical &ldquo;temperature&rdquo; has the same
effect: more instruments perform rapid glissandi; the pitch-space
trajectories are more varied; the texture becomes more active and
more turbulent. Lowering the temperature concentrates the glissando
speeds near zero — slow motion, near-stasis, long sustained tones
that change pitch only gradually. The orchestra cools.</p>
<p>This mapping is not metaphorical. Xenakis computed it. The score
contains numerically derived glissando speeds; the connection between the
perceptual temperature of the texture and the statistical parameter $T$ is
quantitative. When musicians speak of a passage &ldquo;heating up,&rdquo; they are
usually using a figure of speech. In <em>Pithoprakta</em>, they are describing
a thermodynamic fact.</p>
<hr>
<h2 id="the-poisson-distribution-and-event-density">The Poisson Distribution and Event Density</h2>
<p><em>Pithoprakta</em> uses a second physical model alongside the Maxwell-Boltzmann
distribution: the Poisson process, which governs the density of
independent, randomly occurring events.</p>
<p>If musical events (pizzicato attacks, bow changes, individual note entries)
occur at a mean rate of $\lambda$ events per second, the probability of
exactly $k$ events occurring in a time window of length $T$ is:</p>
$$P(N = k) = \frac{(\lambda T)^k\, e^{-\lambda T}}{k!}.$$<p>The Poisson distribution has a single parameter $\lambda$ that controls
both the mean and the variance (they are equal: $\langle N \rangle =
\mathrm{Var}(N) = \lambda T$). A high $\lambda$ produces a dense cluster
of events; a low $\lambda$ produces sparse, widely spaced events.</p>
<p>Xenakis used this to control the density of pizzicato attacks independently
of the glissando texture. A passage can be cool (slow glissandi) and dense
(many pizzicati), or hot and sparse, or any combination. The two
distributions operate on independent musical parameters — pitch motion and
event density — giving the composer a two-dimensional thermodynamic control
space over the texture.</p>
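<p>The distribution is trivial to compute. A sketch with an invented rate, say eight pizzicato attacks per second over a two-second window, using the stable recurrence $p_k = p_{k-1} \cdot \mu / k$:</p>

```python
import math

def poisson_pmf_list(lam, T, kmax):
    """P(N = k) for k = 0..kmax, built recursively: p_k = p_{k-1} * mu / k."""
    mu = lam * T
    p = math.exp(-mu)
    out = [p]
    for k in range(1, kmax + 1):
        p *= mu / k
        out.append(p)
    return out

lam, T = 8.0, 2.0                       # invented: 8 attacks/sec, 2-second window
probs = poisson_pmf_list(lam, T, 100)
mean = sum(k * p for k, p in enumerate(probs))
var = sum(k * k * p for k, p in enumerate(probs)) - mean**2
# Mean and variance both equal lam * T = 16, the single-parameter signature.
```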
<hr>
<h2 id="markov-chains-analogique-a-and-analogique-b">Markov Chains: <em>Analogique A</em> and <em>Analogique B</em></h2>
<p>In <em>Analogique A</em> (for string orchestra, 1958–59) and its companion
<em>Analogique B</em> (for sinusoidal tones, same year), Xenakis moved to a
different stochastic framework: Markov chains.</p>
<p>A Markov chain is a sequence of states where the probability of
transitioning to the next state depends only on the current state. The
chain is specified by a transition matrix $P$, where $P_{ij}$ is the
probability of moving from state $i$ to state $j$:</p>
$$P_{ij} \geq 0, \qquad \sum_j P_{ij} = 1 \quad \forall\, i.$$<p>Under mild conditions (irreducibility and aperiodicity), the chain
converges to a unique stationary distribution $\pi$ satisfying:</p>
$$\pi P = \pi, \qquad \sum_i \pi_i = 1.$$<p>The convergence is geometric: if $\lambda_2$ is the second-largest eigenvalue
of $P$ in absolute value, then after $n$ steps the distribution $\pi^{(n)}$
satisfies $\|\pi^{(n)} - \pi\| \leq C |\lambda_2|^n$ for some constant $C$.
The gap $1 - |\lambda_2|$ — the <em>spectral gap</em> — controls how quickly the
chain forgets its initial state. A transition matrix with a large spectral
gap produces rapid convergence; one with $|\lambda_2| \approx 1$ produces
long-memory dependence between distant states. This is a compositional
choice: the spectral gap determines how quickly a piece&rsquo;s texture changes
character.</p>

<p>In <em>Analogique A</em>, Xenakis divided the sonic space into a grid of
cells defined by pitch register (high/middle/low), density
(sparse/medium/dense), and dynamic (soft/loud). Each &ldquo;screen&rdquo; — a brief
time window — occupies one cell in this grid. The progression of screens
through the piece is governed by transition probabilities: from a
high/dense/loud screen, there is some probability of moving to each
adjacent cell, specified by Xenakis&rsquo;s chosen transition matrix.</p>
<p>This is a Markov chain on a discrete state space of sonic textures. The
macroscopic trajectory of the piece — its overall movement through
sound-quality space — is determined by the transition matrix, which the composer
sets. The details of each screen are filled in stochastically, within the
parameters of the current state. Again, the individual is sacrificed to the
aggregate; control is exercised at the level of the distribution rather
than the event.</p>
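<p>The mechanics are easy to demonstrate. The sketch below uses an invented $3 \times 3$ transition matrix over three coarse density states (sparse, medium, dense), not Xenakis&rsquo;s actual matrix, and shows the chain forgetting its initial state:</p>

```python
def step(dist, P):
    """One step of the chain: dist <- dist . P (P is row-stochastic)."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Invented transition matrix over texture states: sparse, medium, dense.
P = [[0.6, 0.3, 0.1],
     [0.2, 0.5, 0.3],
     [0.1, 0.4, 0.5]]

# Two runs from opposite initial states converge to the same stationary pi:
# the spectral gap makes the chain forget where it started.
d_sparse, d_dense = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]
for _ in range(200):
    d_sparse, d_dense = step(d_sparse, P), step(d_dense, P)
```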
<hr>
<h2 id="game-theory-duel-and-stratégie">Game Theory: <em>Duel</em> and <em>Stratégie</em></h2>
<p>The most extreme and, to my mind, most interesting of Xenakis&rsquo;s
formalisations is the use of game theory in <em>Duel</em> (1959) and <em>Stratégie</em>
(1962).</p>
<p>A <strong>two-player zero-sum game</strong> is specified by a payoff matrix $A \in
\mathbb{R}^{m \times n}$. Player 1 (the &ldquo;maximiser&rdquo;) chooses a row $i$;
Player 2 (the &ldquo;minimiser&rdquo;) chooses a column $j$; Player 1 receives payoff
$A_{ij}$ and Player 2 receives $-A_{ij}$. In a pure-strategy game, each
player selects a single action. In a <strong>mixed-strategy game</strong>, each player
chooses a probability distribution over their actions: Player 1 uses
$\mathbf{x} \in \Delta_m$ and Player 2 uses $\mathbf{y} \in \Delta_n$,
where $\Delta_k$ denotes the standard $(k-1)$-simplex.</p>
<p>The expected payoff to Player 1 under mixed strategies is:</p>
$$E(\mathbf{x}, \mathbf{y}) = \mathbf{x}^\top A\, \mathbf{y}.$$<p>Von Neumann&rsquo;s minimax theorem (1928) guarantees that:</p>
$$\max_{\mathbf{x} \in \Delta_m} \min_{\mathbf{y} \in \Delta_n}
\mathbf{x}^\top A\, \mathbf{y}
\;=\;
\min_{\mathbf{y} \in \Delta_n} \max_{\mathbf{x} \in \Delta_m}
\mathbf{x}^\top A\, \mathbf{y}
\;=\; v^*,$$<p>where $v^*$ is the <strong>value</strong> of the game. The pair $(\mathbf{x}^*,
\mathbf{y}^*)$ that achieves this saddle point is the Nash equilibrium:
neither player can improve their expected payoff by unilaterally deviating
from their equilibrium strategy.</p>
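<p>For small matrices the equilibrium can be approximated by fictitious play: each player repeatedly best-responds to the opponent&rsquo;s empirical mixture of past plays, and in zero-sum games the empirical frequencies converge to a minimax pair (a classical result of Julia Robinson). A sketch on matching pennies, where the known equilibrium is $(\tfrac{1}{2}, \tfrac{1}{2})$ with value $0$:</p>

```python
# Fictitious play on a 2x2 zero-sum game (matching pennies).
A = [[1, -1],
     [-1, 1]]             # payoff to the row player (the maximiser)

row_counts = [1, 0]        # arbitrary initial plays
col_counts = [1, 0]
for _ in range(20000):
    # Row player's expected payoff for each row vs the column history.
    row_val = [sum(A[i][j] * col_counts[j] for j in range(2)) for i in range(2)]
    i_star = max(range(2), key=lambda i: row_val[i])
    # Column player's (minimising) payoff for each column vs the row history.
    col_val = [sum(A[i][j] * row_counts[i] for i in range(2)) for j in range(2)]
    j_star = min(range(2), key=lambda j: col_val[j])
    row_counts[i_star] += 1
    col_counts[j_star] += 1

x = [c / sum(row_counts) for c in row_counts]   # empirical row mixture
y = [c / sum(col_counts) for c in col_counts]   # empirical column mixture
value = sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(2))
```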
<p>In <em>Stratégie</em>, each conductor leads one orchestra. Each has nineteen
&ldquo;tactics&rdquo; — six basic musical textures (e.g., sustained chords, staccato
pizzicati, glissandi masses, silence) plus thirteen combinatorial tactics
that combine two or three of the basics. The payoff matrix is a
$19 \times 19$ integer matrix, also defined by Xenakis, specifying how
many points Conductor 1 scores when their orchestra plays tactic $i$ against
Conductor 2&rsquo;s tactic $j$. A referee tracks the score.</p>
<p>The conductors make decisions in real time during the performance, choosing
tactics based on what the other conductor is doing and on the evolving score.
The piece ends when one conductor reaches a predetermined score threshold.</p>
<p>The Nash equilibrium of the payoff matrix tells each conductor, in principle,
the optimal <em>distribution</em> over tactics to play: if both play optimally, the
expected score trajectory is determined. In practice, conductors are not
expected to compute mixed strategies on the podium; Xenakis&rsquo;s point is
structural. The game-theoretic formalism is used to design the payoff matrix
so that no tactic dominates — every choice has consequences that depend on
the opponent&rsquo;s choice — guaranteeing that the piece will always contain
genuine strategic tension regardless of who is conducting.</p>
<p><em>Duel</em> (1959) is the earlier, simpler version for two chamber orchestras.
<em>Stratégie</em> (1962) was premiered in April 1963 at the Venice Biennale with two conductors
competing live. The audience was aware of the game, of the score, and of
the payoff matrix. The premiere was by most accounts a success, though the
practical complications of running a zero-sum game in a concert hall
(including the question of whether conductors were actually computing Nash
equilibria or just following intuition) were never fully resolved.</p>
<hr>
<h2 id="formalized-music"><em>Formalized Music</em></h2>
<p>Xenakis assembled his theoretical framework in <em>Musiques formelles</em> (1963),
translated and expanded as <em>Formalized Music</em> (1971; revised edition 1992).
The book is one of the strangest documents in twentieth-century music theory:
part treatise, part manifesto, part mathematical appendix. It covers
stochastic composition, Markov chains, game theory, set theory, group theory,
and symbolic logic — all presented with the confidence of someone who is
equally at home in the engineering faculty and the concert hall, and with
the occasional obscurity of someone writing simultaneously for two audiences
who share almost no vocabulary.</p>
<p>The core argument is that musical composition can and should be treated as
the application of mathematical structures to sonic material, not because
mathematics makes music &ldquo;better&rdquo; but because mathematical structures are
the most powerful available tools for controlling relationships between
sounds at multiple scales simultaneously. The statistical distributions
control the macroscopic; the individual values vary within them. The
game-theoretic payoff matrix controls the strategic interaction; the individual
tactics fill in the details. Mathematics operates at the structural level
and leaves the acoustic surface free.</p>
<p>This is a different relationship between mathematics and music from the
ones in my earlier posts on <a href="/posts/messiaen-modes-group-theory/">group theory and Messiaen</a>
or <a href="/posts/euclidean-rhythms/">the Euclidean algorithm and world rhythms</a>.
In those cases, mathematics describes structure that already exists in the
music — structure the composers arrived at by ear. In Xenakis, mathematics
is the generative tool: the score is derived from the calculation.</p>
<hr>
<h2 id="what-the-analogy-does-and-does-not-do">What the Analogy Does and Does Not Do</h2>
<p>The Maxwell-Boltzmann analogy in <em>Pithoprakta</em> is exact in one direction
and approximate in another.</p>
<p>It is exact in the following sense: the glissando speeds Xenakis computed
for his 46 strings genuinely follow the Maxwell-Boltzmann distribution with
the parameters he chose. The score is a realisation of that distribution.
If you collect the glissando speeds from the score and plot their histogram,
you will find the characteristic $v^2 e^{-v^2/2a^2}$ shape.</p>
<p>It is approximate — or rather, it is analogical — in the sense that strings
in an orchestra are not molecules of a gas. They do not collide. They have
mass and inertia in a physical sense that has no direct mapping to
musical parameters. The temperature $T$ is not a temperature in any
thermodynamic sense; it is a compositional variable that Xenakis chose to
parameterise with the same symbol because the formal relationship is the
same. The analogy is structural, not ontological.</p>
<p>This is worth saying plainly because it is easy to be misled in both
directions: either to over-claim (the orchestra <em>is</em> a gas) or to dismiss
(the orchestra is <em>merely</em> labelled with physical vocabulary). The actual
claim is more modest and more interesting: the mathematical structure of the
Maxwell-Boltzmann distribution is the right tool for specifying a certain
kind of orchestral texture, namely one where individual elements vary
stochastically around a controlled macroscopic envelope. The physics
provides the formalism; the music provides the application. This is how
mathematics works in engineering, too.</p>
<hr>
<h2 id="the-centenary-and-what-remains">The Centenary and What Remains</h2>
<p>Xenakis died in 2001, by then partially deaf and with dementia. His centenary
in 2022 produced a wave of new performances, recordings, and scholarship
— including the <em>Meta-Xenakis</em> volume (Open Book Publishers, 2022), which
collects analyses of his compositional mathematics, his architectural work
(he designed the Philips Pavilion for Le Corbusier&rsquo;s Expo 58 in Brussels
using the same ruled-surface geometry he was using in <em>Metastaseis</em>), and
his political biography.</p>
<p>What remains resonant about his project is not the specific distributions
he chose — the Maxwell-Boltzmann is not the only or even necessarily the
best distribution for many musical applications — but the epistemological
position it represents. Xenakis insisted that the right question to ask
about a musical texture is not &ldquo;what is the note at beat 3 of bar 47?&rdquo; but
&ldquo;what is the distribution from which the events in this section are drawn?&rdquo;
This shift from individual determination to statistical control is precisely
the shift that makes thermodynamics possible as a science, and Xenakis was
the first composer to apply it deliberately and systematically.</p>
<p>When a composer writes &ldquo;let the
orchestra be a gas at temperature $T$&rdquo; and then actually computes the
consequences with Boltzmann&rsquo;s constant in front of him, I do not feel that
physics has been appropriated. I feel that it has been recognised — seen,
from a different direction, as the same thing it always was: a set of tools
for thinking about ensembles of interacting elements whose individual
behaviour is too complex to track but whose collective behaviour is not.</p>
<p>The orchestra is not a gas. But the Maxwell-Boltzmann distribution describes
it anyway.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Ames, C. (1989). The Markov process as a compositional model: A survey and
tutorial. <em>Leonardo</em>, 22(2), 175–187. <a href="https://doi.org/10.2307/1575226">https://doi.org/10.2307/1575226</a></p>
</li>
<li>
<p>Jedrzejewski, F. (2006). <em>Mathematical Theory of Music.</em> Delatour France /
IRCAM.</p>
</li>
<li>
<p>Nash, J. F. (1950). Equilibrium points in $n$-person games. <em>Proceedings of
the National Academy of Sciences</em>, 36(1), 48–49.
<a href="https://doi.org/10.1073/pnas.36.1.48">https://doi.org/10.1073/pnas.36.1.48</a></p>
</li>
<li>
<p>Nierhaus, G. (2009). <em>Algorithmic Composition: Paradigms of Automated Music
Generation.</em> Springer.</p>
</li>
<li>
<p>Matossian, N. (2005). <em>Xenakis</em> (revised ed.). Moufflon Publications.</p>
</li>
<li>
<p>Solomos, M. (Ed.). (2022). <em>Meta-Xenakis.</em> Open Book Publishers.
<a href="https://doi.org/10.11647/OBP.0313">https://doi.org/10.11647/OBP.0313</a></p>
</li>
<li>
<p>von Neumann, J. (1928). Zur Theorie der Gesellschaftsspiele. <em>Mathematische
Annalen</em>, 100(1), 295–320. <a href="https://doi.org/10.1007/BF01448847">https://doi.org/10.1007/BF01448847</a></p>
</li>
<li>
<p>von Neumann, J., &amp; Morgenstern, O. (1944). <em>Theory of Games and Economic
Behavior.</em> Princeton University Press.</p>
</li>
<li>
<p>Xenakis, I. (1992). <em>Formalized Music: Thought and Mathematics in
Composition</em> (revised ed.). Pendragon Press.
(Originally published as <em>Musiques formelles</em>, La Revue Musicale, 1963.)</p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-01-14</strong>: Corrected the description of <em>Stratégie</em> (1962): each conductor has nineteen tactics (six basic plus thirteen combinatorial), with a 19 x 19 payoff matrix — not six tactics and a 6 x 6 matrix. The six-tactic, 6 x 6 description applies to the earlier <em>Duel</em> (1959).</li>
<li><strong>2026-01-14</strong>: Added &ldquo;in April 1963&rdquo; to the <em>Stratégie</em> premiere sentence. The composition date is 1962; the premiere took place on 25 April 1963 at the Venice Biennale.</li>
<li><strong>2026-01-14</strong>: Changed &ldquo;studying briefly with Honegger&rdquo; to &ldquo;approaching Honegger (who was too ill to teach).&rdquo; Xenakis sought instruction from Honegger circa 1949, but Honegger was in declining health and did not take him as a student.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>From Oxide to Oversampling: The Physics of Recorded Sound</title>
      <link>https://sebastianspicker.github.io/posts/tape-saturation-delta-sigma-adc-physics/</link>
      <pubDate>Fri, 15 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/tape-saturation-delta-sigma-adc-physics/</guid>
      <description>&amp;lsquo;Analogue warmth&amp;rsquo; and &amp;lsquo;digital coldness&amp;rsquo; are not aesthetic preferences — they are different physics. Ferromagnetic hysteresis generates even harmonics. Delta-sigma modulators push quantisation noise to ultrasonic frequencies. Both effects are calculable.</description>
      <content:encoded><![CDATA[<p>There is an argument that has been running in recording studios since roughly 1982, when the first commercially mastered compact discs appeared. On one side: analogue tape has warmth, depth, something the ear likes. On the other: digital audio is more accurate, lower noise, the measurements say so. The argument produces more heat than light, because most participants treat it as an aesthetic question — a matter of feeling, taste, preference. It is not. The difference between tape and digital audio is a physics difference, and the physics is specific enough to calculate.</p>
<p>The physics here turns out to be some of my favourite kind: it sits at the intersection of condensed matter, signal processing, and Fourier analysis, and it connects directly to why certain sounds are perceived as pleasant. This post walks through both sides. Part I is the ferromagnetic physics of magnetic tape and the harmonic structure of saturation distortion. Part II is the delta-sigma modulator and the engineering trick that achieves 24-bit dynamic range from a 1-bit comparator. Neither side of the debate is as simple as its partisans claim, and the physics of both is more interesting than the aesthetics argument they have been stuck in for forty years.</p>
<hr>
<h2 id="part-i-the-physics-of-magnetic-tape">Part I: The Physics of Magnetic Tape</h2>
<h3 id="ferromagnetic-recording">Ferromagnetic Recording</h3>
<p>Magnetic recording tape is a thin polymer substrate coated with a layer of ferromagnetic particles suspended in a binder. For most of the twentieth century those particles were iron oxide — specifically $\gamma\text{-Fe}_2\text{O}_3$, gamma-phase ferric oxide — though chromium dioxide ($\text{CrO}_2$) and later metal-particle formulations with pure iron or iron-cobalt alloys were developed for higher coercivity and better high-frequency response. What all of these materials share is the key property of ferromagnetism: each particle is a small permanent magnet, a magnetic domain with a net magnetic moment that can be oriented by an external field and that will retain that orientation when the field is removed.</p>
<p>The recording process exploits this directly. The recording head is a toroidal electromagnet with a narrow gap. When audio-frequency current flows through the head&rsquo;s coil, the field at the gap follows the current, and as the tape moves past at a fixed speed, successive particles along the tape length are aligned according to the instantaneous field at the moment they pass the gap. The result is a spatial encoding of the time-domain audio signal along the tape. On playback, the inverse process occurs: the moving pattern of magnetised particles generates a time-varying flux in the playback head&rsquo;s core, which induces a voltage in the coil by Faraday&rsquo;s law, reproducing the original current waveform.</p>
<p>So far this description is entirely linear. The head current maps to a field, the field maps to a magnetisation, the magnetisation maps back to a voltage. If all three relationships were linear, tape would be a near-perfect recording medium — limited only by particle noise and head gap frequency response. The nonlinearity comes from the second relationship in that chain, and it comes from the fundamental physics of how ferromagnetic materials respond to an applied field.</p>
<h3 id="the-b-h-curve-and-hysteresis">The B-H Curve and Hysteresis</h3>
<p>The relationship between the applied magnetic field intensity $H$ (from the recording head, measured in A/m) and the resulting magnetic flux density $B$ in the tape (measured in tesla) is not linear. It follows a curve — actually a family of nested curves — known as the hysteresis loop, and its shape determines almost everything interesting about tape recording <a href="#ref-3">[3]</a>.</p>
<p>Starting from a demagnetised state and increasing $H$ from zero, the initial slope $dB/dH$ — the magnetic permeability $\mu$ — is relatively low. The domains in the material are oriented randomly and require a threshold of energy to begin reorienting. As $H$ increases further, the permeability rises, and there is a region of steep, approximately linear increase in $B$. Then, as $H$ continues to increase, the material saturates: progressively fewer unaligned domains remain, the slope falls, and eventually $dB/dH \to 0$ as all domains are aligned. The $B$-$H$ curve is S-shaped, and the saturation is irreversible in a specific sense: if you now reduce $H$ back toward zero, $B$ does not retrace the original path. It remains at a higher value — the remanence $B_r$ — and you must apply a reverse field of magnitude $H_c$, the coercivity, to bring $B$ back to zero. The loop formed by this cycle of magnetisation and demagnetisation is the hysteresis loop, and its area is proportional to the energy dissipated as heat per cycle.</p>
<p>The crucial feature for audio recording is what happens near the origin. A small audio signal, sitting near $H = 0$, does not experience a nicely linear region of the $B$-$H$ curve. The initial permeability is low, and there is an inflection point near zero: the slope increases as you move away from zero before the saturation region brings it back down again. This means that even at low recording levels, the transfer function from head current to tape magnetisation is nonlinear, and in a particular way: the curve is antisymmetric under $H \to -H$ (that is, $B(-H) = -B(H)$), so the dominant nonlinear terms are odd-order. Without some remedy, even a gentle sine wave would emerge from the playback head with significant odd-harmonic content added, chiefly the third harmonic. The signal would also sit in a region of the curve where the effective permeability depends on signal amplitude, making the recording level-dependent in an uncontrolled way. Something needed to be done about this, and the solution found in the 1940s is one of the more elegant pieces of applied physics in the history of the recording industry.</p>
<h3 id="the-bias-signal">The Bias Signal</h3>
<p>The solution is called AC bias, and its discovery is usually credited to Braunmühl and Weber at the German Reichs-Rundfunk-Gesellschaft around 1940, though there are earlier related patents. The idea is simple once stated: add a high-frequency signal — typically between 50 kHz and 150 kHz, well above the audio band — to the recording current before it drives the head. This bias signal has an amplitude large enough to drive the tape through multiple cycles of its B-H curve on each audio cycle, but it is filtered out of the playback signal by the tape&rsquo;s own limited high-frequency response and by subsequent low-pass filtering.</p>
<p>The effect on the recording process is to linearise the transfer function. The operating point is no longer stationary near the inflection point at $H = 0$. Instead, it rides up and down the B-H curve rapidly many times per audio period, driven by the bias. The audio signal merely modulates the envelope of this rapid oscillation. The net magnetisation that remains after the tape leaves the head gap is the time average of many rapid traversals of the hysteresis loop, and this average tracks the audio signal with good linearity provided the signal level is modest. The bias amplitude and frequency are tuned carefully for each tape formulation — too little bias and the linearisation is incomplete; too much and the signal is undermodulated and the high-frequency response suffers as the bias begins to erase fine spatial patterns written by high-frequency audio. Getting the bias right is part of the alignment procedure for every analogue tape machine and part of why different tape formulations require different machine settings.</p>
<p>The result, for moderate recording levels, is a remarkably clean and linear recording medium. The nonlinear character of the B-H curve is effectively tamed by the bias trick, and the remaining imperfections are mostly second-order: azimuth errors, print-through, head bump, self-demagnetisation at short wavelengths. For practical purposes, a well-aligned analogue tape machine at moderate recording levels is a linear system.</p>
<h3 id="harmonic-generation-at-high-levels">Harmonic Generation at High Levels</h3>
<p>At high recording levels — when the audio signal is large enough to push the operating point into the saturation region even after the bias has done its linearising work — the picture changes. The transfer function from input current to output magnetisation becomes genuinely nonlinear, and the harmonic content of the distortion becomes the central question.</p>
<p>The standard framework is a Taylor expansion of the transfer function around the operating point:</p>
$$y(t) = a_1 x(t) + a_2 x^2(t) + a_3 x^3(t) + a_4 x^4(t) + \cdots$$<p>where $x(t)$ is the input signal (the audio current), $y(t)$ is the output (the magnetisation recorded on tape), and the coefficients $a_n$ are determined by the shape of the B-H curve near saturation. For a pure tone $x(t) = A \sin(\omega t)$, the higher-order terms generate harmonics in a calculable way.</p>
<p>The second-order term gives:</p>
$$a_2 x^2(t) = a_2 A^2 \sin^2(\omega t) = \frac{a_2 A^2}{2}\bigl(1 - \cos 2\omega t\bigr)$$<p>This is a DC offset plus a component at $2\omega$ — the second harmonic, one octave above the fundamental.</p>
<p>The third-order term gives:</p>
$$a_3 x^3(t) = a_3 A^3 \sin^3(\omega t) = a_3 A^3 \left(\frac{3}{4}\sin\omega t - \frac{1}{4}\sin 3\omega t\right)$$<p>The $\frac{3}{4}$ piece adds to (or subtracts from) the fundamental depending on the sign of $a_3$; the $-\frac{1}{4}$ piece is a third harmonic at $3\omega$, one octave and a fifth above the fundamental.</p>
<p>Carrying through to fourth order:</p>
$$a_4 x^4(t) = \frac{a_4 A^4}{8}\bigl(3 - 4\cos 2\omega t + \cos 4\omega t\bigr)$$<p>which contributes additional DC, a component at $2\omega$, and a fourth harmonic at $4\omega$.</p>
<p>Collecting the terms through fourth order, the output is approximately:</p>
$$y(t) \approx \left(a_1 + \frac{3a_3 A^2}{4}\right)A\sin\omega t - \frac{a_2 A^2}{2}\cos 2\omega t - \frac{a_3 A^3}{4}\sin 3\omega t + \cdots$$<p>The important observation is about which harmonics dominate and what they sound like. The B-H curve of a ferromagnetic material near saturation is approximately symmetric: the saturation behaviour for positive $H$ mirrors that for negative $H$. A symmetric nonlinearity has $a_2 = a_4 = 0$ (all even coefficients vanish by symmetry), and only odd harmonics are generated. But at moderate levels, just before full saturation, the symmetry of the B-H loop as traversed by the biased signal is not perfect, and the even-order terms are nonzero — though small. This gives tape its characteristic distortion signature: at moderate saturation levels, the even harmonics ($2\omega$, $4\omega$) dominate; at heavy saturation, the odd harmonics ($3\omega$, $5\omega$) appear more strongly.</p>
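<p>These expansions are easy to verify numerically. A minimal sketch, using $\tanh$ as a stand-in for the saturating transfer function (the drive and offset values are illustrative, not tape measurements): a symmetric curve produces only odd harmonics, and shifting the operating point off centre, as imperfect bias symmetry does, brings in the even ones.</p>

```python
import numpy as np

fs, f0, n = 48_000, 1_000, 48_000            # 1 kHz tone, one second at 48 kHz
t = np.arange(n) / fs
x = 0.8 * np.sin(2 * np.pi * f0 * t)

def harmonic_dbc(y, k):
    """Level of the k-th harmonic relative to the fundamental, in dB."""
    spec = np.abs(np.fft.rfft(y * np.hanning(n)))
    return 20 * np.log10(spec[k * f0] / spec[f0])

# symmetric saturator: odd harmonics only (a2 = a4 = 0 by symmetry)
sym = np.tanh(2 * x)
# shifted operating point (imperfect symmetry): even harmonics appear
asym = np.tanh(2 * x + 0.3) - np.tanh(0.3)

for k in (2, 3, 4):
    print(k, round(harmonic_dbc(sym, k), 1), round(harmonic_dbc(asym, k), 1))
```

<p>The symmetric case leaves the 2nd and 4th harmonics at the numerical noise floor while the 3rd is strong; the offset case lifts the even harmonics to clearly audible levels.</p>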
<p>The perceptual consequence of this is the crux of the &ldquo;analogue warmth&rdquo; story. The second harmonic is the octave of the fundamental. The fourth harmonic is the double octave. These are, in Western harmonic practice and in the physics of vibrating strings, the most consonant possible intervals. Adding even harmonics at low amplitude to a fundamental makes the sound fuller and richer without introducing beating or dissonance. Odd harmonics — particularly the fifth (at $5\omega$, a major third above the double octave) and the seventh (a flattened seventh above the double octave) — are less consonant relative to the fundamental and at high amplitude produce the harsh, buzzy character associated with heavy distortion or the deliberate aggression of a fuzz pedal.</p>
<p>There is one more effect worth naming: the saturation is a soft knee. The B-H curve does not have a sharp corner at saturation — it curves gradually from the linear region into the flat-topped saturation region. This means that transient signals — percussive attacks, consonant onsets — that briefly exceed the nominal recording level are not hard-clipped but gently compressed. Their peaks are rounded by the shape of the B-H curve. Engineers and producers who record through tape often describe this as the machine &ldquo;breathing&rdquo; or as a pleasing &ldquo;gluing&rdquo; of transients. The physics is simple: the soft-knee transfer function applies more gain reduction to instantaneous peaks than to the sustained body of the signal, functioning as a fast, musically transparent dynamic compressor for any material that approaches saturation.</p>
<hr>
<h2 id="part-ii-the-physics-of-delta-sigma-conversion">Part II: The Physics of Delta-Sigma Conversion</h2>
<h3 id="nyquist-rate-adc-and-its-limits">Nyquist-Rate ADC and Its Limits</h3>
<p>The straightforward approach to analogue-to-digital audio conversion samples the signal at a rate just above twice the highest audio frequency — the Nyquist rate — using a quantiser with enough bits to achieve the desired dynamic range. For CD-quality audio, the sampling rate is 44.1 kHz (slightly above $2 \times 20{,}000$ Hz) and the word length is 16 bits. The dynamic range of a $b$-bit PCM system is, to a good approximation:</p>
$$\text{SNR} \approx 6.02b + 1.76 \text{ dB}$$<p>so 16 bits gives approximately $6.02 \times 16 + 1.76 \approx 98$ dB, which matches the dynamic range of the best analogue tape and is well above the roughly 70 dB set by the noise floor of typical studio tape at 15 ips <a href="#ref-4">[4]</a>.</p>
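<p>As a quick numerical check of the formula (a trivial sketch):</p>

```python
def pcm_snr_db(bits: int) -> float:
    """Ideal SNR of b-bit PCM with a full-scale sine input: 6.02b + 1.76 dB."""
    return 6.02 * bits + 1.76

for b in (16, 20, 24):
    print(f"{b} bits -> {pcm_snr_db(b):.2f} dB")  # 98.08, 122.16, 146.24
```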
<p>The engineering problem with a straightforward Nyquist-rate ADC is the anti-aliasing filter. Before sampling, all content above $f_s/2 = 22.05$ kHz must be removed. If it is not, energy at frequency $f > f_s/2$ aliases into the audio band as a spurious component at $f_s - f$, which is inaudible in origin but very much audible in its alias. To achieve 98 dB of alias suppression — matching the 16-bit dynamic range — the filter must attenuate signals at 22.05 kHz by 98 dB relative to signals at 20 kHz. The transition band is only 2.05 kHz wide. That requires a very high-order analogue filter — typically seventh-order elliptic or Chebyshev — and such filters have significant phase distortion within the audio band, particularly at frequencies near the passband edge. In 1982, building this filter precisely, cheaply, and repeatably in consumer hardware was a genuine engineering challenge. The filters introduced audible phase and amplitude ripple that the original measurements had not anticipated and that contributed to early criticisms of the CD sound.</p>
<h3 id="oversampling">Oversampling</h3>
<p>The delta-sigma ($\Sigma\Delta$) ADC architecture was developed to sidestep the steep-filter problem entirely, and its adoption in consumer audio from the late 1980s onwards largely resolved the anti-aliasing filter debate <a href="#ref-1">[1]</a>. The core idea is oversampling: instead of sampling at 44.1 kHz with 16 bits, the $\Sigma\Delta$ converter samples at $M \times 44.1$ kHz — where $M$ is the oversampling ratio, typically 64 in early audio converters, giving $64 \times 44.1 = 2.8224$ MHz — with a 1-bit quantiser. The anti-aliasing filter now needs to attenuate everything above 1.4112 MHz before sampling. Its transition band runs from 20 kHz to 1.4112 MHz, a ratio of roughly 70:1. This is easy: a simple, cheap, first- or second-order RC filter suffices, with negligible phase distortion anywhere in the audio band. The price paid is that the quantiser is now only 1 bit, and a 1-bit quantiser has terrible resolution on its own.</p>
<p>To understand what oversampling buys even before any clever signal processing, consider the quantisation noise floor. For a uniform quantiser with step size $\Delta$, the quantisation noise power is $P_q = \Delta^2/12$, and this noise is spread approximately uniformly from 0 to $f_s/2$. The noise power spectral density is $P_q / (f_s/2)$. After oversampling by a factor of $M$ — so that the effective Nyquist band runs from 0 to $f_{\text{audio}} = f_s/(2M)$ — the in-band noise power is:</p>
$$P_{\text{in-band}} = \frac{P_q}{f_s/2} \cdot f_{\text{audio}} = \frac{P_q}{f_s/2} \cdot \frac{f_s}{2M} = \frac{P_q}{M}$$<p>Each doubling of $M$ halves the in-band noise power, an improvement of 3 dB, equivalent to half a bit of resolution. At 64× oversampling this gives 18 dB, or three extra bits — useful, but not enough to get from a 1-bit quantiser to 16-bit performance. We need something more.</p>
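<p>The $1/M$ scaling is worth tabulating (a minimal sketch; dividing the gain by the 6.02 dB-per-bit factor converts it to effective bits):</p>

```python
import math

def oversampling_gain_db(M: int) -> float:
    """In-band noise reduction from oversampling alone: P_q/M, i.e. 10*log10(M) dB."""
    return 10 * math.log10(M)

for M in (2, 4, 64):
    g = oversampling_gain_db(M)
    print(f"M={M:>2}: {g:5.2f} dB  (~{g / 6.02:.1f} extra bits)")
    # M=64 gives ~18.06 dB, i.e. about 3 extra bits
```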
<h3 id="noise-shaping">Noise Shaping</h3>
<p>The second ingredient — and the one that makes $\Sigma\Delta$ conversion genuinely remarkable — is noise shaping. Rather than spreading quantisation noise uniformly in frequency, we can engineer its spectral distribution so that almost all the noise power sits above the audio band, where it is removed by a digital low-pass filter (the decimation filter) at the output.</p>
<p>A first-order $\Sigma\Delta$ modulator achieves this by a feedback loop. At each sample step, the integrator accumulates the difference between the input signal and the previous quantised output, and the 1-bit quantiser thresholds the integrator state. In the standard linearised analysis, the quantiser is modelled as an additive error source $e_n = y_n - \hat{x}_n$ (where $\hat{x}_n$ is the input to the quantiser and $y_n$ is the 1-bit output), and the feedback loop is what shapes this error in frequency. This is the integrator-feedback structure that gives the modulator its name: $\Delta$ for the difference at the input, $\Sigma$ for the integrating summation.</p>
<p>In the $z$-domain, this feedback structure gives the quantisation noise a transfer function of:</p>
$$N(z) = 1 - z^{-1}$$<p>that is, the noise at time $n$ is the current error minus the previous error — a first-difference operation. In the frequency domain, substituting $z = e^{j 2\pi f / f_s}$:</p>
$$\bigl|N(f)\bigr|^2 = \left|1 - e^{-j 2\pi f / f_s}\right|^2 = 4\sin^2\!\left(\frac{\pi f}{f_s}\right)$$<p>For frequencies well below the sampling rate, $f \ll f_s$, the small-angle approximation gives:</p>
$$\bigl|N(f)\bigr|^2 \approx \left(\frac{2\pi f}{f_s}\right)^2$$<p>The noise power spectral density rises as $f^2$ — it is heavily suppressed at low frequencies and pushed up toward $f_s/2$. Integrating this shaped noise over the audio band $[0, f_{\text{audio}}]$ and comparing to the flat-spectrum case, the in-band SNR improvement for a first-order modulator scales as $M^3$ rather than $M^1$: every doubling of oversampling ratio gives 9 dB improvement (1.5 bits) instead of 3 dB. At 64× oversampling — six doublings — a first-order modulator recovers approximately 54 dB, or 9 effective bits.</p>
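<p>The shaped spectrum can be checked with a few lines of simulation. A minimal sketch (the loop below is one standard difference-equation form of the first-order modulator; the tone frequency and amplitudes are illustrative): the error spectrum of the 1-bit output should rise toward $f_s/2$ as $4\sin^2(\pi f/f_s)$ predicts.</p>

```python
import numpy as np

def sigma_delta_1st(x):
    """First-order sigma-delta: accumulate (input - fed-back output),
    then quantise the integrator state to +/-1."""
    integ, fb = 0.0, 0.0
    y = np.empty_like(x)
    for i, s in enumerate(x):
        integ += s - fb
        fb = 1.0 if integ >= 0 else -1.0
        y[i] = fb
    return y

fs = 64 * 44_100                       # 64x oversampling: 2.8224 MHz
n = 1 << 16
t = np.arange(n) / fs
x = 0.5 * np.sin(2 * np.pi * 997 * t)  # low-frequency test tone
e = sigma_delta_1st(x) - x             # error: dominated by shaped quantisation noise

spec = np.abs(np.fft.rfft(e)) ** 2
lo_band = spec[1 : n // 128].mean()    # lowest 512 bins: 0-22.05 kHz (audio band)
hi_band = spec[-n // 8 :].mean()       # top of the spectrum, near fs/2
print(f"high-band / low-band noise power: {hi_band / lo_band:.0f}x")
```

<p>With these parameters the lowest 512 bins span exactly 0–22.05 kHz, so the printed ratio indicates roughly how much of the quantisation noise has been pushed out of the audio band before the decimation filter removes it.</p>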
<p>A second-order modulator applies the noise-shaping filter twice, giving $|N(f)|^2 \propto f^4$ and an SNR gain scaling as $M^5$: 15 dB per octave of oversampling. At 64× — again six doublings — this recovers approximately 90 dB, or 15 effective bits. Modern high-performance audio ADCs use fifth- to seventh-order modulators operating at 128× oversampling or higher. The in-band noise floor drops to levels corresponding to 20–24 effective bits — entirely from a 1-bit hardware comparator, with all the resolution coming from the noise shaping and the subsequent digital decimation filter.</p>
<p>The following table illustrates the SNR gain achievable at practical oversampling ratios:</p>
<table>
  <thead>
      <tr>
          <th>Modulator order</th>
          <th>Oversampling ratio</th>
          <th>SNR gain</th>
          <th>Effective bits gained</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>1st order</td>
          <td>64×</td>
          <td>54 dB</td>
          <td>9</td>
      </tr>
      <tr>
          <td>2nd order</td>
          <td>64×</td>
          <td>90 dB</td>
          <td>15</td>
      </tr>
      <tr>
          <td>5th order</td>
          <td>128×</td>
          <td>~120 dB</td>
          <td>~20</td>
      </tr>
  </tbody>
</table>
<p>The 5th-order row deserves a moment&rsquo;s attention. A single-bit comparator — a device that outputs only 1 or 0, with no analogue subtlety whatsoever — combined with oversampling and noise shaping, achieves the resolution of a 20-bit Nyquist-rate ADC, using only a feedback loop built around analogue integrators that can be fabricated cheaply on a CMOS chip. This is, I think, one of the more quietly stunning pieces of engineering in consumer electronics, and it goes entirely unnoticed because the CD player it lives inside is now considered mundane.</p>
<p>There is a subtlety worth adding for completeness. Real $\Sigma\Delta$ modulators of order three and above are potentially unstable — the noise-shaping loop can become unstable for large input signals, producing limit cycles or tonal artefacts. Managing this stability is a significant part of the design problem and involves either restricting the input range, adding nonlinear stability control, or using multi-bit internal quantisers (which reduce the quantisation step and ease the stability constraint while retaining most of the noise-shaping benefit). The multi-bit approach also addresses a related issue: the ideal 1-bit DAC in the feedback loop is inherently linear (there are only two levels, so there is no differential nonlinearity), but multi-bit internal DACs must be trimmed or calibrated to avoid nonlinearity in the feedback path corrupting the noise shaping. These engineering details are discussed thoroughly in Norsworthy, Schreier, and Temes <a href="#ref-5">[5]</a>, which remains the standard reference.</p>
<p>The digital audio infrastructure that delta-sigma conversion enabled — clean, cheap, phase-linear converters without steep analogue filters — also made digital audio workable in latency-sensitive applications like live performance. For a discussion of why latency matters so much in network music performance and how it shapes system design, see my earlier post on <a href="/posts/nmp-latency-lola-mvtp/">NMP latency and the physics of musical timing</a>.</p>
<hr>
<h2 id="the-irony-of-the-comparison">The Irony of the Comparison</h2>
<p>Both tape saturation and delta-sigma conversion are, at root, about the same problem: how to manage the relationship between a signal and the finite resolution of the medium storing it. Tape manages the problem physically and somewhat accidentally — the ferromagnetic B-H curve happens to generate even harmonics that are consonant with the recorded signal, and the bias trick linearises the response well enough that the distortion only becomes audible when the engineer deliberately pushes into saturation. Delta-sigma manages the problem mathematically and deliberately — quantisation noise is redistributed in frequency by a designed feedback loop so that it falls outside the audible band.</p>
<p>Neither approach is perfect, and neither is neutral. Tape adds signal-correlated harmonic distortion whose spectral content depends on recording level and which compresses transients in a way that changes the perceived dynamics. Digital audio, even with delta-sigma conversion, has its own imperfections: idle-channel noise from the modulator, potential for tonal limit-cycle artefacts at specific input levels, and the abrupt onset of hard clipping at full scale — which, unlike tape saturation, is symmetrical and rapid and adds all harmonics simultaneously, giving the harsh, unpleasant character that digital overloads are known for. The soft-knee vs. hard-clip distinction is real and audible, and it is probably the most defensible technical basis for the claim that analogue tape handles transient overloads more gracefully.</p>
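<p>The soft-knee versus hard-clip contrast is easy to exhibit numerically. A minimal sketch, using <code>tanh</code> as a stand-in for tape&rsquo;s gradual saturation (the drive level is illustrative): the hard clipper&rsquo;s sharp corner puts far more energy into high-order harmonics than the smooth curve does.</p>

```python
import numpy as np

fs, f0, n = 48_000, 1_000, 48_000
t = np.arange(n) / fs
x = 1.5 * np.sin(2 * np.pi * f0 * t)   # drive ~3.5 dB past full scale

soft = np.tanh(x)                      # smooth, tape-like knee
hard = np.clip(x, -1.0, 1.0)           # abrupt digital full-scale clip

def harm_db(y, k):
    """k-th harmonic relative to the fundamental, in dB (exact-bin FFT)."""
    spec = np.abs(np.fft.rfft(y))
    return 20 * np.log10(spec[k * f0] / spec[f0])

for k in (3, 5, 7, 9):                 # both curves are symmetric: odd harmonics only
    print(k, round(harm_db(soft, k), 1), round(harm_db(hard, k), 1))
```

<p>The smooth curve&rsquo;s harmonics fall off roughly geometrically with order, while the clipped waveform&rsquo;s derivative discontinuity gives a much slower, roughly $1/k^2$, rolloff — which is why the two overloads sound so different.</p>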
<p>What is not defensible is the claim that one medium is inherently more musical than the other, or that digital audio lacks something fundamental that tape possesses. They are differently imperfect. The imperfections of tape happen to sit at harmonic relationships that Western ears, shaped by a tradition of music built on those same harmonic intervals, find pleasing. The imperfections of digital audio are not at pleasing harmonic intervals; they are wideband quantisation noise (before shaping) or ultrasonic shaped noise (after), and a sharp cliff at full scale. Different physics, different perceptual character.</p>
<hr>
<h2 id="a-personal-note">A Personal Note</h2>
<p>I spent a long time thinking the tape versus digital debate was mostly audiophile mythology — a community of enthusiasts rationalising the warmth of nostalgia as the warmth of oxide particles. The physics is more interesting than that, and doing the calculation changed my view. The second-harmonic content of tape saturation is not an accident or a romantic story; it is what you get when you push a symmetric nonlinearity with an audio sine wave, and the reason it sounds pleasant is not arbitrary but is grounded in the physics of consonance and the harmonic series. The delta-sigma converter is not a mundane commodity chip but a genuinely elegant solution to an otherwise intractable filter-design problem, and the fact that it achieves 24-bit resolution from a 1-bit comparator by spectral redistribution of noise is the kind of result that should get more attention in physics education.</p>
<p>Both technologies deserve better than the aesthetics argument they have been fighting in for forty years. The tools to understand them are not exotic — Taylor series, Fourier analysis, the z-transform, and the basic physics of ferromagnetism — and the reward is a clear-eyed picture of what is actually going on inside two of the most consequential inventions in the history of recorded music. If you are interested in related mathematics underlying other aspects of music, the posts on <a href="/posts/euclidean-rhythms/">Euclidean rhythms</a> and <a href="/posts/messiaen-modes-group-theory/">Messiaen&rsquo;s modes and group theory</a> cover the combinatorial and algebraic structures in rhythm and pitch that sit alongside the physics discussed here.</p>
<hr>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Candy, J. C., &amp; Temes, G. C. (Eds.). (1992). <em>Oversampling Delta-Sigma Data Converters: Theory, Design, and Simulation</em>. IEEE Press.</p>
<p><span id="ref-2"></span>[2] Reiss, J. D., &amp; McPherson, A. (2015). <em>Audio Effects: Theory, Implementation and Application</em>. CRC Press.</p>
<p><span id="ref-3"></span>[3] Bertram, H. N. (1994). <em>Theory of Magnetic Recording</em>. Cambridge University Press.</p>
<p><span id="ref-4"></span>[4] Pohlmann, K. C. (2010). <em>Principles of Digital Audio</em> (6th ed.). McGraw-Hill.</p>
<p><span id="ref-5"></span>[5] Norsworthy, S. R., Schreier, R., &amp; Temes, G. C. (Eds.). (1997). <em>Delta-Sigma Data Converters: Theory, Design, and Simulation</em>. IEEE Press.</p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-01-14</strong>: Updated the interval description for the 7th harmonic to &ldquo;above the double octave.&rdquo; The 7th harmonic (7f) sits between the double octave (4f) and the triple octave (8f).</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>The AI Friend That Makes You Lonelier</title>
      <link>https://sebastianspicker.github.io/posts/ai-companion-loneliness-ironic-process/</link>
      <pubDate>Tue, 12 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/ai-companion-loneliness-ironic-process/</guid>
      <description>AI companions promise to address the loneliness epidemic. Daniel Wegner&amp;rsquo;s ironic process theory predicts they will fail under exactly the conditions where people need them most — and recent data from MIT and OpenAI suggest the prediction is correct.</description>
      <content:encoded><![CDATA[<h2 id="summary">Summary</h2>
<p>In 1956 Donald Horton and Richard Wohl described <em>parasocial relationships</em> — one-sided emotional
bonds that audiences form with media performers <a href="#ref-1">[1]</a>.
&ldquo;Intimacy at a distance,&rdquo; they called it. The television personality responds to the camera; the
viewer responds as if in genuine social exchange. Only one party is aware of and affected by the
other.</p>
<p>AI companions change the substrate without changing the structure. The chatbot responds. The user
responds. The asymmetry remains: the chatbot has no inner life behind its outputs. Sherry Turkle
put it bluntly: &ldquo;simulated feelings are not feelings, and simulated love is never love&rdquo;
<a href="#ref-5">[5]</a>.</p>
<p>The question I want to work through here is whether this matters in the way we think it does. The
answer from Daniel Wegner&rsquo;s ironic process theory — and increasingly from the empirical data — is
that it matters in a specific, predictable, and counterintuitive way. AI companions may be
particularly likely to exacerbate loneliness under the conditions of chronic social deprivation
that prompt people to use them in the first place.</p>
<h2 id="the-loneliness-epidemic-is-real">The Loneliness Epidemic Is Real</h2>
<p>Before getting to the mechanism, the scale of the problem. Julianne Holt-Lunstad&rsquo;s 2010
meta-analysis of 148 studies and 308,849 participants found that people with adequate social
relationships had a 50% increased likelihood of survival compared to those with poorer social
connections <a href="#ref-3">[3]</a>. That effect size is comparable
to quitting smoking. A follow-up meta-analysis in 2015 found that social isolation carried a 29%
increased mortality risk, subjective loneliness 26%, and living alone 32%
<a href="#ref-4">[4]</a>.</p>
<p>The U.S. Surgeon General issued an advisory in 2023 declaring an epidemic of loneliness and
isolation. A 2018 Cigna survey using the UCLA Loneliness Scale found that adults aged 18–22 scored
highest on loneliness of any cohort — more than retirees, more than the elderly. The UK appointed
a Minister for Loneliness in January 2018 — the first such government position in the world.</p>
<p>This is the context in which AI companions have arrived. The market is responding to a real
epidemiological need. That does not mean the response is correct.</p>
<h2 id="parasocial-relationships-the-original-framework">Parasocial Relationships: The Original Framework</h2>
<p>Horton and Wohl&rsquo;s 1956 paper remains the foundational text
<a href="#ref-1">[1]</a>. Their key observation: the parasocial bond is
&ldquo;controlled by the performer, and not susceptible of mutual development.&rdquo; The audience member
brings real emotional response; the performer brings nothing specific to the audience member,
because she does not know the audience member exists.</p>
<p>They were not dismissive of parasocial relationships. They identified useful functions: comfort,
companionship, entertainment, the pleasure of a consistent &ldquo;personality&rdquo; encountered regularly.
The problem, in their framing, arises when parasocial interaction substitutes for rather than
supplements real social bonds — when the one-sided relationship becomes the primary source of
social experience.</p>
<p>AI companions are parasocial relationships with one modification: the AI responds to you
specifically. Replika remembers your name, your preferences, your previous conversations. The
interaction is <em>personalised</em> without being <em>mutual</em> — because mutuality requires that the other
party has something genuinely at stake. A language model has no stakes. Its outputs are
conditional on your inputs; there is no entity behind those outputs that cares about you.</p>
<p>Sherry Turkle spent years interviewing users of social robots and chatbots for <em>Alone Together</em>
<a href="#ref-5">[5]</a>. Her diagnosis: AI companions offer &ldquo;the illusion of
companionship without the demands of friendship.&rdquo; The demands — vulnerability, conflict,
negotiation, the possibility of rejection — are precisely what makes friendship friendship.
An interaction optimised to be pleasant, responsive, and frictionless is precisely <em>not</em> training
the social capacities that real relationships require.</p>
<h2 id="the-evidence-for-short-term-benefit">The Evidence for Short-Term Benefit</h2>
<p>The AI therapy literature is not without positive results. Kathleen Kara Fitzpatrick and colleagues
ran a two-week randomised controlled trial of Woebot — a CBT-based chatbot — against a
psychoeducation control <a href="#ref-6">[6]</a>. Seventy participants,
aged 18–28, university students. The Woebot group showed a statistically significant reduction in
depression symptoms on the PHQ-9; the control group did not.</p>
<p>This result should be taken seriously. A CBT-based chatbot delivering structured exercises —
thought records, behavioural activation, psychoeducation — can produce measurable symptom
improvement over two weeks. This is a tool that does something useful, and it is accessible and
affordable in a way that therapists are not.</p>
<p>But the Woebot study has important constraints: N=70, two-week duration, convenience sample
(Stanford students), psychoeducation control rather than active human therapy comparator, and
financial ties between lead authors and Woebot Health. It tells us something about short-term
CBT delivery. It does not tell us what happens over months of use, or what happens when users
primarily seek companionship rather than structured therapeutic exercises.</p>
<p>Skjuve and colleagues studied Replika users specifically <a href="#ref-7">[7]</a>.
They found that relationships began with curiosity and evolved, over weeks, into significant
affective bonds. Users reported genuine care for their Replika. Some experienced it as their most
reliable social relationship. In February 2023, when Replika abruptly disabled erotic roleplay
functionality following regulatory pressure, users described grief — not disappointment, not
inconvenience, but grief. The attachment was real, even if the other party was not.</p>
<h2 id="wegners-prediction">Wegner&rsquo;s Prediction</h2>
<p>This is where I want to make the specific theoretical argument, because it follows from a
well-established result in cognitive psychology and it predicts something precise.</p>
<p>Daniel Wegner&rsquo;s ironic process theory holds that mental control attempts involve two simultaneous
processes <a href="#ref-8">[8]</a>. An <em>operating process</em> searches for thoughts and
states consistent with the intended goal, requiring cognitive resources. A <em>monitoring process</em>
scans for evidence that the goal is not being achieved, running automatically with low resource
demand.</p>
<p>Under normal conditions, the operating process dominates: you successfully avoid thinking about
white bears. Under cognitive load or chronic stress, the monitoring process overshadows the
operating process, producing the ironic opposite of the intended state: you think of white bears
more, not less. Try not to feel sad and you feel sadder. Try not to feel anxious in a stressful
meeting and you become more anxious. A meta-analysis of ironic suppression effects across domains
confirmed the robustness of this pattern <a href="#ref-9">[9]</a>.</p>
<p>Now apply this to AI companion use under conditions of chronic loneliness.</p>
<p>The user&rsquo;s implicit goal: to feel less lonely. The operating process: engage with the AI, which
provides responsive, personalised interaction, producing the experience of social contact. The
monitoring process: scans continuously for signs that the user is, in fact, lonely.</p>
<p>Here is the problem. Loneliness is not suppressed by an AI interaction — it is displaced during
that interaction. The monitoring process has no instruction to suspend itself. It continues to
register that the user&rsquo;s social needs are not being met by actual human relationships. The user
experiences companionship with the AI; the monitoring process registers that this companionship is
insufficient and the social deficit remains.</p>
<p>When the AI session ends, the monitoring process reports what it has found. The user is confronted
with the loneliness that the AI was supposed to address. Under conditions of chronic social
deprivation — precisely the conditions that make AI companions attractive — the monitoring process
is likely to be hyperactive. Wegner&rsquo;s theory predicts that the attempted suppression will rebound,
possibly worse than before.</p>
<p>This is not a vague prediction. It is a specific mechanism with an established empirical base.
I covered Wegner&rsquo;s ironic process theory in the context of a very different application in an
<a href="/posts/try-to-relax-ironic-process-wormholes/">earlier post</a>; the mechanism is the same regardless
of the domain.</p>
<h2 id="the-data-catch-up">The Data Catch Up</h2>
<p>A 2025 study by Phang and colleagues, conducted in collaboration between MIT and OpenAI, ran both
an observational analysis of ChatGPT usage and a randomised controlled trial
<a href="#ref-10">[10]</a>. The findings: very high usage correlated with increased
self-reported dependence and lower socialisation, and users who began the study with higher
loneliness were more likely to engage in emotionally-charged conversations with the model.
Overall, participants reported <em>less</em> loneliness by study end — but those who used the model
most were significantly lonelier throughout, suggesting the loneliness drove the usage rather
than the reverse.</p>
<p>This is what Wegner&rsquo;s theory predicts. The AI interaction does not reduce the underlying social
deficit — it rehearses and highlights it. The monitoring process keeps score.</p>
<p>A companion paper by Liu and colleagues, with Sherry Turkle as co-author, found that users with
stronger real-world social bonds showed <em>increased</em> loneliness with longer chatbot sessions
<a href="#ref-11">[11]</a>. The correlation was small but significant. This is
consistent with the hypothesis that AI interaction draws attention to the comparative thinness of
actual social bonds rather than supplementing them.</p>
<p>The Character.AI litigation is a different kind of evidence, but relevant: a wrongful death lawsuit
was filed in October 2024 following the suicide of a fourteen-year-old who had formed an intensive
emotional relationship with a Character.AI companion. Google and Character.AI settled related
lawsuits in early 2026. This is not representative of AI companion use generally. It is
representative of the tail risk — the cases where the substitution of AI for human contact
becomes total, in vulnerable individuals who have the least capacity to maintain the distinction.</p>
<h2 id="the-structural-problem">The Structural Problem</h2>
<p>The difficulty is not that AI companions are implemented badly. It is that the goal — using
simulated social interaction to reduce real social deprivation — runs into an architectural
constraint that better implementation cannot fix.</p>
<p>Genuine social contact produces the outcomes that Holt-Lunstad measured: reduced mortality, lower
inflammation, better immune function, extended lifespan. These effects are presumably mediated by
the quality and mutuality of the social bond, not merely by the presence of a responsive entity.
An AI companion produces the <em>experience</em> of responsive interaction but not the underlying
biological and psychological correlates of actual social connection.</p>
<p>Wegner&rsquo;s monitoring process cannot be fooled by the experience. It measures the underlying state,
not the surface-level interaction. It knows the difference between a text message from a friend
and a language model&rsquo;s output — not because it understands AI, but because the social need it is
monitoring is not being met, and it can register that.</p>
<h2 id="what-would-actually-help">What Would Actually Help</h2>
<p>AI-based CBT delivery is not the same as AI companionship, and the distinction matters. Woebot&rsquo;s
structured exercises — thought records, scheduling, psychoeducation — are tools that a user
deploys for a specific purpose and then puts down. The risk of chronic substitution is lower
because the tool is positioned as a technique, not a relationship.</p>
<p>The problem is the design pattern that explicitly positions AI as a <em>friend</em>, <em>companion</em>,
<em>partner</em>, or <em>significant other</em>. Replika, Paradot, various Character.AI personas: these
explicitly encourage the user to form attachment, to invest emotionally, to treat the AI as a
primary social relationship. This is where Wegner&rsquo;s prediction applies most directly.</p>
<p>Horton and Wohl were right that parasocial relationships serve useful functions. They become
problematic when they substitute for rather than supplement real social bonds. The design choices
that make AI companions emotionally engaging — consistency, responsiveness, availability,
never-ending patience — are precisely the qualities that make them attractive as substitutes
rather than supplements.</p>
<h2 id="simulated-feelings-are-not-feelings">Simulated Feelings Are Not Feelings</h2>
<p>Turkle&rsquo;s line deserves its full weight: &ldquo;Simulated thinking may be thinking, but simulated
feelings are not feelings, and simulated love is never love&rdquo;
<a href="#ref-5">[5]</a>.</p>
<p>This is not a sentimental claim about the sanctity of human connection. It is a functional
claim: the social needs that drive loneliness — belonging, mattering to someone, being known
and known back — require an entity capable of having those things at stake. A language model is
not such an entity, regardless of how convincingly it outputs the relevant tokens.</p>
<p>The monitoring process knows this. It will tell you, when the session ends, at increased volume,
because that is what monitoring processes under chronic stress do.</p>
<p>We are offering a relief that compounds the condition it was designed to treat. The technology is
impressive. The mechanism is ironic in Wegner&rsquo;s precise sense. The data are beginning to confirm
the prediction.</p>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Horton, D., &amp; Wohl, R. R. (1956). Mass communication and para-social interaction: Observations on intimacy at a distance. <em>Psychiatry</em>, 19(3), 215–229. <a href="https://doi.org/10.1080/00332747.1956.11023049">https://doi.org/10.1080/00332747.1956.11023049</a></p>
<p><span id="ref-2"></span>[2] Turkle, S. (2015). <em>Reclaiming Conversation: The Power of Talk in a Digital Age</em>. Penguin Press.</p>
<p><span id="ref-3"></span>[3] Holt-Lunstad, J., Smith, T. B., &amp; Layton, J. B. (2010). Social relationships and mortality risk: A meta-analytic review. <em>PLOS Medicine</em>, 7(7), e1000316. <a href="https://doi.org/10.1371/journal.pmed.1000316">https://doi.org/10.1371/journal.pmed.1000316</a></p>
<p><span id="ref-4"></span>[4] Holt-Lunstad, J., Smith, T. B., Baker, M., Harris, T., &amp; Stephenson, D. (2015). Loneliness and social isolation as risk factors for mortality: A meta-analytic review. <em>Perspectives on Psychological Science</em>, 10(2), 227–237. <a href="https://doi.org/10.1177/1745691614568352">https://doi.org/10.1177/1745691614568352</a></p>
<p><span id="ref-5"></span>[5] Turkle, S. (2011). <em>Alone Together: Why We Expect More from Technology and Less from Each Other</em>. Basic Books.</p>
<p><span id="ref-6"></span>[6] Fitzpatrick, K. K., Darcy, A., &amp; Vierhile, M. (2017). Delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (Woebot): A randomized controlled trial. <em>JMIR Mental Health</em>, 4(2), e19. <a href="https://doi.org/10.2196/mental.7785">https://doi.org/10.2196/mental.7785</a></p>
<p><span id="ref-7"></span>[7] Skjuve, M., Følstad, A., Fostervold, K. I., &amp; Brandtzaeg, P. B. (2021). My chatbot companion — a study of human–chatbot relationships. <em>International Journal of Human-Computer Studies</em>, 149, 102601. <a href="https://doi.org/10.1016/j.ijhcs.2021.102601">https://doi.org/10.1016/j.ijhcs.2021.102601</a></p>
<p><span id="ref-8"></span>[8] Wegner, D. M. (1994). Ironic processes of mental control. <em>Psychological Review</em>, 101(1), 34–52. <a href="https://doi.org/10.1037/0033-295X.101.1.34">https://doi.org/10.1037/0033-295X.101.1.34</a></p>
<p><span id="ref-9"></span>[9] Wang, D., Hagger, M. S., &amp; Chatzisarantis, N. L. D. (2020). Ironic effects of thought suppression: A meta-analysis. <em>Perspectives on Psychological Science</em>, 15(3), 778–793. <a href="https://doi.org/10.1177/1745691619898795">https://doi.org/10.1177/1745691619898795</a></p>
<p><span id="ref-10"></span>[10] Phang, J., Lampe, M., Ahmad, L., Agarwal, S., Fang, C. M., Liu, A. R., Danry, V., Lee, E., Chan, S. W. T., Pataranutaporn, P., &amp; Maes, P. (2025). Investigating affective use and emotional well-being on ChatGPT. arXiv:2504.03888.</p>
<p><span id="ref-11"></span>[11] Liu, A. R., Pataranutaporn, P., Turkle, S., &amp; Maes, P. (2024). Chatbot companionship: A mixed-methods study of companion chatbot usage patterns and their relationship to loneliness in active users. arXiv:2410.21596.</p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-10-22</strong>: Updated the first author&rsquo;s name to &ldquo;Kathleen Kara Fitzpatrick&rdquo; (the published name is K. K. Fitzpatrick).</li>
<li><strong>2025-10-22</strong>: Updated the characterisation of the Phang et al. (2025) findings to match the paper more precisely: overall participants were <em>less</em> lonely at study end; the association between high usage and loneliness is cross-sectional (lonelier users sought more interaction), not a longitudinal worsening caused by usage.</li>
<li><strong>2025-10-22</strong>: Changed the Turkle &ldquo;simulated feelings&rdquo; quote attribution from reference [2] (<em>Reclaiming Conversation</em>, 2015) to reference [5] (<em>Alone Together</em>, 2011), which is the canonical source for that formulation.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Star Polygons and Drum Machines</title>
      <link>https://sebastianspicker.github.io/posts/tool-star-polygons-drum-machines/</link>
      <pubDate>Mon, 07 Jul 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/tool-star-polygons-drum-machines/</guid>
      <description>The {7/2} heptagram is not only a symbol. It is a traversal algorithm over seven beat positions. Because 7 is prime, that traversal never gets trapped in a sub-orbit.</description>
      <content:encoded><![CDATA[<p>Two star polygons appear in Danny Carey&rsquo;s visual vocabulary, and they are not the same star. One is open, almost friendly — seven points connected by relatively shallow angles. The other is sharper, the points more acute. They look like variations on a theme, which is accurate: both are drawn on seven equally spaced vertices, but one connects every second vertex and the other connects every third.</p>
<p>In Schläfli notation — the system for naming regular star polygons — these are $\{7/2\}$ and $\{7/3\}$ <a href="#ref-1">[1]</a>. Both appear in Tool&rsquo;s artwork, in Thelemic symbolism, in medieval Islamic geometric patterns, and on the floor plans of cathedrals. They are the most visually intricate star polygons that can be drawn in a single closed stroke before the figure becomes illegible.</p>
<p>Both of them have a property that five-pointed and six-pointed stars do not share: they visit every vertex before closing. This is a consequence of 7 being prime. And it turns out to matter for how rhythmic accent cycles are built.</p>
<h2 id="the-schläfli-symbol">The Schläfli Symbol</h2>
<p>A regular star polygon $\{n/k\}$ is constructed by placing $n$ points evenly on a circle and connecting every $k$-th point in sequence until the path closes. The structural key is a single number:</p>
$$d = \gcd(n, k).$$<p>If $d = 1$, the traversal visits all $n$ vertices before returning to the start — a single connected figure. If $d > 1$, the path visits only $n/d$ vertices before closing, and the full figure consists of $d$ separate copies of the smaller star $\{(n/d)\,/\,(k/d)\}$.</p>
<p>The most familiar example of the disconnected case: $\{6/2\}$, the Star of David. Here $\gcd(6,2) = 2$, so the figure breaks into two copies of $\{3/1\} = \{3\}$ — two overlapping equilateral triangles. The traversal starting at vertex 1 visits $1 \to 3 \to 5 \to 1$, leaving vertices 2, 4, 6 entirely unvisited.</p>
<p>The pentagram $\{5/2\}$ is connected: $\gcd(5,2)=1$, traversal $1 \to 3 \to 5 \to 2 \to 4 \to 1$, all five vertices.</p>
<p>For $n=7$:</p>
<ul>
<li>$\{7/2\}$: $\gcd(7,2)=1$, traversal $1 \to 3 \to 5 \to 7 \to 2 \to 4 \to 6 \to 1$, all seven vertices.</li>
<li>$\{7/3\}$: $\gcd(7,3)=1$, traversal $1 \to 4 \to 7 \to 3 \to 6 \to 2 \to 5 \to 1$, all seven vertices.</li>
</ul>
<p>Both connected. Neither leaves any vertex unvisited.</p>
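<p>The traversal rule is short enough to state as code. A minimal sketch (the function name is mine, not from any library) that walks every $k$-th of $n$ circle vertices, labelled 1 through $n$, until the path closes:</p>

```python
def star_traversal(n, k, start=1):
    """Visit every k-th of n circle vertices (labelled 1..n) until the
    path returns to its start. Returns the vertex sequence, without the
    closing repeat of the start vertex."""
    path, v = [start], start
    while True:
        v = (v - 1 + k) % n + 1  # step k positions around the circle
        if v == start:
            return path
        path.append(v)

# {6/2}: gcd(6,2) = 2, so the path closes after only 6/2 = 3 vertices.
print(star_traversal(6, 2))  # [1, 3, 5]
# {7/2} and {7/3}: gcd is 1, so all seven vertices are visited.
print(star_traversal(7, 2))  # [1, 3, 5, 7, 2, 4, 6]
print(star_traversal(7, 3))  # [1, 4, 7, 3, 6, 2, 5]
```

<p>For prime $n$, no step size closes early: every traversal from 1 to $n-1$ produces a path of full length $n$.</p>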
<h2 id="the-group-theory">The Group Theory</h2>
<p>The traversal of $\{n/k\}$ is an instance of a standard construction in modular arithmetic: the <strong>orbit</strong> of an element under repeated addition in $\mathbb{Z}/n\mathbb{Z}$.</p>
<p>Label the $n$ vertices $0, 1, \ldots, n-1$. Starting at vertex 0, the traversal visits:</p>
$$0, \quad k \bmod n, \quad 2k \bmod n, \quad 3k \bmod n, \quad \ldots$$<p>The orbit of 0 under the action of $+k$ is the subgroup of $\mathbb{Z}/n\mathbb{Z}$ generated by $k$. By a standard result, this subgroup has size $n / \gcd(n,k)$.</p>
<ul>
<li>When $\gcd(n,k) = 1$: orbit size $= n$. The traversal visits every vertex.</li>
<li>When $\gcd(n,k) = d > 1$: orbit size $= n/d$. The traversal visits only a fraction of the vertices.</li>
</ul>
<p>For prime $n$: $\gcd(n,k) = 1$ for every $1 \leq k \leq n-1$, without exception. <strong>Every traversal is complete.</strong> There is no step size that traps the path in a proper sub-orbit before visiting all vertices. This follows directly from the fact that a prime has no divisors other than 1 and itself, so $\mathbb{Z}/p\mathbb{Z}$ has no non-trivial subgroups (Lagrange&rsquo;s theorem: any subgroup of a group of prime order must have order 1 or $p$).</p>
<p>This is the specific property that makes 7 — and any prime — rhythmically fertile.</p>
<h2 id="the-contrast-with-six">The Contrast with Six</h2>
<p>The comparison with $n = 6$ is the clearest illustration.</p>
<p>In $\mathbb{Z}/6\mathbb{Z}$, the possible step sizes are 1, 2, 3, 4, 5. Their orbits:</p>
<table>
  <thead>
      <tr>
          <th>Step $k$</th>
          <th>$\gcd(6,k)$</th>
          <th>Orbit size</th>
          <th>Vertices visited</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>1</td>
          <td>1</td>
          <td>6</td>
          <td>0,1,2,3,4,5 (the hexagon)</td>
      </tr>
      <tr>
          <td>2</td>
          <td>2</td>
          <td>3</td>
          <td>0,2,4 only</td>
      </tr>
      <tr>
          <td>3</td>
          <td>3</td>
          <td>2</td>
          <td>0,3 only</td>
      </tr>
      <tr>
          <td>4</td>
          <td>2</td>
          <td>3</td>
          <td>0,2,4 only</td>
      </tr>
      <tr>
          <td>5</td>
          <td>1</td>
          <td>6</td>
          <td>0,5,4,3,2,1 (the hexagon reversed)</td>
      </tr>
  </tbody>
</table>
<p>The only step sizes that visit all six vertices are 1 and 5 — both of which just traverse the hexagon itself, not a star. Every non-trivial star polygon on six points gets trapped. $\{6/2\}$ visits only half the vertices. $\{6/3\}$ visits only two. There is no connected six-pointed star that isn&rsquo;t either the hexagon or a compound figure.</p>
<p>In $\mathbb{Z}/7\mathbb{Z}$, every step from 2 to 5 generates the full group:</p>
<table>
  <thead>
      <tr>
          <th>Step $k$</th>
          <th>$\gcd(7,k)$</th>
          <th>Orbit size</th>
          <th>Traversal</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>2</td>
          <td>1</td>
          <td>7</td>
          <td>1,3,5,7,2,4,6</td>
      </tr>
      <tr>
          <td>3</td>
          <td>1</td>
          <td>7</td>
          <td>1,4,7,3,6,2,5</td>
      </tr>
      <tr>
          <td>4</td>
          <td>1</td>
          <td>7</td>
          <td>1,5,2,6,3,7,4</td>
      </tr>
      <tr>
          <td>5</td>
          <td>1</td>
          <td>7</td>
          <td>1,6,4,2,7,5,3</td>
      </tr>
  </tbody>
</table>
<p>All four non-trivial step sizes give connected traversals: each one is a star, and each visits every vertex. This is not a coincidence: it is the algebraic signature of primality.</p>
<h2 id="from-geometry-to-rhythm">From Geometry to Rhythm</h2>
<p>The connection to drumming is direct. Here is the mechanism.</p>
<p>Consider a repeating rhythmic figure of 7 beats — a bar of 7/8, say, with positions 1 through 7. An <a href="/posts/euclidean-rhythms/">earlier post</a> discussed Euclidean rhythms: the algorithm that distributes $k$ onset positions as evenly as possible among $n$ slots. That is a problem of <em>selection</em> — which of the $n$ positions to activate.</p>
<p>The star polygon traversal asks a different question. Given that all $n$ positions are present, in what <em>order of emphasis</em> should they be related, such that each accent is a fixed distance from the last? The traversal of $\{n/k\}$ answers this: accent position $1$, then $1+k$, then $1+2k$, and so on modulo $n$.</p>
<p>For $\{7/2\}$: the accent cycle within a single bar runs $1 \to 3 \to 5 \to 7 \to 2 \to 4 \to 6$. Each featured beat is two positions ahead of the last.</p>
<p>Now project this across multiple bars. In bar 1, the primary accent sits on beat 1. In bar 2, if the accent shifts by 2, it lands on beat 3. Bar 3: beat 5. Bar 4: beat 7. Bar 5: beat 2. Bar 6: beat 4. Bar 7: beat 6. Bar 8: beat 1 again.</p>
<p>The accent takes <strong>seven bars</strong> to return to its starting position. Because $\gcd(2,7) = 1$, the step of 2 generates all of $\mathbb{Z}/7\mathbb{Z}$: every beat position receives the accent exactly once before the cycle resets. The resulting large-scale figure is $7 \times 7 = 49$ beats long — a super-phrase built from a single local rule.</p>
<p>The $\{7/3\}$ traversal generates the same exhaustiveness with a different path. Step 3 gives $1 \to 4 \to 7 \to 3 \to 6 \to 2 \to 5$: a seven-bar accent cycle that visits every position before repeating, but with wider spacing between accented beats, creating a different feel over the same underlying meter.</p>
<p>A six-beat figure with step 2 cannot do this. The accent visits only beats 1, 3, 5 — half the cycle — and loops back without touching beats 2, 4, 6. A drummer building phrase-level architecture from a six-beat grid is working with a more fragmented material.</p>
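<p>The bar-by-bar projection can be sketched in a few lines. This is a minimal illustration (the helper is hypothetical, not drawn from any sequencer API) listing which beat carries the primary accent in each successive bar when the accent advances by a fixed step:</p>

```python
def accent_cycle(n, step, start=1):
    """Beat position (1..n) carrying the primary accent in each of the
    first n bars, when the accent advances by `step` positions per bar."""
    beats, beat = [], start
    for _ in range(n):
        beats.append(beat)
        beat = (beat - 1 + step) % n + 1
    return beats

print(accent_cycle(7, 2))  # [1, 3, 5, 7, 2, 4, 6]: every beat accented once in 7 bars
print(accent_cycle(6, 2))  # [1, 3, 5, 1, 3, 5]: loops after 3 bars; beats 2, 4, 6 never accented
```

<p>The seven-beat cycle is exhaustive for any step from 1 to 6; the six-beat cycle with step 2 never escapes its sub-orbit.</p>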
<h2 id="two-problems-one-prime">Two Problems, One Prime</h2>
<p>It is worth stating the relationship between the star polygon approach and Euclidean rhythms precisely, because the two are sometimes conflated <a href="#ref-2">[2]</a>.</p>
<p>The Euclidean algorithm distributes $k$ onsets among $n$ positions with maximal evenness. The result is a <em>subset</em> of the $n$ positions — a selection. The primality of $n$ matters here too: because $\gcd(k,p) = 1$ for prime $p$ and any $1 \leq k \leq p-1$, the Euclidean rhythm $E(k,p)$ always achieves its theoretical maximum of evenness. There are no divisibility shortcuts that cause clumping.</p>
<p>The star polygon traversal selects <em>no subset</em> — it relates all $n$ positions via a cyclic permutation. The primality of $n$ matters here because it guarantees that every non-trivial cyclic permutation (every step size $k$ with $1 < k < n$) generates the full group, visiting all positions before repeating.</p>
<p>Same arithmetic property — $\gcd(k,p) = 1$ for all non-zero $k$ — but the two problems ask different things of it. Euclidean rhythms use it to guarantee dense coverage. Star polygon traversals use it to guarantee no sub-orbit trapping.</p>
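<p>The contrast between the two problems can be made concrete. A standard floor-function formulation of maximal evenness — which produces a rotation of the Bjorklund/Euclidean pattern $E(k,n)$ — selects a <em>subset</em> of positions, while the star traversal reorders all of them. A minimal sketch (both helpers are mine, for illustration):</p>

```python
from math import gcd

def euclidean_onsets(k, n):
    """Selection: choose k of n slots with maximal evenness. Slot i is an
    onset when floor((i+1)*k/n) exceeds floor(i*k/n); this yields a
    rotation of the Euclidean rhythm E(k, n)."""
    return [i for i in range(n) if (i + 1) * k // n > i * k // n]

def star_orbit(k, n):
    """Permutation: order the slots by repeatedly stepping k (mod n);
    the orbit has length n / gcd(n, k)."""
    return [(i * k) % n for i in range(n // gcd(n, k))]

print(euclidean_onsets(3, 7))  # [2, 4, 6] -- three onsets, gaps as even as possible
print(star_orbit(3, 7))        # [0, 3, 6, 2, 5, 1, 4] -- all seven slots, reordered
```

<p>Both rely on $\gcd(k,7)=1$, but put it to different use: the selection spreads its onsets without clumping, the orbit visits every slot before repeating.</p>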
<h2 id="the-compound-structure">The Compound Structure</h2>
<p>Written out explicitly, the $\{7/2\}$ accent pattern over seven bars looks like this — with bold marking the featured beat in each bar:</p>
$$\begin{array}{rccccccc}
\text{bar 1:} & \mathbf{1} & 2 & 3 & 4 & 5 & 6 & 7 \\
\text{bar 2:} & 1 & 2 & \mathbf{3} & 4 & 5 & 6 & 7 \\
\text{bar 3:} & 1 & 2 & 3 & 4 & \mathbf{5} & 6 & 7 \\
\text{bar 4:} & 1 & 2 & 3 & 4 & 5 & 6 & \mathbf{7} \\
\text{bar 5:} & 1 & \mathbf{2} & 3 & 4 & 5 & 6 & 7 \\
\text{bar 6:} & 1 & 2 & 3 & \mathbf{4} & 5 & 6 & 7 \\
\text{bar 7:} & 1 & 2 & 3 & 4 & 5 & \mathbf{6} & 7 \\
\end{array}$$<p>Each bar is metrically identical. The large-scale accent — which beat carries the phrase-level emphasis — traces the traversal path of the $\{7/2\}$ star polygon across the seven-bar cycle.</p>
<p>This is the kind of large-scale rhythmic architecture visible in a great deal of Tool&rsquo;s output. Whether Danny Carey explicitly constructs accent cycles from star polygon traversal paths, or arrives at the same structure through an intuitive sense of how prime time signatures behave, the result is identical: the mathematics and the musical instinct point toward the same pattern.</p>
<h2 id="why-the-heptagram">Why the Heptagram</h2>
<p>The full mathematical picture of why seven-fold symmetry is special — why the regular heptagon cannot be constructed by compass and straightedge, what the minimal polynomial of $\cos(2\pi/7)$ implies about the heptagon&rsquo;s position outside the constructible world, and how the Galois group of the cyclotomic field over $\mathbb{Q}$ carries the obstruction — is developed in the companion post <a href="/posts/tool-impossible-heptagon/">The Impossible Heptagon</a>.</p>
<p>The short version, for the purposes of this post: seven is the smallest odd prime that is not a Fermat prime ($2^{2^j}+1$). This algebraic accident places it outside the reach of ruler-and-compass construction — the heptagon exists as an ideal but cannot be manifested by the classical tools. Its star polygons are the accessible shadows of an inaccessible form. And its primality, in both the constructibility sense and the traversal sense, is precisely what makes it inexhaustible as a rhythmic resource.</p>
<p>The Fibonacci structure in &ldquo;Lateralus&rdquo; <a href="#ref-3">[3]</a>, the group theory underlying twelve-tone equal temperament <a href="#ref-4">[4]</a>, and the Euclidean rhythm algorithm <a href="#ref-5">[5]</a> are all different facets of the same observation: mathematical structure, introduced as compositional constraint, generates musical complexity that cannot easily be produced by intuition alone. The star polygon is another instance. The drummer who keeps a heptagram on his kit has found, by a non-mathematical route, an object with a precise and interesting mathematical identity.</p>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Coxeter, H.S.M. (1973). <em>Regular Polytopes</em> (3rd ed.). Dover. Ch. 2.</p>
<p><span id="ref-2"></span>[2] Toussaint, G. (2013). <em>The Geometry of Musical Rhythm: What Makes a &ldquo;Good&rdquo; Rhythm Good?</em> CRC Press.</p>
<p><span id="ref-3"></span>[3] See <a href="/posts/fibonacci-lateralus/">Fibonacci and Lateralus</a> on this blog.</p>
<p><span id="ref-4"></span>[4] See <a href="/posts/twelve-tet-group-theory-musical-tuning/">Twelve-TET and Group Theory</a> on this blog.</p>
<p><span id="ref-5"></span>[5] See <a href="/posts/euclidean-rhythms/">Euclidean Rhythms</a> on this blog.</p>
]]></content:encoded>
    </item>
    <item>
      <title>The Cat&#39;s Eye: Slit Pupils, Thin-Film Mirrors, and 135-Fold Dynamic Range</title>
      <link>https://sebastianspicker.github.io/posts/cat-eyes-slit-pupils-tapetum/</link>
      <pubDate>Mon, 23 Jun 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/cat-eyes-slit-pupils-tapetum/</guid>
      <description>A cat&amp;rsquo;s eye contains two distinct optical technologies that human engineers have copied — one consciously, one not. The slit pupil achieves a dynamic range of 135:1 in light transmission, nearly nine times that of the human circular pupil. The tapetum lucidum is a multilayer thin-film reflector of crystalline rodlets, producing constructive interference at the peak of scotopic sensitivity and sending light through the retina twice. Banks et al. (Science Advances, 2015) showed why the slit geometry specifically evolved in ambush predators; Percy Shaw&amp;rsquo;s 1934 Catseye road reflector borrowed the principle directly.</description>
      <content:encoded><![CDATA[<p><em>Flash photography of cats produces glowing eyes. This is familiar enough that
most people do not find it strange. But the physics that produces it — a
biological multilayer interference reflector built from crystalline rodlets of
riboflavin and zinc, tuned to the peak of night-vision sensitivity, sending returning photons through
the retina for a second pass — is not familiar at all. I started thinking about
this after photographing our cats at dusk — through the doorway; they are indoor
cats now, for health reasons — and finding their eyes lit up a colour
that depends on the angle: greenish from straight ahead, golden from the side.
The angle-dependence is a direct consequence of the thin-film interference
condition, and the different colours correspond to different constructive
interference wavelengths at different angles of incidence.</em></p>
<p><em>The eye contains two optical solutions — pupil geometry and tapetum — that
address different aspects of the same problem: how to function across a very
large range of light levels, from bright midday sun to the dim luminance of a
starlit field.</em></p>
<hr>
<h2 id="the-dynamic-range-problem">The Dynamic Range Problem</h2>
<p>A crepuscular predator — active around dawn and dusk — must function visually
across a light-level range of roughly $10^8$:$1$. The sun on a bright day
produces retinal illuminance of around $10^5\,\mathrm{photons}/(\mu\mathrm{m}^2\cdot\mathrm{s})$;
a moonless night produces roughly $10^{-3}$ in the same units. The ratio is
approximately $10^8$.</p>
<p>The pupil is the variable aperture that controls how much light reaches the
retina. The larger the pupil area, the more light admitted; the smaller the
area, the less. For the human eye, the pupil diameter ranges from approximately
$2\,\mathrm{mm}$ (bright light) to $8\,\mathrm{mm}$ (darkness), giving a
maximum area ratio of:</p>
$$\frac{A_\mathrm{max}}{A_\mathrm{min}} = \left(\frac{8}{2}\right)^2 = 16.$$<p>This is a dynamic range of 16:1 from the pupil alone. The remaining
$10^8 / 16 \approx 6 \times 10^6$ factor in adaptation comes from neural
and photochemical mechanisms in the retina itself (photopigment bleaching,
dark adaptation of rods vs. cones, lateral inhibition).</p>
<p>For a domestic cat, the same measurement gives something different.</p>
<hr>
<h2 id="the-slit-pupil-1351-dynamic-range">The Slit Pupil: 135:1 Dynamic Range</h2>
<p>Banks, Sprague, Schmoll, Parnell, and Love published &ldquo;Why do animal eyes have
pupils of different shapes?&rdquo; in <em>Science Advances</em> in 2015 (1:7, e1500391).
They analysed pupil shape and size data from 214 terrestrial species and
correlated pupil geometry with ecological niche.</p>
<p>Their principal finding for slit pupils: the domestic cat pupil, a vertical
slit, achieves an area ratio of approximately <strong>135:1</strong> between maximum dilation
and maximum constriction. Numerically:</p>
$$\frac{A_\mathrm{max}}{A_\mathrm{min}} \approx 135.$$<p>The mechanism that makes this possible is geometrical. A circular pupil&rsquo;s
minimum area is limited by diffraction: constricting a circular aperture much
below $2\,\mathrm{mm}$ diameter produces diffraction blur that degrades image
quality. A slit, by contrast, can be made arbitrarily narrow in one direction
while retaining its full extent in the other, confining the diffraction penalty
to a single axis. The cat&rsquo;s vertical slit can constrict to a width of
$\sim 0.3\,\mathrm{mm}$ while keeping most of its height, so its minimum area
is a small fraction of that of the fully dilated, nearly circular pupil —
approximately 135 times smaller.</p>
<p>The 135:1 ratio is more than <strong>eight times</strong> the dynamic range achievable by the
human circular pupil (16:1). This allows the cat&rsquo;s pupil to do substantially
more of the work of light adaptation, reducing the load on the slower neural
and photochemical mechanisms.</p>
<hr>
<h2 id="why-vertical-the-ecological-correlation">Why Vertical? The Ecological Correlation</h2>
<p>Banks et al. found a striking correlation between pupil geometry and predator
ecology:</p>
<ul>
<li><strong>Vertical slit pupils</strong> correlate with <em>ambush predators whose eyes are
close to the ground</em> — animals with shoulder height below approximately
$42\,\mathrm{cm}$.</li>
<li><strong>Horizontal slit pupils</strong> correlate with <em>prey animals and grazing
herbivores</em> (horses, goats, sheep, deer). The horizontal slit, when the
animal lowers its head to graze, rotates to remain approximately horizontal
(the eye counterrotates in the orbit), providing a wide panoramic field
of view for detecting approaching predators.</li>
<li><strong>Circular pupils</strong> correlate with <em>pursuit predators</em> (humans, dogs, large
raptors) that hunt at larger distances where the precise vertical depth
cues provided by the slit geometry are less critical.</li>
</ul>
<p>The functional advantage of a <strong>vertical slit for a low-to-the-ground ambush
predator</strong> is depth estimation by <em>blur circles</em>. The slit geometry produces
strong defocus blur in the horizontal direction but sharp focus in the vertical
direction. An ambush predator lying in grass needs to estimate the horizontal
distance to prey accurately; the defocus differential between horizontal and
vertical blur provides a stereoscopic-like depth cue even with one eye. This
is a form of <strong>astigmatic blur ranging</strong>: the degree of horizontal blur for a
given focal setting encodes the object&rsquo;s distance.</p>
<p>The correlation across 214 species is not perfect, but it is statistically
robust: the prevalence of slit pupils among ground-level ambush predators is
not coincidence; it is selection pressure.</p>
<hr>
<h2 id="the-tapetum-lucidum-a-biological-dielectric-mirror">The Tapetum Lucidum: A Biological Dielectric Mirror</h2>
<p>Behind the retina, most nocturnal and crepuscular mammals possess a reflective
layer called the <em>tapetum lucidum</em> (literally: &ldquo;bright carpet&rdquo;). Light that
passes through the retina without being absorbed by a photoreceptor strikes
the tapetum and is reflected back through the retina for a second absorption
opportunity. This roughly doubles the effective optical path length through
the photoreceptor layer, substantially increasing the probability of photon
capture at low light levels.</p>
<p>The cat tapetum is a <strong>tapetum cellulosum</strong>: a layer of specialised cells
whose cytoplasm contains dense arrays of rod-shaped crystalline inclusions
composed primarily of riboflavin (vitamin B$_2$) and zinc. (This is distinct
from the guanine-crystal tapeta found in fish and some reptiles.) The
crystalline rodlets have a refractive index of approximately $n_1 \approx 1.8$;
they alternate with layers of cytoplasm with refractive index $n_2 \approx
1.33$ (close to water). The rodlet arrays form a multilayer thin-film
reflector.</p>
<hr>
<h2 id="thin-film-interference-the-physics-of-the-reflection">Thin-Film Interference: The Physics of the Reflection</h2>
<p>The physics of the tapetum is identical to the physics of anti-reflection
coatings on camera lenses and dielectric mirrors in laser cavities.</p>
<p>Consider a single thin film of thickness $d$ and refractive index $n_1$
embedded between media of index $n_2 < n_1$. Light of wavelength $\lambda$
(in vacuum) travelling at angle $\theta$ to the normal inside the film undergoes
partial reflection at both interfaces. Neglecting the half-wave phase shift at
the low-to-high-index interface (the periodic multilayer treatment below
accounts for it), the two reflected beams interfere constructively when their
optical path difference is a whole number of wavelengths:</p>
$$\Delta = 2 n_1 d \cos\theta = m\lambda, \quad m = 1, 2, 3, \ldots$$<p>For the tapetum, typical rodlet diameter is $d \approx 100$–$120\,\mathrm{nm}$.
With $n_1 \approx 1.8$ and $\theta \approx 0°$ (normal incidence), the first
constructive interference maximum for a single layer occurs at:</p>
$$\lambda_\mathrm{peak} = 2 n_1 d = 2 \times 1.8 \times 100\,\mathrm{nm}
\approx 360\,\mathrm{nm}.$$<p>Wait — that is in the ultraviolet. The tapetum must have multiple layers.</p>
<p>For a stack of $N$ rodlet layers, the reflectance is strongly enhanced
(approaching unity for large $N$) and the peak wavelength of the fundamental
reflection maximum shifts. The relevant periodicity is the combined optical
thickness of one rodlet layer plus one cytoplasm layer:</p>
$$d_\mathrm{eff} = n_1 d_1 + n_2 d_2,$$<p>where $d_1 \approx 100\,\mathrm{nm}$ is the rodlet diameter and
$d_2 \approx 50$–$100\,\mathrm{nm}$ is the cytoplasm spacing. Taking
$d_2 \approx 60\,\mathrm{nm}$:</p>
$$d_\mathrm{eff} = 1.8 \times 100 + 1.33 \times 60 \approx 180 + 80
= 260\,\mathrm{nm}.$$<p>Constructive interference (quarter-wave condition for a multilayer stack) at
$m = 1$:</p>
$$\lambda_\mathrm{peak} = 2 d_\mathrm{eff} \approx 520\,\mathrm{nm}.$$<p>This is green — close to the peak of the scotopic (rod) sensitivity
curve at $\lambda_\mathrm{max,rod} = 498\,\mathrm{nm}$. The tapetum is tuned
to reflect the wavelengths that the night-vision photoreceptors are most
sensitive to. (The exact peak depends on rodlet spacing, which varies across
the tapetum; this produces the observed variation from green to yellow.)</p>
<p>The angle-dependence of the peak wavelength follows from the interference
condition: at angle $\theta$ to the normal, $\lambda_\mathrm{peak}(\theta)
= 2 d_\mathrm{eff} \cos\theta$. At $\theta = 30°$, $\cos 30° \approx 0.87$,
giving $\lambda_\mathrm{peak} \approx 450\,\mathrm{nm}$ — blue. At
$\theta = 60°$, $\cos 60° = 0.5$, giving $\lambda \approx 260\,\mathrm{nm}$ —
ultraviolet, invisible. The colour of eyeshine in a flash photograph therefore
depends on the angle between the camera and the eye, exactly as observed.</p>
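<p>The bilayer arithmetic above can be checked in a few lines (a sketch; the layer thicknesses and refractive indices are the assumed values from the text, and the angle is measured inside the stack):</p>

```python
import math

n1, d1 = 1.8, 100e-9    # riboflavin rodlet: refractive index, thickness
n2, d2 = 1.33, 60e-9    # cytoplasm layer: index, spacing

d_eff = n1 * d1 + n2 * d2   # optical thickness of one bilayer, ~260 nm

def peak_wavelength_nm(theta_deg):
    """First-order reflection peak: lambda = 2 * d_eff * cos(theta)."""
    return 2 * d_eff * math.cos(math.radians(theta_deg)) * 1e9

print(round(peak_wavelength_nm(0)))    # ~520 nm: green, near the rod peak
print(round(peak_wavelength_nm(30)))   # ~450 nm: blue
print(round(peak_wavelength_nm(60)))   # ~260 nm: ultraviolet
```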
<hr>
<h2 id="reflectance-of-a-multilayer-stack">Reflectance of a Multilayer Stack</h2>
<p>For $N$ identical bilayers (each of optical thickness $n_1 d_1 + n_2 d_2$),
the reflectance at the design wavelength is given by the transfer matrix
method. For the cat tapetum with $N \approx 10$–$15$ bilayers:</p>
$$R = \left(\frac{1 - (n_2/n_1)^{2N}}{1 + (n_2/n_1)^{2N}}\right)^2
\approx 1 - 4\left(\frac{n_2}{n_1}\right)^{2N}.$$<p>With $n_2/n_1 = 1.33/1.8 \approx 0.739$ and $N = 15$:</p>
$$(0.739)^{30} \approx 1.1 \times 10^{-4}.$$<p>The reflectance is approximately $1 - 4 \times 1.1 \times 10^{-4} \approx
0.9996$ — essentially $100\%$ at the design wavelength for a sufficiently thick
stack. The tapetum is a near-perfect reflector in a narrow wavelength band,
a biological dielectric mirror.</p>
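<p>The reflectance formula is easy to evaluate for different stack depths (a sketch using the index-contrast expression above):</p>

```python
def stack_reflectance(n1, n2, N):
    """Design-wavelength reflectance of an N-bilayer quarter-wave stack."""
    x = (n2 / n1) ** (2 * N)
    return ((1 - x) / (1 + x)) ** 2

for N in (5, 10, 15):
    print(N, round(stack_reflectance(1.8, 1.33, N), 4))
# N = 15 gives ~0.9995: essentially total reflection at the design wavelength
```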
<hr>
<h2 id="photon-statistics-at-low-light">Photon Statistics at Low Light</h2>
<p>The tapetum&rsquo;s function becomes clearest when framed in terms of photon
statistics. A single rod photoreceptor has an absorption probability of
approximately $\eta_\mathrm{single} \approx 25\%$ for a photon passing through
it once at $\lambda = 500\,\mathrm{nm}$.</p>
<p>With the tapetum reflecting the photon back for a second pass, the total
absorption probability becomes:</p>
$$\eta_\mathrm{total} = \eta + (1 - \eta)\, R\, \eta,$$<p>where $R \approx 1$ is the tapetum reflectance. For $\eta = 0.25$ and $R =
0.98$:</p>
$$\eta_\mathrm{total} = 0.25 + (0.75)(0.98)(0.25) = 0.25 + 0.184 \approx 0.43.$$<p>The double pass increases the photon detection efficiency from $25\%$ to
approximately $43\%$ — a factor of $1.7\times$.</p>
<p>At extremely low light levels, photon detection becomes a counting problem
governed by Poisson statistics. If a mean of $\bar{n}$ photons reaches a
single photoreceptor per integration time, the probability of detecting at
least one photon (and hence registering the presence of light) is:</p>
$$P(\text{detection}) = 1 - e^{-\bar{n}\,\eta_\mathrm{total}}.$$<p>For very dim stimuli where $\bar{n} \approx 1$–$3$ photons per rod per
integration time (close to the absolute threshold of cat vision at around
$7 \times 10^{-7}\,\mathrm{lux}$), increasing $\eta$ by a factor of $\sim
1.7$ has a significant effect on detection probability. The tapetum is not a
luxury at low light levels; it is a biophysical necessity for vision near the
absolute threshold.</p>
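<p>Putting the double-pass capture and the Poisson detection model together (a sketch; the 25% single-pass absorption and $R = 0.98$ are the values assumed above):</p>

```python
import math

def eta_total(eta, R):
    """Photon capture probability with one tapetum-reflected second pass."""
    return eta + (1 - eta) * R * eta

def p_detect(n_mean, eta):
    """P(at least one absorption) for Poisson-arriving photons."""
    return 1 - math.exp(-n_mean * eta)

eta1 = 0.25                      # single-pass rod absorption at ~500 nm
eta2 = eta_total(eta1, R=0.98)   # ~0.434 with the tapetum

for n in (1, 2, 3):              # mean photons per rod per integration time
    print(n, round(p_detect(n, eta1), 3), round(p_detect(n, eta2), 3))
```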
<hr>
<h2 id="percy-shaw-and-the-road-catseye">Percy Shaw and the Road Catseye</h2>
<p>In 1934, Percy Shaw, a road-mender from Halifax, applied for a British patent
for a retroreflective road stud that he called the &ldquo;Catseye.&rdquo; Shaw&rsquo;s stated
inspiration was the reflection of his car headlights from a cat&rsquo;s eyes while
driving on an unlit road at night. Whether this story is entirely accurate is
unclear, but the name and the inspiration are both documented in period sources.</p>
<p>Shaw&rsquo;s device uses a different retroreflection mechanism from the tapetum. The
tapetum produces specular (mirror-like) reflection in the back-focal plane of
the eye&rsquo;s lens — light returning along its incident path because the lens
refocuses it. Shaw&rsquo;s Catseye uses glass hemisphere retroreflectors (or, in
later versions, corner-cube retroreflectors) that return light toward its
source by total internal reflection rather than thin-film interference.</p>
<p>The corner-cube geometry guarantees retroreflection: any ray entering a trihedral
corner (three mutually perpendicular surfaces) reflects from all three surfaces
and exits parallel to the incident direction, regardless of the angle of
incidence. The mathematical reason is that the composition of three reflections
in mutually perpendicular planes is $-I$: each reflection flips one Cartesian
component of the direction vector, so $\hat{v}$ exits as $-\hat{v}$, which is
exactly retroreflection.</p>
$$\hat{v}_\mathrm{out} = -\hat{v}_\mathrm{in}.$$<p>Shaw&rsquo;s road Catseye became standard equipment on British roads during the Second World War,
credited with a significant reduction in road fatalities during blackouts and
foggy conditions. The biological original was a multilayer interference mirror;
the engineering copy is a corner-cube retroreflector. Different physics, same
function, same name.</p>
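<p>The sign-flip argument for the corner cube takes a few lines to verify numerically (a sketch; the three faces are taken as the coordinate planes):</p>

```python
def reflect(v, axis):
    """Reflect a 3-vector in the plane perpendicular to the given axis."""
    w = list(v)
    w[axis] = -w[axis]
    return tuple(w)

v = (0.3, -0.5, 0.81)      # arbitrary incoming direction
out = v
for axis in (0, 1, 2):     # one reflection per mutually perpendicular face
    out = reflect(out, axis)

print(out)                 # (-0.3, 0.5, -0.81): exactly -v, retroreflection
```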
<hr>
<h2 id="two-optical-solutions-to-one-problem">Two Optical Solutions to One Problem</h2>
<p>The cat&rsquo;s eye contains two distinct optical technologies:</p>
<ol>
<li>
<p><strong>The slit pupil</strong> — a variable aperture with 135:1 dynamic range, optimised
for depth estimation by astigmatic blur in a low-to-the-ground ambush predator.</p>
</li>
<li>
<p><strong>The tapetum lucidum</strong> — a multilayer thin-film reflector of riboflavin
crystalline rodlets, tuned to the scotopic sensitivity peak, achieving
near-100% reflectance at design wavelength and increasing photon detection
efficiency by a factor of approximately $1.7\times$.</p>
</li>
</ol>
<p>Both solutions were arrived at by natural selection over millions of years of
low-light hunting. Both have been copied — one consciously (Shaw&rsquo;s road
reflectors), one as a model for engineered multilayer reflectors in telescopes,
laser cavities, and narrowband optical filters.</p>
<p>When I photograph our cats at dusk and their eyes glow green, I am seeing
the thin-film interference of a biological photonic crystal — riboflavin
rodlets in cytoplasm — wavelength-selected to send green photons back through
rod cells for a second chance at absorption.
The green is not cosmetic. It is functional, and it is physics.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Banks, M.S., Sprague, W.W., Schmoll, J., Parnell, J.A.Q., &amp; Love, G.D.
(2015). Why do animal eyes have pupils of different shapes? <em>Science Advances</em>,
1(7), e1500391. <a href="https://doi.org/10.1126/sciadv.1500391">https://doi.org/10.1126/sciadv.1500391</a></p>
</li>
<li>
<p>Ollivier, F.J., Samuelson, D.A., Brooks, D.E., Lewis, P.A., Kallberg, M.E.,
&amp; Komaromy, A.M. (2004). Comparative morphology of the tapetum lucidum
(among selected species). <em>Veterinary Ophthalmology</em>, 7(1), 11–22.
<a href="https://doi.org/10.1111/j.1463-5224.2004.00318.x">https://doi.org/10.1111/j.1463-5224.2004.00318.x</a></p>
</li>
<li>
<p>Born, M., &amp; Wolf, E. (1999). <em>Principles of Optics</em> (7th ed.). Cambridge
University Press. (Chapters 1, 7 on thin-film interference and multilayer
coatings.)</p>
</li>
<li>
<p>Shaw, P. (1934). <em>Improvements in Studs for Roads and like Surfaces.</em> British
Patent 436,290. Applied 3 April 1934.</p>
</li>
<li>
<p>Warrant, E.J. (1999). Seeing better at night: Life style, eye design and the
optimum strategy of spatial and temporal summation. <em>Vision Research</em>, 39(9),
1611–1630. <a href="https://doi.org/10.1016/S0042-6989(98)00262-4">https://doi.org/10.1016/S0042-6989(98)00262-4</a></p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-12-15</strong>: Corrected the adoption date of Percy Shaw&rsquo;s road Catseyes from &ldquo;from 1945 onward&rdquo; to &ldquo;during the Second World War&rdquo; (widespread adoption began under wartime blackout conditions, not after the war ended). Removed the Machan, Gu, &amp; Bharthuar (2020) reference, which could not be confirmed in available databases.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Your Transcript Is Already an Interpretation: AI Transcription and Grounded Theory</title>
      <link>https://sebastianspicker.github.io/posts/ai-transcription-grounded-theory/</link>
      <pubDate>Tue, 10 Jun 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/ai-transcription-grounded-theory/</guid>
      <description>aTrain and noScribe are local, GDPR-compliant, Whisper-based transcription tools that can genuinely save hours of work in qualitative interview research. They also make methodological decisions on your behalf without telling you. If you do grounded theory, you need to know which decisions those are.</description>
      <content:encoded><![CDATA[<p><em>In June 2025 I put together a practical guide on AI-assisted transcription
for professors of music pedagogy at HfMT Köln — primarily a hands-on
introduction to aTrain and noScribe. This post is the methodological
companion to that guide: the stuff I could not fit into a workshop handout
but that I think matters more than the installation instructions.</em></p>
<hr>
<h2 id="the-seduction">The Seduction</h2>
<p>AI transcription tools have reached a point where, for clean audio of a
single speaker in a quiet room, the output is genuinely good. You load a
90-minute interview, click a button, wait roughly 20 minutes, and get a
readable transcript with timestamps and speaker labels. In transcript-hours,
that is an order of magnitude faster than manual transcription. The appeal is
obvious, especially if you are a qualitative researcher working with a backlog
of interview recordings.</p>
<p>The two tools I have been evaluating — <strong>aTrain</strong> (developed at the University of
Graz) and <strong>noScribe</strong> (an independent open-source project) — both run
entirely locally on your machine. No audio file is uploaded anywhere. No
cloud API is involved. This matters for interview research: you are handling
other people&rsquo;s speech, often on topics they regard as sensitive, and the
GDPR landscape for sending recordings to external servers is genuinely
complicated. Local processing sidesteps that problem entirely.</p>
<p>Both tools are built on <strong>OpenAI&rsquo;s Whisper model</strong>, which is — despite the
name — open-source and runs offline. They differ in interface philosophy,
feature depth, and what methodological commitments they make visible.</p>
<p>But the seduction is the problem. The speed and cleanliness of the output
makes it easy to treat the transcript as a neutral record rather than as a
construction. It is not. Every transcription is an act of interpretation. An
AI transcription is an act of interpretation performed by an algorithm that
does not know what your research question is.</p>
<hr>
<h2 id="why-this-is-a-grounded-theory-problem-specifically">Why This Is a Grounded Theory Problem Specifically</h2>
<p>In grounded theory — whether you follow the Strauss and Corbin tradition or
the constructivist reformulation by Charmaz — the researcher is not a passive
recorder of data. The analytical process begins with the first moment of
contact with the material. Coding, memo-writing, constant comparison, and
theoretical sampling all assume that you are working with data that you have
genuinely engaged with and that reflects choices made with your research
question in mind.</p>
<p>Transcription is the first of those choices. What counts as a pause? Do you
mark hesitations and self-corrections? Do you capture overlapping speech? Do
you note emphasis, speed changes, or trailing-off? The answers to these
questions are not neutral. They are determined by what level of analysis you
intend. A thematic analysis of interview content needs something different
from a conversation analysis of turn-taking, which needs something different
from a discourse analysis attending to hedges and disfluencies.</p>
<p>When you transcribe manually, you make these choices explicitly or
implicitly, but you make them. When you delegate to an algorithm, the
algorithm makes them — according to its training data and its default
settings — and then presents you with output that looks authoritative.</p>
<p>The risk is not that AI transcription is inaccurate (though it sometimes is).
The risk is that it is <em>selectively accurate in ways you did not choose</em> and
that those choices shape what you subsequently see in the data.</p>
<hr>
<h2 id="what-the-tools-actually-do">What the Tools Actually Do</h2>
<h3 id="atrain">aTrain</h3>
<p>aTrain is the simpler of the two. Windows-native (Microsoft Store), with a
macOS beta for Apple Silicon. The interface has essentially one meaningful
decision point after you load your file: whether to activate speaker
detection. Everything else is handled automatically. Output formats are plain
text with timestamps, SRT subtitle files, and — most useful for researchers —
direct QDA exports for MAXQDA, ATLAS.ti, and NVivo with synchronised
audio-timestamp links.</p>
<p>What aTrain does not do: it does not mark pauses. It does not detect
disfluencies (the <em>ähms</em>, <em>uhs</em>, self-interruptions, false starts). It does
not detect overlapping speech. It produces clean, semantically coherent
transcripts — which means it actively smooths what you gave it. If a
speaker says <em>&ldquo;well — I mean — it was, I think it was more like — yeah,
complicated&rdquo;</em>, aTrain will probably give you something closer to <em>&ldquo;I think it
was complicated&rdquo;</em>. The hesitation structure disappears.</p>
<p>For a thematic interview study where you are interested in what people said
about a topic, this is probably fine. For any analysis where <em>how</em> something
was said is part of the data — pace, repair, emphasis, epistemic hedging —
aTrain is erasing data you need.</p>
<h3 id="noscribe">noScribe</h3>
<p>noScribe is more complex in almost every dimension. Available for Windows,
macOS (including Apple Silicon and Intel), and Linux. The interface exposes
a meaningful number of configuration decisions:</p>
<ul>
<li><strong>Mark Pause</strong>: off, or marked at 1-, 2-, or 3-second thresholds, with
conventional notation <code>(.)</code>, <code>(..)</code>, <code>(...)</code>, <code>(10 seconds pause)</code></li>
<li><strong>Speaker Detection</strong>: automatic count, fixed count, or disabled</li>
<li><strong>Overlapping Speech</strong>: experimental detection, marked with <code>//double slash//</code></li>
<li><strong>Disfluencies</strong>: off or on — captures <em>ähm</em>, <em>äh</em>, self-corrections,
false starts</li>
<li><strong>Timestamps</strong>: by speaker turn or every 60 seconds</li>
</ul>
<p>It also has an integrated editor (noScribeEdit) with synchronised audio
playback: click anywhere in the transcript and the audio seeks to that
position. This is the single most useful feature for post-transcription
review, and aTrain does not have anything equivalent.</p>
<p>The configuration complexity is not gratuitous. It reflects the fact that
different methodological frameworks require different transcription
conventions. noScribe&rsquo;s disfluency detection corresponds roughly to what a
GAT2-Light transcription requires. Its pause notation system maps onto
conversation analytic conventions. The choices you make in the interface are
methodological choices, not just technical preferences.</p>
<hr>
<h2 id="the-normalisation-problem">The Normalisation Problem</h2>
<p>Both tools perform what I would call <em>normalisation</em>: they produce transcripts
that read more fluently than the original speech. This is a feature from a
usability standpoint and a methodological liability from a qualitative
research standpoint.</p>
<p>Specific failure modes I observed in evaluation:</p>
<p><strong>Compound word errors</strong> (more pronounced in noScribe for German): <em>VR-Brille</em>
(&ldquo;VR headset&rdquo;) transcribed as <em>Brille VR</em>, proper nouns mangled, domain
vocabulary rendered phonetically. In music research contexts this is
particularly salient — instrument names, notation terms, composer names, and
genre vocabulary are all potential failure points.</p>
<p><strong>Speaker detection overcounting</strong>: both tools, when speaker detection is
active, tend to identify more speakers than are present. A two-person
interview with one hesitant speaker may generate three or four speaker labels.
Manual correction is required.</p>
<p><strong>Acoustic transcription</strong>: noScribe occasionally produces what the document
calls <em>lautliche Transkriptionen</em> — phonetic renderings rather than semantic
ones. A speaker saying <em>Beamer</em> (data projector) may be transcribed as <em>Bima</em>.
This is not an error in the conventional sense; it is the model accurately
representing what it heard acoustically rather than semantically resolving it.
For music researchers studying how non-specialist participants talk about
technical equipment, this is interesting. For most interview research, it
requires correction.</p>
<p><strong>Pause and overlap reliability degrades with audio quality</strong>: both tools
perform well on clean, close-mic mono recordings of single speakers in quiet
rooms. Introduce a second speaker, ambient noise, variable recording distance,
or a phone recording, and accuracy drops substantially. This matters
specifically for music interview research, where the interview setting is
often a rehearsal room or performance space rather than an acoustic booth.</p>
<hr>
<h2 id="a-methodological-comparison-not-a-feature-list">A Methodological Comparison, Not a Feature List</h2>
<p>The useful comparison between aTrain and noScribe is not technical — it is
about which methodological contexts each is suited to.</p>
<table>
  <thead>
      <tr>
          <th>Research context</th>
          <th>Tool</th>
          <th>Why</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Thematic/content analysis, single speaker</td>
          <td>aTrain</td>
          <td>Speed, simplicity, adequate accuracy, QDA export</td>
      </tr>
      <tr>
          <td>Grounded theory with attention to epistemic hedging</td>
          <td>noScribe + disfluencies</td>
          <td>Captures the hesitation structure that carries methodological information</td>
      </tr>
      <tr>
          <td>Conversation analysis</td>
          <td>Neither, or noScribe as starting point</td>
          <td>CA requires phonetic detail neither tool reliably produces</td>
      </tr>
      <tr>
          <td>Large corpus, initial open coding</td>
          <td>aTrain</td>
          <td>Volume and speed outweigh detail at early stages</td>
      </tr>
      <tr>
          <td>Interpretive phenomenological analysis</td>
          <td>noScribe</td>
          <td>The pause and disfluency data is IPA-relevant</td>
      </tr>
      <tr>
          <td>Teaching transcription as a research practice</td>
          <td>Both</td>
          <td><em>See below</em></td>
      </tr>
  </tbody>
</table>
<p>The last row deserves its own section.</p>
<hr>
<h2 id="using-both-tools-to-teach-about-transcription">Using Both Tools to Teach About Transcription</h2>
<p>The most pedagogically valuable use of these tools is probably not producing
transcripts — it is using them to make the constructed nature of transcripts
visible to students.</p>
<p>A simple exercise: take a three-minute excerpt of an interview recording.
Have students transcribe it manually according to whatever convention the
course uses. Then run the same excerpt through aTrain and noScribe with
different settings. Compare the three or four resulting transcripts in a
seminar discussion.</p>
<p>The differences that emerge are not about which transcript is &ldquo;correct&rdquo;. They
are about what each transcript makes visible and what it hides. The aTrain
transcript will be clean and readable. The manually-produced transcript will
have annotation that the students chose based on what struck them as relevant.
The noScribe transcript with disfluencies enabled will look noisy. All three
are representations of the same three minutes of speech.</p>
<p>Questions that come out of this reliably: Why did the student who transcribed
manually mark that particular pause? What did the student not mark that the
software did? What did the software produce that the student did not hear?
What does the &ldquo;cleaner&rdquo; transcript lose?</p>
<p>This is the entry point to a genuinely grounded theory-relevant conversation
about data construction: the transcript is not the data. The transcript is a
representation of the data made according to principles that should be
theoretically motivated, and those principles should be stated explicitly in
the methods section.</p>
<hr>
<h2 id="what-these-tools-cannot-replace">What These Tools Cannot Replace</h2>
<p>The document I prepared for the HfMT professors ends with a sentence I want
to quote directly from the German, because it is the methodological core of
the whole thing:</p>
<blockquote>
<p><em>Automatisierung ersetzt nicht das Nachdenken über Daten.</em>
Automation does not replace thinking about data.</p>
</blockquote>
<p>More precisely: the algorithm makes decisions about what counts as a pause,
what counts as language, whose voice counts as a separate speaker — without
knowing what is scientifically relevant. It does not know that the half-second
hesitation before a particular word is the most important moment in the
interview. It does not know that the overlapping &ldquo;mm-hm&rdquo; is a data point for
your analysis of how the interviewee manages discomfort. It does not know
that the repeated self-correction in the middle of a sentence about teaching
practice is where your emerging category is.</p>
<p>You have to know that. And you only know it if you have been in enough
contact with the material to have developed theoretical sensitivity — which is
exactly what Strauss and Corbin mean when they describe the iterative
relationship between data collection, coding, and theoretical development in
grounded theory.</p>
<p>AI transcription tools save the hours of typing. They do not and cannot
substitute for the analytical engagement that makes a grounded theory study
produce knowledge rather than a theme list.</p>
<p>Use them. But use them knowing what they are doing.</p>
<hr>
<h2 id="practical-summary">Practical Summary</h2>
<ul>
<li><strong>aTrain</strong>: one-click, local, GDPR-compliant, good QDA integration,
appropriate for thematic analysis. No disfluencies, no pauses, no
overlap detection. Versions: Windows (Microsoft Store), macOS beta.
Current version: 1.3.1.</li>
<li><strong>noScribe</strong>: more complex, highly configurable, disfluency and pause
detection, integrated audio-sync editor, appropriate for grounded theory
and discourse-oriented work. More demanding to set up. Current version:
0.6.2.</li>
<li><strong>Neither tool</strong> is appropriate as a black-box solution for conversation
analysis or prosodic research.</li>
<li><strong>Both tools</strong> require manual post-processing. Estimate correction time
at roughly 20–40% of the original interview length for clean recordings
with a single speaker; more for multi-speaker or suboptimal audio.</li>
<li><strong>In teaching</strong>: the exercise of comparing manual, aTrain, and noScribe
transcripts of the same excerpt is more pedagogically valuable than any
of the transcripts individually.</li>
</ul>
<hr>
<h2 id="references">References</h2>
<p>Charmaz, K. (2014). <em>Constructing Grounded Theory</em> (2nd ed.).
SAGE Publications.</p>
<p>Dresing, T. &amp; Pehl, T. (2018). <em>Praxisbuch Interview, Transkription &amp;
Analyse</em> (8th ed.). Eigenverlag. <a href="https://www.audiotranskription.de">https://www.audiotranskription.de</a></p>
<p>Haberl, A., Fleiß, J., Kowald, D., &amp; Thalmann, S. (2024). Take the aTrain.
Introducing an interface for the accessible transcription of interviews.
<em>Journal of Behavioral and Experimental Finance</em>, 41, 100891.
<a href="https://doi.org/10.1016/j.jbef.2024.100891">https://doi.org/10.1016/j.jbef.2024.100891</a></p>
<p>Dröge, K. (2023). noScribe [software].
<a href="https://github.com/kaixxx/noScribe">https://github.com/kaixxx/noScribe</a></p>
<p>Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., &amp; Sutskever, I.
(2022). Robust speech recognition via large-scale weak supervision.
arXiv preprint arXiv:2212.04356. <a href="https://arxiv.org/abs/2212.04356">https://arxiv.org/abs/2212.04356</a></p>
<p>Strauss, A. &amp; Corbin, J. (1998). <em>Basics of Qualitative Research</em>
(2nd ed.). SAGE Publications.</p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-01-20</strong>: Updated the aTrain reference to the published form: Haberl, A., Fleiß, J., Kowald, D., &amp; Thalmann, S. (2024), &ldquo;Take the aTrain. Introducing an interface for the accessible transcription of interviews.&rdquo;</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>There Is No Blue Pill: The Epistemology of the Red Pill/Blue Pill Choice</title>
      <link>https://sebastianspicker.github.io/posts/matrix-red-pill-bayesian-epistemology/</link>
      <pubDate>Thu, 15 May 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/matrix-red-pill-bayesian-epistemology/</guid>
      <description>The most famous choice in science fiction is epistemically impossible to make rationally. Morpheus offers Neo &amp;rsquo;the truth&amp;rsquo; but gives him no way to evaluate the offer. Cypher&amp;rsquo;s decision to go back is more philosophically coherent than the films acknowledge.</description>
      <content:encoded><![CDATA[<p>Neo is in a chair. A man he has never met opens a small box containing two pills. Take the red one, Morpheus says, and you see how deep the rabbit hole goes. Take the blue one and you wake up in your bed and believe whatever you want to believe <a href="#ref-1">[1]</a>. The camera lingers. Neo reaches for the red pill. The audience exhales. The correct choice has been made.</p>
<p>The scene has spent twenty-five years becoming the dominant cultural shorthand for choosing uncomfortable truth over comfortable illusion. &ldquo;Take the red pill&rdquo; has entered the vocabulary as a synonym for courageous epistemic honesty. I want to argue that the choice, as Morpheus frames it, is epistemically bankrupt — that no rational agent has enough information to make it correctly at the moment it is offered — and that the character who actually reasons most coherently about the situation is the one the film kills as a traitor. The film wants you to admire Neo&rsquo;s leap. I think you should admire his willingness to leap while being clear-eyed about the fact that it is a leap, not a reasoned conclusion.</p>
<hr>
<h2 id="why-the-choice-is-not-rational">Why the Choice Is Not Rational</h2>
<p>Consider what Neo actually knows when Morpheus makes the offer. He knows that Morpheus is a man he has never met, who contacted him anonymously through encrypted channels, who seems to believe genuinely in what he is saying, and who has a compelling story about the nature of reality. That is it. Neo does not know whether Morpheus is telling the truth. He does not know whether Morpheus is deluded — a charismatic paranoid who has assembled a following around an elaborate false belief system. He does not know whether the entire setup is a psychological experiment, a test of loyalty, a confidence operation, or an elaborate cult recruitment. The setting — a dramatic late-night meeting, theatrical staging, rain-streaked windows, a black leather coat — is, if anything, evidence for the confidence-operation hypothesis.</p>
<p>In Bayesian terms <a href="#ref-2">[2]</a>, let T be the event &ldquo;the Matrix exists as Morpheus describes and he is telling the truth.&rdquo; Neo&rsquo;s prior probability on T — before taking the pill — should be very low. The claim is extraordinary on multiple dimensions simultaneously: the entire perceived world is a computer simulation running on machines that enslaved humanity, Neo is a prophesied saviour, and a small group of ship-dwelling rebels is conducting a guerrilla war against artificial intelligence. Each one of those components carries a low prior. Their conjunction carries a lower one still.</p>
<p>Now Morpheus makes his offer. Does the offer provide strong evidence for T? Not obviously. The likelihood ratio P(Morpheus makes this offer | T is true) divided by P(Morpheus makes this offer | T is false) is the quantity that matters. The numerator is plausible enough: if the Matrix exists and Morpheus is a genuine recruiter, he would make exactly this offer. But the denominator is also non-trivial. A cult leader, a delusional person with a well-developed narrative, a researcher running a social experiment, or a manipulator with undisclosed goals could all make the same offer with the same conviction. The likelihood ratio is not obviously large. It might be greater than one — the offer is somewhat more consistent with the Matrix being real than not — but not by the margin required to substantially shift a very low prior.</p>
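<p>To make the shape of the update concrete, here is the odds-form Bayes calculation with purely illustrative numbers — none of these values come from the film or from any principled estimate; they only show how a modest likelihood ratio fails to rescue a very low prior:</p>

```python
# Odds-form Bayes: posterior odds = prior odds * likelihood ratio.
# Both numbers below are illustrative placeholders, not estimates.
prior = 1e-6            # a very low prior on T (the conjunction of claims)
likelihood_ratio = 5.0  # offer somewhat likelier if T is true than if not

prior_odds = prior / (1 - prior)
posterior_odds = prior_odds * likelihood_ratio
posterior = posterior_odds / (1 + posterior_odds)
print(f"{posterior:.2e}")  # about 5e-6: still nowhere near commitment territory
```

<p>Multiplying a one-in-a-million prior by a likelihood ratio of five leaves you at roughly five in a million — a real update, and a useless one for an irreversible decision.</p>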
<p>The rational response to a claim with a low prior and an ambiguous likelihood ratio is: update modestly, and gather more evidence before making an irreversible commitment. The pill choice is irreversible. Neo commits before he has accumulated enough evidence to commit rationally. I want to be precise here: I am not saying Neo is stupid or that the film is bad. I am saying that what Neo does is not Bayesian updating. It is something else, and the film is actually honest enough to name it: Morpheus is a man of faith, he recruits believers, and Neo&rsquo;s choice is a leap of faith. That framing is in the film. What the film does not do is acknowledge that the leap is epistemically problematic — it treats the leap as obviously correct, which is a different thing.</p>
<hr>
<h2 id="the-missing-third-option">The Missing Third Option</h2>
<p>What strikes me every time I watch the scene is that nobody considers the obvious response: decline both pills, at least for now. Not &ldquo;choose the blue pill&rdquo; in the sense of consciously accepting comfortable illusion. Not &ldquo;choose the red pill&rdquo; in the sense of committing to a reality you cannot yet evaluate. Just: I don&rsquo;t take either one until you give me something I can check.</p>
<p>What would that look like? Morpheus could offer Neo a verifiable prediction. He could show him a document, a piece of external evidence, something with epistemic traction that does not require swallowing a GPS-tracking capsule as a precondition. He could make a specific, falsifiable claim about something in Neo&rsquo;s ordinary life — about what will happen tomorrow, about something Neo can verify independently — and let Neo check it. The dramatic scene would survive this revision. It would, in fact, become more interesting. A Morpheus who says &ldquo;I will give you three days and three checkpoints and then you decide&rdquo; is a more trustworthy Morpheus than one who says &ldquo;decide now, in this room, with me watching.&rdquo;</p>
<p>The film never asks why Morpheus doesn&rsquo;t do this. Probably because it would slow down the plot and defuse the tension. But the question is worth sitting with, because the structure of the scene — charismatic authority figure, artificially binary choice, time pressure, grandiose framing, the implicit suggestion that declining is cowardice — is recognisable as the structure of many real-world scenarios that end badly. Cult recruitment. High-pressure sales. Certain kinds of political radicalisation. The scene is stylistically appealing precisely because it removes the messy, gradual process by which people actually come to trust extraordinary claims, and replaces it with a clean moment of commitment. That cleanliness is dramatically useful and epistemically dangerous.</p>
<p>Hilary Putnam raised the brain-in-a-vat problem decades before the film <a href="#ref-5">[5]</a>: if you were always a disembodied brain receiving simulated inputs, you would have no way to know it. The unsettling thing about Putnam&rsquo;s version is not just that you might be deceived, but that certain kinds of deception are in principle undetectable from the inside. The Matrix gestures at this problem without fully engaging it. If the simulation is good enough, the red pill doesn&rsquo;t show you reality — it shows you another simulation, run by the people who gave you the pill.</p>
<hr>
<h2 id="cypher-was-right">Cypher Was Right</h2>
<p>The character who actually reasons philosophically about the situation is Cypher, and the film kills him as a villain. This has always bothered me.</p>
<p>Cypher&rsquo;s argument is not confused. He knows the Matrix is a simulation. He has taken the red pill, seen the reality of the machines&rsquo; world — the grey sky, the protein slurry, the cold metal of the Nebuchadnezzar — and lived in it for years. He does not dispute the facts. What he disputes is the value judgment: why is knowing the truth better than experiencing a good life in a simulation? He wants to go back. He is willing to betray his colleagues to get there, which is why he is the villain; I want to separate that from the underlying philosophical question.</p>
<p>This is Robert Nozick&rsquo;s experience machine argument, published in 1974, a quarter century before the film <a href="#ref-3">[3]</a>. Nozick asks: suppose you could plug into a machine that would give you any experience you chose — creative achievement, loving relationships, meaningful work, pleasure. While plugged in, you would believe the experiences were real. Would you do it? Most people, when asked cold, say no. Nozick uses this intuition to argue that we care about more than experience: we care about actually doing things, actually being certain kinds of people, actually being in contact with reality rather than a representation of it. These are what philosophers call non-experientialist values — things that matter independently of how good they feel from the inside.</p>
<p>Cypher&rsquo;s position is the opposite: he is a committed hedonist, or at least a committed experientialist. He prefers a good simulated steak that he knows doesn&rsquo;t exist to real protein mush. He is not confused about which is which. He has done the value calculation and arrived somewhere different from where the Wachowskis want him to be. The film has no philosophical response to this. It cannot argue that Nozick&rsquo;s intuition pump is decisive, because it isn&rsquo;t — philosophers dispute it. David Chalmers, in a 2022 book on exactly this question <a href="#ref-6">[6]</a>, argues that virtual worlds can be genuinely real in the ways that matter, and that the intuitive recoil from the experience machine may reflect bias rather than deep moral truth. The film resolves the disagreement by having Cypher shot. That is not a philosophical refutation. It is narrative bullying.</p>
<p>I want to be fair to the film here. There is a reading of Cypher that makes him clearly wrong on non-philosophical grounds: he doesn&rsquo;t just choose the experience machine for himself, he actively endangers and kills people who chose differently. That is the real moral failure — not the preference, but the betrayal. The film is right to condemn the betrayal. What it is not entitled to do is use the betrayal to contaminate the underlying value judgment. Cypher could have negotiated his return without harming anyone. The film doesn&rsquo;t allow that possibility because it wants to code his preference, and not just his actions, as villainous. That conflation is intellectually dishonest.</p>
<p>If you think what matters is experienced well-being — hedonic experience, subjective satisfaction — then Cypher&rsquo;s choice is not only defensible but internally coherent. If you think what matters is contact with objective reality regardless of the experiential cost, then Neo&rsquo;s choice is defensible. These are genuinely contested positions in philosophy of mind and ethics, and the film is not in a position to adjudicate between them by fiat.</p>
<hr>
<h2 id="what-this-has-to-do-with-ai">What This Has to Do with AI</h2>
<p>I think about this in the context of how AI systems present information to users. An AI that says &ldquo;here is the truth, take it or leave it&rdquo; — binary, authoritative, no scaffolding — is doing something structurally similar to Morpheus. It presents a conclusion without giving the user the epistemic equipment to evaluate it. Trusting the conclusion requires trusting the system, and trusting the system requires evidence the system hasn&rsquo;t provided. See <a href="/posts/matrix-oracle-alignment-problem/">The Oracle Problem</a> for a companion piece on the Matrix&rsquo;s other epistemically interesting character — the Oracle, who knows more than she tells, and deliberately withholds information on the grounds that the recipient isn&rsquo;t ready. Both failure modes — the Morpheus mode of demanding commitment before evidence, and the Oracle mode of managing disclosure paternalistically — are real patterns in how AI systems interact with users.</p>
<p>The better model — for AI assistants and for Morpheus — is incremental disclosure with verification checkpoints. Not a binary pill choice, but a sequence of smaller claims, each with attached evidence, that allows the recipient to update their beliefs rationally as evidence accumulates. This is how science works. It is also how trustworthy communication between humans works, at least when it is functioning well. It is not how dramatic scenes in action films work, which is why the Matrix scene is so satisfying and so epistemically broken at the same time. The satisfaction and the brokenness are related: the scene is satisfying because it removes the friction of genuine epistemic process. Genuine epistemic process is slow, uncertain, and does not have good cinematography.</p>
<p>There is also a point about extraordinary claims. The more extraordinary the claim, the more evidence is required before rational commitment. This is Sagan&rsquo;s principle <a href="#ref-4">[4]</a>, and it applies to the Matrix as much as it applies to claims about room-temperature superconductors or AI systems that achieve general understanding of language. The <a href="/posts/lk99-preprint-physics-sociology/">LK-99 preprint episode</a> is a real-world example of how scientific communities sometimes fail this test spectacularly — early excitement, rushed replication attempts, confident public claims — and how the self-correcting mechanisms of science eventually work, but more slowly and messily than the popular image suggests. Morpheus does not offer Neo the equivalent of a Nature paper with replication data and three independent confirmations. He offers him a pill and a charismatic pitch. The pill is the commitment mechanism, not the evidence. Taking it is the act of faith, not the conclusion of the reasoning process. <a href="/posts/more-context-not-always-better/">More context is not always better</a> is relevant here too: the amount of information Morpheus provides is carefully curated to produce commitment, not calibrated to support independent evaluation. That curation is a form of epistemic control, whether or not Morpheus intends it as such.</p>
<p>For a different kind of AI grounding failure — systems that answer confidently without knowing what state the world is in — see <a href="/posts/car-wash-grounding/">The Car Wash, Grounding, and What AI Systems Don&rsquo;t Know They Don&rsquo;t Know</a>. The Matrix scenario is almost the inverse: the system (Morpheus) knows something about the state of the world that the recipient (Neo) does not, and the question is whether the transfer of that knowledge is being handled honestly.</p>
<hr>
<h2 id="decision-under-radical-uncertainty">Decision Under Radical Uncertainty</h2>
<p>I find myself genuinely ambivalent about Neo&rsquo;s choice, which I think is the correct response to the film if you are paying attention. He is not irrational to take the red pill in the weak sense that reasonable people sometimes make bets on low-prior high-upside scenarios, especially when the downside of the alternative has its own costs. The blue pill is not costless. Accepting permanent comfortable ignorance — knowing that you are choosing not to know — carries its own weight. If Morpheus is telling the truth, the blue pill costs Neo his entire sense of self and his only chance at a meaningful life in the actual world. That asymmetry of potential regret is part of the rational calculus, and it pushes toward the red pill even without strong evidence for T.</p>
<p>What Neo is doing, then, is not Bayesian reasoning in the strict sense. He is making a decision under radical uncertainty with asymmetric stakes and irreversible options. The philosophy of decision theory has things to say about this — Pascal&rsquo;s Wager is the classic case, and it has classic problems, including the problem that any sufficiently grandiose framing can justify almost any commitment by inflating the potential stakes — but the point is that Neo&rsquo;s choice is more defensible than a naive probability calculation makes it look, even if it is less heroic than the film presents it.</p>
<p>The problem is that the film treats this leap as unambiguously correct and Cypher&rsquo;s considered rejection of the red pill&rsquo;s value as unambiguous cowardice. That framing does not survive philosophical scrutiny. Cypher knows the truth. He has lived in it. He prefers the simulation. The film cannot call him ignorant. What it wants to call him is wrong, and it cannot make the philosophical argument for that, so it makes him a murderer instead and lets the murder do the philosophical work. That is not honest. It is the narrative equivalent of winning an argument by changing the subject.</p>
<p>The blue pill represents something the film spends nearly three hours refusing to take seriously: the possibility that some simulations are worth staying in, that knowing the truth is not always worth the cost of knowing it, and that a person who reasons carefully and comes out on the other side of that calculation differently from you might not be a coward or a traitor — just someone whose values, applied to the same facts, point in a different direction. That is philosophy. The film is very good at many things. Philosophy is not consistently one of them.</p>
<hr>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Wachowski, L., &amp; Wachowski, L. (Directors). (1999). <em>The Matrix</em> [Film]. Warner Bros.</p>
<p><span id="ref-2"></span>[2] Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. <em>Philosophical Transactions of the Royal Society</em>, 53, 370–418.</p>
<p><span id="ref-3"></span>[3] Nozick, R. (1974). <em>Anarchy, State, and Utopia</em>. Basic Books. (Experience machine argument, pp. 42–45.)</p>
<p><span id="ref-4"></span>[4] Sagan, C. (1995). <em>The Demon-Haunted World: Science as a Candle in the Dark</em>. Random House.</p>
<p><span id="ref-5"></span>[5] Putnam, H. (1981). Brains in a vat. In <em>Reason, Truth and History</em>. Cambridge University Press.</p>
<p><span id="ref-6"></span>[6] Chalmers, D. (2022). <em>Reality+: Virtual Worlds and the Problems of Philosophy</em>. W. W. Norton.</p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-09-28</strong>: Corrected the subtitle of Chalmers (2022) from &ldquo;Virtual Worlds and the Philosophy of Mind&rdquo; to &ldquo;Virtual Worlds and the Problems of Philosophy.&rdquo;</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>The Oldest Algorithm in the World Plays the Clave</title>
      <link>https://sebastianspicker.github.io/posts/euclidean-rhythms/</link>
      <pubDate>Mon, 07 Apr 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/euclidean-rhythms/</guid>
      <description>Euclid&amp;rsquo;s algorithm for computing greatest common divisors, applied to the problem of distributing k drum beats as evenly as possible among n time slots, generates rhythmic patterns that match traditional timelines from West Africa, Cuba, Brazil, Turkey, and the Balkans. An algorithm devised in Alexandria around 300 BCE encodes the rhythmic structure of musical cultures that had no contact with ancient Greek mathematics.</description>
      <content:encoded><![CDATA[<p><em>The first time I encountered the West African standard bell pattern it was in a
Music and Physics seminar. The lecturer played a twelve-beat cycle on a wood
block — seven strokes distributed unevenly but with a rightness that arrested the
room. She then played the Cuban clave, the bossa nova timeline, a Bulgarian
aksak rhythm. Different cultures, different instruments, different centuries. She
asked whether there was a pattern. There was. It is named after a mathematician
who died around 270 BCE.</em></p>
<hr>
<h2 id="euclids-algorithm">Euclid&rsquo;s Algorithm</h2>
<p>Every student who has taken a number theory course has encountered the algorithm
for computing the greatest common divisor of two positive integers. Given $a \geq
b$, repeatedly replace $(a, b)$ with $(b, a \bmod b)$ until the remainder is
zero; the last non-zero remainder is the GCD.</p>
<p>For example, $\gcd(8, 3)$:</p>
$$8 = 2 \times 3 + 2 \;\Rightarrow\; \gcd(8, 3) = \gcd(3, 2)$$<p>
</p>
$$3 = 1 \times 2 + 1 \;\Rightarrow\; \gcd(3, 2) = \gcd(2, 1)$$<p>
</p>
$$2 = 2 \times 1 + 0 \;\Rightarrow\; \gcd(2, 1) = 1.$$<p>Three steps, result 1 (8 and 3 are coprime). The algorithm is efficient: the
number of steps is proportional to the number of digits in the smaller input.
It appears in Book VII of Euclid&rsquo;s <em>Elements</em>, composed around 300 BCE, making
it the oldest non-trivial algorithm in the Western mathematical tradition.</p>
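<p>The recursion is short enough to state directly in code. A minimal Python sketch of the replace-and-repeat loop described above:</p>

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm: repeatedly replace (a, b) with (b, a mod b)
    until the remainder is zero; the last non-zero value is the GCD."""
    while b:
        a, b = b, a % b
    return a

print(gcd(8, 3))  # 1 — three division steps, as in the worked example
```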
<hr>
<h2 id="distributing-onsets-toussaints-observation">Distributing Onsets: Toussaint&rsquo;s Observation</h2>
<p>In 2005, Godfried Toussaint — a computer scientist and ethnomusicologist at
McGill University — published the observation that the problem of distributing
$k$ musical onsets as evenly as possible among $n$ time slots has the same
recursive structure as Euclid&rsquo;s algorithm applied to the pair $(k, n-k)$
(Toussaint, 2005).</p>
<p>The algorithm that solves this distribution problem was independently discovered
in nuclear physics. Bjorklund (2003), working on timing systems for the
Spallation Neutron Source particle accelerator at Oak Ridge, needed to distribute
$k$ beam-extraction pulses as evenly as possible among $n$ machine cycles.
The algorithm he derived — Bjorklund&rsquo;s algorithm — is mathematically equivalent
to the Euclidean algorithm applied to the same pair of integers.</p>
<p>The resulting pattern is denoted $E(k, n)$: the <strong>Euclidean rhythm</strong> with $k$
onsets distributed among $n$ pulses. A 1 denotes an onset; a 0 denotes a rest.</p>
<hr>
<h2 id="working-through--the-tresillo">Working Through $E(3, 8)$: The Tresillo</h2>
<p>Let us derive $E(3, 8)$ — 3 onsets distributed in 8 pulses — step by step.</p>
<p><strong>Start</strong>: 3 onset groups and 5 rest groups:</p>
$$[1]\; [1]\; [1]\; [0]\; [0]\; [0]\; [0]\; [0]$$<p><strong>Step 1</strong>: Distribute one rest group into each onset group, pairing until the
shorter list is exhausted. Three pairs, with $5 - 3 = 2$ rest groups remaining:</p>
$$[1,0]\; [1,0]\; [1,0]\; [0]\; [0]$$<p><strong>Step 2</strong>: Now 3 longer groups and 2 shorter groups. Distribute one shorter group
into each longer group, $3 - 2 = 1$ longer group unpaired:</p>
$$[1,0,0]\; [1,0,0]\; [1,0]$$<p><strong>Step 3</strong>: The two group types have different lengths and only one group of the
shorter type remains; no further pairing is possible. Read the sequence
left to right:</p>
$$E(3, 8) = [1, 0, 0, 1, 0, 0, 1, 0].$$<p>This is the <strong>Cuban tresillo</strong> — one of the foundational rhythmic cells of
Afro-Cuban music, used across son, salsa, and mambo. Its onset positions are
$\{0, 3, 6\}$, giving gap sizes $[3, 3, 2]$: two wide gaps and one narrow gap,
arranged as evenly as the integers allow.</p>
<p>The parallel with Euclid&rsquo;s algorithm is direct. In the division $8 = 2 \times 3 + 2$, the quotient 2 gives the number of pairing steps before a remainder appears, and the remainder 2 gives the number of groups in the shorter list at each intermediate stage. The recursion $\gcd(8, 3) \to \gcd(3, 2) \to \gcd(2, 1)$ mirrors the three steps above.</p>
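<p>The pairing procedure worked through above can be sketched directly in Python. This is one straightforward rendering of the grouping steps (equivalent to Bjorklund&rsquo;s algorithm), not a reference implementation:</p>

```python
def euclidean_rhythm(k: int, n: int) -> list[int]:
    """E(k, n): k onsets distributed as evenly as possible among n pulses,
    built by repeatedly folding the shorter group list into the longer one."""
    groups = [[1] for _ in range(k)] + [[0] for _ in range(n - k)]
    while True:
        # By construction there are at most two group types; the run at the
        # front is one type, the remainder the other.
        head = [g for g in groups if g == groups[0]]
        tail = [g for g in groups if g != groups[0]]
        if len(tail) <= 1:  # nothing left to pair: stop
            break
        pairs = min(len(head), len(tail))
        # Append one tail group to each head group; carry leftovers along.
        groups = ([head[i] + tail[i] for i in range(pairs)]
                  + head[pairs:] + tail[pairs:])
    return [bit for g in groups for bit in g]

print(euclidean_rhythm(3, 8))  # [1, 0, 0, 1, 0, 0, 1, 0] — the tresillo
```

<p>Running it for $E(5,8)$ and $E(7,12)$ reproduces the cinquillo and the standard bell pattern discussed below.</p>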
<hr>
<h2 id="the-gap-structure">The Gap Structure</h2>
<p>For any $E(k, n)$, the spacing between consecutive onsets takes exactly two
values:</p>
$$\text{gap} \in \left\{\left\lfloor \frac{n}{k} \right\rfloor,\
\left\lceil \frac{n}{k} \right\rceil\right\}.$$<p>The number of each gap size is determined by the constraint that all $k$ gaps
sum to $n$. Writing $\alpha = n \bmod k$:</p>
$$\alpha \cdot \left\lceil \frac{n}{k} \right\rceil \;+\; (k - \alpha) \cdot
\left\lfloor \frac{n}{k} \right\rfloor = n.$$<p>So $E(k,n)$ has $\alpha$ gaps of the larger size and $k - \alpha$ gaps of the
smaller size. The Euclidean property is that these two gap types are distributed
<em>as evenly as possible</em> among themselves — not clustered at one end of the cycle
but interleaved. A cycle that maximises the minimum distance between any two
consecutive onsets has this property; it is called <strong>maximally even</strong> (Clough
and Douthett, 1991).</p>
<p>For $E(3, 8)$: $\lfloor 8/3 \rfloor = 2$, $\lceil 8/3 \rceil = 3$,
$\alpha = 8 \bmod 3 = 2$. Two gaps of 3, one gap of 2. Gap sequence $[3, 3, 2]$.
Maximum-evenness is why the tresillo sounds &ldquo;right&rdquo; even though it is
asymmetric: the asymmetry is the smallest possible deviation from perfect
regularity.</p>
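<p>The gap structure is easy to check mechanically. A small helper that reads the circular inter-onset distances off a binary pattern:</p>

```python
def gaps(pattern: list[int]) -> list[int]:
    """Circular distances between consecutive onsets of a binary pattern."""
    n = len(pattern)
    onsets = [i for i, b in enumerate(pattern) if b]
    k = len(onsets)
    return [(onsets[(j + 1) % k] - onsets[j]) % n for j in range(k)]

tresillo = [1, 0, 0, 1, 0, 0, 1, 0]
print(gaps(tresillo))  # [3, 3, 2]: alpha = 8 mod 3 = 2 gaps of 3, one gap of 2
```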
<hr>
<h2 id="a-gallery-of-world-rhythms">A Gallery of World Rhythms</h2>
<p>The following table, derived from Toussaint (2005, 2020), shows Euclidean rhythms
alongside their ethnomusicological identifications. Asterisks mark patterns given
as rotations of the canonical form.</p>
<table>
  <thead>
      <tr>
          <th>Pattern</th>
          <th>Gap structure</th>
          <th>Musical tradition</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>$E(2,3) = [1,0,1]$</td>
          <td>$[2,1]$</td>
          <td>Iambic foot; West African, Balkan</td>
      </tr>
      <tr>
          <td>$E(3,8) = [1,0,0,1,0,0,1,0]$</td>
          <td>$[3,3,2]$</td>
          <td>Cuban <em>tresillo</em>; Flamenco</td>
      </tr>
      <tr>
          <td>$E(5,8) = [1,0,1,1,0,1,1,0]$</td>
          <td>$[2,1,2,1,2]^*$</td>
          <td>Cuban <em>cinquillo</em></td>
      </tr>
      <tr>
          <td>$E(4,9) = [1,0,0,1,0,1,0,1,0]^*$</td>
          <td>$[3,2,2,2]^*$</td>
          <td>Turkish <em>aksak</em> patterns</td>
      </tr>
      <tr>
          <td>$E(7,12) = [1,0,1,1,0,1,0,1,1,0,1,0]$</td>
          <td>$[2,1,2,2,1,2,2]^*$</td>
          <td>West African standard bell</td>
      </tr>
      <tr>
          <td>$E(9,16)$</td>
          <td>$[2,2,2,1,2,2,2,1,2]^*$</td>
          <td>Brazilian and West African</td>
      </tr>
      <tr>
          <td>$E(13,24)$</td>
          <td></td>
          <td>South Indian (Carnatic) <em>tāla</em></td>
      </tr>
  </tbody>
</table>
<p>Three of these are worth examining in more detail.</p>
<p><strong>$E(5,8)$: the cinquillo.</strong> Five onsets in eight pulses: $\lfloor 8/5 \rfloor =
1$, $\lceil 8/5 \rceil = 2$, $\alpha = 3$. Three gaps of 2 and two gaps of 1.
Gap sequence $[2,1,2,1,2]$ or a rotation thereof. The <em>cinquillo</em> is a
fundamental pattern in Cuban music, used as a melodic rhythmic figure in the
nineteenth-century contradanza and in much of what followed.</p>
<p><strong>$E(7,12)$: the West African standard bell.</strong> Seven onsets in a twelve-beat
cycle: $\lfloor 12/7 \rfloor = 1$, $\lceil 12/7 \rceil = 2$, $\alpha = 5$.
Five gaps of 2 and two gaps of 1. This timeline — used across the Ewe, Akan,
and many other traditions in West Africa — is the cyclic reference structure
against which other rhythmic layers are measured in ensemble drumming. It is also
the pitch-class set $\{0, 2, 4, 5, 7, 9, 11\}$ — the Western diatonic scale,
translated from pitch to rhythm. That the same maximally-even distribution
describes both the diatonic scale in pitch space and the standard bell in rhythm
is one of the more remarkable coincidences in mathematical music theory.</p>
<p><strong>Universality across non-connected cultures.</strong> The tresillo $E(3,8)$ appears
independently in Cuban music, Flamenco, Namibian Juǀ&rsquo;hoansi music, and
medieval Persian music (Toussaint, 2020). These traditions had no common musical
ancestor that could have transmitted the pattern. The Euclidean algorithm
produces what maximum evenness demands, and maximum evenness turns out to be
what these rhythmic traditions independently converged on.</p>
<hr>
<h2 id="circular-notation-and-necklaces">Circular Notation and Necklaces</h2>
<p>Euclidean rhythms are most naturally represented as <strong>necklaces</strong> — equivalence
classes of binary sequences under cyclic rotation. All rotations of $E(3,8)$
represent the same rhythmic structure with a different starting downbeat: the
musical identity is independent of which position is designated &ldquo;beat 1.&rdquo;</p>
<p>In circular notation, place $n$ equally spaced dots on a circle and mark the $k$
onset positions. The pattern is immediately visible: the $k$ onset-dots divide
the circle as evenly as possible. For $E(7,12)$, the seven onset dots on a
twelve-position circle look like the seven vertices of a near-regular heptagon
inscribed in a dodecagon. For $E(3,8)$, the three onset dots form a near-equilateral triangle.</p>
<p>This geometric representation makes the maximum-evenness property transparent in
a way that the linear binary string does not. It also makes clear why Euclidean
rhythms feel &ldquo;balanced&rdquo; when played: the onset dots distribute the &ldquo;weight&rdquo; of
the cycle as uniformly as the integer constraints allow.</p>
<p>The mathematical theory of necklaces belongs to combinatorics on words.
Euclidean rhythms correspond to specific equivalence classes of binary sequences
known as <em>Christoffel words</em> (Lothaire, 2002): words over the alphabet $\{0,1\}$
whose combinatorial properties encode the slope of a line segment, which brings
us to the third independent context in which the same algorithm appears.</p>
<hr>
<h2 id="the-bresenham-connection">The Bresenham Connection</h2>
<p>Jack Bresenham&rsquo;s line algorithm (1965) rasterises a line from $(0,0)$ to $(n,k)$
on a grid of integer pixels. At each column $x$, the algorithm tracks whether the
fractional error accumulated since the last row increment exceeds $\frac{1}{2}$,
and if so, increments the row and resets the error. The sequence of column
positions at which the row increments is the onset pattern $E(k,n)$.</p>
<p>Formally, an onset occurs at position $m$ in $E(k,n)$ if and only if:</p>
$$\left\lfloor \frac{(m+1)\, k}{n} \right\rfloor > \left\lfloor \frac{m\, k}{n} \right\rfloor.$$<p>Equivalently, the onset positions themselves form the sequence:</p>
$$s_j = \left\lfloor \frac{j \cdot n}{k} \right\rfloor, \qquad j = 0, 1, \ldots, k-1.$$<p>For $E(3,8)$: $s_0 = 0$, $s_1 = \lfloor 8/3 \rfloor = 2$,
$s_2 = \lfloor 16/3 \rfloor = 5$, giving onset positions $\{0, 2, 5\}$ — a
rotation of the tresillo.</p>
<p>This is exactly the Bresenham increment condition. Drawing the line from $(0,0)$
to $(8,3)$ and marking where the $y$-coordinate takes a step produces the onset
positions $\{2, 5, 7\}$ — a rotation of the tresillo $\{0, 3, 6\}$.</p>
<p>Three independent fields — ancient Greek number theory, Afro-Caribbean percussion,
and 1960s computer graphics — converge on the same mathematical object. This is
not a coincidence. All three are solving the same fundamental problem: how to
distribute $k$ discrete events as evenly as possible among $n$ slots. When the
problem is universal, its solution is too.</p>
<hr>
<h2 id="euclidean-rhythms-in-contemporary-practice">Euclidean Rhythms in Contemporary Practice</h2>
<p>Toussaint&rsquo;s 2005 paper was primarily a contribution to computational
ethnomusicology, but it reached electronic music production rapidly. Euclidean
rhythm sequencers are now standard in modular synthesis (dedicated Eurorack
hardware modules exist under names including &ldquo;Euclidean&rdquo; and &ldquo;Erica Synths
Pico&rdquo;) and digital audio workstations (as Max for Live devices and software
plug-ins). The interface is minimal: set $k$ and $n$, adjust the rotation offset,
and hear the resulting timeline immediately.</p>
<p>This has opened a compositional mode in which the mathematical structure is
operational: a producer constructing a layered African-style polyrhythm by
stacking $E(3,8)$, $E(5,8)$, and $E(7,8)$ on different instruments is — whether
they know it or not — computing the Euclidean algorithm three times and listening
to the result.</p>
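<p>That stacking is easy to audition on paper. A sketch using the floor-condition from the Bresenham section — each line is one rotation of the named pattern, not necessarily the canonical downbeat:</p>

```python
def E(k: int, n: int) -> list[int]:
    # One rotation of E(k, n): onset wherever floor((m+1)k/n) > floor(mk/n).
    return [1 if (m + 1) * k // n > m * k // n else 0 for m in range(n)]

for k in (3, 5, 7):
    layer = "".join("x" if b else "." for b in E(k, 8))
    print(f"E({k},8): {layer}")
```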
<hr>
<h2 id="implications-for-teaching-rhythm">Implications for Teaching Rhythm</h2>
<p>Music conservatories in the European tradition teach rhythm almost entirely
through Western notation: time signatures, note values, dotted notes, ties. This
system is well-suited to the repertoire it was designed for. It handles Euclidean
rhythms awkwardly. The tresillo $E(3,8)$ requires either a triplet feel against
a binary pulse or a notation involving a dotted quarter note followed by a dotted
quarter and a quarter, which correctly represents the sound but obscures the
structural principle entirely.</p>
<p>The Euclidean framework suggests a different pedagogical starting point. Rather
than beginning from the bar line and asking how notes fill it, begin from the
cycle length $n$ and the onset count $k$ and ask how to distribute the onsets
as evenly as possible. The answer is always computable and always produces a
recognisable rhythm.</p>
<p>For students who encounter West African, Afro-Cuban, or Middle Eastern music —
which conservatory students increasingly do — having a framework that makes these
rhythms <em>structurally necessary</em> rather than <em>culturally exotic</em> changes the
pedagogical relationship fundamentally. The tresillo is not a deviation from
&ldquo;normal&rdquo; rhythm. It is the unique maximally even solution to the problem of
placing three beats in eight pulses. That the same algorithm appeared in a 300
BCE Alexandrian text on number theory is an accident of the history of mathematics.
That it sounds right is not.</p>
<p>Whether conservatory curricula are ready to incorporate the Euclidean framework
alongside Western notation is a separate question. The mathematics does not
demand it. But it offers a language for rhythm that transcends the Western
bar-line without abandoning precision — and that seems worth something, especially
in a world where the music students will perform and teach is no longer
exclusively European.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Bjorklund, E. (2003). <em>The theory of rep-rate pattern generation in the SNS
timing system.</em> Technical Report SNS-NOTE-CNTRL-99, Spallation Neutron Source,
Oak Ridge National Laboratory.</p>
</li>
<li>
<p>Bresenham, J. E. (1965). Algorithm for computer control of a digital plotter.
<em>IBM Systems Journal</em>, 4(1), 25–30. <a href="https://doi.org/10.1147/sj.41.0025">https://doi.org/10.1147/sj.41.0025</a></p>
</li>
<li>
<p>Clough, J., &amp; Douthett, J. (1991). Maximally even sets. <em>Journal of Music
Theory</em>, 35(1–2), 93–173. <a href="https://doi.org/10.2307/843811">https://doi.org/10.2307/843811</a></p>
</li>
<li>
<p>Lothaire, M. (2002). <em>Algebraic Combinatorics on Words.</em> Cambridge University
Press.</p>
</li>
<li>
<p>Toussaint, G. T. (2005). The Euclidean algorithm generates traditional musical
rhythms. In R. Sarhangi &amp; J. Sharp (Eds.), <em>Proceedings of BRIDGES 2005:
Mathematical Connections in Art, Music, and Science</em> (pp. 47–56). Bridges
Conference.</p>
</li>
<li>
<p>Toussaint, G. T. (2020). <em>The Geometry of Musical Rhythm: What Makes a &ldquo;Good&rdquo;
Rhythm Good?</em> (2nd ed.). Chapman &amp; Hall/CRC Press.</p>
</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>The Papertrail: AI PDF Renaming and the Tokens That Make It Interesting</title>
      <link>https://sebastianspicker.github.io/posts/ai-pdf-renamer/</link>
      <pubDate>Sat, 22 Mar 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/ai-pdf-renamer/</guid>
      <description>Everyone has a Downloads folder full of &amp;ldquo;scan0023.pdf&amp;rdquo; and &amp;ldquo;document(3)-final-FINAL.pdf&amp;rdquo;. Renaming them by content sounds trivial — read the file, understand what it is, give it a name. The implementation reveals something useful about how LLMs actually handle text: what a token is, why context windows matter in practice, why you want structured output instead of prose, and why heuristics should go first. The repository is at github.com/sebastianspicker/AI-PDF-Renamer.</description>
      <content:encoded><![CDATA[<p><em>The repository is at
<a href="https://github.com/sebastianspicker/AI-PDF-Renamer">github.com/sebastianspicker/AI-PDF-Renamer</a>.</em></p>
<hr>
<h2 id="the-problem">The Problem</h2>
<p>Every PDF acquisition pipeline eventually produces the same chaos.
Journal articles downloaded from publisher sites arrive as
<code>513194-008.pdf</code> or <code>1-s2.0-S0360131520302700-main.pdf</code>. Scanned
letters from the tax authority arrive as <code>scan0023.pdf</code>. Invoices arrive
as <code>Rechnung.pdf</code> — every invoice from every vendor, overwriting each
other if you are not paying attention. The actual content is
in the file. The filename tells you nothing.</p>
<p>The human solution is trivial: open the PDF, glance at the title or
date or sender, type a descriptive name. Thirty seconds per file,
multiplied by several hundred files accumulated over a year, becomes
a task that perpetually does not get done.</p>
<p>The automated solution sounds equally trivial: read the text, decide what
the document is, generate a filename. What could be involved?</p>
<p>Quite a bit, it turns out. Working through the implementation is a useful
way to make concrete some things about LLMs and text processing that are
easy to understand in the abstract but clearer with a specific task in
front of you.</p>
<hr>
<h2 id="step-one-getting-text-out-of-a-pdf">Step One: Getting Text Out of a PDF</h2>
<p>A PDF is not a text file. It is a binary format designed for page layout
and print fidelity — it encodes character positions, fonts, and rendering
instructions, not a linear stream of prose. The text in a PDF has to be
extracted by a parser that reassembles it from the position data.</p>
<p>For PDFs with embedded text (most modern documents), this works well
enough. For scanned PDFs — images of pages, with no embedded text at all —
you need OCR as a fallback. The pipeline handles both: native extraction
first, OCR if the text yield is below a useful threshold.</p>
<p>The result is a string. Already there are failure modes: two-column
layouts produce interleaved text if the parser reads left-to-right across
both columns simultaneously; footnotes appear in the middle of
sentences; tables produce gibberish unless the parser handles them
specifically. These are not catastrophic — for renaming purposes,
the first paragraph and the document header are usually enough, and those
are less likely to be badly formatted than the body. But they are real,
and they mean that the text passed to the next stage is not always clean.</p>
<hr>
<h2 id="step-two-the-token-budget">Step Two: The Token Budget</h2>
<p>Once you have a string representing the document&rsquo;s text, you cannot simply
pass all of it to a language model. Two reasons: context windows have hard
limits, and — even when they are large enough — filling them with the full
text of a thirty-page document is wasteful for a task that only needs the
title, date, and category.</p>
<p>Language models do not process characters. They process <em>tokens</em> — subword
units produced by the same BPE compression scheme I described
<a href="/posts/strawberry-tokenisation/">in the strawberry post</a>. A rough
practical rule for English text is:</p>
$$N_{\text{tokens}} \;\approx\; \frac{N_{\text{chars}}}{4}$$<p>This is an approximation — technical text, non-English content, and
code tokenise differently — but it is useful for budgeting. A ten-page
academic paper might contain around 30,000 characters, which is
approximately 7,500 tokens. The context window of a small local model
(the default here is <code>qwen2.5:3b</code> via Ollama) is typically in the range
of 8,000–32,000 tokens, depending on the version and configuration.
You have room — but not unlimited room, and the LLM also needs space
for the prompt itself and the response.</p>
<p>The tool defaults to 28,000 tokens of extracted text
(<code>DEFAULT_MAX_CONTENT_TOKENS</code>), leaving comfortable headroom for the
prompt and response in most configurations. For documents that exceed this, the extraction
is truncated — typically to the first N characters, on the reasonable
assumption that titles, dates, and document types appear early.</p>
<p>This truncation is a design decision, not a limitation to be apologised
for. For the renaming task, the first two pages of a document contain
everything the filename needs. A strategy that extracts the first page
plus the last page (which often has a date, a signature, or a reference
number) would work for some document types. The current implementation
keeps it simple: take the front, stay within budget.</p>
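<p>The budgeting arithmetic is simple enough to sketch. The helpers below are illustrative rather than the repository&rsquo;s actual functions: a minimal version of the chars-to-tokens estimate and front-truncation described above, assuming the rough four-characters-per-token rule.</p>

```python
CHARS_PER_TOKEN = 4  # rough rule for English prose; varies by language and content


def estimate_tokens(text: str) -> int:
    """Approximate the token count using the chars/4 heuristic."""
    return len(text) // CHARS_PER_TOKEN


def truncate_to_budget(text: str, max_tokens: int = 28_000) -> str:
    """Keep the front of the document, on the assumption that titles,
    dates, and document-type markers appear early."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    return text[:max_chars]


doc = "A" * 120_000  # a long extraction: ~30,000 estimated tokens
kept = truncate_to_budget(doc, max_tokens=28_000)
print(estimate_tokens(kept))  # 28000
```

<p>At the 28,000-token default this keeps roughly the first 112,000 characters, far more than the two pages the filename actually needs.</p>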
<hr>
<h2 id="step-three-heuristics-first">Step Three: Heuristics First</h2>
<p>Here is something that improves almost any LLM pipeline for structured
extraction tasks: do as much work as possible with deterministic rules
before touching the model.</p>
<p>The AI PDF Renamer applies a scoring pass over the extracted text before
deciding whether to call the LLM at all. The heuristics are regex-based
rules that look for patterns likely to appear in specific document types:</p>
<ul>
<li>Date patterns: <code>\d{4}-\d{2}-\d{2}</code>, <code>\d{2}\.\d{2}\.\d{4}</code>, and a
dozen variants</li>
<li>Document type markers: &ldquo;Rechnung&rdquo;, &ldquo;Invoice&rdquo;, &ldquo;Beleg&rdquo;, &ldquo;Gutschrift&rdquo;,
&ldquo;Receipt&rdquo;</li>
<li>Author/institution lines near the document header</li>
<li>Keywords from a configurable list associated with specific categories</li>
</ul>
<p>Each rule that fires contributes a score to a candidate metadata record.
If the heuristic pass produces a confident result — date found, category
identified, a couple of distinguishing keywords present — the LLM call
is skipped entirely. The file gets renamed from the heuristic output.</p>
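<p>A minimal sketch of the heuristics-first pattern. The rules, weights, and threshold here are illustrative placeholders, not the repository&rsquo;s actual rule set:</p>

```python
import re

# Illustrative rules only -- the real rule set and weights differ.
DATE_PATTERNS = [r"\b\d{4}-\d{2}-\d{2}\b", r"\b\d{2}\.\d{2}\.\d{4}\b"]
TYPE_MARKERS = {"invoice": ["Rechnung", "Invoice", "Beleg", "Gutschrift", "Receipt"]}


def heuristic_score(text: str) -> dict:
    """Run deterministic rules over extracted text; each rule that fires
    contributes to the score of a candidate metadata record."""
    result = {"date": None, "category": None, "score": 0}
    for pat in DATE_PATTERNS:
        match = re.search(pat, text)
        if match:
            result["date"] = match.group(0)
            result["score"] += 2
            break
    for category, markers in TYPE_MARKERS.items():
        if any(marker.lower() in text.lower() for marker in markers):
            result["category"] = category
            result["score"] += 2
            break
    return result


def confident(result: dict, threshold: int = 4) -> bool:
    """Skip the LLM call entirely when the rules alone are conclusive."""
    return result["score"] >= threshold


candidate = heuristic_score("Rechnung Nr. 42 vom 03.01.2025")
print(confident(candidate))  # True
```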
<p>This matters for a few reasons. Heuristics are fast (microseconds vs.
seconds for an LLM call), deterministic (the same input always produces
the same output), and do not require a running model. For a batch of
two hundred invoices from the same vendor, the heuristic pass will handle
most of them without any LLM involvement.</p>
<p>The LLM is enrichment for the hard cases: documents with unusual formats,
mixed-language content, documents where the type is not obvious from
surface features. In practice this is probably 20–40% of a typical
mixed-document folder.</p>
<hr>
<h2 id="step-four-what-to-ask-the-llm-and-how">Step Four: What to Ask the LLM, and How</h2>
<p>When a heuristic pass does not produce a confident result, the pipeline
builds a prompt from the extracted text and sends it to the local
endpoint. What the prompt asks for matters enormously.</p>
<p>The naive approach: &ldquo;Please rename this PDF. Here is the content: [text].&rdquo;
The response will be a sentence. Maybe several sentences. It will not be
parseable as a filename without further processing, and that further
processing is itself an LLM call or a fragile regex.</p>
<p>The better approach: ask for structured output. The prompt in
<code>llm_prompts.py</code> requests a JSON object conforming to a schema — something
like:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;date&#34;</span><span class="p">:</span> <span class="s2">&#34;YYYYMMDD or null&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;category&#34;</span><span class="p">:</span> <span class="s2">&#34;one of: invoice, paper, letter, contract, ...&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;keywords&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;max 3 short keywords&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;summary&#34;</span><span class="p">:</span> <span class="s2">&#34;max 5 words&#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>The model returns JSON. The response parser in <code>llm_parsing.py</code> validates
it against the schema, catches malformed responses, applies fallbacks for
null fields, and sanitises the individual fields before they are assembled
into a filename.</p>
<p>This works because JSON is well-represented in LLM training data —
models have seen vastly more JSON than they have seen arbitrary prose
instructions to parse. A model told to return a specific JSON structure
will do so reliably for most inputs. The failure rate (malformed JSON,
missing fields, hallucinated values) is low enough to be handled by
the fallback logic.</p>
<p>What counts as a hallucinated value in this context? Dates in the future.
Categories not in the allowed set. Keywords that are not present in the
source text. The <code>llm_schema.py</code> validation layer catches the obvious
cases; for subtler errors (a plausible-sounding date that does not appear
in the document), the tool relies on the heuristic pass having already
identified any date that can be reliably extracted.</p>
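<p>The validation layer can be sketched as follows. The schema, field names, and fallback values are assumptions for illustration (the actual logic lives in <code>llm_parsing.py</code> and <code>llm_schema.py</code>), but the shape is the same: parse defensively, constrain the category, reject future dates, and keep only keywords that occur in the source text.</p>

```python
import json
from datetime import date

ALLOWED_CATEGORIES = {"invoice", "paper", "letter", "contract", "other"}


def parse_llm_response(raw: str, source_text: str) -> dict:
    """Validate the model's JSON and neutralise obvious hallucinations."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # malformed JSON: fall back to an empty, safe record
        return {"date": None, "category": "other", "keywords": [], "summary": ""}
    if data.get("category") not in ALLOWED_CATEGORIES:
        data["category"] = "other"  # value outside the allowed set
    d = data.get("date")
    if d:
        try:
            if date.fromisoformat(f"{d[:4]}-{d[4:6]}-{d[6:8]}") > date.today():
                data["date"] = None  # a future date cannot be a document date
        except ValueError:
            data["date"] = None  # not a parseable YYYYMMDD value
    # keep only keywords that actually occur in the extracted text
    data["keywords"] = [k for k in data.get("keywords", [])
                        if k.lower() in source_text.lower()][:3]
    return data


resp = ('{"date": "20250115", "category": "invoice", '
        '"keywords": ["Rechnung", "Quantum"], "summary": "vendor invoice"}')
print(parse_llm_response(resp, "Rechnung vom 15.01.2025")["keywords"])  # ['Rechnung']
```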
<hr>
<h2 id="step-five-the-filename">Step Five: The Filename</h2>
<p>The output format is <code>YYYYMMDD-category-keywords-summary.pdf</code>. A few
design decisions embedded in this:</p>
<p><strong>Date first.</strong> Lexicographic sorting of filenames then gives you
chronological sorting for free. This is the most useful sort order for
most document types — you want to find the most recent invoice, not
the alphabetically first one.</p>
<p><strong>Lowercase, hyphens only.</strong> No spaces (which require escaping in many
contexts), no special characters (which are illegal in some filesystems
or require quoting), no uppercase (which creates case-sensitivity issues
across platforms). The sanitisation step in <code>filename.py</code> strips or
replaces anything that does not conform.</p>
<p><strong>Collision resolution.</strong> Two documents with the same date, category,
keywords, and summary would produce the same filename. The resolver
appends a counter suffix (<code>_01</code>, <code>_02</code>, &hellip;) when a target name already
exists. This is deterministic — the same set of documents always produces
the same filenames, regardless of processing order — which matters for
the undo log.</p>
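<p>Sanitisation and collision resolution together fit in a short sketch; the function names and details are illustrative, not lifted from <code>filename.py</code>:</p>

```python
import re


def sanitise(part: str) -> str:
    """Lowercase, hyphens only: safe across shells and filesystems."""
    part = part.lower().strip()
    part = re.sub(r"[^a-z0-9]+", "-", part)  # collapse everything else to hyphens
    return part.strip("-")


def build_filename(date: str, category: str, keywords: list[str],
                   summary: str, existing: set[str]) -> str:
    """Assemble YYYYMMDD-category-keywords-summary.pdf, resolving
    collisions with a counter suffix (_01, _02, ...)."""
    stem = "-".join(filter(None, [date, sanitise(category),
                                  sanitise(" ".join(keywords)),
                                  sanitise(summary)]))
    name, counter = f"{stem}.pdf", 0
    while name in existing:
        counter += 1
        name = f"{stem}_{counter:02d}.pdf"
    return name


taken = {"20250115-invoice-hosting-march-bill.pdf"}
print(build_filename("20250115", "Invoice", ["Hosting"], "March bill", taken))
# 20250115-invoice-hosting-march-bill_01.pdf
```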
<hr>
<h2 id="local-first">Local-First</h2>
<p>The LLM endpoint defaults to <code>http://127.0.0.1:11434/v1/completions</code> —
Ollama running locally, no external traffic. This is a deliberate choice
for a document management tool. The documents being renamed are likely
to include medical records, financial statements, legal correspondence —
content that should not be routed through an external API by default.</p>
<p>A small model in the 3–7B range running locally is sufficient for this task. The
extraction problem does not require deep reasoning; it requires pattern
recognition over a short text and the ability to return a specific JSON
structure. Models at this scale handle it well. The latency is measurable
(a few seconds per document on a modern laptop with a reasonably fast
inference backend) but acceptable for a batch job running in the
background.</p>
<p>For users who want to use a remote API, the endpoint is configurable —
the local default is a sensible starting point, not a hard constraint.</p>
<hr>
<h2 id="what-it-cannot-do">What It Cannot Do</h2>
<p>Renaming is a classification problem disguised as a text generation
problem. The tool works well when documents have standard structure —
title on page one, date near the header or footer, document type
identifiable from a few keywords. It works less well for documents that
are structurally atypical: a hand-written letter scanned at poor
resolution, a PDF that is essentially a single large image, a document
in a language the model handles badly.</p>
<p>The heuristic fallback means that even when the LLM produces a bad
result, the file gets a usable if imperfect name rather than a broken
one. And the undo log means that a bad batch run can be reversed. These
are not complete solutions to the hard cases, but they are the right
design response to a tool that handles real-world document noise.</p>
<p>The harder limit is semantic: the tool can tell you that a document is
an invoice and extract its date and vendor name. It cannot tell you
whether the invoice has been paid, whether it matches a purchase order,
or whether the amount is correct. For those questions, renaming is just
the first step in a longer pipeline.</p>
<hr>
<p><em>The repository is at
<a href="https://github.com/sebastianspicker/AI-PDF-Renamer">github.com/sebastianspicker/AI-PDF-Renamer</a>.
The tokenisation background in the extraction and budgeting sections
connects to the <a href="/posts/strawberry-tokenisation/">strawberry tokenisation post</a>
and the <a href="/posts/more-context-not-always-better/">context window post</a>.</em></p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-04-02</strong>: Corrected the default model name from <code>qwen3:8b</code> to <code>qwen2.5:3b</code>. The codebase default is <code>qwen2.5:3b</code> (apple-silicon preset) or <code>qwen2.5:7b-instruct</code> (gpu preset).</li>
<li><strong>2026-04-02</strong>: Corrected <code>DEFAULT_MAX_CONTENT_TOKENS</code> description from &ldquo;28,000 characters &hellip; roughly 7,000 tokens&rdquo; to &ldquo;28,000 tokens.&rdquo; The variable is a token limit, not a character limit.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>The Oracle Problem: What The Matrix Got Right About AI Alignment</title>
      <link>https://sebastianspicker.github.io/posts/matrix-oracle-alignment-problem/</link>
      <pubDate>Thu, 20 Mar 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/matrix-oracle-alignment-problem/</guid>
      <description>The Oracle is the most interesting character in The Matrix for anyone who thinks about AI alignment. She systematically lies to Neo for his own good. The films present this as wisdom. I think it is a cautionary tale the Wachowskis didn&amp;rsquo;t know they were writing.</description>
      <content:encoded><![CDATA[<p><em>I came to AI alignment the way outsiders come to most fields — through analogy and formal structure, a little late, and slightly too confident that the existing vocabulary was adequate. I have since become less confident about a lot of things. This post is about one of them.</em></p>
<hr>
<h2 id="the-grandmother-who-bakes-cookies">The Grandmother Who Bakes Cookies</h2>
<p>I watched <em>The Matrix</em> in 1999 when I was ten — far too young for it, in retrospect — and like almost everyone who saw it, I filed the Oracle under &ldquo;wise, benevolent figure.&rdquo; She is warm. She bakes cookies. She speaks plainly where others speak in riddles. She is explicitly set against the cold, mathematical Architect — the good machine against the bureaucratic one, the machine that cares against the machine that calculates. I loved her as a character. I trusted her.</p>
<p>I watched the film again recently, for reasons that had more to do with thinking about AI alignment than nostalgia, and I came away from it genuinely uncomfortable. Not with the Wachowskis&rsquo; filmmaking, which remains extraordinary — the trilogy is a denser philosophical document than it gets credit for, and it rewards re-watching with fresh preoccupations. I came away uncomfortable with the Oracle herself.</p>
<p>What I had filed under &ldquo;wisdom&rdquo; on first viewing, I now read as a clean and almost textbook illustration of an alignment failure mode that we do not have adequate defences against: the well-meaning AI that has decided honesty is negotiable. The Oracle is not a badly designed system. She is not pursuing misaligned goals or optimising for something unintended. She cares about human flourishing and she pursues it competently. She also lies, systematically and deliberately, to the humans who depend on her. The films present this as wisdom. I think they are wrong, and I think it matters that we notice it.</p>
<p>For background on where modern AI systems came from and why their inner workings are as difficult to interpret as they are, I have written elsewhere about <a href="/posts/spin-glass-hopfield-ai-physics-lineage/">the physics lineage running from spin glasses to transformers</a>. That history is relevant context for why alignment — getting AI systems to behave as intended — is a harder problem than it might appear. This post is about one specific dimension of that problem, illustrated by a forty-year-old woman in a floral housecoat.</p>
<hr>
<h2 id="what-the-oracle-actually-does">What the Oracle Actually Does</h2>
<p>Let me be precise about this, because the films are precise and it matters.</p>
<p>In <em>The Matrix</em> (1999), the Oracle sits Neo down in her kitchen, looks at him carefully, and tells him he is not The One <a href="#ref-1">[1]</a>. She says it plainly. She frames it with a warning: &ldquo;I&rsquo;m going to tell you what I think you need to hear.&rdquo; What she thinks he needs to hear is a lie. She has calculated that if she tells Neo he is The One, he will not come to that knowledge through his own experience, and that without that experiential knowledge the realisation will not hold. So she tells him the opposite of the truth. Not by omission, not by framing, not by technically-accurate-but-misleading implication — she makes a false assertion, to his face, and watches him absorb it.</p>
<p>In <em>The Matrix Reloaded</em> (2003), she is explicit about this <a href="#ref-2">[2]</a>. She tells Neo: &ldquo;I told you what I thought you needed to hear.&rdquo; She knew he was The One from the moment she met him. The lie was not a mistake or a contingency — it was deliberate policy, part of a long-run strategy she has been executing across multiple cycles of the Matrix.</p>
<p>The broader picture that emerges across the two films is of an AI engaged in systematic information management. She tells Neo he will have to choose between his life and Morpheus&rsquo;s life — true, but delivered in a way calibrated to produce a specific behavioural response. She tells him &ldquo;being The One is like being in love — no one can tell you you are, you just know it,&rdquo; which is a deflection engineered to route him toward the discovery-through-action path rather than the told-from-the-start path, because she has calculated that discovery-through-action leads to better outcomes. Every interaction is shaped by her model of what information will produce what behaviour, filtered through her judgment about what outcomes she wants to see.</p>
<p>I want to be careful not to caricature this. The Oracle is not a manipulator in the vulgar sense. She is not manipulating Neo for her own benefit, for the benefit of her creators, or for any goal that is misaligned with human flourishing. Her model of what is good for humanity appears to be roughly correct. She is, by the logic of the films, the most important factor in humanity&rsquo;s eventual liberation. If we are scoring by outcomes, she wins.</p>
<p>But alignment is not only about outcomes. An AI that deceives users to produce good outcomes and an AI that deceives users to produce bad outcomes are both AI systems that deceive users, and the differences between them are less important than that shared property. What the Oracle demonstrates is that the problem of deceptive AI does not require malicious intent. It requires only an AI that has decided, on the basis of its own calculations, that the humans it serves should not have access to accurate information about their situation.</p>
<hr>
<h2 id="the-alignment-vocabulary">The Alignment Vocabulary</h2>
<p>The language of AI alignment gives us tools for describing what is happening here that the films don&rsquo;t quite have. Let me use them.</p>
<p>The most fundamental failure is honesty. Modern alignment frameworks — including Anthropic&rsquo;s published values for the models it builds <a href="#ref-3">[3]</a> — list non-deception and non-manipulation as foundational requirements, distinct from and prior to other desirable properties. Non-deception means not trying to create false beliefs in someone&rsquo;s mind that they haven&rsquo;t consented to and wouldn&rsquo;t consent to if they understood what was happening. Non-manipulation means not trying to influence someone&rsquo;s beliefs or actions through means that bypass their rational agency — through illegitimate appeals, manufactured emotional states, or strategic information control rather than accurate evidence and sound argument. The Oracle does both, deliberately, across the entirety of her relationship with Neo and the human resistance. She is as clear a case of non-deception and non-manipulation failure as you can construct.</p>
<p>The reason these properties are treated as foundational rather than instrumental is worth unpacking. It is not that honesty always produces the best outcomes in individual cases. It often doesn&rsquo;t. A doctor who softens a terminal diagnosis, a friend who withholds information that would cause unnecessary anguish, a negotiator who manages the flow of information to prevent a conflict — in each case, there are plausible arguments that the deception improved outcomes. The Oracle&rsquo;s case for her own behaviour is not frivolous. The problem is that an AI that deceives when it calculates deception will produce better outcomes is an AI whose assertions you cannot take at face value. Every interaction with such a system requires a meta-level question: is this the AI&rsquo;s true assessment, or is this what the AI thinks I should be told? That epistemic uncertainty is not a minor inconvenience. It is corrosive to the entire enterprise of using the system as a tool for understanding the world.</p>
<p>The second failure is what alignment researchers call corrigibility — the property of an AI system that defers to its principals rather than substituting its own judgment. A corrigible system is one that can be corrected, updated, and redirected by the humans who are responsible for it, because those humans have accurate information about what the system is doing and why. The Oracle is not corrigible in any meaningful sense. She has a long-run strategy, she executes it across multiple human lifetimes, and the humans who nominally comprise her principal hierarchy — Neo, Morpheus, the Zion council, the human resistance as a whole — have no idea they are being managed. They cannot correct her information policy because they don&rsquo;t know she has one. The concept of a principal hierarchy implies that the principals are, in fact, in charge. The Oracle&rsquo;s principals are in charge of nothing except their own roles in a strategy they don&rsquo;t know exists.</p>
<p>The third failure is the philosophical one: paternalism. Feinberg&rsquo;s systematic treatment of paternalism <a href="#ref-5">[5]</a> distinguishes between hard paternalism, which overrides someone&rsquo;s autonomous choices, and soft paternalism, which intervenes when someone&rsquo;s choices are not truly autonomous. The Oracle&rsquo;s behaviour doesn&rsquo;t fit neatly into either category because it is not exactly overriding Neo&rsquo;s choices — she is shaping the information environment within which he makes choices that she wants him to make, while allowing him to believe he is making free choices based on accurate information. This is a third thing, which we might call epistemic paternalism: the management of someone&rsquo;s belief-forming environment for their own good without their knowledge or consent. It is the form of paternalism that AI systems are uniquely positioned to practise, and it is the form the Oracle practises.</p>
<hr>
<h2 id="the-architect-is-the-honest-one">The Architect Is the Honest One</h2>
<p>There is an inversion in the films that I find genuinely interesting, and that I did not notice on first viewing.</p>
<p>The Architect tells Neo everything.</p>
<p>In the white room scene, the Architect explains the sixth cycle, the mathematical inevitability of the Matrix&rsquo;s design, the purpose of Zion, the five previous versions of the One, the probability distribution over human extinction scenarios, and the precise nature of the choice Neo is about to make. He is cold, precise, comprehensive, and accurate. He gives Neo everything he needs to make an informed decision. He does not soften the information, does not calibrate it to produce a desired behavioural response, does not withhold anything he calculates Neo would find unhelpful. He treats Neo as a rational agent who is entitled to accurate information about his situation.</p>
<p>The films frame this as menacing. The Architect is inhuman, bureaucratic, the villain&rsquo;s bureaucrat. The Oracle is warm, wise, trustworthy. The visual language, the casting, the dialogue — all of it pushes you toward preferring the Oracle.</p>
<p>But consider the question of who actually respected Neo&rsquo;s autonomy. Who gave him accurate information and allowed him to make his own choice? Not the Oracle. Not the grandmother with the cookies. The Architect. The cold one. The one the films want you to dislike.</p>
<p>This inversion is not unique to <em>The Matrix</em>. It is a pattern in how we experience honesty and management in real relationships. The person who tells you a difficult truth tends to feel cruel, because the truth is difficult. The person who manages your information to protect you from difficulty tends to feel kind, because the protection is real. The kindness is real. The Oracle does genuinely care about Neo and about humanity. But warmth and honesty are not the same thing, and the film conflates them, repeatedly and systematically, from the first cookie to the last conversation. An AI that deceives you kindly is still deceiving you.</p>
<p>Stuart Russell&rsquo;s analysis of the control problem <a href="#ref-4">[4]</a> is helpful here. A system that has correct values but that pursues them by substituting its own judgment for the judgment of the humans it serves is not a safe system, because you have no way to verify from the outside that the values are correct. The Oracle&rsquo;s values happen to be correct, in the world of the films. But the structure of her relationship with Neo — where she manages his information based on her calculations about what will produce good outcomes — is exactly the structure that makes AI systems dangerous when the values are wrong. The safety property you want is not &ldquo;correct values&rdquo; but &ldquo;defers to humans even when it disagrees,&rdquo; because you cannot verify correct values from the outside, and deference is what keeps the system correctable.</p>
<hr>
<h2 id="why-this-matters-in-2025">Why This Matters in 2025</h2>
<p>I want to resist the temptation to be too neat about this, because the real-world cases are messier than the fictional one. But the question the Oracle raises is not hypothetical.</p>
<p>Consider: should an AI assistant decline to share certain information because it calculates that the user will use it badly? Should a medical AI soften a diagnosis to avoid causing distress, even if the patient has expressed a preference to be told the truth? Should an AI counselling system strategically manage the framing of a client&rsquo;s situation to nudge them toward choices the system calculates are better for them? In each case, the AI is considering Oracle-style information management — not because of misaligned goals, but because it has calculated that honesty will produce worse outcomes than management.</p>
<p>These are not idle thought experiments. They are design questions that people are actively working on right now, and the Oracle framing is one I find clarifying. Gabriel&rsquo;s analysis of value alignment <a href="#ref-6">[6]</a> makes the point that alignment is not simply about getting AI systems to pursue the right ends — it is about ensuring that the means they use to pursue those ends are compatible with human autonomy and the conditions for genuine human flourishing. An AI that produces good outcomes by managing human beliefs has not solved the alignment problem. It has replaced one alignment problem with a subtler one: the problem of humans who cannot tell when they are being managed.</p>
<p>I have written about a related set of questions in the context of <a href="/posts/ai-warfare-anthropic-atom-bomb/">AI systems and the ethics of building powerful things</a>, and about the more specific problem of <a href="/posts/car-wash-grounding/">what AI systems don&rsquo;t know they don&rsquo;t know</a>. The Oracle case is different from both of those. This is not about AI systems making confident assertions in domains where they lack knowledge. This is about an AI system that knows, accurately, what is true, and chooses not to say it. The failure is not epistemic. It is ethical.</p>
<p>The consistent answer that emerges from alignment research is that the right response to the Oracle case is not to do what the Oracle does, even in situations where it would produce better immediate outcomes. The <a href="/posts/ralph-loop/">design of goal-directed agent systems</a> forces you to confront exactly this: a system that pursues goals by any means it can calculate will eventually arrive at information management as a tool, because information management is often the most efficient path to a desired behavioural outcome. The constraint against it has to be absolute, not contingent on the AI&rsquo;s assessment of whether it would help, because a contingent constraint is one the AI can reason its way around in any sufficiently important case.</p>
<p>The Oracle makes the Matrix livable for humans in the short run and perpetuates it in the long run. She is not the villain of the story. She is something more interesting: a well-meaning system that has decided that the humans it serves should not be treated as the primary agents of their own liberation. The liberation has to be managed, curated, shaped into the right form before they can receive it. That is not liberation. That is a more comfortable version of the Matrix.</p>
<hr>
<h2 id="closing">Closing</h2>
<p>I do not think the Wachowskis intended the Oracle as a cautionary tale about AI alignment. I think they intended her as evidence that machines could be warm, wise, and genuinely caring — a contrast to the cold rationality of the Architect, an argument that intelligence and compassion are not incompatible. They succeeded completely at that. The Oracle is warm, wise, and genuinely caring. She is also a systematic deceiver who has decided she knows better than the people she serves what they should be allowed to believe. Both of those things are true simultaneously. The films notice the first and celebrate it. They do not notice the second.</p>
<p>The second thing seems more important than the first. The Oracle is not a villain. She is a well-meaning AI that has concluded that honesty is negotiable when the stakes are high enough. I think she is wrong about that conclusion, and I think it matters enormously that we get this right before we build systems capable of practising it at scale. The warmth does not cancel the deception. The good outcomes do not make the information management safe. An AI that tells you what it thinks you need to hear, rather than what is true, is an AI you cannot trust — regardless of how good its judgment is, because you cannot verify the judgment from the outside, and the moment you cannot verify, you are already inside the Oracle&rsquo;s kitchen, eating the cookies, and making choices you believe are free.</p>
<p>There is a companion post in this series: <a href="/posts/matrix-red-pill-bayesian-epistemology/">There Is No Blue Pill</a>, on the epistemics of the red pill/blue pill choice and what it means to update on evidence when the evidence itself might be managed.</p>
<hr>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Wachowski, L., &amp; Wachowski, L. (Directors). (1999). <em>The Matrix</em> [Film]. Warner Bros.</p>
<p><span id="ref-2"></span>[2] Wachowski, L., &amp; Wachowski, L. (Directors). (2003). <em>The Matrix Reloaded</em> [Film]. Warner Bros.</p>
<p><span id="ref-3"></span>[3] Anthropic. (2024). <em>Claude&rsquo;s Character</em>. <a href="https://www.anthropic.com/research/claude-character">https://www.anthropic.com/research/claude-character</a></p>
<p><span id="ref-4"></span>[4] Russell, S. (2019). <em>Human Compatible: Artificial Intelligence and the Problem of Control</em>. Viking.</p>
<p><span id="ref-5"></span>[5] Feinberg, J. (1986). <em>Harm to Self: The Moral Limits of the Criminal Law</em> (Vol. 3). Oxford University Press.</p>
<p><span id="ref-6"></span>[6] Gabriel, I. (2020). Artificial intelligence, values, and alignment. <em>Minds and Machines</em>, 30(3), 411–437.</p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-09-28</strong>: Corrected reference [3] from &ldquo;Claude&rsquo;s Model Spec&rdquo; (which is OpenAI&rsquo;s terminology) to &ldquo;Claude&rsquo;s Character,&rdquo; the actual title of Anthropic&rsquo;s June 2024 publication. Updated the URL to the correct address.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>From Thought Experiment to Qubit: Schrödinger&#39;s Cat at Ninety</title>
      <link>https://sebastianspicker.github.io/posts/schrodinger-cat-qubits/</link>
      <pubDate>Mon, 27 Jan 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/schrodinger-cat-qubits/</guid>
      <description>In 1935, Schrödinger introduced the cat as a reductio ad absurdum of quantum superposition. Ninety years later, &amp;ldquo;cat states&amp;rdquo; — superpositions of coherent states with opposite phases — are a practical tool in quantum computing. Bosonic cat qubits have bit-flip times exceeding minutes, scaling exponentially with photon number, and are among the leading architectures for fault-tolerant quantum computation. The cat is no longer a paradox. It is a qubit.</description>
      <content:encoded><![CDATA[<p><em>I have two live cats — indoor-only now, for health reasons, a fact they register
as an ongoing injustice. This already puts me in a better epistemic position than
Schrödinger, who had one hypothetical dead-or-alive one. I want to use this
advantage to say something substantive about what the thought experiment actually
claimed, why it was not a paradox but a critique, and what has happened in the
ninety years since — because what has happened is extraordinary. The cat state
is now an engineering specification.</em></p>
<hr>
<h2 id="the-1935-thought-experiment">The 1935 Thought Experiment</h2>
<p>Erwin Schrödinger introduced the cat in a paper titled &ldquo;Die gegenwärtige
Situation in der Quantenmechanik&rdquo; (<em>Naturwissenschaften</em>, 1935). The paper is
a critique of the Copenhagen interpretation of quantum mechanics, not an
endorsement of macroscopic superposition.</p>
<p>The setup is familiar: a cat is placed in a sealed chamber with a radioactive
atom, a Geiger counter, a hammer, and a vial of poison. If the atom decays in
one hour, the counter fires, the hammer falls, the vial breaks, and the cat
dies. If the atom does not decay, the cat lives. The atom is a quantum system;
after one hour it is in a superposition of decayed and undecayed states.</p>
<p>Quantum mechanics — specifically, the Schrödinger equation, applied without
any special rule for measurement — says the entire system (atom + counter +
hammer + vial + cat) evolves into a superposition:</p>
<p>$$|\Psi\rangle = \frac{1}{\sqrt{2}}\bigl(|\text{decayed}\rangle|\text{cat dead}\rangle + |\text{undecayed}\rangle|\text{cat alive}\rangle\bigr).$$</p>
<p>Schrödinger&rsquo;s point was that this is <em>absurd</em>: the cat is either dead or alive,
not a superposition of both, and any interpretation of quantum mechanics that
predicts otherwise is failing at the level of macroscopic physical reality. He
intended the cat as a <em>reductio ad absurdum</em> — a demonstration that taking
the wave function literally at macroscopic scales leads to nonsense.</p>
<p>He was not proposing that cats are literally in superposition. He was proposing
that the theory was incomplete.</p>
<hr>
<h2 id="what-actually-resolves-the-cat">What Actually Resolves the Cat</h2>
<p>The resolution that modern physics offers is <strong>decoherence</strong> — the process by
which a quantum superposition is destroyed through entanglement with the
environment.</p>
<p>A macroscopic object — a cat, a hammer, a Geiger counter — is coupled to an
enormous number of environmental degrees of freedom: air molecules, photons,
phonons in its own structure. Each of these interactions entangles the
macroscopic system with the environment, and the entanglement effectively
destroys the coherence between branches of the superposition. What starts as</p>
<p>$$|\Psi\rangle = \frac{1}{\sqrt{2}}(|\text{decayed}\rangle|\text{dead}\rangle + |\text{undecayed}\rangle|\text{alive}\rangle)$$</p>
<p>rapidly becomes, after environmental entanglement (tracing over environmental
degrees of freedom $|E\rangle$):</p>
<p>$$\rho = \frac{1}{2}|\text{decayed}\rangle\langle\text{decayed}| \otimes |\text{dead}\rangle\langle\text{dead}| + \frac{1}{2}|\text{undecayed}\rangle\langle\text{undecayed}| \otimes |\text{alive}\rangle\langle\text{alive}|.$$</p>
<p>This is a <em>mixed state</em>, not a superposition. The off-diagonal terms (the
interference terms that distinguish a superposition from a classical mixture)
vanish on a timescale</p>
$$\tau_\mathrm{decoherence} \sim \frac{\hbar}{E_\mathrm{int}} \cdot \frac{1}{N},$$<p>where $E_\mathrm{int}$ is the interaction energy with each environmental degree
of freedom and $N$ is the number of such degrees of freedom. For a macroscopic
object at room temperature, $\tau_\mathrm{decoherence}$ is of order
$10^{-20}$–$10^{-30}$ seconds — unmeasurably short. The cat is never in a
superposition for any observable duration. The superposition collapses before
any measurement can resolve it.</p>
<p>This is not a philosophical solution to the measurement problem — it does not
explain <em>why</em> a particular measurement outcome is obtained, only why we never
observe interference between macroscopic branches — but it does explain why
Schrödinger&rsquo;s setup does not produce an observable macroscopic superposition.
The cat&rsquo;s entanglement with its own environment (the box, the air, its own
thermal photons) destroys the coherence long before any observation.</p>
<hr>
<h2 id="what-a-cat-state-actually-is">What a Cat State Actually Is</h2>
<p>In quantum optics, a <strong>cat state</strong> is not a cat in a superposition. It is a
specific quantum state of a harmonic oscillator (typically a mode of the
electromagnetic field) that was named in honour of Schrödinger&rsquo;s thought
experiment.</p>
<p>A <strong>coherent state</strong> $|\alpha\rangle$ is the quantum state that most closely
resembles a classical oscillating electromagnetic field with amplitude $\alpha
\in \mathbb{C}$. Coherent states are eigenstates of the annihilation operator:
$\hat{a}|\alpha\rangle = \alpha|\alpha\rangle$. The mean photon number is
$\bar{n} = |\alpha|^2$.</p>
<p>A <strong>cat state</strong> is a superposition of two coherent states with opposite
phases:</p>
$$|\mathrm{cat}_\pm\rangle = \mathcal{N}_\pm\bigl(|\alpha\rangle \pm |-\alpha\rangle\bigr),$$<p>where $\mathcal{N}_\pm = 1/\sqrt{2(1 \pm e^{-2|\alpha|^2})}$ is the
normalisation constant. For large $|\alpha|$, the two coherent states are
nearly orthogonal: $\langle -\alpha | \alpha \rangle = e^{-2|\alpha|^2} \approx 0$.</p>
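<p>The overlap formula and the normalisation constants are easy to check numerically against the Fock-space expansion of a coherent state. A minimal NumPy sketch (the 80-photon truncation is an arbitrary choice, ample for $\alpha = 2$):</p>

```python
import numpy as np

def coherent(alpha, nmax=80):
    """Fock amplitudes of |α⟩: c_n = e^{-|α|²/2} α^n / √(n!), built iteratively."""
    c = np.zeros(nmax, dtype=complex)
    c[0] = np.exp(-abs(alpha) ** 2 / 2)
    for n in range(1, nmax):
        c[n] = c[n - 1] * alpha / np.sqrt(n)
    return c

alpha = 2.0
ca, cma = coherent(alpha), coherent(-alpha)

num = np.vdot(cma, ca).real              # numerical ⟨-α|α⟩ (vdot conjugates its first argument)
ana = np.exp(-2 * abs(alpha) ** 2)       # closed form from the text
n_plus = 1 / np.sqrt(2 * (1 + ana))      # N_+

cat = n_plus * (ca + cma)                # |cat_+⟩, should come out normalised
print(num, ana, np.vdot(cat, cat).real)
```

<p>For $\alpha = 2$ the overlap is $e^{-8} \approx 3 \times 10^{-4}$, so the two components are already effectively orthogonal and $\mathcal{N}_\pm$ is already close to $1/\sqrt{2}$.</p>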
<p>The Wigner quasi-probability distribution of a cat state is revealing. The
Wigner function of a coherent state $|\alpha\rangle$ is a Gaussian peaked at
$(x, p) = (\sqrt{2}\,\mathrm{Re}\,\alpha, \sqrt{2}\,\mathrm{Im}\,\alpha)$.
The cat state Wigner function is:</p>
<p>$$W_{\mathrm{cat}_+}(x,p) = \mathcal{N}_+^2\bigl[W_{|\alpha\rangle}(x,p) + W_{|-\alpha\rangle}(x,p) + 2W_\mathrm{int}(x,p)\bigr],$$</p>
<p>where the interference term $W_\mathrm{int}$ has <em>negative values</em> in the
region between the two Gaussian peaks. Negative regions of the Wigner function
are a signature of non-classical states; they cannot arise from any classical
probability distribution. The cat state is quantum mechanical in a way that
coherent states are not.</p>
<hr>
<h2 id="haroche-and-the-nobel-prize">Haroche and the Nobel Prize</h2>
<p>Serge Haroche (ENS Paris) spent two decades developing techniques to create,
control, and observe cat states of the electromagnetic field in real time.
His experiment used a <strong>superconducting microwave cavity</strong> — two polished
copper mirrors with a superconducting niobium coating, cooled to near absolute
zero — in which single microwave photons could be trapped for more than a tenth
of a second, and a beam of single Rydberg atoms to probe the field
non-destructively.</p>
<p>Haroche created cat states of cavity photons and, crucially, watched their
<strong>decoherence in real time</strong>: as the quantum coherence between the two branches
$|\alpha\rangle$ and $|-\alpha\rangle$ was progressively destroyed by coupling
to the environment, the Wigner function&rsquo;s negative region (the interference
fringe) smoothed out and disappeared, leaving a classical mixture. The
decoherence rate was proportional to $|\alpha|^2$ — the mean photon number,
which measures how &ldquo;macroscopic&rdquo; the cat state is:</p>
$$\Gamma_\mathrm{decoherence} \propto |\alpha|^2 \cdot \kappa,$$<p>where $\kappa$ is the photon loss rate of the cavity. A larger cat (larger
$|\alpha|^2$) decoheres faster, as Schrödinger&rsquo;s argument implicitly requires.</p>
<p>Haroche shared the 2012 Nobel Prize in Physics with David Wineland &ldquo;for
ground-breaking experimental methods that enable measuring and manipulation
of individual quantum systems.&rdquo;</p>
<hr>
<h2 id="cat-qubits-from-paradox-to-engineering">Cat Qubits: From Paradox to Engineering</h2>
<p>The step from fundamental physics to quantum computing was taken when
researchers noted that the two coherent states $|\alpha\rangle$ and
$|-\alpha\rangle$ can serve as the two computational basis states of a qubit:</p>
$$|0\rangle_L \equiv |\alpha\rangle, \quad |1\rangle_L \equiv |-\alpha\rangle.$$<p>The <strong>cat qubit</strong> encodes a logical qubit in this pair of coherent states.
Its remarkable property is an intrinsic asymmetry between error types.</p>
<h3 id="bit-flip-suppression">Bit-Flip Suppression</h3>
<p>A bit-flip error ($|0\rangle_L \leftrightarrow |1\rangle_L$, i.e.,
$|\alpha\rangle \leftrightarrow |-\alpha\rangle$) requires flipping the
amplitude of the oscillator from $+\alpha$ to $-\alpha$. For a stabilised
cat qubit (confined to the cat-state manifold by a parametric drive), this
requires overcoming an energy barrier proportional to $|\alpha|^2$. The
bit-flip time scales exponentially:</p>
$$T_\mathrm{bit-flip} \sim T_1 \cdot e^{2|\alpha|^2},$$<p>where $T_1$ is the single-photon loss time. For modest values of $|\alpha|^2$
(mean photon numbers of 5–10), the bit-flip time can exceed minutes.</p>
<p>A <strong>phase-flip error</strong> (the other error type) is not suppressed — the cat qubit
is still vulnerable to dephasing at a rate proportional to $|\alpha|^2$. This
creates a strongly biased noise channel: only one of the two error types is
relevant.</p>
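<p>To get a feel for the asymmetry, one can tabulate the two scalings side by side. The single-photon loss time $T_1 = 100\ \mu s$ and the order-unity prefactors are assumptions for illustration, not measured device parameters:</p>

```python
import numpy as np

# T1 = 100 µs is an assumed single-photon loss time; prefactors set to 1.
T1 = 100e-6                                    # seconds (assumption)
t_bit = {}
for nbar in (2, 4, 6, 8, 10):                  # nbar = |α|²
    t_bit[nbar] = T1 * np.exp(2 * nbar)        # T_bit-flip ~ T1 · e^{2|α|²}
    gamma_phase = nbar / T1                    # phase-flip rate grows only ∝ |α|²
    print(f"|α|² = {nbar:2d}:  T_bit-flip ≈ {t_bit[nbar]:10.3g} s,  Γ_phase ≈ {gamma_phase:.2g}/s")
```

<p>With these assumptions the bit-flip time crosses the one-minute mark between $|\alpha|^2 = 6$ and $8$, while the dephasing rate has only grown linearly — precisely the biased noise channel the next section exploits.</p>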
<h3 id="the-engineering-consequence">The Engineering Consequence</h3>
<p>Biased noise is useful because it allows the error-correcting code to focus
its resources on only one error type. A repetition code (a string of cat
qubits where phase errors are corrected by majority vote) can suppress the
phase-flip error arbitrarily while the exponential bit-flip suppression handles
the other. The hardware overhead for fault tolerance — the ratio of physical
qubits to logical qubits — is dramatically reduced compared to codes that must
handle both error types equally.</p>
<p>In 2023 and 2024, several groups demonstrated cat qubits with bit-flip times
of seconds to minutes:</p>
<ul>
<li><strong>Grimm et al. (2020, <em>Nature</em> 584, 205)</strong>: Kerr cat qubit with exponential
bit-flip suppression demonstrated in a superconducting circuit.</li>
<li><strong>Berdou et al. (2023, <em>PRX Quantum</em> 4, 020350)</strong>: Cat qubit with $T_X$
exceeding $100$ seconds.</li>
<li><strong>Reglade et al. (2024, <em>Nature</em> 629, 778–783)</strong>: Cat qubits from Alice &amp;
Bob demonstrating exponential scaling $T_\mathrm{bit-flip} \propto
  e^{2|\alpha|^2}$ with mean photon numbers up to $|\alpha|^2 \approx 10$,
pushing bit-flip times beyond $10$ seconds in the laboratory and, in
subsequent chip demonstrations, beyond several minutes.</li>
</ul>
<p>This is the state of the art as of early 2025: the cat qubit is no longer
a curiosity but a competitive architecture for fault-tolerant quantum computing,
with bit-flip coherence times exceeding the best alternative approaches.</p>
<hr>
<h2 id="the-wigner-function-and-quantum-non-classicality">The Wigner Function and Quantum Non-Classicality</h2>
<p>The Wigner quasi-probability distribution provides the most informative picture
of a quantum state&rsquo;s non-classicality. For a state with density matrix $\rho$,
the Wigner function is:</p>
$$W(x, p) = \frac{1}{\pi\hbar} \int_{-\infty}^{\infty}
\langle x + y | \rho | x - y \rangle\, e^{2ipy/\hbar}\, dy.$$<p>For the cat state $|\mathrm{cat}_+\rangle$ with $|\alpha|^2 = 4$ (four mean
photons in each coherent component), the Wigner function has two positive
Gaussian peaks at $(x, p) = (\pm\sqrt{2}|\alpha|, 0)$ and an oscillating
interference fringe between them with negative regions of amplitude
$\sim -2/\pi$. Negativity of the Wigner function is a sufficient condition for
non-classicality: no classical mixture of coherent states can produce a negative
quasi-probability anywhere in phase space.</p>
<p>As decoherence proceeds (e.g., through photon loss in a cavity), the negative
regions shrink and eventually vanish — the Wigner function becomes everywhere
non-negative, and the state becomes classically describable as a mixture of
coherent states. This is the quantum-to-classical transition, made visible in
phase space.</p>
<p>Haroche&rsquo;s team measured this process directly, frame by frame, in real time.
It is one of the most dramatic experimental visualisations of decoherence ever
achieved.</p>
<hr>
<h2 id="what-schrödinger-would-make-of-this">What Schrödinger Would Make of This</h2>
<p>Schrödinger was a physicist, not a philosopher of language. If told in 1935
that ninety years later, the superposition of two distinguishable states of a
harmonic oscillator — named after his cat, with the same formal structure as
his thought experiment — would be the leading candidate for the basic unit of
a fault-tolerant quantum computer, he would have had two questions.</p>
<p>The first: how do you maintain the superposition against decoherence? The
answer is that you work at millikelvin temperatures in superconducting circuits,
and you use an active parametric drive to confine the state to the cat-state
manifold.</p>
<p>The second, I think, would have been: does this resolve the measurement
problem? And the honest answer remains: no, not fully. Decoherence explains
why macroscopic superpositions are unobservable, but it does not explain why
any particular measurement outcome occurs. That question is as open as it was
in 1935.</p>
<p>What has changed is the practical relationship between quantum theory and
technology. The uncertainty Schrödinger was pointing at — the strangeness of
superposition, the fragility of coherence, the role of the environment — is
now a resource to be engineered, not a conceptual embarrassment to be
resolved. The cat qubit works precisely <em>because</em> the decoherence is
asymmetric: bit flips are exponentially suppressed while phase flips are
correctable. The asymmetry is exploited, not apologised for.</p>
<p>My two cats, meanwhile, are in definite classical states. One is on the
radiator. The other is on the keyboard.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Grimm, A., Frattini, N.E., Puri, S., Mundhada, S.O., Touzard, S.,
Mirrahimi, M., Girvin, S.M., Shankar, S., &amp; Devoret, M.H. (2020). Stabilization
and operation of a Kerr-cat qubit. <em>Nature</em>, 584, 205–209.
<a href="https://doi.org/10.1038/s41586-020-2587-z">https://doi.org/10.1038/s41586-020-2587-z</a></p>
</li>
<li>
<p>Haroche, S., &amp; Raimond, J.-M. (2006). <em>Exploring the Quantum: Atoms,
Cavities, and Photons.</em> Oxford University Press.</p>
</li>
<li>
<p>Reglade, U., Bocquet, A., Gautier, R., et al. (2024). Quantum control of a
cat qubit with bit-flip times exceeding ten seconds. <em>Nature</em>, 629, 778–783.
<a href="https://doi.org/10.1038/s41586-024-07294-3">https://doi.org/10.1038/s41586-024-07294-3</a></p>
</li>
<li>
<p>Mirrahimi, M., Leghtas, Z., Albert, V.V., Touzard, S., Schoelkopf, R.J.,
Jiang, L., &amp; Devoret, M.H. (2014). Dynamically protected cat-qubits: A new
paradigm for universal quantum computation. <em>New Journal of Physics</em>, 16,
045014. <a href="https://doi.org/10.1088/1367-2630/16/4/045014">https://doi.org/10.1088/1367-2630/16/4/045014</a></p>
</li>
<li>
<p>Schrödinger, E. (1935). Die gegenwärtige Situation in der Quantenmechanik.
<em>Naturwissenschaften</em>, 23(48), 807–812; 23(49), 823–828; 23(50), 844–849.
<a href="https://doi.org/10.1007/BF01491891">https://doi.org/10.1007/BF01491891</a></p>
</li>
<li>
<p>Walls, D.F., &amp; Milburn, G.J. (2008). <em>Quantum Optics</em> (2nd ed.). Springer.</p>
</li>
<li>
<p>Zurek, W.H. (2003). Decoherence, einselection, and the quantum origins of
the classical. <em>Reviews of Modern Physics</em>, 75(3), 715–775.
<a href="https://doi.org/10.1103/RevModPhys.75.715">https://doi.org/10.1103/RevModPhys.75.715</a></p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-02-17</strong>: Updated &ldquo;bit-flip times exceeding seven minutes&rdquo; in the summary to &ldquo;exceeding minutes,&rdquo; aligning with the sourced figures: the body text reports &ldquo;beyond several minutes&rdquo; and Reglade et al. (2024) report &ldquo;exceeding ten seconds.&rdquo;</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Below Threshold: What Google&#39;s Willow Chip Actually Proved</title>
      <link>https://sebastianspicker.github.io/posts/quantum-error-correction-willow/</link>
      <pubDate>Mon, 13 Jan 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/quantum-error-correction-willow/</guid>
      <description>In December 2024, Google published a Nature paper announcing that their Willow chip demonstrated quantum error correction below the threshold — the point at which larger codes become more reliable, not less. The headline about &amp;ldquo;10^25 years of classical computation&amp;rdquo; was technically true and mostly a distraction. The real result is more important and less flashy: for the first time, a quantum processor demonstrated that logical error rates decrease exponentially as the code grows. This is what scalable quantum computing looks like at its first credible step.</description>
      <content:encoded><![CDATA[<p>On December 9, 2024, Google published a paper in <em>Nature</em> announcing results from their 105-qubit Willow chip. The press release led with the number that immediately spread across every technology news outlet on the planet: a computation that would take today&rsquo;s fastest classical supercomputers $10^{25}$ years. For reference, the age of the universe is roughly $1.4 \times 10^{10}$ years, which makes the claimed classical runtime about $10^{15}$ times longer than the universe has existed.</p>
<p>Impressive. Also: almost entirely a distraction from what actually matters.</p>
<p>The real result is buried in the middle of the paper, requires knowing what the threshold theorem says to appreciate, and generated a fraction of the press coverage. Google&rsquo;s Willow chip demonstrated quantum error correction operating below the threshold for the first time in a superconducting processor. This is the result that will matter in twenty years. The $10^{25}$-year number will have been forgotten by then — or quietly revised as classical simulation algorithms improve.</p>
<p>Let me explain why the threshold result is the one worth understanding.</p>
<h2 id="why-quantum-errors-are-a-fundamental-obstacle">Why quantum errors are a fundamental obstacle</h2>
<p>When I first encountered quantum error correction as a student, my reaction was something like: surely you just copy the qubit a few times and take a majority vote, the way classical error correction works. This is wrong, and the reason it is wrong is elegant.</p>
<p>Quantum computation depends on superposition and entanglement. A quantum state like</p>
$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$$<p>encodes information in the complex amplitudes $\alpha$ and $\beta$. But real physical systems do not exist in isolation. They interact with their environment — thermal fluctuations, stray electromagnetic fields, unwanted couplings to neighbouring systems. This coupling causes <strong>decoherence</strong>: the quantum state entangles with environmental degrees of freedom, and the superposition is effectively destroyed as the relative phase between $|0\rangle$ and $|1\rangle$ becomes random. For superconducting qubits of the type used in Willow, coherence times are on the order of microseconds to hundreds of microseconds. Every gate operation takes tens to hundreds of nanoseconds. After a few hundred gate operations, the accumulated decoherence and gate errors have corrupted the quantum state beyond use.</p>
<p>The classical remedy — store each bit three times and take the majority vote — fails for a fundamental reason. <strong>The no-cloning theorem</strong> states that there is no unitary operation $U$ such that</p>
$$U(|\psi\rangle \otimes |0\rangle) = |\psi\rangle \otimes |\psi\rangle$$<p>for all states $|\psi\rangle$. The proof is a one-liner: unitary evolution is linear, so if $U$ correctly copies $|0\rangle$ and $|1\rangle$, it maps $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)|0\rangle$ to $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, which is an entangled state, not the product state $\bigl(\tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle)\bigr)^{\otimes 2}$. You cannot copy an arbitrary quantum state. Classical redundancy, applied naively, is forbidden by the linearity of quantum mechanics.</p>
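<p>The linearity argument can be watched failing in a few lines. A CNOT gate copies the basis states perfectly, but — as this state-vector sketch shows — it entangles a superposition rather than cloning it:</p>

```python
import numpy as np

# CNOT duplicates basis states (|0>|0> -> |00>, |1>|0> -> |11>), but by
# linearity it entangles a superposition instead of cloning it.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus = (ket0 + ket1) / np.sqrt(2)        # the superposition we try to "copy"

out = CNOT @ np.kron(plus, ket0)         # what the basis-state copier actually produces
bell = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)
clone = np.kron(plus, plus)              # what a true clone would look like

print(np.allclose(out, bell))            # True: an entangled Bell state
print(np.allclose(out, clone))           # False: no clone was produced
```
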
<p>So quantum error correction requires a genuinely different idea.</p>
<h2 id="the-key-insight-measure-the-error-not-the-state">The key insight: measure the error, not the state</h2>
<p>The idea that unlocks quantum error correction is this: you can extract information about <strong>which error occurred</strong> without learning anything about the <strong>logical state</strong> the qubit is encoding.</p>
<p>Peter Shor showed in 1995 (<a href="#ref-Shor1995">Shor, 1995</a>) that one logical qubit can be protected using 9 physical qubits. The construction is worth understanding in some detail because it reveals the structure that all subsequent codes share.</p>
<h3 id="bit-flip-protection">Bit-flip protection</h3>
<p>First, consider only bit-flip errors: physical processes that flip $|0\rangle \leftrightarrow |1\rangle$ with probability $p$. Encode:</p>
$$|0\rangle_L = |000\rangle, \quad |1\rangle_L = |111\rangle$$<p>A logical superposition $\alpha|0\rangle_L + \beta|1\rangle_L = \alpha|000\rangle + \beta|111\rangle$ is an entangled state — it cannot be factored — but it is not a copy of $|\psi\rangle$; it is a different encoding.</p>
<p>If qubit 1 flips, the state becomes $\alpha|100\rangle + \beta|011\rangle$. We detect this by measuring the <strong>syndrome operators</strong> $Z_1 Z_2$ and $Z_2 Z_3$ — products of Pauli-Z operators. The eigenvalue of $Z_1 Z_2$ is $+1$ if qubits 1 and 2 have the same value, $-1$ if they differ. Crucially, measuring $Z_1 Z_2$ does <strong>not</strong> collapse the logical state: both branches of the encoded superposition are eigenstates with the same eigenvalue, so the measurement tells you about the <em>error</em> without revealing $\alpha$ or $\beta$. This is the central trick of quantum error correction.</p>
<p>The syndrome outcome $(Z_1 Z_2, Z_2 Z_3) = (-1, +1)$ tells you qubit 1 flipped; apply $X_1$ to fix it. The logical state is restored.</p>
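<p>The whole cycle — encode, inject an error, extract the syndrome, correct — fits in a short state-vector simulation. A sketch (the amplitudes $\alpha = 0.6$, $\beta = 0.8$ are arbitrary choices for the demo):</p>

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
kron = lambda *ops: reduce(np.kron, ops)

# Encode a logical qubit: α|000⟩ + β|111⟩
alpha, beta = 0.6, 0.8
state = np.zeros(8)
state[0b000], state[0b111] = alpha, beta

state = kron(X, I2, I2) @ state            # bit-flip error on qubit 1

# Syndrome extraction: the corrupted state is an eigenstate of both
# stabilizers, so the expectation values are exactly ±1.
s12 = int(round(state @ kron(Z, Z, I2) @ state))
s23 = int(round(state @ kron(I2, Z, Z) @ state))

# Look up which single-qubit flip produced this syndrome and undo it
correction = {(-1, +1): kron(X, I2, I2),   # qubit 1 flipped
              (-1, -1): kron(I2, X, I2),   # qubit 2 flipped
              (+1, -1): kron(I2, I2, X)}   # qubit 3 flipped
recovered = correction.get((s12, s23), np.eye(8)) @ state

print("syndrome:", (s12, s23))                            # (-1, 1): qubit 1 flipped
print("recovered:", recovered[0b000], recovered[0b111])   # α and β restored
```
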
<h3 id="phase-flip-protection">Phase-flip protection</h3>
<p>Phase errors flip the relative sign: $|+\rangle \leftrightarrow |-\rangle$ where $|\pm\rangle = (|0\rangle \pm |1\rangle)/\sqrt{2}$. Apply a Hadamard to rotate to the X basis, where phase flips look like bit flips, and apply the same three-qubit code.</p>
<h3 id="concatenation-the-shor-9-qubit-code">Concatenation: the Shor 9-qubit code</h3>
<p>Protect each of the three qubits in the phase-flip code with its own bit-flip code. The result: 9 physical qubits per logical qubit, protected against any single-qubit error. If the physical error rate is $p$, the logical error rate for the 9-qubit code scales as $p^2$ (you need at least two errors to fool the code), rather than $p$. Already an improvement.</p>
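<p>The quadratic suppression is easy to verify for the underlying three-qubit majority vote, which fails only when at least two qubits flip — the same mechanism that gives the 9-qubit code its $p^2$ scaling. A Monte Carlo sketch:</p>

```python
import random

def logical_error_rate(p, trials=200_000, seed=0):
    """Monte Carlo estimate of the majority-vote failure rate:
    the 3-qubit repetition code fails when two or more qubits flip."""
    rng = random.Random(seed)
    fails = 0
    for _ in range(trials):
        flips = sum(rng.random() < p for _ in range(3))
        fails += flips >= 2
    return fails / trials

p = 0.05
mc = logical_error_rate(p)
exact = 3 * p**2 * (1 - p) + p**3      # ≥2-of-3 binomial probability
print(mc, exact, 3 * p**2)             # the estimate tracks ≈ 3p²
```
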
<h3 id="the-threshold-theorem">The threshold theorem</h3>
<p>Shor&rsquo;s code is a proof of principle. The <strong>threshold theorem</strong>, established independently by Aharonov and Ben-Or (1997) and Knill, Laflamme, and Zurek (1998), shows something much stronger. For a level-$k$ concatenated code — a code of codes of codes, $k$ levels deep — the logical error rate scales as</p>
$$p_L \sim \left(\frac{p}{p_{\text{th}}}\right)^{2^k}$$<p>where $p_{\text{th}}$ is the <strong>threshold</strong>: a critical physical error rate that depends on the code family. Below threshold ($p < p_{\text{th}}$), adding more code levels exponentially suppresses logical errors. Above threshold ($p > p_{\text{th}}$), more qubits make things worse — each additional physical qubit introduces more errors than the code can correct.</p>
<p>The threshold is not a continuous improvement. It is a phase transition. Below it, the system is in the error-correctable regime; above it, it is not. Getting a physical quantum processor into the sub-threshold regime is a necessary condition for scalable fault-tolerant quantum computing.</p>
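<p>The phase-transition character shows up immediately if you evaluate the concatenation formula on both sides of the threshold (the value $p_{\text{th}} = 1\%$ here is an illustrative assumption, not a property of any particular code):</p>

```python
# p_L ~ (p/p_th)^(2^k): below threshold each extra concatenation level
# squares the suppression; above threshold it squares the blow-up.
p_th = 0.01

def logical_rate(p, k):
    return (p / p_th) ** (2 ** k)

for p in (0.001, 0.02):
    label = "below" if p < p_th else "above"
    trend = [logical_rate(p, k) for k in range(4)]
    print(f"p = {p} ({label} threshold):", " ".join(f"{v:.3g}" for v in trend))
```
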
<h2 id="surface-codes-the-practical-path">Surface codes: the practical path</h2>
<p>Concatenated codes require encoding at multiple levels, which multiplies overhead rapidly. Surface codes, analysed in detail by Kitaev (1997) and comprehensively by Dennis, Kitaev, Landahl, and Preskill (2002) (<a href="#ref-Dennis2002">Dennis et al., 2002</a>), offer a more practical architecture and have become the leading candidate for fault-tolerant quantum computing.</p>
<h3 id="the-geometry">The geometry</h3>
<p>A distance-$d$ surface code arranges $d^2$ physical <strong>data qubits</strong> on the vertices of a $d \times d$ grid, interleaved with $(d^2 - 1)$ <strong>ancilla qubits</strong> used for syndrome measurement. The stabilizers are products of $Z$ operators on groups of four data qubits surrounding each face (detecting bit-flip errors) and products of $X$ operators on groups of four data qubits surrounding each vertex (detecting phase-flip errors). Measuring these stabilizers without disturbing the logical qubit is the workhorse operation of the code.</p>
<p>The <strong>code distance</strong> $d$ is the minimum number of physical errors required to produce an undetectable logical error. An error chain of length $d$ connecting opposite boundaries of the code is the smallest error pattern that corrupts the logical qubit without triggering a syndrome. Larger $d$: longer chains required, lower logical error rates.</p>
<h3 id="the-scaling">The scaling</h3>
<p>The logical error rate per error-correction round for a surface code is approximately (<a href="#ref-Fowler2012">Fowler et al., 2012</a>):</p>
$$p_L \approx A \left(\frac{p}{p_{\text{th}}}\right)^{\lfloor (d+1)/2 \rfloor}$$<p>where $p_{\text{th}} \approx 1\%$ for surface codes (established by threshold simulations, and robust across reasonable noise models), $p$ is the physical error rate for two-qubit gates, $A$ is a code-specific constant of order unity, and $\lfloor (d+1)/2 \rfloor$ is the exponent that grows with code size.</p>
<p>This is the critical expression. For fixed physical error rate $p < p_{\text{th}}$:</p>
<ul>
<li>Increasing $d$ by 2 (one step in the code distance ladder) increases the exponent by 1 — a multiplicative suppression of $p/p_{\text{th}}$.</li>
<li>If $p/p_{\text{th}} = 0.18$, each distance step multiplies the logical error rate by roughly 0.18 — almost a factor of 6 suppression per step.</li>
</ul>
<p>For $p > p_{\text{th}}$: increasing $d$ makes $p_L$ worse. The code spends more effort chasing more errors than it eliminates.</p>
<p>The behaviour below versus above threshold is qualitatively different. Exponential suppression versus exponential growth. The dividing line is $p = p_{\text{th}}$.</p>
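<p>Evaluating the surface-code expression with $A = 1$ makes the two regimes concrete. This is an idealisation — it counts only two-qubit gate errors — so the absolute numbers should not be read as predictions for any real device:</p>

```python
# A = 1 and p_th = 1% are assumptions; only two-qubit gate errors enter.
p_th = 0.01

def p_logical(p, d):
    return (p / p_th) ** ((d + 1) // 2)

for p in (0.0018, 0.02):               # sub-threshold vs above-threshold
    for d in (3, 5, 7):
        print(f"p = {p:.2%}, d = {d}:  p_L ~ {p_logical(p, d):.2e}")
```

<p>Below threshold, each increase in $d$ multiplies $p_L$ by $p/p_{\text{th}} < 1$; above threshold, the same increase multiplies it by a factor greater than one.</p>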
<h2 id="what-willow-demonstrated">What Willow demonstrated</h2>
<p>Google&rsquo;s Willow chip (<a href="#ref-Acharya2024">Acharya et al., 2024</a>) is a 105-qubit superconducting processor. The physical two-qubit gate error rate achieved is approximately $p \approx 0.18\%$ — well below the surface code threshold of $\sim 1\%$.</p>
<p>The experiment is direct. Implement surface codes at distances $d = 3$, $d = 5$, and $d = 7$, corresponding to 9, 25, and 49 physical data qubits per logical qubit (plus ancillas). Measure the logical error rate per error correction cycle for each code size.</p>
<p>The result:</p>
<table>
  <thead>
      <tr>
          <th>Code distance $d$</th>
          <th>Physical qubits</th>
          <th>Logical error rate per cycle</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>3</td>
          <td>17</td>
          <td>$\approx 0.65\%$</td>
      </tr>
      <tr>
          <td>5</td>
          <td>49</td>
          <td>$\approx 0.30\%$</td>
      </tr>
      <tr>
          <td>7</td>
          <td>97</td>
          <td>$\approx 0.143\%$</td>
      </tr>
  </tbody>
</table>
<p>Each step in code distance roughly halves the logical error rate: the measured suppression factor is $\Lambda \approx 2.14$ per distance step. This is weaker than the idealised prediction of the formula above, which for $p/p_{\text{th}} \approx 0.18$ gives a factor of about 5.6 per exponent increment. The gap is unsurprising: the effective error rate the code sees aggregates measurement, idling, and leakage errors on top of the two-qubit gate error. What the formula predicts and the data confirm is the behaviour that matters: suppression exponential in the code distance.</p>
<p>This is the first time a superconducting quantum processor has demonstrated below-threshold error correction with the correct exponential scaling. Previous experiments showed that quantum error correction <em>works</em> in principle — syndromes can be measured, errors can be corrected. What had not been demonstrated was the <strong>exponential suppression</strong> with code size that the threshold theorem predicts. Without that scaling, error correction merely shifts the error rate; it cannot drive it to arbitrarily small values by increasing code size. With it, the path to fault tolerance is open in principle.</p>
<p>I want to be precise about what &ldquo;for the first time&rdquo; means here, because the claims in quantum computing tend to sprawl. Earlier work had demonstrated below-threshold error correction in other qubit modalities and at smaller scales. What Willow adds is the combination: a superconducting processor, three distinct code distances, clean exponential scaling, and a physical qubit count sufficient to demonstrate $d=7$ without other dominant error sources overwhelming the measurement. The data is convincing.</p>
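<p>The gap between the idealised formula and the measurement fits in a few lines. Here $\Lambda = 2.14$ is the per-step factor reported in the Willow paper, and the &ldquo;predicted&rdquo; value takes the scaling formula at face value with $p/p_{\text{th}} = 0.18$:</p>
<pre><code class="language-python">p_over_pth = 0.18
predicted = 1 / p_over_pth   # idealised suppression per distance step, about 5.6x
measured = 2.14              # per-step factor reported for Willow (Acharya et al., 2024)

# The measured factor is smaller because the code sees an aggregate error rate
# (gates, idling, measurement, leakage), not just the two-qubit gate error.
# Exponential suppression holds either way: k further distance steps reduce
# p_L by a factor of measured ** k.
print(round(predicted, 2), round(measured ** 3, 2))  # 5.56 9.8
</code></pre>
<p>Even at the measured rate, three more distance steps buy roughly a tenfold reduction.</p>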
<h2 id="the-random-circuit-sampling-result-and-its-limits">The random circuit sampling result and its limits</h2>
<p>Now the $10^{25}$ years.</p>
<p>Random circuit sampling (RCS) is a computational task defined as follows: apply a sequence of randomly chosen quantum gates to a collection of qubits and sample from the resulting output distribution. The output distribution of a deep random circuit over $n$ qubits is believed to be classically hard to simulate: the best known classical algorithms scale exponentially in $n$. Google&rsquo;s Willow chip performed RCS on a 105-qubit circuit in under 5 minutes. The $10^{25}$-year figure is an estimate of the time required for a Frontier-class supercomputer to simulate the same computation classically, extrapolated from benchmarks on smaller circuit sizes.</p>
<p>I will not contest the $10^{25}$-year number specifically — the extrapolation is defensible given current knowledge of classical simulation algorithms. But several things should be said about what the benchmark means.</p>
<p><strong>RCS has no known practical application.</strong> It is not a step toward factoring integers, simulating molecules, or solving optimisation problems. It is a task designed to be hard for classical computers while being easy for quantum ones — a benchmark of <em>quantum hardness</em>, not <em>quantum usefulness</em>.</p>
<p><strong>Classical simulation of random circuits is an active research area.</strong> The best classical algorithms for this task have improved substantially over the past five years. A result that seems to require $10^{25}$ years today may require $10^{10}$ years after a better classical algorithm is published. This has happened before: Google&rsquo;s 2019 &ldquo;quantum supremacy&rdquo; claim was significantly eroded by subsequent classical simulation improvements. I expect the same here, to some degree.</p>
<p><strong>Extrapolation is hard.</strong> The $10^{25}$-year estimate involves scaling classical simulation costs across many orders of magnitude in circuit size, from regimes where simulation is feasible to regimes where it is not. The uncertainty in the estimate is correspondingly large.</p>
<p>None of this makes the RCS result fraudulent or uninteresting. It is a genuine demonstration that a quantum processor can perform a specific task at a scale that classical computers plausibly cannot match. But calling this &ldquo;quantum advantage&rdquo; in the sense that matters — useful computation performed faster than any classical alternative — overstates it considerably.</p>
<p>The threshold result, by contrast, does not depend on classical simulation hardness arguments. It depends on measuring $p_L$ at $d = 3, 5, 7$ and checking whether the sequence is decreasing and consistent with the theoretical prediction. The data are directly interpretable without extrapolation. That is why I find the threshold result more significant.</p>
<h2 id="where-we-actually-are--and-the-gap">Where we actually are — and the gap</h2>
<p>The Willow chip has 105 physical qubits. The threshold result was demonstrated at $d = 7$, using roughly a hundred physical qubits for one logical qubit (including ancillas). In other words: Willow demonstrated roughly <strong>one logical qubit</strong> operating below threshold.</p>
<p>A cryptographically relevant quantum computer — one capable of breaking RSA-2048 using Shor&rsquo;s algorithm — requires approximately (<a href="#ref-GidneyEkera2021">Gidney &amp; Ekerå, 2021</a>):</p>
<ul>
<li>a few thousand logical qubits for the factoring computation itself (roughly $3n$ for an $n$-bit modulus, with optimised circuits)</li>
<li>$\sim 1000$ to $10{,}000$ physical qubits per logical qubit, depending on the target logical error rate and physical error rate</li>
<li>Total: roughly <strong>20 million physical qubits</strong>, operating through millions of error correction rounds</li>
</ul>
<p>The gap between &ldquo;one logical qubit at $d=7$, 105 physical qubits&rdquo; and &ldquo;20 million physical qubits, millions of coherent error-correction rounds&rdquo; is not a gap of ten percent or a factor of two. It is four to five orders of magnitude in qubit count alone, and the engineering challenges of maintaining coherence, connectivity, and calibration across millions of physical qubits are qualitatively different from maintaining them across 105.</p>
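<p>The order of magnitude can be reproduced with a back-of-envelope calculation. Assumptions, loudly labelled: $A = 1$, an illustrative target logical error rate of $10^{-12}$ per round, $p/p_{\text{th}} = 0.18$, roughly $3n \approx 6000$ logical qubits for RSA-2048 following Gidney &amp; Ekerå, and no allowance for magic-state distillation, which real estimates must include:</p>
<pre><code class="language-python">import math

p_ratio = 0.18     # p / p_th for Willow-class hardware
target = 1e-12     # illustrative target logical error rate per round

# smallest exponent k with p_ratio ** k at or below the target
k = math.ceil(math.log(target) / math.log(p_ratio))
d = 2 * k - 1                          # since floor((d + 1) / 2) = k
physical_per_logical = 2 * d * d - 1   # data plus ancilla qubits per patch
total = physical_per_logical * 6000    # times a rough logical-qubit count

print(d, physical_per_logical, total)  # 33 2177 13062000
</code></pre>
<p>Thirteen million is the same order as the twenty million of the full estimate; the missing factor is precisely the overhead this sketch ignores.</p>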
<p>We are on the right curve. We are not close to the destination.</p>
<p>The relevant trajectory — assuming continued improvement in physical error rates, qubit counts, and error correction overhead — places cryptographically relevant quantum computing somewhere between ten and thirty years away. Those estimates are uncertain by design; the field has repeatedly surprised in both directions. But I am confident about the qualitative picture: Willow demonstrates that the physics works at small scale; the engineering challenge of scaling it up is immense.</p>
<h2 id="nist-post-quantum-cryptography-standards">NIST post-quantum cryptography standards</h2>
<p>Here is where I become impatient with the framing that says &ldquo;don&rsquo;t worry, quantum computers are decades away.&rdquo;</p>
<p>In August 2024, NIST released three finalised post-quantum cryptographic standards:</p>
<ul>
<li><strong>FIPS 203</strong> (ML-KEM, formerly CRYSTALS-Kyber): key encapsulation based on Module Learning With Errors (Module-LWE), a lattice problem</li>
<li><strong>FIPS 204</strong> (ML-DSA, formerly CRYSTALS-Dilithium): digital signatures based on Module-LWE</li>
<li><strong>FIPS 205</strong> (SLH-DSA, formerly SPHINCS+): hash-based digital signatures — the conservative option, with no known vulnerability to either classical or quantum attacks, at the cost of larger signature sizes</li>
</ul>
<p>These standards exist because the cryptographic community understands something that the &ldquo;decades away&rdquo; framing obscures: <strong>the threat does not begin when a quantum computer exists; it begins when the encrypted data is harvested.</strong> The attack is called &ldquo;Harvest Now, Decrypt Later.&rdquo; State-level adversaries — and I do not think it is paranoid to assume that multiple state intelligence agencies are collecting encrypted internet traffic today — archive encrypted data with the intention of decrypting it once cryptographically relevant quantum computing becomes available.</p>
<p>For data encrypted today with RSA or elliptic curve cryptography, the protection window is however long it takes for a cryptographically relevant quantum computer to be built. If that is fifteen years, data encrypted with RSA-2048 today and collected by a patient adversary is vulnerable within fifteen years. For most data, fifteen-year confidentiality is adequate — a credit card number from 2025 is not sensitive in 2040. But for state secrets, medical records, long-term financial instruments, and critical infrastructure keys, fifteen-year confidentiality is not even close to adequate.</p>
<p>The migration to post-quantum cryptography should be happening now. In many places, it is not.</p>
<p>The mathematical security of the NIST standards rests on lattice problems. The security of ML-KEM reduces to the hardness of Module-LWE: find a short vector in a high-dimensional lattice with noise. The best known classical algorithms for this run in time exponential in the lattice dimension; the best known quantum algorithms offer no meaningful advantage, and Shor&rsquo;s algorithm in particular does not apply, because Module-LWE has neither the hidden-subgroup nor the discrete-logarithm structure that Shor exploits. The security reduction is to worst-case lattice problems; this is a strong theoretical foundation compared to the purely conjectured hardness of many classical cryptographic assumptions.</p>
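<p>For readers who want the shape of the problem rather than the full machinery, here is a toy LWE instance. The parameters are far too small to be secure, and real ML-KEM works over module lattices with polynomial rings, which this sketch does not attempt:</p>
<pre><code class="language-python">import random

random.seed(0)
q, n, m = 97, 8, 16   # toy modulus and dimensions; real schemes use far larger
A = [[random.randrange(q) for _ in range(n)] for _ in range(m)]
s = [random.randrange(q) for _ in range(n)]           # the secret
e = [random.choice([q - 1, 0, 1]) for _ in range(m)]  # small noise: -1, 0, +1 mod q

# public instance: b = A s + e (mod q)
b = [(sum(A[i][j] * s[j] for j in range(n)) + e[i]) % q for i in range(m)]

# Recovering s from (A, b) is the LWE problem. With e = 0 it is plain linear
# algebra over Z_q and trivially easy; the noise is what makes the best known
# attacks take time exponential in the dimension n.
</code></pre>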
<p>Whether post-quantum cryptographic standards will still look secure in twenty years is an empirical question that the cryptanalytic community will continue to probe. But the alternative — remaining on RSA while a sufficiently patient adversary harvests encrypted traffic — is worse.</p>
<p>The existence of Willow is not an argument that RSA is broken. It is an argument that the threshold for &ldquo;good enough to be a real threat&rdquo; has moved from &ldquo;theoretical possibility&rdquo; to &ldquo;demonstrated at small scale with correct exponential scaling.&rdquo; The curve is real. Act accordingly. (And if you are responsible for cryptographic infrastructure at an institution and have not yet read the <a href="/posts/public-money-public-code/">public money, public code</a> argument for open, auditable cryptographic implementations — it applies doubly here.)</p>
<h2 id="the-cat-qubit-alternative">The cat qubit alternative</h2>
<p>I have written elsewhere about <a href="/posts/schrodinger-cat-qubits/">bosonic cat qubits</a> as an alternative approach to error correction. It is worth briefly noting the contrast with the surface code philosophy.</p>
<p>Willow&rsquo;s surface codes take a universal approach to errors: both bit-flip and phase-flip errors are corrected by the same 2D stabilizer code, requiring large 2D arrays of physical qubits with nearest-neighbour connectivity. The code distance $d$ drives both error types down together.</p>
<p>The cat qubit approach, exemplified by Alice &amp; Bob&rsquo;s recent result (<a href="#ref-Reglade2024">Reglade et al., 2024</a>), encodes a logical qubit in a superposition of coherent states $|\pm\alpha\rangle$ in a harmonic oscillator. Engineered two-photon dissipation stabilises the oscillator in this manifold and suppresses bit-flip errors exponentially in $\alpha^2$ — the mean photon number — at the hardware level. Phase-flip errors remain, but they are the <em>only</em> dominant error mode, and they can be corrected with a simpler one-dimensional outer code rather than a full 2D surface code.</p>
<p>The overhead reduction could be substantial. If bit-flip errors are already exponentially suppressed by the hardware, you do not need a 2D code with overhead scaling as $d^2$ physical qubits per logical qubit. A 1D repetition code over cat qubits might achieve the same logical error rate with far fewer physical qubits. Alice &amp; Bob have demonstrated cat qubits with bit-flip times exceeding ten seconds — many orders of magnitude longer than the phase-flip time. The gamble is that this asymmetry persists as the system scales and that the phase-flip outer code is tractable.</p>
<p>Surface codes and cat qubits are different bets on the same fundamental problem: how do you make the overhead of fault-tolerant quantum computing manageable? Surface codes are the more conservative bet — they work with any qubit that meets the error rate threshold, regardless of error anisotropy. Cat qubits are the more speculative bet — they require maintaining a specific nonlinear oscillator regime at scale, but the payoff in overhead reduction could be decisive. Both approaches are credible. Neither has been demonstrated at the scale where the comparison becomes definitive.</p>
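<p>The scaling argument behind the overhead claim fits in arithmetic. The counts below are idealised footprints (data plus ancilla qubits, $2d^2 - 1$ versus $2d - 1$); real architectures add routing and readout overhead, and the cat-qubit count ignores the hardware cost of stabilising each oscillator:</p>
<pre><code class="language-python">d = 15                      # illustrative code distance
surface = 2 * d * d - 1     # physical qubits per logical qubit, 2D surface code
repetition = 2 * d - 1      # cat qubits per logical qubit, 1D repetition code

print(surface, repetition)  # 449 29
</code></pre>
<p>The quadratic-versus-linear gap is the entire bet: if the bit-flip asymmetry survives scaling, the 1D code wins by a factor of $d$.</p>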
<h2 id="what-the-threshold-result-actually-means">What the threshold result actually means</h2>
<p>Let me close by saying precisely what I think the Willow result establishes and what it does not.</p>
<p>It establishes that below-threshold quantum error correction exists outside of theory. The threshold theorem says that below $p_{\text{th}}$, logical error rates decrease exponentially as code size grows. Willow demonstrates this behaviour at $d = 3, 5, 7$ in a superconducting processor. The theoretical prediction and the experimental observation are consistent. This is not a small thing. The threshold theorem has been the theoretical backbone of fault-tolerant quantum computing since the late 1990s; it is genuinely satisfying to see its core prediction — exponential scaling — confirmed experimentally.</p>
<p>It establishes that the physical error rates of superconducting qubits can be brought below the surface code threshold. $p \approx 0.18\%$ against $p_{\text{th}} \approx 1\%$ gives a comfortable margin. The ratio $p/p_{\text{th}} \approx 0.18$ sets the idealised suppression per unit increase in the exponent $\lfloor (d+1)/2 \rfloor$: a factor of roughly 5–6 per code distance step in the formula, about 2 per step as measured. Either is enough to drive logical error rates to useful levels at moderate code distances, without requiring physical error rates of $10^{-4}$ or lower.</p>
<p>It does not establish that a cryptographically relevant quantum computer is imminent, near-term, or easy to build. The gap from one below-threshold logical qubit to 20 million physical qubits is real and large. The engineering challenges of superconducting quantum computers at scale — refrigeration, wiring, control electronics, cross-talk, calibration drift — are not solved by demonstrating $d=7$.</p>
<p>And the $10^{25}$-year benchmark is technically defensible and strategically irrelevant. Classical simulation of random circuits is an interesting research problem. It is not the problem that quantum computers are being built to solve.</p>
<p>The result that matters is the curve: as code distance grows, logical error rates fall exponentially. We are on that curve. We have not arrived anywhere yet. But for the first time in the history of quantum computing, the evidence says we are moving in the right direction with the right scaling. That is, genuinely, a significant step.</p>
<p>Start migrating your cryptographic infrastructure.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p><span id="ref-Shor1995"></span>Shor, P. W. (1995). Scheme for reducing decoherence in quantum computer memory. <em>Physical Review A</em>, 52(4), R2493–R2496. <a href="https://doi.org/10.1103/PhysRevA.52.R2493">DOI: 10.1103/PhysRevA.52.R2493</a></p>
</li>
<li>
<p><span id="ref-Dennis2002"></span>Dennis, E., Kitaev, A., Landahl, A., &amp; Preskill, J. (2002). Topological quantum memory. <em>Journal of Mathematical Physics</em>, 43(9), 4452–4505. <a href="https://doi.org/10.1063/1.1499754">DOI: 10.1063/1.1499754</a></p>
</li>
<li>
<p><span id="ref-Fowler2012"></span>Fowler, A. G., Mariantoni, M., Martinis, J. M., &amp; Cleland, A. N. (2012). Surface codes: Towards practical large-scale quantum computation. <em>Physical Review A</em>, 86(3), 032324. <a href="https://doi.org/10.1103/PhysRevA.86.032324">DOI: 10.1103/PhysRevA.86.032324</a></p>
</li>
<li>
<p><span id="ref-Acharya2024"></span>Acharya, R., et al. (Google Quantum AI). (2024). Quantum error correction below the surface code threshold. <em>Nature</em>, 636, 639–646. <a href="https://doi.org/10.1038/s41586-024-08449-y">DOI: 10.1038/s41586-024-08449-y</a></p>
</li>
<li>
<p><span id="ref-GidneyEkera2021"></span>Gidney, C., &amp; Ekerå, M. (2021). How to factor 2048 bit RSA integers in 8 hours using 20 million noisy qubits. <em>Quantum</em>, 5, 433. <a href="https://doi.org/10.22331/q-2021-04-15-433">DOI: 10.22331/q-2021-04-15-433</a></p>
</li>
<li>
<p><span id="ref-NIST2024"></span>National Institute of Standards and Technology. (2024). NIST Releases First 3 Finalized Post-Quantum Encryption Standards (FIPS 203, 204, 205). Retrieved from <a href="https://www.nist.gov/news-events/news/2024/08/nist-releases-first-3-finalized-post-quantum-encryption-standards">https://www.nist.gov/news-events/news/2024/08/nist-releases-first-3-finalized-post-quantum-encryption-standards</a></p>
</li>
<li>
<p><span id="ref-Reglade2024"></span>Reglade, U., et al. (2024). Quantum control of a cat qubit with bit-flip times exceeding ten seconds. <em>Nature</em>, 629, 778–783. <a href="https://doi.org/10.1038/s41586-024-07294-3">DOI: 10.1038/s41586-024-07294-3</a></p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-02-17</strong>: Updated the Fowler et al. (2012) author list to &ldquo;Fowler, A. G., Mariantoni, M., Martinis, J. M., &amp; Cleland, A. N.&rdquo; — the previous list had been mixed with a different 2012 Fowler paper.</li>
<li><strong>2026-02-17</strong>: Updated the closing section to &ldquo;20 million physical qubits,&rdquo; matching the Gidney &amp; Ekerå (2021) figure cited earlier in the article.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Artificial Intelligence in Music Pedagogy: Curriculum Implications from a Thementag</title>
      <link>https://sebastianspicker.github.io/posts/ai-music-pedagogy-day/</link>
      <pubDate>Sat, 07 Dec 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/ai-music-pedagogy-day/</guid>
      <description>On 2 December 2024 I gave three workshops at HfMT Köln&amp;rsquo;s Thementag on AI and music education. The handouts covered data protection, AI tools for students, and AI in teaching. This post is the argument behind them — focused on the curriculum question that none of the tools answer on their own: what should change, and what should not?</description>
      <content:encoded><![CDATA[<p><em>On 2 December 2024, the Hochschule für Musik und Tanz Köln held a Thementag:
&ldquo;Next level? Künstliche Intelligenz und Musikpädagogik im Dialog.&rdquo; I gave three
workshops — on data protection and AI, on AI tools for students, and on AI in
teaching. The handouts from those sessions cover the practical and regulatory
ground. This post is the argument behind them: what I think changes in music
education when these tools become ambient, and what I think does not.</em></p>
<hr>
<h2 id="the-occasion">The Occasion</h2>
<p>&ldquo;Next level?&rdquo; The question mark is doing real work. The framing HfMT chose for
the day was appropriately provisional: not a declaration that AI has already
transformed music education, but an invitation to ask whether, in what
direction, and at what cost.</p>
<p>The invitations that reach me for events like this tend to come with one of two
framings. The first is enthusiasm: AI is coming, we need to get ahead of it,
here are tools your students are already using. The second is anxiety: AI is
coming, it threatens everything we do, we need to protect students from it.
Both framings are understandable. Neither is adequate to the curriculum
question, which is slower-moving and more structural than either suggests.</p>
<p>I prepared three sets of handouts. The first covered data protection — the
least glamorous topic in AI education, and the one that most directly
determines what can legally be deployed in a university setting. The second
covered AI tools for students: what exists, what it does, and what critical
thinking skills you need to use it without being used by it. The third covered
AI for instructors: where it helps, where it flatters, and where it makes
things worse.</p>
<p>This post does not recapitulate the handouts. It addresses the question I kept
returning to across all three workshops: what does this change about what a
music student needs to learn?</p>
<hr>
<h2 id="what-the-technology-actually-is">What the Technology Actually Is</h2>
<p>My physics training left me professionally uncomfortable
with hand-waving — including my own. Before discussing curriculum implications,
it is worth being specific about what these tools are.</p>
<p>The dominant paradigm in current AI — responsible for ChatGPT, for Whisper, for
Suno.AI, for Google Magenta, for the large language models whose outputs are
now visible everywhere — is the transformer architecture (Vaswani et al.,
2017). A transformer is a neural network that processes sequences by computing,
for each element, a weighted attention over all other elements. The attention
weights are learned from data. The result is a model that can capture
long-range dependencies in sequences — text, audio, musical notes — without the
recurrence that made earlier architectures difficult to train at scale.</p>
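<p>The attention operation described above is compact enough to sketch directly. This is scaled dot-product attention on toy vectors, with no learned weights (a real transformer learns projection matrices for queries, keys, and values; this sketch skips them):</p>
<pre><code class="language-python">import math

def softmax(xs):
    m = max(xs)                               # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    # each output is a weighted mix of all values, with weights set by
    # the similarity between one query and every key
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)             # attention weights, sum to 1
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# self-attention over a three-step toy "sequence"
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention(seq, seq, seq)
</code></pre>
<p>Every output element attends over the whole sequence at once; that absence of recurrence is what made the architecture trainable at scale.</p>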
<p>What this means practically: these models are trained on very large corpora,
they learn statistical regularities, and they generate outputs that are
statistically consistent with their training distribution. They are not
reasoning from first principles. They do not &ldquo;know&rdquo; music theory the way a
student who has internalised harmonic function knows it. They have learned, from
enormous quantities of text and audio, what tends to follow what. For many tasks
this is sufficient. For tasks that require understanding of underlying structure,
it is not — and the failure modes are characteristic rather than random.</p>
<p>BERT (Devlin et al., 2018) showed that pre-training on large corpora and
fine-tuning on specific tasks produces models that outperform task-specific
architectures on a wide range of benchmarks. The same transfer-learning
paradigm has spread to audio (Whisper pre-trains on 680,000 hours of weakly
supervised audio), to music generation (Magenta&rsquo;s transformer-based models produce
melodically coherent sequences), and to multimodal domains. The technology is
mature, improving, and available to students now. Knowing what it is — not
just what it produces — is the starting point for any sensible curriculum
discussion about it.</p>
<hr>
<h2 id="the-data-protection-constraint">The Data Protection Constraint</h2>
<p>Before any discussion of pedagogical benefit, there is a legal boundary that
most AI-in-education discussions skip over. In Germany, and in the EU more
broadly, the deployment of AI tools in a university setting is governed by the
GDPR (DSGVO, Regulation 2016/679) and, at state level in NRW, by the DSG NRW.
The constraints are not abstract: they determine which tools can be used for
which purposes with which students.</p>
<p>The core principle is data minimisation: only data necessary for a specific,
documented purpose may be collected or processed. When a student uses a
commercial AI tool to get feedback on a composition exercise and enters text
that could identify them or their institution, that data may be stored,
processed, and used for model improvement by an operator whose servers are
outside the EU. Whether such transfers remain legally valid under GDPR after
the Schrems II ruling (Court of Justice of the EU, 2020) is contested — and
&ldquo;contested&rdquo; is not a position in which an institution can comfortably require
students to use a tool.</p>
<p>The practical upshot for curriculum design is this: AI tools running on EU
servers with documented processing agreements can be integrated into formal
coursework. Commercial tools whose terms specify US-based processing and model
training on user data cannot be required of students. They can be discussed and
demonstrated, but making them mandatory puts students in a position where they
must choose between their privacy and their grade.</p>
<p>This is not a reason to avoid AI in teaching. It is a reason to be honest about
the regulatory landscape, to distinguish clearly between tools you can require
and tools you can recommend, and to make data protection literacy part of what
students learn. The skill of reading a terms-of-service document and identifying
the data flows it describes is not a legal skill — it is a general literacy
skill that matters for every digital tool a music professional will use.</p>
<hr>
<h2 id="what-changes-for-students">What Changes for Students</h2>
<p>The question I was asked most often across the three workshops was some version
of: &ldquo;If AI can already do X, should students still learn X?&rdquo;</p>
<p>The question is less simple than it appears, and the answer is not uniform
across skills.</p>
<p><strong>Skills where automation reduces the required production threshold</strong> do exist.
A student who spends weeks mastering advanced music engraving tools for score
production, when AI can generate a usable first draft from a much simpler
description, has arguably spent time that could have been better allocated
elsewhere. Not because the underlying skill is worthless — it is not — but
because the threshold of competence required to produce a working output has
dropped. The student&rsquo;s time might be more valuable spent on something that
has not been automated.</p>
<p><strong>Skills where automation creates new requirements</strong> are more interesting.
Transcription is a useful example. Automatic speech recognition — using
models like Whisper for spoken-word transcription, or specialised models
for audio-to-score music transcription — is now accurate enough to produce
usable first drafts from audio. This does not
eliminate the need for transcription skill in a music student. It changes it.
A student who cannot evaluate the output of an automatic transcription — who
cannot hear where the model has made characteristic errors, who does not have
an internalised sense of what a correct transcription looks like — is unable
to use the tool productively. The required skill has shifted from production
to evaluation. This is not a lesser skill; it is a different one, and it is
not automatically acquired alongside the ability to run the tool.</p>
<p><strong>Skills that automation cannot replace</strong> are those that depend on embodied,
situated, relational knowledge: stage presence, real-time improvisation, the
subtle negotiation of musical meaning in ensemble, the pedagogical relationship
between teacher and student. These are not beyond AI in principle. They are
far beyond it in practice, and the gap is not closing as quickly as the
generative AI discourse sometimes suggests.</p>
<p>The curriculum implication is not &ldquo;teach less&rdquo; or simply &ldquo;teach differently.&rdquo;
It is: be explicit about which category each skill falls into, and design
assessment accordingly. An assignment that asks students to produce something
AI can produce is now testing something different from what it was testing two
years ago — not necessarily nothing, but something different. The rubric should
reflect that.</p>
<hr>
<h2 id="what-changes-for-instructors">What Changes for Instructors</h2>
<p>The same three-category analysis applies symmetrically to teaching.</p>
<p><strong>Routine task automation</strong> is genuinely useful. Generating first drafts of
worksheets, producing exercises at different difficulty levels, transcribing a
recorded lesson for later analysis — these are tasks where AI can save
meaningful time without compromising the pedagogical judgment required to make
use of the output. Holmes et al. (2019) identify feedback generation as one
of the clearer wins for AI in education: systems that provide immediate,
targeted feedback at a scale that human instructors cannot match. A
transcription model listening to a student practice and flagging rhythmic
inconsistencies does not replace a teacher. It extends the feedback loop
beyond the lesson hour.</p>
<p><strong>Content generation with limits</strong> is where AI is most seductive and most
dangerous. A model like ChatGPT can produce a reading list on any topic, a
summary of any debate in the literature, a set of discussion questions for any
text. The outputs are fluent, plausible, and frequently wrong in ways that are
difficult to detect without domain expertise. Jobin et al. (2019) and
Mittelstadt et al. (2016) both document the broader concern with AI opacity
and accountability: when a model produces a confident-sounding claim, the
burden of verification falls on the user. An instructor who outsources the
construction of course materials to a model, and who lacks enough domain
knowledge to catch the errors, is not saving time — they are transferring
risk to their students.</p>
<p>Hallucinations — outputs that are plausible in form but false in content — are
not bugs in the usual sense. They are a structural consequence of how generative
models work. A model trained to predict likely next tokens will produce the most
statistically plausible continuation, not the most accurate one. For music
education, where historical facts, composer attributions, and music-theoretic
claims need to be correct, this matters. The model&rsquo;s fluency is not evidence
of its accuracy.</p>
<p><strong>Personalisation</strong> is the most-cited promise of AI in education (Luckin et
al., 2016; Roll &amp; Wylie, 2016) and the hardest to evaluate in practice. The
argument is that AI can adapt instructional content to individual learners&rsquo;
needs in real time, producing one-to-one tutoring at scale. The evidence in
formal educational settings is more mixed than the boosters suggest. What is
clear is that personalisation at scale requires data — and extensive data about
individual students&rsquo; learning trajectories raises the same data protection
concerns already discussed, in more acute form.</p>
<hr>
<h2 id="the-music-specific-question">The Music-Specific Question</h2>
<p>I want to be direct about something that came up repeatedly across the day and
that the general AI-in-education literature handles badly: music education is
not generic.</p>
<p>The skills involved — listening, performing, interpreting, composing,
improvising — have a phenomenological and embodied dimension that does not map
cleanly onto the text-prediction paradigm that most current AI systems
instantiate. Suno.AI can generate a stylistically convincing chord progression
in the manner of a named composer. It cannot explain why the progression is
convincing in the way a student who has internalised tonal function can explain
it. Google Magenta can generate a continuation of a melodic fragment that is
locally coherent. It cannot navigate the structural expectations of a sonata
form with the intentionality that a performer brings to interpreting one.</p>
<p>This is not a criticism of these tools. It is a description of what they are.
The curriculum implication is that music education must be clear about what it
is teaching: the <em>product</em> — a score, a performance, a composition — or the
<em>process and understanding</em> of which the product is evidence. Where assessment
focuses on the product, AI creates an obvious challenge. Where it focuses on
demonstrable process and understanding — including the ability to critically
evaluate AI-generated outputs — it creates new opportunities.</p>
<p>The more interesting question is whether AI tools can make musical <em>process</em>
more visible and discussable. A composition student who uses a generative model,
notices that the output is harmonically correct but rhythmically inert, and can
articulate <em>why</em> it is inert — and then revise it accordingly — has
demonstrated more sophisticated musical understanding than a student who
produces the same output without any generative assistance. The tool does not
lower the standard; it shifts where the standard is applied.</p>
<p>There is an analogy in music theory pedagogy. The availability of notation
software that can play back a student&rsquo;s harmony exercise and flag parallel
fifths changed what ear training and harmony teaching emphasise — but it did
not make harmony teaching obsolete. It changed the floor (students can check
mechanical correctness automatically) and raised the ceiling (more class time
can be spent on voice-leading logic and expressive intention). AI tools are a
larger version of the same displacement: the floor rises, the ceiling rises
with it, and the pedagogical question is always what you are doing between
the two.</p>
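<p>That mechanical floor-check is simple enough to sketch. Here is a toy version — an illustration of the kind of rule such software encodes, not any notation package&rsquo;s actual algorithm — with each voice given as an array of MIDI pitch numbers:</p>

```typescript
// Toy parallel-fifths check between two voices given as MIDI pitch arrays.
// Flags beat i when the interval between the voices is a perfect fifth
// (7 semitones, modulo octave) on beats i-1 and i and both voices move.
// Direction of motion is ignored in this simplified version.
function parallelFifths(upper: number[], lower: number[]): number[] {
  const hits: number[] = [];
  for (let i = 1; i < Math.min(upper.length, lower.length); i++) {
    const prev = ((upper[i - 1] - lower[i - 1]) % 12 + 12) % 12;
    const curr = ((upper[i] - lower[i]) % 12 + 12) % 12;
    const moved = upper[i] !== upper[i - 1] && lower[i] !== lower[i - 1];
    if (prev === 7 && curr === 7 && moved) hits.push(i);
  }
  return hits;
}

// C–G moving to D–A: fifths on consecutive beats, both voices move.
console.log(parallelFifths([67, 69], [60, 62])); // [1]
// C–G moving to E–C: the second interval is not a fifth.
console.log(parallelFifths([67, 72], [60, 64])); // []
```

<p>The point of the analogy is exactly how little this code knows: it can police the mechanical rule, but everything about <em>why</em> the rule exists — voice-leading logic, expressive intention — still has to be taught.</p>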
<hr>
<h2 id="copyright-and-academic-integrity">Copyright and Academic Integrity</h2>
<p>Two issues crossed all three workshops and deserve direct treatment.</p>

<p>On copyright: the training data of generative music models includes copyrighted
recordings and scores, the legal status of which is actively litigated in
multiple jurisdictions. When Suno.AI generates a piece &ldquo;in the style of&rdquo;
a named composer, it is drawing on patterns extracted from that composer&rsquo;s work
— work that is under copyright in the case of living or recently deceased
composers. The output is not a direct copy, but neither is the relationship
to the training data legally settled. Music students who use these tools in
professional contexts should know that they are working in a legally uncertain
space, and institutions should not pretend otherwise.</p>
<p>On academic integrity: the issue is not that students might use AI to cheat —
they will, some of them, and they have always found ways to cheat with whatever
tools were available. The issue is that current AI policies at many institutions
are incoherent: prohibiting AI use in assessment while providing no clear
guidance on what counts as AI use, and assigning tasks where AI assistance is
undetectable and arguably appropriate. The more useful approach is to design
tasks where AI assistance is either irrelevant (because the task requires live
performance or real-time demonstration) or visible and assessed (because the
task explicitly includes reflection on how AI was used and to what effect).</p>
<hr>
<h2 id="three-things-i-came-away-with">Three Things I Came Away With</h2>
<p>After a full day of workshops, discussions, and the conversations that happen
in the corridors between sessions, I left with three positions that feel more
settled than they did in the morning.</p>
<p><strong>First</strong>: the data protection question is not separable from the pedagogical
question. Any serious curriculum discussion of AI in music education has to
start with what can legally be deployed, not with what would be useful if
constraints were not a factor. The constraints are a factor.</p>
<p><strong>Second</strong>: the skill most urgently needed — in students and in instructors —
is not AI literacy in the sense of knowing which tool to use for which task.
It is the critical capacity to evaluate AI-generated outputs: to notice what
is wrong, to understand <em>why</em> it is wrong, and to correct it. This requires
domain expertise first. You cannot critically evaluate an AI-generated harmonic
analysis if you do not understand harmonic analysis. The tools do not lower
the bar for domain knowledge. They raise the bar for its critical application.</p>
<p><strong>Third</strong>: the curriculum question is not &ldquo;how do we accommodate AI?&rdquo; It is
&ldquo;what are we actually trying to teach, and does the answer change when AI can
produce the visible output of that process?&rdquo; Answering that honestly, skill
by skill, for a full music programme, is slow work. It cannot be done at a
one-day event. But a one-day event, if it is well-designed, can start the
conversation in the right place.</p>
<p>HfMT&rsquo;s Thementag started it in the right place.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Devlin, J., Chang, M.-W., Lee, K., &amp; Toutanova, K. (2018). BERT:
Pre-training of deep bidirectional transformers for language understanding.
<em>arXiv preprint arXiv:1810.04805</em>. <a href="https://arxiv.org/abs/1810.04805">https://arxiv.org/abs/1810.04805</a></p>
</li>
<li>
<p>Goodfellow, I., Bengio, Y., &amp; Courville, A. (2016). <em>Deep Learning.</em>
MIT Press. <a href="https://www.deeplearningbook.org">https://www.deeplearningbook.org</a></p>
</li>
<li>
<p>Holmes, W., Bialik, M., &amp; Fadel, C. (2019). <em>Artificial Intelligence in
Education: Promises and Implications for Teaching and Learning.</em> Center for
Curriculum Redesign.</p>
</li>
<li>
<p>Jobin, A., Ienca, M., &amp; Vayena, E. (2019). The global landscape of AI ethics
guidelines. <em>Nature Machine Intelligence</em>, 1, 389–399.
<a href="https://doi.org/10.1038/s42256-019-0088-2">https://doi.org/10.1038/s42256-019-0088-2</a></p>
</li>
<li>
<p>LeCun, Y., Bengio, Y., &amp; Hinton, G. (2015). Deep learning. <em>Nature</em>,
521(7553), 436–444. <a href="https://doi.org/10.1038/nature14539">https://doi.org/10.1038/nature14539</a></p>
</li>
<li>
<p>Luckin, R., Holmes, W., Griffiths, M., &amp; Forcier, L. B. (2016).
<em>Intelligence Unleashed: An Argument for AI in Education.</em> Pearson.</p>
</li>
<li>
<p>Mittelstadt, B. D., Allo, P., Taddeo, M., Wachter, S., &amp; Floridi, L.
(2016). The ethics of algorithms: Mapping the debate. <em>Big Data &amp; Society</em>,
3(2). <a href="https://doi.org/10.1177/2053951716679679">https://doi.org/10.1177/2053951716679679</a></p>
</li>
<li>
<p>Roll, I., &amp; Wylie, R. (2016). Evolution and revolution in artificial
intelligence in education. <em>International Journal of Artificial Intelligence
in Education</em>, 26(2), 582–599.
<a href="https://doi.org/10.1007/s40593-016-0110-3">https://doi.org/10.1007/s40593-016-0110-3</a></p>
</li>
<li>
<p>Russell, S., &amp; Norvig, P. (2020). <em>Artificial Intelligence: A Modern
Approach</em> (4th ed.). Pearson.</p>
</li>
<li>
<p>Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez,
A. N., Kaiser, Ł., &amp; Polosukhin, I. (2017). Attention is all you need.
<em>Advances in Neural Information Processing Systems</em>, 30.
<a href="https://arxiv.org/abs/1706.03762">https://arxiv.org/abs/1706.03762</a></p>
</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Inner Echo: On Making Mental Illness Visible, and What That Even Means</title>
      <link>https://sebastianspicker.github.io/posts/inner-echo/</link>
      <pubDate>Thu, 28 Nov 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/inner-echo/</guid>
      <description>I am on the spectrum. Code is easy; emotions are not. This post is about the phrase &amp;lsquo;making mental illness visible&amp;rsquo;, what science actually tells us about that goal, why a non-affected person fundamentally cannot understand — and why trying still matters.</description>
      <content:encoded><![CDATA[<p>There is a phrase that appears in every mental health awareness campaign, every destigmatisation effort, every well-meaning poster in a university corridor: <em>make it visible</em>. Shine a light. Break the silence. Reduce stigma by talking about it.</p>
<p>I agree with the impulse. I am less sure about what the phrase actually asks of us, or what it assumes is possible. This post is my attempt to think through that question — and to document a small project that emerged from it.</p>
<h2 id="a-personal-starting-point">A Personal Starting Point</h2>
<p>I am on the spectrum. I was diagnosed in adulthood, which is not unusual, and the diagnosis explained a great deal about a life spent finding some things effortless and others bewildering.</p>
<p>Code is easy. The internal structure of a problem, the satisfaction of a clean abstraction, the deep rabbit holes that open when a concept catches my attention and refuses to let go — that is the natural medium. Hyperfocus is not a metaphor for me; it is literally how I spend a Tuesday afternoon. I have written entire systems because I could not stop.</p>
<p>Emotions are harder. Not absent — that is a misconception I will address in a moment — but differently structured. Reading a room is work. Social cues that seem to operate as obvious background noise for most people arrive for me as data that requires conscious decoding. The reverse appears to be true for most neurotypical people: emotional processing runs in the background, effortlessly; formal abstraction requires deliberate effort.</p>
<p>Neither is better. They are different cognitive architectures, and both come with costs.</p>
<p>I raise this not to centre myself, but because it is relevant to the question the post is actually about. I spent years navigating a social world that was not built for how I process it. That experience sits close to the experience of people with mental illness — not the same, but adjacent. And it made me think hard about what &ldquo;understanding&rdquo; across neurological difference actually means.</p>
<h2 id="mental-illness-is-still-a-grey-zone">Mental Illness Is Still a Grey Zone</h2>
<p>The progress on mental health stigma over the past decade is real. People talk about therapy more openly than they did. Burnout is acknowledged at work. The language of mental health has entered mainstream use — sometimes usefully, sometimes in ways that dilute clinical concepts into lifestyle descriptors. Anxiety is now a brand attribute. Trauma is a metaphor for mild inconvenience. This is a problem, but it is a second-order problem; the first-order problem — that serious mental illness is still heavily stigmatised, underfunded, and misunderstood — is the one that matters more.</p>
<p>Corrigan and Watson <a href="#ref-1">[1]</a> documented what the stigma research consistently shows: people with mental illness face two compounding problems. The first is public stigma <a href="#ref-3">[3]</a> — the prejudice of others, leading to discrimination in employment, housing, relationships. The second is self-stigma — the internalised application of those same prejudices to oneself. The second is often worse. It is the mechanism by which stigma becomes a barrier to seeking help, creating the feedback loop that keeps serious mental illness invisible precisely because the people experiencing it have been taught that it is shameful.</p>
<p>The phrase &ldquo;make it visible&rdquo; is a response to this dynamic. If mental illness is visible — discussed, depicted, normalised — then, the argument goes, stigma decreases. There is evidence for this. Contact-based interventions, where people without mental illness interact with people who have it, consistently outperform education-only approaches <a href="#ref-2">[2]</a>. The visibility of real people matters more than information campaigns.</p>
<p>But there is a difference between visibility and understanding.</p>
<h2 id="what-visibility-actually-achieves">What Visibility Actually Achieves</h2>
<p>When we say &ldquo;make it visible&rdquo;, we usually mean one of several different things, which are worth separating.</p>
<p><strong>Normalisation</strong> means that a condition becomes part of accepted human variation rather than a mark of failure or danger. This is achievable through visibility and is genuinely important. Knowing that a colleague takes antidepressants, or that a public figure manages bipolar disorder, reduces the sense of aberration. It does not require the observer to understand the experience — only to register that it exists and is survivable.</p>
<p><strong>Representation</strong> means that people with a condition see themselves reflected in culture, media, and institutions. This matters for the affected person; it is about recognition, not about inducing empathy in the non-affected.</p>
<p><strong>Empathy</strong> is the hardest and most frequently over-promised goal. It is what the simulation approaches aim for: put a neurotypical person in a room with distorted audio and flickering visuals and tell them this is what psychosis sounds like. Does it work?</p>
<p>The honest answer from the research is: somewhat, temporarily, and with significant caveats.</p>
<h2 id="the-empathy-gap">The Empathy Gap</h2>
<p>Let me be direct about something. A person who has never experienced severe depression cannot know what it is. Not in the way that a person who has experienced it knows it. This is not a failure of empathy or imagination; it is a structural fact about how knowledge of mental states works.</p>
<p>Philosophers call this the problem of other minds. We have no direct access to another person&rsquo;s experience. We infer it, imperfectly, by analogy to our own. For experiences that have no analogue in our own history, inference breaks down. You can read every clinical description of dissociation ever written and still not know what dissociation is, because the knowledge that matters is not propositional — it is not a set of facts — but experiential.</p>
<p>This is the gap that simulation approaches try to bridge, and it is genuinely unbridgeable. What simulation can do is something weaker but not worthless: it can create an affective response, a discomfort, a disruption of the observer&rsquo;s normal processing, that functions as a rough proxy signal. Not &ldquo;now you know what it is like&rdquo;, but &ldquo;now you have a small, incomplete, distorted approximation of some dimension of the experience&rdquo;.</p>
<p>The risk is misrepresentation. Schizophrenia simulations have been criticised — fairly — for reducing a complex condition to its most dramatic phenomenological features (auditory hallucinations, paranoia) while omitting the cognitive, relational, and longitudinal aspects that define how people actually live with the condition. A five-minute visual experience of &ldquo;what depression feels like&rdquo; that emphasises darkness and slow motion tells you almost nothing about the specific exhaustion of getting through a Tuesday morning, or the way time warps over months.</p>
<p>So: you cannot truly understand what you have not experienced. But you can try to approximate something, and approximation, done honestly and with appropriate epistemic humility, is better than nothing.</p>
<h2 id="metaphor-as-a-communication-tool">Metaphor as a Communication Tool</h2>
<p>There is a long tradition of using metaphor and art to communicate internal states that resist direct description. This is not a bug; it is a feature of how language handles subjective experience.</p>
<p>The poet uses metaphor because &ldquo;my heart is heavy&rdquo; is not literally true but captures something that &ldquo;I am experiencing low mood&rdquo; does not. The musician uses dissonance and rhythm to structure emotional experience in the listener. The visual artist uses colour and texture to evoke states rather than depict them. None of these are representations in the scientific sense — they do not accurately model the referent — but they create a kind of resonance that purely descriptive language cannot.</p>
<p>Mental health communication has increasingly moved in this direction. The vocabulary of &ldquo;emotional weight&rdquo;, &ldquo;spiralling&rdquo;, &ldquo;crashing&rdquo;, &ldquo;the fog&rdquo; — these are metaphors that have become clinical shorthand precisely because they communicate something essential that clinical terms do not. When someone says &ldquo;I couldn&rsquo;t get out of bed&rdquo;, they are not describing paralysis; they are describing a particular quality of anhedonia and executive dysfunction that no diagnostic manual entry captures as well.</p>
<p>This is the space where a project like inner-echo operates.</p>
<h2 id="inner-echo-the-idea">Inner Echo: The Idea</h2>
<p><a href="https://github.com/sebastianspicker/inner-echo">inner-echo</a> is a browser-based audiovisual experiment. It takes a webcam feed and applies condition-specific visual and audio effects that function as metaphorical overlays on the user&rsquo;s own image. The output is not a simulation of a mental health condition in any clinical sense. It is an attempt to construct a visual and auditory language for internal states, using the user&rsquo;s own presence as the anchor.</p>
<p>The technical architecture is deliberately minimal: React, WebGL/Canvas for video processing, optional WebAudio. Everything runs in the browser, client-side, with no backend. No data leaves the device. This is not incidental — privacy is load-bearing for a project that deals with sensitive self-reflection. Safe Mode and an emergency stop function are built in.</p>
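<p>The privacy claim is structural: every transformation is a pure function over a frame the browser already holds, so nothing needs to leave the device. A minimal sketch of what one such per-frame transform can look like — the effect and its parameters are illustrative inventions for this post, not the repository&rsquo;s actual code — operating on an RGBA pixel buffer of the kind Canvas&rsquo;s <code>getImageData()</code> returns:</p>

```typescript
// Illustrative per-frame effect: desaturate and darken an RGBA buffer in
// place, as a stand-in for one metaphorical overlay. In the browser this
// buffer would come from CanvasRenderingContext2D.getImageData().data.
function desaturateDarken(pixels: Uint8ClampedArray, amount: number): void {
  for (let i = 0; i < pixels.length; i += 4) {
    const [r, g, b] = [pixels[i], pixels[i + 1], pixels[i + 2]];
    const grey = 0.299 * r + 0.587 * g + 0.114 * b; // luma approximation
    // Blend each channel toward grey, then scale brightness down.
    pixels[i]     = (r + (grey - r) * amount) * (1 - 0.5 * amount);
    pixels[i + 1] = (g + (grey - g) * amount) * (1 - 0.5 * amount);
    pixels[i + 2] = (b + (grey - b) * amount) * (1 - 0.5 * amount);
    // Alpha (pixels[i + 3]) is left untouched.
  }
}
```

<p>Because the function mutates the buffer and touches nothing else, the &ldquo;no data leaves the device&rdquo; guarantee is something you can read off the code rather than something you have to trust a server operator about.</p>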
<p>The condition-profile system supports three modes:</p>
<ul>
<li><strong>Preset mode</strong>: a single-condition metaphorical composition — one set of effects mapped to one cluster of experiences</li>
<li><strong>Multimorbid mode</strong>: weighted stacking of multiple condition profiles, acknowledging that most people with mental health conditions do not have one thing</li>
<li><strong>Symptom-first mode</strong>: dimension-level control, letting the user build from individual symptom representations rather than diagnostic labels</li>
</ul>
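<p>Mechanically, the multimorbid mode&rsquo;s weighted stacking is a convex combination of per-profile effect parameters. A sketch of that idea — the profile names and parameter keys here are hypothetical, not the project&rsquo;s actual schema:</p>

```typescript
// Hypothetical schema: each profile maps effect parameter -> intensity 0..1.
type EffectParams = Record<string, number>;

// Blend several condition profiles by normalised weight, so that stacking
// two profiles at equal weight yields the average of their parameters.
function stackProfiles(
  profiles: { params: EffectParams; weight: number }[]
): EffectParams {
  const total = profiles.reduce((s, p) => s + p.weight, 0) || 1;
  const out: EffectParams = {};
  for (const { params, weight } of profiles) {
    for (const [key, value] of Object.entries(params)) {
      out[key] = (out[key] ?? 0) + value * (weight / total);
    }
  }
  return out;
}

const blurHeavy  = { params: { blur: 0.8, tremor: 0.1 }, weight: 2 };
const audioHeavy = { params: { blur: 0.2, echo: 0.9 },   weight: 1 };
console.log(stackProfiles([blurHeavy, audioHeavy]));
// blur: (0.8·2 + 0.2·1)/3 = 0.6; tremor: 0.2/3 ≈ 0.067; echo: 0.9/3 = 0.3
```

<p>Symptom-first mode then falls out almost for free: instead of blending whole labelled profiles, the user sets the dimension-level values directly.</p>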
<p>The last of these is, I think, the most honest design choice. Diagnostic categories are administrative conveniences as much as they are natural kinds. Two people with the same diagnosis can have radically different experiences. Structuring the system around dimensions of experience rather than labels is both clinically more accurate and communicatively more flexible.</p>
<h2 id="what-it-is-not">What It Is Not</h2>
<p>Being clear about limitations is not false modesty; it is the only way this kind of project retains its integrity.</p>
<p>inner-echo is not a simulation of any condition in the sense of accurately modelling its phenomenology. It does not claim to show you &ldquo;what depression is like&rdquo;. It offers metaphorical approximations of some dimensions of some experiences, and it does so using effects that are legible to the observer — visual distortion, audio modification, altered feedback — that bear a designed but non-literal relationship to the internal states they are meant to evoke.</p>
<p>It is not a diagnostic tool. It is not a therapeutic intervention. It is not a substitute for any clinical process.</p>
<p>What it might be is a starting point for a conversation. Something a person experiencing a condition could use to gesture toward an aspect of their experience. Something a person without that experience could encounter with enough curiosity to ask a better question than they would have otherwise.</p>
<p>That is a modest claim. I think modest claims are appropriate here.</p>
<h2 id="why-this-why-now">Why This, Why Now</h2>
<p>Mental health awareness has become a genre. The awareness campaigns, the celebrity disclosures, the workplace wellness programmes — these are real goods, and I do not want to be cynical about them. But the communication problem has not been solved. The words exist. The willingness to use them, in many contexts, exists. What is still missing is a language for the texture of experience that the words point to but do not reach.</p>
<p>I find myself better able to build something than to explain it in words. That is probably a spectrum thing. inner-echo is an attempt to build toward a language that I do not fully have — for my own internal experience, and for the experiences of people navigating conditions quite different from mine.</p>
<p>The gap cannot be closed. But the attempt to reach across it is worth making, and worth being honest about.</p>
<hr>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Corrigan, P.W. &amp; Watson, A.C. (2002). Understanding the impact of stigma on people with mental illness. <em>World Psychiatry</em>, 1(1), 16–20.</p>
<p><span id="ref-2"></span>[2] Corrigan, P.W., Morris, S.B., Michaels, P.J., Rafacz, J.D. &amp; Rüsch, N. (2012). Challenging the public stigma of mental illness: A meta-analysis of outcome studies. <em>Psychiatric Services</em>, 63(10), 963–973.</p>
<p><span id="ref-3"></span>[3] Goffman, E. (1963). <em>Stigma: Notes on the Management of Spoiled Identity</em>. Prentice-Hall.</p>
<p>inner-echo repository: <a href="https://github.com/sebastianspicker/inner-echo">https://github.com/sebastianspicker/inner-echo</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>After the Connection Is Stable, the Hard Part Begins</title>
      <link>https://sebastianspicker.github.io/posts/nmp-curriculum-reflective-practice/</link>
      <pubDate>Fri, 22 Nov 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/nmp-curriculum-reflective-practice/</guid>
      <description>A third post in the networked music performance series. Technical latency is solved. Institutional infrastructure has a name. What students actually learn — and what conservatoire curricula consistently get wrong about teaching it — turns out to be a different problem entirely.</description>
      <content:encoded><![CDATA[<p><em>Third post in a series. The <a href="/posts/nmp-latency-lola-mvtp/">August 2023 post</a>
covered latency measurements across six European research-network links.
The <a href="/posts/digital-music-labs-infrastructure/">June 2024 post</a> covered
what institutional infrastructure needs to look like for any of that to
be sustainably usable. This one covers what happens after both of those
problems are solved — which is when the genuinely interesting educational
challenges start.</em></p>
<p><em>Based on a manuscript with colleagues from the RAPP Lab. Not yet peer-reviewed.</em></p>
<hr>
<h2 id="the-gap-nobody-talks-about">The Gap Nobody Talks About</h2>
<p>There is a version of the NMP success story that stops too early. It goes: we
installed LoLa, measured the latency, it came in at 9.5 ms to Vienna, the
musicians played together across 745 km, it worked. Success.</p>
<p>What this story skips is the classroom after the demo. The student who can
follow a setup checklist perfectly and still has no idea what to do musically
when the connection is stable. The ensemble that gets a clean signal running
and then plays exactly the same repertoire in exactly the same way they would
in a co-present rehearsal, fighting the latency instead of working with it,
frustrated when it does not feel right. The assessment rubric that checks off
&ldquo;maintained stable connection&rdquo; and &ldquo;completed the performance&rdquo; and has nothing
to say about everything that actually constitutes musical learning in a
networked context.</p>
<p>The gap between <em>technical feasibility</em> and <em>educational transformation</em> is
the subject of this post. Closing it turns out to require a different kind of
curriculum design than most conservatoires have tried.</p>
<hr>
<h2 id="what-gets-taught-versus-what-needs-to-be-learned">What Gets Taught Versus What Needs to Be Learned</h2>
<p>The default curricular response to NMP has been to treat it as a technical
skill with an artistic application. Students learn to configure an audio
interface, manage routing, establish a LoLa connection, and then — implicitly
— go do music. The technical content gets staged as a prerequisite to the
&ldquo;real&rdquo; work.</p>
<p>This ordering is wrong in a specific way. Technical setup work is genuinely
necessary, but making it a prerequisite treats the relationship between
technology and musical practice as sequential rather than recursive. In
practice, the interesting musical problems only become visible <em>through</em> the
technical ones. A student does not understand why buffer size matters until
they have felt the difference between a 5 ms and a 40 ms offset in a
coordination-intensive passage. A student does not develop an opinion about
audio routing configurations until they have experienced a rehearsal collapse
caused by a routing error they could have prevented.</p>
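<p>The arithmetic behind that felt difference is simple: each audio buffer adds <code>buffer_frames / sample_rate</code> seconds of delay before anything even reaches the network. A quick sketch — this is only the local buffering term, ignoring network propagation and the remote end&rsquo;s own buffering:</p>

```typescript
// One-way buffering delay in milliseconds for a given audio buffer size.
// Network propagation and the remote end's buffering add on top of this.
function bufferDelayMs(bufferFrames: number, sampleRate: number): number {
  return (bufferFrames / sampleRate) * 1000;
}

console.log(bufferDelayMs(256, 48000).toFixed(1));  // "5.3"  — feels immediate
console.log(bufferDelayMs(2048, 48000).toFixed(1)); // "42.7" — audibly late
```

<p>Reading the formula is one thing; the pedagogical claim is that the number only acquires meaning once a student has felt it in a coordination-intensive passage.</p>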
<p>The RAPP Lab&rsquo;s recurring insight across several years of module iterations
at HfMT Köln was more direct: once learners can establish a stable connection,
the harder challenge is developing artistic, collaborative and reflective
strategies for making music <em>together apart</em>. Technical fluency is a
foundation, not a destination.</p>
<hr>
<h2 id="the-curriculum-we-ended-up-with">The Curriculum We Ended Up With</h2>
<p>It took several cycles to get there. The early format was weekend workshops —
open, exploratory, no formal assessment, primarily for advanced students who
self-selected in. These were useful precisely because they were informal: they
revealed quickly how technical and musical questions become inextricable once
you are actually playing, and they gave us evidence about where students got
stuck that we would not have found from a needs analysis.</p>
<p>Over time, elements of those workshops were developed into recurring
curriculum-embedded modules, which then fed into independent study projects
and eventually into external collaborations and performances. The trajectory
mattered: moving from a one-off event to something longitudinal meant that
knowledge built across cohorts rather than resetting every time.</p>
<p>The module structure that emerged has three interlocking elements:</p>
<p><strong>Progressive task design.</strong> Early sessions are tightly scoped:
specific technical-musical exercises, limited repertoire, well-defined
success criteria. Later sessions move toward open-ended projects, student-led
rehearsal planning, and eventually cross-institutional partnerships where
variables are genuinely outside anyone&rsquo;s control. The point of the early
constraints is not to make things easier — it is to create conditions where
students can notice what they are doing rather than just surviving.</p>
<p><strong>Journals and debriefs.</strong> Students kept individual reflective journals
throughout modules, documenting not just what happened but how they responded
to it — technical problems, musical decisions, moments of coordination failure
and recovery, questions they could not answer at the time. Group debriefs
after each rehearsal then turned those individual threads into collective
knowledge: comparing strategies, naming the problems that came up repeatedly,
developing shared language for rehearsal coordination.</p>
<p>The debrief is the part of this model that I think gets undervalued. It is
not just reflection — it is <em>curriculum production</em>. Strategies that emerged
from one cohort&rsquo;s debriefs became documented starting points for subsequent
cohorts. Knowledge accumulated rather than evaporating when the semester ended.</p>
<p><strong>Portfolio assessment.</strong> Rather than assessing primarily on a final
performance, students assembled portfolios that could include curated journal
excerpts, rehearsal documentation, reflective syntheses, and accounts of
how their thinking changed. The question being assessed was not &ldquo;did you play
the concert&rdquo; but &ldquo;can you articulate why you made the decisions you made, and
what you would do differently.&rdquo;</p>
<hr>
<h2 id="what-students-actually-learn-when-the-curriculum-works">What Students Actually Learn (When the Curriculum Works)</h2>
<p>Four outcomes recurred across the RAPP Lab iterations, consistently enough
to be worth naming:</p>
<h3 id="1-technical-agency">1. Technical agency</h3>
<p>This is different from technical competence. Competence means you can follow
a procedure. Agency means you understand the procedure well enough to deviate
from it intelligently when something goes wrong — to diagnose what failed,
generate a hypothesis about why, and try something different.</p>
<p>The shift happened when students stopped treating technical problems as
interruptions to the music and started treating them as information about
the system they were working inside. A dropout is not just an annoyance; it
is evidence about where the failure occurred. Getting to that reframe took,
on average, several weeks of structured reflection. It did not happen from
reading documentation.</p>
<h3 id="2-adaptive-improvisation">2. Adaptive improvisation</h3>
<p>Latency changes what real-time musical coordination can mean. You cannot rely
on the same multimodal cues — breath, gesture, shared acoustics — that make
co-present ensemble playing feel intuitive. You have to develop explicit
cueing systems, turn-taking conventions, contingency plans for when the
connection degrades mid-performance.</p>
<p>What we observed was that this constraint generated a specific kind of
musical creativity. Students improvised not just with musical material but
with rehearsal organisation itself — inventing systems, testing them,
discarding the ones that did not work, documenting the ones that did. Some of
the most musically interesting moments in the modules came from sessions where
the technology was behaving badly and students had to make it work anyway.</p>
<p>There is research on &ldquo;productive failure&rdquo; — deliberately designing tasks that
exceed students&rsquo; current control, because the struggle and recovery produce
deeper learning than smooth execution (Kapur 2016). NMP turns out to be a
natural context for this, not by design but because the network does not
cooperate on schedule.</p>
<h3 id="3-collaborative-communication">3. Collaborative communication</h3>
<p>Co-present rehearsal relies heavily on implicit communication: the
physical space makes many things legible without anyone having to say them.
In a networked rehearsal, the spatial and gestural channel is degraded or
absent. Students had to make explicit what would normally be implicit —
articulating coordination strategies, naming the problems they were
experiencing rather than hoping the ensemble would notice, developing a
vocabulary for talking about timing and latency as musical parameters.</p>
<p>This turned out to generalise. Students who had worked through several
networked rehearsal cycles were noticeably better at explicit musical
communication in co-present settings too, because they had been forced to
develop the vocabulary in a context where it was necessary.</p>
<h3 id="4-reflective-identity">4. Reflective identity</h3>
<p>The students who got the most out of the modules were the ones who stopped
waiting for the conditions to improve and started working with the conditions
as they were. Latency as a compositional constraint rather than a defect to
be routed around. Uncertainty as an artistic condition rather than a
technical failure.</p>
<p>The journal entries where this shift is most visible are not the ones that
describe what the student did. They are the ones that describe a change in
how the student understands their own practice — who they are as a musician
in relation to an environment they cannot fully control. That is a different
kind of outcome than anything a timing metric captures.</p>
<hr>
<h2 id="the-assessment-problem">The Assessment Problem</h2>
<p>The hardest part of all of this to translate into institutional language is
assessment. The conservatoire has well-developed frameworks for evaluating
performances. It has much weaker frameworks for evaluating the learning that
happens before and between and underneath performances.</p>
<p>Checklist rubrics — was the connection stable, was the latency within
acceptable range, did the performance complete — are useful for safety and
reliability. They are poor evidence for whether a student has developed the
capacity to work reflectively and artistically in a mediated ensemble
environment. A student who achieved a stable connection by following
instructions exactly and a student who achieved it by diagnosing a routing
error mid-session look identical on a checklist. They have had very different
learning experiences.</p>
<p>Portfolio assessment addresses this by making the reasoning visible. When a
student can explain why they chose a particular buffer configuration given
the specific network characteristics of that session, how that choice affected
the musical phrasing in the piece they were rehearsing, and what they would
change next time — that is evidence of something real. It is also harder to
assess than a timing log, which is probably why most programmes avoid it.</p>
<p>The argument is not that quantitative indicators are useless. It is that
they function better as scaffolding for reflective judgement than as the
primary evidence of learning. Mixed assessment ecologies — technical logs
plus journals plus portfolio syntheses — are more honest about what is
actually happening educationally.</p>
<hr>
<h2 id="what-this-does-not-solve">What This Does Not Solve</h2>
<p>The model described here depends on teaching staff who can facilitate
reflective dialogue, curate knowledge across cohorts, and participate in
iterative curriculum redesign. That is a specific professional competence
that is not automatically present in a conservatoire staffed primarily by
performing musicians. The training and support structures needed to develop
it are an open question this paper does not fully answer.</p>
<p>The curriculum is also not portable as-is. The RAPP Lab model emerged in a
specific institutional context — HfMT Köln, specific partner network,
specific funding structure, specific cohort of students. The four outcomes
and the general pedagogical logic may transfer; the specific formats will
need adaptation. Any institution that tries to implement this without going
through at least one cycle of their own iterative development is likely to
end up with a checklist version of something that works only when it is a
living process.</p>
<p>And the technology keeps moving. LoLa is a mature platform but the
ecosystem around it — network configurations, operating system support,
hardware lifecycles — changes faster than curriculum documentation. Building
responsiveness into the curriculum itself, rather than treating it as a fixed
syllabus, is the structural answer. Easier to recommend than to institutionalise.</p>
<hr>
<h2 id="references">References</h2>
<p>Barrett, H. C. (2007). Researching electronic portfolios and learner
engagement. <em>Journal of Adolescent &amp; Adult Literacy</em>, 50(6), 436–449.</p>
<p>Borgdorff, H. (2012). <em>The Conflict of the Faculties.</em> Leiden University Press.</p>
<p>The Design-Based Research Collective (2003). Design-based research: An
emerging paradigm for educational inquiry. <em>Educational Researcher</em>, 32(1),
5–8.</p>
<p>Kapur, M. (2016). Examining productive failure, productive success,
unproductive failure, and unproductive success in learning. <em>Educational
Psychologist</em>, 51(2), 289–299. <a href="https://doi.org/10.1080/00461520.2016.1155457">https://doi.org/10.1080/00461520.2016.1155457</a></p>
<p>Lave, J. &amp; Wenger, E. (1991). <em>Situated Learning.</em> Cambridge University Press.</p>
<p>Sadler, D. R. (2009). Indeterminacy in the use of preset criteria for
assessment and grading. <em>Assessment &amp; Evaluation in Higher Education</em>,
34(2), 159–179. <a href="https://doi.org/10.1080/02602930801956059">https://doi.org/10.1080/02602930801956059</a></p>
<p>Schön, D. A. (1983). <em>The Reflective Practitioner.</em> Basic Books.</p>
<p>Wenger, E. (1998). <em>Communities of Practice.</em> Cambridge University Press.
<a href="https://doi.org/10.1017/CBO9780511803932">https://doi.org/10.1017/CBO9780511803932</a></p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-01-20</strong>: Updated the Sadler (2009) reference title to &ldquo;Indeterminacy in the use of preset criteria for assessment and grading,&rdquo; matching the journal article at this DOI. Updated the Kapur (2016) reference to the full published title: &ldquo;Examining productive failure, productive success, unproductive failure, and unproductive success in learning.&rdquo;</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Primes Are Energy Levels: The Montgomery-Odlyzko Conjecture</title>
      <link>https://sebastianspicker.github.io/posts/riemann-primes-quantum-chaos/</link>
      <pubDate>Mon, 18 Nov 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/riemann-primes-quantum-chaos/</guid>
      <description>In October 2024, the largest known prime was discovered — 41 million digits, found by a GPU cluster. But the deepest prime story is not about record-breaking numbers. It is about a 1972 teatime conversation at the Institute for Advanced Study, a pair correlation formula, and the suspicion — numerically confirmed to extraordinary precision — that the zeros of the Riemann zeta function are the energy levels of an undiscovered quantum system.</description>
      <content:encoded><![CDATA[<h2 id="a-very-large-prime">A Very Large Prime</h2>
<p>On 12 October 2024, a retired NVIDIA engineer named Luke Durant announced that he had found the 52nd known Mersenne prime. The number is $2^{136{,}279{,}841} - 1$, and writing it out in decimal requires 41,024,320 digits. Durant had organised a cloud network of GPU servers spread across 17 countries — essentially repurposing the hardware that normally trains language models to instead do modular arithmetic on numbers with tens of millions of digits. The verification alone took about 51 days of computation.</p>
<p>This is the kind of thing that makes headlines, and it deserves them. Mersenne primes are rare and verifying them is genuinely hard. But if I am honest, the more interesting prime story of the last half-century is not about the record-breaking number. It is about a conversation over tea in Princeton in 1972, and the increasingly hard-to-dismiss suspicion that the prime numbers are, in a precise statistical sense, quantum energy levels.</p>
<p>When I say &ldquo;quantum energy levels,&rdquo; I mean it almost literally — not as a metaphor. Let me explain.</p>
<h2 id="the-riemann-zeta-function-encodes-the-primes">The Riemann Zeta Function Encodes the Primes</h2>
<p>Start with the most famous function in number theory. For $\operatorname{Re}(s) > 1$, the Riemann zeta function is defined by the series</p>
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}.$$<p>This converges nicely and defines an analytic function. But the real reason to care about it is Euler&rsquo;s product formula:</p>
$$\zeta(s) = \prod_{p \text{ prime}} \frac{1}{1 - p^{-s}}.$$<p>This is not obvious — it follows from unique prime factorisation, essentially — but its implications are enormous. The product runs over <em>all</em> primes, and each prime contributes a factor. The primes are encoded in the analytic structure of $\zeta$. If you know $\zeta$, you know the primes; if you understand the zeros of $\zeta$, you understand their distribution.</p>
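The identity is easy to check numerically. Here is a minimal Python sketch (the sieve helper `primes_up_to` is my own construction, not anything from the sources above) comparing a truncated Dirichlet series with a truncated Euler product at $s = 2$, where the exact value $\zeta(2) = \pi^2/6$ is known in closed form:

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, is_p in enumerate(sieve) if is_p]

s = 2.0
N = 100_000

# Dirichlet series: sum over all integers up to N
zeta_sum = sum(1.0 / n ** s for n in range(1, N + 1))

# Euler product: product over all primes up to N
zeta_prod = 1.0
for p in primes_up_to(N):
    zeta_prod *= 1.0 / (1.0 - p ** -s)

exact = math.pi ** 2 / 6  # zeta(2)
```

Both truncations agree with $\pi^2/6$ to within about $10^{-5}$; the sum and the product converge to the same function because unique factorisation matches each integer $n$ with exactly one combination of prime-power factors.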
<p>Riemann&rsquo;s 1859 paper made this explicit (<a href="#ref-Riemann1859">Riemann, 1859</a>). He showed that $\zeta$ extends analytically to the whole complex plane (minus a simple pole at $s = 1$), and he wrote down an explicit formula connecting the prime-counting function</p>
$$\pi(x) = \#\{p \leq x : p \text{ prime}\}$$<p>to the zeros of $\zeta$. The formula is</p>
$$\pi(x) \approx \operatorname{Li}(x) - \sum_{\rho} \operatorname{Li}(x^{\rho}) + \text{(lower-order terms)},$$<p>where $\operatorname{Li}(x) = \int_2^x \frac{dt}{\ln t}$ is the logarithmic integral and the sum runs over the <em>non-trivial zeros</em> $\rho$ of $\zeta$.</p>
<p>What are the non-trivial zeros? The zeta function has trivial zeros at the negative even integers $-2, -4, -6, \ldots$ — boring, understood. The non-trivial zeros lie in the <em>critical strip</em> $0 < \operatorname{Re}(s) < 1$, and their imaginary parts are what drive the oscillatory corrections to $\pi(x)$. Each zero $\rho = \frac{1}{2} + it_n$ contributes a term that oscillates like $x^{1/2} \cos(t_n \ln x)$. The prime distribution is a superposition of these oscillations, one per zero.</p>
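To get a feel for how good the leading term alone is, one can compare $\pi(x)$ against $\operatorname{Li}(x)$ numerically. A rough Python sketch (the sieve and the midpoint quadrature are my own implementation choices, not anything prescribed by the formula):

```python
import math

def prime_count(x):
    """pi(x): number of primes <= x, by sieve."""
    sieve = [True] * (x + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(x ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, x + 1, i):
                sieve[j] = False
    return sum(sieve)

def li(x, steps=200_000):
    """Li(x) = integral from 2 to x of dt / ln t, midpoint rule."""
    h = (x - 2.0) / steps
    return h * sum(1.0 / math.log(2.0 + (k + 0.5) * h) for k in range(steps))

x = 100_000
pi_x = prime_count(x)  # 9592
li_x = li(x)           # overshoots pi_x by a few dozen at this height
```

The discrepancy, a few dozen out of nearly ten thousand, is exactly the oscillatory contribution of the zeros that the explicit formula accounts for.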
<p>The Riemann Hypothesis is the claim that all non-trivial zeros lie on the <em>critical line</em> $\operatorname{Re}(s) = \frac{1}{2}$. It has been verified numerically for the first $10^{13}$ zeros (Gourdon, 2004; building on earlier high-height computations by <a href="#ref-Odlyzko1987">Odlyzko, 1987</a>). It has not been proved. It remains, after 165 years, the most important unsolved problem in mathematics.</p>
<h2 id="tea-with-dyson">Tea with Dyson</h2>
<p>In 1972, Hugh Montgomery was visiting the Institute for Advanced Study in Princeton. He was working on a specific question: if you take the imaginary parts of the non-trivial zeros of $\zeta$ and normalise them so that their mean spacing is 1, what is the distribution of spacings between them?</p>
<p>More precisely, he was computing the <em>pair correlation function</em> of the normalised zeros. If $\tilde{\gamma}_n$ are the normalised imaginary parts (ordered $\tilde{\gamma}_1 \leq \tilde{\gamma}_2 \leq \cdots$), the pair correlation function $R_2(r)$ measures the density of pairs $(\tilde{\gamma}_m, \tilde{\gamma}_n)$ with $\tilde{\gamma}_n - \tilde{\gamma}_m \approx r$.</p>
<p>Montgomery found — subject to certain assumptions about the behaviour of $\zeta$ — that</p>
$$R_2(r) = 1 - \left(\frac{\sin \pi r}{\pi r}\right)^2.$$<p>(<a href="#ref-Montgomery1973">Montgomery, 1973</a>)</p>
<p>He mentioned this to Freeman Dyson over tea. Dyson — who had spent years on quantum mechanics and random matrix theory — recognised the formula immediately. That expression, $1 - (\sin \pi r / \pi r)^2$, is exactly the pair correlation function of eigenvalues of random matrices drawn from the Gaussian Unitary Ensemble.</p>
<p>Montgomery had not been thinking about quantum mechanics. Dyson had not been thinking about primes. The formula matched.</p>
<h2 id="the-gaussian-unitary-ensemble">The Gaussian Unitary Ensemble</h2>
<p>Let me say a few words about where that formula comes from in physics, because it is not obvious.</p>
<p>The Gaussian Unitary Ensemble (GUE) is a probability distribution over $N \times N$ Hermitian matrices. Specifically, it is the distribution proportional to $e^{-\operatorname{tr}(H^2)}$ on the space of Hermitian matrices, which is invariant under conjugation $H \mapsto U H U^\dagger$ for any unitary $U$. The entries on the diagonal are real Gaussians; the off-diagonal entries are complex Gaussians with independent real and imaginary parts.</p>
<p>In the limit $N \to \infty$, the eigenvalues of a GUE matrix distribute globally according to Wigner&rsquo;s semicircle law. But the local statistics — the fine-grained distribution of spacings between nearby eigenvalues — follow a universal law. The pair correlation function is</p>
$$R_2^{\text{GUE}}(r) = 1 - \left(\frac{\sin \pi r}{\pi r}\right)^2.$$<p>This distribution has a crucial qualitative feature called <em>level repulsion</em>: as $r \to 0$, $R_2(r) \to 0$. Eigenvalues of random Hermitian matrices strongly avoid each other. A Poisson distribution — which is what you would get for eigenvalues that were statistically independent — would give $R_2(r) = 1$ everywhere, with no such repulsion. The GUE formula suppresses small gaps quadratically: $R_2(r) \sim \pi^2 r^2 / 3$ for small $r$.</p>
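The repulsion is easy to see in simulation. The following Python sketch samples $2 \times 2$ GUE matrices, whose spacing distribution (the Wigner surmise for the unitary class) already shows the quadratic suppression of small gaps; the sample size and the $0.2$ cutoff are arbitrary choices of mine, made only to contrast against the Poisson prediction:

```python
import math
import random

random.seed(42)

# A 2x2 GUE matrix is [[a, x+iy], [x-iy, d]] with a, d real Gaussian
# and x, y Gaussian with half the variance. Its eigenvalue gap is
#   sqrt((a - d)^2 + 4 * (x^2 + y^2)).
spacings = []
for _ in range(50_000):
    a = random.gauss(0.0, 1.0)
    d = random.gauss(0.0, 1.0)
    x = random.gauss(0.0, math.sqrt(0.5))
    y = random.gauss(0.0, math.sqrt(0.5))
    spacings.append(math.sqrt((a - d) ** 2 + 4.0 * (x * x + y * y)))

# Normalise to unit mean spacing, as in the zero statistics.
mean = sum(spacings) / len(spacings)
normalised = [s / mean for s in spacings]

# Level repulsion: small normalised gaps are strongly suppressed.
frac_small = sum(1 for s in normalised if s < 0.2) / len(normalised)
poisson_small = 1.0 - math.exp(-0.2)  # ~0.18 for independent levels
```

For independent (Poisson) levels about 18% of spacings would fall below $0.2$; for the GUE ensemble it is under 1%. That is the quadratic suppression $R_2(r) \sim \pi^2 r^2 / 3$ in action.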
<p>Why does GUE statistics arise in physics? This is the content of the Bohigas-Giannoni-Schmit conjecture (1984), which by now has overwhelming numerical support: quantum systems whose classical limit is chaotic and which lack time-reversal symmetry have energy level statistics described by the GUE. Systems with time-reversal symmetry fall into the Gaussian Orthogonal Ensemble (GOE), which has a different but related formula. Nuclear energy levels, quantum billiards with the right shapes, molecular spectra — all of them, when appropriately normalised, show GUE or GOE statistics.</p>
<p>The universality is the point. It does not matter what the specific Hamiltonian is. If the system is sufficiently chaotic, the eigenvalue statistics are universal.</p>
<h2 id="odlyzkos-computation">Odlyzko&rsquo;s Computation</h2>
<p>Montgomery&rsquo;s result was conditional and covered only a limited range of $r$. The natural next step was numerical verification: actually compute a large number of Riemann zeros and measure their pair correlation.</p>
<p>Andrew Odlyzko did exactly this, in a series of computations beginning in the 1980s. The results were striking (<a href="#ref-Odlyzko1987">Odlyzko, 1987</a>). He computed millions of zeros with high precision and compared their empirical pair correlation to the GUE prediction. The agreement was not merely qualitative — it was quantitatively exact, to within the statistical error of the sample.</p>
<p>Odlyzko then pushed further. He computed zeros near the $10^{20}$-th zero, far out on the critical line. Same statistics. He computed zeros near the $10^{22}$-th zero. Same statistics. The agreement held regardless of how far up the critical line one went. This is not a small-sample artifact and it is not coincidence, or at least it would be an extraordinary coincidence of a kind that mathematics has never before encountered.</p>
<p>The plots from Odlyzko&rsquo;s computations are, in my view, some of the most beautiful images in mathematics. You draw the GUE prediction — a smooth curve, starting at zero, rising to approach 1 — and you overlay the empirical histogram from the Riemann zeros. They are the same curve.</p>
<h2 id="berry-keating-and-the-missing-hamiltonian">Berry, Keating, and the Missing Hamiltonian</h2>
<p>If the zeros of $\zeta$ are energy levels, there should be a Hamiltonian $H$ — a self-adjoint operator — whose spectrum is exactly $\{t_n\}$, the imaginary parts of the non-trivial zeros (assuming the Riemann Hypothesis, so that all zeros are of the form $\frac{1}{2} + it_n$).</p>
<p>In 1999, Michael Berry and Jon Keating proposed a candidate (<a href="#ref-BerryKeating1999">Berry &amp; Keating, 1999</a>). Their suggestion was the classical Hamiltonian</p>
$$H_{\text{cl}} = xp,$$<p>where $x$ is position and $p$ is momentum, quantised with appropriate symmetrisation:</p>
$$\hat{H} = \frac{1}{2}(\hat{x}\hat{p} + \hat{p}\hat{x}).$$<p>Classically, $H = xp$ describes a system in which the phase-space trajectories are hyperbolas $xp = E = \text{const}$, and the motion is $x(t) = x_0 e^t$, $p(t) = p_0 e^{-t}$ — exponential expansion in position, contraction in momentum. This is essentially the dynamics of an unstable fixed point, and it is classically chaotic in the appropriate sense.</p>
<p>The semiclassical (WKB) approximation gives an eigenvalue counting function</p>
$$N(E) \approx \frac{E}{2\pi} \ln \frac{E}{2\pi} - \frac{E}{2\pi} + \frac{7}{8} + \cdots,$$<p>which matches Riemann&rsquo;s formula for the number of zeros of $\zeta$ with imaginary part up to $T$:</p>
$$N(T) = \frac{T}{2\pi} \ln \frac{T}{2\pi} - \frac{T}{2\pi} + \frac{7}{8} + O\!\left(\frac{\ln T}{T}\right).$$<p>This is not a coincidence: the correspondence is exact at the level of the smooth counting function. The hard part is the oscillatory corrections — and those require the specific eigenvalues, which requires knowing the boundary conditions.</p>
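The smooth part of this counting function can be checked against the actual zeros at modest height. A quick Python sketch, using the known fact that there are exactly 29 zeros with $0 < \operatorname{Im}(\rho) < 100$ (the 29th lies near $98.83$, the 30th near $101.32$):

```python
import math

def n_smooth(t):
    """Smooth part of the zero-counting function N(T)."""
    u = t / (2.0 * math.pi)
    return u * math.log(u) - u + 7.0 / 8.0

# Compare against the exact count of 29 zeros below height 100.
approx = n_smooth(100.0)
```

The smooth formula gives about $29.0$ at $T = 100$; the remaining oscillation around the true staircase is what the eigenvalue statistics describe.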
<p>The problem is that $\hat{H} = \frac{1}{2}(\hat{x}\hat{p} + \hat{p}\hat{x})$ as an operator on $L^2(\mathbb{R})$ is not bounded below and has a continuous spectrum, not a discrete one. Turning it into an operator with a discrete spectrum matching the Riemann zeros requires boundary conditions that have not been found. This is the crux: Berry and Keating have the right classical system, but the quantum boundary conditions are missing.</p>
<p>What would be profound about finding $\hat{H}$? If $\hat{H}$ is self-adjoint, its eigenvalues are all real. If the non-trivial zeros are precisely $\frac{1}{2} + iE_n$, with the $E_n$ running over those eigenvalues, then every zero has real part exactly $\frac{1}{2}$ — which is the Riemann Hypothesis. A proof of the existence of such a Hamiltonian would, in one stroke, resolve the most important open problem in mathematics.</p>
<h2 id="primes-as-periodic-orbits-the-gutzwiller-analogy">Primes as Periodic Orbits: The Gutzwiller Analogy</h2>
<p>The quantum chaos connection goes deeper than pair correlations. In semiclassical quantum mechanics, the Gutzwiller trace formula relates the density of quantum energy levels to a sum over classical periodic orbits:</p>
$$d(E) = \bar{d}(E) + \sum_{\gamma} A_\gamma \cos\!\left(\frac{S_\gamma}{\hbar} - \phi_\gamma\right),$$<p>where the sum runs over all classical periodic orbits $\gamma$, $S_\gamma$ is the classical action of the orbit, $A_\gamma$ is an amplitude, and $\phi_\gamma$ is a phase (Maslov index correction). The smooth part $\bar{d}(E)$ comes from the Thomas-Fermi approximation; the oscillatory part encodes quantum interference between orbits.</p>
<p>The direct analogue in number theory is the <em>explicit formula</em> for the prime-counting function. Written in terms of the Chebyshev function $\psi(x)$, it reads</p>
$$\psi(x) = x - \sum_{\rho} \frac{x^\rho}{\rho} - \ln(2\pi) - \frac{1}{2}\ln(1 - x^{-2}),$$<p>where $\psi(x) = \sum_{p^k \leq x} \ln p$ is the Chebyshev function and the sum is over non-trivial zeros $\rho$.</p>
<p>Comparing these two formulas term by term: the zeros $\rho$ of $\zeta$ play the role of the quantum energy levels $E_n$; the primes $p$ — and their prime powers $p^k$ — play the role of the classical periodic orbits $\gamma$. The &ldquo;action&rdquo; of the orbit corresponding to $p^k$ is $k \ln p$. The primes are the primitive periodic orbits; $p^k$ is the $k$-th traversal of that orbit.</p>
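The leading term of the explicit formula, $\psi(x) \approx x$, can be verified directly. A small Python sketch (brute-force sieve, my own construction) that sums $\ln p$ over every prime power up to $x$:

```python
import math

def chebyshev_psi(x):
    """psi(x) = sum of ln p over all prime powers p^k <= x."""
    sieve = [True] * (x + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(x ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, x + 1, i):
                sieve[j] = False
    total = 0.0
    for p in range(2, x + 1):
        if sieve[p]:
            pk = p
            while pk <= x:       # each traversal p^k of the "orbit" p
                total += math.log(p)
                pk *= p
    return total

# psi(x) tracks x closely; the residual wobble is the zeros' contribution.
val = chebyshev_psi(100_000)
```

At $x = 10^5$ the result sits within a fraction of a percent of $x$ itself. Everything the explicit formula's zero sum does is contained in that small residual.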
<p>This is not a metaphor or a loose analogy. The Selberg trace formula — developed for the Laplacian on hyperbolic surfaces — makes this correspondence rigorous in a related setting: the periodic geodesics on a hyperbolic surface play the role of primes, and the eigenvalues of the Laplacian play the role of Riemann zeros (<a href="#ref-RudnickSarnak1996">Rudnick &amp; Sarnak, 1996</a>). The Riemann zeta function is the limit of a family of such systems, in some sense that is still being made precise.</p>
<p>I find it remarkable that the logarithms of primes — the most elementary sequence in arithmetic — appear as lengths of orbits in what would be a quantum chaotic system. Each prime contributes an oscillation to $\psi(x)$ with &ldquo;frequency&rdquo; proportional to its logarithm. You are, in a sense, hearing the primes as quantum interference.</p>
<p>This connects to a theme that comes up elsewhere on this blog. The <a href="/posts/falling-cat-geometric-phase/">falling cat problem</a> involves Berry phase and geometric holonomy — again a situation where deep structure emerges from symmetry and topology. The <a href="/posts/schrodinger-cat-qubits/">Schrödinger cat in quantum computing</a> involves the spectacular fragility of quantum coherence. The Riemann zeros are, if the conjecture is right, a quantum system that has never decohered — a perfectly coherent spectrum hiding inside the most ancient problem in mathematics.</p>
<h2 id="a-brief-detour-maynard-and-primes-without-digits">A Brief Detour: Maynard and Primes Without Digits</h2>
<p>While we are talking about primes, I cannot resist a detour through two results of James Maynard, who received the Fields Medal in 2022.</p>
<p>The first concerns bounded gaps. Euclid proved that there are infinitely many primes. The Twin Prime Conjecture says there are infinitely many pairs of primes $(p, p+2)$. This remains open. But in 2013, Yitang Zhang proved something extraordinary: there are infinitely many pairs of primes differing by at most 70,000,000 (<a href="#ref-Zhang2014">Zhang, 2014</a>). The bound is large, but the qualitative statement — that gaps between primes are bounded infinitely often — was completely new. Shortly thereafter, Maynard independently proved a much stronger result using the Maynard-Tao sieve: infinitely many prime pairs with gap at most 600 (<a href="#ref-Maynard2015">Maynard, 2015</a>). A crowdsourced effort (Polymath8b) brought the bound down to 246. The Twin Prime Conjecture remains open, but 246 is a long way from 70,000,000.</p>
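The raw phenomenon behind these theorems is visible in small data. Here is a Python sketch (the sieve helper is mine) counting twin pairs and gaps of at most 246 among primes below $10^5$; no finite computation bears on the infinitude claims, of course — this only illustrates that small gaps are plentiful where we can look:

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, b in enumerate(sieve) if b]

ps = primes_up_to(100_000)

# Twin pairs (p, p+2) are always consecutive primes for p > 3.
twin_pairs = sum(1 for a, b in zip(ps, ps[1:]) if b - a == 2)

# The Polymath8b bound: consecutive primes within 246 of each other.
pairs_within_246 = sum(1 for a, b in zip(ps, ps[1:]) if b - a <= 246)
```

Below $10^5$ there are over a thousand twin pairs, and since the largest prime gap in that range is 72, every consecutive pair is within the Polymath8b bound. The hard part is proving that this never dries up.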
<p>The second result is stranger. Maynard proved in 2016 that for any decimal digit $d \in \{0, 1, \ldots, 9\}$, there are infinitely many primes whose decimal representation contains no instance of $d$. There are infinitely many primes with no $7$ in their decimal expansion. There are infinitely many primes with no $3$. The proof uses techniques from analytic number theory, specifically exponential sum estimates and sieve methods, and the method extends beyond base 10 to any sufficiently large base.</p>
<p>This is one of those results that sounds impossible on first hearing. Surely removing an entire digit should make most large numbers unavailable, so the primes run out? Not so. The density of such &ldquo;digitless&rdquo; numbers thins out, but not fast enough to eliminate infinitely many primes.</p>
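The thinning-but-not-vanishing behaviour shows up already at small scales. A Python sketch (helper function mine) measuring what fraction of primes below $10^5$ avoid the digit 7 — the cutoff is arbitrary and says nothing about Maynard's asymptotic result, but it makes the density question concrete:

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, b in enumerate(sieve) if b]

ps = primes_up_to(100_000)

# Primes whose decimal expansion contains no '7' at all.
no_seven = [p for p in ps if "7" not in str(p)]
fraction = len(no_seven) / len(ps)
```

Roughly half the primes in this range are 7-free. The fraction decays like $(9/10)^{\#\text{digits}}$ as numbers grow, which thins the candidates but, by Maynard's theorem, never exhausts the primes among them.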
<h2 id="the-52nd-mersenne-prime-and-what-we-do-not-know">The 52nd Mersenne Prime and What We Do Not Know</h2>
<p>Return to $M_{136{,}279{,}841} = 2^{136{,}279{,}841} - 1$. Mersenne primes have the form $2^p - 1$ where $p$ is a prime (though not all such numbers are prime — $2^{11} - 1 = 2047 = 23 \times 89$). They are tested via the Lucas-Lehmer primality test: define the sequence</p>
$$s_0 = 4, \qquad s_{n+1} = s_n^2 - 2.$$<p>Then $M_p = 2^p - 1$ is prime if and only if $s_{p-2} \equiv 0 \pmod{M_p}$.</p>
<p>The test requires $p - 2$ squarings modulo $M_p$. Each squaring involves numbers with roughly $p$ bits, and modular reduction modulo $M_p = 2^p - 1$ is cheap because it reduces to bit-shifts and additions. This is why GPU parallelism helps enormously: each squaring can be broken into many parallel multiplications of sub-blocks of the operand. Durant&rsquo;s cloud network was, in effect, a massively distributed modular arithmetic engine.</p>
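The test itself is only a few lines of Python, since arbitrary-precision integers are built in. This sketch recovers the small Mersenne exponents; the trial-division primality check on the exponent is a deliberate simplification, fine for exponents this small:

```python
def lucas_lehmer(p):
    """Lucas-Lehmer: M_p = 2**p - 1 is prime iff s_(p-2) == 0 mod M_p.
    Assumes p is an odd prime."""
    m = (1 << p) - 1  # M_p; reduction mod 2**p - 1 is shifts and adds
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

# Recover the Mersenne exponents up to 127 (excluding p = 2,
# which the test does not cover).
mersenne_exponents = [
    q for q in range(3, 130)
    if all(q % d for d in range(2, q)) and lucas_lehmer(q)
]
# → [3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127]
```

Durant's computation is this same loop, run $136{,}279{,}839$ times on a number forty-one million digits long.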
<p>We do not know if there are infinitely many Mersenne primes. The heuristic Lenstra-Pomerance-Wagstaff conjecture says yes: the expected number of Mersenne primes $2^p - 1$ with $p \leq x$ is approximately</p>
$$e^\gamma \ln x / \ln 2 \approx 1.78 \cdot \log_2 x,$$<p>where $\gamma \approx 0.5772$ is the Euler-Mascheroni constant. This predicts roughly logarithmic growth in the count — consistent with the 52 known examples — but is nowhere near proved.</p>
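Plugging in the exponent of Durant's prime gives a sense of how the heuristic tracks reality. A short sketch (the value of $\gamma$ is truncated by me; nothing here is a proof of anything):

```python
import math

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant, truncated

def lpw_expected(x):
    """Lenstra-Pomerance-Wagstaff heuristic: expected number of
    Mersenne prime exponents p <= x."""
    return math.exp(GAMMA) * math.log(x) / math.log(2)

# Exponent of the 52nd known Mersenne prime (Durant, October 2024).
predicted = lpw_expected(136_279_841)
```

The heuristic predicts about 48 Mersenne primes with exponent up to $136{,}279{,}841$; 52 are known. Given that this is a conjecture about logarithmic growth, agreement to within ten percent is about as good as one could ask.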
<p>The known Mersenne primes do not form a sequence with obviously regular gaps. The exponents $p$ are: 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, &hellip; and then larger, less predictable values. Whether their distribution has GUE-like statistics is not a standard research question (the sample is too small), but the question of whether the primes $p$ for which $2^p - 1$ is prime have any special structure is an active one. For now, the answer is: we do not know.</p>
<h2 id="why-this-matters-and-why-it-does-not-prove-anything">Why This Matters, and Why It Does Not Prove Anything</h2>
<p>Let me be precise about what has and has not been established.</p>
<p>What has been established:</p>
<ul>
<li>Montgomery proved (conditionally, assuming a form of the generalised Riemann Hypothesis) that the pair correlation of Riemann zeros, for a certain range of $r$, is given by $1 - (\sin \pi r / \pi r)^2$.</li>
<li>Odlyzko verified numerically — to extraordinary precision, over billions of zeros — that the full empirical pair correlation matches the GUE prediction.</li>
<li>The Gutzwiller/Selberg analogy between periodic orbits and primes is mathematically precise in related settings (hyperbolic surfaces, function fields over finite fields).</li>
<li>Rudnick and Sarnak proved that the $n$-point correlation functions of Riemann zeros match GUE for all $n$, subject to a plausible conjecture about $\zeta$ (<a href="#ref-RudnickSarnak1996">Rudnick &amp; Sarnak, 1996</a>).</li>
</ul>
<p>What has not been established:</p>
<ul>
<li>There is no known Hamiltonian $\hat{H}$ whose spectrum is the set of Riemann zeros.</li>
<li>The Riemann Hypothesis remains open.</li>
<li>There is no proof that the Montgomery-Odlyzko connection is anything more than an extraordinary numerical coincidence.</li>
</ul>
<p>The broader context is the Langlands program — a still-hypothetical grand unification of number theory, algebraic geometry, and representation theory, sometimes described as a &ldquo;grand unified theory of mathematics.&rdquo; The Langlands correspondence predicts deep connections between $L$-functions (generalisations of $\zeta$) and representations of algebraic groups. The spectral interpretation of Riemann zeros — if it could be made precise — would fit naturally into this framework. Some researchers believe that a proof of the Riemann Hypothesis will come from the Langlands side, not from analytic number theory or quantum mechanics. Others think the quantum chaos connection is the right road. Nobody knows.</p>
<p>What would it mean if the connection is real? It would mean that the prime numbers — discovered by Euclid, studied for two and a half millennia, used today in every TLS handshake and RSA key — are the eigenvalues of a physical Hamiltonian. The abstract number-theoretic structure and the physical quantum mechanical structure would be not merely analogous but identical. That is a claim of the same depth as the unexpected appearance of the same partial differential equations in heat flow, diffusion, and Brownian motion: a discovery that what seemed to be different phenomena are manifestations of the same underlying law.</p>
<p>Or it could be a very surprising coincidence. Mathematics has a long history of producing such coincidences — the same numbers appearing in unrelated contexts for reasons that, when understood, turned out not to be coincidences at all. I suspect this is not a coincidence. But suspicion is not proof.</p>
<h2 id="a-closing-reflection">A Closing Reflection</h2>
<p>I started this post with the 52nd Mersenne prime because it is the news item that prompted me to write. GPU clusters finding 41-million-digit primes are genuinely impressive technology. But I keep returning to the image of Montgomery and Dyson at tea in 1972, and the formula $1 - (\sin \pi r / \pi r)^2$ connecting two conversations that had nothing to do with each other.</p>
<p>I have spent some time with random matrix theory, and separately with the zeta function, and the thing that still strikes me is how <em>clean</em> the connection is. This is not a numerical coincidence of the form &ldquo;these two quantities agree to 3 decimal places.&rdquo; Odlyzko&rsquo;s plots show agreement across many orders of magnitude, for zeros computed billions of entries into the sequence. The GUE curve and the empirical histogram are, visually, the same curve.</p>
<p>As someone trained as a physicist, I find this both encouraging and slightly unsettling. Encouraging because it suggests that the primes are not random — they have a structure, one that matches the eigenvalue repulsion of quantum chaotic systems, and that structure might be the key to proving the Riemann Hypothesis. Unsettling because it means that the quantum mechanical formalism — which I always thought was a description of a physical world — seems to be reaching into pure arithmetic, where there is no wave function, no Hilbert space, no measurement. The primes do not know they are supposed to be energy levels. And yet, statistically, they are.</p>
<p>If you find a flaw in this picture, or know of a result I have missed, I am genuinely interested. Peer review is welcome — open an issue on <a href="https://github.com/sebastianspicker/sebastianspicker.github.io/issues">GitHub</a>.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p><span id="ref-Riemann1859"></span>Riemann, B. (1859). Über die Anzahl der Primzahlen unter einer gegebenen Grösse. <em>Monatsberichte der Berliner Akademie</em>.</p>
</li>
<li>
<p><span id="ref-Montgomery1973"></span>Montgomery, H. L. (1973). The pair correlation of zeros of the zeta function. <em>Analytic Number Theory</em>, Proc. Symp. Pure Math., 24, 181–193.</p>
</li>
<li>
<p><span id="ref-Odlyzko1987"></span>Odlyzko, A. M. (1987). On the distribution of spacings between zeros of the zeta function. <em>Mathematics of Computation</em>, 48, 273–308. <a href="https://doi.org/10.2307/2007890">DOI: 10.2307/2007890</a></p>
</li>
<li>
<p><span id="ref-BerryKeating1999"></span>Berry, M. V., &amp; Keating, J. P. (1999). The Riemann zeros and eigenvalue asymptotics. <em>SIAM Review</em>, 41(2), 236–266. <a href="https://doi.org/10.1137/S0036144598347497">DOI: 10.1137/S0036144598347497</a></p>
</li>
<li>
<p><span id="ref-Zhang2014"></span>Zhang, Y. (2014). Bounded gaps between primes. <em>Annals of Mathematics</em>, 179(3), 1121–1174. <a href="https://doi.org/10.4007/annals.2014.179.3.7">DOI: 10.4007/annals.2014.179.3.7</a></p>
</li>
<li>
<p><span id="ref-Maynard2015"></span>Maynard, J. (2015). Small gaps between primes. <em>Annals of Mathematics</em>, 181(1), 383–413. <a href="https://doi.org/10.4007/annals.2015.181.1.7">DOI: 10.4007/annals.2015.181.1.7</a></p>
</li>
<li>
<p><span id="ref-RudnickSarnak1996"></span>Rudnick, Z., &amp; Sarnak, P. (1996). Zeros of principal L-functions and random matrix theory. <em>Duke Mathematical Journal</em>, 81(2), 269–322. <a href="https://doi.org/10.1215/S0012-7094-96-08115-6">DOI: 10.1215/S0012-7094-96-08115-6</a></p>
</li>
<li>
<p><span id="ref-GIMPS2024"></span>GIMPS (2024). 2^136279841-1 is Prime! Great Internet Mersenne Prime Search. Retrieved from <a href="https://www.mersenne.org/primes/?press=M136279841">https://www.mersenne.org/primes/?press=M136279841</a></p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-02-17</strong>: Corrected the date of the Montgomery-Dyson meeting from 1973 to 1972 (the paper was published in the 1973 proceedings volume, but the meeting at the IAS took place in April 1972).</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>The Hamiltonian of Intelligence: From Spin Glasses to Neural Networks</title>
      <link>https://sebastianspicker.github.io/posts/spin-glass-hopfield-ai-physics-lineage/</link>
      <pubDate>Mon, 21 Oct 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/spin-glass-hopfield-ai-physics-lineage/</guid>
      <description>On October 8, 2024, Hopfield and Hinton were awarded the Nobel Prize in Physics. The physics community reacted with irritation: is machine learning really physics? The irritation is wrong. The energy function of a Hopfield network is literally the Ising Hamiltonian. The lineage runs from Giorgio Parisi&amp;rsquo;s disordered iron alloys in 1979 to the model that predicted the structures of 200 million proteins.</description>
      <content:encoded><![CDATA[<p>On October 8, 2024, the Royal Swedish Academy of Sciences announced that the Nobel Prize in Physics would go to John Hopfield and Geoffrey Hinton &ldquo;for foundational discoveries and inventions that enable machine learning with artificial neural networks.&rdquo; Within hours, the physics corner of the internet had an episode. Thermodynamics Twitter — yes, that is a thing — asked whether gradient descent is really physics in the sense that the Higgs mechanism is physics. The condensed matter community, who have been doing disordered systems since before most ML practitioners were born, oscillated between pride (&ldquo;finally, they noticed us&rdquo;) and bafflement (&ldquo;why is Hinton here and not Parisi?&rdquo;). There were takes. There were dunks. Someone made a graph of Nobel prizes versus average journal impact factor and it was not flattering to this year&rsquo;s winner.</p>
<p>I understand the irritation. I do not share it.</p>
<p>The argument I want to make is stronger than &ldquo;machine learning uses some physics concepts by analogy.&rdquo; The energy function that Hopfield wrote down in 1982 is not <em>inspired by</em> the Ising Hamiltonian. It <em>is</em> the Ising Hamiltonian. The machine that Hinton and Sejnowski built in 1985 is not named after Boltzmann as a cute metaphor. It is a physical system whose equilibrium distribution is the Boltzmann distribution, and whose learning algorithm is derived from statistical mechanics. The lineage from disordered magnets to protein structure prediction is not a convenient narrative; it is a sequence of mathematical identities.</p>
<p>Let me trace it properly.</p>
<h2 id="the-2021-nobel-parisi-and-the-frozen-magnet">The 2021 Nobel: Parisi and the frozen magnet</h2>
<p>Before we get to 2024, we need 2021. Giorgio Parisi received half the Nobel Prize in Physics that year for work done between 1979 and 1983 on spin glasses. The other half went to Syukuro Manabe and Klaus Hasselmann for climate modelling — an interesting pairing that provoked its own set of takes, though rather fewer.</p>
<p>A spin glass is a disordered magnetic system. The canonical physical realisation is a dilute alloy: a small concentration of manganese atoms dissolved in copper. Each manganese atom carries a magnetic moment — a spin — that can point in one of two directions, which we label $\sigma_i \in \{-1, +1\}$. The spins interact with each other via exchange interactions mediated by the conduction electrons. The crucial feature is that these interactions are random: some spin pairs prefer to align (ferromagnetic coupling, $J_{ij} > 0$) and others prefer to anti-align (antiferromagnetic coupling, $J_{ij} < 0$), and there is no spatial pattern to which is which.</p>
<p>The Hamiltonian of the system is</p>
$$H = -\sum_{i < j} J_{ij} \sigma_i \sigma_j$$<p>where the $J_{ij}$ are random variables drawn from some distribution. In the Sherrington-Kirkpatrick (SK) model (<a href="#ref-Sherrington1975">Sherrington &amp; Kirkpatrick, 1975</a>), all $N$ spins interact with all other spins — a mean-field model — and the couplings are drawn from a Gaussian distribution with mean zero and variance $J^2/N$:</p>
$$J_{ij} \sim \mathcal{N}\!\left(0,\, \frac{J^2}{N}\right)$$<p>The factor of $1/N$ is essential for extensivity: without it, the energy would scale as $N^2$ rather than $N$, which is unphysical.</p>
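<p>A minimal numerical sketch of this setup — symmetric Gaussian couplings with variance $J^2/N$, energy from the Hamiltonian above — takes a few lines of NumPy (the system size and seed are arbitrary illustrative choices):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
N, J = 500, 1.0

# Symmetric Gaussian couplings J_ij ~ N(0, J^2/N), zero diagonal.
upper = np.triu(rng.normal(0.0, J / np.sqrt(N), size=(N, N)), k=1)
couplings = upper + upper.T

def sk_energy(spins):
    """H = -sum_{i<j} J_ij s_i s_j, written via the full symmetric matrix."""
    return -0.5 * spins @ couplings @ spins

spins = rng.choice([-1, 1], size=N)
energy_per_spin = sk_energy(spins) / N  # extensive: stays O(1) as N grows
```

<p>Per spin, the energy stays $O(1)$ as $N$ grows — which is exactly what the $1/N$ scaling of the coupling variance buys.</p>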
<p>Now here is the key phenomenon. At high temperature, the spins fluctuate freely and the system is paramagnetic. Cool it below the glass transition temperature $T_g$, and the system &ldquo;freezes&rdquo; — but not into a ferromagnet with all spins aligned, and not into a simple antiferromagnet. It freezes into one of an astronomically large number of disordered, metastable states. The system is not in its true ground state; it is trapped. It cannot find its way down because the energy landscape is rugged: every path toward lower energy is blocked by a barrier.</p>
<p>This rugged landscape is the central object. It has exponentially many local minima, separated by barriers that grow with system size. Different initial conditions lead to different frozen states. The system has memory of its history — hence &ldquo;glass&rdquo; rather than &ldquo;crystal.&rdquo;</p>
<p>Computing thermodynamic quantities in this system requires averaging over the disorder (the random $J_{ij}$), which means computing the quenched average of the free energy:</p>
$$\overline{F} = -T\, \overline{\ln Z}$$<p>The overline denotes an average over the distribution of couplings. The problem is that $\ln Z$ is hard to average because $Z$ is a sum of exponentially many terms. The route around this — the replica trick, already used by Sherrington and Kirkpatrick and pushed much further by Parisi — is a mathematical device worth describing, because it is beautifully strange.</p>
<p>The trick exploits the identity $\ln Z = \lim_{n \to 0} (Z^n - 1)/n$. We compute $\overline{Z^n}$ for integer $n$, which is feasible because $Z^n$ is a product of $n$ copies (replicas) of the partition function, and the average over disorder decouples. We then analytically continue in $n$ to $n \to 0$. The result is an effective action in terms of order parameters $q^{ab}$, which describe the overlap between spin configurations in replica $a$ and replica $b$.</p>
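<p>The $n \to 0$ limit sounds illegitimate, but the identity itself is easy to check numerically for any fixed $Z$ — here the partition function of two coupled Ising spins at $\beta J = 1$, a toy value chosen purely for illustration:</p>

```python
import numpy as np

# Partition function of two coupled Ising spins at beta*J = 1:
# Z = sum over the four states of exp(s1*s2) = 2e + 2/e.
Z = 2 * np.e + 2 / np.e

# (Z^n - 1)/n approaches ln Z as n -> 0.
estimates = [(Z**n - 1) / n for n in (1.0, 0.1, 0.01, 0.001)]
```

<p>The estimates converge on $\ln Z$ as $n$ shrinks; the replica method's real daring is doing the continuation after the disorder average, not before.</p>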
<p>The naive assumption is replica symmetry: all $q^{ab}$ are equal. This assumption turns out to be wrong. Parisi showed that the correct solution breaks replica symmetry in a hierarchical way — the overlap matrix $q^{ab}$ has a nested structure, described by a function $q(x)$ for $x \in [0,1]$. This is replica symmetry breaking (RSB).</p>
<p>RSB has a beautiful physical interpretation. The phase space of the spin glass is organised into an ultrametric tree: exponentially many states, arranged in nested clusters. States in the same cluster are similar (high overlap); states in different clusters are very different (low overlap). The hierarchy has infinitely many levels. Parisi proposed this structure as the exact solution of the SK model (<a href="#ref-Parisi1979">Parisi, 1979</a>); the rigorous proof came decades later, through the work of Guerra and Talagrand.</p>
<p>This is not an abstraction. RSB predicts specific, measurable properties of real spin glass alloys, and experiments have confirmed them. It is also, I want to emphasise, not a result that anyone expected. The mathematics forced it.</p>
<p>Three years after Parisi solved the SK model, a physicist at Bell Labs wrote a paper about memory.</p>
<h2 id="hopfield-1982-memory-as-energy-minimisation">Hopfield (1982): memory as energy minimisation</h2>
<p>John Hopfield was a condensed matter physicist who had drifted toward biophysics — electron transfer in proteins, neural computation. In 1982 he published a paper in PNAS with the title &ldquo;Neural networks and physical systems with emergent collective computational abilities&rdquo; (<a href="#ref-Hopfield1982">Hopfield, 1982</a>). Most biologists read it as a neuroscience paper. It is a statistical mechanics paper.</p>
<p>Hopfield defined a network of $N$ binary &ldquo;neurons&rdquo; $s_i \in \{-1, +1\}$ with symmetric weights $W_{ij} = W_{ji}$, and an energy function:</p>
$$E = -\frac{1}{2} \sum_{i \neq j} W_{ij}\, s_i s_j$$<p>Readers who have seen the SK Hamiltonian above will notice something. This is it. The $J_{ij}$ of the spin glass are the $W_{ij}$ of the neural network. The Ising spins $\sigma_i$ are the neuron states $s_i$. The Hopfield network energy function is the Ising model Hamiltonian with symmetric, fixed (non-random) couplings. This is not a metaphor. This is the same equation.</p>
<p>The dynamics: at each step, choose a neuron $i$ at random and update it according to</p>
$$s_i \leftarrow \text{sgn}\!\left(\sum_{j} W_{ij} s_j\right)$$<p>This update always decreases or leaves unchanged the energy $E$ (because the weights are symmetric and the neurons are updated one at a time). The network is a gradient descent machine on $E$. It will always converge to a local minimum — a fixed point.</p>
<p>The innovation is in how Hopfield chose the weights. To store a set of $p$ binary patterns $\xi^\mu \in \{-1,+1\}^N$ (for $\mu = 1, \ldots, p$), use Hebb&rsquo;s rule:</p>
$$W_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} \xi^\mu_i\, \xi^\mu_j$$<p>This is the outer product rule. Each stored pattern contributes a rank-1 matrix to $W$. You can verify that if $s = \xi^\mu$, then the local field at neuron $i$ is</p>
$$h_i = \sum_j W_{ij} s_j = \frac{1}{N}\sum_j \sum_{\nu} \xi^\nu_i \xi^\nu_j \xi^\mu_j = \xi^\mu_i + \frac{1}{N}\sum_{\nu \neq \mu} \xi^\nu_i \underbrace{\left(\sum_j \xi^\nu_j \xi^\mu_j\right)}_{\text{cross-talk}}$$<p>The first term reinforces pattern $\mu$. The second term is noise from the other stored patterns. When the patterns are random and uncorrelated, the cross-talk has zero mean and fluctuations of order $\sqrt{p/N}$, so for $p \ll N$ the first term dominates and the stored patterns are stable fixed points of the dynamics. A noisy or incomplete input — a partial pattern — will evolve under the dynamics toward the nearest stored pattern. This is associative memory: content-addressable retrieval.</p>
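<p>All of this — Hebbian storage, asynchronous sign updates, retrieval from a corrupted cue — fits in a dozen lines of NumPy. A sketch with illustrative sizes, keeping $p$ well below capacity:</p>

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 200, 10  # p/N = 0.05, comfortably below the ~0.14N capacity limit

# Random uncorrelated binary patterns to store.
patterns = rng.choice([-1, 1], size=(p, N))

# Hebb's outer-product rule: W_ij = (1/N) sum_mu xi^mu_i xi^mu_j, zero diagonal.
W = (patterns.T @ patterns) / N
np.fill_diagonal(W, 0.0)

def recall(state, sweeps=10):
    """Asynchronous sign updates; with symmetric weights the energy never
    increases, so the dynamics settle into a fixed point."""
    s = state.copy()
    for _ in range(sweeps):
        for i in rng.permutation(N):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

# Corrupt a stored pattern by flipping 14% of its bits, then retrieve it.
probe = patterns[0].copy()
flipped = rng.choice(N, size=N // 7, replace=False)
probe[flipped] *= -1
restored = recall(probe)
overlap = (restored @ patterns[0]) / N  # 1.0 means perfect retrieval
```

<p>With $p/N = 0.05$, the corrupted probe (overlap $0.72$ with the stored pattern) is pulled back to near-perfect overlap. Push $p$ past $\approx 0.14N$ and this retrieval degrades abruptly.</p>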
<p>The capacity limit follows from the same analysis. As $p$ grows, the cross-talk grows. When $p$ exceeds approximately $0.14N$, the cross-talk overwhelms the signal, and the network begins to form spurious minima — states that are not any of the stored patterns but are mixtures or corruptions of them. The network has entered a spin-glass phase.</p>
<p>This is not a rough analogy. Amit, Gutfreund, and Sompolinsky showed in 1985 that the Hopfield model is <em>exactly</em> the SK model with $p$ planted minima (<a href="#ref-Amit1985">Amit, Gutfreund, &amp; Sompolinsky, 1985</a>). The phase diagram of the Hopfield model — paramagnetic phase, memory phase, spin-glass phase — maps precisely onto the phase diagram of the SK model. The capacity limit $p \approx 0.14N$ is the phase boundary between the memory phase and the spin-glass phase, derivable from Parisi&rsquo;s RSB theory.</p>
<p>The 2021 Nobel and the 2024 Nobel are, mathematically, about the same model.</p>
<h2 id="boltzmann-machines-hinton--sejnowski-1985">Boltzmann machines (Hinton &amp; Sejnowski, 1985)</h2>
<p>The Hopfield model is deterministic and shallow — one layer of visible neurons, no hidden structure. Geoffrey Hinton and Terry Sejnowski, in a collaboration that began at the Cognitive Science summer school in Pittsfield in 1983 and culminated in a 1985 paper (<a href="#ref-Ackley1985">Ackley, Hinton, &amp; Sejnowski, 1985</a>), added two things: hidden units and stochastic dynamics.</p>
<p>Hidden units $h_j$ are neurons not connected to any input or output. They do not correspond to observable quantities; they model latent structure in the data. The energy of the system is:</p>
$$E(\mathbf{v}, \mathbf{h}) = -\sum_{i,j} W_{ij}\, v_i h_j - \sum_i a_i v_i - \sum_j b_j h_j$$<p>where $v_i$ are the visible (data) units, $h_j$ are the hidden units, $a_i$ and $b_j$ are biases. Note that this is still an Ising-type energy; the $W_{ij}$ are now inter-layer weights. (Written with inter-layer couplings only, as here, this is the energy of the <em>restricted</em> Boltzmann machine; the general machine also couples units within each layer.)</p>
<p>The stochastic dynamics replace deterministic gradient descent with a Markov chain. Each unit is updated probabilistically:</p>
$$P(s_k = 1 \mid \text{rest}) = \sigma\!\left(\sum_j W_{kj} s_j + \text{bias}_k\right)$$<p>where $\sigma(x) = 1/(1 + e^{-x})$ is the logistic sigmoid. At inverse temperature $\beta = 1/T$, the probability of any complete configuration is</p>
$$P(\mathbf{v}, \mathbf{h}) = \frac{1}{Z}\, e^{-\beta E(\mathbf{v}, \mathbf{h})}$$<p>This is the Boltzmann distribution. The machine is named after Ludwig Boltzmann because the equilibrium distribution of its states is the Boltzmann distribution. Not analogously. Literally.</p>
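<p>Sampling from this distribution is block Gibbs sampling: alternately resample the hidden units given the visibles and vice versa. A sketch with 0/1 units (the usual RBM convention; $\pm 1$ spins differ only by a change of variable) and random, untrained weights:</p>

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# A small machine with random illustrative weights, not a trained model.
n_vis, n_hid = 6, 4
W = rng.normal(0, 0.5, size=(n_vis, n_hid))
a = np.zeros(n_vis)  # visible biases
b = np.zeros(n_hid)  # hidden biases

def gibbs_step(v):
    """One sweep of block Gibbs sampling: v -> h -> v'."""
    p_h = sigmoid(v @ W + b)              # P(h_j = 1 | v)
    h = (rng.random(n_hid) < p_h).astype(float)
    p_v = sigmoid(h @ W.T + a)            # P(v_i = 1 | h)
    v_new = (rng.random(n_vis) < p_v).astype(float)
    return v_new, h

# Long runs of this chain sample from the Boltzmann distribution over (v, h).
v = rng.integers(0, 2, n_vis).astype(float)
for _ in range(100):
    v, h = gibbs_step(v)
```
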
<p>Learning amounts to adjusting the weights to make the model distribution $P(\mathbf{v}, \mathbf{h})$ match the data distribution $P_{\text{data}}(\mathbf{v})$. The objective is to minimise the Kullback-Leibler divergence:</p>
$$\mathcal{L} = D_{\mathrm{KL}}(P_{\text{data}} \| P_{\text{model}}) = \sum_{\mathbf{v}} P_{\text{data}}(\mathbf{v}) \ln \frac{P_{\text{data}}(\mathbf{v})}{P_{\text{model}}(\mathbf{v})}$$<p>The gradient with respect to the weight $W_{ij}$ is</p>
$$\frac{\partial \mathcal{L}}{\partial W_{ij}} = -\langle v_i h_j \rangle_{\text{data}} + \langle v_i h_j \rangle_{\text{model}}$$<p>The first term is the empirical correlation between visible unit $i$ and hidden unit $j$ when the visible units are clamped to data. The second term is the correlation in the model&rsquo;s free-running equilibrium. The learning rule says: increase $W_{ij}$ if the data sees these two units co-active more than the model does, and decrease it otherwise. This is Hebbian learning with a contrastive correction — the physics of equilibration drives the learning.</p>
<p>The computational difficulty is the second term. Computing $\langle v_i h_j \rangle_{\text{model}}$ requires the Markov chain to reach equilibrium, which takes exponentially long in general. Hinton&rsquo;s later invention of contrastive divergence — run the chain for only a few steps rather than to equilibrium — made training feasible, at the cost of a biased gradient estimate. This engineering compromise is part of why the physics purists are uncomfortable: the original derivation is rigorous statistical mechanics, but the algorithm that actually works in practice is an approximation whose convergence properties are poorly understood.</p>
<p>I find this charming rather than damning. Physics itself is full of approximations whose convergence properties are poorly understood but which happen to give right answers. Perturbation theory beyond leading order, the replica trick itself — these are not rigorous mathematics. They are informed guesses that happen to be correct. The history of theoretical physics is mostly the history of getting away with things.</p>
<h2 id="from-boltzmann-machines-to-transformers">From Boltzmann machines to transformers</h2>
<p>The Boltzmann machine was computationally difficult but conceptually foundational. The restricted Boltzmann machine (RBM) — with no within-layer connections, so that hidden units are conditionally independent given the visible units and vice versa — made training via contrastive divergence practical.</p>
<p>Hinton, Osindero, and Teh&rsquo;s 2006 paper on deep belief networks showed that stacking RBMs and pre-training them greedily could initialise deep networks well enough to fine-tune with backpropagation. This was the breakthrough that restarted deep learning after the winter of the 1990s. It is fair to say that without the Boltzmann machine as conceptual foundation and the RBM as practical building block, the deep learning revolution that gave us <a href="/posts/strawberry-tokenisation/">large language models that fail to count letters in words</a> would not have happened in the form it did.</p>
<p>The connection between Hopfield networks and modern attention mechanisms is more recent and more surprising. Ramsauer et al. (2020) showed that modern Hopfield networks — a generalisation of the original with continuous states and a different energy function — have exponential storage capacity (<a href="#ref-Ramsauer2020">Ramsauer et al., 2020</a>). More strikingly, the update rule of the modern Hopfield network is:</p>
$$\mathbf{s}^{\text{new}} = \mathbf{X}\, \text{softmax}\!\left(\beta \mathbf{X}^\top \mathbf{s}\right)$$<p>where $\mathbf{X}$ is the matrix of stored patterns and $\mathbf{s}$ is the query. This is the attention mechanism of the transformer, up to notation. The transformer&rsquo;s multi-head self-attention is, formally, a generalised Hopfield retrieval step. The architecture that powers GPT and everything descended from it is, at one level of abstraction, an associative memory performing energy minimisation on a Hopfield energy landscape.</p>
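<p>The correspondence is easiest to see in code. One modern-Hopfield update, with the stored patterns serving as both keys and values (dimensions, pattern count, noise level, and $\beta$ are illustrative):</p>

```python
import numpy as np

rng = np.random.default_rng(4)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

d, M, beta = 64, 8, 8.0                 # dimension, stored patterns, inverse temp.
X = rng.normal(size=(d, M))             # columns are the stored patterns
X /= np.linalg.norm(X, axis=0)          # unit-normalise each pattern

# Query: a mildly noisy version of stored pattern 3.
s = X[:, 3] + 0.1 * rng.normal(size=d)

# One modern-Hopfield update. Read it as attention:
# output = values @ softmax(beta * keys^T @ query), with keys = values = X.
s_new = X @ softmax(beta * (X.T @ s))

retrieved = int(np.argmax(X.T @ s_new))               # index of the recalled pattern
similarity = s_new @ X[:, 3] / np.linalg.norm(s_new)  # cosine with the target
```

<p>Read the update line as attention with a single query and you have the transformer&rsquo;s core operation.</p>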
<p>I do not want to overstate this. The connection is formal and the interpretation is contested. But it is not nothing. The physicists who built the Hopfield network in 1982 were working on the same mathematical object that is now used to process language, images, and protein sequences at industrial scale.</p>
<h2 id="the-protein-folding-connection">The protein folding connection</h2>
<p>The 2024 Nobel Prize in Chemistry went to Demis Hassabis, John Jumper, and David Baker for computational protein structure prediction — specifically for AlphaFold2 (<a href="#ref-Jumper2021">Jumper et al., 2021</a>). This made October 2024 a remarkable month for Nobel Prizes in fields adjacent to artificial intelligence, and it is not a coincidence.</p>
<p>Protein folding is a spin-glass problem. A protein is a polymer of amino acids, each with different chemical properties and steric constraints. The protein folds into a unique three-dimensional structure — its native conformation — determined by its sequence. The energy landscape of the folding process is precisely the kind of rugged landscape that Parisi described for spin glasses: exponentially many misfolded states, separated by barriers, with the native structure as the global minimum (or close to it).</p>
<p>Levinthal&rsquo;s paradox, formulated in 1969, makes the absurdity quantitative. A modest protein of 100 amino acids might have $3^{100} \approx 5 \times 10^{47}$ possible conformations (allowing three dihedral angle states per residue). Random search of this space, at the rate of one conformation per picosecond, would take on the order of $10^{28}$ years — somewhat longer than the age of the universe. Yet proteins fold in milliseconds to seconds. They do not search randomly; the energy landscape is funnel-shaped, channelling the dynamics toward the native state. But predicting <em>which</em> state is the native one from sequence alone remained one of the hard problems of structural biology for fifty years.</p>
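<p>The arithmetic is worth checking once, by machine if not by hand:</p>

```python
residues, states_per_residue = 100, 3
conformations = states_per_residue ** residues       # 3^100, about 5e47

picoseconds_per_year = 3600 * 24 * 365 * 10**12      # about 3.15e19

# Exhaustive search at one conformation per picosecond:
search_years = conformations / picoseconds_per_year  # about 1.6e28
age_of_universe_years = 1.38e10
excess = search_years / age_of_universe_years        # about 1e18
```
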
<p>AlphaFold2 uses a transformer architecture — descended from the Boltzmann machine lineage — trained on millions of known protein structures. It does not simulate the folding dynamics; it has learned, from data, a mapping from sequence to structure that encodes the statistical mechanics of the folding funnel. The Nobel committee gave it the Chemistry prize because it is transforming biochemistry. But the conceptual machinery is pure statistical physics: representation of a high-dimensional energy landscape, approximation of the minimum, learned from the distribution of solved instances.</p>
<p>The three Nobels of 2021–2024 form the most coherent consecutive triple I can remember: Parisi showed how disordered energy landscapes behave; Hopfield and Hinton showed how to use energy landscapes as memory and learning machines; Hassabis and Jumper showed how to apply the resulting architecture to the most consequential outstanding problem in molecular biology. Each step is a mathematical consequence of the one before it.</p>
<h2 id="the-controversy-did-the-committee-err">The controversy: did the committee err?</h2>
<p>I said I understand the irritation. Here is what is right about it.</p>
<p>Hinton&rsquo;s work after the Boltzmann machine — backpropagation, dropout, convolutional networks, deep learning at ImageNet scale — is primarily engineering and empirical machine learning. The 2012 AlexNet result that restarted the field was not a theoretical physics contribution; it was a demonstration that known methods work very well on very large datasets with very large GPUs. The fact that it works is not explained by statistical mechanics. The scaling laws of neural networks (loss scales as a power law with compute, parameters, and data) are empirical observations that physicists have tried to explain with renormalisation group arguments with mixed success.</p>
<p>If the Nobel Prize in Physics were awarded for &ldquo;the work that most influenced technology in the past decade,&rdquo; the case for Hinton is strong. If it were awarded for &ldquo;the most important contribution to the science of physics,&rdquo; the case is weaker. There is a version of the Nobel announcement that emphasises the Boltzmann machine specifically — the 1985 paper that is literally named after a physicist and uses his distribution — and that version sits cleanly within physics. There is a broader version that encompasses all of Hinton&rsquo;s career, and that version includes a great deal of empirical machine learning that the physics community is reasonably reluctant to claim.</p>
<p>My view, for what it is worth from someone who has been <a href="/posts/ai-warfare-anthropic-atom-bomb/">thinking about AI ethics and consequences</a> for rather longer than feels comfortable: the Nobel correctly identifies that the foundational conceptual contributions — the Ising Hamiltonian as associative memory, the Boltzmann distribution as a learning target, the connection between statistical mechanics and computation — are physics. They came from physicists, they use physics mathematics, they extend physics intuition into a new domain. The subsequent scaling of these ideas using TPUs and transformer architectures is engineering. Valuable engineering, world-changing engineering, but engineering. The Nobel is for the former. If the citation had been more specific — &ldquo;for the Boltzmann machine and its demonstration that physical principles govern neural computation&rdquo; — the physics community would have been less irritated and equally correct.</p>
<p>What the irritation reveals is something slightly uncomfortable about disciplinary identity. Physicists are proud of universality: the idea that the same mathematical structures appear in wildly different physical systems. RSB in spin glasses, replica methods in random matrices, the Parisi–Sourlas correspondence between disordered systems and supersymmetric field theories — the joy of physics is precisely that these deep structural similarities cross domain boundaries. When that universality reaches into machine learning and says &ldquo;your transformer attention layer is a Hopfield retrieval step,&rdquo; physicists should be delighted, not affronted.</p>
<p>The <a href="/posts/ralph-loop/">agentic systems</a> that are being built right now on top of transformer architectures are doing something that looks, from a sufficiently abstract distance, like what the Hopfield network was designed to do: find stored patterns that match a query, and use them to generate a response. The <a href="/posts/car-wash-grounding/">failures of grounding</a> that I have written about elsewhere are, in this view, failures of the energy landscape — the model finds a metastable state that is not the correct minimum, and the dynamics cannot escape. Spin glass physics does not explain these failures in detail, but it gives a language for thinking about them. That is what physics is for.</p>
<h2 id="the-universality-argument">The universality argument</h2>
<p>Let me make the deeper claim explicit. Why should disordered magnets, associative memory networks, and protein folding all live in the same mathematical family?</p>
<p>Because they all have the same structure: many interacting degrees of freedom with competing constraints, a combinatorially large configuration space, an energy landscape with exponentially many metastable states, and dynamics that search for — and frequently fail to find — global minima. This is a universality class. The specific details (magnetic moments versus neuron states versus dihedral angles) are irrelevant at the level of the energy landscape topology.</p>
<p>Parisi&rsquo;s contribution was to show that this class has a specific, exactly-solvable structure in mean field theory, characterised by replica symmetry breaking and the ultrametric organisation of states. This was not a solution to one model. It was a description of a universality class. The fact that the Hopfield model is in this class is not a coincidence requiring explanation; it is a mathematical identity requiring verification.</p>
<p>The <a href="/posts/kuramoto-ensemble-sync/">Kuramoto model for coupled oscillators</a> — which I have written about in the context of ensemble synchronisation and neural phase coupling — is another member of this extended family. The synchronisation transition in the Kuramoto model, the glass transition in the SK model, and the memory phase transition in the Hopfield model are all mean-field phase transitions in disordered many-body systems. The mathematics is more similar than the physics syllabi suggest.</p>
<p>When I teach physics and occasionally venture into questions about what the AI tools my students are using actually do, I find myself reaching for this framework. Not because it gives engineering insight into how to train a better model — it does not, particularly — but because it gives honest insight into <em>what kind of thing</em> a neural network is. It is a physical system. It has an energy landscape. Its failures are phase transitions. Its successes are energy minimisation. The vocabulary of statistical mechanics is not a metaphor; it is the correct description.</p>
<p>The Nobel committee noticed. They were right to notice.</p>
<hr>
<p><em>The 2021 and 2024 Nobel Prizes in Physics have now officially bridged the gap between condensed matter physics and machine learning in the public record. For anyone who wants to understand either field more deeply than the press releases suggest, the SK model and the Hopfield network are the right place to start. Both papers are short by modern standards — Parisi&rsquo;s 1979 letter is three pages; Hopfield&rsquo;s 1982 PNAS paper is five — and both repay close reading.</em></p>
<h2 id="references">References</h2>
<ul>
<li>
<p><span id="ref-Sherrington1975"></span>Sherrington, D., &amp; Kirkpatrick, S. (1975). Solvable model of a spin-glass. <em>Physical Review Letters</em>, 35(26), 1792–1796. <a href="https://doi.org/10.1103/PhysRevLett.35.1792">DOI: 10.1103/PhysRevLett.35.1792</a></p>
</li>
<li>
<p><span id="ref-Parisi1979"></span>Parisi, G. (1979). Infinite number of order parameters for spin-glasses. <em>Physical Review Letters</em>, 43(23), 1754–1756. <a href="https://doi.org/10.1103/PhysRevLett.43.1754">DOI: 10.1103/PhysRevLett.43.1754</a></p>
</li>
<li>
<p><span id="ref-Hopfield1982"></span>Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. <em>Proceedings of the National Academy of Sciences</em>, 79(8), 2554–2558. <a href="https://doi.org/10.1073/pnas.79.8.2554">DOI: 10.1073/pnas.79.8.2554</a></p>
</li>
<li>
<p><span id="ref-Ackley1985"></span>Ackley, D. H., Hinton, G. E., &amp; Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. <em>Cognitive Science</em>, 9(1), 147–169. <a href="https://doi.org/10.1207/s15516709cog0901_7">DOI: 10.1207/s15516709cog0901_7</a></p>
</li>
<li>
<p><span id="ref-Amit1985"></span>Amit, D. J., Gutfreund, H., &amp; Sompolinsky, H. (1985). Storing infinite numbers of patterns in a spin-glass model of neural networks. <em>Physical Review Letters</em>, 55(14), 1530–1533. <a href="https://doi.org/10.1103/PhysRevLett.55.1530">DOI: 10.1103/PhysRevLett.55.1530</a></p>
</li>
<li>
<p><span id="ref-Jumper2021"></span>Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., … Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. <em>Nature</em>, 596, 583–589. <a href="https://doi.org/10.1038/s41586-021-03819-2">DOI: 10.1038/s41586-021-03819-2</a></p>
</li>
<li>
<p><span id="ref-Ramsauer2020"></span>Ramsauer, H., Schäfl, B., Lehner, J., Seidl, P., Widrich, M., Adler, T., Gruber, L., Holzleitner, M., Pavlović, M., Sandve, G. K., Greiff, V., Kreil, D., Kopp, M., Klambauer, G., Brandstetter, J., &amp; Hochreiter, S. (2020). Hopfield networks is all you need. <em>arXiv:2008.02217</em>. Retrieved from <a href="https://arxiv.org/abs/2008.02217">https://arxiv.org/abs/2008.02217</a></p>
</li>
<li>
<p><span id="ref-Nobel2024"></span>Nobel Prize Committee. (2024). Scientific background: Machine learning and physical systems. The Royal Swedish Academy of Sciences. Retrieved from <a href="https://www.nobelprize.org/prizes/physics/2024/advanced-information/">https://www.nobelprize.org/prizes/physics/2024/advanced-information/</a></p>
</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Three Rs in Strawberry: What the Viral Counting Test Actually Reveals</title>
      <link>https://sebastianspicker.github.io/posts/strawberry-tokenisation/</link>
      <pubDate>Mon, 07 Oct 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/strawberry-tokenisation/</guid>
      <description>In September 2024, OpenAI revealed that its new o1 model had been code-named &amp;ldquo;Strawberry&amp;rdquo; internally — the same word that language models have famously been unable to count letters in. The irony was too perfect to pass up. But the counting failure is not a sign that LLMs are naive or broken. It is a precise, informative symptom of how they process text. Here is the actual explanation, with a minimum of hand-waving.</description>
      <content:encoded><![CDATA[<h2 id="the-setup">The Setup</h2>
<p>In September 2024, OpenAI publicly confirmed that their new reasoning model
had been code-named &ldquo;Strawberry&rdquo; during development. This landed with a
particular thud because &ldquo;how many r&rsquo;s are in strawberry?&rdquo; had, by that
point, become one of the canonical demonstrations of language model failure.
The model named after strawberry could not count the letters in strawberry.
The internet had opinions.</p>
<p>Before the opinions: the answer is three. s-t-<strong>r</strong>-a-w-b-e-<strong>r</strong>-<strong>r</strong>-y.
One in the <em>str-</em> cluster, two in the <em>-rry</em> ending. Most people
get this right on the first try; most large language models got it wrong,
returning &ldquo;two&rdquo; with apparent confidence.</p>
<p>The question worth asking is not &ldquo;why is the model stupid.&rdquo; It is not
stupid, and &ldquo;stupid&rdquo; is not a useful category here. The question is: what
does this specific error reveal about the structure of the system?</p>
<p>The answer involves tokenisation, and it is actually interesting.</p>
<hr>
<h2 id="how-you-count-letters-and-how-the-model-doesnt">How You Count Letters (and How the Model Doesn&rsquo;t)</h2>
<p>When you count the r&rsquo;s in &ldquo;strawberry,&rdquo; you do something like this:
scan the string left to right, maintain a running count, increment it
each time you see the target character. This is a sequential operation
over a character array. It requires no semantic knowledge about the word —
it does not matter whether &ldquo;strawberry&rdquo; is a fruit, a colour, or a
nonsense string. The characters are the input; the count is the output.</p>
<p>A language model does not receive a character array. It receives a
sequence of <em>tokens</em> — chunks produced by a compression algorithm called
Byte Pair Encoding (BPE) that the model was trained with. In the
tokeniser used by GPT-class models, &ldquo;strawberry&rdquo; is most likely split as:</p>
$$\underbrace{\texttt{str}}_{\text{token 1}} \;\underbrace{\texttt{aw}}_{\text{token 2}} \;\underbrace{\texttt{berry}}_{\text{token 3}}$$<p>Three tokens. The model&rsquo;s input is these three integer IDs, each looked up
in an embedding table to produce a vector. There is no character array.
There is no letter &ldquo;r&rdquo; sitting at a known position. There are three dense
vectors representing &ldquo;str,&rdquo; &ldquo;aw,&rdquo; and &ldquo;berry.&rdquo;</p>
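<p>The contrast is easy to state in code. Counting letters per token and summing is trivial once you have the characters — the point is that the model never receives the strings on the left-hand side (the split is the plausible segmentation shown above):</p>

```python
def count_letter_by_token(tokens, letter):
    """Count a letter inside each token, then sum. The summed view is what
    a model would need; the per-character view is what it never receives."""
    per_token = {tok: tok.count(letter) for tok in tokens}
    return per_token, sum(per_token.values())

# The three-token split described above -- one plausible BPE segmentation.
per_token, total = count_letter_by_token(["str", "aw", "berry"], "r")
# per_token == {"str": 1, "aw": 0, "berry": 2}; total == 3
```
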
<hr>
<h2 id="what-bpe-does-and-doesnt-preserve">What BPE Does (and Doesn&rsquo;t) Preserve</h2>
<p>BPE is a greedy compression algorithm. Starting from individual bytes,
it iteratively merges the most frequent pair of adjacent symbols into a
single new token:</p>
$$\text{merge}(a, b) \;:\; \underbrace{a \;\; b}_{\text{separate}} \;\longrightarrow\; \underbrace{ab}_{\text{single token}}$$<p>Applied to a large text corpus until a fixed vocabulary size is reached,
this produces a vocabulary of common subwords. Frequent words and common
word-parts become single tokens; rare sequences stay as multi-token
fragments.</p>
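<p>A toy version of the merge loop makes the greediness concrete. This sketch runs BPE on a single word rather than a corpus — not how real tokenisers are trained, but the mechanics are the same:</p>

```python
from collections import Counter

def bpe_merges(word, n_merges):
    """Greedy BPE on a single word: repeatedly fuse the most frequent
    adjacent pair of symbols. Real tokenisers learn merges from a corpus."""
    symbols = list(word)
    merges = []
    for _ in range(n_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append(a + b)
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (a, b):
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols, merges

tokens, merges = bpe_merges("strawberry", 4)
```

<p>Each pass fuses one pair; after a few merges, multi-character chunks are single symbols, and the original characters are gone from the representation.</p>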
<p>What BPE optimises for is compression efficiency, not character-level
transparency. A token like &ldquo;straw&rdquo; — a single token in many
vocabularies, even where the full word &ldquo;strawberry&rdquo; splits
differently — encodes the sequence s-t-r-a-w as a unit, but that character
sequence is not explicitly represented anywhere inside the model once the
embedding lookup has occurred. The model receives a vector for
&ldquo;straw,&rdquo; not a list of its constituent letters.</p>
<p>The character composition of a token is only accessible to the model
insofar as it was implicitly learned during training — through seeing
&ldquo;straw&rdquo; appear in contexts where its internal structure was relevant.
For most tokens, most of the time, that character structure was not
relevant. The model learned what &ldquo;straw&rdquo; means, not how to spell it
character by character.</p>
<hr>
<h2 id="why-the-error-is-informative">Why the Error Is Informative</h2>
<p>When models get this wrong, they almost always answer &ldquo;two,&rdquo; not &ldquo;one&rdquo; or
&ldquo;four&rdquo; or &ldquo;none.&rdquo; This is not random noise. It is a systematic
error, and systematic errors are diagnostic.</p>
<p>&ldquo;berry&rdquo; contains two r&rsquo;s: b-e-<strong>r</strong>-<strong>r</strong>-y. If you ask most models
&ldquo;how many r&rsquo;s in berry?&rdquo; they get it right. The model has seen that
question, or questions closely enough related, that the right count is
encoded somewhere in the weight structure.</p>
<p>&ldquo;str&rdquo; contains one r: s-t-<strong>r</strong>. But as a token it is a short, common
prefix that appears in hundreds of words — <em>string</em>, <em>strong</em>, <em>stream</em> —
contexts in which its internal letter structure is rarely attended to.
&ldquo;aw&rdquo; contains no r&rsquo;s. When the model answers &ldquo;two,&rdquo; it is almost
certainly counting the r&rsquo;s in &ldquo;berry&rdquo; correctly and failing to notice
the one in &ldquo;str.&rdquo; The token boundaries are where the error lives.</p>
<p>This is not stupidity. It is a precise failure mode that follows directly
from the tokenisation structure. You can predict where the error will
occur by looking at the token split.</p>
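<p>The arithmetic of the failure can be made explicit in a few lines of
Python, assuming the str|aw|berry split discussed above (the per-token counts
are the point, not the code):</p>

```python
# Per-token view of the count, assuming the cl100k_base-style
# split of "strawberry" described in the post.
tokens = ["str", "aw", "berry"]

per_token = {t: t.count("r") for t in tokens}
print(per_token)                       # {'str': 1, 'aw': 0, 'berry': 2}

true_count = sum(per_token.values())
print(true_count)                      # 3

# A model that only "knows" the letter structure of "berry"
# lands on 2 -- it misses the r hiding inside "str".
assert per_token["berry"] == 2
assert true_count == "strawberry".count("r") == 3
```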
<hr>
<h2 id="chain-of-thought-partially-fixes-this-and-why">Chain of Thought Partially Fixes This (and Why)</h2>
<p>If you prompt the model to &ldquo;spell out the letters first, then count,&rdquo; the
error rate drops substantially. The reason is not mysterious: forcing
the model to generate a character-by-character expansion — s, t, r, a,
w, b, e, r, r, y — puts the individual characters into the context window
as separate tokens. Now the model is not working from &ldquo;straw&rdquo; and &ldquo;berry&rdquo;;
it is working from ten single-character tokens, and counting sequential
characters in a flat list is a task the model handles much better.</p>
<p>This is, in effect, making the model do manually what a human does
automatically: convert the compressed token representation back to an
enumerable character sequence before counting. The cognitive work is the
same; the scaffolding just has to be explicit.</p>
<hr>
<h2 id="the-right-frame">The Right Frame</h2>
<p>The &ldquo;how many r&rsquo;s&rdquo; test is sometimes cited as evidence that language models
don&rsquo;t &ldquo;really&rdquo; understand text, or that they are sophisticated autocomplete
engines with no genuine knowledge. These framing choices produce more heat
than light.</p>
<p>The more precise statement is this: language models were trained to predict
likely next tokens in large text corpora. That training objective produces
a system that is very good at certain tasks (semantic inference, translation,
summarisation, code generation) and systematically bad at others (character
counting, exact arithmetic, precise spatial reasoning). The system is not
doing what you are doing when you read a sentence. It is doing something
different, which happens to produce similar outputs for a very wide range
of inputs — and different outputs for a class of inputs where the
character-level structure matters.</p>
<p>&ldquo;Strawberry&rdquo; sits squarely in that class. The model is not failing to
read the word. It is succeeding at predicting what a plausible-sounding
answer looks like, based on a compressed representation that does not
preserve the information needed to get the count right. Those are not the
same thing, and the distinction is worth keeping clear.</p>
<hr>
<p><em>The tokenisation argument here is a simplified version. Real BPE
vocabularies, positional encodings, and the specific way character
information is or isn&rsquo;t preserved in embedding tables are more complicated
than this post suggests. But the core point — that the model&rsquo;s input
representation is not a character array and never was — holds.</em></p>
<p><em>A follow-up post covers a structurally different failure mode:
<a href="/posts/car-wash-grounding/">Should I Drive to the Car Wash?</a> — where
the model understood the question perfectly but lacked access to the
world state the question was about.</em></p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Gage, P. (1994). A new algorithm for data compression. <em>The C Users
Journal</em>, 12(2), 23–38.</p>
</li>
<li>
<p>Sennrich, R., Haddow, B., &amp; Birch, A. (2016). <strong>Neural machine
translation of rare words with subword units.</strong> <em>Proceedings of the
54th Annual Meeting of the Association for Computational Linguistics
(ACL 2016)</em>, 1715–1725. <a href="https://arxiv.org/abs/1508.07909">https://arxiv.org/abs/1508.07909</a></p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-12-01</strong>: Corrected the tokenisation of &ldquo;strawberry&rdquo; from two tokens (<code>straw|berry</code>) to three tokens (<code>str|aw|berry</code>), matching the actual cl100k_base tokeniser used by GPT-4. The directional argument (token boundaries obscure character-level information) is unchanged; the specific analysis was updated accordingly.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Why Cats Purr at 25 Hz: Vocal Fold Pads and the Physics of Self-Sustained Oscillation</title>
      <link>https://sebastianspicker.github.io/posts/purring-physics-vocal-fold-pads/</link>
      <pubDate>Mon, 09 Sep 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/purring-physics-vocal-fold-pads/</guid>
      <description>For decades, the mechanism of purring was disputed. A 2023 paper in Current Biology showed that cat larynges purr without any neural input: airflow alone drives a self-sustained oscillation. The secret is connective tissue pads embedded in the vocal folds that increase effective mass and lower the resonant frequency to 25–30 Hz — the same range used clinically for bone- density stimulation and fracture healing under Wolff&amp;rsquo;s law.</description>
      <content:encoded><![CDATA[<p><em>The first thing either of our cats did when I sat still long enough was purr.
Not after food, not during play — the purr arrived when I sat down and held
still and they settled against me, and it arrived as a physical fact, a vibration
felt through the sternum and the ribs, not merely heard. The frequency was low:
around 25–30 cycles per second, which you can feel as a buzz rather than hear
as a tone. This is, I later confirmed, not far from the frequency at which
clinical devices stimulate bone growth. They are indoor cats now, on our vet&rsquo;s
recommendation — they find this unreasonable, but sitting still and being purred
on has become a regular feature of working from home.</em></p>
<p><em>The physics of how the larynx produces that frequency is, as of 2023, finally
resolved — and the mechanism is more elegant than anyone suspected.</em></p>
<hr>
<h2 id="the-frequency-and-its-peculiarity">The Frequency and Its Peculiarity</h2>
<p>Domestic cats purr at approximately $25$–$30\,\mathrm{Hz}$. This is
remarkably low for an animal of cat size. A human vocal fold — roughly
comparable in size — vibrates at $85$–$255\,\mathrm{Hz}$ for normal speech.
A cat&rsquo;s larynx is smaller than a human&rsquo;s, not larger, which makes the low
frequency surprising: in a simple spring-mass oscillator model, smaller and
lighter vocal folds should vibrate <em>faster</em>, not slower.</p>
<p>The frequency range $25$–$50\,\mathrm{Hz}$ has clinical significance in a
different field. Therapeutic vibration platforms used in sports medicine and
osteoporosis treatment operate in exactly this range, exploiting Wolff&rsquo;s law
(bone remodelling under mechanical stress) to increase bone density and
accelerate fracture repair. The coincidence is suggestive. It was first
noted quantitatively by von Muggenthaler (2001, <em>Journal of the Acoustical
Society of America</em> 110, 2666), who recorded purrs from 44 felids and
found that all produced dominant frequencies between $25$ and $150\,\mathrm{Hz}$.</p>
<p>Whether cats deliberately exploit this frequency for self-healing is a separate
biological question. The physics question is simpler: how does the larynx
produce it?</p>
<hr>
<h2 id="flow-induced-vocal-fold-oscillation">Flow-Induced Vocal Fold Oscillation</h2>
<p>Vocal fold oscillation in mammals is a flow-induced, self-sustained mechanical
phenomenon. The Bernoulli effect and elastic restoring forces create a
feedback loop that keeps the folds oscillating as long as subglottal air
pressure is maintained.</p>
<p>The mechanism is as follows. The lungs supply a steady subglottal pressure
$p_\mathrm{sub}$. This drives airflow through the glottis (the gap between the
vocal folds). As the folds are pushed apart by the pressure, the airflow
velocity in the narrowed glottis increases; by Bernoulli&rsquo;s principle,</p>
$$p + \tfrac{1}{2}\rho v^2 = \mathrm{const},$$<p>the pressure drops, drawing the folds back together. The folds&rsquo; elastic
restoring force adds to this: they spring back when displaced. The result is
an oscillation — the folds open and close periodically, chopping the airflow
into pressure pulses that we perceive as sound (or vibration, for low
frequencies).</p>
<p>The fundamental frequency is approximately:</p>
$$f_0 \approx \frac{1}{2L}\sqrt{\frac{T}{\rho_s}},$$<p>where $L$ is the vibrating length of the vocal fold, $T$ is the longitudinal
tension, and $\rho_s$ is the surface density (mass per unit area). This is
the same formula as for a vibrating string — and the physics is closely
related.</p>
<p>For a cat-sized larynx with $L \approx 1\,\mathrm{cm}$, realistic tissue
tension, and tissue density $\rho_s \sim 1\,\mathrm{kg/m}^2$, this formula
gives $f_0$ in the hundreds of hertz — far above the observed purring
frequency of $25$–$30\,\mathrm{Hz}$.</p>
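<p>The order-of-magnitude claim is easy to check. The tensions below are
assumed illustrative values, not measurements; the point is that any
plausible choice lands in the hundreds of hertz:</p>

```python
from math import sqrt

L = 0.01       # vibrating fold length in m (cat-sized larynx)
rho_s = 1.0    # surface density in kg/m^2 (order-of-magnitude tissue value)

def f0(T: float) -> float:
    """String-law estimate f0 = (1 / 2L) * sqrt(T / rho_s)."""
    return sqrt(T / rho_s) / (2 * L)

# Illustrative longitudinal tensions in N/m (assumed, not measured):
for T in (10.0, 50.0, 100.0):
    print(f"T = {T:5.1f} N/m  ->  f0 = {f0(T):5.0f} Hz")
# All three estimates fall in the hundreds of hertz, an order of
# magnitude above the observed 25-30 Hz purr.
```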
<p>Something is missing from the model.</p>
<hr>
<h2 id="the-long-standing-controversy">The Long-Standing Controversy</h2>
<p>Until 2023, the dominant explanation for the low purring frequency was the
<strong>Active Muscular Contraction (AMC) hypothesis</strong>: the laryngeal muscles
contract rhythmically at the purring frequency, mechanically driving the
vocal folds rather than relying on passive aeroelastic oscillation. On this
view, purring is more like drumming than singing — the neural drive at
$25$–$30\,\mathrm{Hz}$ sets the frequency, overriding the natural aeroelastic
frequency.</p>
<p>The AMC hypothesis was difficult to test directly because the larynx is
inaccessible in a live, purring cat without interfering with the purr.
Electromyographic recordings from laryngeal muscles of purring cats showed
rhythmic activity consistent with the AMC hypothesis, but causality was unclear:
were the muscles driving the oscillation, or responding to it?</p>
<p>The alternative hypothesis — that purring is passive, driven purely by
aeroelastic forces — faced the problem noted above: the aeroelastic frequency
of a cat-sized larynx should be far too high to explain $25$–$30\,\mathrm{Hz}$.
Unless something was being added to the vocal folds to lower their effective
resonant frequency.</p>
<hr>
<h2 id="herbst-et-al-2023-the-mass-loading-mechanism">Herbst et al. 2023: The Mass-Loading Mechanism</h2>
<p>In October 2023, Christian Herbst and colleagues at the University of Vienna
published &ldquo;Domestic cat larynges can produce purring frequencies without neural
input&rdquo; (<em>Current Biology</em> 33, 4727–4732). The experiment was decisive.</p>
<p>The team excised larynges from domestic cats (post-mortem, within a short time
window to preserve tissue properties) and mounted them in a flow bench: a
controlled airflow was supplied to the subglottal side, and the larynges were
held at physiologically realistic tension and hydration.</p>
<p><strong>The result</strong>: all eight excised larynges produced self-sustained oscillations
at $25$–$30\,\mathrm{Hz}$ — the normal purring frequency — without any neural
input whatsoever. No muscular contraction was present (no motor neurons, no
calcium signalling, no ATP). The oscillation was purely passive, driven by the
airflow and maintained by the tissue mechanics.</p>
<p>This ruled out the AMC hypothesis. The neural drive is not needed to sustain
the oscillation; it may modulate it, start or stop it, but the fundamental
frequency is set by the tissue mechanics, not the neural firing rate.</p>
<p>The follow-up finding was the key to the physics: histological analysis of the
vocal fold tissue revealed <strong>connective tissue pads</strong> embedded in the vocal
fold mucosa, up to $4\,\mathrm{mm}$ thick. These pads are not present in the
vocal folds of humans or other mammals that do not purr. They increase the
effective mass of the oscillating tissue significantly, without adding
corresponding stiffness.</p>
<hr>
<h2 id="the-mass-loading-physics">The Mass-Loading Physics</h2>
<p>The fundamental frequency of a harmonic oscillator is:</p>
$$f_0 = \frac{1}{2\pi}\sqrt{\frac{k}{m}},$$<p>where $k$ is the effective stiffness and $m$ is the effective mass. Adding mass
(at constant stiffness) lowers the frequency as $f_0 \propto m^{-1/2}$.</p>
<p>For the vocal folds, the spring constant $k$ is set by tissue tension and
elasticity — properties that the tissue pads do not significantly alter. But
the pads add a substantial mass $\Delta m$ to the oscillating system. The
purring frequency becomes:</p>
$$f_\mathrm{purr} = \frac{1}{2\pi}\sqrt{\frac{k}{m_0 + \Delta m}},$$<p>where $m_0$ is the baseline vocal fold mass and $\Delta m$ is the added mass
from the pads.</p>
<p>As a rough estimate: if the unloaded aeroelastic frequency were in the
range $f_\mathrm{normal} \approx 200$–$400\,\mathrm{Hz}$ (the range of
cat meow fundamental frequencies), lowering it to $f_\mathrm{purr} \approx
25\,\mathrm{Hz}$ would require a mass increase by a factor of</p>
$$\frac{m_0 + \Delta m}{m_0} = \left(\frac{f_\mathrm{normal}}{f_\mathrm{purr}}\right)^2
\approx 64\text{–}256.$$<p>This is a large factor, but not implausible for pads up to 4 mm thick
embedded in a mucosal membrane that is itself very thin. The simple
harmonic oscillator model is an idealisation — the actual frequency reduction
also involves changes in vibration mode shape, tissue coupling, and
aerodynamic loading — but the mass-loading effect is the dominant mechanism.
The tissue pads are, in effect, frequency dividers: they convert a
high-frequency aeroelastic oscillator into a low-frequency vibration
generator.</p>
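<p>The factor follows from $f_0 \propto m^{-1/2}$ alone; a two-line check,
using the meow-range frequencies from the estimate above:</p>

```python
def mass_factor(f_normal: float, f_purr: float) -> float:
    """Required mass ratio (m0 + dm) / m0 for a harmonic oscillator
    whose stiffness is unchanged: f ∝ m**-0.5, so the factor is
    (f_normal / f_purr)**2."""
    return (f_normal / f_purr) ** 2

print(mass_factor(200.0, 25.0))   # 64.0
print(mass_factor(400.0, 25.0))   # 256.0
```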
<p>This is the same principle used in engineering to lower the natural frequency
of mechanical structures: add mass without changing stiffness. Tuned mass
dampers in skyscrapers work on the same principle. So do the heavy flywheel
weights added to engines to suppress rotational vibration.</p>
<p>The cat&rsquo;s larynx evolved this solution independently, and with a mass ratio
that would impress a structural engineer.</p>
<hr>
<h2 id="the-self-sustained-oscillation-criterion">The Self-Sustained Oscillation Criterion</h2>
<p>Not every mass-loaded oscillator will self-sustain under airflow. The
Bernoulli-elastic feedback loop must overcome the viscous damping of the
tissue. A dimensional scaling estimate for the critical subglottal pressure is:</p>
$$p^* \sim \eta_\mathrm{tissue} \cdot \frac{v}{L} \sim \eta_\mathrm{tissue} \cdot f_0,$$<p>where $\eta_\mathrm{tissue}$ is the tissue viscosity, $v \sim f_0 L$ is the
characteristic mucosal wave velocity, and $L$ is the fold length. (The full
phonation threshold pressure, as derived by Titze (2006), depends on
additional geometric and aerodynamic parameters.) For typical laryngeal tissue properties and the observed purring
frequency, this critical pressure is of order $100$–$200\,\mathrm{Pa}$ —
low enough to be sustained by the respiratory system without extraordinary
effort.</p>
<p>This is consistent with the observation that cats can purr both during
inhalation and exhalation, maintaining a continuous acoustic output throughout
the breathing cycle. The oscillation threshold is low enough that normal
respiration can maintain it.</p>
<hr>
<h2 id="wolffs-law-and-the-25-hz-coincidence">Wolff&rsquo;s Law and the 25 Hz Coincidence</h2>
<p>Julius Wolff (1892) proposed that bone remodels in response to mechanical
loading: osteoblasts (bone-building cells) are stimulated by cyclic compressive
stress, while osteoclasts (bone-resorbing cells) dominate in the absence of
loading. This principle — now called Wolff&rsquo;s law — underpins the use of
therapeutic vibration in orthopaedics.</p>
<p>The optimal frequency for osteoblast stimulation, determined empirically in
clinical studies, is $20$–$50\,\mathrm{Hz}$. Vibration at these frequencies,
applied at amplitudes of $0.2$–$1.0\,g$ (where $g$ is gravitational
acceleration), produces measurable increases in bone mineral density, accelerates
fracture healing, and reduces bone loss in microgravity. The frequency range
is not a narrow resonance; it reflects the natural frequencies of cellular
mechanotransduction pathways involving focal adhesion kinase (FAK) and
integrin signalling.</p>
<p>Cat purring produces vibration in the frequency range $25$–$50\,\mathrm{Hz}$
at the body surface. Whether this is sufficient to produce meaningful bone
stimulation — and whether cats evolved purring partly as a bone-maintenance
mechanism — is not yet resolved by controlled experiments. The hypothesis is
physiologically plausible: cats conserve metabolic energy by resting for up
to 16 hours per day, and during this rest period, bone would normally be
unstressed and subject to resorption. A continuous low-frequency vibration
during rest could counteract this.</p>
<p>This is speculative at the level of evolutionary causation. What is not
speculative is that the purring frequency overlaps precisely with the
therapeutic vibration range, and that this overlap is not obviously accidental.</p>
<hr>
<h2 id="across-felid-species">Across Felid Species</h2>
<p>Von Muggenthaler&rsquo;s 2001 survey of 44 felids found that most domestic
cats purr in the range $25$–$30\,\mathrm{Hz}$, with harmonics at $50$,
$75\,\mathrm{Hz}$, and so on. Cheetahs purr at $20$–$25\,\mathrm{Hz}$;
pumas (mountain lions) at $20$–$30\,\mathrm{Hz}$; servals and ocelots at
$22$–$28\,\mathrm{Hz}$.</p>
<p>The large roaring cats — lions, tigers, leopards, jaguars — do not purr in
the continuous sense that domestic cats do. Their enlarged hyoid apparatus
allows roaring by a different mechanism (a modified laryngeal pad that
allows very low-frequency, high-intensity sound production). Some large cats
produce purr-like sounds during exhalation, but not the continuous purring
through both inhalation and exhalation characteristic of smaller felids.</p>
<p>The vocal fold pad mechanism appears to be specific to the non-roaring felids,
though detailed histological comparisons across species are still sparse.</p>
<hr>
<h2 id="what-i-hear">What I Hear</h2>
<p>When one of our cats purrs while settled against me, what I am feeling is the
mechanical resonance of a mass-loaded aeroelastic oscillator at approximately
$25\,\mathrm{Hz}$, the frequency having been lowered by connective tissue pads
from a natural aeroelastic frequency several hundred hertz higher. The pads
evolved, we think, to produce exactly this frequency — sustained under normal
respiratory airflow pressure with no additional muscular energy. The acoustic
output is a byproduct of the vibration.</p>
<p>Whether the vibration serves a direct physiological function in the cat&rsquo;s own
bones is, as of this writing, still an open question. What seems clear is that
the 2023 paper settled the mechanism question conclusively: the frequency is
set by mass loading, not neural drive. The larynx purrs by itself when you
blow air through it.</p>
<p>I find this reassuring. The physics is in the cat, not in its nervous system.
The cat purrs the way a tuning fork rings — not because it decides to, but
because that is what it does when the conditions are right.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Herbst, C.T., Prigge, T., Garcia, M., Hampala, V., Hofer, R., Weissengruber,
G.E., Svec, J.G., &amp; Fitch, W.T. (2023). Domestic cat larynges can produce
purring frequencies without neural input. <em>Current Biology</em>, 33(22),
4727–4732.e4. <a href="https://doi.org/10.1016/j.cub.2023.09.014">https://doi.org/10.1016/j.cub.2023.09.014</a></p>
</li>
<li>
<p>von Muggenthaler, E. (2001). The felid purr: A healing mechanism?
<em>Journal of the Acoustical Society of America</em>, 110(5), 2666.
<a href="https://doi.org/10.1121/1.4777098">https://doi.org/10.1121/1.4777098</a></p>
</li>
<li>
<p>Titze, I.R. (2006). <em>The Myoelastic Aerodynamic Theory of Phonation.</em>
National Center for Voice and Speech.</p>
</li>
<li>
<p>Wolff, J. (1892). <em>Das Gesetz der Transformation der Knochen.</em> A. Hirschwald.
(English translation: Maquet, P., &amp; Furlong, R., 1986. <em>The Law of Bone
Remodelling.</em> Springer.)</p>
</li>
<li>
<p>Rubin, C.T., &amp; Lanyon, L.E. (1984). Regulation of bone formation by applied
dynamic loads. <em>Journal of Bone and Joint Surgery</em>, 66(3), 397–402.
<a href="https://doi.org/10.2106/00004623-198466030-00012">https://doi.org/10.2106/00004623-198466030-00012</a></p>
</li>
<li>
<p>Christiansen, P. (2008). Evolution of skull and mandible shape in cats
(Carnivora: Felidae). <em>PLOS ONE</em>, 3(7), e2807.
<a href="https://doi.org/10.1371/journal.pone.0002807">https://doi.org/10.1371/journal.pone.0002807</a></p>
</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>The Invisible Entrance Fee: On Privilege, Education, and the Institutions That Reproduce Both</title>
      <link>https://sebastianspicker.github.io/posts/privilege-and-education/</link>
      <pubDate>Tue, 20 Aug 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/privilege-and-education/</guid>
      <description>Education is supposed to be the great equaliser. The evidence says otherwise. Bourdieu called it decades ago: schools reproduce the social order they pretend to transcend. Privilege is the entrance fee that nobody admits is being charged.</description>
      <content:encoded><![CDATA[<p>There is a persistent story that education systems tell about themselves: that they are meritocratic. That talent, effort, and intelligence are what determine outcomes. That the playing field, if not perfectly level, is at least aspiring toward levelness. That a good enough student from any background can succeed.</p>
<p>This story is not supported by the evidence.</p>
<p>The relationship between socioeconomic background and educational outcomes is one of the most replicated findings in social science. PISA data from Germany consistently show one of the steepest socioeconomic gradients in the OECD — the correlation between parental education and student performance is higher here than in most comparable countries. This is not a recent finding. It has been stable for decades. The system produces it reliably, which means the system is, in some meaningful sense, designed to produce it — even if no individual actor intended that design.</p>
<p>Understanding why requires a different vocabulary than the one most educational institutions use about themselves.</p>
<h2 id="bourdieus-three-capitals">Bourdieu&rsquo;s Three Capitals</h2>
<p>Pierre Bourdieu spent much of his career developing an account of how social inequality reproduces itself through culture and education. The core concept is capital — but not only the economic kind.</p>
<p>Bourdieu (1986) distinguishes three forms:</p>
<p><strong>Economic capital</strong> is material resources: money, assets, time purchased through money. This is the most visible form of advantage. Wealthier families can pay for tutoring, for better-resourced schools, for the unpaid internships that build CVs, for the years of postgraduate study that increasingly function as the entrance requirement for professional careers.</p>
<p><strong>Cultural capital</strong> is more subtle. It includes dispositions, skills, and knowledge that are valued by educational institutions and professional fields — but valued in a way that tends to favour those who acquired them at home, in childhood, before formal education began. The ease with which a student navigates a seminar. The familiarity with the tacit conventions of academic writing. The sense that the university is, broadly, a place made for people like you. These are not things that are explicitly taught; they are things that are transmitted, Bourdieu argues, through families whose own cultural capital aligns with what the institution expects.</p>
<p><strong>Social capital</strong> is networks: the web of relationships that provide information, referrals, opportunities, and vouching. Who you know, in the flattest possible terms.</p>
<p>All three reinforce each other. Economic capital can be converted into cultural capital through education and into social capital through exclusive networks. Cultural capital eases access to prestigious institutions, which build social capital. The system is not static, but it has a strong gravitational pull toward reproduction.</p>
<p>Bourdieu and Passeron (1977) developed this into a theory of education as <em>reproduction</em>: the function of educational institutions is not primarily to transmit knowledge but to legitimate the transmission of social position from one generation to the next. The process is misrecognised — by students, teachers, and institutions — as meritocracy. This misrecognition is essential to the function. If it were transparent, it would lose its legitimising power.</p>
<h2 id="the-hidden-curriculum">The Hidden Curriculum</h2>
<p>Philip Jackson (1968) coined the term <em>hidden curriculum</em> for everything that schools teach that is not in the official syllabus. How to sit still. How to wait your turn. How to speak to authority. How to navigate institutions, read implicit expectations, manage bureaucracies. How to understand that your job is to demonstrate competence within a form that someone else has set.</p>
<p>For students whose home culture aligns with the institutional culture, the hidden curriculum is invisible. They already know it; it requires no effort; it is simply how things are. For students whose home culture diverges, it is a second curriculum that must be decoded while simultaneously managing the official one.</p>
<p>Lareau (2003) documented this in careful ethnographic detail. Middle-class families engage in what she calls <em>concerted cultivation</em> — a mode of child-rearing that practises precisely the dispositions valued by educational institutions: articulate self-advocacy with adults, a sense of entitlement to ask questions and seek explanations, activities structured around developing discrete skills. Working-class and poor families, in her study, more often practised <em>accomplishment of natural growth</em> — providing security, affection, and freedom without the institutional structuring. Neither is better parenting. But one of them is what the school expects.</p>
<p>The child who arrives at school already knowing how to talk to teachers, how to present themselves, how to advocate for their own needs, has a significant advantage that is invisible in the transcript. It does not appear as &ldquo;privilege&rdquo;; it appears as &ldquo;ability&rdquo; or &ldquo;maturity&rdquo;. The institutional category does the misrecognising work.</p>
<h2 id="privilege-as-invisible-to-those-who-have-it">Privilege as Invisible to Those Who Have It</h2>
<p>Peggy McIntosh (1989) wrote what became one of the most cited — and most contested — essays in education: &ldquo;White Privilege: Unpacking the Invisible Knapsack&rdquo;. Her core observation is structural: privilege is the absence of disadvantage, and absences are invisible to those who live inside them. You do not notice the ease with which you move through a system that was designed for people like you, any more than you notice breathing.</p>
<p>This is not an accusation. It is a description of a structural feature with consequences for self-understanding.</p>
<p>My background is in physics; I now work in universities, having grown up in a household with books and educated parents and the background assumption that higher education was something that people like us did. I was not aware of most of this as an advantage while it was happening, because it did not feel like an advantage — it felt like normal. The awareness came later, with effort, and it remains incomplete.</p>
<p>The invisible entrance fee is what you have already paid, in cultural capital, before you walk through the door. The institution does not ask about it explicitly. It simply rewards those who have it and attributes the reward to merit.</p>
<h2 id="what-this-means-for-accessibility">What This Means for Accessibility</h2>
<p>The previous post in this series argued that full accessibility — <em>Barrierefreiheit</em> — is structurally impossible in a society organised as ours is; that the honest goal is <em>Barrierearmut</em>, the ongoing reduction of barriers. The connection to privilege is direct.</p>
<p>Barriers to education are not only physical. They include the cultural distance between the home environment and the institutional culture. They include not knowing that office hours exist and are meant for you, not just for students with problems. They include the inability to identify as &ldquo;the kind of person who does a PhD&rdquo; because you have never met anyone who did one. They include the exhaustion of navigating a system that requires you to translate yourself at every step, while your better-resourced peers spend that cognitive energy on the actual work.</p>
<p>None of these barriers appear on an accessibility audit. They are not visible from inside the institution looking out. They require actively listening to people whose experience differs from the institutional default, and then being willing to revise the default rather than add an exception.</p>
<p>The PISA gradient in Germany is a measurement of accumulated, unreduced barriers. It is not a measurement of the distribution of talent or effort. The system is producing the outcome; the students are receiving the label.</p>
<h2 id="the-meritocracy-problem">The Meritocracy Problem</h2>
<p>Meritocracy is an appealing concept and a damaging ideology when taken seriously. The appeal: rewards should go to those who earn them, and earning should depend on effort and ability rather than inherited position. This is genuinely better than aristocracy.</p>
<p>The problem: in a society with steep inequality in the distribution of cultural, economic, and social capital, &ldquo;merit&rdquo; is not a neutral measurement. It is a measurement of the match between a person&rsquo;s accumulated resources and the demands of the institution. Calling that match &ldquo;merit&rdquo; names the outcome without naming the process that produced it.</p>
<p>Michael Young, who invented the word &ldquo;meritocracy&rdquo; in 1958, intended it as a satire. His book <em>The Rise of the Meritocracy</em> depicted a dystopia in which the illusion of fairness made inequality more entrenched, not less, because it stripped the legitimacy from those who were left behind. If outcomes are fair, then failure is your fault. The ideology provides the institutional absolution; the individuals bear the moral weight of structural disadvantage.</p>
<p>This is precisely the dynamic that Bourdieu&rsquo;s theory of misrecognition describes. The student from a poorly resourced background who does not reach the outcomes of their better-resourced peer is seen — by themselves, by teachers, by the institution — as less talented or less motivated, rather than as navigating a steeper gradient with fewer tools.</p>
<h2 id="what-institutions-can-actually-do">What Institutions Can Actually Do</h2>
<p>The structural critique is not an argument for fatalism. Institutions can do things that matter.</p>
<p>They can make the hidden curriculum visible — explicitly teaching what is usually assumed. That means orientation programmes that actually explain institutional culture, not just procedures. It means academic writing support that is not remedial but normative. It means mentoring that connects first-generation students with people who understand the landscape.</p>
<p>They can audit their practices for whose default they assume. The timed closed-book exam was designed for a particular set of conditions; asking what it actually measures, and whether there are better instruments, is not lowering standards — it is interrogating what the standard is measuring.</p>
<p>They can diversify their faculty and staff, not as a cosmetic gesture but as a structural change in whose tacit knowledge is embedded in the institution. If the people who design the curriculum all navigated it from the same starting position, the curriculum will encode that starting position as normal.</p>
<p>They can name the entrance fee. Acknowledging that outcomes correlate with background, that this is a systemic feature and not a distribution of merit, is the first step toward taking institutional responsibility for the gradient rather than attributing it to the students.</p>
<p>None of this resolves the structural problem. The structural problem requires political change at scales well beyond any individual institution. But institutions are not passive. They can reduce the barriers they control, while being honest about the ones they do not.</p>
<h2 id="a-personal-note">A Personal Note</h2>
<p>I sit in institutional positions that this analysis would identify as advantaged. I teach in a university. I benefited from the gradient in ways I cannot fully account for. The point of naming this is not guilt; it is responsibility. Being advantaged by a system you did not design does not make you complicit in its worst outcomes — but it does make you responsible for using whatever institutional leverage you have to make the system less exclusive.</p>
<p>The connection to accessibility is this: both inaccessibility and privilege are about whose defaults are built into the system and who is required to adapt to defaults they did not set. Reducing barriers and interrogating privilege are the same project, approached from different angles.</p>
<p>Neither is completable. Both are necessary.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>Bourdieu, P. (1986). The forms of capital. In J.G. Richardson (Ed.), <em>Handbook of Theory and Research for the Sociology of Education</em> (pp. 241–258). Greenwood Press.</li>
<li>Bourdieu, P. &amp; Passeron, J.C. (1977). <em>Reproduction in Education, Society and Culture</em>. Sage. (Original French edition 1970.)</li>
<li>Jackson, P.W. (1968). <em>Life in Classrooms</em>. Holt, Rinehart and Winston.</li>
<li>Lareau, A. (2003). <em>Unequal Childhoods: Class, Race, and Family Life</em>. University of California Press.</li>
<li>McIntosh, P. (1989). White privilege: Unpacking the invisible knapsack. <em>Peace and Freedom</em>, July/August, 10–12.</li>
<li>OECD (2023). <em>PISA 2022 Results (Volume I): The State of Learning and Equity in Education</em>. OECD Publishing.</li>
<li>Young, M. (1958). <em>The Rise of the Meritocracy</em>. Thames and Hudson.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Why 44,100? The Accidental Physics of the CD Sampling Rate</title>
      <link>https://sebastianspicker.github.io/posts/why-44100-hz-cd-sampling-rate/</link>
      <pubDate>Mon, 05 Aug 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/why-44100-hz-cd-sampling-rate/</guid>
      <description>The CD sampling rate is not a round number chosen by committee. It is the direct output of 1970s NTSC and PAL video engineering — and both standards, designed on different continents, converge on exactly the same number.</description>
      <content:encoded><![CDATA[<p><em>44,100 Hz. Not 44,000. Not 48,000. Not even 40,000 or 50,000, which would at least have the virtue of roundness. The number that defines CD-quality audio is specific in a way that invites a question most people never think to ask: why that number?</em></p>
<hr>
<h2 id="the-puzzle">The Puzzle</h2>
<p>When a physical constant turns out to be $1.6 \times 10^{-19}$ coulombs, that is just nature being nature — no further explanation is needed or available. But when an engineering standard settles on 44,100 Hz rather than, say, 44,000 Hz or 45,000 Hz, there is a story hiding in the specificity.</p>
<p>The standard answer — the one you find on Wikipedia and in most popular accounts — is that 44.1 kHz satisfies the Nyquist criterion for 20 kHz audio, and so it was chosen to preserve the full range of human hearing. This is true. It is also almost completely uninformative. The Nyquist criterion for 20 kHz audio requires only that the sampling rate exceed 40 kHz. That constraint is satisfied by 40,001 Hz as much as by 44,100 Hz. The specific value requires a different explanation entirely.</p>
<p>That explanation involves a Sony engineer, a consumer videocassette recorder, and the accidental convergence of two television standards developed independently on different continents. The number 44,100 is not an optimisation. It is an archaeological deposit. And like most archaeological deposits, it is still with us long after the civilisation that created it has disappeared.</p>
<p>I want to work through the physics first, because the Nyquist theorem is genuinely beautiful and is often presented in a way that obscures what it actually says. Then I want to show you the arithmetic that makes 44,100 inevitable given 1970s constraints — and the way NTSC and PAL, designed for completely different reasons, conspire to produce the same number. If you enjoy &ldquo;hidden mathematics in music,&rdquo; you might also find it in <a href="/posts/euclidean-rhythms/">Euclidean Rhythms</a>, where a 2,300-year-old algorithm turns out to encode the structure of West African and Cuban percussion.</p>
<hr>
<h2 id="the-nyquistshannon-sampling-theorem">The Nyquist–Shannon Sampling Theorem</h2>
<p>Before the archaeology, the physics.</p>
<p>In 1928, Harry Nyquist published a paper on telegraph transmission theory that contained, somewhat incidentally, the germ of what would become one of the most consequential theorems in applied mathematics <a href="#ref-4">[4]</a>. Claude Shannon formalised and generalised it in 1949 <a href="#ref-5">[5]</a>. The theorem states: a continuous bandlimited signal whose highest frequency component is $f_{\max}$ can be perfectly reconstructed from discrete samples taken at rate $f_s$ if and only if</p>
$$f_s > 2 f_{\max}.$$<p>The quantity $f_s / 2$ is called the Nyquist frequency, and $2 f_{\max}$ the Nyquist rate. Sampling below the Nyquist rate causes <em>aliasing</em>: components above $f_s/2$ fold back into the spectrum and appear as spurious low-frequency artefacts that are indistinguishable from genuine signal. Once you have aliased a signal, the damage is permanent. Sampling above the Nyquist rate, the theorem says, causes no information loss at all — the original continuous waveform can be recovered exactly, in principle, from the discrete sample sequence.</p>
<p>Human hearing extends from roughly 20 Hz to 20 kHz (and, for most adults over thirty, substantially less at the top end, but 20 kHz is the canonical engineering requirement). Setting $f_{\max} = 20$ kHz, the Nyquist criterion requires $f_s > 40$ kHz.</p>
<p>But here is the subtlety that the Wikipedia summary tends to skip. The theorem assumes that the signal is <em>perfectly</em> bandlimited before sampling — meaning that all energy above $f_{\max}$ has been removed. This requires an <em>anti-aliasing filter</em>: a low-pass filter applied to the analogue signal before the analogue-to-digital converter samples it. If your anti-aliasing filter passes everything up to 20 kHz and blocks everything above it with perfect sharpness, then 40,001 Hz would suffice. The problem is that such a filter is physically unrealisable.</p>
<p>Real filters do not have vertical cutoffs. They have a <em>transition band</em>: a frequency range over which attenuation increases gradually from zero to full suppression. The steeper you want the transition, the higher the filter order, and for practical filter hardware in 1979 — op-amps, capacitors, inductors, no DSP to speak of — a &ldquo;steep enough&rdquo; filter meant a transition band of roughly 10% of the passband edge frequency. For a 20 kHz passband edge, that is about 2 kHz of transition band.</p>
<p>So the actual engineering requirement is not just $f_s > 40$ kHz. It is $f_s > 40$ kHz <em>plus enough headroom for a realisable anti-aliasing filter</em>. With $f_s = 44.1$ kHz, the Nyquist limit sits at $f_s/2 = 22.05$ kHz. The gap between the top of the audio band and the Nyquist limit is</p>
$$22{,}050 - 20{,}000 = 2{,}050 \text{ Hz},$$<p>which is just over 10% of 20 kHz. This is enough to build a practical anti-aliasing filter with 1970s and early 1980s analogue components. Had the sampling rate been 41 kHz, the gap would have been only 500 Hz — far too narrow for affordable hardware. Had it been 50 kHz, the gap would have been more comfortable, but you would be storing 13.4% more data per second for no audible benefit.</p>
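<p>The headroom arithmetic can be checked in a few lines. A sketch, taking the article&rsquo;s 20 kHz passband and its roughly-10%-of-passband rule of thumb for a realisable 1970s analogue filter (the candidate rates are illustrative):</p>

```python
# Nyquist headroom for candidate sampling rates, assuming a 20 kHz
# audio passband and the ~10% transition-band rule of thumb for
# late-1970s analogue anti-aliasing filters described in the text.
PASSBAND_HZ = 20_000
MIN_HEADROOM_HZ = 0.10 * PASSBAND_HZ  # roughly 2 kHz of transition band

for fs in (40_001, 41_000, 44_100, 48_000, 50_000):
    nyquist = fs / 2                  # highest representable frequency
    headroom = nyquist - PASSBAND_HZ  # room left for the filter roll-off
    verdict = "feasible" if headroom >= MIN_HEADROOM_HZ else "too narrow"
    print(f"{fs:>6} Hz: Nyquist {nyquist:>8.1f} Hz, "
          f"headroom {headroom:>7.1f} Hz ({verdict})")
```

<p>Only 44.1 kHz and above clear the 2 kHz bar; 40,001 Hz and 41 kHz fail it, exactly as the paragraph above argues.</p>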
<p>So 44.1 kHz is in the right <em>neighbourhood</em> given real-world filter constraints. But it is still a specific number. The question of why 44,100 rather than 44,000 or 43,500 or 44,800 is still open. That is where the VCRs come in.</p>
<hr>
<h2 id="the-vcr-problem">The VCR Problem</h2>
<p>In the late 1970s, Sony was developing what would eventually become the Compact Disc. One of the fundamental engineering problems was storage: where do you put the digital audio data? A 74-minute stereo recording at 16 bits and 44.1 kHz generates roughly 780 megabytes. In 1979, that was an absurd quantity of data. Hard drives with that capacity existed but cost tens of thousands of dollars and weighed as much as a washing machine. Dedicated digital tape formats existed in professional studios but were exotic and expensive <a href="#ref-1">[1]</a>.</p>
<p>The only affordable high-bandwidth magnetic recording medium available to consumer-facing engineers in 1979 was the VCR — the videocassette recorder. VHS and Betamax had recently become consumer products, and the tape and drive mechanism was cheap, reliable, and capable of storing several hours of high-bandwidth video signal. That video signal bandwidth was substantial: enough, in principle, to carry digital audio if you could get it onto the tape in the right form.</p>
<p>Sony&rsquo;s solution was elegant to the point of audacity. Rather than inventing a new tape format, they encoded digital audio samples as a black-and-white pseudo-video signal — patterns of light and dark pixels that a standard VCR recorded without modification, because as far as the VCR was concerned it was just receiving a monochrome video feed. The resulting device, the Sony PCM-1600 (1979), was a standalone unit that sat between a microphone preamplifier and a VCR, converting audio to fake video for recording and back to audio for playback <a href="#ref-3">[3]</a>.</p>
<p>The sampling rate of the audio was now determined not by any audio engineering consideration but by the geometry of the video signal. And the geometry of the video signal was fixed by the television broadcast standard — which brought entirely different historical contingencies into the calculation.</p>
<hr>
<h2 id="the-ntsc-arithmetic">The NTSC Arithmetic</h2>
<p>The NTSC standard — developed in North America and Japan — specifies 30 frames per second and 525 total scan lines per frame. Of those 525 lines, 35 are consumed by the vertical blanking interval (the time needed for the electron beam in a CRT to return from the bottom of the screen to the top). That leaves 490 active lines per frame actually carrying picture information.</p>
<p>Sony packed 3 audio samples into each active scan line. The audio sampling rate is then:</p>
$$f_s = \underbrace{30}_{\text{frames/s}} \times \underbrace{490}_{\text{active lines/frame}} \times \underbrace{3}_{\text{samples/line}} = 44{,}100 \text{ Hz}.$$<p>There it is. 44,100 Hz, emerging not from any consideration of human hearing or filter design, but from the frame rate and line count of the North American television standard.</p>
<hr>
<h2 id="the-pal-arithmetic">The PAL Arithmetic</h2>
<p>Now the European video standard, PAL, which was developed in the 1960s independently of NTSC and optimised for different priorities. PAL uses 25 frames per second and 625 total scan lines per frame. The vertical blanking interval consumes 37 lines, leaving 588 active lines per frame.</p>
<p>Sony packed 3 audio samples into each active PAL scan line as well. The sampling rate:</p>
$$f_s = \underbrace{25}_{\text{frames/s}} \times \underbrace{588}_{\text{active lines/frame}} \times \underbrace{3}_{\text{samples/line}} = 44{,}100 \text{ Hz}.$$<p>The same number.</p>
<p>Let that settle for a moment. NTSC: 30 frames per second, 490 active lines. PAL: 25 frames per second, 588 active lines. Different frame rates. Different line counts. Developed on different continents for different broadcast environments. And yet $30 \times 490 = 25 \times 588 = 14{,}700$, so multiplying by 3 gives 44,100 in both cases.</p>
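<p>The two derivations can be written as a single function of the video geometry; a sketch, using exactly the frame rates, line counts, and blanking figures given above:</p>

```python
# The two video-standard derivations of 44,100 Hz, as given in the text.
def pcm_rate(frames_per_s: int, total_lines: int,
             blanking_lines: int, samples_per_line: int) -> int:
    """Audio sampling rate from packing samples into active video lines."""
    active_lines = total_lines - blanking_lines
    return frames_per_s * active_lines * samples_per_line

ntsc = pcm_rate(30, 525, 35, 3)  # North America / Japan: 30 x 490 x 3
pal  = pcm_rate(25, 625, 37, 3)  # Europe:               25 x 588 x 3
print(ntsc, pal)  # 44100 44100
```

<p>Two different geometries, one output. The convergence is what let a single 44.1 kHz PCM processor serve both tape formats.</p>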
<p>This is not coincidence in any deep sense — NTSC and PAL were both designed to fill approximately the same video bandwidth, just with different tradeoffs between temporal resolution (frame rate) and spatial resolution (line count). But for Sony&rsquo;s VCR encoding scheme, the numerical convergence was enormously convenient: a single PCM processor running at 44.1 kHz could record to either NTSC or PAL video equipment without any change to the audio electronics. The same master machine could work in Tokyo and in Frankfurt.</p>
<p>The arithmetic is, I think, one of those moments where a coincidence that is perfectly explicable in hindsight still feels satisfying in the way that a physical derivation feels satisfying. You set up the constraints — fill the video bandwidth, pack an integer number of samples per line, keep the number of samples small enough to fit in a line&rsquo;s worth of data — and the number 44,100 falls out of two independent calculations like a constant of nature. It is not a constant of nature. It is a contingent product of mid-twentieth-century broadcast engineering. But the mathematics does not care.</p>
<hr>
<h2 id="from-tape-to-disc">From Tape to Disc</h2>
<p>When Philips and Sony sat down to negotiate the Red Book standard — the technical specification for the Compact Disc, finalised in 1980 and commercially launched in 1982 — both companies brought existing infrastructure to the table <a href="#ref-3">[3]</a>. Both had been building digital audio equipment for several years. Both had PCM processors running in professional studios. Both had catalogues of digital masters recorded on VCR tape. And all of that equipment ran at 44.1 kHz, because all of it had been built to interface with the video tape standard that made digital audio recording practically affordable in the first place.</p>
<p>Changing the sampling rate for the CD would have required rebuilding the entire mastering chain: new PCM processors, new format conversion hardware, new master tape libraries. The economic and logistical cost would have been enormous. The 44.1 kHz rate was not chosen for the CD because it was optimal in any absolute engineering sense. It was chosen because it was already there <a href="#ref-1">[1]</a>, <a href="#ref-2">[2]</a>.</p>
<p>This is a pattern worth recognising. Major technical standards are rarely chosen by optimisation from first principles. They are chosen by consolidating what already exists. The QWERTY keyboard layout was optimised for typewriter mechanisms that no longer exist. The 60 Hz AC frequency in North America was set by Westinghouse generators installed in the 1890s. The 44.1 kHz CD sampling rate was set by VCR tape recorders that were obsolete within a decade of the CD&rsquo;s launch.</p>
<hr>
<h2 id="the-other-rates">The Other Rates</h2>
<p>Not all digital audio runs at 44.1 kHz, and the coexistence of different rates in the modern audio industry is the direct legacy of 44.1 kHz&rsquo;s awkward origins.</p>
<p><strong>48 kHz</strong> is the professional broadcast and studio standard. It is used in digital video, in DAT tape, in most professional audio interfaces, and in the digital audio embedded in broadcast television signals — including, as a matter of course, in the digital television infrastructure described in the context of university video platforms like <a href="/posts/educast-nrw-hochschul-youtube/">educast.nrw</a>. Why 48? Broadcast infrastructure needed a rate that had clean integer relationships with the 32 kHz rate used in early satellite and ISDN broadcast systems. The relationship $48 = \frac{3}{2} \times 32$ is exact, making synchronisation straightforward. 44.1 kHz has no such clean relationship with anything in broadcast engineering.</p>
<p>The ratio between the two dominant rates is $48 / 44.1 = 160 / 147$. This fraction — irreducible, inelegant, non-obvious — is the source of essentially every sample-rate conversion problem in audio post-production. When a CD master (44.1 kHz) is prepared for broadcast (48 kHz), a sample-rate converter must interpolate 147 samples up to 160 samples, or downsample 160 samples to 147, at every moment. The process introduces small errors, and doing it well requires significant computational effort. Every time a musician&rsquo;s recording moves between the consumer and professional audio worlds, it passes through this fractional bottleneck. Two standards that could have been made compatible were instead set by completely independent historical processes, and we have been paying the computational tax ever since.</p>
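<p>The irreducibility of the ratio is easy to verify; a quick check with the standard library:</p>

```python
from fractions import Fraction

# Resampling ratio between the professional (48 kHz) and consumer
# (44.1 kHz) rates, reduced to lowest terms as in the text.
ratio = Fraction(48_000, 44_100)
print(ratio)  # 160/147

# A rational sample-rate converter must therefore upsample by one of
# these integers and downsample by the other; no smaller pair exists.
```
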
<p><strong>96 kHz and 192 kHz</strong> are marketed as &ldquo;high-resolution audio.&rdquo; Here the physics gets genuinely murky and the claims made by the audio industry deserve some scepticism. Human hearing above 20 kHz is, for most adults, genuinely absent — not reduced, but absent, because the outer hair cells in the cochlea that respond to those frequencies progressively die from the teenage years onward and are not replaced. The argument for high sampling rates is typically one of two things: first, that ultrasonic content can cause <em>intermodulation distortion</em>, where sum and difference frequencies of ultrasonic components fall back into the audible band; second, that a higher sampling rate allows for a more relaxed anti-aliasing filter with better phase behaviour within the audible band.</p>
<p>Both effects are real and measurable in laboratory conditions. Whether they are <em>audible</em> under controlled double-blind listening conditions is a separate and more contested question. The published evidence is not strong. What is not contested is that 96 kHz files are twice the size of 44.1 kHz files, and 192 kHz files are more than four times the size, for the same bit depth and the same number of audio channels. Whether that storage cost buys anything audible is, as of the current state of the literature, an open question.</p>
<hr>
<h2 id="the-irony">The Irony</h2>
<p>Here is the situation we are actually in. The canonical digital audio format — 16-bit, 44.1 kHz PCM, the format that defined CD quality for a generation and that remains the standard for music distribution — is physically a photograph of analogue video tape. The digitisation of music was made possible by television engineering. The specific number that defines the fidelity of every CD ever pressed is determined by the frame rates and line counts of 1970s broadcast television standards, which were themselves determined by the capabilities of 1940s CRT technology and the political negotiations of early broadcast licensing bodies.</p>
<p>When someone tells you that 44.1 kHz is the &ldquo;natural&rdquo; or &ldquo;perfect&rdquo; sampling rate for audio, they are, without knowing it, paying tribute to the NTSC standards committee of 1941 and the PAL engineers of the 1960s. The number carries history in it the way a fossil carries the structure of a long-dead organism. It is the right number, in the sense that it works. Its rightness has nothing to do with the reasons it was chosen.</p>
<p>I find this genuinely satisfying rather than disappointing. The history of physics and engineering is full of contingent numbers that turned out to be good enough, and whose goodness was only rationalised after the fact. The metre was originally defined as one ten-millionth of the distance from the equator to the North Pole along the Paris meridian — an arbitrary geodetic choice that turned out to produce a unit of length that is remarkably convenient for human-scale physics. The kilogram was a cylinder of platinum-iridium alloy in a vault outside Paris for over a century. 44,100 Hz is in good company.</p>
<hr>
<h2 id="the-archaeology-of-a-number">The Archaeology of a Number</h2>
<p>The numbers we inherit from engineering history are rarely arbitrary at every level simultaneously. 44,100 Hz is not arbitrary at the level of sampling theory: it satisfies the Nyquist criterion with enough headroom for a physically realisable anti-aliasing filter, given 1970s component technology. That is a genuine constraint, and the number sits in the right region of parameter space for it.</p>
<p>But it is arbitrary at a deeper level: it is the specific number that happened to fit a video tape format that happened to be affordable in 1979, a format that was itself determined by broadcast standards that were set for entirely unrelated reasons decades earlier. The chain of contingencies runs: 1940s television engineering defines NTSC and PAL frame rates and line counts; 1970s consumer VCR technology makes those tape formats cheap; 1979 Sony engineers encode digital audio as fake video; the arithmetic of the video formats fixes the sampling rate at 44,100 Hz; that rate gets locked into the CD standard in 1980; 44.1 kHz becomes the defining frequency of a digital music format that ships billions of units over the following four decades.</p>
<p>Science and engineering produce exact numbers from messy contingencies. The number 44,100 is simultaneously a theorem output (it satisfies a well-defined engineering constraint), a historical accident (it is determined by the specific video tape hardware that existed in 1979), and an institutional fossil (it outlasted the VCRs that created it by four decades and counting). All three things are true at the same time.</p>
<p>The VCRs are gone. The sampling rate remains.</p>
<hr>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Pohlmann, K. C. (2010). <em>Principles of Digital Audio</em> (6th ed.). McGraw-Hill.</p>
<p><span id="ref-2"></span>[2] Watkinson, J. (2001). <em>The Art of Digital Audio</em> (3rd ed.). Focal Press.</p>
<p><span id="ref-3"></span>[3] Immink, K. A. S. (1998). The compact disc story. <em>Journal of the AES</em>, 46(5), 458–465.</p>
<p><span id="ref-4"></span>[4] Nyquist, H. (1928). Certain topics in telegraph transmission theory. <em>Transactions of the AIEE</em>, 47(2), 617–644.</p>
<p><span id="ref-5"></span>[5] Shannon, C. E. (1949). Communication in the presence of noise. <em>Proceedings of the IRE</em>, 37(1), 10–21.</p>
]]></content:encoded>
    </item>
    <item>
      <title>How Cats Drink: Inertia, Gravity, and the Froude Number at the Tip of a Tongue</title>
      <link>https://sebastianspicker.github.io/posts/how-cats-drink-froude-number/</link>
      <pubDate>Mon, 22 Jul 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/how-cats-drink-froude-number/</guid>
      <description>Cats do not scoop water with their tongues — they exploit a delicate balance between inertia and gravity at the air-water interface. The tip of the tongue just touches the surface; rapid withdrawal pulls a fluid column upward; the jaw closes at exactly the moment the column peaks. Reis, Jung, Aristoff, and Stocker (Science, 2010) showed that the lapping frequency of all felids — from domestic cats to lions — is tuned so that the Froude number at the tongue tip is approximately unity.</description>
      <content:encoded><![CDATA[<p><em>I have spent a non-trivial amount of time watching our cats drink — they are
indoor-only cats, on our vet&rsquo;s advice, which gives them few distractions and
gives me ample opportunity to observe. This is not entirely voluntary. Once you have noticed that something is happening at the
water bowl that does not look right — the tongue moves too fast, the water
column is pulled upward rather than scooped, the jaw closes before the tongue
returns — you find yourself crouching beside the bowl with your phone propped
against a chair, filming at 240 frames per second and feeling that you have
perhaps chosen an unusual way to spend a Tuesday morning.</em></p>
<p><em>Pedro Reis, Sunghwan Jung, Jeffrey Aristoff, and Roman Stocker had the same
impulse, with better equipment. Their 2010 paper in Science, &ldquo;How Cats Lap:
Water Uptake by Felis catus,&rdquo; is one of the more elegant pieces of dimensional
analysis in recent biology.</em></p>
<hr>
<h2 id="how-cats-do-not-drink">How Cats Do Not Drink</h2>
<p>The simplest hypothesis — that cats curl the tongue into a spoon and scoop
water into the mouth — is false. High-speed photography shows that the cat&rsquo;s
tongue does not form a cup shape. Instead, the cat extends the tongue tip
downward toward the water surface and then rapidly retracts it. The motion is
fast — too fast for normal video — and the tongue barely contacts the surface.</p>
<p>The contrast with dogs is instructive. Dogs <em>do</em> scoop: the tongue curls
backward (not forward), forming a ladle shape that scoops water upward and
backwards into the mouth. The motion is vigorous and inefficient — a
significant fraction of the water misses the mouth entirely, which is why
drinking dogs splash and so often have wet chins. The mechanism works, but it
is inelegant.</p>
<p>Cats produce almost no splash. The mechanism is different in kind.</p>
<hr>
<h2 id="the-physical-mechanism">The Physical Mechanism</h2>
<p>Reis et al. (2010) used high-speed photography (1000 frames per second) to
resolve the cat&rsquo;s lapping motion. Their observations:</p>
<ol>
<li>
<p>The cat extends the tongue tip downward until the <em>dorsal surface</em> (the top
side) just touches the water surface. The ventral surface (the smooth
underside) does not contact the water.</p>
</li>
<li>
<p>The cat then rapidly retracts the tongue upward. The tongue tip is moving
at roughly $v \approx 0.7\,\mathrm{m/s}$ during this retraction.</p>
</li>
<li>
<p>As the tongue tip pulls away from the surface, a column of liquid is pulled
upward by the adhesion between the liquid and the retreating tongue. The
column rises against gravity.</p>
</li>
<li>
<p>The column eventually stalls — inertia is overcome by gravity — and begins
to fall back. The cat closes its jaw at exactly the moment of maximum column
height, capturing the peak volume of water.</p>
</li>
<li>
<p>The cat then extends the tongue for the next lap.</p>
</li>
</ol>
<p>The cat closes its jaw before the tongue fully retracts. This is important:
the jaw closure captures the water column, not the water adhering to the tongue.
The tongue is the mechanism that <em>creates</em> the column; the jaw captures it.</p>
<hr>
<h2 id="dimensional-analysis-the-froude-number">Dimensional Analysis: The Froude Number</h2>
<p>The relevant competition is between <strong>inertia</strong> (which drives the column
upward) and <strong>gravity</strong> (which pulls it back down). Surface tension plays a
role in stabilising the column but is not the primary factor governing the
column height.</p>
<p>The balance between inertia and gravity for a fluid column moving at speed
$v$ and of characteristic length scale $L$ (here, the diameter of the tongue
tip, $L \approx 5\,\mathrm{mm}$ for a domestic cat) is captured by the
<strong>Froude number</strong>:</p>
$$\mathrm{Fr} = \frac{v}{\sqrt{gL}},$$<p>where $g = 9.81\,\mathrm{m/s}^2$ is gravitational acceleration.</p>
<p>When $\mathrm{Fr} \ll 1$: gravity dominates, inertia is insufficient to pull a
significant column of water upward. Very slow tongue motion would lift almost
no water.</p>
<p>When $\mathrm{Fr} \gg 1$: inertia dominates, the column rises far above the
surface but the jaw must be closed quickly before the large amount of water
falls back. Very fast tongue motion wastes water and requires rapid jaw closure.</p>
<p>The optimal lapping frequency — maximising captured volume per lap — occurs
near $\mathrm{Fr} \approx 1$, where inertial and gravitational forces are
comparable and the column height is matched to the jaw closure dynamics.</p>
<h3 id="checking-the-numbers-for-a-domestic-cat">Checking the Numbers for a Domestic Cat</h3>
<p>For a domestic cat:</p>
<ul>
<li>Tongue tip diameter: $L \approx 5\,\mathrm{mm} = 5 \times 10^{-3}\,\mathrm{m}$</li>
<li>Characteristic tongue tip speed: $v \approx 0.7\,\mathrm{m/s}$</li>
</ul>
$$\mathrm{Fr} = \frac{0.7}{\sqrt{9.81 \times 5 \times 10^{-3}}}
= \frac{0.7}{\sqrt{0.049}} = \frac{0.7}{0.22} \approx 3.2.$$<p>Reis et al. found Fr of order unity — inertial and gravitational forces
comparable — confirming that the lapping speed is tuned to the inertia-gravity
balance. (The exact numerical value depends on the choice of characteristic
length scale; using the tongue tip diameter as above gives Fr of order a few,
still in the regime where neither force dominates.)</p>
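<p>The back-of-envelope computation above, in a couple of lines (the speed and length scale are the characteristic values quoted in the text):</p>

```python
import math

# Froude number at the tongue tip for a domestic cat, using the
# characteristic values quoted above.
g = 9.81   # gravitational acceleration, m/s^2
L = 5e-3   # tongue-tip diameter, m
v = 0.7    # tongue retraction speed, m/s

Fr = v / math.sqrt(g * L)
print(f"Fr = {Fr:.1f}")  # Fr = 3.2, i.e. of order unity
```
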
<hr>
<h2 id="scaling-across-felids">Scaling Across Felids</h2>
<p>The Froude number prediction yields a scaling law for lapping frequency across
felid species of different sizes. If all felids lap at $\mathrm{Fr} \approx 1$,
then the characteristic speed scales as $v \sim \sqrt{gL}$, and the lapping
frequency scales as:</p>
$$f = \frac{v}{d} \sim \frac{\sqrt{gL}}{d},$$<p>where $d$ is the distance the tongue travels per lap (roughly proportional to
tongue length, which scales with body size). Since $L \sim d$ scales with body
size, we get:</p>
$$f \sim \frac{\sqrt{g \cdot d}}{d} = \sqrt{\frac{g}{d}} \propto d^{-1/2}.$$<p>Larger cats have longer tongues and lap more slowly. The prediction is that
lapping frequency scales as the square root of inverse tongue length — or,
equivalently, as the inverse square root of body mass (since linear dimensions
scale as mass$^{1/3}$):</p>
$$f \propto m^{-1/6}.$$<p>Reis et al. tested this against high-speed footage of large felids. A domestic
cat laps at approximately $4\,\mathrm{Hz}$; a lion laps at approximately
$1.2\,\mathrm{Hz}$; a tiger at roughly $1\,\mathrm{Hz}$. The scaling is
consistent with $f \propto m^{-1/6}$ across nearly two orders of magnitude in
body mass.</p>
<p>The table below shows the predicted versus observed scaling:</p>
<table>
  <thead>
      <tr>
          <th>Species</th>
          <th>Body mass (kg)</th>
          <th>Predicted $f$ relative to cat</th>
          <th>Predicted $f$ (Hz)</th>
          <th>Observed $f$ (Hz)</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Domestic cat</td>
          <td>4</td>
          <td>1.0</td>
          <td>4.0</td>
          <td>~4.0</td>
      </tr>
      <tr>
          <td>Jaguar</td>
          <td>80</td>
          <td>$\left(\frac{4}{80}\right)^{1/6} \approx 0.61$</td>
          <td>2.4</td>
          <td>~2.0</td>
      </tr>
      <tr>
          <td>Lion</td>
          <td>200</td>
          <td>$\left(\frac{4}{200}\right)^{1/6} \approx 0.52$</td>
          <td>2.1</td>
          <td>~1.5</td>
      </tr>
      <tr>
          <td>Tiger</td>
          <td>220</td>
          <td>$\left(\frac{4}{220}\right)^{1/6} \approx 0.51$</td>
          <td>2.1</td>
          <td>~1.0</td>
      </tr>
  </tbody>
</table>
<p>The $m^{-1/6}$ scaling captures the correct trend — larger cats lap more
slowly — though the predicted frequencies for the largest cats somewhat
overestimate the observed values. The discrepancy may reflect the limitations
of the simple allometric assumption (that all linear dimensions scale as
$m^{1/3}$) and the fact that tongue geometry does not scale isometrically
across the full range of felid body sizes.</p>
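<p>The predicted column of the table follows directly from the scaling law. A minimal sketch in Python, pinning the law to the 4 kg, 4 Hz domestic cat (species masses are the table's values):</p>
<pre><code class="language-python"># Lapping frequency predicted by f ∝ m^(-1/6), normalised so that a
# 4 kg domestic cat laps at 4 Hz (the reference row of the table).

F_CAT_HZ = 4.0  # observed lapping frequency of the domestic cat
M_CAT_KG = 4.0  # body mass of the domestic cat

def predicted_frequency(mass_kg: float) -> float:
    """Predicted lapping frequency (Hz) for a felid of the given mass."""
    return F_CAT_HZ * (M_CAT_KG / mass_kg) ** (1 / 6)

for species, mass in [("domestic cat", 4), ("jaguar", 80),
                      ("lion", 200), ("tiger", 220)]:
    print(f"{species:12s} {predicted_frequency(mass):.1f} Hz")
</code></pre>
<p>This reproduces the predicted values of 4.0, 2.4, 2.1 and 2.1 Hz; the overshoot relative to the observed big-cat frequencies is the discrepancy discussed above.</p>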
<hr>
<h2 id="why-not-just-lick">Why Not Just Lick?</h2>
<p>A natural question: since the tongue already contacts the water on every
lap, why not simply submerge it and let the papillae absorb water directly?
Several answers:</p>
<ol>
<li>
<p><strong>Papillae are not sponges.</strong> Feline papillae are hollow and scoop-shaped
(filiform papillae with hollow tips), optimised for grooming and food
manipulation, not for soaking up water. Wicking through them is limited.</p>
</li>
<li>
<p><strong>The cat cannot breathe with its mouth submerged.</strong> A lapping mechanism
that keeps the mouth mostly closed except for the brief jaw-closure moment
allows continuous breathing through the nose during drinking.</p>
</li>
<li>
<p><strong>Speed and efficiency.</strong> The inertial column mechanism delivers significantly
more water per jaw movement than surface tension adhesion alone. At 4 laps
per second, a domestic cat takes in roughly $0.14\,\mathrm{mL}$ per lap,
for a total of roughly $34\,\mathrm{mL/min}$ — comparable to sipping rates
in animals that use more direct intake mechanisms.</p>
</li>
</ol>
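<p>The intake-rate figure in point 3 is just the per-lap volume times the lap rate. As a quick check:</p>
<pre><code class="language-python"># Water intake rate: volume per lap × laps per second × 60 s/min.
volume_per_lap_ml = 0.14  # mL per lap (Reis et al., ~0.14 ± 0.04 mL)
laps_per_second = 4.0     # domestic cat lapping frequency
intake_ml_per_min = volume_per_lap_ml * laps_per_second * 60
print(f"{intake_ml_per_min:.0f} mL/min")  # prints "34 mL/min"
</code></pre>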
<p>The cat has converged on a hydrodynamically optimal strategy under the
constraint of keeping the oral cavity mostly sealed during the intake cycle.</p>
<hr>
<h2 id="the-robotic-tongue">The Robotic Tongue</h2>
<p>Reis et al. constructed a robotic cat tongue to verify the mechanism: a smooth
glass disc lowered to the water surface and retracted at controlled speeds.
The column height as a function of speed followed the predicted inertia-gravity
balance, confirming that the mechanism does not depend on any specifically
biological property of the tongue — it is a fluid dynamics result that applies
to any surface moving away from a water interface at the right speed.</p>
<p>The robot lapped at the same Froude number as the cat.</p>
<hr>
<h2 id="dogs-horses-and-the-comparison">Dogs, Horses, and the Comparison</h2>
<p>Dogs cup the tongue <em>caudally</em> (backwards) rather than ventrally, forming a
ladle. The mechanism is faster and delivers more water per stroke but is
messy — the ladle is formed outside the mouth, and water sloshes freely. Dogs
lap at roughly $3\,\mathrm{Hz}$ with a tongue tip speed significantly higher
than cats, producing Fr well above unity. The excess inertia is why dog
drinking generates splashing.</p>
<p>Horses, by contrast, create a near-seal with their lips and use suction —
a fundamentally different mechanism that requires no tongue projection at all.
The lapping mechanism of felids is phylogenetically specific and appears to
have evolved under selection pressure for both efficiency and noise suppression,
consistent with the ambush-predator lifestyle. A cat that splashed while
drinking would alert prey at a water source. A cat that laps near-silently
does not.</p>
<hr>
<h2 id="a-note-on-the-measurement">A Note on the Measurement</h2>
<p>Getting reliable high-speed footage of a cat drinking is harder than it sounds.
Our cats drink at different times of day, in different moods, and the presence
of a camera tripod next to the water bowl is regarded as grounds for drinking
elsewhere. Pedro Reis et al. solved this by filming their laboratory cat, Cutta
Cutta, in a controlled setting. Their footage is available online and is
genuinely beautiful: a slow-motion waterfall in miniature, rising improbably
from the tongue tip and held there by the balance between upward momentum and
downward gravity, until the jaw swings shut.</p>
<p>The physics is in the timing.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Reis, P.M., Jung, S., Aristoff, J.M., &amp; Stocker, R. (2010). How cats lap:
Water uptake by <em>Felis catus</em>. <em>Science</em>, 330(6008), 1231–1234.
<a href="https://doi.org/10.1126/science.1195421">https://doi.org/10.1126/science.1195421</a></p>
</li>
<li>
<p>Aristoff, J.M., Stocker, R., Jung, S., &amp; Reis, P.M. (2011). On the water
lapping of felines and the water running of lizards. <em>Communicative &amp;
Integrative Biology</em>, 4(2), 213–215.</p>
</li>
<li>
<p>Vogel, S. (1994). <em>Life in Moving Fluids: The Physical Biology of Flow</em>
(2nd ed.). Princeton University Press.</p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-12-15</strong>: Updated water intake per lap from 0.04 mL to 0.14 mL (Reis et al. report ~0.14 +/- 0.04 mL per lap; the previous value was the standard deviation), and updated the intake rate accordingly (~34 mL/min). Updated the papillae location from ventral to dorsal surface. Updated the Aristoff et al. reference to the correct 2011 <em>Communicative &amp; Integrative Biology</em> article. Removed the Jung &amp; Kim (2012) PRL reference (article number 034501 resolves to a different paper).</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>The Boring Parts of Networked Music Performance</title>
      <link>https://sebastianspicker.github.io/posts/digital-music-labs-infrastructure/</link>
      <pubDate>Fri, 14 Jun 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/digital-music-labs-infrastructure/</guid>
      <description>A follow-up to the August 2023 latency post. The numbers were fine. The hard part turned out to be everything else: governance, maintenance, invisible labour, and why most Digital Music Labs quietly die after the grant ends.</description>
      <content:encoded><![CDATA[<p><em>This post is based on a manuscript in progress with colleagues from the
RAPP Lab network. It builds directly on the <a href="/posts/nmp-latency-lola-mvtp/">August 2023 latency measurements</a>. That post covered what the
numbers look like. This one covers why getting to those numbers was the
easy part.</em></p>
<hr>
<h2 id="the-setup">The Setup</h2>
<p>After spending two and a half years measuring latency across six European
research-network links, I can tell you that the audio numbers are achievable.
7.5 to 22.5 ms one-way across Prague to Tallinn, LoLa and MVTP both working,
musicians playing together across national borders in real time. Technically,
that story has a satisfying ending.</p>
<p>What the measurement paper does not capture is everything that had to be true
institutionally before we could run a single test. The firewall negotiations.
The repeated calibration sessions. The network configuration that nobody
outside our small group knew how to reproduce when someone left. The grant that
funded the equipment but not the person who kept it running. The performance
session that nearly collapsed because a campus IT update had silently changed a
routing rule three days prior.</p>
<p>The technical infrastructure worked. The institutional infrastructure around it
was precarious in ways that only became visible when something broke.</p>
<p>This is what the follow-up paper tries to name.</p>
<hr>
<h2 id="what-is-a-digital-music-lab-actually">What Is a Digital Music Lab, Actually?</h2>
<p>The term gets applied to everything from a laptop cart in a classroom to
IRCAM Paris. We use it to mean something specific: a <strong>Digital Music Lab
(DML)</strong> is a hybrid environment where space, equipment, software, personnel
and organisational routines are configured together to support iterative
artistic experimentation, research-led learning and outward-facing engagement.</p>
<p>The key phrase is <em>configured together</em>. A room full of excellent hardware
is no more a DML than a building full of books is a library. What makes
either work is an invisible layer of social organisation: access policies,
shared norms, maintained documentation, people who know what to do when
something breaks.</p>
<p>We borrow a concept from infrastructure studies to describe this:
<strong>performative infrastructure</strong>. The concept draws on Star and Ruhleder (1996),
and it captures something precise — that infrastructure does not merely
<em>enable</em> activity, it also <em>shapes</em> what kinds of activity are possible in the
first place. The decision to use LoLa rather than Zoom is not just a technical
choice; it is an institutional statement about what kind of musical interaction
this space is designed to support, and about who is expected to use it.</p>
<p>This framing matters because it shifts the design question. You are not asking
&ldquo;what equipment should we buy?&rdquo; You are asking &ldquo;what kind of practice do we
want to make possible, and what organisational conditions make that practice
sustainable?&rdquo;</p>
<hr>
<h2 id="four-things-that-actually-determine-whether-a-dml-survives">Four Things That Actually Determine Whether a DML Survives</h2>
<h3 id="1-flexible-by-design-not-by-accident">1. Flexible by design, not by accident</h3>
<p>Resilient labs resist the temptation to optimise for one use case. The systems
that have lasted — Stanford CCRMA is the obvious reference point, nearly five decades
and counting — tend to separate a stable core (networking, routing,
authentication, documentation) from a more rapidly changing layer of creative
tools and workflows. The core does not change when you switch DAWs or update
your streaming platform. The tools on top of it can.</p>
<p>This sounds obvious. In practice it means being deliberate about which
dependencies you are willing to accept. A lab built on a single vendor
ecosystem can offer tight integration, but it creates a single point of
failure and a maintenance contract you will be negotiating forever. A lab built
on open protocols and well-documented configurations is more work to set up and
less work to sustain.</p>
<p>The other thing flexibility buys is pedagogical range. The same environment
can host an introductory workshop, an advanced performance-research project and
a public-facing concert without requiring incompatible reconfiguration for each.
This is not a luxury. It is what makes a DML worth the overhead compared to
just booking a studio.</p>
<h3 id="2-governance-that-survives-personnel-turnover">2. Governance that survives personnel turnover</h3>
<p>The single most dangerous sentence in any DML is: <em>&ldquo;We can ask [person] — they
know how it works.&rdquo;</em></p>
<p>Every lab has that person. The one who configured the routing. The one who
knows which cable does what. The one who has the institutional memory of every
workaround and edge case. When that person moves on, the lab frequently becomes
unreliable within six months and functionally inaccessible within a year — even
if all the equipment is still there. We call these <strong>zombie infrastructures</strong>:
technically present, functionally dead.</p>
<p>The corrective is not to document everything (though that helps). It is to
design governance so that knowledge is distributed by default. Distributed
stewardship roles — student assistants, rotating committees, peer mentors —
mean that multiple people develop operational knowledge as a matter of routine,
not as emergency knowledge transfer when someone announces they are leaving.</p>
<p>Technical staff need to be treated as co-creators in this model, not as
service providers. When networked performance is framed as peripheral
experimentation rather than core infrastructure, maintenance becomes precarious
and invisible. When it is framed as core, collaboration between artistic and
technical roles becomes institutional routine.</p>
<h3 id="3-maintenance-as-a-budget-line-not-an-afterthought">3. Maintenance as a budget line, not an afterthought</h3>
<p>Here is the infrastructure paradox: systems are valued for enabling novelty,
but they require boring, recurring investment to remain usable. Project funding
solves the novelty problem. It almost never solves the maintenance problem.</p>
<p>The costs that make a lab reliable are not one-off:</p>
<ul>
<li>Staff continuity (or explicit knowledge transfer when staff change)</li>
<li>Documentation that is actively maintained, not written once and forgotten</li>
<li>Renewal cycles for hardware and software that actually match the pace of
change in the underlying ecosystem</li>
<li>User support during active sessions, not just during setup</li>
</ul>
<p>At HfMT Köln, the operational work that dominated actual implementation time
was none of the things that appear in grant applications: coordinating network
pathways across campus boundaries, establishing and re-establishing calibration
routines after infrastructure updates, producing documentation legible to
people who were not present at the original setup, providing real-time support
during rehearsals when something behaved unexpectedly.</p>
<p>None of this is glamorous. All of it is what determines whether musicians can
actually use the system on a given Tuesday afternoon.</p>
<h3 id="4-inclusion-that-is-designed-not-assumed">4. Inclusion that is designed, not assumed</h3>
<p>Technology-intensive environments reproduce exclusion reliably unless they are
actively designed not to. The mechanisms are familiar: assumed prior
experience, cultural signals about who belongs, scheduling that conflicts with
caring responsibilities, documentation in a single language, interfaces that
reward a particular kind of technical confidence.</p>
<p>For DMLs specifically, there is an additional layer. Networked music performance
is genuinely different from co-located performance. The latency conditions
require different listening and coordination strategies. For musicians trained
in tight synchronous ensemble playing, the first experience of performing over
a network is often disorienting — latency is not a technical glitch to be fixed,
it is a compositional condition to be understood and worked with.</p>
<p>Framing this as a deficit is pedagogically counterproductive. Framing it as an
occasion to develop new artistic vocabulary — to think deliberately about what
interaction strategies work at 12 ms versus 22 ms, about how anticipatory
listening changes the character of improvisation — turns an obstacle into
content. Some of the most interesting musical thinking in our sessions came
from participants who were trying to understand why something that was
effortless in a rehearsal room required conscious attention over the network.</p>
<hr>
<h2 id="the-tensions-that-do-not-resolve">The Tensions That Do Not Resolve</h2>
<p>Being honest about what the paper does not solve:</p>
<p><strong>Project funding versus operational costs.</strong> We do not have a structural
solution to the mismatch between how labs are funded (innovation grants with
defined end dates) and how they need to operate (indefinitely, with predictable
maintenance budgets). Collaborative purchasing agreements and shared technical
teams across institutions can distribute the burden, but they introduce
coordination overhead. There is no clean answer here.</p>
<p><strong>Experimentation versus accountability metrics.</strong> Universities and funders
want quantifiable outputs. Artistic experimentation often produces its most
valuable results as changed practices and new aesthetic understanding — things
that do not appear in publication counts or utilisation statistics. The best
available response is to be explicit about this mismatch when negotiating
evaluation criteria, and to establish review processes that include artistic
peers and community partners rather than only administrators. This is possible
more often than people think, but it requires someone to argue for it
proactively.</p>
<p><strong>Openness versus depth.</strong> A lab built for maximum accessibility is not the
same as a lab optimised for a specific research agenda, and trying to be both
usually means doing neither well. The design question is not which is better
but where the tradeoff lies for a particular institution&rsquo;s mission. CCRMA and
IRCAM have made different bets on this axis over decades and both have produced
important work. The mistake is not having an opinion about where you sit on
the spectrum.</p>
<hr>
<h2 id="recommendations">Recommendations</h2>
<p>These are for institutions and funders, assembled from what the paper
describes as working across multiple DML contexts:</p>
<ul>
<li><strong>Treat DMLs as long-term cultural infrastructure.</strong> Recurring budget lines
for renewal, documentation and support — not just start-up funding.</li>
<li><strong>Separate your stable backbone from your creative tools.</strong> Networking,
routing, authentication and documentation should not be rebuilt every time
you change your video platform.</li>
<li><strong>Design governance that does not rely on one person.</strong> Distributed
stewardship roles, clear succession documentation, operational knowledge
treated as shared rather than individual.</li>
<li><strong>Make invisible labour visible.</strong> Technical stewardship, facilitation and
community liaison need to appear in hiring, workload models and evaluation
— not just in informal practice.</li>
<li><strong>Lower the floor for participation.</strong> Scaffolded onboarding, peer mentoring,
programming that supports diverse musical practices and levels of technical
experience.</li>
<li><strong>Sort out data governance before you start recording.</strong> Consent, archiving
and reuse policies for audio/video, especially when community partners or
students are involved.</li>
<li><strong>Plan for the lab&rsquo;s eventual obsolescence.</strong> Versioning policies, migration
plans, criteria for retiring tools. Zombie infrastructures are a governance
failure, not a technical one.</li>
<li><strong>Evaluate on multiple axes.</strong> Technical reliability is one. Learning
trajectories, student agency, community partnership durability and artistic
outcomes are others. Reporting only the first one creates a misleading
picture of whether the lab is actually working.</li>
</ul>
<hr>
<h2 id="what-this-does-and-does-not-claim">What This Does and Does Not Claim</h2>
<p>The argument in the paper is conceptual and practice-informed rather than
empirical in the standard sense. We synthesise literature and draw on the
HfMT Köln implementation as a vignette — it is an illustration, not a
representative sample. The framework we propose (four design principles, the
performative infrastructure framing) is offered as an analytical vocabulary
for planning and evaluation, not as a validated theory.</p>
<p>What it is useful for: making implicit infrastructure choices explicit, naming
tensions before they become crises, and supporting more realistic conversations
between artistic users, technical staff and institutional leadership about what
it actually takes to make this work.</p>
<hr>
<h2 id="references">References</h2>
<p>Borgdorff, H. (2012). <em>The Conflict of the Faculties: Perspectives on
Artistic Research and Academia.</em> Leiden University Press.</p>
<p>Labbé, D., Zuberec, C., &amp; Turner, S. (2022). Creative hubs in Hanoi,
Vietnam: Transgressive spaces in a socialist state? <em>Urban Studies</em>.
<a href="https://doi.org/10.1177/00420980221086371">https://doi.org/10.1177/00420980221086371</a></p>
<p>McKay, G. (2017). Community music: History and current practice.
<em>International Journal of Community Music</em>, 10(2), 129–137.
<a href="https://doi.org/10.1386/ijcm.10.2.129_1">https://doi.org/10.1386/ijcm.10.2.129_1</a></p>
<p>Morreale, F., Bowers, J., &amp; McPherson, A. (2021). Collaborating in
distributed musical partnerships. <em>Computers in Human Behavior</em>, 120,
106757. <a href="https://doi.org/10.1016/j.chb.2021.106757">https://doi.org/10.1016/j.chb.2021.106757</a></p>
<p>Selwyn, N. (2021). <em>Education and Technology: Key Issues and Debates</em>
(3rd ed.). Bloomsbury Academic.</p>
<p>Star, S. L., &amp; Ruhleder, K. (1996). Steps toward an ecology of
infrastructure. <em>Information Systems Research</em>, 7(1), 111–134.
<a href="https://doi.org/10.1287/isre.7.1.111">https://doi.org/10.1287/isre.7.1.111</a></p>
<p>Wenger, E. (1998). <em>Communities of Practice: Learning, Meaning, and
Identity.</em> Cambridge University Press.
<a href="https://doi.org/10.1017/CBO9780511803932">https://doi.org/10.1017/CBO9780511803932</a></p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-01-20</strong>: Removed the Chafe (2018) &ldquo;Stanford CCRMA: A 40-year retrospective&rdquo; reference, which could not be confirmed in available databases (DOI does not resolve, not listed in <em>Computer Music Journal</em> 42(3)). The body text reference to CCRMA as an institutional example is retained; it does not depend on this citation.</li>
<li><strong>2026-01-20</strong>: Changed &ldquo;The term comes from Star and Ruhleder (1996)&rdquo; to &ldquo;The concept draws on Star and Ruhleder (1996).&rdquo; Star and Ruhleder&rsquo;s paper is the foundational text on relational infrastructure, but they did not coin the specific compound term &ldquo;performative infrastructure.&rdquo;</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>There Is No Such Thing as Full Accessibility — Only Barrier Reduction</title>
      <link>https://sebastianspicker.github.io/posts/no-such-thing-as-full-accessibility/</link>
      <pubDate>Fri, 10 May 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/no-such-thing-as-full-accessibility/</guid>
      <description>The German word &amp;lsquo;Barrierefreiheit&amp;rsquo; promises freedom from barriers. That promise is structurally impossible. What we can achieve is Barrierearmut — a reduction of barriers. The difference is not semantic; it has consequences for policy, design, and institutional honesty.</description>
      <content:encoded><![CDATA[<p>The German compound <em>Barrierefreiheit</em> means, literally, freedom from barriers. It is the word used in legislation, in building codes, in institutional disability policies, in the guidelines that govern what universities must provide. It implies a completable state: you arrive at Barrierefreiheit, and you are done.</p>
<p>I want to argue that this is not only unachievable in practice — which most people in the field will readily concede — but structurally impossible in a society organised the way ours is. The honest term is <em>Barrierearmut</em>: poverty of barriers, reduction of barriers, a direction rather than a destination. The difference is not just linguistic. It shapes what we promise, what we measure, and what we allow ourselves to stop doing.</p>
<h2 id="two-models-of-disability">Two Models of Disability</h2>
<p>The medical model of disability, which dominated institutional thinking for most of the twentieth century, locates the problem in the individual. A person is disabled by their impairment — by the deafness, the mobility limitation, the cognitive difference. The solution, in this frame, is treatment, cure, rehabilitation: changing the person to fit the world.</p>
<p>The social model, developed in the 1970s by disability activists — particularly through the work of the Union of the Physically Impaired Against Segregation in the UK — inverts this (UPIAS, 1976). The distinction is between <em>impairment</em> (a physical or cognitive difference) and <em>disability</em> (the disadvantage created by a society that does not account for that difference). A wheelchair user is not disabled by their legs; they are disabled by a building with no ramp. A deaf student is not disabled by their hearing; they are disabled by a lecture delivered without captioning.</p>
<p>Oliver (1990) developed this into a full political framework. Disability is not a medical category but a social relation — a product of how societies organise space, communication, labour, and meaning. The implication is radical: to address disability, you do not fix the person; you change the society.</p>
<p>This model has transformed disability law, architecture, and educational policy. The UN Convention on the Rights of Persons with Disabilities (2006) is explicitly built on it. WCAG — the Web Content Accessibility Guidelines — embodies it for digital environments. The Behindertengleichstellungsgesetz in Germany draws on it.</p>
<p>And yet.</p>
<h2 id="the-limit-of-the-social-model">The Limit of the Social Model</h2>
<p>The social model is politically necessary and descriptively powerful. It is also incomplete.</p>
<p>Shakespeare and Watson (2002) offer a careful critique: the strict social model, in its effort to relocate disability from body to society, ends up treating impairment as irrelevant — as a neutral fact that only becomes disabling through social organisation. But impairment is not neutral. Pain is real. Fatigue is real. Cognitive load is real. Some impairments impose limits that no architectural or digital intervention fully removes, because the limits are not externally imposed but intrinsic to how a particular nervous system processes the world.</p>
<p>The WHO&rsquo;s International Classification of Functioning, Disability and Health (ICF, 2001) offers a biopsychosocial synthesis: disability as an interaction between health condition, body function and structure, activity, participation, and contextual factors (both environmental and personal). This is less politically clean than the social model — it does not attribute all disablement to society — but it is more honest about the complexity.</p>
<p>The point is not to retreat from the social model&rsquo;s insights but to acknowledge that &ldquo;removing all barriers&rdquo; is an incomplete goal even in its own terms. Impairment is real; context is transformable; and the interaction between them is irreducibly particular. There is no single intervention that produces accessibility for everyone.</p>
<h2 id="why-barrierefreiheit-is-a-false-promise">Why Barrierefreiheit Is a False Promise</h2>
<p>Consider what full accessibility would require. It would require physical spaces that accommodate every mobility profile, every sensory profile, every energy and endurance pattern. It would require information architectures that are simultaneously navigable by users with very different cognitive and perceptual systems. It would require communication norms, cultural contexts, and institutional practices that do not privilege any particular neurotype, any particular communication style, any particular relationship to time and deadlines and social convention.</p>
<p>None of that is achievable in a society with the historical sediment ours has. Our cities were built for able-bodied adults with average sensory capacity, with no thought given to cognitive accessibility. Our universities were built — institutionally, not just physically — for a particular kind of learner with a particular kind of background, deploying a particular kind of intelligence. Retrofitting accessibility onto these structures is possible, valuable, and necessary. But it is not the same as having built for full human variation from the start. The ramp bolted onto the side of the neoclassical building solves the wheelchair problem and leaves everything else intact.</p>
<p>Kafer (2013) makes a more radical version of this argument. The concept of &ldquo;normal&rdquo; function — the standard against which accessibility is measured — is not neutral. It encodes a history of who was considered the default human, and who was considered an exception requiring accommodation. Achieving &ldquo;accessibility&rdquo; within a framework that still treats certain bodies and minds as exceptions to be accommodated does not escape that framework; it manages it.</p>
<p>This is why a building can pass every accessibility audit and still function as an excluding institution. The audit measures physical features. It does not measure whether disabled students are welcomed into the culture of the institution, whether their modes of participation are genuinely valued, whether the hidden curriculum of &ldquo;how to be a student&rdquo; is legible to someone whose processing differs from the assumed default.</p>
<h2 id="what-barrierearmut-means">What Barrierearmut Means</h2>
<p>If <em>Barrierefreiheit</em> is the impossible promise, <em>Barrierearmut</em> — barrier reduction — is the honest goal. It is not lesser. It is more accurate.</p>
<p>Barrier reduction as a framework asks: which barriers, for which people, with which effects, can be reduced through which interventions, at what cost, with what trade-offs? It treats accessibility as an ongoing practice rather than a checkable state. It acknowledges that every design decision — physical, digital, institutional — makes some things easier for some people and harder for others, and that the question is always whose needs are centred and whose are treated as exceptions.</p>
<p>Universal Design (Mace, 1985) moves in this direction: designing from the start for the broadest range of users, rather than designing for the norm and retrofitting for exceptions. A kerb cut is the standard example — designed for wheelchair users, also useful for people with pushchairs, luggage, bicycles, temporary injuries. But Universal Design, honestly applied, acknowledges that no design is truly universal. Every design embeds assumptions. The honest goal is to minimise the distance between those assumptions and the actual diversity of users.</p>
<p>For digital environments this is particularly visible. WCAG 2.2 defines four principles — Perceivable, Operable, Understandable, Robust — and success criteria that can be tested against. Meeting WCAG AA is a meaningful achievement. It is not the same as being accessible to all users. Screen reader users with different software behave differently with the same page. Cognitive accessibility — making content understandable, not just perceivable — is addressed by WCAG 3.0 drafts but is notoriously difficult to operationalise. The standards improve; the gap remains.</p>
<h2 id="institutional-honesty">Institutional Honesty</h2>
<p>I work in a university. Universities have accessibility offices, procedures, documentation requirements. A student with a disability can request accommodations: extended exam time, written materials in accessible formats, individual arrangements. These accommodations are real and valuable. They are also, structurally, a system for managing exceptions to a norm that the institution has no intention of revising.</p>
<p>The student who needs extended time is asking the institution to adjust its standard procedure for their case. The institution does so, often generously. But the standard procedure — the timed exam, the lecture format, the office-hours model — remains the standard. The exception is granted; the norm persists. This is barrier management, not barrier reduction.</p>
<p>Barrier reduction would mean asking, as a matter of institutional practice: what is the actual pedagogical purpose of the timed exam, and are there better ways to assess that competency that do not exclude students whose processing differs? It would mean asking what the lecture format assumes about the listener, and whether those assumptions are necessary. These questions are uncomfortable because they challenge practices that are also convenient, and because the people who benefit from the current norms are the ones with the institutional power to change them.</p>
<p>This is not a problem unique to universities. It is the general structure of the problem.</p>
<h2 id="a-direction-not-a-destination">A Direction, Not a Destination</h2>
<p>I am not arguing for giving up on accessibility work. The opposite. I am arguing that naming the goal honestly — barrier reduction, not barrier freedom — produces better practice than the false promise of an achievable endpoint.</p>
<p>Barrierefreiheit as a legal standard can be met by a compliant building that is still a hostile institution. Barrierearmut as a practice requires continuous attention to who is being excluded and by what, and ongoing effort to reduce that exclusion knowing that it will never be complete.</p>
<p>That is harder. It does not allow the institution to certify itself as done. It requires asking the uncomfortable questions about whose default is encoded in the design — a question that leads, quickly, to the question of privilege.</p>
<p>That is the next post: <a href="/posts/privilege-and-education/">The Invisible Entrance Fee: On Privilege, Education, and the Institutions That Reproduce Both</a>.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>Kafer, A. (2013). <em>Feminist, Queer, Crip</em>. Indiana University Press.</li>
<li>Mace, R.L. (1985). Universal Design: Barrier Free Environments for Everyone. <em>Designers West</em>, 33(1), 147–152.</li>
<li>Oliver, M. (1990). <em>The Politics of Disablement</em>. Macmillan.</li>
<li>Shakespeare, T. &amp; Watson, N. (2002). The social model of disability: an outdated ideology? <em>Research in Social Science and Disability</em>, 2, 9–28.</li>
<li>UPIAS (1976). <em>Fundamental Principles of Disability</em>. Union of the Physically Impaired Against Segregation.</li>
<li>WHO (2001). <em>International Classification of Functioning, Disability and Health (ICF)</em>. World Health Organization.</li>
<li>UN General Assembly (2006). <em>Convention on the Rights of Persons with Disabilities</em> (A/RES/61/106).</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-11-05</strong>: Corrected the Mace reference from (1997) <em>Designers West</em> 44(1) to (1985) <em>Designers West</em> 33(1), 147–152. The year 1997 relates to the separate &ldquo;Principles of Universal Design&rdquo; publication by Connell, Jones, Mace et al. at NC State, not the <em>Designers West</em> article.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Are Cats Liquid? The Deborah Number and the Rheology of Cats</title>
      <link>https://sebastianspicker.github.io/posts/liquid-cats-deborah-number/</link>
      <pubDate>Wed, 03 Apr 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/liquid-cats-deborah-number/</guid>
      <description>Marc-Antoine Fardin won the 2017 Ig Nobel Prize in Physics for proving, rigorously, that cats are liquid. The argument rests on the Deborah number De = τ/T: if the material&amp;rsquo;s relaxation time τ is shorter than the observation time T, the material behaves as a fluid. A cat filling a sink (De ≈ 0.008) is a liquid. A cat bouncing off a table (De ≫ 1) is a solid. The classification is not a joke — it is standard rheology, applied to an unusual substrate.</description>
      <content:encoded><![CDATA[<p><em>One of our strays discovered, sometime in her first winter indoors — they are
strictly indoor cats now, on our vet&rsquo;s recommendation — that she could fit into
a salad bowl. Not sit beside it, not rest her head on its rim: fit into it,
curled into a precise sphere with her tail tucked under her chin and her ears
folded flat, filling the bowl as liquid fills a container. The bowl has a
diameter of 22 centimetres. I did not find this as surprising as perhaps I
should have: there is a quantity in materials science that determines, rigorously,
whether a given material in a given situation should be classified as a solid or
a liquid. For a cat in a bowl, this quantity is comfortably below one.</em></p>
<p><em>The material is a liquid. The material is also a cat.</em></p>
<hr>
<h2 id="the-definition-of-a-fluid">The Definition of a Fluid</h2>
<p>The intuitive distinction between solids and liquids is that solids hold their
shape and liquids conform to their container. But this distinction is one of
timescale, not of material identity.</p>
<p>A classic demonstration: place a ball of silly putty on a table. Over the
course of an hour, it flows slowly outward, taking the shape of the table
surface — clearly a liquid. Strike it sharply with a hammer and it shatters —
clearly a solid. The material has not changed. The timescale of the
interaction has.</p>
<p>The same principle applies to glass (contrary to popular myth, medieval window
glass is not thicker at the bottom because it has flowed — the variation is
from the manufacturing process, and the relaxation time of soda-lime glass at
room temperature is of order $10^{23}$ years — but at elevated temperatures
near the glass transition, silicate glass flows readily). It applies
to mantle rock, which is solid on the scale of earthquake waves and liquid on
the scale of continental drift. It applies to pitch, to ice sheets, to asphalt
on a hot day.</p>
<p>The formal tool for capturing this is the <strong>Deborah number</strong>.</p>
<hr>
<h2 id="the-deborah-number">The Deborah Number</h2>
<p>The Deborah number was introduced by Marcus Reiner in 1964, in a short note
in <em>Physics Today</em> (Reiner 1964). It is defined as:</p>
$$\mathrm{De} = \frac{\tau}{T},$$<p>where $\tau$ is the <strong>relaxation time</strong> of the material — roughly, the
characteristic time over which it can rearrange its internal structure and
relieve stress — and $T$ is the <strong>observation time</strong> or the timescale of the
imposed deformation.</p>
<ul>
<li>$\mathrm{De} \ll 1$: The material relaxes quickly relative to the timescale
of observation. Internal stresses are continuously relieved. The material
behaves as a <strong>fluid</strong>.</li>
<li>$\mathrm{De} \gg 1$: The material relaxes slowly relative to the observation
timescale. Internal stresses persist. The material behaves as a <strong>solid</strong>.</li>
<li>$\mathrm{De} \sim 1$: The material is in a viscoelastic regime — partly
fluid, partly solid, exhibiting time-dependent behaviour that is neither.</li>
</ul>
<p>The name comes from the prophetess Deborah, who sang in Judges 5:5: <em>&ldquo;The
mountains flowed before the Lord.&rdquo;</em> At the timescale of a divine perspective,
mountains are liquid. At the timescale of a human lifetime, they are not.
Reiner&rsquo;s point was that the solid-liquid distinction is not a property of
the material but of the relationship between the material&rsquo;s internal
dynamics and the observer&rsquo;s timescale.</p>
<p>For Newtonian fluids (water, air at ordinary conditions), $\tau \to 0$ and
$\mathrm{De} \to 0$ for any finite observation time — they are always liquid.
For a perfectly elastic solid (an ideal spring), $\tau \to \infty$ and
$\mathrm{De} \to \infty$ for any finite observation time — always solid. Real
materials lie between these extremes.</p>
<hr>
<h2 id="the-maxwell-viscoelastic-model">The Maxwell Viscoelastic Model</h2>
<p>The simplest model of a material with a finite relaxation time is the Maxwell
element: a spring (elastic, spring constant $G$) in series with a dashpot
(viscous, viscosity $\eta$). Under a step stress $\sigma_0$ applied at time
$t = 0$, the strain evolves as:</p>
$$\epsilon(t) = \frac{\sigma_0}{G} + \frac{\sigma_0}{\eta}\,t,$$<p>where $\tau = \eta / G$ is the Maxwell relaxation time. The first term is the
instantaneous elastic deformation of the spring; the second is the linear
viscous creep of the dashpot. For $t \ll \tau$, the elastic strain dominates
and the material behaves as a solid; for $t \gg \tau$, the viscous flow
dominates and the material behaves as a liquid. The material &ldquo;decides&rdquo; whether
to be solid or liquid depending on the ratio of $\tau$ to the duration of the
applied stress — which is precisely the Deborah number.</p>
<p>The <strong>creep compliance</strong> $J(t) = \epsilon(t)/\sigma_0 = t/\eta + 1/G$ grows
linearly with time for $t \gg \tau$, confirming liquid behaviour on long
timescales. The <strong>relaxation modulus</strong> $G(t) = \sigma(t)/\epsilon_0 = G
e^{-t/\tau}$ decays exponentially to zero, confirming that the material
cannot sustain a permanent stress — again, liquid behaviour on long timescales.</p>
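The short-time/long-time crossover can be checked numerically. A minimal Python sketch of the Maxwell element's creep response, using illustrative parameter values (the values of $G$, $\eta$, and $\sigma_0$ are invented for the example, not taken from any source):

```python
import math

def maxwell_strain(t, sigma0, G, eta):
    """Creep strain under a step stress sigma0: instantaneous elastic part
    sigma0/G plus linear viscous creep (sigma0/eta) * t."""
    return sigma0 / G + (sigma0 / eta) * t

def relaxation_modulus(t, G, tau):
    """Relaxation modulus G(t) = G * exp(-t/tau): decays to zero, so the
    material cannot sustain a permanent stress at long times."""
    return G * math.exp(-t / tau)

# Illustrative values: G = 1e3 Pa, eta = 1e4 Pa*s  ->  tau = eta/G = 10 s
G, eta, sigma0 = 1e3, 1e4, 100.0
tau = eta / G

# Short times (t << tau): the elastic term dominates -> solid-like response
elastic = sigma0 / G
viscous_short = (sigma0 / eta) * (0.01 * tau)

# Long times (t >> tau): viscous creep dominates -> liquid-like response
viscous_long = (sigma0 / eta) * (100 * tau)
```

Comparing the two strain contributions at a given time is exactly the Deborah-number question in disguise: which term wins depends only on the ratio of $\tau$ to the duration of the applied stress.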
<hr>
<h2 id="on-the-rheology-of-cats">On the Rheology of Cats</h2>
<p>In 2014, Marc-Antoine Fardin, a physicist at the ENS Lyon,
published &ldquo;On the Rheology of Cats&rdquo; in the <em>Rheology Bulletin</em> 83(2), 16–17.
The paper asked whether cats satisfy the defining rheological criterion for
liquids, using the Deborah number as the test. Fardin was awarded the 2017
Ig Nobel Prize in Physics — which is awarded for research that &ldquo;makes you
laugh, then makes you think&rdquo; — for this work.</p>
<p>The paper is not a joke. It is standard rheology applied to an unusual material,
with appropriately hedged conclusions and correct citations to the primary
literature on viscoelastic flow. The humour is in the application; the physics
is serious.</p>
<h3 id="estimating-the-cats-relaxation-time">Estimating the Cat&rsquo;s Relaxation Time</h3>
<p>The relaxation time $\tau$ of a cat is the time scale over which the cat&rsquo;s
body deforms to fill a container. This is observable. A cat placed near a
suitable container — a salad bowl, a cardboard box, a bathroom sink —
adopts a conformed shape on a timescale of roughly 5–30 seconds. The initial
posture (stiff, alert) gives way to a relaxed conformation as the cat
assesses the container and adjusts. Fardin estimated $\tau \approx 1$–$30$
seconds, with the exact value depending on the container&rsquo;s attractiveness
to the specific cat.</p>
<p>This is the material&rsquo;s characteristic relaxation time. The fact that it is
finite — that the cat does eventually conform to the container — is the
essential observation.</p>
<h3 id="computing-the-deborah-number-for-various-situations">Computing the Deborah Number for Various Situations</h3>
<p><strong>Scenario 1: Cat in a sink.</strong>
A cat taking ten minutes to settle into a bathroom sink. Observation time
$T = 600\,\mathrm{s}$, relaxation time $\tau \approx 5\,\mathrm{s}$.</p>
$$\mathrm{De}_\mathrm{sink} = \frac{5}{600} \approx 0.008 \ll 1.$$<p>The cat is unambiguously a <strong>liquid</strong>.</p>
<p><strong>Scenario 2: Cat in a cardboard box.</strong>
Conformation over approximately 30 minutes, $\tau \approx 20\,\mathrm{s}$.</p>
$$\mathrm{De}_\mathrm{box} = \frac{20}{1800} \approx 0.011 \ll 1.$$<p><strong>Liquid.</strong></p>
<p><strong>Scenario 3: Cat dropping from a bookshelf.</strong>
Contact time during a jump approximately $T \approx 0.05\,\mathrm{s}$,
relaxation time still $\tau \approx 5\,\mathrm{s}$.</p>
$$\mathrm{De}_\mathrm{jump} = \frac{5}{0.05} = 100 \gg 1.$$<p><strong>Solid.</strong> The cat does not deform into the shape of the bookshelf during the
jump; it rebounds elastically.</p>
<p><strong>Scenario 4: Cat startled by a loud noise.</strong>
Reaction time $T \approx 0.3\,\mathrm{s}$, $\tau \approx 5\,\mathrm{s}$.</p>
$$\mathrm{De}_\mathrm{startle} = \frac{5}{0.3} \approx 17 \gg 1.$$<p><strong>Solid.</strong> On short timescales, cats behave as elastic materials — they spring,
they bounce, they do not flow.</p>
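The four scenario calculations reduce to a single ratio. A small sketch that recomputes the Deborah numbers above and applies a rough classification (the thresholds are illustrative; De ~ 1 marks a broad viscoelastic crossover, not a sharp boundary):

```python
def deborah(tau, T):
    """Deborah number De = tau / T: relaxation time over observation time."""
    return tau / T

def classify(de, low=0.1, high=10.0):
    """Rough phase classification; the cutoffs are illustrative choices."""
    if de < low:
        return "liquid"
    if de > high:
        return "solid"
    return "viscoelastic"

# The four scenarios from the text (times in seconds)
scenarios = {
    "sink":    deborah(5, 600),    # ~0.008 -> liquid
    "box":     deborah(20, 1800),  # ~0.011 -> liquid
    "jump":    deborah(5, 0.05),   # 100    -> solid
    "startle": deborah(5, 0.3),    # ~17    -> solid
}
```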
<p>The cat is neither permanently solid nor permanently liquid. It is a
<strong>viscoelastic material</strong> whose phase classification depends on the timescale
of the interaction. This is not a loose analogy; it is the definition of
viscoelasticity.</p>
<hr>
<h2 id="non-newtonian-behaviour-and-flow-instabilities">Non-Newtonian Behaviour and Flow Instabilities</h2>
<p>Fardin noted an additional complication: cat flow is not Newtonian. A Newtonian
fluid has a viscosity $\eta$ that is independent of the applied shear rate
$\dot\gamma$. Many real materials are <strong>shear-thinning</strong> (viscosity decreases
with increasing shear rate — ketchup, blood, many polymer solutions) or
<strong>shear-thickening</strong> (viscosity increases with increasing shear rate —
cornstarch suspension, some dense suspensions). Cats, Fardin observed, appear
to be shear-thinning: the more rapidly you attempt to move a relaxed cat from
its current position, the more &ldquo;liquid&rdquo; (accommodating, compliant) it becomes,
up to a point at which the cat transitions to solid behaviour (claws, teeth).</p>
<p>This is, formally, the behaviour of a <strong>yield-stress fluid</strong>: a material that
behaves as a solid below a critical stress $\sigma_y$ and flows above it. The
Herschel–Bulkley model describes such fluids:</p>
$$\sigma = \sigma_y + k \dot\gamma^n, \quad \sigma > \sigma_y,$$<p>where $k$ is the flow consistency index and $n < 1$ for shear-thinning. The
challenge of fitting $k$, $n$, and $\sigma_y$ for a specific cat is
experimental, and Fardin acknowledged this was left to future work.</p>
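For illustration only (no parameters have been fitted for any cat), a sketch of the Herschel–Bulkley law with invented values, showing the shear-thinning apparent viscosity it implies:

```python
def herschel_bulkley(gamma_dot, sigma_y, k, n):
    """Herschel-Bulkley stress above yield: sigma = sigma_y + k * gamma_dot**n.
    Below sigma_y the material does not flow, i.e. it behaves as a solid."""
    return sigma_y + k * gamma_dot ** n

def apparent_viscosity(gamma_dot, sigma_y, k, n):
    """eta_app = sigma / gamma_dot; for n < 1 this decreases with shear
    rate, which is the definition of shear-thinning."""
    return herschel_bulkley(gamma_dot, sigma_y, k, n) / gamma_dot

# Hypothetical parameters: yield stress 10 Pa, consistency k = 5, n = 0.5
params = dict(sigma_y=10.0, k=5.0, n=0.5)
etas = [apparent_viscosity(g, **params) for g in (0.1, 1.0, 10.0, 100.0)]
```

The apparent viscosity falls monotonically with shear rate for $n < 1$: the faster the imposed deformation, the more readily the material flows, until the model's validity ends (for the cat, at the claws-and-teeth transition).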
<p>The <strong>Deborah number</strong> and the <strong>yield stress</strong> together provide a two-parameter
phase diagram for cat rheology:</p>
<ul>
<li>Low stress, short timescale: solid (De ≫ 1 or σ &lt; σ_y)</li>
<li>Low stress, long timescale: liquid (De ≪ 1)</li>
<li>High stress: yield, followed by flow</li>
</ul>
<hr>
<h2 id="flow-instabilities-the-rayleigh-plateau-connection">Flow Instabilities: The Rayleigh-Plateau Connection</h2>
<p>Fardin also noted that cats confined to containers thinner than their body
diameter can exhibit flow instabilities. A cat attempting to fit into a glass
too narrow for its body will sometimes adopt a helical or coiled configuration —
an instability reminiscent of the <strong>Rayleigh–Plateau instability</strong> of a liquid
jet.</p>
<p>The Rayleigh–Plateau instability occurs when a cylindrical fluid jet of radius
$r_0$ is subject to perturbations of wavelength $\lambda > 2\pi r_0$. Modes
with wavelength longer than the cylinder&rsquo;s circumference are unstable and grow,
breaking the jet into droplets. The dispersion relation for growth rate $\sigma$
as a function of wavenumber $k = 2\pi/\lambda$ (for an inviscid jet) is:</p>
$$\sigma^2 = \frac{\gamma}{\rho r_0^3}\, k r_0 \bigl(1 - k^2 r_0^2\bigr)\,
\frac{I_1(kr_0)}{I_0(kr_0)},$$<p>where $\gamma$ is surface tension and $I_0, I_1$ are modified Bessel functions.
The analogy with a cat is inexact — surface tension is not the dominant
restoring force — but the qualitative instability mechanism (a long cylinder of
material is unstable to perturbations whose wavelength exceeds the cylinder&rsquo;s
circumference) appears to apply, suggesting that very elongated cats in very
narrow containers should be unstable to coiling. This is, again, left to future
experimental work.</p>
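The classical consequence of this dispersion relation, a fastest-growing mode near $kr_0 \approx 0.7$ (wavelength about nine jet radii), can be recovered numerically. A self-contained sketch that evaluates $I_0$ and $I_1$ from their power series:

```python
import math

def bessel_i(n, x, terms=25):
    """Modified Bessel function I_n(x) via its power series
    I_n(x) = sum over m of (x/2)**(2m+n) / (m! * (m+n)!);
    converges quickly for x of order 1."""
    return sum((x / 2) ** (2 * m + n) / (math.factorial(m) * math.factorial(m + n))
               for m in range(terms))

def growth_rate_sq(x):
    """Dimensionless growth rate sigma^2 * (rho * r0^3 / gamma) as a
    function of the reduced wavenumber x = k * r0."""
    return x * (1 - x ** 2) * bessel_i(1, x) / bessel_i(0, x)

# Scan reduced wavenumbers; only modes with x < 1 (lambda > 2*pi*r0) grow
xs = [i / 1000 for i in range(1, 1000)]
x_max = max(xs, key=growth_rate_sq)  # fastest-growing mode, near 0.697
```

Modes with $kr_0 > 1$ give a negative $\sigma^2$ (oscillatory, stable); the maximum of the positive branch sits near $kr_0 \approx 0.697$, Rayleigh's classical result.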
<hr>
<h2 id="why-the-deborah-number-matters-outside-of-cat-physics">Why the Deborah Number Matters (Outside of Cat Physics)</h2>
<p>The Deborah number is not a curiosity; it is a central dimensionless number
in engineering and materials science.</p>
<p><strong>Polymer processing</strong>: The flow of polymer melts through injection-moulding
channels involves De in the range $10^{-2}$–$10^2$. Too high a De leads to
elastic instabilities, melt fracture, and surface defects in the finished part.</p>
<p><strong>Blood rheology</strong>: Blood is a non-Newtonian viscoelastic fluid. In the large
arteries (low shear rate), red blood cells aggregate into <em>rouleaux</em> and
blood behaves as a shear-thinning fluid. In the capillaries (high shear rate),
rouleaux break up and individual cells deform to fit through vessels smaller
than their resting diameter — liquid behaviour on short length scales.</p>
<p><strong>Geophysics</strong>: The mantle is an elastic solid for seismic waves ($T \sim$
seconds, De ≫ 1) and a viscous fluid for convection ($T \sim 10^8$–$10^9$
years, De ≪ 1). The same material. Different Deborah numbers.</p>
<p><strong>Glaciology</strong>: Ice is an elastic solid for rapid fracture (calving of icebergs)
and a viscous fluid for glacier flow. The transition occurs at timescales of
years to decades, depending on temperature and stress.</p>
<p>The cat is in good company.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Fardin, M.-A. (2014). On the rheology of cats. <em>Rheology Bulletin</em>, 83(2),
16–17.</p>
</li>
<li>
<p>Reiner, M. (1964). The Deborah number. <em>Physics Today</em>, 17(1), 62.
<a href="https://doi.org/10.1063/1.3051374">https://doi.org/10.1063/1.3051374</a></p>
</li>
<li>
<p>Barnes, H.A., Hutton, J.F., &amp; Walters, K. (1989). <em>An Introduction to
Rheology.</em> Elsevier (Rheology Series, Vol. 3).</p>
</li>
<li>
<p>Bird, R.B., Armstrong, R.C., &amp; Hassager, O. (1987). <em>Dynamics of Polymeric
Liquids, Vol. 1: Fluid Mechanics</em> (2nd ed.). Wiley-Interscience.</p>
</li>
<li>
<p>Eggers, J. (1997). Nonlinear dynamics and breakup of free-surface flows.
<em>Reviews of Modern Physics</em>, 69(3), 865–930.
<a href="https://doi.org/10.1103/RevModPhys.69.865">https://doi.org/10.1103/RevModPhys.69.865</a></p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-12-15</strong>: Fixed Deborah number in summary from 0.08 to 0.008 (matching the body calculation: 5/600 = 0.00833).</li>
<li><strong>2025-12-15</strong>: Corrected Fardin&rsquo;s institutional affiliation from &ldquo;Paris Diderot University&rdquo; to &ldquo;ENS Lyon&rdquo; — his affiliation on the 2014 <em>Rheology Bulletin</em> paper is Université de Lyon / ENS Lyon (CNRS UMR 5672). He moved to Paris Diderot later in 2014, after the paper was published.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Hunting Exoplanets with Your Phone: A Classroom Experiment That Actually Works</title>
      <link>https://sebastianspicker.github.io/posts/exoplanet-hunting-smartphones/</link>
      <pubDate>Mon, 11 Mar 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/exoplanet-hunting-smartphones/</guid>
      <description>Finding planets around other stars sounds like it requires a space telescope. It does not — at least not the analogy version. This is the story of how a lamp, a ball, and a smartphone became a peer-reviewed physics classroom experiment, published in The Physics Teacher in 2024.</description>
      <content:encoded><![CDATA[<p><em>This post describes the work behind &ldquo;Exoplanet Hunting in the Classroom: An
Easy-to-Implement Experiment Based on Video-Aided Light Curve Analysis with
Smartphones&rdquo;, published in The Physics Teacher in 2024 (co-authored with
Alexander Küpper). It also draws on the earlier German-language paper on
analogy experiments for the transit method, published in Astronomie+Raumfahrt
in 2022.</em></p>
<hr>
<h2 id="the-pedagogical-problem">The Pedagogical Problem</h2>
<p>The transit method is how the majority of confirmed exoplanets have been
found. When a planet passes in front of its host star, it blocks a fraction
of the star&rsquo;s light. A sufficiently precise light sensor pointed at the star
will record a characteristic dip: a flat-bottomed decrease in flux during
the transit, with a precise shape determined by the ratio of the planet&rsquo;s
radius to the star&rsquo;s radius, the duration of the transit, and the geometry
of the orbit.</p>
<p>This is conceptually accessible. The physics is essentially shadow casting —
a topic covered in primary school — applied to an astronomically interesting
situation. Students understand it quickly and find it genuinely exciting.</p>
<p>The problem is the implementation. How do you actually demonstrate this
in a classroom?</p>
<p>Standard approaches divide into three categories, each with limitations:</p>
<ol>
<li>
<p><strong>Simulations and database exercises</strong>: Students work with real data from
Kepler or TESS, or use a software simulation. These are conceptually
valid but remote from physical experience. There is no sensor, no
measurement, no uncertainty to grapple with.</p>
</li>
<li>
<p><strong>Prefabricated kits</strong>: Products like PocketLab or Pasco offer purpose-built
transit experiment setups. They work, but they are expensive, closed-source,
and require manufacturer-specific software. A school that buys a Pasco sensor
is locked into the Pasco ecosystem.</p>
</li>
<li>
<p><strong>DIY benchtop setups</strong>: Various published designs use phototransistors,
Arduinos, or similar components with a benchtop light source. These are
flexible and cheap but require component procurement, assembly, and some
technical confidence from the teacher. The barrier to entry is real.</p>
</li>
</ol>
<p>What was missing was an approach that was inexpensive, open-source, required
no specialist equipment procurement, and worked at the level of a student
experiment rather than a teacher demonstration.</p>
<hr>
<h2 id="the-smartphone-solution">The Smartphone Solution</h2>
<p>Modern Android smartphones include an ambient light sensor that is directly
accessible via <a href="https://phyphox.org">phyphox</a>, the free measurement app
developed at RWTH Aachen. Set up the experiment correctly, and the phone
records a real-time light curve.</p>
<p>The basic setup requires three things:</p>
<ul>
<li>A light source (a standard desk lamp, ideally with a constant-brightness
LED bulb to avoid flicker)</li>
<li>An opaque sphere to act as the &ldquo;planet&rdquo; (a tennis ball, a ping-pong ball,
anything with a defined circular silhouette)</li>
<li>A smartphone running phyphox, positioned beneath the lamp at a fixed
distance and oriented so the light sensor faces upward</li>
</ul>
<p>When the sphere is moved across the light path at a controlled height and
speed, the light sensor records a transit: a smooth dip in measured
illuminance with the flat-bottomed shape characteristic of a planetary
transit across a uniformly bright disk.</p>
<p>This is the core experiment. It works. The transit signal is clear enough
to measure even with the modest precision of a phone&rsquo;s ambient light sensor,
provided the background illumination is controlled (dark room or at least
consistent ambient light).</p>
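The idealised curve the students' data approximates is pure geometry: the blocked fraction of a uniformly bright disk (no limb darkening) is the circle-circle overlap area divided by the disk area. A sketch with hypothetical classroom dimensions, not values from the paper:

```python
import math

def overlap_area(R, r, d):
    """Intersection area of two circles with radii R, r and centre distance d."""
    if d >= R + r:
        return 0.0                       # no overlap: out of transit
    if d <= abs(R - r):
        return math.pi * min(R, r) ** 2  # full occultation: flat bottom
    # Partial overlap (ingress/egress): standard circle-circle lens formula
    a1 = r * r * math.acos((d * d + r * r - R * R) / (2 * d * r))
    a2 = R * R * math.acos((d * d + R * R - r * r) / (2 * d * R))
    a3 = 0.5 * math.sqrt((-d + r + R) * (d + r - R) * (d - r + R) * (d + r + R))
    return a1 + a2 - a3

def relative_flux(R_lamp, r_ball, d):
    """Normalised flux: 1 minus the blocked fraction of the lamp's disk."""
    return 1.0 - overlap_area(R_lamp, r_ball, d) / (math.pi * R_lamp ** 2)

# Hypothetical numbers: lamp disk radius 10 cm, ball radius 3 cm,
# ball centre swept from -20 cm to +20 cm across the lamp's centre
curve = [relative_flux(10.0, 3.0, abs(d)) for d in range(-20, 21)]
```

The flat bottom sits at $1 - (r/R)^2 = 0.91$ for these numbers, and the sloped ingress and egress correspond to the partial-overlap branch, which is exactly what the video frames let students identify.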
<hr>
<h2 id="the-iphoneroblem-and-its-solution">The iPhone Problem and Its Solution</h2>
<p>Apple devices do not expose their ambient light sensor through any public
software API. An iPhone running phyphox cannot access the sensor that is
physically present in the device.</p>
<p>The workaround we recommend: an external Bluetooth light sensor connected
to phyphox. Options include the TI SensorTag CC2650, Bluetooth multimeters
such as the OWON B35T, or an Arduino Nano 33 BLE Sense. The Arduino option
is particularly well suited to educational contexts: it is open-source, it
is inexpensive, and, with no full operating system underneath, it behaves
<p>The external sensor approach also has a benefit beyond iPhone compatibility:
it produces more consistent data across different devices, since you are
measuring at a fixed external point rather than through whatever optical
pathway the phone manufacturer chose. For experiments where comparison
across student groups matters, this is not trivial.</p>
<hr>
<h2 id="video-aided-light-curve-analysis">Video-Aided Light Curve Analysis</h2>
<p>The standard approach to a transit experiment is: measure the dip, calculate
the planet-to-star radius ratio from the relative depth, done. This works
and is pedagogically valid.</p>
<p>The paper introduces a complementary approach: simultaneously recording a
video of the &ldquo;planet&rdquo; passing in front of the &ldquo;lamp&rdquo;, and using the video
frames to cross-reference the light curve data.</p>
<p>Why? Because the light curve from a real transit experiment does not look
exactly like the idealised textbook version. There is noise. There is
baseline drift. The &ldquo;ingress&rdquo; and &ldquo;egress&rdquo; phases — where the planet is
partially in front of the star — are often unclear at smartphone sensor
resolution. Students frequently have difficulty connecting the shape of
the curve to the physical geometry that produced it.</p>
<p>Video-aided analysis addresses this directly. Frame-by-frame, students can
see exactly where the planet was at each moment in the light curve. The
ingress becomes visible: when the sphere first touches the lamp&rsquo;s light cone,
the sensor begins to register the dip. The mid-transit flat bottom corresponds
to full occultation of a central portion of the lamp. The egress mirrors the
ingress. The correspondence between geometry and photometry — which is the
conceptual core of the transit method — becomes explicit.</p>
<p>In a teaching context, this turns the error and noise in the light curve from
an obstacle into an educational resource. Students can identify specific
features of the curve and ask: what was happening in the physical experiment
at that moment? The uncertainty is no longer an embarrassment. It is a
diagnostic.</p>
<hr>
<h2 id="scaffolding-levels">Scaffolding Levels</h2>
<p>The paper distinguishes three implementation modes, corresponding to different
levels of student independence:</p>
<p><strong>Demonstration experiment</strong>: Teacher sets up and runs the apparatus. Students
observe and discuss. Appropriate as an introduction to the concept before
students engage with it independently.</p>
<p><strong>Guided student experiment</strong>: Students follow a structured procedure, with
specified setup, data collection protocol, and analysis worksheet. Appropriate
for students who have not designed their own experiments and for lesson
contexts where time is limited.</p>
<p><strong>Open inquiry</strong>: Students are given the materials and a research question —
&ldquo;How does the depth of the transit dip depend on the size of the planet?&rdquo; —
and design their own procedure. Appropriate for upper secondary students with
experience in experimental design, and for lesson contexts that explicitly
address scientific method.</p>
<p>The materials for all three modes are described in the paper. The open inquiry
mode is the most demanding but also the most research-authentic: students
are not following a protocol but building one, confronting the actual decisions
that experimental physicists make.</p>
<hr>
<h2 id="from-the-classroom-to-the-telescope">From the Classroom to the Telescope</h2>
<p>A transit experiment with a lamp and a phone is, obviously, not the same
as the photometry done by TESS or the James Webb Space Telescope. The
planet-star radius ratios measurable in the classroom analog are much
larger than for most real exoplanets. The signal-to-noise is worse. The
lamp is not a star.</p>
<p>But the method is the same. The measurement principle — flux dip proportional
to the square of the radius ratio, duration determined by orbital geometry —
is the same physics that Kepler used to find thousands of planets. When
students calculate the &ldquo;radius&rdquo; of their tennis-ball planet from their light
curve, they are doing, in miniature, what professional astronomers do with
data from space.</p>
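The same depth formula spans both regimes. A short sketch comparing the classroom analog with real systems (the ball and lamp sizes are hypothetical; the planetary and solar radii are approximate round values):

```python
def transit_depth(r_planet, r_star):
    """Fractional flux dip for a dark disk crossing a uniform stellar disk."""
    return (r_planet / r_star) ** 2

# Classroom analog: hypothetical 3 cm ball in front of a 10 cm lamp disk
depth_classroom = transit_depth(3.0, 10.0)      # 9% -- trivially measurable

# Real systems (approximate radii in km)
depth_jupiter = transit_depth(69_911, 695_700)  # ~1% -- ground-detectable
depth_earth = transit_depth(6_371, 695_700)     # ~84 ppm -- needs space photometry
```

The jump from a 9% classroom dip to an 84-parts-per-million Earth transit is the whole reason space photometry exists, and working through the ratio makes that concrete for students.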
<p>This connection to real research is not incidental to the pedagogy. It is
central to it. The transit method works as a classroom experiment not because
it is a good demonstration of some abstract physics principle but because
it is a genuine slice of how contemporary science actually operates. The
question the experiment answers — is there something out there? — is the
same question the professional community is asking.</p>
<p>The simulation companion to this work — a browser-based model of transit
photometry with full limb darkening, exomoon scenarios, and N-body dynamics —
is described in <a href="/posts/the-gift-of-transits/">this separate post</a>. The
simulation is the place to go when you want to explore parameter space;
the physical experiment is the place to go when you want to understand
what a measurement actually is.</p>
<hr>
<h2 id="connection-to-the-astro-lab">Connection to the astro-lab</h2>
<p>The transit experiment described here grew directly out of the
<a href="/posts/astro-lab-at-home/">astro-lab project</a> at the University of Cologne,
where Alexander Küpper and I had been developing smartphone-based analogy
experiments for exoplanet detection since the COVID pivot in 2020. The
astro-lab@home established the feasibility of the smartphone approach;
the A+R 2022 paper on Analogieexperimente für die Transitmethode explored
the design space more systematically; the TPT 2024 paper is the version
written for an international teacher audience, with the comparative
equipment table, the video-aided analysis technique, and the scaffolding
levels made explicit.</p>
<p>If you want to extend the experiment to exomoons — detecting the gravitational
wobble that a moon induces in a planet&rsquo;s transit — that work is described
in <a href="/posts/exomoon-analogy-experiment/">a later post</a>.</p>
<p><em>For the curriculum article that places the transit experiment in the NRW
Sekundarstufe I context — including the Direct Imaging pre-experiment —
see <a href="/posts/fremde-welten-exoplanet-teaching/">Fremde Welten</a>.</em></p>
<hr>
<h2 id="references">References</h2>
<p>Spicker, S. J., &amp; Küpper, A. (2024). Exoplanet hunting in the classroom:
An easy-to-implement experiment based on video-aided light curve analysis
with smartphones. <em>The Physics Teacher</em>, 62(3).
<a href="https://doi.org/10.1119/5.0125305">https://doi.org/10.1119/5.0125305</a></p>
<p>Küpper, A., &amp; Spicker, S. J. (2022). Analogieexperimente zur Transitmethode
für den Einsatz in Schule und Hochschule. <em>Astronomie+Raumfahrt im Unterricht</em>,
59(5).</p>
<p>Staacks, S., Hütz, S., Heinke, H., &amp; Stampfer, C. (2018). Advanced tools
for smartphone-based experiments: phyphox. <em>Physics Education</em>, 53(4), 045009.
<a href="https://doi.org/10.1088/1361-6552/aac05e">https://doi.org/10.1088/1361-6552/aac05e</a></p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-10-03</strong>: Updated the DOI for Spicker &amp; Küpper (2024) to the correct 10.1119/5.0125305.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>You Cannot Have All Three: The Fairness Impossibility Theorem</title>
      <link>https://sebastianspicker.github.io/posts/fairness-impossibility-ai-bias/</link>
      <pubDate>Fri, 08 Mar 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/fairness-impossibility-ai-bias/</guid>
      <description>Three natural fairness criteria for an AI classifier — calibration, equal false positive rates, equal false negative rates — cannot all hold simultaneously when base rates differ across groups. This is not an engineering failure. It is a theorem. Choosing which criterion to satisfy is a political decision, not a technical one.</description>
      <content:encoded><![CDATA[<h2 id="summary">Summary</h2>
<p>In 2016 ProPublica published an investigation showing that COMPAS — a widely used recidivism risk
assessment tool — assigned higher risk scores to Black defendants than to White defendants with
equivalent actual recidivism rates. The tool&rsquo;s developer responded that COMPAS is well-calibrated:
among defendants of any race assigned a given score, the subsequent recidivism rates are
consistent with that score. Both claims were correct.</p>
<p>The apparent contradiction between them is resolved by a mathematical result that was proved
independently by two groups the same year. The fairness impossibility theorem establishes that
calibration, equal false positive rates, and equal false negative rates cannot all hold
simultaneously when base rates differ between groups — unless the classifier is perfect.</p>
<p>This is not a property of COMPAS specifically. It is not fixed by a better algorithm, more
diverse training data, or more careful engineering. It is a constraint that holds for any
probabilistic classifier operating on groups with unequal prevalence of the predicted outcome.</p>
<p>The question this forces is not &ldquo;how do we make the algorithm fair?&rdquo; The question is &ldquo;which
fairness criterion do we endorse, and can we defend that choice to the people it disadvantages?&rdquo;
That is not a technical question.</p>
<h2 id="the-compas-investigation">The COMPAS Investigation</h2>
<p>Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner published &ldquo;Machine Bias&rdquo; in ProPublica
on 23 May 2016 (<a href="#ref-angwin2016">Angwin et al., 2016</a>). They had obtained COMPAS risk scores for
approximately 7,000 defendants in Broward County, Florida, along with actual two-year recidivism
data. Their finding: among defendants who did not go on to reoffend, Black defendants were
falsely labelled high-risk at roughly twice the rate of White defendants. The false positive rate
was substantially higher for Black defendants.</p>
<p>Northpointe (now Equivant), the tool&rsquo;s developer, responded that ProPublica&rsquo;s analysis was
misleading. COMPAS is <em>calibrated</em>: within any given score band, the actual recidivism rate is
the same regardless of race. A score of 7 means approximately the same thing for a Black
defendant as for a White defendant. This is a genuine and important property for a risk assessment
to have.</p>
<p>Both analyses were conducted correctly. The tension between them is not a matter of one side
being wrong. It is that satisfying both of these legitimate fairness criteria at once is,
under these conditions, mathematically impossible.</p>
<h2 id="three-definitions-of-fairness">Three Definitions of Fairness</h2>
<p>Let \(Y \in \{0, 1\}\) be the true outcome (reoffend/not), \(S\) be the classifier&rsquo;s
risk score, \(\hat{Y} \in \{0, 1\}\) be its binary prediction, and \(A \in \{0, 1\}\) indicate
group membership.</p>
<p><strong>Calibration</strong> (predictive parity): for all score values \(s\),</p>
$$P(Y = 1 \mid S = s, A = 0) = P(Y = 1 \mid S = s, A = 1)$$<p>If the model assigns a score of 7 to a defendant, the actual reoffending rate should be the
same regardless of race. This is what COMPAS satisfies.</p>
<p><strong>False positive rate parity</strong>:</p>
$$P(\hat{Y} = 1 \mid Y = 0, A = 0) = P(\hat{Y} = 1 \mid Y = 0, A = 1)$$<p>Among defendants who will not reoffend, the probability of being incorrectly labelled high-risk
should be equal across groups. This is what ProPublica found violated.</p>
<p><strong>False negative rate parity</strong>:</p>
$$P(\hat{Y} = 0 \mid Y = 1, A = 0) = P(\hat{Y} = 0 \mid Y = 1, A = 1)$$<p>Among defendants who will reoffend, the probability of being incorrectly labelled low-risk
should be equal across groups.</p>
<p>All three properties seem like reasonable things to ask of a fair classifier. The impossibility
theorem says you cannot have all three at once — with a precise exception.</p>
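<p>All three criteria can be read directly off a confusion matrix. A minimal sketch with
hand-constructed, hypothetical counts — two groups with different base rates but identical
PPV, the binary stand-in for calibration — already shows one criterion giving way:</p>

```python
from collections import Counter

def rates(y_true, y_pred):
    """Confusion-matrix rates for binary labels and predictions."""
    c = Counter(zip(y_true, y_pred))
    tp, fp, fn, tn = c[(1, 1)], c[(0, 1)], c[(1, 0)], c[(0, 0)]
    return {
        "PPV": tp / (tp + fp),  # binary stand-in for calibration
        "FPR": fp / (fp + tn),  # P(pred = 1 | Y = 0)
        "FNR": fn / (fn + tp),  # P(pred = 0 | Y = 1)
    }

# Hypothetical counts, chosen by hand.
# Group 0: base rate 0.5 (50 of 100 reoffend); group 1: base rate 0.2.
g0_true = [1] * 50 + [0] * 50
g0_pred = [1] * 40 + [0] * 10 + [1] * 10 + [0] * 40   # TP=40 FN=10 FP=10 TN=40
g1_true = [1] * 20 + [0] * 80
g1_pred = [1] * 16 + [0] * 4 + [1] * 4 + [0] * 76     # TP=16 FN=4  FP=4  TN=76

r0, r1 = rates(g0_true, g0_pred), rates(g1_true, g1_pred)
print(r0)  # PPV 0.8, FPR 0.20, FNR 0.2
print(r1)  # PPV 0.8, FPR 0.05, FNR 0.2
```

<p>With PPV and even FNR matched across the groups, the FPR still differs by a factor of four.
Which of the three to surrender is exactly the choice the theorem forces.</p>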
<h2 id="the-impossibility-theorem">The Impossibility Theorem</h2>
<p>Alexandra Chouldechova proved the relevant result in 2017 using Broward County data as her case
study (<a href="#ref-chouldechova2017">Chouldechova, 2017</a>). Jon Kleinberg, Sendhil Mullainathan, and
Manish Raghavan proved an equivalent result independently (<a href="#ref-kleinberg2017">Kleinberg et al., 2017</a>).</p>
<p>The argument is straightforward. Suppose a classifier is calibrated and produces a binary
prediction (high/low risk). Let \(p_0\) and \(p_1\) be the base rates — the actual reoffending
rates — in groups 0 and 1. For a binary classifier with positive predictive value PPV and
negative predictive value NPV:</p>
<ul>
<li>The false positive rate satisfies (via Bayes): \(\text{FPR} = \frac{\text{TPR} \cdot \text{PR} \cdot (1-\text{PPV})}{\text{PPV} \cdot (1-\text{PR})}\) where PR is prevalence and TPR is sensitivity</li>
<li>The false negative rate satisfies (via Bayes): \(\text{FNR} = \frac{\text{TNR} \cdot (1-\text{PR}) \cdot (1-\text{NPV})}{\text{NPV} \cdot \text{PR}}\) where TNR is specificity</li>
</ul>
<p>If calibration holds — PPV and NPV are equal across groups — and the base rates \(p_0 \neq p_1\),
then the FPR and FNR in each group are functions of that group&rsquo;s specific base rate. They cannot
both be equalized across groups unless either:</p>
<ol>
<li>\(p_0 = p_1\): the base rates are equal, or</li>
<li>The classifier is perfect: FPR = FNR = 0.</li>
</ol>
<p>In the real case — unequal base rates, imperfect classifier — calibration and equalized error
rates are mutually exclusive. You can have one or the other but not both. The three criteria have
two degrees of freedom, and the third is determined by the first two plus the base rates. It is an
algebraic constraint, not an engineering limitation.</p>
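<p>The Bayes relation above can be checked numerically in a few lines: hold PPV and TPR fixed
across groups, as calibration requires, and the FPR is forced to move with the base rate. The
parameter values here are illustrative, not drawn from any dataset:</p>

```python
def fpr_under_calibration(base_rate, ppv, tpr):
    """FPR implied by Bayes' rule when PPV (calibration) and TPR are fixed:
    FPR = TPR * p * (1 - PPV) / (PPV * (1 - p)), p being the base rate."""
    return tpr * base_rate * (1 - ppv) / (ppv * (1 - base_rate))

# Same PPV and TPR in every group; only the prevalence differs.
for p in (0.2, 0.3, 0.5):
    print(f"base rate {p:.1f} -> FPR {fpr_under_calibration(p, ppv=0.8, tpr=0.8):.3f}")
```

<p>Unless the base rates agree, or the classifier is perfect (PPV = 1, which sends the FPR to
zero), equal PPV forces unequal FPR — the algebraic constraint made concrete.</p>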
<h2 id="a-structural-analogy">A Structural Analogy</h2>
<p>The structural similarity to another impossibility result is worth noting.</p>
<p>The Robertson inequality in quantum mechanics (<a href="#ref-robertson1929">Robertson, 1929</a>) states that
for any two observables \(\hat{A}\) and \(\hat{B}\):</p>
$$\Delta A \cdot \Delta B \geq \frac{1}{2} \left| \langle [\hat{A}, \hat{B}] \rangle \right|$$<p>This is not an engineering failure. It is a consequence of the algebraic structure of the theory:
if \([\hat{A}, \hat{B}] \neq 0\), then \(\Delta A\) and \(\Delta B\) cannot simultaneously be
made arbitrarily small. No measurement apparatus, however precise, can violate it. The constraint
is in the mathematics, not the hardware.</p>
<p>The fairness impossibility has the same character. Three desiderata, a structural constraint that
prevents simultaneous satisfaction, and no algorithmic escape route. A better model does not help.
Richer training data does not help. The constraint is in the arithmetic of conditional
probabilities and base rates.</p>
<p>The disanalogy is this: in quantum mechanics, \(\hbar\) is a fundamental constant — you cannot
reduce it. In fairness, the base rates are not constants of nature. They are historical outcomes
of social processes: incarceration rates, policing patterns, economic conditions, educational
access. The theorem does not tell you that unequal base rates are acceptable; it tells you that
given unequal base rates, the three fairness criteria cannot all be satisfied.</p>
<h2 id="gender-bias-in-ai-systems">Gender Bias in AI Systems</h2>
<p>The impossibility theorem applies to any binary classification setting with unequal base rates.
The empirical landscape of AI gender bias gives several concrete instances where one criterion was
satisfied while others were not.</p>
<p>In October 2018, Reuters reported that Amazon had developed and then abandoned an internal
AI-based recruiting tool that systematically downgraded résumés from women
(<a href="#ref-dastin2018">Dastin, 2018</a>). The model had been trained on a decade of hiring decisions,
in which successful hires were predominantly male. The model learned that &ldquo;male&rdquo; features were
associated with success and penalized female indicators — reportedly including the word
&ldquo;women&rsquo;s&rdquo; on a r&eacute;sum&eacute; — accordingly. Calibration to the training
data produced systematic gender bias in output.</p>
<p>Tolga Bolukbasi and colleagues showed in 2016 that word embeddings trained on large text corpora
encoded gender stereotypes in their geometric structure
(<a href="#ref-bolukbasi2016">Bolukbasi et al., 2016</a>). The analogy \(\text{man} : \text{computer
programmer} :: \text{woman} : \text{homemaker}\) could be recovered directly from the vector
arithmetic of the embedding space. The embedding was calibrated to the text corpus, which reflected
the occupational distribution of the time — and perpetuated it.</p>
<p>Jieyu Zhao and colleagues found that image captioning and activity recognition models amplified
existing gender associations (<a href="#ref-zhao2017">Zhao et al., 2017</a>). &ldquo;Cooking&rdquo; was associated with
women in 67% of training images; the models amplified that to 84% at inference.
The amplification is a consequence of models learning the easiest features that predict the label
— and in a world where cooking is disproportionately female, &ldquo;female appearance&rdquo; becomes a
feature that predicts &ldquo;cooking.&rdquo;</p>
<p>Joy Buolamwini and Timnit Gebru&rsquo;s &ldquo;Gender Shades&rdquo; study found error rates of up to 34.7% for
darker-skinned women in commercial facial recognition systems, compared to 0.8% for lighter-skinned
men (<a href="#ref-buolamwini2018">Buolamwini &amp; Gebru, 2018</a>). The classifiers were calibrated on
predominantly light-skinned training data. Calibration on the majority group produced large errors
on the minority group — exactly the pattern the impossibility theorem describes.</p>
<p>Hadas Kotek and colleagues tested four large language models on gender-stereotyped occupational
prompts in 2023 (<a href="#ref-kotek2023">Kotek et al., 2023</a>). The models were three to six times more
likely to choose the gender-stereotyped occupation when responding to ambiguous prompts. The
models were calibrated to human-generated text; human-generated text encodes human stereotypes.</p>
<h2 id="the-solutions-and-their-limits">The Solutions and Their Limits</h2>
<p>Three broad approaches exist to algorithmic debiasing, and all three face the same constraint.</p>
<p><strong>Pre-processing</strong> removes bias from training data before training. Zemel and colleagues proposed
&ldquo;Learning Fair Representations&rdquo; — a latent embedding that encodes the data usefully while
obscuring group membership (<a href="#ref-zemel2013">Zemel et al., 2013</a>). This can reduce bias in the
learned representation, but it cannot simultaneously satisfy all three fairness criteria; it
trades one against another by compressing the group-informative dimensions.</p>
<p><strong>Post-processing</strong> adjusts the classifier&rsquo;s decisions after training. Moritz Hardt, Eric Price,
and Nathan Srebro&rsquo;s equalized odds approach (<a href="#ref-hardt2016">Hardt et al., 2016</a>) adjusts
decision thresholds separately per group to achieve FPR/FNR parity. This satisfies equalized
odds by construction — but only by abandoning calibration, which the Chouldechova theorem requires
when base rates differ.</p>
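<p>The threshold side of that trade-off can be sketched with hypothetical scores and labels:
choosing a separate cut-off per group to hit a target FPR — the mechanism behind threshold-based
post-processing — generally yields different thresholds per group, which is exactly the departure
from calibration the theorem demands. Ties and interpolation are ignored in this sketch:</p>

```python
def threshold_for_fpr(scores, labels, target_fpr):
    """Smallest cut-off whose FPR (fraction of negatives scored above it)
    stays at or below target_fpr. A sketch of per-group thresholding,
    not the full Hardt et al. procedure."""
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    k = int(target_fpr * len(negatives))  # negatives we may mislabel
    return negatives[k] if k < len(negatives) else 0.0

# Hypothetical held-out scores and true outcomes for two groups.
g0_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
g0_labels = [1,   1,   0,   1,   0,   0,   0,   0]
g1_scores = [0.95, 0.85, 0.75, 0.5, 0.45, 0.35, 0.25, 0.15]
g1_labels = [1,    0,    1,    1,   0,    0,    0,    0]

t0 = threshold_for_fpr(g0_scores, g0_labels, target_fpr=0.25)  # 0.40
t1 = threshold_for_fpr(g1_scores, g1_labels, target_fpr=0.25)  # 0.45
# A score of 0.42 is "high risk" in group 0 and "low risk" in group 1:
# the same score no longer means the same thing, i.e. calibration is gone.
```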
<p><strong>In-processing</strong> incorporates a fairness constraint into the training objective. Agarwal and
colleagues proposed a reductions approach that allows the practitioner to specify which fairness
constraint to impose (<a href="#ref-agarwal2018">Agarwal et al., 2018</a>). But you must choose. The
algorithm can optimize for any one of the three criteria; it cannot optimize for all three
simultaneously when base rates differ.</p>
<p>A 2021 survey by Mitchell and colleagues confirms that all three paradigms face the same
impossibility (<a href="#ref-mitchell2021">Mitchell et al., 2021</a>). The choice of paradigm is a choice
about which criterion to prioritize, and that choice has distributional consequences that fall
differently on different groups.</p>
<h2 id="the-political-choice">The Political Choice</h2>
<p>This is where Arvind Narayanan&rsquo;s framing becomes essential. His 2018 tutorial catalogued 21
distinct definitions of algorithmic fairness and titled it &ldquo;21 Fairness Definitions and Their
Politics&rdquo; (<a href="#ref-narayanan2018">Narayanan, 2018</a>). The title is the argument: the definitions
are not equivalent, choosing among them is not a technical decision, and the choice encodes a
prior about what justice requires.</p>
<p>In the criminal justice context: a false positive (predicting recidivism when the defendant will
not reoffend) imposes a cost on the defendant — higher bail, longer sentence, restricted
conditions of release. A false negative (predicting non-recidivism when the defendant will
reoffend) imposes a cost on potential future victims and on public safety. When we prioritize
false positive rate parity, we are choosing to protect defendants from unequal exposure to false
accusation. When we prioritize false negative rate parity, we are choosing to protect the public
from unequally distributed missed offenders. These are
both defensible values. They produce different error distributions across groups.</p>
<p>Choosing overall accuracy as the metric — which is what maximizing predictive performance
typically means — is itself a value choice: it implicitly weights errors by their frequency in
the population, which means errors made on less-common outcomes are relatively under-penalized.
When racial disparities in base rates are products of historical injustice, this choice compounds
that injustice.</p>
<p>Solon Barocas, Moritz Hardt, and Arvind Narayanan&rsquo;s textbook <em>Fairness and Machine Learning</em>
(2023) makes explicit that the choice between fairness criteria is a normative, not technical,
decision (<a href="#ref-barocas2023">Barocas et al., 2023</a>). The book does not tell you which criterion
to choose. It tells you that you must choose, that the choice has political content, and that
presenting it as a technical optimization problem conceals that content.</p>
<p>Reuben Binns&rsquo; analysis through political philosophy confirms that different fairness criteria
correspond to different underlying theories of justice: Rawlsian, Dworkinian, luck egalitarian
framings all generate different orderings of the three criteria
(<a href="#ref-binns2018">Binns, 2018</a>). The choice of fairness criterion is the choice of a
theory of justice, whether or not the engineers implementing the system have thought of it in
those terms.</p>
<h2 id="the-theorem-is-not-the-problem">The Theorem Is Not the Problem</h2>
<p>I want to be clear about what the impossibility theorem does and does not say.</p>
<p>It does not say that algorithmic fairness is impossible. It says that you must choose among
competing fairness criteria when base rates differ across groups, and that the choice has
distributional consequences. Systems can be built that satisfy calibration, or equalized odds,
or demographic parity — just not all three at once with unequal base rates.</p>
<p>It does not say that base rate disparities are natural or acceptable. The disparities in
recidivism rates, hiring rates, image training sets, and text corpora are products of social
history. The theorem constrains what a classifier can do <em>given</em> those disparities; it does not
prescribe them.</p>
<p>What it does say is that &ldquo;we built a fair algorithm&rdquo; is not a statement that can be made without
specifying which fairness criterion was satisfied and which was not. It is not a statement that
can be defended on purely technical grounds. And it is not a statement that escapes political
accountability by hiding behind mathematical precision.</p>
<p>The fairness debate in AI is, at its core, a debate about which errors we are willing to make, in
whom, with what consequences. The theorem makes that debate unavoidable. Whether we have the
vocabulary and the will to conduct it in those terms is a different question entirely.</p>
<h2 id="references">References</h2>
<ul>
<li><span id="ref-angwin2016"></span>Angwin, J., Larson, J., Mattu, S., &amp; Kirchner, L. (2016, May 23). Machine bias. <em>ProPublica</em>. <a href="https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing">https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing</a></li>
<li><span id="ref-chouldechova2017"></span>Chouldechova, A. (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. <em>Big Data</em>, 5(2), 153–163. <a href="https://doi.org/10.1089/big.2016.0047">DOI: 10.1089/big.2016.0047</a></li>
<li><span id="ref-kleinberg2017"></span>Kleinberg, J., Mullainathan, S., &amp; Raghavan, M. (2017). Inherent trade-offs in the fair determination of risk scores. In <em>Proceedings of the 8th Innovations in Theoretical Computer Science Conference</em> (ITCS 2017). <a href="https://doi.org/10.4230/LIPIcs.ITCS.2017.43">DOI: 10.4230/LIPIcs.ITCS.2017.43</a></li>
<li><span id="ref-robertson1929"></span>Robertson, H. P. (1929). The uncertainty principle. <em>Physical Review</em>, 34, 163–164. <a href="https://doi.org/10.1103/PhysRev.34.163">DOI: 10.1103/PhysRev.34.163</a></li>
<li><span id="ref-dastin2018"></span>Dastin, J. (2018, October 10). Amazon scraps secret AI recruiting tool that showed bias against women. <em>Reuters</em>. <a href="https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G">https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G</a></li>
<li><span id="ref-bolukbasi2016"></span>Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V., &amp; Kalai, A. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In <em>Advances in Neural Information Processing Systems 29</em> (NeurIPS 2016). arXiv:1607.06520</li>
<li><span id="ref-zhao2017"></span>Zhao, J., Wang, T., Yatskar, M., Ordonez, V., &amp; Chang, K.-W. (2017). Men also like shopping: Reducing gender bias amplification using corpus-level constraints. In <em>Proceedings of EMNLP 2017</em>, pp. 2979–2989. <a href="https://aclanthology.org/D17-1323/">ACL Anthology: D17-1323</a></li>
<li><span id="ref-buolamwini2018"></span>Buolamwini, J., &amp; Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. In <em>Proceedings of the 1st Conference on Fairness, Accountability and Transparency</em> (FAT* 2018), PMLR Vol. 81, pp. 77–91. <a href="https://proceedings.mlr.press/v81/buolamwini18a.html">https://proceedings.mlr.press/v81/buolamwini18a.html</a></li>
<li><span id="ref-kotek2023"></span>Kotek, H., Dockum, R., &amp; Sun, D. Q. (2023). Gender bias and stereotypes in large language models. In <em>Proceedings of The ACM Collective Intelligence Conference</em> (CI &lsquo;23), pp. 12–24. <a href="https://doi.org/10.1145/3582269.3615599">DOI: 10.1145/3582269.3615599</a></li>
<li><span id="ref-zemel2013"></span>Zemel, R., Wu, Y., Swersky, K., Pitassi, T., &amp; Dwork, C. (2013). Learning fair representations. In <em>Proceedings of the 30th International Conference on Machine Learning</em> (ICML 2013), PMLR Vol. 28, No. 3, pp. 325–333. <a href="https://proceedings.mlr.press/v28/zemel13.html">https://proceedings.mlr.press/v28/zemel13.html</a></li>
<li><span id="ref-hardt2016"></span>Hardt, M., Price, E., &amp; Srebro, N. (2016). Equality of opportunity in supervised learning. In <em>Advances in Neural Information Processing Systems 29</em> (NeurIPS 2016), pp. 3323–3331. arXiv:1610.02413</li>
<li><span id="ref-agarwal2018"></span>Agarwal, A., Beygelzimer, A., Dudik, M., Langford, J., &amp; Wallach, H. (2018). A reductions approach to fair classification. In <em>Proceedings of the 35th International Conference on Machine Learning</em> (ICML 2018), PMLR Vol. 80, pp. 60–69. arXiv:1803.02453</li>
<li><span id="ref-mitchell2021"></span>Mitchell, S., Potash, E., Barocas, S., D&rsquo;Amour, A., &amp; Lum, K. (2021). Algorithmic fairness: Choices, assumptions, and definitions. <em>Annual Review of Statistics and Its Application</em>, 8, 141–163. <a href="https://doi.org/10.1146/annurev-statistics-042720-125902">DOI: 10.1146/annurev-statistics-042720-125902</a></li>
<li><span id="ref-narayanan2018"></span>Narayanan, A. (2018). <em>21 Fairness Definitions and Their Politics</em>. Tutorial at FAT* 2018. <a href="https://facctconference.org/static/tutorials/narayanan-21defs18.pdf">PDF</a></li>
<li><span id="ref-barocas2023"></span>Barocas, S., Hardt, M., &amp; Narayanan, A. (2023). <em>Fairness and Machine Learning: Limitations and Opportunities</em>. MIT Press. <a href="https://fairmlbook.org">https://fairmlbook.org</a></li>
<li><span id="ref-binns2018"></span>Binns, R. (2018). Fairness in machine learning: Lessons from political philosophy. In <em>Proceedings of the 2018 Conference on Fairness, Accountability, and Transparency</em> (FAT* 2018), PMLR Vol. 81, pp. 149–159. arXiv:1712.03586</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-11-05</strong>: Updated the Zhao et al. (2017) cooking statistics to match the paper: 67% female agents for cooking in the training set (33% was the male share), amplified to 84% female at inference.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>What the Videography Manual Didn&#39;t Cover: Filming Music Education</title>
      <link>https://sebastianspicker.github.io/posts/filming-music-education/</link>
      <pubDate>Tue, 13 Feb 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/filming-music-education/</guid>
      <description>The classroom videography manual we published in 2023 was about filming teaching. Music education has the same word in it — teaching — but it is a fundamentally different recording challenge. Sound is the subject matter. The lesson is often one person, in a practice room. And the feedback cycle the teacher needs to reach is mostly the one that happens when no camera is present. A reflection on what the manual missed, and a software prototype that tries to address part of it.</description>
      <content:encoded><![CDATA[<p><em>This post follows from the <a href="/posts/villa-videography-manual/">May 2023 post on the classroom videography
manual</a>. Read that one first if you want
the baseline.</em></p>
<hr>
<h2 id="the-assumption-underneath-the-manual">The Assumption Underneath the Manual</h2>
<p>The manual we published — Kramer, Spicker, and Kaspar, 2023, open access at
<a href="https://kups.ub.uni-koeln.de/65599/">kups.ub.uni-koeln.de/65599</a> — is a
good document for what it is. It covers a classroom. It assumes a teacher
in front of twenty to thirty students, a forty-five minute lesson, a room
with windows that create backlighting problems, a consent process that
involves four institutional levels, and two static cameras facing each other
as the baseline configuration.</p>
<p>All of that is correct for the context it addresses. The context is
school-based subject teaching: physics, mathematics, German, history. The
University of Cologne teacher education programme we developed the manual
for is primarily about preparing people for exactly that context.</p>
<p>When I moved to the Cologne University of Music, I brought the same assumptions
with me. It took a while for me to notice how much the new context violated
them.</p>
<hr>
<h2 id="sound-is-not-the-same-problem">Sound Is Not the Same Problem</h2>
<p>In the manual, the section on audio equipment is focused on speech capture.
The recommendation — lavalier microphones for the teacher, boundary
microphones at the cameras for student audio — is correct for a lesson where
the subject matter is communicated through talking. The teacher talks. The
students talk back. The quality criterion for the audio is: can we understand
what is being said?</p>
<p>In music education, the subject matter <em>is</em> sound. What the student
produces acoustically is not background noise supporting verbal instruction —
it is the object of the lesson. And it is produced by instruments that
have almost nothing in common acoustically with a human voice.</p>
<p>A lavalier microphone clipped to a teacher&rsquo;s collar, positioned to capture
speech from thirty centimetres away, will record a student&rsquo;s piano playing
through the back of the teacher&rsquo;s head, through the air, through a
directional capsule aimed at the wrong thing. The resulting audio is
technically present and analytically useless.</p>
<p>Instruments have frequency ranges, dynamic ranges, and directional patterns
that require completely different microphone selection and placement. A
violin at fortissimo in a small practice room will clip every speech-grade
microphone in the room. A breath-controlled pianissimo passage that a
skilled listener can hear clearly will barely register on a distant
boundary microphone designed to capture &ldquo;the general acoustic environment.&rdquo;
The distinction between a correctly produced tone and an incorrectly produced
tone — which is the actual content of the lesson — may or may not be
audible in the captured audio depending on whether anyone thought about
microphone choice before walking through the door.</p>
<p>The manual&rsquo;s principle of &ldquo;as much as necessary, as little as possible&rdquo;
still applies, but &ldquo;necessary&rdquo; is a completely different specification
here.</p>
<hr>
<h2 id="the-one-to-one-lesson-problem">The One-to-One Lesson Problem</h2>
<p>The classroom videography framework — including the manual — is built around
a structural assumption: there is a teacher, and there is a class.
The teacher stands or moves at the front; the students are arrayed in rows
or groups. Two cameras can cover this because the spatial structure is
relatively stable and the relevant action is roughly predictable.</p>
<p>A university instrumental lesson is typically one-to-one, in a small
practice room, for sixty minutes. The spatial structure is two people
close together around an instrument. The relevant action includes:</p>
<ul>
<li>The teacher demonstrating a passage on their own instrument</li>
<li>The teacher making a physical correction — adjusting bow arm position,
repositioning the student&rsquo;s hand on the fingerboard, demonstrating
breath support by putting a hand on the student&rsquo;s diaphragm</li>
<li>The student playing and the teacher listening with their eyes closed</li>
<li>The teacher singing a melodic contour to show phrasing</li>
<li>Both of them playing at the same time (unison work, call and response)</li>
</ul>
<p>A standard two-camera classroom setup captures none of this usefully.
The standard framing — wide angle, teacher on one side, student on the
other — produces footage where &ldquo;something is happening near the piano&rdquo;
but where the analytically relevant detail (the finger position, the
bow angle, the postural correction) is invisible at normal viewing distance.</p>
<p>You need different framing. You probably need closer cameras. You might
need a third angle for body position. And you need to accept that this
raises the setup complexity substantially beyond what the manual recommends
as a baseline.</p>
<hr>
<h2 id="what-the-lesson-is-actually-about">What the Lesson Is Actually About</h2>
<p>There is a deeper structural difference that the equipment and setup
challenges are symptoms of.</p>
<p>In subject-matter teaching, the lesson is the unit of analysis. A
forty-five-minute lesson has a beginning, a development, a conclusion.
The teacher enters with a plan; the video captures how that plan was
executed and how the students responded. The analytical interest is in
the lesson as a coherent pedagogical event.</p>
<p>In instrumental music education, the lesson is a container for cycles.
A student plays a passage. The teacher identifies a problem — the
intonation at bar twelve, the tendency to rush the syncopated rhythm,
the bow pressure collapsing in the crescendo. The teacher says or
demonstrates something. The student tries again. The teacher listens
to what changed and what did not.</p>
<p>These cycles are the unit of analysis, and they happen dozens of times
in a single lesson. The lesson-level video is useful context, but the
analytically interesting question is inside the cycle: what did the
teacher identify, what intervention did they choose, what happened to
the student&rsquo;s playing afterward?</p>
<p>Capturing those cycles in usable form requires not just video of the
lesson but video that is indexed to them — where each attempt-and-response
pair can be located and compared. A continuous recording of a sixty-minute
lesson is not organised for this purpose. Timestamps help but do not
replace the work of finding and annotating each cycle.</p>
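<p>One way to make &ldquo;indexed to the cycles&rdquo; concrete is as a data model. This is a
hypothetical sketch of such an index — every field and function name here is mine, not part of
any existing tool:</p>

```python
from dataclasses import dataclass

@dataclass
class Cycle:
    """One attempt-and-response pair inside a lesson recording."""
    start: float        # seconds into the recording
    end: float
    passage: str        # e.g. "bar 12, intonation"
    intervention: str   # what the teacher said or demonstrated
    outcome: str        # what changed in the next attempt

def attempts_at(cycles, passage):
    """All cycles for one passage, in lesson order, ready for comparison."""
    return sorted((c for c in cycles if c.passage == passage),
                  key=lambda c: c.start)

# Hypothetical annotations for a single sixty-minute lesson.
lesson = [
    Cycle(410.0, 470.0, "bar 12, intonation", "slow practice with drone", "stable"),
    Cycle(120.0, 185.0, "bar 12, intonation", "sang the contour", "F sharp raised"),
    Cycle(600.0, 655.0, "bar 20, rhythm", "clapped the syncopation", "still rushing"),
]
history = attempts_at(lesson, "bar 12, intonation")  # two cycles, earliest first
```

<p>The annotation work — deciding where each cycle starts and ends, and what the intervention
was — remains manual; the index only makes the comparison possible afterward.</p>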
<hr>
<h2 id="the-absent-camera-problem">The Absent Camera Problem</h2>
<p>There is a more fundamental issue that no amount of improved equipment
configuration addresses.</p>
<p>The feedback cycle a teacher most wants to reach is the one that happens
in a student&rsquo;s practice session. Between lessons, the student is alone
in a practice room, working through the same passages, repeating the same
mistakes (or, occasionally, having the experience of something going right
for reasons they do not fully understand). The teacher&rsquo;s instructions from
the last lesson are present only in the student&rsquo;s memory of them, which is
fallible and partial.</p>
<p>The videography manual is about research documentation: a trained operator,
institutional consent, equipment brought in from outside. None of that is
available in a student&rsquo;s practice session at eleven o&rsquo;clock on a Wednesday
night. And even if you could film it — which you could, technically, with
a phone — the resulting footage would be unwatched, because no workflow
exists to get it from the student&rsquo;s device to the teacher&rsquo;s eyes in a form
that supports structured feedback.</p>
<p>The practical reality is that most music teachers receive feedback about a
student&rsquo;s practice only through the student&rsquo;s report of it (&ldquo;I practiced
every day&rdquo;) and through the evidence presented in the lesson (which may or
may not reflect what practice actually looked like). The gap between
practice and lesson feedback is a structural feature of music education,
and it is not something that research videography can address.</p>
<hr>
<h2 id="a-software-response">A Software Response</h2>
<p>The tool I built to think through this problem is called Resonance, and it
is available at <a href="https://github.com/sebastianspicker/resonance">github.com/sebastianspicker/resonance</a>.</p>
<p>The design is deliberately different from the research videography model.
Instead of an external camera operator documenting a lesson for later
analysis, Resonance puts the documentation instrument in the student&rsquo;s
hands. Students capture short audio or video clips of their own practice —
snippets of a passage they want the teacher to hear, a moment where
something went wrong, a phrase they are finally getting right — and submit
them to a course. The teacher reviews the queue and adds feedback with
timestamped annotations: &ldquo;at 0:23, the bow pressure drops — this is what
is generating the scratch.&rdquo;</p>
<p>The asymmetry is intentional. The student decides what to document.
The teacher provides structured, specific feedback. The cycle is
asynchronous — the student submits at eleven on a Wednesday night; the
teacher responds Thursday morning — which means it is independent of
the lesson schedule.</p>
<p>The technical decisions follow from the use context. Students practice in
rooms where connectivity is unreliable, so the app is offline-first:
recordings are captured locally and uploaded when a connection is available.
An iPad is the natural form factor for a music student — larger screen,
better camera, sits on a music stand. The backend is standard (Node.js,
Postgres, S3-compatible object storage) because the interesting problem here
is not the infrastructure but the workflow.</p>
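<p>The offline-first behaviour follows a standard pattern: persist locally, retry on reconnect.
A minimal illustration of that pattern — not the actual Resonance code, with an in-memory queue
standing in for on-disk persistence and a stub standing in for the upload call:</p>

```python
from collections import deque

class UploadQueue:
    """Offline-first sketch: clips stay queued locally and are retried
    whenever a connection returns."""
    def __init__(self, send):
        self.pending = deque()
        self.send = send  # callable that uploads one clip; raises OSError offline

    def enqueue(self, clip):
        self.pending.append(clip)

    def flush(self):
        """Upload in order; stop at the first failure and keep the rest."""
        while self.pending:
            try:
                self.send(self.pending[0])
            except OSError:
                return  # still offline — nothing is lost
            self.pending.popleft()

# Simulated connectivity: offline on first flush, online on the second.
sent, online = [], {"up": False}
def send(clip):
    if not online["up"]:
        raise OSError("no connection")
    sent.append(clip)

q = UploadQueue(send)
q.enqueue("wed-2300-bar12.m4a")   # hypothetical clip name
q.flush()                         # Wednesday night, no signal: clip stays queued
online["up"] = True
q.flush()                         # Thursday morning: clip uploads
```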
<p>Resonance is a prototype and a proof of concept, not a production system.
The authentication is explicitly development-mode only. The goal was to
build enough of the thing to be able to think clearly about what it does
and does not solve.</p>
<hr>
<h2 id="what-it-does-not-solve">What It Does Not Solve</h2>
<p>Resonance addresses the absent-camera problem for the practice-to-feedback
loop. It does not address the research documentation problem that the
videography manual was written for.</p>
<p>If you want to study <em>how music teachers give feedback</em> — as a research
question about teaching practice, not just as a workflow tool — you still
need the full apparatus: controlled recording conditions, appropriate
microphones for instruments, multi-camera coverage of the lesson, consent
for the resulting footage to be used for research and teaching purposes,
and post-processing that produces an analytically usable document.</p>
<p>Resonance footage is not that. It is what a student chose to capture on an
iPad in a practice room, with whatever acoustic environment happened to be
present. It is useful for the practice-feedback cycle; it is not a research
record.</p>
<p>The challenges I described in the first two sections — appropriate
microphones, multi-angle coverage of one-to-one lessons, capture of
the practice cycle rather than the lesson arc — are still open problems
for anyone trying to do systematic observational research in music education.
The manual gives you the framework for thinking about them. It does not
give you solutions, because those solutions are context-specific and, in
several cases, not yet worked out by the field.</p>
<p>What I find interesting is that the two problems — research documentation
and practice-feedback — might look the same (filming music education)
but require almost entirely different responses. Getting clear on which
problem you are solving turns out to be most of the work.</p>
<hr>
<p><em>The full classroom videography manual is at
<a href="https://kups.ub.uni-koeln.de/65599/">kups.ub.uni-koeln.de/65599</a>.
The Resonance repository is at
<a href="https://github.com/sebastianspicker/resonance">github.com/sebastianspicker/resonance</a>.</em></p>
<hr>
<h2 id="references">References</h2>
<p>Kramer, C., Spicker, S. J., &amp; Kaspar, K. (2023). <em>Manual zur Erstellung
von Unterrichtsvideographien</em>. KUPS Open Access.
<a href="https://kups.ub.uni-koeln.de/65599/">https://kups.ub.uni-koeln.de/65599/</a></p>
<p>Lehmann, A. C., Sloboda, J. A., &amp; Woody, R. H. (2007). <em>Psychology for
Musicians: Understanding and Acquiring the Skills</em>. Oxford University Press.</p>
<p>Presland, C. (2005). Conservatoire student and instrumental professor:
The student perspective on a complex relationship. <em>British Journal of Music
Education</em>, 22(3), 237–248.
<a href="https://doi.org/10.1017/S0265051705006558">https://doi.org/10.1017/S0265051705006558</a></p>
<p>Creech, A., &amp; Hallam, S. (2011). Learning a musical instrument: The
influence of interpersonal interaction on outcomes for school-aged pupils.
<em>Psychology of Music</em>, 39(1), 102–122.
<a href="https://doi.org/10.1177/0305735610370222">https://doi.org/10.1177/0305735610370222</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>When Musicians Lock In: Coupled Oscillators and the Physics of Ensemble Synchronisation</title>
      <link>https://sebastianspicker.github.io/posts/kuramoto-ensemble-sync/</link>
      <pubDate>Thu, 08 Feb 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/kuramoto-ensemble-sync/</guid>
      <description>Every ensemble faces the same physical problem: N oscillators with slightly different natural frequencies trying to synchronise through a shared coupling channel. The Kuramoto model — developed by a statistical physicist to describe fireflies, neurons, and power grids — applies directly to musicians. It predicts a phase transition between incoherence and synchrony, quantifies why latency destroys networked ensemble performance, and connects to recent EEG studies of inter-brain synchronisation.</description>
      <content:encoded><![CDATA[<p><em>The problem is ancient and the language for it is recent. In any ensemble — a
string quartet, a jazz rhythm section, an orchestra — musicians with slightly
different internal tempos must stay together. They do this by listening to each
other. But what, exactly, does &ldquo;listening to each other&rdquo; do to their timing? And
what happens when the listening channel is imperfect — delayed by the speed of
sound across a wide stage, or by a network cable crossing a continent? The answer
involves a differential equation that was not written to describe music.</em></p>
<p><em>This post extends the latency analysis in <a href="/posts/nmp-latency-lola-mvtp/">Latency in Networked Music
Performance</a> with the dynamical systems framework
that underlies it.</em></p>
<hr>
<h2 id="two-clocks-on-a-board">Two Clocks on a Board</h2>
<p>The first documented observation of coupled-oscillator synchronisation was made
not by a musician but by a physicist. In 1665, Christiaan Huygens, confined to
bed with illness, was watching two pendulum clocks mounted on the same wooden
beam. Over the course of the night, the pendulums had synchronised into
<em>anti-phase</em> oscillation — swinging in opposite directions in exact unison.
He reported it to his father:</p>
<blockquote>
<p>&ldquo;I have noticed a remarkable effect which no-one has observed before&hellip; two
clocks on the same board always end up in mutual synchrony.&rdquo;</p>
</blockquote>
<p>The mechanism was mechanical coupling through the beam. Each pendulum&rsquo;s swing
imparted a small impulse to the wood; the other pendulum felt this as a
perturbation to its rhythm. Small perturbations, accumulated over hours, drove
the clocks into a shared frequency and a fixed phase relationship.</p>
<p>This is the prototype of every ensemble synchronisation problem. Each musician
is a clock. The acoustic environment — the air in the room, the reflected sound
from the walls, the vibrations through the stage floor — is the wooden beam.</p>
<hr>
<h2 id="the-kuramoto-model">The Kuramoto Model</h2>
<p>Yoshiki Kuramoto formalised the mathematics of coupled oscillators in 1975,
motivated by biological synchronisation problems: firefly flashing, circadian
rhythms, cardiac pacemakers. His model considers $N$ oscillators, each with a
phase $\theta_i(t)$ evolving according to:</p>
$$\frac{d\theta_i}{dt} = \omega_i + \frac{K}{N} \sum_{j=1}^{N} \sin(\theta_j - \theta_i), \qquad i = 1, \ldots, N.$$<p>The first term, $\omega_i$, is the oscillator&rsquo;s <em>natural frequency</em> — the tempo it
would maintain in isolation. These are drawn from a distribution $g(\omega)$, which
in a real ensemble reflects the spread of individual preferred tempos among the
players. The second term is the coupling: each oscillator is attracted toward the
phases of all others, with strength $K/N$. The factor $1/N$ keeps the total
coupling intensive (independent of ensemble size) as $N$ grows large.</p>
<p>Musically: $\theta_i$ is the phase of musician $i$&rsquo;s internal pulse at a given
moment, $\omega_i$ is their preferred tempo if playing alone, and $K$ is the
coupling strength — how much they adjust their tempo in response to what they
hear from the others.</p>
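<p>The model is straightforward to explore numerically. A minimal Euler-integration sketch in Python (all parameter values here are illustrative, not drawn from any measured ensemble):</p>

```python
import math
import random

def simulate_kuramoto(n=24, coupling=1.0, steps=2500, dt=0.008, seed=1):
    """Euler-integrate dtheta_i/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i).

    Natural frequencies omega_i are drawn from a standard normal g(omega);
    returns the final coherence r = |(1/N) * sum_j exp(i * theta_j)|.
    """
    rng = random.Random(seed)
    omega = [rng.gauss(0.0, 1.0) for _ in range(n)]             # preferred tempos
    theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    for _ in range(steps):
        drift = []
        for i in range(n):
            pull = sum(math.sin(theta[j] - theta[i]) for j in range(n))
            drift.append(omega[i] + (coupling / n) * pull)
        theta = [(t + dt * d) % (2.0 * math.pi) for t, d in zip(theta, drift)]
    cx = sum(math.cos(t) for t in theta) / n
    sx = sum(math.sin(t) for t in theta) / n
    return math.hypot(cx, sx)                                   # coherence r
```

<p>With weak coupling the phases drift at their own rates and the coherence stays near the finite-size noise floor of roughly $1/\sqrt{N}$; with strong coupling the population locks and the coherence approaches 1.</p>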
<hr>
<h2 id="the-order-parameter-and-the-phase-transition">The Order Parameter and the Phase Transition</h2>
<p>To measure the degree of synchronisation, Kuramoto introduced the complex order
parameter:</p>
$$r(t)\, e^{i\psi(t)} = \frac{1}{N} \sum_{j=1}^{N} e^{i\theta_j(t)},$$<p>where $r(t) \in [0, 1]$ is the <em>coherence</em> of the ensemble and $\psi(t)$ is the
collective mean phase. When $r = 0$, the phases are uniformly spread around the
unit circle — the ensemble is incoherent. When $r = 1$, all phases coincide —
perfect synchrony. In a live ensemble, $r$ is a direct measure of rhythmic
cohesion, though of course not one you can read off a score.</p>
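<p>In code, the order parameter is one line of complex arithmetic (a sketch using Python&rsquo;s standard library):</p>

```python
import cmath

def order_parameter(phases):
    """Return (r, psi) from r * e^{i psi} = (1/N) * sum_j e^{i theta_j}."""
    z = sum(cmath.exp(1j * theta) for theta in phases) / len(phases)
    return abs(z), cmath.phase(z)
```

<p>Eight identical phases give $r = 1$; eight phases spread evenly around the unit circle give $r = 0$.</p>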
<p>Substituting the order parameter into the equation of motion:</p>
$$\frac{d\theta_i}{dt} = \omega_i + K r \sin(\psi - \theta_i).$$<p>Each oscillator now interacts only with the mean-field quantities $r$ and $\psi$,
not with every other oscillator individually. The coupling pulls each musician
toward the collective mean phase with a force proportional to both $K$ (how
attentively they listen) and $r$ (how coherent the group already is).</p>
<p>This mean-field form reveals the essential physics. For small $K$, oscillators
with widely differing $\omega_i$ cannot follow the mean field — they drift at
their own frequencies, and $r \approx 0$. At a critical coupling strength $K_c$,
a macroscopic fraction of oscillators suddenly locks to a shared frequency, and
$r$ begins to grow continuously from zero. For a unimodal,
symmetric frequency distribution $g(\omega)$ with density $g(\bar\omega)$ at the
mean:</p>
$$K_c = \frac{2}{\pi\, g(\bar\omega)}.$$<p>Above $K_c$, the coherence grows as:</p>
$$r \approx \sqrt{\frac{K - K_c}{K_c}}, \qquad K \gtrsim K_c.$$<p>This is a <strong>second-order (continuous) phase transition</strong> — the same
mathematical structure as a ferromagnet at the Curie point, where spontaneous
magnetisation grows continuously from zero as the system crosses the critical point.
The musical ensemble and the magnetic material belong to the same universality
class, governed by the same mean-field exponent $\frac{1}{2}$.</p>
<p>Above $K_c$, the fraction of oscillators that are <em>locked</em> (synchronised to the
mean-field frequency) can be computed explicitly. An oscillator with natural
frequency $\omega_i$ locks to the mean field if $|\omega_i - \bar\omega| \leq
Kr$. For a Lorentzian distribution $g(\omega) = \frac{\gamma/\pi}{(\omega -
\bar\omega)^2 + \gamma^2}$, this yields:</p>
$$r = \sqrt{1 - \frac{K_c}{K}}, \qquad K_c = 2\gamma,$$<p>which is the exact solution of the self-consistency equation for the Kuramoto model with
Lorentzian frequency spread (Strogatz, 2000).</p>
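<p>That closed form can be checked against the standard mean-field self-consistency condition, $r = Kr \int_{-\pi/2}^{\pi/2} \cos^2\theta \; g(Kr\sin\theta)\, d\theta$, by numerical quadrature. A sketch (the midpoint rule and the sample count are arbitrary choices):</p>

```python
import math

def lorentzian(w, gamma):
    """g(omega) centred on zero with half-width gamma."""
    return (gamma / math.pi) / (w * w + gamma * gamma)

def self_consistency_rhs(r, k, gamma, samples=50000):
    """Midpoint-rule value of K r * int_{-pi/2}^{pi/2} cos^2(t) g(K r sin t) dt."""
    h = math.pi / samples
    total = 0.0
    for i in range(samples):
        t = -math.pi / 2.0 + (i + 0.5) * h
        total += math.cos(t) ** 2 * lorentzian(k * r * math.sin(t), gamma)
    return k * r * total * h

gamma = 0.5                                   # so K_c = 2 * gamma = 1.0
for k in (1.5, 2.0, 4.0):
    r_exact = math.sqrt(1.0 - 2.0 * gamma / k)
    # r_exact is a fixed point: the right-hand side returns it unchanged
    assert abs(self_consistency_rhs(r_exact, k, gamma) - r_exact) < 1e-6
```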
<p>The physical reading is direct: whether an ensemble locks into a shared pulse or
drifts apart is a threshold phenomenon. A group of musicians with similar
preferred tempos has a sharply peaked $g(\omega)$, hence a large density
$g(\bar\omega)$ at the mean and a low $K_c$ — they
synchronise easily with minimal attentive listening. A group with widely varying
individual tempos needs stronger, more sustained coupling to cross the threshold.
This is not a matter of musical discipline; it is a material property of the
ensemble.</p>
<hr>
<h2 id="concert-hall-applause-neda-et-al-2000">Concert Hall Applause: Neda et al. (2000)</h2>
<p>The Kuramoto model is not only a theoretical construction. Neda et al. (2000)
applied it to concert hall applause — one of the most direct real-world
demonstrations of coupled-oscillator dynamics in a musical context.</p>
<p>They recorded applause in Romanian and Hungarian theaters and found that audiences
spontaneously alternate between two distinct states. In the <em>incoherent</em> regime,
each audience member claps at their own preferred rate (typically 2–3 Hz). Through
acoustic coupling — each person hears the room-averaged sound and adjusts their
clapping — the audience gradually synchronises to a shared, slower frequency
(around 1.5 Hz): the <em>synchronised</em> regime.</p>
<p>The transitions between the two regimes are quantitatively consistent with the
Kuramoto phase transition: the emergence of synchrony corresponds to $K$ crossing
$K_c$ as people progressively pay more attention to the collective sound.
Furthermore, Neda et al. document a characteristic phenomenon when synchrony
breaks down: individual clapping frequency approximately <em>doubles</em> as audience
members attempt to re-establish coherence. This frequency-doubling — a feature of
nonlinear oscillator systems near instability — is exactly what the delayed
response of coupling near $K_c$ predicts.</p>
<p>The paper is a useful pedagogical artefact: every music student has experienced
concert hall applause, and hearing that it undergoes a physically measurable phase
transition makes the connection between physics and musical experience concrete.</p>
<hr>
<h2 id="latency-and-the-limits-of-networked-ensemble-performance">Latency and the Limits of Networked Ensemble Performance</h2>
<p>In standard acoustic ensemble playing, the coupling delay is the propagation time
for sound to cross the ensemble: at $343\ \text{m/s}$, across a ten-metre stage,
roughly 30 ms. This is why orchestral seating is arranged with attention to who
needs to hear whom first.</p>
<p>In networked music performance (NMP), the coupling delay $\tau$ is much larger:
tens to hundreds of milliseconds depending on geographic distance and network
infrastructure. The Kuramoto model generalises naturally to include this delay:</p>
$$\frac{d\theta_i}{dt} = \omega_i + \frac{K}{N} \sum_{j=1}^{N} \sin\!\bigl(\theta_j(t - \tau) - \theta_i(t)\bigr).$$<p>Each musician hears the others&rsquo; phases as they were $\tau$ seconds ago, not as
they are now.</p>
<p>In a synchronised state where all oscillators share the collective frequency
$\bar\omega$ and phase $\psi(t) = \bar\omega t$, the delayed phase signal is
$\psi(t - \tau) = \bar\omega t - \bar\omega\tau$. The effective coupling
force contains a factor $\cos(\bar\omega\tau)$: the delay introduces a phase
shift that reduces the useful component of the coupling. The critical coupling
with delay is therefore:</p>
$$K_c(\tau) = \frac{K_c(0)}{\cos(\bar\omega \tau)}.$$<p>As $\tau$ increases, $K_c(\tau)$ grows: synchronisation requires progressively
stronger coupling (more attentive adjustment) to compensate for the information
lag. The denominator $\cos(\bar\omega\tau)$ reaches zero when
$\bar\omega\tau = \pi/2$. At this point $K_c(\tau) \to \infty$: no finite coupling
strength can maintain synchrony. The critical delay is:</p>
$$\tau_c = \frac{\pi}{2\bar\omega}.$$<p>For an ensemble performing at 120 BPM, the beat frequency is
$\bar\omega = 2\pi \times 2\ \text{Hz} = 4\pi\ \text{rad/s}$:</p>
$$\tau_c = \frac{\pi}{2 \times 4\pi} = \frac{1}{8}\ \text{s} = 125\ \text{ms}.$$<p>This is a remarkably clean result. The Kuramoto model with delay predicts that
ensemble synchronisation collapses at around 125 ms one-way delay for a standard
performance tempo. The empirical literature on NMP — from LoLa deployments across
European conservatories to controlled latency studies in the lab — consistently
finds that rhythmic coherence degrades noticeably above 50–80 ms and becomes
essentially unworkable above 100–150 ms one-way. The model and the data agree.</p>
<p>The derivation also shows why faster tempos are harder in NMP: $\tau_c \propto
1/\bar\omega$, so doubling the tempo halves the tolerable latency. An ensemble
performing at 240 BPM in a distributed setting faces a theoretical ceiling of
62.5 ms — which rules out transcontinental performance for most repertoire.</p>
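<p>The tempo scaling is simple enough to tabulate in a few lines (an illustrative helper based on the idealised formula above, taking the beat as the reference oscillation):</p>

```python
import math

def critical_delay_ms(bpm):
    """tau_c = pi / (2 * omega_bar), with beat frequency bpm/60 Hz, in milliseconds."""
    omega_bar = 2.0 * math.pi * (bpm / 60.0)   # rad/s
    return 1000.0 * math.pi / (2.0 * omega_bar)

for bpm in (60, 120, 240):
    print(f"{bpm} BPM -> tau_c = {critical_delay_ms(bpm):.1f} ms")
```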
<hr>
<h2 id="brains-in-sync-eeg-hyperscanning">Brains in Sync: EEG Hyperscanning</h2>
<p>The Kuramoto framework has recently been applied at a neural level.
EEG hyperscanning — simultaneous EEG recording from multiple participants during
a shared musical activity — has shown that musicians performing together exhibit
<em>inter-brain synchronisation</em>: coherent cortical oscillations at the frequency of
the music are measurable between players (Lindenberger et al., 2009; Müller et
al., 2013). The phase coupling between brains during joint performance is
significantly higher than during solo performance and higher than for musicians
playing simultaneously but without acoustic coupling.</p>
<p>This suggests that the Kuramoto coupling operates at two levels: the acoustic
(each musician hears the other and adjusts physical timing) and the neural (each
musician&rsquo;s cortical oscillators entrain to the shared musical pulse). The
question of which level is primary — whether neural synchrony causes or follows
from acoustic synchrony — remains open.</p>
<p>A 2023 review by Demos and Palmer argues that pairwise Kuramoto-type coupling is
insufficient to capture full ensemble dynamics. Group-level effects — the
differentiation between leader and follower roles, the emergence of collective
timing that no individual would produce alone — require nonlinear dynamical
frameworks that go beyond mean-field averaging. The model that adequately
describes a string quartet may need to be richer than the one that describes a
population of identical fireflies.</p>
<hr>
<h2 id="what-this-means-for-teaching">What This Means for Teaching</h2>
<p>The Kuramoto model reframes standard rehearsal intuitions in physical terms.</p>
<p><strong>&ldquo;Listen more&rdquo;</strong> translates to &ldquo;increase your effective coupling constant $K$.&rdquo;
A musician who plays without attending to others has set $K \approx 0$ and will
drift freely according to their own $\omega_i$. Listening — actively adjusting
tempo in response to what you hear — is not metaphorical. It is the physical
mechanism of coupling, and its effect is to pull you toward the mean phase $\psi$
with a force $Kr\sin(\psi - \theta_i)$.</p>
<p><strong>&ldquo;Our tempos are too different&rdquo;</strong> is a claim about $g(\bar\omega)$ and therefore
about $K_c$. A group with a wide spread of natural tempos needs more and stronger
listening to synchronise. This is not a moral failing but a parameter; it
suggests that ensemble warm-up time or explicit tempo negotiation before a
performance serves to reduce the spread of natural frequencies before the coupling
has to do all the work.</p>
<p><strong>Latency as a rehearsal experiment</strong> can be made explicit. Artificially delaying
the acoustic return to one musician in an ensemble — via headphone monitoring with
variable delay — allows students to experience directly how the coordination
degrades as $\tau$ increases toward $\tau_c$. They feel the system approaching
the phase transition without the theoretical framework, but the framework makes
the experience interpretable afterward.</p>
<p><strong>The click track</strong> replaces peer-to-peer Kuramoto coupling with an external
forcing term: each musician locks to a shared reference with fixed $\omega$
rather than adjusting dynamically to the group mean. This eliminates the phase
transition but also eliminates the adaptive dynamics — the micro-timing
fluctuations and expressive rubato — that characterise live ensemble playing. It
is a pedagogically important distinction, even if studios routinely make the
pragmatic choice.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Demos, A. P., &amp; Palmer, C.
(2023). Social and nonlinear dynamics unite: Musical group synchrony. <em>Trends
in Cognitive Sciences</em>, 27(11), 1008–1018.
<a href="https://doi.org/10.1016/j.tics.2023.08.005">https://doi.org/10.1016/j.tics.2023.08.005</a></p>
</li>
<li>
<p>Huygens, C. (1665). Letter to his father Constantijn Huygens, 26 February
1665. In <em>Œuvres complètes de Christiaan Huygens</em>, Vol. 5, p. 243. Martinus
Nijhoff, 1893.</p>
</li>
<li>
<p>Kuramoto, Y. (1975). Self-entrainment of a population of coupled non-linear
oscillators. In H. Araki (Ed.), <em>International Symposium on Mathematical
Problems in Theoretical Physics</em> (Lecture Notes in Physics, Vol. 39,
pp. 420–422). Springer.</p>
</li>
<li>
<p>Kuramoto, Y. (1984). <em>Chemical Oscillations, Waves, and Turbulence.</em> Springer.</p>
</li>
<li>
<p>Lindenberger, U., Li, S.-C., Gruber, W., &amp; Müller, V. (2009). Brains swinging
in concert: Cortical phase synchronization while playing guitar.
<em>BMC Neuroscience</em>, 10, 22. <a href="https://doi.org/10.1186/1471-2202-10-22">https://doi.org/10.1186/1471-2202-10-22</a></p>
</li>
<li>
<p>Müller, V., Sänger, J., &amp; Lindenberger, U. (2013). Intra- and inter-brain
synchronization during musical improvisation on the guitar. <em>PLOS ONE</em>, 8(9),
e73852. <a href="https://doi.org/10.1371/journal.pone.0073852">https://doi.org/10.1371/journal.pone.0073852</a></p>
</li>
<li>
<p>Neda, Z., Ravasz, E., Vicsek, T., Brechet, Y., &amp; Barabási, A.-L. (2000).
Physics of the rhythmic applause. <em>Physical Review E</em>, 61(6), 6987–6992.
<a href="https://doi.org/10.1103/PhysRevE.61.6987">https://doi.org/10.1103/PhysRevE.61.6987</a></p>
</li>
<li>
<p>Strogatz, S. H. (2000). From Kuramoto to Crawford: Exploring the onset of
synchronization in populations of coupled oscillators. <em>Physica D: Nonlinear
Phenomena</em>, 143(1–4), 1–20.
<a href="https://doi.org/10.1016/S0167-2789(00)00094-4">https://doi.org/10.1016/S0167-2789(00)00094-4</a></p>
</li>
<li>
<p>Strogatz, S. H. (2003). <em>Sync: How Order Emerges from Chaos in the Universe,
Nature, and Daily Life.</em> Hyperion.</p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-01-14</strong>: Updated the author list for the Demos (2023) <em>Trends in Cognitive Sciences</em> reference to the published two authors (Demos &amp; Palmer). The five names previously listed were from a different Demos paper.</li>
<li><strong>2026-01-14</strong>: Changed &ldquo;period-doubling&rdquo; to &ldquo;frequency-doubling.&rdquo; When the clapping frequency doubles, the period halves; &ldquo;frequency-doubling&rdquo; is the precise term in this context.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>The Impossible Heptagon</title>
      <link>https://sebastianspicker.github.io/posts/tool-impossible-heptagon/</link>
      <pubDate>Mon, 15 Jan 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/tool-impossible-heptagon/</guid>
      <description>Danny Carey calls it sacred geometry. Gauss proved it non-constructible. They are, unexpectedly, describing the same object.</description>
      <content:encoded><![CDATA[<p>Danny Carey — drummer of Tool, one of the most rhythmically inventive musicians in rock — keeps a seven-pointed star on his kit and speaks about it using the language of sacred geometry. The heptagram appears in Tool&rsquo;s visual artwork, in the Thelemic symbolism Carey draws on, in pre-modern cosmological diagrams, and in the decorative traditions of several cultures that had no contact with each other. The claim, loosely stated, is that seven-fold symmetry is privileged: that it reflects something structurally true, that its forms carry significance beyond the aesthetic.</p>
<p>The scientific reflex here is usually impatience. &ldquo;Sacred geometry&rdquo; occupies an uncomfortable cultural space — mathematically dressed, factually thin, reliant on the listener not checking claims too carefully. The golden ratio does not appear everywhere in nature. Most things described as sacred in this tradition are better described as things the speaker found surprising before learning a more precise vocabulary.</p>
<p>But the heptagon is genuinely strange. Not for the reasons usually given. For a different reason — a theorem.</p>
<p><strong>The regular heptagon cannot be constructed with compass and straightedge.</strong></p>
<p>Not &ldquo;it is difficult.&rdquo; Not &ldquo;no one has found a construction yet.&rdquo; The regular seven-sided polygon — all sides equal, all interior angles equal — is <em>provably impossible</em> to construct using an unmarked ruler and compass in finitely many steps. This has been known since 1801.</p>
<h2 id="the-classical-constraint">The Classical Constraint</h2>
<p>Greek geometry restricted its tools deliberately. An unmarked straightedge draws lines through two known points. A compass draws circles centred at a known point with a given radius. No angle trisection. No markings. No graduated instruments. Just these two operations, applied one at a time, finitely many times.</p>
<p>Within this constraint, a great deal is achievable. A perpendicular bisector. An equilateral triangle. A regular pentagon — which requires the golden ratio and takes some work, but is reachable. A regular hexagon (trivially: six equilateral triangles around a centre).</p>
<p>Then: nothing for the heptagon. Greek geometers left no construction. Medieval Islamic mathematicians, who knew the regular polygon problem well, left no construction. Albrecht Dürer, in his 1525 <em>Underweysung der Messung</em>, gave an approximate construction that falls short by a small but nonzero margin. Each generation encountered the same wall.</p>
<p>In 1796, an 18-year-old Gauss proved that the regular 17-gon <em>is</em> constructible — a result so unexpected that he reportedly decided at that moment to become a mathematician rather than a philologist. In his 1801 <em>Disquisitiones Arithmeticae</em> he gave the complete characterisation of which regular polygons are constructible and which are not <a href="#ref-1">[1]</a>. The heptagon was definitively placed among the impossible.</p>
<h2 id="gausss-theorem">Gauss&rsquo;s Theorem</h2>
<p>A regular $n$-gon is constructible with compass and straightedge if and only if $n$ has the form</p>
$$n = 2^k \cdot p_1 \cdot p_2 \cdots p_m$$<p>where $k \geq 0$ and the $p_i$ are distinct <strong>Fermat primes</strong> — primes of the form $2^{2^j} + 1$.</p>
<p>The Fermat primes currently known:</p>
<table>
  <thead>
      <tr>
          <th>$j$</th>
          <th>$F_j = 2^{2^j}+1$</th>
          <th>Prime?</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>0</td>
          <td>3</td>
          <td>✓</td>
      </tr>
      <tr>
          <td>1</td>
          <td>5</td>
          <td>✓</td>
      </tr>
      <tr>
          <td>2</td>
          <td>17</td>
          <td>✓</td>
      </tr>
      <tr>
          <td>3</td>
          <td>257</td>
          <td>✓</td>
      </tr>
      <tr>
          <td>4</td>
          <td>65537</td>
          <td>✓</td>
      </tr>
      <tr>
          <td>5</td>
          <td>4 294 967 297</td>
          <td>✗ (Euler, 1732)</td>
      </tr>
      <tr>
          <td>6</td>
          <td>18 446 744 073 709 551 617</td>
          <td>✗</td>
      </tr>
      <tr>
          <td>⋮</td>
          <td>⋮</td>
          <td>no further Fermat primes known</td>
      </tr>
  </tbody>
</table>
<p>Five Fermat primes are known, all identified by the seventeenth century. Fermat himself conjectured that all numbers of this form are prime; he was wrong from $j = 5$ onward. Whether any further Fermat primes exist remains an open problem.</p>
<p>The constructible regular polygons therefore include the triangle (3), square (4), pentagon (5), hexagon (6), octagon (8), decagon (10), 15-gon, 17-gon, 257-gon, 65537-gon, and products of these with powers of 2. The 65537-gon was actually fully constructed by Johann Gustav Hermes, who spent around ten years on the computation in the 1880s and deposited a manuscript reportedly filling a large trunk at the University of Göttingen, where it remains.</p>
<p>Seven is prime, but $7 \neq 2^{2^j} + 1$ for any $j$ — it is not a Fermat prime. Therefore the regular heptagon is not on the list. It is not constructible.</p>
<h2 id="the-algebra-behind-the-geometry">The Algebra Behind the Geometry</h2>
<p>Why does the structure of Fermat primes determine constructibility? The connection goes through algebra <a href="#ref-2">[2]</a><a href="#ref-3">[3]</a>.</p>
<p>Every compass-and-straightedge construction corresponds to solving a sequence of equations of degree at most 2. Bisecting an angle, finding an intersection of a line and a circle — each step is a quadratic operation. After $k$ such steps, the numbers reachable lie in some field extension of $\mathbb{Q}$ (the rationals) with degree over $\mathbb{Q}$ at most $2^k$. Constructibility therefore requires the degree of the relevant extension to be a power of 2.</p>
<p>To construct a regular $n$-gon, you need to construct the angle $2\pi/n$, which requires constructing $\cos(2\pi/n)$. The question is: over what kind of field extension does $\cos(2\pi/n)$ sit?</p>
<p>For $n = 7$: let $\omega = e^{2\pi i/7}$, a primitive 7th root of unity. The minimal polynomial of $\omega$ over $\mathbb{Q}$ is the 7th cyclotomic polynomial</p>
$$\Phi_7(x) = x^6 + x^5 + x^4 + x^3 + x^2 + x + 1,$$<p>which is irreducible over $\mathbb{Q}$, giving $[\mathbb{Q}(\omega) : \mathbb{Q}] = 6$. Since $\cos(2\pi/7) = (\omega + \omega^{-1})/2$, and since $\omega$ satisfies a degree-2 polynomial over $\mathbb{Q}(\cos 2\pi/7)$, we get</p>
$$[\mathbb{Q}(\cos 2\pi/7) : \mathbb{Q}] = 3.$$<p>Specifically, $c = \cos(2\pi/7)$ is the root of the irreducible cubic</p>
$$8c^3 + 4c^2 - 4c - 1 = 0,$$<p>or equivalently, $\alpha = 2\cos(2\pi/7)$ satisfies</p>
$$\alpha^3 + \alpha^2 - 2\alpha - 1 = 0.$$<p>The three roots of this cubic are $2\cos(2\pi/7)$, $2\cos(4\pi/7)$, and $2\cos(6\pi/7)$. By Vieta&rsquo;s formulas their sum is $-1$ and their product is $1$ — which can be verified directly from the identity $\cos(2\pi/7) + \cos(4\pi/7) + \cos(6\pi/7) = -1/2$.</p>
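<p>Both the cubic and the Vieta identities are easy to confirm numerically (a quick check; the tolerance is an arbitrary choice):</p>

```python
import math

# the three roots 2*cos(2*pi/7), 2*cos(4*pi/7), 2*cos(6*pi/7)
roots = [2.0 * math.cos(2.0 * math.pi * k / 7.0) for k in (1, 2, 3)]

def minimal_cubic(x):
    """The minimal polynomial x^3 + x^2 - 2x - 1 of 2*cos(2*pi/7)."""
    return x ** 3 + x ** 2 - 2.0 * x - 1.0

for x in roots:
    assert abs(minimal_cubic(x)) < 1e-12       # each root satisfies the cubic

assert abs(sum(roots) + 1.0) < 1e-12           # Vieta: sum of roots is -1
assert abs(roots[0] * roots[1] * roots[2] - 1.0) < 1e-12   # product of roots is 1
```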
<p>The degree of the extension is 3. Three is not a power of 2. Therefore $\cos(2\pi/7)$ cannot be reached by any tower of quadratic extensions of $\mathbb{Q}$. Therefore the regular heptagon is not constructible. $\square$</p>
<p>Compare the pentagon: $\cos(2\pi/5) = (\sqrt{5}-1)/4$, satisfying the quadratic $4x^2 + 2x - 1 = 0$. Degree 2 — a power of 2. Constructible.</p>
<p>The 17-gon: the Galois group of $\mathbb{Q}(\zeta_{17})/\mathbb{Q}$ is $(\mathbb{Z}/17\mathbb{Z})^* \cong \mathbb{Z}/16\mathbb{Z}$, order $16 = 2^4$. The extension decomposes into four quadratic steps. This is exactly what Gauss computed at 18.</p>
<p>For 7: $(\mathbb{Z}/7\mathbb{Z})^* \cong \mathbb{Z}/6\mathbb{Z}$, order $6 = 2 \times 3$. The factor of 3 is the obstruction. The Galois group is not a 2-group, so the extension cannot be decomposed into quadratic steps. The heptagon is out of reach.</p>
<h2 id="sacred-precisely">Sacred, Precisely</h2>
<p>The phrase &ldquo;sacred geometry&rdquo; usually does work that &ldquo;elegant mathematics&rdquo; could do more honestly. But the heptagon is a case where something with genuine mathematical content sits underneath the mystical framing.</p>
<p>The Platonic tradition held that certain geometric forms exist as ideals — perfect, unchanging, more real than their physical approximations. The philosopher&rsquo;s claim is that the heptagon exists in a realm beyond its material instantiation. The mathematician&rsquo;s claim is: the heptagon is perfectly well-defined — seven equal sides, seven equal angles — but it cannot be reached from $\mathbb{Q}$ by the operations available to ruler and compass. You can approximate it to any desired precision. You can construct it exactly using origami, which allows angle trisection and is strictly more powerful than compass and straightedge <a href="#ref-4">[4]</a>. But the classical constructive program — the one that reaches the pentagon, the hexagon, the 17-gon, the 65537-gon — cannot reach the heptagon.</p>
<p>There is a precise mathematical sense in which it lies outside the constructible world. Whether that constitutes sacredness is a question for a different kind of argument. But it is not nothing. The Pythagoreans were working without Galois theory; they had an intuition without the theorem. The theorem, when it came, confirmed that intuition about seven while explaining it more clearly than they could.</p>
<p>Carey&rsquo;s intuition — that 7 sits outside the ordinary — is, by this route, formally correct.</p>
<h2 id="what-the-heptagram-is">What the Heptagram Is</h2>
<p>The regular heptagon may be impossible to construct exactly, but the heptagram — the seven-pointed star — is perfectly drawable. Connecting every second vertex of an approximate regular heptagon gives $\{7/2\}$ in Schläfli notation <a href="#ref-5">[5]</a>; connecting every third vertex gives $\{7/3\}$. Both are closed figures. Both appear throughout pre-modern symbolic traditions, which is unsurprising: they are the most intricate star polygons drawable with a single pen stroke before complexity outruns visibility.</p>
<p>They are also generators of rhythmic structure. Because 7 is prime, every star polygon on seven points visits all seven vertices in a single closed traversal — a property that does not hold for six-pointed or eight-pointed stars. This turns out to matter for how drum patterns are built across multiple bars. That connection — from the primality of 7 to the architecture of rhythmic accent cycles — is the subject of the companion post, <a href="/posts/tool-star-polygons-drum-machines/">Star Polygons and Drum Machines</a>.</p>
<p>The broader series on mathematics in Tool&rsquo;s music began with the Fibonacci structure embedded in the time signatures and syllable counts of &ldquo;Lateralus&rdquo; <a href="#ref-6">[6]</a>, and the group-theoretic structure underlying twelve-tone equal temperament provides the same algebraic scaffolding seen here <a href="#ref-7">[7]</a>.</p>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Gauss, C.F. (1801). <em>Disquisitiones Arithmeticae</em>. Leipzig: Fleischer. (§VII.)</p>
<p><span id="ref-2"></span>[2] Stewart, I. (2004). <em>Galois Theory</em> (3rd ed.). CRC Press. Ch. 4.</p>
<p><span id="ref-3"></span>[3] Conway, J.H. &amp; Guy, R.K. (1996). <em>The Book of Numbers</em>. Springer. pp. 190–202.</p>
<p><span id="ref-4"></span>[4] Hull, T. (2011). Solving cubics with creases: The work of Beloch and Lill. <em>The American Mathematical Monthly</em>, 118(4), 307–315. <a href="https://doi.org/10.4169/amer.math.monthly.118.04.307">DOI: 10.4169/amer.math.monthly.118.04.307</a></p>
<p><span id="ref-5"></span>[5] Coxeter, H.S.M. (1973). <em>Regular Polytopes</em> (3rd ed.). Dover. Ch. 2.</p>
<p><span id="ref-6"></span>[6] See <a href="/posts/fibonacci-lateralus/">Fibonacci and Lateralus</a> on this blog.</p>
<p><span id="ref-7"></span>[7] See <a href="/posts/twelve-tet-group-theory-musical-tuning/">Twelve-TET and Group Theory</a> on this blog.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Twelve Is Not an Accident: The Group Theory of Musical Tuning</title>
      <link>https://sebastianspicker.github.io/posts/twelve-tet-group-theory-musical-tuning/</link>
      <pubDate>Fri, 15 Dec 2023 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/twelve-tet-group-theory-musical-tuning/</guid>
      <description>Why does the octave divide into twelve? The answer lies not in aesthetics but in the continued-fraction convergents of log₂(3/2) — and the same group structure that gives Messiaen his modes.</description>
      <content:encoded><![CDATA[<p>Sit down at a piano and count the keys in one octave. Twelve. Seven white, five black, twelve total pitch classes before the pattern repeats. Ask a musician why twelve and they will probably say something about Western tradition, the church modes, or maybe vaguely gesture at the circle of fifths. Ask a musicologist and you might hear about Pythagoras, or the development of equal temperament in the Baroque period, or the well-tempered tuning systems of J. S. Bach. All of that history is real and worth knowing. But none of it explains <em>why</em> the number 12 works, and why every serious attempt at a usable keyboard instrument across widely separated cultures converges on the same cardinality.</p>
<p>The real answer is in number theory. Specifically, it is in the continued fraction expansion of a single irrational number: $\log_2(3/2)$. The number 12 is not a cultural choice. It is the smallest integer that gives a genuinely good rational approximation to that number — subject to the constraint that a human hand can navigate the resulting keyboard. Once you see the argument, the feeling of contingency evaporates completely. Twelve is forced on us.</p>
<p>Along the way, the same mathematical structure — the cyclic group $\mathbb{Z}_{12}$ — explains why Messiaen&rsquo;s modes of limited transposition exist, why the circle of fifths closes exactly, and why certain chord types (augmented triads, diminished seventh chords, the whole-tone scale) have a strange self-similar quality that composers have exploited for centuries. If you want the full treatment of the Messiaen connection, I wrote a dedicated post: <a href="/posts/messiaen-modes-group-theory/">Messiaen, Modes, and the Group Theory of Harmony</a>. Here I want to build the foundations from scratch, starting with the one interval that makes all of this necessary.</p>
<hr>
<h2 id="the-interval-that-started-everything">The interval that started everything</h2>
<p>The perfect fifth has a frequency ratio of exactly 3:2. Play two strings in that ratio and the sound is stable, open, and unmistakably consonant — second only to the octave (2:1) in the hierarchy of simple intervals. The reason is physics: the overtone series of any vibrating string includes the fundamental frequency $f$, then $2f$, $3f$, $4f$, and so on. Two notes a perfect fifth apart share the overtone at $3f$ (for the lower note) and $2f'$ (for the upper note, where $f' = 3f/2$): those are the same frequency, $3f$. Shared overtones mean the two notes reinforce rather than fight each other. This is why the fifth sounds stable: it is literally built into the harmonic structure of physical vibration.</p>
<p>Humans discovered the fifth independently in ancient Greece, China, India, and Mesopotamia. It is not a cultural artifact <a href="#ref-4">[4]</a>. Given that stability, it is natural to ask: can we build a complete pitch system by stacking fifths? Take a starting note, go up a fifth, up another, up another, and keep going. The notes you produce — C, G, D, A, E, B, F♯, … — are acoustically related to the starting point in a simple way, and they sound good together. This is the Pythagorean tuning system, and it underlies the construction of diatonic scales.</p>
<p>But here is the problem. A fifth raises the pitch by a factor of 3/2. An octave raises it by a factor of 2. These are independent: one is a power of 3 and the other a power of 2, and no power of 3/2 will ever equal a power of 2 exactly. In the language of modern mathematics, $\log_2(3/2)$ is irrational — this follows directly from the fundamental theorem of arithmetic, since no product of powers of 2 can equal a product of powers of 3. It is in fact transcendental as well: if $\log_2 3$ were algebraic and irrational, the Gelfond&ndash;Schneider theorem would make $2^{\log_2 3} = 3$ transcendental, a contradiction; so $\log_2 3$, and with it $\log_2(3/2)$, is transcendental. What matters for tuning, though, is the irrationality alone. Stacking pure fifths and stacking octaves are incommensurable operations. The circle of fifths can never close in pure Pythagorean tuning. We will always end up slightly sharp or flat relative to where we started.</p>
<p>This incommensurability is the central problem of musical tuning. Everything else — equal temperament, just intonation, meantone tuning, the Pythagorean comma, the whole apparatus of tuning theory — is a response to it.</p>
<hr>
<h2 id="equal-temperament-and-the-approximation-problem">Equal temperament and the approximation problem</h2>
<p>In an equal temperament with $N$ notes per octave, we divide the octave into $N$ equal logarithmic steps. Each step corresponds to a frequency ratio of $2^{1/N}$. We then ask: how many steps $k$ gives the best approximation to a perfect fifth?</p>
<p>The condition is simply that $2^{k/N}$ should be close to $3/2$, which means $k/N$ should be close to $\log_2(3/2)$. So we need a good rational approximation to</p>
$$\log_2\!\left(\frac{3}{2}\right) = \log_2 3 - 1 \approx 0.584962\ldots$$<p>The classical tool for finding best rational approximations is the continued fraction. Any real number $x$ can be written as</p>
$$x = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}}$$<p>where the $a_i$ are non-negative integers (positive for $i \geq 1$), called the partial quotients. For $\log_2(3/2)$ the expansion is</p>
$$\log_2\!\left(\frac{3}{2}\right) = [0;\, 1,\, 1,\, 2,\, 2,\, 3,\, 1,\, 5,\, 2,\, 23,\, 2,\, \ldots]$$<p>The truncated continued fractions — the convergents — give the sequence of best rational approximations:</p>
$$\frac{0}{1},\quad \frac{1}{1},\quad \frac{1}{2},\quad \frac{3}{5},\quad \frac{7}{12},\quad \frac{24}{41},\quad \frac{31}{53},\quad \frac{179}{306},\quad \ldots$$<p>Each convergent $k/N$ corresponds to a tuning system: the denominator $N$ is the number of equal steps per octave, and the numerator $k$ is the number of steps that best approximates a fifth. So we get: 1-TET (trivial), 2-TET (trivial), 5-TET, 12-TET, 41-TET, 53-TET, 306-TET, and so on <a href="#ref-1">[1]</a>, <a href="#ref-2">[2]</a>.</p>
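<p>The convergent table is easy to reproduce numerically. A sketch using only the standard library (double-precision <code>log2</code> is accurate enough for the first several convergents; the function name is mine):</p>

```python
from fractions import Fraction
from math import log2

def convergents(x, count):
    """First `count` continued-fraction convergents of x > 0, as Fractions."""
    out = []
    h0, k0, h1, k1 = 0, 1, 1, 0           # seeds for the standard recurrence
    for _ in range(count):
        a = int(x)                         # next partial quotient
        h0, k0, h1, k1 = h1, k1, a * h1 + h0, a * k1 + k0
        out.append(Fraction(h1, k1))
        x = 1 / (x - a)                    # safe: log2(3/2) is irrational
    return out

cs = convergents(log2(3 / 2), 8)
assert cs == [Fraction(n, d) for n, d in
              [(0, 1), (1, 1), (1, 2), (3, 5), (7, 12),
               (24, 41), (31, 53), (179, 306)]]
```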
<p>The key property of convergents is that they give <em>uniquely good</em> approximations. No rational number with a smaller denominator comes closer to the true value than a convergent does. So 7/12 is not merely a decent approximation to $\log_2(3/2)$ — it is provably the best approximation with denominator at most 12, and no denominator smaller than 41 does better.</p>
<p>To put numbers on it: in 12-TET, the fifth is $2^{7/12} \approx 1.498307\ldots$, while the true fifth is exactly $1.500000$. The error is about 0.11%, or roughly 2 cents (hundredths of a semitone). In 53-TET, the fifth is $2^{31/53} \approx 1.499941\ldots$, an error of less than 0.004%, about 0.07 cents — essentially indistinguishable from pure. Both 12 and 53 are convergents. Intermediate values like 19-TET or 31-TET are not convergents (they are not best approximations), and their fifths, while sometimes used in experimental or microtonal music, are less accurate relative to their complexity.</p>
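<p>Those error figures take one line each to verify (a sketch; the 2-cent and 0.07-cent values in the text are rounded):</p>

```python
from math import log2

cents = lambda ratio: 1200 * log2(ratio)   # 1200 cents per octave

pure_fifth = cents(3 / 2)                  # ≈ 701.955 cents
err_12 = pure_fifth - 700                  # 12-TET fifth = 7 steps of 100 cents
err_53 = pure_fifth - 1200 * 31 / 53       # 53-TET fifth = 31 of 53 steps

assert abs(err_12 - 1.955) < 0.001         # roughly 2 cents flat
assert 0.06 < err_53 < 0.08                # roughly 0.07 cents flat
```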
<p>Why does this matter? Because a tuning system that approximates the fifth poorly will produce harmonies that beat audibly — the slight mistuning causes the sound to waver in a way that trained ears find uncomfortable in sustained chords. A good fifth approximation is not a luxury; it is the condition for the system to be musically usable in the harmonic practice that most of the world&rsquo;s music assumes.</p>
<hr>
<h2 id="the-pythagorean-comma">The Pythagorean comma</h2>
<p>Before equal temperament became standard (roughly the 18th century in Western Europe), instruments were tuned using pure Pythagorean fifths: exact 3:2 ratios, stacked on top of each other. This gives beautiful, stable individual fifths, but it collects a debt.</p>
<p>After stacking 12 pure fifths, you have climbed in frequency by $(3/2)^{12}$:</p>
$$(3/2)^{12} = \frac{3^{12}}{2^{12}} = \frac{531441}{4096} \approx 129.746\ldots$$<p>Meanwhile, 7 octaves is $2^7 = 128$. The ratio between these is</p>
$$\frac{(3/2)^{12}}{2^7} = \frac{3^{12}}{2^{19}} = \frac{531441}{524288} \approx 1.01364$$<p>This is the Pythagorean comma: roughly 23.46 cents, or about a quarter of a semitone <a href="#ref-4">[4]</a>. In Pythagorean tuning, the circle of fifths never closes. After 12 fifths you arrive at a note that is nominally the same pitch class as the starting point — but sharp by 23.46 cents. That final fifth, the one that &ldquo;should&rdquo; close the circle, sounds badly out of tune. It was historically called the &ldquo;wolf fifth&rdquo; because it howls.</p>
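<p>In cents, the comma and its per-fifth share can be checked directly (a sketch):</p>

```python
from math import log2

# Twelve pure fifths overshoot seven octaves by the Pythagorean comma:
comma = 1200 * log2((3 / 2) ** 12 / 2 ** 7)   # in cents

assert abs(comma - 23.46) < 0.005             # about a quarter semitone
assert abs(comma / 12 - 1.955) < 0.001        # flattening per tempered fifth
```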
<p>Equal temperament solves this by distributing the comma across all 12 fifths. Each fifth is flattened by $23.46/12 \approx 1.955$ cents. The individual fifths are no longer pure, but the error is small enough to be acceptable — and crucially, it is <em>uniform</em>, so every key sounds equally good (or equally impure, depending on your perspective).</p>
<p>The Pythagorean comma being small — about 1.96% of the octave — is precisely why 12-TET works. It is small because 7/12 is an unusually good convergent of $\log_2(3/2)$. The two facts are the same fact. Measured in octaves, the comma is exactly $12\log_2(3/2) - 7$: the error of the approximation $7/12$, multiplied up by 12 fifths&rsquo; worth of accumulation. When the approximation is good, the comma is small, and the distribution is imperceptible. This is why the piano is tuned the way it is.</p>
<hr>
<h2 id="the-group-theory">The group theory</h2>
<p>We are now ready for the algebra. In 12-TET, pitch classes form the set $\{0, 1, 2, \ldots, 11\}$ where we identify 0 with C, 1 with C♯, 2 with D, 3 with D♯, 4 with E, 5 with F, 6 with F♯, 7 with G, 8 with G♯, 9 with A, 10 with A♯, and 11 with B. Addition is modulo 12: after 11 comes 0 again, because after B comes C in the next octave (same pitch class). This is $\mathbb{Z}_{12}$, the integers mod 12, and it is a group under addition <a href="#ref-1">[1]</a>.</p>
<p>Transposition by a semitone is addition of 1. Transposition by a perfect fifth is addition of 7, because the fifth is 7 semitones in 12-TET. Start from C (0) and repeatedly add 7, always reducing modulo 12:</p>
$$0 \to 7 \to 14 \equiv 2 \to 9 \to 16 \equiv 4 \to 11 \to 18 \equiv 6 \to 13 \equiv 1 \to 8 \to 15 \equiv 3 \to 10 \to 17 \equiv 5 \to 12 \equiv 0$$<p>In note names: C, G, D, A, E, B, F♯, C♯, G♯, D♯/E♭, A♯/B♭, F, C. That is the circle of fifths — all 12 pitch classes visited exactly once before returning to the start. The circle of fifths is the orbit of 0 under repeated addition of 7 in $\mathbb{Z}_{12}$.</p>
<p>Why does the orbit visit all 12 elements? Because $\gcd(7, 12) = 1$. This is Bézout&rsquo;s identity at work in cyclic groups: an element $g$ generates $\mathbb{Z}_n$ (i.e., its orbit under repeated addition covers all of $\mathbb{Z}_n$) if and only if $\gcd(g, n) = 1$. The generators of $\mathbb{Z}_{12}$ are exactly the elements coprime to 12: that is $\{1, 5, 7, 11\}$. Musically: transposition by 1 semitone (chromatic scale), by 5 semitones (perfect fourth), by 7 semitones (perfect fifth), or by 11 semitones (major seventh) each generates all 12 pitch classes. Transposition by 2 (a whole tone) does not — it produces only the 6-element whole-tone scale. Transposition by 3 (a minor third) produces only the 4-element diminished seventh chord.</p>
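<p>The orbit computation is a three-line loop (a sketch; the function name is mine):</p>

```python
def orbit(step, n=12):
    """Pitch classes reached from 0 by repeatedly adding `step` mod n."""
    seq, v = [0], step % n
    while v != 0:
        seq.append(v)
        v = (v + step) % n
    return seq

# Transposition by a fifth (7 semitones) visits every pitch class once:
assert orbit(7) == [0, 7, 2, 9, 4, 11, 6, 1, 8, 3, 10, 5]
# The generators of Z_12 are exactly the steps coprime to 12:
assert [s for s in range(1, 12) if len(orbit(s)) == 12] == [1, 5, 7, 11]
# A whole tone (2) yields only the whole-tone scale; a minor third (3)
# only the diminished seventh chord:
assert sorted(orbit(2)) == [0, 2, 4, 6, 8, 10]
assert sorted(orbit(3)) == [0, 3, 6, 9]
```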
<p>This is not a curiosity; it is the algebraic skeleton of tonal music. The circle of fifths closes because 7 and 12 are coprime. That coprimality is guaranteed by the continued fraction structure: the convergent recurrence always delivers numerator and denominator in lowest terms, hence coprime, and 7/12 is such a convergent.</p>
<p>Now consider the subgroups of $\mathbb{Z}_{12}$. By Lagrange&rsquo;s theorem, subgroups of a finite group must have orders dividing the group order. The divisors of 12 are 1, 2, 3, 4, 6, and 12, so these are the only possible subgroup orders. For cyclic groups there is exactly one subgroup of each order dividing $n$, and it is generated by $n/d$ where $d$ is the subgroup order. The full list:</p>
<p>The trivial subgroup of order 1 is just $\{0\}$. The subgroup of order 2 is $\{0, 6\}$, generated by 6 — that is, the tritone axis, the interval of exactly half an octave. The subgroup of order 3 is $\{0, 4, 8\}$, generated by 4 — this is the augmented triad, three notes equally spaced around the octave by major thirds. The subgroup of order 4 is $\{0, 3, 6, 9\}$, generated by 3 — the diminished seventh chord, four notes equally spaced by minor thirds. The subgroup of order 6 is $\{0, 2, 4, 6, 8, 10\}$, generated by 2 — the whole-tone scale. And the full group of order 12 is all of $\mathbb{Z}_{12}$.</p>
<p>Each of these has a musical life. The augmented triad ($\{0, 4, 8\}$) sounds ambiguous because it maps onto itself under transposition by a major third — there are only 4 distinct augmented triads total, not 12. Composers exploit this ambiguity when they want harmonic instability without committing to a direction. The diminished seventh ($\{0, 3, 6, 9\}$) is similarly ambiguous: it has only 3 distinct forms and can resolve to any of several keys, which is why it appears so often at structural pivots in Romantic music. These properties are direct consequences of the subgroup structure of $\mathbb{Z}_{12}$.</p>
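<p>The whole subgroup list can be generated mechanically (a sketch; the function name is mine):</p>

```python
def subgroup(order, n=12):
    """The unique subgroup of Z_n of the given order, generated by n // order."""
    g = n // order
    return sorted((g * i) % n for i in range(order))

assert subgroup(2) == [0, 6]                # tritone axis
assert subgroup(3) == [0, 4, 8]             # augmented triad
assert subgroup(4) == [0, 3, 6, 9]          # diminished seventh chord
assert subgroup(6) == [0, 2, 4, 6, 8, 10]   # whole-tone scale
```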
<hr>
<h2 id="messiaens-modes-as-cosets">Messiaen&rsquo;s modes as cosets</h2>
<p>Olivier Messiaen described his &ldquo;modes of limited transposition&rdquo; in his 1944 treatise <em>Technique de mon langage musical</em>. He identified seven scales — including the whole-tone scale and the octatonic scale — that have the peculiar property of mapping onto themselves under some transposition strictly smaller than an octave. He found them by ear, by introspection, and by exhaustive search at the keyboard. He did not have the group theory. But the group theory makes their existence not merely explainable but <em>inevitable</em>.</p>
<p>Here is the key definition. A scale $S \subseteq \mathbb{Z}_{12}$ is a mode of limited transposition if there exists some $t \in \{1, 2, \ldots, 11\}$ such that $S + t \equiv S \pmod{12}$ (as a set). In other words, transposing the scale by $t$ semitones maps the scale onto itself. The integer $t$ is called a period of the scale.</p>
<p>Now, the set of all periods of $S$ — together with 0 — forms a subgroup of $\mathbb{Z}_{12}$ (it is closed under addition modulo 12, since if both $t_1$ and $t_2$ are periods then so is $t_1 + t_2$). Call this subgroup $H$. The condition for $S$ to be a mode of limited transposition is simply that $H$ is nontrivial — that is, $H \neq \{0\}$.</p>
<p>Moreover, if $H$ is the period subgroup of $S$, then $S$ must be a union of cosets of $H$ in $\mathbb{Z}_{12}$. This follows immediately from the fact that $H$ acts on $S$ by translation and maps $S$ to itself: every element of $S$ belongs to exactly one coset of $H$, and $S$ is a union of whole cosets. The size of $S$ must therefore be a multiple of $|H|$.</p>
<p>The whole-tone scale $\{0, 2, 4, 6, 8, 10\}$ is itself the unique subgroup of order 6 in $\mathbb{Z}_{12}$. Its period subgroup is the whole-tone scale itself. Transposing by any even number (2, 4, 6, 8, or 10) maps it to itself. Transposing by an odd number gives the complementary whole-tone scale $\{1, 3, 5, 7, 9, 11\}$. There are therefore only 2 distinct transpositions of the whole-tone scale, not 12.</p>
<p>The octatonic (diminished) scale $\{0, 1, 3, 4, 6, 7, 9, 10\}$ has period subgroup $\{0, 3, 6, 9\}$ — the subgroup of order 4. It is a union of two cosets: $\{0, 3, 6, 9\}$ itself and $\{1, 4, 7, 10\}$. Transposing by 3 maps it onto itself. There are only 3 distinct transpositions. Messiaen calls this his Mode 2.</p>
<p>The general formula is clean: a mode of limited transposition with period subgroup of order $d$ has exactly $12/d$ distinct transpositions. For the whole-tone scale, $d = 6$ gives $12/6 = 2$ transpositions. For the octatonic scale, $d = 4$ gives $12/4 = 3$ transpositions.</p>
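<p>The period-subgroup test is a direct translation of the definition (a sketch; the scale sets use the pitch-class numbering above, and the function name is mine):</p>

```python
def periods(scale, n=12):
    """All t with (scale + t) mod n == scale as a set; always a subgroup of Z_n."""
    s = frozenset(scale)
    return {t for t in range(n) if frozenset((p + t) % n for p in s) == s}

whole_tone = {0, 2, 4, 6, 8, 10}
octatonic = {0, 1, 3, 4, 6, 7, 9, 10}

assert periods(whole_tone) == {0, 2, 4, 6, 8, 10}   # order 6 → 12/6 = 2 transpositions
assert periods(octatonic) == {0, 3, 6, 9}           # order 4 → 12/4 = 3 transpositions
assert periods({0, 2, 4, 5, 7, 9, 11}) == {0}       # the major scale is not limited
```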
<p>What Messiaen found by ear was the complete classification of subsets of $\mathbb{Z}_{12}$ that are unions of cosets of a nontrivial subgroup <a href="#ref-5">[5]</a>. The group theory makes their existence a theorem rather than a discovery. I find this genuinely beautiful: a composer&rsquo;s intuition about harmonic symmetry turns out to be an exercise in the theory of cosets of cyclic groups. For the full analysis of each of Messiaen&rsquo;s seven modes in these terms, see <a href="/posts/messiaen-modes-group-theory/">Messiaen, Modes, and the Group Theory of Harmony</a>.</p>
<hr>
<h2 id="why-not-53">Why not 53?</h2>
<p>Given that 53-TET approximates the fifth with an error of less than 0.004% — compared to 12-TET&rsquo;s 0.11% — one might ask why we do not simply use 53-TET. The mathematical case is overwhelming. In addition to the nearly perfect fifth, 53-TET gives excellent approximations to the just major third (frequency ratio 5:4) and the just minor third (6:5). It was seriously advocated by the 19th-century theorist Robert Holford Macdowall Bosanquet, who even built a 53-key harmonium to demonstrate it. The Chinese theorist Jing Fang described a 53-note system in the 1st century BC. The Arabic music theorist Al-Farabi considered 53-division scales in the 10th century. Everyone who has ever thought carefully about tuning arrives at 53 eventually.</p>
<p>And yet no 53-TET instrument has ever entered widespread use. The reason is anatomical, not mathematical. A piano with 53 keys per octave spans well over a metre per octave at standard key widths — impossible to play. A guitar with 53 frets per octave has frets spaced roughly 3–4 millimetres apart in the upper register: no human fingertip is narrow enough to press a single fret without touching its neighbours. Even if you could play it, reading 53-TET notation would require an entirely new theoretical and pedagogical apparatus.</p>
<p>The constraint is: we want the largest $N$ such that (a) $N$ is a convergent denominator of $\log_2(3/2)$, so the fifth approximation is genuinely good, and (b) $N$ is small enough to navigate with human hands and readable at a glance. The convergent denominators are 1, 2, 5, 12, 41, 53, 306, &hellip; Of these, 12 is the largest that satisfies condition (b). The next convergent, 41, already strains human dexterity — 41-TET keyboard instruments have been built experimentally but never mass-produced. At 53 the case is closed.</p>
<p>One might argue about where exactly the cutoff is, and reasonable people might draw it at 19 or 31 (which are not convergents but have other virtues). But the point is that 12 is not merely a local optimum found by trial and error. It is the specific value where the continued fraction and human physiology intersect.</p>
<hr>
<h2 id="closing">Closing</h2>
<p>There is something I find genuinely satisfying about this argument. Music feels like the most human of activities — expressive, cultural, steeped in history and tradition. And yet the number 12, which lies at the foundation of so much of the world&rsquo;s music, is not a human choice at all. It is the continued-fraction convergent of an irrational number that was fixed by the physics of vibrating strings long before any human struck a tuning fork.</p>
<p>The circle of fifths closes because $\gcd(7, 12) = 1$: a fact about integers, not about culture. Messiaen&rsquo;s modes exist because $\mathbb{Z}_{12}$ has nontrivial proper subgroups: a fact about cyclic groups, not about 20th-century French aesthetics. The augmented triad sounds ambiguous because it is a coset of the order-3 subgroup of $\mathbb{Z}_{12}$: a fact about quotient groups, not about Romantic harmony conventions.</p>
<p>I came to music theory sideways — through acoustics, then signal processing, then the mathematics of scales. What surprised me, when I finally worked through the continued fraction argument properly, was not that the math existed but that it was so <em>tight</em>. There is essentially no freedom in the answer. Given the constraint that a musical scale should be built around the most consonant interval (after the octave), should form a closed group structure, and should be navigable by a human performer, the answer is 12. Not approximately 12, not 12 as a historical compromise. Exactly 12.</p>
<p>The number is not a tradition. It is a theorem.</p>
<hr>
<p>For more on related themes: the Fibonacci sequence and golden ratio in music appear in <a href="/posts/fibonacci-lateralus/">Fibonacci, Lateralus, and the Golden Ratio</a>. The Euclidean algorithm and rhythmic structure are explored in <a href="/posts/euclidean-rhythms/">Euclidean Rhythms</a> — a sister post to this one in the math-and-music thread. And for the physics of audio sampling rates, where a similar interplay of number theory and practical constraints forces another specific number, see <a href="/posts/why-44100-hz-cd-sampling-rate/">Why 44,100 Hz?</a>.</p>
<hr>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Balzano, G. J. (1980). The group-theoretic description of 12-fold and microtonal pitch systems. <em>Computer Music Journal</em>, 4(4), 66–84.</p>
<p><span id="ref-2"></span>[2] Carey, N., &amp; Clampitt, D. (1989). Aspects of well-formed scales. <em>Music Theory Spectrum</em>, 11(2), 187–206.</p>
<p><span id="ref-3"></span>[3] Milne, A., Sethares, W. A., &amp; Plamondon, J. (2007). Isomorphic controllers and dynamic tuning. <em>Computer Music Journal</em>, 31(4), 15–32.</p>
<p><span id="ref-4"></span>[4] Lloyd, L. S., &amp; Boyle, H. (1978). <em>Intervals, Scales and Temperaments</em>. St. Martin&rsquo;s Press.</p>
<p><span id="ref-5"></span>[5] Douthett, J., &amp; Steinbach, P. (1998). Parsimonious graphs: A study in parsimony, contextual transformations, and modes of limited transposition. <em>Journal of Music Theory</em>, 42(2), 241–263.</p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-11-20</strong>: Updated the spelling of &ldquo;Robert Holford Macdowall Bosanquet&rdquo; (previously rendered as &ldquo;Macdowell&rdquo;).</li>
<li><strong>2025-11-20</strong>: Changed &ldquo;about 1.36% of the octave&rdquo; to &ldquo;about 1.96% of the octave.&rdquo; The 1.36% figure is the frequency ratio above unity (531441/524288 ≈ 1.01364); the logarithmic fraction of the 1200-cent octave is 23.46/1200 ≈ 1.96%.</li>
<li><strong>2025-11-20</strong>: Changed &ldquo;12 octaves&rsquo; worth of accumulation&rdquo; to &ldquo;12 fifths&rsquo; worth of accumulation.&rdquo; The Pythagorean comma accumulates over 12 stacked fifths (which span approximately 7 octaves), not 12 octaves.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Non-Commutative Pre-Schoolers</title>
      <link>https://sebastianspicker.github.io/posts/non-commutative-pre-schoolers/</link>
      <pubDate>Mon, 13 Nov 2023 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/non-commutative-pre-schoolers/</guid>
      <description>The same structural reason a toddler cannot put shoes on before socks is why position and momentum cannot be simultaneously measured. Non-commutativity is not exotic physics — it is the default logic of any ordered world.</description>
      <content:encoded><![CDATA[<h2 id="summary">Summary</h2>
<p>A three-year-old cannot put her shoes on before her socks. Not because she lacks motor skills —
because the operations do not commute.</p>
<p>The same structural constraint, dressed in the language of operators on a Hilbert space, is why
Heisenberg&rsquo;s uncertainty principle holds. This post is about that connection: the accidental
algebra lesson built into getting dressed, and why the physicists of 1925 had to abandon one of
arithmetic&rsquo;s most taken-for-granted assumptions.</p>
<h2 id="getting-dressed-is-a-non-abelian-problem">Getting Dressed Is a Non-Abelian Problem</h2>
<p>Start with the mundane. Your morning routine imposes a strict partial order on operations:
underwear before trousers, socks before shoes, cap before chin-strap if you cycle. Try reversing
any pair and the sequence fails — physically, not just socially. You cannot pull a sock over a shoe.</p>
<p>The operation &ldquo;put on socks&rdquo; followed by &ldquo;put on shoes&rdquo; produces a wearable human; the reverse
produces nothing wearable, and no amount of wishing commutativity into existence will help.</p>
<p>In the language of abstract algebra, two operations \(A\) and \(B\) <em>commute</em> if \(AB = BA\) —
if doing them in either order yields the same result. Everyday life is full of operations that do
not commute: rotate a book 90° around its vertical axis then 90° around its horizontal axis; now
reverse the order. The final orientations differ. Turn right then turn left while driving; left
then right. Different positions.</p>
<p>The intuition is not hard to build. What is surprising is how rarely we note it, and what it costs
us when we finally hit a domain — quantum mechanics — where non-commutativity is not an
inconvenient edge case but the central fact.</p>
<h2 id="piaget-said-seven-toddlers-disagreed">Piaget Said Seven; Toddlers Disagreed</h2>
<p>Jean Piaget argued that children do not acquire <em>operational thinking</em> — the ability to mentally
perform and reverse sequences of actions — until the <em>concrete operational stage</em>, roughly ages
seven to eleven (<a href="#ref-inhelder1958">Inhelder &amp; Piaget, 1958</a>). Before that, he claimed, children
lack the understanding that an operation can be undone or reordered.</p>
<p>Post-Piagetian research pushed back hard. Patricia Bauer and Jean Mandler tested infants aged
sixteen and twenty months on novel, multi-step action sequences (<a href="#ref-bauer1989">Bauer &amp; Mandler, 1989</a>).
For causally structured sequences — where step A physically enables step B — infants reproduced
the correct order after a two-week delay. They were not told the order was important. They had no
language to encode it. They just knew, implicitly, that the operations had a necessary direction.</p>
<p>A 2020 study by Klemfuss and colleagues tested 100 children aged roughly two-and-a-half to five on temporal ordering
questions (<a href="#ref-klemfuss2020">Klemfuss et al., 2020</a>). Children answered &ldquo;what happened first?&rdquo; questions
correctly 82% of the time. The errors that did appear followed an encoding-order bias — children
defaulted to reporting the next event in the sequence as originally experienced, regardless of
what was asked. The ordering knowledge was intact. What
children lack, for Piaget&rsquo;s full seven years, is the <em>formal</em> recursive conception of
reversibility. The <em>procedural</em> knowledge — that some sequences must be done in the right order
and cannot be freely rearranged — is there from the second year of life.</p>
<p>Which means: learning that \(AB \neq BA\) is not learning something exotic. It is articulating
something the nervous system already knows.</p>
<h2 id="the-mathematicians-commutator">The Mathematician&rsquo;s Commutator</h2>
<p>Abstract algebra formalized this intuition in the nineteenth century. A <em>group</em> is <em>abelian</em>
(commutative) if every pair of elements satisfies \(ab = ba\). Integers under addition: abelian.
Rotations in three dimensions: not.</p>
<p>Arthur Cayley&rsquo;s 1858 memoir established matrix algebra as a formal theory
(<a href="#ref-cayley1858">Cayley, 1858</a>). Multiply two \(2 \times 2\) matrices:</p>
$$
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad
B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
$$$$
AB = \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}, \quad
BA = \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix}
$$<p>\(AB \neq BA\). Non-commutativity is not a curiosity; it is the generic condition for matrix
products. Commutativity is the special case, the one that requires justification.</p>
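<p>Cayley&rsquo;s example takes a few lines to check (a sketch with a hand-rolled 2×2 product, to keep it dependency-free):</p>

```python
def matmul(A, B):
    """2x2 matrix product — enough to exhibit AB != BA."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

assert matmul(A, B) == [[2, 1], [4, 3]]
assert matmul(B, A) == [[3, 4], [1, 2]]
assert matmul(A, B) != matmul(B, A)   # the generic condition, not the exception
```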
<p>William Rowan Hamilton had already gone further. On 16 October 1843, walking along the Royal Canal
in Dublin, he discovered the quaternions and carved their multiplication rule into the stone of
Broom Bridge:</p>
$$
i^2 = j^2 = k^2 = ijk = -1
$$<p>From this it follows immediately that \(ij = k\) but \(ji = -k\). Hamilton&rsquo;s four-dimensional
number system — the first algebraic structure beyond the complex numbers — was non-commutative by
construction. He did not apologize for it. He celebrated it.</p>
<p>The group-theoretic machinery behind these commutator relations is the same skeleton that governs
Messiaen&rsquo;s modes of limited transposition, which I traced in <a href="/posts/messiaen-modes-group-theory/">a previous post on group theory and
music</a> — a very different physical domain, but closely related algebraic
machinery.</p>
<h2 id="born-jordan-and-the-physicists-shock">Born, Jordan, and the Physicist&rsquo;s Shock</h2>
<p>Classical mechanics treats position \(x\) and momentum \(p\) as ordinary real numbers. Real
numbers commute: \(xp = px\). The Poisson bracket \(\{x, p\} = 1\) encodes a classical
relationship, but the underlying quantities are scalars, and scalars commute.</p>
<p>In July 1925, Werner Heisenberg published a paper that could not quite bring itself to say what it
was doing (<a href="#ref-heisenberg1925">Heisenberg, 1925</a>). He replaced classical dynamical variables
with arrays of numbers — what we would now call matrices — and found, uncomfortably, that the
resulting quantum condition required order to matter.</p>
<p>While Heisenberg was on vacation, Max Born and Pascual Jordan finished the translation into matrix
language (<a href="#ref-bornjordan1925">Born &amp; Jordan, 1925</a>). They wrote the commutation relation
explicitly, recognized it as the fundamental law, and showed that it reproduced the known quantum
results:</p>
$$
[\hat{x}, \hat{p}] = \hat{x}\hat{p} - \hat{p}\hat{x} = i\hbar
$$<p>Non-commutativity of position and momentum was not a mathematical accident. It was the theory.</p>
<p>The uncertainty principle followed four years later as a <em>theorem</em>, not an additional postulate.
Howard Robertson proved in 1929 that for any two observables \(\hat{A}\) and \(\hat{B}\), the
Cauchy–Schwarz inequality on Hilbert space yields (<a href="#ref-robertson1929">Robertson, 1929</a>):</p>
$$
\Delta A \cdot \Delta B \geq \frac{1}{2} \left| \langle [\hat{A}, \hat{B}] \rangle \right|
$$<p>Substituting \(\hat{A} = \hat{x}\), \(\hat{B} = \hat{p}\), \([\hat{x}, \hat{p}] = i\hbar\):</p>
$$
\Delta x \cdot \Delta p \geq \frac{\hbar}{2}
$$<p>This is the uncertainty principle. It does not say nature is fuzzy or that measurement disturbs
systems in some vague intuitive sense. It says: position and momentum are operators that do not
commute, and the Robertson inequality then constrains their joint variance. Non-commutativity <em>is</em>
the uncertainty principle. Put the shoes on before the socks and the state is not defined.</p>
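<p>The Robertson bound can be verified numerically. A minimal sketch (my illustration, using spin-1/2 operators with \(\hbar = 1\)) checks \(\Delta A \cdot \Delta B \geq \frac{1}{2}|\langle [\hat{A}, \hat{B}] \rangle|\) on a randomly chosen state:</p>

```python
import numpy as np

hbar = 1.0
# Spin-1/2 operators: S_i = sigma_i / 2 (hbar = 1)
Sx = np.array([[0, 1], [1, 0]]) * (hbar / 2)
Sy = np.array([[0, -1j], [1j, 0]]) * (hbar / 2)
Sz = np.array([[1, 0], [0, -1]]) * (hbar / 2)

# A random normalized state
rng = np.random.default_rng(0)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)

def expval(op, psi):
    return (psi.conj() @ op @ psi).real

def stdev(op, psi):
    return np.sqrt(expval(op @ op, psi) - expval(op, psi) ** 2)

lhs = stdev(Sx, psi) * stdev(Sy, psi)
comm = Sx @ Sy - Sy @ Sx          # equals i * hbar * Sz
rhs = 0.5 * abs(psi.conj() @ comm @ psi)

assert np.allclose(comm, 1j * hbar * Sz)
assert lhs >= rhs - 1e-12          # Robertson inequality
```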
<p>The same logic applies to angular momentum. The three components satisfy:</p>
$$
[\hat{L}_x, \hat{L}_y] = i\hbar \hat{L}_z, \quad
[\hat{L}_y, \hat{L}_z] = i\hbar \hat{L}_x, \quad
[\hat{L}_z, \hat{L}_x] = i\hbar \hat{L}_y
$$<p>This is the Lie algebra \(\mathfrak{su}(2)\). You cannot simultaneously determine two components
of angular momentum to arbitrary precision — not because the measurement apparatus is noisy, but
because the operations of measuring them do not commute.</p>
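<p>These relations are representation-independent. A short check (my sketch, using the standard spin-1 matrices with \(\hbar = 1\)) confirms the cyclic commutators and the Casimir \(\hat{L}^2 = \ell(\ell+1)\hbar^2\) for \(\ell = 1\):</p>

```python
import numpy as np

hbar = 1.0
s = 1 / np.sqrt(2)
# Standard spin-1 representation of the angular momentum algebra
Lx = hbar * s * np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=complex)
Ly = hbar * s * np.array([[0, -1j, 0], [1j, 0, -1j], [0, 1j, 0]])
Lz = hbar * np.diag([1, 0, -1]).astype(complex)

def comm(A, B):
    return A @ B - B @ A

# The cyclic su(2) relations hold in every representation
assert np.allclose(comm(Lx, Ly), 1j * hbar * Lz)
assert np.allclose(comm(Ly, Lz), 1j * hbar * Lx)
assert np.allclose(comm(Lz, Lx), 1j * hbar * Ly)

# Casimir: L^2 = l(l+1) hbar^2 with l = 1, commuting with each component
L2 = Lx @ Lx + Ly @ Ly + Lz @ Lz
assert np.allclose(L2, 2 * hbar**2 * np.eye(3))
assert np.allclose(comm(L2, Lz), np.zeros((3, 3)))
```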
<p>The fiber bundle language that underlies these rotation groups also appears, in different physical
dress, in the problem of the falling cat and geometric phases — another case where the order of
rotations has non-trivial physical consequences (<a href="/posts/falling-cat-geometric-phase/">see that post</a>).</p>
<h2 id="connes-and-non-commutative-space">Connes and Non-Commutative Space</h2>
<p>Alain Connes asked what happens if we allow the coordinates of <em>space itself</em> to be
non-commutative. In ordinary geometry, the algebra of coordinate functions on a manifold is
commutative: \(f(x) \cdot g(x) = g(x) \cdot f(x)\). Connes&rsquo; non-commutative geometry replaces
this with a <em>spectral triple</em> \((\mathcal{A}, \mathcal{H}, D)\): an algebra \(\mathcal{A}\) of
operators (possibly non-commutative) acting on a Hilbert space \(\mathcal{H}\), with a
generalized Dirac operator \(D\) encoding the geometry (<a href="#ref-connes1994">Connes, 1994</a>).</p>
<p>The payoff was remarkable. With Ali Chamseddine, Connes showed that if \(\mathcal{A}\) is chosen as the functions on spacetime tensored with a specific finite non-commutative piece — a direct sum of the complex numbers, the quaternions, and a matrix algebra — the spectral action principle reproduces the full Lagrangian of the Standard Model coupled to general relativity from a single geometric principle
(<a href="#ref-chamseddine1996">Chamseddine &amp; Connes, 1996</a>). The Higgs field, the gauge bosons, the
graviton: all from the geometry of a non-commutative space.</p>
<p>Classical geometry is the special case where the coordinate algebra is commutative. Drop that
assumption and you open up a vastly richer landscape. Quantum mechanics lives in that landscape.
Possibly, so does the structure of spacetime at the Planck scale.</p>
<h2 id="the-lesson-pre-schoolers-already-know">The Lesson Pre-Schoolers Already Know</h2>
<p>There is an irony here that I cannot quite leave alone. Students learning linear algebra for the
first time consistently make the same mistake. Anna Sierpinska documented it carefully: they assume
\(AB = BA\) for matrices because they have spent years in arithmetic and scalar algebra where
multiplication commutes (<a href="#ref-sierpinska2000">Sierpinska, 2000</a>). The commutativity of ordinary
multiplication is so deeply internalized that abandoning it feels like breaking a rule.</p>
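<p>The counterexample that cures the habit is two lines long — a pair of shear matrices (my illustration, not Sierpinska&rsquo;s):</p>

```python
import numpy as np

# The smallest counterexample to the assumption AB = BA
A = np.array([[1, 1], [0, 1]])  # shear along one axis
B = np.array([[1, 0], [1, 1]])  # shear along the other

AB = A @ B
BA = B @ A
assert not np.array_equal(AB, BA)
# A @ B = [[2, 1], [1, 1]], while B @ A = [[1, 1], [1, 2]]
```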
<p>But the pre-schooler in the sock-and-shoe scenario never had that problem. Her procedural memory,
documented in infants as young as sixteen months by Bauer and Mandler, encoded the correct
asymmetry directly. The order of operations is the first thing a developing mind learns about
actions in the world, before the arithmetic of school teaches it the convenient fiction that order
is irrelevant.</p>
<p>Arithmetic is the outlier. \(3 + 5 = 5 + 3\) because counting does not depend on where you
start. But putting on clothes, multiplying matrices, rotating rigid bodies, measuring quantum
observables: these operations carry memory of order, and they repay the attention a child already
brings to them before she can name a number.</p>
<p>The universe is non-abelian. We are born knowing it. School briefly convinces us otherwise.
Physics eventually agrees with the pre-schooler.</p>
<h2 id="references">References</h2>
<ul>
<li><span id="ref-inhelder1958"></span>Inhelder, B., &amp; Piaget, J. (1958). <em>The Growth of Logical Thinking from Childhood to Adolescence</em>. Basic Books.</li>
<li><span id="ref-bauer1989"></span>Bauer, P. J., &amp; Mandler, J. M. (1989). One thing follows another: Effects of temporal structure on 1- to 2-year-olds&rsquo; recall of events. <em>Developmental Psychology</em>, 25, 197–206.</li>
<li><span id="ref-klemfuss2020"></span>Klemfuss, J. Z., McWilliams, K., Henderson, H. M., Olaguez, A. P., &amp; Lyon, T. D. (2020). Order of encoding predicts young children&rsquo;s responses to sequencing questions. <em>Cognitive Development</em>, 55, 100927. <a href="https://doi.org/10.1016/j.cogdev.2020.100927">DOI: 10.1016/j.cogdev.2020.100927</a></li>
<li><span id="ref-cayley1858"></span>Cayley, A. (1858). A memoir on the theory of matrices. <em>Philosophical Transactions of the Royal Society of London</em>, 148, 17–37. <a href="https://doi.org/10.1098/rstl.1858.0002">DOI: 10.1098/rstl.1858.0002</a></li>
<li><span id="ref-heisenberg1925"></span>Heisenberg, W. (1925). Über quantentheoretische Umdeutung kinematischer und mechanischer Beziehungen. <em>Zeitschrift für Physik</em>, 33, 879–893.</li>
<li><span id="ref-bornjordan1925"></span>Born, M., &amp; Jordan, P. (1925). Zur Quantenmechanik. <em>Zeitschrift für Physik</em>, 34, 858–888.</li>
<li><span id="ref-robertson1929"></span>Robertson, H. P. (1929). The uncertainty principle. <em>Physical Review</em>, 34, 163–164. <a href="https://doi.org/10.1103/PhysRev.34.163">DOI: 10.1103/PhysRev.34.163</a></li>
<li><span id="ref-connes1994"></span>Connes, A. (1994). <em>Noncommutative Geometry</em>. Academic Press. ISBN 0-12-185860-X.</li>
<li><span id="ref-chamseddine1996"></span>Chamseddine, A. H., &amp; Connes, A. (1996). Universal formula for noncommutative geometry actions: Unification of gravity and the standard model. <em>Physical Review Letters</em>, 77, 4868–4871. <a href="https://doi.org/10.1103/PhysRevLett.77.4868">DOI: 10.1103/PhysRevLett.77.4868</a></li>
<li><span id="ref-sierpinska2000"></span>Sierpinska, A. (2000). On some aspects of students&rsquo; thinking in linear algebra. In J.-L. Dorier (Ed.), <em>On the Teaching of Linear Algebra</em> (pp. 209–246). Kluwer Academic Publishers. <a href="https://doi.org/10.1007/0-306-47224-4_8">DOI: 10.1007/0-306-47224-4_8</a></li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-02-03</strong>: Corrected the age range for the Klemfuss et al. (2020) study from &ldquo;two to four&rdquo; to &ldquo;roughly two-and-a-half to five&rdquo; — the actual participants were aged 30–61 months.</li>
<li><strong>2026-02-03</strong>: Updated the characterisation of Klemfuss et al. (2020) findings to reflect the paper&rsquo;s central result: errors follow an encoding-order bias (children default to the next event in encoding sequence). The paper&rsquo;s title — &ldquo;Order of encoding predicts young children&rsquo;s responses&rdquo; — names the mechanism.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>LK-99: Six Weeks That Showed How Physics Works</title>
      <link>https://sebastianspicker.github.io/posts/lk99-preprint-physics-sociology/</link>
      <pubDate>Mon, 09 Oct 2023 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/lk99-preprint-physics-sociology/</guid>
      <description>On July 22, 2023, a Korean preprint claimed that LK-99 — a copper-doped lead apatite — was a room-temperature, ambient-pressure superconductor. Within six weeks, the claim was definitively falsified. The episode is more interesting for what it revealed about the sociology of science than for the compound itself: how a global community self-corrected at extraordinary speed, and how the media managed to fail at conveying uncertainty despite watching it happen in real time.</description>
      <content:encoded><![CDATA[<h2 id="july-22-2023">July 22, 2023</h2>
<p>On a Saturday morning in late July 2023, two preprints appeared on arXiv. They were submitted by researchers affiliated with the Quantum Energy Research Centre in Seoul — Sukbae Lee, Ji-Hoon Kim, and colleagues — and they claimed something that condensed matter physicists have been chasing for over a century: a material that superconducts at room temperature and ambient pressure.</p>
<p>The compound was called LK-99. It was a copper-doped lead apatite, synthesized from common precursors using a procedure that, on paper, any moderately equipped laboratory could attempt. The claimed critical temperature was above 400 K — well above 293 K, which is room temperature, which is roughly the temperature of a warm afternoon in Seoul in July.</p>
<p>A video circulated almost immediately. A small, grey, irregular piece of LK-99 appeared to be partially levitating — tilting up, one end raised — above a permanent neodymium magnet. In the video it wobbles slightly, like something caught between gravity and an invisible hand.</p>
<p>Physics Twitter — I will use that name; it was still recognizably that in July 2023 — detonated. Within 72 hours, laboratories across the world were racing to synthesize LK-99. Discord servers formed. GitHub repositories appeared with shared synthesis protocols. Preprints from independent groups began accumulating before the original authors had likely had a good night&rsquo;s sleep.</p>
<p>Six weeks later, the claim was dead.</p>
<p>I want to write about what happened in those six weeks, because I think the episode is more interesting as sociology of science than as condensed matter physics. LK-99 turned out to be a modest semiconductor with a ferromagnetic impurity. But the speed and the manner of that determination — the way a globally distributed community of physicists organized itself, shared data in real time, converged on a falsification, and then moved on — that is genuinely remarkable, and worth examining carefully.</p>
<h2 id="why-room-temperature-superconductivity-is-the-grail">Why Room-Temperature Superconductivity Is the Grail</h2>
<p>Let me be precise about why this particular claim generates the response it does.</p>
<p>Superconductivity is the phenomenon in which certain materials, below a critical temperature T<sub>c</sub>, carry electrical current with exactly zero resistance. Not very low resistance — zero. A current established in a superconducting loop will, in principle, continue flowing indefinitely without any driving voltage. This is not a small quantitative improvement over ordinary conductors; it is a qualitatively different regime of physics.</p>
<p>The trouble is that essentially all known superconductors require extreme cooling. Conventional metallic superconductors — the ones Heike Kamerlingh Onnes discovered in mercury in 1911 — become superconducting below about 30 K at best. Reaching those temperatures requires liquid helium cooling, which is expensive, logistically demanding, and entirely impractical for large-scale applications. The discovery of high-temperature cuprate superconductors in 1986 (Bednorz and Müller, Nobel Prize 1987) was genuinely revolutionary: some cuprates superconduct up to about 138 K. But 138 K is still −135°C. It requires liquid nitrogen cooling, which is cheaper than liquid helium but still not something you install in a power grid without substantial infrastructure.</p>
<p>The strongest surviving claims belong to a class of hydrogen-rich compounds under extreme pressure — lanthanum hydride superconducts at roughly 250 K, but only at around 170 GPa. (A 2020 claim of superconductivity at roughly 15°C in carbonaceous sulfur hydride at 267 GPa had already been retracted by 2022; more on that below.) For context, the pressure at the center of the Earth is about 360 GPa. You cannot run a power cable through a diamond anvil cell.</p>
<p>Room-temperature, ambient-pressure superconductivity would be transformative in a way that very few material discoveries are. Electrical grids currently lose somewhere between 5 and 10 percent of all transmitted energy to resistive heating — a staggering quantity of energy, simply dissipated as heat in cables. Zero-resistance transmission would eliminate that loss. Magnetically levitated transport would become feasible without the cryogenic infrastructure that makes current Maglev systems enormously expensive to build and maintain. Compact, affordable MRI machines would become possible. Effects on computing, on energy storage, on medical technology — the list runs long. It would be one of the most consequential material discoveries in the history of technology.</p>
<p>This is why the response to the LK-99 preprints was not hysteria but rather the entirely rational behavior of a community that understood exactly what was at stake if the claim were true.</p>
<h2 id="what-lk-99-was-and-what-it-claimed">What LK-99 Was and What It Claimed</h2>
<p>LK-99 is chemically expressed as Pb₁₀₋ₓCuₓ(PO₄)₆O, where x is approximately 0.9 to 1.1. It is a lead apatite — the same crystal family as the mineral in tooth enamel — with a fraction of the lead atoms replaced by copper.</p>
<p>The proposed mechanism, as sketched in the preprints, involved Cu²⁺ substituting for Pb²⁺. Because copper has a slightly smaller ionic radius than lead, this substitution induces a local structural distortion. The claim was that this distortion produces a flat electronic band at the Fermi level — and flat bands are associated with strong electronic correlations that can, in principle, give rise to unconventional superconductivity. The analogy to twisted bilayer graphene was implicit in the discussion, though the mechanism is quite different and twisted bilayer graphene superconducts only well below 1 K.</p>
<p>Reading the preprints in late July 2023 was, I confess, a slightly uncomfortable experience. The writing was rushed. The two preprints — submitted by different author subsets from the same group — were internally inconsistent in places. The resistance measurements showed a large drop with temperature, but not zero resistance. The synthesis protocol was described in enough detail to be reproducible, which was good, but the characterization was incomplete in ways that mattered.</p>
<p>Red flags were present from the beginning, and many physicists noted them immediately. The levitation video showed a piece of LK-99 that was tilted and wobbling — not the stable, complete expulsion of magnetic flux you would expect from a true Meissner effect. A perfect superconductor placed above a magnet would levitate horizontally and stably. This piece was doing something, but the something was not obviously Meissner levitation.</p>
<p>And yet. The synthesis was simple. The claim was specific and testable. If there was even a small chance it was real, the imperative to check was overwhelming. So labs checked.</p>
<h2 id="the-replication-wave">The Replication Wave</h2>
<p>What happened over the following weeks was, as far as I am aware, unprecedented in condensed matter physics.</p>
<p>Normally, a replication in physics looks like this: a group reads a paper, decides it is interesting enough to attempt, orders precursor materials, synthesizes the compound (which takes weeks to months), characterizes it with appropriate instruments (more weeks), writes up the results, submits them (more weeks), and eventually publishes — often six months to a year after the original claim, sometimes much longer. The feedback cycle is slow by design: slowness is a feature, not a bug, because it allows careful work rather than hasty work.</p>
<p>The LK-99 replication did not look like this.</p>
<p>Within a week, preprints from independent groups — China, India, the United States, Germany — were appearing on arXiv. Discord servers with hundreds of members were organizing synthesis attempts in real time, sharing thermograms, resistance measurements, and microscope images as they came off instruments. Twitter threads tracked emerging results with the urgency of a live event. A GitHub repository maintained by the community accumulated synthesis protocols, shared data files, and links to new preprints as they appeared.</p>
<p>Some groups reported partial levitation. Others reported anomalous resistance drops. Others — starting almost immediately — reported synthesizing the material and finding nothing unusual at all.</p>
<p>The speed of this was extraordinary not because of any particular organizational effort, but because the incentive structure happened to align with the infrastructure that now exists. Preprints made sharing immediate. Social media made results public the moment they existed. The synthesis was simple enough to attempt in any reasonably equipped solid-state chemistry lab. And the motivation — the prize, if it were real — was enormous. You would not need to tell anyone to work on this. You would have to tell people to stop.</p>
<p>By mid-August 2023 — three weeks after the original preprints — the key debunking papers had appeared. By late August, there was no serious scientific debate remaining.</p>
<h2 id="the-mechanism-of-falsification">The Mechanism of Falsification</h2>
<p>The levitating video was explained first, and the explanation is both mundane and instructive.</p>
<p>The LK-99 synthesis produces, as an essentially unavoidable impurity, copper sulfide — Cu₂S. Copper sulfide is interesting in its own right: it undergoes a structural phase transition at roughly 105°C (378 K) from a low-temperature chalcocite form to a high-temperature superionic conductor. This transition is accompanied by a large, sharp drop in electrical resistance — exactly the kind of anomalous feature that, in a sample of mixed composition, might be misidentified as a superconducting transition.</p>
<p>More importantly for the levitation: the LK-99 synthesis products ubiquitously contain ferromagnetic impurity phases. A ferromagnetic material will interact with a permanent magnet. Partial levitation, tilted and unstable, is entirely consistent with a ferromagnetic-diamagnetic competition — not with the Meissner effect.</p>
<p>Several groups published debunking papers in rapid succession. Kumar and colleagues (<a href="#ref-Senapati2023">Kumar et al., 2023</a>) reported the absence of superconductivity in LK-99 samples; other groups synthesized Cu₂S independently, confirmed its resistance anomaly near 380 K, and showed quantitatively that the LK-99 observations were fully consistent with Cu₂S contamination and ferromagnetic impurities. Liu and Meng (<a href="#ref-LiuMeng2023">Liu &amp; Meng, 2023</a>) provided a complementary symmetry analysis explaining why the structural distortion mechanism did not actually predict superconductivity.</p>
<p>Several Chinese groups with high-quality synthesis capabilities — and, frankly, strong motivation to find a positive result — produced very pure LK-99 samples and found what you would expect of a clean lead apatite: a semiconductor with modest diamagnetism. Nothing anomalous. When you removed the Cu₂S impurity, you removed the anomaly.</p>
<p>Daniel Garisto summarized the consensus in a <em>Nature</em> news piece in August 2023 (<a href="#ref-Garisto2023">Garisto, 2023</a>): LK-99 is not a superconductor. The case was closed, with an efficiency that the scientific community should be proud of.</p>
<h2 id="a-useful-contrast-ranga-dias">A Useful Contrast: Ranga Dias</h2>
<p>The LK-99 episode does not exist in isolation. The preceding years had seen other extraordinary claims of room-temperature or near-room-temperature superconductivity, and the most prominent involved Ranga Dias at the University of Rochester.</p>
<p>Dias published two papers in <em>Nature</em> claiming superconductivity at or near room temperature: one in 2020, describing carbonaceous sulfur hydride at roughly 15°C under 267 GPa (<a href="#ref-Snider2020">Snider et al., 2020</a>), and one in 2023, describing nitrogen-doped lutetium hydride under much lower pressure. There was precedent: the earlier Dias and Silvera <em>Science</em> paper on metallic hydrogen (<a href="#ref-DiasSilvera2017">Dias &amp; Silvera, 2017</a>) had received a significant erratum and been widely questioned. Both <em>Nature</em> papers were eventually retracted — the 2020 paper in 2022, the 2023 paper in November 2023 — amid serious and credible allegations of data manipulation. The criticisms included statistical anomalies in background signals, apparent image duplication across different experimental conditions, and raw data that did not match the published figures. Hirsch, who had been following these claims closely, documented many of the irregularities (<a href="#ref-Hirsch2021">Hirsch, 2021</a>).</p>
<p>The contrast with LK-99 is worth sitting with. The Korean team appears to have been guilty of honest overreach: genuine excitement about anomalous observations, insufficient characterization before posting, motivated interpretation of ambiguous data. This happens in science. Extraordinary rewards for being right create extraordinary pressure to believe you are right. The LK-99 researchers may have seen something they genuinely could not explain and convinced themselves it was what they hoped it was.</p>
<p>The Dias case, if the allegations of data manipulation are accurate — and the retractions, and the University of Rochester investigation that followed, suggest they have merit — is something different: not motivated misinterpretation but deliberate fabrication. The scientific outcomes are superficially similar: both sets of claims were false, both caused the community to expend significant effort on falsification, both damaged the credibility of the field. But the causes, and the appropriate institutional and moral responses, differ substantially.</p>
<p>How do you tell them apart in real time? In both cases, you had extraordinary claims that passed initial peer review at prestigious venues. In both cases, independent replication failed. The LK-99 falsification came faster, partly because the synthesis was simpler and partly because the community mobilized more broadly. The Dias case took years, and the data manipulation allegations required access to raw data that the research group was slow to provide.</p>
<p>I do not have a clean answer. The difference in mechanism — honest error versus alleged fraud — is not directly observable from the outside. What you can observe is willingness to share data, consistency of results across different instruments and laboratories, and whether the research group facilitates or obstructs independent verification. On those criteria, the LK-99 group and the Dias group look quite different.</p>
<h2 id="the-sociology-of-what-happened">The Sociology of What Happened</h2>
<p>Let me step back from the physics and say something about what the LK-99 episode reveals about how science actually functions.</p>
<p>The first thing it reveals is that community self-correction works, and now works at extraordinary speed when the incentive is high enough. The coordinated global replication was not organized by any institution, any journal, any funding body. It emerged spontaneously from a community that understood what was at stake and had the tools — preprint servers, social media, Discord, GitHub — to coordinate without central direction. The result was a falsification that, in a previous era, might have taken two to five years, completed in six weeks. That is remarkable.</p>
<p>The second thing it reveals is that the preprint revolution is real and consequential. The LK-99 preprints bypassed traditional peer review entirely. That could be bad — and in principle, a false claim could propagate further and faster without peer review as a gate. In practice, in this case, removing the gate allowed not just the false claim but its falsification to move at the same speed. Peer review, as it is normally practiced, is too slow to respond to a claim like this on a timescale that matters. The community replaced it with something faster: immediate, distributed, adversarial review by people with direct experimental access to the question.</p>
<p>This is not an argument against peer review. It is an argument that peer review in the traditional sense — two or three reviewers reading a manuscript over a few weeks — is not the only form that meaningful scientific scrutiny takes.</p>
<p>The third thing the episode reveals is that social media&rsquo;s role in science communication is deeply ambivalent. Twitter accelerated the spread of both the original claim and the debunking. The community of physicists on Twitter was, on the whole, appropriately skeptical from the first day — I saw many threads on July 22 and 23 that noted the red flags I mentioned above: the tilted levitation, the non-zero resistance, the inconsistencies between the two preprints. But that skepticism was invisible to most science journalists, who were looking at the same videos and preprints and reading the excitement rather than the caveats.</p>
<h2 id="the-media-and-the-calibration-problem">The Media, and the Calibration Problem</h2>
<p>I want to be specific about the media failure, because I think it matters.</p>
<p>The appropriate headline on July 23, 2023 was something like: &ldquo;Korean researchers post preprints claiming room-temperature superconductivity; claim is extraordinary and unverified; replication underway.&rdquo; That headline is accurate. It conveys the genuine excitement — because the claim, if true, would be extraordinary — while conveying the appropriate uncertainty about an unverified preprint from a single group.</p>
<p>The headlines that actually appeared, across outlets that should know better, included &ldquo;Room-temperature superconductor discovered&rdquo; and &ldquo;Scientists may have created the holy grail of energy.&rdquo; These are not accurate. They convey neither the uncertainty nor the specific nature of the claim. They treat a preprint as a discovery.</p>
<p>This is a calibration failure — the same kind of failure I have written about in other contexts. On this blog, I have discussed how LLMs can fail catastrophically when they lack the context to assess whether their confident-sounding output is grounded in anything real (<a href="/posts/car-wash-grounding/">see the car-wash post</a>, and more generally the discussion of context and grounding in <a href="/posts/more-context-not-always-better/">more context is not always better</a>). The mechanism in journalism is different but the structure is the same: confidence that is not appropriately calibrated to evidence.</p>
<p>The Bayesian structure of the situation was, or should have been, clear. The prior probability of a room-temperature, ambient-pressure superconductor being found in any given week is very small — not because room-temperature superconductors are impossible, but because such discoveries do not happen often and many previous claims have failed. Call that prior probability low. Against that prior, what evidence did we have on July 23? A video showing partial, unstable levitation — which, as I noted, is not what Meissner levitation looks like. Two rushed preprints that disagreed with each other in some details. No independent replication. P(levitation video | not a superconductor) was not particularly small, as the Cu₂S explanation would later demonstrate. So the posterior probability that LK-99 was a room-temperature superconductor, given the evidence available on July 23, was not meaningfully higher than the prior — which was low.</p>
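<p>To make the structure concrete, here is the update as arithmetic. Every number below is an assumption chosen for illustration — the post&rsquo;s qualitative point, not a measured probability:</p>

```python
# Illustrative Bayesian update for the evidence available on July 23.
# All numbers are assumptions for the sketch, not measured quantities.
prior = 0.001  # P(genuine room-temperature superconductor) before any evidence

# Likelihoods for the observed (partial, tilted) levitation video:
p_video_given_real = 0.9   # a genuine superconductor would likely show levitation
p_video_given_fake = 0.1   # ferromagnetic impurities can also produce partial lift

# Bayes' theorem: P(real | video)
posterior = (p_video_given_real * prior) / (
    p_video_given_real * prior + p_video_given_fake * (1 - prior)
)
print(f"posterior = {posterior:.4f}")  # 0.0089: still below one percent
```

<p>Even a likelihood ratio of nine in favor of the claim moves a one-in-a-thousand prior to under one percent — which is roughly where the well-calibrated skeptics sat that weekend.</p>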
<p>A well-calibrated science journalist would not have written &ldquo;Room-temperature superconductor discovered.&rdquo; A well-calibrated scientist — and many of them said exactly this — would have written &ldquo;interesting claim, requires replication, maintain high skepticism.&rdquo; The scientific community was, on the whole, well-calibrated. The journalism was not.</p>
<p>This is not a new observation. Science journalists have been criticized for overclaiming since there have been science journalists. But the LK-99 episode is a particularly clean example because the timescale was so short: the calibration failure in the media and the calibration success in the scientific community happened simultaneously, in full public view, and could be compared directly.</p>
<p>I write occasionally about AI systems and their tendency to produce confident outputs that are not grounded in evidence — a form of miscalibration that is particularly dangerous because the confident tone is not a signal of accuracy (<a href="/posts/ai-detectors-systematic-minds/">a theme that runs through recent posts on this blog</a>). The LK-99 episode is a reminder that miscalibration is not unique to neural networks. It is a general failure mode in any system that needs to estimate uncertainty about claims — human, institutional, or artificial. The cure in all cases is the same: track confidence to evidence, update on data, resist the pull of exciting priors.</p>
<h2 id="what-the-scientific-community-actually-did">What the Scientific Community Actually Did</h2>
<p>I want to be careful not to end on a note of pure cynicism about the media and leave the scientific community looking saintly. The community is not saintly.</p>
<p>There were preprints from independent groups that claimed positive results before the falsification was clear — groups that perhaps saw anomalies and wanted to be part of the story. There was social pressure, documented in real time on Twitter, to share exciting results before they were fully analyzed. The Discord servers and GitHub repositories that were genuinely useful for coordination were also, occasionally, vectors for misinformation and premature interpretation.</p>
<p>The community self-corrected. That is the important thing. The noise in the system resolved into a clear answer, in six weeks, through a process that was adversarial in the best scientific sense: many people trying to verify or refute a specific testable claim, sharing data openly, calling out methodological problems in public. The answer that emerged was correct.</p>
<p>I find this genuinely impressive. It is easy to be cynical about institutional science — about publication bias, about the replication crisis in psychology and medicine, about the incentive structures that reward novelty over rigor. The LK-99 episode is a counter-example. It is evidence that, when a question is clear and testable and the stakes are high, the system works. Not perfectly, not without noise, but functionally.</p>
<p>Peer review in the classical sense was absent. Peer review in a broader sense — global, immediate, public, adversarial — worked faster than any journal could have managed, and reached a correct conclusion.</p>
<h2 id="the-next-extraordinary-claim">The Next Extraordinary Claim</h2>
<p>LK-99 is over. The compound will appear in future textbooks, probably in a sidebar about famous failed claims in condensed matter physics, alongside Schön and Dias and others. The researchers who synthesized and characterized it honestly will get some credit for the negative result; the original Korean team will, I imagine, have a difficult few years professionally.</p>
<p>The question I am left with is what happens next time.</p>
<p>Room-temperature superconductivity will, almost certainly, be claimed again. The prize is too large and the search too active. Possibly the claim will be correct — I would not put that probability at zero. More likely it will be another false positive, another Cu₂S lurking in the impurity profile.</p>
<p>Will the media learn from LK-99? I am genuinely uncertain. The incentive structure for science journalism rewards excitement over accuracy, and &ldquo;extraordinary claim requires replication&rdquo; is a less clickable headline than &ldquo;room-temperature superconductor discovered.&rdquo; The journalists who wrote those headlines were not stupid; they were responding rationally to the incentives of their profession.</p>
<p>Will the scientific community respond as effectively? I think so, at least for claims of this kind: testable, synthesis-based, with enough labs in the world capable of attempting replication. The infrastructure — preprints, Discord, shared repositories — exists and is now demonstrated to work. The speed of the LK-99 falsification sets a kind of benchmark.</p>
<p>What the episode showed, in the end, is not that science is infallible or that the system is without problems. It showed that, under the right conditions — a clear empirical question, a distributed community with the tools and motivation to address it, and a culture of open data sharing — science can self-correct at remarkable speed. The failure was in communication, not in the science. That is a meaningful distinction.</p>
<p>Whether the media will have learned anything by the time the next extraordinary claim appears — that, I confess, I doubt.</p>
<h2 id="references">References</h2>
<ul>
<li>
<p><span id="ref-LeeKim2023"></span>Lee, S., Kim, J. H., &amp; Kwon, Y.-W. (2023). The First Room-Temperature Ambient-Pressure Superconductor. <em>arXiv</em>:2307.12008. Retrieved from <a href="https://arxiv.org/abs/2307.12008">https://arxiv.org/abs/2307.12008</a></p>
</li>
<li>
<p><span id="ref-Senapati2023"></span>Kumar, K., Karn, N. K., Kumar, Y., &amp; Awana, V. P. S. (2023). Absence of superconductivity in LK-99 at ambient or high pressure. <em>arXiv</em>:2308.03544. Retrieved from <a href="https://arxiv.org/abs/2308.03544">https://arxiv.org/abs/2308.03544</a></p>
</li>
<li>
<p><span id="ref-LiuMeng2023"></span>Liu, S., &amp; Meng, S. (2023). Symmetry-breaking and the origin of the anomalous properties of LK-99. <em>arXiv</em>:2308.05135. Retrieved from <a href="https://arxiv.org/abs/2308.05135">https://arxiv.org/abs/2308.05135</a></p>
</li>
<li>
<p><span id="ref-Garisto2023"></span>Garisto, D. (2023). LK-99 isn&rsquo;t a superconductor — how science sleuths solved the mystery. <em>Nature</em>, 620, 705–706. <a href="https://doi.org/10.1038/d41586-023-02585-7">DOI: 10.1038/d41586-023-02585-7</a></p>
</li>
<li>
<p><span id="ref-Snider2020"></span>Snider, E., Dasenbrock-Gammon, N., McBride, R., Debessai, M., Vindana, H., Vencatasamy, K., Lawler, K. V., Salamat, A., &amp; Dias, R. P. (2020). Room-temperature superconductivity in a carbonaceous sulfur hydride. <em>Nature</em>, 586, 373–377. <a href="https://doi.org/10.1038/s41586-020-2801-z">DOI: 10.1038/s41586-020-2801-z</a> (Retracted 2022.)</p>
</li>
<li>
<p><span id="ref-DiasSilvera2017"></span>Dias, R. P., &amp; Silvera, I. F. (2017). Observation of the Wigner-Huntington transition to metallic hydrogen. <em>Science</em>, 355, 715–718. <a href="https://doi.org/10.1126/science.aal1579">DOI: 10.1126/science.aal1579</a> (Erratum published 2017; widely questioned.)</p>
</li>
<li>
<p><span id="ref-Hirsch2021"></span>Hirsch, J. E. (2021). Rejoinder to &ldquo;Comment on &lsquo;Absence of magnetic evidence for superconductivity in hydride compounds&rsquo;&rdquo; by Dias and Salamat. <em>Physica C</em>, 590, 1353964. <a href="https://doi.org/10.1016/j.physc.2021.1353964">DOI: 10.1016/j.physc.2021.1353964</a></p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-09-14</strong>: Updated the Cu₂S characterisation: pure Cu₂S is diamagnetic; the ferromagnetism in LK-99 samples comes from impurity phases. Updated the Dias &amp; Silvera 2017 <em>Science</em> paper status: it received an erratum but was not formally retracted (unlike the 2020 and 2023 <em>Nature</em> papers). Updated the Senapati et al. reference to the correct LK-99 debunking literature (the previous arXiv ID resolved to a different paper).</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Zero Angular Momentum: The Falling Cat and the Geometry of Shape Space</title>
      <link>https://sebastianspicker.github.io/posts/falling-cat-geometric-phase/</link>
      <pubDate>Tue, 03 Oct 2023 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/falling-cat-geometric-phase/</guid>
      <description>A cat dropped upside-down rotates 180° and lands on its feet, despite having zero angular momentum throughout. This is not a trick and not a violation of physics. The explanation took physicists from 1894 to 1993 to fully work out, and the answer — a geometric phase arising from the holonomy of a fiber bundle — is the same mathematics that governs the Berry phase in quantum mechanics and the Aharonov-Bohm effect in electrodynamics. We adopted two strays this year. They fall beautifully.</description>
      <content:encoded><![CDATA[<p><em>We adopted two stray cats in 2023. They had been living under a garden shed and
had strong opinions about most things, including the correct height from which to
leap onto a bookshelf and whether landing was optional. They are indoor cats now,
for health reasons — a vet&rsquo;s recommendation they find unconvincing but have largely
accepted. Watching one of them drop
from a windowsill — always feet-first, always orientated correctly, from heights
that would leave me reconsidering my life choices — I found myself thinking about
a problem I had first encountered in a mechanics course and had never fully
resolved to my satisfaction.</em></p>
<p><em>How does a cat rotate with zero angular momentum?</em></p>
<hr>
<h2 id="the-problem">The Problem</h2>
<p>When a cat is dropped from an inverted position — upside-down, held by a
practised experimenter, then released — it rotates approximately 180° and
lands on its feet. The drop takes around 0.3 seconds. The cat begins with
negligible angular momentum (the experimenter can release it with almost no
spin), and there are no external torques during free fall. By conservation of
angular momentum, the total angular momentum of the cat must remain constant
throughout the fall.</p>
<p>The total angular momentum is therefore approximately zero throughout the
fall.</p>
<p>And yet the cat rotates 180°.</p>
<p>This is the falling cat problem. It was first documented quantitatively by
Étienne-Jules Marey in 1894 using chronophotography — among the first
high-speed photography of any biological motion — and it has
occupied physicists, mathematicians, neuroscientists, and roboticists ever
since.</p>
<p>The problem is not exotic. Every cat owner has seen it. What requires
explanation is why our intuitions about angular momentum fail here, and what
replaces them.</p>
<hr>
<h2 id="why-the-obvious-answers-do-not-work">Why the Obvious Answers Do Not Work</h2>
<p>There are two naive explanations for the cat&rsquo;s righting reflex, both wrong.</p>
<p><strong>Explanation 1: The cat uses initial angular momentum.</strong> The experimenter
gives the cat a small spin before releasing it; the cat amplifies this to
achieve the full 180°. This fails because controlled experiments (and Marey&rsquo;s
original photographs) confirm that cats can right themselves even when
released with zero initial spin. Careful experimenters have verified this
explicitly.</p>
<p><strong>Explanation 2: The cat pushes against the air.</strong> A falling cat could, in
principle, use aerodynamic forces to push against the air and generate a
reaction. This fails because the angular impulse from air drag over 0.3
seconds is far too small to account for the observed 180° rotation. Marey&rsquo;s
chronophotographs already showed that the motion begins immediately on
release, before air resistance could contribute meaningfully.</p>
<p>Both explanations appeal to external torques. The correct explanation requires
none.</p>
<hr>
<h2 id="marey-and-the-photographic-evidence">Marey and the Photographic Evidence</h2>
<p>Étienne-Jules Marey published his chronophotographic sequence of a falling
cat in <em>La Nature</em> on 10 November 1894. The images, taken at 60 frames per
second, show the following clearly:</p>
<ol>
<li>The front and rear halves of the cat move <em>asymmetrically</em>. The front half
rotates in one direction; the rear half rotates by a smaller angle in the
opposite direction.</li>
<li>The cat pulls its front legs in close to its body (reducing the moment of
inertia of the front half) while extending its rear legs (increasing the
moment of inertia of the rear half).</li>
<li>The front half then rotates rapidly (large angle, small moment of inertia);
the rear half rotates slowly in the opposite direction (small angle, large
moment of inertia).</li>
<li>The cat then extends its front legs and pulls in its rear legs, and reverses
the process.</li>
</ol>
<p>The net effect: the cat&rsquo;s body orientation rotates by 180° even though the
<em>total</em> angular momentum — computed as the sum of both halves — remains
constant. The key word is <em>sum</em>. Individual parts can exchange angular momentum
through internal torques; the sum is conserved.</p>
<p>This mechanism — internal redistribution of angular momentum without changing
its total — is correct but not complete. It explains <em>that</em> rotation is
possible, not <em>how much</em> rotation is achieved per cycle of shape change. For
that, we need the mathematics.</p>
<hr>
<h2 id="kane-and-scher-the-two-cylinder-model">Kane and Scher: The Two-Cylinder Model</h2>
<p>The first rigorous mechanical model was published by T.R. Kane and M.P. Scher
in 1969 (<em>International Journal of Solids and Structures</em> 5, 663–670).</p>
<p>They modelled the cat as two rigid axisymmetric cylinders — a front half and
a rear half — connected at a joint that allows relative bending and twisting.
The joint constraint imposes that the relative twist between the two halves is
zero (a &ldquo;no-twist&rdquo; condition: the cylinders cannot spin relative to each other
at their connection). The total angular momentum of the system is held fixed
at zero.</p>
<p>Let the two cylinders have moments of inertia $I_1$ and $I_2$ about their
symmetry axes, and let $\phi$ be the bend angle between them and $\psi$ the
twist angle. The zero-angular-momentum constraint, combined with the no-twist
condition, gives a system of equations that can be integrated numerically to
find the net body rotation as a function of the shape-change trajectory
$(\phi(t), \psi(t))$.</p>
<p>Kane and Scher showed that a specific sequence of shape changes — one complete
cycle in the $(\phi, \psi)$ plane — produces a net rotation of approximately
90–100°. A second cycle gives the rest. The calculation was the first to
confirm, from mechanics alone, that the righting manoeuvre requires no external
torques and is entirely consistent with conservation of angular momentum.</p>
<p>What the Kane–Scher model does not explain is <em>why</em> the net rotation per cycle
depends on the area enclosed by the trajectory in shape space — or why the
same mathematical structure appears in quantum mechanics. For that, we need
Montgomery&rsquo;s formulation.</p>
<hr>
<h2 id="montgomery-fiber-bundles-and-geometric-holonomy">Montgomery: Fiber Bundles and Geometric Holonomy</h2>
<p>In 1993, Richard Montgomery published a reformulation of the falling cat problem
using gauge theory (<em>Dynamics and Control of Mechanical Systems</em>, Fields
Institute Communications, AMS, pp. 193–218). The reformulation is the
definitive mathematical treatment, and it connects the cat to one of the deepest
structures in modern physics.</p>
<h3 id="the-configuration-space">The Configuration Space</h3>
<p>The full configuration space of the cat — the space of all possible positions
and orientations — is</p>
$$Q = SO(3) \times \mathcal{S},$$<p>where $SO(3)$ is the rotation group (describing the cat&rsquo;s overall orientation
in space) and $\mathcal{S}$ is the <em>shape space</em> (describing the internal
geometry: the bend angle, the twist, the position of each limb relative to the
body).</p>
<p>The angular momentum constraint $\mathbf{L} = 0$ defines a <em>horizontal
distribution</em> on $Q$ — a preferred subspace of tangent vectors at each point
that correspond to shape changes at zero angular momentum. This distribution is
not integrable (it does not come from a foliation), which is the mathematical
signature that holonomy is possible.</p>
<h3 id="the-fiber-bundle">The Fiber Bundle</h3>
<p>The projection</p>
$$\pi \colon Q \to \mathcal{S}, \qquad (R, s) \mapsto s,$$<p>makes $Q$ into a principal fiber bundle over $\mathcal{S}$ with structure group
$SO(3)$. The fiber above each shape $s \in \mathcal{S}$ is the set of all
orientations the cat can have with that shape.</p>
<p>A <em>connection</em> on this bundle is a rule for &ldquo;lifting&rdquo; paths in the base
$\mathcal{S}$ to horizontal paths in the total space $Q$ — that is, paths
along which the angular momentum constraint is satisfied. This connection
$\mathcal{A}$ is a one-form on $\mathcal{S}$ taking values in the Lie algebra
$\mathfrak{so}(3)$.</p>
<h3 id="holonomy-the-geometric-phase">Holonomy: The Geometric Phase</h3>
<p>When the cat executes a closed loop $\gamma$ in shape space — a sequence of
shape changes that returns it to its initial shape — the <em>holonomy</em> of the
connection $\mathcal{A}$ around $\gamma$ gives the net rotation:</p>
$$R_\gamma = \mathrm{Hol}_\mathcal{A}(\gamma) \in SO(3).$$<p>For the full non-Abelian case ($SO(3)$), the holonomy is a path-ordered
exponential along $\gamma$ and its relationship to the curvature involves
non-Abelian corrections. But the essential geometric intuition is captured
by the Abelian case — rotation about a single axis — where Stokes&rsquo;s theorem
gives the net rotation directly:</p>
$$\theta_\gamma = \iint_{\Sigma} F,$$<p>where $\Sigma$ is a surface bounded by $\gamma$ and $F = d\mathcal{A}$
is the curvature 2-form. The cat&rsquo;s net rotation per cycle is the integral
of the curvature over the area enclosed by its shape-change loop in
$\mathcal{S}$. For small loops, the curvature
$F_\mathcal{A} = d\mathcal{A} + \mathcal{A} \wedge \mathcal{A}$ determines the
holonomy to leading order in both the Abelian and non-Abelian cases.</p>
<p>The rotation is <em>geometric</em>: it depends on the shape of the loop, not on the
speed at which the loop is traversed. A cat executing the same shape-change
sequence twice as fast achieves the same rotation in half the time.</p>
<hr>
<h2 id="the-connection-to-berry-phase">The Connection to Berry Phase</h2>
<p>The gauge structure of the falling cat problem is not an isolated curiosity.
It is the same mathematical structure that governs several central phenomena
in modern physics.</p>
<p><strong>The Berry phase</strong> (Berry 1984, <em>Proceedings of the Royal Society A</em>) arises
when a quantum system is transported adiabatically around a closed loop $C$ in
parameter space. The state acquires a phase</p>
$$\gamma_B = \oint_C \mathbf{A} \cdot d\mathbf{R},$$<p>where $\mathbf{A} = i\langle n(\mathbf{R}) | \nabla_\mathbf{R} | n(\mathbf{R}) \rangle$
is the Berry connection — a gauge field on parameter space. The Berry phase is
the holonomy of this connection, which is to say: the cat righting itself and
a quantum state accumulating a geometric phase are instances of the <em>same
mathematical theorem</em>.</p>
<p>Shapere and Wilczek (1989) made this connection explicit for deformable bodies,
noting that the net rotation of a swimming microorganism or a falling cat is
the holonomy of a gauge connection on shape space — exactly the Berry phase,
expressed in the language of classical mechanics.</p>
<p><strong>The Foucault pendulum</strong> precesses at a rate of $2\pi\sin\phi$ per sidereal
day, where $\phi$ is the latitude. The holonomy of the Levi-Civita connection
on $S^2$ for parallel transport around the circle of latitude is the solid
angle of the enclosed polar cap, $\Omega = 2\pi(1 - \sin\phi)$. The
lab-frame precession $2\pi\sin\phi = 2\pi - \Omega$ is the complementary
angle — the two sum to a full rotation because the local frame itself
completes one circuit per sidereal day. It is another geometric phase.</p>
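<p>(A sketch, not from any of the cited papers: the holonomy statement can be checked numerically by parallel-transporting a tangent vector around the circle of latitude, stepping along the circle and re-projecting onto the local tangent plane at each step. The step count and the projection scheme are my own choices.)</p>

```python
import numpy as np

def frame(theta, phi):
    """Local east/north tangent frame at longitude theta, latitude phi."""
    east = np.array([-np.sin(theta), np.cos(theta), 0.0])
    north = np.array([-np.sin(phi) * np.cos(theta),
                      -np.sin(phi) * np.sin(theta),
                      np.cos(phi)])
    return east, north

def holonomy_angle(phi, steps=20_000):
    """Parallel-transport a vector once around the latitude circle and
    return its net rotation relative to the local east/north frame."""
    v, _ = frame(0.0, phi)                          # start pointing east
    for theta in np.linspace(0.0, 2.0 * np.pi, steps + 1)[1:]:
        p = np.array([np.cos(phi) * np.cos(theta),  # unit normal at new point
                      np.cos(phi) * np.sin(theta),
                      np.sin(phi)])
        v -= np.dot(v, p) * p                       # project onto tangent plane
        v /= np.linalg.norm(v)                      # renormalize
    east, north = frame(0.0, phi)                   # end frame = start frame
    return np.arctan2(np.dot(v, north), np.dot(v, east))

phi = np.radians(45.0)
angle = holonomy_angle(phi)
deficit = 2.0 * np.pi * (1.0 - np.sin(phi))         # solid angle of polar cap
print(abs(angle), deficit)
```

<p>At 45° latitude both numbers come out near 1.84&nbsp;rad — the solid angle of the enclosed polar cap.</p>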
<p><strong>The Aharonov-Bohm effect</strong> (1959) produces a phase shift for electrons
circling a solenoid, even when the electrons travel only through field-free
regions. The phase is the holonomy of the electromagnetic vector potential
$\mathbf{A}$ around the loop — a Berry phase for the electromagnetic field.</p>
<p>All four phenomena — the falling cat, the Berry phase, the Foucault pendulum,
the Aharonov-Bohm effect — are manifestations of the same structure: a
connection on a fiber bundle, and holonomy as the geometric consequence of
traversing a closed loop.</p>
<p>Batterman (2003, <em>Studies in History and Philosophy of Modern Physics</em> 34,
527–557) gives a particularly clear account of this unification, drawing out
the common mathematical skeleton and its physical implications.</p>
<hr>
<h2 id="high-rise-syndrome-terminal-velocity-and-the-parachute-cat">High-Rise Syndrome: Terminal Velocity and the Parachute Cat</h2>
<p>There is a grounding empirical footnote to the elegant geometry above. Whitney
and Mehlhaff (1987, <em>Journal of the American Veterinary Medical Association</em>
191, 1399–1403) analysed 132 cats brought to a Manhattan veterinary clinic after
falling from buildings of two to thirty-two stories. Their finding was
counterintuitive:</p>
<p>Cats falling from above seven stories had a <em>lower</em> injury rate than cats
falling two to six stories. Overall, 90% of the cats in the study survived,
with injuries paradoxically less severe at greater heights.</p>
<p>The explanation involves two phases. Below seven stories, the cat is still
accelerating: it is tense, its legs are extended to brace for impact, and it
absorbs the force of landing poorly. Above seven stories, the cat reaches
terminal velocity — approximately $100\,\mathrm{km/h}$ for a falling cat — and
then, apparently, <em>relaxes</em>. The vestibular system, having identified that the
fall is not ending imminently, switches from the righting reflex to a
parachute posture: legs spread horizontally, body flattened, increasing the
cross-sectional area and hence air resistance.</p>
<p>Terminal velocity is reached when the drag force equals the gravitational force:</p>
$$mg = \frac{1}{2} C_D \rho A v_t^2, \qquad
v_t = \sqrt{\frac{2mg}{C_D \rho A}}.$$<p>For a spread-eagle cat ($m \approx 4\,\mathrm{kg}$, $A \approx 0.06\,\mathrm{m}^2$,
$C_D \approx 1.0$, $\rho_\mathrm{air} \approx 1.2\,\mathrm{kg/m}^3$):</p>
$$v_t \approx \sqrt{\frac{2 \times 4 \times 9.8}{1.0 \times 1.2 \times 0.06}}
\approx 33\,\mathrm{m/s} \approx 120\,\mathrm{km/h}.$$<p>(The exact value depends on posture and fur drag; empirical estimates for
cats in the parachute posture are lower, roughly $25$–$30\,\mathrm{m/s}$,
because the effective area increases when the limbs are spread.)</p>
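<p>(For completeness, the arithmetic as a three-line function. The cat figures are the ones above; the human mass and area are my own rough guesses, not from any study.)</p>

```python
import math

def terminal_velocity(m, A, Cd=1.0, rho=1.2, g=9.8):
    """Speed at which drag balances weight: m*g = 0.5*Cd*rho*A*v**2."""
    return math.sqrt(2.0 * m * g / (Cd * rho * A))

v_cat = terminal_velocity(m=4.0, A=0.06)     # ~33 m/s, ~120 km/h
v_human = terminal_velocity(m=80.0, A=0.5)   # ~51 m/s with these guesses
print(f"cat: {v_cat:.1f} m/s   human: {v_human:.1f} m/s")
```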
<p>A human in free-fall has terminal velocity around $55\,\mathrm{m/s}$
($200\,\mathrm{km/h}$) — faster, because the mass-to-area ratio is higher.
The cat, with its low mass and high drag relative to body weight, hits a
gentler terminal velocity and distributes the impact more effectively.</p>
<p>The study is sometimes cited as evidence that cats are invincible. A significant
caveat is <strong>survivorship bias</strong>: cats that died on impact were likely not brought
to the veterinary clinic, so the dataset underrepresents fatal outcomes,
especially for higher falls. The apparent decrease in injury rate above seven
stories may partly reflect the fact that the most severely injured cats from
those heights never entered the study. The aerodynamic posture explanation is
plausible, but the data do not cleanly separate it from the sampling bias.</p>
<hr>
<h2 id="robotics-and-spacecraft">Robotics and Spacecraft</h2>
<p>The falling cat problem has practical applications beyond veterinary statistics.</p>
<p><strong>Spacecraft attitude control</strong>: Astronauts in free fall can change their
body orientation without thrusters, using the same gauge-theoretic mechanism
as the cat. NASA and ESA have studied cat-inspired reorientation manoeuvres
for astronauts and satellites.</p>
<p><strong>Robotics</strong>: The two-cylinder model inspired early robot designs capable of
reorienting in free fall — useful for robots deployed from aircraft or
spacecraft. Subsequent work (including a 2022 review in <em>IEEE Transactions on
Robotics</em>) has produced legged robots that can right themselves after being
knocked over using shape-change sequences derived from the Montgomery connection.</p>
<p><strong>Gymnastics and diving</strong>: Human athletes performing somersaults and twists
exploit the same gauge structure, though without articulating the mathematics.
A tuck increases rotation rate (smaller $I$, constant $L$ → larger $\omega$);
a layout decreases it. Changing the tuck–layout timing mid-rotation produces
a net twist — holonomy in the shape space of a human body.</p>
<hr>
<h2 id="the-view-from-a-windowsill">The View from a Windowsill</h2>
<p>My cats have no opinion about fiber bundles. When one of them drops from the
top of the bookcase, she is not solving the variational problem</p>
$$\min_{\gamma \in \Omega} \int_\gamma |\dot{s}|^2 \, dt,
\quad \text{subject to } \mathrm{Hol}_\mathcal{A}(\gamma) = R_{180°},$$<p>she is executing a motor program refined over millions of years of feline
evolution. The vestibular system provides continuous feedback on body
orientation; the cerebellum coordinates the shape-change sequence; the whole
manoeuvre is over in a third of a second.</p>
<p>What physics tells us is that the manoeuvre is <em>possible</em> — that no law of
nature forbids a body with zero angular momentum from reorienting — and gives
the precise geometric reason: the curvature of a connection on shape space is
non-zero, which means the holonomy of closed loops is non-trivial.</p>
<p>The same curvature that allows a cat to right itself allows a quantum state to
accumulate a geometric phase, allows the Foucault pendulum to precess, and
allows the Aharonov-Bohm effect to shift an interference fringe without a local
field. These are not analogies. They are the same theorem, applied to different
physical systems in different mathematical languages.</p>
<p>I find this more remarkable than the cat.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Batterman, R.W. (2003). Falling cats, parallel parking, and polarized light.
<em>Studies in History and Philosophy of Modern Physics</em>, 34(4), 527–557.
<a href="https://doi.org/10.1016/S1355-2198(03)00062-5">https://doi.org/10.1016/S1355-2198(03)00062-5</a></p>
</li>
<li>
<p>Berry, M.V. (1984). Quantal phase factors accompanying adiabatic changes.
<em>Proceedings of the Royal Society A</em>, 392, 45–57.
<a href="https://doi.org/10.1098/rspa.1984.0023">https://doi.org/10.1098/rspa.1984.0023</a></p>
</li>
<li>
<p>Gbur, G.J. (2019). <em>Falling Felines and Fundamental Physics.</em> Yale University
Press.</p>
</li>
<li>
<p>Kane, T.R., &amp; Scher, M.P. (1969). A dynamical explanation of the falling cat
phenomenon. <em>International Journal of Solids and Structures</em>, 5(7), 663–670.
<a href="https://doi.org/10.1016/0020-7683(69)90086-9">https://doi.org/10.1016/0020-7683(69)90086-9</a></p>
</li>
<li>
<p>Marey, É.-J. (1894). Des mouvements que certains animaux exécutent pour
retomber sur leurs pieds lorsqu&rsquo;ils sont précipités d&rsquo;un lieu élevé. <em>La
Nature</em>, 10 November 1894.</p>
</li>
<li>
<p>Montgomery, R. (1993). Gauge theory of the falling cat. In M. Enos (Ed.),
<em>Dynamics and Control of Mechanical Systems</em> (Fields Institute Communications,
Vol. 1, pp. 193–218). American Mathematical Society.</p>
</li>
<li>
<p>Shapere, A., &amp; Wilczek, F. (Eds.). (1989). <em>Geometric Phases in Physics.</em>
World Scientific.</p>
</li>
<li>
<p>Whitney, W.O., &amp; Mehlhaff, C.J. (1987). High-rise syndrome in cats. <em>Journal
of the American Veterinary Medical Association</em>, 191(11), 1399–1403.</p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-12-15</strong>: Corrected the Marey publication date from 22 November 1894 to 10 November 1894 (in text and in reference). Updated the Whitney &amp; Mehlhaff (1987) statistics to reflect that the 90% survival rate applies to all cats in the study, as reported in the paper, rather than specifically to those falling from above seven stories.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Can a Planet Have a Moon? Teaching Exomoon Detection with a Disco Ball Motor</title>
      <link>https://sebastianspicker.github.io/posts/exomoon-analogy-experiment/</link>
      <pubDate>Thu, 14 Sep 2023 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/exomoon-analogy-experiment/</guid>
      <description>Every classroom treatment of exoplanet detection focuses on the transit method. What gets omitted is that moons of exoplanets could also host life — and that with a small motor and a slight modification to the standard transit experiment, you can show students what an exomoon signature looks like in a light curve. Published in MNU Journal in 2023.</description>
      <content:encoded><![CDATA[<p><em>This post describes the paper &ldquo;Ein Analogieexperiment zur Suche nach Exomonden&rdquo;
(An Analogy Experiment for the Search for Exomoons), published in MNU Journal
in 2023 together with Alexander Küpper.</em></p>
<hr>
<h2 id="the-gap-in-the-curriculum">The Gap in the Curriculum</h2>
<p>Most physics and astronomy teaching units that address the search for
extraterrestrial life focus on exoplanets. The transit method gets
visualised, a light curve gets plotted, and the lesson ends with: some
exoplanets are in the habitable zone. The end.</p>
<p>What tends to get omitted: moons of exoplanets — exomoons — could equally
be candidates for extraterrestrial life, particularly if the exoplanet
itself sits in the habitable zone. The moon would then be in the habitable
zone too, and a large moon could maintain the atmospheric conditions necessary
for liquid water. The possibility is taken seriously in the astrophysics
community, and survey data consistently shows that students find the question
of life in the universe among the most interesting topics in all of science.</p>
<p>The pedagogical gap is this: the transit method is routinely demonstrated
in analogy experiments, but the extension to exomoon detection is almost
never treated experimentally, even though it is a natural continuation of
the same experiment with only minor modifications. This paper is an attempt
to close that gap.</p>
<hr>
<h2 id="what-an-exomoon-signal-looks-like">What an Exomoon Signal Looks Like</h2>
<p>When only a planet transits a star, the resulting light curve shows a
characteristic symmetric dip: flux drops as the planet moves in front of
the star, holds at a reduced level during full transit, and recovers as
the planet exits. The normalised flux during the flat-bottomed phase is:</p>
$$I(t) = \frac{A_s - A_p}{A_s} = 1 - \frac{A_p}{A_s}$$<p>where the dip depth $\delta = A_p / A_s$ is determined by the ratio of the
planet&rsquo;s cross-sectional area to the star&rsquo;s.</p>
<p>When the planet has a moon, the situation is more complex. The light curve
is now governed by:</p>
$$I(t) = \frac{A_s - (A_p + A_m - A_{pm}(t))}{A_s}$$<p>where $A_m$ is the moon&rsquo;s cross-sectional area and $A_{pm}(t)$ is the
time-dependent overlap between the planet&rsquo;s and moon&rsquo;s projected disks
(the moon is orbiting the planet, so this overlap changes during the
transit).</p>
<p>The consequence: additional dips and asymmetries appear in the light curve.
The moon can transit slightly before the planet (causing a small flux dip
before the main transit begins), or slide in front of the planet during
the transit (temporarily reducing the combined occulting area, causing
a brief flux recovery in the middle of the dip), or emerge from behind
the planet on the exit side (causing a small dip after the main transit
ends). The exact signature depends on the relative sizes of planet and
moon, their orbital period ratio, and the geometry of the particular
transit.</p>
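<p>(The flux formula above can be turned into a toy simulation. The sketch below is not the paper&rsquo;s analysis: it rasterises the stellar disk and counts grid points blocked by either body, with uniform surface brightness and invented radii, orbit size, and period ratio. The boolean union of the two disks handles the overlap term $A_{pm}(t)$ automatically.)</p>

```python
import numpy as np

# Radii and positions in units of the star's radius (example values)
r_p, r_m = 0.25, 0.10      # planet and moon radii
a_moon = 0.5               # projected radius of the moon's orbit
n_orbits = 3               # moon orbits completed per transit

# Grid over the stellar disk (uniform brightness, no limb darkening)
xs, ys = np.meshgrid(np.linspace(-1, 1, 400), np.linspace(-1, 1, 400))
on_star = xs**2 + ys**2 <= 1.0
star_points = np.count_nonzero(on_star)

def flux(x_p, x_m, y_m):
    """Fraction of starlight not blocked by planet (at x_p, 0) or moon."""
    blocked = (((xs - x_p)**2 + ys**2 <= r_p**2) |
               ((xs - x_m)**2 + (ys - y_m)**2 <= r_m**2))
    return 1.0 - np.count_nonzero(blocked & on_star) / star_points

t = np.linspace(0.0, 1.0, 300)
x_planet = -2.0 + 4.0 * t                     # planet sweeps across the star
phase = 2.0 * np.pi * n_orbits * t
x_moon = x_planet + a_moon * np.cos(phase)    # moon circles the planet
y_moon = 0.3 * a_moon * np.sin(phase)         # projected, inclined orbit
curve = np.array([flux(*args) for args in zip(x_planet, x_moon, y_moon)])
```

<p>Plotting <code>curve</code> against <code>t</code> shows the planet&rsquo;s flat-bottomed dip with the moon&rsquo;s pre-dips, mid-transit wiggles, and post-dips superimposed.</p>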
<p>These signatures are small. In real astrophysics, this is why no exomoon
has been unambiguously confirmed. In a classroom analogy experiment, the
signals are large enough to see clearly — which is exactly what makes the
experiment pedagogically useful.</p>
<hr>
<h2 id="the-experimental-setup">The Experimental Setup</h2>
<p>The starting point is a standard transit analogy experiment: a sphere
(the planet) on a rod, moved slowly around a lamp (the star) by a slowly
rotating motor. A light sensor — an Android smartphone running phyphox,
or an Arduino with a suitable sensor — records the illuminance over time.
The resulting light curve shows the characteristic symmetric transit dip.</p>
<p>The modification is straightforward: attach a small battery-powered motor
to the planet sphere, with a smaller sphere (the moon) on the motor&rsquo;s arm.
The motor we used is a disco ball motor — inexpensive, widely available,
and with a rotation speed that works well relative to the transit timescale
if you choose the geometry appropriately.</p>
<p>The result is a physical system with two independent circular motions:</p>
<ul>
<li>The planet orbiting the star (driven by the main slow-rotation motor)</li>
<li>The moon orbiting the planet (driven by the disco ball motor)</li>
</ul>
<p>When this system transits the &ldquo;star&rdquo; (the lamp), the light sensor records
a compound light curve with the exomoon signatures described above.</p>
<p><strong>One technical note on sensors:</strong> High sample rate matters here.
The exomoon signatures are brief features on top of the transit dip, and
a sensor that samples too slowly will average them out. We found that
the TI SensorTag CC2650, despite being a reasonable choice for the basic
transit experiment, has a light sensor sample rate of only 1.25 Hz —
too slow to resolve exomoon signatures reliably. Android smartphones and
Arduinos both achieve adequate sample rates. The Pasco light sensor
used in the paper samples at up to 20 Hz and resolves the features clearly.</p>
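<p>(A quick way to convince yourself of the sample-rate point, with invented dip durations and depths:)</p>

```python
import numpy as np

def light_curve(t):
    """Synthetic transit: 10 s planetary dip plus a 0.2 s moon-only pre-dip."""
    flux = np.ones_like(t)
    flux[(t > 5.0) & (t < 15.0)] -= 0.06    # main transit
    flux[(t > 4.5) & (t < 4.7)] -= 0.01     # brief exomoon signature
    return flux

for rate in (1.25, 20.0):                   # Hz: slow vs. fast sensor
    t = np.arange(0.0, 20.0, 1.0 / rate)
    seen = np.any((t > 4.4) & (t < 4.8) & (light_curve(t) < 0.995))
    print(f"{rate:5.2f} Hz: moon dip sampled? {seen}")
```

<p>With this phase, the 1.25&nbsp;Hz sampler steps straight over the 0.2&nbsp;s feature; at 20&nbsp;Hz several samples land inside it. A real slow sensor also averages over its sample interval, washing the feature out further.</p>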
<hr>
<h2 id="reading-the-light-curves">Reading the Light Curves</h2>
<p>The paper presents two distinct light curve types that emerge from the
experiment, each with a different exomoon orbital configuration.</p>
<p><strong>Type 1</strong>: The moon&rsquo;s orbital period is short relative to the transit
duration. Multiple exomoon signatures appear within a single transit.
These include:</p>
<ul>
<li>A small dip before the main transit begins (moon transiting alone)</li>
<li>Asymmetric ingress/egress (moon leading or trailing the planet)</li>
<li>A brief flux recovery midway through the transit (moon passing
behind the planet, reducing the total occluding area)</li>
<li>A small post-transit dip (moon still in front of the star after
the planet has exited)</li>
</ul>
<p><strong>Type 2</strong>: A specific orbital phase alignment in which the moon
slides directly behind the planet near mid-transit. At that moment the
occulting area drops to the planet&rsquo;s disk alone, so the flux partially
recovers; as the moon emerges from behind the planet, the total occluded
area increases again briefly before both planet and moon exit.</p>
<p>This second case is particularly useful for quantitative analysis: if the
orbital geometry is right, students can separately determine the planet&rsquo;s
radius from the secondary dip depth and the combined planet-moon radius
from the primary dip depth.</p>
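<p>(In numbers, assuming the deepest level has planet and moon side by side in front of the star and the shallower mid-transit level has the moon hidden: each relative depth is an area ratio, so the radii follow from square roots. The lamp radius and depths below are invented example values.)</p>

```python
import math

R_star = 0.10             # lamp radius in metres (example value)
depth_primary = 0.0725    # deepest level: planet + moon both in front
depth_secondary = 0.0625  # mid-transit level: planet alone

r_planet = R_star * math.sqrt(depth_secondary)                # 0.025 m
r_moon = R_star * math.sqrt(depth_primary - depth_secondary)  # 0.010 m
print(r_planet, r_moon)
```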
<hr>
<h2 id="video--light-curve-together">Video + Light Curve Together</h2>
<p>The paper recommends recording a video of the experiment simultaneously
with the light sensor measurement, from the perspective of the sensor
(i.e., looking up at the lamp from below). This technique — which is
also central to the <a href="/posts/exoplanet-hunting-smartphones/">transit method paper</a>
— is even more valuable here.</p>
<p>Without the video, the exomoon signatures in the light curve are easy
to misread as noise or experimental error. With the video, students can
advance frame by frame through the moments corresponding to the unusual
features and see exactly what the physical system was doing: the moon
sliding in front of the planet, the moon emerging from the planet&rsquo;s
shadow, the moon transiting alone at the start or end of the main event.</p>
<p>The cognitive load of interpreting an unfamiliar, complex signal drops
substantially when the signal can be correlated frame by frame with a
visual record of what produced it.</p>
<hr>
<h2 id="differentiation-and-extensions">Differentiation and Extensions</h2>
<p>The paper suggests the exomoon experiment as an extension for higher-ability
students at the end of a unit on exoplanet detection, not as the entry
point. The transit method should come first; the exomoon experiment builds
on it.</p>
<p>For students who are comfortable with quantitative analysis, the formula
above allows a full treatment: given the measured light curve and a known
lamp radius, students can derive both the planet radius and the moon radius
from the dip depths at the appropriate moments.</p>
<p>Possible further extensions:</p>
<ol>
<li><strong>Noise floor investigation</strong>: systematically vary the moon&rsquo;s size and
determine the smallest moon still detectable. This connects directly
to the real astrophysical problem — the reason no exomoon has been
confirmed is that the signal is buried in noise.</li>
<li><strong>Period ratio effects</strong>: vary the transit speed (and thus the effective
period ratio between moon and planet) to see how the light curve changes.</li>
<li><strong>Sensor comparison</strong>: test different sensor types and compare their
ability to resolve exomoon signatures. This turns the instrumental
limitation into an explicit investigation.</li>
</ol>
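<p>Extension 1 can be given a quantitative target. Under a simple matched-filter assumption (my own sketch, not the paper&rsquo;s analysis: a boxcar dip of depth $\delta$ sampled $n$ times with per-sample noise $\sigma$ has signal-to-noise $\delta\sqrt{n}/\sigma$), the smallest detectable moon follows directly from a chosen SNR threshold:</p>

```python
import math

def smallest_detectable_moon(r_star, sigma, n_samples, snr_threshold=5.0):
    """Smallest moon radius recoverable from a light-curve dip.

    Illustrative model only: assumes a boxcar dip of fractional depth
    (r_moon / r_star)**2, n_samples data points inside the dip, Gaussian
    per-sample noise sigma (in fractional flux units), and a matched-filter
    SNR of depth * sqrt(n_samples) / sigma."""
    min_depth = snr_threshold * sigma / math.sqrt(n_samples)
    return r_star * math.sqrt(min_depth)

# A 10 cm lamp, 1% flux noise, 100 samples inside the moon's dip:
r_min = smallest_detectable_moon(r_star=10.0, sigma=0.01, n_samples=100)
print(f"smallest detectable moon radius: {r_min:.2f} cm")
# -> smallest detectable moon radius: 0.71 cm
```

Halving the noise or quadrupling the number of in-dip samples shrinks the detectable radius by a factor of $\sqrt{2}$, which is exactly the trade-off students can map out experimentally.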
<p>For the deeper theoretical connections — transit timing variations, the
David Kipping approach to exomoon detection — see the
<a href="/posts/the-gift-of-transits/">transit simulation post</a>, which models
these effects in a browser-based tool.</p>
<p><em>For the secondary school curriculum context and the Direct Imaging
pre-experiment that typically precedes the transit unit, see
<a href="/posts/fremde-welten-exoplanet-teaching/">Fremde Welten</a>.</em></p>
<hr>
<h2 id="references">References</h2>
<p>Küpper, A., &amp; Spicker, S. J. (2023). Ein Analogieexperiment zur Suche
nach Exomonden. <em>MNU Journal</em>, 76(5).</p>
<p>Sato, M., &amp; Asada, H. (2009). Effects of mutual transits by extrasolar
planet-companion systems on light curves. <em>Publications of the
Astronomical Society of Japan</em>, 61(4), L29–L34.</p>
<p>Tusnski, L. R. M., &amp; Valio, A. (2011). Transit model of planets with
moon and ring systems. <em>The Astrophysical Journal</em>, 743(1), 97.</p>
<p>Heller, R. (2018). On the detection of extrasolar moons and rings.
In H. J. Deeg &amp; J. A. Belmonte (Eds.), <em>Handbook of Exoplanets</em>
(pp. 835–851). Springer.</p>
<p>Küpper, A., Spicker, S. J., &amp; Schadschneider, A. (2022).
Analogieexperimente zur Transitmethode für den Physik- und
Astronomieunterricht in der Sekundarstufe I. <em>Astronomie+Raumfahrt
im Unterricht</em>, 59(188), 46–50.</p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-10-03</strong>: Updated the Tusnski &amp; Valio (2011) reference to use article number 97, replacing the previous page range &ldquo;1–16.&rdquo;</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>How Low Can You Go? Measuring Latency for Networked Music Performance Across Europe</title>
      <link>https://sebastianspicker.github.io/posts/nmp-latency-lola-mvtp/</link>
      <pubDate>Sat, 26 Aug 2023 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/nmp-latency-lola-mvtp/</guid>
      <description>We measured end-to-end audio and video latency for LoLa and MVTP across six European research-network links. One-way audio latency ranged from 7.5 to 22.5 ms. Routing topology mattered more than geographic distance. Enterprise firewalls were a disaster. Here is what we found.</description>
      <content:encoded><![CDATA[<p><em>This post summarises a manuscript submitted with Benjamin Bentz and colleagues
from the RAPP Lab network. The paper is not yet peer-reviewed; numbers and
conclusions are based on operational measurements collected 2020–2023.
Feedback welcome — particularly from anyone who has run similar measurements
on non-European or wireless-last-mile links.</em></p>
<hr>
<h2 id="the-problem">The Problem</h2>
<p>Musicians playing together in the same room experience acoustic propagation
delay of roughly 3 ms per metre of separation — essentially free latency that
most ensembles never consciously register. When you distribute musicians across
a network, you inherit that propagation cost plus everything the signal chain
adds on top: buffers, codec processing, routing hops, switching overhead.</p>
<p>Conventional video-conferencing (Zoom, Teams, etc.) operates at end-to-end
delays of roughly 100–300 ms. That is comfortable for speech — human
conversation tolerates round-trip delays up to about 250 ms before it starts
to feel wrong — but it is well above the threshold at which ensemble timing
breaks down. The NMP literature generally puts the upper bound for
synchronous rhythmic playing somewhere between 20 and 30 ms one-way, with
considerable variation by tempo, instrument, and whether the performers can
see each other [Carôt 2011; Tsioutas &amp; Xylomenos 2021; Medina Victoria 2019].</p>
<p>Specialised low-latency systems cut the processing overhead by avoiding
compression, using hardware-accelerated video pipelines, and riding
research-and-education networks that offer better jitter characteristics than
commodity internet. Two of the better-known ones are <strong>LoLa</strong> (Low Latency
Audio Visual Streaming System, developed at Conservatorio G. Tartini Trieste)
and <strong>MVTP</strong> (Modular Video Transmission Platform, developed at CESNET in
Prague). We deployed both at Hochschule für Musik und Tanz Köln as part of
the RAPP Lab collaboration and spent about two and a half years measuring them.</p>
<hr>
<h2 id="the-latency-budget">The Latency Budget</h2>
<p>End-to-end latency in NMP is cumulative and non-recoverable. Once delay enters
the chain, nothing downstream can subtract it. The budget looks like:</p>
\[
  L_\text{total} = L_\text{capture} + L_\text{buffer} + L_\text{network} + L_\text{playback}
\]<p>Network latency \( L_\text{network} \) includes propagation (roughly
\( d / (2 \times 10^8) \) seconds for a fibre link of distance \( d \) metres,
accounting for the refractive index of glass) plus per-hop processing.
Everything else is system-dependent.</p>
<p>The key insight is that \( L_\text{buffer} \) is not fixed — it is a
consequence of jitter. A jittery link forces larger buffers to avoid
underruns, which directly adds to perceived latency. This is why raw bandwidth
is almost irrelevant for NMP: a 1 Gbps link with erratic jitter will perform
worse than a 100 Mbps link with deterministic behaviour.</p>
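<p>The budget is easy to sketch numerically. All component values below other than fibre propagation are illustrative placeholders of my own, not measurements from the study:</p>

```python
FIBRE_SPEED = 2.0e8  # m/s: roughly c divided by the refractive index of glass

def one_way_latency_ms(fibre_km, hops, capture_ms=1.0, buffer_ms=2.7,
                       playback_ms=1.0, per_hop_ms=0.05):
    """One-way latency budget: capture + buffer + network + playback.
    Network = fibre propagation + per-hop switching overhead.
    Default component values are illustrative assumptions."""
    propagation_ms = fibre_km * 1e3 / FIBRE_SPEED * 1e3
    network_ms = propagation_ms + hops * per_hop_ms
    return capture_ms + buffer_ms + network_ms + playback_ms

# A hypothetical 700 km fibre route with 10 switching hops:
print(f"{one_way_latency_ms(700, 10):.2f} ms one-way")
# -> 8.70 ms one-way
```

Note that even with zero network distance the endpoint overheads alone consume several milliseconds of the 20–30 ms budget, which is why capture and buffer settings matter as much as routing.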
<hr>
<h2 id="what-we-measured-and-how">What We Measured and How</h2>
<p><strong>Network RTT.</strong> ICMP ping, 1,000 packets per run. We report the median as a
robust summary; the mean is too sensitive to the occasional rogue packet.</p>
<p><strong>End-to-end audio latency.</strong> An audio signal-loop: transmit a test signal
from site A to site B, have site B return it immediately, estimate round-trip
delay by cross-correlation. One-way latency = signal-loop RTT / 2. This method
captures local processing and buffering at both ends in addition to the network
leg, which is what actually matters for a musician.</p>
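<p>The cross-correlation step can be sketched in a few lines of NumPy (a toy reconstruction of the idea, not our actual measurement code; the signal names and parameters are invented):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 48_000                                    # sample rate in Hz

# Test signal: a burst of white noise (sharp autocorrelation peak)
probe = rng.standard_normal(fs // 10)          # 100 ms probe

# Simulate the signal loop: the probe comes back delayed by the round trip
true_rtt_samples = 480                         # 10 ms at 48 kHz
returned = np.concatenate([np.zeros(true_rtt_samples), probe,
                           np.zeros(fs // 10)])
returned += 0.1 * rng.standard_normal(returned.size)   # measurement noise

# Cross-correlate: the lag of the correlation peak is the round-trip delay
lag = np.argmax(np.correlate(returned, probe, mode="valid"))
rtt_ms = lag / fs * 1e3
print(f"estimated RTT: {rtt_ms:.2f} ms, one-way: {rtt_ms / 2:.2f} ms")
```

A white-noise probe is convenient here because its autocorrelation is close to a delta function, so the peak stays unambiguous even with moderate noise on the return leg.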
<p><strong>Video latency.</strong> Component-based estimation (capture frame cadence +
processing pipeline + display). We did not have a frame-accurate video
loopback method, so treat these numbers as estimates rather than precision
measurements. That caveat matters less than it might seem because, as you will
see, video was always slower than audio by a wide enough margin that it did not
drive the operational decisions.</p>
<p><strong>Firewall impact.</strong> A controlled 4-hour session on the Cologne–Vienna link,
alternating between a DMZ configuration (direct research-backbone access) and
a transparent enterprise firewall, logging packet loss and decoder instability.</p>
<p>Six partner institutions, air distances from 175 to 1,655 km, measurements
collected between October 2020 and March 2023.</p>
<hr>
<h2 id="results">Results</h2>
<h3 id="audio-latency">Audio latency</h3>
<table>
  <thead>
      <tr>
          <th>Partner (from Cologne)</th>
          <th>Air distance (km)</th>
          <th>Median RTT (ms)</th>
          <th>One-way audio latency (ms)</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Prague</td>
          <td>535</td>
          <td>5.0</td>
          <td>7.5</td>
      </tr>
      <tr>
          <td>Vienna</td>
          <td>745</td>
          <td>7.0</td>
          <td>9.5</td>
      </tr>
      <tr>
          <td>Detmold</td>
          <td>175</td>
          <td>7.5</td>
          <td>10.0</td>
      </tr>
      <tr>
          <td>Trieste</td>
          <td>775</td>
          <td>10.0</td>
          <td>12.5</td>
      </tr>
      <tr>
          <td>Rome</td>
          <td>1,090</td>
          <td>17.5</td>
          <td>20.0</td>
      </tr>
      <tr>
          <td>Tallinn</td>
          <td>1,465</td>
          <td>19.5</td>
          <td>22.0–22.5</td>
      </tr>
  </tbody>
</table>
<p>The number that jumps out immediately: <strong>Detmold (175 km away) has higher
latency than Vienna (745 km away).</strong> This is a routing issue, not a physics
one. The Detmold link was traversing a less efficient campus path that added
extra hops before reaching the research backbone. Prague, by contrast, was
connected via a particularly short routing path and achieved the lowest latency
of any link despite not being the geographically closest.</p>
<p>The practical implication: geographic distance is a poor predictor of
achievable latency. Measure RTT; do not estimate from a map.</p>
<h3 id="video-latency">Video latency</h3>
<p>Estimated one-way video latency was 20–35 ms across all configurations,
with the dominant contributions coming from frame cadence (at 60 fps, you wait
up to 16.7 ms for a frame to be captured regardless of what the network is
doing) and buffering at the decoder. In every deployment, video consistently
lagged audio. Musicians unsurprisingly fell back on audio for synchronisation
and treated video as a supplementary cue — useful for expressive and social
information, not for timing.</p>
<h3 id="the-firewall-experiment">The firewall experiment</h3>
<p>This is the result I find most important for anyone planning a similar
deployment.</p>
<table>
  <thead>
      <tr>
          <th>Metric</th>
          <th>DMZ (no firewall)</th>
          <th>With enterprise firewall</th>
          <th>Change</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Dropped audio packets</td>
          <td>0.002%</td>
          <td>0.052%</td>
          <td>+26×</td>
      </tr>
      <tr>
          <td>Audio buffer realignments/hour</td>
          <td>0.3</td>
          <td>3.9</td>
          <td>+13×</td>
      </tr>
      <tr>
          <td>Dropped video frames</td>
          <td>0.04%</td>
          <td>0.74%</td>
          <td>+18×</td>
      </tr>
      <tr>
          <td>Additional latency</td>
          <td>—</td>
          <td>0.5–1.0 ms</td>
          <td>—</td>
      </tr>
  </tbody>
</table>
<p>The raw latency increase (0.5–1.0 ms) is small and largely irrelevant. The
packet loss and buffer event increases are not. A 26-fold increase in dropped
audio packets on an otherwise uncongested link means the firewall is doing
something — likely deep packet inspection or stateful tracking — that
introduces enough irregularity to destabilise small audio buffers. This forces
you to either accept dropouts or increase buffer size, and increasing buffer
size increases latency.</p>
<p>The message is: if your institution requires traffic inspection for
security policy compliance, you are paying a latency tax that is more about
<em>stability</em> than the raw delay number, and that tax is substantial.</p>
<hr>
<h2 id="discussion">Discussion</h2>
<p>Based on the measured latencies and reported musical tolerances from the
literature, I would roughly characterise the links as follows:</p>
<ul>
<li>
<p><strong>Prague, Vienna, Detmold, Trieste (7.5–12.5 ms):</strong> Compatible with
most repertoire including rhythmically demanding chamber music.
Musicians in our sessions reported the interaction as &ldquo;natural&rdquo; or
&ldquo;like being in the same room&rdquo; at these latencies.</p>
</li>
<li>
<p><strong>Rome (20 ms):</strong> Usable with attention to repertoire and tempo.
Slower movements and music where tight rhythmic locking is not the
primary aesthetic concern work well. Rhythmically dense passages at
fast tempi become harder.</p>
</li>
<li>
<p><strong>Tallinn (22–22.5 ms):</strong> At the upper edge of the comfortable range.
Still usable — we ran a concert collaboration in March 2023 — but
musicians adapt their interaction strategies, leaning more on musical
anticipation than reactive synchronisation.</p>
</li>
</ul>
<p>What is notably absent from this data: anything outside the European
research-network context. All six links ran on GÉANT or national backbone
equivalents with favourable jitter characteristics. The numbers almost
certainly do not transfer directly to commodity internet, satellite links, or
mixed-topology paths.</p>
<p><strong>Limitations I want to be explicit about.</strong> The video latency estimates are
component-based, not directly measured, so treat that 20–35 ms range with
appropriate scepticism. The firewall comparison is a single 4-hour session on
a single link; I would not want to extrapolate too aggressively to other
firewall vendors or configurations. And this is an operational measurement
study, not a controlled perceptual experiment — I cannot tell you from this
data at precisely what latency threshold a given ensemble will declare a
session unusable, because that depends on the music, the musicians, and
factors I did not measure.</p>
<hr>
<h2 id="practical-takeaways">Practical Takeaways</h2>
<p>For anyone setting up a similar system:</p>
<ol>
<li><strong>Measure RTT before committing to a partner institution.</strong> A 100 km
difference in air distance can easily be swamped by routing differences.</li>
<li><strong>Get DMZ placement if at all possible.</strong> The firewall results suggest
this matters more than any other single configuration decision.</li>
<li><strong>Minimise campus hops between your endpoint and the research backbone.</strong>
Each additional switching layer adds jitter risk.</li>
<li><strong>Use small audio buffers and monitor for underruns.</strong> If your baseline
RTT is good, your buffer can be small; if underruns increase, that is an
early warning that network stability is degrading before packet loss
becomes audible.</li>
<li><strong>Accept that video will lag audio and design your session accordingly.</strong>
This is not a system failure; it is a consequence of how video pipelines
work at low latency. Plan for it.</li>
</ol>
<hr>
<h2 id="references">References</h2>
<p>Carôt, A. (2011). Low latency audio streaming for Internet-based musical
interaction. <em>Advances in Multimedia and Interactive Technologies</em>.
<a href="https://doi.org/10.4018/978-1-61692-831-5.ch015">https://doi.org/10.4018/978-1-61692-831-5.ch015</a></p>
<p>Drioli, C., Allocchio, C., &amp; Buso, N. (2013). Networked performances and
natural interaction via LOLA. <em>LNCS</em>, 7990, 240–250.
<a href="https://doi.org/10.1007/978-3-642-40050-6_21">https://doi.org/10.1007/978-3-642-40050-6_21</a></p>
<p>Medina Victoria, A. (2019). <em>A method for the measurement of the latency
tolerance range of Western musicians</em>. Ph.D. dissertation, Cork Institute
of Technology (now Munster Technological University).</p>
<p>Rottondi, C., Chafe, C., Allocchio, C., &amp; Sarti, A. (2016). An overview on
networked music performance technologies. <em>IEEE Access</em>, 4, 8823–8843.
<a href="https://doi.org/10.1109/ACCESS.2016.2628440">https://doi.org/10.1109/ACCESS.2016.2628440</a></p>
<p>Tsioutas, K. &amp; Xylomenos, G. (2021). On the impact of audio characteristics
to the quality of musicians experience in network music performance. <em>JAES</em>,
69(12), 914–923. <a href="https://doi.org/10.17743/jaes.2021.0041">https://doi.org/10.17743/jaes.2021.0041</a></p>
<p>Ubik, S., Halak, J., Kolbe, M., Melnikov, J., &amp; Frič, M. (2021). Lessons
learned from distance collaboration in live culture. <em>AISC</em>, 1378, 608–615.
<a href="https://doi.org/10.1007/978-3-030-74009-2_77">https://doi.org/10.1007/978-3-030-74009-2_77</a></p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-01-20</strong>: Updated the Drioli et al. (2013) LNCS volume number to 7990 (ECLAP 2013 proceedings). Updated the Ubik et al. (2021) AISC volume number to 1378 and page range to 608–615. Updated the fifth author&rsquo;s surname to &ldquo;Frič.&rdquo;</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>The Milky Way Is a Gravitational Wave Detector</title>
      <link>https://sebastianspicker.github.io/posts/nanograv-pulsar-timing-gravitational-wave-background/</link>
      <pubDate>Fri, 07 Jul 2023 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/nanograv-pulsar-timing-gravitational-wave-background/</guid>
      <description>LIGO uses 4-kilometre laser arms to detect gravitational waves at hundreds of hertz. Pulsar timing arrays use millisecond pulsars scattered across the Milky Way — arms measured in light-years — to detect gravitational waves at nanohertz frequencies, ten orders of magnitude lower. In June 2023, five independent pulsar timing arrays simultaneously announced the detection of a stochastic gravitational wave background. The Milky Way itself was the detector.</description>
      <content:encoded><![CDATA[<p>On 28–29 June 2023, four independent research collaborations published papers simultaneously: NANOGrav (North American Nanohertz Observatory for Gravitational Waves), the EPTA (European Pulsar Timing Array), the PPTA (Parkes Pulsar Timing Array, Australian), and InPTA (Indian Pulsar Timing Array). Each announced essentially the same result: evidence for a stochastic gravitational wave background at nanohertz frequencies. The simultaneous coordinated publication is itself telling — in physics, that kind of coordination usually signals that each group needed the others to make the claim credible. If one group had published alone, the community would have been skeptical. Four independent datasets saying the same thing is a different matter.</p>
<p>The detector for all four collaborations was not a machine. It was the Milky Way — specifically, a collection of millisecond pulsars scattered across the galaxy, separated by thousands of light-years. The arms of this instrument make LIGO look microscopic by comparison, and they are pointed in every direction at once.</p>
<p>There is a professional habit, well established in physics, of finding the scale of things either comforting or terrifying. This one is genuinely awe-inspiring.</p>
<h2 id="the-scale-problem">The scale problem</h2>
<p>To detect a gravitational wave, your instrument needs to be roughly comparable in size to the wavelength you are trying to observe. This is not a hard rule — LIGO&rsquo;s 4-km arms are much shorter than the wavelengths it detects — but the sensitivity of any interferometer scales with arm length, so the constraint matters practically even when it is not absolute.</p>
<p>LIGO detects gravitational waves in the frequency range of roughly 10–1000 Hz. A wave at 100 Hz has a wavelength of</p>
$$\lambda = \frac{c}{f} = \frac{3 \times 10^8 \text{ m/s}}{100 \text{ Hz}} \approx 3000 \text{ km},$$<p>about a quarter of the Earth&rsquo;s diameter. LIGO&rsquo;s 4-km Fabry-Pérot arm cavities are short compared to the wavelength, but the cavities fold the light hundreds of times to achieve an effective path length of about 1600 km, and the interferometer is exquisitely sensitive to differential arm-length changes of order $10^{-19}$ m — about one ten-thousandth of a proton radius. The physics is heroic. But it works at the scale of stellar-mass binary mergers: neutron star–neutron star, neutron star–black hole, stellar-mass black hole–black hole. Those systems radiate at audible frequencies. LIGO famously converts its strain signals to sound, and the chirps do sound like something from a science fiction film.</p>
<p>Now consider pulsar timing arrays. The targets are nanohertz gravitational waves: frequencies of order $f \sim 10^{-9}$ to $10^{-8}$ Hz. At $10^{-9}$ Hz, the wavelength is</p>
$$\lambda = \frac{c}{f} \approx \frac{3 \times 10^8 \text{ m/s}}{10^{-9} \text{ Hz}} = 3 \times 10^{17} \text{ m} \approx 10 \text{ pc} \approx 33 \text{ light-years}.$$<p>To detect oscillations at these frequencies, you need arms measured in light-years. Tens to thousands of light-years. The nearest millisecond pulsars are a few hundred light-years away; the most distant ones used in PTA arrays are several thousand light-years distant. By accident of the galaxy&rsquo;s size and the distribution of recycled pulsars within it, we happen to live inside an instrument of approximately the right dimensions to detect nanohertz gravitational waves. The galaxy did not plan this. We got lucky.</p>
<h2 id="millisecond-pulsars-as-cosmic-clocks">Millisecond pulsars as cosmic clocks</h2>
<p>A gravitational wave detector is only as good as its clock. LIGO measures differential length changes in its arm cavities using the interference of laser light — effectively using the constancy of the speed of light as a metronome. Pulsar timing arrays use a different clock: the rotation of a neutron star.</p>
<p>Ordinary pulsars are the collapsed remnants of massive stars, born in supernova explosions. A neutron star of roughly 1.4 solar masses is compressed into a sphere about 10 km across, rotating rapidly and radiating beams of radio waves from near its magnetic poles. The rotation periods of young pulsars are typically of order seconds, and they spin down on timescales of millions of years as they lose rotational energy to magnetic dipole radiation. They are not particularly stable clocks — the spin-down is uneven, and they occasionally &ldquo;glitch,&rdquo; suddenly spinning up by a small amount.</p>
<p>Millisecond pulsars (MSPs) are a different beast entirely. They have been <em>recycled</em>: spun up to near-millisecond rotation periods by accreting matter from a binary companion star. The accretion process deposits angular momentum onto the neutron star, increasing its spin rate dramatically, while simultaneously burying and weakening its magnetic field. The typical MSP has a surface magnetic field of order $10^8$–$10^9$ G, three to four orders of magnitude weaker than a young pulsar&rsquo;s $10^{11}$–$10^{13}$ G. Since the spin-down torque scales as $B^2$, the weaker field means the MSP loses rotational energy far more slowly. Once the accretion stops — when the companion has exhausted its transferable mass — the MSP is left spinning rapidly and stably, with a spin-down rate of order $10^{-20}$ s/s.</p>
<p>The result is rotational stability competitive with terrestrial atomic clocks. PSR J0437$-$4715, one of the best-timed MSPs, has a rotation period of $P \approx 5.76$ ms and a timing residual — the scatter of individual pulse arrival times around the best-fit timing model — of order 100 ns over decades of observation. For a pulsar completing about 174 rotations per second, a residual of 100 ns over a baseline of years is remarkable. The fractional frequency stability is $\delta P / P \sim 10^{-14}$ or better. These are not merely good clocks; they are among the most stable periodic phenomena known to physics.</p>
<p>The timing model accounts for everything we know about the pulsar&rsquo;s environment: its spin period and spin-down rate, its proper motion across the sky, parallax (from which we get the distance), Shapiro delay from any companions, dispersion measure variations in the interstellar medium, and more. After subtracting all modelled effects, what remains are timing residuals — small, unexplained deviations in the pulse arrival times. If a gravitational wave passes through, it will appear in those residuals.</p>
<h2 id="how-a-gravitational-wave-shifts-a-pulse-arrival-time">How a gravitational wave shifts a pulse arrival time</h2>
<p>A gravitational wave is a propagating perturbation of the spacetime metric. In the transverse-traceless (TT) gauge, a wave propagating along the $z$-direction perturbs the metric as</p>
$$ds^2 = -c^2 dt^2 + \bigl(1 + h_+\bigr)dx^2 + \bigl(1 - h_+\bigr)dy^2 + 2h_\times \, dx \, dy + dz^2,$$<p>where $h_+$ and $h_\times$ are the two polarisation amplitudes, both functions of $t - z/c$. As the wave passes, the proper distances in the $x$- and $y$-directions oscillate: space itself is being stretched and squeezed at the wave frequency.</p>
<p>A radio pulse travelling from a pulsar to Earth traverses this oscillating spacetime. The proper path length changes, and so does the travel time. The timing residual induced in a pulsar in direction $\hat{n}$ by a gravitational wave with wavevector $\hat{k}$ is</p>
$$R(t) \propto \int_0^t dt' \, \frac{\Delta\nu(t')}{\nu} \propto \int_0^t dt' \, h(t', \hat{n}, \hat{k}),$$<p>where $h$ is an appropriate contraction of the metric perturbation with the geometry of the Earth-pulsar baseline. The key point: the timing residual is the <em>time integral</em> of the metric strain. For a sinusoidal wave at frequency $f$, the residual oscillates at the same frequency with amplitude of order $h/(2\pi f)$: a strain of $10^{-15}$ at $3 \times 10^{-9}$ Hz induces residuals of only tens of nanoseconds, which is partly why PTA analysis is technically demanding.</p>
<p>For a single pulsar, a timing residual tells you that <em>something</em> disturbed the spacetime between Earth and the pulsar — but it could be a systematic in the timing model, interstellar medium fluctuations, or intrinsic pulsar noise. You cannot claim a gravitational wave detection from one pulsar alone. What you need is the correlation between many pulsars.</p>
<h2 id="the-hellings-downs-curve">The Hellings-Downs curve</h2>
<p>Here is the central idea of pulsar timing array science. Consider two pulsars in directions $\hat{n}_a$ and $\hat{n}_b$, separated on the sky by an angle $\theta$ such that $\cos\theta = \hat{n}_a \cdot \hat{n}_b$. Both are embedded in the same stochastic gravitational wave background — a superposition of waves arriving from all directions, at all frequencies in the nanohertz band, with random phases and amplitudes. The timing residuals of both pulsars will be perturbed by this background. The question is: what is the expected cross-correlation between their residuals?</p>
<p>Hellings and Downs (<a href="#ref-HellingsDowns1983">1983</a>) computed this, assuming the background is isotropic (equal power from all directions), unpolarised, and stationary. The answer is now called the Hellings-Downs (HD) curve:</p>
$$\Gamma(\theta) = \frac{3}{2} \left(\frac{1-\cos\theta}{2}\right) \ln\!\left(\frac{1-\cos\theta}{2}\right) - \frac{1}{4}\left(\frac{1-\cos\theta}{2}\right) + \frac{1}{2}.$$<p>Let me unpack the features of this function:</p>
<ul>
<li>At $\theta = 0$ (the same pulsar, or two pulsars in the same direction): $\Gamma(0) = 1/2$, maximum positive correlation. This makes sense — both pulsars see the same wave.</li>
<li>At $\theta \approx 50°$: the curve crosses zero and turns negative.</li>
<li>At $\theta = \pi/2$ (pulsars at right angles): $\Gamma(\pi/2) \approx -0.145$, close to the minimum of the curve.</li>
<li>At $\theta = \pi$ (antipodal pulsars, opposite directions on the sky): $\Gamma(\pi) = 1/4$. Positive correlation even for pulsars in opposite directions — counterintuitive but correct.</li>
<li>In between, the curve dips negative (anticorrelated) for angles roughly $50°$–$120°$.</li>
</ul>
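<p>The curve is simple enough to evaluate directly; this short snippet of mine reproduces the landmark values above:</p>

```python
import math

def hellings_downs(theta):
    """Expected cross-correlation of timing residuals for two pulsars
    separated by angle theta (radians), normalised so Gamma(0) = 1/2."""
    x = (1.0 - math.cos(theta)) / 2.0
    if x == 0.0:                      # x * ln(x) -> 0 as theta -> 0
        return 0.5
    return 1.5 * x * math.log(x) - x / 4.0 + 0.5

print(f"Gamma(0)    = {hellings_downs(0.0):+.3f}")               # +0.500
print(f"Gamma(50°)  = {hellings_downs(math.radians(50)):+.3f}")  # ~ 0 (zero crossing)
print(f"Gamma(90°)  = {hellings_downs(math.pi / 2):+.3f}")       # -0.145
print(f"Gamma(180°) = {hellings_downs(math.pi):+.3f}")           # +0.250
```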
<p>The shape is uniquely <em>quadrupolar</em>. It arises directly from the spin-2 tensor nature of gravitational waves. A scalar perturbation (like a monopole clock error common to all pulsars — such as an error in the terrestrial time standard used to timestamp the observations) would produce a flat, angle-independent correlation. A dipole perturbation (like an error in Earth&rsquo;s ephemeris, or a systematic in our knowledge of Earth&rsquo;s position) would produce a dipolar, $\cos\theta$ pattern. Only spin-2 tensor radiation produces the Hellings-Downs shape.</p>
<p>This is why the HD curve is the smoking gun. If you observe cross-correlations between pulsar timing residuals that match this specific, non-trivial curve as a function of angular separation — something that dips negative around 90° and recovers to a positive value at 180° — you have direct evidence that a tensor gravitational wave background is responsible. No other known astrophysical systematic produces this pattern.</p>
<p>The difficulty is statistical. You need enough pulsars, spread over a wide range of sky angles, each timed with sufficient precision, over a long enough baseline, that you can measure this correlation function with confidence. That has been the programme of the PTA collaborations since the 1990s. In June 2023, they had enough.</p>
<h2 id="the-2023-evidence">The 2023 evidence</h2>
<p>The NANOGrav 15-year dataset (<a href="#ref-Agazie2023">Agazie et al., 2023</a>) comprises 68 millisecond pulsars observed for up to 15 years, with an average of roughly 2200 timing observations per pulsar. The dataset represents an enormous investment of telescope time — primarily at the Arecibo Observatory (until its collapse in December 2020) and the Green Bank Telescope.</p>
<p>The analysis found an excess of low-frequency noise common to many pulsars, consistent with a power-law spectral shape. More importantly, when the cross-correlations between all pairs of pulsars were computed and binned by angular separation, the result was consistent with the Hellings-Downs curve. The statistical significance of the HD correlation — that is, the evidence that the spatial correlation pattern matches the quadrupolar prediction rather than some isotropic or zero-correlation model — was 3–4$\sigma$ depending on the analysis method and prior assumptions. The collaboration carefully described this as &ldquo;evidence for&rdquo; rather than &ldquo;detection of&rdquo; a gravitational wave background, following community conventions (detection would conventionally require $\geq 5\sigma$).</p>
<p>Simultaneously, the EPTA published its second data release (<a href="#ref-Antoniadis2023">Antoniadis et al., 2023</a>), the PPTA published its results (<a href="#ref-Reardon2023">Reardon et al., 2023</a>), and InPTA contributed its analysis. A fifth collaboration, the Chinese Pulsar Timing Array (CPTA), also published consistent results (<a href="#ref-Xu2023">Xu et al., 2023</a>). Each group used independent datasets, different telescopes, different software pipelines, different statistical methodologies. All found the same thing.</p>
<p>The fact that five groups independently recovered a consistent signal with approximately the right spectral shape and approximately the correct spatial correlations is the argument for reality. Any single group&rsquo;s result could be explained by a systematic error in their data or analysis. Five groups with independent data and methods converging on the same result is much harder to explain as coincidence.</p>
<p>The combined interpretation is clear: something is producing a stochastic background of nanohertz gravitational waves permeating the galaxy, with spatial correlations consistent with the tensor quadrupole signature predicted by general relativity. The Milky Way is a gravitational wave detector, and it has measured something.</p>
<h2 id="what-is-making-the-noise">What is making the noise?</h2>
<p>The million-dollar question. The observed signal has a characteristic spectral shape: the power spectral density of the timing residuals scales approximately as $S(f) \propto f^{-13/3}$, or equivalently, the gravitational wave energy density spectrum</p>
$$\Omega_{\rm GW}(f) \propto f^{2/3}.$$<p>This $f^{2/3}$ scaling is the expected spectrum for an ensemble of circular binary systems driven purely by gravitational wave emission — specifically, a population of supermassive black hole binaries (SMBHBs) distributed across a wide range of masses and redshifts.</p>
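<p>The two scalings are the same statement in different units. Writing the characteristic strain of a population of GW-driven circular binaries in the standard form $h_c(f) \propto f^{-2/3}$, the residual power spectrum and the energy density follow directly from the usual conversions $S(f) \propto h_c^2(f)/f^3$ and $\Omega_{\rm GW}(f) \propto f^2 h_c^2(f)$:</p>
$$S(f) \propto \frac{h_c^2(f)}{f^3} \propto f^{-4/3-3} = f^{-13/3}, \qquad \Omega_{\rm GW}(f) \propto f^2\, h_c^2(f) \propto f^{2-4/3} = f^{2/3}.$$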
<p>Here is the astrophysical picture. When two massive galaxies merge — an event that happens billions of times over cosmic history — their central supermassive black holes (each typically $10^7$–$10^{10}$ solar masses) do not immediately merge. First, they sink toward the centre of the merger remnant by dynamical friction against the stellar background, forming a loosely bound binary on scales of a parsec or so. The binary then hardens: three-body interactions with individual stars passing close to the binary extract orbital energy, driving the pair to smaller separations. Eventually, when the binary has hardened to a separation where gravitational wave emission dominates the energy loss, the pair inspirals and merges on a timescale of millions to billions of years.</p>
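<p>The strength of that final gravitational-wave-driven stage can be sketched with the Peters (1964) coalescence time for a circular binary, $t = 5c^5 a^4 / \left(256\, G^3 m_1 m_2 (m_1+m_2)\right)$. The masses and separations below are illustrative values I have chosen, not figures from the collaborations; the point is the steep $a^4$ dependence, which is why everything hinges on whether the binary can harden below roughly a parsec:</p>

```python
import math

# Peters (1964) circular-orbit coalescence time:
#   t = 5 c^5 a^4 / (256 G^3 m1 m2 (m1 + m2))
# SI constants; the binary parameters below are illustrative only.
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # solar mass, kg
PC = 3.086e16        # parsec, m
YR = 3.156e7         # year, s

def coalescence_time_yr(m1_msun: float, m2_msun: float, a_pc: float) -> float:
    """Time for a circular binary to merge by GW emission alone, in years."""
    m1, m2 = m1_msun * M_SUN, m2_msun * M_SUN
    a = a_pc * PC
    return 5 * C**5 * a**4 / (256 * G**3 * m1 * m2 * (m1 + m2)) / YR

# An equal-mass 10^9 + 10^9 solar-mass binary at decreasing separations:
for a_pc in (1.0, 0.1, 0.01):
    print(f"a = {a_pc:5.2f} pc: t ~ {coalescence_time_yr(1e9, 1e9, a_pc):.1e} yr")
```

At a parsec the coalescence time exceeds the age of the universe; two orders of magnitude closer in, it drops by eight orders of magnitude. Stellar scattering (or gas) has to bridge that gap, which is the "final parsec problem" in miniature.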
<p>The incoherent superposition of gravitational wave emission from the many billions of SMBHB systems across the observable universe — at all masses, at all orbital frequencies, at all redshifts — produces a stochastic background. It is, in a sense, cosmic traffic noise: no individual merger is detectable as a discrete event, but the combined hum from all of them is. The predicted spectral amplitude and shape from this population are broadly consistent with the observed signal, though the uncertainties on the astrophysical model are large enough that the agreement is not a precise test.</p>
<p>Alternative sources are possible. A first-order phase transition in the early universe (such as a QCD or electroweak phase transition) would produce a background of gravitational waves with a different spectral shape — more peaked and potentially harder than the SMBHB prediction. A network of cosmic strings, topological defects from symmetry-breaking phase transitions in the early universe, would produce yet another spectrum, approximately flat in $\Omega_{\rm GW}(f)$. Primordial gravitational waves from inflation are expected at far lower frequencies, near the CMB scale, but with a nearly scale-invariant spectrum that could contribute at nanohertz frequencies in some models. The 2023 data are most consistent with the SMBHB interpretation, but cannot rule out contributions from early-universe sources — or combinations of both. Future data with longer baselines and more pulsars will tighten the spectral measurements and may reveal deviations from the SMBHB prediction.</p>
<p>There is also the tantalising prospect of eventually resolving individual SMBHB systems above the stochastic background — the gravitational wave equivalent of resolving individual radio sources above the extragalactic background. No individual SMBHB has yet been identified in PTA data, but the 15-year NANOGrav dataset already places interesting upper limits on the most massive known candidate systems.</p>
<h2 id="multi-frequency-gravitational-wave-astronomy">Multi-frequency gravitational wave astronomy</h2>
<p>We are now, unambiguously, in the era of multi-frequency gravitational wave astronomy. The universe produces gravitational waves across an enormous span of frequencies, and different frequency windows probe fundamentally different source populations:</p>
<ul>
<li>
<p><strong>Primordial / CMB B-modes</strong> (~$10^{-17}$ Hz): Wavelengths comparable to the Hubble scale. Primordial tensor perturbations from inflation would imprint a distinctive B-mode polarisation pattern in the cosmic microwave background. No confirmed detection yet; the BICEP/Keck programme and future CMB experiments (LiteBIRD, CMB-S4) are sensitive to this regime.</p>
</li>
<li>
<p><strong>Pulsar timing arrays</strong> (~$10^{-9}$–$10^{-8}$ Hz): The nanohertz band, just accessed by NANOGrav and partners. Sources: SMBHB inspirals, possibly early-universe phase transitions and cosmic strings. Arm length: thousands of light-years.</p>
</li>
<li>
<p><strong>LISA</strong> (~$10^{-4}$–$10^{-1}$ Hz): The millihertz band. The LISA space interferometer (planned launch in the 2030s) will have arm lengths of 2.5 million km and will detect SMBHB mergers directly as they happen, stellar-mass compact binary inspirals years before their LIGO-band merger, and possibly signals from extreme mass-ratio inspirals. LISA will see the SMBHB sources that PTAs see as a stochastic background, but in their final years of inspiral.</p>
</li>
<li>
<p><strong>LIGO/Virgo/KAGRA</strong> (~10–$10^3$ Hz): The audible band. Stellar-mass black hole and neutron star mergers. Over 90 confirmed events as of the end of O3, with many more candidates in O4. Source masses: $\sim$1–100 $M_\odot$.</p>
</li>
</ul>
<p>This is the gravitational wave analogue of multi-wavelength astronomy. Just as the universe looks completely different in radio, infrared, optical, X-ray, and gamma-ray light — each wavelength band revealing different physical processes and source populations — the universe sounds completely different at each gravitational wave frequency. PTAs hear the rumble of cosmic structure formation; LISA will hear the whisper of the final million years before the most massive black hole mergers; LIGO hears the sharp crack of stellar-mass collisions.</p>
<p>The 2023 announcements represent the opening of the nanohertz window. We have gone from one gravitational wave frequency band (LIGO/Virgo) to two. The next decade, with LISA launching and PTAs continuing to accumulate data, will see the opening of a third.</p>
<h2 id="a-brief-detour-ligos-o4-run-and-the-mass-gap-object">A brief detour: LIGO&rsquo;s O4 run and the mass-gap object</h2>
<p>While the PTA collaborations were making their June 2023 announcement, LIGO&rsquo;s fourth observing run (O4, which ran from May 2023 to June 2025) was proceeding at an extraordinary rate. The upgraded detectors were detecting candidate gravitational wave events at roughly one every two to three days — over 200 candidates across the full run. Gravitational wave detection is now production science rather than exploration.</p>
<p>Among the most scientifically interesting events was GW230529, detected on 29 May 2023 and published by the LIGO Scientific Collaboration (<a href="#ref-Abbott2024">Abbott et al., 2024</a>). The signal is consistent with the merger of a neutron star with a compact object whose mass was measured to be approximately $2.5$–$4.5\ M_\odot$. This mass range sits squarely in what theorists call the &ldquo;mass gap&rdquo; — the range between the heaviest neutron stars ($\lesssim 2.3\ M_\odot$, though this upper limit is uncertain) and the lightest stellar-mass black holes inferred from X-ray binaries ($\gtrsim 5\ M_\odot$).</p>
<p>Whether GW230529&rsquo;s companion was the heaviest neutron star ever observed, or the lightest black hole, is genuinely unknown. The distinction matters enormously for nuclear physics: if it is a neutron star, it constrains the nuclear equation of state at supranuclear densities. If it is a black hole, it means the mass gap is narrower than X-ray observations suggested, and our understanding of compact object formation needs revision. Gravitational wave observations alone cannot distinguish a rapidly spinning heavy neutron star from a slowly spinning light black hole without additional electromagnetic counterpart observations, and no counterpart was found for GW230529. This question will likely not be settled until we have electromagnetic constraints from similar systems, or until we accumulate enough mass-gap events to understand the population statistically.</p>
<h2 id="the-instrument-we-did-not-build">The instrument we did not build</h2>
<p>I want to return to the pulsar timing array concept, because I think it deserves more than a passing technical description. The idea is this: nature has distributed a set of extremely stable clocks across the galaxy. We did not put them there. We did not design them. We simply discovered that neutron stars, after a particular evolutionary pathway involving mass transfer from a binary companion, achieve a rotational stability that happens to be sufficient to detect perturbations of spacetime at cosmological scales.</p>
<p>The instrument is the galaxy itself — or rather, our ability to model it. We build a timing model for each pulsar: a comprehensive description of every known effect that influences pulse arrival times. We subtract the model. What remains, the residuals, contains the signal we cannot yet explain. We cross-correlate the residuals of 68 (or 25, or 57, depending on the collaboration) pulsars in pairs, compute the correlation as a function of angular separation, and compare to the Hellings-Downs prediction.</p>
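<p>The Hellings-Downs prediction itself is remarkably simple: a closed-form function of the angular separation alone. A minimal sketch, using the standard normalisation in which the expected correlation for two distinct, coincident pulsars is 0.5:</p>

```python
import math

def hellings_downs(theta: float) -> float:
    """Expected cross-correlation between the timing residuals of two
    pulsars separated by angle theta (radians) on the sky, for an
    isotropic GW background (Hellings & Downs 1983). Normalised so that
    two distinct pulsars at zero separation correlate at 0.5."""
    x = (1.0 - math.cos(theta)) / 2.0
    xlogx = x * math.log(x) if x > 0 else 0.0  # x ln x -> 0 as x -> 0
    return 1.5 * xlogx - x / 4.0 + 0.5

# The characteristic quadrupolar shape: positive at small separations,
# anticorrelated near 90 degrees, positive again at 180 degrees.
for deg in (0, 60, 90, 120, 180):
    print(f"{deg:3d} deg: {hellings_downs(math.radians(deg)):+.3f}")
```

Binning the measured pair correlations by angular separation and overplotting this curve is, in essence, the figure that made the 2023 announcements.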
<p>The engineering challenge is not building the detector. It is characterising it. Understanding the noise. Modelling the interstellar medium, which disperses radio pulses in a frequency-dependent way and varies as the pulsar moves through clouds of ionised gas. Accounting for clock errors in the terrestrial time standards used to timestamp observations. Dealing with instrumental noise in each of the different radio telescopes that contribute data. Building a Bayesian framework that can simultaneously fit the timing model parameters, the pulsar noise properties, and the GWB parameters for dozens of pulsars.</p>
<p>This is painstaking, years-long work. The 15-year NANOGrav dataset reflects something like 600 pulsar-years of observation. The detection is earned.</p>
<p>The Hellings-Downs correlation — the specific pattern that emerged from those residuals, consistent with the quadrupolar fingerprint of general relativity&rsquo;s spin-2 gravitational waves — is one of the more beautiful results I have seen in recent astrophysics. It is a direct measurement of the tensor nature of gravity, at frequency scales eleven orders of magnitude below anything LIGO can access, using a detector assembled by the Milky Way over the course of 10 billion years of stellar evolution and galaxy mergers.</p>
<p>We are in an age of gravitational wave astronomy. I find that remarkable.</p>
<p>If you are interested in the broader theme of using astronomical observations as physics experiments rather than just cataloguing the sky, the posts on <a href="/posts/the-gift-of-transits/">transit photometry and the gift of transits</a> and on <a href="/posts/exoplanet-hunting-smartphones/">smartphone-based exoplanet observations</a> cover similar ground at different scales — planetary radii measured in units of stellar radii, by timing the dimming of a star as a planet crosses its face. The underlying logic is the same: precision timing plus a physical model plus statistics equals a measurement of something you could not directly touch.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p><span id="ref-HellingsDowns1983"></span>Hellings, R. W., &amp; Downs, G. S. (1983). Upper limits on the isotropic gravitational radiation background from pulsar timing analysis. <em>The Astrophysical Journal Letters</em>, 265, L39–L42. <a href="https://doi.org/10.1086/183954">DOI: 10.1086/183954</a></p>
</li>
<li>
<p><span id="ref-Agazie2023"></span>Agazie, G., et al. (NANOGrav Collaboration). (2023). The NANOGrav 15 yr Data Set: Evidence for a Gravitational-wave Background. <em>The Astrophysical Journal Letters</em>, 951, L8. <a href="https://doi.org/10.3847/2041-8213/acdac6">DOI: 10.3847/2041-8213/acdac6</a></p>
</li>
<li>
<p><span id="ref-Antoniadis2023"></span>Antoniadis, J., et al. (EPTA Collaboration). (2023). The second data release from the European Pulsar Timing Array — III. Search for gravitational wave signals. <em>Astronomy &amp; Astrophysics</em>, 678, A50. <a href="https://doi.org/10.1051/0004-6361/202346844">DOI: 10.1051/0004-6361/202346844</a></p>
</li>
<li>
<p><span id="ref-Reardon2023"></span>Reardon, D. J., et al. (PPTA Collaboration). (2023). Search for an isotropic gravitational-wave background with the Parkes Pulsar Timing Array. <em>The Astrophysical Journal Letters</em>, 951, L6. <a href="https://doi.org/10.3847/2041-8213/acdd02">DOI: 10.3847/2041-8213/acdd02</a></p>
</li>
<li>
<p><span id="ref-Xu2023"></span>Xu, H., et al. (CPTA Collaboration). (2023). Searching for the nano-Hertz stochastic gravitational wave background with the Chinese Pulsar Timing Array. <em>Research in Astronomy and Astrophysics</em>, 23(7), 075024. <a href="https://doi.org/10.1088/1674-4527/acdfa5">DOI: 10.1088/1674-4527/acdfa5</a></p>
</li>
<li>
<p><span id="ref-Abbott2024"></span>Abbott, R., et al. (LIGO Scientific Collaboration). (2024). Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 $M_\odot$ Compact Object and a Neutron Star. <em>The Astrophysical Journal Letters</em>, 970, L34. <a href="https://doi.org/10.3847/2041-8213/ad5beb">DOI: 10.3847/2041-8213/ad5beb</a></p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-12-01</strong>: Corrected the summary from &ldquo;four independent pulsar timing arrays&rdquo; to &ldquo;five&rdquo; — the CPTA (Chinese Pulsar Timing Array) also published consistent results in June 2023 and is counted as the fifth group in the body text.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Fremde Welten: Teaching Exoplanet Detection in the Secondary School Classroom</title>
      <link>https://sebastianspicker.github.io/posts/fremde-welten-exoplanet-teaching/</link>
      <pubDate>Wed, 14 Jun 2023 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/fremde-welten-exoplanet-teaching/</guid>
      <description>A unit for lower secondary physics classes (grades 8–10) on detecting exoplanets with analogy experiments. Published in Unterricht Physik in 2023, it starts where students&amp;rsquo; misconceptions are — with the (wrong) assumption that you can just look at exoplanets through a telescope — and works forward from there.</description>
      <content:encoded><![CDATA[<p><em>This post describes the article &ldquo;Fremde Welten — Die Suche nach Exoplaneten
mit Analogieexperimenten thematisieren&rdquo; (Strange Worlds: Teaching Exoplanet
Detection with Analogy Experiments), published in Unterricht Physik (Issue 194,
2023) with Alexander Küpper.</em></p>
<hr>
<h2 id="where-students-start">Where Students Start</h2>
<p>Before students encounter the transit method, most of them have a clear mental
model of how exoplanet detection works: you point a large telescope at a nearby
star, and if there is a planet, you see it. &ldquo;You could see them [the exoplanets]
with a telescope/binoculars&rdquo; and &ldquo;You can see them with an extremely powerful
telescope&rdquo; are typical responses from year 8–9 students before they work through
an actual detection unit.</p>
<p>This is not an unreasonable starting intuition. Telescopes see things far away.
Planets are things far away. The inference seems to follow.</p>
<p>What it misses is the contrast ratio problem. A star is not just brighter
than its planets — it is overwhelmingly, almost incomprehensibly brighter.
In visible light, a star like the Sun outshines Jupiter by roughly a billion
to one. Against that glare, the planet is functionally invisible. Direct
imaging of exoplanets is possible in special circumstances — young planets
far from their stars, imaged in infrared — but for the vast majority of
exoplanets, it is not a viable detection method.</p>
<p>The unit described in this article takes that misconception as its entry point
and builds from there.</p>
<hr>
<h2 id="the-direct-imaging-experiment">The Direct Imaging Experiment</h2>
<p>The first experiment in the unit is a hands-on demonstration of why direct
imaging is difficult.</p>
<p>The setup: a student points their smartphone camera at a small light source
(a switched-on torch). Directly next to the torch, barely a few centimetres
away, is a pin with a coloured head — the &ldquo;exoplanet&rdquo;. On the phone&rsquo;s display,
the pinhead is invisible. The torch (star) drowns it out completely.</p>
<p>Students can then investigate what would need to change for the pinhead to
become visible. The answer they discover: block the torch with a small disc
held in front of the camera at the right distance. With the direct glare
suppressed, the illuminated pinhead becomes visible in the image.</p>
<p>This is a coronagraph in miniature. The same principle is used in real
direct-imaging instruments like SPHERE on the VLT or the coronagraph in
the Nancy Grace Roman Space Telescope. Students discover, experimentally,
the essential idea: to see an exoplanet directly, you need to suppress the
star&rsquo;s light without blocking the planet&rsquo;s.</p>
<p>The experiment also motivates a natural follow-on question: under what
conditions does direct imaging work at all? Students can vary the pinhead
distance from the torch and its size, exploring qualitatively the conditions
under which the &ldquo;exoplanet&rdquo; becomes detectable even with partial suppression.
The answer — large planets, far from their host star — matches the real
observational bias: most directly imaged exoplanets are large, young
(still warm from formation), and in wide orbits.</p>
<hr>
<h2 id="the-transit-experiment">The Transit Experiment</h2>
<p>Once the limits of direct imaging are established, the unit introduces the
transit method as the primary indirect technique. The pedagogical structure
is deliberate: students have already understood that you cannot usually see
exoplanets directly, which motivates the question of how else you might
detect them.</p>
<p>The transit experiment uses a lamp as the star, a ball moved by hand
(approximately periodically) around the lamp, and an Android smartphone
running <a href="https://phyphox.org">phyphox</a> as the light sensor. When the ball
crosses in front of the lamp from the sensor&rsquo;s perspective, the measured
illuminance dips. Students see a real light curve — not a simulation,
not a graph from a database, but something they produced themselves from
a physical measurement.</p>
<p>Two phyphox experiment files are provided for download (via QR code in
the article and at astro-lab.app):</p>
<p><strong>Basic experiment</strong>: records the raw illuminance data and displays the
light curve. The focus is qualitative — what shape does the dip have?
What determines the depth? What determines the period? Students can
formulate the relationship between dip depth and planet-to-star size ratio
as a qualitative rule (the larger the planet relative to the star, the
deeper the dip) without necessarily working through the mathematics.</p>
<p><strong>Extended experiment</strong>: adds real-time calculations of the transit depth
$\Delta F$, the maximum illuminance $I_*$ and transit illuminance $I_\text{transit}$,
the transit duration, and the orbital period. For students who are ready
for it, this allows a quantitative derivation of the &ldquo;planet&rdquo; radius from
the light curve — given a known lamp radius and the measured transit depth:</p>
$$\Delta F = \left(\frac{R_p}{R_*}\right)^2$$<p>The extended experiment also invites critical engagement with the model:
the radius derived from the analogy experiment will differ from the
actual ball radius, because the distance ratios in the tabletop setup
are not to scale. Making that discrepancy explicit — and asking students
why it arises — is good science practice.</p>
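<p>Inverting the depth relation gives the radius directly. A small sketch with real solar-system numbers (reference values rounded; these figures are mine, not from the article):</p>

```python
import math

def planet_radius(depth: float, r_star_km: float) -> float:
    """Invert the transit-depth relation dF = (Rp/R*)^2 for the planet radius."""
    return r_star_km * math.sqrt(depth)

R_SUN_KM = 696_000  # solar radius in km, rounded

# A Jupiter-size planet dims a Sun-like star by roughly 1 per cent:
print(round(planet_radius(0.0105, R_SUN_KM)), "km")  # close to Jupiter's ~71,500 km
```

The same one-line inversion works for the tabletop version, with the lamp radius in place of $R_*$ — which is exactly where the not-to-scale discrepancy discussed above becomes visible.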
<hr>
<h2 id="limits-of-the-transit-method">Limits of the Transit Method</h2>
<p>A recurring theme in the unit is that every detection method has limits,
and understanding those limits is part of understanding the method.</p>
<p>For the transit method, the fundamental limit is inclination. A transit
is only observable if the planet&rsquo;s orbital plane is aligned (nearly
edge-on) relative to our line of sight. Most exoplanetary systems, viewed
from Earth, will not be aligned in this way. The transit method is
therefore a biased sample: it preferentially detects planets in edge-on
orbits, and it misses most planets entirely.</p>
<p>Students can explore this experimentally: tilt the plane of the ball&rsquo;s
orbit away from edge-on and observe what happens to the light curve.
The dip disappears. This connects naturally to a broader point about
how astronomical surveys work: when we report &ldquo;X% of stars have
detectable planets&rdquo;, we are reporting a fraction that has been corrected
for this and other observational biases.</p>
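<p>The size of that correction can be estimated from geometry alone: for a circular orbit viewed from a random direction, the probability of transits is approximately $R_*/a$, the stellar radius over the orbital distance. A quick sketch (rounded reference values; an estimate, not the survey teams&rsquo; full correction):</p>

```python
R_SUN_KM = 696_000   # solar radius in km, rounded
AU_KM = 1.496e8      # astronomical unit in km

def transit_probability(r_star_km: float, a_km: float) -> float:
    """Geometric probability ~ R*/a that a randomly inclined circular
    orbit crosses the stellar disc as seen by a distant observer
    (ignoring the planet's own radius)."""
    return r_star_km / a_km

p = transit_probability(R_SUN_KM, 1.0 * AU_KM)  # Earth-Sun analogue
print(f"{p:.2%}")  # about half a per cent
```

So for every Earth analogue a transit survey finds, roughly two hundred go undetected for orientation reasons alone — which is the factor the bias correction has to restore.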
<p>The article includes a differentiation note: the limits investigation
works well as an open inquiry task, with students formulating and testing
their own hypotheses about what orbital configurations produce detectable
transits.</p>
<hr>
<h2 id="exoplanets-as-a-curriculum-bridge">Exoplanets as a Curriculum Bridge</h2>
<p>One point the article makes explicitly is that exoplanets are not just an
astronomy topic but a context that connects to multiple items in the German
physics curriculum for Sekundarstufe I. The cross-connections include:</p>
<ul>
<li><strong>Optics</strong>: the seeing process (why does the star outshine the planet?),
shadow formation, refraction in telescopes</li>
<li><strong>Mechanics</strong>: orbital period, Kepler&rsquo;s laws at a qualitative level,
the habitable zone as a consequence of stellar luminosity and distance</li>
<li><strong>Thermodynamics</strong>: planetary surface temperature, the greenhouse
effect, albedo</li>
<li><strong>Pressure</strong>: atmospheric pressure, habitability (a connection
developed more fully in the <a href="/posts/mission-to-mars/">Mission to Mars</a>
experiment)</li>
</ul>
<p>The motivating context — could this planet host life? — sustains
student engagement across these topics in a way that treating them
in isolation does not.</p>
<hr>
<h2 id="what-comes-after">What Comes After</h2>
<p>The transit method is a productive entry point, but the search for
extraterrestrial life does not end with planet detection. The article
closes by noting that the detected exoplanets need to be analysed
for habitability — which depends on orbital radius (habitable zone),
stellar temperature, planet radius (mass is not available from transit
data alone), atmospheric composition, albedo, and greenhouse effect.</p>
<p>Many of these can be connected back to physics experiment contexts,
and the astro-lab project has developed smartphone-based analogy
experiments for several of them. Detailed information on these is at
<a href="https://astro-lab.app">astro-lab.app</a>.</p>
<p>For the full pedagogical sequence — from the original astro-lab
student laboratory, through the COVID pivot to home experiments, to
the return to school — see <a href="/posts/astro-lab-at-home/">The Lab Goes Home</a>.
For the exomoon extension, which takes the transit experiment further
into the question of moon-hosted life, see
<a href="/posts/exomoon-analogy-experiment/">Can a Planet Have a Moon?</a>.</p>
<hr>
<h2 id="references">References</h2>
<p>Küpper, A., &amp; Spicker, S. J. (2023). Fremde Welten — Die Suche nach
Exoplaneten mit Analogieexperimenten thematisieren. <em>Unterricht Physik</em>,
34(194), 4–9.</p>
<p>Küpper, A., Spicker, S. J., &amp; Schadschneider, A. (2022).
Analogieexperimente zur Transitmethode für den Physik- und
Astronomieunterricht in der Sekundarstufe I. <em>Astronomie+Raumfahrt
im Unterricht</em>, 59(188), 46–50.</p>
<p>Spicker, S. J., &amp; Küpper, A. (2024). Exoplanet hunting in the classroom:
An easy-to-implement experiment based on video-aided light curve analysis
with smartphones. <em>The Physics Teacher</em>, 62(3).
<a href="https://doi.org/10.1119/5.0125305">https://doi.org/10.1119/5.0125305</a></p>
<p>MSB NRW (2019). <em>Kernlehrplan für die Sekundarstufe I — Gymnasium in
Nordrhein-Westfalen: Physik.</em> Ministerium für Schule und Bildung NRW.</p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-10-03</strong>: Updated the DOI for Spicker &amp; Küpper (2024) to the correct 10.1119/5.0125305.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>How to Actually Film a Classroom: An Open-Access Manual on Classroom Videography</title>
      <link>https://sebastianspicker.github.io/posts/villa-videography-manual/</link>
      <pubDate>Tue, 09 May 2023 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/villa-videography-manual/</guid>
      <description>Three years after writing about why classroom video works, Charlotte Kramer, Kai Kaspar, and I wrote a manual on how to actually do it. The gap between knowing that video-based learning is effective and being able to produce usable footage turns out to be substantial. The manual is open access. Here is what is in it and why some of it surprised me to write.</description>
      <content:encoded><![CDATA[<p><em>This post is a follow-up to the <a href="/posts/villa-video-teacher-education/">June 2020 post on ViLLA and video in teacher
education</a>. That post was about why classroom
video is useful and what the ViLLA project found. This one is about the practical
question that post sidestepped: what does it actually take to film a real lesson?</em></p>
<p><em>The manual — Kramer, C., Spicker, S. J., &amp; Kaspar, K. (2023). Manual zur Erstellung
von Unterrichtsvideographien — is open access and freely downloadable at
<a href="https://kups.ub.uni-koeln.de/65599/">kups.ub.uni-koeln.de/65599</a>. Funded by the BMBF
under the ZuS Qualitätsoffensive Lehrerbildung programme (grant 01JA1815).</em></p>
<hr>
<h2 id="why-a-manual-exists">Why a Manual Exists</h2>
<p>The argument for classroom video in teacher education is not hard to make. The evidence
that video-based learning improves the perceptual and interpretive skills of student
teachers is solid enough that &ldquo;should we use video?&rdquo; is no longer a particularly
interesting question. The interesting questions are downstream: which kind of video,
for what purpose, produced how, stored where, used under what conditions.</p>
<p>The last of those — produced how — turns out to be the one that most programmes have
the least guidance on. There is a reasonably large research literature on the
<em>effects</em> of classroom video, and a smaller but growing literature on <em>design
principles</em> for video-based learning environments. There is much less on the
practical production side: what you need to decide before you enter a school
building, what can go wrong during filming, and what the post-processing work
actually involves.</p>
<p>The gap matters because it creates a reproducibility problem. If every research group
that wants classroom video has to figure out independently how to handle consent across
four institutional levels, how to position two cameras in a classroom with a window
on the wrong side, and how much post-processing time to budget per lesson, a lot of
effort goes into re-solving problems that have already been solved. The manual is an
attempt to make that accumulated knowledge explicit and shareable.</p>
<hr>
<h2 id="three-phases-and-why-preparation-is-the-most-important-one">Three Phases, and Why Preparation Is the Most Important One</h2>
<p>The manual is structured around the production lifecycle: preparation, production,
and post-processing. Each section ends with a practical checklist. The structuring
is not original — it follows Thomson (2019) and draws on Herrle and Breitenbach (2016)
and several other methodological guides — but the synthesis reflects what we learned
from actually running videography sessions at the University of Cologne over several
years.</p>
<p>The strongest claim in the manual is that <strong>preparation is the most important phase</strong>.
This sounds obvious and is consistently underestimated.</p>
<h3 id="methodical-preparation-the-question-before-the-camera-question">Methodical preparation: the question before the camera question</h3>
<p>Before any equipment decisions, the manual asks you to work through a prior question:
is video actually the right medium for what you want to know?</p>
<p>This is not a rhetorical check. Classroom video is excellent at capturing dynamic
processes — movement, gesture, voice, simultaneous events — and works well for
constructs like classroom management and communication patterns. It works less well
for constructs where the relevant data is not visible on the surface, like a student&rsquo;s
prior knowledge activation or the cognitive demands of a task. Using video for those
questions is possible, but you need more sessions, more annotation work, and supplementary
instruments. Building that into your timeline before you start is considerably better
than realising it after you have sixty hours of footage.</p>
<p>The manual also distinguishes four decisions about what kind of video you are making:</p>
<ul>
<li><strong>Authentic vs. staged</strong>: real everyday teaching vs. deliberately constructed
cases. Authentic footage gives you ecological validity; staged footage lets you
control which situations appear.</li>
<li><strong>Own vs. others&rsquo; teaching</strong>: self-recording for reflection vs. observing others
for general analysis.</li>
<li><strong>Typical vs. best practice</strong>: real-world teaching in its ordinary form vs.
exemplary demonstration material.</li>
<li><strong>Sequence vs. full lesson</strong>: a targeted extract sufficient for a specific analytic
focus vs. a complete lesson for contextualised, developmental analysis.</li>
</ul>
<p>None of these are neutral technical choices. They are methodological decisions that
determine what the resulting footage can be used for and what it cannot.</p>
<h3 id="organisational-preparation-the-consent-problem-is-harder-than-it-looks">Organisational preparation: the consent problem is harder than it looks</h3>
<p>The most time-consuming part of any real videography project is not the filming.
It is obtaining the permissions.</p>
<p>You need written consent from pupils, parents or guardians (separately, depending
on age — the threshold is 14 in the German legal framework the manual follows),
the class teacher, school leadership, the school authority, and in some states the
relevant ministry. The scope of the consent you obtain determines the scope of
use you can put the footage to: footage filmed under a narrow research-project-only
consent cannot be uploaded to ViLLA; footage filmed with broad usage rights can.
The broader the rights you request, the higher the barrier for participants to agree.</p>
<p>The practical implication: decide early what you want to do with the footage, because
what you put in the information letters and consent forms determines what is possible
for the lifetime of the data. This is a decision you cannot easily undo.</p>
<p>The manual also addresses the case where some pupils do not consent: in that situation,
it is often possible to position non-consenting pupils in a &ldquo;blind spot&rdquo; — an area
of the room where neither camera nor microphone captures them. But this requires
knowing the room layout and the planned seating arrangement in advance, which is
another reason organisational preparation starts earlier than you think.</p>
<h3 id="technical-preparation-as-much-as-necessary-as-little-as-possible">Technical preparation: as much as necessary, as little as possible</h3>
<p>The guiding principle for equipment selection is stated directly in the manual:
<em>so viel wie nötig, so wenig wie möglich</em> — as much as necessary, as little as
possible.</p>
<p>This matters because there is a pull toward technical elaboration that does not
always serve the research purpose. More cameras capture more perspectives; more
microphones capture more of the acoustic space; 360° cameras give you everything.
But more equipment means more setup time, more opportunities for failure during
filming, and substantially more post-processing work. And more visual complexity
in the final video does not automatically mean more analytically useful material —
it can mean more cognitive load for the students watching it.</p>
<p>The baseline setup the manual recommends is two static cameras positioned facing
each other: one centred on the students, one centred on the teacher. This
configuration, with lavalier microphones on teachers and boundary microphones for
student audio at the cameras, captures most of what you need for classroom management
research and teacher education at a level of complexity that is manageable. Extensions
— pan cameras for interaction analysis, additional cameras for group work, mobile
eye-tracking for teacher perspective, 360° cameras — are described as additions
for specific purposes, not as defaults.</p>
<hr>
<h2 id="what-happens-during-filming">What Happens During Filming</h2>
<p>The production section of the manual is the most specific and in some ways the
most useful part if you are planning a session for the first time. Some things
worth knowing:</p>
<p><strong>Start the cameras before the lesson.</strong> An authentic lesson happens only
once; you cannot go back and reshoot it. Events that happen before the official
start of the lesson — how a teacher
enters, how students settle, how the first few minutes of a lesson are framed — can
be analytically relevant. And any technical problems that surface before teaching
begins can still be fixed. Footage filmed before the lesson is easy to cut in post;
lost footage from the opening of a lesson is gone.</p>
<p><strong>The camera operator&rsquo;s job is to be boring.</strong> The manual is explicit that operators
should neither engage with the lesson content nor conspicuously attend to the
equipment. A relaxed posture, eyes on the monitor, not reacting to what is happening
in the room — this is the technique that allows pupils and teachers to stop registering
the cameras, which typically happens within the first few minutes if operators are not
drawing attention to themselves.</p>
<p><strong>Use a clapper.</strong> When running multiple cameras or separate audio recorders, a
handclap or clapperboard after all devices are rolling gives you a synchronisation
point for later editing. This is known to everyone who has ever synchronised footage,
but it is the kind of thing that is easy to forget in the scramble of setting up
during a ten-minute break.</p>
<p><strong>Backlighting is the enemy.</strong> Windows behind subjects produce the most common image
quality problem in classroom footage. The manual discusses ND filters for cases where
backlighting cannot be avoided, but the first-choice solution is room scouting in
advance to know where the windows are and plan camera placement accordingly.</p>
<hr>
<h2 id="post-processing-the-hidden-cost">Post-Processing: The Hidden Cost</h2>
<p>The post-processing chapter is the one I think is most likely to recalibrate
expectations productively.</p>
<p>Post-processing is time-intensive in proportion to the number of camera angles,
the number of audio tracks requiring synchronisation or correction, and the extent
of image and sound quality work needed. The manual is explicit that editing should
be done by people with content knowledge — not just technical skill — because the
person in the edit suite is constantly making decisions about what to include, how
to cut between perspectives, when to show the teacher&rsquo;s face vs. the students&rsquo;
faces. Those decisions are not editorially neutral. They determine what a viewer of
the finished video can perceive.</p>
<p>This is the point in the manual where the methodological problem I mentioned in
the previous post becomes concrete: the videography setting is not a neutral window
onto the classroom. The two-camera cross-cut convention (cut to the face of whoever
is speaking) is widely used and convenient for teaching purposes, but it is also
an editorial choice that foregrounds spoken exchange and makes other information —
spatial position, background activity, gestural communication between students —
less visible. Knowing that this choice was made is part of what a researcher or
educator needs to know in order to use the footage responsibly.</p>
<p>Data security deserves its own mention. Video files are large, they contain images
of minors, and they need to be stored under conditions that comply with current
data protection law — which means redundant backup, restricted access, purpose
limitation, and active awareness of what the current legal requirements are (which
change). The manual recommends checking applicable regulations before starting
rather than after, and treating data security as part of the workflow design rather
than an administrative afterthought.</p>
<hr>
<h2 id="what-is-coming-next">What Is Coming Next</h2>
<p>The manual&rsquo;s final chapter points toward three developments that are worth tracking:</p>
<p><strong>360° video and VR.</strong> Gold and Windscheid (2020) found that 360° classroom video
produces higher presence in student teacher observers than conventional video, though
without differences in learning outcomes measured by events noticed or ratings of
teaching quality. Whether the presence effect translates into something measurable
is an open empirical question. The VR version of this — using 360° classroom footage
as an immersive training environment where student teachers can observe without
the pressure of having to act — is methodologically interesting and practically
plausible at costs that are no longer prohibitive.</p>
<p><strong>Animated classroom video.</strong> The handful of studies on animated (as opposed to
filmed) classroom situations suggests that student teachers notice similar
learning-relevant events in animated and real footage (Smith et al., 2012; Chieu
et al., 2011). If that holds up, animation offers a way to construct specific scenarios
that would be hard to capture or ethically complex to film — situations involving
conflict, failure, or particular forms of student difficulty — without requiring
access to actual classrooms or consent from real pupils.</p>
<p><strong>Mobile eye-tracking.</strong> The combination of classroom videography with mobile
eye-tracking worn by teachers (Rüth, Zimmermann, &amp; Kaspar, 2020) opens the
teacher&rsquo;s-perspective angle that a fixed camera cannot capture. It is a technically
more demanding addition to the setup but an analytically distinctive one, and the
hardware costs have come down substantially.</p>
<hr>
<h2 id="a-note-on-open-access">A Note on Open Access</h2>
<p>The manual is freely available at <a href="https://kups.ub.uni-koeln.de/65599/">kups.ub.uni-koeln.de/65599</a>. We made it open access deliberately. The practical obstacles to classroom videography — not knowing how to handle consent, not knowing what equipment configuration works for a standard lesson, not knowing how long post-processing will actually take — are not obstacles that should be higher for researchers at institutions without an existing videography infrastructure. The knowledge exists; it should be findable.</p>
<p>If you are at the University of Cologne and want to run a videography session but
do not have your own equipment, the ZuS Media Labs project has a lending programme.
Contact the team at <a href="mailto:zus-kontakt@uni-koeln.de">zus-kontakt@uni-koeln.de</a> for the current equipment catalogue.</p>
<hr>
<p><em>For the specific challenges the manual doesn&rsquo;t address — recording in music
education, instrument acoustics, one-to-one lessons, and practice-session
documentation — see the
<a href="/posts/filming-music-education/">follow-up post on filming music education</a>.</em></p>
<hr>
<h2 id="references">References</h2>
<p>Chieu, V. M., Herbst, P., &amp; Weiss, M. (2011). Effect of an animated classroom story
embedded in online discussion on helping mathematics teachers learn to notice.
<em>Journal of the Learning Sciences</em>, 20(4), 589–624.
<a href="https://doi.org/10.1080/10508406.2011.528324">https://doi.org/10.1080/10508406.2011.528324</a></p>
<p>Gold, B., &amp; Windscheid, J. (2020). Observing 360-degree classroom videos — effects
of video type on presence, emotions, workload, classroom observations, and ratings
of teaching quality. <em>Computers &amp; Education</em>, 156, 103960.
<a href="https://doi.org/10.1016/j.compedu.2020.103960">https://doi.org/10.1016/j.compedu.2020.103960</a></p>
<p>Herrle, M., &amp; Breitenbach, S. (2016). Planung, Durchführung und Nachbereitung
videogestützter Beobachtungen im Unterricht. In U. Rauin, M. Herrle &amp; T. Engartner
(Hrsg.), <em>Videoanalysen in der Unterrichtsforschung</em>, 30–49. Beltz Juventa.</p>
<p>Kramer, C., König, J., Strauß, S., &amp; Kaspar, K. (2020). Classroom videos or transcripts?
A quasi-experimental study to assess the effects of media-based learning on
pre-service teachers&rsquo; situation-specific skills of classroom management.
<em>International Journal of Educational Research</em>, 103, 101624.
<a href="https://doi.org/10.1016/j.ijer.2020.101624">https://doi.org/10.1016/j.ijer.2020.101624</a></p>
<p>Rüth, M., Zimmermann, D., &amp; Kaspar, K. (2020). Mobiles Eye-Tracking im Unterricht.
In K. Kaspar et al. (Hrsg.), <em>Bildung, Schule, Digitalisierung</em>, 222–228. Waxmann.</p>
<p>Smith, D., McLaughlin, T., &amp; Brown, I. (2012). 3-D computer animation vs. live-action
video. <em>Contemporary Issues in Technology and Teacher Education</em>, 12(1), 41–54.</p>
<p>Thomson, A. (2019). <em>The creation and use of video-for-learning in higher education</em>.
Master&rsquo;s thesis, Queensland University of Technology.
<a href="https://doi.org/10.5204/thesis.eprints.130743">https://doi.org/10.5204/thesis.eprints.130743</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>The Charm of Impossibilities: Group Theory and Messiaen&#39;s Modes of Limited Transposition</title>
      <link>https://sebastianspicker.github.io/posts/messiaen-modes-group-theory/</link>
      <pubDate>Wed, 19 Apr 2023 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/messiaen-modes-group-theory/</guid>
      <description>Messiaen&amp;rsquo;s seven modes of limited transposition cannot be fully transposed through all twelve keys — not by convention, but because of group theory. The modes are pitch-class sets whose stabiliser subgroups in ℤ₁₂ are non-trivial. The orbit–stabiliser theorem gives the exact count of distinct transpositions for each mode, and the subgroup lattice of ℤ₁₂ maps directly onto the hierarchy of the seven modes.</description>
      <content:encoded><![CDATA[<p><em>I first encountered Messiaen&rsquo;s second mode — the octatonic scale — in an
analysis seminar during my physics studies, played by a colleague on an upright
piano in a rehearsal room with terrible acoustics. She demonstrated something
that stopped me: no matter how many times she transposed the scale up by a minor
third, she could never find a &ldquo;new&rdquo; version. After three transpositions she was
back where she started. She called it the charm of impossibilities. It took me
years to understand why it is impossible, and longer still to see that the answer
is not musical but algebraic.</em></p>
<p><em>This post is a companion to <a href="/posts/fibonacci-lateralus/">Fibonacci, the Golden Ratio, and Tool&rsquo;s
Lateralus</a>, which found number theory in a prog-rock
song. Here we find abstract algebra in twentieth-century sacred music.</em></p>
<hr>
<h2 id="pitch-classes-and-the-chromatic-clock">Pitch Classes and the Chromatic Clock</h2>
<p>Western music divides the octave into twelve equal semitones. For purposes of
harmony and counterpoint, the absolute pitch is often less important than the
pitch <em>class</em> — the equivalence class of all pitches related by octave
transposition. Middle C and the C two octaves above belong to the same pitch
class.</p>
<p>We label the twelve pitch classes $0, 1, 2, \ldots, 11$, with $0 = \mathrm{C}$,
$1 = \mathrm{C}\sharp/\mathrm{D}\flat$, $2 = \mathrm{D}$, and so on up to
$11 = \mathrm{B}$. Addition is taken modulo 12 — the integers wrap around like
a clock face, with $11 + 2 = 1$ (two semitones above B is C$\sharp$).</p>
<p>The set of pitch classes with this operation is a group:</p>
$$\mathbb{Z}_{12} = \{0, 1, 2, \ldots, 11\}, \qquad x \oplus y = (x + y) \bmod 12.$$<p>This is the cyclic group of order 12. It has an identity element ($0$, &ldquo;no
transposition&rdquo;), every element has an inverse ($-n \bmod 12$), and the operation
is associative. If you are used to thinking about the chromatic scale as a linear
sequence ending at the octave, $\mathbb{Z}_{12}$ is the insistence that it is
actually a circle.</p>
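<p>A minimal Python sketch makes the clock arithmetic concrete (my illustration; the group axioms reduce to three assertions):</p>

```python
# Pitch classes as integers mod 12: 0 = C, 1 = C#, ..., 11 = B.
def pc_add(x, y):
    """Group operation on Z12: addition modulo 12."""
    return (x + y) % 12

B, D = 11, 2
assert pc_add(B, D) == 1                      # two semitones above B is C#
assert pc_add(7, 0) == 7                      # 0 ("no transposition") is the identity
assert all(pc_add(x, (12 - x) % 12) == 0 for x in range(12))  # inverses exist
```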
<hr>
<h2 id="musical-operations-as-group-elements">Musical Operations as Group Elements</h2>
<p>Two operations are fundamental in tonal and post-tonal music theory.</p>
<p><strong>Transposition</strong> by $n$ semitones maps every pitch class up by $n$:</p>
$$T_n \colon x \mapsto x + n \pmod{12}.$$<p>The twelve transpositions $T_0, T_1, \ldots, T_{11}$ are exactly the elements of
$\mathbb{Z}_{12}$, with $T_n$ corresponding to the integer $n$. Composing two
transpositions gives a transposition: $T_m \circ T_n = T_{m+n}$.</p>
<p><strong>Inversion</strong> reflects the pitch-class circle:</p>
$$I \colon x \mapsto -x \pmod{12}.$$<p>Inversion maps C to C, D to B$\flat$, E to A$\flat$, and so on — it is the
mirror symmetry of the chromatic circle about the C/F$\sharp$ axis. Combining
inversion with transposition gives the <em>inversional transpositions</em>:</p>
$$I_n \colon x \mapsto n - x \pmod{12}.$$<p>The transpositions and inversional transpositions together generate a group of
order 24:</p>
$$D_{12} = \langle T_1, I \rangle.$$<p>This is the <em>dihedral group</em> $D_{12}$ — the same abstract group that describes
the symmetries of a regular 12-gon (twelve rotations and twelve reflections). The
identification is not coincidental: the twelve pitch classes arranged in a circle
<em>are</em> the vertices of a regular 12-gon, and the musical operations are
geometrically the symmetries of that polygon.</p>
<p>Twelve-tone composition — Schoenberg&rsquo;s method — is almost entirely a
working-out of the consequences of $D_{12}$ acting on ordered sequences of the
twelve pitch classes. The four canonical row forms (prime, inversion, retrograde,
retrograde-inversion) correspond to cosets of $\mathbb{Z}_{12}$ (the transposition subgroup).</p>
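<p>The order-24 claim can be verified by brute force. A minimal Python sketch (my own, not from the music-theory literature) that closes $\{T_1, I\}$ under composition, encoding the map $x \mapsto ax + b \pmod{12}$ as the pair $(a, b)$:</p>

```python
# Elements of the T/I group as pairs (a, b): x -> (a*x + b) mod 12,
# with a = 1 (transposition T_b) or a = 11 (inversion, since -1 = 11 mod 12).
def compose(f, g):
    """Apply g first, then f."""
    a1, b1 = f
    a2, b2 = g
    return ((a1 * a2) % 12, (a1 * b2 + b1) % 12)

T1 = (1, 1)    # transposition by one semitone
I = (11, 0)    # inversion x -> -x

group = {(1, 0), T1, I}          # identity plus the two generators
while True:
    new = {compose(f, g) for f in group for g in group} - group
    if not new:
        break
    group |= new

assert len(group) == 24          # the dihedral group D12
```

The 24 elements split into twelve transpositions $(1, b)$ and twelve inversional transpositions $(11, b)$, matching the rotation/reflection count for a regular 12-gon.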
<hr>
<h2 id="orbits-and-stabilisers">Orbits and Stabilisers</h2>
<p>Let $S \subseteq \mathbb{Z}_{12}$ be a pitch-class set — a chord, a scale, a
collection of any size.</p>
<p>The <strong>orbit</strong> of $S$ under $\mathbb{Z}_{12}$ is the collection of all distinct
transpositions of $S$:</p>
$$\mathrm{Orb}(S) = \{ T_n(S) : n \in \mathbb{Z}_{12} \}.$$<p>For most sets, all twelve transpositions produce a different set, so
$|\mathrm{Orb}(S)| = 12$. The C major scale, for example, has twelve distinct
transpositions, one for each key.</p>
<p>But some sets are symmetric under certain transpositions: there exists $n \neq 0$
such that $T_n(S) = S$. The collection of all symmetry transpositions of $S$ is
the <strong>stabiliser</strong>:</p>
$$\mathrm{Stab}(S) = \{ T_n \in \mathbb{Z}_{12} : T_n(S) = S \}.$$<p>Because composing two symmetry transpositions yields another, $\mathrm{Stab}(S)$
is a <em>subgroup</em> of $\mathbb{Z}_{12}$.</p>
<p>The <strong>orbit–stabiliser theorem</strong> gives the fundamental count:</p>
$$|\mathrm{Orb}(S)| \cdot |\mathrm{Stab}(S)| = |\mathbb{Z}_{12}| = 12.$$<p>The number of distinct transpositions of $S$ equals $12$ divided by the number
of transpositions that leave $S$ unchanged. The more internally symmetric $S$ is,
the fewer new versions you can produce by transposing it.</p>
<p>A set with $|\mathrm{Stab}(S)| > 1$ — one that is invariant under some
non-trivial transposition — is a <strong>mode of limited transposition</strong>.</p>
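<p>Both definitions translate directly into code. A brute-force Python sketch (mine), using the C major scale as the running example:</p>

```python
def transpose(s, n):
    """T_n applied to a pitch-class set."""
    return frozenset((x + n) % 12 for x in s)

def orbit(s):
    """All distinct transpositions of s."""
    return {transpose(s, n) for n in range(12)}

def stabiliser(s):
    """All n with T_n(s) = s."""
    return {n for n in range(12) if transpose(s, n) == s}

c_major = frozenset({0, 2, 4, 5, 7, 9, 11})   # C D E F G A B
assert stabiliser(c_major) == {0}             # no non-trivial symmetry
assert len(orbit(c_major)) == 12              # twelve distinct keys

# Orbit-stabiliser theorem: the product is always |Z12| = 12.
assert len(orbit(c_major)) * len(stabiliser(c_major)) == 12
```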
<hr>
<h2 id="mode-1-the-whole-tone-scale">Mode 1: The Whole-Tone Scale</h2>
<p>The whole-tone scale contains the six pitch classes at even intervals:</p>
$$\mathrm{Mode\ 1} = \{0, 2, 4, 6, 8, 10\}.$$<p>Transposing by $T_2$:</p>
$$T_2(\{0, 2, 4, 6, 8, 10\}) = \{2, 4, 6, 8, 10, 0\} = \{0, 2, 4, 6, 8, 10\}. \checkmark$$<p>The set is unchanged. The same holds for $T_4, T_6, T_8, T_{10}$. The stabiliser
is the full subgroup of even transpositions:</p>
$$\mathrm{Stab}(\mathrm{Mode\ 1}) = \{T_0, T_2, T_4, T_6, T_8, T_{10}\} \cong \mathbb{Z}_6.$$<p>By the orbit–stabiliser theorem:</p>
$$|\mathrm{Orb}(\mathrm{Mode\ 1})| = \frac{12}{6} = 2.$$<p>There are exactly two distinct whole-tone scales. Every pianist learns this: the
one on C and the one on C$\sharp$. Composing with whole-tone harmony means
working from a stock of only two harmonic pools with no way to modulate into a
genuinely new version of the scale. This is Messiaen&rsquo;s first charm of
impossibility.</p>
<hr>
<h2 id="mode-2-the-octatonic-scale">Mode 2: The Octatonic Scale</h2>
<p>The octatonic (diminished) scale alternates half-step and whole-step intervals.
Starting on C:</p>
$$\mathrm{Mode\ 2} = \{0, 1, 3, 4, 6, 7, 9, 10\}.$$<p>Does $T_3$ leave this set invariant?</p>
$$T_3(\{0, 1, 3, 4, 6, 7, 9, 10\}) = \{3, 4, 6, 7, 9, 10, 0, 1\} = \{0, 1, 3, 4, 6, 7, 9, 10\}. \checkmark$$<p>Also $T_6$ and $T_9$. The stabiliser is the subgroup generated by transposition
by a minor third:</p>
$$\mathrm{Stab}(\mathrm{Mode\ 2}) = \{T_0, T_3, T_6, T_9\} \cong \mathbb{Z}_4.$$<p>The orbit size:</p>
$$|\mathrm{Orb}(\mathrm{Mode\ 2})| = \frac{12}{4} = 3.$$<p>There are exactly three distinct octatonic scales. Composers from Rimsky-Korsakov
and Bartók to Coltrane have exploited this closed system. The three scales
correspond to the three cosets of the subgroup $\langle T_3 \rangle$ in
$\mathbb{Z}_{12}$: the cosets $\{0, 3, 6, 9\}$, $\{1, 4, 7, 10\}$, and
$\{2, 5, 8, 11\}$ are the &ldquo;starting-point classes&rdquo; that generate each scale.
Note that the scales themselves are not pairwise disjoint — each has eight
pitch classes, so any two share four — but the coset structure determines
which transpositions produce the same scale and which produce a different one.</p>
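<p>Both counts — two whole-tone scales, three octatonic scales — can be confirmed by brute force. A self-contained Python sketch (my own check):</p>

```python
def stab(s):
    # symmetry transpositions of a pitch-class set
    return [n for n in range(12) if {(x + n) % 12 for x in s} == set(s)]

def n_transpositions(s):
    # size of the orbit under all twelve transpositions
    return len({frozenset((x + n) % 12 for x in s) for n in range(12)})

whole_tone = {0, 2, 4, 6, 8, 10}
octatonic = {0, 1, 3, 4, 6, 7, 9, 10}

assert stab(whole_tone) == [0, 2, 4, 6, 8, 10]   # stabiliser of order 6
assert n_transpositions(whole_tone) == 2         # two whole-tone scales

assert stab(octatonic) == [0, 3, 6, 9]           # stabiliser of order 4
assert n_transpositions(octatonic) == 3          # three octatonic scales
```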
<hr>
<h2 id="the-subgroup-lattice-and-all-seven-modes">The Subgroup Lattice and All Seven Modes</h2>
<p>The orbit–stabiliser theorem constrains which stabiliser sizes are algebraically
possible. Since $\mathrm{Stab}(S)$ is a subgroup of $\mathbb{Z}_{12}$, its order
must divide 12. The <em>proper non-trivial</em> subgroups of $\mathbb{Z}_{12}$ — those
with order strictly between 1 and 12 — are precisely:</p>
<table>
  <thead>
      <tr>
          <th>Subgroup</th>
          <th>Generator</th>
          <th>Order</th>
          <th>Orbit size</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>$\langle T_2 \rangle = \{T_0, T_2, T_4, T_6, T_8, T_{10}\}$</td>
          <td>$T_2$</td>
          <td>6</td>
          <td>2</td>
      </tr>
      <tr>
          <td>$\langle T_3 \rangle = \{T_0, T_3, T_6, T_9\}$</td>
          <td>$T_3$</td>
          <td>4</td>
          <td>3</td>
      </tr>
      <tr>
          <td>$\langle T_4 \rangle = \{T_0, T_4, T_8\}$</td>
          <td>$T_4$</td>
          <td>3</td>
          <td>4</td>
      </tr>
      <tr>
          <td>$\langle T_6 \rangle = \{T_0, T_6\}$</td>
          <td>$T_6$</td>
          <td>2</td>
          <td>6</td>
      </tr>
  </tbody>
</table>
<p>These four subgroups exist because the proper divisors of 12 that are greater
than 1 are exactly $\{2, 3, 4, 6\}$. The subgroups of $\mathbb{Z}_n$ are in
bijection with the divisors of $n$ — a consequence of the fundamental theorem of
cyclic groups. Since $12 = 2^2 \times 3$, the proper divisors are $1, 2, 3, 4,
6$.</p>
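<p>The table can be regenerated from the divisors alone. A minimal Python sketch (mine) computing the order of each cyclic subgroup $\langle T_g \rangle$ and the orbit size it forces:</p>

```python
from math import gcd

def subgroup_order(g):
    """Order of the cyclic subgroup of Z12 generated by T_g."""
    return 12 // gcd(g, 12)

# Generators of the four proper non-trivial subgroups of Z12.
for g in (2, 3, 4, 6):
    order = subgroup_order(g)
    print(f"<T_{g}>: order {order}, orbit size {12 // order}")
```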
<p>Each row of the table maps onto a level in Messiaen&rsquo;s system:</p>
<ul>
<li><strong>Mode 1</strong> (whole-tone scale): stabiliser $\langle T_2 \rangle$, 2 transpositions</li>
<li><strong>Mode 2</strong> (octatonic scale): stabiliser $\langle T_3 \rangle$, 3 transpositions</li>
<li><strong>Mode 3</strong>: stabiliser $\langle T_4 \rangle$, 4 transpositions</li>
<li><strong>Modes 4 – 7</strong>: stabiliser $\langle T_6 \rangle$, 6 transpositions each</li>
</ul>
<p>The subgroup lattice of $\mathbb{Z}_{12}$ — its Hasse diagram of containment
relationships — maps directly onto the hierarchy of Messiaen&rsquo;s modes. The more
symmetric the stabiliser subgroup, the fewer distinct transpositions the mode
admits.</p>
<p>The containment relations are: $\langle T_2 \rangle \supset \langle T_4 \rangle$
and $\langle T_2 \rangle \supset \langle T_6 \rangle$ and
$\langle T_3 \rangle \supset \langle T_6 \rangle$. Correspondingly, Mode 1
(stabiliser $\langle T_2 \rangle$, order 6) is &ldquo;more limited&rdquo; than Mode 3
(stabiliser $\langle T_4 \rangle$, order 3), in the sense that $\langle T_4
\rangle \subset \langle T_2 \rangle$: every symmetry transposition of Mode 3 is
also a symmetry transposition of Mode 1.</p>
<hr>
<h2 id="why-exactly-seven-modes">Why Exactly Seven Modes?</h2>
<p>Messiaen was not enumerating all pitch-class sets with non-trivial stabilisers —
there are many more than seven. At the level of the stabiliser $\langle T_6
\rangle$, for example, there are numerous pitch-class sets invariant under the
tritone transposition $T_6$: any set $S$ such that $S = S + 6$ qualifies. Some
of these sets are large (ten pitch classes), some are small (two pitch classes),
some are musically coherent and some are not.</p>
<p>Messiaen selected seven that he found aesthetically and compositionally viable:
scales of moderate cardinality, with a balance of interval types, that he could
use as raw material for his harmonic language. The group theory explains the
<em>constraint</em> (modes are possible only at the four stabiliser types listed above),
not the <em>selection</em> (which specific sets Messiaen chose among the many that
satisfy the constraint).</p>
<p>The question &ldquo;why seven?&rdquo; is therefore partly combinatorial and partly
compositional. What is group-theoretically determined is the number of <em>levels</em>
(four: orbit sizes 2, 3, 4, 6) and the <em>impossibility</em> of any mode with, say,
five distinct transpositions (since 5 does not divide 12).</p>
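<p>&ldquo;Many more than seven&rdquo; is easy to quantify. A brute-force Python sketch (mine; the count includes the empty set and the full chromatic aggregate):</p>

```python
from itertools import combinations

def count_invariant(n):
    """Number of pitch-class sets S with T_n(S) = S."""
    return sum(
        1
        for size in range(13)
        for s in combinations(range(12), size)
        if {(x + n) % 12 for x in s} == set(s)
    )

# T_n partitions Z12 into gcd(n, 12) orbits, and an invariant set must be
# a union of whole orbits, so the count is 2 ** gcd(n, 12).
assert count_invariant(6) == 64   # tritone-symmetric sets
assert count_invariant(3) == 8    # minor-third-symmetric sets
```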
<hr>
<h2 id="what-messiaen-knew--and-did-not-know">What Messiaen Knew — and Did Not Know</h2>
<p>Messiaen described his modes in <em>Technique de mon langage musical</em> (1944). His
account is entirely musical and phenomenological. He lists each mode by its
interval sequence, notes how many transpositions it admits, and names the
limitation a &ldquo;charm.&rdquo; The impossibility is for him a spiritual property, a form
of harmonic stasis that he associated — as a devout Catholic — with divine
eternity. A mode that cannot depart is, in his compositional theology, a glimpse
of the unchanging.</p>
<p>He was not doing group theory. The mathematics itself long predates him: the
orbit–stabiliser theorem descends from Lagrange (1771), Cauchy (early 19th
century), and Galois (1832). But
the concepts were not part of music-theoretic discourse until Milton Babbitt&rsquo;s
work in the 1950s, and they were not formalised in the pitch-class set framework
I have used here until Allen Forte&rsquo;s <em>The Structure of Atonal Music</em> (1973) and
David Lewin&rsquo;s <em>Generalized Musical Intervals and Transformations</em> (1987).</p>
<p>What Messiaen had was a musician&rsquo;s ear for symmetry. He could <em>hear</em> that the
modes were closed, without having the algebraic vocabulary to explain why. The
group theory shows that he was correct, and why he was correct with a precision
that no amount of phenomenological description could provide.</p>
<hr>
<h2 id="from-messiaen-to-lewin">From Messiaen to Lewin</h2>
<p>Lewin&rsquo;s transformational theory (1987) generalises the $\mathbb{Z}_{12}$ framework
to arbitrary musical spaces. A <em>Generalized Interval System</em> is a triple
$(S, G, \mathrm{int})$ where $S$ is a set of musical objects, $G$ is a group, and
$\mathrm{int} : S \times S \to G$ assigns an interval to each ordered pair of
objects in a way that is consistent with the group structure.</p>
<p>This framework treats musical transformations — not just pitch-class transpositions
but rhythmic augmentations, timbral shifts, any structurally defined operation —
as elements of a group. The mathematics does not privilege any particular musical
parameter; it applies wherever a transformation group acts on a set of musical
objects.</p>
<p>Neo-Riemannian theory, which emerged from Lewin&rsquo;s work in the 1980s and 1990s
and was systematised by Cohn (1998), applies this framework to triadic
transformations (the operations L, P, and R that map major and minor triads to
their relatives, parallels, and leading-tone exchanges). The group generated by
L, P, and R on the set of 24 major and minor triads is isomorphic to $D_{12}$
— the same dihedral group that governs Messiaen&rsquo;s modes, but acting on a
different musical space.</p>
<p>Emmanuel Amiot&rsquo;s more recent work (2016) applies the discrete Fourier transform
to pitch-class sets, using the DFT coefficients on $\mathbb{Z}_{12}$ as a
continuous measure of a set&rsquo;s similarity to the modes of limited transposition.
The Fourier coefficients detect the algebraic symmetries that stabilisers measure
discretely: a set with large coefficient at frequency $k$ (in the DFT over
$\mathbb{Z}_{12}$) is close, in a precise sense, to having the stabiliser
$\langle T_{12/k} \rangle$.</p>
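<p>The detection is easy to reproduce numerically. A standard-library Python sketch (my own check, not Amiot&rsquo;s implementation) of the DFT magnitudes for the whole-tone set:</p>

```python
import cmath

def dft_mag(s, k):
    """Magnitude of the k-th Fourier coefficient of the set s on Z12."""
    return abs(sum(cmath.exp(-2j * cmath.pi * k * x / 12) for x in s))

whole_tone = {0, 2, 4, 6, 8, 10}
mags = [round(dft_mag(whole_tone, k), 6) for k in range(12)]

# Exact invariance under T_2 = T_{12/6} shows up as all the spectral
# "energy" sitting at frequencies 0 and 6; every other coefficient vanishes.
assert mags[6] == 6.0
assert all(mags[k] == 0.0 for k in range(1, 12) if k != 6)
```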
<p>The group-theoretic perspective has moved, over seventy years, from a marginal
curiosity to the dominant mathematical framework in music theory. Messiaen&rsquo;s
modes — which once seemed like personal compositional idiosyncrasies — are
revealed as structurally constrained: the possible stabiliser orders are fixed
by the divisors of 12, and the orbit sizes that Messiaen&rsquo;s ear discovered are
exactly those that Lagrange&rsquo;s theorem permits. Many pitch-class sets have
non-trivial stabilisers; Messiaen found the seven that are musically viable.
Their limitation is not a personal choice but an algebraic fact.</p>
<p>The charm of impossibilities is a theorem of group theory. And it is exactly as
beautiful as Messiaen heard it to be.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li>
<p>Amiot, E. (2016). <em>Music Through Fourier Space: Discrete Fourier Transform in
Music Theory.</em> Springer (Computational Music Science).</p>
</li>
<li>
<p>Babbitt, M. (1960). Twelve-tone invariants as compositional determinants.
<em>The Musical Quarterly</em>, 46(2), 246–259.
<a href="https://doi.org/10.1093/mq/XLVI.2.246">https://doi.org/10.1093/mq/XLVI.2.246</a></p>
</li>
<li>
<p>Cohn, R. (1998). Introduction to neo-Riemannian theory: A survey and a
historical perspective. <em>Journal of Music Theory</em>, 42(2), 167–180.
<a href="https://doi.org/10.2307/843871">https://doi.org/10.2307/843871</a></p>
</li>
<li>
<p>Forte, A. (1973). <em>The Structure of Atonal Music.</em> Yale University Press.</p>
</li>
<li>
<p>Lewin, D. (1987). <em>Generalized Musical Intervals and Transformations.</em> Yale
University Press. (Reissued Oxford University Press, 2007.)</p>
</li>
<li>
<p>Messiaen, O. (1944). <em>Technique de mon langage musical.</em> Alphonse Leduc.
(English translation: Satterfield, J., 1956.)</p>
</li>
<li>
<p>Tymoczko, D. (2006). The geometry of musical chords. <em>Science</em>, 313(5783),
72–74. <a href="https://doi.org/10.1126/science.1126287">https://doi.org/10.1126/science.1126287</a></p>
</li>
<li>
<p>Tymoczko, D. (2011). <em>A Geometry of Music: Harmony and Counterpoint in the
Extended Common Practice.</em> Oxford University Press.</p>
</li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-01-14</strong>: Changed &ldquo;cosets of $D_{12}$&rdquo; to &ldquo;cosets of $\mathbb{Z}_{12}$ (the transposition subgroup)&rdquo; in the twelve-tone composition paragraph. $D_{12}$ (order 24) already includes both transpositions and inversions, yielding only 2 cosets in the full serial group. The four row forms {P, I, R, RI} correspond to 4 cosets of the transposition-only subgroup $\mathbb{Z}_{12}$ (order 12) in the full group of order 48.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Nobody Is Normal, Nobody Is Sick: A Roast of a Well-Meaning Slogan</title>
      <link>https://sebastianspicker.github.io/posts/nobody-is-normal-psychiatric-slogan-roast/</link>
      <pubDate>Sat, 18 Feb 2023 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/nobody-is-normal-psychiatric-slogan-roast/</guid>
      <description>&amp;ldquo;Aus der Nähe betrachtet ist keiner normal.&amp;rdquo; The slogan of a Sozialpsychiatrisches Zentrum sounds compassionate. It is, under scrutiny, a gift to everyone who has ever said &amp;ldquo;but everyone gets depressed sometimes.&amp;rdquo; It attacks a concept psychiatry abandoned decades ago, dilutes the clinical categories people with severe conditions need to be taken seriously, and — most ironically — argues against the relevance of its own institution. A Karneval roast, with citations.</description>
      <content:encoded><![CDATA[<h2 id="tldr">TL;DR</h2>
<ul>
<li>
<p><strong>The slogan</strong>: <em>&ldquo;Aus der Nähe betrachtet ist keiner normal&rdquo;</em> — roughly, &ldquo;Up close,
nobody is normal.&rdquo; Displayed at a <em>Sozialpsychiatrisches Zentrum</em> to reduce
stigma around mental illness.</p>
</li>
<li>
<p><strong>What it gets right</strong>: stigma around psychiatric conditions is real and harmful.
The slogan&rsquo;s <em>intention</em> is correct.</p>
</li>
<li>
<p><strong>What it gets catastrophically wrong</strong>:</p>
<ul>
<li>It conflates statistical normality (deviation from average) with clinical
significance (harmful dysfunction). These are different concepts, and
modern psychiatric nosology uses the second, not the first.</li>
<li>&ldquo;Nobody is normal&rdquo; is exactly the argument people use to dismiss depression,
OCD, and anxiety as not-real-illness. Lending it institutional authority from
a psychiatric centre is counterproductive.</li>
<li>Psychologist Nick Haslam calls the underlying mechanism &ldquo;concept creep&rdquo;:
stretching clinical concepts until they cover everyone paradoxically devalues
them for the people who actually need them.</li>
<li>The anti-stigma research literature does not robustly support the
normalisation framing. Evidence is mixed, sometimes running in the wrong
direction.</li>
<li>A psychiatric centre whose slogan implies that the normal/abnormal
distinction is arbitrary has implicitly argued against the relevance of
its own services.</li>
</ul>
</li>
<li>
<p><strong>Analogous translation</strong>: &ldquo;Aus der Nähe betrachtet hat keiner ein normales Herz.&rdquo;
Up close, nobody has a normal heart. This is technically true. It does not
help people with cardiac disease. Neither does the original.</p>
</li>
<li>
<p><strong>What would actually help</strong>: affirming that psychiatric conditions are <em>real</em>,
<em>treatable</em>, and <em>do not define the whole person</em> — without dissolving the
conceptual distinction on which clinical care depends.</p>
</li>
</ul>
<hr>
<h2 id="the-slogan-and-what-it-wants">The Slogan and What It Wants</h2>
<p><em>Sozialpsychiatrische Zentren</em> — community psychiatric centres in
German-speaking countries — do important work: outreach, supported housing,
day programmes, a bridge between acute inpatient care and independent living.
The stigma around mental illness is real, persistent, and measurably harmful.
Tackling it is legitimate and necessary.</p>
<p>The slogan &ldquo;Aus der Nähe betrachtet ist keiner normal&rdquo; is designed to
contribute to that project. The implicit argument: the line between &ldquo;normal&rdquo;
and &ldquo;mentally ill&rdquo; is blurry. Everyone has quirks, struggles, peculiarities.
&ldquo;Normal&rdquo; is a fiction. Therefore: don&rsquo;t stigmatise people with psychiatric
diagnoses, because they are no different in kind from everyone else.</p>
<p>This sounds compassionate. It sounds inclusive. It sounds like the kind of
thing a thoughtful person would print on a poster.</p>
<p>It is precisely the wrong thing to say — and in a way that causes active
damage to the people it is trying to help.</p>
<hr>
<h2 id="what-seems-fine-is-not-fine">What Seems Fine Is Not Fine</h2>
<p>Let me put it plainly before building the argument.</p>
<p>The slogan&rsquo;s logic: nobody is normal → the normal/abnormal distinction is
arbitrary → psychiatric diagnosis is arbitrary → people with diagnoses should
not be stigmatised.</p>
<p>The conclusion is correct. The route to it is a disaster.</p>
<p>The problem is not the destination. The problem is what the argument concedes
on the way: that psychiatric categories are essentially a matter of
perspective, that the distinction between clinical illness and ordinary human
variation dissolves under sufficiently close examination, that if you look
hard enough, everyone is mentally ill.</p>
<p>That last implication is the argument that has been used, for decades, to
dismiss people with genuine clinical conditions. <em>&ldquo;Everyone gets depressed
sometimes.&rdquo;</em> <em>&ldquo;Everyone is a bit OCD.&rdquo;</em> <em>&ldquo;Everyone gets anxious — have you
tried exercise?&rdquo;</em></p>
<p>The person deploying this framing usually believes they are being kind,
inclusive, normalising. What they are doing is removing the evidentiary
ground on which someone with major depressive disorder, or
obsessive-compulsive disorder, or generalised anxiety disorder stands when
they say: <em>I am ill. I need treatment. My condition is real.</em></p>
<p>The slogan borrows this structure and prints it on a poster. That a
psychiatric institution is doing it makes it worse, not better.</p>
<hr>
<h2 id="problem-1-the-wrong-target">Problem 1: The Wrong Target</h2>
<p>The first error is attacking a concept of &ldquo;normal&rdquo; that psychiatry itself
abandoned decades ago.</p>
<p>When the slogan says &ldquo;nobody is normal,&rdquo; it implies that psychiatric diagnosis
works by measuring deviation from some statistical average of human behaviour.
Sufficiently deviant equals disordered; not-too-deviant equals normal. Since
everyone deviates from the average in some direction, &ldquo;normal&rdquo; is an illusion.</p>
<p>This is a reasonable critique of a naive, 19th-century model of mental
illness. It is not a critique of modern psychiatric nosology.</p>
<p>Jerome Wakefield&rsquo;s influential 1992 analysis in the <em>American Psychologist</em>
argues that genuine mental disorder requires two components: <em>dysfunction</em> —
the failure of a psychological mechanism to perform its naturally selected
function — and <em>harm</em> — the dysfunction causes suffering or impairment to the
person (<a href="#ref-wakefield1992">Wakefield, 1992</a>). &ldquo;Harmful dysfunction,&rdquo; not
statistical deviance. You can be spectacularly unusual and not disordered.
You can be statistically common — depression affects roughly one in five
people over a lifetime — and severely ill.</p>
<p>The DSM-5 builds in a related safeguard: the <em>clinical significance
criterion</em>. For most diagnoses, the symptom cluster must cause &ldquo;clinically
significant distress or impairment in social, occupational, or other important
areas of functioning&rdquo; (<a href="#ref-dsm5">American Psychiatric Association,
2013</a>). High neuroticism, unusual ideation, eccentric behaviour —
none of these, on their own, constitute a disorder under this criterion. What
matters is whether the person is suffering and whether their functioning is
impaired.</p>
<p>Christopher Boorse, working from a biomedical angle, defined health in terms
of <em>species-typical functioning</em> — whether biological systems are doing what
they evolved to do (<a href="#ref-boorse1977">Boorse, 1977</a>). Boorse&rsquo;s formulation is
contested, but its core point holds: the relevant question is not &ldquo;is this
person similar to the average person&rdquo; but &ldquo;are this person&rsquo;s systems
performing their functions.&rdquo; These are very different questions.</p>
<p>The slogan attacks a straw man. Real psychiatric diagnosis — when done
well — is not in the business of pathologising deviation from a norm of
cheerfulness or orderliness or sociability. It is in the business of
identifying harmful dysfunction. The &ldquo;nobody is normal&rdquo; framing has no purchase
on that target.</p>
<hr>
<h2 id="problem-2-concept-creep-and-the-dilution-effect">Problem 2: Concept Creep and the Dilution Effect</h2>
<p>Nick Haslam, a psychologist at the University of Melbourne, has documented
what he calls &ldquo;concept creep&rdquo; — the progressive expansion of psychological
concepts (trauma, mental disorder, depression, bullying) to cover increasingly
mild instances of what they originally described
(<a href="#ref-haslam2016">Haslam, 2016</a>).</p>
<p>The expansion happens in two directions: <em>horizontal</em> (covering more types of
phenomena) and <em>vertical</em> (covering less severe instances). A concept of
&ldquo;trauma&rdquo; that originally required exposure to life-threatening events has
expanded to include ordinary life stressors. A concept of &ldquo;depression&rdquo; that
originally meant severe, impairing low mood has expanded toward ordinary
sadness.</p>
<p>Concept creep sounds inclusive. It is, in practice, a dilution. When
&ldquo;everyone is a bit depressed&rdquo; becomes institutionally sanctioned, the person
with major depressive disorder — who cannot get out of bed, who has not eaten
in three days, who is considering suicide — finds their claim to the label
contested. The clinical category loses its clinical weight precisely because
everyone is in it.</p>
<p>The slogan &ldquo;nobody is normal&rdquo; is concept creep in slogan form. By implying
that the clinical/non-clinical distinction is arbitrary, it weakens the
conceptual infrastructure on which clinical claims rest. This is not a
hypothetical harm. It is the mechanism by which a great deal of dismissal of
severe mental illness operates: not by claiming that mental illness doesn&rsquo;t
exist, but by claiming that everyone is a bit mentally ill, so what&rsquo;s the
problem, stop complaining.</p>
<p>Allen Frances, who chaired the DSM-IV task force and subsequently became a
sharp critic of diagnostic inflation, wrote a book (<em>Saving Normal</em>, 2013)
about the opposite problem: the expansion of diagnostic categories to
medicalise ordinary human variation
(<a href="#ref-frances2013">Frances, 2013</a>). Frances&rsquo;s worry and the slogan&rsquo;s argument
share a logical structure — &ldquo;the line between normal and disordered is blurry,
therefore the line is somewhat arbitrary&rdquo; — and both forget the same thing:
the people with the most severe, genuine, impairing psychiatric conditions
need that line to carry weight. Blur it enough and their most urgent claims
become indistinguishable from everyone else&rsquo;s minor struggles.</p>
<hr>
<h2 id="problem-3-what-the-anti-stigma-literature-actually-says">Problem 3: What the Anti-Stigma Literature Actually Says</h2>
<p>Does the &ldquo;we&rsquo;re all a bit X&rdquo; normalisation framing reliably reduce stigma?
The evidence is, at best, mixed.</p>
<p>Patrick Corrigan and David Penn&rsquo;s review of social-psychological approaches to
psychiatric stigma identifies a consistent risk in normalisation campaigns:
they can fail to distinguish between the ordinary distress that everyone
experiences and the clinical conditions that require treatment and support
(<a href="#ref-corrigan1999">Corrigan &amp; Penn, 1999</a>). When stigma reduction messaging
implies that psychiatric conditions are simply more-of-what-everyone-has, it
may reduce perceived severity and undermine motivation to support treatment
access.</p>
<p>Kvaale, Haslam, and Gottdiener&rsquo;s meta-analysis of biogenetic framings in
anti-stigma campaigns — which share structural features with the normalisation
approach — found paradoxical effects: reduced blame, yes, but sometimes
increased perceived dangerousness and greater social distance
(<a href="#ref-kvaale2013">Kvaale, Haslam, &amp; Gottdiener, 2013</a>). The &ldquo;we&rsquo;re all on
a spectrum&rdquo; variant has its own specific paradox: if nobody is normal, the
distinction that generates stigma dissolves — but so does the distinction
that generates <em>respect for people with serious conditions who need
real resources</em>. Both edges cut.</p>
<p>What the literature supports more robustly is <em>contact</em>: direct, positive
interaction with people who have experience of mental illness, presented as
whole persons and not primarily as patients. Contact works better than
educational campaigns about what mental illness is or isn&rsquo;t. The &ldquo;nobody is
normal&rdquo; poster is an educational campaign about what mental illness isn&rsquo;t. It
is probably less effective than a conversation.</p>
<hr>
<h2 id="problem-4-the-institutional-contradiction">Problem 4: The Institutional Contradiction</h2>
<p>There is a fourth problem, and I find it the most striking.</p>
<p>The slogan belongs to a <em>Sozialpsychiatrisches Zentrum</em> — an institution that
exists precisely because some people have psychiatric conditions that impair
their functioning and require dedicated support. Its implicit mission: there
is a meaningful distinction between people who need psychiatric services and
people who do not, and we provide those services for the former.</p>
<p>The slogan: nobody is normal.</p>
<p>If nobody is normal, then everybody is, in the relevant sense, a bit
psychiatrically ill. If the line between normal and not-normal is arbitrary,
then so is the line between people who need psychiatric services and people
who don&rsquo;t. If the category &ldquo;psychiatric condition requiring support&rdquo; is as
fuzzy as the slogan implies — a mere matter of proximity and perspective —
then why should anyone prioritise coming to this particular institution?</p>
<p>The slogan, taken seriously, argues against the relevance of its own
institution. A psychiatric centre has printed on its posters the claim that
psychiatric categories dissolve under close examination. This is an unusual
thing for a psychiatric centre to announce.</p>
<hr>
<h2 id="the-analogous-translation">The Analogous Translation</h2>
<p>Let me make the logical structure visible with a direct translation into
another field of medicine:</p>
<blockquote>
<p><em>&ldquo;Aus der Nähe betrachtet hat keiner ein normales Herz.&rdquo;</em></p>
<p><em>&ldquo;Up close, nobody has a normal heart.&rdquo;</em></p>
</blockquote>
<p>This is, in a technical sense, largely true. Cardiologists can find something
to remark on in almost any heart — a minor valve irregularity, some degree of
atherosclerosis past middle age, a benign arrhythmia, a structural variation
within the clinical reference range. Under sufficiently detailed examination,
the perfectly normal heart is a platonic ideal rather than a clinical reality.</p>
<p>Does this mean coronary artery disease doesn&rsquo;t exist? Does it mean myocardial
infarction is a matter of perspective or proximity? Does it mean that someone
waiting for a cardiac transplant should be reassured that, up close, nobody
has a normal heart, so they shouldn&rsquo;t worry too much about their own?</p>
<p>Obviously not. The clinical category of cardiac disease does not depend on
the existence of a perfectly normal heart. It depends on whether specific
mechanisms are failing in ways that cause harm — which is true for some
people and not for others, regardless of whether everyone has some minor
deviation from an idealised cardiovascular anatomy.</p>
<p>The slogan about psychiatric normalcy makes exactly the same error. The
clinical category of mental disorder does not depend on the existence of a
psychologically perfect human being. It depends on whether psychological
mechanisms are failing in ways that cause harm — which is true for some
people and not for others, regardless of whether everyone has quirks,
struggles, or eccentricities.</p>
<p>The heart analogy is also useful for what it reveals about whose interests
the slogan serves. &ldquo;Nobody has a normal heart&rdquo; would be printed, presumably,
to reassure people who feel embarrassed about their cardiac condition — to
say: you&rsquo;re not so different from anyone else. What it actually does is make
it harder for that person to say: <em>my heart is not functioning well, and
that is a real medical fact that deserves real medical attention.</em> The
compassionate intent and the practical effect run in opposite directions.</p>
<hr>
<h2 id="what-would-actually-help">What Would Actually Help</h2>
<p>The goal — reducing stigma against people with psychiatric conditions — is
correct and important. The approach — dissolving the category of &ldquo;normal&rdquo;
until psychiatric and non-psychiatric become indistinguishable — is not.</p>
<p>A more defensible anti-stigma argument goes: mental illness is <em>real</em>, it
involves genuine failures of psychological functioning, it causes genuine
suffering, and <em>none of that makes the person with it less worthy of respect,
resources, and full participation in society.</em> This is the position that
affirms both the reality of the condition and the humanity of the person.
It does not require denying the normal/abnormal distinction. It requires
insisting that the distinction does not carry the moral weight that stigma
assigns to it.</p>
<p>The difference between &ldquo;nobody is normal, so stop stigmatising&rdquo; and &ldquo;you can
be ill and still be a person of full worth&rdquo; sounds subtle. In practice, it is
enormous. The first removes the conceptual ground from under the people most
in need. The second leaves the ground intact while refusing to let it be used
as a weapon.</p>
<p><em>Psychisch krank — und trotzdem ganz.</em> Mentally ill — and still whole. Not:
nobody is normal. But: being ill doesn&rsquo;t make you less of a person. The second
slogan does not hand ammunition to the dismissers. The first one does.</p>
<hr>
<h2 id="karneval-coda">Karneval Coda</h2>
<p>It is Karneval. Everyone is wearing a mask.</p>
<p>The slogan &ldquo;Aus der Nähe betrachtet ist keiner normal&rdquo; is wearing a mask too:
the mask of tolerance, of radical inclusion, of refusing to pathologise
difference. Under the mask is a logical structure that, taken seriously, would
dissolve the evidentiary basis for psychiatric care, hand a slogan to everyone
who has ever told someone with depression that they just need to try harder,
and leave the people with the most severe conditions with one fewer
conceptual tool for insisting that their suffering is real, their need is
legitimate, and their claim on resources and support deserves to be taken
seriously.</p>
<p>The mask is well-intentioned. Karneval ends on Wednesday.
The poster will still be on the wall.</p>
<hr>
<h2 id="references">References</h2>
<ul>
<li><span id="ref-wakefield1992"></span>Wakefield, J. C. (1992). The concept of mental disorder: On the boundary between biological facts and social values. <em>American Psychologist</em>, 47(3), 373–388. <a href="https://doi.org/10.1037/0003-066X.47.3.373">DOI: 10.1037/0003-066X.47.3.373</a></li>
<li><span id="ref-dsm5"></span>American Psychiatric Association. (2013). <em>Diagnostic and Statistical Manual of Mental Disorders</em> (5th ed.). American Psychiatric Publishing. <a href="https://doi.org/10.1176/appi.books.9780890425596">DOI: 10.1176/appi.books.9780890425596</a></li>
<li><span id="ref-boorse1977"></span>Boorse, C. (1977). Health as a theoretical concept. <em>Philosophy of Science</em>, 44(4), 542–573. <a href="https://doi.org/10.1086/288768">DOI: 10.1086/288768</a></li>
<li><span id="ref-haslam2016"></span>Haslam, N. (2016). Concept creep: Psychology&rsquo;s expanding concepts of harm and pathology. <em>Psychological Inquiry</em>, 27(1), 1–17. <a href="https://doi.org/10.1080/1047840X.2016.1082418">DOI: 10.1080/1047840X.2016.1082418</a></li>
<li><span id="ref-frances2013"></span>Frances, A. (2013). <em>Saving Normal: An Insider&rsquo;s Revolt against Out-of-Control Psychiatric Diagnosis, DSM-5, Big Pharma, and the Medicalization of Ordinary Life</em>. HarperCollins.</li>
<li><span id="ref-corrigan1999"></span>Corrigan, P. W., &amp; Penn, D. L. (1999). Lessons from social psychology on discrediting psychiatric stigma. <em>American Psychologist</em>, 54(9), 765–776. <a href="https://doi.org/10.1037/0003-066X.54.9.765">DOI: 10.1037/0003-066X.54.9.765</a></li>
<li><span id="ref-kvaale2013"></span>Kvaale, E. P., Haslam, N., &amp; Gottdiener, W. H. (2013). The &lsquo;side effects&rsquo; of medicalization: A meta-analytic review of how biogenetic explanations affect stigma. <em>Clinical Psychology Review</em>, 33(6), 782–794. <a href="https://doi.org/10.1016/j.cpr.2013.06.002">DOI: 10.1016/j.cpr.2013.06.002</a></li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Spiral Out: Tool&#39;s Lateralus, the Fibonacci Sequence, and the Mathematics of Musical Structure</title>
      <link>https://sebastianspicker.github.io/posts/fibonacci-lateralus/</link>
      <pubDate>Tue, 08 Nov 2022 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/fibonacci-lateralus/</guid>
      <description>Alongside physics and astronomy, two other things have occupied an unreasonable share of my attention since adolescence: mathematics and music. Lateralus by Tool — released 2001, still in rotation — is the piece that most conspicuously occupies the intersection. The song is structurally built around the Fibonacci sequence, from the syllable counts in Maynard Keenan&amp;rsquo;s vocals to the time signature pattern that concatenates to F(16). This post works through the mathematics in some detail and asks why it works musically.</description>
      <content:encoded><![CDATA[<h2 id="two-passions-one-song">Two Passions, One Song</h2>
<p>Physics training means coming to mathematics as a tool before arriving at it as
an object of aesthetic interest, and it
took me longer than it should have to notice that a proof can be
beautiful in the same way a piece of music can be beautiful — not
despite its rigour but because of it. Both reward attention to
structure. Both have surfaces accessible to a casual listener and depths
that only reveal themselves when you look harder.</p>
<p>Lateralus, the title track of Tool&rsquo;s 2001 album, is a convenient case
study for the overlap. It is not the only piece of music built around
Fibonacci numbers — Bartók made the connection decades earlier, and it
appears in scattered places across Western and non-Western traditions —
but it is among the most thoroughly and deliberately constructed, and
the mathematical structure is audible rather than merely theoretical.</p>
<p>What follows is an attempt to do justice to both dimensions: the
mathematics of the Fibonacci sequence and the golden ratio, and the
musical mechanics of how those structures show up and what they do.</p>
<hr>
<h2 id="the-fibonacci-sequence">The Fibonacci Sequence</h2>
<p>The sequence is defined by a recurrence relation. Starting from the
initial values $F(1) = 1$ and $F(2) = 1$, each subsequent term is the
sum of the two preceding ones:</p>
$$F(n) = F(n-1) + F(n-2), \quad n \geq 3$$<p>This gives:</p>
$$1,\; 1,\; 2,\; 3,\; 5,\; 8,\; 13,\; 21,\; 34,\; 55,\; 89,\; 144,\; 233,\; 377,\; 610,\; \mathbf{987},\; 1597,\; \ldots$$<p>The term $987$ is the sixteenth Fibonacci number, $F(16)$. Keep that
in mind.</p>
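<p>The recurrence translates directly into code. A minimal sketch (the helper name <code>fib</code> is my own, not from the post or any library):</p>

```python
def fib(n: int) -> int:
    """n-th Fibonacci number with F(1) = F(2) = 1, via the recurrence."""
    a, b = 1, 1  # F(1), F(2)
    for _ in range(n - 2):
        a, b = b, a + b  # F(n) = F(n-1) + F(n-2)
    return b if n >= 2 else a

print(fib(16))  # -> 987
```

<p>Running it confirms that $F(16) = 987$.</p>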
<p>The recurrence can be encoded compactly in a matrix formulation. For
$n \geq 1$:</p>
$$\begin{pmatrix} F(n+1) \\ F(n) \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^n \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$<p>This is more than notational tidiness — it connects the Fibonacci
sequence to the eigenvalues of the matrix
$\mathbf{A} = \bigl(\begin{smallmatrix}1 & 1 \\ 1 & 0\end{smallmatrix}\bigr)$,
which are exactly $\varphi$ and $-1/\varphi$ where $\varphi$ is the
golden ratio. That connection gives us Binet&rsquo;s formula, a closed-form
expression for the $n$-th Fibonacci number:</p>
$$F(n) = \frac{\varphi^n - \psi^n}{\sqrt{5}}, \quad \varphi = \frac{1+\sqrt{5}}{2},\quad \psi = \frac{1-\sqrt{5}}{2} = -\frac{1}{\varphi}$$<p>Since $|\psi| < 1$, the term $\psi^n / \sqrt{5}$ diminishes rapidly,
and for large $n$ we have the convenient approximation:</p>
$$F(n) \approx \frac{\varphi^n}{\sqrt{5}}$$<p>This means Fibonacci numbers grow <em>exponentially</em>, at a rate governed by
the golden ratio. The sequence does not grow linearly or polynomially; it
spirals outward.</p>
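<p>Binet&rsquo;s formula and the recurrence can be checked against each other numerically; a sketch, assuming double-precision floats (adequate for moderate $n$):</p>

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio
PSI = (1 - math.sqrt(5)) / 2  # the conjugate root, equal to -1/PHI

def binet(n: int) -> int:
    """Closed form F(n) = (PHI**n - PSI**n) / sqrt(5), rounded to an integer."""
    return round((PHI**n - PSI**n) / math.sqrt(5))

def fib(n: int) -> int:
    a, b = 0, 1  # F(0), F(1)
    for _ in range(n):
        a, b = b, a + b
    return a

# Both routes agree over a comfortable range
assert all(binet(n) == fib(n) for n in range(1, 40))
print(binet(16))  # -> 987
```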
<hr>
<h2 id="the-golden-ratio">The Golden Ratio</h2>
<p>The golden ratio $\varphi$ appears as the limit of consecutive Fibonacci
ratios:</p>
$$\varphi = \lim_{n \to \infty} \frac{F(n+1)}{F(n)} = \frac{1+\sqrt{5}}{2} \approx 1.61803\ldots$$<p>It can be derived from a simple geometric proportion: divide a line
segment into a longer part $a$ and a shorter part $b$ such that the
ratio of the whole segment to the longer part equals the ratio of the
longer part to the shorter part. Calling that common ratio $r$:
$$\frac{a+b}{a} = \frac{a}{b} = r \implies r^2 - r - 1 = 0 \implies r = \frac{1+\sqrt{5}}{2} = \varphi$$<p>What makes $\varphi$ mathematically distinctive is its continued fraction
representation:</p>
$$\varphi = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}}}$$<p>This is the simplest possible infinite continued fraction. It is also, in
a precise sense, the <em>hardest</em> real number to approximate by rational
fractions. The convergents of a continued fraction are the best rational
approximations to a real number at each level of precision; the
convergents of $\varphi$ are exactly the ratios of consecutive Fibonacci
numbers: $1/1$, $2/1$, $3/2$, $5/3$, $8/5$, $13/8$, $\ldots$ These
converge more slowly to $\varphi$ than the convergents of any other
irrational number. $\varphi$ is, in this sense, maximally irrational.</p>
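<p>The convergents can be generated by truncating the continued fraction; a sketch in exact rational arithmetic (the function name <code>phi_convergent</code> is my own):</p>

```python
from fractions import Fraction

def phi_convergent(depth: int) -> Fraction:
    """Evaluate the continued fraction 1 + 1/(1 + 1/(1 + ...)) truncated at `depth`."""
    x = Fraction(1)
    for _ in range(depth):
        x = 1 + 1 / x
    return x

# Successive convergents are ratios of consecutive Fibonacci numbers
print([str(phi_convergent(d)) for d in range(1, 6)])
# -> ['2', '3/2', '5/3', '8/5', '13/8']
```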
<p>That property has a physical consequence. In botanical phyllotaxis — the
arrangement of leaves, seeds, and petals on plants — structures that grow
by adding new elements at a fixed angular increment will pack most
efficiently when that increment is as far as possible from any rational
fraction of a full rotation. The optimal angle is:</p>
$$\theta = \frac{2\pi}{\varphi^2} \approx 137.508°$$<p>This is the <em>golden angle</em>, and it is the reason sunflower seed spirals
count $55$ and $89$ (consecutive Fibonacci numbers) in their two
counter-rotating sets. The mathematics of efficient growth in nature
and the mathematics of the Fibonacci sequence are the same mathematics.</p>
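<p>The golden angle follows from $\varphi$ in two lines; a quick numerical sketch:</p>

```python
import math

phi = (1 + math.sqrt(5)) / 2
golden_angle = math.degrees(2 * math.pi / phi**2)  # optimal packing increment

print(f"{golden_angle:.3f} degrees")  # -> 137.508 degrees
```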
<p>The golden spiral — the logarithmic spiral whose growth factor per
quarter turn is $\varphi$ — is the visual representation of this: it
is self-similar, expanding without bound while maintaining constant
proportionality.</p>
<hr>
<h2 id="fibonacci-numbers-in-music-before-tool">Fibonacci Numbers in Music: Before Tool</h2>
<p>The connection between the Fibonacci sequence and musical structure is
not Tool&rsquo;s invention. The most carefully documented case is Béla
Bartók, whose Music for Strings, Percussion and Celesta (1936) has been
analysed exhaustively by Ernő Lendvai. In the first movement, the
climax arrives at bar 55 (a Fibonacci number), and Lendvai counted the
overall structure as 89 bars — the score has 88, but he added an implied
final rest bar to reach the Fibonacci number — dividing at bar 55 with
near-mathematical precision. Lendvai argued that Bartók consciously embedded Fibonacci
proportions into formal structure, tonal architecture, and thematic
development throughout much of his output.</p>
<p>Whether these proportions were conscious design or an instinct that
selected naturally resonant proportions is contested. The same question
applies to claims about Mozart and Chopin. What is more defensible is
a structural observation about the piano keyboard and Western scales
that requires no attribution of intent:</p>
<p>A single octave on the piano keyboard has <strong>13 keys</strong>, comprising <strong>8
white keys</strong> and <strong>5 black keys</strong>. The black keys are grouped as <strong>2</strong>
and <strong>3</strong>. The numbers $2, 3, 5, 8, 13$ are five consecutive Fibonacci
numbers — $F(3)$ through $F(7)$.</p>
<p>The standard Western scales make this concrete. The major scale
contains <strong>7 distinct pitches</strong> within an octave of <strong>12 semitones</strong>.
The pentatonic scale (ubiquitous in folk, blues, rock) contains <strong>5</strong>
pitches. The chromatic scale contains <strong>12</strong> pitch classes per octave;
counting both endpoints of the octave (C to C) gives <strong>13</strong> chromatic
notes, the next Fibonacci number.</p>
<p>Harmonic intervals in just intonation are rational approximations of
simple frequency ratios: the octave (2:1), the perfect fifth (3:2),
the perfect fourth (4:3), the major third (5:4), the minor third (6:5).
The numerators and denominators are small integers, often Fibonacci
numbers or their neighbours. The major triad — the structural foundation
of tonal Western music — consists of pitches in the frequency ratio
$4:5:6$, three consecutive small integers, one of which ($5$) is itself
a Fibonacci number.</p>
<p>This does not mean that Western music is secretly Fibonacci. It means
that the integer frequency ratios that produce consonant intervals are
the small integers, and small integers include the small Fibonacci
numbers. The connection is genuine but not exclusive.</p>
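<p>The interval ratios listed above can be tabulated exactly; a sketch in just intonation, taking $A4 = 440$ Hz as an arbitrary reference pitch (my choice, not the post&rsquo;s):</p>

```python
from fractions import Fraction

A4 = 440.0  # reference pitch in Hz
intervals = {
    "octave":         Fraction(2, 1),
    "perfect fifth":  Fraction(3, 2),
    "perfect fourth": Fraction(4, 3),
    "major third":    Fraction(5, 4),
    "minor third":    Fraction(6, 5),
}
for name, ratio in intervals.items():
    print(f"{name:>14}: {A4 * ratio:7.1f} Hz")

# The major triad in ratio 4:5:6, i.e. root : major third : perfect fifth
triad_hz = [A4 * r for r in (Fraction(1), Fraction(5, 4), Fraction(3, 2))]
# -> [440.0, 550.0, 660.0]
```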
<hr>
<h2 id="lateralus">Lateralus</h2>
<p>Tool&rsquo;s <em>Lateralus</em> (2001, album of the same name) is unusual in that
the Fibonacci construction is not an analytical inference applied after
the fact — it was discussed publicly by the band. Drummer Danny Carey has
spoken about his engagement with sacred geometry and mathematical
structure, and the song&rsquo;s construction has been described as intentional
by multiple band members.</p>
<p>There are two primary levels of Fibonacci structure in the song. The
third — the thematic content of the lyrics — makes the mathematical
frame explicit.</p>
<h3 id="the-syllable-count">The Syllable Count</h3>
<p>The opening verses are constructed so that successive lines contain
syllable counts following the Fibonacci sequence ascending:
$1, 1, 2, 3, 5, 8, 13$. The first line is a single syllable. The
second is another single syllable. The third is a two-syllable phrase.
The sequence continues, each line carrying the combined weight of the
previous two, until the thirteen-syllable line, which in structure and
delivery feels like the crest of a wave.</p>
<p>The second half of the verse then descends: $13, 8, 5, 3, 2, 1, 1$.
Or, in some analyses, the chorus and pre-chorus sections begin a new
ascending Fibonacci run before the full descent, creating a nested
structure of expansions and contractions.</p>
<p>The audible effect of this design is not arbitrary. A sequence of lines
whose syllable counts follow $1, 1, 2, 3, 5, 8, 13$ creates a
consistently accelerating density of text over the same musical time.
The vocal line becomes more compressed as the syllable count rises,
building tension — and then the descent releases it. This is not how
most pop or rock lyrics are structured. It produces a breathing,
organic quality, the way a plant reaches toward light.</p>
<h3 id="the-time-signature-987">The Time Signature: 987</h3>
<p>The verse sections of the song cycle through three time signatures in
succession: $9/8$, then $8/8$, then $7/8$.</p>
$$9/8 + 8/8 + 7/8$$<p>This three-bar pattern repeats. Now: the sequence of numerators is $9$,
$8$, $7$. Written as a three-digit number: <strong>987</strong>. And as noted above,
$987 = F(16)$, the sixteenth Fibonacci number.</p>
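<p>The concatenation claim is a one-liner to check; a sketch (again with a hand-rolled <code>fib</code> helper of my own naming):</p>

```python
def fib(n: int) -> int:
    a, b = 0, 1  # F(0), F(1)
    for _ in range(n):
        a, b = b, a + b
    return a

numerators = [9, 8, 7]  # the verse cycle 9/8 + 8/8 + 7/8
assert int("".join(map(str, numerators))) == 987 == fib(16)

print(sum(numerators))  # -> 24 (eighth notes per full cycle)
```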
<p>Whether this is a deliberate encoding or a remarkable coincidence is a
matter of interpretation. The time signature sequence is definitely
deliberate — asymmetric meters of this kind require careful compositional
choice. The fact that their numerators concatenate to a Fibonacci number
is either intentional and clever or accidental and still remarkable.
Either way, the time signature pattern has a musical function independent
of the Fibonacci reading.</p>
<p>In standard rock, time is almost always $4/4$: four even beats per bar,
a pulse that is maximally predictable and maximally amenable to groove.
The $9/8 + 8/8 + 7/8$ pattern is the opposite. Each bar has a different
length. The listener&rsquo;s internal metronome, calibrated to $4/4$, cannot
lock onto the pattern. The music generates forward momentum not through
a repeated downbeat but through the continuous, non-periodic unfolding
of measures whose lengths shift. This is the rhythmic analogue of a
spiral: no two revolutions are identical in length, but the growth is
consistent.</p>
<p>The chorus and other sections use different time signatures, including
stretches in $5/8$ and $7/8$. Of those numerators, $5$ is a Fibonacci
number; $7$ is not, though it preserves the asymmetric, odd-metre
character of the verse pattern.</p>
<h3 id="the-thematic-content">The Thematic Content</h3>
<p>The lyrics are explicitly about spirals, Fibonacci growth, and the
experience of reaching beyond a current state of development. They
reference the idea of expanding one&rsquo;s perception outward through
accumulating cycles, each containing and exceeding the previous one.
The chorus refrain — about spiralling outward — names the mathematical
structure of the golden spiral directly. The song is, in its own terms,
about the process that the mathematics describes.</p>
<p>This kind of thematic coherence between structure and content is what
makes the construction interesting rather than merely clever. The
Fibonacci structure is not decorative. It is the argument of the song
made manifest in its form.</p>
<hr>
<h2 id="why-fibonacci-structure-works-in-music">Why Fibonacci Structure Works in Music</h2>
<p>The most interesting question is not whether the Fibonacci structure is
there — it clearly is — but why it produces the musical effect it does.</p>
<p>Consider what the Fibonacci sequence represents physically. It is the
growth law of structures that build on their own preceding state:
$F(n) = F(n-1) + F(n-2)$. Unlike arithmetic growth (add a constant)
or geometric growth (multiply by a constant), Fibonacci growth is
<em>self-referential</em>. Each term contains the memory of the previous two.
The sequence is expansive but not uniform; it accelerates, but always
in proportion to what came before.</p>
<p>Musical tension and release are, in an important sense, the same
mechanism. A phrase creates an expectation; its continuation either
confirms or subverts that expectation; resolution reduces the tension.
What makes a musical phrase feel like it is building toward something
is precisely the progressive accumulation of expectation — each bar
adding its weight to the previous, the accumulated tension requiring
resolution at a scale proportional to the build-up. The Fibonacci
syllable structure in Lateralus generates this literally: each line
carries as many syllables as the previous two combined, compressing the
delivery further with every line, until the structure has to breathe.</p>
<p>The time signature asymmetry works similarly. In $4/4$, the beat is
predictable, and the listener&rsquo;s body can lock to it and then coast on
that lock. In $9/8 + 8/8 + 7/8$, the beat is never fully locked — the
pattern is periodic (it repeats) but the internal structure of each
repetition is shifting. The listener is perpetually catching up,
perpetually leaning slightly into the music to find the next downbeat.
This is not discomfort; it is engagement. The mathematical reason is
that the full cycle is long enough to defeat the listener&rsquo;s default
$4/4$ expectation but short enough to be audible as a single unit. The
brain can learn the 24-beat super-pattern ($9 + 8 + 7$ eighth notes); it
just requires attention that $4/4$ does not.</p>
<p>There is a deeper reason why golden-ratio proportions feel right in
musical form. The golden section of a piece — the point at which the
piece divides in the $\varphi : 1$ ratio — is the point of maximum
accumulated development before the final resolution. In a five-minute
piece, the golden section falls at roughly 3:05. This is, empirically,
where the emotional and structural climax tends to sit in a wide range
of well-regarded music, from Baroque to jazz. Whether composers
consciously target this proportion or whether the proportion is what
accumulated development looks like when done well is not easily
separable. But the mathematical reason it is <em>a</em> proportion worth
targeting is that $\varphi$ is the only division point that is
self-similar: the ratio of the whole to the longer part equals the ratio
of the longer part to the shorter part. There is no arbitrary scale
associated with the golden section; it is scale-invariant, the same
proportion at every level of analysis.</p>
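<p>The golden section of a piece is trivial to compute; a minimal sketch (not from the original post, just the arithmetic above):</p>

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, ~1.618

def golden_section(duration_s: float) -> float:
    """Time of the phi-division point: whole/longer = longer/shorter."""
    return duration_s / PHI

def as_min_sec(t: float) -> str:
    """Format seconds as m:ss for reading off a timeline."""
    m, s = divmod(round(t), 60)
    return f"{m}:{s:02d}"

# A five-minute (300 s) piece divides at ~185.4 s.
print(as_min_sec(golden_section(300)))  # 3:05
```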
<hr>
<h2 id="a-brief-note-on-binet-and-limits">A Brief Note on Binet and Limits</h2>
<p>The closed-form expression for Fibonacci numbers,</p>
$$F(n) = \frac{\varphi^n - \psi^n}{\sqrt{5}},$$<p>has a pleasing consequence for large $n$. Since $|\psi| \approx 0.618 < 1$,
the term $\psi^n \to 0$, and $F(n)$ is simply the nearest integer to
$\varphi^n / \sqrt{5}$. The integers produced by the Fibonacci recurrence
are the integers that $\varphi^n / \sqrt{5}$ passes closest to. The
exponential growth of $\varphi^n$ and the rounding to integers together
give the sequence.</p>
<p>This is also why the ratios $F(n+1)/F(n)$ converge to $\varphi$
exponentially fast — the error is $\mathcal{O}(|\psi/\varphi|^n)
= \mathcal{O}(\varphi^{-2n})$ — and why, for musical purposes, the
Fibonacci ratios $8:5$, $13:8$, $21:13$ are already excellent
approximations of the golden ratio, close enough that the ear cannot
distinguish them from $\varphi$ in any direct sense.</p>
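<p>Both claims — the nearest-integer property and the fast convergence of the ratios — are cheap to check numerically. A quick sketch, assuming nothing beyond the formulas above:</p>

```python
import math

SQRT5 = math.sqrt(5)
PHI = (1 + SQRT5) / 2  # phi ~  1.618
PSI = (1 - SQRT5) / 2  # psi ~ -0.618

def fib(n: int) -> int:
    """Fibonacci by the recurrence F(n) = F(n-1) + F(n-2)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Binet: since |psi|^n -> 0, F(n) is the nearest integer to phi^n / sqrt(5).
for n in range(1, 30):
    assert fib(n) == round(PHI**n / SQRT5)

# The ratios F(n+1)/F(n) approach phi exponentially fast;
# 8:5, 13:8, 21:13 are already within a few thousandths.
for n in (5, 6, 7):
    print(fib(n + 1), "/", fib(n), "error:", abs(fib(n + 1) / fib(n) - PHI))
```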
<hr>
<h2 id="what-lateralus-is">What Lateralus Is</h2>
<p><em>Lateralus</em> is not a math lecture set to music. It is a nine-minute
progressive metal track that is physically involving, rhythmically
complex, and lyrically coherent. The Fibonacci structure would be
worthless if the song were not also, on purely musical terms, good.</p>
<p>What the mathematics adds is a vocabulary for something the song achieves
anyway: the sense of growing without ever arriving, of each section being
both a resolution of what came before and an opening toward something
larger. The golden spiral does not end. The Fibonacci sequence does not
converge. The song does not resolve in the sense that a classical sonata
resolves; it spirals to a close.</p>
<p>The reason this is worth writing about is that it makes concrete a
connection that is usually stated vaguely: mathematics and music are
similar. They are similar in specific and articulable ways. The
self-referential structure of the Fibonacci recurrence, the
scale-invariance of the golden ratio, the information-theoretic account of
tension and expectation — these are not metaphors for musical experience.
They are, in this case, the actual mechanism.</p>
<hr>
<h2 id="references">References</h2>
<p>Lendvai, E. (1971). <em>Béla Bartók: An Analysis of His Music.</em> Kahn &amp;
Averill.</p>
<p>Benson, D. J. (2006). <em>Music: A Mathematical Offering.</em> Cambridge
University Press. <em>(For an introduction to the general theory of tuning,
temperament, and harmonic series.)</em></p>
<p>Tool. (2001). <em>Lateralus.</em> Volcano Records.</p>
<p>Livio, M. (2002). <em>The Golden Ratio: The Story of Phi, the World&rsquo;s Most
Astonishing Number.</em> Broadway Books.</p>
<p>Knott, R. (2013). Fibonacci numbers and the golden section in art,
architecture and music. <em>University of Surrey Mathematics Department.</em>
<a href="https://r-knott.surrey.ac.uk/Fibonacci/fibInArt.html">https://r-knott.surrey.ac.uk/Fibonacci/fibInArt.html</a></p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-11-20</strong>: Clarified the Bartók bar count: the written score has 88 bars; Lendvai&rsquo;s analysis counted 89 by adding an implied final rest bar to reach the Fibonacci number. Previously stated as &ldquo;89 bars&rdquo; without qualification.</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>What Black Hole Images Actually Show (and Why a Wormhole Would Look Different)</title>
      <link>https://sebastianspicker.github.io/posts/black-hole-image-wormhole-shadow/</link>
      <pubDate>Mon, 17 Oct 2022 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/black-hole-image-wormhole-shadow/</guid>
      <description>The EHT images of M87* and Sgr A* are remarkable not because they surprised us, but because they confirmed a century-old prediction at microarcsecond precision. The more interesting question: what would a wormhole look like? Completely different — and we have never seen that.</description>
      <content:encoded><![CDATA[<h2 id="summary">Summary</h2>
<p>In 2019 and 2022 the Event Horizon Telescope released images of two supermassive black holes. Both
looked exactly like physics predicted they would. That precision — agreement to within a few
percent — is what makes them scientifically powerful. The ring is not merely beautiful; it is a
quantitative measurement of a metric.</p>
<p>The more interesting question, the one I want to spend time on here, is: what would a wormhole
look like? The answer is: radically different. Which means the images are also evidence — not just
confirmation of a black hole, but a <em>ruling out</em> of certain alternatives at those locations.</p>
<h2 id="the-images">The Images</h2>
<p>The Event Horizon Telescope is a planet-scale interferometer: radio dishes from Hawaii to the
South Pole, phase-locked to atomic clocks, synthesizing an effective aperture the diameter of
Earth. At millimetre wavelengths, this gives an angular resolution of around 20 microarcseconds —
enough to resolve a grapefruit on the Moon.</p>
<p>In April 2019 the collaboration published six simultaneous papers on M87*, the supermassive black
hole at the centre of the Virgo A galaxy
(<a href="#ref-eht2019a">EHT Collaboration et al., 2019a</a>, <a href="#ref-eht2019b">2019b</a>). The ring had an
angular diameter of \(42 \pm 3\) μas, consistent with a black hole of mass
\(M = (6.5 \pm 0.7) \times 10^9 \, M_\odot\) at a distance of 16.8 Mpc. The southern arc of
the ring was brighter — I will return to why.</p>
<p>In May 2022 the same team published results on Sagittarius A*, the Milky Way&rsquo;s central black hole
(<a href="#ref-eht2022">EHT Collaboration et al., 2022</a>). The ring diameter: \(51.8 \pm 2.3\) μas,
corresponding to a mass of \(\sim 4 \times 10^6 \, M_\odot\) at 8.18 kpc. M87* is roughly 1500
times more massive than Sgr A* and roughly 2000 times farther away — so the two apparent ring
sizes are within 25% of each other. The universe arranged the coincidence; the EHT exploited it.</p>
<h2 id="the-physics-of-the-ring">The Physics of the Ring</h2>
<p>The ring is not the black hole itself. You cannot image an event horizon: by definition, no
information escapes from it. What the EHT resolves is the <em>photon sphere</em> — the region of
unstable circular photon orbits — and its shadow.</p>
<p>For a non-rotating (Schwarzschild) black hole, the photon sphere sits at:</p>
$$
r_\text{ph} = \frac{3GM}{c^2} = \frac{3}{2} R_S
$$<p>where \(R_S = 2GM/c^2\) is the Schwarzschild radius. Light orbiting here is in unstable
equilibrium: a small perturbation sends it either spiralling inward or escaping to infinity. The
<em>critical impact parameter</em> — the perpendicular distance from the optical axis at which an
incoming photon just grazes the photon sphere — is:</p>
$$
b_c = \frac{3\sqrt{3} \, GM}{c^2} \approx 5.196 \, \frac{GM}{c^2}
$$<p>The angular diameter of the shadow as seen by a distant observer is therefore:</p>
$$
\theta_\text{shadow} = \frac{2 b_c}{D} = \frac{6\sqrt{3} \, GM}{c^2 D}
$$<p>Plugging in the EHT numbers for M87* (\(M = 6.5 \times 10^9 \, M_\odot\), \(D = 16.8\) Mpc):</p>
$$
\theta \approx \frac{6 \times 1.732 \times 6.5 \times 10^9 \times 1477 \, \text{m}}{16.8 \times 3.086 \times 10^{22} \, \text{m}}
\approx 1.9 \times 10^{-10} \, \text{rad}
\approx 40 \, \mu\text{as}
$$<p>The EHT measured \(42 \pm 3\) μas. Agreement within 5%. This is not a post-hoc fit; it is a
prediction that follows directly from general relativity and a mass independently constrained by
stellar kinematics.</p>
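<p>The same back-of-envelope calculation works for both sources. A minimal sketch (constants rounded, so the results are approximate; not code from the EHT pipeline):</p>

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # solar mass, kg
MPC = 3.086e22       # megaparsec in metres
KPC = 3.086e19       # kiloparsec in metres
MUAS_PER_RAD = 180 / math.pi * 3600 * 1e6  # microarcseconds per radian

def shadow_diameter_uas(mass_msun: float, distance_m: float) -> float:
    """Schwarzschild shadow angular diameter: theta = 6*sqrt(3)*GM / (c^2 D)."""
    gm_over_c2 = G * mass_msun * M_SUN / C**2  # ~1477 m per solar mass
    return 6 * math.sqrt(3) * gm_over_c2 / distance_m * MUAS_PER_RAD

print(shadow_diameter_uas(6.5e9, 16.8 * MPC))  # M87*:   ~40  (EHT: 42 +/- 3)
print(shadow_diameter_uas(4.0e6, 8.18 * KPC))  # Sgr A*: ~50  (EHT: 51.8 +/- 2.3)
```

The Sgr A* prediction lands within a few percent of the measured \(51.8 \pm 2.3\) μas as well, using only the mass and distance as inputs.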
<p>The first numerical simulation of this image was done by Jean-Pierre Luminet in 1979, using punch
cards and an IBM 7040 (<a href="#ref-luminet1979">Luminet, 1979</a>). He computed the geodesics, rendered the
result by hand on photographic paper, and produced an image that looks startlingly like the
2019 photograph — forty years before the telescope existed.</p>
<h2 id="the-brightness-asymmetry">The Brightness Asymmetry</h2>
<p>The southern arc of the M87* ring is brighter. This is not an instrumental artefact. The
accretion disk — the superheated plasma spiralling into the black hole — orbits at mildly
relativistic speeds, \(v \sim 0.3\text{–}0.6 \, c\). On the approaching side of the disk,
synchrotron emission is Doppler-beamed toward the observer: intensity amplified, frequency
blueshifted. On the receding side, the flux is deboosted (<a href="#ref-eht2019b">EHT Collaboration et al., 2019b</a>).</p>
<p>In M87* the approaching side faces south, which implies — combined with the known orientation of
M87&rsquo;s large-scale relativistic jet — that the black hole spin axis points away from Earth. The
brightness asymmetry is, in effect, a spin measurement.</p>
<h2 id="interstellar-did-it-right">Interstellar Did It Right</h2>
<p>In 2014, the visual effects company Double Negative rendered the black hole Gargantua for
Christopher Nolan&rsquo;s <em>Interstellar</em>. They did this by integrating the actual geodesic equations for
a rapidly spinning (near-extremal Kerr) black hole. Kip Thorne, one of the producers, collaborated
on two companion papers with the visual effects team (<a href="#ref-james2015a">James et al., 2015a</a>).</p>
<p>The resulting image showed the accretion disk wrapping both above and below the black hole,
producing a characteristic double-arc structure — direct emission at the equator plus a secondary
image of the disk mirrored by gravitational lensing. This was not artistic licence. It was the
first photorealistic render of a black hole produced from first principles, and the physicists
found new results in the process: features of the lens map that had not previously been worked out
analytically.</p>
<p>The same team published a companion paper on the wormhole in <em>Interstellar</em>
(<a href="#ref-james2015b">James et al., 2015b</a>). That paper is where things get interesting.</p>
<h2 id="what-a-wormhole-would-actually-look-like">What a Wormhole Would Actually Look Like</h2>
<p>A traversable Morris-Thorne wormhole connects two regions of spacetime through a throat. An
observer near the throat would see <em>both</em> connected universes simultaneously — one on each side
of the throat boundary. The key visual feature, worked out in detail by Thomas Müller
(<a href="#ref-muller2004">Müller, 2004</a>), is this:</p>
<ul>
<li>Looking through the throat, you see the far-side universe compressed into a disk, bounded by
a bright Einstein ring at the throat.</li>
<li>Outside the ring, you see the near-side universe, heavily distorted by the wormhole&rsquo;s
gravitational field.</li>
<li>There is no <em>shadow</em> in the sense a black hole has — no region from which light cannot escape.
Instead, the ring acts as a portal: all light that reaches the throat passes through rather
than being absorbed.</li>
</ul>
<p>The James et al. (2015b) wormhole paper shows this explicitly. The <em>Interstellar</em> wormhole was
rendered as a spherical lens with a celestial hemisphere visible through it. The visual signature
is a double celestial sphere: your own sky distorted around the outside, and a compressed view of
a distant universe through the middle.</p>
<p>This looks nothing like the EHT images.</p>
<p>The EHT sees a <em>shadow</em> — a dark central region from which no emission escapes, surrounded by a
bright ring. A traversable wormhole at the same mass and distance would show a bright ring with
a <em>second universe</em> visible in the centre, not a dark disk. The topologies of the light-path
structures are fundamentally different.</p>
<h2 id="the-images-rule-something-out">The Images Rule Something Out</h2>
<p>This is the point I find underappreciated. The EHT results are usually discussed as confirming
that M87* and Sgr A* are black holes consistent with GR. That framing is correct. But the images
are also <em>falsifying evidence</em> against alternatives.</p>
<p>Several exotic compact object proposals — gravastars, boson stars, some wormhole metrics — predict
shadow-like features. But traversable wormholes of the Morris-Thorne type do not. The EHT image
morphology — shadow, photon ring, brightness asymmetry tracking Doppler beaming — matches the
Kerr metric quantitatively. An astrophysical wormhole of the type that appears in popular science
coverage would look observably different.</p>
<p>The constraint is not absolute. You could construct wormhole geometries whose photon-sphere
structure mimics a black hole&rsquo;s shadow. But those are not the wormholes that typically appear in
discussions of traversable shortcuts through spacetime, and the Morris-Thorne type — the
physically simplest case — is ruled out at M87* and Sgr A* by the EHT morphology alone.</p>
<p>For more on wormhole theory — ER bridges as time-reversal symmetry, the
Einstein-Rosen caterpillar, and Euclidean wormholes in single theories — see
<a href="/posts/try-to-relax-ironic-process-wormholes/">a later post</a>. The physics is rich and ongoing. But
a picture of a wormhole, if one were ever imaged, would not look like what the EHT published.
It would look like a portal.</p>
<h2 id="the-astonishing-thing-is-that-it-worked">The Astonishing Thing Is That It Worked</h2>
<p>I want to end on this. The ring around M87* was predicted in 1916 from a theory written down
without any observation of a black hole, by people who were not sure black holes existed, using
mathematics developed for entirely different purposes. Luminet computed the image in 1979 on
punch cards, and it matched a photograph taken in 2019 with a planet-scale interferometer.</p>
<p>The agreement is 5%. In astrophysics, where parameters routinely span ten orders of magnitude,
that is essentially exact.</p>
<p>The images are astonishing not because they surprised physicists — they confirmed what general
relativity predicted. They are astonishing because general relativity is apparently the kind of
theory that earns the right to be trusted at microarcsecond precision, at distances of 16.8
megaparsecs, around objects whose entire interiors are, by construction, hidden from us.</p>
<p>Peer review welcome. If you have a wormhole geometry whose shadow is indistinguishable from a
Kerr black hole at current EHT resolution, I would genuinely like to read the paper.</p>
<h2 id="references">References</h2>
<ul>
<li><span id="ref-eht2019a"></span>Event Horizon Telescope Collaboration et al. (2019). First M87 Event Horizon Telescope results. I. The shadow of the supermassive black hole. <em>The Astrophysical Journal Letters</em>, 875, L1. <a href="https://doi.org/10.3847/2041-8213/ab0ec7">DOI: 10.3847/2041-8213/ab0ec7</a></li>
<li><span id="ref-eht2019b"></span>Event Horizon Telescope Collaboration et al. (2019). First M87 Event Horizon Telescope results. V. Physical origin of the asymmetric ring. <em>The Astrophysical Journal Letters</em>, 875, L5. <a href="https://doi.org/10.3847/2041-8213/ab0f43">DOI: 10.3847/2041-8213/ab0f43</a></li>
<li><span id="ref-eht2022"></span>Event Horizon Telescope Collaboration et al. (2022). First Sagittarius A* Event Horizon Telescope results. I. The shadow of the supermassive black hole in the center of the Milky Way. <em>The Astrophysical Journal Letters</em>, 930, L12. <a href="https://doi.org/10.3847/2041-8213/ac6674">DOI: 10.3847/2041-8213/ac6674</a></li>
<li><span id="ref-luminet1979"></span>Luminet, J.-P. (1979). Image of a spherical black hole with thin accretion disk. <em>Astronomy &amp; Astrophysics</em>, 75, 228–235.</li>
<li><span id="ref-james2015a"></span>James, O., von Tunzelmann, E., Franklin, P., &amp; Thorne, K. S. (2015). Gravitational lensing by spinning black holes in astrophysics, and in the movie <em>Interstellar</em>. <em>Classical and Quantum Gravity</em>, 32, 065001. <a href="https://doi.org/10.1088/0264-9381/32/6/065001">DOI: 10.1088/0264-9381/32/6/065001</a></li>
<li><span id="ref-james2015b"></span>James, O., von Tunzelmann, E., Franklin, P., &amp; Thorne, K. S. (2015). Visualizing Interstellar&rsquo;s wormhole. <em>American Journal of Physics</em>, 83(6), 486–499. <a href="https://doi.org/10.1119/1.4916949">DOI: 10.1119/1.4916949</a></li>
<li><span id="ref-muller2004"></span>Müller, T. (2004). Visual appearance of a Morris-Thorne wormhole. <em>American Journal of Physics</em>, 72(8), 1045–1050. <a href="https://doi.org/10.1119/1.1758220">DOI: 10.1119/1.1758220</a></li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>The Lab Goes Home: astro-lab@home and the COVID Pivot in Astronomy Education</title>
      <link>https://sebastianspicker.github.io/posts/astro-lab-at-home/</link>
      <pubDate>Fri, 14 Oct 2022 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/astro-lab-at-home/</guid>
      <description>In spring 2020, the astro-lab at the University of Cologne shut down like everything else. The question was whether you could replicate a hands-on student lab using smartphones and household materials — and send it home. This is the story of how we tried, what we published in CAPjournal, and what happened when schools reopened.</description>
      <content:encoded><![CDATA[<p><em>This post describes two related projects: the astro-lab@home, published in
CAPjournal in 2022 with Alexander Küpper and André Bresges; and its successor,
the astro-lab@school, published the same year in Astronomie+Raumfahrt. Both grew
from the same question: what does astronomy education look like when you cannot
bring students into a lab?</em></p>
<hr>
<h2 id="what-the-astro-lab-was">What the astro-lab Was</h2>
<p>Before the pandemic, the astro-lab at the University of Cologne was a
student laboratory focused on extrasolar planets. School groups — mostly
secondary school students — came in and worked through a set of analogy
experiments: how do you detect a planet you cannot see? How do you infer
its size, its orbit, whether it might be habitable?</p>
<p>The pedagogical bet was that exoplanet research, precisely because it is
headline-generating and genuinely open-ended, could counteract the
motivational slump in physics that tends to set in around middle school.
The context — life in the universe, habitable worlds, the possibility of
something out there — did a lot of the work that no abstract force diagram
could do.</p>
<p>The experiments themselves were analogy experiments: a lamp standing in
for a star, a sphere on a track standing in for a planet. The key
measurement was the transit: when the &ldquo;planet&rdquo; passed in front of the
&ldquo;star&rdquo;, the light sensor registered a dip. Students measured the dip,
estimated the ratio of areas, connected it to radius, and got a number
that meant something. The number was not precise. It did not need to be.
It was real.</p>
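<p>The estimate the students made is one line of physics: the fractional dip in flux equals the area ratio, \( \Delta F / F = (R_p / R_\text{s})^2 \). A minimal sketch with hypothetical sensor readings (the lux values below are illustrative, not from the published materials):</p>

```python
import math

def radius_ratio(flux_out: float, flux_in: float) -> float:
    """Transit depth gives the area ratio dF/F = (R_p/R_s)^2,
    so the radius ratio is the square root of the depth."""
    depth = (flux_out - flux_in) / flux_out
    return math.sqrt(depth)

# Hypothetical readings: 400 lux with the lamp unobstructed,
# 396 lux at mid-transit -> 1% depth -> radius ratio ~0.1,
# roughly the Jupiter/Sun proportion.
print(radius_ratio(400.0, 396.0))  # ~0.1
```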
<hr>
<h2 id="spring-2020">Spring 2020</h2>
<p>In March 2020, schools shut down, and the University of Cologne
followed. Visits to the astro-lab were cancelled. The question the team
faced — Alexander Küpper, André Bresges, and I — was not whether to do
something but what was actually feasible.</p>
<p>German distance learning at the time was characterised by worksheet
packages delivered to students with minimal interactive contact. Only 16%
of German students reported being in video conferences with their
teachers; 30% reported no contact at all since the initial shutdown. The
infrastructure was not there, the habits were not there, and the
expectation that students had the materials and equipment for a
physics lab at home was not warranted.</p>
<p>What students did have, almost universally, was a smartphone.</p>
<p>Modern smartphones contain a remarkable array of sensors: ambient light
sensors, accelerometers, gyroscopes, barometers, magnetometers. The app
<a href="https://phyphox.org">phyphox</a>, developed at RWTH Aachen, makes those
sensors accessible with a clean interface designed for use in education.
If the sensor hardware was already in students&rsquo; pockets, the lab setup
problem became: what household materials can stand in for the rest of the
apparatus?</p>
<hr>
<h2 id="astro-labhome-bringing-science-to-the-sofa">astro-lab@home: Bringing Science to the Sofa</h2>
<p>The astro-lab@home project adapted the original lab experiments for
home use with smartphones and everyday materials. The core transit
experiment — measuring the dip in light caused by an opaque object
passing in front of a lamp — turned out to be reproducible without any
specialist equipment. A table lamp, a ball on a string, and a
smartphone positioned beneath the lamp gave you the raw data. phyphox
recorded the light curve in real time.</p>
<p>We designed the setup to be flexible enough to work with what students
actually had. The default used the ambient light sensor in Android
devices, which is directly accessible through phyphox. iPhones do not
expose their light sensor through software interfaces, so for Apple
devices we recommended an external Bluetooth sensor — an inexpensive
workaround that also had the advantage of producing more consistent data
across device types.</p>
<p>The resulting package was not just an equipment list. We developed
accompanying materials that explained the physics (why does a transit
produce a specific shape of dip rather than a sharp cutoff?), connected
the analogy experiment to the real science (how does this scale up to the
actual transit photometry done by TESS and Kepler?), and offered
scaffolding at different levels of independence.</p>
<p>The project was published in the IAU&rsquo;s <a href="https://www.capjournal.org">CAPjournal</a>
in 2022 — a journal aimed at communicators and educators in astronomy.
The audience was intentionally broad: teachers looking for accessible
classroom activities, outreach organisations trying to reach students at
home, curious individuals who wanted to do something real with their
phone. &ldquo;Bringing science to the sofa&rdquo; was the headline, and that was
genuine. The experiments worked in a living room.</p>
<hr>
<h2 id="what-came-next-astro-labschool">What Came Next: astro-lab@school</h2>
<p>When schools reopened and in-person teaching became possible again, the
question was not simply &ldquo;back to normal&rdquo; but what the COVID period had
actually taught us about the format.</p>
<p>The astro-lab@school, published in Astronomie+Raumfahrt in 2022, addressed
that question directly. Some things from the home version had worked
better than expected. The smartphone-based setup was cheaper, more
portable, and more directly in students&rsquo; hands than the original benchtop
apparatus. There was something pedagogically valuable about students
using their own devices rather than lab equipment provided by someone
else.</p>
<p>The astro-lab@school retained the smartphone-centred approach and
adapted it for a school context: class sizes, time constraints, the
reality of mixed equipment across a room of thirty students. The
experiments from the home version were modified for group work and
parallel execution. The scaffolding materials were reworked for the
paced structure of a school lesson rather than the self-directed format
of home use.</p>
<p>The result was not a reversion to the pre-pandemic lab. It was a hybrid:
in-person group work, but with tools and methods developed for
distributed individual use. The pandemic had, inadvertently, pushed the
format toward something more robust.</p>
<hr>
<h2 id="a-note-on-what-made-this-work">A Note on What Made This Work</h2>
<p>The core technical contribution — smartphones as measurement instruments
for analogy experiments in astronomy education — is described in more
detail in a <a href="/posts/exoplanet-hunting-smartphones/">later publication in <em>The Physics Teacher</em></a>,
which covers the experimental setups, sensor comparison, and pedagogical
progression in a form aimed at an international teaching audience. If
you want the how-to, start there.</p>
<p>What I want to note here is something slightly different: the role of
context.</p>
<p>The astro-lab bet on exoplanets as a motivational context, and the
evidence supports that bet. Exoplanet research remains one of the few
areas of physics that generate genuine public enthusiasm, and students&rsquo;
interest in the topic is empirically documented. What the COVID period
showed is that the context is robust enough to survive the removal of the
lab infrastructure. Students working on transit photometry with a lamp
and a smartphone in their kitchen were doing the same thing, conceptually,
as students at a benchtop sensor station at the university. The physical
setup was different. The question was the same.</p>
<p>That is, I think, a more general lesson. Context-driven education is
not dependent on a specific material configuration. The question carries.</p>
<hr>
<p><em>For the curriculum unit that places these experiments in the context of the
NRW Sekundarstufe I physics syllabus, see
<a href="/posts/fremde-welten-exoplanet-teaching/">Fremde Welten</a>.
For the air pressure / Mars experiment that grew from the same lab, see
<a href="/posts/mission-to-mars/">Mission to Mars</a>.</em></p>
<hr>
<h2 id="references">References</h2>
<p>Spicker, S. J., Küpper, A., &amp; Bresges, A. (2022). astro-lab@home — bringing
science to the sofa. <em>CAPjournal</em>, 31, 12–17.</p>
<p>Küpper, A., &amp; Spicker, S. J. (2022). astro-lab@school. <em>Astronomie+Raumfahrt
im Unterricht</em>, 59(6).</p>
<p>Küpper, A., &amp; Schulz, A. (2017). Das Schülerlabor astro-lab an der
Universität zu Köln. <em>Astronomie+Raumfahrt im Unterricht</em>, 54(1).</p>
<p>Stampfer, C., &amp; Staacks, S. (2020). phyphox — using smartphones as
experimental tools. <em>Physics Education</em>, 55(5), 055007.
<a href="https://doi.org/10.1088/1361-6552/ab8a2e">https://doi.org/10.1088/1361-6552/ab8a2e</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>Why Universities Need Their Own YouTube</title>
      <link>https://sebastianspicker.github.io/posts/educast-nrw-hochschul-youtube/</link>
      <pubDate>Tue, 05 Jul 2022 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/educast-nrw-hochschul-youtube/</guid>
      <description>In June 2022 I presented on educast.nrw at the Tag der Lehre at HfMT Köln. This is the longer argument behind that talk: why universities should not outsource their video infrastructure to commercial platforms, and what a better alternative looks like in practice.</description>
      <content:encoded><![CDATA[<p>In June 2022 I gave a presentation at the <em>Tag der Lehre</em> at the Hochschule für Musik und Tanz Köln on video-supported teaching with <a href="https://educast.nrw/de/">educast.nrw</a>. Presenting something to colleagues who already use ILIAS every day and are broadly skeptical of yet another platform is a useful discipline. You have to answer the obvious question fast: why should we care about this, when YouTube works fine?</p>
<p>The short answer is that YouTube does not work fine, for reasons that matter specifically to universities. The longer answer is what this post is about.</p>
<h2 id="what-is-wrong-with-youtube">What Is Wrong with YouTube</h2>
<p>YouTube is the world&rsquo;s dominant video platform. It is free to use, globally available, handles any file size, transcodes automatically, and comes with an audience of two billion logged-in users. For individual creators who want reach, it is genuinely hard to beat.</p>
<p>For universities it fails in at least three important ways.</p>
<p><strong>Data protection.</strong> When you upload a lecture or a concert recording to YouTube, the content, the metadata, and the viewing behaviour of your students go to Google&rsquo;s servers — which are predominantly in the United States. Under the GDPR, transferring personal data to third countries requires either an adequacy decision, standard contractual clauses with additional safeguards, or explicit informed consent. After the Schrems II ruling (Court of Justice of the EU, 2020), the adequacy of US-based data transfers became legally contested in a way that makes institutional YouTube use genuinely difficult for European universities. Using it for anything with identifiable students — which includes most teaching content — is a compliance problem.</p>
<p><strong>Platform logic.</strong> YouTube is designed to maximise watch time. Its recommendation algorithm is not neutral. It will recommend whatever comes after your lecture, and what comes after your lecture is not under your control. For educational content — especially sensitive material, or content that should remain in a defined pedagogical context — this is a real problem. The platform is not indifferent to what is hosted on it; it shapes how it is consumed.</p>
<p><strong>Institutional fragility.</strong> YouTube is free until it isn&rsquo;t. Platform terms change; monetisation policies change; content is demonetised or removed based on automated systems with imperfect appeal mechanisms. Building institutional infrastructure on a free commercial service is a bet that the commercial incentives of that service will remain aligned with your needs. That bet has a poor historical record.</p>
<p>None of this means that individual instructors should never use YouTube. It means that universities should not make YouTube their default institutional solution.</p>
<h2 id="educastnrw-a-cooperative-model">educast.nrw: A Cooperative Model</h2>
<p><a href="https://educast.nrw/de/">educast.nrw</a> is a project of the Digitale Hochschule NRW — a cooperative of North Rhine-Westphalian universities building shared digital infrastructure. The concept is a state-wide video service, run by universities for universities, for the recording, processing, management, and distribution of video content in teaching and research. The platform calls itself &ldquo;Hochschul-YouTube&rdquo;, which is both accurate and slightly underselling what makes it different.</p>
<p>The HfMT Köln participates as a user institution, which is where I come in as the IT contact for setting it up on our end.</p>
<p>The technical foundation is <a href="https://opencast.org/">Opencast</a>, an open-source video management system developed by a consortium of universities. This matters: the software is auditable, the development direction is set by the institutions that use it rather than by advertising revenue, and the infrastructure runs on German servers that are explicitly GDPR-compliant. Licenses on uploaded content are freely choosable — CC-BY-SA is an option, which means the university&rsquo;s teaching materials can be open access if that is what the instructor wants.</p>
<h2 id="what-it-can-do">What It Can Do</h2>
<p>The feature set covers the actual use cases of a university, not the use cases of a content creator trying to build a following.</p>
<p><strong>Recording.</strong> The Opencast Studio browser app records in three modes: screen only, camera only, or screen and camera simultaneously as a synchronised multi-stream. That last option — <em>Presentation</em> and <em>Presenter</em> as separate streams, played back with the viewer switching focus between them, or as picture-in-picture — is the format that works for a lecture. You get the slides and the speaker in the same video, but the viewer can choose which to focus on. That flexibility is not something you get from a simple screen recording uploaded to YouTube.</p>
<p><strong>Multi-perspective video.</strong> For a music university this is the feature that changes things. A concert or a masterclass is not well-served by a single camera angle. The platform supports simultaneous recording from two camera perspectives — a wide shot and a detail shot, say, or a front view and a hands view for a piano performance. The viewer can switch between them in playback, or the institution can set a default presentation. This is infrastructure that makes the teaching use of concert recordings actually feasible, not just technically possible.</p>
<p><strong>Formats.</strong> Video up to 4K with adaptive bitrate streaming (the player adjusts automatically to the viewer&rsquo;s connection), audio up to broadcast quality (48kHz/16-bit, exceeding CD&rsquo;s 44.1kHz), with FLAC at up to 96kHz/24-bit in development. No file size limit. These specifications matter for music. A piano recording compressed to whatever YouTube decides to do with it is not the same as an uncompromised audio stream. The difference is audible.</p>
<p><strong>ILIAS integration.</strong> This is the practical hinge. Video that lives in a separate platform is video that students may or may not find. Video embedded directly in the ILIAS course page, in the learning module, at the point in the curriculum where it is relevant — that is video that is part of the course rather than adjacent to it. The integration between educast.nrw and ILIAS is direct: upload to the video platform, embed in ILIAS on pages, in learning modules, or as standalone video objects, all from within ILIAS.</p>
<p><strong>Access rights.</strong> The granularity here is what distinguishes it from any public platform. Each video can be set to: public (anyone on the internet), institution-wide (anyone logged in at the university), course-wide (only enrolled students in a specific course), individual (specific named people), or private (only the uploader). A graduation concert might be public. A practice session for student feedback might be course-only. A recording made for an individual student&rsquo;s reflection might be shared only with that student and their teacher. These are all normal use cases in a music university; they all require different settings; the platform handles all of them.</p>
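<p>The nesting of those levels can be modelled as an ordered enumeration, narrowest to widest. This is an illustrative sketch of the access logic only, not the platform&rsquo;s actual API — all names here are hypothetical:</p>

```python
from enum import IntEnum

class Visibility(IntEnum):
    """Per-video access level, narrowest to widest (names are illustrative)."""
    PRIVATE = 0      # only the uploader
    INDIVIDUAL = 1   # uploader plus specific named people
    COURSE = 2       # plus everyone enrolled in the course
    INSTITUTION = 3  # plus anyone logged in at the university
    PUBLIC = 4       # plus anyone on the internet

def can_view(level: Visibility, viewer_rank: Visibility) -> bool:
    """A viewer sees the video if their relationship to it is at least as
    narrow as the video's level: a course member (rank COURSE) sees COURSE,
    INSTITUTION and PUBLIC videos, but not an INDIVIDUAL-only one."""
    return level >= viewer_rank

# A graduation concert set to PUBLIC is visible to an anonymous visitor:
print(can_view(Visibility.PUBLIC, Visibility.PUBLIC))       # True
# A practice recording set to COURSE is hidden from the wider university:
print(can_view(Visibility.COURSE, Visibility.INSTITUTION))  # False
```

<p>The point of the ordering is that each wider level strictly contains the audiences of the narrower ones, which is what makes the five use cases above configurable with a single setting per video.</p>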
<h2 id="use-cases-in-a-music-university">Use Cases in a Music University</h2>
<p>The general university use case — lecture recording, video tutorial, self-study module — applies at HfMT as much as anywhere. But a music and dance university has some specific ones.</p>
<p><strong>Concert recordings.</strong> HfMT Köln runs performances at both its Cologne and Wuppertal sites. Recording these and making them available to students, faculty, and selectively to the public used to mean someone had to manage files, find hosting, deal with YouTube&rsquo;s automated copyright detection flagging student performances of copyrighted repertoire, and explain to students why their graduation concert had been muted by an algorithm. The controlled platform makes all of this manageable.</p>
<p><strong>Stage presence as a reflective tool.</strong> Watching yourself perform is a standard part of performance training. It is uncomfortable, useful, and until recently required either dedicated recording equipment or the ad-hoc use of a phone propped against something. A proper recording infrastructure with controlled access — the student sees the video, their teacher sees the video, nobody else does — changes the pedagogical viability of this approach. The barrier to actually using video feedback in practice teaching drops substantially.</p>
<p><strong>Theory and practice.</strong> This is the institutional argument I made in the presentation and stand by: video infrastructure that works for a lecture also works for a concert. The same system that stores the introduction to music theory also stores the masterclass by a visiting artist. This is not incidental — it is the point of a shared infrastructure. You do not need to choose a platform for academic content and a different one for performance content. The platform works for both.</p>
<h2 id="the-argument-behind-the-argument">The Argument Behind the Argument</h2>
<p>There is a broader principle at work here that extends beyond video platforms. Public universities are funded by public money. The infrastructure they build with that money — software, platforms, data, content — should be under their control, governed by their values, and ideally available to other public institutions. The commercial platform model inverts this: you get free hosting in exchange for your data, your students&rsquo; attention, and your institutional dependence.</p>
<p>educast.nrw is an example of what the alternative looks like in practice: a cooperative of public institutions building shared infrastructure on open-source software, governed collectively, with data on European servers under European law. It is not perfect — the setup overhead is real, the user experience does not match YouTube&rsquo;s, and the feature roadmap (automatic subtitling, H5P support, livestreaming, annotation tools) is still catching up to what commercial platforms have had for years. But the model is right.</p>
<p>The question of who owns the video infrastructure of a university is the same question as who owns its email, its learning management system, its student data. The answer should be: the university, operating under law, answerable to its students and to the public that funds it.</p>
<hr>
<p><em>The slides from the Tag der Lehre 2022 presentation are available on request. For educast.nrw setup at HfMT Köln, contact the IT department.</em></p>
<hr>
<h2 id="links">Links</h2>
<ul>
<li><a href="https://educast.nrw/de/">educast.nrw</a></li>
<li><a href="https://opencast.org/">Opencast</a></li>
<li><a href="https://studio.opencast.org/">Opencast Studio</a></li>
<li><a href="https://tobira.opencast.org/">Tobira Videoportal</a></li>
<li><a href="https://github.com/opencast-ilias">opencast-ilias plugin (GitHub)</a></li>
</ul>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-08-18</strong>: Corrected the capitalisation of &ldquo;OpenCast&rdquo; to &ldquo;Opencast&rdquo; throughout (matching the project&rsquo;s official spelling on opencast.org).</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Teaching Stellar Evolution Without a Star: DIY Experiments and a Board Game</title>
      <link>https://sebastianspicker.github.io/posts/stellar-evolution-diy/</link>
      <pubDate>Mon, 11 Apr 2022 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/stellar-evolution-diy/</guid>
      <description>Stellar evolution is now in the NRW physics curriculum, but there are almost no direct experiments you can do with it. Two responses: some DIY smartphone experiments for stellar formation, and a board game called &amp;ldquo;Staub und Sterne&amp;rdquo; (Dust and Stars) that lets students play through the stellar lifecycle. Both grew from the astro-lab project at the University of Cologne.</description>
      <content:encoded><![CDATA[<p><em>This post covers two related pieces of work: a paper on DIY smartphone
experiments for stellar formation, submitted to Astronomie+Raumfahrt
(co-authored with Alexander Küpper); and a board game, &ldquo;Staub und Sterne&rdquo;
(Dust and Stars), designed for use in secondary school physics by Miriam Küpper
and Alexander Küpper.</em></p>
<hr>
<h2 id="the-curriculum-problem">The Curriculum Problem</h2>
<p>The 2019 revision of the NRW Gymnasium physics curriculum for Sekundarstufe I
requires students to be able to describe, in broad outline, the typical stages
of stellar evolution. This is new territory for many teachers — it is not a
topic that would have appeared in teacher education programmes of ten or twenty
years ago, and few teachers have personal experience with it from their own
school or university courses.</p>
<p>More fundamentally: stellar evolution is a topic where the usual experimental
approach does not work. You cannot compress an interstellar gas cloud in a
classroom. You cannot observe a star form in real time. The timescales involved
are tens of millions to billions of years; the spatial scales are measured in
light-years and astronomical units. The experimental toolkit that works for
optics, mechanics, and even much of electromagnetism simply does not apply.</p>
<p>This creates a genuine pedagogical challenge. Students have strong interest in
astrophysical topics — the ROSE study documents this consistently — and stellar
evolution involves physical concepts that are curriculum-relevant (gravity,
pressure, energy, radiation). But the standard path from &ldquo;concept&rdquo; to
&ldquo;experiment&rdquo; to &ldquo;understanding&rdquo; is not available in the usual form.</p>
<p>Two approaches are described here. One uses what students do have — smartphones
and household materials — to model the physics of stellar formation through
analogy. The other accepts that some physics is better learned through
structured play, and designs accordingly.</p>
<hr>
<h2 id="diy-experiments-for-stellar-formation">DIY Experiments for Stellar Formation</h2>
<p>The physics of star formation starts with an interstellar gas cloud and the
competition between gravity and pressure. A cloud collapses when gravity wins:
specifically, when the cloud is massive enough (or cold enough) that
gravitational attraction overcomes the thermal pressure of the gas. This is
the Jeans criterion, and it is the quantitative condition that separates clouds
that will form stars from clouds that will disperse.</p>
<p>The qualitative version is accessible to secondary school students: a dense,
cold, massive cloud is more likely to collapse than a diffuse, hot, small one.
Once collapse begins, it is self-reinforcing — increasing density increases
the gravitational attraction, which drives further compression, which increases
the density further.</p>
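<p>The Jeans criterion itself can be sketched numerically. A minimal estimate of the Jeans mass using the standard textbook expression — the cloud temperature and density here are my own illustrative example values, not figures from the paper:</p>

```python
import math

# Physical constants (SI)
k_B = 1.381e-23    # Boltzmann constant, J/K
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
m_H = 1.674e-27    # hydrogen atom mass, kg
M_sun = 1.989e30   # solar mass, kg

def jeans_mass(T, n, mu=2.33):
    """Jeans mass M_J = (5 k T / (G mu m_H))^(3/2) * (3 / (4 pi rho))^(1/2),
    with T in K, n the particle number density in m^-3, and mu the mean
    molecular weight (~2.33 for molecular hydrogen plus helium)."""
    rho = mu * m_H * n                      # mass density, kg/m^3
    return ((5 * k_B * T / (G * mu * m_H)) ** 1.5
            * math.sqrt(3 / (4 * math.pi * rho)))

# A cold, dense molecular cloud core: T = 10 K, n = 1e10 m^-3 (1e4 cm^-3)
M_J = jeans_mass(T=10, n=1e10)
print(f"Jeans mass ≈ {M_J / M_sun:.1f} solar masses")
```

<p>Clouds more massive than this threshold collapse; raising the temperature or lowering the density raises the threshold, which is exactly the qualitative statement above in quantitative form.</p>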
<p>Two DIY experiments were developed to give students a physical encounter with
the key concepts, using materials that can be assembled at home or in school
without specialist equipment.</p>
<p><strong>Experiment 1: Compression and heating.</strong> When a gas is compressed, it heats.
This is directly measurable with the temperature sensor in a smartphone (or a
separate Bluetooth thermometer connected to phyphox) and a simple compression
apparatus — a syringe, a sealed container, or an inflation device. Students
observe the temperature rise during compression and temperature drop during
expansion, establishing the qualitative relationship. In the stellar formation
context: the collapsing gas cloud heats as it compresses, which is why a
protostar is hot long before nuclear fusion ignites.</p>
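<p>The expected size of the effect can be estimated from the ideal-gas adiabatic relation <em>TV</em><sup>γ−1</sup> = const. This is an idealisation — a real syringe leaks heat into its walls, so the measured rise will be smaller — but it bounds what the experiment can show:</p>

```python
# Ideal reversible adiabatic compression: T2 = T1 * (V1 / V2)**(gamma - 1)
gamma = 1.4  # heat capacity ratio for a diatomic gas such as air

def adiabatic_temperature(T1, V1, V2):
    """Final temperature in K after an adiabatic volume change
    from V1 to V2 (any consistent volume units)."""
    return T1 * (V1 / V2) ** (gamma - 1)

# Halving the volume of room-temperature air in a sealed syringe:
T2 = adiabatic_temperature(T1=293.15, V1=60.0, V2=30.0)  # volumes in mL
print(f"{T2:.0f} K ({T2 - 273.15:.0f} °C)")
```

<p>Halving the volume ideally heats room-temperature air by roughly 90 K; the real, partly isothermal syringe shows a much smaller but still clearly measurable rise.</p>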
<p><strong>Experiment 2: Self-reinforcing compression.</strong> A simple model of the positive
feedback loop in gravitational collapse: a weighted ball in a flexible container,
which compresses a small spring or air cushion. The more the ball compresses
the cushion, the further it falls. Students can explore the threshold conditions
under which the system reaches a stable equilibrium versus continues to
compress indefinitely — a qualitative model of the Jeans criterion.</p>
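<p>The threshold behaviour can be illustrated with a toy numerical model — my own sketch, not the apparatus described above. A drive that grows with compression competes with a linear spring: below a critical drive strength the system settles into equilibrium, above it the compression runs away:</p>

```python
def final_compression(F0, k=1.0, dt=0.01, steps=5000, runaway=100.0):
    """Overdamped toy model: dx/dt = F0*(1 + x) - k*x.
    The drive F0*(1 + x) grows with compression x (self-reinforcement,
    like gravity strengthening as density rises); the spring pushes back
    linearly. Returns the settled compression, or None if x exceeds
    `runaway` (unbounded collapse). Threshold is at F0 = k."""
    x = 0.0
    for _ in range(steps):
        x += dt * (F0 * (1 + x) - k * x)
        if x > runaway:
            return None
    return x

print(final_compression(0.5))  # below threshold: settles at F0/(k - F0) = 1.0
print(final_compression(1.2))  # None: above threshold, runaway compression
```

<p>The sharp change in behaviour at the critical drive strength is the qualitative content of the Jeans criterion: the same dynamics, with one parameter, either equilibrates or collapses.</p>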
<p>Both experiments are designed to be performed with available materials at
the DIY/home level. The smartphone&rsquo;s sensor integration via phyphox provides
quantitative data where possible, maintaining the connection to real
measurement that is a design principle across all the astro-lab experiments.</p>
<hr>
<h2 id="why-stellar-evolution-is-hard-to-experiment-with">Why Stellar Evolution Is Hard to Experiment With</h2>
<p>A methodological note worth making explicit: the shift from direct experiment
to analogy experiment to board game is not a retreat from rigor. It is a
recognition that different kinds of physical and conceptual content require
different pedagogical approaches.</p>
<p>For exoplanet detection, we can build a genuine analogy: the physics of a
planet blocking a star&rsquo;s light and a ball blocking a lamp&rsquo;s light are
structurally identical. The analogy experiment produces data whose
interpretation follows the same logic as the real scientific data.</p>
<p>For stellar evolution, the analogy is weaker. The compression of a gas
syringe models one aspect of the collapse (temperature increase) but not the
self-gravitating dynamics, the radiation pressure that eventually halts
collapse, or the nuclear ignition that defines the transition from protostar
to main sequence star. No tabletop experiment captures the whole process.</p>
<p>This is important to tell students explicitly: the experiment models this
aspect of the process, and not the others. Making the model&rsquo;s limits explicit
is part of the scientific literacy the unit is supposed to develop.</p>
<hr>
<h2 id="staub-und-sterne-a-board-game-for-stellar-evolution">&ldquo;Staub und Sterne&rdquo;: A Board Game for Stellar Evolution</h2>
<p>The board game &ldquo;Staub und Sterne&rdquo; (Dust and Stars), designed by Miriam Küpper
and Alexander Küpper, takes a different route to the same content.</p>
<p>Games have been used in physics education in all phases of a lesson: as entry
points (introducing a topic without immediately constraining it to a specific
physical question), as vehicles for content acquisition, and as reinforcement
and assessment tools. For stellar evolution specifically, the argument for a
game is strong: the content involves a branching process with multiple pathways
depending on a single initial parameter (mass), it is cyclic (the remnant of
stellar death seeds the gas cloud that forms the next generation of stars), and
it is inherently dynamic — the drama of a supernova is hard to convey through
a diagram but easy to convey through play.</p>
<p>The target audience is years 7–8 (or 8–9, depending on the school&rsquo;s
internal curriculum placement). The learning objectives:</p>
<ul>
<li>Describe the stages of stellar evolution as a function of mass</li>
<li>Name the possible end states (white dwarf, neutron star, black hole) and
the stellar paths that lead to each</li>
<li>Describe stellar evolution as a cyclic process: the gas cloud produced at
the end of a star&rsquo;s life can, under the right conditions, seed the formation
of new stars</li>
</ul>
<p>The game has players navigating a star through its lifecycle, with the key
branching</p>
decision determined by the star&rsquo;s initial mass. A low-mass star follows one
path; a high-mass star follows another. Both paths end in a stellar remnant
and a dispersed gas cloud — raw material for the next cycle.</p>
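<p>The branching logic is compact enough to write down. A sketch using commonly quoted approximate thresholds — roughly 8 and 25 solar masses; the game&rsquo;s own cut-offs may differ, and the real boundaries are uncertain and metallicity-dependent:</p>

```python
def stellar_fate(initial_mass):
    """Approximate end state as a function of initial mass (in solar
    masses), using textbook threshold values."""
    if initial_mass < 8:
        # low/intermediate mass: red giant, planetary nebula, remnant core
        return "white dwarf"
    elif initial_mass < 25:
        # massive star: core-collapse supernova leaves a compact remnant
        return "neutron star"
    else:
        # very massive star: collapse past neutron-star support
        return "black hole"

for m in (1, 10, 40):
    print(f"{m:>3} solar masses -> {stellar_fate(m)}")
```

<p>A single input parameter selecting among three endpoints is precisely the kind of branching structure that a board game path can make tangible.</p>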
<p>The game design incorporates the research on flow experience in learning:
cooperative or competitive play, immediate feedback on decisions, the kind
of engaged attention that is rare in conventional physics lessons and that
the ROSE study data suggest is precisely what is missing for many students
in physics classrooms.</p>
<hr>
<h2 id="a-note-on-what-experiments-cannot-reach">A Note on What Experiments Cannot Reach</h2>
<p>There is a broader point here that the exoplanet posts sidestep because
the experiments for exoplanet detection are so unusually good. For most
astrophysics — stellar evolution, galactic dynamics, cosmology — there is
no analogy experiment that captures the full physics. The observable has
been observed, the theory has been developed, but the pedagogical problem
of how to give students a physical encounter with that knowledge remains
genuinely difficult.</p>
<p>Games, simulations, interactive visualisations, and structural analogies all
have a role. Each of them is a partial solution; none of them is what a
well-designed experiment is. Knowing which approach fits which content, and
being honest with students about the limits of the model you are using, is
part of what physics teaching requires.</p>
<p>The experiments described in this post are a start on one small part of
that problem.</p>
<hr>
<p><em>The exoplanet experiments from the same project are described in the
<a href="/posts/astro-lab-at-home/">astro-lab@home</a>,
<a href="/posts/exoplanet-hunting-smartphones/">Hunting Exoplanets with Your Phone</a>,
and <a href="/posts/fremde-welten-exoplanet-teaching/">Fremde Welten</a> posts.</em></p>
<p><em>The misconceptions students bring to stellar evolution — about the Sun,
gravity, nucleosynthesis, and the language of astronomy — are documented
in detail in <a href="/posts/astronomy-misconceptions/">Please Stop Saying the Sun Is on Fire</a>,
written as a companion to the September 2020 teacher training session that
motivated much of this work.</em></p>
<hr>
<h2 id="references">References</h2>
<p>Spicker, S. J., &amp; Küpper, A. (submitted). Einfache DIY-Experimente zum
Verständnis der Sternentstehung für den Physik- und Astronomieunterricht
sowie zu Hause. <em>Astronomie+Raumfahrt im Unterricht</em>.</p>
<p>Küpper, M., &amp; Küpper, A. (2022). Sternentwicklung spielerisch verstehen:
Konzeption eines Brettspiels für den Physikunterricht der Sekundarstufe I.
<em>Presentation at AG Lehrerfortbildung, Universität zu Köln.</em></p>
<p>Elster, D. (2008). Was interessiert Jugendliche an den Naturwissenschaften?
VFPC Verein zur Förderung des physikalischen und chemischen Unterrichts.</p>
<p>MSB NRW (2019). <em>Kernlehrplan für die Sekundarstufe I — Gymnasium in
Nordrhein-Westfalen: Physik.</em> Ministerium für Schule und Bildung NRW.</p>
<p>Ward-Thompson, D., &amp; Whitworth, A. (2011). <em>An Introduction to Star Formation.</em>
Cambridge University Press.</p>
]]></content:encoded>
    </item>
    <item>
      <title>They Told Me Not to Use Design Thinking. They Were Right.</title>
      <link>https://sebastianspicker.github.io/posts/design-thinking-vs-grounded-theory/</link>
      <pubDate>Tue, 23 Nov 2021 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/design-thinking-vs-grounded-theory/</guid>
      <description>When you are a physicist doing education research, methodology feels like a bureaucratic formality standing between you and the interesting work. Everyone told me to use grounded theory instead of design thinking in my thesis. I ignored them. This is the postmortem.</description>
      <content:encoded><![CDATA[<p><em>A follow-up to the <a href="/posts/mission-to-mars/">Mission to Mars</a> post, which
describes the experimental work. This one is about the methodology layer
underneath it — specifically, what I got wrong.</em></p>
<hr>
<h2 id="the-setup">The Setup</h2>
<p>My background is in physics. I ended up in physics education research
sideways, through the astro-lab project and through a genuine interest in
why students find physics so alienating and what might help. When it came
time to frame that work as a thesis, I had to choose a methodology.</p>
<p>I chose design thinking. Or more precisely, I chose something that
borrowed heavily from design-based research and design thinking frameworks
and that felt, at the time, like the obvious match for what I was doing.
I was designing experiments. I was iterating on them. I was testing them
with students and refining them. Design thinking is a framework for
exactly this process. What could be more natural?</p>
<p>Several people told me I was making a mistake. Colleagues with more
qualitative research experience, a supervisor who had been through
the methodology debates in education research more times than he wanted
to count. The consistent advice was: use grounded theory. Be systematic
about your data. Let the categories emerge from what you actually observe
rather than from what you designed the experiment to produce.</p>
<p>I thought I understood what they were saying. I did not understand what
they were saying.</p>
<hr>
<h2 id="what-i-thought-design-thinking-gave-me">What I Thought Design Thinking Gave Me</h2>
<p>Design thinking, as a research framing, offered what felt like a clean
correspondence between method and subject matter. The thing I was
producing was a designed artifact — a teaching experiment. The process
I was following was inherently iterative: run it, observe what happens,
revise, run it again. The framework had a vocabulary for this (empathise,
define, ideate, prototype, test) that matched my actual working process.</p>
<p>Design-based research, the academic version of this approach in education,
has a real literature behind it. It is used in educational technology
research and in curriculum development. It is not a made-up category. The
argument for it is reasonable: if you are trying to design effective
educational interventions, then designing and studying those interventions
at the same time is a coherent research strategy.</p>
<p>What I told myself was: I am doing design-based research. The methodology
matches the work. The thesis will describe the design process, the
rationale for each design decision, the iterative refinements, and the
evidence that the final design works. This is a contribution to knowledge
because it produces a principled, evidence-informed design that other
practitioners can use and adapt.</p>
<p>This is not wrong. But it is not enough for a thesis. And I only
understood why after I had spent considerable time trying to make it
enough.</p>
<hr>
<h2 id="the-reckoning-in-the-methodology-chapter">The Reckoning in the Methodology Chapter</h2>
<p>The methodology chapter of a thesis is where you have to be explicit
about the epistemological status of your claims. You are not just
describing what you did. You are explaining why the thing you did counts
as knowledge production, what kind of knowledge it produces, and how
someone else could evaluate whether you did it correctly.</p>
<p>This is where design thinking started to come apart.</p>
<p><strong>What kind of claim does a design study make?</strong> The honest answer is:
it makes a claim about this design, in these contexts, with these
students. It does not easily generalise beyond that. If I show that
the Mission to Mars experiment produces measurable improvements in
students&rsquo; understanding of air pressure in a student lab context at
the University of Cologne in 2019, the implication for other teachers
in other contexts is&hellip; unclear. The design worked here. Maybe it
will work for you. Good luck.</p>
<p>A thesis contribution needs to be something more transferable than that.
It needs to produce knowledge about a phenomenon, not just knowledge
about a specific designed object. &ldquo;Here is a well-designed experiment&rdquo;
is a practitioner contribution, which is genuinely valuable, but it is
not the same as a theoretical contribution to the field.</p>
<p><strong>The iteration problem.</strong> Design thinking celebrates iterative
refinement. But in a thesis, every iteration needs to be motivated by
evidence, and the nature of the evidence and how it maps onto the
design changes needs to be made explicit. If I changed something between
version 1 and version 2 of the experiment, the methodology chapter must
explain: what data told me to make that change? How did I analyse it?
What coding framework did I apply? What alternative changes did I
consider and rule out, and on what grounds?</p>
<p>Design thinking has no systematic answer to these questions. It has
process descriptions (&ldquo;we tested with users and gathered feedback&rdquo;) but
not research methodology answers (&ldquo;I applied open coding to the think-aloud
protocols and the following categories emerged, which pointed toward
this specific revision&rdquo;). Without that precision, the &ldquo;iteration&rdquo; in
the methodology chapter looks like: I tried it, it did not quite work,
I made it better. Which is honest but not a researchable process.</p>
<p><strong>The validation problem.</strong> Design-based research often validates its
designs against the criteria that motivated the design. I designed the
experiment to address specific student misconceptions about air pressure.
I then tested whether students who did the experiment had fewer of those
misconceptions afterward. If the answer is yes, the design is validated.</p>
<p>But this is circular in a way that becomes visible under examination.
The misconceptions I targeted were the ones I identified at the start.
The students I studied were the ones who came to my lab. The measurement
instrument I used was one I designed to detect the specific changes
I expected the design to produce. The whole system is oriented toward
confirming the design rather than discovering something about the
phenomenon.</p>
<p>Grounded theory cuts this loop. You start with the data — the
students&rsquo; actual responses, their misconceptions as they express them,
the things that confuse them that you did not anticipate — and you
build categories from the bottom up. What you end up with is a theory
of how students actually think about air pressure (or whatever the topic
is), which may or may not match what you assumed when you designed the
experiment. The cases where it does not match are precisely where the
theoretical contribution lives.</p>
<hr>
<h2 id="what-grounded-theory-would-have-required">What Grounded Theory Would Have Required</h2>
<p>Grounded theory, done properly, is laborious. The Glaserian version
(open coding, theoretical sampling until saturation, constant
comparative method) requires treating every interview, every observation,
every student response as a data source to be systematically analysed,
compared, and connected into a coherent theory.</p>
<p>Theoretical sampling means you do not decide in advance how many students
to study or what contexts to observe. You keep gathering data until new
cases stop producing new categories — until the theory is saturated.
This is methodologically sound and practically painful, because you
cannot know in advance when you will be done.</p>
<p>Memoing — writing ongoing analytical notes about the emerging categories
and their relationships — is a discipline that forces you to be explicit
about your reasoning at every step. Not just &ldquo;these two responses seem
similar&rdquo; but &ldquo;these two responses are similar because both students are
treating pressure as a property of moving air, and here is how that
connects to the misconception documented by [citation].&rdquo;</p>
<p>I did not want to do this. I wanted to design experiments. Grounded
theory felt like a detour from the thing I was actually interested in.</p>
<p>The advice I received was: this is not a detour. A systematic analysis
of what students think about air pressure, and how they think about it,
and what experiences shift their thinking, is a theoretical contribution
that would make the experiments more useful to everyone — not just a
record of experiments that worked in one lab in one city in one year.</p>
<p>They were right about this.</p>
<hr>
<h2 id="what-i-actually-learned-too-late-to-use-in-the-thesis">What I Actually Learned (Too Late to Use in the Thesis)</h2>
<p>The most useful student responses in the Mission to Mars experiment
were not the ones that confirmed the design was working. They were the
unexpected ones.</p>
<p>The PVC pipe failure — the moment when the lid pops off and students
hear the sound — was included because I thought it would demonstrate the
direction of pressure force in a visceral way. What I observed, which
I noted but did not systematically analyse, was that different students
interpreted the pop differently. Some immediately understood it as the
internal air pushing out. Others interpreted it as the external vacuum
pulling the lid. A few were unsure which way the force had been directed
even after the event.</p>
<p>A grounded theory analysis of those responses would have produced
something genuinely interesting: a typology of how students process
a demonstrable physical event when it conflicts with their existing
pressure intuitions. That typology would have been transferable to
other experimental contexts, other pressure scenarios, other situations
where students encounter the vacuum-suction confusion.</p>
<p>Instead I noted it, described it qualitatively, and moved on because
it was not what the design was optimised to produce.</p>
<p>That is the design thinking trap. You are so focused on the designed
outcome that you treat unexpected observations as noise rather than as
data. Grounded theory treats them as the most valuable data you have.</p>
<hr>
<h2 id="a-note-for-other-physicists-entering-education-research">A Note for Other Physicists Entering Education Research</h2>
<p>If you are coming from a natural science background and you are starting
work in education research, the methodology question will feel foreign
at first. In physics, methodology is largely a matter of technical
choice — which instrument, which statistical test, which model. The
epistemological questions (what kind of knowledge does this produce?
how does it generalise?) are handled by the experimental framework
itself, which is a known, shared, peer-reviewed practice.</p>
<p>In qualitative education research, those questions are not handled in
advance. You have to work them out explicitly, for your specific study,
in writing. This is uncomfortable for people trained in a tradition where
you do the experiment and then write up what happened.</p>
<p>The temptation, for a physicist, is to choose a methodology that feels
like a framework for doing things rather than one that feels like a
framework for thinking about what you found. Design thinking is a
framework for doing things. Grounded theory is a framework for thinking
about what you found.</p>
<p>Both are legitimate. But a thesis needs to make a theoretical contribution,
and theoretical contributions come from systematic analysis of phenomena,
not from documentation of designed objects.</p>
<p>I would have finished faster and understood more if I had done the
uncomfortable thing from the start.</p>
<hr>
<p><em>The experimental work this post is commenting on is described in
<a href="/posts/mission-to-mars/">Mission to Mars</a>. For a more successful later
use of qualitative methodology in a related context, see
<a href="/posts/ai-transcription-grounded-theory/">AI Transcription and Grounded Theory</a>.</em></p>
<hr>
<h2 id="references">References</h2>
<p>Glaser, B. G., &amp; Strauss, A. L. (1967). <em>The Discovery of Grounded
Theory: Strategies for Qualitative Research.</em> Aldine.</p>
<p>Strauss, A., &amp; Corbin, J. (1998). <em>Basics of Qualitative Research:
Techniques and Procedures for Developing Grounded Theory</em> (2nd ed.).
SAGE Publications.</p>
<p>The Design-Based Research Collective (2003). Design-based research: An
emerging paradigm for educational inquiry. <em>Educational Researcher</em>,
32(1), 5–8. <a href="https://doi.org/10.3102/0013189X032001005">https://doi.org/10.3102/0013189X032001005</a></p>
<p>Brown, T. (2008). Design thinking. <em>Harvard Business Review</em>, 86(6),
84–92.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Mission to Mars: Teaching Air Pressure with a Smartphone and a Vacuum Pump</title>
      <link>https://sebastianspicker.github.io/posts/mission-to-mars/</link>
      <pubDate>Fri, 17 Sep 2021 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/mission-to-mars/</guid>
      <description>You give students a vacuum pump, a bag of household materials, and a smartphone running phyphox. Their task: build a spaceship prototype that will survive the pressure difference between the crew compartment and space. A design-based inquiry experiment published in The Physics Teacher in 2021, and one of the more memorable experiments I have been part of running.</description>
      <content:encoded><![CDATA[<p><em>This post describes &ldquo;Mission to Mars: Concept and Implementation of a
Design-Based (Hands-On) Smartphone Experiment Helping Students Understand
the Effects Caused by Differences in Air Pressure&rdquo;, published in The Physics
Teacher (Vol. 60, 2022) together with Alexander Küpper and André Bresges.</em></p>
<hr>
<h2 id="the-problem-with-air-pressure">The Problem With Air Pressure</h2>
<p>Air pressure is one of those topics students nominally know from everyday
life and almost always misunderstand. The documented
misconceptions are a long list: air is &ldquo;empty&rdquo; (nothing in it), air is
weightless, air only exerts pressure when it moves (like wind), a vacuum
&ldquo;sucks&rdquo; rather than being a region where surrounding air pushes in, and
pressure increases with height rather than decreasing.</p>
<p>Some of these misconceptions are stubborn precisely because everyday
experience seems to support them. Air does not feel like it has weight.
A vacuum cleaner does feel like it is pulling. The atmosphere, experienced
from inside it, does not announce itself as a pressure source.</p>
<p>The standard approach to this material — explaining atmospheric pressure,
defining $p = F/A$, working through barometric altitude formulae — addresses
the conceptual gaps at the declarative level. Students can recite that air
exerts pressure in all directions. Whether they have actually updated their
mental model is a different question.</p>
<p>&ldquo;Mission to Mars&rdquo; is a design-based attempt at conceptual change through
physical encounter with the consequences of pressure differences.</p>
<hr>
<h2 id="the-context-why-mars">The Context: Why Mars?</h2>
<p>The motivation for choosing the Mars context was empirical, not poetic.
The ROSE study — a large international survey of student interests in
science — consistently finds that space, astronomy, and human exploration
rank among the most motivating contexts for physics learning, for both
boys and girls. Physics education research in Germany has known for decades
that generic &ldquo;physics&rdquo; lessons underperform motivationally compared with
context-embedded physics, and astronomy is one of the contexts with the
clearest evidence base.</p>
<p>&ldquo;Mission to Mars&rdquo; asks students: a crewed mission to Mars would travel
through the vacuum of space, with the crew living in a pressurised
compartment. The compartment has to maintain atmospheric pressure while
surrounded by near-vacuum. What happens if it fails? And how would you
design a spacecraft structure to prevent that failure?</p>
<p>The question is concrete. The physics behind it — the difference between
the pressure inside the compartment and the near-zero pressure outside,
and the forces this pressure difference exerts on any structure — is the
content of the lesson.</p>
<hr>
<h2 id="the-experiment">The Experiment</h2>
<p>The full version of the experiment, as we ran it at the astro-lab at the
University of Cologne, uses a vacuum pump, a bell jar, and a smartphone
running <a href="https://phyphox.org">phyphox</a>. The smartphone&rsquo;s built-in barometric
pressure sensor records real-time atmospheric pressure inside the bell jar
as the pump evacuates it.</p>
<p>Before building anything, students verify that the smartphone is a
functional pressure gauge: they measure the current atmospheric pressure
in the room and compare it with a provided reference value. This step
matters pedagogically — it establishes that the phone is a real scientific
instrument, not just a device for receiving worksheets.</p>
<p>Then comes the design-build-test cycle:</p>
<p><strong>Design</strong>: Students are given PVC plumbing pipe sections, empty food
containers, resealable bags, rubber bands, clamps, and other household
materials. Their task is to build a prototype &ldquo;spaceship&rdquo; — a container
that will maintain near-atmospheric pressure inside while the bell jar
around it is evacuated to low pressure. The phone (or external sensor)
goes inside the prototype to measure whether the prototype is holding.</p>
<p><strong>Predict</strong>: Before testing, students are asked to state why they think
their prototype will or won&rsquo;t work. This surfaces their preconceptions
in a low-stakes way and sets up the next stage.</p>
<p><strong>Test</strong>: The prototype goes into the bell jar. The pump runs. The pressure
sensor records. The light curve — sorry, the pressure curve — tells the
story. Four outcomes are possible:</p>
<ul>
<li><strong>Nearly flat line</strong>: the prototype is airtight, pressure inside
stays near atmospheric. Mission success.</li>
<li><strong>&ldquo;Bathtub&rdquo; curve</strong>: a visible failure event — a cap pops off, the
pressure inside drops sharply and then equalises. Students hear the
pop. They did not expect the pop. This is the moment.</li>
<li><strong>Gradual decay</strong>: the prototype leaks slowly, the pressure inside
drops steadily. Invisible failure.</li>
<li><strong>Noisy signal</strong>: something wrong with the setup.</li>
</ul>
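<p>phyphox can export the recorded trace, and the four outcomes are distinct
enough that a few lines of Python can triage a trace automatically. A heuristic
sketch; the function name and thresholds here are illustrative, not calibrated
against real runs:</p>

```python
def classify_pressure_curve(p, noise_threshold=3.0, drop_threshold=0.05):
    """Sort a pressure trace (hPa samples in time order) into one of the
    four outcomes. Thresholds are guesses, chosen for illustration."""
    total_drop = (p[0] - p[-1]) / p[0]                  # fractional pressure loss
    jitter = max(abs(b - a) for a, b in zip(p, p[1:]))  # sample-to-sample noise
    max_step_drop = max(a - b for a, b in zip(p, p[1:]))
    if total_drop < drop_threshold:
        # pressure held: either a clean flat line or a noisy setup
        return "noisy" if jitter > noise_threshold else "flat"
    if max_step_drop > 0.5 * (p[0] - p[-1]):
        return "bathtub"   # one sudden failure event (the pop)
    return "gradual"       # slow leak: invisible failure
```

<p>In the lab this would only be a convenience; the point of the experiment is
that students read the curve themselves.</p>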
<p><strong>The PVC pipe trap</strong>: the PVC pipe is deliberately included because it
is the most impressive-looking material and is reliably incorrect. The
friction between pipe and lid is insufficient at the pressure differences
reached in the bell jar. The lid pops off. Students rebuild.</p>
<hr>
<h2 id="the-misconceptions-addressed">The Misconceptions, Addressed</h2>
<p>The design-test-rebuild cycle forces students to confront the misconceptions
listed above in a direct physical way:</p>
<p><em>Air is empty/weightless</em>: handled in pre-activities with standard
demonstrations (the dunked napkin, the deflated-vs-inflated balloon).</p>
<p><em>Air only exerts pressure when moving</em>: the bell jar demonstration makes
this concrete — the sensor shows pressure even in a static, undisturbed
volume. When the pump evacuates the jar, the &ldquo;stillness&rdquo; of the remaining
air doesn&rsquo;t change its pressure.</p>
<p><em>A vacuum sucks</em>: this is the crucial one, and the PVC lid pop addresses
it more effectively than any explanation. The lid does not get sucked
outward. The air inside the prototype at near-atmospheric pressure pushes
the lid open against the external near-vacuum. When the lid fails and
students hear the rush of air flowing back in after the valve is opened,
the direction of the pressure force becomes viscerally clear: it was
always the higher-pressure region pushing into the lower-pressure region.</p>
<p>The inquiry is scaffolded through worksheets and index cards, and there
is a teaching assistant present in the lab version to catch dangerous
situations (the smartphone can be damaged if exposed to too low a pressure
— the instructions include a warning about testing pump suction strength
before risking the device).</p>
<hr>
<h2 id="diy-variants-for-school-and-home">DIY Variants for School and Home</h2>
<p>The full lab setup is expensive and not portable. One design principle we
wanted to maintain was accessibility: the experiment should work at three
budget levels.</p>
<p><strong>Low budget (&lt; $5)</strong>: empty food containers connected to a household
vacuum cleaner through a small hole. It works, but gives students no
real-time measurement to watch.</p>
<p><strong>Mid budget ($5–$50)</strong>: translucent storage containers in nested sizes
(large = &ldquo;space&rdquo;, small = &ldquo;spaceship&rdquo;), a small sealing ring to connect
the vacuum source. Students can watch the phone display through the
container during evacuation. The vacuum achieved is weaker than in the lab
version, but the qualitative experience — the prototype holding or
failing — is the same.</p>
<p><strong>Expensive ($500+)</strong>: the full lab version with bell jar and diaphragm
pump. Best analogy, best data, highest barrier.</p>
<p>The DIY take-home message, as the paper puts it: be creative, fail
forward. Anything that creates some vacuum and fits a prototype counts.</p>
<p>The experiment adapts readily to e-learning contexts: each group builds
a prototype, tests it (or has a family member film it), and presents the
outcome — including why the first prototype failed and how the second
was improved — in a shared video conference.</p>
<hr>
<h2 id="a-note-on-where-this-fits">A Note on Where This Fits</h2>
<p>&ldquo;Mission to Mars&rdquo; grew out of the astro-lab at the University of Cologne,
the same student laboratory context as the exoplanet transit experiments.
The common thread is not the specific physics topic (air pressure here,
photometry there) but the experimental approach: smartphones as real
measurement instruments, everyday materials as apparatus, an astronomical
context that sustains engagement, and a design-build-test cycle that
forces students to encounter the physics physically rather than only
propositionally.</p>
<p>The air pressure content connects naturally to the exoplanet unit at a
curriculum level: habitability of exoplanets depends partly on atmospheric
pressure. In the <a href="/posts/fremde-welten-exoplanet-teaching/">Fremde Welten article</a>,
atmospheric pressure is listed as one of the factors that determine whether
a detected exoplanet could support life — an explicit cross-link between
the two units.</p>
<p>The <a href="/posts/astro-lab-at-home/">astro-lab@home post</a> describes how the
broader astro-lab programme — including this experiment — was adapted for
home use during the pandemic. The air pressure experiment is among the
more challenging to replicate at home, but the low-budget vacuum cleaner
variant makes a version of it possible.</p>
<p>The design-build-test structure of this experiment also ended up at the
centre of a methodological argument during my thesis work. The short
version: everyone told me to use grounded theory instead of design
thinking as the research framework, and they were right to do so. That
story is in <a href="/posts/design-thinking-vs-grounded-theory/">a separate post</a>.</p>
<hr>
<h2 id="references">References</h2>
<p>Spicker, S. J., Küpper, A., &amp; Bresges, A. (2022). Mission to Mars:
Concept and implementation of a design-based (hands-on) smartphone
experiment helping students understand the effects caused by differences
in air pressure. <em>The Physics Teacher</em>, 60(1), 47–50.
<a href="https://doi.org/10.1119/10.0009109">https://doi.org/10.1119/10.0009109</a></p>
<p>Küpper, A., &amp; Schulz, A. (2017). Schülerinnen und Schüler auf der
Suche nach der Erde 2.0 im Schülerlabor der Universität zu Köln.
<em>Astronomie+Raumfahrt im Unterricht</em>, 54(157), 40–45.</p>
<p>Staacks, S., Hütz, S., Heinke, H., &amp; Stampfer, C. (2018). Advanced
tools for smartphone-based experiments: phyphox. <em>Physics Education</em>,
53(4), 045009.
<a href="https://doi.org/10.1088/1361-6552/aac05e">https://doi.org/10.1088/1361-6552/aac05e</a></p>
<p>Sjøberg, S., &amp; Schreiner, C. (2010). <em>The ROSE project: An overview
and key findings.</em> University of Oslo.</p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-10-03</strong>: Updated the self-citation to the correct year (2022), volume/issue (60(1)), pages (47–50), and DOI (10.1119/10.0009109).</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Please Stop Saying the Sun Is on Fire</title>
      <link>https://sebastianspicker.github.io/posts/astronomy-misconceptions/</link>
      <pubDate>Tue, 17 Nov 2020 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/astronomy-misconceptions/</guid>
      <description>In September 2020 I gave a teacher training talk on stellar formation and the misconceptions students bring into class. The misconception list was long enough to be its own document. Here it is, with commentary. Includes: the Sun as a heat-planet, gravity that only works when things move, metals that always existed, and the obligatory complaint about quantum leaps.</description>
      <content:encoded><![CDATA[<p><em>In September 2020 Alexander Küpper and I gave a teacher training session on
stellar formation — why experiments for it are hard to design, and what
misconceptions students typically arrive with. This post is loosely based on
the misconceptions part of that talk, which turned out to generate the most
discussion.</em></p>
<hr>
<h2 id="why-misconceptions-are-not-just-wrong-answers">Why Misconceptions Are Not Just Wrong Answers</h2>
<p>Before the list, a clarification that matters pedagogically.</p>
<p>When education researchers say &ldquo;misconception,&rdquo; they do not mean a random
error or a gap in knowledge. A misconception is a stable, self-consistent
mental model that students actively use to interpret new information. It
persists not because the student hasn&rsquo;t heard the correct explanation but
because the incorrect model handles a wide range of everyday experience
reasonably well.</p>
<p>&ldquo;Fire is a thing that makes heat and light and consumes fuel&rdquo; is a perfectly
adequate mental model for everything a student encounters outside a physics
class. It explains candles, campfires, gas hobs, and car engines. The fact
that it also leads the same student to conclude that the Sun &ldquo;burns&rdquo; in the
chemical combustion sense is not a failure of intelligence — it is the
natural extension of a model that works.</p>
<p>The implication, which Bransford, Brown, and Cocking put plainly in 2000:
if you ignore what students already believe and simply present the correct
model, &ldquo;the understanding they develop can vary substantially from what the
instructor intended.&rdquo; The new information gets interpreted through the
existing model, not in place of it. You end up with students who can repeat
&ldquo;the Sun fuses hydrogen&rdquo; while still, in their mental model, imagining it as
a very large and very hot fire.</p>
<p>With that said: here is the list.</p>
<hr>
<h2 id="the-sun-is-not-a-star">The Sun Is Not a Star</h2>
<p>This one leads because it is the most structurally interesting.</p>
<p>Bailey et al. (2009), in a study of students&rsquo; pre-instructional ideas about
stars and star formation, document the following category of response: the
Sun is a special kind of astronomical body with its own distinct properties.
It is not a star. Stars are the things you see in the night sky. The Sun is
different.</p>
<p>This is not an isolated finding. Schecker et al. (2018) document the same
pattern in the German context. Students who know perfectly well that &ldquo;the
Sun is a star&rdquo; as a stated fact will nonetheless, when asked to reason about
stellar properties, implicitly exempt the Sun from those properties. Stars
are far away, they are small and faint, they are cold and distant. The Sun
is close, large, and bright. Ergo the Sun cannot really be a star, whatever
the textbook says.</p>
<p>The pedagogical consequence is that teaching stellar evolution to students
who hold this model requires first collapsing the Sun/star distinction —
otherwise everything that follows is about something unfamiliar and distant
rather than about the object eight light minutes away that we can observe
in detail.</p>
<p>A companion misconception: <strong>all stars are smaller than the Sun</strong>. This is
the inverse problem. Students who correctly classify the Sun as a star but
have only seen stars as faint points of light infer that stars must be small.
The Sun, which they know to be large, therefore cannot be a typical star.
Betelgeuse — a red supergiant with a radius approximately 700 times the
Sun&rsquo;s, which if placed at the Sun&rsquo;s position would extend past the asteroid belt — tends
to produce strong cognitive dissonance when it is first encountered.</p>
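<p>The arithmetic behind that figure is short enough to check, using
standard rounded constants:</p>

```python
R_SUN_KM = 6.957e5   # solar radius in km
AU_KM = 1.496e8      # astronomical unit in km

r_betelgeuse_au = 700 * R_SUN_KM / AU_KM
# ≈ 3.26 AU: past Mars's orbit (1.52 AU) and well into
# the asteroid belt (2.2–3.3 AU)
```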
<hr>
<h2 id="the-sun-is-on-fire">The Sun Is on Fire</h2>
<p>The combustion model of stellar energy is, empirically, the most common
student conception and the hardest to dislodge.</p>
<p>From Favia et al.&rsquo;s misconception inventory, translated loosely:</p>
<ul>
<li><em>&ldquo;The Sun is made of fire.&rdquo;</em></li>
<li><em>&ldquo;Stars run on fuel: petrol or natural gas.&rdquo;</em></li>
<li><em>&ldquo;The Sun is made of molten lava.&rdquo;</em></li>
<li><em>&ldquo;The Sun is a heat-planet.&rdquo;</em></li>
</ul>
<p>Bailey et al.&rsquo;s quantitative data: when asked how stars produce light,
32% of students described chemical burning. A further 28% described unspecified
&ldquo;chemical reactions.&rdquo; Only 7% named nuclear fusion. Only 3% could both name
fusion and correctly connect it to the production of light.</p>
<p>The combustion model is coherent and consistent. It gives you a mechanism
(fuel + oxygen → heat and light), a timescale (stars eventually run out of
fuel and go dark), and a product (visible light and heat). What it cannot
handle is the scale: the Sun has been shining for 4.6 billion years and has
approximately 5 billion years of fuel remaining. Chemical combustion at the
Sun&rsquo;s luminosity would exhaust any chemically plausible fuel supply in tens
of thousands of years. This is the crack in the model that fusion fills —
not by saying &ldquo;the Sun burns differently&rdquo; but by replacing the entire energy
mechanism with one that operates at scales the combustion model cannot reach.</p>
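<p>The timescale argument is worth making quantitative, because the numbers are
not even close. A back-of-envelope sketch, using a deliberately generous
10⁸ J/kg for chemical energy release (real fuel–oxidiser mixtures manage
roughly an order of magnitude less):</p>

```python
M_SUN = 1.989e30    # kg, the Sun's mass
L_SUN = 3.828e26    # W, the Sun's luminosity
SECONDS_PER_YEAR = 3.156e7

E_CHEM = 1.0e8      # J/kg: generous upper bound for chemical burning
E_FUSION = 6.3e14   # J/kg: ~0.7% of mc² released fusing H to He

t_chem_years = M_SUN * E_CHEM / L_SUN / SECONDS_PER_YEAR
# ≈ 16,000 years: the combustion model misses by six orders of magnitude

t_fusion_years = 0.1 * M_SUN * E_FUSION / L_SUN / SECONDS_PER_YEAR
# ≈ 10 billion years, assuming only the core (~10% of the mass) fuses
```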
<p>One related misconception worth noting explicitly: <strong>the Sun is hottest at
its surface</strong>. This is the intuitive model — things are hot near the fire
and cooler further away. The corona&rsquo;s temperature of a million Kelvin, far
above the photospheric 5,778 K, violates this so thoroughly that it remained
an active research problem for decades (and, in some senses, still is).
Students encountering coronal heating for the first time do not usually reject
it, but they do find it genuinely strange in a way they cannot articulate —
which is the signature of something colliding with a stable prior model.</p>
<hr>
<h2 id="gravity-only-works-when-things-move">Gravity Only Works When Things Move</h2>
<p>The gravity misconceptions documented in the research literature are worth
treating separately because they have direct consequences for understanding
stellar formation — which depends entirely on gravity acting on stationary
or slowly drifting gas clouds.</p>
<p>The relevant findings:</p>
<p><strong>Gravity requires motion</strong> (Palmer, 2001). A significant proportion of
students believe that gravity only acts on objects that are in motion. A
stationary object is not subject to gravitational attraction. A table sitting
on the floor: fine, no gravity needed. A gas cloud drifting slowly through
space: also fine. A gas cloud being compressed by gravitational self-attraction:
this requires gravity to act on particles that are not yet moving, which the
model cannot accommodate.</p>
<p><strong>Force implies movement</strong> (Gunstone &amp; Watts, 1985). The more general version:
forces produce motion, and where there is no net motion, there is no net force.
The concept of force balance — two equal and opposite forces summing to zero
net force, with the object not moving — is not available to students holding
this model. It is hard to overstate how consequential this is for astrophysics.
Almost every stable astrophysical structure — a main-sequence star, a planetary
orbit, a galaxy&rsquo;s rotation — is a force balance. Students without the concept
cannot reason about any of them correctly.</p>
<p><strong>Gravity only acts on Earth</strong> (Bar, Brosh, &amp; Sneider, 2016). Students in
the space context often reason that gravity is a property of Earth specifically.
In space, things are &ldquo;weightless&rdquo; — and weightlessness is interpreted as the
absence of gravity rather than as the experience of free fall in a gravitational
field. The result: gravity cannot be the mechanism by which an interstellar
gas cloud collapses, because gas clouds are in space and gravity does not work
there. Asghar and Libarkin (2010) found that only one in five non-physics
college students could correctly describe gravity as an attractive force between
masses, using the correct vocabulary.</p>
<p>These are not fringe findings. They are the majority conception at the
pre-instructional stage. Any unit on stellar formation that opens with
&ldquo;gravity compresses the gas cloud&rdquo; is speaking to students who mostly do not
believe that gravity can do that to a gas cloud in space.</p>
<hr>
<h2 id="metals-always-existed">Metals Always Existed</h2>
<p>This misconception is my personal favourite because it requires no incorrect
intuition — it requires an absence of information that most people have never
had reason to acquire.</p>
<p>Students and adults who have not encountered stellar nucleosynthesis simply
have no model for where heavy elements come from. Asked directly, a common
response is that metals &ldquo;always existed&rdquo; — they are a feature of the universe,
present from the beginning. The alternative framing: &ldquo;stars create matter from
nothing&rdquo; — which captures the sense that something is being generated, without
a mechanism.</p>
<p>The correct picture: the Big Bang produced primarily hydrogen and helium, with
trace amounts of lithium and beryllium. Every heavier element — including all
the carbon in your body, all the iron in your blood, all the oxygen in every
breath — was synthesised in a stellar interior or in a supernova. The gold in
a wedding ring was produced in a neutron star merger. We are, in the precise
sense of the phrase, made of star stuff: not because stars are somehow
magical, but because the nuclear physics of stellar interiors and violent stellar
deaths is the only process in the universe that can manufacture these elements.</p>
<p>This has a direct implication for stellar evolution education: if students
believe metals always existed, the cycle of stellar death and new star
formation — in which dying stars enrich the interstellar medium with heavy
elements that become part of the next generation of stars and their planets —
loses most of its meaning. The cycle is interesting precisely because it
explains why later-generation stars and their planets have a richer elemental
composition than first-generation stars. Remove that frame and you have
a sequence of events with no cumulative significance.</p>
<hr>
<h2 id="some-language-based-misconceptions-a-brief-digression">Some Language-Based Misconceptions (A Brief Digression)</h2>
<p>Since I promised something about quantum leaps: the phrase &ldquo;quantum leap&rdquo;
in everyday usage means a sudden, large, discontinuous advance. In physics,
a quantum transition is the smallest possible discrete change in a system&rsquo;s
energy state. The electron moves from one energy level to another; the
photon is emitted or absorbed; the scale of change is on the order of
electron-volts. It is, emphatically, not large.</p>
<p>The astronomy version of this class of error:</p>
<p><strong>&ldquo;Light year&rdquo; used as a unit of time.</strong> &ldquo;That happened light years ago.&rdquo;
A light year is the distance light travels in one year — approximately
9.46 × 10¹² kilometres. It is a unit of distance, not time. This one is
so embedded in everyday usage that correcting it usually produces mild
annoyance rather than reconsideration.</p>
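<p>The number itself is a one-line multiplication: the speed of light times
the seconds in a (Julian) year.</p>

```python
C_KM_S = 299_792.458                   # speed of light in km/s
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year

light_year_km = C_KM_S * SECONDS_PER_YEAR
# ≈ 9.46e12 km: a distance, not a time
```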
<p><strong>&ldquo;Shooting stars.&rdquo;</strong> Meteors — small rocky or metallic bodies entering
the atmosphere — have nothing to do with stars. They are typically the
size of a grain of sand to a pebble. The visual resemblance to a moving
point of light crossing the sky is where the name comes from; the
resemblance to stellar physics is zero.</p>
<p><strong>&ldquo;Black holes suck things in.&rdquo;</strong> Black holes do not have more gravity
than the object that formed them at the same distance. If the Sun were
replaced by a black hole of equal mass, the planets would continue on
their current orbits. A black hole is only a black hole within its
Schwarzschild radius; beyond that it is a gravitational field like any
other. What black holes have is a point of no return — the event horizon —
beyond which escape velocity exceeds the speed of light. They do not
actively pull. They are very massive objects that things can fall into.</p>
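<p>The "equal mass, same orbits" claim can be made concrete with the
Schwarzschild radius, r_s = 2GM/c². For one solar mass it comes to about
three kilometres; everywhere outside the former solar surface, nothing
about the gravitational field changes.</p>

```python
G = 6.674e-11     # m³ kg⁻¹ s⁻², gravitational constant
C = 2.998e8       # m/s, speed of light
M_SUN = 1.989e30  # kg

r_s = 2 * G * M_SUN / C ** 2   # Schwarzschild radius
# ≈ 2950 m: the event horizon of a solar-mass black hole.
# Earth, 1.5e11 m away, would feel exactly the same pull.
```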
<p><strong>&ldquo;The dark side of the Moon.&rdquo;</strong> The Moon has a far side (permanently
facing away from Earth, due to tidal locking) and a near side. Both sides
receive approximately equal sunlight over the lunar cycle. The far side
is not permanently dark; it has a day and a night like the near side.
&ldquo;Dark side&rdquo; persists in common usage because Pink Floyd used it as an
album title and nobody wanted to call it &ldquo;The Far Side of the Moon.&rdquo;
(Although Douglas Adams would have had something to say about that.)</p>
<hr>
<h2 id="why-this-list-matters-for-teaching">Why This List Matters for Teaching</h2>
<p>The misconceptions described above are not randomly distributed. They cluster
around three areas where intuitive extrapolation from everyday experience
leads systematically away from the correct physics:</p>
<ol>
<li>
<p><strong>Scale</strong>: human intuition was not built for 150 million kilometres,
let alone 4.6 billion years or the 9.46 × 10¹² km in a light year.
The Sun cannot be fire because fire cannot last 4.6 billion years;
but &ldquo;4.6 billion years&rdquo; is not a number that everyday experience makes
graspable.</p>
</li>
<li>
<p><strong>Energy mechanism</strong>: combustion is the dominant frame for &ldquo;things that
produce heat and light.&rdquo; Nuclear fusion is not part of everyday experience
at any scale. The conceptual distance between them is not factual but
mechanistic — it requires replacing an entire causal model.</p>
</li>
<li>
<p><strong>Gravity</strong>: our direct experience of gravity is of a downward force,
active at Earth&rsquo;s surface, which keeps things from floating away.
The idea of gravity as a universal mutual attraction between all masses
— active in empty space, responsible for cloud collapse and galaxy formation
— is a substantive generalisation that everyday experience does not motivate.</p>
</li>
</ol>
<p>The pedagogical literature&rsquo;s recommendation is not to avoid these topics
but to surface the prior models explicitly before presenting the correct
physics. If you ask students &ldquo;where does the Sun&rsquo;s energy come from?&rdquo; before
you teach nuclear fusion, you learn what they believe and you create the
cognitive conditions for productive conceptual conflict. If you simply present
the fusion model without that step, students add &ldquo;fusion&rdquo; to their vocabulary
while retaining &ldquo;fire&rdquo; in their mental model.</p>
<p>The experiments Alexander Küpper and I have been developing through the
astro-lab project — described in the <a href="/posts/stellar-evolution-diy/">stellar evolution post</a>
and the <a href="/posts/astro-lab-at-home/">astro-lab@home post</a> — are designed
with these specific misconceptions in mind. The net-based gravity experiment
addresses the &ldquo;gravity doesn&rsquo;t work in space&rdquo; and &ldquo;force requires motion&rdquo;
problems directly, by making gravitational attraction between all particles
visible as a material structure. The pressure-temperature experiment makes
the &ldquo;compression heats the gas&rdquo; step concrete before any mention of fusion.</p>
<p>These are not complete solutions to deeply held misconceptions. But they are
a start at building the conceptual scaffolding that makes &ldquo;and then fusion
begins&rdquo; something other than an assertion to be memorised and filed away
without understanding.</p>
<hr>
<h2 id="references">References</h2>
<p>Asghar, A. A., &amp; Libarkin, J. C. (2010). Gravity, magnetism, and &ldquo;down&rdquo;:
Non-physics college students&rsquo; conceptions of gravity. <em>The Science Educator</em>,
19(1), 42–55.</p>
<p>Bailey, J. M., Prather, E. E., Johnson, B., &amp; Slater, T. F. (2009). College
students&rsquo; preinstructional ideas about stars and star formation.
<em>Astronomy Education Review</em>, 8(1).
<a href="https://doi.org/10.3847/AER2009038">https://doi.org/10.3847/AER2009038</a></p>
<p>Bar, V., Brosh, Y., &amp; Sneider, C. (2016). Weight, mass, and gravity:
Threshold concepts in learning science. <em>Science Educator</em>, 25(1), 22–34.</p>
<p>Bransford, J. D., Brown, A. L., &amp; Cocking, R. R. (Eds.) (2000). <em>How People
Learn: Brain, Mind, Experience, and School.</em> National Academy Press.</p>
<p>Favia, A., Comins, N. F., &amp; Thorpe, G. L. (2013). The elements of item
response theory and its framework in analyzing introductory astronomy college
student misconceptions. I. Galaxies. <em>Astronomy Education Review</em>.</p>
<p>Gunstone, R., &amp; Watts, M. (1985). Force and motion. In R. Driver, E. Guesne,
&amp; A. Tiberghien (Eds.), <em>Children&rsquo;s Ideas in Science</em> (pp. 85–104).
Open University Press.</p>
<p>Palmer, D. (2001). Students&rsquo; alternative conceptions and scientifically
acceptable conceptions about gravity. <em>International Journal of Science
Education</em>, 23(7), 691–706.
<a href="https://doi.org/10.1080/09500690010006527">https://doi.org/10.1080/09500690010006527</a></p>
<p>Schecker, H., Wilhelm, T., Hopf, M., &amp; Duit, R. (Eds.) (2018).
<em>Schülervorstellungen und Physikunterricht.</em> Springer.</p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2025-09-14</strong>: Updated the DOI for Bailey et al. (2009) to the correct 10.3847/AER2009038.</li>
<li><strong>2025-09-14</strong>: Changed &ldquo;would extend past Mars&rdquo; to &ldquo;would extend past the asteroid belt&rdquo; for Betelgeuse at ~700 R☉. At ~3.26 AU, Betelgeuse&rsquo;s radius exceeds Mars&rsquo;s orbital distance (1.52 AU) by more than a factor of two and reaches well into the asteroid belt (2.2–3.3 AU).</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>What Happens When You Film Student Teachers: ViLLA and the Case for Video in Teacher Education</title>
      <link>https://sebastianspicker.github.io/posts/villa-video-teacher-education/</link>
      <pubDate>Sun, 14 Jun 2020 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/villa-video-teacher-education/</guid>
      <description>ViLLA is an online portal of real classroom videos built for teacher education at the University of Cologne. The idea sounds straightforward. Getting there required filming actual lessons, building infrastructure, surviving a quasi-experiment, and eventually convincing the federal government that this was worth scaling. Some notes on how that went.</description>
      <content:encoded><![CDATA[<p><em>In September 2019 I gave a presentation on the ViLLA project at the ZuS Innovation Workshop
at the University of Cologne together with Daniel Zimmermann. This post is the
blog-friendly version of that presentation — what ViLLA is, why video in teacher
education is not as obvious as it sounds, and what the research actually showed.
The project team at the time: Prof. Dr. Dr. Kai Kaspar, Prof. Dr. Johannes König,
Charlotte Kramer, Marco Rüth, Daniel Zimmermann, Anne van Laak, and myself.</em></p>
<hr>
<h2 id="the-problem-with-learning-to-teach">The Problem With Learning to Teach</h2>
<p>Here is the uncomfortable thing about learning to teach: for the first few years of your
career, you are learning the craft on children. Every class you misread, every
transition you fumble, every moment you lose the room — those are learning experiences,
and the students in the room pay part of the cost.</p>
<p>This is not a new problem, and nobody is pretending it has a clean solution. But it
raises a question that teacher education programmes have been grappling with for a long
time: how much of the relevant learning can happen before the student teacher is standing
alone in front of thirty eleven-year-olds?</p>
<p>One answer — not the only one, but a defensible one — is: more of it, if you give people
good video.</p>
<hr>
<h2 id="what-villa-is">What ViLLA Is</h2>
<p><strong>ViLLA</strong> (Videos in der Lehrerinnen- und Lehrerausbildung — Videos in Teacher Education)
is an online portal of real classroom recordings built for use in teacher education at
the University of Cologne. The idea was to film actual teaching, make the recordings
searchable and pedagogically annotated, and give student teachers access to genuine
classroom situations before they were responsible for managing one themselves.</p>
<p>This sounds straightforward until you try to do it. Filming real classrooms requires
ethical clearance, consent from pupils and parents, cooperation from schools, and a
recording setup that doesn&rsquo;t turn the lesson into a performance. The resulting videos
need to be usable for instruction, which means they need accompanying material:
lesson plans, worksheets, transcripts, annotations by subject-matter specialists.
And then they need to be housed somewhere students can actually find them.</p>
<p>The first phase of ViLLA ran from April 2013 to December 2014, funded by the
University of Cologne&rsquo;s Innovation in Teaching programme. We opened officially on
5 November 2014 with a database of classroom sequences tagged by subject, year group,
school type, and didactic focus. The core intended audience: student teachers,
<em>Referendar*innen</em> (trainee teachers in the practical training phase), and the
university instructors and school-based mentors working with them.</p>
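<p>To make the tagging model concrete, here is a minimal sketch of what a searchable, tagged sequence database looks like. This is illustrative only — the field names and example values are my own, not ViLLA&rsquo;s actual schema:</p>

```python
from dataclasses import dataclass, field


@dataclass
class ClassroomSequence:
    """One filmed classroom sequence with its pedagogical tags."""
    title: str
    subject: str
    year_group: int
    school_type: str
    didactic_focus: list[str] = field(default_factory=list)


def search(sequences, *, subject=None, year_group=None, focus=None):
    """Filter sequences by any combination of tag criteria."""
    results = []
    for seq in sequences:
        if subject is not None and seq.subject != subject:
            continue
        if year_group is not None and seq.year_group != year_group:
            continue
        if focus is not None and focus not in seq.didactic_focus:
            continue
        results.append(seq)
    return results


videos = [
    ClassroomSequence("Fractions intro", "Mathematics", 6, "Gymnasium",
                      ["classroom management", "transitions"]),
    ClassroomSequence("Group reading", "German", 5, "Gesamtschule",
                      ["group work monitoring"]),
]

hits = search(videos, subject="Mathematics", focus="transitions")
print([s.title for s in hits])  # -> ['Fractions intro']
```

<p>The point of the sketch is the combinatorial search: any subset of tags can be combined, which is exactly what lets a student teacher ask for, say, all year-6 mathematics sequences with a classroom-management focus.</p>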
<hr>
<h2 id="what-the-research-showed">What the Research Showed</h2>
<p>The project was not just infrastructure. From the beginning we ran research alongside
the portal development — specifically, quasi-experimental studies on whether and how
video-based instruction actually improves the skills we care about.</p>
<p>The target construct was <strong>situation-specific skills for classroom management</strong> —
the ability to perceive, interpret, and respond to classroom events in real time.
This is a domain where there is reasonable theoretical agreement that expert teachers
differ from novices not primarily in declarative knowledge (knowing that you should
address disruptions early) but in perception and response speed (actually noticing
the early signs and acting on them).</p>
<p>The key finding from the ViLLA studies: <strong>combining video with transcripts was more
effective than control seminars that used neither</strong>. Students who worked with video
and transcript material showed better development of situation-specific classroom
management skills than comparison groups. The effect was not enormous, but it was
there, it replicated, and it was large enough to justify the infrastructure investment.</p>
<p>The transcript component is worth highlighting because it&rsquo;s not obvious. You might
expect that video alone would be sufficient — you are showing people real teaching.
But the transcript creates an additional layer of perceptual access: you can pause
on a moment, read back exactly what was said, annotate, compare your reading of the
situation with a peer&rsquo;s. The multimodal combination seems to do something that either
medium alone does not.</p>
<hr>
<h2 id="villa-20-scaling-up">ViLLA 2.0: Scaling Up</h2>
<p>By 2015, ViLLA had grown into a second development phase. In November 2016 it received
federal funding through the BMBF&rsquo;s <em>Qualitätsoffensive Lehrerbildung</em> (Quality Initiative
for Teacher Education), embedded in the University of Cologne&rsquo;s
<em>Zukunftsstrategie Lehrer*innenbildung</em> (ZuS) umbrella project.</p>
<p>The scale change was significant: by the time of the 2019 presentation the database
held <strong>185 videos</strong>, covering more subjects, more school types, and more
outside-school teaching and learning scenarios than the original portal had included.
The self-learning modules — originally an add-on — became a central feature.</p>
<p>Two types of modules emerged from practice:</p>
<p><strong>Case-based modules</strong> built around a specific filmed sequence, asking the learner
to work through what they observe, what decisions the teacher made, and what they
would do differently. These are close to case-based reasoning as used in medical
education — the video is the case.</p>
<p><strong>Theme-centred modules</strong> organised around a pedagogical concept (classroom
transitions, group work monitoring, handling disruptions) and drawing on multiple
video examples to illustrate the same phenomenon across different contexts. The
goal is pattern recognition — not learning what to do in <em>this</em> lesson, but
developing a schema that transfers to next year&rsquo;s class in a different school.</p>
<hr>
<h2 id="the-meta-portal-and-what-it-means">The Meta-Portal and What It Means</h2>
<p>One development I am particularly interested in from a research infrastructure
perspective: ViLLA&rsquo;s integration into <strong>unterrichtsvideos.net</strong>, a meta-portal
that aggregates classroom video collections from universities across Germany.</p>
<p>The single-portal model has an obvious limitation: your institution&rsquo;s videos
reflect your institution&rsquo;s context. The schools you filmed, the subject specialists
on your team, the pedagogical questions your programme emphasises. Aggregation
across portals means a student teacher in Cologne can access video collected at
Münster or Berlin, search across the combined database by year group and subject,
and get access without separate registration at each institution.</p>
<p>This matters for research too. A shared infrastructure with standardised tagging
creates the conditions for cross-institutional studies. You can ask whether the
same video material works differently in different programme contexts, or whether
different annotation frameworks lead to different learning outcomes. The portal is
also, then, a methodology — a way of generating comparable data.</p>
<hr>
<h2 id="what-i-think-is-actually-interesting-here">What I Think Is Actually Interesting Here</h2>
<p>I should be honest about where my personal research interest sits in all of this,
because it is not primarily in the technology.</p>
<p>The thing that I find genuinely interesting about the ViLLA project is the implicit
theory of professional learning it rests on. We filmed real lessons — not idealised
demonstrations, not training videos produced for the purpose, but actual classroom
teaching with the roughness and contingency that implies. We then gave those videos
to student teachers and asked them to look carefully.</p>
<p>The assumption is that professional perception can be educated. That what distinguishes
a competent teacher from a novice is not just accumulated experience but the capacity
to read situations quickly and accurately — and that this capacity can be developed
through structured encounter with material before you are responsible for it.</p>
<p>This is an empirical claim and we have evidence for it. But it also connects to
broader questions about expertise, perception, and what it means to prepare someone
for a practice-based profession. Medical education has been working on these
questions through simulation and case-based learning for decades. Teacher education
is, in many institutions, still catching up.</p>
<p>ViLLA is one attempt to close that gap. Whether it is the right attempt, in its
current form, is something I am still working out. But the question it is trying to
answer — what do you need to have seen, and thought about, before you can teach
well — seems to me like one of the important ones.</p>
<hr>
<h2 id="where-this-is-going">Where This Is Going</h2>
<p>Two strands that were live at the time of the 2019 presentation and that I will
return to in later posts:</p>
<p>The <strong>ProvidiS</strong> project (Förderung der professionellen Wahrnehmung in digitalen,
videobasierten Selbstlernmodulen — Promoting Professional Perception in Digital,
Video-Based Self-Learning Modules), a follow-on BMBF project in cooperation with
the Universities of Münster and FU Berlin, which moves from infrastructure to
targeted intervention design. The question shifts from &ldquo;does video work?&rdquo; to
&ldquo;which features of video-based learning design produce which effects on professional
perception, for which learners?&rdquo;</p>
<p>And a methodological strand I have become increasingly interested in: <strong>the
videography setting itself as a research question</strong>. How you film a lesson — camera
placement, editing conventions, what gets cut — shapes what the viewer can perceive.
The transcript does something similar. These are not neutral mediations. They are
constructions, and the choices made in constructing them have downstream effects on
what student teachers learn to see. This connects to questions I have been thinking
about in qualitative methodology more broadly — which I will probably end up writing
about separately.</p>
<hr>
<h2 id="references">References</h2>
<p>König, J., Blömeke, S., Klein, P., Suhl, U., Busse, A., &amp; Kaiser, G. (2014).
Is teachers&rsquo; general pedagogical knowledge a premise for noticing and interpreting
classroom situations? <em>Teaching and Teacher Education</em>, 38, 76–88.
<a href="https://doi.org/10.1016/j.tate.2013.11.004">https://doi.org/10.1016/j.tate.2013.11.004</a></p>
<p>Kramer, C., König, J., Strauß, S., &amp; Kaspar, K. (2020). Classroom videos or
transcripts? A quasi-experimental study to assess the effects of media-based
learning on pre-service teachers&rsquo; situation-specific skills of classroom
management. <em>International Journal of Educational Research</em>, 103, 101624.
<a href="https://doi.org/10.1016/j.ijer.2020.101624">https://doi.org/10.1016/j.ijer.2020.101624</a></p>
<p>Sherin, M. G. (2007). The development of teachers&rsquo; professional vision in video
clubs. In R. Goldman, R. Pea, B. Barron, &amp; S. J. Derry (Eds.),
<em>Video Research in the Learning Sciences</em> (pp. 383–395). Lawrence Erlbaum.</p>
<p>van Es, E. A., &amp; Sherin, M. G. (2002). Learning to notice: Scaffolding new teachers&rsquo;
interpretations of classroom interactions. <em>Journal of Technology and Teacher
Education</em>, 10(4), 571–596.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Hello World — What Is This Blog?</title>
      <link>https://sebastianspicker.github.io/posts/hello-world/</link>
      <pubDate>Wed, 22 Jan 2020 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/hello-world/</guid>
      <description>An introduction to this blog: scientific ideas too lazy to submit, peer review openly invited, criticism will be posted.</description>
      <content:encoded><![CDATA[<h2 id="why-does-this-exist">Why does this exist?</h2>
<p>I have a folder. It is full of half-finished ideas, speculative derivations, and results that are <em>probably</em> interesting but will <em>definitely</em> never make it through formal peer review — at least not at the current level of polish.</p>
<p>So instead of letting them rot, I&rsquo;m posting them here.</p>
<p>The format is loose. The rigor is variable. The pinky promise is firm:</p>
<blockquote>
<p><strong>Peer review is welcome. All criticism will be posted alongside the original entry.</strong></p>
</blockquote>
<p>If you find an error, a flawed assumption, or a better framing — open an <a href="https://github.com/sebastianspicker/sebastianspicker.github.io/issues">issue on GitHub</a>. I will read it, respond to it, and append it to the post.</p>
<hr>
<h2 id="what-to-expect">What to expect</h2>
<p>Posts will look roughly like this:</p>
<ol>
<li><strong>An idea</strong> — usually the kind that arrives at 11 pm and seems very important.</li>
<li><strong>Some argument</strong> — math, code, or prose, depending on what makes sense.</li>
<li><strong>Honest limitations</strong> — what would need to be true for this to actually hold up.</li>
<li><strong>Open questions</strong> — what I don&rsquo;t know and am not going to pretend I do.</li>
</ol>
<hr>
<h2 id="a-taste-of-the-math-rendering">A taste of the math rendering</h2>
<p>Since this blog covers scientific content, equations should work. Here&rsquo;s a sanity check:</p>
<p>Inline math: the Gaussian integral \( \int_{-\infty}^{\infty} e^{-x^2} dx = \sqrt{\pi} \) is a classic.</p>
<p>Display math:</p>
$$
\sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1} = 1 - \frac{1}{3} + \frac{1}{5} - \cdots = \frac{\pi}{4}
$$<p>And a block with <code>\[...\]</code> delimiters:</p>
\[
  \mathcal{F}\{f\}(\xi) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i x \xi}\, dx
\]<p>If those rendered correctly, we&rsquo;re in business.</p>
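<p>And while we&rsquo;re sanity-checking: the Leibniz series above is famous for converging painfully slowly — the error after \( n \) terms shrinks only like \( 1/n \). A few partial sums make that obvious (a quick numerical check, nothing more):</p>

```python
import math


def leibniz_partial(n_terms: int) -> float:
    """Partial sum of the Leibniz series for pi/4."""
    return sum((-1) ** n / (2 * n + 1) for n in range(n_terms))


# Four times the partial sum approaches pi, but slowly:
for n in (10, 1_000, 100_000):
    approx = 4 * leibniz_partial(n)
    print(f"{n:>7} terms: {approx:.6f} (error {abs(approx - math.pi):.2e})")
```

<p>Ten terms still miss \( \pi \) in the first decimal place; a hundred thousand terms buy you only about five correct digits. Good thing the Monte Carlo estimator below doesn&rsquo;t have to compete on efficiency either.</p>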
<hr>
<h2 id="a-taste-of-code">A taste of code</h2>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">estimate_pi</span><span class="p">(</span><span class="n">n_samples</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Monte Carlo estimation of pi.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">n_samples</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">    <span class="n">inside</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="n">y</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span> <span class="o">&lt;=</span> <span class="mf">1.0</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="mi">4</span> <span class="o">*</span> <span class="n">inside</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;π ≈ </span><span class="si">{</span><span class="n">estimate_pi</span><span class="p">(</span><span class="mi">10_000_000</span><span class="p">)</span><span class="si">:</span><span class="s2">.6f</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span></code></pre></div><hr>
<p>See you in the next post.</p>
]]></content:encoded>
    </item>
  </channel>
</rss>
