<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Science-and-Society on Sebastian Spicker</title>
    <link>https://sebastianspicker.github.io/tags/science-and-society/</link>
    <description>Recent content in Science-and-Society on Sebastian Spicker</description>
    <image>
      <title>Sebastian Spicker</title>
      <url>https://sebastianspicker.github.io/og-image.png</url>
      <link>https://sebastianspicker.github.io/og-image.png</link>
    </image>
    <generator>Hugo -- 0.160.0</generator>
    <language>en</language>
    <lastBuildDate>Tue, 07 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://sebastianspicker.github.io/tags/science-and-society/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>There Is an App for That — Until There Isn&#39;t</title>
      <link>https://sebastianspicker.github.io/posts/automatable-unautomatable-baumol-mental-health/</link>
      <pubDate>Tue, 07 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/automatable-unautomatable-baumol-mental-health/</guid>
      <description>German health insurance will reimburse a mental health app within days but cannot provide a therapist within six months. Last week, psychotherapy fees were cut by 4.5%. Baumol&amp;rsquo;s cost disease — originally about why string quartets get relatively more expensive — explains why the app gold rush and the collapse of mental health provision are the same phenomenon.</description>
<content:encoded><![CDATA[<p>Someone vibe coded an app that tells you how many layers to wear today. It has 85,000 users. Someone else tracks her eyelash styles — every new set gets a photo and a note about the method. A father built Storypot: his kids drag emoji into a virtual pot and the app generates a bedtime story. A product manager made Standup Buddy, which randomises who talks first in a daily meeting. That is the entire feature.</p>
<p>These are not bad things. Some of them are genuinely lovely — Storypot in particular. The layers app clearly meets a need, given 85,000 people agree. I have built tools like this myself — I automated my concert setlist workflow and <a href="/posts/setlist-to-playlist/">wrote about it on this blog</a> — and the feeling of compressing a forty-minute ritual into four minutes of machine-assisted execution is real and satisfying.</p>
<p>There is a term for this now. Karpathy coined it in early 2025: vibe coding. You describe what you want, the model writes the code, you run it, you fix what breaks by describing the fix, and at no point do you necessarily understand what the code does. The barrier to building software has not been lowered so much as removed. A single person with an afternoon and a language model can ship what would have required a team and a quarter, two years ago.</p>
<p>Meanwhile. In Germany, the average wait from an initial consultation to the start of psychotherapy is 142 days — nearly five months — according to a BPtK analysis of statutory insurance billing data <a href="#ref-1">[1]</a>. The Telefonseelsorge — the crisis line, the last resort — handled 1.2 million calls in 2024. It is staffed by approximately 7,700 volunteers and funded primarily by the Protestant and Catholic churches. Its financing is described, in its own institutional language, as <em>äußerst angespannt</em> — extremely strained <a href="#ref-2">[2]</a>. Six days ago, on April 1, psychotherapy fees in Germany were cut by 4.5% <a href="#ref-3">[3]</a>. The thesis of this post is structural, not moral. There is a class of work that scales, and a class of work that does not. Our entire economy of attention — cultural, financial, technological — is optimised for the first class. The second class is not merely neglected. It is being made structurally more expensive, in a precise economic sense, by the very productivity gains that make the first class so intoxicating. And the policy apparatus, facing this structural pressure, is doing exactly what you would predict: it is funding apps.</p>
<p>The economist William Baumol explained the mechanism in 1966. It has a name, and the name is a diagnosis.</p>
<hr>
<h2 id="the-seduction-of-leverage">The Seduction of Leverage</h2>
<p>What makes vibe coding culturally significant is not the code. It is the leverage. A single developer, aided by a language model, can produce software that reaches millions of users. The marginal cost of an additional user approaches zero. The output scales without bound while the input — one person, one prompt, one afternoon — stays fixed. This is the defining characteristic of automatable work: the ratio of output to input can grow without limit.</p>
<p>This is not new. Software has always had this property. What is new is that the barrier to producing software has collapsed. You no longer need to understand data structures, or networking, or the programming language. You need an idea and a few hours. The productivity frontier has shifted so dramatically that the interesting constraint is no longer <em>can I build it</em> but <em>should anyone use it</em>. The cultural response has been euphoric. Communities, podcasts, courses, manifestos. People who have never written a line of code are shipping products. I am not interested in dismissing this. The ability to build is a form of agency, and more people having it is not, in itself, a problem. The problem is what the euphoria obscures.</p>
<h2 id="what-therapy-actually-requires">What Therapy Actually Requires</h2>
<p>A psychotherapy session has the following structure. One therapist sits with one patient for approximately fifty minutes. The therapist listens, observes, formulates, responds. The patient speaks, reflects, resists, revises. The therapeutic alliance — the quality of the relationship between therapist and patient — is one of the most robust predictors of treatment outcome, across modalities, across conditions, across decades of research <a href="#ref-4">[4]</a>. This is not a feature that can be optimised away. It is the mechanism of action. When a meta-analysis finds that the specific technique matters less than the relationship — that CBT, psychodynamic, and humanistic therapies produce roughly equivalent outcomes when the alliance is strong — it is telling you that the human in the room is not an implementation detail. The human in the room <em>is</em> the intervention.</p>
<p>You cannot parallelise this. A therapist cannot see two patients simultaneously without degrading the thing that makes the session work. You cannot batch it — twelve people in a room is group therapy, which is a different intervention with different dynamics and different limitations. You cannot cache it — the session is not a retrieval operation over stored responses but an emergent interaction that depends on what happens in the room that day. The irreducible unit of therapy is: one trained human, fully present, for one hour, with one other human. This has not changed since Freud&rsquo;s consulting room on Berggasse 19, and no plausible technological development will change it, because the presence <em>is</em> the treatment. A therapist working full-time can see roughly twenty-five to thirty patients per week. That is the ceiling. It is set by the biology of attention and the ethics of care, not by inefficiency.</p>
<h2 id="baumols-cost-disease">Baumol&rsquo;s Cost Disease</h2>
<p>In 1966, the economists William Baumol and William Bowen published <em>Performing Arts, The Economic Dilemma</em>, a study of why orchestras, theatre companies, and dance troupes were perpetually in financial crisis despite growing audiences and rising cultural prestige <a href="#ref-5">[5]</a>. Their diagnosis was precise. A string quartet requires four musicians and approximately forty minutes to perform Beethoven&rsquo;s Op. 131. This was true in 1826 and is true in 2026. The productivity of the quartet — measured in output per unit of labour input — has not increased. It cannot increase. The performance <em>is</em> the labour.</p>
<p>Meanwhile, the productivity of a textile worker, a steelworker, a software developer has increased by orders of magnitude. Wages in the productive sectors rise because productivity rises. Wages in the stagnant sectors must keep pace — not because musicians deserve parity as a matter of justice, though they may, but because if they do not keep pace, musicians will leave for sectors that pay more. The quartet must compete in the same labour market as the factory and the tech company.</p>
<p>The result: the relative cost of live performance rises without bound. Not because musicians got worse. Not because audiences stopped caring. But because everything else got cheaper, and the quartet cannot. Baumol later generalised the result beyond the performing arts to all services in which the labour itself constitutes the product: education, healthcare, legal services, and — centrally for our purposes — psychotherapy <a href="#ref-6">[6]</a>. A therapy session is a string quartet. The labour is the product. The productivity cannot increase. The cost, relative to the scalable economy, rises every time the scalable economy gets more productive. And vibe coding is a massive productivity shock to the scalable economy.</p>
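<p>The mechanism is simple enough to simulate. Here is a minimal sketch in Python — the 2% growth rate and the 60-year horizon are illustrative assumptions for the example, not figures from Baumol:</p>

```python
# A minimal sketch of Baumol's mechanism. Two sectors share one labour
# market: productivity grows in the scalable sector and stays flat in
# care work, while wages equalise across both. All numbers illustrative.

YEARS = 60
PRODUCTIVITY_GROWTH = 0.02  # annual, scalable sector only (assumed)

wage = 1.0                   # common wage level, set by the productive sector
productivity_scalable = 1.0  # output per hour, scalable sector
productivity_care = 1.0      # output per hour, care work: constant by assumption

for year in range(YEARS):
    productivity_scalable *= 1 + PRODUCTIVITY_GROWTH
    wage *= 1 + PRODUCTIVITY_GROWTH  # wages track scalable-sector productivity

# Unit cost = wage / productivity.
unit_cost_scalable = wage / productivity_scalable  # stays at 1.0
unit_cost_care = wage / productivity_care          # grows with the wage level

print(f"after {YEARS} years, relative cost of care work: "
      f"{unit_cost_care / unit_cost_scalable:.1f}x")
```

<p>Nothing about care work changed in this toy model — its cost rises purely because the other sector got more productive. That is the whole disease.</p>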
<h2 id="there-is-an-app-for-that">There Is an App for That</h2>
<p>In 2019, the German government passed the Digitales-Versorgung-Gesetz, creating a fast-track approval process for <em>Digitale Gesundheitsanwendungen</em> — digital health applications, or DiGA. The idea: apps that can be prescribed by a doctor and reimbursed by statutory health insurance, just like medication. A patient walks into a practice, receives a prescription code, downloads the app, and the Krankenkasse pays <a href="#ref-7">[7]</a>. As of mid-2025, the BfArM directory lists roughly 58 DiGA. Nearly half target psychiatric conditions — depression, anxiety, insomnia, burnout. Names like deprexis, HelloBetter, Selfapy. A patient who would wait 142 days for a therapist can get a DiGA prescribed the same afternoon.</p>
<p>The pricing structure deserves attention. In the first twelve months after listing, manufacturers set their own price. The average: €541 per prescription <a href="#ref-8">[8]</a>. Some exceeded €2,000. After the first year, negotiated prices drop to an average of roughly €226 — but by then, the insurance has already paid the introductory rate for every early adopter. Total statutory health insurance spending on DiGA since 2020: €234 million. That spending grew 71% between 2023 and 2024 <a href="#ref-9">[9]</a>. Here is the number that should sit next to that one. A single outpatient psychotherapy session costs the insurance system approximately €115. The €234 million spent on DiGA since 2020 could have funded over two million therapy sessions — enough for roughly 80,000 complete courses of 25-session treatment. And here is the evidence question. Only 12 of the 68 DiGA that have entered the directory demonstrated a proven positive care effect at the time of inclusion. The rest were listed provisionally, with twelve months to produce evidence. About one in six were subsequently delisted — removed from the directory because the evidence did not materialise <a href="#ref-10">[10]</a>.</p>
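<p>The back-of-envelope comparison above can be checked directly. The inputs are the post's own figures, not independent data:</p>

```python
# Sanity check of the comparison in the text, using the post's figures:
# €234M total DiGA spending, ~€115 per outpatient session, 25 sessions
# per "complete course". All euros; all approximate.

diga_spending = 234_000_000   # statutory insurance DiGA spending since 2020
cost_per_session = 115        # approximate cost of one psychotherapy session
sessions_per_course = 25      # assumed length of one complete treatment

sessions = diga_spending / cost_per_session
courses = sessions / sessions_per_course

print(f"{sessions:,.0f} sessions, or about {courses:,.0f} complete courses")
# → just over two million sessions, roughly 81,000 courses
```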
<p>I want to be precise about what I am and am not saying. Some DiGA have a real evidence base. Structured CBT exercises delivered digitally can produce measurable short-term symptom improvement — I reviewed the Woebot trial data in an <a href="/posts/ai-companion-loneliness-ironic-process/">earlier post on AI companions</a> and took those results seriously. A DiGA that delivers psychoeducation and behavioural activation exercises is a tool, and tools can be useful. But a tool and a therapeutic relationship are not the same product delivered through different channels. They are different products. The policy framework treats them as substitutable — the patient who cannot access a therapist receives an app instead. The substitution is not a clinical judgement. It is a structural inevitability: facing the impossibility of scaling therapy, the system reaches for the scalable alternative, because the scalable alternative is what the incentive structure rewards. This is not a corruption story. This is Baumol&rsquo;s cost disease expressed through health policy. The system is doing exactly what the theory predicts.</p>
<h2 id="the-fear-and-the-compliance">The Fear and the Compliance</h2>
<p>There is an irony at the centre of the current discourse about AI and work that I want to name, because I think it is underexamined. People are afraid of AI. Specifically, they are afraid it will take their jobs. The surveys confirm this consistently — Gallup, Pew, the European Commission&rsquo;s Eurobarometer — significant fractions of the working population in every developed country report anxiety about AI-driven job displacement.</p>
<p>And yet. The same people — not a different demographic, not a separate population, the <em>same people</em> — are enthusiastically using AI to do their work. They use language models to write their emails, their reports, their presentations. They vibe code tools for their teams. They let AI draft their strategy documents, summarise their meetings, compose their performance reviews. They celebrate the productivity gain. They post about it. This is not hypocrisy. It is something more interesting: a revealed preference for automation that contradicts a stated preference against it. The fear is about structural displacement — losing the <em>role</em>. The compliance is about local optimisation — doing the <em>task</em> more efficiently. No one wakes up and decides to automate themselves out of a job. They automate one task at a time, each automation locally sensible, until the job is a shell around an AI core. And all of this activity — the fear, the adoption, the discourse, the think pieces, the congressional hearings — is directed at automatable work. The kind of work where AI is a plausible substitute.</p>
<p>No one is afraid that AI will take the crisis counsellor&rsquo;s job. No one is vibe coding a replacement for a psychiatric nurse. The work that is collapsing is not collapsing because AI replaced it. It is collapsing because it was never scalable, never attracted the capital or the talent that scalable work attracts, and every productivity gain in the scalable sector makes the unscalable sector relatively more expensive and harder to staff. The discourse about AI and jobs is, in this sense, exactly backwards. The threat is not that AI will replace the work that matters most. The threat is that it will make the work that matters most <em>invisible</em> — by making everything else so cheap and fast and abundant that we forget the expensive, slow, irreducibly human work exists at all.</p>
<h2 id="the-political-arithmetic">The Political Arithmetic</h2>
<p>On March 11, 2026, the Erweiterter Bewertungsausschuss — the body that sets fee schedules for outpatient care in Germany — decided on a 4.5% flat cut to nearly all psychotherapeutic service fees, effective April 1 <a href="#ref-3">[3]</a>. The health insurers had originally demanded 10%. Germany spends €4.6 billion annually on outpatient psychotherapy — roughly 1.5% of total statutory health insurance expenditure. The fee cut applies to this budget. The average therapist surplus — what remains after practice costs — is approximately €52 per hour <a href="#ref-11">[11]</a>. The cut is not large in percentage terms. It is large in the context of a profession that is already among the lowest-paid in outpatient medicine. Nearly half a million people signed a petition against the cuts. There were protests in Berlin, Leipzig, Hanover, Hamburg, Stuttgart, Munich. The Kassenärztliche Bundesvereinigung filed a lawsuit. The Bundespsychotherapeutenkammer called the decision <em>skandalös</em> <a href="#ref-12">[12]</a>.</p>
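<p>For scale, the flat cut can be put in absolute terms using the figures above — a rough upper bound, since the cut covers "nearly all" rather than all fees:</p>

```python
# The fee cut in absolute terms, derived from the post's own figures
# (€4.6bn annual outpatient psychotherapy budget, 4.5% flat cut).
# An approximation: the cut applies to "nearly all" fees, not all.

annual_psychotherapy_budget = 4_600_000_000  # euros per year
cut_rate = 0.045

annual_saving = annual_psychotherapy_budget * cut_rate
print(f"≈ €{annual_saving / 1e6:.0f} million per year")  # ≈ €207 million
```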
<p>What makes this particularly striking is the sequence. The coalition agreement signed by CDU/CSU and SPD in May 2025 explicitly addresses mental health — securing psychotherapy training financing, needs-based planning for child and adolescent psychotherapy, crisis intervention rights for psychotherapists, and a suicide prevention law. The BPtK itself welcomed the agreement as giving mental health a <em>neuen Stellenwert</em>, a new significance <a href="#ref-13">[13]</a>. Less than a year later, the same government&rsquo;s arbitration body cuts psychotherapy fees by 4.5%. The stated commitment and the enacted policy point in opposite directions. This is not unusual in politics. What is unusual is that it maps so precisely onto Baumol&rsquo;s mechanism: the coalition agreement acknowledges the problem in language; the fee schedule acknowledges it in arithmetic. And the arithmetic wins, because the arithmetic always wins when the work does not scale. The <em>Bedarfsplanung</em>, the needs-based planning system that determines how many psychotherapy seats are approved per region, was partially reformed in 2019 after decades of operating on 1990s-era ratios. The reform added roughly 800 seats. The BPtK considers it still fundamentally inadequate <a href="#ref-14">[14]</a>.</p>
<p>The arithmetic is plain. DiGA spending: growing 71% year on year. Psychotherapy fees: cut by 4.5%. The direction is unambiguous. Invest in the scalable. Cut the unscalable. And the damage compounds in a way that the policy apparatus appears not to understand, or not to care about. A therapist who leaves the profession because €52 per hour is no longer viable does not return when the cut is reversed. The training pipeline for a new clinical psychologist runs six to eight years from university admission to licensure. Over forty thousand accredited psychotherapists serve the system today <a href="#ref-14">[14]</a>. Every one who leaves creates a gap measured in decades, not budget cycles. The Telefonseelsorge, staffed by volunteers and funded by the churches, is not a mental health system. It is what remains when the mental health system is not there. Treating it as a substitute — treating 7,700 volunteers as adequate coverage for a country of 84 million — is not a policy position. It is an admission that the actual policy has failed.</p>
<h2 id="the-uncomfortable-part">The Uncomfortable Part</h2>
<p>Here is where I should, by the conventions of the form, propose a solution. I should say something about funding, about training pipelines, about recognising care work as infrastructure rather than a cost centre.</p>
<p>I think those things are true. I think we should pay therapists more, not less. I think Baumol&rsquo;s cost disease means we should <em>expect</em> this to be expensive and fund it anyway, because the alternative — accepting that people in crisis will wait 142 days while the scalable economy celebrates another productivity milestone — is a failure of collective priorities so basic that it should be uncomfortable to state plainly. But I am also the person who automated his setlist workflow and was satisfied by the compression. I vibe code things. I use AI tools daily. I am inside the attention gradient, not observing it from above. The part of me that finds leverage intoxicating is the same part that writes this blog, and I do not think I am unusual in this.</p>
<p>The structural isomorphism is exact: Baumol&rsquo;s string quartet, the therapist&rsquo;s fifty minutes, the crisis counsellor&rsquo;s phone call at 3am. The labour is the product. The product does not scale. The cost rises. The talent flows elsewhere. And the policy, rather than resisting the gradient, follows it — funding apps, cutting fees, digitising what cannot be digitised without changing what it is. The layers app reaches 85,000 users. The therapy app is reimbursed within the week. The therapist is available in five months, if at all.</p>
<p>I do not have a clean resolution to offer. I have a diagnosis — Baumol&rsquo;s cost disease, applied to the attention economy of a civilisation that has discovered how to make scalable work almost free — and an observation: the political system is not counteracting the disease. It is accelerating it. The quartet still needs four musicians. The session still needs the therapist in the room. The phone still needs someone to answer it. Nothing we are building will change this. The question is whether we notice before the people who needed the answer stop calling.</p>
<hr>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Bundespsychotherapeutenkammer. <em>Psychisch Kranke warten 142 Tage auf eine psychotherapeutische Behandlung</em>. BPtK. <a href="https://www.bptk.de/pressemitteilungen/psychisch-kranke-warten-142-tage-auf-eine-psychotherapeutische-behandlung/">https://www.bptk.de/pressemitteilungen/psychisch-kranke-warten-142-tage-auf-eine-psychotherapeutische-behandlung/</a></p>
<p><span id="ref-2"></span>[2] Evangelisch-Lutherische Kirche in Norddeutschland (2025). <em>Finanzierung der Telefonseelsorge ist äußerst angespannt</em>. <a href="https://www.kirche-mv.de/nachrichten/2025/februar/finanzierung-der-telefonseelsorge-ist-aeusserst-angespannt">https://www.kirche-mv.de/nachrichten/2025/februar/finanzierung-der-telefonseelsorge-ist-aeusserst-angespannt</a></p>
<p><span id="ref-3"></span>[3] Kassenärztliche Bundesvereinigung (2026). <em>Paukenschlag: KBV klagt gegen massive Kürzungen psychotherapeutischer Leistungen</em>. <a href="https://www.kbv.de/presse/pressemitteilungen/2026/paukenschlag-kbv-klagt-gegen-massive-kuerzungen-psychotherapeutischer-leistungen">https://www.kbv.de/presse/pressemitteilungen/2026/paukenschlag-kbv-klagt-gegen-massive-kuerzungen-psychotherapeutischer-leistungen</a></p>
<p><span id="ref-4"></span>[4] Flückiger, C., Del Re, A. C., Wampold, B. E., &amp; Horvath, A. O. (2018). The alliance in adult psychotherapy: A meta-analytic synthesis. <em>Psychotherapy</em>, 55(4), 316–340. <a href="https://doi.org/10.1037/pst0000172">https://doi.org/10.1037/pst0000172</a></p>
<p><span id="ref-5"></span>[5] Baumol, W. J., &amp; Bowen, W. G. (1966). <em>Performing Arts, The Economic Dilemma: A Study of Problems Common to Theater, Opera, Music and Dance</em>. Twentieth Century Fund.</p>
<p><span id="ref-6"></span>[6] Baumol, W. J. (2012). <em>The Cost Disease: Why Computers Get Cheaper and Health Care Doesn&rsquo;t</em>. Yale University Press.</p>
<p><span id="ref-7"></span>[7] Bundesinstitut für Arzneimittel und Medizinprodukte. <em>DiGA-Verzeichnis</em>. <a href="https://diga.bfarm.de/de">https://diga.bfarm.de/de</a></p>
<p><span id="ref-8"></span>[8] GKV-Spitzenverband (2025). <em>Bericht des GKV-Spitzenverbandes über die Inanspruchnahme und Entwicklung der Versorgung mit Digitalen Gesundheitsanwendungen</em>. Reported in: MTR Consult. <a href="https://mtrconsult.com/news/gkv-report-utilization-and-development-digital-health-application-diga-care-germany">https://mtrconsult.com/news/gkv-report-utilization-and-development-digital-health-application-diga-care-germany</a></p>
<p><span id="ref-9"></span>[9] Heise Online (2025). <em>Insurers critique high costs and low benefits of prescription apps</em>. <a href="https://www.heise.de/en/news/Insurers-critique-high-costs-and-low-benefits-of-prescription-apps-10375339.html">https://www.heise.de/en/news/Insurers-critique-high-costs-and-low-benefits-of-prescription-apps-10375339.html</a></p>
<p><span id="ref-10"></span>[10] Goeldner, M., &amp; Gehder, S. (2024). Digital Health Applications (DiGAs) on a Fast Track: Insights From a Data-Driven Analysis of Prescribable Digital Therapeutics in Germany From 2020 to Mid-2024. <em>JMIR mHealth and uHealth</em>. <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC11393499/">https://pmc.ncbi.nlm.nih.gov/articles/PMC11393499/</a></p>
<p><span id="ref-11"></span>[11] Taz (2026). <em>Weniger Honorar für Psychotherapie</em>. <a href="https://taz.de/Weniger-Honorar-fuer-Psychotherapie/!6162806/">https://taz.de/Weniger-Honorar-fuer-Psychotherapie/!6162806/</a></p>
<p><span id="ref-12"></span>[12] Bundespsychotherapeutenkammer (2026). <em>Gemeinsam gegen die Kürzung psychotherapeutischer Leistungen</em>. <a href="https://www.bptk.de/pressemitteilungen/gemeinsam-gegen-die-kuerzung-psychotherapeutischer-leistungen/">https://www.bptk.de/pressemitteilungen/gemeinsam-gegen-die-kuerzung-psychotherapeutischer-leistungen/</a></p>
<p><span id="ref-13"></span>[13] Bundespsychotherapeutenkammer (2025). <em>Koalitionsvertrag gibt psychischer Gesundheit neuen Stellenwert</em>. <a href="https://www.bptk.de/pressemitteilungen/koalitionsvertrag-gibt-psychischer-gesundheit-neuen-stellenwert/">https://www.bptk.de/pressemitteilungen/koalitionsvertrag-gibt-psychischer-gesundheit-neuen-stellenwert/</a></p>
<p><span id="ref-14"></span>[14] Bundespsychotherapeutenkammer. <em>Reform der Bedarfsplanung</em>. <a href="https://www.bptk.de/ratgeber/reform-der-bedarfsplanung/">https://www.bptk.de/ratgeber/reform-der-bedarfsplanung/</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>Oppenheimer Didn&#39;t Have an Acceptable Use Policy</title>
      <link>https://sebastianspicker.github.io/posts/ai-warfare-anthropic-atom-bomb/</link>
      <pubDate>Tue, 03 Mar 2026 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/ai-warfare-anthropic-atom-bomb/</guid>
      <description>Anthropic has drawn a public line on military use of its models. The physics community spent the better part of the twentieth century working out what it means to draw that line after you have already built the thing. As a physicist watching this unfold, I find the parallels clarifying and the differences more unsettling than the parallels.</description>
<content:encoded><![CDATA[<p><em>Physicists inherit, along with the formalism and the problem sets, a particular
kind of guilt. The profession has been working
through its relationship to weapons, state violence, and the gap between
scientific capability and ethical readiness since August 1945. This post is about
why I think the current moment in AI closely resembles that history, and why
Anthropic&rsquo;s decision to draw a line matters even if — especially if — you think
the line is imperfect.</em></p>
<hr>
<h2 id="what-just-happened">What Just Happened</h2>
<p>The news this week involves Anthropic and the question of whether and how large
language models should be available for military applications. Anthropic has stepped
back from a path toward unrestricted military use and restated a position: there
are things their models will not be used for, weapons development and autonomous
lethal systems among them. The response from parts of the defence and national
security community has been predictable — naïve, idealistic, unilateral disarmament,
your adversaries will not make the same choice.</p>
<p>These are not stupid objections. I want to take them seriously. But I also want
to explain why, as someone who spent years studying physics in the shadow of the
Manhattan Project&rsquo;s legacy, the framing of those objections sounds very familiar,
and why that familiarity is not reassuring.</p>
<hr>
<h2 id="what-the-physicists-thought-they-were-doing">What the Physicists Thought They Were Doing</h2>
<p>The scientists who built the atomic bomb were not, for the most part, indifferent
to what they were building. Many of them were refugees from European fascism.
They understood what a Nazi atomic weapon would mean. The urgency was real, the
moral reasoning was coherent, and the conclusion — build it before the other side
does — followed from the premises.</p>
<p>What the premises did not include was adequate weight for what happens after the
technical problem is solved.</p>
<p>By the time the Trinity test succeeded in July 1945, Germany had already
surrendered. The original justification — prevent the Nazis from getting there
first — had evaporated. What remained was a weapon, an infrastructure for building
more weapons, and a strategic and political logic that had largely moved beyond
the scientists&rsquo; control. The Franck Report, written by a group of Manhattan Project
scientists in June 1945, argued against using the bomb on a Japanese city without
prior demonstration. It was ignored. Oppenheimer, who chaired the Interim
Committee&rsquo;s scientific panel, signed off on the Hiroshima target recommendation.
He spent the rest of his life with that.</p>
<p>The lesson most physics students absorb from this history is something like: the
scientists were not the decision-makers, the decision was going to be made anyway,
and the presence of principled scientists in the room was better than their absence.
The system was going to do what it was going to do; all you could influence was
the margin.</p>
<p>I believed this for a long time. I am less sure of it now.</p>
<hr>
<h2 id="the-analogy-and-its-limits">The Analogy and Its Limits</h2>
<p>The comparison between the atom bomb and artificial general intelligence — or even
current large language models at the capability frontier — is made often enough
that it has become a cliché, which is usually the point at which people stop
thinking carefully about it. Let me try to be specific about where the analogy
holds and where it breaks.</p>
<p><strong>Where it holds:</strong></p>
<p>The core structural similarity is this: a small number of researchers, working
at the frontier of a capability that most people do not understand, are making
decisions that will constrain or enable uses they cannot fully anticipate, in
contexts they will not control. The physics community in 1942 had a clearer view
of what fission could do than any political or military decision-maker. The AI
research community in 2026 has a clearer view of what large language models can
do — and of what more capable successors will do — than most of the people who
will deploy them.</p>
<p>That epistemic position is not morally neutral. Knowing more than the decision-makers
does not mean you have unlimited responsibility, but it does mean you have more
responsibility than someone who does not know. Feigning ignorance about downstream
applications is not available to you.</p>
<p>The second similarity: once the capability exists and is demonstrated, the
normative landscape changes. Before Trinity, the question of whether to build nuclear
weapons was still open. After Trinity, it was no longer open in the same way — the
knowledge existed, the infrastructure existed, the geopolitical expectations had
already been set. The arms race was not caused by the bomb, but the bomb&rsquo;s existence
changed what the arms race meant and how fast it moved. We are somewhere in the
vicinity of that transition with frontier AI systems. The question of whether to
build them is still formally open for any given company or research group, but the
landscape is already different from what it was five years ago.</p>
<p><strong>Where it breaks:</strong></p>
<p>The atom bomb was a single-purpose physical object whose primary function was destroying
things. Large language models are general-purpose cognitive tools with a very wide
range of applications, the majority of which are not weapons-relevant. This matters
because it changes the policy space. You could, in principle, have not built the
atom bomb. You cannot, in principle, not build language models while still having
language models for medicine, education, scientific research, and the other
applications that are clearly beneficial. The dual-use problem for AI is more
severe, not less severe, than it was for physics.</p>
<p>The other important difference: the Manhattan Project was conducted in secret, under
wartime conditions, with a relatively well-defined adversarial structure. The current
AI landscape involves many organisations, many countries, public publication of
research, and no clear equivalent of the Axis/Allied framing. The game theory
of &ldquo;if we don&rsquo;t do it, they will&rdquo; is more complicated when &ldquo;they&rdquo; is not a single
identifiable adversary with symmetric interests.</p>
<hr>
<h2 id="what-anthropics-line-actually-says">What Anthropic&rsquo;s Line Actually Says</h2>
<p>Setting aside for a moment whether the line is in the right place, there is something
worth examining in the act of drawing it at all.</p>
<p>The standard criticism — that a unilateral ethical commitment in a competitive
field simply advantages less scrupulous actors — assumes that ethical commitments
are pure costs with no countervailing benefits. This is the argument the weapons
lobby has made about every arms control proposal in the history of arms control,
and it has sometimes been right. Unilateral disarmament without reciprocal
commitments can leave you worse off. This is not a trivial point.</p>
<p>But it smuggles in an assumption that deserves scrutiny: that the relevant
competition is primarily between AI companies, and that the only variable that
matters is relative capability. If you accept that framing, then any ethical
constraint is a handicap and the only rational strategy is to develop as fast as
possible with as few restrictions as possible.</p>
<p>That framing has a name in physics. It is called the arms race equilibrium, and
the physics community spent thirty years understanding what it produces. It produces
capability accumulation without a corresponding development of the normative
frameworks, institutional safeguards, and mutual verification mechanisms that
make the capability survivable. It produces Hiroshima, then the hydrogen bomb,
then MIRV, then the point at which the accumulated arsenal is large enough to
end complex life on Earth several times over, at which point you negotiate the
first real arms limitation treaties — from a starting position of vastly more
deployed capability than anyone needed and vastly less trust than anyone wanted.</p>
<p>The question Anthropic is implicitly asking is whether there is a path that does
not look like that. The answer is not obvious. But I think it is worth asking.</p>
<hr>
<h2 id="what-the-physicists-should-have-done">What the Physicists Should Have Done</h2>
<p>Here is the counterfactual that haunts the Manhattan Project&rsquo;s legacy: what if
the scientific community had treated the ethics of the bomb as seriously as the
physics, from the beginning?</p>
<p>Not naïvely. Not by refusing to work on it and ceding the possibility of influencing
it. But by making the ethical analysis parallel to the technical analysis, by
treating the question of use as a scientific question with as much rigour as the
question of yield, and by using the epistemic authority that came from being the
people who understood the capability to push, hard, for the normative frameworks
that did not yet exist.</p>
<p>Some scientists did this. Szilard circulated a petition, signed by 70 Manhattan
Project scientists, against the use of the bomb on Japanese cities without prior
warning. It did not work. But the effort was real, and the record of the effort
matters — both as evidence that the scientific community was not unanimous in its
acquiescence and as a model for what engaged dissent looks like from inside a
project that is going to proceed regardless.</p>
<p>What most scientists did not do, and what the profession largely did not do in the
decades that followed, was treat the ethical work as primary. Physics built its
identity around the technical capability — the extraordinary achievement of
understanding nature at the deepest level — and treated the ethical consequences
as someone else&rsquo;s department. The bomb was the military&rsquo;s problem. The cold war was
the politicians&rsquo; problem. The physicists kept doing physics.</p>
<p>This was comfortable and it was wrong.</p>
<hr>
<h2 id="what-i-want-from-ai-researchers">What I Want From AI Researchers</h2>
<p>I want AI researchers to do what the physicists did not, and to do it now, while
the critical decisions are still open.</p>
<p>Anthropic drawing a line is one version of this. It is imperfect — the line is
in a particular place, the enforcement mechanisms are limited, the competitive
dynamics are real. But it is a claim that the people who built the capability
have ongoing responsibility for how it is used, and that some uses are outside
the bounds of what should happen regardless of what is technically possible.</p>
<p>That claim is not naïve. It is, in fact, the claim the Franck Report was making
in 1945: that capability does not determine use, that scientists have a voice in
the normative question, and that using that voice is part of the job rather than
a distraction from it.</p>
<p>What I want beyond that is for the AI research community to treat the ethics
as primary rather than as footnotes. Not ethics review boards that approve research
post hoc. Not responsible AI teams that are consulted after the capability has
been developed. A genuine integration of the normative analysis into the research
process itself — asking, at each stage, what this capability makes possible and
who benefits from that possibility and who pays the cost.</p>
<p>The physics community got to August 1945 before it had that conversation in earnest.
The conversation has been going on ever since, and it has produced important
institutional frameworks — the Bulletin of the Atomic Scientists, the arms control
treaties, the export control regimes, the norms against first use. These things
matter. But they were built in reaction to a capability that had already been
deployed, and the shape of everything that followed was constrained by that
starting point.</p>
<p>The AI community is not there yet. The starting point is still being established.
That is what makes this moment consequential, and what makes Anthropic&rsquo;s line —
wherever exactly it is drawn — worth defending as an act of principle rather than
dismissing as an act of commercial positioning.</p>
<hr>
<h2 id="a-note-on-the-of-our-time-framing">A Note on the &ldquo;Of Our Time&rdquo; Framing</h2>
<p>I am aware that comparisons to the atom bomb are sometimes used to generate
unwarranted urgency, to short-circuit careful reasoning by invoking the most
extreme case. I want to be clear about what I am and am not claiming.</p>
<p>I am not claiming that current large language models are as immediately dangerous
as nuclear weapons. They are not.</p>
<p>I am claiming that the structural situation — researchers at the capability
frontier, ahead of the policy frameworks, making decisions that will constrain
future options, in a competitive environment with adversarial dynamics — is
similar enough that the lessons of the Manhattan Project period are directly
relevant. Not as prophecy. As a guide to the kind of mistakes that are available
to make.</p>
<p>The physicists had plenty of warning. Szilard had been worried since 1933.
Einstein wrote to Roosevelt in 1939. The Franck Report was written before
Hiroshima. The warnings were on the record. What was not on the record was
a scientific community that treated those warnings as actionable constraints
on its own behaviour rather than as advisories for policymakers.</p>
<p>That is the thing I want to be different this time.</p>
<hr>
<h2 id="references">References</h2>
<p>Franck, J. et al. (1945). <em>Report of the Committee on Political and Social Problems
(The Franck Report).</em> National Archives, Record Group 77.</p>
<p>Oppenheimer, J. R. (1965). Interview on <em>The Decision to Drop the Bomb</em> (NBC
documentary). Recorded 1965.</p>
<p>Rhodes, R. (1986). <em>The Making of the Atomic
Bomb.</em> Simon &amp; Schuster.</p>
<p>Russell, B., &amp; Einstein, A. (1955). <em>The Russell–Einstein Manifesto.</em>
Pugwash Conferences on Science and World Affairs.</p>
<p>Szilard, L. (1945). <em>A Petition to the President of the United States.</em> July 17,
1945. Available via the Atomic Heritage Foundation.</p>
<p>Bulletin of the Atomic Scientists (1945–present). <em>Doomsday Clock statements.</em>
<a href="https://thebulletin.org/doomsday-clock/">https://thebulletin.org/doomsday-clock/</a></p>
]]></content:encoded>
    </item>
  </channel>
</rss>
