<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Ai-Detection on Sebastian Spicker</title>
    <link>https://sebastianspicker.github.io/tags/ai-detection/</link>
    <description>Recent content in Ai-Detection on Sebastian Spicker</description>
    <image>
      <title>Sebastian Spicker</title>
      <url>https://sebastianspicker.github.io/og-image.png</url>
      <link>https://sebastianspicker.github.io/og-image.png</link>
    </image>
    <generator>Hugo -- 0.160.0</generator>
    <language>en</language>
    <lastBuildDate>Wed, 18 Feb 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://sebastianspicker.github.io/tags/ai-detection/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>If You Think This Is Written by AI, You Are Both Right and Wrong</title>
      <link>https://sebastianspicker.github.io/posts/ai-detectors-systematic-minds/</link>
      <pubDate>Wed, 18 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/ai-detectors-systematic-minds/</guid>
      <description>AI detectors flag the US Constitution as machine-generated. They also flag technical papers, legal prose, and — with striking consistency — writing produced by autistic minds and physics-trained ones. The error is not in the measurement. It is in the baseline assumption: that systematic, precise writing is inhuman.</description>
      <content:encoded><![CDATA[<p>I use AI tools in my writing. This post, like several others on this blog,
was written with LLM assistance — research, structure, drafting,
revision. If you run any of these posts through an AI writing detector, you
will likely receive a high probability-of-AI score. The detector will be
picking up something real.</p>
<p>It will also be wrong about what that means.</p>
<hr>
<h2 id="the-constitution-problem">The Constitution Problem</h2>
<p>In 2023, as universities began deploying AI detection tools at scale,
educators started testing them on texts that were definitively not
AI-generated. The results were instructive. The United States Constitution
received high AI-probability scores from multiple commercial detectors.
GPTZero returned a rating of 92% likely AI-written. The Federalist Papers
fared similarly. So did sections of the King James Bible and Kant&rsquo;s <em>Critique
of Pure Reason</em>. Historical documents, written by humans, for human purposes,
in an era when no AI existed — flagged as machine-generated.</p>
<p>This was not an isolated edge case. It was consistent across tools and across
documents. And while it was widely reported as evidence that the detectors
were broken, there is a more precise reading available: the detectors were
working correctly, and we had misunderstood what they were measuring.</p>
<hr>
<h2 id="what-the-detectors-actually-measure">What the Detectors Actually Measure</h2>
<p>Most commercial AI detectors — GPTZero, Turnitin&rsquo;s detection layer,
Copyleaks — use some combination of two statistical signals.</p>
<p><strong>Perplexity.</strong> A language model assigns a probability to each token given
the preceding tokens. Low perplexity means the text was, token by token,
what the model expected — it sits close to the centre of the probability
distribution. AI-generated text tends to have low perplexity because that
is precisely what generation does: it samples from the high-probability
region of the distribution <a href="#ref-1">[1]</a>. Human text, on average,
has higher perplexity, because humans write for specific contexts with
idiosyncratic word choices, rhetorical effects that require the unexpected,
and the accumulated noise of composing for a real reader.</p>
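<p>A minimal sketch of the perplexity calculation, assuming a scoring model has already assigned a probability to each token. The probability lists are invented for illustration and the function name <code>perplexity</code> is hypothetical; commercial detectors take these numbers from a real language model, and their exact pipelines may differ.</p>

```python
import math


def perplexity(token_probs):
    """Perplexity is exp of the mean negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_neg_log)


# Hypothetical per-token probabilities (illustrative values, not measured).
predictable = [0.9, 0.8, 0.9, 0.85]  # each token was close to what the model expected
surprising = [0.2, 0.05, 0.3, 0.1]   # idiosyncratic, unexpected word choices

print("predictable:", round(perplexity(predictable), 2))
print("surprising: ", round(perplexity(surprising), 2))
```

<p>The predictable sequence comes out with the lower perplexity, and a detector built on this signal would treat it as the more machine-like of the two.</p>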
<p><strong>Burstiness.</strong> A term popularised for AI detection by Edward Tian, GPTZero&rsquo;s creator: human
writing has high burstiness — sentence lengths vary widely, vocabulary
density shifts, complex constructions alternate with simple ones. AI writing
is more uniform. The statistical distribution of sentence lengths in LLM
output is narrower than in most human prose <a href="#ref-2">[2]</a>.</p>
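<p>One way to put a number on burstiness, sketched under the assumption that variation in sentence length is a reasonable proxy. The coefficient of variation used here is a choice made for this illustration, not GPTZero&rsquo;s published formula, and the naive sentence splitter is a simplification.</p>

```python
import re
import statistics


def sentence_lengths(text):
    """Split on ., ! or ? and return the word count of each non-empty sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text):
    """Coefficient of variation of sentence length: std dev divided by mean."""
    lengths = sentence_lengths(text)
    if not lengths:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


uniform = "The signal was measured. The noise was modelled. The result was recorded."
varied = ("Stop. I did not expect the detector to flag a document "
          "written in 1787 as machine output. Odd.")

print("uniform:", round(burstiness(uniform), 2))
print("varied: ", round(burstiness(varied), 2))
```

<p>The three parallel sentences score zero because they all have the same length. The second sample, which mixes one-word sentences with a long one, scores much higher.</p>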
<p>The underlying assumption these tools share: human writing is variable,
contextually messy, idiosyncratic. AI writing is smooth and predictable.</p>
<p>This is accurate for a large class of human writing — casual prose, personal
essays, social media, student writing in informal registers. It is wrong
about a different and well-defined class of human writing. The Constitution
sits in that class. So does a lot of other text.</p>
<hr>
<h2 id="the-systemising-brain">The Systemising Brain</h2>
<p>Simon Baron-Cohen&rsquo;s empathising–systemising (E-S) theory distinguishes two
cognitive orientations. Empathising involves attending to social and emotional
cues, inferring mental states, navigating the pragmatic, implicit layer of
communication — what is meant rather than what is said. Systemising involves
attending to rules, patterns, and underlying regularities — the drive to
understand how things work and to represent them in explicit, transferable,
internally consistent terms <a href="#ref-3">[3]</a>.</p>
<p>Both orientations are distributed across the human population. They are not
exclusive, and neither is pathological. But autism spectrum conditions are
robustly associated with high systemising and relatively lower empathising —
not because autistic people lack emotions or care about others, but because
the cognitive mode that comes naturally to them is one of rules, structures,
and explicit representation rather than social inference and pragmatic
implication. The intense world theory <a href="#ref-4">[4]</a> adds a
complementary perspective: autistic brains may be characterised by
hyper-reactivity and hyper-plasticity, with pattern-seeking and systemising
serving partly as a way of making a too-intense world navigable. The
systematicity is not a deficit. It is an adaptation.</p>
<p>This has direct consequences for writing.</p>
<p>High-systemising writing tends toward:</p>
<ul>
<li>
<p><strong>Consistent vocabulary.</strong> The same term is used for the same concept
throughout, because substituting a synonym introduces ambiguity about
whether the referent is actually the same. Neurotypical writing freely
uses synonyms for stylistic variety; systemising writing resists this
on principle.</p>
</li>
<li>
<p><strong>Explicit logical structure.</strong> Claims are supported by stated reasons
rather than left to pragmatic inference. If there are three conditions,
all three are named. Nothing is &ldquo;needless to say.&rdquo;</p>
</li>
<li>
<p><strong>Low social hedging.</strong> Phrases like &ldquo;as everyone knows&rdquo; or &ldquo;obviously&rdquo;
are avoided, because they perform social alignment rather than convey
information — and they depend on shared assumptions the writer is not
confident are actually shared. (This connects to a point I made in the
<a href="/posts/car-wash-walk/">car-wash-walk post</a> about Gricean pragmatics:
autistic communication often violates the maxim of quantity in the
direction of over-informing, because nothing is assumed implicit.)</p>
</li>
<li>
<p><strong>Grammatical parallelism.</strong> Parallel logical content takes parallel
grammatical form. This is not stylistic affectation; it is a natural
consequence of representing structure explicitly.</p>
</li>
<li>
<p><strong>Minimal rhetorical noise.</strong> The prose does not meander, warm up, or
perform relatability. It states what needs to be stated.</p>
</li>
</ul>
<p>Now run text with these properties through an AI detector. Consistent
vocabulary reads as low lexical diversity. Explicit structure reads as low
burstiness. Minimal rhetorical noise reads as smooth, generated output. The
detector is measuring these properties accurately. The attribution to machine
generation is where it goes wrong.</p>
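<p>To make the lexical-diversity point concrete, here is a small sketch using the type-token ratio (distinct words divided by total words). The two sample passages are invented for this example, and the type-token ratio is only one simple diversity measure; it is an assumption that any particular detector uses it.</p>

```python
import re


def type_token_ratio(text):
    """Distinct words divided by total words; lower means more repeated vocabulary."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return len(set(words)) / len(words)


# Same content twice: once with one term per concept, once with synonyms.
consistent = ("The qubit stores the state. The qubit loses the state. "
              "The qubit recovers the state.")
synonyms = ("The qubit stores a state. That register loses its value. "
            "Our device recovers this information.")

print("consistent:", round(type_token_ratio(consistent), 2))
print("synonyms:  ", round(type_token_ratio(synonyms), 2))
```

<p>The passage that keeps one term per concept has the lower ratio. The measurement is accurate; it says nothing about who or what produced the text.</p>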
<p>Liang et al. <a href="#ref-5">[5]</a> demonstrated a closely related failure empirically: AI
detectors are significantly more likely to flag writing by non-native English
speakers as AI-generated. Non-native writers at advanced levels of formal
English tend to write more carefully, more consistently, and more in
accordance with explicit grammar rules — because they learned the language
as a system of explicit rules rather than acquiring it through immersive
social exposure. More systematic writing: higher AI probability score. The
mechanism is the same. The population is different.</p>
<hr>
<h2 id="the-physicist-brain">The Physicist Brain</h2>
<p>Physics writing has its own conventions, independently developed but pointing
in the same direction.</p>
<p>Scientific prose requires defined terms used consistently: in a paper about
quantum error correction, &ldquo;logical qubit,&rdquo; &ldquo;physical qubit,&rdquo; and &ldquo;syndrome&rdquo;
each mean exactly one thing, used identically in section 2 and section 5.
It requires explicit assumptions: &ldquo;We assume the noise is Markovian.&rdquo; &ldquo;In
the limit of large N.&rdquo; These are not vague hedges; they are precise
statements about the domain of validity of the results. It requires logical
derivation over rhetorical persuasion: the connectives are &ldquo;since,&rdquo;
&ldquo;therefore,&rdquo; &ldquo;it follows that&rdquo; — explicit logical operators, not narrative
bridges. And the passive construction of &ldquo;the signal was measured&rdquo; rather
than &ldquo;I measured the signal&rdquo; removes the individual from the result,
because the result should be reproducible regardless of who performs the
measurement.</p>
<p>The outcome is prose that is systematic, consistent, and structurally
predictable. From the outside — and from the vantage point of an AI
detector — it looks machine-generated.</p>
<p>Paul Dirac is the physicist who comes to mind first here. His 1928 paper
deriving the relativistic wave equation for the electron contains almost no
rhetorical apparatus. Motivation, equation, consequence: each stated once,
clearly, with no warm-up and no elaboration beyond what the argument
requires. It is not warm. It is not discursive. It is beautiful in the way
that a proof is beautiful: every element earns its place. Run it through
GPTZero and see what you get.</p>
<p>This connection between the physicist&rsquo;s prose style and the autistic cognitive
mode is not accidental. Baron-Cohen et al. <a href="#ref-6">[6]</a> surveyed Cambridge students
by academic discipline and found that physical scientists and mathematicians
scored consistently higher on the Autism-Spectrum Quotient (AQ) than humanities
students and controls, with mathematicians scoring highest of all. The
systemising orientation associated with autism spectrum conditions is also
associated with, and presumably selected for in, quantitative scientific
disciplines. The physicist&rsquo;s prose reflects this. So does the writing of a
high-systemising person who has never studied physics.</p>
<p>The categories overlap without being identical. What they share is a
cognitive preference for explicit structure, consistent vocabulary, and
logical transparency over social performance and rhetorical persuasion. The
writing that emerges from that preference looks, to an AI detector, like it
was generated by a machine.</p>
<p>It was not.</p>
<hr>
<h2 id="the-category-error">The Category Error</h2>
<p>The error AI detectors make is not a measurement error. It is a category
error.</p>
<p>They are trained to distinguish two things: output generated by a
contemporary LLM, and a specific subset of human writing — typically casual,
personal, or student prose collected from online sources. When they encounter
text outside either of those training categories — systematic and precise but
human-generated — the classifier has no good option. The text does not match
the &ldquo;AI&rdquo; training data exactly, and it does not match the &ldquo;human&rdquo; baseline
either. It gets assigned to the bin it fits least badly.</p>
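<p>The forced choice can be sketched with a toy nearest-centroid classifier over two features. The centroids and sample values below are hypothetical numbers chosen for illustration, and real detectors are considerably more elaborate; the point is only that a two-bin classifier has no way to answer &ldquo;neither&rdquo;.</p>

```python
import math

# Hypothetical class centroids in (perplexity, burstiness) space.
CENTROIDS = {
    "ai": (8.0, 0.3),
    "human": (40.0, 0.9),
}


def classify(features):
    """Return the label of the nearest centroid; there is no 'neither' option."""
    return min(CENTROIDS, key=lambda label: math.dist(features, CENTROIDS[label]))


# Illustrative samples (values are assumptions, not measurements).
samples = {
    "casual essay": (42.0, 1.0),
    "llm output": (7.0, 0.25),
    "legal drafting": (15.0, 0.35),  # human-written, but systematic and precise
}

for name, features in samples.items():
    print(name, "->", classify(features))
```

<p>The legal-drafting sample matches neither centroid well, yet it still receives the label &ldquo;ai&rdquo;, because that centroid is the closer of the two available bins.</p>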
<p>What is happening when the Constitution is flagged: it is systematic,
definitional, prescriptive, and internally consistent. It was written by
lawyers and statesmen who understood that ambiguity in foundational documents
creates legal chaos. They wrote to be unambiguous. The result is text with
low perplexity and low burstiness — the statistical signature the detector
associates with AI.</p>
<p>GPTZero&rsquo;s creator Edward Tian acknowledged this problem when it was reported:
the Constitution appears so frequently in LLM training data that it registers
as &ldquo;already known&rdquo; to the model, which artificially lowers its perplexity
score. That is a real and specific issue. But it is secondary. The deeper
issue is that the Constitution would score low-perplexity even without the
training-data contamination effect, because systematic, definitional prose
is intrinsically low-perplexity. Precise language is predictable language.
That is partly the point of precise language.</p>
<p>The baseline assumption — that human writing is variable and idiosyncratic —
holds for much human writing. It does not hold for legal drafting, technical
documentation, scientific papers, sacred and historical texts written to be
durable and precise, writing by people with high systemising orientation, or
writing by non-native speakers at formal registers. That is not a small
population of edge cases. It is a substantial fraction of all written
material that exists.</p>
<hr>
<h2 id="right-and-wrong-at-the-same-time">Right and Wrong at the Same Time</h2>
<p>So: if you think these posts are AI-generated, you are right and wrong at
the same time.</p>
<p>Right, in two ways. First: yes, I use AI tools. LLM assistance is part of
my writing process — not an occasional aid, but a regular part of how
research notes and half-formed arguments become structured posts. Second:
the writing style of these posts is systematic and precise in ways that
detectors register as machine-generated. That systematicity is real, and
if a detector picks it up, it is measuring something.</p>
<p>Wrong, also in two ways. First: the ideas, judgments, and connections in
these posts are mine. The decisions about what to include and what to leave
out, which papers to cite and how to frame their implications, where the
interesting tension lies between neurodiversity research and the assumptions
baked into AI detection tools — those are not outputs of a language model
working in isolation. They are the product of someone who works at the
intersection of these fields and has thought about them for a while. An LLM
cannot generate these posts without a human who has already decided what
to say.</p>
<p>Second, and more important for the argument here: the systematic, precise
character of this writing is not evidence of machine generation. It is a
cognitive signature — one associated with physics training, with high
systemising orientation, and with the overlap between the two, which I have
<a href="/posts/inner-echo/">written about elsewhere</a> in the context of
neurodiversity more broadly.</p>
<p>The detector is measuring a real property of the text. It is misattributing
the origin of that property.</p>
<p>The interesting question this opens is not &ldquo;did AI write this?&rdquo; That question
is increasingly poorly posed in an era where thinking and writing are already
deeply entangled with machine assistance, in ways that differ sharply from
person to person and task to task. The better question is: <em>whose judgment
is in the text?</em> Whose choices about what to include, what to connect, what
to leave out?</p>
<p>The systematicity in this writing is mine. The recognition that AI detectors
systematically disadvantage autistic writers, physicist writers, and
non-native speakers is a judgment I made, not one a language model was
prompted to produce. The connection to the Constitution — a document written
to be maximally unambiguous, flagged as maximally AI-like — is a connection
I found worth drawing.</p>
<p>Whether that makes this text &ldquo;human&rdquo; is a philosophical question I am happy
to leave open. What it is not is AI hallucination.</p>
<hr>
<h2 id="references">References</h2>
<p><span id="ref-1"></span>[1] Mitchell, E., Lee, Y., Khazatsky, A., Manning, C. D., &amp; Finn, C. (2023). DetectGPT: Zero-shot machine-generated text detection using probability curvature. <em>Proceedings of the 40th International Conference on Machine Learning (ICML 2023)</em>. <a href="https://arxiv.org/abs/2301.11305">https://arxiv.org/abs/2301.11305</a></p>
<p><span id="ref-2"></span>[2] Gehrmann, S., Strobelt, H., &amp; Rush, A. M. (2019). GLTR: Statistical detection and visualization of generated text. <em>Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</em>, 111–116. <a href="https://doi.org/10.18653/v1/P19-3019">https://doi.org/10.18653/v1/P19-3019</a></p>
<p><span id="ref-3"></span>[3] Baron-Cohen, S. (2009). Autism: The empathising–systemising (E-S) theory. <em>Annals of the New York Academy of Sciences</em>, 1156(1), 68–80. <a href="https://doi.org/10.1111/j.1749-6632.2009.04467.x">https://doi.org/10.1111/j.1749-6632.2009.04467.x</a></p>
<p><span id="ref-4"></span>[4] Markram, K., &amp; Markram, H. (2010). The intense world theory — a unifying theory of the neurobiology of autism. <em>Frontiers in Human Neuroscience</em>, 4, 224. <a href="https://doi.org/10.3389/fnhum.2010.00224">https://doi.org/10.3389/fnhum.2010.00224</a></p>
<p><span id="ref-5"></span>[5] Liang, W., Yuksekgonul, M., Mao, Y., Wu, E., &amp; Zou, J. (2023). GPT detectors are biased against non-native English writers. <em>Patterns</em>, 4(7), 100779. <a href="https://doi.org/10.1016/j.patter.2023.100779">https://doi.org/10.1016/j.patter.2023.100779</a></p>
<p><span id="ref-6"></span>[6] Baron-Cohen, S., Wheelwright, S., Skinner, R., Martin, J., &amp; Clubley, E. (2001). The autism-spectrum quotient (AQ): Evidence from Asperger syndrome/high-functioning autism, males and females, scientists and mathematicians. <em>Journal of Autism and Developmental Disorders</em>, 31(1), 5–17. <a href="https://doi.org/10.1023/A:1005653411471">https://doi.org/10.1023/A:1005653411471</a></p>
]]></content:encoded>
    </item>
  </channel>
</rss>
