More Context Is Not Always Better
The intuition that feeding a language model more information improves its outputs is wrong often enough to matter. Here is why, and what to do about it.
AI detectors flag the US Constitution as machine-generated. They also flag technical papers, legal prose, and — with striking consistency — writing produced by autistic minds and physics-trained ones. The error is not in the measurement. It is in the baseline assumption: that systematic, precise writing is inhuman.
A new video went viral last week: the same question, “should I drive to the car wash?”, and a different wrong answer. This time the AI said to walk instead. This is neither the tokenisation failure from the strawberry post nor the grounding failure from the rainy-day post. It is a pragmatic inference failure: the model understood all the words and (probably) had the right world state, but it answered the wrong interpretation of the question. This is a third and more subtle failure mode, with Grice as the theoretical handle.
A viral video this month showed an AI assistant confidently answering “should I go to the car wash today?” without knowing it was raining outside. The internet found it funny. The failure mode is real but distinct from the strawberry counting problem — this is not a representation issue, it is a grounding issue. The model understood the question perfectly. What it lacked was access to the state of the world the question was about.
In late 2025, agentic coding tools went from impressive demos to daily infrastructure. The problem nobody talked about enough: when an LLM agent has write access to a codebase and no formal constraints, reproducibility breaks down. The Ralph Loop is a deterministic, story-driven execution framework that addresses this — one tool call per story, scoped writes, atomic state. A design rationale with a formal sketch of why the constraints matter.
Everyone has a Downloads folder full of “scan0023.pdf” and “document(3)-final-FINAL.pdf”. Renaming them by content sounds trivial — read the file, understand what it is, give it a name. The implementation reveals something useful about how LLMs actually handle text: what a token is, why context windows matter in practice, why you want structured output instead of prose, and why heuristics should go first. The repository is at github.com/sebastianspicker/AI-PDF-Renamer.
On 2 December 2024 I gave three workshops at HfMT Köln’s Thementag on AI and music education. The handouts covered data protection, AI tools for students, and AI in teaching. This post is the argument behind them — focused on the curriculum question that none of the tools answer on their own: what should change, and what should not?
In September 2024, OpenAI revealed that its new o1 model had been code-named “Strawberry” internally — the same word that language models have famously been unable to count letters in. The irony was too perfect to pass up. But the counting failure is not a sign that LLMs are naive or broken. It is a precise, informative symptom of how they process text. Here is the actual explanation, with a minimum of hand-waving.