Music on Sebastian Spicker

Automate the Boring Stuff: Setlist to Playlist

Tue, 10 Feb 2026 00:00:00 +0000

Saturday was the Deftones at the Westfalenhalle in Dortmund. One of those concerts where the setlist is part of the experience — where you register, with something close to physical relief, that the arc landed exactly right, and you spend the Uber home mentally replaying the order.

Sunday I built a playlist from it. It took about forty minutes.

This is the post about why that number is already too low, and also possibly too high.

The Ritual

There is a specific kind of concert listening that happens in the days after a show. You go home, you look up the setlist — setlist.fm is the canonical archive, maintained with an almost academic precision by people who care — and you build a playlist from it in whatever streaming app you use. Then you play it through, in order, and what comes back is not just the music but the spatial memory of the room, the sound mix, the moment the lights dropped for that particular song.

I have been doing this for years. It is a ritual, and like most rituals, part of its meaning is in the doing. The forty minutes of searching song by song, the occasional discovery that a deep cut is on Apple Music in one version but not another, the fiddling with live versus studio — that friction is not purely annoying. It is part of the processing.

And yet. The pile of unprocessed setlists sits in a folder. Shows I attended and never got around to. Setlists I meant to build into playlists and didn’t, because the forty minutes were not available that week, and then the moment passed. The ritual unrealised is just a list of song titles.

This is the dilemma, and it is not entirely trivial.

Why This Is Harder Than It Should Be

The setlist.fm API is excellent. It gives you structured data: artist, venue, date, song titles in order, with notations for encores, covers, and dropped songs. What it does not give you is streaming IDs. The song title is a string; the Apple Music track is an object with a catalog ID, a duration, multiple versions, regional availability, and the possibility of not existing at all in the catalog of your country.
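To make the gap concrete, here is a minimal sketch of flattening one setlist record into an ordered list of titles. The nesting (`sets`, then `set`, then `song`) and the `encore`, `cover` and `tape` fields are my reading of the JSON response and should be treated as assumptions. The sample payload is illustrative and is not the Dortmund setlist. Note what is absent from the result: anything resembling a streaming ID.

```python
from typing import Any

# Illustrative payload only. The nesting (sets -> set -> song) and the
# "encore" / "cover" / "tape" fields are assumptions about the JSON shape.
SAMPLE: dict[str, Any] = {
    "artist": {"name": "Deftones"},
    "sets": {
        "set": [
            {"song": [
                {"name": "Be Quiet and Drive (Far Away)"},
                {"name": "Intro", "tape": True},
                {"name": "Change (In the House of Flies)"},
            ]},
            {"encore": 1, "song": [
                {"name": "Example Cover", "cover": {"name": "Some Other Artist"}},
            ]},
        ]
    },
}


def flatten_setlist(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the performed songs in order, skipping anything played from tape."""
    artist = payload.get("artist", {}).get("name", "")
    rows = []
    for part in payload.get("sets", {}).get("set", []):
        for song in part.get("song", []):
            if song.get("tape"):  # taped intro or interlude, not performed
                continue
            cover = song.get("cover")
            rows.append({
                "title": song["name"],
                # For a cover, the catalog search should use the original artist.
                "search_artist": cover["name"] if cover else artist,
                "encore": bool(part.get("encore")),
            })
    return rows


songs = flatten_setlist(SAMPLE)
for row in songs:
    print(row)
```

Everything downstream has to start from these title strings, which is where the matching problem begins.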

The matching problem — connecting a string like “Change (In the House of Flies)” to the correct Apple Music track, filtered for the right album version, ignoring the live recordings you did not ask for — is not hard, but it is fiddly. You can get 80% of a setlist matched automatically without much effort. The remaining 20% are the covers, the deep cuts, the songs with subtitles in parentheses that differ between the setlist record and the catalog metadata.
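The fiddly part can be sketched in a few lines. This is not the matching code from the repo, only a hypothetical illustration of the idea: normalise both titles, skip candidates tagged as live, and refuse to auto-match below a similarity threshold. The function names, the threshold of 0.85 and the candidate titles are all assumptions made up for the example.

```python
import re
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase, drop bracketed subtitles and punctuation, collapse spaces."""
    title = re.sub(r"\([^)]*\)|\[[^\]]*\]", " ", title.lower())
    title = re.sub(r"[^\w\s]+", " ", title)
    return " ".join(title.split())


def is_live(candidate_title: str) -> bool:
    """True if a bracketed tag such as '[Live]' or '(Live at ...)' is present."""
    return bool(re.search(r"[(\[][^)\]]*\blive\b", candidate_title.lower()))


def best_match(query: str, candidates: list[str], threshold: float = 0.85):
    """Return (title, score) for the best studio candidate, or None if unsure."""
    scored = []
    for cand in candidates:
        if is_live(cand):
            continue
        score = SequenceMatcher(None, normalize(query), normalize(cand)).ratio()
        scored.append((score, cand))
    if not scored:
        return None
    score, cand = max(scored)
    # Below the threshold the match is left for a human to confirm.
    return (cand, score) if score >= threshold else None


hypothetical_catalog = [
    "Change (In the House of Flies) [Live]",
    "Change (In The House Of Flies)",
    "Changes",
]
print(best_match("Change (In the House of Flies)", hypothetical_catalog))
```

Dropping the parenthetical is exactly what makes this heuristic both useful and dangerous: it fixes subtitle mismatches, and it also lets "Change" drift uncomfortably close to "Changes". That is the remaining 20% in miniature.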

Spotify has a fairly rich ecosystem of community tools for exactly this workflow, because Spotify’s API is permissive and well-documented and the auth flow is reasonable for third-party developers. Apple Music is harder. The MusicKit framework is real and capable, but the authentication requires managing a private key and signing JWTs with developer credentials, which is not the OAuth dance most developers are used to. The result is that the setlist → Apple Music pipeline is significantly underbuilt compared to the Spotify equivalent.

This is partly why I built setlist-to-playlist as a PWA rather than reaching for an existing tool.

How It Works

The app is a Progressive Web App (installable, mobile-friendly, a small tool you open on your phone in the taxi home from a show) built on Next.js, in a monorepo managed by pnpm and Turbo. The workflow has three phases:

Import. You paste a setlist.fm URL or ID. The app queries setlist.fm through a server-side proxy — the API key lives on the server and never touches the client — and returns the structured setlist data: songs in order, with metadata about covers, medleys, and notes.

Preview and matching. The core package runs a matching algorithm against the Apple Music catalog, using the MusicKit JS API for browser-based catalog search. For each song title, it searches Apple Music and presents the best candidate, giving you the chance to confirm or swap before anything is written. This is the step where the 20% problem is addressed manually — the app handles the obvious cases automatically and surfaces the ambiguous ones for human judgement.

Export. Once you are happy with the track list, the app creates a playlist in your Apple Music library. MusicKit handles the authentication in-browser; the backend generates the JWT tokens using credentials from Apple Developer, signing with the private key server-side so it stays off the client.

The whole thing is local-first in the sense that matters: the Apple Music authentication is between your browser and Apple, and no playlist data or listening history is stored by the app. The server does only two things: it proxies the setlist.fm API key and it generates the JWT.

The Actual Experience

After the Deftones show: opened the app on the phone, pasted the setlist.fm URL, had the playlist in Apple Music in about four minutes. Three tracks needed manual confirmation — two because of live-versus-studio ambiguity, one because a cover required a search adjustment, the kind of edge case where the name setlist.fm records differs from what appears in regional streaming catalogs.

Four minutes instead of forty. Mission accomplished.

And yet.

I noticed, processing the setlist that quickly, that something was missing. Not the music — the music was all there, in order, correct. What was missing was the time spent inside the setlist. The forty minutes of handling each song is also forty minutes of thinking about each song, of remembering where in the set it fell, of deciding which album version you want to hear. The automation removed the friction and also removed the processing.

I am not sure this is a problem. It is probably more accurate to say that it is a trade-off, and that what trade-off you want depends on what you are doing with the ritual. If the backlog is the problem — the pile of unprocessed shows — the automation solves it cleanly. If the processing itself is the point, you probably should not automate it, and the tool is there for when you want it.

That is the correct relationship to automation, I think. Not “this should always be automated” or “this should never be automated”, but “here is a tool that removes the mechanical part; use it when the mechanical part is not the point”.

A Note on the Tech Stack

For the interested: Next.js 15 with App Router, pnpm workspaces with Turbo for the monorepo, MusicKit JS for Apple Music integration, setlist.fm REST API. The JWT for Apple Music uses the jose library for token signing. The matching logic lives in a standalone packages/core module, which makes it testable in isolation and reusable if anyone wants to port this to a different frontend or a CLI.

The repo is at github.com/sebastianspicker/setlist-to-playlist. PRs welcome, particularly around the matching heuristics — that is the part where there is the most room for improvement.

The Deftones were exceptional, for the record. The Westfalenhalle was loud in the way that only a concrete hall that size can be loud, which is to say: correctly loud.

The playlist is good. I am glad it took four minutes and not forty.

I am also glad I know what I gave up.

A Gas at Temperature T: Xenakis and the Physics of Stochastic Music

Tue, 14 Oct 2025 00:00:00 +0000

Iannis Xenakis (1922–2001) was trained as a civil engineer at the Athens Polytechnic, joined the Greek Resistance during the Second World War and the subsequent Greek Civil War, survived a British army tank shell in January 1945 that cost him the sight in his left eye and part of his jaw, was sentenced to death in absentia by the Greek military government, fled to Paris in 1947, and worked for twelve years as an architect in Le Corbusier’s atelier — where he contributed structural engineering to the Unité d’Habitation in Marseille and designed the Philips Pavilion for Expo 58. In parallel, already in his thirties, he taught himself composition — approaching Honegger (who was too ill to teach) and then studying with Messiaen — and became one of the central figures of the post-war avant-garde. I mention the biography not as background colour but because it bears on the physics. A person who has been through what Xenakis had been through by 1950 is not likely to be intimidated by the kinetic theory of gases.

He was not. In 1955–56 he composed Pithoprakta — “actions through probability” — for 46 strings, each of which is, in his own account, a molecule of an ideal gas. This post works through the mathematics he used and asks what it means when a composer takes statistical mechanics seriously as a compositional tool.

The Problem with Post-War Serialism

To understand why Xenakis did what he did, it helps to know what everyone else was doing. By the early 1950s, the dominant tendency in European new music was total serialism: the systematic extension of Schoenberg’s twelve-tone technique to rhythm, dynamics, articulation, and register. Every parameter of every note was determined by a series. Messiaen had sketched this direction in Mode de valeurs et d’intensités (1949); Boulez and Stockhausen had taken it to its logical extreme.

The result, as Xenakis observed with characteristic bluntness in Formalized Music (1963/1992), was a kind of sonic indistinguishability: because every parameter varied according to independent deterministic series, the textures produced by total serialism sounded essentially like random noise. The maximum of local determinism had produced the appearance of global chaos.

His diagnosis was precise and, I think, correct: if the perceptual result of maximum determinism and maximum randomness is the same, then the path forward is not to find a better deterministic scheme but to embrace randomness explicitly, at the level that governs the macroscopic structure. Control the distribution; let the individual events vary within it. This is exactly what statistical mechanics does for a gas: it does not track every molecule, but it knows with great precision what the distribution of velocities will be.

Statistical Mechanics in Brief

In a classical ideal gas of $N$ molecules at thermal equilibrium with temperature $T$, the molecules move in all directions with speeds distributed according to the Maxwell-Boltzmann speed distribution:

$$f(v) = \sqrt{\frac{2}{\pi}}\, \frac{v^2}{a^3}\, \exp\!\left(-\frac{v^2}{2a^2}\right), \qquad a = \sqrt{\frac{k_B T}{m}},$$

where $m$ is the molecular mass and $k_B$ is Boltzmann’s constant. The parameter $a$ sets the characteristic speed scale: it grows with temperature (hotter gas means faster molecules) and shrinks with molecular mass (heavier molecules move more slowly at the same temperature).

The distribution has a characteristic shape: it rises as $v^2$ for small speeds (few molecules are nearly stationary), peaks at the most probable speed $v_p = a\sqrt{2}$, and falls off as $e^{-v^2/2a^2}$ for large speeds (very fast molecules are exponentially rare). The three characteristic speeds are:

$$v_p = a\sqrt{2}, \qquad \langle v \rangle = a\sqrt{\tfrac{8}{\pi}}, \qquad v_\mathrm{rms} = a\sqrt{3}.$$
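These three values are easy to check numerically. The sketch below integrates $f(v)$ with a plain midpoint rule for $a = 1$ and compares the results with the closed forms. The step count and the cut-off at $12a$ are arbitrary choices of mine, not anything from Xenakis.

```python
import math


def maxwell_pdf(v: float, a: float) -> float:
    """Maxwell-Boltzmann speed density with scale parameter a."""
    return math.sqrt(2 / math.pi) * v**2 / a**3 * math.exp(-v**2 / (2 * a**2))


def characteristic_speeds(a: float, steps: int = 200_000, cutoff: float = 12.0):
    """Midpoint-rule estimates of (normalisation, mode, mean speed, rms speed)."""
    dv = cutoff * a / steps
    norm = mean = mean_sq = 0.0
    mode, peak = 0.0, 0.0
    for i in range(steps):
        v = (i + 0.5) * dv
        f = maxwell_pdf(v, a)
        norm += f * dv
        mean += v * f * dv
        mean_sq += v * v * f * dv
        if f > peak:  # track the grid point with the highest density
            peak, mode = f, v
    return norm, mode, mean, math.sqrt(mean_sq)


norm, v_p, v_mean, v_rms = characteristic_speeds(a=1.0)
print("normalisation:", norm)
print("most probable:", v_p, "expected", math.sqrt(2))
print("mean:         ", v_mean, "expected", math.sqrt(8 / math.pi))
print("rms:          ", v_rms, "expected", math.sqrt(3))
```

The ordering $v_p < \langle v \rangle < v_\mathrm{rms}$ reflects the long tail on the fast side of the distribution.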

No individual molecule is tracked. The distribution is everything: once you know $f(v)$, you know all macroscopic properties of the gas — pressure, mean kinetic energy, thermal conductivity — without knowing the trajectory of a single molecule. The individual is sacrificed to the ensemble.

Pithoprakta and the Orchestra as Gas

In Pithoprakta (1955–56), Xenakis assigns each of the 46 string instruments to a molecule of a gas. The musical analogue of molecular speed is the velocity of a glissando: the rate at which a glissando moves through pitch, measured in semitones per second. Slow glissandi are cold molecules; fast glissandi are hot ones.

For a given passage with a specified musical “temperature” (an intensity-and-density parameter he could set as a compositional choice), the 46 glissando speeds are drawn from the Maxwell-Boltzmann distribution for that temperature. No two strings play the same glissando at the same speed. The effect, to a listener, is a dense sound-mass — a shimmer or a roar — whose internal texture varies but whose overall character (the temperature, the density) is under the composer’s control at exactly the level that matters perceptually.

Xenakis worked out the velocities numerically by hand. The score of Pithoprakta was among the first in which the individual parts were derived from a statistical distribution rather than from a melody, a row, or an improvisation instruction. The calculation is tedious but not difficult: for each time window, choose a temperature, compute $f(v)$ for the 46 values of $v$ that tile the distribution, and assign one speed to each instrument.
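A modern version of that hand calculation might look like the following sketch. I am assuming that "tiling the distribution" means taking 46 evenly spaced quantiles, which is one reasonable reading and not necessarily the procedure Xenakis followed. The scale parameter `a` is taken to be in semitones per second, and the value 4.0 is purely illustrative.

```python
import math


def maxwell_cdf(v: float, a: float) -> float:
    """Cumulative distribution of the Maxwell-Boltzmann speed density."""
    x = v / a
    return math.erf(x / math.sqrt(2)) - math.sqrt(2 / math.pi) * x * math.exp(-x * x / 2)


def maxwell_quantile(p: float, a: float) -> float:
    """Invert the CDF by bisection (the CDF is monotone increasing)."""
    lo, hi = 0.0, 20.0 * a
    for _ in range(100):
        mid = (lo + hi) / 2
        if maxwell_cdf(mid, a) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def glissando_speeds(n_instruments: int, a: float) -> list[float]:
    """One speed per instrument, taken at the midpoints of n equal-probability bins."""
    return [maxwell_quantile((i + 0.5) / n_instruments, a) for i in range(n_instruments)]


# a = 4.0 semitones per second is an illustrative "temperature" setting.
speeds = glissando_speeds(46, a=4.0)
for desk, speed in enumerate(speeds, start=1):
    print(f"string {desk:2d}: {speed:6.2f} semitones/s")
```

Because every quantile is distinct, no two instruments receive the same speed, and raising `a` stretches the whole list towards faster glissandi.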

The connection between macroscopic structure and microscopic liberty is deliberately preserved. The shape of the sound-mass — its brightness, its turbulence, its rate of change — is controlled. Each individual line is unpredictable. This is, structurally, the same trade-off that makes thermodynamics work: you give up on the individual trajectory and gain exact knowledge of the aggregate.

Musical Temperature as a Compositional Parameter

The analogy is worth making precise. In the physical gas, raising the temperature $T$ increases $a = \sqrt{k_B T / m}$, which shifts the peak of $f(v)$ to the right and widens the distribution. More molecules have high speeds; the variance of speeds increases.

In Pithoprakta, raising the musical “temperature” has the same effect: more instruments perform rapid glissandi; the pitch-space trajectories are more varied; the texture becomes more active and more turbulent. Lowering the temperature concentrates the glissando speeds near zero — slow motion, near-stasis, long sustained tones that change pitch only gradually. The orchestra cools.

This mapping is not metaphorical. Xenakis computed it. The score contains numerically derived glissando speeds; the connection between the perceptual temperature of the texture and the statistical parameter $T$ is quantitative. When musicians speak of a passage “heating up,” they are usually using a figure of speech. In Pithoprakta, they are describing a thermodynamic fact.

The Poisson Distribution and Event Density

Pithoprakta uses a second physical model alongside the Maxwell-Boltzmann distribution: the Poisson process, which governs the density of independent, randomly occurring events.

If musical events (pizzicato attacks, bow changes, individual note entries) occur at a mean rate of $\lambda$ events per second, the probability of exactly $k$ events occurring in a time window of length $\tau$ is:

$$P(N = k) = \frac{(\lambda \tau)^k\, e^{-\lambda \tau}}{k!}.$$

For a fixed window, the single parameter $\lambda$ controls both the mean and the variance (they are equal: $\langle N \rangle = \mathrm{Var}(N) = \lambda \tau$). A high $\lambda$ produces a dense cluster of events; a low $\lambda$ produces sparse, widely spaced events.
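The equality of mean and variance can be confirmed directly from the formula. In this sketch the rate of 3 events per second and the window of 2 seconds are arbitrary illustrative values; the sum is truncated at 200 terms, which is far into the negligible tail for these numbers.

```python
import math


def poisson_pmf(k: int, rate: float, window: float) -> float:
    """P(N = k) for events at `rate` per second in a window of `window` seconds.

    Computed in log space so that large k does not overflow. Requires
    rate * window > 0.
    """
    mu = rate * window
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def mean_and_variance(rate: float, window: float, k_max: int = 200):
    """Mean and variance of the event count, summed over k = 0 .. k_max - 1."""
    probs = [poisson_pmf(k, rate, window) for k in range(k_max)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


mean, var = mean_and_variance(rate=3.0, window=2.0)
print("mean events per window:", mean)
print("variance:              ", var)
```

One practical consequence: a denser passage is also a more irregular one in absolute terms, since the spread of the count grows with the square root of the expected number of events.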

Xenakis used this to control the density of pizzicato attacks independently of the glissando texture. A passage can be cool (slow glissandi) and dense (many pizzicati), or hot and sparse, or any combination. The two distributions operate on independent musical parameters — pitch motion and event density — giving the composer a two-dimensional thermodynamic control space over the texture.

Markov Chains: Analogique A and Analogique B

In Analogique A (for string orchestra, 1958–59) and its companion Analogique B (for sinusoidal tones, same year), Xenakis moved to a different stochastic framework: Markov chains.

A Markov chain is a sequence of states where the probability of transitioning to the next state depends only on the current state. The chain is specified by a transition matrix $P$, where $P_{ij}$ is the probability of moving from state $i$ to state $j$:

$$P_{ij} \geq 0, \qquad \sum_j P_{ij} = 1 \quad \forall\, i.$$

Under mild conditions (irreducibility and aperiodicity), the chain converges to a unique stationary distribution $\pi$ satisfying:

$$\pi P = \pi, \qquad \sum_i \pi_i = 1.$$

The convergence is geometric: if $\lambda_2$ is the second-largest eigenvalue of $P$ in absolute value, then after $n$ steps the distribution $\pi^{(n)}$ satisfies $\|\pi^{(n)} - \pi\| \leq C |\lambda_2|^n$ for some constant $C$. The gap $1 - |\lambda_2|$ — the spectral gap — controls how quickly the chain forgets its initial state. A transition matrix with a large spectral gap produces rapid convergence; one with $|\lambda_2| \approx 1$ produces long-memory dependence between distant states. This is a compositional choice: the spectral gap determines how quickly a piece’s texture changes character.
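The smallest possible example makes the role of $\lambda_2$ visible. For a two-state chain with switching probabilities $p$ and $q$, the stationary distribution and the second eigenvalue have closed forms, so the geometric decay can be checked step by step. The two-state chain and the particular values of $p$ and $q$ are my own illustration, not a matrix from Analogique A.

```python
def step(dist, P):
    """One step of the chain: row vector times transition matrix."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def two_state_chain(p: float, q: float):
    """Transition matrix, stationary distribution and second eigenvalue."""
    P = [[1 - p, p], [q, 1 - q]]
    stationary = [q / (p + q), p / (p + q)]
    lambda_2 = 1 - p - q
    return P, stationary, lambda_2


def distance_after(n_steps: int, P, stationary, start=(1.0, 0.0)) -> float:
    """L1 distance from the stationary distribution after n_steps."""
    dist = list(start)
    for _ in range(n_steps):
        dist = step(dist, P)
    return sum(abs(a - b) for a, b in zip(dist, stationary))


for p, q in [(0.1, 0.2), (0.45, 0.45)]:
    P, pi, lam = two_state_chain(p, q)
    print(f"p={p}, q={q}: spectral gap {1 - abs(lam):.2f}, "
          f"distance after 10 steps {distance_after(10, P, pi):.2e}")
```

The first chain (small gap) still remembers where it started after ten steps; the second (large gap) has forgotten almost completely.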

In Analogique A, Xenakis divided the sonic space into a grid of cells defined by pitch register (high/middle/low), density (sparse/medium/dense), and dynamic (soft/loud). Each “screen” — a brief time window — occupies one cell in this grid. The progression of screens through the piece is governed by transition probabilities: from a high/dense/loud screen, there is some probability of moving to each adjacent cell, specified by Xenakis’s chosen transition matrix.

This is a Markov chain on a discrete state space of sonic textures. The macroscopic trajectory of the piece (its overall movement through sound-quality space) is determined by the transition matrix, which the composer sets. The details of each screen are filled in stochastically, within the parameters of the current state. Again, the individual is sacrificed to the aggregate; control is exercised at the level of the distribution rather than the event.

Game Theory: Duel and Stratégie

The most extreme and, to my mind, most interesting of Xenakis’s formalisations is the use of game theory in Duel (1959) and Stratégie (1962).

A two-player zero-sum game is specified by a payoff matrix $A \in \mathbb{R}^{m \times n}$. Player 1 (the “maximiser”) chooses a row $i$; Player 2 (the “minimiser”) chooses a column $j$; Player 1 receives payoff $A_{ij}$ and Player 2 receives $-A_{ij}$. In a pure-strategy game, each player selects a single action. In a mixed-strategy game, each player chooses a probability distribution over their actions: Player 1 uses $\mathbf{x} \in \Delta_m$ and Player 2 uses $\mathbf{y} \in \Delta_n$, where $\Delta_k$ denotes the standard $(k-1)$-simplex.

The expected payoff to Player 1 under mixed strategies is:

$$E(\mathbf{x}, \mathbf{y}) = \mathbf{x}^\top A\, \mathbf{y}.$$

Von Neumann’s minimax theorem (1928) guarantees that:

$$\max_{\mathbf{x} \in \Delta_m} \min_{\mathbf{y} \in \Delta_n} \mathbf{x}^\top A\, \mathbf{y} \;=\; \min_{\mathbf{y} \in \Delta_n} \max_{\mathbf{x} \in \Delta_m} \mathbf{x}^\top A\, \mathbf{y} \;=\; v^*,$$

where $v^*$ is the value of the game. The pair $(\mathbf{x}^*, \mathbf{y}^*)$ that achieves this saddle point is the Nash equilibrium: neither player can improve their expected payoff by unilaterally deviating from their equilibrium strategy.
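For a $2 \times 2$ game the saddle point can be written down in closed form, which makes a compact illustration. The sketch below first checks for a pure saddle point and otherwise uses the indifference conditions (each player mixes so that the opponent's two options pay the same). The example matrices are invented and have nothing to do with Xenakis's actual payoff tables.

```python
from fractions import Fraction


def solve_2x2(A):
    """Return (x, y, value) for a 2x2 zero-sum game; the row player maximises."""
    (a, b), (c, d) = [[Fraction(v) for v in row] for row in A]
    maximin = max(min(a, b), min(c, d))  # best guaranteed payoff for rows
    minimax = min(max(a, c), max(b, d))  # best guaranteed cap for columns
    if maximin == minimax:
        # Pure saddle point: both players commit to a single action.
        row = 0 if min(a, b) >= min(c, d) else 1
        col = 0 if max(a, c) <= max(b, d) else 1
        x = (Fraction(1 - row), Fraction(row))
        y = (Fraction(1 - col), Fraction(col))
        return x, y, maximin
    # No saddle point: mix so that the opponent is indifferent.
    denom = a - b - c + d
    x1 = (d - c) / denom
    y1 = (d - b) / denom
    value = (a * d - b * c) / denom
    return (x1, 1 - x1), (y1, 1 - y1), value


for name, game in [("matching pennies", [[1, -1], [-1, 1]]),
                   ("asymmetric", [[3, -1], [-2, 4]])]:
    x, y, v = solve_2x2(game)
    print(f"{name}: x*={x}, y*={y}, value={v}")
```

Larger games such as a 19-by-19 matrix need linear programming rather than a closed form, but the logic is the same.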

In Stratégie, two orchestras compete, each led by its own conductor. Each conductor has nineteen “tactics”: six basic musical textures (e.g., sustained chords, staccato pizzicati, glissandi masses, silence) plus thirteen combinatorial tactics that combine two or three of the basics. The payoff matrix is a $19 \times 19$ integer matrix, fixed in advance by Xenakis, specifying how many points Conductor 1 scores when their orchestra plays tactic $i$ against Conductor 2’s tactic $j$. A referee tracks the score.

The conductors make decisions in real time during the performance, choosing tactics based on what the other conductor is doing and on the evolving score. The piece ends when one conductor reaches a predetermined score threshold.

The Nash equilibrium of the payoff matrix tells each conductor, in principle, the optimal distribution over tactics to play: if both play optimally, the expected score trajectory is determined. In practice, conductors are not expected to compute mixed strategies on the podium; Xenakis’s point is structural. The game-theoretic formalism is used to design the payoff matrix so that no tactic dominates — every choice has consequences that depend on the opponent’s choice — guaranteeing that the piece will always contain genuine strategic tension regardless of who is conducting.
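The "no tactic dominates" condition is mechanical to test. Here is a small sketch that lists weakly dominated rows and columns of a payoff matrix. The matrices are toy examples of my own (a rock-paper-scissors pattern and a deliberately lopsided game), not excerpts from the Stratégie matrix.

```python
def dominated_rows(A):
    """Rows weakly dominated by another row (the row player maximises)."""
    out = []
    for i, row in enumerate(A):
        for k, other in enumerate(A):
            if k == i:
                continue
            never_worse = all(o >= r for o, r in zip(other, row))
            sometimes_better = any(o > r for o, r in zip(other, row))
            if never_worse and sometimes_better:
                out.append(i)
                break
    return out


def dominated_columns(A):
    """Columns weakly dominated by another column (the column player minimises)."""
    # Negating and transposing turns the column player's problem into a row problem.
    negated_transpose = [[-v for v in col] for col in zip(*A)]
    return dominated_rows(negated_transpose)


cyclic = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]
lopsided = [[2, 3],
            [1, 1]]

print("cyclic:  ", dominated_rows(cyclic), dominated_columns(cyclic))
print("lopsided:", dominated_rows(lopsided), dominated_columns(lopsided))
```

In the cyclic game every choice is good against something and bad against something else, which is the property the paragraph above describes. In the lopsided game one row and one column would never be worth playing.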

Duel (1959) is the earlier, simpler version for two chamber orchestras. Stratégie (1962) was premiered in April 1963 at the Venice Biennale with two conductors competing live. The audience was aware of the game, of the score, and of the payoff matrix. The premiere was by most accounts a success, though the practical complications of running a zero-sum game in a concert hall (including the question of whether conductors were actually computing Nash equilibria or just following intuition) were never fully resolved.

Formalized Music

Xenakis assembled his theoretical framework in Musiques formelles (1963), translated and expanded as Formalized Music (1971; revised edition 1992). The book is one of the strangest documents in twentieth-century music theory: part treatise, part manifesto, part mathematical appendix. It covers stochastic composition, Markov chains, game theory, set theory, group theory, and symbolic logic — all presented with the confidence of someone who is equally at home in the engineering faculty and the concert hall, and with the occasional obscurity of someone writing simultaneously for two audiences who share almost no vocabulary.

The core argument is that musical composition can and should be treated as the application of mathematical structures to sonic material, not because mathematics makes music “better” but because mathematical structures are the most powerful available tools for controlling relationships between sounds at multiple scales simultaneously. The statistical distributions control the macroscopic; the individual values vary within them. The game-theoretic payoff matrix controls the strategic interaction; the individual tactics fill in the details. Mathematics operates at the structural level and leaves the acoustic surface free.

This is a different relationship between mathematics and music from the ones in my earlier posts on group theory and Messiaen or the Euclidean algorithm and world rhythms. In those cases, mathematics describes structure that already exists in the music — structure the composers arrived at by ear. In Xenakis, mathematics is the generative tool: the score is derived from the calculation.

What the Analogy Does and Does Not Do

The Maxwell-Boltzmann analogy in Pithoprakta is exact in one direction and approximate in another.

It is exact in the following sense: the glissando speeds Xenakis computed for his 46 strings genuinely follow the Maxwell-Boltzmann distribution with the parameters he chose. The score is a realisation of that distribution. If you collect the glissando speeds from the score and plot their histogram, you will find the characteristic $v^2 e^{-v^2/2a^2}$ shape.

It is approximate — or rather, it is analogical — in the sense that strings in an orchestra are not molecules of a gas. They do not collide. They have mass and inertia in a physical sense that has no direct mapping to musical parameters. The temperature $T$ is not a temperature in any thermodynamic sense; it is a compositional variable that Xenakis chose to parameterise with the same symbol because the formal relationship is the same. The analogy is structural, not ontological.

This is worth saying plainly because it is easy to be misled in both directions: either to over-claim (the orchestra is a gas) or to dismiss (the orchestra is merely labelled with physical vocabulary). The actual claim is more modest and more interesting: the mathematical structure of the Maxwell-Boltzmann distribution is the right tool for specifying a certain kind of orchestral texture, namely one where individual elements vary stochastically around a controlled macroscopic envelope. The physics provides the formalism; the music provides the application. This is how mathematics works in engineering, too.

The Centenary and What Remains

Xenakis died in 2001, by then partially deaf and with dementia. His centenary in 2022 produced a wave of new performances, recordings, and scholarship — including the Meta-Xenakis volume (Open Book Publishers, 2022), which collects analyses of his compositional mathematics, his architectural work (he designed the Philips Pavilion for Le Corbusier’s Expo 58 in Brussels using the same ruled-surface geometry he was using in Metastaseis), and his political biography.

What remains resonant about his project is not the specific distributions he chose — the Maxwell-Boltzmann is not the only or even necessarily the best distribution for many musical applications — but the epistemological position it represents. Xenakis insisted that the right question to ask about a musical texture is not “what is the note at beat 3 of bar 47?” but “what is the distribution from which the events in this section are drawn?” This shift from individual determination to statistical control is precisely the shift that makes thermodynamics possible as a science, and Xenakis was the first composer to apply it deliberately and systematically.

When a composer writes “let the orchestra be a gas at temperature $T$” and then actually computes the consequences with Boltzmann’s constant in front of him, I do not feel that physics has been appropriated. I feel that it has been recognised — seen, from a different direction, as the same thing it always was: a set of tools for thinking about ensembles of interacting elements whose individual behaviour is too complex to track but whose collective behaviour is not.

The orchestra is not a gas. But the Maxwell-Boltzmann distribution describes it anyway.

References

Ames, C. (1989). The Markov process as a compositional model: A survey and tutorial. Leonardo, 22(2), 175–187. https://doi.org/10.2307/1575226
Jedrzejewski, F. (2006). Mathematical Theory of Music. Delatour France / IRCAM.
Matossian, N. (2005). Xenakis (revised ed.). Moufflon Publications.
Nash, J. F. (1950). Equilibrium points in $n$-person games. Proceedings of the National Academy of Sciences, 36(1), 48–49. https://doi.org/10.1073/pnas.36.1.48
Nierhaus, G. (2009). Algorithmic Composition: Paradigms of Automated Music Generation. Springer.
Solomos, M. (Ed.). (2022). Meta-Xenakis. Open Book Publishers. https://doi.org/10.11647/OBP.0313
von Neumann, J. (1928). Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100(1), 295–320. https://doi.org/10.1007/BF01448847
von Neumann, J., & Morgenstern, O. (1944). Theory of Games and Economic Behavior. Princeton University Press.
Xenakis, I. (1992). Formalized Music: Thought and Mathematics in Composition (revised ed.). Pendragon Press. (Originally published as Musiques formelles, La Revue Musicale, 1963.)

Changelog

2026-01-14: Corrected the description of Stratégie (1962): each conductor has nineteen tactics (six basic plus thirteen combinatorial), with a 19 x 19 payoff matrix — not six tactics and a 6 x 6 matrix. The six-tactic, 6 x 6 description applies to the earlier Duel (1959).
2026-01-14: Added “in April 1963” to the Stratégie premiere sentence. The composition date is 1962; the premiere took place on 25 April 1963 at the Venice Biennale.
2026-01-14: Changed “studying briefly with Honegger” to “approaching Honegger (who was too ill to teach).” Xenakis sought instruction from Honegger circa 1949, but Honegger was in declining health and did not take him as a student.

Star Polygons and Drum Machines

Mon, 07 Jul 2025 00:00:00 +0000

Two star polygons appear in Danny Carey’s visual vocabulary, and they are not the same star. One is open, almost friendly — seven points connected by relatively shallow angles. The other is sharper, the points more acute. They look like variations on a theme, which is accurate: both are drawn on seven equally spaced vertices, but one connects every second vertex and the other connects every third.

In Schläfli notation — the system for naming regular star polygons — these are $\{7/2\}$ and $\{7/3\}$ [1]. Both appear in Tool’s artwork, in Thelemic symbolism, in medieval Islamic geometric patterns, and on the floor plans of cathedrals. They are the most visually intricate star polygons that can be drawn in a single closed stroke before the figure becomes illegible.

Both of them have a property that the six-pointed star does not share: they visit every vertex before closing. (The pentagram shares it, for the same reason.) This is a consequence of 7 being prime. And it turns out to matter for how rhythmic accent cycles are built.

The Schläfli Symbol

A regular star polygon $\{n/k\}$ is constructed by placing $n$ points evenly on a circle and connecting every $k$-th point in sequence until the path closes. The structural key is a single number:

$$d = \gcd(n, k).$$

If $d = 1$, the traversal visits all $n$ vertices before returning to the start — a single connected figure. If $d > 1$, the path visits only $n/d$ vertices before closing, and the full figure consists of $d$ separate copies of the smaller star $\{(n/d)\,/\,(k/d)\}$.

The most familiar example of the disconnected case: $\{6/2\}$, the Star of David. Here $\gcd(6,2) = 2$, so the figure breaks into two copies of $\{3/1\} = \{3\}$ — two overlapping equilateral triangles. The traversal starting at vertex 1 visits $1 \to 3 \to 5 \to 1$, leaving vertices 2, 4, 6 entirely unvisited.

The pentagram $\{5/2\}$ is connected: $\gcd(5,2)=1$, traversal $1 \to 3 \to 5 \to 2 \to 4 \to 1$, all five vertices.

For $n=7$:

$\{7/2\}$: $\gcd(7,2)=1$, traversal $1 \to 3 \to 5 \to 7 \to 2 \to 4 \to 6 \to 1$, all seven vertices.
$\{7/3\}$: $\gcd(7,3)=1$, traversal $1 \to 4 \to 7 \to 3 \to 6 \to 2 \to 5 \to 1$, all seven vertices.

Both connected. Neither leaves any vertex unvisited.
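The traversals above are quick to reproduce. This sketch walks a star polygon from a starting vertex until the path closes and uses the gcd to report how many separate pieces the full figure has. The function names are mine.

```python
from math import gcd


def traversal(n: int, k: int, start: int = 1) -> list[int]:
    """Vertices (labelled 1..n) visited by {n/k} from `start` until the path closes."""
    path = []
    v = start
    while True:
        path.append(v)
        v = (v - 1 + k) % n + 1  # step k places round the circle, 1-based labels
        if v == start:
            return path


def components(n: int, k: int) -> int:
    """Number of separate closed paths that make up the figure {n/k}."""
    return gcd(n, k)


for n, k in [(6, 2), (5, 2), (7, 2), (7, 3)]:
    print(f"{{{n}/{k}}}: {traversal(n, k)}  pieces: {components(n, k)}")
```

For $\{6/2\}$ the walk stops after three vertices and the second triangle has to be drawn separately, starting from vertex 2.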

The Group Theory

The traversal of $\{n/k\}$ is an instance of a standard construction in modular arithmetic: the orbit of an element under repeated addition in $\mathbb{Z}/n\mathbb{Z}$.

Label the $n$ vertices $0, 1, \ldots, n-1$. Starting at vertex 0, the traversal visits:

$$0, \quad k \bmod n, \quad 2k \bmod n, \quad 3k \bmod n, \quad \ldots$$

The orbit of 0 under the action of $+k$ is the subgroup of $\mathbb{Z}/n\mathbb{Z}$ generated by $k$. By a standard result, this subgroup has size $n / \gcd(n,k)$.

When $\gcd(n,k) = 1$: orbit size $= n$. The traversal visits every vertex.
When $\gcd(n,k) = d > 1$: orbit size $= n/d$. The traversal visits only a fraction of the vertices.

For prime $n$: $\gcd(n,k) = 1$ for every $1 \leq k \leq n-1$, without exception. Every traversal is complete. There is no step size that traps the path in a proper sub-orbit before visiting all vertices. This follows directly from the fact that a prime has no divisors other than 1 and itself, so $\mathbb{Z}/p\mathbb{Z}$ has no non-trivial subgroups (Lagrange’s theorem: any subgroup of a group of prime order must have order 1 or $p$).
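The orbit-size result can be checked by brute force rather than taken on trust. The sketch below computes orbits by repeated addition, compares their sizes with $n / \gcd(n,k)$, and lists the step sizes that reach every vertex. The range of $n$ tested is an arbitrary choice.

```python
from math import gcd


def orbit_size(n: int, k: int) -> int:
    """Size of the orbit of 0 under repeated addition of k modulo n."""
    seen = set()
    v = 0
    while v not in seen:
        seen.add(v)
        v = (v + k) % n
    return len(seen)


def full_steps(n: int) -> list[int]:
    """Step sizes 1..n-1 whose orbit covers all n vertices."""
    return [k for k in range(1, n) if orbit_size(n, k) == n]


# Brute-force confirmation of the standard result for small n.
formula_holds = all(orbit_size(n, k) == n // gcd(n, k)
                    for n in range(2, 31) for k in range(1, n))
print("orbit size equals n / gcd(n, k):", formula_holds)

for n in (6, 7, 8, 11, 12):
    print(n, full_steps(n))
```

For the primes in that list every step size appears; for 6, 8 and 12 only a handful survive.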

This is the specific property that makes 7 — and any prime — rhythmically fertile.

The Contrast with Six

The comparison with $n = 6$ is the clearest illustration.

In $\mathbb{Z}/6\mathbb{Z}$, the possible step sizes are 1, 2, 3, 4, 5. Their orbits:

Step $k$	$\gcd(6,k)$	Orbit size	Vertices visited
1	1	6	0,1,2,3,4,5 (the hexagon)
2	2	3	0,2,4 only
3	3	2	0,3 only
4	2	3	0,2,4 only
5	1	6	0,5,4,3,2,1 (the hexagon reversed)

The only step sizes that visit all six vertices are 1 and 5, both of which just traverse the hexagon itself, not a star. Every non-trivial star polygon on six points gets trapped. $\{6/2\}$ visits only half the vertices. $\{6/3\}$ visits only two. On six points, every figure is either the plain hexagon or a compound of smaller pieces; there is no connected star.

In $\mathbb{Z}/7\mathbb{Z}$, every step from 2 to 5 generates the full group:

Step $k$	$\gcd(7,k)$	Orbit size	Traversal
2	1	7	1,3,5,7,2,4,6
3	1	7	1,4,7,3,6,2,5
4	1	7	1,5,2,6,3,7,4
5	1	7	1,6,4,2,7,5,3

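All four non-trivial step sizes give connected traversals. Steps 2 and 5 trace the star $\{7/2\}$ in opposite directions, and steps 3 and 4 do the same for $\{7/3\}$. Every one of them visits every vertex. This is not a coincidence: it is the algebraic signature of primality.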
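The two tables above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `orbit` and the `first_label` parameter are illustrative choices, not anything taken from a library.

```python
from math import gcd


def orbit(n: int, k: int, first_label: int = 0) -> list[int]:
    """Vertices visited by stepping k at a time around n points.

    The walk starts at the first vertex and stops when it returns there.
    `first_label` only shifts the printed labels (0 for the six-point
    table, 1 for the seven-point table).
    """
    path = []
    v = 0
    while True:
        path.append(first_label + v)
        v = (v + k) % n
        if v == 0:
            return path


for n, first_label in ((6, 0), (7, 1)):
    for k in range(1, n):
        path = orbit(n, k, first_label)
        # Orbit size should equal n / gcd(n, k).
        assert len(path) == n // gcd(n, k)
        print(f"n={n} k={k} gcd={gcd(n, k)} size={len(path)} path={path}")
```

The paths are listed in visiting order, so for $n = 6$, $k = 4$ the sketch prints 0, 4, 2 where the table lists the same vertices as 0, 2, 4.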

From Geometry to Rhythm

The connection to drumming is direct. Here is the mechanism.

Consider a repeating rhythmic figure of 7 beats — a bar of 7/8, say, with positions 1 through 7. An earlier post discussed Euclidean rhythms: the algorithm that distributes $k$ onset positions as evenly as possible among $n$ slots. That is a problem of selection — which of the $n$ positions to activate.

The star polygon traversal asks a different question. Given that all $n$ positions are present, in what order of emphasis should they be related, such that each accent is a fixed distance from the last? The traversal of $\{n/k\}$ answers this: accent position $1$, then $1+k$, then $1+2k$, and so on modulo $n$.

For $\{7/2\}$: the accent cycle within a single bar runs $1 \to 3 \to 5 \to 7 \to 2 \to 4 \to 6$. Each featured beat is two positions ahead of the last.

Now project this across multiple bars. In bar 1, the primary accent sits on beat 1. In bar 2, if the accent shifts by 2, it lands on beat 3. Bar 3: beat 5. Bar 4: beat 7. Bar 5: beat 2. Bar 6: beat 4. Bar 7: beat 6. Bar 8: beat 1 again.

The accent takes seven bars to return to its starting position. Because $\gcd(2,7) = 1$, the step of 2 generates all of $\mathbb{Z}/7\mathbb{Z}$: every beat position receives the accent exactly once before the cycle resets. The resulting large-scale figure is $7 \times 7 = 49$ beats long — a super-phrase built from a single local rule.

The $\{7/3\}$ traversal generates the same exhaustiveness with a different path. Step 3 gives $1 \to 4 \to 7 \to 3 \to 6 \to 2 \to 5$: a seven-bar accent cycle that visits every position before repeating, but with wider spacing between accented beats, creating a different feel over the same underlying meter.

A six-beat figure with step 2 cannot do this. The accent visits only beats 1, 3, 5, half the cycle, and loops back without touching beats 2, 4, 6. A drummer building phrase-level architecture from a six-beat grid is working with more fragmented material.

Two Problems, One Prime

It is worth stating the relationship between the star polygon approach and Euclidean rhythms precisely, because the two are sometimes conflated [2].

The Euclidean algorithm distributes $k$ onsets among $n$ positions with maximal evenness. The result is a subset of the $n$ positions — a selection. The primality of $n$ matters here too: because $\gcd(k,p) = 1$ for prime $p$ and any $1 \leq k \leq p-1$, the Euclidean rhythm $E(k,p)$ always achieves its theoretical maximum of evenness. There are no divisibility shortcuts that cause clumping.

The star polygon traversal selects no subset — it relates all $n$ positions via a cyclic permutation. The primality of $n$ matters here because it guarantees that every non-trivial cyclic permutation (every step size $k$ with $1 < k < n$) generates the full group, visiting all positions before repeating.

Same arithmetic property — $\gcd(k,p) = 1$ for all non-zero $k$ — but the two problems ask different things of it. Euclidean rhythms use it to guarantee dense coverage. Star polygon traversals use it to guarantee no sub-orbit trapping.

The Compound Structure

Written out explicitly, the $\{7/2\}$ accent pattern over seven bars looks like this — with bold marking the featured beat in each bar:

$$\begin{array}{rccccccc} \text{bar 1:} & \mathbf{1} & 2 & 3 & 4 & 5 & 6 & 7 \\ \text{bar 2:} & 1 & 2 & \mathbf{3} & 4 & 5 & 6 & 7 \\ \text{bar 3:} & 1 & 2 & 3 & 4 & \mathbf{5} & 6 & 7 \\ \text{bar 4:} & 1 & 2 & 3 & 4 & 5 & 6 & \mathbf{7} \\ \text{bar 5:} & 1 & \mathbf{2} & 3 & 4 & 5 & 6 & 7 \\ \text{bar 6:} & 1 & 2 & 3 & \mathbf{4} & 5 & 6 & 7 \\ \text{bar 7:} & 1 & 2 & 3 & 4 & 5 & \mathbf{6} & 7 \\ \end{array}$$

Each bar is metrically identical. The large-scale accent — which beat carries the phrase-level emphasis — traces the traversal path of the $\{7/2\}$ star polygon across the seven-bar cycle.
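The same grid can be generated rather than written out by hand. The sketch below is illustrative only; `accent_grid` is a hypothetical helper name, and positions are stored 0-based internally and printed 1-based to match the array above.

```python
def accent_grid(n: int, k: int) -> list[list[int]]:
    """One row per bar; a 1 marks the accented beat, 0 the other beats.

    Bar b (counting from 0) accents the 0-based position (b * k) mod n.
    """
    grid = []
    for bar in range(n):
        accent = (bar * k) % n
        grid.append([1 if pos == accent else 0 for pos in range(n)])
    return grid


grid = accent_grid(7, 2)
for bar, row in enumerate(grid, start=1):
    # Print beats 1..7, with the accented beat in brackets.
    print(f"bar {bar}:", " ".join(f"[{p + 1}]" if hit else f" {p + 1} "
                                   for p, hit in enumerate(row)))

# Count, for each beat position, how many bars accent it.
accents_per_position = [sum(col) for col in zip(*grid)]
print(accents_per_position)
```

For `accent_grid(7, 2)` every position is accented exactly once across the seven bars. Calling `accent_grid(6, 2)` instead shows three positions accented twice and three never.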

This is the kind of large-scale rhythmic architecture visible in a great deal of Tool’s output. Whether Danny Carey explicitly constructs accent cycles from star polygon traversal paths, or the same structure emerges from his intuitive sense of how prime time signatures behave, the result is the same. The mathematics and the musical instinct point toward the same pattern.

Why the Heptagram

The full mathematical picture of why seven-fold symmetry is special — why the regular heptagon cannot be constructed by compass and straightedge, what the minimal polynomial of $\cos(2\pi/7)$ implies about the heptagon’s position outside the constructible world, and how the Galois group of the cyclotomic field over $\mathbb{Q}$ carries the obstruction — is developed in the companion post The Impossible Heptagon.

The short version, for the purposes of this post: seven is the smallest odd prime that is not a Fermat prime ($2^{2^j}+1$). This algebraic accident places it outside the reach of ruler-and-compass construction — the heptagon exists as an ideal but cannot be manifested by the classical tools. Its star polygons are the accessible shadows of an inaccessible form. And its primality, in both the constructibility sense and the traversal sense, is precisely what makes it inexhaustible as a rhythmic resource.

The Fibonacci structure in “Lateralus” [3], the group theory underlying twelve-tone equal temperament [4], and the Euclidean rhythm algorithm [5] are all different facets of the same observation: mathematical structure, introduced as compositional constraint, generates musical complexity that cannot easily be produced by intuition alone. The star polygon is another instance. The drummer who keeps a heptagram on his kit has found, by a non-mathematical route, an object with a precise and interesting mathematical identity.

References

[1] Coxeter, H.S.M. (1973). Regular Polytopes (3rd ed.). Dover. Ch. 2.

[2] Toussaint, G. (2013). The Geometry of Musical Rhythm: What Makes a “Good” Rhythm Good? CRC Press.

[3] See Fibonacci and Lateralus on this blog.

[4] See Twelve-TET and Group Theory on this blog.

[5] See Euclidean Rhythms on this blog.

The Oldest Algorithm in the World Plays the Clave

Mon, 07 Apr 2025 00:00:00 +0000

The first time I encountered the West African standard bell pattern it was in a Music and Physics seminar. The lecturer played a twelve-beat cycle on a wood block — seven strokes distributed unevenly but with a rightness that arrested the room. She then played the Cuban clave, the bossa nova timeline, a Bulgarian aksak rhythm. Different cultures, different instruments, different centuries. She asked whether there was a pattern. There was. It is named after a mathematician who died around 270 BCE.

Euclid’s Algorithm

Every student who has taken a number theory course has encountered the algorithm for computing the greatest common divisor of two positive integers. Given $a \geq b$, repeatedly replace $(a, b)$ with $(b, a \bmod b)$ until the remainder is zero; the last non-zero remainder is the GCD.

For example, $\gcd(8, 3)$:

$$8 = 2 \times 3 + 2 \;\Rightarrow\; \gcd(8, 3) = \gcd(3, 2)$$

$$3 = 1 \times 2 + 1 \;\Rightarrow\; \gcd(3, 2) = \gcd(2, 1)$$

$$2 = 2 \times 1 + 0 \;\Rightarrow\; \gcd(2, 1) = 1.$$

Three steps, result 1 (8 and 3 are coprime). The algorithm is efficient: the number of steps is at most proportional to the number of digits in the smaller input. It appears in Book VII of Euclid’s Elements, composed around 300 BCE, making it the oldest non-trivial algorithm in the Western mathematical tradition.
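As a minimal sketch, the same procedure in Python, recording each division step so the worked example can be checked. The name `euclid_steps` is just an illustrative choice.

```python
def euclid_steps(a: int, b: int) -> tuple[int, list[tuple[int, int, int, int]]]:
    """Return gcd(a, b) and the division steps (a, q, b, r) with a = q*b + r."""
    steps = []
    while b != 0:
        q, r = divmod(a, b)
        steps.append((a, q, b, r))
        a, b = b, r
    return a, steps


g, steps = euclid_steps(8, 3)
for a, q, b, r in steps:
    print(f"{a} = {q} x {b} + {r}")
print("gcd =", g)
```

For the pair (8, 3) this prints the three divisions shown above, followed by `gcd = 1`.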

Distributing Onsets: Toussaint’s Observation

In 2005, Godfried Toussaint — a computer scientist and ethnomusicologist at McGill University — published the observation that the problem of distributing $k$ musical onsets as evenly as possible among $n$ time slots has the same recursive structure as Euclid’s algorithm applied to the pair $(k, n-k)$ (Toussaint, 2005).

The algorithm that solves this distribution problem was independently discovered in nuclear physics. Bjorklund (2003), working on timing systems for the Spallation Neutron Source particle accelerator at Oak Ridge, needed to distribute $k$ beam-extraction pulses as evenly as possible among $n$ machine cycles. The algorithm he derived — Bjorklund’s algorithm — is mathematically equivalent to the Euclidean algorithm applied to the same pair of integers.

The resulting pattern is denoted $E(k, n)$: the Euclidean rhythm with $k$ onsets distributed among $n$ pulses. A 1 denotes an onset; a 0 denotes a rest.

Working Through $E(3, 8)$: The Tresillo

Let us derive $E(3, 8)$ — 3 onsets distributed in 8 pulses — step by step.

Start: 3 onset groups and 5 rest groups:

$$[1]\; [1]\; [1]\; [0]\; [0]\; [0]\; [0]\; [0]$$

Step 1: Distribute one rest group into each onset group, pairing until the shorter list is exhausted. Three pairs, with $5 - 3 = 2$ rest groups remaining:

$$[1,0]\; [1,0]\; [1,0]\; [0]\; [0]$$

Step 2: Now 3 longer groups and 2 shorter groups. Distribute one shorter group into each longer group, $3 - 2 = 1$ longer group unpaired:

$$[1,0,0]\; [1,0,0]\; [1,0]$$

Step 3: The two group types have different lengths and only one group of the shorter type remains; no further pairing is possible. Read the sequence left to right:

$$E(3, 8) = [1, 0, 0, 1, 0, 0, 1, 0].$$

This is the Cuban tresillo — one of the foundational rhythmic cells of Afro-Cuban music, used across son, salsa, and mambo. Its onset positions are $\{0, 3, 6\}$, giving gap sizes $[3, 3, 2]$: two wide gaps and one narrow gap, arranged as evenly as the integers allow.
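The pairing steps above can be sketched in Python. This is one possible rendering of the grouping procedure as described here, not necessarily the exact formulation in Bjorklund's report or Toussaint's paper. The function name and the stopping rule (stop once at most one group is left unpaired) are assumptions of this sketch.

```python
def euclidean_rhythm(k: int, n: int) -> list[int]:
    """Distribute k onsets (1) among n pulses (0 = rest) by repeated pairing."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    groups = [[1] for _ in range(k)]          # start: k onset groups
    remainder = [[0] for _ in range(n - k)]   # and n - k rest groups
    # Keep pairing until at most one group is left unpaired.
    while len(remainder) > 1:
        pairs = min(len(groups), len(remainder))
        merged = [groups[i] + remainder[i] for i in range(pairs)]
        # Whichever list was longer supplies the new remainder
        # (at most one of the two slices is non-empty).
        leftover = groups[pairs:] + remainder[pairs:]
        groups, remainder = merged, leftover
    return [bit for group in groups + remainder for bit in group]


print(euclidean_rhythm(3, 8))
```

For `(3, 8)` the intermediate lists are the bracketed groups shown in the steps above, and the result is the tresillo. With this stopping rule some outputs are rotations of the forms usually quoted: `euclidean_rhythm(2, 3)` returns `[1, 1, 0]`, for instance.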

The parallel with Euclid’s algorithm is direct. In the division $8 = 2 \times 3 + 2$, the quotient 2 gives the number of pairing steps before a remainder appears, and the remainder 2 gives the number of groups in the shorter list at each intermediate stage. The recursion $\gcd(8, 3) \to \gcd(3, 2) \to \gcd(2, 1)$ mirrors the three steps above.

The Gap Structure

For any $E(k, n)$, the spacing between consecutive onsets takes at most two values (exactly two whenever $k$ does not divide $n$):

$$\text{gap} \in \left\{\left\lfloor \frac{n}{k} \right\rfloor,\ \left\lceil \frac{n}{k} \right\rceil\right\}.$$

The number of each gap size is determined by the constraint that all $k$ gaps sum to $n$. Writing $\alpha = n \bmod k$:

$$\alpha \cdot \left\lceil \frac{n}{k} \right\rceil \;+\; (k - \alpha) \cdot \left\lfloor \frac{n}{k} \right\rfloor = n.$$

So $E(k,n)$ has $\alpha$ gaps of the larger size and $k - \alpha$ gaps of the smaller size. The Euclidean property is that these two gap types are distributed as evenly as possible among themselves — not clustered at one end of the cycle but interleaved. A cycle that maximises the minimum distance between any two consecutive onsets has this property; it is called maximally even (Clough and Douthett, 1991).

For $E(3, 8)$: $\lfloor 8/3 \rfloor = 2$, $\lceil 8/3 \rceil = 3$, $\alpha = 8 \bmod 3 = 2$. Two gaps of 3, one gap of 2. Gap sequence $[3, 3, 2]$. Maximum-evenness is why the tresillo sounds “right” even though it is asymmetric: the asymmetry is the smallest possible deviation from perfect regularity.
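A short check of the gap bookkeeping, as an illustrative sketch. The helper names `gaps` and `expected_gap_counts` are hypothetical, and the pattern is typed in literally rather than generated.

```python
def gaps(pattern: list[int]) -> list[int]:
    """Cyclic distances between consecutive onsets in a 0/1 pattern."""
    n = len(pattern)
    onsets = [i for i, bit in enumerate(pattern) if bit]
    k = len(onsets)
    # `or n` handles a single onset, whose only gap is the whole cycle.
    return [(onsets[(j + 1) % k] - onsets[j]) % n or n for j in range(k)]


def expected_gap_counts(k: int, n: int) -> dict[int, int]:
    """Map gap size -> how many gaps of that size E(k, n) should have."""
    small, alpha = divmod(n, k)          # floor(n/k) and n mod k
    if alpha == 0:
        return {small: k}                # k divides n: a single gap size
    return {small + 1: alpha, small: k - alpha}


tresillo = [1, 0, 0, 1, 0, 0, 1, 0]
print(gaps(tresillo))                    # [3, 3, 2]
print(expected_gap_counts(3, 8))         # {3: 2, 2: 1}
```

The second function only counts how many gaps of each size there must be. It says nothing about their order, which is where the evenness condition comes in.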

A Gallery of World Rhythms

The following table, derived from Toussaint (2005, 2020), shows Euclidean rhythms alongside their ethnomusicological identifications. Asterisks mark patterns given as rotations of the canonical form.

Pattern	Gap structure	Musical tradition
$E(2,3) = [1,0,1]$	$[2,1]$	Iambic foot; West African, Balkan
$E(3,8) = [1,0,0,1,0,0,1,0]$	$[3,3,2]$	Cuban tresillo; Flamenco
$E(5,8) = [1,0,1,1,0,1,1,0]$	$[2,1,2,1,2]^*$	Cuban cinquillo
$E(4,9) = [1,0,0,1,0,1,0,1,0]^*$	$[3,2,2,2]^*$	Turkish aksak patterns
$E(7,12) = [1,0,1,1,0,1,0,1,1,0,1,0]$	$[2,1,2,2,1,2,2]^*$	West African standard bell
$E(9,16)$	$[2,1,2,2,2,1,2,2,2]^*$	Brazilian and West African
$E(13,24)$		South Indian (Carnatic) tāla

Three of these are worth examining in more detail.

$E(5,8)$: the cinquillo. Five onsets in eight pulses: $\lfloor 8/5 \rfloor = 1$, $\lceil 8/5 \rceil = 2$, $\alpha = 3$. Three gaps of 2 and two gaps of 1. Gap sequence $[2,1,2,1,2]$ or a rotation thereof. The cinquillo is a fundamental pattern in Cuban music, used as a melodic rhythmic figure in the nineteenth-century contradanza and in much of what followed.

$E(7,12)$: the West African standard bell. Seven onsets in a twelve-beat cycle: $\lfloor 12/7 \rfloor = 1$, $\lceil 12/7 \rceil = 2$, $\alpha = 5$. Five gaps of 2 and two gaps of 1. This timeline, used across the Ewe, Akan, and many other traditions in West Africa, is the cyclic reference structure against which other rhythmic layers are measured in ensemble drumming. Up to rotation, its onset set is also the pitch-class set $\{0, 2, 4, 5, 7, 9, 11\}$: the Western diatonic scale, translated from pitch to rhythm. That the same maximally-even distribution describes both the diatonic scale in pitch space and the standard bell in rhythm is one of the more remarkable coincidences in mathematical music theory.

Universality across non-connected cultures. The tresillo $E(3,8)$ appears independently in Cuban music, Flamenco, Namibian Juǀ’hoansi music, and medieval Persian music (Toussaint, 2020). These traditions had no common musical ancestor that could have transmitted the pattern. The Euclidean algorithm produces what maximum evenness demands, and maximum evenness turns out to be what these rhythmic traditions independently converged on.

Circular Notation and Necklaces

Euclidean rhythms are most naturally represented as necklaces — equivalence classes of binary sequences under cyclic rotation. All rotations of $E(3,8)$ represent the same rhythmic structure with a different starting downbeat: the musical identity is independent of which position is designated “beat 1.”

In circular notation, place $n$ equally spaced dots on a circle and mark the $k$ onset positions. The pattern is immediately visible: the $k$ onset-dots divide the circle as evenly as possible. For $E(7,12)$, the seven onset dots on a twelve-position circle look like the seven vertices of a near-regular heptagon inscribed in a dodecagon. For $E(3,8)$, the three onset dots form a near-equilateral triangle.

This geometric representation makes the maximum-evenness property transparent in a way that the linear binary string does not. It also makes clear why Euclidean rhythms feel “balanced” when played: the onset dots distribute the “weight” of the cycle as uniformly as the integer constraints allow.

The mathematical theory of necklaces belongs to combinatorics on words. Euclidean rhythms correspond to specific equivalence classes of binary sequences known as Christoffel words (Lothaire, 2002): words over the alphabet $\{0,1\}$ whose combinatorial properties encode the slope of a line segment, which brings us to the third independent context in which the same algorithm appears.

The Bresenham Connection

Jack Bresenham’s line algorithm (1965) rasterises a line from $(0,0)$ to $(n,k)$ on a grid of integer pixels. At each column $x$, the algorithm tracks whether the fractional error accumulated since the last row increment exceeds $\frac{1}{2}$, and if so, increments the row and resets the error. The sequence of column positions at which the row increments is, up to rotation, the onset pattern $E(k,n)$.

Formally, an onset occurs at position $m$ in $E(k,n)$ if and only if:

$$\left\lfloor \frac{(m+1)\, k}{n} \right\rfloor > \left\lfloor \frac{m\, k}{n} \right\rfloor.$$

Up to rotation, the onset positions can also be written in closed form:

$$s_j = \left\lfloor \frac{j \cdot n}{k} \right\rfloor, \qquad j = 0, 1, \ldots, k-1.$$

For $E(3,8)$: $s_0 = 0$, $s_1 = \lfloor 8/3 \rfloor = 2$, $s_2 = \lfloor 16/3 \rfloor = 5$, giving onset positions $\{0, 2, 5\}$ — a rotation of the tresillo.

The floor inequality above is exactly the Bresenham increment condition. Drawing the line from $(0,0)$ to $(8,3)$ and marking where the $y$-coordinate takes a step produces the onset positions $\{2, 5, 7\}$, again a rotation of the tresillo $\{0, 3, 6\}$.
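Both floor formulas are easy to evaluate directly. The sketch below uses integer arithmetic only; the function names are illustrative, and `is_rotation` is a brute-force check added here for convenience.

```python
def onsets_by_increment(k: int, n: int) -> list[int]:
    """Positions m where floor((m+1)k/n) exceeds floor(mk/n)."""
    return [m for m in range(n) if ((m + 1) * k) // n > (m * k) // n]


def onsets_closed_form(k: int, n: int) -> list[int]:
    """Positions floor(j*n/k) for j = 0..k-1."""
    return [(j * n) // k for j in range(k)]


def is_rotation(a: list[int], b: list[int], n: int) -> bool:
    """True if onset set a, shifted by some amount modulo n, equals b."""
    return any({(x + shift) % n for x in a} == set(b) for shift in range(n))


tresillo = [0, 3, 6]
print(onsets_by_increment(3, 8), is_rotation(onsets_by_increment(3, 8), tresillo, 8))
print(onsets_closed_form(3, 8), is_rotation(onsets_closed_form(3, 8), tresillo, 8))
```

For $E(3,8)$ the two formulas give $\{2, 5, 7\}$ and $\{0, 2, 5\}$ respectively, and both are confirmed as rotations of $\{0, 3, 6\}$.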

Three independent fields — ancient Greek number theory, Afro-Caribbean percussion, and 1960s computer graphics — converge on the same mathematical object. This is not a coincidence. All three are solving the same fundamental problem: how to distribute $k$ discrete events as evenly as possible among $n$ slots. When the problem is universal, its solution is too.

Euclidean Rhythms in Contemporary Practice

Toussaint’s 2005 paper was primarily a contribution to computational ethnomusicology, but it reached electronic music production rapidly. Euclidean rhythm sequencers are now standard in modular synthesis (dedicated Eurorack hardware modules exist under names including “Euclidean” and “Erica Synths Pico”) and digital audio workstations (as Max for Live devices and software plug-ins). The interface is minimal: set $k$ and $n$, adjust the rotation offset, and hear the resulting timeline immediately.

This has opened a compositional mode in which the mathematical structure is operational: a producer constructing a layered African-style polyrhythm by stacking $E(3,8)$, $E(5,8)$, and $E(7,8)$ on different instruments is — whether they know it or not — computing the Euclidean algorithm three times and listening to the result.

Implications for Teaching Rhythm

Music conservatories in the European tradition teach rhythm almost entirely through Western notation: time signatures, note values, dotted notes, ties. This system is well-suited to the repertoire it was designed for. It handles Euclidean rhythms awkwardly. The tresillo $E(3,8)$ requires either a triplet feel against a binary pulse or a notation involving a dotted quarter note followed by a dotted quarter and a quarter, which correctly represents the sound but obscures the structural principle entirely.

The Euclidean framework suggests a different pedagogical starting point. Rather than beginning from the bar line and asking how notes fill it, begin from the cycle length $n$ and the onset count $k$ and ask how to distribute the onsets as evenly as possible. The answer is always computable and always produces a recognisable rhythm.

For students who encounter West African, Afro-Cuban, or Middle Eastern music, as conservatory students increasingly do, having a framework that makes these rhythms structurally necessary rather than culturally exotic changes the pedagogical relationship fundamentally. The tresillo is not a deviation from “normal” rhythm. It is the unique (up to rotation) maximally even solution to the problem of placing three beats in eight pulses. That the same algorithm appeared in a 300 BCE Alexandrian text on number theory is an accident of the history of mathematics. That it sounds right is not.

Whether conservatory curricula are ready to incorporate the Euclidean framework alongside Western notation is a separate question. The mathematics does not demand it. But it offers a language for rhythm that transcends the Western bar-line without abandoning precision — and that seems worth something, especially in a world where the music students will perform and teach is no longer exclusively European.

References

Bjorklund, E. (2003). The theory of rep-rate pattern generation in the SNS timing system. Technical Report SNS-NOTE-CNTRL-99, Spallation Neutron Source, Oak Ridge National Laboratory.
Bresenham, J. E. (1965). Algorithm for computer control of a digital plotter. IBM Systems Journal, 4(1), 25–30. https://doi.org/10.1147/sj.41.0025
Clough, J., & Douthett, J. (1991). Maximally even sets. Journal of Music Theory, 35(1–2), 93–173. https://doi.org/10.2307/843811
Lothaire, M. (2002). Algebraic Combinatorics on Words. Cambridge University Press.
Toussaint, G. T. (2005). The Euclidean algorithm generates traditional musical rhythms. In R. Sarhangi & J. Sharp (Eds.), Proceedings of BRIDGES 2005: Mathematical Connections in Art, Music, and Science (pp. 47–56). Bridges Conference.
Toussaint, G. T. (2020). The Geometry of Musical Rhythm: What Makes a “Good” Rhythm Good? (2nd ed.). Chapman & Hall/CRC Press.

When Musicians Lock In: Coupled Oscillators and the Physics of Ensemble Synchronisation

Thu, 08 Feb 2024 00:00:00 +0000

The problem is ancient and the language for it is recent. In any ensemble — a string quartet, a jazz rhythm section, an orchestra — musicians with slightly different internal tempos must stay together. They do this by listening to each other. But what, exactly, does “listening to each other” do to their timing? And what happens when the listening channel is imperfect — delayed by the speed of sound across a wide stage, or by a network cable crossing a continent? The answer involves a differential equation that was not written to describe music.

This post extends the latency analysis in Latency in Networked Music Performance with the dynamical systems framework that underlies it.

Two Clocks on a Board

The first documented observation of coupled-oscillator synchronisation was made not by a musician but by a physicist. In 1665, Christiaan Huygens, confined to bed with illness, was watching two pendulum clocks mounted on the same wooden beam. Over the course of the night, the pendulums had synchronised into anti-phase oscillation — swinging in opposite directions in exact unison. He reported it to his father:

“I have noticed a remarkable effect which no-one has observed before… two clocks on the same board always end up in mutual synchrony.”

The mechanism was mechanical coupling through the beam. Each pendulum’s swing imparted a small impulse to the wood; the other pendulum felt this as a perturbation to its rhythm. Small perturbations, accumulated over hours, drove the clocks into a shared frequency and a fixed phase relationship.

This is the prototype of every ensemble synchronisation problem. Each musician is a clock. The acoustic environment — the air in the room, the reflected sound from the walls, the vibrations through the stage floor — is the wooden beam.

The Kuramoto Model

Yoshiki Kuramoto formalised the mathematics of coupled oscillators in 1975, motivated by biological synchronisation problems: firefly flashing, circadian rhythms, cardiac pacemakers. His model considers $N$ oscillators, each with a phase $\theta_i(t)$ evolving according to:

$$\frac{d\theta_i}{dt} = \omega_i + \frac{K}{N} \sum_{j=1}^{N} \sin(\theta_j - \theta_i), \qquad i = 1, \ldots, N.$$

The first term, $\omega_i$, is the oscillator’s natural frequency — the tempo it would maintain in isolation. These are drawn from a distribution $g(\omega)$, which in a real ensemble reflects the spread of individual preferred tempos among the players. The second term is the coupling: each oscillator is attracted toward the phases of all others, with strength $K/N$. The factor $1/N$ keeps the total coupling intensive (independent of ensemble size) as $N$ grows large.

Musically: $\theta_i$ is the phase of musician $i$’s internal pulse at a given moment, $\omega_i$ is their preferred tempo if playing alone, and $K$ is the coupling strength — how much they adjust their tempo in response to what they hear from the others.

The Order Parameter and the Phase Transition

To measure the degree of synchronisation, Kuramoto introduced the complex order parameter:

$$r(t)\, e^{i\psi(t)} = \frac{1}{N} \sum_{j=1}^{N} e^{i\theta_j(t)},$$

where $r(t) \in [0, 1]$ is the coherence of the ensemble and $\psi(t)$ is the collective mean phase. When $r = 0$, the phases are uniformly spread around the unit circle — the ensemble is incoherent. When $r = 1$, all phases coincide — perfect synchrony. In a live ensemble, $r$ is a direct measure of rhythmic cohesion, though of course not one you can read off a score.
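Given a snapshot of phases, $r$ and $\psi$ are a one-line computation. A minimal sketch using only the standard library follows; the sample phase lists are made up for illustration and are not measured data.

```python
import cmath
import math


def order_parameter(phases: list[float]) -> tuple[float, float]:
    """Return (r, psi): coherence and mean phase of a list of phases in radians."""
    z = sum(cmath.exp(1j * theta) for theta in phases) / len(phases)
    return abs(z), cmath.phase(z)


# Eight players spread evenly around the cycle: incoherent, r close to 0.
spread = [2 * math.pi * j / 8 for j in range(8)]
print(order_parameter(spread)[0])

# Eight players within a few hundredths of a radian of each other: r close to 1.
tight = [0.30, 0.31, 0.29, 0.32, 0.28, 0.30, 0.31, 0.29]
print(order_parameter(tight))
```

When $r$ is essentially zero the returned $\psi$ carries no information, since the mean of the unit vectors has no well-defined direction.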

Substituting the order parameter into the equation of motion:

$$\frac{d\theta_i}{dt} = \omega_i + K r \sin(\psi - \theta_i).$$

Each oscillator now interacts only with the mean-field quantities $r$ and $\psi$, not with every other oscillator individually. The coupling pulls each musician toward the collective mean phase with a force proportional to both $K$ (how attentively they listen) and $r$ (how coherent the group already is).

This mean-field form reveals the essential physics. For small $K$, oscillators with widely differing $\omega_i$ cannot follow the mean field — they drift at their own frequencies, and $r \approx 0$. At a critical coupling strength $K_c$, a macroscopic fraction of oscillators suddenly locks to a shared frequency, and $r$ begins to grow continuously from zero. For a unimodal, symmetric frequency distribution $g(\omega)$ with density $g(\bar\omega)$ at the mean:

$$K_c = \frac{2}{\pi\, g(\bar\omega)}.$$

Just above $K_c$, the coherence grows with a square-root law, with a prefactor that depends on the shape of $g(\omega)$:

$$r \propto \sqrt{\frac{K - K_c}{K_c}}, \qquad K \gtrsim K_c.$$

This is a second-order (continuous) phase transition — the same mathematical structure as a ferromagnet approaching the Curie temperature, where spontaneous magnetisation appears continuously above a critical coupling. The musical ensemble and the magnetic material belong to the same universality class, governed by the same mean-field exponent $\frac{1}{2}$.

Above $K_c$, the fraction of oscillators that are locked (synchronised to the mean-field frequency) can be computed explicitly. An oscillator with natural frequency $\omega_i$ locks to the mean field if $|\omega_i - \bar\omega| \leq Kr$. For a Lorentzian distribution $g(\omega) = \frac{\gamma/\pi}{(\omega - \bar\omega)^2 + \gamma^2}$, this yields:

$$r = \sqrt{1 - \frac{K_c}{K}}, \qquad K_c = 2\gamma,$$

which is the exact solution of the self-consistency equation for the Kuramoto model with Lorentzian frequency spread (Strogatz, 2000).
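To get a feel for the threshold, the Lorentzian result can simply be tabulated. This sketch evaluates the closed-form expressions above and does not simulate the oscillators; the function names and the sample values of $K$ are arbitrary illustrative choices.

```python
import math


def critical_coupling(g_at_mean: float) -> float:
    """K_c = 2 / (pi * g(mean frequency)) for a unimodal, symmetric g."""
    return 2.0 / (math.pi * g_at_mean)


def lorentzian_coherence(K: float, gamma: float) -> float:
    """Steady-state r for a Lorentzian spread of half-width gamma."""
    K_c = critical_coupling(1.0 / (math.pi * gamma))   # equals 2 * gamma
    return 0.0 if K <= K_c else math.sqrt(1.0 - K_c / K)


gamma = 1.0
for K in (1.0, 2.0, 2.2, 3.0, 4.0, 8.0):
    print(f"K = {K:4.1f}  r = {lorentzian_coherence(K, gamma):.3f}")
```

Below $K_c = 2\gamma$ the coherence stays at zero. Above it, $r$ rises steeply at first and then saturates toward 1.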

The physical reading is direct: whether an ensemble locks into a shared pulse or drifts apart is a threshold phenomenon. A group of musicians with similar preferred tempos has a peaked $g(\bar\omega)$, giving a low $K_c$ — they synchronise easily with minimal attentive listening. A group with widely varying individual tempos needs stronger, more sustained coupling to cross the threshold. This is not a matter of musical discipline; it is a material property of the ensemble.

Concert Hall Applause: Neda et al. (2000)

The Kuramoto model is not only a theoretical construction. Neda et al. (2000) applied it to concert hall applause — one of the most direct real-world demonstrations of coupled-oscillator dynamics in a musical context.

They recorded applause in Romanian and Hungarian theaters and found that audiences spontaneously alternate between two distinct states. In the incoherent regime, each audience member claps at their own preferred rate (typically 2–3 Hz). Through acoustic coupling — each person hears the room-averaged sound and adjusts their clapping — the audience gradually synchronises to a shared, slower frequency (around 1.5 Hz): the synchronised regime.

The transitions between the two regimes are quantitatively consistent with the Kuramoto phase transition: the emergence of synchrony corresponds to $K$ crossing $K_c$ as people progressively pay more attention to the collective sound. Furthermore, Neda et al. document a characteristic phenomenon when synchrony breaks down: individual clapping frequency approximately doubles as audience members attempt to re-establish coherence. This frequency-doubling — a feature of nonlinear oscillator systems near instability — is exactly what the delayed response of coupling near $K_c$ predicts.

The paper is a useful pedagogical artefact: every music student has experienced concert hall applause, and hearing that it undergoes a physically measurable phase transition makes the connection between physics and musical experience concrete.

Latency and the Limits of Networked Ensemble Performance

In standard acoustic ensemble playing, the coupling delay is the propagation time for sound to cross the ensemble: at $343\ \text{m/s}$, across a ten-metre stage, roughly 30 ms. This is why orchestral seating is arranged with attention to who needs to hear whom first.

In networked music performance (NMP), the coupling delay $\tau$ is much larger: tens to hundreds of milliseconds depending on geographic distance and network infrastructure. The Kuramoto model generalises naturally to include this delay:

$$\frac{d\theta_i}{dt} = \omega_i + \frac{K}{N} \sum_{j=1}^{N} \sin\!\bigl(\theta_j(t - \tau) - \theta_i(t)\bigr).$$

Each musician hears the others’ phases as they were $\tau$ seconds ago, not as they are now.

In a synchronised state where all oscillators share the collective frequency $\bar\omega$ and phase $\psi(t) = \bar\omega t$, the delayed phase signal is $\psi(t - \tau) = \bar\omega t - \bar\omega\tau$. The effective coupling force contains a factor $\cos(\bar\omega\tau)$: the delay introduces a phase shift that reduces the useful component of the coupling. The critical coupling with delay is therefore:

$$K_c(\tau) = \frac{K_c(0)}{\cos(\bar\omega \tau)}.$$
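One heuristic way to see where the cosine comes from, under an assumption not made explicit above: suppose musician $i$ sits at a small fixed offset $\phi_i$ from the collective phase, $\theta_i(t) = \bar\omega t + \phi_i$, and use the mean-field form of the coupling with the delayed mean phase. Then

$$K r \sin\bigl(\psi(t-\tau) - \theta_i(t)\bigr) = K r \sin(-\bar\omega\tau - \phi_i) = -K r \cos(\bar\omega\tau)\,\sin\phi_i \;-\; K r \sin(\bar\omega\tau)\,\cos\phi_i.$$

The first term is the restoring pull toward the mean phase, now scaled by $\cos(\bar\omega\tau)$. The second term does not vanish at $\phi_i = 0$ and can be read as a shift of the shared frequency rather than as a correction. This is a sketch of the mechanism, not a full stability analysis.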

As $\tau$ increases, $K_c(\tau)$ grows: synchronisation requires progressively stronger coupling (more attentive adjustment) to compensate for the information lag. The denominator $\cos(\bar\omega\tau)$ reaches zero when $\bar\omega\tau = \pi/2$. At this point $K_c(\tau) \to \infty$: no finite coupling strength can maintain synchrony. The critical delay is:

$$\tau_c = \frac{\pi}{2\bar\omega}.$$

For an ensemble performing at 120 BPM, the beat frequency is $\bar\omega = 2\pi \times 2\ \text{Hz} = 4\pi\ \text{rad/s}$:

$$\tau_c = \frac{\pi}{2 \times 4\pi} = \frac{1}{8}\ \text{s} = 125\ \text{ms}.$$

This is a remarkably clean result. The Kuramoto model with delay predicts that ensemble synchronisation collapses at around 125 ms one-way delay for a standard performance tempo. The empirical literature on NMP — from LoLa deployments across European conservatories to controlled latency studies in the lab — consistently finds that rhythmic coherence degrades noticeably above 50–80 ms and becomes essentially unworkable above 100–150 ms one-way. The model and the data agree.

The derivation also shows why faster tempos are harder in NMP: $\tau_c \propto 1/\bar\omega$, so doubling the tempo halves the tolerable latency. An ensemble performing at 240 BPM in a distributed setting faces a theoretical ceiling of 62.5 ms, which rules out transcontinental performance for most repertoire.
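The tempo dependence can be tabulated with a small helper. This is a sketch of the formulas above only; the function names are illustrative, and treating the penalty as infinite for every $\tau \geq \tau_c$ is a choice made here, since the expression for $K_c(\tau)$ is only meaningful below the critical delay.

```python
import math


def critical_delay(bpm: float) -> float:
    """tau_c = pi / (2 * omega), in seconds, with omega = 2*pi*bpm/60."""
    omega = 2.0 * math.pi * bpm / 60.0
    return math.pi / (2.0 * omega)


def coupling_penalty(bpm: float, tau: float) -> float:
    """K_c(tau) / K_c(0) = 1 / cos(omega * tau); infinite at or beyond tau_c."""
    omega = 2.0 * math.pi * bpm / 60.0
    c = math.cos(omega * tau)
    if tau >= critical_delay(bpm) or c <= 0.0:
        return math.inf
    return 1.0 / c


for bpm in (60, 120, 240):
    print(f"{bpm} BPM: tau_c = {1000 * critical_delay(bpm):.1f} ms")
for tau_ms in (10, 30, 60, 100):
    print(f"120 BPM, {tau_ms} ms: K_c(tau)/K_c(0) = {coupling_penalty(120, tau_ms / 1000):.2f}")
```

At 120 BPM the penalty stays close to 1 for delays of a few tens of milliseconds and climbs quickly as the delay approaches 125 ms.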

Brains in Sync: EEG Hyperscanning

The Kuramoto framework has recently been applied at a neural level. EEG hyperscanning — simultaneous EEG recording from multiple participants during a shared musical activity — has shown that musicians performing together exhibit inter-brain synchronisation: coherent cortical oscillations at the frequency of the music are measurable between players (Lindenberger et al., 2009; Müller et al., 2013). The phase coupling between brains during joint performance is significantly higher than during solo performance and higher than for musicians playing simultaneously but without acoustic coupling.

This suggests that the Kuramoto coupling operates at two levels: the acoustic (each musician hears the other and adjusts physical timing) and the neural (each musician’s cortical oscillators entrain to the shared musical pulse). The question of which level is primary — whether neural synchrony causes or follows from acoustic synchrony — remains open.

A 2023 review by Demos and Palmer argues that pairwise Kuramoto-type coupling is insufficient to capture full ensemble dynamics. Group-level effects — the differentiation between leader and follower roles, the emergence of collective timing that no individual would produce alone — require nonlinear dynamical frameworks that go beyond mean-field averaging. The model that adequately describes a string quartet may need to be richer than the one that describes a population of identical fireflies.

What This Means for Teaching

The Kuramoto model reframes standard rehearsal intuitions in physical terms.

“Listen more” translates to “increase your effective coupling constant $K$.” A musician who plays without attending to others has set $K \approx 0$ and will drift freely according to their own $\omega_i$. Listening — actively adjusting tempo in response to what you hear — is not metaphorical. It is the physical mechanism of coupling, and its effect is to pull you toward the mean phase $\psi$ with a force $Kr\sin(\psi - \theta_i)$.

“Our tempos are too different” is a claim about $g(\bar\omega)$ and therefore about $K_c$. A group with a wide spread of natural tempos needs more and stronger listening to synchronise. This is not a moral failing but a parameter; it suggests that ensemble warm-up time or explicit tempo negotiation before a performance serves to reduce the spread of natural frequencies before the coupling has to do all the work.

Latency as a rehearsal experiment can be made explicit. Artificially delaying the acoustic return to one musician in an ensemble — via headphone monitoring with variable delay — allows students to experience directly how the coordination degrades as $\tau$ increases toward $\tau_c$. They feel the system approaching the phase transition without the theoretical framework, but the framework makes the experience interpretable afterward.

The click track replaces peer-to-peer Kuramoto coupling with an external forcing term: each musician locks to a shared reference with fixed $\omega$ rather than adjusting dynamically to the group mean. This eliminates the phase transition but also eliminates the adaptive dynamics — the micro-timing fluctuations and expressive rubato — that characterise live ensemble playing. It is a pedagogically important distinction, even if studios routinely make the pragmatic choice.

References

Demos, A. P., & Palmer, C. (2023). Social and nonlinear dynamics unite: Musical group synchrony. Trends in Cognitive Sciences, 27(11), 1008–1018. https://doi.org/10.1016/j.tics.2023.08.005
Huygens, C. (1665). Letter to his father Constantijn Huygens, 26 February 1665. In Œuvres complètes de Christiaan Huygens, Vol. 5, p. 243. Martinus Nijhoff, 1893.
Kuramoto, Y. (1975). Self-entrainment of a population of coupled non-linear oscillators. In H. Araki (Ed.), International Symposium on Mathematical Problems in Theoretical Physics (Lecture Notes in Physics, Vol. 39, pp. 420–422). Springer.
Kuramoto, Y. (1984). Chemical Oscillations, Waves, and Turbulence. Springer.
Lindenberger, U., Li, S.-C., Gruber, W., & Müller, V. (2009). Brains swinging in concert: Cortical phase synchronization while playing guitar. BMC Neuroscience, 10, 22. https://doi.org/10.1186/1471-2202-10-22
Müller, V., Sänger, J., & Lindenberger, U. (2013). Intra- and inter-brain synchronization during musical improvisation on the guitar. PLOS ONE, 8(9), e73852. https://doi.org/10.1371/journal.pone.0073852
Néda, Z., Ravasz, E., Vicsek, T., Brechet, Y., & Barabási, A.-L. (2000). Physics of the rhythmic applause. Physical Review E, 61(6), 6987–6992. https://doi.org/10.1103/PhysRevE.61.6987
Strogatz, S. H. (2000). From Kuramoto to Crawford: Exploring the onset of synchronization in populations of coupled oscillators. Physica D: Nonlinear Phenomena, 143(1–4), 1–20. https://doi.org/10.1016/S0167-2789(00)00094-4
Strogatz, S. H. (2003). Sync: How Order Emerges from Chaos in the Universe, Nature, and Daily Life. Hyperion.

Changelog

2026-01-14: Updated the author list for the Demos (2023) Trends in Cognitive Sciences reference to the published two authors (Demos & Palmer). The five names previously listed were from a different Demos paper.
2026-01-14: Changed “period-doubling” to “frequency-doubling.” When the clapping frequency doubles, the period halves; “frequency-doubling” is the precise term in this context.

The Impossible Heptagon

Mon, 15 Jan 2024 00:00:00 +0000

Danny Carey — drummer of Tool, one of the most rhythmically inventive musicians in rock — keeps a seven-pointed star on his kit and speaks about it using the language of sacred geometry. The heptagram appears in Tool’s visual artwork, in the Thelemic symbolism Carey draws on, in pre-modern cosmological diagrams, and in the decorative traditions of several cultures that had no contact with each other. The claim, loosely stated, is that seven-fold symmetry is privileged: that it reflects something structurally true, that its forms carry significance beyond the aesthetic.

The scientific reflex here is usually impatience. “Sacred geometry” occupies an uncomfortable cultural space — mathematically dressed, factually thin, reliant on the listener not checking claims too carefully. The golden ratio does not appear everywhere in nature. Most things described as sacred in this tradition are better described as things the speaker found surprising before learning a more precise vocabulary.

But the heptagon is genuinely strange. Not for the reasons usually given. For a different reason — a theorem.

The regular heptagon cannot be constructed with compass and straightedge.

Not “it is difficult.” Not “no one has found a construction yet.” The regular seven-sided polygon — all sides equal, all interior angles equal — is provably impossible to construct using an unmarked ruler and compass in finitely many steps. This has been known since 1801.

The Classical Constraint

Greek geometry restricted its tools deliberately. An unmarked straightedge draws lines through two known points. A compass draws circles centred at a known point with a given radius. No angle trisection. No markings. No graduated instruments. Just these two operations, applied one at a time, finitely many times.

Within this constraint, a great deal is achievable. A perpendicular bisector. An equilateral triangle. A regular pentagon — which requires the golden ratio and takes some work, but is reachable. A regular hexagon (trivially: six equilateral triangles around a centre).

Then: nothing for the heptagon. Greek geometers left no construction. Medieval Islamic mathematicians, who knew the regular polygon problem well, left no construction. Albrecht Dürer, in his 1525 Underweysung der Messung, gave an approximate construction that falls short by a small but nonzero margin. Each generation encountered the same wall.

In 1796, an 18-year-old Gauss proved that the regular 17-gon is constructible, a result so unexpected that he reportedly decided at that moment to become a mathematician rather than a philologist. In his 1801 Disquisitiones Arithmeticae he stated the complete characterisation of which regular polygons are constructible and which are not [1], proving the constructible half; the first published proof of the impossibility half came from Pierre Wantzel in 1837. The heptagon was definitively placed among the impossible.

Gauss’s Theorem

A regular $n$-gon is constructible with compass and straightedge if and only if $n$ has the form

$$n = 2^k \cdot p_1 \cdot p_2 \cdots p_m$$

where $k \geq 0$ and the $p_i$ are distinct Fermat primes — primes of the form $2^{2^j} + 1$.

The Fermat primes currently known:

$j$	$F_j = 2^{2^j}+1$	Prime?
0	3	✓
1	5	✓
2	17	✓
3	257	✓
4	65537	✓
5	4 294 967 297	✗ (Euler, 1732)
6	18 446 744 073 709 551 617	✗
⋮	⋮	no further Fermat primes known

Five Fermat primes are known, all identified by the seventeenth century. Fermat himself conjectured that all numbers of this form are prime; he was wrong already at $j = 5$, and every Fermat number whose status has been settled since then is composite. Whether any further Fermat primes exist remains an open problem.
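The rows of the table can be reproduced by brute force. The sketch below uses plain trial division, which is only practical here because the composite cases happen to have small factors; the helper name `smallest_factor` is an illustrative choice.

```python
def smallest_factor(n):
    """Return the smallest factor of n greater than 1 (n itself if n is prime)."""
    if n % 2 == 0:
        return 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return d
        d += 2
    return n


for j in range(7):
    f = 2 ** (2 ** j) + 1          # the Fermat number F_j
    p = smallest_factor(f)
    status = "prime" if p == f else f"composite, divisible by {p}"
    print(f"F_{j} = {f}: {status}")
```

The factor found for $F_5$ is 641, the one Euler identified in 1732.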

The constructible regular polygons therefore include the triangle (3), square (4), pentagon (5), hexagon (6), octagon (8), decagon (10), 15-gon, 17-gon, 257-gon, 65537-gon, and products of these with powers of 2. The 65537-gon was actually fully constructed by Johann Gustav Hermes, who spent around ten years on the computation in the 1880s and deposited a manuscript reportedly filling a large trunk at the University of Göttingen, where it remains.

Seven is prime, but $7 \neq 2^{2^j} + 1$ for any $j$ — it is not a Fermat prime. Therefore the regular heptagon is not on the list. It is not constructible.
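The criterion is easy to turn into a test. This is a sketch under one stated assumption: it uses only the five known Fermat primes, so it would miss polygons built on any Fermat prime that has not yet been discovered. The names `KNOWN_FERMAT_PRIMES` and `is_constructible` are illustrative.

```python
KNOWN_FERMAT_PRIMES = (3, 5, 17, 257, 65537)


def is_constructible(n):
    """True if the regular n-gon is constructible, by the Fermat-prime criterion."""
    if n < 3:
        raise ValueError("a polygon needs at least 3 sides")
    # Any power of 2 is allowed: repeated angle bisection.
    while n % 2 == 0:
        n //= 2
    # Each Fermat prime may appear at most once, so divide it out only once.
    for p in KNOWN_FERMAT_PRIMES:
        if n % p == 0:
            n //= p
    return n == 1


print([n for n in range(3, 21) if is_constructible(n)])
print(is_constructible(7), is_constructible(9), is_constructible(17))
```

Note that 9 fails even though 3 is a Fermat prime, because the prime appears twice.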

The Algebra Behind the Geometry

Why does the structure of Fermat primes determine constructibility? The connection goes through algebra [2][3].

Every compass-and-straightedge construction corresponds to solving a sequence of equations of degree at most 2. Intersecting two lines is a linear problem; intersecting a line with a circle, or two circles, is a quadratic one. Each step therefore either stays in the current field or passes to a quadratic extension, so after $k$ steps the numbers reachable lie in a field extension of $\mathbb{Q}$ (the rationals) whose degree over $\mathbb{Q}$ is a power of 2, at most $2^k$. Constructibility of a number therefore requires the degree of its minimal polynomial over $\mathbb{Q}$ to be a power of 2.

To construct a regular $n$-gon, you need to construct the angle $2\pi/n$, which requires constructing $\cos(2\pi/n)$. The question is: over what kind of field extension does $\cos(2\pi/n)$ sit?

For $n = 7$: let $\omega = e^{2\pi i/7}$, a primitive 7th root of unity. The minimal polynomial of $\omega$ over $\mathbb{Q}$ is the 7th cyclotomic polynomial

$$\Phi_7(x) = x^6 + x^5 + x^4 + x^3 + x^2 + x + 1,$$

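which is irreducible over $\mathbb{Q}$, giving $[\mathbb{Q}(\omega) : \mathbb{Q}] = 6$. Since $\cos(2\pi/7) = (\omega + \omega^{-1})/2$, the number $\omega$ is a root of the degree-2 polynomial $x^2 - 2\cos(2\pi/7)\,x + 1$ over $\mathbb{Q}(\cos 2\pi/7)$, and since $\omega$ is not real it does not already lie in that field. So $[\mathbb{Q}(\omega) : \mathbb{Q}(\cos 2\pi/7)] = 2$, and dividing the degrees gives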

$$[\mathbb{Q}(\cos 2\pi/7) : \mathbb{Q}] = 3.$$

Specifically, $c = \cos(2\pi/7)$ is the root of the irreducible cubic

$$8c^3 + 4c^2 - 4c - 1 = 0,$$

or equivalently, $\alpha = 2\cos(2\pi/7)$ satisfies

$$\alpha^3 + \alpha^2 - 2\alpha - 1 = 0.$$

The three roots of this cubic are $2\cos(2\pi/7)$, $2\cos(4\pi/7)$, and $2\cos(6\pi/7)$. By Vieta’s formulas their sum is $-1$ and their product is $1$; the sum can be verified directly from the identity $\cos(2\pi/7) + \cos(4\pi/7) + \cos(6\pi/7) = -1/2$.

The degree of the extension is 3. Three is not a power of 2. Therefore $\cos(2\pi/7)$ cannot be reached by any tower of quadratic extensions of $\mathbb{Q}$. Therefore the regular heptagon is not constructible. $\square$
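The cubic and its Vieta relations can be checked numerically. This is only a floating-point sanity check, not a proof of irreducibility; the variable names are illustrative.

```python
import math

# The three claimed roots: 2cos(2*pi*k/7) for k = 1, 2, 3.
roots = [2 * math.cos(2 * math.pi * k / 7) for k in (1, 2, 3)]


def cubic(a):
    """The polynomial a^3 + a^2 - 2a - 1 from the text."""
    return a ** 3 + a ** 2 - 2 * a - 1


residuals = [cubic(a) for a in roots]   # each should vanish up to rounding
root_sum = sum(roots)                   # Vieta: minus the a^2 coefficient
root_product = math.prod(roots)         # Vieta: minus the constant term

print("residuals:", residuals)
print("sum:", root_sum, "product:", root_product)
```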

Compare the pentagon: $\cos(2\pi/5) = (\sqrt{5}-1)/4$, satisfying the quadratic $4x^2 + 2x - 1 = 0$. Degree 2 — a power of 2. Constructible.

The 17-gon: the Galois group of $\mathbb{Q}(\zeta_{17})/\mathbb{Q}$ is $(\mathbb{Z}/17\mathbb{Z})^* \cong \mathbb{Z}/16\mathbb{Z}$, order $16 = 2^4$. The extension decomposes into four quadratic steps. This is exactly what Gauss computed at 18.

For 7: $(\mathbb{Z}/7\mathbb{Z})^* \cong \mathbb{Z}/6\mathbb{Z}$, order $6 = 2 \times 3$. The factor of 3 is the obstruction. The Galois group is not a 2-group, so the extension cannot be decomposed into quadratic steps. The heptagon is out of reach.
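The group orders quoted above are values of Euler's totient function, so the same test can be phrased as "is $\varphi(n)$ a power of 2?". A minimal sketch, with a deliberately naive totient and illustrative function names:

```python
from math import gcd


def totient(n):
    """Order of the multiplicative group (Z/nZ)^*, counted directly."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def is_power_of_two(m):
    """True if m is 1, 2, 4, 8, ..."""
    return m > 0 and (m & (m - 1)) == 0


for n in (5, 7, 17):
    order = totient(n)
    verdict = "quadratic tower exists" if is_power_of_two(order) else "obstructed"
    print(f"n = {n}: group order {order}, {verdict}")
```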

Sacred, Precisely

The phrase “sacred geometry” usually does work that “elegant mathematics” could do more honestly. But the heptagon is a case where something with genuine mathematical content sits underneath the mystical framing.

The Platonic tradition held that certain geometric forms exist as ideals — perfect, unchanging, more real than their physical approximations. The philosopher’s claim is that the heptagon exists in a realm beyond its material instantiation. The mathematician’s claim is: the heptagon is perfectly well-defined — seven equal sides, seven equal angles — but it cannot be reached from $\mathbb{Q}$ by the operations available to ruler and compass. You can approximate it to any desired precision. You can construct it exactly using origami, which allows angle trisection and is strictly more powerful than compass and straightedge [4]. But the classical constructive program — the one that reaches the pentagon, the hexagon, the 17-gon, the 65537-gon — cannot reach the heptagon.

There is a precise mathematical sense in which it lies outside the constructible world. Whether that constitutes sacredness is a question for a different kind of argument. But it is not nothing. The Pythagoreans were working without Galois theory; they had an intuition without the theorem. The theorem, when it came, confirmed that intuition about seven while explaining it more clearly than they could.

Carey’s intuition — that 7 sits outside the ordinary — is, by this route, formally correct.

What the Heptagram Is

The regular heptagon may be impossible to construct exactly, but the heptagram — the seven-pointed star — is perfectly drawable. Connecting every second vertex of an approximate regular heptagon gives $\{7/2\}$ in Schläfli notation [5]; connecting every third vertex gives $\{7/3\}$. Both are closed figures. Both appear throughout pre-modern symbolic traditions, which is unsurprising: they are the most intricate star polygons drawable with a single pen stroke before complexity outruns visibility.

They are also generators of rhythmic structure. Because 7 is prime, every step size from 1 to 6 is coprime to 7, so every star polygon on seven points visits all seven vertices in a single closed traversal. That is not true for six or eight points, where some step sizes close early: stepping by 2 around a hexagon returns to the start after only three vertices. This turns out to matter for how drum patterns are built across multiple bars. That connection, from the primality of 7 to the architecture of rhythmic accent cycles, is the subject of the companion post, Star Polygons and Drum Machines.
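The single-traversal property can be checked by walking the vertices. A small sketch, with vertices numbered 0 to $n-1$ and an illustrative helper name `star_traversal`:

```python
def star_traversal(n, step):
    """Vertices visited when stepping by `step` around n points, until returning to 0."""
    visited, v = [], 0
    while True:
        visited.append(v)
        v = (v + step) % n
        if v == 0:
            return visited


print("{7/2}:", star_traversal(7, 2))
print("{7/3}:", star_traversal(7, 3))
print("6 points, step 2:", star_traversal(6, 2))   # closes after 3 vertices
print("8 points, step 2:", star_traversal(8, 2))   # closes after 4 vertices
```

On seven points every step size from 1 to 6 produces a list of length 7; on six or eight points, any step that shares a factor with the number of points closes early.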

The broader series on mathematics in Tool’s music began with the Fibonacci structure embedded in the time signatures and syllable counts of “Lateralus” [6], and the group-theoretic structure underlying twelve-tone equal temperament provides the same algebraic scaffolding seen here [7].

References

[1] Gauss, C.F. (1801). Disquisitiones Arithmeticae. Leipzig: Fleischer. (§VII.)

[2] Stewart, I. (2004). Galois Theory (3rd ed.). CRC Press. Ch. 4.

[3] Conway, J.H. & Guy, R.K. (1996). The Book of Numbers. Springer. pp. 190–202.

[4] Hull, T. (2011). Solving cubics with creases: The work of Beloch and Lill. The American Mathematical Monthly, 118(4), 307–315. DOI: 10.4169/amer.math.monthly.118.04.307

[5] Coxeter, H.S.M. (1973). Regular Polytopes (3rd ed.). Dover. Ch. 2.

[6] See Fibonacci and Lateralus on this blog.

[7] See Twelve-TET and Group Theory on this blog.

Twelve Is Not an Accident: The Group Theory of Musical Tuning

Fri, 15 Dec 2023 00:00:00 +0000

Sit down at a piano and count the keys in one octave. Twelve. Seven white, five black, twelve total pitch classes before the pattern repeats. Ask a musician why twelve and they will probably say something about Western tradition, the church modes, or maybe vaguely gesture at the circle of fifths. Ask a musicologist and you might hear about Pythagoras, or the development of equal temperament in the Baroque period, or the well-tempered tuning systems of J. S. Bach. All of that history is real and worth knowing. But none of it explains why the number 12 works, and why every serious attempt at a usable keyboard instrument across widely separated cultures converges on the same cardinality.

The real answer is in number theory. Specifically, it is in the continued fraction expansion of a single irrational number: $\log_2(3/2)$. The number 12 is not a cultural choice. It is the smallest integer that gives a genuinely good rational approximation to that number — subject to the constraint that a human hand can navigate the resulting keyboard. Once you see the argument, the feeling of contingency evaporates completely. Twelve is forced on us.

Along the way, the same mathematical structure — the cyclic group $\mathbb{Z}_{12}$ — explains why Messiaen’s modes of limited transposition exist, why the circle of fifths closes exactly, and why certain chord types (augmented triads, diminished seventh chords, the whole-tone scale) have a strange self-similar quality that composers have exploited for centuries. If you want the full treatment of the Messiaen connection, I wrote a dedicated post: Messiaen, Modes, and the Group Theory of Harmony. Here I want to build the foundations from scratch, starting with the one interval that makes all of this necessary.

The interval that started everything

The perfect fifth has a frequency ratio of exactly 3:2. Play two strings in that ratio and the sound is stable, open, and unmistakably consonant — second only to the octave (2:1) in the hierarchy of simple intervals. The reason is physics: the overtone series of any vibrating string includes the fundamental frequency $f$, then $2f$, $3f$, $4f$, and so on. Two notes a perfect fifth apart share the overtone at $3f$ (for the lower note) and $2f'$ (for the upper note, where $f' = 3f/2$): those are the same frequency, $3f$. Shared overtones mean the two notes reinforce rather than fight each other. This is why the fifth sounds stable: it is literally built into the harmonic structure of physical vibration.

Humans discovered the fifth independently in ancient Greece, China, India, and Mesopotamia. It is not a cultural artifact [4]. Given that stability, it is natural to ask: can we build a complete pitch system by stacking fifths? Take a starting note, go up a fifth, up another, up another, and keep going. The notes you produce — C, G, D, A, E, B, F♯, … — are acoustically related to the starting point in a simple way, and they sound good together. This is the Pythagorean tuning system, and it underlies the construction of diatonic scales.

But here is the problem. A fifth raises the pitch by a factor of 3/2. An octave raises it by a factor of 2. These are independent: no positive power of 3/2 will ever equal a power of 2 exactly. In the language of modern mathematics, $\log_2(3/2)$ is irrational. This follows directly from the fundamental theorem of arithmetic: if $\log_2(3/2) = p/q$ with positive integers $p$ and $q$, then $(3/2)^q = 2^p$, so $3^q = 2^{p+q}$, and no power of 3 equals a power of 2 (one side is odd, the other even). The number is in fact transcendental, by the Gelfond–Schneider theorem, but what matters for tuning is the irrationality alone. Stacking pure fifths and stacking octaves are incommensurable operations. The circle of fifths can never close in pure Pythagorean tuning. We will always end up slightly sharp or flat relative to where we started.

This incommensurability is the central problem of musical tuning. Everything else — equal temperament, just intonation, meantone tuning, the Pythagorean comma, the whole apparatus of tuning theory — is a response to it.

Equal temperament and the approximation problem

In an equal temperament with $N$ notes per octave, we divide the octave into $N$ equal logarithmic steps. Each step corresponds to a frequency ratio of $2^{1/N}$. We then ask: how many steps $k$ gives the best approximation to a perfect fifth?

The condition is simply that $2^{k/N}$ should be close to $3/2$, which means $k/N$ should be close to $\log_2(3/2)$. So we need a good rational approximation to

$$\log_2\!\left(\frac{3}{2}\right) = \log_2 3 - 1 \approx 0.584962\ldots$$

The classical tool for finding best rational approximations is the continued fraction. Any real number $x$ can be written as

$$x = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}}$$

where $a_0$ is an integer and the $a_i$ for $i \geq 1$ are positive integers, called the partial quotients. For $\log_2(3/2)$ the expansion is

$$\log_2\!\left(\frac{3}{2}\right) = [0;\, 1,\, 1,\, 2,\, 2,\, 3,\, 1,\, 5,\, 2,\, 23,\, 2,\, \ldots]$$

The truncated continued fractions — the convergents — give the sequence of best rational approximations:

$$\frac{0}{1},\quad \frac{1}{1},\quad \frac{1}{2},\quad \frac{3}{5},\quad \frac{7}{12},\quad \frac{24}{41},\quad \frac{31}{53},\quad \frac{179}{306},\quad \ldots$$

Each convergent $k/N$ corresponds to a tuning system: the denominator $N$ is the number of equal steps per octave, and the numerator $k$ is the number of steps that best approximates a fifth. So we get: 1-TET (trivial), 2-TET (trivial), 5-TET, 12-TET, 41-TET, 53-TET, 306-TET, and so on [1], [2].
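The expansion and its convergents can be reproduced with exact rational arithmetic, avoiding floating-point logarithms altogether. The sketch below uses the repeated-division idea usually attributed to Shanks: count how many times the smaller ratio fits into the larger one, then swap. The function names are illustrative, and the method assumes the logarithm is irrational (otherwise the loop would not terminate).

```python
from fractions import Fraction


def log_partial_quotients(base, x, count):
    """First `count` partial quotients of log_base(x), for rationals 1 < x < base."""
    big, small = Fraction(base), Fraction(x)
    terms = [0]                      # a_0 = 0 because x < base
    while len(terms) < count:
        n = 0
        while big >= small:          # how many copies of `small` fit into `big`?
            big /= small
            n += 1
        terms.append(n)
        big, small = small, big      # continue with the leftover interval
    return terms


def convergents(terms):
    """Convergents h/k from the standard recurrence h_n = a_n*h_(n-1) + h_(n-2)."""
    h_prev, h = 1, terms[0]
    k_prev, k = 0, 1
    result = [Fraction(h, k)]
    for a in terms[1:]:
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        result.append(Fraction(h, k))
    return result


terms = log_partial_quotients(2, Fraction(3, 2), 8)
fractions_found = convergents(terms)
print(terms)
print([f"{c.numerator}/{c.denominator}" for c in fractions_found])
```

The leftover intervals that appear along the way are familiar ones: 4/3 (the fourth), 9/8 (the whole tone), 256/243, and then 531441/524288.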

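The key property of convergents is that they are best approximations. No rational number with a smaller denominator comes closer to the true value than a convergent does, so 7/12 is not merely a decent approximation to $\log_2(3/2)$: it is provably the best approximation with denominator at most 12. The stronger statement concerns the accumulated error $|N \log_2(3/2) - k|$, the amount by which $N$ stacked fifths miss $k$ octaves: by that measure no denominator below 41 improves on 12. (In raw per-fifth accuracy one intermediate system does slightly better: 29-TET, from the semiconvergent 17/29, has a fifth about 1.5 cents sharp.)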

To put numbers on it: in 12-TET, the fifth is $2^{7/12} \approx 1.498307\ldots$, while the true fifth is exactly $1.500000$. The error is about 0.11%, or roughly 2 cents (hundredths of a semitone). In 53-TET, the fifth is $2^{31/53} \approx 1.499941\ldots$, an error of less than 0.004%, about 0.07 cents — essentially indistinguishable from pure. Both 12 and 53 are convergents. Intermediate values like 19-TET or 31-TET are not convergents (they are not best approximations), and their fifths, while sometimes used in experimental or microtonal music, are less accurate relative to their complexity.
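The same comparison can be tabulated for any $N$. A short sketch, where `fifth_error_cents` is an illustrative helper that picks the nearest step count and reports the signed error in cents (negative means flat):

```python
import math

PURE_FIFTH = math.log2(3 / 2)   # size of the pure fifth, in octaves


def fifth_error_cents(n):
    """Best step count k in n-TET and the signed error of its fifth in cents."""
    k = round(n * PURE_FIFTH)
    return k, 1200 * (k / n - PURE_FIFTH)


for n in (5, 12, 19, 29, 31, 41, 53):
    k, err = fifth_error_cents(n)
    print(f"{n:>2}-TET: fifth = {k} steps, error = {err:+.3f} cents")
```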

Why does this matter? Because a tuning system that approximates the fifth poorly will produce harmonies that beat audibly: the slight mistuning causes the sound to waver in a way that trained ears find uncomfortable in sustained chords. A good fifth approximation is not a luxury; it is the condition for the system to be musically usable in the harmonic practice that most of the world’s music assumes.

The Pythagorean comma

Before equal temperament became standard (roughly the 18th century in Western Europe), instruments were tuned using pure Pythagorean fifths: exact 3:2 ratios, stacked on top of each other. This gives beautiful, stable individual fifths, but it collects a debt.

After stacking 12 pure fifths, you have climbed in frequency by $(3/2)^{12}$:

$$(3/2)^{12} = \frac{3^{12}}{2^{12}} = \frac{531441}{4096} \approx 129.746\ldots$$

Meanwhile, 7 octaves is $2^7 = 128$. The ratio between these is

$$\frac{(3/2)^{12}}{2^7} = \frac{3^{12}}{2^{19}} = \frac{531441}{524288} \approx 1.01364$$

This is the Pythagorean comma: roughly 23.46 cents, or about a quarter of a semitone [4]. In Pythagorean tuning, the circle of fifths never closes. After 12 fifths you arrive at a note that is nominally the same pitch class as the starting point — but sharp by 23.46 cents. That final fifth, the one that “should” close the circle, sounds badly out of tune. It was historically called the “wolf fifth” because it howls.

Equal temperament solves this by distributing the comma across all 12 fifths. Each fifth is flattened by $23.46/12 \approx 1.955$ cents. The individual fifths are no longer pure, but the error is small enough to be acceptable — and crucially, it is uniform, so every key sounds equally good (or equally impure, depending on your perspective).

The Pythagorean comma being small (about 1.96% of the octave) is precisely why 12-TET works. It is small because 7/12 is an unusually good convergent of $\log_2(3/2)$. The two facts are the same fact: measured in octaves, the comma is exactly $12\log_2(3/2) - 7 = 12\,(\log_2(3/2) - 7/12)$, the per-fifth approximation error accumulated over 12 fifths. When the approximation is good, the comma is small, and the distribution is imperceptible. This is why the piano is tuned the way it is.
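The arithmetic of the comma can be checked exactly. A minimal sketch using Python's `fractions` module for the ratio and a floating-point logarithm only for the conversion to cents; the variable names are illustrative.

```python
import math
from fractions import Fraction

# Twelve pure fifths against seven octaves, as an exact ratio.
comma = Fraction(3, 2) ** 12 / Fraction(2) ** 7
comma_cents = 1200 * math.log2(float(comma))

# Equal temperament spreads the comma evenly over the twelve fifths.
per_fifth_cents = comma_cents / 12
pure_fifth_cents = 1200 * math.log2(3 / 2)
tempered_fifth_cents = pure_fifth_cents - per_fifth_cents

print(f"comma = {comma.numerator}/{comma.denominator} = {comma_cents:.2f} cents")
print(f"each fifth flattened by {per_fifth_cents:.3f} cents "
      f"-> {tempered_fifth_cents:.3f} cents")
print(f"fraction of the octave: {comma_cents / 1200:.2%}")
```

The tempered fifth comes out at 700 cents, seven equal semitones.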

The group theory

We are now ready for the algebra. In 12-TET, pitch classes form the set $\{0, 1, 2, \ldots, 11\}$ where we identify 0 with C, 1 with C♯, 2 with D, 3 with D♯, 4 with E, 5 with F, 6 with F♯, 7 with G, 8 with G♯, 9 with A, 10 with A♯, and 11 with B. Addition is modulo 12: after 11 comes 0 again, because after B comes C in the next octave (same pitch class). This is $\mathbb{Z}_{12}$, the integers mod 12, and it is a group under addition [1].

Transposition by a semitone is addition of 1. Transposition by a perfect fifth is addition of 7, because the fifth is 7 semitones in 12-TET. Start from C (0) and repeatedly add 7, always reducing modulo 12:

$$0 \to 7 \to 14 \equiv 2 \to 9 \to 16 \equiv 4 \to 11 \to 18 \equiv 6 \to 13 \equiv 1 \to 8 \to 15 \equiv 3 \to 10 \to 17 \equiv 5 \to 12 \equiv 0$$

In note names: C, G, D, A, E, B, F♯, C♯, G♯, D♯/E♭, A♯/B♭, F, C. That is the circle of fifths — all 12 pitch classes visited exactly once before returning to the start. The circle of fifths is the orbit of 0 under repeated addition of 7 in $\mathbb{Z}_{12}$.

Why does the orbit visit all 12 elements? Because $\gcd(7, 12) = 1$. This is Bézout’s identity applied to cyclic groups: an element $g$ generates $\mathbb{Z}_n$ (i.e., its orbit under repeated addition covers all of $\mathbb{Z}_n$) if and only if $\gcd(g, n) = 1$. The generators of $\mathbb{Z}_{12}$ are exactly the elements coprime to 12: that is $\{1, 5, 7, 11\}$. Musically: transposition by 1 semitone (chromatic scale), by 5 semitones (perfect fourth), by 7 semitones (perfect fifth), or by 11 semitones (major seventh) each generates all 12 pitch classes. Transposition by 2 (a whole tone) does not: it produces only the 6-element whole-tone scale. Transposition by 3 (a minor third) produces only the 4-element diminished seventh chord.
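The orbit computation is short enough to run directly. In this sketch the note spellings use sharps only, and `orbit` is an illustrative helper name.

```python
from math import gcd

NOTE_NAMES = ["C", "C♯", "D", "D♯", "E", "F", "F♯", "G", "G♯", "A", "A♯", "B"]


def orbit(step, n=12):
    """Pitch classes reached from 0 by repeatedly adding `step` modulo n."""
    seen, x = [], 0
    while True:
        seen.append(x)
        x = (x + step) % n
        if x == 0:
            return seen


circle_of_fifths = orbit(7)
generators = [g for g in range(1, 12) if gcd(g, 12) == 1]

print(" ".join(NOTE_NAMES[p] for p in circle_of_fifths))
print("generators:", generators)
print("step 2:", orbit(2), " step 3:", orbit(3))
```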

This is not a curiosity; it is the algebraic skeleton of tonal music. The circle of fifths closes because 7 and 12 are coprime. That coprimality is guaranteed by the continued fraction structure: the numerator and denominator of a convergent in lowest terms are always coprime (as they must be, being a reduced fraction), and 7/12 is such a convergent.

Now consider the subgroups of $\mathbb{Z}_{12}$. By Lagrange’s theorem, subgroups of a finite group must have orders dividing the group order. The divisors of 12 are 1, 2, 3, 4, 6, and 12, so these are the only possible subgroup orders. For cyclic groups there is exactly one subgroup of each order dividing $n$, and it is generated by $n/d$ where $d$ is the subgroup order. The full list:

The trivial subgroup of order 1 is just $\{0\}$. The subgroup of order 2 is $\{0, 6\}$, generated by 6 — that is, the tritone axis, the interval of exactly half an octave. The subgroup of order 3 is $\{0, 4, 8\}$, generated by 4 — this is the augmented triad, three notes equally spaced around the octave by major thirds. The subgroup of order 4 is $\{0, 3, 6, 9\}$, generated by 3 — the diminished seventh chord, four notes equally spaced by minor thirds. The subgroup of order 6 is $\{0, 2, 4, 6, 8, 10\}$, generated by 2 — the whole-tone scale. And the full group of order 12 is all of $\mathbb{Z}_{12}$.

Each of these has a musical life. The augmented triad ($\{0, 4, 8\}$) sounds ambiguous because it maps onto itself under transposition by a major third — there are only 4 distinct augmented triads total, not 12. Composers exploit this ambiguity when they want harmonic instability without committing to a direction. The diminished seventh ($\{0, 3, 6, 9\}$) is similarly ambiguous: it has only 3 distinct forms and can resolve to any of several keys, which is why it appears so often at structural pivots in Romantic music. These properties are direct consequences of the subgroup structure of $\mathbb{Z}_{12}$.
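Both the subgroup list and the counts of distinct chords can be generated mechanically. A sketch with two illustrative helpers: `subgroup` builds the subgroup of a given order from its generator $12/d$, and `distinct_transpositions` counts how many different pitch-class sets the twelve transpositions of a chord produce.

```python
def subgroup(order, n=12):
    """The unique subgroup of Z_n with the given order (order must divide n)."""
    generator = n // order
    return [(generator * i) % n for i in range(order)]


def distinct_transpositions(pitch_classes, n=12):
    """Number of different sets among the n transpositions of a chord or scale."""
    return len({frozenset((p + t) % n for p in pitch_classes) for t in range(n)})


for order in (1, 2, 3, 4, 6, 12):
    print(order, subgroup(order))

print("augmented triad:", distinct_transpositions([0, 4, 8]))
print("diminished seventh:", distinct_transpositions([0, 3, 6, 9]))
print("major triad:", distinct_transpositions([0, 4, 7]))
```

The major triad, which is not built from a subgroup, keeps all 12 transpositions distinct.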

Messiaen’s modes as cosets

Olivier Messiaen described his “modes of limited transposition” in his 1944 treatise Technique de mon langage musical. He identified seven scales — including the whole-tone scale and the octatonic scale — that have the peculiar property of mapping onto themselves under some transposition strictly smaller than an octave. He found them by ear, by introspection, and by exhaustive search at the keyboard. He did not have the group theory. But the group theory makes their existence not merely explainable but inevitable.

Here is the key definition. A scale $S \subseteq \mathbb{Z}_{12}$ is a mode of limited transposition if there exists some $t \in \{1, 2, \ldots, 11\}$ such that $S + t \equiv S \pmod{12}$ (as a set). In other words, transposing the scale by $t$ semitones maps the scale onto itself. The integer $t$ is called a period of the scale.

Now, the set of all periods of $S$ — together with 0 — forms a subgroup of $\mathbb{Z}_{12}$ (it is closed under addition modulo 12, since if both $t_1$ and $t_2$ are periods then so is $t_1 + t_2$). Call this subgroup $H$. The condition for $S$ to be a mode of limited transposition is simply that $H$ is nontrivial — that is, $H \neq \{0\}$.

Moreover, if $H$ is the period subgroup of $S$, then $S$ must be a union of cosets of $H$ in $\mathbb{Z}_{12}$. This follows immediately from the fact that $H$ acts on $S$ by translation and maps $S$ to itself: every element of $S$ belongs to exactly one coset of $H$, and $S$ is a union of whole cosets. The size of $S$ must therefore be a multiple of $|H|$.

The whole-tone scale $\{0, 2, 4, 6, 8, 10\}$ is itself the unique subgroup of order 6 in $\mathbb{Z}_{12}$. Its period subgroup is the whole-tone scale itself. Transposing by any even number (2, 4, 6, 8, or 10) maps it to itself. Transposing by an odd number gives the complementary whole-tone scale $\{1, 3, 5, 7, 9, 11\}$. There are therefore only 2 distinct transpositions of the whole-tone scale, not 12.

The octatonic (diminished) scale $\{0, 1, 3, 4, 6, 7, 9, 10\}$ has period subgroup $\{0, 3, 6, 9\}$ — the subgroup of order 4. It is a union of two cosets: $\{0, 3, 6, 9\}$ itself and $\{1, 4, 7, 10\}$. Transposing by 3 maps it onto itself. There are only 3 distinct transpositions. Messiaen calls this his Mode 2.

The general formula is clean: a mode of limited transposition with period subgroup of order $d$ has exactly $12/d$ distinct transpositions. For the whole-tone scale, $d = 6$ gives $12/6 = 2$ transpositions. For the octatonic scale, $d = 4$ gives $12/4 = 3$ transpositions.
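The period subgroup of any scale can be found by trying all twelve transpositions. A minimal sketch, with `period_subgroup` as an illustrative helper name; the major scale is included as a contrast case whose only period is 0.

```python
def period_subgroup(scale, n=12):
    """All t in Z_n such that transposing the scale by t gives the same set."""
    pcs = set(scale)
    return [t for t in range(n) if {(p + t) % n for p in pcs} == pcs]


SCALES = {
    "whole-tone": [0, 2, 4, 6, 8, 10],
    "octatonic": [0, 1, 3, 4, 6, 7, 9, 10],
    "major": [0, 2, 4, 5, 7, 9, 11],
}

for name, scale in SCALES.items():
    H = period_subgroup(scale)
    # 12 / |H| distinct transpositions, as in the formula above.
    print(f"{name}: H = {H}, distinct transpositions = {12 // len(H)}")
```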

What Messiaen found by ear was the complete classification of subsets of $\mathbb{Z}_{12}$ that are unions of cosets of a nontrivial subgroup [5]. The group theory makes their existence a theorem rather than a discovery. I find this genuinely beautiful: a composer’s intuition about harmonic symmetry turns out to be an exercise in the theory of cosets of cyclic groups. For the full analysis of each of Messiaen’s seven modes in these terms, see Messiaen, Modes, and the Group Theory of Harmony.

Why not 53?

Given that 53-TET approximates the fifth with an error of less than 0.004% — compared to 12-TET’s 0.11% — one might ask why we do not simply use 53-TET. The mathematical case is overwhelming. In addition to the nearly perfect fifth, 53-TET gives excellent approximations to the just major third (frequency ratio 5:4) and the just minor third (6:5). It was seriously advocated by the 19th-century theorist Robert Holford Macdowall Bosanquet, who even built a 53-key harmonium to demonstrate it. The Chinese theorist Jing Fang described a 53-note system in the 1st century BC. The Arabic music theorist Al-Farabi considered 53-division scales in the 10th century. Everyone who has ever thought carefully about tuning arrives at 53 eventually.

And yet no 53-TET instrument has ever entered widespread use. The reason is anatomical, not mathematical. A piano with 53 keys per octave would span well over a metre per octave if every key were as wide as a standard white key (roughly 23 mm), far beyond the reach of any hand. A guitar with 53 frets per octave has frets spaced roughly 3–4 millimetres apart in the upper register: no human fingertip is narrow enough to press a single fret without touching its neighbours. Even if you could play it, reading 53-TET notation would require an entirely new theoretical and pedagogical apparatus.

The constraint is: we want the largest $N$ such that (a) $N$ is a convergent denominator of $\log_2(3/2)$, so the fifth approximation is genuinely good, and (b) $N$ is small enough to navigate with human hands and readable at a glance. The convergent denominators are 1, 2, 5, 12, 41, 53, 306, … Of these, 12 is the largest that satisfies condition (b). The next convergent, 41, already strains human dexterity — 41-TET keyboard instruments have been built experimentally but never mass-produced. At 53 the case is closed.

One might argue about where exactly the cutoff is, and reasonable people might draw it at 19 or 31 (which are not convergents but have other virtues). But the point is that 12 is not merely a local optimum found by trial and error. It is the specific value where the continued fraction and human physiology intersect.

Closing

There is something I find genuinely satisfying about this argument. Music feels like the most human of activities — expressive, cultural, steeped in history and tradition. And yet the number 12, which lies at the foundation of so much of the world’s music, is not a human choice at all. It is the continued-fraction convergent of an irrational number that was fixed by the physics of vibrating strings long before any human struck a tuning fork.

The circle of fifths closes because $\gcd(7, 12) = 1$: a fact about integers, not about culture. Messiaen’s modes exist because $\mathbb{Z}_{12}$ has nontrivial proper subgroups: a fact about cyclic groups, not about 20th-century French aesthetics. The augmented triad sounds ambiguous because it is a coset of the order-3 subgroup of $\mathbb{Z}_{12}$: a fact about quotient groups, not about Romantic harmony conventions.

I came to music theory sideways — through acoustics, then signal processing, then the mathematics of scales. What surprised me, when I finally worked through the continued fraction argument properly, was not that the math existed but that it was so tight. There is essentially no freedom in the answer. Given the constraint that a musical scale should be built around the most consonant interval (after the octave), should form a closed group structure, and should be navigable by a human performer, the answer is 12. Not approximately 12, not 12 as a historical compromise. Exactly 12.

The number is not a tradition. It is a theorem.

For more on related themes: the Fibonacci sequence and golden ratio in music appear in Fibonacci, Lateralus, and the Golden Ratio. The Euclidean algorithm and rhythmic structure are explored in Euclidean Rhythms — a sister post to this one in the math-and-music thread. And for the physics of audio sampling rates, where a similar interplay of number theory and practical constraints forces another specific number, see Why 44,100 Hz?.

References

[1] Balzano, G. J. (1980). The group-theoretic description of 12-fold and microtonal pitch systems. Computer Music Journal, 4(4), 66–84.

[2] Carey, N., & Clampitt, D. (1989). Aspects of well-formed scales. Music Theory Spectrum, 11(2), 187–206.

[3] Milne, A., Sethares, W. A., & Plamondon, J. (2007). Isomorphic controllers and dynamic tuning. Computer Music Journal, 31(4), 15–32.

[4] Lloyd, L. S., & Boyle, H. (1978). Intervals, Scales and Temperaments. St. Martin’s Press.

[5] Douthett, J., & Steinbach, P. (1998). Parsimonious graphs: A study in parsimony, contextual transformations, and modes of limited transposition. Journal of Music Theory, 42(2), 241–263.

Changelog

2025-11-20: Updated the spelling of “Robert Holford Macdowall Bosanquet” (previously rendered as “Macdowell”).
2025-11-20: Changed “about 1.36% of the octave” to “about 1.96% of the octave.” The 1.36% figure is the frequency ratio above unity (531441/524288 ≈ 1.01364); the logarithmic fraction of the 1200-cent octave is 23.46/1200 ≈ 1.96%.
2025-11-20: Changed “12 octaves’ worth of accumulation” to “12 fifths’ worth of accumulation.” The Pythagorean comma accumulates over 12 stacked fifths (which span approximately 7 octaves), not 12 octaves.

The Charm of Impossibilities: Group Theory and Messiaen's Modes of Limited Transposition

Wed, 19 Apr 2023 00:00:00 +0000

I first encountered Messiaen’s second mode, the octatonic scale, in an analysis seminar during my physics studies, played by a colleague on an upright piano in a rehearsal room with terrible acoustics. She demonstrated something that stopped me: no matter how many times she transposed the scale up by a minor third, she could never find a “new” version; each time she landed on the same eight notes. Transposing by semitones instead, she was back where she started after three steps. She called it the charm of impossibilities. It took me years to understand why it is impossible, and longer still to see that the answer is not musical but algebraic.

This post is a companion to Fibonacci, the Golden Ratio, and Tool’s Lateralus, which found number theory in a prog-rock song. Here we find abstract algebra in twentieth-century sacred music.

Pitch Classes and the Chromatic Clock

Western music divides the octave into twelve equal semitones. For purposes of harmony and counterpoint, the absolute pitch is often less important than the pitch class — the equivalence class of all pitches related by octave transposition. Middle C and the C two octaves above belong to the same pitch class.

We label the twelve pitch classes $0, 1, 2, \ldots, 11$, with $0 = \mathrm{C}$, $1 = \mathrm{C}\sharp/\mathrm{D}\flat$, $2 = \mathrm{D}$, and so on up to $11 = \mathrm{B}$. Addition is taken modulo 12: the integers wrap around like a clock face, with $11 + 2 = 1$ (two semitones above B is C$\sharp$).

The set of pitch classes with this operation is a group:

$$\mathbb{Z}_{12} = \{0, 1, 2, \ldots, 11\}, \qquad x \oplus y = (x + y) \bmod 12.$$

This is the cyclic group of order 12. It has an identity element ($0$, “no transposition”), every element has an inverse ($-n \bmod 12$), and the operation is associative. If you are used to thinking about the chromatic scale as a linear sequence ending at the octave, $\mathbb{Z}_{12}$ is the insistence that it is actually a circle.

Musical Operations as Group Elements

Two operations are fundamental in tonal and post-tonal music theory.

Transposition by $n$ semitones maps every pitch class up by $n$:

$$T_n \colon x \mapsto x + n \pmod{12}.$$

The twelve transpositions $T_0, T_1, \ldots, T_{11}$ are exactly the elements of $\mathbb{Z}_{12}$, with $T_n$ corresponding to the integer $n$. Composing two transpositions gives a transposition: $T_m \circ T_n = T_{m+n}$.

Inversion reflects the pitch-class circle:

$$I \colon x \mapsto -x \pmod{12}.$$

Inversion maps C to C, D to B$\flat$, E to A$\flat$, and so on — it is the mirror symmetry of the chromatic circle about the C/F$\sharp$ axis. Combining inversion with transposition gives the inversional transpositions:

$$I_n \colon x \mapsto n - x \pmod{12}.$$

The transpositions and inversional transpositions together generate a group of order 24:

$$D_{12} = \langle T_1, I \rangle.$$

This is the dihedral group $D_{12}$ — the same abstract group that describes the symmetries of a regular 12-gon (twelve rotations and twelve reflections). The identification is not coincidental: the twelve pitch classes arranged in a circle are the vertices of a regular 12-gon, and the musical operations are geometrically the symmetries of that polygon.
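
The order of the group can be confirmed by closing the two generators under composition. This is a sketch under my own conventions: each operation is stored as a tuple giving the image of every pitch class, and the function names are illustrative only.

```python
N = 12

def transposition(n: int) -> tuple:
    """T_n as a permutation of the pitch classes 0..11."""
    return tuple((x + n) % N for x in range(N))

def inversion(n: int = 0) -> tuple:
    """I_n: x -> n - x (mod 12); inversion() is the plain inversion I."""
    return tuple((n - x) % N for x in range(N))

def compose(f: tuple, g: tuple) -> tuple:
    """The permutation 'f after g'."""
    return tuple(f[g[x]] for x in range(N))

def generate(generators: list) -> set:
    """Close a list of permutations under composition, starting from the identity."""
    identity = tuple(range(N))
    group = {identity}
    frontier = [identity]
    while frontier:
        current = frontier.pop()
        for g in generators:
            candidate = compose(g, current)
            if candidate not in group:
                group.add(candidate)
                frontier.append(candidate)
    return group

ti_group = generate([transposition(1), inversion()])
print(len(ti_group))  # 24

# Every element is one of the twelve T_n or one of the twelve I_n.
expected = {transposition(n) for n in range(N)} | {inversion(n) for n in range(N)}
print(ti_group == expected)  # True
```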

Twelve-tone composition, Schoenberg’s method, is almost entirely a working-out of the consequences of $D_{12}$ acting on ordered sequences of the twelve pitch classes. Once retrograde is added to transposition and inversion, the full group of serial operations has order 48, and the four canonical row forms (prime, inversion, retrograde, retrograde-inversion) correspond to the four cosets of $\mathbb{Z}_{12}$ (the transposition subgroup) in it.

Orbits and Stabilisers

Let $S \subseteq \mathbb{Z}_{12}$ be a pitch-class set — a chord, a scale, a collection of any size.

The orbit of $S$ under $\mathbb{Z}_{12}$ is the collection of all distinct transpositions of $S$:

$$\mathrm{Orb}(S) = \{ T_n(S) : n \in \mathbb{Z}_{12} \}.$$

For most sets, all twelve transpositions produce a different set, so $|\mathrm{Orb}(S)| = 12$. The C major scale, for example, has twelve distinct transpositions, one for each key.

But some sets are symmetric under certain transpositions: there exists $n \neq 0$ such that $T_n(S) = S$. The collection of all symmetry transpositions of $S$ is the stabiliser:

$$\mathrm{Stab}(S) = \{ T_n \in \mathbb{Z}_{12} : T_n(S) = S \}.$$

Because composing two symmetry transpositions yields another, $\mathrm{Stab}(S)$ is a subgroup of $\mathbb{Z}_{12}$.

The orbit–stabiliser theorem gives the fundamental count:

$$|\mathrm{Orb}(S)| \cdot |\mathrm{Stab}(S)| = |\mathbb{Z}_{12}| = 12.$$

The number of distinct transpositions of $S$ equals $12$ divided by the number of transpositions that leave $S$ unchanged. The more internally symmetric $S$ is, the fewer new versions you can produce by transposing it.

A set with $|\mathrm{Stab}(S)| > 1$ — one that is invariant under some non-trivial transposition — is a mode of limited transposition.
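
Orbits and stabilisers are small enough to compute directly. A minimal sketch, with pitch-class sets as Python sets of integers and hypothetical helper names; it checks the orbit–stabiliser count on the C major scale and on the two scales examined in the sections that follow.

```python
def transpose(pcs: frozenset, n: int) -> frozenset:
    """T_n applied to a pitch-class set."""
    return frozenset((x + n) % 12 for x in pcs)

def orbit(pcs) -> set:
    """All distinct transpositions of the set."""
    return {transpose(frozenset(pcs), n) for n in range(12)}

def stabiliser(pcs) -> list:
    """All n such that T_n maps the set to itself."""
    pcs = frozenset(pcs)
    return [n for n in range(12) if transpose(pcs, n) == pcs]

C_MAJOR = {0, 2, 4, 5, 7, 9, 11}
WHOLE_TONE = {0, 2, 4, 6, 8, 10}
OCTATONIC = {0, 1, 3, 4, 6, 7, 9, 10}

for name, pcs in [("C major", C_MAJOR), ("whole-tone", WHOLE_TONE), ("octatonic", OCTATONIC)]:
    # |Orb| * |Stab| should always equal 12.
    print(name, len(orbit(pcs)), stabiliser(pcs), len(orbit(pcs)) * len(stabiliser(pcs)))
# C major 12 [0] 12
# whole-tone 2 [0, 2, 4, 6, 8, 10] 12
# octatonic 3 [0, 3, 6, 9] 12
```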

Mode 1: The Whole-Tone Scale

The whole-tone scale contains the six pitch classes at even intervals:

$$\mathrm{Mode\ 1} = \{0, 2, 4, 6, 8, 10\}.$$

Transposing by $T_2$:

$$T_2(\{0, 2, 4, 6, 8, 10\}) = \{2, 4, 6, 8, 10, 0\} = \{0, 2, 4, 6, 8, 10\}. \checkmark$$

The set is unchanged. The same holds for $T_4, T_6, T_8, T_{10}$. The stabiliser is the full subgroup of even transpositions:

$$\mathrm{Stab}(\mathrm{Mode\ 1}) = \{T_0, T_2, T_4, T_6, T_8, T_{10}\} \cong \mathbb{Z}_6.$$

By the orbit–stabiliser theorem:

$$|\mathrm{Orb}(\mathrm{Mode\ 1})| = \frac{12}{6} = 2.$$

There are exactly two distinct whole-tone scales. Every pianist learns this: the one on C and the one on C$\sharp$. Composing with whole-tone harmony means working from a stock of only two harmonic pools with no way to modulate into a genuinely new version of the scale. This is Messiaen’s first charm of impossibility.

Mode 2: The Octatonic Scale

The octatonic (diminished) scale alternates half-step and whole-step intervals. Starting on C:

$$\mathrm{Mode\ 2} = \{0, 1, 3, 4, 6, 7, 9, 10\}.$$

Does $T_3$ leave this set invariant?

$$T_3(\{0, 1, 3, 4, 6, 7, 9, 10\}) = \{3, 4, 6, 7, 9, 10, 0, 1\} = \{0, 1, 3, 4, 6, 7, 9, 10\}. \checkmark$$

Also $T_6$ and $T_9$. The stabiliser is the subgroup generated by transposition by a minor third:

$$\mathrm{Stab}(\mathrm{Mode\ 2}) = \{T_0, T_3, T_6, T_9\} \cong \mathbb{Z}_4.$$

The orbit size:

$$|\mathrm{Orb}(\mathrm{Mode\ 2})| = \frac{12}{4} = 3.$$

There are exactly three distinct octatonic scales. Composers from Rimsky-Korsakov and Bartók to Coltrane have exploited this closed system. The three scales correspond to the three cosets of the subgroup $\langle T_3 \rangle$ in $\mathbb{Z}_{12}$: the cosets $\{0, 3, 6, 9\}$, $\{1, 4, 7, 10\}$, and $\{2, 5, 8, 11\}$ are the “starting-point classes” that generate each scale. Note that the scales themselves are not pairwise disjoint — each has eight pitch classes, so any two share four — but the coset structure determines which transpositions produce the same scale and which produce a different one.
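
Both claims, the coset structure and the four shared pitch classes, can be checked in a few lines. This is an illustrative sketch; the variable names are my own.

```python
from itertools import combinations

OCTATONIC = frozenset({0, 1, 3, 4, 6, 7, 9, 10})

def transpose(pcs: frozenset, n: int) -> frozenset:
    return frozenset((x + n) % 12 for x in pcs)

# Group the twelve transposition levels by the scale they produce.
levels_by_scale = {}
for n in range(12):
    levels_by_scale.setdefault(transpose(OCTATONIC, n), []).append(n)

for scale, levels in levels_by_scale.items():
    print(levels, sorted(scale))
# [0, 3, 6, 9] [0, 1, 3, 4, 6, 7, 9, 10]
# [1, 4, 7, 10] [1, 2, 4, 5, 7, 8, 10, 11]
# [2, 5, 8, 11] [0, 2, 3, 5, 6, 8, 9, 11]

# Any two of the three scales share exactly four pitch classes.
overlaps = [len(a & b) for a, b in combinations(levels_by_scale, 2)]
print(overlaps)  # [4, 4, 4]
```

The lists of transposition levels are exactly the three cosets of $\langle T_3 \rangle$.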

The Subgroup Lattice and All Seven Modes

The orbit–stabiliser theorem constrains which stabiliser sizes are algebraically possible. Since $\mathrm{Stab}(S)$ is a subgroup of $\mathbb{Z}_{12}$, its order must divide 12. The proper non-trivial subgroups of $\mathbb{Z}_{12}$ — those with order strictly between 1 and 12 — are precisely:

Subgroup	Generator	Order	Orbit size
$\langle T_2 \rangle = \{T_0, T_2, T_4, T_6, T_8, T_{10}\}$	$T_2$	6	2
$\langle T_3 \rangle = \{T_0, T_3, T_6, T_9\}$	$T_3$	4	3
$\langle T_4 \rangle = \{T_0, T_4, T_8\}$	$T_4$	3	4
$\langle T_6 \rangle = \{T_0, T_6\}$	$T_6$	2	6

These four subgroups exist because the divisors of 12 strictly between 1 and 12 are exactly $\{2, 3, 4, 6\}$ (recall $12 = 2^2 \times 3$). A cyclic group $\mathbb{Z}_n$ has exactly one subgroup of each order dividing $n$, so its subgroups are in bijection with the divisors of $n$: a consequence of the fundamental theorem of cyclic groups.
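
The correspondence between divisors and subgroups can be enumerated directly. A small sketch with hypothetical helper names, which also includes the two trivial cases (orders 1 and 12) that the table omits:

```python
def divisors(n: int) -> list:
    return [d for d in range(1, n + 1) if n % d == 0]

def generated_subgroup(generator: int, n: int = 12) -> list:
    """The subgroup of Z_n generated by one element, as a sorted list."""
    return sorted({(generator * k) % n for k in range(n)})

for order in divisors(12):
    generator = (12 // order) % 12
    # columns: subgroup order, generator, elements, orbit size of a set with this stabiliser
    print(order, generator, generated_subgroup(generator), 12 // order)

# No other subgroups hide among the remaining generators:
distinct = {tuple(generated_subgroup(g)) for g in range(12)}
print(len(distinct))  # 6, one per divisor of 12
```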

Each row of the table maps onto a level in Messiaen’s system:

Mode 1 (whole-tone scale): stabiliser $\langle T_2 \rangle$, 2 transpositions
Mode 2 (octatonic scale): stabiliser $\langle T_3 \rangle$, 3 transpositions
Mode 3: stabiliser $\langle T_4 \rangle$, 4 transpositions
Modes 4 – 7: stabiliser $\langle T_6 \rangle$, 6 transpositions each

The subgroup lattice of $\mathbb{Z}_{12}$ — its Hasse diagram of containment relationships — maps directly onto the hierarchy of Messiaen’s modes. The more symmetric the stabiliser subgroup, the fewer distinct transpositions the mode admits.

The containment relations are: $\langle T_2 \rangle \supset \langle T_4 \rangle$ and $\langle T_2 \rangle \supset \langle T_6 \rangle$ and $\langle T_3 \rangle \supset \langle T_6 \rangle$. Correspondingly, Mode 1 (stabiliser $\langle T_2 \rangle$, order 6) is “more limited” than Mode 3 (stabiliser $\langle T_4 \rangle$, order 3), in the sense that $\langle T_4 \rangle \subset \langle T_2 \rangle$: every transposition that leaves Mode 3 unchanged also leaves Mode 1 unchanged.

Why Exactly Seven Modes?

Messiaen was not enumerating all pitch-class sets with non-trivial stabilisers — there are many more than seven. At the level of the stabiliser $\langle T_6 \rangle$, for example, there are numerous pitch-class sets invariant under the tritone transposition $T_6$: any set $S$ such that $S = S + 6$ qualifies. Some of these sets are large (ten pitch classes), some are small (two pitch classes), some are musically coherent and some are not.
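
To get a feel for how many such sets there are, one can enumerate all $2^{12} = 4096$ pitch-class sets and sort them by the size of their stabiliser. The sketch below encodes a set as a 12-bit mask (bit $x$ set means pitch class $x$ is present); that encoding and the function name are my own choices, not anything from Messiaen or the literature cited here.

```python
from collections import Counter

def stabiliser_order(mask: int) -> int:
    """Number of transpositions T_n (n = 0..11) that map the set `mask` to itself."""
    def rotate(m: int, n: int) -> int:
        # Shifting bit x to bit (x + n) mod 12 is transposition by n semitones.
        return ((m << n) | (m >> (12 - n))) & 0xFFF
    return sum(1 for n in range(12) if rotate(mask, n) == mask)

counts = Counter(stabiliser_order(mask) for mask in range(1 << 12))
print(sorted(counts.items()))
# [(1, 4020), (2, 54), (3, 12), (4, 6), (6, 2), (12, 2)]

# Sets invariant under the tritone T_6: stabiliser orders 2, 4, 6 and 12.
print(counts[2] + counts[4] + counts[6] + counts[12])  # 64
```

The two sets with stabiliser order 12 are the empty set and the full chromatic collection. The counts are of individual sets, not of transposition classes, so each class with orbit size 6 contributes six sets to the total.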

Messiaen selected seven that he found aesthetically and compositionally viable: scales of moderate cardinality, with a balance of interval types, that he could use as raw material for his harmonic language. The group theory explains the constraint (modes are possible only at the four stabiliser types listed above), not the selection (which specific sets Messiaen chose among the many that satisfy the constraint).

The question “why seven?” is therefore partly combinatorial and partly compositional. What is group-theoretically determined is the number of levels (four: orbit sizes 2, 3, 4, 6) and the impossibility of any mode with, say, five distinct transpositions (since 5 does not divide 12).

What Messiaen Knew — and Did Not Know

Messiaen described his modes in Technique de mon langage musical (1944). His account is entirely musical and phenomenological. He lists each mode by its interval sequence, notes how many transpositions it admits, and names the limitation a “charm.” The impossibility is for him a spiritual property, a form of harmonic stasis that he associated — as a devout Catholic — with divine eternity. A mode that cannot depart is, in his compositional theology, a glimpse of the unchanging.

He was not doing group theory. The mathematics itself was long established by 1944: the orbit–stabiliser theorem, in its abstract form, grew out of the work of Lagrange (1771), Cauchy (early 19th century), and Galois (1832). But the concepts were not part of music-theoretic discourse until Milton Babbitt’s work in the 1950s, and they were not formalised in the pitch-class set framework I have used here until Allen Forte’s The Structure of Atonal Music (1973) and David Lewin’s Generalized Musical Intervals and Transformations (1987).

What Messiaen had was a musician’s ear for symmetry. He could hear that the modes were closed, without having the algebraic vocabulary to explain why. The group theory shows that he was correct, and why he was correct with a precision that no amount of phenomenological description could provide.

From Messiaen to Lewin

Lewin’s transformational theory (1987) generalises the $\mathbb{Z}_{12}$ framework to arbitrary musical spaces. A Generalized Interval System is a triple $(S, G, \mathrm{int})$ where $S$ is a set of musical objects, $G$ is a group, and $\mathrm{int} : S \times S \to G$ assigns an interval to each ordered pair of objects in a way that is consistent with the group structure.

This framework treats musical transformations — not just pitch-class transpositions but rhythmic augmentations, timbral shifts, any structurally defined operation — as elements of a group. The mathematics does not privilege any particular musical parameter; it applies wherever a transformation group acts on a set of musical objects.

Neo-Riemannian theory, which emerged from Lewin’s work in the 1980s and 1990s and was systematised by Cohn (1998), applies this framework to triadic transformations (the operations L, P, and R, which map a major or minor triad to its leading-tone exchange, its parallel, and its relative, respectively). The group generated by L, P, and R on the set of 24 major and minor triads is isomorphic to $D_{12}$: the same dihedral group as the transposition and inversion group introduced above, but acting on a different musical space.

Emmanuel Amiot’s more recent work (2016) applies the discrete Fourier transform to pitch-class sets, using the DFT coefficients on $\mathbb{Z}_{12}$ as a continuous measure of a set’s similarity to the modes of limited transposition. The Fourier coefficients detect the algebraic symmetries that stabilisers measure discretely: a set with large coefficient at frequency $k$ (in the DFT over $\mathbb{Z}_{12}$) is close, in a precise sense, to having the stabiliser $\langle T_{12/k} \rangle$.
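
A minimal sketch of the idea, assuming the plain DFT of a set's characteristic function, $F_k = \sum_{x \in S} e^{-2\pi i k x / 12}$; this is my own illustration of the convention and not code taken from Amiot's book.

```python
import cmath

def dft_magnitudes(pcs, n: int = 12) -> list:
    """|F_k| for k = 0..n-1, with F_k = sum over x in pcs of exp(-2*pi*i*k*x/n)."""
    return [
        abs(sum(cmath.exp(-2j * cmath.pi * k * x / n) for x in pcs))
        for k in range(n)
    ]

WHOLE_TONE = {0, 2, 4, 6, 8, 10}
OCTATONIC = {0, 1, 3, 4, 6, 7, 9, 10}
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}

for name, pcs in [("whole-tone", WHOLE_TONE), ("octatonic", OCTATONIC), ("C major", C_MAJOR)]:
    print(name, [round(m, 3) for m in dft_magnitudes(pcs)])
```

For the whole-tone scale the only non-zero coefficients are $k = 0$ and $k = 6$ (stabiliser $\langle T_2 \rangle$); for the octatonic scale they are $k = 0, 4, 8$ (stabiliser $\langle T_3 \rangle$). The C major scale, with trivial stabiliser, spreads its energy across many coefficients.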

The group-theoretic perspective has moved, over seventy years, from a marginal curiosity to the dominant mathematical framework in music theory. Messiaen’s modes — which once seemed like personal compositional idiosyncrasies — are revealed as structurally constrained: the possible stabiliser orders are fixed by the divisors of 12, and the orbit sizes that Messiaen’s ear discovered are exactly those that Lagrange’s theorem permits. Many pitch-class sets have non-trivial stabilisers; Messiaen found the seven that are musically viable. Their limitation is not a personal choice but an algebraic fact.

The charm of impossibilities is a theorem of group theory. And it is exactly as beautiful as Messiaen heard it to be.

References

Amiot, E. (2016). Music Through Fourier Space: Discrete Fourier Transform in Music Theory. Springer (Computational Music Science).
Babbitt, M. (1960). Twelve-tone invariants as compositional determinants. The Musical Quarterly, 46(2), 246–259. https://doi.org/10.1093/mq/XLVI.2.246
Cohn, R. (1998). Introduction to neo-Riemannian theory: A survey and a historical perspective. Journal of Music Theory, 42(2), 167–180. https://doi.org/10.2307/843871
Forte, A. (1973). The Structure of Atonal Music. Yale University Press.
Lewin, D. (1987). Generalized Musical Intervals and Transformations. Yale University Press. (Reissued Oxford University Press, 2007.)
Messiaen, O. (1944). Technique de mon langage musical. Alphonse Leduc. (English translation: Satterfield, J., 1956.)
Tymoczko, D. (2006). The geometry of musical chords. Science, 313(5783), 72–74. https://doi.org/10.1126/science.1126287
Tymoczko, D. (2011). A Geometry of Music: Harmony and Counterpoint in the Extended Common Practice. Oxford University Press.

Changelog

2026-01-14: Changed “cosets of $D_{12}$” to “cosets of $\mathbb{Z}_{12}$ (the transposition subgroup)” in the twelve-tone composition paragraph. $D_{12}$ (order 24) already includes both transpositions and inversions, yielding only 2 cosets in the full serial group. The four row forms {P, I, R, RI} correspond to 4 cosets of the transposition-only subgroup $\mathbb{Z}_{12}$ (order 12) in the full group of order 48.

Spiral Out: Tool's Lateralus, the Fibonacci Sequence, and the Mathematics of Musical Structure

Tue, 08 Nov 2022 00:00:00 +0000

Two Passions, One Song

Physics training means coming to mathematics as a tool before arriving at it as an object of aesthetic interest, and it took me longer than it should have to notice that a proof can be beautiful in the same way a piece of music can be beautiful — not despite its rigour but because of it. Both reward attention to structure. Both have surfaces accessible to a casual listener and depths that only reveal themselves when you look harder.

Lateralus, the title track of Tool’s 2001 album, is a convenient case study for the overlap. It is not the only piece of music built around Fibonacci numbers — Bartók made the connection decades earlier, and it appears in scattered places across Western and non-Western traditions — but it is among the most thoroughly and deliberately constructed, and the mathematical structure is audible rather than merely theoretical.

What follows is an attempt to do justice to both dimensions: the mathematics of the Fibonacci sequence and the golden ratio, and the musical mechanics of how those structures show up and what they do.

The Fibonacci Sequence

The sequence is defined by a recurrence relation. Starting from the initial values $F(1) = 1$ and $F(2) = 1$, each subsequent term is the sum of the two preceding ones:

$$F(n) = F(n-1) + F(n-2), \quad n \geq 3$$

This gives:

$$1,\; 1,\; 2,\; 3,\; 5,\; 8,\; 13,\; 21,\; 34,\; 55,\; 89,\; 144,\; 233,\; 377,\; 610,\; \mathbf{987},\; 1597,\; \ldots$$

The term $987$ is the sixteenth Fibonacci number, $F(16)$. Keep that in mind.
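
The recurrence translates directly into a loop. A minimal sketch using the same indexing as above, $F(1) = F(2) = 1$:

```python
def fibonacci(n: int) -> int:
    """F(n) for n >= 1, with F(1) = F(2) = 1."""
    a, b = 1, 1  # F(1), F(2)
    for _ in range(n - 2):
        a, b = b, a + b  # slide the window one step along the sequence
    return b if n >= 2 else a

print([fibonacci(n) for n in range(1, 18)])
# [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597]
print(fibonacci(16))  # 987
```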

The recurrence can be encoded compactly in a matrix formulation. For $n \geq 1$:

$$\begin{pmatrix} F(n+1) \\ F(n) \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^n \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$

This is more than notational tidiness — it connects the Fibonacci sequence to the eigenvalues of the matrix $\mathbf{A} = \bigl(\begin{smallmatrix}1 & 1 \\ 1 & 0\end{smallmatrix}\bigr)$, which are exactly $\varphi$ and $-1/\varphi$ where $\varphi$ is the golden ratio. That connection gives us Binet’s formula, a closed-form expression for the $n$-th Fibonacci number:

$$F(n) = \frac{\varphi^n - \psi^n}{\sqrt{5}}, \quad \varphi = \frac{1+\sqrt{5}}{2},\quad \psi = \frac{1-\sqrt{5}}{2} = -\frac{1}{\varphi}$$

Since $|\psi| < 1$, the term $\psi^n / \sqrt{5}$ diminishes rapidly, and for large $n$ we have the convenient approximation:

$$F(n) \approx \frac{\varphi^n}{\sqrt{5}}$$

This means Fibonacci numbers grow exponentially, at a rate governed by the golden ratio. The sequence does not grow linearly or polynomially; it spirals outward.
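
The matrix formulation and Binet's formula can be compared side by side. The sketch below uses exact integer arithmetic for the matrix power and ordinary floating point for Binet, so the Binet versions are only trustworthy for moderate $n$ (they are checked here up to $n = 30$); the function names are hypothetical.

```python
import math

def mat_mul(x, y):
    """Product of two 2x2 integer matrices given as nested tuples."""
    return (
        (x[0][0] * y[0][0] + x[0][1] * y[1][0], x[0][0] * y[0][1] + x[0][1] * y[1][1]),
        (x[1][0] * y[0][0] + x[1][1] * y[1][0], x[1][0] * y[0][1] + x[1][1] * y[1][1]),
    )

def fib_matrix(n: int) -> int:
    """F(n) from A**n, computed by repeated squaring."""
    result = ((1, 0), (0, 1))  # identity
    base = ((1, 1), (1, 0))    # the matrix A
    power = n
    while power:
        if power & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power >>= 1
    # A**n applied to (1, 0) gives (F(n+1), F(n)), i.e. the first column of A**n.
    return result[1][0]

PHI = (1 + math.sqrt(5)) / 2
PSI = (1 - math.sqrt(5)) / 2

def fib_binet(n: int) -> int:
    """Binet's formula, rounded to absorb floating-point error."""
    return round((PHI**n - PSI**n) / math.sqrt(5))

def fib_approx(n: int) -> int:
    """The large-n approximation phi**n / sqrt(5), rounded to the nearest integer."""
    return round(PHI**n / math.sqrt(5))

print(fib_matrix(16), fib_binet(16), fib_approx(16))  # 987 987 987
```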

The Golden Ratio

The golden ratio $\varphi$ appears as the limit of consecutive Fibonacci ratios:

$$\varphi = \lim_{n \to \infty} \frac{F(n+1)}{F(n)} = \frac{1+\sqrt{5}}{2} \approx 1.61803\ldots$$

It can be derived from a simple geometric proportion: divide a line segment into two parts such that the ratio of the whole segment to the longer part equals the ratio of the longer part to the shorter part. Calling those ratios $r$:

$$\frac{a+b}{a} = \frac{a}{b} = r \implies r^2 - r - 1 = 0 \implies r = \frac{1+\sqrt{5}}{2} = \varphi$$

What makes $\varphi$ mathematically distinctive is its continued fraction representation:

$$\varphi = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}}}$$

This is the simplest possible infinite continued fraction. It is also, in a precise sense, the hardest real number to approximate by rational fractions. The convergents of a continued fraction are the best rational approximations to a real number at each level of precision; the convergents of $\varphi$ are exactly the ratios of consecutive Fibonacci numbers: $1/1$, $2/1$, $3/2$, $5/3$, $8/5$, $13/8$, $\ldots$ Measured against the size of their denominators, these approach $\varphi$ more slowly than the convergents of any other irrational number approach their own limit. $\varphi$ is, in this sense, maximally irrational.

That property has a physical consequence. In botanical phyllotaxis — the arrangement of leaves, seeds, and petals on plants — structures that grow by adding new elements at a fixed angular increment will pack most efficiently when that increment is as far as possible from any rational fraction of a full rotation. The optimal angle is:

$$\theta = \frac{2\pi}{\varphi^2} \approx 137.508°$$

This is the golden angle, and it is the reason sunflower seed spirals count $55$ and $89$ (consecutive Fibonacci numbers) in their two counter-rotating sets. The mathematics of efficient growth in nature and the mathematics of the Fibonacci sequence are the same mathematics.
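
The value of the golden angle is a one-line computation. The seed layout that follows it is an assumption on my part: it uses the common model in which seed $k$ sits at angle $k\theta$ and radius $\sqrt{k}$, which the text above does not specify.

```python
import math

PHI = (1 + math.sqrt(5)) / 2
GOLDEN_ANGLE_DEG = 360 / PHI**2

print(round(GOLDEN_ANGLE_DEG, 3))  # 137.508

def seed_positions(count: int) -> list:
    """(x, y) positions of `count` seeds, each rotated by the golden angle from the last.

    Assumed model: radius grows like sqrt(k), so every seed occupies roughly equal area.
    """
    positions = []
    for k in range(count):
        angle = math.radians(k * GOLDEN_ANGLE_DEG)
        radius = math.sqrt(k)
        positions.append((radius * math.cos(angle), radius * math.sin(angle)))
    return positions

points = seed_positions(200)
```

Plotting `points` with any scatter-plot tool should show the familiar interleaved spiral arms.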

The golden spiral — the logarithmic spiral whose growth factor per quarter turn is $\varphi$ — is the visual representation of this: it is self-similar, expanding without bound while maintaining constant proportionality.

Fibonacci Numbers in Music: Before Tool

The connection between the Fibonacci sequence and musical structure is not Tool’s invention. The most carefully documented case is Béla Bartók, whose Music for Strings, Percussion and Celesta (1936) has been analysed exhaustively by Ernő Lendvai. In the first movement, the climax arrives at bar 55 (a Fibonacci number), and Lendvai counted the overall structure as 89 bars — the score has 88, but he added an implied final rest bar to reach the Fibonacci number — dividing at bar 55 with near-mathematical precision. Lendvai argued that Bartók consciously embedded Fibonacci proportions into formal structure, tonal architecture, and thematic development throughout much of his output.

Whether these proportions were conscious design or an instinct that selected naturally resonant proportions is contested. The same question applies to claims about Mozart and Chopin. What is more defensible is a structural observation about the piano keyboard and Western scales that requires no attribution of intent:

A single octave on the piano keyboard has 13 keys, comprising 8 white keys and 5 black keys. The black keys are grouped as 2 and 3. The numbers $2, 3, 5, 8, 13$ are five consecutive Fibonacci numbers — $F(3)$ through $F(7)$.

The standard Western scales make this concrete. The major scale contains 7 distinct pitches within an octave of 12 semitones, or 8 notes when the octave is counted at both ends. The pentatonic scale (ubiquitous in folk, blues, rock) contains 5 pitches. The chromatic scale contains 12 pitch classes per octave; counting both endpoints of the octave (C to C) gives 13 chromatic notes, the next Fibonacci number.

Harmonic intervals in just intonation are simple frequency ratios: the octave (2:1), the perfect fifth (3:2), the perfect fourth (4:3), the major third (5:4), the minor third (6:5). The numerators and denominators are small integers, often Fibonacci numbers or their neighbours. The major triad, the structural foundation of tonal Western music, consists of pitches in the frequency ratio $4:5:6$, three consecutive integers centred on the Fibonacci number 5.

This does not mean that Western music is secretly Fibonacci. It means that the integer frequency ratios that produce consonant intervals are the small integers, and small integers include the small Fibonacci numbers. The connection is genuine but not exclusive.

Lateralus

Tool’s Lateralus (2001, album of the same name) is unusual in that the Fibonacci construction is not an analytical inference applied after the fact — it was discussed publicly by the band. Drummer Danny Carey has spoken about his engagement with sacred geometry and mathematical structure, and the song’s construction has been described as intentional by multiple band members.

There are two primary levels of Fibonacci structure in the song: the syllable counts of the vocal line and the time signatures of the verse. A third level, the thematic content of the lyrics, makes the mathematical frame explicit.

The Syllable Count

The opening verses are constructed so that successive lines contain syllable counts following the ascending Fibonacci sequence: $1, 1, 2, 3, 5, 8, 13$. The first line is a single syllable. The second is another. The third is a two-syllable phrase. The sequence continues, each line carrying the combined weight of the previous two, until the thirteen-syllable line, which in structure and delivery feels like the crest of a wave.

The second half of the verse then descends: $13, 8, 5, 3, 2, 1, 1$. Or, in some analyses, the chorus and pre-chorus sections begin a new ascending Fibonacci run before the full descent, creating a nested structure of expansions and contractions.

The audible effect of this design is not arbitrary. A sequence of lines whose syllable counts follow $1, 1, 2, 3, 5, 8, 13$ creates a consistently accelerating density of text over the same musical time. The vocal line becomes more compressed as the syllable count rises, building tension — and then the descent releases it. This is not how most pop or rock lyrics are structured. It produces a breathing, organic quality, the way a plant reaches toward light.

The Time Signature: 987

The verse sections of the song cycle through three time signatures in succession: $9/8$, then $8/8$, then $7/8$.

$$9/8 + 8/8 + 7/8$$

This three-bar pattern repeats. Now: the sequence of numerators is $9$, $8$, $7$. Written as a three-digit number: 987. And as noted above, $987 = F(16)$, the sixteenth Fibonacci number.
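
The arithmetic can be checked without generating the sequence, using the standard test that a positive integer $n$ is a Fibonacci number exactly when $5n^2 + 4$ or $5n^2 - 4$ is a perfect square. A short sketch with hypothetical function names:

```python
import math

def is_perfect_square(n: int) -> bool:
    return n >= 0 and math.isqrt(n) ** 2 == n

def is_fibonacci(n: int) -> bool:
    """True if the positive integer n is a Fibonacci number."""
    return is_perfect_square(5 * n * n + 4) or is_perfect_square(5 * n * n - 4)

numerators = [9, 8, 7]
concatenated = int("".join(str(n) for n in numerators))

print(concatenated, is_fibonacci(concatenated))  # 987 True
print(sum(numerators))  # 24 eighth-note pulses in one pass through the three bars
```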

Whether this is a deliberate encoding or a remarkable coincidence is a matter of interpretation. The time signature sequence is definitely deliberate — asymmetric meters of this kind require careful compositional choice. The fact that their numerators concatenate to a Fibonacci number is either intentional and clever or accidental and still remarkable. Either way, the time signature pattern has a musical function independent of the Fibonacci reading.

In standard rock, time is almost always $4/4$: four even beats per bar, a pulse that is maximally predictable and maximally amenable to groove. The $9/8 + 8/8 + 7/8$ pattern is the opposite. Each bar has a different length. The listener’s internal metronome, calibrated to $4/4$, cannot lock onto the pattern. The music generates forward momentum not through a repeated downbeat but through the continuous unfolding of measures whose lengths keep shifting. This is the rhythmic analogue of a spiral: no two consecutive revolutions are identical in length, but the change from one to the next is consistent.

The chorus and other sections use different time signatures, including stretches in $5/8$ and $7/8$. The $5$ and the $8$ are Fibonacci numbers again, part of the $5, 8, 13$ triplet that appears so often in this context; the $7$ is not.

The Thematic Content

The lyrics are explicitly about spirals, Fibonacci growth, and the experience of reaching beyond a current state of development. They reference the idea of expanding one’s perception outward through accumulating cycles, each containing and exceeding the previous one. The chorus refrain — about spiralling outward — names the mathematical structure of the golden spiral directly. The song is, in its own terms, about the process that the mathematics describes.

This kind of thematic coherence between structure and content is what makes the construction interesting rather than merely clever. The Fibonacci structure is not decorative. It is the argument of the song made manifest in its form.

Why Fibonacci Structure Works in Music

The most interesting question is not whether the Fibonacci structure is there — it clearly is — but why it produces the musical effect it does.

Consider what the Fibonacci sequence represents physically. It is the growth law of structures that build on their own preceding state: $F(n) = F(n-1) + F(n-2)$. Unlike arithmetic growth (add a constant) or geometric growth (multiply by a constant), Fibonacci growth is self-referential. Each term contains the memory of the previous two. The sequence is expansive but not uniform; it accelerates, but always in proportion to what came before.

Musical tension and release are, in an important sense, the same mechanism. A phrase creates an expectation; its continuation either confirms or subverts that expectation; resolution reduces the tension. What makes a musical phrase feel like it is building toward something is precisely the progressive accumulation of expectation: each bar adds its weight to the previous, and the accumulated tension requires resolution at a scale proportional to the build-up. The Fibonacci syllable structure in Lateralus generates this literally: each line carries as many syllables as the previous two combined, so the text grows steadily denser until the structure has to breathe.

The time signature asymmetry works similarly. In $4/4$, the beat is predictable, and the listener’s body can lock to it and then coast on that lock. In $9/8 + 8/8 + 7/8$, the beat is never fully locked: the pattern is periodic (it repeats) but the internal structure of each repetition is shifting. The listener is perpetually catching up, perpetually leaning slightly into the music to find the next downbeat. This is not discomfort; it is engagement. The reason is that the pattern is too irregular to follow on autopilot but short enough to be audible as a unit. The brain can learn the super-pattern of 24 eighth-note pulses ($9 + 8 + 7$); it just requires attention that $4/4$ does not.

There is a deeper reason why golden-ratio proportions feel right in musical form. The golden section of a piece — the point at which the piece divides in the $\varphi : 1$ ratio — is the point of maximum accumulated development before the final resolution. In a five-minute piece, the golden section falls at roughly 3:05. This is, empirically, where the emotional and structural climax tends to sit in a wide range of well-regarded music, from Baroque to jazz. Whether composers consciously target this proportion or whether the proportion is what accumulated development looks like when done well is not easily separable. But the mathematical reason it is a proportion worth targeting is that $\varphi$ is the only division point that is self-similar: the ratio of the whole to the longer part equals the ratio of the longer part to the shorter part. There is no arbitrary scale associated with the golden section; it is scale-invariant, the same proportion at every level of analysis.
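
The 3:05 figure and the self-similarity claim are both quick to verify. A small sketch, assuming the longer part comes first (the climax sits at duration divided by $\varphi$); the helper names are my own.

```python
import math

PHI = (1 + math.sqrt(5)) / 2

def golden_section_seconds(duration_seconds: float) -> float:
    """Time at which the piece divides in the ratio phi : 1, longer part first."""
    return duration_seconds / PHI

def as_timestamp(seconds: float) -> str:
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"

total = 5 * 60
longer = golden_section_seconds(total)
shorter = total - longer

print(as_timestamp(longer))                               # 3:05
print(round(total / longer, 5), round(longer / shorter, 5))  # 1.61803 1.61803
```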

A Brief Note on Binet and Limits

The closed-form expression for Fibonacci numbers,

$$F(n) = \frac{\varphi^n - \psi^n}{\sqrt{5}},$$

has a pleasing consequence for large $n$. Since $|\psi| \approx 0.618 < 1$, the term $\psi^n \to 0$, and the correction $\psi^n / \sqrt{5}$ is already smaller than $1/2$ in magnitude at $n = 0$, so $F(n)$ is simply the nearest integer to $\varphi^n / \sqrt{5}$ for every $n \geq 0$. The integers produced by the Fibonacci recurrence are the integers that $\varphi^n / \sqrt{5}$ passes closest to. The exponential growth of $\varphi^n$ and the rounding to integers together give the sequence.
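The rounding claim is easy to check numerically. The sketch below assumes the indexing $F(0) = 0$, $F(1) = 1$, which is the convention the closed form above implies. The function names are illustrative, and the comparison relies on double-precision floats, so it is only trustworthy for moderate $n$.

```python
import math

SQRT5 = math.sqrt(5)
PHI = (1 + SQRT5) / 2   # golden ratio, about 1.618
PSI = (1 - SQRT5) / 2   # conjugate root, about -0.618

def fib(n):
    """F(n) from the recurrence, with F(0) = 0 and F(1) = 1 (assumed indexing)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_rounded(n):
    """Nearest integer to phi^n / sqrt(5); float precision limits this to moderate n."""
    return round(PHI ** n / SQRT5)

for n in range(1, 11):
    # Last column: size of the dropped term |psi^n| / sqrt(5), always below 0.5.
    print(n, fib(n), fib_rounded(n), abs(PSI) ** n / SQRT5)
```

The last column shrinks by a factor of about 0.618 per step, which is why the rounding never picks the wrong integer.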

This is also why the ratios $F(n+1)/F(n)$ converge to $\varphi$ exponentially fast — the error is $\mathcal{O}(|\psi/\varphi|^n) = \mathcal{O}(\varphi^{-2n})$ — and why, for musical purposes, the Fibonacci ratios $8:5$, $13:8$, $21:13$ are already excellent approximations of the golden ratio, close enough that the ear cannot distinguish them from $\varphi$ in any direct sense.
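A small sketch of how quickly those three ratios close in on $\varphi$. The helper name `ratio_errors` is hypothetical, and the error is measured as a plain absolute difference, which is one choice among several.

```python
import math

PHI = (1 + math.sqrt(5)) / 2

def ratio_errors(pairs):
    """Return (numerator, denominator, ratio, |ratio - phi|) for each pair."""
    return [(num, den, num / den, abs(num / den - PHI)) for num, den in pairs]

pairs = [(8, 5), (13, 8), (21, 13)]
results = ratio_errors(pairs)
for num, den, ratio, err in results:
    print(f"{num}:{den} = {ratio:.4f}, error {err:.4f}")

# Each step should shrink the error by roughly phi^-2, about 0.382.
errors = [err for _, _, _, err in results]
shrink = [later / earlier for earlier, later in zip(errors, errors[1:])]
print(shrink)
```

The errors come out at roughly 0.018, 0.007, and 0.003, each about 0.38 times the one before, in line with the $\mathcal{O}(\varphi^{-2n})$ estimate.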

What Lateralus Is

Lateralus is not a math lecture set to music. It is a nine-minute progressive metal track that is physically involving, rhythmically complex, and lyrically coherent. The Fibonacci structure would be worthless if the song were not also, on purely musical terms, good.

What the mathematics adds is a vocabulary for something the song achieves anyway: the sense of growing without ever arriving, of each section being both a resolution of what came before and an opening toward something larger. The golden spiral does not end. The Fibonacci sequence does not converge. The song does not resolve in the sense that a classical sonata resolves; it spirals to a close.

The reason this is worth writing about is that it makes concrete a connection that is usually stated vaguely: mathematics and music are similar. They are similar in specific and articulable ways. The self-referential structure of the Fibonacci recurrence, the scale-invariance of the golden ratio, the information-theoretic account of tension and expectation: these are not metaphors for musical experience. They are, in this case, the actual mechanism.

References

Lendvai, E. (1971). Béla Bartók: An Analysis of His Music. Kahn & Averill.

Benson, D. J. (2006). Music: A Mathematical Offering. Cambridge University Press. (For an introduction to the general theory of tuning, temperament, and harmonic series.)

Tool. (2001). Lateralus. Volcano Records.

Livio, M. (2002). The Golden Ratio: The Story of Phi, the World’s Most Astonishing Number. Broadway Books.

Knott, R. (2013). Fibonacci numbers and the golden section in art, architecture and music. University of Surrey Mathematics Department. https://r-knott.surrey.ac.uk/Fibonacci/fibInArt.html

Changelog

2025-11-20: Clarified the Bartók bar count: the written score has 88 bars; Lendvai’s analysis counted 89 by adding an implied final rest bar to reach the Fibonacci number. Previously stated as “89 bars” without qualification.