Math on Sebastian Spicker

A Gas at Temperature T: Xenakis and the Physics of Stochastic Music

Tue, 14 Oct 2025 00:00:00 +0000

Iannis Xenakis (1922–2001) was trained as a civil engineer at the Athens Polytechnic, joined the Greek Resistance during the Second World War and the subsequent Greek Civil War, survived a British army tank shell in January 1945 that cost him the sight in his left eye and part of his jaw, was sentenced to death in absentia by the Greek military government, fled to Paris in 1947, and worked for twelve years as an architect in Le Corbusier’s atelier — where he contributed structural engineering to the Unité d’Habitation in Marseille and designed the Philips Pavilion for Expo 58. In parallel, already in his thirties, he taught himself composition — approaching Honegger (who was too ill to teach) and then studying with Messiaen — and became one of the central figures of the post-war avant-garde. I mention the biography not as background colour but because it bears on the physics. A person who has been through what Xenakis had been through by 1950 is not likely to be intimidated by the kinetic theory of gases.

He was not. In 1955–56 he composed Pithoprakta — “actions through probability” — for 46 strings, each of which is, in his own account, a molecule of an ideal gas. This post works through the mathematics he used and asks what it means when a composer takes statistical mechanics seriously as a compositional tool.

The Problem with Post-War Serialism

To understand why Xenakis did what he did, it helps to know what everyone else was doing. By the early 1950s, the dominant tendency in European new music was total serialism: the systematic extension of Schoenberg’s twelve-tone technique to rhythm, dynamics, articulation, and register. Every parameter of every note was determined by a series. Messiaen had sketched this direction in Mode de valeurs et d’intensités (1949); Boulez and Stockhausen had taken it to its logical extreme.

The result, as Xenakis observed with characteristic bluntness in Formalized Music (1963/1992), was a kind of sonic indistinguishability: because every parameter varied according to independent deterministic series, the textures produced by total serialism sounded essentially like random noise. The maximum of local determinism had produced the appearance of global chaos.

His diagnosis was precise and, I think, correct: if the perceptual result of maximum determinism and maximum randomness is the same, then the path forward is not to find a better deterministic scheme but to embrace randomness explicitly, at the level that governs the macroscopic structure. Control the distribution; let the individual events vary within it. This is exactly what statistical mechanics does for a gas: it does not track every molecule, but it knows with great precision what the distribution of velocities will be.

Statistical Mechanics in Brief

In a classical ideal gas of $N$ molecules at thermal equilibrium with temperature $T$, the molecules move in all directions with speeds distributed according to the Maxwell-Boltzmann speed distribution:

$$f(v) = \sqrt{\frac{2}{\pi}}\, \frac{v^2}{a^3}\, \exp\!\left(-\frac{v^2}{2a^2}\right), \qquad a = \sqrt{\frac{k_B T}{m}},$$

where $m$ is the molecular mass and $k_B$ is Boltzmann’s constant. The parameter $a$ sets the characteristic speed scale: it grows with temperature (hotter gas means faster molecules) and shrinks with molecular mass (heavier molecules move more slowly at the same temperature).

The distribution has a characteristic shape: it rises as $v^2$ for small speeds (few molecules are nearly stationary), peaks at the most probable speed $v_p = a\sqrt{2}$, and falls off as $e^{-v^2/2a^2}$ for large speeds (very fast molecules are exponentially rare). The three characteristic speeds are:

$$v_p = a\sqrt{2}, \qquad \langle v \rangle = a\sqrt{\tfrac{8}{\pi}}, \qquad v_\mathrm{rms} = a\sqrt{3}.$$
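The three characteristic speeds can be checked numerically. The sketch below integrates $f(v)$ with a plain midpoint rule; the value of `a`, the cutoff at twelve times `a`, and the grid resolution are illustrative assumptions, not anything taken from Xenakis.

```python
import math

def maxwell_boltzmann_pdf(v, a):
    """Speed density f(v) with scale parameter a = sqrt(k_B T / m)."""
    return math.sqrt(2.0 / math.pi) * v**2 / a**3 * math.exp(-v**2 / (2.0 * a**2))

def moment(power, a, v_max_factor=12.0, steps=100_000):
    """Midpoint-rule estimate of the integral of v**power * f(v) on [0, v_max]."""
    dv = v_max_factor * a / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * dv
        total += v**power * maxwell_boltzmann_pdf(v, a)
    return total * dv

a = 1.5  # arbitrary illustrative scale

norm = moment(0, a)                    # should be close to 1
mean_speed = moment(1, a)              # compare with a * sqrt(8 / pi)
rms_speed = math.sqrt(moment(2, a))    # compare with a * sqrt(3)

# Most probable speed: locate the maximum of f on a fine grid up to 5a.
grid = [i * 1e-4 * a for i in range(1, 50_000)]
peak_speed = max(grid, key=lambda v: maxwell_boltzmann_pdf(v, a))  # compare with a * sqrt(2)

print(norm, peak_speed / a, mean_speed / a, rms_speed / a)
```

Dividing by `a` in the last line makes the ordering $v_p < \langle v \rangle < v_\mathrm{rms}$ visible independently of the chosen scale.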

No individual molecule is tracked. The distribution is everything: once you know $f(v)$ and the number density, you know the equilibrium properties of the gas, such as pressure and mean kinetic energy, without knowing the trajectory of a single molecule. The individual is sacrificed to the ensemble.

Pithoprakta and the Orchestra as Gas

In Pithoprakta (1955–56), Xenakis assigns each of the 46 string instruments to a molecule of a gas. The musical analogue of molecular speed is the velocity of a glissando: the rate at which a glissando moves through pitch, measured in semitones per second. Slow glissandi are cold molecules; fast glissandi are hot ones.

For a given passage with a specified musical “temperature” (an intensity-and-density parameter he could set as a compositional choice), the 46 glissando speeds are drawn from the Maxwell-Boltzmann distribution for that temperature. No two strings play the same glissando at the same speed. The effect, to a listener, is a dense sound-mass — a shimmer or a roar — whose internal texture varies but whose overall character (the temperature, the density) is under the composer’s control at exactly the level that matters perceptually.

Xenakis worked out the velocities numerically by hand. The score of Pithoprakta was among the first in which the individual parts were derived from a statistical distribution rather than from a melody, a row, or an improvisation instruction. The calculation is tedious but not difficult: for each time window, choose a temperature, compute $f(v)$ for the 46 values of $v$ that tile the distribution, and assign one speed to each instrument.
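One way to picture the procedure in code is to draw the 46 speeds at random instead of tiling the distribution by hand. This is a minimal sketch under that assumption, not a reconstruction of Xenakis's own calculation. It relies on the standard fact that a Maxwell-Boltzmann speed is the length of a velocity vector whose three components are independent Gaussians with standard deviation $a$. The two values of `a`, the seed, and the reading of the numbers as semitones per second are hypothetical.

```python
import math
import random

def sample_speeds(n, a, rng):
    """Draw n Maxwell-Boltzmann speeds as the norm of three independent
    Gaussian velocity components, each with standard deviation a."""
    return [
        math.sqrt(sum(rng.gauss(0.0, a) ** 2 for _ in range(3)))
        for _ in range(n)
    ]

rng = random.Random(1956)                  # fixed seed so the sketch is reproducible
cool = sample_speeds(46, a=1.0, rng=rng)   # hypothetical "cool" passage
hot = sample_speeds(46, a=4.0, rng=rng)    # hypothetical "hot" passage

for name, speeds in (("cool", cool), ("hot", hot)):
    mean = sum(speeds) / len(speeds)
    print(f"{name}: mean {mean:.2f}, slowest {min(speeds):.2f}, fastest {max(speeds):.2f}")
```

Each of the 46 values would be assigned to one instrument as its glissando speed; raising `a` shifts and widens the whole set, which is the sense in which the passage heats up.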

The connection between macroscopic structure and microscopic liberty is deliberately preserved. The shape of the sound-mass — its brightness, its turbulence, its rate of change — is controlled. Each individual line is unpredictable. This is, structurally, the same trade-off that makes thermodynamics work: you give up on the individual trajectory and gain exact knowledge of the aggregate.

Musical Temperature as a Compositional Parameter

The analogy is worth making precise. In the physical gas, raising the temperature $T$ increases $a = \sqrt{k_B T / m}$, which shifts the peak of $f(v)$ to the right and widens the distribution. More molecules have high speeds; the variance of speeds increases.

In Pithoprakta, raising the musical “temperature” has the same effect: more instruments perform rapid glissandi; the pitch-space trajectories are more varied; the texture becomes more active and more turbulent. Lowering the temperature concentrates the glissando speeds near zero — slow motion, near-stasis, long sustained tones that change pitch only gradually. The orchestra cools.

This mapping is not metaphorical. Xenakis computed it. The score contains numerically derived glissando speeds; the connection between the perceptual temperature of the texture and the statistical parameter $T$ is quantitative. When musicians speak of a passage “heating up,” they are usually using a figure of speech. In Pithoprakta, they are describing a thermodynamic fact.

The Poisson Distribution and Event Density

Pithoprakta uses a second physical model alongside the Maxwell-Boltzmann distribution: the Poisson process, which governs the density of independent, randomly occurring events.

If musical events (pizzicato attacks, bow changes, individual note entries) occur at a mean rate of $\lambda$ events per second, the probability of exactly $k$ events occurring in a time window of length $\tau$ (written $\tau$ here to keep it distinct from the temperature $T$) is:

$$P(N = k) = \frac{(\lambda \tau)^k\, e^{-\lambda \tau}}{k!}.$$

The Poisson distribution has a single parameter, the product $\lambda \tau$, which is both its mean and its variance: $\langle N \rangle = \mathrm{Var}(N) = \lambda \tau$. For a fixed window, a high $\lambda$ produces a dense cluster of events; a low $\lambda$ produces sparse, widely spaced events.
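The equality of mean and variance is easy to check by simulation. The sketch below evaluates the probability formula and, separately, counts events in a window by accumulating exponential waiting times, which is the standard way to simulate a Poisson process. The rate, window length, and seed are arbitrary illustrative values.

```python
import math
import random

def poisson_pmf(k, rate, window):
    """Probability of exactly k events in a window, for a given mean rate."""
    mu = rate * window
    return mu**k * math.exp(-mu) / math.factorial(k)

def simulate_count(rate, window, rng):
    """Count events in [0, window) by summing exponential waiting times."""
    t = rng.expovariate(rate)
    count = 0
    while t < window:
        count += 1
        t += rng.expovariate(rate)
    return count

rng = random.Random(7)
counts = [simulate_count(rate=3.0, window=2.0, rng=rng) for _ in range(20_000)]
mean = sum(counts) / len(counts)
variance = sum((c - mean) ** 2 for c in counts) / len(counts)

print(f"sample mean {mean:.2f}, sample variance {variance:.2f}, expected {3.0 * 2.0:.2f}")
```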

Xenakis used this to control the density of pizzicato attacks independently of the glissando texture. A passage can be cool (slow glissandi) and dense (many pizzicati), or hot and sparse, or any combination. The two distributions operate on independent musical parameters — pitch motion and event density — giving the composer a two-dimensional thermodynamic control space over the texture.

Markov Chains: Analogique A and Analogique B

In Analogique A (for string orchestra, 1958–59) and its companion Analogique B (for sinusoidal tones, same year), Xenakis moved to a different stochastic framework: Markov chains.

A Markov chain is a sequence of states where the probability of transitioning to the next state depends only on the current state. The chain is specified by a transition matrix $P$, where $P_{ij}$ is the probability of moving from state $i$ to state $j$:

$$P_{ij} \geq 0, \qquad \sum_j P_{ij} = 1 \quad \forall\, i.$$

Under mild conditions (irreducibility and aperiodicity), the chain converges to a unique stationary distribution $\pi$ satisfying:

$$\pi P = \pi, \qquad \sum_i \pi_i = 1.$$

The convergence is geometric: if $\lambda_2$ is the second-largest eigenvalue of $P$ in absolute value, then after $n$ steps the distribution $\pi^{(n)}$ satisfies $\|\pi^{(n)} - \pi\| \leq C |\lambda_2|^n$ for some constant $C$. The gap $1 - |\lambda_2|$ — the spectral gap — controls how quickly the chain forgets its initial state. A transition matrix with a large spectral gap produces rapid convergence; one with $|\lambda_2| \approx 1$ produces long-memory dependence between distant states. This is a compositional choice: the spectral gap determines how quickly a piece’s texture changes character.
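The geometric decay can be watched directly on the smallest possible example. The sketch below uses a hypothetical two-state chain, chosen because its stationary distribution and second eigenvalue have simple closed forms; the probabilities `p` and `q` are made up for illustration and have nothing to do with Xenakis's matrices.

```python
def step(dist, P):
    """One step of the chain: row vector dist times transition matrix P."""
    size = len(P)
    return [sum(dist[i] * P[i][j] for i in range(size)) for j in range(size)]

# Hypothetical two-state chain, e.g. "sparse" and "dense" textures.
p, q = 0.3, 0.1
P = [[1 - p, p],
     [q, 1 - q]]

# Closed forms for the two-state case.
pi = [q / (p + q), p / (p + q)]   # stationary distribution
lambda_2 = 1 - p - q              # second eigenvalue (the first is 1)

dist = [1.0, 0.0]                 # start in state 0 with certainty
errors = []
for n in range(10):
    errors.append(abs(dist[0] - pi[0]))
    dist = step(dist, P)

for n, err in enumerate(errors):
    print(n, round(err, 6), round(errors[0] * abs(lambda_2) ** n, 6))
```

The two printed columns agree: in the two-state case the bound holds with equality, so the distance to $\pi$ shrinks by exactly $|\lambda_2|$ per step. Moving `p + q` towards 0 or towards 2 pushes $|\lambda_2|$ towards 1 and slows the forgetting.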

In Analogique A, Xenakis divided the sonic space into a grid of cells defined by pitch register (high/middle/low), density (sparse/medium/dense), and dynamic (soft/loud). Each “screen” — a brief time window — occupies one cell in this grid. The progression of screens through the piece is governed by transition probabilities: from a high/dense/loud screen, there is some probability of moving to each adjacent cell, specified by Xenakis’s chosen transition matrix.

This is a Markov chain on a discrete state space of sonic textures. The macroscopic trajectory of the piece, its overall movement through sound-quality space, is determined by the transition matrix, which the composer sets. The details of each screen are filled in stochastically, within the parameters of the current state. Again, the individual is sacrificed to the aggregate; control is exercised at the level of the distribution rather than the event.
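A walk through such a grid of screens might be sketched as follows. The three axes come from the description above; the transition rule (stay put with some probability, otherwise move to a uniformly chosen adjacent cell) is a hypothetical stand-in, since Xenakis's actual matrix is not reproduced here.

```python
import itertools
import random

REGISTERS = ["low", "middle", "high"]
DENSITIES = ["sparse", "medium", "dense"]
DYNAMICS = ["soft", "loud"]
AXES = [REGISTERS, DENSITIES, DYNAMICS]
STATES = list(itertools.product(*AXES))   # 3 * 3 * 2 = 18 cells

def neighbours(state):
    """Cells that differ from state by one step along exactly one axis."""
    result = []
    for axis, values in enumerate(AXES):
        i = values.index(state[axis])
        for j in (i - 1, i + 1):
            if 0 <= j < len(values):
                result.append(state[:axis] + (values[j],) + state[axis + 1:])
    return result

def walk(start, length, stay_probability, rng):
    """Sample a sequence of screens under the hypothetical rule:
    stay with probability stay_probability, else move to a random neighbour."""
    state, path = start, [start]
    for _ in range(length - 1):
        if rng.random() >= stay_probability:
            state = rng.choice(neighbours(state))
        path.append(state)
    return path

for screen in walk(("high", "dense", "loud"), 8, 0.4, random.Random(1959)):
    print("/".join(screen))
```

Because the next screen is drawn using only the current one, the printed sequence is a sample path of a Markov chain in exactly the sense defined earlier.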

Game Theory: Duel and Stratégie

The most extreme and, to my mind, most interesting of Xenakis’s formalisations is the use of game theory in Duel (1959) and Stratégie (1962).

A two-player zero-sum game is specified by a payoff matrix $A \in \mathbb{R}^{m \times n}$. Player 1 (the “maximiser”) chooses a row $i$; Player 2 (the “minimiser”) chooses a column $j$; Player 1 receives payoff $A_{ij}$ and Player 2 receives $-A_{ij}$. In a pure-strategy game, each player selects a single action. In a mixed-strategy game, each player chooses a probability distribution over their actions: Player 1 uses $\mathbf{x} \in \Delta_m$ and Player 2 uses $\mathbf{y} \in \Delta_n$, where $\Delta_k$ denotes the standard $(k-1)$-simplex.

The expected payoff to Player 1 under mixed strategies is:

$$E(\mathbf{x}, \mathbf{y}) = \mathbf{x}^\top A\, \mathbf{y}.$$

Von Neumann’s minimax theorem (1928) guarantees that:

$$\max_{\mathbf{x} \in \Delta_m} \min_{\mathbf{y} \in \Delta_n} \mathbf{x}^\top A\, \mathbf{y} \;=\; \min_{\mathbf{y} \in \Delta_n} \max_{\mathbf{x} \in \Delta_m} \mathbf{x}^\top A\, \mathbf{y} \;=\; v^*,$$

where $v^*$ is the value of the game. A pair $(\mathbf{x}^*, \mathbf{y}^*)$ that achieves this saddle point is a Nash equilibrium: neither player can improve their expected payoff by unilaterally deviating from their equilibrium strategy.
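For a $2 \times 2$ game the saddle point can be written down by hand, which makes the theorem concrete. The sketch below assumes a game with no pure-strategy saddle point, so that each player mixes in a way that leaves the opponent indifferent between their two options. The payoff matrix is a made-up example, not taken from any Xenakis score.

```python
from fractions import Fraction

def solve_2x2(A):
    """Mixed-strategy solution of a 2x2 zero-sum game, assuming no pure
    saddle point. Returns (x, y, value) for the row and column players."""
    (a, b), (c, d) = A
    denom = Fraction(a - b - c + d)
    x1 = Fraction(d - c) / denom      # makes the column player indifferent
    y1 = Fraction(d - b) / denom      # makes the row player indifferent
    value = Fraction(a * d - b * c) / denom
    return (x1, 1 - x1), (y1, 1 - y1), value

def expected_payoff(x, A, y):
    """x^T A y for 2-component strategies."""
    return sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(2))

A = [[2, -1],
     [-1, 1]]          # hypothetical payoffs to the row player
x, y, v = solve_2x2(A)

print("row strategy:", x)
print("column strategy:", y)
print("value:", v)
# Against x*, either pure column gives the row player the same payoff v.
print(expected_payoff(x, A, (1, 0)), expected_payoff(x, A, (0, 1)))
```

Using `Fraction` keeps the arithmetic exact, so the indifference property appears as an equality and not as an approximate match.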

In Stratégie, two conductors each lead one orchestra. Each has nineteen “tactics”: six basic musical textures (e.g., sustained chords, staccato pizzicati, glissandi masses, silence) plus thirteen combinatorial tactics that combine two or three of the basics. The payoff matrix is a $19 \times 19$ integer matrix, defined by Xenakis, specifying how many points Conductor 1 scores when their orchestra plays tactic $i$ against Conductor 2’s tactic $j$. A referee tracks the score.

The conductors make decisions in real time during the performance, choosing tactics based on what the other conductor is doing and on the evolving score. The piece ends when one conductor reaches a predetermined score threshold.

The Nash equilibrium of the payoff matrix tells each conductor, in principle, the optimal distribution over tactics to play: if both play optimally, the expected score trajectory is determined. In practice, conductors are not expected to compute mixed strategies on the podium; Xenakis’s point is structural. The game-theoretic formalism is used to design the payoff matrix so that no tactic dominates — every choice has consequences that depend on the opponent’s choice — guaranteeing that the piece will always contain genuine strategic tension regardless of who is conducting.
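The design criterion that no tactic dominates can be tested mechanically. The sketch below checks for strictly dominated pure tactics only, which is one reasonable reading of that criterion; both matrices are small hypothetical examples, since the $19 \times 19$ matrix of Stratégie is not reproduced here.

```python
def strictly_dominated_rows(A):
    """Rows that some other row beats in every column (row player maximises)."""
    rows = range(len(A))
    cols = range(len(A[0]))
    return [k for k in rows
            if any(i != k and all(A[i][j] > A[k][j] for j in cols) for i in rows)]

def strictly_dominated_columns(A):
    """Columns that some other column undercuts in every row
    (column player minimises the row player's payoff)."""
    rows = range(len(A))
    cols = range(len(A[0]))
    return [c for c in cols
            if any(j != c and all(A[i][j] < A[i][c] for i in rows) for j in cols)]

# Rock-paper-scissors style matrix: every tactic beats one and loses to one.
balanced = [[0, -1, 1],
            [1, 0, -1],
            [-1, 1, 0]]

# A degenerate design: row 0 is always better for the row player.
lopsided = [[3, 2],
            [1, 0]]

print(strictly_dominated_rows(balanced), strictly_dominated_columns(balanced))
print(strictly_dominated_rows(lopsided), strictly_dominated_columns(lopsided))
```

A matrix like `lopsided` would make for a dull piece: both conductors have an obvious best tactic and no reason ever to leave it.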

Duel (1959) is the earlier, simpler version for two chamber orchestras. Stratégie (1962) was premiered in April 1963 at the Venice Biennale with two conductors competing live. The audience was aware of the game, of the score, and of the payoff matrix. The premiere was by most accounts a success, though the practical complications of running a zero-sum game in a concert hall (including the question of whether conductors were actually computing Nash equilibria or just following intuition) were never fully resolved.

Formalized Music

Xenakis assembled his theoretical framework in Musiques formelles (1963), translated and expanded as Formalized Music (1971; revised edition 1992). The book is one of the strangest documents in twentieth-century music theory: part treatise, part manifesto, part mathematical appendix. It covers stochastic composition, Markov chains, game theory, set theory, group theory, and symbolic logic — all presented with the confidence of someone who is equally at home in the engineering faculty and the concert hall, and with the occasional obscurity of someone writing simultaneously for two audiences who share almost no vocabulary.

The core argument is that musical composition can and should be treated as the application of mathematical structures to sonic material, not because mathematics makes music “better” but because mathematical structures are the most powerful available tools for controlling relationships between sounds at multiple scales simultaneously. The statistical distributions control the macroscopic; the individual values vary within them. The game-theoretic payoff matrix controls the strategic interaction; the individual tactics fill in the details. Mathematics operates at the structural level and leaves the acoustic surface free.

This is a different relationship between mathematics and music from the ones in my earlier posts on group theory and Messiaen or the Euclidean algorithm and world rhythms. In those cases, mathematics describes structure that already exists in the music — structure the composers arrived at by ear. In Xenakis, mathematics is the generative tool: the score is derived from the calculation.

What the Analogy Does and Does Not Do

The Maxwell-Boltzmann analogy in Pithoprakta is exact in one direction and approximate in another.

It is exact in the following sense: the glissando speeds Xenakis computed for his 46 strings genuinely follow the Maxwell-Boltzmann distribution with the parameters he chose. The score is a realisation of that distribution. If you collect the glissando speeds from the score and plot their histogram, you will find the characteristic $v^2 e^{-v^2/2a^2}$ shape.

It is approximate — or rather, it is analogical — in the sense that strings in an orchestra are not molecules of a gas. They do not collide. They have mass and inertia in a physical sense that has no direct mapping to musical parameters. The temperature $T$ is not a temperature in any thermodynamic sense; it is a compositional variable that Xenakis chose to parameterise with the same symbol because the formal relationship is the same. The analogy is structural, not ontological.

This is worth saying plainly because it is easy to be misled in both directions: either to over-claim (the orchestra is a gas) or to dismiss (the orchestra is merely labelled with physical vocabulary). The actual claim is more modest and more interesting: the mathematical structure of the Maxwell-Boltzmann distribution is the right tool for specifying a certain kind of orchestral texture, namely one where individual elements vary stochastically around a controlled macroscopic envelope. The physics provides the formalism; the music provides the application. This is how mathematics works in engineering, too.

The Centenary and What Remains

Xenakis died in 2001, by then partially deaf and with dementia. His centenary in 2022 produced a wave of new performances, recordings, and scholarship — including the Meta-Xenakis volume (Open Book Publishers, 2022), which collects analyses of his compositional mathematics, his architectural work (he designed the Philips Pavilion for Le Corbusier’s Expo 58 in Brussels using the same ruled-surface geometry he was using in Metastaseis), and his political biography.

What remains resonant about his project is not the specific distributions he chose — the Maxwell-Boltzmann is not the only or even necessarily the best distribution for many musical applications — but the epistemological position it represents. Xenakis insisted that the right question to ask about a musical texture is not “what is the note at beat 3 of bar 47?” but “what is the distribution from which the events in this section are drawn?” This shift from individual determination to statistical control is precisely the shift that makes thermodynamics possible as a science, and Xenakis was the first composer to apply it deliberately and systematically.

When a composer writes “let the orchestra be a gas at temperature $T$” and then actually computes the consequences with Boltzmann’s constant in front of him, I do not feel that physics has been appropriated. I feel that it has been recognised — seen, from a different direction, as the same thing it always was: a set of tools for thinking about ensembles of interacting elements whose individual behaviour is too complex to track but whose collective behaviour is not.

The orchestra is not a gas. But the Maxwell-Boltzmann distribution describes it anyway.

References

Ames, C. (1989). The Markov process as a compositional model: A survey and tutorial. Leonardo, 22(2), 175–187. https://doi.org/10.2307/1575226
Jedrzejewski, F. (2006). Mathematical Theory of Music. Delatour France / IRCAM.
Nash, J. F. (1950). Equilibrium points in $n$-person games. Proceedings of the National Academy of Sciences, 36(1), 48–49. https://doi.org/10.1073/pnas.36.1.48
Nierhaus, G. (2009). Algorithmic Composition: Paradigms of Automated Music Generation. Springer.
Matossian, N. (2005). Xenakis (revised ed.). Moufflon Publications.
Solomos, M. (Ed.). (2022). Meta-Xenakis. Open Book Publishers. https://doi.org/10.11647/OBP.0313
von Neumann, J. (1928). Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100(1), 295–320. https://doi.org/10.1007/BF01448847
von Neumann, J., & Morgenstern, O. (1944). Theory of Games and Economic Behavior. Princeton University Press.
Xenakis, I. (1992). Formalized Music: Thought and Mathematics in Composition (revised ed.). Pendragon Press. (Originally published as Musiques formelles, La Revue Musicale, 1963.)

Changelog

2026-01-14: Corrected the description of Stratégie (1962): each conductor has nineteen tactics (six basic plus thirteen combinatorial), with a 19 x 19 payoff matrix — not six tactics and a 6 x 6 matrix. The six-tactic, 6 x 6 description applies to the earlier Duel (1959).
2026-01-14: Added “in April 1963” to the Stratégie premiere sentence. The composition date is 1962; the premiere took place on 25 April 1963 at the Venice Biennale.
2026-01-14: Changed “studying briefly with Honegger” to “approaching Honegger (who was too ill to teach).” Xenakis sought instruction from Honegger circa 1949, but Honegger was in declining health and did not take him as a student.

The Oldest Algorithm in the World Plays the Clave

Mon, 07 Apr 2025 00:00:00 +0000

The first time I encountered the West African standard bell pattern it was in a Music and Physics seminar. The lecturer played a twelve-beat cycle on a wood block — seven strokes distributed unevenly but with a rightness that arrested the room. She then played the Cuban clave, the bossa nova timeline, a Bulgarian aksak rhythm. Different cultures, different instruments, different centuries. She asked whether there was a pattern. There was. It is named after a mathematician who died around 270 BCE.

Euclid’s Algorithm

Every student who has taken a number theory course has encountered the algorithm for computing the greatest common divisor of two positive integers. Given $a \geq b$, repeatedly replace $(a, b)$ with $(b, a \bmod b)$ until the remainder is zero; the last non-zero remainder is the GCD.

For example, $\gcd(8, 3)$:

$$8 = 2 \times 3 + 2 \;\Rightarrow\; \gcd(8, 3) = \gcd(3, 2)$$

$$3 = 1 \times 2 + 1 \;\Rightarrow\; \gcd(3, 2) = \gcd(2, 1)$$

$$2 = 2 \times 1 + 0 \;\Rightarrow\; \gcd(2, 1) = 1.$$

Three steps, result 1 (8 and 3 are coprime). The algorithm is efficient: the number of steps is proportional to the number of digits in the smaller input. It appears in Book VII of Euclid’s Elements, composed around 300 BCE, making it the oldest non-trivial algorithm in the Western mathematical tradition.
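The worked example translates almost line for line into code. This is a minimal sketch that records each division step so the table above can be reproduced; the function name is my own.

```python
def euclid_steps(a, b):
    """Run Euclid's algorithm on (a, b), returning the division steps
    as (dividend, divisor, quotient, remainder) tuples and the gcd."""
    rows = []
    while b != 0:
        q, r = divmod(a, b)
        rows.append((a, b, q, r))
        a, b = b, r
    return rows, a

rows, g = euclid_steps(8, 3)
for dividend, divisor, q, r in rows:
    print(f"{dividend} = {q} x {divisor} + {r}")
print("gcd:", g)
```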

Distributing Onsets: Toussaint’s Observation

In 2005, Godfried Toussaint — a computer scientist and ethnomusicologist at McGill University — published the observation that the problem of distributing $k$ musical onsets as evenly as possible among $n$ time slots has the same recursive structure as Euclid’s algorithm applied to the pair $(k, n-k)$ (Toussaint, 2005).

The algorithm that solves this distribution problem was independently discovered in nuclear physics. Bjorklund (2003), working on timing systems for the Spallation Neutron Source particle accelerator at Oak Ridge, needed to distribute $k$ beam-extraction pulses as evenly as possible among $n$ machine cycles. The algorithm he derived — Bjorklund’s algorithm — is mathematically equivalent to the Euclidean algorithm applied to the same pair of integers.

The resulting pattern is denoted $E(k, n)$: the Euclidean rhythm with $k$ onsets distributed among $n$ pulses. A 1 denotes an onset; a 0 denotes a rest.

Working Through $E(3, 8)$: The Tresillo

Let us derive $E(3, 8)$ — 3 onsets distributed in 8 pulses — step by step.

Start: 3 onset groups and 5 rest groups:

$$[1]\; [1]\; [1]\; [0]\; [0]\; [0]\; [0]\; [0]$$

Step 1: Distribute one rest group into each onset group, pairing until the shorter list is exhausted. Three pairs, with $5 - 3 = 2$ rest groups remaining:

$$[1,0]\; [1,0]\; [1,0]\; [0]\; [0]$$

Step 2: There are now 3 longer groups and 2 shorter groups. Distribute one shorter group into each longer group, leaving $3 - 2 = 1$ longer group unpaired:

$$[1,0,0]\; [1,0,0]\; [1,0]$$

Step 3: The two group types have different lengths and only one group of the shorter type remains; no further pairing is possible. Read the sequence left to right:

$$E(3, 8) = [1, 0, 0, 1, 0, 0, 1, 0].$$

This is the Cuban tresillo — one of the foundational rhythmic cells of Afro-Cuban music, used across son, salsa, and mambo. Its onset positions are $\{0, 3, 6\}$, giving gap sizes $[3, 3, 2]$: two wide gaps and one narrow gap, arranged as evenly as the integers allow.
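The grouping steps above can be sketched as a short loop. This is one common way of writing Bjorklund-style pairing, assuming the stopping rule used in the worked example (stop once at most one group remains unpaired); it is not taken from Bjorklund's report or Toussaint's paper.

```python
def euclidean_rhythm(k, n):
    """Distribute k onsets among n pulses by repeated pairing of groups."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    main = [[1] for _ in range(k)]        # onset groups
    rest = [[0] for _ in range(n - k)]    # rest groups
    while len(rest) > 1:
        pairs = min(len(main), len(rest))
        # Append one group from the remainder list to each main group.
        merged = [main[i] + rest[i] for i in range(pairs)]
        # Whatever was left unpaired becomes the new remainder list.
        leftover = main[pairs:] if len(main) > pairs else rest[pairs:]
        main, rest = merged, leftover
    return [bit for group in main + rest for bit in group]

print(euclidean_rhythm(3, 8))
print(euclidean_rhythm(5, 8))
print(euclidean_rhythm(7, 12))
```

For $(3, 8)$ the loop passes through exactly the intermediate lists shown in Steps 1 and 2. For some inputs this stopping rule yields a rotation of the form usually printed: `euclidean_rhythm(2, 3)` gives `[1, 1, 0]`, a rotation of the more familiar `[1, 0, 1]`.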

The parallel with Euclid’s algorithm is direct. In the division $8 = 2 \times 3 + 2$, the quotient 2 gives the number of pairing steps before a remainder appears, and the remainder 2 gives the number of groups in the shorter list at each intermediate stage. The recursion $\gcd(8, 3) \to \gcd(3, 2) \to \gcd(2, 1)$ mirrors the three steps above.

The Gap Structure

For any $E(k, n)$, the spacing between consecutive onsets takes exactly two values:

$$\text{gap} \in \left\{\left\lfloor \frac{n}{k} \right\rfloor,\ \left\lceil \frac{n}{k} \right\rceil\right\}.$$

The number of each gap size is determined by the constraint that all $k$ gaps sum to $n$. Writing $\alpha = n \bmod k$:

$$\alpha \cdot \left\lceil \frac{n}{k} \right\rceil \;+\; (k - \alpha) \cdot \left\lfloor \frac{n}{k} \right\rfloor = n.$$

So $E(k,n)$ has $\alpha$ gaps of the larger size and $k - \alpha$ gaps of the smaller size. The Euclidean property is that these two gap types are distributed as evenly as possible among themselves — not clustered at one end of the cycle but interleaved. A cycle that maximises the minimum distance between any two consecutive onsets has this property; it is called maximally even (Clough and Douthett, 1991).

For $E(3, 8)$: $\lfloor 8/3 \rfloor = 2$, $\lceil 8/3 \rceil = 3$, $\alpha = 8 \bmod 3 = 2$. Two gaps of 3, one gap of 2. Gap sequence $[3, 3, 2]$. Maximum-evenness is why the tresillo sounds “right” even though it is asymmetric: the asymmetry is the smallest possible deviation from perfect regularity.
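The gap count can be verified for any given pattern. The sketch below measures the cyclic gaps of a binary pattern and compares them with the counts predicted from $\alpha = n \bmod k$. The patterns are typed in by hand, and the helper names are my own.

```python
from collections import Counter

def gaps(pattern):
    """Cyclic distances between consecutive onsets (1s) in a binary pattern."""
    n = len(pattern)
    onsets = [i for i, bit in enumerate(pattern) if bit]
    return [(onsets[(j + 1) % len(onsets)] - onsets[j]) % n or n
            for j in range(len(onsets))]

def expected_gap_counts(k, n):
    """Gap sizes and their multiplicities predicted by alpha = n mod k."""
    alpha = n % k
    small, large = n // k, -(-n // k)   # floor and ceiling of n / k
    return {large: alpha, small: k - alpha} if alpha else {small: k}

tresillo = [1, 0, 0, 1, 0, 0, 1, 0]
bell = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0]

print(gaps(tresillo), expected_gap_counts(3, 8))
print(gaps(bell), expected_gap_counts(7, 12))
```

The check confirms only the counts of each gap size. Whether the two sizes are interleaved as evenly as possible is a separate property that this sketch does not test.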

A Gallery of World Rhythms

The following table, derived from Toussaint (2005, 2020), shows Euclidean rhythms alongside their ethnomusicological identifications. Asterisks mark patterns given as rotations of the canonical form.

Pattern	Gap structure	Musical tradition
$E(2,3) = [1,0,1]$	$[2,1]$	Iambic foot; West African, Balkan
$E(3,8) = [1,0,0,1,0,0,1,0]$	$[3,3,2]$	Cuban tresillo; Flamenco
$E(5,8) = [1,0,1,1,0,1,1,0]$	$[2,1,2,1,2]^*$	Cuban cinquillo
$E(4,9) = [1,0,0,1,0,1,0,1,0]^*$	$[3,2,2,2]^*$	Turkish aksak patterns
$E(7,12) = [1,0,1,1,0,1,0,1,1,0,1,0]$	$[2,1,2,2,1,2,2]^*$	West African standard bell
$E(9,16)$	$[2,1,2,2,2,1,2,2,2]^*$	Brazilian and West African
$E(13,24)$	$[2,1,2,2,2,2,2,1,2,2,2,2,2]^*$	South Indian (Carnatic) tāla

Three of these are worth examining in more detail.

$E(5,8)$: the cinquillo. Five onsets in eight pulses: $\lfloor 8/5 \rfloor = 1$, $\lceil 8/5 \rceil = 2$, $\alpha = 3$. Three gaps of 2 and two gaps of 1. Gap sequence $[2,1,2,1,2]$ or a rotation thereof. The cinquillo is a fundamental pattern in Cuban music, used as a melodic rhythmic figure in the nineteenth-century contradanza and in much of what followed.

$E(7,12)$: the West African standard bell. Seven onsets in a twelve-beat cycle: $\lfloor 12/7 \rfloor = 1$, $\lceil 12/7 \rceil = 2$, $\alpha = 5$. Five gaps of 2 and two gaps of 1. This timeline — used across the Ewe, Akan, and many other traditions in West Africa — is the cyclic reference structure against which other rhythmic layers are measured in ensemble drumming. It is also the pitch-class set $\{0, 2, 4, 5, 7, 9, 11\}$ — the Western diatonic scale, translated from pitch to rhythm. That the same maximally-even distribution describes both the diatonic scale in pitch space and the standard bell in rhythm is one of the more remarkable coincidences in mathematical music theory.

Universality across non-connected cultures. The tresillo $E(3,8)$ appears independently in Cuban music, Flamenco, Namibian Juǀ’hoansi music, and medieval Persian music (Toussaint, 2020). These traditions had no common musical ancestor that could have transmitted the pattern. The Euclidean algorithm produces what maximum evenness demands, and maximum evenness turns out to be what these rhythmic traditions independently converged on.

Circular Notation and Necklaces

Euclidean rhythms are most naturally represented as necklaces — equivalence classes of binary sequences under cyclic rotation. All rotations of $E(3,8)$ represent the same rhythmic structure with a different starting downbeat: the musical identity is independent of which position is designated “beat 1.”
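Rotation equivalence is simple to test in code. The sketch below picks the lexicographically smallest rotation as the representative of each necklace, which is one conventional choice among several; the function names are my own.

```python
def canonical_rotation(pattern):
    """Lexicographically smallest rotation of a binary pattern,
    used here as the representative of its necklace."""
    pattern = list(pattern)
    return min(tuple(pattern[i:] + pattern[:i]) for i in range(len(pattern)))

def same_necklace(p, q):
    """True if q is a cyclic rotation of p."""
    return len(p) == len(q) and canonical_rotation(p) == canonical_rotation(q)

tresillo = [1, 0, 0, 1, 0, 0, 1, 0]
rotated = [1, 0, 1, 0, 0, 1, 0, 0]     # the tresillo started two pulses early
clustered = [1, 1, 1, 0, 0, 0, 0, 0]   # three onsets, but bunched together

print(same_necklace(tresillo, rotated))     # True
print(same_necklace(tresillo, clustered))   # False
```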

In circular notation, place $n$ equally spaced dots on a circle and mark the $k$ onset positions. The pattern is immediately visible: the $k$ onset-dots divide the circle as evenly as possible. For $E(7,12)$, the seven onset dots on a twelve-position circle look like the seven vertices of a near-regular heptagon inscribed in a dodecagon. For $E(3,8)$, the three onset dots form a near-equilateral triangle.

This geometric representation makes the maximum-evenness property transparent in a way that the linear binary string does not. It also makes clear why Euclidean rhythms feel “balanced” when played: the onset dots distribute the “weight” of the cycle as uniformly as the integer constraints allow.

The mathematical theory of necklaces belongs to combinatorics on words. Euclidean rhythms correspond to specific equivalence classes of binary sequences known as Christoffel words (Lothaire, 2002): words over the alphabet $\{0,1\}$ whose combinatorial properties encode the slope of a line segment, which brings us to the third independent context in which the same algorithm appears.

The Bresenham Connection

Jack Bresenham’s line algorithm (1965) rasterises a line from $(0,0)$ to $(n,k)$ on a grid of integer pixels. At each column $x$, the algorithm tracks whether the fractional error accumulated since the last row increment exceeds $\frac{1}{2}$, and if so, increments the row and resets the error. The sequence of column positions at which the row increments is the onset pattern $E(k,n)$.

Formally, an onset occurs at position $m$ in $E(k,n)$ if and only if:

$$\left\lfloor \frac{(m+1)\, k}{n} \right\rfloor > \left\lfloor \frac{m\, k}{n} \right\rfloor.$$

Up to a rotation of the cycle, the onset positions form the sequence:

$$s_j = \left\lfloor \frac{j \cdot n}{k} \right\rfloor, \qquad j = 0, 1, \ldots, k-1.$$

For $E(3,8)$: $s_0 = 0$, $s_1 = \lfloor 8/3 \rfloor = 2$, $s_2 = \lfloor 16/3 \rfloor = 5$, giving onset positions $\{0, 2, 5\}$ — a rotation of the tresillo.

The floor inequality above is the Bresenham increment condition, with the row stepping each time the accumulated slope $k/n$ crosses an integer. Drawing the line from $(0,0)$ to $(8,3)$ and marking where the $y$-coordinate takes a step produces the onset positions $\{2, 5, 7\}$, again a rotation of the tresillo $\{0, 3, 6\}$.
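Both closed forms are one-liners in integer arithmetic, and a few more lines confirm that they give rotations of the same pattern. This is a sketch of the two formulas as stated, not of Bresenham's error-accumulating loop; the helper names are my own.

```python
def onsets_by_floor_condition(k, n):
    """Positions m where floor((m+1)k/n) exceeds floor(mk/n)."""
    return [m for m in range(n) if (m + 1) * k // n > m * k // n]

def onsets_by_spacing(k, n):
    """Positions floor(j*n/k) for j = 0, ..., k-1."""
    return [j * n // k for j in range(k)]

def to_pattern(onsets, n):
    """Binary pattern of length n with 1s at the given onset positions."""
    marked = set(onsets)
    return [1 if i in marked else 0 for i in range(n)]

def is_rotation(p, q):
    """True if q is a cyclic rotation of p."""
    return len(p) == len(q) and any(q == p[i:] + p[:i] for i in range(len(p)))

tresillo = [1, 0, 0, 1, 0, 0, 1, 0]
a = onsets_by_floor_condition(3, 8)
b = onsets_by_spacing(3, 8)

print(a, is_rotation(tresillo, to_pattern(a, 8)))
print(b, is_rotation(tresillo, to_pattern(b, 8)))
```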

Three independent fields — ancient Greek number theory, Afro-Caribbean percussion, and 1960s computer graphics — converge on the same mathematical object. This is not a coincidence. All three are solving the same fundamental problem: how to distribute $k$ discrete events as evenly as possible among $n$ slots. When the problem is universal, its solution is too.

Euclidean Rhythms in Contemporary Practice

Toussaint’s 2005 paper was primarily a contribution to computational ethnomusicology, but it reached electronic music production rapidly. Euclidean rhythm sequencers are now standard in modular synthesis (dedicated Eurorack hardware modules exist under names including “Euclidean” and “Erica Synths Pico”) and digital audio workstations (as Max for Live devices and software plug-ins). The interface is minimal: set $k$ and $n$, adjust the rotation offset, and hear the resulting timeline immediately.

This has opened a compositional mode in which the mathematical structure is operational: a producer constructing a layered African-style polyrhythm by stacking $E(3,8)$, $E(5,8)$, and $E(7,8)$ on different instruments is — whether they know it or not — computing the Euclidean algorithm three times and listening to the result.

Implications for Teaching Rhythm

Music conservatories in the European tradition teach rhythm almost entirely through Western notation: time signatures, note values, dotted notes, ties. This system is well-suited to the repertoire it was designed for. It handles Euclidean rhythms awkwardly. The tresillo $E(3,8)$ requires either a triplet feel against a binary pulse or a notation involving a dotted quarter note followed by a dotted quarter and a quarter, which correctly represents the sound but obscures the structural principle entirely.

The Euclidean framework suggests a different pedagogical starting point. Rather than beginning from the bar line and asking how notes fill it, begin from the cycle length $n$ and the onset count $k$ and ask how to distribute the onsets as evenly as possible. The answer is always computable and always produces a recognisable rhythm.

For students who encounter West African, Afro-Cuban, or Middle Eastern music (which conservatory students increasingly do), having a framework that makes these rhythms structurally necessary rather than culturally exotic changes the pedagogical relationship fundamentally. The tresillo is not a deviation from “normal” rhythm. It is, up to rotation, the unique maximally even way of placing three onsets in eight pulses. That the same algorithm appeared in a 300 BCE Alexandrian text on number theory is an accident of the history of mathematics. That it sounds right is not.

Whether conservatory curricula are ready to incorporate the Euclidean framework alongside Western notation is a separate question. The mathematics does not demand it. But it offers a language for rhythm that transcends the Western bar-line without abandoning precision — and that seems worth something, especially in a world where the music students will perform and teach is no longer exclusively European.

References

Bjorklund, E. (2003). The theory of rep-rate pattern generation in the SNS timing system. Technical Report SNS-NOTE-CNTRL-99, Spallation Neutron Source, Oak Ridge National Laboratory.
Bresenham, J. E. (1965). Algorithm for computer control of a digital plotter. IBM Systems Journal, 4(1), 25–30. https://doi.org/10.1147/sj.41.0025
Clough, J., & Douthett, J. (1991). Maximally even sets. Journal of Music Theory, 35(1–2), 93–173. https://doi.org/10.2307/843811
Lothaire, M. (2002). Algebraic Combinatorics on Words. Cambridge University Press.
Toussaint, G. T. (2005). The Euclidean algorithm generates traditional musical rhythms. In R. Sarhangi & J. Sharp (Eds.), Proceedings of BRIDGES 2005: Mathematical Connections in Art, Music, and Science (pp. 47–56). Bridges Conference.
Toussaint, G. T. (2020). The Geometry of Musical Rhythm: What Makes a “Good” Rhythm Good? (2nd ed.). Chapman & Hall/CRC Press.

The Charm of Impossibilities: Group Theory and Messiaen's Modes of Limited Transposition

Wed, 19 Apr 2023 00:00:00 +0000

I first encountered Messiaen’s second mode, the octatonic scale, in an analysis seminar during my physics studies, played by a colleague on an upright piano in a rehearsal room with terrible acoustics. She demonstrated something that stopped me: no matter how many times she transposed the scale up by a minor third, she could never find a “new” version. Moving up a semitone at a time instead, she was back where she started after three transpositions. She called it the charm of impossibilities. It took me years to understand why it is impossible, and longer still to see that the answer is not musical but algebraic.

This post is a companion to Fibonacci, the Golden Ratio, and Tool’s Lateralus, which found number theory in a prog-rock song. Here we find abstract algebra in twentieth-century sacred music.

Pitch Classes and the Chromatic Clock

Western music divides the octave into twelve equal semitones. For purposes of harmony and counterpoint, the absolute pitch is often less important than the pitch class — the equivalence class of all pitches related by octave transposition. Middle C and the C two octaves above belong to the same pitch class.

We label the twelve pitch classes $0, 1, 2, \ldots, 11$, with $0 = \mathrm{C}$, $1 = \mathrm{C}\sharp/\mathrm{D}\flat$, $2 = \mathrm{D}$, and so on up to $11 = \mathrm{B}$. Addition is taken modulo 12: the integers wrap around like a clock face, with $11 + 2 = 1$ (two semitones above B is C$\sharp$).

The set of pitch classes with this operation is a group:

$$\mathbb{Z}_{12} = \{0, 1, 2, \ldots, 11\}, \qquad x \oplus y = (x + y) \bmod 12.$$

This is the cyclic group of order 12. It has an identity element ($0$, “no transposition”), every element has an inverse ($-n \bmod 12$), and the operation is associative. If you are used to thinking about the chromatic scale as a linear sequence ending at the octave, $\mathbb{Z}_{12}$ is the insistence that it is actually a circle.
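Because the group is so small, the axioms can simply be checked by exhaustion. The following is a minimal sketch. The sharp-only note names in `NAMES` are a simplification of mine, since each black key also has a flat spelling.

```python
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def add(x, y):
    """Pitch-class addition: ordinary addition reduced modulo 12."""
    return (x + y) % 12

Z12 = range(12)

# Identity: adding 0 changes nothing.
has_identity = all(add(x, 0) == x for x in Z12)
# Inverses: (-x) mod 12 undoes x.
has_inverses = all(add(x, (-x) % 12) == 0 for x in Z12)
# Associativity, checked over all 12^3 triples.
is_associative = all(add(add(x, y), z) == add(x, add(y, z))
                     for x in Z12 for y in Z12 for z in Z12)

print(NAMES[add(11, 2)])  # B raised by two semitones
print(has_identity, has_inverses, is_associative)
```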

Musical Operations as Group Elements

Two operations are fundamental in tonal and post-tonal music theory.

Transposition by $n$ semitones maps every pitch class up by $n$:

$$T_n \colon x \mapsto x + n \pmod{12}.$$

The twelve transpositions $T_0, T_1, \ldots, T_{11}$ are exactly the elements of $\mathbb{Z}_{12}$, with $T_n$ corresponding to the integer $n$. Composing two transpositions gives a transposition: $T_m \circ T_n = T_{m+n}$.

Inversion reflects the pitch-class circle:

$$I \colon x \mapsto -x \pmod{12}.$$

Inversion maps C to C, D to B$\flat$, E to A$\flat$, and so on — it is the mirror symmetry of the chromatic circle about the C/F$\sharp$ axis. Combining inversion with transposition gives the inversional transpositions:

$$I_n \colon x \mapsto n - x \pmod{12}.$$

The transpositions and inversional transpositions together generate a group of order 24:

$$D_{12} = \langle T_1, I \rangle.$$

This is the dihedral group $D_{12}$ — the same abstract group that describes the symmetries of a regular 12-gon (twelve rotations and twelve reflections). The identification is not coincidental: the twelve pitch classes arranged in a circle are the vertices of a regular 12-gon, and the musical operations are geometrically the symmetries of that polygon.
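One way to convince yourself of the order is to build the group by brute force. The sketch below stores each operation as a permutation of the twelve pitch classes and closes the pair $T_1$, $I$ under composition. The helper names (`compose`, `generate`) are illustrative, not taken from any library.

```python
def T(n):
    """Transposition by n semitones; entry x of the tuple is the image of x."""
    return tuple((x + n) % 12 for x in range(12))

def I(n=0):
    """Inversional transposition I_n: x -> n - x (mod 12)."""
    return tuple((n - x) % 12 for x in range(12))

def compose(f, g):
    """Return 'f after g': apply g first, then f."""
    return tuple(f[g[x]] for x in range(12))

def generate(generators):
    """Close a set of permutations under composition."""
    group = set(generators)
    frontier = list(generators)
    while frontier:
        f = frontier.pop()
        for g in generators:
            h = compose(f, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

D12 = generate([T(1), I()])
print(len(D12))                                      # size of the generated group
print(compose(T(1), I()) == compose(I(), T(1)))      # do T_1 and I commute?
```

The closure contains 24 permutations, the twelve $T_n$ and the twelve $I_n$, and the second line prints `False`: the group is not abelian, as one expects of a dihedral group.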

Twelve-tone composition (Schoenberg’s method) is almost entirely a working-out of the consequences of $D_{12}$ acting on ordered sequences of the twelve pitch classes. Once retrograde is added, the full group of row operations has order 48, and the four canonical row forms (prime, inversion, retrograde, retrograde-inversion) correspond to the four cosets of the transposition subgroup $\mathbb{Z}_{12}$ in that group.
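The coset picture can be made concrete on a single row. In the sketch below, `ROW` is simply an example row chosen for illustration. The four families P, I, R, RI each contain twelve transposed copies, and for this row the families do not overlap, giving 48 forms in all.

```python
ROW = (4, 5, 7, 1, 6, 3, 8, 2, 11, 0, 9, 10)  # example row, chosen for illustration

def transpose(row, n):
    return tuple((x + n) % 12 for x in row)

def invert(row):
    return tuple((-x) % 12 for x in row)

def retrograde(row):
    return tuple(reversed(row))

families = {
    "P": ROW,
    "I": invert(ROW),
    "R": retrograde(ROW),
    "RI": retrograde(invert(ROW)),
}

# Each family is one coset of the transposition subgroup: twelve transposed copies.
forms = {name: {transpose(base, n) for n in range(12)}
         for name, base in families.items()}
all_forms = set().union(*forms.values())

print({name: len(members) for name, members in forms.items()}, len(all_forms))
```

A row whose interval sequence reads the same backwards (or the same backwards with every interval negated) would make two of the families coincide, so 48 is an upper bound rather than a guarantee.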

Orbits and Stabilisers

Let $S \subseteq \mathbb{Z}_{12}$ be a pitch-class set — a chord, a scale, a collection of any size.

The orbit of $S$ under $\mathbb{Z}_{12}$ is the collection of all distinct transpositions of $S$:

$$\mathrm{Orb}(S) = \{ T_n(S) : n \in \mathbb{Z}_{12} \}.$$

For most sets, all twelve transpositions produce a different set, so $|\mathrm{Orb}(S)| = 12$. The C major scale, for example, has twelve distinct transpositions, one for each key.

But some sets are symmetric under certain transpositions: there exists $n \neq 0$ such that $T_n(S) = S$. The collection of all symmetry transpositions of $S$ is the stabiliser:

$$\mathrm{Stab}(S) = \{ T_n \in \mathbb{Z}_{12} : T_n(S) = S \}.$$

Because composing two symmetry transpositions yields another, $\mathrm{Stab}(S)$ is a subgroup of $\mathbb{Z}_{12}$.

The orbit–stabiliser theorem gives the fundamental count:

$$|\mathrm{Orb}(S)| \cdot |\mathrm{Stab}(S)| = |\mathbb{Z}_{12}| = 12.$$

The number of distinct transpositions of $S$ equals $12$ divided by the number of transpositions that leave $S$ unchanged. The more internally symmetric $S$ is, the fewer new versions you can produce by transposing it.
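The count is easy to reproduce numerically. The sketch below computes orbit and stabiliser directly from their definitions for two familiar sets. The diminished seventh chord is my own choice of a symmetric example, and the function names are illustrative.

```python
def transpose(pcs, n):
    """T_n applied to a pitch-class set."""
    return frozenset((x + n) % 12 for x in pcs)

def orbit(pcs):
    """All distinct transpositions of the set."""
    return {transpose(pcs, n) for n in range(12)}

def stabiliser(pcs):
    """The transposition levels n for which T_n leaves the set unchanged."""
    pcs = frozenset(pcs)
    return [n for n in range(12) if transpose(pcs, n) == pcs]

C_MAJOR = {0, 2, 4, 5, 7, 9, 11}
DIM7 = {0, 3, 6, 9}  # diminished seventh chord on C

for name, pcs in [("C major scale", C_MAJOR), ("diminished seventh", DIM7)]:
    orb, stab = orbit(pcs), stabiliser(pcs)
    # The product of the two sizes should always be 12.
    print(name, len(orb), stab, len(orb) * len(stab))
```

The major scale has twelve transpositions and a trivial stabiliser, while the diminished seventh chord has only three transpositions and a stabiliser of order four.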

A set with $|\mathrm{Stab}(S)| > 1$ — one that is invariant under some non-trivial transposition — is a mode of limited transposition.

Mode 1: The Whole-Tone Scale

The whole-tone scale contains the six pitch classes at even intervals:

$$\mathrm{Mode\ 1} = \{0, 2, 4, 6, 8, 10\}.$$

Transposing by $T_2$:

$$T_2(\{0, 2, 4, 6, 8, 10\}) = \{2, 4, 6, 8, 10, 0\} = \{0, 2, 4, 6, 8, 10\}. \checkmark$$

The set is unchanged. The same holds for $T_4, T_6, T_8, T_{10}$. The stabiliser is the full subgroup of even transpositions:

$$\mathrm{Stab}(\mathrm{Mode\ 1}) = \{T_0, T_2, T_4, T_6, T_8, T_{10}\} \cong \mathbb{Z}_6.$$

By the orbit–stabiliser theorem:

$$|\mathrm{Orb}(\mathrm{Mode\ 1})| = \frac{12}{6} = 2.$$

There are exactly two distinct whole-tone scales. Every pianist learns this: the one on C and the one on C$\sharp$. Composing with whole-tone harmony means working from a stock of only two harmonic pools with no way to modulate into a genuinely new version of the scale. This is Messiaen’s first charm of impossibility.

Mode 2: The Octatonic Scale

The octatonic (diminished) scale alternates half-step and whole-step intervals. Starting on C:

$$\mathrm{Mode\ 2} = \{0, 1, 3, 4, 6, 7, 9, 10\}.$$

Does $T_3$ leave this set invariant?

$$T_3(\{0, 1, 3, 4, 6, 7, 9, 10\}) = \{3, 4, 6, 7, 9, 10, 0, 1\} = \{0, 1, 3, 4, 6, 7, 9, 10\}. \checkmark$$

Also $T_6$ and $T_9$. The stabiliser is the subgroup generated by transposition by a minor third:

$$\mathrm{Stab}(\mathrm{Mode\ 2}) = \{T_0, T_3, T_6, T_9\} \cong \mathbb{Z}_4.$$

The orbit size:

$$|\mathrm{Orb}(\mathrm{Mode\ 2})| = \frac{12}{4} = 3.$$

There are exactly three distinct octatonic scales. Composers from Rimsky-Korsakov and Bartók to Coltrane have exploited this closed system. The three scales correspond to the three cosets of the subgroup $\langle T_3 \rangle$ in $\mathbb{Z}_{12}$: the cosets $\{0, 3, 6, 9\}$, $\{1, 4, 7, 10\}$, and $\{2, 5, 8, 11\}$ are the “starting-point classes” that generate each scale. Note that the scales themselves are not pairwise disjoint — each has eight pitch classes, so any two share four — but the coset structure determines which transpositions produce the same scale and which produce a different one.
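To see the cosets at work, one can group the twelve transposition levels by the scale they produce. This is a small sketch under the same conventions as above; `levels_by_scale` is an illustrative name.

```python
MODE_2 = frozenset({0, 1, 3, 4, 6, 7, 9, 10})

def transpose(pcs, n):
    return frozenset((x + n) % 12 for x in pcs)

# Group the twelve transposition levels by the scale they produce.
levels_by_scale = {}
for n in range(12):
    levels_by_scale.setdefault(transpose(MODE_2, n), []).append(n)

for scale, levels in levels_by_scale.items():
    print(levels, sorted(scale))

# Sizes of the overlaps between any two different octatonic scales.
scales = list(levels_by_scale)
overlaps = {len(a & b) for a in scales for b in scales if a != b}
print(overlaps)
```

The levels fall into the three groups $\{0,3,6,9\}$, $\{1,4,7,10\}$ and $\{2,5,8,11\}$, and every pair of distinct scales shares exactly four pitch classes.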

The Subgroup Lattice and All Seven Modes

The orbit–stabiliser theorem constrains which stabiliser sizes are algebraically possible. Since $\mathrm{Stab}(S)$ is a subgroup of $\mathbb{Z}_{12}$, its order must divide 12. The proper non-trivial subgroups of $\mathbb{Z}_{12}$ — those with order strictly between 1 and 12 — are precisely:

Subgroup	Generator	Order	Orbit size
$\langle T_2 \rangle = \{T_0, T_2, T_4, T_6, T_8, T_{10}\}$	$T_2$	6	2
$\langle T_3 \rangle = \{T_0, T_3, T_6, T_9\}$	$T_3$	4	3
$\langle T_4 \rangle = \{T_0, T_4, T_8\}$	$T_4$	3	4
$\langle T_6 \rangle = \{T_0, T_6\}$	$T_6$	2	6

These four subgroups exist because the subgroups of $\mathbb{Z}_n$ are in bijection with the divisors of $n$, a consequence of the fundamental theorem of cyclic groups. Since $12 = 2^2 \times 3$, the divisors of 12 are $1, 2, 3, 4, 6, 12$; discarding the trivial subgroup (order 1) and the whole group (order 12) leaves exactly the orders $\{2, 3, 4, 6\}$.
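The divisor correspondence can be checked in a few lines. The sketch below lists, for each divisor $d$ of 12, the subgroup generated by $12/d$, and then confirms that no other subgroups arise from any generator. The function name `generated_by` is my own.

```python
N = 12

def generated_by(g, n=N):
    """The cyclic subgroup of Z_n generated by g, as a sorted list."""
    return sorted({(g * i) % n for i in range(n)})

divisors = [d for d in range(1, N + 1) if N % d == 0]

# For each divisor d there is a subgroup of order d, generated by N / d.
for d in divisors:
    print(f"order {d:2d}  orbit size {N // d:2d}  {generated_by(N // d)}")

# Every element generates one of these, so there are no further subgroups.
distinct = {tuple(generated_by(g)) for g in range(N)}
print(len(distinct), len(divisors))
```

The orders 1 and 12 correspond to sets with no transpositional symmetry and to the empty and full chromatic sets, which is why only the middle four rows matter for the modes.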

Each row of the table maps onto a level in Messiaen’s system:

Mode 1 (whole-tone scale): stabiliser $\langle T_2 \rangle$, 2 transpositions
Mode 2 (octatonic scale): stabiliser $\langle T_3 \rangle$, 3 transpositions
Mode 3: stabiliser $\langle T_4 \rangle$, 4 transpositions
Modes 4 – 7: stabiliser $\langle T_6 \rangle$, 6 transpositions each
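The assignment of modes to stabilisers can be verified directly. A caveat: only Modes 1 and 2 are written out above, so the pitch-class content of Modes 3 to 7 in the sketch below is my own transcription of each mode in its first transposition on C, derived from the usual interval patterns, and should be read as an assumption.

```python
# Assumption: pitch-class content of each mode in its first transposition (on C).
# Only Modes 1 and 2 are spelled out in the text; the rest are transcribed here.
MODES = {
    1: {0, 2, 4, 6, 8, 10},
    2: {0, 1, 3, 4, 6, 7, 9, 10},
    3: {0, 2, 3, 4, 6, 7, 8, 10, 11},
    4: {0, 1, 2, 5, 6, 7, 8, 11},
    5: {0, 1, 5, 6, 7, 11},
    6: {0, 2, 4, 5, 6, 8, 10, 11},
    7: {0, 1, 2, 3, 5, 6, 7, 8, 9, 11},
}

def stabiliser(pcs):
    """Transposition levels that map the set onto itself."""
    return [n for n in range(12) if {(x + n) % 12 for x in pcs} == pcs]

summary = {}
for number, pcs in MODES.items():
    stab = stabiliser(pcs)
    summary[number] = (stab, 12 // len(stab))
    print(f"Mode {number}: stabiliser {stab}, {12 // len(stab)} transpositions")
```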

The subgroup lattice of $\mathbb{Z}_{12}$ — its Hasse diagram of containment relationships — maps directly onto the hierarchy of Messiaen’s modes. The more symmetric the stabiliser subgroup, the fewer distinct transpositions the mode admits.

The containment relations are: $\langle T_2 \rangle \supset \langle T_4 \rangle$, $\langle T_2 \rangle \supset \langle T_6 \rangle$, and $\langle T_3 \rangle \supset \langle T_6 \rangle$. Correspondingly, Mode 1 (stabiliser $\langle T_2 \rangle$, order 6) is “more limited” than Mode 3 (stabiliser $\langle T_4 \rangle$, order 3), in the sense that $\langle T_4 \rangle \subset \langle T_2 \rangle$: every transposition that fixes Mode 3 also fixes Mode 1.

Why Exactly Seven Modes?

Messiaen was not enumerating all pitch-class sets with non-trivial stabilisers — there are many more than seven. At the level of the stabiliser $\langle T_6 \rangle$, for example, there are numerous pitch-class sets invariant under the tritone transposition $T_6$: any set $S$ such that $S = S + 6$ qualifies. Some of these sets are large (ten pitch classes), some are small (two pitch classes), some are musically coherent and some are not.
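How many is "many more"? With only $2^{12} = 4096$ pitch-class sets, the question can be settled by enumeration. The sketch below encodes each set as a 12-bit mask (an encoding of my own choosing) and tallies the sets by the order of their stabiliser.

```python
from collections import Counter

def transpose_mask(mask, n):
    """Rotate a 12-bit mask; bit x set means pitch class x is in the set."""
    return ((mask << n) | (mask >> (12 - n))) & 0xFFF

def stabiliser_order(mask):
    """Number of transposition levels that leave the set unchanged."""
    return sum(1 for n in range(12) if transpose_mask(mask, n) == mask)

# Tally all 2^12 pitch-class sets by the order of their stabiliser.
tally = Counter(stabiliser_order(mask) for mask in range(1 << 12))

for order in sorted(tally):
    # Sets with stabiliser order m fall into orbits of size 12 / m.
    classes = tally[order] * order // 12
    print(f"stabiliser order {order:2d}: {tally[order]:4d} sets, "
          f"{classes:3d} up to transposition")
```

The tally gives 54 sets whose stabiliser is exactly $\langle T_6 \rangle$, which is nine sets up to transposition. Across all four non-trivial stabiliser types there are fifteen transposition classes, counting small sets such as the bare tritone.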

Messiaen selected seven that he found aesthetically and compositionally viable: scales of moderate cardinality, with a balance of interval types, that he could use as raw material for his harmonic language. The group theory explains the constraint (modes are possible only at the four stabiliser types listed above), not the selection (which specific sets Messiaen chose among the many that satisfy the constraint).

The question “why seven?” is therefore partly combinatorial and partly compositional. What is group-theoretically determined is the number of levels (four: orbit sizes 2, 3, 4, 6) and the impossibility of any mode with, say, five distinct transpositions (since 5 does not divide 12).

What Messiaen Knew — and Did Not Know

Messiaen described his modes in Technique de mon langage musical (1944). His account is entirely musical and phenomenological. He lists each mode by its interval sequence, notes how many transpositions it admits, and names the limitation a “charm.” The impossibility is for him a spiritual property, a form of harmonic stasis that he associated — as a devout Catholic — with divine eternity. A mode that cannot depart is, in his compositional theology, a glimpse of the unchanging.

He was not doing group theory. The mathematics itself was long established: the orbit–stabiliser theorem in its abstract form grew out of the work of Lagrange (1771), Cauchy (early 19th century), and Galois (1832). But these concepts were not part of music-theoretic discourse until Milton Babbitt’s work in the 1950s, and they were not formalised in the pitch-class set framework I have used here until Allen Forte’s The Structure of Atonal Music (1973) and David Lewin’s Generalized Musical Intervals and Transformations (1987).

What Messiaen had was a musician’s ear for symmetry. He could hear that the modes were closed, without having the algebraic vocabulary to explain why. The group theory shows that he was correct, and why he was correct with a precision that no amount of phenomenological description could provide.

From Messiaen to Lewin

Lewin’s transformational theory (1987) generalises the $\mathbb{Z}_{12}$ framework to arbitrary musical spaces. A Generalized Interval System is a triple $(S, G, \mathrm{int})$ where $S$ is a set of musical objects, $G$ is a group, and $\mathrm{int} : S \times S \to G$ assigns an interval to each ordered pair of objects in a way that is consistent with the group structure.
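The simplest instance is the pitch-class space itself, with $S = G = \mathbb{Z}_{12}$ and $\mathrm{int}(s, t) = t - s \bmod 12$. In the sketch below, "consistent with the group structure" is read as two conditions: intervals compose, and from any starting object each interval leads to exactly one target. That reading is my paraphrase of the usual definition, not a quotation.

```python
from itertools import product

S = range(12)  # musical objects: the twelve pitch classes
G = range(12)  # interval group: Z_12 under addition mod 12

def interval(s, t):
    """Directed interval from s to t, in semitones upward."""
    return (t - s) % 12

# Condition 1: intervals compose, int(r, s) + int(s, t) = int(r, t).
composes = all((interval(r, s) + interval(s, t)) % 12 == interval(r, t)
               for r, s, t in product(S, repeat=3))

# Condition 2: from any s, each interval g leads to exactly one t.
unique_targets = all(sum(1 for t in S if interval(s, t) == g) == 1
                     for s, g in product(S, G))

print(composes, unique_targets)
```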

This framework treats musical transformations — not just pitch-class transpositions but rhythmic augmentations, timbral shifts, any structurally defined operation — as elements of a group. The mathematics does not privilege any particular musical parameter; it applies wherever a transformation group acts on a set of musical objects.

Neo-Riemannian theory, which emerged from Lewin’s work in the 1980s and 1990s and was systematised by Cohn (1998), applies this framework to triadic transformations: the operations L, P, and R, which map each major or minor triad to its leading-tone exchange, its parallel, and its relative respectively. The group generated by L, P, and R on the set of 24 major and minor triads is isomorphic to $D_{12}$, the same dihedral group as the transposition–inversion group introduced earlier, but acting on a different musical space.
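The order of that group can be checked by brute force. The sketch below encodes a triad as a (root, quality) pair, which is my own encoding, builds L, P and R as permutations of the 24 triads, and closes them under composition.

```python
# A triad is (root pitch class, quality), with quality "M" (major) or "m" (minor).
TRIADS = [(root, q) for q in "Mm" for root in range(12)]

def P(t):  # parallel: C major <-> C minor
    root, q = t
    return (root, "m" if q == "M" else "M")

def R(t):  # relative: C major <-> A minor
    root, q = t
    return ((root + 9) % 12, "m") if q == "M" else ((root + 3) % 12, "M")

def L(t):  # leading-tone exchange: C major <-> E minor
    root, q = t
    return ((root + 4) % 12, "m") if q == "M" else ((root + 8) % 12, "M")

def as_perm(f):
    """Encode an operation as a permutation of triad indices."""
    return tuple(TRIADS.index(f(t)) for t in TRIADS)

def compose(a, b):
    """Apply b first, then a."""
    return tuple(a[b[i]] for i in range(24))

gens = [as_perm(f) for f in (L, P, R)]
group, frontier = set(gens), list(gens)
while frontier:
    a = frontier.pop()
    for g in gens:
        h = compose(a, g)
        if h not in group:
            group.add(h)
            frontier.append(h)

def order(a):
    """Smallest n >= 1 such that a composed with itself n times is the identity."""
    identity, power, n = tuple(range(24)), a, 1
    while power != identity:
        power, n = compose(power, a), n + 1
    return n

LR = compose(as_perm(L), as_perm(R))
print(len(group), order(LR))
```

The closure has 24 elements, and the composite of R followed by L has order 12. It plays the role of the rotation in the dihedral group, with L, P and R themselves acting as reflections of order 2.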

Emmanuel Amiot’s more recent work (2016) applies the discrete Fourier transform to pitch-class sets, using the DFT coefficients on $\mathbb{Z}_{12}$ as a continuous measure of a set’s similarity to the modes of limited transposition. The Fourier coefficients detect the algebraic symmetries that stabilisers measure discretely: a set with large coefficient at a frequency $k$ dividing 12 (in the DFT over $\mathbb{Z}_{12}$) is close, in a precise sense, to having the stabiliser $\langle T_{12/k} \rangle$.
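The exact version of this link is easy to check: if $T_m$ fixes a set, its coefficient $a_k$ must vanish unless $mk \equiv 0 \pmod{12}$. The sketch below computes the magnitudes $|a_k|$ with the convention $a_k = \sum_{x \in S} e^{-2\pi i k x / 12}$ (a normalisation I am assuming; other conventions differ only by sign or scale) and lists the frequencies at which each set has a non-zero coefficient.

```python
import cmath

def dft_magnitudes(pcs):
    """|a_k| for k = 0..11, with a_k = sum over x in the set of exp(-2*pi*i*k*x/12)."""
    return [abs(sum(cmath.exp(-2j * cmath.pi * k * x / 12) for x in pcs))
            for k in range(12)]

def support(pcs, tol=1e-9):
    """Frequencies k at which the coefficient is non-zero, up to rounding error."""
    return [k for k, m in enumerate(dft_magnitudes(pcs)) if m > tol]

WHOLE_TONE = {0, 2, 4, 6, 8, 10}
OCTATONIC = {0, 1, 3, 4, 6, 7, 9, 10}
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}

print(support(WHOLE_TONE))  # fixed by T_2
print(support(OCTATONIC))   # fixed by T_3
print(support(C_MAJOR))     # no transpositional symmetry
```

The whole-tone scale has non-zero coefficients only at $k = 0$ and $6$, the octatonic only at $k = 0, 4, 8$, while the major scale has energy at every frequency.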

The group-theoretic perspective has moved, over seventy years, from a marginal curiosity to the dominant mathematical framework in music theory. Messiaen’s modes, which once seemed like personal compositional idiosyncrasies, are revealed as structurally constrained: the possible stabiliser orders are fixed by the divisors of 12, and the orbit sizes that Messiaen’s ear discovered are exactly those that Lagrange’s theorem permits. Many pitch-class sets have non-trivial stabilisers; Messiaen chose seven that he found musically viable. The choice was his, but the limitation is an algebraic fact.

The charm of impossibilities is a theorem of group theory. And it is exactly as beautiful as Messiaen heard it to be.

References

Amiot, E. (2016). Music Through Fourier Space: Discrete Fourier Transform in Music Theory. Springer (Computational Music Science).
Babbitt, M. (1960). Twelve-tone invariants as compositional determinants. The Musical Quarterly, 46(2), 246–259. https://doi.org/10.1093/mq/XLVI.2.246
Cohn, R. (1998). Introduction to neo-Riemannian theory: A survey and a historical perspective. Journal of Music Theory, 42(2), 167–180. https://doi.org/10.2307/843871
Forte, A. (1973). The Structure of Atonal Music. Yale University Press.
Lewin, D. (1987). Generalized Musical Intervals and Transformations. Yale University Press. (Reissued Oxford University Press, 2007.)
Messiaen, O. (1944). Technique de mon langage musical. Alphonse Leduc. (English translation: Satterfield, J., 1956.)
Tymoczko, D. (2006). The geometry of musical chords. Science, 313(5783), 72–74. https://doi.org/10.1126/science.1126287
Tymoczko, D. (2011). A Geometry of Music: Harmony and Counterpoint in the Extended Common Practice. Oxford University Press.

Changelog

2026-01-14: Changed “cosets of $D_{12}$” to “cosets of $\mathbb{Z}_{12}$ (the transposition subgroup)” in the twelve-tone composition paragraph. $D_{12}$ (order 24) already includes both transpositions and inversions, yielding only 2 cosets in the full serial group. The four row forms {P, I, R, RI} correspond to 4 cosets of the transposition-only subgroup $\mathbb{Z}_{12}$ (order 12) in the full group of order 48.