From Oxide to Oversampling: The Physics of Recorded Sound

There is an argument that has been running in recording studios since roughly 1982, when the first commercially mastered compact discs appeared. On one side: analogue tape has warmth, depth, something the ear likes. On the other: digital audio is more accurate, lower noise, the measurements say so. The argument produces more heat than light, because most participants treat it as an aesthetic question — a matter of feeling, taste, preference. It is not. The difference between tape and digital audio is a physics difference, and the physics is specific enough to calculate.

The physics here turns out to be some of my favourite kind: it sits at the intersection of condensed matter, signal processing, and Fourier analysis, and it connects directly to why certain sounds are perceived as pleasant. This post walks through both sides. Part I is the ferromagnetic physics of magnetic tape and the harmonic structure of saturation distortion. Part II is the delta-sigma modulator and the engineering trick that achieves 24-bit dynamic range from a 1-bit comparator. Neither side of the debate is as simple as its partisans claim, and the physics of both is more interesting than the aesthetics argument they have been stuck in for forty years.

Part I: The Physics of Magnetic Tape

Ferromagnetic Recording

Magnetic recording tape is a thin polymer substrate coated with a layer of ferromagnetic particles suspended in a binder. For most of the twentieth century those particles were iron oxide — specifically $\gamma\text{-Fe}_2\text{O}_3$, gamma-phase ferric oxide — though chromium dioxide ($\text{CrO}_2$) and later metal-particle formulations with pure iron or iron-cobalt alloys were developed for higher coercivity and better high-frequency response. What all of these materials share is the key property of ferromagnetism: each particle is a small permanent magnet, a magnetic domain with a net magnetic moment that can be oriented by an external field and that will retain that orientation when the field is removed.

The recording process exploits this directly. The recording head is a toroidal electromagnet with a narrow gap. When audio-frequency current flows through the head’s coil, the field at the gap follows the current, and as the tape moves past at a fixed speed, successive particles along the tape length are aligned according to the instantaneous field at the moment they pass the gap. The result is a spatial encoding of the time-domain audio signal along the tape. On playback, the inverse process occurs: the moving pattern of magnetised particles generates a time-varying flux in the playback head’s core, which induces a voltage in the coil by Faraday’s law, reproducing the original current waveform.

So far this description is entirely linear. The head current maps to a field, the field maps to a magnetisation, the magnetisation maps back to a voltage. If all three relationships were linear, tape would be a near-perfect recording medium — limited only by particle noise and head gap frequency response. The nonlinearity comes from the second relationship in that chain, and it comes from the fundamental physics of how ferromagnetic materials respond to an applied field.

The B-H Curve and Hysteresis

The relationship between the applied magnetic field intensity $H$ (from the recording head, measured in A/m) and the resulting magnetic flux density $B$ in the tape (measured in tesla) is not linear. It follows a curve — actually a family of nested curves — known as the hysteresis loop, and its shape determines almost everything interesting about tape recording [3].

Starting from a demagnetised state and increasing $H$ from zero, the initial slope $dB/dH$ — the magnetic permeability $\mu$ — is relatively low. The domains in the material are oriented randomly and require a threshold of energy to begin reorienting. As $H$ increases further, the permeability rises, and there is a region of steep, approximately linear increase in $B$. Then, as $H$ continues to increase, the material saturates: progressively fewer unaligned domains remain, the slope falls, and eventually $dB/dH \to 0$ as all domains are aligned. The $B$-$H$ curve is S-shaped, and the saturation is irreversible in a specific sense: if you now reduce $H$ back toward zero, $B$ does not retrace the original path. It remains at a higher value — the remanence $B_r$ — and you must apply a reverse field of magnitude $H_c$, the coercivity, to bring $B$ back to zero. The loop formed by this cycle of magnetisation and demagnetisation is the hysteresis loop, and its area is proportional to the energy dissipated as heat per cycle.

The crucial feature for audio recording is what happens near the origin. A small audio signal, sitting near $H = 0$, does not experience a nicely linear region of the $B$-$H$ curve. The initial permeability is low, and there is an inflection point near zero: the slope increases as you move away from zero before the saturation region brings it back down again. This means that even at low recording levels, the transfer function from head current to tape magnetisation is nonlinear, and in a particular way — the response is symmetric under $H \to -H$, which means the dominant nonlinear term is even-order. Without some remedy, even a gentle sine wave would emerge from the playback head with significant even-harmonic content added. The signal would also sit in a region of the curve where the effective permeability depends on signal amplitude, making the recording level-dependent in an uncontrolled way. Something needed to be done about this, and the solution found in the 1940s is one of the more elegant pieces of applied physics in the history of the recording industry.

The Bias Signal

The solution is called AC bias, and its discovery is usually credited to Braunmühl and Weber at the German Reichs-Rundfunk-Gesellschaft around 1940, though there are earlier related patents. The idea is simple once stated: add a high-frequency signal — typically between 50 kHz and 150 kHz, well above the audio band — to the recording current before it drives the head. This bias signal has an amplitude large enough to drive the tape through multiple cycles of its B-H curve on each audio cycle, but it is filtered out of the playback signal by the tape’s own limited high-frequency response and by subsequent low-pass filtering.

The effect on the recording process is to linearise the transfer function. The operating point is no longer stationary near the inflection point at $H = 0$. Instead, it rides up and down the B-H curve rapidly many times per audio period, driven by the bias. The audio signal merely modulates the envelope of this rapid oscillation. The net magnetisation that remains after the tape leaves the head gap is the time average of many rapid traversals of the hysteresis loop, and this average tracks the audio signal with good linearity provided the signal level is modest. The bias amplitude and frequency are tuned carefully for each tape formulation — too little bias and the linearisation is incomplete; too much and the signal is undermodulated and the high-frequency response suffers as the bias begins to erase fine spatial patterns written by high-frequency audio. Getting the bias right is part of the alignment procedure for every analogue tape machine and part of why different tape formulations require different machine settings.

The result, for moderate recording levels, is a remarkably clean and linear recording medium. The nonlinear character of the B-H curve is effectively tamed by the bias trick, and the remaining imperfections are mostly second-order: azimuth errors, print-through, head bump, self-demagnetisation at short wavelengths. For practical purposes, a well-aligned analogue tape machine at moderate recording levels is a linear system.

Harmonic Generation at High Levels

At high recording levels — when the audio signal is large enough to push the operating point into the saturation region even after the bias has done its linearising work — the picture changes. The transfer function from input current to output magnetisation becomes genuinely nonlinear, and the harmonic content of the distortion becomes the central question.

The standard framework is a Taylor expansion of the transfer function around the operating point:

$$y(t) = a_1 x(t) + a_2 x^2(t) + a_3 x^3(t) + a_4 x^4(t) + \cdots$$

where $x(t)$ is the input signal (the audio current), $y(t)$ is the output (the magnetisation recorded on tape), and the coefficients $a_n$ are determined by the shape of the B-H curve near saturation. For a pure tone $x(t) = A \sin(\omega t)$, the higher-order terms generate harmonics in a calculable way.

The second-order term gives:

$$a_2 x^2(t) = a_2 A^2 \sin^2(\omega t) = \frac{a_2 A^2}{2}\bigl(1 - \cos 2\omega t\bigr)$$

This is a DC offset plus a component at $2\omega$ — the second harmonic, one octave above the fundamental.

The third-order term gives:

$$a_3 x^3(t) = a_3 A^3 \sin^3(\omega t) = a_3 A^3 \left(\frac{3}{4}\sin\omega t - \frac{1}{4}\sin 3\omega t\right)$$

The $\frac{3}{4}$ piece adds to (or subtracts from) the fundamental depending on the sign of $a_3$; the $-\frac{1}{4}$ piece is a third harmonic at $3\omega$, one octave and a fifth above the fundamental.

Carrying through to fourth order:

$$a_4 x^4(t) = \frac{a_4 A^4}{8}\bigl(3 - 4\cos 2\omega t + \cos 4\omega t\bigr)$$

which contributes additional DC, a component at $2\omega$, and a fourth harmonic at $4\omega$.

Collecting the terms through fourth order, the output is approximately:

$$y(t) \approx \left(a_1 + \frac{3a_3 A^2}{4}\right)A\sin\omega t - \frac{a_2 A^2}{2}\cos 2\omega t - \frac{a_3 A^3}{4}\sin 3\omega t + \cdots$$

The important observation is about which harmonics dominate and what they sound like. The B-H curve of a ferromagnetic material near saturation is approximately symmetric: the saturation behaviour for positive $H$ mirrors that for negative $H$. A symmetric nonlinearity has $a_2 = a_4 = 0$ (all even coefficients vanish by symmetry), and only odd harmonics are generated. But at moderate levels, just before full saturation, the symmetry of the B-H loop as traversed by the biased signal is not perfect, and the even-order terms are nonzero — though small. This gives tape its characteristic distortion signature: at moderate saturation levels, the even harmonics ($2\omega$, $4\omega$) dominate; at heavy saturation, the odd harmonics ($3\omega$, $5\omega$) appear more strongly.

The perceptual consequence of this is the crux of the “analogue warmth” story. The second harmonic is the octave of the fundamental. The fourth harmonic is the double octave. These are, in Western harmonic practice and in the physics of vibrating strings, the most consonant possible intervals. Adding even harmonics at low amplitude to a fundamental makes the sound fuller and richer without introducing beating or dissonance. Odd harmonics — particularly the fifth (at $5\omega$, a major third above the double octave) and the seventh (a flattened seventh above the double octave) — are less consonant relative to the fundamental and at high amplitude produce the harsh, buzzy character associated with heavy distortion or the deliberate aggression of a fuzz pedal.

There is one more effect worth naming: the saturation is a soft knee. The B-H curve does not have a sharp corner at saturation — it curves gradually from the linear region into the flat-topped saturation region. This means that transient signals — percussive attacks, consonant onsets — that briefly exceed the nominal recording level are not hard-clipped but gently compressed. Their peaks are rounded by the shape of the B-H curve. Engineers and producers who record through tape often describe this as the machine “breathing” or as a pleasing “gluing” of transients. The physics is simple: the soft-knee transfer function applies more gain reduction to instantaneous peaks than to the sustained body of the signal, functioning as a fast, musically transparent dynamic compressor for any material that approaches saturation.

Part II: The Physics of Delta-Sigma Conversion

Nyquist-Rate ADC and Its Limits

The straightforward approach to analogue-to-digital audio conversion samples the signal at a rate just above twice the highest audio frequency — the Nyquist rate — using a quantiser with enough bits to achieve the desired dynamic range. For CD-quality audio, the sampling rate is 44.1 kHz (slightly above $2 \times 20{,}000$ Hz) and the word length is 16 bits. The dynamic range of a $b$-bit PCM system is, to a good approximation:

$$\text{SNR} \approx 6.02b + 1.76 \text{ dB}$$

so 16 bits gives approximately $6.02 \times 16 + 1.76 \approx 98$ dB, which matches the dynamic range of the best analogue tape and is well above the approximately 70 dB achievable with the noise floor of typical studio tape at 15 ips [4].

The engineering problem with a straightforward Nyquist-rate ADC is the anti-aliasing filter. Before sampling, all content above $f_s/2 = 22.05$ kHz must be removed. If it is not, energy at frequency $f > f_s/2$ aliases into the audio band as a spurious component at $f_s - f$, which is inaudible in origin but very much audible in its alias. To achieve 98 dB of alias suppression — matching the 16-bit dynamic range — the filter must attenuate signals at 22.05 kHz by 98 dB relative to signals at 20 kHz. The transition band is only 2.05 kHz wide. That requires a very high-order analogue filter — typically seventh-order elliptic or Chebyshev — and such filters have significant phase distortion within the audio band, particularly at frequencies near the passband edge. In 1982, building this filter precisely, cheaply, and repeatably in consumer hardware was a genuine engineering challenge. The filters introduced audible phase and amplitude ripple that the original measurements had not anticipated and that contributed to early criticisms of the CD sound.

Oversampling

The delta-sigma ($\Sigma\Delta$) ADC architecture was developed to sidestep the steep-filter problem entirely, and its adoption in consumer audio from the late 1980s onwards largely resolved the anti-aliasing filter debate [1]. The core idea is oversampling: instead of sampling at 44.1 kHz with 16 bits, the $\Sigma\Delta$ converter samples at $M \times 44.1$ kHz — where $M$ is the oversampling ratio, typically 64 in early audio converters, giving $64 \times 44.1 = 2.8224$ MHz — with a 1-bit quantiser. The anti-aliasing filter now needs to attenuate everything above 1.4112 MHz before sampling. Its transition band runs from 20 kHz to 1.4112 MHz, a ratio of roughly 70:1. This is easy: a simple, cheap, first- or second-order RC filter suffices, with negligible phase distortion anywhere in the audio band. The price paid is that the quantiser is now only 1 bit, and a 1-bit quantiser has terrible resolution on its own.

To understand what oversampling buys even before any clever signal processing, consider the quantisation noise floor. For a uniform quantiser with step size $\Delta$, the quantisation noise power is $P_q = \Delta^2/12$, and this noise is spread approximately uniformly from 0 to $f_s/2$. The noise power spectral density is $P_q / (f_s/2)$. After oversampling by a factor of $M$ — so that the effective Nyquist band runs from 0 to $f_{\text{audio}} = f_s/(2M)$ — the in-band noise power is:

$$P_{\text{in-band}} = \frac{P_q}{f_s/2} \cdot f_{\text{audio}} = \frac{P_q}{f_s/2} \cdot \frac{f_s}{2M} = \frac{P_q}{M}$$

Each doubling of $M$ halves the in-band noise power, an improvement of 3 dB, equivalent to half a bit of resolution. At 64× oversampling this gives 18 dB, or three extra bits — useful, but not enough to get from a 1-bit quantiser to 16-bit performance. We need something more.

Noise Shaping

The second ingredient — and the one that makes $\Sigma\Delta$ conversion genuinely remarkable — is noise shaping. Rather than spreading quantisation noise uniformly in frequency, we can engineer its spectral distribution so that almost all the noise power sits above the audio band, where it is removed by a digital low-pass filter (the decimation filter) at the output.

A first-order $\Sigma\Delta$ modulator achieves this by a feedback loop. At each sample step, the quantiser takes the running integral of the difference between the input signal and the previous quantised output. More precisely: the quantisation error $e_n = y_n - \hat{x}_n$ (where $\hat{x}_n$ is the input to the quantiser and $y_n$ is the 1-bit output) is fed back and subtracted from the next input before integration. This is the integrator-feedback structure that gives the modulator its name: $\Sigma$ for the integrating summation, $\Delta$ for the difference.

In the $z$-domain, this feedback structure gives the quantisation noise a transfer function of:

$$N(z) = 1 - z^{-1}$$

that is, the noise at time $n$ is the current error minus the previous error — a first-difference operation. In the frequency domain, substituting $z = e^{j 2\pi f / f_s}$:

$$\bigl|N(f)\bigr|^2 = \left|1 - e^{-j 2\pi f / f_s}\right|^2 = 4\sin^2\!\left(\frac{\pi f}{f_s}\right)$$

For frequencies well below the sampling rate, $f \ll f_s$, the small-angle approximation gives:

$$\bigl|N(f)\bigr|^2 \approx \left(\frac{2\pi f}{f_s}\right)^2$$

The noise power spectral density rises as $f^2$ — it is heavily suppressed at low frequencies and pushed up toward $f_s/2$. Integrating this shaped noise over the audio band $[0, f_{\text{audio}}]$ and comparing to the flat-spectrum case, the in-band SNR improvement for a first-order modulator scales as $M^3$ rather than $M^1$: every doubling of oversampling ratio gives 9 dB improvement (1.5 bits) instead of 3 dB. At 64× oversampling — six doublings — a first-order modulator recovers approximately 54 dB, or 9 effective bits.

A second-order modulator applies the noise-shaping filter twice, giving $|N(f)|^2 \propto f^4$ and an SNR gain scaling as $M^5$: 15 dB per octave of oversampling. At 64× — again six doublings — this recovers approximately 90 dB, or 15 effective bits. Modern high-performance audio ADCs use fifth- to seventh-order modulators operating at 128× oversampling or higher. The in-band noise floor drops to levels corresponding to 20–24 effective bits — entirely from a 1-bit hardware comparator, with all the resolution coming from the noise shaping and the subsequent digital decimation filter.

The following table illustrates the SNR gain achievable at practical oversampling ratios:

Modulator order	Oversampling ratio	SNR gain	Effective bits gained
1st order	64×	54 dB	9
2nd order	64×	90 dB	15
5th order	128×	~120 dB	~20

The 5th-order row deserves a moment’s attention. A single-bit comparator — a device that outputs only 1 or 0, with no analogue subtlety whatsoever — combined with oversampling and noise shaping, achieves the resolution of a 20-bit Nyquist-rate ADC and is doing so using a simple digital feedback loop and an analogue integrator that can be fabricated cheaply on a CMOS chip. This is, I think, one of the more quietly stunning pieces of engineering in consumer electronics, and it goes entirely unnoticed because the CD player it lives inside is now considered mundane.

There is a subtlety worth adding for completeness. Real $\Sigma\Delta$ modulators of order three and above are potentially unstable — the noise-shaping loop can become unstable for large input signals, producing limit cycles or tonal artefacts. Managing this stability is a significant part of the design problem and involves either restricting the input range, adding nonlinear stability control, or using multi-bit internal quantisers (which reduce the quantisation step and ease the stability constraint while retaining most of the noise-shaping benefit). The multi-bit approach also addresses a related issue: the ideal 1-bit DAC in the feedback loop is inherently linear (there are only two levels, so there is no differential nonlinearity), but multi-bit internal DACs must be trimmed or calibrated to avoid nonlinearity in the feedback path corrupting the noise shaping. These engineering details are discussed thoroughly in Norsworthy, Schreier, and Temes [5], which remains the standard reference.

The digital audio infrastructure that delta-sigma conversion enabled — clean, cheap, phase-linear converters without steep analogue filters — also made digital audio workable in latency-sensitive applications like live performance. For a discussion of why latency matters so much in network music performance and how it shapes system design, see my earlier post on NMP latency and the physics of musical timing.

The Irony of the Comparison

Both tape saturation and delta-sigma conversion are, at root, about the same problem: how to manage the relationship between a signal and the finite resolution of the medium storing it. Tape manages the problem physically and somewhat accidentally — the ferromagnetic B-H curve happens to generate even harmonics that are consonant with the recorded signal, and the bias trick linearises the response well enough that the distortion only becomes audible when the engineer deliberately pushes into saturation. Delta-sigma manages the problem mathematically and deliberately — quantisation noise is redistributed in frequency by a designed feedback loop so that it falls outside the audible band.

Neither approach is perfect, and neither is neutral. Tape adds signal-correlated harmonic distortion whose spectral content depends on recording level and which compresses transients in a way that changes the perceived dynamics. Digital audio, even with delta-sigma conversion, has its own imperfections: idle-channel noise from the modulator, potential for tonal limit-cycle artefacts at specific input levels, and the abrupt onset of hard clipping at full scale — which, unlike tape saturation, is symmetrical and rapid and adds all harmonics simultaneously, giving the harsh, unpleasant character that digital overloads are known for. The soft-knee vs. hard-clip distinction is real and audible, and it is probably the most defensible technical basis for the claim that analogue tape handles transient overloads more graciously.

What is not defensible is the claim that one medium is inherently more musical than the other, or that digital audio lacks something fundamental that tape possesses. They are differently imperfect. The imperfections of tape happen to sit at harmonic relationships that Western ears, shaped by a tradition of music built on those same harmonic intervals, find pleasing. The imperfections of digital audio are not at pleasing harmonic intervals; they are wideband quantisation noise (before shaping) or ultrasonic shaped noise (after), and a sharp cliff at full scale. Different physics, different perceptual character.

A Personal Note

I spent a long time thinking the tape versus digital debate was mostly audiophile mythology — a community of enthusiasts rationalising the warmth of nostalgia as the warmth of oxide particles. The physics is more interesting than that, and doing the calculation changed my view. The second-harmonic content of tape saturation is not an accident or a romantic story; it is what you get when you push a symmetric nonlinearity with an audio sine wave, and the reason it sounds pleasant is not arbitrary but is grounded in the physics of consonance and the harmonic series. The delta-sigma converter is not a mundane commodity chip but a genuinely elegant solution to an otherwise intractable filter-design problem, and the fact that it achieves 24-bit resolution from a 1-bit comparator by spectral redistribution of noise is the kind of result that should get more attention in physics education.

Both technologies deserve better than the aesthetics argument they have been fighting in for forty years. The tools to understand them are not exotic — Taylor series, Fourier analysis, the z-transform, and the basic physics of ferromagnetism — and the reward is a clear-eyed picture of what is actually going on inside two of the most consequential inventions in the history of recorded music. If you are interested in related mathematics underlying other aspects of music, the posts on Euclidean rhythms and Messiaen’s modes and group theory cover the combinatorial and algebraic structures in rhythm and pitch that sit alongside the physics discussed here.

References

[1] Candy, J. C., & Temes, G. C. (Eds.). (1992). Oversampling Delta-Sigma Data Converters: Theory, Design, and Simulation. IEEE Press.

[2] Reiss, J. D., & McPherson, A. (2015). Audio Effects: Theory, Implementation and Application. CRC Press.

[3] Bertram, H. N. (1994). Theory of Magnetic Recording. Cambridge University Press.

[4] Pohlmann, K. C. (2010). Principles of Digital Audio (6th ed.). McGraw-Hill.

[5] Norsworthy, S. R., Schreier, R., & Temes, G. C. (Eds.). (1997). Delta-Sigma Data Converters: Theory, Design, and Simulation. IEEE Press.

Changelog

2026-01-14: Updated the interval description for the 7th harmonic to “above the double octave.” The 7th harmonic (7f) sits between the double octave (4f) and the triple octave (8f).

Part I: The Physics of Magnetic Tape#

Ferromagnetic Recording#

The B-H Curve and Hysteresis#

The Bias Signal#

Harmonic Generation at High Levels#

Part II: The Physics of Delta-Sigma Conversion#

Nyquist-Rate ADC and Its Limits#

Oversampling#

Noise Shaping#

The Irony of the Comparison#

A Personal Note#

References#

Changelog#