The Milky Way Is a Gravitational Wave Detector

On 28–29 June 2023, four independent research collaborations published papers simultaneously: NANOGrav (North American Nanohertz Observatory for Gravitational Waves), the EPTA (European Pulsar Timing Array), the PPTA (Parkes Pulsar Timing Array, Australian), and InPTA (Indian Pulsar Timing Array). Each announced essentially the same result: evidence for a stochastic gravitational wave background at nanohertz frequencies. The simultaneous coordinated publication is itself telling — in physics, that kind of coordination usually signals that each group needed the others to make the claim credible. If one group had published alone, the community would have been skeptical. Four independent datasets saying the same thing is a different matter.

The detector for all four collaborations was not a machine. It was the Milky Way — specifically, a collection of millisecond pulsars scattered across the galaxy, separated by thousands of light-years. The arms of this instrument make LIGO look microscopic by comparison, and they are pointed in every direction at once.

There is a professional habit, well established in physics, of finding the scale of things either comforting or terrifying. This one is genuinely awe-inspiring.

The scale problem

To detect a gravitational wave, your instrument needs to be roughly comparable in size to the wavelength you are trying to observe. This is not a hard rule — LIGO’s 4-km arms are much shorter than the wavelengths it detects — but the sensitivity of any interferometer scales with arm length, so the constraint matters practically even when it is not absolute.

LIGO detects gravitational waves in the frequency range of roughly 10–1000 Hz. A wave at 100 Hz has a wavelength of

$$\lambda = \frac{c}{f} = \frac{3 \times 10^8 \text{ m/s}}{100 \text{ Hz}} \approx 3000 \text{ km},$$

about a quarter of the Earth’s diameter. LIGO’s 4-km Fabry-Pérot arm cavities are short compared to the wavelength, but the cavities fold the light hundreds of times to achieve an effective path length of about 1600 km, and the interferometer is exquisitely sensitive to differential arm-length changes of order $10^{-19}$ m — a fraction of the proton radius. The physics is heroic. But it works at the scale of stellar-mass binary mergers: neutron star–neutron star, neutron star–black hole, stellar-mass black hole–black hole. Those systems radiate at audible frequencies. LIGO famously converts its strain signals to sound, and the chirps do sound like something from a science fiction film.

Now consider pulsar timing arrays. The targets are nanohertz gravitational waves: frequencies of order $f \sim 10^{-9}$ to $10^{-8}$ Hz. At $10^{-9}$ Hz, the wavelength is

$$\lambda = \frac{c}{f} \approx \frac{3 \times 10^8 \text{ m/s}}{10^{-9} \text{ Hz}} = 3 \times 10^{17} \text{ m} \approx 10 \text{ pc} \approx 33 \text{ light-years}.$$

To detect oscillations at these frequencies, you need arms measured in light-years. Tens to thousands of light-years. The nearest millisecond pulsars are a few hundred light-years away; the most distant ones used in PTA arrays are several thousand light-years distant. By accident of the galaxy’s size and the distribution of recycled pulsars within it, we happen to live inside an instrument of approximately the right dimensions to detect nanohertz gravitational waves. The galaxy did not plan this. We got lucky.

Millisecond pulsars as cosmic clocks

A gravitational wave detector is only as good as its clock. LIGO measures differential length changes in its arm cavities using the interference of laser light — effectively using the constancy of the speed of light as a metronome. Pulsar timing arrays use a different clock: the rotation of a neutron star.

Ordinary pulsars are the collapsed remnants of massive stars, born in supernova explosions. A neutron star of roughly 1.4 solar masses is compressed into a sphere about 10 km across, rotating rapidly and radiating beams of radio waves from near its magnetic poles. The rotation periods of young pulsars are typically of order seconds, and they spin down on timescales of millions of years as they lose rotational energy to magnetic dipole radiation. They are not particularly stable clocks — the spin-down is uneven, and they occasionally “glitch,” suddenly spinning up by a small amount.

Millisecond pulsars (MSPs) are a different beast entirely. They have been recycled: spun up to near-millisecond rotation periods by accreting matter from a binary companion star. The accretion process deposits angular momentum onto the neutron star, increasing its spin rate dramatically, while simultaneously burying and weakening its magnetic field. The typical MSP has a surface magnetic field of order $10^4$–$10^5$ G, four to five orders of magnitude weaker than a young pulsar’s $10^9$–$10^{12}$ G. Since the spin-down torque scales as $B^2$, the weaker field means the MSP loses rotational energy far more slowly. Once the accretion stops — when the companion has exhausted its transferable mass — the MSP is left spinning rapidly and stably, with a spin-down rate of order $10^{-20}$ s/s.

The result is rotational stability competitive with terrestrial atomic clocks. PSR J0437$-$4715, one of the best-timed MSPs, has a rotation period of $P \approx 5.76$ ms and a timing residual — the scatter of individual pulse arrival times around the best-fit timing model — of order 100 ns over decades of observation. For a pulsar completing about 174 rotations per second, a residual of 100 ns over a baseline of years is remarkable. The fractional frequency stability is $\delta P / P \sim 10^{-14}$ or better. These are not merely good clocks; they are among the most stable periodic phenomena known to physics.

The timing model accounts for everything we know about the pulsar’s environment: its spin period and spin-down rate, its proper motion across the sky, parallax (from which we get the distance), Shapiro delay from any companions, dispersion measure variations in the interstellar medium, and more. After subtracting all modelled effects, what remains are timing residuals — small, unexplained deviations in the pulse arrival times. If a gravitational wave passes through, it will appear in those residuals.

How a gravitational wave shifts a pulse arrival time

A gravitational wave is a propagating perturbation of the spacetime metric. In the transverse-traceless (TT) gauge, a wave propagating along the $z$-direction perturbs the metric as

$$ds^2 = -c^2 dt^2 + \bigl(1 + h_+\bigr)dx^2 + \bigl(1 - h_+\bigr)dy^2 + 2h_\times \, dx \, dy + dz^2,$$

where $h_+$ and $h_\times$ are the two polarisation amplitudes, both functions of $t - z/c$. As the wave passes, the proper distances in the $x$- and $y$-directions oscillate: space itself is being stretched and squeezed at the wave frequency.

A radio pulse travelling from a pulsar to Earth traverses this oscillating spacetime. The proper path length changes, and so does the travel time. The timing residual induced in a pulsar in direction $\hat{n}$ by a gravitational wave with wavevector $\hat{k}$ is

$$R(t) \propto \int_0^t dt' \, \frac{\Delta\nu(t')}{\nu} \propto \int_0^t dt' \, h(t', \hat{n}, \hat{k}),$$

where $h$ is an appropriate contraction of the metric perturbation with the geometry of the Earth-pulsar baseline. The key point: the timing residual is the time integral of the metric strain. For a sinusoidal wave at frequency $f$, the residual oscillates at the same frequency but with amplitude suppressed by a factor of $1/(2\pi f T_{\rm obs})$ relative to what one might naively expect — which is partly why PTA analysis is technically demanding.

For a single pulsar, a timing residual tells you that something disturbed the spacetime between Earth and the pulsar — but it could be a systematic in the timing model, interstellar medium fluctuations, or intrinsic pulsar noise. You cannot claim a gravitational wave detection from one pulsar alone. What you need is the correlation between many pulsars.

The Hellings-Downs curve

Here is the central idea of pulsar timing array science. Consider two pulsars in directions $\hat{n}_a$ and $\hat{n}_b$, separated on the sky by an angle $\theta$ such that $\cos\theta = \hat{n}_a \cdot \hat{n}_b$. Both are embedded in the same stochastic gravitational wave background — a superposition of waves arriving from all directions, at all frequencies in the nanohertz band, with random phases and amplitudes. The timing residuals of both pulsars will be perturbed by this background. The question is: what is the expected cross-correlation between their residuals?

Hellings and Downs (1983) computed this, assuming the background is isotropic (equal power from all directions), unpolarised, and stationary. The answer is now called the Hellings-Downs (HD) curve:

$$\Gamma(\theta) = \frac{3}{2} \left(\frac{1-\cos\theta}{2}\right) \ln\!\left(\frac{1-\cos\theta}{2}\right) - \frac{1}{4}\left(\frac{1-\cos\theta}{2}\right) + \frac{1}{2}.$$

Let me unpack the features of this function:

At $\theta = 0$ (the same pulsar, or two pulsars in the same direction): $\Gamma(0) = 1/2$, maximum positive correlation. This makes sense — both pulsars see the same wave.
At $\theta \approx 50°$: the curve crosses zero and turns negative.
At $\theta = \pi/2$ (pulsars at right angles): $\Gamma(\pi/2) \approx -0.15$, near the minimum of the curve.
At $\theta = \pi$ (antipodal pulsars, opposite directions on the sky): $\Gamma(\pi) = 1/4$. Positive correlation even for pulsars in opposite directions — counterintuitive but correct.
In between, the curve dips negative (anticorrelated) for angles roughly $50°$–$120°$.

The shape is uniquely quadrupolar. It arises directly from the spin-2 tensor nature of gravitational waves. A scalar perturbation (like a monopole clock error common to all pulsars — such as an error in the terrestrial time standard used to timestamp the observations) would produce a flat, angle-independent correlation. A dipole perturbation (like an error in Earth’s ephemeris, or a systematic in our knowledge of Earth’s position) would produce a dipolar, $\cos\theta$ pattern. Only spin-2 tensor radiation produces the Hellings-Downs shape.

This is why the HD curve is the smoking gun. If you observe cross-correlations between pulsar timing residuals that match this specific, non-trivial curve as a function of angular separation — something that dips negative around 90° and recovers to a positive value at 180° — you have direct evidence that a tensor gravitational wave background is responsible. No other known astrophysical systematic produces this pattern.

The difficulty is statistical. You need enough pulsars, spread over a wide range of sky angles, each timed with sufficient precision, over a long enough baseline, that you can measure this correlation function with confidence. That has been the programme of the PTA collaborations since the 1990s. In June 2023, they had enough.

The 2023 evidence

The NANOGrav 15-year dataset (Agazie et al., 2023) comprises 68 millisecond pulsars observed for up to 15 years, with an average of roughly 2200 timing observations per pulsar. The dataset represents an enormous investment of telescope time — primarily at the Arecibo Observatory (until its collapse in December 2020) and the Green Bank Telescope.

The analysis found an excess of low-frequency noise common to many pulsars, consistent with a power-law spectral shape. More importantly, when the cross-correlations between all pairs of pulsars were computed and binned by angular separation, the result was consistent with the Hellings-Downs curve. The statistical significance of the HD correlation — that is, the evidence that the spatial correlation pattern matches the quadrupolar prediction rather than some isotropic or zero-correlation model — was 3–4$\sigma$ depending on the analysis method and prior assumptions. The collaboration carefully described this as “evidence for” rather than “detection of” a gravitational wave background, following community conventions (detection would conventionally require $\geq 5\sigma$).

Simultaneously, the EPTA published its second data release (Antoniadis et al., 2023), the PPTA published its results (Reardon et al., 2023), and InPTA contributed its analysis. A fifth collaboration, the Chinese Pulsar Timing Array (CPTA), also published consistent results (Xu et al., 2023). Each group used independent datasets, different telescopes, different software pipelines, different statistical methodologies. All found the same thing.

The fact that five groups independently recovered a consistent signal with approximately the right spectral shape and approximately the correct spatial correlations is the argument for reality. Any single group’s result could be explained by a systematic error in their data or analysis. Five groups with independent data and methods converging on the same result is much harder to explain as coincidence.

The combined interpretation is clear: something is producing a stochastic background of nanohertz gravitational waves permeating the galaxy, with spatial correlations consistent with the tensor quadrupole signature predicted by general relativity. The Milky Way is a gravitational wave detector, and it has measured something.

What is making the noise?

The million-dollar question. The observed signal has a characteristic spectral shape: the power spectral density of the timing residuals scales approximately as $S(f) \propto f^{-13/3}$, or equivalently, the gravitational wave energy density spectrum

$$\Omega_{\rm GW}(f) \propto f^{2/3}.$$

This $f^{2/3}$ scaling is the expected spectrum for an ensemble of circular binary systems driven purely by gravitational wave emission — specifically, a population of supermassive black hole binaries (SMBHBs) distributed across a wide range of masses and redshifts.

Here is the astrophysical picture. When two massive galaxies merge — an event that happens billions of times over cosmic history — their central supermassive black holes (each typically $10^7$–$10^{10}$ solar masses) do not immediately merge. First, they sink toward the centre of the merger remnant by dynamical friction against the stellar background, forming a loosely bound binary on scales of a parsec or so. The binary then hardens: three-body interactions with individual stars passing close to the binary extract orbital energy, driving the pair to smaller separations. Eventually, when the binary has hardened to a separation where gravitational wave emission dominates the energy loss, the pair inspirals and merges on a timescale of millions to billions of years.

The incoherent superposition of gravitational wave emission from the many billions of SMBHB systems across the observable universe — at all masses, at all orbital frequencies, at all redshifts — produces a stochastic background. It is, in a sense, cosmic traffic noise: no individual merger is detectable as a discrete event, but the combined hum from all of them is. The predicted spectral amplitude and shape from this population are broadly consistent with the observed signal, though the uncertainties on the astrophysical model are large enough that the agreement is not a precise test.

Alternative sources are possible. A first-order phase transition in the early universe (such as a QCD or electroweak phase transition) would produce a background of gravitational waves with a different spectral shape — more peaked and potentially harder than the SMBHB prediction. A network of cosmic strings, topological defects from symmetry-breaking phase transitions in the early universe, would produce yet another spectrum, approximately flat in $\Omega_{\rm GW}(f)$. Primordial gravitational waves from inflation are expected at far lower frequencies, near the CMB scale, but with a nearly scale-invariant spectrum that could contribute at nanohertz frequencies in some models. The 2023 data are most consistent with the SMBHB interpretation, but cannot rule out contributions from early-universe sources — or combinations of both. Future data with longer baselines and more pulsars will tighten the spectral measurements and may reveal deviations from the SMBHB prediction.

There is also the tantalising prospect of eventually resolving individual SMBHB systems above the stochastic background — the gravitational wave equivalent of resolving individual radio sources above the extragalactic background. No individual SMBHB has yet been identified in PTA data, but the 15-year NANOGrav dataset already places interesting upper limits on the most massive known candidate systems.

Multi-frequency gravitational wave astronomy

We are now, unambiguously, in the era of multi-frequency gravitational wave astronomy. The universe produces gravitational waves across an enormous span of frequencies, and different frequency windows probe fundamentally different source populations:

Primordial / CMB B-modes (~$10^{-17}$ Hz): Wavelengths comparable to the Hubble scale. Primordial tensor perturbations from inflation would imprint a distinctive B-mode polarisation pattern in the cosmic microwave background. No confirmed detection yet; the BICEP/Keck programme and future CMB experiments (LiteBIRD, CMB-S4) are sensitive to this regime.
Pulsar timing arrays (~$10^{-9}$–$10^{-8}$ Hz): The nanohertz band, just accessed by NANOGrav and partners. Sources: SMBHB inspirals, possibly early-universe phase transitions and cosmic strings. Arm length: thousands of light-years.
LISA (~$10^{-4}$–$10^{-1}$ Hz): The millihertz band. The LISA space interferometer (planned launch in the 2030s) will have arm lengths of 2.5 million km and will detect SMBHB mergers directly as they happen, stellar-mass compact binary inspirals years before their LIGO-band merger, and possibly signals from extreme mass-ratio inspirals. LISA will see the SMBHB sources that PTAs see as a stochastic background, but in their final years of inspiral.
LIGO/Virgo/KAGRA (~10–$10^3$ Hz): The audible band. Stellar-mass black hole and neutron star mergers. Over 90 confirmed events as of the end of O3, with many more candidates in O4. Source masses: $\sim$1–100 $M_\odot$.

This is the gravitational wave analogue of multi-wavelength astronomy. Just as the universe looks completely different in radio, infrared, optical, X-ray, and gamma-ray light — each wavelength band revealing different physical processes and source populations — the universe sounds completely different at each gravitational wave frequency. PTAs hear the rumble of cosmic structure formation; LISA will hear the whisper of the final million years before the most massive black hole mergers; LIGO hears the sharp crack of stellar-mass collisions.

The 2023 announcements represent the opening of the nanohertz window. We have gone from one gravitational wave frequency band (LIGO/Virgo) to two. The next decade, with LISA launching and PTAs continuing to accumulate data, will see the opening of a third.

A brief detour: LIGO’s O4 run and the mass-gap object

While the PTA collaborations were making their June 2023 announcement, LIGO’s fourth observing run (O4, which ran from May 2023 to June 2025) was proceeding at an extraordinary rate. The upgraded detectors were detecting candidate gravitational wave events at roughly one every two to three days — over 200 candidates across the full run. This is now a production science rather than an exploration.

Among the most scientifically interesting events was GW230529, detected on 29 May 2023 and published by the LIGO Scientific Collaboration (Abbott et al., 2024). The signal is consistent with the merger of a neutron star with a compact object whose mass was measured to be approximately $2.5$–$4.5\ M_\odot$. This mass range sits squarely in what theorists call the “mass gap” — the range between the heaviest neutron stars ($\lesssim 2.3\ M_\odot$, though this upper limit is uncertain) and the lightest stellar-mass black holes inferred from X-ray binaries ($\gtrsim 5\ M_\odot$).

Whether GW230529’s companion was the heaviest neutron star ever observed, or the lightest black hole, is genuinely unknown. The distinction matters enormously for nuclear physics: if it is a neutron star, it constrains the nuclear equation of state at supranuclear densities. If it is a black hole, it means the mass gap is narrower than X-ray observations suggested, and our understanding of compact object formation needs revision. Gravitational wave observations alone cannot distinguish a rapidly spinning heavy neutron star from a slowly spinning light black hole without additional electromagnetic counterpart observations, and no counterpart was found for GW230529. This question will likely not be settled until we have electromagnetic constraints from similar systems, or until we accumulate enough mass-gap events to understand the population statistically.

The instrument we did not build

I want to return to the pulsar timing array concept, because I think it deserves more than a passing technical description. The idea is this: nature has distributed a set of extremely stable clocks across the galaxy. We did not put them there. We did not design them. We simply discovered that neutron stars, after a particular evolutionary pathway involving mass transfer from a binary companion, achieve a rotational stability that happens to be sufficient to detect perturbations of spacetime at cosmological scales.

The instrument is the galaxy itself — or rather, our ability to model it. We build a timing model for each pulsar: a comprehensive description of every known effect that influences pulse arrival times. We subtract the model. What remains, the residuals, contains the signal we cannot yet explain. We cross-correlate the residuals of 68 (or 25, or 57, depending on the collaboration) pulsars in pairs, compute the correlation as a function of angular separation, and compare to the Hellings-Downs prediction.

The engineering challenge is not building the detector. It is characterising it. Understanding the noise. Modelling the interstellar medium, which disperses radio pulses in a frequency-dependent way and varies as the pulsar moves through clouds of ionised gas. Accounting for clock errors in the terrestrial time standards used to timestamp observations. Dealing with instrumental noise in each of the different radio telescopes that contribute data. Building a Bayesian framework that can simultaneously fit the timing model parameters, the pulsar noise properties, and the GWB parameters for dozens of pulsars.

This is painstaking, years-long work. The 15-year NANOGrav dataset reflects something like 600 pulsar-years of observation. The detection is earned.

The Hellings-Downs correlation — the specific pattern that emerged from those residuals, consistent with the quadrupolar fingerprint of general relativity’s spin-2 gravitational waves — is one of the more beautiful results I have seen in recent astrophysics. It is a direct measurement of the tensor nature of gravity, at frequency scales eleven orders of magnitude below anything LIGO can access, using a detector assembled by the Milky Way over the course of 10 billion years of stellar evolution and galaxy mergers.

We are in an age of gravitational wave astronomy. I find that remarkable.

If you are interested in the broader theme of using astronomical observations as physics experiments rather than just cataloguing the sky, the posts on transit photometry and the gift of transits and on smartphone-based exoplanet observations cover similar ground at different scales — stellar radii measured in units of stellar radii, by timing the dimming of a star as a planet crosses its face. The underlying logic is the same: precision timing plus a physical model plus statistics equals a measurement of something you could not directly touch.

References

Hellings, R. W., & Downs, G. S. (1983). Upper limits on the isotropic gravitational radiation background from pulsar timing analysis. The Astrophysical Journal Letters, 265, L39–L42. DOI: 10.1086/183954
Agazie, G., et al. (NANOGrav Collaboration). (2023). The NANOGrav 15 yr Data Set: Evidence for a Gravitational-wave Background. The Astrophysical Journal Letters, 951, L8. DOI: 10.3847/2041-8213/acdac6
Antoniadis, J., et al. (EPTA Collaboration). (2023). The second data release from the European Pulsar Timing Array — III. Search for gravitational wave signals. Astronomy & Astrophysics, 678, A50. DOI: 10.1051/0004-6361/202346844
Reardon, D. J., et al. (PPTA Collaboration). (2023). Search for an isotropic gravitational-wave background with the Parkes Pulsar Timing Array. The Astrophysical Journal Letters, 951, L6. DOI: 10.3847/2041-8213/acdd02
Xu, H., et al. (CPTA Collaboration). (2023). Searching for the nano-Hertz stochastic gravitational wave background with the Chinese Pulsar Timing Array. Research in Astronomy and Astrophysics, 23(7), 075024. DOI: 10.1088/1674-4527/acdfa5
Abbott, R., et al. (LIGO Scientific Collaboration). (2024). Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 $M_\odot$ Compact Object and a Neutron Star. The Astrophysical Journal Letters, 970, L34. DOI: 10.3847/2041-8213/ad5beb

Changelog

2025-12-01: Corrected the summary from “four independent pulsar timing arrays” to “five” — the CPTA (Chinese Pulsar Timing Array) also published consistent results in June 2023 and is counted as the fifth group in the body text.

The scale problem#

Millisecond pulsars as cosmic clocks#

How a gravitational wave shifts a pulse arrival time#

The Hellings-Downs curve#

The 2023 evidence#

What is making the noise?#

Multi-frequency gravitational wave astronomy#

A brief detour: LIGO’s O4 run and the mass-gap object#

The instrument we did not build#

References#

Changelog#