<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>LoLa on Sebastian Spicker</title>
    <link>https://sebastianspicker.github.io/tags/lola/</link>
    <description>Recent content in LoLa on Sebastian Spicker</description>
    <image>
      <title>Sebastian Spicker</title>
      <url>https://sebastianspicker.github.io/og-image.png</url>
      <link>https://sebastianspicker.github.io/og-image.png</link>
    </image>
    <generator>Hugo -- 0.160.0</generator>
    <language>en</language>
    <lastBuildDate>Sat, 26 Aug 2023 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://sebastianspicker.github.io/tags/lola/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>How Low Can You Go? Measuring Latency for Networked Music Performance Across Europe</title>
      <link>https://sebastianspicker.github.io/posts/nmp-latency-lola-mvtp/</link>
      <pubDate>Sat, 26 Aug 2023 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/nmp-latency-lola-mvtp/</guid>
      <description>We measured end-to-end audio and video latency for LoLa and MVTP across six European research-network links. One-way audio latency ranged from 7.5 to 22.5 ms. Routing topology mattered more than geographic distance. Enterprise firewalls were a disaster. Here is what we found.</description>
      <content:encoded><![CDATA[<p><em>This post summarises a manuscript submitted with Benjamin Bentz and colleagues
from the RAPP Lab network. The paper is not yet peer-reviewed; numbers and
conclusions are based on operational measurements collected 2020–2023.
Feedback welcome — particularly from anyone who has run similar measurements
on non-European or wireless-last-mile links.</em></p>
<hr>
<h2 id="the-problem">The Problem</h2>
<p>Musicians playing together in the same room experience acoustic propagation
delay of roughly 3 ms per metre of separation — essentially free latency that
most ensembles never consciously register. When you distribute musicians across
a network, you inherit that propagation cost plus everything the signal chain
adds on top: buffers, codec processing, routing hops, switching overhead.</p>
<p>Conventional video-conferencing (Zoom, Teams, etc.) operates at end-to-end
delays of roughly 100–300 ms. That is comfortable for speech — human
conversation tolerates round-trip delays up to about 250 ms before it starts
to feel wrong — but it is well above the threshold at which ensemble timing
breaks down. The NMP literature generally puts the upper bound for
synchronous rhythmic playing somewhere between 20 and 30 ms one-way, with
considerable variation by tempo, instrument, and whether the performers can
see each other [Carôt 2011; Tsioutas &amp; Xylomenos 2021; Medina Victoria 2019].</p>
<p>Specialised low-latency systems cut the processing overhead by avoiding
compression, using hardware-accelerated video pipelines, and riding
research-and-education networks that offer better jitter characteristics than
commodity internet. Two of the better-known ones are <strong>LoLa</strong> (Low Latency
Audio Visual Streaming System, developed at Conservatorio G. Tartini Trieste)
and <strong>MVTP</strong> (Modular Video Transmission Platform, developed at CESNET in
Prague). We deployed both at Hochschule für Musik und Tanz Köln as part of
the RAPP Lab collaboration and spent about two and a half years measuring them.</p>
<hr>
<h2 id="the-latency-budget">The Latency Budget</h2>
<p>End-to-end latency in NMP is cumulative and non-recoverable. Once delay enters
the chain, nothing downstream can subtract it. The budget looks like:</p>
\[
  L_\text{total} = L_\text{capture} + L_\text{buffer} + L_\text{network} + L_\text{playback}
\]
<p>Network latency \( L_\text{network} \) includes propagation (roughly
\( d / (2 \times 10^8) \) seconds for a fibre link of distance \( d \) metres,
accounting for the refractive index of glass) plus per-hop processing.
Everything else is system-dependent.</p>
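<p>To get a feel for the propagation term, here is a minimal sketch that
evaluates \( d / (2 \times 10^8) \) for a few distances. The function name and
the example distances are illustrative; the only input taken from the text is
the approximate speed of light in fibre.</p>

```python
# Rough lower bound on fibre propagation delay, using the ~2e8 m/s figure above.
FIBRE_SPEED_M_PER_S = 2e8  # approximate speed of light in glass


def propagation_delay_ms(distance_km: float) -> float:
    """One-way propagation delay in milliseconds over distance_km of fibre."""
    return distance_km * 1000.0 / FIBRE_SPEED_M_PER_S * 1000.0


if __name__ == "__main__":
    for km in (100, 500, 1000):
        one_way = propagation_delay_ms(km)
        print(f"{km} km: {one_way:.2f} ms one-way, {2 * one_way:.2f} ms round trip")
```

<p>That works out to about 0.5 ms per 100 km one-way. Real fibre routes are
longer than the straight line between two cities, so this is a floor rather
than a prediction.</p>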
<p>The key insight is that \( L_\text{buffer} \) is not fixed — it is a
consequence of jitter. A jittery link forces larger buffers to avoid
underruns, which directly adds to perceived latency. This is why raw bandwidth
is almost irrelevant for NMP: a 1 Gbps link with erratic jitter will perform
worse than a 100 Mbps link with deterministic behaviour.</p>
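<p>The relationship between jitter and buffer delay can be sketched as
follows. This is a deliberately simplified model, not how LoLa or MVTP size
their buffers: the 48 kHz sample rate, the 32-frame block size, and the rule
"the buffer must cover the peak jitter" are all assumptions made for
illustration.</p>

```python
import math

SAMPLE_RATE_HZ = 48_000  # assumed sample rate, not stated in the post


def buffer_latency_ms(frames: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Delay contributed by a buffer holding `frames` audio frames."""
    return frames / sample_rate_hz * 1000.0


def min_buffer_frames(peak_jitter_ms: float, block_frames: int = 32,
                      sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Smallest whole number of blocks that covers the given peak jitter."""
    needed = peak_jitter_ms * sample_rate_hz / 1000.0
    blocks = max(1, math.ceil(needed / block_frames))
    return blocks * block_frames


if __name__ == "__main__":
    for jitter_ms in (0.5, 2.0, 4.0):
        frames = min_buffer_frames(jitter_ms)
        print(f"peak jitter {jitter_ms} ms: {frames} frames, "
              f"{buffer_latency_ms(frames):.2f} ms of added delay")
```

<p>Under these assumptions a link with 0.5 ms of peak jitter gets by with a
single 32-frame block (about 0.67 ms), while 4 ms of jitter needs 192 frames
and adds 4 ms, regardless of how much bandwidth the link has.</p>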
<hr>
<h2 id="what-we-measured-and-how">What We Measured and How</h2>
<p><strong>Network RTT.</strong> ICMP ping, 1,000 packets per run. We report the median as a
robust summary; the mean is too sensitive to the occasional rogue packet.</p>
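<p>A small illustration of why the median is the safer summary. The samples
below are synthetic, not our measurements: 997 well-behaved packets and three
stragglers in a run of 1,000.</p>

```python
import statistics

# Synthetic RTT samples in ms (illustrative only, not measured data).
rtts_ms = [7.0] * 997 + [80.0, 120.0, 250.0]

mean_rtt = statistics.mean(rtts_ms)
median_rtt = statistics.median(rtts_ms)

print(f"mean   = {mean_rtt:.2f} ms")
print(f"median = {median_rtt:.2f} ms")
```

<p>Three outliers in a thousand packets pull the mean up to about 7.43 ms,
while the median stays at 7.00 ms.</p>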
<p><strong>End-to-end audio latency.</strong> An audio signal loop: transmit a test signal
from site A to site B, have site B return it immediately, and estimate the round-trip
delay by cross-correlation. One-way latency = signal-loop RTT / 2, which assumes
the two directions are symmetric. This method
captures local processing and buffering at both ends in addition to the network
leg, which is what actually matters for a musician.</p>
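<p>The cross-correlation step can be sketched in a few lines. This is a toy
version under stated assumptions, not our measurement tooling: the probe is
synthetic white noise, the return path is simulated as a pure delay with
attenuation, and the 48 kHz sample rate is assumed.</p>

```python
import random


def estimate_delay_samples(sent, received, max_lag):
    """Return the lag (in samples) that maximises the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate the sent signal with the received one advanced by `lag`.
        score = sum(s * r for s, r in zip(sent, received[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


SAMPLE_RATE_HZ = 48_000  # assumed sample rate
TRUE_DELAY = 720         # synthetic loop delay: 720 samples = 15 ms at 48 kHz

rng = random.Random(0)
probe = [rng.uniform(-1.0, 1.0) for _ in range(1_000)]  # white-noise test signal
# Simulated return path: silence, then an attenuated copy of the probe.
returned = [0.0] * TRUE_DELAY + [0.5 * x for x in probe]

lag = estimate_delay_samples(probe, returned, max_lag=1_000)
loop_rtt_ms = lag / SAMPLE_RATE_HZ * 1000.0
one_way_ms = loop_rtt_ms / 2.0  # assumes a symmetric path
print(f"loop RTT {loop_rtt_ms:.1f} ms, one-way {one_way_ms:.1f} ms")
```

<p>A noise-like probe is a convenient choice here because its correlation
peak is sharp. A real recording would also contain room noise and processing
artefacts, which this sketch ignores.</p>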
<p><strong>Video latency.</strong> Component-based estimation (capture frame cadence +
processing pipeline + display). We did not have a frame-accurate video
loopback method, so treat these numbers as estimates rather than precision
measurements. That caveat matters less than it might seem because, as you will
see, video was always slower than audio by a wide enough margin that it did not
drive the operational decisions.</p>
<p><strong>Firewall impact.</strong> A controlled 4-hour session on the Cologne–Vienna link,
alternating between a DMZ configuration (direct research-backbone access) and
a transparent enterprise firewall, logging packet loss and decoder instability.</p>
<p>The measurements cover six partner institutions at air distances of 175 to
1,465 km from Cologne, collected between October 2020 and March 2023.</p>
<hr>
<h2 id="results">Results</h2>
<h3 id="audio-latency">Audio latency</h3>
<table>
  <thead>
      <tr>
          <th>Partner (from Cologne)</th>
          <th>Air distance (km)</th>
          <th>Median RTT (ms)</th>
          <th>One-way audio latency (ms)</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Prague</td>
          <td>535</td>
          <td>5.0</td>
          <td>7.5</td>
      </tr>
      <tr>
          <td>Vienna</td>
          <td>745</td>
          <td>7.0</td>
          <td>9.5</td>
      </tr>
      <tr>
          <td>Detmold</td>
          <td>175</td>
          <td>7.5</td>
          <td>10.0</td>
      </tr>
      <tr>
          <td>Trieste</td>
          <td>775</td>
          <td>10.0</td>
          <td>12.5</td>
      </tr>
      <tr>
          <td>Rome</td>
          <td>1,090</td>
          <td>17.5</td>
          <td>20.0</td>
      </tr>
      <tr>
          <td>Tallinn</td>
          <td>1,465</td>
          <td>19.5</td>
          <td>22.0–22.5</td>
      </tr>
  </tbody>
</table>
<p>The number that jumps out immediately: <strong>Detmold (175 km away) has higher
latency than Vienna (745 km away).</strong> This is a routing issue, not a physics
one. The Detmold link was traversing a less efficient campus path that added
extra hops before reaching the research backbone. Prague, by contrast, was
connected via a particularly short routing path and achieved the lowest latency
of any link despite not being the geographically closest.</p>
<p>The practical implication: geographic distance is a poor predictor of
achievable latency. Measure RTT; do not estimate from a map.</p>
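<p>To make the mismatch concrete, this short sketch ranks the six links once
by air distance and once by median RTT, using the values from the table. The
variable names are mine; the numbers are the published ones.</p>

```python
# (partner, air distance in km, median RTT in ms) taken from the table above.
LINKS = [
    ("Prague", 535, 5.0),
    ("Vienna", 745, 7.0),
    ("Detmold", 175, 7.5),
    ("Trieste", 775, 10.0),
    ("Rome", 1090, 17.5),
    ("Tallinn", 1465, 19.5),
]

by_distance = [name for name, _, _ in sorted(LINKS, key=lambda link: link[1])]
by_rtt = [name for name, _, _ in sorted(LINKS, key=lambda link: link[2])]

for name in by_distance:
    print(f"{name:8s} distance rank {by_distance.index(name) + 1}, "
          f"RTT rank {by_rtt.index(name) + 1}")
```

<p>Detmold is first by distance but only third by RTT, behind Prague and
Vienna. The remaining three links keep their order.</p>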
<h3 id="video-latency">Video latency</h3>
<p>Estimated one-way video latency was 20–35 ms across all configurations,
with the dominant contributions coming from frame cadence (at 60 fps, you wait
up to 16.7 ms for a frame to be captured regardless of what the network is
doing) and buffering at the decoder. In every deployment, video consistently
lagged audio. Musicians unsurprisingly fell back on audio for synchronisation
and treated video as a supplementary cue — useful for expressive and social
information, not for timing.</p>
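<p>The component-based estimate is just a sum, sketched below. Only the frame
interval follows from the frame rate; the processing and display values in the
example call are placeholders I picked for illustration, not measured figures
for LoLa or MVTP.</p>

```python
def frame_interval_ms(fps: float) -> float:
    """Time between consecutive frames, i.e. the worst-case wait for capture."""
    return 1000.0 / fps


def video_latency_estimate_ms(fps: float, processing_ms: float,
                              display_ms: float) -> float:
    """Worst-case capture wait plus pipeline processing plus display delay."""
    return frame_interval_ms(fps) + processing_ms + display_ms


if __name__ == "__main__":
    for fps in (30, 60, 120):
        print(f"{fps} fps: up to {frame_interval_ms(fps):.1f} ms capture wait")
    # Placeholder component values, for illustration only.
    print(f"{video_latency_estimate_ms(60, processing_ms=5.0, display_ms=8.0):.1f} ms")
```

<p>At 60 fps the capture wait alone is up to 16.7 ms, so even modest
processing and display delays put the total inside the 20–35 ms range.</p>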
<h3 id="the-firewall-experiment">The firewall experiment</h3>
<p>This is the result I find most important for anyone planning a similar
deployment.</p>
<table>
  <thead>
      <tr>
          <th>Metric</th>
          <th>DMZ (no firewall)</th>
          <th>With enterprise firewall</th>
          <th>Change</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Dropped audio packets</td>
          <td>0.002%</td>
          <td>0.052%</td>
          <td>+26×</td>
      </tr>
      <tr>
          <td>Audio buffer realignments/hour</td>
          <td>0.3</td>
          <td>3.9</td>
          <td>+13×</td>
      </tr>
      <tr>
          <td>Dropped video frames</td>
          <td>0.04%</td>
          <td>0.74%</td>
          <td>+18×</td>
      </tr>
      <tr>
          <td>Additional latency</td>
          <td>—</td>
          <td>0.5–1.0 ms</td>
          <td>—</td>
      </tr>
  </tbody>
</table>
<p>The raw latency increase (0.5–1.0 ms) is small and largely irrelevant. The
packet loss and buffer event increases are not. A 26-fold increase in dropped
audio packets on an otherwise uncongested link means the firewall is doing
something — likely deep packet inspection or stateful tracking — that
introduces enough irregularity to destabilise small audio buffers. This forces
you to either accept dropouts or increase buffer size, and increasing buffer
size increases latency.</p>
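<p>The fold changes in the table follow directly from the two measured
columns. A quick check, with names of my own choosing:</p>

```python
# (metric, DMZ value, firewall value) from the table above.
RESULTS = [
    ("Dropped audio packets (%)", 0.002, 0.052),
    ("Audio buffer realignments per hour", 0.3, 3.9),
    ("Dropped video frames (%)", 0.04, 0.74),
]


def fold_change(baseline: float, observed: float) -> float:
    """Ratio of the observed value to the baseline value."""
    return observed / baseline


for metric, dmz, firewall in RESULTS:
    print(f"{metric}: {fold_change(dmz, firewall):.1f}x")
```

<p>This gives 26x for audio packets, 13x for buffer realignments, and about
18.5x for video frames, which the table lists as 18.</p>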
<p>The message is: if your institution requires traffic inspection for
security policy compliance, you pay for it less in raw delay than in
<em>stability</em>, and that cost is substantial.</p>
<hr>
<h2 id="discussion">Discussion</h2>
<p>Based on the measured latencies and reported musical tolerances from the
literature, I would roughly characterise the links as follows:</p>
<ul>
<li>
<p><strong>Prague, Vienna, Detmold, Trieste (7.5–12.5 ms):</strong> Compatible with
most repertoire including rhythmically demanding chamber music.
Musicians in our sessions reported the interaction as &ldquo;natural&rdquo; or
&ldquo;like being in the same room&rdquo; at these latencies.</p>
</li>
<li>
<p><strong>Rome (20 ms):</strong> Usable with attention to repertoire and tempo.
Slower movements and music where tight rhythmic locking is not the
primary aesthetic concern work well. Rhythmically dense passages at
fast tempi become harder.</p>
</li>
<li>
<p><strong>Tallinn (22–22.5 ms):</strong> At the upper edge of the comfortable range.
Still usable — we ran a concert collaboration in March 2023 — but
musicians adapt their interaction strategies, leaning more on musical
anticipation than reactive synchronisation.</p>
</li>
</ul>
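<p>One way to build intuition for these numbers is to convert each latency
into the distance sound would travel in the same time, which is the "3 ms per
metre" rule from the introduction turned around. The sketch assumes a speed
of sound of 343 m/s and is only an analogy; it says nothing about how a given
ensemble will actually cope.</p>

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # dry air at roughly 20 degrees C (assumed)


def equivalent_distance_m(latency_ms: float) -> float:
    """Distance sound travels in air during latency_ms milliseconds."""
    return latency_ms / 1000.0 * SPEED_OF_SOUND_M_PER_S


if __name__ == "__main__":
    for label, latency_ms in (("Prague", 7.5), ("Trieste", 12.5),
                              ("Rome", 20.0), ("Tallinn", 22.5)):
        print(f"{label:8s} {latency_ms:4.1f} ms is like standing "
              f"{equivalent_distance_m(latency_ms):.1f} m apart")
```

<p>The measured one-way latencies correspond to roughly 2.6 m (Prague) up to
7.7 m (Tallinn) of acoustic separation, which is about the span of a chamber
group at one end and a large stage at the other.</p>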
<p>What is notably absent from this data: anything outside the European
research-network context. All six links ran on GÉANT or national backbone
equivalents with favourable jitter characteristics. The numbers almost
certainly do not transfer directly to commodity internet, satellite links, or
mixed-topology paths.</p>
<p><strong>Limitations I want to be explicit about.</strong> The video latency estimates are
component-based, not directly measured, so treat that 20–35 ms range with
appropriate scepticism. The firewall comparison is a single 4-hour session on
a single link; I would not want to extrapolate too aggressively to other
firewall vendors or configurations. And this is an operational measurement
study, not a controlled perceptual experiment — I cannot tell you from this
data at precisely what latency threshold a given ensemble will declare a
session unusable, because that depends on the music, the musicians, and
factors I did not measure.</p>
<hr>
<h2 id="practical-takeaways">Practical Takeaways</h2>
<p>For anyone setting up a similar system:</p>
<ol>
<li><strong>Measure RTT before committing to a partner institution.</strong> Routing
differences can swamp differences in air distance: in our data, Detmold
(175 km) had a higher RTT than Vienna (745 km).</li>
<li><strong>Get DMZ placement if at all possible.</strong> The firewall results suggest
this matters more than any other single configuration decision.</li>
<li><strong>Minimise campus hops between your endpoint and the research backbone.</strong>
Each additional switching layer adds jitter risk.</li>
<li><strong>Use small audio buffers and monitor for underruns.</strong> If your baseline
RTT is good, your buffer can be small; if underruns increase, that is an
early warning that network stability is degrading before packet loss
becomes audible.</li>
<li><strong>Accept that video will lag audio and design your session accordingly.</strong>
This is not a system failure; it is a consequence of how video pipelines
work at low latency. Plan for it.</li>
</ol>
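<p>For the underrun monitoring in point 4, a minimal sketch of the idea:
count buffer events in a time window and compare the hourly rate with a
baseline. The baseline of 0.3 events per hour is the DMZ figure from the
firewall table; the alert factor and the event timestamps are hypothetical,
and how you obtain the timestamps depends on your system's logging.</p>

```python
BASELINE_PER_HOUR = 0.3  # DMZ figure from the firewall table
ALERT_FACTOR = 5.0       # hypothetical threshold, tune for your own link


def events_per_hour(event_times_s, window_start_s, window_end_s):
    """Hourly rate of events with timestamps in [window_start_s, window_end_s)."""
    hours = (window_end_s - window_start_s) / 3600.0
    count = sum(1 for t in event_times_s if window_start_s <= t < window_end_s)
    return count / hours


def is_degraded(rate_per_hour, baseline=BASELINE_PER_HOUR, factor=ALERT_FACTOR):
    """True if the event rate exceeds the baseline by more than `factor`."""
    return rate_per_hour > baseline * factor


if __name__ == "__main__":
    underruns_s = [600.0, 1500.0, 2100.0, 3300.0]  # made-up timestamps in seconds
    rate = events_per_hour(underruns_s, 0.0, 3600.0)
    print(f"{rate:.1f} events/hour, degraded: {is_degraded(rate)}")
```

<p>With these made-up timestamps the rate is 4 events per hour, well above
the baseline, so the check fires before any packet loss would be audible.</p>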
<hr>
<h2 id="references">References</h2>
<p>Carôt, A. (2011). Low latency audio streaming for Internet-based musical
interaction. <em>Advances in Multimedia and Interactive Technologies</em>.
<a href="https://doi.org/10.4018/978-1-61692-831-5.ch015">https://doi.org/10.4018/978-1-61692-831-5.ch015</a></p>
<p>Drioli, C., Allocchio, C., &amp; Buso, N. (2013). Networked performances and
natural interaction via LOLA. <em>LNCS</em>, 7990, 240–250.
<a href="https://doi.org/10.1007/978-3-642-40050-6_21">https://doi.org/10.1007/978-3-642-40050-6_21</a></p>
<p>Medina Victoria, A. (2019). <em>A method for the measurement of the latency
tolerance range of Western musicians</em>. Ph.D. dissertation, Cork Institute
of Technology (now Munster Technological University).</p>
<p>Rottondi, C., Chafe, C., Allocchio, C., &amp; Sarti, A. (2016). An overview on
networked music performance technologies. <em>IEEE Access</em>, 4, 8823–8843.
<a href="https://doi.org/10.1109/ACCESS.2016.2628440">https://doi.org/10.1109/ACCESS.2016.2628440</a></p>
<p>Tsioutas, K. &amp; Xylomenos, G. (2021). On the impact of audio characteristics
to the quality of musicians experience in network music performance. <em>JAES</em>,
69(12), 914–923. <a href="https://doi.org/10.17743/jaes.2021.0041">https://doi.org/10.17743/jaes.2021.0041</a></p>
<p>Ubik, S., Halak, J., Kolbe, M., Melnikov, J., &amp; Frič, M. (2021). Lessons
learned from distance collaboration in live culture. <em>AISC</em>, 1378, 608–615.
<a href="https://doi.org/10.1007/978-3-030-74009-2_77">https://doi.org/10.1007/978-3-030-74009-2_77</a></p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-01-20</strong>: Updated the Drioli et al. (2013) LNCS volume number to 7990 (ECLAP 2013 proceedings). Updated the Ubik et al. (2021) AISC volume number to 1378 and page range to 608–615. Updated the fifth author&rsquo;s surname to &ldquo;Frič.&rdquo;</li>
</ul>
]]></content:encoded>
    </item>
  </channel>
</rss>
