Music-Education on Sebastian Spicker

Artificial Intelligence in Music Pedagogy: Curriculum Implications from a Thementag

Sat, 07 Dec 2024 00:00:00 +0000

On 2 December 2024, the Hochschule für Musik und Tanz Köln held a Thementag: “Next level? Künstliche Intelligenz und Musikpädagogik im Dialog.” I gave three workshops — on data protection and AI, on AI tools for students, and on AI in teaching. The handouts from those sessions cover the practical and regulatory ground. This post is the argument behind them: what I think changes in music education when these tools become ambient, and what I think does not.

The Occasion

“Next level?” The question mark is doing real work. The framing HfMT chose for the day was appropriately provisional: not a declaration that AI has already transformed music education, but an invitation to ask whether, in what direction, and at what cost.

The invitations that reach me for events like this tend to come with one of two framings. The first is enthusiasm: AI is coming, we need to get ahead of it, here are tools your students are already using. The second is anxiety: AI is coming, it threatens everything we do, we need to protect students from it. Both framings are understandable. Neither is adequate to the curriculum question, which is slower-moving and more structural than either suggests.

I prepared three sets of handouts. The first covered data protection — the least glamorous topic in AI education, and the one that most directly determines what can legally be deployed in a university setting. The second covered AI tools for students: what exists, what it does, and what critical thinking skills you need to use it without being used by it. The third covered AI for instructors: where it helps, where it flatters, and where it makes things worse.

This post does not recapitulate the handouts. It addresses the question I kept returning to across all three workshops: what does this change about what a music student needs to learn?

What the Technology Actually Is

My physics training left me professionally uncomfortable with hand-waving — including my own. Before discussing curriculum implications, it is worth being specific about what these tools are.

The dominant paradigm in current AI, responsible for ChatGPT, for Whisper, for Suno.AI, for Google Magenta, and for the large language models whose outputs are now visible everywhere, is the transformer architecture (Vaswani et al., 2017). A transformer is a neural network that processes sequences by computing, for each element, attention weights over every element in the sequence (itself included) and combining their representations accordingly. The weights are computed afresh for each input by projections whose parameters are learned from data. The result is a model that can capture long-range dependencies in sequences (text, audio, musical notes) without the recurrence that made earlier architectures difficult to train at scale.
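To make the attention step concrete, here is a minimal sketch in plain Python of single-head scaled dot-product self-attention as described by Vaswani et al. (2017). It is an illustration under simplifying assumptions, not how any of the tools named above is implemented: the function names (softmax, matmul, self_attention), the toy dimensions, and the random projection matrices are all hypothetical, and real models stack many such layers with several heads and with projections learned from data rather than drawn at random.

```python
import math
import random


def softmax(row):
    """Turn a list of scores into positive weights that sum to 1."""
    peak = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    # One score per (position, position) pair: query of row i against key of row j.
    scores = [
        [sum(qi * kj for qi, kj in zip(q_row, k_row)) / scale for k_row in k]
        for q_row in q
    ]
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted mix of all value rows.
    return matmul(weights, v), weights


# Hypothetical toy input: 5 sequence elements, each a 4-dimensional vector.
random.seed(0)
n_tokens, d_model = 5, 4
x = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n_tokens)]
w_q, w_k, w_v = (
    [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(d_model)]
    for _ in range(3)
)

output, weights = self_attention(x, w_q, w_k, w_v)
for row in weights:
    print([round(w, 2) for w in row])
```

Each printed row says how strongly one position attends to every position in the sequence, and each row sums to one. In this sketch nothing in the computation depends on how far apart two positions are, which is why long-range dependencies need no recurrence.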

What this means practically: these models are trained on very large corpora, they learn statistical regularities, and they generate outputs that are statistically consistent with their training distribution. They are not reasoning from first principles. They do not “know” music theory the way a student who has internalised harmonic function knows it. They have learned, from enormous quantities of text and audio, what tends to follow what. For many tasks this is sufficient. For tasks that require understanding of underlying structure, it is not — and the failure modes are characteristic rather than random.

BERT (Devlin et al., 2018) showed that pre-training on large corpora and fine-tuning on specific tasks produce models that outperform task-specific architectures on a wide range of benchmarks. The same transfer-learning paradigm has spread to audio (Whisper was trained on 680,000 hours of labelled audio), to music generation (Magenta’s transformer-based models produce melodically coherent sequences), and to multimodal domains. The technology is mature, improving, and available to students now. Knowing what it is, and not just what it produces, is the starting point for any sensible curriculum discussion about it.

The Data Protection Constraint

Before any discussion of pedagogical benefit, there is a legal boundary that most AI-in-education discussions skip over. In Germany, and in the EU more broadly, the deployment of AI tools in a university setting is governed by the GDPR (DSGVO, Regulation (EU) 2016/679) and, at state level in NRW, by the DSG NRW. The constraints are not abstract: they determine which tools can be used for which purposes with which students.

The core principle is data minimisation: only data necessary for a specific, documented purpose may be collected or processed. When a student uses a commercial AI tool to get feedback on a composition exercise and enters text that could identify them or their institution, that data may be stored, processed, and used for model improvement by an operator whose servers are outside the EU. Whether such transfers remain legally valid under GDPR after the Schrems II ruling (Court of Justice of the EU, 2020) is contested — and “contested” is not a position in which an institution can comfortably require students to use a tool.

The practical upshot for curriculum design is this: AI tools running on EU servers with documented processing agreements can be integrated into formal coursework. Commercial tools whose terms specify US-based processing and model training on user data cannot be required of students. They can be discussed and demonstrated, but making them mandatory puts students in a position where they must choose between their privacy and their grade.

This is not a reason to avoid AI in teaching. It is a reason to be honest about the regulatory landscape, to distinguish clearly between tools you can require and tools you can recommend, and to make data protection literacy part of what students learn. The skill of reading a terms-of-service document and identifying the data flows it describes is not a legal skill — it is a general literacy skill that matters for every digital tool a music professional will use.

What Changes for Students

The question I was asked most often across the three workshops was some version of: “If AI can already do X, should students still learn X?”

The question is less simple than it appears, and the answer is not uniform across skills.

Skills where automation reduces the required production threshold do exist. A student who spends weeks mastering advanced music engraving tools for score production, when AI can generate a usable first draft from a much simpler description, has arguably spent time that could have been better allocated elsewhere. Not because the underlying skill is worthless — it is not — but because the threshold of competence required to produce a working output has dropped. The student’s time might be more valuable spent on something that has not been automated.

Skills where automation creates new requirements are more interesting. Transcription is a useful example. Automatic transcription, whether speech recognition with models like Whisper for the spoken word or specialised models for audio-to-score music transcription, is now accurate enough to produce usable first drafts from audio. This does not eliminate the need for transcription skill in a music student. It changes it. A student who cannot evaluate the output of an automatic transcription (who cannot hear where the model has made characteristic errors, who does not have an internalised sense of what a correct transcription looks like) is unable to use the tool productively. The required skill has shifted from production to evaluation. This is not a lesser skill; it is a different one, and it is not automatically acquired alongside the ability to run the tool.

Skills that automation cannot replace are those that depend on embodied, situated, relational knowledge: stage presence, real-time improvisation, the subtle negotiation of musical meaning in ensemble, the pedagogical relationship between teacher and student. These are not beyond AI in principle. They are far beyond it in practice, and the gap is not closing as quickly as the generative AI discourse sometimes suggests.

The curriculum implication is not “teach less” or simply “teach differently.” It is: be explicit about which category each skill falls into, and design assessment accordingly. An assignment that asks students to produce something AI can produce is now testing something different from what it was testing two years ago — not necessarily nothing, but something different. The rubric should reflect that.

What Changes for Instructors

The same three-category analysis applies symmetrically to teaching.

Routine task automation is genuinely useful. Generating first drafts of worksheets, producing exercises at different difficulty levels, transcribing a recorded lesson for later analysis: these are tasks where AI can save meaningful time without compromising the pedagogical judgment required to make use of the output. Holmes et al. (2019) identify feedback generation as one of the clearer wins for AI in education: systems that provide immediate, targeted feedback at a scale that human instructors cannot match. A transcription model listening to a student practise and flagging rhythmic inconsistencies does not replace a teacher. It extends the feedback loop beyond the lesson hour.

Content generation with limits is where AI is most seductive and most dangerous. A model like ChatGPT can produce a reading list on any topic, a summary of any debate in the literature, a set of discussion questions for any text. The outputs are fluent, plausible, and frequently wrong in ways that are difficult to detect without domain expertise. Jobin et al. (2019) and Mittelstadt et al. (2016) both document the broader concern with AI opacity and accountability: when a model produces a confident-sounding claim, the burden of verification falls on the user. An instructor who outsources the construction of course materials to a model, and who lacks enough domain knowledge to catch the errors, is not saving time — they are transferring risk to their students.

Hallucinations — outputs that are plausible in form but false in content — are not bugs in the usual sense. They are a structural consequence of how generative models work. A model trained to predict likely next tokens will produce the most statistically plausible continuation, not the most accurate one. For music education, where historical facts, composer attributions, and music-theoretic claims need to be correct, this matters. The model’s fluency is not evidence of its accuracy.
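The gap between plausible and accurate can be shown with a deliberately tiny stand-in for a language model. The sketch below is a bigram counter, an assumption made purely for illustration and far simpler than a transformer: it predicts the next word from the previous word alone. The four-sentence corpus and the function name most_likely_next are invented for the example. Every sentence in the corpus is true, and the completion is still false.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; each sentence is factually correct.
corpus = [
    "the magic flute was composed by mozart",
    "don giovanni was composed by mozart",
    "the jupiter symphony was composed by mozart",
    "the moonlight sonata was composed by beethoven",
]

# Count which word follows which (a bigram model).
follow = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follow[prev][nxt] += 1


def most_likely_next(prompt):
    """Return the word that most often followed the prompt's last word."""
    last_word = prompt.split()[-1]
    return follow[last_word].most_common(1)[0][0]


prompt = "the moonlight sonata was composed by"
completion = most_likely_next(prompt)
print(prompt, completion)  # prints "the moonlight sonata was composed by mozart"
```

The model answers "mozart" because that is what most often follows "by" in its training data. It has no representation of which piece the sentence is about, so the one true sentence about the Moonlight Sonata is outvoted. Larger models condition on far more context, but the objective is the same kind of prediction.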

Personalisation is the most-cited promise of AI in education (Luckin et al., 2016; Roll & Wylie, 2016) and the hardest to evaluate in practice. The argument is that AI can adapt instructional content to individual learners' needs in real time, producing one-to-one tutoring at scale. The evidence in formal educational settings is more mixed than the boosters suggest. What is clear is that personalisation at scale requires data — and extensive data about individual students’ learning trajectories raises the same data protection concerns already discussed, in more acute form.

The Music-Specific Question

I want to be direct about something that came up repeatedly across the day and that the general AI-in-education literature handles badly: music education is not generic.

The skills involved — listening, performing, interpreting, composing, improvising — have a phenomenological and embodied dimension that does not map cleanly onto the text-prediction paradigm that most current AI systems instantiate. Suno.AI can generate a stylistically convincing chord progression in the manner of a named composer. It cannot explain why the progression is convincing in the way a student who has internalised tonal function can explain it. Google Magenta can generate a continuation of a melodic fragment that is locally coherent. It cannot navigate the structural expectations of a sonata form with the intentionality that a performer brings to interpreting one.

This is not a criticism of these tools. It is a description of what they are. The curriculum implication is that music education must be clear about what it is teaching: the product — a score, a performance, a composition — or the process and understanding of which the product is evidence. Where assessment focuses on the product, AI creates an obvious challenge. Where it focuses on demonstrable process and understanding — including the ability to critically evaluate AI-generated outputs — it creates new opportunities.

The more interesting question is whether AI tools can make musical process more visible and discussable. A composition student who uses a generative model, notices that the output is harmonically correct but rhythmically inert, and can articulate why it is inert — and then revise it accordingly — has demonstrated more sophisticated musical understanding than a student who produces the same output without any generative assistance. The tool does not lower the standard; it shifts where the standard is applied.

There is an analogy in music theory pedagogy. The availability of notation software that can play back a student’s harmony exercise and flag parallel fifths changed what ear training and harmony teaching emphasise, but it did not make harmony teaching obsolete. It raised the floor (students can check mechanical correctness automatically) and raised the ceiling (more class time can be spent on voice-leading logic and expressive intention). AI tools are a larger version of the same displacement: the floor rises, the ceiling rises with it, and the pedagogical question is always what you are doing between the two.

Copyright and Academic Integrity

Two issues that crossed all three workshops and deserve direct treatment.

On copyright: the training data of generative music models includes copyrighted recordings and scores, the legal status of which is actively litigated in multiple jurisdictions. When Suno.AI generates a piece “in the style of” a named composer, it is drawing on patterns extracted from that composer’s work — work that is under copyright in the case of living or recently deceased composers. The output is not a direct copy, but neither is the relationship to the training data legally settled. Music students who use these tools in professional contexts should know that they are working in a legally uncertain space, and institutions should not pretend otherwise.

On academic integrity: the issue is not that students might use AI to cheat — they will, some of them, and they have always found ways to cheat with whatever tools were available. The issue is that current AI policies at many institutions are incoherent: prohibiting AI use in assessment while providing no clear guidance on what counts as AI use, and assigning tasks where AI assistance is undetectable and arguably appropriate. The more useful approach is to design tasks where AI assistance is either irrelevant (because the task requires live performance or real-time demonstration) or visible and assessed (because the task explicitly includes reflection on how AI was used and to what effect).

Three Things I Came Away With

After a full day of workshops, discussions, and the conversations that happen in the corridors between sessions, I left with three positions that feel more settled than they did in the morning.

First: the data protection question is not separable from the pedagogical question. Any serious curriculum discussion of AI in music education has to start with what can legally be deployed, not with what would be useful if constraints were not a factor. The constraints are a factor.

Second: the skill most urgently needed — in students and in instructors — is not AI literacy in the sense of knowing which tool to use for which task. It is the critical capacity to evaluate AI-generated outputs: to notice what is wrong, to understand why it is wrong, and to correct it. This requires domain expertise first. You cannot critically evaluate an AI-generated harmonic analysis if you do not understand harmonic analysis. The tools do not lower the bar for domain knowledge. They raise the bar for its critical application.

Third: the curriculum question is not “how do we accommodate AI?” It is “what are we actually trying to teach, and does the answer change when AI can produce the visible output of that process?” Answering that honestly, skill by skill, for a full music programme, is slow work. It cannot be done at a one-day event. But a one-day event, if it is well-designed, can start the conversation in the right place.

HfMT’s Thementag started it in the right place.

References

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. https://arxiv.org/abs/1810.04805
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. https://www.deeplearningbook.org
Holmes, W., Bialik, M., & Fadel, C. (2019). Artificial Intelligence in Education: Promises and Implications for Teaching and Learning. Center for Curriculum Redesign.
Jobin, A., Ienca, M., & Vayena, E. (2019). The global landscape of AI ethics guidelines. Nature Machine Intelligence, 1, 389–399. https://doi.org/10.1038/s42256-019-0088-2
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. https://doi.org/10.1038/nature14539
Luckin, R., Holmes, W., Griffiths, M., & Forcier, L. B. (2016). Intelligence Unleashed: An Argument for AI in Education. Pearson.
Mittelstadt, B. D., Allo, P., Taddeo, M., Wachter, S., & Floridi, L. (2016). The ethics of algorithms: Mapping the debate. Big Data & Society, 3(2). https://doi.org/10.1177/2053951716679679
Roll, I., & Wylie, R. (2016). Evolution and revolution in artificial intelligence in education. International Journal of Artificial Intelligence in Education, 26(2), 582–599. https://doi.org/10.1007/s40593-016-0110-3
Russell, S., & Norvig, P. (2020). Artificial Intelligence: A Modern Approach (4th ed.). Pearson.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. https://arxiv.org/abs/1706.03762

The Boring Parts of Networked Music Performance

Fri, 14 Jun 2024 00:00:00 +0000

This post is based on a manuscript in progress with colleagues from the RAPP Lab network. It builds directly on the August 2023 latency measurements. That post covered what the numbers look like. This one covers why getting to those numbers was the easy part.

The Setup

After spending two and a half years measuring latency across six European research-network links, I can tell you that the audio numbers are achievable. 7.5 to 22.5 ms one-way across Prague to Tallinn, LoLa and MVTP both working, musicians playing together across national borders in real time. Technically, that story has a satisfying ending.

What the measurement paper does not capture is everything that had to be true institutionally before we could run a single test. The firewall negotiations. The repeated calibration sessions. The network configuration that nobody outside our small group knew how to reproduce when someone left. The grant that funded the equipment but not the person who kept it running. The performance session that nearly collapsed because a campus IT update had silently changed a routing rule three days prior.

The technical infrastructure worked. The institutional infrastructure around it was precarious in ways that only became visible when something broke.

This is what the follow-up paper tries to name.

What Is a Digital Music Lab, Actually?

The term gets applied to everything from a laptop cart in a classroom to IRCAM in Paris. We use it to mean something specific: a Digital Music Lab (DML) is a hybrid environment where space, equipment, software, personnel and organisational routines are configured together to support iterative artistic experimentation, research-led learning and outward-facing engagement.

The key phrase is configured together. A room full of excellent hardware is no more a DML than a building full of books is a library. What makes either work is an invisible layer of social organisation: access policies, shared norms, maintained documentation, people who know what to do when something breaks.

We borrow a concept from infrastructure studies to describe this: performative infrastructure. The concept draws on Star and Ruhleder (1996), and it captures something precise — that infrastructure does not merely enable activity, it also shapes what kinds of activity are possible in the first place. The decision to use LoLa rather than Zoom is not just a technical choice; it is an institutional statement about what kind of musical interaction this space is designed to support, and about who is expected to use it.

This framing matters because it shifts the design question. You are not asking “what equipment should we buy?” You are asking “what kind of practice do we want to make possible, and what organisational conditions make that practice sustainable?”

Four Things That Actually Determine Whether a DML Survives

1. Flexible by design, not by accident

Resilient labs resist the temptation to optimise for one use case. The systems that have lasted — Stanford CCRMA is the obvious reference point, nearly five decades and counting — tend to separate a stable core (networking, routing, authentication, documentation) from a more rapidly changing layer of creative tools and workflows. The core does not change when you switch DAWs or update your streaming platform. The tools on top of it can.

This sounds obvious. In practice it means being deliberate about which dependencies you are willing to accept. A lab built on a single vendor ecosystem can offer tight integration, but it creates a single point of failure and a maintenance contract you will be negotiating forever. A lab built on open protocols and well-documented configurations is more work to set up and less work to sustain.

The other thing flexibility buys is pedagogical range. The same environment can host an introductory workshop, an advanced performance-research project and a public-facing concert without requiring incompatible reconfiguration for each. This is not a luxury. It is what makes a DML worth the overhead compared to just booking a studio.

2. Governance that survives personnel turnover

The single most dangerous sentence in any DML is: “We can ask [person] — they know how it works.”

Every lab has that person. The one who configured the routing. The one who knows which cable does what. The one who has the institutional memory of every workaround and edge case. When that person moves on, the lab frequently becomes unreliable within six months and functionally inaccessible within a year — even if all the equipment is still there. We call these zombie infrastructures: technically present, functionally dead.

The corrective is not to document everything (though that helps). It is to design governance so that knowledge is distributed by default. Distributed stewardship roles — student assistants, rotating committees, peer mentors — mean that multiple people develop operational knowledge as a matter of routine, not as emergency knowledge transfer when someone announces they are leaving.

Technical staff need to be treated as co-creators in this model, not as service providers. When networked performance is framed as peripheral experimentation rather than core infrastructure, maintenance becomes precarious and invisible. When it is framed as core, collaboration between artistic and technical roles becomes institutional routine.

3. Maintenance as a budget line, not an afterthought

Here is the infrastructure paradox: systems are valued for enabling novelty, but they require boring, recurring investment to remain usable. Project funding solves the novelty problem. It almost never solves the maintenance problem.

The costs that make a lab reliable are not one-off:

Staff continuity (or explicit knowledge transfer when staff change)
Documentation that is actively maintained, not written once and forgotten
Renewal cycles for hardware and software that actually match the pace of change in the underlying ecosystem
User support during active sessions, not just during setup

At HfMT Köln, the operational work that dominated actual implementation time was none of the things that appear in grant applications: coordinating network pathways across campus boundaries, establishing and re-establishing calibration routines after infrastructure updates, producing documentation legible to people who were not present at the original setup, providing real-time support during rehearsals when something behaved unexpectedly.

None of this is glamorous. All of it is what determines whether musicians can actually use the system on a given Tuesday afternoon.

4. Inclusion that is designed, not assumed

Technology-intensive environments reproduce exclusion reliably unless they are actively designed not to. The mechanisms are familiar: assumed prior experience, cultural signals about who belongs, scheduling that conflicts with caring responsibilities, documentation in a single language, interfaces that reward a particular kind of technical confidence.

For DMLs specifically, there is an additional layer. Networked music performance is genuinely different from co-located performance. The latency conditions require different listening and coordination strategies. For musicians trained in tight synchronous ensemble playing, the first experience of performing over a network is often disorienting — latency is not a technical glitch to be fixed, it is a compositional condition to be understood and worked with.

Framing this as a deficit is pedagogically counterproductive. Framing it as an occasion to develop new artistic vocabulary — to think deliberately about what interaction strategies work at 12 ms versus 22 ms, about how anticipatory listening changes the character of improvisation — turns an obstacle into content. Some of the most interesting musical thinking in our sessions came from participants who were trying to understand why something that was effortless in a rehearsal room required conscious attention over the network.
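One way to make the difference between 12 ms and 22 ms tangible is to convert delay into the distance sound would travel through air in the same time. The sketch below assumes a speed of sound of 343 m/s (dry air at roughly 20 °C) and treats the one-way network delay as if it were the only delay, which it is not: audio interfaces and buffering add their own. The function name equivalent_distance_m is hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 °C


def equivalent_distance_m(latency_ms):
    """Distance sound travels through air in the given time, in metres."""
    return SPEED_OF_SOUND_M_PER_S * latency_ms / 1000.0


# One-way delays mentioned in the text, in milliseconds.
for latency_ms in (7.5, 12.0, 22.0, 22.5):
    distance = equivalent_distance_m(latency_ms)
    print(f"{latency_ms:4.1f} ms one-way is like standing {distance:.1f} m apart")
```

On these assumptions, 12 ms corresponds to playing with someone roughly four metres away and 22 ms to roughly seven and a half metres. That is the difference between neighbours in a chamber group and opposite sides of a large stage.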

The Tensions That Do Not Resolve

Being honest about what the paper does not solve:

Project funding versus operational costs. We do not have a structural solution to the mismatch between how labs are funded (innovation grants with defined end dates) and how they need to operate (indefinitely, with predictable maintenance budgets). Collaborative purchasing agreements and shared technical teams across institutions can distribute the burden, but they introduce coordination overhead. There is no clean answer here.

Experimentation versus accountability metrics. Universities and funders want quantifiable outputs. Artistic experimentation often produces its most valuable results as changed practices and new aesthetic understanding — things that do not appear in publication counts or utilisation statistics. The best available response is to be explicit about this mismatch when negotiating evaluation criteria, and to establish review processes that include artistic peers and community partners rather than only administrators. This is possible more often than people think, but it requires someone to argue for it proactively.

Openness versus depth. A lab built for maximum accessibility is not the same as a lab optimised for a specific research agenda, and trying to be both usually means doing neither well. The design question is not which is better but where the tradeoff lies for a particular institution’s mission. CCRMA and IRCAM have made different bets on this axis over decades and both have produced important work. The mistake is to have no opinion about where you sit on the spectrum.

Recommendations

These are for institutions and funders, assembled from what the paper describes as working across multiple DML contexts:

Treat DMLs as long-term cultural infrastructure. Recurring budget lines for renewal, documentation and support — not just start-up funding.
Separate your stable backbone from your creative tools. Networking, routing, authentication and documentation should not be rebuilt every time you change your video platform.
Design governance that does not rely on one person. Distributed stewardship roles, clear succession documentation, operational knowledge treated as shared rather than individual.
Make invisible labour visible. Technical stewardship, facilitation and community liaison need to appear in hiring, workload models and evaluation — not just in informal practice.
Lower the floor for participation. Scaffolded onboarding, peer mentoring, programming that supports diverse musical practices and levels of technical experience.
Sort out data governance before you start recording. Consent, archiving and reuse policies for audio/video, especially when community partners or students are involved.
Plan for the lab’s eventual obsolescence. Versioning policies, migration plans, criteria for retiring tools. Zombie infrastructures are a governance failure, not a technical one.
Evaluate on multiple axes. Technical reliability is one. Learning trajectories, student agency, community partnership durability and artistic outcomes are others. Reporting only the first one creates a misleading picture of whether the lab is actually working.

What This Does and Does Not Claim

The argument in the paper is conceptual and practice-informed rather than empirical in the standard sense. We synthesise literature and draw on the HfMT Köln implementation as a vignette — it is an illustration, not a representative sample. The framework we propose (four design principles, the performative infrastructure framing) is offered as an analytical vocabulary for planning and evaluation, not as a validated theory.

What it is useful for: making implicit infrastructure choices explicit, naming tensions before they become crises, and supporting more realistic conversations between artistic users, technical staff and institutional leadership about what it actually takes to make this work.

References

Borgdorff, H. (2012). The Conflict of the Faculties: Perspectives on Artistic Research and Academia. Leiden University Press.

Labbé, D., Zuberec, C., & Turner, S. (2022). Creative hubs in Hanoi, Vietnam: Transgressive spaces in a socialist state? Urban Studies. https://doi.org/10.1177/00420980221086371

McKay, G. (2017). Community music: History and current practice. International Journal of Community Music, 10(2), 129–137. https://doi.org/10.1386/ijcm.10.2.129_1

Morreale, F., Bowers, J., & McPherson, A. (2021). Collaborating in distributed musical partnerships. Computers in Human Behavior, 120, 106757. https://doi.org/10.1016/j.chb.2021.106757

Selwyn, N. (2021). Education and Technology: Key Issues and Debates (3rd ed.). Bloomsbury Academic.

Star, S. L., & Ruhleder, K. (1996). Steps toward an ecology of infrastructure. Information Systems Research, 7(1), 111–134. https://doi.org/10.1287/isre.7.1.111

Wenger, E. (1998). Communities of Practice: Learning, Meaning, and Identity. Cambridge University Press. https://doi.org/10.1017/CBO9780511803932

Changelog

2026-01-20: Removed the Chafe (2018) “Stanford CCRMA: A 40-year retrospective” reference, which could not be confirmed in available databases (DOI does not resolve, not listed in Computer Music Journal 42(3)). The body text reference to CCRMA as an institutional example is retained; it does not depend on this citation.
2026-01-20: Changed “The term comes from Star and Ruhleder (1996)” to “The concept draws on Star and Ruhleder (1996).” Star and Ruhleder’s paper is the foundational text on relational infrastructure, but they did not coin the specific compound term “performative infrastructure.”

What the Videography Manual Didn't Cover: Filming Music Education

Tue, 13 Feb 2024 00:00:00 +0000

This post follows from the May 2023 post on the classroom videography manual. Read that one first if you want the baseline.

The Assumption Underneath the Manual

The manual we published (Kramer, Spicker, and Kaspar, 2023, open access at kups.ub.uni-koeln.de/65599) is a good document for what it is. It covers a classroom. It assumes a teacher in front of twenty to thirty students, a forty-five-minute lesson, a room with windows that create backlighting problems, a consent process that involves four institutional levels, and two static cameras facing each other as the baseline configuration.

All of that is correct for the context it addresses. The context is school-based subject teaching: physics, mathematics, German, history. The University of Cologne teacher education programme we developed the manual for is primarily about preparing people for exactly that context.

When I moved to the Cologne University of Music, I brought the same assumptions with me. It took a while for me to notice how much the new context violated them.

Sound Is Not the Same Problem

In the manual, the section on audio equipment is focused on speech capture. The recommendation — lavalier microphones for the teacher, boundary microphones at the cameras for student audio — is correct for a lesson where the subject matter is communicated through talking. The teacher talks. The students talk back. The quality criterion for the audio is: can we understand what is being said?

In music education, the subject matter is sound. What the student produces acoustically is not background noise supporting verbal instruction — it is the object of the lesson. And it is produced by instruments that have almost nothing in common acoustically with a human voice.

A lavalier microphone clipped to a teacher’s collar, positioned to capture speech from thirty centimetres away, will record a student’s piano playing through the back of the teacher’s head, through the air, through a directional capsule aimed at the wrong thing. The resulting audio is technically present and analytically useless.

Instruments have frequency ranges, dynamic ranges, and directional patterns that require completely different microphone selection and placement. A violin at fortissimo in a small practice room will clip every speech-grade microphone in the room. A pianist’s breath-controlled pianissimo passage that a skilled listener can hear clearly will barely register on a distant boundary microphone designed to capture “the general acoustic environment.” The distinction between a correctly produced tone and an incorrectly produced tone, which is the actual content of the lesson, may or may not be audible in the captured audio depending on whether anyone thought about microphone choice before walking through the door.

The manual’s principle of “as much as necessary, as little as possible” still applies, but “necessary” is a completely different specification here.

The One-to-One Lesson Problem

The classroom videography framework — including the manual — is built around a structural assumption: there is a teacher, and there is a class. The teacher stands or moves at the front; the students are arrayed in rows or groups. Two cameras can cover this because the spatial structure is relatively stable and the relevant action is roughly predictable.

A university instrumental lesson is typically one-to-one, in a small practice room, for sixty minutes. The spatial structure is two people close together around an instrument. The relevant action includes:

The teacher demonstrating a passage on their own instrument
The teacher making a physical correction — adjusting bow arm position, repositioning the student’s hand on the fingerboard, demonstrating breath support by putting a hand on the student’s diaphragm
The student playing and the teacher listening with their eyes closed
The teacher singing a melodic contour to show phrasing
Both of them playing at the same time (unison work, call and response)

A standard two-camera classroom setup captures none of this usefully. The standard framing — wide angle, teacher on one side, student on the other — produces footage where “something is happening near the piano” but where the analytically relevant detail (the finger position, the bow angle, the postural correction) is invisible at normal viewing distance.

You need different framing. You probably need closer cameras. You might need a third angle for body position. And you need to accept that this raises the setup complexity substantially beyond what the manual recommends as a baseline.

What the Lesson Is Actually About

There is a deeper structural difference that the equipment and setup challenges are symptoms of.

In subject-matter teaching, the lesson is the unit of analysis. A forty-five-minute lesson has a beginning, a development, a conclusion. The teacher enters with a plan; the video captures how that plan was executed and how the students responded. The analytical interest is in the lesson as a coherent pedagogical event.

In instrumental music education, the lesson is a container for cycles. A student plays a passage. The teacher identifies a problem — the intonation at bar twelve, the tendency to rush the syncopated rhythm, the bow pressure collapsing in the crescendo. The teacher says or demonstrates something. The student tries again. The teacher listens to what changed and what did not.

These cycles are the unit of analysis, and they happen dozens of times in a single lesson. The lesson-level video is useful context, but the analytically interesting question is inside the cycle: what did the teacher identify, what intervention did they choose, what happened to the student’s playing afterward?

Capturing those cycles in usable form requires not just video of the lesson but video that is indexed to them — where each attempt-and-response pair can be located and compared. A continuous recording of a sixty-minute lesson is not organised for this purpose. Timestamps help but do not replace the work of finding and annotating each cycle.
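To make "indexed to cycles" concrete, here is a minimal sketch in Python of what such an index over a continuous recording could look like. Everything in it is an assumption made for illustration: the `Cycle` record, its field names, and the helper functions are hypothetical and do not come from the manual or from any existing tool, and the times are invented sample values rather than data from a real lesson.

```python
from dataclasses import dataclass


@dataclass
class Cycle:
    """One attempt-and-response pair inside a continuous lesson recording."""
    passage: str          # hypothetical label, e.g. "bar 12"
    attempt_start: float  # seconds from the start of the recording
    attempt_end: float
    intervention: str     # what the teacher said or demonstrated
    retry_start: float
    retry_end: float


def cycles_for_passage(cycles, passage):
    """Return all cycles on one passage, ordered by when the attempt began."""
    hits = [c for c in cycles if c.passage == passage]
    return sorted(hits, key=lambda c: c.attempt_start)


def clip_pairs(cycles):
    """Yield (attempt, retry) time ranges so the two takes can be compared."""
    for c in cycles:
        yield (c.attempt_start, c.attempt_end), (c.retry_start, c.retry_end)


# Illustrative values only, not data from a real lesson.
lesson = [
    Cycle("bar 12", 310.0, 325.0, "sing the interval first", 340.0, 356.0),
    Cycle("bar 12", 95.0, 110.0, "slow down, check intonation", 130.0, 146.0),
    Cycle("crescendo", 400.0, 420.0, "keep bow weight constant", 450.0, 470.0),
]
bar12 = cycles_for_passage(lesson, "bar 12")
pairs = list(clip_pairs(bar12))
```

The point of the sketch is the shape of the record, not the code: each cycle ties an attempt, an intervention, and a retry to positions in the recording, which is the annotation work that timestamps alone do not do.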

The Absent Camera Problem

There is a more fundamental issue that no amount of improved equipment configuration addresses.

The feedback cycle a teacher most wants to reach is the one that happens in a student’s practice session. Between lessons, the student is alone in a practice room, working through the same passages, repeating the same mistakes (or, occasionally, having the experience of something going right for reasons they do not fully understand). The teacher’s instructions from the last lesson are present only in the student’s memory of them, which is fallible and partial.

The videography manual is about research documentation: a trained operator, institutional consent, equipment brought in from outside. None of that is available in a student’s practice session at eleven o’clock on a Wednesday night. And even if you could film it — which you could, technically, with a phone — the resulting footage would be unwatched, because no workflow exists to get it from the student’s device to the teacher’s eyes in a form that supports structured feedback.

The practical reality is that most music teachers learn about a student’s practice only through the student’s report of it (“I practised every day”) and through the evidence presented in the lesson (which may or may not reflect what practice actually looked like). The gap between practice and lesson feedback is a structural feature of music education, and it is not something that research videography can address.

A Software Response

The tool I built to think through this problem is called Resonance, and it is available at github.com/sebastianspicker/resonance.

The design is deliberately different from the research videography model. Instead of an external camera operator documenting a lesson for later analysis, Resonance puts the documentation instrument in the student’s hands. Students capture short audio or video clips of their own practice — snippets of a passage they want the teacher to hear, a moment where something went wrong, a phrase they are finally getting right — and submit them to a course. The teacher reviews the queue and adds feedback with timestamped annotations: “at 0:23, the bow pressure drops — this is what is generating the scratch.”
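A timestamped annotation of this kind can be pictured as a very small record. The following Python sketch is an assumption for illustration only: the `Annotation` class and its field names are hypothetical and are not taken from the Resonance codebase or its schema.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """A teacher comment attached to one position in a submitted clip."""
    offset_seconds: int  # position in the clip, in whole seconds
    comment: str

    def label(self) -> str:
        """Render the offset as m:ss followed by the comment."""
        minutes, seconds = divmod(self.offset_seconds, 60)
        return f"at {minutes}:{seconds:02d}, {self.comment}"


note = Annotation(23, "the bow pressure drops")
print(note.label())  # at 0:23, the bow pressure drops
```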

The asymmetry is intentional. The student decides what to document. The teacher provides structured, specific feedback. The cycle is asynchronous — the student submits at eleven on a Wednesday night; the teacher responds Thursday morning — which means it is independent of the lesson schedule.

The technical decisions follow from the use context. Students practise in rooms where connectivity is unreliable, so the app is offline-first: recordings are captured locally and uploaded when a connection is available. An iPad is the natural form factor for a music student: larger screen, better camera, sits on a music stand. The backend is standard (Node.js, Postgres, S3-compatible object storage) because the interesting problem here is not the infrastructure but the workflow.
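The offline-first idea (capture locally, upload when a connection is available) can be sketched in a few lines. This is a generic illustration in Python, not Resonance's implementation: the `UploadQueue` class, the `is_online` and `upload` callables, and the clip identifiers are all hypothetical, and connectivity is simulated with a simple flag.

```python
from collections import deque


class UploadQueue:
    """Keeps recordings locally and uploads them once a connection exists."""

    def __init__(self, is_online, upload):
        self._pending = deque()      # recordings captured but not yet uploaded
        self._is_online = is_online  # callable returning True when connected
        self._upload = upload        # callable that sends one recording

    def capture(self, recording_id):
        """Capturing always succeeds locally, regardless of connectivity."""
        self._pending.append(recording_id)

    def flush(self):
        """Upload pending recordings in order; stop as soon as we are offline."""
        sent = []
        while self._pending and self._is_online():
            recording_id = self._pending[0]
            self._upload(recording_id)
            self._pending.popleft()  # drop only after the upload returned
            sent.append(recording_id)
        return sent

    def pending(self):
        """Return the recordings still waiting for a connection."""
        return list(self._pending)


# Simulated connectivity for illustration.
state = {"online": False}
uploaded = []
queue = UploadQueue(lambda: state["online"], uploaded.append)
queue.capture("clip-1")
queue.capture("clip-2")
offline_result = queue.flush()  # nothing is sent while offline
state["online"] = True
online_result = queue.flush()   # both clips are sent in capture order
```

The design choice worth noticing is that a recording leaves the local queue only after its upload call has returned, so a connection that drops mid-session delays the upload but does not lose the clip.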

Resonance is a prototype and a proof of concept, not a production system. The authentication is explicitly development-mode only. The goal was to build enough of the thing to be able to think clearly about what it does and does not solve.

What It Does Not Solve

Resonance addresses the absent-camera problem for the practice-to-feedback loop. It does not address the research documentation problem that the videography manual was written for.

If you want to study how music teachers give feedback — as a research question about teaching practice, not just as a workflow tool — you still need the full apparatus: controlled recording conditions, appropriate microphones for instruments, multi-camera coverage of the lesson, consent for the resulting footage to be used for research and teaching purposes, and post-processing that produces an analytically usable document.

Resonance footage is not that. It is what a student chose to capture on an iPad in a practice room, with whatever acoustic environment happened to be present. It is useful for the practice-feedback cycle; it is not a research record.

The challenges I described in the earlier sections (appropriate microphones, multi-angle coverage of one-to-one lessons, capture of the practice cycle rather than the lesson arc) are still open problems for anyone trying to do systematic observational research in music education. The manual gives you the framework for thinking about them. It does not give you solutions, because those solutions are context-specific and, in several cases, not yet worked out by the field.

What I find interesting is that the two problems — research documentation and practice-feedback — might look the same (filming music education) but require almost entirely different responses. Getting clear on which problem you are solving turns out to be most of the work.

The full classroom videography manual is at kups.ub.uni-koeln.de/65599. The Resonance repository is at github.com/sebastianspicker/resonance.

References

Creech, A., & Hallam, S. (2011). Learning a musical instrument: The influence of interpersonal interaction on outcomes for school-aged pupils. Psychology of Music, 39(1), 102–122. https://doi.org/10.1177/0305735610370222

Kramer, C., Spicker, S. J., & Kaspar, K. (2023). Manual zur Erstellung von Unterrichtsvideographien. KUPS Open Access. https://kups.ub.uni-koeln.de/65599/

Lehmann, A. C., Sloboda, J. A., & Woody, R. H. (2007). Psychology for Musicians: Understanding and Acquiring the Skills. Oxford University Press.

Presland, C. (2005). Conservatoire student and instrumental professor: The student perspective on a complex relationship. British Journal of Music Education, 22(3), 237–248. https://doi.org/10.1017/S0265051705006558