What the Videography Manual Didn't Cover: Filming Music Education

This post follows from the May 2023 post on the classroom videography manual. Read that one first if you want the baseline.

The Assumption Underneath the Manual

The manual we published (Kramer, Spicker, and Kaspar, 2023, open access at kups.ub.uni-koeln.de/65599) is a good document for what it is. It covers a classroom. It assumes a teacher in front of twenty to thirty students, a forty-five-minute lesson, a room with windows that create backlighting problems, a consent process that involves four institutional levels, and two static cameras facing each other as the baseline configuration.

All of that is correct for the context it addresses. The context is school-based subject teaching: physics, mathematics, German, history. The University of Cologne teacher education programme we developed the manual for is primarily about preparing people for exactly that context.

When I moved to the Cologne University of Music, I brought the same assumptions with me. It took a while for me to notice how much the new context violated them.

Sound Is Not the Same Problem

In the manual, the section on audio equipment is focused on speech capture. The recommendation — lavalier microphones for the teacher, boundary microphones at the cameras for student audio — is correct for a lesson where the subject matter is communicated through talking. The teacher talks. The students talk back. The quality criterion for the audio is: can we understand what is being said?

In music education, the subject matter is sound. What the student produces acoustically is not background noise supporting verbal instruction — it is the object of the lesson. And it is produced by instruments that have almost nothing in common acoustically with a human voice.

A lavalier microphone clipped to a teacher’s collar, positioned to capture speech from thirty centimetres away, will record a student’s piano playing through the back of the teacher’s head, through the air, through a directional capsule aimed at the wrong thing. The resulting audio is technically present and analytically useless.

Instruments have frequency ranges, dynamic ranges, and directional patterns that require completely different microphone selection and placement. A violin at fortissimo in a small practice room will clip every speech-grade microphone in the room. A pianissimo, breath-controlled passage that a skilled listener can hear clearly will barely register on a distant boundary microphone designed to capture “the general acoustic environment.” The distinction between a correctly produced tone and an incorrectly produced tone, which is the actual content of the lesson, may or may not be audible in the captured audio depending on whether anyone thought about microphone choice before walking through the door.
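One way to check a test recording for the clipping problem described above is to count how many samples sit at full scale. This is only a minimal sketch: it assumes samples have already been decoded to floats in the range -1.0 to 1.0, and the function name `clipped_fraction` and the 0.999 threshold are illustrative choices, not something taken from the manual.

```python
from typing import Sequence


def clipped_fraction(samples: Sequence[float], threshold: float = 0.999) -> float:
    """Return the share of samples at or beyond the clipping threshold."""
    if not samples:
        return 0.0
    clipped = sum(1 for s in samples if abs(s) >= threshold)
    return clipped / len(samples)


# Hypothetical sample values, invented for illustration only.
quiet_take = [0.02, -0.03, 0.01, -0.02]
loud_take = [0.4, 1.0, -1.0, 1.0, 0.7, -1.0, 0.2, 1.0]

print(clipped_fraction(quiet_take))
print(clipped_fraction(loud_take))
```

A non-zero result on a short fortissimo test passage suggests that the gain or the microphone choice needs revisiting before the lesson is recorded. It says nothing about the opposite failure, where a quiet passage disappears into the noise floor.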

The manual’s principle of “as much as necessary, as little as possible” still applies, but “necessary” is a completely different specification here.

The One-to-One Lesson Problem

The classroom videography framework — including the manual — is built around a structural assumption: there is a teacher, and there is a class. The teacher stands or moves at the front; the students are arrayed in rows or groups. Two cameras can cover this because the spatial structure is relatively stable and the relevant action is roughly predictable.

A university instrumental lesson is typically one-to-one, in a small practice room, for sixty minutes. The spatial structure is two people close together around an instrument. The relevant action includes:

The teacher demonstrating a passage on their own instrument
The teacher making a physical correction — adjusting bow arm position, repositioning the student’s hand on the fingerboard, demonstrating breath support by putting a hand on the student’s diaphragm
The student playing and the teacher listening with their eyes closed
The teacher singing a melodic contour to show phrasing
Both of them playing at the same time (unison work, call and response)

A standard two-camera classroom setup captures none of this usefully. The standard framing — wide angle, teacher on one side, student on the other — produces footage where “something is happening near the piano” but where the analytically relevant detail (the finger position, the bow angle, the postural correction) is invisible at normal viewing distance.

You need different framing. You probably need closer cameras. You might need a third angle for body position. And you need to accept that this raises the setup complexity substantially beyond what the manual recommends as a baseline.

What the Lesson Is Actually About

There is a deeper structural difference that the equipment and setup challenges are symptoms of.

In subject-matter teaching, the lesson is the unit of analysis. A forty-five-minute lesson has a beginning, a development, a conclusion. The teacher enters with a plan; the video captures how that plan was executed and how the students responded. The analytical interest is in the lesson as a coherent pedagogical event.

In instrumental music education, the lesson is a container for cycles. A student plays a passage. The teacher identifies a problem — the intonation at bar twelve, the tendency to rush the syncopated rhythm, the bow pressure collapsing in the crescendo. The teacher says or demonstrates something. The student tries again. The teacher listens to what changed and what did not.

These cycles are the unit of analysis, and they happen dozens of times in a single lesson. The lesson-level video is useful context, but the analytically interesting question is inside the cycle: what did the teacher identify, what intervention did they choose, what happened to the student’s playing afterward?

Capturing those cycles in usable form requires not just video of the lesson but video that is indexed to them — where each attempt-and-response pair can be located and compared. A continuous recording of a sixty-minute lesson is not organised for this purpose. Timestamps help but do not replace the work of finding and annotating each cycle.
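What "indexed to the cycles" could mean in practice is sketched below. The field names, the `Cycle` record, and the example timings are all hypothetical; this is one possible shape for such an index, assuming that someone has already done the annotation work by hand.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cycle:
    """One attempt-and-response pair inside a continuous lesson recording."""
    passage: str        # e.g. "bar 12"
    problem: str        # what the teacher identified
    attempt_at: float   # seconds into the recording: the student plays
    response_at: float  # the teacher says or demonstrates something
    retry_at: float     # the student tries again


def cycles_for(index: List[Cycle], passage: str) -> List[Cycle]:
    """Return all cycles on one passage, in the order they occurred."""
    matches = [c for c in index if c.passage == passage]
    return sorted(matches, key=lambda c: c.attempt_at)


# Invented example data for a sixty-minute lesson.
lesson_index = [
    Cycle("bar 12", "intonation", 1510.0, 1532.0, 1561.0),
    Cycle("bar 30", "rushing the syncopation", 900.0, 921.0, 940.0),
    Cycle("bar 12", "intonation", 312.0, 330.0, 348.0),
]

bar_12 = cycles_for(lesson_index, "bar 12")
for cycle in bar_12:
    print(cycle.attempt_at, cycle.retry_at)
```

With an index like this, the two attempts at bar twelve can be pulled up side by side even though they sit twenty minutes apart in the recording. The code is the easy part; producing the index is the annotation work that timestamps alone do not replace.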

The Absent Camera Problem

There is a more fundamental issue that no amount of improved equipment configuration addresses.

The feedback cycle a teacher most wants to reach is the one that happens in a student’s practice session. Between lessons, the student is alone in a practice room, working through the same passages, repeating the same mistakes (or, occasionally, having the experience of something going right for reasons they do not fully understand). The teacher’s instructions from the last lesson are present only in the student’s memory of them, which is fallible and partial.

The videography manual is about research documentation: a trained operator, institutional consent, equipment brought in from outside. None of that is available in a student’s practice session at eleven o’clock on a Wednesday night. And even if you could film it — which you could, technically, with a phone — the resulting footage would be unwatched, because no workflow exists to get it from the student’s device to the teacher’s eyes in a form that supports structured feedback.

The practical reality is that most music teachers learn about a student’s practice only through the student’s report of it (“I practiced every day”) and through the evidence presented in the lesson (which may or may not reflect what practice actually looked like). The gap between practice and lesson feedback is a structural feature of music education, and it is not something that research videography can address.

A Software Response

The tool I built to think through this problem is called Resonance, and it is available at github.com/sebastianspicker/resonance.

The design is deliberately different from the research videography model. Instead of an external camera operator documenting a lesson for later analysis, Resonance puts the documentation instrument in the student’s hands. Students capture short audio or video clips of their own practice — snippets of a passage they want the teacher to hear, a moment where something went wrong, a phrase they are finally getting right — and submit them to a course. The teacher reviews the queue and adds feedback with timestamped annotations: “at 0:23, the bow pressure drops — this is what is generating the scratch.”
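A timestamped annotation of the kind quoted above can be modelled very simply. The sketch below is not Resonance's actual data model; the `Annotation` class, its fields, and the clip identifier are assumptions made for illustration, and it only handles positions written as minutes and seconds.

```python
from dataclasses import dataclass


def parse_timestamp(text: str) -> int:
    """Convert an "m:ss" clip position such as "0:23" into whole seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def format_timestamp(total_seconds: int) -> str:
    """Convert whole seconds back into the "m:ss" form shown to the student."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


@dataclass
class Annotation:
    clip_id: str     # hypothetical identifier of the submitted clip
    at_seconds: int  # position in the clip the comment refers to
    comment: str

    def render(self) -> str:
        return f"at {format_timestamp(self.at_seconds)}, {self.comment}"


note = Annotation("clip-001", parse_timestamp("0:23"), "the bow pressure drops")
print(note.render())
```

Storing the position as a number rather than as text is what would let a player jump straight to the moment the teacher is talking about.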

The asymmetry is intentional. The student decides what to document. The teacher provides structured, specific feedback. The cycle is asynchronous — the student submits at eleven on a Wednesday night; the teacher responds Thursday morning — which means it is independent of the lesson schedule.

The technical decisions follow from the use context. Students practice in rooms where connectivity is unreliable, so the app is offline-first: recordings are captured locally and uploaded when a connection is available. An iPad is the natural form factor for a music student — larger screen, better camera, sits on a music stand. The backend is standard (Node.js, Postgres, S3-compatible object storage) because the interesting problem here is not the infrastructure but the workflow.
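The offline-first idea can be sketched as a queue that keeps recordings locally and drains only when an upload succeeds. This is an illustration of the pattern, not the code in the repository: `UploadQueue`, `fake_upload`, and the `network` flag are hypothetical names, and the queue lives in memory here, whereas a real app would persist it on the device.

```python
from collections import deque
from typing import Callable, Deque, List


class UploadQueue:
    """Hold captured clips locally and send them in order when possible."""

    def __init__(self, upload: Callable[[str], bool]) -> None:
        self._pending: Deque[str] = deque()
        self._upload = upload  # returns True if the clip reached the server

    def record(self, clip_id: str) -> None:
        """Capturing never depends on the network: just enqueue locally."""
        self._pending.append(clip_id)

    def flush(self) -> List[str]:
        """Try to upload pending clips; stop at the first failure."""
        sent: List[str] = []
        while self._pending:
            clip_id = self._pending[0]
            if not self._upload(clip_id):
                break  # still offline, keep the clip for the next attempt
            self._pending.popleft()
            sent.append(clip_id)
        return sent

    @property
    def pending(self) -> List[str]:
        return list(self._pending)


# Stand-in for a real network call, toggled by hand for the demonstration.
network = {"online": False}


def fake_upload(clip_id: str) -> bool:
    return network["online"]


queue = UploadQueue(fake_upload)
queue.record("scale-warmup")
queue.record("bar-12-retry")
print(queue.flush(), queue.pending)   # nothing sent while offline
network["online"] = True
print(queue.flush(), queue.pending)   # both clips sent once a connection exists
```

A clip is removed from the queue only after the upload reports success, so a dropped connection in the practice room costs a delay rather than a recording.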

Resonance is a prototype and a proof of concept, not a production system. The authentication is explicitly development-mode only. The goal was to build enough of the thing to be able to think clearly about what it does and does not solve.

What It Does Not Solve

Resonance addresses the absent-camera problem for the practice-to-feedback loop. It does not address the research documentation problem that the videography manual was written for.

If you want to study how music teachers give feedback — as a research question about teaching practice, not just as a workflow tool — you still need the full apparatus: controlled recording conditions, appropriate microphones for instruments, multi-camera coverage of the lesson, consent for the resulting footage to be used for research and teaching purposes, and post-processing that produces an analytically usable document.

Resonance footage is not that. It is what a student chose to capture on an iPad in a practice room, with whatever acoustic environment happened to be present. It is useful for the practice-feedback cycle; it is not a research record.

The challenges I described in the earlier sections (appropriate microphones, multi-angle coverage of one-to-one lessons, capture of the practice cycle rather than the lesson arc) are still open problems for anyone trying to do systematic observational research in music education. The manual gives you the framework for thinking about them. It does not give you solutions, because those solutions are context-specific and, in several cases, not yet worked out by the field.

What I find interesting is that the two problems — research documentation and practice-feedback — might look the same (filming music education) but require almost entirely different responses. Getting clear on which problem you are solving turns out to be most of the work.

The full classroom videography manual is at kups.ub.uni-koeln.de/65599. The Resonance repository is at github.com/sebastianspicker/resonance.

References

Kramer, C., Spicker, S. J., & Kaspar, K. (2023). Manual zur Erstellung von Unterrichtsvideographien. KUPS Open Access. https://kups.ub.uni-koeln.de/65599/

Lehmann, A. C., Sloboda, J. A., & Woody, R. H. (2007). Psychology for Musicians: Understanding and Acquiring the Skills. Oxford University Press.

Presland, C. (2005). Conservatoire student and instrumental professor: The student perspective on a complex relationship. British Journal of Music Education, 22(3), 237–248. https://doi.org/10.1017/S0265051705006558

Creech, A., & Hallam, S. (2011). Learning a musical instrument: The influence of interpersonal interaction on outcomes for school-aged pupils. Psychology of Music, 39(1), 102–122. https://doi.org/10.1177/0305735610370222
