<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Research-Methods on Sebastian Spicker</title>
    <link>https://sebastianspicker.github.io/tags/research-methods/</link>
    <description>Recent content in Research-Methods on Sebastian Spicker</description>
    <image>
      <title>Sebastian Spicker</title>
      <url>https://sebastianspicker.github.io/og-image.png</url>
      <link>https://sebastianspicker.github.io/og-image.png</link>
    </image>
    <generator>Hugo -- 0.160.0</generator>
    <language>en</language>
    <lastBuildDate>Tue, 10 Jun 2025 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://sebastianspicker.github.io/tags/research-methods/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Your Transcript Is Already an Interpretation: AI Transcription and Grounded Theory</title>
      <link>https://sebastianspicker.github.io/posts/ai-transcription-grounded-theory/</link>
      <pubDate>Tue, 10 Jun 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/ai-transcription-grounded-theory/</guid>
      <description>aTrain and noScribe are local, GDPR-compliant, Whisper-based transcription tools that can genuinely save hours of work in qualitative interview research. They also make methodological decisions on your behalf without telling you. If you do grounded theory, you need to know which decisions those are.</description>
      <content:encoded><![CDATA[<p><em>In June 2025 I put together a practical guide on AI-assisted transcription
for professors of music pedagogy at HfMT Köln — primarily a hands-on
introduction to aTrain and noScribe. This post is the methodological
companion to that guide: the stuff I could not fit into a workshop handout
but that I think matters more than the installation instructions.</em></p>
<hr>
<h2 id="the-seduction">The Seduction</h2>
<p>AI transcription tools have reached a point where, for clean audio of a
single speaker in a quiet room, the output is genuinely good. You load a
90-minute interview, click a button, wait roughly 20 minutes, and get a
readable transcript with timestamps and speaker labels. In working hours,
that is an order of magnitude faster than manual transcription. The appeal is
obvious, especially if you are a qualitative researcher working with a backlog
of interview recordings.</p>
<p>The two tools I have been evaluating — <strong>aTrain</strong> (developed at the University of
Graz) and <strong>noScribe</strong> (an independent open-source project) — both run
entirely locally on your machine. No audio file is uploaded anywhere. No
cloud API is involved. This matters for interview research: you are handling
other people&rsquo;s speech, often on topics they regard as sensitive, and the
GDPR landscape for sending recordings to external servers is genuinely
complicated. Local processing sidesteps that problem entirely.</p>
<p>Both tools are built on <strong>OpenAI&rsquo;s Whisper model</strong>, which is — despite the
name — open-source and runs offline. They differ in interface philosophy,
feature depth, and what methodological commitments they make visible.</p>
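<p>For a sense of what sits underneath both tools: a minimal sketch of the core
transcription step, using the open-source <code>openai-whisper</code> package. This is not
aTrain&rsquo;s or noScribe&rsquo;s actual code; both add speaker detection and output
formatting on top of a call like this.</p>
<pre><code class="language-python"># Minimal local transcription with the open-source Whisper package
# (pip install openai-whisper). Runs entirely offline after the one-time
# model download; no audio leaves the machine.
import whisper

model = whisper.load_model("medium")            # cached locally after first run
result = model.transcribe("interview.wav", language="de")

for seg in result["segments"]:                  # segments carry timestamps
    print(f"[{seg['start']:7.1f}s] {seg['text'].strip()}")
</code></pre>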
<p>But the seduction is the problem. The speed and cleanliness of the output
make it easy to treat the transcript as a neutral record rather than as a
construction. It is not. Every transcription is an act of interpretation. An
AI transcription is an act of interpretation performed by an algorithm that
does not know what your research question is.</p>
<hr>
<h2 id="why-this-is-a-grounded-theory-problem-specifically">Why This Is a Grounded Theory Problem Specifically</h2>
<p>In grounded theory — whether you follow the Strauss and Corbin tradition or
the constructivist reformulation by Charmaz — the researcher is not a passive
recorder of data. The analytical process begins with the first moment of
contact with the material. Coding, memo-writing, constant comparison, and
theoretical sampling all assume that you are working with data that you have
genuinely engaged with and that reflects choices made with your research
question in mind.</p>
<p>Transcription is the first of those choices. What counts as a pause? Do you
mark hesitations and self-corrections? Do you capture overlapping speech? Do
you note emphasis, speed changes, or trailing-off? The answers to these
questions are not neutral. They are determined by what level of analysis you
intend. A thematic analysis of interview content needs something different
from a conversation analysis of turn-taking, which needs something different
from a discourse analysis attending to hedges and disfluencies.</p>
<p>When you transcribe manually, you make these choices explicitly or
implicitly, but you make them. When you delegate to an algorithm, the
algorithm makes them — according to its training data and its default
settings — and then presents you with output that looks authoritative.</p>
<p>The risk is not that AI transcription is inaccurate (though it sometimes is).
The risk is that it is <em>selectively accurate in ways you did not choose</em> and
that those choices shape what you subsequently see in the data.</p>
<hr>
<h2 id="what-the-tools-actually-do">What the Tools Actually Do</h2>
<h3 id="atrain">aTrain</h3>
<p>aTrain is the simpler of the two. Windows-native (Microsoft Store), with a
macOS beta for Apple Silicon. The interface has essentially one meaningful
decision point after you load your file: whether to activate speaker
detection. Everything else is handled automatically. Output formats are plain
text with timestamps, SRT subtitle files, and — most useful for researchers —
direct QDA exports for MAXQDA, ATLAS.ti, and NVivo with synchronised
audio-timestamp links.</p>
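<p>The SRT export is worth one small illustration, because it shows how thin the
timestamp structure actually is. A sketch of the conversion, assuming Whisper-style
segment dicts as in the snippet above; aTrain&rsquo;s real exporter will differ in the
details:</p>
<pre><code class="language-python"># Sketch: Whisper-style segments to SRT. The format is just a numbered list
# of "start --&gt; end" time ranges with text; aTrain's actual exporter differs.
def srt_time(t: float) -&gt; str:
    """Format seconds as the HH:MM:SS,mmm that SRT requires."""
    h, rem = divmod(int(t), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02}:{m:02}:{s:02},{int((t % 1) * 1000):03}"

def to_srt(segments) -&gt; str:
    blocks = []
    for i, seg in enumerate(segments, start=1):
        text = seg["text"].strip()
        blocks.append(f"{i}\n{srt_time(seg['start'])} --&gt; {srt_time(seg['end'])}\n{text}\n")
    return "\n".join(blocks)
</code></pre>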
<p>What aTrain does not do: it does not mark pauses. It does not detect
disfluencies (the <em>ähms</em>, <em>uhs</em>, self-interruptions, false starts). It does
not detect overlapping speech. It produces clean, semantically coherent
transcripts — which means it actively smooths what you gave it. If a
speaker says <em>&ldquo;well — I mean — it was, I think it was more like — yeah,
complicated&rdquo;</em>, aTrain will probably give you something closer to <em>&ldquo;I think it
was complicated&rdquo;</em>. The hesitation structure disappears.</p>
<p>For a thematic interview study where you are interested in what people said
about a topic, this is probably fine. For any analysis where <em>how</em> something
was said is part of the data — pace, repair, emphasis, epistemic hedging —
aTrain is erasing data you need.</p>
<h3 id="noscribe">noScribe</h3>
<p>noScribe is more complex in almost every dimension. Available for Windows,
macOS (including Apple Silicon and Intel), and Linux. The interface exposes
a meaningful number of configuration decisions:</p>
<ul>
<li><strong>Mark Pause</strong>: off, or marked at 1-, 2-, or 3-second thresholds, with
conventional notation <code>(.)</code>, <code>(..)</code>, <code>(...)</code>, <code>(10 seconds pause)</code> (the mapping is sketched in code after this list)</li>
<li><strong>Speaker Detection</strong>: automatic count, fixed count, or disabled</li>
<li><strong>Overlapping Speech</strong>: experimental detection, marked with <code>//double slash//</code></li>
<li><strong>Disfluencies</strong>: off or on — captures <em>ähm</em>, <em>äh</em>, self-corrections,
false starts</li>
<li><strong>Timestamps</strong>: by speaker turn or every 60 seconds</li>
</ul>
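<p>To make the pause settings concrete, here is a hypothetical reconstruction of
the threshold-to-notation mapping. The function name and the exact cut-offs are
mine, inferred from the interface, not noScribe&rsquo;s internals:</p>
<pre><code class="language-python"># Hypothetical reconstruction of the pause notation from the interface
# settings; thresholds and function name are mine, not noScribe's code.
def pause_marker(silence: float, threshold: float = 1.0) -&gt; str:
    """Map a silence length in seconds onto a conventional pause marker."""
    if silence &lt; threshold:
        return ""                       # below threshold: not marked at all
    if silence &lt; 2.0:
        return "(.)"
    if silence &lt; 3.0:
        return "(..)"
    if silence &lt; 10.0:
        return "(...)"
    return f"({round(silence)} seconds pause)"  # long pauses spelled out
</code></pre>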
<p>It also has an integrated editor (noScribeEdit) with synchronised audio
playback: click anywhere in the transcript and the audio seeks to that
position. This is the single most useful feature for post-transcription
review, and aTrain does not have anything equivalent.</p>
<p>The configuration complexity is not gratuitous. It reflects the fact that
different methodological frameworks require different transcription
conventions. noScribe&rsquo;s disfluency detection corresponds roughly to what a
GAT2-Light transcription requires. Its pause notation system maps onto
conversation analytic conventions. The choices you make in the interface are
methodological choices, not just technical preferences.</p>
<hr>
<h2 id="the-normalisation-problem">The Normalisation Problem</h2>
<p>Both tools perform what I would call <em>normalisation</em>: they produce transcripts
that read more fluently than the original speech. This is a feature from a
usability standpoint and a methodological liability from a qualitative
research standpoint.</p>
<p>Specific failure modes I observed in evaluation:</p>
<p><strong>Compound word errors</strong> (more pronounced in noScribe for German): <em>VR-Brille</em>
(&ldquo;VR headset&rdquo;) transcribed as <em>Brille VR</em>, proper nouns mangled, domain
vocabulary rendered phonetically. In music research contexts this is
particularly salient — instrument names, notation terms, composer names, and
genre vocabulary are all potential failure points.</p>
<p><strong>Speaker detection overcounting</strong>: both tools, when speaker detection is
active, tend to identify more speakers than are present. A two-person
interview with one hesitant speaker may generate three or four speaker labels.
Manual correction is required.</p>
<p><strong>Acoustic transcription</strong>: noScribe occasionally produces what my workshop
guide calls <em>lautliche Transkriptionen</em> — phonetic renderings rather than semantic
ones. A speaker saying <em>Beamer</em> (data projector) may be transcribed as <em>Bima</em>.
This is not an error in the conventional sense; it is the model accurately
representing what it heard acoustically rather than semantically resolving it.
For music researchers studying how non-specialist participants talk about
technical equipment, this is interesting. For most interview research, it
requires correction.</p>
<p><strong>Pause and overlap reliability degrades with audio quality</strong>: both tools
perform well on clean, close-mic mono recordings of single speakers in quiet
rooms. Introduce a second speaker, ambient noise, variable recording distance,
or a phone recording, and accuracy drops substantially. This matters
specifically for music interview research, where the interview setting is
often a rehearsal room or performance space rather than an acoustic booth.</p>
<hr>
<h2 id="a-methodological-comparison-not-a-feature-list">A Methodological Comparison, Not a Feature List</h2>
<p>The useful comparison between aTrain and noScribe is not technical — it is
about which methodological contexts each is suited to.</p>
<table>
  <thead>
      <tr>
          <th>Research context</th>
          <th>Tool</th>
          <th>Why</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Thematic/content analysis, single speaker</td>
          <td>aTrain</td>
          <td>Speed, simplicity, adequate accuracy, QDA export</td>
      </tr>
      <tr>
          <td>Grounded theory with attention to epistemic hedging</td>
          <td>noScribe + disfluencies</td>
          <td>Captures the hesitation structure that carries methodological information</td>
      </tr>
      <tr>
          <td>Conversation analysis</td>
          <td>Neither, or noScribe as starting point</td>
          <td>CA requires phonetic detail neither tool reliably produces</td>
      </tr>
      <tr>
          <td>Large corpus, initial open coding</td>
          <td>aTrain</td>
          <td>Volume and speed outweigh detail at early stages</td>
      </tr>
      <tr>
          <td>Interpretive phenomenological analysis</td>
          <td>noScribe</td>
          <td>The pause and disfluency data is IPA-relevant</td>
      </tr>
      <tr>
          <td>Teaching transcription as a research practice</td>
          <td>Both</td>
          <td><em>See below</em></td>
      </tr>
  </tbody>
</table>
<p>The last row deserves its own section.</p>
<hr>
<h2 id="using-both-tools-to-teach-about-transcription">Using Both Tools to Teach About Transcription</h2>
<p>The most pedagogically valuable use of these tools is probably not producing
transcripts — it is using them to make the constructed nature of transcripts
visible to students.</p>
<p>A simple exercise: take a three-minute excerpt of an interview recording.
Have students transcribe it manually according to whatever convention the
course uses. Then run the same excerpt through aTrain and noScribe with
different settings. Compare the three or four resulting transcripts in a
seminar discussion.</p>
<p>The differences that emerge are not about which transcript is &ldquo;correct&rdquo;. They
are about what each transcript makes visible and what it hides. The aTrain
transcript will be clean and readable. The manually-produced transcript will
have annotation that the students chose based on what struck them as relevant.
The noScribe transcript with disfluencies enabled will look noisy. All three
are representations of the same three minutes of speech.</p>
<p>Questions that come out of this reliably: Why did the student who transcribed
manually mark that particular pause? What did the student not mark that the
software did? What did the software produce that the student did not hear?
What does the &ldquo;cleaner&rdquo; transcript lose?</p>
<p>This is the entry point to a genuinely grounded theory-relevant conversation
about data construction: the transcript is not the data. The transcript is a
representation of the data made according to principles that should be
theoretically motivated, and those principles should be stated explicitly in
the methods section.</p>
<hr>
<h2 id="what-these-tools-cannot-replace">What These Tools Cannot Replace</h2>
<p>The document I prepared for the HfMT professors ends with a sentence I want
to quote directly from the German, because it is the methodological core of
the whole thing:</p>
<blockquote>
<p><em>Automatisierung ersetzt nicht das Nachdenken über Daten.</em>
Automation does not replace thinking about data.</p>
</blockquote>
<p>More precisely: the algorithm makes decisions about what counts as a pause,
what counts as language, whose voice counts as a separate speaker — without
knowing what is scientifically relevant. It does not know that the half-second
hesitation before a particular word is the most important moment in the
interview. It does not know that the overlapping &ldquo;mm-hm&rdquo; is a data point for
your analysis of how the interviewee manages discomfort. It does not know
that the repeated self-correction in the middle of a sentence about teaching
practice is where your emerging category is.</p>
<p>You have to know that. And you only know it if you have been in enough
contact with the material to have developed theoretical sensitivity — which is
exactly what Strauss and Corbin mean when they describe the iterative
relationship between data collection, coding, and theoretical development in
grounded theory.</p>
<p>AI transcription tools save the hours of typing. They do not and cannot
substitute for the analytical engagement that makes a grounded theory study
produce knowledge rather than a theme list.</p>
<p>Use them. But use them knowing what they are doing.</p>
<hr>
<h2 id="practical-summary">Practical Summary</h2>
<ul>
<li><strong>aTrain</strong>: one-click, local, GDPR-compliant, good QDA integration,
appropriate for thematic analysis. No disfluencies, no pauses, no
overlap detection. Versions: Windows (Microsoft Store), macOS beta.
Current version: 1.3.1.</li>
<li><strong>noScribe</strong>: more complex, highly configurable, disfluency and pause
detection, integrated audio-sync editor, appropriate for grounded theory
and discourse-oriented work. More demanding to set up. Current version:
0.6.2.</li>
<li><strong>Neither tool</strong> is appropriate as a black-box solution for conversation
analysis or prosodic research.</li>
<li><strong>Both tools</strong> require manual post-processing. Estimate correction time
at roughly 20–40% of the original interview length for clean recordings
with a single speaker; more for multi-speaker or suboptimal audio.</li>
<li><strong>In teaching</strong>: the exercise of comparing manual, aTrain, and noScribe
transcripts of the same excerpt is more pedagogically valuable than any
of the transcripts individually.</li>
</ul>
<hr>
<h2 id="references">References</h2>
<p>Charmaz, K. (2014). <em>Constructing Grounded Theory</em> (2nd ed.).
SAGE Publications.</p>
<p>Dresing, T., &amp; Pehl, T. (2018). <em>Praxisbuch Interview, Transkription &amp;
Analyse</em> (8th ed.). Eigenverlag. <a href="https://www.audiotranskription.de">https://www.audiotranskription.de</a></p>
<p>Haberl, A., Fleiß, J., Kowald, D., &amp; Thalmann, S. (2024). Take the aTrain.
Introducing an interface for the accessible transcription of interviews.
<em>Journal of Behavioral and Experimental Finance</em>, 41, 100891.
<a href="https://doi.org/10.1016/j.jbef.2024.100891">https://doi.org/10.1016/j.jbef.2024.100891</a></p>
<p>Dröge, K. (2023). noScribe [Computer software].
<a href="https://github.com/kaixxx/noScribe">https://github.com/kaixxx/noScribe</a></p>
<p>Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., &amp; Sutskever, I.
(2022). Robust speech recognition via large-scale weak supervision.
arXiv preprint arXiv:2212.04356. <a href="https://arxiv.org/abs/2212.04356">https://arxiv.org/abs/2212.04356</a></p>
<p>Strauss, A., &amp; Corbin, J. (1998). <em>Basics of Qualitative Research</em>
(2nd ed.). SAGE Publications.</p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-01-20</strong>: Updated the aTrain reference to the published form: Haberl, A., Fleiß, J., Kowald, D., &amp; Thalmann, S. (2024), &ldquo;Take the aTrain. Introducing an interface for the accessible transcription of interviews.&rdquo;</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>What the Videography Manual Didn&#39;t Cover: Filming Music Education</title>
      <link>https://sebastianspicker.github.io/posts/filming-music-education/</link>
      <pubDate>Tue, 13 Feb 2024 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/filming-music-education/</guid>
      <description>The classroom videography manual we published in 2023 was about filming teaching. Music education has the same word in it — teaching — but it is a fundamentally different recording challenge. Sound is the subject matter. The lesson is often one person, in a practice room. And the feedback cycle the teacher needs to reach is mostly the one that happens when no camera is present. A reflection on what the manual missed, and a software prototype that tries to address part of it.</description>
      <content:encoded><![CDATA[<p><em>This post follows from the <a href="/posts/villa-videography-manual/">May 2023 post on the classroom videography
manual</a>. Read that one first if you want
the baseline.</em></p>
<hr>
<h2 id="the-assumption-underneath-the-manual">The Assumption Underneath the Manual</h2>
<p>The manual we published — Kramer, Spicker, and Kaspar, 2023, open access at
<a href="https://kups.ub.uni-koeln.de/65599/">kups.ub.uni-koeln.de/65599</a> — is a
good document for what it is. It covers a classroom. It assumes a teacher
in front of twenty to thirty students, a forty-five minute lesson, a room
with windows that create backlighting problems, a consent process that
involves four institutional levels, and two static cameras facing each other
as the baseline configuration.</p>
<p>All of that is correct for the context it addresses. The context is
school-based subject teaching: physics, mathematics, German, history. The
University of Cologne teacher education programme we developed the manual
for is primarily about preparing people for exactly that context.</p>
<p>When I moved to the Cologne University of Music, I brought the same assumptions
with me. It took a while for me to notice how much the new context violated
them.</p>
<hr>
<h2 id="sound-is-not-the-same-problem">Sound Is Not the Same Problem</h2>
<p>In the manual, the section on audio equipment is focused on speech capture.
The recommendation — lavalier microphones for the teacher, boundary
microphones at the cameras for student audio — is correct for a lesson where
the subject matter is communicated through talking. The teacher talks. The
students talk back. The quality criterion for the audio is: can we understand
what is being said?</p>
<p>In music education, the subject matter <em>is</em> sound. What the student
produces acoustically is not background noise supporting verbal instruction —
it is the object of the lesson. And it is produced by instruments that
have almost nothing in common acoustically with a human voice.</p>
<p>A lavalier microphone clipped to a teacher&rsquo;s collar, positioned to capture
speech from thirty centimetres away, will record a student&rsquo;s piano playing
through the back of the teacher&rsquo;s head, through the air, through a
directional capsule aimed at the wrong thing. The resulting audio is
technically present and analytically useless.</p>
<p>Instruments have frequency ranges, dynamic ranges, and directional patterns
that require completely different microphone selection and placement. A
violin at fortissimo in a small practice room will clip every speech-grade
microphone in the room. A breath-controlled pianissimo passage
that a skilled listener can hear clearly will barely register on a distant
boundary microphone designed to capture &ldquo;the general acoustic environment.&rdquo;
The distinction between a correctly produced tone and an incorrectly produced
tone — which is the actual content of the lesson — may or may not be
audible in the captured audio depending on whether anyone thought about
microphone choice before walking through the door.</p>
<p>The manual&rsquo;s principle of &ldquo;as much as necessary, as little as possible&rdquo;
still applies, but &ldquo;necessary&rdquo; is a completely different specification
here.</p>
<hr>
<h2 id="the-one-to-one-lesson-problem">The One-to-One Lesson Problem</h2>
<p>The classroom videography framework — including the manual — is built around
a structural assumption: there is a teacher, and there is a class.
The teacher stands or moves at the front; the students are arrayed in rows
or groups. Two cameras can cover this because the spatial structure is
relatively stable and the relevant action is roughly predictable.</p>
<p>A university instrumental lesson is typically one-to-one, in a small
practice room, for sixty minutes. The spatial structure is two people
close together around an instrument. The relevant action includes:</p>
<ul>
<li>The teacher demonstrating a passage on their own instrument</li>
<li>The teacher making a physical correction — adjusting bow arm position,
repositioning the student&rsquo;s hand on the fingerboard, demonstrating
breath support by putting a hand on the student&rsquo;s diaphragm</li>
<li>The student playing and the teacher listening with their eyes closed</li>
<li>The teacher singing a melodic contour to show phrasing</li>
<li>Both of them playing at the same time (unison work, call and response)</li>
</ul>
<p>A standard two-camera classroom setup captures none of this usefully.
The standard framing — wide angle, teacher on one side, student on the
other — produces footage where &ldquo;something is happening near the piano&rdquo;
but where the analytically relevant detail (the finger position, the
bow angle, the postural correction) is invisible at normal viewing distance.</p>
<p>You need different framing. You probably need closer cameras. You might
need a third angle for body position. And you need to accept that this
raises the setup complexity substantially beyond what the manual recommends
as a baseline.</p>
<hr>
<h2 id="what-the-lesson-is-actually-about">What the Lesson Is Actually About</h2>
<p>There is a deeper structural difference that the equipment and setup
challenges are symptoms of.</p>
<p>In subject-matter teaching, the lesson is the unit of analysis. A
forty-five-minute lesson has a beginning, a development, a conclusion.
The teacher enters with a plan; the video captures how that plan was
executed and how the students responded. The analytical interest is in
the lesson as a coherent pedagogical event.</p>
<p>In instrumental music education, the lesson is a container for cycles.
A student plays a passage. The teacher identifies a problem — the
intonation at bar twelve, the tendency to rush the syncopated rhythm,
the bow pressure collapsing in the crescendo. The teacher says or
demonstrates something. The student tries again. The teacher listens
to what changed and what did not.</p>
<p>These cycles are the unit of analysis, and they happen dozens of times
in a single lesson. The lesson-level video is useful context, but the
analytically interesting question is inside the cycle: what did the
teacher identify, what intervention did they choose, what happened to
the student&rsquo;s playing afterward?</p>
<p>Capturing those cycles in usable form requires not just video of the
lesson but video that is indexed to them — where each attempt-and-response
pair can be located and compared. A continuous recording of a sixty-minute
lesson is not organised for this purpose. Timestamps help but do not
replace the work of finding and annotating each cycle.</p>
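<p>Purely as a sketch of what such an index would need to hold, under the
assumption that nothing like it yet exists as software (all names here are
illustrative, not drawn from the manual or from any tool):</p>
<pre><code class="language-python"># Hypothetical sketch of a cycle index for a lesson recording. Nothing in
# the manual or in Resonance implements this; all names are illustrative.
from dataclasses import dataclass, field

@dataclass
class FeedbackCycle:
    """One attempt-and-response pair inside a recorded lesson."""
    attempt_start: float            # seconds into the lesson video
    attempt_end: float
    problem: str                    # what the teacher identified ("intonation, bar 12")
    intervention: str               # "verbal", "demonstration", "physical correction"
    retry_start: float | None = None    # the next attempt, if one was located
    retry_end: float | None = None

@dataclass
class LessonIndex:
    video_file: str
    cycles: list[FeedbackCycle] = field(default_factory=list)

    def cycles_about(self, keyword: str) -&gt; list[FeedbackCycle]:
        """All cycles whose identified problem mentions the keyword."""
        return [c for c in self.cycles if keyword.lower() in c.problem.lower()]
</code></pre>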
<hr>
<h2 id="the-absent-camera-problem">The Absent Camera Problem</h2>
<p>There is a more fundamental issue that no amount of improved equipment
configuration addresses.</p>
<p>The feedback cycle a teacher most wants to reach is the one that happens
in a student&rsquo;s practice session. Between lessons, the student is alone
in a practice room, working through the same passages, repeating the same
mistakes (or, occasionally, having the experience of something going right
for reasons they do not fully understand). The teacher&rsquo;s instructions from
the last lesson are present only in the student&rsquo;s memory of them, which is
fallible and partial.</p>
<p>The videography manual is about research documentation: a trained operator,
institutional consent, equipment brought in from outside. None of that is
available in a student&rsquo;s practice session at eleven o&rsquo;clock on a Wednesday
night. And even if you could film it — which you could, technically, with
a phone — the resulting footage would be unwatched, because no workflow
exists to get it from the student&rsquo;s device to the teacher&rsquo;s eyes in a form
that supports structured feedback.</p>
<p>The practical reality is that most music teachers receive feedback about a
student&rsquo;s practice only through the student&rsquo;s report of it (&ldquo;I practiced
every day&rdquo;) and through the evidence presented in the lesson (which may or
may not reflect what practice actually looked like). The gap between
practice and lesson feedback is a structural feature of music education,
and it is not something that research videography can address.</p>
<hr>
<h2 id="a-software-response">A Software Response</h2>
<p>The tool I built to think through this problem is called Resonance, and it
is available at <a href="https://github.com/sebastianspicker/resonance">github.com/sebastianspicker/resonance</a>.</p>
<p>The design is deliberately different from the research videography model.
Instead of an external camera operator documenting a lesson for later
analysis, Resonance puts the documentation instrument in the student&rsquo;s
hands. Students capture short audio or video clips of their own practice —
snippets of a passage they want the teacher to hear, a moment where
something went wrong, a phrase they are finally getting right — and submit
them to a course. The teacher reviews the queue and adds feedback with
timestamped annotations: &ldquo;at 0:23, the bow pressure drops — this is what
is generating the scratch.&rdquo;</p>
<p>The asymmetry is intentional. The student decides what to document.
The teacher provides structured, specific feedback. The cycle is
asynchronous — the student submits at eleven on a Wednesday night; the
teacher responds Thursday morning — which means it is independent of
the lesson schedule.</p>
<p>The technical decisions follow from the use context. Students practice in
rooms where connectivity is unreliable, so the app is offline-first:
recordings are captured locally and uploaded when a connection is available.
An iPad is the natural form factor for a music student — larger screen,
better camera, sits on a music stand. The backend is standard (Node.js,
Postgres, S3-compatible object storage) because the interesting problem here
is not the infrastructure but the workflow.</p>
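<p>The offline-first behaviour is the design decision the workflow hangs on, so it
is worth sketching. Resonance&rsquo;s client is an iPad app and its backend is Node.js;
the Python below is only a conceptual illustration of the spool-and-drain logic,
and none of the names correspond to the real code:</p>
<pre><code class="language-python"># Conceptual sketch of offline-first capture: record locally first, upload
# later. Resonance itself is not written in Python; all names are illustrative.
import json
import os
import time
import uuid

QUEUE_DIR = "pending_uploads"       # local spool directory on the device

def record_clip(media_path: str, note: str) -&gt; None:
    """Capture is local first: write the clip's metadata into the spool."""
    os.makedirs(QUEUE_DIR, exist_ok=True)
    clip_id = str(uuid.uuid4())
    meta = {"id": clip_id, "file": media_path, "note": note, "ts": time.time()}
    with open(os.path.join(QUEUE_DIR, clip_id + ".json"), "w") as f:
        json.dump(meta, f)

def flush_queue(upload) -&gt; None:
    """When a connection appears, drain the spool; failures stay queued."""
    for name in os.listdir(QUEUE_DIR):
        path = os.path.join(QUEUE_DIR, name)
        try:
            with open(path) as f:
                upload(json.load(f))    # e.g. POST clip + metadata to the course
            os.remove(path)             # delete only after a successful upload
        except Exception:
            pass                        # still offline: leave it for next time
</code></pre>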
<p>Resonance is a prototype and a proof of concept, not a production system.
The authentication is explicitly development-mode only. The goal was to
build enough of the thing to be able to think clearly about what it does
and does not solve.</p>
<hr>
<h2 id="what-it-does-not-solve">What It Does Not Solve</h2>
<p>Resonance addresses the absent-camera problem for the practice-to-feedback
loop. It does not address the research documentation problem that the
videography manual was written for.</p>
<p>If you want to study <em>how music teachers give feedback</em> — as a research
question about teaching practice, not just as a workflow tool — you still
need the full apparatus: controlled recording conditions, appropriate
microphones for instruments, multi-camera coverage of the lesson, consent
for the resulting footage to be used for research and teaching purposes,
and post-processing that produces an analytically usable document.</p>
<p>Resonance footage is not that. It is what a student chose to capture on an
iPad in a practice room, with whatever acoustic environment happened to be
present. It is useful for the practice-feedback cycle; it is not a research
record.</p>
<p>The challenges I described in the first two sections — appropriate
microphones, multi-angle coverage of one-to-one lessons, capture of
the practice cycle rather than the lesson arc — are still open problems
for anyone trying to do systematic observational research in music education.
The manual gives you the framework for thinking about them. It does not
give you solutions, because those solutions are context-specific and, in
several cases, not yet worked out by the field.</p>
<p>What I find interesting is that the two problems — research documentation
and practice-feedback — might look the same (filming music education)
but require almost entirely different responses. Getting clear on which
problem you are solving turns out to be most of the work.</p>
<hr>
<p><em>The full classroom videography manual is at
<a href="https://kups.ub.uni-koeln.de/65599/">kups.ub.uni-koeln.de/65599</a>.
The Resonance repository is at
<a href="https://github.com/sebastianspicker/resonance">github.com/sebastianspicker/resonance</a>.</em></p>
<hr>
<h2 id="references">References</h2>
<p>Kramer, C., Spicker, S. J., &amp; Kaspar, K. (2023). <em>Manual zur Erstellung
von Unterrichtsvideographien</em>. KUPS Open Access.
<a href="https://kups.ub.uni-koeln.de/65599/">https://kups.ub.uni-koeln.de/65599/</a></p>
<p>Lehmann, A. C., Sloboda, J. A., &amp; Woody, R. H. (2007). <em>Psychology for
Musicians: Understanding and Acquiring the Skills</em>. Oxford University Press.</p>
<p>Presland, C. (2005). Conservatoire student and instrumental professor:
The student perspective on a complex relationship. <em>British Journal of Music
Education</em>, 22(3), 237–248.
<a href="https://doi.org/10.1017/S0265051705006558">https://doi.org/10.1017/S0265051705006558</a></p>
<p>Creech, A., &amp; Hallam, S. (2011). Learning a musical instrument: The
influence of interpersonal interaction on outcomes for school-aged pupils.
<em>Psychology of Music</em>, 39(1), 102–122.
<a href="https://doi.org/10.1177/0305735610370222">https://doi.org/10.1177/0305735610370222</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>How to Actually Film a Classroom: An Open-Access Manual on Classroom Videography</title>
      <link>https://sebastianspicker.github.io/posts/villa-videography-manual/</link>
      <pubDate>Tue, 09 May 2023 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/villa-videography-manual/</guid>
      <description>Three years after writing about why classroom video works, Charlotte Kramer, Kai Kaspar, and I wrote a manual on how to actually do it. The gap between knowing that video-based learning is effective and being able to produce usable footage turns out to be substantial. The manual is open access. Here is what is in it and why some of it surprised me to write.</description>
      <content:encoded><![CDATA[<p><em>This post is a follow-up to the <a href="/posts/villa-video-teacher-education/">June 2020 post on ViLLA and video in teacher
education</a>. That post was about why classroom
video is useful and what the ViLLA project found. This one is about the practical
question that post sidestepped: what does it actually take to film a real lesson?</em></p>
<p><em>The manual — Kramer, C., Spicker, S. J., &amp; Kaspar, K. (2023). Manual zur Erstellung
von Unterrichtsvideographien — is open access and freely downloadable at
<a href="https://kups.ub.uni-koeln.de/65599/">kups.ub.uni-koeln.de/65599</a>. Funded by the BMBF
under the ZuS Qualitätsoffensive Lehrerbildung programme (grant 01JA1815).</em></p>
<hr>
<h2 id="why-a-manual-exists">Why a Manual Exists</h2>
<p>The argument for classroom video in teacher education is not hard to make. The evidence
that video-based learning improves the perceptual and interpretive skills of student
teachers is solid enough that &ldquo;should we use video?&rdquo; is no longer a particularly
interesting question. The interesting questions are downstream: which kind of video,
for what purpose, produced how, stored where, used under what conditions.</p>
<p>The last of those — produced how — turns out to be the one that most programmes have
the least guidance on. There is a reasonably large research literature on the
<em>effects</em> of classroom video, and a smaller but growing literature on <em>design
principles</em> for video-based learning environments. There is much less on the
practical production side: what you need to decide before you enter a school
building, what can go wrong during filming, and what the post-processing work
actually involves.</p>
<p>The gap matters because it creates a reproducibility problem. If every research group
that wants classroom video has to figure out independently how to handle consent across
four institutional levels, how to position two cameras in a classroom with a window
on the wrong side, and how much post-processing time to budget per lesson, a lot of
effort goes into re-solving problems that have already been solved. The manual is an
attempt to make that accumulated knowledge explicit and shareable.</p>
<hr>
<h2 id="three-phases-and-why-preparation-is-the-most-important-one">Three Phases, and Why Preparation Is the Most Important One</h2>
<p>The manual is structured around the production lifecycle: preparation, production,
and post-processing. Each section ends with a practical checklist. The structuring
is not original — it follows Thomson (2019) and draws on Herrle and Breitenbach (2016)
and several other methodological guides — but the synthesis reflects what we learned
from actually running videography sessions at the University of Cologne over several
years.</p>
<p>The strongest claim in the manual is that <strong>preparation is the most important phase</strong>.
This sounds obvious and is consistently underestimated.</p>
<h3 id="methodical-preparation-the-question-before-the-camera-question">Methodical preparation: the question before the camera question</h3>
<p>Before any equipment decisions, the manual asks you to work through a prior question:
is video actually the right medium for what you want to know?</p>
<p>This is not a rhetorical check. Classroom video is excellent at capturing dynamic
processes — movement, gesture, voice, simultaneous events — and works well for
constructs like classroom management and communication patterns. It works less well
for constructs where the relevant data is not visible on the surface, like a student&rsquo;s
prior knowledge activation or the cognitive demands of a task. Using video for those
questions is possible, but you need more sessions, more annotation work, and supplementary
instruments. Building that into your timeline before you start is considerably better
than realising it after you have sixty hours of footage.</p>
<p>The manual also distinguishes four decisions about what kind of video you are making:</p>
<ul>
<li><strong>Authentic vs. staged</strong>: real everyday teaching vs. deliberately constructed
cases. Authentic footage gives you ecological validity; staged footage lets you
control which situations appear.</li>
<li><strong>Own vs. others&rsquo; teaching</strong>: self-recording for reflection vs. observing others
for general analysis.</li>
<li><strong>Typical vs. best practice</strong>: real-world teaching in its ordinary form vs.
exemplary demonstration material.</li>
<li><strong>Sequence vs. full lesson</strong>: a targeted extract sufficient for a specific analytic
focus vs. a complete lesson for contextualised, developmental analysis.</li>
</ul>
<p>None of these are neutral technical choices. They are methodological decisions that
determine what the resulting footage can be used for and what it cannot.</p>
<h3 id="organisational-preparation-the-consent-problem-is-harder-than-it-looks">Organisational preparation: the consent problem is harder than it looks</h3>
<p>The most time-consuming part of any real videography project is not the filming.
It is obtaining the permissions.</p>
<p>You need written consent from pupils, parents or guardians (separately, depending
on age — the threshold is 14 in the German legal framework the manual follows),
the class teacher, school leadership, the school authority, and in some states the
relevant ministry. The scope of the consent you obtain determines the scope of
use you can put the footage to: footage filmed under a narrow research-project-only
consent cannot be uploaded to ViLLA; footage filmed with broad usage rights can.
The broader the rights you request, the higher the barrier for participants to agree.</p>
<p>The practical implication: decide early what you want to do with the footage, because
what you put in the information letters and consent forms determines what is possible
for the lifetime of the data. This is a decision you cannot easily undo.</p>
<p>The manual also addresses the case where some pupils do not consent: in that situation,
it is often possible to position non-consenting pupils in a &ldquo;blind spot&rdquo; — an area
of the room where neither camera nor microphone captures them. But this requires
knowing the room layout and the planned seating arrangement in advance, which is
another reason organisational preparation starts earlier than you think.</p>
<h3 id="technical-preparation-as-much-as-necessary-as-little-as-possible">Technical preparation: as much as necessary, as little as possible</h3>
<p>The guiding principle for equipment selection is stated directly in the manual:
<em>so viel wie nötig, so wenig wie möglich</em> — as much as necessary, as little as
possible.</p>
<p>This matters because there is a pull toward technical elaboration that does not
always serve the research purpose. More cameras capture more perspectives; more
microphones capture more of the acoustic space; 360° cameras give you everything.
But more equipment means more setup time, more opportunities for failure during
filming, and substantially more post-processing work. And more visual complexity
in the final video does not automatically mean more analytically useful material —
it can mean more cognitive load for the students watching it.</p>
<p>The baseline setup the manual recommends is two static cameras positioned facing
each other: one centred on the students, one centred on the teacher. This
configuration, with lavalier microphones on teachers and boundary microphones for
student audio at the cameras, captures most of what you need for classroom management
research and teacher education at a level of complexity that is manageable. Extensions
— pan cameras for interaction analysis, additional cameras for group work, mobile
eye-tracking for teacher perspective, 360° cameras — are described as additions
for specific purposes, not as defaults.</p>
<hr>
<h2 id="what-happens-during-filming">What Happens During Filming</h2>
<p>The production section of the manual is the most specific and in some ways the
most useful part if you are planning a session for the first time. Some things
worth knowing:</p>
<p><strong>Start the cameras before the lesson.</strong> An authentic lesson starts once; you
cannot go back. Events that happen before the official start of the lesson — how a teacher
enters, how students settle, how the first few minutes of a lesson are framed — can
be analytically relevant. And any technical problems that surface before teaching
begins can still be fixed. Footage filmed before the lesson is easy to cut in post;
lost footage from the opening of a lesson is gone.</p>
<p><strong>The camera operator&rsquo;s job is to be boring.</strong> The manual is explicit that operators
should neither engage with the lesson content nor conspicuously attend to the
equipment. A relaxed posture, eyes on the monitor, not reacting to what is happening
in the room — this is the technique that allows pupils and teachers to stop registering
the cameras, which typically happens within the first few minutes if operators are not
drawing attention to themselves.</p>
<p><strong>Use a clapper.</strong> When running multiple cameras or separate audio recorders, a
handclap or clapperboard after all devices are rolling gives you a synchronisation
point for later editing. This is known to everyone who has ever synchronised footage,
but it is the kind of thing that is easy to forget in the scramble of setting up
during a ten-minute break.</p>
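<p>Why the clap works is worth one illustration: it puts a single sharp transient
on every track, and the offset between tracks falls out of a cross-correlation.
Editing software does this step for you; the sketch below, assuming <code>numpy</code> and
two mono tracks at the same sample rate, only shows the underlying idea:</p>
<pre><code class="language-python"># Why the clap works: one sharp transient on every track makes the offset
# recoverable by cross-correlation. Editing software hides this step.
import numpy as np

def sync_offset(track_a: np.ndarray, track_b: np.ndarray, sr: int) -&gt; float:
    """Seconds by which track_b lags track_a (negative if it leads)."""
    n = min(len(track_a), len(track_b))
    corr = np.correlate(track_b[:n], track_a[:n], mode="full")
    lag = int(np.argmax(corr)) - (n - 1)    # sample offset of best alignment
    return lag / sr

# In practice you would correlate only a few seconds around the clap:
# full-length np.correlate is O(n^2) and slow on hour-long recordings.
</code></pre>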
<p><strong>Backlighting is the enemy.</strong> Windows behind subjects produce the most common image
quality problem in classroom footage. The manual discusses ND filters for cases where
backlighting cannot be avoided, but the first-choice solution is room scouting in
advance to know where the windows are and plan camera placement accordingly.</p>
<hr>
<h2 id="post-processing-the-hidden-cost">Post-Processing: The Hidden Cost</h2>
<p>The post-processing chapter is the one I think is most likely to recalibrate
expectations productively.</p>
<p>Post-processing is time-intensive in proportion to the number of camera angles,
the number of audio tracks requiring synchronisation or correction, and the extent
of image and sound quality work needed. The manual is explicit that editing should
be done by people with content knowledge — not just technical skill — because the
person in the edit suite is constantly making decisions about what to include, how
to cut between perspectives, when to show the teacher&rsquo;s face vs. the students&rsquo;
faces. Those decisions are not editorially neutral. They determine what a viewer of
the finished video can perceive.</p>
<p>This is the point in the manual where the methodological problem I mentioned in
the previous post becomes concrete: the videography setting is not a neutral window
onto the classroom. The two-camera cross-cut convention (cut to the face of whoever
is speaking) is widely used and convenient for teaching purposes, but it is also
an editorial choice that foregrounds spoken exchange and makes other information —
spatial position, background activity, gestural communication between students —
less visible. Knowing that this choice was made is part of what a researcher or
educator needs to know in order to use the footage responsibly.</p>
<p>Data security deserves its own mention. Video files are large, they contain images
of minors, and they need to be stored under conditions that comply with current
data protection law — which means redundant backup, restricted access, purpose
limitation, and active awareness of what the current legal requirements are (which
change). The manual recommends checking applicable regulations before starting
rather than after, and treating data security as part of the workflow design rather
than an administrative afterthought.</p>
<hr>
<h2 id="what-is-coming-next">What Is Coming Next</h2>
<p>The manual&rsquo;s final chapter points toward three developments that are worth tracking:</p>
<p><strong>360° video and VR.</strong> Gold and Windscheid (2020) found that 360° classroom video
produces higher presence in student teacher observers than conventional video, though
without differences in learning outcomes measured by events noticed or ratings of
teaching quality. Whether the presence effect translates into something measurable
is an open empirical question. The VR version of this — using 360° classroom footage
as an immersive training environment where student teachers can observe without
the pressure of having to act — is methodologically interesting and practically
plausible at costs that are no longer prohibitive.</p>
<p><strong>Animated classroom video.</strong> The handful of studies on animated (as opposed to
filmed) classroom situations suggests that student teachers notice similar
learning-relevant events in animated and real footage (Smith et al., 2012; Chieu
et al., 2011). If that holds up, animation offers a way to construct specific scenarios
that would be hard to capture or ethically complex to film — situations involving
conflict, failure, or particular forms of student difficulty — without requiring
access to actual classrooms or consent from real pupils.</p>
<p><strong>Mobile eye-tracking.</strong> The combination of classroom videography with mobile
eye-tracking worn by teachers (Rüth, Zimmermann, &amp; Kaspar, 2020) opens the
teacher&rsquo;s-perspective angle that a fixed camera cannot capture. It is a technically
more demanding addition to the setup but an analytically distinctive one, and the
hardware costs have come down substantially.</p>
<hr>
<h2 id="a-note-on-open-access">A Note on Open Access</h2>
<p>The manual is freely available at <a href="https://kups.ub.uni-koeln.de/65599/">kups.ub.uni-koeln.de/65599</a>. We made it open access deliberately. The practical obstacles to classroom videography — not knowing how to handle consent, not knowing what equipment configuration works for a standard lesson, not knowing how long post-processing will actually take — are not obstacles that should be higher for researchers at institutions without an existing videography infrastructure. The knowledge exists; it should be findable.</p>
<p>If you are at the University of Cologne and want to run a videography session but
do not have your own equipment, the ZuS Media Labs project has a lending programme.
Contact the team at <a href="mailto:zus-kontakt@uni-koeln.de">zus-kontakt@uni-koeln.de</a> for the current equipment catalogue.</p>
<hr>
<p><em>For the specific challenges the manual doesn&rsquo;t address — recording in music
education, instrument acoustics, one-to-one lessons, and practice-session
documentation — see the
<a href="/posts/filming-music-education/">follow-up post on filming music education</a>.</em></p>
<hr>
<h2 id="references">References</h2>
<p>Chieu, V. M., Herbst, P., &amp; Weiss, M. (2011). Effect of an animated classroom story
embedded in online discussion on helping mathematics teachers learn to notice.
<em>Journal of the Learning Sciences</em>, 20(4), 589–624.
<a href="https://doi.org/10.1080/10508406.2011.528324">https://doi.org/10.1080/10508406.2011.528324</a></p>
<p>Gold, B., &amp; Windscheid, J. (2020). Observing 360-degree classroom videos — effects
of video type on presence, emotions, workload, classroom observations, and ratings
of teaching quality. <em>Computers &amp; Education</em>, 156, 103960.
<a href="https://doi.org/10.1016/j.compedu.2020.103960">https://doi.org/10.1016/j.compedu.2020.103960</a></p>
<p>Herrle, M., &amp; Breitenbach, S. (2016). Planung, Durchführung und Nachbereitung
videogestützter Beobachtungen im Unterricht. In U. Rauin, M. Herrle &amp; T. Engartner
(Hrsg.), <em>Videoanalysen in der Unterrichtsforschung</em>, 30–49. Beltz Juventa.</p>
<p>Kramer, C., König, J., Strauß, S., &amp; Kaspar, K. (2020). Classroom videos or transcripts?
A quasi-experimental study to assess the effects of media-based learning on
pre-service teachers&rsquo; situation-specific skills of classroom management.
<em>International Journal of Educational Research</em>, 103, 101624.
<a href="https://doi.org/10.1016/j.ijer.2020.101624">https://doi.org/10.1016/j.ijer.2020.101624</a></p>
<p>Rüth, M., Zimmermann, D., &amp; Kaspar, K. (2020). Mobiles Eye-Tracking im Unterricht.
In K. Kaspar et al. (Hrsg.), <em>Bildung, Schule, Digitalisierung</em>, 222–228. Waxmann.</p>
<p>Smith, D., McLaughlin, T., &amp; Brown, I. (2012). 3-D computer animation vs. live-action
video. <em>Contemporary Issues in Technology and Teacher Education</em>, 12(1), 41–54.</p>
<p>Thomson, A. (2019). <em>The creation and use of video-for-learning in higher education</em>.
Master&rsquo;s thesis, Queensland University of Technology.
<a href="https://doi.org/10.5204/thesis.eprints.130743">https://doi.org/10.5204/thesis.eprints.130743</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>They Told Me Not to Use Design Thinking. They Were Right.</title>
      <link>https://sebastianspicker.github.io/posts/design-thinking-vs-grounded-theory/</link>
      <pubDate>Tue, 23 Nov 2021 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/design-thinking-vs-grounded-theory/</guid>
      <description>When you are a physicist doing education research, methodology feels like a bureaucratic formality standing between you and the interesting work. Everyone told me to use grounded theory instead of design thinking in my thesis. I ignored them. This is the postmortem.</description>
      <content:encoded><![CDATA[<p><em>A follow-up to the <a href="/posts/mission-to-mars/">Mission to Mars</a> post, which
describes the experimental work. This one is about the methodology layer
underneath it — specifically, what I got wrong.</em></p>
<hr>
<h2 id="the-setup">The Setup</h2>
<p>My background is in physics. I ended up in physics education research
sideways, through the astro-lab project and through a genuine interest in
why students find physics so alienating and what might help. When it came
time to frame that work as a thesis, I had to choose a methodology.</p>
<p>I chose design thinking. Or more precisely, I chose something that
borrowed heavily from design-based research and design thinking frameworks
and that felt, at the time, like the obvious match for what I was doing.
I was designing experiments. I was iterating on them. I was testing them
with students and refining them. Design thinking is a framework for
exactly this process. What could be more natural?</p>
<p>Several people told me I was making a mistake. Colleagues with more
qualitative research experience, a supervisor who had been through
the methodology debates in education research more times than he wanted
to count. The consistent advice was: use grounded theory. Be systematic
about your data. Let the categories emerge from what you actually observe
rather than from what you designed the experiment to produce.</p>
<p>I thought I understood what they were saying. I did not understand what
they were saying.</p>
<hr>
<h2 id="what-i-thought-design-thinking-gave-me">What I Thought Design Thinking Gave Me</h2>
<p>Design thinking, as a research framing, offered what felt like a clean
correspondence between method and subject matter. The thing I was
producing was a designed artifact — a teaching experiment. The process
I was following was inherently iterative: run it, observe what happens,
revise, run it again. The framework had a vocabulary for this (empathise,
define, ideate, prototype, test) that matched my actual working process.</p>
<p>Design-based research, the academic version of this approach in education,
has a real literature behind it. It is used in educational technology
research and in curriculum development. It is not a made-up category. The
argument for it is reasonable: if you are trying to design effective
educational interventions, then designing and studying those interventions
at the same time is a coherent research strategy.</p>
<p>What I told myself was: I am doing design-based research. The methodology
matches the work. The thesis will describe the design process, the
rationale for each design decision, the iterative refinements, and the
evidence that the final design works. This is a contribution to knowledge
because it produces a principled, evidence-informed design that other
practitioners can use and adapt.</p>
<p>This is not wrong. But it is not enough for a thesis. And I only
understood why it is not enough after I had spent considerable time
trying to make it enough.</p>
<hr>
<h2 id="the-reckoning-in-the-methodology-chapter">The Reckoning in the Methodology Chapter</h2>
<p>The methodology chapter of a thesis is where you have to be explicit
about the epistemological status of your claims. You are not just
describing what you did. You are explaining why the thing you did counts
as knowledge production, what kind of knowledge it produces, and how
someone else could evaluate whether you did it correctly.</p>
<p>This is where design thinking started to come apart.</p>
<p><strong>What kind of claim does a design study make?</strong> The honest answer is:
it makes a claim about this design, in these contexts, with these
students. It does not easily generalise beyond that. If I show that
the Mission to Mars experiment produces measurable improvements in
students&rsquo; understanding of air pressure in a student lab context at
the University of Cologne in 2019, the implication for other teachers
in other contexts is&hellip; unclear. The design worked here. Maybe it
will work for you. Good luck.</p>
<p>A thesis contribution needs to be something more transferable than that.
It needs to produce knowledge about a phenomenon, not just knowledge
about a specific designed object. &ldquo;Here is a well-designed experiment&rdquo;
is a practitioner contribution, which is genuinely valuable, but it is
not the same as a theoretical contribution to the field.</p>
<p><strong>The iteration problem.</strong> Design thinking celebrates iterative
refinement. But in a thesis, every iteration needs to be motivated by
evidence, and the nature of the evidence and how it maps onto the
design changes needs to be made explicit. If I changed something between
version 1 and version 2 of the experiment, the methodology chapter must
explain: what data told me to make that change? How did I analyse it?
What coding framework did I apply? What alternative changes did I
consider and rule out, and on what grounds?</p>
<p>Design thinking has no systematic answer to these questions. It has
process descriptions (&ldquo;we tested with users and gathered feedback&rdquo;) but
not research methodology answers (&ldquo;I applied open coding to the think-aloud
protocols and the following categories emerged, which pointed toward
this specific revision&rdquo;). Without that precision, the &ldquo;iteration&rdquo; in
the methodology chapter looks like: I tried it, it did not quite work,
I made it better. Which is honest but not a researchable process.</p>
<p><strong>The validation problem.</strong> Design-based research often validates its
designs against the criteria that motivated the design. I designed the
experiment to address specific student misconceptions about air pressure.
I then tested whether students who did the experiment had fewer of those
misconceptions afterward. If the answer is yes, the design is validated.</p>
<p>But this is circular in a way that becomes visible under examination.
The misconceptions I targeted were the ones I identified at the start.
The students I studied were the ones who came to my lab. The measurement
instrument I used was one I designed to detect the specific changes
I expected the design to produce. The whole system is oriented toward
confirming the design rather than discovering something about the
phenomenon.</p>
<p>Grounded theory cuts this loop. You start with the data — the
students&rsquo; actual responses, their misconceptions as they express them,
the things that confuse them that you did not anticipate — and you
build categories from the bottom up. What you end up with is a theory
of how students actually think about air pressure (or whatever the topic
is), which may or may not match what you assumed when you designed the
experiment. The cases where it does not match are precisely where the
theoretical contribution lives.</p>
<hr>
<h2 id="what-grounded-theory-would-have-required">What Grounded Theory Would Have Required</h2>
<p>Grounded theory, done properly, is laborious. The Glaserian version
(open coding, theoretical sampling until saturation, constant
comparative method) requires treating every interview, every observation,
every student response as a data source to be systematically analysed,
compared, and connected into a coherent theory.</p>
<p>Theoretical sampling means you do not decide in advance how many students
to study or what contexts to observe. You keep gathering data until new
cases stop producing new categories — until the theory is saturated.
This is methodologically sound and practically painful, because you
cannot know in advance when you will be done.</p>
<p>Memoing — writing ongoing analytical notes about the emerging categories
and their relationships — is a discipline that forces you to be explicit
about your reasoning at every step. Not just &ldquo;these two responses seem
similar&rdquo; but &ldquo;these two responses are similar because both students are
treating pressure as a property of moving air, and here is how that
connects to the misconception documented by [citation].&rdquo;</p>
<p>I did not want to do this. I wanted to design experiments. Grounded
theory felt like a detour from the thing I was actually interested in.</p>
<p>The advice I received was: this is not a detour. A systematic analysis
of what students think about air pressure, and how they think about it,
and what experiences shift their thinking, is a theoretical contribution
that would make the experiments more useful to everyone — not just a
record of experiments that worked in one lab in one city in one year.</p>
<p>They were right about this.</p>
<hr>
<h2 id="what-i-actually-learned-too-late-to-use-in-the-thesis">What I Actually Learned (Too Late to Use in the Thesis)</h2>
<p>The most useful student responses in the Mission to Mars experiment
were not the ones that confirmed the design was working. They were the
unexpected ones.</p>
<p>The PVC pipe failure — the moment when the lid pops off and students
hear the sound — was included because I thought it would demonstrate the
direction of pressure force in a visceral way. What I observed, which
I noted but did not systematically analyse, was that different students
interpreted the pop differently. Some immediately understood it as the
internal air pushing out. Others interpreted it as the external vacuum
pulling the lid. A few were unsure which way the force had been directed
even after the event.</p>
<p>A grounded theory analysis of those responses would have produced
something genuinely interesting: a typology of how students process
a demonstrable physical event when it conflicts with their existing
pressure intuitions. That typology would have been transferable to
other experimental contexts, other pressure scenarios, other situations
where students encounter the vacuum-suction confusion.</p>
<p>Instead I noted it, described it qualitatively, and moved on because
it was not what the design was optimised to produce.</p>
<p>That is the design thinking trap. You are so focused on the designed
outcome that you treat unexpected observations as noise rather than as
data. Grounded theory treats them as the most valuable data you have.</p>
<hr>
<h2 id="a-note-for-other-physicists-entering-education-research">A Note for Other Physicists Entering Education Research</h2>
<p>If you are coming from a natural science background and you are starting
work in education research, the methodology question will feel foreign
at first. In physics, methodology is largely a matter of technical
choice — which instrument, which statistical test, which model. The
epistemological questions (what kind of knowledge does this produce?
how does it generalise?) are handled by the experimental framework
itself, which is a known, shared, peer-reviewed practice.</p>
<p>In qualitative education research, those questions are not handled in
advance. You have to work them out explicitly, for your specific study,
in writing. This is uncomfortable for people trained in a tradition where
you do the experiment and then write up what happened.</p>
<p>The temptation, for a physicist, is to choose a methodology that feels
like a framework for doing things rather than one that feels like a
framework for thinking about what you found. Design thinking is a
framework for doing things. Grounded theory is a framework for thinking
about what you found.</p>
<p>Both are legitimate. But a thesis needs to make a theoretical contribution,
and theoretical contributions come from systematic analysis of phenomena,
not from documentation of designed objects.</p>
<p>I would have finished faster and understood more if I had done the
uncomfortable thing from the start.</p>
<hr>
<p><em>The experimental work this post is commenting on is described in
<a href="/posts/mission-to-mars/">Mission to Mars</a>. For a more successful later
use of qualitative methodology in a related context, see
<a href="/posts/ai-transcription-grounded-theory/">AI Transcription and Grounded Theory</a>.</em></p>
<hr>
<h2 id="references">References</h2>
<p>Glaser, B. G., &amp; Strauss, A. L. (1967). <em>The Discovery of Grounded
Theory: Strategies for Qualitative Research.</em> Aldine.</p>
<p>Strauss, A., &amp; Corbin, J. (1998). <em>Basics of Qualitative Research:
Techniques and Procedures for Developing Grounded Theory</em> (2nd ed.).
SAGE Publications.</p>
<p>The Design-Based Research Collective (2003). Design-based research: An
emerging paradigm for educational inquiry. <em>Educational Researcher</em>,
32(1), 5–8. <a href="https://doi.org/10.3102/0013189X032001005">https://doi.org/10.3102/0013189X032001005</a></p>
<p>Brown, T. (2008). Design thinking. <em>Harvard Business Review</em>, 86(6),
84–92.</p>
]]></content:encoded>
    </item>
  </channel>
</rss>
