<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Qualitative-Methods on Sebastian Spicker</title>
    <link>https://sebastianspicker.github.io/tags/qualitative-methods/</link>
    <description>Recent content in Qualitative-Methods on Sebastian Spicker</description>
    <image>
      <title>Sebastian Spicker</title>
      <url>https://sebastianspicker.github.io/og-image.png</url>
      <link>https://sebastianspicker.github.io/og-image.png</link>
    </image>
    <generator>Hugo -- 0.160.0</generator>
    <language>en</language>
    <lastBuildDate>Tue, 10 Jun 2025 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://sebastianspicker.github.io/tags/qualitative-methods/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Your Transcript Is Already an Interpretation: AI Transcription and Grounded Theory</title>
      <link>https://sebastianspicker.github.io/posts/ai-transcription-grounded-theory/</link>
      <pubDate>Tue, 10 Jun 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/ai-transcription-grounded-theory/</guid>
      <description>aTrain and noScribe are local, GDPR-compliant, Whisper-based transcription tools that can genuinely save hours of work in qualitative interview research. They also make methodological decisions on your behalf without telling you. If you do grounded theory, you need to know which decisions those are.</description>
      <content:encoded><![CDATA[<p><em>In June 2025 I put together a practical guide on AI-assisted transcription
for professors of music pedagogy at HfMT Köln — primarily a hands-on
introduction to aTrain and noScribe. This post is the methodological
companion to that guide: the stuff I could not fit into a workshop handout
but that I think matters more than the installation instructions.</em></p>
<hr>
<h2 id="the-seduction">The Seduction</h2>
<p>AI transcription tools have reached a point where, for clean audio of a
single speaker in a quiet room, the output is genuinely good. You load a
90-minute interview, click a button, wait roughly 20 minutes, and get a
readable transcript with timestamps and speaker labels. Measured in hours of
work, that is an order of magnitude faster than manual transcription. The appeal is
obvious, especially if you are a qualitative researcher working with a backlog
of interview recordings.</p>
<p>The two tools I have been evaluating — <strong>aTrain</strong> (developed at University of
Graz) and <strong>noScribe</strong> (an independent open-source project) — both run
entirely locally on your machine. No audio file is uploaded anywhere. No
cloud API is involved. This matters for interview research: you are handling
other people&rsquo;s speech, often on topics they regard as sensitive, and the
GDPR landscape for sending recordings to external servers is genuinely
complicated. Local processing sidesteps that problem entirely.</p>
<p>Both tools are built on <strong>OpenAI&rsquo;s Whisper model</strong>, which is — despite the
name — open-source and runs offline. They differ in interface philosophy,
feature depth, and what methodological commitments they make visible.</p>
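<p>For orientation, here is a minimal sketch of that shared core, using the
<code>openai-whisper</code> Python package directly. This is illustrative only:
both tools add speaker detection, pause handling, and export layers on top of
this, and the file name and model size below are placeholder assumptions.</p>
<pre><code class="language-python"># Minimal sketch of what both tools do at their core: local, offline
# transcription with the openai-whisper package. Illustrative only --
# aTrain and noScribe wrap this in speaker detection, pause handling,
# and export pipelines. "interview.wav" and the model size are
# placeholder assumptions.
import whisper

model = whisper.load_model("medium")   # fetched once, then cached locally
result = model.transcribe("interview.wav", language="de")

for segment in result["segments"]:
    print(f"[{segment['start']:7.1f}s] {segment['text'].strip()}")
</code></pre>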
<p>But the seduction is the problem. The speed and cleanliness of the output
make it easy to treat the transcript as a neutral record rather than as a
construction. It is not. Every transcription is an act of interpretation. An
AI transcription is an act of interpretation performed by an algorithm that
does not know what your research question is.</p>
<hr>
<h2 id="why-this-is-a-grounded-theory-problem-specifically">Why This Is a Grounded Theory Problem Specifically</h2>
<p>In grounded theory — whether you follow the Strauss and Corbin tradition or
the constructivist reformulation by Charmaz — the researcher is not a passive
recorder of data. The analytical process begins with the first moment of
contact with the material. Coding, memo-writing, constant comparison, and
theoretical sampling all assume that you are working with data that you have
genuinely engaged with and that reflects choices made with your research
question in mind.</p>
<p>Transcription is the first of those choices. What counts as a pause? Do you
mark hesitations and self-corrections? Do you capture overlapping speech? Do
you note emphasis, speed changes, or trailing-off? The answers to these
questions are not neutral. They are determined by what level of analysis you
intend. A thematic analysis of interview content needs something different
from a conversation analysis of turn-taking, which needs something different
from a discourse analysis attending to hedges and disfluencies.</p>
<p>When you transcribe manually, you make these choices explicitly or
implicitly, but you make them. When you delegate to an algorithm, the
algorithm makes them — according to its training data and its default
settings — and then presents you with output that looks authoritative.</p>
<p>The risk is not that AI transcription is inaccurate (though it sometimes is).
The risk is that it is <em>selectively accurate in ways you did not choose</em> and
that those choices shape what you subsequently see in the data.</p>
<hr>
<h2 id="what-the-tools-actually-do">What the Tools Actually Do</h2>
<h3 id="atrain">aTrain</h3>
<p>aTrain is the simpler of the two. Windows-native (Microsoft Store), with a
macOS beta for Apple Silicon. The interface has essentially one meaningful
decision point after you load your file: whether to activate speaker
detection. Everything else is handled automatically. Output formats are plain
text with timestamps, SRT subtitle files, and — most useful for researchers —
direct QDA exports for MAXQDA, ATLAS.ti, and NVivo with synchronised
audio-timestamp links.</p>
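<p>For orientation, the SRT output follows the standard subtitle format: a
sequence number, a start and end timestamp, and the text of the segment. The
lines below are invented for illustration, not actual aTrain output:</p>
<pre><code>1
00:00:04,210 --> 00:00:09,830
I think it was more complicated than we expected at the start.

2
00:00:10,120 --> 00:00:12,480
Mm-hm.
</code></pre>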
<p>What aTrain does not do: it does not mark pauses. It does not detect
disfluencies (the <em>ähms</em>, <em>uhs</em>, self-interruptions, false starts). It does
not detect overlapping speech. It produces clean, semantically coherent
transcripts — which means it actively smooths what you gave it. If a
speaker says <em>&ldquo;well — I mean — it was, I think it was more like — yeah,
complicated&rdquo;</em>, aTrain will probably give you something closer to <em>&ldquo;I think it
was complicated&rdquo;</em>. The hesitation structure disappears.</p>
<p>For a thematic interview study where you are interested in what people said
about a topic, this is probably fine. For any analysis where <em>how</em> something
was said is part of the data — pace, repair, emphasis, epistemic hedging —
aTrain is erasing data you need.</p>
<h3 id="noscribe">noScribe</h3>
<p>noScribe is more complex in almost every dimension. Available for Windows,
macOS (including Apple Silicon and Intel), and Linux. The interface exposes
a meaningful number of configuration decisions:</p>
<ul>
<li><strong>Mark Pause</strong>: off, or marked at 1-, 2-, or 3-second thresholds, with
conventional notation <code>(.)</code>, <code>(..)</code>, <code>(...)</code>, <code>(10 seconds pause)</code></li>
<li><strong>Speaker Detection</strong>: automatic count, fixed count, or disabled</li>
<li><strong>Overlapping Speech</strong>: experimental detection, marked with <code>//double slash//</code></li>
<li><strong>Disfluencies</strong>: off or on — captures <em>ähm</em>, <em>äh</em>, self-corrections,
false starts</li>
<li><strong>Timestamps</strong>: by speaker turn or every 60 seconds</li>
</ul>
<p>It also has an integrated editor (noScribeEdit) with synchronised audio
playback: click anywhere in the transcript and the audio seeks to that
position. This is the single most useful feature for post-transcription
review, and aTrain does not have anything equivalent.</p>
<p>The configuration complexity is not gratuitous. It reflects the fact that
different methodological frameworks require different transcription
conventions. noScribe&rsquo;s disfluency detection corresponds roughly to what a
GAT2-Light transcription requires. Its pause notation system maps onto
conversation analytic conventions. The choices you make in the interface are
methodological choices, not just technical preferences.</p>
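<p>To make the pause-threshold idea concrete, here is a toy sketch
(emphatically not noScribe&rsquo;s actual code) of how measured silence
durations could map onto the conventional markers. The thresholds are
assumptions chosen to match the notation above.</p>
<pre><code class="language-python"># Toy illustration, not noScribe's implementation: map a measured
# silence duration (in seconds) onto conventional pause notation.
def pause_marker(seconds):
    if seconds >= 4.0:
        return f"({round(seconds)} seconds pause)"  # long pauses get an explicit count
    if seconds >= 3.0:
        return "(...)"
    if seconds >= 2.0:
        return "(..)"
    if seconds >= 1.0:
        return "(.)"
    return ""                                       # below threshold: not marked

print(pause_marker(2.4))   # (..)
print(pause_marker(9.7))   # (10 seconds pause)
</code></pre>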
<hr>
<h2 id="the-normalisation-problem">The Normalisation Problem</h2>
<p>Both tools perform what I would call <em>normalisation</em>: they produce transcripts
that read more fluently than the original speech. This is a feature from a
usability standpoint and a methodological liability from a qualitative
research standpoint.</p>
<p>Specific failure modes I observed in evaluation:</p>
<p><strong>Compound word errors</strong> (more pronounced in noScribe for German): <em>VR-Brille</em>
(&ldquo;VR headset&rdquo;) transcribed as <em>Brille VR</em>, proper nouns mangled, domain
vocabulary rendered phonetically. In music research contexts this is
particularly salient — instrument names, notation terms, composer names, and
genre vocabulary are all potential failure points.</p>
<p><strong>Speaker detection overcounting</strong>: both tools, when speaker detection is
active, tend to identify more speakers than are present. A two-person
interview with one hesitant speaker may generate three or four speaker labels.
Manual correction is required.</p>
<p><strong>Acoustic transcription</strong>: noScribe occasionally produces what the document
calls <em>lautliche Transkriptionen</em> — phonetic renderings rather than semantic
ones. A speaker saying <em>Beamer</em> (data projector) may be transcribed as <em>Bima</em>.
This is not an error in the conventional sense; it is the model accurately
representing what it heard acoustically rather than semantically resolving it.
For music researchers studying how non-specialist participants talk about
technical equipment, this is interesting. For most interview research, it
requires correction.</p>
<p><strong>Pause and overlap reliability degrades with audio quality</strong>: both tools
perform well on clean, close-mic mono recordings of single speakers in quiet
rooms. Introduce a second speaker, ambient noise, variable recording distance,
or a phone recording, and accuracy drops substantially. This matters
specifically for music interview research, where the interview setting is
often a rehearsal room or performance space rather than an acoustic booth.</p>
<hr>
<h2 id="a-methodological-comparison-not-a-feature-list">A Methodological Comparison, Not a Feature List</h2>
<p>The useful comparison between aTrain and noScribe is not technical — it is
about which methodological contexts each is suited to.</p>
<table>
  <thead>
      <tr>
          <th>Research context</th>
          <th>Tool</th>
          <th>Why</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Thematic/content analysis, single speaker</td>
          <td>aTrain</td>
          <td>Speed, simplicity, adequate accuracy, QDA export</td>
      </tr>
      <tr>
          <td>Grounded theory with attention to epistemic hedging</td>
          <td>noScribe + disfluencies</td>
          <td>Captures the hesitation structure that carries methodological information</td>
      </tr>
      <tr>
          <td>Conversation analysis</td>
          <td>Neither, or noScribe as starting point</td>
          <td>CA requires phonetic detail neither tool reliably produces</td>
      </tr>
      <tr>
          <td>Large corpus, initial open coding</td>
          <td>aTrain</td>
          <td>Volume and speed outweigh detail at early stages</td>
      </tr>
      <tr>
          <td>Interpretive phenomenological analysis</td>
          <td>noScribe</td>
          <td>The pause and disfluency data is IPA-relevant</td>
      </tr>
      <tr>
          <td>Teaching transcription as a research practice</td>
          <td>Both</td>
          <td><em>See below</em></td>
      </tr>
  </tbody>
</table>
<p>The last row deserves its own section.</p>
<hr>
<h2 id="using-both-tools-to-teach-about-transcription">Using Both Tools to Teach About Transcription</h2>
<p>The most pedagogically valuable use of these tools is probably not producing
transcripts — it is using them to make the constructed nature of transcripts
visible to students.</p>
<p>A simple exercise: take a three-minute excerpt of an interview recording.
Have students transcribe it manually according to whatever convention the
course uses. Then run the same excerpt through aTrain and noScribe with
different settings. Compare the three or four resulting transcripts in a
seminar discussion.</p>
<p>The differences that emerge are not about which transcript is &ldquo;correct&rdquo;. They
are about what each transcript makes visible and what it hides. The aTrain
transcript will be clean and readable. The manually produced transcript will
have annotation that the students chose based on what struck them as relevant.
The noScribe transcript with disfluencies enabled will look noisy. All three
are representations of the same three minutes of speech.</p>
<p>Questions that come out of this reliably: Why did the student who transcribed
manually mark that particular pause? What did the student not mark that the
software did? What did the software produce that the student did not hear?
What does the &ldquo;cleaner&rdquo; transcript lose?</p>
<p>This is the entry point to a genuinely grounded theory-relevant conversation
about data construction: the transcript is not the data. The transcript is a
representation of the data made according to principles that should be
theoretically motivated, and those principles should be stated explicitly in
the methods section.</p>
<hr>
<h2 id="what-these-tools-cannot-replace">What These Tools Cannot Replace</h2>
<p>The document I prepared for the HfMT professors ends with a sentence I want
to quote in the original German, because it is the methodological core of
the whole thing:</p>
<blockquote>
<p><em>Automatisierung ersetzt nicht das Nachdenken über Daten.</em>
Automation does not replace thinking about data.</p>
</blockquote>
<p>More precisely: the algorithm makes decisions about what counts as a pause,
what counts as language, whose voice counts as a separate speaker — without
knowing what is scientifically relevant. It does not know that the half-second
hesitation before a particular word is the most important moment in the
interview. It does not know that the overlapping &ldquo;mm-hm&rdquo; is a data point for
your analysis of how the interviewee manages discomfort. It does not know
that the repeated self-correction in the middle of a sentence about teaching
practice is where your emerging category is.</p>
<p>You have to know that. And you only know it if you have been in enough
contact with the material to have developed theoretical sensitivity — which is
exactly what Strauss and Corbin mean when they describe the iterative
relationship between data collection, coding, and theoretical development in
grounded theory.</p>
<p>AI transcription tools save the hours of typing. They do not and cannot
substitute for the analytical engagement that makes a grounded theory study
produce knowledge rather than a theme list.</p>
<p>Use them. But use them knowing what they are doing.</p>
<hr>
<h2 id="practical-summary">Practical Summary</h2>
<ul>
<li><strong>aTrain</strong>: one-click, local, GDPR-compliant, good QDA integration,
appropriate for thematic analysis. No disfluencies, no pauses, no
overlap detection. Versions: Windows (Microsoft Store), macOS beta.
Current version: 1.3.1.</li>
<li><strong>noScribe</strong>: more complex, highly configurable, disfluency and pause
detection, integrated audio-sync editor, appropriate for grounded theory
and discourse-oriented work. More demanding to set up. Current version:
0.6.2.</li>
<li><strong>Neither tool</strong> is appropriate as a black-box solution for conversation
analysis or prosodic research.</li>
<li><strong>Both tools</strong> require manual post-processing. Estimate correction time
at roughly 20–40% of the original interview length for clean recordings
with a single speaker; more for multi-speaker or suboptimal audio.</li>
<li><strong>In teaching</strong>: the exercise of comparing manual, aTrain, and noScribe
transcripts of the same excerpt is more pedagogically valuable than any
of the transcripts individually.</li>
</ul>
<hr>
<h2 id="references">References</h2>
<p>Charmaz, K. (2014). <em>Constructing Grounded Theory</em> (2nd ed.).
SAGE Publications.</p>
<p>Dresing, T. &amp; Pehl, T. (2018). <em>Praxisbuch Interview, Transkription &amp;
Analyse</em> (8th ed.). Eigenverlag. <a href="https://www.audiotranskription.de">https://www.audiotranskription.de</a></p>
<p>Haberl, A., Fleiß, J., Kowald, D., &amp; Thalmann, S. (2024). Take the aTrain.
Introducing an interface for the accessible transcription of interviews.
<em>Journal of Behavioral and Experimental Finance</em>, 41, 100891.
<a href="https://doi.org/10.1016/j.jbef.2024.100891">https://doi.org/10.1016/j.jbef.2024.100891</a></p>
<p>Kailscheuer, K. (2023). noScribe [software].
<a href="https://github.com/kaixxx/noScribe">https://github.com/kaixxx/noScribe</a></p>
<p>Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., &amp; Sutskever, I.
(2022). Robust speech recognition via large-scale weak supervision.
arXiv preprint arXiv:2212.04356. <a href="https://arxiv.org/abs/2212.04356">https://arxiv.org/abs/2212.04356</a></p>
<p>Strauss, A. &amp; Corbin, J. (1998). <em>Basics of Qualitative Research</em>
(2nd ed.). SAGE Publications.</p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-01-20</strong>: Updated the aTrain reference to the published form: Haberl, A., Fleiß, J., Kowald, D., &amp; Thalmann, S. (2024), &ldquo;Take the aTrain. Introducing an interface for the accessible transcription of interviews.&rdquo;</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>They Told Me Not to Use Design Thinking. They Were Right.</title>
      <link>https://sebastianspicker.github.io/posts/design-thinking-vs-grounded-theory/</link>
      <pubDate>Tue, 23 Nov 2021 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/design-thinking-vs-grounded-theory/</guid>
      <description>When you are a physicist doing education research, methodology feels like a bureaucratic formality standing between you and the interesting work. Everyone told me to use grounded theory instead of design thinking in my thesis. I ignored them. This is the postmortem.</description>
      <content:encoded><![CDATA[<p><em>A follow-up to the <a href="/posts/mission-to-mars/">Mission to Mars</a> post, which
describes the experimental work. This one is about the methodology layer
underneath it — specifically, what I got wrong.</em></p>
<hr>
<h2 id="the-setup">The Setup</h2>
<p>My background is in physics. I ended up in physics education research
sideways, through the astro-lab project and through a genuine interest in
why students find physics so alienating and what might help. When it came
time to frame that work as a thesis, I had to choose a methodology.</p>
<p>I chose design thinking. Or more precisely, I chose something that
borrowed heavily from design-based research and design thinking frameworks
and that felt, at the time, like the obvious match for what I was doing.
I was designing experiments. I was iterating on them. I was testing them
with students and refining them. Design thinking is a framework for
exactly this process. What could be more natural?</p>
<p>Several people told me I was making a mistake. Colleagues with more
qualitative research experience, a supervisor who had been through
the methodology debates in education research more times than he wanted
to count. The consistent advice was: use grounded theory. Be systematic
about your data. Let the categories emerge from what you actually observe
rather than from what you designed the experiment to produce.</p>
<p>I thought I understood what they were saying. I did not understand what
they were saying.</p>
<hr>
<h2 id="what-i-thought-design-thinking-gave-me">What I Thought Design Thinking Gave Me</h2>
<p>Design thinking, as a research framing, offered what felt like a clean
correspondence between method and subject matter. The thing I was
producing was a designed artifact — a teaching experiment. The process
I was following was inherently iterative: run it, observe what happens,
revise, run it again. The framework had a vocabulary for this (empathise,
define, ideate, prototype, test) that matched my actual working process.</p>
<p>Design-based research, the academic version of this approach in education,
has a real literature behind it. It is used in educational technology
research and in curriculum development. It is not a made-up category. The
argument for it is reasonable: if you are trying to design effective
educational interventions, then designing and studying those interventions
at the same time is a coherent research strategy.</p>
<p>What I told myself was: I am doing design-based research. The methodology
matches the work. The thesis will describe the design process, the
rationale for each design decision, the iterative refinements, and the
evidence that the final design works. This is a contribution to knowledge
because it produces a principled, evidence-informed design that other
practitioners can use and adapt.</p>
<p>This is not wrong. But it is not enough for a thesis. And I only
understood why it is not enough after I had spent considerable time
trying to make it enough.</p>
<hr>
<h2 id="the-reckoning-in-the-methodology-chapter">The Reckoning in the Methodology Chapter</h2>
<p>The methodology chapter of a thesis is where you have to be explicit
about the epistemological status of your claims. You are not just
describing what you did. You are explaining why the thing you did counts
as knowledge production, what kind of knowledge it produces, and how
someone else could evaluate whether you did it correctly.</p>
<p>This is where design thinking started to come apart.</p>
<p><strong>What kind of claim does a design study make?</strong> The honest answer is:
it makes a claim about this design, in these contexts, with these
students. It does not easily generalise beyond that. If I show that
the Mission to Mars experiment produces measurable improvements in
students&rsquo; understanding of air pressure in a student lab context at
the University of Cologne in 2019, the implication for other teachers
in other contexts is&hellip; unclear. The design worked here. Maybe it
will work for you. Good luck.</p>
<p>A thesis contribution needs to be something more transferable than that.
It needs to produce knowledge about a phenomenon, not just knowledge
about a specific designed object. &ldquo;Here is a well-designed experiment&rdquo;
is a practitioner contribution, which is genuinely valuable, but it is
not the same as a theoretical contribution to the field.</p>
<p><strong>The iteration problem.</strong> Design thinking celebrates iterative
refinement. But in a thesis, every iteration needs to be motivated by
evidence, and the nature of the evidence and how it maps onto the
design changes needs to be made explicit. If I changed something between
version 1 and version 2 of the experiment, the methodology chapter must
explain: what data told me to make that change? How did I analyse it?
What coding framework did I apply? What alternative changes did I
consider and rule out, and on what grounds?</p>
<p>Design thinking has no systematic answer to these questions. It has
process descriptions (&ldquo;we tested with users and gathered feedback&rdquo;) but
not research methodology answers (&ldquo;I applied open coding to the think-aloud
protocols and the following categories emerged, which pointed toward
this specific revision&rdquo;). Without that precision, the &ldquo;iteration&rdquo; in
the methodology chapter looks like: I tried it, it did not quite work,
I made it better. Which is honest but not a researchable process.</p>
<p><strong>The validation problem.</strong> Design-based research often validates its
designs against the criteria that motivated the design. I designed the
experiment to address specific student misconceptions about air pressure.
I then tested whether students who did the experiment had fewer of those
misconceptions afterward. If the answer is yes, the design is validated.</p>
<p>But this is circular in a way that becomes visible under examination.
The misconceptions I targeted were the ones I identified at the start.
The students I studied were the ones who came to my lab. The measurement
instrument I used was one I designed to detect the specific changes
I expected the design to produce. The whole system is oriented toward
confirming the design rather than discovering something about the
phenomenon.</p>
<p>Grounded theory cuts this loop. You start with the data — the
students&rsquo; actual responses, their misconceptions as they express them,
the things that confuse them that you did not anticipate — and you
build categories from the bottom up. What you end up with is a theory
of how students actually think about air pressure (or whatever the topic
is), which may or may not match what you assumed when you designed the
experiment. The cases where it does not match are precisely where the
theoretical contribution lives.</p>
<hr>
<h2 id="what-grounded-theory-would-have-required">What Grounded Theory Would Have Required</h2>
<p>Grounded theory, done properly, is laborious. The Glaserian version
(open coding, theoretical sampling until saturation, constant
comparative method) requires treating every interview, every observation,
every student response as a data source to be systematically analysed,
compared, and connected into a coherent theory.</p>
<p>Theoretical sampling means you do not decide in advance how many students
to study or what contexts to observe. You keep gathering data until new
cases stop producing new categories — until the theory is saturated.
This is methodologically sound and practically painful, because you
cannot know in advance when you will be done.</p>
<p>Memoing — writing ongoing analytical notes about the emerging categories
and their relationships — is a discipline that forces you to be explicit
about your reasoning at every step. Not just &ldquo;these two responses seem
similar&rdquo; but &ldquo;these two responses are similar because both students are
treating pressure as a property of moving air, and here is how that
connects to the misconception documented by [citation].&rdquo;</p>
<p>I did not want to do this. I wanted to design experiments. Grounded
theory felt like a detour from the thing I was actually interested in.</p>
<p>The advice I received was: this is not a detour. A systematic analysis
of what students think about air pressure, and how they think about it,
and what experiences shift their thinking, is a theoretical contribution
that would make the experiments more useful to everyone — not just a
record of experiments that worked in one lab in one city in one year.</p>
<p>They were right about this.</p>
<hr>
<h2 id="what-i-actually-learned-too-late-to-use-in-the-thesis">What I Actually Learned (Too Late to Use in the Thesis)</h2>
<p>The most useful student responses in the Mission to Mars experiment
were not the ones that confirmed the design was working. They were the
unexpected ones.</p>
<p>The PVC pipe failure — the moment when the lid pops off and students
hear the sound — was included because I thought it would demonstrate the
direction of pressure force in a visceral way. What I observed, which
I noted but did not systematically analyse, was that different students
interpreted the pop differently. Some immediately understood it as the
internal air pushing out. Others interpreted it as the external vacuum
pulling the lid. A few were unsure which way the force had been directed
even after the event.</p>
<p>A grounded theory analysis of those responses would have produced
something genuinely interesting: a typology of how students process
a demonstrable physical event when it conflicts with their existing
pressure intuitions. That typology would have been transferable to
other experimental contexts, other pressure scenarios, other situations
where students encounter the vacuum-suction confusion.</p>
<p>Instead I noted it, described it qualitatively, and moved on because
it was not what the design was optimised to produce.</p>
<p>That is the design thinking trap. You are so focused on the designed
outcome that you treat unexpected observations as noise rather than as
data. Grounded theory treats them as the most valuable data you have.</p>
<hr>
<h2 id="a-note-for-other-physicists-entering-education-research">A Note for Other Physicists Entering Education Research</h2>
<p>If you are coming from a natural science background and you are starting
work in education research, the methodology question will feel foreign
at first. In physics, methodology is largely a matter of technical
choice — which instrument, which statistical test, which model. The
epistemological questions (what kind of knowledge does this produce?
how does it generalise?) are handled by the experimental framework
itself, which is a known, shared, peer-reviewed practice.</p>
<p>In qualitative education research, those questions are not handled in
advance. You have to work them out explicitly, for your specific study,
in writing. This is uncomfortable for people trained in a tradition where
you do the experiment and then write up what happened.</p>
<p>The temptation, for a physicist, is to choose a methodology that feels
like a framework for doing things rather than one that feels like a
framework for thinking about what you found. Design thinking is a
framework for doing things. Grounded theory is a framework for thinking
about what you found.</p>
<p>Both are legitimate. But a thesis needs to make a theoretical contribution,
and theoretical contributions come from systematic analysis of phenomena,
not from documentation of designed objects.</p>
<p>I would have finished faster and understood more if I had done the
uncomfortable thing from the start.</p>
<hr>
<p><em>The experimental work this post is commenting on is described in
<a href="/posts/mission-to-mars/">Mission to Mars</a>. For a more successful later
use of qualitative methodology in a related context, see
<a href="/posts/ai-transcription-grounded-theory/">AI Transcription and Grounded Theory</a>.</em></p>
<hr>
<h2 id="references">References</h2>
<p>Glaser, B. G., &amp; Strauss, A. L. (1967). <em>The Discovery of Grounded
Theory: Strategies for Qualitative Research.</em> Aldine.</p>
<p>Strauss, A., &amp; Corbin, J. (1998). <em>Basics of Qualitative Research:
Techniques and Procedures for Developing Grounded Theory</em> (2nd ed.).
SAGE Publications.</p>
<p>The Design-Based Research Collective (2003). Design-based research: An
emerging paradigm for educational inquiry. <em>Educational Researcher</em>,
32(1), 5–8. <a href="https://doi.org/10.3102/0013189X032001005">https://doi.org/10.3102/0013189X032001005</a></p>
<p>Brown, T. (2008). Design thinking. <em>Harvard Business Review</em>, 86(6),
84–92.</p>
]]></content:encoded>
    </item>
    <item>
      <title>What Happens When You Film Student Teachers: ViLLA and the Case for Video in Teacher Education</title>
      <link>https://sebastianspicker.github.io/posts/villa-video-teacher-education/</link>
      <pubDate>Sun, 14 Jun 2020 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/villa-video-teacher-education/</guid>
      <description>ViLLA is an online portal of real classroom videos built for teacher education at the University of Cologne. The idea sounds straightforward. Getting there required filming actual lessons, building infrastructure, surviving a quasi-experiment, and eventually convincing the federal government that this was worth scaling. Some notes on how that went.</description>
      <content:encoded><![CDATA[<p><em>In September 2019 I gave a presentation on the ViLLA project at the ZuS Innovation Workshop
at the University of Cologne together with Daniel Zimmermann. This post is the
blog-friendly version of that presentation — what ViLLA is, why video in teacher
education is not as obvious as it sounds, and what the research actually showed.
The project team at the time: Prof. Dr. Dr. Kai Kaspar, Prof. Dr. Johannes König,
Charlotte Kramer, Marco Rüth, Daniel Zimmermann, Anne van Laak, and myself.</em></p>
<hr>
<h2 id="the-problem-with-learning-to-teach">The Problem With Learning to Teach</h2>
<p>Here is the uncomfortable thing about learning to teach: for the first few years of your
career, your primary research subjects are children. Every class you misread, every
transition you fumble, every moment you lose the room — those are learning experiences,
and the students in the room pay part of the cost.</p>
<p>This is not a new problem, and nobody is pretending it has a clean solution. But it
raises a question that teacher education programmes have been grappling with for a long
time: how much of the relevant learning can happen before the student teacher is standing
alone in front of thirty eleven-year-olds?</p>
<p>One answer — not the only one, but a defensible one — is: more of it, if you give people
good video.</p>
<hr>
<h2 id="what-villa-is">What ViLLA Is</h2>
<p><strong>ViLLA</strong> (Videos in der Lehrerinnen- und Lehrerausbildung — Videos in Teacher Education)
is an online portal of real classroom recordings built for use in teacher education at
the University of Cologne. The idea was to film actual teaching, make the recordings
searchable and pedagogically annotated, and give student teachers access to genuine
classroom situations before they were responsible for managing one themselves.</p>
<p>This sounds straightforward until you try to do it. Filming real classrooms requires
ethical clearance, consent from pupils and parents, cooperation from schools, and a
recording setup that doesn&rsquo;t turn the lesson into a performance. The resulting videos
need to be usable for instruction, which means they need accompanying material:
lesson plans, worksheets, transcripts, annotations by subject-matter specialists.
And then they need to be housed somewhere students can actually find them.</p>
<p>The first phase of ViLLA ran from April 2013 to December 2014, funded by the
University of Cologne&rsquo;s Innovation in Teaching programme. We opened officially on
5 November 2014 with a database of classroom sequences tagged by subject, year group,
school type, and didactic focus. The core intended audience: student teachers,
<em>Referendar</em>innen* (trainee teachers in the practical training phase), and the
university instructors and school-based mentors working with them.</p>
<hr>
<h2 id="what-the-research-showed">What the Research Showed</h2>
<p>The project was not just infrastructure. From the beginning we ran research alongside
the portal development — specifically, quasi-experimental studies on whether and how
video-based instruction actually improves the skills we care about.</p>
<p>The target construct was <strong>situation-specific skills for classroom management</strong> —
the ability to perceive, interpret, and respond to classroom events in real time.
This is a domain where there is reasonable theoretical agreement that expert teachers
differ from novices not primarily in declarative knowledge (knowing that you should
address disruptions early) but in perception and response speed (actually noticing
the early signs and acting on them).</p>
<p>The key finding from the ViLLA studies: <strong>combining video with transcripts was more
effective than control seminars that used neither</strong>. Students who worked with video
and transcript material showed better development of situation-specific classroom
management skills than comparison groups. The effect was not enormous, but it was
there, it replicated, and it was large enough to justify the infrastructure investment.</p>
<p>The transcript component is worth highlighting because it&rsquo;s not obvious. You might
expect that video alone would be sufficient — you are showing people real teaching.
But the transcript creates an additional layer of perceptual access: you can pause
on a moment, read back exactly what was said, annotate, compare your reading of the
situation with a peer&rsquo;s. The multimodal combination seems to do something that either
medium alone does not.</p>
<hr>
<h2 id="villa-20-scaling-up">ViLLA 2.0: Scaling Up</h2>
<p>By 2015, ViLLA had grown into a second development phase. In November 2016 it received
federal funding through the BMBF&rsquo;s <em>Qualitätsoffensive Lehrerbildung</em> (Quality Initiative
for Teacher Education), embedded in the University of Cologne&rsquo;s
<em>Zukunftsstrategie Lehrer</em>innenbildung* (ZuS) umbrella project.</p>
<p>The scale change was significant. <strong>185 videos</strong> in the database by the time of
the 2019 presentation, covering more subjects, more school types, and more
outside-school teaching and learning scenarios than the original portal had included.
The self-learning modules — originally an add-on — became a central feature.</p>
<p>Two types of modules emerged from the practice:</p>
<p><strong>Case-based modules</strong> built around a specific filmed sequence, asking the learner
to work through what they observe, what decisions the teacher made, and what they
would do differently. These are close to case-based reasoning as used in medical
education — the video is the case.</p>
<p><strong>Theme-centred modules</strong> organised around a pedagogical concept (classroom
transitions, group work monitoring, handling disruptions) and drawing on multiple
video examples to illustrate the same phenomenon across different contexts. The
goal is pattern recognition — not learning what to do in <em>this</em> lesson, but
developing a schema that transfers to next year&rsquo;s class in a different school.</p>
<hr>
<h2 id="the-meta-portal-and-what-it-means">The Meta-Portal and What It Means</h2>
<p>One development I am particularly interested in from a research infrastructure
perspective: ViLLA&rsquo;s integration into <strong>unterrichtsvideos.net</strong>, a meta-portal
that aggregates classroom video collections from universities across Germany.</p>
<p>The single-portal model has an obvious limitation: your institution&rsquo;s videos
reflect your institution&rsquo;s context. The schools you filmed, the subject specialists
on your team, the pedagogical questions your programme emphasises. Aggregation
across portals means a student teacher in Cologne can access video collected at
Münster or Berlin, search across the combined database by year group and subject,
and get access without separate registration at each institution.</p>
<p>This matters for research too. A shared infrastructure with standardised tagging
creates the conditions for cross-institutional studies. You can ask whether the
same video material works differently in different programme contexts, or whether
different annotation frameworks lead to different learning outcomes. The portal is
also, then, a methodology — a way of generating comparable data.</p>
<hr>
<h2 id="what-i-think-is-actually-interesting-here">What I Think Is Actually Interesting Here</h2>
<p>I should be honest about where my personal research interest sits in all of this,
because it is not primarily in the technology.</p>
<p>The thing that I find genuinely interesting about the ViLLA project is the implicit
theory of professional learning it rests on. We filmed real lessons — not idealised
demonstrations, not training videos produced for the purpose, but actual classroom
teaching with the roughness and contingency that implies. We then gave those videos
to student teachers and asked them to look carefully.</p>
<p>The assumption is that professional perception can be educated. That what distinguishes
a competent teacher from a novice is not just accumulated experience but the capacity
to read situations quickly and accurately — and that this capacity can be developed
through structured encounter with material before you are responsible for it.</p>
<p>This is an empirical claim and we have evidence for it. But it also connects to
broader questions about expertise, perception, and what it means to prepare someone
for a practice-based profession. Medical education has been working on these
questions through simulation and case-based learning for decades. Teacher education
is, in many institutions, still catching up.</p>
<p>ViLLA is one attempt to close that gap. Whether it is the right attempt, in its
current form, is something I am still working out. But the question it is trying to
answer — what do you need to have seen, and thought about, before you can teach
well — seems to me like one of the important ones.</p>
<hr>
<h2 id="where-this-is-going">Where This Is Going</h2>
<p>Two strands that were live at the time of the 2019 presentation and that I will
return to in later posts:</p>
<p>The <strong>ProvidiS</strong> project (Förderung der professionellen Wahrnehmung in digitalen,
videobasierten Selbstlernmodulen — Promoting Professional Perception in Digital,
Video-Based Self-Learning Modules), a follow-on BMBF project in cooperation with
the University of Münster and FU Berlin, which moves from infrastructure to
targeted intervention design. The question shifts from &ldquo;does video work?&rdquo; to
&ldquo;which features of video-based learning design produce which effects on professional
perception, for which learners?&rdquo;</p>
<p>And a methodological strand I have become increasingly interested in: <strong>the
videography setting itself as a research question</strong>. How you film a lesson — camera
placement, editing conventions, what gets cut — shapes what the viewer can perceive.
The transcript does something similar. These are not neutral mediations. They are
constructions, and the choices made in constructing them have downstream effects on
what student teachers learn to see. This connects to questions I have been thinking
about in qualitative methodology more broadly — which I will probably end up writing
about separately.</p>
<hr>
<h2 id="references">References</h2>
<p>König, J., Blömeke, S., Klein, P., Suhl, U., Busse, A., &amp; Kaiser, G. (2014).
Is teachers&rsquo; general pedagogical knowledge a premise for noticing and interpreting
classroom situations? <em>Teaching and Teacher Education</em>, 38, 76–88.
<a href="https://doi.org/10.1016/j.tate.2013.11.004">https://doi.org/10.1016/j.tate.2013.11.004</a></p>
<p>Kramer, C., König, J., Strauß, S., &amp; Kaspar, K. (2020). Classroom videos or
transcripts? A quasi-experimental study to assess the effects of media-based
learning on pre-service teachers&rsquo; situation-specific skills of classroom
management. <em>International Journal of Educational Research</em>, 103, 101624.
<a href="https://doi.org/10.1016/j.ijer.2020.101624">https://doi.org/10.1016/j.ijer.2020.101624</a></p>
<p>Sherin, M. G. (2007). The development of teachers&rsquo; professional vision in video
clubs. In R. Goldman, R. Pea, B. Barron, &amp; S. J. Derry (Eds.),
<em>Video Research in the Learning Sciences</em> (pp. 383–395). Lawrence Erlbaum.</p>
<p>van Es, E. A., &amp; Sherin, M. G. (2002). Learning to notice: Scaffolding new teachers'
interpretations of classroom interactions. <em>Journal of Technology and Teacher
Education</em>, 10(4), 571–596.</p>
]]></content:encoded>
    </item>
  </channel>
</rss>
