<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Qualitative-Methods on Sebastian Spicker</title>
    <link>https://sebastianspicker.github.io/tags/qualitative-methods/</link>
    <description>Recent content in Qualitative-Methods on Sebastian Spicker</description>
    <image>
      <title>Sebastian Spicker</title>
      <url>https://sebastianspicker.github.io/og-image.png</url>
      <link>https://sebastianspicker.github.io/og-image.png</link>
    </image>
    <generator>Hugo -- 0.160.0</generator>
    <language>en</language>
    <lastBuildDate>Tue, 10 Jun 2025 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://sebastianspicker.github.io/tags/qualitative-methods/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Your Transcript Is Already an Interpretation: AI Transcription and Grounded Theory</title>
      <link>https://sebastianspicker.github.io/posts/ai-transcription-grounded-theory/</link>
      <pubDate>Tue, 10 Jun 2025 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/ai-transcription-grounded-theory/</guid>
      <description>aTrain and noScribe are local, GDPR-compliant, Whisper-based transcription tools that can genuinely save hours of work in qualitative interview research. They also make methodological decisions on your behalf without telling you. If you do grounded theory, you need to know which decisions those are.</description>
      <content:encoded><![CDATA[<p><em>In June 2025 I put together a practical guide on AI-assisted transcription
for professors of music pedagogy at HfMT Köln — primarily a hands-on
introduction to aTrain and noScribe. This post is the methodological
companion to that guide: the stuff I could not fit into a workshop handout
but that I think matters more than the installation instructions.</em></p>
<hr>
<h2 id="the-seduction">The Seduction</h2>
<p>AI transcription tools have reached a point where, for clean audio of a
single speaker in a quiet room, the output is genuinely good. You load a
90-minute interview, click a button, wait roughly 20 minutes, and get a
readable transcript with timestamps and speaker labels. Measured in hours of
work, that is an order of magnitude faster than manual transcription. The appeal is
obvious, especially if you are a qualitative researcher working with a backlog
of interview recordings.</p>
<p>The two tools I have been evaluating — <strong>aTrain</strong> (developed at University of
Graz) and <strong>noScribe</strong> (an independent open-source project) — both run
entirely locally on your machine. No audio file is uploaded anywhere. No
cloud API is involved. This matters for interview research: you are handling
other people&rsquo;s speech, often on topics they regard as sensitive, and the
GDPR landscape for sending recordings to external servers is genuinely
complicated. Local processing sidesteps that problem entirely.</p>
<p>Both tools are built on <strong>OpenAI&rsquo;s Whisper model</strong>, which is — despite the
name — open-source and runs offline. They differ in interface philosophy,
feature depth, and what methodological commitments they make visible.</p>
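<p>For orientation, here is a minimal sketch of that shared core, using the
<code>openai-whisper</code> Python package directly. This is illustrative only:
both tools add speaker detection, pause handling, and export layers on top of
this, and the file name and model size below are placeholder assumptions.</p>
<pre><code class="language-python"># Minimal sketch of what both tools do at their core: local, offline
# transcription with the openai-whisper package. Illustrative only --
# aTrain and noScribe wrap this in speaker detection, pause handling,
# and export pipelines. "interview.wav" and the model size are
# placeholder assumptions.
import whisper

model = whisper.load_model("medium")   # fetched once, then cached locally
result = model.transcribe("interview.wav", language="de")

for segment in result["segments"]:
    print(f"[{segment['start']:7.1f}s] {segment['text'].strip()}")
</code></pre>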
<p>But the seduction is the problem. The speed and cleanliness of the output
make it easy to treat the transcript as a neutral record rather than as a
construction. It is not. Every transcription is an act of interpretation. An
AI transcription is an act of interpretation performed by an algorithm that
does not know what your research question is.</p>
<hr>
<h2 id="why-this-is-a-grounded-theory-problem-specifically">Why This Is a Grounded Theory Problem Specifically</h2>
<p>In grounded theory — whether you follow the Strauss and Corbin tradition or
the constructivist reformulation by Charmaz — the researcher is not a passive
recorder of data. The analytical process begins with the first moment of
contact with the material. Coding, memo-writing, constant comparison, and
theoretical sampling all assume that you are working with data that you have
genuinely engaged with and that reflects choices made with your research
question in mind.</p>
<p>Transcription is the first of those choices. What counts as a pause? Do you
mark hesitations and self-corrections? Do you capture overlapping speech? Do
you note emphasis, speed changes, or trailing-off? The answers to these
questions are not neutral. They are determined by what level of analysis you
intend. A thematic analysis of interview content needs something different
from a conversation analysis of turn-taking, which needs something different
from a discourse analysis attending to hedges and disfluencies.</p>
<p>When you transcribe manually, you make these choices explicitly or
implicitly, but you make them. When you delegate to an algorithm, the
algorithm makes them — according to its training data and its default
settings — and then presents you with output that looks authoritative.</p>
<p>The risk is not that AI transcription is inaccurate (though it sometimes is).
The risk is that it is <em>selectively accurate in ways you did not choose</em> and
that those choices shape what you subsequently see in the data.</p>
<hr>
<h2 id="what-the-tools-actually-do">What the Tools Actually Do</h2>
<h3 id="atrain">aTrain</h3>
<p>aTrain is the simpler of the two. Windows-native (Microsoft Store), with a
macOS beta for Apple Silicon. The interface has essentially one meaningful
decision point after you load your file: whether to activate speaker
detection. Everything else is handled automatically. Output formats are plain
text with timestamps, SRT subtitle files, and — most useful for researchers —
direct QDA exports for MAXQDA, ATLAS.ti, and NVivo with synchronised
audio-timestamp links.</p>
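<p>For orientation, the SRT output follows the standard subtitle format: a
sequence number, a start and end timestamp, and the text of the segment. The
lines below are invented for illustration, not actual aTrain output:</p>
<pre><code>1
00:00:04,210 --> 00:00:09,830
I think it was more complicated than we expected at the start.

2
00:00:10,120 --> 00:00:12,480
Mm-hm.
</code></pre>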
<p>What aTrain does not do: it does not mark pauses. It does not detect
disfluencies (the <em>ähms</em>, <em>uhs</em>, self-interruptions, false starts). It does
not detect overlapping speech. It produces clean, semantically coherent
transcripts — which means it actively smooths what you gave it. If a
speaker says <em>&ldquo;well — I mean — it was, I think it was more like — yeah,
complicated&rdquo;</em>, aTrain will probably give you something closer to <em>&ldquo;I think it
was complicated&rdquo;</em>. The hesitation structure disappears.</p>
<p>For a thematic interview study where you are interested in what people said
about a topic, this is probably fine. For any analysis where <em>how</em> something
was said is part of the data — pace, repair, emphasis, epistemic hedging —
aTrain is erasing data you need.</p>
<h3 id="noscribe">noScribe</h3>
<p>noScribe is more complex in almost every dimension. Available for Windows,
macOS (including Apple Silicon and Intel), and Linux. The interface exposes
a meaningful number of configuration decisions:</p>
<ul>
<li><strong>Mark Pause</strong>: off, or marked at 1-, 2-, or 3-second thresholds, with
conventional notation <code>(.)</code>, <code>(..)</code>, <code>(...)</code>, <code>(10 seconds pause)</code></li>
<li><strong>Speaker Detection</strong>: automatic count, fixed count, or disabled</li>
<li><strong>Overlapping Speech</strong>: experimental detection, marked with <code>//double slash//</code></li>
<li><strong>Disfluencies</strong>: off or on — captures <em>ähm</em>, <em>äh</em>, self-corrections,
false starts</li>
<li><strong>Timestamps</strong>: by speaker turn or every 60 seconds</li>
</ul>
<p>It also has an integrated editor (noScribeEdit) with synchronised audio
playback: click anywhere in the transcript and the audio seeks to that
position. This is the single most useful feature for post-transcription
review, and aTrain does not have anything equivalent.</p>
<p>The configuration complexity is not gratuitous. It reflects the fact that
different methodological frameworks require different transcription
conventions. noScribe&rsquo;s disfluency detection corresponds roughly to what a
GAT2-Light transcription requires. Its pause notation system maps onto
conversation analytic conventions. The choices you make in the interface are
methodological choices, not just technical preferences.</p>
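<p>To make the pause-threshold idea concrete, here is a toy sketch
(emphatically not noScribe&rsquo;s actual code) of how measured silence
durations could map onto the conventional markers. The thresholds are
assumptions chosen to match the notation above.</p>
<pre><code class="language-python"># Toy illustration, not noScribe's implementation: map a measured
# silence duration (in seconds) onto conventional pause notation.
def pause_marker(seconds):
    if seconds >= 4.0:
        return f"({round(seconds)} seconds pause)"  # long pauses get an explicit count
    if seconds >= 3.0:
        return "(...)"
    if seconds >= 2.0:
        return "(..)"
    if seconds >= 1.0:
        return "(.)"
    return ""                                       # below threshold: not marked

print(pause_marker(2.4))   # (..)
print(pause_marker(9.7))   # (10 seconds pause)
</code></pre>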
<hr>
<h2 id="the-normalisation-problem">The Normalisation Problem</h2>
<p>Both tools perform what I would call <em>normalisation</em>: they produce transcripts
that read more fluently than the original speech. This is a feature from a
usability standpoint and a methodological liability from a qualitative
research standpoint.</p>
<p>Specific failure modes I observed in evaluation:</p>
<p><strong>Compound word errors</strong> (more pronounced in noScribe for German): <em>VR-Brille</em>
(&ldquo;VR headset&rdquo;) transcribed as <em>Brille VR</em>, proper nouns mangled, domain
vocabulary rendered phonetically. In music research contexts this is
particularly salient — instrument names, notation terms, composer names, and
genre vocabulary are all potential failure points.</p>
<p><strong>Speaker detection overcounting</strong>: both tools, when speaker detection is
active, tend to identify more speakers than are present. A two-person
interview with one hesitant speaker may generate three or four speaker labels.
Manual correction is required.</p>
<p><strong>Acoustic transcription</strong>: noScribe occasionally produces what the document
calls <em>lautliche Transkriptionen</em> — phonetic renderings rather than semantic
ones. A speaker saying <em>Beamer</em> (data projector) may be transcribed as <em>Bima</em>.
This is not an error in the conventional sense; it is the model accurately
representing what it heard acoustically rather than semantically resolving it.
For music researchers studying how non-specialist participants talk about
technical equipment, this is interesting. For most interview research, it
requires correction.</p>
<p><strong>Pause and overlap reliability degrades with audio quality</strong>: both tools
perform well on clean, close-mic mono recordings of single speakers in quiet
rooms. Introduce a second speaker, ambient noise, variable recording distance,
or a phone recording, and accuracy drops substantially. This matters
specifically for music interview research, where the interview setting is
often a rehearsal room or performance space rather than an acoustic booth.</p>
<hr>
<h2 id="a-methodological-comparison-not-a-feature-list">A Methodological Comparison, Not a Feature List</h2>
<p>The useful comparison between aTrain and noScribe is not technical — it is
about which methodological contexts each is suited to.</p>
<table>
  <thead>
      <tr>
          <th>Research context</th>
          <th>Tool</th>
          <th>Why</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Thematic/content analysis, single speaker</td>
          <td>aTrain</td>
          <td>Speed, simplicity, adequate accuracy, QDA export</td>
      </tr>
      <tr>
          <td>Grounded theory with attention to epistemic hedging</td>
          <td>noScribe + disfluencies</td>
          <td>Captures the hesitation structure that carries methodological information</td>
      </tr>
      <tr>
          <td>Conversation analysis</td>
          <td>Neither, or noScribe as starting point</td>
          <td>CA requires phonetic detail neither tool reliably produces</td>
      </tr>
      <tr>
          <td>Large corpus, initial open coding</td>
          <td>aTrain</td>
          <td>Volume and speed outweigh detail at early stages</td>
      </tr>
      <tr>
          <td>Interpretive phenomenological analysis</td>
          <td>noScribe</td>
          <td>The pause and disfluency data is IPA-relevant</td>
      </tr>
      <tr>
          <td>Teaching transcription as a research practice</td>
          <td>Both</td>
          <td><em>See below</em></td>
      </tr>
  </tbody>
</table>
<p>The last row deserves its own section.</p>
<hr>
<h2 id="using-both-tools-to-teach-about-transcription">Using Both Tools to Teach About Transcription</h2>
<p>The most pedagogically valuable use of these tools is probably not producing
transcripts — it is using them to make the constructed nature of transcripts
visible to students.</p>
<p>A simple exercise: take a three-minute excerpt of an interview recording.
Have students transcribe it manually according to whatever convention the
course uses. Then run the same excerpt through aTrain and noScribe with
different settings. Compare the three or four resulting transcripts in a
seminar discussion.</p>
<p>The differences that emerge are not about which transcript is &ldquo;correct&rdquo;. They
are about what each transcript makes visible and what it hides. The aTrain
transcript will be clean and readable. The manually produced transcript will
have annotation that the students chose based on what struck them as relevant.
The noScribe transcript with disfluencies enabled will look noisy. All three
are representations of the same three minutes of speech.</p>
<p>Questions that come out of this reliably: Why did the student who transcribed
manually mark that particular pause? What did the student not mark that the
software did? What did the software produce that the student did not hear?
What does the &ldquo;cleaner&rdquo; transcript lose?</p>
<p>This is the entry point to a genuinely grounded theory-relevant conversation
about data construction: the transcript is not the data. The transcript is a
representation of the data made according to principles that should be
theoretically motivated, and those principles should be stated explicitly in
the methods section.</p>
<hr>
<h2 id="what-these-tools-cannot-replace">What These Tools Cannot Replace</h2>
<p>The document I prepared for the HfMT professors ends with a sentence I want
to quote in the original German, because it is the methodological core of
the whole thing:</p>
<blockquote>
<p><em>Automatisierung ersetzt nicht das Nachdenken über Daten.</em>
Automation does not replace thinking about data.</p>
</blockquote>
<p>More precisely: the algorithm makes decisions about what counts as a pause,
what counts as language, whose voice counts as a separate speaker — without
knowing what is scientifically relevant. It does not know that the half-second
hesitation before a particular word is the most important moment in the
interview. It does not know that the overlapping &ldquo;mm-hm&rdquo; is a data point for
your analysis of how the interviewee manages discomfort. It does not know
that the repeated self-correction in the middle of a sentence about teaching
practice is where your emerging category is.</p>
<p>You have to know that. And you only know it if you have been in enough
contact with the material to have developed theoretical sensitivity — which is
exactly what Strauss and Corbin mean when they describe the iterative
relationship between data collection, coding, and theoretical development in
grounded theory.</p>
<p>AI transcription tools save the hours of typing. They do not and cannot
substitute for the analytical engagement that makes a grounded theory study
produce knowledge rather than a theme list.</p>
<p>Use them. But use them knowing what they are doing.</p>
<hr>
<h2 id="practical-summary">Practical Summary</h2>
<ul>
<li><strong>aTrain</strong>: one-click, local, GDPR-compliant, good QDA integration,
appropriate for thematic analysis. No disfluencies, no pauses, no
overlap detection. Versions: Windows (Microsoft Store), macOS beta.
Current version: 1.3.1.</li>
<li><strong>noScribe</strong>: more complex, highly configurable, disfluency and pause
detection, integrated audio-sync editor, appropriate for grounded theory
and discourse-oriented work. More demanding to set up. Current version:
0.6.2.</li>
<li><strong>Neither tool</strong> is appropriate as a black-box solution for conversation
analysis or prosodic research.</li>
<li><strong>Both tools</strong> require manual post-processing. Estimate correction time
at roughly 20–40% of the original interview length for clean recordings
with a single speaker; more for multi-speaker or suboptimal audio.</li>
<li><strong>In teaching</strong>: the exercise of comparing manual, aTrain, and noScribe
transcripts of the same excerpt is more pedagogically valuable than any
of the transcripts individually.</li>
</ul>
<hr>
<h2 id="references">References</h2>
<p>Charmaz, K. (2014). <em>Constructing Grounded Theory</em> (2nd ed.).
SAGE Publications.</p>
<p>Dresing, T. &amp; Pehl, T. (2018). <em>Praxisbuch Interview, Transkription &amp;
Analyse</em> (8th ed.). Eigenverlag. <a href="https://www.audiotranskription.de">https://www.audiotranskription.de</a></p>
<p>Haberl, A., Fleiß, J., Kowald, D., &amp; Thalmann, S. (2024). Take the aTrain.
Introducing an interface for the accessible transcription of interviews.
<em>Journal of Behavioral and Experimental Finance</em>, 41, 100891.
<a href="https://doi.org/10.1016/j.jbef.2024.100891">https://doi.org/10.1016/j.jbef.2024.100891</a></p>
<p>Kailscheuer, K. (2023). noScribe [software].
<a href="https://github.com/kaixxx/noScribe">https://github.com/kaixxx/noScribe</a></p>
<p>Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., &amp; Sutskever, I.
(2022). Robust speech recognition via large-scale weak supervision.
arXiv preprint arXiv:2212.04356. <a href="https://arxiv.org/abs/2212.04356">https://arxiv.org/abs/2212.04356</a></p>
<p>Strauss, A. &amp; Corbin, J. (1998). <em>Basics of Qualitative Research</em>
(2nd ed.). SAGE Publications.</p>
<hr>
<h2 id="changelog">Changelog</h2>
<ul>
<li><strong>2026-01-20</strong>: Updated the aTrain reference to the published form: Haberl, A., Fleiß, J., Kowald, D., &amp; Thalmann, S. (2024), &ldquo;Take the aTrain. Introducing an interface for the accessible transcription of interviews.&rdquo;</li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>They Told Me Not to Use Design Thinking. They Were Right.</title>
      <link>https://sebastianspicker.github.io/posts/design-thinking-vs-grounded-theory/</link>
      <pubDate>Tue, 23 Nov 2021 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/design-thinking-vs-grounded-theory/</guid>
      <description>When you are a physicist doing education research, methodology feels like a bureaucratic formality standing between you and the interesting work. Everyone told me to use grounded theory instead of design thinking in my thesis. I ignored them. This is the postmortem.</description>
      <content:encoded><![CDATA[<p><em>A follow-up to the <a href="/posts/mission-to-mars/">Mission to Mars</a> post, which
describes the experimental work. This one is about the methodology layer
underneath it — specifically, what I got wrong.</em></p>
<hr>
<h2 id="the-setup">The Setup</h2>
<p>My background is in physics. I ended up in physics education research
sideways, through the astro-lab project and through a genuine interest in
why students find physics so alienating and what might help. When it came
time to frame that work as a thesis, I had to choose a methodology.</p>
<p>I chose design thinking. Or more precisely, I chose something that
borrowed heavily from design-based research and design thinking frameworks
and that felt, at the time, like the obvious match for what I was doing.
I was designing experiments. I was iterating on them. I was testing them
with students and refining them. Design thinking is a framework for
exactly this process. What could be more natural?</p>
<p>Several people told me I was making a mistake. Colleagues with more
qualitative research experience, a supervisor who had been through
the methodology debates in education research more times than he wanted
to count. The consistent advice was: use grounded theory. Be systematic
about your data. Let the categories emerge from what you actually observe
rather than from what you designed the experiment to produce.</p>
<p>I thought I understood what they were saying. I did not understand what
they were saying.</p>
<hr>
<h2 id="what-i-thought-design-thinking-gave-me">What I Thought Design Thinking Gave Me</h2>
<p>Design thinking, as a research framing, offered what felt like a clean
correspondence between method and subject matter. The thing I was
producing was a designed artifact — a teaching experiment. The process
I was following was inherently iterative: run it, observe what happens,
revise, run it again. The framework had a vocabulary for this (empathise,
define, ideate, prototype, test) that matched my actual working process.</p>
<p>Design-based research, the academic version of this approach in education,
has a real literature behind it. It is used in educational technology
research and in curriculum development. It is not a made-up category. The
argument for it is reasonable: if you are trying to design effective
educational interventions, then designing and studying those interventions
at the same time is a coherent research strategy.</p>
<p>What I told myself was: I am doing design-based research. The methodology
matches the work. The thesis will describe the design process, the
rationale for each design decision, the iterative refinements, and the
evidence that the final design works. This is a contribution to knowledge
because it produces a principled, evidence-informed design that other
practitioners can use and adapt.</p>
<p>This is not wrong. But it is not enough for a thesis. And I only
understood why it is not enough after I had spent considerable time
trying to make it enough.</p>
<hr>
<h2 id="the-reckoning-in-the-methodology-chapter">The Reckoning in the Methodology Chapter</h2>
<p>The methodology chapter of a thesis is where you have to be explicit
about the epistemological status of your claims. You are not just
describing what you did. You are explaining why the thing you did counts
as knowledge production, what kind of knowledge it produces, and how
someone else could evaluate whether you did it correctly.</p>
<p>This is where design thinking started to come apart.</p>
<p><strong>What kind of claim does a design study make?</strong> The honest answer is:
it makes a claim about this design, in these contexts, with these
students. It does not easily generalise beyond that. If I show that
the Mission to Mars experiment produces measurable improvements in
students&rsquo; understanding of air pressure in a student lab context at
the University of Cologne in 2019, the implication for other teachers
in other contexts is&hellip; unclear. The design worked here. Maybe it
will work for you. Good luck.</p>
<p>A thesis contribution needs to be something more transferable than that.
It needs to produce knowledge about a phenomenon, not just knowledge
about a specific designed object. &ldquo;Here is a well-designed experiment&rdquo;
is a practitioner contribution, which is genuinely valuable, but it is
not the same as a theoretical contribution to the field.</p>
<p><strong>The iteration problem.</strong> Design thinking celebrates iterative
refinement. But in a thesis, every iteration needs to be motivated by
evidence, and the nature of the evidence and how it maps onto the
design changes needs to be made explicit. If I changed something between
version 1 and version 2 of the experiment, the methodology chapter must
explain: what data told me to make that change? How did I analyse it?
What coding framework did I apply? What alternative changes did I
consider and rule out, and on what grounds?</p>
<p>Design thinking has no systematic answer to these questions. It has
process descriptions (&ldquo;we tested with users and gathered feedback&rdquo;) but
not research methodology answers (&ldquo;I applied open coding to the think-aloud
protocols and the following categories emerged, which pointed toward
this specific revision&rdquo;). Without that precision, the &ldquo;iteration&rdquo; in
the methodology chapter looks like: I tried it, it did not quite work,
I made it better. Which is honest but not a researchable process.</p>
<p><strong>The validation problem.</strong> Design-based research often validates its
designs against the criteria that motivated the design. I designed the
experiment to address specific student misconceptions about air pressure.
I then tested whether students who did the experiment had fewer of those
misconceptions afterward. If the answer is yes, the design is validated.</p>
<p>But this is circular in a way that becomes visible under examination.
The misconceptions I targeted were the ones I identified at the start.
The students I studied were the ones who came to my lab. The measurement
instrument I used was one I designed to detect the specific changes
I expected the design to produce. The whole system is oriented toward
confirming the design rather than discovering something about the
phenomenon.</p>
<p>Grounded theory cuts this loop. You start with the data — the
students&rsquo; actual responses, their misconceptions as they express them,
the things that confuse them that you did not anticipate — and you
build categories from the bottom up. What you end up with is a theory
of how students actually think about air pressure (or whatever the topic
is), which may or may not match what you assumed when you designed the
experiment. The cases where it does not match are precisely where the
theoretical contribution lives.</p>
<hr>
<h2 id="what-grounded-theory-would-have-required">What Grounded Theory Would Have Required</h2>
<p>Grounded theory, done properly, is laborious. The Glaserian version
(open coding, theoretical sampling until saturation, constant
comparative method) requires treating every interview, every observation,
every student response as a data source to be systematically analysed,
compared, and connected into a coherent theory.</p>
<p>Theoretical sampling means you do not decide in advance how many students
to study or what contexts to observe. You keep gathering data until new
cases stop producing new categories — until the theory is saturated.
This is methodologically sound and practically painful, because you
cannot know in advance when you will be done.</p>
<p>Memoing — writing ongoing analytical notes about the emerging categories
and their relationships — is a discipline that forces you to be explicit
about your reasoning at every step. Not just &ldquo;these two responses seem
similar&rdquo; but &ldquo;these two responses are similar because both students are
treating pressure as a property of moving air, and here is how that
connects to the misconception documented by [citation].&rdquo;</p>
<p>I did not want to do this. I wanted to design experiments. Grounded
theory felt like a detour from the thing I was actually interested in.</p>
<p>The advice I received was: this is not a detour. A systematic analysis
of what students think about air pressure, and how they think about it,
and what experiences shift their thinking, is a theoretical contribution
that would make the experiments more useful to everyone — not just a
record of experiments that worked in one lab in one city in one year.</p>
<p>They were right about this.</p>
<hr>
<h2 id="what-i-actually-learned-too-late-to-use-in-the-thesis">What I Actually Learned (Too Late to Use in the Thesis)</h2>
<p>The most useful student responses in the Mission to Mars experiment
were not the ones that confirmed the design was working. They were the
unexpected ones.</p>
<p>The PVC pipe failure — the moment when the lid pops off and students
hear the sound — was included because I thought it would demonstrate the
direction of pressure force in a visceral way. What I observed, which
I noted but did not systematically analyse, was that different students
interpreted the pop differently. Some immediately understood it as the
internal air pushing out. Others interpreted it as the external vacuum
pulling the lid. A few were unsure which way the force had been directed
even after the event.</p>
<p>A grounded theory analysis of those responses would have produced
something genuinely interesting: a typology of how students process
a demonstrable physical event when it conflicts with their existing
pressure intuitions. That typology would have been transferable to
other experimental contexts, other pressure scenarios, other situations
where students encounter the vacuum-suction confusion.</p>
<p>Instead I noted it, described it qualitatively, and moved on because
it was not what the design was optimised to produce.</p>
<p>That is the design thinking trap. You are so focused on the designed
outcome that you treat unexpected observations as noise rather than as
data. Grounded theory treats them as the most valuable data you have.</p>
<hr>
<h2 id="a-note-for-other-physicists-entering-education-research">A Note for Other Physicists Entering Education Research</h2>
<p>If you are coming from a natural science background and you are starting
work in education research, the methodology question will feel foreign
at first. In physics, methodology is largely a matter of technical
choice — which instrument, which statistical test, which model. The
epistemological questions (what kind of knowledge does this produce?
how does it generalise?) are handled by the experimental framework
itself, which is a known, shared, peer-reviewed practice.</p>
<p>In qualitative education research, those questions are not handled in
advance. You have to work them out explicitly, for your specific study,
in writing. This is uncomfortable for people trained in a tradition where
you do the experiment and then write up what happened.</p>
<p>The temptation, for a physicist, is to choose a methodology that feels
like a framework for doing things rather than one that feels like a
framework for thinking about what you found. Design thinking is a
framework for doing things. Grounded theory is a framework for thinking
about what you found.</p>
<p>Both are legitimate. But a thesis needs to make a theoretical contribution,
and theoretical contributions come from systematic analysis of phenomena,
not from documentation of designed objects.</p>
<p>I would have finished faster and understood more if I had done the
uncomfortable thing from the start.</p>
<hr>
<p><em>The experimental work this post is commenting on is described in
<a href="/posts/mission-to-mars/">Mission to Mars</a>. For a more successful later
use of qualitative methodology in a related context, see
<a href="/posts/ai-transcription-grounded-theory/">AI Transcription and Grounded Theory</a>.</em></p>
<hr>
<h2 id="references">References</h2>
<p>Glaser, B. G., &amp; Strauss, A. L. (1967). <em>The Discovery of Grounded
Theory: Strategies for Qualitative Research.</em> Aldine.</p>
<p>Strauss, A., &amp; Corbin, J. (1998). <em>Basics of Qualitative Research:
Techniques and Procedures for Developing Grounded Theory</em> (2nd ed.).
SAGE Publications.</p>
<p>The Design-Based Research Collective (2003). Design-based research: An
emerging paradigm for educational inquiry. <em>Educational Researcher</em>,
32(1), 5–8. <a href="https://doi.org/10.3102/0013189X032001005">https://doi.org/10.3102/0013189X032001005</a></p>
<p>Brown, T. (2008). Design thinking. <em>Harvard Business Review</em>, 86(6),
84–92.</p>
]]></content:encoded>
    </item>
    <item>
      <title>What Happens When You Film Student Teachers: ViLLA and the Case for Video in Teacher Education</title>
      <link>https://sebastianspicker.github.io/posts/villa-video-teacher-education/</link>
      <pubDate>Sun, 14 Jun 2020 00:00:00 +0000</pubDate>
      <guid>https://sebastianspicker.github.io/posts/villa-video-teacher-education/</guid>
      <description>ViLLA is an online portal of real classroom videos built for teacher education at the University of Cologne. The idea sounds straightforward. Getting there required filming actual lessons, building infrastructure, surviving a quasi-experiment, and eventually convincing the federal government that this was worth scaling. Some notes on how that went.</description>
      <content:encoded><![CDATA[<p><em>In September 2019 I gave a presentation on the ViLLA project at the ZuS Innovation Workshop
at the University of Cologne together with Daniel Zimmermann. This post is the
blog-friendly version of that presentation — what ViLLA is, why video in teacher
education is not as obvious as it sounds, and what the research actually showed.
The project team at the time: Prof. Dr. Dr. Kai Kaspar, Prof. Dr. Johannes König,
Charlotte Kramer, Marco Rüth, Daniel Zimmermann, Anne van Laak, and myself.</em></p>
<hr>
<h2 id="the-problem-with-learning-to-teach">The Problem With Learning to Teach</h2>
<p>Here is the uncomfortable thing about learning to teach: for the first few years of your
career, your primary research subjects are children. Every class you misread, every
transition you fumble, every moment you lose the room — those are learning experiences,
and the students in the room pay part of the cost.</p>
<p>This is not a new problem, and nobody is pretending it has a clean solution. But it
raises a question that teacher education programmes have been grappling with for a long
time: how much of the relevant learning can happen before the student teacher is standing
alone in front of thirty eleven-year-olds?</p>
<p>One answer — not the only one, but a defensible one — is: more of it, if you give people
good video.</p>
<hr>
<h2 id="what-villa-is">What ViLLA Is</h2>
<p><strong>ViLLA</strong> (Videos in der Lehrerinnen- und Lehrerausbildung — Videos in Teacher Education)
is an online portal of real classroom recordings built for use in teacher education at
the University of Cologne. The idea was to film actual teaching, make the recordings
searchable and pedagogically annotated, and give student teachers access to genuine
classroom situations before they were responsible for managing one themselves.</p>
<p>This sounds straightforward until you try to do it. Filming real classrooms requires
ethical clearance, consent from pupils and parents, cooperation from schools, and a
recording setup that doesn&rsquo;t turn the lesson into a performance. The resulting videos
need to be usable for instruction, which means they need accompanying material:
lesson plans, worksheets, transcripts, annotations by subject-matter specialists.
And then they need to be housed somewhere students can actually find them.</p>
<p>The first phase of ViLLA ran from April 2013 to December 2014, funded by the
University of Cologne&rsquo;s Innovation in Teaching programme. We opened officially on
5 November 2014 with a database of classroom sequences tagged by subject, year group,
school type, and didactic focus. The core intended audience: student teachers,
<em>Referendar</em>innen* (trainee teachers in the practical training phase), and the
university instructors and school-based mentors working with them.</p>
<hr>
<h2 id="what-the-research-showed">What the Research Showed</h2>
<p>The project was not just infrastructure. From the beginning we ran research alongside
the portal development — specifically, quasi-experimental studies on whether and how
video-based instruction actually improves the skills we care about.</p>
<p>The target construct was <strong>situation-specific skills for classroom management</strong> —
the ability to perceive, interpret, and respond to classroom events in real time.
This is a domain where there is reasonable theoretical agreement that expert teachers
differ from novices not primarily in declarative knowledge (knowing that you should
address disruptions early) but in perception and response speed (actually noticing
the early signs and acting on them).</p>
<p>The key finding from the ViLLA studies: <strong>combining video with transcripts was more
effective than control seminars that used neither</strong>. Students who worked with video
and transcript material showed better development of situation-specific classroom
management skills than comparison groups. The effect was not enormous, but it was
there, it replicated, and it was large enough to justify the infrastructure investment.</p>
<p>The transcript component is worth highlighting because it&rsquo;s not obvious. You might
expect that video alone would be sufficient — you are showing people real teaching.
But the transcript creates an additional layer of perceptual access: you can pause
on a moment, read back exactly what was said, annotate, compare your reading of the
situation with a peer&rsquo;s. The multimodal combination seems to do something that either
medium alone does not.</p>
<hr>
<h2 id="villa-20-scaling-up">ViLLA 2.0: Scaling Up</h2>
<p>By 2015, ViLLA had grown into a second development phase. In November 2016 it received
federal funding through the BMBF&rsquo;s <em>Qualitätsoffensive Lehrerbildung</em> (Quality Initiative
for Teacher Education), embedded in the University of Cologne&rsquo;s
<em>Zukunftsstrategie Lehrer</em>innenbildung* (ZuS) umbrella project.</p>
<p>The scale change was significant. <strong>185 videos</strong> in the database by the time of
the 2019 presentation, covering more subjects, more school types, and more
outside-school teaching and learning scenarios than the original portal had included.
The self-learning modules — originally an add-on — became a central feature.</p>
<p>Two types of modules emerged from the practice:</p>
<p><strong>Case-based modules</strong> built around a specific filmed sequence, asking the learner
to work through what they observe, what decisions the teacher made, and what they
would do differently. These are close to case-based reasoning as used in medical
education — the video is the case.</p>
<p><strong>Theme-centred modules</strong> organised around a pedagogical concept (classroom
transitions, group work monitoring, handling disruptions) and drawing on multiple
video examples to illustrate the same phenomenon across different contexts. The
goal is pattern recognition — not learning what to do in <em>this</em> lesson, but
developing a schema that transfers to next year&rsquo;s class in a different school.</p>
<hr>
<h2 id="the-meta-portal-and-what-it-means">The Meta-Portal and What It Means</h2>
<p>One development I am particularly interested in from a research infrastructure
perspective: ViLLA&rsquo;s integration into <strong>unterrichtsvideos.net</strong>, a meta-portal
that aggregates classroom video collections from universities across Germany.</p>
<p>The single-portal model has an obvious limitation: your institution&rsquo;s videos
reflect your institution&rsquo;s context. The schools you filmed, the subject specialists
on your team, the pedagogical questions your programme emphasises. Aggregation
across portals means a student teacher in Cologne can access video collected at
Münster or Berlin, search across the combined database by year group and subject,
and get access without separate registration at each institution.</p>
<p>This matters for research too. A shared infrastructure with standardised tagging
creates the conditions for cross-institutional studies. You can ask whether the
same video material works differently in different programme contexts, or whether
different annotation frameworks lead to different learning outcomes. The portal is
also, then, a methodology — a way of generating comparable data.</p>
<hr>
<h2 id="what-i-think-is-actually-interesting-here">What I Think Is Actually Interesting Here</h2>
<p>I should be honest about where my personal research interest sits in all of this,
because it is not primarily in the technology.</p>
<p>The thing that I find genuinely interesting about the ViLLA project is the implicit
theory of professional learning it rests on. We filmed real lessons — not idealised
demonstrations, not training videos produced for the purpose, but actual classroom
teaching with the roughness and contingency that implies. We then gave those videos
to student teachers and asked them to look carefully.</p>
<p>The assumption is that professional perception can be educated. That what distinguishes
a competent teacher from a novice is not just accumulated experience but the capacity
to read situations quickly and accurately — and that this capacity can be developed
through structured encounter with material before you are responsible for it.</p>
<p>This is an empirical claim and we have evidence for it. But it also connects to
broader questions about expertise, perception, and what it means to prepare someone
for a practice-based profession. Medical education has been working on these
questions through simulation and case-based learning for decades. Teacher education
is, in many institutions, still catching up.</p>
<p>ViLLA is one attempt to close that gap. Whether it is the right attempt, in its
current form, is something I am still working out. But the question it is trying to
answer — what do you need to have seen, and thought about, before you can teach
well — seems to me like one of the important ones.</p>
<hr>
<h2 id="where-this-is-going">Where This Is Going</h2>
<p>Two strands that were live at the time of the 2019 presentation and that I will
return to in later posts:</p>
<p>The <strong>ProvidiS</strong> project (Förderung der professionellen Wahrnehmung in digitalen,
videobasierten Selbstlernmodulen — Promoting Professional Perception in Digital,
Video-Based Self-Learning Modules), a follow-on BMBF project in cooperation with
the University of Münster and FU Berlin, which moves from infrastructure to
targeted intervention design. The question shifts from &ldquo;does video work?&rdquo; to
&ldquo;which features of video-based learning design produce which effects on professional
perception, for which learners?&rdquo;</p>
<p>And a methodological strand I have become increasingly interested in: <strong>the
videography setting itself as a research question</strong>. How you film a lesson — camera
placement, editing conventions, what gets cut — shapes what the viewer can perceive.
The transcript does something similar. These are not neutral mediations. They are
constructions, and the choices made in constructing them have downstream effects on
what student teachers learn to see. This connects to questions I have been thinking
about in qualitative methodology more broadly — which I will probably end up writing
about separately.</p>
<hr>
<h2 id="references">References</h2>
<p>König, J., Blömeke, S., Klein, P., Suhl, U., Busse, A., &amp; Kaiser, G. (2014).
Is teachers&rsquo; general pedagogical knowledge a premise for noticing and interpreting
classroom situations? <em>Teaching and Teacher Education</em>, 38, 76–88.
<a href="https://doi.org/10.1016/j.tate.2013.11.004">https://doi.org/10.1016/j.tate.2013.11.004</a></p>
<p>Kramer, C., König, J., Strauß, S., &amp; Kaspar, K. (2020). Classroom videos or
transcripts? A quasi-experimental study to assess the effects of media-based
learning on pre-service teachers&rsquo; situation-specific skills of classroom
management. <em>International Journal of Educational Research</em>, 103, 101624.
<a href="https://doi.org/10.1016/j.ijer.2020.101624">https://doi.org/10.1016/j.ijer.2020.101624</a></p>
<p>Sherin, M. G. (2007). The development of teachers&rsquo; professional vision in video
clubs. In R. Goldman, R. Pea, B. Barron, &amp; S. J. Derry (Eds.),
<em>Video Research in the Learning Sciences</em> (pp. 383–395). Lawrence Erlbaum.</p>
<p>van Es, E. A., &amp; Sherin, M. G. (2002). Learning to notice: Scaffolding new teachers'
interpretations of classroom interactions. <em>Journal of Technology and Teacher
Education</em>, 10(4), 571–596.</p>
]]></content:encoded>
    </item>
  </channel>
</rss>
