DSC-2026-08 | From Audio to Text: Automated Transcriptions with Whisper

When?

June 4, 2026
9:30 - 12:30
14:00 - 16:30

Where?

Campus
Room to be announced shortly

Trainers

Annika Nolte
Nele Fuchs
Data Science Center, University of Bremen

Number of participants: max. 20
Language: English

Why is the topic important?

Interviews are a key method of qualitative research, forming the basis of scholarly insights in many areas of the digital humanities, including oral history, linguistics and ethnography. Qualitative audio data is also becoming increasingly important in other disciplines. Transcribing the collected audio, however, is extremely time- and resource-intensive. As a rule of thumb, one hour of recorded material requires approximately four to sixty hours of manual transcription time (Evers 2011).

This process can be significantly accelerated using automated methods such as Whisper, an open-weight tool and Python package for automatic speech recognition (ASR). Whisper enables the creation of initial transcript drafts, which can then be manually revised.

Workshop Goal

Participants will have the opportunity to experiment hands-on with Whisper and related tools and to assess their potential for their own research. At the same time, the workshop aims to enable participants to critically reflect on the methodological, ethical, and technical challenges of automated transcription. By the end of the workshop, participants will have gained a solid understanding of the various applications of Whisper and will be able to generate reliable first drafts of transcriptions. Optionally, participants will learn how Python can help scale transcription to larger collections, adapt workflows to specific needs, and keep processing transparent and reproducible.

Workshop Content

Please note: Attending only the morning session is possible. 

Morning
  • Introduction to audio transcription in qualitative research and its relevance.
  • Overview of requirements for transcription tools (e.g., data protection, GDPR compliance).
  • Presentation of Whisper and the open-source software noScribe built on it.
  • Critical discussion: Opportunities and limitations of automated transcription (accuracy, bias, impact on research processes).
  • Hands-on session: First transcriptions with noScribe using provided audio files; optionally perform transcriptions with participants’ own audio files.

Afternoon (optional)
  • Introduction to Whisper as a machine learning model and Python package (for advanced workflows such as larger audio collections and workflow adaptations).
  • Transparent and reproducible transcription workflows.
  • Step-by-step demo: Running and customizing Whisper scripts in Jupyter Notebooks.
  • Hands-on session: Use and adapt the provided scripts; optionally perform transcriptions with participants’ own audio files.

Target Audience & Prior Knowledge

This workshop is designed for anyone who needs to transcribe audio files and wants to automate the process. It is primarily aimed at researchers working with qualitative data.

No specific technical knowledge is required to participate in the morning session. Experience with transcribing interviews or other research audio data is helpful but not mandatory.

The afternoon session is aimed at participants who wish to gain deeper insights into using Whisper as a Python package. Basic knowledge of a programming language (ideally Python) is beneficial but not required. The workshop is designed so that technically inclined researchers without prior programming experience can also gain an initial understanding of working with scripts. For participants who want to build up Python basics, we also offer a Python beginners workshop, and the corresponding self-study materials will be published on GitHub after the workshop (https://github.com/Data-Science-Center-UB/Python-Introduction-for-Researchers).

Technical Requirements

  • Your own laptop and a connection to the Wi-Fi (via eduroam).
  • For the afternoon: Please make sure you have access to Jupyter4NFDI.
  • Please bring headphones so that you can listen to the audio files.

About the Trainers

Nele Fuchs and Annika Nolte are data scientists for training and consulting at the DSC.

As a DSC data scientist and environmental scientist, Annika Nolte supports researchers with their data management and analysis workflows. In training and consulting, Annika draws on broad expertise in Earth system sciences and extensive experience in scientific programming. Her main focus areas are data standardization, data management, statistical methods, geospatial analysis, and machine learning in environmental and marine sciences.

Nele Fuchs studied Philosophy, Material Culture: Textile (CvO University of Oldenburg), and Transcultural Studies (University of Bremen). As a data scientist in the humanities, she supports researchers in the digital humanities, in data science methods for qualitative research, and in FAIR-compliant qualitative data management, drawing on her expertise in handling sensitive qualitative data.