
From Audio to Text: DSC Brings AI-Supported Transcription to the Digital Humanities

At the DHd2026 conference in Vienna, Nele Fuchs and Annika Nolte led a hands-on workshop on automated audio transcription with Whisper. The session highlights how the DSC supports practical data skills in the Digital Humanities as part of its DataNord project.

On February 24, 2026, Nele Fuchs and Annika Nolte from the Data Science Center (DSC) led a full-day workshop at DHd2026, “Not only text, not only data,” in Vienna. The annual conference brings together researchers from across the Digital Humanities to explore how diverse data types, methods, and algorithmic approaches can be sustainably integrated into humanities research.

Within this context, the two DataNord team members offered the workshop “From Audio to Text: Automated Transcription with Whisper.” It was aimed at researchers and multipliers who want to integrate qualitative audio data more efficiently into their research workflows.

Automated transcription as an entry point to data-intensive Digital Humanities research

Interviews play a central role in many areas of the Digital Humanities, including oral history, linguistics, and ethnography. At the same time, manual transcription is extremely time-consuming. In the workshop, participants learn how this process can be significantly streamlined using Whisper, an open-weight automatic speech recognition model.

Whisper generates initial transcript drafts that can then be reviewed and refined manually. A key focus of the workshop is the responsible use of AI tools in research – covering topics such as data protection, hardware requirements (CPU vs. GPU), and the limitations of automated transcription quality.
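As a rough illustration of this draft-then-review workflow, the following sketch uses the open-source `openai-whisper` Python package. It is not taken from the workshop materials: the helper names, the model size `"base"`, and the file name `interview.wav` are assumptions chosen for the example.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_draft(segments) -> str:
    """Turn Whisper-style segments into a timestamped draft for manual review."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} - {end}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_to_draft(audio_path: str, model_size: str = "base") -> str:
    """Transcribe one audio file locally and return a draft transcript.

    Needs the third-party packages `openai-whisper` and `torch` plus ffmpeg.
    They are imported here so the helpers above work without them.
    """
    import torch
    import whisper

    # Prefer a GPU when one is available; the CPU also works but is slower.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = whisper.load_model(model_size, device=device)
    result = model.transcribe(audio_path)
    return segments_to_draft(result["segments"])


# Hypothetical usage (the file name is an assumption):
# print(transcribe_to_draft("interview.wav"))
```

Because the model runs on the local machine in this setup, the audio does not have to be sent to an external service, which can matter for interviews containing personal data. The resulting draft still needs human review before it is used as a source.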

In a hands-on session, participants work with Whisper using provided audio files or their own material. Together, they reflect on how automatic transcription affects source data – especially when large datasets cannot be fully checked manually – and discuss the implications for subsequent (semi-)automated analyses.
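When a collection is too large to proofread in full, one possible way to prioritize manual checks is to look at the per-segment scores that `openai-whisper` returns. The sketch below is an assumption-laden example rather than the workshop's method: the function name is hypothetical, and the thresholds simply mirror the package's default `logprob_threshold` and `no_speech_threshold` values. These scores are rough indicators, not a guarantee that unflagged segments are correct.

```python
def flag_for_review(segments, min_avg_logprob=-1.0, max_no_speech_prob=0.6):
    """Return the segments whose scores suggest a manual check is worthwhile.

    Each segment is a dict as found in the "segments" list of a Whisper
    transcription result. Missing scores are treated as unremarkable.
    """
    flagged = []
    for seg in segments:
        low_confidence = seg.get("avg_logprob", 0.0) < min_avg_logprob
        maybe_silence = seg.get("no_speech_prob", 0.0) > max_no_speech_prob
        if low_confidence or maybe_silence:
            flagged.append(seg)
    return flagged


# Illustrative data only; real values come from model.transcribe(...)["segments"].
example_segments = [
    {"start": 0.0, "end": 3.0, "text": " Clear speech.",
     "avg_logprob": -0.2, "no_speech_prob": 0.01},
    {"start": 3.0, "end": 6.0, "text": " Mumbled passage.",
     "avg_logprob": -1.4, "no_speech_prob": 0.10},
    {"start": 6.0, "end": 9.0, "text": " ...",
     "avg_logprob": -0.5, "no_speech_prob": 0.85},
]
to_check = flag_for_review(example_segments)
```

A list like `to_check` could guide spot checks, but any downstream (semi-)automated analysis should still document that the remaining segments were not verified by hand.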

Open teaching materials for sustainable data skills

The workshop’s teaching concept and accompanying Jupyter notebooks are available as Open Educational Resources (OER) on Zenodo and can be reused by multipliers in their own teaching and training formats. These materials were developed as part of the BMFTR-funded DataNord project.

In this way, the DSC strengthens data skills in the humanities through hands-on work with real research practices – from data preparation and AI-supported workflows to questions of reproducibility.

DataNord and the DSC: building data skills for the humanities

The workshop reflects the broader approach of DataNord and the DSC: teaching data skills in a discipline-specific and interdisciplinary way. Especially in the Digital Humanities, where text, audio, images, and computational methods increasingly intersect, accessible entry points to data science, research data management, and AI are essential.

Through formats like this, the DSC brings data science expertise directly into humanities research – while also creating space for exchange around the methodological, technical, and ethical dimensions of data-intensive scholarship.


Additional links:

Book of Abstracts (with DSC contribution: Fuchs, N., Nolte, A., Steinmann, L., Drechsler, R., 2026. Vom Audio zum Text: Automatisierte Transkriptionen mit Whisper. pp. 79-81)
https://zenodo.org/records/18693970 (OER Resource From Audio to Text: Automated Transcriptions with Whisper – An Open Educational Resource)
DataNord

If you have any questions, please contact:

Nele Fuchs
DSC Data Scientist | Humanities
Tel. +49 (421) 218 59853
E-Mail: n.fuchs@uni-bremen.de

Annika Nolte
DSC Data Scientist | Environmental and Marine Sciences
Tel. +49 (421) 218 59856
E-Mail: anolte@uni-bremen.de

 
