OT-ST-WS-08 | Data preparation

Dr. James Imber, Karl Kortum, Dr. Nikolay Koldunov, Dr. Stephan Kloep, Dr. Jan-Ocko Heuer

Coding

Interested in looking into characteristics of different types of data?

Data hold the answers to all manner of questions, and analysis methods can extract those answers; “data preparation” is what links the two. Anyone interested in working with data will likely need to know at least some of the principles outlined in this workshop.

Furthermore, cross-disciplinary data analysis is part of scientific progress. This workshop offers the opportunity to look into data preparation procedures for different data types and to learn about their individual characteristics.

On 07 September, we start with a general introduction to data preparation, followed by a brief overview of important aspects of preparing image data, climate model data, clinical data, and qualitative data. These aspects will be addressed in more depth in separate hands-on sessions, each focusing on a specific data type (see below). Once they know the contents of the individual sessions, participants can register for the hands-on sessions. Registration for the hands-on sessions starts on 07 September 2022.

 

07.09.2022 | General introduction and image data preparation (09:30-10:30)

An understanding of how to approach a new dataset and the typical steps that will be necessary in a variety of contexts. An insight into the preparation of image data and practical experience performing such tasks using Python.

07.09.2022 | Climate model data (10:30-11:00)

This topic will be useful for students who plan to work with models of Earth System components as their primary research tool. It will also be useful for researchers who need information about the past or future state of the Earth System as an additional parameter in their research (e.g. weather conditions when an interview was conducted or a clinical study was performed).

07.09.2022 | Clinical data (11:00-11:30)

Clinical data is collected either during the course of ongoing patient care or as part of a formal clinical trial program. To analyse clinical data, a sophisticated data management concept, including data preparation procedures, is needed to comply with data protection regulations.

07.09.2022 | Qualitative data (11:30-12:00)

The term “qualitative data” is used to describe a broad variety of heterogeneous data, including various types of text (e.g. transcripts of interviews or observations), audio, video, pictures and material artefacts. From the perspective of “quantitative” research (i.e. the application of statistical methods to standardized numerical data), qualitative materials may seem to be merely data that need more structure. But qualitative material is a specific type of data that is usually richer, more context-dependent and more sensitive than quantitative data. On the other hand, qualitative data can be fruitfully analyzed with common tools of quantitative inquiry (e.g. text mining). Thus, this workshop addresses both quantitative and qualitative researchers. It aims to introduce them to the particular ethical, legal and practical challenges of qualitative materials (e.g. in terms of data protection, informed consent, anonymization, documentation and data sharing) and to outline good practices as well as examples of fruitful research data management.

  • Own PC, laptop
  • Internet, web browser (up-to-date)
  • For the online format, a second screen might be beneficial

Contents

In the social sciences, “qualitative data” refers to various kinds of data – such as interview transcripts, observation protocols, field notes, pictures, audio or video recordings, and material artefacts – that are less structured than “quantitative data” and thus cannot – and often should not – be easily transformed into numerical relatives for statistical analysis. By contrast, qualitative data are often heterogeneous, complex, very information-rich, highly context-dependent and sensitive – thus posing particular ethical, legal and practical challenges for research data management (RDM), including data protection and informed consent, anonymization, documentation, and sharing beyond the research project of origin. This hands-on workshop aims to make participants familiar with major challenges of preparing qualitative social science data for (qualitative or quantitative) analysis and to introduce them to good practices and RDM tools to deal with those challenges. Particular attention is given to appropriate research documentation using a ‘study report’ and various context materials, and to the anonymization/pseudonymization of qualitative text data, including an introduction to the tool ‘QualiAnon’.

The following topics will be primarily addressed:

  • What are qualitative data? Examples and characteristics
  • Why, for whom and how are qualitative data important?
  • Producing and preparing qualitative data for research/analysis
  • Focus: finding, re-using and sharing qualitative data
  • Focus: documenting qualitative data and contexts
  • Focus: anonymizing qualitative (text) data
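
To make the last focus topic more concrete, the following is a minimal sketch of pseudonymizing text, assuming the names to be replaced are already known. It is not how ‘QualiAnon’ works; the function name `pseudonymize` and the `[PERSON_n]` placeholder format are hypothetical choices made for illustration only.

```python
import re


def pseudonymize(text, names):
    """Replace each known name with a stable placeholder such as [PERSON_1].

    Hypothetical helper for illustration; returns the new text and the
    name-to-placeholder mapping, which must itself be stored securely.
    """
    mapping = {}
    for name in names:
        # The same name always receives the same placeholder.
        mapping.setdefault(name, f"[PERSON_{len(mapping) + 1}]")
    if not mapping:
        return text, mapping
    # Try longer names first so a full name is matched before a part of it.
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(rf"\b{re.escape(n)}\b" for n in ordered))
    return pattern.sub(lambda m: mapping[m.group(0)], text), mapping


transcript = "Anna said that Ben moved to Bremen. Ben later called Anna."
anonymized, key = pseudonymize(transcript, ["Anna", "Ben"])
print(anonymized)
```

Replacing known names is only one step. Indirect identifiers such as places, employers or rare events usually need manual review, and the mapping has to be kept separate from the shared data.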

 

Prior knowledge

---

Technical requirements

Own computer with modern web browser; WiFi: Access to eduroam

 

Further reading

Corti, Louise; van den Eynden, Veerle; Bishop, Libby; Woollard, Matthew (2020): Managing and sharing research data: A guide to good practice. 2nd ed. Los Angeles: SAGE Publications

Contents

An introduction to the preparation of data for analysis, beginning with the initial production or acquisition through to an analysis ready dataset. The specific case of image data will then be discussed in more detail using examples from satellite-based Earth Observation.

 

Outcomes

An understanding of how to approach a new dataset and the typical steps that will be necessary in a variety of contexts. An insight into the preparation of image data and practical experience performing such tasks using Python.

 

Prior knowledge

Basic experience with programming in any language would be an advantage.

 

Technical requirements

Own computer with modern web browser; WiFi: Access to eduroam

A software environment will be provided via the online tool JupyterHub; for local installation, participants will receive installation instructions prior to the workshop.

 

Further reading

---

Contents

We will cover the following topics:

  • Where to find climate model information
  • netCDF data format
  • Basic types of atmospheric, ocean, land and sea ice data
  • Validation of model data
  • Ways to extract weather and climate information for particular regions and times in the past and the future
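
As an illustration of the last topic, here is a minimal sketch of extracting a regional mean time series from a gridded field. It assumes a regular latitude/longitude grid and uses synthetic NumPy arrays to stay self-contained; real model output would usually be read from netCDF files with a library such as xarray or netCDF4. The function name `regional_mean` and all values are hypothetical and are not taken from the course material.

```python
import numpy as np


def regional_mean(field, lats, lons, lat_range, lon_range):
    """Area-weighted mean of a (time, lat, lon) field over a lat/lon box.

    Returns one value per time step. Missing values (NaN) are not handled.
    """
    lat_mask = (lats >= lat_range[0]) & (lats <= lat_range[1])
    lon_mask = (lons >= lon_range[0]) & (lons <= lon_range[1])
    box = field[:, lat_mask, :][:, :, lon_mask]
    # On a regular lat/lon grid, grid-cell area scales with cos(latitude).
    weights = np.cos(np.deg2rad(lats[lat_mask]))
    zonal_mean = box.mean(axis=2)  # average over longitude first
    return (zonal_mean * weights).sum(axis=1) / weights.sum()


# Synthetic example: three time steps on a coarse global grid (values in K).
lats = np.arange(-90.0, 91.0, 30.0)
lons = np.arange(0.0, 360.0, 90.0)
temps = 288.0 + np.arange(3.0)[:, None, None] * np.ones((3, lats.size, lons.size))

# Each time step is spatially constant here, so the regional means
# should be close to 288, 289 and 290.
print(regional_mean(temps, lats, lons, lat_range=(30, 60), lon_range=(0, 90)))
```

The cosine weighting matters because grid cells shrink towards the poles; an unweighted average over a box would over-represent high latitudes.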

Course on GitHub: https://github.com/koldunovn/DT_model_data

Outcomes

Basic understanding of the strengths and weaknesses of data from Earth System models. Information on what kinds of weather and climate data are available, how to obtain them, and how to extract and post-process them.

 

Prior knowledge

Some exercises will require a degree of familiarity with the Python programming language. Some specific extension packages will be used but prior knowledge of these will not be assumed. Basic experience with programming in any language would be an advantage.

 

Technical requirements

Own computer with modern web browser; WiFi: Access to eduroam

A software environment will be provided via the online tool JupyterHub; for local installation, participants will receive installation instructions prior to the workshop.

 

Further reading

---

General introduction

When?

07.09.2022, 09:30 - 12:00


Where?

Online via VC


Language?

English


Registration deadline: 23.09.2022

Other status groups or externals:

Free places will be offered to candidates on the waiting list after registration has closed (one week before the workshop takes place).

Optional hands-on sessions:

PART 1: 28.09.2022, Qualitative data

PART 2: 29.09.2022, Image data

PART 3: 29.09.2022, Climate model data

> 10:00 - 12:00, 13:00 - 15:00 (each session)


Where?

MARUM, Room 2070


Language?

English


Registration starts after the general introduction to the different parts on 07.09.2022

Registration deadline: 27.09.2022

Dr. James Imber & Karl Kortum

Researcher at the Remote Sensing Technology Institute of the German Aerospace Center (DLR) as a member of the Synthetic Aperture Radar (SAR) Oceanography group

&

PhD Candidate at the Remote Sensing Technology Institute of the German Aerospace Center (DLR) as a member of the Synthetic Aperture Radar (SAR) Oceanography group

Dr. Nikolay Koldunov

Scientist at Alfred Wegener Institute

Dr. Stephan Kloep

Data Manager at the Competence Center for Clinical Studies at the University of Bremen

Dr. Jan-Ocko Heuer

Postdoctoral Researcher, Research Data Center (RDC) Qualiservice, SOCIUM Research Center on Inequality and Social Policy, University of Bremen