Completed | 1. Data science and big data, 26.01.2022 | 2 - 4 pm
Parallel to the digital transformation, a new scientific discipline has developed: data science. Data science enables new approaches to interdisciplinary (big) data analysis using complex algorithms and artificial intelligence (machine learning, deep learning, etc.). Such approaches can extract information from data sets that goes beyond current scientific knowledge. Data science is therefore of interest to nearly all fields of research, industry and the economy, and is often termed a new key discipline (e.g. Society of Informatics e.V., 2019). This course provides a basic overview of data science applications.
Producing reliable data science results requires profound knowledge of data analysis methods, data management techniques and innovative technologies. In addition, assessing these results and approaches demands an awareness of their ethical, legal and social implications (all of these topics are addressed in the following courses and operator tracks).
1. History (timeline comparison with CPU power and storage costs) & clarification of terms
- Statistics > Machine Learning > Deep Learning
- Data Mining > Big Data
- Machine Learning vs. Artificial Intelligence
2. What is Data Science?
- Collection > Analysis > Visualization
- Machine Learning
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
- Big Data (data science on datasets so large that they need more memory than a single PC provides)
- Languages (e.g. Python, R)
Basic overview of data science applications, methods, terms, tools and big data.
Completed | 2. Philosophical reflections on data science, 04.02.2022 | 10 am - 12 pm
A critical awareness (“Critical Thinking”, see below) is crucial for an appropriate and reasonable assessment of data preparation, sharing and utilization in the context of research data management, data protection and data science applications.
Furthermore, critical thinking establishes a common language across disciplines that is aware of limits and difficulties, and it is thereby essential for cooperative, future-oriented research.
Philosophy is often about “big concepts”: concepts such as knowledge, understanding, autonomy, transparency, intelligence and creativity. All of these concepts are at stake in current research in data science and artificial intelligence. It seems inescapable that we lose some of our own autonomy once our cars start driving autonomously and our houses become smarter and smarter. Computers have outsmarted us at number crunching for decades, but will they also outsmart us in creativity? Will they become the “better scientists”, or will there always remain a difference between “pure prediction” and “real understanding”? Is predictive success acceptable even if it comes with a loss of transparency? After all, transparency is something we care about a great deal, not only in science but in all kinds of political and societal contexts. At the same time, privacy and data protection laws are a major theme in public discourse as well. Consider tracking apps, for instance: do we really want to become transparent citizens and consumers, X-rayed, as it were, by a machine learning algorithm that no one might actually understand? Suggested further reading is listed below.
Critical Thinking: critical reflection on one's own research/work and development of empathy for other disciplines, their mindsets and ways of thinking.
- Schneider, P., et al. (2020). Rethinking drug design in the artificial intelligence era. Nature Reviews Drug Discovery 19: 353–364. https://doi.org/10.1038/s41573-019-0050-3
- Boden, M. A. (1998). Creativity and artificial intelligence. Artificial Intelligence 103(1): 347-356
- Burge, T. (1998). Computer Proof, Apriori Knowledge, and Other Minds. Noûs, 32: 1-37. https://doi.org/10.1111/0029-4624.32.s12.1
- Iten, R., et al. (2020). Discovering Physical Concepts with Neural Networks. Physical Review Letters 124(1): 010508
Completed | 3. Asking the right research questions in data science, 08.02.2022 | 9 - 11 am
“An approximate answer to the right question is worth a great deal more than a precise answer to the wrong question,” said the renowned statistician John Tukey as early as 1969.
In my own experience of statistical consulting, much confusion arises from a mismatch between the research question and the data/methods. Even more fundamentally, however, the research question is often not clearly articulated at the outset, perhaps because researchers anticipate that the right question can only be answered approximately. But how can we discuss which data and methods are suitable if we are unclear or vague about the question to be answered? Now, in the era of big data, characterised by an abundance of data and a similar abundance of methods for analysing them, the issue of asking the right question seems to acquire a new urgency.
In this course we will discuss the different types of research questions one might face in a variety of applied fields within data science, such as psychology, epidemiology, genetics, or political & social sciences. Key distinctions concern questions that are (i) descriptive, (ii) predictive, or (iii) causal (i.e. about counterfactual prediction). We will consider how these types of research questions are interrelated with the choices / requirements of data, methods of analysis, and the need for more or less specific subject matter background knowledge. We will see how starting with a clear and explicit research question helps with assessing, and maybe avoiding, potential sources of (structural) bias in answering that research question.
Key topics that will be covered:
- Types of research questions (descriptive, predictive, causal/counterfactual)
- Issues of validity and structural bias (e.g. selection, confounding, ascertainment)
- The target trial principle
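In the spirit of Huitfeldt's caviar example in the reading list for this course, the following toy simulation sketches why a predictive question and a causal question about the same two variables can have different answers. All numbers (base rate of millionaires, caviar probabilities) are hypothetical assumptions chosen for illustration, and the function names are made up. In the observational data, wealth drives caviar consumption, so caviar strongly predicts millionaire status; when caviar is assigned by a coin flip, as in a randomized trial, the association disappears because caviar has no effect on wealth in this model.

```python
import random

def simulate(n, rng, randomize=False):
    """Toy model (hypothetical numbers): being a millionaire makes eating caviar
    more likely, but caviar itself has no effect on wealth.
    If randomize=True, caviar is assigned by a coin flip, as in a randomized trial."""
    people = []
    for _ in range(n):
        millionaire = rng.random() < 0.05  # hypothetical base rate of millionaires
        if randomize:
            caviar = rng.random() < 0.5  # assignment independent of wealth
        else:
            caviar = rng.random() < (0.6 if millionaire else 0.02)  # wealth drives caviar
        people.append((caviar, millionaire))
    return people

def risk(people, caviar):
    """Proportion of millionaires among people with the given caviar status."""
    group = [m for c, m in people if c == caviar]
    return sum(group) / len(group)

rng = random.Random(1)
observational = simulate(100_000, rng)
trial = simulate(100_000, rng, randomize=True)

# Predictive question: does caviar help to predict millionaire status? Yes, strongly.
print("observational:", risk(observational, True), "vs", risk(observational, False))
# Causal question: would giving people caviar make them millionaires? No.
print("randomized:   ", risk(trial, True), "vs", risk(trial, False))
```

The same observational data answer the predictive question well but would badly mislead if read causally, which is why the type of question has to be fixed before choosing data and methods.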
Upon completion, participants of the course will be able to
- categorise research questions as descriptive, predictive or causal
- elicit a research question by formulating a target trial
- determine implications for the required data and choice of appropriate methods
- identify possible threats to validity / sources of structural bias.
Some prior exposure to or experience with analysing data will be helpful.
- Hernán, M. A., Hsu, J. & Healy, B. (2019). A Second Chance to Get Causal Inference Right: A Classification of Data Science Tasks. CHANCE 32(1): 42–49.
- Hernán, M. A. & Robins, J. M. (2016). Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available. American Journal of Epidemiology 183(8): 758–764.
- Huitfeldt, A. (2016). Is caviar a risk factor for being a millionaire? BMJ 355: i6536. doi:10.1136/bmj.i6536; https://www.bmj.com/content/355/bmj.i6536
Completed | 4. Statistical thinking, 16.02.2022 | 2 - 4 pm
Data science approaches are based on statistical/mathematical methods as well as on computer science skills. In this context, it is crucial to understand the basic principles of statistical methods, since this helps to apply them adequately and to produce reliable statistical results.
This course provides an introduction to the statistical basics and concepts relevant for data science applications. After a brief presentation of the categories of statistics (descriptive, predictive, confirmatory) and their general ideas, selected basic methods will be explained and illustrated with practical examples: the concept of probability, parameter estimation, confidence intervals and hypothesis testing.
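To make parameter estimation and confidence intervals concrete, here is a minimal sketch of an approximate 95% confidence interval for a mean. It assumes the normal approximation (reasonable for fairly large samples; for small samples the t-distribution is the usual choice) and uses simulated data with a known true mean, which real data never provide. The function name `mean_confidence_interval` is hypothetical. The second part repeats the experiment many times to show what the confidence level actually means: the long-run share of intervals that contain the true parameter.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, level=0.95):
    """Approximate confidence interval for the population mean.
    Uses the normal approximation; for small samples a t-quantile would be used instead."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    m = mean(sample)
    se = stdev(sample) / sqrt(n)               # estimated standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return m - z * se, m + z * se

rng = random.Random(0)
true_mean = 5.0  # known here only because the data are simulated (hypothetical values)
sample = [rng.gauss(true_mean, 0.3) for _ in range(200)]
print("point estimate:", round(mean(sample), 3))
print("95% CI:", mean_confidence_interval(sample))

# Interpretation: over many repeated samples, about 95% of such intervals contain the true mean.
hits = 0
repeats = 1000
for _ in range(repeats):
    low, high = mean_confidence_interval([rng.gauss(true_mean, 0.3) for _ in range(200)])
    hits += low <= true_mean <= high
print("coverage:", hits / repeats)
```

Note that any single interval either contains the true mean or it does not; the 95% refers to the procedure, not to one computed interval.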
A basic understanding of the major statistical principles.
- Fahrmeir, Heumann, Künstler, Pigeot, Tutz (2016). Statistik – Der Weg zur Datenanalyse, 8th edition, Springer-Verlag, Berlin, Heidelberg.
- Fahrmeir, Künstler, Pigeot, Tutz, Caputo, Lang (2009). Arbeitsbuch Statistik, 5th edition, Springer-Verlag, Berlin, Heidelberg.
- Freedman, Pisani, Purves (1998). Statistics, 3rd edition, W.W. Norton and Company, New York.
- Spiegelhalter (2019). The Art of Statistics: Learning from Data, Pelican, London.
Completed | 5. Digital ethics, 22.02.2022 | 2 - 4 pm
Digital ethics is of special interest to data scientists, engineers and managers because their results and products have a direct impact on individuals and society. Ethics is concerned with respecting the dignity and vulnerability of human beings and with contributing to a good life. Ethics for Data Science, like Medical Ethics or Engineering Ethics, is aspirational (providing values as goals to achieve) and preventive (providing tools for understanding and avoiding problematic outcomes). Given the numerous areas of application, the range of topics is as varied and complex as it is fascinating.
- Thematic areas of digital ethics
- Introduction to theories and tools for digital ethics
- Types of ethical concerns with algorithms
- Values for digital ethics
Participants will acquire an understanding of the many areas of ethical concern affected by digital technologies, and of the various principles and values at play when dealing with these concerns. A basic map of tools for ethical thinking will be presented that provides orientation for further research into questions of digital ethics.
Completed | 6. About the meaningfulness of data, 01.03.2022 | 10 am - 12 pm
Data are not, as the etymology suggests, “the given”; they are generated, constructed or made (sometimes in the bad sense of that word). We therefore need to shed some light on the implicit and hidden presuppositions in our scientific agenda. For a start, let’s assume that there is no meaning in the data per se, but that meaning happens to data: it is attached to them. In fact, YOU attach it, and therefore you must assume liability for it in both the scientific and the legal sense. Technological sophistication and programming skills might not be enough; there is an extra mile for you to walk in the vast realm of philosophy. To identify how your scientific attitudes come to bear, and how your decisions as a researcher along the rocky road of empirical research add to, detract from or alter the meaningfulness of data, we will span a wide range from epistemological paradigms down to specific choices of statistical models for handling missing data, bridged by measurement theory and its map of pathways from the (in)tangible world to numbers.
- Notions of meaning, data, and information
- Epistemology and Ontology: how data refer to what is being measured
- The ideal research process: are data decisive? A menu of paradigms
- Data and theory. Realism – Anti-realism – Pragmatism. Models. Truth.
- Introduction to measurement theory: do you abide by the rules?
- Fuzziness, vagueness, uncertainty, incompleteness: “bad” data?
- Missing data: how your philosophical stance indeed impacts study results
Since, as a psychologist and statistician, I cannot claim expertise in your respective fields of work, I will not, and cannot, tell you how to “do it right”. But the patterns behind “doing it wrong” are quite universal: unawareness and lack of transparency. My aim is to make the implicit explicit and to foster a critical mindset when it comes to relating data to meaning in your specific discipline.
None. Just be nosy and open-minded.
Completed | 7. Data and information management, 09.03.2022 | 1 - 4 pm
Comprehensive management of research data is part of every research project and belongs to good scientific practice. It accompanies each phase of a research project, from the proposal phase through data acquisition and data analysis to the publication phase. The overall goal of research data management is the production of findable (F), accessible (A), interoperable (I) and reusable (R), i.e. FAIR, data sets.
Good data stewardship (following the FAIR principles; Wilkinson et al., 2016) and an open data culture (Nosek et al., 2015) foster reproducibility and sustainability in science and form the foundation for data science applications.
- Research data: data life cycle and associated challenges
- Data management plans (DMP)
- FAIR data principles
- Metadata: standardization and its significance
- Archiving, publication and citation of research data sets
- Understanding of the significance of research data management and an overview of concepts and approaches
- Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).
- Wilkinson, M. D. et al. Comment: A design framework and exemplar metrics for FAIRness. Sci. Data 5, 1–4 (2018).
- Hodson, S. et al. Turning FAIR data into reality: interim report from the European Commission Expert Group on FAIR data (Version Interim draft). Interim Rep. from Eur. Comm. Expert Gr. FAIR data (2018). https://doi.org/10.5281/zenodo.1285272
- Collins, S. et al. FAIR Data Action Plan. Interim Recomm. actions from Eur. Comm. Expert Gr. FAIR data 1–21 (2018). https://doi.org/10.5281/zenodo.1285290
- Wilkinson, M. D. et al. Interoperability and FAIRness through a novel combination of Web technologies. PeerJ Comput. Sci. 3, e110 (2017).
- Mons, B. et al. Cloudy, increasingly FAIR; Revisiting the FAIR Data guiding principles for the European Open Science Cloud. Inf. Serv. Use 37, 49–56 (2017).
Completed | 8. Data protection and licenses, 17.03.2022 | 3 - 5 pm
Compliance with legal requirements in the handling of research data is an indispensable requirement for the long-term success of research data management.
Legal framework of research data management with a special focus on questions of copyright law and data protection law.
Acquiring basic legal knowledge of the possibilities and limitations of research data management.
- Christen/Ranbaduge/Schnell, Linking Sensitive Data. Methods and Techniques for Practical Privacy-Preserving Information Sharing, Chapter 2, 2020
- Donnelly/McDonagh, Research, Consent and the GDPR Exemption, European journal of health law, 2019, p. 97
- Ducato, Data protection, scientific research, and the role of information, Computer Law & Security Review, 2020, 105412
Completed | 9. Managing confidential data, 24.03.2022 | 10 am - 12 pm
Many data are classified as confidential because they contain sensitive personal or institutional information. Their confidentiality limits the use of these data, but they can still be of great benefit to research, provided they are used and managed properly. In this course we will focus on data collected in an applied, industry-related context; concrete applications from wind energy research will serve as examples. We will discuss which challenges arise in this context and how to deal with them to generate the best possible research output (not least with the requirements of open science in mind).
- Classification of confidential data
- Specific requirements for managing confidential data
- Approaches for doing (open) science based on confidential data
Understanding the specific requirements of confidential data and learning methods for working with / conducting research based on confidential data.
Completed | 10. Managing qualitative data, 29.03.2022 | 9 am - 12 pm
The term “qualitative data” is used for various kinds of non-standardized materials in qualitative social research, including various types of text (e.g. interview transcripts, observation protocols), images, audiovisual data and material artefacts. From the perspective of “quantitative” research (i.e. the application of statistical methods to standardized numerical data), qualitative materials may seem to be merely data that need more structure. But qualitative material is a specific type of data that is usually richer, more context-dependent and more sensitive than quantitative data. At the same time, qualitative data can be fruitfully analyzed with common tools of quantitative inquiry (e.g. text mining). This lecture therefore addresses both quantitative and qualitative researchers. It introduces the particular ethical, legal and practical challenges of managing qualitative materials (e.g. data protection, informed consent, anonymization, documentation and data sharing) and outlines good practices as well as examples of fruitful data management.
1. Introducing qualitative data & research
- What is qualitative research? Aims, examples & characteristics
- Quantitative versus qualitative data & research processes
- Methodology, context and data in qualitative inquiry
- Mixing qualitative and quantitative data and research
2. Managing qualitative data in practice
- Data collection & informed consent
- Data transformations for analysis
- Archiving & sharing data
- Finding & re-using data
Basic overview of qualitative research data and their management.
- Corti, Louise; van den Eynden, Veerle; Bishop, Libby; Woollard, Matthew (2020): Managing and sharing research data: A guide to good practice. 2nd ed. Los Angeles: SAGE Publications.
Completed | 11. Computer science basics for data science, 27.04.2022 | 2 - 4 pm
Computer science is a key component of data science applications and research data management, because their methods and procedures rely on it. For instance, to enable fast access to information, data sets must be stored efficiently in suitable data structures. Clever modelling and algorithmic processing then allow fast search and selection of information even in big data sets. This course provides insights into the basics of computer science and gives an overview of topics relevant to data science.
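The point that the choice of data structure determines how fast information can be found can be sketched with three ways of locating a record ID. This is an illustration, not course material: the IDs are hypothetical, and the example uses Python's standard `bisect` module and built-in `dict`. A linear scan inspects elements one by one (O(n)), binary search on a sorted list halves the search range at each step (O(log n)), and a hash table answers key lookups in O(1) time on average at the cost of extra memory.

```python
import bisect

def linear_search(items, target):
    """Scan every element: O(n) comparisons in the worst case."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1

def binary_search(sorted_items, target):
    """Halve the search range at each step: O(log n) comparisons, but requires sorted input."""
    i = bisect.bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1

ids = list(range(0, 200_000, 2))  # hypothetical sorted record IDs (even numbers only)
print(linear_search(ids, 199_998), binary_search(ids, 199_998))  # 99999 99999

# A hash table (Python dict) maps each key directly to its position: average O(1) lookup.
index_by_id = {record_id: pos for pos, record_id in enumerate(ids)}
print(index_by_id.get(199_998))  # 99999
```

Binary search only works because the list is kept sorted, which is one reason sorting algorithms are a standard topic in this course.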
- Computer science and its subdisciplines: applied, technical, practical, theoretical
- Programming languages
- Data storage and processing
- Data structures
- Example: Sorting (Bubble Sort, Merge Sort, Quicksort)
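As an illustration of two of the sorting algorithms listed above, here is a minimal sketch of bubble sort and merge sort in Python (Quicksort is omitted for brevity). The function names are our own, and in practice one would simply call Python's built-in `sorted()`. Bubble sort repeatedly swaps adjacent elements and needs O(n²) comparisons in the worst case; merge sort splits the list, sorts both halves recursively and merges them, needing O(n log n) comparisons.

```python
def bubble_sort(items):
    """Return a sorted copy using bubble sort: O(n^2) comparisons in the worst case."""
    result = list(items)
    for end in range(len(result) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if result[i] > result[i + 1]:  # swap adjacent elements that are out of order
                result[i], result[i + 1] = result[i + 1], result[i]
                swapped = True
        if not swapped:  # no swaps in a full pass: the list is already sorted
            break
    return result

def merge_sort(items):
    """Return a sorted copy using merge sort: O(n log n) comparisons."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # "<=" keeps equal elements in their original order (stable)
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])  # at most one of these two still has elements left
    merged.extend(right[j:])
    return merged

data = [5, 2, 9, 1, 5, 6]
print(bubble_sort(data))  # [1, 2, 5, 5, 6, 9]
print(merge_sort(data))   # [1, 2, 5, 5, 6, 9]
```

Both functions return new lists and leave the input unchanged; for a million elements the difference between n² and n log n is roughly a factor of 50,000 in comparisons.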
Basic overview of computer science and its subdisciplines; basics of systems engineering.
- Martin Dietzfelbinger, Kurt Mehlhorn, Peter Sanders, Algorithmen und Datenstrukturen - Die Grundwerkzeuge, Springer, 2014
- Kurt Mehlhorn, Datenstrukturen und effiziente Algorithmen - Band 1: Sortieren und Suchen, Vieweg+Teubner, 2012
Completed | 12. Overview about programming languages, 05.05.2022 | 10 am - 12 pm
Programming is the essential tool for managing data sets and applying data science methods. It is crucial for
- documentation work
- data preparation
- quality control of data sets
- data analyses
- transforming data into graphics
and makes it possible to handle even big data sets.
- What actually is a programming language? What characterizes a programming language and what is it for?
- Why is HTML not a programming language, and what does Turing have to do with it?
Approximately 700 programming languages exist: how can one keep an overview? We learn to distinguish languages by their degree of abstraction, their programming paradigm (imperative, procedural, object-oriented, functional, logical, …) or their area of application. We also discuss which programming languages you should know, and some of them are briefly presented in this course.
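The difference between programming paradigms can be sketched by solving one small task, summing the squares of the even numbers in a list, in two styles. Python is used here only because it supports several paradigms; the function names are hypothetical. The imperative version describes *how* to compute the result by updating a variable step by step, while the functional version describes *what* the result is as a composition of `filter`, `map` and `reduce`, without mutable state.

```python
from functools import reduce

def sum_even_squares_imperative(values):
    """Imperative style: a mutable variable is updated step by step inside a loop."""
    total = 0
    for v in values:
        if v % 2 == 0:
            total += v * v
    return total

def sum_even_squares_functional(values):
    """Functional style: the result is a composition of functions, without mutable state."""
    evens = filter(lambda v: v % 2 == 0, values)
    squares = map(lambda v: v * v, evens)
    return reduce(lambda acc, x: acc + x, squares, 0)

numbers = [1, 2, 3, 4, 5, 6]
print(sum_even_squares_imperative(numbers))  # 56
print(sum_even_squares_functional(numbers))  # 56
```

Languages differ in which of these styles they encourage or enforce: a purely functional language such as Haskell has no mutable variables at all, whereas C is essentially imperative.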
Overview of programming languages, their features, significance and criteria for distinction.
Completed | 13. Cryptography basics, 11.05.2022 | 10 am - 12 pm
Cryptography is the key technology for ensuring the security and privacy of IT systems. Understanding the basic principles of cryptographic functions is an indispensable prerequisite for the development of modern IT systems.
This course provides elementary knowledge of cryptography in theory and practice, covering topics such as asymmetric vs. symmetric encryption, cryptographic hash functions, digital signatures and public-key infrastructures, and post-quantum cryptography.
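One of the topics above, cryptographic hash functions, can be tried out directly with Python's standard `hashlib` module. The sketch below (helper names are hypothetical) computes SHA-256 digests, counts how many output bits change when a single input character changes (the "avalanche effect": typically about half of the 256 bits), and checks data against a previously published digest using the constant-time `hmac.compare_digest`. Note the assumption behind that check: a plain hash only detects tampering if the expected digest itself comes from a trusted source; otherwise a keyed MAC or a digital signature is needed.

```python
import hashlib
import hmac

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

def differing_bits(a_hex: str, b_hex: str) -> int:
    """Count how many bits differ between two equal-length hex digests."""
    return bin(int(a_hex, 16) ^ int(b_hex, 16)).count("1")

def matches_published_digest(data: bytes, expected_hex: str) -> bool:
    """Compare the digest of `data` with an expected digest in constant time."""
    return hmac.compare_digest(sha256_hex(data), expected_hex)

d1 = sha256_hex(b"research data v1")  # hypothetical file contents
d2 = sha256_hex(b"research data v2")  # one character changed
print(d1)
print(d2)
print("differing bits:", differing_bits(d1, d2))  # typically close to 128 of 256
print(matches_published_digest(b"research data v1", d1))  # True
print(matches_published_digest(b"research data v2", d1))  # False
```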
Basic knowledge of cryptography, in particular enabling participants to assess the strength of cryptographic methods in practice.
- C. Paar, J. Pelzl: Understanding Cryptography
- A. J. Menezes et al.: Handbook of Applied Cryptography
Completed | 14. Security & privacy, 18.05.2022 | 10 am - 12 pm
Security and privacy are key aspects in developing and maintaining trustworthy systems. A lack of security results in vulnerable systems that are exposed to potential attackers and pose an incalculable economic and personal risk. As personal data have become the new currency of the digital era, protecting them from unauthorized processing and distribution is key to preserving the privacy and self-determination of individuals.
Techniques to measure and enhance the security / privacy of IT-systems
- Security: security protocols, security policies and their enforcement (e.g. access control, data flow control)
- Privacy: GDPR, privacy-enhancing techniques (e.g. differential privacy, k-anonymity)
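The two privacy-enhancing techniques named above can be sketched in a few lines of Python. This is an illustrative sketch, not a production implementation: the records are hypothetical and already generalised, and the function names are our own. A table is k-anonymous with respect to a set of quasi-identifiers if every combination of their values occurs in at least k records. The Laplace mechanism makes a counting query ε-differentially private by adding Laplace noise with scale 1/ε, since adding or removing one person changes a count by at most 1 (sensitivity 1). The noise is sampled as the difference of two exponential variates, which follows a Laplace distribution. A real deployment would also need careful handling of floating-point issues and privacy budgets.

```python
import random
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) noise as the difference of two exponential variates."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def dp_count(records, predicate, epsilon, rng=random):
    """Laplace mechanism for a counting query (sensitivity 1, noise scale 1/epsilon)."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)

# Hypothetical records, already generalised to age band and 3-digit postcode prefix
records = [
    {"age": "30-39", "zip": "381", "diagnosis": "A"},
    {"age": "30-39", "zip": "381", "diagnosis": "B"},
    {"age": "40-49", "zip": "382", "diagnosis": "A"},
    {"age": "40-49", "zip": "382", "diagnosis": "A"},
]
print(is_k_anonymous(records, ["age", "zip"], k=2))  # True
print(is_k_anonymous(records, ["age", "zip"], k=3))  # False

rng = random.Random(42)
print(dp_count(records, lambda r: r["diagnosis"] == "A", epsilon=0.5, rng=rng))
```

The example also shows a known limitation of k-anonymity: the table is 2-anonymous, yet everyone in the (40-49, 382) group has diagnosis A, so membership in that group alone reveals the diagnosis (the homogeneity problem that motivates extensions such as l-diversity). Differential privacy instead bounds what any single record can change in the published result; smaller ε means more noise and stronger protection.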
This course provides basic knowledge in security and privacy techniques and sketches their underlying foundations.