OT-SC-WS-04 | Evaluating machine learning and artificial intelligence algorithms

Prof. Dr. Werner Brannath, Dr. Max Westphal


Artificial Intelligence (AI) and Machine Learning (ML) methods have been applied successfully in many areas, with prominent examples such as the AlphaGo algorithm. However, there are also examples of ML/AI-algorithms that perform quite poorly (e.g. IBM Watson). When building an ML/AI-algorithm, a large number of prediction or classification models are explored with the goal of selecting the seemingly best one. This can lead to a severe overestimation of the prediction or classification ability (also called “performance”). The algorithm also depends on the data used to determine it. Weak performance of an ML/AI-algorithm is often difficult to identify but can cause severe harm when the algorithm is applied.
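The overestimation caused by selecting the seemingly best model can be illustrated with a small simulation. The sketch below is not taken from the course materials; the function name, parameters, and the purely synthetic setup (coin-flip labels and models that only guess) are assumptions chosen for illustration. Because every model has a true accuracy of 0.5, any apparent advantage of the selected model on the validation data is due to selection alone.

```python
import random


def simulate_selection_bias(n_models=200, n_val=100, n_test=10_000, seed=0):
    """Select the best of many pure-guessing 'models' on validation data,
    then score the selected one on fresh test data (hypothetical setup)."""
    rng = random.Random(seed)
    # Labels are fair coin flips, so the true accuracy of every model is 0.5.
    y_val = [rng.getrandbits(1) for _ in range(n_val)]
    y_test = [rng.getrandbits(1) for _ in range(n_test)]

    best_val_acc, best_test_acc = -1.0, None
    for _ in range(n_models):
        # A stand-in "model": random guesses that are unrelated to the labels.
        val_preds = [rng.getrandbits(1) for _ in range(n_val)]
        test_preds = [rng.getrandbits(1) for _ in range(n_test)]
        val_acc = sum(p == y for p, y in zip(val_preds, y_val)) / n_val
        if val_acc > best_val_acc:
            # Keep the model that looks best on the validation data.
            best_val_acc = val_acc
            best_test_acc = sum(p == y for p, y in zip(test_preds, y_test)) / n_test
    return best_val_acc, best_test_acc


if __name__ == "__main__":
    val_acc, test_acc = simulate_selection_bias()
    print(f"validation accuracy of selected model: {val_acc:.3f}")
    print(f"test accuracy of selected model:       {test_acc:.3f}")
```

The validation accuracy of the selected model lies clearly above 0.5, while its accuracy on the independent test data stays close to 0.5. The gap between the two is the optimism introduced by model selection.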

ML/AI-algorithms therefore need to be evaluated with specific methods and techniques (e.g. based on cross-validation or bootstrapping) that avoid overestimation. One also needs to distinguish carefully between the performance of the model building process itself (unconditional performance) and the prediction/classification ability of the finally selected algorithm (conditional performance). In this course we will illustrate the difficulties and challenges in judging ML/AI performance and introduce a number of techniques for the reliable estimation of unconditional and conditional performance. The course content is essential for those who aim to build an ML/AI-algorithm and helpful to those who want to apply an existing one.


  • Motivation: why do we need (quantitative) method evaluation in ML/AI?
  • Definition of performance measures of ML/AI solutions, primarily for supervised methods (classification, regression), but also unsupervised methods (e.g. clustering)
  • Statistical inference for selected performance measures (estimation, statistical testing, confidence intervals)
  • Important terminology and concepts (in-sample vs. out-of-sample performance, conditional vs. unconditional performance)
  • Practical aspects (experimental design, study planning)
  • Application of evaluation methods to case-studies in R


  • Understanding of the challenges and difficulties with the evaluation of ML/AI-algorithms
  • Understanding and knowing how to apply basic and more advanced methods for ML/AI-algorithm evaluation

Prior knowledge

  • Basic statistical knowledge (e.g. Statistical Basics, Quantitative analysis)
  • Basic machine learning skills (e.g. Machine learning algorithms, Deep learning/neural networks)



  • Own PC or laptop
  • For the online format, a second screen might be beneficial
  • Japkowicz, Nathalie, and Mohak Shah. Evaluating learning algorithms: a classification perspective. Cambridge University Press, 2011.
  • Kuhn, Max, and Kjell Johnson. Applied predictive modeling. Vol. 26. New York: Springer, 2013.
  • Raschka, Sebastian. "Model evaluation, model selection, and algorithm selection in machine learning." preprint arXiv:1811.12808 (2018).


01.11.2021, 09:00 - 17:00

03.11.2021, 09:00 - 17:00

05.11.2021, 09:00 - 16:30


Online via VC



Prof. Dr. Werner Brannath

  • Professor of Applied Statistics and Biometry at the Faculty of Mathematics and Computer Science at the University of Bremen
  • Director of the Group Biometry at the Competence Center for Clinical Trials Bremen (KKSB)

Dr. Max Westphal

Post-doctoral researcher - Data Science & Biostatistics - at Fraunhofer MEVIS