Prerequisites:
o Basic knowledge of probability theory
  • expectation, variance, covariance
  • transformation formula for densities
  • conditional probability distribution, independence
  • convergence in probability, convergence in distribution
  • law of large numbers, central limit theorem
  • Markov inequality
  • characteristic function
  • normal distribution, chi-squared distribution, t-distribution
o Basic knowledge of statistics
  • ML estimation (also for p-dimensional parameters), moment estimators
  • confidence intervals
  • hypothesis tests
o Basic knowledge of linear algebra
  • matrix inverse
  • determinant, trace
  • eigendecomposition
  • square root of a matrix
o Basic knowledge of calculus
  • limits
  • differentiation (of functions of more than one variable)
  • integration (of functions of more than one variable)
  • Lagrange multiplier
o Basic knowledge of R (students may use other programming languages such as Python or Matlab, but solutions to problems will always be given in R)
  • reading data
  • creating plots
  • using loops
  • using logical operators ("&", "|", "!")
  • defining functions
  • matrix operations
Note that we will briefly review less familiar concepts, such as Lagrange multipliers, so some gaps in knowledge are not a problem.
Aim of the course:
In multivariate statistics we observe multiple measurements for each individual observation. These can be vital signs such as the heart rate, blood pressure and respiratory rate of a patient, or household expenditures on housing, food, education and entertainment. The focus lies on finding and modelling dependencies between these individual variables so that we can gain insight into the underlying mechanisms.
A particular challenge is posed by the case where the dimension of the observations is large. Collecting data is now much cheaper than in the past, so working with huge data sets is no longer unusual. We will tackle this problem by means of dimension reduction, among other techniques. Graphical tools will help us to understand and visualize the structure of big and complex data sets.
Often, we cannot assume that all observations are homogeneous and follow the same probability model. In this case we want to discover groups within the data set and classify observations into them.
At the end of the course, students will be able to analyze complex data sets. They will be able to handle large data sets, investigate the underlying dependence structure, identify subpopulations and test for the equality of their means and covariances. Students will understand the mathematical foundation of multivariate statistical procedures and thus understand their limitations. They will be able to derive theoretical properties such as consistency and asymptotic normality of new estimators. They will also be able to adapt existing hypothesis tests to their needs or construct new tests based on general principles.
Rules about homework / exam:
Doing homework is voluntary but recommended. The final grade is based on the written exam only. To pass the course, the grade for the exam (or the retake exam) must be at least 5.5.
Lecture notes / literature:
Lecture notes are published online. They are based on the following books:
  • Wolfgang Härdle and Leopold Simar (2024): Applied Multivariate Statistical Analysis
  • Theodore Anderson (2003): An Introduction to Multivariate Statistical Analysis