Prerequisites
Basic knowledge of probability and statistics, and a sufficient level of mathematical maturity; e.g., having followed an undergraduate course in Mathematical Statistics should provide you with enough background. Familiarity with R is useful, but not required.
Aim of the course
The aim of this course is to give students a broad knowledge of nonparametric methods in statistics. Many methods in statistics are parametric in nature: the distribution of the data is assumed to be parametrized by a finite-dimensional parameter. The basic idea of nonparametric methods is to drop, or relax, this often restrictive assumption. These methods thereby offer much more flexibility to model the data than classical parametric methods. The topics that we cover in this course form a mix of classical distribution-free methods and more modern topics. The focus is on both the application and the theory of these methods. Examples will be illustrated using the statistical computing software R.
In this course we cover the following topics:
- Non-parametric inference and the empirical CDF [W, Chap. 1, 2.1] This is an introduction to non-parametric inference and to the empirical cumulative distribution function and its properties.
- Goodness-of-fit tests [N] The goodness-of-fit problem asks whether a certain parametric statistical model is appropriate or not. If it is hard to find a suitable parametric model, a nonparametric model offers a viable alternative.
- Permutation and rank tests [N] If parametric assumptions are hard to justify and/or are rejected by a goodness-of-fit test, then the validity of classical tests (often based on the normal distribution) can be seriously compromised. In this case, permutation tests are a good alternative. Rank tests are permutation tests applied to the ranks of the observations. Such tests are widely used in practice and are strong competitors of classical tests. We treat the main principles and discuss various properties of these tests.
- The Jackknife and the Bootstrap [W, Chap. 3] We discuss the jackknife and bootstrap resampling methods, which often allow us to make meaningful statistical statements based on simulations.
- Smoothing [W, Chap. 4] A generic way to look at non-parametric statistics is as a method for removing noise from, or smoothing, a curve. We use kernel estimators and smoothing splines to introduce the main concepts underlying smoothing techniques.
- Non-parametric regression [W, Chap. 5] The simplest classical linear regression model assumes that the relation between a response variable Y and a predictor variable X can be modeled by a straight line. However, this may not be appropriate. Non-parametric regression aims to fit a curve while making as few assumptions as possible.
- Local regression and penalised regression [W, Chap. 5] We discuss various approaches to non-parametric regression, such as local regression and penalised regression. Besides being practically relevant, these methods also raise mathematically interesting questions.
- Penalised likelihood methods [N] Applying the maximum likelihood principle to non-parametric models often leads to overfitting. A way around this is to penalise estimators according to their complexity or lack of smoothness.
- Adaptation [N] Adaptation techniques are used to adjust the complexity of an estimator. This is important in non-parametric inference, since the size of the model influences the precision of the estimators.
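To make the idea behind permutation tests a little more concrete, here is a minimal sketch in Python of a two-sample permutation test for a difference in means. It is an illustration only and not taken from the course material: the function name `permutation_test`, the test statistic (difference of sample means), the number of random permutations `n_perm` and the two-sided p-value with a "+1" correction are all assumptions made for this example. Under the null hypothesis that both samples come from the same distribution, the group labels are exchangeable, so reshuffling the pooled data shows how extreme the observed statistic is.

```python
import random
from statistics import mean


def permutation_test(x, y, n_perm=10000, seed=0):
    """Two-sided two-sample permutation test for a difference in means.

    Hypothetical helper for illustration. Returns the observed statistic
    mean(x) - mean(y) and a Monte Carlo estimate of the p-value.
    """
    rng = random.Random(seed)
    observed = mean(x) - mean(y)
    pooled = list(x) + list(y)
    n_x = len(x)
    count = 0
    for _ in range(n_perm):
        # Randomly reassign the group labels by shuffling the pooled sample
        rng.shuffle(pooled)
        stat = mean(pooled[:n_x]) - mean(pooled[n_x:])
        # Count permutations at least as extreme as the observed statistic
        if abs(stat) >= abs(observed):
            count += 1
    # Adding one to numerator and denominator keeps the p-value above zero
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    obs, p = permutation_test([10, 11, 12, 13, 14], [0, 1, 2, 3, 4])
    print(obs, p)
```

Because the p-value is estimated from random permutations, it varies slightly with `seed` and `n_perm`; for very small samples one could instead enumerate all possible splits of the pooled data exactly.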
The first 6 weeks are dedicated to topics 1-4; weeks 7-13 are dedicated to topics 5-9 in non-parametric statistics.
Rules about Homework / Exam
Throughout the course you will be required to solve homework exercises. Some of these will be theoretical, while others will be practical and will often require the use of computational tools. It is recommended (but not mandatory) that you use the statistical computing package R for these. Homework exercises handed in on time will be graded and provide extra credit points towards the final grade.
You should hand in your answers at the beginning of class (only in exceptional cases via email). Answers can be handwritten, but please write them clearly and in an organized way. I encourage you to work together in groups of at most three students; you can hand in your answers as a group. Only answers handed in on time count towards the final grade.
Let E denote the exam grade and let H denote the combined homework grade (on a scale of 0-10). The final grade F is computed as
F = max(0.75E + 0.25H, E)   if E >= 5.0
F = E                       if E < 5.0
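The grading rule can be checked with a few lines of Python. This is a sketch only: `final_grade` is a hypothetical helper that transcribes the two cases above, it assumes E and H are plain numbers, and it applies no rounding, since the rules do not specify any.

```python
def final_grade(exam, homework):
    """Final grade F from exam grade E and homework grade H (hypothetical helper)."""
    if exam >= 5.0:
        # Homework can only raise the grade: take the better of the weighted mix and E
        return max(0.75 * exam + 0.25 * homework, exam)
    # Below 5.0 the homework does not count
    return exam


print(final_grade(6.0, 10.0))  # 7.0: homework raises the grade
print(final_grade(8.0, 4.0))   # 8.0: weak homework does not lower it
print(final_grade(4.0, 10.0))  # 4.0: homework ignored when E < 5.0
```

Taking the maximum with E is what makes the homework pure extra credit: a student with E >= 5.0 can never end up below their exam grade.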
Lecture Notes / Literature
[W]: L. Wasserman, All of Nonparametric Statistics, Springer, 2006. (ISBN 978-0-387-25145-5)
[N]: Notes and articles to be handed out during the course.
Lecturer
Paulo Serra (TUE)