Description
Unsupervised learning refers to a broad set of methods from Statistics and Machine Learning used in settings where, in contrast to supervised learning, there is no specific "main" variable of interest (response variable) guiding the analysis. Instead, the data consist of many variables, typically referred to as features, which are regarded as equally relevant to a specific problem or topic. The goal in such cases is to identify meaningful structure in the data; for instance, to discover interesting subgroups within the data or to visualise the data in informative ways. Given this broad framework, unsupervised learning connects to several related, yet distinct, domains; namely, exploratory data analysis, dimensionality reduction, data visualisation and cluster analysis. Consequently, a wide range of methods and techniques is employed; examples include distance-based methods (e.g. k-means, hierarchical clustering), dimensionality-reduction techniques (e.g. principal component analysis), statistical modelling approaches (e.g. Gaussian mixtures) and algorithms based on kernel density estimation (e.g. mean-shift). Unsupervised learning is of growing importance in numerous fields; examples of applications include image segmentation in medical research, customer targeting in market research and optimisation of search engines for websites.
The goal of this project is to understand the principles of unsupervised learning, to learn about some of the aforementioned methods and techniques, and to learn how to use them in practice. Individual projects can follow several directions; for instance, they can focus on empirical comparisons, delve deeper into more advanced methods or analyse interesting datasets.
Prerequisites
Statistical Concepts II. Familiarity with R, Python or another appropriate programming language is also essential.
Corequisite
Statistical Methods III (preferably).
Resources