Methods and Techniques of Unsupervised Learning

Dr. Konstantinos Perrakis

Description

Unsupervised learning refers to a broad set of methods from Statistics and Machine Learning used in settings where, unlike in supervised learning, there is no specific "main" variable of interest (a response variable) guiding the analysis. Instead, the data consist of many variables, typically referred to as features, which are regarded as equally relevant to the problem or topic at hand. The goal in such cases is essentially to discover interesting things about the data, i.e. to identify meaningful structure: for instance, to find interesting subgroups within the data or to visualise the data in informative ways. Given this broad framework, unsupervised learning connects to several related, yet distinct, domains, namely exploratory data analysis, dimensionality reduction, data visualisation and cluster analysis. Consequently, a wide range of methods and techniques is employed; examples include distance-based methods (e.g. k-means, hierarchical clustering), dimensionality-reduction techniques (e.g. principal component analysis), statistical modelling approaches (e.g. Gaussian mixtures) and algorithms based on kernel density estimation (e.g. mean-shift). Unsupervised learning is of growing importance in numerous fields; example applications include image segmentation in medical research, customer targeting in market research and the optimisation of search engines for websites.
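As a rough illustration of two of the methods named above, the following Python sketch (NumPy only) implements k-means clustering via Lloyd's algorithm and principal component analysis via the singular value decomposition of the centred data matrix. The function names `kmeans` and `pca`, the farthest-point initialisation and the toy data are illustrative assumptions and not part of the project materials; in practice one would usually rely on library implementations such as `kmeans()` and `prcomp()` in R, or `sklearn.cluster.KMeans` and `sklearn.decomposition.PCA` in Python.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Minimal k-means (Lloyd's algorithm) sketch for an (n, p) data array X."""
    X = np.asarray(X, dtype=float)
    # Farthest-point initialisation (an assumption; random starts or k-means++
    # are common alternatives): begin with the first observation, then keep
    # adding the observation farthest from all centroids chosen so far.
    centroids = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d2)])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assignment step: each observation joins its nearest centroid
        # (squared Euclidean distance).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points;
        # an empty cluster keeps its previous centroid.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):  # centroids stopped moving
            break
        centroids = new
    return labels, centroids


def pca(X, n_components=2):
    """PCA sketch: SVD of the column-centred data matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # centre each feature
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]            # principal directions (rows)
    scores = Xc @ components.T                # projections onto those directions
    explained_var = s[:n_components] ** 2 / (len(X) - 1)
    return scores, components, explained_var


if __name__ == "__main__":
    # Toy data: two well-separated groups of three points each.
    X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
    labels, centroids = kmeans(X, k=2)
    print(labels)  # [0 0 0 1 1 1]
    print(centroids)
    scores, components, var = pca(X, n_components=1)
    print(var)
```

The deterministic initialisation is chosen only so that the toy example gives a reproducible answer; k-means in general converges to a local optimum that depends on the starting centroids, which is why library implementations typically use several random starts.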

The goal of this project is to understand the principles of unsupervised learning, to learn about some of the methods and techniques mentioned above, and to learn how to use them in practice. Individual projects can follow several directions: for instance, they may focus on empirical comparisons between methods, delve deeper into more advanced methods, or analyse interesting datasets.

Prerequisites

Statistical Concepts II. Familiarity with R, Python or another appropriate programming language is also essential.

Corequisite

Statistical Methods III (preferably).

Resources

General online information can be found on the Wikipedia page on unsupervised learning, which provides many useful links to further information about the various applications and methods.
Recommended reading:
- Chapter 10 (a good place to start) in: An Introduction to Statistical Learning. Available online from the authors.
- Chapters 13 & 14 (somewhat more technical) in: The Elements of Statistical Learning, 2nd edition. Available online from the authors.

Project III 2020-21