Project IV (MATH4072) 2019-20

Credal Classification

Matthias Troffaes

Description

Classification is one of the most universal decision problems around us, and living creatures are remarkably good at it. Some fish perform spatial pattern recognition and abstraction using active electrolocation alone. Humans classify characters, words, and the meanings they have in a particular context. With the advent of technology, many classification tasks have been successfully delegated to machines. For instance, when we open our inboxes, email clients classify potential spam with remarkable accuracy.

Crucial to classification is learning. However, during the initial learning phase, traditional machine learning algorithms, such as the naive Bayes classifier, often misclassify because they must always produce a single classification outcome. Humans and fish, by contrast, can simply produce a set of options to convey their lack of information, without discriminating further between the elements of that set. Credal classification models exactly this situation: when little information is available, the classifier is allowed to produce sets of outcomes rather than single outcomes.
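
To make the "single outcome" issue concrete, here is a minimal sketch of a naive Bayes classifier for categorical features. It is an illustration only: the function name, the toy spam/ham labels, the binary features, and the use of add-one (Laplace) smoothing with parameter alpha are all assumptions made for this example, not part of the project material.

```python
from collections import Counter

def naive_bayes_classify(data, classes, features, n_values=2, alpha=1.0):
    """Return (label, scores) for one feature tuple.

    data     : list of (feature_tuple, label) training pairs
    classes  : list of all possible labels
    features : feature tuple to classify
    n_values : number of values each feature can take (assumed equal for all)
    alpha    : smoothing constant (an assumption; alpha=1 is add-one smoothing)
    """
    n_total = len(data)
    class_counts = Counter(label for _, label in data)
    feat_counts = Counter()
    for x, label in data:
        for i, v in enumerate(x):
            feat_counts[(i, v, label)] += 1

    scores = {}
    for c in classes:
        n_c = class_counts[c]
        # Smoothed estimate of P(c) times the product of smoothed P(a_i | c).
        score = (n_c + alpha) / (n_total + alpha * len(classes))
        for i, v in enumerate(features):
            score *= (feat_counts[(i, v, c)] + alpha) / (n_c + alpha * n_values)
        scores[c] = score

    # A single label is always returned, even when the scores are tied.
    return max(classes, key=lambda c: scores[c]), scores

classes = ["spam", "ham"]
label, scores = naive_bayes_classify([], classes, (1, 0))
```

With no training data at all, both classes receive the same score, yet the function still commits to one label. This is the behaviour the paragraph above describes as problematic in the early learning phase.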

In this project you will learn the basics of the naive Bayes classifier and its extension, the naive credal classifier. Essentially, when the available information is insufficient to identify all probabilities that affect the classification problem, credal classification starts out with a set of probabilities and naturally produces sets of classification outcomes, which become smaller (and eventually become singletons) as more information becomes available.
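
The idea of sets of outcomes that shrink with more data can be sketched in a few lines. The example below is a deliberately simplified variant, not the naive credal classifier as defined in the literature: it assumes lower and upper probabilities of the form n/(N+s) and (n+s)/(N+s) with a hypothetical hyperparameter s, bounds each factor separately, and discards a class only when its upper bound falls below another class's lower bound (interval dominance). The function name and the toy spam/ham data are likewise illustrative.

```python
from collections import Counter

def credal_classify(data, classes, features, s=1.0):
    """Return the set of classes that are not interval-dominated.

    data     : list of (feature_tuple, label) training pairs
    classes  : list of all possible labels
    features : feature tuple to classify
    s        : positive hyperparameter controlling how cautious the bounds are
    """
    n_total = len(data)
    class_counts = Counter(label for _, label in data)
    feat_counts = Counter()
    for x, label in data:
        for i, v in enumerate(x):
            feat_counts[(i, v, label)] += 1

    bounds = {}
    for c in classes:
        n_c = class_counts[c]
        # Lower and upper bounds on P(c).
        lo = n_c / (n_total + s)
        hi = (n_c + s) / (n_total + s)
        for i, v in enumerate(features):
            n_vc = feat_counts[(i, v, c)]
            # Lower and upper bounds on P(a_i | c), multiplied in.
            lo *= n_vc / (n_c + s)
            hi *= (n_vc + s) / (n_c + s)
        bounds[c] = (lo, hi)

    # Keep every class whose upper bound reaches the best lower bound.
    best_lower = max(lo for lo, _ in bounds.values())
    return {c for c, (_, hi) in bounds.items() if hi >= best_lower}

classes = ["spam", "ham"]
few = [((1, 0), "spam"), ((0, 1), "ham")]
many = few * 20

print(credal_classify([], classes, (1, 0)))    # no data: both classes remain
print(credal_classify(few, classes, (1, 0)))   # little data: still both
print(credal_classify(many, classes, (1, 0)))  # more data: a single class
```

In this toy run the output set starts as all classes and narrows to a singleton once the counts are large enough for the intervals to separate. Interval dominance is more conservative than other decision criteria one could use, so the sets it returns may be larger than necessary.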

Next, you could look at how such sets can be obtained from data, and study possible applications of the credal classifier, including (but not limited to!) spam filtering, or medical diagnosis problems such as disease classification.

Prerequisites

Statistical Concepts II

Resources

Duda, R. O., Hart, P. E. (1973). Pattern Classification and Scene Analysis. Wiley-Interscience.
Zaffalon, M. (1999). A credal approach to naive classification. In de Cooman, G., Cozman, F. G., Moral, S., Walley, P. (Eds), ISIPTA '99: Proceedings of the First International Symposium on Imprecise Probabilities and Their Applications. The Imprecise Probabilities Project, Universiteit Gent, Belgium, pp. 405-414.
Zaffalon, M. (2001). Statistical inference of the naive credal classifier. In de Cooman, G., Fine, T., Seidenfeld, T. (Eds), ISIPTA '01: Proceedings of the Second International Symposium on Imprecise Probabilities and Their Applications. Shaker Publishing, The Netherlands, pp. 384-393.
Statistical classification on Wikipedia.
Zdziarski, J. (2005). Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification. No Starch Press.
Zaffalon, M., Wesnes, K., Petrini, O. (2003). Reliable diagnoses of dementia by the naive credal classifier inferred from incomplete cognitive data. Artificial Intelligence in Medicine 29(1-2), 61-79.