Project IV (MATH4072) 2021/22

Exact Tests in Analysis of Categorical Data

Peter Craig

Description

The majority of procedures for testing particular statistical hypotheses are only approximate, in the sense that the actual significance level differs from the nominal one; usually, the approximation improves as the sample size increases. However, in some special situations, there exist exact tests which work for all sample sizes. A familiar example is the test, based on the t-distribution, for the mean of a population which is known or assumed to have a normal distribution.
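For instance, the Pearson chi-squared test of independence in a 2×2 table is approximate. The Python sketch below (Python is our choice here; the project itself suggests R) computes exactly the probability that this test rejects at the nominal 5% level, assuming independence holds and conditioning on both sets of margins. The helper names hypergeom_2x2, pearson_x2 and actual_size are hypothetical, and the sketch assumes every row and column total is positive.

```python
from math import comb

# Upper 5% point of chi-squared with 1 degree of freedom (nominal 5% critical value).
CRITICAL_5PC = 3.841459

def hypergeom_2x2(r1, r2, c1):
    """Yield (a, prob) for every 2x2 table with row sums r1, r2 and first
    column sum c1, where a is the top-left cell. Under independence and
    conditional on the margins, a has a hypergeometric distribution."""
    total = comb(r1 + r2, c1)
    for a in range(max(0, c1 - r2), min(r1, c1) + 1):
        yield a, comb(r1, a) * comb(r2, c1 - a) / total

def pearson_x2(table):
    """Pearson chi-squared statistic for a table given as a list of rows."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = rows[i] * cols[j] / n
            x2 += (obs - expected) ** 2 / expected
    return x2

def actual_size(r1, r2, c1, critical=CRITICAL_5PC):
    """Exact conditional probability that the Pearson test rejects when
    independence holds and the margins are (r1, r2) and (c1, n - c1)."""
    size = 0.0
    for a, p in hypergeom_2x2(r1, r2, c1):
        table = [[a, r1 - a], [c1 - a, r2 - (c1 - a)]]
        if pearson_x2(table) > critical:
            size += p
    return size

print(round(actual_size(5, 5, 5), 4))  # 0.0079
```

With all margins equal to 5, only the two extreme tables are rejected, each with conditional probability 1/252, so the actual size is 2/252 rather than the claimed 0.05. Trying other margins shows how the size varies with the sample.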

The best-known example is, in fact, Fisher's exact test, an alternative to the usual approximate chi-squared test of independence in a two-factor contingency table. An implementation, fisher.test, is part of the standard R distribution. In principle, the idea extends to a wide variety of models for categorical data viewed as higher-dimensional contingency tables. However, the computational challenges are significant, and simulation methods are frequently used to approximate the exact test, even for two-factor tables with large numbers of rows and columns.
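Both ideas can be sketched in a few lines of Python, written for illustration rather than taken from R. fisher_exact_2x2 computes the two-sided exact p-value for a 2×2 table by summing the hypergeometric probabilities of all tables no more probable than the observed one (the two-sided convention described in the documentation for R's fisher.test). monte_carlo_pvalue approximates the exact conditional p-value of the Pearson statistic for an r×c table by randomly relabelling individuals. All function names are hypothetical, and the simulation assumes every row and column total is positive.

```python
import random
from math import comb

def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for a 2x2 table [[a, b], [c, d]].
    Given the margins, the top-left cell is hypergeometric; sum the
    probabilities of all tables no more probable than the observed one."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)
    probs = {x: comb(r1, x) * comb(r2, c1 - x) / total
             for x in range(max(0, c1 - r2), min(r1, c1) + 1)}
    # A small relative tolerance stops floating-point error from splitting ties.
    cutoff = probs[a] * (1 + 1e-7)
    return sum(p for p in probs.values() if p <= cutoff)

def pearson_stat(table):
    """Pearson chi-squared statistic for an r x c table (list of rows)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols)))

def monte_carlo_pvalue(table, n_sim=10_000, seed=1):
    """Approximate the exact conditional p-value of the Pearson statistic.
    Shuffling the column labels of the n individuals keeps both sets of
    margins fixed and draws tables from their conditional distribution
    under independence."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(table)
             for j, count in enumerate(row) for _ in range(count)]
    row_labels = [i for i, _ in cells]
    col_labels = [j for _, j in cells]
    observed = pearson_stat(table)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(col_labels)
        sim = [[0] * len(table[0]) for _ in table]
        for i, j in zip(row_labels, col_labels):
            sim[i][j] += 1
        # The tolerance treats numerically equal statistics as ties.
        if pearson_stat(sim) >= observed - 1e-9:
            hits += 1
    # Counting the observed table as one of the samples gives a valid p-value.
    return (hits + 1) / (n_sim + 1)

tea = [[3, 1], [1, 3]]
print(fisher_exact_2x2(tea))    # 34/70, about 0.4857
print(monte_carlo_pvalue(tea))  # close to the same value
```

In R the corresponding calls are fisher.test(x) and, for the simulated version, fisher.test(x, simulate.p.value = TRUE); R generates its random tables differently, so simulated values will not match this sketch exactly.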

Starting with the theory underlying Fisher's exact test, students will study the basic principles underlying such exact tests and approaches to tackling the computational complexity.
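The key fact behind Fisher's test can be stated for an r × c table with counts n_{ij}, row totals n_{i+}, column totals n_{+j} and grand total n (notation ours). Under independence, the conditional distribution of the table given both sets of margins involves no unknown parameters, so an exact p-value for any test statistic T is a sum over the set F of non-negative integer tables with the observed margins:

```latex
P\bigl(\{n_{ij}\} \mid \{n_{i+}\}, \{n_{+j}\}\bigr)
  = \frac{\prod_{i=1}^{r} n_{i+}!\,\prod_{j=1}^{c} n_{+j}!}
         {n!\,\prod_{i=1}^{r}\prod_{j=1}^{c} n_{ij}!},
\qquad
p = \sum_{\{m_{ij}\} \in \mathcal{F}:\; T(\{m_{ij}\}) \ge T(\{n_{ij}\})}
      P\bigl(\{m_{ij}\} \mid \{n_{i+}\}, \{n_{+j}\}\bigr).
```

For r = c = 2 the first formula reduces to the hypergeometric distribution of n_{11}. The computational difficulty comes from the second: the number of tables in F grows very rapidly with the table dimensions and the sample size.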

There are connections to a number of ideas studied in the Bayesian Statistics module, including Monte Carlo, importance sampling and Markov chain Monte Carlo. However, it is not necessary to be taking Bayesian Statistics in order to do this project. There are also connections to computational algebraic geometry for interested students.
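To illustrate the Markov chain Monte Carlo connection, here is a Python sketch of one standard approach (our choice of example; the project does not prescribe it). It runs a Metropolis chain over two-way tables with fixed margins using "basic moves": add 1 to two diagonally opposite cells of a 2×2 sub-table and subtract 1 from the other two, which leaves all row and column totals unchanged. The target is proportional to 1/∏ n_ij!, the conditional distribution of the table given its margins under independence. For two-way tables the basic moves connect all tables with given margins, a simple instance of the Markov bases studied in computational algebraic geometry. The function names are hypothetical, and the running average returned here is only an approximate p-value; Besag and Clifford (1989) describe schemes that give exact Monte Carlo p-values from such chains.

```python
import random

def pearson_stat(table):
    """Pearson chi-squared statistic for an r x c table (list of rows)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols)))

def mcmc_pvalue(table, stat=pearson_stat, n_steps=20_000, seed=1):
    """Estimate P(stat >= observed) under the conditional distribution given
    the margins, using a Metropolis chain built from basic moves.
    Needs at least two rows and two columns."""
    rng = random.Random(seed)
    current = [row[:] for row in table]
    observed = stat(table)
    n_rows, n_cols = len(table), len(table[0])
    hits = 0
    for _ in range(n_steps):
        # Basic move: +1 at (i1, j1) and (i2, j2), -1 at (i1, j2) and (i2, j1).
        # Choosing ordered pairs uniformly makes the proposal symmetric.
        i1, i2 = rng.sample(range(n_rows), 2)
        j1, j2 = rng.sample(range(n_cols), 2)
        down1, down2 = current[i1][j2], current[i2][j1]
        # A move that would make a cell negative is rejected (chain stays put).
        if down1 > 0 and down2 > 0:
            # Target is proportional to 1 / prod(n_ij!), so the Metropolis
            # ratio only involves the four cells that change.
            ratio = (down1 * down2) / ((current[i1][j1] + 1) * (current[i2][j2] + 1))
            if rng.random() < ratio:
                current[i1][j1] += 1
                current[i2][j2] += 1
                current[i1][j2] -= 1
                current[i2][j1] -= 1
        if stat(current) >= observed - 1e-9:
            hits += 1
    return hits / n_steps

print(mcmc_pvalue([[3, 1], [1, 3]]))  # should be near 34/70 for this small table
```

For a 2×2 table the exact answer is cheap, so the chain is only worthwhile for larger tables where enumerating every table with the observed margins is infeasible.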

Meetings will be as a group during the first term. During the second term, meetings will usually be individual with the supervisor and may sometimes take place online if the supervisor is away from Durham.

Prerequisites

Statistical Concepts II and an interest in computation (probably, but not necessarily, in R). Monte Carlo II and Topics in Statistics III may prove helpful but are not essential.

Resources

Wikipedia Exact Test article and whither it leads you.

Fisher RA (1922). On the interpretation of χ² from contingency tables, and the calculation of P. Journal of the Royal Statistical Society, 85, pp. 87-94.

Agresti A (1996). An Introduction to Categorical Data Analysis. Wiley.

Besag J and Clifford P (1989). Generalized Monte Carlo significance tests. Biometrika, 76, pp. 633-642.

email: Peter Craig