Project IV 19-20 (PSC)

Exact Tests in Analysis of Categorical Data

Peter Craig

Description

The majority of procedures for testing particular statistical hypotheses are only approximate, in the sense that the actual significance level differs from the claimed (nominal) level; usually, the approximation improves as the sample size increases. However, in some special situations there exist exact tests, which achieve the claimed significance level for all sample sizes. A familiar example is the test, based on the t-distribution, for the mean of a population that is known or assumed to have a normal distribution.
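To make "actual versus claimed significance level" concrete, here is a small Python sketch. It is our own illustration, not part of the project: it takes the familiar normal-approximation test of H0: p = 0.5 for a binomial count at nominal level 5% and computes, exactly, the probability under H0 that the test rejects. The function name actual_size and the choice of test are assumptions made for illustration.

```python
from math import comb, sqrt


def actual_size(n, z_crit=1.96):
    """Exact probability, under H0: p = 0.5, that the nominal 5% normal-
    approximation test rejects, i.e. the test's true significance level."""
    size = 0.0
    for x in range(n + 1):
        z = (x - n / 2) / sqrt(n / 4)  # approximately N(0, 1) under H0
        if abs(z) > z_crit:  # x lies in the rejection region
            size += comb(n, x) / 2 ** n  # exact Binomial(n, 0.5) probability
    return size


for n in (10, 11, 20):
    print(n, round(actual_size(n), 4))
```

For n = 10 the actual level is 22/1024 (about 0.021), for n = 11 it is 134/2048 (about 0.065) and for n = 20 it is about 0.041. Because the binomial distribution is discrete, the actual level oscillates around the nominal 0.05 rather than approaching it steadily.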

For categorical data, the best-known example is Fisher's exact test, an alternative to the usual approximate chi-squared test of independence in a two-factor contingency table. An implementation, fisher.test, is a standard component of R. In principle, the idea extends to a wide variety of models for categorical data viewed as higher-dimensional contingency tables. However, the computational challenges are significant, and simulation methods are frequently used to approximate the exact test, even for two-factor tables with large numbers of rows and columns.
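A minimal Python sketch of Fisher's exact test for a 2x2 table, using only the standard library. The function name fisher_2x2 is hypothetical and this is not R's fisher.test; it simply conditions on both margins, so that the top-left count has a hypergeometric distribution, and returns a two-sided and a one-sided ("greater") p-value.

```python
from math import comb


def fisher_2x2(table):
    """Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Conditioning on the row and column totals, the top-left count is
    hypergeometric under independence. Returns (two_sided, greater).
    """
    (a, b), (c, d) = table
    r1, r2 = a + b, c + d  # row totals
    c1 = a + c  # first column total
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)  # feasible top-left counts
    p_obs = prob(a)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Two-sided: sum over tables no more probable than the observed one,
    # with a small relative tolerance so rounding does not break ties.
    two_sided = sum(p for p in probs if p <= p_obs * (1 + 1e-7))
    greater = sum(prob(x) for x in range(a, hi + 1))  # P(A >= a)
    return two_sided, greater


two_sided, greater = fisher_2x2([[3, 1], [1, 3]])
print(round(two_sided, 4), round(greater, 4))
```

For this table the five possible top-left counts have probabilities 1/70, 16/70, 36/70, 16/70 and 1/70, so the two-sided p-value is 34/70 (about 0.4857) and the one-sided one is 17/70. We believe the "no more probable than observed" definition matches the one documented for R's fisher.test, but other two-sided definitions exist; scipy.stats.fisher_exact offers an independent cross-check.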

Starting with the theory underlying Fisher's exact test, students will study the basic principles of such exact tests and approaches to tackling their computational complexity.
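The central step in that theory is conditioning on the margins, which removes the unknown marginal probabilities. As a sketch in our own notation (n_ij cell counts, r_i row totals, c_j column totals, n the grand total, x the observed table, F the set of tables sharing its margins), the standard results under independence are:

```latex
% Conditional null distribution of a two-factor table given both margins
% (free of the unknown marginal probabilities):
P\left(\{n_{ij}\} \mid \{r_i\}, \{c_j\}\right)
  = \frac{\prod_i r_i! \, \prod_j c_j!}{n! \, \prod_{i,j} n_{ij}!}

% For a 2x2 table with top-left count a this is hypergeometric:
P(a) = \binom{r_1}{a} \binom{r_2}{c_1 - a} \Big/ \binom{n}{c_1}

% A common two-sided exact p-value sums over tables no more probable than x:
p = \sum_{y \in \mathcal{F} :\, P(y) \le P(x)} P(y)
```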

There are connections to a number of ideas studied in the Bayesian Statistics module, including Monte Carlo, importance sampling and Markov chain Monte Carlo. However, it is not necessary to be taking Bayesian Statistics in order to do this project. There are also connections to computational algebraic geometry for interested students.
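As a sketch of the Markov chain Monte Carlo approach (our own illustration; the names log_weight and mcmc_exact_pvalue are hypothetical), the following Metropolis sampler moves between tables with the same row and column totals, targeting the conditional null distribution, which is proportional to 1/prod(n_ij!). Its proposals are the so-called basic moves: add 1 to two diagonally opposite cells of a 2x2 sub-table and subtract 1 from the other two. That these moves connect all two-way tables with given margins is the simplest example of a Markov basis, which is where computational algebraic geometry enters. The estimate below is a naive average over a correlated chain, not the Besag and Clifford construction, which is designed to give a valid p-value despite that dependence.

```python
import random
from math import lgamma


def log_weight(t):
    """log of prod 1/n_ij!: the conditional null probability of table t
    given its margins, up to an additive constant."""
    return -sum(lgamma(x + 1) for row in t for x in row)


def mcmc_exact_pvalue(table, steps=200_000, seed=1):
    """Estimate the exact p-value (probability of tables no more probable
    than the observed one) by Metropolis sampling with basic moves.
    Assumes at least two rows and two columns."""
    rng = random.Random(seed)
    t = [list(row) for row in table]
    nrow, ncol = len(t), len(t[0])
    threshold = log_weight(t) + 1e-7  # small tolerance for ties
    hits = 0
    for _ in range(steps):
        i1, i2 = rng.sample(range(nrow), 2)
        j1, j2 = rng.sample(range(ncol), 2)
        # Symmetric proposal: pick the sign of the basic move at random.
        if rng.random() < 0.5:
            up, down = [(i1, j1), (i2, j2)], [(i1, j2), (i2, j1)]
        else:
            up, down = [(i1, j2), (i2, j1)], [(i1, j1), (i2, j2)]
        (d1r, d1c), (d2r, d2c) = down
        (u1r, u1c), (u2r, u2c) = up
        if t[d1r][d1c] > 0 and t[d2r][d2c] > 0:  # stay non-negative
            # Metropolis ratio pi(new) / pi(old) for weights 1/prod(n_ij!).
            ratio = (t[d1r][d1c] * t[d2r][d2c]) / (
                (t[u1r][u1c] + 1) * (t[u2r][u2c] + 1))
            if rng.random() < ratio:
                t[d1r][d1c] -= 1
                t[d2r][d2c] -= 1
                t[u1r][u1c] += 1
                t[u2r][u2c] += 1
        if log_weight(t) <= threshold:  # current table no more probable
            hits += 1
    return hits / steps


p = mcmc_exact_pvalue([[3, 1], [1, 3]], steps=100_000)
print(round(p, 3))  # should be near the exact value 34/70 for this table
```

For a 2x2 table exact enumeration is trivial, so the example only checks the sampler; the approach is meant for larger tables, where enumerating all tables with the observed margins is impractical.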

Prerequisites

Statistical Concepts II and an interest in computation (probably, but not necessarily, in R). Monte Carlo II and Topics in Statistics III may prove helpful but are not essential.

Resources

Wikipedia article on exact tests.

Wikipedia article on Fisher's exact test.

Fisher RA (1932) Statistical Methods for Research Workers (5th edition) is in the library; you may also find a copy online. At about page 112 of the 5th edition there is a short section, 21.02, entitled "The exact treatment of 2x2 tables", which should be read in the context of the rest of section 21. Other editions contain similar material.

Agresti A (1996) An Introduction to Categorical Data Analysis, Wiley.

Agresti A (1992) A survey of exact inference for contingency tables, Statistical Science, 7, pp 131–153. doi:10.1214/ss/1177011454

Besag J and Clifford P (1989) Generalised Monte Carlo significance tests, Biometrika, 76, pp 633–642.

email: Peter Craig