Today we look at the properties of a hypothesis test for the fairness of a die. We simulate 120 rolls of a die and use our data to compare two competing hypotheses: one in which the die is fair, and the other in which the die has a specific bias towards 6. In particular, we will investigate the properties of the significance and the power of a hypothesis test.
seq
sum
sapply
plot
, and a histogram with hist
which
function[]
We will roll a die 120 times, and denote the number of times a 6 is rolled as \(X\). Let \(p\) be the probability of rolling a 6, with each roll of the die independent of the other rolls. We will use \(X\) to evaluate two competing hypotheses:
That is, under the null hypothesis, the probability of rolling 6 agrees with that from a fair die, and the alternative hypothesis corresponds to a die biased towards 6.
We will define our test in terms of \(X\), the number of 6s rolled, and the goal will be to find a ‘good’ critical value \(x^*\) of \(X\), such that, if \(X \geq x^*\), we reject \(H_0\) in favour of \(H_1\). We will look at what ‘good’ means in terms of significance level and power of the test and will investigate the sort of experiment size required to get a good value of both as well as the behaviour of power as the bias changes.
The significance level of a test, \(\alpha\), for a simple null hypothesis is the probability of rejecting \(H_0\) when \(H_0\) is true. For any chosen \(x^*\), the significance level of this test is \(\mathbb{P}[X \geq x^*]\), given \(H_0\).
Since the distribution for \(X\), the number of 6s rolled, is binomial with probability \(p\), we can use the binomial distribution to find \(\mathbb{P}[X \geq x]\) for any given value of \(x\). We can use R
’s built-in binomial probability function to find the significance level for any particular \(x^*\).
seq
function or the :
operator). Save this to X
.pxho <- dbinom(X, size=120, p=1/6)
plot
to draw a plot of \(X\) horizontally against the corresponding probability value vertically. Use the optional argument type='h'
to draw a vertical line graph.
A logical vector is a vector that only contains the logical values of TRUE and FALSE. In R
, true values are designated with TRUE, and false values with FALSE.
We will rarely construct these vectors directly, logical vectors are created by using comparison operators like < (less than), == (equals to), and != (not equal to) with numerical vectors.
==
equal to!=
not equal to<
less than>
greater than>=
greater than or equal toFor example
values <- 1:10
values > 5
## [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
values ==4
## [1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
To extract a single value or subset of values from a vector, we use the syntax: a[index]
where a is the vector, and index is a vector of index values.
values <- c(28, 67, 90, 23, 35, 16, 31, 69, 76, 83)
values[1]
## [1] 28
If we want multiple values from the vector, we can supply a vector of indices:
values[5:8]
## [1] 35 16 31 69
values[c(1,3,9)]
## [1] 28 90 76
We can also combine this idea of indexing with the idea of logical vectors introduced above. For instance, to extract only the values greater than 50:
values[values>50]
## [1] 67 90 69 76 83
Suppose we wanted to set the critical value \(x^*\) for our test to be \(X=20\), and we would reject the null hypothesis if we observed more 6s. This corresponds to the expected number of 6s in 120 rolls of the dice, and so we would reject if we observed more 6s than were expected.
TRUE
and FALSE
values corresponding to the statement \(X\geq 20\)pxho
for every \(X\geq 20\).sum
## [1] 0.5379619
What we have just calculated is the significance level for a test which rejects \(H_0\) for \(X\geq 20\). Clearly, the significance value is quite high - and rather higher than the usual 5% or 1% values we’re used to. However, we can always raise the value of \(x^*\) which will lower the significance probability.
sapply
and a custom function which uses the code written above, evaluate the significance level for all 121 values of \(X\) in the vector X
.alpha
.
Given a logical vector, we may be interested in finding the indices of the vector which are TRUE
. To do this we apply the which
function:
which(values>50)
## [1] 2 3 8 9 10
which((1:12)%%2 == 0) # which of the integers 1 to 12 are even?
## [1] 2 4 6 8 10 12
Perhaps the most popular choice of significance level is 0.05. We can use the which
function to find out which values of alpha are less than or equal to 0.05 by typing
which(alpha <= 0.05)
TRUE
. The first value returned therefore corresponds to the index of the smallest value of \(x^*\) and the highest significance level which is acceptable.
which(alpha <=0.05)
X
and alpha
to find the values of \(x^*\) and \(\alpha\) for this approximate 5% test.
For simple hypotheses, the power of a test is the probability of rejecting \(H_0\) when \(H_1\) is true. To work out the power of a test for a given significance level, we first select \(x^*\) appropriate to the chosen significance level and then compute \(\mathbb{P}[X >= x^*~|~H_1]\), the probability under \(H_1\) that we fall in the rejection region for our test and thus reject \(H_0\).
sapply
to compute a vector of power values in the same way we computed the vector of significances. Call this vector powers
.
If you have done this correctly, you should find that the power for \(x^* = 28\) is 0.1177323
which you can verify by returning the value of powers[29]
.
So, the price we have paid for having such a low probability of rejecting \(H_0\) when it is true (i.e. such a low significance level) is a low probability of rejecting \(H_0\) when \(H_1\) is true (i.e. a low power). We can see that there is a tradeoff between significance and power by plotting the two against each other.
plot
to plot the significance values (alpha
) horizontally against the power values (powers
) vertically. Use type='l'
to join the values with lines.
For a given significance level, as we increase the sample size we will also increase the power of the test since extra data will provide more evidence for the test. There are two interesting questions raised by this initial exploration:
We look at the second question first.
Remembering that the significance level depends only on the null hypothesis, we can always choose \(x^* = 28\) to get a significance of less than 0.05. However, as we change the alternative hypothesis, the power of this test will change. The more extreme our alternative in comparison to the null hypothesis, the easier it should be to detect a difference between them and decide whether to reject or not.
We can look at what the power will be for all possible alternative hypotheses, by computing the power function. Suppose our alternative hypothesis is now
\(H_1 : p=\frac{1}{6} + a,~~~ \text{for}~ a\in \left( 0,\frac{5}{6}\right].\)
powerFun
, with argument a
, that computes the power of the hypothesis test for our critical value of \(x^*=28\), i.e. a significance level of 0.037.
seq
uence from
0, to to
5/6, and of length
100. Save this vector as p1vals
.sapply
to apply your power function to every value in p1vals
. Call the result tPowers
.tPowers
against p1vals
as a line. Interpret the curve you see.
We have seen how the power of the test changes as the bias in the alternative hypothesis changes. The other important question is how big a sample size is required to get a large power for the same significance.
Suppose we use our original hypotheses \(H_0\) and \(H_1\). We are now going to combine all of the steps taken in the previous sections in order to compute the power for different sample sizes.
n
, the number of die rolls, as an argument, and that finds the smallest \(x^*\) such that the significance level is not greater than 0.05, then uses that \(x^*\) to compute the power of the test with respect to the null hypothesis \(H_1\). Call this function powerFun2
.power_vec
.power_vec
against the sample sizes \(n\).n
such that the significance level is not greater than 0.05 and the power is \(\geq\) 0.9. Call this value nstar
. Warning: Don’t try values of n bigger than around 5000 as these could take a long time to compute.
n = nstar
calculate \(x^*\) using the first few lines of your function.
nstar
rolls from the die biased in the way described under \(H_1\). Count the number of 6s in the experiment.