1 Independent \(t\)-test: Relaxing the equal variance assumption

If we abandon the assumption of equal variances, then our first sample is i.i.d. \({\mathcal{N}\left({\mu_X, \sigma_X^2}\right)}\) and the second sample is i.i.d. \({\mathcal{N}\left({\mu_Y, \sigma_Y^2}\right)}\), with \(\sigma_X\) not necessarily equal to \(\sigma_Y\). There is no longer a single variance parameter to estimate, so the test which uses the pooled sample variance is not appropriate. We won't cover the detail of the theory for this in lectures (nor is it examinable), but the idea is straightforward and easy to apply. Without the equal variance assumption, our test statistic becomes \[ t = \frac{(\bar{x}-\bar{y}) - (\mu_X - \mu_Y)}{\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}}}, \] where \(n\) and \(m\) are the sizes of the first and second samples. This statistic no longer has an exact \(t\) distribution, but it is well approximated by one. The problem arises in determining the appropriate degrees of freedom for the distribution of \(t\). The degrees of freedom of the \(t\) distribution is no longer \(n+m-2\), but instead is approximated by this hideous expression (the Welch-Satterthwaite approximation) \[ \nu \approx \frac{\left(\frac{s^2_X}{n}+\frac{s^2_Y}{m}\right)^2}{\frac{s^4_X}{n^2(n-1)} + \frac{s^4_Y}{m^2(m-1)}}, \] though often in practice we take the lazier route of using \[\nu=\min(n,m)-1\] as the degrees of freedom (this simpler choice gives a conservative version of the test).

  • Apply the independent sample \(t\)-test with unequal variances to compare the two groups.
  • What do you conclude? Do the results agree with or contradict the equal-variance test?
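
As a rough illustration of the formulas above, the following sketch computes the unequal-variance test statistic, both choices of degrees of freedom, and the corresponding two-sided \(p\)-values. The vectors x and y are simulated stand-ins (an assumption made purely for illustration, not the samples used in this practical), and the null hypothesis is taken to be \(\mu_X - \mu_Y = 0\).

```r
# Hypothetical data: simulated stand-ins, not the worksheet's two groups
set.seed(1)
x <- rnorm(12, mean = 10, sd = 1)
y <- rnorm(15, mean = 11, sd = 3)

n <- length(x)
m <- length(y)
vx <- var(x) / n   # s_X^2 / n
vy <- var(y) / m   # s_Y^2 / m

# Test statistic under H0: mu_X - mu_Y = 0
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom;
# note that s_X^4 / (n^2 (n - 1)) is the same as vx^2 / (n - 1)
nu_welch <- (vx + vy)^2 / (vx^2 / (n - 1) + vy^2 / (m - 1))

# The "lazier" conservative choice
nu_cons <- min(n, m) - 1

# Two-sided p-values under each choice of degrees of freedom
p_welch <- 2 * pt(-abs(t_stat), df = nu_welch)
p_cons  <- 2 * pt(-abs(t_stat), df = nu_cons)

cat("t =", t_stat, "\n")
cat("Welch df =", nu_welch, " p-value =", p_welch, "\n")
cat("Conservative df =", nu_cons, " p-value =", p_cons, "\n")
```

The approximate degrees of freedom always lie between \(\min(n,m)-1\) and \(n+m-2\), so the conservative choice can only make the \(p\)-value larger, never smaller.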

2 Creating your own functions

  • Use your code to construct your own version of the t.test function.
    • Add an optional equalvariance argument that can be TRUE or FALSE and will adjust the test performed. You will need to use an if statement to handle the different cases.
    • You can use the print and cat functions to get R to print output in the console. Use this to show the test statistic, degrees of freedom, and p-value.
  • Write a function to perform a paired sample t-test.
    • Try it out on the immer data set from the MASS package, which contains pairs of measurements of the barley yield from the same fields in years 1931 (Y1) and 1932 (Y2).
    • Check your results with the t.test function using the argument paired=TRUE.
  • Use your rank sum test code above to construct a function that performs the signed rank test given two paired samples of data.
    • Compare your results to the paired-sample t-test above using the same data set.
    • You will find the qsignrank and psignrank functions useful to find critical values and \(p\)-values for the signed rank test statistic.
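
One possible shape for these functions is sketched below; treat it as a starting point rather than a model answer. The names my_t_test, my_paired_t_test, my_signed_rank and report are hypothetical, as are the small vectors y1 and y2, which are made-up paired measurements rather than the immer data. All tests are two-sided, and the signed rank \(p\)-value is exact only when there are no ties among the absolute differences.

```r
# Helper (hypothetical name): print the results and return them invisibly
report <- function(stat, df, p) {
  cat("statistic =", stat, " df =", df, " p-value =", p, "\n")
  invisible(list(statistic = stat, df = df, p.value = p))
}

# Two-sample t-test with an optional equal-variance assumption
my_t_test <- function(x, y, equalvariance = TRUE) {
  n <- length(x)
  m <- length(y)
  if (equalvariance) {
    # Pooled sample variance with n + m - 2 degrees of freedom
    sp2 <- ((n - 1) * var(x) + (m - 1) * var(y)) / (n + m - 2)
    se <- sqrt(sp2 * (1 / n + 1 / m))
    df <- n + m - 2
  } else {
    # Separate variances with approximate (Welch-Satterthwaite) df
    vx <- var(x) / n
    vy <- var(y) / m
    se <- sqrt(vx + vy)
    df <- (vx + vy)^2 / (vx^2 / (n - 1) + vy^2 / (m - 1))
  }
  t_stat <- (mean(x) - mean(y)) / se
  report(t_stat, df, 2 * pt(-abs(t_stat), df))
}

# Paired t-test: a one-sample t-test on the within-pair differences
my_paired_t_test <- function(x, y) {
  d <- x - y
  n <- length(d)
  t_stat <- mean(d) / (sd(d) / sqrt(n))
  report(t_stat, n - 1, 2 * pt(-abs(t_stat), n - 1))
}

# Signed rank test: sum of the ranks of the positive differences
my_signed_rank <- function(x, y) {
  d <- x - y
  d <- d[d != 0]                 # discard zero differences
  n <- length(d)
  v <- sum(rank(abs(d))[d > 0])
  p_lower <- psignrank(v, n)                          # P(V <= v)
  p_upper <- psignrank(v - 1, n, lower.tail = FALSE)  # P(V >= v)
  report(v, NA, min(1, 2 * min(p_lower, p_upper)))
}

# Hypothetical paired measurements (not the immer data)
y1 <- c(12.1, 9.8, 11.4, 10.3, 13.7, 8.9, 10.9, 12.6)
y2 <- c(11.2, 9.9, 10.1, 10.0, 12.2, 9.3, 9.2, 11.9)

res_pooled <- my_t_test(y1, y2, equalvariance = TRUE)
res_welch  <- my_t_test(y1, y2, equalvariance = FALSE)
res_paired <- my_paired_t_test(y1, y2)
res_rank   <- my_signed_rank(y1, y2)
```

With the immer data, the same calls would take immer$Y1 and immer$Y2 as arguments after loading the MASS package with library(MASS).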

3 Doing it the easy way: Using the stats package

While we could write our own code every time we want to do a \(t\)-test or rank-sum test, this gets rather tedious rather quickly. Thankfully, these tests are supported by the stats package in R, which allows us to pass the problem of computing the test to a pre-defined function, so that we can simply interpret the results.

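  • Use the library function to load the R package stats. (It is attached by default in a standard R session, so the call is harmless but not strictly needed.)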
  • Read the techniques page on the t.test function and apply it to your two samples A and B. Set the optional argument var.equal=TRUE to perform the equal-variance test, and var.equal=FALSE to test without this assumption. Compare with your results from Section 2.
  • Read the help for the wilcox.test function and try it on A and B. The optional argument exact=TRUE will compute the test exactly, whereas exact=FALSE will use a Normal approximation. Do the results agree with your calculations?
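
The calls described above might look like the following sketch. The vectors A and B here are simulated placeholders (an assumption for illustration only), so substitute your own two samples; the names res_pooled, res_welch, res_exact and res_approx are likewise arbitrary.

```r
library(stats)  # attached by default in a standard session; repeating it is harmless

# Hypothetical stand-ins for the samples A and B
set.seed(42)
A <- rnorm(10, mean = 5, sd = 1)
B <- rnorm(12, mean = 6, sd = 2)

# t-tests with and without the equal-variance assumption
res_pooled <- t.test(A, B, var.equal = TRUE)
res_welch  <- t.test(A, B, var.equal = FALSE)  # this is the default

# Rank-sum tests: exact p-value versus Normal approximation
res_exact  <- wilcox.test(A, B, exact = TRUE)
res_approx <- wilcox.test(A, B, exact = FALSE)

# Each result is a list; individual components can be extracted by name
p_values <- c(pooled = res_pooled$p.value,
              welch  = res_welch$p.value,
              exact  = res_exact$p.value,
              approx = res_approx$p.value)
print(p_values)
print(res_welch$parameter)  # degrees of freedom used by the unequal-variance test
```

Printing any of the result objects directly (for example res_welch) shows the full summary, including the test statistic, degrees of freedom, \(p\)-value and confidence interval where one is available. Note that with exact=FALSE, wilcox.test applies a continuity correction by default, which is worth bearing in mind when comparing against a hand calculation.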