If we abandon the assumption of equal variances, then our first sample is i.i.d. \({\mathcal{N}\left({\mu_X, \sigma_X^2}\right)}\) and the second sample i.i.d. \({\mathcal{N}\left({\mu_Y, \sigma_Y^2}\right)}\), with \(\sigma_X\) not necessarily equal to \(\sigma_Y\). Clearly, there is no single variance parameter to estimate, and the test which uses the pooled sample variance would not be appropriate. We won't cover the details of the theory for this in lectures (nor is it examinable), but the idea is straightforward and easy to apply. Without the equal variance assumption, our test statistic becomes \[ t = \frac{(\bar{x}-\bar{y}) - (\mu_X - \mu_Y)}{\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}}}, \] which still has (approximately) a \(t\) distribution. The problem arises in determining the appropriate degrees of freedom for the distribution of \(t\). The degrees of freedom of the \(t\) distribution is no longer \(n+m-2\), but instead is approximated by this hideous expression \[ \nu \approx \frac{\left(\frac{s^2_X}{n}+\frac{s^2_Y}{m}\right)^2}{\frac{s^4_X}{n^2(n-1)} + \frac{s^4_Y}{m^2(m-1)}}, \] though often in practice we take the lazier route of using \[\nu=\min(n,m)-1\] as the degrees of freedom (this simpler case corresponds to a conservative version of the test).
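As a sanity check on the formulas above, the Welch statistic and its approximate degrees of freedom can be computed directly; the samples below are simulated purely for illustration.

```r
# Welch t-statistic and Satterthwaite approximate degrees of freedom,
# computed by hand on two simulated samples with unequal variances.
set.seed(1)
x <- rnorm(12, mean = 5, sd = 2)   # n = 12
y <- rnorm(15, mean = 4, sd = 3)   # m = 15
n <- length(x); m <- length(y)

se2    <- var(x)/n + var(y)/m                 # s_X^2/n + s_Y^2/m
t.stat <- (mean(x) - mean(y)) / sqrt(se2)     # under H0: mu_X = mu_Y
nu     <- se2^2 / (var(x)^2/(n^2*(n-1)) + var(y)^2/(m^2*(m-1)))
p.val  <- 2 * pt(-abs(t.stat), df = nu)       # two-sided p-value

c(t = t.stat, df = nu, p = p.val)
```

These values agree with what R's own `t.test(x, y)` reports, since the Welch test is its default behaviour.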
Write your own version of R's `t.test` function. Give it an `equalvariance` argument that can be `TRUE` or `FALSE` and will adjust the test performed. You will need to use an `if` statement to handle the different cases. Use the `print` and `cat` functions to get R to print output in the console. Use this to show the test statistic, degrees of freedom, and p-value.
Load the `immer` data set from the `MASS` package, which contains pairs of measurements of the barley yield from the same fields in years 1931 (`Y1`) and 1932 (`Y2`). Carry out a paired \(t\)-test with the `t.test` function using the argument `paired=TRUE`.
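Assuming the `MASS` package is installed, the paired test can be run as follows:

```r
library(MASS)   # provides the immer barley-yield data

# Paired t-test of the 1931 yields (Y1) against the 1932 yields (Y2)
t.test(immer$Y1, immer$Y2, paired = TRUE)

# Equivalent by hand: a one-sample t-test on the within-field differences
t.test(immer$Y1 - immer$Y2)
```

The two calls give identical statistics, since a paired test is just a one-sample test on the differences.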
You may find the `qsignrank` and `psignrank` functions useful to find critical values and \(p\)-values for the signed rank test statistic.
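For example, with an illustrative sample size and observed statistic:

```r
# Null distribution of the Wilcoxon signed-rank statistic V for n = 10 pairs.
n <- 10

# The 2.5% quantile of V under the null (a lower critical value
# for a two-sided test at the 5% level)
qsignrank(0.025, n)

# Two-sided p-value for a hypothetical observed statistic v = 8:
# twice the smaller of P(V <= v) and P(V >= v).
v <- 8
2 * min(psignrank(v, n), 1 - psignrank(v - 1, n))
```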
## The `stats` package

While we could write our own code every time we want to do a \(t\)-test or rank-sum test, this gets rather tedious rather quickly. Thankfully, these tests are supported by the `stats` package in R, which allows us to pass the problem of computing the test to a pre-defined function; we can then simply interpret the results.
Use the `library` function to load the R package `stats`. Look up the `t.test` function and apply it to your two samples `A` and `B`. Use the optional argument `var.equal` to perform an equal-variance test (`TRUE`) or a test without this assumption (`FALSE`). Compare with your results from Section 2.
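For concreteness, here are both variants on two simulated samples standing in for `A` and `B` (your samples from earlier sections will differ):

```r
# Illustrative samples; substitute your own A and B from earlier sections.
set.seed(42)
A <- rnorm(10, mean = 1, sd = 1)
B <- rnorm(14, mean = 0, sd = 2)

t.test(A, B, var.equal = TRUE)    # pooled-variance test, df = n + m - 2 = 22
t.test(A, B, var.equal = FALSE)   # Welch test; this is also t.test's default
```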
Next, try the `wilcox.test` function on `A` and `B`. The optional argument `exact=TRUE` will compute the test exactly, whereas `exact=FALSE` will use a Normal approximation. Do the results agree with your calculations?
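A quick sketch, again with simulated samples standing in for `A` and `B`:

```r
# Illustrative samples; substitute your own A and B.
set.seed(42)
A <- rnorm(10, mean = 1, sd = 1)
B <- rnorm(14, mean = 0, sd = 2)

wilcox.test(A, B, exact = TRUE)                     # exact null distribution of W
wilcox.test(A, B, exact = FALSE)                    # Normal approximation
wilcox.test(A, B, exact = FALSE, correct = FALSE)   # ... without continuity correction
```

The test statistic W is the same in every case; only the \(p\)-value calculation changes, and for samples of this size the exact and approximate \(p\)-values should be close.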