Testing a sample mean for difference from a "true" mean

When calibrating or comparing a scientific instrument or measurement method of some kind, we want to be able to answer the question "Does an observed sample mean differ from the "true" mean in any significant way?" If it does, then we have evidence of a systematic difference. This question can be answered with a Student's t test: more information can be found on the NIST site.

Of course, the assignment of "true" to one mean may be quite arbitrary; often it is simply the value given by a "traditional" method of measurement.

The following example code is taken from the example program students_t_single_sample.cpp.

We'll begin by defining a procedure to determine which of the possible hypotheses are rejected or not-rejected at a given significance level:

Note

Non-statisticians might say 'not-rejected' means 'accepted' (often of the null-hypothesis), implying, wrongly, that there really IS no difference, but statisticians eschew this to avoid implying that there is positive evidence of 'no difference'. 'Not-rejected' here means there is no evidence of difference, but there still might well be a difference. For example, see argument from ignorance and Absence of evidence does not constitute evidence of absence.

// Needed includes:
#include <boost/math/distributions/students_t.hpp>
#include <iostream>
#include <iomanip>
// Bring everything into global namespace for ease of use:
using namespace boost::math;
using namespace std;

void single_sample_t_test(double M, double Sm, double Sd, unsigned Sn, double alpha)
{
   //
   // M = true mean.
   // Sm = Sample Mean.
   // Sd = Sample Standard Deviation.
   // Sn = Sample Size.
   // alpha = Significance Level.

Most of the procedure is pretty-printing, so let's just focus on the calculation. We begin by calculating the t-statistic:

// Difference in means:
double diff = Sm - M;
// Degrees of freedom:
unsigned v = Sn - 1;
// t-statistic:
double t_stat = diff * sqrt(double(Sn)) / Sd;
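As a quick check of the formula, the same calculation can be wrapped in a small standalone function. This is only a sketch: the name `t_statistic` is hypothetical and is not part of the example program, and the figures in the closing comment are the rounded mercury summary statistics that appear in the sample output of this section.

```cpp
#include <cmath>

// Hypothetical helper (not part of students_t_single_sample.cpp):
// t-statistic for a single sample tested against a "true" mean.
//   M  = true mean.
//   Sm = sample mean.
//   Sd = sample standard deviation.
//   Sn = sample size.
double t_statistic(double M, double Sm, double Sd, unsigned Sn)
{
   double diff = Sm - M;                       // difference in means
   return diff * std::sqrt(double(Sn)) / Sd;   // same as diff / (Sd / sqrt(Sn))
}

// Example: t_statistic(38.9, 37.8, 0.96437, 3) is approximately -1.9756.
```

Dividing by `Sd / sqrt(Sn)` expresses the difference in units of the standard error of the mean, which is why a larger sample makes the same difference more significant.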

Finally, we calculate the probability from the t-statistic. If we're interested simply in whether there is a difference (either less or greater) or not, we don't care about the sign of the t-statistic, and we take the complement of the probability for comparison to the significance level:

students_t dist(v);
double q = cdf(complement(dist, fabs(t_stat)));
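In symbols, the two steps above can be written as follows. The notation is introduced here purely for illustration: F_ν stands for the CDF of Student's t distribution with ν = Sn - 1 degrees of freedom, and T for a random variable with that distribution.

```latex
t = \frac{(S_m - M)\,\sqrt{S_n}}{S_d},
\qquad
q = 1 - F_\nu\bigl(|t|\bigr) = P\bigl(T > |t|\bigr)
```

Because the t distribution is symmetric about zero, the probability of a deviation at least this large in either direction is 2q, so comparing q with alpha / 2 is equivalent to comparing 2q with alpha.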

The procedure then prints out the results of the various tests that can be done; these can be summarised in the following table:

Hypothesis	Test
The Null-hypothesis: there is no difference in means	Reject if complement of CDF for |t| < significance level / 2: `cdf(complement(dist, fabs(t))) < alpha / 2`
The Alternative-hypothesis: there is difference in means	Reject if complement of CDF for |t| > significance level / 2: `cdf(complement(dist, fabs(t))) > alpha / 2`
The Alternative-hypothesis: the sample mean is less than the true mean.	Reject if complement of CDF of t < significance level: `cdf(complement(dist, t)) < alpha`
The Alternative-hypothesis: the sample mean is greater than the true mean.	Reject if CDF of t < significance level: `cdf(dist, t) < alpha`

	Note
	Notice that the comparisons are against `alpha / 2` for a two-sided test and against `alpha` for a one-sided test.
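The rules in the table can be sketched as a small decision function. This is an illustration under stated assumptions rather than the code of the example program: the names `t_test_decisions` and `apply_table` are hypothetical, and the two tail probabilities are assumed to have been computed beforehand as `cdf(dist, t)` and `cdf(complement(dist, t))`.

```cpp
#include <algorithm>

// Hypothetical result type: true means the hypothesis is rejected.
struct t_test_decisions
{
   bool null_rejected;        // "there is no difference in means"
   bool not_equal_rejected;   // alternative: "there is a difference in means"
   bool less_rejected;        // alternative: "sample mean is less than the true mean"
   bool greater_rejected;     // alternative: "sample mean is greater than the true mean"
};

// lower_tail = cdf(dist, t) and upper_tail = cdf(complement(dist, t)),
// both assumed to be supplied by the caller.
t_test_decisions apply_table(double lower_tail, double upper_tail, double alpha)
{
   // The t distribution is symmetric, so the smaller of the two tails
   // equals cdf(complement(dist, fabs(t))).
   double q = std::min(lower_tail, upper_tail);

   t_test_decisions d;
   d.null_rejected      = q < alpha / 2;       // two-sided: compare with alpha / 2
   d.not_equal_rejected = q > alpha / 2;       // two-sided: compare with alpha / 2
   d.less_rejected      = upper_tail < alpha;  // one-sided: compare with alpha
   d.greater_rejected   = lower_tail < alpha;  // one-sided: compare with alpha
   return d;
}
```

For instance, with tail probabilities of roughly 0.093 and 0.907 (a t-statistic near -1.98 with 2 degrees of freedom), `apply_table(0.093, 0.907, 0.05)` rejects only the "difference in means" alternative, while raising alpha to 0.1 additionally rejects the "greater than" alternative.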

Now that we have all the parts in place, let's take a look at some sample output, starting with the heat flow data from the NIST site. The data set was collected by Bob Zarr of NIST in January, 1990 from a heat flow meter calibration and stability analysis. The corresponding Dataplot output for this test can be found in section 3.5.2 of the NIST/SEMATECH e-Handbook of Statistical Methods.

__________________________________
Student t test for a single sample
__________________________________

Number of Observations                                 =  195
Sample Mean                                            =  9.26146
Sample Standard Deviation                              =  0.02279
Expected True Mean                                     =  5.00000

Sample Mean - Expected Test Mean                       =  4.26146
Degrees of Freedom                                     =  194
T Statistic                                            =  2611.28380
Probability that difference is due to chance           =  0.000e+000

Results for Alternative Hypothesis and alpha           =  0.0500

Alternative Hypothesis     Conclusion
Mean != 5.000            NOT REJECTED
Mean  < 5.000            REJECTED
Mean  > 5.000            NOT REJECTED

You will note the line that says the probability that the difference is due to chance is zero. From a philosophical point of view, of course, the probability can never reach zero. However, in this case the calculated probability is smaller than the smallest representable double precision number, hence the appearance of a zero here. Whatever its "true" value is, we know it must be extraordinarily small, so the alternative hypothesis - that there is a difference in means - is not rejected.
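The underflow behaviour described above can be seen in isolation with the standard library alone. This is a sketch with hypothetical function names, and `exp(-1000)` is only a stand-in for a vanishingly small tail probability, not the value the example program computes.

```cpp
#include <cmath>
#include <limits>

// Smallest positive double (a subnormal value, about 4.9e-324 for
// IEEE 754 double precision).
double smallest_positive_double()
{
   return std::numeric_limits<double>::denorm_min();
}

// Stand-in for a tiny probability: exp(-1000) is about 5e-435
// mathematically, far below the value above, so it is stored as zero.
double underflowed_value()
{
   return std::exp(-1000.0);
}
```

Any positive result that falls far enough below `smallest_positive_double()` is rounded to exactly `0.0`, which is what the printed `0.000e+000` reflects.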

For comparison, the next example output uses data taken from P. K. Hou, O. W. Lau & M. C. Wong, Analyst (1983) vol. 108, p 64, and from J. C. Miller and J. N. Miller, Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55, Ellis Horwood, ISBN 0 13 0309907. The values result from the determination of mercury by cold-vapour atomic absorption.

__________________________________
Student t test for a single sample
__________________________________

Number of Observations                                 =  3
Sample Mean                                            =  37.80000
Sample Standard Deviation                              =  0.96437
Expected True Mean                                     =  38.90000

Sample Mean - Expected Test Mean                       =  -1.10000
Degrees of Freedom                                     =  2
T Statistic                                            =  -1.97566
Probability that difference is due to chance           =  1.869e-001

Results for Alternative Hypothesis and alpha           =  0.0500

Alternative Hypothesis     Conclusion
Mean != 38.900            REJECTED
Mean  < 38.900            NOT REJECTED
Mean  > 38.900            NOT REJECTED

As you can see, the small number of measurements (3) has led to a large uncertainty in the location of the true mean. So even though there appears to be a difference between the sample mean and the expected true mean, we conclude that there is no significant difference, and are unable to reject the null hypothesis. However, if we were to lower the bar for acceptance to alpha = 0.1 (a 90% confidence level), we see a different output:

__________________________________
Student t test for a single sample
__________________________________

Number of Observations                                 =  3
Sample Mean                                            =  37.80000
Sample Standard Deviation                              =  0.96437
Expected True Mean                                     =  38.90000

Sample Mean - Expected Test Mean                       =  -1.10000
Degrees of Freedom                                     =  2
T Statistic                                            =  -1.97566
Probability that difference is due to chance           =  1.869e-001

Results for Alternative Hypothesis and alpha           =  0.1000

Alternative Hypothesis     Conclusion
Mean != 38.900            REJECTED
Mean  < 38.900            NOT REJECTED
Mean  > 38.900            REJECTED

In this case, we really have a borderline result, and more data (and/or more accurate data) are needed for a more convincing conclusion.