
Complements are supported too - and when to use them

Often you don't want the value of the CDF, but its complement, which is to say 1-p rather than p. It is tempting to calculate the CDF and subtract it from 1, but if p is very close to 1 then cancellation error will cause you to lose accuracy, perhaps totally.

See below: "Why bother with complements anyway?"

In this library, whenever you want to receive a complement, just wrap all the function arguments in a call to complement(...), for example:

students_t dist(5);
cout << "CDF at t = 1 is " << cdf(dist, 1.0) << endl;
cout << "Complement of CDF at t = 1 is " << cdf(complement(dist, 1.0)) << endl;

But wait, now that we have a complement, we need to be able to use it as well: any function that accepts a probability as an argument will also accept the complement of that probability, again by wrapping all of its arguments in a call to complement(...), for example:

students_t dist(5);

for(double i = 10; i < 1e10; i *= 10)
{
   // Calculate the quantile for a 1 in i chance:
   double t = quantile(complement(dist, 1/i));
   // Print it out:
   cout << "Quantile of students-t with 5 degrees of freedom\n"
           "for a 1 in " << i << " chance is " << t << endl;
}
[Tip] Critical values are just quantiles

Some texts talk about quantiles, percentiles, or fractiles; others talk about critical values. The basic rule is:

The lower critical value is the same as the quantile.

The upper critical value is the quantile computed from the complement of the probability.

For example, suppose we have a Bernoulli process, giving rise to a binomial distribution with success ratio 0.1 and 100 trials in total. The lower critical value for a probability of 0.05 is given by:

quantile(binomial(100, 0.1), 0.05)

and the upper critical value is given by:

quantile(complement(binomial(100, 0.1), 0.05))

which return 4.82 and 14.63 respectively.
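Here is a compilable sketch of those two calls. Note that for a discrete distribution such as the binomial, the library's default discrete-quantile rounding policy may return whole numbers of successes rather than the real-valued 4.82 and 14.63 quoted above:

#include <boost/math/distributions/binomial.hpp> // for binomial, quantile, complement
#include <iostream>

int main()
{
   using namespace boost::math;

   binomial dist(100, 0.1);   // 100 trials, success ratio 0.1

   // Lower critical value at probability 0.05:
   std::cout << quantile(dist, 0.05) << std::endl;
   // Upper critical value: quantile of the complement of the probability:
   std::cout << quantile(complement(dist, 0.05)) << std::endl;
}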

[Tip] Why bother with complements anyway?

It's very tempting to dispense with complements, and simply subtract the probability from 1 when required. However, consider what happens when the probability is very close to 1. Say the probability expressed at float precision is 0.999999940f; then 1 - 0.999999940f = 5.96046448e-008, but the result is accurate to just a single bit: the only bit that didn't cancel out!
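A short sketch of that arithmetic at float precision, using the literals above:

#include <iostream>
#include <iomanip>

int main()
{
   float p = 0.999999940f;
   float q = 1.0f - p;   // every leading bit cancels; only one significant bit survives

   std::cout << std::setprecision(9) << q << std::endl;   // 5.96046448e-08
}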

Or to look at this another way: consider that we want the risk of falsely rejecting the null hypothesis in the Student's t test to be 1 in 1 billion, for a sample size of 10,000. This gives a probability of 1 - 10⁻⁹, which is exactly 1 when calculated at float precision. In this case calculating the quantile from the complement neatly solves the problem, so for example:

quantile(complement(students_t(10000), 1e-9))

returns the expected t-statistic 6.00336, whereas:

quantile(students_t(10000), 1-1e-9f)

raises an overflow error, since it is the same as:

quantile(students_t(10000), 1)

which has no finite result.
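Here is a minimal sketch contrasting the two calls, assuming the library's default error-handling policy, which reports the overflow by throwing std::overflow_error:

#include <boost/math/distributions/students_t.hpp>
#include <iostream>
#include <stdexcept>

int main()
{
   using namespace boost::math;

   students_t dist(10000);

   // Well conditioned: the tiny tail probability is passed directly.
   std::cout << quantile(complement(dist, 1e-9)) << std::endl;   // approx. 6.00336

   // 1 - 1e-9f rounds to exactly 1 at float precision, so this asks for the
   // quantile at the pole of the distribution:
   try
   {
      std::cout << quantile(dist, 1 - 1e-9f) << std::endl;
   }
   catch(const std::overflow_error& e)
   {
      std::cout << "overflow: " << e.what() << std::endl;
   }
}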

With all distributions, even for more moderate probabilities, the loss of accuracy quickly becomes significant if you simply compute the complement as 1 - p (unless the value of p happens to be exactly representable in the floating-point type), because for p close to 1 the result consists mostly of garbage digits.

So always avoid, for example, using a probability close to unity such as 0.99999

quantile(my_distribution, 0.99999)

and instead use

quantile(complement(my_distribution, 0.00001))

since 1 - 0.99999 is not exactly equal to 0.00001 when using floating-point arithmetic.

This assumes that the 0.00001 value is either a constant, or can be computed by some means other than subtracting 0.99999 from 1.
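As a sketch of that advice (using the standard normal distribution purely for illustration, since my_distribution above is a placeholder):

#include <boost/math/distributions/normal.hpp>
#include <iostream>
#include <iomanip>

int main()
{
   using namespace boost::math;

   normal dist;   // standard normal, chosen only as an example

   std::cout << std::setprecision(17);
   // Preferred: pass the small tail probability directly.
   std::cout << quantile(complement(dist, 0.00001)) << std::endl;
   // Avoid: 0.99999 is not exactly representable, so the tail probability
   // implied by 1 - 0.99999 already carries rounding error.
   std::cout << quantile(dist, 0.99999) << std::endl;
}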

