Floating-point Comparison

Comparison of floating-point values has always been a source of endless difficulty and confusion.

Unlike integral values, which are exact, all floating-point operations can potentially produce an inexact result that is rounded to the nearest available binary representation. Even apparently innocuous operations such as assigning 0.1 to a double produce an inexact result (as this decimal number has no exact binary representation).

Floating-point computations also involve rounding, so some 'computational noise' is added and results are not exact (although they are repeatable, at least on identical platforms with identical compile options).

Sadly, this conflicts with the expectation of most users, as many articles and innumerable cries for help show all too well.

Some background reading:

Donald E. Knuth, The Art of Computer Programming, Vol. II, section 4.2, especially Floating-Point Comparison, 4.2.2, pages 198-220.
David Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic".
Alberto Squassabia, Comparing Floats listing.
Google Floating-Point Comparison guide.
Boost.Test Floating-Point Comparison.

Boost provides a number of ways to compare floating-point values to see if they are tolerably close to each other, but first we must decide what kind of comparison we require:

Absolute difference/error: the absolute difference between two values a and b is simply fabs(a-b). This is the only meaningful comparison to make if we know that the result may have cancellation error (see below).
The edit distance between the two values: i.e. how many (binary) floating-point values are between two values a and b? This is provided by the function Boost.Math float_distance, but is probably only useful when you know that the distance should be very small. This function is somewhat difficult to compute, and doesn't scale to values that are very far apart. In other words, use with care.
The relative distance/error between two values. This is quick and easy to compute, and is generally the method of choice when checking that your results are "tolerably close" to one another. However, it is not as exact as the edit distance when dealing with small differences and, due to the way floating-point values are encoded, can "wobble" by a factor of 2 compared with the "true" edit distance. This is the method documented below: if float_distance is a surgeon's scalpel, then relative_difference is more like a Swiss army knife; both have important but different use cases.
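As a rough illustration of the three kinds of comparison, here is a minimal sketch using only the standard library. The function names absolute_difference, step_distance and naive_relative_difference are hypothetical, not Boost APIs. The step count walks one representation at a time with std::nextafter, which is why this approach only suits nearby values, and the relative measure has none of the special-case handling for zeros, infinities, NaNs or sign changes that Boost applies.

```cpp
#include <cmath>

// Absolute difference |a - b|: the measure to use when cancellation is expected.
double absolute_difference(double a, double b)
{
    return std::fabs(a - b);
}

// Edit distance: how many representable doubles we step over when walking from
// the smaller value to the larger one with std::nextafter. One loop iteration per
// representation, so only sensible for close, finite values.
long long step_distance(double a, double b)
{
    double lo = std::fmin(a, b);
    const double hi = std::fmax(a, b);
    long long steps = 0;
    while (lo < hi)
    {
        lo = std::nextafter(lo, hi);
        ++steps;
    }
    return steps;
}

// Relative difference: |a - b| divided by the smaller magnitude.
// No handling of zeros, denormals, infinities, NaNs or sign changes.
double naive_relative_difference(double a, double b)
{
    return std::fabs(a - b) / std::fmin(std::fabs(a), std::fabs(b));
}
```

For example, 1 + 2^-50 lies four doubles above 1, so step_distance reports 4 for that pair, while naive_relative_difference(1.0, 1.0 + 2^-52) is exactly 2^-52, the double epsilon.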

Relative Comparison of Floating-point Values

#include <boost/math/special_functions/relative_difference.hpp>

template <class T, class U>
calculated-result-type relative_difference(T a, U b);

template <class T, class U>
calculated-result-type epsilon_difference(T a, U b);

The function relative_difference returns the relative distance/error E between two values as defined by:

E = fabs((a - b) / min(fabs(a), fabs(b)))

The function epsilon_difference is a convenience function that returns relative_difference(a, b) / eps where eps is the machine epsilon for the result type.

Special cases are handled as follows:

If either of a or b is a NaN, then returns the largest representable value for T: for example, for type double this is std::numeric_limits<double>::max(), which is the same as DBL_MAX or 1.7976931348623157e+308.
If a and b differ in sign then returns the largest representable value for T.
If a and b are both infinities (of the same sign), then returns zero.
If just one of a and b is an infinity, then returns the largest representable value for T.
If both a and b are zero then returns zero.
If just one of a or b is a zero or a denormalized value, then it is treated as if it were the smallest (non-denormalized) value representable in T for the purposes of the above calculation.

These rules were primarily designed to assist with our own test suite, but they are robust enough that the function can in most cases be used blindly, including in cases where the expected result is actually too small to represent in type T and underflows to zero.
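The rules above can be sketched in standard C++ as follows. This is a hypothetical, single-type re-implementation written only for illustration: the names sketch_relative_difference and sketch_epsilon_difference are invented here, the order in which the checks are made is an assumption, and it is not Boost's source (which also accepts mixed argument types).

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Hypothetical sketch of the documented special-case rules for one type T.
template <class T>
T sketch_relative_difference(T a, T b)
{
    const T largest = (std::numeric_limits<T>::max)();
    const T smallest_normal = (std::numeric_limits<T>::min)();

    if (std::isnan(a) || std::isnan(b))
        return largest;                     // any NaN: maximally different
    if (std::isinf(a) || std::isinf(b))
        return (a == b) ? T(0) : largest;   // only same-signed infinities match
    if (a == 0 && b == 0)
        return T(0);                        // +0 and -0 compare equal
    // Treat zeros and denormals as the smallest normalized value, keeping the sign.
    if (std::fabs(a) < smallest_normal)
        a = std::copysign(smallest_normal, a);
    if (std::fabs(b) < smallest_normal)
        b = std::copysign(smallest_normal, b);
    if (std::signbit(a) != std::signbit(b))
        return largest;                     // opposite signs
    return std::fabs((a - b) / (std::min)(std::fabs(a), std::fabs(b)));
}

// The same measure in units of machine epsilon, saturating at max().
template <class T>
T sketch_epsilon_difference(T a, T b)
{
    const T largest = (std::numeric_limits<T>::max)();
    const T eps = std::numeric_limits<T>::epsilon();
    const T r = sketch_relative_difference(a, b);
    return (r > largest * eps) ? largest : r / eps;
}
```

Saturating sketch_epsilon_difference at max() mirrors the documented behaviour that a finite value compared with an infinity reports the largest representable value rather than overflowing.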

Examples

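Some include directives and using statements will ensure that the functions we need are accessible.

#include <boost/math/special_functions/relative_difference.hpp> // relative_difference, epsilon_difference
#include <boost/math/special_functions/next.hpp> // float_distance, float_next, float_prior
#include <iostream>
#include <limits>

using namespace boost::math;

using boost::math::relative_difference;
using boost::math::epsilon_difference;
using boost::math::float_distance;
using boost::math::float_next;
using boost::math::float_prior;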

The following examples display values with all possibly significant digits. Newer compilers should provide std::numeric_limits<FPT>::max_digits10 for this purpose, and here we use float precision, where max_digits10 = 9, to avoid displaying a distracting number of decimal digits.

	Note
	Older compilers can use this formula to calculate `max_digits10` from the number of binary digits `std::numeric_limits<FPT>::digits`: `int max_digits10 = 2 + std::numeric_limits<FPT>::digits * 3010/10000;`
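As a quick check of that formula, here is a sketch assuming IEEE-754 float and double (24 and 53 binary digits); the helper name estimated_max_digits10 is hypothetical. The constant 3010/10000 approximates log10(2), and under integer division the formula gives 2 + 72240/10000 = 9 for float and 2 + 159530/10000 = 17 for double.

```cpp
#include <limits>

// Estimate max_digits10 from the number of binary digits, as the note suggests
// for compilers that lack numeric_limits<FPT>::max_digits10.
template <class FPT>
constexpr int estimated_max_digits10()
{
    return 2 + std::numeric_limits<FPT>::digits * 3010 / 10000;
}

// On IEEE-754 platforms the estimate matches the standard values.
static_assert(estimated_max_digits10<float>() == std::numeric_limits<float>::max_digits10,
              "float estimate should be 9");
static_assert(estimated_max_digits10<double>() == std::numeric_limits<double>::max_digits10,
              "double estimate should be 17");
```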

We can make the output show all trailing zeros (helpful for this example to show all potentially significant digits), and also display bool values as words rather than integers:

std::cout.precision(std::numeric_limits<float>::max_digits10);
std::cout << std::boolalpha << std::showpoint << std::endl;
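To see what those stream settings do, the sketch below applies the same precision and manipulators to a std::ostringstream; the function name format_like_examples is hypothetical. With 9 significant digits and showpoint, 1.0f is written as 1.00000000 rather than 1, the same style used in the outputs of these examples.

```cpp
#include <ios>
#include <limits>
#include <sstream>
#include <string>

// Format a float and a bool with max_digits10 significant digits,
// trailing zeros kept (showpoint) and bools written as words (boolalpha).
std::string format_like_examples(float x, bool flag)
{
    std::ostringstream os;
    os.precision(std::numeric_limits<float>::max_digits10);
    os << std::boolalpha << std::showpoint << x << ' ' << flag;
    return os.str();
}
```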

When comparing values that are quite close or approximately equal, we could use either float_distance or relative_difference/epsilon_difference. For example, with type float these two values are adjacent to each other:

float a = 1;
float b = 1 + std::numeric_limits<float>::epsilon();
std::cout << "a = " << a << std::endl;
std::cout << "b = " << b << std::endl;
std::cout << "float_distance = " << float_distance(a, b) << std::endl;
std::cout << "relative_difference = " << relative_difference(a, b) << std::endl;
std::cout << "epsilon_difference = " << epsilon_difference(a, b) << std::endl;

Which produces the output:

a = 1.00000000
b = 1.00000012
float_distance = 1.00000000
relative_difference = 1.19209290e-007
epsilon_difference = 1.00000000

In the example above, it just so happens that the edit distance as measured by float_distance and the difference measured in units of epsilon are equal. However, due to the way floating-point values are represented, that is not always the case:

a = 2.0f / 3.0f;   // 2/3 inexactly represented as a float
b = float_next(float_next(float_next(a))); // 3 floating point values above a
std::cout << "a = " << a << std::endl;
std::cout << "b = " << b << std::endl;
std::cout << "float_distance = " << float_distance(a, b) << std::endl;
std::cout << "relative_difference = " << relative_difference(a, b) << std::endl;
std::cout << "epsilon_difference = " << epsilon_difference(a, b) << std::endl;

Which produces the output:

a = 0.666666687
b = 0.666666865
float_distance = 3.00000000
relative_difference = 2.68220901e-007
epsilon_difference = 2.25000000
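The value 2.25 can be worked out by hand. Since a ≈ 2/3 lies in [1/2, 1), adjacent floats there are 2^{-24} apart, half of the float epsilon ε = 2^{-23} that separates adjacent values in [1, 2). Three steps therefore span 3·2^{-24}, and dividing by a gives:

```latex
b - a = 3 \cdot 2^{-24}, \qquad
E = \frac{b - a}{a} \approx \frac{3 \cdot 2^{-24}}{2/3}
  = 4.5 \cdot 2^{-24} = 2.25 \cdot 2^{-23} = 2.25\,\varepsilon
```

This agrees with relative_difference = 2.68220901e-007 ≈ 2.25 × 1.19209290e-007 (a is slightly larger than 2/3, so the ratio is fractionally below 2.25).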

There is another important difference between float_distance and the relative_difference/epsilon_difference functions: float_distance returns a signed result that reflects which argument is larger, whereas relative_difference/epsilon_difference simply return a non-negative value that represents how far apart the values are. For example, if we swap the order of the arguments:

std::cout << "float_distance = " << float_distance(b, a) << std::endl;
std::cout << "relative_difference = " << relative_difference(b, a) << std::endl;
std::cout << "epsilon_difference = " << epsilon_difference(b, a) << std::endl;

The output is now:

float_distance = -3.00000000
relative_difference = 2.68220901e-007
epsilon_difference = 2.25000000
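A signed distance like this can also be computed directly from the bit patterns. The sketch below uses a common technique (not necessarily how Boost implements float_distance): map each float's bits to an integer that grows with the float's value, so the difference of two such integers counts the representations between them and is positive when b > a. It assumes 32-bit IEEE-754 float, and the names ordered_bits and signed_float_steps are hypothetical.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "sketch assumes 32-bit IEEE-754 float");

// Map a float's bit pattern to an integer that increases with the float's value.
// Non-negative floats already order correctly by their bits; negative floats
// (sign bit set) map to minus their magnitude bits, so -0 maps to 0 like +0.
std::int64_t ordered_bits(float x)
{
    std::int32_t i;
    std::memcpy(&i, &x, sizeof i);
    if (i < 0)
        return -(static_cast<std::int64_t>(i) + 2147483648LL);
    return i;
}

// Signed number of representable floats from a to b: positive when b > a.
// Only meaningful for finite, non-NaN arguments.
std::int64_t signed_float_steps(float a, float b)
{
    return ordered_bits(b) - ordered_bits(a);
}
```

Unlike stepping with std::nextafter, this takes constant time however far apart the values are; swapping the arguments simply negates the result, just as float_distance(b, a) does above.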

Zeros are always treated as equal, as are infinities as long as they have the same sign:

a = 0;
b = -0.0f;  // signed zero (a plain -0 is the integer 0, which converts to +0.0f)
std::cout << "relative_difference = " << relative_difference(a, b) << std::endl;
a = b = std::numeric_limits<float>::infinity();
std::cout << "relative_difference = " << relative_difference(a, b) << std::endl;
std::cout << "relative_difference = " << relative_difference(a, -b) << std::endl;

Which produces the output:

relative_difference = 0.000000000
relative_difference = 0.000000000
relative_difference = 3.40282347e+038

Note that finite values are always maximally far away from infinities (the largest representable value is returned), even if those finite values are very large:

a = (std::numeric_limits<float>::max)();
b = std::numeric_limits<float>::infinity();
std::cout << "a = " << a << std::endl;
std::cout << "b = " << b << std::endl;
std::cout << "relative_difference = " << relative_difference(a, b) << std::endl;
std::cout << "epsilon_difference = " << epsilon_difference(a, b) << std::endl;

Which produces the output:

a = 3.40282347e+038
b = 1.#INF0000
relative_difference = 3.40282347e+038
epsilon_difference = 3.40282347e+038

Finally, all denormalized values and zeros are treated as being effectively equal:

a = std::numeric_limits<float>::denorm_min();
b = a * 2;
std::cout << "a = " << a << std::endl;
std::cout << "b = " << b << std::endl;
std::cout << "float_distance = " << float_distance(a, b) << std::endl;
std::cout << "relative_difference = " << relative_difference(a, b) << std::endl;
std::cout << "epsilon_difference = " << epsilon_difference(a, b) << std::endl;
a = 0;
std::cout << "a = " << a << std::endl;
std::cout << "b = " << b << std::endl;
std::cout << "float_distance = " << float_distance(a, b) << std::endl;
std::cout << "relative_difference = " << relative_difference(a, b) << std::endl;
std::cout << "epsilon_difference = " << epsilon_difference(a, b) << std::endl;

Which produces the output:

a = 1.40129846e-045
b = 2.80259693e-045
float_distance = 1.00000000
relative_difference = 0.000000000
epsilon_difference = 0.000000000
a = 0.000000000
b = 2.80259693e-045
float_distance = 2.00000000
relative_difference = 0.000000000
epsilon_difference = 0.000000000

Notice how, in the above example, two denormalized values that are a factor of 2 apart are nonetheless only one representation apart!
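This happens because denormalized values are evenly spaced, each denorm_min apart, so near zero a single step can double a value. A short standard-library sketch makes the spacing visible; it assumes the platform supports denormals and does not flush them to zero, and the helper names next_up and is_denormal are hypothetical.

```cpp
#include <cmath>
#include <limits>

// Next representable float above x (toward +infinity).
float next_up(float x)
{
    return std::nextafter(x, std::numeric_limits<float>::infinity());
}

// True when x is denormalized (subnormal): nonzero, but smaller in magnitude
// than the smallest normalized float.
bool is_denormal(float x)
{
    return std::fpclassify(x) == FP_SUBNORMAL;
}
```

Starting from zero, next_up yields denorm_min, then 2 × denorm_min, then 3 × denorm_min: each step adds the same absolute amount, so the ratio between the first two denormals is 2 even though they are adjacent.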

All the above examples are contained in float_comparison_example.cpp.

Handling Absolute Errors

Imagine we're testing the following function:

#include <cmath>

double myspecial(double x)
{
   return std::sin(x) - std::sin(4 * x);
}

This function has multiple roots, some of which are quite predictable in that both sin(x) and sin(4x) are zero together. Others occur because the values returned from those two functions precisely cancel out. At such points the relative difference between the true value of the function and the actual value returned may be arbitrarily large due to cancellation error.
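The roots can be located from sin x = sin 4x, using the standard fact that two angles have equal sines exactly when they are equal modulo 2π or sum to π modulo 2π:

```latex
\sin x = \sin 4x
\iff 4x = x + 2k\pi \quad\text{or}\quad 4x = \pi - x + 2k\pi, \qquad k \in \mathbb{Z}
\iff x = \frac{2k\pi}{3} \quad\text{or}\quad x = \frac{(2k+1)\pi}{5}
```

The multiples of π, where both sines vanish, are contained in these families (x = 2π with k = 3 in the first, x = π with k = 2 in the second). The remaining roots, such as x = π/5 where sin(π/5) = sin(4π/5) ≈ 0.588, are where two nonzero values cancel.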

In such a case, testing the function above by requiring that the values returned by relative_difference or epsilon_difference are below some threshold is pointless: the best we can do is to verify that the absolute difference between the true and calculated values is below some threshold.

Of course, determining what that threshold should be is often tricky, but a good starting point would be machine epsilon multiplied by the largest of the values being summed. In the example above, the largest value returned by sin(whatever) is 1, so simply using machine epsilon as the target for maximum absolute difference might be a good start (though in practice we may need a slightly higher value; some trial and error will be necessary).
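A minimal sketch of such an absolute-error test, assuming the threshold is a small multiple of machine epsilon scaled by the largest term magnitude. The helper within_absolute_tolerance and its factor parameter are hypothetical, not part of Boost.

```cpp
#include <cmath>
#include <limits>

double myspecial(double x)
{
    return std::sin(x) - std::sin(4 * x);
}

// True if |computed - expected| <= factor * epsilon * scale, where scale is the
// largest magnitude among the terms being summed (1 for sin(x) - sin(4x)).
bool within_absolute_tolerance(double computed, double expected,
                               double scale, double factor = 1.0)
{
    const double tol = factor * std::numeric_limits<double>::epsilon() * scale;
    return std::fabs(computed - expected) <= tol;
}
```

At x = π/5 the two sines cancel, so the computed value is tiny and its relative error against the exact root value of zero is meaningless, while an absolute check with a few epsilons (8 here, an arbitrary illustrative choice) passes. The argument itself is only the nearest double to π/5, so even the true function value there is not exactly zero, which is one reason a factor above 1 is usually needed.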