For nearly all applications, the built-in floating-point types, double (and long double if this offers higher precision than double) offer enough precision, typically a dozen decimal digits.
Some reasons why one would want to use a higher precision:
- double is (or may be) too inaccurate.
- long double is (or may be) too inaccurate.
Many functions and distributions have differences from exact values that are only a few least significant bits - computation noise. Others, often those for which analytical solutions are not available, require approximations and iteration: these may lose several decimal digits of precision.
Much larger loss of precision can occur for boundary or corner cases, often caused by cancellation errors.
(Some of the worst and most common examples of cancellation error or loss of significance can be avoided by using complements: see why complements?).
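As a minimal sketch of the cancellation problem and of how a complement avoids it, consider the upper tail 1 - erf(x) for a large argument. The function names upper_tail_naive and upper_tail_complement are hypothetical, chosen only for this illustration; std::erf and std::erfc are the standard library functions from <cmath>.

```cpp
#include <cmath>

// Upper tail computed naively as 1 - erf(x). For large x, erf(x) rounds to
// exactly 1.0 in double, so the subtraction cancels and the result is zero.
double upper_tail_naive(double x) {
    return 1.0 - std::erf(x);
}

// The same quantity computed through the complement function erfc, which
// evaluates the small tail value directly and so avoids the subtraction.
double upper_tail_complement(double x) {
    return std::erfc(x);
}
```

For x = 10 the naive version returns 0, having lost every significant digit, while the complement version returns a small positive value of about 2e-45.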
If you require a value which is as accurate as can be represented in the floating-point type, and is thus the closest representable value and has an error less than 1/2 a least significant bit or ulp, it may be useful to use a higher-precision type, for example, cpp_dec_float_50, to generate this value. Conversion of this value to a built-in floating-point type (float, double or long double) will not cause any further loss of precision. A decimal digit string will also be 'read' precisely by the compiler into a built-in floating-point type to the nearest representable value.
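The point about decimal digit strings can be sketched as follows. The example assumes that a 50-digit value has already been generated elsewhere (here the well-known digits of pi stand in for the output of a 50-digit type) and pasted into the source as a literal. The constant names are hypothetical, and the behaviour shown relies on the compiler rounding literals to the nearest representable value, as mainstream compilers do.

```cpp
// 50 significant decimal digits of pi, standing in for a value generated
// with a higher-precision type. The compiler rounds each literal to the
// nearest value representable in the target type.
constexpr float pi_float =
    3.1415926535897932384626433832795028841971693993751F;
constexpr double pi_double =
    3.1415926535897932384626433832795028841971693993751;
constexpr long double pi_long_double =
    3.1415926535897932384626433832795028841971693993751L;

// A 17-significant-digit literal already identifies the same double, so the
// extra digits do no harm and leave room for wider types.
constexpr double pi_double_17 = 3.1415926535897931;

static_assert(pi_double == pi_double_17,
              "50-digit and 17-digit literals round to the same double");
```

Writing the full 50 digits once lets the same literal serve float, double and long double, each receiving its own nearest representable value.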
Note | |
---|---|
In contrast, reading a value from an |
William Kahan coined the term Table-Maker's Dilemma for the problem of correctly rounding functions. Using a much higher precision (50 or 100 decimal digits) is a practical way of generating (almost always) correctly rounded values.