Home | Libraries | People | FAQ | More |
A second example compares four generalized nth-root finding algorithms
for various n-th roots (5, 7 and 13) of a single value 28.0, for four floating-point
types, float
, double
, long
double
and a Boost.Multiprecision
type cpp_bin_float_50
.
In each case the target accuracy was set using our "recomended"
accuracy limits (or at least limits that make a good starting point - which
is likely to give close to full accuracy without resorting to unnecessary
iterations).
Function |
Precision Requested |
---|---|
TOMS748 |
numeric_limits<T>::digits - 2 |
Newton |
floor(numeric_limits<T>::digits * 0.6) |
Halley |
floor(numeric_limits<T>::digits * 0.4) |
Schröder |
floor(numeric_limits<T>::digits * 0.4) |
Tests used Microsoft Visual Studio 2013 (Update 1) and GCC 4.9.1 using source code root_n_finding_algorithms.cpp.
The timing uncertainty (especially using MSVC) is at least 5% of normalized time 'Norm'.
To pick out the 'best' and 'worst' algorithms are highlighted in blue and red. More than one result can be 'best' when normalized times are indistinguishable within the uncertainty.
Fraction of full accuracy 1
Table 12.3. 5th root(28) for float, double, long double and cpp_bin_float_50 types, using _X86_SSE2
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algo |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
TOMS748 |
7 |
320 |
1.53 |
0 |
11 |
576 |
2.61 |
1 |
11 |
557 |
2.48 |
1 |
12 |
119843 |
7.52 |
0 |
||||
Newton |
3 |
209 |
1.00 |
0 |
4 |
221 |
1.00 |
-1 |
4 |
225 |
1.00 |
-1 |
6 |
15937 |
1.00 |
0 |
||||
Halley |
2 |
214 |
1.02 |
0 |
3 |
256 |
1.16 |
0 |
3 |
243 |
1.08 |
0 |
4 |
28437 |
1.78 |
0 |
||||
Schröder |
2 |
218 |
1.04 |
0 |
3 |
245 |
1.11 |
-1 |
3 |
245 |
1.09 |
-1 |
4 |
35625 |
2.24 |
0 |
Table 12.4. 7th root(28) for float, double, long double and cpp_bin_float_50 types, using _X86_SSE2
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algo |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
TOMS748 |
12 |
493 |
2.18 |
1 |
15 |
762 |
3.05 |
2 |
15 |
765 |
3.08 |
2 |
14 |
157343 |
7.09 |
0 |
||||
Newton |
5 |
226 |
1.00 |
0 |
6 |
250 |
1.00 |
0 |
6 |
248 |
1.00 |
0 |
8 |
22187 |
1.00 |
0 |
||||
Halley |
4 |
257 |
1.14 |
0 |
5 |
293 |
1.17 |
0 |
5 |
293 |
1.18 |
0 |
6 |
44062 |
1.99 |
0 |
||||
Schröder |
5 |
285 |
1.26 |
0 |
6 |
317 |
1.27 |
0 |
6 |
317 |
1.28 |
0 |
7 |
61406 |
2.77 |
0 |
Table 12.5. 11th root(28) for float, double, long double and cpp_bin_float_50 types, using _X86_SSE2
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algo |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
TOMS748 |
12 |
556 |
2.24 |
-2 |
14 |
784 |
2.94 |
2 |
14 |
793 |
2.94 |
2 |
17 |
235781 |
8.88 |
2 |
||||
Newton |
6 |
248 |
1.00 |
0 |
7 |
267 |
1.00 |
0 |
7 |
270 |
1.00 |
0 |
9 |
26562 |
1.00 |
0 |
||||
Halley |
4 |
254 |
1.02 |
-1 |
5 |
290 |
1.09 |
0 |
5 |
293 |
1.09 |
0 |
6 |
46406 |
1.75 |
0 |
||||
Schröder |
6 |
312 |
1.26 |
0 |
7 |
351 |
1.31 |
0 |
7 |
356 |
1.32 |
0 |
8 |
76250 |
2.87 |
0 |
Fraction of full accuracy 1
Table 12.6. 5th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_AVX
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algo |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
TOMS748 |
7 |
239 |
1.50 |
0 |
11 |
451 |
2.53 |
1 |
11 |
439 |
2.49 |
1 |
12 |
90312 |
7.51 |
0 |
||||
Newton |
3 |
159 |
1.00 |
0 |
4 |
178 |
1.00 |
-1 |
4 |
176 |
1.00 |
-1 |
6 |
12031 |
1.00 |
0 |
||||
Halley |
2 |
168 |
1.06 |
0 |
3 |
203 |
1.14 |
0 |
3 |
198 |
1.13 |
0 |
4 |
20937 |
1.74 |
0 |
||||
Schröder |
2 |
173 |
1.09 |
0 |
3 |
206 |
1.16 |
-1 |
3 |
203 |
1.15 |
-1 |
4 |
26250 |
2.18 |
0 |
Table 12.7. 7th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_AVX
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algo |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
TOMS748 |
12 |
385 |
2.19 |
1 |
15 |
635 |
3.13 |
2 |
15 |
621 |
3.17 |
2 |
14 |
114843 |
6.81 |
0 |
||||
Newton |
5 |
176 |
1.00 |
0 |
6 |
203 |
1.00 |
0 |
6 |
196 |
1.00 |
0 |
8 |
16875 |
1.00 |
0 |
||||
Halley |
4 |
209 |
1.19 |
0 |
5 |
254 |
1.25 |
0 |
5 |
246 |
1.26 |
0 |
6 |
32343 |
1.92 |
0 |
||||
Schröder |
5 |
223 |
1.27 |
0 |
6 |
273 |
1.34 |
0 |
6 |
275 |
1.40 |
0 |
7 |
45156 |
2.68 |
0 |
Table 12.8. 11th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_AVX
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algo |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
TOMS748 |
12 |
467 |
2.42 |
-2 |
14 |
648 |
3.06 |
2 |
14 |
640 |
2.99 |
2 |
17 |
170000 |
8.85 |
2 |
||||
Newton |
6 |
193 |
1.00 |
0 |
7 |
212 |
1.00 |
0 |
7 |
214 |
1.00 |
0 |
9 |
19218 |
1.00 |
0 |
||||
Halley |
4 |
209 |
1.08 |
-1 |
5 |
256 |
1.21 |
0 |
5 |
250 |
1.17 |
0 |
6 |
32656 |
1.70 |
0 |
||||
Schröder |
6 |
248 |
1.28 |
0 |
7 |
306 |
1.44 |
0 |
7 |
298 |
1.39 |
0 |
8 |
53437 |
2.78 |
0 |
Fraction of full accuracy 1
Table 12.9. 5th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_SSE2
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algo |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
TOMS748 |
7 |
193 |
2.14 |
0 |
11 |
432 |
3.86 |
1 |
9 |
579 |
3.83 |
0 |
12 |
59062 |
7.56 |
0 |
||||
Newton |
3 |
90 |
1.00 |
0 |
4 |
112 |
1.00 |
-1 |
5 |
151 |
1.00 |
0 |
6 |
7812 |
1.00 |
0 |
||||
Halley |
2 |
98 |
1.09 |
0 |
3 |
135 |
1.21 |
0 |
3 |
201 |
1.33 |
0 |
4 |
13750 |
1.76 |
0 |
||||
Schröder |
2 |
112 |
1.24 |
0 |
3 |
142 |
1.27 |
-1 |
3 |
206 |
1.36 |
0 |
4 |
17031 |
2.18 |
0 |
Table 12.10. 7th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_SSE2
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algo |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
TOMS748 |
12 |
351 |
1.97 |
1 |
15 |
621 |
3.18 |
2 |
13 |
906 |
3.61 |
0 |
14 |
75468 |
7.10 |
0 |
||||
Newton |
5 |
178 |
1.00 |
0 |
6 |
195 |
1.00 |
0 |
7 |
251 |
1.00 |
0 |
8 |
10625 |
1.00 |
0 |
||||
Halley |
4 |
196 |
1.10 |
0 |
5 |
242 |
1.24 |
0 |
5 |
345 |
1.37 |
0 |
6 |
21093 |
1.99 |
0 |
||||
Schröder |
5 |
225 |
1.26 |
0 |
6 |
270 |
1.38 |
0 |
6 |
384 |
1.53 |
0 |
7 |
29062 |
2.74 |
0 |
Table 12.11. 11th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_SSE2
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algo |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
TOMS748 |
12 |
429 |
2.22 |
-2 |
14 |
679 |
3.02 |
2 |
14 |
1098 |
3.94 |
1 |
17 |
114531 |
8.83 |
2 |
||||
Newton |
6 |
193 |
1.00 |
0 |
7 |
225 |
1.00 |
0 |
7 |
279 |
1.00 |
0 |
9 |
12968 |
1.00 |
0 |
||||
Halley |
4 |
196 |
1.02 |
-1 |
5 |
248 |
1.10 |
0 |
5 |
348 |
1.25 |
0 |
6 |
21718 |
1.67 |
0 |
||||
Schröder |
6 |
254 |
1.32 |
0 |
7 |
323 |
1.44 |
0 |
7 |
453 |
1.62 |
0 |
8 |
35625 |
2.75 |
0 |
Some tentative conclusions can be drawn from this limited exercise.
double
.
double
allows convergence to the final
exact result in one or two iterations. So in this contrived example,
crudely dividing the exponent by N for a 'guess', it would be far better
to use a pow<double>
or , if more precise pow<long double>
,
function to estimate a 'guess'. The limitation of this tactic is that
the range of possible (exponent) values may be less than the multiprecision
type.
Clearly, your mileage will vary, but in summary, Newton-Raphson iteration seems the first choice of algorithm, and effort to find a good 'guess' the first speed-up target, especially for Boost.Multiprecision. And of course, compiler optimisation is crucial for speed.