Comparison of Nth-root Finding Algorithms

A second example compares four generalized nth-root finding algorithms for various n-th roots (5, 7 and 13) of a single value 28.0, for four floating-point types, float, double, long double and a Boost.Multiprecision type cpp_bin_float_50. In each case the target accuracy was set using our "recomended" accuracy limits (or at least limits that make a good starting point - which is likely to give close to full accuracy without resorting to unnecessary iterations).

Function	Precision Requested
TOMS748	numeric_limits<T>::digits - 2
Newton	floor(numeric_limits<T>::digits * 0.6)
Halley	floor(numeric_limits<T>::digits * 0.4)
Schröder	floor(numeric_limits<T>::digits * 0.4)

Tests used Microsoft Visual Studio 2013 (Update 1) and GCC 4.9.1 using source code root_n_finding_algorithms.cpp.

The timing uncertainty (especially using MSVC) is at least 5% of normalized time 'Norm'.

To pick out the 'best' and 'worst' algorithms are highlighted in blue and red. More than one result can be 'best' when normalized times are indistinguishable within the uncertainty.

Program root_n_finding_algorithms.cpp, Microsoft Visual C++ version 12.0, Dinkumware standard library version 610, Win32 Compiled in optimise mode., _X86_SSE2

Fraction of full accuracy 1

Table 12.3. 5th root(28) for float, double, long double and cpp_bin_float_50 types, using _X86_SSE2

	float				double				long d				cpp50
Algo	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis
TOMS748	7	320	1.53	0	11	576	2.61	1	11	557	2.48	1	12	119843	7.52	0
Newton	3	209	1.00	0	4	221	1.00	-1	4	225	1.00	-1	6	15937	1.00	0
Halley	2	214	1.02	0	3	256	1.16	0	3	243	1.08	0	4	28437	1.78	0
Schröder	2	218	1.04	0	3	245	1.11	-1	3	245	1.09	-1	4	35625	2.24	0

Table 12.4. 7th root(28) for float, double, long double and cpp_bin_float_50 types, using _X86_SSE2

	float				double				long d				cpp50
Algo	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis
TOMS748	12	493	2.18	1	15	762	3.05	2	15	765	3.08	2	14	157343	7.09	0
Newton	5	226	1.00	0	6	250	1.00	0	6	248	1.00	0	8	22187	1.00	0
Halley	4	257	1.14	0	5	293	1.17	0	5	293	1.18	0	6	44062	1.99	0
Schröder	5	285	1.26	0	6	317	1.27	0	6	317	1.28	0	7	61406	2.77	0

Table 12.5. 11th root(28) for float, double, long double and cpp_bin_float_50 types, using _X86_SSE2

	float				double				long d				cpp50
Algo	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis
TOMS748	12	556	2.24	-2	14	784	2.94	2	14	793	2.94	2	17	235781	8.88	2
Newton	6	248	1.00	0	7	267	1.00	0	7	270	1.00	0	9	26562	1.00	0
Halley	4	254	1.02	-1	5	290	1.09	0	5	293	1.09	0	6	46406	1.75	0
Schröder	6	312	1.26	0	7	351	1.31	0	7	356	1.32	0	8	76250	2.87	0

Program root_n_finding_algorithms.cpp, Microsoft Visual C++ version 12.0, Dinkumware standard library version 610, Win32 Compiled in optimise mode., _X64_AVX

Fraction of full accuracy 1

Table 12.6. 5th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_AVX

	float				double				long d				cpp50
Algo	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis
TOMS748	7	239	1.50	0	11	451	2.53	1	11	439	2.49	1	12	90312	7.51	0
Newton	3	159	1.00	0	4	178	1.00	-1	4	176	1.00	-1	6	12031	1.00	0
Halley	2	168	1.06	0	3	203	1.14	0	3	198	1.13	0	4	20937	1.74	0
Schröder	2	173	1.09	0	3	206	1.16	-1	3	203	1.15	-1	4	26250	2.18	0

Table 12.7. 7th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_AVX

	float				double				long d				cpp50
Algo	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis
TOMS748	12	385	2.19	1	15	635	3.13	2	15	621	3.17	2	14	114843	6.81	0
Newton	5	176	1.00	0	6	203	1.00	0	6	196	1.00	0	8	16875	1.00	0
Halley	4	209	1.19	0	5	254	1.25	0	5	246	1.26	0	6	32343	1.92	0
Schröder	5	223	1.27	0	6	273	1.34	0	6	275	1.40	0	7	45156	2.68	0

Table 12.8. 11th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_AVX

	float				double				long d				cpp50
Algo	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis
TOMS748	12	467	2.42	-2	14	648	3.06	2	14	640	2.99	2	17	170000	8.85	2
Newton	6	193	1.00	0	7	212	1.00	0	7	214	1.00	0	9	19218	1.00	0
Halley	4	209	1.08	-1	5	256	1.21	0	5	250	1.17	0	6	32656	1.70	0
Schröder	6	248	1.28	0	7	306	1.44	0	7	298	1.39	0	8	53437	2.78	0

Program root_n_finding_algorithms.cpp, GNU C++ version 4.9.2, GNU libstdc++ version 20141030, Win32 Compiled in optimise mode., _X64_SSE2

Fraction of full accuracy 1

Table 12.9. 5th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_SSE2

	float				double				long d				cpp50
Algo	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis
TOMS748	7	193	2.14	0	11	432	3.86	1	9	579	3.83	0	12	59062	7.56	0
Newton	3	90	1.00	0	4	112	1.00	-1	5	151	1.00	0	6	7812	1.00	0
Halley	2	98	1.09	0	3	135	1.21	0	3	201	1.33	0	4	13750	1.76	0
Schröder	2	112	1.24	0	3	142	1.27	-1	3	206	1.36	0	4	17031	2.18	0

Table 12.10. 7th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_SSE2

	float				double				long d				cpp50
Algo	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis
TOMS748	12	351	1.97	1	15	621	3.18	2	13	906	3.61	0	14	75468	7.10	0
Newton	5	178	1.00	0	6	195	1.00	0	7	251	1.00	0	8	10625	1.00	0
Halley	4	196	1.10	0	5	242	1.24	0	5	345	1.37	0	6	21093	1.99	0
Schröder	5	225	1.26	0	6	270	1.38	0	6	384	1.53	0	7	29062	2.74	0

Table 12.11. 11th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_SSE2

	float				double				long d				cpp50
Algo	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis	Its	Times	Norm	Dis
TOMS748	12	429	2.22	-2	14	679	3.02	2	14	1098	3.94	1	17	114531	8.83	2
Newton	6	193	1.00	0	7	225	1.00	0	7	279	1.00	0	9	12968	1.00	0
Halley	4	196	1.02	-1	5	248	1.10	0	5	348	1.25	0	6	21718	1.67	0
Schröder	6	254	1.32	0	7	323	1.44	0	7	453	1.62	0	8	35625	2.75	0

Some tentative conclusions can be drawn from this limited exercise.

Perhaps surprisingly, there is little difference between the various algorithms for fundamental types floating-point types. Using the first derivatives (Newton-Raphson iteration) is usually the best, but while the improvement over the no-derivative TOMS Algorithm 748: enclosing zeros of continuous functions is considerable in number of iterations, but little in execution time. This reflects the fact that the function we are finding the root for is trivial to evaluate, so runtimetimes are dominated by the time taken by the boilerplate code in each method.
The extra cost of evaluating the second derivatives (Halley or Schröder) is usually too much for any net benefit: as with the cube root, these functors are so cheap to evaluate that the runtime is largely dominated by the complexity of the root finding method.
For a Boost.Multiprecision floating-point type, the Newton-Raphson iteration is a clear winner with a several-fold gain over TOMS Algorithm 748: enclosing zeros of continuous functions, and again no improvement from the second-derivative algorithms.
The run-time of 50 decimal-digit Boost.Multiprecision is about 30-fold greater than double.
The column 'dis' showing the number of bits distance from the correct result. The Newton-Raphson algorithm shows a bit or two better accuracy than TOMS Algorithm 748: enclosing zeros of continuous functions.
The goodness of the 'guess' is especially crucial for Boost.Multiprecision. Separate experiments show that evaluating the 'guess' using double allows convergence to the final exact result in one or two iterations. So in this contrived example, crudely dividing the exponent by N for a 'guess', it would be far better to use a pow<double> or , if more precise pow<long double>, function to estimate a 'guess'. The limitation of this tactic is that the range of possible (exponent) values may be less than the multiprecision type.
Using floating-point extension SSE2 instructions made a modest ten-percent speedup.
Using MSVC, there was some improvement using 64-bit, markedly for Boost.Multiprecision.
The GCC compiler 4.9.1 using 64-bit was at least five-folder faster that 32-bit, apparently reflecting better optimization.

Clearly, your mileage will vary, but in summary, Newton-Raphson iteration seems the first choice of algorithm, and effort to find a good 'guess' the first speed-up target, especially for Boost.Multiprecision. And of course, compiler optimisation is crucial for speed.