Let's look at some assembly. All assembly here was produced with Clang 4.0
with -O3
.
Given these definitions:
namespace user { struct number { double value; friend number operator+(number lhs, number rhs) { return number{lhs.value + rhs.value}; } friend number operator*(number lhs, number rhs) { return number{lhs.value * rhs.value}; } }; }
Here is a Boost.YAP-based arithmetic function:
user::number eval_as_yap_expr(user::number a_, user::number x_, user::number y_) { term<user::number> a{{a_}}; term<user::number> x{{x_}}; term<user::number> y{{y_}}; auto expr = (a * x + y) * (a * x + y) + (a * x + y); return yap::evaluate(expr); }
and the assembly it produces:
arithmetic_perf[0x100001c00] <+0>: pushq %rbp arithmetic_perf[0x100001c01] <+1>: movq %rsp, %rbp arithmetic_perf[0x100001c04] <+4>: mulsd %xmm1, %xmm0 arithmetic_perf[0x100001c08] <+8>: addsd %xmm2, %xmm0 arithmetic_perf[0x100001c0c] <+12>: movapd %xmm0, %xmm1 arithmetic_perf[0x100001c10] <+16>: mulsd %xmm1, %xmm1 arithmetic_perf[0x100001c14] <+20>: addsd %xmm0, %xmm1 arithmetic_perf[0x100001c18] <+24>: movapd %xmm1, %xmm0 arithmetic_perf[0x100001c1c] <+28>: popq %rbp arithmetic_perf[0x100001c1d] <+29>: retq
And for the equivalent function using builtin expressions:
user::number eval_as_cpp_expr(user::number a, user::number x, user::number y) { return (a * x + y) * (a * x + y) + (a * x + y); }
the assembly is:
arithmetic_perf[0x100001e10] <+0>: pushq %rbp arithmetic_perf[0x100001e11] <+1>: movq %rsp, %rbp arithmetic_perf[0x100001e14] <+4>: mulsd %xmm1, %xmm0 arithmetic_perf[0x100001e18] <+8>: addsd %xmm2, %xmm0 arithmetic_perf[0x100001e1c] <+12>: movapd %xmm0, %xmm1 arithmetic_perf[0x100001e20] <+16>: mulsd %xmm1, %xmm1 arithmetic_perf[0x100001e24] <+20>: addsd %xmm0, %xmm1 arithmetic_perf[0x100001e28] <+24>: movapd %xmm1, %xmm0 arithmetic_perf[0x100001e2c] <+28>: popq %rbp arithmetic_perf[0x100001e2d] <+29>: retq
If we increase the number of terminals by a factor of four:
user::number eval_as_yap_expr_4x(user::number a_, user::number x_, user::number y_) { term<user::number> a{{a_}}; term<user::number> x{{x_}}; term<user::number> y{{y_}}; auto expr = (a * x + y) * (a * x + y) + (a * x + y) + (a * x + y) * (a * x + y) + (a * x + y) + (a * x + y) * (a * x + y) + (a * x + y) + (a * x + y) * (a * x + y) + (a * x + y); return yap::evaluate(expr); }
the results are the same: in this simple case, the Boost.YAP and builtin expressions result in the same object code.
However, increasing the number of terminals by an additional factor of 2.5 (for a total of 90 terminals), the inliner can no longer do as well for Boost.YAP expressions as for builtin ones.
More complex nonarithmetic code produces more mixed results. For example, here is a function using code from the Map Assign example:
std::map<std::string, int> make_map_with_boost_yap () { return map_list_of ("<", 1) ("<=",2) (">", 3) (">=",4) ("=", 5) ("<>",6) ; }
By contrast, here is the Boost.Assign version of the same function:
std::map<std::string, int> make_map_with_boost_assign () { return boost::assign::map_list_of ("<", 1) ("<=",2) (">", 3) (">=",4) ("=", 5) ("<>",6) ; }
Here is how you might do it "manually":
std::map<std::string, int> make_map_manually () { std::map<std::string, int> retval; retval.emplace("<", 1); retval.emplace("<=",2); retval.emplace(">", 3); retval.emplace(">=",4); retval.emplace("=", 5); retval.emplace("<>",6); return retval; }
Finally, here is the same map created from an initializer list:
std::map<std::string, int> make_map_inializer_list () { std::map<std::string, int> retval = { {"<", 1}, {"<=",2}, {">", 3}, {">=",4}, {"=", 5}, {"<>",6} }; return retval; }
All of these produce roughly the same amount of assembly instructions. Benchmarking these four functions with Google Benchmark yields these results:
Table 1.5. Runtimes of Different Map Constructions
Function |
Time (ns) |
---|---|
make_map_with_boost_yap() |
1285 |
make_map_with_boost_assign() |
1459 |
make_map_manually() |
985 |
make_map_inializer_list() |
954 |
The Boost.YAP-based implementation finishes in the middle of the pack.
In general, the expression trees produced by Boost.YAP get evaluated down to something close to the hand-written equivalent. There is an abstraction penalty, but it is small for reasonably-sized expressions.