
Object Code

Let's look at some assembly. All assembly here was produced by Clang 4.0 at -O3. Given these definitions:

namespace user {

    struct number
    {
        double value;

        friend number operator+(number lhs, number rhs)
        {
            return number{lhs.value + rhs.value};
        }

        friend number operator*(number lhs, number rhs)
        {
            return number{lhs.value * rhs.value};
        }
    };
}

Here is a Boost.YAP-based arithmetic function:

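#include <boost/yap/expression.hpp>

namespace yap = boost::yap;

// Shorthand for a Boost.YAP terminal expression that holds a T.
template<typename T>
using term = yap::terminal<yap::expression, T>;

user::number eval_as_yap_expr(user::number a_, user::number x_, user::number y_)
{
    term<user::number> a{{a_}};
    term<user::number> x{{x_}};
    term<user::number> y{{y_}};
    auto expr = (a * x + y) * (a * x + y) + (a * x + y);
    return yap::evaluate(expr);
}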

and the assembly it produces:

arithmetic_perf[0x100001c00] <+0>:  pushq  %rbp
arithmetic_perf[0x100001c01] <+1>:  movq   %rsp, %rbp
arithmetic_perf[0x100001c04] <+4>:  mulsd  %xmm1, %xmm0
arithmetic_perf[0x100001c08] <+8>:  addsd  %xmm2, %xmm0
arithmetic_perf[0x100001c0c] <+12>: movapd %xmm0, %xmm1
arithmetic_perf[0x100001c10] <+16>: mulsd  %xmm1, %xmm1
arithmetic_perf[0x100001c14] <+20>: addsd  %xmm0, %xmm1
arithmetic_perf[0x100001c18] <+24>: movapd %xmm1, %xmm0
arithmetic_perf[0x100001c1c] <+28>: popq   %rbp
arithmetic_perf[0x100001c1d] <+29>: retq

And for the equivalent function using builtin expressions:

user::number eval_as_cpp_expr(user::number a, user::number x, user::number y)
{
    return (a * x + y) * (a * x + y) + (a * x + y);
}

the assembly is:

arithmetic_perf[0x100001e10] <+0>:  pushq  %rbp
arithmetic_perf[0x100001e11] <+1>:  movq   %rsp, %rbp
arithmetic_perf[0x100001e14] <+4>:  mulsd  %xmm1, %xmm0
arithmetic_perf[0x100001e18] <+8>:  addsd  %xmm2, %xmm0
arithmetic_perf[0x100001e1c] <+12>: movapd %xmm0, %xmm1
arithmetic_perf[0x100001e20] <+16>: mulsd  %xmm1, %xmm1
arithmetic_perf[0x100001e24] <+20>: addsd  %xmm0, %xmm1
arithmetic_perf[0x100001e28] <+24>: movapd %xmm1, %xmm0
arithmetic_perf[0x100001e2c] <+28>: popq   %rbp
arithmetic_perf[0x100001e2d] <+29>: retq
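One way to read the two listings: apart from the frame setup and the final register move, each consists of two multiplies and two adds, which suggests the compiler computed a * x + y once and reused it. The sketch below spells out that factored form in source. The name eval_factored is hypothetical and does not appear in the Boost.YAP examples; it only illustrates the shape of the generated code.

```cpp
namespace user {

    struct number
    {
        double value;

        friend number operator+(number lhs, number rhs)
        {
            return number{lhs.value + rhs.value};
        }

        friend number operator*(number lhs, number rhs)
        {
            return number{lhs.value * rhs.value};
        }
    };
}

// The expression as written in the text: nine terminals.
user::number eval_as_cpp_expr(user::number a, user::number x, user::number y)
{
    return (a * x + y) * (a * x + y) + (a * x + y);
}

// Hypothetical hand-factored form (not part of the original example).
// It mirrors the instruction sequence in the listings above.
user::number eval_factored(user::number a, user::number x, user::number y)
{
    user::number const t = a * x + y; // first mulsd, first addsd
    return t * t + t;                 // second mulsd, second addsd
}
```

For inputs where every intermediate result is exactly representable as a double, such as a = 2, x = 3, y = 4, both functions return the same value.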

If we increase the number of terminals by a factor of four:

user::number
eval_as_yap_expr_4x(user::number a_, user::number x_, user::number y_)
{
    term<user::number> a{{a_}};
    term<user::number> x{{x_}};
    term<user::number> y{{y_}};
    auto expr = (a * x + y) * (a * x + y) + (a * x + y) +
                (a * x + y) * (a * x + y) + (a * x + y) +
                (a * x + y) * (a * x + y) + (a * x + y) +
                (a * x + y) * (a * x + y) + (a * x + y);
    return yap::evaluate(expr);
}

the results are the same: in this simple case, the Boost.YAP and builtin expressions result in the same object code.
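The builtin counterpart of the 36-terminal function is not shown above. A plausible version, assuming it simply repeats the same expression without the Boost.YAP terminals, might look like the following sketch. The name eval_as_cpp_expr_4x is an assumption made here by analogy with eval_as_cpp_expr.

```cpp
namespace user {

    struct number
    {
        double value;

        friend number operator+(number lhs, number rhs)
        {
            return number{lhs.value + rhs.value};
        }

        friend number operator*(number lhs, number rhs)
        {
            return number{lhs.value * rhs.value};
        }
    };
}

// Assumed builtin counterpart of eval_as_yap_expr_4x: the same
// 36-terminal expression, written directly on user::number values.
user::number
eval_as_cpp_expr_4x(user::number a, user::number x, user::number y)
{
    return (a * x + y) * (a * x + y) + (a * x + y) +
           (a * x + y) * (a * x + y) + (a * x + y) +
           (a * x + y) * (a * x + y) + (a * x + y) +
           (a * x + y) * (a * x + y) + (a * x + y);
}
```

Each of the four rows contributes nine terminals, which gives the total of 36.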

However, when the number of terminals is increased by an additional factor of 2.5 (for a total of 90 terminals), the inliner can no longer do as well for Boost.YAP expressions as for builtin ones.

More complex nonarithmetic code produces more mixed results. For example, here is a function using code from the Map Assign example:

std::map<std::string, int> make_map_with_boost_yap ()
{
    return map_list_of
        ("<", 1)
        ("<=",2)
        (">", 3)
        (">=",4)
        ("=", 5)
        ("<>",6)
        ;
}

By contrast, here is the Boost.Assign version of the same function:

#include <boost/assign/list_of.hpp> // boost::assign::map_list_of

std::map<std::string, int> make_map_with_boost_assign ()
{
    return boost::assign::map_list_of
        ("<", 1)
        ("<=",2)
        (">", 3)
        (">=",4)
        ("=", 5)
        ("<>",6)
        ;
}

Here is how you might do it "manually":

std::map<std::string, int> make_map_manually ()
{
    std::map<std::string, int> retval;
    retval.emplace("<", 1);
    retval.emplace("<=",2);
    retval.emplace(">", 3);
    retval.emplace(">=",4);
    retval.emplace("=", 5);
    retval.emplace("<>",6);
    return retval;
}

Finally, here is the same map created from an initializer list:

std::map<std::string, int> make_map_inializer_list ()
{
    std::map<std::string, int> retval = {
        {"<", 1},
        {"<=",2},
        {">", 3},
        {">=",4},
        {"=", 5},
        {"<>",6}
    };
    return retval;
}

All of these produce roughly the same number of assembly instructions. Benchmarking these four functions with Google Benchmark yields these results:

Table 1.5. Runtimes of Different Map Constructions

Function                        Time (ns)
make_map_with_boost_yap()       1285
make_map_with_boost_assign()    1459
make_map_manually()             985
make_map_inializer_list()       954

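The Boost.YAP-based implementation finishes in the middle of the pack: it is faster than the Boost.Assign version, but roughly 30 to 35 percent slower than the manual and initializer-list versions.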
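To get a rough feel for such numbers without any third-party dependency, a simple timing loop can be used. The sketch below is not the Google Benchmark harness that produced the table; it is a standard-library stand-in covering only the two functions that need no Boost headers. The helper average_ns is a hypothetical name introduced for this illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>

std::map<std::string, int> make_map_manually()
{
    std::map<std::string, int> retval;
    retval.emplace("<", 1);
    retval.emplace("<=", 2);
    retval.emplace(">", 3);
    retval.emplace(">=", 4);
    retval.emplace("=", 5);
    retval.emplace("<>", 6);
    return retval;
}

std::map<std::string, int> make_map_inializer_list()
{
    return {{"<", 1}, {"<=", 2}, {">", 3}, {">=", 4}, {"=", 5}, {"<>", 6}};
}

// Hypothetical helper: average wall-clock nanoseconds per call of
// make_map, measured over `iterations` calls.
template<typename F>
double average_ns(F make_map, int iterations)
{
    std::size_t total_size = 0;
    auto const start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        total_size += make_map().size(); // use the result of every call
    auto const stop = std::chrono::steady_clock::now();

    // Write the accumulated size to a volatile so the loop's work
    // stays observable to the optimizer.
    volatile std::size_t sink = total_size;
    (void)sink;

    std::chrono::duration<double, std::nano> const elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Calling average_ns(make_map_manually, 100000) and average_ns(make_map_inializer_list, 100000) gives per-call averages that can be compared with each other. The absolute values depend on the machine, compiler, and standard library, so they should not be expected to match the table.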

In general, the expression trees produced by Boost.YAP get evaluated down to something close to the hand-written equivalent. There is an abstraction penalty, but it is small for reasonably sized expressions.

