Inlining of a recursive function

Here’s the solution I’ve found, thanks to grek40’s comment and to StoryTeller’s answer.

(As for my previous problem with the unused function template instance left in the compiled binary, I solved it by compiling the original code — without the gnu::always_inline and gnu::flatten attributes — with the arguments -ffunction-sections -fdata-sections -Wl,--gc-sections.)

Now reflect_mask_helper_0 is inside a struct (because C++ doesn’t allow partial specialization of function templates), and the i parameter of the function became the Index parameter of the struct template.

#include <iostream>
#include <limits.h>
#include <cstdint>  // uint16_t, used in main below

// End recursive template-expansion of function select below.
template <typename Type>
static inline constexpr Type select(unsigned index)
{ return Type(); }

// Select one of the items passed to it.
// e.g. select(0, a, b, c) = a; select(1, a, b, c) = b; etc.
template <typename Type, typename... Params>
[[gnu::always_inline]]
static inline constexpr Type select(unsigned index, Type value, Params... values)
{ return index == 0 ? value : select<Type>(index - 1, values...); }

template <typename Type>
[[gnu::always_inline]]
static inline constexpr Type reflect_mask_helper_1(Type mask, Type shift, Type value)
{ return ((value & mask) >> shift) | ((value << shift) & mask); }

template <typename Type, unsigned Index>
struct reflect_mask_helper_0
{
  [[gnu::always_inline]]
  static inline constexpr Type invoke(Type value)
  {
    return reflect_mask_helper_0<Type, Index - 1>::invoke(
      reflect_mask_helper_1<Type>(
        static_cast<Type>(select(Index - 1,
          0xaaaaaaaaaaaaaaaa, 0xcccccccccccccccc, 0xf0f0f0f0f0f0f0f0,
          0xff00ff00ff00ff00, 0xffff0000ffff0000, 0xffffffff00000000)),
        1 << (Index - 1),
        value));
  }
};

template <typename Type>
struct reflect_mask_helper_0<Type, 0>
{
  [[gnu::always_inline]]
  static inline constexpr Type invoke(Type value) { return value; }
};

template <typename Type>
static inline constexpr Type reflect_mask(Type value)
{ return reflect_mask_helper_0<Type, __builtin_ctz(sizeof(Type) * CHAR_BIT)>::invoke(value); }

int main(void) {
  for (int i = 0; i < 65536; i++) {
    std::cout << reflect_mask<uint16_t>(i) << std::endl;
  }
}

If you remove the always_inline and flatten attributes and check out the resulting assembly code, you can see that gcc actually inlines everything correctly.

So this is a quality-of-implementation (QoI) issue. Maybe, at the point where always_inline is handled, the call cannot be inlined yet (hence the error message), but gcc decides to inline it afterwards anyway.

By the way, you can fine-tune gcc, and with a small modification to your code, gcc can compile it:

  • pass --param max-early-inliner-iterations=3 to gcc
  • remove the flatten attribute (I have no idea why it matters)

(So this issue actually has nothing to do with recursive calls. From the compiler's standpoint, it doesn't matter whether the function is recursive or not; it just follows the flow of the code, to a certain extent, of course. Here the recursion depth is just 4, which is not too hard for a compiler to follow.)


select doesn't actually call itself. It pops the front of the type list it received and then calls another specialization of select<Type, ...>, one whose trailing parameter pack is different. Since that "recursion" is essentially a finite set of nested function calls (to different functions), GCC can see right through it, regardless of the run-time parameter.
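To make that concrete, here is a minimal sketch of the same pattern. The name pick is hypothetical and only stands in for select; the point is that every level of the "recursion" is a distinct instantiation with a shorter pack, so the chain has a fixed length known at compile time.

```cpp
// Hypothetical, reduced version of the select idea (pick is an illustrative name).

// Overload reached once the pack is empty: the index was out of range,
// so a value-initialized Type is returned.
template <typename Type>
constexpr Type pick(unsigned) { return Type(); }

// Each call peels off `first` and forwards a pack that is one element shorter,
// so the callee is a different function from the caller.
template <typename Type, typename... Rest>
constexpr Type pick(unsigned index, Type first, Rest... rest)
{ return index == 0 ? first : pick<Type>(index - 1, rest...); }

// The call below goes through the three-value, two-value and one-value
// instantiations; none of them ever calls itself.
static_assert(pick(0u, 10, 20, 30) == 10, "index 0 selects the first value");
static_assert(pick(2u, 10, 20, 30) == 30, "index 2 selects the third value");
static_assert(pick(5u, 10, 20, 30) == 0, "out of range falls back to Type()");
```

Calling pick(i, 10, 20, 30) with a run-time i from main uses exactly the same finite chain of functions, which is why an inliner can flatten it.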

But reflect_mask_helper_0, as originally written (a function taking i as a run-time parameter), does call itself, with the same template arguments, indefinitely. GCC has no way to tell how deep this recursion will go at run-time. Recall that a constexpr function is still a regular function that must be invocable at run-time.
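The difference can be sketched with two toy functions. Both names, sum_down and sum_down_fixed, are hypothetical and not taken from the question; the arithmetic is arbitrary and only serves to show where the recursion depth lives.

```cpp
// Run-time recursion: sum_down<Type> calls sum_down<Type>, the very same
// instantiation, and the depth depends on the run-time argument i.
template <typename Type>
constexpr Type sum_down(Type value, unsigned i)
{ return i == 0 ? value : sum_down<Type>(value + i, i - 1); }

// Compile-time "recursion": the depth is a template parameter, so each level
// is a different class and the chain ends at the <Type, 0> specialization.
template <typename Type, unsigned Index>
struct sum_down_fixed
{
  static constexpr Type invoke(Type value)
  { return sum_down_fixed<Type, Index - 1>::invoke(value + Index); }
};

template <typename Type>
struct sum_down_fixed<Type, 0>
{
  static constexpr Type invoke(Type value) { return value; }
};

// Both compute 0 + 4 + 3 + 2 + 1 when evaluated at compile time.
static_assert(sum_down<unsigned>(0, 4) == 10, "run-time style recursion");
static_assert(sum_down_fixed<unsigned, 4>::invoke(0) == 10, "fixed-depth chain");
```

In the first form the compiler must still emit a function that works for any i, while in the second form sum_down_fixed<unsigned, 4>::invoke expands to exactly four nested calls to different functions.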