How can modern compiler optimization convert recursion into returning a constant?

GCC's optimization passes work on an intermediate representation of your code called GIMPLE.

Using the -fdump-* options (for example -fdump-tree-all), you can ask GCC to write out the intermediate state of the tree after each pass and see in detail which optimizations were performed.

In this case the interesting files are (numbers may vary depending on the GCC version):

.004t.gimple

This is the starting point:

int Identity(int) (int i)
{
  int D.2330;
  int D.2331;
  int D.2332;

  if (i == 1) goto <D.2328>; else goto <D.2329>;
  <D.2328>:
  D.2330 = 1;
  return D.2330;
  <D.2329>:
  D.2331 = i + -1;
  D.2332 = Identity (D.2331);
  D.2330 = D.2332 + 1;
  return D.2330;
}
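
For orientation, the GIMPLE above is consistent with a source function along the following lines. This is a reconstruction from the dump, not the original source (which is not shown here). It is written as plain C for simplicity, although the Identity(int) signature in the dump suggests the original was compiled as C++.

```c
/* Reconstructed from the .gimple dump above; an assumption,
   not the original source. */
int Identity(int i)
{
    if (i == 1)
        return 1;                    /* D.2330 = 1; return D.2330; */
    else
        return Identity(i - 1) + 1;  /* D.2331, D.2332, then D.2330 */
}
```

Note that the function only terminates normally for i >= 1; the dumps inherit that property.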

.038t.eipa_sra

This is the last dump in which the recursive call is still present:

int Identity(int) (int i)
{
  int _1;
  int _6;
  int _8;
  int _10;

  <bb 2>:
  if (i_3(D) == 1)
    goto <bb 4>;
  else
    goto <bb 3>;

  <bb 3>:
  _6 = i_3(D) + -1;
  _8 = Identity (_6);
  _10 = _8 + 1;

  <bb 4>:
  # _1 = PHI <1(2), _10(3)>
  return _1;
}

As is normal with SSA (static single assignment) form, GCC inserts pseudo-functions known as PHI nodes at the start of basic blocks where needed, in order to merge the multiple possible values of a variable.

Here:

# _1 = PHI <1(2), _10(3)>

where _1 either gets the value of 1, or of _10, depending on whether we reach here via block 2 or block 3.
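
One way to read that PHI is as an ordinary variable that is assigned on each incoming path. The sketch below is only an illustration of that reading; the function name and the local names result and t are hypothetical and do not come from GCC.

```c
/* Hypothetical illustration of how the PHI in <bb 4> selects a value. */
int identity_phi_sketch(int i)
{
    int result;                  /* plays the role of _1 */

    if (i == 1) {
        result = 1;              /* arriving from <bb 2>: PHI yields 1 */
    } else {
        int t = identity_phi_sketch(i - 1);  /* _8 */
        result = t + 1;          /* arriving from <bb 3>: PHI yields _10 */
    }
    return result;               /* return _1 */
}
```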

.039t.tailr1

This is the first dump in which the recursion has been turned into a loop:

int Identity(int) (int i)
{
  int _1;
  int add_acc_4;
  int _6;
  int acc_tmp_8;
  int add_acc_10;

  <bb 2>:
  # i_3 = PHI <i_9(D)(0), _6(3)>
  # add_acc_4 = PHI <0(0), add_acc_10(3)>
  if (i_3 == 1)
    goto <bb 4>;
  else
    goto <bb 3>;

  <bb 3>:
  _6 = i_3 + -1;
  add_acc_10 = add_acc_4 + 1;
  goto <bb 2>;

  <bb 4>:
  # _1 = PHI <1(2)>
  acc_tmp_8 = add_acc_4 + _1;
  return acc_tmp_8;
}
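
Read back as C, this dump behaves roughly like the loop below. It is a hand translation for illustration, not compiler output; the function name identity_loop is made up, and add_acc stands in for the add_acc_* SSA names.

```c
/* Hand-written C equivalent of the .tailr1 dump (illustrative only). */
int identity_loop(int i)
{
    int add_acc = 0;             /* add_acc_4 starts at 0 on function entry */

    while (i != 1) {             /* <bb 2>: leave the loop when i == 1 */
        i = i - 1;               /* <bb 3>: _6 = i_3 + -1 */
        add_acc = add_acc + 1;   /* <bb 3>: add_acc_10 = add_acc_4 + 1 */
    }

    return add_acc + 1;          /* <bb 4>: acc_tmp_8 = add_acc_4 + _1, with _1 == 1 */
}
```

For any i >= 1 the loop runs i - 1 times, so the result is simply i. Like the recursive version, it does not terminate normally for i < 1.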

The same optimisation pass that handles tail calls also handles simple cases where the recursive call is not yet in tail position: it makes the call tail recursive by creating accumulators.

A very similar example appears in the opening comment of the https://github.com/gcc-mirror/gcc/blob/master/gcc/tree-tailcall.c file, quoted here:


The file implements the tail recursion elimination. It is also used to analyze the tail calls in general, passing the results to the rtl level where they are used for sibcall optimization.

In addition to the standard tail recursion elimination, we handle the most trivial cases of making the call tail recursive by creating accumulators.

For example the following function

int sum (int n)
{
  if (n > 0)
    return n + sum (n - 1);
  else
    return 0;
}

is transformed into

int sum (int n)
{
  int acc = 0;
  while (n > 0)
    acc += n--;
  return acc;
}

To do this, we maintain two accumulators (a_acc and m_acc) that indicate when we reach the return x statement, we should return a_acc + x * m_acc instead. They are initially initialized to 0 and 1, respectively, so the semantics of the function is obviously preserved. If we are guaranteed that the value of the accumulator never change, we omit the accumulator.

There are three cases how the function may exit. The first one is handled in adjust_return_value, the other two in adjust_accumulator_values (the second case is actually a special case of the third one and we present it separately just for clarity):

  1. Just return x, where x is not in any of the remaining special shapes. We rewrite this to a gimple equivalent of return m_acc * x + a_acc.
  2. return f (...), where f is the current function, is rewritten in a classical tail-recursion elimination way, into assignment of arguments and jump to the start of the function. Values of the accumulators are unchanged.
  3. return a + m * f(...), where a and m do not depend on call to f. To preserve the semantics described before we want this to be rewritten in such a way that we finally return a_acc + (a + m * f(...)) * m_acc = (a_acc + a * m_acc) + (m * m_acc) * f(...). I.e. we increase a_acc by a * m_acc, multiply m_acc by m and eliminate the tail call to f. Special cases when the value is just added or just multiplied are obtained by setting a = 0 or m = 1.
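
The accumulator rules in the comment can be tried out by hand. The sketch below uses a made-up function of the shape return a + m * f(...) with a = 2 and m = 3, and applies cases 1 and 3 manually. The names f_rec and f_loop are hypothetical, and this is a hand transformation under those assumptions, not GCC's actual output.

```c
/* Made-up recursive function of the form "return a + m * f(...)". */
int f_rec(int n)
{
    if (n == 0)
        return 1;                  /* plain "return x" with x == 1 */
    return 2 + 3 * f_rec(n - 1);   /* a == 2, m == 3 */
}

/* Hand-applied version of the transformation described in the comment. */
int f_loop(int n)
{
    int a_acc = 0;                 /* additive accumulator, starts at 0 */
    int m_acc = 1;                 /* multiplicative accumulator, starts at 1 */

    while (n != 0) {
        a_acc += 2 * m_acc;        /* case 3: increase a_acc by a * m_acc */
        m_acc *= 3;                /* case 3: multiply m_acc by m */
        n -= 1;                    /* assign the new argument, jump to start */
    }

    return a_acc + 1 * m_acc;      /* case 1: return a_acc + x * m_acc */
}
```

The order of the two updates matters: a_acc must be increased using the old value of m_acc before m_acc is multiplied by m. In the Identity example above only the additive accumulator is needed (a = 1, m = 1), which is why the dump contains add_acc but no multiplicative counterpart.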