Why avoid explicit recursion in Haskell?

You are measuring how quickly GHC can do half a million modulus operations. As you might expect, "in the blink of an eye" is the answer regardless of how you iterate. There is no obvious difference in speed.

You claim that you can see that explicit recursion is using less memory, but the heap profiling data you provide shows the opposite: more allocation and higher max residency when using explicit recursion. I don't think the difference is significant, but if it were then your evidence would be contradicting your claim.

As to the question of why to avoid explicit recursion, it's not clear which part of that thread led you to your conclusion. You linked to a giant thread, which itself links to another giant thread, with many competing opinions. The comment that stands out the most to me is "it's not about efficiency, it's about levels of abstraction". You are looking at this the wrong way by trying to measure its performance.
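To make the "levels of abstraction" point concrete, here's a rough sketch of the two styles being compared. The question's code isn't reproduced in this answer, so both bodies below are assumed reconstructions; only the names find and findRec come from the discussion.

```haskell
module Main where

-- Assumed reconstruction: the question's definitions are not shown here,
-- so both bodies are illustrative guesses at what was being compared.

-- foldr version: the traversal pattern is named, and only the per-element
-- step is spelled out.
find :: (a -> Bool) -> [a] -> Maybe a
find p = foldr (\x rest -> if p x then Just x else rest) Nothing

-- Explicit recursion: the same logic with the traversal written by hand.
findRec :: (a -> Bool) -> [a] -> Maybe a
findRec _ [] = Nothing
findRec p (x:xs)
  | p x       = Just x
  | otherwise = findRec p xs

main :: IO ()
main = do
  print (find even [1 :: Int, 3, 5, 6, 7])     -- Just 6
  print (findRec even [1 :: Int, 3, 5, 6, 7])  -- Just 6
  print (find even [1 :: Int, 3 .. 99])        -- Nothing
```

Both functions give the same answers. The difference is that the foldr version tells the reader up front that this is an ordinary right fold, while the recursive version has to be read through to confirm that.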


First, don't try to understand the performance of GHC-compiled code using anything other than optimized compilation:

$ stack ghc -- -O2 Find.hs
$ ./Find +RTS -s

With the -O2 flag (and GHC version 8.6.4), your find performs as follows:

      16,051,544 bytes allocated in the heap
          14,184 bytes copied during GC
          44,576 bytes maximum residency (2 sample(s))
          29,152 bytes maximum slop
               0 MB total memory in use (0 MB lost due to fragmentation)

However, this is very misleading. None of this memory usage is due to the looping performed by foldr. Rather, it's all due to the use of boxed Integers (the type that unannotated integer literals default to). If you switch to using plain Ints, which the compiler can unbox:

main = print $ find (\x -> x `mod` 2 == 0) [1::Int, 3..1000000]
                                             ^^^^^

the memory performance changes drastically and demonstrates the true memory cost of foldr:

      51,544 bytes allocated in the heap
       3,480 bytes copied during GC
      44,576 bytes maximum residency (1 sample(s))
      25,056 bytes maximum slop
           0 MB total memory in use (0 MB lost due to fragmentation)

If you test findRec with Ints like so:

main = print $ findRec (\x -> x `mod` 2 == 0) [1::Int, 3..1000000]

you'll see much worse memory performance:

  40,051,528 bytes allocated in the heap
      14,992 bytes copied during GC
      44,576 bytes maximum residency (2 sample(s))
      29,152 bytes maximum slop
           0 MB total memory in use (0 MB lost due to fragmentation)

which seems to make a compelling case that recursion should be avoided in favor of foldr, but this, too, is very misleading. What you are seeing here is not the memory cost of recursion, but rather the memory cost of "list building".

See, foldr and the expression [1::Int, 3..1000000] both include some magic called "list fusion". This means that when they are used together (i.e., when foldr is applied to [1::Int, 3..1000000]), an optimization can be performed to completely eliminate the creation of a Haskell list. Critically, the foldr code, even using list fusion, compiles to recursive code which looks like this (the dump is from the Integer version, hence the Integer primitives):

main_go
  = \ x ->
      case gtInteger# x lim of {
        __DEFAULT ->
          case eqInteger# (modInteger x lvl) lvl1 of {
            __DEFAULT -> main_go (plusInteger x lvl);
                      -- ^^^^^^^ - SEE?  IT'S JUST RECURSION
            1# -> Just x
          };
        1# -> Nothing
      }
end Rec }
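For intuition about how the list disappears, here's a minimal sketch of the foldr/build idea behind list fusion. GHC does this through rewrite rules on the real build from GHC.Exts; the names build', oddsUpTo, findStep, unfused and fused below are hypothetical and exist only for this illustration, and the fusion step is applied by hand rather than by the compiler.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main where

-- A local stand-in for the idea behind GHC.Exts.build. It is named build'
-- to make clear that it is an illustration without GHC's rewrite rules.
build' :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build' g = g (:) []

-- A producer for [lo, lo+stride .. hi], written against abstract
-- cons/nil arguments (hypothetical helper; assumes stride > 0).
oddsUpTo :: Int -> Int -> Int -> (Int -> b -> b) -> b -> b
oddsUpTo lo stride hi cons nil = go lo
  where
    go x
      | x > hi    = nil
      | otherwise = x `cons` go (x + stride)

-- The consumer's step function, as in a foldr-based find.
findStep :: (a -> Bool) -> a -> Maybe a -> Maybe a
findStep p x rest = if p x then Just x else rest

-- Unfused: build a real list, then fold over it.
unfused :: (Int -> Bool) -> Maybe Int
unfused p = foldr (findStep p) Nothing (build' (oddsUpTo 1 2 999999))

-- Fused by hand with the rule  foldr k z (build g) = g k z.
-- The producer calls the consumer's step directly; no list is constructed.
fused :: (Int -> Bool) -> Maybe Int
fused p = oddsUpTo 1 2 999999 (findStep p) Nothing

main :: IO ()
main = do
  print (unfused even)   -- Nothing
  print (fused even)     -- Nothing
  print (fused (> 10))   -- Just 11
```

Note that fused is still a plain recursive loop (the local go), which matches the shape of the compiled code above.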

So, it's list fusion, rather than "avoiding recursion", that makes find perform better than findRec.

You can see this is true by considering the performance of:

find1 :: Int -> Maybe Int
find1 n | n >= 1000000 = Nothing
        | n `mod` 2 == 0 = Just n
        | otherwise = find1 (n+2)

main :: IO ()
main = print $ find1 1

Even though this uses recursion, it doesn't generate a list (or use boxed Integers), so it runs just like the foldr version:

      51,544 bytes allocated in the heap
       3,480 bytes copied during GC
      44,576 bytes maximum residency (1 sample(s))
      25,056 bytes maximum slop
           0 MB total memory in use (0 MB lost due to fragmentation)

So, what are the take-home lessons?

  • Always benchmark Haskell code using ghc -O2, never GHCi or ghc without optimization flags.
  • Less than 10% of people in any Reddit thread know what they're talking about.
  • foldr can sometimes perform better than explicit recursion when special optimizations like list fusion can apply.
  • But in the general case, explicit recursion performs just as well as foldr or other specialized constructs.
  • Also, optimizing Haskell code is hard.

Actually, here's a better (more serious) take-home lesson. Especially when you're getting started with Haskell, make every possible effort to avoid thinking about "optimizing" your code. Far more than in any other language I know, there is an enormous gulf between the code you write and the code the compiler generates, so don't even try to figure it out right now. Instead, write code that is clear, straightforward, and idiomatic. If you try to learn the "rules" for high-performance code now, you'll get them all wrong and learn really bad programming style into the bargain.
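As one illustration of that advice, the clear and idiomatic version of this particular task is probably just the library's find with even. This is a sketch that makes no performance claim, and firstEven is a hypothetical name chosen for the example.

```haskell
module Main where

import Data.List (find)

-- Reuse the library's find and even instead of hand-tuning a loop.
-- firstEven is a hypothetical helper name for this example.
firstEven :: [Int] -> Maybe Int
firstEven = find even

main :: IO ()
main = do
  print (firstEven [1, 3 .. 1000000])  -- Nothing
  print (firstEven [1, 3, 4, 5])       -- Just 4
```

If a profile of the optimized build later shows this to be a bottleneck, that is the time to look at what the compiler generated.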