GHC: Are there consistent rules for memoization for calls with fixed values?

To elaborate in case it wasn't clear from @amalloy's answer, the problem is that you're conflating two things here -- the implicit memoization-like behavior (what people mean when they talk about Haskell's "automatic memoization", though it is not true memoization!) that results directly from thunk-based lazy evaluation, and a compiler optimization technique that's basically a form of common subexpression elimination. The former is predictable, more or less; the latter is at the whim of the compiler.

Recall that real memoization is a property of the implementation of a function: the function "remembers" results calculated for certain combinations of arguments, and may reuse those results instead of recalculating them from scratch when called multiple times with the same arguments. When GHC generates code for functions, it does not automatically generate code to perform this kind of memoization.

Instead, the code GHC generates to implement function application is unusual. Rather than actually applying the function to arguments to produce the final result as a value, a "result" is immediately constructed in the form of a thunk, which you can view as a suspended function call or a "promise" to deliver a value at a later time.

When, at some future point, the actual value is needed, the thunk is forced (which actually causes the original function call to take place), and the thunk is updated with the value. If that same value is needed again later, the value is already available, so the thunk doesn't need to be forced a second time. This is the "automatic memoization". Note that it takes place at the "result" level rather than the "function" level -- the result of a function application remembers its value; a function does not remember the results it previously produced.
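You can watch this force-once, remember-forever behavior directly. Here's a small sketch (the name `expensive` is mine, and `Debug.Trace.trace` is used purely as an evaluation probe -- it prints to stderr when its enclosing expression is actually evaluated):

```haskell
import Debug.Trace (trace)

-- A top-level binding stored as a thunk; nothing is computed here.
-- The trace fires only when the thunk is forced.
expensive :: Integer
expensive = trace "computing..." (sum [1..1000000])

main :: IO ()
main = do
  print expensive  -- forces the thunk; "computing..." appears once on stderr
  print expensive  -- the thunk has been updated with its value; no trace
```

Running this prints `500000500000` twice but "computing..." only once: the second use finds a plain value where the thunk used to be.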

Now, normally the concept of the result of a function application remembering its value would be ridiculous. In strict languages, we don't worry that after x = sqrt(10), reusing x will cause multiple sqrt calls because x hasn't "memoized" its own value. That is, in strict languages, all function application results are "automatically memoized" in the same sense they are in Haskell.

The difference is lazy evaluation, which allows us to write something like:

stuff = map expensiveComputation [1..10000]

which returns a thunk immediately without performing any expensive computations. Afterwards:

f n = stuff !! n

magically creates a memoized function, not because GHC generates code in the implementation of f to somehow memoize the call f 1000, but because f 1000 forces (a bunch of list constructor thunks and then) a single expensiveComputation whose return value is "memoized" as the value at index 1000 in the list stuff -- it was a thunk, but after being forced, it remembers its own value, just like any value in a strict language would.
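To make that concrete, here is a runnable version of those two lines, with a trace added so you can see exactly when the computation fires (the squaring function is a stand-in for any expensive computation):

```haskell
import Debug.Trace (trace)

-- A stand-in for an expensive function; the trace fires on evaluation.
expensiveComputation :: Int -> Int
expensiveComputation x = trace ("computing " ++ show x) (x * x)

-- A list of thunks; no expensiveComputation runs when this is defined.
stuff :: [Int]
stuff = map expensiveComputation [1..10000]

f :: Int -> Int
f n = stuff !! n

main :: IO ()
main = do
  print (f 1000)  -- forces spine cells 0..1000, then one expensiveComputation
  print (f 1000)  -- the cell at index 1000 already holds a value; no trace
```

The first `f 1000` traces "computing 1001" once; the second produces no trace at all, because the list cell now holds `1002001` rather than a thunk.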

So, given your definition of slow_fib, none of your examples are actually making use of Haskell's automatic memoization, in the usual sense people mean. Any speedups you're seeing are the result of various compiler optimizations that are (or aren't) recognizing common subexpressions or inlining / unwrapping short loops.

To write a memoized fib, you need to do it as explicitly as you would in a strict language, by creating a data structure to hold the memoized values, though lazy evaluation and mutually recursive definitions can sometimes make it seem like it's "automatic":

import qualified Data.Vector as V
import Data.Vector (Vector,(!))

fibv :: Vector Integer
fibv = V.generate 1000000 getfib
  where getfib 0 = 1
        getfib 1 = 1
        getfib i = fibv ! (i-1) + fibv ! (i-2)

fib :: Int -> Integer
fib n = fibv ! n

All of the examples you link at the end exploit the same technique: instead of implementing function f directly, they first introduce a list whose contents are all the calls to f that could ever be made. That list is computed only once, lazily; and then a simple lookup in that list is used as the implementation of the user-facing function. So, they are not relying on any caching from GHC.
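That technique can be sketched with a plain lazy list instead of a vector (same idea as the `fibv` example above, minus the O(1) indexing; the names here are mine, not taken from the linked answers):

```haskell
-- An infinite list whose nth element is fib n. Each element is a thunk
-- computed at most once; the recursive cases index back into the same
-- list, reusing elements that have already been forced.
fibs :: [Integer]
fibs = map go [0..]
  where go 0 = 1
        go 1 = 1
        go i = fibs !! (i - 1) + fibs !! (i - 2)

-- The user-facing function is just a lookup into the shared list.
fib :: Int -> Integer
fib n = fibs !! n

main :: IO ()
main = print (fib 30)  -- prints 1346269; fast despite the naive recursion
```

The crucial point is that `fibs` is a top-level value, computed once and shared; if you instead defined the list locally inside `fib`, a fresh list would be built on every call and the memoization would vanish.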

Your question is different: you hope that calling some function will be automatically cached for you, and in general that does not happen. The real question is why any of your results are fast. I'm not sure, but I think it has to do with Constant Applicative Forms (CAFs), which GHC may share between multiple use sites, at its discretion.

The most relevant feature of a CAF here is the "Constant" part: GHC will only introduce such a cache for an expression whose value is constant throughout the entire run of the program, not just for some particular scope. So, you can be sure that f x <> f x will never reuse the result of f x (at least not due to CAF folding; maybe GHC can find some other excuse to memoize this for some functions, but typically it does not).

The two things in your program that are not CAFs are the implementation of slow_fib, and the recursive case of fib_plus_40. GHC definitely cannot introduce any caching of the results of those expressions. The base case for fib_plus_40 is a CAF, as are all of the expressions and subexpressions in main. So, GHC can choose to cache/share any of those subexpressions, or none of them, as it pleases. Perhaps it sees that slow_fib 40 is "obviously" simple enough to save, but it's not so sure about whether the slow_fib 35 expressions in main should be shared. Meanwhile, it sounds like it does decide to share the IO action putStrLn $ show $ slow_fib 35 for whatever reason. Seems like a weird choice to you and me, but we're not compilers.

The moral here is that you cannot count on this at all: if you want to ensure you compute a value only once, you need to save it in a variable somewhere, and refer to that variable instead of recomputing it.
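A sketch of that moral in practice (`slowDouble` is a made-up stand-in for an expensive function; the trace shows how many times it actually runs):

```haskell
import Debug.Trace (trace)

-- A made-up expensive function; the trace fires each time it evaluates.
slowDouble :: Int -> Int
slowDouble x = trace "slowDouble runs" (x * 2)

main :: IO ()
main = do
  -- No sharing guaranteed here: without optimization this traces twice;
  -- with -O, CSE may or may not merge the two calls.
  print (slowDouble 21 + slowDouble 21)
  -- Sharing by construction: one thunk, forced once, used twice.
  let y = slowDouble 21
  print (y + y)
```

Both lines print `84`, but only the `let`-bound version is guaranteed to trace once: the variable names a single thunk, so the work cannot be repeated no matter what the compiler decides.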


To confirm this, I took luqui's advice and looked at the -ddump-simpl output. Here are some snippets showing the explicit caching:

-- RHS size: {terms: 2, types: 0, coercions: 0}
lvl1_r4ER :: Integer
[GblId, Str=DmdType]
lvl1_r4ER = $wslow_fib_r4EP 40#

Rec {
-- RHS size: {terms: 21, types: 4, coercions: 0}
Main.main_fib_plus_40 [Occ=LoopBreaker] :: Integer -> Integer
[GblId, Arity=1, Str=DmdType <S,U>]
Main.main_fib_plus_40 =
  \ (n_a1DF :: Integer) ->
    case integer-gmp-1.0.0.1:GHC.Integer.Type.leInteger#
           n_a1DF Main.main7
    of wild_a2aQ { __DEFAULT ->
    case GHC.Prim.tagToEnum# @ Bool wild_a2aQ of _ [Occ=Dead] {
      False ->
        integer-gmp-1.0.0.1:GHC.Integer.Type.plusInteger
          (Main.main_fib_plus_40
             (integer-gmp-1.0.0.1:GHC.Integer.Type.minusInteger
                n_a1DF Main.main4))
          (Main.main_fib_plus_40
             (integer-gmp-1.0.0.1:GHC.Integer.Type.minusInteger
                n_a1DF lvl_r4EQ));
      True -> lvl1_r4ER
    }
    }
end Rec }

This doesn't tell us why GHC is choosing to introduce this cache - remember, it's allowed to do what it wants. But it does confirm the mechanism, that it introduces a variable to hold the repeated calculation. I can't show you core for your longer main involving smaller numbers, because when I compile it I get more sharing: the expressions in section 2 are cached for me as well.