Haskell - why would I use infinite data structures?

  • Annotating a list with its indices temporarily uses an infinite list of indices:

    zip [0..] ['a','b','c','d'] = [(0,'a'), (1,'b'), (2,'c'), (3,'d')]
    
  • Memoizing functions while maintaining purity (in this case the transformation gives an exponential speedup, because the memo table is used recursively):

    fib = (memo !!)
        where
        memo = map fib' [0..]  -- cache of *all* fibonacci numbers (evaluated on demand) 
        fib' 0 = 0
        fib' 1 = 1
        fib' n = fib (n-1) + fib (n-2)
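
    A quick check in GHCi with the definition above loaded; the naive recursion would take astronomically long for an input this size, but the memoized version answers immediately because each fib k is computed at most once and then read back out of memo:

    > fib 100
    354224848179261915075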
    
  • Purely mocking programs with side-effects (free monads)

    data IO a = Return a
              | GetChar (Char -> IO a)
              | PutChar Char (IO a)
    

    Potentially non-terminating programs are represented with infinite IO structures; e.g. forever (putChar 'y') = PutChar 'y' (PutChar 'y' (PutChar 'y' ...))
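
    Because the program is now just a data value, you can interpret it purely. Here is a minimal sketch (the name runPure is mine, not from any library, and it assumes the mock IO type above is the one in scope): it feeds the program a list of input characters and collects whatever it prints, and thanks to laziness the output streams even for the infinite program above:

    -- feed a mock program a list of input characters; return its result
    -- and the characters it printed
    runPure :: IO a -> [Char] -> (a, [Char])
    runPure (Return x)    _      = (x, [])
    runPure (GetChar k)   (c:cs) = runPure (k c) cs
    runPure (GetChar _)   []     = error "runPure: ran out of input"
    runPure (PutChar c k) cs     =
        let (x, out) = runPure k cs
        in  (x, c : out)     -- the pair is built lazily, so the output streams

    -- e.g. with  ys = PutChar 'y' ys   (i.e. forever (putChar 'y')),
    --   take 3 (snd (runPure ys ""))  ==  "yyy"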

  • Tries: if you define a type approximately like the following:

    data Trie a = Trie a (Trie a) (Trie a)
    

    it can represent an infinite collection of values of type a indexed by the natural numbers. Note that there is no base case for the recursion, so every Trie is infinite. But the element at index n can be accessed in O(log n) time (a sketch of one possible indexing scheme appears at the end of this example), which means you can do something like this (using some functions in the inttrie library):

    findIndices :: [Integer] -> Trie [Integer]
    findIndices = foldr (\(i,x) -> modify x (i:)) (pure []) . zip [0..]
    

    this builds an efficient "reverse lookup table" which, given any value in the list, tells you at which indices it occurs, and it both caches results and streams information as soon as it's available:

    -- N.B. findIndices [0, 0,1, 0,1,2, 0,1,2,3, 0,1,2,3,4...]
    > table = findIndices (concat [ [0..n] | n <- [0..] ])
    > table `apply` 0
    [0,1,3,6,10,15,21,28,36,45,55,66,78,91,...
    

    all from a one-line infinite fold.
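
    To see how a type with no base case can still be indexed in O(log n) time, here is one possible indexing scheme for the Trie type above (just a sketch of the idea; the inttrie library's actual layout may differ): the first field holds the value at index 0, one subtree holds the values at odd indices, the other the values at the remaining even indices, so every step halves the index:

    index :: Trie a -> Integer -> a
    index (Trie x _    _    ) 0 = x
    index (Trie _ odds evens) n
      | odd n     = index odds  (n `div` 2)      -- index 2k+1 is at position k in odds
      | otherwise = index evens (n `div` 2 - 1)  -- index 2k (k > 0) is at position k-1 in evens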

I'm sure there are more examples; there are so many cool things you can do.


The basic advantage of lists in Haskell is that they’re a control structure that looks like a data structure. You can write code that operates incrementally on streams of data, but it looks like simple operations on lists. This is in contrast to other languages that require the use of an explicitly incremental structure, like iterators (Python’s itertools), coroutines (C# IEnumerable), or ranges (D).

For example, a sort function can be written in such a way that it sorts as few elements as possible before it starts to produce results. While sorting the entire list takes O(n log n) / linearithmic time in the length of the list, minimum xs = head (sort xs) only takes O(n) / linear time, because head will only examine the first constructor of the list, like x : _, and leave the tail as an unevaluated thunk that represents the remainder of the sorting operation.
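
To make that concrete, here is a sketch of a sort with exactly that property (a simple selection sort for illustration; the standard library's sort is a merge sort, but the same incremental behaviour applies):

import Data.List (delete)

-- Produces the minimum first; the recursive call that sorts the rest of the
-- list stays an unevaluated thunk until someone demands more elements.
lazySort :: Ord a => [a] -> [a]
lazySort [] = []
lazySort xs = m : lazySort (delete m xs)
  where m = minimum xs

-- head (lazySort xs) therefore only does the O(n) work of finding the minimum.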

This means that performance is compositional: for example, if you have a long chain of operations on a stream of data, like sum . map (* 2) . filter (< 5), it looks like it would first filter all the elements, then map a function over them, then take the sum, producing a full intermediate list at every step. But what actually happens is that each element is processed one at a time: given [1, 2, 6], it basically proceeds as follows, with all the steps interleaved:

  • total = 0
  • 1 < 5 is true
  • 1 * 2 == 2
  • total = 0 + 2 = 2
  • 2 < 5 is true
  • 2 * 2 == 4
  • total = 2 + 4 = 6
  • 6 < 5 is false
  • result = 6

This is exactly how you would write a fast loop in an imperative language (pseudocode):

total = 0;
for x in xs {
  if (x < 5) {
    total = total + x * 2;
  }
}
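
For comparison, here is that loop written out by hand in Haskell (a sketch of what the composed pipeline effectively computes; with laziness you get this incremental behaviour without having to fuse the stages yourself):

import Data.List (foldl')

-- the whole pipeline as one explicit, accumulator-passing loop
sumDoubledSmall :: [Int] -> Int
sumDoubledSmall = foldl' step 0
  where
    step total x
      | x < 5     = total + x * 2   -- keep the element: double it and add it in
      | otherwise = total           -- skip it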

This is what makes performance compositional: because of laziness, this code runs in constant memory while processing the list. And there is nothing special inside map or filter that makes this happen: they can be defined entirely independently of each other, and composing them still processes the stream incrementally.

For another example, the standard library function and computes the logical AND of a list, e.g. and [a, b, c] == a && b && c, and it's implemented simply as a fold: and = foldr (&&) True. The moment it reaches a False element in the input, it stops evaluating the rest of the list, simply because && is lazy in its right argument. Laziness gives you composition!
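
Because && short-circuits like this, and can even produce an answer for some infinite lists; for example, in GHCi:

> and (False : repeat True)    -- (&&) never looks at the infinite tail
False
> and (map (< 1000) [1..])     -- stops as soon as 1000 fails the test
False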

For a great paper on all this, read the famous Why Functional Programming Matters by John Hughes, which goes over the advantages of lazy functional programming (in Miranda, a forebear of Haskell) far better than I could.