Walk through a list split function in Haskell

Let’s try running this function on a sample input list, say [1,2,3,4,5]:

  1. We start with foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [1,2,3,4,5]. Because this is a right fold, the results are combined starting from the end of the list: the last element, a = 5, is paired with the initial value (x,y) = ([],[]), so (a:y,x) = ([5],[]).
  2. Moving one element to the left, a = 4 and (x,y) = ([5],[]), so (a:y,x) = ([4],[5]). Note that the order of the lists has swapped. Each step swaps the lists again, and the new element is always consed onto the list that was second, which then becomes the first; this is how the splitting works.
  3. The next element is a = 3, and (x,y) = ([4],[5]), so (a:y,x) = ([3,5],[4]).
  4. The next element is a = 2, and (x,y) = ([3,5],[4]), so (a:y,x) = ([2,4],[3,5]).
  5. The next element is a = 1, and (x,y) = ([2,4],[3,5]), so (a:y,x) = ([1,3,5],[2,4]).
  6. There are no more elements left, so the return value is ([1,3,5],[2,4]).

As the walkthrough shows, the split function works by maintaining two lists, swapping them at each step, and prepending each input element to the list that was second before the swap, so consecutive elements land in alternating lists.
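
One way to check the intermediate pairs is to replace foldr with scanr, which keeps every partial result of the right fold. This is only a sketch: the names step, split and splitTrace are made up for illustration and do not appear in the original question.

```haskell
-- Hypothetical names; only the lambda body comes from the question.
step :: a -> ([a], [a]) -> ([a], [a])
step a ~(x, y) = (a : y, x)

-- The split function under discussion.
split :: [a] -> ([a], [a])
split = foldr step ([], [])

-- All intermediate pairs of the right fold, final result first,
-- initial value ([],[]) last.
splitTrace :: [a] -> [([a], [a])]
splitTrace = scanr step ([], [])
```

In GHCi, splitTrace [1,2,3,4,5] should list ([1,3,5],[2,4]) first and ([],[]) last, with the two lists trading places at every step in between.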


We can take a look at an example. Suppose we have the list [1, 4, 2, 5]. Processing it means evaluating:

foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [1,4,2,5]

Here a is the first item of the list, so the call returns something like:

(1:y, x)
    where (x, y) = foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [4,2,5]

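Notice that the (x, y) tuple coming from the recursive call is swapped: a is prepended to y, its second item, and the result becomes the first item of the new 2-tuple. Expanding the recursive call one more level gives: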

(1:y, x)
    where (x, y) = (4:y', x')
          (x', y') = foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [2,5]

If we keep doing that, we obtain:

(1:y, x)
    where (x, y) = (4:y', x')
          (x', y') = (2:y'', x'')
          (x'', y'') = (5:y''', x''')
          (x''', y''') = foldr (\a ~(x,y) -> (a:y,x)) ([],[]) []

Since we have reached the end of the list, foldr … ([], []) [] evaluates to the initial 2-tuple ([], []):

(1:y, x)
    where (x, y) = (4:y', x')
          (x', y') = (2:y'', x'')
          (x'', y'') = (5:y''', x''')
          (x''', y''') = ([],[])

So x''' = [] and y''' = [], and the expression resolves to:

(1:y, x)
    where (x, y) = (4:y', x')
          (x', y') = (2:y'', x'')
          (x'', y'') = (5:[], [])
          (x''', y''') = ([],[])

so x'' = [5] and y'' = []:

(1:y, x)
    where (x, y) = (4:y', x')
          (x', y') = (2:[], [5])
          (x'', y'') = (5:[], [])
          (x''', y''') = ([],[])

so x' = [2] and y' = [5]:

(1:y, x)
    where (x, y) = (4:[5], [2])
          (x', y') = (2:[], [5])
          (x'', y'') = (5:[], [])
          (x''', y''') = ([],[])

so x = [4, 5] and y = [2], and eventually we obtain:

(1:[2], [4,5])
    where (x, y) = (4:[5], [2])
          (x', y') = (2:[], [5])
          (x'', y'') = (5:[], [])
          (x''', y''') = ([],[])

so the result is the expected ([1,2], [4,5]).
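
The where-expansion above can also be written directly as explicit recursion. This is a sketch under the assumption that a where-bound pattern behaves like the lazy ~(x,y) pattern (pattern bindings in where are lazy in Haskell); the name splitRec is hypothetical.

```haskell
-- Hypothetical name; mirrors the "(a:y, x) where (x, y) = ..." expansion.
splitRec :: [a] -> ([a], [a])
splitRec []       = ([], [])
splitRec (a : as) = (a : y, x)
  where
    -- Lazy pattern binding: the recursive call is only evaluated
    -- when x or y is actually demanded.
    (x, y) = splitRec as
```

Evaluating splitRec [1,4,2,5] in GHCi should give ([1,2],[4,5]), the same pair derived by hand.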


Approximately,

foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [a,b,c,d,e]
=
let g a ~(x,y) = (a:y,x) in
g a $ g b $ g c $ g d $ g e ([],[])
=
g a $ g b $ g c $ g d $ ([e],[])
=
g a $ g b $ g c $ ([d],[e])
=
g a $ g b $ ([c,e],[d])
=
g a $ ([b,d],[c,e])
=
([a,c,e],[b,d])

But truly,

foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [a,b,c,d,e]
=
let g a ~(x,y) = (a:y,x) in
g a $ foldr g ([],[]) [b,c,d,e]
=
(a:y,x) where 
    (x,y) = foldr g ([],[]) [b,c,d,e]
=
(a:y,x) where 
    (x,y) = (b:y2,x2) where
                 (x2,y2) = foldr g ([],[]) [c,d,e]
=
(a:y,x) where 
    (x,y) = (b:y2,x2) where
                 (x2,y2) = (c:y3,x3) where
                                (x3,y3) = (d:y4,x4) where
                                               (x4,y4) = (e:y5,x5) where
                                                              (x5,y5) = ([],[])

which is forced top-down by access (if and when it happens), being progressively fleshed out as, e.g.,

=
(a:x2,b:y2) where 
                 (x2,y2) = (c:y3,x3) where
                                (x3,y3) = (d:y4,x4) where
                                               (x4,y4) = (e:y5,x5) where
                                                              (x5,y5) = ([],[])
=
(a:c:y3,b:x3) where 
                                (x3,y3) = (d:y4,x4) where
                                               (x4,y4) = (e:y5,x5) where
                                                              (x5,y5) = ([],[])
=
(a:c:x4,b:d:y4) where 
                                               (x4,y4) = (e:y5,x5) where
                                                              (x5,y5) = ([],[])
=
(a:c:e:y5,b:d:x5) where 
                                                              (x5,y5) = ([],[])
=
(a:c:e:[],b:d:[]) 

but it could be that the forcing will be done in a different order, depending on how it is called, e.g.

print . (!!1) . snd $ foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [a,b,c,d,e]
print . (!!2) . fst $ foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [a,b,c,d,e]

etc.
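
To make the access-driven forcing concrete, here is a small sketch. The placeholders a..e are replaced by the characters of "abcde", which is an assumption made only so the expressions can be evaluated; split and the three value names are hypothetical.

```haskell
-- The split function from the question, under a hypothetical name.
split :: [a] -> ([a], [a])
split = foldr (\a ~(x, y) -> (a : y, x)) ([], [])

-- Only as much of the result as each access needs gets forced.
secondOfSnd :: Char
secondOfSnd = (!! 1) . snd $ split "abcde"   -- 'd'

thirdOfFst :: Char
thirdOfFst = (!! 2) . fst $ split "abcde"    -- 'e'

-- Because the result is built top-down, a prefix of it is available
-- even when the input list is infinite.
firstThreeOdds :: [Integer]
firstThreeOdds = take 3 . fst $ split [1 ..]  -- [1,3,5]
```

The last definition only terminates because of the lazy pattern: each step hands back its pair before looking at the result for the rest of the list.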


edit: to address the questions about the lazy pattern, it is done for proper laziness of the resulting function:

  • foldr with a combining function that is strict in its second argument encodes recursion, which is bottom-up. The result of recursively processing the rest of the list is constructed first, and the head portion of the result is combined with it afterwards.

  • foldr with a combining function that is lazy in its second argument encodes corecursion, which is top-down. The head portion of the resulting value is constructed first, and the rest is filled in later. It is very reminiscent of tail recursion modulo cons (TRMC), in Prolog and elsewhere. Lazy evaluation as a concept came from "CONS should not evaluate its arguments"; TRMC does not evaluate the second argument to the constructor until later, which is what really matters.
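
The difference between the two bullets can be seen by dropping the ~ from the pattern. The following is a sketch; lazySplit, strictSplit and headOfPartial are hypothetical names, and the partial list 1 : undefined is just a device for showing what gets forced.

```haskell
-- Lazy in the second argument: the pair is built before the
-- recursive result is inspected (corecursion, top-down).
lazySplit :: [a] -> ([a], [a])
lazySplit = foldr (\a ~(x, y) -> (a : y, x)) ([], [])

-- Strict in the second argument: the recursive result must be a
-- pair constructor before anything is returned (recursion, bottom-up).
strictSplit :: [a] -> ([a], [a])
strictSplit = foldr (\a (x, y) -> (a : y, x)) ([], [])

-- The lazy version can produce the first element without touching
-- the undefined tail of the input.
headOfPartial :: [Int]
headOfPartial = take 1 . fst $ lazySplit (1 : undefined)   -- [1]
```

On finite, fully defined lists both versions return the same pair. Replacing lazySplit with strictSplit in headOfPartial would raise an exception instead, because the pattern match forces the fold over the undefined tail; for the same reason strictSplit never produces any output on an infinite list.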