Why is this version of 'fix' more efficient in Haskell?

I don't think this always (or necessarily ever) helps when you're calling fix with a function that takes two arguments to produce a function taking one argument. You'd have to run some benchmarks to see. But you can also call it with a function taking one argument!

fix (1 :)

is a circular linked list. Using the naive definition of fix, it would instead be an infinite list, with new pieces built lazily as the structure is forced.


The slow fix calls f on each step of recursion, while the fast one calls f exactly once. It can be visualized with tracing:

import Debug.Trace

fix  f = f (fix f)
fix' f = let x = f x in x

facf :: (Int -> Int) -> Int -> Int
facf f 0 = 1
facf f n = n * f (n - 1)

tracedFacf x = trace "called" facf x

fac  = fix tracedFacf
fac' = fix' tracedFacf

Now try some running:

> fac 3
called
called
called
called
6
> fac' 3
called
6

In more detail, let x = f x in x results in a lazy thunk being allocated for x, and a pointer to this thunk is passed to f. On first evaluating fix' f, the thunk is evaluated and all references to it (here specifically: the one we pass to f) are redirected to the resulting value. It just happens that x is given a value that also contains a reference to x.

I admit this can be rather mind-bending. It's something that one should get used to when working with laziness.


I believe this has been asked already, but I couldn't find the answer. The reason is that the first version

fix f = f (fix f)

is a recursive function, so it can't be inlined and then optimized. From the GHC manual:

For example, for a self-recursive function, the loop breaker can only be the function itself, so an INLINE pragma is always ignored.

But

fix f = let x = f x in x

isn't recursive, the recursion is moved into the let binding, so it's possible to inline it.

Update: I did some tests and while the former version doesn't inline while the latter does, it doesn't seem to be crucial for performance. So the other explanations (a single object on heap vs creating one every iteration) seem to be more accurate.