What is the connection between laziness and purity?

You're right, from a modern POV this doesn't really make sense. What is true is that lazy-by-default would make reasoning about side-effectful code a nightmare, so lazyness does require purity – but not the other way around.

What does require lazyness though is the way Haskell in versions 1.0–1.2, following its predecessor Miranda, emulated IO without monads. Short of any explicit notion of side-effect sequencing, the type of executable programs was

main :: [Response] -> [Request]

which would, for a simple interactive program, work something like this: main would at first just ignore its input list. So thanks to lazyness, the values in that list wouldn't actually need to exist, at that point. In the meantime it would produce a first Request value, e.g. a terminal prompt for the user to type something. What was typed would then come back as a Response value, which only now actually needed to be evaluated, giving rise to a new Request, etc. etc..

https://www.haskell.org/definition/haskell-report-1.0.ps.gz

In version 1.3 they then switched to the monadic-IO interface that we all know and love today, and at that point lazyness wasn't really necessary anymore. But before that, common wisdom was that the only way to interact with the real world without lazyness was to allow side-effectful functions, thus the statement that without lazyness, Haskell would just have gone down the same path as Lisp and ML before it.


There are two aspects to this tweet: first, the fact that from a technical point of view, laziness generally mandates purity; second, the fact that from a practical point of view, strictness could still allow purity, but in practice it usually doesn't (i.e., with strictness, purity "goes out the window").

Simon Peyton-Jones explains both of these aspects in the paper "A History of Haskell: Being Lazy With Class". With respect to the technical aspect, in the section 3.2 Haskell is Pure, he writes (with my bold emphasis):

An immediate consequence of laziness is that evaluation order is demand-driven. As a result, it becomes more or less impossible to reliably perform input/output or other side effects as the result of a function call. Haskell is, therefore, a pure language.

If you can't see why laziness makes impure effects unreliable, I'm sure it's because you're over-thinking it. Here's a simple example illustrating the problem. Consider a hypothetical impure function that reads some information from a configuration file, namely some "basic" configuration and some "extended" configuration whose format depends on the configuration file version information in the header:

getConfig :: Handle -> Config
getConfig h =
  let header = readHeader h
      basic = readBasicConfig h
      extended = readExtendedConfig (headerVersion header) h
  in Config basic extended

where readHeader, readBasicConfig, and readExtendedConfig are all impure functions that sequentially read bytes from the file (i.e., using typical, file pointer-based sequential reads) and parse them into the appropriate data structures.

In a lazy language, this function probably can't work as intended. If the header, basic, and extended variable values are all lazily evaluated, then if the caller forces basic first followed by extended, the effects will be called in order readBasic, readHeader, readExtendedConfig; while if the caller forces extended first followed by basic, the effects will be called in order readHeader, readExtendedConfig, readBasic. In either case, bytes intended to be parsed by one function will be parsed by another.

And, these evaluation orders are gross oversimplifications which assume that the effects of the sub-functions are "atomic" and that readExtendedConfig reliably forces the version argument for any access to extended. If not, depending on which parts of basic and extended are forced, the order of the (sub)effects in readBasic, readExtendedConfig, and readHeader may be reordered and/or intermingled.

You can work around this specific limitation by disallowing sequential file access (though that comes with some significant costs!), but similar unpredictable out-of-order effect execution will cause problems with other I/O operations (how do we ensure that a file updating function reads the old contents before truncated the file for update?), mutable variables (when exactly does that lock variable get incremented?), etc.

With respect to the practical aspect (again with my bold emphasis), SPJ writes:

Once we were committed to a lazy language, a pure one was inescapable. The converse is not true, but it is notable that in practice most pure programming languages are also lazy. Why? Because in a call-by-value language, whether functional or not, the temptation to allow unrestricted side effects inside a "function" is almost irresistible.

...

In retrospect, therefore, perhaps the biggest single benefit of laziness is not laziness per se, but rather that laziness kept us pure, and thereby motivated a great deal of productive work on monads and encapsulated state.

In his tweet, I believe Hutton is referring not to the technical consequence of laziness leading to purity, but rather the practical consequence of strictness tempting the language designers to relax purity "just in this one, special case", after which purity quickly goes out the window.