How does a Haskell compiler work?

Compilers are a huge subject and it would be impossible to explain them in their entirety here. But here's an overview of a general compiler. Hopefully this will give you enough background to make reading material specifically about GHC a little easier.

Compilers generally work as a series of transformations, split into two parts: the front-end and the back-end.

The first transformation turns plain text into something a little easier to traverse. This is itself usually split into two parts:

Lexical Analysis or Tokenization - The act of transforming plain text into small chunks, called tokens (typically operators, identifiers, literals, etc.).

Syntactic Analysis or Parsing - Turning these tokens into a tree structure (typically an AST, an Abstract Syntax Tree).
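
Here's a tiny sketch of those two phases in Haskell for a toy expression language. The token and AST types and the grammar are made up for illustration; GHC's real lexer and parser are generated with Alex and Happy and are far more involved:

    module Main where

    import Data.Char (isDigit, isSpace)

    -- Lexical analysis: plain text -> tokens
    data Token = TNum Int | TPlus | TTimes
      deriving (Show, Eq)

    tokenize :: String -> [Token]
    tokenize [] = []
    tokenize (c:cs)
      | isSpace c = tokenize cs
      | c == '+'  = TPlus  : tokenize cs
      | c == '*'  = TTimes : tokenize cs
      | isDigit c = let (digits, rest) = span isDigit (c:cs)
                    in TNum (read digits) : tokenize rest
      | otherwise = error ("unexpected character: " ++ [c])

    -- Syntactic analysis: tokens -> abstract syntax tree
    data Expr = Num Int | Add Expr Expr | Mul Expr Expr
      deriving Show

    -- A tiny recursive-descent parser:
    --   expr ::= term ('+' expr)?     term ::= num ('*' term)?
    parseExpr :: [Token] -> (Expr, [Token])
    parseExpr ts =
      let (lhs, rest) = parseTerm ts
      in case rest of
           TPlus : rest' -> let (rhs, rest'') = parseExpr rest'
                            in (Add lhs rhs, rest'')
           _             -> (lhs, rest)

    parseTerm :: [Token] -> (Expr, [Token])
    parseTerm (TNum n : TTimes : rest) =
      let (rhs, rest') = parseTerm rest
      in (Mul (Num n) rhs, rest')
    parseTerm (TNum n : rest) = (Num n, rest)
    parseTerm _ = error "parse error"

    main :: IO ()
    main = print (fst (parseExpr (tokenize "1 + 2 * 3")))
    -- Add (Num 1) (Mul (Num 2) (Num 3))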

The next stage is semantic analysis. In this stage a compiler will usually add information to the AST (like type information) and build a symbol table. That concludes the front-end.
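
As a rough illustration, semantic analysis for a toy language might look like this: walk the AST, look identifiers up in a symbol table, and produce a tree where every node carries its type. The Type, TypedExpr and check names are invented for the sketch; GHC's real type checker is vastly more sophisticated:

    module Main where

    import qualified Data.Map as Map

    data Type = TInt | TBool
      deriving (Show, Eq)

    data Expr = Lit Int | BoolLit Bool | Var String | Add Expr Expr
      deriving Show

    -- The same tree shape, but every node now carries its inferred type.
    data TypedExpr
      = TLit Int Type
      | TBoolLit Bool Type
      | TVar String Type
      | TAdd TypedExpr TypedExpr Type
      deriving Show

    type SymbolTable = Map.Map String Type

    typeOf :: TypedExpr -> Type
    typeOf (TLit _ t)     = t
    typeOf (TBoolLit _ t) = t
    typeOf (TVar _ t)     = t
    typeOf (TAdd _ _ t)   = t

    check :: SymbolTable -> Expr -> Either String TypedExpr
    check _   (Lit n)     = Right (TLit n TInt)
    check _   (BoolLit b) = Right (TBoolLit b TBool)
    check env (Var x)     = case Map.lookup x env of
      Just t  -> Right (TVar x t)
      Nothing -> Left ("unbound variable: " ++ x)
    check env (Add a b)   = do
      a' <- check env a
      b' <- check env b
      if typeOf a' == TInt && typeOf b' == TInt
        then Right (TAdd a' b' TInt)
        else Left "type error: (+) expects Ints"

    main :: IO ()
    main = print (check (Map.fromList [("x", TInt)]) (Add (Var "x") (Lit 1)))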

The next transformation converts the AST into an IR, an Intermediate Representation. Nowadays this is generally in SSA form (Static Single Assignment).
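
To make the SSA idea concrete, here's a toy IR as a Haskell data type, where every virtual register is assigned exactly once. This is a generic illustration, not GHC's IR; GHC actually lowers through its own representations (Core, STG and Cmm), and SSA proper mainly shows up if you use the LLVM back-end:

    module Main where

    -- In SSA form every variable is assigned exactly once, so
    --     x = 1; x = x + 2; y = x * 3
    -- becomes
    --     x1 = 1; x2 = x1 + 2; y1 = x2 * 3
    data Operand = Const Int | Reg String
      deriving Show

    data Instr
      = Assign String Operand          -- x1 = 1
      | AddI   String Operand Operand  -- x2 = x1 + 2
      | MulI   String Operand Operand  -- y1 = x2 * 3
      deriving Show

    program :: [Instr]
    program =
      [ Assign "x1" (Const 1)
      , AddI   "x2" (Reg "x1") (Const 2)
      , MulI   "y1" (Reg "x2") (Const 3)
      ]

    main :: IO ()
    main = mapM_ print program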

This is then optimized via passes such as constant propagation, dead code elimination, and vectorization.
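
For example, a constant-folding/propagation pass over a toy expression tree can be written as a small rewrite. This is just a sketch of the idea, not an actual GHC pass (GHC's simplifier does this and much more on Core):

    module Main where

    data Expr = Num Int | Add Expr Expr | Mul Expr Expr
      deriving Show

    fold :: Expr -> Expr
    fold (Add a b) =
      case (fold a, fold b) of
        (Num x, Num y) -> Num (x + y)   -- both operands known: compute now
        (a', b')       -> Add a' b'
    fold (Mul a b) =
      case (fold a, fold b) of
        (Num x, Num y) -> Num (x * y)
        (Num 0, _)     -> Num 0         -- algebraic simplification
        (_, Num 0)     -> Num 0
        (a', b')       -> Mul a' b'
    fold e = e

    main :: IO ()
    main = print (fold (Mul (Add (Num 1) (Num 2)) (Num 3)))
    -- Num 9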

The last transformation is code generation: transforming the IR into machine code. This can be very complicated, and is also sometimes referred to as lowering.
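
As a final toy example, here's code generation that lowers the expression tree to instructions for a simple stack machine. A real back-end targets a register machine and has to deal with instruction selection, register allocation, calling conventions and so on:

    module Main where

    data Expr = Num Int | Add Expr Expr | Mul Expr Expr

    data Instr = Push Int | IAdd | IMul
      deriving Show

    -- Post-order walk: emit code for the operands, then the operator.
    codegen :: Expr -> [Instr]
    codegen (Num n)   = [Push n]
    codegen (Add a b) = codegen a ++ codegen b ++ [IAdd]
    codegen (Mul a b) = codegen a ++ codegen b ++ [IMul]

    main :: IO ()
    main = mapM_ print (codegen (Add (Num 1) (Mul (Num 2) (Num 3))))
    -- Push 1, Push 2, Push 3, IMul, IAdd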

For more information I recommend the Wikipedia article on compilers: http://en.wikipedia.org/wiki/Compiler


You can get an answer from the horse's mouth! Simon Peyton Jones (GHC wizard) wrote a book explaining how to implement functional programming languages. It's available for free online since it's now out of print: http://research.microsoft.com/en-us/um/people/simonpj/papers/pj-lester-book/

Of course, GHC has moved on since the book was written, but it's still very relevant.


Are you looking for details specifically about compiling lazy evaluation? There is Simon Peyton Jones's book mentioned by Max Bolingbroke, and the book detailing Clean's implementation is also online:

http://wiki.clean.cs.ru.nl/Functional_Programming_and_Parallel_Graph_Rewriting

If you have a university affiliation and want something smaller, you could try to get these books (Henderson & Diller are certainly out of print):

Antoni Diller "Compiling Functional Languages" ISBN 0 471 92027 4

Peter Henderson "Functional Programming: Application and Implementation" ISBN 0-13-331579-7

AJT Davie "An Introduction to Functional Programming Systems using Haskell" ISBN 0 521 27724 8

Diller has a full compiler for a lazy language (implemented in Pascal) via combinator reduction. This was the implementation technique invented by David Turner for SASL. Henderson has many parts of a compiler for LISPkit, a miniature, lazy variant of Lisp. Davie details quite a bit of the machinery for compiling a lazy language; for instance, there's a description of the STG that's much shorter than Simon Peyton Jones's book (the STG is the abstract machine SPJ used for Haskell).
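
If you haven't seen combinator reduction before, here's a minimal sketch of the idea in Haskell: expressions are built from the S, K and I combinators and evaluated by repeatedly rewriting the leftmost redex. This is only an illustration of the technique, not Diller's or Turner's actual implementation:

    module Main where

    data Comb = S | K | I | Lit Int | App Comb Comb
      deriving Show

    -- One step of leftmost-outermost reduction, if a rule applies.
    step :: Comb -> Maybe Comb
    step (App I x)                 = Just x                          -- I x     -> x
    step (App (App K x) _)         = Just x                          -- K x y   -> x
    step (App (App (App S f) g) x) = Just (App (App f x) (App g x))  -- S f g x -> f x (g x)
    step (App f x)                 = case step f of
                                       Just f' -> Just (App f' x)
                                       Nothing -> fmap (App f) (step x)
    step _                         = Nothing

    -- Reduce to normal form.
    reduce :: Comb -> Comb
    reduce c = maybe c reduce (step c)

    main :: IO ()
    main = print (reduce (App (App (App S K) K) (Lit 42)))
    -- S K K behaves as the identity, so this reduces to Lit 42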

The Clean developers have quite a bit of info on implementing SAPL (a Simple Applicative Language) if you look through their publications list:

https://clean.cs.ru.nl/Publications

Finally, there are quite a number of papers documenting aspects of the Utrecht Haskell Compiler UHC (and EHC). I think most of the information covers how the compiler is organized (with attribute grammars and "Shuffle") and how the type systems (there are various levels of type system in EHC) are implemented, rather than how the back-end 'compilation' works.


Unfortunately, I suspect that what you're looking for doesn't exist. Compiler theory and Formal Language theory are reasonably complex topics in Computer Science, and Haskell is by no means a starting point.

First, you should probably get a good grounding in:

  • Lexical Analysis: http://en.wikipedia.org/wiki/Lexical_analysis
  • Parsing: http://en.wikipedia.org/wiki/Parsing#Programming_languages
    • Context Free (and other) Grammar systems (CFG, BNF)
  • Code Generation: http://en.wikipedia.org/wiki/Code_generation_(compiler)

I suspect that anything explaining the internals of Haskell would require a substantially better understanding of the above topics than, say, C would.

I've taken a single course on the subject so far, so I have no formal literature to recommend, but I'm sure there exist many good sources.