Why is there no implicit parallelism in Haskell?

This is a long-studied topic. While you can implicitly derive parallelism from Haskell code, the problem is that there is too much parallelism, at too fine a grain, for current hardware.

So you end up spending effort on bookkeeping, not on running things faster.

Since we don't have infinite parallel hardware, it is all about picking the right granularity -- too coarse and there will be idle processors, too fine and the overheads will be unacceptable.

What we have instead is coarser-grained parallelism (sparks), suitable for generating thousands or millions of parallel tasks (so not at the instruction level), which map down onto the mere handful of cores we typically have available today.
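As a rough sketch of what a spark looks like (the function parSum is my own illustrative name; par and pseq are re-exported by Control.Parallel from the parallel package, and also live in GHC.Conc in base):

```haskell
import GHC.Conc (par, pseq)  -- same par/pseq that Control.Parallel exports

-- `a `par` b` creates a spark for `a` (a hint that it may be evaluated
-- in parallel) and returns `b`; the runtime is free to discard the spark
-- if no core is idle. `pseq` forces `b` first so the main thread does
-- not just evaluate `a` itself before the spark is picked up.
parSum :: [Int] -> [Int] -> Int
parSum xs ys = a `par` (b `pseq` (a + b))
  where
    a = sum xs
    b = sum ys
```

A spark is far cheaper than an OS thread, which is why it is feasible to create millions of them; unconverted sparks simply get evaluated sequentially when their value is demanded.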

Note that for some subsets (e.g. array processing) there are fully automatic parallelization libraries with tight cost models.

For background on this, see Feedback Directed Implicit Parallelism (Harris and Singh), which introduces an automated approach to inserting par into arbitrary Haskell programs.


While your code block may not be the best example due to the implicit data dependence between a and b, it is worth noting that these two bindings commute, in the sense that

f = do
  a <- Just 1
  b <- Just $ Just 2
  ...

will give the same results as

f = do
  b <- Just $ Just 2
  a <- Just 1
  ...

so this could still be parallelized in a speculative fashion. Note that this need not have anything to do with monads: we could, for instance, evaluate all independent expressions in a let-block in parallel, or introduce a version of let that does so. The lparallel library for Common Lisp does exactly this.
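A minimal sketch of that idea in Haskell (the helper name parLet2 is my own): spark the first binding, force the second, then hand both to the body.

```haskell
import GHC.Conc (par, pseq)  -- also exported by Control.Parallel

-- Hypothetical "parallel let": evaluate two independent bindings,
-- sparking the first so it may run in parallel with the second.
-- If the body never demands `x`, the spark was wasted work --
-- that is the speculative part.
parLet2 :: a -> b -> (a -> b -> c) -> c
parLet2 x y body = x `par` (y `pseq` body x y)
```

For example, `parLet2 (sum [1..10]) (product [1..5]) (+)` evaluates the sum and the product as independent bindings and then adds them.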

Now, I am by no means an expert on the subject, but this is my understanding of the problem. A major stumbling block is determining when it is advantageous to parallelize the evaluation of multiple expressions. There is overhead associated with starting separate threads for evaluation, and, as your example shows, it may result in wasted work. Some expressions may be too small to make parallel evaluation worth the overhead. As I understand it, coming up with a fully accurate metric of the cost of an expression would amount to solving the halting problem, so you are relegated to a heuristic approach for deciding what to evaluate in parallel.
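In practice this heuristic is often written by hand as a granularity cutoff: below some problem size, sparking is not worth it and you stay sequential. A sketch (the function name pfib and the cutoff parameter are illustrative, not from any library):

```haskell
import GHC.Conc (par, pseq)

-- Naive Fibonacci with a granularity cutoff: subproblems smaller than
-- the cutoff are evaluated sequentially, because the cost of creating
-- and scheduling a spark would outweigh the work it saves.
pfib :: Int -> Int -> Integer
pfib cutoff n
  | n < 2      = fromIntegral n
  | n < cutoff = pfib cutoff (n - 1) + pfib cutoff (n - 2)  -- too small: sequential
  | otherwise  = a `par` (b `pseq` (a + b))                 -- spark one branch
  where
    a = pfib cutoff (n - 1)
    b = pfib cutoff (n - 2)
```

Picking the cutoff is exactly the judgment call a compiler would have to automate, which is why fully implicit parallelism is hard.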

Moreover, it is not always faster to throw more cores at a problem. Even when explicitly parallelizing a problem with the many Haskell libraries available, you will often not see much speedup just by evaluating expressions in parallel, because of heavy memory allocation and the strain this puts on the garbage collector and CPU cache. You end up needing a compact memory layout and to traverse your data intelligently. Having 16 threads traverse linked lists will just bottleneck you at your memory bus and could actually make things slower.

At the very least, which expressions can be parallelized effectively is not obvious to many programmers (at least it isn't to this one), so getting a compiler to do it well is non-trivial.