forkIO threads and OS threads

First, let's address one quick misconception:

I understand that these user threads are eventually mapped by the OS to OS threads.

Actually, the Haskell runtime is in charge of choosing which Haskell thread a particular OS thread from its pool is executing.

Now the questions, one at a time.

Why can Haskell threads that do completely different things be mapped to one and the same OS thread?

Ignoring FFI for the moment, all OS threads are actually running the Haskell runtime, which keeps track of a list of ready Haskell threads. The runtime chooses a Haskell thread to execute, and jumps into the code, executing until the thread yields control back to the runtime. At that moment, the runtime has a chance to continue executing the same thread or pick a different one.

In short: many Haskell threads can be mapped to a single OS thread because in reality that OS thread is doing only one thing, namely, running the Haskell runtime.
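To make that concrete, here is a tiny sketch of my own (compile with -threaded and run with, say, +RTS -N2): it forks far more Haskell threads than the runtime has OS worker threads, and getNumCapabilities reports how many capabilities (roughly, slots for OS worker threads) are actually in play.

import Control.Concurrent (forkIO, myThreadId, threadDelay, getNumCapabilities)
import Control.Monad (forM_)

main :: IO ()
main = do
  caps <- getNumCapabilities            -- size of the OS worker pool the runtime uses
  putStrLn ("capabilities: " ++ show caps)
  forM_ [1 .. 1000 :: Int] $ \i ->
    forkIO $ do                         -- a cheap Haskell thread, not an OS thread
      tid <- myThreadId
      putStrLn ("thread " ++ show i ++ " is " ++ show tid)
  threadDelay 1000000                   -- crude: give the forked threads time to run

However many Haskell threads you fork, the number of OS threads the runtime uses stays on the order of the capability count.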

Why was there no need to initiate the OS thread with a fixed instruction (as is needed with forkIO)?

I don't understand this question (and I think it stems from a second misconception). You start OS threads with a fixed instruction in exactly the same sense that you start Haskell threads with a fixed instruction: in both cases, you give it a chunk of code to execute, and that's what it does.
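If a comparison helps, here is a small sketch: forkIO and forkOS take exactly the same kind of "fixed instruction", namely an IO () to run; forkOS merely dedicates a fresh OS thread to it (compile with -threaded).

import Control.Concurrent (forkIO, forkOS, threadDelay)

main :: IO ()
main = do
  _ <- forkIO (putStrLn "lightweight Haskell thread")                  -- scheduled by the runtime
  _ <- forkOS (putStrLn "Haskell thread bound to its own OS thread")
  threadDelay 100000    -- give both a chance to print before main exits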

How does the scheduler(?) recognize user threads in an application that could possibly be distributed?

"Distributed" is a dangerous word: usually, it refers to spreading code across multiple machines (presumably not what you meant here). As for how the Haskell runtime can tell when there's multiple threads, well, that's easy: you tell it when you call forkIO.

In other words, why are OS threads so flexible?

It's not clear to me that OS threads are any more flexible than Haskell threads, so this question is a bit strange.

Last, is there any way to dump the heap of a selected thread from within the application?

I actually don't really know of any tools for dumping the Haskell heap at all, in multithreaded applications or otherwise. You can dump a representation of the part of the heap reachable from a particular object, if you like, using a package like vacuum. I've used vacuum-cairo to visualize these dumps with great success in the past.
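For what it's worth, my recollection is that vacuum-cairo boils down to a single call, roughly like the sketch below; treat the exact module name and signature as approximate rather than gospel.

import System.Vacuum.Cairo (view)     -- from the vacuum-cairo package

main :: IO ()
main = do
  let xs = [1 .. 10 :: Int]
  length xs `seq` view xs             -- draw the heap graph reachable from xs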

For further information, you may enjoy the middle two sections, "Conventions" and "Foreign Imports", from my intro to multithreaded gtk2hs programming, and perhaps also bits of the section on "The Non-Threaded Runtime".


Instead of trying to answer your question directly, I will try to provide a conceptual model for how multi-threaded Haskell programs are implemented. I will ignore many details and complexities.

Operating systems implement preemptive multithreading using hardware interrupts to allow multiple "threads" of computation to run logically on the same core at the same time.

The threads provided by operating systems tend to be heavyweight. They are well suited to certain types of "multi-threaded" applications and, on systems like Linux, are fundamentally the same tool that allows multiple programs to run at the same time (a task they excel at).

But these threads are a bit heavyweight for many uses in high-level languages such as Haskell. Essentially, the GHC runtime works as a mini-OS, implementing its own "threads" on top of OS threads, in the same way an OS implements threads on top of cores.

It is conceptually easy to imagine that a language like Haskell would be implemented in this way. Evaluating Haskell consists of "forcing thunks", where a thunk is a unit of computation that might (1) depend on another value (another thunk) and/or (2) create new thunks.

Thus, one can imagine multiple threads each evaluating thunks at the same time. One would construct a queue of thunks to be evaluated. Each thread would pop the top of the queue, and evaluate that thunk until it was completed, then select a new thunk from the queue. The operation par and its ilk can "spark" new computation by adding a thunk to that queue.
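As a small illustration, using par and pseq from the parallel package (you need -threaded and +RTS -N for the spark to actually be picked up by another worker):

import Control.Parallel (par, pseq)

main :: IO ()
main = do
  let a = sum     [1 .. 10000000 :: Integer]
      b = product [1 .. 20000    :: Integer]
  -- spark 'a' for some worker to evaluate, force 'b' here, then combine
  print (a `par` (b `pseq` (a < b)))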

Extending this model to IO actions is not particularly hard to imagine either. Instead of each thread simply forcing a pure thunk, we imagine the unit of Haskell computation being somewhat more complicated. Pseudo-Haskell for such a runtime:

type Spark = (ThreadId, Action)      -- a unit of work, tagged with the thread it belongs to
data Action = Compute Thunk          -- force a pure thunk
            | Perform IOAction       -- or perform an IO action

Note: this is for conceptual understanding only; don't think things are actually implemented this way.

When we run a Spark, we look for exceptions "thrown" to that thread ID. Assuming we have none, execution consists of either forcing a thunk or performing an IO action.
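To flesh that out, here is one possible toy scheduler step; every name in it is made up for illustration and has no relation to GHC's real runtime:

import qualified Data.Map as Map
import Control.Exception (SomeException, throwIO)

type TId      = Int
newtype Thunk = Thunk Integer        -- stand-in for a suspended pure computation
newtype IOAct = IOAct (IO ())        -- stand-in for a queued IO action
data Action   = Compute Thunk | Perform IOAct
type Spark    = (TId, Action)

-- One scheduler step: deliver a pending exception if there is one,
-- otherwise either force the thunk or perform the IO action.
runSpark :: Map.Map TId SomeException -> Spark -> IO ()
runSpark pending (tid, act) =
  case Map.lookup tid pending of
    Just e  -> throwIO e
    Nothing -> case act of
      Compute (Thunk n)  -> print (n * 2)    -- "forcing" the thunk
      Perform (IOAct io) -> io               -- performing the IO action

For instance, runSpark Map.empty (1, Compute (Thunk 21)) just prints 42, while a non-empty map plays the role of an asynchronous exception being delivered to that thread instead.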

Obviously, my explanation here has been very hand-wavy and has ignored some complexity. For more, the GHC team have written excellent articles such as "Runtime Support for Multicore Haskell" by Marlow et al. You might also want to look at a textbook on operating systems, as they often go into some depth on how to build a scheduler.