How is the fork/join framework better than a thread pool?

If you have n busy threads all working away at 100% independently, that's going to be better than n threads in a Fork-Join (FJ) pool. But it never works out that way.

You might not be able to split the problem precisely into n equal pieces. Even if you do, thread scheduling is some way off being fair, so you'll end up waiting for the slowest thread. If you have multiple tasks, each can run with less than n-way parallelism (generally more efficient), yet ramp up to n-way as other tasks finish.

So why don't we just cut the problem up into FJ-size pieces and have a thread pool work on that? Typical FJ usage cuts the problem into tiny pieces. Doing these in a random order would require a lot of coordination at the hardware level; the overheads would be a killer. In FJ, tasks are put onto a queue that the owning thread reads off in Last In, First Out order (LIFO/stack), while work stealing (generally done by threads on other cores) takes from the other end, First In, First Out (FIFO/"queue"). The result is that long array processing can be done largely sequentially, even though it is broken into tiny chunks. (It might also not be trivial to break the problem up into small, evenly sized chunks in one big bang; say, when dealing with some form of hierarchy that isn't balanced.)
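To make that LIFO/FIFO point concrete, here is a deliberately simplified model of a worker's deque. This is not how ForkJoinPool is actually implemented (the real implementation is far more sophisticated than a synchronized ArrayDeque), and the WorkDeque name is made up for illustration:

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Simplified model of a fork/join worker's deque (NOT the real
    // ForkJoinPool internals): the owner works LIFO, thieves steal FIFO.
    class WorkDeque<T> {
        private final Deque<T> deque = new ArrayDeque<>();

        // Owner pushes newly forked subtasks onto the head...
        synchronized void push(T task) {
            deque.addFirst(task);
        }

        // ...and pops from the head too (LIFO), so it keeps working on the
        // most recently split, cache-hot piece of the problem.
        synchronized T pop() {
            return deque.pollFirst();
        }

        // An idle worker steals from the tail (FIFO), taking the oldest and
        // typically largest remaining chunk, so steals stay rare.
        synchronized T steal() {
            return deque.pollLast();
        }
    }

The owner keeps popping the freshly pushed small pieces, so its own work stays roughly sequential, while a thief walks off with the oldest entry, which is usually the biggest remaining chunk and therefore only needs to be stolen occasionally.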

Conclusion: FJ allows more efficient use of hardware threads in uneven situations, which is always the case once you have more than one thread.


I think the basic misunderstanding is that the Fork/Join examples do NOT show work stealing, only some kind of standard divide and conquer.

Work stealing would be like this: Worker B has finished his work. He is a kind one, so he looks around and sees Worker A still working very hard. He strolls over and asks: "Hey lad, I could give you a hand." A replies: "Cool, I have this task of 1000 units. So far I have finished 345, leaving 655. Could you please work on numbers 673 to 1000? I'll do 346 to 672." B says: "OK, let's start so we can go to the pub earlier."

You see - the workers must communicate with each other even after they have started the real work. This is the part missing from the examples.

The examples on the other hand show only something like "use subcontractors":

Worker A: "Dang, I have 1000 units of work. Too much for me. I'll do 500 myself and subcontract 500 to someone else." This goes on until the big task is broken down into small packets of 10 units each. These will be executed by the available workers. But if one packet is a kind of poison pill and takes considerably longer than other packets -- bad luck, the divide phase is over.
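That "subcontractor" pattern is essentially what the compute() method of a RecursiveAction looks like. A minimal sketch, assuming a made-up WorkPacket class and an empty doWork() placeholder standing in for the real per-packet work:

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveAction;

    // Minimal "subcontractor" sketch: split the range in half until a
    // packet is small enough, then do the work directly.
    class WorkPacket extends RecursiveAction {
        private static final int THRESHOLD = 10;   // packet size, as in the text
        private final int from, to;                // inclusive range, e.g. 1..1000

        WorkPacket(int from, int to) {
            this.from = from;
            this.to = to;
        }

        @Override
        protected void compute() {
            if (to - from + 1 <= THRESHOLD) {
                doWork();                          // small enough: just do it
            } else {
                int mid = (from + to) / 2;
                WorkPacket left  = new WorkPacket(from, mid);
                WorkPacket right = new WorkPacket(mid + 1, to);
                right.fork();                      // "subcontract" one half...
                left.compute();                    // ...and keep working on the other
                right.join();
            }
        }

        private void doWork() {
            // placeholder for one 10-unit packet of work
        }

        public static void main(String[] args) {
            new ForkJoinPool().invoke(new WorkPacket(1, 1000));
        }
    }

One half is forked (pushed onto the worker's own deque) and the other is computed directly, which is what produces the interleaved trace further down.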

The only remaining difference between Fork/Join and splitting the task upfront is this: when splitting upfront, the work queue is full right from the start. Example: 1000 units, the threshold is 10, so the queue has 100 entries. These packets are distributed to the thread pool members.
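For comparison, a sketch of the upfront split using a plain fixed thread pool; the doWork() placeholder and the chunking loop are illustrative, not taken from any particular example:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Upfront split: create all 100 packets first, then let the pool drain them.
    class UpfrontSplit {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(
                    Runtime.getRuntime().availableProcessors());

            List<Future<?>> results = new ArrayList<>();
            for (int from = 1; from <= 1000; from += 10) {
                final int start = from, end = from + 9;   // packets (1..10), (11..20), ...
                results.add(pool.submit(() -> doWork(start, end)));
            }
            for (Future<?> f : results) {
                f.get();                                  // wait for every packet
            }
            pool.shutdown();
        }

        static void doWork(int from, int to) {
            // placeholder for one 10-unit packet of work
        }
    }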

Fork/Join is more complex and tries to keep the number of packets in the queue smaller:

  • Step 1: Put one packet containing (1...1000) into the queue.
  • Step 2: One worker pops the packet (1...1000) and replaces it with two packets: (1...500) and (501...1000).
  • Step 3: One worker pops packet (501...1000) and pushes (501...750) and (751...1000).
  • Step n: The stack contains these packets: (1...500), (501...750), (751...875), ... (991...1000).
  • Step n+1: Packet (991...1000) is popped and executed.
  • Step n+2: Packet (981...990) is popped and executed.
  • Step n+3: Packet (961...980) is popped and split into (961...970) and (971...980), and so on.

You see: in Fork/Join the queue is smaller (6 in the example) and the "split" and "work" phases are interleaved.

Of course, when multiple workers are popping and pushing simultaneously, the interactions are not so clear-cut.
