How does Erlang schedule work for multicore CPU machines?

Erlang does not use threads in the traditional sense. By default, the Erlang VM creates one scheduler (a system thread) for each logical core of the CPU. When you spawn a process in Erlang, you are really creating a lightweight "task" managed inside the VM, which is different from a system thread or an OS process.

Depending on the VM and its configuration, these tasks may or may not be spread across individual CPU cores, which I believe is what you are seeing here.
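To make the difference between Erlang processes and system threads concrete, here is a minimal sketch. The module name proc_demo and the count of 10000 are my own choices for illustration and are not taken from the question. It spawns many short-lived processes and then reports how many scheduler threads the VM is running:

```erlang
-module(proc_demo).
-export([run/0]).

%% Spawn N lightweight Erlang processes and compare that number with
%% the number of scheduler threads the VM is running.
%% N = 10000 is an arbitrary illustrative value.
run() ->
    N = 10000,
    Parent = self(),
    %% Each process only sends one message back and then exits.
    Pids = [spawn(fun() -> Parent ! {done, self()} end)
            || _ <- lists:seq(1, N)],
    %% Wait until every spawned process has reported back.
    [receive {done, Pid} -> ok end || Pid <- Pids],
    Schedulers = erlang:system_info(schedulers),
    io:format("~p processes ran on ~p scheduler thread(s)~n",
              [N, Schedulers]),
    {N, Schedulers}.
```

Calling proc_demo:run() from the shell should show thousands of processes being served by only a handful of scheduler threads, which is the point: the processes are multiplexed onto the schedulers by the VM, not mapped one-to-one onto threads.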

There is an interesting blog article you might like here.


Erlang's default behavior has historically been to run a single scheduler, essentially a native OS thread that picks Erlang processes to run from a queue. With the advent of multi-core and multi-processor systems, the runtime was extended to take advantage of them. Starting the runtime with -smp enable causes it to create multiple schedulers, usually one per logical CPU. You can set the number of schedulers manually with the +S flag, e.g. +S 16.

This is documented in the Erlang Run-Time System Reference Manual.

A deeper discussion of SMP support can be found in this discussion thread.

EDIT

I should also point out that, as of R12B, SMP is enabled by default on platforms that support it (equivalent to the -smp auto flag). If you're curious about your own runtime, the following quote from the discussion thread will be of interest:

You can see what was chosen at the first line of printout from the "erl" command. E.g. Erlang (BEAM) emulator version 5.6.4 [source] [smp:4] [asynch-threads:0] .....

The "[smp:4]" above tells that the SMP VM is run and with 4 schedulers.
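The same information can also be read from inside a running node. The following is a small sketch; the module name sched_info is made up for this example, while the erlang:system_info/1 keys are standard:

```erlang
-module(sched_info).
-export([report/0]).

%% Print and return the scheduler configuration of the running VM.
report() ->
    %% Number of scheduler threads created at startup.
    Total = erlang:system_info(schedulers),
    %% Number of those schedulers currently allowed to run processes.
    Online = erlang:system_info(schedulers_online),
    %% Logical processors detected; may be the atom 'unknown'.
    Logical = erlang:system_info(logical_processors),
    io:format("schedulers: ~p, online: ~p, logical processors: ~p~n",
              [Total, Online, Logical]),
    {Total, Online, Logical}.
```

If the runtime was started with an explicit scheduler count, the first value should reflect it. Otherwise it normally matches the number of logical processors.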


The reason you see so little parallelism is that your program is basically sequential. All the work is being done in one process in the fib/3 function. The processes you spawn all just send a message and then die, and the spawning process synchronously waits for these messages, so there is no real concurrency. You could just as well call the loop/3 function directly with these values.

Otherwise, as others have mentioned, Erlang automatically uses all available cores and distributes processes across them where possible. In your case, however, there is little need to do this and no gain, so the system does not do it.

This is actually one of the more difficult things in writing concurrent applications. It is not enough to just spread things into many processes; you have to make sure that these processes really do run concurrently. That means rethinking your algorithms, which can be difficult.
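As a rough illustration of what doing the work inside the spawned processes can look like, here is a sketch. It is not a rewrite of your code: the module par_fib, its naive fib/1 and the pmap/2 helper are all hypothetical names I picked for the example. Each worker computes its own result before sending it back, so the schedulers have independent CPU-bound work to spread across cores:

```erlang
-module(par_fib).
-export([fib/1, pmap/2, run/0]).

%% Deliberately naive, CPU-bound Fibonacci (illustrative only).
fib(N) when N < 2 -> N;
fib(N) -> fib(N - 1) + fib(N - 2).

%% Apply F to every element of List, each in its own process,
%% and collect the results in the original order.
pmap(F, List) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                %% The computation F(X) happens inside the worker,
                %% not in the parent.
                spawn_link(fun() -> Parent ! {Ref, F(X)} end),
                Ref
            end || X <- List],
    %% Selective receive on each reference keeps the order stable.
    [receive {Ref, Result} -> Result end || Ref <- Refs].

%% Compute four Fibonacci numbers in four concurrent processes.
run() ->
    pmap(fun fib/1, [25, 26, 27, 28]).
```

The parent still waits for the results, but only after all workers have been started, so the four computations can overlap. Whether that pays off depends on how much work each process does compared with the cost of spawning it and passing messages.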