Why threads are showing better performance than coroutines?

The way you've set up your problem, you shouldn't expect any benefit from coroutines. In all cases you submit a non-divisible block of computation to an executor. You are not leveraging the idea of coroutine suspension, where you can write sequential code that actually gets chopped up and executed piecewise, possibly on different threads.

Most use cases of coroutines revolve around blocking code: avoiding the scenario where you hog a thread to do nothing but wait for a response. They may also be used to interleave CPU-intensive tasks, but this is a more special-cased scenario.

I would suggest benchmarking 1,000,000 tasks that involve several sequential blocking steps, like in Roman Elizarov's KotlinConf 2017 talk:

suspend fun postItem(item: Item) {
    val token = requestToken()
    val post = createPost(token, item)
    processPost(post)
}

where all of requestToken(), createPost() and processPost() involve network calls.

If you have two implementations of this, one with suspend funs and another with regular blocking functions, for example:

fun requestToken() {
   Thread.sleep(1000)
   return "token"
}

vs.

suspend fun requestToken() {
    delay(1000)
    return "token"
}

you'll find that you can't even set up to execute 1,000,000 concurrent invocations of the first version, and if you lower the number to what you can actually achieve without OutOfMemoryException: unable to create new native thread, the performance advantage of coroutines should be evident.

If you want to explore possible advantages of coroutines for CPU-bound tasks, you need a use case where it's not irrelevant whether you execute them sequentially or in parallel. In your examples above, this is treated as an irrelevant internal detail: in one version you run 1,000 concurrent tasks and in the other one you use just four, so it's almost sequential execution.

Hazelcast Jet is an example of such a use case because the computation tasks are co-dependent: one's output is another one's input. In this case you can't just run a few of them until completion, on a small thread pool, you actually have to interleave them so the buffered output doesn't explode. If you try to set up such a scenario with and without coroutines, you'll once again find that you're either allocating as many threads as there are tasks, or you are using suspendable coroutines, and the latter approach wins. Hazelcast Jet implements the spirit of coroutines in plain Java API. Its approach would hugely benefit from the coroutine programming model, but currently it's pure Java.

Disclosure: the author of this post belongs to the Jet engineering team.


Coroutines are not designed to be faster than threads, it is for lower RAM consumption and better syntax for async calls.