If lifecycleScope is a supervisor, why does its child coroutine's failure cause the app to crash?

A few things play an important role in your use case.

Here everything is fine, Coroutine #1 failure doesn't affect nor the parent, neither the Coroutine #2. That's the purpose of supervision

  1. CoroutineExceptionHandler vs Thread.uncaughtExceptionHandler

CoroutineExceptionHandler is the handler that receives an exception a coroutine leaves uncaught; when none is installed, the exception goes to the thread's uncaught exception handler, which on a plain JVM prints the exception details. Using launch with join makes the caller wait until all the jobs have finished, which is why you can see the output of both coroutines.

Now, if a coroutine crashes and another coroutine joins it, join can throw CancellationException. From the documentation:

In particular, it means that a parent coroutine invoking join on a child coroutine that was started using launch(coroutineContext) { ... } builder throws CancellationException if the child had crashed, unless a non-standard CoroutineExceptionHandler is installed in the context.

CoroutineExceptionHandler without join: by default, CoroutineExceptionHandler ignores CancellationException, so if you don't use join it won't print anything.

CoroutineExceptionHandler with join: if you use join on the coroutine, the builder throws CancellationException, and since the job is not complete yet (other coroutines are still in progress), it prints the error and continues with the other jobs.

supervisorScope.coroutineContext[Job]!!.children.forEach { it.join() }

This follows the same behaviour defined under exception propagation, where GlobalScope has no associated Job object.

On Android, Thread.uncaughtExceptionHandler is the default handler; on an uncaught exception it kills the app and shows the crash dialog.

That's the difference between handling exceptions with or without join in different ecosystems, and it is why you don't see termination in your Kotlin test with join (which does not run in an Android app).
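The routing described in point 1 can be reproduced on a plain JVM. The sketch below is only an illustration under stated assumptions: `CoroutineScope(SupervisorJob() + Dispatchers.Default)` is a stand-in for lifecycleScope, the replaced default thread handler is a stand-in for Android's crash handler, and `events` and `runDemo` are hypothetical names.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.CopyOnWriteArrayList

// Records what each party observed so the outcome can be inspected afterwards.
val events = CopyOnWriteArrayList<String>()

fun runDemo() {
    val previous = Thread.getDefaultUncaughtExceptionHandler()
    // Stand-in for the platform's last-resort handler (on Android, that one kills the app).
    Thread.setDefaultUncaughtExceptionHandler { _, e ->
        events.add("thread handler: ${e.message}")
    }
    try {
        runBlocking<Unit> {
            // Assumed stand-in for lifecycleScope: supervisor job, no CoroutineExceptionHandler.
            val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)
            val failing = scope.launch { throw RuntimeException("Coroutine #1 failure") }
            val sibling = scope.launch {
                delay(200)
                events.add("Coroutine #2 finished")
            }
            joinAll(failing, sibling)
            events.add("scope active: ${scope.isActive}")
        }
    } finally {
        // Restore whatever handler was installed before the demo.
        Thread.setDefaultUncaughtExceptionHandler(previous)
    }
}

fun main() {
    runDemo()
    events.forEach(::println)
}
```

The supervisor keeps the sibling and the scope alive, yet the exception still reaches the thread-level handler. What that handler does (print on the JVM, kill the process on Android) is outside the scope's control.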

Although lifecycleScope.coroutineContext is SupervisorJob() + Dispatchers.Main.immediate, here I see that failure of child coroutine affected the parent and other children.

  2. No, the child is not affecting the parent coroutine, because there is no child at all. Both of your coroutines are executed on the same thread as individual parent coroutines and there is no parent-child relation (use Thread.currentThread().name in your coroutines to view the thread name), so in case of an exception the parent delegates it to Android's uncaughtExceptionHandler, which kills the app (refer to point 1).

So, you can either use withContext:

lifecycleScope.launch(Dispatchers.Default) {
    for (i in 0 until 5) {
        Log.d(TAG, "testSupervisorScope: Coroutine #1: $i")
        delay(100)
    }

    try {
        // can use another context to change thread, e.g. Dispatchers.IO
        withContext(lifecycleScope.coroutineContext) {
            Log.d(TAG, "testSupervisorScope: Coroutine withContext start")
            delay(100)
            throw RuntimeException("Coroutine sub-task failure")
        }
    } catch (e: RuntimeException) {
        e.printStackTrace()
    }
}

or, to establish a parent-child relationship, use the same scope to start the child coroutines:

private fun testSupervisorScope() = runBlocking {
    // Coroutine #1
    lifecycleScope.launch(Dispatchers.Default) {
        for (i in 0 until 5) {
            Log.d(TAG, "testSupervisorScope: Coroutine #1: $i")
            delay(100)
        }

        // Coroutine child #1
        try {
            childCoroutineWithException().await()
        } catch (e: Exception) {
            Log.d(TAG, "caught exception")
            e.printStackTrace()
        }
    }
}

// Note: use the same scope `lifecycleScope` to create the child coroutine and establish the parent-child relation
fun childCoroutineWithException(): Deferred<String> = lifecycleScope.async {
    Log.d(TAG, "testSupervisorScope: Coroutine child #1 start")
    delay(100)
    throw RuntimeException("Coroutine child #1 failure")
}

Once the parent-child relation is established, the code above can handle the exception in the catch block, and the failure will not affect the execution of other child coroutines.

Result with child coroutines:

CoroutineJobActivity: testSupervisorScope: Coroutine #1: 1
CoroutineJobActivity: testSupervisorScope: Coroutine #1: 2
CoroutineJobActivity: testSupervisorScope: Coroutine #1: 3
CoroutineJobActivity: testSupervisorScope: Coroutine #1: 4
CoroutineJobActivity: testSupervisorScope: Coroutine #1: 5
CoroutineJobActivity: testSupervisorScope: Coroutine child #1 start
CoroutineJobActivity: Coroutine child #1 failure

You can further simplify your example by removing runBlocking:

private fun testSupervisorScope() {
    // Coroutine #1
    lifecycleScope.launch(Dispatchers.Default) {
        for (i in 0 until 5) {
            Log.d(TAG, "testSupervisorScope: Coroutine #1: $i")
            try {
                childCoroutineWithException().await()
            } catch (e: Exception) {
                Log.d(TAG, "caught exception")
                e.printStackTrace()
            }
            delay(100)
        }

    }
}

// Note: use the same scope `lifecycleScope` to create the child coroutine and establish the parent-child relation
fun childCoroutineWithException(): Deferred<String> = lifecycleScope.async {
    Log.d(TAG, "testSupervisorScope: Coroutine child #1 start")
    delay(100)
    throw RuntimeException("Coroutine child #1 failure")
}
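To see in isolation why the catch block around await works, here is a minimal sketch that runs off Android. It assumes `CoroutineScope(SupervisorJob() + Dispatchers.Default)` as a stand-in for lifecycleScope, and `awaitFailure` is a hypothetical name. A coroutine started with async directly in a supervisor scope keeps its exception inside the Deferred and rethrows it only at await, so nothing is reported as uncaught.

```kotlin
import kotlinx.coroutines.*

fun awaitFailure(): String = runBlocking {
    // Assumed stand-in for lifecycleScope: a supervisor job on a background dispatcher.
    val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

    // The failure is stored in the Deferred instead of going to an exception handler.
    val deferred: Deferred<String> = scope.async {
        delay(100)
        throw RuntimeException("Coroutine child #1 failure")
    }

    try {
        deferred.await() // rethrows the stored exception here
    } catch (e: RuntimeException) {
        // The supervisor scope itself is not cancelled by the failed child.
        "caught: ${e.message}, scope active: ${scope.isActive}"
    }
}

fun main() {
    println(awaitFailure())
}
```

If the async call were replaced by launch, there would be no await to catch at, and the exception would take the uncaught-exception route from point 1 instead.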

You can implement your own handler for uncaught exceptions to avoid the app crash (don't do it unless you really need to: it is bad practice and creates technical debt). See:

Need to handle uncaught exception and send log file


If you take a closer look at your output:

Exception in thread "DefaultDispatcher-worker-1" java.lang.RuntimeException: Coroutine #1 failure
    at supervisor.SupervisorJobUsage$main$1$1.invokeSuspend(SupervisorJobUsage.kt:16)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:561)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:727)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:667)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:655)

This is a report from the JVM-level uncaught exception handler. It means that, even though it didn't cancel your scope's job, the exception killed the Java thread. The executor can recover from such errors easily, but Android uses a different uncaught exception handler, one that immediately kills the whole app. Nothing about the coroutine scope changes that behavior.

Here's some code you can try out to see this mechanism in action:

GlobalScope.launch(Dispatchers.Default) {
    Thread.currentThread().setUncaughtExceptionHandler { thread, exception ->
        Log.e("MyTag", "We got an error on ${thread.name}: $exception")
    }
    throw RuntimeException("Dead")
}

If I comment out the setUncaughtExceptionHandler call, I get an app crash just like you. But with that in place, I just get a line in the log.

You wouldn't write that in production, of course, but if you add a coroutine exception handler to the scope, it will have the same effect.
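As a rough sketch of that last point, the snippet below installs a CoroutineExceptionHandler in the scope's context. It runs on a plain JVM and assumes `CoroutineScope(SupervisorJob() + Dispatchers.Default + handler)` as a stand-in for an Android scope; `handled`, `handler`, and `runWithHandler` are hypothetical names. On Android the rough equivalent would be passing the handler to the builder, as in `lifecycleScope.launch(handler) { ... }`.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.CopyOnWriteArrayList

// Collects what the handler saw, so the result can be inspected after the run.
val handled = CopyOnWriteArrayList<String>()

// Receives exceptions that a launch-ed coroutine leaves uncaught.
val handler = CoroutineExceptionHandler { _, e ->
    handled.add("coroutine handler: ${e.message}")
}

fun runWithHandler() = runBlocking<Unit> {
    // Assumed stand-in scope with the handler as part of its context.
    val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default + handler)
    // The exception goes to `handler` instead of the thread's uncaught exception handler.
    scope.launch { throw RuntimeException("Dead") }.join()
}

fun main() {
    runWithHandler()
    println(handled)
}
```

Because the handler consumes the exception, the thread-level handler is never consulted, which is what keeps the app from crashing.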

The whole story doesn't make a lot of sense to me, though, and I think exception handling in general is still an area that needs polishing in Kotlin coroutines.