Active tasks is a negative number in Spark UI

This is a known Spark issue. It occurs when executors restart after failures. A JIRA ticket already exists for it; see https://issues.apache.org/jira/browse/SPARK-10141 for more details.


As answered by S. Owen on the Spark-dev mailing list, there are several JIRA tickets relevant to this issue, such as:

  1. ResourceManager UI showing negative value
  2. NodeManager reports negative running containers

This behavior usually occurs when (many) executors restart after failure(s).


This behavior can also occur when the application uses too many executors. Using coalesce() to reduce the number of partitions can fix this case.

To be exact, in Prepare my bigdata with Spark via Python, I had >400k partitions. I used data.coalesce(1024), as described in Repartition an RDD, and was able to bypass that Spark UI bug. Partitioning is a very important concept when it comes to distributed computing and Spark.

In my question I also use 1-2k executors, so the large executor count is likely related.
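Here is a minimal PySpark sketch of that workaround; the app name, input path, and RDD variable are just placeholders, only the coalesce(1024) call mirrors what I actually did:

```python
from pyspark import SparkContext

sc = SparkContext(appName="coalesce-workaround")

# Hypothetical input that produces a huge number of partitions (>400k in my case).
data = sc.textFile("hdfs:///path/to/my/bigdata")

print(data.getNumPartitions())   # e.g. 400000+

# coalesce() merges partitions without a full shuffle, so the job schedules
# far fewer tasks and the UI task counters stay sane.
data = data.coalesce(1024)

print(data.getNumPartitions())   # 1024
```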

Note: Too few partitions and you might experience this Spark Java Error: Size exceeds Integer.MAX_VALUE.
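If you overshoot and end up with too few (and therefore too large) partitions, increasing the count again with repartition() is the usual remedy. This sketch continues from the one above; the target of 4096 is arbitrary:

```python
# repartition() does a full shuffle and spreads the data over more partitions,
# keeping each block below the 2 GB (Integer.MAX_VALUE) limit.
data = data.repartition(4096)

print(data.getNumPartitions())   # 4096
```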