Quartz Performance

In a previous project, I was confronted with the same problem. In our case, Quartz performed well up to a granularity of one second. Sub-second scheduling was a stretch and, as you are observing, misfires happened often and the system became unreliable.

We solved this issue by creating two levels of scheduling: Quartz would schedule a job 'set' of n consecutive jobs. With a clustered Quartz, this means that a given server in the system would get this job 'set' to execute. The n tasks in the set are then taken in by a "micro-scheduler": basically a timing facility that used the native JDK API to further time the jobs down to the 10ms granularity.

To handle the individual jobs, we used a master-worker design, where the master took care of the scheduled delivery (throttling) of the jobs to a multi-threaded pool of workers.
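A minimal sketch of that master-worker shape, assuming the jobs are plain Runnables; the queue type, pool size, and throttle interval here are illustrative, not the values we actually used:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class MasterWorker {
    private final BlockingQueue<Runnable> inbox = new LinkedBlockingQueue<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(8); // pool size: illustrative

    // Master loop: throttles delivery of queued jobs to the worker pool
    public void runMaster(long throttleMillis) throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            Runnable job = inbox.take();   // blocks until a job arrives
            workers.submit(job);           // hand off to a worker thread
            Thread.sleep(throttleMillis);  // crude throttle between deliveries
        }
    }

    public void submit(Runnable job) {
        inbox.add(job);
    }
}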

If I had to do this again today, I'd rely on a ScheduledThreadPoolExecutor to manage the 'micro-scheduling'. For your case, it would look something like this:

import java.util.Set;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

ScheduledThreadPoolExecutor scheduledExecutor;
...
    scheduledExecutor = new ScheduledThreadPoolExecutor(THREAD_POOL_SIZE);
...

// Evenly spread the execution of a set of tasks over a period of time
public void schedule(Set<Task> taskSet, long timePeriod, TimeUnit timeUnit) {
    if (taskSet.isEmpty()) return; // or indicate some failure ...
    long period = TimeUnit.MILLISECONDS.convert(timePeriod, timeUnit);
    long delay = period / taskSet.size();
    long accumulativeDelay = 0;
    for (Task task : taskSet) {
        // each task fires a bit later than the previous one,
        // spreading the set evenly over the whole period
        scheduledExecutor.schedule(task, accumulativeDelay, TimeUnit.MILLISECONDS);
        accumulativeDelay += delay;
    }
}

This gives you a general idea of how to use the JDK facility to micro-schedule tasks. (Disclaimer: you need to make this robust for a production environment, e.g. check for failing tasks, manage retries (if supported), etc.)
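For example, calling it like this spreads 20 tasks evenly over one second, i.e. one task every 50 ms (the task source here is hypothetical, and Task is assumed to implement Runnable):

Set<Task> tasks = loadTasksForThisTick(); // hypothetical: however you collect the next batch
schedule(tasks, 1, TimeUnit.SECONDS);     // 20 tasks over 1s => one every 50 ms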

With some testing and tuning, we found an optimal balance between the Quartz jobs and the number of jobs in one scheduled set.

We experienced a 100x throughput improvement this way. Network bandwidth was our actual limit.


First of all, check "How do I improve the performance of JDBC-JobStore?" in the Quartz documentation.

As you can probably guess, there is no absolute value or definite metric. It all depends on your setup. However, here are a few hints:

  • 20 jobs per second means around 100 database queries per second, since each fired trigger costs several selects and updates plus lock handling. That's quite a lot!

  • Consider distributing your Quartz setup into a cluster (a configuration sketch follows this list). However, if the database is the bottleneck, it won't help you. Maybe TerracottaJobStore will come to the rescue?

  • With K cores in the system, any thread count less than K will underutilize your machine. If your jobs are CPU-intensive, K is fine. If they are calling external web services, blocking, or sleeping, consider much bigger values. However, more than 100-200 threads will significantly slow down your system due to context switching.

  • Have you tried profiling? What is your machine doing most of the time? Can you post a thread dump? I suspect poor database performance rather than CPU, but it depends on your use case.
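If you do try clustering, here is a minimal sketch of how those knobs are set programmatically via StdSchedulerFactory. The thread count is illustrative, and the datasource and driver-delegate properties a JDBC-JobStore also needs are omitted:

import java.util.Properties;
import org.quartz.Scheduler;
import org.quartz.impl.StdSchedulerFactory;

Properties props = new Properties();
// Thread pool: start near your core count for CPU-bound jobs,
// larger for blocking/IO-bound jobs (25 is illustrative)
props.setProperty("org.quartz.threadPool.threadCount", "25");
// Clustered JDBC-JobStore (datasource properties omitted here)
props.setProperty("org.quartz.jobStore.class",
        "org.quartz.impl.jdbcjobstore.JobStoreTX");
props.setProperty("org.quartz.jobStore.isClustered", "true");
props.setProperty("org.quartz.scheduler.instanceId", "AUTO"); // required for clustering

Scheduler scheduler = new StdSchedulerFactory(props).getScheduler();
scheduler.start();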