Job queue with job affinity

Temporal can support your use case with minimal effort.

Here is a strawman design that satisfies your requirements:

  • Send a signalWithStart request to a per-user workflow, using the userID as the workflow ID. It either delivers the signal to the already running workflow, or first starts the workflow and then delivers the signal to it.
  • The workflow buffers all requests sent to it. Temporal provides a hard guarantee that at most one workflow with a given ID can be open at any time, so all signals (events) for a user are guaranteed to be buffered in that user's workflow. Temporal also preserves all workflow data (including stack traces and local variables) across any process or infrastructure failure, so there is no need to persist the taskQueue variable explicitly.
  • An internal workflow event loop dispatches these requests one by one.
  • When the buffer is empty, the workflow can complete.

Here is the workflow code that implements it in Java (the Go and PHP SDKs are also supported; the Node.js SDK is in alpha):

@WorkflowInterface
public interface SerializedExecutionWorkflow {

    @WorkflowMethod
    void execute();

    @SignalMethod
    void addTask(Task t);
}

@ActivityInterface
public interface TaskProcessorActivity {
    void process(Task task);
}

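public class SerializedExecutionWorkflowImpl implements SerializedExecutionWorkflow {

    // Signals are buffered here; Temporal persists workflow state, so this queue survives failures
    private final Queue<Task> taskQueue = new ArrayDeque<>();

    // Activity stubs need a timeout (start-to-close or schedule-to-close); 5 minutes is only an example
    private final TaskProcessorActivity processor = Workflow.newActivityStub(
            TaskProcessorActivity.class,
            ActivityOptions.newBuilder()
                    .setStartToCloseTimeout(Duration.ofMinutes(5))
                    .build());

    @Override
    public void execute() {
        // Process buffered tasks one at a time; the workflow completes once the buffer is empty
        while (!taskQueue.isEmpty()) {
            processor.process(taskQueue.poll());
        }
    }

    @Override
    public void addTask(Task t) {
        taskQueue.add(t);
    }
}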

And here is the code that enqueues a task to the workflow through the signal method:

private void addTask(WorkflowClient temporalClient, Task task) {
    // Set workflowId to userId
    WorkflowOptions options = WorkflowOptions.newBuilder()
       .setTaskQueue(TASK_QUEUE)
       .setWorkflowId(task.getUserId())
       .build();
    // Use workflow interface stub to start/signal workflow instance
    SerializedExecutionWorkflow workflow = temporalClient.newWorkflowStub(SerializedExecutionWorkflow.class, options);
    BatchRequest request = temporalClient.newSignalWithStartRequest();
    request.add(workflow::execute);
    request.add(workflow::addTask, task);
    temporalClient.signalWithStart(request);
}
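The following toy model illustrates the signalWithStart semantics described above without needing a Temporal cluster. It is single-process Python and is not Temporal's API: `InMemoryWorkflowService`, `signal_with_start` and `run_workflow` are hypothetical names. It captures only three properties: at most one open workflow per ID, signals buffered inside that workflow, and completion once the buffer drains. It does not model durability, concurrency, or history replay.

```python
from collections import deque


class InMemoryWorkflowService:
    """Toy model of signalWithStart: at most one open 'workflow' per ID.

    Hypothetical stand-in for illustration; not the Temporal client API.
    """

    def __init__(self, process):
        self.process = process  # stands in for the activity that does the work
        self.open = {}          # workflow_id -> deque of buffered tasks
        self.starts = 0         # number of workflows started so far

    def signal_with_start(self, workflow_id, task):
        # Start a workflow only if none is open under this ID,
        # then deliver the signal (buffer the task) in either case.
        if workflow_id not in self.open:
            self.open[workflow_id] = deque()
            self.starts += 1
        self.open[workflow_id].append(task)

    def run_workflow(self, workflow_id):
        # Event loop: process buffered tasks one by one, in arrival order,
        # then complete so the same ID can be started again later.
        buffer = self.open[workflow_id]
        while buffer:
            self.process(buffer.popleft())
        del self.open[workflow_id]


# Usage: two tasks for user-1 share one workflow; user-2 gets its own.
processed = []
svc = InMemoryWorkflowService(processed.append)
svc.signal_with_start("user-1", "a")
svc.signal_with_start("user-1", "b")
svc.signal_with_start("user-2", "c")
svc.run_workflow("user-1")
```

After `run_workflow("user-1")` completes, a later signal for `user-1` starts a fresh workflow under the same ID. This matches the "complete when the buffer is empty" step of the design.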

Temporal offers a lot of other advantages over using queues for task processing.

  • Built-in exponential retries with an unlimited expiration interval
  • Failure handling. For example, executing a task that notifies another service if both updates could not succeed within a configured interval.
  • Support for long-running heartbeating operations
  • Ability to implement complex task dependencies, for example chaining of calls, or compensation logic in case of unrecoverable failures (SAGA)
  • Complete visibility into the current state of the update. With queues, all you know is whether there are messages in a queue, and you need an additional DB to track the overall progress. With Temporal, every event is recorded.
  • Ability to cancel an update in flight
  • Distributed CRON support
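This short sketch illustrates the exponential retry schedule from the first bullet. The parameter names follow the knobs of Temporal's retry policy (initial interval, backoff coefficient, maximum interval). However, the `retry_intervals` function and its default values are illustrative assumptions, not the SDK's API. In Temporal, the service performs the retries itself; your code does not compute them.

```python
def retry_intervals(initial=1.0, coefficient=2.0, maximum=100.0, attempts=10):
    """Delay in seconds before each retry: initial * coefficient**n, capped at maximum."""
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(min(delay, maximum))  # never wait longer than the cap
        delay *= coefficient                # grow the delay exponentially
    return delays


print(retry_intervals(attempts=8))
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 100.0]
```

With no attempt limit, retries continue indefinitely at the capped interval. This is what lets a task survive a long outage of a downstream service.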

See the presentation that goes over the Temporal programming model. It mentions the Cadence project, Temporal's predecessor.


What I want is a job queue: I have multiple clients that create jobs (publishers), and a number of workers that process these jobs (consumers). I want to distribute the jobs created by the publishers across the consumers, which is doable with almost any message queue that load-balances across a queue, e.g. RabbitMQ or even MQTT 5.

However, now things get complicated... every job refers to an external entity, let's say a user. What I want is that the jobs for a single user get processed in order, but for multiple users in parallel. I do not have the requirement that the jobs for user X always go to worker Y, since they should be processed sequentially anyway.

Even setting this particular use case aside, I did a survey of (dynamic) task scheduling [0] [1] a couple of months ago, and nothing like it surfaced.

Every scheduling algorithm I read about relies on properties common to all tasks, such as priority, age, enqueue time, and task name (and, by extension, average processing time). If your tasks are all linked to a user, you could build a scheduler that takes user_id into account when picking a task from the queue.
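Here is a minimal sketch of such a user-aware scheduler. It assumes a single in-process scheduler, and the class and method names are hypothetical. It hands out the oldest pending task whose user has nothing running. As a result, tasks for one user start in order, while tasks for other users proceed in parallel.

```python
from collections import deque


class UserAwareScheduler:
    """Hand out at most one running task per user, oldest eligible task first."""

    def __init__(self):
        self.pending = deque()  # (user_id, task) in enqueue order
        self.busy = set()       # users that currently have a task running

    def enqueue(self, user_id, task):
        self.pending.append((user_id, task))

    def next_task(self):
        # Scan in FIFO order, skipping users that already have a running task.
        # The first match for a user is always that user's oldest task.
        for i, (user_id, task) in enumerate(self.pending):
            if user_id not in self.busy:
                del self.pending[i]  # safe: we return right after mutating
                self.busy.add(user_id)
                return user_id, task
        return None  # all pending tasks belong to busy users

    def done(self, user_id):
        # Called by a worker when it finishes the user's current task
        self.busy.discard(user_id)


s = UserAwareScheduler()
for user_id, task in [("x", "x1"), ("x", "x2"), ("y", "y1")]:
    s.enqueue(user_id, task)
first = s.next_task()   # x1 starts
second = s.next_task()  # y1 starts; x2 waits because x is busy
third = s.next_task()   # nothing eligible yet
s.done("x")
fourth = s.next_task()  # now x2 can start
```

The linear scan is O(n) in the number of pending tasks. A production version would more likely keep one FIFO per user plus a queue of users that are ready to run.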

But I guess you don't want to build your own scheduler. It would be a waste anyway since, from my experience with similar needs, existing message queues already let you implement your requirement.

To summarize your requirements, you need:

A scheduler that runs only one task per user at a time.

The solution is to use a distributed lock, something like Redis distlock: acquire the lock before the task starts and refresh it regularly while the task runs. If a new task for the same user tries to execute, it fails to acquire the lock and is re-enqueued.

Here is a pseudo-code:

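def my_task(user_id, *args, **kwargs):
    # Try to take this user's lock without waiting
    if app.distlock(user_id, blocking=False):
        exec_my_task(user_id, *args, **kwargs)
    else:
        # Another task for this user is running: put this one back in the queue
        raise RetryTask()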

Don't forget to refresh and release the lock.
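Here is a self-contained version of the pseudo-code. It rests on explicit assumptions: `InMemoryLock` is a hypothetical single-process stand-in for Redis, `RetryTask` is a hypothetical exception that your queue framework would turn into a re-enqueue, and `run_user_task` is an illustrative name. The sketch shows the three parts the answer calls for:

- a non-blocking acquire with a lease (TTL);
- a refresh while the task runs;
- a release in `finally`.

Each owner gets a random token. This prevents a worker whose lease has expired from releasing or extending a lock that another worker now holds. With real Redis, the documented pattern is `SET key token NX PX ttl` to acquire and a Lua compare-and-delete script to release.

```python
import time
import uuid


class InMemoryLock:
    """Hypothetical stand-in for a Redis lock: key -> (token, expiry). Not distributed."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.locks = {}

    def _live(self, key):
        # Return (token, expiry) if the key is held and its lease has not expired
        held = self.locks.get(key)
        if held is not None and held[1] > self.clock():
            return held
        return None

    def acquire(self, key, ttl):
        # Non-blocking: return a fresh token, or None if someone else holds the key
        if self._live(key) is not None:
            return None
        token = uuid.uuid4().hex
        self.locks[key] = (token, self.clock() + ttl)
        return token

    def refresh(self, key, token, ttl):
        # Extend the lease only if we still own an unexpired lock
        held = self._live(key)
        if held is None or held[0] != token:
            return False
        self.locks[key] = (token, self.clock() + ttl)
        return True

    def release(self, key, token):
        # Delete the key only if it still carries our token
        held = self.locks.get(key)
        if held is not None and held[0] == token:
            del self.locks[key]


class RetryTask(Exception):
    """Hypothetical signal to the task queue: re-enqueue this task for later."""


def run_user_task(lock, user_id, work, ttl=30.0):
    key = f"user:{user_id}"
    token = lock.acquire(key, ttl)
    if token is None:
        raise RetryTask()  # another task for this user is running
    try:
        # work() receives a heartbeat callable to refresh the lease periodically
        work(lambda: lock.refresh(key, token, ttl))
    finally:
        lock.release(key, token)
```

The TTL should comfortably exceed the heartbeat interval, so a live task never loses its lock. If a worker crashes, its lock still frees up after one TTL. Re-enqueuing with a short delay avoids hot-looping on a user whose long task still holds the lock.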

A similar approach is used in crawlers to enforce the robots.txt delay between requests.