Guarantee only a single asynchronous job runs at a time

Solution:

I've implemented the following in numerous orgs and it works pretty well. It's similar to Keith's suggestion, but a bit more detailed, and it usually gives near-real-time processing:

  1. Create a custom object to queue records that need to be sent via callout. Each time a callout is required, add a record; use an auto-number field to preserve ordering
  2. Do all of your callouts in a Queueable which processes one record at a time
  3. Create another custom object for mutual exclusion (let's call it Mutex__c). This has an external ID field on it, naming the process you want to run (I often use this to manage multiple integrations in one org), and a checkbox field called something like Run_Queueable__c
  4. Create a trigger on Mutex__c object which starts a Queueable when Run_Queueable__c turns from false to true
  5. Have your Queueable set Run_Queueable__c to false when it has nothing left to process, otherwise keep re-queueing itself until everything is done
  6. Have a trigger on the queue items which upserts the Mutex__c record with the Queueable name and Run_Queueable__c = true

This ensures that only one Queueable is running at once. Even if two transactions start at once, only one of them gets to set the mutual exclusion record from false to true. The other one just writes over the true value with true again, so it doesn't start another Queueable.
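To make step 4 concrete, here is a minimal sketch of what the trigger on Mutex__c might look like. It is not the original code: the trigger name and the external ID field name Queueable_Name__c are assumptions, and it assumes the named class is a top-level class with a no-argument constructor.

```apex
// Hypothetical sketch of step 4. Queueable_Name__c is an assumed name for the
// external ID field that holds the Queueable's class name.
trigger MutexTrigger on Mutex__c (after insert, after update) {
    for (Mutex__c mutex : Trigger.new) {
        // On insert there is no old value, so treat the flag as previously false.
        Boolean wasRunning = Trigger.isUpdate
            && Trigger.oldMap.get(mutex.Id).Run_Queueable__c;

        // Only a false -> true transition starts a job; writing true over true does nothing.
        if (mutex.Run_Queueable__c && !wasRunning) {
            Type queueableType = Type.forName(mutex.Queueable_Name__c);
            if (queueableType != null) {
                System.enqueueJob((Queueable) queueableType.newInstance());
            }
        }
    }
}
```

Because concurrent writers to the same Mutex__c row are serialized by the row lock, the second writer sees the flag already set and skips the enqueue.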

So, I'd have something like this as a trigger on the queue object:

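// Statuses that mean a queue item still needs a callout
Set<String> doCalloutStatuses = new Set<String> {
        'Pending',
        'Retry'
};

// newList and oldList stand for Trigger.new and Trigger.old (oldList is null on insert)
for(Integer i=0; i < newList.size(); i++) {
    My_Queue_Object__c newQ = newList[i];

    // Only react when a record has just entered one of the callout statuses
    if(doCalloutStatuses.contains(newQ.Callout_Status__c)
            && (oldList == null
            || !doCalloutStatuses.contains(oldList[i].Callout_Status__c))) {
        // Upsert on the external ID field so there is only ever one Mutex__c row per Queueable
        upsert new Mutex__c(Queueable_Name__c = MyQueueable.class.getName(), Run_Queueable__c = true) Queueable_Name__c;
        return;
    }
}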

I can't really post all the code as it's integrated into a load of internal libraries that we have. But, hopefully, you get the idea.
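For illustration only, the Queueable from steps 2 and 5 could be sketched roughly as below. This is not the internal code mentioned above. The named credential, the 'Complete' and 'Failed' statuses, the field name Queueable_Name__c and the use of the auto-number Name field for ordering are all assumptions.

```apex
// Hypothetical sketch of steps 2 and 5: process one queue record per execution.
public class MyQueueable implements Queueable, Database.AllowsCallouts {
    public void execute(QueueableContext context) {
        // Oldest pending item first; assumes Name is the zero-padded auto-number field.
        List<My_Queue_Object__c> pending = [
            SELECT Id, Callout_Status__c
            FROM My_Queue_Object__c
            WHERE Callout_Status__c IN ('Pending', 'Retry')
            ORDER BY Name
            LIMIT 1
        ];

        if (pending.isEmpty()) {
            // Nothing left to do: release the mutex and stop the chain.
            upsert new Mutex__c(
                Queueable_Name__c = MyQueueable.class.getName(),
                Run_Queueable__c = false
            ) Queueable_Name__c;
            return;
        }

        // The callout has to happen before any DML in this transaction.
        My_Queue_Object__c item = pending[0];
        item.Callout_Status__c = sendCallout(item) ? 'Complete' : 'Failed';
        update item;

        // More work may remain, so chain the next execution.
        System.enqueueJob(new MyQueueable());
    }

    private Boolean sendCallout(My_Queue_Object__c item) {
        HttpRequest request = new HttpRequest();
        // 'My_Named_Credential' is a placeholder, not a real endpoint.
        request.setEndpoint('callout:My_Named_Credential/items');
        request.setMethod('POST');
        request.setBody(JSON.serialize(new Map<String, Object>{ 'id' => item.Id }));
        HttpResponse response = new Http().send(request);
        return response.getStatusCode() == 200;
    }
}
```

One gap in this simple version: a queue item inserted between the empty query and the mutex release would wait until the next insert. A production version would probably re-check the queue after releasing.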

Generally, it works well for me. The one major complication has been that if the Mutex__c object gets out of sync with what's actually running, then you're in trouble. This can happen during an org split or Salesforce maintenance, where they kill your job before it has a chance to set Run_Queueable__c = false. Then you get what they call a zombie process in Unix, so you need a scheduled job to go reap them.
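A reaper along those lines might look like the sketch below. It is only one possible approach: it assumes each Mutex__c record stores the name of a top-level Queueable class in a field called Queueable_Name__c (an assumed name), and it treats a mutex as stale when no job for that class is waiting or running in AsyncApexJob.

```apex
// Hypothetical sketch: release mutexes whose Queueable is no longer alive.
public class MutexReaper implements Schedulable {
    public void execute(SchedulableContext context) {
        List<Mutex__c> held = [
            SELECT Id, Queueable_Name__c
            FROM Mutex__c
            WHERE Run_Queueable__c = true
        ];
        if (held.isEmpty()) {
            return;
        }

        // Collect the class names of Queueables that are still waiting or running.
        Set<String> liveClasses = new Set<String>();
        for (AsyncApexJob job : [
            SELECT ApexClass.Name
            FROM AsyncApexJob
            WHERE JobType = 'Queueable'
            AND Status IN ('Holding', 'Queued', 'Preparing', 'Processing')
        ]) {
            liveClasses.add(job.ApexClass.Name);
        }

        // Any held mutex without a live job is a zombie; release it.
        List<Mutex__c> stale = new List<Mutex__c>();
        for (Mutex__c mutex : held) {
            if (!liveClasses.contains(mutex.Queueable_Name__c)) {
                mutex.Run_Queueable__c = false;
                stale.add(mutex);
            }
        }
        update stale;
    }
}
```

It could be scheduled hourly with something like System.schedule('Mutex reaper', '0 0 * * * ?', new MutexReaper()). After a reset, the next queue item that is inserted starts the Queueable again.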


The only locking mechanism I know of is SOQL's FOR UPDATE. (But it appears to be broken at the moment for this scenario; see "Webservice Callouts within a Select For Update statement are not blocked", per Daniel's comment.)

So I suggest, say, a custom setting field that is queried, set, and unset by the Queueable. When a QueryException results, the Queueable re-enqueues itself, as this means a callout is already in progress.