How to implement critical sections on ARM Cortex A9

The most difficult part of handling a critical section without an OS is not actually creating the mutex, but rather figuring out what should happen if code wants to use a resource which is not presently available. The load-exclusive and store-exclusive instructions make it fairly easy to create a "swap" function which, given a pointer to an integer, will atomically store a new value and return what the pointed-to integer had contained:

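int32_t atomic_swap(int32_t *dest, int32_t new_value)
{
  int32_t old_value;
  do
  {
    /* Load the current value and mark the address for exclusive access. */
    old_value = (int32_t)__LDREXW((volatile uint32_t *)dest);
    /* __STREXW returns nonzero if exclusivity was lost; retry in that case. */
  } while (__STREXW((uint32_t)new_value, (volatile uint32_t *)dest));
  return old_value;
}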
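
Where the CMSIS intrinsics are not available, the same swap can be sketched with C11 <stdatomic.h>. On ARMv7-A, GCC and Clang typically lower atomic_exchange to an ldrex/strex loop, but that is a property of the toolchain, not something the code guarantees. The name atomic_swap_c11 is made up for this illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical portable counterpart of atomic_swap(): store new_value and
   return the value that was there before, as a single atomic step. */
static int32_t atomic_swap_c11(_Atomic int32_t *dest, int32_t new_value)
{
    /* The default (sequentially consistent) ordering also keeps later
       accesses from being moved above the swap. */
    return atomic_exchange(dest, new_value);
}
```

The first caller to swap in 1 sees 0 and owns the mutex; every later caller sees 1 until the owner stores 0 again.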

Given a function like the above, one can easily enter a mutex via something like

if (atomic_swap(&mutex, 1)==0)
{
   ... do stuff in mutex ... ;
   mutex = 0; // Leave mutex
}
else
{ 
  ... couldn't get mutex...
}

In the absence of an OS, the main difficulty often lies with the "couldn't get mutex" code. If an interrupt occurs while a mutex-guarded resource is busy, the interrupt handler may need to set a flag and save enough information to describe what it wanted to do. Any main-line code that acquires the mutex must then check that flag whenever it is about to release the mutex and, if an interrupt wanted to do something while the mutex was held, perform the action on the interrupt's behalf.

Although it's possible to avoid problems with interrupts wanting to use mutex-guarded resources by simply disabling interrupts (and indeed, disabling interrupts can eliminate the need for any other kind of mutex), in general it's desirable to avoid disabling interrupts any longer than necessary.

A useful compromise is to use a flag as described above, but have the main-line code that is about to release the mutex disable interrupts, check the flag, release the mutex, and only then re-enable interrupts. This doesn't require leaving interrupts disabled for very long, and it guards against two races. If the main-line code tested the interrupt's flag after releasing the mutex, then between the time it sees the flag and the time it acts upon it, it might get preempted by other code which acquires and releases the mutex and also acts upon the interrupt flag. If the main-line code didn't test the flag after releasing the mutex, an interrupt which occurs just before the release might get blocked by the mutex but never be noticed by the main-line code.
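
The flag-plus-brief-masking scheme could look roughly like the sketch below. It uses C11 atomics in place of the swap function, and every name in it (isr, try_acquire, release_mutex, irq_disable, irq_enable, use_resource) is hypothetical. The two interrupt-masking helpers are empty placeholders; on real hardware they would mask and unmask interrupts.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int  mutex;            /* 0 = free, 1 = held */
static atomic_bool irq_work_pending; /* set by the ISR when it found the mutex busy */
static int         shared_counter;   /* stands in for the guarded resource */

/* Placeholders (assumption): replace with real interrupt masking. */
static void irq_disable(void) { }
static void irq_enable(void)  { }

static void use_resource(void) { shared_counter++; }

/* Main-line code: returns true if the mutex was obtained. */
bool try_acquire(void)
{
    return atomic_exchange(&mutex, 1) == 0;
}

/* Interrupt context: use the resource if free, otherwise leave a request. */
void isr(void)
{
    if (atomic_exchange(&mutex, 1) == 0) {
        use_resource();
        atomic_store(&mutex, 0);
    } else {
        atomic_store(&irq_work_pending, true);
    }
}

/* Main-line code that holds the mutex calls this to let go of it. */
void release_mutex(void)
{
    irq_disable();                        /* no new request can arrive now */
    if (atomic_exchange(&irq_work_pending, false))
        use_resource();                   /* act on behalf of the blocked ISR */
    atomic_store(&mutex, 0);
    irq_enable();
}
```

Because the flag check and the release happen with interrupts masked, a request can neither be missed nor handled by two different pieces of code.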

In any case, what matters most is that code which tries to use a mutex-guarded resource while it is unavailable has a means of repeating its attempt once the resource is released.


This is a heavy-handed way to do critical sections: disable interrupts. It may not work if your system has/handles data faults. It will also increase interrupt latency. The Linux irqflags.h has some macros that handle this. The cpsie and cpsid instructions may be useful; however, they do not save state and will not allow for nesting. cps does not use a register.

For the Cortex-A series, the ldrex/strex instructions are more efficient. They can be used to form a mutex for a critical section, or they can be used with lock-free algorithms to get rid of the critical section altogether.
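
As a small illustration of the lock-free route, here is a sketch of a bounded counter updated with a compare-and-swap retry loop from C11 <stdatomic.h>. On ARMv7-A, compilers typically implement atomic_compare_exchange_weak with ldrex/strex. The helper name add_saturating is hypothetical, and the code assumes the counter never starts above the limit.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: add delta to *counter without exceeding limit.
   Returns the value that was stored. No mutex is held at any point.
   Assumes *counter <= limit on entry. */
static uint32_t add_saturating(_Atomic uint32_t *counter,
                               uint32_t delta, uint32_t limit)
{
    uint32_t old_val = atomic_load(counter);
    uint32_t new_val;
    do {
        new_val = (limit - old_val < delta) ? limit : old_val + delta;
        /* If another context changed the counter in the meantime, the
           exchange fails, old_val is refreshed, and the loop retries. */
    } while (!atomic_compare_exchange_weak(counter, &old_val, new_val));
    return new_val;
}
```

An interrupt that updates the same counter in the middle of the loop simply forces one more iteration instead of being locked out.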

In some sense, ldrex/strex look like the ARMv5 swp. However, they are much more complex to implement in practice. You need a working cache, and the target memory of the ldrex/strex needs to be cacheable. The ARM documentation on ldrex/strex is rather nebulous because the mechanism has to work on non-Cortex-A CPUs as well. However, on the Cortex-A the mechanism that keeps a local CPU cache in sync with other CPUs is the same one used to implement the ldrex/strex instructions. For the Cortex-A series the reservation granule (the size of the memory reserved by an ldrex/strex) is the same as a cache line; you also need to align memory to the cache line if you intend to modify multiple values, as with a doubly linked list.
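
A minimal sketch of the alignment point, assuming a 32-byte line (the Cortex-A9 L1 line size; check the reference manual for your core). The struct and the CACHE_LINE_BYTES macro are made up for the example.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32  /* assumption: Cortex-A9 L1 cache line size */

/* Hypothetical doubly linked list node that starts on a cache-line
   boundary and fits inside a single line. */
struct node {
    alignas(CACHE_LINE_BYTES) struct node *next;
    struct node *prev;
    uint32_t payload;
};

/* Fail the build if a later change breaks either property. */
_Static_assert(alignof(struct node) == CACHE_LINE_BYTES,
               "node must start on a cache-line boundary");
_Static_assert(sizeof(struct node) <= CACHE_LINE_BYTES,
               "node must fit in one cache line");
```

With this layout, two different nodes never share a line, so a reservation taken on one node is not disturbed by writes to its neighbours.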

I suspect there is some subtle error.

mrs %[key], cpsr
orr r1, %[key], #0xC0  ; context switch here?
msr cpsr_c, r1

You need to ensure that the sequence can never be pre-empted. Otherwise, you may get two key variables with interrupts enabled and the lock release will be incorrect. You can use the swp instruction with the key memory to ensure consistency on the ARMv5, but this instruction is deprecated on the Cortex-A in favour of ldrex/strex as it works better for multi-CPU systems.

All of this depends on what kind of scheduling your system has. It sounds like you only have mainlines and interrupts. You often need the critical section primitives to have some hooks to the scheduler depending on what levels (system/user space/etc) you want the critical section to work with.

Also, is there an opensource library that has these types of primitives for ARM (or even a good lightweight spinlock/semaphore library)?

This is difficult to write in a portable way. I.e., such libraries may exist for certain versions of ARM CPUs and for specific OSes.


I see several potential problems with those critical sections. There are caveats and solutions to all of these, but as a summary:

  • There's nothing preventing the compiler from moving code across these macros, for optimization or random other reasons.
  • They save and restore some parts of the processor state the compiler expects inline assembly to leave alone (unless it's told otherwise).
  • There's nothing preventing an interrupt from occurring in the middle of the sequence and changing the state between when it's read and when it's written.

First off, you definitely need some compiler memory barriers. GCC implements these as clobbers. Basically, this is a way to tell the compiler "No, you can't move memory accesses across this piece of inline assembly because it might affect the result of the memory accesses." Specifically, you need both "memory" and "cc" clobbers, on both the begin and end macros. These will prevent other things (like function calls) from being reordered relative to the inline assembly too, because the compiler knows they might have memory accesses. I have seen GCC for ARM hold state in condition code registers across inline assembly with "memory" clobbers, so you definitely do need the "cc" clobber.
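
To show what the clobbers do in isolation, here is a sketch of a pure compiler barrier for GCC/Clang: an empty asm statement that carries only the "memory" and "cc" clobbers. The macro and variable names are hypothetical. Note that this constrains the compiler only; it emits no instruction and is not a hardware memory barrier.

```c
/* GCC/Clang extension: emits no code, but the compiler must assume that
   memory and the condition flags may have changed at this point. */
#define COMPILER_BARRIER() __asm__ volatile("" ::: "memory", "cc")

static int shared_data;
static int shared_flag;

void publish(int value)
{
    shared_data = value;
    COMPILER_BARRIER();   /* the store above cannot be moved below this line */
    shared_flag = 1;
}
```

The critical-section macros get the same effect by listing these clobbers on the asm statements that actually change the interrupt state.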

Secondly, these critical sections are saving and restoring a lot more than just whether interrupts are enabled. Specifically, they're saving and restoring most of the CPSR (Current Program Status Register) (the link is for Cortex-R4 because I couldn't find a nice diagram for an A9, but it should be identical). There are subtle restrictions around which pieces of state can actually be modified, but it's more than necessary here.

Among other things, this includes the condition codes (where the results of instructions like cmp are stored so subsequent conditional instructions can act on the result). The compiler will definitely be confused by this. This is easily solvable using the "cc" clobber as mentioned above. However, code miscompiled this way fails every time it runs, so it doesn't sound like the problem you're seeing. It is still something of a ticking time bomb, though, in that modifying unrelated code might cause the compiler to do something a little different which will be broken by this.

This will also attempt to save/restore the IT bits, which are used to implement Thumb conditional execution. Note that if you never execute Thumb code, this doesn't matter. I've never figured out how GCC's inline assembly deals with the IT bits, other than concluding it doesn't, meaning the compiler must never put inline assembly in an IT block and always expects the assembly to end outside of an IT block. I've never seen GCC generate code violating these assumptions, and I've done some fairly intricate inline assembly with heavy optimization, so I'm reasonably sure they hold. This means it probably won't actually attempt to change the IT bits, in which case everything is fine. Attempting to modify these bits is classified as "architecturally unpredictable", so it could do all kinds of bad things, but probably won't do anything at all.

The last category of bits which will be saved/restored (besides the ones to actually disable interrupts) are the mode bits. These probably won't change, so it probably won't matter, but if you have any code that deliberately changes modes these interrupt sections could cause problems. Changing between privileged and user mode is the only case of doing this I would expect.
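
For reference, the CPSR fields discussed above can be written down as masks. This is a sketch with made-up macro and function names; the bit positions are the architectural ones for ARMv7-A (I is bit 7, F is bit 6, T is bit 5, the mode field is bits 4:0, and N, Z, C, V are bits 31:28).

```c
#include <stdbool.h>
#include <stdint.h>

#define CPSR_MODE_MASK  0x0000001Fu  /* processor mode, bits 4:0 */
#define CPSR_T_BIT      (1u << 5)    /* Thumb state */
#define CPSR_F_BIT      (1u << 6)    /* 1 = FIQs masked */
#define CPSR_I_BIT      (1u << 7)    /* 1 = IRQs masked */
#define CPSR_NZCV_MASK  0xF0000000u  /* condition flags */

/* Keep only the two bits a critical section actually needs to remember. */
static uint32_t interrupt_key(uint32_t cpsr)
{
    return cpsr & (CPSR_I_BIT | CPSR_F_BIT);
}

static bool irq_was_enabled(uint32_t cpsr) { return (cpsr & CPSR_I_BIT) == 0; }
static bool fiq_was_enabled(uint32_t cpsr) { return (cpsr & CPSR_F_BIT) == 0; }
```

Masking the saved value down to 0xC0 is what keeps the condition codes, IT bits, and mode bits out of the save/restore path.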

Third, there's nothing preventing an interrupt from changing other parts of CPSR between the MRS and MSR in ARM_INT_LOCK. Any such changes could be overwritten. In most reasonable systems, asynchronous interrupts don't change the state of the code they're interrupting (including CPSR). If they do, it becomes very hard to reason about what code will do. However, it is possible (changing the FIQ disable bit seems most likely to me), so you should consider whether your system does this.

Here's how I would implement these in a way which addresses all the potential issues I pointed out:

#define ARM_INT_KEY_TYPE            unsigned int
/* Save the I and F mask bits (1 = disabled) in key_, then disable both. */
#define ARM_INT_LOCK(key_)   \
asm volatile(\
    "mrs %[key], cpsr\n\t"\
    "ands %[key], %[key], #0xC0\n\t"\
    "cpsid if\n\t" : [key]"=r"(key_) :: "memory", "cc" );
/* Re-enable only what was enabled before the lock (mask bit clear in key_). */
#define ARM_INT_UNLOCK(key_) asm volatile (\
    "tst %[key], #0x40\n\t"\
    "bne 0f\n\t"\
    "cpsie f\n\t"\
    "0: tst %[key], #0x80\n\t"\
    "bne 1f\n\t"\
    "cpsie i\n\t"\
    "1:\n\t" :: [key]"r" (key_) : "memory", "cc")

Make sure to compile with -mcpu=cortex-a9 because at least some GCC versions (like mine) default to an older ARM CPU which doesn't support cpsie and cpsid.

I used ands instead of just and in ARM_INT_LOCK so it's a 16-bit instruction if this is used in Thumb code. The "cc" clobber is necessary anyways, so it's strictly a performance/code size benefit.

0 and 1 are local labels, for reference.

These should be usable in all the same ways as your versions. The ARM_INT_LOCK is just as fast/small as your original one. Unfortunately, I couldn't come up with a way to do ARM_INT_UNLOCK safely in anywhere near as few instructions.
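
Since the macros only run on ARM, the intended save/restore behaviour can be checked on any host with a small model. This is a sketch: fake_cpsr, model_lock, and model_unlock are hypothetical stand-ins, not part of the macros, and the model describes what the lock/unlock pair is meant to do, namely re-enable only what was enabled when the matching lock ran.

```c
#include <stdint.h>

/* Stand-in for the real CPSR: SVC mode, IRQs and FIQs enabled. */
static uint32_t fake_cpsr = 0x13u;

/* Models ARM_INT_LOCK: remember which mask bits were set, then set both. */
static uint32_t model_lock(void)
{
    uint32_t key = fake_cpsr & 0xC0u;  /* mrs + ands */
    fake_cpsr |= 0xC0u;                /* cpsid if */
    return key;
}

/* Models ARM_INT_UNLOCK: clear a mask bit only if it was clear at lock time. */
static void model_unlock(uint32_t key)
{
    if ((key & 0x40u) == 0)
        fake_cpsr &= ~0x40u;           /* cpsie f */
    if ((key & 0x80u) == 0)
        fake_cpsr &= ~0x80u;           /* cpsie i */
}
```

Nesting works because the inner key records that interrupts were already masked, so the inner unlock leaves them masked and only the outermost unlock re-enables them.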

If your system has constraints on when IRQs and FIQs are disabled, this could be simplified. For example, if they're always disabled together, you could combine into one cbnz + cpsie if like this (cbnz exists only in Thumb-2, so this variant must be compiled as Thumb code):

/* A nonzero key means interrupts were already disabled: leave them that way. */
#define ARM_INT_UNLOCK(key_) asm volatile (\
    "cbnz %[key], 0f\n\t"\
    "cpsie if\n\t"\
    "0:\n\t" :: [key]"r" (key_) : "memory", "cc")

Alternatively, if you don't care about FIQs at all, then it's similarly simple to drop enabling/disabling them completely.

If you know that nothing else ever changes any of the other state bits in CPSR between the lock and unlock, then you could also continue with something very similar to your original code, except with both "memory" and "cc" clobbers in both ARM_INT_LOCK and ARM_INT_UNLOCK.