Should volatile still be used for sharing data with ISRs in modern C++?

Short answer: always use std::atomic<T> whose is_lock_free() returns true.

Reasoning:

  1. volatile can work reliably on simple architectures (single-core, no cache, ARM/Cortex-M) like STM32F2 or ATSAMG55 with e.g. IAR compiler. But...
  2. It may fail to work as expected on more complex architectures (multi-core with cache) and when compiler tries to do certain optimisations (many examples in other answers, won't repeat that).
  3. atomic_flag and atomic_int (if is_lock_free() which they should) are safe to use anywhere, because they work like volatile with added memory bariers / synchronization when needed (avoiding the problems in previous point).
  4. The reason I specifically said you have to only use those with is_lock_free() being true is because you cannot stop IRQ as you could stop a thread. No, IRQ interrupts main loop and does its job, it cannot wait-lock on a mutex because it is blocking the main loop it would be waiting for.

Practical note: I personally either use atomic_flag (the one and only guaranteed to work) to implement sort of spin-lock, where ISR will disable itself when finding the lock locked, while main loop will always re-enable the ISR after unlocking. Or I use double-counter lock-free queue (SPSC - single producer, single consumer) using that atomit_int. (Have one reader-counter and one writer-counter, subtract to find the real count. Good for UART etc.)


Of the commercial compilers I've tested that weren't based on gcc or clang, all of them would treat a read or write via volatile pointer or lvalue as being capable of accessing any other object, without regard for whether it would seem possible for the pointer or lvalue to hit the object in question. Some, such as MSVC, formally documented the fact that volatile writes have release semantics and volatile reads have acquire semantics, while others would require a read/write pair to achieve acquire semantics.

Such semantics make it possible to use volatile objects to build a mutex that can guard "ordinary" objects on systems with a strong memory model (including single-core systems with interrupts), or on compilers that apply acquire/release barriers at the hardware memory ordering level rather than merely the compiler ordering level.

Neither clang or gcc, however, offers any option other than -O0 which would offer such semantics, since they would impede "optimizations" that would otherwise be able to convert code that performs seemingly-redundant loads and stores [that are actually needed for correct operation] into "more efficient" code [that doesn't work]. To make one's code usable with those, I would recommend defining a 'memory clobber' macro (which for clang or gcc would be asm volatile ("" ::: "memory");) and invoking it between the action which needs to precede a volatile write and the write itself, or between a volatile read and the first action which would need to follow it. If one does that, that would allow one's code to be readily adapted to implementations that would neither support nor require such barriers, simply by defining the macro as an empty expansion.

Note that while some compilers interpret all asm directives as a memory clobber, and there wouldn't be any other purpose for an empty asm directive, gcc simply ignores empty asm directives rather than interpreting them in such fashion.

An example of a situation where gcc's optimizations would prove problematic (clang seems to handle this particular case correctly, but some others still pose problems):

short buffer[10];
volatile short volatile *tx_ptr;
volatile int tx_count;
void test(void)
{
    buffer[0] = 1;
    tx_ptr = buffer;
    tx_count = 1;
    while(tx_count)
        ;
    buffer[0] = 2;
    tx_ptr = buffer;
    tx_count = 1;
    while(tx_count)
        ;
}

GCC will decide to optimize out the assignment buffer[0]=1; because the Standard doesn't require it to recognize that storing the buffer's address into a volatile might have side effects that would interact with the value stored there.

[edit: further experimentation shows that icc will reorder accesses to volatile objects, but since it reorders them even with respect to each other, I'm not sure what to make of that, since that would seem broken by any imaginable interpretation of the Standard].


I think that in this case both volatile and atomic will most likely work in practice on the 32 bit ARM. At least in an older version of STM32 tools I saw that in fact the C atomics were implemented using volatile for small types.

Volatile will work because the compiler may not optimize away any access to the variable that appears in the code.

However, the generated code must differ for types that cannot be loaded in a single instruction. If you use a volatile int64_t, the compiler will happily load it in two separate instructions. If the ISR runs between loading the two halves of the variable, you will load half the old value and half the new value.

Unfortunately using atomic<int64_t> may also fail with interrupt service routines if the implementation is not lock free. For Cortex-M, 64-bit accesses are not necessarily lockfree, so atomic should not be relied on without checking the implementation. Depending on the implementation, the system might deadlock if the locking mechanism is not reentrant and the interrupt happens while the lock is held. Since C++17, this can be queried by checking atomic<T>::is_always_lock_free. A specific answer for a specific atomic variable (this may depend on alignment) may be obtained by checking flagA.is_lock_free() since C++11.

So longer data must be protected by a separate mechanism (for example by turning off interrupts around the access and making the variable atomic or volatile.

So the correct way is to use std::atomic, as long as the access is lock free. If you are concerned about performance, it may pay off to select the appropriate memory order and stick to values that can be loaded in a single instruction.

Not using either would be wrong, the compiler will check the flag only once.

These functions all wait for a flag, but they get translated differently:

#include <atomic>
#include <cstdint>

using FlagT = std::int32_t;

volatile FlagT flag = 0;
void waitV()
{
    while (!flag) {}
}

std::atomic<FlagT> flagA;
void waitA()
{
    while(!flagA) {}    
}

void waitRelaxed()
{
    while(!flagA.load(std::memory_order_relaxed)) {}    
}

FlagT wrongFlag;
void waitWrong()
{
    while(!wrongFlag) {}
}

Using volatile you get a loop that reexamines the flag as you wanted:

waitV():
        ldr     r2, .L5
.L2:
        ldr     r3, [r2]
        cmp     r3, #0
        beq     .L2
        bx      lr
.L5:
        .word   .LANCHOR0

Atomic with the default sequentially consistent access produces synchronized access:

waitA():
        push    {r4, lr}
.L8:
        bl      __sync_synchronize
        ldr     r3, .L11
        ldr     r4, [r3, #4]
        bl      __sync_synchronize
        cmp     r4, #0
        beq     .L8
        pop     {r4}
        pop     {r0}
        bx      r0
.L11:
        .word   .LANCHOR0

If you do not care about the memory order you get a working loop just as with volatile:

waitRelaxed():
        ldr     r2, .L17
.L14:
        ldr     r3, [r2, #4]
        cmp     r3, #0
        beq     .L14
        bx      lr
.L17:
        .word   .LANCHOR0

Using neither volatile nor atomic will bite you with optimization enabled, as the flag is only checked once:

waitWrong():
        ldr     r3, .L24
        ldr     r3, [r3, #8]
        cmp     r3, #0
        bne     .L23
.L22:                        // infinite loop!
        b       .L22
.L23:
        bx      lr
.L24:
        .word   .LANCHOR0
flag:
flagA:
wrongFlag: