Fences in C++0x, guarantees just on atomics or memory in general

Fences provide ordering on all data. However, in order to guarantee that the fence operation from one thread is visible to a second, you need to use atomic operations for the flag, otherwise you have a data race.

std::atomic<bool> ready(false);
int data=0;

void thread_1()
{
    data=42;
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true,std::memory_order_relaxed);
}

void thread_2()
{
    if(ready.load(std::memory_order_relaxed))
    {
        std::atomic_thread_fence(std::memory_order_acquire);
        std::cout<<"data="<<data<<std::endl;
    }
}

If thread_2 reads ready to be true, then the fences ensure that data can safely be read, and the output will be data=42. If ready is read to be false, then you cannot guarantee that thread_1 has issued the appropriate fence, so a fence in thread 2 would still not provide the necessary ordering guarantees --- if the if in thread_2 was omitted, the access to data would be a data race and undefined behaviour, even with the fence.

Clarification: A std::atomic_thread_fence(std::memory_order_release) is generally equivalent to a store fence, and will likely be implemented as such. However, a single fence on one processor does not guarantee any memory ordering: you need a corresponding fence on a second processor, AND you need to know that when the acquire fence was executed the effects of the release fence were visible to that second processor. It is obvious that if CPU A issues an acquire fence, and then 5 seconds later CPU B issues a release fence, then that release fence cannot synchronize with the acquire fence. Unless you have some means of checking whether or not the fence has been issued on the other CPU, the code on CPU A cannot tell whether it issued its fence before or after the fence on CPU B.

The requirement that you use an atomic operation to check whether or not the fence has been seen is a consequence of the data race rules: you cannot access a non-atomic variable from multiple threads without an ordering relationship, so you cannot use a non-atomic variable to check for an ordering relationship.

A stronger mechanism such as a mutex can of course be used, but that would render the separate fence pointless, as the mutex would provide the fence.

Relaxed atomic operations are likely just plain loads and stores on modern CPUs, though possibly with additional alignment requirements to ensure atomicity.

Code written to use processor-specific fences can readily be changed to use C++0x fences, provided the operations used to check synchronization (rather than those used to access the synchronized data) are atomic. Existing code may well rely on the atomicity of plain loads and stores on a given CPU, but conversion to C++0x will require using atomic operations for those checks in order to provide the ordering guarantees.


My understanding is that they are proper fences. The circumstantial evidence being that, after all, they are meant to map to features found in actual hardware and which allows efficient implementation of synchronization algorithms. As you say, fences that apply only to some specific values are 1. useless and 2. not found on current hardware.

That being said, AFAICS the section you quote describes the "synchronizes-with" relationship between fences and atomic operations. For a definition of what this means, see section 1.10 Multi-threaded executions and data races. Again, AFAICS, this does not imply that the fences apply only to the atomic objects, but rather I suspect the meaning is that while ordinary loads and stores may pass acquire and release fences in the usual way (one direction only), atomic loads/stores may not.

Wrt. atomic objects, my understanding is that on all targets Linux supports, properly aligned plain integer variables whose sizeof() <= sizeof(*void) are atomic, hence Linux uses normal integers as synchronization variables (that is, the Linux kernel atomic operations operate on normal integer variables). C++ does not want to impose such a limitation, hence the separate atomic integer types. Also, in C++ operations on atomic integer types imply barriers, whereas in the Linux kernel all barriers are explicit (which is sort of obvious since without compiler support for atomic types that is what one must do).