Is atomic_thread_fence(memory_order_release) different from using memory_order_acq_rel?

A standalone fence imposes stronger ordering than an atomic operation with the same ordering constraint, but this does not change the direction in which ordering is enforced.

Both an atomic release operation and a standalone release fence are uni-directional, but the atomic operation orders only with respect to itself, whereas the fence imposes ordering with respect to all stores that are sequenced after it.

For example, an atomic operation with release semantics:

std::atomic<int> sync{0};

// memory operations A

sync.store(1, std::memory_order_release);

// store B

This guarantees that no memory operation in A (load or store) can be (visibly) reordered with the atomic store itself. But it is uni-directional: no ordering rules apply to memory operations that are sequenced after the atomic operation. Therefore, store B can still be reordered with any of the memory operations in A.
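To make that concrete, here is a minimal sketch of the usual message-passing pattern built on a release store. The names `payload`, `ready` and `release_store_demo` are made up for illustration, and the acquire load on the reader side is my assumption about how the flag would be consumed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: `payload` stands in for "memory operations A",
// `ready` plays the role of `sync`.
int release_store_demo() {
    int payload = 0;               // plain, non-atomic data
    std::atomic<int> ready{0};
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                // part of A
        ready.store(1, std::memory_order_release);   // A cannot move below this store
    });
    std::thread consumer([&] {
        // Spin until the release store becomes visible.
        while (ready.load(std::memory_order_acquire) != 1) { }
        seen = payload;            // happens-after the write of 42
    });

    producer.join();
    consumer.join();
    return seen;
}
```

The guarantee is tied to `ready` only: a reader that sees `ready == 1` through an acquire load also sees `payload == 42`. Nothing is promised about stores the producer makes after the release store.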

A standalone release fence changes this behavior:

// memory operations A

std::atomic_thread_fence(std::memory_order_release);

// load X

sync.store(1, std::memory_order_relaxed);

// stores B

This guarantees that no memory operation in A can be (visibly) reordered with any of the stores that are sequenced after the release fence. Here, the stores in B can no longer be reordered with any of the memory operations in A, and as such, the release fence is stronger than the atomic release operation. But it is also uni-directional, since the load from X can still be reordered with any memory operation in A.
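One way to see the extra strength is that a single release fence can cover several relaxed stores that follow it. The sketch below is only an illustration under that reading; `flag_a`, `flag_b` and `release_fence_demo` are hypothetical names, and the readers use acquire loads.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: one release fence, two relaxed flag stores after it.
int release_fence_demo() {
    int payload = 0;                       // "memory operations A"
    std::atomic<int> flag_a{0}, flag_b{0};
    int seen_a = -1, seen_b = -1;

    std::thread producer([&] {
        payload = 42;                                         // part of A
        std::atomic_thread_fence(std::memory_order_release);  // orders A before all later stores
        flag_a.store(1, std::memory_order_relaxed);
        flag_b.store(1, std::memory_order_relaxed);
    });
    std::thread consumer_a([&] {
        while (flag_a.load(std::memory_order_acquire) != 1) { }
        seen_a = payload;                  // sees 42 via flag_a
    });
    std::thread consumer_b([&] {
        while (flag_b.load(std::memory_order_acquire) != 1) { }
        seen_b = payload;                  // sees 42 via flag_b
    });

    producer.join();
    consumer_a.join();
    consumer_b.join();
    return seen_a + seen_b;
}
```

With a release store instead of the fence, only the flag that carries the release ordering would publish `payload`; the other relaxed store would give its reader no guarantee.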

The difference is subtle and usually an atomic release operation is preferred over a standalone release fence.

The rules for a standalone acquire fence are similar, except that it enforces ordering in the opposite direction and operates on loads:

// loads B

sync.load(std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_acquire);

// memory operations A

No memory operation in A can be reordered with any load that is sequenced before the standalone acquire fence.
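As a rough sketch of how this is used, the reader below polls with a relaxed load and only issues the acquire fence once it has seen the flag. The writer side (a release store) and the names `payload`, `ready` and `acquire_fence_demo` are assumptions added for the example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: relaxed load followed by a standalone acquire fence.
int acquire_fence_demo() {
    int payload = 0;
    std::atomic<int> ready{0};
    int seen = -1;

    std::thread producer([&] {
        payload = 7;
        ready.store(1, std::memory_order_release);
    });
    std::thread consumer([&] {
        // Relaxed polling: no ordering is implied by the load itself.
        while (ready.load(std::memory_order_relaxed) != 1) { }
        // The fence orders the load above before everything below it.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = payload;            // "memory operations A"
    });

    producer.join();
    consumer.join();
    return seen;
}
```

The fence is paid for once after the loop exits rather than on every iteration of the poll, which is the usual reason to write it this way.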

A standalone fence with std::memory_order_acq_rel ordering combines the logic for both acquire and release fences.

// memory operations A
// load A

std::atomic_thread_fence(std::memory_order_acq_rel);

// store B
// memory operations B

But this can get incredibly tricky once you realize that a store in A can still be reordered with a load in B. Acq/rel fences should probably be avoided in favor of regular atomic operations, or even better, mutexes.
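The store/load case can be sketched with the classic two-thread test below. `count_both_zero` is a hypothetical helper, and the comparison with `std::memory_order_seq_cst` is my addition, not something the snippets above use: with an acq_rel fence both threads are allowed to read 0, whereas a seq_cst fence rules that outcome out.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs store / fence / load in two threads `rounds`
// times and counts how often both loads returned 0.
int count_both_zero(std::memory_order fence_order, int rounds) {
    int both_zero = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);     // store before the fence
            std::atomic_thread_fence(fence_order);
            r1 = y.load(std::memory_order_relaxed);    // load after the fence
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(fence_order);
            r2 = x.load(std::memory_order_relaxed);
        });

        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Calling `count_both_zero(std::memory_order_seq_cst, n)` must return 0. Calling it with `std::memory_order_acq_rel` may return a non-zero count, although thread start-up overhead makes the reordering hard to observe in such a simple harness.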