In the C++11 memory model, atomic operations do not always require a full memory barrier (fence). In the GCC built-in atomics API, this is reflected in the memorder parameter that its functions take. The possible values for this parameter map directly to the memory orders in the C++11 atomics API:
- __ATOMIC_RELAXED: Implies no inter-thread ordering constraints.
- __ATOMIC_CONSUME: This is currently implemented using the stronger __ATOMIC_ACQUIRE memory order because of a deficiency in C++11's semantics for memory_order_consume.
- __ATOMIC_ACQUIRE: Creates an inter-thread happens-before constraint from the release (or stronger) semantic store to this acquire load.
- __ATOMIC_RELEASE: Creates an inter-thread happens-before constraint to acquire (or stronger) semantic loads that read from this release store.
- __ATOMIC_ACQ_REL: Combines the effects of both __ATOMIC_ACQUIRE and __ATOMIC_RELEASE.
- __ATOMIC_SEQ_CST: Enforces total ordering with all other __ATOMIC_SEQ_CST operations.
The preceding list was copied from the GCC manual's chapter on atomics for GCC 7.1. Along with the comments in that chapter, it makes it quite clear that trade-offs were made both in how C++11 defines atomics within its memory model and in how the compiler implements them.
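To make the acquire and release orders more concrete, the following is a minimal sketch of a hand-off between two threads using the GCC built-ins __atomic_store_n and __atomic_load_n. The names payload, ready, producer, consumer, and run_handoff are hypothetical and chosen only for this illustration; the sketch assumes GCC (or a compatible compiler such as Clang) and a platform where std::thread is available.

```cpp
#include <cassert>
#include <thread>

// Hypothetical names for illustration only.
static int payload = 0;  // ordinary, non-atomic data
static int ready = 0;    // flag accessed only through the __atomic built-ins

static void producer() {
    payload = 42;  // plain write, ordered before the release store below
    // Release store: a thread whose acquire load reads this 1 is also
    // guaranteed to see the write to payload above.
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
}

static int consumer() {
    // Acquire load: spin until the release store has been observed.
    while (__atomic_load_n(&ready, __ATOMIC_ACQUIRE) == 0) {
        std::this_thread::yield();
    }
    return payload;  // safe to read: happens-after the write in producer()
}

int run_handoff() {
    // Reset the shared state before any thread is started.
    payload = 0;
    __atomic_store_n(&ready, 0, __ATOMIC_RELAXED);

    int result = 0;
    std::thread reader([&result] { result = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();

    assert(result == 42);  // follows from the acquire/release pairing
    return result;
}
```

Calling run_handoff() from main() returns the value written by the producer. If both operations used __ATOMIC_RELAXED instead, the flag itself would still be updated atomically, but nothing would order the write to payload before the read of it, so the read would be a data race.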
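Since atomics rely on the underlying hardware support, code using atomics cannot be assumed to behave identically across a wide variety of architectures: the instructions that are generated, their cost, and whether a given operation is lock-free at all depend on the target.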
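One way to make this dependence on the target explicit is to ask the compiler whether an atomic operation on a given type is always lock-free. The sketch below uses the GCC built-ins __atomic_always_lock_free and __atomic_add_fetch; the helper names always_lock_free and bump are hypothetical, and the static_assert is an assumption about the targets one intends to support, not a guarantee that holds everywhere.

```cpp
// Hypothetical helper: true if atomic operations on objects of type T always
// compile to lock-free instructions on the current target. The second
// argument 0 asks the compiler to assume typical alignment for that size.
template <typename T>
constexpr bool always_lock_free() {
    return __atomic_always_lock_free(sizeof(T), 0);
}

// Assumption for this sketch: refuse to build for a target where a plain int
// cannot be updated atomically without a lock.
static_assert(always_lock_free<int>(),
              "int is not always lock-free on this target");

// Hypothetical helper: atomically increments *counter and returns the new
// value, using the strongest (sequentially consistent) memory order.
inline int bump(int* counter) {
    return __atomic_add_fetch(counter, 1, __ATOMIC_SEQ_CST);
}
```

Such a check does not make the code faster on a weaker target; it only turns a silent fallback to locking into a compile-time error, which is often the more useful outcome when porting.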