In this example using atomic_fetch_sub, an indexed container is processed by multiple threads concurrently, without the use of locks:
#include <string>
#include <thread>
#include <vector>
#include <iostream>
#include <atomic>
#include <numeric>

const int N = 10000;
std::atomic<int> cnt;        // shared countdown index into data
std::vector<int> data(N);

void reader(int id) {
    for (;;) {
        // Atomically decrement the counter; idx receives the value it held before.
        int idx = std::atomic_fetch_sub_explicit(&cnt, 1, std::memory_order_relaxed);
        if (idx >= 0) {
            std::cout << "reader " << std::to_string(id) << " processed item "
                      << std::to_string(data[idx]) << '\n';
        }
        else {
            // The counter dropped below zero: every item has been claimed.
            std::cout << "reader " << std::to_string(id) << " done.\n";
            break;
        }
    }
}

int main() {
    std::iota(data.begin(), data.end(), 1);     // fill data with 1, 2, ..., N
    cnt = static_cast<int>(data.size()) - 1;    // start at the last valid index

    std::vector<std::thread> v;
    for (int n = 0; n < 10; ++n) {
        v.emplace_back(reader, n);              // construct each thread in place
    }
    for (std::thread& t : v) {
        t.join();
    }
}
This example code uses a vector of N integers as the data source, which std::iota fills with the values 1 through N. The atomic counter object is set to the index of the last element, that is, the size of the data vector minus one. After this, 10 threads are created (constructed in place using the vector's emplace_back member function, a C++11 feature), each of which runs the reader function.
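In that function, we fetch the current value of the index counter using the atomic_fetch_sub_explicit function, which allows us to specify the memory_order_relaxed memory order. The function returns the old value and subtracts the value we pass from the counter, counting the index down by 1. Because the read and the subtraction happen as a single atomic operation, no two threads can obtain the same index. Relaxed ordering is sufficient here because the vector is filled before any thread is started and is only read afterwards.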
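The return-the-old-value behavior is easy to get wrong, so the following single-threaded sketch walks through it. The helper names claim_index and demo_claims are hypothetical and are not part of the original example; only std::atomic_fetch_sub_explicit comes from the standard library.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: claims the next index from a shared countdown counter.
// Returns the value the counter held *before* the subtraction.
int claim_index(std::atomic<int>& counter) {
    return std::atomic_fetch_sub_explicit(&counter, 1, std::memory_order_relaxed);
}

// Hypothetical demo: three claims on a counter that starts at 2.
// Returns the counter's final value.
int demo_claims() {
    std::atomic<int> counter{2};
    int first  = claim_index(counter);  // old value 2, counter becomes 1
    int second = claim_index(counter);  // old value 1, counter becomes 0
    int third  = claim_index(counter);  // old value 0, counter becomes -1
    assert(first == 2 && second == 1 && third == 0);
    return counter.load(std::memory_order_relaxed);
}
```

Note that the third claim still yields the valid index 0, even though the counter itself is already negative afterwards. This is why the reader function tests the returned value rather than re-reading the counter.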
So long as the index obtained this way is greater than or equal to zero, the thread processes the corresponding item and the loop continues; otherwise the thread reports that it is done and returns. Once all the threads have been joined, the application exits.
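One way to check that this scheme hands every index to exactly one thread is to record the claims and count them afterwards. The sketch below is an illustration under assumptions, not part of the original example: count_claims is a hypothetical helper, and it uses the fetch_sub member function, which performs the same operation as the free function atomic_fetch_sub_explicit.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: runs `threads` workers over `items` indices using the
// countdown scheme, and returns how many times each index was claimed.
std::vector<int> count_claims(int items, int threads) {
    assert(items >= 0 && threads > 0);

    std::atomic<int> next{items - 1};                // index of the last item
    std::vector<std::vector<int>> claimed(threads);  // one private log per worker

    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&next, &claimed, t] {
            for (;;) {
                int idx = next.fetch_sub(1, std::memory_order_relaxed);
                if (idx < 0) break;         // nothing left to claim
                claimed[t].push_back(idx);  // only this thread writes claimed[t]
            }
        });
    }
    for (std::thread& th : pool) th.join();  // after join, the logs are safe to read

    // Tally the claims: each entry should end up as exactly 1.
    std::vector<int> hits(items, 0);
    for (const auto& log : claimed)
        for (int idx : log) ++hits[idx];
    return hits;
}
```

Calling count_claims(1000, 4), for instance, should return a vector of 1000 entries that are all 1. Each worker writes only to its own log, so the bookkeeping itself needs no further synchronization.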