early-access version 3088

This commit is contained in:
pineappleEA
2022-11-05 15:35:56 +01:00
parent 4e4fc25ce3
commit b601909c6d
35519 changed files with 5996896 additions and 860 deletions


@@ -0,0 +1,25 @@
# Boost.Atomic library documentation Jamfile
#
# Copyright Helge Bahmann 2011.
# Copyright Tim Blechmann 2012.
# Distributed under the Boost Software License, Version 1.0.
# (See accompanying file LICENSE_1_0.txt or copy at
# http://www.boost.org/LICENSE_1_0.txt)
import quickbook ;
import boostbook : boostbook ;
xml atomic : atomic.qbk ;
boostbook standalone
: atomic
: <xsl:param>boost.root=../../../..
<xsl:param>boost.libraries=../../../libraries.htm
<format>pdf:<xsl:param>"boost.url.prefix=http://www.boost.org/doc/libs/release/libs/atomic/doc/html"
;
###############################################################################
alias boostdoc ;
explicit boostdoc ;
alias boostrelease : standalone ;
explicit boostrelease ;

File diff suppressed because it is too large


@@ -0,0 +1,160 @@
[/
/ Copyright (c) 2021 Andrey Semashev
/
/ Distributed under the Boost Software License, Version 1.0. (See accompanying
/ file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
/]
[section:changelog Changelog]
[heading Boost 1.79]
* Fixed compilation for Universal Windows Platform (UWP). ([github_issue 54])
* Added `BOOST_ATOMIC_NO_DARWIN_ULOCK` configuration macro. The macro affects compilation on Darwin systems and disables the `ulock`-based implementation of waiting and notifying operations. This may be useful to comply with Apple App Store requirements. ([github_issue 55])
[heading Boost 1.78]
* Use process-local futex operations on Android for non-IPC waiting and notifying operations.
* Added support for Linux targets that only define `SYS_futex_time64` syscall, such as riscv32.
* Added a workaround for incorrect result of `std::alignment_of` on clang 8 for 64-bit types on 32-bit x86 targets.
* Added a ulock backend for waiting and notifying operations on Darwin systems since Mac OS 10.12, iOS 10.0, tvOS 10.0 or watchOS 3.0. The backend supports native 32-bit process-local waiting and notifying operations, and since Mac OS 10.15, iOS 13.0, tvOS 13.0 or watchOS 6.0 - also 64-bit process-local operations and 32 and 64-bit inter-process operations.
* On Windows, corrected discrepancy between [^['atomic-type]::always_has_native_wait_notify] and the corresponding capability macros when targeting Windows 8 or later. The library will now directly use `WaitOnAddress` and related APIs from public headers, and will therefore require the user to link with `synchronization.lib` if the user requires Windows 8 or later by defining `BOOST_USE_WINAPI_VERSION`, `_WIN32_WINNT` or similar macros. The library is linked automatically on compilers that support auto-linking (e.g. MSVC).
* Added support for types with padding bits, except unions, on compilers that provide a way to clear the padding bits. This feature is supported by gcc 11 and MSVC 14.2 (compiler version 19.27) and newer, as well as other compilers supporting similar intrinsics. On compilers that don't allow clearing the padding bits, types with padding are still generally not supported, with the exception of 80-bit `long double` on x86 targets. A new `BOOST_ATOMIC_NO_CLEAR_PADDING` capability macro is defined to indicate when clearing the padding is not supported.
* Initializing constructors of `atomic_ref` and `ipc_atomic_ref` no longer use atomic instructions to clear the padding bits in the referenced object. This reduces the cost of the atomic reference construction. This is considered safe because clearing the padding does not issue writes to the bytes that contribute to the object value. However, some thread safety checking software may falsely detect this as a data race.
* Initializing constructors of `atomic` and `ipc_atomic` are now `constexpr` for enums, classes and floating point types. For classes and floating point types, the constructors are `constexpr` if the compiler supports `constexpr` `std::bit_cast`, the type has no padding bytes and no padding is required to implement native atomic operations (i.e., for [^atomic<['T]>], the object of type [^['T]] fits exactly in the internal storage of the atomic).
* In accordance with C++20, default constructors of `atomic` and `ipc_atomic` now perform value initialization of the contained object. For types without a user-defined default constructor, this means the default-constructed atomic will be zero-initialized.
* Added a workaround for compilation error on AIX caused by the assembler tool not supporting numeric labels. ([github_pr 50])
* Added a workaround for compilation error with Visual Studio 2015 prior to Update 3. ([github_issue 52])
[heading Boost 1.77]
* Added `make_atomic_ref` and `make_ipc_atomic_ref` factory functions for constructing atomic reference objects.
* Added C++17 template argument deduction guides for `atomic_ref` and `ipc_atomic_ref` to allow omitting template arguments when they can be deduced from constructor arguments.
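A minimal usage sketch of these two additions (assuming a C++17 compiler for the deduction guide):
[c++]

#include <boost/atomic.hpp>

int counter = 0;

void bump()
{
// factory function deduces boost::atomic_ref<int>
auto ref = boost::make_atomic_ref(counter);
ref.fetch_add(1, boost::memory_order_relaxed);
// C++17 class template argument deduction achieves the same
boost::atomic_ref ref2(counter);
ref2.fetch_add(1, boost::memory_order_relaxed);
}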
[heading Boost 1.76]
* Fixed compilation with MSVC for ARM. ([github_pr 49])
[heading Boost 1.75]
* Implemented SSE2 and SSE4.1 versions of the address lookup algorithm, which is used in the internal lock pool implementation. This may improve performance of waiting and notifying operations in heavily contended cases.
* Fixed a possible compilation error on AArch64 targets caused by incorrect instructions generated for bitwise (logical) operations with immediate constants. ([github_issue 41])
[heading Boost 1.74]
* Added missing `const` qualifiers to some operations in `atomic_ref`.
* Added support for `yield` instruction on ARMv8-A. The instruction is used internally in spin loops to reduce CPU power consumption.
* Added support for C++20 [link atomic.interface.interface_wait_notify_ops waiting and notifying operations]. The implementation includes a generic backend that involves the internal lock pool, as well as specialized backends for Windows, Linux, FreeBSD, DragonFly BSD and NetBSD. Atomic types provide a new method `has_native_wait_notify`, a static boolean constant `always_has_native_wait_notify` and a set of capability macros that allow detecting whether the implementation supports native waiting and notifying operations for a given type (see the sketch after this list).
* Changed internal representation of `atomic_flag` to use 32-bit storage. This allows for more efficient waiting and notifying operations on `atomic_flag` on some platforms.
* Added support for build-time configuration of the internal lock pool size. The user can define the `BOOST_ATOMIC_LOCK_POOL_SIZE_LOG2` macro to specify the binary logarithm of the size of the lock pool. The default value is 8, meaning that the size of the lock pool is 256, up from 64 used in the previous release.
* Added support for a new set of atomic types dedicated for [link atomic.interface.interface_ipc inter-process communication]: `ipc_atomic_flag`, `ipc_atomic` and `ipc_atomic_ref`. Users are recommended to port their code using non-IPC types for inter-process communication to the new types. The new types provide the same set of operations as their non-IPC counterparts, with the following differences:
* Most operations have an added precondition that `is_lock_free` returns `true` for the given atomic object. The library will issue a compile-time error if this precondition is known not to be satisfied at compile time.
* All provided operations are address-free, meaning that the atomic object (in case of `ipc_atomic_ref` - the referenced object) may be located in process-shared memory or mapped into the same process at multiple different addresses.
* The new `has_native_wait_notify` operation and `always_has_native_wait_notify` constant indicate support for native inter-process waiting and notifying operations. When that support is not present, the operations are implemented with a busy loop, which is less efficient, but still is address-free. A separate set of capability macros is also provided to indicate this support.
* Added new `atomic_unsigned_lock_free` and `atomic_signed_lock_free` types introduced in C++20. The types indicate the atomic object type for an unsigned or signed integer, respectively, that is lock-free and preferably has native support for waiting and notifying operations.
* Added new gcc assembler backends for ARMv8-A (for both AArch32 and AArch64). The new backends are used to implement operations not supported by compiler intrinsics (including 128-bit operations on AArch64) and can also be used when compiler intrinsics are not available. Both little and big endian targets are supported. AArch64 backend supports extensions defined in ARMv8.1 and ARMv8.3.
* Added support for big endian targets in the legacy ARM backend based on gcc assembler blocks (this backend is used on ARMv7 and older targets). Previously, the backend assumed little endian memory layout, which is significant for 64-bit operations.
* Improved performance of seq_cst stores and thread fences on x86 by using `lock`-prefixed instructions instead of `mfence`. This means that the operations no longer affect non-temporal stores, which was also not guaranteed before. Use specialized instructions and intrinsics to order non-temporal memory accesses.
* Fixed capability macros for 80-bit `long double` on x86 targets not indicating lock-free operations even if 128-bit atomic operations were available.
* Fixed compilation of gcc asm blocks on Alpha targets.
* In the gcc `__sync*` intrinsics backend, fixed that store and load operations of large objects (larger than a pointer size) could be non-atomic. The implementation currently assumes that small objects can be stored with a single instruction atomically on all modern architectures.
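A minimal sketch of the C++20-style waiting and notifying operations listed above (illustrative; `wait` blocks until the value differs from the one passed in, using native OS support where available):
[c++]

#include <boost/atomic.hpp>

boost::atomic<int> ready(0);

void waiting_thread()
{
// blocks while ready == 0
ready.wait(0, boost::memory_order_acquire);
}

void notifying_thread()
{
ready.store(1, boost::memory_order_release);
ready.notify_one();
}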
[heading Boost 1.73]
* Implemented C++20 `atomic_ref`. See [link atomic.interface.interface_atomic_ref docs] and especially the [link atomic.interface.interface_atomic_ref.caveats caveats] section.
* Implemented `atomic_flag::test` operation, which was introduced in C++20.
* `atomic<T>` should now take into account alignment requirements of `T`, which makes a difference if those requirements are higher than that of the internal storage of `atomic`.
* Added static asserts enforcing the requirements on the value type `T` used with `atomic` and `atomic_ref`. This should prohibit invalid types from being used as atomics.
* Improved internal lock pool implementation. The pool is larger, and lock selection accounts for atomic object alignment, which should reduce the potential of thread contention.
* Fixed incorrect x86 code generated for `bit_test_and_*` operations on 8 and 16-bit arguments. Other architectures are not affected.
* Fixed a possible unaligned memory access in `compare_exchange_*` operations, if alignment requirements of `value_type` are less than that of the internal storage of `atomic`.
* `boost/atomic/atomic.hpp` no longer includes `boost/atomic/atomic_flag.hpp` and `boost/atomic/fences.hpp` and only defines the `boost::atomic` class template and related typedefs. Include the other headers explicitly or use `boost/atomic.hpp` to include all parts of Boost.Atomic.
* The `atomic<T>::storage()` accessor and associated `atomic<T>::storage_type` type are deprecated. Instead, users are advised to use `atomic<T>::value()` and `atomic<T>::value_type`, respectively. Users can define `BOOST_ATOMIC_SILENCE_STORAGE_DEPRECATION` to disable deprecation warnings for the time of transition. The deprecated pieces will be removed in a future release.
* Removed support for `BOOST_ATOMIC_DETAIL_HIGHLIGHT_OP_AND_TEST`. This macro was used as a helper for transition to the updated returned values of `*_and_test` operations in Boost.Atomic 1.67, which was released 2 years before 1.73.
[heading Boost 1.72]
* Added a workaround for `__float128` not being considered as a floating point type by some versions of libstdc++.
* Improved compatibility with clang-win compiler.
[heading Boost 1.67]
* [*Breaking change:] Changed the result of the `(op)_and_test` operations added in Boost 1.66 to the opposite - the functions now return `true` if the operation result is non-zero. This is consistent with other `test` methods in Boost.Atomic and the C++ standard library. Users can define `BOOST_ATOMIC_DETAIL_HIGHLIGHT_OP_AND_TEST` when compiling their code to emit warnings on every use of the changed functions. This way users can locate the code that needs to be updated. ([github_issue 11])
* Update for C++2a. On C++11 compilers that support scoped enums, the `memory_order` enumeration is now scoped and contains constants with shorter names like `acquire`, `release` or `seq_cst` (i.e. users can use `memory_order::acquire` instead of `memory_order_acquire`). The old constants are also provided for backward compatibility. ([@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0439r0.html P0439R0])
* Update for C++2a. Added experimental support for atomic operations on floating point types. In addition to general operations, `add`, `sub`, `negate` operations and their `fetch_(op)` and `opaque_(op)` versions are supported. The lock-free property can be tested with the new `BOOST_ATOMIC_FLOAT/DOUBLE/LONG_DOUBLE_LOCK_FREE` macros. The support for floating point types is optional and can be disabled by defining `BOOST_ATOMIC_NO_FLOATING_POINT` (see the sketch after this list). ([@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0020r6.html P0020R6])
* Added new experimental operations:
* `negate_and_test` and `complement_and_test` which perform negation or bitwise complement and return `true` if the result is not zero.
* `add`, `sub`, `negate`, `bitwise_and`, `bitwise_or`, `bitwise_xor`, `bitwise_complement` operations which perform the operation and return its result.
* For generic `atomic<T>` specialization, the default constructor is now trivial if `T`'s default constructor is.
* The internal implementation of `atomic<T>` has been updated to avoid undefined behavior that stems from signed integer overflows. As required by the C++ standard, the library internally uses two's complement representation of signed integers and the corresponding overflow rules. Currently, the library requires the native signed integer types to also use two's complement representation (but does not require defined overflow semantics).
* Improved Clang support. In particular, fixed DCAS not being lock-free and fixed possible incorrect code generated on 32-bit x86.
* Improved MinGW support. For gcc versions up to 4.6, fixed compilation of DCAS on x86.
* In x86 PIE code, asm blocks now preserve `ebx` value.
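A small sketch of the scoped `memory_order` constants and the experimental floating point support described above (requires a compiler with C++11 scoped enums; the function name is illustrative):
[c++]

#include <boost/atomic.hpp>

boost::atomic<double> total(0.0);

void add_sample(double x)
{
// scoped constant; equivalent to boost::memory_order_relaxed
total.fetch_add(x, boost::memory_order::relaxed);
}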
[heading Boost 1.66]
* Implemented a set of experimental extended atomic operations for integral types:
* `fetch_negate`, `fetch_complement` - atomically replaces the value with a negated or binary complemented value and returns the original value
* `opaque_<op>` - equivalent to `fetch_<op>` except that it doesn't return the original value
* `<op>_and_test` - atomically applies `<op>` and returns `true` if the result is zero. *Note:* The result of these operations will change to the opposite in Boost 1.67. The code that uses these functions will need to be updated.
* `bit_test_and_set`, `bit_test_and_reset`, `bit_test_and_complement` - atomically sets, resets or complements the specified bit and returns the original value of the bit
* Following C++17 ([@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0558r1.pdf P0558R1]), arithmetic operations for pointers to non-object types are no longer provided.
* Also following C++17, exposed `atomic<T>::value_type` and `atomic<T>::difference_type` member typedefs, where applicable, to the user's code.
* Improved compatibility with gcc 7. In particular, using 128-bit operations on x86-64 should no longer require linking with libatomic (the compiler-supplied library).
[heading Boost 1.64]
* Fixed possible incorrect code generation in 64-bit atomic operations on 32-bit x86 with gcc versions older than 4.7 and compatible compilers.
[heading Boost 1.63]
* Added the static constant `atomic<T>::is_always_lock_free` for conformance with C++17. The constant indicates that the given specialization always provides lock-free implementation of atomic operations.
* Improved support of Oracle Studio on x86 targets.
[*Post-release notes:]
* Using 64-bit atomic operations on 32-bit x86 with gcc versions older than 4.7 and compatible compilers can result in generation of incorrect code. This problem is fixed in [@https://github.com/boostorg/atomic/commit/a67cc1b055cf09f371e2eca544884634a1ccc886 this] commit.
[heading Boost 1.62]
* Improved support for Oracle Studio and SPARC. The library now provides native atomic operations on SPARCv8+.
[heading Boost 1.60]
* Enforced proper alignment of `atomic<>` storage. This should fix possible issues on platforms that support atomic operations on data units larger than the native word size. This may also change binary layout of user's data structures that have `atomic<>` members.
* Fixed compilation for PowerPC with IBM XL C++ compiler. Corrected memory barriers in PowerPC assembler.
* Fixed compilation with MSVC-8 for ARM.
* Fixed compilation with gcc 4.4 for x86-64, when 128-bit atomic operations were used. ([ticket 10994])
* Optimized some gcc assembler blocks for x86/x86-64 to reduce the number of used registers. This may require binutils 2.10 or later.
[heading Boost 1.56]
* The library has been redesigned. Besides internal refactoring, various bugs were fixed, including incorrect values of feature test macros and integer overflow handling.
* Changed values of the `memory_order` enumeration. The concrete values are not part of the interface, but this change may potentially break ABI, if the enum is used in user's interfaces.
* Implemented support for 128-bit atomic operations on Windows x64 with MSVC. The library assumes presence of the `cmpxchg16b` instruction in the target CPUs. Some early AMD CPUs don't support this instruction. To target those CPUs, define the `BOOST_ATOMIC_NO_CMPXCHG16B` macro.
* Implemented experimental support for Windows ARM target with MSVC.
* Implemented experimental support for DEC Alpha target with GCC.
* Improved support for ARMv6 and later with GCC. Implemented all atomic operations as assembler blocks instead of CAS-based loops. 64-bit operations are supported with ARMv7.
* Implemented optional support for the `BOOST_ATOMIC_FLAG_INIT` macro and static initialization of `atomic_flag`. ([ticket 8158])
* Fixed compilation for SPARCv9 target. ([ticket 9446])
* Fixed compilation for PowerPC target. ([ticket 9447])
* Fixed several compatibility problems with Clang on x86 and x86-64. ([ticket 9610], [ticket 9842])
* Removed specialized code for Windows on IA64 platform.
[heading Boost 1.55]
* Added support for 64-bit atomic operations on x86 target for GCC, MSVC and compatible compilers. The support is enabled when it is known at compile time that the target CPU supports required instructions.
* Added support for 128-bit atomic operations on x86-64 target for GCC and compatible compilers. The support is enabled when it is known at compile time that the target CPU supports required instructions. The support can be tested for with the new `BOOST_ATOMIC_INT128_LOCK_FREE` macro.
* Added a more efficient implementation of `atomic<>` based on GCC `__atomic*` intrinsics available since GCC 4.7.
* Added support for more ARM v7 CPUs, improved detection of Thumb 2.
* Added support for x32 (i.e. 64-bit x86 with 32-bit pointers) target on GCC and compatible compilers.
* Removed dependency on Boost.Thread.
* Internal lock pool now includes proper padding and alignment to avoid false sharing.
* Fixed compilation with Intel compiler on Windows. Removed internal macro duplication when compiled on Windows.
* Some code refactoring to use C++11 features when available.
[heading Boost 1.53]
* Initial Boost release with [*Boost.Atomic].
[endsect]


@@ -0,0 +1,399 @@
[/
/ Copyright (c) 2009 Helge Bahmann
/
/ Distributed under the Boost Software License, Version 1.0. (See accompanying
/ file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
/]
[section:example_reference_counters Reference counting]
The purpose of a ['reference counter] is to count the number
of pointers to an object. The object can be destroyed as
soon as the reference counter reaches zero.
[section Implementation]
[c++]
#include <boost/intrusive_ptr.hpp>
#include <boost/atomic.hpp>
class X {
public:
typedef boost::intrusive_ptr<X> pointer;
X() : refcount_(0) {}
private:
mutable boost::atomic<int> refcount_;
friend void intrusive_ptr_add_ref(const X * x)
{
x->refcount_.fetch_add(1, boost::memory_order_relaxed);
}
friend void intrusive_ptr_release(const X * x)
{
if (x->refcount_.fetch_sub(1, boost::memory_order_release) == 1) {
boost::atomic_thread_fence(boost::memory_order_acquire);
delete x;
}
}
};
[endsect]
[section Usage]
[c++]
X::pointer x = new X;
[endsect]
[section Discussion]
Increasing the reference counter can always be done with
[^memory_order_relaxed]: New references to an object can only
be formed from an existing reference, and passing an existing
reference from one thread to another must already provide any
required synchronization.
It is important to ensure that any possible access to the object in
one thread (through an existing reference) ['happens before]
the object is deleted in a different thread. This is achieved
by a "release" operation after dropping a reference (any
access to the object through this reference must obviously
have happened before), and an "acquire" operation before
deleting the object.
It would be possible to use [^memory_order_acq_rel] for the
[^fetch_sub] operation, but this results in unneeded "acquire"
operations when the reference counter has not yet reached zero
and may impose a performance penalty.
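For comparison, a sketch of that [^memory_order_acq_rel] variant; it is functionally equivalent to the implementation shown earlier, but performs the "acquire" part on every decrement:
[c++]

friend void intrusive_ptr_release(const X * x)
{
// acq_rel combines the release on the decrement with the acquire
// that is otherwise issued as a separate fence before the delete
if (x->refcount_.fetch_sub(1, boost::memory_order_acq_rel) == 1)
delete x;
}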
[endsect]
[endsect]
[section:example_spinlock Spinlock]
The purpose of a ['spin lock] is to prevent multiple threads
from concurrently accessing a shared data structure. In contrast
to a mutex, threads will busy-wait and waste CPU cycles instead
of yielding the CPU to another thread. ['Do not use spinlocks
unless you are certain that you understand the consequences.]
[section Implementation]
[c++]
#include <boost/atomic.hpp>
class spinlock {
private:
typedef enum {Locked, Unlocked} LockState;
boost::atomic<LockState> state_;
public:
spinlock() : state_(Unlocked) {}
void lock()
{
while (state_.exchange(Locked, boost::memory_order_acquire) == Locked) {
/* busy-wait */
}
}
void unlock()
{
state_.store(Unlocked, boost::memory_order_release);
}
};
[endsect]
[section Usage]
[c++]
spinlock s;
s.lock();
// access data structure here
s.unlock();
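To make sure [^unlock] also runs when the protected code throws an exception, the spinlock can be wrapped in a small scope guard; the helper class below is illustrative and not part of Boost.Atomic:
[c++]

class scoped_spinlock {
public:
explicit scoped_spinlock(spinlock & s) : s_(s) { s_.lock(); }
~scoped_spinlock() { s_.unlock(); }
private:
scoped_spinlock(const scoped_spinlock &); // non-copyable
scoped_spinlock & operator=(const scoped_spinlock &);
spinlock & s_;
};

void update()
{
scoped_spinlock guard(s); // s is the spinlock from the snippet above
// access data structure here; unlocked automatically on scope exit
}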
[endsect]
[section Discussion]
The purpose of the spinlock is to make sure that one access
to the shared data structure always strictly "happens before"
another. The usage of acquire/release in lock/unlock is required
and sufficient to guarantee this ordering.
It would be correct to write the "lock" operation in the following
way:
[c++]
void lock()
{
while (state_.exchange(Locked, boost::memory_order_relaxed) == Locked) {
/* busy-wait */
}
boost::atomic_thread_fence(boost::memory_order_acquire);
}
This "optimization" is however a) useless and b) may in fact hurt:
a) Since the thread will be busily spinning on a blocked spinlock,
it does not matter if it will waste the CPU cycles with just
"exchange" operations or with both useless "exchange" and "acquire"
operations. b) A tight "exchange" loop without any
memory-synchronizing instruction introduced through an "acquire"
operation will on some systems monopolize the memory subsystem
and degrade the performance of other system components.
[endsect]
[endsect]
[section:singleton Singleton with double-checked locking pattern]
The purpose of the ['Singleton with double-checked locking pattern] is to ensure
that at most one instance of a particular object is created.
If one instance has been created already, access to the existing
object should be as light-weight as possible.
[section Implementation]
[c++]
#include <boost/atomic.hpp>
#include <boost/thread/mutex.hpp>
class X {
public:
static X * instance()
{
X * tmp = instance_.load(boost::memory_order_consume);
if (!tmp) {
boost::mutex::scoped_lock guard(instantiation_mutex);
tmp = instance_.load(boost::memory_order_consume);
if (!tmp) {
tmp = new X;
instance_.store(tmp, boost::memory_order_release);
}
}
return tmp;
}
private:
static boost::atomic<X *> instance_;
static boost::mutex instantiation_mutex;
};
boost::atomic<X *> X::instance_(0);
boost::mutex X::instantiation_mutex;
[endsect]
[section Usage]
[c++]
X * x = X::instance();
// dereference x
[endsect]
[section Discussion]
The mutex makes sure that only one instance of the object is
ever created. The [^instance] method must make sure that any
dereference of the object strictly "happens after" creating
the instance in another thread. The use of [^memory_order_release]
after creating and initializing the object and [^memory_order_consume]
before dereferencing the object provides this guarantee.
It would be permissible to use [^memory_order_acquire] instead of
[^memory_order_consume], but this provides a stronger guarantee
than is required since only operations depending on the value of
the pointer need to be ordered.
[endsect]
[endsect]
[section:example_ringbuffer Wait-free ring buffer]
A ['wait-free ring buffer] provides a mechanism for relaying objects
from one single "producer" thread to one single "consumer" thread without
any locks. The operations on this data structure are "wait-free" which
means that each operation finishes within a constant number of steps.
This makes this data structure suitable for use in hard real-time systems
or for communication with interrupt/signal handlers.
[section Implementation]
[c++]
#include <cstddef> // for size_t
#include <boost/atomic.hpp>
template<typename T, size_t Size>
class ringbuffer {
public:
ringbuffer() : head_(0), tail_(0) {}
bool push(const T & value)
{
size_t head = head_.load(boost::memory_order_relaxed);
size_t next_head = next(head);
if (next_head == tail_.load(boost::memory_order_acquire))
return false;
ring_[head] = value;
head_.store(next_head, boost::memory_order_release);
return true;
}
bool pop(T & value)
{
size_t tail = tail_.load(boost::memory_order_relaxed);
if (tail == head_.load(boost::memory_order_acquire))
return false;
value = ring_[tail];
tail_.store(next(tail), boost::memory_order_release);
return true;
}
private:
size_t next(size_t current)
{
return (current + 1) % Size;
}
T ring_[Size];
boost::atomic<size_t> head_, tail_;
};
[endsect]
[section Usage]
[c++]
ringbuffer<int, 32> r;
// try to insert an element
if (r.push(42)) { /* succeeded */ }
else { /* buffer full */ }
// try to retrieve an element
int value;
if (r.pop(value)) { /* succeeded */ }
else { /* buffer empty */ }
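Because the ring buffer supports exactly one producer and one consumer, [^push] and [^pop] must each be confined to a single dedicated thread. A sketch of such a setup (Boost.Thread is used here purely for illustration):
[c++]

#include <boost/thread.hpp>

ringbuffer<int, 1024> rb;

void producer_thread()
{
for (int i = 0; i < 1000; ++i)
while (!rb.push(i)) { /* buffer full, retry */ }
}

void consumer_thread()
{
int value = 0, received = 0;
while (received < 1000)
if (rb.pop(value)) ++received;
}

void run()
{
boost::thread producer(&producer_thread);
boost::thread consumer(&consumer_thread);
producer.join();
consumer.join();
}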
[endsect]
[section Discussion]
The implementation makes sure that the ring indices do
not "lap around" each other, ensuring that no elements
are either lost or read twice.
Furthermore, it must guarantee that read access to a
particular object in [^pop] "happens after" it has been
written in [^push]. This is achieved by writing [^head_]
with "release" and reading it with "acquire". Conversely,
the implementation also ensures that read access to
a particular ring element "happens before"
rewriting this element with a new value, by accessing [^tail_]
with appropriate ordering constraints.
[endsect]
[endsect]
[section:mp_queue Lock-free multi-producer queue]
The purpose of the ['lock-free multi-producer queue] is to allow
an arbitrary number of producers to enqueue objects which are
retrieved and processed in FIFO order by a single consumer.
[section Implementation]
[c++]
#include <boost/atomic.hpp>
template<typename T>
class lockfree_queue {
public:
struct node {
T data;
node * next;
};
void push(const T &data)
{
node * n = new node;
n->data = data;
node * stale_head = head_.load(boost::memory_order_relaxed);
do {
n->next = stale_head;
} while (!head_.compare_exchange_weak(stale_head, n, boost::memory_order_release));
}
node * pop_all(void)
{
node * last = pop_all_reverse(), * first = 0;
while(last) {
node * tmp = last;
last = last->next;
tmp->next = first;
first = tmp;
}
return first;
}
lockfree_queue() : head_(0) {}
// alternative interface if ordering is of no importance
node * pop_all_reverse(void)
{
return head_.exchange(0, boost::memory_order_consume);
}
private:
boost::atomic<node *> head_;
};
[endsect]
[section Usage]
[c++]
lockfree_queue<int> q;
// insert elements
q.push(42);
q.push(2);
// pop elements
lockfree_queue<int>::node * x = q.pop_all();
while(x) {
lockfree_queue<int>::node * tmp = x;
x = x->next;
// process tmp->data, probably delete it afterwards
delete tmp;
}
[endsect]
[section Discussion]
The implementation guarantees that all objects enqueued are
processed in the order they were enqueued, by building a singly-linked
list of objects in reverse processing order. The queue is atomically
emptied by the consumer (in an operation that is not only lock-free but
wait-free) and then brought into the correct order.
It must be guaranteed that any access to an object to be enqueued
by the producer "happens before" any access by the consumer. This
is assured by inserting objects into the list with ['release] and
dequeuing them with ['consume] memory order. It is not
necessary to use ['acquire] memory order in [^lockfree_queue::pop_all]
because all operations involved depend on the value of
the atomic pointer through dereference.
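When the processing order does not matter, the consumer can call [^pop_all_reverse] directly and skip the list reversal; a sketch:
[c++]

lockfree_queue<int>::node * n = q.pop_all_reverse();
while (n) {
lockfree_queue<int>::node * tmp = n;
n = n->next;
// process tmp->data (in reverse enqueue order), then free the node
delete tmp;
}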
[endsect]
[endsect]

File diff suppressed because it is too large

@@ -0,0 +1,312 @@
[/
/ Copyright (c) 2009 Helge Bahmann
/
/ Distributed under the Boost Software License, Version 1.0. (See accompanying
/ file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
/]
[section:template_organization Organization of class template layers]
The implementation uses multiple layers of template classes that
inherit from the next lower level each and refine or adapt the respective
underlying class:
* [^boost::atomic<T>] is the topmost level, providing the external interface. Implementation-wise, it does not add anything (except for hiding the copy constructor and assignment operator).
* [^boost::detail::atomic::internal_atomic<T,S=sizeof(T),I=is_integral_type<T> >]:
This layer is mainly responsible for providing the overloaded operators
mapping to API member functions (e.g. [^+=] to [^fetch_add]).
The defaulted template parameter [^I] allows exposing the correct API functions (via partial template specialization): for non-integral types, it only publishes the various [^exchange] functions as well as load and store; for integral types it additionally exports arithmetic and logic operations.
[br]
Depending on whether the given type is integral, it
inherits from either [^boost::detail::atomic::platform_atomic<T,S=sizeof(T)>]
or [^boost::detail::atomic::platform_atomic_integral<T,S=sizeof(T)>].
There is however some special-casing: for non-integral types
of size 1, 2, 4 or 8, it will coerce the datatype into an integer representation
and delegate to [^boost::detail::atomic::platform_atomic_integral<T,S=sizeof(T)>]
-- the rationale is that platform implementors only need to provide
integer-type operations.
* [^boost::detail::atomic::platform_atomic_integral<T,S=sizeof(T)>]
must provide the full set of operations for an integral type T
(i.e. [^load], [^store], [^exchange],
[^compare_exchange_weak], [^compare_exchange_strong],
[^fetch_add], [^fetch_sub], [^fetch_and],
[^fetch_or], [^fetch_xor], [^is_lock_free]).
The default implementation uses locking to emulate atomic operations, so
this is the level at which implementors should provide template specializations
to add support for platform-specific atomic operations.
[br]
The two separate template parameters allow separate specialization
on size and type (which, with fixed size, cannot
specify more than signedness/unsignedness). The rationale is that
most platform-specific atomic operations usually depend only on the
operand size, so that common implementations for signed/unsigned
types are possible. Signedness allows properly choosing sign-extending
instructions for the [^load] operation, avoiding later
conversion. The expectation is that in most implementations this will
be a normal assignment in C, possibly accompanied by memory
fences, so that the compiler can automatically choose the correct
instruction.
* At the lowest level, [^boost::detail::atomic::platform_atomic<T,S=sizeof(T)>]
provides the most basic atomic operations ([^load], [^store],
[^exchange], [^compare_exchange_weak],
[^compare_exchange_strong]) for arbitrarily generic data types.
The default implementation uses locking as a fallback mechanism.
Implementors generally do not have to specialize at this level
(since these will not be used for the common integral type sizes
of 1, 2, 4 and 8 bytes), but they can do so if they wish to
provide truly atomic operations for "odd" data type sizes.
Some amount of care must be taken as the "raw" data type
passed in from the user through [^boost::atomic<T>]
is visible here -- it thus needs to be type-punned or otherwise
manipulated byte-by-byte to avoid using overloaded assignment,
comparison operators and copy constructors.
[endsect]
[section:platform_atomic_implementation Implementing platform-specific atomic operations]
In principle implementors are responsible for providing the
full range of named member functions of an atomic object
(i.e. [^load], [^store], [^exchange],
[^compare_exchange_weak], [^compare_exchange_strong],
[^fetch_add], [^fetch_sub], [^fetch_and],
[^fetch_or], [^fetch_xor], [^is_lock_free]).
These must be implemented as partial template specializations for
[^boost::detail::atomic::platform_atomic_integral<T,S=sizeof(T)>]:
[c++]
template<typename T>
class platform_atomic_integral<T, 4>
{
public:
explicit platform_atomic_integral(T v) : i(v) {}
platform_atomic_integral(void) {}
T load(memory_order order=memory_order_seq_cst) const volatile
{
// platform-specific code
}
void store(T v, memory_order order=memory_order_seq_cst) volatile
{
// platform-specific code
}
private:
volatile T i;
};
As noted above, it will usually suffice to specialize on the second
template argument, indicating the size of the data type in bytes.
[section:automatic_buildup Templates for automatic build-up]
Often only a portion of the required operations can be
usefully mapped to machine instructions. Several helper template
classes are provided that can automatically synthesize missing methods to
complete an implementation.
At the minimum, an implementor must provide the
[^load], [^store],
[^compare_exchange_weak] and
[^is_lock_free] methods:
[c++]
template<typename T>
class my_atomic_32 {
public:
my_atomic_32() {}
my_atomic_32(T initial_value) : value(initial_value) {}
T load(memory_order order=memory_order_seq_cst) volatile const
{
// platform-specific code
}
void store(T new_value, memory_order order=memory_order_seq_cst) volatile
{
// platform-specific code
}
bool compare_exchange_weak(T &expected, T desired,
memory_order success_order,
memory_order failure_order) volatile
{
// platform-specific code
}
bool is_lock_free() const volatile {return true;}
protected:
// typedef is required for classes inheriting from this
typedef T integral_type;
private:
T value;
};
The template [^boost::detail::atomic::build_atomic_from_minimal]
can then take care of the rest:
[c++]
template<typename T>
class platform_atomic_integral<T, 4>
: public boost::detail::atomic::build_atomic_from_minimal<my_atomic_32<T> >
{
public:
typedef build_atomic_from_minimal<my_atomic_32<T> > super;
explicit platform_atomic_integral(T v) : super(v) {}
platform_atomic_integral(void) {}
};
There are several helper classes to assist in building "complete"
atomic implementations from different starting points:
* [^build_atomic_from_minimal] requires
* [^load]
* [^store]
* [^compare_exchange_weak] (4-operand version)
* [^build_atomic_from_exchange] requires
* [^load]
* [^store]
* [^compare_exchange_weak] (4-operand version)
* [^compare_exchange_strong] (4-operand version)
* [^exchange]
* [^build_atomic_from_add] requires
* [^load]
* [^store]
* [^compare_exchange_weak] (4-operand version)
* [^compare_exchange_strong] (4-operand version)
* [^exchange]
* [^fetch_add]
* [^build_atomic_from_typical] (<I>supported on gcc only</I>) requires
* [^load]
* [^store]
* [^compare_exchange_weak] (4-operand version)
* [^compare_exchange_strong] (4-operand version)
* [^exchange]
* [^fetch_add_var] (protected method)
* [^fetch_inc] (protected method)
* [^fetch_dec] (protected method)
This will generate a [^fetch_add] method
that calls [^fetch_inc]/[^fetch_dec]
when the given parameter is a compile-time constant
equal to +1 or -1 respectively, and [^fetch_add_var]
in all other cases. This provides a mechanism for
optimizing the extremely common case of an atomic
variable being used as a counter.
The prototypes of these methods to be implemented are:
[c++]
template<typename T>
class my_atomic {
public:
T fetch_inc(memory_order order) volatile;
T fetch_dec(memory_order order) volatile;
T fetch_add_var(T counter, memory_order order) volatile;
};
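Assuming [^my_atomic] provides the members listed for [^build_atomic_from_typical], the wiring then follows the same pattern as for [^build_atomic_from_minimal] above (sketch):
[c++]

template<typename T>
class platform_atomic_integral<T, 4>
: public boost::detail::atomic::build_atomic_from_typical<my_atomic<T> >
{
public:
typedef build_atomic_from_typical<my_atomic<T> > super;
explicit platform_atomic_integral(T v) : super(v) {}
platform_atomic_integral(void) {}
};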
These helper templates are defined in [^boost/atomic/detail/builder.hpp].
[endsect]
[section:automatic_buildup_small Build sub-word-sized atomic data types]
There is one other helper template that can build sub-word-sized
atomic data types even though the underlying architecture allows
only word-sized atomic operations:
[c++]
template<typename T>
class platform_atomic_integral<T, 1> :
public build_atomic_from_larger_type<my_atomic_32<uint32_t>, T>
{
public:
typedef build_atomic_from_larger_type<my_atomic_32<uint32_t>, T> super;
explicit platform_atomic_integral(T v) : super(v) {}
platform_atomic_integral(void) {}
};
The above would create an atomic data type of 1 byte size, and
use masking and shifts to map it to 32-bit atomic operations.
The base type must implement [^load], [^store]
and [^compare_exchange_weak] for this to work.
[endsect]
[section:other_sizes Atomic data types for unusual object sizes]
In unusual circumstances, an implementor may also opt to specialize
[^boost::detail::atomic::platform_atomic<T,S=sizeof(T)>]
to provide support for atomic objects not fitting an integral size.
If you do that, keep the following things in mind:
* There is no reason to ever do this for object sizes of 1, 2, 4 and 8 bytes.
* Only the following methods need to be implemented:
* [^load]
* [^store]
* [^compare_exchange_weak] (4-operand version)
* [^compare_exchange_strong] (4-operand version)
* [^exchange]
The type of the data to be stored in the atomic
variable (template parameter [^T])
is exposed to this class, and the type may have
overloaded assignment and comparison operators --
using these overloaded operators however will result
in an error. The implementor is responsible for
accessing the objects in a way that does not
invoke either of these operators (using e.g.
[^memcpy] or type-casts).
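A hypothetical skeleton of such a specialization for a 16-byte type; the member set follows the list above and the bodies are left as platform-specific placeholders, as in the earlier examples:
[c++]

#include <cstring> // std::memcpy

template<typename T>
class platform_atomic<T, 16>
{
public:
explicit platform_atomic(T v)
{
// copy byte-wise to avoid invoking T's overloaded operators
std::memcpy(storage_, &v, sizeof(T));
}
platform_atomic(void) {}
void store(T v, memory_order order = memory_order_seq_cst) volatile
{
// platform-specific code
}
T load(memory_order order = memory_order_seq_cst) const volatile
{
// platform-specific code
}
T exchange(T v, memory_order order = memory_order_seq_cst) volatile
{
// platform-specific code
}
bool compare_exchange_weak(T & expected, T desired,
memory_order success_order, memory_order failure_order) volatile
{
// platform-specific code
}
bool compare_exchange_strong(T & expected, T desired,
memory_order success_order, memory_order failure_order) volatile
{
// platform-specific code
}
private:
char storage_[16];
};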
[endsect]
[endsect]
[section:platform_atomic_fences Fences]
Platform implementors need to provide a function performing
the action required for [funcref boost::atomic_thread_fence atomic_thread_fence]
(the fallback implementation will just perform an atomic operation
on an integer object). This is achieved by specializing the
[^boost::detail::atomic::platform_atomic_thread_fence] template
function in the following way:
[c++]
template<>
void platform_atomic_thread_fence(memory_order order)
{
// platform-specific code here
}
[endsect]
[section:platform_atomic_puttogether Putting it all together]
The template specializations should be put into a header file
in the [^boost/atomic/detail] directory, preferably
specifying supported compiler and architecture in its name.
The file [^boost/atomic/detail/platform.hpp] must
subsequently be modified to conditionally include the new
header.
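For example, a hypothetical branch added to the existing preprocessor dispatch chain in [^boost/atomic/detail/platform.hpp] (the architecture macro and header name below are placeholders):
[c++]

#elif defined(__GNUC__) && defined(__my_new_arch__)

#include <boost/atomic/detail/gcc-mynewarch.hpp>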
[endsect]