A fast single-producer, single-consumer lock-free queue for C++

Overview

A single-producer, single-consumer lock-free queue for C++

This mini-repository has my very own implementation of a lock-free queue (that I designed from scratch) for C++.

It only supports a two-thread use case (one consuming, and one producing). The threads can't switch roles, though you could use this queue completely from a single thread if you wish (but that would sort of defeat the purpose!).

Note: If you need a general-purpose multi-producer, multi-consumer lock free queue, I have one of those too.

Features

  • Blazing fast
  • Compatible with C++11 (supports moving objects instead of making copies)
  • Fully generic (templated container of any type) -- just like std::queue, you never need to allocate memory for elements yourself (which saves you the hassle of writing a lock-free memory manager to hold the elements you're queueing)
  • Allocates memory up front, in contiguous blocks
  • Provides a try_enqueue method which is guaranteed never to allocate memory (the queue starts with an initial capacity)
  • Also provides an enqueue method which can dynamically grow the size of the queue as needed
  • Also provides try_emplace/emplace convenience methods
  • Has a blocking version with wait_dequeue
  • Completely "wait-free" (no compare-and-swap loop). Enqueue and dequeue are always O(1) (not counting memory allocation)
  • On x86, the memory barriers compile down to no-ops, meaning enqueue and dequeue are just a simple series of loads and stores (and branches)

Use

Simply drop the readerwriterqueue.h and atomicops.h files into your source code and include them :-) A modern compiler is required (MSVC2010+, GCC 4.7+, ICC 13+, or any C++11 compliant compiler should work).

Note: If you're using GCC, you really do need GCC 4.7 or above -- 4.6 has a bug that prevents the atomic fence primitives from working correctly.

Example:

using namespace moodycamel;

ReaderWriterQueue<int> q(100);       // Reserve space for at least 100 elements up front

q.enqueue(17);                       // Will allocate memory if the queue is full
bool succeeded = q.try_enqueue(18);  // Will only succeed if the queue has an empty slot (never allocates)
assert(succeeded);

int number;
succeeded = q.try_dequeue(number);  // Returns false if the queue was empty

assert(succeeded && number == 17);

// You can also peek at the front item of the queue (consumer only)
int* front = q.peek();
assert(*front == 18);
succeeded = q.try_dequeue(number);
assert(succeeded && number == 18);
front = q.peek(); 
assert(front == nullptr);           // Returns nullptr if the queue was empty

The blocking version has the exact same API, with the addition of wait_dequeue and wait_dequeue_timed methods:

BlockingReaderWriterQueue<int> q;

std::thread reader([&]() {
    int item;
#if 1
    for (int i = 0; i != 100; ++i) {
        // Fully-blocking:
        q.wait_dequeue(item);
    }
#else
    for (int i = 0; i != 100; ) {
        // Blocking with timeout
        if (q.wait_dequeue_timed(item, std::chrono::milliseconds(5)))
            ++i;
    }
#endif
});
std::thread writer([&]() {
    for (int i = 0; i != 100; ++i) {
        q.enqueue(i);
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
});
writer.join();
reader.join();

assert(q.size_approx() == 0);

Note that wait_dequeue will block indefinitely while the queue is empty; this means care must be taken to only call wait_dequeue if you're sure another element will come along eventually, or if the queue has a static lifetime. This is because destroying the queue while a thread is waiting on it will invoke undefined behaviour.

CMake installation

As an alternative to including the source files in your project directly, you can use CMake to install the library in your system's include directory:

mkdir build
cd build
cmake ..
make install

Then, you can include it from your source code:

#include <readerwriterqueue/readerwriterqueue.h>

Disclaimers

The queue should only be used on platforms where aligned integer and pointer access is atomic; fortunately, that includes all modern processors (e.g. x86/x86-64, ARM, and PowerPC). Not for use with a DEC Alpha processor (which has very weak memory ordering) :-)

Note that it's only been tested on x86(-64); if someone has access to other processors I'd love to run some tests on anything that's not x86-based.

Finally, I am not an expert. This is my first foray into lock-free programming, and though I'm confident in the code, it's possible that there are bugs despite the effort I put into designing and testing this data structure.

Use this code at your own risk; in particular, lock-free programming is a patent minefield, and this code may very well violate a pending patent (I haven't looked). It's worth noting that I came up with this algorithm and implementation from scratch, independent of any existing lock-free queues.

More info

See the LICENSE.md file for the license (simplified BSD).

My blog post introduces the context that led to this code, and may be of interest if you're curious about lock-free programming.

Comments
  • Don't destruct element after moving in BlockingReaderWriterQueue's inner_dequeue(..)

    Don't destruct element after moving in BlockingReaderWriterQueue's inner_dequeue(..)

    Clang's static analyzer warns that the destructor should not be called on the object that's had its content moved out.

    2021-02-12_19-44

    Given this warning was coming from clang-11, I opted to install clang 12 and 13 to see if it was possibly a false-positive (and fixed in newer releases). This isn't the case and all three versions of clang's static analyzer flag this.

    This appears to be a valid issue, given the element object should be left in a well-defined state even after its contents have been std::moved out.

    For example, the element object could just as easily have more content moved back into it; or it could allocate new content. Calling the destructor on it closes that door for good though, and doesn't leave element in a workable state.

    This logic is also confirmed in the last paragraph here:


    A typical move constructor:

    Holder(Holder&& other) // <-- rvalue reference in input
    {
        m_data = other.m_data;   // (1)
        m_size = other.m_size;
        other.m_data = nullptr;  // (2)
        other.m_size = 0;
    }
    

    It takes in input an rvalue reference to another Holder object. This is the key part: being an rvalue reference, we can modify it.

    So let's steal its data first (1), then set it to null (2). No deep copies here, we have just moved resources around!

    It's important to set the rvalue reference data to some valid state (2) to prevent it from being accidentally deleted when the temporary object dies: our Holder destructor calls delete[] m_data, remember?

    In general, for reasons that will become more clear in a few paragraphs, it's a good idea to always leave the objects being stolen from in some well-defined state.


    opened by kcgen 37
  • Feature request: wait_enqueue for BlockingReaderWriterQueue

    Feature request: wait_enqueue for BlockingReaderWriterQueue

    The BlockingReaderWriterQueue doesn't block the producer when the queue is full, which leads to unbounded queue growth.

    Unbounded queue growth is undesirable in almost every normal system because it leads to runaway usage:

    • CPU: excessive generation-side CPU usage (producing content an unbounded rate)
    • Memory: excessive per-socket memory needs (due to unbounded send-side buffer growth)
    • Time: excessive latency in an audio-feed (due to a unbounded send-side delivery)
    • Time: excessive buffer-bloat in a network-feed (due to a unbounded send-side delivery)

    And ideal system:

    • Consumes when the queue has at least one item and waits when it's empty
    • Produces when space exists in the queue and waits when it's full
    • The queue and its maximum size ties the two together, and sets the overall resource usage characteristics for the system.

    Currently the BlockingReaderWriterQueue meets the first (throttling the consumer), but not mechanism exists to throttle the producer, so currently some form of bandaid code is needed to deal with the second.

    It would be fantastic to have wait_enqueue as an available feature/function!

    Here's some sample code to illustrate the problem.

    g++ test.cpp -pthread && ./a.out

    #include "readerwriterqueue.h"
    #include <thread>
    #include <stdio.h>
    
    using namespace moodycamel;
    
    constexpr auto MAX_QUEUE_DEPTH = 8;
    BlockingReaderWriterQueue<int, MAX_QUEUE_DEPTH> q(MAX_QUEUE_DEPTH);
    
    int create_next_item()
    {
    	static int i = 1;
    	std::this_thread::sleep_for(std::chrono::milliseconds(100));
    	return i++;
    }
    
    void operate_on_item(const int &item)
    {
    	std::this_thread::sleep_for(std::chrono::seconds(1));
    }
    
    int main()
    {
    	printf("Running single-producer, single-consumer with a %d queue depth\n",
    	       MAX_QUEUE_DEPTH);
    
    	std::thread reader([&]() {
    		int item;
    		while (true) {
    			q.wait_dequeue(item);
    			operate_on_item(item);
    			printf("Operated on%3d, %3lu items queued\n", item,
    			       q.size_approx());
    		}
    	});
    
    	std::thread writer([&]() {
    		while (true) {
    			const auto item = create_next_item();
    			q.enqueue(item);
    		}
    	});
    
    	writer.join();
    	reader.join();
    
    	assert(q.size_approx() == 0);
    
    	return 0;
    }
    
    opened by kcgen 21
  • Can it support FreeRTOS?

    Can it support FreeRTOS?

    When compiling in a FreeRTOS (#define FREERTOS 1) environment, the header build reports:

    readerwriterqueue/atomicops.h 581:2: error: #error Unsupported platform! (No semaphore wrapper available)

    In an environment that does not support it, but a libc++ is available, we (should) have the following at least:

    #ifdef __cplusplus
    #if __cplusplus > 202000L
    #include <semaphore>
        std::binary_semaphore wakeup_;
    #else
    #include <condition_variable>
    #include <mutex>
        std::condition_variable wakeup_;
        std::mutex lock_;
    #endif
    #endif
    

    For a direct FreeRTOS implementation, I would expect it to use something like:

    #include <FreeRTOS.h>
    #include <semphr.h>
    #include <task.h>
    *__m = xSemaphoreCreateCounting(max,initial);
    xSemaphoreTake(,*__m, portMAX_DELAY);
    xSemaphoreGive(*__m);
    vSemaphoreDelete(*__m);
    
    opened by salyzyn 14
  • Add emplace() and try_emplace() APIs.

    Add emplace() and try_emplace() APIs.

    Change-Id: I9236fdc4b9c143afd81fed509c8d6d4803878e5a

    Add C++11 style construct-in-place functionality. This is particularly useful for try_emplace(), which may fail if queue is full, as it avoid unnecessary work, and invoking move on the params.

    See also: https://github.com/cameron314/readerwriterqueue/issues/54 http://en.cppreference.com/w/cpp/container/queue/emplace

    opened by mark-99 14
  • ThreadSanitizer complains about vector being move-assigned

    ThreadSanitizer complains about vector being move-assigned

    I'm developing an audio application, where the (realtime) Audio thread and the GUI thread communicate via a ReaderWriterQueue. I have a queue containing objects of type PitchCurveUpdateMessage, which has a field of type std::vector<std::pair<int, ClientsidePitchData>>. I populate the vector in the GUI thread, wrap it in a PitchCurveUpdate object and send it over to the Audio thread via queue.try_enqueue.

    On the audio thread, I have a function called fetch(), which calls try_dequeue, providing a pre-allocated PitchCurveUpdateMessage as the target object to avoid memory allocation. To me, this seems like basic usage of the ReaderWriterQueue, however, XCode 10's ThreadSanitizer reports a data race in my code (see below).

    Is there anything I'm missing when working with std::vectors? Do I need to protect them in any special way? Are vectors not suitable for use in a ReaderWriterQueue, as they allocate their own heap memory?

    Thanks in advance for any help!

    opened by CrushedPixel 11
  • blocking dequeue until enqueue

    blocking dequeue until enqueue

    Is there an efficient way of implementing a blocking dequeue? Am I thinking about it wrong? I don't want my consumer thread spinning if there's no data.

    opened by vlovich 11
  • __memcpy_ssse3_back

    __memcpy_ssse3_back

    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  0x00007eff2dff036b in __memcpy_ssse3_back () from /lib64/libc.so.6
    Missing separate debuginfos, use: debuginfo-install boost-iostreams-1.53.0-27.el7.x86_64 boost-program-options-1.53.0-27.el7.x86_64 boost-regex-1.53.0-27.el7.x86_64 bzip2-libs-1.0.6-13.el7.x86_64 devtoolset-3-elfutils-libelf-0.161-1.el7.x86_64 glibc-2.17-260.el7_6.4.x86_64 libicu-50.1.2-17.el7.x86_64
    (gdb) bt
    #0  0x00007eff2dff036b in __memcpy_ssse3_back () from /lib64/libc.so.6
    #1  0x0000000000480f8e in bool moodycamel::ReaderWriterQueue<(anonymous namespace)::kernel_event, 512ul>::inner_enqueue<(moodycamel::ReaderWriterQueue<(anonymous namespace)::kernel_event, 512ul>::AllocationMode)0, (anonymous namespace)::kernel_event const&>((anonymous namespace)::kernel_event const&) ()
    #2  0x000000000047f801 in moodycamel::ReaderWriterQueue<(anonymous namespace)::kernel_event, 512ul>::enqueue((anonymous namespace)::kernel_event const&) ()
    #3  0x000000000047e51d in moodycamel::BlockingReaderWriterQueue<(anonymous namespace)::kernel_event, 512ul>::enqueue((anonymous namespace)::kernel_event const&) ()
    #4  0x000000000047bcc6 in (anonymous namespace)::event_handler(void*, void*, int) ()
    #5  0x00007eff30046d07 in perf_reader_event_read () from /lib64/libbcc.so.0
    #6  0x00007eff3228dc32 in ebpf::BPFPerfBuffer::poll(int) () from /lib64/libbcc.so.0
    #7  0x000000000047b5bf in (anonymous namespace)::EHeapTrack::poll(void (*)(void*, void*, int), void*, int, std::string const&, int) ()
    #8  0x000000000047bc2d in (anonymous namespace)::EventPollTask::run() ()
    #9  0x000000000047d93f in main::{lambda()#1}::operator()() const ()
    #10 0x0000000000486016 in void std::_Bind_simple<main::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>) ()
    #11 0x0000000000485ea5 in std::_Bind_simple<main::{lambda()#1} ()>::operator()() ()
    #12 0x0000000000485d9a in std::thread::_Impl<std::_Bind_simple<main::{lambda()#1} ()> >::_M_run() ()
    #13 0x00007eff2ea52070 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>)
        at ../../../../../libstdc++-v3/src/c++11/thread.cc:84
    #14 0x00007eff2e270dd5 in start_thread (arg=0x7eff2afd8700) at pthread_create.c:307
    #15 0x00007eff2df99ead in clone () from /lib64/libc.so.6
    

    I have two workers, one worker is the producer and the other one is the consumer, the producer's code like below:

    void event_handler(void *cb_cookie, void *raw, int raw_size)
    {
        auto event = static_cast<kernel_event*>(raw);
        auto ept = static_cast<EventPollTask*>(cb_cookie);
        ept->m_q.enqueue(*event);
    }
    

    And the consumer's code like below:

    void run(void) 
    {
        union kernel_event raw;
        while (stopRequested() == false) {
            m_q.wait_dequeue(raw);
            ...
        }
    }
    

    After I comment out the consumer's code, the producer seems to work well, after I enable consumer's code, in a short time, the program will get SIGSEGV.

    I've no idea how to debug this.

    OS version: Centos 7.6 gcc/g++ version: 4.8.5 or 4.9.2

    opened by ethercflow 10
  • Assertion failed: (!inSection &&

    Assertion failed: (!inSection && "ReaderWriterQueue does not support enqueuing or dequeuing elements from other elements' ctors and dtors")

    This is my pseudo code using asio.

    asio::strand strand;
    ReaderWriterQueue<int> queue;
    
    void doSomething()
    {
      strand.post([]
      {
          int* value = queue.peek();
          // Do something
      }
    }
    
    // EntryPoint
    void Enqueue(int value)
    {
          queue.enqueue(value);
          Process();
    }
    
    
    void Process()
    {
      strand.post([]
      {
          int* value = queue.peek();              // <-- assert failed!!!!!
          if(value)
          {
              // Do something
              queue.pop();
          }
      });
      doSomething();
    }
    

    doSomething() and Process() may work in different threads, but they are obviously being synchronized to asio::strand.

    whats wrong?

    opened by nkari82 10
  • readerwritercircularbuffer.h not included in latest release. Intentional? Why?

    readerwritercircularbuffer.h not included in latest release. Intentional? Why?

    Although it appears to predate that release. Is that an indication that the fixed-sized queue is not ready for prime-times somehow? I'd like a queue as much for parallelism, as for potentially throttling the producer side to not accumulate too much memory in queue (producer fetches large data from the network, and consumer writes it to disk; or the reverse). That's a correct use-case for that variant, no?

    Thanks for any insights on this. --DD

    PS: I've successfully used it, and I'm very happy with the results, but would like confirmation it is under the same BSD license, and ready for prime time.

    PPS: Note that I already have Boost-licensed dependencies, and noticed your multi-thread queue is dual-licensed, but not this single-thread one. Since it's easier to get new dependencies accepted under existing licenses, I'd welcome a similar arrangement. FWIW.

    opened by ddevienne 9
  • Feature request for blocking queue - controlling wait

    Feature request for blocking queue - controlling wait

    When dequeuing on a potentially blocking queue it is often useful to implement a timeout or more generally have a means to wake up the waiting receiver ideally without sending a 'fake' packet through.

    For the send side, could we expose a signal() function to call sema.signal() and remove the check in the dequeue functions that expect a woken thread to always succeed in inner.try_dequeue

    This does however change the stated description of try_dequeue and wait_dequeue so I wonder if there is a better way....

    [Could implement a 'dummy msg', 'timeout' and/or 'done' command packet but that affects the generic nature of the queue's payload template as we'd need to define and recognise those commands.]

    AE_FORCEINLINE void signal()
    {
        sema.signal();
    }
    
    template<typename U>
    bool try_dequeue(U& result)
    {
        if (sema.tryWait()) {
            return inner.try_dequeue(result);
        }
        return false;
    }
    
    template<typename U>
    void wait_dequeue(U& result)
    {
        sema.wait();
        bool success = inner.try_dequeue(result);
        // TODO what to do with result??? Can't convey 'failed' wait
    }
    
    enhancement 
    opened by mintyc 9
  • coredump

    coredump

    I used the queue in my other program quite ok. However in one of my lib class I defined as

    // CTxnQueue.h
    struct CTxnQueue {
    	struct Item {
    		Item() {}
                    ....	
    	};
    
    	CTxnQueue(CTxnArchive& ar) : m_Archive(ar) {}
            // some members
    
    	static moodycamel::BlockingReaderWriterQueue<const Item>   m_txnMsgQueue;
    };
    //CTxnQueue.cpp
    FT_BEGIN_NAMESPACE_DECL
    namespace FtOM {
    moodycamel::BlockingReaderWriterQueue<const CTxnQueue::Item>   CTxnQueue::m_txnMsgQueue;
    };
    FT_END_NAMESPACE_DECL
    
    // FtOMTxnStore.cpp
    template<typename MT, typename AUX>
    FT_INLINE
    void FtOMTxnStore<MT, AUX>::IOThreadRun()
    {
    	while (m_bRun) {
    		CTxnQueue::Item item;
    		CTxnQueue::m_txnMsgQueue.wait_dequeue_timed(item, std::chrono::milliseconds(1)); // line 72
    		auto pArchive = item.m_pArchive;
    		if (pArchive) {
    			auto bRejectFlag = item.m_bRejectFlag;
    			if (!bRejectFlag) {
    				auto pcliMsg = item.m_pcliMsg;
    				if (pcliMsg) {
    					if (pArchive->Write(pcliMsg,item.m_iRptState) <= 0) {
    						m_rlog << LOG_LEVEL::ERROR << LOG_HEADER << "Failed to write to txn file" << endl;
    					}
    				} else {
    					m_rlog << LOG_LEVEL::ERROR << LOG_HEADER << "msg is nullptr" << endl;
    				}
    			} else {
    				if (pArchive->SetRejectFlag(item.m_nFileOffset,item.m_iMsgType, item.m_iRptState) <= 0) {
    					m_rlog << LOG_LEVEL::ERROR << LOG_HEADER << "Failed to SetRejectFlag to txn file" << endl;
    				}
    			}
    		} else {
    			m_rlog << LOG_LEVEL::ERROR << LOG_HEADER << "txn archive is nullptr" << endl;
    			m_bRun = false;
    		}
    	} 
    }
    

    coredump trace:

    #0 load (__m=std::memory_order_relaxed, this=) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/atomic_base.h:396 #1 load (this=) at /fs02/home/thirdparty/readerwriterqueue/atomicops.h:307 #2 tryWait (this=) at /fs02/home/thirdparty/readerwriterqueue/atomicops.h:632 #3 wait (timeout_usecs=1000, this=) at /fs02/home/thirdparty/readerwriterqueue/atomicops.h:648 #4 wait_dequeue_timed<FT_ADP_LIBRARY::FtOM::CTxnQueue::Item> (timeout_usecs=1000, result=..., this=) at /fs02/home/thirdparty/readerwriterqueue/readerwriterqueue.h:834 #5 wait_dequeue_timed<FT_ADP_LIBRARY::FtOM::CTxnQueue::Item, long, std::ratio<1, 1000> > (timeout=..., result=..., this=) at /fs02/home/thirdparty/readerwriterqueue/readerwriterqueue.h:855 #6 FT_ADP_LIBRARY::FtOM::FtOMTxnStore<std::integral_constant<bool, true>, void>::IOThreadRun ( this=0x7f4457820cb8) at /fs02/home/mcheng/workspace/Rel-15-1-0/directfeeds/face.spdlog.59/FtOMAcceptor/FtOMTxnStore.cpp:72 #7 0x000000000077a45f in execute_native_thread_routine () #8 0x00007f4467658dd5 in start_thread () from /lib64/libpthread.so.0 #9 0x00007f4465f94b3d in clone () from /lib64/libc.so.6

    opened by chengm349 8
  • Queue as member variable breaks my program

    Queue as member variable breaks my program

    While trying to make my Video to ascii program multithreaded, by adding this queue i created a bug that i just cant fix and i figured out what is causing it just now. Whenever i have a queue as a member variable in my class an error occurs, and if i comment the member variable declaration out it doesnt happen anymore. Also you can criticize my code overall if you want, im learning

    Here is a video of me reproducing this: https://drive.google.com/file/d/11Y6_ug_H8E413T8e7YkKXet7-9SUaeCH/view?usp=sharing

    Pls help me fixing this bug, here is the code on github: https://github.com/BetterRage/Movie2Ascii.git

    To build it you need ffmpeg, SDL, and spdlog

    opened by BetterRage 2
  • some questions

    some questions

    Hello: if the dequeue work very slowly, the queue will become bigger and bigger, is there some limit or max level for the queue? can we delete the element

    question 
    opened by kanli201904 1
  • Haiku

    Haiku

    This should make the library run on Haiku (https://www.haiku-os.org/). The PR uses Haiku's own API. As Haiku has pthreads support in a different directory, I changed the makefiles of the tests, which pass on x64 (I'll also check x32). They of course still work under linux

    opened by TheZeldakatze 0
  • Select multiple queues

    Select multiple queues

    What is the best way to wait behind multiple queues and being notified when a message is coming from any of them? Actually I want something like select (in linux sockets or go programming language for channels). What I am doing now is iterating over all of them and use the non-blocking api try_dequeue, but this solution is busy wait and not efficient.

    opened by thegreathir 2
  • clang performance

    clang performance

    on x86_64 I bench gcc9 and clang8/9 and I see that clang has poor performance Do you know that behaviour? Is there some options to add in the clang case?

    clang++-9 -std=c++11    -Wpedantic -Wall -DNDEBUG -O3 -g bench.cpp ../tests/common/simplethread.cpp systemtime.cpp -o benchmarks -pthread -Wl,--no-as-needed -lrt
    
    $ ./benchmarks 
                      |----------------  Min -----------------|----------------- Max -----------------|----------------- Avg -----------------|
    Benchmark         |   RWQ   |  BRWCB  |  SPSC   |  Folly  |   RWQ   |  BRWCB  |  SPSC   |  Folly  |   RWQ   |  BRWCB  |  SPSC   |  Folly  | xSPSC | xFolly
    ------------------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+-------+-------
    Raw add           | 0.0005s | 0.0011s | 0.0002s | 0.0001s | 0.0005s | 0.0011s | 0.0002s | 0.0001s | 0.0005s | 0.0011s | 0.0002s | 0.0001s | 0.43x | 0.17x
    Raw remove        | 0.0002s | 0.0010s | 0.0003s | 0.0001s | 0.0002s | 0.0010s | 0.0003s | 0.0001s | 0.0002s | 0.0010s | 0.0003s | 0.0001s | 1.97x | 0.41x
    Raw empty remove  | 0.0027s | 0.0011s | 0.0016s | 0.0015s | 0.0027s | 0.0012s | 0.0017s | 0.0016s | 0.0027s | 0.0012s | 0.0017s | 0.0016s | 0.62x | 0.59x
    Single-threaded   | 0.0043s | 0.0047s | 0.0039s | 0.0038s | 0.0043s | 0.0047s | 0.0039s | 0.0039s | 0.0043s | 0.0047s | 0.0039s | 0.0039s | 0.91x | 0.90x
    Mostly add        | 0.0067s | 0.0176s | 0.0053s | 0.0056s | 0.0068s | 0.0195s | 0.0060s | 0.0057s | 0.0068s | 0.0187s | 0.0058s | 0.0057s | 0.85x | 0.84x
    Mostly remove     | 0.0041s | 0.0059s | 0.0038s | 0.0043s | 0.0042s | 0.0060s | 0.0040s | 0.0044s | 0.0042s | 0.0059s | 0.0039s | 0.0043s | 0.93x | 1.03x
    Heavy concurrent  | 0.0092s | 0.0171s | 0.0046s | 0.0044s | 0.0263s | 0.0721s | 0.0047s | 0.0079s | 0.0179s | 0.0557s | 0.0047s | 0.0069s | 0.26x | 0.39x
    Random concurrent | 0.0103s | 0.0133s | 0.0101s | 0.0103s | 0.0103s | 0.0135s | 0.0101s | 0.0104s | 0.0103s | 0.0134s | 0.0101s | 0.0103s | 0.98x | 1.00x
    
    Average ops/s:
        ReaderWriterQueue:                  260.27 million
        BlockingReaderWriterCircularBuffer: 275.78 million
        SPSC queue:                         295.60 million
        Folly queue:                        562.96 million
    
    
    g++ -std=c++11    -Wpedantic -Wall -DNDEBUG -O3 -g bench.cpp ../tests/common/simplethread.cpp systemtime.cpp -o benchmarks -pthread -Wl,--no-as-needed -lrt
    
    $ ./benchmarks 
                      |----------------  Min -----------------|----------------- Max -----------------|----------------- Avg -----------------|
    Benchmark         |   RWQ   |  BRWCB  |  SPSC   |  Folly  |   RWQ   |  BRWCB  |  SPSC   |  Folly  |   RWQ   |  BRWCB  |  SPSC   |  Folly  | xSPSC | xFolly
    ------------------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+-------+-------
    Raw add           | 0.0001s | 0.0013s | 0.0002s | 0.0002s | 0.0001s | 0.0013s | 0.0003s | 0.0002s | 0.0001s | 0.0013s | 0.0003s | 0.0002s | 1.76x | 1.10x
    Raw remove        | 0.0002s | 0.0010s | 0.0002s | 0.0002s | 0.0002s | 0.0010s | 0.0003s | 0.0002s | 0.0002s | 0.0010s | 0.0003s | 0.0002s | 1.63x | 1.24x
    Raw empty remove  | 0.0022s | 0.0009s | 0.0016s | 0.0011s | 0.0022s | 0.0009s | 0.0017s | 0.0011s | 0.0022s | 0.0009s | 0.0017s | 0.0011s | 0.76x | 0.49x
    Single-threaded   | 0.0046s | 0.0054s | 0.0045s | 0.0045s | 0.0046s | 0.0054s | 0.0046s | 0.0045s | 0.0046s | 0.0054s | 0.0045s | 0.0045s | 0.99x | 0.99x
    Mostly add        | 0.0022s | 0.0170s | 0.0046s | 0.0048s | 0.0023s | 0.0170s | 0.0055s | 0.0049s | 0.0023s | 0.0170s | 0.0050s | 0.0049s | 2.23x | 2.16x
    Mostly remove     | 0.0042s | 0.0046s | 0.0041s | 0.0033s | 0.0042s | 0.0053s | 0.0044s | 0.0034s | 0.0042s | 0.0048s | 0.0043s | 0.0034s | 1.02x | 0.80x
    Heavy concurrent  | 0.0018s | 0.0150s | 0.0048s | 0.0115s | 0.0019s | 0.0256s | 0.0050s | 0.0168s | 0.0018s | 0.0190s | 0.0049s | 0.0149s | 2.68x | 8.09x
    Random concurrent | 0.0127s | 0.0158s | 0.0130s | 0.0130s | 0.0128s | 0.0161s | 0.0130s | 0.0131s | 0.0128s | 0.0160s | 0.0130s | 0.0130s | 1.02x | 1.02x
    
    Average ops/s:
        ReaderWriterQueue:                  504.36 million
        BlockingReaderWriterCircularBuffer: 330.05 million
        SPSC queue:                         293.11 million
        Folly queue:                        452.94 million
    
    ```
    
    opened by mipac 3
Releases(v1.0.6)
Owner
Cameron
Cameron
A fast multi-producer, multi-consumer lock-free concurrent queue for C++11

moodycamel::ConcurrentQueue An industrial-strength lock-free queue for C++. Note: If all you need is a single-producer, single-consumer queue, I have

Cameron 7.4k Jan 3, 2023
C++11 thread safe, multi-producer, multi-consumer blocking queue, stack & priority queue class

BlockingCollection BlockingCollection is a C++11 thread safe collection class that provides the following features: Modeled after .NET BlockingCollect

Code Ex Machina LLC 50 Nov 23, 2022
A bounded multi-producer multi-consumer concurrent queue written in C++11

MPMCQueue.h A bounded multi-producer multi-consumer concurrent queue written in C++11. It's battle hardened and used daily in production: In the Frost

Erik Rigtorp 836 Dec 25, 2022
Forkpool - A bleeding-edge, lock-free, wait-free, continuation-stealing scheduler for C++20

riften::Forkpool A bleeding-edge, lock-free, wait-free, continuation-stealing scheduler for C++20. This project uses C++20's coroutines to implement c

Conor Williams 129 Dec 31, 2022
Awesome-lockfree - A collection of resources on wait-free and lock-free programming

Awesome Lock-Free A collection of resources on wait-free and lock-free programming. ?? ?? ?? Even better resource from MattPD: C++ links: atomics, loc

Erik Rigtorp 1.4k Jan 1, 2023
Simple and fast C library implementing a thread-safe API to manage hash-tables, linked lists, lock-free ring buffers and queues

libhl C library implementing a set of APIs to efficiently manage some basic data structures such as : hashtables, linked lists, queues, trees, ringbuf

Andrea Guzzo 392 Dec 3, 2022
Fast, generalized, implementation of the Chase-Lev lock-free work-stealing deque for C++17

riften::Deque A bleeding-edge lock-free, single-producer multi-consumer, Chase-Lev work stealing deque as presented in the paper "Dynamic Circular Wor

Conor Williams 120 Dec 22, 2022
Concurrency Kit 2.1k Jan 4, 2023
GECOS: A lock-free synchronization mechanism

GECOS GECOS is a lock-free synchronization mechanism, and this repository compares various well-known mechanisms such as RCU and HP (Hazard Pointers).

null 6 Sep 9, 2021
Bikeshed - Lock free hierarchical work scheduler

Branch OSX / Linux / Windows master master bikeshed Lock free hierarchical work scheduler Builds with MSVC, Clang and GCC, header only, C99 compliant,

Dan Engelbrecht 81 Dec 30, 2022
Concurrent-deque - Lock-free concurrent work stealing deque in C++

A lock-free work-stealing deque This is a C++ implementation of the Chase-Lev deque, a concurrent single-producer, multi-consumer queue presented in t

Shubham Lagwankar 30 Jan 7, 2023
Rpmalloc - Public domain cross platform lock free thread caching 16-byte aligned memory allocator implemented in C

rpmalloc - General Purpose Memory Allocator This library provides a public domain cross platform lock free thread caching 16-byte aligned memory alloc

Mattias Jansson 1.7k Jan 5, 2023
Vgpu unlock - Unlock vGPU functionality for consumer grade GPUs.

vgpu_unlock Unlock vGPU functionality for consumer-grade Nvidia GPUs. Important! This tool is not guarenteed to work out of the box in some cases, so

Jonathan Johansson 3.6k Dec 29, 2022
Code from https://queue.acm.org/detail.cfm?id=3448307 unzipped

Copyright (C) 2020-2021 Terence Kelly. All rights reserved. Author contact: tpkelly@acm.org, [email protected], [email protected] Adde

Breck Yunits 21 May 30, 2021
Smart queue that executes tasks in threadpool-like manner

execq execq is kind of task-based approach of processing data using threadpool idea with extended features. It supports different task sources and mai

Vladimir (Alkenso) 33 Dec 22, 2022
Thread pool - Thread pool using std::* primitives from C++17, with optional priority queue/greenthreading for POSIX.

thread_pool Thread pool using std::* primitives from C++11. Also includes a class for a priority thread pool. Requires concepts and C++17, including c

Tyler Hardin 77 Dec 30, 2022
This is a study on how to do create a queue via IPC (inter-process communication)

IPC queue This is a study on how to do create a queue via IPC (inter-process communication). Please take a look at the examples of producer and consum

Tarcísio Zotelli Ferraz 1 Nov 28, 2022
DwThreadPool - A simple, header-only, dependency-free, C++ 11 based ThreadPool library.

dwThreadPool A simple, header-only, dependency-free, C++ 11 based ThreadPool library. Features C++ 11 Minimal Source Code Header-only No external depe

Dihara Wijetunga 27 Oct 28, 2022
🧵 Fast and easy multithreading for React Native using JSI

react-native-multithreading ?? Fast and easy multithreading for React Native using JSI. Installation npm install react-native-multithreading npx pod-i

Marc Rousavy 988 Dec 31, 2022