A fast multi-producer, multi-consumer lock-free concurrent queue for C++11

Overview

moodycamel::ConcurrentQueue

An industrial-strength lock-free queue for C++.

Note: If all you need is a single-producer, single-consumer queue, I have one of those too.

Features

  • Knock-your-socks-off blazing fast performance.
  • Single-header implementation. Just drop it in your project.
  • Fully thread-safe lock-free queue. Use concurrently from any number of threads.
  • C++11 implementation -- elements are moved (instead of copied) where possible.
  • Templated, obviating the need to deal exclusively with pointers -- memory is managed for you.
  • No artificial limitations on element types or maximum count.
  • Memory can be allocated once up-front, or dynamically as needed.
  • Fully portable (no assembly; all is done through standard C++11 primitives).
  • Supports super-fast bulk operations.
  • Includes a low-overhead blocking version (BlockingConcurrentQueue).
  • Exception safe.

Reasons to use

There are not that many full-fledged lock-free queues for C++. Boost has one, but it's limited to objects with trivial assignment operators and trivial destructors, for example. Intel's TBB queue isn't lock-free, and requires trivial constructors too. There are many academic papers that implement lock-free queues in C++, but usable source code is hard to find, and tests even more so.

This queue not only has fewer limitations than others (for the most part), but it's also faster. It's been fairly well-tested, and offers advanced features like bulk enqueueing/dequeueing (which, with my new design, is much faster than one element at a time, approaching and even surpassing the speed of a non-concurrent queue even under heavy contention).

In short, there was a lock-free queue shaped hole in the C++ open-source universe, and I set out to fill it with the fastest, most complete, and well-tested design and implementation I could. The result is moodycamel::ConcurrentQueue :-)

Reasons not to use

The fastest synchronization of all is the kind that never takes place. Fundamentally, concurrent data structures require some synchronization, and that takes time. Every effort was made, of course, to minimize the overhead, but if you can avoid sharing data between threads, do so!

Why use concurrent data structures at all, then? Because they're gosh darn convenient! (And, indeed, sometimes sharing data concurrently is unavoidable.)

My queue is not linearizable (see the next section on high-level design). The foundations of its design assume that producers are independent; if this is not the case, and your producers co-ordinate amongst themselves in some fashion, be aware that the elements won't necessarily come out of the queue in the same order they were put in relative to the ordering formed by that co-ordination (but they will still come out in the order they were put in by any individual producer). If this affects your use case, you may be better off with another implementation; either way, it's an important limitation to be aware of.

My queue is also not NUMA aware, and does a lot of memory re-use internally, meaning it probably doesn't scale particularly well on NUMA architectures; however, I don't know of any other lock-free queue that is NUMA aware (except for SALSA, which is very cool, but has no publicly available implementation that I know of).

Finally, the queue is not sequentially consistent; there is a happens-before relationship between when an element is put in the queue and when it comes out, but other things (such as pumping the queue until it's empty) require more thought to get right in all eventualities, because explicit memory ordering may have to be done to get the desired effect. In other words, it can sometimes be difficult to use the queue correctly. This is why it's a good idea to follow the samples where possible. On the other hand, the upside of this lack of sequential consistency is better performance.

High-level design

Elements are stored internally using contiguous blocks instead of linked lists for better performance. The queue is made up of a collection of sub-queues, one for each producer. When a consumer wants to dequeue an element, it checks all the sub-queues until it finds one that's not empty. All of this is largely transparent to the user of the queue, however -- it mostly just works™.

One particular consequence of this design, however, (which seems to be non-intuitive) is that if two producers enqueue at the same time, there is no defined ordering between the elements when they're later dequeued. Normally this is fine, because even with a fully linearizable queue there'd be a race between the producer threads and so you couldn't rely on the ordering anyway. However, if for some reason you do extra explicit synchronization between the two producer threads yourself, thus defining a total order between enqueue operations, you might expect that the elements would come out in the same total order, which is a guarantee my queue does not offer. At that point, though, there semantically aren't really two separate producers, but rather one that happens to be spread across multiple threads. In this case, you can still establish a total ordering with my queue by creating a single producer token, and using that from both threads to enqueue (taking care to synchronize access to the token, of course, but there was already extra synchronization involved anyway).

I've written a more detailed overview of the internal design, as well as the full nitty-gritty details of the design, on my blog. Finally, the source itself is available for perusal for those interested in its implementation.

Basic use

The entire queue's implementation is contained in one header, concurrentqueue.h. Simply download and include that to use the queue. The blocking version is in a separate header, blockingconcurrentqueue.h, that depends on concurrentqueue.h and lightweightsemaphore.h. The implementation makes use of certain key C++11 features, so it requires a fairly recent compiler (e.g. VS2012+ or g++ 4.8; note that g++ 4.6 has a known bug with std::atomic and is thus not supported). The algorithm implementations themselves are platform independent.

Use it like you would any other templated queue, with the exception that you can use it from many threads at once :-)

Simple example:

#include "concurrentqueue.h"

moodycamel::ConcurrentQueue<int> q;
q.enqueue(25);

int item;
bool found = q.try_dequeue(item);
assert(found && item == 25);

Description of basic methods:

  • ConcurrentQueue(size_t initialSizeEstimate) Constructor which optionally accepts an estimate of the number of elements the queue will hold
  • enqueue(T&& item) Enqueues one item, allocating extra space if necessary
  • try_enqueue(T&& item) Enqueues one item, but only if enough memory is already allocated
  • try_dequeue(T& item) Dequeues one item, returning true if an item was found or false if the queue appeared empty

Note that it is up to the user to ensure that the queue object is completely constructed before being used by any other threads (this includes making the memory effects of construction visible, possibly via a memory barrier). Similarly, it's important that all threads have finished using the queue (and the memory effects have fully propagated) before it is destructed.

There are usually two versions of each method: an "explicit" version that takes a user-allocated per-producer or per-consumer token, and an "implicit" version that works without tokens. Using the explicit methods is almost always faster (though not necessarily by a huge factor). Apart from performance, the primary distinction between them is their sub-queue allocation behaviour for enqueue operations: the implicit enqueue methods automatically allocate a thread-local producer sub-queue (which is marked for reuse once the thread exits), whereas explicit producers are tied directly to their tokens' lifetimes (and are also recycled as needed).

Full API (pseudocode):

# Allocates more memory if necessary
enqueue(item) : bool
enqueue(prod_token, item) : bool
enqueue_bulk(item_first, count) : bool
enqueue_bulk(prod_token, item_first, count) : bool

# Fails if not enough memory to enqueue
try_enqueue(item) : bool
try_enqueue(prod_token, item) : bool
try_enqueue_bulk(item_first, count) : bool
try_enqueue_bulk(prod_token, item_first, count) : bool

# Attempts to dequeue from the queue (never allocates)
try_dequeue(item&) : bool
try_dequeue(cons_token, item&) : bool
try_dequeue_bulk(item_first, max) : size_t
try_dequeue_bulk(cons_token, item_first, max) : size_t

# If you happen to know which producer you want to dequeue from
try_dequeue_from_producer(prod_token, item&) : bool
try_dequeue_bulk_from_producer(prod_token, item_first, max) : size_t

# A not-necessarily-accurate count of the total number of elements
size_approx() : size_t

Blocking version

As mentioned above, a full blocking wrapper of the queue is provided that adds wait_dequeue and wait_dequeue_bulk methods in addition to the regular interface. This wrapper is extremely low-overhead, though slightly slower than the non-blocking queue (due to the necessary bookkeeping involving a lightweight semaphore).

There are also timed versions that allow a timeout to be specified (either in microseconds or with a std::chrono object).

The only major caveat with the blocking version is that you must be careful not to destroy the queue while somebody is waiting on it. This generally means you need to know for certain that another element is going to come along before you call one of the blocking methods. (To be fair, the non-blocking version cannot be destroyed while in use either, but it can be easier to coordinate the cleanup.)

Blocking example:

#include "blockingconcurrentqueue.h"

moodycamel::BlockingConcurrentQueue<int> q;
std::thread producer([&]() {
    for (int i = 0; i != 100; ++i) {
        std::this_thread::sleep_for(std::chrono::milliseconds(i % 10));
        q.enqueue(i);
    }
});
std::thread consumer([&]() {
    for (int i = 0; i != 100; ++i) {
        int item;
        q.wait_dequeue(item);
        assert(item == i);
        
        if (q.wait_dequeue_timed(item, std::chrono::milliseconds(5))) {
            ++i;
            assert(item == i);
        }
    }
});
producer.join();
consumer.join();

assert(q.size_approx() == 0);

Advanced features

Tokens

The queue can take advantage of extra per-producer and per-consumer storage if it's available to speed up its operations. This takes the form of "tokens": You can create a consumer token and/or a producer token for each thread or task (tokens themselves are not thread-safe), and use the methods that accept a token as their first parameter:

moodycamel::ConcurrentQueue<int> q;

moodycamel::ProducerToken ptok(q);
q.enqueue(ptok, 17);

moodycamel::ConsumerToken ctok(q);
int item;
q.try_dequeue(ctok, item);
assert(item == 17);

If you happen to know which producer you want to consume from (e.g. in a single-producer, multi-consumer scenario), you can use the try_dequeue_from_producer methods, which accept a producer token instead of a consumer token, and cut some overhead.

Note that tokens work with the blocking version of the queue too.

When producing or consuming many elements, the most efficient way is to:

  1. Use the bulk methods of the queue with tokens
  2. Failing that, use the bulk methods without tokens
  3. Failing that, use the single-item methods with tokens
  4. Failing that, use the single-item methods without tokens

Having said that, don't create tokens willy-nilly -- ideally there would be one token (of each kind) per thread. The queue will work with what it is given, but it performs best when used with tokens.

Note that tokens aren't actually tied to any given thread; it's not technically required that they be local to the thread, only that they be used by a single producer/consumer at a time.

Bulk operations

Thanks to the novel design of the queue, it's just as easy to enqueue/dequeue multiple items as it is to do one at a time. This means that overhead can be cut drastically for bulk operations. Example syntax:

moodycamel::ConcurrentQueue<int> q;

int items[] = { 1, 2, 3, 4, 5 };
q.enqueue_bulk(items, 5);

int results[5];     // Could also be any iterator
size_t count = q.try_dequeue_bulk(results, 5);
for (size_t i = 0; i != count; ++i) {
    assert(results[i] == items[i]);
}

Preallocation (correctly using try_enqueue)

try_enqueue, unlike just plain enqueue, will never allocate memory. If there's not enough room in the queue, it simply returns false. The key to using this method properly, then, is to ensure enough space is pre-allocated for your desired maximum element count.

The constructor accepts a count of the number of elements that it should reserve space for. Because the queue works with blocks of elements, however, and not individual elements themselves, the value to pass in order to obtain an effective number of pre-allocated element slots is non-obvious.

First, be aware that the count passed is rounded up to the next multiple of the block size. Note that the default block size is 32 (this can be changed via the traits). Second, once a slot in a block has been enqueued to, that slot cannot be re-used until the rest of the block has been completely filled up and then completely emptied. This affects the number of blocks you need in order to account for the overhead of partially-filled blocks. Third, each producer (whether implicit or explicit) claims and recycles blocks in a different manner, which again affects the number of blocks you need to account for a desired number of usable slots.

Suppose you want the queue to be able to hold at least N elements at any given time. Without delving too deep into the rather arcane implementation details, here are some simple formulas for the number of elements to request for pre-allocation in such a case. Note the division is intended to be arithmetic division and not integer division (in order for ceil() to work).

For explicit producers (using tokens to enqueue):

(ceil(N / BLOCK_SIZE) + 1) * MAX_NUM_PRODUCERS * BLOCK_SIZE

For implicit producers (no tokens):

(ceil(N / BLOCK_SIZE) - 1 + 2 * MAX_NUM_PRODUCERS) * BLOCK_SIZE

When using mixed producer types:

((ceil(N / BLOCK_SIZE) - 1) * (MAX_EXPLICIT_PRODUCERS + 1) + 2 * (MAX_IMPLICIT_PRODUCERS + MAX_EXPLICIT_PRODUCERS)) * BLOCK_SIZE

If these formulas seem rather inconvenient, you can use the constructor overload that accepts the minimum number of elements (N) and the maximum number of explicit and implicit producers directly, and let it do the computation for you.

Finally, it's important to note that because the queue is only eventually consistent and takes advantage of weak memory ordering for speed, there's always a possibility that under contention try_enqueue will fail even if the queue is correctly pre-sized for the desired number of elements. (e.g. A given thread may think that the queue's full even when that's no longer the case.) So no matter what, you still need to handle the failure case (perhaps looping until it succeeds), unless you don't mind dropping elements.

Exception safety

The queue is exception safe, and will never become corrupted if used with a type that may throw exceptions. The queue itself never throws any exceptions (operations fail gracefully (return false) if memory allocation fails instead of throwing std::bad_alloc).

It is important to note that the guarantees of exception safety only hold if the element type never throws from its destructor, and that any iterators passed into the queue (for bulk operations) never throw either. Note that in particular this means std::back_inserter iterators must be used with care, since the vector being inserted into may need to allocate and throw a std::bad_alloc exception from inside the iterator; so be sure to reserve enough capacity in the target container first if you do this.

The guarantees are presently as follows:

  • Enqueue operations are rolled back completely if an exception is thrown from an element's constructor. For bulk enqueue operations, this means that elements are copied instead of moved (in order to avoid having only some of the objects be moved in the event of an exception). Non-bulk enqueues always use the move constructor if one is available.
  • If the assignment operator throws during a dequeue operation (both single and bulk), the element(s) are considered dequeued regardless. In such a case, the dequeued elements are all properly destructed before the exception is propagated, but there's no way to get the elements themselves back.
  • Any exception that is thrown is propagated up the call stack, at which point the queue is in a consistent state.

Note: If any of your type's copy constructors/move constructors/assignment operators don't throw, be sure to annotate them with noexcept; this will avoid the exception-checking overhead in the queue where possible (even with zero-cost exceptions, there's still a code size impact that has to be taken into account).

Traits

The queue also supports a traits template argument which defines various types, constants, and the memory allocation and deallocation functions that are to be used by the queue. The typical pattern to providing your own traits is to create a class that inherits from the default traits and override only the values you wish to change. Example:

struct MyTraits : public moodycamel::ConcurrentQueueDefaultTraits
{
	static const size_t BLOCK_SIZE = 256;		// Use bigger blocks
};

moodycamel::ConcurrentQueue<int, MyTraits> q;

How to dequeue types without calling the constructor

The normal way to dequeue an item is to pass in an existing object by reference, which is then assigned to internally by the queue (using the move-assignment operator if possible). This can pose a problem for types that are expensive to construct or don't have a default constructor; fortunately, there is a simple workaround: Create a wrapper class that copies the memory contents of the object when it is assigned by the queue (a poor man's move, essentially). Note that this only works if the object contains no internal pointers. Example:

struct MyObjectMover {
    inline void operator=(MyObject&& obj) {
        std::memcpy(data, &obj, sizeof(MyObject));
        
        // TODO: Cleanup obj so that when it's destructed by the queue
        // it doesn't corrupt the data of the object we just moved it into
    }

    inline MyObject& obj() { return *reinterpret_cast<MyObject*>(data); }

private:
    alignas(alignof(MyObject)) char data[sizeof(MyObject)];
};

A less dodgy alternative, if moves are cheap but default construction is not, is to use a wrapper that defers construction until the object is assigned, enabling use of the move constructor:

struct MyObjectMover {
    inline void operator=(MyObject&& x) {
        new (data) MyObject(std::move(x));
        created = true;
    }

    inline MyObject& obj() {
        assert(created);
        return *reinterpret_cast<MyObject*>(data);
    }

    ~MyObjectMover() {
        if (created)
            obj().~MyObject();
    }

private:
    alignas(alignof(MyObject)) char data[sizeof(MyObject)];
    bool created = false;
};

Samples

There are some more detailed samples here. The source of the unit tests and benchmarks are available for reference as well.

Benchmarks

See my blog post for some benchmark results (including versus boost::lockfree::queue and tbb::concurrent_queue), or run the benchmarks yourself (requires MinGW and certain GnuWin32 utilities to build on Windows, or a recent g++ on Linux):

cd build
make benchmarks
bin/benchmarks

The short version of the benchmarks is that it's so fast (especially the bulk methods), that if you're actually using the queue to do anything, the queue won't be your bottleneck.

Tests (and bugs)

I've written quite a few unit tests as well as a randomized long-running fuzz tester. I also ran the core queue algorithm through CDSChecker, a model checker for the C++11 memory model. Some of the inner algorithms were tested separately using the Relacy model checker, and full integration tests were also performed with Relacy. I've tested on Linux (Fedora 19) and Windows (7), but only on x86 processors so far (Intel and AMD). The code was written to be platform-independent, however, and should work across all processors and OSes.

Due to the complexity of the implementation and the difficult-to-test nature of lock-free code in general, there may still be bugs. If anyone is seeing buggy behaviour, I'd like to hear about it! (Especially if a unit test for it can be cooked up.) Just open an issue on GitHub.

Using vcpkg

You can download and install moodycamel::ConcurrentQueue using the vcpkg dependency manager:

git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
./vcpkg integrate install
vcpkg install concurrentqueue

The moodycamel::ConcurrentQueue port in vcpkg is kept up to date by Microsoft team members and community contributors. If the version is out of date, please create an issue or pull request on the vcpkg repository.

License

I'm releasing the source of this repository (with the exception of third-party code, i.e. the Boost queue (used in the benchmarks for comparison), Intel's TBB library (ditto), CDSChecker, Relacy, and Jeff Preshing's cross-platform semaphore, which all have their own licenses) under a simplified BSD license. I'm also dual-licensing under the Boost Software License. See the LICENSE.md file for more details.

Note that lock-free programming is a patent minefield, and this code may very well violate a pending patent (I haven't looked), though it does not to my present knowledge. I did design and implement this queue from scratch.

Diving into the code

If you're interested in the source code itself, it helps to have a rough idea of how it's laid out. This section attempts to describe that.

The queue is formed of several basic parts (listed here in roughly the order they appear in the source). There's the helper functions (e.g. for rounding to a power of 2). There's the default traits of the queue, which contain the constants and malloc/free functions used by the queue. There's the producer and consumer tokens. Then there's the queue's public API itself, starting with the constructor, destructor, and swap/assignment methods. There's the public enqueue methods, which are all wrappers around a small set of private enqueue methods found later on. There's the dequeue methods, which are defined inline and are relatively straightforward.

Then there's all the main internal data structures. First, there's a lock-free free list, used for recycling spent blocks (elements are enqueued to blocks internally). Then there's the block structure itself, which has two different ways of tracking whether it's fully emptied or not (remember, given two parallel consumers, there's no way to know which one will finish first) depending on where it's used. Then there's a small base class for the two types of internal SPMC producer queues (one for explicit producers that holds onto memory but attempts to be faster, and one for implicit ones which attempt to recycle more memory back into the parent but is a little slower). The explicit producer is defined first, then the implicit one. They both contain the same general four methods: One to enqueue, one to dequeue, one to enqueue in bulk, and one to dequeue in bulk. (Obviously they have constructors and destructors too, and helper methods.) The main difference between them is how the block handling is done (they both use the same blocks, but in different ways, and map indices to them in different ways).

Finally, there's the miscellaneous internal methods: There's the ones that handle the initial block pool (populated when the queue is constructed), and an abstract block pool that comprises the initial pool and any blocks on the free list. There's ones that handle the producer list (a lock-free add-only linked list of all the producers in the system). There's ones that handle the implicit producer lookup table (which is really a sort of specialized TLS lookup). And then there's some helper methods for allocating and freeing objects, and the data members of the queue itself, followed lastly by the free-standing swap functions.

Comments
  • static_assert failed

    static_assert failed "The queue does not support super-aligned types at this time" with T=std::function (iOS)

    I tried to use ConcurrentQueue in our project as a drop-in replacement of our crude std::queue+std::mutex queue, but unfortunately I've hit a snag on iOS.

    When instantiating the queue with std::function<void(void)> that assertion in the title fails while building on Xcode7 for iOS, where it turns out that a lot of types have a bigger alignment than max_align_t:

    • std::alignment_of <std::function<void()>>::value == 8
    • std::alignment_of <details::max_align_t>::value == 4

    I thought of opening this issue because I feel that the solution here might be more trivial than having to implement support for overaligned types. I'm not sure of how the static_cast could be fixed, but perhaps max_align_t isn't the right alignment to check there? I'm pretty sure that malloc's alignment on iOS is actually 16 bytes, so the code should just work.

    opened by Tomcc 33
  • Queue allocation space

    Queue allocation space

    I allocated a size to the queue, but after initializing the queue, there is no space allocated, but the space is allocated only when it is actually used. Is there a way to allocate space immediately

    opened by GREATwo 28
  • <concurrentqueue.h> not found

    not found

    I import the project via CPM.cmake and I use it like:

    CPMAddPackage(
      NAME concurrentqueue
      GIT_TAG 07534961616f00728a99cbe864c5833c445cfd9b
      GITHUB_REPOSITORY cameron314/concurrentqueue
    )
    ...
    target_link_libraries(${PROJECT_NAME} PRIVATE concurrentqueue)
    

    but, when I try to actually use the header, it doesn't find it.

    I include the header like:

    #include <concurrentqueue.h>
    // also tried: #include <concurrent_queue.h>
    // also tried: #include <moodycamel/concurrentqueue.h>
    

    What do I do?

    opened by mscofield0 21
  • Concurrent Queue As MultipleProducer and Single Consumer

    Concurrent Queue As MultipleProducer and Single Consumer

    @cameron314

    I am using this Concurrent Queue and there are some anomalies I want to understand.

    I have created a single Concurrent Queue and its handle has been shared among Multiple Producer Threads, which means that there are Multiple Producers producing the data to a Single Thread I am doing like queueHandle->try_enqueue(myTempPbject)

    And there is **single Consumer Thread ** which is consuming the data from the queue, What I have observed that RAM keep on growing, it started with 144 MB Ram and with in 20 minutes shoot to 170 MB RAM.
    The Structure is of size 156 bytes . Please tell me how to overcome this.

    cannot-reproduce 
    opened by abhit011 19
  • iOS

    iOS

    I would like to use the concurrent queue in an iOS project. However, when I try to compile I get the following error:

    concurrentqueue.h:103:49: Thread-local storage is not supported for the current target

    The problem is in this line:

    static inline thread_id_t thread_id() { static MOODYCAMEL_THREADLOCAL int x; return reinterpret_cast<thread_id_t>(&x); }
    

    For now I've found a quick solution by creating a file concurrentqueue.mm with the following contents:

    #include "concurrentqueue.h"
    #import <Foundation/Foundation.h>
    
    moodycamel::details::thread_id_t moodycamel::details::thread_id() { return reinterpret_cast<thread_id_t>([NSThread currentThread]); }
    

    ...adding a conditional include to the top of concurrentqueue.h:

    #if defined(__APPLE__)
    #include "TargetConditionals.h"
    #endif
    

    ...and replacing the problematic line by this code:

    #if defined(__APPLE__) && (defined(TARGET_IPHONE_SIMULATOR) || defined(TARGET_OS_IPHONE))
        thread_id_t thread_id();
    #else
        static inline thread_id_t thread_id() { static MOODYCAMEL_THREADLOCAL int x; return reinterpret_cast<thread_id_t>(&x); }
    #endif
    

    This seems to work, but it's not as nice as just having one header file. I can't think of a better solution right now, but maybe we can discuss it here.

    opened by martinfinke 19
  • how does consumer retrieve from the queue

    how does consumer retrieve from the queue

    In the document, the author mentioned that there are multiple sub-queues owned by each producer. But it doesn't detail how consumer get an item. Since there are multiple sub-queues, does a consumer go over them in certain order to find an un-empty one?

    Our experiment shows a weird behavior. First, we have one thread to enqueue a number of items. Then, the same thread retrieve items, which will be enqueued back by some other threads. What we observe is that one item has 90% of chance to be retrieved.

    Can you explain what happened?

    opened by tobyfan1980 15
  • Program received signal SIGSEGV, Segmentation fault  while using producer tokens

    Program received signal SIGSEGV, Segmentation fault while using producer tokens

    When I use moodycamel::ConcurrentQueuestd::string with 4 consumer and 4 producer threads with initial size 1024*1024.

    NOTE:

    1. This worked fine while using data type as int.
    2. This work fine while using non producer tokens
    3. I have undefined MOODYCAMEL_CPP11_THREAD_LOCAL_SUPPORTED
    4. Using GCC 4.9
    5. Darwin 7831c1c1501a 14.0.0 Darwin Kernel Version 14.0.0: Fri Sep 19 00:26:44 PDT 2014; root:xnu-2782.1.97~2/RELEASE_X86_64 x86_64
    Program received signal SIGSEGV, Segmentation fault.
    0x000000010000363e in moodycamel::ConcurrentQueue<std::string,     moodycamel::ConcurrentQueueDefaultTraits>::ExplicitProducer::~ExplicitProducer() ()
    (gdb) bt
    #0  0x000000010000363e in moodycamel::ConcurrentQueue<std::string,     moodycamel::ConcurrentQueueDefaultTraits>::ExplicitProducer::~ExplicitProducer() ()
    #1  0x00000001000028b1 in test_1000() ()
    #2  0x0000000100008599 in main ()
    
    bug 
    opened by rohitjoshi 15
  • Possible data race using `BlockingConcurrentQueue::wait_dequeue` detected by thread sanitizer

    Possible data race using `BlockingConcurrentQueue::wait_dequeue` detected by thread sanitizer

    I have come across a situation where ThreadSanitizer detects a data race after moving an object inside a moodycamel::BlockingConcurrentQueue. The object has non-default move-constructor and move-assignment operators.

    I apologize in advance if I am doing something wrong/stupid in my code - it's likely that the race condition is caused because of a mistake I may have made.

    Here's a minimal example, also available on gist.

    #include <array>
    #include <atomic>
    #include "./blockingconcurrentqueue.h"
    
    using namespace std::chrono_literals;
    
    // global counter
    // * the program can exit when it reaches 1000
    // * incremented by tasks
    std::atomic<int> ctr{0};
    
    // task 
    // * non-copyable
    // * increments `ctr` on construction
    // * transfers some mock state `x` on moves
    struct task
    {
        int x = 0;
    
        task()
        {
            ++ctr;
        }
    
        task(const task&) = delete;
        task& operator=(const task&) = delete;
    
        task(task&& rhs) : x{rhs.x}
        {
        }
    
        task& operator=(task&& rhs)
        {
            x = rhs.x;
            return *this;
        }
    };
    
    // task queue
    moodycamel::BlockingConcurrentQueue<task> q;
    
    // worker
    // * contains thread that constantly calls `wait_dequeue`
    // * on dtor, spawns tasks until thread is joined
    struct worker
    {
        std::thread th;
        std::atomic<bool> running{true};
        std::atomic<bool> exited{false};
    
        worker()
        {
            th = std::thread([this]
                {
                    task t;
                    while(running)
                    {
                        q.wait_dequeue(t);
                    }
                    exited = true;
                });
        }
    
        ~worker()
        {
            running = false;
    
            while(!exited)
            {
                q.enqueue(task{});
                std::this_thread::sleep_for(1ms);
            }
    
            th.join();
        }
    };
    
    int main()
    {
        // create 8 workers
        std::array<worker, 8> ws;
    
        // enqueue 1000 tasks
        for(int i = 0; i < 1000; ++i)
        {
            q.enqueue(task{});
        }
    
        // wait for `ctr` to be `>= 1000`
        while(ctr < 1000)
        {
            std::this_thread::sleep_for(1ms);
        }
    }
    

    The worker instances simulate a thread pool. The task type simulates a movable callable object (like std::function).

    When compiling with -fsanitize=thread, both g++ 6.1.1 and clang++ 3.8.0 report a huge number of the following data race (also available on the same gist):

    WARNING: ThreadSanitizer: data race (pid=15381)
      Read of size 4 at 0x7d340000cf48 by thread T7:
        #0 task::operator=(task&&) <null> (a.out+0x000000402c36)
        #1 bool moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::ImplicitProducer::dequeue<task>(task&) <null> (a.out+0x000000406279)
        #2 bool moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::ProducerBase::dequeue<task>(task&) <null> (a.out+0x000000405340)
        #3 bool moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::try_dequeue<task>(task&) <null> (a.out+0x0000004044e9)
        #4 void moodycamel::BlockingConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::wait_dequeue<task>(task&) <null> (a.out+0x000000403303)
        #5 worker::worker()::{lambda()#1}::operator()() const <null> (a.out+0x000000402ce6)
        #6 void std::_Bind_simple<worker::worker()::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>) <null> (a.out+0x00000040a1a4)
        #7 std::_Bind_simple<worker::worker()::{lambda()#1} ()>::operator()() <null> (a.out+0x00000040a0dd)
        #8 std::thread::_State_impl<std::_Bind_simple<worker::worker()::{lambda()#1} ()> >::_M_run() <null> (a.out+0x00000040a018)
        #9 execute_native_thread_routine /build/gcc-multilib/src/gcc/libstdc++-v3/src/c++11/thread.cc:83 (libstdc++.so.6+0x0000000baaae)
    
      Previous write of size 8 at 0x7d340000cf48 by main thread:
        #0 malloc /build/gcc-multilib/src/gcc/libsanitizer/tsan/tsan_interceptors.cc:538 (libtsan.so.0+0x000000026aac)
        #1 moodycamel::ConcurrentQueueDefaultTraits::malloc(unsigned long) <null> (a.out+0x000000402102)
        #2 moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::Block* moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::create<moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::Block>() <null> (a.out+0x00000040880b)
        #3 moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::Block* moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::requisition_block<(moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::AllocationMode)0>() <null> (a.out+0x000000407add)
        #4 bool moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::ImplicitProducer::enqueue<(moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::AllocationMode)0, task>(task&&) <null> (a.out+0x000000406ed4)
        #5 bool moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::inner_enqueue<(moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::AllocationMode)0, task>(task&&) <null> (a.out+0x0000004057eb)
        #6 moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::enqueue(task&&) <null> (a.out+0x0000004049c6)
        #7 moodycamel::BlockingConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::enqueue(task&&) <null> (a.out+0x00000040343e)
        #8 main <null> (a.out+0x0000004016dd)
    
      Location is heap block of size 208 at 0x7d340000cf30 allocated by main thread:
        #0 malloc /build/gcc-multilib/src/gcc/libsanitizer/tsan/tsan_interceptors.cc:538 (libtsan.so.0+0x000000026aac)
        #1 moodycamel::ConcurrentQueueDefaultTraits::malloc(unsigned long) <null> (a.out+0x000000402102)
        #2 moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::Block* moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::create<moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::Block>() <null> (a.out+0x00000040880b)
        #3 moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::Block* moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::requisition_block<(moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::AllocationMode)0>() <null> (a.out+0x000000407add)
        #4 bool moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::ImplicitProducer::enqueue<(moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::AllocationMode)0, task>(task&&) <null> (a.out+0x000000406ed4)
        #5 bool moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::inner_enqueue<(moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::AllocationMode)0, task>(task&&) <null> (a.out+0x0000004057eb)
        #6 moodycamel::ConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::enqueue(task&&) <null> (a.out+0x0000004049c6)
        #7 moodycamel::BlockingConcurrentQueue<task, moodycamel::ConcurrentQueueDefaultTraits>::enqueue(task&&) <null> (a.out+0x00000040343e)
        #8 main <null> (a.out+0x0000004016dd)
    
      Thread T7 (tid=15389, running) created by main thread at:
        #0 pthread_create /build/gcc-multilib/src/gcc/libsanitizer/tsan/tsan_interceptors.cc:876 (libtsan.so.0+0x000000028360)
        #1 __gthread_create /build/gcc-multilib/src/gcc-build/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu/bits/gthr-default.h:662 (libstdc++.so.6+0x0000000badc4)
        #2 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) /build/gcc-multilib/src/gcc/libstdc++-v3/src/c++11/thread.cc:163 (libstdc++.so.6+0x0000000badc4)
        #3 worker::worker() <null> (a.out+0x000000402da3)
        #4 std::array<worker, 8ul>::array() <null> (a.out+0x000000402f80)
        #5 main <null> (a.out+0x0000004016b0)
    
    SUMMARY: ThreadSanitizer: data race (a.out+0x402c36) in task::operator=(task&&)
    

    It seems that the data race occurs during the move operations, between queuing and dequeuing task instances.

    Unless I am missing something obvious, I do not see why a data race should occur when moving the object in and out the queue. Is this a false positive or a potential problem in the blocking queue implementation?

    opened by vittorioromeo 14
  • Fuzz tests on Android fail

    Fuzz tests on Android fail

    Hi, I'm evaluating ConcurrentQueue for use on Android, in a C++ app.

    I tried running the fuzz tests (well, basically a chopped-down version of the main in fuzztests.cpp), and some of them failed. The failures occur at two different points in the same type of test (core_thread_local). Here's some data, grouped by type of error:

    AFTER 672 ITERATIONS
     multithread_produce: 232 successful, 0 failed
     multithread_consume: 240 successful, 0 failed
     multithread_produce_and_consume: 192 successful, 0 failed
     completely_random: 208 successful, 0 failed
     core_add_only_list: 252 successful, 0 failed
     core_thread_local: 197 successful, 14 failed
     - Executed 1335 tests so far
    
    Reason: assertion failed on line 593: p->value == 0
     Seed: d7990b4e4dea1895
     Seed: 96fe1c7d0392537b
     Seed: 139af5f3d1c7e4bb
     Seed: d013a74b4faa2d1b
     Seed: 7b0ceb690c3286d9
     Seed: 955a9db3764ac57b
     Seed: 09efdf685ad77030
    
    Reason: assertion failed on line 611: *localData[i] == i
     Seed: 3eab167012cb7d8c
     Seed: c448fd7c0b9725fc
     Seed: 4e5b54df425bbbb6
     Seed: 639d2c83865d2b47
     Seed: f9ae9a0365bec08c
     Seed: c3bd84c3c96e2f17
     Seed: 6de7a43c21a7bd6e
    

    I'm not sure how to interpret these results yet; I will try having a closer look at the code, but in the meanwhile I thought it wouldn't hurt to ask and share what I got. Any pointers would be much appreciated, thanks!

    As a reference, here's the test function I "assembled":

    bool FuzzyTestMoodyCamelConcurrentQueue()
    {
        uint32_t iteration = 0;
        uint64_t seed = 0;
        int result = 0;
        test_type test;
        const char* failReason;
    
        while (true)
        {
            seed = (static_cast<uint64_t>(std::time(NULL)) << 32) | iteration++;
            // MurmurHash3 64-bit finalizer
            seed ^= seed >> 33;
            seed *= 0xff51afd7ed558ccd;
            seed ^= seed >> 33;
            seed *= 0xc4ceb9fe1a85ec53;
    
            g_seed.store(seed, std::memory_order_release);
    
            try {
                result = run_test(seed, 2, test, failReason);
            }
            catch (std::exception const& e) {
    
                PRINTMSG("*** Exception : %s\n      Seed: %08x%08x\n      Test: %s\n\n", e.what(), (uint32_t)(seed >> 32), (uint32_t)(seed), test_names[test]);
                return false;
            }
            catch (...) {
    
                PRINTMSG("*** Unknown exception thrown!\n      Seed: %08x%08x\n      Test: %s\n\n", (uint32_t)(seed >> 32), (uint32_t)(seed), test_names[test]);
                return false;
            }
    
            if (!result) {
                result = 1;
                PRINTMSG("*** Failure detected!\n      Seed: %08x%08x\n      Test: %s\n      Reason: %s\n", (uint32_t)(seed >> 32), (uint32_t)(seed), test_names[test], failReason);
            }
    
            if ((iteration & 15) == 0) {
    
                std::uint64_t total = 0;
                PRINTMSG("AFTER %u ITERATIONS",iteration);
                for (int i = 0; i != TEST_TYPE_COUNT; ++i)
                {
                    PRINTMSG(" %s: %llu successful, %llu failed\n", test_names[i], (unsigned long long)(test_count[i] - fail_count[i]), (unsigned long long)fail_count[i]);
                    total += test_count[i];
                }
                PRINTMSG(" - Executed %llu tests so far", (unsigned long long)total);
            }
        }
    }
    

    where PRINTMSG is a macro defined to the android native log function.

    bug 
    opened by sonicdebris 14
  • recycling implicit producers (introduced in e53f28cb3) causes heap_use_after_free error

    recycling implicit producers (introduced in e53f28cb3) causes heap_use_after_free error

    Location of error: concurrentqueue.h:412, identified by g++ AddressSanitizer.

    Environment: g++ 4.8.2-19ubuntu1, compiling with std=c++11, O2, -fopenmp, pthreads

    I can consistently replicate the behavior in my code, but cannot pinpoint the location of the error since this occurs at thread termination. Downgrading to the previous commit 4671562 resolves the problem. The ConcurrentQueue unit tests pass fine (but they use std::thread). I would like to be able to use your latest version of the code, so it'd be nice to figure out why this is happening.

    ==5558== ERROR: AddressSanitizer: heap-use-after-free on address 0x601800023fe8 at pc 0x7f622d6b12b1 bp 0x7f62284cbdc0 sp 0x7f62284cbdb8
    READ of size 8 at 0x601800023fe8 thread T16777215
        #0 0x7f622d6b12b0 in moodycamel::details::ThreadExitNotifier::~ThreadExitNotifier() /home/tpan/src/bliss/ext/concurrentqueue/concurrentqueue.h:412
        #1 0x7f622a249d78 in run /build/buildd/gcc-4.8-4.8.2/build/x86_64-linux-gnu/libstdc++-v3/libsupc++/../../../../src/libstdc++-v3/libsupc++/atexit_thread.cc:64
        #2 0x7f62297ebf81 in __nptl_deallocate_tsd /build/buildd/eglibc-2.19/nptl/pthread_create.c:158
        #3 0x7f62297ec194 in start_thread /build/buildd/eglibc-2.19/nptl/pthread_create.c:325
        #4 0x7f6229afd00c (/lib/x86_64-linux-gnu/libc.so.6+0xfb00c)
    
    0x601800023fe8 is located 104 bytes inside of 128-byte region [0x601800023f80,0x601800024000)
    
    ==5558== AddressSanitizer CHECK failed: ../../../../src/libsanitizer/asan/asan_report.cc:344 "((t)) != (0)" (0x0, 0x0)
        #0 0x7f622a50331d (/usr/lib/x86_64-linux-gnu/libasan.so.0+0x1231d)
        #1 0x7f622a50a133 (/usr/lib/x86_64-linux-gnu/libasan.so.0+0x19133)
        #2 0x7f622a5080d6 (/usr/lib/x86_64-linux-gnu/libasan.so.0+0x170d6)
        #3 0x7f622a508f71 (/usr/lib/x86_64-linux-gnu/libasan.so.0+0x17f71)
        #4 0x7f622a503733 (/usr/lib/x86_64-linux-gnu/libasan.so.0+0x12733)
        #5 0x7f622d6b12b0 in moodycamel::details::ThreadExitNotifier::~ThreadExitNotifier() /home/tpan/src/bliss/ext/concurrentqueue/concurrentqueue.h:412
        #6 0x7f622a249d78 in run /build/buildd/gcc-4.8-4.8.2/build/x86_64-linux-gnu/libstdc++-v3/libsupc++/../../../../src/libstdc++-v3/libsupc++/atexit_thread.cc:64
        #7 0x7f62297ebf81 in __nptl_deallocate_tsd /build/buildd/eglibc-2.19/nptl/pthread_create.c:158
        #8 0x7f62297ec194 in start_thread /build/buildd/eglibc-2.19/nptl/pthread_create.c:325
        #9 0x7f6229afd00c (/lib/x86_64-linux-gnu/libc.so.6+0xfb00c)

    bug 
    opened by tcpan 14
  • Need to understand ConcurrentQueue

    Need to understand ConcurrentQueue

    Hello, I am looking for a faster queuing library, and when I saw the benchmark results of ConcurrentQueue, it was just awesome.

    I am learning this and need to understand a few things. What is ConcurrentQueueDefaultTraits? I mean, under which circumstances should I use it? Also, I am going to produce from multiple threads and consume from respective consumer threads. Do I need to create just one instance, like

    ConcurrentQueue<customeObject, Traits> q; or should each thread have its own ConcurrentQueue<customeObject, Traits> q;?

    help wanted 
    opened by necromancersecret 13
  • Use of queue when producer is in a different process from consumer (over shared memory)

    Use of queue when producer is in a different process from consumer (over shared memory)

    I'd like to be able to use this queue where the producer is in a separate process from the consumer (via Linux shared memory). What issues, if any, do you see with trying to do this?

    opened by lel4866 1
  • Replace BLOCK_SIZE with CQ_BLOCK_SIZE

    Replace BLOCK_SIZE with CQ_BLOCK_SIZE

    As Linux defines it as a C macro in <sys/mount.h>.

    So, as much as I think that Linux shouldn't define such a generic-sounding macro... I don't think it'd be an easy change to get accepted upstream, and it leads to errors if for some reason <sys/mount.h> gets included before concurrentqueue, so it's likely safer to prefix the variable name here. CQ_BLOCK_SIZE yields no matches in my /usr/include, so it is less likely to conflict.

    opened by jcelerier 1
  • Why not provide the wait_enqueue API? Is there any difficulty?

    Why not provide the wait_enqueue API? Is there any difficulty?

    When I use BlockingConcurrentQueue, I notice that a wait_enqueue API is not provided, i.e. one that would block when it finds no space.

    What I want to know is: are we not implementing this API because of some difficulty, or is it genuinely hard to implement in the current design?

    opened by caipengbo 3
  • Compilation warning when compiling with gcc -Wsign-conversion

    Compilation warning when compiling with gcc -Wsign-conversion

    -Wsign-conversion is a quite aggressive gcc compiler warning, but it often detects real issues, so we sometimes test builds with it switched on. Because concurrentqueue is header-only, the warning permeates the whole build. Obviously we can silence it by various means, but it is relatively straightforward to perform the relevant casts in the code.

    The warning occurs on the few lines that divide a difference of indices, such as lines 2021 and 2965:

        auto offset = static_cast<size_t>(static_cast<typename std::make_signed<index_t>::type>(blockBaseIndex - headBase) / BLOCK_SIZE);
    

    The warning is emitted because the signed result of blockBaseIndex - headBase is mixed with the unsigned BLOCK_SIZE. Casting BLOCK_SIZE to a signed type would be one possible fix.

    (g++ 11.2.1)

    opened by Chardrazle 2
Releases(v1.0.3)
  • v1.0.3(Feb 24, 2021)

    This release includes the following minor enhancements:

    • Workaround for compiler bug in GCC 9.2 producing compile-time error
    • A basic C API was contributed
    • A basic CMakeLists.txt was contributed
  • v1.0.2(Sep 25, 2020)

    This version includes small improvements to the lightweight semaphore code used by the blocking version of the queue, in particular the new MAX_SEMA_SPINS trait.

  • v1.0.1(Feb 1, 2020)

  • v1.0.0-beta(Sep 26, 2016)

    First release after a long gestation period of ad-hoc bug reports and fixes. It should be quite stable, but more fixes are to come.

    All bug reports welcome!
