Parallel-hashmap - A family of header-only, very fast and memory-friendly hashmap and btree containers.

Overview

This repository aims to provide a set of excellent hash map implementations, as well as a btree alternative to std::map and std::set, with the following characteristics:

  • Header only: nothing to build, just copy the parallel_hashmap directory to your project and you are good to go.

  • drop-in replacement for std::unordered_map, std::unordered_set, std::map and std::set

  • Compiler with C++11 support required; C++14 and C++17 APIs are provided (such as try_emplace)

  • Very efficient, significantly faster than your compiler's unordered map/set, Boost's, or sparsepp

  • Memory friendly: low memory usage, although a little higher than sparsepp

  • Supports heterogeneous lookup

  • Easy to forward declare: just include phmap_fwd_decl.h in your header files to forward declare Parallel Hashmap containers [note: this does not work currently for hash maps with pointer keys]

  • Dump/load feature: when a flat hash map stores data that is std::trivially_copyable, the table can be dumped to disk and restored as a single array, very efficiently, and without requiring any hash computation. This is typically about 10 times faster than doing element-wise serialization to disk, but it will use 10% to 60% extra disk space. See examples/serialize.cc. (flat hash map/set only)

  • Tested on Windows (vs2015 & vs2017, vs2019, Intel compiler 18 and 19), Linux (g++ 4.8.4, 5, 6, 7, 8, clang++ 3.9, 4.0, 5.0) and MacOS (g++ and clang++) - click on the travis and appveyor icons above for detailed test status.

  • Automatic support for boost's hash_value() method for providing the hash function (see examples/hash_value.h). Also default hash support for std::pair and std::tuple.

  • natvis visualization support in Visual Studio (hash map/set only)

@byronhe kindly provided this Chinese translation of the README.md.

Fast and memory friendly

Click here for a full writeup explaining the design and benefits of the Parallel Hashmap.

The hashmaps and btree provided here are built upon those open sourced by Google in the Abseil library. The hashmaps use closed hashing, where values are stored directly into a memory array, avoiding memory indirections. By using parallel SSE2 instructions, these hashmaps are able to look up items by checking 16 slots in parallel, allowing the implementation to remain fast even when the table is filled up to 87.5% capacity.

IMPORTANT: This repository borrows code from the abseil-cpp repository, with modifications, and may behave differently from the original. This repository is an independent work, with no guarantees implied or provided by the authors. Please visit abseil-cpp for the official Abseil libraries.

Installation

Copy the parallel_hashmap directory to your project. Update your include path. That's all.

If you are using Visual Studio, you probably want to add phmap.natvis to your projects. This will allow for a clear display of the hash table contents in the debugger.

A CMake configuration file (CMakeLists.txt) is provided for building the tests and examples. To build and run the tests:

    mkdir build && cd build && cmake -DPHMAP_BUILD_TESTS=ON -DPHMAP_BUILD_EXAMPLES=ON .. && cmake --build . && make test

Example

#include <iostream>
#include <string>
#include <parallel_hashmap/phmap.h>

using phmap::flat_hash_map;
 
int main()
{
    // Create a flat_hash_map of three strings (that map to strings)
    flat_hash_map<std::string, std::string> email = 
    {
        { "tom",  "[email protected]"},
        { "jeff", "[email protected]"},
        { "jim",  "[email protected]"}
    };
 
    // Iterate and print keys and values 
    for (const auto& n : email) 
        std::cout << n.first << "'s email is: " << n.second << "\n";
 
    // Add a new entry
    email["bill"] = "[email protected]";
 
    // and print it
    std::cout << "bill's email is: " << email["bill"] << "\n";
 
    return 0;
}

Various hash maps and their pros and cons

The header parallel_hashmap/phmap.h provides the implementation for the following eight hash tables:

  • phmap::flat_hash_set
  • phmap::flat_hash_map
  • phmap::node_hash_set
  • phmap::node_hash_map
  • phmap::parallel_flat_hash_set
  • phmap::parallel_flat_hash_map
  • phmap::parallel_node_hash_set
  • phmap::parallel_node_hash_map

The header parallel_hashmap/btree.h provides the implementation for the following btree-based ordered containers:

  • phmap::btree_set
  • phmap::btree_map
  • phmap::btree_multiset
  • phmap::btree_multimap

The btree containers are direct ports from Abseil, and should behave exactly the same as the Abseil ones, modulo small differences (such as supporting std::string_view instead of absl::string_view, and being forward declarable).

When btrees are mutated, values stored within can be moved in memory. This means that pointers or iterators to values stored in btree containers can be invalidated when that btree is modified. This is a significant difference from std::map and std::set, which do guarantee pointer stability. The same caveat applies to the 'flat' hash maps and sets.

The full types with template parameters can be found in the parallel_hashmap/phmap_fwd_decl.h header, which is useful for forward declaring the Parallel Hashmaps when necessary.

Key decision points for hash containers:

  • The flat hash maps will move the keys and values in memory. So if you keep a pointer to something inside a flat hash map, this pointer may become invalid when the map is mutated. The node hash maps don't, and should be used instead if this is a problem.

  • The flat hash maps will use less memory, and usually be faster, than the node hash maps, so use them if you can. The exception is when the values inserted in the hash map are large (say more than 100 bytes [needs testing]) and costly to move.

  • The parallel hash maps are preferred when you have a few hash maps that will store a very large number of values. The non-parallel hash maps are preferred if you have a large number of hash maps, each storing a relatively small number of values.

  • The benefits of the parallel hash maps are:
    a. reduced peak memory usage (when resizing), and
    b. multithreading support (and inherent internal parallelism)

Key decision points for btree containers:

Btree containers are ordered containers, which can be used as alternatives to std::map and std::set. They store multiple values in each tree node, and are therefore more cache friendly and use significantly less memory.

Btree containers will usually be preferable to the default red-black trees of the STL, except when:

  • pointer stability or iterator stability is required
  • the value_type is large and expensive to move

When an ordering is not needed, a hash container is typically a better choice than a btree one.

Changes to Abseil's hashmaps

  • The default hash framework is std::hash, not absl::Hash. However, if you prefer the default to be the Abseil hash framework, include the Abseil headers before phmap.h and define the preprocessor macro PHMAP_USE_ABSL_HASH.

  • The erase(iterator) and erase(const_iterator) both return an iterator to the element following the removed element, as does the std::unordered_map. A non-standard void _erase(iterator) is provided in case the return value is not needed.

  • No new types, such as absl::string_view, are provided. All types with a std::hash<> implementation are supported by phmap tables (including std::string_view of course if your compiler provides it).

  • The Abseil hash tables internally randomize a hash seed, so that the table iteration order is non-deterministic. This can be useful to prevent Denial Of Service attacks when a hash table is used for a customer facing web service, but it can make debugging more difficult. The phmap hashmaps by default do not implement this randomization, but it can be enabled by adding #define PHMAP_NON_DETERMINISTIC 1 before including the header phmap.h (as is done in raw_hash_set_test.cc).

  • Unlike the Abseil hash maps, we do an internal mixing of the hash value provided. This prevents serious degradation of the hash table performance when the hash function provided by the user has poor entropy distribution. The cost in performance is very minimal, and this helps provide reliable performance even with imperfect hash functions.

Memory usage

| type                 | memory usage   | additional peak memory usage when resizing |
|----------------------|----------------|--------------------------------------------|
| flat tables          | flat_mem_usage | flat_peak_usage                            |
| node tables          | node_mem_usage | node_peak_usage                            |
| parallel flat tables | flat_mem_usage | parallel_flat_peak                         |
| parallel node tables | node_mem_usage | parallel_node_peak                         |

  • size() is the number of values in the container, as returned by the size() method
  • load_factor() is the ratio: size() / bucket_count(). It varies between 0.4375 (just after the resize) to 0.875 (just before the resize). The size of the bucket array doubles at each resize.
  • the value 9 comes from sizeof(void *) + 1, as the node hash maps store one pointer plus one byte of metadata for each entry in the bucket array.
  • flat tables store the values (plus one byte of metadata per value) directly in the bucket array, hence the sizeof(C::value_type) + 1.
  • the additional peak memory usage (when resizing) corresponds to the old bucket array (half the size of the new one, hence the 0.5), which contains the values to be copied to the new bucket array, and which is freed once the values have been copied.
  • the parallel hashmaps, when created with a template parameter N=4, create 16 submaps. When the hash values are well distributed, and in single threaded mode, only one of these 16 submaps resizes at any given time, hence the factor 0.03, roughly equal to 0.5 / 16.

Iterator invalidation for hash containers

The rules are the same as for std::unordered_map, and are valid for all the phmap hash containers:

| Operations                                | Invalidated                |
|-------------------------------------------|----------------------------|
| All read only operations, swap, std::swap | Never                      |
| clear, rehash, reserve, operator=         | Always                     |
| insert, emplace, emplace_hint, operator[] | Only if rehash triggered   |
| erase                                     | Only to the element erased |

Iterator invalidation for btree containers

Unlike for std::map and std::set, any mutating operation may invalidate existing iterators to btree containers.

| Operations                                | Invalidated |
|-------------------------------------------|-------------|
| All read only operations, swap, std::swap | Never       |
| clear, operator=                          | Always      |
| insert, emplace, emplace_hint, operator[] | Yes         |
| erase                                     | Yes         |

Example 2 - providing a hash function for a user-defined class

In order to use a flat_hash_set or flat_hash_map, a hash function should be provided. This can be done with one of the following methods:

  • Provide a hash functor via the HashFcn template parameter

  • As with boost, you may add a hash_value() friend function in your class.

For example:

#include <parallel_hashmap/phmap_utils.h> // minimal header providing phmap::HashState()
#include <string>
using std::string;

struct Person
{
    bool operator==(const Person &o) const
    { 
        return _first == o._first && _last == o._last && _age == o._age; 
    }

    friend size_t hash_value(const Person &p)
    {
        return phmap::HashState().combine(0, p._first, p._last, p._age);
    }

    string _first;
    string _last;
    int    _age;
};
  • Inject a specialization of std::hash for the class into the "std" namespace. We provide a convenient and small header phmap_utils.h which makes it easy to add such specializations.

For example:

file "Person.h"

#include <parallel_hashmap/phmap_utils.h> // minimal header providing phmap::HashState()
#include <string>
using std::string;

struct Person
{
    bool operator==(const Person &o) const
    { 
        return _first == o._first && _last == o._last && _age == o._age; 
    }

    string _first;
    string _last;
    int    _age;
};

namespace std
{
    // inject specialization of std::hash for Person into namespace std
    // ----------------------------------------------------------------
    template<> struct hash<Person>
    {
        std::size_t operator()(Person const &p) const
        {
            return phmap::HashState().combine(0, p._first, p._last, p._age);
        }
    };
}

The std::hash specialization for Person combines the hash values of the first name, last name and age, using the convenient phmap::HashState() helper, and returns the combined hash value.

file "main.cpp"

#include "Person.h"   // defines Person  with std::hash specialization

#include <iostream>
#include <parallel_hashmap/phmap.h>

int main()
{
    // As we have defined a specialization of std::hash() for Person,
    // we can now create flat_hash_set or flat_hash_map of Persons
    // ----------------------------------------------------------------
    phmap::flat_hash_set<Person> persons = 
        { { "John", "Mitchell", 35 },
          { "Jane", "Smith",    32 },
          { "Jane", "Smith",    30 },
        };

    for (auto& p: persons)
        std::cout << p._first << ' ' << p._last << " (" << p._age << ")" << '\n';

}

Thread safety

Parallel Hashmap containers follow the thread safety rules of the Standard C++ library. In particular:

  • A single phmap hash table is thread safe for reading from multiple threads. For example, given a hash table A, it is safe to read A from thread 1 and from thread 2 simultaneously.

  • If a single hash table is being written to by one thread, then all reads and writes to that hash table on the same or other threads must be protected. For example, given a hash table A, if thread 1 is writing to A, then thread 2 must be prevented from reading from or writing to A.

  • It is safe to read and write to one instance of a type even if another thread is reading or writing to a different instance of the same type. For example, given hash tables A and B of the same type, it is safe if A is being written in thread 1 and B is being read in thread 2.

  • The parallel tables can be made internally thread-safe for concurrent read and write access, by providing a synchronization type (for example std::mutex) as the last template argument. Because locking is performed at the submap level, a high level of concurrency can still be achieved. Read access can be done safely using if_contains(), which passes a reference value to the callback while holding the submap lock. Similarly, write access can be done safely using modify_if, try_emplace_l or lazy_emplace_l. However, please be aware that iterators or references returned by standard APIs are not protected by the mutex, so they cannot be used reliably on a hash map which can be changed by another thread.

  • Examples on how to use various mutex types, including boost::mutex, boost::shared_mutex and absl::Mutex can be found in examples/bench.cc

Using the Parallel Hashmap from languages other than C++

While C++ is the native language of the Parallel Hashmap, we welcome bindings making it available for other languages. One such binding has been created for Python.

Acknowledgements

Many thanks to the Abseil developers for implementing the swiss table and btree data structures (see abseil-cpp) upon which this work is based, and to Google for releasing it as open-source.

Issues
  • Question: keys with multiple integers

    Good morning, I have a very simple question.

    Let's say that I want to create a hash table and use keys identified by 3 integer values. For example:

    {1, 4, 5} ---> 3, {3, 4, 6} ---> 7, etc.

    Namely, each key can have only one single value associated with it, but the key itself is identified by 3 (or more!) integers.

    Is that possible to do something like this with these hash tables?

    If the answer is yes, could you please provide me a super easy example which explains how to achieve this? Otherwise, if the answer is no, is there an easy way to do something like this in C++, for example using the STL?

    Thank you very much for your help and congratulations!

    opened by PietropaoloFrisoni 41
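
Since phmap provides default hash support for std::pair and std::tuple (see the overview above), a key made of several integers can be expressed directly as a tuple; a minimal sketch:

```cpp
#include <iostream>
#include <tuple>
#include <parallel_hashmap/phmap.h>

int main() {
    using Key = std::tuple<int, int, int>;
    // the default hash for std::tuple works out of the box
    phmap::flat_hash_map<Key, int> m;
    m[Key{1, 4, 5}] = 3;
    m[Key{3, 4, 6}] = 7;
    std::cout << m[Key{1, 4, 5}] << "\n";  // prints 3
    return 0;
}
```
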
  • Recommendation for deserializing a large hashmap?

    Thank you Greg for this awesome contribution, I am finding it quite useful for my application.

    I plan to generate a very large hashmap with bitsets as the key and integers as the value. The size of the hashmap will be 10GB when serialized to binary. It might also be important to note that the hashmap will not be modified after generation; purely used for look-up purposes.

    I went ahead and serialized a 1GB flat hashmap using cereal and that took about 10 minutes, while it took about 20 minutes to deserialize.

    From what I could gather, hashmaps are slow to deserialize due to the fact they need to repopulate the entire map again from scratch (or so I've been told).

    To mitigate the slow deserialization process I was thinking to split the hashmap up and by some means parallelize the process of deserialization. Or maybe pre-allocating the memory necessary for the hashmap will speed things up. I am not really sure.

    I look forward to receiving your response.

    opened by andreirajkovic 26
  • Threadsafe combined erase and insert

    I would like to use a parallel_hash_set to keep only the set of keys that have not been inserted once already. Upon a second insert, if the key already exists in the set, it should be erased. Keys will ONLY ever be inserted either once or twice. In other words, every time I go to insert a key, something like,

    if (!set.erase(key))
            set.insert(key);
    
    

    This would work fine without threads, but I would like to do this on the order of a billion keys, and the erase and insert are happening concurrently on many threads. Is there an efficient way to do this with the existing parallel_hash_set and its own blocks and mutexes?

    opened by karentierney 23
  • C++20 support

    Can you please tell me when you plan to release a version with C++20 support? Or can I use the https://github.com/greg7mdp/parallel-hashmap/tree/c%2B%2B20 branch?

    opened by phprus 22
  • Explore using highbits from the hash for subtable selection

    Hey, just wanted to say that I really like the work you are doing here. We have found that SwissTables performance can suffer dramatically when there is not enough entropy in the hash function. By using a hash of (h ^ (h >> 4)) & 0x0F to select your subtables, you are creating just such a reduction of entropy. Given that tables rarely grow large enough to use all their high bits, you can probably get better performance by doing something like h >> 56. At the very least it is worth exploring.

    opened by fowles 19
  • Ability to manipulate a hash_map value behind the write lock

    I've encountered a case where I would like to modify the value inside the hash map in a thread-safe way. I would like to propose a function similar to if_contains that I have so far named if_modify. The difference between the two functions is that if_modify is not const, and it locks the unique_lock rather than the shared_lock. Otherwise they use the same function body.

    opened by bpmckinnon 18
  • Building with Intel Compiler

    C++ isn't my strong suit so I haven't quite figured out what this issue is yet. But, when I try to build the example in MSVS 2017 and 2019 with Windows SDK 10.0.17763.0, using ICC 2019 update 3 (all are debug builds using default settings), I get these errors:

    Severity Code Description Project File Line Suppression State
    Error #3466 inheriting constructors must be inherited from a direct base class hash d:\Downloads\parallel-hashmap-master\parallel_hashmap\phmap.h 4144
    Error (active) E1389 redeclaration cannot add dllexport/dllimport to "ldexpf" (declared at line 706 of "C:\Program Files (x86)\Windows Kits\10\Include\10.0.17763.0\ucrt\corecrt_math.h") hash C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019\windows\compiler\include\math.h 704
    Error (active) E1389 redeclaration cannot add dllexport/dllimport to "powf" (declared at line 743 of "C:\Program Files (x86)\Windows Kits\10\Include\10.0.17763.0\ucrt\corecrt_math.h") hash C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019\windows\compiler\include\math.h 799
    Error (active) E2512 the argument to a feature-test macro must be a simple identifier hash D:\Downloads\parallel-hashmap-master\parallel_hashmap\phmap_config.h 557
    Error (active) E2512 the argument to a feature-test macro must be a simple identifier hash D:\Downloads\parallel-hashmap-master\parallel_hashmap\phmap_config.h 573
    Error (active) E2512 the argument to a feature-test macro must be a simple identifier hash D:\Downloads\parallel-hashmap-master\parallel_hashmap\phmap_config.h 604
    Error #3466 inheriting constructors must be inherited from a direct base class hash d:\Downloads\parallel-hashmap-master\parallel_hashmap\phmap.h 2166
    Error #3466 inheriting constructors must be inherited from a direct base class hash d:\Downloads\parallel-hashmap-master\parallel_hashmap\phmap.h 3304
    Error #3466 inheriting constructors must be inherited from a direct base class hash d:\Downloads\parallel-hashmap-master\parallel_hashmap\phmap.h 4203
    Error #3466 inheriting constructors must be inherited from a direct base class hash d:\Downloads\parallel-hashmap-master\parallel_hashmap\phmap.h 4262
    Error #3466 inheriting constructors must be inherited from a direct base class hash d:\Downloads\parallel-hashmap-master\parallel_hashmap\phmap.h 4321
    Error #3466 inheriting constructors must be inherited from a direct base class hash d:\Downloads\parallel-hashmap-master\parallel_hashmap\phmap.h 4372
    Error #3466 inheriting constructors must be inherited from a direct base class hash d:\Downloads\parallel-hashmap-master\parallel_hashmap\phmap.h 4419
    Error #3466 inheriting constructors must be inherited from a direct base class hash d:\Downloads\parallel-hashmap-master\parallel_hashmap\phmap.h 4470
    Error #3466 inheriting constructors must be inherited from a direct base class hash d:\Downloads\parallel-hashmap-master\parallel_hashmap\phmap.h 4520

    Works fine with Clang and MSVC using c++14 standard though.

    Under c++17 with Clang I get:

    Error (active) E2512 the argument to a feature-test macro must be a simple identifier hash D:\Downloads\parallel-hashmap-master\parallel_hashmap\phmap_config.h 557
    Error (active) E2512 the argument to a feature-test macro must be a simple identifier hash D:\Downloads\parallel-hashmap-master\parallel_hashmap\phmap_config.h 573
    Error (active) E2512 the argument to a feature-test macro must be a simple identifier hash D:\Downloads\parallel-hashmap-master\parallel_hashmap\phmap_config.h 604
    Error cannot convert 'const std::basic_string_view<char, std::char_traits >' to 'size_t' (aka 'unsigned int') without a conversion operator hash C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\xhash 31

    And with MSVC

    Error (active) E2512 the argument to a feature-test macro must be a simple identifier hash D:\Downloads\parallel-hashmap-master\parallel_hashmap\phmap_config.h 557
    Error (active) E2512 the argument to a feature-test macro must be a simple identifier hash D:\Downloads\parallel-hashmap-master\parallel_hashmap\phmap_config.h 573
    Error (active) E2512 the argument to a feature-test macro must be a simple identifier hash D:\Downloads\parallel-hashmap-master\parallel_hashmap\phmap_config.h 604

    Though you might want to know.

    Cheers

    opened by WhoWouldaThunk 17
  • Parallel Flat Hashmap Performance

    Hello, I've been trying to use the parallel_flat_hash_map as a object cache since it seems like it'd work well for that.

    I'm using it to cache the results of mathematical computations in a simulation. A simulation instance runs in a unique thread and each simulation does a lot of mathematical computation. The reason I think caching should work well for this, is that my software has a decent probability of requesting inputs that may have already been computed before across the many threads that are all doing this math. Thus, it should be faster to simply compute them once, and save that to a cache, and check on every function invocation if it's already been computed before. If so, return the pre-computed answer from the cache. A trivial example could be:

    #include <chrono>
    #include <mutex>
    #include <thread>
    #include <vector>
    
    #include <parallel_hashmap/phmap.h>
    
    template<class K, class V>
    using ParallelFlatHashMap = phmap::parallel_flat_hash_map<K, V,
        phmap::priv::hash_default_hash<K>, phmap::priv::hash_default_eq<K>,
        phmap::priv::Allocator<phmap::priv::Pair<const K, V>>, 8, std::mutex>;
    
    using namespace std::chrono_literals;
    
    ParallelFlatHashMap<std::pair<int, double>, double> cache;
    
    double complexMathFunction(int a, double b)
    {
        if (cache.contains({a, b})) {
            return cache[{a, b}]; // If the input is already in the cache, it should be faster to just return the answer
        }
        
        // If we hit this branch, we know this input combination has never been called before
        // Complex math that takes many milliseconds (sleeping for this trivial example)
        std::this_thread::sleep_for(100ms);
        
        // Trivial math for this example
        const double ans = double(a) + b;
        
        cache.insert({{a, b}, ans}); // Insert the key combination and the answer to cache, that way if this combination of inputs is already been computed, we don't need to re-do it
        
        return ans;
    }
    
    void simulate()
    {
        // For this example, do 1000 iterations of the simulation (can vary)
        for(int i = 0; i < 1000; i++) {
            // Every 10 iterations should generate the same repeating key. Not super realistic, but should be sufficient for testing?
            const double ans = complexMathFunction(i % 10, 3.14);
            // Do something with the answer (doesn't matter in this context)
        }
    }
    
    int main()
    {
        std::vector<std::jthread> threads;
        for(int i = 0; i < 100; i++) {
            // Kick off a new simulation thread
            threads.emplace_back(simulate);
        }
        
        // Wait for all simulation threads to finish
        for(auto& t: threads) {
            t.join();
        }
        
        return 0;
    }
    

    I've been profiling the performance of my real software and it only seems marginally faster than my code that doesn't use the caching method. When I use a profiler (such as perf), I notice that a large chunk of the time is waiting on mutex locking. I figured if a bunch of the threads were writing often to the cache, then this could be reasonable. However, if most of the threads are reading out of the cache, then it'd be much faster, as no writes are being done. For the purposes of testing, I've even pre-computed all of the possible values between inputs that I know will be put into the math functions and put them into the cache, and then kicked off all of the simulation threads. In this context, 100% of the threads should be read-only (as all of the possible inputs of the math functions have already been pre-computed). This is marginally faster than not caching, but profiling still shows a large chunk of time waiting on mutexes.

    Are there more parallel optimized functions for contains or operator[] that I should be using?

    Thanks for reading all of this!

    opened by evan-swinney 15
  • Added dump&&load interface for trivially_copyable types

    I need to dump and load the hash map/set very quickly for trivially_copyable types such as uint32,uint64 and so on. Fortunately all necessary data of these types are stored in ctrl_ and slots_ area so we can dump&load ctrl_ and slots_ directly.

    opened by gtarcoder 14
  • insert in parallel problem

    Hi, I'm using your hash map to write data to it in parallel. Here is the function that is being called in parallel (using concurrency lib on Windows, c++, VS2019):

    // class member:
    phmap::parallel_flat_hash_map<SimpleString, int,
        phmap::priv::hash_default_hash<SimpleString>,
        phmap::priv::hash_default_eq<SimpleString>,
        std::allocator<std::pair<const SimpleString, int>>,
        8, srwlock> m_stringsMap;

    // class func:

    int addParallel(const SimpleString& str, volatile long* curIdx)
    {
        int newIndex = -1;

        auto inserted = m_stringsMap.insert(std::make_pair(str, newIndex));
        if (inserted.second) // we are the first to insert
        {
            newIndex = InterlockedIncrement(curIdx);
            m_stringsMap[str] = newIndex;
        }
        else // we are NOT first - someone already inserted the pair
        {
            while (newIndex < 0)
            {
                ::SwitchToThread();
                newIndex = m_stringsMap[str]; // 
            }
        }
        
        return newIndex;
    }
    

    My assumption is that insert behaves same way as concurrency::concurrent_unordered_map:

    1. If this thread was the one that inserted the key, it can safely modify the value for it, since only this thread will have inserted.second set to true.
    2. if other thread already inserted that key (and doesn't matter what is the value for that key is!), then inserted.second is false, and all that we need to do is to read from it until the value is modified to be something other than -1 (curIdx starts with 0).

    Most of the time, this works. But occasionally, it goes into forever loop in that while loop. This can happen, as I understand, in the following cases:

    • either thread that actually inserted the key received inserted.second as false; OR
    • newIndex = m_stringsMap[str] keeps reading initially inserted value instead of reading the updated value.

    Similar logic works flawlessly with concurrency::concurrent_unordered_map. I really wanted to use your map, since it's faster almost twice than concurrency::concurrent_unordered_map; but this bug stops me from using it. Can you please take a look?

    opened by kanonka 13
  • Keep original bucket count when copying a map

    Hello, I am facing the following issue: I want to assign a map to other, however I lose the bucket count while doing so. The following code explains it better:

    auto map1 = phmap::parallel_flat_hash_map<int, int>(15000);
    for (int j = 0; j < 1000; j++)
        map1.insert(std::make_pair(j, j));
    phmap::parallel_flat_hash_map<int, int> map2;
    map2 = map1;
    std::cout << "map1 " << map1.size() << " " << map1.bucket_count() << "\n";
    std::cout << "map2 " << map2.size() << " " << map2.bucket_count() << "\n";
    

    The output of this is:

    map1 1000 262128
    map2 1000 1648
    

    I would like map2 to have 262k buckets too. I know there are other ways to copy, but I am implementing code with templates and this is just the simpler way to assign. Any idea on how to fix this?

    opened by ila 12
  • Seg Fault on malloc failure/out of memory

    I find that flat_hash_map will sometimes cause a seg fault when the memory limit of the process is reached and malloc() returns null. The seg fault is on the free() call made on the nested container in the map, called from the flat_hash_map destructor. The following block of code will consistently seg fault when inserted into my larger project if I use "ulimit -v" to set a process memory limit:

    phmap::flat_hash_map< size_t, string > hm;
    for( size_t i = 0; i < 1000000000; ++i ) {
        hm.insert( std::make_pair( i, "sfaasdfgasdgdfgsdhfghfgsdgfdgsdfgsdfgdfgfgsfdg" ) );
    }

    However, this code appears to work fine and throws std::bad_alloc when I put these lines in a standalone binary, so it has something to do with the context or state of the program at the time when this code is run.

    A few more notes:

    • This only seems to happen when the map key or value is a type with internal allocations, such as std::vector or std::string.
    • Changing the hash_map type to unordered_map or spp::sparse_hash_map results in the expected std::bad_alloc.
    • Changing the hash_map type to google::sparse_hash_map results in the error "sparsehash FATAL ERROR: failed to allocate 32 groups" and a call to exit().
    • Passing in a custom allocator that checks for a null return from malloc() inside the allocate() call and throws std::bad_alloc fixes the problem.
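    The last bullet above (a custom allocator that throws on a null malloc()) can be sketched as follows. This is a hypothetical illustration, not code from this repository:

    ```cpp
    #include <cstdlib>
    #include <cstddef>
    #include <new>
    #include <vector>
    #include <cassert>

    // Sketch of a minimal allocator that checks malloc()'s return value and
    // throws std::bad_alloc on failure, so out-of-memory surfaces as an
    // exception instead of a later crash on a null pointer.
    template <class T>
    struct checked_allocator {
        using value_type = T;
        checked_allocator() = default;
        template <class U> checked_allocator(const checked_allocator<U>&) {}

        T* allocate(std::size_t n) {
            void* p = std::malloc(n * sizeof(T));
            if (!p) throw std::bad_alloc();   // the key difference
            return static_cast<T*>(p);
        }
        void deallocate(T* p, std::size_t) { std::free(p); }
    };
    template <class T, class U>
    bool operator==(const checked_allocator<T>&, const checked_allocator<U>&) { return true; }
    template <class T, class U>
    bool operator!=(const checked_allocator<T>&, const checked_allocator<U>&) { return false; }

    int main() {
        // Works as a drop-in allocator template argument for standard
        // containers (and, by the same mechanism, phmap containers).
        std::vector<int, checked_allocator<int>> v;
        for (int i = 0; i < 1000; ++i) v.push_back(i);
        assert(v.size() == 1000 && v[999] == 999);
        return 0;
    }
    ```

    Passing such an allocator as the container's allocator parameter makes the out-of-memory failure mode deterministic, matching the observation in the last bullet.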

    The stack trace starting at the flat_hash_map destructor is:

    #0  0x000000000173776d in __exchange_and_add (__val=-1, __mem=0xfffffffffffffff8) at /grid/common/pkgsData/gcc-v9.3.0p1/Linux/RHEL6.0-2013-x86_64/include/c++/9.3.0/ext/atomicity.h:82
    #1  __exchange_and_add_dispatch (__val=-1, __mem=0xfffffffffffffff8) at /grid/common/pkgsData/gcc-v9.3.0p1/Linux/RHEL6.0-2013-x86_64/include/c++/9.3.0/ext/atomicity.h:82
    #2  __exchange_and_add_dispatch (__val=-1, __mem=0xfffffffffffffff8) at /grid/common/pkgsData/gcc-v9.3.0p1/Linux/RHEL6.0-2013-x86_64/include/c++/9.3.0/ext/atomicity.h:78
    #3  _M_dispose (__a=..., this=0xffffffffffffffe8) at /grid/common/pkgsData/gcc-v9.3.0p1/Linux/RHEL6.0-2013-x86_64/include/c++/9.3.0/bits/basic_string.h:3311
    #4  _M_dispose (__a=..., this=0xffffffffffffffe8) at /grid/common/pkgsData/gcc-v9.3.0p1/Linux/RHEL6.0-2013-x86_64/include/c++/9.3.0/bits/basic_string.h:3295
    #5  ~basic_string (this=0x7fffef1a0028, __in_chrg=) at /grid/common/pkgsData/gcc-v9.3.0p1/Linux/RHEL6.0-2013-x86_64/include/c++/9.3.0/bits/basic_string.h:3706
    #6  ~pair (this=0x7fffef1a0020, __in_chrg=) at /grid/common/pkgsData/gcc-v9.3.0p1/Linux/RHEL6.0-2013-x86_64/include/c++/9.3.0/bits/stl_pair.h:208
    #7  destroy<std::pair<unsigned long, std::basic_string > > (this=0x7fffffffc648, __p=0x7fffef1a0020) at /grid/common/pkgsData/gcc-v9.3.0p1/Linux/RHEL6.0-2013-x86_64/include/c++/9.3.0/ext/new_allocator.h:153
    #8  destroy<std::pair<unsigned long, std::basic_string > > (__a=..., __p=0x7fffef1a0020) at /grid/common/pkgsData/gcc-v9.3.0p1/Linux/RHEL6.0-2013-x86_64/include/c++/9.3.0/bits/alloc_traits.h:497
    #9  destroy_impl<std::allocator<std::pair<unsigned long const, std::basic_string > >, std::pair<unsigned long, std::basic_string > > (a=..., p=0x7fffef1a0020) at ../src/parallel_hashmap/phmap_base.h:1542
    #10 destroy<std::pair<unsigned long, std::basic_string > > (a=..., p=0x7fffef1a0020) at ../src/parallel_hashmap/phmap_base.h:1500
    #11 destroy<std::allocator<std::pair<unsigned long const, std::basic_string > > > (alloc=0x7fffffffc648, slot=0x7fffef1a0020) at ../src/parallel_hashmap/phmap_base.h:4695
    #12 destroy<std::allocator<std::pair<unsigned long const, std::basic_string > > > (alloc=0x7fffffffc648, slot=0x7fffef1a0020) at ../src/parallel_hashmap/phmap.h:3996
    #13 destroy<std::allocator<std::pair<unsigned long const, std::basic_string > > > (alloc=0x7fffffffc648, slot=0x7fffef1a0020) at ../src/parallel_hashmap/phmap_base.h:479
    #14 phmap::priv::raw_hash_set<phmap::priv::FlatHashMapPolicy<unsigned long, std::string>, phmap::Hash, phmap::EqualTo, std::allocator<std::pair<unsigned long const, std::string> > >::destroy_slots (this=0x7fffffffc620) at ../src/parallel_hashmap/phmap.h:1906
    #15 0x0000000000c34eaf in ~raw_hash_set (this=0x7fffffffc620, __in_chrg=) at ../src/parallel_hashmap/phmap.h:4397
    #16 ~raw_hash_map (this=0x7fffffffc620, __in_chrg=) at ../src/parallel_hashmap/phmap.h:2228
    #17 ~flat_hash_map (this=0x7fffffffc620, __in_chrg=) at ../src/parallel_hashmap/phmap.h:4397

    opened by fegennari 15
  • Custom pointer type support?

    Custom pointer type support?

    It seems custom pointer types are not supported with the parallel hashmap: https://github.com/greg7mdp/parallel-hashmap/blob/master/parallel_hashmap/phmap.h#L843-L846

    Would this be difficult to add? I would like to apply the parallel hashmap to a project that processes OpenStreetMap data. This involves loading several billion geometry records. We do this by loading the data into a boost unordered_map backed by a boost interprocess memory-mapped file. This way, the data is hashed but backed by a file on the filesystem. This is required because converting the planet actually involves processing 60 GB of compressed data.

    The loading process is currently completely single-threaded, but ideally we would like to load the node/way/relation stores from the planet OSM file in parallel. The parallel hashmap provides an interesting approach for this, but it would require using the boost interprocess offset pointer.

    https://github.com/systemed/tilemaker

    Is there a particular reason why custom pointer types are not allowed?

    opened by kleunen 15
Releases(1.35)
  • 1.35(Jun 26, 2022)

  • 1.34(Mar 3, 2022)

    Nothing really major; mostly fixes for issues reported by users.

    • some improvements (performance, functionality) to the custom APIs for the parallel hash containers, which allow operating under protection of the internal mutex
    • user-contributed debug visualizers (gdb, lldb)
    • fix an issue when using phmap as a submodule
    • switch to GitHub Actions for CI
    • very fast dump/restore cereal serialization of phmap containers holding trivially_copyable types
    • fix compiler issues/warnings
    Source code(tar.gz)
    Source code(zip)
  • 1.33(May 1, 2021)

    • some small fixes to clean up warnings with new compiler versions
    • add some extension APIs allowing atomic test-or-change operations while under the internal lock (for parallel_*_map containers): if_contains, modify_if, lazy_emplace_l, try_emplace_l, erase_if
    • rename the internal container_internal namespace to the shorter priv
    • some other small cleanups (docs, tests, warnings)
    Source code(tar.gz)
    Source code(zip)
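    The intent of if_contains (running a callback on the mapped value while the container's internal lock is held, so the reference never escapes the critical section) can be illustrated with a small stand-in built on std::unordered_map and a mutex. This is a sketch of the idea only, not phmap's actual implementation:

    ```cpp
    #include <unordered_map>
    #include <mutex>
    #include <functional>
    #include <string>
    #include <cassert>

    // Illustrative wrapper (hypothetical, not phmap code): the callback runs
    // under the map's lock, which is what makes the lookup atomic.
    template <class K, class V>
    class LockedMap {
        std::unordered_map<K, V> map_;
        mutable std::mutex mtx_;
    public:
        void set(const K& k, const V& v) {
            std::lock_guard<std::mutex> lk(mtx_);
            map_[k] = v;
        }
        bool if_contains(const K& k, const std::function<void(const V&)>& f) const {
            std::lock_guard<std::mutex> lk(mtx_);
            auto it = map_.find(k);
            if (it == map_.end()) return false;
            f(it->second);   // value is only accessible inside the lock
            return true;
        }
    };

    int main() {
        LockedMap<int, std::string> m;
        m.set(1, "one");
        std::string out;
        assert(m.if_contains(1, [&](const std::string& v) { out = v; }));
        assert(out == "one");
        assert(!m.if_contains(2, [](const std::string&) {}));
        return 0;
    }
    ```

    phmap's real parallel containers shard the table across submaps, each with its own mutex, so such callbacks contend only on one shard rather than a global lock.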
  • 1.32(May 18, 2020)

    The main change is the addition of a non-standard if_contains API to parallel_hash_map, which allows looking up values in a thread-safe manner. Also fixes an error when using the newly deprecated std::result_of with the latest Visual Studio 2019 compilers.

    Source code(tar.gz)
    Source code(zip)
  • 1.31(Apr 2, 2020)

    Small changes, such as Windows compilation flags, and renaming a couple of variables that conflicted with #defines from some system headers.

    Source code(tar.gz)
    Source code(zip)
  • 1.30(Jan 1, 2020)

  • 1.27(Nov 4, 2019)

    • support hashing of std::pair and std::tuple out of the box
    • add a couple of missing includes
    • fix a compilation issue on MSVC when xhash is included
    • fix hash_value() support
    Source code(tar.gz)
    Source code(zip)
  • 1.24(Sep 8, 2019)

    • add a dump/load feature for hash maps containing std::trivially_copyable values
    • throw an exception for invalid operator access, as the standard requires
    • allow heterogeneous lookup with std::string_view
    • improve serialization when cereal is used
    • fix some compilation warnings
    Source code(tar.gz)
    Source code(zip)
  • 1.23(Jun 16, 2019)

  • 1.22(Apr 30, 2019)

    • reorganize headers so that phmap_fwd_decl.h can always be used without compilation errors
    • add support for boost's hash_value() method for providing the hash
    • fix incorrect return values in phmap::EqualTo that caused compilation warnings
    Source code(tar.gz)
    Source code(zip)
  • 1.21(Apr 27, 2019)

  • 1.2(Apr 12, 2019)

    • improve mutex support (including boost::shared_mutex and boost::upgrade_mutex)
    • add systematic mixing of the provided hash value. This prevents serious degradation of hash table performance when the user-supplied hash function has a poor entropy distribution. The performance cost is minimal, and it helps provide reliable performance even with mediocre hash functions.
    • make the iteration order deterministic by default (it can still be made non-deterministic by adding #define PHMAP_NON_DETERMINISTIC 1 before including phmap.h, as is done in raw_hash_set_test.cc)
    • clean up and add a couple of examples
    Source code(tar.gz)
    Source code(zip)
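    The kind of bit mixing described in the 1.2 notes above can be illustrated with a typical multiply/xor-shift finalizer. This is an illustrative sketch of the technique, not phmap's actual mixer:

    ```cpp
    #include <cstdint>
    #include <cassert>

    // Illustration of hash mixing (not phmap's actual code): a multiply /
    // xor-shift finalizer (the constant is from MurmurHash3's finalizer)
    // spreads entropy from a poor hash, e.g. the identity hash on integers,
    // across all bits, so the bits used for the bucket index are well
    // distributed even when the input hash has low entropy.
    inline std::uint64_t mix_hash(std::uint64_t h) {
        h ^= h >> 33;
        h *= 0xff51afd7ed558ccdULL;
        h ^= h >> 33;
        return h;
    }

    int main() {
        // Mixing is deterministic, but consecutive inputs no longer produce
        // consecutive (and therefore clustered) low-order bits.
        assert(mix_hash(42) == mix_hash(42));
        assert(mix_hash(1) != 1);
        assert(mix_hash(1) != mix_hash(2));
        return 0;
    }
    ```

    Applying such a step to every user-provided hash costs a few cycles per lookup but protects the table from pathological bucket clustering.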
  • 1.1.0(Apr 1, 2019)
