Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20

Overview

➵ robin_hood unordered map & set Release GitHub license

Travis CI Build Status Appveyor Build Status Codacy Badge Total alerts Language grade: C/C++ Coverage Status

robin_hood::unordered_map and robin_hood::unordered_set is a platform independent replacement for std::unordered_map / std::unordered_set which is both faster and more memory efficient for real-world use cases.

Installation & Usage

Direct Inclusion

  1. Add robin_hood.h to your C++ project.
  2. Use robin_hood::unordered_map instead of std::unordered_map
  3. Use robin_hood::unordered_set instead of std::unordered_set

Conan, the C/C++ Package Manager

  1. Setup your CMakeLists.txt (see Conan documentation on how to use MSBuild, Meson and others) like this:
    project(myproject CXX)
    
    add_executable(${PROJECT_NAME} main.cpp)
    
    include(${CMAKE_BINARY_DIR}/conanbuildinfo.cmake) # Include Conan-generated file
    conan_basic_setup(TARGETS) # Introduce Conan-generated targets
    
    target_link_libraries(${PROJECT_NAME} CONAN_PKG::robin-hood-hashing)
  2. Create conanfile.txt in your source dir (don't forget to update the version)
    [requires]
    robin-hood-hashing/3.10.0
    
    [generators]
    cmake
  3. Install and run Conan, then build your project as always:
    pip install conan
    mkdir build
    cd build
    conan install ../ --build=missing
    cmake ../
    cmake --build .
    The robin-hood-hashing package in Conan is kept up to date by Conan contributors. If the version is out of date, please create an issue or pull request on the conan-center-index repository.

Benchmarks

Please see extensive benchmarks at Hashmaps Benchmarks. In short: robin_hood is always among the fastest maps and uses far less memory than std::unordered_map.

Design Choices

  • Two memory layouts. Data is either stored in a flat array, or with node indirection. Access for unordered_flat_map is extremely fast due to no indirection, but references to elements are not stable. It also causes allocation spikes when the map resizes, and will need plenty of memory for large objects. Node based map has stable references & pointers (NOT iterators! Similar to std::unordered_map) and uses const Key in the pair. It is a bit slower due to indirection. The choice is yours; you can either use robin_hood::unordered_flat_map or robin_hood::unordered_node_map directly. If you use robin_hood::unordered_map It tries to choose the layout that seems appropriate for your data.

  • Custom allocator. Node based representation has a custom bulk allocator that tries to make few memory allocations. All allocated memory is reused, so there won't be any allocation spikes. It's very fast as well.

  • Optimized hash. robin_hood::hash has custom implementations for integer types and for std::string that are very fast and falls back to std::hash for everything else.

  • Depends on good Hashing. For a really bad hash the performance will not only degrade like in std::unordered_map, the map will simply fail with an std::overflow_error. In practice, when using the standard robin_hood::hash, I have never seen this happening.

License

Licensed under the MIT License. See the LICENSE file for details.

by martinus

Issues
  • Support for Unique hash map?

    Support for Unique hash map?

    As discussed on https://github.com/CGAL/cgal/issues/4049 I am investigating using robin_hood::unordered_map for a unique hash map. A Unique hash map is a specialization on insertion only and unique hash values allow for a more time and space-efficient implementation. In the case of the implementation in CGAL the memory addresses of Handle objects are used for key-hashes and therefore by their very nature there cannot be duplicates/collisions.

    The CGAL implementation suffers from poor construction speed due to the default hardcoded table size of 512, and this is what I hope to improve by using robin_hood::unordered_map. However the insert speed of the CGAL::Unique_hash_map does seem to outperform by a factor of 0.2, as can be seen in my benchmark project https://github.com/GilesBathgate/hashmap-benchmarks

    What I'd like to establish is whether improvements can be made to either CGAL::Unique_hash_map that would facilitate faster construction, or whether robin_hood::* can be improved with the techniques used in the CGAL implementation.

    I realise that the main purpose of robin hood hashing is to handle the case when there are duplicate keys, so perhaps its not the best candidate. Are there any other faster hash map implementations that can be used for a unique hash map that would be more suitable?

    opened by GilesBathgate 12
  • Consider porting to C++11 / C++03

    Consider porting to C++11 / C++03

    A large number of development environments still do not have C++14 support, especially in many embedded environments that only support C++03.

    Can you consider porting this powerful and efficient container to these older versions?

    Thanks :-)

    enhancement 
    opened by asbai 7
  • support remove/erase idiom

    support remove/erase idiom

    I noticed when applying the remove/erase idiom that the following method is not implemented:

    iterator erase ( const_iterator first, const_iterator last );

    Reference: https://en.cppreference.com/w/cpp/container/unordered_map/erase Example: cache.erase( std::remove_if( cache.begin(), cache.end(), [&] ( const pair<int,int>& i ) { return i.second == 1 ); } ), cache.end() );

    wontfix 
    opened by jcageman 7
  • Fix CRC32 support on Windows ARM64 and use runtime detection

    Fix CRC32 support on Windows ARM64 and use runtime detection

    __cpuid isn't present on ARM64 builds in Windows, and CRC32 is always present for ARM64 processors that Windows 10 supports (making runtime detection unnecessary).

    Also, remove the hardware support detection at compile time. You've already added runtime detection (the more flexible method), so use that.

    opened by wyattoday 6
  • Not using intrinsics

    Not using intrinsics

    What was the reason for reverting https://github.com/martinus/robin-hood-hashing/commit/f45a309af7d08cb98fc08dc7903ac9a7091db6d5 ? Any reliability concerns or just performance?

    I'm asking because, on certain contexts, including x86intrin.h on Mingw-w64 triggers a bug on g++.

    bug 
    opened by oscarfv 6
  • Please add C++ 17 node API to node based map

    Please add C++ 17 node API to node based map

    For node-based maps, these functions can enable writing much more efficient code:

    https://en.cppreference.com/w/cpp/container/unordered_map/extract

    Indeed, I am not using your map in a number of places precisely due to me using this node ptr API.

    enhancement 
    opened by ned14 6
  • it = map.erase(it) potentially passes object twice

    it = map.erase(it) potentially passes object twice

    It seems @jcageman has stumbled uppon an issue that probably all robin-hood map with backward shift deletion have (see #18). Here is a simple reproducer

    Insert two elements, where both hash to the last bucket of the map. One entry will be the last one of the map, the other one will wrap around and use the first bucket.

    robin_hood::unordered_node_map<int, int, DummyHash> map;
    map[1024] = 1;
    map[2048] = 2;
    
    it = map.begin(); // it points to 2048, the first element which has wrapped around
    ++it;  // it points to the last element, 1024 which is in its original bucket
    it = map.erase(it); // backward shift deletion removes 1024 and wraps back 2048, so it now (again!) points to 2048 and NOT end
    it = map.erase(it); // finally we are at the end.
    

    Also see https://github.com/Tessil/robin-map/issues/20

    bug 
    opened by martinus 6
  • Fast CRC32 based hash for hash_bytes and hash_int

    Fast CRC32 based hash for hash_bytes and hash_int

    • [x] CRC with runtime dispatch
    • [x] Use _mm_crc32_u64
    • [x] Don't use upper bits of the hash, only use lower bits. Then a single crc command would be enough
    • [ ] Implement optimized hash_bytes that makes use of CRC32 instruction
    performance 
    opened by martinus 6
  • Crashing with overflow_error

    Crashing with overflow_error

    Hello,

    I am currently trying to integrate robin hood hashing into my project. As mentioned on your github, I simply included the robin_hood.h and used robin_hood::unordered_map instead of std::unordered_map.

    However, upon start, my program crashes and reports the following error:

    terminate called after throwing an instance of 'std::overflow_error' what(): robin_hood::map overflow Command terminated by signal 6

    Do you have any idea what could cause this?

    Thanks!

    Cheers, Pierre

    opened by morispi 6
  • hashing function taking a seed

    hashing function taking a seed

    There are several use case where having a hash function that takes a seed as input makes sense:

    • Suppose we use a hash function for partitioning a set and a unordered_map for storing them. In that case you want different seeds being used.
    • We may want to compute hash iteratively. An example of such scenario is when the data is not serialized as a single void const* ptr, size_t len.

    What do you think?

    opened by MathieuDutSik 0
  • robin_hood::map overflow

    robin_hood::map overflow

    Dear developer,

    I am using robin hood map (3.11.2) under MinGW. I got overflow error when insert some number as below log.

    ... Add 20686628 Add 20790466 Add 20950897 Add 20938052 Add 20481416 terminate called after throwing an instance of 'std::overflow_error' what(): robin_hood::map overflow

    I have attach reproduce file. Please check it. :)

    Thank you.

    test.zip

    opened by ihmin 10
  • Insert fails with std::overflow_error again

    Insert fails with std::overflow_error again

    Hello.

    I have encountered std::overflow_error again.

    I am using robin hood version 3.10.0.

    This time I cannot create test case. I hope you can give me guidelines to find source of problem. I am working with robin_hood::unordered_flat_set<int32_t>. My application is working with 200 containers and 10 million elements in total, so I cannot find good way to create test example. It takes about 5 minutes of number crunching before application crashes while trying to insert that specific integer into that specific container.

    My application is basically doing this:

    • Create 200 containers.
    • Do work with all containers (container::insert, container::erase, container::clear). This work takes about 5 minutes.
    • At some moment, application has to process some problematic container.
    • Problematic container contains some elements.
    • I know what are new elements that must be inserted into problematic container.
    • Problematic container is cleared before insertion.
    • New elements are inserted into problematic container one-by-one. Insertion fails with std::overflow_error after some number.

    Exception is not thrown if I recreate standalone test case with same elements.

    Thanks.

    opened by slavenf 24
  • WIP: Natvis visualization

    WIP: Natvis visualization

    Saw #102 and in case others are looking for the same for visual studio, here's my natvis: https://gist.github.com/ikrima/1184c75b979cbfa655c0883c0d4ab068

    Threw it together for a specific personal debug case but sharing in case it might be useful for someone else (the natvis docs are obtuse)

    opened by ikrima 0
  • Provide GDB pretty printers for ease of debugging

    Provide GDB pretty printers for ease of debugging

    opened by Waqar144 0
  • Can I pre-hash and cache hashes to speed up lookup?

    Can I pre-hash and cache hashes to speed up lookup?

    Hi! This repo looks great and I'm planning to use robin_hood::unordered_node_map in my project (https://messerlab.org/slim/). I have a question (which might be a feature request); I hope this is a good place to ask it.

    My usage case is that I want to use robin_hood::unordered_node_map to implement symbol tables in an interpreter. Variables in the symbol table are often defined once and then used many times (in a loop, say), so fast lookup is crucial. Symbols are defined by their std::string names, so the key type for the hash is std::string. I'd like to avoid having to recompute the hash every time the interpreted code uses a variable. Since I parse the interpreted code and turn it into an AST, I have a convenient place, in the AST node, to keep a pre-computed hash value for identifiers such as variable names. I'd therefore like to be able to pre-compute the hashes for all identifiers used in the whole AST, and then use those pre-computed hash values for both insert and find operations in the symbol table. (Of course the std::string key would still need to be passed in as well, but would be used only in case of a collision in find, and would never need to be hashed since the pre-computed hash would be supplied.)

    Also, my interpreter chains multiple hash tables together, to represent nested scopes: a hash table for the local scope (such as inside a function) gets constructed and torn down very frequently, while a hash table for the global scope lives more or less forever. When a variable name is used in the interpreted code, I want to first try to look it up in the local scope, and if that fails, then check the global scope for the same key. This is another case where being able to supply a pre-computed hash value for the key would be a win, instead of computing the hash once for the local-scope lookup and then again for the global-scope lookup.

    So I'm wondering (1) whether your code already supports pre-computed hashes (sorry, I'm just getting into your code, and I'm not very good with complex C++ template code, so I'm having a bit of trouble finding my way), and (2) whether, if not, this would be straightforward to add, or whether there are fundamental design obstacles in the way of doing this with your code. If it isn't there, but would be straightforward to add, I'd be happy to take a crack at it and submit a PR, if you're too busy to deal with it. :->

    Thanks for this great open-source code!

    opened by bhaller 4
  • `is_transparent` is not working

    `is_transparent` is not working

    Hi,

    I tried using robin_hood::unordered_map, and its really nice! However, I tried using the is_transparent interface, and it almost works, but does not quite work: It looks like this is because you are putting std::hash into a wrapper.

    I defined std::hash to accept a type BUILD_args_ref in addition to the keytype BUILD_args. This all works OK, except that robin_hood::hash doesn't accept my extra hashkey type! You can see this here:

    ../../source/otc/robin_hood.h:1322:39: error: cannot convert ‘const BUILD_args_ref’ to ‘const BUILD_args&’
     1322 |         *idx = Mix{}(WHash::operator()(key));
          |                      ~~~~~~~~~~~~~~~~~^~~~~
    

    I made a quick hack that fixed this:

    // A thin wrapper around std::hash, performing an additional simple mixing step of the result.
    template <typename T>
    struct hash : public std::hash<T> {
        
        size_t operator()(T const& obj) const
            noexcept(noexcept(std::declval<std::hash<T>>().operator()(std::declval<T const&>()))) {
            // call base hash
            auto result = std::hash<T>::operator()(obj);
            // return mixed of that, to be save against identity has
            return hash_int(static_cast<uint64_t>(result));
        }
    
        template <typename K>
        size_t operator()(K const& obj) const
            noexcept(noexcept(std::declval<std::hash<T>>().operator()(std::declval<K const&>()))) {
            // call base hash
            auto result = std::hash<T>::operator()(obj);
            // return mixed of that, to be save against identity has
            return hash_int(static_cast<uint64_t>(result));
        }
    };
    

    I am not sure if this is sufficiently careful, but it seems to work.

    bug 
    opened by bredelings 1
  • Heterogeneous lookup

    Heterogeneous lookup

    Hello, first of all I would like to thank you for your library. I noticed that you added a “Heterogeneous lookup” in the latest updates, however, I still couldn’t get it working.

    struct string_hash
    {
    	using is_transparent = void;
    	
    	std::size_t operator()(const std::string& key)	const { return robin_hood::hash_bytes(key.c_str(), key.size()); }
    	std::size_t operator()(std::string_view key)	const { return robin_hood::hash_bytes(key.data(), key.size()); }
    	std::size_t operator()(const char* key)		const { return robin_hood::hash_bytes(key, std::strlen(key)); }
    };
    
    struct string_equal
    {
    	using is_transparent = int;
    
    	bool operator()(std::string_view lhs, const std::string& rhs) const {
    		const std::string_view view = rhs;
    		return lhs == view;
    	}
    
    	bool operator()(const char* lhs, const std::string& rhs) const {
    		return std::strcmp(lhs, rhs.c_str()) == 0;
    	}
    
    	bool operator()(const std::string& lhs, const std::string& rhs) const {
    		return lhs == rhs;
    	}
    };
    
    int main()
    {
    	robin_hood::unordered_flat_map<std::string, uint64_t, string_hash, string_equal> map;
    	std::string_view key = "key";
    	map.emplace(key, 100);
    	const auto it = map.find(key); // fail
    }
    
    bug 
    opened by NIKEA-SOFT 2
  • Debugflag to enable iterator stability error check

    Debugflag to enable iterator stability error check

    Add a define that enables error checking in robin_hood map, to check for invalid uses of iteratos after map/set is modified. This is mostly for debugging.

    • Add a flag ROBIN_HOOD_ITERATOR_STABILITY_CHECK or something like that which enables/disables this code part:
      • Add counter member variable to robin_hood (initialized e.g. with a random number or just zero)
      • Counter is increased whenever the map is modified
      • All iterators get this counter when counter
      • accessing the iterator checks if the iterator's counter is the same as the map's counter, throws an exception otherwise

    Thanks John for suggesting this!

    enhancement 
    opened by martinus 0
  • Investigate non-throwing overflow resolution

    Investigate non-throwing overflow resolution

    For extremely bad hash, overflow can happen. Investigate how to resolve this gracefully (and without losing performance!)

    quality 
    opened by martinus 18
Releases(3.11.3)
  • 3.11.3(Jul 12, 2021)

  • 3.11.2(May 16, 2021)

  • 3.11.1(Mar 25, 2021)

    Direct download: robin_hood.h

    • Fix compiling with MSVC 2015 #118, thanks @jeremyg-lunarg!
    • Support insert with initializer list, found by @xuanqing94
    • Updated copyright to 2021
    • overflow fix: rehash with different hash instead (#121). This uses a pretty nice method to get rid of potential overflow problems, by rehashing with a slightly modified hash. Thanks for @slavenf for making the problem reproducible!
    Source code(tar.gz)
    Source code(zip)
  • 3.10.0(Mar 14, 2021)

    Direct download: robin_hood.h

    • new API compact(): if possible reallocates the map to a smaller one.
    • Improve CMake integration in #105, thanks @Ryan-rsm-McKenzie!
    • Switch to murmurhash3 in hash_int (it's not slower, but much higher quality hash)
    • try_emplace and insert_or_assign now use a single lookup instead of two, in #116. Thanks @GilesBathgate for the proof of concept!
    Source code(tar.gz)
    Source code(zip)
  • 3.9.1(Nov 13, 2020)

  • 3.9.0(Oct 25, 2020)

    Direct download: robin_hood.h

    • README updated with conan
    • Travis now builds on OSX, ARM, S390 too.
    • better intrinsics handling, can now be manually disabled with #define ROBIN_HOOD_DISABLE_INTRINSICS
    • Support MSVC's wchar_t
    • [[noreturn]] doThrow
    • use std::free and std::malloc
    • fixed explicit ctor warning
    Source code(tar.gz)
    Source code(zip)
  • 3.8.0(Jul 18, 2020)

    Direct download: robin_hood.h

    • Add member functions from C++17 #79
    • Fix is_transparent typedef detection #78
    • Add Conan package manager usage example #81
    • Support hash for smart pointers, string views, and basic_string
    • Updated hash (again...) to not use CRC. It's still not optimal :-(

    Thanks @k0zmo, @Talkless for your contributions!

    Source code(tar.gz)
    Source code(zip)
  • 3.7.0(Jun 25, 2020)

    Direct download: robin_hood.h

    • Improved iteration speed
    • Updated with new hash algorithm
    • C++17 fix for VS2019
    • Heterogeneous lookup support (#72)
    • Minor bug & compiler warning fixes

    Thanks @wolfpld, @gergondet, @wyattoday for your contributions!

    Source code(tar.gz)
    Source code(zip)
  • 3.6.0(Mar 14, 2020)

  • 3.5.2(Feb 29, 2020)

  • 3.5.1(Feb 18, 2020)

  • 3.5.0(Jan 25, 2020)

  • 3.4.4(Jan 18, 2020)

  • 3.4.3(Nov 10, 2019)

    Direct download: robin_hood.h

    • Fixed long standing iteration & deletion bug #42 - Now there are no known bugs left!
    • Fixed -Werror=free-nonheap-object problem with g++-7
    • Cmake interface library target now working #52
    • update LINT's
    • updated doctest to 2.3.5
    Source code(tar.gz)
    Source code(zip)
  • 3.4.1(Oct 23, 2019)

  • 3.4.0(Aug 1, 2019)

    Direct download: robin_hood.h

    • Added lots of noexcept
    • Code now works with -fno-exceptions, instead of throw I simply call abort();.
    • More stringent clang-tidy checks
    • nodiscard for C++17
    • improved iterator consistency
    • Added some tests for better code coverage
    Source code(tar.gz)
    Source code(zip)
  • 3.3.2(Jun 26, 2019)

  • 3.3.1(Jun 26, 2019)

  • 3.3.0(Jun 23, 2019)

  • 3.2.15(Jun 6, 2019)

    Direct download: robin_hood.h

    This bugfix release contains a critical fix: assigning to moved map was broken. Please upgrade!

    • assign to moved map fixed
    • swap member fixed
    • Thanks @wyattoday for cleanup and ARM64 fixes!
    Source code(tar.gz)
    Source code(zip)
  • 3.2.14(May 31, 2019)

  • 3.2.13(May 31, 2019)

  • 3.2.7(May 3, 2019)

    Important robin_hood.h stability fixes and improvements:

    • fix include for old clang on darwin build
    • no more static object required: got rid of the static dummybyte, reusing mMask when it's still empty.
    • Per the standard, out-of-bounds pointer were undefined behavior. Now no out-of-bounds pointers are used any more.
    • fixed a count() and .at() bug for empty maps
    • fixed move of empty map
    • Much more stringent compile warnings, with fixes: loads of cast warnings, __int128, fallthrough
    • visual studio compile fix

    Lots of infrastructure improvements:

    • Complete rewrite of all unit tests
    • switched from catch2 to doctest: much faster builds
    • cmake all the things!
    Source code(tar.gz)
    Source code(zip)
  • 3.2.3(Apr 23, 2019)

    This is a bugfix release concering the overflow handling

    • Now 127 objects an be hashed to the same bucket before overflowing.
    • Make sure the map stays in consistent state when overflow exception is thrown
    Source code(tar.gz)
    Source code(zip)
  • 3.2.0(Feb 7, 2019)

  • 3.1.0(Feb 7, 2019)

  • 3.0.1(Feb 6, 2019)

  • 3.0.0(Jan 31, 2019)

    3.0.0 because of incompatible API Changes. Renamed flat_map into unordered_flat_map and node_map into unordered_node_map to prevent confusion with other flat_map implementations that use a sorted array.

    value_type for node now uses const Key.

    Source code(tar.gz)
    Source code(zip)
  • 2.4.0(Jan 27, 2019)

  • 2.3.1(Jan 26, 2019)

    Much faster iteration, about 50% speedup. Now robin_hood is by far the fastest in that regard.

    Some minor fixes, more tests, more benchmarks, 32bit build for windows & linux & all tests work.

    Source code(tar.gz)
    Source code(zip)
C++ implementation of a fast hash map and hash set using robin hood hashing

A C++ implementation of a fast hash map and hash set using robin hood hashing The robin-map library is a C++ implementation of a fast hash map and has

Thibaut Goetghebuer-Planchon 579 Jul 27, 2021
A c++ toolbox of locality-sensitive hashing (LSH), provides several popular LSH algorithms, also support python and matlab.

LSHBOX-0.9 A C++ Toolbox of Locality-Sensitive Hashing for Large Scale Image Retrieval, Also Support Python and MATLAB. Change Log 2015.07.04 A new LS

null 251 Jul 6, 2021
C++ implementation of a fast hash map and hash set using hopscotch hashing

A C++ implementation of a fast hash map and hash set using hopscotch hashing The hopscotch-map library is a C++ implementation of a fast hash map and

Thibaut Goetghebuer-Planchon 479 Jul 22, 2021
A fast, memory efficient hash map for C++

I now recommend using the parallel hashmap instead of sparsepp, unless if you are stuck with a non C++11 compatible compiler, or if using a little bit

Gregory Popovitch 1.1k Jul 23, 2021
🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes

The Piecewise Geometric Model index (PGM-index) is a data structure that enables fast lookup, predecessor, range searches and updates in arrays of bil

Giorgio Vinciguerra 528 Jul 14, 2021
A drop-in replacement for std::list with 293% faster insertion, 57% faster erasure, 17% faster iteration and 77% faster sorting on average. 20-24% speed increase in use-case testing.

plf_list A drop-in replacement for std::list with (on average): 293% faster insertion 57% faster erasure 17% faster iteration 77% faster sorting 70% f

Matt Bentley 105 Jul 19, 2021
ring-span lite - A C++yy-like ring_span type for C++98, C++11 and later in a single-file header-only library

ring-span lite: A circular buffer view for C++98 and later Contents Example usage In a nutshell Dependencies Installation Synopsis Reported to work wi

Martin Moene 102 Jul 19, 2021