C++ small containers

Overview

Small

C++ Small Containers

Small


  • Applications usually contain many auxiliary small data structures for each large collection of values. Container implementations often include several optimizations for the case when they are small.
  • These optimizations cannot usually make it to the STL because of ABI compatibility issues. Users might need to reimplement these containers or rely on frameworks that include these implementations.
  • Depending on large library collections for simple containers might impose a cost on the user that's higher than necessary and hinder collaboration on the evolution of these containers.
  • This library includes independent implementations of the main STL containers optimized for the case when they are small.

Build Status Latest Release Documentation Discussions


Facebook QZone Weibo Reddit Twitter LinkedIn WhatsApp Line.me Telegram.me HackerNews


READ THE DOCUMENTATION FOR A QUICK START AND EXAMPLES

Table of Contents

Quickstart

Integration:

=== "CMake"

=== "Add subdirectory"

    ```bash
    git clone https://github.com/alandefreitas/small/
    ```

    ```cmake
    add_subdirectory(small)
    # ...
    add_executable(your_target main.cpp)
    target_link_libraries(your_target PUBLIC small::small)
    ```

=== "Fetch content"

    ```cmake
    include(FetchContent)
    
    FetchContent_Declare(small
        GIT_REPOSITORY https://github.com/alandefreitas/small
        GIT_TAG origin/master # or whatever tag you want
    )

    FetchContent_GetProperties(small)
    if(NOT small_POPULATED)
        FetchContent_Populate(small)
        add_subdirectory(${small_SOURCE_DIR} ${small_BINARY_DIR} EXCLUDE_FROM_ALL)
    endif()

    # ...
    add_executable(your_target main.cpp)
    target_link_libraries(your_target PUBLIC small::small)
    ```

=== "External package"

    ```cmake
    # Follow installation instructions and then... 
    find_package(small REQUIRED)
    if(NOT small_FOUND)
        # Throw or put your FetchContent script here
    endif()

    # ...
    add_executable(your_target main.cpp)
    target_link_libraries(your_target PUBLIC small::small)
    ```

=== "Install"

!!! note

    Get the binary package from the [release section](https://github.com/alandefreitas/small/releases). 

    These binaries refer to the latest release version of small.

!!! hint
    
    If you need a more recent version of `small`, you can download the binary packages from the CI artifacts or build the library from the source files.

=== "Build from source"

` !!! note "Setting C++ Compiler" If your C++ compiler that supports C++17 is not your default compiler, make sure you provide CMake with the compiler location with the DCMAKE_C_COMPILER and DCMAKE_CXX_COMPILER options. For instance: ```bash cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-O2" -DCMAKE_C_COMPILER=/usr/bin/gcc-8 -DCMAKE_CXX_COMPILER=/usr/bin/g++-8 ``` ">
!!! note

    Ensure your C++ compiler and CMake are up-to-date and then:

=== "Ubuntu + GCC"

    ```bash
    # Create a new directory
    mkdir build
    cd build
    # Configure
    cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-O2"
    # Build
    sudo cmake --build . --parallel 2 --config Release
    # Install 
    sudo cmake --install .
    # Create packages
    sudo cpack .
    ```

=== "Mac Os + Clang"

    ```bash
    # Create a new directory
    mkdir build
    cd build
    # Configure
    cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-O2"
    # Build
    cmake --build . --parallel 2 --config Release
    # Install 
    cmake --install .
    # Create packages
    cpack .
    ```

=== "Windows + MSVC"

    ```bash
    # Create a new directory
    mkdir build
    cd build
    # Configure
    cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="/O2"
    # Build
    cmake --build . --parallel 2 --config Release
    # Install 
    cmake --install .
    # Create packages
    cpack .
    ```

!!! hint "Parallel Build"
    
    Replace `--parallel 2` with `--parallel `

!!! note "Setting C++ Compiler"

    If your C++ compiler that supports C++17 is not your default compiler, make sure you provide CMake with the compiler location with the DCMAKE_C_COMPILER and DCMAKE_CXX_COMPILER options. For instance:

    ```bash
    cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-O2" -DCMAKE_C_COMPILER=/usr/bin/gcc-8 -DCMAKE_CXX_COMPILER=/usr/bin/g++-8
    ```

=== "File amalgamation"

!!! note

    Because containers are header-only, you can directly copy the contents from the `source` directory into your project.

!!! hint

    In that case, you are responsible for setting the appropriate target include directories and any compile definitions you might require.

Once the library is properly integrated, you can create containers from the namespace small like any other STL container:

--8<-- "examples/main.cpp"

All containers are optimized for the case when they're small but also efficient when they are large. The containers mix the common techniques found in other small container libraries:

  • Inline allocation for small containers
  • Custom expected sizes
  • Identification of relocatable types
  • Custom growth factors with better defaults
  • Communication with system memory allocators
  • Explicit consideration of CPU cache sizes and branch prediction

Most applications have many small lists and sets of elements. These containers avoid spending a lot of time with large containers that contain just a few elements. Small containers usually try to use the stack before dynamically allocating memory and try to represent associative containers with stack arrays, unless these sets are very large.

The following containers are available:

  • small::vector
  • small::max_size_vector
  • small::string
  • small::set
  • small::max_size_set
  • small::multiset
  • small::max_size_multiset
  • small::unordered_set
  • small::max_size_unordered_set
  • small::unordered_multiset
  • small::max_size_unordered_multiset
  • small::map
  • small::max_size_map
  • small::multimap
  • small::max_size_multimap
  • small::unordered_map
  • small::max_size_unordered_map
  • small::unordered_multimap
  • small::max_size_unordered_multimap
  • small::stack
  • small::queue
  • small::priority_queue

Although many compilers support small string optimization (SSO) already, this library will ensure all strings support SOO, custom inline sizes, relocation, and unicode.

Vectors

This small vector implementation includes:

  • Inline allocation for small vectors
  • Custom expected size
  • Special treatment of relocatable types
    • Relocatable types can be moved with memcpy, bypassing destructors and constructors.
    • Relocatable types are defined by default for POD types and aggregate types of PODs
    • The small::is_relocatable traits can be used as an extension point for custom types
  • Better growth factors
  • Consider the cache line size in allocations
  • Heap allocations can be disabled with small::max_size_vector

When there are fewer elements than a given threshold, the elements are kept in a stack buffer for small vectors. Otherwise, the vector works as usual. However, if you are 100% sure you will never need more than N elements, you can use a max_size_vector, where elements are always inline.

The default number of elements in a small vector is usually the number of elements we can already fit inline in a vector. For larger data types, the default_inline_storage trait can be used as an extension point where one can define how many elements a small vector of that type should contain by default.

--8<-- "examples/default_inline_storage.cpp"

When there's a reasonable default for the number of inline elements, this strategy avoids multiple vector type instantiations for different inline storage sizes.

This small vector implementation used folly, abseil, and LLVM as a reference.

Strings

The small string includes all the common optimizations for small vectors, and a custom template parameter to set how large we expect the string to be (in bytes).

However, when strings are representing text, if there's one thing that makes them not small is not supporting UTF8. In addition to the common interface for strings, small::string includes extra functions to identify and work with UTF8 code points with random access.

--8<-- "examples/unicode_strings.cpp"

The problem of supporting UTF8 is easier to explain than it is to solve. Programming languages tend to solve this problem by (1) forbidding byte or substring access, and/or (2) allowing only access to code points with O(n) cost, where n is the number of code points. Because anything that forbids byte access would be incompatible with a C++ string, we allow direct byte access, and strings are allowed to be malformed unicode, which we can check with small::is_malformed.

All capacity and access functions contain extra overloads that accept codepoint indexes, defined as a strong type, rather than byte indexes. By using these functions, one can ensure the string is never malformed. It's up to the user to decide whether these access functions are useful and worth it in a particular application.

Access to codepoints is provided with an inline lookup-table trick that allows us to access codepoints in O(log m) time, where m is the number of multibyte code points in the strings. When there are no multibyte codepoints in the string, the string works as usual and no extra memory is required for the table.

Sets and Maps

The small set/map classes use a more cache-friendly flat set/map and all other optimizations mentioned above for internal algorithms. As with other small containers, a custom template parameter can be used to define the number of inline elements in the container.

The small::default_inline_storage and small::is_relocatable trait can also be defined for custom types, and all the usual set/map, ordered/unordered, uni/multi variants are also provided:

--8<-- "examples/associative.cpp"

Unlike a small::vector or small::string, the asymptotic time complexities of flat sets/maps are very different from their std:: counterparts and should only be used when they are small. Because they are internally implemented as arrays, manipulating these containers costs O(n).

For large containers, you can use std containers with custom allocators. Or for efficient large containers, you can use the abseil containers, implemented as B+-trees.

Contributing

Get Involved

  • Discussions are concentrated on our GitHub discussions page. Don't refrain from asking questions and proposing ideas. If this library helps you create something interesting, please divulge it with the community.
  • If you are a programmer with good ideas, please share these ideas with us.
  • Academic collaboration is more than welcome. It'd be great to see this library help people write papers.

Ideas and Roadmap

Feel free to contribute new features to this library. For complex features and changes, consider getting feedback from the community first. Contributing to an existing code base with its conventions might seem obscure at first but please don't let that discourage you from sharing your ideas.

There are many ways in which you can contribute to this library:

  • Testing the library in new environments see 1, 2, 3
  • Contributing with interesting examples see 1
  • Finding problems in this documentation see 1
  • Finding bugs in general see 1, 2, 3, 4
  • Whatever idea seems interesting to you

The only thing we ask you is to make sure your contribution is not destructive. Some contributions in which we are not interested are:

  • "I don't like this optional feature, so I removed/deprecated it"
  • "I removed this feature to support older versions of C++" but have not provided an equivalent alternative
  • "I removed this feature, so I don't have to install/update ______" but have not provided an equivalent alternative
  • "I'm creating this high-cost promise that we'll support ________ forever" but I'm not sticking around to keep that promise

In doubt, please open a discussion first

Guidelines

If contributing with code, please leave all warnings ON (-DSMALL_BUILD_WITH_PEDANTIC_WARNINGS=ON), use cppcheck, and clang-format.

If contributing to the documentation, please edit README.md directly, as the files in ./docs are automatically generated with mdsplit.

Contributors

alandefreitas
Alan De Freitas

References

These are some references we used for this work:

Comments
  • Build with Visual Studio 2022 17.2.1

    Build with Visual Studio 2022 17.2.1

    Did you try to build with the latest MSVC ? Here it fails while building the supplied small example project, with some strange errors like:

    1>D:\small\include\small/vector.hpp(410,1): error C2065: 'is_invocable_v': undeclared identifier
    1>D:\small\include\small/vector.hpp(1729): message : see reference to class template instantiation 'small::vector<T,N,Allocator,AllowHeap,SizeType,GrowthFactor>' being compiled
    1>D:\small\include\small/vector.hpp(410,74): error C2988: unrecognizable template declaration/definition
    
    bug 
    opened by Philippe91 6
  • Isolate CMake dev-tools

    Isolate CMake dev-tools

    This PR proposes a CMake file structure refactoring based on the futures project. I also added the linter used in futures.

    Let me know if I did something unnecessary.

    opened by marcosfpr 5
  • fix: Fix the linter handling numeric headers

    fix: Fix the linter handling numeric headers

    As discussed here #8. I started with a simple addition to linter's regex. I'll check the C++ grammar for valid chars in macros to see if we aren't handling something.

    opened by marcosfpr 3
  • Constrain contributors bot

    Constrain contributors bot

    Feature category

    • [ ] enhancement - build system
    • [ ] enhancement - backends
    • [ ] enhancement - build system
    • [x] enhancement - documentation
    • [ ] enhancement - plot categories

    The problem As discussed here #7, having a bot applying documentation changes in forks of arbitrary branches is kind of annoying to contributors in general because now their local branches are suddenly not in sync with their own fork. So, we might address this problem by constraining the bot to update the documentation only for specific branches.

    The solution I'd like

    A possible solution is to edit CI .yml files: https://github.com/alandefreitas/small/blob/9795e1f60eab809b200fc26ad3e7d05fef3cac42/.github/workflows/docs.yml#L3 and https://github.com/alandefreitas/small/blob/9795e1f60eab809b200fc26ad3e7d05fef3cac42/.github/workflows/build.yml#L3 constraining the bot only to specific branches.

    Additional context

    bug 
    opened by marcosfpr 3
  • Make sure destructor gets called when erasing range

    Make sure destructor gets called when erasing range

    Make sure destructor of underlying objects gets called when erasing range. In some cases destructor was not called, leading to memory leaks and other problems.

    opened by tomasz-szyszko 1
  • Linter can't handle numeric headers.

    Linter can't handle numeric headers.

    Bug category

    • [ ] bug - compilation error
    • [ ] bug - compilation warning
    • [ ] bug - runtime error
    • [ ] bug - runtime warning
    • [X] bug - logic error

    Describe the bug

    As discussed here #7, the linter has issues when handling numeric headers. For example, if we have the header SMALL_UTF8_STRING_H, the linter will change it to SMALL_STRING_H8_STRING_H, which is considered unexpected behavior.

    A possibly solution to that issue is refactoring the regex at https://github.com/marcosfpr/small/blob/c4e64a5f3114d610524342914c6c18e262176c27/dev-tools/linter/main.cpp#L213.

    Platform

    • [x] cross-platform issue - linux
    • [x] cross-platform issue - windows
    • [x] cross-platform issue - macos

    Environment Details:

    • OS:
    • OS Version:
    • Compiler:
    • Compiler version:

    Additional context

    bug 
    opened by marcosfpr 1
  • Get rid of cpp_modules scripts

    Get rid of cpp_modules scripts

    The problem

    The lib doesn't have any non-dev dependencies. Catch2 is only used for tests. There's no point in the complexity of the cpp_modules.cmake script.

    Proposed solution

    Remove the cpp_modules.cmake script and just use find_package/FetchContent for the tests, as in this example.

    Alternatives I've considered

    Maintaining things as they are.

    Additional context

    cpp_modules.cmake is there for historical reasons only.

    enhancement 
    opened by alandefreitas 0
  • Bad linter inferences on MSVC

    Bad linter inferences on MSVC

    The problem

    The linter step is inferring the wrong header guards (with / instead of _) on Windows + MSVC.

    Steps to Reproduce

    Same usual build steps as the docs.

    mkdir build
    cd build
    cmake .. <options> -DCMAKE_BUILD_TYPE=Release
    

    Output

    The linter steps have the following output on Windows + MSVC:

    Convert guard from SMALL_DETAIL_ALGORITHM_CONSOLE_UNICODE_GUARD_HPP to SMALL\DETAIL\ALGORITHM\CONSOLE_UNICODE_GUARD_HPP
      Inferred guard SMALL\DETAIL\ALGORITHM\CONSOLE_UNICODE_GUARD_HPP is not OK
      Convert guard from SMALL_DETAIL_ALGORITHM_INTCMP_HPP to SMALL\DETAIL\ALGORITHM\INTCMP_HPP
      Inferred guard SMALL\DETAIL\ALGORITHM\INTCMP_HPP is not OK
      Convert guard from SMALL_DETAIL_ALGORITHM_LEADING_ZEROS_HPP to SMALL\DETAIL\ALGORITHM\LEADING_ZEROS_HPP
      Inferred guard SMALL\DETAIL\ALGORITHM\LEADING_ZEROS_HPP is not OK
      Convert guard from SMALL_DETAIL_ALGORITHM_SHIFT_HPP to SMALL\DETAIL\ALGORITHM\SHIFT_HPP
      Inferred guard SMALL\DETAIL\ALGORITHM\SHIFT_HPP is not OK
    

    Platform

    • [ ] cross-platform issue - linux
    • [x] cross-platform issue - windows
    • [ ] cross-platform issue - macos
    • [ ] other: ___________

    Environment Details

    • OS: Windows
    • OS Version: 11
    • Compiler: MSVC
    • Compiler version: 19.29.30145.0

    Proposed solution

    Debugging and fixing.

    Alternatives I've considered

    No fixing it and not using the linter on MSVC

    Additional context

    The linter has no unit tests, so this kind of error is expected.

    bug 
    opened by alandefreitas 0
  • Doxygen Documentation

    Doxygen Documentation

    I'm starting to develop the Doxygen documentation for Small. Currently, I've just managed to get the structure done.

    For the following steps, I'd fix some issues that may appear on this structure and add new documentation Markdown files to create more documentation as we have in the Futures project.

    opened by marcosfpr 7
  • Move public traits from detail dir

    Move public traits from detail dir

    The problem

    Traits such as default_inline_storage.h and is_relocatable.h are not implementation details. They should go into small/traits/...h instead.

    Proposed solution

    Move default_inline_storage.h and is_relocatable.h into small/traits/...h.

    Alternatives I've considered

    Detail directory.

    Additional context

    These traits got moved there for historical reasons, as they were not public at the time.

    enhancement 
    opened by alandefreitas 0
  • Improve enable_allocator_from_this

    Improve enable_allocator_from_this

    The problem

    enable_allocator_from_this is used to store the allocator with EBO. Its interface should be improved to reflect the proper API for EBO.

    Proposed solution

    https://github.com/alandefreitas/futures/blob/master/include/futures/detail/utility/maybe_empty.hpp

    Alternatives I've considered

    No EBO won't work for our classes. Maintaining the current API is not safe.

    Additional context

    EBO makes the containers 8 bytes smaller when the allocator is not stateful, such as std::allocator.

    enhancement 
    opened by alandefreitas 0
  • Remove strong_type dependency

    Remove strong_type dependency

    The problem

    The library has a single strong type, which is the codepoint index for strings. There's no need to embed an external library for that.

    Proposed solution

    Simply declare the strong_type we need as an implementation detail.

    References:

    • https://en.cppreference.com/w/cpp/types/byte
    • https://github.com/alandefreitas/futures/blob/master/include/futures/detail/utility/byte.hpp

    Alternatives I've considered

    Using an alias won't work because it would be implicitly convertible to std::size_t

    Additional context

    A strong type is important to differentiate codepoint indexes from byte indexes in the UTF8 strings.

    enhancement 
    opened by alandefreitas 0
  • Improve range trait

    Improve range trait

    The problem

    The range trait does not work properly for all range types.

    Proposed solution

    A new range trait using the ADL std::begin/end functions, such as this:

    • https://github.com/alandefreitas/futures/blob/master/include/futures/algorithm/traits/is_range.hpp

    Other traits, such as is_iterator can also come from this reference repo.

    Alternatives I've considered

    The current implementation and the C++20 trait.

    Additional context

    This traits is very simple and it was enough for our purposes, but it's quite easy to change that to the correct traits.

    enhancement 
    opened by alandefreitas 0
Releases(v0.1.2)
Owner
Alan de Freitas
🔭 I’m currently working on matplotplusplus and pareto-front. 🦎 I specialize in evolutionary computation and computational intelligence in general.
Alan de Freitas
A small self-contained alternative to readline and libedit

Linenoise A minimal, zero-config, BSD licensed, readline replacement used in Redis, MongoDB, and Android. Single and multi line editing mode with the

Salvatore Sanfilippo 3k Sep 22, 2022
A small and portable INI file library with read/write support

minIni minIni is a portable and configurable library for reading and writing ".INI" files. At just below 900 lines of commented source code, minIni tr

Thiadmer Riemersma 276 Sep 17, 2022
Small configuration file parser library for C.

libConfuse Introduction Documentation Examples Build & Install Origin & References Introduction libConfuse is a configuration file parser library writ

null 412 Sep 13, 2022
A small proxy DLL which enables dev. console in Mass Effect 1, 2 and 3 (Legendary Edition).

LEBinkProxy A small proxy DLL which enables dev. console in Mass Effect 1, 2 and 3 (Legendary Edition). Usage In your game binary directory (Game\ME?\

null 10 Jan 6, 2022
Small header-only C++ library that helps to initialize Vulkan instance and device object

Vulkan Extensions & Features Help, or VkExtensionsFeaturesHelp, is a small, header-only, C++ library for developers who use Vulkan API.

Adam Sawicki 10 Apr 13, 2022
libparser is a small C library that parses input based on a precompiled context-free grammar.

libparser is a small C library that parses input based on a precompiled context-free grammar.

Mattias Andrée 9 Jul 19, 2022
Small implementation of c++ entity component system inspired by Unity

EntityComponentSystem About This is small implementation of entity component system with C++. The API is heavily inspired by Unity ECS framework, but

Lukas Chodosevičius 2 Oct 13, 2021
A small utility to set the clock on a Hayes Stack Chronograph over its serial port.

chronosync A small utility to set the clock on a Hayes Stack Chronograph over its serial port. Synopsis chronosync [-d] [-s serial speed] <serial devi

joshua stein 1 Oct 1, 2021
A small ip trigger for C++

Hello, You can use this library for get ip adresses from in the string!

Fahrettin Enes 1 Oct 22, 2021
Small Header only library to parse argv for flags

Small Header only library to parse argv for flags

Ben Wernicke 2 Dec 8, 2021
This is a Header-only c++ string implentation which specializes in dealing with small strings. 🧵

string-impl This is a Header-only c++ string implentation which specializes in dealing with small strings. ?? Usage ⌨ Instantiation A string can be in

null 1 Apr 1, 2022
C++ small containers

Applications usually contain many auxiliary small data structures for each large collection of values.

Alan de Freitas 93 Sep 12, 2022
Library of generic and type safe containers in pure C language (C99 or C11) for a wide collection of container (comparable to the C++ STL).

M*LIB: Generic type-safe Container Library for C language Overview M*LIB (M star lib) is a C library enabling to use generic and type safe container i

PpHd 523 Sep 22, 2022
A collection of single-file C libraries. (generic containers, random number generation, argument parsing and other functionalities)

cauldron A collection of single-file C libraries and tools with the goal to be portable and modifiable. Libraries library description arena-allocator.

Camel Coder 33 Sep 17, 2022
Collection of C++ containers extracted from LLVM

lvc lvc is a set of C++ containers extracted form LLVM for an easier integration in external projects. To avoid any potential conflit, the llvm namesp

Benjamin Navarro 26 Apr 22, 2022
Netdata's distributed, real-time monitoring Agent collects thousands of metrics from systems, hardware, containers, and applications with zero configuration.

Netdata is high-fidelity infrastructure monitoring and troubleshooting. It runs permanently on all your physical/virtual servers, containers, cloud deployments, and edge/IoT devices, and is perfectly safe to install on your systems mid-incident without any preparation.

netdata 60.7k Sep 21, 2022
Allocator Aware Containers Made Easy

Custom Allocator Aware containers made easy with "wit" Have you ever looked at the allocator specification and thought about how hard it would be to

Darth Rubik 4 Sep 25, 2021
A family of header-only, very fast and memory-friendly hashmap and btree containers.

The Parallel Hashmap Overview This repository aims to provide a set of excellent hash map implementations, as well as a btree alternative to std::map

Gregory Popovitch 1.6k Sep 21, 2022