STX B+ Tree C++ Template Classes

Overview

STX B+ Tree C++ Template Classes v0.9

Author: Timo Bingmann (Mail: tb a-with-circle panthema dot net)
Date: 2013-05-05

The STX B+ Tree package is obsolete.

The B+ Tree code was merged into the TLX Library:

https://github.com/tlx/tlx/tree/master/tlx/container

This is an improved version with better STL semantics and it will be maintain in the TLX library.

Summary

The STX B+ Tree package is a set of C++ template classes implementing a B+ tree key/data container in main memory. The classes are designed as drop-in replacements of the STL containers set, map, multiset and multimap and follow their interfaces very closely. By packing multiple value pairs into each node of the tree the B+ tree reduces heap fragmentation and utilizes cache-line effects better than the standard red-black binary tree. The tree algorithms are based on the implementation in Cormen, Leiserson and Rivest's Introduction into Algorithms, Jan Jannink's paper and other algorithm resources. The classes contain extensive assertion and verification mechanisms to ensure the implementation's correctness by testing the tree invariants. To illustrate the B+ tree's structure a wxWidgets demo program is included in the source package.

Website / API Docs / Bugs / License

The current source package can be downloaded from http://panthema.net/2007/stx-btree/

The include files are extensively documented using doxygen. The compiled doxygen html documentation is included in the source package. It can also be viewed online at http://panthema.net/2007/stx-btree/stx-btree-0.9/doxygen-html/

The wxWidgets B+ tree demo program is located in the directory wxbtreedemo. Compiled binary versions can be found on the package web page.

If bugs should become known they will be posted on the above web page together with patches or corrected versions.

The B+ tree template source code is released under the Boost Software License, Version 1.0, which can be found at the header of each include file.

All auxiliary programs like the wxWidgets demo, test suite and speed tests are licensed under the GNU General Public License v3 (GPLv3), which can be found in the file COPYING.GPLv3.

Original Idea

The idea originally arose while coding a read-only database, which used a huge map of millions of non-sequential integer keys to 8-byte file offsets. When using the standard STL red-black tree implementation this would yield millions of 20-byte heap allocations and very slow search times due to the tree's height. So the original intension was to reduce memory fragmentation and improve search times. The B+ tree solves this by packing multiple data pairs into one node with a large number of descendant nodes.

In computer science lectures it is often stated that using consecutive bytes in memory would be more cache-efficient, because the CPU's cache levels always fetch larger blocks from main memory. So it would be best to store the keys of a node in one continuous array. This way the inner scanning loop would be accelerated by benefiting from cache effects and pipelining speed-ups. Thus the cost of scanning for a matching key would be lower than in a red-black tree, even though the number of key comparisons are theoretically larger. This second aspect aroused my academic interest and resulted in the speed test experiments.

A third inspiration was that no working C++ template implementation of a B+ tree could be found on the Internet. Now this one can be found.

Implementation Overview

This implementation contains five main classes within the stx namespace (blandly named Some Template eXtensions). The base class btree implements the B+ tree algorithms using inner and leaf nodes in main memory. Almost all STL-required function calls are implemented (see below for the exceptions). The asymptotic time requirements of the STL standard are theoretically not always fulfilled. However in practice this B+ tree performs better than the STL's red-black tree at the cost of using more memory. See the speed test results for details.

The base class is then specialized into btree_set, btree_multiset, btree_map and btree_multimap using default template parameters and facade functions. These classes are designed to be drop-in replacements for the corresponding STL containers.

The insertion function splits the nodes on recursion unroll. Erase is largely based on Jannink's ideas. See http://dbpubs.stanford.edu:8090/pub/1995-19 for his paper on "Implementing Deletion in B+-trees".

The two set classes (btree_set and btree_multiset) are derived from the base implementation class btree by specifying an empty struct as data_type. All functions are adapted to provide the base class with empty placeholder objects. Note that it is somewhat inefficient to implement a set or multiset using a B+ tree: a plain B tree (without +) would hold no extra copies of the keys. The main focus was on implementing the maps.

Problem with Separated Key/Data Arrays

The most noteworthy difference to the default red-black tree implementation of std::map is that the B+ tree does not hold key/data pairs together in memory. Instead each B+ tree node has two separate arrays containing keys and data values. This design was chosen to utilize cache-line effects while scanning the key array.

However it also directly generates many problems in implementing the iterators' operators. These return a (writable) reference or pointer to a value_type, which is a std::pair composition. These data/key pairs however are not stored together and thus a temporary copy must be constructed. This copy should not be written to, because it is not stored back into the B+ tree. This effectively prohibits use of many STL algorithms which writing to the B+ tree's iterators. I would be grateful for hints on how to resolve this problem without folding the key and data arrays.

Test Suite

The B+ tree distribution contains an extensive test suite. According to gcov 90.9% of the btree.h implementation is covered.

STL Incompatibilities

Key and Data Type Requirements

The tree algorithms currently do not use copy-construction. All key/data items are allocated in the nodes using the default-constructor and are subsequently only assigned new data (using operator=).

Key Iterators' Operators

The most important incompatibility are the non-writable operator* and operator-> of the iterator. See above for a discussion of the problem on separated key/data arrays. Instead of *iter and iter-> use the new function iter.data() which returns a writable reference to the data value in the tree.

Key Erase Functions

The B+ tree supports three erase functions:

size_type erase(const key_type &key); // erase all data pairs matching key bool erase_one(const key_type &key); // erase one data pair matching key void erase(iterator iter); // erase pair referenced by iter

The following STL-required function is not supported:

void erase(iterator first, iterator last);

Extensions

Beyond the usual STL interface the B+ tree classes support some extra goodies.

// Output the tree in a pseudo-hierarchical text dump to std::cout. This
// function requires that BTREE_DEBUG is defined prior to including the btree
// headers. Furthermore the key and data types must be std::ostream printable.
void print() const;

// Run extensive checks of the tree invariants. If a corruption in found the
// program will abort via assert(). See below on enabling auto-verification.
void verify() const;

// Serialize and restore the B+ tree nodes and data into/from a binary image.
// This requires that the key and data types are integral and contain no
// outside pointers or references.
void dump(std::ostream &os) const;
bool restore(std::istream &is);

B+ Tree Traits

All tree template classes take a template parameter structure which holds important options of the implementation. The following structure shows which static variables specify the options and the corresponding defaults:

struct btree_default_map_traits
{
    // If true, the tree will self verify it's invariants after each insert()
    // or erase(). The header must have been compiled with BTREE_DEBUG
    // defined.
    static const bool   selfverify = false;

    // If true, the tree will print out debug information and a tree dump
    // during insert() or erase() operation. The header must have been
    // compiled with BTREE_DEBUG defined and key_type must be std::ostream
    // printable.
    static const bool   debug = false;

    // Number of slots in each leaf of the tree. Estimated so that each node
    // has a size of about 128 bytes.
    static const int    leafslots =
                             MAX( 8, 128 / (sizeof(_Key) + sizeof(_Data)) );

    // Number of slots in each inner node of the tree. Estimated so that each
    // node has a size of about 128 bytes.
    static const int    innerslots =
                             MAX( 8, 128 / (sizeof(_Key) + sizeof(void*)) );

    // As of stx-btree-0.9, the code does linear search in find_lower() and
    // find_upper() instead of binary_search, unless the node size is larger
    // than this threshold. See notes at
    // http://panthema.net/2013/0504-STX-B+Tree-Binary-vs-Linear-Search
    static const size_t binsearch_threshold = 256;
};

Speed Tests

The implementation was tested using the speed test sources contained in the package. For a long discussion on results please see the web page

http://panthema.net/2007/stx-btree/

Comments
  • Condition set permanently to false

    Condition set permanently to false

    In the following line, the condition is always false, thus linear search is always performed. Is it on purpose?

    https://github.com/bingmann/stx-btree/blob/f27df4c7b6a7a776390ec2825dcfc1547bddb714/include/stx/btree.h#L1636

    opened by gallushi 1
  • perfomance issue

    perfomance issue

    inline bool key_lessequal(const key_type& a, const key_type b) const inline bool key_greaterequal(const key_type& a, const key_type b) const

    passing b by value significantly decreases performance

    opened by smirnov-vs 1
  • Fixed __assert() collision with System V ABI.

    Fixed __assert() collision with System V ABI.

    There was an __assert() function defined in tpunit.h which collided with the one specified in System V Application Binary Interface and defined in compatible systems (Linux, Mac OS X) in /usr/include/assert.h. This caused test suite compilation to fail on those systems. I removed the conflict by renaming the function to __stx_assert().

    opened by msiedlarek 1
  • Function erase(iterator /* first */, iterator /* last */) interface has not been implemented

    Function erase(iterator /* first */, iterator /* last */) interface has not been implemented

    #ifdef BTREE_TODO /// Erase all key/data pairs in the range [first,last). This function is /// currently not implemented by the B+ Tree. void erase(iterator /* first /, iterator / last */) { abort(); } #endif

    opened by oracleloyall 0
  • tree_map::erase(iterator ) compile revealing private member error

    tree_map::erase(iterator ) compile revealing private member error

    ::iterator::currslot’ is private within this context [exec] 2778 | if (iter.currslot >= leaf->slotuse) [exec] | ~~~~~^~~~~~~~ [exec] In file included from /third_party/cpp/stx/btree_map.h:28, [exec] from /third_party/cpp/stx/btree_map:28,

     [exec] /third_party/cpp/stx/btree.h:417:33: note: declared private here
     [exec]   417 |         unsigned short          currslot;
     [exec]       |                                 ^~~~~~~~
     [exec] In file included from /third_party/cpp/stx/btree_map.h:28,
     [exec]                  from /third_party/cpp/stx/btree_map:28,
    
    opened by dunevolt 0
  • Fix a small dereference bug

    Fix a small dereference bug

    Hi,

    Just a fix for a tiny bug that I ran into. In addition, I think that this if's body is dead code, since the code in lines 2290-2294 defines that a key cannot be inserted as the last key in the old leaf node.

    opened by ozanani 0
  • stx::btree_multimap equal_range doesn't work as expected

    stx::btree_multimap equal_range doesn't work as expected

    Impossible to iterate through all values of the same key using iterator returned by the equal_range method. I use stx-btree-dev 0.8.6-1 package from ubuntu 12.10. This code demonstrates the issue:

    stx::btree_multimap<long long, long long> m;
    for (long long i = 0; i < 1000L; i++) {
        m.insert(0L, i);
    }
    auto pair = m.equal_range(0L);
    long long count = 0L;
    while(pair.first != pair.second) {
        assert(pair.first->second == count++);
        pair.first++;
    }
    
    opened by Lazin 2
An intrusive C++17 implementation of a Red-Black-Tree, a Weight Balanced Tree, a Dynamic Segment Tree and much more!

This is Ygg (short for Yggdrasil), a C++17 implementation of several intrusive data structures: several balanced binary search trees: a red-black Tree

Lukas Barth 98 Dec 25, 2022
Golang template grammar for tree-sitter

tree-sitter-go-template Golang templates grammar for tree-sitter. NeoVim integration using nvim-treesitter Add gotmpl parser following nvim-treesitter

Nikita Galaiko 28 Nov 30, 2022
A collecton of generic reference counted data structures, tools to create compatible C style classes, and demo applications

The Offbrand library is a collection of reference counted generic data structures written in C for C. The library includes bash scripts to assist in t

Tyler Heck 82 Dec 10, 2022
Dry-comparisons - C++17 Utility classes for comparing multiple values in one simple expression

dry-comparisons This is the source code for the any_of, all_of and none_of C++17 utility mentioned in the blog post "DRY multicomparisons" Public doma

Björn Fahller 204 Jan 4, 2023
SQL grammar for tree sitter

tree-sitter-sql I want to do something fun at work since we have stuff like this in Go: const hoverDocumentQuery = ` -- source: enterprise/internal/co

TJ DeVries 22 Sep 10, 2022
This repository provides implementation of an incremental k-d tree for robotic applications.

ikd-Tree ikd-Tree is an incremental k-d tree designed for robotic applications. The ikd-Tree incrementally updates a k-d tree with new coming points o

HKU-Mars-Lab 362 Jan 4, 2023
Tree sitter grammar for Svelte

Tree-sitter-svelte Tree-sitter grammar for svelte Install npm i tree-sitter-svelte tree-sitter Usage To get started with exploring the grammar in a w

Himujjal Upadhyaya 48 Dec 2, 2022
Device Tree for Redmi K30 Ultra

Copyright (C) 2020 PixelExperience Plus Edition Device configuration for Redmi K30 Ultra ========================================= The Redmi K30 Ultra

Xayah 22 Jun 2, 2022
FleakOS Kernel Source Tree

FleakOS FleakOS Kernel Source Tree Dependencies sudo apt-get install -y xorriso sudo apt-get install -y gcc-multilib sudo apt-get install -y nasm sudo

FleakOS 30 Dec 21, 2022
A tree-sitter grammar for go.mod files

tree-sitter-go-mod tree-sitter grammar for go.mod files. Status The grammar is fairly small, and has been working well for highlighting for me. I expe

Camden Cheek 24 Dec 3, 2022
A tree-sitter grammar for HCL (HashiCorp Configuration Language), used by projects such as Terraform.

tree-sitter-hcl tree-sitter grammar for HCL (HashiCorp Configuration Language) files. HCL is the configuration format used by projects such as Terrafo

Mitchell Hashimoto 65 Nov 29, 2022
A tree-sitter grammar for protocol buffer files (proto3).

tree-sitter-proto tree-sitter grammar for protocol buffer files (proto3 only). Status The grammar should be complete. I'm still working on the highlig

Mitchell Hashimoto 43 Nov 2, 2022
Generic parse tree, configurable lexer, `lemon` parser generator, wrapped for C++17 and Python 3.

This project augments the Lemon parser generator with a high-level parse tree interface, grammar action DSL, and an integrated, configurable lexer allowing the creation of an entire standalone, object-oriented parser from a single input grammar file. The entire parser is written in native C/C++, and the parser interface is made comfortably available to both C++ and Python3 applications.

Aubrey R. Jones 12 Dec 8, 2022
Algo-Tree is a collection of Algorithms and data structures which are fundamentals to efficient code and good software design

Algo-Tree is a collection of Algorithms and data structures which are fundamentals to efficient code and good software design. Creating and designing excellent algorithms is required for being an exemplary programmer. It contains solutions in various languages such as C++, Python and Java.

DSC-Banasthali 53 Oct 4, 2022
Build a tree-sitter dynamic module

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! I should clarify that this module is NOT a standalone tree-sitter module. It is supo

Yuan Fu 53 Jan 5, 2023
RavencoinLite Core integration/staging tree

RavencoinLite Core integration/staging tree https://ravencoinlite.org What is RavencoinLite? RavencoinLite is an experimental digital currency that en

null 24 Oct 12, 2021
tree-sitter parser and syntax highlighter for the Dwarf Fortress raw language

tree-sitter-dfraw A simple language parser and highlighter made with tree-sitter tokyonight nightfly Using with nvim-treesitter Please refer to the ad

null 2 Apr 1, 2022
HEEx grammer for Tree-sitter

Tree-sitter HEEx Tree-sitter grammar and parser for HEEx, the HTML-aware and component-friendly extension of EEx for Phoenix. For Surface support, see

Connor Lay (Clay) 33 Dec 23, 2022