libmdbx is an extremely fast, compact, powerful, embedded, transactional key-value database, with permissive license

Overview

Please refer to the online documentation with C API description and pay attention to the C++ API.

Questions, feedback and suggestions are welcome in the Telegram group.

For news, take a look at the ChangeLog.

libmdbx

libmdbx is an extremely fast, compact, powerful, embedded, transactional key-value database, with a permissive license. libmdbx has a specific set of properties and capabilities, focused on creating unique lightweight solutions.

  1. Allows a swarm of multi-threaded processes to ACIDly read and update several key-value maps and multimaps in a locally-shared database.

  2. Provides extraordinary performance and minimal overhead through Memory-Mapping, and O(log N) operation costs by virtue of B+ tree.

  3. Requires no maintenance and no crash recovery since it doesn't use WAL, but that might be a caveat for write-intensive workloads with durability requirements.

  4. Compact and friendly for fully embedding. Only ≈25KLOC of C11, ≈64K x86 binary code of core, no internal threads and no server process(es), but implements a simplified variant of the Berkeley DB and dbm API.

  5. Enforces serializability for writers just by single mutex and affords wait-free for parallel readers without atomic/interlocked operations, while writing and reading transactions do not block each other.

  6. Guarantees data integrity after a crash unless this was explicitly neglected in favour of write performance.

  7. Supports Linux, Windows, MacOS, Android, iOS, FreeBSD, DragonFly, Solaris, OpenSolaris, OpenIndiana, NetBSD, OpenBSD and other systems compliant with POSIX.1-2008.

Historically, libmdbx is a deeply revised and extended descendant of the amazing Lightning Memory-Mapped Database. libmdbx inherits all benefits from LMDB, but resolves some issues and adds a set of improvements.

The next version is under active non-public development from scratch and will be released as MithrilDB, and as libmithrildb for libraries & packages. Admittedly, mythical Mithril resembles silver but is stronger and lighter than steel. Therefore MithrilDB is a rightly relevant name.

MithrilDB will be radically different from libmdbx by the new database format and API based on C++17, as well as the Apache 2.0 License. The goal of this revolution is to provide a clearer and more robust API, and to add more features and new valuable properties of the database.

Telegram group: https://t.me/libmdbx

The Future will (be) Positive. Everything will be fine.



Characteristics

Features

  • Key-value data model, keys are always sorted.

  • Fully ACID-compliant, through MVCC and CoW.

  • Multiple key-value sub-databases within a single datafile.

  • Range lookups, including range query estimation.

  • Efficient support for short fixed length keys, including native 32/64-bit integers.

  • Ultra-efficient support for multimaps. Multi-values sorted, searchable and iterable. Keys stored without duplication.

  • Data is memory-mapped and accessible directly/zero-copy. Traversal of database records is extremely-fast.

  • Transactions for readers and writers; neither blocks the other.

  • Writes are strongly serialized. No transaction conflicts nor deadlocks.

  • Readers are non-blocking, notwithstanding snapshot isolation.

  • Nested write transactions.

  • Reads scale linearly across CPUs.

  • Continuous zero-overhead database compactification.

  • Automatic on-the-fly database size adjustment.

  • Customizable database page size.

  • O(log N) cost of lookup, insert, update, and delete operations by virtue of B+ tree characteristics.

  • Online hot backup.

  • Append operation for efficient bulk insertion of pre-sorted data.

  • No WAL nor any transaction journal. No crash recovery needed. No maintenance is required.

  • No internal cache and/or memory management, all done by basic OS services.
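The O(log N) cost listed above follows from the depth of a B+ tree: each lookup visits one page per level. A minimal sketch of that relationship is shown below. The function name and the fanout parameter are hypothetical, chosen for illustration; the real fan-out depends on the page size and on key and value sizes.

```c
#include <assert.h>
#include <stdint.h>

/* Number of B+ tree levels needed to index `entries` items when every page
 * holds up to `fanout` items or child pointers. `fanout` is an assumed,
 * illustrative parameter, not a libmdbx setting. */
unsigned btree_levels(uint64_t entries, uint64_t fanout) {
  assert(fanout >= 2);
  unsigned levels = 1;        /* a single leaf page that is also the root */
  uint64_t capacity = fanout; /* items addressable with `levels` levels */
  while (capacity < entries) {
    /* One more level would overflow 64 bits, so it certainly covers
     * any 64-bit entry count. */
    if (capacity > UINT64_MAX / fanout)
      return levels + 1;
    capacity *= fanout;
    ++levels;
  }
  return levels;
}
```

With an assumed fan-out of 100, btree_levels(1000000, 100) gives 3, i.e. a lookup among a million entries touches three pages.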

Limitations

  • Page size: a power of 2, minimum 256 (mostly for testing), maximum 65536 bytes, default 4096 bytes.
  • Key size: minimum 0, maximum ≈½ pagesize (2022 bytes for default 4K pagesize, 32742 bytes for 64K pagesize).
  • Value size: minimum 0, maximum 2146435072 (0x7FF00000) bytes for maps, ≈½ pagesize for multimaps (2022 bytes for default 4K pagesize, 32742 bytes for 64K pagesize).
  • Write transaction size: up to 1327217884 pages (4.944272 TiB for default 4K pagesize, 79.108351 TiB for 64K pagesize).
  • Database size: up to 2147483648 pages (≈8.0 TiB for default 4K pagesize, ≈128.0 TiB for 64K pagesize).
  • Maximum sub-databases: 32765.
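The size limits above are plain arithmetic on page counts and page sizes. The sketch below reproduces that arithmetic; the helper names are hypothetical and are not part of the libmdbx API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A page size must be a power of 2 in the range 256..65536 bytes. */
bool page_size_is_valid(uint32_t page_size) {
  return page_size >= 256 && page_size <= 65536 &&
         (page_size & (page_size - 1)) == 0;
}

/* Size in bytes of `pages` database pages of `page_size` bytes each.
 * Does not overflow for page counts within the limits listed above. */
uint64_t pages_to_bytes(uint64_t pages, uint32_t page_size) {
  assert(page_size_is_valid(page_size));
  return pages * page_size;
}

/* The same size expressed in TiB (2^40 bytes). */
double pages_to_tib(uint64_t pages, uint32_t page_size) {
  return (double)pages_to_bytes(pages, page_size) / 1099511627776.0;
}
```

For example, pages_to_bytes(2147483648, 4096) equals exactly 8 TiB, and pages_to_tib(1327217884, 4096) is about 4.944, matching the figures in the list.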

Gotchas

  1. There cannot be more than one writer at a time, i.e. no more than one write transaction at a time.

  2. libmdbx is based on B+ tree, so access to database pages is mostly random. Thus SSDs provide a significant performance boost over spinning disks for large databases.

  3. libmdbx uses shadow paging instead of WAL. Thus syncing data to disk might be a bottleneck for write-intensive workloads.

  4. libmdbx uses copy-on-write for snapshot isolation during updates, but read transactions prevent recycling of old retired/freed pages, since they still read them. Thus altering data during a parallel long-lived read operation will increase the process working set, may exhaust the entire free database space, can make the database grow quickly, and results in performance degradation. Try to avoid long-running read transactions.

  5. libmdbx is extraordinarily fast and provides minimal overhead for data access, so you should reconsider using brute force techniques and double check your code. On the one hand, in the case of libmdbx, a simple linear search may be more profitable than complex indexes. On the other hand, if you do something suboptimally, you may notice the detrimental effect only on sufficiently large data.
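To illustrate gotcha 4, here is a simplified model of why a long-lived reader blocks page recycling. It is not libmdbx's actual bookkeeping: the struct, the function name, and the exact rule (a batch is reusable only when it was retired by a transaction strictly older than the oldest snapshot still being read) are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A batch of pages retired (replaced by copies) by one write transaction.
 * Hypothetical type, for illustration only. */
struct retired_batch {
  uint64_t txn_id;   /* write transaction that retired the pages */
  size_t page_count; /* how many pages it retired */
};

/* Count pages that may be recycled, assuming a batch becomes reusable once
 * it was retired by a transaction strictly older than the oldest snapshot
 * that some reader still uses. */
size_t reclaimable_pages(const struct retired_batch *batches, size_t n,
                         uint64_t oldest_reader_txn) {
  assert(batches != NULL || n == 0);
  size_t total = 0;
  for (size_t i = 0; i < n; ++i)
    if (batches[i].txn_id < oldest_reader_txn)
      total += batches[i].page_count;
  return total;
}
```

In this model, a reader that stays on an old snapshot keeps oldest_reader_txn low, so every later batch stays unreclaimable and new writes have to extend the database instead of reusing pages.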

Comparison with other databases

For now, please refer to the chapter "BoltDB comparison with other databases", which is also (mostly) applicable to libmdbx.

Improvements beyond LMDB

libmdbx is superior to the legendary LMDB in terms of features and reliability, and not inferior in performance. In comparison to LMDB, libmdbx makes things "just work" perfectly and out-of-the-box, rather than silently and catastrophically break down. The list below is pruned down to the improvements most notable and obvious from the user's point of view.

Added Features

  1. Keys could be more than 2 times longer than in LMDB.

For a DB with the default page size libmdbx supports keys up to 2022 bytes, and up to 32742 bytes for 64K page size. LMDB allows key sizes up to 511 bytes and may silently lose data with large values.

  2. Up to 30% faster than LMDB in CRUD benchmarks.

Benchmarks of the in-tmpfs scenarios, which test the speed of the engine itself, showed that libmdbx is 10-20% faster than LMDB, and up to 30% faster when libmdbx is compiled with specific build options that downgrade several runtime checks to match LMDB behaviour.

These and other results can be easily reproduced with ioArena just by the make bench-quartet command, including comparisons with RocksDB and WiredTiger.

  3. Automatic on-the-fly database size adjustment, both increment and reduction.

libmdbx manages the database size according to parameters specified by the mdbx_env_set_geometry() function, which include the growth step and the truncation threshold.

Unfortunately, on-the-fly database size adjustment doesn't work under Wine due to its internal limitations and unimplemented functions, i.e. the MDBX_UNABLE_EXTEND_MAPSIZE error will be returned.

  4. Automatic continuous zero-overhead database compactification.

During each commit libmdbx merges freed pages that are adjacent to the unallocated area at the end of the file, and then truncates the unused space once enough of it has accumulated.

  5. The same database format for 32- and 64-bit builds.

The libmdbx database format depends only on the endianness but not on the bitness.

  6. LIFO policy for Garbage Collection recycling. This can significantly increase write performance, up to several times in a best-case scenario, thanks to the write-back disk cache.

LIFO means that the most recently freed pages are taken for reuse first. Therefore the loop of database page circulation becomes as short as possible. In other words, the set of pages that are (over)written in memory and on disk during a series of write transactions will be as small as possible. This creates ideal conditions for the efficiency of a battery-backed or flash-backed disk cache.

  7. Fast estimation of range query result volume, i.e. how many items can be found between a KEY1 and a KEY2. This is a prerequisite for building and/or optimizing query execution plans.

libmdbx performs a rough estimate based on the common B-tree pages of the paths from the root to the corresponding keys.

  8. mdbx_chk utility for database integrity check. Since version 0.9.1, the utility supports checking the database using any of the three meta pages and the ability to switch to it.

  9. Support for opening databases in the exclusive mode, including on a network share.

  10. Zero-length for keys and values.

  11. Ability to determine whether the particular data is on a dirty page or not, which allows avoiding copy-out before updates.

  12. Extended information of whole-database, sub-databases, transactions, readers enumeration.

libmdbx provides a lot of information, including dirty and leftover pages for a write transaction, reading lag and holdover space for read transactions.

  13. Extended update and delete operations.

libmdbx allows combining an update or delete with getting the previous value, and addressing a particular item among multiple values with the same key.

  14. Useful runtime options for tuning the engine to the application's requirements and use-case specifics.

  15. Automated steady sync-to-disk upon several thresholds and/or timeout via cheap polling.

  16. Sequence generation and three persistent 64-bit markers.

  17. Handle-Slow-Readers callback to resolve database full/overflow issues due to long-lived read transaction(s).

  18. Ability to determine whether the cursor is pointed to a key-value pair, to the first, to the last, or not set to anything.
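The effect of the LIFO recycling policy from the list above can be seen in a toy simulation. The model below is an assumption-laden sketch, not libmdbx code: it ignores readers and durability constraints, uses a fixed pool of 64 pages, and lets every transaction replace a fixed batch of live pages. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOTAL_PAGES = 64 }; /* size of the toy page pool */

enum policy { REUSE_FIFO, REUSE_LIFO };

/* Simulate `txns` write transactions, each replacing `batch` live pages with
 * `batch` pages taken from the free list and retiring the old copies.
 * Returns the number of distinct pages written. */
size_t distinct_pages_written(enum policy policy, size_t batch, size_t txns) {
  assert(batch > 0 && 2 * batch <= TOTAL_PAGES);
  size_t queue[TOTAL_PAGES]; /* circular free list, oldest entry at `head` */
  size_t live[TOTAL_PAGES];  /* pages holding the current data */
  bool written[TOTAL_PAGES] = {false};
  size_t head = 0, count = 0, distinct = 0;

  /* Pages 0..batch-1 start live; the rest start free. */
  for (size_t p = 0; p < batch; ++p)
    live[p] = p;
  for (size_t p = batch; p < TOTAL_PAGES; ++p)
    queue[(head + count++) % TOTAL_PAGES] = p;

  for (size_t t = 0; t < txns; ++t) {
    size_t fresh[TOTAL_PAGES];
    for (size_t i = 0; i < batch; ++i) {
      size_t page;
      if (policy == REUSE_LIFO) {
        page = queue[(head + --count) % TOTAL_PAGES]; /* newest retired */
      } else {
        page = queue[head]; /* oldest retired */
        head = (head + 1) % TOTAL_PAGES;
        --count;
      }
      if (!written[page]) {
        written[page] = true;
        ++distinct;
      }
      fresh[i] = page;
    }
    /* Retire the previous copies and make the new ones live. */
    for (size_t i = 0; i < batch; ++i) {
      queue[(head + count++) % TOTAL_PAGES] = live[i];
      live[i] = fresh[i];
    }
  }
  return distinct;
}
```

In this model, 100 transactions with a batch of 4 pages write only 8 distinct pages under LIFO, while FIFO cycles through all 64 pages of the pool. A smaller set of overwritten pages is what lets a write-back disk cache absorb the writes.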

Other fixes and specifics

  1. Fixed more than 10 significant errors, in particular: page leaks, wrong sub-database statistics, segfault in several conditions, nonoptimal page merge strategy, updating an existing record with a change in data size (including for multimap), etc.

  2. All cursors can be reused and should be closed explicitly, regardless of whether they were opened within a write or read transaction.

  3. Opening database handles are spared from race conditions and pre-opening is not needed.

  4. Returning MDBX_EMULTIVAL error in case of ambiguous update or delete.

  5. Guarantee of database integrity even in asynchronous unordered write-to-disk mode.

libmdbx proposes an additional trade-off, MDBX_SAFE_NOSYNC, with an append-like manner of updates that avoids database corruption after a system crash, contrary to LMDB. Nevertheless, the MDBX_UTTERLY_NOSYNC mode is available to match the behaviour of MDB_NOSYNC in LMDB.

  6. On MacOS & iOS the fcntl(F_FULLFSYNC) syscall is used by default to synchronize data with the disk, as this is the only way to guarantee data durability in case of power failure. Unfortunately, in scenarios with high write intensity, the use of F_FULLFSYNC significantly degrades performance compared to LMDB, where the fsync() syscall is used. Therefore, libmdbx allows you to override this behavior by defining the MDBX_OSX_SPEED_INSTEADOF_DURABILITY=1 option while building the library.

  7. On Windows the LockFileEx() syscall is used for locking, since it allows placing the database on network drives and provides protection against incompetent user actions (aka poka-yoke). Therefore libmdbx may lag slightly behind LMDB in performance tests, since LMDB uses named mutexes.
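The difference between fcntl(F_FULLFSYNC) and fsync() mentioned above can be sketched as follows. This is a generic illustration, not libmdbx's implementation: the function name is hypothetical, and the fallback to fsync() when F_FULLFSYNC fails is an assumption of this sketch.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Flush file data to stable storage. Where F_FULLFSYNC is defined (MacOS,
 * iOS), ask the drive to flush its own cache too; otherwise use fsync().
 * Returns 0 on success, -1 on failure with errno set. */
int durable_sync(int fd) {
  assert(fd >= 0);
#if defined(F_FULLFSYNC)
  if (fcntl(fd, F_FULLFSYNC) != -1)
    return 0;
  /* Assumed fallback: some filesystems reject F_FULLFSYNC. */
#endif
  return fsync(fd);
}
```

On platforms without F_FULLFSYNC the function reduces to a plain fsync() call, which is the faster but weaker behaviour the text attributes to LMDB on MacOS.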

History

Historically, libmdbx is a deeply revised and extended descendant of the Lightning Memory-Mapped Database. At first the development was carried out within the ReOpenLDAP project. About a year later libmdbx was separated into a standalone project, which was presented at Highload++ 2015 conference.

Since 2017 libmdbx is used in Fast Positive Tables, and development is funded by Positive Technologies.

Acknowledgments

Howard Chu is the author of LMDB, from which libmdbx originated in 2015.

Martin Hedenfalk is the author of the btree.c code, which was used to begin development of LMDB.


Usage

Currently, libmdbx is only available in source code form. Package support for common Linux distributions is planned for the future, after the release of version 1.0.

Never use tarballs nor zips automatically provided by Github!

Please don't use tarballs nor zips which are automatically provided by Github. These archives do not contain version information and thus are unfit to build libmdbx. Instead, just clone the git repository, or download a tarball or zip with the properly amalgamated source code. Moreover, please vote for the ability to disable auto-creation of such unsuitable archives.

Source code embedding

libmdbx provides two official ways for integration in source code form:

  1. Using the amalgamated source code.

The amalgamated source code includes all files required to build and use libmdbx, but not for testing libmdbx itself.

  2. Adding the complete original source code as a git submodule.

This allows you to build both libmdbx and the testing tool. On the other hand, this way requires you to pull git tags and to use a C++11 compiler for the test tool.

Please, avoid using any other techniques. Otherwise, at least don't ask for support and don't name such chimeras libmdbx.

The amalgamated source code could be created from the original clone of git repository on Linux by executing make dist. As a result, the desired set of files will be formed in the dist subdirectory.

Building and Testing

Both the amalgamated and the original source code can be built with CMake or GNU Make with bash. All build methods are completely traditional and have minimal prerequisites like build-essential, i.e. a non-obsolete C/C++ compiler and an SDK for the target platform. Obviously you need the build tools themselves, i.e. git, cmake or GNU make with bash. For your convenience, make help and make options are also available for listing existing targets and build options respectively.

The only significant specificity is that git tags are required to build from the complete (not amalgamated) source code. Executing git fetch --tags --force --prune is enough to get them, or git fetch --unshallow --tags --prune --force after Github's actions/checkout@v2, or set fetch-depth: 0 for it.

So just use CMake or GNU Make in your habitual manner, and feel free to file an issue or make a pull request if something is unexpected or broken.

Testing

The amalgamated source code does not contain any tests, for several reasons. Please read the explanation and don't ask to alter this. So for testing libmdbx itself you need the full source code, i.e. a clone of the git repository; there is no other option.

The full source code of libmdbx has a test subdirectory with a minimalistic test "framework". Actually, it is the source code of mdbx_test, a console utility with a set of command-line options that allow constructing and running reasonably thorough test scenarios. This test utility is intended for libmdbx's developers to test the library itself, not for use by users. Therefore, only basic information is provided:

  • There are a few CRUD-based test cases (hill, TTL, nested, append, jitter, etc), which can be combined to test concurrent operations within a shared database in a multi-process environment. This is the basic test scenario.
  • The Makefile provides several self-described targets for testing: smoke, test, check, memcheck, test-valgrind, test-asan, test-leak, test-ubsan, cross-gcc, cross-qemu, gcc-analyzer, smoke-fault, smoke-singleprocess, test-singleprocess, long-test. Please run make --help if in doubt.
  • In addition to the mdbx_test utility, there is the script long_stochastic.sh, which calls mdbx_test by going through a set of modes and options, gradually increasing the number of operations and the size of transactions. This script is used for most of the automatic testing, including Makefile targets and Continuous Integration.
  • Brief information on the available command-line options is available via --help. However, you should dive into the source code to get all of it; there is no other option.

Anyway, no matter how thoroughly the libmdbx is tested, you should rely only on your own tests for a few reasons:

  1. Most use cases are unique. So there is no warranty that your use case was properly tested, even though libmdbx's tests employ a stochastic approach.
  2. If there are problems, then your test on the one hand will help to verify whether you are using libmdbx correctly, and on the other hand will allow you to reproduce the problem and insure against regression in the future.
  3. Actually, you should rely only on what you have checked yourself, or take a risk.

Common important details

Build reproducibility

By default libmdbx tracks build time via the MDBX_BUILD_TIMESTAMP build option and macro. So for reproducible builds you should predefine/override it to a known fixed string value. For instance:

  • for reproducible build with make: make MDBX_BUILD_TIMESTAMP=unknown ...
  • or during configure by CMake: cmake -DMDBX_BUILD_TIMESTAMP:STRING=unknown ...

Of course, in addition to this, your toolchain must ensure the reproducibility of builds. For more information please refer to reproducible-builds.org.

Containers

There are no special traits or quirks if you use libmdbx ONLY inside a single container. But in cross-container cases, or with a mix of host and container(s), two major things MUST be guaranteed:

  1. Coherence of memory mapping content and a unified page cache inside the OS kernel for the host and all container(s) operating with the same DB. Basically this means there must be only a single physical copy of each memory-mapped DB page in system memory.

  2. Uniqueness of PID values and/or a common space for ones:

    • for POSIX systems: PID uniqueness for all processes operating with a DB. I.e. --pid=host is required to run DB-aware processes inside Docker, or, without host interaction, --pid=container: with the same name/id.
    • for non-POSIX (i.e. Windows) systems: inter-visibility of process handles. I.e. OpenProcess(SYNCHRONIZE, ..., PID) must return a reasonable error, including ERROR_ACCESS_DENIED, but not ERROR_INVALID_PARAMETER as for an invalid/non-existent PID.
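The PID requirement above can be checked from inside a container with a small probe. The sketch below is a generic POSIX illustration, not the mechanism libmdbx itself uses; the function name is hypothetical. It relies on the fact that kill() with signal 0 performs only the existence and permission checks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Probe whether `pid` names a process visible from the caller's PID
 * namespace. No signal is actually delivered. */
bool pid_is_visible(pid_t pid) {
  assert(pid > 0);
  if (kill(pid, 0) == 0)
    return true;
  /* EPERM: the process exists, but we are not allowed to signal it. */
  return errno == EPERM;
}
```

If a process in one container cannot see the PID of a process that uses the same DB from another container, the two do not share a PID space and the requirement above is not met.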

DSO/DLL unloading and destructors of Thread-Local-Storage objects

When building libmdbx as a shared library, or using static libmdbx as part of another dynamic library, it is advisable to make sure that your system correctly calls the destructors of Thread-Local-Storage objects when unloading dynamic libraries.

If this is not the case, then unloading a dynamic-link library with libmdbx code inside can result in either a resource leak or a crash due to calling destructors from an already unloaded DSO/DLL object. The problem can only manifest in a multithreaded application that unloads shared dynamic libraries with libmdbx code inside after using libmdbx. It is known that TLS-destructors are properly maintained in the following cases:

  • On all modern versions of Windows (Windows 7 and later).

  • On systems with the __cxa_thread_atexit_impl() function in the standard C library, including systems with GNU libc version 2.18 and later.

  • On systems with libpthread/nptl from GNU libc with bug fixes #21031 and #21032, or where there are no similar bugs in the pthreads implementation.

Linux and other platforms with GNU Make

To build the library it is enough to execute make all in the directory of source code, and make check to execute the basic tests.

If the make installed on the system is not GNU Make, there will be a lot of errors from make when trying to build. In this case, perhaps you should use gmake instead of make, or even gnu-make, etc.

FreeBSD and related platforms

As a rule, in such systems, the default is to use Berkeley Make. And GNU Make is called by the gmake command or may be missing. In addition, bash may be absent.

You need to install the required components: GNU Make, bash, C and C++ compilers compatible with GCC or CLANG. After that, to build the library, it is enough to execute gmake all (or make all) in the directory with source code, and gmake check (or make check) to run the basic tests.

Windows

To build libmdbx on Windows, the original CMake and Microsoft Visual Studio 2019 are recommended. Otherwise do not forget to add ntdll.lib to linking.

Building by MinGW, MSYS or Cygwin is potentially possible. However, these scripts are not tested and will probably require you to modify the CMakeLists.txt or Makefile respectively.

It should be noted that efforts were made in libmdbx to avoid runtime dependencies on the CRT and other MSVC libraries. To achieve this, it is enough to define MDBX_WITHOUT_MSVC_CRT during the build.

An example of running a basic test script can be found in the CI-script for AppVeyor. To run the long stochastic test scenario, bash is required, and such testing is recommended with placing the test data on the RAM-disk.

Windows Subsystem for Linux

libmdbx could be used in WSL2 but NOT in WSL1 environment. This is a consequence of the fundamental shortcomings of WSL1 and cannot be fixed. To avoid data loss, libmdbx returns the ENOLCK (37, "No record locks available") error when opening the database in a WSL1 environment.

MacOS

Current native build tools for MacOS include GNU Make, CLANG and an outdated version of bash. Therefore, to build the library, it is enough to run make all in the directory with source code, and run make check to execute the base tests. If something goes wrong, it is recommended to install Homebrew and try again.

To run the long stochastic test scenario, you will need to install the current (not outdated) version of bash. To do this, we recommend that you install Homebrew and then execute brew install bash.

Android

We recommend using CMake to build libmdbx for Android. Please refer to the official guide.

iOS

To build libmdbx for iOS, we recommend using CMake with the "toolchain file" from the ios-cmake project.

API description

Please refer to the online libmdbx API reference and/or see the mdbx.h header.

Bindings

Runtime           Repo                      Author
Haskell           libmdbx-hs                Francisco Vallarino
NodeJS            lmdbx-js                  Kris Zyp
NodeJS            node-mdbx                 Сергей Федотов
Ruby              ruby-mdbx                 Mahlon E. Smith
Go                mdbx-go                   Alex Sharov
Nim               NimDBX                    Jens Alfke
Rust              libmdbx-rs                Artem Vorotnikov
Rust              mdbx                      gcxfd
Java              mdbxjni                   Castor Technologies
Python (draft)    python-bindings branch    Noel Kuntze
.NET (obsolete)   mdbx.NET                  Jerry Wang

Performance comparison

All benchmarks were done in 2015 by IOArena and multiple script runs on a Lenovo Carbon-2 laptop, i7-4600U 2.1 GHz (2 physical cores, 4 HyperThreading cores), 8 GB RAM, SSD SAMSUNG MZNTD512HAGL-000L1 (DXT23L0Q) 512 GB.

Integral performance

Shown here is the sum of performance metrics in 3 benchmarks:

  • Read/Search on the machine with 4 logical CPUs in HyperThreading mode (i.e. actually 2 physical CPU cores);

  • Transactions with CRUD operations in sync-write mode (fdatasync is called after each transaction);

  • Transactions with CRUD operations in lazy-write mode (moment to sync data to persistent storage is decided by OS).

Reasons why asynchronous mode isn't benchmarked here:

  1. It doesn't make sense, as it has to be done with DB engines oriented for keeping data in memory (e.g. Tarantool, Redis), etc.

  2. Performance gap is too high to compare in any meaningful way.

Comparison #1: Integral Performance


Read Scalability

Summary performance with concurrent read/search queries in 1-2-4-8 threads on the machine with 4 logical CPUs in HyperThreading mode (i.e. actually 2 physical CPU cores).

Comparison #2: Read Scalability


Sync-write mode

  • Linear scale on left and dark rectangles mean arithmetic mean transactions per second;

  • Logarithmic scale on right is in seconds and yellow intervals mean execution time of transactions. Each interval shows minimal and maximum execution time, cross marks standard deviation.

10,000 transactions in sync-write mode. In case of a crash all data is consistent and conforms to the last successful transaction. The fdatasync syscall is used after each write transaction in this mode.

In the benchmark each transaction contains combined CRUD operations (2 inserts, 1 read, 1 update, 1 delete). Benchmark starts on an empty database and after full run the database contains 10,000 small key-value records.
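The final record count follows from the operation mix: each transaction inserts two records and deletes one, so it adds one record on net. A minimal sketch of that arithmetic, with a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Records left in an initially empty database after `txns` transactions,
 * each performing `inserts` insertions and `deletes` deletions. Reads and
 * updates do not change the record count. */
uint64_t records_after(uint64_t txns, unsigned inserts, unsigned deletes) {
  assert(inserts >= deletes);
  return txns * (uint64_t)(inserts - deletes);
}
```

For this benchmark, records_after(10000, 2, 1) gives 10,000, which matches the stated database size.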

Comparison #3: Sync-write mode


Lazy-write mode

  • Linear scale on left and dark rectangles mean arithmetic mean of thousands transactions per second;

  • Logarithmic scale on right in seconds and yellow intervals mean execution time of transactions. Each interval shows minimal and maximum execution time, cross marks standard deviation.

100,000 transactions in lazy-write mode. In case of a crash all data is consistent and conforms to the one of last successful transactions, but transactions after it will be lost. Other DB engines use WAL or transaction journal for that, which in turn depends on order of operations in the journaled filesystem. libmdbx doesn't use WAL and hands I/O operations to filesystem and OS kernel (mmap).

In the benchmark each transaction contains combined CRUD operations (2 inserts, 1 read, 1 update, 1 delete). Benchmark starts on an empty database and after full run the database contains 100,000 small key-value records.

Comparison #4: Lazy-write mode


Async-write mode

  • Linear scale on left and dark rectangles mean arithmetic mean of thousands transactions per second;

  • Logarithmic scale on right in seconds and yellow intervals mean execution time of transactions. Each interval shows minimal and maximum execution time, cross marks standard deviation.

1,000,000 transactions in async-write mode. In case of a crash all data is consistent and conforms to the one of last successful transactions, but the lost transaction count is much higher than in lazy-write mode. All DB engines in this mode do as few writes as possible to persistent storage. libmdbx uses msync(MS_ASYNC) in this mode.

In the benchmark each transaction contains combined CRUD operations (2 inserts, 1 read, 1 update, 1 delete). Benchmark starts on an empty database and after full run the database contains 1,000,000 small key-value records.

Comparison #5: Async-write mode


Cost comparison

Summary of used resources during lazy-write mode benchmarks:

  • Read and write IOPs;

  • Sum of user CPU time and sys CPU time;

  • Used space on persistent storage after the test and closed DB, but not waiting for the end of all internal housekeeping operations (LSM compactification, etc).

ForestDB is excluded because the benchmark showed its consumption of each resource (CPU, IOPs) to be much higher than that of other engines, which prevents a meaningful comparison with them.

All benchmark data is gathered by getrusage() syscall and by scanning the data directory.

Comparison #6: Cost comparison


This is a mirror of the origin repository that was moved to abf.io because of discriminatory restrictions for Russian Crimea.

Comments
  • Performance degradation when hot part of DB getting bigger than RAM (harder than LMDB)

    We have speed tests for our app, and they show that on 64GB RAM our app is faster with MDBX than with LMDB, but on a 16GB machine, after the hot part of the DB goes over 16GB, the speed of app+MDBX falls behind app+LMDB. I don't see any specific bottleneck. GC size is small, ~16MB.

    Settings: I tried various settings and the picture doesn't change. Last time I tried to disable all additional features of MDBX (to get it closer to LMDB): MDBX_TXN_CHECKOWNER=0, MDBX_ENV_CHECKPID=0, MDBX_DISABLE_PAGECHECKS=1, MDBX_ENABLE_REFUND=0, NoReadahead|Durable, env.SetGeometry(-1, -1, 2TB, 10GB, -1, -1), OptTxnDpLimit=128*1024, OptRpAugmentLimit=32*1024*1024; all others are defaults. But it doesn't change the picture.

    Here is some chart: Schermata_2021-02-16_alle_19 36 40

    enhancement 
    opened by AskAlexSharov 62
  • mdbx_txn_commit_ex: MDBX_KEYEXIST: Key/data pair already exists

    It's a bit unclear to me what I can do with MDBX_KEYEXIST inside mdbx_txn_commit_ex. Last master: 21d2af9e902ba6e3220651996754427ff9ac361d. Do you need some additional info?

    bug 
    opened by AskAlexSharov 38
  • mdbx_env_open() error

    mdbx_env_open() error "Invalid argument" on WSL (Windows Subsystem for Linux)

    Ubuntu 18.04/clang-9

        /* Open the environment with default flags; report any failure. */
        rc = mdbx_env_open(db->env, dir_path, 0, 0666);
        if (rc != MDBX_SUCCESS) {
            log_error(get_thread_data()->ctx, "mdbx_env_open() error: %s", mdbx_strerror(rc));
        }
    

    Got

    mdbx_env_open() error: Invalid argument
    

    But database files were created:

    mdbx.dat  mdbx.lck
    

    Corresponding part of strace output

    mkdir("<path>", 0777) = -1 EEXIST (File exists)
    openat(AT_FDCWD, "<path>/mdbx.dat", O_RDWR|O_CREAT|O_CLOEXEC, 0666) = 12
    openat(AT_FDCWD, "<path>/mdbx.dat", O_WRONLY|O_DSYNC|O_CLOEXEC) = 13
    openat(AT_FDCWD, "<path>/mdbx.lck", O_RDWR|O_CREAT|O_CLOEXEC, 0666) = 14
    getpid()                                = 14076
    sched_yield()                           = 0
    fcntl(14, F_OFD_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = -1 EINVAL (Invalid argument)
    
    opened by oleg-kiriyenko 30
  • Unexpected internal error during cursor_put

    I use the latest release version, though the database was originally created by a slightly older version of MDBX (v0.9.3-193-g1275bdb6). This database has a lot of GC space, because I removed quite a few tables before re-filling them. Here are the stats:

    mdbx_stat v0.10.0-0-gaa1f6fbd (2021-05-09T03:01:59+03:00, T-794e1a9437599eaf67ef14c38adfc811ebba47cd)
    Running for /home/alexey/data2/tg/chaindata/...
    Garbage Collection
      Pagesize: 4096
      Tree depth: 2
      Branch pages: 1
      Leaf pages: 16
      Overflow pages: 213376
      Entries: 995
      GC: 218500495 pages
    Status of Main DB
      Pagesize: 4096
      Tree depth: 1
      Branch pages: 0
      Leaf pages: 1
      Overflow pages: 0
      Entries: 50
    

    I have been able to reproduce the problem 3 times and I think I can reproduce it again. After the first time, as @AskAlexSharov suggested, I ran mdbx_chk and here is what I got:

    alexey@tgtest:~/turbo-geth$ ./build/bin/mdbx_chk /home/alexey/data2/tg/chaindata/
    mdbx_chk v0.9.3-193-g1275bdb6 (2021-05-06T02:05:33+03:00, T-b0c05720dac4eabc77a664b84ad463110480901d)
    Running for /home/alexey/data2/tg/chaindata/ in 'read-only' mode...
    Traversal b-tree by txn#228211...
    Iterating DBIs...
    No error is detected, elapsed 6931.453 seconds
    

    On the third time, I managed to reproduce it while having more verbose logging. These are the logs before the error happened

    INFO [05-10|11:07:56.144] [1/14 Headers] Executed blocks           number=5041026 blk/second=83.400    batch=341.81MiB alloc=1.64GiB    sys=4.83GiB   numGC=10789
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    INFO [05-10|11:08:26.147] [1/14 Headers] Executed blocks           number=5043405 blk/second=79.300    batch=358.32MiB alloc=1.81GiB    sys=4.83GiB   numGC=10795
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 87 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16460 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 87 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16460 dirty-room
    INFO [05-10|11:08:56.142] [1/14 Headers] Executed blocks           number=5046040 blk/second=90.862    batch=375.57MiB alloc=1.61GiB    sys=4.83GiB   numGC=10801
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    INFO [05-10|11:09:26.136] [1/14 Headers] Executed blocks           number=5048380 blk/second=80.690    batch=390.33MiB alloc=1.08GiB    sys=4.83GiB   numGC=10807
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 87 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16460 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    INFO [05-10|11:09:56.139] [1/14 Headers] Executed blocks           number=5050968 blk/second=86.267    batch=404.02MiB alloc=1.21GiB    sys=4.83GiB   numGC=10812
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 87 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16460 dirty-room
    INFO [05-10|11:10:26.144] [1/14 Headers] Executed blocks           number=5053730 blk/second=92.067    batch=421.01MiB alloc=1.33GiB    sys=4.83GiB   numGC=10817
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    INFO [05-10|11:10:56.146] [1/14 Headers] Executed blocks           number=5056734 blk/second=100.133   batch=437.24MiB alloc=2.07GiB    sys=4.83GiB   numGC=10821
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    INFO [05-10|11:11:26.139] [1/14 Headers] Executed blocks           number=5059352 blk/second=90.276    batch=453.68MiB alloc=1.78GiB    sys=4.83GiB   numGC=10826
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    INFO [05-10|11:11:56.162] [1/14 Headers] Executed blocks           number=5061638 blk/second=76.200    batch=468.32MiB alloc=2.39GiB    sys=4.83GiB   numGC=10830
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_set_readahead:9213 readahead OFF 302514176..303038464
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    INFO [05-10|11:12:26.143] [1/14 Headers] Executed blocks           number=5064087 blk/second=84.448    batch=482.39MiB alloc=1.78GiB    sys=4.83GiB   numGC=10835
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    INFO [05-10|11:12:56.151] [1/14 Headers] Executed blocks           number=5066357 blk/second=75.667    batch=496.52MiB alloc=2.14GiB    sys=4.83GiB   numGC=10839
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16372 dirty-entries (have 89 dirty-room, need 91)
    mdbx_txn_spill:8758 spilled 16372 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 87 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16460 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_page_split:21203 adding to parent page 301522135 node[0] left-leaf page #302609439 key <null>
    mdbx_page_split:21213 update prev-first key on parent <000000000000000000000000000000000000000000000000000000000000000101>
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_page_split:21203 adding to parent page 301536063 node[0] left-leaf page #300518473 key <null>
    mdbx_page_split:21213 update prev-first key on parent <00000000000000000000000000000000000000000000000000000000000000031990d104ebe0d000e12d70f2bf340dfd9b8298fb>
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_page_split:21203 adding to parent page 73500826 node[0] left-leaf page #300520158 key <null>
    mdbx_page_split:21213 update prev-first key on parent <000000000000000000000000000000000000000000000000000000000000000328744a3ce59a1a2f7fa17e22a3955ba61558356f>
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_page_split:21203 adding to parent page 301543300 node[0] left-leaf page #302632753 key <null>
    mdbx_page_split:21213 update prev-first key on parent <000000000000000000000000000000000000000000000000000000000000000103>
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 87 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16460 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_page_split:21203 adding to parent page 301553567 node[0] left-leaf page #302645485 key <null>
    mdbx_page_split:21213 update prev-first key on parent <000000000000000000000000000000000000000000000000000000000000000101>
    mdbx_page_split:21203 adding to parent page 301553584 node[0] left-leaf page #302645499 key <null>
    mdbx_page_split:21213 update prev-first key on parent <000000000000000000000000000000000000000000000000000000000000000103>
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_page_split:21203 adding to parent page 301555121 node[0] left-leaf page #302646713 key <null>
    mdbx_page_split:21213 update prev-first key on parent <000000000000000000000000000000000000000000000000000000000000000101>
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_page_split:21203 adding to parent page 301559208 node[0] left-leaf page #302649728 key <null>
    mdbx_page_split:21213 update prev-first key on parent <000000000000000000000000000000000000000000000000000000000000000102>
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_page_split:21203 adding to parent page 301569787 node[0] left-leaf page #302652947 key <null>
    mdbx_page_split:21213 update prev-first key on parent <000000000000000000000000000000000000000000000000000000000000000102>
    mdbx_page_split:21203 adding to parent page 301570159 node[0] left-leaf page #302654484 key <null>
    mdbx_page_split:21213 update prev-first key on parent <000000000000000000000000000000000000000000000000000000000000000103>
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    mdbx_page_split:21203 adding to parent page 301576369 node[0] left-leaf page #301565589 key <null>
    mdbx_page_split:21213 update prev-first key on parent <000000000000000000000000000000000000000000000000000000000000000398178c420d545739fe7f3b267edc1076b481db67>
    mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)
    mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room
    ERROR[05-10|11:13:50.888] Error                                    err="mdbx_cursor_put: MDBX_PROBLEM: Unexpected internal error, transaction should be aborted"
    mdbx_sync_locked:13929 open-MADV_DONTNEED 303038464..293983607
    INFO [05-10|11:13:52.590] database closed (MDBX)                   mdbx=chaindata
    Error: mdbx_cursor_put: MDBX_PROBLEM: Unexpected internal error, transaction should be aborted
    

    This is the place we would want to commit the transaction.
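
    The mdbx_txn_spill lines in the log above follow a simple accounting: the dirty-room reported after a spill equals the room before it plus the number of spilled entries. A minimal sketch that checks this on a pair of log lines follows; the regular expressions and the helper name are assumptions made for illustration, and the check only describes the numbers in this log, not libmdbx internals:

```python
import re

# Patterns for the two mdbx_txn_spill debug lines seen in the log above.
SPILLING = re.compile(r"spilling (\d+) dirty-entries \(have (\d+) dirty-room, need (\d+)\)")
SPILLED = re.compile(r"spilled (\d+) dirty-entries, now have (\d+) dirty-room")


def check_spill_pair(before: str, after: str) -> bool:
    """Return True if room after the spill equals room before plus spilled entries."""
    m1 = SPILLING.search(before)
    m2 = SPILLED.search(after)
    if not (m1 and m2):
        raise ValueError("not a spilling/spilled pair")
    spilling, have, _need = map(int, m1.groups())
    spilled, now_have = map(int, m2.groups())
    return spilling == spilled and have + spilled == now_have


pair = (
    "mdbx_txn_spill:8552 spilling 16373 dirty-entries (have 88 dirty-room, need 89)",
    "mdbx_txn_spill:8758 spilled 16373 dirty-entries, now have 16461 dirty-room",
)
print(check_spill_pair(*pair))
```

    Every spill pair in the log passes this check, so the spilling itself looks regular right up to the MDBX_PROBLEM error.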

    bug 
    opened by AlexeyAkhunov 28
  • mdbx_load fails with MDBX_MAP_FULL: Environment mapsize limit reached

    Steps to reproduce the bug

    1. wget https://github.com/erthink/libmdbx/releases/download/v0.9.1/libmdbx-amalgamated-0.9.1.tar.gz
    2. tar xvfz libmdbx-amalgamated-0.9.1.tar.gz
    3. make
    4. prepare 5'000 key-value pairs in two text files of 5'000 lines each (2'500 pairs per file), about 2*300KB of data (t1.txt, t2.txt)
    5. run ./mdbx_load -T foo < t1.txt
    6. run ./mdbx_load -T foo < t2.txt

    Expected behaviour

    The first 2'500 key-value pairs are loaded into the foo MDBX, then another 2'500 key-value pairs are loaded into it.

    Resulting behaviour

    first

    mdbx_load v0.9.1-0-g44b1a3b (2020-09-30T13:28:01+03:00, T-cdf44eb040316fff8c20f34069d5148e758f9ab7)
    Running for foo...
    ./mdbx_load: at input line 5000: mdbx_dbi_close() error -30780, MDBX_BAD_DBI: The specified DBI-handle is invalid or changed by another thread/transaction
    

    2500 entries loaded

    then

    mdbx_load v0.9.1-0-g44b1a3b (2020-09-30T13:28:01+03:00, T-cdf44eb040316fff8c20f34069d5148e758f9ab7)
    Running for foo...
    ./mdbx_load: at input line 1516: mdbx_cursor_put() error -30792, MDBX_MAP_FULL: Environment mapsize limit reached
    

    No new entries are loaded; the count stays at 2'500.

    I thought mdbx was able to extend the DB automatically. Otherwise, mdbx_load should offer an option to specify the maximum size of the DB.

    Additional information

    By reducing the number of lines in the second batch, I'm able to load about 3'600 entries, 1MB of data.

    System

    Linux 5.6.0-2-amd64 #1 SMP Debian 5.6.14-2 (2020-06-09) x86_64 GNU/Linux

    opened by setop 20
  • High percentile of cursor.put show that some calls are very slow

    We have the following piece of code in our app to update a DupSort DBI:

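    // Try to insert; with MDBX_NOOVERWRITE the call fails if the key already exists.
    err := cursor.put(key, value, MDBX_NOOVERWRITE)
    if err != nil {
       if mdbx.IsKeyExists(err) {
          // The key exists: replace the item at the current cursor position.
          return cursor.put(key, value, MDBX_CURRENT)
       }
       return err
    }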
    

    I built a Prometheus dashboard for the latency of this code (lmdb vs mdbx):

    • 50th percentile: lmdb and mdbx are very similar, ~1 microsecond
    • 75th percentile: ~2 microseconds vs ~3 microseconds
    • 99th percentile: ~200 microseconds vs ~2 milliseconds
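
    To illustrate why the median can look identical while the 99th percentile diverges, here is a minimal nearest-rank quantile sketch. The latency samples are made up for the example and are not the measurements from the dashboard:

```python
import math


def quantile(samples, q):
    """Nearest-rank quantile: the smallest sample such that at least a
    fraction q of all samples is less than or equal to it."""
    if not samples or not 0 < q <= 1:
        raise ValueError("need samples and 0 < q <= 1")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered))
    return ordered[rank - 1]


# Made-up latencies in microseconds: mostly fast calls plus a few slow outliers.
latencies_us = [1] * 70 + [3] * 28 + [2000] * 2

for q in (0.5, 0.75, 0.99):
    print(q, quantile(latencies_us, q))
```

    A handful of slow calls leaves the median untouched but dominates the tail, which is the pattern reported here.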

    Updates are flushed in big sorted batches (we store updates in std:map, then sort, and flush to the db in 1 tx).

    enhancement 
    opened by AskAlexSharov 20
  • Crashes and excessive memory + CPU consumption when dealing with corrupted databases

    I've now spent a bit of time fuzzing libmdbx, like I fuzzed Berkeley DB, LMDB, GDBM, TDB and other databases inspired by BDB and/or DBM in the past. Sorry, I didn't find out about libmdbx until fairly recently...

    In the README and Makefile, I read that you worked on fixing some crashes, and that you have asan & ubsan test targets - so clearly, you paid at least some level of attention to memory safety and UB, which is a good thing. However, with a Time To First Crash around 2 minutes, '2021 libmdbx tolerates corrupted databases better than '2018 and '2021 LMDB do (TTFC << 1s on mdb_dump), and marginally better than Berkeley DB 18.1.40 (yes, the latest version, despite dozens of fixes for CVE-numbered issues over the years... I basically gave up reporting issues) does, but libmdbx is not quite fool-proof yet :)

    Building and starting a first, simple fuzzing job is straightforward, along the lines of:

    git clone https://github.com/erthink/libmdbx
    cd libmdbx
    AFL_USE_ASAN=1 CC=$HOME/AFLplusplus/afl-clang-fast DESTDIR=$HOME/libmdbx_prefix_asan make install V=1
    cd ..
    mkdir libmdbx_fuzz
    cd libmdbx_fuzz/
    mkdir input
    echo "" | $HOME/libmdbx_prefix_asan/usr/local/bin/mdbx_load -T -n empty
    echo -e "key1\nvalue1" | $HOME/libmdbx_prefix_asan/usr/local/bin/mdbx_load -T -n one
    cd ..
    mkdir /dev/shm/libmdbx_tmpdir
    AFL_TMPDIR=/dev/shm/libmdbx_tmpdir $HOME/AFLplusplus/afl-fuzz -i input -o output -M master -- $HOME/libmdbx_prefix_asan/usr/local/bin/mdbx_chk @@
    

    (the AFL++ setup, which basically reduces to git clone and make when the build dependencies are installed, is not described here, for brevity)

    I stopped the mdbx_chk fuzzing process a bit after reaching 1M execs. Triaging the crashes already showed 5 unique code locations and SIGBUS, SIGSEGV, weirdness when unpoisoning memory, use after poison through wild pointers: that's enough to warrant creating this issue and provide the information which can enable you to perform your own fuzzing jobs. The final afl-fuzz output was:

                    american fuzzy lop ++3.14a (master) [fast] {0}
    +- process timing ------------------------------------+- overall results ----+
    |        run time : 0 days, 0 hrs, 42 min, 20 sec     |  cycles done : 12    |
    |   last new path : 0 days, 0 hrs, 0 min, 54 sec      |  total paths : 240   |
    | last uniq crash : 0 days, 0 hrs, 0 min, 18 sec      | uniq crashes : 48    |
    |  last uniq hang : 0 days, 0 hrs, 6 min, 3 sec       |   uniq hangs : 15    |
    +- cycle progress ---------------------+- map coverage+----------------------+
    |  now processing : 3*2 (1.2%)         |    map density : 6.16% / 10.04%     |
    | paths timed out : 0 (0.00%)          | count coverage : 1.65 bits/tuple    |
    +- stage progress ---------------------+- findings in depth -----------------+
    |  now trying : splice 3               | favored paths : 81 (33.75%)         |
    | stage execs : 62/110 (56.36%)        |  new edges on : 120 (50.00%)        |
    | total execs : 1.05M                  | total crashes : 8155 (48 unique)    |
    |  exec speed : 447.0/sec              |  total tmouts : 78 (29 unique)      |
    +- fuzzing strategy yields ------------+-------------+- path geometry -------+
    |   bit flips : disabled (default, enable with -D)   |    levels : 11        |
    |  byte flips : disabled (default, enable with -D)   |   pending : 55        |
    | arithmetics : disabled (default, enable with -D)   |  pend fav : 0         |
    |  known ints : disabled (default, enable with -D)   | own finds : 235       |
    |  dictionary : n/a                                  |  imported : 0         |
    |havoc/splice : 202/531k, 81/509k                    | stability : 99.54%    |
    |py/custom/rq : unused, unused, unused, unused       +-----------------------+
    |    trim/eff : disabled, disabled                   |          [cpu000:100%]
    +----------------------------------------------------+
    

    The crash triage output is part of the tarball.

    NOTE: in order to reproduce crashes, the best practice is to start from fresh copies of the files. The output of AddressSanitizer killing mdbx_chk on the attached files seems to be stable from one run to the next one (apart from randomized addresses, of course), but for instance, starting from fresh files is definitely necessary for reproducing a subset of the endless stream of crashes in Berkeley DB.

    Ideas for improving the next stages of the fuzzing process:

    • first and foremost, using a much wider input corpus. The 2-file corpus I used is enough to show that libmdbx's tolerance to offline data corruption / specially crafted files needs improvements, but there should be tests with > 1 keys / values, sub-databases, different page sizes, both endiannesses, databases after some creations and deletions of items, etc.
    • fuzzing the other CLI front-ends as well;
    • using ubsan and msan instrumentation for fuzzing (unless using UMRs is an integral part of the way libmdbx works, but I doubt it);
    • using AFL persistent mode, which often speeds up the fuzzing process;
    • using Honggfuzz (and also its persistent mode): I use it less often than AFL++, but in some of my past runs, it found some interesting testcases that AFL didn't.

    Looking forward to the fixes which will make libmdbx even more production ready ;)

    mdbx_chk_asan_crashes_20210707.tar.gz

    bug 
    opened by debrouxl 19
  • GCC build errors reported on Windows

    I have a bug report against my Nimdbx wrapper, that mdbx.c (the amalgamation prebuilt by me with make dist) fails to compile with GCC on Windows. I do know Nimdbx builds on macOS and Ubuntu. The first two errors are:

    mdbx.c:16:10: error: #include expects "FILENAME" or <FILENAME>
     #include MDBX_CONFIG_H
              ^~~~~~~~~~~~~
    mdbx.c:1371:29: error: unknown type name 'FILE_INFO_BY_HANDLE_CLASS'
         _In_ HANDLE hFile, _In_ FILE_INFO_BY_HANDLE_CLASS FileInformationClass,
                                 ^~~~~~~~~~~~~~~~~~~~~~~~~
    

    I've never seen a preprocessor symbol used as the parameter of #include before; is it a GCC extension? But it seems to work for me with Clang, and on Github Actions CI with GCC on Ubuntu.

    I grepped the libmdbx "dist" sources, and there is no definition of FILE_INFO_BY_HANDLE_CLASS anywhere. The code that uses it is conditionalized for Windows only. Is this a libmdbx bug? Or is that symbol supposed to come from a Windows platform header?

    Unrelated, but I also noticed a missing line break on line 1 of mdbx.c:

    #define MDBX_ALLOY 1n#define MDBX_BUILD_SOURCERY 9854b516cd42bd97a2447d2f4adfda8cd99464906b15a3646a1409c129fbb1bc_v0_9_2_21_g1c70814
    

    It looks like the "n" before the second #define was supposed to be a "\n"...

    (This is using libmdbx commit 3a441d6d3a755e97af803f731fc9059d8818ebd6 from 4 Jan.)

    enhancement not-a-bug 
    opened by snej 17
  • FeatureRequest: OSX support

    I'm really excited about this project, it looks very promising, and my team and I are looking into building elixir/erlang bindings for this, but I'm having trouble compiling for OSX and I saw you mentioned in another issue that there is no OSX support yet.

    Is there a timeline or an estimate as to when you're planning to make this OSX compatible?

    enhancement help wanted 
    opened by alexdovzhanyn 17
  • Latest developments in LSM

    Not sure whether this is the right place. I came across this discussion: https://forum.golangbridge.org/t/leveldb-written-in-go-build-from-scratch/4431 In the end that guy built his own storage engine: https://github.com/dgraph-io/badger It is based on recent LSM research (as I understood from the README, libmdbx uses LSM too): https://www.usenix.org/system/files/conference/fast16/fast16-papers-lu.pdf Judging by his results https://blog.dgraph.io/post/badger/ Badger beat RocksDB. And RocksDB is one of the fastest stores according to these benchmarks: https://www.influxdata.com/benchmarking-leveldb-vs-rocksdb-vs-hyperleveldb-vs-lmdb-performance-for-influxdb/

    This might be useful, in case you are not aware of it.

    invalid wontfix 
    opened by 0x4E69676874466F78 17
  • Intermittent MDBX_BAD_DBI error

    For still unknown reasons I sometimes get an MDBX_BAD_DBI error in my unit tests: The specified DBI was changed unexpectedly: MDBX_BAD_DBI: The specified DBI-handle is invalid or changed by another thread/transaction (RC=-30780)

    For instance: https://github.com/david-bouyssie/mdbx4s/runs/5423443215?check_suite_focus=true

    Do you have an idea of the underlying problem?

    not-a-bug 
    opened by david-bouyssie 13
  • Incoherent flaw of Linux' unified page/buffer cache

    Based on what was observed in a series of experiments, one can conclude that there is a kind of "incoherence" defect in the unified page cache of the Linux kernel (at least in linux-image-4.19.0-18-amd64, 4.19.208-1), which shows up in combination with the other components (io-scheduler, virtual memory management, disk/controller driver). Because of it, some of the pages written by one process become visible to another process with a delay and unevenly/non-simultaneously. As a result, a DB reader process following the links (meta-page => maindb-root => maindb_leaf => subdb-root => ...) sometimes sees a mix of pages changed in the latest transaction and stale data.

    1. Simplified, there are two processes, W (writer) and R (reader), and a DB file mapped into the memory of both.
    2. When committing a transaction, process W does the following (simplified):
      • writes all modified pages to the DB file through the file descriptor, including the new b-tree root page;
      • calls fdatasync() if durable mode is enabled;
      • updates one of the meta pages, recording in it the number of the new DB root page and the number of the transaction being committed;
      • calls fdatasync() once more if durable mode is enabled.
    3. When starting a read transaction, process R does the following (simplified):
      • reads all three meta pages and picks the "latest" one (with the highest transaction number);
      • reads all the information from the meta page, including the DB root page number;
      • checks that nothing has changed and repeats the loop if something is off;
      • continues the transaction, assuming that the up-to-date data written by process W at commit is visible to it in memory (through the mapped file).
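
    The commit and read-start steps above can be sketched as a toy single-process model. All names here (Meta, ToyDB, commit, begin_read) are hypothetical, the choice to overwrite the oldest meta slot is an assumption of this sketch, and nothing here reflects libmdbx's real on-disk layout:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meta:
    txnid: int  # number of the committed transaction
    root: int   # page number of the b-tree root for that transaction


class ToyDB:
    """Toy model: three meta slots plus a page store (not libmdbx's real layout)."""

    def __init__(self):
        self.metas = [Meta(0, 0), Meta(0, 0), Meta(0, 0)]
        self.pages = {0: "empty root"}

    def commit(self, txnid, root_page, root_data):
        # Step 1: write the data pages first (fdatasync() would follow in durable mode).
        self.pages[root_page] = root_data
        # Step 2: only then publish the new root and txnid in a meta slot.
        # Overwriting the slot with the lowest txnid is an assumption of this sketch.
        oldest = min(range(3), key=lambda i: self.metas[i].txnid)
        self.metas[oldest] = Meta(txnid, root_page)

    def begin_read(self):
        while True:
            snapshot = list(self.metas)                    # read all three meta pages
            latest = max(snapshot, key=lambda m: m.txnid)  # pick the highest txnid
            if snapshot == self.metas:                     # nothing changed meanwhile
                return latest, self.pages[latest.root]


db = ToyDB()
db.commit(1, 10, "root@1")
db.commit(2, 11, "root@2")
meta, root = db.begin_read()
print(meta.txnid, root)
```

    The reader's final step relies on the page store already holding what the writer wrote before the meta update, which is exactly the coherence assumption discussed next.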

    All of the above works provided that the unified page cache inside the OS kernel is "coherent". The point is that in all cases the kernel has no reason to keep more than one copy of each page of the mapped DB file in memory. Therefore, as soon as someone writes to this file through a file descriptor, the changes are immediately visible to all processes that have mapped the DB file into memory:

    • if a logical page of the DB file is already loaded in memory, it is updated in place and only then sent to the io-scheduler to be written to the storage medium;
    • otherwise, if the page is not in RAM, it is created and filled with the data being written, and only then goes into the io-scheduler queues;
    • but every process always sees the new data immediately after write() completes in the writer.

    In the observed cases, it looks very much like the following happens:

    • NOSYNC modes are used without MDBX_WRITEMAP, i.e. data is written through a file descriptor, but without a subsequent fdatasync();
    • inside the kernel the written data does not reach the "unified page cache" immediately, or, because of being routed to the I/O queue, old and new versions of cached pages exist at the same time;
    • after a short pause the new data "arrives" and the problem disappears;
    • as a result, the reader process sees the updated meta page, but when reading the DB root page it sometimes sees old data.
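
    The symptom described in this list can be imitated with a small toy model in which the reader's view of some pages lags behind what the writer has written. The class and its methods are hypothetical, the root is modelled as a single page whose old content stays visible, and the sketch imitates only the observed effect, not the kernel:

```python
class LaggingView:
    """Toy model of a reader whose view of some pages lags behind the writer."""

    def __init__(self):
        self.file = {}        # what the writer has written through the descriptor
        self.visible = {}     # what the reader currently sees through its mapping
        self.delayed = set()  # pages whose new content has not "arrived" yet

    def write(self, page, data, delayed=False):
        self.file[page] = data
        if delayed:
            self.delayed.add(page)  # the reader keeps seeing the old content
        else:
            self.visible[page] = data

    def settle(self):
        # After a short pause the delayed pages become visible.
        for page in self.delayed:
            self.visible[page] = self.file[page]
        self.delayed.clear()


view = LaggingView()
view.write("root", {"txn": 1})
view.write("meta", {"txn": 1})

# Commit of txn 2 without fdatasync(): the root page becomes visible late.
view.write("root", {"txn": 2}, delayed=True)
view.write("meta", {"txn": 2})

torn = view.visible["meta"]["txn"] != view.visible["root"]["txn"]
view.settle()
consistent = view.visible["meta"]["txn"] == view.visible["root"]["txn"]
print(torn, consistent)
```

    Before settle() the reader sees the meta page of transaction 2 next to the root page of transaction 1; after it, both agree again.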

    An experiment was run with MDBX_WRITEMAP mode enabled. In this mode the data is modified directly in memory: the relevant pages are loaded into the unified page cache if needed, changed in memory, marked "dirty", and only afterwards may be sent to the I/O scheduler's queue. This eliminated the problem and confirmed the hypothesis, ~~but ideally a re-check is required~~ (re-checked and confirmed many times).

    ~~So far the problem has been recorded only on kernel 4.19 and has never been observed on kernels 5.4, 5.8, 5.11 and 5.13. So, presumably, the corresponding bug/flaw has already been fixed in current Linux kernels.~~ Re-checked and confirmed many times on all current Linux kernels, but the problem reproduces (including with libmdbx's own tests) only when a heavily loaded SIEM is running on the machine in parallel.

    not-a-bug 
    opened by erthink 7
  • [mdbx::byte] better forgo char8_t and always use unsigned char

    Currently mdbx::byte is defined as either char8_t or unsigned char. However, char8_t is much more narrow in scope by design and is intended only to help with UTF8 encoding: https://stackoverflow.com/a/57453713. I suggest to always define mdbx::byte as unsigned char.

    P.S. In Silkworm we use uint8_t (= unsigned char) as our byte type; so making mdbx::byte always equal to unsigned char would allow us to use slice::byte_ptr() instead of slice::data() + cast.

    question not-a-bug proposal 
    opened by yperbasis 1
  • Engage an "overlapped I/O" on Windows

    Windows is the mad descending OS... However, some users still use it.

    Related to https://github.com/DoctorEvidence/lmdb-store/issues/24#issuecomment-782759084

    enhancement todo 
    opened by erthink 3
  • Simple careful mode for working with corrupted DB

    Add MDBX_CAREFUL environment flag which engages additional checks and turns on calling mdbx_page_check() for each page fetched to a cursor stack.

    This also allows us to move some of the checks that are currently performed in normal mode and which are disengageable by the MDBX_DISABLE_PAGECHECKS build option, under the control of this flag. Ideally, it is desirable to get rid of the MDBX_DISABLE_PAGECHECKS option altogether.

    Next, MDBX_CAREFUL should be always engaged by all mdbx-tools, especially by mdbx_chk and by mdbx_dump with -r option (recovery).

    enhancement todo 
    opened by erthink 1
  • libmdbx CLI front-end improvement ideas

    I have a couple suggestions for libmdbx's CLI front-ends, inspired by functionality provided by Berkeley DB or other DBM-type databases:

    • making it possible to pass configuration options, such as the page size, to mdbx_load - effectively an equivalent to Berkeley DB's db_load -c .... Maybe mdbx_copy between different database files could benefit from that as well.
    • in order to provide CLI scriptability (and indirectly better fuzzing support) without having to use one of the language bindings for libmdbx, creating a generic mdbxtool front-end, similar to TDB's tdbtool and GDBM's gdbmtool, with the ability to read arguments from stdin, from a file, from command-line arguments, or interactively.

    FWIW, I sent the suggestion of creating similar tools to both Oracle for BDB and Howard Chu for LMDB, and years ago, Sergey Poznyakoff improved gdbmtool to follow tdbtool's lead.

    $ gdbmtool
    
    Welcome to the gdbm tool.  Type ? for help.
    
    gdbmtool> ?
     avail                         print avail list
     bucket NUMBER                 print a bucket
     cache                         print the bucket cache
     close                         close the database
     count                         count (number of entries)
     current                       print current bucket
     debug                         query/set debug level
     define key|content { FIELD-LIST } define datum structure
     delete KEY                    delete a record
     dir                           print hash directory
     export FILE [truncate] [binary|ascii] export
     fetch KEY                     fetch record
     first                         firstkey
     hash KEY                      hash value of key
     header                        print database file header
     help                          print this help list
     history [FROM] [COUNT]        show input history
     import FILE [replace] [nometa] import
     list                          list
     next [KEY]                    nextkey
     open FILE                     open new database
     quit                          quit the program
     recover [verbose] [summary] [backup] [force] [max-failed-keys=N] [max-failed-buckets=N] [max-failures=N] recover the database
     reorganize                    reorganize
     set [VAR=VALUE...]            set or list variables
     source FILE                   source command script
     status                        print current program status
     store KEY DATA                store
     unset VAR...                  unset variables
     version                       print version of gdbm
    gdbmtool> 
    
    $ tdbtool 
    tdb> ?
    database not open
    
    tdbtool: 
      create    dbname     : create a database
      open      dbname     : open an existing database
      transaction_start    : start a transaction
      transaction_commit   : commit a transaction
      transaction_cancel   : cancel a transaction
      erase                : erase the database
      dump                 : dump the database as strings
      keys                 : dump the database keys as strings
      hexkeys              : dump the database keys as hex values
      info                 : print summary info about the database
      insert    key  data  : insert a record
      move      key  file  : move a record to a destination tdb
      storehex  key  data  : store a record (replace), key/value in hex format
      store     key  data  : store a record (replace)
      show      key        : show a record by key
      delete    key        : delete a record by key
      list                 : print the database hash table and freelist
      free                 : print the database freelist
      freelist_size        : print the number of records in the freelist
      check                : check the integrity of an opened database
      repack               : repack the database
      speed                : perform speed tests on the database
      ! command            : execute system command
      1 | first            : print the first record
      n | next             : print the next record
      q | quit             : terminate
      \n                   : repeat 'next' command
    
    tdb>
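    The scriptable front-end suggested above could be built around a command loop that reads from any input stream, so the same code serves stdin, a script file, or an in-memory buffer. The following is only a rough sketch of that idea: run_commands is a hypothetical name, the command set is borrowed loosely from the gdbmtool listing, and a std::map stands in for a real database handle (no libmdbx calls are made).

```cpp
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical command loop; std::map is a stand-in for a real database.
inline void run_commands(std::istream &in, std::ostream &out) {
  std::map<std::string, std::string> table;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream args(line);
    std::string cmd, key, value;
    args >> cmd;
    if (cmd.empty())
      continue; // skip blank lines
    if (cmd == "store" && args >> key >> value) {
      table[key] = value; // insert or replace
    } else if (cmd == "fetch" && args >> key) {
      auto it = table.find(key);
      out << (it != table.end() ? it->second : "(not found)") << '\n';
    } else if (cmd == "delete" && args >> key) {
      table.erase(key);
    } else if (cmd == "count") {
      out << table.size() << '\n';
    } else if (cmd == "quit") {
      break; // remaining input is ignored
    } else {
      out << "error: " << line << '\n';
    }
  }
}
```

    Calling run_commands(std::cin, std::cout) would give the interactive and piped modes; passing a std::ifstream would cover the read-from-file case.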
    
    enhancement proposal 
    opened by debrouxl 0
  • Lot of code comments still in Russian

    Example https://github.com/erthink/libmdbx/blob/master/src/core.c#L4673-L4675

    I am not sure what I am reading, so it would be nice to have all comments in English. Thank you.

    opened by AndreaLanfranchi 0
Releases(v0.7.0)
  • v0.7.0(Mar 18, 2020)

    • Workarounds for Wine (Windows compatibility layer for Linux).
    • MDBX_MAP_RESIZED renamed to MDBX_UNABLE_EXTEND_MAPSIZE.
    • Clarify API description, fix typos.
    • Speedup runtime checks in debug/checked builds.
    • Added checking for read/write transactions overlapping for the same thread, added MDBX_TXN_OVERLAPPING error and MDBX_DBG_LEGACY_OVERLAP option.
    • Added mdbx_key_from_jsonInteger(), mdbx_key_from_double(), mdbx_key_from_float(), mdbx_key_from_int64() and mdbx_key_from_int32() functions. See mdbx.h for description.
    • Fix compatibility (use zero for invalid DBI).
    • Refine/clarify error messages.
    • Avoids extra error messages "bad txn" from mdbx_chk when DB is corrupted.
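    The mdbx_key_from_*() functions listed above are described in mdbx.h. As background only, the sketch below shows one common general technique for turning signed or floating-point values into unsigned 64-bit keys whose integer order matches the numeric order. The function names are hypothetical, and nothing here is taken from libmdbx's implementation, which may use a different mapping.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes a 64-bit IEEE-754 double");

// Hypothetical helper: flip the sign bit so that negative values
// sort below non-negative ones when compared as unsigned integers.
inline std::uint64_t ordered_key_from_int64(std::int64_t v) {
  const std::uint64_t sign = std::uint64_t{1} << 63;
  return static_cast<std::uint64_t>(v) ^ sign;
}

// Hypothetical helper for doubles (NaN is excluded by precondition).
inline std::uint64_t ordered_key_from_double(double v) {
  assert(v == v && "NaN has no defined position in the ordering");
  std::uint64_t bits;
  std::memcpy(&bits, &v, sizeof bits); // well-defined way to read the bits
  const std::uint64_t sign = std::uint64_t{1} << 63;
  // Negative: invert all bits, which reverses their order.
  // Non-negative: set the sign bit, which places them above all negatives.
  return (bits & sign) ? ~bits : (bits | sign);
}
```

    With this mapping, -0.0 sorts immediately below +0.0 rather than comparing equal to it, which is one of the details a real implementation has to decide on.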
  • v0.6.0(Jun 12, 2020)

    • Fix mdbx_load utility for custom comparators.
    • Fix checks related to MDBX_APPEND flag inside mdbx_cursor_put().
    • Refine/fix dbi_bind() internals.
    • Refine/fix handling STATUS_CONFLICTING_ADDRESSES.
    • Rework MDBX_DBG_DUMP option to avoid disk I/O performance degradation.
    • Add built-in help to test tool.
    • Fix mdbx_env_set_geometry() for large page size.
    • Clarify API description & comments, fix typos.
  • v0.5.0(Jun 12, 2020)

    • Fix returning MDBX_RESULT_TRUE from page_alloc().
    • Fix false-positive ASAN issue.
    • Fix assertion for MDBX_NOTLS option.
    • Rework MADV_DONTNEED threshold.
    • Fix mdbx_chk utility to skip checking some numbers if walking of the B-tree was disabled.
    • Use page's mp_txnid for basic integrity checking.
    • Add MDBX_FORCE_ASSERTIONS build-time option.
    • Rework MDBX_DBG_DUMP to avoid performance degradation.
    • Rename MDBX_NOSYNC to MDBX_SAFE_NOSYNC for clarity.
    • Interpret ERROR_ACCESS_DENIED from OpenProcess() as 'process exists'.
    • Avoid using FILE_FLAG_NO_BUFFERING for compatibility with small database pages.
    • Added install section for CMake.
  • v0.4.0(Dec 7, 2019)

    1. Support for Mac OSX, FreeBSD, NetBSD, OpenBSD, DragonFly BSD, OpenSolaris, OpenIndiana (AIX and HP-UX pending).
    2. Use bootid for decisions of rollback.
    3. Counting retired pages and extended transaction info.
    4. MDBX_ACCEDE flag for database opening.
    5. Using OFD-locks and tracking for in-process multi-opening.
    6. Hot backup into pipe.
    7. cmake & amalgamated sources.
    8. Fastest internal sort implementation.
    9. New internal dirty-list implementation with lazy sorting.
    10. Support for lazy-sync-to-disk with polling.
    11. Extended key length.
    12. Last update transaction number for each sub-database.
    13. Automatic read ahead enabling/disabling.
    14. More auto-compactification.
    15. -fsanitize=undefined and -Wpedantic.
    16. Rework page merging.
    17. Nested transactions.
    18. API description.
    19. Checking for non-local filesystems to avoid DB corruption.
  • v0.2.0(Sep 25, 2018)

    New features and Compatibility breaking:

    1. Compatible with v0.1.x (stable/1.0 branch).
    2. Identical database format for 32- and 64-bit builds.
    3. Dynamic DB file size (growth/shrink) and custom page size via mdbx_env_set_geometry().
    4. Support for Elbrus architecture.
    5. Support for Windows XP and later (building by MSVC 2013 and later, but not by MinGW).
    6. Support for MDBX_EXCLUSIVE mode, including network shares.
    7. Support for large write transactions.
    8. Extended mdbx_chk to verify in-place sub-pages and accounting/statistics.
    9. Incompatible with v0.0.x (stable/0.0 branch) and with original/obsolete LMDB.

    Fixes since v0.1.x releases (stable/1.0 branch):

    • major: fix corruption due to LMDB-inherited rebalance bugs.
    • major: fix accounting for sorted-duplicates tables.
    • major: fix internal (LMDB-inherited and broken) audit.

    Fixes since 0.1.5 2018-06-14:

    • fix LMDB-inherited empty and unneeded large/overflow pages.
    • fix LMDB-inherited cursor tracking bugs.
    • fix MDBX_CORRUPTED due to concurrent open/shrink collision.
    • fix concurrent opening with custom pagesize.
    • fix MDBX_EKEYMISMATCH while updating a multi-value with MDBX_CURRENT.
    • linux: fix fallback2shared for mdbx_lck_exclusive().
    • windows: fix truncation race while unmap().
    • windows: fix nasty suspend_and_append() bug.
    • tools: fix wrong 'bad sequence' messages from mdbx_stat.
    • minor: doc fixes (ITS#8908 and ITS#8857).

    Fixes since 0.1.4 2018-05-04:

    • MAJOR: force steady-sync when shrinking DB.
    • windows: disable non-blocking DB-close.
    • minor: skip meta if usedbytes beyond EOF.
    • minor: return MDBX_CORRUPTED instead of crash if MDBX_DUPSORT mismatch.

    Fixes since 0.1.3 2018-04-03:

    • MAJOR: fix wrong freeDB search.
    • windows: fix lck_reader_alive_check().

    Fixes since 0.1.2 2018-03-22:

    • MAJOR: fix/rework rthc to avoid GNU libc nptl bug.
    • minor: fix cursor tracking inside mdbx_rebalance().
    • minor: fix mdbx_cursor_put(MDBX_APPEND+MDBX_NOOVERWRITE) return MDBX_KEYEXIST instead of MDBX_EKEYMISMATCH.

    Fixes since 0.1.1 2018-03-14:

    • minor: fix/rework cache-line alignment.
    • minor: fix unaligned access to mp_ptrs[] on fake-page.
    • minor: lookup suitable txnid for rollback to avoid meta-pages clashes.
    • tool: rework/fix read-write mode inside mdbx_chk.
    • minor: fix minor memleak (Coverity).
    • minor: add workaround for Elbrus's libc bug.

    Fixes since 0.1.0 2018-03-07:

    • minor: fix missing MDBX_DEVEL=1.
  • v0.1.6(Sep 25, 2018)

    New features and Compatibility breaking since stable/0.0:

    1. Incompatible with v0.0.x (stable/0.0 branch) and with original/obsolete LMDB.
    2. Identical database format for 32- and 64-bit builds.
    3. Dynamic DB file size (growth/shrink) and mdbx_env_set_geometry().
    4. Support for Elbrus architecture.
    5. Support for Windows (2008 and later, MSVC 2013 and later).

    Fixes since 0.1.5 2018-06-14:

    • fix LMDB-inherited empty and unneeded large/overflow pages.
    • fix LMDB-inherited cursor tracking bugs.
    • fix MDBX_CORRUPTED due to concurrent open/shrink collision.
    • fix concurrent opening with custom pagesize.
    • fix MDBX_EKEYMISMATCH while updating a multi-value with MDBX_CURRENT.
    • linux: fix fallback2shared for mdbx_lck_exclusive().
    • windows: fix truncation race while unmap().
    • windows: fix nasty suspend_and_append() bug.
    • tools: fix wrong 'bad sequence' messages from mdbx_stat.
    • minor: doc fixes (ITS#8908 and ITS#8857).

    Fixes since 0.1.4 2018-05-04:

    • MAJOR: force steady-sync when shrinking DB.
    • windows: disable non-blocking DB-close.
    • minor: skip meta if usedbytes beyond EOF.
    • minor: return MDBX_CORRUPTED instead of crash if MDBX_DUPSORT mismatch.

    Fixes since 0.1.3 2018-04-03:

    • MAJOR: fix wrong freeDB search.
    • windows: fix lck_reader_alive_check().

    Fixes since 0.1.2 2018-03-22:

    • MAJOR: fix/rework rthc to avoid GNU libc nptl bug.
    • minor: fix cursor tracking inside mdbx_rebalance().
    • minor: fix mdbx_cursor_put(MDBX_APPEND+MDBX_NOOVERWRITE) return MDBX_KEYEXIST instead of MDBX_EKEYMISMATCH.

    Fixes since 0.1.1 2018-03-14:

    • minor: fix/rework cache-line alignment.
    • minor: fix unaligned access to mp_ptrs[] on fake-page.
    • minor: lookup suitable txnid for rollback to avoid meta-pages clashes.
    • tool: rework/fix read-write mode inside mdbx_chk.
    • minor: fix minor memleak (Coverity).
    • minor: add workaround for Elbrus's libc bug.

    Fixes since 0.1.0 2018-03-07:

    • minor: fix missing MDBX_DEVEL=1.
  • v0.0.4(Sep 25, 2018)

    Release of the stable/0.0 branch, which contains legacy code that is compatible with the original LMDB API.

    Corresponds to LMDB v0.9.22, but does NOT include all fixes from libmdbx mainstream, for instance the fix for cursor state after a deletion.

    Changes since 0.0.2 2018-07-15:

    • prevent DB corruption due to LMDB-inherited rebalance bug (fixed in the master branch and 0.2.x).
    • minor doc fixes (ITS#8908 and ITS#8857).

    Changes since 0.0.2 2018-05-04:

    • add fallthrough for modern GCC and Clang.
    • add include <sys/sysmacros.h> for modern GNU libc.
    • migrate to Circle-CI 2.0
    • update Project Status in the README.

    Fixes since 0.0.1 2017-08-12 (corresponds to LMDB v0.9.21):

    Fixes since 0.0.0 2017-07-04:

    • more for cursor_del() (ITS#8699, ITS#8622).
    • fix extra madvise(MADV_REMOVE).
    • fix mdbx_set_attr().
  • v0.0.3(Aug 9, 2018)

    Release of the stable/0.0 branch, which contains legacy code that is compatible with the original LMDB API.

    Corresponds to LMDB v0.9.22, but does NOT include all fixes from libmdbx mainstream, for instance the fix for cursor state after a deletion.

    Changes since 0.0.2 2018-05-04:

    • add fallthrough for modern GCC and Clang.
    • add include <sys/sysmacros.h> for modern GNU libc.
    • migrate to Circle-CI 2.0
    • update Project Status in the README.

    Fixes since 0.0.1 2017-08-12 (corresponds to LMDB v0.9.21):

    Fixes since 0.0.0 2017-07-04:

    • more for cursor_del() (ITS#8699, ITS#8622).
    • fix extra madvise(MADV_REMOVE).
    • fix mdbx_set_attr().
  • v0.1.5(Aug 9, 2018)

    New features and Compatibility breaking since stable/0.0:

    1. Incompatible with v0.0.x (stable/0.0 branch) and with original/obsolete LMDB.
    2. Identical database format for 32- and 64-bit builds.
    3. Dynamic DB file size (growth/shrink) and mdbx_env_set_geometry().
    4. Support for Elbrus architecture.
    5. Support for Windows (2008 and later, MSVC 2013 and later).

    Fixes since 0.1.4 2018-05-04:

    • MAJOR: force steady-sync when shrinking DB.
    • windows: disable non-blocking DB-close.
    • minor: skip meta if usedbytes beyond EOF.
    • minor: return MDBX_CORRUPTED instead of crash if MDBX_DUPSORT mismatch.

    Fixes since 0.1.3 2018-04-03:

    • MAJOR: fix wrong freeDB search.
    • windows: fix lck_reader_alive_check().

    Fixes since 0.1.2 2018-03-22:

    • MAJOR: fix/rework rthc to avoid GNU libc nptl bug.
    • minor: fix cursor tracking inside mdbx_rebalance().
    • minor: fix mdbx_cursor_put(MDBX_APPEND+MDBX_NOOVERWRITE) return MDBX_KEYEXIST instead of MDBX_EKEYMISMATCH.

    Fixes since 0.1.1 2018-03-14:

    • minor: fix/rework cache-line alignment.
    • minor: fix unaligned access to mp_ptrs[] on fake-page.
    • minor: lookup suitable txnid for rollback to avoid meta-pages clashes.
    • tool: rework/fix read-write mode inside mdbx_chk.
    • minor: fix minor memleak (Coverity).
    • minor: add workaround for Elbrus's libc bug.

    Fixes since 0.1.0 2018-03-07:

    • minor: fix missing MDBX_DEVEL=1.
  • v0.1.4(Aug 9, 2018)

    New features and Compatibility breaking since stable/0.0:

    1. Incompatible with v0.0.x (stable/0.0 branch) and with original/obsolete LMDB.
    2. Identical database format for 32- and 64-bit builds.
    3. Dynamic DB file size (growth/shrink) and mdbx_env_set_geometry().
    4. Support for Elbrus architecture.
    5. Support for Windows (2008 and later, MSVC 2013 and later).

    Fixes since 0.1.3 2018-04-03:

    • MAJOR: fix wrong freeDB search.
    • windows: fix lck_reader_alive_check().

    Fixes since 0.1.2 2018-03-22:

    • MAJOR: fix/rework rthc to avoid GNU libc nptl bug.
    • minor: fix cursor tracking inside mdbx_rebalance().
    • minor: fix mdbx_cursor_put(MDBX_APPEND+MDBX_NOOVERWRITE) return MDBX_KEYEXIST instead of MDBX_EKEYMISMATCH.

    Fixes since 0.1.1 2018-03-14:

    • minor: fix/rework cache-line alignment.
    • minor: fix unaligned access to mp_ptrs[] on fake-page.
    • minor: lookup suitable txnid for rollback to avoid meta-pages clashes.
    • tool: rework/fix read-write mode inside mdbx_chk.
    • minor: fix minor memleak (Coverity).
    • minor: add workaround for Elbrus's libc bug.

    Fixes since 0.1.0 2018-03-07:

    • minor: fix missing MDBX_DEVEL=1.
  • v0.0.2(Aug 9, 2018)

    Release of the stable/0.0 branch, which contains legacy code that is compatible with the original LMDB API.

    Corresponds to LMDB v0.9.22, but does NOT include all fixes from libmdbx mainstream, for instance the fix for cursor state after a deletion.

    Fixes since 0.0.1 2017-08-12 (corresponds to LMDB v0.9.21):

    Fixes since 0.0.0 2017-07-04:

    • more for cursor_del() (ITS#8699, ITS#8622).
    • fix extra madvise(MADV_REMOVE).
    • fix mdbx_set_attr().
  • v0.1.3(Aug 9, 2018)

    New features and Compatibility breaking since stable/0.0:

    1. Incompatible with v0.0.x (stable/0.0 branch) and with original/obsolete LMDB.
    2. Identical database format for 32- and 64-bit builds.
    3. Dynamic DB file size (growth/shrink) and mdbx_env_set_geometry().
    4. Support for Elbrus architecture.
    5. Support for Windows (2008 and later, MSVC 2013 and later).

    Fixes since 0.1.2 2018-03-22:

    • MAJOR: fix/rework rthc to avoid GNU libc nptl bug.
    • minor: fix cursor tracking inside mdbx_rebalance().
    • minor: fix mdbx_cursor_put(MDBX_APPEND+MDBX_NOOVERWRITE) return MDBX_KEYEXIST instead of MDBX_EKEYMISMATCH.

    Fixes since 0.1.1 2018-03-14:

    • minor: fix/rework cache-line alignment.
    • minor: fix unaligned access to mp_ptrs[] on fake-page.
    • minor: lookup suitable txnid for rollback to avoid meta-pages clashes.
    • tool: rework/fix read-write mode inside mdbx_chk.
    • minor: fix minor memleak (Coverity).
    • minor: add workaround for Elbrus's libc bug.

    Fixes since 0.1.0 2018-03-07:

    • minor: fix missing MDBX_DEVEL=1.
  • v0.1.2(Aug 9, 2018)

    New features and Compatibility breaking since stable/0.0:

    1. Incompatible with v0.0.x (stable/0.0 branch) and with original/obsolete LMDB.
    2. Identical database format for 32- and 64-bit builds.
    3. Dynamic DB file size (growth/shrink) and mdbx_env_set_geometry().
    4. Support for Elbrus architecture.
    5. Support for Windows (2008 and later, MSVC 2013 and later).

    Fixes since 0.1.1 2018-03-14:

    • minor: fix/rework cache-line alignment.
    • minor: fix unaligned access to mp_ptrs[] on fake-page.
    • minor: lookup suitable txnid for rollback to avoid meta-pages clashes.
    • tool: rework/fix read-write mode inside mdbx_chk.
    • minor: fix minor memleak (Coverity).
    • minor: add workaround for Elbrus's libc bug.

    Fixes since 0.1.0 2018-03-07:

    • minor: fix missing MDBX_DEVEL=1.
  • v0.1.1(Aug 9, 2018)

    New features and Compatibility breaking since stable/0.0:

    1. Incompatible with v0.0.x (stable/0.0 branch) and with original/obsolete LMDB.
    2. Identical database format for 32- and 64-bit builds.
    3. Dynamic DB file size (growth/shrink) and mdbx_env_set_geometry().
    4. Support for Elbrus architecture.
    5. Support for Windows (2008 and later, MSVC 2013 and later).

    Fixes since 0.1.0 2018-03-07:

    • minor: fix missing MDBX_DEVEL=1.
  • v0.1.0(Aug 9, 2018)

    New features and Compatibility breaking since stable/0.0:

    1. Incompatible with v0.0.x (stable/0.0 branch) and with original/obsolete LMDB.
    2. Identical database format for 32- and 64-bit builds.
    3. Dynamic DB file size (growth/shrink) and mdbx_env_set_geometry().
    4. Support for Elbrus architecture.
    5. Support for Windows (2008 and later, MSVC 2013 and later).
  • v0.0.1(Aug 12, 2017)

  • LMDB_0.9.19(Feb 22, 2017)

Owner
Леонид Юрьев (Leonid Yuriev)
Please don't use my work, if you are associated with Adolf Hitler, Stepan Bandera, George Soros, Michael Hodorkovsky, either support an actions of these felons.
MySQL Server, the world's most popular open source database, and MySQL Cluster, a real-time, open source transactional database.

Copyright (c) 2000, 2021, Oracle and/or its affiliates. This is a release of MySQL, an SQL database server. License information can be found in the

MySQL 8.6k Dec 26, 2022
An Embedded NoSQL, Transactional Database Engine

UnQLite - Transactional Embedded Database Engine

PixLab | Symisc Systems 1.8k Dec 24, 2022
LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values. Authors: Sanjay Ghem

Google 31.6k Jan 7, 2023
Simple constant key/value storage library, for read-heavy systems with infrequent large bulk inserts.

Sparkey is a simple constant key/value storage library. It is mostly suited for read heavy systems with infrequent large bulk inserts. It includes bot

Spotify 989 Dec 14, 2022
Kreon is a key-value store library optimized for flash-based storage

Kreon is a key-value store library optimized for flash-based storage, where CPU overhead and I/O amplification are more significant bottlenecks compared to I/O randomness.

Computer Architecture and VLSI Systems (CARV) Laboratory 24 Jul 14, 2022
BerylDB is a data structure data manager that can be used to store data as key-value entries.

BerylDB is a data structure data manager that can be used to store data as key-value entries. The server allows channel subscription and is optimized to be used as a cache repository. Supported structures include lists, sets, and keys.

BerylDB 203 Dec 16, 2022
A very fast lightweight embedded database engine with a built-in query language.

upscaledb 2.2.1 Fr 10. Mär 21:33:03 CET 2017 (C) Christoph Rupp, [email protected]; http://www.upscaledb.com This is t

Christoph Rupp 542 Dec 30, 2022
A mini database for learning database

A mini database for learning database

Chuckie Tan 4 Nov 14, 2022
ESE is an embedded / ISAM-based database engine, that provides rudimentary table and indexed access.

Extensible-Storage-Engine A Non-SQL Database Engine The Extensible Storage Engine (ESE) is one of those rare codebases having proven to have a more th

Microsoft 792 Dec 22, 2022
C++11 wrapper for the LMDB embedded B+ tree database library.

lmdb++: a C++11 wrapper for LMDB This is a comprehensive C++ wrapper for the LMDB embedded database library, offering both an error-checked procedural

D.R.Y. C++ 263 Dec 27, 2022
C++ embedded memory database

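ShadowDB: a C++ embedded in-memory database. Minimalist syntax; supports custom indexes and compound-condition queries ('<','<=','==','>=','>','!=',&&,||); can quickly fork a copy of the data. // Simple ShadowDB example // ShadowDB can create indexes and quickly fork a branch of the data; it is a C+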

null 13 Nov 10, 2022
Nebula Graph is a distributed, fast open-source graph database featuring horizontal scalability and high availability

Nebula Graph is an open-source graph database capable of hosting super large scale graphs with dozens of billions of vertices (nodes) and trillions of edges, with milliseconds of latency.

vesoft inc. 834 Dec 24, 2022
GridDB is a next-generation open source database that makes time series IoT and big data fast and easy.

Overview GridDB is Database for IoT with both NoSQL interface and SQL Interface. Please refer to GridDB Features Reference for functionality. This rep

GridDB 2k Jan 8, 2023
SiriDB is a highly-scalable, robust and super fast time series database

SiriDB is a highly-scalable, robust and super fast time series database. Built from the ground up, SiriDB uses a unique mechanism to operate without a global index and allows server resources to be added on the fly. SiriDB's unique query language includes dynamic grouping of time series for easy analysis over large amounts of time series.

SiriDB 471 Jan 9, 2023
ObjectBox C and C++: super-fast database for objects and structs

ObjectBox Embedded Database for C and C++ ObjectBox is a superfast C and C++ database for embedded devices (mobile and IoT), desktop and server apps.

ObjectBox 152 Dec 23, 2022
DuckDB is an in-process SQL OLAP Database Management System

DuckDB is an in-process SQL OLAP Database Management System

DuckDB 7.8k Jan 3, 2023
YugabyteDB is a high-performance, cloud-native distributed SQL database that aims to support all PostgreSQL features

YugabyteDB is a high-performance, cloud-native distributed SQL database that aims to support all PostgreSQL features. It is best suited for cloud-native OLTP (i.e. real-time, business-critical) applications that need absolute data correctness and require at least one of the following: scalability, high tolerance to failures, or globally-distributed deployments.

yugabyte 7.4k Jan 7, 2023
TimescaleDB is an open-source database designed to make SQL scalable for time-series data.

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.

Timescale 14.3k Jan 2, 2023
Beryl-cli is a client for the BerylDB database server

Beryl-cli is a client for the BerylDB database server. It offers multiple commands and is designed to be fast and user-friendly.

BerylDB 11 Oct 9, 2022