FoundationDB - the open source, distributed, transactional key-value store

Overview

FoundationDB is a distributed database designed to handle large volumes of structured data across clusters of commodity servers. It organizes data as an ordered key-value store and employs ACID transactions for all operations. It is especially well-suited for read/write workloads but also has excellent performance for write-intensive workloads. Users interact with the database through language bindings for its API.

To learn more about FoundationDB, visit foundationdb.org

Documentation

Documentation can be found online at https://apple.github.io/foundationdb/. The documentation covers details of API usage, background information on design philosophy, and extensive usage examples. Docs are built from the source in this repo.

Forums

The FoundationDB Forums are the home for most of the discussion and communication about the FoundationDB project. We welcome your participation! We want FoundationDB to be a great project to be a part of and, as part of that, have established a Code of Conduct to establish what constitutes permissible modes of interaction.

Contributing

Contributions to FoundationDB can take the form of changes to the code base, sharing your experience and insights with the community on the Forums, or contributions to projects that make use of FoundationDB. Please see the contributing guide for more specifics.

Getting Started

Binary downloads

Developers interested in using FoundationDB can get started by downloading and installing a binary package. Please see the downloads page for a list of available packages.

Compiling from source

Developers on an OS for which there is no binary package, or who would like to start hacking on the code, can get started by compiling from source.

The official Docker image for building is foundationdb/build, which has all dependencies installed. The Docker image definitions used by FoundationDB team members can be found in the dedicated repository.

To build outside the official docker image you'll need at least these dependencies:

  1. Install CMake (version 3.13 or higher)
  2. Install Mono
  3. Install Ninja (optional, but recommended)

If compiling for local development, please set -DUSE_WERROR=ON in cmake. Our CI compiles with -Werror on, so this way you'll find out about compiler warnings that break the build earlier.

Once you have your dependencies, you can run cmake and then build:

  1. Check out this repository.
  2. Create a build directory (you can have the build directory anywhere you like). There is currently a directory in the source tree called build, but you should not use it. See #3098
  3. cd <PATH_TO_BUILD_DIRECTORY>
  4. cmake -G Ninja <PATH_TO_FOUNDATIONDB_SOURCE>
  5. ninja # If this crashes it probably ran out of memory. Try ninja -j1
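For example, a typical out-of-source build might look like the following sketch (the directory names are placeholders; any build directory outside the source tree works):

    git clone https://github.com/apple/foundationdb.git ~/src/foundationdb
    mkdir -p ~/build/foundationdb && cd ~/build/foundationdb
    cmake -G Ninja -DUSE_WERROR=ON ~/src/foundationdb   # -DUSE_WERROR=ON is recommended for local development
    ninja                                               # if this crashes, it probably ran out of memory; try ninja -j1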

Language Bindings

Each language binding supported by cmake has a README.md file in the corresponding bindings/lang directory.

Generally, cmake will build all language bindings for which it can find all necessary dependencies. After each successful cmake run, cmake will tell you which language bindings it is going to build.

Generating compile_commands.json

CMake can build a compilation database for you. However, the default generated one is not too useful as it operates on the generated files. When running make, the build system will create another compile_commands.json file in the source directory. This can then be used with tools like CCLS, CQuery, etc. This way you can get code completion and code navigation in flow files. It is not yet perfect (it will show a few errors) but we are constantly working on improving the development experience.

By default, CMake will not produce a compile_commands.json; you must pass -DCMAKE_EXPORT_COMPILE_COMMANDS=ON. This also enables the target processed_compile_commands, which rewrites compile_commands.json to describe the actor compiler source files, not the post-processed output files, and places the output file in the source directory. This file should then be picked up automatically by any tooling.
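As a sketch (paths are placeholders), an out-of-source build that produces the rewritten database might look like:

    cmake -G Ninja -DCMAKE_EXPORT_COMPILE_COMMANDS=ON <PATH_TO_FOUNDATIONDB_SOURCE>
    ninja processed_compile_commands   # assumed invocation of the target named above; writes compile_commands.json into the source directory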

Note that if building inside the foundationdb/build Docker image, the resulting paths will still be incorrect and require manual fixing. You may wish to re-run cmake with -DCMAKE_EXPORT_COMPILE_COMMANDS=OFF to prevent it from reverting your manual changes.

Using IDEs

CMake has built-in support for a number of popular IDEs. However, because flow files are precompiled with the actor compiler, an IDE will not be very useful out of the box: the user would only be presented with the generated code, which is not what they want to edit and get IDE features for.

The good news is that it is possible to generate project files for editing flow with a supported IDE. There is a CMake option called OPEN_FOR_IDE which will generate a project which can be opened in an IDE for editing. You won't be able to build this project, but you will be able to edit the files and get most editing and navigation features your IDE supports.

For example, if you want to use Xcode to make changes to FoundationDB, you can create an Xcode project with the following command:

cmake -G Xcode -DOPEN_FOR_IDE=ON <FDB_SOURCE_DIRECTORY>

You should create a second build directory which you will use for building and debugging.
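For instance (directory names are placeholders), you might keep the IDE project and the real build side by side:

    mkdir ide && cd ide
    cmake -G Xcode -DOPEN_FOR_IDE=ON <FDB_SOURCE_DIRECTORY>   # project for editing only; not buildable
    cd .. && mkdir build && cd build
    cmake -G Ninja <FDB_SOURCE_DIRECTORY>                     # separate build directory for building and debugging
    ninja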

FreeBSD

  1. Check out this repo on your server.

  2. Install compile-time dependencies from ports.

  3. (Optional) Use tmpfs & ccache for significantly faster repeat builds

  4. (Optional) Install a JDK for Java Bindings. FoundationDB currently builds with Java 8.

  5. Navigate to the directory where you checked out the foundationdb repo.

  6. Build from source.

    sudo pkg install -r FreeBSD \
        shells/bash devel/cmake devel/ninja devel/ccache  \
        lang/mono lang/python3 \
        devel/boost-libs devel/libeio \
        security/openssl
    mkdir .build && cd .build
    cmake -G Ninja \
        -DUSE_CCACHE=on \
        -DDISABLE_TLS=off \
        -DUSE_DTRACE=off \
        ..
    ninja -j 10
    # run fast tests
    ctest -L fast
    # run all tests
    ctest --output-on-failure -v

Linux

There are no special requirements for Linux. A docker image can be pulled from foundationdb/build that has all of FoundationDB's dependencies pre-installed, and is what the CI uses to build and test PRs.

cmake -G Ninja <PATH_TO_FOUNDATIONDB_SOURCE>
ninja
cpack -G DEB

For RPM simply replace DEB with RPM.
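For example, in the same build directory:

    cpack -G RPM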

MacOS

The build on macOS works the same way as on Linux. To get boost and ninja you can use Homebrew.

cmake -G Ninja <PATH_TO_FOUNDATIONDB_SOURCE>

To generate an installable package, run:

ninja
$SRCDIR/packaging/osx/buildpkg.sh . $SRCDIR

Windows

Under Windows, the build instructions are very similar, the main difference being that Visual Studio is used to compile.

  1. Install Visual Studio 2017 (Community Edition is tested)
  2. Install CMake (version 3.12 or higher)
  3. Download version 1.72 of Boost
  4. Unpack boost (you don't need to compile it)
  5. Install Mono
  6. (Optional) Install a JDK. FoundationDB currently builds with Java 8
  7. Set JAVA_HOME to the unpacked location and JAVA_COMPILE to $JAVA_HOME/bin/javac.
  8. Install Python if it is not already installed by Visual Studio
  9. (Optional) Install WIX. Without it Visual Studio won't build the Windows installer
  10. Create a build directory (you can have the build directory anywhere you like): mkdir build
  11. cd build
  12. cmake -G "Visual Studio 15 2017 Win64" -DBOOST_ROOT=
  13. This should succeed. In which case you can build using msbuild: msbuild /p:Configuration=Release foundationdb.sln. You can also open the resulting solution in Visual Studio and compile from there. However, be aware that using Visual Studio for development is currently not supported as Visual Studio will only know about the generated files. msbuild is located at c:\Program Files (x86)\MSBuild\14.0\Bin\MSBuild.exe for Visual Studio 15.
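The following sketch pulls the steps above together (all paths are placeholders; adjust them to your Boost, JDK, and source locations):

    set JAVA_HOME=C:\path\to\jdk8
    set JAVA_COMPILE=%JAVA_HOME%\bin\javac
    mkdir build
    cd build
    cmake -G "Visual Studio 15 2017 Win64" -DBOOST_ROOT=C:\path\to\boost_1_72_0 C:\path\to\foundationdb
    msbuild /p:Configuration=Release foundationdb.sln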

If you installed WIX before running cmake you should find the FDBInstaller.msi in your build directory under packaging/msi.

TODO: Re-add instructions for TLS support #3022

Issues
  • Support for 3.x clusters to live migrate

    I remember @ajbeamon mentioning that there's something that needs to be changed in the 3.x code (IIRC?) that will allow us to ship a foundationdb-client package that can talk to both a 3.x cluster and a 5.x cluster. I can't seem to find that in the code, can someone point us in the right direction?

    opened by panghy 106
  • OpenTelemetry API Tracing.

    The PR introduces a new distributed tracing implementation called OTELSpan, which attempts to provide greater compatibility with the W3C Trace-Context specification (https://www.w3.org/TR/trace-context) and the OpenTelemetry specification and API (https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md).

    Unit testing was added for this new implementation.

    Design Document

    N.B. This provides the implementation of Open Telemetry compliant traces but doesn't yet update the call sites to use this new tracing format. A follow up PR will handle those additional changes.

    opened by sfc-gh-rjenkins 103
  • Physical Shard Move

    1. Introduced IKeyValueStore::checkpoint() and IKeyValueStore::restore(), and implemented these two for RocksDB.

      1. Store FDB data in a non-default RocksDB column family, so that it is easier to import a column family checkpoint.
      2. With checkpoint(), RocksDB creates a checkpoint of the FDB column family.
      3. With restore(), RocksDB imports the column family checkpoint.
    2. Added an FDB client API to create a checkpoint within a transaction.

    3. Added a file transfer interface to the storage server, so that a checkpoint can be transferred.

    4. Added checkpoint GC.

    Test: passed 10000 runs of the single test: tests/fast/PhysicalShardMove.toml 20220315-173036-heliu-afdd443083d7948a

    opened by liquid-helium 95
  • Add transaction option for clients to use cached read versions

    Not every client of FDB requires full consistency all the time, specifically regarding reads. If there are transactions that do not require full consistency, and are okay with seeing a bit of stale history within a certain time bound, it is possible to avoid the request to the proxy with the use of cached read versions.

    The GRV cache is implemented on the Database/DatabaseContext level, which is an object which represents a client's connection to the FDB cluster. After the first request to use GRV caching, the cache will be populated by all future GRVs, commits, as well as a background actor which periodically asks the proxy for read versions. Future reads from this client will be able to optionally request to use cached read versions instead of waiting for a round-trip request/response from the GRV proxies.

    NOTE: THIS FEATURE WILL NOT CURRENTLY WORK WITH THE MULTIVERSION/MULTITHREADED CLIENT (There are future plans to accommodate for several Database connections, but requires implementation of shared state and more testing)

    To use the GRV cache, the transaction option USE_GRV_CACHE needs to be set, which is done on a per-transaction basis. Doing so will attempt to provide a cached read version within the staleness bounds specified by the MAX_VERSION_CACHE_LAG knob, falling back on the regular GRV path and contacting the proxy if no such read version is available. If the operator is trying to use the GRV cache with a transaction that was previously handled by the onError method, the option will not be set and the transaction will instead enter the regular GRV path.

    If an administrator wishes to completely avoid the GRV cache, the knob FORCE_GRV_CACHE_OFF will take precedence over all other options and will return to the regular behaviour, regardless of whether the transaction option is set or not.

    There are several other internal features which may be affected by the use of the GRV cache, most notable of which is the throttling capabilities of RateKeeper. There are several knobs (with commented details in fdbclient/ClientKnobs.h) which can be fine-tuned to adjust how the cache reacts to RateKeeper and throttled read version requests. By default, the GRV cache will be shut off for 60 seconds if the GRV proxy experiences "sustained throttling".

    The new SidebandSingle workload tests the consistency of the cache, and specifically also targets the case when commits return with an unknown result. In simulated workloads, additional sim_validation lines have been added in the code paths where read versions are updated to ensure consistency and verify that the staleness bounds are being enforced properly.
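
    As a rough illustration only (assuming the option is exposed through a client binding's generated option setters, e.g. as set_use_grv_cache() in the Python binding; verify against your binding's generated options before relying on it), a transaction opting into cached read versions might look like:

        import fdb
        fdb.api_version(710)  # assumed API version; use whichever version of your binding exposes the option

        db = fdb.open()  # default cluster file

        tr = db.create_transaction()
        tr.options.set_use_grv_cache()    # assumption: USE_GRV_CACHE surfaces as this generated setter
        value = tr[b'some-key'].wait()    # this read may use a cached read version within MAX_VERSION_CACHE_LAG
        print(value)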

    opened by sfc-gh-jfu 95
  • Multi-tenant API

    This introduces a new concept called tenants, which are exposed to the user as distinct key-spaces identified by a name. Clients can create and delete tenants and run transactions inside them.

    Use of this API is opt-in, and users can continue to use the cluster without involving tenants if they choose.
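
    A minimal sketch of the opt-in flow, assuming the Python binding's tenant helpers (fdb.tenant_management.create_tenant and Database.open_tenant); treat the exact names as assumptions and check the tenant documentation for your release:

        import fdb
        fdb.api_version(710)  # assumed API version with tenant support

        db = fdb.open()

        # Create a tenant (a distinct key-space identified by a name), then run a transaction inside it.
        fdb.tenant_management.create_tenant(db, b'my_tenant')  # assumed helper for tenant creation
        tenant = db.open_tenant(b'my_tenant')

        @fdb.transactional
        def write_greeting(tr):
            tr[b'greeting'] = b'hello'

        write_greeting(tenant)  # keys written here are scoped to the tenant's key-space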

    opened by sfc-gh-ajbeamon 93
  • Redwood page format refactor to support format evolution, forensic analysis and future encryption scheme

    Significant refactor of ArenaPage API and everything that uses it.

    • Pages now have a series of headers

      • A universal header specifying header version and encoding type, and offsets to subheaders and payload
      • A main subheader (currently v1)
      • An encoding-specific subheader based on encoding type
    • The main subheader is responsible for protecting all non-payload bytes, including itself, the universal header, and the encoding-specific subheader

    • Payload is after all headers, and its protection and/or encryption scheme is handled by the encoding type

    • New main header versions can be easily added in the future. Current version is 1.

    • Encoders can be easily extended in the future. Presently there are two

      • XXHashEncoding which protects the payload with a checksum
      • XOREncryptionEncoding which trivially encrypts the payload and protects it with a checksum. It exists only to prove the framework is sufficient to support encryption.
    • There is a new encryption key API, IEncryptionKeyProvider. It

      • provides the secret for a given key ID
      • provides a key ID and secret to use for a page, given the page's key-value key range
    • Pager and BTree required significant modification to use the new page and encryption key APIs

    • Valgrind support is now much more precise than before. Previously, all page buffer bytes written to disk were marked as defined memory first, but now each data structure carefully marks only its unused payload portion as defined memory.

    Passed 100k correctness with Redwood storage and 100k Redwood correctness unit tests.

    opened by sfc-gh-satherton 89
  • Read-aware Data Distributor (default disabled)

    • Add the DDLoadRebalance() function as the counterpart of BgDDValleyFiller and BgDDMountainChopper for read-load (read bandwidth) rebalancing;
    • Add functionality to the datadistribution command to disable and enable rebalance_read and rebalance_disk separately;
    • If read sampling is disabled, the read-aware DD is disabled by nature;

    Test:

    • All unit test cases passed;
    • Simulation tests (1/200k fail - MutationLogReader)

    Design Doc | Performance Analysis

    opened by sfc-gh-xwang 88
  • Remote ikvs debugging

    This PR is based on #5648 and fixes its correctness bugs. The purpose is to give an option to spawn a dedicated child process (in code, we use bin/fdbserver -p flowprocess ... to run the child process) to run the storage engine. Thus the storage server process will have more CPU time for networking. This can be used when you have idle CPU resources on your machine.

    An fdbserver knob, remote_kv_store, is added; if true, the storage engine is opened in a spawned child process instead of in the storage server process.

    Added a new error, remote_kvs_cancelled, to replace actor_cancelled, as actor_cancelled is not supposed to be sent through networksender. remote_kvs_cancelled is treated the same way as actor_cancelled in worker and storageserver.

    One notable point is that, in simulation, we destroy the child process after it is closed. However, because the underlying cleanup work is still in progress while the parent process waits on onClosed(), we do not destroy the child process until onClosed() has been notified in the parent process.

    The child process is bound to the parent process using prctl: when the parent process dies, the child process is killed automatically by SIGTERM. Conversely, when the child process dies, the actor that spawned the child throws the error please_reboot_remote_kv_store, a new error added to let the storage server worker reboot itself. Ideally only the storage server would be rebooted, not the whole process, but that needs a refactor of a hot code path; since this is a rare case, this solution is enough for now.

    RocksDB is not supported as a remote store, as there are unimplemented interfaces like checkpoint in IKeyValueStore.h.

    opened by sfc-gh-clin 85
  • Add support for change feeds

    This PR adds change feeds to FDB. Change feeds let a user register an identifier for an individual key range and later fetch all mutations done against keys in that range. This is intended for use on small key ranges; using this feature to collect changes against a range larger than 1 GB is not recommended. Furthermore, performance may suffer when a change feed has been registered but is not popped and too many mutations build up in the feed.

    Because of all of these edge cases, it is currently only intended to be used internally to FDB.

    Passed 200k correctness tests and 200k change-feed-specific correctness tests.

    opened by sfc-gh-etschannen 80
  • Unify flags in binaries

    Description: Flags in our binaries (fdbcli, fdbbackup, etc.) currently contain a mixture of _ and -. This PR unifies the flags so that only - is exposed to users. Internally, we use a format that only allows leading hyphens and converts all other hyphens to underscores when matching; for example, --cluster-file is compared as --cluster_file.

    #4817 is the related issue.

    Testing

    1. Correctness test result: 20211206-174631-chliu-foundationdb-2e6bdc276625754e compressed=True data_size=20995705 duration=3564672 ended=100003 fail_fast=10 max_runs=100000 pass=100003 priority=100 remaining=0 runtime=0:48:11 sanity=False started=100002 stopped=20211206-183442 submitted=20211206-174631 timeout=5400 username=chliu-foundationdb

    2. Ran several commands with flags manually, e.g. fdbcli --build-flags (output is the same as fdbcli --build_flags), fdbcli --no-status (output is the same as fdbcli --no_status), and fdbbackup --build-flags (output is the same as fdbbackup --build_flags).

    This also changes fdbmonitor/FDBService to use hyphens by default. For each fdbmonitor specific option, it will also check whether the same option exists with underscores instead.

    Knob configuration allows specifying hyphens or underscores too.

    opened by sfc-gh-chliu 75
  • Features/private request streams

    This is the first part of adding an authorization feature to FDB. Specifically, this PR adds the capability to allow certain types of connections only from certain IP subnets. It also contains some of the basic building blocks for later token-based authorization.

    Ran 100k simulation tests with no failures: 20220410-172040-mpilman-foundationdb-dbda0232f9e1c223

    opened by sfc-gh-mpilman 73
  • Create audit logs from transaction log or redo log

    Hi, I am new to the fdb cluster and I want to find a way to keep an audit log of the operations in fdb (e.g., getrange, set, get, and so on). As I understand it, analyzing the transaction log or redo log may be a way to fetch the operations, but I only find client trace logs in /var/log/foundationdb. My question is: where is the transaction log or redo log? Or is there any configuration in the file foundationdb.cnf?

    opened by CodingSuen 0
  • Sharded storage

    Put description here...

    opened by liquid-helium 8
  • Update the tenant special keys submodule to support multiple sub-ranges

    Rename and restructure the tenant special keys to support multiple sub-ranges. The existing range \xff\xff/management/tenant_map/ becomes \xff\xff/management/tenant/map/. Future ranges can be added underneath the tenant/ module. This will enable support for modifying tenant state (e.g. the future tenant groups) at the same time that we create a tenant.

    opened by sfc-gh-ajbeamon 5
  • Worker register with cluster controller before disk file recovery

    Previously, worker servers only registered with the cluster controller after local file recovery. This PR changes workers to register before local file recovery, while indicating that although they can serve stateless roles, they cannot yet become storage or TLog servers. After local file recovery, the worker registers again to indicate that it can now become a storage or TLog server.

    The change fixes the deadlock issue with encryption, where, if the only workers that can become master and EKP are stateful, those workers would not be able to register with the cluster controller, since no EKP is available to provide cipher keys to decrypt encrypted storage.

    For now the change applies only when encryption is on; we will make it the default behavior once we think it is stable.

    opened by sfc-gh-yiwu 7
  • Change Feed Reading CPU optimization

    Improves the CPU efficiency of finding change feed mutations in memory by binary searching, or exponential searching in cases where that is likely to be significantly more efficient than binary searching.

    Testing summary. Correctness: passed 10k BlobGranule* tests, and added surgical validation ASSERTs to ensure correctness

    Performance:

    • In instrumented tests, the number of search comparisons with the exponential-search-if-atLatest optimization was ~1.8x lower on average than always doing binary search, and more than 30x lower than the previous linear search
    • On average, in single-process write-only tests, storage server CPU as a whole was around 8% lower, for both uniform random and sequential write workloads.

    Workload generation with mako: sequential: mako -p 1 -t 1 --keylen 32 --vallen 96 --mode build --rows 2500000 --trace sandbox/mako/ -c fdb.cluster --tps 100

    uniform random: mako -p 1 -t 1 --keylen 32 --vallen 96 --mode run --rows 3000000 --trace sandbox/mako/ -c fdb.cluster -x i100 --tps 100 --seconds 300

    opened by sfc-gh-jslocum 2
  • cherry pick pr7499 to release 7.1 branch

    When version vector was developed, a problem was misdiagnosed in PR 5949. It was thought that in UNICAST mode, version vector recovery did not work with cursors built out of tLogs in non-local data centers, when actually the issue encountered at that time was the one described in PR 7405.

    Fixes -r simulation --knob_txs_popped_max_delay=4 --logsize=4096MiB -f ./src/foundationdb/tests/fast/KillRegionCycle.toml -b on -s 715312789

    100K successful runs of KillRegionCycle.toml with vv enabled 20220629-185214-henrylambright-d6f7de4a60eab720

    opened by dlambrig 5