CacheLib

A pluggable caching engine for building and scaling high-performance cache services. See www.cachelib.org for documentation and more information.

What is CacheLib?

CacheLib is a C++ library that provides a high-performance, in-process caching mechanism. It offers a thread-safe API for building high-throughput, low-overhead caching services, with the built-in ability to leverage DRAM and SSD caching transparently.

Performance benchmarking

CacheLib provides a standalone executable, CacheBench, that can be used to evaluate the performance of caching heuristics and hardware platforms against production workloads. Additionally, CacheBench enables stress-testing implementation and design changes to CacheLib to catch correctness and performance issues early.

See CacheBench for usage details and examples.

Building and installation

CacheLib provides a build script which prepares and installs all dependencies and prerequisites, then builds CacheLib. The build script has been tested to work on CentOS 8, Ubuntu 18.04, and Debian 10.

git clone https://github.com/facebook/CacheLib
cd CacheLib
./contrib/build.sh -d -j -v

# The resulting library and executables:
./opt/cachelib/bin/cachebench --help

Re-running ./contrib/build.sh will update CacheLib and its dependencies to their latest versions and rebuild them.

See build for more details about the building and installation process.

Contributing

We'd love to have your help in making CacheLib better. If you're interested, please read our guide to contributing.

License

CacheLib is Apache licensed, as found in the LICENSE file.

Reporting and Fixing Security Issues

Please do not open GitHub issues or pull requests - this makes the problem immediately visible to everyone, including malicious actors. Security issues in CacheLib can be safely reported via Facebook's Whitehat Bug Bounty program:

https://www.facebook.com/whitehat

Facebook's security team will triage your report and determine whether or not it is eligible for a bounty under our program.

Comments
  • Shorten critical section in findEviction

    Shorten critical section in findEviction

    Remove the item from mmContainer and drop the lock before attempting eviction.

    The change improves throughput for default hit_ratio/graph_cache_leader_fbobj config by ~30%. It also reduces p99 latencies significantly. The improvement is even bigger for multi-tier approach (multiple memory tiers) which we are working on here: https://github.com/pmem/CacheLib

    I was not able to find any races/synchronization problems with this approach but there is a good chance I missed something - it would be great if you could review and evaluate this patch.

    CLA Signed 
    opened by igchor 25
  • First set of changes to cache configuration API to enable multi-tier caches

    First set of changes to cache configuration API to enable multi-tier caches

    These changes introduce the per-tier cache configuration required to implement the features discussed here: https://github.com/facebook/CacheLib/discussions/102. These specific changes enable single-DRAM-tier configs only, which are compatible with the current version of cachelib. The configuration API will be expanded as multi-tier changes in other parts of the library are introduced.

    CLA Signed 
    opened by victoria-mcgrath 23
  • CacheLib won't build on Centos 8.1 with kernel 5.6.13-0_fbk6_4203_g4cb46d044bc6

    CacheLib won't build on Centos 8.1 with kernel 5.6.13-0_fbk6_4203_g4cb46d044bc6

    I have modified the NandWrites.cpp file (the wdcWriteBytes function) to support getting the bytes written for WDC drives, and am having issues building the CacheLib executable. The attached NandWrites.txt file contains the changes made to NandWrites.cpp needed to support WDC drives.

    The gmake is failing with the following errors:

        CMakeFiles/cmTC_78ecd.dir/src.c.o: In function `main':
        src.c:(.text+0x2f): undefined reference to `pthread_create'
        src.c:(.text+0x3b): undefined reference to `pthread_detach'
        src.c:(.text+0x47): undefined reference to `pthread_cancel'
        src.c:(.text+0x58): undefined reference to `pthread_join'
        collect2: error: ld returned 1 exit status

    See the attached log and out files in the build-fail.zip file for more details.

    build-fail.zip NandWrites.txt

    opened by jeffreyalien 17
  • Cache performance is low when num items is high

    Cache performance is low when num items is high

    Hi, I am using CacheLib to store all items that are requested but missing from the underlying layer, so the values of these items are empty. As a result, for 40GB of data, we end up with around 2e7 items in the cache, which corresponds to a bucket power of 26. I measured the cache hit and cache miss latency. Compared with another cache built on CacheLib that stores regular items, this cache's performance really suffers.

    | Cache type | Cache hit latency | Cache miss latency | Items in cache | Cache hit ratio |
    | --- | --- | --- | --- | --- |
    | Cache storing regular items | 2us | 1us | 2^22 | 7% |
    | Cache storing empty values | 5us | 4us | 2^25 | 44% |

    BTW, I set the cache size large enough so no evictions happen. The increased cache latency actually dilutes the value of using the cache. Can you give some guidance on this?

    opened by wenhaocs 16
  • ItemHandle from find() is nullptr from NVMCache after large concurrent writes

    ItemHandle from find() is nullptr from NVMCache after large concurrent writes

    Describe the bug I am implementing a cache using the hybrid cache feature of CacheLib. When I run a test that directly inserts a large amount of data into the NVM cache (I explicitly wait via handle.wait()) and then read the data back using find(), many of the returned handles went to NVM but now point to nullptr.

    In details, here is my list of cache config.

    config
              .setCacheSize(40 * 1024 * 1024)
              .setCacheName(name_)
              .setAccessConfig(std::pow(2, cacheEntriesPower_));
      nvmConfig.navyConfig.setSimpleFile(<some path>,100 * 1024 * 1024,false);
      nvmConfig.navyConfig.setBlockSize(4096);
      nvmConfig.navyConfig.blockCache().setRegionSize(16 * 1024 * 1024);
      config.enableNvmCache(nvmConfig);
    

    I add a small DRAM pool: cacheInternal_->addPool(poolName, 10 * 1024 * 1024);

    Here is my test:

    for (auto i = 0; i < 10240; i++) {
        std::string key = std::to_string(i);
        std::string val = longText + key;     // longText is 4KB size
        // put vertex
        auto status = cache->Insert(key, val);
        EXPECT_TRUE(status.ok());
      }
    
    for (auto i = 0; i <= 10240; i++) {
        std::string key = std::to_string(i);
        auto handle = cache->lookup(key);   // This function will wrap find() and wait();
        EXPECT_NE(handle, nullptr);
       ..........
      }
    

    Note here I am using a fixed file name as the navy device file. In this test, the small keys (starting from 0) are evicted to NVM. When I run this test for the first time, around the first 1300 handles are not nullptr, but the remaining handles from NVM are nullptr. If I run it a second time, more handles (around 3000) are not nullptr. If I keep going, after about 4 runs all returned handles are non-null and can retrieve data. In terms of the navy file size: the first run of the test fails and the file size is 16MB. The second run also fails, but the file size increases to 32MB. This process continues; after the last, successful, run the file size is 81MB.

    So my guess is that the device file grows more slowly than items are evicted to NVM. What if I dd a file in advance? If I dd a 100MB file as the navy device file beforehand, the aforementioned test passes.

    Another interesting thing: if I increase the DRAM part of the cache by changing the pool size from 10MB to 30MB, the test passes too. With this in mind, I tried to reproduce the error with your ConcurrentFills test in cachelib/allocator/nvmcache/tests/NvmCacheTests.cpp

    To Reproduce Here are the changes I made:

    diff --git a/cachelib/allocator/nvmcache/tests/NvmCacheTests.cpp b/cachelib/allocator/nvmcache/tests/NvmCacheTests.cpp
    index 28051e7f..944ec266 100644
    --- a/cachelib/allocator/nvmcache/tests/NvmCacheTests.cpp
    +++ b/cachelib/allocator/nvmcache/tests/NvmCacheTests.cpp
    @@ -357,11 +357,11 @@ TEST_F(NvmCacheTest, ConcurrentFills) {
       auto& nvm = this->cache();
       auto pid = this->poolId();
    
    -  const int nKeys = 1024;
    +  const int nKeys = 10240;
    
       for (unsigned int i = 0; i < nKeys; i++) {
         auto key = std::string("blah") + folly::to<std::string>(i);
    -    auto it = nvm.allocate(pid, key, 15 * 1024);
    +    auto it = nvm.allocate(pid, key, 4 * 1024);
         ASSERT_NE(nullptr, it);
         *((int*)it->getMemory()) = i;
         nvm.insertOrReplace(it);
    @@ -370,7 +370,7 @@ TEST_F(NvmCacheTest, ConcurrentFills) {
       auto doConcurrentFetch = [&](int id) {
         auto key = std::string("blah") + folly::to<std::string>(id);
         std::vector<std::thread> thr;
    -    for (unsigned int j = 0; j < 50; j++) {
    +    for (unsigned int j = 0; j < 1; j++) {
           thr.push_back(std::thread([&]() {
             auto hdl = nvm.find(key);
             hdl.wait();
    @@ -378,12 +378,12 @@ TEST_F(NvmCacheTest, ConcurrentFills) {
             ASSERT_EQ(id, *(int*)hdl->getMemory());
           }));
         }
    -    for (unsigned int j = 0; j < 50; j++) {
    +    for (unsigned int j = 0; j < 1; j++) {
           thr[j].join();
         }
       };
    
    -  for (unsigned int i = 0; i < 10; i++) {
    +  for (unsigned int i = 0; i < 10240; i++) {
         doConcurrentFetch(i);
       }
     }
    
    diff --git a/cachelib/allocator/tests/NvmTestUtils.h b/cachelib/allocator/tests/NvmTestUtils.h
    index 8f53d24b..2f1c8974 100644
    --- a/cachelib/allocator/tests/NvmTestUtils.h
    +++ b/cachelib/allocator/tests/NvmTestUtils.h
    @@ -31,7 +31,7 @@ inline NavyConfig getNvmTestConfig(const std::string& cacheDir) {
       config.setDeviceMetadataSize(4 * 1024 * 1024);
       config.setBlockSize(1024);
       config.setNavyReqOrderingShards(10);
    -  config.blockCache().setRegionSize(4 * 1024 * 1024);
    +  config.blockCache().setRegionSize(16 * 1024 * 1024);
       config.bigHash()
           .setSizePctAndMaxItemSize(50, 100)
           .setBucketSize(1024)
    

    And you will see the error.

    System:

    • OS: Ubuntu 20.04
    • Disk: NVMe SSD

    Additional context Can you help let me know if this is a potential bug or the configuration issue? Thanks!

    opened by wenhaocs 15
  • Prepare findEviction to be extended for multiple memory tiers

    Prepare findEviction to be extended for multiple memory tiers

    This PR refactors the eviction logic (inside findEviction) so that it will be easy to add support for multiple memory tiers. The problem with the multi-tier configuration is that the critical section under the MMContainer lock is too long. To fix that, we have implemented an algorithm which utilizes WaitContext to shrink the critical section (this will be part of future PRs).

    The idea is to use moving (now exclusive) bit to synchronize eviction with SlabRelease (and in future, with readers). In this PR, I only changed how findEviction synchronizes with SlabRelease.

    This PR is a subset of: https://github.com/facebook/CacheLib/pull/132. That PR introduced some performance regressions in the single-memory-tier version which we weren't able to fix yet, so we decided to first implement this part (which should not affect performance); later we can add a separate path for multi-tier, or try to optimize the original patch.

    CLA Signed Reverted 
    opened by igchor 14
  • Potential False Positive Data Races

    Potential False Positive Data Races

    @byrnedj and I have been running some cachebench tests and we think we found some data races that are false positives and wanted to provide the output we gathered. This was done with having thread sanitizer enabled while building with clang.

    An example of the first report (findEviction) occurs in the eviction code path, while the mmContainer iterator lock is held: another thread attempts to get the eviction iterator and the race is detected. The tests were run with the default settings using feature_stress/dram_cache.json and consistency/ram-with-deletes.json.

    Below are the outputs of the two that were detected. findEviction_Output.txt removeLocked_Output.txt

    opened by mcengija 14
  • File-mapped memory support in shared memory manager

    File-mapped memory support in shared memory manager

    Added file-based SHM segment option in SHM module and CacheAllocator. This PR allows any memory-mapped file to be used as a memory tier in CacheLib. However, these changes only allow use of a single tier currently. This is first half of the changes needed to enable multi-tiering in CacheLib. The multi-tier config APIs are in a separate PR (https://github.com/facebook/CacheLib/pull/138). These config APIs are needed to enable multi-tiering. Once that PR is merged, the second half of the multi-tier changes can be sent upstream via a separate PR.

    CLA Signed 
    opened by guptask 10
  • Extend cachebench with value validation

    Extend cachebench with value validation

    The main purpose of this patch is to better simulate workloads in cachebench. Setting validateValue to true makes it possible to see the performance impact of using different media for the memory cache.
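    Assuming the option keeps the name used in this PR, it would sit in the test_config section of a cachebench JSON config, e.g.:

```json
{
  "test_config": {
    "validateValue": true
  }
}
```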

    CLA Signed 
    opened by igchor 9
  • Set limit for bucketPower on config with #cacheEntries

    Set limit for bucketPower on config with #cacheEntries

    This PR fixes a potential bug of exceeding the limits for bucketPower and lockPower by capping bucketPower at kMaxBucketPower.

    Fix: https://github.com/facebook/CacheLib/issues/125

    CLA Signed 
    opened by wenhaocs 9
  • HybridCache Output

    HybridCache Output

    Hello,

    I am using HybridCache to extend the size of the cache to fit the workload.

    I am writing fixed-size pages of 4096 bytes to the cache. However, even when the NVM cache size is set to 50GB, the number of objects in the cache does not exceed 1,958,200, i.e. 1,958,200 * 4096 bytes, which is only around 7.5GB. I notice a similar pattern with DRAM usage, where the number of 4096-byte items is lower than it should be given the allocation. Is there an allocation setting that I am missing to optimize for fixed 4096-byte pages, and that is leading to fragmentation?

    This is the configuration that I am using. Please ignore the additional parameters that I have added.

        {
          "cache_config": {
            "cacheSizeMB": 100,
            "minAllocSize": 4096,
            "navyBigHashSizePct": 0,
            "nvmCachePath": ["/flash/cache"],
            "nvmCacheSizeMB": 50000,
            "navyReaderThreads": 32,
            "navyWriterThreads": 32,
            "navyBlockSize": 4096,
            "navySizeClasses": [4096, 8192, 12288, 16384]
          },
          "test_config": {
            "enableLookaside": "true",
            "generator": "block-replay",
            "numThreads": 1,
            "traceFilePath": "/home/pranav/csv_traces/w81-w85.csv",
            "traceBlockSize": 512,
            "diskFilePath": "/disk/disk.file",
            "pageSize": 4096,
            "minLBA": 0
          }
        }

        == Allocator Stats ==
        Items in RAM  : 17,711      (a 100MB allocation should fit more items!)
        Items in NVM  : 1,958,200   (a 50GB allocation only fits ~2 million 4KB pages)
        Alloc Attempts: 122,793,861  Success: 100.00%
        RAM Evictions : 115,587,253 (Why is each eviction not being admitted into the cache? If it is, why are items in NVM limited to 1,958,200?)
        Cache Gets    : 113,169,162

    opened by pbhandar2 9
  • Enable the CacheLibWrapper class as a RocksDB Plugin

    Enable the CacheLibWrapper class as a RocksDB Plugin

    Took the existing code and got it to compile, build, and pass the tests as a RocksDB Plugin. Wrote a README with instructions on how to use and enable it.

    In addition to verifying that the class could be created via the customizable_test, I built and executed the unit test (requires changes to the build as outlined in the README). I also verified via db_bench (command "./db_bench --secondary_cache_uri="id=RocksCachelibWrapper;size=256M;cachename=db_bench;filename=/tmp/db_bench_cache") that the secondary cache could be created.

    I am not sure how to validate that the SecondaryCache is doing the right thing, but the test works without failures. A db_bench with readrandom appears to be slower with the SecondaryCache than without:

        ./db_bench --compression_type=none --num=1000000 --benchmarks=fillseq,readrandom
        fillseq    : 32.053 micros/op 31198 ops/sec 32.053 seconds 1000000 operations; 3.5 MB/s
        DB path: [/tmp/rocksdbtest-1000/dbbench]
        readrandom : 51.583 micros/op 19386 ops/sec 51.583 seconds 1000000 operations; 2.1 MB/s (1000000 of 1000000 found)

        ./db_bench --secondary_cache_uri="id=RocksCachelibWrapper;size=256M;cachename=db_bench;filename=/tmp/db_bench_cache" --compression_type=none --num=1000000 --benchmarks=fillseq,readrandom
        fillseq    : 32.136 micros/op 31118 ops/sec 32.136 seconds 1000000 operations; 3.4 MB/s
        DB path: [/tmp/rocksdbtest-1000/dbbench]
        readrandom : 67.078 micros/op 14907 ops/sec 67.078 seconds 1000000 operations; 1.6 MB/s (1000000 of 1000000 found)

    CLA Signed 
    opened by mrambacher 0
  • Introduce 'markedForEviction' state for the Item.

    Introduce 'markedForEviction' state for the Item.

    It is similar to 'moving' but requires the ref count to be 0.

    An item that is marked for eviction causes all incRef() calls on that item to fail.

    This will be used to ensure that once an item is selected for eviction, no one can interfere and prevent the eviction from succeeding.

    'markedForEviction' relies on the same 'exclusive' bit as the 'moving' state. To distinguish between the two states, 'moving' adds 1 to the refCount. This is hidden from the user, so getRefCount() will not report that extra ref.

    CLA Signed 
    opened by igchor 1
  • Add combined locking support for MMContainer

    Add combined locking support for MMContainer

    through the withEvictionIterator function.

    Also, expose a config option to enable and disable combined locking. withEvictionIterator is implemented as an extra function; getEvictionIterator() is still there and its behavior hasn't changed.

    This is a subset of changes from: https://github.com/facebook/CacheLib/pull/172

    CLA Signed 
    opened by igchor 7
  • Max pool size question

    Max pool size question

    I am curious if there is a technical limitation that prevents CacheLib from supporting more than 64 pools. In a multi-tenant system, we usually want pools to be tied to tenants.

    https://github.com/facebook/CacheLib/blob/main/cachelib/allocator/memory/MemoryPoolManager.h#L48

    opened by gangliao 3
  • Critical section part 2: findEviction optimization and code unification

    Critical section part 2: findEviction optimization and code unification

    The previous PR (https://github.com/facebook/CacheLib/pull/166) refactored the code in findEviction so it would be easier to add support for multiple memory tiers.

    Now, we have two possible approaches to move forward with adding multi-tier support:

    1. Implement item movement between tiers as a separate code path in findEviction
    2. Unify eviction and item movement (for single-tier, multi-tier and SlabRelease)

    This PR implements the second approach. It unifies the logic for item movement and eviction, allowing us to later reuse the SlabRelease item-movement logic for the multi-tier implementation. This PR also enables the use of combined locking for the MMContainer, which lets us achieve much better performance in the multi-tier scenario.

    We have also observed a 2X throughput improvement for leader benchmarks (on the upstream, single-tier version) with this PR, as well as an improvement in allocation latency. However, I have to admit that this patch is pretty complex. Please let me know if you're open to merging this or if you'd prefer not to modify so much code.

    It would be great if you could validate these changes internally. I encountered one inconsistency when running the consistency/navy.json benchmark, but it's pretty rare (one per 1200M ops) and I'm not able to debug it. We'd like to validate these changes by running the navy tests, but unfortunately they fail even on the main branch: https://github.com/facebook/CacheLib/issues/169

    CLA Signed 
    opened by igchor 1