Mesh - A memory allocator that automatically reduces the memory footprint of C/C++ applications.

Overview

Mesh: Compacting Memory Management for C/C++

Mesh is a drop-in replacement for malloc(3) that can transparently recover from memory fragmentation without any changes to application code.

Mesh is described in detail in a paper (PDF) that appeared at PLDI 2019.

Or watch this talk by Bobby Powers at Strange Loop:

Compacting the Uncompactable

Mesh runs on Linux and macOS. Windows is a work in progress.

Mesh uses Bazel as its build system, wrapped in a Makefile, and has no runtime dependencies other than libc:

$ git clone https://github.com/plasma-umass/mesh
$ cd mesh
$ make; sudo make install
# example: run git with mesh as its allocator:
$ LD_PRELOAD=libmesh.so git status

Please open an issue if you have questions (or issues)!

But will it blend?

If you run a program linked against Mesh (or with Mesh LD_PRELOADed), setting the environment variable MALLOCSTATS=1 instructs Mesh to print a summary at exit:

$ MALLOCSTATS=1 ./bin/redis-server-mesh ./redis.conf
25216:C 11 Mar 20:27:12.050 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
25216:C 11 Mar 20:27:12.050 # Redis version=4.0.2, bits=64, commit=dfe0d212, modified=0, pid=25216, just started
25216:C 11 Mar 20:27:12.050 # Configuration loaded
[...]
^C25216:signal-handler (1583983641) Received SIGINT scheduling shutdown...
25216:M 11 Mar 20:27:21.945 # User requested shutdown...
25216:M 11 Mar 20:27:21.945 * Removing the pid file.
25216:M 11 Mar 20:27:21.945 * Removing the unix socket file.
25216:M 11 Mar 20:27:21.945 # Redis is now ready to exit, bye bye...
MESH COUNT:         25918
Meshed MB (total):  101.2
Meshed pages HWM:   25918
Meshed MB HWM:      101.2
MH Alloc Count:     56775
MH Free  Count:     17
MH High Water Mark: 82687

Not all workloads experience fragmentation, so it's possible that Mesh will report a small 'Meshed MB (total)' number!

Implementation Overview

Mesh is built on Heap Layers, an infrastructure for building high performance memory allocators in C++ (see the paper for details.)

The entry point of the library is libmesh.cc. This file is where malloc, free, and the instantiations of the Heap used for allocating program memory live.

Definitions

  • Page: The smallest block of memory managed by the operating system, 4 KiB on most architectures. Memory given to the allocator by the operating system is always a multiple of the page size, and aligned to the page size.
  • Span: A contiguous run of 1 or more pages. It is often larger than the page size to account for large allocations and amortize the cost of heap metadata.
  • Arena: A contiguous range of virtual address space we allocate out of. All allocations returned by malloc(3) reside within the arena.
  • GlobalHeap: The global heap carves out the Arena into Spans and performs meshing.
  • MiniHeap: Metadata for a Span -- at any time a live Span has a single MiniHeap owner. For small objects, MiniHeaps have a bitmap to track whether an allocation is live or freed.
  • ThreadLocalHeap: A collection of MiniHeaps and a ShuffleVector so that most allocations and free(3)s can be fast and lock-free.
  • ShuffleVector: A novel data structure that enables randomized allocation with bump-pointer-like speed.
Issues
  • Mesh crash on test-stress.c

    Mesh fails this test on Linux. The test code comes from the mimalloc test suite: https://github.com/microsoft/mimalloc/blob/master/test/test-stress.c

    I made a small change to remove the mimalloc dependency.

    ~/work/mimalloc/test$ LD_PRELOAD=~/work/Mesh/build/lib/libmesh.so ./test-stress
    Using 32 threads with a 10% load-per-thread and 50 iterations
    /home/kyo/work/Mesh/src/mini_heap.h:195:void mesh::MiniHeap::freeOff(size_t): ASSERTION '_bitmap.isSet(off)' FAILED: MiniHeap(0x7f0a09020e80) expected bit 5 to be set (svOff:0)

    or

    ~/work/mimalloc/test$ LD_PRELOAD=~/work/Mesh/build/lib/libmesh.so ./test-stress
    Using 32 threads with a 10% load-per-thread and 50 iterations
    libmesh: caught null pointer dereference (signal: 11)

    I also found that Mesh passes the test-stress test on macOS.

    test-stress.txt

    opened by kyoguan 13
  • Build issues on linux, with various gcc and clang versions

    Hello, I'm struggling with build issues using various compilers and haven't found a working configuration. e.g.

    $ CXX=clang++-3.9 CC=clang-3.9 make
      CXX   build/src/unit/bitmap_test.o
    clang: warning: argument unused during compilation: '-Wa,--noexecstack'
      CXX   build/src/unit/mesh_test.o
    clang: warning: argument unused during compilation: '-Wa,--noexecstack'
    In file included from src/unit/mesh_test.cc:10:
    In file included from src/meshing.h:17:
    src/mini_heap.h:116:3: error: no template named 'atomic_uint32_t'; did you mean 'atomic_init'?
      atomic_uint32_t _flags;
      ^~~~~~~~~~~~~~~
      atomic_init
    /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/atomic:944:5: note: 'atomic_init' declared here
        atomic_init(atomic<_ITp>* __a, _ITp __i) noexcept
        ^
    In file included from src/unit/mesh_test.cc:10:
    In file included from src/meshing.h:17:
    src/mini_heap.h:116:3: error: unknown type name 'atomic_uint32_t'
      atomic_uint32_t _flags;
      ^
    src/mini_heap.h:474:1: error: static_assert failed "MiniHeap too big!"
    static_assert(sizeof(MiniHeap) == 64, "MiniHeap too big!");
    ^             ~~~~~~~~~~~~~~~~~~~~~~
    3 errors generated.
    GNUmakefile:111: recipe for target 'build/src/unit/mesh_test.o' failed
    make: *** [build/src/unit/mesh_test.o] Error 1
    

    gcc/g++ 6.3 gave a different error. Unfortunately my distro doesn't seem to offer other g++ versions so I was limited in what I could try easily.

    opened by jberryman 9
  • Illegal instruction

    I get an Illegal instruction crash on Debian testing:

    mesh$ ./configure --no-optimize --no-debug
    mesh$ make
      CXX   build/src/unit/bitmap_test.o
      CXX   build/src/unit/mesh_test.o
      CXX   build/src/unit/alignment.o
      CXX   build/src/unit/size_class_test.o
      CXX   build/src/unit/binned_tracker_test.o
      CXX   build/src/unit/triple_mesh_test.o
      CXX   build/src/unit/rng_test.o
      CXX   build/src/unit/concurrent_mesh_test.o
      CXX   build/src/vendor/googletest/googletest/src/gtest-all.o
      CXX   build/src/vendor/googletest/googletest/src/gtest_main.o
      CXX   build/src/thread_local_heap.o
      CXX   build/src/global_heap.o
      CXX   build/src/runtime.o
      CXX   build/src/real.o
      CXX   build/src/meshable_arena.o
      CXX   build/src/d_assert.o
      CXX   build/src/measure_rss.o
      LD    unit.test
    Running main() from src/vendor/googletest/googletest/src/gtest_main.cc
    [==========] Running 22 tests from 8 test cases.
    [----------] Global test environment set-up.
    [----------] 11 tests from BitmapTest
    [ RUN      ] BitmapTest.RepresentationSize
    [       OK ] BitmapTest.RepresentationSize (0 ms)
    [ RUN      ] BitmapTest.LowestSetBitAt
    [       OK ] BitmapTest.LowestSetBitAt (0 ms)
    [ RUN      ] BitmapTest.HighestSetBitAt
    [       OK ] BitmapTest.HighestSetBitAt (0 ms)
    [ RUN      ] BitmapTest.SetAndExchangeAll
    [       OK ] BitmapTest.SetAndExchangeAll (0 ms)
    [ RUN      ] BitmapTest.SetAll
    [       OK ] BitmapTest.SetAll (0 ms)
    [ RUN      ] BitmapTest.SetGet
    [       OK ] BitmapTest.SetGet (124 ms)
    [ RUN      ] BitmapTest.SetGetRelaxed
    [       OK ] BitmapTest.SetGetRelaxed (2202 ms)
    [ RUN      ] BitmapTest.Builtins
    [       OK ] BitmapTest.Builtins (0 ms)
    [ RUN      ] BitmapTest.Iter
    make: *** [GNUmakefile:135: test] Illegal instruction (core dumped)
    
    opened by Zash 7
  • Segmentation fault on google-chrome

    These are some really amazing results you got on Firefox! I was trying to reproduce some of this in a fresh build of chrome (or any that may be installed) and got a segmentation fault.

    Before I dig into this, I figure it would be smart to ask: would you expect any of this to work on chrome? Or is there some architectural aspect of chrome that would prevent it from working?

    $ git clone --recurse-submodules https://github.com/plasma-umass/mesh
    $ cd mesh
    $ ./configure; make; sudo make install
    # example: run git with mesh as its allocator:
    $ LD_PRELOAD=libmesh.so google-chrome
    

    Leads to

    Segmentation fault
    

    Ideas?

    opened by samuelgoto 6
  • How does the page table translation work?

    Hi, I'm very interested in Section 4.5.1 of the paper. The translation from virtual memory addresses to physical memory addresses is done by the OS with the help of the TLB. How can Mesh replace it? I think the key is this sentence: "We exploit the fact that mmap lets the same offset in a file (corresponding to a physical span) be mapped to multiple addresses." But I just can't fill the gap. Any help would be appreciated. :)

    opened by spin6lock 6
  • LD_PRELOAD Failure on Debian Stretch (9.4)

    I compiled the code using clang on Debian 9.4 as per the instructions in the README, but I am getting an error with LD_PRELOAD.

    Error

    $ LD_PRELOAD=libmesh.so git status
    ERROR: ld.so: object 'libmesh.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.

    clang++-3.9 --version

    clang version 3.9.1-9 (tags/RELEASE_391/rc2)
    Target: x86_64-pc-linux-gnu
    Thread model: posix
    InstalledDir: /usr/bin

    uname -a

    Linux dal2mdspkgm02.corporate.local 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux

    opened by RamanGupta16 6
  • Segfault with simple stress test of global allocator

    The following simple stress test fails rapidly with a segfault:

    #include <vector>
    #include <cstdlib>
    
    static constexpr size_t kMinAllocSz = 800000;
    static constexpr size_t kMaxAllocSz = 900000;
    static constexpr unsigned kMaxLiveAlloc = 128;  // keep no more than 128 * kMaxAllocSz memory allocated.
    
    int main(void) {
        std::vector<void *> alloc(kMaxLiveAlloc, nullptr);
    
        while (1) {
            const size_t ix = rand() % kMaxLiveAlloc;
            const size_t sz = (rand() % (kMaxAllocSz - kMinAllocSz)) + kMinAllocSz;
    
            free(alloc[ix]);
            alloc[ix] = malloc(sz);
        }
        return 0;
    }
    

    Reproduce with:

     g++ --std=c++14 -g -Wall -Werror msimple.cpp -o msimple -ldl
    LD_PRELOAD=/home/kvigor/mesh/libmesh.so ./msimple
    

    I have attempted to debug this a bit. What I have found is that in GlobalHeap::pageAlignedAlloc() I will see that the pointer returned from mh->mallocAt() == arenaBegin(), and if I check miniheapForLocked(ptr) it does not point to the original miniheap. When I later free this allocation it fails due to the same bogus miniheap pointer being found in ThreadLocalHeap::free(). Unfortunately I have not yet been able to diagnose further.

    This error does not occur if I allocate fixed size objects; having some randomness in the allocation size seems to be required to trigger the issue.

    (minor sidebar: what is the "locked" implied in miniheapForLocked()? free() calls it with no obvious locks held? A comment and/or renaming of the function might help.)

    opened by kevin-vigor 5
  • Theoretical micro-op: modify shuffle vector to swap on *allocation*

    So the shuffle vector from the paper has triggered a bit of a side-discussion on Twitter. At one point, Amit Patel mentioned this:

    I've done something like Fischer-Yates the other way — push at the end of the vector, pop by swapping a random element to the end.

    I think this is a potential optimization of the shuffle vector as used in Mesh. In your paper as published on arXiv, you write:

    Each comprises a fixed-size array consisting of all the offsets from a span that are not already allocated, and an allocation index representing the head. Each vector is initially randomized with the Knuth-Fischer-Yates shuffle [18], and its allocation index is set to 0. Allocation proceeds by selecting the next available number in the vector, “bumping” the allocation index and returning the corresponding address. Deallocation works by placing the freed object at the front of the vector and performing one iteration of the shuffle algorithm; this operation preserves randomization of the vector.

    By moving the operation from swap-on-deallocation to swap-just-before-allocation, you keep the randomly-picked element guarantee, but remove the need to initialize the vector with Knuth-Fisher-Yates: one can simply initiate a span of n offsets with values 0 to n-1.

    (I doubt this has any real performance impact, but it should simplify the code a bit at least)

    opened by JobLeonard 5
  • Support for non-root installation

    I would really like to test your tool, but I need to be able to install it in my user space. I see that CMake support was just added, but it does not include installation support.

    I was able to build using cmake but when I tried copying libmesh.so to the directory where I was running my executable:

    LD_PRELOAD=libmesh.so ./my_exec
    

    I got the same error as in issue 32.

    I like the ease of use that you get from setting LD_PRELOAD, but I am happy to pass -lmesh etc to the compiler and rebuild if that is sufficient. Will I get the same symbol "shadowing" that I assume we are after when setting LD_PRELOAD? My first attempt at the latter didn't appear to change my memory footprint...

    opened by david8dixon 4
  • Segfault when starting Firefox

    Using 667bb69 compiled on Manjaro with gcc 8.2.1 20181127. Any other details I can provide that would be helpful?

     ~/src > git clone --recurse-submodules https://github.com/plasma-umass/mesh                                                                   Tue 19 Feb 2019 10:01:51 AM PST
    Cloning into 'mesh'...
    remote: Enumerating objects: 93, done.
    remote: Counting objects: 100% (93/93), done.
    remote: Compressing objects: 100% (76/76), done.
    remote: Total 5119 (delta 33), reused 39 (delta 14), pack-reused 5026
    Receiving objects: 100% (5119/5119), 5.12 MiB | 9.78 MiB/s, done.
    Resolving deltas: 100% (3483/3483), done.
    Submodule 'Heap-Layers' (https://github.com/emeryberger/Heap-Layers) registered for path 'src/vendor/Heap-Layers'
    Submodule 'src/vendor/googletest' (https://github.com/google/googletest.git) registered for path 'src/vendor/googletest'
    Cloning into '/home/jesse/src/mesh/src/vendor/Heap-Layers'...
    remote: Enumerating objects: 14, done.
    remote: Counting objects: 100% (14/14), done.
    remote: Compressing objects: 100% (13/13), done.
    remote: Total 1706 (delta 4), reused 7 (delta 1), pack-reused 1692
    Receiving objects: 100% (1706/1706), 405.11 KiB | 5.33 MiB/s, done.
    Resolving deltas: 100% (1117/1117), done.
    Cloning into '/home/jesse/src/mesh/src/vendor/googletest'...
    remote: Enumerating objects: 2, done.
    remote: Counting objects: 100% (2/2), done.
    remote: Compressing objects: 100% (2/2), done.
    remote: Total 16274 (delta 0), reused 0 (delta 0), pack-reused 16272
    Receiving objects: 100% (16274/16274), 5.65 MiB | 6.85 MiB/s, done.
    Resolving deltas: 100% (11985/11985), done.
    Submodule path 'src/vendor/Heap-Layers': checked out 'af8961599772d1f33ac33178f86fd92cd67e8cf0'
    Submodule path 'src/vendor/googletest': checked out '529c2c6f4af29dadb8ee5cddf6a7919caa5ca5f6'
    
     ~/src > cd mesh/                                                                                                                     3259ms  Tue 19 Feb 2019 10:01:55 AM PST
     ~/s/mesh > ./configure                                                                                                                    Tue 19 Feb 2019 10:01:57 AM PST
     ~/s/mesh > make                                                                                                                           Tue 19 Feb 2019 10:01:58 AM PST
      CXX   build/src/unit/bitmap_test.o
      CXX   build/src/unit/mesh_test.o
      CXX   build/src/unit/alignment.o
      CXX   build/src/unit/binned_tracker_test.o
      CXX   build/src/unit/triple_mesh_test.o
      CXX   build/src/unit/rng_test.o
      CXX   build/src/unit/concurrent_mesh_test.o
      CXX   build/src/unit/size_class_test.o
      CXX   build/src/vendor/googletest/googletest/src/gtest-all.o
      CXX   build/src/vendor/googletest/googletest/src/gtest_main.o
      CXX   build/src/thread_local_heap.o
      CXX   build/src/global_heap.o
      CXX   build/src/runtime.o
      CXX   build/src/real.o
      CXX   build/src/meshable_arena.o
      CXX   build/src/d_assert.o
      CXX   build/src/measure_rss.o
      LD    unit.test
    
    Running main() from src/vendor/googletest/googletest/src/gtest_main.cc
    [==========] Running 22 tests from 8 test cases.
    [----------] Global test environment set-up.
    [----------] 3 tests from SizeClass
    [ RUN      ] SizeClass.MinObjectSize
    [       OK ] SizeClass.MinObjectSize (0 ms)
    [ RUN      ] SizeClass.SmallClasses
    [       OK ] SizeClass.SmallClasses (0 ms)
    [ RUN      ] SizeClass.PowerOfTwo
    [       OK ] SizeClass.PowerOfTwo (0 ms)
    [----------] 3 tests from SizeClass (0 ms total)
    
    [----------] 2 tests from ConcurrentMeshTest
    [ RUN      ] ConcurrentMeshTest.TryMesh
    [       OK ] ConcurrentMeshTest.TryMesh (1 ms)
    [ RUN      ] ConcurrentMeshTest.TryMeshInverse
    [       OK ] ConcurrentMeshTest.TryMeshInverse (1 ms)
    [----------] 2 tests from ConcurrentMeshTest (3 ms total)
    
    [----------] 1 test from RNG
    [ RUN      ] RNG.MWCRange
    [       OK ] RNG.MWCRange (0 ms)
    [----------] 1 test from RNG (0 ms total)
    
    [----------] 1 test from TripleMeshTest
    [ RUN      ] TripleMeshTest.MeshAll
    [       OK ] TripleMeshTest.MeshAll (29 ms)
    [----------] 1 test from TripleMeshTest (29 ms total)
    
    [----------] 1 test from BinnedTracker
    [ RUN      ] BinnedTracker.Tests
    [       OK ] BinnedTracker.Tests (0 ms)
    [----------] 1 test from BinnedTracker (0 ms total)
    [----------] 1 test from Alignment
    [ RUN      ] Alignment.NaturalAlignment
    [       OK ] Alignment.NaturalAlignment (442 ms)
    [----------] 1 test from Alignment (442 ms total)
    
    [----------] 2 tests from MeshTest
    [ RUN      ] MeshTest.TryMesh
    [       OK ] MeshTest.TryMesh (1 ms)
    [ RUN      ] MeshTest.TryMeshInverse
    [       OK ] MeshTest.TryMeshInverse (1 ms)
    [----------] 2 tests from MeshTest (2 ms total)
    
    [----------] 11 tests from BitmapTest
    [ RUN      ] BitmapTest.RepresentationSize
    [       OK ] BitmapTest.RepresentationSize (0 ms)
    [ RUN      ] BitmapTest.LowestSetBitAt
    [       OK ] BitmapTest.LowestSetBitAt (0 ms)
    [ RUN      ] BitmapTest.HighestSetBitAt
    [       OK ] BitmapTest.HighestSetBitAt (0 ms)
    [ RUN      ] BitmapTest.SetAndExchangeAll
    [       OK ] BitmapTest.SetAndExchangeAll (0 ms)
    [ RUN      ] BitmapTest.SetAll
    [       OK ] BitmapTest.SetAll (0 ms)
    [ RUN      ] BitmapTest.SetGet
    [       OK ] BitmapTest.SetGet (16 ms)
    [ RUN      ] BitmapTest.SetGetRelaxed
    [       OK ] BitmapTest.SetGetRelaxed (222 ms)
    [ RUN      ] BitmapTest.Builtins
    [       OK ] BitmapTest.Builtins (0 ms)
    [ RUN      ] BitmapTest.Iter
    [       OK ] BitmapTest.Iter (0 ms)
    [ RUN      ] BitmapTest.Iter2
    [       OK ] BitmapTest.Iter2 (0 ms)
    [ RUN      ] BitmapTest.SetHalf
    [       OK ] BitmapTest.SetHalf (1 ms)
    [----------] 11 tests from BitmapTest (239 ms total)
    
    [----------] Global test environment tear-down
    [==========] 22 tests from 8 test cases ran. (715 ms total)
    [  PASSED  ] 22 tests.
      CXX   build/src/libmesh.o
      LD    libmesh.so
      CXX   build/src/fragmenter.o
      LD    fragmenter
     ~/s/mesh > sudo make install                                                                                                      53.5s  Tue 19 Feb 2019 10:02:54 AM PST
    [sudo] password for jesse:
     ~/s/mesh > env LD_PRELOAD=libmesh.so git status                                                                                  2684ms > Tue 19 Feb 2019 10:05:28 AM PST
    On branch master
    Your branch is up to date with 'origin/master'.
    
    nothing to commit, working tree clean
    
    
     ~ > env LD_PRELOAD=libmesh.so firefox
    segfault (1/0x28): in arena? 0
    ExceptionHandler::GenerateDump cloned child 24873
    ExceptionHandler::SendContinueSignalToChild sent continue signal to child
    ExceptionHandler::WaitForContinueSignal waiting for continue signal...
    2019-02-19 10:07:36: minidump.cc:1926: ERROR: MinidumpModule has a module problem, 0x
    2019-02-19 10:07:36: minidump.cc:2740: ERROR: MinidumpModuleList could not read modul
    2019-02-19 10:07:36: minidump.cc:5895: ERROR: GetStream could not read stream type 4
    fish: “env LD_PRELOAD=libmesh.so firef…” terminated by signal SIGABRT (Abort)
    

    I also tried running Chromium and got a segfault as well:

    ∴ LD_PRELOAD=libmesh.so chromium
    ../../third_party/tcmalloc/gperftools-2.0/chromium/src/tcmalloc.cc:289] Attempt to free invalid pointer 0x7f07e8ef6dc0
    segfault (1/0x39): in arena? 0
    ^C^CAborted (core dumped)
    
    opened by jc00ke 4
  • Startup deadlock in mesh::freeSlowpath(void*)

    Hi, first of all I would like to thank you for a great piece of work and very promising idea.

    I am trying to build and experiment with the mesh allocator, but unfortunately I am experiencing a deadlock on startup.

    When I run, for example, this simple command line: LD_PRELOAD=~/experiments/mesh/libmesh.so uname ... it ends in a deadlock with the following backtrace:

    #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:39
    #1  0x00007ff641ea43ce in __cxxabiv1::__cxa_guard_acquire (g=0x7ff641f35318 <guard variable for mesh::runtime()::runtimePtr>)
        at /opt/conda/conda-bld/compilers_linux-64_1542882313995/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/guard.cc:306
    #2  0x00007ff641ea33b0 in mesh::runtime() () at src/runtime.h:109
    #3  0x00007ff641e9b1e1 in mesh::freeSlowpath(void*) [clone .lto_priv.66] (ptr=0x0) at src/libmesh.cc:69
    #4  0x00007ff6419d203b in _IO_vfprintf_internal (s=<optimized out>, format=<optimized out>, ap=<optimized out>) at vfprintf.c:2065
    #5  0x00007ff641a8ddf0 in ___vsnprintf_chk (s=0x7fff893e3d40 "/dev/shm/alloc-mesh-2746021.0d", maxlen=<optimized out>, flags=1,
        slen=<optimized out>, format=0x7ff641efd690 "%s/alloc-mesh-%d.%zud", args=0x7fff893e3bf0) at vsnprintf_chk.c:65
    #6  0x00007ff641a8dd2a in ___snprintf_chk (s=<optimized out>, s@entry=0x7fff893e3d40 "/dev/shm/alloc-mesh-2746021.0d",
        maxlen=<optimized out>, maxlen@entry=127, flags=<optimized out>, flags@entry=1, slen=<optimized out>, slen@entry=128,
        format=<optimized out>, format@entry=0x7ff641efd690 "%s/alloc-mesh-%d.%zud") at snprintf_chk.c:36
    #7  0x00007ff641ea3f24 in snprintf ()
        at ~/.conda/envs/local/x86_64-conda_cos6-linux-gnu/sysroot/usr/include/bits/stdio2.h:66
    #8  mesh::MeshableArena::openSpanDir (this=<optimized out>, pid=2746021) at src/meshable_arena.cc:81
    #9  mesh::MeshableArena::openShmSpanFile(unsigned long) [clone .constprop.37] (
        this@entry=0x7ff641f38340 <mesh::runtime()::buf>, sz=68719476736) at src/meshable_arena.cc:481
    #10 0x00007ff641ea1ef6 in mesh::MeshableArena::openSpanFile () at src/meshable_arena.cc:539
    #11 mesh::MeshableArena::__base_ctor (this=0x7ff641f38340 <mesh::runtime()::buf>) at src/meshable_arena.cc:49
    #12 mesh::GlobalHeap::__base_ctor (this=0x7ff641f38340 <mesh::runtime()::buf>) at src/global_heap.h:88
    #13 mesh::Runtime::Runtime() [clone .constprop.33] (this=0x7ff641f38340 <mesh::runtime()::buf>) at src/runtime.cc:86
    #14 0x00007ff641ea33b9 in mesh::runtime() () at src/runtime.h:109
    #15 0x00007ff641ea3b94 in CreateThreadLocalHeap () at src/thread_local_heap.cc:18
    #16 mesh::ThreadLocalHeap::GetHeap() () at src/thread_local_heap.cc:31
    #17 0x00007ff641e9cca6 in mesh::callocSlowpath(unsigned long, unsigned long) [clone .lto_priv.74] (count=1, size=32) at src/libmesh.cc:80
    #18 0x00007ff641366310 in _dlerror_run (operate=0x7ff6413660b0 <dlsym_doit>, args=0x7fff893e3f90) at dlerror.c:142
    #19 0x00007ff64136607a in __dlsym (handle=<optimized out>, name=<optimized out>) at dlsym.c:71
    #20 0x00007ff641e99b3a in mesh::real::init() () at src/real.cc:41
    #21 0x00007ff641e9577b in libmesh_init() [clone .lto_priv.50] () at src/libmesh.cc:14
    #22 0x00007ff641e95ace in global constructors keyed to 65535_0_thread_local_heap.o.12673 ()
       from ~/experiments/mesh/libmesh.so
    #23 0x00007ff641d309cf in call_init (env=0x7fff893e4128, argv=0x7fff893e4118, argc=1, l=<optimized out>) at dl-init.c:85
    #24 _dl_init (main_map=0x7ff641f44190, argc=1, argv=0x7fff893e4118, env=0x7fff893e4128) at dl-init.c:134
    #25 0x00007ff641d22b6a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
    #26 0x0000000000000001 in ?? ()
    #27 0x00007fff893e5870 in ?? ()
    #28 0x0000000000000000 in ?? ()
    

    The problem seems to be that during the mesh::runtime() call the Runtime::Runtime() constructor (more specifically its MeshableArena::MeshableArena() base constructor) calls MeshableArena::openSpanDir(int pid), which formats a tmpDir string via snprintf. That function makes an internal allocation and deallocation via malloc/free calls:

    args_value = args_malloced = malloc (nargs * bytes_per_arg);
    ...
    all_done:
      if (specs_malloced)
        free (specs);
      free (args_malloced);
    

    That in turn calls mesh::runtime() again, which deadlocks while acquiring the guard (__cxxabiv1::__cxa_guard_acquire) that synchronizes the static initialization of this code:

    static Runtime *runtimePtr = new (buf) Runtime{}; 
    

    It is entirely possible that I am doing something wrong, but this looks to me like a circular initialization problem. Could someone please look at this issue and point me in the right direction? Thanks in advance. Petr

    opened by filodej 3
  • Crash after failed assert

    Hi, I get

    src/meshable_arena.cc:509:void mesh::MeshableArena::beginMesh(void*, void*, size_t): ASSERTION 'r == 0' FAILED:

    followed by abort(). Environment: gcc 7.3.0, CentOS Linux 7, multi-threaded application with heavy memory usage. Any ideas?

    Thanks, Andrew

    opened by shk656461 0
  • Adapt Mesh to use huge pages

    Issue

    As it stands, Mesh can't make effective use of huge pages -- its design depends on smallish pages (e.g., 4K).

    Desired Behavior

    Mesh would be able to take advantage of huge pages and the consequent reduction of TLB footprint (and thus likely increased performance for very large heaps).

    Proposed Design

    I think it'd be possible to make a huge-page version of Mesh that uses huge pages by treating some smaller grain chunk (e.g., 1/256th of the huge page size) as a single "bit" -- 1 if there is any memory allocated in that chunk, 0 otherwise. The smaller grain chunks could be managed by a more conventional allocator (possibly from a single size class). Then, meshing would operate on the bitmaps for the huge pages, essentially as it already does today for standard page sizes.

    opened by emeryberger 1
  • Getting `null pointer dereference (signal: 11)` with OpenJDK 16

    Reproducer:

    git clone https://github.com/petrbouda/jvm-memory-allocators-comparison.git
    docker build -t jvm-allocators:mesh -f Dockerfile-mesh .
    docker run -e MALLOCSTATS=1 jvm-allocators:mesh java -version
    
    libmesh: caught null pointer dereference (signal: 11)
    
    opened by petrbouda 0
  • Tests fail on Intel(R) Pentium(R) CPU G4600

    I presume the issue is processor-specific, as I just ran the tests on my laptop equipped with an AMD Ryzen 3 3200U and, quite surprisingly (or maybe not), they passed. Here are the logs I ended up getting:

    INFO: Analyzed target //src:unit-tests (0 packages loaded, 0 targets configured).
    INFO: Found 1 test target...
    FAIL: //src:unit-tests (see /home/glasser/.cache/bazel/_bazel_glasser/90ad5b34d6bf5ba4db55b6ca0ce1ca6a/execroot/org_libmesh/bazel-out/k8-fastbuild/testlogs/src/unit-tests/test.log)
    INFO: From Testing //src:unit-tests:
    ==================== Test output for //src:unit-tests:
    Running main() from gmock_main.cc
    [==========] Running 23 tests from 7 test suites.
    [----------] Global test environment set-up.
    [----------] 2 tests from Alignment
    [ RUN      ] Alignment.NaturalAlignment
    ================================================================================
    Target //src:unit-tests up-to-date:
      bazel-bin/src/unit-tests
    INFO: Elapsed time: 0.313s, Critical Path: 0.16s
    INFO: 2 processes: 2 linux-sandbox.
    INFO: Build completed, 1 test FAILED, 2 total actions
    //src:unit-tests                                                         FAILED in 0.1s
      /home/glasser/.cache/bazel/_bazel_glasser/90ad5b34d6bf5ba4db55b6ca0ce1ca6a/execroot/org_libmesh/bazel-out/k8-fastbuild/testlogs/src/unit-tests/test.log
    
    INFO: Build completed, 1 test FAILED, 2 total actions
    make: *** [Makefile:44: test] Error 3
    

    The logs advised me to look into /home/glasser/.cache/bazel/_bazel_glasser/90ad5b34d6bf5ba4db55b6ca0ce1ca6a/execroot/org_libmesh/bazel-out/k8-fastbuild/testlogs/src/unit-tests/test.log.

    Here is the output of that file:

    exec ${PAGER:-/usr/bin/less} "$0" || exit 1
    Executing tests from //src:unit-tests
    -----------------------------------------------------------------------------
    Running main() from gmock_main.cc
    [==========] Running 23 tests from 7 test suites.
    [----------] Global test environment set-up.
    [----------] 2 tests from Alignment
    [ RUN      ] Alignment.NaturalAlignment
    

    Thanks in advance!

    opened by ThreadedStream 2
  • bug in the gcc lib

    We found a bug in the gcc library: it calls the futex system call without the FUTEX_PRIVATE_FLAG flag. This blocks the thread after meshing, because the physical address has changed. The clang library does not have this bug.

    For example, std::future triggers this bug.

    opened by kyoguan 2
Owner
PLASMA @ UMass
STL compatible C++ memory allocator library using a new RawAllocator concept that is similar to an Allocator but easier to use and write.

Jonathan Müller 1k Dec 2, 2021
The Hoard Memory Allocator: A Fast, Scalable, and Memory-efficient Malloc for Linux, Windows, and Mac.

The Hoard Memory Allocator Copyright (C) 1998-2020 by Emery Berger The Hoard memory allocator is a fast, scalable, and memory-efficient memory allocat

Emery Berger 882 May 13, 2022
Public domain cross platform lock free thread caching 16-byte aligned memory allocator implemented in C

rpmalloc - General Purpose Memory Allocator This library provides a public domain cross platform lock free thread caching 16-byte aligned memory alloc

Mattias Jansson 1.5k May 10, 2022
A tiny portable C89 memory allocator

mem A tiny portable C89 memory allocator. Usage This is a single-header library. You must include this file alongside #define MEM_IMPLEMENTATION in on

null 8 Mar 31, 2022
Malloc Lab: simple memory allocator using sorted segregated free list

LAB 6: Malloc Lab Main Files mm.{c,h} - Your solution malloc package. mdriver.c - The malloc driver that tests your mm.c file short{1,2}-bal.rep - T

null 1 Feb 28, 2022
Allocator bench - bench of various memory allocators

To run benchmarks Install lockless from https://locklessinc.com/downloads/ in lockless_allocator path make Install Hoard from https://github.com/emery

Sam 44 Dec 3, 2021
Initialize the 8-bit computer memory with a program to be executed automatically on powering.

Initialize the 8-bit computer memory with a program to be executed automatically on powering. This project is small extension of Ben Eater's computer

Dmytro Striletskyi 62 Dec 13, 2021
mimalloc is a compact general purpose allocator with excellent performance.

mimalloc mimalloc (pronounced "me-malloc") is a general purpose allocator with excellent performance characteristics. Initially developed by Daan Leij

Microsoft 6.7k May 13, 2022
Hardened malloc - Hardened allocator designed for modern systems

Hardened malloc - Hardened allocator designed for modern systems. It has integration into Android's Bionic libc and can be used externally with musl and glibc as a dynamic library for use on other Linux-based platforms. It will gain more portability / integration over time.

GrapheneOS 715 May 12, 2022
Snmalloc - Message passing based allocator

snmalloc snmalloc is a high-performance allocator. snmalloc can be used directly in a project as a header-only C++ library, it can be LD_PRELOADed on

Microsoft 914 May 13, 2022
Custom memory allocators in C++ to improve the performance of dynamic memory allocation

Table of Contents Introduction Build instructions What's wrong with Malloc? Custom allocators Linear Allocator Stack Allocator Pool Allocator Free lis

Mariano Trebino 1.2k May 9, 2022
MMCTX (Memory Management ConTeXualizer), is a tiny (< 300 lines), single header C99 library that allows for easier memory management by implementing contexts that remember allocations for you and provide freeall()-like functionality.

MMCTX (Memory Management ConTeXualizer), is a tiny (< 300 lines), single header C99 library that allows for easier memory management by implementing contexts that remember allocations for you and provide freeall()-like functionality.

A.P. Jo. 4 Oct 2, 2021
Memory-dumper - A tool for dumping files from processes memory

What is memory-dumper memory-dumper is a tool for dumping files from process's memory. The main purpose is to find patterns inside the process's memor

Alexander Nestorov 29 Feb 5, 2022
A C++ Class and Template Library for Performance Critical Applications

Spirick Tuning A C++ Class and Template Library for Performance Critical Applications Optimized for Performance The Spirick Tuning library provides a

Dietmar Deimling 3 Dec 6, 2021
OpenXenium JTAG and Flash Memory programmer

OpenXenium JTAG and Flash Memory programmer * Read: "Home Brew" on ORIGINAL XBOX - a detailed article on why and how * The tools in this repo will all

Koos du Preez 25 Feb 14, 2022
manually map driver for a signed driver memory space

smap manually map driver for a signed driver memory space credits https://github.com/btbd/umap tested system Windows 10 Education 20H2 UEFI installati

ekknod 71 Apr 9, 2022
Memory instrumentation tool for android app&game developers.

Overview LoliProfiler is a C/C++ memory profiling tool for Android games and applications. LoliProfiler supports profiling debuggable applications out

Tencent 416 May 13, 2022
A single file drop-in memory leak tracking solution for C++ on Windows

MemLeakTracker A single file drop-in memory leak tracking solution for C++ on Windows This small piece of code allows for global memory leak tracking

null 22 Apr 23, 2022
Dump the memory of a PPL with a userland exploit

PPLdump This tool implements a userland exploit that was initially discussed by James Forshaw (a.k.a. @tiraniddo) - in this blog post - for dumping th

Clément Labro 546 May 14, 2022