Mimalloc-bench - Suite for benchmarking malloc implementations.

Overview

Mimalloc-bench

 

A suite for benchmarking malloc implementations, originally developed for benchmarking mimalloc. It is a collection of various benchmarks from the academic literature, together with automated scripts to pull specific versions of benchmark programs and allocators from GitHub and build them.

Due to the large variance in programs and allocators, the suite is currently only developed for Unix-like systems, and specifically Ubuntu with apt-get, Fedora with dnf, and macOS (for a limited set of allocators and benchmarks). The only system-installed allocator used is glibc's implementation that ships as part of Linux's libc. All other allocators are downloaded and built as part of build-bench-env.sh -- if you are looking to run these benchmarks on a different Linux distribution look at the setup_packages function to see the packages required to build the full set of allocators.

It is quite easy to add new benchmarks and allocator implementations -- please do so!

Enjoy, Daan

Note that all the code in the bench directory is not part of mimalloc-bench as such, and all programs in the bench directory are governed under their own specific licenses and copyrights as detailed in their README.md (or license.txt) files. They are just included here for convenience.

Benchmarking

The build-bench-env.sh script with the all argument will automatically pull all needed benchmarks and allocators and build them in the extern directory:

~/dev/mimalloc-bench> ./build-bench-env.sh all

The script starts by installing packages, for which you will need to enter the sudo password. All other programs are built in the mimalloc-bench/extern directory. Use ./build-bench-env.sh -h to see all options.

If everything succeeded, you can run the full benchmark suite (from out/bench) as:

~/dev/mimalloc-bench> cd out/bench
~/dev/mimalloc-bench/out/bench> ../../bench.sh alla allt

Or just test mimalloc and tcmalloc on cfrac and larson with 16 threads:

~/dev/mimalloc-bench/out/bench> ../../bench.sh --procs=16 mi tc cfrac larson

Generally, you specify the allocators (mi, je, tc, hd, sys (the system allocator), etc.) and the benchmarks (cfrac, espresso, barnes, lean, larson, alloc-test, cscratch, etc.), or use all allocators (alla) and all tests (allt). Use --procs= to set the concurrency, and --help to see all supported allocators and benchmarks.
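
To compare how allocators scale, the same selection can be run at several concurrency levels. The following is a minimal sketch of a dry run that only prints the command lines; sweep_cmds is a hypothetical helper that is not part of mimalloc-bench, and it uses only the --procs= option and the allocator and benchmark names described above.

```shell
#!/bin/sh
# Dry run: print one bench.sh command line per concurrency level.
# sweep_cmds is a hypothetical helper, not part of mimalloc-bench.
#   $1:   space-separated list of values for --procs=
#   rest: allocator and benchmark names, passed through unchanged
sweep_cmds() {
  procs_list=$1
  shift
  for p in $procs_list; do
    echo "../../bench.sh --procs=$p $*"
  done
}

sweep_cmds "1 4 16" mi tc cfrac larson
```

Because the helper only prints, the commands can be reviewed first and then executed from out/bench by piping the output to sh.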

Current Allocators

Supported allocators are as follows; see build-bench-env.sh for the versions:

  • dieharder: The DieHarder allocator is an error-resistant memory allocator for Windows, Linux, and Mac OS X.
  • hd: The Hoard allocator by Emery Berger [1]. This is one of the first multi-thread scalable allocators.
  • hm: The Hardened Malloc from GrapheneOS, security-focused.
  • iso: The Isoalloc allocator, isolation-based and aiming to provide a reasonable level of security without sacrificing too much performance.
  • je: The jemalloc allocator by Jason Evans, now developed at Facebook and widely used in practice, for example in FreeBSD and Firefox.
  • mallocng: musl's memory allocator.
  • mesh: The mesh allocator, a memory allocator that automatically reduces the memory footprint of C/C++ applications. Also tested as nomesh with the meshing feature disabled.
  • mi: The mimalloc allocator. We can also test the debug version as dmi (this can be used to check for any bugs in the benchmarks), and the secure version as smi.
  • rp: The rpmalloc allocator uses 16-byte aligned allocations and is developed by Mattias Jansson at Epic Games, used for example in Haiku.
  • sc: The scalloc allocator, a fast, multicore-scalable, low-fragmentation memory allocator.
  • scudo: The scudo allocator used by Fuchsia and Android.
  • sm: The Supermalloc allocator by Bradley Kuszmaul uses hardware transactional memory to speed up parallel operations.
  • sn: The snmalloc allocator is a recent concurrent message passing allocator by Liétar et al. [8].
  • sys: The system allocator. Here we usually use the glibc allocator (which is originally based on Ptmalloc2).
  • tc: The tcmalloc allocator which comes as part of the Google performance tools and is used in the Chrome browser.
  • tbb: The Intel TBB allocator that comes with the Threading Building Blocks (TBB) library [7].

Current Benchmarks

The first set of benchmarks consists of real-world programs, or programs that try to mimic them:

  • barnes: a hierarchical n-body particle solver [4], simulating the gravitational forces between 163840 particles. It uses relatively few allocations compared to cfrac and espresso but is multithreaded.
  • cfrac: by Dave Barrett, implementation of continued fraction factorization, using many small short-lived allocations.
  • espresso: a programmable logic array analyzer, described by Grunwald, Zorn, and Henderson [3] in the context of cache-aware memory allocation.
  • gs: have ghostscript process the entire Intel Software Developer’s Manual PDF, which is around 5000 pages.
  • leanN: The Lean compiler by de Moura et al., version 3.4.1, compiling its own standard library concurrently using N threads (./lean --make -j N). A big real-world workload with intensive allocations.
  • redis: running redis-benchmark, with 1 million requests pushing 10 new list elements and then requesting the head 10 elements, and measures the requests handled per second. Simulates a real-world workload.
  • larsonN: by Larson and Krishnan [2]. Simulates a server workload using 100 separate threads which each allocate and free many objects but leave some objects to be freed by other threads. Larson and Krishnan observe this behavior (which they call bleeding) in actual server applications, and the benchmark simulates this.
  • larsonN-sized: same as the larsonN except it uses sized deallocation calls which have a fast path in some allocators.
  • z3: perform some computations in z3.

The second set of benchmarks are stress tests and consist of:

  • alloc-test: a modern allocator test developed by OLogN Technologies AG (ITHare.com). Simulates intensive allocation workloads with a Pareto size distribution. The alloc-testN benchmark runs on N cores doing 100·10⁶ allocations per thread with objects up to 1KiB in size. Using commit 94f6cb (master, 2018-07-04).
  • cache-scratch: by Emery Berger [1]. Introduced with the Hoard allocator to test for passive-false sharing of cache lines: first some small objects are allocated and given to each thread; each thread then frees its object, immediately allocates another one, and accesses it repeatedly. If an allocator places objects from different threads close to each other, this leads to cache-line contention.
  • cache_trash: part of the Hoard benchmarking suite, designed to exercise heap cache locality.
  • glibc-simple and glibc-thread: benchmarks for glibc.
  • malloc-large: part of the mimalloc benchmarking suite, designed to exercise large (several MiB) allocations.
  • mleak: checks that terminated threads don't "leak" memory.
  • rptest: modified version of the rpmalloc-benchmark suite.
  • mstress: simulates real-world server-like allocation patterns, using N threads with allocations in powers of 2, where objects can migrate between threads and some have long lifetimes. Not all threads have equal workloads, and after each phase all threads are destroyed and new threads are created, with some objects surviving between phases.
  • rbstress: modified version of allocator_bench, allocates chunks in memory via ruby shenanigans.
  • sh6bench: by MicroQuill as part of SmartHeap. Stress test where some of the objects are freed in a usual last-allocated, first-freed (LIFO) order, but others are freed in reverse order. Using the public source (retrieved 2019-01-02).
  • sh8benchN: by MicroQuill as part of SmartHeap. Stress test for multi-threaded allocation (with N threads) where, just as in larson, some objects are freed by other threads, and some objects are freed in reverse (as in sh6bench). Using the public source (retrieved 2019-01-02).
  • xmalloc-testN: by Lever and Boreham [5] and Christian Eder. We use the updated version from the SuperMalloc repository. This is a more extreme version of the larson benchmark with 100 purely allocating threads, and 100 purely deallocating threads with objects of various sizes migrating between them. This asymmetric producer/consumer pattern is usually difficult to handle by allocators with thread-local caches.

Example

Below is an example (Apr 2019) of the benchmark results on an HP Z4-G4 workstation with a 4-core Intel® Xeon® W2123 at 3.6 GHz with 16GB ECC memory, running Ubuntu 18.04.1 with LibC 2.27 and GCC 7.3.0.

[Figures: bench-z4-1, bench-z4-2]

Memory usage:

[Figures: bench-z4-rss-1, bench-z4-rss-2]

(note: the xmalloc-testN memory usage should be disregarded, as it allocates more the faster the program runs. Unfortunately, there are no entries for SuperMalloc in the leanN and xmalloc-testN benchmarks, as it faulted on those.)

Resulting improvements

References

  • [1] Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson. Hoard: A Scalable Memory Allocator for Multithreaded Applications. In the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX). Cambridge, MA, November 2000.

  • [2] P. Larson and M. Krishnan. Memory allocation for long-running server applications. In ISMM, Vancouver, B.C., Canada, 1998.

  • [3] D. Grunwald, B. Zorn, and R. Henderson. Improving the cache locality of memory allocation. In R. Cartwright, editor, Proceedings of the Conference on Programming Language Design and Implementation, pages 177–186, New York, NY, USA, June 1993.

  • [4] J. Barnes and P. Hut. A hierarchical O(n*log(n)) force-calculation algorithm. Nature, 324:446-449, 1986.

  • [5] C. Lever and D. Boreham. Malloc() Performance in a Multithreaded Linux Environment. In USENIX Annual Technical Conference, Freenix Session. San Diego, CA. Jun. 2000. Available at https://github.com/kuszmaul/SuperMalloc/tree/master/tests

  • [6] Timothy Crundal. Reducing Active-False Sharing in TCMalloc. 2016. http://courses.cecs.anu.edu.au/courses/CSPROJECTS/16S1/Reports/Timothy_Crundal_Report.pdf. CS16S1 project at the Australian National University.

  • [7] Alexey Kukanov and Michael J. Voss. The Foundations for Scalable Multi-Core Software in Intel Threading Building Blocks. Intel Technology Journal 11 (4). 2007.

  • [8] Paul Liétar, Theodore Butler, Sylvan Clebsch, Sophia Drossopoulou, Juliana Franco, Matthew J Parkinson, Alex Shamis, Christoph M Wintersteiger, and David Chisnall. Snmalloc: A Message Passing Allocator. In Proceedings of the 2019 ACM SIGPLAN International Symposium on Memory Management, 122–135. ACM. 2019.

Comments
  • Add a github action

    Having a CI is nice, so let's have one.

    This already uncovered a couple of issues:

    • tcmalloc takes ages to build, since we're building its benchmarks and a huge pile of tests
    • same remark for snmalloc
    • the number of cores is hardcoded to 4 instead of being detected at runtime
    • the lean benchmark spits a ton of warnings, bloating the build log
    • ./build-bench-env.sh all doesn't build mesh with meshing disabled, yet ../../bench.sh alla allt tries to run it, resulting in the error: ERROR: ld.so: object '/home/runner/work/mimalloc-bench/mimalloc-bench/extern/nomesh/libmesh.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. This prevents us from making the CI fail upon errors.
    • looking for tbb fails with:
    find: ‘/home/runner/work/mimalloc-bench/mimalloc-bench/extern/tbb/build’: No such file or directory
    dirname: missing operand
    Try 'dirname --help' for more information.
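
    One way to address the hardcoded core count listed above is to query the system at runtime. The following is a minimal sketch, not necessarily the fix adopted in bench.sh; detect_cores is a hypothetical helper name. It tries nproc (GNU coreutils), then getconf _NPROCESSORS_ONLN, then sysctl -n hw.ncpu (macOS and the BSDs), and falls back to the previous value of 4 if none of them is available.

```shell
#!/bin/sh
# detect_cores is a hypothetical helper, not taken from bench.sh.
# It prints the number of online CPU cores, or 4 if it cannot be determined.
detect_cores() {
  if command -v nproc >/dev/null 2>&1; then
    nproc                                  # GNU coreutils (most Linux systems)
  elif n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) && [ -n "$n" ]; then
    echo "$n"                              # common getconf extension
  elif n=$(sysctl -n hw.ncpu 2>/dev/null) && [ -n "$n" ]; then
    echo "$n"                              # macOS and the BSDs
  else
    echo 4                                 # fall back to the old hardcoded value
  fi
}

procs=$(detect_cores)
echo "Running on $procs cores."
```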
    
    opened by jvoisin 8
  • Larson sized deallocation test

    Larson sized deallocation test. This calls the delete[] operator with a size argument to test the sized deallocation path.

    The test can be run as follows:

    ../../bench.sh alla larson-sized
    
    opened by jq-rs 6
  • Please consider adding scudo

    Please consider adding the scudo allocator. It's available in the compiler-rt/lib/scudo/standalone directory of https://github.com/llvm/llvm-project/ , and from that directory, you can build a shared library using clang++ -flto -fuse-ld=lld -fPIC -std=c++14 -fno-exceptions -fno-rtti -fvisibility=internal -msse4.2 -O3 -I include -shared -o libscudo.so *.cpp -pthread

    opened by joshtriplett 5
  • build-bench-env.sh fails to install alt allocators (and no bench.sh?)

    Ubuntu Disco with more sys info below. The errors below all come from running the build-bench-env.sh file. Note also there is no bench.sh file in this repo and build-bench-env.sh has nothing to create a bench.sh.

    I'd recommend checking out the instructions for each of the installers. Also it's generally bad form to install things for users or make directories outside of your repository. If you need to do both of these I would recommend zipping all of this into a docker container.

    Hoard Compilation Errors

    HEAD is now at c856b43 Fixed release target.
    git clone https://github.com/emeryberger/Heap-Layers
    Cloning into 'Heap-Layers'...
    remote: Enumerating objects: 1710, done.
    remote: Total 1710 (delta 0), reused 0 (delta 0), pack-reused 1710
    Receiving objects: 100% (1710/1710), 402.50 KiB | 4.38 MiB/s, done.
    Resolving deltas: 100% (1121/1121), done.
    clang++ -std=c++14 -O3 -DNDEBUG -ffast-math -fno-builtin-malloc -Wall -Wextra -Wshadow -Wconversion -Wuninitialized -g -W -Wconversion -Wall -I/usr/include/nptl -fno-builtin-malloc -pipe -fPIC -DNDEBUG  -I. -Iinclude -Iinclude/util -Iinclude/hoard -Iinclude/superblocks -IHeap-Layers -D_REENTRANT=1 -shared   source/libhoard.cpp source/unixtls.cpp Heap-Layers/wrappers/gnuwrapper.cpp -Bsymbolic -o libhoard.so -ldl -lpthread
    clang++ -std=c++14 -O3 -DNDEBUG -ffast-math -fno-builtin-malloc -Wall -Wextra -Wshadow -Wconversion -Wuninitialized -g -W -Wconversion -Wall -I/usr/include/nptl -fno-builtin-malloc -pipe -fPIC -DNDEBUG  -I. -Iinclude -Iinclude/util -Iinclude/hoard -Iinclude/superblocks -IHeap-Layers -D_REENTRANT=1 -shared   source/libhoard.cpp source/unixtls.cpp Heap-Layers/wrappers/gnuwrapper.cpp -Bsymbolic -o libhoard.so -ldl -lpthread
    cp libhoard.so /usr/lib
    cp: cannot stat '/usr/lib/libhoard.so': Too many levels of symbolic links
    make: *** [GNUmakefile:187: Linux-gcc-x86_64-install] Error 1
    ~/open_source/open_projects/misc/mimalloc-bench
    

    SuperMalloc

    /usr/include/stdlib.h:583:14: note: in a call to allocation function ‘aligned_alloc’ declared here
     extern void *aligned_alloc (size_t __alignment, size_t __size)
                  ^
    lto1: all warnings being treated as errors
    lto-wrapper: fatal error: g++ returned 1 exit status
    compilation terminated.
    /usr/bin/ld: error: lto-wrapper failed
    collect2: error: ld returned 1 exit status
    make: *** [../Makefile.include:122: ../release/aligned_alloc] Error 1
    rm ../release/aligned_alloc.o
    ~/open_source/open_projects/misc/mimalloc-bench
    
    

    rpmalloc

    [10/48] CC test/thread.c
    FAILED: build/ninja/linux/release/b'x86_64'/test-57ec084/thread-35aa063.o 
    clang -MMD -MT 'build/ninja/linux/release/b'\''x86_64'\''/test-57ec084/thread-35aa063.o' -MF 'build/ninja/linux/release/b'\''x86_64'\''/test-57ec084/thread-35aa063.o'.d -I. -Irpmalloc -Itest -DRPMALLOC_COMPILE=1 -funit-at-a-time -fstrict-aliasing -fno-math-errno -ffinite-math-only -funsafe-math-optimizations -fno-trapping-math -ffast-math -D_GNU_SOURCE=1 -W -Werror -pedantic -Wall -Weverything -Wno-padded -Wno-documentation-unknown-command -std=c11  -DBUILD_RELEASE=1 -O3 -g -funroll-loops -DENABLE_ASSERTS=1 -DENABLE_STATISTICS=1 -c test/thread.c -o 'build/ninja/linux/release/b'\''x86_64'\''/test-57ec084/thread-35aa063.o'
    test/thread.c:95:2: error: implicit use of sequentially-consistent atomic may incur stronger memory barriers than necessary [-Werror,-Watomic-implicit-seq-cst]
            __sync_synchronize();
            ^~~~~~~~~~~~~~~~~~
    test/thread.c:104:2: error: implicit use of sequentially-consistent atomic may incur stronger memory barriers than necessary [-Werror,-Watomic-implicit-seq-cst]
            __sync_synchronize();
            ^~~~~~~~~~~~~~~~~~
    2 errors generated.
    

    Sys Info

    uname -a
    # Linux [comp] 5.0.0-15-generic #16-Ubuntu SMP 
    # Mon May 6 17:41:33 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
    
    g++ -v
    # Using built-in specs.
    # COLLECT_GCC=g++-9
    # COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
    # OFFLOAD_TARGET_NAMES=nvptx-none
    # OFFLOAD_TARGET_DEFAULT=1
    # Target: x86_64-linux-gnu
    # Configured with: ../src/configure -v --with-pkgversion='Ubuntu 9.1.0-2ubuntu2~19.04' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
    # Thread model: posix
    # gcc version 9.1.0 (Ubuntu 9.1.0-2ubuntu2~19.04) 
    
    cmake --version
    # cmake version 3.13.4
    
    
    opened by SteveBronder 5
  • Add libpas comparison

    A new malloc() has appeared: https://github.com/WebKit/WebKit/blob/main/Source/bmalloc/libpas/Documentation.md Some discussion: https://news.ycombinator.com/item?id=31581727 There's some interesting things going on with the always_inline parts.

    opened by jasongibson 4
  • Add the new TCMalloc

    Unfortunately, there are two allocators both called tcmalloc. You are using the one from the gperftools repo, but a newer fork from upstream which uses restartable sequences is available at https://github.com/google/tcmalloc. It would be nice to be able to see results for both of them along with mimalloc.

    opened by RedBeard0531 4
  • Add hardened_malloc

    This commit adds hardened_malloc and is based on #31

    Tested like this:

    $ ../../bench.sh hm je cfrac
    --- Benchmarking ---
    Running on 2 cores.
    Installed allocators:
    
    hm:   main,    aa94408,  https://github.com/GrapheneOS/hardened_malloc
    je:   5.2.1,   ea6b3e9,  https://github.com/jemalloc/jemalloc.git
    
    ---- cfrac
    
    run cfrac hm: LD_PRELOAD=/home/jvoisin/dev/mimalloc-bench/extern/hm/libhardened_malloc.so ./cfrac 17545186520507317056371138836327483792789528
    cfrac hm 0:13.17 6508 13.14 0.01 0 1069
    
    run cfrac je: LD_PRELOAD=/home/jvoisin/dev/mimalloc-bench/extern/jemalloc/lib/libjemalloc.so ./cfrac 17545186520507317056371138836327483792789528
    cfrac je 0:07.77 4828 7.73 0.00 0 474
    
    # --------------------------------------------------
    # benchmark allocator elapsed rss user sys page-faults page-reclaims
          
    cfrac hm 13.17 6508 13.14 0.01 0 1069
    cfrac je 07.77 4828 7.73 0.00 0 474
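
    The summary rows at the end of this output are whitespace-separated, so they can be post-processed with standard tools. The following is a minimal sketch that prints each allocator's elapsed time relative to a chosen baseline allocator. relative_elapsed is a hypothetical helper, and it assumes the elapsed column holds plain seconds as in the sample above; other time formats are not handled.

```shell
#!/bin/sh
# relative_elapsed is a hypothetical helper, not part of mimalloc-bench.
# It reads summary rows on stdin with the columns
#   benchmark allocator elapsed rss user sys page-faults page-reclaims
# and prints "benchmark allocator ratio", where ratio is the elapsed time
# divided by the elapsed time of the baseline allocator given as $1.
relative_elapsed() {
  awk -v base="$1" '
    $1 ~ /^#/ || NF < 3 { next }                    # skip comments and blank lines
    { rows[++n] = $1 " " $2; t[$1, $2] = $3 + 0 }   # "07.77" is read as 7.77
    END {
      for (i = 1; i <= n; i++) {
        split(rows[i], f, " ")
        b = t[f[1], base]                           # baseline time for this benchmark
        if (b > 0) printf "%s %s %.2f\n", f[1], f[2], t[f[1], f[2]] / b
      }
    }'
}

relative_elapsed je <<'EOF'
# benchmark allocator elapsed rss user sys page-faults page-reclaims
cfrac hm 13.17 6508 13.14 0.01 0 1069
cfrac je 07.77 4828 7.73 0.00 0 474
EOF
```

    Rows for a benchmark that has no entry for the baseline allocator are skipped rather than reported with a misleading ratio.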
    

    cc @thestinger

    opened by jvoisin 4
  • Build fails on Ubuntu 20.04

    Running ./build-bench-env.sh all fails in the section build sm. I am running Ubuntu 20.04 (in WSL2, though I don't think that's relevant), with g++ 10.3.0. I will provide additional information, if necessary.

    --------------------------------------------
    build sm: version 709663f
    --------------------------------------------
    
    ~/allocator-test/mimalloc-bench/extern ~/allocator-test/mimalloc-bench
    Cloning into 'SuperMalloc'...
    remote: Enumerating objects: 2756, done.
    remote: Total 2756 (delta 0), reused 0 (delta 0), pack-reused 2756
    Receiving objects: 100% (2756/2756), 5.07 MiB | 1.61 MiB/s, done.
    Resolving deltas: 100% (1857/1857), done.
    Note: switching to '709663f'.
    
    You are in 'detached HEAD' state. You can look around, make experimental
    changes and commit them, and you can discard any commits you make in this
    state without impacting any branches by switching back to a branch.
    
    If you want to create a new branch to retain commits you create, you may
    do so (now or later) by using -c with the switch command. Example:
    
      git switch -c <new-branch-name>
    
    Or undo this operation with:
    
      git switch -
    
    Turn off this advice by setting config variable advice.detachedHead to false
    
    HEAD is now at 709663f Merge pull request #48 from Willtor/master
    set -e; rm -f ../release/env.d; \
                  g++ -MM -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  ../src/env.cc -MG -MF ../release/env.d.$$; \
                  sed 's,\(env\)\.o[ :]*,../release/\1.o ../release/env.d : ,g' < ../release/env.d.$$ \
                   | sed 's,generated_constants.h,../release/generated_constants.h,' > ../release/env.d; \
                  rm -f ../release/env.d.$$
    set -e; rm -f ../release/has_tsx.d; \
                  g++ -MM -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  ../src/has_tsx.cc -MG -MF ../release/has_tsx.d.$$; \
                  sed 's,\(has_tsx\)\.o[ :]*,../release/\1.o ../release/has_tsx.d : ,g' < ../release/has_tsx.d.$$ \
                   | sed 's,generated_constants.h,../release/generated_constants.h,' > ../release/has_tsx.d; \
                  rm -f ../release/has_tsx.d.$$
    set -e; rm -f ../release/futex_mutex.d; \
                  g++ -MM -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  ../src/futex_mutex.cc -MG -MF ../release/futex_mutex.d.$$; \
                  sed 's,\(futex_mutex\)\.o[ :]*,../release/\1.o ../release/futex_mutex.d : ,g' < ../release/futex_mutex.d.$$ \
                   | sed 's,generated_constants.h,../release/generated_constants.h,' > ../release/futex_mutex.d; \
                  rm -f ../release/futex_mutex.d.$$
    set -e; rm -f ../release/stats.d; \
                  g++ -MM -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  ../src/stats.cc -MG -MF ../release/stats.d.$$; \
                  sed 's,\(stats\)\.o[ :]*,../release/\1.o ../release/stats.d : ,g' < ../release/stats.d.$$ \
                   | sed 's,generated_constants.h,../release/generated_constants.h,' > ../release/stats.d; \
                  rm -f ../release/stats.d.$$
    set -e; rm -f ../release/footprint.d; \
                  g++ -MM -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  ../src/footprint.cc -MG -MF ../release/footprint.d.$$; \
                  sed 's,\(footprint\)\.o[ :]*,../release/\1.o ../release/footprint.d : ,g' < ../release/footprint.d.$$ \
                   | sed 's,generated_constants.h,../release/generated_constants.h,' > ../release/footprint.d; \
                  rm -f ../release/footprint.d.$$
    set -e; rm -f ../release/bassert.d; \
                  g++ -MM -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  ../src/bassert.cc -MG -MF ../release/bassert.d.$$; \
                  sed 's,\(bassert\)\.o[ :]*,../release/\1.o ../release/bassert.d : ,g' < ../release/bassert.d.$$ \
                   | sed 's,generated_constants.h,../release/generated_constants.h,' > ../release/bassert.d; \
                  rm -f ../release/bassert.d.$$
    set -e; rm -f ../release/cache.d; \
                  g++ -MM -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  ../src/cache.cc -MG -MF ../release/cache.d.$$; \
                  sed 's,\(cache\)\.o[ :]*,../release/\1.o ../release/cache.d : ,g' < ../release/cache.d.$$ \
                   | sed 's,generated_constants.h,../release/generated_constants.h,' > ../release/cache.d; \
                  rm -f ../release/cache.d.$$
    set -e; rm -f ../release/small_malloc.d; \
                  g++ -MM -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  ../src/small_malloc.cc -MG -MF ../release/small_malloc.d.$$; \
                  sed 's,\(small_malloc\)\.o[ :]*,../release/\1.o ../release/small_malloc.d : ,g' < ../release/small_malloc.d.$$ \
                   | sed 's,generated_constants.h,../release/generated_constants.h,' > ../release/small_malloc.d; \
                  rm -f ../release/small_malloc.d.$$
    set -e; rm -f ../release/large_malloc.d; \
                  g++ -MM -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  ../src/large_malloc.cc -MG -MF ../release/large_malloc.d.$$; \
                  sed 's,\(large_malloc\)\.o[ :]*,../release/\1.o ../release/large_malloc.d : ,g' < ../release/large_malloc.d.$$ \
                   | sed 's,generated_constants.h,../release/generated_constants.h,' > ../release/large_malloc.d; \
                  rm -f ../release/large_malloc.d.$$
    set -e; rm -f ../release/huge_malloc.d; \
                  g++ -MM -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  ../src/huge_malloc.cc -MG -MF ../release/huge_malloc.d.$$; \
                  sed 's,\(huge_malloc\)\.o[ :]*,../release/\1.o ../release/huge_malloc.d : ,g' < ../release/huge_malloc.d.$$ \
                   | sed 's,generated_constants.h,../release/generated_constants.h,' > ../release/huge_malloc.d; \
                  rm -f ../release/huge_malloc.d.$$
    set -e; rm -f ../release/rng.d; \
                  g++ -MM -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  ../src/rng.cc -MG -MF ../release/rng.d.$$; \
                  sed 's,\(rng\)\.o[ :]*,../release/\1.o ../release/rng.d : ,g' < ../release/rng.d.$$ \
                   | sed 's,generated_constants.h,../release/generated_constants.h,' > ../release/rng.d; \
                  rm -f ../release/rng.d.$$
    set -e; rm -f ../release/makechunk.d; \
                  g++ -MM -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  ../src/makechunk.cc -MG -MF ../release/makechunk.d.$$; \
                  sed 's,\(makechunk\)\.o[ :]*,../release/\1.o ../release/makechunk.d : ,g' < ../release/makechunk.d.$$ \
                   | sed 's,generated_constants.h,../release/generated_constants.h,' > ../release/makechunk.d; \
                  rm -f ../release/makechunk.d.$$
    set -e; rm -f ../release/malloc.d; \
                  g++ -MM -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  ../src/malloc.cc -MG -MF ../release/malloc.d.$$; \
                  sed 's,\(malloc\)\.o[ :]*,../release/\1.o ../release/malloc.d : ,g' < ../release/malloc.d.$$ \
                   | sed 's,generated_constants.h,../release/generated_constants.h,' > ../release/malloc.d; \
                  rm -f ../release/malloc.d.$$
    g++ -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  -I../src  -c ../src/bassert.cc -o ../release/bassert.o
    g++ -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11  ../src/objsizes.cc ../release/bassert.o -o ../release/objsizes
    ./../release/objsizes  ../release/generated_constants.cc >  ../release/generated_constants.h
    set -e; rm -f ../release/cache.d; \
                  g++ -MM -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  ../src/cache.cc -MG -MF ../release/cache.d.$$; \
                  sed 's,\(cache\)\.o[ :]*,../release/\1.o ../release/cache.d : ,g' < ../release/cache.d.$$ \
                   | sed 's,generated_constants.h,../release/generated_constants.h,' > ../release/cache.d; \
                  rm -f ../release/cache.d.$$
    set -e; rm -f ../release/small_malloc.d; \
                  g++ -MM -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  ../src/small_malloc.cc -MG -MF ../release/small_malloc.d.$$; \
                  sed 's,\(small_malloc\)\.o[ :]*,../release/\1.o ../release/small_malloc.d : ,g' < ../release/small_malloc.d.$$ \
                   | sed 's,generated_constants.h,../release/generated_constants.h,' > ../release/small_malloc.d; \
                  rm -f ../release/small_malloc.d.$$
    set -e; rm -f ../release/large_malloc.d; \
                  g++ -MM -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  ../src/large_malloc.cc -MG -MF ../release/large_malloc.d.$$; \
                  sed 's,\(large_malloc\)\.o[ :]*,../release/\1.o ../release/large_malloc.d : ,g' < ../release/large_malloc.d.$$ \
                   | sed 's,generated_constants.h,../release/generated_constants.h,' > ../release/large_malloc.d; \
                  rm -f ../release/large_malloc.d.$$
    set -e; rm -f ../release/huge_malloc.d; \
                  g++ -MM -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  ../src/huge_malloc.cc -MG -MF ../release/huge_malloc.d.$$; \
                  sed 's,\(huge_malloc\)\.o[ :]*,../release/\1.o ../release/huge_malloc.d : ,g' < ../release/huge_malloc.d.$$ \
                   | sed 's,generated_constants.h,../release/generated_constants.h,' > ../release/huge_malloc.d; \
                  rm -f ../release/huge_malloc.d.$$
    set -e; rm -f ../release/makechunk.d; \
                  g++ -MM -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  ../src/makechunk.cc -MG -MF ../release/makechunk.d.$$; \
                  sed 's,\(makechunk\)\.o[ :]*,../release/\1.o ../release/makechunk.d : ,g' < ../release/makechunk.d.$$ \
                   | sed 's,generated_constants.h,../release/generated_constants.h,' > ../release/makechunk.d; \
                  rm -f ../release/makechunk.d.$$
    set -e; rm -f ../release/malloc.d; \
                  g++ -MM -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  ../src/malloc.cc -MG -MF ../release/malloc.d.$$; \
                  sed 's,\(malloc\)\.o[ :]*,../release/\1.o ../release/malloc.d : ,g' < ../release/malloc.d.$$ \
                   | sed 's,generated_constants.h,../release/generated_constants.h,' > ../release/malloc.d; \
                  rm -f ../release/malloc.d.$$
    g++ -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  -c ../src/malloc.cc -o ../release/malloc.o
    g++ -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  -c ../src/makechunk.cc -o ../release/makechunk.o
    g++ -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  -c ../src/rng.cc -o ../release/rng.o
    g++ -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  -c ../src/huge_malloc.cc -o ../release/huge_malloc.o
    g++ -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  -c ../src/large_malloc.cc -o ../release/large_malloc.o
    g++ -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  -c ../src/small_malloc.cc -o ../release/small_malloc.o
    g++ -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  -c ../src/cache.cc -o ../release/cache.o
    g++ -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  -c ../src/footprint.cc -o ../release/footprint.o
    g++ -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  -c ../src/stats.cc -o ../release/stats.o
    g++ -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  -c ../src/futex_mutex.cc -o ../release/futex_mutex.o
    g++ -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src   -c -o ../release/generated_constants.o ../release/generated_constants.cc
    g++ -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  -c ../src/has_tsx.cc -o ../release/has_tsx.o
    g++ -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11     -I../release   -I../src  -c ../src/env.cc -o ../release/env.o
    mkdir -p ../release/lib
    gcc-ar cr ../release/lib/supermalloc.a  ../release/malloc.o  ../release/makechunk.o  ../release/rng.o  ../release/huge_malloc.o  ../release/large_malloc.o  ../release/small_malloc.o  ../release/cache.o  ../release/bassert.o  ../release/footprint.o  ../release/stats.o  ../release/futex_mutex.o  ../release/generated_constants.o  ../release/has_tsx.o  ../release/env.o
    g++ -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11   ../release/malloc.o  ../release/makechunk.o  ../release/rng.o  ../release/huge_malloc.o  ../release/large_malloc.o  ../release/small_malloc.o  ../release/cache.o  ../release/bassert.o  ../release/footprint.o  ../release/stats.o  ../release/futex_mutex.o  ../release/generated_constants.o  ../release/has_tsx.o  ../release/env.o -shared -ldl -o ../release/lib/libsupermalloc.so
    cc -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c11    -I../release   -I../src  -c ../tests/aligned_alloc.c -o ../release/aligned_alloc.o
    g++ -W -Wall  -O3 -flto -ggdb -pthread -fPIC -mrtm  -std=c++11  ../release/aligned_alloc.o -ldl -L../release/lib -Wl,-rpath,../release/lib -ldl ../release/lib/libsupermalloc.so -o ../release/aligned_alloc
    lto1: fatal error: bytecode stream in file ‘../release/aligned_alloc.o’ generated with GCC compiler older than 10.0
    compilation terminated.
    lto-wrapper: fatal error: g++ returned 1 exit status
    compilation terminated.
    /usr/bin/ld: error: lto-wrapper failed
    collect2: error: ld returned 1 exit status
    make: *** [../Makefile.include:122: ../release/aligned_alloc] Error 1
    rm ../release/aligned_alloc.o
    
    opened by janekb04 3
  • new[] and delete in Larson.cpp

    @daanx the Larson benchmark does not correctly match new[] with delete[] calls. It allocates as an array but deletes as a single object. For instance, the allocation is here

    https://github.com/daanx/mimalloc-bench/blob/bcddf9f54d3a1e4271da1cf1ad7966514ed860c9/bench/larson/larson.cpp#L379

    and deallocation here

    https://github.com/daanx/mimalloc-bench/blob/bcddf9f54d3a1e4271da1cf1ad7966514ed860c9/bench/larson/larson.cpp#L391

    This means that delete is supplied a size that assumes it is deallocating a single char rather than a char array. For snmalloc this caused a lot of memory wastage, as it assumed the size was correct.

    I am happy to PR a fix, but wanted to check if you would take it.
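
As a rough illustration of the pairing rule described above, the sketch below allocates with the array form of new and releases with the matching delete[]. The helper names (make_block, free_block, roundtrip) are hypothetical and are not taken from larson.cpp; the comments describe what a mismatched delete might do under the assumption that sized deallocation is enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helpers for illustration; these names do not appear in larson.cpp.

// Allocate a zero-filled block with the array form of new.
char* make_block(std::size_t n) {
    assert(n > 0);
    char* p = new char[n];
    std::memset(p, 0, n);
    return p;
}

// Release a block obtained from make_block with the matching array form.
void free_block(char* p) {
    delete[] p;
    // Writing `delete p;` here instead would be undefined behaviour. With
    // sized deallocation enabled, the allocator could also be told the block
    // is sizeof(char) == 1 byte, whatever n was.
}

// Allocate, touch, and free one block; returns the last byte read back.
int roundtrip(std::size_t n) {
    char* p = make_block(n);
    p[n - 1] = 42;
    int last = p[n - 1];
    free_block(p);
    return last;
}
```

Keeping allocation and release behind one pair of helpers is only one possible design; its advantage is that the array form appears in exactly two places, so a mismatch is easy to spot in review.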

    opened by mjp41 3
  • Hardcoded path for LD_PRELOAD

    I can build part of mimalloc-bench on a Linux/AArch64 target. However, I found a hardcoded path for LD_PRELOAD:

    out/bench$ ../../bench.sh rp cfrac
    ...
    ---- cfrac
    
    run cfrac rp: LD_PRELOAD=/tmp/mimalloc-bench/extern/rpmalloc/bin/linux/release/x86-64/librpmalloc.so ./cfrac 175451865205073170563711388363274837927895
    ERROR: ld.so: object '/tmp/mimalloc-bench/extern/rpmalloc/bin/linux/release/x86-64/librpmalloc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
    

    It should not be x86-64 since I am testing aarch64.

    opened by jserv 3
  • Move all _big and _small tests to one file, and specify separate build

    Merged the _big and _small tests; the two variants are now just separate build targets created via CMake.

    Also generalize some things into common.h, so that it's easier to adjust the diagnostics in the future. I suspect we'll eventually want NOT_CAUGHT to report which test it came from and which allocation size it attempted.

    opened by Maximus- 2
  • Any chance to modernize the code?

    It looks like the benchmarks no longer build with modern C compilers because they use a very old C dialect. Any chance of updating the benchmarks to modern syntax?

    opened by juj 4
  • Improve the speed of the benchmarks

    Currently the CI takes around 90 minutes to run, which is a bit excessive. Here is what we could do to reduce this number:

    • Change the parameters for the redis benchmark: SYSMALLOC=1 /__w/mimalloc-bench/mimalloc-bench/extern/redis-6.2.6/src/redis-benchmark -r 1000000 -n 1000000 -q -P 16 lpush a 1 2 3 4 5 lrange a 1 5 takes around a full minute. Given that we have ~16 allocators, this adds up rapidly.
      • Done in 410f665d48cea57eee03551310ea8b9aa4f8129d
    • Lean takes around 1-3 minutes per allocator and is unmaintained. Shall we try to move to lean4?
    opened by jvoisin 2
Owner
Daan
mimalloc is a compact general purpose allocator with excellent performance.

mimalloc mimalloc (pronounced "me-malloc") is a general purpose allocator with excellent performance characteristics. Initially developed by Daan Leij

Microsoft 7.6k Dec 30, 2022
The Best and Highest-Leveled and Newest bingding for MiMalloc Ever Existed in Rust

The Best and Highest-Leveled and Newest bingding for MiMalloc Ever Existed in Rust mimalloc 1.7.2 stable Why create this in repo https://github.com/pu

LemonHX 31 Dec 17, 2022
The Hoard Memory Allocator: A Fast, Scalable, and Memory-efficient Malloc for Linux, Windows, and Mac.

The Hoard Memory Allocator Copyright (C) 1998-2020 by Emery Berger The Hoard memory allocator is a fast, scalable, and memory-efficient memory allocat

Emery Berger 927 Jan 2, 2023
jemalloc - General purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support. [BSD]

jemalloc is a general purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support. jemalloc first came

jemalloc memory allocator 7.7k Jan 7, 2023
Test your malloc protection

Test your allocs protections and leaks ! Report Bug · Request Feature Table of Contents About The Tool Getting Started Prerequisites Quickstart Usage

tmatis 49 Dec 7, 2022
A poggers malloc implementation

pogmalloc(3) A poggers malloc implementation Features Static allocator Real heap allocator (via sbrk(2)) Builtin GC Can also mark static memory Can be

Ariel Simulevski 2 Jun 12, 2022
Malloc Lab: simple memory allocator using sorted segregated free list

LAB 6: Malloc Lab Main Files mm.{c,h} - Your solution malloc package. mdriver.c - The malloc driver that tests your mm.c file short{1,2}-bal.rep - T

null 1 Feb 28, 2022
Custom implementation of C stdlib malloc(), realloc(), and free() functions.

C-Stdlib-Malloc-Implementation NOT INTENDED TO BE COMPILED AND RAN, DRIVER CODE NOT OWNED BY I, ARCINI This is a custom implmentation of the standard

Alex Cini 1 Dec 27, 2021
Alloc-test - Cross-platform benchmarking for memory allocators, aiming to be as close to real world as it is practical

Alloc-test - Cross-platform benchmarking for memory allocators, aiming to be as close to real world as it is practical

null 37 Aug 23, 2022
Hardened malloc - Hardened allocator designed for modern systems

Hardened malloc - Hardened allocator designed for modern systems. It has integration into Android's Bionic libc and can be used externally with musl and glibc as a dynamic library for use on other Linux-based platforms. It will gain more portability / integration over time.

GrapheneOS 893 Jan 3, 2023
Allocator bench - bench of various memory allocators

To run benchmarks Install lockless from https://locklessinc.com/downloads/ in lockless_allocator path make Install Hoard from https://github.com/emery

Sam 47 Dec 4, 2022
Malloc geiger is a hook for malloc that plays geiger counter blips in proportion to the amount of calls to malloc as a way of knowing what an application does

Malloc Geiger Malloc geiger is a hook for malloc that plays geiger counter blips in proportion to the amount of calls to malloc as a way of knowing wh

David Larsson 321 Dec 19, 2022
Benchmarking algebraic effect handler implementations

Benchmarking Effect handlers. Currently supported libraries and languages are: kk (extension .kk), Koka. ml (extension .ml), multi-core OCaml.

Daan 21 Dec 18, 2021
Simple conservative GC using mimalloc

migc Small and simple library that implements conservative GC using mimalloc API. Features Small and tiny. libmigc.so is just 20KB when linked with mi

playX 34 Jan 1, 2023
A simple C++ 03/11/etc timer class for ~microsecond-precision cross-platform benchmarking. The implementation is as limited and as simple as possible to create the lowest amount of overhead.

plf_nanotimer A simple C++ 03/11/etc timer class for ~microsecond-precision cross-platform benchmarking. The implementation is as limited and simple a

Matt Bentley 102 Dec 4, 2022
A C++ micro-benchmarking framework

Nonius What is nonius? Nonius is an open-source framework for benchmarking small snippets of C++ code. It is very heavily inspired by Criterion, a sim

Nonius 339 Dec 19, 2022