jemalloc - General purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support. [BSD]

Overview
jemalloc is a general purpose malloc(3) implementation that emphasizes
fragmentation avoidance and scalable concurrency support.  jemalloc first came
into use as the FreeBSD libc allocator in 2005, and since then it has found its
way into numerous applications that rely on its predictable behavior.  In 2010
jemalloc development efforts broadened to include developer support features
such as heap profiling and extensive monitoring/tuning hooks.  Modern jemalloc
releases continue to be integrated back into FreeBSD, and therefore versatility
remains critical.  Ongoing development efforts trend toward making jemalloc
among the best allocators for a broad range of demanding applications, and
eliminating/mitigating weaknesses that have practical repercussions for real
world applications.

The COPYING file contains copyright and licensing information.

The INSTALL file contains information on how to configure, build, and install
jemalloc.

The ChangeLog file contains a brief summary of changes for each release.

URL: http://jemalloc.net/
Comments
  • OpenJDK JVM deadlock triggered with jemalloc 5.x?

    We've found an issue that we can only reproduce when LD_PRELOADing jemalloc 5.1. The issue causes a deadlock where one or more threads are waiting to lock an object monitor that no thread is currently holding. When we attempted to debug, we discovered the thread that is expected to be holding the lock has instead left the synchronized block and has returned to the thread pool. If we don't LD_PRELOAD jemalloc and rely on glibc malloc we're unable to reproduce the issue. If we use jemalloc 4.5.0 we're unable to reproduce it there as well.

    We've written a simple test application that can reproduce the issue.

    Test application: https://github.com/djscholl/jemalloc-java-deadlock
    Example log output when a deadlock occurred: https://gist.github.com/djscholl/413071ef29671fb53b5a64e105421f1a

    In the example log output, we can see that 999 threads are

            -  blocked on [email protected]
    

    Which is supposedly owned by "pool-1-thread-558", but that thread doesn't appear to be holding the monitor.

    "pool-1-thread-558" Id=566 RUNNABLE
            at java.lang.Class.getConstructor(Class.java:1825)
            at java.security.Provider$Service.newInstance(Provider.java:1594)
            at sun.security.jca.GetInstance.getInstance(GetInstance.java:236)
            at sun.security.jca.GetInstance.getInstance(GetInstance.java:164)
            at java.security.Security.getImpl(Security.java:695)
            at java.security.MessageDigest.getInstance(MessageDigest.java:167)
            at com.example.DeadlockTest.lambda$null$1(DeadlockTest.java:44)
            at com.example.DeadlockTest$$Lambda$7/2093176254.run(Unknown Source)
            ...
    
            Number of locked synchronizers = 1
            - [email protected]
    

    We've reproduced this using a jemalloc.so built with everything set to the default and no malloc options set. We've tried it with --enable-debug and can reproduce it with that as well. We've tried with background threads enabled and disabled, as well as changing number of arenas. All of these configurations still result in reproducing this issue.

    We've tried this on a few different OpenJDK versions as well.

    • Multiple versions of 1.8 between 1.8.0.121 and 1.8.0.202
    • 1.9
    • 1.11

    It's a little awkward to post a JVM deadlock to jemalloc developers, but we're stuck! We'll post this same issue on OpenJDK and link that here when we have it.

    We tried running the test application with -DwaitOnDeadLock=true and attached gdb, and all we see is that the threads are in pthread_cond_wait while the thread ID that they're waiting for is elsewhere outside of the synchronized block (so it shouldn't be holding the object monitor any more).
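    For context, a Java synchronized block is essentially a monitor, which on Linux reduces to the pthread mutex/condvar pattern seen in gdb. A minimal hand-rolled sketch of that pattern (illustrative only; this is neither the JVM's nor jemalloc's code, and all names are made up):

    ```c
    #include <assert.h>
    #include <pthread.h>
    #include <stdio.h>

    /* Minimal hand-rolled object monitor: enter/exit like a synchronized block. */
    typedef struct {
        pthread_mutex_t lock;
        pthread_cond_t  cond;
        int             held;   /* 1 while some thread "owns" the monitor */
    } monitor_t;

    static void monitor_enter(monitor_t *m) {
        pthread_mutex_lock(&m->lock);
        while (m->held)                              /* like BLOCKED on the monitor */
            pthread_cond_wait(&m->cond, &m->lock);   /* parked in pthread_cond_wait */
        m->held = 1;
        pthread_mutex_unlock(&m->lock);
    }

    static void monitor_exit(monitor_t *m) {
        pthread_mutex_lock(&m->lock);
        m->held = 0;
        /* If this wakeup were ever lost, waiters would hang exactly as described:
         * blocked on a monitor that no thread currently holds. */
        pthread_cond_broadcast(&m->cond);
        pthread_mutex_unlock(&m->lock);
    }

    static monitor_t mon = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 };
    static int counter = 0;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 1000; i++) {
            monitor_enter(&mon);
            counter++;           /* the "synchronized" critical section */
            monitor_exit(&mon);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[4];
        for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
        assert(counter == 4000);
        printf("counter=%d\n", counter);
        return 0;
    }
    ```

    In a healthy run every `monitor_exit` wakes the waiters; the reported symptom corresponds to threads stuck in the `pthread_cond_wait` loop while `held` is already 0.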

    notabug question 
    opened by djscholl 44
  • jemalloc leads to segmentation fault in execution that is clean in valgrind with the system allocator

    This has been a recurring issue for us in the development of some of our software (salmon and pufferfish). Essentially, when we link with jemalloc we intermittently get segfaults. Often, re-building the application will "resolve" the issue. Of course, this is almost always a sign of an underlying memory issue in the client program, and so I had previously assumed the same was the case here (despite the fact that these crashes don't happen with the system allocator).

    However, we've gone to considerable lengths to rule out (though we can never completely eliminate) the possibility of memory errors in the client code. This recent issue in the pufferfish repository demonstrates one such example, where the program repeatedly segfaults when linked with jemalloc, yet runs to completion when linked with the system allocator. Further, evaluating the execution (under the system allocator) with valgrind shows 0 memory errors (similar results are obtained using the address and memory sanitizers as well). These random crashes have been plaguing certain builds of our tools for a couple of years now, so it would be great to figure out what is going on, and either pin it down to some memory error in our code that affects jemalloc (but which isn't detectable with valgrind) or figure out the underlying issue in jemalloc.

    opened by rob-p 35
  • Make last-N dumping non-blocking

    A few remarks:

    • I'm chopping last-N dumping into batches, each under the prof_recent_alloc_mtx, so that sampled malloc and free can proceed between the batches, rather than being blocked until the entire dumping process finishes.
    • I'm using the existing prof_dump_mtx to cover the entire dumping process, during which I first change the limit to unlimited (so that existing records can stay), then perform the dumping batches, and finally revert the limit back (and shorten the record list). prof_dump_mtx serves to only permit one thread at a time to dump, either the last-N records or the original stacktrace-based profiling information. An alternative approach is to use a separate mutex for the last-N records, so that the two types of dumping can take place concurrently. Thoughts?
    • I'm additionally changing the mallctl logic for reading and writing the limit: they now need the prof_dump_mtx. For reading, this ensures that what's being read is always the real limit. For writing, this ensures that the application cannot change the limit during dumping. The downside is that the mallctl calls are blocked until the entire dumping process finishes, but I think it's fine, because the mallctl calls are very rare and only initiated by the application.
    • I'm increasing the buffer size to be the same as the size used by stats printing and the original profiling dumping. I think I could even consolidate the last-N buffer with the original profiling buffer, especially since I'm already using prof_dump_mtx. Thoughts? I could have a separate commit for that, since that'd also need some refactoring of the original profiling dumping logic.
    • The batch size is chosen to be 64. Making such a choice is quite tricky; here's how I arrived at it:
      • The goal I'm pursuing is to find a batch size so that each batch can trigger at most one I/O procedural call: the worst case blocking time is always at least one I/O, so a smaller batch size cannot reduce the worst case blocking time, while a larger batch size can multiply the worst case blocking time.
      • The amount of output per record depends primarily on (a) the length of the stack trace and (b) whether the record has been released (two stack traces if released; one if not). I examined last-N dumps from production, 4 per service, and found that one of the services happened to have both the longest stack traces and the highest proportion of released records; the average length per record for that service is on the order of 800-900 characters (in compact JSON format). So, if I set the batch size to 64 records, each batch will output at most slightly under 64K characters, which is the size of the buffer.
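    The arithmetic behind the batch size can be checked directly (figures taken from the description above):

    ```c
    #include <assert.h>
    #include <stdio.h>

    int main(void) {
        /* Figures from the description above. */
        const int batch_records = 64;          /* chosen batch size */
        const int chars_per_rec = 900;         /* observed worst-case average per record */
        const int buf_size      = 64 * 1024;   /* 64K output buffer */

        int worst_batch_output = batch_records * chars_per_rec;
        printf("worst batch output: %d of %d buffer bytes\n",
               worst_batch_output, buf_size);

        /* One batch fits in the buffer, so each batch triggers at most one
         * flush, i.e. at most one I/O procedural call. */
        assert(worst_batch_output <= buf_size);
        return 0;
    }
    ```

    A smaller batch could not reduce the worst-case blocking time below one I/O, while a larger batch could overflow the buffer and multiply it.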
    opened by yinan1048576 30
  • static TLS errors from jemalloc 5.0.0 built on CentOS 6

    I help maintain packages on conda-forge, which has become fairly popular in the Python community. We recently added jemalloc 5.0.0 to the package manager, built on CentOS 6 with devtoolset-2 from this base Docker image (glibc 2.12, I think):

    https://github.com/conda-forge/docker-images/blob/master/linux-anvil/Dockerfile

    On some platforms, like Ubuntu 14.04 (glibc 2.19), using dlopen on the produced shared library leads to errors like

    libjemalloc.so: cannot allocate memory in static TLS block
    

    What is the recommended workaround given that we need to compile on a glibc 2.12 system and deploy the binaries on systems with newer glibc?

    This may be related to https://sourceware.org/bugzilla/show_bug.cgi?id=14898

    cc @xhochy

    opened by wesm 29
  • Clean compilation with -Wextra

    Before this pull request, jemalloc produced many warnings when compiled with -Wextra under both Clang and GCC. This pull request fixes the issues raised by these warnings, or suppresses them if they were spurious, at least for the Clang and GCC versions covered by CI.

    This pull request:

    • adds JEMALLOC_DIAGNOSTIC macros: JEMALLOC_DIAGNOSTIC_{PUSH,POP} are used to modify the stack of enabled diagnostics. The JEMALLOC_DIAGNOSTIC_IGNORE_... macros are used to ignore a concrete diagnostic.

    • adds a JEMALLOC_FALLTHROUGH macro to explicitly state that falling through case labels in a switch statement is intended.

    • locally suppresses many unused-argument warnings by adding missing UNUSED annotations.

    • locally suppresses some -Wextra diagnostics:

      • -Wmissing-field-initializers is buggy in older Clang and GCC versions, not understanding that, in C, = {0} is a common idiom to initialize a struct to zero.

      • -Wtype-limits is suppressed in a particular situation where a generic macro, used in multiple different places, compares an unsigned integer for smaller than zero, which is always false.

      • -Walloc-size-larger-than= diagnostics warn when an allocation function is called with a size that is too large (out of range). These are suppressed in the parts of the tests where jemalloc explicitly does this, to test that the allocation functions fail properly.

    • fixes a bug in the log.c tests where an array was being written out of bounds, which was probably invoking undefined behavior.
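    A rough sketch of how such diagnostic macros are commonly defined for GCC/Clang (the PR's actual definitions may differ; the IGNORE macro name below is illustrative):

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Sketch of the macros described above, for GCC/Clang only; the PR's
     * actual definitions and spellings may differ. */
    #if defined(__GNUC__)
    #  define JEMALLOC_DIAGNOSTIC_PUSH _Pragma("GCC diagnostic push")
    #  define JEMALLOC_DIAGNOSTIC_POP  _Pragma("GCC diagnostic pop")
    #  define JEMALLOC_DIAGNOSTIC_IGNORE_MISSING_FIELD_INITIALIZERS \
            _Pragma("GCC diagnostic ignored \"-Wmissing-field-initializers\"")
    #  define JEMALLOC_FALLTHROUGH __attribute__((__fallthrough__))
    #else
    #  define JEMALLOC_DIAGNOSTIC_PUSH
    #  define JEMALLOC_DIAGNOSTIC_POP
    #  define JEMALLOC_DIAGNOSTIC_IGNORE_MISSING_FIELD_INITIALIZERS
    #  define JEMALLOC_FALLTHROUGH ((void)0)
    #endif

    typedef struct { int a; int b; } pair_t;

    /* Intentional fall-through from case 0 into case 1. */
    static int classify(int x) {
        int r = 0;
        switch (x) {
        case 0:
            r += 1;
            JEMALLOC_FALLTHROUGH;
        case 1:
            r += 1;
            break;
        default:
            r = -1;
        }
        return r;
    }

    int main(void) {
        /* The C idiom "= {0}" zero-initializes the whole struct; older
         * compilers wrongly warn under -Wmissing-field-initializers. */
        JEMALLOC_DIAGNOSTIC_PUSH
        JEMALLOC_DIAGNOSTIC_IGNORE_MISSING_FIELD_INITIALIZERS
        pair_t p = {0};
        JEMALLOC_DIAGNOSTIC_POP
        assert(p.a == 0 && p.b == 0);
        assert(classify(0) == 2);   /* fell through: counted twice */
        assert(classify(1) == 1);
        printf("ok\n");
        return 0;
    }
    ```

    The push/pop pair scopes the suppression to the one construct that needs it, rather than disabling the diagnostic file-wide.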

    Closes #1196 .

    opened by gnzlbg 27
  • physical memory goes high every several hours

    Hi There,

    We are using jemalloc 5.0.1 in our project, and have found that physical memory usage spikes every several (>10) hours. Here is the log I captured; more than 10 GB of physical memory was used during this time:

    Allocated: 56763404160, active: 64958468096, metadata: 3267106432, resident: 70248562688, mapped: 70786420736, retained: 405324754944
    Allocated: 56876350976, active: 65120444416, metadata: 3292205344, resident: 74587324416, mapped: 75117805568, retained: 405240102912
    Allocated: 56737409856, active: 64979918848, metadata: 3293146528, resident: 75795795968, mapped: 76325535744, retained: 404032372736
    Allocated: 56738962464, active: 64995016704, metadata: 3296629168, resident: 76685611008, mapped: 77218127872, retained: 403615834112
    Allocated: 56968671360, active: 65284304896, metadata: 3296170416, resident: 78292492288, mapped: 78825009152, retained: 402008952832
    Allocated: 56968786248, active: 65279537152, metadata: 3298034096, resident: 79658573824, mapped: 80191090688, retained: 400642871296
    Allocated: 56941156840, active: 65251299328, metadata: 3297322160, resident: 80860139520, mapped: 81392623616, retained: 399441338368
    Allocated: 56991072392, active: 65310920704, metadata: 3312494544, resident: 82332794880, mapped: 82864013312, retained: 399729459200
    Allocated: 57126460528, active: 65457401856, metadata: 3318715504, resident: 83553558528, mapped: 84290650112, retained: 399185723392
    Allocated: 56571929400, active: 64856027136, metadata: 3341452928, resident: 85106311168, mapped: 85832876032, retained: 400474652672
    Allocated: 56948892104, active: 65236578304, metadata: 3443298560, resident: 84992585728, mapped: 85696909312, retained: 413038342144

    Except for resident/mapped, which vary a lot, the other figures remain almost the same. What's the reason for this high physical memory usage? Does jemalloc reclaim unused physical memory instantly or periodically? BTW, huge TLB is disabled on this machine.
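    For background: jemalloc 5.x does not return dirty pages instantly; it purges them gradually on a decay schedule (the stats above show opt.dirty_decay_ms / opt.muzzy_decay_ms at their 10000 ms defaults), so resident can lag well behind allocated. Knobs worth experimenting with via MALLOC_CONF (option names per the jemalloc 5.x man page; the values here are only examples, not recommendations):

    ```
    MALLOC_CONF="dirty_decay_ms:1000,muzzy_decay_ms:1000,background_thread:true"
    ```

    Shorter decay times trade throughput for faster reclamation; background_thread moves the purging work off the application threads.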

    question 
    opened by atwufei 27
  • test/stress/cpp/microbench failing with Floating point exception after recent commit

    We can see a test failure on the x86_64 and s390x platforms in our CI after recent commits (481bbfc9906e7744716677edd49d0d6c22556a1a was OK). Both look the same:

    ...
    === test/stress/cpp/microbench ===
    100000000 iterations, malloc_free=1044987us (10.449 ns/iter), new_delete=1059986us (10.599 ns/iter), time consumption ratio=0.986:1
    test_free_vs_delete (non-reentrant): pass
    test/test.sh: line 34: 895548 Floating point exception(core dumped) $JEMALLOC_TEST_PREFIX ${t} /var/lib/jenkins/workspace/jemalloc/label/x86_64/ /var/lib/jenkins/workspace/jemalloc/label/x86_64/
    Test harness error: test/stress/cpp/microbench w/ MALLOC_CONF=""
    Use prefix to debug, e.g. JEMALLOC_TEST_PREFIX="gdb --args" sh test/test.sh test/stress/cpp/microbench
    make: *** [Makefile:707: stress] Error 1
    Build step 'Execute shell' marked build as failure
    

    The environment is Fedora 36 on x86_64 and s390x; there seems to be no such problem on the other platforms (aarch64, ppc64le).

    opened by sharkcz 26
  • 5.3 Release Candidate

    This issue is for tracking progress and communication related to the upcoming 5.3 release.

    Current release candidate: ed5fc14, which is going through our production testing pipeline.

    A previous commit was production deployed in November and no issues were observed.

    CC: @jasone

    opened by interwq 25
  • Build a general purpose thread event handler

    I haven't added unit tests. Will add later. I want to seek some early feedback on the design first.

    A few remarks:

    • I turned off the accumulation of allocation bytes on thread_allocated when reentrancy_level > 0. Otherwise the implementation would become very tricky, because when reentrancy_level > 0, profiling should be turned off by jemalloc design - see https://github.com/jemalloc/jemalloc/blob/785b84e60382515f1bf1a63457da7a7ab5d0a96b/include/jemalloc/internal/prof_inlines_b.h#L135 - but if thread_allocated kept increasing, the wait time until the next sampling event would be incorrect, unless we kept some internal state so that when reentrancy_level drops back to 0, we can adjust the wait time; that would require a reentrancy_level guard on the store and fetch of that internal state. I decided to rather have the guard on thread_allocated directly: after all, it's now not just an accumulator for allocations but a counter for events, and in general it may not be a good idea to have allocations in jemalloc internal calls trigger events in the same way (which is probably why profiling was determined to be turned off for such internal allocations). And, to be symmetric, I also turned off the incrementation of thread_deallocated when reentrancy_level > 0.

    • The lazy creation of tdata made the event counting tricky; it means that the event counters were fooled due to the wrong wait time, and that they need to be recovered to a state as if they had only reacted to the right wait time. See my comments in the code for how I am resolving this issue and why.

    • The event handler adopts a lazy approach to wait time reset, so that there's no longer a need to sometimes manually set the wait time to a huge number (e.g. in the case where opt_prof is off). This results in cleaner code.

    • I also rearranged the tsd layout. prof_tdata has not been needed on the fast path ever since bytes_until_sample was extracted out of tdata, and now bytes_until_sample is also not needed on the fast path, though we need an additional thread_allocated_threshold_fast there. We end up needing 8 bytes less on the fast path. So I shifted rtree_ctx ahead by 16 bytes (previously we put one extra slow-path field ahead of it, since it needs to be 16-byte aligned), and put all slow-path fields after it. I also changed the layout diagram to reflect that, including adding the span for binshards (which takes quite an amount of space, and I'm not sure whether putting it entirely before the tcache is optimal).

    • My comments in my earlier design (#1616) are mostly still relevant:

      • The first two bullets on resolving double counting issue still apply.

      • The third bullet on overwriting thread_allocated is no longer relevant: the current implementation never overwrites it so as to be consistent with the promise to the application by jemalloc.

      • The fourth bullet on resolving the overflow issue still applies. One slight but visible implementation detail: previously the test for event triggering was a strict less-than comparison, i.e. bytes_until_sample < 0, but now the equal case also counts - the test is now thread_allocated >= thread_allocated_threshold_fast (or the real threshold in the slow path). The reason is that when thread_allocated_threshold_fast is set to 0, we want thread_allocated >= thread_allocated_threshold_fast to always be true, so that we can fall back to the slow path.

      • Regarding the fifth bullet on the need of increased delaying due to the delayed incrementation of thread_allocated: the issue is now resolved in a cleaner and more intuitive way, via the use of the thread_allocated_last_event counter (which is a nice side byproduct of it, since it's not designed for this purpose in the first place).

      • Regarding the sixth bullet on the update flag being no longer necessary for prof_sample_check(): the logic is now even simpler - there's no need for the prof_sample_check() function at all, since the thread_event() call has adjusted the wait time if and only if an event was triggered, so we only need to check the remaining wait time rather than doing any further comparison.
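    The comparison change described above (>= instead of <, with a threshold of 0 forcing the slow path) can be sketched as follows (names mirror the description; this is a simplification, not the actual jemalloc code):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Per-thread counters, as described above (simplified). */
    typedef struct {
        uint64_t thread_allocated;                /* bytes allocated so far */
        uint64_t thread_allocated_threshold_fast; /* next event trigger point */
    } tsd_t;

    /* Fast-path check: trigger when thread_allocated >= threshold.  With a
     * strict comparison, a threshold of 0 could never force the check to
     * succeed unconditionally; with '>=', threshold == 0 makes it always
     * true, so the thread reliably falls back to the slow path. */
    static int event_triggered(const tsd_t *tsd) {
        return tsd->thread_allocated >= tsd->thread_allocated_threshold_fast;
    }

    int main(void) {
        tsd_t tsd = { .thread_allocated = 0,
                      .thread_allocated_threshold_fast = 4096 };

        tsd.thread_allocated += 1024;
        assert(!event_triggered(&tsd));  /* 1024 < 4096: no event yet */

        tsd.thread_allocated += 3072;
        assert(event_triggered(&tsd));   /* 4096 >= 4096: equal case counts */

        tsd.thread_allocated_threshold_fast = 0;  /* force slow path */
        assert(event_triggered(&tsd));   /* always true when threshold is 0 */

        printf("ok\n");
        return 0;
    }
    ```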

    optimization feature 
    opened by yinan1048576 25
  • Long max_wait_ns in jemalloc status.

    Hi All,

    I described this issue on gitter and was directed to raise it here.

    The actual problem:

    We are using jemalloc version 5.0.1 in our multi-threaded file parsing application. In the process of improving our application throughput we noticed the below stack trace under gdb.

    #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
    #1  0x00007f0afdb91dbd in __GI___pthread_mutex_lock (mutex=0x7f0ae985b568) at ../nptl/pthread_mutex_lock.c:80
    #2  0x00007f0afe481088 in je_malloc_mutex_lock (mutex=0x7f0ae985b568) at include/jemalloc/internal/mutex.h:77
    #3  je_tcache_bin_flush_small (tbin=0x7f0ab3806088, binind=3, rem=4, tcache=0x7f0ab3806000) at src/tcache.c:105
    #4  0x00007f0afe481bfd in je_tcache_event_hard (tcache=0x7f0ab3806000) at src/tcache.c:39
    #5  0x00007f0afe457104 in je_tcache_event (tcache=0x7f0ab3806000) at include/jemalloc/internal/tcache.h:271
    #6  je_tcache_alloc_large (size=<optimized out>, tcache=<optimized out>, zero=<optimized out>) at include/jemalloc/internal/tcache.h:384
    #7  je_arena_malloc (zero=false, size=<optimized out>, arena=0x0, try_tcache=true) at include/jemalloc/internal/arena.h:969
    #8  je_imalloct (arena=0x0, try_tcache=true, size=<optimized out>) at include/jemalloc/internal/jemalloc_internal.h:771
    #9  je_imalloc (size=<optimized out>) at include/jemalloc/internal/jemalloc_internal.h:780
    #10 malloc (size=<optimized out>) at src/jemalloc.c:929

    Later we noticed that the max_wait_ns value was close to 0.3 seconds in the jemalloc stats, and this could be a potential cause of performance degradation.

    ___ Begin jemalloc statistics ___
    Version: 5.0.1-0-g896ed3a8b3f41998d4fb4d625d30ac63ef2d51fb
    Assertions disabled
    config.malloc_conf: ""
    Run-time option settings:
      opt.abort: false
      opt.abort_conf: false
      opt.retain: true
      opt.dss: "secondary"
      opt.narenas: 16
      opt.percpu_arena: "disabled"
      opt.background_thread: false (background_thread: false)
      opt.dirty_decay_ms: 10000 (arenas.dirty_decay_ms: 10000)
      opt.muzzy_decay_ms: 10000 (arenas.muzzy_decay_ms: 10000)
      opt.junk: "false"
      opt.zero: false
      opt.tcache: true
      opt.lg_tcache_max: 15
      opt.stats_print: true
      opt.stats_print_opts: ""
    Arenas: 16
    Quantum size: 16
    Page size: 4096
    Maximum thread-cached size class: 32768
    Allocated: 9712028528, active: 9955016704, metadata: 66047272, resident: 10031435776, mapped: 10083471360, retained: 1786408960
                               n_lock_ops       n_waiting      n_spin_acq  n_owner_switch   total_wait_ns     max_wait_ns  max_n_thds
    background_thread:               2810               0               0            2123               0               0           0
    ctl:                            76341             315               2            2108     12339917843       215998561           6
    prof:                               0               0               0               0               0               0           0
    Merged arenas stats:
    assigned threads: 24
    uptime: 389717405659
    dss allocation precedence: N/A
    decaying:  time       npages       sweeps     madvises       purged
       dirty:   N/A         2689         4411        18713       591223
       muzzy:   N/A            0            0            0            0
                                allocated      nmalloc      ndalloc    nrequests
    small:                     2713926512    102075050     89041810   8310924619
    large:                     6998102016       222174       166637       222174
    total:                     9712028528    102297224     89208447   8311146793
    active:                    9955016704
    mapped:                   10083471360
    retained:                  1786408960
    base:                        65356320
    internal:                      690952
    tcache:                       3343072
    resident:                 10031435776
                               n_lock_ops       n_waiting      n_spin_acq  n_owner_switch   total_wait_ns     max_wait_ns  max_n_thds
    large:                         161184             417               0           44538     33959773925       271998189           1
    extent_avail:                  407258               7               4           44551       127999148        67999548           1
    extents_dirty:                2683031            2012              61           54134     94579370305       255998296           1
    extents_muzzy:                 252944              19               1           40281               0               0           1
    extents_retained:              480595              33               3           42893      1291991401       139999069           1
    decay_dirty:                   170619               4               6           48972       335997762       119999201           1
    decay_muzzy:                   156830               0               3           48940               0               0           0
    base:                          319374               0               0           40081               0               0           0
    tcache_list:                    44446               0               0           33755               0               0           0
    bins:           size ind    allocated      nmalloc      ndalloc    nrequests      curregs     curslabs regs pgs  util       nfills     nflushes     newslabs      reslabs   n_lock_ops    n_waiting   n_spin_acq  total_wait_ns  max_wait_ns
                       8   0      9174768      4906777      3759931     36601392      1146846         2306  512   1 0.971      1004693       406832         4085       234510      1461810         2239           15   180946795463    467996886
                      16   1      7205120      4817615      4367295     32557685       450320         1822  256   1 0.965      2105395       415422         3099       549408      2569615         1948           24   154846969211    443997045
                      32   2    116546176     14673352     11031284    157879482      3642068        28549  128   1 0.996      3830591       663205        61628      1105934      4633249         3803           19   324037842967    367997551
                      48   3    119091024     21117092     18636029    146136133      2481063         9757  256   3 0.993      4776481       675422        31438       818636      5550373         2310           22   175370832583    387997418
                      64   4    173088448      8406224      5701717   3656045793      2704507        42320   64   1 0.998      2346330       478115        84551      1093944      2996245         1159           20    86831421953    351997657
                      80   5    128189680      6971937      5369566    106557284      1602371         6326  256   5 0.989      2639375       597101        11751       663173      3300550         1751           19   141603057405    379997470
                      96   6      3019200      3390002      3358552     23937634        31450          262  128   3 0.937      2327388       511514          346       687947      2884782          978           25    77851481768    283998109
                     112   7       269696       182457       180049      1175585         2408           15  256   7 0.627       176066        84788         1301         4039       307860          257            2    16923887325    199998668
                     128   8      2589824       261022       240789       447452        20233          663   32   1 0.953       200371        93323         2055        74910       342346          361           15    20267865081    255998296
                     160   9      1150880       102178        94985      3362154         7193           67  128   5 0.838        91210        45200          204        29311       181170          148            1     9171938926    127999148
                     192  10       229632        32430        31234      4559539         1196           35   64   3 0.533        11140         6582           52         2082        66033           96            3     3019979894    327997817
                     224  11     20437536       225910       134671      1162590        91239          729  128   7 0.977        52299        22166         1147        16553       120472           79            2     3583976143    111999255
                     256  12      3653888       130131       115858     13279718        14273          936   16   1 0.953        90379        33400         2894        65328       173060           93            4     4943967090    123999175
                     320  13      3213440       114813       104771     26059425        10042          182   64   5 0.862        95457        30952          240        36418       171252           40            2     2047986367    235998430
                     384  14       905856        22560        20201      1410778         2359           92   32   3 0.801        16780         6798          145         6921        68222           31            1     1295991369    127999148
                     448  15      1025024      1142867      1140579     50540401         2288           48   64   7 0.744       928728       455633           95       319196      1428922         1530            6   126807155844    375997497
                     512  16      1353728     34435026     34432382   3395765781         2644          377    8   1 0.876      3476467      3446590       770446     18050069      8507991         1339           37    81695456124    199998668
                     640  17      1716480         7842         5160      2634044         2682          104   32   5 0.805         2258         1301          156          544        48186           12            2      499996672    131999121
                     768  18      1994496         7740         5143      6693346         2597          183   16   3 0.886         2878         1693          323          629        49453           20            0      575996164    115999227
                     896  19      1291136         7314         5873    270189198         1441           65   32   7 0.692         3130         1769           97          991        49449           20            1      611995925    111999254
                    1024  20     14321664       190116       176130     21647683        13986         3560    4   1 0.982       118507        21516        23534        69129       228861          476           17    13303911434    375997498
                    1280  21     25966080        50427        30141     44317280        20286         1291   16   5 0.982         8026         2866         2609         2081        59238           20            0      447997016     55999627
                    1536  22      7971840        17197        12007        44602         5190          689    8   3 0.941         5572         2605         1404         3909        55584          160            6     3507976648    127999148
                    1792  23      6228992        37802        34326       191055         3476          247   16   7 0.879        32107        30131          385         1154       107241          165            0    12707915409    251998323
                    2048  24     18716672        17880         8741       204504         9139         4590    2   1 0.995         6315         3215         7715         3101        64821           20            0      655995630     59999601
                    2560  25   1888980480       744621         6738      5536396       737883        92250    8   5 0.999       279834         1608        92822          991       419264          398            5    15515896716    131999121
                    3072  26     14505984         9670         4948     20025486         4722         1220    4   3 0.967         2709         1411         2005         1061        51334           22            2      535996433    111999254
                    3584  27      7741440         5297         3137      2398773         2160          306    8   7 0.882         1847          841          487          676        47777           11            0      355997631     91999387
                    4096  28     12161024         8915         5946    270301101         2969         2969    1   1 1             3838         2321         8915            0        65444           26            0      651995660     63999574
                    5120  29     14402560         7441         4628        21996         2813          754    4   5 0.932         3289         1946         1189         1185        51282           65            5     1255991640    127999148
                    6144  30     13910016         4551         2287        13466         2264         1164    2   3 0.972         1351          563         2031          506        49239            9            0      443997042     91999388
                    7168  31     12350464         4218         2495        10929         1723          471    4   7 0.914         1303          545          841          694        47484            8            0      347997685    111999255
                    8192  32     17301504         9633         7521      9182132         2112         2112    1   2 1             6583         5561         9633            0        73718           60            1     2635982456    119999202
                   10240  33     23541760         4836         2537        13072         2299         1184    2   5 0.970         1797          798         2000          699        49865           21            2     1027993156    119999201
                   12288  34     19795968         3673         2062        10838         1611         1611    1   3 1             1412          505         3673            0        52073            6            0       91999387     59999600
                   14336  35     19884032         3484         2097         9892         1387          722    2   7 0.960         1439          608         1484          593        48712           17            0      159998934     35999761
    large:          size ind    allocated      nmalloc      ndalloc    nrequests  curlextents
                   16384  36     30670848       118412       116540       129671         1872
                   20480  37     32071680         3309         1743        10069         1566
                   24576  38     15704064         1845         1206        19239          639
                   28672  39     24457216         1965         1112         7376          853
                   32768...
    
    question 
    opened by sridbv 25
  • Provide a way to decommit instead of purge

    Provide a way to decommit instead of purge

    There are reasons one may want to decommit (which currently does nothing). You can implement commit/decommit with chunk_hooks currently, but that approach has a non-negligible performance impact. I found a way to achieve a decent equivalent with less performance impact by making purge actually decommit instead of purging (and keeping chunk_hooks_t.decommit doing nothing). In fact, it looks like it even improves(!) performance on some of our Firefox benchmarks (which would mean that in some cases we're better off with MEM_DECOMMIT than MEM_RESET).

    But that fails because chunks marked as purged are not committed before they are used. (There's also huge alloc shrinkage, which will try to memset without committing.)

    I kind of worked around the issue by changing the flags set in arena_purge_stashed to use CHUNK_MAP_DECOMMITTED instead of CHUNK_MAP_UNZEROED. That's kind of gross and obviously doesn't solve the problem with memset for huge alloc shrinkage.

    Now the question is how can we properly hook this?

    Kind of relatedly, I'd like for arena_purge_all to be able to go through all previously purged chunks again, and it seems this could be achieved by setting the flags to something other than CHUNK_MAP_UNZEROED too. Seems there's an opportunity to kill two birds with one stone.

    opened by glandium 24
  • [WiP] Use volatile asm in benchmarks to prevent pointers optimized away

    [WiP] Use volatile asm in benchmarks to prevent pointers optimized away

    To further prevent pointers from being optimized away like in issue #2356, use asm volatile as suggested by @davidtgoldblatt in PR #2359.

    The PR includes:

    • Check whether volatile asm is available
    • If available, use asm volatile in the benchmarks
    • Fix the flawed rdtscp availability detection.
    • Add the JEMALLOC prefix to the macro to follow the naming convention.
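
    The pattern the PR describes can be sketched as follows (the macro name here is illustrative, not jemalloc's actual one):

```c
#include <assert.h>
#include <stdlib.h>

/* The empty asm statement consumes the pointer and clobbers memory, so
 * the compiler cannot prove the allocation unused and delete the
 * malloc/free pair. */
#if defined(__GNUC__) || defined(__clang__)
#  define BENCH_USE_PTR(p) __asm__ volatile("" : : "r"(p) : "memory")
#else
#  define BENCH_USE_PTR(p) ((void)(p))  /* fallback when asm is unavailable */
#endif

/* Toy benchmark loop: without the barrier, an optimizer may remove the
 * loop body entirely since the pointers are otherwise unobserved. */
static int bench_iterations(int n) {
    int done = 0;
    for (int i = 0; i < n; i++) {
        void *p = malloc(64);
        if (p == NULL) break;
        BENCH_USE_PTR(p);  /* keep the allocation observable */
        free(p);
        done++;
    }
    return done;
}
```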
    opened by guangli-dai 0
  • JEMalloc 5.3.0 is 58% (!) slower than JEMalloc 4.5 (for us)

    JEMalloc 5.3.0 is 58% (!) slower than JEMalloc 4.5 (for us)

    We are porting to ARM and we've found that JEMalloc 4.5, which we've been using for years, was causing random segmentation faults on ARM due to corruption in jemalloc. We upgraded to JEMalloc 5.3.0 and our ARM corruption disappeared!

    Unfortunately, we've also discovered that our workloads are 58% slower (on x64) with JEMalloc 5.3 than with JEMalloc 4.5. I have configured a Git branch of our product where the only difference between the two builds is the switch from JEMalloc 4.5 to JEMalloc 5.3. We performed multiple runs, and all tests show this slowdown. For the test I show here we run for about 15 minutes (note this is on bare-metal hosts, not VMs/containers/AWS instances; each has 98G RAM and 48 threads on Xeon Silver 4214R CPUs).

    As best as I can tell we are configuring both old and new JEMalloc the same way.

    On JEMalloc 4.5 our product runs an average of 19,694 operations per second for this workload, and with JEMalloc 5.3 it runs an average of 8,359 operations per second (these are application "operations", not CPU operations).

    I've attached stats output from 4.5 and 5.3 in case there is anything interesting to be seen there.

    Anyone have any thoughts about where to look for this issue, or next steps we should try? I did try disabling profiling just in case but that had minimal impact.

    jebase-2.txt : Stats for JEMalloc 4.5 jenew-2.txt : Stats for JEMalloc 5.3

    opened by madscientist 4
  • Heap profiling uses 16x more memory on ARM64, than it does on x64

    Heap profiling uses 16x more memory on ARM64, than it does on x64

    We have a program that attempts to keep memory usage to a value specified by the user: it fills its cache until the memory limit is reached, then frees memory. If the system cannot release enough memory within a certain amount of time (default 5 minutes), it kills itself. To determine the amount of memory in use, we first call mallctlbymib() with "epoch", then call mallctlbymib() with "stats.allocated". We are using JEmalloc 5.3 on Linux x64 and this works fine: the system runs in a stable state for a very long time.

    We are now porting to Linux ARM64, and on ARM we find that jemalloc is not able to free enough memory to stay under the limit. For example if we set the memory limit to 2G on a 16G system it runs forever using about 2G of RAM on x64, but on ARM after a while no matter how much cache we try to free, we can't get "stats.allocated" down and we kill ourselves.

    Just to note, we compile jemalloc with our code and link it statically. We invoke configure with --without-export --disable-doc --enable-prof on both platforms, but no other options.

    I've attached two sets of jemalloc stats output: one for our x64 run and the other for our ARM run. Since they work at slightly different speeds and the x64 test runs to completion while the ARM test fails partway through, they are not fully apples-to-apples. But you can see that on ARM the "mapped" and "retained" values are significantly larger than on x64.

    I do notice that the page size on x64 is 4k while the page size on ARM is 64k. But, should that matter? Doesn't that just mean ARM reserves fewer pages and breaks the 64k up into more blocks?
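
    One reason the page size matters: jemalloc manages extents and purging in whole pages, so any region that stays live pins at least one page, 4 KiB on the x64 host versus 64 KiB on ARM. A minimal sketch of that rounding (the function name is illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Round a live region up to the whole pages it pins.  With 64 KiB
 * pages the minimum footprint of any live region is 16x that of a
 * 4 KiB-page system. */
static size_t page_footprint(size_t request, size_t page_size) {
    return (request + page_size - 1) / page_size * page_size;
}
```

    For a 100-byte live region this yields 4096 bytes with 4 KiB pages but 65536 with 64 KiB pages; many such regions scattered across extents would plausibly inflate the "mapped" and "retained" values seen on ARM.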

    Anyone with any insight as to why this is happening, please pass along! I'm not sure where to go from here.

    last-jemalloc-stats.x86.txt last-jemalloc-stats.arm.txt

    opened by madscientist 9
  • ubsan - left shift of 4095 by 20 places cannot be represented in type 'int'

    ubsan - left shift of 4095 by 20 places cannot be represented in type 'int'

    As part of debugging #2356 I have built jemalloc with GCC ubsan and it then reports

    src/jemalloc.c:3201:6: runtime error: left shift of 4095 by 20 places cannot be represented in type 'int'
    

    for quite a number of tests.

    The build was configured with

    EXTRA_CFLAGS=-fsanitize=undefined EXTRA_CXXFLAGS=-fsanitize=undefined LDFLAGS=-fsanitize=undefined ./autogen.sh
    

    gcc is gcc-12.2.1-4.fc36.x86_64
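
    For reference, the undefined behavior ubsan flags here is a left shift whose result does not fit in int: 4095 << 20 equals 4293918720, which exceeds INT_MAX (2147483647). Widening the operand before shifting makes it well defined (a generic sketch, not jemalloc's actual patch):

```c
#include <assert.h>
#include <stdint.h>

/* The flagged expression has the shape `4095 << 20`: both operands are
 * int, so the shift is performed in int, and 4095 * 2^20 = 4293918720
 * exceeds INT_MAX -- undefined behavior.  Casting the left operand to a
 * wider unsigned type first avoids the overflow. */
static uint64_t widened_shift(uint32_t val, unsigned shift) {
    return (uint64_t)val << shift;  /* shift performed in 64 bits */
}
```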

    opened by sharkcz 1
  • Add loongarch64 LG_QUANTUM size definition.

    Add loongarch64 LG_QUANTUM size definition.

    1. jemalloc fails to compile on the LoongArch architecture. Error message from building jemalloc:

       gcc -std=gnu11 -Wall -Wextra -Wsign-compare -Wundef -Wno-format-zero-length -pipe -g3 -fvisibility=hidden -O3 -funroll-loops -g -O2 -fdebug-prefix-map=/home/loongson/debian-community/jemalloc/sys-jemalloc/debian-pa/jemalloc-5.2.1=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DPIC -c -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/jemalloc.sym.o src/jemalloc.c
       In file included from include/jemalloc/internal/jemalloc_internal_types.h:4,
                        from include/jemalloc/internal/sc.h:4,
                        from include/jemalloc/internal/arena_types.h:4,
                        from include/jemalloc/internal/jemalloc_internal_includes.h:45,
                        from src/jemalloc.c:3:
       include/jemalloc/internal/quantum.h:68:6: error: #error "Unknown minimum alignment for architecture; specify via "
       include/jemalloc/internal/quantum.h:69:3: error: expected identifier or ‘(’ before string constant "--with-lg-quantum"

    2. Build environment:

       $ arch
       loongarch64
       $ gcc --version
       gcc (Loongnix 8.3.0-6.lnd.vec.33) 8.3.0
       Copyright (C) 2018 Free Software Foundation, Inc.

    Please consider this pull request, thanks.

    opened by 212dandan 0
  • Crash only with Jemalloc, ASan is clean

    Crash only with Jemalloc, ASan is clean

    Hi, I'm facing yet another case of an application that works fine but crashes with Jemalloc. I know there have been several similar issues already, but none of them helped me fix this one. Usually the problem is elsewhere and not in Jemalloc, but in this case I'm not sure where to look.

    The problem is that Kurento Media Server seems to run fine with the default system malloc, and has no memory issues detected by Address Sanitizer, but it crashes if Jemalloc is preloaded (with LD_PRELOAD). Not at first, but eventually, with enough repeated tests, it ends up crashing where no crash would happen without Jemalloc. Even weirder, this only happens when running within a Docker container, and never when I run it on my host machine.

    I've tried with my system's provided Jemalloc (v5.2.1) and with the current git dev branch. When running under GDB, the backtrace seems to be different every time, which leads me to believe it is indeed a memory issue, but one that only manifests with Jemalloc.

    Kurento uses GLib, which has its own slice allocators, so this environment variable is set: G_SLICE=always-malloc.

    An extra complication: Jemalloc's --enable-debug cannot be used. The libsigc++ library contains a bit of memory trickery that is confirmed not to leak, but it is a well-known issue that its slot_base class destructor confuses memory analyzers (see the sigc::mem_fun alloc/dealloc issue):

    I can understand that the error messages from -fsanitize=address cause concern, but I believe that they can be ignored in this case. It's not nice, but we will probably have to accept it in sigc++-2.0.

    So, if I build Jemalloc with --enable-debug (which itself activates --enable-opt-size-checks), then the issue becomes hidden because Jemalloc aborts on the false positive from sigc++.

    Address Sanitizer allows running with new_delete_type_mismatch=0, to disable reports about this particular false positive, but I cannot see anything similar for Jemalloc. Thus, I cannot run with either of --enable-debug or --enable-opt-size-checks. (I guess this could become a feature request in itself?)

    Including here a couple stack traces that happens when --enable-debug is used without and with GDB debugger:

    Stack trace 1:

    • --enable-debug: YES
    • opt-size-checks: YES
    • GDB: NO
    <jemalloc>: size mismatch detected (true size 0 vs input size 0), likely caused by application sized deallocation bugs (source address: 0x7f3f40239530, the current pointer being freed). Suggest building with --enable-debug or address sanitizer for debugging. Abort.
    Aborted (thread 139909551523392, pid 124366)
    Stack trace:
    [__pthread_kill_implementation]
    ./nptl/pthread_kill.c:44
    [__GI_raise]
    sysdeps/posix/raise.c:27
    [__GI_abort]
    ./stdlib/abort.c:81 (discriminator 21)
    [je_safety_check_fail]
    /jemalloc/src/safety_check.c:34
    [je_safety_check_fail_sized_dealloc]
    /jemalloc/src/safety_check.c:16 (discriminator 4)
    [maybe_check_alloc_ctx]
    /jemalloc/src/jemalloc.c:2929
    [isfree]
    /jemalloc/src/jemalloc.c:2986
    [je_sdallocx_default]
    /jemalloc/src/jemalloc.c:3995
    [je_je_sdallocx_noflags]
    /jemalloc/src/jemalloc.c:4019
    [sizedDeleteImpl(void*, unsigned long)]
    /jemalloc/src/jemalloc_cpp.cpp:201
    [operator delete(void*, unsigned long)]
    /jemalloc/src/jemalloc_cpp.cpp:207
    [sigc::internal::signal_impl::notify(void*)]
    /usr/include/c++/9/ext/new_allocator.h:128
    [std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()]
    /usr/include/c++/9/bits/shared_ptr_base.h:729
    0x55964f6c224c at /kurento/build-Debug/kurento-media-server/server/kurento-media-server
    0x55964f62f909 at /kurento/build-Debug/kurento-media-server/server/kurento-media-server
    [std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()]
    /usr/include/c++/9/bits/shared_ptr_base.h:167
    Aborted (core dumped)
    

    Stack trace 2:

    • --enable-debug: YES
    • opt-size-checks: YES
    • GDB: YES
    <jemalloc>: size mismatch detected (true size 0 vs input size 0), likely caused by application sized deallocation bugs (source address: 0x7fffe6413810, the current pointer being freed). Suggest building with --enable-debug or address sanitizer for debugging. Abort.
    --Type <RET> for more, q to quit, c to continue without paging--c
    
    Thread 15 "kurento-media-s" received signal SIGABRT, Aborted.
    [Switching to Thread 0x7fffebcd3640 (LWP 126681)]
    __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737149482560) at ./nptl/pthread_kill.c:44
    44      ./nptl/pthread_kill.c: No such file or directory.
    (gdb) bt
    #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737149482560) at ./nptl/pthread_kill.c:44
    #1  __pthread_kill_internal (signo=6, threadid=140737149482560) at ./nptl/pthread_kill.c:78
    #2  __GI___pthread_kill (threadid=140737149482560, [email protected]=6) at ./nptl/pthread_kill.c:89
    #3  0x00007ffff6cfc476 in __GI_raise ([email protected]=6) at ../sysdeps/posix/raise.c:26
    #4  0x00007ffff6ce27f3 in __GI_abort () at ./stdlib/abort.c:79
    #5  0x00007ffff7d2aec2 in je_safety_check_fail
        (format=0x7ffff7d6b750 "<jemalloc>: size mismatch detected (true size %zu vs input size %zu), likely caused by application sized deallocation bugs (source address: %p, %s). Suggest building with --enable-debug or address san"...) at src/safety_check.c:32
    #6  0x00007ffff7d2adbb in je_safety_check_fail_sized_dealloc (current_dealloc=true, ptr=0x7fffe6413810, true_size=0, input_size=0)
        at src/safety_check.c:11
    #7  0x00007ffff7c72d23 in maybe_check_alloc_ctx (tsd=0x7fffebcd2be8, ptr=0x7fffe6413810, alloc_ctx=0x7fffebcd0090) at src/jemalloc.c:2925
    #8  0x00007ffff7c72ff4 in isfree (tsd=0x7fffebcd2be8, ptr=0x7fffe6413810, usize=48, tcache=0x7fffebcd2f40, slow_path=true) at src/jemalloc.c:2986
    #9  0x00007ffff7c7695a in je_sdallocx_default (ptr=0x7fffe6413810, size=48, flags=0) at src/jemalloc.c:3993
    #10 0x00007ffff7c76c2a in je_je_sdallocx_noflags (ptr=0x7fffe6413810, size=48) at src/jemalloc.c:4016
    #11 0x00007ffff7d522b4 in sizedDeleteImpl(void*, std::size_t) (ptr=0x7fffe6413810, size=48) at src/jemalloc_cpp.cpp:201
    #12 0x00007ffff7d522e0 in operator delete(void*, unsigned long) (ptr=0x7fffe6413810, size=48) at src/jemalloc_cpp.cpp:206
    #13 0x00007ffff711cb18 in __gnu_cxx::new_allocator<std::_List_node<sigc::slot_base> >::destroy<sigc::slot_base>(sigc::slot_base*)
        (this=0x7fffe64451c8, __p=0x7fffe640bcb0) at /usr/include/c++/9/ext/new_allocator.h:153
    #14 std::allocator_traits<std::allocator<std::_List_node<sigc::slot_base> > >::destroy<sigc::slot_base>(std::allocator<std::_List_node<sigc::slot_base> >&, sigc::slot_base*) (__a=..., __p=0x7fffe640bcb0) at /usr/include/c++/9/bits/alloc_traits.h:497
    #15 std::__cxx11::list<sigc::slot_base, std::allocator<sigc::slot_base> >::_M_erase(std::_List_iterator<sigc::slot_base>)Python Exception <class 'AttributeError'> 'NoneType' object has no attribute 'pointer':
    
        (__position=, this=0x7fffe64451c8) at /usr/include/c++/9/bits/stl_list.h:1921
    #16 std::__cxx11::list<sigc::slot_base, std::allocator<sigc::slot_base> >::erase(std::_List_const_iterator<sigc::slot_base>)Python Exception <class 'AttributeError'> 'NoneType' object has no attribute 'pointer':
    
        (__position=, this=0x7fffe64451c8) at /usr/include/c++/9/bits/list.tcc:158
    #17 sigc::internal::signal_impl::notify(void*) (d=0x7fffe981d0d0) at signal_base.cc:169
    #18 0x00007ffff78ea546 in kurento::EventHandler::~EventHandler() (this=0x7fffe9806c60, __in_chrg=<optimized out>)
        at /kurento/kms-core/src/server/implementation/EventHandler.cpp:44
    

    Notice how the issue happens in the context of sigc::slot_base, which is the false positive I mentioned above.

    Stack trace 3:

    • --enable-debug: YES
    • opt-size-checks: NO (had to edit file jemalloc_preamble.h.in to force set it to false so it does not get enabled)
    • GDB: NO
    <jemalloc>: include/jemalloc/internal/arena_inlines_b.h:462: Failed assertion: "alloc_ctx.szind == edata_szind_get(edata)"
    Aborted (thread 140330790446656, pid 143096)
    Stack trace:
    [__pthread_kill_implementation]
    ./nptl/pthread_kill.c:44
    [__GI_raise]
    sysdeps/posix/raise.c:27
    [__GI_abort]
    ./stdlib/abort.c:81 (discriminator 21)
    [arena_sdalloc]
    /jemalloc/include/jemalloc/internal/arena_inlines_b.h:463
    [isdalloct]
    /jemalloc/include/jemalloc/internal/jemalloc_internal_inlines_c.h:134
    [isfree]
    /jemalloc/src/jemalloc.c:3010
    [je_sdallocx_default]
    /jemalloc/src/jemalloc.c:3995
    [je_je_sdallocx_noflags]
    /jemalloc/src/jemalloc.c:4019
    [sizedDeleteImpl(void*, unsigned long)]
    /jemalloc/src/jemalloc_cpp.cpp:201
    [operator delete(void*, unsigned long)]
    /jemalloc/src/jemalloc_cpp.cpp:207
    [sigc::internal::signal_impl::notify(void*)]
    /usr/include/c++/9/ext/new_allocator.h:128
    [std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()]
    /usr/include/c++/9/bits/shared_ptr_base.h:729
    0x560978fa324c at /kurento/build-Debug/kurento-media-server/server/kurento-media-server
    0x560978f10909 at /kurento/build-Debug/kurento-media-server/server/kurento-media-server
    [std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()]
    /usr/include/c++/9/bits/shared_ptr_base.h:167
    [std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_data() const]
    /usr/include/c++/9/bits/basic_string.h:191
    Aborted (core dumped)
    

    All these are to show how the false positive hides the actual issue I wanted to investigate... so maybe a mechanism to disable these checks would be helpful in Jemalloc.

    Meanwhile, I just wanted to report this with a detailed explanation, and ask if any idea comes up, as I haven't been able to find the issue that makes it crash when Jemalloc is in use.

    In any case, thanks for the effort of this project!

    opened by j1elo 4
Releases(5.3.0)
  • 5.3.0(May 6, 2022)

    This release contains many speed and space optimizations, from micro optimizations on common paths to rework of internal data structures and locking schemes, and many more too detailed to list below. Multiple percent of system level metric improvements were measured in tested production workloads. The release has gone through large-scale production testing.

    New features:

    • Add the thread.idle mallctl which hints that the calling thread will be idle for a nontrivial period of time. (@davidtgoldblatt)
    • Allow small size classes to be the maximum size class to cache in the thread-specific cache, through the opt.[lg_]tcache_max option. (@interwq, @jordalgo)
    • Make the behavior of realloc(ptr, 0) configurable with opt.zero_realloc. (@davidtgoldblatt)
    • Add make uninstall support. (@sangshuduo, @Lapenkov)
    • Support C++17 over-aligned allocation. (@marksantaniello)
    • Add the thread.peak mallctl for approximate per-thread peak memory tracking. (@davidtgoldblatt)
    • Add interval-based stats output opt.stats_interval. (@interwq)
    • Add prof.prefix to override filename prefixes for dumps. (@zhxchen17)
    • Add high resolution timestamp support for profiling. (@tyroguru)
    • Add the --collapsed flag to jeprof for flamegraph generation. (@igorwwwwwwwwwwwwwwwwwwww)
    • Add the --debug-syms-by-id option to jeprof for debug symbols discovery. (@DeannaGelbart)
    • Add the opt.prof_leak_error option to exit with error code when leak is detected using opt.prof_final. (@yunxuo)
    • Add opt.cache_oblivious as a runtime alternative to config.cache_oblivious. (@interwq)
    • Add mallctl interfaces:
      • opt.zero_realloc (@davidtgoldblatt)
      • opt.cache_oblivious (@interwq)
      • opt.prof_leak_error (@yunxuo)
      • opt.stats_interval (@interwq)
      • opt.stats_interval_opts (@interwq)
      • opt.tcache_max (@interwq)
      • opt.trust_madvise (@azat)
      • prof.prefix (@zhxchen17)
      • stats.zero_reallocs (@davidtgoldblatt)
      • thread.idle (@davidtgoldblatt)
      • thread.peak.{read,reset} (@davidtgoldblatt)

    Bug fixes:

    • Fix the synchronization around explicit tcache creation which could cause invalid tcache identifiers. This regression was first released in 5.0.0. (@yoshinorim, @davidtgoldblatt)
    • Fix a profiling biasing issue which could cause incorrect heap usage and object counts. This issue existed in all previous releases with the heap profiling feature. (@davidtgoldblatt)
    • Fix the order of stats counter updating on large realloc which could cause failed assertions. This regression was first released in 5.0.0. (@azat)
    • Fix the locking on the arena destroy mallctl, which could cause concurrent arena creations to fail. This functionality was first introduced in 5.0.0. (@interwq)

    Portability improvements:

    • Remove nothrow from system function declarations on macOS and FreeBSD. (@davidtgoldblatt, @fredemmott, @leres)
    • Improve overcommit and page alignment settings on NetBSD. (@zoulasc)
    • Improve CPU affinity support on BSD platforms. (@devnexen)
    • Improve utrace detection and support. (@devnexen)
    • Improve QEMU support with MADV_DONTNEED zeroed pages detection. (@azat)
    • Add memcntl support on Solaris / illumos. (@devnexen)
    • Improve CPU_SPINWAIT on ARM. (@AWSjswinney)
    • Improve TSD cleanup on FreeBSD. (@Lapenkov)
    • Disable percpu_arena if the CPU count cannot be reliably detected. (@azat)
    • Add malloc_size(3) override support. (@devnexen)
    • Add mmap VM_MAKE_TAG support. (@devnexen)
    • Add support for MADV_[NO]CORE. (@devnexen)
    • Add support for DragonFlyBSD. (@devnexen)
    • Fix the QUANTUM setting on MIPS64. (@brooksdavis)
    • Add the QUANTUM setting for ARC. (@vineetgarc)
    • Add the QUANTUM setting for LoongArch. (@wangjl-uos)
    • Add QNX support. (@jqian-aurora)
    • Avoid atexit(3) calls unless the relevant profiling features are enabled. (@BusyJay, @laiwei-rice, @interwq)
    • Fix unknown option detection when using Clang. (@Lapenkov)
    • Fix symbol conflict with musl libc. (@georgthegreat)
    • Add -Wimplicit-fallthrough checks. (@nickdesaulniers)
    • Add __forceinline support on MSVC. (@santagada)
    • Improve FreeBSD and Windows CI support. (@Lapenkov)
    • Add CI support for PPC64LE architecture. (@ezeeyahoo)

    Incompatible changes:

    • Maximum size class allowed in tcache (opt.[lg_]tcache_max) now has an upper bound of 8MiB. (@interwq)

    Optimizations and refactors (@davidtgoldblatt, @Lapenkov, @interwq):

    • Optimize the common cases of the thread cache operations.
    • Optimize internal data structures, including RB tree and pairing heap.
    • Optimize the internal locking on extent management.
    • Extract and refactor the internal page allocator and interface modules.

    Documentation:

    • Fix doc build with --with-install-suffix. (@lawmurray, @interwq)
    • Add PROFILING_INTERNALS.md. (@davidtgoldblatt)
    • Ensure the proper order of doc building and installation. (@Mingli-Yu)
    Source code(tar.gz)
    Source code(zip)
    jemalloc-5.3.0.tar.bz2(718.77 KB)
  • 5.2.1(Aug 5, 2019)

    This release is primarily about Windows. A critical virtual memory leak is resolved on all Windows platforms. The regression was present in all releases since 5.0.0.

    Bug fixes:

    • Fix a severe virtual memory leak on Windows. This regression was first released in 5.0.0. (@Ignition, @j0t, @frederik-h, @davidtgoldblatt, @interwq)
    • Fix size 0 handling in posix_memalign(). This regression was first released in 5.2.0. (@interwq)
    • Fix the prof_log unit test which may observe unexpected backtraces from compiler optimizations. The test was first added in 5.2.0. (@marxin, @gnzlbg, @interwq)
    • Fix the declaration of the extent_avail tree. This regression was first released in 5.1.0. (@zoulasc)
    • Fix an incorrect reference in jeprof. This functionality was first released in 3.0.0. (@prehistoric-penguin)
    • Fix an assertion on the deallocation fast-path. This regression was first released in 5.2.0. (@yinan1048576)
    • Fix the TLS_MODEL attribute in headers. This regression was first released in 5.0.0. (@zoulasc, @interwq)

    Optimizations and refactors:

    • Implement opt.retain on Windows and enable by default on 64-bit. (@interwq, @davidtgoldblatt)
    • Optimize away a branch on the operator delete path. (@mgrice)
    • Add format annotation to the format generator function. (@zoulasc)
    • Refactor and improve the size class header generation. (@yinan1048576)
    • Remove best fit. (@djwatson)
    • Avoid blocking on background thread locks for stats. (@oranagra, @interwq)
    Source code(tar.gz)
    Source code(zip)
    jemalloc-5.2.1.tar.bz2(541.28 KB)
  • 5.2.0(Apr 3, 2019)

    This release includes a few notable improvements, which are summarized below: 1) improved fast-path performance from the optimizations by @djwatson; 2) reduced virtual memory fragmentation and metadata usage; and 3) bug fixes on setting the number of background threads. In addition, peak / spike memory usage is improved with certain allocation patterns. As usual, the release and prior dev versions have gone through large-scale production testing.

    New features:

    • Implement oversize_threshold, which uses a dedicated arena for allocations crossing the specified threshold to reduce fragmentation. (@interwq)
    • Add extents usage information to stats. (@tyleretzel)
    • Log time information for sampled allocations. (@tyleretzel)
    • Support 0 size in sdallocx. (@djwatson)
    • Output rate for certain counters in malloc_stats. (@zinoale)
    • Add configure option --enable-readlinkat, which allows the use of readlinkat over readlink. (@davidtgoldblatt)
    • Add configure options --{enable,disable}-{static,shared} to allow not building unwanted libraries. (@Ericson2314)
    • Add configure option --disable-libdl to enable fully static builds. (@interwq)
    • Add mallctl interfaces:
      • opt.oversize_threshold (@interwq)
      • stats.arenas.<i>.extent_avail (@tyleretzel)
      • stats.arenas.<i>.extents.<j>.n{dirty,muzzy,retained} (@tyleretzel)
      • stats.arenas.<i>.extents.<j>.{dirty,muzzy,retained}_bytes (@tyleretzel)

    Portability improvements:

    • Update MSVC builds. (@maksqwe, @rustyx)
    • Workaround a compiler optimizer bug on s390x. (@rkmisra)
    • Make use of pthread_set_name_np(3) on FreeBSD. (@trasz)
    • Implement malloc_getcpu() to enable percpu_arena for windows. (@santagada)
    • Link against -pthread instead of -lpthread. (@paravoid)
    • Make background_thread not dependent on libdl. (@interwq)
    • Add stringify to fix a linker directive issue on MSVC. (@daverigby)
    • Detect and fall back when 8-bit atomics are unavailable. (@interwq)
    • Fall back to the default pthread_create(3) if dlsym(3) fails. (@interwq)

    Optimizations and refactors:

    • Refactor the TSD module. (@davidtgoldblatt)
    • Avoid taking extents_muzzy mutex when muzzy is disabled. (@interwq)
    • Avoid taking large_mtx for auto arenas on the tcache flush path. (@interwq)
    • Optimize ixalloc by avoiding a size lookup. (@interwq)
    • Implement opt.oversize_threshold which uses a dedicated arena for requests crossing the threshold, also eagerly purges the oversize extents. Default the threshold to 8 MiB. (@interwq)
    • Clean compilation with -Wextra. (@gnzlbg, @jasone)
    • Refactor the size class module. (@davidtgoldblatt)
    • Refactor the stats emitter. (@tyleretzel)
    • Optimize pow2_ceil. (@rkmisra)
    • Avoid runtime detection of lazy purging on FreeBSD. (@trasz)
    • Optimize mmap(2) alignment handling on FreeBSD. (@trasz)
    • Improve error handling for THP state initialization. (@jsteemann)
    • Rework the malloc() fast path. (@djwatson)
    • Rework the free() fast path. (@djwatson)
    • Refactor and optimize the tcache fill / flush paths. (@djwatson)
    • Optimize sync / lwsync on PowerPC. (@chmeeedalf)
    • Bypass extent_dalloc() when retain is enabled. (@interwq)
    • Optimize the locking on large deallocation. (@interwq)
    • Reduce the number of pages committed from sanity checking in debug build. (@trasz, @interwq)
    • Deprecate OSSpinLock. (@interwq)
    • Lower the default number of background threads to 4 (when the feature is enabled). (@interwq)
    • Optimize the trylock spin wait. (@djwatson)
    • Use arena index for arena-matching checks. (@interwq)
    • Avoid forced decay on thread termination when using background threads. (@interwq)
    • Disable muzzy decay by default. (@djwatson, @interwq)
    • Only initialize libgcc unwinder when profiling is enabled. (@paravoid, @interwq)

    Bug fixes (all only relevant to jemalloc 5.x):

    • Fix background thread index issues with max_background_threads. (@djwatson, @interwq)
    • Fix stats output for opt.lg_extent_max_active_fit. (@interwq)
    • Fix opt.prof_prefix initialization. (@davidtgoldblatt)
    • Properly trigger decay on tcache destroy. (@interwq, @amosbird)
    • Fix tcache.flush. (@interwq)
    • Detect whether explicit extent zero out is necessary with huge pages or custom extent hooks, which may change the purge semantics. (@interwq)
    • Fix a side effect caused by extent_max_active_fit combined with decay-based purging, where freed extents can accumulate and not be reused for an extended period of time. (@interwq, @mpghf)
    • Fix a missing unlock on extent register error handling. (@zoulasc)

    Testing:

    • Simplify the Travis script output. (@gnzlbg)
    • Update the test scripts for FreeBSD. (@devnexen)
    • Add unit tests for the producer-consumer pattern. (@interwq)
    • Add Cirrus-CI config for FreeBSD builds. (@jasone)
    • Add size-matching sanity checks on tcache flush. (@davidtgoldblatt, @interwq)

    Incompatible changes:

    • Remove --with-lg-page-sizes. (@davidtgoldblatt)

    Documentation:

    • Attempt to build docs by default, however skip doc building when xsltproc is missing. (@interwq, @cmuellner)
    Source code(tar.gz)
    Source code(zip)
    jemalloc-5.2.0.tar.bz2(531.14 KB)
  • 5.1.0(May 8, 2018)

    This release is primarily about fine-tuning, ranging from several new features to numerous notable performance and portability enhancements. The release and prior dev versions have been running in multiple large scale applications for months, and the cumulative improvements are substantial in many cases.

    Given the long and successful production runs, this release is likely a good candidate for applications to upgrade, from both jemalloc 5.0 and before. For performance-critical applications, the newly added TUNING.md provides guidelines on jemalloc tuning.
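The run-time tunables described in TUNING.md are set through the MALLOC_CONF environment variable. A minimal sketch follows; the option values are illustrative rather than recommendations, and the library path and binary name are hypothetical:

```sh
# Illustrative jemalloc 5.1 tuning (values are examples, not recommendations):
# - background_thread:true offloads decay-driven purging to background threads
# - metadata_thp:auto enables transparent huge pages for internal metadata
# - dirty_decay_ms controls how long unused dirty pages are retained
export MALLOC_CONF="background_thread:true,metadata_thp:auto,dirty_decay_ms:30000"
LD_PRELOAD=/usr/lib/libjemalloc.so ./my_app
```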

    New features:

    • Implement transparent huge page support for internal metadata. (@interwq)
    • Add opt.thp to allow enabling / disabling transparent huge pages for all mappings. (@interwq)
    • Add maximum background thread count option. (@djwatson)
    • Allow prof_active to control opt.lg_prof_interval and prof.gdump. (@interwq)
    • Allow arena index lookup based on allocation addresses via mallctl. (@lionkov)
    • Allow disabling initial-exec TLS model. (@davidtgoldblatt, @KenMacD)
    • Add opt.lg_extent_max_active_fit to set the max ratio between the size of the active extent selected (to split off from) and the size of the requested allocation. (@interwq, @davidtgoldblatt)
    • Add retain_grow_limit to set the max size when growing virtual address space. (@interwq)
    • Add mallctl interfaces:
      • arena.<i>.retain_grow_limit (@interwq)
      • arenas.lookup (@lionkov)
      • max_background_threads (@djwatson)
      • opt.lg_extent_max_active_fit (@interwq)
      • opt.max_background_threads (@djwatson)
      • opt.metadata_thp (@interwq)
      • opt.thp (@interwq)
      • stats.metadata_thp (@interwq)

    Portability improvements:

    • Support GNU/kFreeBSD configuration. (@paravoid)
    • Support m68k, nios2 and SH3 architectures. (@paravoid)
    • Fall back to FD_CLOEXEC when O_CLOEXEC is unavailable. (@zonyitoo)
    • Fix symbol listing for cross-compiling. (@tamird)
    • Fix high bits computation on ARM. (@davidtgoldblatt, @paravoid)
    • Disable the CPU_SPINWAIT macro for Power. (@davidtgoldblatt, @marxin)
    • Fix MSVC 2015 & 2017 builds. (@rustyx)
    • Improve RISC-V support. (@EdSchouten)
    • Set name mangling script in strict mode. (@nicolov)
    • Avoid MADV_HUGEPAGE on ARM. (@marxin)
    • Modify configure to determine return value of strerror_r. (@davidtgoldblatt, @cferris1000)
    • Make sure CXXFLAGS is tested with CPP compiler. (@nehaljwani)
    • Fix 32-bit build on MSVC. (@rustyx)
    • Fix external symbol on MSVC. (@maksqwe)
    • Avoid a printf format specifier warning. (@jasone)
    • Add configure option --disable-initial-exec-tls which can allow jemalloc to be dynamically loaded after program startup. (@davidtgoldblatt, @KenMacD)
    • AArch64: Add ILP32 support. (@cmuellner)
    • Add --with-lg-vaddr configure option to support cross compiling. (@cmuellner, @davidtgoldblatt)

    Optimizations and refactors:

    • Improve active extent fit with extent_max_active_fit. This considerably reduces fragmentation over time and improves virtual memory and metadata usage. (@davidtgoldblatt, @interwq)
    • Eagerly coalesce large extents to reduce fragmentation. (@interwq)
    • sdallocx: only read size info when page aligned (i.e. possibly sampled), which speeds up the sized deallocation path significantly. (@interwq)
    • Avoid attempting new mappings for in place expansion with retain, since it rarely succeeds in practice and causes high overhead. (@interwq)
    • Refactor OOM handling in newImpl. (@wqfish)
    • Add internal fine-grained logging functionality for debugging use. (@davidtgoldblatt)
    • Refactor arena / tcache interactions. (@davidtgoldblatt)
    • Refactor extent management with dumpable flag. (@davidtgoldblatt)
    • Add runtime detection of lazy purging. (@interwq)
    • Use pairing heap instead of red-black tree for extents_avail. (@djwatson)
    • Use sysctl on startup in FreeBSD. (@trasz)
    • Use thread local prng state instead of atomic. (@djwatson)
    • Make decay always purge one more extent than before, because in practice large extents are usually the ones that cross the decay threshold. Purging the additional extent helps save memory as well as reduce VM fragmentation. (@interwq)
    • Fast division by dynamic values. (@davidtgoldblatt)
    • Improve the fit for aligned allocation. (@interwq, @edwinsmith)
    • Refactor extent_t bitpacking. (@rkmisra)
    • Optimize the generated assembly for ticker operations. (@davidtgoldblatt)
    • Convert stats printing to use a structured text emitter. (@davidtgoldblatt)
    • Remove preserve_lru feature for extents management. (@djwatson)
    • Consolidate two memory loads into one on the fast deallocation path. (@davidtgoldblatt, @interwq)

    Bug fixes (most of the issues are only relevant to jemalloc 5.0):

    • Fix deadlock with multithreaded fork in OS X. (@davidtgoldblatt)
    • Validate returned file descriptor before use. (@zonyitoo)
    • Fix a few background thread initialization and shutdown issues. (@interwq)
    • Fix an extent coalesce + decay race by taking both coalescing extents off the LRU list. (@interwq)
    • Fix a potentially unbounded increase during decay, caused by one thread continually stashing memory to purge while other threads generate new pages. The number of pages to purge is now checked to prevent this. (@interwq)
    • Fix a FreeBSD bootstrap assertion. (@strejda, @interwq)
    • Handle 32 bit mutex counters. (@rkmisra)
    • Fix an indexing bug when creating background threads. (@davidtgoldblatt, @binliu19)
    • Fix arguments passed to extent_init. (@yuleniwo, @interwq)
    • Fix addresses used for ordering mutexes. (@rkmisra)
    • Fix abort_conf processing during bootstrap. (@interwq)
    • Fix include path order for out-of-tree builds. (@cmuellner)

    Incompatible changes:

    • Remove --disable-thp. (@interwq)
    • Remove mallctl interfaces:
      • config.thp (@interwq)

    Documentation:

    • Add TUNING.md. (@interwq, @davidtgoldblatt, @djwatson)
  • 5.0.1(Jul 2, 2017)

    This bugfix release fixes several issues, most of which are obscure enough that typical applications are not impacted.

    Bug fixes:

    • Update decay->nunpurged before purging, in order to avoid potential update races and subsequent incorrect purging volume. (@interwq)
    • Only abort on dlsym(3) error if the failure impacts an enabled feature (lazy locking and/or background threads). This mitigates an initialization failure bug for which we still do not have a clear reproduction test case. (@interwq)
    • Modify tsd management so that it neither crashes nor leaks if a thread's only allocation activity is to call free() after TLS destructors have been executed. This behavior was observed when operating with GNU libc, and is unlikely to be an issue with other libc implementations. (@interwq)
    • Mask signals during background thread creation. This prevents signals from being inadvertently delivered to background threads. (@jasone, @davidtgoldblatt, @interwq)
    • Avoid inactivity checks within background threads, in order to prevent recursive mutex acquisition. (@interwq)
    • Fix extent_grow_retained() to use the specified hooks when the arena.<i>.extent_hooks mallctl is used to override the default hooks. (@interwq)
    • Add missing reentrancy support for custom extent hooks which allocate. (@interwq)
    • Post-fork(2), re-initialize the list of tcaches associated with each arena to contain no tcaches except the forking thread's. (@interwq)
    • Add missing post-fork(2) mutex reinitialization for extent_grow_mtx. This fixes potential deadlocks after fork(2). (@interwq)
    • Enforce minimum autoconf version (currently 2.68), since 2.63 is known to generate corrupt configure scripts. (@jasone)
    • Ensure that the configured page size (--with-lg-page) is no larger than the configured huge page size (--with-lg-hugepage). (@jasone)
  • 5.0.0(Jun 13, 2017)

    Unlike all previous jemalloc releases, this release does not use naturally aligned "chunks" for virtual memory management, and instead uses page-aligned "extents". This change has few externally visible effects, but the internal impacts are... extensive. Many other internal changes combine to make this the most cohesively designed version of jemalloc so far, with ample opportunity for further enhancements.

    Continuous integration is now an integral aspect of development thanks to the efforts of @davidtgoldblatt, and the dev branch tends to remain reasonably stable on the tested platforms (Linux, FreeBSD, macOS, and Windows). As a side effect the official release frequency may decrease over time.

    New features:

    • Implement optional per-CPU arena support; threads choose which arena to use based on current CPU rather than on fixed thread-->arena associations. (@interwq)
    • Implement two-phase decay of unused dirty pages. Pages transition from dirty-->muzzy-->clean, where the first phase transition relies on madvise(... MADV_FREE) semantics, and the second phase transition discards pages such that they are replaced with demand-zeroed pages on next access. (@jasone)
    • Increase decay time resolution from seconds to milliseconds. (@jasone)
    • Implement opt-in per CPU background threads, and use them for asynchronous decay-driven unused dirty page purging. (@interwq)
    • Add mutex profiling, which collects a variety of statistics useful for diagnosing overhead/contention issues. (@interwq)
    • Add C++ new/delete operator bindings. (@djwatson)
    • Support manually created arena destruction, such that all data and metadata are discarded. Add MALLCTL_ARENAS_DESTROYED for accessing merged stats associated with destroyed arenas. (@jasone)
    • Add MALLCTL_ARENAS_ALL as a fixed index for use in accessing merged/destroyed arena statistics via mallctl. (@jasone)
    • Add opt.abort_conf to optionally abort if invalid configuration options are detected during initialization. (@interwq)
    • Add opt.stats_print_opts, so that e.g. JSON output can be selected for the stats dumped during exit if opt.stats_print is true. (@jasone)
    • Add --with-version=VERSION for use when embedding jemalloc into another project's git repository. (@jasone)
    • Add --disable-thp to support cross compiling. (@jasone)
    • Add --with-lg-hugepage to support cross compiling. (@jasone)
    • Add mallctl interfaces (various authors):
      • background_thread
      • opt.abort_conf
      • opt.retain
      • opt.percpu_arena
      • opt.background_thread
      • opt.{dirty,muzzy}_decay_ms
      • opt.stats_print_opts
      • arena.<i>.initialized
      • arena.<i>.destroy
      • arena.<i>.{dirty,muzzy}_decay_ms
      • arena.<i>.extent_hooks
      • arenas.{dirty,muzzy}_decay_ms
      • arenas.bin.<i>.slab_size
      • arenas.nlextents
      • arenas.lextent.<i>.size
      • arenas.create
      • stats.background_thread.{num_threads,num_runs,run_interval}
      • stats.mutexes.{ctl,background_thread,prof,reset}.{num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,num_owner_switch}
      • stats.arenas.<i>.{dirty,muzzy}_decay_ms
      • stats.arenas.<i>.uptime
      • stats.arenas.<i>.{pmuzzy,base,internal,resident}
      • stats.arenas.<i>.{dirty,muzzy}_{npurge,nmadvise,purged}
      • stats.arenas.<i>.bins.<j>.{nslabs,reslabs,curslabs}
      • stats.arenas.<i>.bins.<j>.mutex.{num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,num_owner_switch}
      • stats.arenas.<i>.lextents.<j>.{nmalloc,ndalloc,nrequests,curlextents}
      • stats.arenas.i.mutexes.{large,extent_avail,extents_dirty,extents_muzzy,extents_retained,decay_dirty,decay_muzzy,base,tcache_list}.{num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,num_owner_switch}
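Several of the 5.0.0 features above are enabled through MALLOC_CONF. A hedged sketch (the binary name is hypothetical; options shown are from the list above):

```sh
# Illustrative use of 5.0.0 options:
#   percpu_arena:percpu  - threads pick arenas by current CPU
#   stats_print:true     - dump statistics at exit
#   stats_print_opts:J   - emit the exit dump as JSON
MALLOC_CONF="percpu_arena:percpu,stats_print:true,stats_print_opts:J" ./my_app
```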

    Portability improvements:

    • Improve reentrant allocation support, such that deadlock is less likely if e.g. a system library call in turn allocates memory. (@davidtgoldblatt, @interwq)
    • Support static linking of jemalloc with glibc. (@djwatson)

    Optimizations and refactors:

    • Organize virtual memory as "extents" of virtual memory pages, rather than as naturally aligned "chunks", and store all metadata in arbitrarily distant locations. This reduces virtual memory external fragmentation, and will interact better with huge pages (not yet explicitly supported). (@jasone)
    • Fold large and huge size classes together; only small and large size classes remain. (@jasone)
    • Unify the allocation paths, and merge most fast-path branching decisions. (@davidtgoldblatt, @interwq)
    • Embed per thread automatic tcache into thread-specific data, which reduces conditional branches and dereferences. Also reorganize tcache to increase fast-path data locality. (@interwq)
    • Rewrite atomics to closely model the C11 API, convert various synchronization from mutex-based to atomic, and use the explicit memory ordering control to resolve various hypothetical races without increasing synchronization overhead. (@davidtgoldblatt)
    • Extensively optimize rtree via various methods:
      • Add multiple layers of rtree lookup caching, since rtree lookups are now part of fast-path deallocation. (@interwq)
      • Determine rtree layout at compile time. (@jasone)
      • Make the tree shallower for common configurations. (@jasone)
      • Embed the root node in the top-level rtree data structure, thus avoiding one level of indirection. (@jasone)
      • Further specialize leaf elements as compared to internal node elements, and directly embed extent metadata needed for fast-path deallocation. (@jasone)
      • Ignore leading always-zero address bits (architecture-specific). (@jasone)
    • Reorganize headers (ongoing work) to make them hermetic, and disentangle various module dependencies. (@davidtgoldblatt)
    • Convert various internal data structures such as size class metadata from boot-time-initialized to compile-time-initialized. Propagate resulting data structure simplifications, such as making arena metadata fixed-size. (@jasone)
    • Simplify size class lookups when constrained to size classes that are multiples of the page size. This speeds lookups, but the primary benefit is complexity reduction in code that was the source of numerous regressions. (@jasone)
    • Lock individual extents when possible for localized extent operations, rather than relying on a top-level arena lock. (@davidtgoldblatt, @jasone)
    • Use first fit layout policy instead of best fit, in order to improve packing. (@jasone)
    • If munmap(2) is not in use, use an exponential series to grow each arena's virtual memory, so that the number of disjoint virtual memory mappings remains low. (@jasone)
    • Implement per arena base allocators, so that arenas never share any virtual memory pages. (@jasone)
    • Automatically generate private symbol name mangling macros. (@jasone)

    Incompatible changes:

    • Replace chunk hooks with an expanded/normalized set of extent hooks. (@jasone)
    • Remove ratio-based purging. (@jasone)
    • Remove --disable-tcache. (@jasone)
    • Remove --disable-tls. (@jasone)
    • Remove --enable-ivsalloc. (@jasone)
    • Remove --with-lg-size-class-group. (@jasone)
    • Remove --with-lg-tiny-min. (@jasone)
    • Remove --disable-cc-silence. (@jasone)
    • Remove --enable-code-coverage. (@jasone)
    • Remove --disable-munmap (replaced by opt.retain). (@jasone)
    • Remove Valgrind support. (@jasone)
    • Remove quarantine support. (@jasone)
    • Remove redzone support. (@jasone)
    • Remove mallctl interfaces (various authors):
      • config.munmap
      • config.tcache
      • config.tls
      • config.valgrind
      • opt.lg_chunk
      • opt.purge
      • opt.lg_dirty_mult
      • opt.decay_time
      • opt.quarantine
      • opt.redzone
      • opt.thp
      • arena.<i>.lg_dirty_mult
      • arena.<i>.decay_time
      • arena.<i>.chunk_hooks
      • arenas.initialized
      • arenas.lg_dirty_mult
      • arenas.decay_time
      • arenas.bin.<i>.run_size
      • arenas.nlruns
      • arenas.lrun.<i>.size
      • arenas.nhchunks
      • arenas.hchunk.<i>.size
      • arenas.extend
      • stats.cactive
      • stats.arenas.<i>.lg_dirty_mult
      • stats.arenas.<i>.decay_time
      • stats.arenas.<i>.metadata.{mapped,allocated}
      • stats.arenas.<i>.{npurge,nmadvise,purged}
      • stats.arenas.<i>.huge.{allocated,nmalloc,ndalloc,nrequests}
      • stats.arenas.<i>.bins.<j>.{nruns,reruns,curruns}
      • stats.arenas.<i>.lruns.<j>.{nmalloc,ndalloc,nrequests,curruns}
      • stats.arenas.<i>.hchunks.<j>.{nmalloc,ndalloc,nrequests,curhchunks}

    Bug fixes:

    • Improve interval-based profile dump triggering to dump only one profile when a single allocation's size exceeds the interval. (@jasone)
    • Use prefixed function names (as controlled by --with-jemalloc-prefix) when pruning backtrace frames in jeprof. (@jasone)
  • 4.5.0(Mar 1, 2017)

    This is the first release to benefit from much broader continuous integration testing, thanks to @davidtgoldblatt. Had we had this testing infrastructure in place for prior releases, it would have caught all of the most serious regressions fixed by this release.

    New features:

    • Add --disable-thp and the opt.thp mallctl to provide opt-out mechanisms for transparent huge page integration. (@jasone)
    • Update zone allocator integration to work with macOS 10.12. (@glandium)
    • Restructure *CFLAGS configuration, so that CFLAGS behaves typically, and EXTRA_CFLAGS provides a way to specify e.g. -Werror during building, but not during configuration. (@jasone, @ronawho)

    Bug fixes:

    • Fix DSS (sbrk(2)-based) allocation. This regression was first released in 4.3.0. (@jasone)
    • Handle race in per size class utilization computation. This functionality was first released in 4.0.0. (@interwq)
    • Fix lock order reversal during gdump. (@jasone)
    • Fix/refactor tcache synchronization. This regression was first released in 4.0.0. (@jasone)
    • Fix various JSON-formatted malloc_stats_print() bugs. This functionality was first released in 4.3.0. (@jasone)
    • Fix huge-aligned allocation. This regression was first released in 4.4.0. (@jasone)
    • When transparent huge page integration is enabled, detect what state pages start in according to the kernel's current operating mode, and only convert arena chunks to non-huge during purging if that is not their initial state. This functionality was first released in 4.4.0. (@jasone)
    • Fix lg_chunk clamping for the --enable-cache-oblivious --disable-fill case. This regression was first released in 4.0.0. (@jasone, @428desmo)
    • Properly detect sparc64 when building for Linux. (@glaubitz)
  • 4.4.0(Dec 4, 2016)

    New features:

    • Add configure support for *-*-linux-android targets. (@cferris1000, @jasone)
    • Add the --disable-syscall configure option, for use on systems that place security-motivated limitations on syscall(2). (@jasone)
    • Add support for Debian GNU/kFreeBSD. (@thesam)

    Optimizations:

    • Add extent serial numbers and use them where appropriate as a sort key that is higher priority than address, so that the allocation policy prefers older extents. This tends to improve locality (decrease fragmentation) when memory grows downward. (@jasone)
    • Refactor madvise(2) configuration so that MADV_FREE is detected and utilized on Linux 4.5 and newer. (@jasone)
    • Mark partially purged arena chunks as non-huge-page. This improves interaction with Linux's transparent huge page functionality. (@jasone)

    Bug fixes:

    • Fix size class computations for edge conditions involving extremely large allocations. This regression was first released in 4.0.0. (@jasone, @ingvarha)
    • Remove overly restrictive assertions related to the cactive statistic. This regression was first released in 4.1.0. (@jasone)
    • Implement a more reliable detection scheme for os_unfair_lock on macOS. (@jszakmeister)
  • 4.3.1(Nov 8, 2016)

  • 4.3.0(Nov 5, 2016)

    This is the first release that passes the test suite for multiple Windows configurations, thanks in large part to @glandium setting up continuous integration via AppVeyor (and Travis CI for Linux and OS X).

    New features:

    • Add "J" (JSON) support to malloc_stats_print(). (@jasone)
    • Add Cray compiler support. (@ronawho)

    Optimizations:

    • Add/use adaptive spinning for bootstrapping and radix tree node initialization. (@jasone)

    Bug fixes:

    • Fix large allocation to search starting in the optimal size class heap, which can substantially reduce virtual memory churn and fragmentation. This regression was first released in 4.0.0. (@mjp41, @jasone)
    • Fix stats.arenas.<i>.nthreads accounting. (@interwq)
    • Fix and simplify decay-based purging. (@jasone)
    • Make DSS (sbrk(2)-related) operations lockless, which resolves potential deadlocks during thread exit. (@jasone)
    • Fix over-sized allocation of radix tree leaf nodes. (@mjp41, @ogaun, @jasone)
    • Fix over-sized allocation of arena_t (plus associated stats) data structures. (@jasone, @interwq)
    • Fix EXTRA_CFLAGS to not affect configuration. (@jasone)
    • Fix a Valgrind integration bug. (@ronawho)
    • Disallow 0x5a junk filling when running in Valgrind. (@jasone)
    • Fix a file descriptor leak on Linux. This regression was first released in 4.2.0. (@vsarunas, @jasone)
    • Fix static linking of jemalloc with glibc. (@djwatson)
    • Use syscall(2) rather than {open,read,close}(2) during boot on Linux. This works around other libraries' system call wrappers performing reentrant allocation. (@kspinka, @Whissi, @jasone)
    • Fix OS X default zone replacement to work with OS X 10.12. (@glandium, @jasone)
    • Fix cached memory management to avoid needless commit/decommit operations during purging, which resolves permanent virtual memory map fragmentation issues on Windows. (@mjp41, @jasone)
    • Fix TSD fetches to avoid (recursive) allocation. This is relevant to non-TLS and Windows configurations. (@jasone)
    • Fix malloc_conf overriding to work on Windows. (@jasone)
    • Forcibly disable lazy-lock on Windows (was forcibly enabled). (@jasone)
  • 4.2.1(Jun 8, 2016)

    Bug fixes:

    • Fix bootstrapping issues for configurations that require allocation during tsd initialization (e.g. --disable-tls). (@cferris1000, @jasone)
    • Fix gettimeofday() version of nstime_update(). (@ronawho)
    • Fix Valgrind regressions in calloc() and chunk_alloc_wrapper(). (@ronawho)
    • Fix potential VM map fragmentation regression. (@jasone)
    • Fix opt_zero-triggered in-place huge reallocation zeroing. (@jasone)
    • Fix heap profiling context leaks in reallocation edge cases. (@jasone)
  • 4.2.0(May 12, 2016)

    New features:

    • Add the arena.<i>.reset mallctl, which makes it possible to discard all of an arena's allocations in a single operation. (@jasone)
    • Add the stats.retained and stats.arenas.<i>.retained statistics. (@jasone)
    • Add the --with-version configure option. (@jasone)
    • Support --with-lg-page values larger than actual page size. (@jasone)

    Optimizations:

    • Use pairing heaps rather than red-black trees for various hot data structures. (@djwatson, @jasone)
    • Streamline fast paths of rtree operations. (@jasone)
    • Optimize the fast paths of calloc() and [m,d,sd]allocx(). (@jasone)
    • Decommit unused virtual memory if the OS does not overcommit. (@jasone)
    • Specify MAP_NORESERVE on Linux if [heuristic] overcommit is active, in order to avoid unfortunate interactions during fork(2). (@jasone)

    Bug fixes:

    • Fix chunk accounting related to triggering gdump profiles. (@jasone)
    • Link against librt for clock_gettime(2) if glibc < 2.17. (@jasone)
    • Scale leak report summary according to sampling probability. (@jasone)
  • 4.1.1(May 4, 2016)

    This bugfix release resolves a variety of mostly minor issues, though the bitmap fix is critical for 64-bit Windows.

    Bug fixes:

    • Fix the linear scan version of bitmap_sfu() to shift by the proper amount even when sizeof(long) is not the same as sizeof(void *), as on 64-bit Windows. (@jasone)
    • Fix hashing functions to avoid unaligned memory accesses (and resulting crashes). This is relevant at least to some ARM-based platforms. (@rkmisra)
    • Fix fork()-related lock rank ordering reversals. These reversals were unlikely to cause deadlocks in practice except when heap profiling was enabled and active. (@jasone)
    • Fix various chunk leaks in OOM code paths. (@jasone)
    • Fix malloc_stats_print() to print opt.narenas correctly. (@jasone)
    • Fix MSVC-specific build/test issues. (@rustyx, @yuslepukhin)
    • Fix a variety of test failures that were due to test fragility rather than core bugs. (@jasone)
  • 4.1.0(Feb 28, 2016)

    This release is primarily about optimizations, but it also incorporates a lot of portability-motivated refactoring and enhancements. Many people worked on this release. Even omitting minor changes (see git revision history) and the people who reported and diagnosed issues, so much work was contributed that, starting with this release, changes are annotated with author credits to help reflect the collaborative effort involved.

    New features:

    • Implement decay-based unused dirty page purging, a major optimization with mallctl API impact. This is an alternative to the existing ratio-based unused dirty page purging, and is intended to eventually become the sole purging mechanism. New mallctls:

      • opt.purge
      • opt.decay_time
      • arena.<i>.decay
      • arena.<i>.decay_time
      • arenas.decay_time
      • stats.arenas.<i>.decay_time

      (@jasone, @cevans87)

    • Add --with-malloc-conf, which makes it possible to embed a default options string during configuration. This was motivated by the desire to specify --with-malloc-conf=purge:decay, since the default must remain purge:ratio until the 5.0.0 release. (@jasone)

    • Add MS Visual Studio 2015 support. (@rustyx, @yuslepukhin)

    • Make *allocx() size class overflow behavior defined. The maximum size class is now less than PTRDIFF_MAX to protect applications against numerical overflow, and all allocation functions are guaranteed to indicate errors rather than potentially crashing if the request size exceeds the maximum size class. (@jasone)

    • jeprof:

      • Add raw heap profile support. (@jasone)
      • Add --retain and --exclude for backtrace symbol filtering. (@jasone)
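The --with-malloc-conf option above embeds a build-time default that run-time settings can still override. A sketch, with an illustrative prefix and a hypothetical binary name:

```sh
# Build jemalloc 4.1 with decay-based purging as the embedded default:
./configure --prefix=/usr/local --with-malloc-conf=purge:decay
make && make install
# The embedded default can still be overridden at run time:
MALLOC_CONF="purge:ratio,lg_dirty_mult:4" ./my_app
```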

    Optimizations:

    • Optimize the fast path to combine various bootstrapping and configuration checks and execute more streamlined code in the common case. (@interwq)
    • Use linear scan for small bitmaps (used for small object tracking). In addition to speeding up bitmap operations on 64-bit systems, this reduces allocator metadata overhead by approximately 0.2%. (@djwatson)
    • Separate arena_avail trees, which substantially speeds up run tree operations. (@djwatson)
    • Use memoization (boot-time-computed table) for run quantization. Separate arena_avail trees reduced the importance of this optimization. (@jasone)
    • Attempt mmap-based in-place huge reallocation. This can dramatically speed up incremental huge reallocation. (@jasone)

    Incompatible changes:

    • Make opt.narenas unsigned rather than size_t. (@jasone)

    Bug fixes:

    • Fix stats.cactive accounting regression. (@rustyx, @jasone)
    • Handle unaligned keys in hash(). This caused problems for some ARM systems. (@jasone, Christopher Ferris)
    • Refactor arenas array. In addition to fixing a fork-related deadlock, this makes arena lookups faster and simpler. (@jasone)
    • Move retained memory allocation out of the default chunk allocation function, to a location that gets executed even if the application installs a custom chunk allocation function. This resolves a virtual memory leak. (@buchgr)
    • Fix a potential tsd cleanup leak. (Christopher Ferris, @jasone)
    • Fix run quantization. In practice this bug had no impact unless applications requested memory with alignment exceeding one page. (@jasone, @djwatson)
    • Fix LinuxThreads-specific bootstrapping deadlock. (Cosmin Paraschiv)
    • jeprof:
      • Don't discard curl options if timeout is not defined. (@djwatson)
      • Detect failed profile fetches. (@djwatson)
    • Fix stats.arenas.<i>.{dss,lg_dirty_mult,decay_time,pactive,pdirty} for --disable-stats case. (@jasone)
  • 4.0.4(Oct 24, 2015)

    This bugfix release fixes another xallocx() regression. No other regressions have come to light in over a month, so this is likely a good starting point for people who prefer to wait for "dot one" releases with all the major issues shaken out.

    Bug fixes:

    • Fix xallocx(..., MALLOCX_ZERO) to zero the last full trailing page of large allocations that have been randomly assigned an offset of 0 when the --enable-cache-oblivious configure option is enabled.
  • 4.0.3(Sep 25, 2015)

    This bugfix release continues the trend of xallocx() and heap profiling fixes.

    Bug fixes:

    • Fix xallocx(..., MALLOCX_ZERO) to zero all trailing bytes of large allocations when the --enable-cache-oblivious configure option is enabled.
    • Fix xallocx(..., MALLOCX_ZERO) to zero trailing bytes of huge allocations when resizing from/to a size class that is not a multiple of the chunk size.
    • Fix prof_tctx_dump_iter() to filter out nodes that were created after heap profile dumping started.
    • Work around a potentially bad thread-specific data initialization interaction with NPTL (glibc's pthreads implementation).
  • 4.0.2(Sep 21, 2015)

    This bugfix release addresses a few bugs specific to heap profiling.

    Bug fixes:

    • Fix ixallocx_prof_sample() to never modify nor create sampled small allocations. xallocx() is in general incapable of moving small allocations, so this fix removes buggy code without loss of generality.
    • Fix irallocx_prof_sample() to always allocate large regions, even when alignment is non-zero.
    • Fix prof_alloc_rollback() to read tdata from thread-specific data rather than dereferencing a potentially invalid tctx.
  • 4.0.1(Sep 15, 2015)

    This is a bugfix release that is somewhat high risk due to the amount of refactoring required to address deep xallocx() problems. As a side effect of these fixes, xallocx() now tries harder to partially fulfill requests for optional extra space. Note that a couple of minor heap profiling optimizations are included, but these are better thought of as performance fixes that were integral to discovering most of the other bugs.

    Optimizations:

    • Avoid a chunk metadata read in arena_prof_tctx_set(), since it is in the fast path when heap profiling is enabled. Additionally, split a special case out into arena_prof_tctx_reset(), which also avoids chunk metadata reads.
    • Optimize irallocx_prof() to optimistically update the sampler state. The prior implementation appears to have been a holdover from when rallocx()/xallocx() functionality was combined as rallocm().

    Bug fixes:

    • Fix TLS configuration such that it is enabled by default for platforms on which it works correctly.
    • Fix arenas_cache_cleanup() and arena_get_hard() to handle allocation/deallocation within the application's thread-specific data cleanup functions even after arenas_cache is torn down.
    • Fix xallocx() bugs related to size+extra exceeding HUGE_MAXCLASS.
    • Fix chunk purge hook calls for in-place huge shrinking reallocation to specify the old chunk size rather than the new chunk size. This bug caused no correctness issues for the default chunk purge function, but was visible to custom functions set via the "arena.<i>.chunk_hooks" mallctl.
    • Fix heap profiling bugs:
      • Fix heap profiling to distinguish among otherwise identical sample sites with interposed resets (triggered via the "prof.reset" mallctl). This bug could cause data structure corruption that would most likely result in a segfault.
      • Fix irealloc_prof() to prof_alloc_rollback() on OOM.
      • Make one call to prof_active_get_unlocked() per allocation event, and use the result throughout the relevant functions that handle an allocation event. Also add a missing check in prof_realloc(). These fixes protect allocation events against concurrent prof_active changes.
      • Fix ixallocx_prof() to pass usize_max and zero to ixallocx_prof_sample() in the correct order.
      • Fix prof_realloc() to call prof_free_sampled_object() after calling prof_malloc_sample_object(). Prior to this fix, if tctx and old_tctx were the same, the tctx could have been prematurely destroyed.
    • Fix portability bugs:
      • Don't bitshift by negative amounts when encoding/decoding run sizes in chunk header maps. This affected systems with page sizes greater than 8 KiB.
      • Rename index_t to szind_t to avoid an existing type on Solaris.
      • Add JEMALLOC_CXX_THROW to the memalign() function prototype, in order to match glibc and avoid compilation errors when including both jemalloc/jemalloc.h and malloc.h in C++ code.
      • Don't assume that /bin/sh is appropriate when running size_classes.sh during configuration.
      • Consider __sparcv9 a synonym for sparc64 when defining LG_QUANTUM.
      • Link tests to librt if it contains clock_gettime(2).
  • 4.0.0(Aug 17, 2015)

    This version contains many speed and space optimizations, both minor and major. The major themes are generalization, unification, and simplification. Although many of these optimizations cause no visible behavior change, their cumulative effect is substantial.

    New features:

    • Normalize size class spacing to be consistent across the complete size range. By default there are four size classes per size doubling, but this is now configurable via the --with-lg-size-class-group option. Also add the --with-lg-page, --with-lg-page-sizes, --with-lg-quantum, and --with-lg-tiny-min options, which can be used to tweak page and size class settings. Impacts:
      • Worst case performance for incrementally growing/shrinking reallocation is improved because there are far fewer size classes, and therefore copying happens less often.
      • Internal fragmentation is limited to 20% for all but the smallest size classes (those less than four times the quantum). (1B + 4 KiB) and (1B + 4 MiB) previously suffered nearly 50% internal fragmentation.
      • Chunk fragmentation tends to be lower because there are fewer distinct run sizes to pack.
    • Add support for explicit tcaches. The "tcache.create", "tcache.flush", and "tcache.destroy" mallctls control tcache lifetime and flushing, and the MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to the *allocx() API control which tcache is used for each operation.
    • Implement per thread heap profiling, as well as the ability to enable/disable heap profiling on a per thread basis. Add the "prof.reset", "prof.lg_sample", "thread.prof.name", "thread.prof.active", "opt.prof_thread_active_init", and "prof.thread_active_init" mallctls.
    • Add support for per arena application-specified chunk allocators, configured via the "arena.<i>.chunk_hooks" mallctl.
    • Refactor huge allocation to be managed by arenas, so that arenas now function as general purpose independent allocators. This is important in the context of user-specified chunk allocators, aside from the scalability benefits. Related new statistics:
      • The "stats.arenas.<i>.huge.allocated", "stats.arenas.<i>.huge.nmalloc", "stats.arenas.<i>.huge.ndalloc", and "stats.arenas.<i>.huge.nrequests" mallctls provide high level per arena huge allocation statistics.
      • The "arenas.nhchunks", "arenas.hchunk.<i>.size", "stats.arenas.<i>.hchunks.<j>.nmalloc", "stats.arenas.<i>.hchunks.<j>.ndalloc", "stats.arenas.<i>.hchunks.<j>.nrequests", and "stats.arenas.<i>.hchunks.<j>.curhchunks" mallctls provide per size class statistics.
    • Add the 'util' column to malloc_stats_print() output, which reports the proportion of available regions that are currently in use for each small size class.
    • Add "alloc" and "free" modes for junk filling (see the "opt.junk" mallctl), so that it is possible to separately enable junk filling for allocation versus deallocation.
    • Add the jemalloc-config script, which provides information about how jemalloc was configured, and how to integrate it into application builds.
    • Add metadata statistics, which are accessible via the "stats.metadata", "stats.arenas.<i>.metadata.mapped", and "stats.arenas.<i>.metadata.allocated" mallctls.
    • Add the "stats.resident" mallctl, which reports the upper limit of physically resident memory mapped by the allocator.
    • Add per arena control over unused dirty page purging, via the "arenas.lg_dirty_mult", "arena.<i>.lg_dirty_mult", and "stats.arenas.<i>.lg_dirty_mult" mallctls.
    • Add the "prof.gdump" mallctl, which makes it possible to toggle the gdump feature on/off during program execution.
    • Add sdallocx(), which implements sized deallocation. The primary optimization over dallocx() is the removal of a metadata read, which often suffers an L1 cache miss.
    • Add missing header includes in jemalloc/jemalloc.h, so that applications only have to #include <jemalloc/jemalloc.h>.
    • Add support for additional platforms:
      • Bitrig
      • Cygwin
      • DragonFlyBSD
      • iOS
      • OpenBSD
      • OpenRISC/or1k

    Optimizations:

    • Maintain dirty runs in per arena LRUs rather than in per arena trees of dirty-run-containing chunks. In practice this change significantly reduces dirty page purging volume.
    • Integrate whole chunks into the unused dirty page purging machinery. This reduces the cost of repeated huge allocation/deallocation, because it effectively introduces a cache of chunks.
    • Split the arena chunk map into two separate arrays, in order to increase cache locality for the frequently accessed bits.
    • Move small run metadata out of runs, into arena chunk headers. This reduces run fragmentation, smaller runs reduce external fragmentation for small size classes, and packed (less uniformly aligned) metadata layout improves CPU cache set distribution.
    • Randomly distribute large allocation base pointer alignment relative to page boundaries in order to more uniformly utilize CPU cache sets. This can be disabled via the --disable-cache-oblivious configure option, and queried via the "config.cache_oblivious" mallctl.
    • Micro-optimize the fast paths for the public API functions.
    • Refactor thread-specific data to reside in a single structure. This assures that only a single TLS read is necessary per call into the public API.
    • Implement in-place huge allocation growing and shrinking.
    • Refactor rtree (radix tree for chunk lookups) to be lock-free, and make additional optimizations that reduce maximum lookup depth to one or two levels. This resolves what was a concurrency bottleneck for per arena huge allocation, because a global data structure is critical for determining which arenas own which huge allocations.

    Incompatible changes:

    • Replace --enable-cc-silence with --disable-cc-silence to suppress spurious warnings by default.
    • Assure that the constness of malloc_usable_size()'s return type matches that of the system implementation.
    • Change the heap profile dump format to support per thread heap profiling, rename pprof to jeprof, and enhance it with the --thread=<n> option. As a result, the bundled jeprof must now be used rather than the upstream (gperftools) pprof.
    • Disable "opt.prof_final" by default, in order to avoid atexit(3), which can internally deadlock on some platforms.
    • Change the "arenas.nlruns" mallctl type from size_t to unsigned.
    • Replace the "stats.arenas.<i>.bins.<j>.allocated" mallctl with "stats.arenas.<i>.bins.<j>.curregs".
    • Ignore MALLOC_CONF in set{uid,gid,cap} binaries.
    • Ignore MALLOCX_ARENA(a) in dallocx(), in favor of using the MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to control tcache usage.

    Removed features:

    • Remove the *allocm() API, which is superseded by the *allocx() API.
    • Remove the --enable-dss option, and make dss non-optional on all platforms which support sbrk(2).
    • Remove the "arenas.purge" mallctl, which was obsoleted by the "arena.<i>.purge" mallctl in 3.1.0.
    • Remove the unnecessary "opt.valgrind" mallctl; jemalloc automatically detects whether it is running inside Valgrind.
    • Remove the "stats.huge.allocated", "stats.huge.nmalloc", and "stats.huge.ndalloc" mallctls.
    • Remove the --enable-mremap option.
    • Remove the "stats.chunks.current", "stats.chunks.total", and "stats.chunks.high" mallctls.

    Bug fixes:

    • Fix the cactive statistic to decrease (rather than increase) when active memory decreases. This regression was first released in 3.5.0.
    • Fix OOM handling in memalign() and valloc(). A variant of this bug existed in all releases since 2.0.0, which introduced these functions.
    • Fix an OOM-related regression in arena_tcache_fill_small(), which could cause cache corruption on OOM. This regression was present in all releases from 2.2.0 through 3.6.0.
    • Fix size class overflow handling for malloc(), posix_memalign(), memalign(), calloc(), and realloc() when profiling is enabled.
    • Fix the "arena.<i>.dss" mallctl to return an error if "primary" or "secondary" precedence is specified, but sbrk(2) is not supported.
    • Fix fallback lg_floor() implementations to handle extremely large inputs.
    • Ensure the default purgeable zone is after the default zone on OS X.
    • Fix latent bugs in atomic_*().
    • Fix the "arena.<i>.dss" mallctl to handle read-only calls.
    • Fix tls_model configuration to enable the initial-exec model when possible.
    • Mark malloc_conf as a weak symbol so that the application can override it.
    • Correctly detect glibc's adaptive pthread mutexes.
    • Fix the --without-export configure option.
  • 3.6.0(Apr 15, 2015)

    This version contains a critical bug fix for a regression present in 3.5.0 and 3.5.1.

    Bug fixes:

    • Fix a regression in arena_chunk_alloc() that caused crashes during small/large allocation if chunk allocation failed. In the absence of this bug, chunk allocation failure would result in allocation failure, e.g. NULL return from malloc(). This regression was introduced in 3.5.0.
    • Fix backtracing for gcc intrinsics-based backtracing by specifying -fno-omit-frame-pointer to gcc. Note that the application (and all the libraries it links to) must also be compiled with this option for backtracing to be reliable.
    • Use dss allocation precedence for huge allocations as well as small/large allocations.
    • Fix test assertion failure message formatting. This bug did not manifest on x86_64 systems because of implementation subtleties in va_list.
    • Fix inconsequential test failures for hash and SFMT code.

    New features:

    • Support heap profiling on FreeBSD. This feature depends on the proc filesystem being mounted during heap profile dumping.
  • 3.5.1(Apr 18, 2015)

    This version primarily addresses minor bugs in test code.

    Bug fixes:

    • Configure Solaris/Illumos to use MADV_FREE.
    • Fix junk filling for mremap(2)-based huge reallocation. This is only relevant if configuring with the --enable-mremap option specified.
    • Avoid compilation failure if 'restrict' C99 keyword is not supported by the compiler.
    • Add a configure test for SSE2 rather than assuming it is usable on i686 systems. This fixes test compilation errors, especially on 32-bit Linux systems.
    • Fix mallctl argument size mismatches (size_t vs. uint64_t) in the stats unit test.
    • Fix/remove flawed alignment-related overflow tests.
    • Prevent compiler optimizations that could change backtraces in the prof_accum unit test.
  • 3.5.0(Apr 18, 2015)

    This version focuses on refactoring and automated testing, though it also includes some non-trivial heap profiling optimizations not mentioned below.

    New features:

    • Add the *allocx() API, which is a successor to the experimental *allocm() API. The *allocx() functions are slightly simpler to use because they have fewer parameters, they directly return the results of primary interest, and mallocx()/rallocx() avoid the strict aliasing pitfall that allocm()/rallocm() share with posix_memalign(). Note that *allocm() is slated for removal in the next non-bugfix release.
    • Add support for LinuxThreads.

    Bug fixes:

    • Unless heap profiling is enabled, disable floating point code and don't link with libm. This, in combination with e.g. EXTRA_CFLAGS=-mno-sse on x64 systems, makes it possible to completely disable floating point register use. Some versions of glibc neglect to save/restore caller-saved floating point registers during dynamic lazy symbol loading, and the symbol loading code uses whatever malloc the application happens to have linked/loaded with, the result being potential floating point register corruption.
    • Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling backtrace creation in imemalign(). This bug impacted posix_memalign() and aligned_alloc().
    • Fix a file descriptor leak in a prof_dump_maps() error path.
    • Fix prof_dump() to close the dump file descriptor for all relevant error paths.
    • Fix rallocm() to use the arena specified by the ALLOCM_ARENA(s) flag for allocation, not just deallocation.
    • Fix a data race for large allocation stats counters.
    • Fix a potential infinite loop during thread exit. This bug occurred on Solaris, and could affect other platforms with similar pthreads TSD implementations.
    • Don't junk-fill reallocations unless usable size changes. This fixes a violation of the *allocx()/*allocm() semantics.
    • Fix growing large reallocation to junk fill new space.
    • Fix huge deallocation to junk fill when munmap is disabled.
    • Change the default private namespace prefix from empty to je_, and change --with-private-namespace-prefix so that it prepends an additional prefix rather than replacing je_. This reduces the likelihood of applications which statically link jemalloc experiencing symbol name collisions.
    • Add missing private namespace mangling (relevant when --with-private-namespace is specified).
    • Add and use JEMALLOC_INLINE_C so that static inline functions are marked as static even for debug builds.
    • Add a missing mutex unlock in a malloc_init_hard() error path. In practice this error path is never executed.
    • Fix numerous bugs in malloc_strtoumax() error handling/reporting. These bugs had no impact except for malformed inputs.
    • Fix numerous bugs in malloc_snprintf(). These bugs were not exercised by existing calls, so they had no impact.
  • 3.4.1(Apr 18, 2015)

    Bug fixes:

    • Fix a race in the "arenas.extend" mallctl that could cause memory corruption of internal data structures and subsequent crashes.
    • Fix Valgrind integration flaws that caused Valgrind warnings about reads of uninitialized memory in:
      • arena chunk headers
      • internal zero-initialized data structures (relevant to tcache and prof code)
    • Preserve errno during the first allocation. A readlink(2) call during initialization fails unless /etc/malloc.conf exists, so errno was typically set during the first allocation prior to this fix.
    • Fix compilation warnings reported by gcc 4.8.1.
  • 3.4.0(Apr 18, 2015)

    This version is essentially a small bugfix release, but the addition of aarch64 support requires that the minor version be incremented.

    Bug fixes:

    • Fix race-triggered deadlocks in chunk_record(). These deadlocks were typically triggered by multiple threads concurrently deallocating huge objects.

    New features:

    • Add support for the aarch64 architecture.
  • 3.3.1(Apr 18, 2015)

    This version fixes bugs that are typically encountered only when utilizing custom run-time options.

    Bug fixes:

    • Fix a locking order bug that could cause deadlock during fork if heap profiling were enabled.
    • Fix a chunk recycling bug that could cause the allocator to lose track of whether a chunk was zeroed. On FreeBSD, NetBSD, and OS X, it could cause corruption if allocating via sbrk(2) (unlikely unless running with the "dss:primary" option specified). This was completely harmless on Linux unless using mlockall(2) (and unlikely even then, unless the --disable-munmap configure option or the "dss:primary" option was specified). This regression was introduced in 3.1.0 by the mlockall(2)/madvise(2) interaction fix.
    • Fix TLS-related memory corruption that could occur during thread exit if the thread never allocated memory. Only the quarantine and prof facilities were susceptible.
    • Fix two quarantine bugs:
      • Internal reallocation of the quarantined object array leaked the old array.
      • Reallocation failure for internal reallocation of the quarantined object array (very unlikely) resulted in memory corruption.
    • Fix Valgrind integration to annotate all internally allocated memory in a way that keeps Valgrind happy about internal data structure access.
    • Fix building for s390 systems.
  • 3.3.0(Apr 18, 2015)

    This version includes a few minor performance improvements in addition to the listed new features and bug fixes.

    New features:

    • Add clipping support to lg_chunk option processing.
    • Add the --enable-ivsalloc option.
    • Add the --without-export option.
    • Add the --disable-zone-allocator option.

    Bug fixes:

    • Fix "arenas.extend" mallctl to output the number of arenas.
    • Fix chunk_recycle() to unconditionally inform Valgrind that returned memory is undefined.
    • Fix build break on FreeBSD related to alloca.h.
  • 3.2.0(Apr 18, 2015)

    In addition to a couple of bug fixes, this version modifies page run allocation and dirty page purging algorithms in order to better control page-level virtual memory fragmentation.

    Incompatible changes:

    • Change the "opt.lg_dirty_mult" default from 5 to 3 (32:1 to 8:1).

    Bug fixes:

    • Fix dss/mmap allocation precedence code to use recyclable mmap memory only after primary dss allocation fails.
    • Fix deadlock in the "arenas.purge" mallctl. This regression was introduced in 3.1.0 by the addition of the "arena.<i>.purge" mallctl.
  • 3.1.0(Apr 18, 2015)

    New features:

    • Auto-detect whether running inside Valgrind, thus removing the need to manually specify MALLOC_CONF=valgrind:true.
    • Add the "arenas.extend" mallctl, which allows applications to create manually managed arenas.
    • Add the ALLOCM_ARENA() flag for {,r,d}allocm().
    • Add the "opt.dss", "arena.<i>.dss", and "stats.arenas.<i>.dss" mallctls, which provide control over dss/mmap precedence.
    • Add the "arena.<i>.purge" mallctl, which obsoletes "arenas.purge".
    • Define LG_QUANTUM for hppa.

    Incompatible changes:

    • Disable tcache by default if running inside Valgrind, in order to avoid making unallocated objects appear reachable to Valgrind.
    • Drop const from malloc_usable_size() argument on Linux.

    Bug fixes:

    • Fix heap profiling crash if sampled object is freed via realloc(p, 0).
    • Remove const from __*_hook variable declarations, so that glibc can modify them during process forking.
    • Fix mlockall(2)/madvise(2) interaction.
    • Fix fork(2)-related deadlocks.
    • Fix error return value for "thread.tcache.enabled" mallctl.
  • 3.0.0(Apr 18, 2015)

    Although this version adds some major new features, the primary focus is on internal code cleanup that facilitates maintainability and portability, most of which is not reflected in the ChangeLog. This is the first release to incorporate substantial contributions from numerous other developers, and the result is a more broadly useful allocator (see the git revision history for contribution details). Note that the license has been unified, thanks to Facebook granting a license under the same terms as the other copyright holders (see COPYING).

    New features:

    • Implement Valgrind support, redzones, and quarantine.
    • Add support for additional platforms:
      • FreeBSD
      • Mac OS X Lion
      • MinGW
      • Windows (no support yet for replacing the system malloc)
    • Add support for additional architectures:
      • MIPS
      • SH4
      • Tilera
    • Add support for cross compiling.
    • Add nallocm(), which rounds a request size up to the nearest size class without actually allocating.
    • Implement aligned_alloc() (blame C11).
    • Add the "thread.tcache.enabled" mallctl.
    • Add the "opt.prof_final" mallctl.
    • Update pprof (from gperftools 2.0).
    • Add the --with-mangling option.
    • Add the --disable-experimental option.
    • Add the --disable-munmap option, and make it the default on Linux.
    • Add the --enable-mremap option, which disables use of mremap(2) by default.

    Incompatible changes:

    • Enable stats by default.
    • Enable fill by default.
    • Disable lazy locking by default.
    • Rename the "tcache.flush" mallctl to "thread.tcache.flush".
    • Rename the "arenas.pagesize" mallctl to "arenas.page".
    • Change the "opt.lg_prof_sample" default from 0 to 19 (1 B to 512 KiB).
    • Change the "opt.prof_accum" default from true to false.

    Removed features:

    • Remove the swap feature, including the "config.swap", "swap.avail", "swap.prezeroed", "swap.nfds", and "swap.fds" mallctls.
    • Remove highruns statistics, including the "stats.arenas.<i>.bins.<j>.highruns" and "stats.arenas.<i>.lruns.<j>.highruns" mallctls.
    • As part of small size class refactoring, remove the "opt.lg_[qc]space_max", "arenas.cacheline", "arenas.subpage", "arenas.[tqcs]space_{min,max}", and "arenas.[tqcs]bins" mallctls.
    • Remove the "arenas.chunksize" mallctl.
    • Remove the "opt.lg_prof_tcmax" option.
    • Remove the "opt.lg_prof_bt_max" option.
    • Remove the "opt.lg_tcache_gc_sweep" option.
    • Remove the --disable-tiny option, including the "config.tiny" mallctl.
    • Remove the --enable-dynamic-page-shift configure option.
    • Remove the --enable-sysv configure option.

    Bug fixes:

    • Fix a statistics-related bug in the "thread.arena" mallctl that could cause invalid statistics and crashes.
    • Work around TLS deallocation via free() on Linux. This bug could cause write-after-free memory corruption.
    • Fix a potential deadlock that could occur during interval- and growth-triggered heap profile dumps.
    • Fix large calloc() zeroing bugs due to dropping chunk map unzeroed flags.
    • Fix chunk_alloc_dss() to stop claiming memory is zeroed. This bug could cause memory corruption and crashes with --enable-dss specified.
    • Fix fork-related bugs that could cause deadlock in children between fork and exec.
    • Fix malloc_stats_print() to honor 'b' and 'l' in the opts parameter.
    • Fix realloc(p, 0) to act like free(p).
    • Do not enforce minimum alignment in memalign().
    • Check for NULL pointer in malloc_usable_size().
    • Fix an off-by-one heap profile statistics bug that could be observed in interval- and growth-triggered heap profiles.
    • Fix the "epoch" mallctl to update cached stats even if the passed in epoch is 0.
    • Fix bin->runcur management to fix a layout policy bug. This bug did not affect correctness.
    • Fix a bug in choose_arena_hard() that potentially caused more arenas to be initialized than necessary.
    • Add missing "opt.lg_tcache_max" mallctl implementation.
    • Use glibc allocator hooks to make mixed allocator usage less likely.
    • Fix build issues for --disable-tcache.
    • Don't mangle pthread_create() when --with-private-namespace is specified.
  • 2.2.5(Apr 18, 2015)
