Snmalloc - Message passing based allocator

Overview

snmalloc

snmalloc is a high-performance allocator. It can be used directly in a project as a header-only C++ library, it can be LD_PRELOADed on ELF platforms (e.g. Linux, BSD), and there is a crate to use it from Rust.

Its key design features are:

  • Memory that is freed by the same thread that allocated it does not require any synchronising operations.
  • Freeing memory on a different thread from the one that allocated it does not take any locks; instead, a novel message-passing scheme returns the memory to the original allocator, where it is recycled. This enables thousands of remote deallocations to be performed with only a single atomic operation, which allows great scaling with core count.
  • The allocator uses large ranges of pages to reduce the amount of meta-data required.
  • The fast paths are highly optimised, with just two branches on the fast path for malloc (on Linux, compiled with Clang).
  • The platform dependencies are abstracted away to enable porting to other platforms.
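
The batching idea in the second bullet can be illustrated with a minimal sketch. This is not snmalloc's actual implementation: `Node`, `RemoteQueue`, `push_batch`, `take_all`, and `length` are hypothetical names chosen for illustration. A freeing thread links objects into a private list and then publishes the whole list with one successful compare-and-swap; the owning thread detaches everything it has received with one exchange.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical intrusive node: a freed object's own storage holds the link.
struct Node
{
  Node* next = nullptr;
};

// Hypothetical multi-producer, single-consumer inbox owned by one allocator.
class RemoteQueue
{
  std::atomic<Node*> head{nullptr};

public:
  // Publish a pre-linked batch first -> ... -> last. No locks are taken;
  // the whole batch becomes visible through one successful compare-exchange.
  void push_batch(Node* first, Node* last)
  {
    assert(first != nullptr && last != nullptr);
    Node* old_head = head.load(std::memory_order_relaxed);
    do
    {
      // On a failed exchange old_head is refreshed, so relink and retry.
      last->next = old_head;
    } while (!head.compare_exchange_weak(
      old_head, first, std::memory_order_release, std::memory_order_relaxed));
  }

  // Owner thread: detach everything received so far with a single exchange.
  Node* take_all()
  {
    return head.exchange(nullptr, std::memory_order_acquire);
  }
};

// Count the nodes in a detached list (demonstration helper).
inline std::size_t length(const Node* n)
{
  std::size_t count = 0;
  for (; n != nullptr; n = n->next)
    ++count;
  return count;
}
```

With this shape, the cost of the atomic operation is paid once per batch rather than once per freed object, which is the property the bullet describes.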

snmalloc's design is particularly well suited to the following two difficult scenarios that can be problematic for other allocators:

  • Allocations on one thread are freed by a different thread
  • Deallocations occur in large batches

Both of these can cause massive reductions in performance of other allocators, but do not for snmalloc.

Comprehensive details about snmalloc's design can be found in the accompanying paper, and differences between the paper and the current implementation are described here. Since writing the paper, the performance of snmalloc has improved considerably.

Further documentation

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Comments
  • Rewrite Apple PAL using native APIs

    This is a rewrite of the Apple PAL that implements the AlignedAllocation, Entropy, and LazyCommit features through native Apple APIs.

    It adds a dependency on Security.framework via SecRandomCopyBytes. Apple actively discourages use of getentropy and the symbol is not allowed on the App Store.

    TODO / Discussion

    USE_POSIX_COMMIT_CHECKS

    I removed support for USE_POSIX_COMMIT_CHECKS because the POSIX memory management APIs are no longer used but I'd like to avoid a regression. I need some context about what the memset and mprotect calls in pal_bsd.h and pal_posix.h are supposed to accomplish. Is it supposed to emulate Windows' decommit behavior? If so, why do the pages need to be decommitted? What are the invariants?

    I hope to use the mach APIs to implement the behavior if possible and prevent any M1-related bugs:

    • #278
    • #280
    • #284

    Can the size parameter of reserve_aligned be greater than 4GiB?

    There are Apple equivalents of ReclaimVirtualMemory and OfferVirtualMemory, but they only work on regions that are 4GiB and smaller. (These APIs have existed since macOS 10.6 and are present in the iOS SDK.)

    The 4GiB limit is a hard limit in the XNU kernel even on 64-bit platforms.
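
If larger sizes are possible, one workaround would be to apply the limited operation piecewise. The sketch below is an assumption for discussion, not code from this PR: `kMaxPiece` and `for_each_piece` are hypothetical names, and the callback stands in for whichever call is subject to the 4GiB limit.

```cpp
#include <cstddef>
#include <cstdint>

// Per-call limit stated above: 4 GiB.
constexpr std::uint64_t kMaxPiece = std::uint64_t{4} << 30;

// Invoke op(address, length) on consecutive pieces of [base, base + size),
// each at most kMaxPiece bytes long. Returns the number of calls made.
template<typename Op>
std::size_t for_each_piece(std::uint64_t base, std::uint64_t size, Op op)
{
  std::size_t calls = 0;
  while (size > 0)
  {
    const std::uint64_t len = size < kMaxPiece ? size : kMaxPiece;
    op(base, len);
    base += len;
    size -= len;
    ++calls;
  }
  return calls;
}
```

For a 9 GiB region this issues three calls, covering 4 GiB, 4 GiB, and 1 GiB.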

    ~~LowMemoryNotification~~

    ~~I have a working implementation behind the feature SNMALLOC_APPLE_LOWMEMORYNOTIFICATION. I'll push the change after we resolve USE_POSIX_COMMIT_CHECKS.~~

    opened by amari 36
  • Investigate Windows Commit charge for snmalloc

    @aganea has enabled snmalloc, mimalloc, and rpmalloc as allocators for lld-link, and has benchmarked them by performing ThinLTO on a clang build. The results, taken from https://reviews.llvm.org/D71786, are:

    | Allocator               | Wall clock    | Page ranges committed/decommitted | Total touched pages | Peak Mem |
    |-------------------------|---------------|-----------------------------------|---------------------|----------|
    | Windows 10 version 2004 | 38 min 47 sec |                                   |                     | 14.9 GB  |
    | mimalloc                | 2 min 22 sec  | 1,449,501                         | 174.3 GB            | 19.8 GB  |
    | rpmalloc                | 2 min 15 sec  | 270,796                           | 45.9 GB             | 31.9 GB  |
    | snmalloc                | 2 min 19 sec  | 102,839                           | 47.0 GB             | 42.0 GB  |

    The time is pretty comparable, but this shows snmalloc on Windows as committing considerably more memory than other allocators.

    Experiments to try

    • Different "chunk" sizes: 16MiB versus 1MiB.
    • Sub-chunk commit/decommit operations.
    opened by mjp41 24
  • CHERI Preparatory work

    Some portability fixes and some changes to pointer provenance throughout. No explicit MIPS or CHERI support yet (on this branch), but perhaps subsets of this are ready to merge and others deserve a chance to receive commentary. :)

    opened by nwf 23
  • Adding Rust Support

    I am trying to bind snmalloc to Rust to have a comprehensive malloc-bench (WIP).

    Basically, I have already created several bindings: snmalloc, snmalloc-sys, and snmallocator.

    However, there is an issue with providing an efficient realloc with alignment. I added one following the code in malloc.h, but I do not know whether it is sound and safe, so I need some guidance here.
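
For discussion, a conservative baseline for aligned reallocation is allocate, copy, free. The sketch below is written against the standard C++17 aligned `operator new` and `operator delete` rather than snmalloc's API, and `aligned_realloc` is a hypothetical name. Like Rust's `GlobalAlloc::realloc`, it receives the old size and alignment from the caller. It never resizes in place, so it is a correctness baseline rather than the efficient version asked about.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

// Hypothetical helper: move `ptr` (old_size bytes, aligned to `align`) into a
// new block of new_size bytes with the same alignment. Returns nullptr on
// failure, in which case the original block is left untouched.
inline void* aligned_realloc(
  void* ptr, std::size_t old_size, std::size_t align, std::size_t new_size)
{
  // Alignment must be a non-zero power of two.
  assert(align != 0 && (align & (align - 1)) == 0);

  void* fresh = ::operator new(new_size, std::align_val_t{align}, std::nothrow);
  if (fresh == nullptr)
    return nullptr;

  if (ptr != nullptr)
  {
    // Copy only the bytes that exist in both the old and the new block.
    std::memcpy(fresh, ptr, std::min(old_size, new_size));
    ::operator delete(ptr, std::align_val_t{align});
  }
  return fresh;
}
```

The copy length is the smaller of the two sizes, which keeps the shrinking case from writing past the end of the new block.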

    opened by SchrodingerZhu 22
  • Free list corruption during random allocations trying to use as Unreal Engine allocator

    Hi! I've only learned about the existence of this library today. I am always curious about new high performance allocators, so I tried to create an implementation of snmalloc for Unreal Engine. Unreal Engine has their own memory allocator abstraction, so it should be fairly easy to do based on your header-only library mode. I also based it on their implementation of mimalloc ( https://github.com/EpicGames/UnrealEngine/blob/4.27/Engine/Source/Runtime/Core/Private/HAL/MallocMimalloc.cpp ).

    Unreal Engine has some peculiarities in how it uses allocation functions: it might call Realloc with a size of 0 to free, Realloc without an existing pointer, or Malloc with a size of 0 expecting nullptr back. This wrapper takes care of all of these.

    This is the result. It seems pretty straightforward, but do let me know if I am using something clearly wrong.

    using UnrealBuildTool;
    using System.IO;
    
    public class snmalloc : ModuleRules
    {
    	public snmalloc(ReadOnlyTargetRules Target) : base(Target)
    	{
    		Type = ModuleType.External;
    
    		if (Target.Platform.IsInGroup(UnrealPlatformGroup.Windows))
    		{
    			PublicSystemIncludePaths.Add(Target.UEThirdPartySourceDirectory + "snmalloc\\src");
    		}
    	}
    }
    
    #pragma once
    
    #include "CoreTypes.h"
    #include "HAL/PlatformMemory.h"
    #include "HAL/MemoryBase.h"
    
    #if PLATFORM_WINDOWS
    
    class FMallocSnmalloc final
    	: public FMalloc
    {
    public:
    	// FMalloc interface.
    	virtual void* Malloc(SIZE_T Size, uint32 Alignment) override;
    	virtual void* TryMalloc(SIZE_T Size, uint32 Alignment) override;
    	virtual void* Realloc(void* Ptr, SIZE_T NewSize, uint32 Alignment) override;
    	virtual void* TryRealloc(void* Ptr, SIZE_T NewSize, uint32 Alignment) override;
    	virtual void Free(void* Ptr) override;
    	virtual bool GetAllocationSize(void* Original, SIZE_T &SizeOut) override;
    	virtual void Trim(bool bTrimThreadCaches) override;
    	virtual SIZE_T QuantizeSize(SIZE_T Count, uint32 Alignment) override;
    
    	virtual bool IsInternallyThreadSafe() const override
    	{ 
    		return true;
    	}
    
    	virtual const TCHAR* GetDescriptiveName() override
    	{
    		return TEXT("Snmalloc");
    	}
    
    protected:
    	void OutOfMemory(uint64 Size, uint32 Alignment)
    	{
    		// this is expected not to return
    		FPlatformMemory::OnOutOfMemory(Size, Alignment);
    	}
    };
    
    #endif
    
    
    #include "HAL/MallocSnmalloc.h"
    #include "Math/UnrealMathUtility.h"
    #include "HAL/UnrealMemory.h"
    
    #if PLATFORM_WINDOWS
    
    #define SNMALLOC_CHECK_CLIENT
    #define SNMALLOC_USE_CXX17
    #define SNMALLOC_TRACING
    #define SNMALLOC_CHECK_LOADS
    
    THIRD_PARTY_INCLUDES_START
    #include "Windows/PreWindowsApi.h"
    #pragma warning(push)
    #pragma warning(disable:4582)
    #pragma warning(disable:4701)
    #pragma warning(disable:4703)
    #include <snmalloc.h>
    #pragma warning(pop)
    #include "Windows/PostWindowsApi.h"
    THIRD_PARTY_INCLUDES_END
    
    void* FMallocSnmalloc::TryMalloc(SIZE_T Size, uint32 Alignment)
    {
    	if (!Size)
    	{
    		return nullptr;
    	}
    
    #if !UE_BUILD_SHIPPING
    	uint64 LocalMaxSingleAlloc = MaxSingleAlloc.Load(EMemoryOrder::Relaxed);
    	if (LocalMaxSingleAlloc != 0 && Size > LocalMaxSingleAlloc)
    	{
    		return nullptr;
    	}
    #endif
    
    	if (Alignment != DEFAULT_ALIGNMENT)
    	{
    		check(Alignment <= snmalloc::natural_alignment(Size));
    	}
    
    	return snmalloc::ThreadAlloc::get().alloc(Size);
    }
    
    void* FMallocSnmalloc::Malloc(SIZE_T Size, uint32 Alignment)
    {
    	void* NewPtr = TryMalloc(Size, Alignment);
    
    	if (NewPtr == nullptr && Size)
    	{
    		OutOfMemory(Size, Alignment);
    	}
    
    	return NewPtr;
    }
    
    void* FMallocSnmalloc::TryRealloc(void* Ptr, SIZE_T NewSize, uint32 Alignment)
    {
    	if (!Ptr)
    	{
    		return TryMalloc(NewSize, Alignment);
    	}
    
    	if (!NewSize)
    	{
    		Free(Ptr);
    		return nullptr;
    	}
    
    #if !UE_BUILD_SHIPPING
    	uint64 LocalMaxSingleAlloc = MaxSingleAlloc.Load(EMemoryOrder::Relaxed);
    	if (LocalMaxSingleAlloc != 0 && NewSize > LocalMaxSingleAlloc)
    	{
    		return nullptr;
    	}
    #endif
    
    	auto& Alloc = snmalloc::ThreadAlloc::get();
    	SIZE_T AllocatedSize = Alloc.alloc_size(Ptr);
    
    	// Keep the current allocation if the given size is in the same size class.
    	if (AllocatedSize == snmalloc::round_size(NewSize))
    	{
    		return Ptr;
    	}
    
    	if (Alignment != DEFAULT_ALIGNMENT)
    	{
    		check(Alignment <= snmalloc::natural_alignment(NewSize));
    	}
    
    	void* NewPtr = Alloc.alloc(NewSize);
    
    	if (NewPtr)
    	{
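    		// Copy only what fits in the smaller of the two allocations; copying
    		// AllocatedSize bytes would overrun NewPtr when the block shrinks.
    		FMemory::Memcpy(NewPtr, Ptr, FMath::Min(AllocatedSize, NewSize));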
    		Alloc.dealloc(Ptr);
    	}
    
    	return NewPtr;
    }
    
    void* FMallocSnmalloc::Realloc(void* Ptr, SIZE_T NewSize, uint32 Alignment)
    {
    	void* NewPtr = TryRealloc(Ptr, NewSize, Alignment);
    
    	if (NewPtr == nullptr && NewSize)
    	{
    		OutOfMemory(NewSize, Alignment);
    	}
    
    	return NewPtr;
    }
    
    void FMallocSnmalloc::Free(void* Ptr)
    {
    	if (!Ptr)
    	{
    		return;
    	}
    
    	snmalloc::ThreadAlloc::get().dealloc(Ptr);
    }
    
    bool FMallocSnmalloc::GetAllocationSize(void* Original, SIZE_T &SizeOut)
    {
    	SizeOut = snmalloc::ThreadAlloc::get().alloc_size(Original);
    	return true;
    }
    
    void FMallocSnmalloc::Trim(bool bTrimThreadCaches)
    {
    	snmalloc::cleanup_unused<snmalloc::Globals>();
    }
    
    SIZE_T FMallocSnmalloc::QuantizeSize(SIZE_T Count, uint32 Alignment)
    {
    	return snmalloc::round_size(Count);
    }
    
    #endif
    
    

    I've tried the latest main branch, the main branch before the big "Buddy" refactor, and the release from 2021. I'm running the latest Windows 10 on a Ryzen 5950X and building with the latest MSVC, though Unreal Engine is only compiled in C++17 mode. I also enabled all the debug features I found. Unfortunately, the result is always the same: it crashes during some random allocation very early in startup, before any additional threads are even started.

    This is the output of the tracing.

    Run init_impl
    Making an allocator.
    rsize 16
    slab size 16384
    Alloc chunk: 000003F768A00000 (16384)
    Run init_impl
    Attach cache to 000003F768C28000
    init(): [email protected]
    Using C++ destructor clean up
    Alloc chunk: 000003F768A40000 (262144)
    size 262144 pow2 size 18
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A04000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A08000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A0C000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A10000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A14000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A18000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A1C000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A20000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A24000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A28000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A2C000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A30000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A34000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A38000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A3C000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A80000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A84000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A88000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A8C000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A90000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A94000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A98000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768A9C000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768AA0000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768AA4000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768AA8000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768AAC000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768AB0000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768AB4000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768AB8000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768ABC000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768AC0000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768AC4000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768AC8000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768ACC000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768AD0000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768AD4000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768AD8000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768ADC000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768AE0000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768AE4000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768AE8000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768AEC000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768AF0000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768AF4000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768AF8000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768AFC000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B00000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B04000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B08000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B0C000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B10000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B14000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B18000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B1C000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B20000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B24000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B28000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B2C000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B30000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B34000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B38000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B3C000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B40000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B44000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B48000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B4C000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B50000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B54000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B58000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B5C000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B60000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B64000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B68000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B6C000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B70000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B74000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B78000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B7C000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B80000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B84000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B88000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B8C000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B90000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B94000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B98000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768B9C000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BA0000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BA4000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BA8000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BAC000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BB0000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BB4000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BB8000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BBC000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BC0000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BC4000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BC8000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BCC000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BD0000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BD4000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BD8000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BDC000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BE0000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BE4000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BE8000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BEC000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BF0000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BF4000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BF8000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768BFC000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768000000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768004000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768008000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F76800C000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768010000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768014000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768018000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F76801C000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768020000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768024000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768028000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F76802C000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768030000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768034000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768038000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F76803C000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768040000 (16384)
    rsize 1024
    slab size 16384
    Alloc chunk: 000003F768044000 (16384)
    rsize 64
    slab size 16384
    Alloc chunk: 000003F768048000 (16384)
    rsize 512
    slab size 16384
    Alloc chunk: 000003F76804C000 (16384)
    rsize 32
    slab size 16384
    Alloc chunk: 000003F768050000 (16384)
    rsize 128
    slab size 16384
    Alloc chunk: 000003F768054000 (16384)
    rsize 768
    slab size 16384
    Alloc chunk: 000003F768058000 (16384)
    Alloc chunk: 000003F768080000 (524288)
    size 524280 pow2 size 19
    rsize 48
    slab size 16384
    Alloc chunk: 000003F76805C000 (16384)
    rsize 384
    slab size 16384
    Alloc chunk: 000003F768060000 (16384)
    rsize 256
    slab size 16384
    Alloc chunk: 000003F768064000 (16384)
    rsize 96
    slab size 16384
    Alloc chunk: 000003F768068000 (16384)
    rsize 80
    slab size 16384
    Alloc chunk: 000003F76806C000 (16384)
    rsize 640
    slab size 16384
    Alloc chunk: 000003F768070000 (16384)
    rsize 160
    slab size 16384
    Alloc chunk: 000003F768074000 (16384)
    rsize 192
    slab size 16384
    Alloc chunk: 000003F768078000 (16384)
    rsize 112
    slab size 16384
    Alloc chunk: 000003F76807C000 (16384)
    rsize 320
    slab size 16384
    Alloc chunk: 000003F768100000 (16384)
    rsize 224
    slab size 16384
    Alloc chunk: 000003F768104000 (16384)
    rsize 5120
    slab size 131072
    Alloc chunk: 000003F768120000 (131072)
    rsize 448
    slab size 16384
    Alloc chunk: 000003F768108000 (16384)
    rsize 896
    slab size 16384
    Alloc chunk: 000003F76810C000 (16384)
    Pal range alloc: 0x3f767e70000 (0x1000000)
    Pagemap.Set 0x3f768a00000
    Pagemap.Set 0x3f768a40000
    Pagemap.Set 0x3f768a44000
    Pagemap.Set 0x3f768a48000
    Pagemap.Set 0x3f768a4c000
    Pagemap.Set 0x3f768a50000
    Pagemap.Set 0x3f768a54000
    Pagemap.Set 0x3f768a58000
    Pagemap.Set 0x3f768a5c000
    Pagemap.Set 0x3f768a60000
    Pagemap.Set 0x3f768a64000
    Pagemap.Set 0x3f768a68000
    Pagemap.Set 0x3f768a6c000
    Pagemap.Set 0x3f768a70000
    Pagemap.Set 0x3f768a74000
    Pagemap.Set 0x3f768a78000
    Pagemap.Set 0x3f768a7c000
    Pagemap.Set 0x3f768a04000
    Pagemap.Set 0x3f768a08000
    Pagemap.Set 0x3f768a0c000
    Pagemap.Set 0x3f768a10000
    Pagemap.Set 0x3f768a14000
    Pagemap.Set 0x3f768a18000
    Pagemap.Set 0x3f768a1c000
    Pagemap.Set 0x3f768a20000
    Pagemap.Set 0x3f768a24000
    Pagemap.Set 0x3f768a28000
    Pagemap.Set 0x3f768a2c000
    Pagemap.Set 0x3f768a30000
    Pagemap.Set 0x3f768a34000
    Pagemap.Set 0x3f768a38000
    Pagemap.Set 0x3f768a3c000
    Pagemap.Set 0x3f768a80000
    Pagemap.Set 0x3f768a84000
    Pagemap.Set 0x3f768a88000
    Pagemap.Set 0x3f768a8c000
    Pagemap.Set 0x3f768a90000
    Pagemap.Set 0x3f768a94000
    Pagemap.Set 0x3f768a98000
    Pagemap.Set 0x3f768a9c000
    Pagemap.Set 0x3f768aa0000
    Pagemap.Set 0x3f768aa4000
    Pagemap.Set 0x3f768aa8000
    Pagemap.Set 0x3f768aac000
    Pagemap.Set 0x3f768ab0000
    Pagemap.Set 0x3f768ab4000
    Pagemap.Set 0x3f768ab8000
    Pagemap.Set 0x3f768abc000
    Pagemap.Set 0x3f768ac0000
    Pagemap.Set 0x3f768ac4000
    Pagemap.Set 0x3f768ac8000
    Pagemap.Set 0x3f768acc000
    Pagemap.Set 0x3f768ad0000
    Pagemap.Set 0x3f768ad4000
    Pagemap.Set 0x3f768ad8000
    Pagemap.Set 0x3f768adc000
    Pagemap.Set 0x3f768ae0000
    Pagemap.Set 0x3f768ae4000
    Pagemap.Set 0x3f768ae8000
    Pagemap.Set 0x3f768aec000
    Pagemap.Set 0x3f768af0000
    Pagemap.Set 0x3f768af4000
    Pagemap.Set 0x3f768af8000
    Pagemap.Set 0x3f768afc000
    Pagemap.Set 0x3f768b00000
    Pagemap.Set 0x3f768b04000
    Pagemap.Set 0x3f768b08000
    Pagemap.Set 0x3f768b0c000
    Pagemap.Set 0x3f768b10000
    Pagemap.Set 0x3f768b14000
    Pagemap.Set 0x3f768b18000
    Pagemap.Set 0x3f768b1c000
    Pagemap.Set 0x3f768b20000
    Pagemap.Set 0x3f768b24000
    Pagemap.Set 0x3f768b28000
    Pagemap.Set 0x3f768b2c000
    Pagemap.Set 0x3f768b30000
    Pagemap.Set 0x3f768b34000
    Pagemap.Set 0x3f768b38000
    Pagemap.Set 0x3f768b3c000
    Pagemap.Set 0x3f768b40000
    Pagemap.Set 0x3f768b44000
    Pagemap.Set 0x3f768b48000
    Pagemap.Set 0x3f768b4c000
    Pagemap.Set 0x3f768b50000
    Pagemap.Set 0x3f768b54000
    Pagemap.Set 0x3f768b58000
    Pagemap.Set 0x3f768b5c000
    Pagemap.Set 0x3f768b60000
    Pagemap.Set 0x3f768b64000
    Pagemap.Set 0x3f768b68000
    Pagemap.Set 0x3f768b6c000
    Pagemap.Set 0x3f768b70000
    Pagemap.Set 0x3f768b74000
    Pagemap.Set 0x3f768b78000
    Pagemap.Set 0x3f768b7c000
    Pagemap.Set 0x3f768b80000
    Pagemap.Set 0x3f768b84000
    Pagemap.Set 0x3f768b88000
    Pagemap.Set 0x3f768b8c000
    Pagemap.Set 0x3f768b90000
    Pagemap.Set 0x3f768b94000
    Pagemap.Set 0x3f768b98000
    Pagemap.Set 0x3f768b9c000
    Pagemap.Set 0x3f768ba0000
    Pagemap.Set 0x3f768ba4000
    Pagemap.Set 0x3f768ba8000
    Pagemap.Set 0x3f768bac000
    Pagemap.Set 0x3f768bb0000
    Pagemap.Set 0x3f768bb4000
    Pagemap.Set 0x3f768bb8000
    Pagemap.Set 0x3f768bbc000
    Pagemap.Set 0x3f768bc0000
    Pagemap.Set 0x3f768bc4000
    Pagemap.Set 0x3f768bc8000
    Pagemap.Set 0x3f768bcc000
    Pagemap.Set 0x3f768bd0000
    Pagemap.Set 0x3f768bd4000
    Pagemap.Set 0x3f768bd8000
    Pagemap.Set 0x3f768bdc000
    Pagemap.Set 0x3f768be0000
    Pagemap.Set 0x3f768be4000
    Pagemap.Set 0x3f768be8000
    Pagemap.Set 0x3f768bec000
    Pagemap.Set 0x3f768bf0000
    Pagemap.Set 0x3f768bf4000
    Pagemap.Set 0x3f768bf8000
    Pagemap.Set 0x3f768bfc000
    Pagemap.Set 0x3f768000000
    Pagemap.Set 0x3f768004000
    Pagemap.Set 0x3f768008000
    Pagemap.Set 0x3f76800c000
    Pagemap.Set 0x3f768010000
    Pagemap.Set 0x3f768014000
    Pagemap.Set 0x3f768018000
    Pagemap.Set 0x3f76801c000
    Pagemap.Set 0x3f768020000
    Pagemap.Set 0x3f768024000
    Pagemap.Set 0x3f768028000
    Pagemap.Set 0x3f76802c000
    Pagemap.Set 0x3f768030000
    Pagemap.Set 0x3f768034000
    Pagemap.Set 0x3f768038000
    Pagemap.Set 0x3f76803c000
    Pagemap.Set 0x3f768040000
    Pagemap.Set 0x3f768044000
    Pagemap.Set 0x3f768048000
    Pagemap.Set 0x3f76804c000
    Pagemap.Set 0x3f768050000
    Pagemap.Set 0x3f768054000
    Pagemap.Set 0x3f768058000
    Pagemap.Set 0x3f768080000
    Pagemap.Set 0x3f768084000
    Pagemap.Set 0x3f768088000
    Pagemap.Set 0x3f76808c000
    Pagemap.Set 0x3f768090000
    Pagemap.Set 0x3f768094000
    Pagemap.Set 0x3f768098000
    Pagemap.Set 0x3f76809c000
    Pagemap.Set 0x3f7680a0000
    Pagemap.Set 0x3f7680a4000
    Pagemap.Set 0x3f7680a8000
    Pagemap.Set 0x3f7680ac000
    Pagemap.Set 0x3f7680b0000
    Pagemap.Set 0x3f7680b4000
    Pagemap.Set 0x3f7680b8000
    Pagemap.Set 0x3f7680bc000
    Pagemap.Set 0x3f7680c0000
    Pagemap.Set 0x3f7680c4000
    Pagemap.Set 0x3f7680c8000
    Pagemap.Set 0x3f7680cc000
    Pagemap.Set 0x3f7680d0000
    Pagemap.Set 0x3f7680d4000
    Pagemap.Set 0x3f7680d8000
    Pagemap.Set 0x3f7680dc000
    Pagemap.Set 0x3f7680e0000
    Pagemap.Set 0x3f7680e4000
    Pagemap.Set 0x3f7680e8000
    Pagemap.Set 0x3f7680ec000
    Pagemap.Set 0x3f7680f0000
    Pagemap.Set 0x3f7680f4000
    Pagemap.Set 0x3f7680f8000
    Pagemap.Set 0x3f7680fc000
    Pagemap.Set 0x3f76805c000
    Pagemap.Set 0x3f768060000
    Pagemap.Set 0x3f768064000
    Pagemap.Set 0x3f768068000
    Pagemap.Set 0x3f76806c000
    Pagemap.Set 0x3f768070000
    Pagemap.Set 0x3f768074000
    Pagemap.Set 0x3f768078000
    Pagemap.Set 0x3f76807c000
    Pagemap.Set 0x3f768100000
    Pagemap.Set 0x3f768104000
    Pagemap.Set 0x3f768120000
    Pagemap.Set 0x3f768124000
    Pagemap.Set 0x3f768128000
    Pagemap.Set 0x3f76812c000
    Pagemap.Set 0x3f768130000
    Pagemap.Set 0x3f768134000
    Pagemap.Set 0x3f768138000
    Pagemap.Set 0x3f76813c000
    Pagemap.Set 0x3f768108000
    Pagemap.Set 0x3f76810c000
    Pagemap.Set 0x3f768110000
    Pagemap.Set 0x3f768114000
    Pagemap.Set 0x3f768118000
    Pagemap.Set 0x3f76811c000
    Pagemap.Set 0x3f768140000
    Pagemap.Set 0x3f768144000
    Pagemap.Set 0x3f768150000
    Pagemap.Set 0x3f768154000
    Pagemap.Set 0x3f768158000
    Pagemap.Set 0x3f76815c000
    Pagemap.Set 0x3f768160000
    Pagemap.Set 0x3f768164000
    Pagemap.Set 0x3f768168000
    Pagemap.Set 0x3f76816c000
    Pagemap.Set 0x3f768170000
    Pagemap.Set 0x3f768174000
    Pagemap.Set 0x3f768178000
    Pagemap.Set 0x3f76817c000
    Pagemap.Set 0x3f768148000
    Pagemap.Set 0x3f768180000
    Pagemap.Set 0x3f768184000
    Pagemap.Set 0x3f768188000
    Pagemap.Set 0x3f76818c000
    Pagemap.Set 0x3f768190000
    Pagemap.Set 0x3f768194000
    Pagemap.Set 0x3f768198000
    Pagemap.Set 0x3f76819c000
    Pagemap.Set 0x3f76814c000
    Pagemap.Set 0x3f7681c0000
    Pagemap.Set 0x3f7681c4000
    Pagemap.Set 0x3f7681c8000
    Pagemap.Set 0x3f7681cc000
    Pagemap.Set 0x3f7681d0000
    Pagemap.Set 0x3f7681d4000
    Pagemap.Set 0x3f7681d8000
    Pagemap.Set 0x3f7681dc000
    Pagemap.Set 0x3f7681e0000
    Pagemap.Set 0x3f7681e4000
    Pagemap.Set 0x3f7681e8000
    Pagemap.Set 0x3f7681ec000
    Pagemap.Set 0x3f7681f0000
    Pagemap.Set 0x3f7681f4000
    Pagemap.Set 0x3f7681f8000
    Pagemap.Set 0x3f7681fc000
    Pagemap.Set 0x3f7681a0000
    Pagemap.Set 0x3f7681a4000
    rsize 1536
    slab size 32768
    Alloc chunk: 000003F768110000 (32768)
    rsize 1792
    slab size 32768
    Alloc chunk: 000003F768118000 (32768)
    Slab is woken up
    rsize 1280
    slab size 32768
    Alloc chunk: 000003F768140000 (32768)
    rsize 3072
    slab size 65536
    Alloc chunk: 000003F768150000 (65536)
    Slab is woken up
    Slab is woken up
    Slab is woken up
    rsize 2560
    slab size 65536
    Alloc chunk: 000003F768160000 (65536)
    Slab is woken up
    Slab is woken up
    rsize 4096
    slab size 65536
    Alloc chunk: 000003F768170000 (65536)
    rsize 512
    slab size 16384
    Alloc chunk: 000003F768148000 (16384)
    Slab is woken up
    Slab is woken up
    Slab is woken up
    Slab is woken up
    rsize 6144
    slab size 131072
    Alloc chunk: 000003F768180000 (131072)
    Slab is woken up
    Slab is woken up
    Slab is woken up
    rsize 96
    slab size 16384
    Alloc chunk: 000003F76814C000 (16384)
    Slab is woken up
    Slab is woken up
    rsize 10240
    slab size 262144
    Alloc chunk: 000003F7681C0000 (262144)
    Slab is woken up
    Slab is woken up
    Slab is woken up
    rsize 8192
    slab size 131072
    Alloc chunk: 000003F7681A0000 (131072)
    rsize 384
    slab size 16384
    Alloc chunk: 000003F768200000 (16384)
    Pagemap.Set 0x3f7681a8000
    Pagemap.Set 0x3f7681ac000
    Pagemap.Set 0x3f7681b0000
    Pagemap.Set 0x3f7681b4000
    Pagemap.Set 0x3f7681b8000
    Pagemap.Set 0x3f7681bc000
    Pagemap.Set 0x3f768200000
    Heap corruption - free list corrupted!
    Exception: Exception 0xc0000409 encountered at address 0x7ffd37a2286e
    

    Any ideas what might be going wrong here? Or suggestions on how to proceed debugging it?

    opened by Zeblote 18
  • add rust support

    add rust support

    This PR tracks the work of adding a static linking target for Rust support.

    My suggestion is that the merging should be considered after the discussions in #109 and https://github.com/rust-lang/rust/pull/68381 are finished.

    opened by SchrodingerZhu 17
  • Fix building on old libc systems without `getentropy`

    Fix building on old libc systems without `getentropy`

    Old libc versions (e.g. glibc < 2.25) do not provide a getentropy function.

    As discussed in #328 I added a CMake compile check and exposed a flag LIBC_HAS_GETENTROPY and put the getentropy into #ifdefs, while also disabling the Entropy PAL feature.
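
    A minimal sketch of the kind of guard described above, assuming the CMake check defines LIBC_HAS_GETENTROPY when the function is available. The helper name get_entropy64_sketch and the /dev/urandom fallback are illustrative assumptions only; the PR itself disables the entropy feature rather than falling back.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>

#ifdef LIBC_HAS_GETENTROPY
#  include <unistd.h> // getentropy (glibc >= 2.25)
#endif

// Hypothetical helper, not part of snmalloc's PAL.
inline uint64_t get_entropy64_sketch()
{
  uint64_t value = 0;
#ifdef LIBC_HAS_GETENTROPY
  // Preferred path: ask libc directly.
  if (getentropy(&value, sizeof(value)) != 0)
    throw std::runtime_error("getentropy failed");
#else
  // Assumed fallback for old libcs: read the bytes from /dev/urandom.
  std::FILE* f = std::fopen("/dev/urandom", "rb");
  if (f == nullptr)
    throw std::runtime_error("cannot open /dev/urandom");
  std::size_t n = std::fread(&value, 1, sizeof(value), f);
  std::fclose(f);
  if (n != sizeof(value))
    throw std::runtime_error("short read from /dev/urandom");
#endif
  return value;
}
```

    Keeping the platform check in one macro means call sites never mention getentropy directly, so an old libc only changes which branch is compiled.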

    Keeping this as a draft for now, as I didn't check on every platform. On a Linux system with libc 2.17 it compiles fine with this change, where it previously broke.

    Should I verify with some test that the PAL is actually providing some good entropy?

    opened by mfelsche 16
  • Question: when snmalloc release memory back to os

    Question: when snmalloc release memory back to os

    [image]

    I'm trying different memory allocators for a key-value store project. On the left is the RSS usage with snmalloc (using the Rust crate) and on the right is jemalloc (using the tikv-jemalloc crate, which provides bindings for 5.2.1). I wonder when snmalloc will release memory back to the OS?

    opened by photoszzt 16
  • Compile-time fixes for Clang 10

    Compile-time fixes for Clang 10

    Hello, Would you mind applying these fixes to make the codebase compile with Clang 10 please? Many thanks!

    diff --git a/src/ds/aba.h b/src/ds/aba.h
    index 21950ff..60dcc2f 100644
    --- a/src/ds/aba.h
    +++ b/src/ds/aba.h
    @@ -121,6 +121,7 @@ namespace snmalloc
           }
     
           Cmp(const Cmp&) = delete;
    +      Cmp(Cmp&&) = default;
         };
     
         // This method is used in Verona
    diff --git a/src/ds/helpers.h b/src/ds/helpers.h
    index b479011..8f5b83c 100644
    --- a/src/ds/helpers.h
    +++ b/src/ds/helpers.h
    @@ -14,7 +14,7 @@ namespace snmalloc
       class Singleton
       {
         inline static std::atomic_flag flag;
    -    inline static std::atomic<bool> initialised = false;
    +    inline static std::atomic<bool> initialised{};
         inline static Object obj;
     
       public:
    diff --git a/src/ds/mpscq.h b/src/ds/mpscq.h
    index d5d5161..a397354 100644
    --- a/src/ds/mpscq.h
    +++ b/src/ds/mpscq.h
    @@ -14,7 +14,7 @@ namespace snmalloc
           std::is_same<decltype(T::next), std::atomic<T*>>::value,
           "T->next must be a std::atomic<T*>");
     
    -    std::atomic<T*> back = nullptr;
    +    std::atomic<T*> back{};
         T* front = nullptr;
     
       public:
    @@ -72,10 +72,10 @@ namespace snmalloc
             SNMALLOC_ASSERT(front);
             std::atomic_thread_fence(std::memory_order_acquire);
             invariant();
    -        return std::pair(first, true);
    +        return std::pair<T*, bool>(first, true);
           }
     
    -      return std::pair(nullptr, false);
    +      return std::pair<T*, bool>(nullptr, false);
         }
       };
     } // namespace snmalloc
    diff --git a/src/mem/alloc.h b/src/mem/alloc.h
    index c58015b..4f8aed8 100644
    --- a/src/mem/alloc.h
    +++ b/src/mem/alloc.h
    @@ -240,6 +240,9 @@ namespace snmalloc
         template<size_t size>
         void dealloc(void* p)
         {
    +      if (p == nullptr)
    +        return;
    +
     #ifdef USE_MALLOC
           UNUSED(size);
           return free(p);
    @@ -280,6 +283,9 @@ namespace snmalloc
          */
         SNMALLOC_FAST_PATH void dealloc(void* p, size_t size)
         {
    +      if (p == nullptr)
    +        return;
    +
     #ifdef USE_MALLOC
           UNUSED(size);
           return free(p);
    @@ -302,6 +308,9 @@ namespace snmalloc
     
         SNMALLOC_SLOW_PATH void dealloc_sized_slow(void* p, size_t size)
         {
    +      if (p == nullptr)
    +        return;
    +
           if (size == 0)
             return dealloc(p, 1);
     
    @@ -325,6 +334,9 @@ namespace snmalloc
          */
         SNMALLOC_FAST_PATH void dealloc(void* p)
         {
    +      if (p == nullptr)
    +        return;
    +
     #ifdef USE_MALLOC
           return free(p);
     #else
    diff --git a/src/mem/pagemap.h b/src/mem/pagemap.h
    index fc3da85..e658cd3 100644
    --- a/src/mem/pagemap.h
    +++ b/src/mem/pagemap.h
    @@ -188,7 +188,7 @@ namespace snmalloc
           {
             PagemapEntry* value = get_node<create_addr>(e, result);
             if (unlikely(!result))
    -          return std::pair(nullptr, 0);
    +          return std::pair<Leaf*, size_t>(nullptr, 0);
     
             shift -= BITS_PER_INDEX_LEVEL;
             ix = (static_cast<size_t>(addr) >> shift) & ENTRIES_MASK;
    @@ -208,11 +208,11 @@ namespace snmalloc
           Leaf* leaf = reinterpret_cast<Leaf*>(get_node<create_addr>(e, result));
     
           if (unlikely(!result))
    -        return std::pair(nullptr, 0);
    +        return std::pair<Leaf*, size_t>(nullptr, 0);
     
           shift -= BITS_FOR_LEAF;
           ix = (static_cast<size_t>(addr) >> shift) & LEAF_MASK;
    -      return std::pair(leaf, ix);
    +      return std::pair<Leaf*, size_t>(leaf, ix);
         }
     
         template<bool create_addr>
    diff --git a/src/mem/pooled.h b/src/mem/pooled.h
    index 71f8d91..f044a77 100644
    --- a/src/mem/pooled.h
    +++ b/src/mem/pooled.h
    @@ -14,7 +14,7 @@ namespace snmalloc
         friend class MPMCStack;
     
         /// Used by the pool for chaining together entries when not in use.
    -    std::atomic<T*> next = nullptr;
    +    std::atomic<T*> next{};
         /// Used by the pool to keep the list of all entries ever created.
         T* list_next;
         std::atomic_flag in_use = ATOMIC_FLAG_INIT;
    diff --git a/src/mem/remoteallocator.h b/src/mem/remoteallocator.h
    index 4c1f906..6d8ae55 100644
    --- a/src/mem/remoteallocator.h
    +++ b/src/mem/remoteallocator.h
    @@ -19,7 +19,7 @@ namespace snmalloc
         union
         {
           Remote* non_atomic_next;
    -      std::atomic<Remote*> next = nullptr;
    +      std::atomic<Remote*> next{};
         };
     
         alloc_id_t allocator_id;
    diff --git a/src/mem/slowalloc.h b/src/mem/slowalloc.h
    index f2fcba1..0ab1169 100644
    --- a/src/mem/slowalloc.h
    +++ b/src/mem/slowalloc.h
    @@ -63,6 +63,6 @@ namespace snmalloc
        */
       inline SlowAllocator get_slow_allocator()
       {
    -    return SlowAllocator{};
    +    return {};
       }
     } // namespace snmalloc
    diff --git a/src/pal/pal_consts.h b/src/pal/pal_consts.h
    index 538f90d..fe9a38d 100644
    --- a/src/pal/pal_consts.h
    +++ b/src/pal/pal_consts.h
    @@ -78,7 +78,7 @@ namespace snmalloc
         /**
          * List of callbacks to notify
          */
    -    std::atomic<PalNotificationObject*> callbacks = nullptr;
    +    std::atomic<PalNotificationObject*> callbacks{};
     
       public:
         /**
    diff --git a/src/pal/pal_windows.h b/src/pal/pal_windows.h
    index ba7fcea..8015827 100644
    --- a/src/pal/pal_windows.h
    +++ b/src/pal/pal_windows.h
    @@ -17,7 +17,7 @@
     #  ifdef NTDDI_WIN10_RS5
     #    if (NTDDI_VERSION >= NTDDI_WIN10_RS5) && \
           (WINVER >= _WIN32_WINNT_WIN10) && !defined(USE_SYSTEMATIC_TESTING)
    -#      define PLATFORM_HAS_VIRTUALALLOC2
    +//#      define PLATFORM_HAS_VIRTUALALLOC2
     #    endif
     #  endif
    
    opened by aganea 15
  • Crash on realloc

    Crash on realloc

    Hello, I'm seeing a 100% crash in snmalloc's realloc function, in malloc.cc, L92. The crash is caused by the sz being larger than the size initially allocated. We're trying to (re)allocate a large buffer:

    llvm-mc.exe!realloc(void * ptr=0x0000023244000000, unsigned __int64 size=0x00000000027fffff)
    

    In malloc.cc, L82, Alloc::alloc_size(ptr); finds sz to be 0x0000000002000000 bytes. However the caller knows the buffer is 0x013fffff. So we end up crashing on rep movsb while copying the buffer to the new location. I can see alloc_size(..) returning at alloc.h, L500, ie. return 1ULL << size; where size is 25.
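
    The numbers in this report are consistent with the allocation size being rounded up to a power of two. The following sketch only checks that arithmetic; the helper names are hypothetical and it does not reproduce snmalloc's alloc_size or diagnose the crash.

```cpp
#include <cstdint>

// Hypothetical helper: exponent of the smallest power of two >= n (n > 0).
constexpr unsigned ceil_log2_sketch(uint64_t n)
{
  unsigned bits = 0;
  while ((uint64_t{1} << bits) < n)
    ++bits;
  return bits;
}

// Bytes a realloc-style copy moves when it is bounded by the size the
// allocator reports for the old block and by the new requested size.
constexpr uint64_t copy_length_sketch(uint64_t reported_old, uint64_t new_size)
{
  return reported_old < new_size ? reported_old : new_size;
}

// The caller's buffer of 0x013fffff bytes rounds up to 1 << 25.
static_assert(ceil_log2_sketch(0x013fffff) == 25, "size class exponent");
static_assert((uint64_t{1} << 25) == 0x02000000, "reported size");

// With the reported size, the copy covers 0x02000000 bytes, which is
// 0x00c00001 bytes more than the caller originally asked for.
static_assert(copy_length_sketch(0x02000000, 0x027fffff) == 0x02000000, "");
static_assert(0x02000000 - 0x013fffff == 0x00c00001, "");
```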

    I haven't investigated any further. I am at git checkout a43773c5b7e1fc291ddd1580fec04a574ef886e1. I'm trying to integrate snmalloc in LLVM to compare it with other modern allocators. Currently half a dozen tests are failing because of this. Any suggestions? I can provide a repro if needed.

    opened by aganea 14
  • Deconflate "pagemap"

    Deconflate "pagemap"

    I'm certain there's something wrong with this PR, but I am not certain what, exactly.

    I found it confusing that there were two things that called themselves PageMap and kept tripping over the difference when working on #41. So, in order to reduce my tripping rate and in hopes that I might help others do the same, I propose that we rename the alloc.h Pagemap thing a Slabmap. Not great, but at least different.

    Having done that, it seemed to make sense to ensure that all accesses from the Allocator to the Pagemap were via the Slabmap. And then it became apparent that everything was static, in fact, and so some additional cleanups were possible.

    I suspect there is something I'm not getting; perhaps a different, out-of-tree setting for SNMALLOC_DEFAULT_PAGEMAP breaks this? In any case, let's see what CI thinks about it, since it works for me on my desk...

    opened by nwf 14
  • Loongarch Support

    Loongarch Support

    QEMU 7.1 will begin loongarch support: https://wiki.qemu.org/ChangeLog/7.1#LoongArch.

    Currently, everything works except the check variants:

    qemu-loongarch64 -strace /home/schrodinger/Downloads/loongarch64-clfs-6.0-cross-tools-gcc_and_clang-full/cross-tools/target/usr/lib64/ld-linux-loongarch-lp64d.so.1 --library-path /home/schrodinger/Downloads/loongarch64-clfs-6.0-cross-tools-gcc_and_clang-full/cross-tools/target/usr/lib64 ./func-thread_alloc_external-check
    

    opened by SchrodingerZhu 6
  • __builtin_dynamic_object_size and snmalloc

    __builtin_dynamic_object_size and snmalloc

    On Twitter, @richfelker has said we should consider providing __builtin_dynamic_object_size as a more comprehensive way to provide guarded-memcpy-like features:

    I'd love if there were some clean agreed upon way to get the size as a stronger version of __builtin_dynamic_object_size so this would be usable in all the FORTIFYable interfaces not just memcpy. twitter

    and

    The right way to do this is the way fortify works. Not putting linkage to malloc in the external memcpy, but providing an enhanced __builtin_dynamic_object_size that can query the allocator for knowledge of size. twitter

    I'm raising this issue to build a discussion of what this should mean with snmalloc.

    The Clang documentation is here: https://clang.llvm.org/docs/LanguageExtensions.html#evaluating-object-size-dynamically

    Here is the LLVM review adding it https://reviews.llvm.org/D56760

    It doesn't seem to work from an offset into an object. But I think this is worth experimenting with, as it would allow the compiler to remove some checks, leaving only the remaining ones to be passed to the snmalloc routine.

    I think we could define __builtin_dynamic_object_size as just remaining_bytes from snmalloc.
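
    To make the idea concrete, here is a toy sketch of a guarded copy driven by a "remaining bytes" query. ToyObject and checked_memcpy_sketch are hypothetical names; the query is a stand-in for whatever the allocator would expose, not snmalloc's implementation, and no compiler builtin is involved.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy stand-in for an allocator that can answer "how many bytes remain in
// the object containing p?". It tracks one fixed 64-byte object.
struct ToyObject
{
  unsigned char bytes[64];

  // Bytes from p to the end of the object, or SIZE_MAX when p is not inside
  // it (mirroring the "unknown" answer of the object-size builtins).
  std::size_t remaining_bytes(const void* p) const
  {
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    auto base = reinterpret_cast<std::uintptr_t>(bytes);
    if (addr < base || addr >= base + sizeof(bytes))
      return SIZE_MAX;
    return base + sizeof(bytes) - addr;
  }
};

// Guarded copy: refuses to write past the end of the destination object.
inline bool checked_memcpy_sketch(
  const ToyObject& obj, void* dst, const void* src, std::size_t n)
{
  if (n > obj.remaining_bytes(dst))
    return false;
  std::memcpy(dst, src, n);
  return true;
}
```

    Because the query works from any interior pointer, the same check could in principle back other FORTIFY-style interfaces, which is the point raised in the quoted tweets.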

    @davidchisnall thoughts? Is there something sensible that could be experimented with here?

    opened by mjp41 9
  • Introduce a new compilation option to zero inline metadata pointers

    Introduce a new compilation option to zero inline metadata pointers

    in allocations before returning them to the user. This is important on CHERI to avoid leaking capabilities and may also reduce the attack surface on other architectures. This includes freelist pointers and the RBTree metadata used by smallbuddyallocator.
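
    A minimal sketch of the idea, assuming a toy intrusive free list in which each free object stores its next pointer in its own first bytes. FreeNode and ToyFreeList are hypothetical names; this is not snmalloc's free list or its compilation option.

```cpp
#include <cstring>
#include <new>

// Inline metadata: lives inside the free object itself.
struct FreeNode
{
  FreeNode* next;
};

class ToyFreeList
{
  FreeNode* head = nullptr;

public:
  // Link a freed object (at least sizeof(FreeNode) bytes, suitably aligned)
  // onto the front of the list.
  void push(void* object)
  {
    head = new (object) FreeNode{head};
  }

  // Hand an object back to the user, scrubbing the inline pointer first so
  // the caller cannot observe allocator-internal addresses.
  void* pop_zeroed()
  {
    if (head == nullptr)
      return nullptr;
    FreeNode* node = head;
    head = node->next;
    std::memset(static_cast<void*>(node), 0, sizeof(FreeNode));
    return node;
  }
};
```

    The cost is one small store per allocation, which is why it makes sense as an opt-in option rather than default behaviour.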

    opened by rmn30 2
  • 16 byte cmpxchg

    16 byte cmpxchg

    I think there are two points to improve in current cmpxchg support for 16 byte data structures:

    • consider arm lse
    • consider using __sync_bool_compare_and_swap to force gcc emit inlined instructions.
    #include <atomic>
    #include <cstdint> // int64_t
    #include <cstring> // memcpy fallback
    
    #if defined(__aarch64__) && defined(__clang__)
    #  pragma clang attribute push(__attribute__((target("lse"))),apply_to=function)
    #  define PLATFORM_SPECIFIC_OPTIONS_ENDING _Pragma("clang attribute pop")
    #elif defined(__aarch64__) && defined(__GNUC__)
    #  pragma GCC push_options
    #  pragma GCC target("arch=armv8-a+lse")
    #  define PLATFORM_SPECIFIC_OPTIONS_ENDING _Pragma("GCC pop_options")
    #elif defined(__x86_64__) && defined(__clang__)
    #  pragma clang attribute push(__attribute__((target("cx16"))),apply_to=function)
    #  define PLATFORM_SPECIFIC_OPTIONS_ENDING _Pragma("clang attribute pop")
    #elif defined(__x86_64__) && defined(__GNUC__)
    #  pragma GCC push_options
    #  pragma GCC target("cx16")
    #  define PLATFORM_SPECIFIC_OPTIONS_ENDING _Pragma("GCC pop_options")
    #else 
    #  define PLATFORM_SPECIFIC_OPTIONS_ENDING
    #endif
    
    template<class T>
    __attribute__((always_inline)) inline bool cas(std::atomic<T> &src,
                    T const& __restrict cmp,
                    T const& __restrict with)
    {
        auto inline_copy = [](__int128 * dst, const void * __restrict src) {
    #if __has_builtin(__builtin_inline_memcpy)
        __builtin_inline_memcpy(dst, src, sizeof(__int128));
    #elif __has_builtin(__builtin_memcpy)
        __builtin_memcpy(dst, src, sizeof(__int128));
    #else
        ::memcpy(dst, src, sizeof(__int128));
    #endif
        };
        __int128 cmp_value;
        __int128 with_value;
        inline_copy(&cmp_value, &cmp);
        inline_copy(&with_value, &with);
        return __sync_bool_compare_and_swap(reinterpret_cast<__int128 *>(&src), cmp_value, with_value);
    }
    
    struct A {
        int64_t a, b;
    };
    
    bool cas_test(std::atomic<__int128> &src,
                    __int128 const& cmp,
                    __int128 const& with)
    {
        return cas(src, cmp, with);
    }
    
    bool cas_test(std::atomic<A> &src,
                    A const& cmp,
                    A const& with)
    {
        return cas(src, cmp, with);
    }
    
    PLATFORM_SPECIFIC_OPTIONS_ENDING
    
    bool cas_test2(std::atomic<__int128> &src,
                    __int128 & cmp,
                    __int128 & with)
    {
        return src.compare_exchange_weak(cmp, with);
    }
    
    
    opened by SchrodingerZhu 4
  • How to use the new hardening features

    How to use the new hardening features

    I am not seeing how to use the new hardening features when including snmalloc as a header-only library on Windows. There doesn't seem to be any #define or similar mentioned in the documentation? Is it just enabled by default?

    opened by Zeblote 6
Releases(0.6.1)
  • 0.6.1(Sep 2, 2022)

    Minor release. Largest change is bringing online Morello CHERI support.

    • CHERI support (#537, #542, #532, #543)
    • Improve error messages for checks (#526, #521)
    • Increased checks on client (#520, #550)
    • Portability fixes for Haiku and older versions of glibc (#545, #546, #533)
    • Expose macOS malloc_good_size (#538)
    • General code tidying (#527, #535, #547, #551, #536, #534, #529)
    • Documentation (#551, #522)

    Thanks to the external contributors @panekj, @devnexen and @mfelsche.

  • 0.6.0(May 9, 2022)

    This is a major revision of the snmalloc design. The redesign has been primarily focused on adding new security features. The redesign affects all aspects of snmalloc. A more comprehensive explanation of the new features can be found in the docs.

  • 0.5.3(Jan 29, 2021)

    All minor changes

    • Fix performance regression for Debug builds (#274)
    • Refactoring towards sandbox support (#221, #268, #269)
    • Refactoring towards CHERI support (#266)
    • Bug fixes for Power (#265), SPARC (#264), ARM 32bit (#263), and Solaris (#262)
  • 0.5.2(Nov 10, 2020)

    Bug fixes

    • Fixes issue where unused memory is zeroed by the OS, then superslab data can become corrupt. Only affected the 16MiB chunk size configuration. The bug did not affect the Linux implementation. #259
    • Typo that meant 16MiB configuration was ignored in some cases. #254

    Platform Support

    • Improve Haiku support #255
    • Added DragonFly BSD support #252

    Thanks especially to @devnexen for finding and fixing many issues in this release, and for the contributions to many previous releases in expanding snmalloc's platform support.

  • 0.5.1(Sep 22, 2020)

    The main change in this release is a new mechanism for tracking very coarse-grained memory usage statistics (#241). The interface is only exposed in malloc-extensions.h. We recommend static linking this API, as we may refine this in future releases.

    Other changes:

    • PALs now fully static (#245)

    Bug fixes:

    • Haiku Debug build fixed (#247)
    • Windows Clang CTZ was not defined on correct size for size_t (#242)
    • Fix to unit tests (#239)
    • FlatPageMap calculated too large size if payload was not a byte size (#236)
  • 0.5.0(Jul 9, 2020)

    This version significantly improves the peak working set on Windows, and has a lower RSS on Linux systems using transparent huge pages.

    Detailed changes:

    • Change default chunksize to 1MiB, and add explicit -16mib tagged version of libraries (#229).
    • Adds new AddressSpaceManager that improves performance with transparent huge pages (#214, #227) and also improves support for Open Enclave (#212)
    • Improve Open Enclave support with new smaller chunk size of 256KiB (#212), and add versions of libraries and tests using this chunksize with the tag -oe.
    • Preliminary support for Haiku (#218) and Solaris (#226)
  • 0.4.2(Jun 11, 2020)

    Changes

    • Preliminary Android support (#171)
    • Fixes to 32bit compile for Linux (#173)
    • Preparatory work for CHERI support (#188, #191, #193, #202)
    • Performance of alloc_size (#196)
    • Minor changes to build defaults (#177, #182, #187, #189, #194, #197, #206)
    • Improve Open Enclave support (#205, #201, #195)
    • Expose cfree (#179)
    • Allow different OS_PAGE_SIZES (#185)
    • Improve support for VS2019 (#207)
    • Add error message for failed init (#190)

    Bug fixes

    • realloc of a large allocation could incorrectly return the original allocation (#178, #209)
    • Fixed sized delete of nullptr (#181)
    • Fixed memory ordering in unit test (#184)
  • 0.4.1(Apr 15, 2020)

    Small fixes to 0.4

    • Build fixes for GCC 9 and GCC 10 (#172)
    • Bug fix - if first call of a thread is a calloc, then it might not be zeroed. (#172)
  • v0.4.0(Apr 10, 2020)

    The following are the main changes since the previous version:

    • Improved Decommit on Windows
      • Removed redundant system calls #120
      • Implement lazy decommit strategy to take OS call back, so an application not performing allocation can release memory to the OS #128
      • Bug Fix: Remove infinite loop on low-memory notification #123
    • Improved performance of multiple fast and slow paths
      • Improved sized deallocation code paths #115
      • Remote deallocation code path separated into fast and slow path (increased perf considerably for producer/consumer workloads) #138
      • Improve allocation slow path and change to thread local bump allocator #143
    • Improved alignment code
      • Support alignment of large allocations #124
      • Efficient calculation of alignment requests (just two additional integer instructions on fast path) #113
    • Support more platforms and tool chains
      • Expose API surface for Rust #113
      • Improve GCC support #110 #134 #167
      • Add support for Clang on Windows #119
      • Provide OpenEnclave support #129 #131 #135 #140 #166 #167
      • Preliminary ARM support #142, #161
    • Bug fixes based on mimalloc-bench
      • Fixed poor performance in the message queue handling due to small batch size (#158)
      • Fixed leak of allocators during TLS teardown (#161)
      • Fixed high page fault count from using MADV_DONTNEED for zeroing (#159)
  • 0.3(Nov 25, 2019)

Owner
Microsoft
Open source projects and samples from Microsoft
STL compatible C++ memory allocator library using a new RawAllocator concept that is similar to an Allocator but easier to use and write.

STL compatible C++ memory allocator library using a new RawAllocator concept that is similar to an Allocator but easier to use and write.

Jonathan Müller 1k Dec 2, 2021
The Hoard Memory Allocator: A Fast, Scalable, and Memory-efficient Malloc for Linux, Windows, and Mac.

The Hoard Memory Allocator Copyright (C) 1998-2020 by Emery Berger The Hoard memory allocator is a fast, scalable, and memory-efficient memory allocat

Emery Berger 905 Sep 14, 2022
mimalloc is a compact general purpose allocator with excellent performance.

mimalloc mimalloc (pronounced "me-malloc") is a general purpose allocator with excellent performance characteristics. Initially developed by Daan Leij

Microsoft 7.1k Sep 16, 2022
Public domain cross platform lock free thread caching 16-byte aligned memory allocator implemented in C

rpmalloc - General Purpose Memory Allocator This library provides a public domain cross platform lock free thread caching 16-byte aligned memory alloc

Mattias Jansson 1.6k Sep 23, 2022
A tiny portable C89 memory allocator

mem A tiny portable C89 memory allocator. Usage This is a single-header library. You must include this file alongside #define MEM_IMPLEMENTATION in on

null 10 Jun 13, 2022
Malloc Lab: simple memory allocator using sorted segregated free list

LAB 6: Malloc Lab Main Files mm.{c,h} - Your solution malloc package. mdriver.c - The malloc driver that tests your mm.c file short{1,2}-bal.rep - T

null 1 Feb 28, 2022
Mesh - A memory allocator that automatically reduces the memory footprint of C/C++ applications.

Mesh: Compacting Memory Management for C/C++ Mesh is a drop in replacement for malloc(3) that can transparently recover from memory fragmentation with

PLASMA @ UMass 1.5k Sep 17, 2022
Hardened malloc - Hardened allocator designed for modern systems

Hardened malloc - Hardened allocator designed for modern systems. It has integration into Android's Bionic libc and can be used externally with musl and glibc as a dynamic library for use on other Linux-based platforms. It will gain more portability / integration over time.

GrapheneOS 779 Sep 16, 2022
Allocator bench - bench of various memory allocators

To run benchmarks Install lockless from https://locklessinc.com/downloads/ in lockless_allocator path make Install Hoard from https://github.com/emery

Sam 44 Dec 3, 2021
A tool for tracking memory allocation based ld-preload

libmallocTrace A tool for tracking memory allocation based ld-preload how to build make cd example && make how to use a simple way is to execute some

赵政 1 Mar 12, 2022
A C++17 message passing library based on MPI

MPL - A message passing library MPL is a message passing library written in C++17 based on the Message Passing Interface (MPI) standard. Since the C++

Heiko Bauke 108 Sep 13, 2022
STL compatible C++ memory allocator library using a new RawAllocator concept that is similar to an Allocator but easier to use and write.

memory The C++ STL allocator model has various flaws. For example, they are fixed to a certain type, because they are almost necessarily required to b

Jonathan Müller 1.1k Sep 20, 2022
C++ library for getting full ROS message definition or MD5 sum given message type as string

rosmsg_cpp C++ library for getting full message definition, MD5 sum and more given just the message type as string. This package provides both C++ lib

Vision for Robotics and Autonomous Systems 3 Jan 5, 2022
Sample project using snmalloc

Sample project to use snmalloc This is a sample project which uses snmalloc in a couple of different setup. There are a couple of ways to integrate sn

Neeraj 1 Nov 1, 2021
Embeddable Event-based Asynchronous Message/HTTP Server library for C/C++

libasyncd Embeddable Event-based Asynchronous Message/HTTP Server library for C/C++. What is libasyncd? Libasyncd is an embeddable event-driven asynch

Seungyoung 165 May 25, 2022