Gcpp - Experimental deferred and unordered destruction library for C++


You can find a talk that describes this library here:

gcpp: Deferred and unordered destruction

Herb Sutter -- Updated 2016-10-16

Motivation, goals, and disclaimers

gcpp is a personal project to try an experiment: Can we take the deferred and unordered destruction patterns with custom reachability tracing logic that we find ourselves writing by hand today, and automate some parts of that work as a reusable C++ library that delivers it as a zero-overhead abstraction?

This is a demo of a potential additional fallback option for the rare cases where unique_ptr and shared_ptr aren't quite enough, notably when you have objects that refer to each other in local owning cycles, or when you need to defer destructor execution to meet real-time deadlines or to bound destructor stack cost. The goal is to illustrate ideas that others can draw from, that you may find useful even if you never use types like the ones below but just continue to use existing smart pointers and write your destructor-deferral and tracing code by hand.

Disclaimers: This is a demo, not a production quality library; bug reports are welcome. As of this writing, it works on Clang/libc++ 3.9 or later and Visual Studio 2015 Update 3 or later; if you have success with others, please report it by opening an Issue to update this README. See also the FAQ "So deferred_heap and deferred_ptr have no disadvantages?". And please see the Acknowledgments.

Overview

deferred_heap

A deferred_heap owns a region of memory containing objects that can be safely accessed via deferred_ptr<T> and that can point to each other arbitrarily within the same heap. It provides three main operations:

  • .make<T>() allocates and constructs a new T object and returns a deferred_ptr<T>. If T is not trivially destructible, it will also record the destructor to be eventually invoked.

  • .collect() is called explicitly (by default, no collection happens implicitly) and traces this local heap in isolation; it never traces outside this heap and its roots, and in particular it never touches memory owned by any other deferred_heap or by any other program data structure. Any unreachable objects have their deferred destructors run before their memory is deallocated; cycles of deferred_ptrs within the same heap are destroyed once the cycle is no longer reachable.

  • ~deferred_heap() runs any remaining deferred destructors and resets any deferred_ptrs that outlive this heap to null, then releases its memory all at once like a region.

Because .collect() and the destructor are explicit, the program can choose when (at a convenient time) and where (e.g., on what thread or processor) to run destructors.

Small local heaps are encouraged. This keeps tracing isolated and composable; libraries that each use their own deferred_heaps internally can be combined without directly affecting each other's performance.
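
For a concrete picture of those three operations, here is a minimal sketch. It assumes gcpp's deferred_heap.h header and the gcpp namespace, and the Widget type is invented for illustration:

#include "deferred_heap.h"   // gcpp
using namespace gcpp;

struct Widget {
    deferred_ptr<Widget> other;   // may point to another Widget in the same heap
};

int main() {
    deferred_heap heap;

    // .make<T>(): allocate and construct; Widget's (non-trivial) destructor is recorded
    deferred_ptr<Widget> a = heap.make<Widget>();
    deferred_ptr<Widget> b = heap.make<Widget>();
    a->other = b;
    b->other = a;        // an owning cycle, entirely within this heap

    a = nullptr;
    b = nullptr;         // the cycle is no longer reachable from any root

    heap.collect();      // explicit: the two Widgets' deferred destructors run
                         // (unordered, with their deferred_ptr members nulled first),
                         // and then their memory is deallocated
}                        // ~deferred_heap(): any remaining deferred destructors run,
                         // then the heap's memory is released all at once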

deferred_ptr

A deferred_ptr<T> refers to a T object in a deferred_heap. You can construct a deferred_ptr from an existing one (including one returned by deferred_heap::make, the source of all non-null deferred_ptrs) or by default-constructing it to null.

It has the same functions as any normal C++ smart pointer, including derived-to-base and const conversions. Here are the main additional features and semantics:

  • A deferred_ptr member is null in a deferred destructor. This enables safe unordered destruction by enforcing known best practices long learned in other languages and environments, in particular:

    • It enforces that the otherwise-normal deferred destructor cannot access another deferred-destruction object that could possibly be destroyed in the same collection cycle (since it might already have been destroyed). This eliminates the "accessing a disposed/finalized object" class of bugs that is possible in environments like Java, .NET, and Go, and eliminates the "leaking cycles of objects that use finalizers" class of bugs that is possible in Go.

    • It enforces that a destructor cannot "resurrect" another deferred-lifetime object (or itself) by storing a deferred_ptr to it somewhere that would make that object reachable again. This eliminates the "double dispose" class of bugs that is possible in environments like Java, .NET, and Go. It also closes a second route to the "accessing a disposed/finalized object" class of bugs mentioned in the previous point.

  • Like shared_ptr's aliasing constructor, deferred_ptr supports creating a smart pointer to a data member subobject of a deferred object. However, I'm currently doing it in a different way from shared_ptr: whereas shared_ptr's aliasing constructor can accept any pointer value and so is neither type-safe nor memory-safe, deferred_ptr<T> instead provides a .ptr_to<U>(U T::*) function that takes a pointer to member of T, and so guarantees at compile time, type- and memory-safely, that you can only form a deferred_ptr<U> to a valid U subobject of a deferred T object via a valid deferred_ptr<T>. (See the sketch after this list.)

  • Assigning a deferred_ptr is not allowed across heaps. The pointer must continue pointing into whichever heap it initially pointed into. A default-constructed pointer does not point into any heap, but once assigned to point into a given heap it must continue to point there; this is currently enforced in debug builds.
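
Here is a rough sketch of the pointer-to-member form mentioned above. The exact signature lives in the library header; the call shape below follows the description in this README and should be treated as illustrative:

#include "deferred_heap.h"   // gcpp
using namespace gcpp;

struct Node {
    int value = 0;
    deferred_ptr<Node> next;
};

void example(deferred_heap& heap) {
    deferred_ptr<Node> n = heap.make<Node>();

    // Unlike shared_ptr's aliasing constructor, this cannot be handed an arbitrary
    // raw pointer: the pointer-to-member guarantees at compile time that 'value'
    // is a subobject of the Node that 'n' points to.
    auto pv = n.ptr_to(&Node::value);   // a deferred_ptr<int> into the deferred Node
    *pv = 42;
}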

deferred_allocator

Finally, deferred_allocator is a C++11 STL allocator that you can use to have an STL container store its elements in a deferred_heap. This is especially useful when a deferred-lifetime object contains a container of deferred_ptrs to other objects in its heap; using deferred_allocator expresses that the container's deferred_ptrs are inside the heap so that any cycles they participate in can be automatically cleaned up, whereas not using deferred_allocator expresses that the container's deferred_ptrs are outside the heap (i.e., roots).

Convenience aliases are provided; for example, deferred_vector<T> is an alias for std::vector<T, gcpp::deferred_allocator<T>>.

See also additional uses of deferred_allocator.

Example

Here is a Graph type that has its own local heap shared by all Graph objects:

// possibly-cyclic N-ary Graph, one heap for all graph nodes

class Graph {
    struct Node {
        deferred_vector<deferred_ptr<Node>> outbound_edges{ my_heap };  // keeps all edges in the heap
        /*... data ...*/
    };

    static deferred_heap my_heap;
    vector<deferred_ptr<Node>> roots;

public:
    void SampleFunction() {
        roots.emplace_back(my_heap.make<Node>());
    }

    // ... etc.
};

Here is a variation where each Graph object has its own private local heap:

// possibly-cyclic N-ary Graph, one heap per graph object

class Graph {
    struct Node {
        /*... data ...*/
        deferred_vector<deferred_ptr<Node>> outbound_edges;  // keeps all edges in the heap
        Node(deferred_heap& h) : outbound_edges{h} { }
    };

    deferred_heap my_heap;
    vector<deferred_ptr<Node>> roots;

public:
    void SampleFunction() {
        roots.emplace_back(my_heap.make<Node>(my_heap));
    }

    // ... etc.
};

Note that deferred_allocator requires C++11 allocator support for fancy pointers; see deferred_allocator implementation notes.

Target use cases

gcpp aims to address three main issues, and a speculative use case.

1. Ownership cycles, preventing leaks

The first target use case is automatic by-construction memory management for data structures with cycles, which cannot be expressed with fully automated lifetime using C++17 facilities only.

  • Encapsulated example: A Graph type whose Nodes point to each other but should stay alive as long as they are transitively reachable from the enclosing Graph object or from some object such as a Graph::iterator that is handed out by the Graph object.

  • Unencapsulated example: A group of objects of different types that refer to each other in a potentially-cyclic way but should stay alive as long as they are transitively reachable from some root outside their deferred_heap.

  • In C++17, both examples can be partly automated (e.g., having Graph contain a vector<unique_ptr<Node>> my_nodes; so that all nodes are guaranteed to be destroyed no later than the Graph itself), but today they still require manual liveness tracing: traversal logic to discover unreachable Node objects, which essentially implements a custom tracing algorithm by hand for the nodes, followed by my_nodes.erase(unreachable_node); for each unreachable node, which is manual and morally equivalent to delete unreachable_node;. (A condensed sketch of this hand-rolled approach follows.)
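
To make that comparison concrete, here is a condensed sketch (illustrative names only, not part of gcpp) of the C++17 approach the bullet describes: node ownership is automated with vector<unique_ptr<Node>>, but liveness tracing is written by hand.

#include <algorithm>
#include <memory>
#include <unordered_set>
#include <vector>

class Graph {
    struct Node {
        std::vector<Node*> outbound_edges;        // non-owning; may form cycles
        /*... data ...*/
    };

    std::vector<std::unique_ptr<Node>> my_nodes;  // nodes die no later than the Graph
    std::vector<Node*> roots;

    // Hand-written liveness tracing -- the part gcpp aims to automate.
    void erase_unreachable_nodes() {
        std::unordered_set<Node*> reachable;
        std::vector<Node*> work(roots.begin(), roots.end());
        while (!work.empty()) {
            Node* n = work.back();
            work.pop_back();
            if (reachable.insert(n).second)
                for (Node* e : n->outbound_edges)
                    work.push_back(e);
        }
        // Morally equivalent to 'delete unreachable_node;' for each dead node.
        my_nodes.erase(
            std::remove_if(my_nodes.begin(), my_nodes.end(),
                           [&](const std::unique_ptr<Node>& p) {
                               return reachable.count(p.get()) == 0;
                           }),
            my_nodes.end());
    }
};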

2. Real-time systems, bounding the cost of pointer assignment

Using shared_ptr can be problematic in real-time systems code, because any simple shared_ptr assignment or destruction could cause an arbitrary number of objects to be destroyed, and therefore have unbounded cost. This is a rare case where prompt destruction, usually a wonderful property, is actually bad for performance. (Note: This is less of a problem with unique_ptr, because unique_ptr assignment necessarily causes destruction and so tends to be treated more carefully, whereas shared_ptr assignment only might cause destruction.)

Today, deadline-driven code must either avoid manipulating shared_ptrs or take defensive measures:

  • Banning shared_ptr assignment or destruction entirely in deadline-driven code, such as by deferring the pointer assignment until after the end of the critical code region, is not always an option because the code may need to change which object a pointer refers to while still inside the critical region.

  • Bounding the number of objects reachable from a shared_ptr can make the assignment cost have an upper bound to fit within the deadline, but requires ongoing care during maintenance. Adding an additional object, or changing the cost of a destructor, can exceed the bound.

  • Deferring destruction by storing additional strong references in a separate "keepalive" data structure allows tracing to be performed later, outside the critical code region, to identify and destroy those objects no longer referred to from outside the keepalive structure. However, this amounts to another form of manually implemented liveness tracing.

By design, deferred_ptr assignment cost is bounded and unrelated to destructors, and deferred_heap gives full control over when and where collect() runs deferred destructors. This makes it a candidate for being appropriate for real-time code in situations where using shared_ptr may be problematic.
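
As an illustration of that control (names invented here), deadline-driven code might keep only bounded-cost pointer assignments inside the critical region and run collect() outside it:

#include "deferred_heap.h"   // gcpp
using namespace gcpp;

struct Sample { /*... data ...*/ };

deferred_heap heap;              // heap owned by this real-time component
deferred_ptr<Sample> current;    // a root pointing into that heap

void inside_critical_region(deferred_ptr<Sample> next) {
    current = next;   // bounded cost; no destructors can run here
}

void outside_critical_region() {
    heap.collect();   // unreachable Samples have their deferred destructors run
                      // here, at a time and on a thread the program chooses
}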

3. Constrained systems, bounding stack depth of destruction

Using unique_ptr and shared_ptr can be problematic in systems with constrained stack space and deep ownership. Because destructors always run nested (recursively), the thread that releases an object must have enough stack space for the call depth required to destroy the tree of objects being released. If the tree can be arbitrarily deep, an arbitrary amount of stack space may be needed.

Today, systems with constrained stacks use similar techniques to those mentioned in #2 above, with similar limitations and tradeoffs.

By design, deferred_heap runs deferred destructors iteratively, not recursively. Of course, an individual deferred object may still own a tree of resources that uses shared_ptrs and is freed recursively by default, but any two deferred objects are destroyed iteratively even if they referred to each other, and their destructors never nest. This makes it a candidate for being appropriate for constrained-stack code in situations where using unique_ptr and shared_ptr may be problematic.
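
A small sketch of the difference, with illustrative types only:

#include <memory>
#include "deferred_heap.h"   // gcpp
using namespace gcpp;

// With unique_ptr, destroying 'head' runs ~OwnedNode recursively, one stack
// frame per node: a list of N nodes needs O(N) stack depth to tear down.
struct OwnedNode {
    std::unique_ptr<OwnedNode> next;
};

// With deferred_ptr, the nodes' destructors are run iteratively by .collect()
// or ~deferred_heap(), so tearing the list down needs only O(1) stack depth,
// no matter how long the chain is.
struct DeferredNode {
    deferred_ptr<DeferredNode> next;
};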

(speculative) STL iterator safety

Besides the usefulness of containers of deferred_ptrs whose pointers live in a deferred_heap, using just a container<T, deferred_allocator<T>> by itself, without explicit deferred_ptrs, may have some interesting properties that could make STL use safer in domains where allowing iterator dereference errors to have undefined behavior is not tolerable.

  • When using a container<T, deferred_allocator<T>>, iterators can't dangle (point to a destroyed or deallocated object), because iterators keep objects alive and destructors are deferred. This turns dereferencing an invalidated iterator from an undefined-behavior problem into a stale-data problem. (A sketch follows after this list.)

  • See Implementation notes for limitations on iterator navigation using invalidated iterators.

  • Note: deferred_allocator relies on C++11's allocator extensions to support "fancy" user-defined pointer types. It does not work with pre-C++11 standard libraries, which required allocator::pointer to be a raw pointer type. If you are using Microsoft Visual C++ (the only compiler I've tried so far), the current implementation of gcpp requires Visual Studio 2015 Update 3 (or later); it does not work on Update 2 which did not yet have enough fancy pointer support.
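
To illustrate the stale-data behavior described in the first bullet above, here is a sketch using the deferred_vector alias; the behavior is as this README describes it, and the details are illustrative:

#include "deferred_allocator.h"   // gcpp
using namespace gcpp;

void example() {
    deferred_heap heap;
    deferred_vector<int> v{ heap };   // std::vector<int, deferred_allocator<int>>
    v.push_back(1);

    auto it = v.begin();              // the iterator stores a deferred_ptr: strong owner
    for (int i = 0; i < 1000; ++i)
        v.push_back(i);               // growth reallocates and invalidates 'it'

    // With an ordinary std::vector this dereference is undefined behavior.
    // Here the outgrown buffer is still alive, so this reads stale but valid data.
    int stale = *it;
    (void)stale;
}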

Object lifetime guidance

The following summarizes the best practices we should already teach for expressing object lifetimes in C++17, and at the end adds a potential new fallback option to consider something along these lines.

  • [C++17] 1. Where possible, prefer scoped lifetime by default (e.g., locals, members).
    • What it automates: Expressing that this object's lifetime is tied to some other lifetime that is already well defined, such as a block scope (auto storage duration) or another object (member lifetime).
    • Efficiency: Zero additional lifetime management overhead for this object.

  • [C++17] 2. Else prefer make_unique and unique_ptr, if the object must have its own lifetime (i.e., heap) and ownership can be unique without ownership cycles.
    • What it automates: Single-ownership heap object lifetime.
    • Efficiency: Usually identical cost as correctly written new+delete.
    • Clarity and correctness: Clearer and more robust than explicit delete (declarative, uses are correct by construction).

  • [C++17] 3. Else prefer make_shared and shared_ptr, if the object must have its own lifetime (i.e., heap) and shared ownership, without ownership cycles.
    • What it automates: Acyclic shared heap object lifetime managed with reference counting.
    • Efficiency: Usually identical cost as correctly written manual reference counting.
    • Clarity and correctness: Clearer and more robust than manual/custom reference count logic (declarative, uses are correct by construction).

  • [experimental] 4. Else use similar techniques to deferred_heap and deferred_ptr, if the object must have its own lifetime (i.e., heap) and there can be ownership cycles.
    • What it automates: Potentially-cyclic shared heap object lifetime managed with liveness tracing; real-time code with bounded pointer assignment cost requirements; constrained stacks with bounded call depth requirements.
    • Efficiency: (conjecture) Usually identical cost as correctly written manual tracing.
    • Clarity and correctness: (conjecture) Clearer and more robust than manual/custom tracing logic (declarative, uses are correct by construction).

FAQs

Q: "Is this garbage collection?"

Of course, and remember that so is reference counting (e.g., shared_ptr).

Q: "I meant, is this just tracing garbage collection (tracing GC)?"

Not the tracing GC most people are familiar with.

Most importantly, deferred_heap collects objects, not garbage:

  • Most mainstream tracing GC is about managing raw memory, which can turn into meaningless "garbage" bytes. Tearing down the real objects that live in that memory is at best a brittle afterthought not designed as an integrated part of the language runtime; see the restrictions on "finalizers" in Java, C#, D, Go, etc. (translation note: remember that in C# what are called "destructors" are actually finalizers) -- all the major GC-based environments I know of that have more than 10 years' field experience with finalizers now recommend avoiding finalizers outright in released code (here's an example), and all for the same reasons: finalizers are not guaranteed to be executed, they are fragile because a finalizer should not (but can) access other possibly-finalized objects, and they lead to unmanageable complications like object resurrection (the worst case of which is making a finalized object reachable again).

  • deferred_heap is fundamentally about managing objects, and is focused on deferring real destructors. So although it does perform liveness tracing, the most important way it differs from the mainstream tracing GC designs is that it tracks and collects constructed objects, and accurately records and later runs their deferred destructors. And because of that emphasis on running real destructors safely, deferred_heap makes all of a collectable object's deferred_ptrs null before running its destructor, which entirely prevents accessing possibly-destroyed objects and object resurrection.

    • Note: When I said gcpp is a demo for people to draw ideas from, that includes other languages' designers. Suggestion to other languages' designers: I would encourage other GC-based languages to consider just nulling object references that point to other finalizable objects before running a round of finalizers. That should nearly always turn a latent bug into a NullPointerException/NullReferenceException/nil-pointer-panic/etc. Although it could be a breaking change in behavior for some programs, it is very likely to "break" only code that is already living on the edge and doing things that you are already telling your programmers not to do. In C++, when we discuss changes that would break suspicious code, Bjarne Stroustrup likes to call it "code that deserves to be broken." So please entertain this suggestion, and consider whether it makes sense to set your object references to null before running finalizers and "break" code deliberately in your language for this case. If you do decide to try it, even in internal test builds, send me mail at hsutter-microsoft and let me know how it goes; I'm actively interested in learning about your experience.

The other important difference is that deferred_heap meets C++'s zero-overhead principle by being opt-in and granular, not default and global:

  • Most mainstream languages that assume GC make the garbage-collected allocator the primary or only way to create heap objects, and all parts of the program (including all libraries linked into the executable) end up sharing a single global heap. Performing something other than GC allocation requires fighting with, and avoiding large parts of, the language and its standard library; for example, by resorting to writing your own allocator by allocating a large array and using unsafe pointers, or by calling out to native code. Further, tracing collection performance (e.g., GC pause timing and duration) in one part of the program can depend on allocation done by an unrelated library linked into the program.

  • deferred_heap is intended to be used tactically as another fallback in situations where options like unique_ptr and shared_ptr are insufficient, and even then in a granular way with a distinct deferred_heap within a class (or at most module). You don't pay for what you don't use: If you never perform a deferred allocation then there is zero cost, and if you do perform some deferred allocation then the cost is always proportional to the amount of deferred allocation you do. Tracing is performed only within each individual deferred_heap bubble, and cycles of deferred_ptrs must stay within the same heap in order to be collected. (Yes, it's possible to instantiate and share a global deferred_heap, but that isn't the way I intend this to be used, and certainly the current demo implementation isn't intended to scale well to millions of pointers.)

Q: "Is this related/similar to the Boehm collector?"

No. That is a fine collector, but with different aims:

  • It doesn't run destructors; even for registered finalizers, cycles of finalizable objects are never finalized.

  • Its tracing is conservative and more global (touches memory beyond the actual GC allocations and roots, such as to discover roots conservatively).

deferred_heap runs destructors, and the tracing is accurate (not conservative) and scoped to an individual granular heap.

Q: "Is deferred_heap equivalent to region-based memory management?"

It's a strict superset of regions.

The key idea of a region is to efficiently release the region's memory in one shot when the region is destroyed, with deallocate-at-once semantics. deferred_heap does that, but goes further in three ways:

  • It adds optional collection via .collect() to reclaim unused parts of the region without waiting for the whole region to be destroyed. If you don't call .collect(), then when the heap is destroyed the heap's memory is released all at once, exactly like a region.

  • It owns objects, not just raw memory, and always correctly runs any pending destructors for objects that are still in the heap when it is destroyed (or, if .collect() is called, that are unreachable). If you allocate only trivially destructible objects, then this adds zero work -- when the heap is destroyed, no additional destructors need to be run.

  • It knows its roots, and safely resets any outstanding deferred_ptrs that outlive the heap they point into. If you don't let any deferred_ptrs outlive the heap they point into, then this adds zero work -- when the heap is destroyed, no additional nulling needs to be performed.

The only work performed in the deferred_heap destructor is to run pending destructors, null out any deferred_ptrs that outlive the heap, and release the heap's memory pages. So if you use a deferred_heap and never call .collect(), never allocate a non-trivially destructible object, and never let a deferred_ptr outlive the heap, then destroying the heap does exactly nothing beyond efficiently dropping any memory it owns with deallocate-at-once semantics -- and then, yes, it's a region. But, unlike a region, you can do any of those things, and if you do then they are safe.

One way to view deferred_heap is as a candidate approach for unifying tracing GC and regions. And destructors, most importantly of all.

Q: "So deferred_heap and deferred_ptr have no disadvantages?"

Of course there are disadvantages to this approach, especially in this demo implementation.

  • It moves work to different places, and does more total work and therefore has more total overhead than unique_ptr or shared_ptr. Prefer unique_ptr or shared_ptr in that order where possible, as usual; see the guidance above. You should be using them much more frequently than deferred_ptr.

  • The current implementation is not production-quality. In particular, it's a pure library solution that requires no compiler support, it's single-threaded, it dynamically registers every deferred_ptr, and it doesn't try to optimize its marking algorithm. The GC literature and experience are full of ways to make this faster; for example, a compiler optimizer that is aware of deferred_ptr could optimize away all registration of stack-based deferred_ptrs by generating stack maps. The important thing is to provide a distinct deferred_ptr type so we know all the pointers to trace, and that permits a lot of implementation leeway and optimization. (GC experts, feel free to plug in your favorite real GC implementation under the deferred_heap interface and let us know how it goes. I've factored out the destructor tracking to keep it separate from the heap implementation, to make it easier to plug in just the GC memory and tracing management implementation.)

Q: "Why create another smart pointer? another allocator?"

Because that's how we roll in C++.

gcpp aims to continue C++'s long tradition of being a great language for building libraries, including for memory allocation and lifetime management.

  • We already have make_unique and unique_ptr to automate managing the lifetime of a single uniquely owned heap object. Using them is as efficient as (and considerably easier and less brittle than) writing new and delete by hand. Prefer this first when putting an object in a local stack scope or as a data member of another object is not sufficient.

  • Similarly, make_shared and shared_ptr automate managing the lifetime of a single shared heap object. Using them is usually as efficient as (and always easier and less brittle than) managing reference counts by hand. Prefer this next when unique_ptr is not sufficient.

  • In gcpp, deferred_heap and deferred_ptr take a stab at how we might automate managing the lifetime of a group of related shared heap objects that (a) may contain cycles and/or (b) needs deterministic pointer manipulation space and time cost. The goal is for using them to usually be as efficient as (and easier and more robust than) managing ownership and writing custom tracing logic by hand to discover and perform destruction of unreachable objects. Because reachability is a property of the whole group, not of a single object or subgroup, an abstraction that owns the whole group is needed. Consider this, or write similar logic by hand, when you have a situation where neither unique_ptr nor shared_ptr is sufficient.

Q: "Would gc_heap and gc_ptr be better names?"

I don't think so.

I used those names initially, but switched to "deferred" for two major reasons, and one minor reason:

  1. "Deferred" emphasizes what I think is the most important property, namely that we're talking about real objects with real destructors, just the destructors are deferred; that's even more important than the tracing collection part.

  2. "GC" could create confusion with the mainstream notion of tracing GC. This is not like GC in other major languages, as I explained above.

  3. "GC" is slightly inaccurate technically, because this isn't really adding GC to C++; C++11 and later already have GC, since reference counting is one of the major forms of GC.

Q: "Are there other potential uses where this would be better than current smart pointers?"

Yes.

In particular, a lock-free single-width-compare-and-swap atomic_deferred_ptr<T> should be easy to write. (Of course, before trying that we'd first make deferred_heap itself thread-safe as a baseline.) It's well known that tracing GC makes it easy to avoid ABA problems and traversing-while-erasing concurrency problems for lock-free data structures, without resorting to more complex approaches like hazard pointers. The std::atomic_shared_ptr that I proposed and was adopted in C++17 solves these problems too (see Anthony Williams' nice writeup), but it has two limitations: first, like all shared_ptrs it doesn't deal with cyclic data structures; and second, the only current lock-free implementation (also by Anthony Williams; thanks again Anthony!) requires double-width compare-and-swap (DCAS), which is not available on all platform targets. An atomic_deferred_ptr should solve all four of these problems and limitations.

Even the current implementation, which uses double-width pointers, can pretty easily be made atomic without DCAS, as follows: The current implementation is double width because it stores a back pointer to the deferred_heap, specifically so that we can prevent changing the heap that a deferred_ptr points into -- which means that the deferred_heap* is const once it is set to non-null. [The only exception is that the current implementation resets the heap pointer to null when the heap goes away, but that's easy to change by leaving it alone or pointing to a sentinel value.] That means that the implementation of atomic_deferred_ptr<T>::compare_exchange(that) can simply do single atomic loads of *this's and that's deferred_heap*s to compare them and enforce that they're pointing into the same heap, and if they are the same then do a normal single-width compare_exchange on the T*. And deferred_ptr doesn't need to be double width in the first place; there are other implementation choices that would make it just a T* and trivially copyable.
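
Here is a rough sketch of that single-width scheme. The type is hypothetical (not in the current library), and the representation and member names are assumptions based on the description above:

#include <atomic>

class deferred_heap;   // from gcpp

// Assume a deferred_ptr<T> is conceptually { deferred_heap* heap; T* ptr; },
// where 'heap' never changes once it is set to non-null (as described above).
template <class T>
class atomic_deferred_ptr_sketch {
    std::atomic<deferred_heap*> heap_{nullptr};
    std::atomic<T*>             ptr_{nullptr};

public:
    // Single-width CAS suffices: plain atomic loads are enough to check that both
    // sides refer to the same heap (the heap fields cannot change underneath us),
    // and then only the T* needs a compare_exchange.
    bool compare_exchange_strong(T*& expected, T* desired, deferred_heap* desired_heap) {
        if (heap_.load() != desired_heap)
            return false;                              // enforce the same-heap invariant
        return ptr_.compare_exchange_strong(expected, desired);
    }
};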

Even in the worst case of a naïve non-concurrent collector that stops the world while collecting, this should be sufficient to make using deferred_ptrs completely lock-free "in epochs" meaning in between collections, and each data structure can control when collection of its private mini-heap happens and therefore control the epoch boundaries. And I expect that a concurrent collector written by a real GC implementer (not me) could do better than the baseline "lock-free in epochs."

Implementation notes

Storing destructors

deferred_heap::make<T> stores a single object's destructor as two raw pointers:

struct destructor {
    const void* p;               // raw pointer to the object
    void(*destroy)(const void*); // raw pointer to a type-erased destructor function
};

Later we can just invoke d.destroy(d.p) to correctly clean up the complete object, even if the last deferred_ptr to it is not a deferred_ptr<T> but a pointer to a subobject (deferred_ptr<BaseClassOfT> or deferred_ptr<DataMemberOfT>).

The code conveniently gets a raw function pointer to the destructor by using a lambda:

// Called indirectly from deferred_heap::make<T>.
// Here, t is a T&.
dtors.push_back({
    std::addressof(t),                                    // address of T object
    [](const void* x) { static_cast<const T*>(x)->~T(); } // T destructor to invoke
});

One line removes the type, and the other line adds it back. The lambda gives a handy way to do type erasure -- because we know T here, we just write a lambda that performs the correct cast back to invoke the correct T destructor.

A non-capturing lambda has no state, so it can be used as a plain function. So for each distinct type T that this is instantiated with, compiling this code generates one T-specific function (on demand at compile time, globally unique) and we store that function's address. The function itself is efficient: Depending on the optimization level, the lambda is typically generated as either a one-instruction wrapper function (just a single jmp to the actual destructor) or as a copy of the destructor if the destructor is inlined (no run-time overhead at all, just another inline copy of the destructor in the binary if it's generally being inlined anyway).
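
Here is the same pattern in a self-contained form, outside gcpp, for readers who want to experiment with it in isolation; Widget is an invented example type:

#include <cstdio>
#include <new>
#include <vector>

struct destructor {
    const void* p;                 // raw pointer to the object
    void (*destroy)(const void*);  // raw pointer to a type-erased destructor function
};

struct Widget {
    ~Widget() { std::puts("~Widget ran (deferred)"); }
};

int main() {
    std::vector<destructor> dtors;

    alignas(Widget) unsigned char buf[sizeof(Widget)];
    Widget* w = new (buf) Widget;   // construct without scheduling destruction

    // One line removes the type, the lambda adds it back: because Widget is known
    // here, the non-capturing lambda converts to a plain function pointer that
    // casts and invokes the correct destructor.
    dtors.push_back({ w, [](const void* x) { static_cast<const Widget*>(x)->~Widget(); } });

    // ... later: run all recorded destructors without knowing any static types.
    for (auto& d : dtors)
        d.destroy(d.p);
}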

deferred_allocator notes

deferred_allocator appears to work with unmodified C++11-conforming STL containers.

  • It requires good support for C++11 fancy pointers.

    • On Microsoft VC++, it requires Visual Studio 2015 Update 3 or later. Update 2 is known to have inadequate fancy pointer support.
    • On Clang/libc++, it requires version 3.9 or later. It might work on 3.7 or 3.8 which I didn't test, but 3.6 is known to have inadequate fancy pointer support (fails to call construct()).
    • I haven't found a version of GCC that supports it yet.
  • deallocate() is a no-op, but performs checking in debug builds. It does not need to actually deallocate because memory-safe deallocation will happen at the next .collect() after the memory becomes unreachable.

  • destroy() is a no-op, but performs checking in debug builds. It does not need to remember the destructor because that was already correctly recorded when construct() was called; see next point. It does not need to call the destructor because the destructor will be called type-safely when the object is unreachable (or, for vector only, when the container in-place constructs a new object in the same location; see next subpoint).

  • The in-place construct() function remembers the type-correct destructor -- if needed, which means only if the object has a nontrivial destructor.

    • construct() is available via deferred_allocator only, and adds special sauce for handling vector::pop_back followed by push_back: a pending destructor is also run before constructing an object in a location for which a destructor is pending. This is the only situation where a destructor runs sooner than at .collect() time, and it only happens when using an in-place constructing container like std::vector<T, deferred_allocator<T>> or std::deque<T, deferred_allocator<T>>. (A short example follows after this list.)

    • Note: We have to assume that the container implementation is not malicious; as Bjarne Stroustrup famously puts it, we protect against Murphy, not Machiavelli. Having said that, to my knowledge, deferred_allocator::construct() is the only operation in gcpp that could be abused in a type-unsafe way, and then only via a buggy or malicious implementation of an STL container that performs in-place construction.

  • container<T, deferred_allocator<T>> iterators keep objects (not just memory) alive. This makes dereferencing an invalidated iterator type-safe, as well as memory-safe.

    • The iterator stores deferred_ptr, which makes the iterator a strong owner. When dereferencing invalidated iterators, this turns an undefined behavior problem into "just" a stale data problem.

    • For all containers, an invalidated iterator points to a valid object. Note that the object may have a different value than the (buggy) program expects, including that it may be in a moved-from state; also, reading the object via the invalidated iterator is not guaranteed to see changes made to the container, and vice versa.

  • However, navigating (e.g., incrementing) an invalidated iterator is not much better than it is today.

    • For all random access iterators that use pointer arithmetic, any use of an iterator to navigate beyond the end of the allocation that the iterator actually points into will fire an assert in debug builds.

    • For a vector, an invalidated iterator will keep an outgrown-and-discarded vector buffer alive and can still be compared correctly with other iterators into the same actual buffer (i.e., iterators of the same vintage == when the container had the same capacity). In particular, an invalidated iterator obtained before the vector's last expansion cannot be correctly compared to the vector's current .begin() or .end(), and loops that do that with an invalidated iterator will fire an assert in debug builds (because they perform deferred_ptr checked pointer arithmetic).

    • For a node-based container or deque, an invalidated iterator refers to a node whose pointers have been destroyed (not just reset). Incrementing an invalidated iterator to a node-based container is still undefined behavior, as today.
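
To illustrate the construct() point above (the pop_back-then-push_back case), here is a small sketch; Widget is an invented type with a nontrivial destructor:

#include "deferred_allocator.h"   // gcpp
using namespace gcpp;

struct Widget {
    ~Widget() { /* nontrivial */ }
};

void example() {
    deferred_heap heap;
    deferred_vector<Widget> v{ heap };

    v.push_back(Widget{});   // construct() records ~Widget for this location
    v.pop_back();            // destroy() is a no-op: that ~Widget is still pending
    v.push_back(Widget{});   // construct() first runs the pending ~Widget for the reused
                             // location, then constructs the new Widget in place -- the
                             // one case where a destructor runs sooner than .collect()
}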

Acknowledgments

This personal project would be considerably weaker without input from a number of gracious experts who have been willing to share their time and experience. I would like to particularly thank the following people for their help:

  • Thanks to Casey Carter, Jonathan Caves, Gabriel Dos Reis, Howard Hinnant, Thomas Koeppe, Stephan T. Lavavej, and Neil MacIntosh for their feedback on the code and/or help with various detailed C++ language and standard library questions. These folks are world-class experts in the C++ language, the C++ standard (and soon-to-be standard) library, and/or the compile-time analysis of both.

  • Thanks to Hans Boehm, Pavel Curtis, Daniel Frampton, Kathryn S McKinley, and Mads Torgersen for their review and suggestions regarding the tracing GC parts. These folks grok garbage collection of all varieties as well as programming language design, and their deep experience has been invaluable.

Thanks very much.
