Marl

A hybrid thread / fiber task scheduler written in C++11.

About

Marl is a C++11 library that provides a fluent interface for running tasks across a number of threads.

Marl uses a combination of fibers and threads to allow efficient execution of tasks that can block, while keeping a fixed number of hardware threads.

Marl supports Windows, macOS, Linux, FreeBSD, Fuchsia, Android and iOS (arm, aarch64, mips64, ppc64, x86 and x64).

Marl has no dependencies on other libraries (with the exception of googletest for building the optional unit tests).

Example:

#include "marl/defer.h"
#include "marl/event.h"
#include "marl/scheduler.h"
#include "marl/waitgroup.h"

#include <cstdio>

int main() {
  // Create a marl scheduler using all the logical processors available to the process.
  // Bind this scheduler to the main thread so we can call marl::schedule()
  marl::Scheduler scheduler(marl::Scheduler::Config::allCores());
  scheduler.bind();
  defer(scheduler.unbind());  // Automatically unbind before returning.

  constexpr int numTasks = 10;

  // Create an event that is manually reset.
  marl::Event sayHello(marl::Event::Mode::Manual);

  // Create a WaitGroup with an initial count of numTasks.
  marl::WaitGroup saidHello(numTasks);

  // Schedule some tasks to run asynchronously.
  for (int i = 0; i < numTasks; i++) {
    // Each task will run on one of the scheduler's worker threads.
    marl::schedule([=] {  // All marl primitives are capture-by-value.
      // Decrement the WaitGroup counter when the task has finished.
      defer(saidHello.done());

      printf("Task %d waiting to say hello...\n", i);

      // Blocking in a task?
      // The scheduler will find something else for this thread to do.
      sayHello.wait();

      printf("Hello from task %d!\n", i);
    });
  }

  sayHello.signal();  // Unblock all the tasks.

  saidHello.wait();  // Wait for all tasks to complete.

  printf("All tasks said hello.\n");

  // All tasks are guaranteed to complete before the scheduler is destructed.
}

Benchmarks

Graphs of several microbenchmarks can be found here.

Building

Marl contains many unit tests and examples that can be built using CMake.

Unit tests require fetching the googletest external project, which can be done by typing the following in your terminal:

cd <path-to-marl>
git submodule update --init

Linux and macOS

To build the unit tests and examples, type the following in your terminal:

cd <path-to-marl>
mkdir build
cd build
cmake .. -DMARL_BUILD_EXAMPLES=1 -DMARL_BUILD_TESTS=1
make

The resulting binaries will be found in <path-to-marl>/build.

Windows

Marl can be built using Visual Studio 2019's CMake integration.

Using Marl in your CMake project

You can build and link Marl using add_subdirectory() in your project's CMakeLists.txt file:

set(MARL_DIR <path-to-marl>) # example <path-to-marl>: "${CMAKE_CURRENT_SOURCE_DIR}/third_party/marl"
add_subdirectory(${MARL_DIR})

This will define the marl library target, which you can pass to target_link_libraries():

target_link_libraries(<target> marl) # replace <target> with the name of your project's target

You may also wish to specify your own paths to the third party libraries used by marl. You can do this by setting any of the following variables before the call to add_subdirectory():

set(MARL_THIRD_PARTY_DIR <third-party-root-directory>) # defaults to ${MARL_DIR}/third_party
set(MARL_GOOGLETEST_DIR  <path-to-googletest>)         # defaults to ${MARL_THIRD_PARTY_DIR}/googletest
add_subdirectory(${MARL_DIR})

Usage Recommendations

Capture marl synchronization primitives by value

All marl synchronization primitives aside from marl::ConditionVariable should be lambda-captured by value:

marl::Event event;
marl::schedule([=]{ // [=] Good, [&] Bad.
  event.signal();
});

Internally, these primitives hold a shared pointer to the primitive state. By capturing by value we avoid common issues where the primitive may be destructed before the last reference is used.
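
For illustration, here is a minimal sketch (hypothetical, assuming the headers from the example above) of the failure mode that capture-by-value avoids:

void broken() {
  marl::Event event;
  marl::schedule([&] {  // [&] captures a reference to the local `event`...
    event.signal();     // ...which dangles if broken() has already returned.
  });
}  // `event` is destructed here, possibly before the task runs.

Because each primitive holds a shared pointer to its state, the [=] capture keeps that state alive until the last task referencing it has finished.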

Create one instance of marl::Scheduler, use it for the lifetime of the process

The marl::Scheduler constructor can be expensive as it may spawn a number of hardware threads.
Destructing the marl::Scheduler requires waiting on all tasks to complete.

Multiple marl::Schedulers may fight each other for hardware thread utilization.

For these reasons, it is recommended to create a single marl::Scheduler for the lifetime of your process.

For example:

int main() {
  marl::Scheduler scheduler(marl::Scheduler::Config::allCores());
  scheduler.bind();
  defer(scheduler.unbind());

  return do_program_stuff();
}

Bind the scheduler to externally created threads

In order to call marl::schedule() the scheduler must be bound to the calling thread. Failure to bind the scheduler to the thread before calling marl::schedule() will result in undefined behavior.

marl::Scheduler may be simultaneously bound to any number of threads, and the scheduler can be retrieved from a bound thread with marl::Scheduler::get().

A typical way to pass the scheduler from one thread to another would be:

std::thread spawn_new_thread() {
  // Grab the scheduler from the currently running thread.
  marl::Scheduler* scheduler = marl::Scheduler::get();

  // Spawn the new thread.
  return std::thread([=] {
    // Bind the scheduler to the new thread.
    scheduler->bind();
    defer(scheduler->unbind());

    // You can now safely call `marl::schedule()`
    run_thread_logic();
  });
}

Always remember to unbind the scheduler before terminating the thread. Forgetting to unbind will result in the marl::Scheduler destructor blocking indefinitely.

Don't use externally blocking calls in marl tasks

The marl::Scheduler internally holds a number of worker threads which will execute the scheduled tasks. If a marl task becomes blocked on a marl synchronization primitive, marl can yield from the blocked task and continue execution of other scheduled tasks.

Calling a non-marl blocking function on a marl worker thread will prevent that worker thread from being able to switch to execute other tasks until the blocking function has returned. Examples of these non-marl blocking functions include: std::mutex::lock(), std::condition_variable::wait(), accept().

Short blocking calls are acceptable, such as a mutex lock to access a data structure. However, be careful not to make a marl blocking call while holding a std::mutex lock - the marl task may yield with the lock held, blocking other tasks from re-locking the mutex. This sort of situation may end in deadlock.
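
A minimal sketch of that anti-pattern (hypothetical names; assumes <mutex> and the marl headers are included; the non-marl mutex is captured by reference only for illustration):

std::mutex mu;      // non-marl mutex guarding some shared data
marl::Event ready;

marl::schedule([=, &mu] {  // marl primitives by value, the mutex by reference
  std::lock_guard<std::mutex> lock(mu);
  ready.wait();  // BAD: the task may yield here with `mu` held, blocking
                 // every other task or thread that tries to lock `mu`.
});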

If you need to make a blocking call from a marl worker thread, you may wish to use marl::blocking_call(), which will spawn a new thread for performing the call, allowing the marl worker to continue processing other scheduled tasks.
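
A minimal sketch of its intended use, assuming the marl/blockingcall.h header and a hypothetical read_from_socket() function:

#include "marl/blockingcall.h"

int read_from_socket(int fd);  // a non-marl blocking call (hypothetical)

void task_body(int fd) {
  // marl::blocking_call() invokes the function on a dedicated thread and
  // fiber-blocks the current task until it returns, so this worker can
  // keep executing other tasks in the meantime.
  int n = marl::blocking_call(read_from_socket, fd);
  (void)n;
}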


Note: This is not an officially supported Google product

Comments
  • Efficient queuing of tasks that start waiting?

    Efficient queuing of tasks that start waiting?

    In my usage pattern I'm creating a possibly large number of tasks that may have a decent % initially waiting. For example, think of the pathological Ticket::Queue use case where the first (or one of the first) operations would be a wait:

    void runTasksConcurrentThenSerially(int N) {
        marl::Ticket::Queue queue;
        auto tickets = queue.take(N);
        for (int i = 0; i < N; i++) {
            marl::schedule([=] {
                tickets[N - 1 - i].wait();  // or some other ordering
                doSerialWork();
                tickets[i].done();
            });
        }
    }
    

    Right now each task will get enqueued, switched to, and then immediately switched out when its ticket wait() blocks, and for certain workloads this can be the norm rather than a pathological case.

    Maybe the better approach for this is to use the tasks-in-tasks technique to only create the tasks when they are runnable (see the sketch below)? If so, are there other performance gotchas from doing that? For example, in the attached diagram, would it be more efficient to queue up all of these using tickets ahead of time (so Bn waits for A, C waits for Bn (waitgroup, etc.), etc.), or instead to have each task queue up the following ones (so A only, then A runs and enqueues Bn, etc.)?

    I'm mostly interested in the fan-out case, where A->Bn may be going 1 to 100 and Bn->C would be going 100 to 1. I wouldn't want C to wake 100 times only to wait again on 99 of them, for example. But I also wouldn't want to pessimize the fan-out across threads at Bn by waiting until A was executing to enqueue the tasks.

    Thoughts?
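
    For reference, a minimal sketch of the tasks-in-tasks variant described above (hypothetical doA/doB/doC work functions, fan-out of 100):

    marl::schedule([=] {
      doA();                      // A runs first...
      marl::WaitGroup wg(100);
      for (int i = 0; i < 100; i++) {
        marl::schedule([=] {      // ...and enqueues Bn only once runnable.
          defer(wg.done());
          doB(i);
        });
      }
      marl::schedule([=] {
        wg.wait();                // C wakes exactly once, after all Bn.
        doC();
      });
    });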

    opened by benvanik 12
  • iOS support?

    iOS support?

    Marl looks really interesting but my codebase primarily targets mobile, so I'd need support for both iOS and Android. Is iOS support on the cards at all?

    A bit of background - my codebase does real-time CPU-side camera processing and many of the high CPU parts could be split across cores relatively easily. The work is non-blocking so I don't think I really need fibers or yielding, I just want to spread it out between workers.

    I was thinking of writing a simple dispatch queue type of thing where workers would just all pull the next job from a shared queue, but thought I'd look at some of the more complete third-party library solutions. If you think marl is overkill for this use case then that would be really helpful to hear too!

    There might even be standard library solutions that will work - I've been slow to learn what's new in C++ as historically the codebase worked with lots of old / esoteric compilers (old MSVC, Symbian, Android ndk before it had a standard library...). I have started adopting some new features from C++11 but I haven't dedicated much time to learning all the new stuff.

    opened by tangobravo 11
  • CMake: link with pthread in a CMake way

    CMake: link with pthread in a CMake way

    Hi! @ben-clayton Sorry for the long delay. This patch is based on this discussion: [cmake and libpthread](https://stackoverflow.com/questions/1620918/cmake-and-libpthread).

    The difference is that, by using:

    find_package(Threads REQUIRED)
    target_link_libraries(myproject PUBLIC Threads::Threads)
    

    an additional -pthread flag is added when compiling, and it may sometimes make a difference: see "Significance of -pthread flag when compiling".

    opened by myd7349 9
  • CMake: Ensure MARL_BUILD_TESTS works

    CMake: Ensure MARL_BUILD_TESTS works

    The conditional for building tests was written as simply "BUILD_TESTS". This commit changes it to use "MARL_BUILD_TESTS".

    With this commit, my POWER9 in Big Endian mode can build, run, and pass the Marl test suite.

    opened by awilfox 9
  • Feedback on Marl

    Feedback on Marl

    Hi, I'm just going to leave some feedback and thoughts on marl, as recommended in this external issue. Note that my experience with marl is limited and involves only the incomplete benchmark repository I made and a point cloud processing library (for robotics), polylidar, which is undergoing a major refactor that also includes marl.

    Oh nice! How are you finding marl?

    I find marl very simple to build and integrate into my existing project. The CMake build integration is very important to me and seems to be set up correctly, such that it was trivial to add to my project. The Scheduler and WaitGroup primitives also seem simple enough, and it was apparent how to integrate them into polylidar. Overall, I enjoy how simple it is to describe work in tasks and dependencies with the wait groups.

    Given that your benchmarks put marl mostly in last place in terms of performance, can I ask what convinced you to go with marl?

    I chose to try out marl before those benchmarks were completed :). The main reasons were the following:

    1. Many of the tasks for polylidar are dynamic and nested, meaning you have no idea how many subtasks must be completed inside of a task (and all those subtasks must finish before their parent task is completed). It seemed like marl was primarily focused on this scenario. Also, I do not generally have thousands of tasks, maybe around 100 total. I figured the difference between all these libraries might be small at such a low number of tasks.
    2. I saw that marl had hand-written assembly for the ARM architecture in its fiber/thread scheduling. I will be running some of my code on an RPi4 and felt more comfortable using a library that had been tested on that architecture. Of course, that's just a way of saying I was too lazy to (initially) benchmark performance across architectures and implementations, and was relying on Google's name and heavy use of Android/ARM to assume good testing and performance. Note I do plan to run my benchmark repo on my RPi4 after it has solidified.

    So that led me to just experiment with marl first. And it worked. Very well.

    However I do have some general comments:

    My system: CPU: Ryzen 3900X (12 cores / 24 threads); RAM: 32 GB; OS: Ubuntu 18.04; Compiler: GCC 7.x (C++14).

    1. There seems to be a ~300 microsecond penalty when using marl, even if only 1 thread is used. You can see where I create the scheduler here.

    I have benchmarks that compare using marl and not using marl for a light workload that only requires about 300 microseconds of work. In this output, Optimized means using marl.

    ImagesAndSparseMesh/BM_ExtractPlanesAndPolygonsFromMultipleNormals/1/real_time_mean                   336 us          336 us            3
    ImagesAndSparseMesh/BM_ExtractPlanesAndPolygonsFromMultipleNormals/1/real_time_median                 335 us          335 us            3
    ImagesAndSparseMesh/BM_ExtractPlanesAndPolygonsFromMultipleNormals/1/real_time_stddev                2.36 us         2.35 us            3
    ImagesAndSparseMesh/BM_ExtractPlanesAndPolygonsFromMultipleNormalsOptimized/1/real_time_mean          666 us          355 us            3
    ImagesAndSparseMesh/BM_ExtractPlanesAndPolygonsFromMultipleNormalsOptimized/1/real_time_median        664 us          355 us            3
    ImagesAndSparseMesh/BM_ExtractPlanesAndPolygonsFromMultipleNormalsOptimized/1/real_time_stddev       8.92 us         1.03 us            3
    

    Some thoughts that I have not experimented with or validated yet: Is this a startup cost associated with the instantiation of the Scheduler object? If so, should I put it in a class variable (created only once) instead of creating the scheduler in the function call (see the sketch below)? Or is this just the penalty that marl has - which is not that steep a penalty.

    Note that I would not try to reproduce these results with my repository, because you need the input data and I feel it's too complex. If this startup cost is unexpected, it would be better for me to generate an MRE in my benchmark repo.
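
    For what it's worth, a minimal sketch of the "class variable" idea above (hypothetical Processor class, not from polylidar; uses the Config API and defer macro from the README):

    class Processor {
     public:
      Processor() : scheduler_(marl::Scheduler::Config::allCores()) {}
      void process() {
        scheduler_.bind();           // binding per call is comparatively cheap
        defer(scheduler_.unbind());
        // ... schedule tasks ...
      }
     private:
      marl::Scheduler scheduler_;    // worker threads are spawned only once
    };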

    2. There was a previous issue brought up by another user here. The point I want to highlight is the use of RAII. I kind of agree that it felt pretty weird/different to use the defer macro. I have read the issue and I understand it is optional, but I felt there were not a lot of examples showing how to accomplish this without using defer. At one point I was also trying to call .done() on two different wait groups in one defer, and that did not work well (this was in the earlier, poorly written binary_tree benchmark for marl). I'm really not that amazing at C++: the common RAII idiom I seem to get, but defer leaves me a little puzzled about what's going on and what I need to do to ensure that, no matter what (even exceptions), the wait group gets decremented so I don't wait forever (see the sketch after this list).

    3. The API of cpp-taskflow, even though I really haven't used it much either, seems amazing. I'm really impressed by how succinct and simple it is to describe dependencies between tasks and then execute them. It's probably out of scope for marl, but it would be nice to have something similar.
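
    A minimal sketch (not a marl API) of wrapping WaitGroup::done() in an RAII guard, so the counter is decremented on every exit path, including exceptions:

    class WaitGroupGuard {
     public:
      explicit WaitGroupGuard(marl::WaitGroup wg) : wg_(wg) {}
      ~WaitGroupGuard() { wg_.done(); }  // runs even during stack unwinding
     private:
      marl::WaitGroup wg_;
    };

    marl::WaitGroup wg(1);  // example wait group
    marl::schedule([=] {
      WaitGroupGuard guard(wg);  // replaces defer(wg.done())
      workThatMayThrow();        // hypothetical
    });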

    If I think of anything else I will let you know. Thanks again for releasing this project and putting such effort into improving it.

    opened by JeremyBYU 8
  • When I run marl-unittests on an ARM SoC, a segmentation fault occurs

    When I run marl-unittests on an ARM SoC, a segmentation fault occurs

    Starting program: /mnt/baidu/idl-xteam/hisi-faceID-sdk/cross_3rdparty/opensource/marl/build/marl-unittests
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/a53_softfp_neon-vfpv4/libthread_db.so.1".
    [==========] Running 316 tests from 5 test suites.
    [----------] Global test environment set-up.
    [----------] 18 tests from WithoutBoundScheduler
    [ RUN      ] WithoutBoundScheduler.ConditionVariable
    [       OK ] WithoutBoundScheduler.ConditionVariable (1 ms)
    [ RUN      ] WithoutBoundScheduler.Defer
    [       OK ] WithoutBoundScheduler.Defer (0 ms)
    [ RUN      ] WithoutBoundScheduler.DeferOrder
    [       OK ] WithoutBoundScheduler.DeferOrder (0 ms)
    [ RUN      ] WithoutBoundScheduler.OSFiber

    Program received signal SIGSEGV, Segmentation fault.
    0xb6fded60 in _dl_fixup () from /lib/ld-linux.so.3
    (gdb) bt
    #0  0xb6fded60 in _dl_fixup () from /lib/ld-linux.so.3
    #1  0xb6fe5d64 in _dl_runtime_resolve () from /lib/ld-linux.so.3
    #2  0xb6fe5d64 in _dl_runtime_resolve () from /lib/ld-linux.so.3
    #3  0xb6fe5d64 in _dl_runtime_resolve () from /lib/ld-linux.so.3
    #4  0xb6fe5d64 in _dl_runtime_resolve () from /lib/ld-linux.so.3
    #5  0xb6fe5d64 in _dl_runtime_resolve () from /lib/ld-linux.so.3
    #6  0xb6fe5d64 in _dl_runtime_resolve () from /lib/ld-linux.so.3

    CPU info:
    processor : 0
    model name : ARMv7 Processor rev 5 (v7l)
    BogoMIPS : 100.00
    Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
    CPU implementer : 0x41
    CPU architecture: 7
    CPU variant : 0x0
    CPU part : 0xc07
    CPU revision : 5

    processor : 1
    model name : ARMv7 Processor rev 5 (v7l)
    BogoMIPS : 100.00
    Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
    CPU implementer : 0x41
    CPU architecture: 7
    CPU variant : 0x0
    CPU part : 0xc07
    CPU revision : 5

    Hardware : Generic DT based system
    Revision : 0000
    Serial : 0000000000000000

    Toolchain info:
    arm-himix200-linux-g++ (HC&C V1R3C00SPC200B005_20190606) 6.3.0
    Copyright (C) 2016 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

    What I want to know is: how can I fix this?

    opened by beimingxinghai 7
  • Fractal example: poor scalability and unexpected number of CPUs

    Fractal example: poor scalability and unexpected number of CPUs

    Hi,

    I have been playing around with the only example and I was quite surprised by its poor performance.

    First of all, I modified the code to run in a purely sequential way (no marl). This is the time required by a single CPU:

    real	0m0.025s
    user	0m0.025s
    sys	0m0.000s
    

    Afterward, I modified the code as follows:

    int main(int argc, const char** argv) {
      marl::Scheduler scheduler;
      uint32_t num_threads = atoi(argv[1]);
      scheduler.setWorkerThreadCount(num_threads);
    

    Running with argument "1" I expected one CPU to be used, but actually 2 CPUs are used (100% each).

    real	0m18.354s
    user	0m32.423s
    sys	0m3.972s
    

    Argument "2" apparently uses 3 CPUs:

    real	0m16.790s
    user	0m39.521s
    sys	0m10.168s
    

    Argument "4" apparently uses 5 CPUs (I have 8):

    real	0m15.852s
    user	0m49.900s
    sys	0m25.362s
    

    So, basically, the only example provided so far seems to suggest that marl kind of... disappoints, spending most of its time "somewhere" in a blocking operation.

    After some profiling, I have the feeling that the culprit is a mutex in the function rand(). See the attached flamegraph.


    My suggestion is to either fix this (I don't know how) or provide a different example where we can actually say: "hey, look how scalable it is with the number of CPUs"!
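
    One conventional fix for rand()'s internal lock - offered here as a sketch, not a tested patch - is a thread-local generator:

    #include <cstdlib>  // RAND_MAX
    #include <random>

    int threadLocalRand() {
      thread_local std::mt19937 rng{std::random_device{}()};  // one per thread
      return std::uniform_int_distribution<int>(0, RAND_MAX)(rng);
    }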

    I am also puzzled by the fact that the number of CPUs used is always equal to (num_threads + 1).

    opened by facontidavide 7
  • No longer using fiber API on Windows

    No longer using fiber API on Windows

    I think the Windows Fiber API and the ucontext API no longer need to be used. You could use the core code of Boost.Context to obtain higher performance and a smaller stack space. A smaller stack space on 32-bit platforms means more tasks can be created. What do you think?

    opened by a952135763 6
  • thread affinity support for *nix systems

    thread affinity support for *nix systems

    Do you have plans to support thread pinning of the worker threads, like in the Windows implementation?

    It would be nice if a "user" of marl could define some sort of affinity policy to be able to adjust to requirements (e.g. single-thread pinning vs. pinning a group of threads together).

    Thinking in pseudo++ code:

    
    // we might have 4 workers pinned to 2 compute cores
    // in any case, we have to use cores `1` and `2`
    auto allowedCpusFromSomewhere( int workerId ) { return marl::CpuSet{1, 2 }; };
    
    main() {
      ...
      marl::Scheduler scheduler;
      scheduler.bind();
      //scheduler.setWorkerThreadCount(4);
      scheduler.setWorkerThreadCountWithAffinity(
        4, []( auto const workerId, auto const& nativeHandle ) noexcept {
          marl::AffinityPolicy.pin( nativeHandle, allowedCpusFromSomewhere(workerId) );
          // Or
          nativeHandle.pin(CpuSet{1,2,3,4});
          // Or
      }
      // Or
      scheduler.setWorkerThreadCountWithAffinity(marl::ForAll{marl::CpuSet{1,2,3,4}});
      scheduler.setWorkerThreadCountWithAffinity(marl::OneOf{marl::CpuSet{1,2,3,4}});
      ...
    }
    

    Furthermore, a user might just want to specify exactly which CPU cores are allowed to be used due to "external" requirements (think of network card capture into a worker queue: the capture threads are pinned to single cores, and these must be different from marl's native worker threads).

    Would that make any sense?

    (Yes, I looked into src/thread.cc and read the comment claiming that pinning to a group is favourable over pinning to a single core. However, I guess it could be application dependent - why not let the user decide instead of hardcoding?)

    Anyhow, on *nix there is currently no pinning at all.

    opened by magenbluten 6
  • Please consider replacing foreign language idioms with proper C++ equivalents

    Please consider replacing foreign language idioms with proper C++ equivalents

    The RAII idiom is the native way in C++ land to handle resource management, while the try-with-resources/finally/defer-like idiom is an escape hatch for rare occasions. External destruction makes code prone to resource leakage and, more importantly, memory corruption and misuse.

    The other issue is the lack of proper copy/move constructors/operators. Please check the C++ guidelines on movable/copyable classes.

    One such example is the Scheduler class:

    1. It looks like either a non-copyable or even a non-movable class, yet it has no copy constructors deleted or move constructors defined. As a result, it implicitly uses the generated ones, whose semantics depend on its member fields. This may lead to unexpected copyability and handle duplication, with consequent dangling pointers or double frees.
    2. The destructor only checks that the scheduler was unbound. The proper approach would be to perform the unbind there and, if movability is desired, provide an explicit move constructor that leaves the old instance in a "moved-from" state.
    opened by target-san 6
  • Unpoison thread_local variables for MSAN

    Unpoison thread_local variables for MSAN

    The current method of working around MSAN errors for thread-local variables by annotating getters and setters with CLANG_NO_SANITIZE_MEMORY was insufficient because the getters don't actually dereference the memory. Rather, all functions that dereference the pointers must be annotated with CLANG_NO_SANITIZE_MEMORY, which is error-prone. This PR switches to an alternative workaround that unpoisons the memory in the getter.

    This false-positive MSAN issue is seen when trying to upgrade Chromium MSAN bots from Ubuntu 18.04 to 20.04, and should be fixed by this PR.

    opened by tanderson-google 5
  • Multiple tasks queued to the same worker while other workers are idle

    Multiple tasks queued to the same worker while other workers are idle

    Great library. Thanks to all the contributors! Here's the situation...

    I sometimes subdivide a larger body of work into multiple tasks - in this case, exactly 4 tasks. Those tasks are enqueued in rapid succession (a for loop). I have many free workers on my Threadripper CPU, and the vast majority of the cores are waiting for work (not spinning). In this situation, I often observe that two of the tasks will be assigned to the same worker. That worker will do both tasks one after the other, taking twice as long as the other workers, which only had to perform one task. The caller has to wait for all of the tasks to complete, so the end-to-end time is twice as long as it could have been if the tasks had each been assigned to different workers. Each task takes 5+ milliseconds, so there should be plenty of time for other workers to steal the task, if they were attempting to do so. I would like to avoid this situation, and I would appreciate your advice.

    The obvious options include:

    1. Modify Scheduler::enqueue so that it keeps iterating over workers until it finds one that is ready to begin the work immediately, if such a worker exists. This would increase the cost of enqueueing a task, so it might have a detrimental effect when tasks outnumber workers. For that reason, I have considered adding a flag which would opt in to this behavior (to be used for tasks that are known to be long enough in duration to be worth the extra effort up front).
    2. Another variation of option 1 would be to keep a list of idle workers, but the bookkeeping overhead might be significant.
    3. Or, I could allow the tasks to be doubled up on the same worker, and then wake up one of the other workers so that it has the opportunity to steal work.

    What do you think? Is there a recommended solution to this problem? I know that subdividing the work into smaller tasks would help the scheduler, but there is also a cost that increases with the number of tasks (combining the results). Thank you for taking the time to consider this question.

    opened by natepaynefb 1
  • Difference between option_if_not_defined and option

    Difference between option_if_not_defined and option

    Hello, I am curious why @ben-clayton defined the CMake function option_if_not_defined. I have tested it and found that option_if_not_defined and option have the same behavior. So can you tell me why you did that? I found that option_if_not_defined replaced option on 2019/11/20.

    function (option_if_not_defined name description default)
        if(NOT DEFINED ${name})
            option(${name} ${description} ${default})
        endif()
    endfunction()
    

    Regards, zhufangda

    opened by zhufangda 1
  • [Question] Mutex for fiber

    [Question] Mutex for fiber

    I need a mutex at the fiber level to protect data shared between fibers. I found marl::mutex, but it is a thread-level mutex. Is there any fiber-level mutex implementation for marl, like boost::fibers::mutex in Boost.Fiber?

    opened by zhufangda 3
  • scheduler: Support moved task captures / arguments

    scheduler: Support moved task captures / arguments

    Replace the internal use of std::function with std::packaged_task. std::function requires that the wrapped function is CopyConstructible, whereas std::packaged_task does not. This allows tasks to hold std::move'd values.

    This is an API / ABI breaking change, but I believe few people would be copying marl::Tasks.
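
    A minimal standalone illustration of the difference (not marl code):

    #include <functional>
    #include <future>
    #include <memory>
    #include <utility>

    int main() {
      auto p = std::make_unique<int>(42);
      // std::function requires a copy-constructible callable, so this line
      // would not compile:
      //   std::function<void()> f([q = std::move(p)] { /* use *q */ });
      // std::packaged_task only requires move construction:
      std::packaged_task<void()> t([q = std::move(p)] { /* use *q */ });
      t();
    }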

    Fixes: #211

    opened by ben-clayton 0
  • [Question] Fiber yield behavior

    [Question] Fiber yield behavior

    I am looking for a way to 'yield' from the currently executing fiber, in the sense that the fiber would be sent to the back of the worker's Work::fibers queue (as opposed to Work::waiting) and would resume once its turn comes, without requiring any notification, event signaling, or timeout. I have been looking at the Scheduler::Fiber API and the various wait() overloads and cannot see a way to achieve that. So my questions are:

    • Is there a way to achieve this that I overlooked?
    • If there isn't, is there a fundamental/strong reason why the API doesn't include it, i.e. could it be added as a contribution? (A possible workaround is sketched below.)
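
    In the meantime, a possible workaround - a sketch only, assuming a freshly scheduled task lands at the back of the worker's queue (hypothetical doOneStep()):

    void stepThenYield(int remainingSteps) {
      doOneStep();  // do one slice of work, then...
      if (remainingSteps > 1) {
        // ...re-enqueue the remainder as a new task instead of yielding
        // the current fiber in place.
        marl::schedule([=] { stepThenYield(remainingSteps - 1); });
      }
    }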
    opened by Guillaume227 4
  • Can't capture unique_ptr in marl::schedule lambda

    Can't capture unique_ptr in marl::schedule lambda

    Here's a simple example:

    #include <iostream>
    #include <memory>  // std::unique_ptr, std::make_unique
    #include "marl/scheduler.h"
    
    int main(int argc, char** argv) {
      std::unique_ptr<int> input = std::make_unique<int>(10);
      marl::schedule([=, input = std::move(input)] {
        std::cout << "input: " << *input << std::endl;
      });
    }
    

    I got the following compile error:

    In file included from /usr/gcc_toolchain/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/functional:59:
    /usr/gcc_toolchain/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/std_function.h:176:10: error: call to implicitly-deleted copy constructor of '(lambda at .../test.cc:7:18)'
                new _Functor(*__source._M_access<const _Functor*>());
                    ^        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    /usr/gcc_toolchain/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/std_function.h:211:8: note: in instantiation of member function 'std::_Function_base::_Base_manager<(lambda at .../test.cc:7:18)>::_M_clone' requested here
                  _M_clone(__dest, __source, _Local_storage());
                  ^
    /usr/gcc_toolchain/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/std_function.h:677:33: note: in instantiation of member function 'std::_Function_base::_Base_manager<(lambda at .../test.cc:7:18)>::_M_manager' requested here
                _M_manager = &_My_handler::_M_manager;
                                           ^
    external/marl/include/marl/scheduler.h:534:27: note: in instantiation of function template specialization 'std::function<void ()>::function<(lambda at .../test.cc:7:18), void, void>' requested here
      scheduler->enqueue(Task(std::forward<Function>(f)));
                              ^
    ...test.cc:7:9: note: in instantiation of function template specialization 'marl::schedule<(lambda at .../test.cc:7:18)>' requested here
      marl::schedule([=, input = std::move(input)] {
            ^
    .../test.cc:7:22: note: copy constructor of '' is implicitly deleted because field '' has a deleted copy constructor
      marl::schedule([=, input = std::move(input)] {
                         ^
    /usr/gcc_toolchain/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_ptr.h:414:7: note: 'unique_ptr' has been explicitly marked deleted here
          unique_ptr(const unique_ptr&) = delete;
          ^
    1 error generated.
    

    A std::thread() lambda can take a unique_ptr though. Any ideas? Thanks.

    opened by 123qws 1
Owner
Google
Google ❤️ Open Source
An easy-to-use multithreading thread pool library for C. It is a handy stream-like job scheduler with an automatic garbage collector. This is a multithreaded job scheduler for non-I/O-bound computation.

An easy-to-use multithreading thread pool library for C. It is a handy stream-like job scheduler with an automatic garbage collector for non-I/O-bound computation.

Hyoung Min Suh 12 Jun 4, 2022
EnkiTS - A permissively licensed C and C++ Task Scheduler for creating parallel programs. Requires C++11 support.

Support development of enkiTS through Github Sponsors or Patreon enkiTS Master branch Dev branch enki Task Scheduler A permissively licensed C and C++

Doug Binks 1.4k Dec 27, 2022
A library for enabling task-based multi-threading. It allows execution of task graphs with arbitrary dependencies.

Fiber Tasking Lib This is a library for enabling task-based multi-threading. It allows execution of task graphs with arbitrary dependencies. Dependenc

RichieSams 796 Dec 30, 2022
Lucy job system - Fiber-based job system with extremely simple API

Lucy Job System This is outdated compared to Lumix Engine. Use that instead. Fiber-based job system with extremely simple API. It's a standalone versi

Mikulas Florek 80 Dec 21, 2022
Termite-jobs - Fast, multiplatform fiber based job dispatcher based on Naughty Dogs' GDC2015 talk.

NOTE This library is obsolete and may contain bugs. For maintained version checkout sx library. until I rip it from there and make a proper single-hea

Sepehr Taghdisian 35 Jan 9, 2022
Coroutine - C++11 single .h asymmetric coroutine implementation via ucontext / fiber

C++11 single .h asymmetric coroutine implementation API in namespace coroutine: routine_t create(std::function<void()> f); void destroy(routine_t id);

null 390 Dec 20, 2022
Thread pool - Thread pool using std::* primitives from C++17, with optional priority queue/greenthreading for POSIX.

thread_pool Thread pool using std::* primitives from C++11. Also includes a class for a priority thread pool. Requires concepts and C++17, including c

Tyler Hardin 77 Dec 30, 2022
Thread-pool - Thread pool implementation using c++11 threads

Table of Contents Introduction Build instructions Thread pool Queue Submit function Thread worker Usage example Use case#1 Use case#2 Use case#3 Futur

Mariano Trebino 655 Dec 27, 2022
Thread-pool-cpp - High performance C++11 thread pool

thread-pool-cpp It is highly scalable and fast. It is header only. No external dependencies, only standard library needed. It implements both work-ste

Andrey Kubarkov 542 Dec 17, 2022
Sqrt OS is a simulation of an OS scheduler and memory manager using different scheduling algorithms including Highest Priority First (non-preemptive), Shortest Remaining Time Next, and Round Robin

A CPU scheduler determines an order for the execution of its scheduled processes; it decides which process will run according to a certain data structure that keeps track of the processes in the system and their status.

null 10 Sep 7, 2022
afl/afl++ with a hierarchical seed scheduler

This is developed based on AFLplusplus (2.68c, Qemu mode), thanks to its amazing maintainers and community Build and Run Please follow the instruction

null 47 Nov 25, 2022
Forkpool - A bleeding-edge, lock-free, wait-free, continuation-stealing scheduler for C++20

riften::Forkpool A bleeding-edge, lock-free, wait-free, continuation-stealing scheduler for C++20. This project uses C++20's coroutines to implement c

Conor Williams 129 Dec 31, 2022
Bikeshed - Lock free hierarchical work scheduler

Branch OSX / Linux / Windows master master bikeshed Lock free hierarchical work scheduler Builds with MSVC, Clang and GCC, header only, C99 compliant,

Dan Engelbrecht 81 Dec 30, 2022
Scheduler - Modern C++ Scheduling Library

Scheduler Modern C++ Header-Only Scheduling Library. Tasks run in thread pool. Requires C++11 and ctpl_stl.h in the path. Inspired by the Rufus-Schedu

Spencer Bosma 232 Dec 21, 2022
A General-purpose Parallel and Heterogeneous Task Programming System

Taskflow Taskflow helps you quickly write parallel and heterogeneous tasks programs in modern C++ Why Taskflow? Taskflow is faster, more expressive, a

Taskflow 7.6k Dec 31, 2022
A task scheduling framework designed for the needs of game developers.

Intel Games Task Scheduler (GTS) To the documentation. Introduction GTS is a C++ task scheduling framework for multi-processor platforms. It is design

null 424 Jan 3, 2023
A header-only C++ library for task concurrency

transwarp Doxygen documentation transwarp is a header-only C++ library for task concurrency. It allows you to easily create a graph of tasks where eve

Christian Blume 592 Dec 19, 2022
A General-purpose Parallel and Heterogeneous Task Programming System

Taskflow Taskflow helps you quickly write parallel and heterogeneous task programs in modern C++ Why Taskflow? Taskflow is faster, more expressive, an

Taskflow 7.6k Dec 26, 2022
Task System presented in "Better Code: Concurrency - Sean Parent"

task_system task_system provides a task scheduler for modern C++. The scheduler manages an array of concurrent queues A task, when scheduled, is enque

Pranav 31 Dec 7, 2022