A General-purpose Parallel and Heterogeneous Task Programming System

Overview

Taskflow


Taskflow helps you quickly write parallel and heterogeneous task programs in modern C++

Why Taskflow?

Taskflow is faster, more expressive, and easier for drop-in integration than many existing task programming frameworks when handling complex parallel workloads.

Taskflow lets you quickly implement task decomposition strategies that incorporate both regular and irregular compute patterns, together with an efficient work-stealing scheduler to optimize your multithreaded performance.

Static Tasking Dynamic Tasking
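
The following minimal sketch (not part of the official example set; the task names are placeholders) illustrates dynamic tasking: a running task spawns child tasks at runtime through a tf::Subflow.

#include <taskflow/taskflow.hpp>  // Taskflow is header-only

int main(){

  tf::Executor executor;
  tf::Taskflow taskflow;

  // the parent task spawns two child tasks at runtime through a tf::Subflow
  taskflow.emplace([] (tf::Subflow& sf) {
    tf::Task B1 = sf.emplace([] () { std::cout << "B1\n"; });
    tf::Task B2 = sf.emplace([] () { std::cout << "B2\n"; });
    B1.precede(B2);  // B1 runs before B2 inside the spawned subflow
  });

  executor.run(taskflow).wait();

  return 0;
}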

Taskflow supports conditional tasking for you to make rapid control-flow decisions across dependent tasks, implementing cycles and conditions that would otherwise be difficult to express with existing tools.

Conditional Tasking
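
Below is a minimal sketch of a condition task forming a loop, based on the condition-tasking interface described in the Taskflow documentation; the task names and loop bound are placeholders.

#include <taskflow/taskflow.hpp>  // Taskflow is header-only

int main(){

  tf::Executor executor;
  tf::Taskflow taskflow;

  tf::Task init = taskflow.emplace([] () { std::cout << "init\n"; });
  tf::Task body = taskflow.emplace([] () { std::cout << "body\n"; });

  // a task returning an int is a condition task; the returned index
  // selects which successor runs next (0 loops back, 1 exits)
  tf::Task cond = taskflow.emplace([i = 0] () mutable {
    return (++i < 3) ? 0 : 1;
  });
  tf::Task done = taskflow.emplace([] () { std::cout << "done\n"; });

  init.precede(body);
  body.precede(cond);
  cond.precede(body, done);  // forms the cycle body -> cond -> body

  executor.run(taskflow).wait();

  return 0;
}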

Taskflow is composable. You can create large parallel graphs through composition of modular and reusable blocks that are easier to optimize at an individual scope.

Taskflow Composition
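
A minimal sketch of composition, assuming the composed_of interface described in the Taskflow documentation; the taskflow and task names are placeholders.

#include <taskflow/taskflow.hpp>  // Taskflow is header-only

int main(){

  tf::Executor executor;

  // a reusable block of work, built once
  tf::Taskflow module;
  module.emplace([] () { std::cout << "module task\n"; });

  // a larger graph that embeds the block as a module task
  tf::Taskflow top;
  tf::Task before = top.emplace([] () { std::cout << "before\n"; });
  tf::Task mod    = top.composed_of(module);
  tf::Task after  = top.emplace([] () { std::cout << "after\n"; });
  before.precede(mod);
  mod.precede(after);

  executor.run(top).wait();

  return 0;
}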

Taskflow supports heterogeneous tasking for you to accelerate a wide range of scientific computing applications by harnessing the power of CPU-GPU collaborative computing.

Concurrent CPU-GPU Tasking
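
A minimal sketch of concurrent CPU-GPU tasking, assuming the tf::cudaFlow interface described in the Taskflow documentation (the exact cudaFlow API and header layout vary across releases); the saxpy kernel, data size, and launch configuration are placeholders. Compile with nvcc and --extended-lambda.

// saxpy.cu
#include <taskflow/taskflow.hpp>
#include <taskflow/cudaflow.hpp>
#include <algorithm>

__global__ void saxpy(int n, float a, float* x, float* y) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] = a * x[i] + y[i];
}

int main(){

  const int N = 1 << 20;
  float *x, *y;
  cudaMallocManaged(&x, N * sizeof(float));
  cudaMallocManaged(&y, N * sizeof(float));

  tf::Executor executor;
  tf::Taskflow taskflow;

  // a CPU task initializes the data; a GPU task runs the kernel afterwards
  tf::Task init = taskflow.emplace([=] () {
    std::fill_n(x, N, 1.0f);
    std::fill_n(y, N, 2.0f);
  });
  tf::Task gpu = taskflow.emplace([=] (tf::cudaFlow& cf) {
    cf.kernel((N + 255) / 256, 256, 0, saxpy, N, 2.0f, x, y);
  });
  init.precede(gpu);

  executor.run(taskflow).wait();

  cudaFree(x);
  cudaFree(y);
  return 0;
}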

Taskflow provides visualization and tooling needed for profiling Taskflow programs.

Taskflow Profiler

We are committed to supporting trustworthy development for both academic and industrial research projects in parallel computing. Check out Who is Using Taskflow and what our users say:

See a quick presentation and visit the documentation to learn more about Taskflow. For technical details, please refer to our IPDPS paper.

Start Your First Taskflow Program

The following program (simple.cpp) creates four tasks A, B, C, and D, where A runs before B and C, and D runs after B and C. When A finishes, B and C can run in parallel.

#include <taskflow/taskflow.hpp>  // Taskflow is header-only

int main(){
  
  tf::Executor executor;
  tf::Taskflow taskflow;

  auto [A, B, C, D] = taskflow.emplace(  // create 4 tasks
    [] () { std::cout << "TaskA\n"; },
    [] () { std::cout << "TaskB\n"; },
    [] () { std::cout << "TaskC\n"; },
    [] () { std::cout << "TaskD\n"; } 
  );                                  
                                      
  A.precede(B, C);  // A runs before B and C
  D.succeed(B, C);  // D runs after  B and C
                                      
  executor.run(taskflow).wait(); 

  return 0;
}

Taskflow is header-only, so there is nothing to install. To compile the program, clone the Taskflow project and tell the compiler to include the headers.

~$ git clone https://github.com/taskflow/taskflow.git  # clone it only once
~$ g++ -std=c++17 simple.cpp -I taskflow/taskflow -O2 -pthread -o simple
~$ ./simple
TaskA
TaskC 
TaskB 
TaskD

Visualize Your First Taskflow Program

Taskflow comes with a built-in profiler, TFProf, for you to profile and visualize taskflow programs in an easy-to-use web-based interface.

# run the program with the environment variable TF_ENABLE_PROFILER enabled
~$ TF_ENABLE_PROFILER=simple.json ./simple
~$ cat simple.json
[
{"executor":"0","data":[{"worker":0,"level":0,"data":[{"span":[172,186],"name":"0_0","type":"static"},{"span":[187,189],"name":"0_1","type":"static"}]},{"worker":2,"level":0,"data":[{"span":[93,164],"name":"2_0","type":"static"},{"span":[170,179],"name":"2_1","type":"static"}]}]}
]
# paste the profiling json data to https://taskflow.github.io/tfprof/

In addition to the execution diagram, you can dump the graph in DOT format and visualize it with a number of free GraphViz tools.

// dump the taskflow graph to a DOT format through std::cout
taskflow.dump(std::cout); 
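
Alternatively (a small sketch; the file names are arbitrary), you can write the DOT output to a file with std::ofstream and render it offline with GraphViz:

// or write the graph to a file (requires <fstream>)
std::ofstream ofs("simple.dot");
taskflow.dump(ofs);

~$ dot -Tpng simple.dot -o simple.png  # render with the GraphViz dot tool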

Supported Compilers

To use Taskflow, you only need a compiler that supports C++17:

  • GNU C++ Compiler at least v7.0 with -std=c++17
  • Clang C++ Compiler at least v6.0 with -std=c++17
  • Microsoft Visual Studio at least v19.27 with /std:c++17
  • AppleClang Xcode Version at least v12.0 with -std=c++17
  • Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17
  • Intel C++ Compiler at least v19.0.1 with -std=c++17

Taskflow works on Linux, Windows, and Mac OS X.

Learn More about Taskflow

Visit our project website and documentation to learn more about Taskflow. To get involved:

  • CppCon20 Tech Talk
  • MUC++ Tech Talk

We are committed to supporting trustworthy development for both academic and industrial research projects in parallel and heterogeneous computing. At the same time, we appreciate all Taskflow contributors!

License

Taskflow is licensed under the MIT License. You are completely free to redistribute your work derived from Taskflow.


Issues
  • _per_thread() data in separate compilation units

    _per_thread() data in separate compilation units

    When a tf::Executor is created in one translation unit and then used in another translation unit (or shared library?), the per-thread data inside the Executor is invalid. This causes asserts like

    ../bundled/taskflow-2.7.0/include/taskflow/core/executor.hpp:976: void tf::Executor::_invoke_dynamic_work_external(tf::Node*, tf::Graph&, bool): Assertion `worker &&
    worker->executor == this' failed.
    

    to trigger.

    The problem is https://github.com/taskflow/taskflow/blob/9d17ee3fb28ef8b9b92cd28d10d7ac840ad33de8/taskflow/core/executor.hpp#L446, which creates a different copy in each translation unit; see also https://stackoverflow.com/questions/185624/static-variables-in-an-inlined-function for more details.

    I am attaching an example that uses a shared library for this. Compile and run with:

    export LD_LIBRARY_PATH=`pwd`
    # 1. make libfoo.so
    g++ --std=c++14 -Wall -I ~/taskflow/installed/include/ -O2 -pthread -fpic -c other.cc
    g++ -shared -o libfoo.so other.o
    
    # 2. compile a.out and link libfoo.so
    g++ --std=c++14 -Wall -I ~/taskflow/installed/include/ -O2 -pthread -o a.out -L. main.cc -lfoo
    
    # 3. run
    ./a.out
    

    source code to reproduce: taskflow-thread-local-bug.zip

    bug 
    opened by tjhei 21
  • Taskflow reuse

    Taskflow reuse

    I came across cpp-taskflow recently and I find it very interesting and well designed. However, from my point of view, it would be great to have the possibility to reuse a Taskflow graph after its completion.

    To give a bit of background, I work on robot control and we always have a bunch of stuff to compute in a real-time loop (up to 1kHz/1ms) in order to send the robot(s) a new command. Taskflow would fit perfectly to describe and perform all the computations, but having to reconstruct the graph at each iteration is not very elegant and may also introduce some latency due to the dynamic memory allocations that happen behind the scenes.

    Is it something that could be implemented in cpp-taskflow?
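
    For reference, a minimal sketch of the reuse pattern being asked about, assuming the run/run_n interface of current Taskflow releases (the control computation is a placeholder):

    tf::Executor executor;
    tf::Taskflow taskflow;  // build the graph once, outside the control loop
    taskflow.emplace([] () { /* control computation */ });

    executor.run_n(taskflow, 1000).wait();  // re-run the same graph 1000 times
    // or, inside the real-time loop: executor.run(taskflow).wait();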

    opened by BenjaminNavarro 20
  • Register callback function to run when task is completed

    Register callback function to run when task is completed

    Hi,

    C++ has a bit of a weird API when it comes to std::future and monitoring progress. It is "doable" but certainly not trivial and requires quite a bit of hackery. What I once did was start the tasks from within a sub-thread and let that thread block on the tasks, like your wait_for_all() function. Then, once those tasks were done, the block lifted and the next line would run to, in effect, notify some other class that the task is done.

    I'm hoping for an API function like that in taskflow. You now have the functions dispatch/silent_dispatch/wait_for_topologies/wait_for_all. I would like to request a new addition to those, let's call it dispatch_callback, which would be non-blocking and take one callable as an argument. That callable would be called once all tasks are done processing.

    The implementation could be what I described above as hackery :) But I'm guessing you have much neater options at your disposal as you're already managing the thread pool behind it.

    For me, it would make this project really useful! Even though I'm merely using it as a thread pool, it simply becomes very easy and convenient to use and to implement multithreading in something :)

    Cheers, Mark
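
    For reference, a minimal sketch of the requested pattern, assuming the run() overload of current Taskflow releases that accepts a callable invoked when the run finishes:

    tf::Executor executor;
    tf::Taskflow taskflow;
    taskflow.emplace([] () { std::cout << "work\n"; });

    // non-blocking: the callable fires once all tasks are done processing
    executor.run(taskflow, [] () { std::cout << "all tasks done\n"; });
    executor.wait_for_all();  // only needed before program exit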

    opened by markg85 19
  • Add semaphores to limit concurrency for certain tasks

    Add semaphores to limit concurrency for certain tasks

    This is the early stages of implementing a "Semaphore" to limit concurrency in certain sections / tasks of the graph.

    A task can be given the requirement to acquire one or multiple semaphores before executing its work and a task can be given the job to release one or multiple semaphores after finishing its work. A task can acquire and release a semaphore, or just acquire or just release it. Semaphores start with an initial count. As long as that count is above 0, tasks can acquire the semaphore and do their work. If the count is 0 or less, a task trying to acquire the semaphore is not run and instead put into a waiting list of that semaphore. When the semaphore is released by another task, then tasks on that waiting list are scheduled to be run again (trying to acquire their semaphore(s) again).

    I've added a simple example with 5 tasks with no links/dependencies between them, which under normal circumstances would be executed concurrently. The example however has a semaphore with initial count 1, and all tasks need to acquire that semaphore before running and release it after they're done. This limits the concurrently running tasks to only one. See examples/onlyone.cpp
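
    For reference, a minimal sketch of the acquire/release pattern described above, assuming the tf::Semaphore interface as it appears in current Taskflow releases:

    tf::Executor executor(4);
    tf::Taskflow taskflow;
    tf::Semaphore semaphore(1);  // initial count 1: at most one task runs at a time

    for (int i = 0; i < 5; ++i) {
      tf::Task task = taskflow.emplace([i] () { std::cout << "task " << i << '\n'; });
      task.acquire(semaphore);   // acquire before running
      task.release(semaphore);   // release after finishing
    }

    executor.run(taskflow).wait();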

    Todo:

    • [x] Generally: Test! Right now I just have that single example which runs as expected from a first glance.
    • [x] Add ability for acquire/release to change semaphore counter by another value than 1
    • [x] Measure performance impact when feature is not used.
    • [x] Make sure semaphores don't needlessly block worker threads.

    I'd be happy about general feedback about the implementation so far! Did I follow your coding standards? Any oversights? Suggestions for improvement?

    opened by musteresel 18
  • Schedule an individual task with an executor

    Schedule an individual task with an executor

    We're working to convert the deal.II project (see https://github.com/dealii/dealii) to use cpp-taskflow (#170 was a result of this attempt), and in https://github.com/dealii/dealii/issues/10388 I was wondering whether there is a way to schedule individual tasks with an executor without actually creating a TaskFlow object.

    Right now, we are creating individual tasks using std::async (see https://github.com/dealii/dealii/pull/10389). In essence, this just puts the object into the background. But if cpp-taskflow uses a different thread pool to schedule task flows than the C++ runtime environment, then these two thread pools will get in each other's way. As a consequence, I'm looking for a way to run individual tasks through cpp-taskflow executors instead of std::async. It seems unnecessary to create a whole task graph for a single task, but I can't quite find the right interface to avoid this.
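
    For reference, a minimal sketch of running individual callables without building a graph, assuming the async interface that newer Taskflow releases provide on tf::Executor:

    tf::Executor executor;

    // submit a single callable directly to the executor; no tf::Taskflow needed
    auto fu = executor.async([] () { return 42; });
    std::cout << fu.get() << '\n';

    executor.silent_async([] () { /* fire-and-forget task */ });
    executor.wait_for_all();  // wait for all outstanding async tasks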

    enhancement 
    opened by bangerth 18
  • Is taskflow suitable for this scenario?

    Is taskflow suitable for this scenario?

    This looks like a wonderful project.

    Here is my problem:

    I would like to run an incremental filter on a compressed 2D image i.e. decompress the image in small windows and then filter each window once it is fully decompressed. The uncompressed image is broken down into 64x64 code blocks (each code block has a matching compressed code stream), and the blocks are grouped into overlapping 128x128 windows, where each window covers 4 blocks. All blocks in a window must be decompressed before that window gets filtered.

    Say I have a 192x192 image; then there are 4 overlapping windows.

    Let's say I have four threads, and each thread processes one window. Each thread tries to decompress its blocks, and when they are all decompressed, it applies the filter. But if a block is already being decompressed by another thread, then we must wait until the other thread is done (though while waiting we can steal decompression jobs from some other thread).

    So... could this problem be modelled efficiently with taskflow?

    One more issue: I would like to set the thread affinity of the threads in the pool, and reserve one core and its hyperthreaded sibling for each window.

    Thanks very much! Aaron
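
    For reference, a minimal sketch of how the block/window dependencies described above could be expressed as a Taskflow graph; decode_block and filter_window are placeholder functions, and the 3x3 block / 2x2 window layout matches the 192x192 example:

    #include <taskflow/taskflow.hpp>
    #include <vector>

    // placeholder work functions, for illustration only
    void decode_block(int b)  { std::cout << "decode block "  << b << '\n'; }
    void filter_window(int w) { std::cout << "filter window " << w << '\n'; }

    int main() {
      tf::Executor executor;
      tf::Taskflow taskflow;

      // one task per 64x64 code block (3x3 blocks for a 192x192 image)
      std::vector<tf::Task> blocks;
      for (int b = 0; b < 9; ++b) {
        blocks.push_back(taskflow.emplace([b] () { decode_block(b); }));
      }

      // one task per overlapping 128x128 window; each waits on its four blocks,
      // so a shared block is decoded once and feeds every window that needs it
      for (int wi = 0; wi < 2; ++wi) {
        for (int wj = 0; wj < 2; ++wj) {
          tf::Task filter = taskflow.emplace([=] () { filter_window(wi * 2 + wj); });
          for (int bi = wi; bi <= wi + 1; ++bi)
            for (int bj = wj; bj <= wj + 1; ++bj)
              blocks[bi * 3 + bj].precede(filter);
        }
      }

      executor.run(taskflow).wait();
      return 0;
    }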

    opened by boxerab 17
  • Passing data from tasks to their successors

    Passing data from tasks to their successors

    Could you suggest an easy way to pass data (return values) from tasks to their successors? Suppose A->C and B->C and C needs the results of A and B to proceed (typical of divide & conquer algorithms - BTW an example of d&c would be really helpful - e.g. TBB has a great example for recursive Fibonacci).

    As far as I understand, I'd need to emplace A and B, getting the futures; then I need to capture the futures in a lambda that is passed to C.

    Alternatively, I'd have to create variables for the results of A and B, and capture one of them in A and B, and both of them in C. These variables cannot be allocated on the stack, as it can get destroyed before the tasks have a chance to run, so one has to use dynamic memory allocation.

    These approaches have major disadvantages:

    • futures are heavy objects, allocating memory and requiring synchronisation - I believe they are overkill for this scenario: we already know that A and B have stopped once C is running, so it is safe to retrieve the return values without synchronisation; moreover, if A and B have not been destroyed yet (my understanding is that the clean-up is only performed when the graph terminates), no memory allocation is necessary.
    • The code becomes unnecessarily ugly - I think it would be much cleaner to pass a lambda accepting 2 parameters to C, and guarantee that the order of parameters corresponds to the order of precede() etc. calls.
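
    For reference, a minimal sketch of the capture-based workaround described in the question (result slots held in storage that outlives the tasks; the values are placeholders):

    tf::Executor executor;
    tf::Taskflow taskflow;

    // result slots kept alive for the whole run via shared_ptr
    auto a_result = std::make_shared<int>();
    auto b_result = std::make_shared<int>();

    tf::Task A = taskflow.emplace([a_result] () { *a_result = 1; });
    tf::Task B = taskflow.emplace([b_result] () { *b_result = 2; });
    tf::Task C = taskflow.emplace([a_result, b_result] () {
      // A and B are guaranteed to have finished before C runs
      std::cout << (*a_result + *b_result) << '\n';
    });

    A.precede(C);
    B.precede(C);
    executor.run(taskflow).wait();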
    opened by vkhomenko 16
  • Memory is not being released.

    Memory is not being released.

    I have been noticing that memory is not being released even though the taskflow goes out of scope. Does anyone know why, for the case below, the memory is not being released?

    // includes added so the snippet compiles stand-alone
    #include <taskflow/taskflow.hpp>
    #include <iostream>
    #include <limits>
    #include <memory>
    #include <string>
    #include <unordered_map>
    #include <vector>
    
    struct TestStruct
    {
      using Ptr = std::shared_ptr<TestStruct>;
      using ConstPtr = std::shared_ptr<const TestStruct>;
    
      std::unordered_map<std::string, double> joints;
    };
    
    void runTest()
    {
      std::vector<TestStruct::Ptr> t;
      t.reserve(10000000);
      for (std::size_t j = 0; j < 10000000; j++)
      {
        auto p = std::make_shared<TestStruct>();
        p->joints["joint_1"] = 0;
        p->joints["joint_2"] = 0;
        p->joints["joint_3"] = 0;
        p->joints["joint_4"] = 0;
        p->joints["joint_5"] = 0;
        p->joints["joint_6"] = 0;
        t.push_back(p);
      }
    }
    
    int main(int /*argc*/, char** /*argv*/)
    {
      tf::Executor executor;
    
      {
        tf::Taskflow taskflow;
    
        taskflow.emplace([=]() { runTest(); });
    
        std::future<void> f = executor.run(taskflow);
        f.wait();
      }
    
      std::cout << "Hit enter key to continue!" << std::endl;
      std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    

    Now, if I replace runTest with the function below, it does what I expect: the memory ramps up and then drops back to where it started.

    inline std::mt19937 mersenne{ static_cast<std::mt19937::result_type>(std::time(nullptr)) };
    void runTest()
    {
      std::vector<double> t;
      t.reserve(1000000000);
      for (std::size_t j = 0; j < 1000000000; j++)
      {
        std::uniform_real_distribution<double> sample{ 0, 10 };
        t.push_back(sample(mersenne));
      }
    }
    
    opened by Levi-Armstrong 15
  • Cannot build cudaflow on windows

    Cannot build cudaflow on windows

    I'm trying to build the CUDA samples on Windows 10 using CUDA 11.2. When I run the taskflow CMake build, it does not compile any of the cudaflow samples but compiles everything else. I tried compiling a cudaflow sample manually and it throws the error below.

    PS C:\Users\GPomare> nvcc -std=c++17 -I .\taskflow --extended-lambda .\sample.cu -o sample
    sample.cu
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(17): error: no operator "!=" matches these operands
                operand types are: const char [27] != cudaError
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(17): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(17): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(17): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(26): error: no operator "!=" matches these operands
                operand types are: const char [32] != cudaError
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(26): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(26): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(26): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(34): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(34): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(34): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(41): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(41): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(41): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(51): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(51): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(51): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(104): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(104): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(104): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(116): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(116): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(116): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(128): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(128): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(128): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(140): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(140): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(140): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(152): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(152): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(152): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(164): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(164): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(164): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(176): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(176): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(176): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(188): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(188): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(188): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(200): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(200): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(200): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(212): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(212): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(212): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(224): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(224): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(224): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(236): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(236): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(236): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(252): error: no operator "!=" matches these operands
                operand types are: const char [64] != cudaError
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(252): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(252): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(252): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(264): error: no operator "!=" matches these operands
                operand types are: const char [37] != cudaError
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(264): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(264): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(264): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(320): error: no operator "!=" matches these operands
                operand types are: const char [35] != cudaError
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(320): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(320): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(320): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(325): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(325): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_device.hpp(325): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_memory.hpp(22): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_memory.hpp(22): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_memory.hpp(22): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_memory.hpp(34): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_memory.hpp(34): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_memory.hpp(34): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_memory.hpp(130): error: no operator "!=" matches these operands
                operand types are: const char [34] != cudaError
    
    C:\Users\GPomare\taskflow\cuda\cuda_memory.hpp(130): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_memory.hpp(130): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_memory.hpp(130): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_memory.hpp(151): error: no operator "!=" matches these operands
                operand types are: const char [34] != cudaError
    
    C:\Users\GPomare\taskflow\cuda\cuda_memory.hpp(151): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_memory.hpp(151): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_memory.hpp(151): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_stream.hpp(19): error: no operator "!=" matches these operands
                operand types are: const char [40] != cudaError
    
    C:\Users\GPomare\taskflow\cuda\cuda_stream.hpp(19): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_stream.hpp(19): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_stream.hpp(19): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_stream.hpp(30): error: no operator "!=" matches these operands
                operand types are: const char [22] != cudaError
    
    C:\Users\GPomare\taskflow\cuda\cuda_stream.hpp(30): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_stream.hpp(30): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_stream.hpp(30): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_stream.hpp(46): error: no operator "!=" matches these operands
                operand types are: const char [31] != cudaError
    
    C:\Users\GPomare\taskflow\cuda\cuda_stream.hpp(46): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_stream.hpp(46): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_stream.hpp(46): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_graph.hpp(131): error: no operator "!=" matches these operands
                operand types are: const char [38] != cudaError
    
    C:\Users\GPomare\taskflow\cuda\cuda_graph.hpp(131): error: expected an identifier
    
    C:\Users\GPomare\taskflow\cuda\cuda_graph.hpp(131): error: cannot deduce "auto" type (initializer required)
    
    C:\Users\GPomare\taskflow\cuda\cuda_graph.hpp(131): error: no instance of overloaded function "tf::ostreamize" matches the argument list
                argument types are: (std::ostringstream)
    
    C:\Users\GPomare\taskflow\cuda\cuda_graph.hpp(143): error: no operator "!=" matches these operands
                operand types are: const char [33] != cudaError
    
    C:\Users\GPomare\taskflow\cuda\cuda_graph.hpp(143): error: expected an identifier
    
    Error limit reached.
    100 errors detected in the compilation of "./sample.cu".
    Compilation terminated.
    
    
    bug 
    opened by GPomare 13
  • Attaching user data to tasks

    Attaching user data to tasks

    Is it possible, or considered for future development, to attach user data to a task? My primary interest is a seemingly simple case where a taskflow consists of tasks, each with a consistent set of boolean flags attached to it. The flags are used by conditional tasks to skip over some calculations if certain flags are set in dependencies.

    It seems that Node::name is a similar kind of data. Maybe it could be generalized for such use cases.

    I can also help implement this.
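
    For reference, a minimal sketch of a workaround that keeps the flags in user-owned storage captured by the condition task (all names are placeholders):

    struct TaskFlags { bool skip_heavy_step = false; };  // placeholder flag set

    tf::Executor executor;
    tf::Taskflow taskflow;
    auto flags = std::make_shared<TaskFlags>();

    tf::Task produce = taskflow.emplace([flags] () {
      flags->skip_heavy_step = true;          // a dependency sets the flag
    });
    tf::Task cond = taskflow.emplace([flags] () {
      return flags->skip_heavy_step ? 1 : 0;  // condition task reads the flag
    });
    tf::Task heavy = taskflow.emplace([] () { /* expensive calculation */ });
    tf::Task skip  = taskflow.emplace([] () { /* nothing to do */ });

    produce.precede(cond);
    cond.precede(heavy, skip);                // 0 -> heavy, 1 -> skip

    executor.run(taskflow).wait();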

    opened by Endilll 13
  • Calling a pybind11 function in a for_each_index executor causes a deadlock

    Calling a pybind11 function in a for_each_index executor causes a deadlock

    Describe the bug: minimal test case attached.

    callback_test.tar.gz

    Calling a pybind11 function in a for_each_index executor causes a deadlock:

    tests/test_callback.py [0.0, 1.0, 2.0, 3.0, 4.0]
    [0.0, 1.0, 2.0, 3.0, 4.0]
    [0.0, 1.0, 2.0, 3.0, 4.0]
    [0.0, 1.0, 2.0, 3.0, 4.0]
    [New Thread 0x7fff95c94700 (LWP 3891)]
    ^C--Type <RET> for more, q to quit, c to continue without paging--
    
    Thread 1 "pytest" received signal SIGINT, Interrupt.
    syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
    38      ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory.
    (gdb) where
    #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
    #1  0x00007ffff57b9859 in std::__atomic_futex_unsigned_base::_M_futex_wait_until(unsigned int*, unsigned int, bool, std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) () from /lib/x86_64-linux-gnu/libstdc++.so.6
    #2  0x00007fff95ca6dda in ?? ()
       from /home/mvm/.pyenv/versions/3.9.7/envs/studio/lib/python3.9/site-packages/callback_test-VERSION-py3.9-linux-x86_64.egg/callback_test/_callback_test.cpython-39-x86_64-linux-gnu.so
    #3  0x00007fff95ca5538 in ?? ()
       from /home/mvm/.pyenv/versions/3.9.7/envs/studio/lib/python3.9/site-packages/callback_test-VERSION-py3.9-linux-x86_64.egg/callback_test/_callback_test.cpython-39-x86_64-linux-gnu.so
    #4  0x00007fff95cb10d9 in ?? ()
       from /home/mvm/.pyenv/versions/3.9.7/envs/studio/lib/python3.9/site-packages/callback_test-VERSION-py3.9-linux-x86_64.egg/callback_test/_callback_test.cpython-39-x86_64-linux-gnu.so
    #5  0x00007ffff7d06cb3 in cfunction_call (func=0x7fff95cd7b30, args=<optimized out>, kwargs=<optimized out>) at Objects/methodobject.c:543
    #6  0x00007ffff7cbc634 in _PyObject_MakeTpCall (tstate=0x55555555c550, callable=0x7fff95cd7b30, args=<optimized out>, nargs=<optimized out>, keywords=0x0) at Objects/call.c:191
    #7  0x00007ffff7cbf65a in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=1, args=0x7fff95ce77b0, callable=0x7fff95cd7b30, tstate=0x55555555c550) at ./Include/cpython/abstract.h:116
    #8  _PyObject_VectorcallTstate (kwnames=0x0, nargsf=1, args=0x7fff95ce77b0, callable=0x7fff95cd7b30, tstate=0x55555555c550) at ./Include/cpython/abstract.h:103
    #9  method_vectorcall (method=<optimized out>, args=0x7fff95ce77b8, nargsf=<optimized out>, kwnames=0x0) at Objects/classobject.c:53
    #10 0x00007ffff7c76c11 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fff95ce77b8, callable=0x7fff95cdbdc0, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #11 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7fff95ce77b8, callable=0x7fff95cdbdc0) at ./Include/cpython/abstract.h:127
    #12 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #13 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3487
    #14 0x00007ffff7c6e42b in _PyEval_EvalFrame (throwflag=0, f=0x7fff95ce7610, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #15 function_code_fastcall (tstate=0x55555555c550, co=<optimized out>, args=<optimized out>, nargs=0, globals=<optimized out>) at Objects/call.c:330
    #16 0x00007ffff7cbc053 in PyVectorcall_Call (callable=0x7fff96b12430, tuple=<optimized out>, kwargs=<optimized out>) at Objects/call.c:231
    #17 0x00007ffff7c72fe8 in do_call_core (kwdict=0x7fff95cdb4c0, callargs=0x7ffff7530040, func=0x7fff96b12430, tstate=0x55555555c550) at Python/ceval.c:5123
    #18 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3580
    #19 0x00007ffff7dac814 in _PyEval_EvalFrame (throwflag=0, f=0x7fff95cec040, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #20 _PyEval_EvalCode ([email protected]=0x55555555c550, _co=<optimized out>, globals=<optimized out>, [email protected]=0x0, args=<optimized out>, argcount=1, kwnames=0x0, kwargs=0x7ffff5080d80,
        kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7ffff6cdf800, qualname=0x7ffff6cdf800) at Python/ceval.c:4327
    #21 0x00007ffff7cbc38e in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
    #22 0x00007ffff7cbc053 in PyVectorcall_Call (callable=0x7ffff6bfd940, tuple=<optimized out>, kwargs=<optimized out>) at Objects/call.c:231
    #23 0x00007ffff7c72fe8 in do_call_core (kwdict=0x0, callargs=0x7ffff5080d60, func=0x7ffff6bfd940, tstate=0x55555555c550) at Python/ceval.c:5123
    #24 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3580
    #25 0x00007ffff7dac814 in _PyEval_EvalFrame (throwflag=0, f=0x555556895730, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #26 _PyEval_EvalCode ([email protected]=0x55555555c550, _co=<optimized out>, globals=<optimized out>, [email protected]=0x0, args=<optimized out>, argcount=4, kwnames=0x0, kwargs=0x7fff95ceb200,
        kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7ffff6d9c2b0, qualname=0x7ffff6d9c2b0) at Python/ceval.c:4327
    #27 0x00007ffff7cbc38e in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
    #28 0x00007ffff7c76c11 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fff95ceb1e0, callable=0x7ffff6d9db80, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #29 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7fff95ceb1e0, callable=0x7ffff6d9db80) at ./Include/cpython/abstract.h:127
    #30 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #31 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3487
    #32 0x00007ffff7c6e42b in _PyEval_EvalFrame (throwflag=0, f=0x7fff95ceb040, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #33 function_code_fastcall (tstate=0x55555555c550, co=<optimized out>, args=<optimized out>, nargs=5, globals=<optimized out>) at Objects/call.c:330
    #34 0x00007ffff7cbf620 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=5, args=0x7fff95ce4f00, callable=0x7ffff6db43a0, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #35 method_vectorcall (method=<optimized out>, args=0x7fff95ce4f08, nargsf=<optimized out>, kwnames=0x0) at Objects/classobject.c:53
    #36 0x00007ffff7c76c11 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fff95ce4f08, callable=0x7ffff6996a00, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #37 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7fff95ce4f08, callable=0x7ffff6996a00) at ./Include/cpython/abstract.h:127
    #38 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #39 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3487
    #40 0x00007ffff7dac814 in _PyEval_EvalFrame (throwflag=0, f=0x7fff95ce4d60, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #41 _PyEval_EvalCode ([email protected]=0x55555555c550, _co=<optimized out>, globals=<optimized out>, [email protected]=0x0, args=<optimized out>, argcount=1, kwnames=0x7ffff5080d18,
        kwargs=0x7ffff508c500, kwcount=1, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7ffff752c1f0, qualname=0x7ffff6daf5d0) at Python/ceval.c:4327
    #42 0x00007ffff7cbc38e in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
    --Type <RET> for more, q to quit, c to continue without paging--
    #43 0x00007ffff7cbc852 in _PyObject_FastCallDictTstate ([email protected]=0x55555555c550, [email protected]=0x7ffff6dacb80, [email protected]=0x7fffffff98f0, [email protected]=1,
        [email protected]=0x7ffff5046f40) at Objects/call.c:129
    #44 0x00007ffff7cbcac4 in _PyObject_Call_Prepend ([email protected]=0x55555555c550, [email protected]=0x7ffff6dacb80, [email protected]=0x7ffff6990d90, [email protected]=0x7ffff7530040,
        [email protected]=0x7ffff5046f40) at Objects/call.c:489
    #45 0x00007ffff7d2b5a9 in slot_tp_call (self=0x7ffff6990d90, args=0x7ffff7530040, kwds=0x7ffff5046f40) at Objects/typeobject.c:6718
    #46 0x00007ffff7cbc634 in _PyObject_MakeTpCall (tstate=0x55555555c550, callable=0x7ffff6990d90, args=<optimized out>, nargs=<optimized out>, keywords=0x7ffff6bfbac0) at Objects/call.c:191
    #47 0x00007ffff7c75dc9 in _PyObject_VectorcallTstate (kwnames=0x7ffff6bfbac0, nargsf=<optimized out>, args=0x7fff95ce6eb8, callable=0x7ffff6990d90, tstate=0x55555555c550) at ./Include/cpython/abstract.h:116
    #48 _PyObject_VectorcallTstate (kwnames=0x7ffff6bfbac0, nargsf=<optimized out>, args=0x7fff95ce6eb8, callable=0x7ffff6990d90, tstate=0x55555555c550) at ./Include/cpython/abstract.h:103
    #49 PyObject_Vectorcall (kwnames=0x7ffff6bfbac0, nargsf=<optimized out>, args=0x7fff95ce6eb8, callable=0x7ffff6990d90) at ./Include/cpython/abstract.h:127
    #50 call_function (kwnames=0x7ffff6bfbac0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #51 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3535
    #52 0x00007ffff7c6e42b in _PyEval_EvalFrame (throwflag=0, f=0x7fff95ce6d40, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #53 function_code_fastcall (tstate=0x55555555c550, co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at Objects/call.c:330
    #54 0x00007ffff7c75fef in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fff95ce4d00, callable=0x7ffff6c12280, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #55 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7fff95ce4d00, callable=<optimized out>) at ./Include/cpython/abstract.h:127
    #56 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #57 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3504
    #58 0x00007ffff7c6e42b in _PyEval_EvalFrame (throwflag=0, f=0x7fff95ce4b80, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #59 function_code_fastcall (tstate=0x55555555c550, co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at Objects/call.c:330
    #60 0x00007ffff7cbc053 in PyVectorcall_Call (callable=0x7ffff6c414c0, tuple=<optimized out>, kwargs=<optimized out>) at Objects/call.c:231
    #61 0x00007ffff7c72fe8 in do_call_core (kwdict=0x0, callargs=0x7ffff68954f0, func=0x7ffff6c414c0, tstate=0x55555555c550) at Python/ceval.c:5123
    #62 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3580
    #63 0x00007ffff7dac814 in _PyEval_EvalFrame (throwflag=0, f=0x5555568954b0, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #64 _PyEval_EvalCode ([email protected]=0x55555555c550, _co=<optimized out>, globals=<optimized out>, [email protected]=0x0, args=<optimized out>, argcount=4, kwnames=0x0, kwargs=0x7fff95ce4b60,
        kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7ffff6d9c2b0, qualname=0x7ffff6d9c2b0) at Python/ceval.c:4327
    #65 0x00007ffff7cbc38e in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
    #66 0x00007ffff7c76c11 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fff95ce4b40, callable=0x7ffff6d9db80, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #67 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7fff95ce4b40, callable=0x7ffff6d9db80) at ./Include/cpython/abstract.h:127
    #68 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #69 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3487
    #70 0x00007ffff7c6e42b in _PyEval_EvalFrame (throwflag=0, f=0x7fff95ce49a0, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #71 function_code_fastcall (tstate=0x55555555c550, co=<optimized out>, args=<optimized out>, nargs=5, globals=<optimized out>) at Objects/call.c:330
    #72 0x00007ffff7cbf620 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=5, args=0x7fff95ce4960, callable=0x7ffff6db43a0, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #73 method_vectorcall (method=<optimized out>, args=0x7fff95ce4968, nargsf=<optimized out>, kwnames=0x0) at Objects/classobject.c:53
    #74 0x00007ffff7c76c11 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fff95ce4968, callable=0x7ffff6999280, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #75 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7fff95ce4968, callable=0x7ffff6999280) at ./Include/cpython/abstract.h:127
    #76 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #77 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3487
    #78 0x00007ffff7dac814 in _PyEval_EvalFrame (throwflag=0, f=0x7fff95ce47c0, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #79 _PyEval_EvalCode ([email protected]=0x55555555c550, _co=<optimized out>, globals=<optimized out>, [email protected]=0x0, args=<optimized out>, argcount=1, kwnames=0x7ffff69511a8,
        kwargs=0x7ffff508c6a0, kwcount=1, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7ffff752c1f0, qualname=0x7ffff6daf5d0) at Python/ceval.c:4327
    #80 0x00007ffff7cbc38e in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
    #81 0x00007ffff7cbc852 in _PyObject_FastCallDictTstate ([email protected]=0x55555555c550, [email protected]=0x7ffff6dacb80, [email protected]=0x7fffffffa4a0, [email protected]=1,
        [email protected]=0x7ffff50a8a80) at Objects/call.c:129
    #82 0x00007ffff7cbcac4 in _PyObject_Call_Prepend ([email protected]=0x55555555c550, [email protected]=0x7ffff6dacb80, [email protected]=0x7ffff699a040, [email protected]=0x7ffff7530040,
        [email protected]=0x7ffff50a8a80) at Objects/call.c:489
    #83 0x00007ffff7d2b5a9 in slot_tp_call (self=0x7ffff699a040, args=0x7ffff7530040, kwds=0x7ffff50a8a80) at Objects/typeobject.c:6718
    #84 0x00007ffff7cbc1c0 in _PyObject_Call (tstate=0x55555555c550, callable=0x7ffff699a040, args=0x7ffff7530040, kwargs=<optimized out>) at Objects/call.c:281
    #85 0x00007ffff7c72fe8 in do_call_core (kwdict=0x7ffff50a8a80, callargs=0x7ffff7530040, func=0x7ffff699a040, tstate=0x55555555c550) at Python/ceval.c:5123
    #86 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3580
    --Type <RET> for more, q to quit, c to continue without paging--
    #87 0x00007ffff7dac814 in _PyEval_EvalFrame (throwflag=0, f=0x7fff95cdee40, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #88 _PyEval_EvalCode ([email protected]=0x55555555c550, _co=<optimized out>, globals=<optimized out>, [email protected]=0x0, args=<optimized out>, argcount=0, kwnames=0x0, kwargs=0x555555dbb878,
        kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x7fff95cd8740, name=0x7ffff73e9530, qualname=0x7ffff6c1bcf0) at Python/ceval.c:4327
    #89 0x00007ffff7cbc38e in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
    #90 0x00007ffff7c750b8 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x555555dbb878, callable=0x7ffff503f700, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #91 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x555555dbb878, callable=<optimized out>) at ./Include/cpython/abstract.h:127
    #92 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #93 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3518
    #94 0x00007ffff7dac814 in _PyEval_EvalFrame (throwflag=0, f=0x555555dbb6b0, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #95 _PyEval_EvalCode ([email protected]=0x55555555c550, _co=<optimized out>, globals=<optimized out>, [email protected]=0x0, args=<optimized out>, argcount=2, kwnames=0x7ffff6c2ad58,
        kwargs=0x7fff95ce41f0, kwcount=2, kwstep=1, defs=0x7ffff6c9a298, defcount=1, kwdefs=0x0, closure=0x0, name=0x7ffff6da0930, qualname=0x7ffff6c2c440) at Python/ceval.c:4327
    #96 0x00007ffff7cbc38e in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
    #97 0x00007ffff7cbf620 in _PyObject_VectorcallTstate (kwnames=0x7ffff6c2ad40, nargsf=2, args=0x7fff95ce41e0, callable=0x7ffff6c41a60, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #98 method_vectorcall (method=<optimized out>, args=0x7fff95ce41e8, nargsf=<optimized out>, kwnames=0x7ffff6c2ad40) at Objects/classobject.c:53
    #99 0x00007ffff7c75245 in _PyObject_VectorcallTstate (kwnames=0x7ffff6c2ad40, nargsf=<optimized out>, args=0x7fff95ce41e8, callable=0x7fff95cdb940, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #100 PyObject_Vectorcall (kwnames=0x7ffff6c2ad40, nargsf=<optimized out>, args=0x7fff95ce41e8, callable=<optimized out>) at ./Include/cpython/abstract.h:127
    #101 call_function (kwnames=0x7ffff6c2ad40, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #102 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3535
    #103 0x00007ffff7dac814 in _PyEval_EvalFrame (throwflag=0, f=0x7fff95ce4040, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #104 _PyEval_EvalCode ([email protected]=0x55555555c550, _co=<optimized out>, globals=<optimized out>, [email protected]=0x0, args=<optimized out>, argcount=2, kwnames=0x0, kwargs=0x7fff95cd8228,
        kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7ffff6c25f80, qualname=0x7ffff6c25f80) at Python/ceval.c:4327
    #105 0x00007ffff7cbc38e in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
    #106 0x00007ffff7cbc053 in PyVectorcall_Call (callable=0x7ffff6c41820, tuple=<optimized out>, kwargs=<optimized out>) at Objects/call.c:231
    #107 0x00007ffff7c72fe8 in do_call_core (kwdict=0x7fff95cdb800, callargs=0x7fff95cd8200, func=0x7ffff6c41820, tstate=0x55555555c550) at Python/ceval.c:5123
    #108 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3580
    #109 0x00007ffff7dac814 in _PyEval_EvalFrame (throwflag=0, f=0x7fff95cdfd60, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #110 _PyEval_EvalCode ([email protected]=0x55555555c550, _co=<optimized out>, globals=<optimized out>, [email protected]=0x0, args=<optimized out>, argcount=3, kwnames=0x0, kwargs=0x7fff96b47da8,
        kwcount=0, kwstep=1, defs=0x7ffff6c88238, defcount=1, kwdefs=0x0, closure=0x0, name=0x7ffff6c2a5b0, qualname=0x7ffff6c2a5b0) at Python/ceval.c:4327
    #111 0x00007ffff7cbc38e in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
    #112 0x00007ffff7c750b8 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fff96b47d90, callable=0x7ffff6c41700, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #113 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7fff96b47d90, callable=<optimized out>) at ./Include/cpython/abstract.h:127
    #114 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #115 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3518
    #116 0x00007ffff7dac814 in _PyEval_EvalFrame (throwflag=0, f=0x7fff96b47be0, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #117 _PyEval_EvalCode ([email protected]=0x55555555c550, _co=<optimized out>, globals=<optimized out>, [email protected]=0x0, args=<optimized out>, argcount=1, kwnames=0x7ffff6cabbf8,
        kwargs=0x7fff95cde390, kwcount=1, kwstep=1, defs=0x7ffff6c2d818, defcount=2, kwdefs=0x0, closure=0x0, name=0x7ffff6c2a4f0, qualname=0x7ffff6c2a4f0) at Python/ceval.c:4327
    #118 0x00007ffff7cbc38e in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
    #119 0x00007ffff7c75245 in _PyObject_VectorcallTstate (kwnames=0x7ffff6cabbe0, nargsf=<optimized out>, args=0x7fff95cde388, callable=0x7ffff6c41310, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #120 PyObject_Vectorcall (kwnames=0x7ffff6cabbe0, nargsf=<optimized out>, args=0x7fff95cde388, callable=<optimized out>) at ./Include/cpython/abstract.h:127
    #121 call_function (kwnames=0x7ffff6cabbe0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #122 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3535
    #123 0x00007ffff7c6e42b in _PyEval_EvalFrame (throwflag=0, f=0x7fff95cde200, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #124 function_code_fastcall (tstate=0x55555555c550, co=<optimized out>, args=<optimized out>, nargs=2, globals=<optimized out>) at Objects/call.c:330
    #125 0x00007ffff7cbc053 in PyVectorcall_Call (callable=0x7ffff6c41280, tuple=<optimized out>, kwargs=<optimized out>) at Objects/call.c:231
    #126 0x00007ffff7c72fe8 in do_call_core (kwdict=0x0, callargs=0x7fff95cd8b00, func=0x7ffff6c41280, tstate=0x55555555c550) at Python/ceval.c:5123
    #127 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3580
    #128 0x00007ffff7dac814 in _PyEval_EvalFrame (throwflag=0, f=0x5555568944c0, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #129 _PyEval_EvalCode ([email protected]=0x55555555c550, _co=<optimized out>, globals=<optimized out>, [email protected]=0x0, args=<optimized out>, argcount=4, kwnames=0x0, kwargs=0x7fff95cdf7a0,
        kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7ffff6d9c2b0, qualname=0x7ffff6d9c2b0) at Python/ceval.c:4327
    #130 0x00007ffff7cbc38e in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
    --Type <RET> for more, q to quit, c to continue without paging--
    #131 0x00007ffff7c76c11 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fff95cdf780, callable=0x7ffff6d9db80, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #132 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7fff95cdf780, callable=0x7ffff6d9db80) at ./Include/cpython/abstract.h:127
    #133 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #134 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3487
    #135 0x00007ffff7c6e42b in _PyEval_EvalFrame (throwflag=0, f=0x7fff95cdf5e0, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #136 function_code_fastcall (tstate=0x55555555c550, co=<optimized out>, args=<optimized out>, nargs=5, globals=<optimized out>) at Objects/call.c:330
    #137 0x00007ffff7cbf620 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=5, args=0x555555db7d90, callable=0x7ffff6db43a0, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #138 method_vectorcall (method=<optimized out>, args=0x555555db7d98, nargsf=<optimized out>, kwnames=0x0) at Objects/classobject.c:53
    #139 0x00007ffff7c76c11 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x555555db7d98, callable=0x7ffff69999c0, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #140 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x555555db7d98, callable=0x7ffff69999c0) at ./Include/cpython/abstract.h:127
    #141 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #142 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3487
    #143 0x00007ffff7dac814 in _PyEval_EvalFrame (throwflag=0, f=0x555555db7bf0, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #144 _PyEval_EvalCode (tstate=tstate@entry=0x55555555c550, _co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=1, kwnames=0x7ffff5046858,
        kwargs=0x7ffff508cc00, kwcount=2, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7ffff752c1f0, qualname=0x7ffff6daf5d0) at Python/ceval.c:4327
    #145 0x00007ffff7cbc38e in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
    #146 0x00007ffff7cbc852 in _PyObject_FastCallDictTstate (tstate=tstate@entry=0x55555555c550, callable=callable@entry=0x7ffff6dacb80, args=args@entry=0x7fffffffbbe0, nargsf=nargsf@entry=1,
        kwargs=kwargs@entry=0x7ffff5046080) at Objects/call.c:129
    #147 0x00007ffff7cbcac4 in _PyObject_Call_Prepend (tstate=tstate@entry=0x55555555c550, callable=callable@entry=0x7ffff6dacb80, obj=obj@entry=0x7ffff699a1f0, args=args@entry=0x7ffff7530040,
        kwargs=kwargs@entry=0x7ffff5046080) at Objects/call.c:489
    #148 0x00007ffff7d2b5a9 in slot_tp_call (self=0x7ffff699a1f0, args=0x7ffff7530040, kwds=0x7ffff5046080) at Objects/typeobject.c:6718
    #149 0x00007ffff7cbc634 in _PyObject_MakeTpCall (tstate=0x55555555c550, callable=0x7ffff699a1f0, args=<optimized out>, nargs=<optimized out>, keywords=0x7ffff6d93400) at Objects/call.c:191
    #150 0x00007ffff7c75dc9 in _PyObject_VectorcallTstate (kwnames=0x7ffff6d93400, nargsf=<optimized out>, args=0x555555db4bc8, callable=0x7ffff699a1f0, tstate=0x55555555c550) at ./Include/cpython/abstract.h:116
    #151 _PyObject_VectorcallTstate (kwnames=0x7ffff6d93400, nargsf=<optimized out>, args=0x555555db4bc8, callable=0x7ffff699a1f0, tstate=0x55555555c550) at ./Include/cpython/abstract.h:103
    #152 PyObject_Vectorcall (kwnames=0x7ffff6d93400, nargsf=<optimized out>, args=0x555555db4bc8, callable=0x7ffff699a1f0) at ./Include/cpython/abstract.h:127
    #153 call_function (kwnames=0x7ffff6d93400, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #154 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3535
    #155 0x00007ffff7c6e42b in _PyEval_EvalFrame (throwflag=0, f=0x555555db4a30, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #156 function_code_fastcall (tstate=0x55555555c550, co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at Objects/call.c:330
    #157 0x00007ffff7cbc053 in PyVectorcall_Call (callable=0x7ffff6c469d0, tuple=<optimized out>, kwargs=<optimized out>) at Objects/call.c:231
    #158 0x00007ffff7c72fe8 in do_call_core (kwdict=0x0, callargs=0x7ffff6951370, func=0x7ffff6c469d0, tstate=0x55555555c550) at Python/ceval.c:5123
    #159 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3580
    #160 0x00007ffff7dac814 in _PyEval_EvalFrame (throwflag=0, f=0x555556894240, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #161 _PyEval_EvalCode (tstate=tstate@entry=0x55555555c550, _co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=4, kwnames=0x0, kwargs=0x7fff95cdf5c0,
        kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7ffff6d9c2b0, qualname=0x7ffff6d9c2b0) at Python/ceval.c:4327
    #162 0x00007ffff7cbc38e in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
    #163 0x00007ffff7c76c11 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fff95cdf5a0, callable=0x7ffff6d9db80, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #164 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7fff95cdf5a0, callable=0x7ffff6d9db80) at ./Include/cpython/abstract.h:127
    #165 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #166 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3487
    #167 0x00007ffff7c6e42b in _PyEval_EvalFrame (throwflag=0, f=0x7fff95cdf400, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #168 function_code_fastcall (tstate=0x55555555c550, co=<optimized out>, args=<optimized out>, nargs=5, globals=<optimized out>) at Objects/call.c:330
    #169 0x00007ffff7cbf620 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=5, args=0x7fff95cdf3c0, callable=0x7ffff6db43a0, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #170 method_vectorcall (method=<optimized out>, args=0x7fff95cdf3c8, nargsf=<optimized out>, kwnames=0x0) at Objects/classobject.c:53
    #171 0x00007ffff7c76c11 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fff95cdf3c8, callable=0x7ffff6999e00, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #172 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7fff95cdf3c8, callable=0x7ffff6999e00) at ./Include/cpython/abstract.h:127
    #173 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #174 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3487
    #175 0x00007ffff7dac814 in _PyEval_EvalFrame (throwflag=0, f=0x7fff95cdf220, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #176 _PyEval_EvalCode (tstate=tstate@entry=0x55555555c550, _co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=1, kwnames=0x7ffff50800b8,
        kwargs=0x7ffff508c9a0, kwcount=1, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7ffff752c1f0, qualname=0x7ffff6daf5d0) at Python/ceval.c:4327
    #177 0x00007ffff7cbc38e in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
    #178 0x00007ffff7cbc852 in _PyObject_FastCallDictTstate (tstate=tstate@entry=0x55555555c550, callable=callable@entry=0x7ffff6dacb80, args=args@entry=0x7fffffffc610, nargsf=nargsf@entry=1,
        kwargs=kwargs@entry=0x7ffff5093200) at Objects/call.c:129
    #179 0x00007ffff7cbcac4 in _PyObject_Call_Prepend (tstate=tstate@entry=0x55555555c550, callable=callable@entry=0x7ffff6dacb80, obj=obj@entry=0x7ffff699a310, args=args@entry=0x7ffff7530040,
        kwargs=kwargs@entry=0x7ffff5093200) at Objects/call.c:489
    #180 0x00007ffff7d2b5a9 in slot_tp_call (self=0x7ffff699a310, args=0x7ffff7530040, kwds=0x7ffff5093200) at Objects/typeobject.c:6718
    #181 0x00007ffff7cbc634 in _PyObject_MakeTpCall (tstate=0x55555555c550, callable=0x7ffff699a310, args=<optimized out>, nargs=<optimized out>, keywords=0x7ffff6f0d880) at Objects/call.c:191
    #182 0x00007ffff7c75dc9 in _PyObject_VectorcallTstate (kwnames=0x7ffff6f0d880, nargsf=<optimized out>, args=0x7ffff6cceee0, callable=0x7ffff699a310, tstate=0x55555555c550) at ./Include/cpython/abstract.h:116
    #183 _PyObject_VectorcallTstate (kwnames=0x7ffff6f0d880, nargsf=<optimized out>, args=0x7ffff6cceee0, callable=0x7ffff699a310, tstate=0x55555555c550) at ./Include/cpython/abstract.h:103
    #184 PyObject_Vectorcall (kwnames=0x7ffff6f0d880, nargsf=<optimized out>, args=0x7ffff6cceee0, callable=0x7ffff699a310) at ./Include/cpython/abstract.h:127
    #185 call_function (kwnames=0x7ffff6f0d880, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #186 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3535
    #187 0x00007ffff7c6e42b in _PyEval_EvalFrame (throwflag=0, f=0x7ffff6cced60, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #188 function_code_fastcall (tstate=0x55555555c550, co=<optimized out>, args=<optimized out>, nargs=2, globals=<optimized out>) at Objects/call.c:330
    #189 0x00007ffff7c750b8 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x555555ab4a18, callable=0x7ffff6c468b0, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #190 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x555555ab4a18, callable=<optimized out>) at ./Include/cpython/abstract.h:127
    #191 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #192 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3518
    #193 0x00007ffff7c6e42b in _PyEval_EvalFrame (throwflag=0, f=0x555555ab4870, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #194 function_code_fastcall (tstate=0x55555555c550, co=<optimized out>, args=<optimized out>, nargs=2, globals=<optimized out>) at Objects/call.c:330
    #195 0x00007ffff7c750b8 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7ffff50a2d18, callable=0x7ffff6c46790, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #196 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7ffff50a2d18, callable=<optimized out>) at ./Include/cpython/abstract.h:127
    #197 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #198 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3518
    #199 0x00007ffff7c6e42b in _PyEval_EvalFrame (throwflag=0, f=0x7ffff50a2ba0, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #200 function_code_fastcall (tstate=0x55555555c550, co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at Objects/call.c:330
    #201 0x00007ffff7cbc053 in PyVectorcall_Call (callable=0x7ffff6c46820, tuple=<optimized out>, kwargs=<optimized out>) at Objects/call.c:231
    #202 0x00007ffff7c72fe8 in do_call_core (kwdict=0x0, callargs=0x7ffff50800d0, func=0x7ffff6c46820, tstate=0x55555555c550) at Python/ceval.c:5123
    #203 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3580
    #204 0x00007ffff7dac814 in _PyEval_EvalFrame (throwflag=0, f=0x555555d9d200, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #205 _PyEval_EvalCode (tstate=tstate@entry=0x55555555c550, _co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=4, kwnames=0x0, kwargs=0x555555b704f0,
        kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7ffff6d9c2b0, qualname=0x7ffff6d9c2b0) at Python/ceval.c:4327
    #206 0x00007ffff7cbc38e in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
    #207 0x00007ffff7c76c11 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x555555b704d0, callable=0x7ffff6d9db80, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #208 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x555555b704d0, callable=0x7ffff6d9db80) at ./Include/cpython/abstract.h:127
    #209 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #210 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3487
    #211 0x00007ffff7c6e42b in _PyEval_EvalFrame (throwflag=0, f=0x555555b70330, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #212 function_code_fastcall (tstate=0x55555555c550, co=<optimized out>, args=<optimized out>, nargs=5, globals=<optimized out>) at Objects/call.c:330
    #213 0x00007ffff7cbf620 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=5, args=0x7ffff6830960, callable=0x7ffff6db43a0, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #214 method_vectorcall (method=<optimized out>, args=0x7ffff6830968, nargsf=<optimized out>, kwnames=0x0) at Objects/classobject.c:53
    #215 0x00007ffff7c76c11 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7ffff6830968, callable=0x7ffff69941c0, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #216 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7ffff6830968, callable=0x7ffff69941c0) at ./Include/cpython/abstract.h:127
    #217 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #218 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3487
    #219 0x00007ffff7dac814 in _PyEval_EvalFrame (throwflag=0, f=0x7ffff68307c0, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #220 _PyEval_EvalCode (tstate=tstate@entry=0x55555555c550, _co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=1, kwnames=0x7ffff5080088,
        kwargs=0x7ffff68c7180, kwcount=1, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7ffff752c1f0, qualname=0x7ffff6daf5d0) at Python/ceval.c:4327
    #221 0x00007ffff7cbc38e in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
    #222 0x00007ffff7cbc852 in _PyObject_FastCallDictTstate (tstate=tstate@entry=0x55555555c550, callable=callable@entry=0x7ffff6dacb80, args=args@entry=0x7fffffffd340, nargsf=nargsf@entry=1,
        kwargs=kwargs@entry=0x7ffff68cf300) at Objects/call.c:129
    #223 0x00007ffff7cbcac4 in _PyObject_Call_Prepend (tstate=tstate@entry=0x55555555c550, callable=callable@entry=0x7ffff6dacb80, obj=obj@entry=0x7ffff6a54280, args=args@entry=0x7ffff7530040,
        kwargs=kwargs@entry=0x7ffff68cf300) at Objects/call.c:489
    #224 0x00007ffff7d2b5a9 in slot_tp_call (self=0x7ffff6a54280, args=0x7ffff7530040, kwds=0x7ffff68cf300) at Objects/typeobject.c:6718
    #225 0x00007ffff7cbc634 in _PyObject_MakeTpCall (tstate=0x55555555c550, callable=0x7ffff6a54280, args=<optimized out>, nargs=<optimized out>, keywords=0x7ffff6f0da90) at Objects/call.c:191
    #226 0x00007ffff7c75dc9 in _PyObject_VectorcallTstate (kwnames=0x7ffff6f0da90, nargsf=<optimized out>, args=0x5555557bc9b8, callable=0x7ffff6a54280, tstate=0x55555555c550) at ./Include/cpython/abstract.h:116
    #227 _PyObject_VectorcallTstate (kwnames=0x7ffff6f0da90, nargsf=<optimized out>, args=0x5555557bc9b8, callable=0x7ffff6a54280, tstate=0x55555555c550) at ./Include/cpython/abstract.h:103
    #228 PyObject_Vectorcall (kwnames=0x7ffff6f0da90, nargsf=<optimized out>, args=0x5555557bc9b8, callable=0x7ffff6a54280) at ./Include/cpython/abstract.h:127
    #229 call_function (kwnames=0x7ffff6f0da90, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #230 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3535
    #231 0x00007ffff7c6e42b in _PyEval_EvalFrame (throwflag=0, f=0x5555557bc7f0, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #232 function_code_fastcall (tstate=0x55555555c550, co=<optimized out>, args=<optimized out>, nargs=2, globals=<optimized out>) at Objects/call.c:330
    #233 0x00007ffff7c750b8 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x5555557bcc20, callable=0x7ffff6d0d4c0, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #234 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x5555557bcc20, callable=<optimized out>) at ./Include/cpython/abstract.h:127
    #235 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #236 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3518
    #237 0x00007ffff7c6e42b in _PyEval_EvalFrame (throwflag=0, f=0x5555557bcaa0, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #238 function_code_fastcall (tstate=0x55555555c550, co=<optimized out>, args=<optimized out>, nargs=0, globals=<optimized out>) at Objects/call.c:330
    #239 0x00007ffff7c750b8 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x5555555ad8b0, callable=0x7ffff6d0d550, tstate=0x55555555c550) at ./Include/cpython/abstract.h:118
    #240 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x5555555ad8b0, callable=<optimized out>) at ./Include/cpython/abstract.h:127
    #241 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c550) at Python/ceval.c:5075
    #242 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3518
    #243 0x00007ffff7dac814 in _PyEval_EvalFrame (throwflag=0, f=0x5555555ad730, tstate=0x55555555c550) at ./Include/internal/pycore_ceval.h:40
    #244 _PyEval_EvalCode (tstate=0x55555555c550, _co=_co@entry=0x7ffff74a8870, globals=globals@entry=0x7ffff74a03c0, locals=locals@entry=0x7ffff74a03c0, args=args@entry=0x0, argcount=argcount@entry=0,
        kwnames=0x0, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4327
    #245 0x00007ffff7dacb62 in _PyEval_EvalCodeWithName (_co=_co@entry=0x7ffff74a8870, globals=globals@entry=0x7ffff74a03c0, locals=locals@entry=0x7ffff74a03c0, args=args@entry=0x0, argcount=argcount@entry=0,
        kwnames=kwnames@entry=0x0, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4359
    #246 0x00007ffff7dacbb2 in PyEval_EvalCodeEx (_co=_co@entry=0x7ffff74a8870, globals=globals@entry=0x7ffff74a03c0, locals=locals@entry=0x7ffff74a03c0, args=args@entry=0x0, argcount=argcount@entry=0,
        kws=kws@entry=0x0, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:4375
    #247 0x00007ffff7dacbdf in PyEval_EvalCode (co=co@entry=0x7ffff74a8870, globals=globals@entry=0x7ffff74a03c0, locals=locals@entry=0x7ffff74a03c0) at Python/ceval.c:826
    #248 0x00007ffff7defb05 in run_eval_code_obj (locals=0x7ffff74a03c0, globals=0x7ffff74a03c0, co=0x7ffff74a8870, tstate=0x55555555c550) at Python/pythonrun.c:1219
    #249 run_mod (mod=<optimized out>, filename=filename@entry=0x7ffff73d3030, globals=globals@entry=0x7ffff74a03c0, locals=locals@entry=0x7ffff74a03c0, flags=flags@entry=0x7fffffffdad8,
        arena=arena@entry=0x7ffff74ed950) at Python/pythonrun.c:1240
    #250 0x00007ffff7df171b in pyrun_file (flags=0x7fffffffdad8, closeit=1, locals=0x7ffff74a03c0, globals=0x7ffff74a03c0, start=257, filename=0x7ffff73d3030, fp=0x555555559340) at Python/pythonrun.c:1138
    #251 pyrun_simple_file (flags=0x7fffffffdad8, closeit=1, filename=0x7ffff73d3030, fp=0x555555559340) at Python/pythonrun.c:449
    #252 PyRun_SimpleFileExFlags (fp=fp@entry=0x555555559340, filename=<optimized out>, closeit=closeit@entry=1, flags=flags@entry=0x7fffffffdad8) at Python/pythonrun.c:482
    #253 0x00007ffff7df1d1c in PyRun_AnyFileExFlags (fp=fp@entry=0x555555559340, filename=<optimized out>, closeit=closeit@entry=1, flags=flags@entry=0x7fffffffdad8) at Python/pythonrun.c:91
    #254 0x00007ffff7e10048 in pymain_run_file (cf=0x7fffffffdad8, config=0x55555555d8a0) at Modules/main.c:373
    #255 pymain_run_python (exitcode=0x7fffffffdacc) at Modules/main.c:598
    #256 Py_RunMain () at Modules/main.c:677
    #257 0x00007ffff7e104ea in pymain_main (args=0x7fffffffdbd0) at Modules/main.c:707
    #258 Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:731
    #259 0x00007ffff7a2d0b3 in __libc_start_main (main=0x555555555060 <main>, argc=4, argv=0x7fffffffdd38, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdd28)
        at ../csu/libc-start.c:308
    #260 0x000055555555509e in _start ()
    

    To Reproduce

    Steps to reproduce the behavior:

    1. Build attached test case (python 3 and pybind11 are required)
    2. Run gdb -ex r --args bash pytest -s tests/test_callback.py
    3. CTRL+C
    4. where

    Desktop (please complete the following information):

    • OS: Ubuntu 20.04 WSL2 (Linux 5.10.60.1-microsoft-standard-WSL2 SMP Wed Aug 25 23:20:18 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux)
    • Version 3.2.0
    opened by mydatamodels 4
  • Usage as git submodule with cmake

    Usage as git submodule with cmake

    What's the correct way to use Taskflow as a git submodule? I tried to use it in the following way:

    set(TF_BUILD_CUDA OFF CACHE BOOL "Enables build of CUDA code")
    set(TF_BUILD_TESTS OFF CACHE BOOL "Enables build of tests")
    set(TF_BUILD_EXAMPLES OFF CACHE BOOL "Enables build of examples")
    
    add_subdirectory(extern/taskflow)
    ...
    target_link_libraries(target tf::default_settings)
    

    However, this fails when compiling with the error:

    fatal error: taskflow/taskflow.hpp: No such file or directory
          4 | #include <taskflow/taskflow.hpp>
    

    I could not find any documentation on how to add taskflow as a submodule.

    Since Taskflow is header-only, I am currently working around this by using:

    set(THREADS_PREFER_PTHREAD_FLAG ON)
    find_package(Threads REQUIRED)
    ...
    target_include_directories(target PRIVATE extern/taskflow/)
    target_link_libraries(target Threads::Threads)
    
    opened by maxbachmann 0
  • Parallel prefix sum

    Parallel prefix sum

    An implementation of any modern parallel prefix sum algorithm would be a great addition to the library.

    tf::Task task1 = taskflow.prefix_sum_exclusive(
      first, last, [] (auto& i) -> uint { return first[i]; }    
    );
    tf::Task task2 = taskflow.prefix_sum_inclusive(
      first, last, [] (auto& i) -> uint { return first[i]; }    
    );
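
    As a hedged reference only (this is not a Taskflow API), the C++17 standard library scans show the semantics such an algorithm would need to reproduce; a parallel taskflow.prefix_sum_* would presumably match these outputs:

    // Sequential reference for the requested semantics, using the <numeric> scans.
    #include <iostream>
    #include <numeric>
    #include <vector>

    int main() {
      std::vector<unsigned> in {1, 2, 3, 4};
      std::vector<unsigned> inc(in.size()), exc(in.size());

      // inclusive: inc[i] = in[0] + ... + in[i]    ->  1 3 6 10
      std::inclusive_scan(in.begin(), in.end(), inc.begin());

      // exclusive: exc[i] = in[0] + ... + in[i-1]  ->  0 1 3 6 (0 is the initial value)
      std::exclusive_scan(in.begin(), in.end(), exc.begin(), 0u);

      for (auto v : inc) std::cout << v << ' ';
      std::cout << '\n';
      for (auto v : exc) std::cout << v << ' ';
      std::cout << '\n';
      return 0;
    }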
    
    opened by corporateshark 2
  • Doesn't respect the cmake find_package's QUIET keyword

    Doesn't respect the cmake find_package's QUIET keyword

    Describe the bug

    find_package(Taskflow REQUIRED QUIET) still prints:

    -- Taskflow found. Headers: /usr/local/include

    Version: 3.2.0 cmake-3.20.5 FreeBSD 13

    opened by yurivict 2
  • Taskflow is failing to run when compiled with cuda on windows

    Taskflow is failing to run when compiled with cuda on windows

    I tried compiling and running the following code. It fails at the Taskflow code segment when compiled for CUDA, but runs when compiled with g++. I am running this on Windows 10 with a GeForce RTX 2060 and CUDA 11.2.

    #include <taskflow/taskflow.hpp>
    #include <iostream>

    int main(int argc, const char** argv) {

      std::cout << "start\n";
      tf::Taskflow taskflow;
      tf::Executor executor;

      taskflow.emplace([](){
        std::cout << "running thread\n";
      });

      executor.run(taskflow).wait();

      std::cout << "finish\n";

      return 0;
    }
    

    g++ output

    PS C:\Users\Grant\Documents> .\test.exe
    start
    finish
    

    cuda output

    PS C:\Users\Grant\Documents> .\test.exe
    start
    
    opened by GPomare 9
  • Added for_each_index_nested ( do not merge yet )

    Added for_each_index_nested ( do not merge yet )

    Pull request for #328

    I've added for_each_index_nested; however, I cannot get the test case to work. Have a look at the test case in

    unittests/algorithm.cpp

    If I use just for_each_index, then the test case passes. If I use the implementation of for_each_index_nested as you reviewed yesterday, then there is an error. Would you please have a look and see what the problem may be?

    opened by bradphelan 0
  • How to create a dynamic subflow from within a for_each_index operation, i.e. nested for_each_index

    How to create a dynamic subflow from within a for_each_index operation, i.e. nested for_each_index

    The example given in the docs is

    taskflow.for_each_index(0, 100,  2, [](int i) { });  // 50 loops with a + step
    

    which represents

    for(size_t i = 0; i < 100; i += 2) {}
    

    But how do I write a nested loop, given that the callback for for_each_index does not receive a subflow object?

    for(size_t i = 0; i < 5; i += 1) {
        for(size_t j = 0; j < 100; j += 1) {
            someTask(i, j);
        }
    }
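
    A hedged sketch (not an officially documented pattern) of one way to get this nesting today: make the outer level a dynamic task whose callable takes tf::Subflow&, and run the inner for_each_index on a nested Subflow per outer iteration. someTask is only a stand-in for the user's function above:

    // Sketch: outer loop as a dynamic Subflow, inner loop as for_each_index.
    #include <taskflow/taskflow.hpp>
    #include <cstddef>
    #include <cstdio>

    void someTask(std::size_t i, std::size_t j) { std::printf("(%zu, %zu)\n", i, j); }

    int main() {
      tf::Executor executor;
      tf::Taskflow taskflow;

      taskflow.emplace([](tf::Subflow& sf) {
        for (std::size_t i = 0; i < 5; ++i) {
          sf.emplace([i](tf::Subflow& inner) {
            inner.for_each_index(std::size_t{0}, std::size_t{100}, std::size_t{1},
                                 [i](std::size_t j) { someTask(i, j); });
          });
        }
      });

      executor.run(taskflow).wait();
      return 0;
    }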
    
    enhancement 
    opened by bradphelan 9
  • Thread-agnostic scheduling

    Thread-agnostic scheduling

    This is a sketch of a solution to #303. I noticed that you guys are actively working on the same code I've modified here (for instance, dynamically calculating max_steals in Executor::_explore_task), and I've torn a lot of stuff up, so I don't actually expect this code to be merged, but hopefully it will provide some inspiration and generate some discussion.

    Unfortunately, the diff is pretty unusable because of the way I've shuffled the files around, so I'll summarize the main changes below (a minimal structural sketch follows the list):

    • class Executor was turned into a thin wrapper around a new base class, class TaskScheduler, which stores the previous contents of class Executor. Executor's job is now simply to store a std::vector<std::thread>, so that TaskScheduler no longer has to be explicitly tied to its executing threads.
    • TaskScheduler no longer has any explicit dependence on N. Instead, threads are registered as workers with TaskScheduler::register_worker(std::thread&), which configures the thread and adds a new worker to the vector of workers.
    • When all the threads are registered, TaskScheduler::_configure() is called to set up the things that were previously done in the constructor based on the number of workers N. I've used placement new to re-initialize the Notifier, which is pretty ugly; someone who understands the codebase better may have a better idea.
    • I also added a TaskScheduler::_shutdown() function, so that task execution can be halted before the wrapper class join()s its member threads.
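
    A minimal structural sketch of the split described above, using only the names mentioned in this PR description (it is not the actual diff; all work-stealing and notifier logic is elided and the bodies are stubs):

    // Hypothetical sketch of the described Executor/TaskScheduler split.
    #include <cstddef>
    #include <thread>
    #include <vector>

    class TaskScheduler {
     public:
      // Configure the given thread and add a worker slot for it.
      void register_worker(std::thread&) { ++num_workers_; }
      // Finish the setup previously done in Executor's constructor, now that
      // the number of workers is known (e.g. re-initialize the notifier).
      void _configure() {}
      // Halt task execution before the owning wrapper joins its threads.
      void _shutdown() {}
     protected:
      std::size_t num_workers_ = 0;
    };

    // Executor becomes a thin wrapper whose only job is to own the threads.
    class Executor : public TaskScheduler {
     public:
      explicit Executor(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
          threads_.emplace_back([] { /* worker loop would live here */ });
          register_worker(threads_.back());
        }
        _configure();
      }
      ~Executor() {
        _shutdown();
        for (auto& t : threads_) t.join();
      }
     private:
      std::vector<std::thread> threads_;
    };

    int main() { Executor ex(4); }  // spawns and joins four (idle) workers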

    The interface I've created for TaskScheduler doesn't actually have to be this way; it's just what I came up with on my way to turning the execution code into a usable base class.

    This doesn't solve some of the other common requests related to #303 such as allowing the main application thread to be a worker, or requiring certain tasks to execute on particular threads, or adding or removing threads during execution, but it's an important step in decoupling the details of the worker from the executor, and does solve the issues with over-threading.

    opened by z-adams 1
  • Explicit dynamic subgraphs cleanup

    Explicit dynamic subgraphs cleanup

    We develop a rendering application that recreates its dynamic Subflow graph every frame. The graph is big enough (1000+ nodes) that cleaning it up before rebuilding takes a lot of time (see Figure 1). Is it possible to make the cleanup procedure explicit? In that case we could run the graph cleanup in parallel with other work (see Figure 2). Thank you.

    [image]

    enhancement 
    opened by EugeneGribovich 2
  • Better / more control over threads

    Better / more control over threads

    Related to: https://github.com/taskflow/taskflow/issues/191, https://github.com/taskflow/taskflow/issues/164, https://github.com/taskflow/taskflow/issues/190, https://github.com/taskflow/taskflow/issues/204

    I want better control over the threads used to execute Taskflow tasks.

    In my situation, I want to be able to call tf::WorkerThread::run(executor) in any arbitrary thread (but not inside the call stack of a previous call to tf::WorkerThread::run()!) to make the current thread execute jobs. The call to run() would only return when either the executor is shutting down or tf::WorkerThread::cancel() is called with the current thread's worker id. A rough mock of this call pattern is sketched after the list below.

    This would address the following situations:

    • Resources that can only be accessed / managed from a thread that is not owned by tf::Executor, such as OpenGL resources or other 3rd-party frameworks.
    • Micromanaging properties of the threads for an executor, such as assigning OS scheduler priorities or NUMA nodes, security considerations such as setting system call restrictions on the worker threads, and whatever else someone might want to do.
    • Multitasking systems where the available resources for a particular task flow need to change in response to external constraints, such as a new job being submitted that has a higher priority. The job manager could remove one or more threads from the current task flow, attach them to the new job's task flow instead, and migrate them back when the new job is finished.
    • The "single threaded" mode of execution, where a task flow is created, and then used in the same thread it was created in. Previously this was an executor with 0 threads.
    • Makes it easier to avoid over-threading an application. While it's true that using a few more threads than the CPU supports can provide slightly higher density than running exactly numcpu threads, because some threads occasionally block on system calls and such, the current model of tf::Executor makes it far too easy to overcommit by creating multiple different executors and using them concurrently with each other. Similarly bad is creating new tf::Executors as new task flows are constructed, executing the taskflow, and then shutting down the executor, which makes the end user pay the cost of spawning new threads every time.
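
    A rough, purely hypothetical mock of the call pattern requested above (tf::WorkerThread does not exist in Taskflow, and worker ids are reduced to a single release flag here); it only illustrates a thread donating itself to an executor and being released again:

    // Hypothetical sketch only: real task execution and work stealing are elided.
    #include <atomic>
    #include <chrono>
    #include <iostream>
    #include <thread>

    struct MockExecutor {
      std::atomic<bool> release{false};   // set when cancel() is called
    };

    struct WorkerThread {
      // Blocks and makes the calling thread a worker until cancel() releases it.
      static void run(MockExecutor& ex) {
        while (!ex.release.load()) {
          // ... a real implementation would steal and execute tasks here ...
          std::this_thread::sleep_for(std::chrono::milliseconds(5));
        }
      }
      static void cancel(MockExecutor& ex) { ex.release.store(true); }
    };

    int main() {
      MockExecutor ex;
      std::thread app_thread([&] { WorkerThread::run(ex); });  // any thread can become a worker
      std::this_thread::sleep_for(std::chrono::milliseconds(25));
      WorkerThread::cancel(ex);                                 // hand the thread back to the app
      app_thread.join();
      std::cout << "worker thread released\n";
      return 0;
    }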

    Looking at the code for Executor, I don't see this as being a very large change internally, and the current model could continue being supported as well, with the same API, on top of a base class that provides "bring your own threads" support.

    The way threads are managed now is to simply spawn new std::threads when the tf::Executor is constructed and have them call the _exploit_task function. While I do see occasional implicit reliance on the N variable to indicate the number of worker threads that exist, that's simple enough to replace with a call to workers.size().

    The proposed change is to make a base class for the tf::Executor that holds the work-stealing logic and communication primitives. Then the current tf::Executor spawns its threads as it currently does and calls the base class's "add this thread to the list" functionality. The current executor doesn't need to support changing the number of threads dynamically if it doesn't want to, but the whole point of proposing this base class is to allow threads to be added and removed while a task flow is executing.

    Prior art for this approach would be Boost.Asio, which entirely lacks the concept of a worker pool and requires the end user to configure threads in their desired way, then execute the boost::asio::run() function. Boost.Asio supports dynamically adding and removing threads from its executor at any time in the program's life.

    enhancement 
    opened by jonesmz 2