EnkiTS - A permissively licensed C and C++ Task Scheduler for creating parallel programs. Requires C++11 support.

Overview

Support development of enkiTS through Github Sponsors or Patreon

enkiTS

enki Task Scheduler

A permissively licensed C and C++ Task Scheduler for creating parallel programs. Requires C++11 support.

The primary goal of enkiTS is to help developers create programs which handle both data and task level parallelism to utilize the full performance of multicore CPUs, whilst being lightweight (only a small amount of code) and easy to use.

enkiTS was developed for, and is used in enkisoftware's Avoyd codebase.

Platforms

  • Windows, Linux, Mac OS, Android (should work on iOS)
  • x64 & x86, ARM

enkiTS is primarily developed on x64 and x86 Intel architectures on MS Windows, with well tested support for Linux and somewhat less frequently tested support on Mac OS and ARM Android.

Examples

Several examples exist in the example folder.

For further examples, see https://github.com/dougbinks/enkiTSExamples

Building

Building enkiTS is simple: just add the files in enkiTS/src to your build system (the _c.* files can be ignored if you only need the C++ interface), and add enkiTS/src to your include path. Unix / Linux builds will likely require the pthreads library.

For C++

  • Use #include "TaskScheduler.h"
  • Add enkiTS/src to your include path
  • Compile / Add to project:
    • TaskScheduler.cpp
  • Unix / Linux builds will likely require the pthreads library.

For C

  • Use #include "TaskScheduler_c.h"
  • Add enkiTS/src to your include path
  • Compile / Add to project:
    • TaskScheduler.cpp
    • TaskScheduler_c.cpp
  • Unix / Linux builds will likely require the pthreads library.

For cmake, on Windows / Mac OS X / Linux with cmake installed, open a prompt in the enkiTS directory and:

  1. mkdir build
  2. cd build
  3. cmake ..
  4. either run make all or for Visual Studio open enkiTS.sln

Project Features

  1. Lightweight - enkiTS is designed to be lean so you can use it anywhere easily, and understand it.
  2. Fast, then scalable - enkiTS is designed for consumer devices first, so performance on a low number of threads is important, followed by scalability.
  3. Braided parallelism - enkiTS can issue tasks from another task as well as from the thread which created the Task System, and has a simple task interface for both data parallel and task parallelism.
  4. Up-front Allocation friendly - enkiTS is designed for zero allocations during scheduling.
  5. Can pin tasks to a given thread - enkiTS can schedule a task which will only be run on the specified thread.
  6. Can set task priorities - Up to 5 task priorities can be configured via define ENKITS_TASK_PRIORITIES_NUM (defaults to 3). Higher priority tasks are run before lower priority ones.
  7. Can register external threads to use with enkiTS - configure enkiTS with numExternalTaskThreads and register the threads to use with the enkiTS API.
  8. Custom allocator API - can configure enkiTS with custom allocators, see example/CustomAllocator.cpp and example/CustomAllocator_c.c.
  9. Dependencies - can set dependencies between tasks, see example/Dependencies.cpp and example/Dependencies_c.c.
  10. Completion Actions - can perform an action on task completion. This avoids the overhead of adding another task to the scheduler, and can be used to safely delete a completed task. See example/CompletionAction.cpp and example/CompletionAction_c.c.
  11. NEW Can wait for pinned tasks - useful for creating IO threads which do no other work. See example/WaitForPinnedTasks.cpp and example/WaitForPinnedTasks_c.c.

Using enkiTS

C++ usage

#include "TaskScheduler.h"

enki::TaskScheduler g_TS;

// define a task set, can ignore range if we only do one thing
struct ParallelTaskSet : enki::ITaskSet {
    void ExecuteRange(  enki::TaskSetPartition range_, uint32_t threadnum_ ) override {
        // do something here, can issue tasks with g_TS
    }
};

int main(int argc, const char * argv[]) {
    g_TS.Initialize();
    ParallelTaskSet task; // default constructor has a set size of 1
    g_TS.AddTaskSetToPipe( &task );

    // wait for task set (running tasks if they exist)
    // since we've just added it and it has no range we'll likely run it.
    g_TS.WaitforTask( &task );
    return 0;
}
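
Where the work is data parallel, set m_SetSize and use the range passed to ExecuteRange: the scheduler splits 0..m_SetSize into partitions and calls ExecuteRange once per partition, potentially on several threads at once. A minimal sketch (the task name and output array are illustrative, not from the enkiTS examples; range_.start / range_.end follow the TaskSetPartition usage in the repository's ParallelSum example):

#include "TaskScheduler.h"

enki::TaskScheduler g_TS;

// a task set which fills an array in parallel, one sub-range per ExecuteRange call
struct FillSquaresTask : enki::ITaskSet {
    uint32_t* m_pOutput = nullptr;

    FillSquaresTask( uint32_t* pOutput_, uint32_t size_ ) : m_pOutput( pOutput_ ) { m_SetSize = size_; }

    void ExecuteRange( enki::TaskSetPartition range_, uint32_t threadnum_ ) override {
        // each call receives a sub-range [range_.start, range_.end) of 0..m_SetSize
        for( uint32_t i = range_.start; i < range_.end; ++i ) {
            m_pOutput[ i ] = i * i;
        }
    }
};

int main(int argc, const char * argv[]) {
    g_TS.Initialize();
    static uint32_t output[ 1024 ];
    FillSquaresTask task( output, 1024 );
    g_TS.AddTaskSetToPipe( &task ); // partitions 0..1023 across the task threads
    g_TS.WaitforTask( &task );
    return 0;
}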

C++ 11 lambda usage

#include "TaskScheduler.h"

enki::TaskScheduler g_TS;

int main(int argc, const char * argv[]) {
   g_TS.Initialize();

   enki::TaskSet task( 1, []( enki::TaskSetPartition range_, uint32_t threadnum_  ) {
         // do something here
      }  );

   g_TS.AddTaskSetToPipe( &task );
   g_TS.WaitforTask( &task );
   return 0;
}
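
The lambda form handles data-parallel work in the same way: pass a set size as the first argument and capture the data to operate on. A brief sketch (the vector and its size are illustrative):

#include "TaskScheduler.h"

#include <vector>

enki::TaskScheduler g_TS;

int main(int argc, const char * argv[]) {
   g_TS.Initialize();

   std::vector<int> data( 1024, 1 );

   // first argument is the set size; the lambda is called once per partition of 0..1023
   enki::TaskSet task( (uint32_t)data.size(), [&data]( enki::TaskSetPartition range_, uint32_t threadnum_ ) {
         for( uint32_t i = range_.start; i < range_.end; ++i ) {
            data[i] *= 2;
         }
      }  );

   g_TS.AddTaskSetToPipe( &task );
   g_TS.WaitforTask( &task );
   return 0;
}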

Task priorities usage in C++

// See full example in Priorities.cpp
#include "TaskScheduler.h"

enki::TaskScheduler g_TS;

struct ExampleTask : enki::ITaskSet
{
    ExampleTask( uint32_t size_ ) { m_SetSize = size_; }

    void ExecuteRange(  enki::TaskSetPartition range_, uint32_t threadnum_ ) override {
        // See full example in Priorities.cpp
    }
};


// This example demonstrates how to run a long running task alongside tasks
// which must complete as early as possible using priorities.
int main(int argc, const char * argv[])
{
    g_TS.Initialize();

    ExampleTask lowPriorityTask( 10 );
    lowPriorityTask.m_Priority  = enki::TASK_PRIORITY_LOW;

    ExampleTask highPriorityTask( 1 );
    highPriorityTask.m_Priority = enki::TASK_PRIORITY_HIGH;

    g_TS.AddTaskSetToPipe( &lowPriorityTask );
    for( int task = 0; task < 10; ++task )
    {
        // run high priority tasks
        g_TS.AddTaskSetToPipe( &highPriorityTask );

        // wait for task but only run tasks of the same priority or higher on this thread
        g_TS.WaitforTask( &highPriorityTask, highPriorityTask.m_Priority );
    }
    // wait for low priority task, run any tasks on this thread whilst waiting
    g_TS.WaitforTask( &lowPriorityTask );

    return 0;
}

Pinned Tasks usage in C++

#include "TaskScheduler.h"

enki::TaskScheduler g_TS;

// define a task set, can ignore range if we only do one thing
struct PinnedTask : enki::IPinnedTask {
    void Execute() override {
      // do something here, can issue tasks with g_TS
    }
};

int main(int argc, const char * argv[]) {
    g_TS.Initialize();
    PinnedTask task; //default constructor sets thread for pinned task to 0 (main thread)
    g_TS.AddPinnedTask( &task );

    // RunPinnedTasks must be called on main thread to run any pinned tasks for that thread.
    // Tasking threads automatically do this in their task loop.
    g_TS.RunPinnedTasks();

    // wait for task set (running tasks if they exist)
    // since we've just added it and it has no range we'll likely run it.
    g_TS.WaitforTask( &task );
    return 0;
}
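
A pinned task is not limited to the main thread: setting threadNum before adding the task pins it to that task thread, whose task loop runs it without an explicit RunPinnedTasks() call on your part. A brief sketch (the task name is illustrative):

#include "TaskScheduler.h"

enki::TaskScheduler g_TS;

struct PinnedToWorkerTask : enki::IPinnedTask {
    void Execute() override {
        // runs only on the thread selected below
    }
};

int main(int argc, const char * argv[]) {
    g_TS.Initialize();
    PinnedToWorkerTask task;
    task.threadNum = g_TS.GetNumTaskThreads() - 1; // pin to the last task thread
    g_TS.AddPinnedTask( &task );

    // harmless here; covers the case where the scheduler has only the main thread (thread 0)
    g_TS.RunPinnedTasks();

    g_TS.WaitforTask( &task ); // the chosen task thread runs it in its task loop
    return 0;
}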

Dependency usage in C++

#include "TaskScheduler.h"

enki::TaskScheduler g_TS;

// define a task set, can ignore range if we only do one thing
struct TaskA : enki::ITaskSet {
    void ExecuteRange(  enki::TaskSetPartition range_, uint32_t threadnum_ ) override {
        // do something here, can issue tasks with g_TS
    }
};

struct TaskB : enki::ITaskSet {
    enki::Dependency m_Dependency;
    void ExecuteRange(  enki::TaskSetPartition range_, uint32_t threadnum_ ) override {
        // do something here, can issue tasks with g_TS
    }
};

int main(int argc, const char * argv[]) {
    g_TS.Initialize();
    
    // set dependencies once (can set more than one if needed).
    TaskA taskA;
    TaskB taskB;
    taskB.SetDependency( taskB.m_Dependency, &taskA );

    g_TS.AddTaskSetToPipe( &taskA ); // add first task
    g_TS.WaitforTask( &taskB );      // wait for last
    return 0;
}
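
More than one dependency can be set on a task; each one needs its own enki::Dependency member. A short sketch (TaskC and its member names are illustrative; see example/Dependencies.cpp for the library's own example):

#include "TaskScheduler.h"

enki::TaskScheduler g_TS;

struct TaskA : enki::ITaskSet {
    void ExecuteRange(  enki::TaskSetPartition range_, uint32_t threadnum_ ) override {}
};

struct TaskB : enki::ITaskSet {
    void ExecuteRange(  enki::TaskSetPartition range_, uint32_t threadnum_ ) override {}
};

// TaskC is scheduled automatically once both TaskA and TaskB have completed
struct TaskC : enki::ITaskSet {
    enki::Dependency m_DependencyOnA;
    enki::Dependency m_DependencyOnB;
    void ExecuteRange(  enki::TaskSetPartition range_, uint32_t threadnum_ ) override {}
};

int main(int argc, const char * argv[]) {
    g_TS.Initialize();

    TaskA taskA;
    TaskB taskB;
    TaskC taskC;
    taskC.SetDependency( taskC.m_DependencyOnA, &taskA );
    taskC.SetDependency( taskC.m_DependencyOnB, &taskB );

    g_TS.AddTaskSetToPipe( &taskA ); // add the tasks which have no dependencies
    g_TS.AddTaskSetToPipe( &taskB );
    g_TS.WaitforTask( &taskC );      // wait for the end of the graph
    return 0;
}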

External task thread usage in C++

#include "TaskScheduler.h"

enki::TaskScheduler g_TS;
struct ParallelTaskSet : enki::ITaskSet
{
    void ExecuteRange(  enki::TaskSetPartition range_, uint32_t threadnum_ ) override {
        // Do something
    }
};

void threadFunction()
{
    g_TS.RegisterExternalTaskThread();

    // sleep for a while instead of doing something such as file IO
    std::this_thread::sleep_for( std::chrono::milliseconds( 100 ) );

    ParallelTaskSet task;
    g_TS.AddTaskSetToPipe( &task );
    g_TS.WaitforTask( &task);

    g_TS.DeRegisterExternalTaskThread();
}

int main(int argc, const char * argv[])
{
    enki::TaskSchedulerConfig config;
    config.numExternalTaskThreads = 1; // we have one extra external thread

    g_TS.Initialize( config );

    std::thread exampleThread( threadFunction );

    exampleThread.join();

    return 0;
}

WaitForPinnedTasks thread usage in C++ (useful for IO threads)

#include "TaskScheduler.h"

enki::TaskScheduler g_TS;

struct RunPinnedTaskLoopTask : enki::IPinnedTask
{
    void Execute() override
    {
        while( g_TS.GetIsRunning() )
        {
            g_TS.WaitForNewPinnedTasks(); // this thread will 'sleep' until there are new pinned tasks
            g_TS.RunPinnedTasks();
        }
    }
};

struct PretendDoFileIO : enki::IPinnedTask
{
    void Execute() override
    {
        // Do file IO
    }
};

int main(int argc, const char * argv[])
{
    enki::TaskSchedulerConfig config;

    // In this example we create more threads than the hardware can run,
    // because the IO thread will spend most of its time idle or blocked
    // and therefore not scheduled for CPU time by the OS
    config.numTaskThreadsToCreate += 1;

    g_TS.Initialize( config );

    // in this example we place our IO threads at the end
    RunPinnedTaskLoopTask runPinnedTaskLoopTasks;
    runPinnedTaskLoopTasks.threadNum = g_TS.GetNumTaskThreads() - 1;
    g_TS.AddPinnedTask( &runPinnedTaskLoopTasks );

    // Send pretend file IO task to external thread FILE_IO
    PretendDoFileIO pretendDoFileIO;
    pretendDoFileIO.threadNum = runPinnedTaskLoopTasks.threadNum;
    g_TS.AddPinnedTask( &pretendDoFileIO );

    // ensure runPinnedTaskLoopTasks complete by explicitly calling shutdown
    g_TS.WaitforAllAndShutdown();

    return 0;
}

Bindings

Deprecated

The C++98 compatible branch has been deprecated as I'm not aware of anyone needing it.

The user thread versions are no longer being maintained as they are no longer in use. Similar functionality can be obtained with the external task threads feature (numExternalTaskThreads).

Projects using enkiTS

Avoyd

Avoyd is an abstract 6 degrees of freedom voxel game. enkiTS was developed for use in our in-house engine powering Avoyd.

Avoyd screenshot

Imogen

GPU/CPU Texture Generator

Imogen screenshot

ToyPathTracer

Aras Pranckevičius' code for his series on Daily Path Tracer experiments with various languages.

ToyPathTracer screenshot.

License (zlib)

Copyright (c) 2013-2020 Doug Binks

This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software.

Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgement in the product documentation would be appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.
Comments
  • question : support sleep-waiting ?

    Hi, it looks like all the WaitFor* methods in enkiTS are busy waiting. In some situations they will result in long spinning times. So what is the preferred/proper way to use sleep-waiting for task completion in enkiTS?

    feature 
    opened by zhaijialong 15
  • Feature: custom allocators

    Whilst enkiTS only allocates at initialization time, a custom allocator would be useful to some users for tracking memory consumption.

    https://twitter.com/serhii_rieznik/status/1187011358220541952

    feature 
    opened by dougbinks 14
  • Pinned task problem

    I think there is a problem with TaskScheduler::WakeThreadsForNewTasks() and pinned tasks.

    Consider a possible case: what if the number of suspended threads - those waiting for the m_pNewTaskSemaphore to be signalled - increases just before SemaphoreSignal() is called, i.e. the value of waiting was not accurate because some threads fell asleep between the check and the signal. In this case some task threads would idle. Most of the time it's not a big deal, as those threads would awake when the next task arrives. (Not sure if it's possible, but even if we are so unlucky that all the task threads fall asleep just after the check, the calling thread would handle the task itself.)

    Now, when using AddPinnedTask() there's a subtle chance that the thread we pinned the task to was suspended as described above:

    • [User thread] calls AddPinnedTask().
    • [Task thread] falls asleep just after the m_NumThreadsWaitingForNewTasks check.
    • The semaphore is either not being released at all, or it awakens some threads but not the desired one.
    • [User thread] calls WaitforTask() and hangs as the thread the task is pinned to can't handle the request.

    This is the problem I ran into while trying to port my code to enkiTS. Though to be honest I'm not quite sure if it is indeed the case and if my assumption is accurate. Parallel programming is hard.

    bug confirmed 
    opened by Vuhdo 12
  • Taskset dependencies

    May I ask if you are planning on adding events in the near future? If you are, I would really like to hear about how you plan on designing them. If not, I would be interested in adding them as a PR.

    My motivation is: I am designing a completely asynchronous image decompressor, so I need to connect up different task sets with events, so that the completion of one task set causes a waiting set to enqueue itself.

    Interested to hear your plans for this feature.

    feature 
    opened by boxerab 12
  • Crash in Android 11 beta

    This is almost certainly an issue on Android's side, not yours, but I wanted to at least bring things to attention. In enki::DefaultAllocFunc(), non-Win32 programs use posix_memalign(). Our 64-bit Android app is segfaulting at launch, and it seems that the call to posix_memalign() is involved in whatever's going wrong. If we replace that with a call to plain malloc(), our app carries along running "fine". By "fine", I mean this isn't significantly tested or shipped beyond my local build going from segfault-at-launch to looking like all's well.

    Like I said, probably an Android 11 beta issue-- which I'm testing on a Pixel phone-- but at least wanted to make sure you were aware.

    Edit: correction/clarification. It's not the call to posix_memalign() itself that's segfaulting. When TaskScheduler::StartThreads() runs m_pTaskCompleteSemaphore = SemaphoreNew()-- the second call to SemaphoreNew()-- the resulting placement-new in TaskScheduler::New() is what's segfaulting. Even though posix_memalign() returned a "success" error code of 0, and the pointer to memory is non-null.

    ARM Android 
    opened by MSFT-Chris-Barrett 11
  • Valgrind errors on OSX High Sierra

    Hi Doug, I ran my image codec, grok, which uses enkiTS, on OSX with valgrind, and I see this error:

    ==3380== Process terminating with default action of signal 11 (SIGSEGV)
    ==3380==  Access not within mapped region at address 0x18
    ==3380==    at 0x100CC05BA: _pthread_body (in /usr/lib/system/libsystem_pthread.dylib)
    ==3380==    by 0x100CC050C: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
    ==3380==    by 0x100CBFBF8: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
    ==3380==  If you believe this happened as a result of a stack
    ==3380==  overflow in your program's main thread (unlikely but
    ==3380==  possible), you can try to increase the size of the
    ==3380==  main thread stack using the --main-stacksize= flag.
    ==3380==  The main thread stack size used in this run was 8388608.
    

    This is for an earlier version of enkiTS, as the latest version will result in BAD_ACCESS error and my program crashes.

    Have you run any valgrind tests on OSX ? Everything looks good on Linux.

    Thanks.

    bug OSX 
    opened by boxerab 11
  • Occasional performance spikes in SetEvent

    Using Microprofile on Windows, I noticed that SetEvent can occasionally take longer than usual. enkiTS calls it to wake up the worker threads in AddTaskSetToPipe. SetEvent usually blocks for less than 1ms, but I've seen spikes way up in the 20s of ms, which is a problem if the game waits on a task which is blocked by AddTaskSetToPipe. It ends up producing noticeable frame spikes every few seconds. By changing the Event Object to auto-reset in EventCreate (https://msdn.microsoft.com/en-us/library/windows/desktop/ms682655(v=vs.85).aspx) these spikes disappear. However, this will only wake up 1 thread at a time, and may decrease thread utilization. I could also avoid the issue by increasing the spin count, however this of course increases power consumption. This is probably not a big issue if the threads rarely wait, so it depends on the workload of the scheduler, as well as the number of cores in use. Maybe auto-reset mode should be an option?

    bug Windows 
    opened by jspohr 11
  • Feature suggestion: running tasks from non main/task threads

    Hi Doug,

    enkiTS does not allow running tasks or waiting for completion from threads other than main/task threads, as I understood.

    For example, I would like to be able to use the system from rendering thread, which itself is not a task thread, but a full-fledged thread typically running in parallel with the main one. Or from background loading thread which is mostly idle waiting for IO, but uses tasks to decompress/finalize assets. It could be cool if enkiTS was able to support that. What do you think? Thanks!

    -- Aleksei

    feature 
    opened by Vuhdo 10
  • ThreadSanitizer reports

    I have been testing the latest master (4f9941b). ThreadSanitizer, enabled under Xcode 11.0, is reporting some data races when running unmodified samples.

    I am reporting the output of one Data race report for ParallelSum as an example.

    ==================
    WARNING: ThreadSanitizer: data race (pid=56086)
      Read of size 4 at 0x7ffeefbff49c by thread T4:
        #0 enki::TaskScheduler::TryRunTask(unsigned int, unsigned int, unsigned int&) TaskScheduler.cpp:412 (ParallelSum:x86_64+0x100007204)
        #1 enki::TaskScheduler::TryRunTask(unsigned int, unsigned int&) TaskScheduler.cpp:377 (ParallelSum:x86_64+0x1000050d0)
        #2 enki::TaskScheduler::TaskingThreadFunction(enki::ThreadArgs const&) TaskScheduler.cpp:236 (ParallelSum:x86_64+0x100004e04)
        #3 decltype(std::__1::forward<void (*)(enki::ThreadArgs const&)>(fp)(std::__1::forward<enki::ThreadArgs>(fp0))) std::__1::__invoke<void (*)(enki::ThreadArgs const&), enki::ThreadArgs>(void (*&&)(enki::ThreadArgs const&), enki::ThreadArgs&&) type_traits:4361 (ParallelSum:x86_64+0x10000d06d)
        #4 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(enki::ThreadArgs const&), enki::ThreadArgs, 2ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(enki::ThreadArgs const&), enki::ThreadArgs>&, std::__1::__tuple_indices<2ul>) thread:342 (ParallelSum:x86_64+0x10000ceb1)
        #5 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(enki::ThreadArgs const&), enki::ThreadArgs> >(void*) thread:352 (ParallelSum:x86_64+0x10000bf09)
    
      Previous write of size 4 at 0x7ffeefbff49c by main thread:
        #0 enki::ITaskSet::ITaskSet() TaskScheduler.h:122 (ParallelSum:x86_64+0x100003288)
        #1 ParallelReductionSumTaskSet::ParallelReductionSumTaskSet(unsigned int) ParallelSum.cpp:81 (ParallelSum:x86_64+0x100003ba8)
        #2 ParallelReductionSumTaskSet::ParallelReductionSumTaskSet(unsigned int) ParallelSum.cpp:82 (ParallelSum:x86_64+0x100002e04)
        #3 main ParallelSum.cpp:146 (ParallelSum:x86_64+0x100002390)
    
      Location is stack of main thread.
    
      Thread T4 (tid=3398714, running) created by main thread at:
        #0 pthread_create <null>:2673040 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x2aa2d)
        #1 std::__1::__libcpp_thread_create(_opaque_pthread_t**, void* (*)(void*), void*) __threading_support:328 (ParallelSum:x86_64+0x10000be4e)
        #2 std::__1::thread::thread<void (&)(enki::ThreadArgs const&), enki::ThreadArgs, void>(void (&&&)(enki::ThreadArgs const&), enki::ThreadArgs&&) thread:368 (ParallelSum:x86_64+0x10000ba71)
        #3 std::__1::thread::thread<void (&)(enki::ThreadArgs const&), enki::ThreadArgs, void>(void (&&&)(enki::ThreadArgs const&), enki::ThreadArgs&&) thread:360 (ParallelSum:x86_64+0x100006238)
        #4 enki::TaskScheduler::StartThreads() TaskScheduler.cpp:298 (ParallelSum:x86_64+0x100005901)
        #5 enki::TaskScheduler::Initialize(unsigned int) TaskScheduler.cpp:924 (ParallelSum:x86_64+0x10000a687)
        #6 main ParallelSum.cpp:136 (ParallelSum:x86_64+0x10000231a)
    
    SUMMARY: ThreadSanitizer: data race TaskScheduler.cpp:412 in enki::TaskScheduler::TryRunTask(unsigned int, unsigned int, unsigned int&)
    ==================
    ThreadSanitizer report breakpoint hit. Use 'thread info -s' to get extended information about the report.
    
    

    This is reporting that reading subTask.pTask->m_RangeToRun is a data race

    
    bool TaskScheduler::TryRunTask( uint32_t threadNum_, uint32_t priority_, uint32_t& hintPipeToCheck_io_ )
    {
        // ...
        if( subTask.pTask->m_RangeToRun < partitionSize )
        {
            SubTaskSet taskToRun = SplitTask( subTask, subTask.pTask->m_RangeToRun );
            // ...
        }
        // ...
    }
    
    
    

    When declaring ParallelSumTaskSet m_ParallelSumTaskSet; inside struct ParallelReductionSumTaskSet

    struct ParallelReductionSumTaskSet : ITaskSet
    {
        ParallelSumTaskSet m_ParallelSumTaskSet;
        uint64_t m_FinalSum;
    
        ParallelReductionSumTaskSet( uint32_t size_ ) : m_ParallelSumTaskSet( size_ ), m_FinalSum(0)
        {
                m_ParallelSumTaskSet.Init( g_TS.GetNumTaskThreads() );
        }
    
        virtual void    ExecuteRange( TaskSetPartition range_, uint32_t threadnum_ )
        {
            g_TS.AddTaskSetToPipe( &m_ParallelSumTaskSet );
            g_TS.WaitforTask( &m_ParallelSumTaskSet );
    
            for( uint32_t i = 0; i < m_ParallelSumTaskSet.m_NumPartialSums; ++i )
            {
                m_FinalSum += m_ParallelSumTaskSet.m_pPartialSums[i].count;
            }
        }
    };
    
    

    will initialize ParallelSumTaskSet::m_RangeToRun in the constructor:

        class ITaskSet : public ICompletable
        {
        public:
            ITaskSet()
                : m_SetSize(1)
                , m_MinRange(1)
                , m_RangeToRun(1)
            {}
        };
    

    I am not an expert in the field, but it looks like a potential false positive, because TryRunTask is executed only after AddTaskSetToPipe.

    I try to keep our software clean from all sanitizer reports, so that I can catch real bugs ;) For this reason, if this or other reports look safe, I suggest adding annotations that disable TSAN where appropriate (using no_sanitize("thread")).

    Does it make sense for me to report all data races found by the TSAN output?

    Oh, and thanks for your excellent work on the library :)

    not a bug 
    opened by Pagghiu 9
  • Feature request: check return codes for semaphore system calls

    I noticed that the return codes for semaphore creation etc. aren't being checked for errors. So, creation may fail and the caller doesn't get notified. What do you think is the best way of handling error conditions: throw an exception, or return false? Thanks!

    bug Linux 
    opened by boxerab 8
  • Completion of error handling

    opened by elfring 8
  • Q: Manual partitioning

    enkiTS automatically partitions workload into ranges.

    What is a recommended way to manually partition workload and submit these tasks to the scheduler? So, I want to create a list of tasks with manually specified (start, end) range.

    There are two applications:

    • user defined task splitting
    • creating tasks over 2D/3D domain

    Any suggestions are greatly appreciated.

    feature question 
    opened by ib00 6
  • Scheduling tasks with high priority after-the-fact

    This may be out of scope for the library, or potentially I've missed a way to do it. What I would like to be able to do is add a number of tasks with no particular priority and have them be scheduled in parallel in no particular order, but I would want the ability to immediately run one if I discover that it's particularly needed soon (while allowing others to still be scheduled as/when).

    The sketched idea is to have the main thread push tasks for a bunch of deferred operations as it's processing something, but then if the results of one of those deferred operations is needed it wants to either immediately run that task on the main thread or spinloop if it's already been scheduled. The tasks can come in any order so you could push 100 different tasks and then want the results from number 50 before continuing, so waiting for 1-49 would not be ideal as then the main thread can't continue processing in parallel with them.

    As far as I can tell unless I'm misunderstanding the code, WaitForTask does not prioritise the task you pass in to wait on but it looks at all tasks at equal priority. I don't see a way to elevate the priority of a task after it has been created either. I also wondered if the way to solve this would be to add a new task with a dependency on the one I want to run, but at least from a layperson's eye that doesn't seem like it would change the scheduling order.

    Is there a way to do this, and/or is it something you'd be interested in supporting? In principle this could be refactored such that there is no main thread needing to sync, and everything including its processing becomes task-based so this translates into dependency management, but that's further than I'd want to go.

    feature question 
    opened by baldurk 5
  • Running tasks via WaitForTask(NULL)

    I want to be able to say "wait up to N microseconds on the current thread for a task to be executable then run at most one task", and optionally repeat that afterwards with a smaller N, as a way of better using time than a usleep(N) on my main thread in between doing other high priority work. The docs for TaskScheduler::WaitForTask say:

    if called with 0 it will try to run tasks, and return if none available.

    And assuming 0 means NULL then that seems pretty much like what I want, however I'm not sure what guarantees there are on how much work this does. Will it run only one task or multiple? Will it keep going until no more are ready?

    If this contractually only runs one task at most then I think that would pretty much do what I want, though I would need to implement the timeout myself with an external spinloop. I'm not sure if that's much less optimal than if you could do a timeout internally. I also don't see a way to tell if this actually ran a task, to be able to run at most one within the timeframe. That would be nice but isn't necessary.

    Wall of text if desired with more context on what I'm trying to do to avoid the XY problem

    I'm sketching a design where I would have a main thread which adds N tasks, and then be solely responsible for picking up special work from the tasks that needs to go to and from the GPU via a single controlling thread, running it and reading back the results, and adding follow-up tasks to process the results. Normally this would just have a sleep while waiting for results as there's a fair amount of latency in going to the GPU and getting results back, so I want to be able to run tasks on the CPU in the meantime on the main thread without being late to pick up results from the GPU.

    My ideal design then would be for the main thread to have a loop whereby it checks to see if there's GPU work to process, and then if there's nothing to do it runs CPU tasks for a bit before checking again. The key is I would really only want it to run a bounded amount of CPU work before returning to check on the GPU again, to have guarantees on how frequently I will check for GPU work again.

    From what I can see there's a few options:

    1. Keep all of the GPU work out of enkiTS entirely, have my outer loop that looks for GPU work but instead of sleeping when there's nothing to do I instead call WaitforTask(NULL). Hence the above question :smile:

    2. Rely on the OS scheduler instead of enkiTS's scheduler. Let the main thread do a loop and do a genuine OS sleep in between GPU work, relying on the thread being kicked off the hardware core and add one more task thread to be scheduled at the same time. I'm worried that this could impact the main thread though as now the hardware threads could be oversubscribed, and it depends on when the OS scheduler decides to schedule the main thread again.

    3. Set a high priority pinned task on the main thread, and each time it completes it queues a new pinned task on the main thread to be re-run again. The main thread just does WaitForAll or similar and relies on the enkiTS scheduler to sort it out. I think this might technically work but I have a feeling it's going to boil down to a spinloop, since the GPU task will keep being higher priority and I don't think there's anything to stop it getting scheduled again as soon as a new task is added. I don't see a way to represent that sleep/delay before being scheduled again (at least without some explicit dependency on a task in between which I don't have).

    4. Have the tasks submitting the GPU work add a new high priority pinned task whenever they add new work, but the challenge here is if the GPU isn't ready I don't want the pinned task to be blocking if there's more CPU work to do - so something needs to get it to check again and I'm not guaranteed any more tasks will submit new GPU work after that.

    I may also be missing an obviously better way to do this!

    feature question 
    opened by baldurk 2
Releases(v1.11)
  • v1.11(Mar 3, 2022)

    New Features

    This release adds a breaking API feature to the C interface: pre-completion and post-completion functions. The C++ interface is unchanged.

    A pre-completion function is called before the completion action is 'complete', which means it runs prior to dependent tasks being run. This function can thus alter any task arguments of the dependencies.

    A post-completion function is called after the completion action is 'complete'. Dependent tasks may have already been started. This function can delete the completion action if needed, as it will no longer be accessed by other functions.

    It is safe to set either of these to NULL if you do not require that function.

    See CompletionAction_c.c for an example showing both how to modify following tasks as well as free memory from the completion action and previous tasks.

    The C++ equivalent was already possible; I have updated the example CompletionAction.cpp to demonstrate how.
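
    As a rough illustration of that C++ route, a completion action can be written as an enki::ICompletable which depends on the task and performs its work when the dependency completes. The sketch below is indicative only: it assumes ICompletable exposes an overridable OnDependenciesComplete( TaskScheduler*, uint32_t ) hook as used in example/CompletionAction.cpp, so check that example for the exact signature and lifetime rules.

    #include "TaskScheduler.h"
    
    enki::TaskScheduler g_TS;
    
    struct MyTask : enki::ITaskSet {
        void ExecuteRange( enki::TaskSetPartition range_, uint32_t threadnum_ ) override {
            // do work
        }
    };
    
    // Runs inline when MyTask completes; it is never added to the scheduler itself.
    struct MyCompletionAction : enki::ICompletable {
        enki::Dependency m_Dependency;
        void OnDependenciesComplete( enki::TaskScheduler* pTaskScheduler_, uint32_t threadNum_ ) override {
            // assumed hook, see example/CompletionAction.cpp; let the default completion handling run first
            enki::ICompletable::OnDependenciesComplete( pTaskScheduler_, threadNum_ );
            // the completion action itself goes here, e.g. release resources owned by the task
        }
    };
    
    int main(int argc, const char * argv[]) {
        g_TS.Initialize();
        MyTask task;
        MyCompletionAction onTaskDone;
        onTaskDone.SetDependency( onTaskDone.m_Dependency, &task );
    
        g_TS.AddTaskSetToPipe( &task ); // the completion action runs when the task completes
        g_TS.WaitforAll();
        return 0;
    }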

    Fixes

    In addition, this release includes a number of fixes.

    Thanks to @BobbyAnguelov, @TurtleSimos and @makuto for their issue reports, PRs, and testing which have helped bring this release together. I'd also like to thank our Patreon and Github supporters for their financial assistance.

    Support development of enkiTS through Github Sponsors or Patreon

  • v1.10(Jul 20, 2021)

    This release adds a major new feature - WaitForPinnedTasks - suitable for performing work which could be OS blocking such as IO.

    The motivation for this feature is that some calls, such as file and network IO, may result in the thread being blocked whilst waiting for an external event to occur, which does not consume CPU resources. If this were run on a standard enkiTS thread the task scheduler would have fewer threads able to perform computational work. Creating more threads than CPU cores offers one solution, but this will result in OS scheduling overhead which we wish to minimize. The WaitForNewPinnedTasks() function permits an enkiTS thread to block at the OS level until it explicitly receives new work via a PinnedTask. Developers can thus create extra threads which loop calling WaitForNewPinnedTasks() and RunPinnedTasks() to perform IO/blocking work, minimizing OS scheduling overhead whilst keeping all CPU cores active with enkiTS threads.

    WaitForPinnedTasks thread usage in C++:

    #include "TaskScheduler.h"
    
    enki::TaskScheduler g_TS;
    
    struct RunPinnedTaskLoopTask : enki::IPinnedTask
    {
        void Execute() override
        {
            while( g_TS.GetIsRunning() )
            {
                g_TS.WaitForNewPinnedTasks(); // this thread will 'sleep' until there are new pinned tasks
                g_TS.RunPinnedTasks();
            }
        }
    };
    
    struct PretendDoFileIO : enki::IPinnedTask
    {
        void Execute() override
        {
            // Do file IO
        }
    };
    
    int main(int argc, const char * argv[])
    {
        enki::TaskSchedulerConfig config;
    
        // In this example we create more threads than the hardware can run,
    // because the IO thread will spend most of its time idle or blocked
        // and therefore not scheduled for CPU time by the OS
        config.numTaskThreadsToCreate += 1;
    
        g_TS.Initialize( config );
    
        // in this example we place our IO threads at the end
        RunPinnedTaskLoopTask runPinnedTaskLoopTasks;
        runPinnedTaskLoopTasks.threadNum = g_TS.GetNumTaskThreads() - 1;
        g_TS.AddPinnedTask( &runPinnedTaskLoopTasks );
    
        // Send pretend file IO task to external thread FILE_IO
        PretendDoFileIO pretendDoFileIO;
        pretendDoFileIO.threadNum = runPinnedTaskLoopTasks.threadNum;
        g_TS.AddPinnedTask( &pretendDoFileIO );
    
        // ensure runPinnedTaskLoopTasks complete by explicitly calling shutdown
        g_TS.WaitforAllAndShutdown();
    
        return 0;
    }
    

    Screenshot of enkiTS in action in the Avoyd Voxel Editor whilst CPU path tracing a scene. Avoyd uses additional enkiTS threads using WaitForNewPinnedTasks() to wait for PinnedTasks which perform blocking IO (not shown in the profile as these are hard to capture in a nice looking screenshot).

    Support development of enkiTS through Github Sponsors or Patreon

  • v1.9(Mar 24, 2021)

    This release adds no new features, but incorporates a number of bug fixes.

    Many thanks to @brunochampoux, @aaronfranke, @craigsteyn, @mrdooz, @MSFT-Chris-Barrett, and @cstamford for the issue reports, PRs and testing which have helped to bring this release together.

    Support development of enkiTS through Github Sponsors or Patreon

  • v1.8(Mar 2, 2020)

    NOTE: Breaking changes to C API

    This release of enkiTS adds a major feature - Dependencies, and a minor feature - Completion Actions.

    Dependencies introduce an alternative approach to waiting on tasks, allowing you to create a sequence of tasks or a task graph. See the example below.

    Completion Actions are dependencies which execute immediately after a task has completed, and do not add themselves to the task scheduler. They involve less overhead than a normal task, and can be used to delete the completed task as well as themselves, as they are not referenced after their completion function is called. See example/CompletionAction.cpp and example/CompletionAction_c.c.

    The following breaking changes to the C API were required:

    1. enkiDelete* functions now require the task scheduler as a parameter.
    2. enkiAddTaskSet replaced with enkiAddTaskSetArgs so that enkiAddTaskSet can be used for a new function which does not set task arguments, intended for use along with the new enkiSetParams* functions.

    Dependency usage in C++:

    #include "TaskScheduler.h"
    
    enki::TaskScheduler g_TS;
    
    // define a task set, can ignore range if we only do one thing
    struct TaskA : enki::ITaskSet {
        void ExecuteRange(  enki::TaskSetPartition range_, uint32_t threadnum_ ) override {
            // do something here, can issue tasks with g_TS
        }
    };
    
    struct TaskB : enki::ITaskSet {
        enki::Dependency m_Dependency;
        void ExecuteRange(  enki::TaskSetPartition range_, uint32_t threadnum_ ) override {
            // do something here, can issue tasks with g_TS
        }
    };
    
    int main(int argc, const char * argv[]) {
        g_TS.Initialize();
        
        // set dependencies once (can set more than one if needed).
        TaskA taskA;
        TaskB taskB;
        taskB.SetDependency( taskB.m_Dependency, &taskA );
    
        g_TS.AddTaskSetToPipe( &taskA ); // add first task, when complete TaskB will run
        g_TS.WaitforTask( &taskB );      // wait for last
        return 0;
    }
    
  • v1.7(Jan 23, 2020)

    This release of enkiTS fixes a number of issues and adds a new smoke test, TestAll.cpp. Additionally, if using profiling code based on enkiTSMicroprofileExample.cpp in enkiTSExamples, check out the latest code for a stack-based tick store required by the v1.2 callbacks.

    Screenshot of enkiTSMicroprofileExample.cpp showing two stacked waitForTaskComplete callbacks (at top), which requires a tick stack for correct Microprofile profiling.

    As a helper for catching issues earlier I've also added a smoke test TestAll.cpp for Travis CI which now runs on Linux x64, Linux ARM64, OSX x64 and Windows x64.

    • Fixed issue #44 GCC warnings.
    • Merged Pull Request #47 with some fixes: Xbox CreateSemaphoreExW(), uninitialized m_WaitingForTaskCount.
    • Fixed issue #48 Pinned task not being woken.
      • The added WakeSuspendedThreadsWithPinnedTasks() function increases the workload on threads which have no tasks and are about to suspend waiting for new tasks or task completion, but since these threads are about to suspend this has no detectable performance penalty.
    • Compile fix for when NOMINMAX is globally defined.
    • Fixed issue #49 Valgrind errors on OSX and mach semaphore exception.
      • The v1.6 change to add custom allocators caused an OSX crash in placement new of mach semaphores, so I have switched to dispatch semaphores.
    • Fix for initializing TaskScheduler multiple times without a shutdown and with different configuration parameters.
    • Fixed issue #50 Valgrind warning thread states for all threads not initialized prior to thread launch.
    • Fixed issue #51 WaitforAll() and external threads not waking.
    • Fixed issue #52 ThreadSanitizer (TSAN) reporting data race.
      • A note on ThreadSanitizer (TSAN) & Intel Inspector: currently neither TSAN nor Intel Inspector support all the primitives used by enkiTS and thus there will be false data race reports. I hope to keep these to a minimum, but this is not always possible.

    Thanks to @Vuhdo, @boxerab, @Pagghiu, and Bobby Anguelov.

    Support development of enkiTS through Github Sponsors or Patreon

  • v1.6(Nov 5, 2019)

    You can now configure enkiTS with custom allocators using the customAllocator member of enki::TaskSchedulerConfig, as shown in the example below.

    Developed based on request Feature suggestion: custom allocators #41 by @sergeyreznik.

    In addition you can now sponsor enkiTS through Github Sponsors, and to boost community funding, GitHub will match your contribution!

    Custom Allocator usage in C++:

    #include "TaskScheduler.h"
    
    #include <stdio.h>
    #include <thread>
    
    using namespace enki;
    
    TaskScheduler g_TS;
    
    struct ParallelTaskSet : ITaskSet
    {
        virtual void ExecuteRange( TaskSetPartition range_, uint32_t threadnum_ )
        {
            printf(" This could run on any thread, currently thread %d\n", threadnum_);
        }
    };
    
    struct CustomData
    {
        const char* domainName;
        size_t totalAllocations;
    };
    
    void* CustomAllocFunc( size_t align_, size_t size_, void* userData_, const char* file_, int line_ )
    {
        CustomData* data = (CustomData*)userData_;
        data->totalAllocations += size_;
    
        printf("Allocating %g bytes in domain %s, total %g. File %s, line %d.\n",
            (double)size_, data->domainName, (double)data->totalAllocations, file_, line_ );
    
        return DefaultAllocFunc( align_, size_, userData_, file_, line_ );
    };
    
    void  CustomFreeFunc(  void* ptr_,    size_t size_, void* userData_, const char* file_, int line_ )
    {
        CustomData* data = (CustomData*)userData_;
        data->totalAllocations -= size_;
    
        printf("Freeing %p in domain %s, total %g. File %s, line %d.\n",
            ptr_, data->domainName, (double)data->totalAllocations, file_, line_ );
    
        DefaultFreeFunc( ptr_, size_, userData_, file_, line_ );
    };
    
    
    int main(int argc, const char * argv[])
    {
        enki::TaskSchedulerConfig config;
        config.customAllocator.alloc = CustomAllocFunc;
        config.customAllocator.free  = CustomFreeFunc;
        CustomData data{ "enkITS", 0 };
        config.customAllocator.userData = &data;
    
        g_TS.Initialize( config );
    
        ParallelTaskSet task;
        g_TS.AddTaskSetToPipe( &task );
        g_TS.WaitforTask( &task );
        g_TS.WaitforAllAndShutdown(); // ensure we shutdown before user data is destroyed.
    
        return 0;
    }
    
  • v1.5(Oct 29, 2019)

    As is now a defining tradition of enkiTS, we have a post-release bugfix:

    1. Fixed CACHE_LINE_SIZE name clashes with macro from major console SDK #43
    2. Fixed GetConfig() to return the actual config (it was incorrectly returning the default).
  • v1.4(Oct 28, 2019)

    You can now configure enkiTS with numExternalTaskThreads which can be registered to use with the enkiTS API using the RegisterExternalTaskThread function.

    Developed based on request Feature suggestion: running tasks from non main/task threads #39 by @Vuhdo.

    • This also introduces a new way to configure the task scheduler using the enki::TaskSchedulerConfig.
    • C++11 branch is now deprecated (C++11 functionality now in master branch) and will no longer be updated as master is identical. Please switch to master if you are on this legacy branch.
    • GetProfilerCallbacks is now deprecated as you should use the enki::TaskSchedulerConfig.

    External thread usage in C++:

    #include "TaskScheduler.h"
    
    enki::TaskScheduler g_TS;
    struct ParallelTaskSet : enki::ITaskSet
    {
        virtual void ExecuteRange( enki::TaskSetPartition range, uint32_t threadnum )
        {
            // Do something
        }
    };
    
    void threadFunction()
    {
        g_TS.RegisterExternalTaskThread();
    
        // sleep for a while instead of doing something such as file IO
    std::this_thread::sleep_for( std::chrono::milliseconds( 100 ) );
    
        ParallelTaskSet task;
        g_TS.AddTaskSetToPipe( &task );
        g_TS.WaitforTask( &task);
    
        g_TS.DeRegisterExternalTaskThread();
    }
    
    int main(int argc, const char * argv[])
    {
        enki::TaskSchedulerConfig config;
        config.numExternalTaskThreads = 1; // we have one extra external thread
    
        g_TS.Initialize( config );
    
        std::thread exampleThread( threadFunction );
    
        exampleThread.join();
    
        return 0;
    }
    
  • v1.3(Oct 21, 2019)

  • v1.2(Oct 20, 2019)

    The wait functions now relinquish CPU resources when idle, so other threads can run, similar to the behaviour of task threads which have no tasks to run. This lowers CPU power consumption, and can improve performance when the CPU is oversubscribed (for example when other threads or processes are consuming resources). Thanks to @zhaijialong for the feature request and follow up testing. See issue #31 for more information.

    The wait functions try to run tasks whilst waiting, and if there are none they first spin then perform an OS blocking wait for a task complete event allowing other threads to run. The task complete event system may spuriously wake waiting threads, but they will then go back to a blocking wait.

    This update also has a breaking change to the ProfilerCallbacks struct, which should be a one or two line change. Please see the struct declaration and comments for details.

    Finally, in addition to a few bug fixes, this release also deprecates the C++98 branches and support for non C++11 compatible compilers (C support is still available through the C headers).

  • v1.1(Jun 27, 2019)

    This release is primarily an ARM platform fix to the C++11 atomic use on the master branch. Other changes are:

    • WaitforTask clamps the priorityOfLowestToRun_ to be at least as low as the priority of the task waited on to prevent deadlocks
    • Added asserts to AddTaskSetToPipe and AddPinnedTask when the task being added is already running
    • Init order compile warning fix
  • v1.0(Jun 22, 2019)

    Version 1.0 of enkiTS brings with it the following changes:

    • Task Priorities: see the readme and the Priorities.cpp example for more information.
    • Improved efficiency and performance: a tweaked spin wait and a new semaphore system for waking tasks help task threads go to sleep more frequently when not needed and run new tasks faster.
    • Master branch now uses C++11. If you were using the C++11 branch I advise switching to master as the C++11 branch will eventually be removed. A C++98 branch exists for those without access to a C++11 compiler or for ease in porting to pure C, though it may be deprecated in time.
    • Versioning! From now on I'll be versioning enkiTS.
Owner
Doug Binks
Game dev, C++, multithreading, Runtime Compiled C++, voxels, graphics. Co-founder of enkisoftware with @juliettef. Occasionally available for consultancy.