oneAPI DPC++ Library (oneDPL) https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-library.html

Overview

oneAPI DPC++ Library (oneDPL)

The oneAPI DPC++ Library (oneDPL) works with the oneAPI DPC++ Compiler to provide high-productivity APIs that minimize DPC++ programming effort across devices for high-performance parallel applications. oneDPL consists of the following components:

  • Parallel STL for DPC++
  • An additional set of library classes and functions (referred to below as the "Extension API")
  • Tested standard C++ APIs

Prerequisites

Install the Intel(R) oneAPI Base Toolkit (Base Kit) to use oneDPL, and refer to the System Requirements.

Release Information

See the latest Release Notes.

License

oneDPL is licensed under Apache License Version 2.0 with LLVM exceptions. Refer to the "LICENSE" file for the full license text and copyright notice.

Security

See Intel's Security Center for information on how to report a potential security issue or vulnerability. See also: Security Policy

Contributing

See CONTRIBUTING.md for details.

Documentation

See the oneDPL Library Guide.

Samples

You can find oneDPL samples in Samples.

Support and contribution

Please report issues and suggestions via GitHub issues.


Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.

* Other names and brands may be claimed as the property of others.

Issues
  • Migrated build process from Makefiles to CMake

    Hi,

    I've migrated the current build process from raw Makefiles to CMake. This allows for easier configurable releases and the standard configure/make/make install flow across platforms, for easier packaging on Linux distributions and elsewhere.

    Additionally, it also exports projectConfig.cmake files so that other projects can detect pstl and specify it as a dependency!
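For illustration, a downstream CMakeLists.txt might consume the exported config roughly like this (package and target names below are hypothetical; the actual names depend on what projectConfig.cmake defines):

```cmake
# Hypothetical consumer fragment; the package/target names are
# illustrative, not the exact names the exported config defines.
cmake_minimum_required(VERSION 3.10)
project(consumer CXX)

find_package(ParallelSTL REQUIRED)              # locates the exported *Config.cmake
add_executable(my_app main.cpp)
target_link_libraries(my_app PRIVATE pstl::ParallelSTL)
```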

    opened by ambasta 13
  • Error with TBB CMake on OS X

    Hello,

    I've been using parallelstl for a few months, and it is really a fantastic library.

    I've just upgraded to the latest release, which includes CMake support. But, when I add the parallelstl subdirectory in CMake, I get the following error:

    CMake Error at /usr/local/lib/cmake/TBB/TBBConfig.cmake:77 (message):
      Missed required Intel TBB component: tbb
    Call Stack (most recent call first):
      third_party/parallelstl-20180619/CMakeLists.txt:39 (find_package)
    
    
    -- Configuring incomplete, errors occurred!
    

    To be clear, I definitely have TBB installed. I'm on OS X and I installed it via brew. Its header files and libraries are sitting in /usr/local/include and /usr/local/lib respectively. If I skip the parallelstl CMake and add things manually, everything is fine.

    I was hoping there would be some variable somewhere, like TBB_ROOT or something, that I could set that would make the TBB CMake happy, but so far nothing has worked. Any thoughts?

    Thanks!
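For what it's worth, CMake's config-package search does honor variables like this; a sketch assuming a Homebrew-style layout (paths hypothetical):

```cmake
# Point CMake directly at the directory containing TBBConfig.cmake
# (path is hypothetical for a Homebrew install):
set(TBB_DIR "/usr/local/lib/cmake/TBB" CACHE PATH "TBB config dir")
# ...or add the install prefix to the config-package search path:
list(APPEND CMAKE_PREFIX_PATH "/usr/local")
```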

    opened by izaid 10
  • Create install target that copies include files and oneDPLConfig.cmake

    Currently, the CMake setup doesn't define an install target that allows oneDPL to be installed in a user-provided directory properly. This pull request provides that by copying the include files and invoking generate_config.cmake.
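The install rules described could look roughly like this (a sketch; destinations and the config file's generation path are assumptions, not the PR's exact code):

```cmake
# Sketch only: copy headers and the generated oneDPLConfig.cmake into
# the user-provided CMAKE_INSTALL_PREFIX.
install(DIRECTORY include/ DESTINATION include)
install(FILES "${CMAKE_CURRENT_BINARY_DIR}/oneDPLConfig.cmake"
        DESTINATION lib/cmake/oneDPL)
```

With these rules, an install would be invoked as, e.g., cmake --install . --prefix /opt/onedpl (CMake 3.15+).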

    opened by masterleinad 8
  • Modify Jenkinsfiles to use good compiler in the Last Good OneDPL link file

    The change will switch our Jenkins CI test to use good compiler in OneDPL link. Verified manually with: http://icl-jenkins2.sc.intel.com:8080/job/Tools_SH/job/test_jobs/job/onedpl_test/job/RHEL_Test/19/console http://icl-jenkins2.sc.intel.com:8080/job/Tools_SH/job/test_jobs/job/onedpl_test/job/UB20_Test/8/console http://icl-jenkins2.sc.intel.com:8080/job/Tools_SH/job/test_jobs/job/onedpl_test/job/UB18_Test/5/console http://icl-jenkins2.sc.intel.com:8080/blue/organizations/jenkins/Tools_SH%2Ftest_jobs%2Fonedpl_test%2FWin_test/detail/Win_test/46/pipeline

    opened by DoyleLi 8
  • Error on configuring parallelstl using CMake on Windows

    CMake complains that a TBB package configuration is missing, even though this TBB build does not provide (is not built with) CMake configuration files:

    CMake Error at CMakeLists.txt:39 (find_package):
      By not providing "FindTBB.cmake" in CMAKE_MODULE_PATH this project has
      asked CMake to find a package configuration file provided by "TBB", but
      CMake did not find one.

      Could not find a package configuration file provided by "TBB" (requested
      version 2018) with any of the following names:

        TBBConfig.cmake
        tbb-config.cmake

      Add the installation prefix of "TBB" to CMAKE_PREFIX_PATH or set
      "TBB_DIR" to a directory containing one of the above files.  If "TBB"
      provides a separate development package or SDK, be sure it has been
      installed.

    opened by mselim 8
  • help with getting transform_reduce to work for device array

    Hello, can you please help me find the issue with the following code? More specifically, I am trying to apply std::transform_reduce to a device pointer (allocated using malloc_device and populated with memcpy). I want to first apply abs and then take the max. I understand that std::transform, std::reduce, and std::transform_reduce each need an iterator, and a USM-allocated array/buffer should work (not 100% sure). It does work for std::transform (without any explicit conversion of the device pointer to an iterator), but it fails for both std::reduce and std::transform_reduce.

    #include <oneapi/dpl/execution>
    #include <oneapi/dpl/algorithm>
    #include <CL/sycl.hpp>
    
    #include <vector>
    #include <iostream>
    
    struct FunctionalAbs {
        float operator()(const float& x) const {
            return sycl::fabs((float)x);
        }
    };
    
    constexpr int N = 16;
    
    int main()
    {
        // create queue and policy
        sycl::queue myQueue = sycl::queue();
        auto policy = oneapi::dpl::execution::make_device_policy(myQueue);
    
        // fill up host vector
        std::vector<float> values_h;
        for (int i = 0; i < N; ++i) {
            values_h.push_back(-(float)N/2 + i);
        }
    
        // allocate and fill up device pointer
        auto values_d = sycl::malloc_device<float>(N, myQueue);
        myQueue.memcpy(values_d, &values_h[0], N * sizeof(float)).wait();
    
        /* THIS WORKS */
        // transform and reduce host vector: takes abs followed by max
        float max_h = std::transform_reduce(    // works
            values_h.begin(),
            values_h.end(),
            0.0f,
            oneapi::dpl::maximum<float>(),
            FunctionalAbs());
    
        /* THIS IS NOT WORKING */
        float max_d = std::transform_reduce(    // does not work
            policy,
            values_d,       // how to get an iterator here
            values_d + N,   // how to get an iterator here
            0.0f,
            oneapi::dpl::maximum<float>(),
            FunctionalAbs());
        
        /* THIS WORKS */
        std::transform(                         // works
            policy,
            values_d,       // does not require iterator here
            values_d + N,   // does not require iterator here
            values_d,
            FunctionalAbs());
    
        std::cout << "max: " << max_h << std::endl;
    
        return 0;
    }
    

    I get this error when compiling with dpcpp test_transform_reduce.cpp:

    test_transform_reduce.cpp:42:19: error: no matching function for call to 'transform_reduce'
        float max_d = std::transform_reduce(    // does not work
                      ^~~~~~~~~~~~~~~~~~~~~
    /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/numeric:338:5: note: candidate template ignored: deduced conflicting types for parameter '_InputIterator1' ('oneapi::dpl::execution::device_policy<>' vs. 'float *')
        transform_reduce(_InputIterator1 __first1, _InputIterator1 __last1,
        ^
    /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/pstl/glue_numeric_impl.h:81:1: note: candidate template ignored: requirement '__pstl::execution::is_execution_policy<oneapi::dpl::execution::device_policy<oneapi::dpl::execution::DefaultKernelName>>::value' was not satisfied [with _ExecutionPolicy = oneapi::dpl::execution::device_policy<> &, _ForwardIterator = float *, _Tp = float, _BinaryOperation = oneapi::dpl::maximum<float>, _UnaryOperation = FunctionalAbs]
        transform_reduce(_ExecutionPolicy&& __exec, _ForwardIterator __first, _ForwardIterator __last, _Tp __init,
        ^
    /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/numeric:405:5: note: candidate function template not viable: requires 5 arguments, but 6 were provided
        transform_reduce(_InputIterator __first, _InputIterator __last, _Tp __init,
        ^
    /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/pstl/glue_numeric_impl.h:54:1: note: candidate function template not viable: requires 5 arguments, but 6 were provided
        transform_reduce(_ExecutionPolicy&& __exec, _ForwardIterator1 __first1, _ForwardIterator1 __last1,
        ^
    /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/pstl/glue_numeric_impl.h:69:1: note: candidate function template not viable: requires 7 arguments, but 6 were provided
        transform_reduce(_ExecutionPolicy&& __exec, _ForwardIterator1 __first1, _ForwardIterator1 __last1,
        ^
    /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/numeric:380:5: note: candidate function template not viable: requires 4 arguments, but 6 were provided
        transform_reduce(_InputIterator1 __first1, _InputIterator1 __last1,
        ^
    1 error generated.

    Thanks, Golam

    opened by mgrabban 7
  • Issues with reduce_by_segment with zip_iterators

    Hi,

    I was trying to put together a test case that uses dpl::reduce_by_segment with zip_iterators (tuples) and was having some difficulty compiling it.

    Can someone please comment on whether there is something wrong with the way the test case is set up, or otherwise?

    #define PSTL_USE_PARALLEL_POLICIES 0
    #define _GLIBCXX_USE_TBB_PAR_BACKEND 0
    
    #include <CL/sycl.hpp>
    #include <oneapi/dpl/execution>
    #include <oneapi/dpl/algorithm>
    #include <oneapi/dpl/iterator>
    #include <oneapi/dpl/functional>
    
    #include <functional>
    #include <iostream>
    #include <vector>
    
    int main()
    {
        sycl::queue q(sycl::gpu_selector{});
    
        std::vector<int> keys1{11, 11, 21, 20, 21, 21, 21, 37, 37};
        std::vector<int> keys2{11, 11, 20, 20, 20, 21, 21, 37, 37};
        std::vector<int> values{0, 1, 2, 3, 4, 5, 6, 7, 8};
        std::vector<int> output_keys1(keys1.size());
        std::vector<int> output_keys2(keys2.size());    
        std::vector<int> output_values(values.size());
    
        int* d_keys1         = sycl::malloc_device<int>(9, q);
        int* d_keys2         = sycl::malloc_device<int>(9, q);
        int* d_values        = sycl::malloc_device<int>(9, q);
        int* d_output_keys1  = sycl::malloc_device<int>(9, q);
        int* d_output_keys2  = sycl::malloc_device<int>(9, q);
        int* d_output_values = sycl::malloc_device<int>(9, q);
    
        q.memcpy(d_keys1, keys1.data(), sizeof(int)*9);
        q.memcpy(d_keys2, keys2.data(), sizeof(int)*9);
        q.memcpy(d_values, values.data(), sizeof(int)*9);
    
        auto begin_keys_in = oneapi::dpl::make_zip_iterator(d_keys1, d_keys2);
        auto end_keys_in   = oneapi::dpl::make_zip_iterator(d_keys1 + 9, d_keys2 + 9);
        auto begin_keys_out= oneapi::dpl::make_zip_iterator(d_output_keys1, d_output_keys2);
    
        auto new_last = oneapi::dpl::reduce_by_segment(oneapi::dpl::execution::make_device_policy(q),
    						   begin_keys_in, end_keys_in, d_values, begin_keys_out, d_output_values);
    
        q.memcpy(output_keys1.data(), d_output_keys1, sizeof(int)*9);
        q.memcpy(output_keys2.data(), d_output_keys2, sizeof(int)*9);    
        q.memcpy(output_values.data(), d_output_values, sizeof(int)*9);
        q.wait();
    
        // Expected output
        // {11, 11}: 1
        // {21, 20}: 2
        // {20, 20}: 3
        // {21, 20}: 4
        // {21, 21}: 11
        // {37, 37}: 15
        for(int i=0; i<9; i++) {
          std::cout << "{" << output_keys1[i] << ", " << output_keys2[i] << "}: " << output_values[i] << std::endl;
        }
    }
    

    Environment Target device and vendor: Intel GPUs DPC++ version: Intel(R) oneAPI DPC++/C++ Compiler 2021.2.0 (2021.x.0.20210323)

    opened by abagusetty 7
  • oneDPL/include/oneapi/dpl/pstl/./omp/parallel_merge.h:87: undefined reference to `omp_in_parallel'

    I'm trying to build 67383db9ac3223c825f4b8783b38bb9ab01aa757 like this:

    cmake .. -DONEDPL_BACKEND=dpcpp_only -DONEDPL_DEVICE_TYPE=GPU -DONEDPL_DEVICE_BACKEND=level_zero -DONEDPL_USE_UNNAMED_LAMBDA=TRUE -DCMAKE_CXX_COMPILER=dpcpp -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_CXX_STANDARD=17
    

    It fails because OpenMP isn't available. I know how to tell it where OpenMP is, but I shouldn't have to, since I am building a DPC++-only backend for the GPU.

    [ 16%] Linking CXX executable merge.pass
    /usr/bin/ld: /tmp/merge-bd963e.o: in function `_ZN6oneapi3dpl13__omp_backend16__parallel_mergeIRKNS0_9execution2v115parallel_policyEN9__gnu_cxx17__normal_iteratorIPiSt6vectorIiSaIiEEEENS9_IPKiSD_EESE_NS0_10__internal11__pstl_lessEZNSI_15__pattern_mergeIS7_SE_SH_SE_SJ_St17integral_constantIbLb0EEEENSt9enable_ifIXsr6oneapi3dpl10__internal26__is_host_execution_policyINSt5decayIT_E4typeEEE5valueET2_E4typeEOSP_T0_SW_T1_SX_SS_T3_T4_SL_IbLb1EEEUlSE_SE_SH_SH_SE_SJ_E_EEvSV_SW_SW_SX_SX_SS_SY_SZ_':
    /tmp/oneDPL/include/oneapi/dpl/pstl/./omp/parallel_merge.h:87: undefined reference to `omp_in_parallel'
    /usr/bin/ld: /tmp/merge-bd963e.o: in function `_ZN6oneapi3dpl13__omp_backend16__parallel_mergeIRKNS0_9execution2v115parallel_policyEN9__gnu_cxx17__normal_iteratorIPdSt6vectorIdSaIdEEEENS9_IPKdSD_EESE_NS0_10__internal11__pstl_lessEZNSI_15__pattern_mergeIS7_SE_SH_SE_SJ_St17integral_constantIbLb0EEEENSt9enable_ifIXsr6oneapi3dpl10__internal26__is_host_execution_policyINSt5decayIT_E4typeEEE5valueET2_E4typeEOSP_T0_SW_T1_SX_SS_T3_T4_SL_IbLb1EEEUlSE_SE_SH_SH_SE_SJ_E_EEvSV_SW_SW_SX_SX_SS_SY_SZ_':
    /tmp/oneDPL/include/oneapi/dpl/pstl/./omp/parallel_merge.h:87: undefined reference to `omp_in_parallel'
    /usr/bin/ld: /tmp/merge-bd963e.o: in function `_ZN6oneapi3dpl13__omp_backend16__parallel_mergeIRKNS0_9execution2v115parallel_policyESt16reverse_iteratorIN9__gnu_cxx17__normal_iteratorIPiSt6vectorIiSaIiEEEEES8_INSA_IPKiSE_EEESG_St7greaterIiEZNS0_10__internal15__pattern_mergeIS7_SG_SK_SG_SM_St17integral_constantIbLb0EEEENSt9enable_ifIXsr6oneapi3dpl10__internal26__is_host_execution_policyINSt5decayIT_E4typeEEE5valueET2_E4typeEOST_T0_S10_T1_S11_SW_T3_T4_SP_IbLb1EEEUlSG_SG_SK_SK_SG_SM_E_EEvSZ_S10_S10_S11_S11_SW_S12_S13_':
    /tmp/oneDPL/include/oneapi/dpl/pstl/./omp/parallel_merge.h:87: undefined reference to `omp_in_parallel'
    /usr/bin/ld: /tmp/merge-bd963e.o: in function `_ZN6oneapi3dpl13__omp_backend16__parallel_mergeIRKNS0_9execution2v127parallel_unsequenced_policyESt16reverse_iteratorIN9__gnu_cxx17__normal_iteratorIPiSt6vectorIiSaIiEEEEES8_INSA_IPKiSE_EEESG_St7greaterIiEZNS0_10__internal15__pattern_mergeIS7_SG_SK_SG_SM_St17integral_constantIbLb1EEEENSt9enable_ifIXsr6oneapi3dpl10__internal26__is_host_execution_policyINSt5decayIT_E4typeEEE5valueET2_E4typeEOST_T0_S10_T1_S11_SW_T3_T4_SQ_EUlSG_SG_SK_SK_SG_SM_E_EEvSZ_S10_S10_S11_S11_SW_S12_S13_':
    /tmp/oneDPL/include/oneapi/dpl/pstl/./omp/parallel_merge.h:87: undefined reference to `omp_in_parallel'
    /usr/bin/ld: /tmp/merge-bd963e.o: in function `_ZN6oneapi3dpl13__omp_backend16__parallel_mergeIRKNS0_9execution2v115parallel_policyESt16reverse_iteratorIN9__gnu_cxx17__normal_iteratorIPdSt6vectorIdSaIdEEEEES8_INSA_IPKdSE_EEESG_St7greaterIdEZNS0_10__internal15__pattern_mergeIS7_SG_SK_SG_SM_St17integral_constantIbLb0EEEENSt9enable_ifIXsr6oneapi3dpl10__internal26__is_host_execution_policyINSt5decayIT_E4typeEEE5valueET2_E4typeEOST_T0_S10_T1_S11_SW_T3_T4_SP_IbLb1EEEUlSG_SG_SK_SK_SG_SM_E_EEvSZ_S10_S10_S11_S11_SW_S12_S13_':
    /tmp/oneDPL/include/oneapi/dpl/pstl/./omp/parallel_merge.h:87: undefined reference to `omp_in_parallel'
    /usr/bin/ld: /tmp/merge-bd963e.o:/tmp/oneDPL/include/oneapi/dpl/pstl/./omp/parallel_merge.h:87: more undefined references to `omp_in_parallel' follow
    
    bug 
    opened by jeffhammond 6
  • CMake: rework test targets

    Signed-off-by: Veprev, Alexey [email protected]

    This PR is inspired by #463.

    It implements the following:

    1. Disables test configuration when oneDPL is a subproject;
    2. Removes test targets from the all target: tests are no longer built by default with just make; you now need to run make build-onedpl-tests explicitly.

    Additional change: targets for tests in subfolders are renamed to build-onedpl-<subfolder>-tests.

    More details are in cmake/README.md. CI corrections were applied; you can see examples there.

    @rarutyun @akukanov please take a look. @alexey-katranov I'd consider a similar approach for oneTBB, but with the additional user option TBB_TEST that is already in place.

    For oneDPL I don't think an explicit option is useful.

    opened by AlexVeprev 6
  • Refactor iterator tests

    Moving the tests of permutation_iterator and discard_iterator into iterators.pass.cpp and reusing the existing random access iterator test for each of them. permutation_iterator is not default constructible, so the checking code has been duplicated to avoid that static assert. The functionality of discard_iterator and permutation_iterator has been extended with comparison operators and an addition operator that allow them to meet the random access iterator criteria being tested.

    opened by timmiesmith 6
  • Adding memory deallocation calls to algorithm extension USM tests.

    The test_with_usm functions of the algorithm extension tests were not deallocating the memory they allocated. This corrects that, and also changes cl::sycl to sycl in the tests to match SYCL 2020 support.

    opened by timmiesmith 6
  • Add test for std::real(std::complex) and std::imag(std::complex)

    Which functions were tested:

    • std::complex<T>::complex - constructor
    • std::complex<T>::imag
    • std::complex<T>::real
    • std::abs(std::complex<T>)
    • std::arg(std::complex<T>)
    • std::conj(std::complex<T>)
    • std::exp(std::complex<T>)
    • std::imag(std::complex<T>)
    • std::real(std::complex<T>)

    Where these functions were tested:

    • on host;
    • in Kernel.

    Additional points:

    • Added the ability for these new tests to report errors from test classes on the host and in the kernel (file name, line number, error message);
    • Added the ability for these new tests to continue after the first error and to show all errors (on the host) or some number of the first errors (in the kernel: the first 3) that occurred during the test run.
    opened by SergeyKopienko 0
  • SYCL Rewrites of inclusive_scan_by_segment, exclusive_scan_by_segment, and reduce_by_segment

    To increase the performance of segmented reductions and segmented scans, the SYCL backend implementations of these algorithms have been rewritten using SYCL group and sub-group functions.

    The new algorithms operate in two primary phases:

    1. Perform a segmented scan or reduction within each individual work-group with equal work assignment, storing partially completed scan / reduction values to global memory.
    2. Apply the partial reduction / scan values from the previous work-group(s) to the current one.
    opened by mmichel11 1
  • [oneDPL] Added support C++ functor/lambda  for permutation_iterator

    [oneDPL] permutation_iterator and permutation_view improvements

    1. Added support for a usage case like the one in Boost (https://www.boost.org/doc/libs/1_38_0/libs/iterator/doc/permutation_iterator.html)
    2. Added support for a C++ functor as an index map for permutation_iterator (https://oneapi-src.github.io/oneDPL/parallel_api/iterators.html).
    3. Added support for a C++ functor as an index map for permutation_view
    4. The test coverage was extended as well.
    opened by MikeDvorskiy 1
  • Implementation of Transform Output Iterator

    An implementation of transform_output_iterator, an iterator that performs a given transformation upon writes only and avoids unnecessary intermediate storage.

    Closes #404

    enhancement 
    opened by mmichel11 0
  • Performance of inclusive_scan_by_segment() might be improved

    Performance of inclusive_scan_by_segment() for 250M elements for code like

    int* vals = malloc_shared<int>(num_elements, policy.queue());
    int* keys = malloc_shared<int>(num_elements, policy.queue());
    int* output = malloc_shared<int>(num_elements, policy.queue());
    std::fill(keys, keys + num_elements, 0);
    std::iota(vals, vals + num_elements, 1);
    std::fill(output, output + num_elements, 0);
    
    auto iter_res = oneapi::dpl::inclusive_scan_by_segment(policy, keys, keys + num_elements, vals, output);
    

    is more than 3 times worse compared with a sample from the "Data Parallel C++" book. CPU: 11th Gen Intel(R) Core(TM) i7-1185G7. Compiler: Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2022.0.0 Build 20211123.

    On the CPU/OpenCL platform, one bottleneck is the calculation of the remainder over a power of 2 in several places in scan_impl(). The division could be replaced with an AND against 0b11..11, improving performance roughly 2x.

    On the GPU, one reason for the performance difference is the simultaneous reading of the input data and the mask. That is, when the scan is applied to ints, the mask is the same size as the data (the mask type is unsigned int), so the data-movement requirement is effectively doubled. It seems we need a "where to switch" flag here, not a whole int per element. To demonstrate that mask access is the issue, change unsigned int to unsigned char in FlagType; in my case this gives a 40% improvement on GPU/OpenCL.

    opened by Alexandr-Konovalov 1
Releases: oneDPL-2021.7.0-release
Owner: oneAPI-SRC (oneAPI open source projects)