Multi-backend implementation of SYCL for CPUs and GPUs

Overview

hipSYCL - a SYCL implementation for CPUs and GPUs

hipSYCL is a modern SYCL implementation targeting CPUs and GPUs, with a focus on leveraging existing toolchains such as CUDA or HIP. hipSYCL currently targets the following devices:

  • Any CPU via OpenMP
  • NVIDIA GPUs via CUDA
  • AMD GPUs via HIP/ROCm
  • Intel GPUs via oneAPI Level Zero and SPIR-V (highly experimental and WIP!)

hipSYCL supports compiling source files into a single binary that can run on all these backends when building against appropriate clang distributions. More information about the compilation flow can be found in the project documentation.

The runtime architecture of hipSYCL consists of the main library hipSYCL-rt and independent, modular plugin libraries for the individual backends.

hipSYCL's compilation and runtime design effectively aggregates multiple toolchains that are otherwise incompatible, making them accessible through a single SYCL interface.

The philosophy behind hipSYCL is to leverage such existing toolchains as much as possible. This not only brings maintenance and stability advantages, but by design enables performance on par with those established toolchains and allows for maximum interoperability with existing compute platforms. For example, the hipSYCL CUDA and ROCm backends rely on the clang CUDA/HIP frontends, which hipSYCL augments to additionally understand SYCL code. This means that the hipSYCL compiler can compile not only SYCL code but also CUDA/HIP code, even when both are mixed in the same source file, making all CUDA/HIP features - such as the latest device intrinsics - available from SYCL code as well. Additionally, vendor-optimized template libraries such as rocPRIM or CUB can be used with hipSYCL. Consequently, hipSYCL allows for highly optimized code paths in SYCL code for specific devices.
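
To make this concrete, the sketch below mixes a generic SYCL code path with a CUDA-only code path inside the same kernel. It is a minimal illustration only: it assumes the __hipsycl_if_target_cuda() backend-selection macro (which also appears in one of the issues further below) and the CUDA __popc device intrinsic, and the kernel and variable names are arbitrary. It is meant to be compiled with syclcc for a CUDA target.

    #include <sycl/sycl.hpp>
    #include <iostream>

    int main() {
      sycl::queue q;
      int result = 0;
      {
        sycl::buffer<int> buf{&result, sycl::range<1>{1}};
        q.submit([&](sycl::handler &cgh) {
          auto out = buf.get_access<sycl::access::mode::write>(cgh);
          cgh.single_task<class mixed_backend_kernel>([=]() {
            int value = 0;
            // Generic SYCL path, compiled for every backend: count set bits by hand.
            for (unsigned bit = 0; bit < 8; ++bit)
              value += (0xABu >> bit) & 1u;
            // CUDA-only path: this block is only emitted when compiling for the
            // CUDA backend, so CUDA device intrinsics such as __popc can be used
            // directly from SYCL code.
            __hipsycl_if_target_cuda(
              value = __popc(0xABu);
            );
            out[0] = value;
          });
        });
      } // buffer destruction writes the result back to the host variable
      std::cout << "popcount(0xAB) = " << result << std::endl;
      return 0;
    }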

Because a SYCL program compiled with hipSYCL looks just like any other CUDA or HIP program to vendor-provided software, vendor tools such as profilers or debuggers also work well with hipSYCL.

About the project

While hipSYCL started its life as a hobby project, development is now led and funded by Heidelberg University. hipSYCL not only serves as a research platform, but is also a solution used in production on machines of all scales, including some of the most powerful supercomputers.

Contributing to hipSYCL

We encourage contributions and look forward to your pull request! Please have a look at CONTRIBUTING.md. If you need any guidance, just open an issue and we will get back to you shortly.

If you are a student at Heidelberg University and wish to work on hipSYCL, please get in touch with us. There are various options possible and we are happy to include you in the project :-)

Citing hipSYCL

hipSYCL is a research project. As such, if you use hipSYCL in your research, we kindly request that you cite:

Aksel Alpay and Vincent Heuveline. 2020. SYCL beyond OpenCL: The architecture, current state and future direction of hipSYCL. In Proceedings of the International Workshop on OpenCL (IWOCL ’20). Association for Computing Machinery, New York, NY, USA, Article 8, 1. DOI:https://doi.org/10.1145/3388333.3388658

(This is a talk and is available online. Note that some of the content in this talk is outdated by now.)

Acknowledgements

We gratefully acknowledge contributions from the community.

Performance

hipSYCL has been repeatedly shown to deliver very competitive performance compared to other SYCL implementations or proprietary solutions like CUDA. See for example:

  • Sohan Lal, Aksel Alpay, Philip Salzmann, Biagio Cosenza, Nicolai Stawinoga, Peter Thoman, Thomas Fahringer, and Vincent Heuveline. 2020. SYCL-Bench: A Versatile Single-Source Benchmark Suite for Heterogeneous Computing. In Proceedings of the International Workshop on OpenCL (IWOCL ’20). Association for Computing Machinery, New York, NY, USA, Article 10, 1. DOI:https://doi.org/10.1145/3388333.3388669
  • Brian Homerding and John Tramm. 2020. Evaluating the Performance of the hipSYCL Toolchain for HPC Kernels on NVIDIA V100 GPUs. In Proceedings of the International Workshop on OpenCL (IWOCL ’20). Association for Computing Machinery, New York, NY, USA, Article 16, 1–7. DOI:https://doi.org/10.1145/3388333.3388660
  • Tom Deakin and Simon McIntosh-Smith. 2020. Evaluating the performance of HPC-style SYCL applications. In Proceedings of the International Workshop on OpenCL (IWOCL ’20). Association for Computing Machinery, New York, NY, USA, Article 12, 1–11. DOI:https://doi.org/10.1145/3388333.3388643

Benchmarking hipSYCL

When targeting the CUDA or HIP backends, hipSYCL just massages the AST slightly to get clang -x cuda and clang -x hip to accept SYCL code. hipSYCL is not involved in the actual code generation. Therefore any significant deviation in kernel performance compared to clang-compiled CUDA or clang-compiled HIP is unexpected.

Consequently, if you compare hipSYCL to other LLVM-based compilers, make sure to build hipSYCL against the same LLVM version; otherwise you are effectively just comparing the performance of two different LLVM versions. This applies in particular when comparing against clang CUDA or clang HIP.

Current state

hipSYCL is not yet a fully conformant SYCL implementation, although many SYCL programs already work with hipSYCL.

Hardware and operating system support

Supported hardware:

  • Any CPU for which a C++17 OpenMP compiler exists
  • NVIDIA CUDA GPUs. Note that clang, which hipSYCL relies on, may not always support the latest CUDA version, which can occasionally impact support for very new hardware. See the clang documentation for more details.
  • AMD GPUs that are supported by ROCm

Operating system support currently strongly focuses on Linux. On Mac, only the CPU backend is expected to work. Windows support with CPU and CUDA backends is experimental, see Using hipSYCL on Windows.

Installing and using hipSYCL

In order to compile software with hipSYCL, use syclcc, which automatically adds all required compiler arguments to the CUDA/HIP compiler. syclcc can be used like a regular compiler, i.e. you can use syclcc -o test test.cpp to compile your SYCL application test.cpp into an executable named test.

syclcc accepts both command line arguments and environment variables to configure its behavior (e.g., to select the target platform CUDA/ROCm/CPU to compile for). See syclcc --help for a comprehensive list of options.

When targeting a GPU, you will need to provide a target GPU architecture. The expected formats are defined by clang CUDA/HIP. Examples:

  • sm_52: NVIDIA Maxwell GPUs
  • sm_60: NVIDIA Pascal GPUs
  • sm_70: NVIDIA Volta GPUs
  • gfx900: AMD Vega 10 GPUs
  • gfx906: AMD Vega 20 GPUs
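
As a quick check of a working installation, the sketch below shows a complete minimal SYCL program together with example syclcc invocations in the leading comments. The exact target/platform flag spellings vary between hipSYCL versions (see syclcc --help), so treat the command lines as illustrative assumptions rather than canonical; the file and kernel names are arbitrary.

    // vector_add.cpp - minimal hipSYCL test program (illustrative sketch).
    //
    // CPU (OpenMP):  syclcc -O2 --hipsycl-targets=omp -o vector_add vector_add.cpp
    // NVIDIA GPU:    syclcc -O2 --hipsycl-targets="cuda:sm_70" -o vector_add vector_add.cpp
    // AMD GPU:       syclcc -O2 --hipsycl-targets="hip:gfx906" -o vector_add vector_add.cpp
    #include <sycl/sycl.hpp>
    #include <iostream>
    #include <vector>

    int main() {
      constexpr std::size_t n = 1024;
      std::vector<int> a(n, 1), b(n, 2), c(n, 0);

      sycl::queue q;
      {
        // Buffers hand the host data over to the runtime for the duration of this scope.
        sycl::buffer<int> buf_a{a.data(), sycl::range<1>{n}};
        sycl::buffer<int> buf_b{b.data(), sycl::range<1>{n}};
        sycl::buffer<int> buf_c{c.data(), sycl::range<1>{n}};

        q.submit([&](sycl::handler &cgh) {
          auto acc_a = buf_a.get_access<sycl::access::mode::read>(cgh);
          auto acc_b = buf_b.get_access<sycl::access::mode::read>(cgh);
          auto acc_c = buf_c.get_access<sycl::access::mode::write>(cgh);
          cgh.parallel_for<class vector_add>(sycl::range<1>{n}, [=](sycl::id<1> i) {
            acc_c[i] = acc_a[i] + acc_b[i];
          });
        });
      } // buffer destruction synchronizes and writes the results back to c

      std::cout << "c[0] = " << c[0] << std::endl; // expected output: c[0] = 3
      return 0;
    }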

The full documentation of syclcc and hints for the CMake integration can be found in using hipSYCL.

Documentation

Issues
  • ROCm backend - build instructions

    Hi,

    to build hipSYCL with the ROCm backend, the instructions say that the "amd-common" branch of llvm/clang/lld from AMD should be used (Link). But currently this branch corresponds to llvm/clang/lld version 10 (Link).

    And according to the "CMakeLists.txt" of hipSYCL it supports only llvm/clang/lld up to version 9.

    Shouldn't it be the "roc-ocl-2.7x" branch instead of "amd-common"?

    opened by justxi 75
  • Fix cmake config file generation to use correct function

    Prior to this change, the codebase used a plain configure_file call to generate the package config file, which renders the installed files non-relocatable. This commit switches that configure_file call to the configure_package_config_file command.

    Apart from the major change mentioned above, this commit also makes a couple of additional changes:

    • Adds the OpenMP flags to the interface link options of the hipSYCL-rt target so that they are passed to any application consuming the CMake target. This was required on Arch Linux, where the examples otherwise fail to build with undefined reference errors.
    • Fixes the include path values passed to the hipSYCL-rt target
    opened by 9prady9 47
  • MacOS CMake issues

    Prompted by https://github.com/illuhad/hipSYCL/issues/222#issuecomment-601104011, I tried to build on Mac but it seems to ignore my explicit specification of Clang location and instead finds /usr/bin/clang++.

    It is possible that the issue here is that the Homebrew installs of LLVM don't have the required components, but CMake is not doing a proper job detecting that.

    jrhammon-mac02:build jrhammon$ cmake .. -DCMAKE_INSTALL_PREFIX=/opt/hipsycl -DCMAKE_CXX_COMPILER=/usr/local/Cellar/llvm/9.0.1/bin/clang++ -DCMAKE_C_COMPILER=/usr/local/Cellar/llvm/9.0.1/bin/clang -DDISABLE_LLVM_VERSION_CHECK=ON
    -- The C compiler identification is Clang 9.0.1
    -- The CXX compiler identification is Clang 9.0.1
    -- Check for working C compiler: /usr/local/Cellar/llvm/9.0.1/bin/clang
    -- Check for working C compiler: /usr/local/Cellar/llvm/9.0.1/bin/clang -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Check for working CXX compiler: /usr/local/Cellar/llvm/9.0.1/bin/clang++
    -- Check for working CXX compiler: /usr/local/Cellar/llvm/9.0.1/bin/clang++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    CMake Warning (dev) at CMakeLists.txt:17 (set):
      implicitly converting 'INTEGER' to 'STRING' type.
    This warning is for project developers.  Use -Wno-dev to suppress it.
    
    -- Could NOT find LLVM (missing: LLVM_DIR)
    -- Building hipSYCL against LLVM configured from LLVM_DIR-NOTFOUND
    -- Selecting clang: /usr/bin/clang++
    CMake Error at CMakeLists.txt:77 (message):
      clang include path CLANG_INCLUDE_PATH-NOTFOUND does not exist.  Please
      provide clang's internal include path manually.
    
    
    -- Using clang include directory: CLANG_INCLUDE_PATH-NOTFOUND
    -- Configuring incomplete, errors occurred!
    See also "/Users/jrhammon/Work/SYCL/hipSYCL/build/CMakeFiles/CMakeOutput.log".
    
    opened by jeffhammond 36
  • atomics on CPU

    I have the following simple atomic counter example that compiles and runs fine for CUDA, but doesn't compile on the CPU.

    #include <iostream>
    #include <CL/sycl.hpp>
    
    namespace s = cl::sycl;
    
    int main()
    {
        s::queue q;
    
        int counter = 0;
        {
            s::buffer<int> counter_buf(&counter, 1);
    
            q.submit([&](cl::sycl::handler& cgh)
            {
                auto access_counter = counter_buf.get_access<cl::sycl::access::mode::read_write>(cgh);
    
                cgh.parallel_for<class atomic_increment>(s::range<1>(1 << 30), [=] (cl::sycl::id<1> tid)
                {
                    s::atomic<int> atomic_counter { s::global_ptr<int> {&access_counter[0]} };
                    atomic_counter.fetch_add(1);
                });
            });
        }
    
        std::cout << "Counter: " << counter << std::endl;
    }
    
    /opt/hipSYCL/CUDA/bin/syclcc-clang --hipsycl-gpu-arch=sm_52 -O3 atomic-counter.cpp -o atomic-counter-cuda
    

    compiles and runs fine.

    /opt/hipSYCL/CUDA/bin/syclcc-clang --hipsycl-platform=cpu -g atomic-counter.cpp -o atomic-counter-cpu
    In file included from atomic-counter.cpp:3:
    In file included from /opt/hipSYCL/CUDA/bin/../include/CL/sycl.hpp:58:
    /opt/hipSYCL/CUDA/bin/../include/CL/sycl/atomic.hpp:103:12: error: use of undeclared identifier 'atomicAdd'
        return atomicAdd(_ptr, operand);
               ^
    atomic-counter.cpp:23:32: note: in instantiation of function template specialization 'cl::sycl::atomic<int, cl::sycl::access::address_space::global_space>::fetch_add<int, nullptr>' requested here
                    atomic_counter.fetch_add(1);
    

    I know atomics are not fully supported (they are listed as a limitation in README), but other issues suggest that only minor features of atomics are missing. Is the above error by design, or is it something that can be easily fixed?

    opened by mrzv 24
  • windows compilation report "lld-link: error: undefined symbol: cuModuleGetFunction"

    After the failure, repeat 'ninja': E:\hipSYCL-sycl-2020\build>ninja [1/1] Linking CXX shared library src\runtime\rt-backend-cuda.dll FAILED: src/runtime/rt-backend-cuda.dll src/runtime/rt-backend-cuda.lib cmd.exe /C "cd . && E:\hipSYCL-sycl-2020\LLVM\bin\clang++.exe -fuse-ld=lld-link -nostartfiles -nostdlib -O2 -g -DNDEBUG -Xclang -gcodeview -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -shared -o src\runtime\rt-backend-cuda.dll -Xlinker /implib:src\runtime\rt-backend-cuda.lib -Xlinker /pdb:src\runtime\rt-backend-cuda.pdb -Xlinker /version:0.0 src/runtime/CMakeFiles/rt-backend-cuda.dir/cuda/cuda_event.cpp.obj src/runtime/CMakeFiles/rt-backend-cuda.dir/cuda/cuda_queue.cpp.obj src/runtime/CMakeFiles/rt-backend-cuda.dir/cuda/cuda_allocator.cpp.obj src/runtime/CMakeFiles/rt-backend-cuda.dir/cuda/cuda_device_manager.cpp.obj src/runtime/CMakeFiles/rt-backend-cuda.dir/cuda/cuda_hardware_manager.cpp.obj src/runtime/CMakeFiles/rt-backend-cuda.dir/cuda/cuda_backend.cpp.obj src/runtime/CMakeFiles/rt-backend-cuda.dir/cuda/cuda_module.cpp.obj src/runtime/hipSYCL-rt.lib E:/CUDA11/lib/x64/cudart_static.lib -lkernel32 -luser32 -lgdi32 -lwinspool -lshell32 -lole32 -loleaut32 -luuid -lcomdlg32 -ladvapi32 -loldnames && cd ." lld-link: error: undefined symbol: cuModuleGetFunction

    referenced by E:\hipSYCL-sycl-2020\src\runtime\cuda\cuda_queue.cpp:329 src/runtime/CMakeFiles/rt-backend-cuda.dir/cuda/cuda_queue.cpp.obj:(public: class hipsycl::rt::result __cdecl hipsycl::rt::cuda_queue::submit_kernel_from_module(class hipsycl::rt::cuda_module_manager &, class hipsycl::rt::cuda_module const &, class std::basic_string<char, struct std::char_traits, class std::allocator> const &, class hipsycl::rt::static_array<3> const &, class hipsycl::rt::static_array<3> const &, unsigned int, void **))

    lld-link: error: undefined symbol: cuLaunchKernel

    referenced by E:\hipSYCL-sycl-2020\src\runtime\cuda\cuda_queue.cpp:337 src/runtime/CMakeFiles/rt-backend-cuda.dir/cuda/cuda_queue.cpp.obj:(public: class hipsycl::rt::result __cdecl hipsycl::rt::cuda_queue::submit_kernel_from_module(class hipsycl::rt::cuda_module_manager &, class hipsycl::rt::cuda_module const &, class std::basic_string<char, struct std::char_traits, class std::allocator> const &, class hipsycl::rt::static_array<3> const &, class hipsycl::rt::static_array<3> const &, unsigned int, void **))

    lld-link: error: undefined symbol: cuModuleUnload

    referenced by E:\hipSYCL-sycl-2020\src\runtime\cuda\cuda_module.cpp:131 src/runtime/CMakeFiles/rt-backend-cuda.dir/cuda/cuda_module.cpp.obj:(public: __cdecl hipsycl::rt::cuda_module_manager::~cuda_module_manager(void)) referenced by E:\hipSYCL-sycl-2020\src\runtime\cuda\cuda_module.cpp:182 src/runtime/CMakeFiles/rt-backend-cuda.dir/cuda/cuda_module.cpp.obj:(public: class hipsycl::rt::result __cdecl hipsycl::rt::cuda_module_manager::load(class hipsycl::rt::device_id, class hipsycl::rt::cuda_module const &, struct CUmod_st *&))

    lld-link: error: undefined symbol: cuModuleLoadDataEx

    referenced by E:\hipSYCL-sycl-2020\src\runtime\cuda\cuda_module.cpp:192 src/runtime/CMakeFiles/rt-backend-cuda.dir/cuda/cuda_module.cpp.obj:(public: class hipsycl::rt::result __cdecl hipsycl::rt::cuda_module_manager::load(class hipsycl::rt::device_id, class hipsycl::rt::cuda_module const &, struct CUmod_st *&)) clang++: error: linker command failed with exit code 1 (use -v to see invocation) ninja: build stopped: subcommand failed.


    I followed the steps from https://github.com/illuhad/hipSYCL/wiki/Using-hipSYCL-on-Windows, using the prebuilt LLVM 11.1.0 and Boost 1.75 binaries and the hipSYCL "sycl/2020" branch. hipSYCL-rt.dll and rt-backend-omp.dll were generated successfully, but the CUDA backend part failed to build. I tried with CUDA 10.2 and 11; both report the same errors. I cannot understand why lld-link reports "undefined symbol: cuModuleGetFunction" when linking with cudart_static.lib. I use the official CUDA Windows 10 x86_64 release downloaded from the NVIDIA website.

    discussion 
    opened by mz24cn 23
  • Problems with compilation on ubuntu 18.04 with rocm 2.3

    I tried to run cmake with default parameters. On my system it compiles with gcc 7.3.0, but I get errors like error: no member named 'make_unique' in namespace 'std'. So I tried to switch to clang 6.0.0, which I have on my system, with CC=clang CXX=clang++ cmake, but got the same result; I thought clang 6 defaults to C++14. So I tried CXXFLAGS+=-std=c++17 cmake, but then I get the error The platform rocm was explicitly chosen, but it is not available. This was again with gcc 7.3.0, so I suppose gcc is not supported. I would expect your cmake config to try to choose clang as the C++ compiler, since I see it listed as a dependency, and to add at least -std=c++14.

    At last I tried CC=clang CXX=clang++ CXXFLAGS+=-std=c++17 cmake and it was finally successful (with warning in many places warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]). But when I tried to compile sycl source syclcc test.cpp --hipsycl-platform=amd --std=c++17 I got this error:

    warning: argument unused during compilation: '-L/usr/local/bin/../lib' [-Wunused-command-line-argument]
    ld: /opt/rocm/bin/../lib/libhip_hcc.so: undefined reference to symbol '[email protected]@ROCR_1'
    //opt/rocm/hsa/lib/libhsa-runtime64.so.1: error adding symbols: DSO missing from command line
    clang-9: error: linker command failed with exit code 1 (use -v to see invocation)
    

    This clang-9 comes from the ROCm hcc. So I am going to try to set the default C and C++ compiler on my system to the newest clang.

    So I tried it with clang-7. I again had to specify CXXFLAGS=-std=c++14 before cmake. This time I tried c++14 and hipsycl compiles without warning. But syclcc test.cpp --hipsycl-platform=rocm gives this strange error:

    warning: argument unused during compilation: '-L/usr/local/bin/../lib' [-Wunused-command-line-argument]
    syclcc fatal error: [Errno 2] No such file or directory: 'hipsycl_211bd330025b7952.cpp'
    

    I also tried the Singularity container. I see that it is using clang 6 and adds -std=c++14:

    + export CXX=clang++-6.0
    + cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_CXX_FLAGS=-std=c++14 -DWITH_CPU_BACKEND=ON -DWITH_ROCM_BACKEND=ON ..
    -- The C compiler identification is GNU 5.4.0
    -- The CXX compiler identification is Clang 6.0.0
    

    But when compiling a SYCL program I get a similar error as previously (with an additional perl locale warning):

    [email protected]:~$ syclcc test.cpp --hipsycl-platform=rocm -std=c++14
    perl: warning: Setting locale failed.
    perl: warning: Please check that your locale settings:
    	LANGUAGE = "en_US:en",
    	LC_ALL = (unset),
    	LANG = "en_US.UTF-8"
        are supported and installed on your system.
    perl: warning: Falling back to the standard locale ("C").
    warning: argument unused during compilation: '-L/usr/bin/../lib' [-Wunused-command-line-argument]
    ld: /opt/rocm/bin/../lib/libhip_hcc.so: undefined reference to symbol '[email protected]@ROCR_1'
    //opt/rocm/hsa/lib/libhsa-runtime64.so.1: error adding symbols: DSO missing from command line
    clang-9: error: linker command failed with exit code 1 (use -v to see invocation)
    

    I have the latest ROCm 2.3. I have now tested a sample HIP program and an hcc program which calls the function hsa_system_major_extension_supported and links with -lhsa-runtime64, and everything seems to work fine.

    But it seems that when I skip -lhsa-runtime64, I get a similar error as with syclcc:

    ld: /tmp/tmp.EcD56X0on0/main.host.o: undefined reference to symbol '[email protected]@ROCR_1'
    //opt/rocm/hsa/lib/libhsa-runtime64.so.1: error adding symbols: DSO missing from command line
    clang-9: error: linker command failed with exit code 1 (use -v to see invocation)
    

    And this finally worked from the Singularity container: syclcc test.cpp --hipsycl-platform=rocm -std=c++14 -lhsa-runtime64

    opened by misos1 22
  • [SYCL2020] Group functions

    This PR adds a naive implementation for group_functions to hipSYCL. This includes implementations of

    • group_broadcast
    • group_barrier
    • group_{any,all,none}_of
    • group_reduce
    • group_{ex,in}clusive_scan

    on CPU and NVIDIA/AMD GPUs, as well as tests for these functions. I will provide optimized versions in later PRs. It also includes (group_)functions using two pointers (beginning/end), but these are not in the specification and are not meant to be used yet (as such they reside in the detail namespace).
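
    (For orientation, here is a minimal sketch of how such group functions are used from a SYCL kernel. It is an illustrative example, not code from this PR, and it uses the final SYCL 2020 spellings such as reduce_over_group, which differ slightly from the names listed above; the kernel and buffer setup is arbitrary.)

    #include <sycl/sycl.hpp>

    int main() {
      sycl::queue q;
      constexpr size_t local_size = 128;
      int result = 0;
      {
        sycl::buffer<int> buf{&result, sycl::range<1>{1}};
        q.submit([&](sycl::handler &cgh) {
          auto out = buf.get_access<sycl::access::mode::write>(cgh);
          cgh.parallel_for<class group_algorithm_demo>(
              sycl::nd_range<1>{sycl::range<1>{local_size}, sycl::range<1>{local_size}},
              [=](sycl::nd_item<1> it) {
                auto g = it.get_group();
                int contribution = static_cast<int>(it.get_local_id(0));
                // Sum the contributions of all work-items in the work-group.
                int sum = sycl::reduce_over_group(g, contribution, sycl::plus<int>{});
                // Broadcast the (identical) result from the group leader.
                int broadcast = sycl::group_broadcast(g, sum);
                // Synchronize the work-group before writing the result.
                sycl::group_barrier(g);
                if (g.leader())
                  out[0] = broadcast;
              });
        });
      }
      return 0;
    }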

    At the moment all tests pass (except some problems with the pointer-based functions on CPU which sometimes fail, I am still investigating). (private results)

    I would love to get some feedback. If you find some template parameters or formatting you don't like, there is a chance I missed them in one of my cleanup/refactoring attempts; just tell me so I can fix it. Some small changes, like splitting the tests into multiple files for faster compilation, might still be added here.

    opened by DieGoldeneEnte 21
  • Permission issue in manual build

    Hello, I'm trying to build hipSYCL manually following the steps in README file.

    git clone --recurse-submodules https://github.com/illuhad/hipSYCL
    cd hipSYCL
    mkdir build
    cd build
    cmake -DCMAKE_INSTALL_PREFIX=. ..
    

    and this is the output

    -- The C compiler identification is GNU 4.8.5
    -- The CXX compiler identification is GNU 4.8.5
    -- Check for working C compiler: /usr/bin/cc
    -- Check for working C compiler: /usr/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    CMake Warning (dev) at CMakeLists.txt:17 (set):
      implicitly converting 'INTEGER' to 'STRING' type.
    This warning is for project developers.  Use -Wno-dev to suppress it.
    
    -- Looking for pthread.h
    -- Looking for pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - found
    -- Found Threads: TRUE
    -- Found CUDA: /software/nvidia/cuda/10.0 (found version "10.0")
    -- Boost version: 1.57.0
    -- Found the following Boost libraries:
    --   filesystem
    --   system
    -- Boost version: 1.57.0
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /path/hipSYCL/build
    

    Then I tried to build it with make install, which causes a fatal error:

    make install
    Scanning dependencies of target hipSYCL_cuda
    [  2%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/application.cpp.o
    syclcc fatal error: [Errno 13] Permission denied: '/path/hipSYCL/bin/'
    make[2]: *** [src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/application.cpp.o] Error 255
    make[1]: *** [src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/all] Error 2
    make: *** [all] Error 2
    

    However, the permissions of the source and build directories have not changed since their creation, and /path/hipSYCL/bin/ exists and contains syclcc and syclcc-clang.

    opened by mteimoori 20
  • Strategy: The future of the source-to-source transformation

    Since we'll have a clang plugin hopefully soonish that will directly allow the clang CUDA/HIP frontend to ingest SYCL code (see issue #34), we could in principle drop the source-to-source transformation entirely. I'd like to start a discussion here with hipSYCL users and developers to get some feedback on possible futures of the hipSYCL compilation toolchain. Is the source-to-source transformation important to you and we should support both source-to-source and the new clang plugin? Do you need nvcc support? Or is clang support (with the plugin) sufficient for you?

    Here are some pros of the source-to-source transformation that come to my mind:

    • Allows compilation of SYCL code with nvcc. This can be interesting from a marketing position ("you can do anything that nvcc can and can use the newest CUDA features right away")
    • Possible to specify areas in the code with preprocessor definitions that hipSYCL shouldn't modify. This could be beneficial if you're interested in mix-and-match with SYCL and CUDA/HIP code.

    The new clang plugin on the other hand gives us:

    • Much more robustness and reliability (there are edge cases in the source-to-source transformation...)
    • Faster compilation speed
    • Solid SYCL support
    • Paves the way for runtime selection whether a kernel should be executed on host or device (could in principle however also be implemented with source-to-source and clang's CUDA implementation [but likely not with nvcc])
    • Potentially even some parts of the C++ standard library could be used in kernels (although not yet with the initial version of the plugin)
    • Implementation of specific optimizations in the future since we have access to the IR

    While not impossible, it may require some additional effort to support both the current source-to-source transformation and the new clang plugin approach because the clang plugin treats any function without attributes implicitly as __host__ __device__. This means that all functions for SYCL kernels (e.g. math functions) must also support compilation for both host and device. At the moment, we assume in the runtime that everything used in kernels is __device__ only. This is also assumed by the current source-to-source transformation. Also, if we still have to support source-to-source, it may limit our ability to implement things with IR transformations.

    discussion 
    opened by illuhad 18
  • compile problem on Centos7 w/ release 0.8.0

    After installing the binary distribution of release 0.8.0 for centos7, I'm seeing some unusual errors when compiling, and lots and lots of errors during the final linking phase:

    > scl enable rh-python36 devtoolset-7 /bin/bash
    > /opt/hipSYCL/bin/syclcc-clang -std=c++14 --hipsycl-platform=rocm --hipsycl-gpu-arch=gfx900 -g -pthread -o FindPrimesSYCL.o -c FindPrimesSYCL.cpp
    In file included from FindPrimesSYCL.cpp:1:
    In file included from /opt/hipSYCL/bin/../include/CL/sycl.hpp:36:
    In file included from /opt/hipSYCL/bin/../include/CL/sycl/backend/backend.hpp:55:
    In file included from /opt/hipSYCL/rocm/include/hip/hip_runtime.h:56:
    In file included from /opt/hipSYCL/rocm/include/hip/hcc_detail/hip_runtime.h:105:
    > /opt/hipSYCL/rocm/include/hip/hcc_detail/texture_functions.h:173:5: warning: array designators are a C99 extension [-Wc99-designator]
        [HIP_AD_FORMAT_NOT_INITIALIZED] = 1  ,
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1 warning generated when compiling for gfx900.
    In file included from FindPrimesSYCL.cpp:1:
    In file included from /opt/hipSYCL/bin/../include/CL/sycl.hpp:36:
    In file included from /opt/hipSYCL/bin/../include/CL/sycl/backend/backend.hpp:55:
    In file included from /opt/hipSYCL/rocm/include/hip/hip_runtime.h:56:
    In file included from /opt/hipSYCL/rocm/include/hip/hcc_detail/hip_runtime.h:105:
    > /opt/hipSYCL/rocm/include/hip/hcc_detail/texture_functions.h:173:5: warning: array designators are a C99 extension [-Wc99-designator]
        [HIP_AD_FORMAT_NOT_INITIALIZED] = 1  ,
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1 warning generated when compiling for host.
    > /opt/hipSYCL/bin/syclcc-clang -std=c++14 --hipsycl-platform=rocm --hipsycl-gpu-arch=gfx900 -g -pthread -o Crunch.o -c Crunch.cpp
    
    
    /opt/hipSYCL/bin/syclcc-clang -std=c++14 --hipsycl-platform=rocm --hipsycl-gpu-arch=gfx900 -g -pthread Crunch.o FindPrimesSYCL.o main.cpp -o fp 
    Crunch.o:1:1: error: expected unqualified-id
    <binary contents of the ELF object file echoed by the compiler, elided>
    Crunch.o:1:8: warning: null character ignored [-Wnull-character]
    

    The AMD GPU is a Vega RX 56. I've installed the ROCm drivers and dev packages from AMD, and the device is shown by rocminfo and clinfo.

    opened by cgleggett 17
  • Improve CMake support

    This improves hipSYCL's CMake support by...

    1. ...moving to out-of-source builds (and adding -isystem to the flags passed through to the source transformation passes) which facilitates the "basic" use case of setting CMAKE_CXX_COMPILER=syclcc, as detailed in #21.
    2. ...generating a CMake package config file during installation, which can be used to add hipSYCL to specific targets in a more idiomatic CMake fashion.

    For the former I've added support for the --hipsycl-transform-dir= and --hipsycl-main-output-file= CLI parameters to hipsycl_rewrite_includes, as already supported by hipsycl_transform_source. A single output path parameter would suffice here, but I opted to align it with the existing convention. To allow the device compilers to find headers that have not been rewritten I'm adding the source directory of every translation unit as an include path (as again is already being done for the transformation passes).

    The package config file is a bit more involved. Ultimately, as discussed in #21, I wanted to provide an add_sycl_to_target function, as that seems to be an emerging convention between different SYCL implementations. This would make it more convenient to support multiple SYCL implementations from within a single CMakeLists.txt. Due to hipSYCL's nature -- not having a separate device compilation pass which generates an integration header -- I first thought it might not be possible to provide such a function, and that instead we'd have to go for something like CUDA's (now deprecated) cuda_add_executable. However, even that turned out to be problematic, as simply setting CMAKE_CXX_COMPILER=syclcc before calling add_executable does not work; CMake seems to rely on global state when deciding on the compiler being used for any given target (i.e., setting the compiler back to the previous value at the end of add_sycl_to_target will cause the SYCL target to be compiled using that compiler as well).

    Together with @w4rh4wk I came up with an entirely different solution, that is pretty nifty I think: We've added SYCL as a completely separate CMake language (similar to e.g. CXX or CUDA). Using set_source_files_properties it is then easy to mark all source files passed into add_sycl_to_target as SYCL, thus instructing CMake to use syclcc to compile them.

    I've opted to not take any additional parameters for add_sycl_to_target, as neither does ComputeCpp's implementation. Instead, hipSYCL is configured through CMake variables. Besides supporting optional parameters such as HIPSYCL_GPU_ARCH, the CMake module includes first-class support for HIPSYCL_PLATFORM. However, unlike bare-bones syclcc, the information about which platforms are available is not based on the compiler executables that can be found within $PATH at the time of compilation, but instead on which backends were actually compiled and installed. Furthermore, the check for whether multiple platforms are available (and none has been explicitly chosen) is done during CMake configuration, not during compilation. This I think greatly improves the overall developer experience.

    The approach to using this package config is as follows:

    1. Build and install hipSYCL.
    2. Within the user application's CMakeLists.txt, do a find_package(hipSYCL CONFIG).
    3. (edit - forgot this step): Add a target as usual using add_executable or add_library, then call add_sycl_to_target(TARGET <target> SOURCES <source files>).
    4. Run cmake with -DCMAKE_PREFIX_PATH=<hipSYCL installation location>/lib. Potentially add additional parameters such as -DHIPSYCL_PLATFORM.
    5. Success!

    In the future, this approach could even enable use cases where only some translation units are compiled with syclcc, while others are simply compiled with the host compiler - which could be good for performance. However for now, unless the other translation units don't touch SYCL at all, this unfortunately doesn't work, as the hipSYCL headers include hardcoded __host__ and __device__ attributes, which will trip up the host compiler.


    I've marked this as a draft PR as there might very well be some usage patterns that are not yet fully supported by this approach. @VileLasagna please have a look whether this is in line with what you've been envisioning as well!

    opened by psalz 17
  • Compiling hipSYCL with cufft

    Hi! I'm currently trying to run hipSYCL with CUDA-specific code via __hipsycl_if_target_cuda(...). I need this to run specific FFT algorithms depending on the target, because SYCL doesn't support FFTs by itself yet. Unfortunately I get the following errors:

    ptxas fatal   : Unresolved extern function 'cufftPlan3d'
    clang: error: ptxas command failed with exit code 255 (use -v to see invocation)
    Ubuntu clang version 14.0.6-++20220622053019+f28c006a5895-1~exp1~20220622173056.159
    Target: x86_64-pc-linux-gnu
    Thread model: posix
    InstalledDir: /usr/lib/llvm-14/bin
    

    I created an MWE (well, it's not working ;') ) to maybe reproduce/investigate it: https://github.com/Rhynden/mwe_hipsycl_cufft

    Maybe my CMake setup is just wrong/missing something?

    Any help appreciated!

    Kind regards

    discussion 
    opened by Rhynden 4
  • [infra] Introduce pre-commit hooks

    This commit introduces basic style uniformity in the entire repository. Most notably, it fixes all occurrences of trailing whitespace and missing newlines at the end of files.

    I think it makes sense to defer the commit that documents the pre-commit hooks so that this commit only contains the changes made by the hooks, and the .pre-commit-config.yaml file.

    I also deferred a commit unifying the C++ code via clang-format, since the formatting style should probably be discussed first.

    While these hooks can (and should) be installed by contributors locally, a next step will be adding this to the CI pipelines so that they run before the actual test pipelines start. This way we can give contributors immediate feedback on style issues instead of them having to wait for 20-60 minutes for the full CI feedback.

    opened by aaronmondal 0
  • gfx90c: Error: Not enough local memory to compile group algorithm tests

    Playing around with a Radeon integrated GPU on ROCm 5.2, with clang 14 from ROCm. The GPU is not officially supported by ROCm, but worth a try ;)

    Compiling the unit tests currently fails with the compiler error:

    [  8%] Building CXX object CMakeFiles/sycl_tests.dir/sycl/group_functions/group_functions_misc.cpp.o
    syclcc warning: No optimization flag was given, optimizations are disabled by default. Performance may be degraded. Compile with e.g. -O2/-O3 to enable optimizations.
    error: local memory (204850) exceeds limit (65536) in function '_Z30__hipsycl_kernel_name_templateIN7hipsycl4glue20complete_kernel_nameI11test_kernelILi1ELi83EiEEEEvv'
    error: local memory (204850) exceeds limit (65536) in function '_Z30__hipsycl_kernel_name_templateIZZ25test_nd_group_function_1dILi83EiZN21group_functions_tests13group_barrier11test_methodEvEUlRSt6vectorIiSaIiEEmmE_ZNS2_11test_methodEvEUlT_mN7hipsycl4sycl9sub_groupET0_iE_ZNS2_11test_methodEvEUlRKS5_SF_mmE_EvmT1_T2_T3_ENKUlRNSA_7handlerEE_clESL_EUlNSA_7nd_itemILi1EEEE_Evv'
    error: local memory (204850) exceeds limit (65536) in function '_Z30__hipsycl_kernel_name_templateIN7hipsycl4glue20complete_kernel_nameI11test_kernelILi1ELi119EcEEEEvv'
    error: local memory (204850) exceeds limit (65536) in function '_Z30__hipsycl_kernel_name_templateIZZ25test_nd_group_function_1dILi119EcZN21group_functions_tests15group_broadcastIcE11test_methodEvEUlRSt6vectorIcSaIcEEmmE_ZNS3_11test_methodEvEUlT_mN7hipsycl4sycl9sub_groupET0_cE_ZNS3_11test_methodEvEUlRKS6_SG_mmE_EvmT1_T2_T3_ENKUlRNSB_7handlerEE_clESM_EUlNSB_7nd_itemILi1EEEE_Evv'
    error: local memory (204850) exceeds limit (65536) in function '_Z30__hipsycl_kernel_name_templateIN7hipsycl4glue20complete_kernel_nameI11test_kernelILi2ELi122EcEEEEvv'
    error: local memory (204850) exceeds limit (65536) in function '_Z30__hipsycl_kernel_name_templateIZZ25test_nd_group_function_2dILi122EcZN21group_functions_tests15group_broadcastIcE11test_methodEvEUlRSt6vectorIcSaIcEEmmE_ZNS3_11test_methodEvEUlT_mN7hipsycl4sycl9sub_groupET0_cE_ZNS3_11test_methodEvEUlRKS6_SG_mmE_EvmT1_T2_T3_ENKUlRNSB_7handlerEE_clESM_EUlNSB_7nd_itemILi2EEEE_Evv'
    error: local memory (204850) exceeds limit (65536) in function '_Z30__hipsycl_kernel_name_templateIN7hipsycl4glue20complete_kernel_nameI11test_kernelILi1ELi149EcEEEEvv'
    error: local memory (204850) exceeds limit (65536) in function '_Z30__hipsycl_kernel_name_templateIZZ25test_nd_group_function_1dILi149EcZN21group_functions_tests15group_broadcastIcE11test_methodEvEUlRSt6vectorIcSaIcEEmmE_ZNS3_11test_methodEvEUlT_mN7hipsycl4sycl9sub_groupET0_cE0_ZNS3_11test_methodEvEUlRKS6_SG_mmE0_EvmT1_T2_T3_ENKUlRNSB_7handlerEE_clESM_EUlNSB_7nd_itemILi1EEEE_Evv'
    error: local memory (204850) exceeds limit (65536) in function '_Z30__hipsycl_kernel_name_templateIN7hipsycl4glue20complete_kernel_nameI11test_kernelILi2ELi152EcEEEEvv'
    error: local memory (204850) exceeds limit (65536) in function '_Z30__hipsycl_kernel_name_templateIZZ25test_nd_group_function_2dILi152EcZN21group_functions_tests15group_broadcastIcE11test_methodEvEUlRSt6vectorIcSaIcEEmmE_ZNS3_11test_methodEvEUlT_mN7hipsycl4sycl9sub_groupET0_cE0_ZNS3_11test_methodEvEUlRKS6_SG_mmE0_EvmT1_T2_T3_ENKUlRNSB_7handlerEE_clESM_EUlNSB_7nd_itemILi2EEEE_Evv'
    error: local memory (204850) exceeds limit (65536) in function '_Z30__hipsycl_kernel_name_templateIN7hipsycl4glue20complete_kernel_nameI11test_kernelILi1ELi183EcEEEEvv'
    error: local memory (204850) exceeds limit (65536) in function '_Z30__hipsycl_kernel_name_templateIZZ25test_nd_group_function_1dILi183EcZN21group_functions_tests15group_broadcastIcE11test_methodEvEUlRSt6vectorIcSaIcEEmmE_ZNS3_11test_methodEvEUlT_mN7hipsycl4sycl9sub_groupET0_cE1_ZNS3_11test_methodEvEUlRKS6_SG_mmE1_EvmT1_T2_T3_ENKUlRNSB_7handlerEE_clESM_EUlNSB_7nd_itemILi1EEEE_Evv'
    error: local memory (204850) exceeds limit (65536) in function '_Z30__hipsycl_kernel_name_templateIN7hipsycl4glue20complete_kernel_nameI11test_kernelILi2ELi186EcEEEEvv'
    error: local memory (204850) exceeds limit (65536) in function '_Z30__hipsycl_kernel_name_templateIZZ25test_nd_group_function_2dILi186EcZN21group_functions_tests15group_broadcastIcE11test_methodEvEUlRSt6vectorIcSaIcEEmmE_ZNS3_11test_methodEvEUlT_mN7hipsycl4sycl9sub_groupET0_cE2_ZNS3_11test_methodEvEUlRKS6_SG_mmE1_EvmT1_T2_T3_ENKUlRNSB_7handlerEE_clESM_EUlNSB_7nd_itemILi2EEEE_Evv'
    error: local memory (204850) exceeds limit (65536) in function '_Z30__hipsycl_kernel_name_templateIN7hipsycl4glue20complete_kernel_nameI11test_kernelILi1ELi119EjEEEEvv'
    error: local memory (204850) exceeds limit (65536) in function '_Z30__hipsycl_kernel_name_templateIZZ25test_nd_group_function_1dILi119EjZN21group_functions_tests15group_broadcastIjE11test_methodEvEUlRSt6vectorIjSaIjEEmmE_ZNS3_11test_methodEvEUlT_mN7hipsycl4sycl9sub_groupET0_jE_ZNS3_11test_methodEvEUlRKS6_SG_mmE_EvmT1_T2_T3_ENKUlRNSB_7handlerEE_clESM_EUlNSB_7nd_itemILi1EEEE_Evv'
    error: local memory (204850) exceeds limit (65536) in function '_Z30__hipsycl_kernel_name_templateIN7hipsycl4glue20complete_kernel_nameI11test_kernelILi2ELi122EjEEEEvv'
    error: local memory (204850) exceeds limit (65536) in function '_Z30__hipsycl_kernel_name_templateIZZ25test_nd_group_function_2dILi122EjZN21group_functions_tests15group_broadcastIjE11test_methodEvEUlRSt6vectorIjSaIjEEmmE_ZNS3_11test_methodEvEUlT_mN7hipsycl4sycl9sub_groupET0_jE_ZNS3_11test_methodEvEUlRKS6_SG_mmE_EvmT1_T2_T3_ENKUlRNSB_7handlerEE_clESM_EUlNSB_7nd_itemILi2EEEE_Evv'
    error: local memory (204850) exceeds limit (65536) in function '_Z30__hipsycl_kernel_name_templateIN7hipsycl4glue20complete_kernel_nameI11test_kernelILi1ELi149EjEEEEvv'
    

    Not a priority, but maybe we should look into what's going on in this test and whether we can reduce local memory usage. CC @DieGoldeneEnte

    discussion 
    opened by illuhad 2
  • implement known_identity

    This PR implements known_identity and adjusts plus, multiplies, etc. to also accept void as a template argument.

    Thanks @fknorr for letting me use your code from #578!

    In addition to the code from #578, this also contains the code from #628 so known_identity also works with sycl::vec.
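
    (For context, here is a minimal sketch of what the SYCL 2020 known_identity trait provides. This is an illustrative example, not code from the PR.)

    #include <sycl/sycl.hpp>
    #include <iostream>

    int main() {
      // known_identity yields the neutral element of a binary operation for a
      // given accumulator type, e.g. 0 for plus<int> and 1 for multiplies<int>.
      constexpr int zero = sycl::known_identity_v<sycl::plus<int>, int>;
      constexpr int one  = sycl::known_identity_v<sycl::multiplies<int>, int>;
      // The transparent (void) functor variants mentioned above work as well.
      constexpr int zero_transparent = sycl::known_identity_v<sycl::plus<>, int>;
      static_assert(zero == 0 && one == 1 && zero_transparent == 0);
      std::cout << zero << " " << one << std::endl;
      return 0;
    }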

    opened by DieGoldeneEnte 0
  • Information descriptors are enum values, must be structs

    Bug summary

    Per the SYCL 2020 revision 5 specification, the info descriptors (like sycl::info::device::device_type) must be structs:

    namespace sycl {
    namespace info {
    namespace platform {
    struct profile;
    }}}
    

    In hipSYCL, they are enum values:

    namespace sycl {
    namespace info {
    enum class platform : unsigned int
    {
      profile,
    }}}
    

    To Reproduce

    // Compile with syclcc test.cpp --hipsycl-targets=omp
    #include <sycl/sycl.hpp>
    int main() {
      using dt = sycl::info::device::device_type;
      return 0;
    }
    

    Expected behavior

    The code above should compile fine. In practice, it produces an error:

    test.cpp:4:14: error: no type named 'device_type' in 'hipsycl::sycl::info::device'; did you mean 'hipsycl::sycl::info::device_type'?

    Describe your setup

    • Latest develop version of hipSYCL, 2d937998228826f183a670d927749a4e6cb93a11
    • Describe the dependencies that hipSYCL sits on top of in your setup:
      • clang-14

    Additional context

    Pretty minor thing and easy to work around. Just wanted to record it.

    bug 
    opened by al42and 2
  • C++ std lib version mismatch errors when compiling hipSYCL

    Hi @illuhad, now I'm compiling hipSYCL with Clang on an Arm platform (before, I did it on an x86 one). Its default C++ library path has only version 7 libs (/usr/include/c++/7). However, there are several newer GCC versions I can load into the system as modules, and I load gcc 9.

    First, when I try to make install, it says fatal error: 'filesystem' file not found, which is expected because g++ 7 doesn't have a stable filesystem library. When I then include the library path for the GCC 9 libs in the CMakeLists.txt file and run make again, it fails with a bunch of other errors.

    Below is the output without including the GCC 9 libs.

    Scanning dependencies of target hipSYCL_clang
    [  1%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/HipsyclClangPlugin.cpp.o
    [  3%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/GlobalsPruningPass.cpp.o
    [  5%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/cbs/LoopSplitterInlining.cpp.o
    [  7%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/cbs/SplitterAnnotationAnalysis.cpp.o
    [  9%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/cbs/IRUtils.cpp.o
    [ 11%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/cbs/KernelFlattening.cpp.o
    [ 13%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/cbs/LoopsParallelMarker.cpp.o
    [ 15%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/cbs/PHIsToAllocas.cpp.o
    [ 16%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/cbs/RemoveBarrierCalls.cpp.o
    [ 18%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/cbs/CanonicalizeBarriers.cpp.o
    [ 20%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/cbs/SimplifyKernel.cpp.o
    [ 22%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/cbs/LoopSimplify.cpp.o
    [ 24%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/cbs/PipelineBuilder.cpp.o
    [ 26%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/cbs/SubCfgFormation.cpp.o
    [ 28%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/cbs/UniformityAnalysis.cpp.o
    [ 30%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/cbs/VectorShape.cpp.o
    [ 32%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/cbs/VectorizationInfo.cpp.o
    [ 33%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/cbs/AllocaSSA.cpp.o
    [ 35%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/cbs/VectorShapeTransformer.cpp.o
    [ 37%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/cbs/Region.cpp.o
    [ 39%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/cbs/SyncDependenceAnalysis.cpp.o
    [ 41%] Linking CXX shared library libhipSYCL_clang.so
    [ 41%] Built target hipSYCL_clang
    Scanning dependencies of target hipSYCL-rt
    [ 43%] Building CXX object src/runtime/CMakeFiles/hipSYCL-rt.dir/application.cpp.o
    [ 45%] Building CXX object src/runtime/CMakeFiles/hipSYCL-rt.dir/runtime.cpp.o
    [ 47%] Building CXX object src/runtime/CMakeFiles/hipSYCL-rt.dir/error.cpp.o
    [ 49%] Building CXX object src/runtime/CMakeFiles/hipSYCL-rt.dir/backend.cpp.o
    [ 50%] Building CXX object src/runtime/CMakeFiles/hipSYCL-rt.dir/backend_loader.cpp.o
    /home/ri-wshilpage/sycl-compilers/hipSYCL/src/runtime/backend_loader.cpp:36:10: fatal error: 'filesystem' file not found
    #include <filesystem>
             ^~~~~~~~~~~~
    1 error generated.
    make[2]: *** [src/runtime/CMakeFiles/hipSYCL-rt.dir/build.make:115: src/runtime/CMakeFiles/hipSYCL-rt.dir/backend_loader.cpp.o] Error 1
    make[1]: *** [CMakeFiles/Makefile2:286: src/runtime/CMakeFiles/hipSYCL-rt.dir/all] Error 2
    make: *** [Makefile:130: all] Error 2
    

    Below is the output after I added include_directories("/opt/gcc/9.3.0/snos/include/g++") to CMakeLists.txt and ran make install.

    Scanning dependencies of target hipSYCL_clang
    [  1%] Building CXX object src/compiler/CMakeFiles/hipSYCL_clang.dir/HipsyclClangPlugin.cpp.o
    In file included from /home/ri-wshilpage/sycl-compilers/hipSYCL/src/compiler/HipsyclClangPlugin.cpp:29:
    In file included from /home/ri-wshilpage/sycl-compilers/hipSYCL/src/compiler/../../include/hipSYCL/compiler/FrontendPlugin.hpp:31:
    In file included from /home/ri-wshilpage/sycl-compilers/hipSYCL/src/compiler/../../include/hipSYCL/compiler/Frontend.hpp:31:
    In file included from /opt/gcc/9.3.0/snos/include/g++/algorithm:61:
    In file included from /opt/gcc/9.3.0/snos/include/g++/bits/stl_algobase.h:67:
    In file included from /opt/gcc/9.3.0/snos/include/g++/bits/stl_iterator.h:66:
    /opt/gcc/9.3.0/snos/include/g++/bits/ptr_traits.h:140:14: error: unknown type name '_GLIBCXX20_CONSTEXPR'
          static _GLIBCXX20_CONSTEXPR pointer
                 ^
    /opt/gcc/9.3.0/snos/include/g++/bits/ptr_traits.h:140:42: error: expected ';' at end of declaration list
          static _GLIBCXX20_CONSTEXPR pointer
                                             ^
    In file included from /home/ri-wshilpage/sycl-compilers/hipSYCL/src/compiler/HipsyclClangPlugin.cpp:29:
    In file included from /home/ri-wshilpage/sycl-compilers/hipSYCL/src/compiler/../../include/hipSYCL/compiler/FrontendPlugin.hpp:31:
    In file included from /home/ri-wshilpage/sycl-compilers/hipSYCL/src/compiler/../../include/hipSYCL/compiler/Frontend.hpp:31:
    In file included from /opt/gcc/9.3.0/snos/include/g++/algorithm:62:
    In file included from /opt/gcc/9.3.0/snos/include/g++/bits/stl_algo.h:62:
    In file included from /opt/gcc/9.3.0/snos/include/g++/bits/stl_tempbuf.h:60:
    In file included from /opt/gcc/9.3.0/snos/include/g++/bits/stl_construct.h:59:
    In file included from /opt/gcc/9.3.0/snos/include/g++/new:40:
    /opt/gcc/9.3.0/snos/include/g++/exception:101:3: error: unknown type name '_GLIBCXX17_DEPRECATED'
      _GLIBCXX17_DEPRECATED
      ^
    In file included from /home/ri-wshilpage/sycl-compilers/hipSYCL/src/compiler/HipsyclClangPlugin.cpp:29:
    In file included from /home/ri-wshilpage/sycl-compilers/hipSYCL/src/compiler/../../include/hipSYCL/compiler/FrontendPlugin.hpp:31:
    In file included from /home/ri-wshilpage/sycl-compilers/hipSYCL/src/compiler/../../include/hipSYCL/compiler/Frontend.hpp:31:
    In file included from /opt/gcc/9.3.0/snos/include/g++/algorithm:62:
    In file included from /opt/gcc/9.3.0/snos/include/g++/bits/stl_algo.h:62:
    In file included from /opt/gcc/9.3.0/snos/include/g++/bits/stl_tempbuf.h:60:
    In file included from /opt/gcc/9.3.0/snos/include/g++/bits/stl_construct.h:59:
    /opt/gcc/9.3.0/snos/include/g++/new:125:1: error: unknown type name '_GLIBCXX_NODISCARD'
    _GLIBCXX_NODISCARD void* operator new(std::size_t) _GLIBCXX_THROW (std::bad_alloc)
    ^
    /opt/gcc/9.3.0/snos/include/g++/new:127:1: error: unknown type name '_GLIBCXX_NODISCARD'
    _GLIBCXX_NODISCARD void* operator new[](std::size_t) _GLIBCXX_THROW (std::bad_alloc)
    ^
    /opt/gcc/9.3.0/snos/include/g++/new:139:1: error: unknown type name '_GLIBCXX_NODISCARD'
    _GLIBCXX_NODISCARD void* operator new(std::size_t, const std::nothrow_t&) _GLIBCXX_USE_NOEXCEPT
    ^
    /opt/gcc/9.3.0/snos/include/g++/new:141:1: error: unknown type name '_GLIBCXX_NODISCARD'
    _GLIBCXX_NODISCARD void* operator new[](std::size_t, const std::nothrow_t&) _GLIBCXX_USE_NOEXCEPT
    ^
    /opt/gcc/9.3.0/snos/include/g++/new:148:1: error: unknown type name '_GLIBCXX_NODISCARD'
    _GLIBCXX_NODISCARD void* operator new(std::size_t, std::align_val_t)
    ^
    /opt/gcc/9.3.0/snos/include/g++/new:150:1: error: unknown type name '_GLIBCXX_NODISCARD'
    _GLIBCXX_NODISCARD void* operator new(std::size_t, std::align_val_t, const std::nothrow_t&)
    ^
    /opt/gcc/9.3.0/snos/include/g++/new:156:1: error: unknown type name '_GLIBCXX_NODISCARD'
    _GLIBCXX_NODISCARD void* operator new[](std::size_t, std::align_val_t)
    ^
    /opt/gcc/9.3.0/snos/include/g++/new:158:1: error: unknown type name '_GLIBCXX_NODISCARD'
    _GLIBCXX_NODISCARD void* operator new[](std::size_t, std::align_val_t, const std::nothrow_t&)
    ^
    /opt/gcc/9.3.0/snos/include/g++/new:173:1: error: unknown type name '_GLIBCXX_NODISCARD'
    _GLIBCXX_NODISCARD inline void* operator new(std::size_t, void* __p) _GLIBCXX_USE_NOEXCEPT
    ^
    /opt/gcc/9.3.0/snos/include/g++/new:174:10: error: cannot initialize return object of type 'int *' with an lvalue of type 'void *'
    { return __p; }
             ^~~
    /opt/gcc/9.3.0/snos/include/g++/new:175:1: error: unknown type name '_GLIBCXX_NODISCARD'
    _GLIBCXX_NODISCARD inline void* operator new[](std::size_t, void* __p) _GLIBCXX_USE_NOEXCEPT
    ^
    /opt/gcc/9.3.0/snos/include/g++/new:176:10: error: cannot initialize return object of type 'int *' with an lvalue of type 'void *'
    { return __p; }
             ^~~
    In file included from /home/ri-wshilpage/sycl-compilers/hipSYCL/src/compiler/HipsyclClangPlugin.cpp:29:
    In file included from /home/ri-wshilpage/sycl-compilers/hipSYCL/src/compiler/../../include/hipSYCL/compiler/FrontendPlugin.hpp:31:
    In file included from /home/ri-wshilpage/sycl-compilers/hipSYCL/src/compiler/../../include/hipSYCL/compiler/Frontend.hpp:31:
    In file included from /opt/gcc/9.3.0/snos/include/g++/algorithm:62:
    In file included from /opt/gcc/9.3.0/snos/include/g++/bits/stl_algo.h:62:
    In file included from /opt/gcc/9.3.0/snos/include/g++/bits/stl_tempbuf.h:60:
    In file included from /opt/gcc/9.3.0/snos/include/g++/bits/stl_construct.h:61:
    In file included from /opt/gcc/9.3.0/snos/include/g++/ext/alloc_traits.h:36:
    /opt/gcc/9.3.0/snos/include/g++/bits/alloc_traits.h:304:7: error: unknown type name '_GLIBCXX_NODISCARD'
          _GLIBCXX_NODISCARD static pointer
          ^
    /opt/gcc/9.3.0/snos/include/g++/bits/alloc_traits.h:304:40: error: expected ';' at end of declaration list
          _GLIBCXX_NODISCARD static pointer
                                           ^
    /opt/gcc/9.3.0/snos/include/g++/bits/alloc_traits.h:442:7: error: unknown type name '_GLIBCXX_NODISCARD'
          _GLIBCXX_NODISCARD static pointer
          ^
    /opt/gcc/9.3.0/snos/include/g++/bits/alloc_traits.h:442:40: error: expected ';' at end of declaration list
          _GLIBCXX_NODISCARD static pointer
                                           ^
    fatal error: too many errors emitted, stopping now [-ferror-limit=]
    20 errors generated.
    make[2]: *** [src/compiler/CMakeFiles/hipSYCL_clang.dir/build.make:63: src/compiler/CMakeFiles/hipSYCL_clang.dir/HipsyclClangPlugin.cpp.o] Error 1
    make[1]: *** [CMakeFiles/Makefile2:202: src/compiler/CMakeFiles/hipSYCL_clang.dir/all] Error 2
    make: *** [Makefile:130: all] Error 2
    

    I don't understand how Clang picks up the standard header files, or why there are errors when I include the headers from GCC 9 (or any version that has a stable filesystem library). However, when I compile with g++, it builds with no errors.

    discussion 
    opened by WageeshaR 7
Releases(v0.9.2)
  • v0.9.2(Feb 14, 2022)

    Changes compared to the previous release 0.9.1 (selection)

    The following is an incomplete list of changes and improvements:

    Highlights

    • Initial support for operating as a pure CUDA library for NVIDIA's proprietary nvc++ compiler, without any additional hipSYCL compiler magic. In this flow, LLVM is not required and new NVIDIA hardware can be targeted as soon as NVIDIA adds support in nvc++.
    • Initial dedicated compiler support in the CPU backend. These new compilation passes can greatly improve the performance of nd_range parallel_for kernels on CPU, allowing SYCL code to execute efficiently on any CPU supported by LLVM (see the sketch after this list).
    • Scoped parallelism API v2 for a more performance portable programming model
    • Reimplement explicit multipass support for clang >= 13. This allows targeting multiple backends simultaneously, and was previously only supported on clang 11. Kernel names in the binary are now always demangleable as __hipsycl_kernel<KernelNameT> or __hipsycl_kernel<KernelBodyT>.
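    For context, the kind of kernel these new CPU compilation passes accelerate is a plain nd_range parallel_for. A minimal sketch follows; sizes and the header name are chosen here for illustration, and unnamed kernel lambdas require a suitable clang:

        #include <sycl/sycl.hpp> // some hipSYCL versions use <CL/sycl.hpp>

        int main() {
          constexpr size_t n = 1024;
          sycl::queue q; // with the OpenMP backend, this runs on the CPU
          int* data = sycl::malloc_shared<int>(n, q);

          // nd_range kernels use work-group barriers, which are expensive to
          // emulate in a library-only CPU backend; the dedicated compiler
          // passes lower them to efficient loops instead.
          q.parallel_for(sycl::nd_range<1>{{n}, {128}}, [=](sycl::nd_item<1> it) {
            data[it.get_global_id(0)] = static_cast<int>(it.get_local_id(0));
          }).wait();

          sycl::free(data, q);
        }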

    SYCL support

    • Support for new SYCL 2020 features such as atomic_ref, the device selector API, the device aspect API and others (illustrated in the sketch after this list)
    • Support for SYCL 2020 final group algorithm interface
    • Add support for the profiling API
    • ... more
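    To illustrate some of the SYCL 2020 features listed above, a small sketch combining the device aspect API with sycl::atomic_ref (whether the aspect is present naturally depends on the device):

        #include <sycl/sycl.hpp>
        #include <iostream>

        int main() {
          sycl::queue q;

          // SYCL 2020 device aspect API
          if (!q.get_device().has(sycl::aspect::usm_shared_allocations))
            return 0;

          int* counter = sycl::malloc_shared<int>(1, q);
          *counter = 0;

          q.parallel_for(sycl::range<1>{1000}, [=](sycl::id<1>) {
            // SYCL 2020 atomic_ref
            sycl::atomic_ref<int, sycl::memory_order::relaxed,
                             sycl::memory_scope::device,
                             sycl::access::address_space::global_space>
                c{*counter};
            c.fetch_add(1);
          }).wait();

          std::cout << *counter << '\n'; // prints 1000
          sycl::free(counter, q);
        }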

    Extensions

    • Add initial support for multi-device queue hipSYCL extension to automatically distribute work across multiple devices
    • Add initial support for the queue::get_wait_list() hipSYCL extension to allow barrier-like semantics at the queue level (see the sketch below)
    • Add accessor_variant extension which allows accessors to automatically optimize the internal data layout of the accessor object depending on how they were constructed. This can save registers on device without any changes needed by the user.
    • Add handler::update_device() extension in analogy to the already existing update_host(). This can be used e.g. to prefetch data.
    • Complete buffer-USM interoperability API
    • Add support for explicit buffer policy extension and asynchronous buffers

    See the documentation on extensions for more details.
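    As an illustration of the queue::get_wait_list() extension, a hedged sketch of barrier-like semantics; the assumption here is that it returns the events of all still-pending operations on the queue as a std::vector<sycl::event>, which can then be passed to handler::depends_on() (see doc/extensions.md for the authoritative interface):

        #include <sycl/sycl.hpp>

        int main() {
          sycl::queue q;
          int* a = sycl::malloc_shared<int>(16, q);
          int* b = sycl::malloc_shared<int>(16, q);

          // Two independent operations
          q.parallel_for(sycl::range<1>{16}, [=](sycl::id<1> i) { a[i] = 1; });
          q.parallel_for(sycl::range<1>{16}, [=](sycl::id<1> i) { b[i] = 2; });

          // Assumption: returns events for everything submitted so far
          auto pending = q.get_wait_list();

          q.submit([&](sycl::handler& cgh) {
            cgh.depends_on(pending); // barrier-like semantics at queue level
            cgh.parallel_for(sycl::range<1>{16},
                             [=](sycl::id<1> i) { a[i] += b[i]; });
          }).wait();

          sycl::free(a, q);
          sycl::free(b, q);
        }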

    Optimizations

    • Automatic work distribution across multiple streams
    • Fix a massive performance problem caused by a bug in the kernel cache of the Level Zero backend
    • Optimize CUDA backend to perform aggressive CUDA module caching in an explicit multipass scenario. This can greatly improve performance of the cuda.explicit-multipass compilation flow when multiple translation units are involved.
    • Several performance fixes and improvements in the hipSYCL runtime. Especially when spawning many tasks, performance can now be significantly better.
    • ... more

    Bug fixes and other improvements

    Yes, a lot of them :-)

    Source code(tar.gz)
    Source code(zip)
  • v0.9.1(Mar 29, 2021)

    hipSYCL 0.9.1

    -- This release is dedicated to the memory of Oliver M. Some things just end too soon.

    New major features

    • Add new "explicit multipass" compilation model, allowing all of hipSYCL's backends to be targeted simultaneously. This means hipSYCL can now compile to a binary that can run on devices from multiple vendors. Details on the compilation flow can be found here: https://github.com/illuhad/hipSYCL/blob/develop/doc/compilation.md
    • Introduce plugin architecture for the backends of the hipSYCL runtime. hipSYCL now looks for backend plugins at runtime, so an existing hipSYCL installation can be extended with support for additional hardware without changing the already installed components.
    • Initial, experimental support for Intel GPUs using Level Zero and SPIR-V
    • Initial support for large portions of oneDPL using our fork at https://github.com/hipSYCL/oneDPL
    • hipSYCL is now also tested on Windows in CI, although Windows support is still experimental.

    New features and extensions

    • Command group properties that can influence how kernels or other operations are scheduled or executed:
      • hipSYCL_retarget command group property. Execute an operation submitted to a queue on an arbitrary device instead of the one the queue is bound to.
      • hipSYCL_prefer_group_size<Dim> command group property. Provides a recommendation to hipSYCL on which group size to choose for basic parallel_for kernels.
      • hipSYCL_prefer_execution_lane command group property. Provides a hint to the runtime on which backend queue (e.g. CUDA stream) an operation should be executed. This can be used to optimize kernel concurrency or overlap of data transfers and compute in case the hipSYCL scheduler does not already automatically submit an optimal configuration.
    • Comprehensive interoperability framework between buffers and USM pointers. This includes extracting USM pointers from existing buffer objects, turning any buffer into a collection of USM pointers, as well as constructing buffer objects on top of existing USM pointers.
    • The hipSYCL_page_size buffer property can be used to enable data state tracking inside a buffer at a granularity below the buffer size. This can be used to allow multiple kernels to concurrently write to the same buffer as long as they access different hipSYCL data management pages. Unlike subbuffers, this also works with multi-dimensional strided memory accesses.
    • Synchronous sycl::mem_advise() as free function
    • handler::prefetch_host() and queue::prefetch_host() for a simpler mechanism of prefetching USM allocations to host memory.
    • Explicit buffer policies to make programmer intent clearer, as well as asynchronous buffer types that do not block in the destructor, which can improve performance. For example, auto v = sycl::make_async_view(ptr, range) constructs a buffer that operates directly on the input pointer and does not block in the destructor (see the sketch after this list).
    • The HIPSYCL_VISIBILITY_MASK environment variable can be used to select which backends should be loaded.

    See https://github.com/illuhad/hipSYCL/blob/develop/doc/extensions.md for a list of all hipSYCL extensions with more details.
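    As an example of the explicit buffer policy extension, a hedged sketch built around the sycl::make_async_view(ptr, range) call quoted above; the assumptions (to be checked against the extension documentation) are that the returned view can be used like a buffer when constructing accessors and that the user is responsible for explicit synchronization, since the destructor does not block:

        #include <sycl/sycl.hpp>
        #include <vector>

        int main() {
          sycl::queue q;
          std::vector<float> data(1024, 1.0f);

          {
            // View operating directly on data.data(); per the release notes,
            // its destructor does not block.
            auto v = sycl::make_async_view(data.data(),
                                           sycl::range<1>{data.size()});

            q.submit([&](sycl::handler& cgh) {
              sycl::accessor acc{v, cgh, sycl::read_write};
              cgh.parallel_for(sycl::range<1>{data.size()},
                               [=](sycl::id<1> i) { acc[i] *= 2.0f; });
            });
          } // no implicit wait here, unlike a default sycl::buffer

          q.wait(); // synchronize explicitly before reading data on the host
        }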

    Optimizations and improvements

    • Hand-tuned optimizations for SYCL 2020 group algorithms
    • Automatic distribution of kernels across multiple CUDA/HIP streams
    • Improved support for newer ROCm versions
    • SYCL 2020 accessor deduction guides and host_accessor
    • Improve handling of multi-GPU setups
    • Significant performance improvements for queue::wait()
    • Early DAG optimizations to improve handling of complex and large dependency graphs
    • Optimizations to elide unnecessary synchronization between DAG nodes

    Bug fixes and other improvements

    Yes, a lot of them!

    Source code(tar.gz)
    Source code(zip)
  • v0.9.0(Dec 10, 2020)

    hipSYCL 0.9.0

    hipSYCL 0.9 is packed with tons of new features compared to the older 0.8 series:

    Support for key SYCL 2020 features

    hipSYCL 0.9.0 introduces support for several key SYCL 2020 features, including:

    • Unified shared memory provides a pointer-based memory model as an alternative to the traditional buffer-accessor model
    • SYCL 2020 generalized backend model and backend interoperability provides generic mechanisms for interoperability between the underlying backend objects and SYCL
    • Queue shortcuts for kernel invocation and USM memory management functions
    • Inorder queues to submit kernels in order when a task graph is not required
    • Unnamed kernel lambdas (requires building hipSYCL against clang >= 10)
    • Subgroups
    • Group algorithms for parallel primitives at work group and subgroup level (Note that the interface may change slightly with the release of SYCL 2020 final, optimization is ongoing)
    • Reductions provide a simple way to carry out arbitrary numbers of reduction operations across all work items of a kernel, using either predefined or user-provided reduction operators (note that the interface may change slightly with the release of SYCL 2020 final; optimization is ongoing). Currently only scalar reductions are supported, but multiple simultaneous reductions are possible. In addition to the requirements of the SYCL specification, reductions are also supported in the hierarchical and scoped parallelism models (see the sketch after this list).
    • ... and more! See here for more information on the SYCL 2020 coverage of current hipSYCL: https://github.com/hipSYCL/featuresupport
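    A sketch tying several of these features together (USM, an in-order queue, queue shortcuts and a scalar reduction). The reduction syntax below follows SYCL 2020 final; as noted above, the interface shipped in 0.9.0 pre-dates the final specification and may differ slightly:

        #include <sycl/sycl.hpp>
        #include <iostream>

        int main() {
          constexpr size_t n = 1 << 16;
          // In-order queue: kernels run in submission order, no task graph needed
          sycl::queue q{sycl::property::queue::in_order{}};

          int* data = sycl::malloc_shared<int>(n, q);
          int* sum  = sycl::malloc_shared<int>(1, q);
          *sum = 0;

          // Queue shortcut for kernel invocation
          q.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) { data[i] = 1; });

          // Scalar reduction across all work items
          q.parallel_for(sycl::range<1>{n}, sycl::reduction(sum, sycl::plus<int>{}),
                         [=](sycl::id<1> i, auto& s) { s += data[i]; });

          q.wait();
          std::cout << *sum << '\n'; // prints n

          sycl::free(data, q);
          sycl::free(sum, q);
        }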

    Unique hipSYCL extensions

    There are two new extensions in hipSYCL 0.9.0:

    • Enqueuing custom backend operations for highly efficient backend interoperability (a hedged sketch follows this list): https://github.com/illuhad/hipSYCL/blob/develop/doc/enqueue-custom-operation.md
    • Scoped parallelism is a novel kernel execution model designed for performance portability between host and device backends: https://github.com/illuhad/hipSYCL/blob/develop/doc/scoped-parallelism.md
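    A hedged sketch of what enqueuing a custom backend operation can look like; the exact spelling of the handler member, the backend enumerator and the interop_handle getters below are assumptions and should be verified against the linked document:

        #include <sycl/sycl.hpp>
        #include <cuda_runtime.h>

        // Submits a raw cudaMemcpyAsync as a node in the SYCL task graph.
        void async_copy(sycl::queue& q, float* dst, const float* src, size_t n) {
          q.submit([&](sycl::handler& cgh) {
            cgh.hipSYCL_enqueue_custom_operation([=](sycl::interop_handle& h) {
              // Runs on the backend queue (CUDA stream) chosen by the runtime
              auto stream = h.get_native_queue<sycl::backend::cuda>();
              cudaMemcpyAsync(dst, src, n * sizeof(float),
                              cudaMemcpyDeviceToDevice, stream);
            });
          });
        }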

    New runtime library

    hipSYCL 0.9.0 is the first release containing the entirely rewritten, brand new runtime library, which includes features such as:

    • Single library for all backends (libhipSYCL-rt) instead of libraries for each backend (libhipSYCL_cpu, libhipSYCL_cuda etc)
    • Strict separation between backend-specific code and generic code, with a clear, simple interface that makes it easy to add new backends in the future
    • Multiple runtime backends can now be active at the same time and interact (see the sketch after this list)
    • SYCL interface is now header-only; bootstrap mode in syclcc is no longer required and has been removed. When building hipSYCL, only the runtime needs to be compiled which can be done with any regular C++ compiler. This should simplify the build process greatly.
    • Architecture supports arbitrary execution models in different backends - queue/stream based, task graphs etc.
    • CUDA and CPU backends do not depend on HIP API anymore. The CUDA backend now goes directly to CUDA without going through HIP, and the CPU backend goes directly to OpenMP without going through hipCPU. hipCPU and HIP submodules are no longer required and have been removed.
    • Strict separation between SYCL interface and runtime, making it easy to expose new features (e.g. SYCL 2020) in the SYCL interface by leveraging the SYCL runtime interfaces underneath.
    • For each operation, the SYCL interface can pass additional information to the runtime/scheduler using a hints framework. The device on which an operation is executed is just another hint for the runtime.
    • Support for lazy DAG execution (Note: Only partially activated by default)
    • Almost entirely callback-free execution model in CUDA/ROCm backends for potentially higher task throughput
    • New memory management system and improved multi-GPU support
      • manages arbitrary allocations on multiple devices
      • manages memory potentially below buffer granularity, using a 3D page table to track invalid memory regions (not yet fully exposed)
    • Backend queues (e.g. CUDA streams) are maintained by the backend in a pool, the scheduler then distributes operations across the queues. No matter how many sycl::queues exist, compute/memory-overlap always works equally well. This means a sycl::queue is now nothing more than an interface to the runtime.
    • Vastly improved error handling. Proper implementation of async errors/error handlers. Task execution will be cancelled when an error is detected.
    • ROCm backend: Add support for 3D data transfers
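    The practical effect of the plugin-based, multi-backend runtime can be observed by simply enumerating platforms: every backend plugin that loads successfully contributes its devices (header name and output depend on the installation):

        #include <sycl/sycl.hpp> // older releases may use <CL/sycl.hpp>
        #include <iostream>

        int main() {
          // CPU, CUDA and ROCm backends can all be active at the same time;
          // each loaded backend shows up as a separate platform here.
          for (const auto& platform : sycl::platform::get_platforms()) {
            std::cout << platform.get_info<sycl::info::platform::name>() << '\n';
            for (const auto& dev : platform.get_devices())
              std::cout << "  " << dev.get_info<sycl::info::device::name>() << '\n';
          }
        }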

    syclcc and compilation improvements

    • New --hipsycl-targets flag that allows compiling for multiple targets and backends, e.g. syclcc --hipsycl-targets="omp;hip:gfx906,gfx900" compiles for the OpenMP backend as well as for Vega 10 and Vega 20. Note that simultaneous compilation for both NVIDIA and AMD GPUs is not supported due to clang limitations.
    • The compiler arguments and linker flags passed to backend compilers are now all exposed in cmake (and syclcc.json), giving the user more control to adapt the compilation flow to individual requirements. This can be helpful for uncommon setup scenarios where different flags may be required.

    Performance improvements

    • New execution model for nd_range parallel_for on CPU, bringing several orders of magnitude better performance. Note that nd_range parallel_for is inherently difficult to implement efficiently in library-only CPU backends; basic parallel_for or our scoped parallelism extension should be preferred when possible.

    Fixes and other improvements

    Yes, a lot of them :-)

    Source code(tar.gz)
    Source code(zip)
  • v0.8.0(Sep 24, 2019)

    Note: hipSYCL 0.8.0 is deprecated, users are encouraged to use our package repositories instead

    This is the release of hipSYCL 0.8.0. We provide the following packages:

    • hipSYCL-base provides the basic LLVM compiler stack that is needed in any case
    • hipSYCL-rocm provides a compatible ROCm stack that additionally allows hipSYCL to target AMD GPUs
    • hipSYCL provides the actual hipSYCL libraries, tools and headers

    While we cannot provide matching CUDA packages for NVIDIA support for legal reasons, scripts for installing a matching CUDA distribution as well as scripts to generate CUDA packages are provided. You will find further information in the README here on GitHub.

    At the moment, Arch Linux, CentOS 7 and Ubuntu 18.04 packages are provided.

    Source code(tar.gz)
    Source code(zip)
    hipSYCL-0.8.0-20190924-archlinux-x86_64.pkg.tar.xz(339.03 KB)
    hipSYCL-0.8.0-20190924-centos-7.x86_64.rpm(386.55 KB)
    hipSYCL-0.8.0-20190924-ubuntu-18.04.deb(342.65 KB)
    hipSYCL-base-0.8.0-20190924-archlinux-x86_64.pkg.tar.xz(246.50 MB)
    hipSYCL-base-0.8.0-20190924-centos-7.x86_64.rpm(357.18 MB)
    hipSYCL-base-0.8.0-20190924-ubuntu-18.04.deb(258.21 MB)
    hipSYCL-rocm-0.8.0-20190924-archlinux-x86_64.pkg.tar.xz(569.58 MB)
    hipSYCL-rocm-0.8.0-20190924-centos-7.x86_64.rpm(712.43 MB)
    hipSYCL-rocm-0.8.0-20190924-ubuntu-18.04.deb(606.87 MB)
  • v0.8.0-rc1(Sep 19, 2019)

    This is a prerelease of hipSYCL 0.8.0. In particular, it serves to test new packages of the entire hipSYCL stack. We provide the following packages:

    • hipSYCL-base provides the basic LLVM compiler stack that is needed in any case
    • hipSYCL-rocm provides a compatible ROCm stack that additionally allows hipSYCL to target AMD GPUs
    • hipSYCL provides the actual hipSYCL libraries, tools and headers

    While we cannot provide matching CUDA packages for legal reasons, CUDA installation scripts will be provided for the actual hipSYCL 0.8.0 release.

    At the moment, Arch Linux and Ubuntu 18.04 packages are provided.

    Source code(tar.gz)
    Source code(zip)
    hipSYCL-0.8.0-20190919-arch-x86_64.pkg.tar.xz(337.23 KB)
    hipSYCL-0.8.0-20190919-ubuntu-18.04.deb(340.08 KB)
    hipSYCL-base-0.8.0-20190919-arch-x86_64.pkg.tar.xz(240.88 MB)
    hipSYCL-base-0.8.0-20190919-ubuntu-18.04.deb(251.18 MB)
    hipSYCL-rocm-0.8.0-20190919-arch-x86_64.pkg.tar.xz(574.30 MB)
    hipSYCL-rocm-0.8.0-20190919-ubuntu-18.04.deb(612.35 MB)
Owner
Aksel Alpay