HIPIFY: Convert CUDA to Portable C++ Code

Overview

Tools to translate CUDA source code into portable HIP C++ automatically.

hipify-clang

hipify-clang is a clang-based tool for translating CUDA sources into HIP sources. It translates CUDA source into an abstract syntax tree, which is traversed by transformation matchers. After applying all the matchers, the output HIP source is produced.
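
For illustration, here is a minimal CUDA source and the HIP translation hipify-clang is expected to produce for it (the file and symbol names are invented for this sketch):

```cuda
// square.cu (input)
__global__ void square(float* a, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) a[i] = a[i] * a[i];
}

void run(float* host, int n) {
  float* dev;
  cudaMalloc(&dev, n * sizeof(float));
  cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
  square<<<(n + 255) / 256, 256>>>(dev, n);
  cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(dev);
}

// In the output, hipify-clang adds #include <hip/hip_runtime.h>, renames the
// runtime calls (cudaMalloc -> hipMalloc, cudaMemcpy -> hipMemcpy,
// cudaFree -> hipFree, cudaMemcpyHostToDevice -> hipMemcpyHostToDevice),
// and rewrites the kernel launch:
//   hipLaunchKernelGGL(square, dim3((n + 255) / 256), dim3(256), 0, 0, dev, n);
```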

Advantages:

  1. It is a real translator; thus, even very complicated constructs are either parsed and translated successfully, or an error is reported.
  2. It supports clang options like -I, -D, --cuda-path, etc.
  3. Seamless support of new CUDA versions, as that is clang's responsibility.
  4. Easy to maintain.

Disadvantages:

  1. The main advantage is also the main disadvantage: the input CUDA code must be correct; incorrect code will not be translated to HIP.
  2. CUDA must be installed; with multiple installations, the desired one must be selected with the --cuda-path option.
  3. All includes and defines must be provided for the code to be transformed successfully.
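
As an example of the third point, a source that guards device code behind a macro is translated fully only when that macro is supplied on the command line (USE_DOUBLE and the kernel below are invented for this sketch):

```cuda
// kernel.cu: without -D USE_DOUBLE, the double-precision branch is never
// parsed, so hipify-clang cannot translate the CUDA constructs inside it.
#ifdef USE_DOUBLE
__global__ void axpy(double a, double* x, double* y) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  y[i] += a * x[i];
}
#else
__global__ void axpy(float a, float* x, float* y) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  y[i] += a * x[i];
}
#endif
// To translate both configurations, run hipify-clang once per configuration:
//   ./hipify-clang kernel.cu -D USE_DOUBLE --cuda-path=/usr/local/cuda-11.4
```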

hipify-clang: dependencies

hipify-clang requires:

  1. LLVM+CLANG of at least version 3.8.0; the latest stable and recommended release is 12.0.1.

  2. CUDA of at least version 7.0; the latest supported version is 11.4.

| LLVM release version | CUDA latest supported version | Windows | Linux |
|:---|:---|:---|:---|
| 3.8.0*, 3.8.1*, 3.9.0*, 3.9.1* | 7.5 | + | + |
| 4.0.0, 4.0.1, 5.0.0, 5.0.1, 5.0.2 | 8.0 | + | + |
| 6.0.0, 6.0.1 | 9.0 | + | + |
| 7.0.0, 7.0.1, 7.1.0 | 9.2 | works only with the patch due to the clang's bug 38811: patch for 7.0.0**, patch for 7.0.1**, patch for 7.1.0** | - (not working due to the clang's bug 36384) |
| 8.0.0, 8.0.1 | 10.0 | works only with the patch due to the clang's bug 38811: patch for 8.0.0**, patch for 8.0.1** | + |
| 9.0.0, 9.0.1 | 10.1 | + | + |
| 10.0.0, 10.0.1 | 11.0 (CUDA 11.0.1, 11.1.0, 11.1.1 work only with the patch due to the clang's bug 47332: patch for 10.0.0***, patch for 10.0.1***) | + | + |
| 11.0.0 | 11.0 (CUDA 11.0.1, 11.1.0, 11.1.1 work only with the patch due to the clang's bug 47332: patch for 11.0.0***) | + | + |
| 11.0.1, 11.1.0 | 11.2.2 | + | + |
| 12.0.0, 12.0.1 | 11.4 | LATEST STABLE CONFIG | LATEST STABLE CONFIG |

* LLVM 3.x is not supported anymore but might still work.

** Download the patch and unpack it into your LLVM distribution directory: a few header files will be overwritten; rebuilding LLVM is not needed.

*** Download the patch and unpack it into your LLVM source directory: the file Cuda.cpp will be overwritten; LLVM must then be rebuilt.

In most cases, you can get a suitable version of LLVM+CLANG with your package manager.

Failing that, or if you have multiple versions of LLVM, you can download a release archive, build or install it, and set CMAKE_PREFIX_PATH so cmake can find it; for instance: -DCMAKE_PREFIX_PATH=d:\LLVM\12.0.1\dist

hipify-clang: usage

To process a file, hipify-clang needs access to the same headers that would be required to compile it with clang.

For example:

./hipify-clang square.cu --cuda-path=/usr/local/cuda-11.4 -I /usr/local/cuda-11.4/samples/common/inc

hipify-clang arguments are given first, followed by a separator '--', and then the arguments you'd pass to clang if you were compiling the input file. For example:

./hipify-clang cpp17.cu --cuda-path=/usr/local/cuda-11.4 -- -std=c++17

The Clang manual for compiling CUDA may be useful.

To automate hipification somewhat (starting with clang 8.0.0), a Compilation Database in JSON format can also be provided in the compile_commands.json file:

-p <folder containing compile_commands.json> or
-p=<folder containing compile_commands.json>

The compilation database should be provided in the compile_commands.json file or generated by clang based on cmake; the options separator '--' must not be used in this mode.
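
A minimal compile_commands.json entry might look like the following (all paths here are invented for illustration; cmake emits this file when configured with -DCMAKE_EXPORT_COMPILE_COMMANDS=ON):

```json
[
  {
    "directory": "/home/user/project/build",
    "command": "clang++ -x cuda --cuda-path=/usr/local/cuda-11.4 -I/home/user/project/include -c /home/user/project/src/square.cu",
    "file": "/home/user/project/src/square.cu"
  }
]
```

hipify-clang would then be invoked with -p pointing at the folder containing this file, and without the '--' separator.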

For a list of hipify-clang options, run hipify-clang --help.

hipify-clang: building

mkdir build dist
cd build

cmake \
 -DCMAKE_INSTALL_PREFIX=../dist \
 -DCMAKE_BUILD_TYPE=Release \
 ..

make -j install

On Windows, the generator options should be specified for cmake first: -G "Visual Studio 16 2019" -A x64; the generated hipify-clang.sln should then be built by Visual Studio 16 2019 instead of make. Please see hipify-clang: Windows for the supported tools for building.

Debug build type -DCMAKE_BUILD_TYPE=Debug is also supported and tested; LLVM+CLANG should be built in Debug mode as well. 64-bit build mode (-Thost=x64 on Windows) is also supported; LLVM+CLANG should be built in 64-bit mode as well.

The binary can then be found at ./dist/bin/hipify-clang.

hipify-clang: testing

hipify-clang has unit tests using LLVM lit/FileCheck.

LLVM+CLANG should be built from sources, as the pre-built binaries do not include everything needed for testing. Before building, ensure that the software required for building is of an appropriate version.

LLVM 9.0.1 or older:

  1. download LLVM+CLANG sources;
  2. build LLVM+CLANG:

Linux:

     cmake \
      -DCMAKE_INSTALL_PREFIX=../dist \
      -DLLVM_SOURCE_DIR=../llvm \
      -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" \
      -DCMAKE_BUILD_TYPE=Release \
      ../llvm
     make -j install

Windows:

     cmake \
      -G "Visual Studio 16 2019" \
      -A x64 \
      -DCMAKE_INSTALL_PREFIX=../dist \
      -DLLVM_SOURCE_DIR=../llvm \
      -DLLVM_TARGETS_TO_BUILD="NVPTX" \
      -DCMAKE_BUILD_TYPE=Release \
      -Thost=x64 \
      ../llvm

Run Visual Studio 16 2019, open the generated LLVM.sln, build all, build project INSTALL.

LLVM 10.0.0 or newer:

  1. download LLVM project sources;
  2. build LLVM project:

Linux:

     cmake \
      -DCMAKE_INSTALL_PREFIX=../dist \
      -DLLVM_SOURCE_DIR=../llvm-project \
      -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" \
      -DLLVM_ENABLE_PROJECTS="clang;compiler-rt" \
      -DLLVM_TEMPORARILY_ALLOW_OLD_TOOLCHAIN=ON \
      -DCMAKE_BUILD_TYPE=Release \
      ../llvm-project/llvm
     make -j install

Windows:

     cmake \
      -G "Visual Studio 16 2019" \
      -A x64 \
      -DCMAKE_INSTALL_PREFIX=../dist \
      -DLLVM_SOURCE_DIR=../llvm-project \
      -DLLVM_TARGETS_TO_BUILD="NVPTX" \
      -DLLVM_ENABLE_PROJECTS="clang;compiler-rt" \
      -DLLVM_TEMPORARILY_ALLOW_OLD_TOOLCHAIN=ON \
      -DCMAKE_BUILD_TYPE=Release \
      -Thost=x64 \
      ../llvm-project/llvm

Run Visual Studio 16 2019, open the generated LLVM.sln, build all, build project INSTALL.

  1. Ensure CUDA of minimum version 7.0 is installed.

    • With multiple CUDA installations, a particular version should be selected with the CUDA_TOOLKIT_ROOT_DIR option:

      • Linux: -DCUDA_TOOLKIT_ROOT_DIR=/usr/include

      • Windows: -DCUDA_TOOLKIT_ROOT_DIR="c:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.4"

        -DCUDA_SDK_ROOT_DIR="c:/ProgramData/NVIDIA Corporation/CUDA Samples/v11.4"

  2. Ensure cuDNN of the version corresponding to CUDA's version is installed.

    • Path to cuDNN should be specified by the CUDA_DNN_ROOT_DIR option:

      • Linux: -DCUDA_DNN_ROOT_DIR=/usr/include

      • Windows: -DCUDA_DNN_ROOT_DIR=d:/CUDNN/cudnn-11.4-windows-x64-v8.2.2

  3. Ensure CUB of the version corresponding to CUDA's version is installed.

    • Path to CUB should be specified by the CUDA_CUB_ROOT_DIR option:

      • Linux: -DCUDA_CUB_ROOT_DIR=/srv/git/CUB

      • Windows: -DCUDA_CUB_ROOT_DIR=d:/GIT/cub

  4. Ensure Python of at least version 2.7 is installed.

  5. Ensure lit and FileCheck are installed - these are distributed with LLVM.

    • Install lit into python:

      • Linux: python /usr/llvm/12.0.1/llvm-project/llvm/utils/lit/setup.py install

      • Windows: python d:/LLVM/12.0.1/llvm-project/llvm/utils/lit/setup.py install

    • Starting with LLVM 6.0.1, the path to the llvm-lit python script should be specified with the LLVM_EXTERNAL_LIT option:

      • Linux: -DLLVM_EXTERNAL_LIT=/usr/llvm/12.0.1/build/bin/llvm-lit

      • Windows: -DLLVM_EXTERNAL_LIT=d:/LLVM/12.0.1/build/Release/bin/llvm-lit.py

    • FileCheck:

      • Linux: copy from /usr/llvm/12.0.1/build/bin/ to CMAKE_INSTALL_PREFIX/dist/bin

      • Windows: copy from d:/LLVM/12.0.1/build/Release/bin to CMAKE_INSTALL_PREFIX/dist/bin

      • Or specify the path to FileCheck in CMAKE_INSTALL_PREFIX option

  6. Turn on the HIPIFY_CLANG_TESTS option: -DHIPIFY_CLANG_TESTS=1.

  7. Build and run tests:

hipify-clang: Linux

On Linux the following configurations are tested:

Ubuntu 14: LLVM 4.0.0 - 7.1.0, CUDA 7.0 - 9.0, cuDNN 5.0.5 - 7.6.5

Ubuntu 16-18: LLVM 8.0.0 - 12.0.1, CUDA 8.0 - 10.2, cuDNN 5.1.10 - 8.0.5

Ubuntu 20: LLVM 9.0.0 - 12.0.1, CUDA 8.0 - 11.4, cuDNN 5.1.10 - 8.2.2

Minimum build system requirements for the above configurations:

Python 2.7, cmake 3.5.1, GNU C/C++ 6.1.

Recommended build system requirements:

Python 3.9.5, cmake 3.20.2, GNU C/C++ 11.1.

Here is an example of building hipify-clang with testing support on Ubuntu 20.04.1:

cmake \
 -DHIPIFY_CLANG_TESTS=1 \
 -DCMAKE_BUILD_TYPE=Release \
 -DCMAKE_INSTALL_PREFIX=../dist \
 -DCMAKE_PREFIX_PATH=/usr/llvm/12.0.1/dist \
 -DCUDA_TOOLKIT_ROOT_DIR=/usr/include \
 -DCUDA_DNN_ROOT_DIR=/usr/include \
 -DCUDA_CUB_ROOT_DIR=/usr/CUB \
 -DLLVM_EXTERNAL_LIT=/usr/llvm/12.0.1/build/bin/llvm-lit \
 ..

A corresponding successful output:

-- The C compiler identification is GNU 9.3.0
-- The CXX compiler identification is GNU 9.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.2.11")
-- Found LLVM 12.0.1:
--    - CMake module path: /usr/llvm/12.0.1/dist/lib/cmake/llvm
--    - Include path     : /usr/llvm/12.0.1/dist/include
--    - Binary path      : /usr/llvm/12.0.1/dist/bin
-- Linker detection: GNU ld
-- Found PythonInterp: /usr/bin/python3.8 (found suitable version "3.8.5", minimum required is "2.7")
-- Found lit: /usr/local/bin/lit
-- Found FileCheck: /usr/llvm/12.0.1/dist/bin/FileCheck
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /usr/include (found version "11.4")
-- Configuring done
-- Generating done
-- Build files have been written to: /usr/hipify/build
make test-hipify

A corresponding successful output:

Running HIPify regression tests
========================================
CUDA 11.4 - will be used for testing
LLVM 12.0.1 - will be used for testing
x86_64 - Platform architecture
Linux 5.4.0-51-generic - Platform OS
64 - hipify-clang binary bitness
64 - python 3.8.5 binary bitness
========================================
-- Testing: 69 tests, 12 threads --
PASS: hipify :: unit_tests/casts/reinterpret_cast.cu (1 of 69)
PASS: hipify :: unit_tests/device/atomics.cu (2 of 69)
PASS: hipify :: unit_tests/compilation_database/cd_intro.cu (3 of 69)
PASS: hipify :: unit_tests/device/device_symbols.cu (4 of 69)
PASS: hipify :: unit_tests/device/math_functions.cu (5 of 69)
PASS: hipify :: unit_tests/headers/headers_test_01.cu (6 of 69)
PASS: hipify :: unit_tests/headers/headers_test_02.cu (7 of 69)
PASS: hipify :: unit_tests/headers/headers_test_03.cu (8 of 69)
PASS: hipify :: unit_tests/headers/headers_test_05.cu (9 of 69)
PASS: hipify :: unit_tests/headers/headers_test_06.cu (10 of 69)
PASS: hipify :: unit_tests/headers/headers_test_04.cu (11 of 69)
PASS: hipify :: unit_tests/headers/headers_test_07.cu (12 of 69)
PASS: hipify :: unit_tests/headers/headers_test_10.cu (13 of 69)
PASS: hipify :: unit_tests/headers/headers_test_11.cu (14 of 69)
PASS: hipify :: unit_tests/headers/headers_test_08.cu (15 of 69)
PASS: hipify :: unit_tests/kernel_launch/kernel_launch_01.cu (16 of 69)
PASS: hipify :: unit_tests/headers/headers_test_09.cu (17 of 69)
PASS: hipify :: unit_tests/libraries/CAFFE2/caffe2_02.cu (18 of 69)
PASS: hipify :: unit_tests/libraries/CAFFE2/caffe2_01.cu (19 of 69)
PASS: hipify :: unit_tests/libraries/cuBLAS/cublas_0_based_indexing.cu (20 of 69)
PASS: hipify :: unit_tests/libraries/CUB/cub_03.cu (21 of 69)
PASS: hipify :: unit_tests/libraries/CUB/cub_01.cu (22 of 69)
PASS: hipify :: unit_tests/libraries/CUB/cub_02.cu (23 of 69)
PASS: hipify :: unit_tests/libraries/cuBLAS/cublas_sgemm_matrix_multiplication.cu (24 of 69)
PASS: hipify :: unit_tests/libraries/cuBLAS/rocBLAS/cublas_0_based_indexing_rocblas.cu (25 of 69)
PASS: hipify :: unit_tests/libraries/cuBLAS/rocBLAS/cublas_1_based_indexing_rocblas.cu (26 of 69)
PASS: hipify :: unit_tests/libraries/cuBLAS/cublas_1_based_indexing.cu (27 of 69)
PASS: hipify :: unit_tests/libraries/cuComplex/cuComplex_Julia.cu (28 of 69)
PASS: hipify :: unit_tests/libraries/cuDNN/cudnn_softmax.cu (29 of 69)
PASS: hipify :: unit_tests/libraries/cuFFT/simple_cufft.cu (30 of 69)
PASS: hipify :: unit_tests/libraries/cuBLAS/rocBLAS/cublas_sgemm_matrix_multiplication_rocblas.cu (31 of 69)
PASS: hipify :: unit_tests/libraries/cuRAND/poisson_api_example.cu (32 of 69)
PASS: hipify :: unit_tests/libraries/cuSPARSE/cuSPARSE_03.cu (33 of 69)
PASS: hipify :: unit_tests/libraries/cuRAND/benchmark_curand_generate.cpp (34 of 69)
PASS: hipify :: unit_tests/libraries/cuRAND/benchmark_curand_kernel.cpp (35 of 69)
PASS: hipify :: unit_tests/libraries/cuSPARSE/cuSPARSE_04.cu (36 of 69)
PASS: hipify :: unit_tests/libraries/cuSPARSE/cuSPARSE_05.cu (37 of 69)
PASS: hipify :: unit_tests/libraries/cuSPARSE/cuSPARSE_06.cu (38 of 69)
PASS: hipify :: unit_tests/libraries/cuSPARSE/cuSPARSE_07.cu (39 of 69)
PASS: hipify :: unit_tests/libraries/cuSPARSE/cuSPARSE_08.cu (40 of 69)
PASS: hipify :: unit_tests/libraries/cuSPARSE/cuSPARSE_09.cu (41 of 69)
PASS: hipify :: unit_tests/libraries/cuSPARSE/cuSPARSE_11.cu (42 of 69)
PASS: hipify :: unit_tests/namespace/ns_kernel_launch.cu (43 of 69)
PASS: hipify :: unit_tests/libraries/cuSPARSE/cuSPARSE_10.cu (44 of 69)
PASS: hipify :: unit_tests/pp/pp_if_else_conditionals.cu (45 of 69)
PASS: hipify :: unit_tests/pp/pp_if_else_conditionals_01.cu (46 of 69)
PASS: hipify :: unit_tests/pp/pp_if_else_conditionals_01_LLVM_10.cu (47 of 69)
PASS: hipify :: unit_tests/pp/pp_if_else_conditionals_LLVM_10.cu (48 of 69)
PASS: hipify :: unit_tests/samples/2_Cookbook/11_texture_driver/tex2dKernel.cpp (49 of 69)
PASS: hipify :: unit_tests/samples/2_Cookbook/0_MatrixTranspose/MatrixTranspose.cpp (50 of 69)
PASS: hipify :: unit_tests/samples/2_Cookbook/11_texture_driver/texture2dDrv.cpp (51 of 69)
PASS: hipify :: unit_tests/samples/2_Cookbook/13_occupancy/occupancy.cpp (52 of 69)
PASS: hipify :: unit_tests/samples/2_Cookbook/1_hipEvent/hipEvent.cpp (53 of 69)
PASS: hipify :: unit_tests/samples/2_Cookbook/2_Profiler/Profiler.cpp (54 of 69)
PASS: hipify :: unit_tests/samples/2_Cookbook/7_streams/stream.cpp (55 of 69)
PASS: hipify :: unit_tests/samples/MallocManaged.cpp (56 of 69)
PASS: hipify :: unit_tests/samples/2_Cookbook/8_peer2peer/peer2peer.cpp (57 of 69)
PASS: hipify :: unit_tests/samples/allocators.cu (58 of 69)
PASS: hipify :: unit_tests/samples/coalescing.cu (59 of 69)
PASS: hipify :: unit_tests/samples/dynamic_shared_memory.cu (60 of 69)
PASS: hipify :: unit_tests/samples/axpy.cu (61 of 69)
PASS: hipify :: unit_tests/samples/cudaRegister.cu (62 of 69)
PASS: hipify :: unit_tests/samples/intro.cu (63 of 69)
PASS: hipify :: unit_tests/samples/square.cu (64 of 69)
PASS: hipify :: unit_tests/samples/static_shared_memory.cu (65 of 69)
PASS: hipify :: unit_tests/samples/vec_add.cu (66 of 69)
PASS: hipify :: unit_tests/kernel_launch/kernel_launch_syntax.cu (67 of 69)
PASS: hipify :: unit_tests/synthetic/driver_structs.cu (68 of 69)
PASS: hipify :: unit_tests/synthetic/driver_enums.cu (69 of 69)
Testing Time: 3.02s
  Expected Passes    : 69
[100%] Built target test-hipify

hipify-clang: Windows

Tested configurations:

| LLVM | CUDA | cuDNN | Visual Studio (latest) | cmake | Python |
|:---|:---|:---|:---|:---|:---|
| 4.0.0 - 5.0.2 | 8.0 | 5.1.10 - 7.1.4 | 2015.14.0, 2017.15.5.2 | 3.5.1, 3.18.0 | 3.6.4, 3.8.5 |
| 6.0.0 - 6.0.1 | 9.0 | 7.0.5 - 7.6.5 | 2015.14.0, 2017.15.5.5 | 3.6.0, 3.18.0 | 3.7.2, 3.8.5 |
| 7.0.0 - 7.1.0 | 9.2 | 7.6.5 | 2017.15.9.11 | 3.13.3, 3.18.0 | 3.7.3, 3.8.5 |
| 8.0.0 - 8.0.1 | 10.0 | 7.6.5 | 2017.15.9.15 | 3.14.2, 3.18.0 | 3.7.4, 3.8.5 |
| 9.0.0 - 9.0.1 | 10.1 | 7.6.5 | 2017.15.9.20, 2019.16.4.5 | 3.16.4, 3.18.0 | 3.8.0, 3.8.5 |
| 10.0.0 - 11.0.0 | 8.0 - 11.1 | 7.6.5 - 8.0.5 | 2017.15.9.30, 2019.16.8.3 | 3.19.2 | 3.9.1 |
| 11.0.1 - 11.1.0 | 8.0 - 11.2.2 | 7.6.5 - 8.0.5 | 2017.15.9.31, 2019.16.8.4 | 3.19.3 | 3.9.2 |
| 12.0.0 - 12.0.1 | 8.0 - 11.4 | 7.6.5 - 8.2.2 | 2017.15.9.37, 2019.16.10.4 | 3.21.0 | 3.9.6 |
| 13.0.0git | 8.0 - 11.4 | 7.6.5 - 8.2.2 | 2017.15.9.37, 2019.16.10.4 | 3.21.0 | 3.9.6 |

Building with testing support by Visual Studio 16 2019 on Windows 10:

cmake \
 -G "Visual Studio 16 2019" \
 -A x64 \
 -DHIPIFY_CLANG_TESTS=1 \
 -DCMAKE_BUILD_TYPE=Release \
 -DCMAKE_INSTALL_PREFIX=../dist \
 -DCMAKE_PREFIX_PATH=d:/LLVM/12.0.1/dist \
 -DCUDA_TOOLKIT_ROOT_DIR="c:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.4" \
 -DCUDA_SDK_ROOT_DIR="c:/ProgramData/NVIDIA Corporation/CUDA Samples/v11.4" \
 -DCUDA_DNN_ROOT_DIR=d:/CUDNN/cudnn-11.4-windows-x64-v8.2.2 \
 -DCUDA_CUB_ROOT_DIR=d:/GIT/cub \
 -DLLVM_EXTERNAL_LIT=d:/LLVM/12.0.1/build/Release/bin/llvm-lit.py \
 -Thost=x64 \
 ..

A corresponding successful output:

-- Found LLVM 12.0.1:
--    - CMake module path: d:/LLVM/12.0.1/dist/lib/cmake/llvm
--    - Include path     : d:/LLVM/12.0.1/dist/include
--    - Binary path      : d:/LLVM/12.0.1/dist/bin
-- Found PythonInterp: c:/Program Files/Python39/python.exe (found suitable version "3.9.5", minimum required is "3.6")
-- Found lit: c:/Program Files/Python39/Scripts/lit.exe
-- Found FileCheck: d:/LLVM/12.0.1/dist/bin/FileCheck.exe
-- Found CUDA: c:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.4 (found version "11.4")
-- Configuring done
-- Generating done
-- Build files have been written to: d:/hipify/build

Run Visual Studio 16 2019, open the generated hipify-clang.sln, build project test-hipify.

hipify-perl

hipify-perl is an autogenerated perl-based script which heavily uses regular expressions.

Advantages:

  1. Ease of use.

  2. It doesn't check the input CUDA code for correctness, so even incomplete or non-compilable sources can be processed.

  3. It has no dependencies on third-party tools, including CUDA.

Disadvantages:

  1. Current inability (and the difficulty of implementing) to transform the following constructs:

    • macro expansion;

    • namespaces:

      • redefinitions of CUDA entities in user namespaces;

      • using directives;

    • templates (some cases);

    • distinguishing device/host function calls;

    • correct injection of header files;

    • parsing of complicated argument lists.

  2. Difficult to maintain.
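
The namespace limitation can be seen in a sketch like the following (the namespace and functions are invented for illustration): a regex-based pass rewrites every textual occurrence of a CUDA name, while the clang-based tool resolves each symbol first.

```cuda
// A user-defined allocator that happens to reuse a CUDA API name inside its
// own namespace. A purely textual substitution renames every occurrence of
// cudaMalloc, including my::cudaMalloc, whereas hipify-clang resolves the
// symbols via the AST and leaves the user's function untouched.
namespace my {
  void* cudaMalloc(size_t n);  // the user's own allocator, not the CUDA API
}

void f() {
  void* p = my::cudaMalloc(64);  // should stay untouched
  float* d;
  ::cudaMalloc(&d, 64);          // should become hipMalloc
}
```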

hipify-perl: usage

perl hipify-perl square.cu > square.cu.hip

hipify-perl: building

To generate hipify-perl, run hipify-clang --perl. The output directory for the generated hipify-perl file can be specified with the --o-hipify-perl-dir option.

Supported CUDA APIs

To generate the above documentation with the actual information about all supported CUDA APIs in Markdown format, run hipify-clang --md, optionally specifying an output directory with -o.

Disclaimer

The information contained herein is for informational purposes only, and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD's products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale.

AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

Copyright (c) 2016-2021 Advanced Micro Devices, Inc. All rights reserved.

Issues
  • [HIPIFY] hipify-clang terminates with error.

    After commit f19e7c29dfcef3d0c941c977ce6d1448ad4e288a, hipify-clang terminates with this error:

    ~/source/HIP/samples/0_Intro/square$ ~/software/hip/hipify-clang square.cu  -- -x cuda --cuda-path=$CUDA_PATH
    In file included from <built-in>:1:
    In file included from /home/kaveh/software/hip/include/__clang_cuda_runtime_wrapper.h:36:
    /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:537:3: error: constexpr function 'fpclassify' without __host__ or __device__ attributes cannot overload __device__ function with same signature.  Add a __host__ attribute, or build with -fno-cuda-host-device-constexpr.
      fpclassify(float __x)
      ^
    /home/kaveh/software/hip/include/__clang_cuda_math_forward_declares.h:248:9: note: conflicting __device__ function declared here
    using ::fpclassify;
            ^
    In file included from <built-in>:1:
    In file included from /home/kaveh/software/hip/include/__clang_cuda_runtime_wrapper.h:36:
    /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:542:3: error: constexpr function 'fpclassify' without __host__ or __device__ attributes cannot overload __device__ function with same signature.  Add a __host__ attribute, or build with -fno-cuda-host-device-constexpr.
      fpclassify(double __x)
      ^
    /home/kaveh/software/hip/include/__clang_cuda_math_forward_declares.h:248:9: note: conflicting __device__ function declared here
    using ::fpclassify;
            ^
    In file included from <built-in>:1:
    In file included from /home/kaveh/software/hip/include/__clang_cuda_runtime_wrapper.h:36:
    /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:562:3: error: constexpr function 'isfinite' without __host__ or __device__ attributes cannot overload __device__ function with same signature.  Add a __host__ attribute, or build with -fno-cuda-host-device-constexpr.
      isfinite(float __x)
      ^
    /home/kaveh/software/hip/include/__clang_cuda_math_forward_declares.h:252:9: note: conflicting __device__ function declared here
    using ::isfinite;
            ^
    In file included from <built-in>:1:
    In file included from /home/kaveh/software/hip/include/__clang_cuda_runtime_wrapper.h:36:
    /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:566:3: error: constexpr function 'isfinite' without __host__ or __device__ attributes cannot overload __device__ function with same signature.  Add a __host__ attribute, or build with -fno-cuda-host-device-constexpr.
      isfinite(double __x)
      ^
    /home/kaveh/software/hip/include/__clang_cuda_math_forward_declares.h:252:9: note: conflicting __device__ function declared here
    using ::isfinite;
            ^
    In file included from <built-in>:1:
    In file included from /home/kaveh/software/hip/include/__clang_cuda_runtime_wrapper.h:36:
    /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:584:3: error: constexpr function 'isinf' without __host__ or __device__ attributes cannot overload __device__ function with same signature.  Add a __host__ attribute, or build with -fno-cuda-host-device-constexpr.
      isinf(float __x)
      ^
    /home/kaveh/software/hip/include/__clang_cuda_math_forward_declares.h:255:9: note: conflicting __device__ function declared here
    using ::isinf;
            ^
    In file included from <built-in>:1:
    In file included from /home/kaveh/software/hip/include/__clang_cuda_runtime_wrapper.h:36:
    /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:592:3: error: constexpr function 'isinf' without __host__ or __device__ attributes cannot overload __device__ function with same signature.  Add a __host__ attribute, or build with -fno-cuda-host-device-constexpr.
      isinf(double __x)
      ^
    /home/kaveh/software/hip/include/__clang_cuda_math_forward_declares.h:255:9: note: conflicting __device__ function declared here
    using ::isinf;
            ^
    In file included from <built-in>:1:
    In file included from /home/kaveh/software/hip/include/__clang_cuda_runtime_wrapper.h:36:
    /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:611:3: error: constexpr function 'isnan' without __host__ or __device__ attributes cannot overload __device__ function with same signature.  Add a __host__ attribute, or build with -fno-cuda-host-device-constexpr.
      isnan(float __x)
      ^
    /home/kaveh/software/hip/include/__clang_cuda_math_forward_declares.h:259:9: note: conflicting __device__ function declared here
    using ::isnan;
            ^
    In file included from <built-in>:1:
    In file included from /home/kaveh/software/hip/include/__clang_cuda_runtime_wrapper.h:36:
    /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:619:3: error: constexpr function 'isnan' without __host__ or __device__ attributes cannot overload __device__ function with same signature.  Add a __host__ attribute, or build with -fno-cuda-host-device-constexpr.
      isnan(double __x)
      ^
    /home/kaveh/software/hip/include/__clang_cuda_math_forward_declares.h:259:9: note: conflicting __device__ function declared here
    using ::isnan;
            ^
    In file included from <built-in>:1:
    In file included from /home/kaveh/software/hip/include/__clang_cuda_runtime_wrapper.h:36:
    /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:638:3: error: constexpr function 'isnormal' without __host__ or __device__ attributes cannot overload __device__ function with same signature.  Add a __host__ attribute, or build with -fno-cuda-host-device-constexpr.
      isnormal(float __x)
      ^
    /home/kaveh/software/hip/include/__clang_cuda_math_forward_declares.h:260:9: note: conflicting __device__ function declared here
    using ::isnormal;
            ^
    In file included from <built-in>:1:
    In file included from /home/kaveh/software/hip/include/__clang_cuda_runtime_wrapper.h:36:
    /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:642:3: error: constexpr function 'isnormal' without __host__ or __device__ attributes cannot overload __device__ function with same signature.  Add a __host__ attribute, or build with -fno-cuda-host-device-constexpr.
      isnormal(double __x)
      ^
    /home/kaveh/software/hip/include/__clang_cuda_math_forward_declares.h:260:9: note: conflicting __device__ function declared here
    using ::isnormal;
            ^
    In file included from <built-in>:1:
    In file included from /home/kaveh/software/hip/include/__clang_cuda_runtime_wrapper.h:36:
    /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:661:3: error: constexpr function 'signbit' without __host__ or __device__ attributes cannot overload __device__ function with same signature.  Add a __host__ attribute, or build with -fno-cuda-host-device-constexpr.
      signbit(float __x)
      ^
    /home/kaveh/software/hip/include/__clang_cuda_math_forward_declares.h:287:9: note: conflicting __device__ function declared here
    using ::signbit;
            ^
    In file included from <built-in>:1:
    In file included from /home/kaveh/software/hip/include/__clang_cuda_runtime_wrapper.h:36:
    /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:665:3: error: constexpr function 'signbit' without __host__ or __device__ attributes cannot overload __device__ function with same signature.  Add a __host__ attribute, or build with -fno-cuda-host-device-constexpr.
      signbit(double __x)
      ^
    /home/kaveh/software/hip/include/__clang_cuda_math_forward_declares.h:287:9: note: conflicting __device__ function declared here
    using ::signbit;
            ^
    In file included from <built-in>:1:
    In file included from /home/kaveh/software/hip/include/__clang_cuda_runtime_wrapper.h:36:
    /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:683:3: error: constexpr function 'isgreater' without __host__ or __device__ attributes cannot overload __device__ function with same signature.  Add a __host__ attribute, or build with -fno-cuda-host-device-constexpr.
      isgreater(float __x, float __y)
      ^
    /home/kaveh/software/hip/include/__clang_cuda_math_forward_declares.h:253:9: note: conflicting __device__ function declared here
    using ::isgreater;
            ^
    In file included from <built-in>:1:
    In file included from /home/kaveh/software/hip/include/__clang_cuda_runtime_wrapper.h:36:
    /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:687:3: error: constexpr function 'isgreater' without __host__ or __device__ attributes cannot overload __device__ function with same signature.  Add a __host__ attribute, or build with -fno-cuda-host-device-constexpr.
      isgreater(double __x, double __y)
      ^
    /home/kaveh/software/hip/include/__clang_cuda_math_forward_declares.h:253:9: note: conflicting __device__ function declared here
    using ::isgreater;
            ^
    In file included from <built-in>:1:
    In file included from /home/kaveh/software/hip/include/__clang_cuda_runtime_wrapper.h:36:
    /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:709:3: error: constexpr function 'isgreaterequal' without __host__ or __device__ attributes cannot overload __device__ function with same signature.  Add a __host__ attribute, or build with -fno-cuda-host-device-constexpr.
      isgreaterequal(float __x, float __y)
      ^
    /home/kaveh/software/hip/include/__clang_cuda_math_forward_declares.h:254:9: note: conflicting __device__ function declared here
    using ::isgreaterequal;
            ^
    In file included from <built-in>:1:
    In file included from /home/kaveh/software/hip/include/__clang_cuda_runtime_wrapper.h:36:
    /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:713:3: error: constexpr function 'isgreaterequal' without __host__ or __device__ attributes cannot overload __device__ function with same signature.  Add a __host__ attribute, or build with -fno-cuda-host-device-constexpr.
      isgreaterequal(double __x, double __y)
      ^
    /home/kaveh/software/hip/include/__clang_cuda_math_forward_declares.h:254:9: note: conflicting __device__ function declared here
    using ::isgreaterequal;
            ^
    In file included from <built-in>:1:
    In file included from /home/kaveh/software/hip/include/__clang_cuda_runtime_wrapper.h:36:
    /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:735:3: error: constexpr function 'isless' without __host__ or __device__ attributes cannot overload __device__ function with same signature.  Add a __host__ attribute, or build with -fno-cuda-host-device-constexpr.
      isless(float __x, float __y)
      ^
    /home/kaveh/software/hip/include/__clang_cuda_math_forward_declares.h:256:9: note: conflicting __device__ function declared here
    using ::isless;
            ^
    In file included from <built-in>:1:
    In file included from /home/kaveh/software/hip/include/__clang_cuda_runtime_wrapper.h:36:
    /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:739:3: error: constexpr function 'isless' without __host__ or __device__ attributes cannot overload __device__ function with same signature.  Add a __host__ attribute, or build with -fno-cuda-host-device-constexpr.
      isless(double __x, double __y)
      ^
    /home/kaveh/software/hip/include/__clang_cuda_math_forward_declares.h:256:9: note: conflicting __device__ function declared here
    using ::isless;
            ^
    In file included from <built-in>:1:
    In file included from /home/kaveh/software/hip/include/__clang_cuda_runtime_wrapper.h:36:
    /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:761:3: error: constexpr function 'islessequal' without __host__ or __device__ attributes cannot overload __device__ function with same signature.  Add a __host__ attribute, or build with -fno-cuda-host-device-constexpr.
      islessequal(float __x, float __y)
      ^
    /home/kaveh/software/hip/include/__clang_cuda_math_forward_declares.h:257:9: note: conflicting __device__ function declared here
    using ::islessequal;
            ^
    fatal error: too many errors emitted, stopping now [-ferror-limit=]
    20 errors generated when compiling for host.
    Error while processing /tmp/square.cu-937183.hip.
    
    bug 
    opened by Kaveh01 28
  • [HIPIFY] hipify-clang fails for application hipify-perl can convert

    I tried to run hipify-clang on a simple program (attached for reference). Interac doesn't have any reference calls, and hipify-perl is able to convert it perfectly except that it uses hipLaunchKernel instead of hipLaunchKernelGGL. Unfortunately, hipify-clang is not able to convert it perfectly. Here are some of the errors I'm seeing with hipify-clang:

    • Doesn't convert kernel launches
    • The macros that take the block ID x/y/z and thread ID x/y/z fail with repeated error messages like:

    note: expanded from macro ‘THREAD_ID’
    #define THREAD_ID ( (THREADS_PER_BLOCK * BLOCK_ID) + threadIdx.x + (THREADS_PER_BLOCK_X * threadIdx.x) + (THREADS_PER_BLOCK_X * THREADS_PER_BLOCK_Y * threadIdx.z) )

    • [HIPIFY] warning: interac.cu:205:63: the following reference is not handled: 'pyHostToDevice' [enum constant ref].
    • [HIPIFY] warning: interac.cu:206:75: the following reference is not handled: 'pyHostToDevice' [enum constant ref].
    • [HIPIFY] warning: interac.cu:216:5: the following reference is not handled: 'CUevent_st' [struct var ptr].
    • [HIPIFY] warning: interac.cu:216:5: the following reference is not handled: 'CUevent_st' [struct var ptr].
    • [HIPIFY] warning: interac.cu:231:5: the following reference is not handled: 'adSynchronize' [function call].
    • [HIPIFY] warning: interac.cu:234:63: the following reference is not handled: 'pyDeviceToHost' [enum constant ref].
    • [HIPIFY] warning: interac.cu:237:5: the following reference is not handled: 'tElapsedTime' [function call].
    • [HIPIFY] warning: interac.cu:274:5: the following reference is not handled: 'tDestroy' [function call].
    • [HIPIFY] warning: interac.cu:275:5: the following reference is not handled: 'tDestroy' [function call].

    So it seems like hipify-clang is having some problems parsing the text. This is the command I ran: /opt/rocm/bin/hipify-clang -o interac.hip.cpp interac.cu -- -x cuda -I/opt/rocm/hip/include/ -I/usr/local/cuda-8.0/targets/x86_64-linux/include/

    Here is the hipify-clang info: (hipify-clang --version)

    LLVM (http://llvm.org/): LLVM version 3.8.0 Optimized build. Default target: x86_64-unknown-linux-gnu Host CPU: skylake

    So my questions are:

    1. Are these known issues with hipify-clang (kernel launch change, MACRO issues, parsing text for various CUDA calls)?
    2. Is it expected that we should have to run hipify-perl after running hipify-clang to get the remainder of the calls converted properly?

    Thanks, Matt interac.zip

    bug perl 
    opened by mattsinc 27
  • [HIPIFY] Incorrect HIP generated from CUDA for DNN based YOLO Algorithm

    Hi,

    I have been trying to run hipify-clang on a YOLO algorithm written using CUDA. However, hipify-clang does not correctly generate a HIP version of the CUDA file. Below are some of the errors I am facing:

    1. Some kernels are unable to convert keywords in CUDA:

           /u/p/r/preyesh/private/darknet_tiny_preyesh/src/activation_kernels.hip.cu:151:39: error: use of undeclared identifier 'gridDim'
           int id = (blockIdx.x + blockIdx.y*gridDim.x) * blockDim.x + threadIdx.x;

       But gridDim is a keyword in CUDA, so this error should not be flagged.

    2. Does not recognize #define:

           <command line>:1:11: note: expanded from here
           #define X 1
                     ^

    3. For kernels like batch_norm, we fixed the typecasting issues by replacing

           l.scales = calloc(c, sizeof(float));

       with

           l.scales = (float *)calloc(c, sizeof(float));

    However, even after these changes hipify-clang does not generate the correct code -- the .hip file which is generated is either empty or exactly the same as the .cu file. We are using roc-1.6.x branch of hipify-clang as that is the version compatible with the gem5 simulator.
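    The cast in item 3 is required because C++, unlike C, forbids implicit conversion from void * (the return type of calloc) to other pointer types. A minimal standalone illustration (plain C++, no CUDA/HIP needed; the names scales and c are stand-ins borrowed from the issue):

    ```cpp
    #include <cassert>
    #include <cstdio>
    #include <cstdlib>

    int main() {
        const size_t c = 8;  // element count, as in the issue's batch_norm layer

        // In C, `float *scales = calloc(c, sizeof(float));` compiles.
        // In C++, the explicit cast is mandatory: void* does not convert
        // implicitly to float*.
        float *scales = (float *)std::calloc(c, sizeof(float));
        assert(scales != nullptr);
        assert(scales[0] == 0.0f);  // calloc also zero-initialises the block

        std::free(scales);
        std::puts("ok");
        return 0;
    }
    ```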

    Could you please tell us if we are missing something because of which we are not able to generate the .hip files correctly?

    Thank you, Rohan Mahapatra

    question 
    opened by rohanmahapatra 21
  • [HIPIFY] hipify-clang misbehaves in the presence of preprocessor directives

    Consider the following CUDA program:

    __global__ void axpy_kernel(float a, float* x, float* y) {
        y[threadIdx.x] = a * x[threadIdx.x];
    }
    
    void axpy(float a, float* x, float* y) {
        axpy_kernel<<<1, 4>>> (a, x, y);
    #ifdef SOME_MACRO
        axpy_kernel<<<1, 4>>> (a, x, y);
    #endif
    }
    

    The result of hipifying this, if you don't pass -DSOME_MACRO is:

    #include <hip/hip_runtime.h>
    __global__ void axpy_kernel(float a, float* x, float* y) {
        y[hipThreadIdx_x] = a * x[hipThreadIdx_x];
    }
    
    void axpy(float a, float* x, float* y) {
        hipLaunchKernelGGL(axpy_kernel, dim3(1), dim3(4), 0, 0, a, x, y);
    #ifdef SOME_MACRO
        axpy_kernel<<<1, 4>>> (a, x, y);
    #endif
    
    }
    

    Respecting conditional macros isn't the right thing to do with this sort of mechanised refactoring - what you really want to do is walk the entire tree applying your refactor, regardless of preprocessor conditionals.

    This is going to present a relatively nasty obstacle to people with complicated CUDA programs they want to translate...
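    What a full-tree rewrite should produce can be sketched as follows. This is a host-only illustration with stubbed HIP types so it compiles without a HIP installation (dim3 and hipLaunchKernelGGL here are minimal stand-ins, not the real HIP API); the point is that both launches, inside and outside the #ifdef, get converted:

    ```cpp
    #include <cstdio>

    // Minimal host-only stand-ins so this sketch compiles without HIP installed.
    struct dim3 {
        unsigned x, y, z;
        explicit dim3(unsigned x_) : x(x_), y(1), z(1) {}
    };
    // The real macro launches a device kernel; this stub just calls the function.
    #define hipLaunchKernelGGL(kernel, grid, block, shmem, stream, ...) \
        kernel(__VA_ARGS__)

    // Host stand-in for the __global__ kernel (4 "threads" become a loop).
    void axpy_kernel(float a, float* x, float* y) {
        for (int i = 0; i < 4; ++i) y[i] = a * x[i];
    }

    // Desired output of a preprocessor-aware hipify: BOTH branches converted.
    void axpy(float a, float* x, float* y) {
        hipLaunchKernelGGL(axpy_kernel, dim3(1), dim3(4), 0, 0, a, x, y);
    #ifdef SOME_MACRO
        hipLaunchKernelGGL(axpy_kernel, dim3(1), dim3(4), 0, 0, a, x, y);
    #endif
    }

    int main() {
        float x[4] = {1, 2, 3, 4}, y[4] = {0, 0, 0, 0};
        axpy(2.0f, x, y);
        std::printf("%g %g %g %g\n", y[0], y[1], y[2], y[3]);
        return 0;
    }
    ```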

    feature 
    opened by ChrisKitching 17
  • [HIPIFY] Statistics reporting seems to be completely broken

    • Passing -o-stats without also passing -print-stats silently does nothing. No stats anywhere, no warning, and no mention of this behaviour in the help text.
    • Passing both options together does produce statistics output. And a segfault.
    • Using -o-stats=wat.csv or -o-stats=wat.csv -print-stats with a nonexistent input source file causes hipify to silently do nothing and yield a 0 exit code.

    What's the intended use-case of the stats output? I'm struggling to see why someone would want to know how many bytes of source hipify changed, or the total number of calls replaced. If they want to know what hipify did to their code, diff -u is extremely good at that.

    Warnings about things it failed to convert make sense - but those should surely go through the clang diagnostics API, not this nonstandard output channel? Indeed, these are printed regardless of the stats option.

    The stats system accounts for about a third of the code (excluding the lookup tables), and it's tightly coupled into all sorts of random places to keep the counters in sync. Maybe just get rid of it?

    bug 
    opened by ChrisKitching 15
  • [HIPIFY] hipify-clang fails initializing global variable 'DashDash'

    I think my LLVM & Clang were correctly configured when I experienced this problem. The program raises an assertion failure on the DashDash option declared in ArgParse.cpp. Everything worked fine after I commented out that definition.

    The stack trace of the failure is as follows:

    #0  0x00007ffff6735428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
    #1  0x00007ffff673702a in abort () from /lib/x86_64-linux-gnu/libc.so.6
    #2  0x00007ffff672dbd7 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    #3  0x00007ffff672dc82 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
    #4  0x0000000001225379 in llvm::cl::Option::setArgStr (this=0x224df10 <DashDash>, S=...) at /home/lmy/llvm-6.0.1.src/lib/Support/CommandLine.cpp:365
    #5  0x00000000005985a0 in llvm::cl::applicator<char [2]>::opt<llvm::cl::opt<bool, false, llvm::cl::parser<bool> > > (Str=..., O=...)
        at /home/lmy/llvm/include/llvm/Support/CommandLine.h:1131
    #6  0x0000000000598545 in llvm::cl::apply<llvm::cl::opt<bool, false, llvm::cl::parser<bool> >, char [2], llvm::cl::desc, llvm::cl::value_desc, llvm::cl::cat> (
        O=0x224df10 <DashDash>, M=..., Ms=..., Ms=..., Ms=...) at /home/lmy/llvm/include/llvm/Support/CommandLine.h:1170
    #7  0x0000000000595489 in llvm::cl::opt<bool, false, llvm::cl::parser<bool> >::opt<char [2], llvm::cl::desc, llvm::cl::value_desc, llvm::cl::cat> (
        this=0x224df10 <DashDash>, Ms=..., Ms=..., Ms=..., Ms=...) at /home/lmy/llvm/include/llvm/Support/CommandLine.h:1360
    #8  0x000000000045a6ca in __cxx_global_var_init.43(void) () at /home/lmy/HIP/hipify-clang/src/ArgParse.cpp:92
    #9  0x000000000045ab0f in _GLOBAL__sub_I_ArgParse.cpp ()
    #10 0x0000000001599b9d in __libc_csu_init ()
    #11 0x00007ffff67207bf in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
    #12 0x0000000000594ed9 in _start ()
    

    And My platform info:

    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                40
    On-line CPU(s) list:   0-39
    Thread(s) per core:    2
    Core(s) per socket:    10
    Socket(s):             2
    NUMA node(s):          2
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 85
    Model name:            Intel(R) Xeon(R) Gold 5115 CPU @ 2.40GHz
    Stepping:              4
    CPU MHz:               1000.000
    CPU max MHz:           2401.0000
    CPU min MHz:           1000.0000
    BogoMIPS:              4801.51
    Virtualization:        VT-x
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              1024K
    L3 cache:              14080K
    NUMA node0 CPU(s):     0-9,20-29
    NUMA node1 CPU(s):     10-19,30-39
    Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt ssbd ibrs ibpb stibp kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
    

    I'm wondering if this bug has something to do with this platform.

    bug 
    opened by Tecelecta 14
  • [HIPIFY][install] HIP has conflicts with nvcc.

    Since commit f19e7c29dfcef3d0c941c977ce6d1448ad4e288a, hipcc cannot compile CUDA code on Nvidia machines; the compilation fails with this error:

    $ pwd
    /home/kaveh/source/HIP/samples/0_Intro/square
    
    $   hipcc square.cu
    In file included from /usr/local/cuda-10.1//include/driver_types.h:78:0,
                     from /usr/local/cuda-10.1//include/builtin_types.h:59,
                     from /usr/local/cuda-10.1//include/cuda_runtime.h:91,
                     from <command-line>:0:
    /home/kaveh/software/hip/include/stddef.h:18:19: error: missing binary operator before token "("
     #if !__has_feature(modules)
                       ^
    /home/kaveh/software/hip/include/stddef.h:30:42: error: missing binary operator before token "("
     #if !defined(_PTRDIFF_T) || __has_feature(modules)
                                              ^
    /home/kaveh/software/hip/include/stddef.h:41:39: error: missing binary operator before token "("
     #if !defined(_SIZE_T) || __has_feature(modules)
                                           ^
    /home/kaveh/software/hip/include/stddef.h:55:42: error: missing binary operator before token "("
          !defined(_RSIZE_T)) || __has_feature(modules)
                                              ^
    In file included from /usr/include/string.h:33:0,
                     from /usr/local/cuda-10.1//include/crt/common_functions.h:77,
                     from /usr/local/cuda-10.1//include/cuda_runtime.h:115,
                     from <command-line>:0:
    /home/kaveh/software/hip/include/stddef.h:41:39: error: missing binary operator before token "("
     #if !defined(_SIZE_T) || __has_feature(modules)
                                           ^
    In file included from /usr/include/strings.h:23:0,
                     from /usr/include/string.h:431,
                     from /usr/local/cuda-10.1//include/crt/common_functions.h:77,
                     from /usr/local/cuda-10.1//include/cuda_runtime.h:115,
                     from <command-line>:0:
    /home/kaveh/software/hip/include/stddef.h:41:39: error: missing binary operator before token "("
     #if !defined(_SIZE_T) || __has_feature(modules)
                                           ^
    In file included from /usr/include/time.h:29:0,
                     from /usr/local/cuda-10.1//include/crt/common_functions.h:78,
                     from /usr/local/cuda-10.1//include/cuda_runtime.h:115,
                     from <command-line>:0:
    /home/kaveh/software/hip/include/stddef.h:41:39: error: missing binary operator before token "("
     #if !defined(_SIZE_T) || __has_feature(modules)
                                           ^
    In file included from /usr/include/c++/7/bits/cxxabi_init_exception.h:38:0,
                     from /usr/include/c++/7/bits/exception_ptr.h:38,
                     from /usr/include/c++/7/exception:142,
                     from /usr/include/c++/7/new:40,
                     from /usr/local/cuda-10.1//include/crt/common_functions.h:103,
                     from /usr/local/cuda-10.1//include/cuda_runtime.h:115,
                     from <command-line>:0:
    /home/kaveh/software/hip/include/stddef.h:18:19: error: missing binary operator before token "("
     #if !__has_feature(modules)
                       ^
    /home/kaveh/software/hip/include/stddef.h:30:42: error: missing binary operator before token "("
     #if !defined(_PTRDIFF_T) || __has_feature(modules)
                                              ^
    /home/kaveh/software/hip/include/stddef.h:41:39: error: missing binary operator before token "("
     #if !defined(_SIZE_T) || __has_feature(modules)
                                           ^
    /home/kaveh/software/hip/include/stddef.h:55:42: error: missing binary operator before token "("
          !defined(_RSIZE_T)) || __has_feature(modules)
                                              ^
    In file included from /usr/include/stdio.h:33:0,
                     from /usr/local/cuda-10.1//include/crt/common_functions.h:128,
                     from /usr/local/cuda-10.1//include/cuda_runtime.h:115,
                     from <command-line>:0:
    /home/kaveh/software/hip/include/stddef.h:41:39: error: missing binary operator before token "("
     #if !defined(_SIZE_T) || __has_feature(modules)
                                           ^
    In file included from /usr/include/x86_64-linux-gnu/bits/_G_config.h:19:0,
                     from /usr/include/x86_64-linux-gnu/bits/libio.h:35,
                     from /usr/include/stdio.h:41,
                     from /usr/local/cuda-10.1//include/crt/common_functions.h:128,
                     from /usr/local/cuda-10.1//include/cuda_runtime.h:115,
                     from <command-line>:0:
    /home/kaveh/software/hip/include/stddef.h:41:39: error: missing binary operator before token "("
     #if !defined(_SIZE_T) || __has_feature(modules)
                                           ^
    In file included from /usr/include/stdlib.h:31:0,
                     from /usr/include/c++/7/cstdlib:75,
                     from /usr/include/c++/7/stdlib.h:36,
                     from /usr/local/cuda-10.1//include/crt/common_functions.h:129,
                     from /usr/local/cuda-10.1//include/cuda_runtime.h:115,
                     from <command-line>:0:
    /home/kaveh/software/hip/include/stddef.h:41:39: error: missing binary operator before token "("
     #if !defined(_SIZE_T) || __has_feature(modules)
                                           ^
    In file included from /usr/include/x86_64-linux-gnu/sys/types.h:145:0,
                     from /usr/include/stdlib.h:394,
                     from /usr/include/c++/7/cstdlib:75,
                     from /usr/include/c++/7/stdlib.h:36,
                     from /usr/local/cuda-10.1//include/crt/common_functions.h:129,
                     from /usr/local/cuda-10.1//include/cuda_runtime.h:115,
                     from <command-line>:0:
    /home/kaveh/software/hip/include/stddef.h:41:39: error: missing binary operator before token "("
     #if !defined(_SIZE_T) || __has_feature(modules)
                                           ^
    In file included from /usr/include/alloca.h:24:0,
                     from /usr/include/stdlib.h:566,
                     from /usr/include/c++/7/cstdlib:75,
                     from /usr/include/c++/7/stdlib.h:36,
                     from /usr/local/cuda-10.1//include/crt/common_functions.h:129,
                     from /usr/local/cuda-10.1//include/cuda_runtime.h:115,
                     from <command-line>:0:
    /home/kaveh/software/hip/include/stddef.h:41:39: error: missing binary operator before token "("
     #if !defined(_SIZE_T) || __has_feature(modules)
    

    I see the same error with nvcc if I add HIP's include path to my CPATH:

    
    $ nvcc square.cu  ### works fine!
    $ export CPATH=$CPATH:/home/kaveh/software/hip/include/
    $ nvcc square.cu
    In file included from /usr/local/cuda-10.1//bin/../targets/x86_64-linux/include/driver_types.h:78:0,
                     from /usr/local/cuda-10.1//bin/../targets/x86_64-linux/include/builtin_types.h:59,
                     from /usr/local/cuda-10.1//bin/../targets/x86_64-linux/include/cuda_runtime.h:91,
                     from <command-line>:0:
    /home/kaveh/software/hip/include/stddef.h:18:19: error: missing binary operator before token "("
     #if !__has_feature(modules)
                       ^
    /home/kaveh/software/hip/include/stddef.h:30:42: error: missing binary operator before token "("
     #if !defined(_PTRDIFF_T) || __has_feature(modules)
                                              ^
    /home/kaveh/software/hip/include/stddef.h:41:39: error: missing binary operator before token "("
     #if !defined(_SIZE_T) || __has_feature(modules)
                                           ^
    /home/kaveh/software/hip/include/stddef.h:55:42: error: missing binary operator before token "("
          !defined(_RSIZE_T)) || __has_feature(modules)
                                              ^
    In file included from /usr/include/string.h:33:0,
                     from /usr/local/cuda-10.1//bin/../targets/x86_64-linux/include/crt/common_functions.h:77,
                     from /usr/local/cuda-10.1//bin/../targets/x86_64-linux/include/cuda_runtime.h:115,
                     from <command-line>:0:
    /home/kaveh/software/hip/include/stddef.h:41:39: error: missing binary operator before token "("
     #if !defined(_SIZE_T) || __has_feature(modules)
                                           ^
    In file included from /usr/include/strings.h:23:0,
                     from /usr/include/string.h:431,
                     from /usr/local/cuda-10.1//bin/../targets/x86_64-linux/include/crt/common_functions.h:77,
                     from /usr/local/cuda-10.1//bin/../targets/x86_64-linux/include/cuda_runtime.h:115,
                     from <command-line>:0:
    /home/kaveh/software/hip/include/stddef.h:41:39: error: missing binary operator before token "("
     #if !defined(_SIZE_T) || __has_feature(modules)
                                           ^
    In file included from /usr/include/time.h:29:0,
                     from /usr/local/cuda-10.1//bin/../targets/x86_64-linux/include/crt/common_functions.h:78,
                     from /usr/local/cuda-10.1//bin/../targets/x86_64-linux/include/cuda_runtime.h:115,
                     from <command-line>:0:
    /home/kaveh/software/hip/include/stddef.h:41:39: error: missing binary operator before token "("
     #if !defined(_SIZE_T) || __has_feature(modules)
                                           ^
    In file included from /usr/include/c++/7/bits/cxxabi_init_exception.h:38:0,
                     from /usr/include/c++/7/bits/exception_ptr.h:38,
                     from /usr/include/c++/7/exception:142,
                     from /usr/include/c++/7/new:40,
                     from /usr/local/cuda-10.1//bin/../targets/x86_64-linux/include/crt/common_functions.h:103,
                     from /usr/local/cuda-10.1//bin/../targets/x86_64-linux/include/cuda_runtime.h:115,
                     from <command-line>:0:
    /home/kaveh/software/hip/include/stddef.h:18:19: error: missing binary operator before token "("
     #if !__has_feature(modules)
                       ^
    /home/kaveh/software/hip/include/stddef.h:30:42: error: missing binary operator before token "("
     #if !defined(_PTRDIFF_T) || __has_feature(modules)
                                              ^
    /home/kaveh/software/hip/include/stddef.h:41:39: error: missing binary operator before token "("
     #if !defined(_SIZE_T) || __has_feature(modules)
                                           ^
    /home/kaveh/software/hip/include/stddef.h:55:42: error: missing binary operator before token "("
          !defined(_RSIZE_T)) || __has_feature(modules)
                                              ^
    In file included from /usr/include/stdio.h:33:0,
                     from /usr/local/cuda-10.1//bin/../targets/x86_64-linux/include/crt/common_functions.h:128,
                     from /usr/local/cuda-10.1//bin/../targets/x86_64-linux/include/cuda_runtime.h:115,
                     from <command-line>:0:
    /home/kaveh/software/hip/include/stddef.h:41:39: error: missing binary operator before token "("
     #if !defined(_SIZE_T) || __has_feature(modules)
                                           ^
    In file included from /usr/include/x86_64-linux-gnu/bits/_G_config.h:19:0,
                     from /usr/include/x86_64-linux-gnu/bits/libio.h:35,
                     from /usr/include/stdio.h:41,
                     from /usr/local/cuda-10.1//bin/../targets/x86_64-linux/include/crt/common_functions.h:128,
                     from /usr/local/cuda-10.1//bin/../targets/x86_64-linux/include/cuda_runtime.h:115,
                     from <command-line>:0:
    /home/kaveh/software/hip/include/stddef.h:41:39: error: missing binary operator before token "("
     #if !defined(_SIZE_T) || __has_feature(modules)
                                           ^
    In file included from /usr/include/stdlib.h:31:0,
                     from /usr/include/c++/7/cstdlib:75,
                     from /usr/include/c++/7/stdlib.h:36,
                     from /usr/local/cuda-10.1//bin/../targets/x86_64-linux/include/crt/common_functions.h:129,
                     from /usr/local/cuda-10.1//bin/../targets/x86_64-linux/include/cuda_runtime.h:115,
                     from <command-line>:0:
    /home/kaveh/software/hip/include/stddef.h:41:39: error: missing binary operator before token "("
     #if !defined(_SIZE_T) || __has_feature(modules)
                                           ^
    In file included from /usr/include/x86_64-linux-gnu/sys/types.h:145:0,
                     from /usr/include/stdlib.h:394,
                     from /usr/include/c++/7/cstdlib:75,
                     from /usr/include/c++/7/stdlib.h:36,
                     from /usr/local/cuda-10.1//bin/../targets/x86_64-linux/include/crt/common_functions.h:129,
                     from /usr/local/cuda-10.1//bin/../targets/x86_64-linux/include/cuda_runtime.h:115,
                     from <command-line>:0:
    /home/kaveh/software/hip/include/stddef.h:41:39: error: missing binary operator before token "("
     #if !defined(_SIZE_T) || __has_feature(modules)
                                           ^
    In file included from /usr/include/alloca.h:24:0,
                     from /usr/include/stdlib.h:566,
                     from /usr/include/c++/7/cstdlib:75,
                     from /usr/include/c++/7/stdlib.h:36,
                     from /usr/local/cuda-10.1//bin/../targets/x86_64-linux/include/crt/common_functions.h:129,
                     from /usr/local/cuda-10.1//bin/../targets/x86_64-linux/include/cuda_runtime.h:115,
                     from <command-line>:0:
    /home/kaveh/software/hip/include/stddef.h:41:39: error: missing binary operator before token "("
     #if !defined(_SIZE_T) || __has_feature(modules)
    

    It seems nvcc is using HIP's stddef.h, which leads to this problem. The latest working commit is e1aac060da864beb2afbf29d39566546c8195e9a .
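    The root cause is that __has_feature is a Clang extension. GCC, which nvcc uses as its host compiler here, leaves it undefined, so #if !__has_feature(modules) parses as an ordinary identifier followed by "(" and produces exactly the "missing binary operator" errors above. The conventional portable guard looks like this (an illustrative sketch, not the actual HIP header):

    ```cpp
    #include <cstdio>

    // Compilers without the Clang extension treat __has_feature as an ordinary
    // identifier inside #if, which is a preprocessor syntax error. Defining a
    // fallback first makes the header portable: every feature reports "absent".
    #ifndef __has_feature
    #define __has_feature(x) 0
    #endif

    int main() {
    #if !__has_feature(modules)
        std::puts("modules feature not available");
    #else
        std::puts("modules feature available");
    #endif
        return 0;
    }
    ```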

    bug 
    opened by Kaveh01 12
  • [HIPIFY] hipify-clang and hipify-perl are not changing CUB lines to hipCUB.

    [HIPIFY] hipify-clang and hipify-perl are not changing CUB lines to hipCUB.

    hipify-clang and hipify-perl are not changing CUB lines.

    Hipified code:

    #include <hip/hip_runtime.h>
    #include <iostream>
    #include <hiprand.h>
    #include <cub/cub.cuh>
    
    template <typename T>
    __global__ void sort(const T* data_in, T* data_out){
        
        typedef cub::BlockRadixSort<T, 1024, 4> BlockRadixSortT;
        __shared__ typename BlockRadixSortT::TempStorage tmp_sort;
        
        
        double items[4];
        int i0 = 4 * (blockIdx.x * blockDim.x + threadIdx.x);
        for (int i = 0; i < 4; ++i){
            items[i] = data_in[i0 + i];
        }
        
        BlockRadixSortT(tmp_sort).Sort(items);
        
        for (int i = 0; i < 4; ++i){
            data_out[i0 + i] = items[i];
        }
    }
    
    int main(){
        double* d_gpu = NULL;
        double* result_gpu = NULL;
        double* data_sorted = new double[4096];
        // Allocate memory on the GPU
        hipMalloc(&d_gpu, 4096 * sizeof(double));
        hipMalloc(&result_gpu, 4096 * sizeof(double));
        
        hiprandGenerator_t gen;
        //     Create generator
        hiprandCreateGenerator(&gen, HIPRAND_RNG_PSEUDO_DEFAULT);
        //     Fill array with random numbers
        hiprandGenerateNormalDouble(gen, d_gpu, 4096, 0.0, 1.0);
        //     Destroy generator
        hiprandDestroyGenerator(gen);
        
        // Sort data    
        hipLaunchKernelGGL((sort), dim3(1), dim3(1024), 0, 0, d_gpu, result_gpu);
        
        hipMemcpy(data_sorted, result_gpu, 4096 * sizeof(double), hipMemcpyDeviceToHost);
        // Write the sorted data to standard out
        
        for (int i = 0; i < 4096; ++i){
            std::cout << data_sorted[i] << ", ";
        }
        std::cout << std::endl;
    }
    
    

    Working code:

    #include <hip/hip_runtime.h>
    #include <iostream>
    #include <hiprand.h>
    #include <hipcub/hipcub.hpp>   // THIS LINE
    template <typename T>
    __global__ void sort(const T* data_in, T* data_out){
        
        typedef hipcub::BlockRadixSort<T, 1024, 4> BlockRadixSortT; // THIS LINE
        __shared__ typename BlockRadixSortT::TempStorage tmp_sort;
        
        
        double items[4];
        int i0 = 4 * (blockIdx.x * blockDim.x + threadIdx.x);
        for (int i = 0; i < 4; ++i){
            items[i] = data_in[i0 + i];
        }
        
        BlockRadixSortT(tmp_sort).Sort(items);
        
        for (int i = 0; i < 4; ++i){
            data_out[i0 + i] = items[i];
        }
    }
    
    int main(){
        double* d_gpu = NULL;
        double* result_gpu = NULL;
        double* data_sorted = new double[4096];
        // Allocate memory on the GPU
        hipMalloc(&d_gpu, 4096 * sizeof(double));
        hipMalloc(&result_gpu, 4096 * sizeof(double));
        
        hiprandGenerator_t gen;
        //     Create generator
        hiprandCreateGenerator(&gen, HIPRAND_RNG_PSEUDO_DEFAULT);
        //     Fill array with random numbers
        hiprandGenerateNormalDouble(gen, d_gpu, 4096, 0.0, 1.0);
        //     Destroy generator
        hiprandDestroyGenerator(gen);
        
        // Sort data    
        hipLaunchKernelGGL(sort, dim3(1), dim3(1024), 0, 0, d_gpu, result_gpu);
        
        hipMemcpy(data_sorted, result_gpu, 4096 * sizeof(double), hipMemcpyDeviceToHost);
        // Write the sorted data to standard out
        
        for (int i = 0; i < 4096; ++i){
            std::cout << data_sorted[i] << ", ";
        }
        std::cout << std::endl;
    }
    

    Original CUDA code:

    #include <iostream>
    #include <curand.h>
    #include <cub/cub.cuh>
    
    template <typename T>
    __global__ void sort(const T* data_in, T* data_out){
        
        typedef cub::BlockRadixSort<T, 1024, 4> BlockRadixSortT;
        __shared__ typename BlockRadixSortT::TempStorage tmp_sort;
        
        
        double items[4];
        int i0 = 4 * (blockIdx.x * blockDim.x + threadIdx.x);
        for (int i = 0; i < 4; ++i){
            items[i] = data_in[i0 + i];
        }
        
        BlockRadixSortT(tmp_sort).Sort(items);
        
        for (int i = 0; i < 4; ++i){
            data_out[i0 + i] = items[i];
        }
    }
    
    int main(){
        double* d_gpu = NULL;
        double* result_gpu = NULL;
        double* data_sorted = new double[4096];
        // Allocate memory on the GPU
        cudaMalloc(&d_gpu, 4096 * sizeof(double));
        cudaMalloc(&result_gpu, 4096 * sizeof(double));
        
        curandGenerator_t gen;
        //     Create generator
        curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
        //     Fill array with random numbers
        curandGenerateNormalDouble(gen, d_gpu, 4096, 0.0, 1.0);
        //     Destroy generator
        curandDestroyGenerator(gen);
        
        // Sort data    
        sort<<<1, 1024>>>(d_gpu, result_gpu);
        
        cudaMemcpy(data_sorted, result_gpu, 4096 * sizeof(double), cudaMemcpyDeviceToHost);
        // Write the sorted data to standard out
        
        for (int i = 0; i < 4096; ++i){
            std::cout << data_sorted[i] << ", ";
        }
        std::cout << std::endl;
    }
    
    
    feature 
    opened by Kaveh01 12
  • [HIPIFY] Errors when converting CUDA sample to HIP?

    Hello, everyone. When I use hipify-clang to convert CUDA code to HIP, it produces the following error:

        /tmp/bcht.hpp-89daf2.hip:49:11: error: use of undeclared identifier 'cuda'
            cuda::thread_scope Scope = cuda::thread_scope_device,

    Code link: https://github.com/owensgroup/BGHT/blob/main/include/bcht.hpp

    Any ideas would be greatly appreciated. looking forward to your reply.

    question 
    opened by xianwujie 11
  • [HIPIFY] Multiple files in-place hipification issue

    [HIPIFY] Multiple files in-place hipification issue

    After using the hipconvertinplace-perl.sh command, I'm getting a warning.

    As far as I can tell, this is caused by the following kernel definition and its launch:

        template<class Scalar>
        __global__ void cudaEncode(const uint maxbits, const Scalar* scalars, Word *stream,
                                   const uint3 dims, const int3 stride,
                                   const uint3 padded_dims, const uint tot_blocks)
        {
            // Some commands written over here
        }
        . . .
        hipLaunchKernelGGL(cudaEncode, dim3(grid_size), dim3(block_size), 0, 0,
                           maxbits, d_data, stream, dims, stride, my_pad, my_blocks);

    Will you please tell me exactly what the problem is? Because of this warning, I get the following error while testing the application: Error: Missing metadata for global function

    question not a bug 
    opened by chinchoretej 11
  • [Question][HIPIFY][QUDA] I'm a developer on an open source CUDA based project called QUDA. Does anybody know of any previous activity to `hipify` QUDA by AMD?

    [Question][HIPIFY][QUDA] I'm a developer on an open source CUDA based project called QUDA. Does anybody know of any previous activity to `hipify` QUDA by AMD?

    The repo is here: https://github.com/lattice/quda

    I am obviously curious as to how this library can be made to run on the new Frontier machine at Oak Ridge. I'm not an NVIDIA employee, but I have extensive knowledge of QUDA and CUDA. If no efforts have been made so far to make QUDA work on AMD architecture, I will start the project as soon as possible. For the next few days, I'm simply trying to gather as much information as possible so time is not wasted going over old ground.

    Thank you!

    question 
    opened by cpviolator 11
  • [HIPIFY][build] ld.lld: error: undefined symbol: clang::RISCV::RVVIntrinsic::***

    [HIPIFY][build] ld.lld: error: undefined symbol: clang::RISCV::RVVIntrinsic::***

    Linking error occurs while linking against trunk LLVM 15.0.0git:

    ld.lld: error: undefined symbol: clang::RISCV::RVVIntrinsic::computeBuiltinTypes(llvm::ArrayRef<clang::RISCV::PrototypeDescriptor>, bool, bool, bool, unsigned int)
    >>> referenced by SemaRISCVVectorLookup.cpp
    >>>               SemaRISCVVectorLookup.cpp.o:((anonymous namespace)::RISCVIntrinsicManagerImpl::InitIntrinsicList()) in archive /long_pathname_so_that_rpms_can_package_the_debug_info/src/out/ubuntu-20.04/20.04/llvm/lib/libclangSema.a
    >>> referenced by SemaRISCVVectorLookup.cpp
    >>>               SemaRISCVVectorLookup.cpp.o:((anonymous namespace)::RISCVIntrinsicManagerImpl::InitIntrinsicList()) in archive /long_pathname_so_that_rpms_can_package_the_debug_info/src/out/ubuntu-20.04/20.04/llvm/lib/libclangSema.a
    
    ld.lld: error: undefined symbol: clang::RISCV::RVVType::computeTypes(clang::RISCV::BasicType, int, unsigned int, llvm::ArrayRef<clang::RISCV::PrototypeDescriptor>)
    >>> referenced by SemaRISCVVectorLookup.cpp
    >>>               SemaRISCVVectorLookup.cpp.o:((anonymous namespace)::RISCVIntrinsicManagerImpl::InitIntrinsicList()) in archive /long_pathname_so_that_rpms_can_package_the_debug_info/src/out/ubuntu-20.04/20.04/llvm/lib/libclangSema.a
    >>> referenced by SemaRISCVVectorLookup.cpp
    >>>               SemaRISCVVectorLookup.cpp.o:((anonymous namespace)::RISCVIntrinsicManagerImpl::InitIntrinsicList()) in archive /long_pathname_so_that_rpms_can_package_the_debug_info/src/out/ubuntu-20.04/20.04/llvm/lib/libclangSema.a
    
    ld.lld: error: undefined symbol: clang::RISCV::RVVIntrinsic::getSuffixStr[abi:cxx11](clang::RISCV::BasicType, int, llvm::ArrayRef<clang::RISCV::PrototypeDescriptor>)
    >>> referenced by SemaRISCVVectorLookup.cpp
    >>>               SemaRISCVVectorLookup.cpp.o:((anonymous namespace)::RISCVIntrinsicManagerImpl::InitIntrinsicList()) in archive /long_pathname_so_that_rpms_can_package_the_debug_info/src/out/ubuntu-20.04/20.04/llvm/lib/libclangSema.a
    >>> referenced by SemaRISCVVectorLookup.cpp
    >>>               SemaRISCVVectorLookup.cpp.o:((anonymous namespace)::RISCVIntrinsicManagerImpl::InitIntrinsicList()) in archive /long_pathname_so_that_rpms_can_package_the_debug_info/src/out/ubuntu-20.04/20.04/llvm/lib/libclangSema.a
    clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
    
    bug clang build 
    opened by emankov 0
  • [HIPIFY][feature] New hipification strategy: Single source file for CUDA/HIP code

    [HIPIFY][feature] New hipification strategy: Single source file for CUDA/HIP code

    The idea is to introduce yet another hipification approach: instead of producing a new hipified file or hipifying in place, perform hipification in the source CUDA file itself, keeping both CUDA and HIP code in it.

    [Example]

    #if GPU_PLATFORM == NVIDIA
      #include <cuda_runtime_api.h>
      #include <cublas_v2.h>
    #elif GPU_PLATFORM == AMD
      #include <hip/hip_runtime.h>
      #include <hip/hip_runtime_api.h>
      #include <hipblas.h>
      #define cudaError_t hipError_t
      #define cudaFree hipFree
      #define cudaMalloc hipMalloc
      #define cudaMemcpy hipMemcpy
      #define cublasHandle_t hipblasHandle_t
      #define cublasIdamin hipblasIdamin
      #define cublasStatus_t hipblasStatus_t
    #endif
    

    [on deliberation] To compile such single-source hipified code, the following defines should be provided:

        for AMD:    -DNVIDIA=0 -DAMD=1 -DGPU_PLATFORM=AMD
        for NVIDIA: -DNVIDIA=0 -DAMD=1 -DGPU_PLATFORM=NVIDIA

    [IMP] Both approaches should be available in hipify-clang first under corresponding options.

    feature 
    opened by emankov 0
  • [HIPIFY] HIPIFY converts CUDA_R_32F to HIPBLAS_R_32F instead of HIP_R_32F

    [HIPIFY] HIPIFY converts CUDA_R_32F to HIPBLAS_R_32F instead of HIP_R_32F

    When hipifying CUDA source containing cuSPARSE code, CUDA_R_32F (of type cudaDataType) should be converted to HIP_R_32F (of type hipDataType), which, according to nvcc_detail/hip_runtime_api.h, is just a macro for the original CUDA type and value. Instead, CUDA_R_32F gets hipified to HIPBLAS_R_32F, which is of type hipblasDatatype_t and is incompatible.

    The original CUDA code compiles with nvcc and runs fine, but after hipification, compilation with hipcc fails with errors like

    my_sparse_test.hip.cpp(109): error: argument of type "hipblasDatatype_t" is incompatible with parameter of type "cudaDataType"
    

    pointing to a line where (for example) hipsparseSpMV function is called.

    Replacing HIPBLAS_R_32F with HIP_R_32F in the hipified code makes the code compile with no problem and the program runs OK.

    I am on the NVIDIA platform, using Ubuntu 18.04. hipcc --version:

    HIP version: 4.2.21155-37cb3a34
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2021 NVIDIA Corporation
    Built on Wed_Jun__2_19:15:15_PDT_2021
    Cuda compilation tools, release 11.4, V11.4.48
    Build cuda_11.4.r11.4/compiler.30033411_0
    

    hipify-clang --version:

    LLVM (http://llvm.org/):
      LLVM version 12.0.1
      Optimized build.
      Default target: x86_64-unknown-linux-gnu
      Host CPU: znver1
    

    This probably applies to other values of the enum as well, and it affects both hipify-perl and hipify-clang.

    Edit: on my AMD GPU machine, the error message from hipcc is

    my_sparse_test.hip.cpp:119:5: error: no matching function for call to 'hipsparseSpMV'
        hipsparseSpMV(sparseHandle, HIPSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA, vecX, &beta, vecY, HIPBLAS_R_32F, HIPSPARSE_MV_ALG_DEFAULT, workspace);
        ^~~~~~~~~~~~~
    /home/hom0056/apps/hipSPARSE/installation/include/hipsparse.h:5510:19: note: candidate function not viable: no known conversion from 'hipblasDatatype_t' to 'hipDataType' for 8th argument
    

    The cause seems to be the same.

    feature 
    opened by hom0056 3
  • [HIPIFY][perl][tests] Add test harness for hipify-perl testing

    [HIPIFY][perl][tests] Add test harness for hipify-perl testing

    The test harness should:

    • be based on lit testing
    • include the same tests as the hipify-clang testing
    • use, as much as possible (AMAP), the same checks as the hipify-clang test harness uses
    test feature perl 
    opened by emankov 0
Releases(rocm-5.2.0)
Owner
ROCm Developer Tools
ROCm Developer Tools and Programming Languages