Radeon Rays is a ray intersection acceleration library for multiple hardware and software platforms, using the CPU and GPU

Overview

RadeonRays 4.1

Summary

RadeonRays is a ray intersection acceleration library. AMD developed RadeonRays to help developers make the most of the GPU and to eliminate the need to maintain hardware-dependent code.

The library offers a well-defined C API for scene building and performing asynchronous ray intersection queries.

RadeonRays is not limited to AMD hardware, a specific operating system or graphics framework. The library helps assure compatibility and best performance across a wide range of hardware platforms.

Backends

The library supports the following graphics and GPGPU frameworks as its backends:

  • DirectX12
  • Vulkan

System Requirements

RadeonRays requires a PC with the following software and hardware:

  • DirectX12: a 64-bit version of Windows® 10, and a GPU and drivers that support DirectX12 features
  • Vulkan: a 64-bit version of Windows® 10 or Linux, and a GPU and drivers that support Vulkan version 1.2
  • The spdlog library installed

Documentation

Documentation page

Comments
  • Ubuntu 15.10 Compilation Fails

    Hello there,

    I wanted to compile the SDK and followed the README.

    While doing so, I encountered this error:

    Linking RadeonRays
    /usr/bin/ld: ../Bin/Release/x64/libCalc64.a(calc.o): relocation R_X86_64_32 against `.bss' can not be used when making a shared object; recompile with -fPIC
    ../Bin/Release/x64/libCalc64.a: error adding symbols: Bad value
    collect2: error: ld returned 1 exit status
    Makefile:165: recipe for target '../Bin/Release/x64/libRadeonRays64.so' failed
    make[1]: *** [../Bin/Release/x64/libRadeonRays64.so] Error 1
    Makefile:54: recipe for target 'RadeonRays' failed
    make: *** [RadeonRays] Error 2
    

    I used ./Tools/premake/linux64/premake5 gmake to create the makefile:

    >> OpenCL backend enabled
    Building configurations...
    Running action 'gmake'...
    Generated Makefile...
    Generated RadeonRays/Makefile...
    Generated Calc/Makefile...
    Generated CLW/Makefile...
    Generated App/Makefile...
    Generated Gtest/Makefile...
    Generated UnitTest/Makefile...
    Done (162ms).
    

    and built with make config=release_x64, which then resulted in the error above.

    My libraries/compiler are as follows:

    g++:
      Installed: 4:5.2.1-3ubuntu1
    libopenimageio-dev:
      Installed: 1.6.10-0thomas~wily1
    libglew-dev:
      Installed: 1.10.0-3
    nvidia-opencl-dev:
      Installed: 6.5.14-2
    

    Is this a known error, and is there a workaround?

    opened by xoryouyou 14
  • "clCreateCommandQueue failed" error on OS X 10.11.5

    When I try to run the "App" target of the Xcode project, I get an exception thrown on CLWCommandQueue.cpp:37, and the message "clCreateCommandQueue failed" prints before exiting.

    The status returned by clCreateCommandQueue is -33, CL_INVALID_DEVICE.

    Here are the results of printing platforms in ConfigManager::CreateConfigs:

    (lldb) print platforms
    (std::__1::vector<CLWPlatform, std::__1::allocator<CLWPlatform> >) $2 = size=1 {
      [0] = {
        ReferenceCounter<_cl_platform_id *, int (*)(_cl_platform_id *), int (*)(_cl_platform_id *)> = (object_ = 0x000000007fff0000)
        name_ = "Apple"
        profile_ = "FULL_PROFILE"
        version_ = "OpenCL 1.2 (Apr 26 2016 00:05:53)"
        vendor_ = "Apple"
        extensions_ = ""
        devices_ = size=2 {
          [0] = {
            ReferenceCounter<_cl_device_id *, cl_int (*)(_cl_device_id *), cl_int (*)(_cl_device_id *)> = (object_ = 0x0000000001024400)
            name_ = "HD Graphics 4000"
            vendor_ = "Intel"
            version_ = "OpenCL 1.2 "
            profile_ = "FULL_PROFILE"
            extensions_ = "cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_image2d_from_buffer cl_khr_gl_depth_images cl_khr_depth_images cl_khr_3d_image_writes "
            type_ = 4
            localMemSize_ = 65536
            globalMemSize_ = 1610612736
            maxAllocSize_ = 402653184
            localMemType_ = 1
            maxWorkGroupSize_ = 512
            minAlignSize_ = 128
          }
          [1] = {
            ReferenceCounter<_cl_device_id *, cl_int (*)(_cl_device_id *), cl_int (*)(_cl_device_id *)> = (object_ = 0x0000000001022700)
            name_ = "GeForce GT 650M"
            vendor_ = "NVIDIA"
            version_ = "OpenCL 1.2 "
            profile_ = "FULL_PROFILE"
            extensions_ = "cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_APPLE_fp64_basic_ops cl_khr_fp64 cl_khr_3d_image_writes cl_khr_depth_images cl_khr_gl_depth_images cl_khr_gl_msaa_sharing cl_khr_image2d_from_buffer cl_APPLE_ycbcr_422 cl_APPLE_rgb_422 "
            type_ = 4
            localMemSize_ = 49152
            globalMemSize_ = 1073741824
            maxAllocSize_ = 268435456
            localMemType_ = 1
            maxWorkGroupSize_ = 1024
            minAlignSize_ = 128
          }
        }
        type_ = 4
      }
    }
    

    Only device 0, "Intel HD Graphics 4000", fails to create a command queue. If I adjust ConfigManager::CreateConfigs to start with device 1, the NVIDIA card, command queue creation succeeds.

    Maybe the CLWContext::Create calls in ConfigManager::CreateConfigs should be wrapped in a try {} catch {} block, so that if an exception is thrown, the loop over devices simply continues with the next device?
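    The suggestion above can be sketched as a loop that catches a per-device failure and moves on. This is a minimal illustration, not the actual App code: Device, Context, CreateContext and CreateConfigs are hypothetical stand-ins for CLWDevice, CLWContext and ConfigManager::CreateConfigs, and the real CLW signatures differ.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for CLWDevice.
struct Device
{
    std::string name;
    bool can_create_queue;  // simulates whether clCreateCommandQueue would succeed
};

// Hypothetical stand-in for CLWContext.
struct Context
{
    std::string device_name;
};

// Hypothetical factory that throws on failure, like CLWContext::Create does.
Context CreateContext(const Device& device)
{
    if (!device.can_create_queue)
    {
        throw std::runtime_error("clCreateCommandQueue failed");
    }
    return Context{device.name};
}

// Try every device and keep the contexts that could be created,
// instead of aborting on the first failing device.
std::vector<Context> CreateConfigs(const std::vector<Device>& devices,
                                   std::vector<std::string>& failures)
{
    std::vector<Context> contexts;
    for (const Device& device : devices)
    {
        try
        {
            contexts.push_back(CreateContext(device));
        }
        catch (const std::exception& e)
        {
            // Record the failure and continue with the next device.
            failures.push_back(device.name + ": " + e.what());
        }
    }
    return contexts;
}
```

    With the two devices from the report, the Intel device would be skipped and a context would still be created for the NVIDIA card.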

    opened by ericwa 13
  • Transparent Material

    I'm going to express a simple transparent material using 'kIdealRefract'. A tinyobj material's dissolve is its alpha value. Can I control the alpha value using the dissolve parameter?
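    One generic way a path tracer can honor an alpha value is stochastic opacity: at each hit, a uniform random number decides whether the ray shades the material or passes straight through. The sketch below assumes the MTL convention that dissolve = 1 means fully opaque; SurfaceEvent and ChooseSurfaceEvent are hypothetical names, and this is not how the App's kIdealRefract material is implemented.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical outcome of a surface hit under stochastic opacity.
enum class SurfaceEvent
{
    kShadeMaterial,  // evaluate the material (e.g. kIdealRefract) as usual
    kPassThrough     // ignore the surface and continue the ray unchanged
};

// Treat the tinyobj dissolve value as opacity; u is a uniform sample in [0, 1).
// Averaged over many samples, the surface is shaded with probability `dissolve`.
SurfaceEvent ChooseSurfaceEvent(float dissolve, float u)
{
    const float opacity = std::clamp(dissolve, 0.0f, 1.0f);
    return (u < opacity) ? SurfaceEvent::kShadeMaterial : SurfaceEvent::kPassThrough;
}
```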

    opened by generalistr6 9
  • Build on Ubuntu

    Hi, some months ago I managed to build and run the older code from the AMD website on Ubuntu 14.04.3 LTS by writing my own App/Makefile. The app worked (and still works) with cough NV???? HW and drivers obtained via apt-get. There is some glitchiness at startup, but the window updates fine following mouse events. Impressive!

    I noticed recent commits on GitHub ("official version"), so I cloned the repo and ran into some issues:

    git clone https://github.com/GPUOpen-LibrariesAndSDKs/FireRays_SDK.git
    cd FireRays_SDK
    // Not premake 4 as stated in the instructions
    // Not executable..
    chmod +x ./premake/linux64/premake5
    ./premake/linux64/premake5 gmake
    // make does not work as instructed with "config=release64", so I did this instead
    make
    // Compilation seems ok, but I am having link issues. This is the second make attempt, which does the link step only:
    ~/Dropbox/FireRays_SDK$ make
    ==== Building CLW (debug_x64) ====
    ==== Building App (debug_x64) ====
    Linking App
    /usr/bin/ld: cannot find -lFireRays64

    The library name "FireRays/lib/x64/FireRays64.lib" does not follow the conventional Unix naming convention, i.e. libXXX.a/.so. I am not familiar with premake and friends, but I don't expect this line in App/Makefile

    LIBS += ../Bin/Debug/x64/libCLW64D.a -lFireRays64 -lOpenImageIO -lglut -lGLEW -lGL

    with -lFireRays64 (seen above) to work with your naming convention. So I hacked the lines (debug and release) in App/Makefile to

    LIBS += ../Bin/Release/x64/libCLW64.a ../FireRays/lib/x64/FireRays64.lib -lOpenCL -lOpenImageIO -lglut -lGLEW -lGL

    which got rid of most, but not all, link problems:

    make
    ==== Building CLW (debug_x64) ====
    ==== Building App (debug_x64) ====
    Linking App
    obj/x64/Debug/frrenderer.o: In function `FrRenderer::FrRenderer(CLWContext, int)':
    /home/robert/Dropbox/FireRays_SDK/App/frrenderer.cpp:124: undefined reference to `FireRays::IntersectionApiCL::CreateFromOpenClContext(int, cl_context, cl_device_id_, cl_command_queue_, int)'
    obj/x64/Debug/frrenderer.o: In function `FrRenderer::~FrRenderer()':
    /home/robert/Dropbox/FireRays_SDK/App/frrenderer.cpp:158: undefined reference to `FireRays::IntersectionApi::Delete(FireRays::IntersectionApi_)'
    collect2: error: ld returned 1 exit status
    make[1]: *** [../Bin/Debug/x64/App64D] Error 1
    make: *** [App] Error 2

    opened by normandrobert 9
  • Will the FireRays SDK source code be available?

    Is it correct to expect that the source code for the FireRays SDK will be released? Or will this simply be a mirror repository for the library binaries and sample app already available at the AMD website?

    opened by p32blo 8
  • Bump Texture Implement

    A normal texture is used in the App project, but it is expressed as the specular texture of a material. Do you plan to support bump textures that can replace the geometric normal?
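    For reference, classic bump mapping derives a perturbed normal from the slopes of a height map rather than reading the normal directly from a texture. The sketch below shows the generic tangent-space version using forward differences; Vec3, Normalize and BumpedTangentNormal are hypothetical names, not part of the SDK, and a renderer would still have to transform the result to world space with the surface's tangent frame.

```cpp
#include <cassert>
#include <cmath>

struct Vec3
{
    float x, y, z;
};

Vec3 Normalize(Vec3 v)
{
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}

// Tangent-space bump mapping: the height map's forward-difference slopes
// (dh/du, dh/dv), scaled by `strength`, tilt the unperturbed normal (0, 0, 1).
Vec3 BumpedTangentNormal(float h_center, float h_right, float h_up,
                         float texel_size, float strength)
{
    const float dhdu = (h_right - h_center) / texel_size;
    const float dhdv = (h_up - h_center) / texel_size;
    return Normalize({-strength * dhdu, -strength * dhdv, 1.0f});
}
```

    A flat height map leaves the normal at (0, 0, 1); a height increasing along u tilts it toward -u.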

    opened by generalistr6 7
  • Linker error LNK1104 for when compiling app target in VS2015

    I'm having trouble setting up the SDK on my Windows machine; trying to compile the App target in VS2015 produces the following error:

    Error : LNK1104 cannot open file 'OpenImageIOD.lib' App C:\Users\cons-elo\Source\Repos\RadeonRays_SDK\App\LINK

    I'm also having trouble with tests - most of them fail with the error "unknown file: error: C++ exception with description "Cannot read the contents of a file" thrown in SetUp()."

    Cheers, Erik

    opened by Erikmolin 7
  • Segmentation fault on clCreateContext()

    master at ca660eb0bc21d2ae48a4745053ed499c435a1b87 Core dump c.zip

    (gdb) run
    ....
    Thread 1 "App64D" received signal SIGSEGV, Segmentation fault.
    0x0000000000000000 in ?? ()
    (gdb) bt
    #0  0x0000000000000000 in ?? ()
    #1  0x00007fffd1ccd09f in ?? () from /usr/lib/libamdocl64.so
    #2  0x00007fffd1bc3944 in ?? () from /usr/lib/libamdocl64.so
    #3  0x00007fffd1bc3a59 in ?? () from /usr/lib/libamdocl64.so
    #4  0x00007fffd1bcf38b in ?? () from /usr/lib/libamdocl64.so
    #5  0x00007fffd1b9484e in clCreateContext () from /usr/lib/libamdocl64.so
    #6  0x000000000043b0f8 in CLWContext::Create (devices=std::vector of length 1, capacity 1 = {...}, [email protected]=0x7fffffffd5b0) at CLWContext.cpp:40
    #7  0x000000000043b281 in CLWContext::Create (device=..., [email protected]=0x7fffffffd5b0) at CLWContext.cpp:193
    #8  0x000000000042394e in ConfigManager::CreateConfigs (mode=ConfigManager::kUseSingleGpu, interop=true, configs=std::vector of length 0, capacity 0, 
        initial_num_bounces=5) at config_manager.cpp:109
    #9  0x0000000000427d39 in InitCl () at main.cpp:267
    #10 0x0000000000406b5e in main (argc=1, argv=0x7fffffffd958) at main.cpp:808
    (gdb) quit
    A debugging session is active.
        Inferior 1 [process 17295] will be killed.
    Quit anyway? (y or n) y
    

    clinfo output: clinfo.txt

    opened by inferrna 7
  • Documentation on the shading techniques and materials?

    Hi, I am not sure if this belongs here, but I don't know where else I could post it. I have been trying to understand the kinds of materials in the App. So far I believe that:

    • kLambert is diffuse
    • kIdealReflect is 0 roughness reflective
    • kMicrofacetXyz are reflective with some roughness controlled by ns
    • kIdealRefract is 0 roughness transparent
    • kFresnelBlend and kMix are used to mix 2 of the above
    • Is there any way to get refraction with some roughness?
    • I had no luck combining Lambert, Microfacet and IdealRefract. Is it possible?
    • In a scene that I tried to render, light coming directly from an emissive material doesn't pass through a window; it only does after bouncing off other objects (light from the HDR environment does too). Why is that? (For windows I use a FresnelBlend of IdealRefract and MicrofacetGGX, and for opaque objects a FresnelBlend of Lambert and MicrofacetGGX. Even when the camera sees the emissive material through the window, it is black.)
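    To make the "FresnelBlend mixes two of the above" idea concrete, a common Fresnel-weighted blend gives the top (reflective) layer weight F and the base layer weight 1 - F. The sketch below uses Schlick's approximation; it is an assumption that kFresnelBlend weights its components this way (the App may use the exact dielectric Fresnel equations), and SchlickFresnel, BlendWeights and FresnelBlendWeights are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Schlick's approximation of Fresnel reflectance for a dielectric with
// relative index of refraction `ior`; cos_theta is between view direction and normal.
float SchlickFresnel(float cos_theta, float ior)
{
    float r0 = (1.0f - ior) / (1.0f + ior);
    r0 *= r0;
    return r0 + (1.0f - r0) * std::pow(1.0f - cos_theta, 5.0f);
}

// Hypothetical blend weights: `top` for the reflective layer (e.g. MicrofacetGGX),
// `base` for the other layer (e.g. Lambert or IdealRefract).
struct BlendWeights
{
    float top;
    float base;
};

BlendWeights FresnelBlendWeights(float cos_theta, float ior)
{
    const float f = SchlickFresnel(cos_theta, ior);
    return {f, 1.0f - f};
}
```

    For glass (ior = 1.5) the reflective weight is 0.04 at normal incidence and rises to 1 at grazing angles, so most light reaching a window head-on goes to the base layer.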

    PS: Just noticed that a "glass mask" is being set during Preprocess. What is the purpose? (Setting it for my transparent material did not help with the emissive light)

    Thanks

    opened by DuracellTurtle 7
  • Material support

    I'm trying to render high quality ray-traced image using Radeon Rays SDK.

    First of all, I need to render various materials. The Radeon Rays SDK library itself has no material support, but I found some materials in App.proj. These materials are hard-coded and difficult to control. Do you have any plans to support various material functions?

    opened by generalistr6 6
  • Crash when rendering OBJ with "Odd-size" texture image on GPU

    Hi, I found a crash when rendering an OBJ file with an "odd-size" texture image on GPU.

    I attached a simple zipped example (obj + mtl + jpg): rect.zip. If I change the attached image from an odd size (e.g. 313 x 161) to an even size (314 x 162), it works well. With an odd size, however, it crashes at the start of the program. (The crash occurs only on GPU (Mode::kUseSingleGpu); CPU is fine.)

    FYI, this bug appeared recently. With older versions (before October 2016), I never experienced it.

    Test environment (GPU): NVIDIA GeForce GTX 750 Ti

    opened by Inyong 5
  • compute shader is broken on latest windows build?

    Trying to run 4.0 and 4.1 on Windows with VS 2019, I get compute shader build issues for both DX and VK. Can someone verify? Will it run only on specific Windows target runtimes?

    Also, is scene.obj missing from the repo?

    opened by gamerx1221 2
  • Temporary upload buffers are bound as UAVs in DX12 scene build dispatches

    The temporary buffers holding InstanceDescriptions are allocated off of the upload heap, and then bound as UAVs in the scene BVH build dispatches. This is illegal and is reported as such by GPU-based validation.

    If these resources receive D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS, the error is caught immediately by the debug layer, so that's not the fix. The correct fix is to allocate another non-upload buffer and copy the data to it from the upload buffer.

    Repro steps:

    1. Take a PIX for Windows capture of the basic test.
    2. Open the capture, go to the "Warnings" pane.
    3. Click "run GPU-based validation". When the prompt comes up asking for running the analysis, click "Yes".

    opened by NoahMercury 1
  • 16-bit indices are exposed in the API but left unimplemented, causing TDRs on client software

    TriangleMeshBuildInfo::index_type exists, but is ignored throughout the code base, and 32-bit indices are assumed instead. 16-bit indices are thus misinterpreted, causing out-of-bounds reads and TDR-ing unsuspecting client software.
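    The problem described above comes down to reading indices with the wrong stride. A minimal sketch of fetching an index while honoring the declared width follows; IndexType, IndexSize and FetchIndex are hypothetical names, not the library's enum or functions. For example, six 16-bit indices occupy 12 bytes; reading that buffer as 32-bit indices fuses pairs of indices into single values, and asking for six 32-bit values reads 24 bytes, i.e. past the end of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical mirror of the choice carried by TriangleMeshBuildInfo::index_type.
enum class IndexType
{
    kUint16,
    kUint32
};

std::size_t IndexSize(IndexType type)
{
    return type == IndexType::kUint16 ? sizeof(uint16_t) : sizeof(uint32_t);
}

// Read the i-th index from a raw byte buffer using the declared width,
// instead of always assuming 32-bit indices.
uint32_t FetchIndex(const unsigned char* data, std::size_t i, IndexType type)
{
    if (type == IndexType::kUint16)
    {
        uint16_t v;
        std::memcpy(&v, data + i * sizeof(uint16_t), sizeof(v));
        return v;
    }
    uint32_t v;
    std::memcpy(&v, data + i * sizeof(uint32_t), sizeof(v));
    return v;
}
```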

    opened by NoahMercury 0
  • Build instructions?

    There seem to be no build instructions. On Windows, using CMake, I get the error "Could not find a package configuration file provided by "spdlog"".

    What is required to build this? Is GCC with MinGW required on Windows? I can't find any build docs for this.

    opened by zezba9000 1
  • vk::Device::waitForFences: ErrorDeviceLost after few iterations of intersection commands

    Hello,

    Thanks AMD folks for this repo.

    I am facing a Vulkan ErrorDeviceLost problem. The code is borrowed from test_vk/internal_resources, but here I am casting rays in a loop where the ray grid resolution is halved at each iteration (so the number of rays drops by about a factor of 4).

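    // Includes and helpers needed to build this snippet standalone.
    // MeshData is the OBJ-loading helper from the RadeonRays test sources;
    // the header name below is an assumption.
    #include <cassert>
    #include <chrono>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    #include "radeonrays.h"
    #include "mesh_data.h"

    using std::cerr;
    using std::cout;
    using std::endl;

    // Unlike a bare assert((x) == RR_SUCCESS), this still executes the call
    // when NDEBUG is defined.
    #define CHECK_RR_CALL(x)                   \
        do                                     \
        {                                      \
            auto rr_status_ = (x);             \
            assert(rr_status_ == RR_SUCCESS);  \
            (void)rr_status_;                  \
        } while (0)

    void test() {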
        RRContext context = nullptr;
        CHECK_RR_CALL(rrCreateContext(RR_API_VERSION, RR_API_VK, &context));
        MeshData mesh_data("../resources/sponza.obj");
    
        // memory management to pass buffers to builder
        // get radeonrays ptrs to triangle description
        RRDevicePtr vertex_ptr = nullptr;
        RRDevicePtr index_ptr  = nullptr;
    
        CHECK_RR_CALL(rrAllocateDeviceBuffer(context, mesh_data.positions.size() * sizeof(float), &vertex_ptr));
        CHECK_RR_CALL(rrAllocateDeviceBuffer(context, mesh_data.indices.size() * sizeof(uint32_t), &index_ptr));
        void* ptr = nullptr;
        CHECK_RR_CALL(rrMapDevicePtr(context, vertex_ptr, &ptr));
        float* pos_ptr = (float*)ptr;
        for (const auto& pos : mesh_data.positions)
        {
            *pos_ptr = pos;
            pos_ptr++;
        }
        CHECK_RR_CALL(rrUnmapDevicePtr(context, vertex_ptr, &ptr));
        CHECK_RR_CALL(rrMapDevicePtr(context, index_ptr, &ptr));
        uint32_t* ind_ptr = (uint32_t*)ptr;
        for (const auto& ind : mesh_data.indices)
        {
            *ind_ptr = ind;
            ind_ptr++;
        }
        CHECK_RR_CALL(rrUnmapDevicePtr(context, index_ptr, &ptr));
    
        auto triangle_count = (uint32_t)mesh_data.indices.size() / 3;
    
        RRGeometryBuildInput    geometry_build_input                = {};
        RRTriangleMeshPrimitive mesh                                = {};
        geometry_build_input.triangle_mesh_primitives               = &mesh;
        geometry_build_input.primitive_type                         = RR_PRIMITIVE_TYPE_TRIANGLE_MESH;
        geometry_build_input.triangle_mesh_primitives->vertices     = vertex_ptr;
        geometry_build_input.triangle_mesh_primitives->vertex_count = uint32_t(mesh_data.positions.size() / 3);
    
        geometry_build_input.triangle_mesh_primitives->vertex_stride    = 3 * sizeof(float);
        geometry_build_input.triangle_mesh_primitives->triangle_indices = index_ptr;
        geometry_build_input.triangle_mesh_primitives->triangle_count   = triangle_count;
        geometry_build_input.triangle_mesh_primitives->index_type       = RR_INDEX_TYPE_UINT32;
        geometry_build_input.primitive_count                            = 1u;
    
        std::cout << "Triangle count " << triangle_count << "\n";
    
        RRBuildOptions options = {};  // value-initialize all fields, not only build_flags
        options.build_flags    = 0u;
    
        RRMemoryRequirements geometry_reqs;
        CHECK_RR_CALL(rrGetGeometryBuildMemoryRequirements(context, &geometry_build_input, &options, &geometry_reqs));
    
        // allocate buffers for builder and resulting geometry
        RRDevicePtr scratch_ptr  = nullptr;
        RRDevicePtr geometry_ptr = nullptr;
        CHECK_RR_CALL(rrAllocateDeviceBuffer(context, geometry_reqs.temporary_build_buffer_size, &scratch_ptr));
        CHECK_RR_CALL(rrAllocateDeviceBuffer(context, geometry_reqs.result_buffer_size, &geometry_ptr));
    
        std::cout << "Scratch buffer size: " << (float)geometry_reqs.temporary_build_buffer_size / 1000000 << "Mb\n";
        std::cout << "Result buffer size: " << (float)geometry_reqs.result_buffer_size / 1000000 << "Mb\n";
    
        RRCommandStream command_stream = nullptr;
        CHECK_RR_CALL(rrAllocateCommandStream(context, &command_stream));
    
        CHECK_RR_CALL(rrCmdBuildGeometry(
            context, RR_BUILD_OPERATION_BUILD, &geometry_build_input, &options, scratch_ptr, geometry_ptr, command_stream));
    
        RREvent wait_event = nullptr;
        CHECK_RR_CALL(rrSumbitCommandStream(context, command_stream, nullptr, &wait_event));
        CHECK_RR_CALL(rrWaitEvent(context, wait_event));
        CHECK_RR_CALL(rrReleaseEvent(context, wait_event));
        CHECK_RR_CALL(rrReleaseCommandStream(context, command_stream));
    
        //// Cast rays in loop
    
        uint32_t kResolution = 10000;
        for (int i=0; i<5; i++) {
            auto start = std::chrono::system_clock::now();
    
            RRDevicePtr rays_dev_ptr, hits_dev_ptr, scratch_trace_ptr;
            void *ptr = nullptr;
            RREvent wait_event = nullptr;
    
            kResolution >>= 1;
            uint32_t raysNb = kResolution * kResolution;
            
            CHECK_RR_CALL(rrAllocateDeviceBuffer(context, raysNb * sizeof(RRRay), &rays_dev_ptr));
            CHECK_RR_CALL(rrMapDevicePtr(context, rays_dev_ptr, &ptr));
            
            RRRay* rays_ptr = static_cast<RRRay*>(ptr);
    
            for (uint32_t x = 0; x < kResolution; ++x)
            {
                for (uint32_t y = 0; y < kResolution; ++y)
                {
                    // Ray index; named idx so it does not shadow the outer loop counter i.
                    const uint32_t idx = kResolution * y + x;
                    RRRay* r = rays_ptr + idx;
                    r->origin[0] = -1.f;
                    r->origin[1] = 5.5f;
                    r->origin[2] = 0.f;
                    r->direction[1] = -1.f;
                    r->direction[0] = -1.f + (2.f / kResolution) * y;
                    r->direction[2] = -1.f + (2.f / kResolution) * x;
                    r->min_t = 0.001f;
                    r->max_t = 100000.f;
                }
            }
            CHECK_RR_CALL(rrUnmapDevicePtr(context, rays_dev_ptr, &ptr));
    
            auto end = std::chrono::system_clock::now();
            std::chrono::duration<double> elapsed_seconds = end - start;
            auto total_time_taken = elapsed_seconds.count();
            cerr << "ray init " << total_time_taken <<"s for "<< raysNb << " rays" << endl; 
    
            start = std::chrono::system_clock::now();
            CHECK_RR_CALL(rrAllocateDeviceBuffer(context, raysNb * sizeof(RRHit), &hits_dev_ptr));
            size_t scratch_trace_size;
            CHECK_RR_CALL(rrGetTraceMemoryRequirements(context, raysNb, &scratch_trace_size));
            CHECK_RR_CALL(rrAllocateDeviceBuffer(context, scratch_trace_size, &scratch_trace_ptr));
            
            RRCommandStream trace_command_stream = nullptr;
            CHECK_RR_CALL(rrAllocateCommandStream(context, &trace_command_stream));
    
            CHECK_RR_CALL(rrCmdIntersect(context,
                                         geometry_ptr,
                                         RR_INTERSECT_QUERY_CLOSEST,
                                         rays_dev_ptr,
                                         raysNb,
                                         nullptr,
                                         RR_INTERSECT_QUERY_OUTPUT_FULL_HIT,
                                         hits_dev_ptr,
                                         scratch_trace_ptr,
                                         trace_command_stream));
    
            CHECK_RR_CALL(rrSumbitCommandStream(context, trace_command_stream, nullptr, &wait_event));
            CHECK_RR_CALL(rrWaitEvent(context, wait_event));
    
            CHECK_RR_CALL(rrReleaseEvent(context, wait_event));
            CHECK_RR_CALL(rrReleaseCommandStream(context, trace_command_stream));
    
            // Map the hit buffer to read back the results.
            CHECK_RR_CALL(rrMapDevicePtr(context, hits_dev_ptr, &ptr));
    
            end = std::chrono::system_clock::now();
            elapsed_seconds = end - start;
            total_time_taken = elapsed_seconds.count();
            cerr << "intersection " << total_time_taken <<"s for "<< raysNb << " rays" << endl; 
    
            RRHit* mapped_ptr = static_cast<RRHit*>(ptr);
            std::vector<uint32_t> data(raysNb);
            int hitcount = 0;
            for (uint32_t y = 0; y < kResolution; ++y)
            {
                for (uint32_t x = 0; x < kResolution; ++x)
                {
                    // wi flips the image vertically; idx indexes the hit buffer.
                    const uint32_t wi  = kResolution * (kResolution - 1 - y) + x;
                    const uint32_t idx = kResolution * y + x;
                    if (mapped_ptr[idx].inst_id != ~0u)
                    {
                        data[wi] = 0xff000000 | (uint32_t(mapped_ptr[idx].uv[0] * 255) << 8) |
                                   (uint32_t(mapped_ptr[idx].uv[1] * 255) << 16);
                        hitcount++;
                    }
                    else
                    {
                        data[wi] = 0xff101010;
                    }
                }
            }
    
            // rrSetLogLevel(RR_LOG_LEVEL_OFF);
            cout << "hitcount:" << hitcount << endl;
            CHECK_RR_CALL(rrUnmapDevicePtr(context, hits_dev_ptr, &ptr));
    
            // stringstream ss;
            // ss << "/dev/shm/test_vk_sponza_geom_isect_internal" << i << ".jpg";
            // stbi_write_jpg(ss.str().c_str(), kResolution, kResolution, 4, data.data(), 120);
    
            CHECK_RR_CALL(rrReleaseDevicePtr(context, rays_dev_ptr));
            CHECK_RR_CALL(rrReleaseDevicePtr(context, scratch_trace_ptr));
            CHECK_RR_CALL(rrReleaseDevicePtr(context, hits_dev_ptr));
        }
        CHECK_RR_CALL(rrReleaseDevicePtr(context, scratch_ptr));
        CHECK_RR_CALL(rrReleaseDevicePtr(context, geometry_ptr));
        CHECK_RR_CALL(rrReleaseDevicePtr(context, index_ptr));
        CHECK_RR_CALL(rrReleaseDevicePtr(context, vertex_ptr));
        CHECK_RR_CALL(rrDestroyContext(context));
    }
    
    int main(int argc, char* argv[]) {
        test();
        return 0;
    }
    

    The loop fails at the last iteration, where an assertion halts the program. This is the log output:

    [2021-07-17 00:24:19.903] [RR logger] [info] rrCreateContext(1001)
    [2021-07-17 00:24:19.903] [RR logger] [info] Creating Vulkan context
    [2021-07-17 00:24:21.709] [RR logger] [info] rrSetLogLevel(1)
    ...
    ...
    hitcount:390625
    [2021-07-17 00:24:35.067] [RR logger] [info] rrUnmapDevicePtr
    [2021-07-17 00:24:35.067] [RR logger] [debug] Unmap vulkan buffer
    [2021-07-17 00:24:35.067] [RR logger] [info] rrReleaseDevicePtr
    [2021-07-17 00:24:35.069] [RR logger] [debug] Device pointer successfully released
    [2021-07-17 00:24:35.069] [RR logger] [info] rrReleaseDevicePtr
    [2021-07-17 00:24:35.078] [RR logger] [debug] Device pointer successfully released
    [2021-07-17 00:24:35.078] [RR logger] [info] rrReleaseDevicePtr
    [2021-07-17 00:24:35.079] [RR logger] [debug] Device pointer successfully released
    // next iteration
    [2021-07-17 00:24:35.079] [RR logger] [info] rrAllocateDeviceBuffer
    [2021-07-17 00:24:35.081] [RR logger] [debug] Allocated vulkan buffer with size 3115008
    [2021-07-17 00:24:35.081] [RR logger] [info] rrMapDevicePtr
    [2021-07-17 00:24:35.081] [RR logger] [debug] Map vulkan buffer
    [2021-07-17 00:24:35.082] [RR logger] [info] rrUnmapDevicePtr
    [2021-07-17 00:24:35.082] [RR logger] [debug] Unmap vulkan buffer
    ray init 0.00319416s for 97344 rays
    [2021-07-17 00:24:35.082] [RR logger] [info] rrAllocateDeviceBuffer
    [2021-07-17 00:24:35.082] [RR logger] [debug] Allocated vulkan buffer with size 1557504
    [2021-07-17 00:24:35.082] [RR logger] [info] rrGetTraceMemoryRequirements
    [2021-07-17 00:24:35.082] [RR logger] [debug] Successfully provided trace memory requirements
    [2021-07-17 00:24:35.082] [RR logger] [info] rrAllocateDeviceBuffer
    [2021-07-17 00:24:35.088] [RR logger] [debug] Allocated vulkan buffer with size 24920064
    [2021-07-17 00:24:35.088] [RR logger] [info] rrAllocateCommandStream
    [2021-07-17 00:24:35.088] [RR logger] [debug] Command stream successfully allocated
    [2021-07-17 00:24:35.088] [RR logger] [info] rrCmdIntersect
    [2021-07-17 00:24:35.088] [RR logger] [debug] Intersector::Intersect()
    [2021-07-17 00:24:35.088] [RR logger] [debug] Batch intersect command successfully recorded
    [2021-07-17 00:24:35.088] [RR logger] [info] rrSumbitCommandStream
    [2021-07-17 00:24:35.088] [RR logger] [debug] Device::SubmitCommandStream()
    [2021-07-17 00:24:35.088] [RR logger] [debug] Command stream successfully submitted
    [2021-07-17 00:24:35.089] [RR logger] [info] rrReleaseEvent
    [2021-07-17 00:24:35.089] [RR logger] [debug] Device::WaitEvent()
    [2021-07-17 00:24:35.469] [RR logger] [error] vk::Device::waitForFences: ErrorDeviceLost
    rfrt: src/main3.cpp:168: void test(): Assertion `(rrWaitEvent(context, wait_event)) == RR_SUCCESS' failed.
    Aborted (core dumped)
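    The allocation sizes in the log can be checked against the loop's arithmetic. This sketch assumes 32-byte rays and 16-byte hits, which is what the logged sizes imply (3115008 / 97344 = 32 and 1557504 / 97344 = 16); Ray and Hit are hypothetical mirrors of RRRay and RRHit built from the fields the code uses (prim_id is an assumed extra field), and IterationSizes and ComputeSizes are hypothetical helpers. It only shows the buffers are consistent with the ray count; it does not explain the device loss.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirrors of RRRay and RRHit (layout assumed).
struct Ray
{
    float origin[3];
    float min_t;
    float direction[3];
    float max_t;
};

struct Hit
{
    float uv[2];
    uint32_t inst_id;
    uint32_t prim_id;
};

static_assert(sizeof(Ray) == 32, "expected 32-byte rays");
static_assert(sizeof(Hit) == 16, "expected 16-byte hits");

struct IterationSizes
{
    uint32_t resolution;
    uint32_t ray_count;
    std::size_t ray_bytes;
    std::size_t hit_bytes;
};

// Same arithmetic as the loop: the resolution is halved (>>= 1) before each
// iteration, so the ray count drops by roughly a factor of 4.
std::vector<IterationSizes> ComputeSizes(uint32_t resolution, int iterations)
{
    std::vector<IterationSizes> out;
    for (int i = 0; i < iterations; ++i)
    {
        resolution >>= 1;
        const uint32_t rays = resolution * resolution;
        out.push_back({resolution, rays, rays * sizeof(Ray), rays * sizeof(Hit)});
    }
    return out;
}
```

    Starting from 10000, the fifth iteration uses a 312 x 312 grid (97344 rays), matching the 3115008-byte ray buffer and 1557504-byte hit buffer in the log; the fourth uses 625 x 625 = 390625 rays, matching the last reported hitcount.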

    Also, if I save the hit data at each iteration, the program completes without any problem. It is as if giving the GPU more time between iterations hides the problem somehow.

    Environment: Ubuntu 20.04, Vulkan 1.2.182, NVIDIA driver 460.73.1 on a Quadro M1200, RR 4.1; build options: ENABLE_VULKAN=ON, EMBEDDED_KERNELS=ON, ENABLE_TESTING=ON, CMAKE_POSITION_INDEPENDENT_CODE=ON

    Any suggestions on how to debug this? Am I doing this the right way?

    Cheers

    opened by mlo77 0
Releases (4.1)
  • 4.1 (May 11, 2021)

  • 2.0.1 (Jan 30, 2017)

    The demo shows material usage in the new Baikal framework, along with the ability to load materials from XML and to override materials from existing .mtl files. The demo adds interactive DOF controls:

    • W and S: control lens focal length
    • Q: toggle depth of field effect
    • A and D: control lens F-stop
    • X and Y: control lens focus distance

    To run the demo, unzip the contents into the Resources directory (so that all resource files are in Resources/orb), then go into Resources/orb and run run_orb.bat.

    materials Source code (tar.gz)
    Source code (zip)
    orb.7z (29.55 MB)
  • 2.0.0 (Aug 19, 2016)

    The package contains the BMW Blender benchmark exported to .obj format. To run it with the Baikal renderer, follow these steps:

    1. Checkout latest SDK
    2. Extract the contents of bmw27.7z package into Resources folder
    3. Run run_bmw.bat from Resources folder.

    Source code (tar.gz)
    Source code (zip)
    bmw27.7z (8.74 MB)
Owner
GPUOpen Libraries & SDKs
Libraries and SDKs from the GPUOpen initiative