NVIDIA Image Scaling SDK

Overview

NVIDIA Image Scaling SDK v1.0

The MIT License (MIT)

Copyright (c) 2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Introduction

The NVIDIA Image Scaling SDK provides a single spatial scaling and sharpening algorithm for cross-platform support. The scaling algorithm uses a 6-tap scaling filter combined with 4 directional scaling and adaptive sharpening filters, which produces smooth images with sharp edges. In addition, the SDK provides a state-of-the-art adaptive directional sharpening algorithm for use in applications where no scaling is required.
The directional scaling and sharpening algorithm is named NVScaler, while the adaptive-directional-sharpening-only algorithm is named NVSharpen. Both algorithms are provided as compute shaders, and developers are free to integrate them into their applications. Note that if you integrate NVScaler, you should NOT also integrate NVSharpen, as NVScaler already includes a sharpening pass.

Pipeline Placement

The call into the NVIDIA Image Scaling shaders must occur during the post-processing phase, after tone mapping. Applying the scaling in the linear HDR in-game color space may result in a sharpening effect that is either not visible or too strong. Since sharpening algorithms can enhance noisy or grainy regions, it is recommended that effects such as film grain be applied after NVScaler or NVSharpen. Low-pass filters such as motion blur or light bloom should be applied before NVScaler or NVSharpen so that they do not attenuate the sharpening.
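
As a purely illustrative ordering sketch (every function below is a hypothetical placeholder, not SDK API), a post-processing chain following this guidance would run roughly as:

#include <cstdio>

// Hypothetical placeholder passes; only the ordering reflects the guidance above.
static void ApplyMotionBlur()  { std::puts("motion blur  - low-pass, before NIS"); }
static void ApplyBloom()       { std::puts("light bloom  - low-pass, before NIS"); }
static void ApplyToneMapping() { std::puts("tone mapping - NIS expects tone-mapped input"); }
static void RunNVScaler()      { std::puts("NVScaler     - or NVSharpen when not upscaling"); }
static void ApplyFilmGrain()   { std::puts("film grain   - noise effects, after NIS"); }

int main()
{
    ApplyMotionBlur();
    ApplyBloom();
    ApplyToneMapping();
    RunNVScaler();
    ApplyFilmGrain();
    return 0;
}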

Color Space and Ranges

NVIDIA Image Scaling shaders can process color textures stored as either LDR or HDR with the following restrictions:

  1. LDR
    • The range of color values must be in the [0, 1] range
    • The input color texture must be in display-referred color-space after tone mapping and OETF (gamma-correction) has been applied
  2. HDR PQ
    • The range of color values must be in the [0, 1] range
    • The input color texture must be in display-referred color-space after tone mapping with Rec.2020 PQ OETF applied
  3. HDR Linear
    • The recommended range of color values is [0, 12.5], where a luminance value (as per BT.709) of 1.0 maps to a brightness of 80 nits (sRGB peak) and 12.5 maps to 1000 nits
    • The input color texture may have luminance values that are either linear and scene-referred or linear and display-referred (after tone mapping)

If the input color texture sent to NVScaler or NVSharpen is in an HDR format, set the NIS_HDR_MODE define to either NIS_HDR_MODE_LINEAR (1) or NIS_HDR_MODE_PQ (2).

Supported Texture Formats

Input and output formats:

Input and output formats are expected to be in the ranges defined in the previous section and should be specified using non-integer data types such as DXGI_FORMAT_R8G8B8A8_UNORM.

Coefficients formats:

The scaler and USM coefficient textures should be specified using a float4 format such as DXGI_FORMAT_R32G32B32A32_FLOAT or DXGI_FORMAT_R16G16B16A16_FLOAT.

Resource States, Buffers, and Sampler:

The game or application calling the NVIDIA Image Scaling SDK shaders must ensure that the textures are in the correct state.

  • Input color textures must be in pixel shader read state: Shader Resource View (SRV) in DirectX
  • The output texture must be in read/write state: Unordered Access View (UAV) in DirectX (a creation sketch follows this list)
  • The coefficient textures for NVScaler must be in read state: Shader Resource View (SRV) in DirectX
  • The configuration variables must be passed as a constant buffer: Constant Buffer View (CBV) in DirectX
  • The sampler for texture pixel sampling: linear clamp SamplerState in DirectX
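
As an illustration of the output-texture requirement above, the following is a minimal D3D11 sketch (not part of the SDK; the helper name, format choice, and omitted error handling are assumptions) of creating an output texture with unordered-access binding and its UAV:

#include <d3d11.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Hypothetical helper: creates an output texture that can be bound as a UAV for
// NVScaler/NVSharpen and as an SRV for later passes. Not SDK API.
HRESULT CreateOutputTexture(ID3D11Device* device, UINT width, UINT height,
                            ComPtr<ID3D11Texture2D>& tex,
                            ComPtr<ID3D11UnorderedAccessView>& uav)
{
    D3D11_TEXTURE2D_DESC desc{};
    desc.Width = width;
    desc.Height = height;
    desc.MipLevels = 1;
    desc.ArraySize = 1;
    desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;   // non-integer format, see Supported Texture Formats
    desc.SampleDesc.Count = 1;
    desc.Usage = D3D11_USAGE_DEFAULT;
    desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE;

    HRESULT hr = device->CreateTexture2D(&desc, nullptr, tex.GetAddressOf());
    if (FAILED(hr)) return hr;
    return device->CreateUnorderedAccessView(tex.Get(), nullptr, uav.GetAddressOf());
}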

Adding NVIDIA Image Scaling SDK to a Project

Include NIS_Scaler.h directly in your application or alternatively use the provided NIS_Main.hlsl shader file. Use NIS_Config.h to get the ideal shader dispatch values for your platform, to configure the algorithm constant values (NVScalerUpdateConfig and NVSharpenUpdateConfig), and to access the algorithm coefficients (coef_scale and coef_USM).

  • Device
    NIS_Scaler.h : HLSL shader file
    NIS_Main.hlsl : Main HLSL shader example (can be replaced by your own)

  • Host Configuration
    NIS_Config.h : Configuration structure

Defines:

NIS_SCALER: default (1) NVScaler, (0) fast NVSharpen only, no upscaling
NIS_HDR_MODE: default (0) disabled, (1) Linear, (2) PQ
NIS_BLOCK_WIDTH: pixels per block width. Use the GetOptimalBlockWidth query for your platform
NIS_BLOCK_HEIGHT: pixels per block height. Use the GetOptimalBlockHeight query for your platform
NIS_THREAD_GROUP_SIZE: number of threads per group. Use the GetOptimalThreadGroupSize query for your platform
NIS_USE_HALF_PRECISION: default (0) disabled, (1) enable half-precision computation
NIS_HLSL_6_2: default (0) HLSL v5, (1) HLSL v6.2
NIS_VIEWPORT_SUPPORT: default (0) disabled, (1) enable input/output viewport support

Default NVScaler shader constants:

[NIS_BLOCK_WIDTH, NIS_BLOCK_HEIGHT, NIS_THREAD_GROUP_SIZE] = [32, 24, 256]

Default NVSharpen shader constants:

[NIS_BLOCK_WIDTH, NIS_BLOCK_HEIGHT, NIS_THREAD_GROUP_SIZE] = [32, 32, 256]

Optimal shader settings

To get optimal performance of NVScaler and NVSharpen on current and future hardware, it is recommended that the following API be used to obtain the values for NIS_BLOCK_WIDTH, NIS_BLOCK_HEIGHT, and NIS_THREAD_GROUP_SIZE.

enum class NISGPUArchitecture : uint32_t
{
    NVIDIA_Generic = 0,
    AMD_Generic = 1,
    Intel_Generic = 2
};
struct NISOptimizer
{
    bool isUpscaling;
    NISGPUArchitecture gpuArch;

    NISOptimizer(bool isUpscaling = true,
                 NISGPUArchitecture gpuArch = NISGPUArchitecture::NVIDIA_Generic);
    uint32_t GetOptimalBlockWidth();
    uint32_t GetOptimalBlockHeight();
    uint32_t GetOptimalThreadGroupSize();
};

HDR shader settings

Use the following enum values for setting NIS_HDR_MODE:

enum class NISHDRMode : uint32_t
{
    None = 0,
    Linear = 1,
    PQ = 2
};
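
For example, the hdrMode value used in the snippets below could be derived from the application's output surface format. The mapping here is an assumption about a typical swap-chain setup (FP16 targets holding linear scRGB-style values, 10-bit targets holding HDR10/PQ), not something the SDK mandates:

#include <dxgiformat.h>
#include "NIS_Config.h"   // NISHDRMode, as declared above

// Hypothetical mapping from the application's output surface format to NISHDRMode.
NISHDRMode SelectHDRMode(DXGI_FORMAT outputFormat)
{
    switch (outputFormat)
    {
    case DXGI_FORMAT_R16G16B16A16_FLOAT: return NISHDRMode::Linear; // linear HDR values
    case DXGI_FORMAT_R10G10B10A2_UNORM:  return NISHDRMode::PQ;     // HDR10/PQ-encoded values
    default:                             return NISHDRMode::None;   // LDR path
    }
}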

Integration of NVScaler

Compile the NIS_Main.hlsl shader

NIS_SCALER should be set to 1, and the optimizer's isUpscaling argument should be passed as true.

NISOptimizer opt(true, NISGPUArchitecture::NVIDIA_Generic);
uint32_t blockWidth = opt.GetOptimalBlockWidth();
uint32_t blockHeight = opt.GetOptimalBlockHeight();
uint32_t threadGroupSize = opt.GetOptimalThreadGroupSize();

Defines defines;
defines.add("NIS_SCALER", true);
defines.add("NIS_HDR_MODE", hdrMode);
defines.add("NIS_BLOCK_WIDTH", blockWidth);
defines.add("NIS_BLOCK_HEIGHT", blockHeight);
defines.add("NIS_THREAD_GROUP_SIZE", threadGroupSize);
NVScalerCS = CompileComputeShader(device, "NIS_Main.hlsl", &defines);
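
Defines and CompileComputeShader above are helpers from the sample code. As a rough sketch of what such a helper might do on the FXC path (D3DCompileFromFile), under the assumption that the shader entry point is "main" and the default cs_5_0 profile is used:

#include <d3d11.h>
#include <d3dcompiler.h>
#include <wrl/client.h>
#include <cstdint>
#include <string>
#pragma comment(lib, "d3dcompiler.lib")
using Microsoft::WRL::ComPtr;

// Hypothetical sketch only; the sample's Defines/CompileComputeShader helpers may differ.
HRESULT CompileNVScalerCS(ID3D11Device* device, uint32_t hdrMode, uint32_t blockWidth,
                          uint32_t blockHeight, uint32_t threadGroupSize,
                          ComPtr<ID3D11ComputeShader>& cs)
{
    const std::string hdr = std::to_string(hdrMode);
    const std::string bw = std::to_string(blockWidth);
    const std::string bh = std::to_string(blockHeight);
    const std::string tg = std::to_string(threadGroupSize);
    const D3D_SHADER_MACRO macros[] = {
        { "NIS_SCALER", "1" },                    // 1 = NVScaler (scaling + sharpening)
        { "NIS_HDR_MODE", hdr.c_str() },
        { "NIS_BLOCK_WIDTH", bw.c_str() },
        { "NIS_BLOCK_HEIGHT", bh.c_str() },
        { "NIS_THREAD_GROUP_SIZE", tg.c_str() },
        { nullptr, nullptr },
    };
    ComPtr<ID3DBlob> blob, errors;
    HRESULT hr = D3DCompileFromFile(L"NIS_Main.hlsl", macros, D3D_COMPILE_STANDARD_FILE_INCLUDE,
                                    "main", "cs_5_0", 0, 0,
                                    blob.GetAddressOf(), errors.GetAddressOf());
    if (FAILED(hr)) return hr;
    return device->CreateComputeShader(blob->GetBufferPointer(), blob->GetBufferSize(),
                                       nullptr, cs.GetAddressOf());
}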

Create NVIDIA Image Scaling SDK configuration constant buffer

struct NISConfig
{
    float kDetectRatio;
    float kDetectThres;
    float kMinContrastRatio;
    float kRatioNorm;
    ...
};

NISConfig config;
createConstBuffer(&config, &csBuffer);
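
createConstBuffer is a helper from the sample code; a minimal D3D11 sketch of one possible implementation (an assumption, not the sample's exact code) is shown below. The buffer is created dynamic and filled later by updateConstBuffer.

#include <d3d11.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Hypothetical helper: creates a dynamic constant buffer large enough for NISConfig.
// Constant buffer sizes must be multiples of 16 bytes, hence the rounding.
template <typename T>
HRESULT createConstBuffer(ID3D11Device* device, ComPtr<ID3D11Buffer>& buffer)
{
    D3D11_BUFFER_DESC desc{};
    desc.ByteWidth = (UINT(sizeof(T)) + 15u) & ~15u;  // round up to a 16-byte multiple
    desc.Usage = D3D11_USAGE_DYNAMIC;                 // updated from the CPU when the config changes
    desc.BindFlags = D3D11_BIND_CONSTANT_BUFFER;
    desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
    // Contents are uploaded later (Map/Unmap), so no initial data is provided here.
    return device->CreateBuffer(&desc, nullptr, buffer.GetAddressOf());
}

Under these assumptions it would be invoked as createConstBuffer<NISConfig>(device, csBuffer).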

Create SRV textures for the scaler and USM phase coefficients

const int rowPitch = kFilterSize * 4;
const int imageSize = rowPitch * kPhaseCount;

createTexture2D(kFilterSize / 4, kPhaseCount, DXGI_FORMAT_R32G32B32A32_FLOAT, D3D11_USAGE_DEFAULT, coef_scaler, rowPitch, imageSize, &scalerTex);

createTexture2D(kFilterSize / 4, kPhaseCount, DXGI_FORMAT_R32G32B32A32_FLOAT, D3D11_USAGE_DEFAULT, coef_usm, rowPitch, imageSize, &usmTex);

createSRV(scalerTex.Get(), DXGI_FORMAT_R32G32B32A32_FLOAT, &scalerSRV);
createSRV(usmTex.Get(), DXGI_FORMAT_R32G32B32A32_FLOAT, &usmSRV);
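
createTexture2D and createSRV are helpers from the sample code; a rough D3D11 sketch of what the coefficient upload might look like (an assumption, not the sample's exact code):

#include <d3d11.h>
#include <wrl/client.h>
#include <cstdint>
using Microsoft::WRL::ComPtr;

// Hypothetical sketch: each coefficient table is (kFilterSize / 4) float4 texels wide
// and kPhaseCount rows tall; rowPitchBytes is kFilterSize * 4 bytes, as computed above.
HRESULT CreateCoefficientTexture(ID3D11Device* device, const float* coef,
                                 uint32_t width, uint32_t height, uint32_t rowPitchBytes,
                                 ComPtr<ID3D11Texture2D>& tex,
                                 ComPtr<ID3D11ShaderResourceView>& srv)
{
    D3D11_TEXTURE2D_DESC desc{};
    desc.Width = width;                        // kFilterSize / 4
    desc.Height = height;                      // kPhaseCount
    desc.MipLevels = 1;
    desc.ArraySize = 1;
    desc.Format = DXGI_FORMAT_R32G32B32A32_FLOAT;
    desc.SampleDesc.Count = 1;
    desc.Usage = D3D11_USAGE_DEFAULT;
    desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;

    D3D11_SUBRESOURCE_DATA init{};
    init.pSysMem = coef;                       // coef_scale or coef_usm from NIS_Config.h
    init.SysMemPitch = rowPitchBytes;

    HRESULT hr = device->CreateTexture2D(&desc, &init, tex.GetAddressOf());
    if (FAILED(hr)) return hr;
    return device->CreateShaderResourceView(tex.Get(), nullptr, srv.GetAddressOf());
}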

Create Sampler

createLinearClampSampler(&linearClampSampler);
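
createLinearClampSampler is a helper from the sample code; a minimal D3D11 sketch of a linear-clamp sampler (an assumption about the helper, with the full descriptor zero-initialized so no field is left undefined) is:

#include <d3d11.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Hypothetical sketch of the linear clamp sampler required by NVScaler/NVSharpen.
HRESULT createLinearClampSampler(ID3D11Device* device, ComPtr<ID3D11SamplerState>& sampler)
{
    D3D11_SAMPLER_DESC desc{};                      // zero-init avoids uninitialized fields
    desc.Filter = D3D11_FILTER_MIN_MAG_MIP_LINEAR;
    desc.AddressU = D3D11_TEXTURE_ADDRESS_CLAMP;
    desc.AddressV = D3D11_TEXTURE_ADDRESS_CLAMP;
    desc.AddressW = D3D11_TEXTURE_ADDRESS_CLAMP;
    desc.ComparisonFunc = D3D11_COMPARISON_NEVER;
    desc.MinLOD = 0.0f;
    desc.MaxLOD = D3D11_FLOAT32_MAX;
    return device->CreateSamplerState(&desc, sampler.GetAddressOf());
}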

Update NVIDIA Image Scaling SDK NVScaler configuration and constant buffer

Use the following API call to update the NVIDIA Image Scaling SDK configuration.

void NVScalerUpdateConfig(NISConfig& config,
    float sharpness,
    uint32_t inputViewportOriginX, uint32_t inputViewportOriginY,
    uint32_t inputViewportWidth, uint32_t inputViewportHeight,
    uint32_t inputTextureWidth, uint32_t inputTextureHeight,
    uint32_t outputViewportOriginX, uint32_t outputViewportOriginY,
    uint32_t outputViewportWidth, uint32_t outputViewportHeight,
    uint32_t outputTextureWidth, uint32_t outputTextureHeight,
    NISHDRMode hdrMode = NISHDRMode::None
);

Update the constant buffer whenever the input size, sharpness, or scale changes.

NVScalerUpdateConfig(m_config, sharpness,
                0, 0, inputWidth, inputHeight, inputWidth, inputHeight,
                0, 0, outputWidth, outputHeight, outputWidth, outputHeight,
                NISHDRMode::None);

updateConstBuffer(&config, csBuffer.Get());
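
updateConstBuffer is a helper from the sample code; assuming the constant buffer was created with D3D11_USAGE_DYNAMIC, it could be implemented with Map/Unmap as in this sketch:

#include <d3d11.h>
#include <cstring>
#include "NIS_Config.h"   // NISConfig

// Hypothetical sketch: copies the updated NISConfig into a dynamic constant buffer.
HRESULT updateConstBuffer(ID3D11DeviceContext* context, const NISConfig& config,
                          ID3D11Buffer* buffer)
{
    D3D11_MAPPED_SUBRESOURCE mapped{};
    HRESULT hr = context->Map(buffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
    if (FAILED(hr)) return hr;
    std::memcpy(mapped.pData, &config, sizeof(NISConfig));
    context->Unmap(buffer, 0);
    return S_OK;
}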

A simple DX11 NVScaler dispatch example

context->CSSetShaderResources(0, 1, input); // SRV
context->CSSetShaderResources(1, 1, scalerSRV.GetAddressOf());
context->CSSetShaderResources(2, 1, usmSRV.GetAddressOf());
context->CSSetUnorderedAccessViews(0, 1, output, nullptr);
context->CSSetSamplers(0, 1, linearClampSampler.GetAddressOf());
context->CSSetConstantBuffers(0, 1, csBuffer.GetAddressOf());
context->CSSetShader(NVScalerCS.Get(), nullptr, 0);

context->Dispatch(UINT(std::ceil(outputWidth / float(blockWidth))),
                  UINT(std::ceil(outputHeight / float(blockHeight))), 1);

Integration of NVSharpen

If your application requires upscaling and sharpening, do not use NVSharpen; use NVScaler instead. Since NVScaler performs both operations (upscaling and sharpening) in one step, it is faster and produces better image quality.

Compile the NIS_Main.hlsl shader

NIS_SCALER should be set to 0, and the optimizer's isUpscaling argument should be set to false.

bool isUpscaling = false;
NISOptimizer opt(isUpscaling, NISGPUArchitecture::NVIDIA_Generic);
uint32_t blockWidth = opt.GetOptimalBlockWidth();
uint32_t blockHeight = opt.GetOptimalBlockHeight();
uint32_t threadGroupSize = opt.GetOptimalThreadGroupSize();

Defines defines;
defines.add("NIS_DIRSCALER", isUpscaling);
defines.add("NIS_HDR_MODE", hdrMode);
defines.add("NIS_BLOCK_WIDTH", blockWidth);
defines.add("NIS_BLOCK_HEIGHT", blockHeight);
defines.add("NIS_THREAD_GROUP_SIZE", threadGroupSize);
NVSharpenCS = CompileComputeShader(device, "NIS_Main.hlsl", &defines);

Create NVIDIA Image Scaling SDK NVSharpen configuration constant buffer

struct NISConfig
{
    float kDetectRatio;
    float kDetectThres;
    float kMinContrastRatio;
    float kRatioNorm;
    ...
};

NISConfig config;
createConstBuffer(&config, &csBuffer);

Create Sampler

createLinearClampSampler(&linearClampSampler);

Update NVIDIA Image Scaling SDK NVSharpen configuration and constant buffer

Use the following API call to update the NVIDIA Image Scaling SDK configuration. Since NVSharpen is a sharpening-only algorithm, only the sharpness and input size are required. For upscaling with sharpening, use NVScaler since it performs both operations at the same time.

void NVSharpenUpdateConfig(NISConfig& config, float sharpness,
    uint32_t inputViewportOriginX, uint32_t inputViewportOriginY,
    uint32_t inputViewportWidth, uint32_t inputViewportHeight,
    uint32_t inputTextureWidth, uint32_t inputTextureHeight,
    uint32_t outputViewportOriginX, uint32_t outputViewportOriginY,
    NISHDRMode hdrMode = NISHDRMode::None
);

Update the constant buffer whenever the input size or sharpness changes.

NVSharpenUpdateConfig(m_config, sharpness,
                      0, 0, inputWidth, inputHeight, inputWidth, inputHeight,
                      0, 0, NISHDRMode::None);

updateConstBuffer(&config, csBuffer.Get());

A simple DX11 NVSharpen dispatch example

context->CSSetShaderResources(0, 1, input);
context->CSSetUnorderedAccessViews(0, 1, output, nullptr);
context->CSSetSamplers(0, 1, linearClampSampler.GetAddressOf());
context->CSSetConstantBuffers(0, 1, csBuffer.GetAddressOf());
context->CSSetShader(NVSharpenCS.Get(), nullptr, 0);

context->Dispatch(UINT(std::ceil(outputWidth / float(blockWidth))),
                  UINT(std::ceil(outputHeight / float(blockHeight))), 1);

Samples

Dependencies

Build

$> cd samples
$> mkdir build
$> cd build
$> cmake ..

Open the solution with Visual Studio 2019. Right-click the sample project and select "Set as Startup Project" before building the project.

Comments
  • NIS 1.0.1 regression vs NIS 1.0.0

    Hi,

    I've noticed a bug introduced since 1.0.1 which is still not fully corrected with the latest commit https://github.com/NVIDIAGameWorks/NVIDIAImageScaling/commit/aa37be760496e6d32a1f2d6b6faba40a00c76a9d

    The color planes are not fully aligned and shifting is visible. This was clear with NIS 1.0.0.

    Here is a screenshot...

    In this screenshot we can see the red color plane is not aligned with the others.

    NIS scale 66% sharpen 50%

    FS2020_20220111_134937_NIS_150_50

    Thanks!

    opened by CptLucky8 10
  • Sample code has some uninitialized values causing random D3D failures

    Hi! I'm on 38402c9efb67ee5a004d03d3545362b9435d1bae, and while integrating the DX11 sample code into my application I noticed that a few data structures are not zeroed and have uninitialized fields. This leads to some D3D calls randomly failing.

    The one I hit last night was with the samplerDesc in the bilinear upscaler:

    https://github.com/NVIDIAGameWorks/NVIDIAImageScaling/blob/main/samples/DX11/src/BilinearUpscale.cpp#L105

    There is no ZeroMemory() and the MinLOD field is left uninitialized. Somehow this works OK when I am using a certain device, but with another one, I get this error with the D3D debug layer:

    D3D11 ERROR: ID3D11Device::CreateSamplerState: MinLOD be in the range [-INF to +INF].  -1.#QNAN0 specified. [ STATE_CREATION ERROR #228: CREATESAMPLERSTATE_INVALIDMINLOD]
    

    Same remark in the DeviceResources implementation here:

    https://github.com/NVIDIAGameWorks/NVIDIAImageScaling/blob/main/samples/DX11/src/DeviceResources.cpp#L127

    This goes away by explicitly setting MinLOD to 0.

    Thanks!

    PS: You have done an amazing job with both the NIS shader and the sample code! It's been super smooth to integrate into my application. Thank you!

    opened by mbucchia 2
  • Create initial CI configuration with GitHub Actions

    Adds a simple CI configuration using GitHub Actions. It builds the samples on each push, pull request and when requested (workflow_dispatch), using CMake and MSVC.

    An example run can be found here.

    opened by EwoutH 2
  • Anyone tried NIS on Android?

    Hi Nvidia, I'm looking for a good quality image upscaler/sharpener for use in my VR app (Oculus Quest) and am considering NIS as an included shader header for postprocessing. Has anyone tried using NIS on Quest?

    My app isn't doing anything much (it's just a custom CloudXR client), so performance shouldn't be that critical, I hope. I will of course try to optimize it later on. Someone did optimize FSR 1.0 to run on Android, which is my other option, but it's out of date and I'd rather use NIS if I can.

    My app is also GL ES based, so I'd need to make sure this HLSL code can even compile & run properly via glslLangValidator on Android / ES. Has anyone tried that?

    Thanks for any tips / suggestions. Happy new year.

    opened by BattleAxeVR 0
Releases (v1.0.3)
  • v1.0.3 (Aug 22, 2022)

    Release v1.0.3

    This release includes the following features:

    • Added NIS Streamline Plug-in sample
    • Added NV12 input texture support
    • Added loop unrolling performance optimization
    • Fixed slangc and fxc warnings
    • Added optional output clamp for unformatted writes
    • Updated DXC version

  • v1.0.2 (Feb 15, 2022)

    Release v1.0.2

    This release includes the following features:

    • Added multiple performance optimizations
      • Moved the edge-map interpolation and weight computation before the directional filter response generation
      • Increased detection ratio by a factor of 2
    • Fixed host and compute shader compilation warnings
    • Adjusted sharpness minimum value and normalization
    • Updated copyright notice
  • v1.0.1 (Jan 11, 2022)

    Release v1.0.1

    This release includes the following features:

    • Performance optimizations
    • fp16 coefficients support
    • GLSL support
    • DX12 and Vulkan samples

    Update notes

    • Fixed sampler initialization in the DX11 and DX12 sample apps reported by @mbucchia (thanks Matthieu!)
    • Fixed .gitignore and missing DXC lib and dll files
    • Fixed NVScaler pixel shift under certain conditions
  • v1.0.0 (Nov 24, 2021)
