VapourSynth-BM3DCUDA
Copyright© 2021 WolframRhodium
BM3D denoising filter for VapourSynth, implemented in CUDA
Description
Please check VapourSynth-BM3D.
Requirements
- CPU with AVX support.
- CUDA-enabled GPU(s) of compute capability 5.0 or higher.
- GPU driver 450 or newer.
Compute capability as low as 3.0 is supported, but it requires manual compilation with the nvcc flag -gencode arch=compute_30,code=sm_30.
The _rtc version compiles code at runtime. It requires GPU driver 465 or newer and depends on nvrtc64_112_0.dll/libnvrtc.so.11.2 and nvrtc-builtins64_113.dll/libnvrtc-builtins.so.11.3.109.
Parameters
bm3dcuda[_rtc].BM3D(clip clip[, clip ref=None, float[] sigma=3.0, int[] block_step=8, int[] bm_range=9, int radius=0, int[] ps_num=2, int[] ps_range=4, bint chroma=False, int device_id=0, bool fast=True])
- clip:
  The input clip. Must be in 32-bit float format. Each plane is denoised separately if chroma is set to False.
- ref:
  The reference clip. Must have the same format, width, height and number of frames as clip.
  Used in block-matching and as the reference in empirical Wiener filtering, i.e. bm3d.Final / bm3d.VFinal.
- sigma:
  The strength of denoising for each plane.
  The strength is similar (but not strictly equal) to that of VapourSynth-BM3D due to differences in implementation (coefficient normalization is not implemented, for example).
  Default [3,3,3].
- block_step, bm_range, radius, ps_num, ps_range:
  Same as those in VapourSynth-BM3D.
  If chroma is set to True, only the first value is in effect.
  Otherwise an array of values may be specified for each plane.
- chroma:
  CBM3D algorithm. clip must be of YUV444PS format.
  Y channel is used in block-matching of chroma channels. Default False.
- device_id:
  Set GPU to be used.
  Default 0.
- fast:
  Multi-threaded copy between CPU and GPU at the expense of 4x memory consumption.
  Default True.
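As a rough illustration of how the parameters above fit together, the following sketch collects them into keyword arguments and forwards them to bm3dcuda.BM3D. The helper names bm3d_kwargs and denoise are hypothetical and not part of the plugin. The sketch assumes that VapourSynth and the plugin are installed and that the clip is already in 32-bit float format.

```python
def bm3d_kwargs(sigma=3.0, chroma=False, radius=0, device_id=0, fast=True):
    """Hypothetical helper: build keyword arguments for bm3dcuda.BM3D."""
    # Accept either a single strength or one strength per plane.
    if isinstance(sigma, (list, tuple)):
        sigma = list(sigma)
    else:
        sigma = [sigma] * 3
    if len(sigma) != 3:
        raise ValueError("expected one sigma value or one value per plane")
    return dict(sigma=sigma, chroma=chroma, radius=radius,
                device_id=device_id, fast=fast)


def denoise(clip, ref=None, **options):
    """Hypothetical wrapper: call bm3dcuda.BM3D on a 32-bit float clip."""
    # Imported lazily so that bm3d_kwargs can be used without VapourSynth.
    import vapoursynth as vs

    kwargs = bm3d_kwargs(**options)
    if ref is not None:
        kwargs["ref"] = ref  # only pass ref when a reference clip is given
    return vs.core.bm3dcuda.BM3D(clip, **kwargs)
```

For example, denoise(clip, sigma=[3, 3, 3]) denoises each plane separately, while denoise(clip, chroma=True) requires a YUV444PS clip as described above.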
Notes
bm3d.VAggregate should be called after temporal filtering, as in VapourSynth-BM3D.
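A minimal sketch of this order of calls follows. The function name denoise_temporal is hypothetical. It assumes that both this plugin and VapourSynth-BM3D are loaded, and that sample=1 selects 32-bit float output in bm3d.VAggregate, as described in the VapourSynth-BM3D documentation.

```python
def denoise_temporal(clip, radius=1, sigma=3.0):
    """Hypothetical helper: temporal BM3D followed by aggregation."""
    if radius < 1:
        raise ValueError("temporal filtering needs radius >= 1")

    # Imported lazily; requires VapourSynth with bm3dcuda and bm3d loaded.
    import vapoursynth as vs

    # Temporal filtering on the GPU (radius > 0).
    flt = vs.core.bm3dcuda.BM3D(clip, sigma=sigma, radius=radius)
    # Aggregation with the same radius, done by VapourSynth-BM3D.
    # sample=1 is assumed to request 32-bit float output.
    return vs.core.bm3d.VAggregate(flt, radius=radius, sample=1)
```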
Statistics
GPU memory consumption:
(ref ? 4 : 3) * (chroma ? 3 : 1) * (fast ? 4 : 1) * (2 * radius + 1) * size_of_a_single_frame
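The formula above can be evaluated with a small helper such as the sketch below. The function name is hypothetical, and the sketch assumes that size_of_a_single_frame means one plane of 32-bit float samples, i.e. width * height * 4 bytes. That interpretation is a guess and is not stated in this document.

```python
def estimate_gpu_memory(width, height, radius=0, ref=False,
                        chroma=False, fast=True, bytes_per_sample=4):
    """Hypothetical helper: evaluate the memory formula, in bytes."""
    # Assumption: a "single frame" is one plane of 32-bit float samples.
    frame_bytes = width * height * bytes_per_sample
    return ((4 if ref else 3)
            * (3 if chroma else 1)
            * (4 if fast else 1)
            * (2 * radius + 1)
            * frame_bytes)


if __name__ == "__main__":
    # 1920x1080 with default settings (radius=0, no ref, fast=True).
    print(estimate_gpu_memory(1920, 1080) / 2**20, "MiB")
```

Under that assumption, a 1920x1080 clip with default settings evaluates to 99532800 bytes (about 95 MiB), and setting fast=False divides the estimate by four.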
Compilation on Linux
Standard version
- g++ 11 (or higher) is required to compile source.cpp, while nvcc 11.3 only supports g++ 10 or older.
- Unused nvcc flags may be removed. See the nvcc documentation for -gencode.
cd source
nvcc kernel.cu -o kernel.o -c --use_fast_math --std=c++17 -gencode arch=compute_50,code=\"sm_50,compute_50\" -gencode arch=compute_52,code=sm_52 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_86,code=\"sm_86,compute_86\" -t 0 --compiler-bindir g++-10
g++-11 source.cpp kernel.o -o libbm3dcuda.so -shared -fPIC -I/usr/local/cuda-11.3/include -I/usr/local/include -L/usr/local/cuda-11.3/lib64 -lcudart_static --std=c++20 -march=native -O3
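When trimming the nvcc flags to the GPUs actually in use, the -gencode arguments can be generated rather than edited by hand. The sketch below is one way to do that. The function name gencode_flags is hypothetical, and it only produces the plain arch=compute_XX,code=sm_XX form seen in the command above, not the variant that also embeds PTX.

```python
def gencode_flags(capabilities):
    """Hypothetical helper: build nvcc -gencode arguments.

    capabilities is an iterable of compute capabilities written
    without the dot, e.g. 61 for compute capability 6.1.
    """
    flags = []
    for cc in capabilities:
        # One -gencode pair per target architecture.
        flags += ["-gencode", f"arch=compute_{cc},code=sm_{cc}"]
    return flags


if __name__ == "__main__":
    # Example: build only for compute capability 6.1 and 7.5.
    print(" ".join(gencode_flags([61, 75])))
```

The resulting list can be spliced into the nvcc command in place of the -gencode arguments that are not needed.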
RTC version
cd rtc_source
g++-11 source.cpp -o libbm3drtc.so -shared -fPIC -I /usr/local/cuda-11.3/include -I /usr/local/include -L /usr/local/cuda-11.3/lib64 -lnvrtc -lcuda -Wl,-rpath,/usr/local/cuda-11.3/lib64 --std=c++20 -march=native -O3