Optimized & Generic ML Filter Runtimes for VapourSynth (with builtin support for waifu2x, RealESRGANv2 & DPIR)

Overview

vs-mlrt

VapourSynth ML filter runtimes.

Please see the wiki for supported models.

vsov: OpenVINO-based Pure CPU Runtime

OpenVINO is an AI inference runtime developed by Intel, mainly targeting x86 CPUs and Intel GPUs.

The vs-openvino plugin provides an optimized pure-CPU runtime for some popular AI filters, with Intel GPU support planned for the future.

To install, download the latest release and extract it into your VS plugins directory.

Please visit the vsov directory for details.
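
A minimal usage sketch through the vsmlrt.py wrapper (the model choice and clip are illustrative):

    import vapoursynth as vs
    from vsmlrt import Waifu2x, Waifu2xModel, Backend

    core = vs.core

    src = core.std.BlankClip(format=vs.RGBS, width=640, height=360)
    # OV_CPU runs the network on the CPU via OpenVINO.
    flt = Waifu2x(src, noise=-1, scale=2,
                  model=Waifu2xModel.upconv_7_anime_style_art_rgb,
                  backend=Backend.OV_CPU())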

vsort: ONNX Runtime-based CPU/GPU Runtime

ONNX Runtime is an AI inference runtime with many backends.

The vs-onnxruntime plugin provides an optimized CPU and CUDA GPU runtime for some popular AI filters.

To install, download the latest release and extract it into your VS plugins directory.

Please visit the vsort directory for details.
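
A minimal usage sketch through the vsmlrt.py wrapper (parameter values are illustrative):

    import vapoursynth as vs
    from vsmlrt import DPIR, DPIRModel, Backend

    core = vs.core

    src = core.std.BlankClip(format=vs.RGBS, width=1280, height=720)
    # ORT_CUDA runs on NVidia GPUs; num_streams > 1 trades GPU memory
    # for extra throughput.
    flt = DPIR(src, strength=5, model=DPIRModel.drunet_color,
               backend=Backend.ORT_CUDA(device_id=0, num_streams=2))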

vstrt: TensorRT-based GPU Runtime

TensorRT is a highly optimized AI inference runtime for NVidia GPUs. It benchmarks candidate kernels to find the optimal ones for your specific GPU, so there is an extra step of building an engine from the ONNX network on the machine where you will run the vstrt filter, and this extra step makes deploying models a little harder than with the other runtimes. However, the resulting performance is also typically much better than that of the CUDA backend of vsort.

To install, download the latest release and extract it into your VS plugins directory.

Please visit the vstrt directory for details.
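
In practice the vsmlrt.py wrapper handles engine compilation transparently (it invokes the bundled trtexec and caches the engine next to the onnx model), so a minimal sketch looks the same as for the other backends, with a slow first run:

    import vapoursynth as vs
    from vsmlrt import Waifu2x, Waifu2xModel, Backend

    core = vs.core

    src = core.std.BlankClip(format=vs.RGBS, width=1920, height=1080)
    # The first run builds and caches a TensorRT engine for this GPU and
    # tile size; subsequent runs reuse the cached engine.
    flt = Waifu2x(src, noise=-1, scale=2,
                  model=Waifu2xModel.upconv_7_anime_style_art_rgb,
                  backend=Backend.TRT(fp16=True))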

Comments
  • Failed to create engine file for RIFEv4.6


    [11/21/2022-14:01:02] [W] [TRT] onnx2trt_utils.cpp:365: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
    [11/21/2022-14:01:02] [W] [TRT] onnx2trt_utils.cpp:391: One or more weights outside the range of INT32 was clamped
    [11/21/2022-14:01:02] [E] Error[4]: [shuffleNode.cpp::nvinfer1::builder::ShuffleNode::symbolicExecute::391] Error Code 4: Internal Error (/Reshape: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2])
    [11/21/2022-14:01:02] [E] [TRT] ModelImporter.cpp:748: While parsing node number 95 [Pad -> "/Pad_output_0"]:
    [11/21/2022-14:01:02] [E] [TRT] ModelImporter.cpp:749: --- Begin node ---
    [11/21/2022-14:01:02] [E] [TRT] ModelImporter.cpp:750: input: "/Slice_output_0"
    input: "/Cast_5_output_0"
    input: ""
    output: "/Pad_output_0"
    name: "/Pad"
    op_type: "Pad"
    attribute {
      name: "mode"
      s: "constant"
      type: STRING
    }
    
    [11/21/2022-14:01:02] [E] [TRT] ModelImporter.cpp:751: --- End node ---
    [11/21/2022-14:01:02] [E] [TRT] ModelImporter.cpp:754: ERROR: ModelImporter.cpp:179 In function parseGraph:
    [6] Invalid Node - /Pad
    [shuffleNode.cpp::nvinfer1::builder::ShuffleNode::symbolicExecute::391] Error Code 4: Internal Error (/Reshape: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2])
    [11/21/2022-14:01:02] [E] Failed to parse onnx file
    [11/21/2022-14:01:02] [I] Finish parsing network model
    [11/21/2022-14:01:02] [E] Parsing model failed
    [11/21/2022-14:01:02] [E] Failed to create engine from model or file.
    [11/21/2022-14:01:02] [E] Engine set up failed
    

Specs: Ryzen 3700X and RTX 3060. trtexec command:

    trtexec --fp16 --onnx="rife46_ensembleTrue_opset17.onnx" --minShapes=input:1x8x64x64 --optShapes=input:1x8x720x1280 --maxShapes=input:1x8x1080x1920 --saveEngine=model.engine --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --buildOnly
    
    opened by banjaminicc 25
  • Suggestion: TTA implementation


I know TTA is not that amazing a feature, yet small image-processing tasks and short videos would benefit from the extra accuracy in their VapourSynth scripts.

    opened by xurich-xulaco 5
  • Suggestion: Using VMAF for duplicated frame check


VapourSynth-RIFE-ncnn-Vulkan uses VMAF to determine the similarity of frames. Processing of similar frames can be skipped, resulting in faster rendering. In animation, frames are often duplicated. I would really like to see that feature for vstrt, or at least I would expect a performance increase from it.

if (sceneChange || psnrY >= d->skipThreshold) {
    dst = vsapi->copyFrame(src0, core);
}
    

But instead of copying the original, it could copy the last result, so the model does not need to run on essentially the same image again.
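
A minimal sketch of the idea in VapourSynth Python, using PlaneStats as a stand-in for a VMAF/PSNR similarity check (the helper name, metric and threshold are illustrative, not part of vs-mlrt):

    import vapoursynth as vs
    from vsmlrt import CUGAN, Backend

    core = vs.core

    def skip_duplicates(src: vs.VideoNode, flt: vs.VideoNode,
                        threshold: float = 1e-4) -> vs.VideoNode:
        # Compare each source frame against the previous one.
        prev_src = src[0] + src[:-1]
        stats = core.std.PlaneStats(src, prev_src)
        # Filtered clip delayed by one frame.
        prev_flt = flt[0] + flt[:-1]

        def select(n, f):
            # Reuse the previous filtered frame when the source barely
            # changed, so frame n of `flt` is never requested. (Runs of
            # duplicates still resolve one step at a time.)
            if n > 0 and f.props["PlaneStatsDiff"] < threshold:
                return prev_flt
            return flt

        return core.std.FrameEval(flt, eval=select, prop_src=stats)

    src = core.lsmas.LWLibavSource("input.mkv")  # RGBS conversion omitted
    flt = CUGAN(src, noise=0, scale=2, backend=Backend.TRT(fp16=True))
    out = skip_duplicates(src, flt)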

    opened by styler00dollar 5
  • [local-issue] Incompatible dimensions when using ml_PP-OCRv3_det.onnx


I ran into the following error while trying to use an OCR model:

    2022-12-08 18:16:53.514: ERROR: operator (): 'ortapi->CreateSessionFromArray( d->environment, std::data(onnx_data), std::size(onnx_data), session_options, &resource.session )' failed: Node (Add_43) Op (Add) [ShapeInferenceError] Incompatible dimensions
    Traceback (most recent call last):
      File "C:\Users\dtlnor\Downloads\Compressed\vapoursynth_portable_22H2p_full\test.vpy", line 48, in <module>
        OCRTest(src32, tiles=1),
      File "C:\Users\dtlnor\Downloads\Compressed\vapoursynth_portable_22H2p_full\test.vpy", line 32, in OCRTest
        res = inference(
      File "C:\Users\dtlnor\Downloads\Compressed\vapoursynth_portable_22H2p_full\vapoursynth\VapourSynthScripts\vsmlrt.py", line 908, in inference
        clip = core.ort.Model(
      File "src\cython\vapoursynth.pyx", line 2727, in vapoursynth.Function.__call__
    vapoursynth.Error: operator (): 'ortapi->CreateSessionFromArray( d->environment, std::data(onnx_data), std::size(onnx_data), session_options, &resource.session )' failed: Node (Add_43) Op (Add) [ShapeInferenceError] Incompatible dimensions
    

    info:

Hardware:
    CPU: AMD Ryzen 9 5900X
    GPU: RTX 4090
    
    System:
    Edition	Windows 10 Pro
    Version	21H2
    OS build	19044.1645
    

VS version: vapoursynth_portable_22H2p_full, with no other customizations.

    script:

    import vapoursynth as vs
    import sys
    import havsfunc as haf
    import mvsfunc as mvf
    import descale
    import nnedi3_resample as nnrs
    import math
    import enum
    import numpy as np
    
    core = vs.core
    
    def OCRTest(clip: vs.VideoNode, tiles = 1):
        from vsmlrt import Backend
        from vsmlrt import inference
        from vsmlrt import calc_tilesize
        from vsmlrt import init_backend

        (tile_w, tile_h), (overlap_w, overlap_h) = calc_tilesize(
            tiles=tiles, tilesize=None,
            width=clip.width, height=clip.height,
            multiple=1,
            overlap_w=0, overlap_h=0
        )

        res = inference(
            [clip],
            network_path=r"somewhere\ml_PP-OCRv3_det.onnx",
            overlap=(overlap_w, overlap_h),
            tilesize=(tile_w, tile_h),
            backend=Backend.ORT_CUDA()
        )

        return res
    
    sourcename = r"test.mp4"
    source= core.lsmas.LWLibavSource(sourcename)
    
    src32 = mvf.ToRGB(source, depth= 32, sample=1)
    
    mvf.Preview([
        OCRTest(src32, tiles=1), 
        ], depth=8).set_output()
    

Is my input format wrong? (e.g. does the input need to be a multiple of something?) Or should the input not be RGBS?

    opened by dtlnor 2
  • No module named 'onnx'


    Hello, I got this error when I tried to use the script. Here are more details.

Since the wiki hadn't been updated, I tried to feel my way through.

It worked well, but if I added the param scale=0.5, I would get the error msg: No module named 'onnx'

    opened by hooke007 2
• TRT fails to build an engine for the RealCUGAN pro 3X model


When using Backend.TRT(), RealCUGAN fails to build an engine with the pro 3X model, while the pro 2X model and the up 2X, 3X and 4X models all build engines normally. Test environment: GPUs: 3090, V100, T4 (same behavior on all); VapourSynth R59; vsmlrt v12. The vpy file is as follows:

    import vapoursynth as vs
    from vsmlrt import CUGAN,RealESRGAN,Backend
    core = vs.core
    device=Backend.TRT()
    device.device_id=0
    device.fp16=True
    res = core.lsmas.LWLibavSource(r"D:/video-cugan/test_709.mp4")
    res = core.resize.Bicubic(clip=res,format=vs.YUV444P16)
    res = core.resize.Bicubic(clip=res,range=1,matrix_in_s="709",format=vs.RGB48)
    res=core.fmtc.bitdepth(res, bits=32)
    res = CUGAN(res, noise=0, scale=3, tiles=1,version=2,alpha=1.00, backend=device)
    res = core.resize.Bicubic(clip=res,matrix_s="709",format=vs.YUV444P16)
    res.set_output()
    

The error output is as follows:

    &&&& RUNNING TensorRT.trtexec [TensorRT v8501] # D:/VapourSynth_V12/package/vapoursynth64/coreplugins\vsmlrt-cuda\trtexec --onnx=D:/VapourSynth_V12/package/vapoursynth64/coreplugins\models\cugan\pro-conservative-up3x.onnx_alpha0.8.onnx --memPoolSize=workspace:128 --timingCacheFile=D:/VapourSynth_V12/package/vapoursynth64/coreplugins\models\cugan\pro-conservative-up3x.onnx_alpha0.8.onnx.1920x1080_fp16_workspace128_trt-8501_NVIDIA-GeForce-RTX-3090_495298c.engine.cache --device=1 --saveEngine=D:/VapourSynth_V12/package/vapoursynth64/coreplugins\models\cugan\pro-conservative-up3x.onnx_alpha0.8.onnx.1920x1080_fp16_workspace128_trt-8501_NVIDIA-GeForce-RTX-3090_495298c.engine --shapes=input:1x3x1080x1920 --fp16 --tacticSources=-CUBLAS,-CUBLAS_LT --buildOnly
    [12/20/2022-23:08:32] [I] === Model Options ===
    [12/20/2022-23:08:32] [I] Format: ONNX
    [12/20/2022-23:08:32] [I] Model: D:/VapourSynth_V12/package/vapoursynth64/coreplugins\models\cugan\pro-conservative-up3x.onnx_alpha0.8.onnx
    [12/20/2022-23:08:32] [I] Output:
    [12/20/2022-23:08:32] [I] === Build Options ===
    [12/20/2022-23:08:32] [I] Max batch: explicit batch
    [12/20/2022-23:08:32] [I] Memory Pools: workspace: 128 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
    [12/20/2022-23:08:32] [I] minTiming: 1
    [12/20/2022-23:08:32] [I] avgTiming: 8
    [12/20/2022-23:08:32] [I] Precision: FP32+FP16
    [12/20/2022-23:08:32] [I] LayerPrecisions: 
    [12/20/2022-23:08:32] [I] Calibration: 
    [12/20/2022-23:08:32] [I] Refit: Disabled
    [12/20/2022-23:08:32] [I] Sparsity: Disabled
    [12/20/2022-23:08:32] [I] Safe mode: Disabled
    [12/20/2022-23:08:32] [I] DirectIO mode: Disabled
    [12/20/2022-23:08:32] [I] Restricted mode: Disabled
    [12/20/2022-23:08:32] [I] Build only: Enabled
    [12/20/2022-23:08:32] [I] Save engine: D:/VapourSynth_V12/package/vapoursynth64/coreplugins\models\cugan\pro-conservative-up3x.onnx_alpha0.8.onnx.1920x1080_fp16_workspace128_trt-8501_NVIDIA-GeForce-RTX-3090_495298c.engine
    [12/20/2022-23:08:32] [I] Load engine: 
    [12/20/2022-23:08:32] [I] Profiling verbosity: 0
    [12/20/2022-23:08:32] [I] Tactic sources: cublas [OFF], cublasLt [OFF], 
    [12/20/2022-23:08:32] [I] timingCacheMode: global
    [12/20/2022-23:08:32] [I] timingCacheFile: D:/VapourSynth_V12/package/vapoursynth64/coreplugins\models\cugan\pro-conservative-up3x.onnx_alpha0.8.onnx.1920x1080_fp16_workspace128_trt-8501_NVIDIA-GeForce-RTX-3090_495298c.engine.cache
    [12/20/2022-23:08:32] [I] Heuristic: Disabled
    [12/20/2022-23:08:32] [I] Preview Features: Use default preview flags.
    [12/20/2022-23:08:32] [I] Input(s)s format: fp32:CHW
    [12/20/2022-23:08:32] [I] Output(s)s format: fp32:CHW
    [12/20/2022-23:08:32] [I] Input build shape: input=1x3x1080x1920+1x3x1080x1920+1x3x1080x1920
    [12/20/2022-23:08:32] [I] Input calibration shapes: model
    [12/20/2022-23:08:32] [I] === System Options ===
    [12/20/2022-23:08:32] [I] Device: 1
    [12/20/2022-23:08:32] [I] DLACore: 
    [12/20/2022-23:08:32] [I] Plugins:
    [12/20/2022-23:08:32] [I] === Inference Options ===
    [12/20/2022-23:08:32] [I] Batch: Explicit
    [12/20/2022-23:08:32] [I] Input inference shape: input=1x3x1080x1920
    [12/20/2022-23:08:32] [I] Iterations: 10
    [12/20/2022-23:08:32] [I] Duration: 3s (+ 200ms warm up)
    [12/20/2022-23:08:32] [I] Sleep time: 0ms
    [12/20/2022-23:08:32] [I] Idle time: 0ms
    [12/20/2022-23:08:32] [I] Streams: 1
    [12/20/2022-23:08:32] [I] ExposeDMA: Disabled
    [12/20/2022-23:08:32] [I] Data transfers: Enabled
    [12/20/2022-23:08:32] [I] Spin-wait: Disabled
    [12/20/2022-23:08:32] [I] Multithreading: Disabled
    [12/20/2022-23:08:32] [I] CUDA Graph: Disabled
    [12/20/2022-23:08:32] [I] Separate profiling: Disabled
    [12/20/2022-23:08:32] [I] Time Deserialize: Disabled
    [12/20/2022-23:08:32] [I] Time Refit: Disabled
    [12/20/2022-23:08:32] [I] NVTX verbosity: 0
    [12/20/2022-23:08:32] [I] Persistent Cache Ratio: 0
    [12/20/2022-23:08:32] [I] Inputs:
    [12/20/2022-23:08:32] [I] === Reporting Options ===
    [12/20/2022-23:08:32] [I] Verbose: Disabled
    [12/20/2022-23:08:32] [I] Averages: 10 inferences
    [12/20/2022-23:08:32] [I] Percentiles: 90,95,99
    [12/20/2022-23:08:32] [I] Dump refittable layers:Disabled
    [12/20/2022-23:08:32] [I] Dump output: Disabled
    [12/20/2022-23:08:32] [I] Profile: Disabled
    [12/20/2022-23:08:32] [I] Export timing to JSON file: 
    [12/20/2022-23:08:32] [I] Export output to JSON file: 
    [12/20/2022-23:08:32] [I] Export profile to JSON file: 
    [12/20/2022-23:08:32] [I] 
    [12/20/2022-23:08:32] [I] === Device Information ===
    [12/20/2022-23:08:32] [I] Selected Device: NVIDIA GeForce RTX 3090
    [12/20/2022-23:08:32] [I] Compute Capability: 8.6
    [12/20/2022-23:08:32] [I] SMs: 82
    [12/20/2022-23:08:32] [I] Compute Clock Rate: 1.695 GHz
    [12/20/2022-23:08:32] [I] Device Global Memory: 24575 MiB
    [12/20/2022-23:08:32] [I] Shared Memory per SM: 100 KiB
    [12/20/2022-23:08:32] [I] Memory Bus Width: 384 bits (ECC disabled)
    [12/20/2022-23:08:32] [I] Memory Clock Rate: 9.751 GHz
    [12/20/2022-23:08:32] [I] 
    [12/20/2022-23:08:32] [I] TensorRT version: 8.5.1
    [12/20/2022-23:08:34] [I] [TRT] [MemUsageChange] Init CUDA: CPU +449, GPU +0, now: CPU 23832, GPU 1429 (MiB)
    [12/20/2022-23:08:42] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +452, GPU +118, now: CPU 24764, GPU 1547 (MiB)
    [12/20/2022-23:08:42] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
    [12/20/2022-23:08:42] [I] Start parsing network model
    [12/20/2022-23:08:42] [I] [TRT] ----------------------------------------------------------------
    [12/20/2022-23:08:42] [I] [TRT] Input filename:   D:/VapourSynth_V12/package/vapoursynth64/coreplugins\models\cugan\pro-conservative-up3x.onnx_alpha0.8.onnx
    [12/20/2022-23:08:42] [I] [TRT] ONNX IR version:  0.0.7
    [12/20/2022-23:08:42] [I] [TRT] Opset version:    13
    [12/20/2022-23:08:42] [I] [TRT] Producer name:    pytorch
    [12/20/2022-23:08:42] [I] [TRT] Producer version: 1.10
    [12/20/2022-23:08:42] [I] [TRT] Domain:           
    [12/20/2022-23:08:42] [I] [TRT] Model version:    0
    [12/20/2022-23:08:42] [I] [TRT] Doc string:       
    [12/20/2022-23:08:42] [I] [TRT] ----------------------------------------------------------------
    [12/20/2022-23:08:42] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
    [12/20/2022-23:08:42] [E] [TRT] ModelImporter.cpp:726: While parsing node number 75 [Pad -> "191"]:
    [12/20/2022-23:08:42] [E] [TRT] ModelImporter.cpp:727: --- Begin node ---
    [12/20/2022-23:08:42] [E] [TRT] ModelImporter.cpp:728: input: "110"
    input: "188"
    input: "190"
    output: "191"
    name: "Pad_112"
    op_type: "Pad"
    attribute {
      name: "mode"
      s: "constant"
      type: STRING
    }
    
    [12/20/2022-23:08:42] [E] [TRT] ModelImporter.cpp:729: --- End node ---
    [12/20/2022-23:08:42] [E] [TRT] ModelImporter.cpp:732: ERROR: builtin_op_importers.cpp:3308 In function importPad:
    [8] Assertion failed: convertOnnxPadding(ctx, nbDims, onnxPadding, start, totalPadding) && "Failed to convert padding!"
    [12/20/2022-23:08:42] [E] Failed to parse onnx file
    [12/20/2022-23:08:42] [I] Finish parsing network model
    [12/20/2022-23:08:42] [E] Parsing model failed
    [12/20/2022-23:08:42] [E] Failed to create engine from model or file.
    [12/20/2022-23:08:42] [E] Engine set up failed
    &&&& FAILED TensorRT.trtexec [TensorRT v8501] # D:/VapourSynth_V12/package/vapoursynth64/coreplugins\vsmlrt-cuda\trtexec --onnx=D:/VapourSynth_V12/package/vapoursynth64/coreplugins\models\cugan\pro-conservative-up3x.onnx_alpha0.8.onnx --memPoolSize=workspace:128 --timingCacheFile=D:/VapourSynth_V12/package/vapoursynth64/coreplugins\models\cugan\pro-conservative-up3x.onnx_alpha0.8.onnx.1920x1080_fp16_workspace128_trt-8501_NVIDIA-GeForce-RTX-3090_495298c.engine.cache --device=1 --saveEngine=D:/VapourSynth_V12/package/vapoursynth64/coreplugins\models\cugan\pro-conservative-up3x.onnx_alpha0.8.onnx.1920x1080_fp16_workspace128_trt-8501_NVIDIA-GeForce-RTX-3090_495298c.engine --shapes=input:1x3x1080x1920 --fp16 --tacticSources=-CUBLAS,-CUBLAS_LT --buildOnly
    
    opened by NangInShell 2
  • Preloading error when piping/mounting the vpy script


    vstrt: failed to preload C:\Users\\AppData\Roaming\VapourSynth\plugins64\vsmlrt-cuda\cublas64_11.dll, errno 127
    

I've been getting this message after updating torch, and it seems to cause the pipe to end abruptly at random. Any ideas? I have tried rolling back my torch version, but it did not work.

    opened by banjaminicc 2
  • enhancr fork


    Hey,

    First let me thank you for the fine work on vsmlrt, which has made my project enhancr possible.

I would like to ask if you would consider not forking my GUI with Python dependencies & GitHub Actions that build the release automatically, as this effectively releases my bespoke front-end publicly. While it is of course your right to do that, it does undercut my model for supporting development of the project (Patreon donations).

    (For the avoidance of doubt, I'll be releasing free versions (i.e, not behind Patreon paywall, just 3 versions behind) as soon as the code is a bit more stable, as I've explained in the README.)

A lot of other people and I spent 7 months and a great deal of work building this project.

    If you are able to help me out with this, I think my work can help bring vsmlrt to a wider group of users through my UI while maintaining the open source ethos.

    So, can we work something out?

    Let's create more awesome stuff for users together 🤝

    Hope this makes sense, please do get in contact with me to discuss, and thanks again for your work.

    Kind regards,

    Mafio

    opened by mafiosnik777 0
• VRAM and color issues


Calling the function multiple times stacks VRAM usage until it runs out of GPU memory, while upcunet_v3_vs does not have this problem.

Example script:

    import vapoursynth as vs
    import mvsfunc as mvf
    import upcunet_v3_vs as realcugan
    realcugan = realcugan.RealWaifuUpScaler()
    from vsmlrt import *
    
    core = vs.core
    src=r"123.jpg"
    src = core.ffms2.Source(src)
    
    def upscale(clip):
    #        clip = realcugan(clip)
            clip = CUGAN(clip, noise=0, scale=2, backend=Backend.ORT_CUDA())
            return clip
    
    src = mvf.ToRGB(src, depth=32, matrix="709")
    src = core.std.Expr(src, "x 0 max 1 min")
    src2 = src
    src = upscale(src) 
    src2 = upscale(src2)
    
    res = core.std.Splice([src,src2])
    res.set_output()
    

Issue 2, the color problem: alpha=1, model=conservative.

Original image: 156495643-ab912d3a-533a-4868-ab73-431583f01067; output: Snipaste_2022-04-26_18-51-24 (image attachments).

    def upscale(clip):
            clip = realcugan(clip)
            return clip
            
    def upscale2(clip):
            clip = CUGAN(clip, noise=0, scale=2, tilesize=300,  backend=Backend.ORT_CUDA())
            return clip
    #...
    res = core.std.StackHorizontal([src,src2])
    res.set_output()
    
    opened by ueyome 8
  • Waifu2x-cunet brightness bug


    The current cunet-model version makes the output darker than it should be.

    It is a known bug in VS's Waifu2x ports, as it happens in the caffe port and was introduced to ncnn-vulkan in R4. ncnn-vulkan-R3.2 is the only one that doesn't have the bug.

Here is a comparison showing the input and the difference between a spline resize, vs-mlrt v2 and vs-mlrt v8; apparently the bug was introduced after v2: https://slow.pics/c/myFP5fXB

    opened by dnjulek 9
Releases(v12.2)
  • v12.2(Nov 23, 2022)

    Update vsmlrt.py:

• Introduce a new release artifact ext-models.v12.2.7z, which comes from External Models and is not bundled into the full binary release packages (i.e. the cpu, cuda and vk packages). Please refer to their release notes for details on how to use those models.

    • Export a new API vsmlrt.inference for inference of custom models.

      import vsmlrt
      output = vsmlrt.inference(clips, "path/to/onnx", backend=vsmlrt.Backend.TRT(fp16=True))
      

      If you encounter issues like Cannot find input tensor with name "input" in the network inputs! Please make sure the input tensor names are correct., you could use vsmlrt.inference(..., input_name=None) or export the model with its input name set to "input".

    • Fix trt inference of cugan-pro (3x) models. (https://github.com/AmusementClub/vs-mlrt/issues/15)

    Source code(tar.gz)
    Source code(zip)
    ext-models.v12.2.7z(161.70 MB)
    models.v12.2.7z(744.35 MB)
    scripts.v12.2.7z(9.16 KB)
    vsmlrt-cuda.v12.2.7z(1003.83 MB)
    vsmlrt-windows-x64-cpu.v12.2.7z(762.73 MB)
    vsmlrt-windows-x64-cuda.v12.2.7z(1930.97 MB)
    vsmlrt-windows-x64-vk.v12.2.7z(765.19 MB)
    VSNCNN-Windows-x64.v12.2.7z(2.13 MB)
    VSORT-Windows-x64.v12.2.7z(14.14 MB)
    VSOV-Windows-x64.v12.2.7z(11.48 MB)
    VSTRT-Windows-x64.v12.2.7z(381.36 KB)
  • external-models(Dec 7, 2022)

    More models!

    In addition to bundled models, vs-mlrt can also be used to run these models:

    With more to come.

    Usage

    You can use the generic vsmlrt.inference API to run these models (requires release v12.2 or later).

    import vsmlrt
    output = vsmlrt.inference(rgbs, "path/to/onnx", backend=vsmlrt.Backend.TRT(fp16=True))
    
    Source code(tar.gz)
    Source code(zip)
    anime-segmentation_v1.7z(155.30 MB)
    oidn_v1.7z(3.24 MB)
    ppocr_v1.7z(2.09 MB)
  • v12.1(Nov 16, 2022)

    This minor release fixes #9: now if vsort/vstrt fails to load required cuda DLLs, they won't crash the entire process.

However, if vs-mlrt is correctly installed, this shouldn't happen. Please report an issue if you can't access the core.trt or core.ort namespaces. A common mistake is forgetting to extract the vsmlrt-cuda.v12.1.7z package alongside the VSORT-Windows-x64.v12.1.7z or VSTRT-Windows-x64.v12.1.7z packages. If in doubt, CUDA users should use the fully bundled release vsmlrt-windows-x64-cuda.v12.1.7z.

Note: we explicitly do not support using both pytorch and vs-mlrt plugins in the same vpy script, as pytorch uses its own set of CUDA DLLs which might conflict with the ones vs-mlrt uses. As those DLLs are not explicitly versioned (e.g. nvinfer.dll instead of nvinfer-x.yz.dll), there is nothing we can do.

    Source code(tar.gz)
    Source code(zip)
    models.v12.1.7z(744.35 MB)
    scripts.v12.1.7z(8.92 KB)
    vsmlrt-cuda.v12.1.7z(1003.82 MB)
    vsmlrt-windows-x64-cpu.v12.1.7z(762.73 MB)
    vsmlrt-windows-x64-cuda.v12.1.7z(1930.96 MB)
    vsmlrt-windows-x64-vk.v12.1.7z(765.20 MB)
    VSNCNN-Windows-x64.v12.1.7z(2.13 MB)
    VSORT-Windows-x64.v12.1.7z(14.14 MB)
    VSOV-Windows-x64.v12.1.7z(11.48 MB)
    VSTRT-Windows-x64.v12.1.7z(381.65 KB)
  • v12(Nov 1, 2022)

    Compared to v11, this release updated CUDA dependencies to CUDA 11.8.0, cuDNN 8.6.0 and TensorRT 8.5.1:

    • Added support for the NVIDIA 40 series GPUs.
    • Added support for RIFE on the trt backend.

    Known issue

    • Performance of the OV_CPU or ORT_CUDA(fp16=True) backends for RIFE is lower than expected, which is under investigation. Please consider ORT_CPU or ORT_CUDA(fp16=False) for now.
    • The NCNN_VK backend does not support RIFE.

    Installation Notes

    For some advanced features, vsmlrt.py requires numpy and onnx packages to be available. You might need to run pip install onnx numpy.

    Benchmark

    previous benchmark

    Configuration: NVIDIA RTX 3090, driver 526.47, windows server 2019, vs r60, python 3.11.0, 1080p fp16

    Backends: ort-cuda, trt from vs-mlrt v12.

For the trt backend, the engine is created without the CUDA_MODULE_LOADING=LAZY environment variable set, and the variable is set during benchmarking to reduce device memory consumption.

    Data format: fps / GPU memory usage (MB)

    rife(model=44, 1920x1088)

    | backend  | 1 stream   | 2 streams  |
    |:--------:|:-----------|:----------:|
    | ort-cuda | 53.62/1771 | 83.34/2748 |
    | trt      | 71.30/ 626 | 107.3/ 962 |

    dpir color

    | backend  | 1 stream   | 2 streams  |
    |:--------:|:-----------|:----------:|
    | ort-cuda | 4.64/3230  |            |
    | trt      | 10.32/1992 | 11.61/3475 |

    waifu2x upconv_7

    | backend  | 1 stream   | 2 streams   |
    |:--------:|:-----------|:-----------:|
    | ort-cuda | 11.07/5916 | 15.04/10899 |
    | trt      | 18.38/2092 | 31.64/ 3848 |

    waifu2x cunet

    | backend  | 1 stream   | 2 streams   |
    |:--------:|:-----------|:-----------:|
    | ort-cuda | 4.63/8541  | 5.32/16148  |
    | trt      | 11.44/4771 | 15.59/ 8972 |

    realesrgan v2/v3

    | backend  | 1 stream   | 2 streams  |
    |:--------:|:-----------|:----------:|
    | ort-cuda | 8.84/2283  | 11.10/4202 |
    | trt      | 14.59/1324 | 21.37/2174 |

    Source code(tar.gz)
    Source code(zip)
    models.v12.7z(744.35 MB)
    scripts.v12.7z(8.83 KB)
    vsmlrt-cuda.v12.7z(1003.82 MB)
    vsmlrt-windows-x64-cpu.v12.7z(762.73 MB)
    vsmlrt-windows-x64-cuda.v12.7z(1930.96 MB)
    vsmlrt-windows-x64-vk.v12.7z(765.20 MB)
    VSNCNN-Windows-x64.v12.7z(2.13 MB)
    VSORT-Windows-x64.v12.7z(14.14 MB)
    VSOV-Windows-x64.v12.7z(11.48 MB)
    VSTRT-Windows-x64.v12.7z(382.55 KB)
  • v11(Oct 26, 2022)

    Added support for the RIFE video frame interpolation algorithm.

    There are two APIs for RIFE:

• vsmlrt.RIFE is a high-level API for interpolating a clip. Set the multi argument to specify the fps factor. Just remember to perform scene detection on the input clip, as shown in the sketch after this list.
    • vsmlrt.RIFEMerge is a novel temporal std.MaskedMerge-like interface for RIFE. Use it if you want to precisely control the frames and/or time point for the interpolation.
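
    For instance, a minimal sketch (SCDetect from the misc plugin is one common way to tag scene changes; the source handling and threshold are illustrative):

        import vapoursynth as vs
        from vsmlrt import RIFE, Backend

        core = vs.core

        src = core.lsmas.LWLibavSource("input.mkv")
        src = core.resize.Bicubic(src, format=vs.RGBS, matrix_in_s="709")
        # Mark scene changes so RIFE does not interpolate across cuts.
        src = core.misc.SCDetect(src, threshold=0.14)
        flt = RIFE(src, multi=2, backend=Backend.ORT_CUDA())  # 2x frame rate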

    Known issues

    • vstrt doesn't support RIFE for the moment[^1]. The next release of TensorRT should include RIFE support and we will release v12 when that happens.

• The vstrt backend also doesn't yet support the latest RTX 4000 series GPUs. This will be fixed after upgrading to the upcoming TensorRT 8.5 release. RTX 4000 series GPU owners please use the other CUDA backends for now.

    • Users of the OV_GPU backend may experience errors like Exceeded max size of memory object allocation: Requested 11456040960 bytes but max alloc size is 4294959104 bytes. Please consider tiling for now.

      The reason is that the openvino library follows the OpenCL standard's restriction on memory object allocation (CL_DEVICE_MAX_MEM_ALLOC_SIZE). For most existing Intel GPUs (Gen9 and later), the driver imposes a maximum allocation size of ~4GiB[^2].

      [^1]: It's missing grid_sample operator support, see https://github.com/onnx/onnx-tensorrt/blob/main/docs/operators.md. [^2]: this value is derived from here, which states that device not supporting sharedSystemMemCapabilities has a maximum allowed allocation size of 4294959104 bytes

    Source code(tar.gz)
    Source code(zip)
    models.v11.7z(744.35 MB)
    scripts.v11.7z(8.78 KB)
    vsmlrt-cuda.v11.7z(834.20 MB)
    vsmlrt-windows-x64-cpu.v11.7z(762.62 MB)
    vsmlrt-windows-x64-cuda.v11.7z(1821.29 MB)
    vsmlrt-windows-x64-vk.v11.7z(765.08 MB)
    VSNCNN-Windows-x64.v11.7z(2.13 MB)
    VSORT-Windows-x64.v11.7z(12.63 MB)
    VSOV-Windows-x64.v11.7z(11.48 MB)
    VSTRT-Windows-x64.v11.7z(375.80 KB)
  • v11.test(Sep 23, 2022)

    internal testing only.

    Added support for the RIFE video frame interpolation algorithm. Some features are still being implemented. The Python RIFE model wrapper interface is still subject to change.

    Known issue

    • Users of the OV_GPU backend may experience errors like Exceeded max size of memory object allocation: Requested 11456040960 bytes but max alloc size is 4294959104 bytes. Please consider tiling for now.

      The reason is that the openvino library follows the OpenCL standard's restriction on memory object allocation (CL_DEVICE_MAX_MEM_ALLOC_SIZE). For most existing Intel GPUs (Gen9 and later), the driver imposes a maximum allocation size of ~4GiB[^1].

      [^1]: this value is derived from here, which states that device not supporting sharedSystemMemCapabilities has a maximum allowed allocation size of 4294959104 bytes

    Source code(tar.gz)
    Source code(zip)
    models.v11.test.7z(744.35 MB)
    scripts.v11.test.7z(8.36 KB)
    vsmlrt-cuda.v11.test.7z(834.20 MB)
    vsmlrt-windows-x64-cpu.v11.test.7z(762.62 MB)
    vsmlrt-windows-x64-cuda.v11.test.7z(1821.29 MB)
    vsmlrt-windows-x64-vk.v11.test.7z(765.07 MB)
    VSNCNN-Windows-x64.v11.test.7z(2.13 MB)
    VSORT-Windows-x64.v11.test.7z(12.63 MB)
    VSOV-Windows-x64.v11.test.7z(11.48 MB)
    VSTRT-Windows-x64.v11.test.7z(376.10 KB)
  • model-20220923(Sep 23, 2022)

New models (compared to the previous model release):

    • RIFE v4.0 from vs-rife v2.0.0. rife/rife_v4.0.onnx, config: fastmode=True, ensemble=False
    • RIFE v4.2, v4.3, v4.4, v4.5 from Practical-RIFE. rife/rife_{v4.2,v4.3,v4.4,v4.5}.onnx, config: fastmode=True, ensemble=False

    Notes:

• For RIFE on ort-cuda, vs-mlrt v11 or later is suggested for best performance. As of v11, only ov-cpu, ort-cpu, ort-cuda and trt (pending a new TensorRT release) support RIFE; specifically, ov-gpu and ncnn-vk do not support RIFE due to the missing gridsample op.
    Source code(tar.gz)
    Source code(zip)
    rife_v4.7z(110.42 MB)
  • v10(Sep 15, 2022)

    Release Highlight

    Vulkan based AMD GPU support added with the new vsncnn-vk backend.

    Major features

    • Introduced ncnn-based vsncnn plugin that supports any GPU with Vulkan support (NVidia, AMD, Intel integrated & discrete).
      • Good news for AMD GPU users! vs-mlrt has finally achieved full platform coverage: from x86 CPU to GPU of all three major vendors.
      • Please refer to the benchmark below for performance details. Tl;dr it's comparable to vsort-cuda on most networks (except waifu2x-cunet), but (significantly) slower than vstrt. Owing to its C++ implementation, it's generally faster than Python based ncnn implementations.
      • Hint: If your GPU has enough memory, please consider setting num_streams>1 to extract more performance.
      • Even though it's possible to use software based Vulkan implementations (as we did in the GHA tests), if you want to do CPU-only inference, it's much better to use vsov-cpu (or vsort-cpu).
    • Introduced a new smaller Vulkan-based GPU binary package (vsmlrt-windows-x64-vk.v10.7z) that only includes vsov-{cpu,gpu}, vsort-cpu and vsncnn-vk. Use this if you only use Intel/AMD GPU or don't want to download 1GB data in exchange for a backend that is merely 2~8x faster. Now there shouldn't be any reasons not to use vs-mlrt.

    Benchmark

    Configuration: NVIDIA RTX 3090, driver 516.94, windows server 2019, vs r60, python 3.10.7, 1080p fp16

    Backends: ncnn-vk, ort-cuda, trt from vs-mlrt v10, dpir-ncnn v2.0.0, w2xncnnvk r2

    Data format: fps / GPU memory usage (MB)

    dpir color

    | backend   | 1 stream   | 2 streams  |
    |:---------:|:-----------|:----------:|
    | ncnn-vk   | 4.33/3347  | 4.72/6119  |
    | ort-cuda  | 4.56/3595  |            |
    | trt       | 10.64/2595 | 11.10/4593 |
    | dpir-ncnn | 3.68/3326  |            |

    waifu2x upconv_7

    | backend   | 1 stream   | 2 streams   |
    |:---------:|:-----------|:-----------:|
    | ncnn-vk   | 9.46/6820  | 14.71/13468 |
    | ort-cuda  | 12.10/6411 | 13.98/11273 |
    | trt       | 21.32/3317 | 29.10/ 5053 |
    | w2xncnnvk | 6.68/6931  | 12.70/13626 |

    waifu2x cunet

    | backend   | 1 stream    | 2 streams   |
    |:---------:|:------------|:-----------:|
    | ncnn-vk   | 1.46/11908  | 1.53/23574  |
    | ort-cuda  | 4.85/ 8793  | 5.18/16231  |
    | trt       | 11.60/ 4960 | 15.60/ 9057 |
    | w2xncnnvk | 1.38/11966  | 1.58/23687  |

    realesrgan v2/v3

    | backend  | 1 stream   | 2 streams  |
    |:--------:|:-----------|:----------:|
    | ncnn-vk  | 7.23/2781  | 8.35/5330  |
    | ort-cuda | 9.05/2669  | 10.18/4539 |
    | trt      | 15.93/1667 | 19.58/2543 |

    Source code(tar.gz)
    Source code(zip)
    models.v10.7z(633.40 MB)
    scripts.v10.7z(6.69 KB)
    vsmlrt-cuda.v10.7z(834.20 MB)
    vsmlrt-windows-x64-cpu.v10.7z(651.71 MB)
    vsmlrt-windows-x64-cuda.v10.7z(1710.38 MB)
    vsmlrt-windows-x64-vk.v10.7z(654.16 MB)
    VSNCNN-Windows-x64.v10.7z(2.13 MB)
    VSORT-Windows-x64.v10.7z(12.63 MB)
    VSOV-Windows-x64.v10.7z(11.48 MB)
    VSTRT-Windows-x64.v10.7z(376.06 KB)
  • v10.pre(Sep 14, 2022)

    This is a pre-release for testing & benchmarking purposes only. For production use, please use the official v10 release.

    Release Highlight

    Vulkan based AMD GPU support added with the new vsncnn-vk backend.

    Major features

    • Introduced ncnn-based vsncnn plugin that supports any GPU with Vulkan support (NVidia, AMD, Intel integrated & discrete). Good news for AMD GPU users! vs-mlrt has finally achieved full platform coverage: from x86 CPU to GPU of all three major vendors.
    • Introduced a new smaller Vulkan-based GPU binary package (vsmlrt-windows-x64-vk.v10.pre.7z) that only includes vsov-{cpu,gpu}, vsort-cpu and vsncnn-vk. Use this if you only use Intel/AMD GPU or don't want to download 1GB data in exchange for a backend that is merely 3x faster. Now there shouldn't be any reasons not to use vs-mlrt.
    Source code(tar.gz)
    Source code(zip)
    models.v10.pre.7z(633.40 MB)
    scripts.v10.pre.7z(6.69 KB)
    vsmlrt-cuda.v10.pre.7z(834.20 MB)
    vsmlrt-windows-x64-cpu.v10.pre.7z(651.71 MB)
    vsmlrt-windows-x64-cuda.v10.pre.7z(1710.38 MB)
    vsmlrt-windows-x64-vk.v10.pre.7z(654.16 MB)
    VSNCNN-Windows-x64.v10.pre.7z(2.13 MB)
    VSORT-Windows-x64.v10.pre.7z(12.63 MB)
    VSOV-Windows-x64.v10.pre.7z(11.48 MB)
    VSTRT-Windows-x64.v10.pre.7z(376.09 KB)
  • v9.2(Aug 7, 2022)

    Fixed issues

• In vs-mlrt v9 and v9.1 on Windows, the ORT_CUDA backend may fail with an out-of-memory error when processing a non-initial frame. This has been fixed, and performance should be improved as well.
• Parameter use_cuda_graph of the ORT_CUDA backend now works properly on Windows. However, using it is currently not recommended.

    Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v9.1...v9.2

    Source code(tar.gz)
    Source code(zip)
    models.v9.2.7z(633.40 MB)
    scripts.v9.2.7z(6.59 KB)
    vsmlrt-cuda.v9.2.7z(834.21 MB)
    vsmlrt-windows-x64-cpu.v9.2.7z(651.70 MB)
    vsmlrt-windows-x64-cuda.v9.2.7z(1707.95 MB)
    VSORT-Windows-x64.v9.2.7z(12.63 MB)
    VSOV-Windows-x64.v9.2.7z(11.47 MB)
    VSTRT-Windows-x64.v9.2.7z(373.97 KB)
  • v9.1(Jul 28, 2022)

    Bugfix release for v9. Recommended update for v9 users. Please see release notes for v9 to see all the major new features.

    • Fix ort_cuda fp16 inference for CUGAN(version=2) model.

      A new parameter fp16_blacklist_ops is introduced in the ort and ov backends for other issues possibly related to reduced precision.

      Please still carefully review the output of fp16 accelerated CUGAN(version=2).

    • Conform with CUGAN(version=2)'s dynamic range compression. This feature is enabled by setting conformance=True (which is the default) in the CUGAN wrapper in vsmlrt.py, and it's implemented as:

      clip = clip.std.Expr("x 0.7 * 0.15 +")
      clip = CUGAN(clip, version=2)
      clip = clip.std.Expr("x 0.15 - 0.7 /")
      
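      In other words, with conformance=True the wrapper linearly compresses the input from [0, 1] into [0.15, 0.85] before inference and expands the output back afterwards, matching CUGAN(version=2)'s dynamic range compression.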

    Known issues

    • These two issues are fixed in the v9.2 release.
      • The ORT_CUDA backend allocates memory during inference. This degrades performance and may result in out-of-memory errors.
      • Parameter use_cuda_graph of the ORT_CUDA backend is broken on Windows.

    Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v9...v9.1

    Source code(tar.gz)
    Source code(zip)
    models.v9.1.7z(633.40 MB)
    scripts.v9.1.7z(6.59 KB)
    vsmlrt-cuda.v9.1.7z(834.21 MB)
    vsmlrt-windows-x64-cpu.v9.1.7z(651.70 MB)
    vsmlrt-windows-x64-cuda.v9.1.7z(1707.95 MB)
    VSORT-Windows-x64.v9.1.7z(12.63 MB)
    VSOV-Windows-x64.v9.1.7z(11.47 MB)
    VSTRT-Windows-x64.v9.1.7z(374.34 KB)
  • v9(Mar 25, 2022)

    This is a major release.

    • Added support for Intel GPUs (both discrete [Xe Arc series] and integrated [Gen 8+ on Broadwell+])

      • In vsmlrt.py, this corresponds to the OV_GPU backend.
      • The openvino library is now dynamically linked because of the integration of oneDNN for GPU.
    • Added support for RealESRGANv3 and cugan-pro models.

    • Upgraded CUDA toolkit to 11.7.0, TensorRT to 8.4.1 and cuDNN to 8.4.1. It is now possible to build TRT engines for CUGAN, waifu2x cunet and upresnet10 models on RTX 2000 and RTX 3000 series GPUs.

• The trt backend in the vsmlrt.py wrapper now creates a log file for trtexec output in the TEMP directory (this only works when using the bundled trtexec.exe). The log file is only retained if trtexec fails (and the vsmlrt exception message will include the full path of the log file). If you want the log to go to a specific file, set the environment variable TRTEXEC_LOG_FILE to the absolute path of the log file. If you don't want this behavior, set log=False when creating the backend (e.g. vsmlrt.Backend.TRT(log=False)).

    • The cuda bundles now include VC runtime DLLs as well, so trtexec.exe should run even on systems without proper VC runtime redistributable packages installed (e.g. freshly installed Windows).

    • The ov backend can now configure model compilation via config. Available configurations can be found here.

      • Example:

        core.ov.Model(..., config = lambda: dict(CPU_THROUGHPUT_STREAMS=core.num_threads, CPU_BIND_THREAD="NO"))
        

        This configuration may be useful in improving processor utilization at the expense of significantly increased memory consumption (only try this if you have a huge number of cores underutilized by the default settings.)

        The equivalent form for the python wrapper is

        backend = vsmlrt.Backend.OV_CPU(num_streams=core.num_threads, bind_thread=False)
        
• When using the vsmlrt.py wrapper, it will no longer create temporary onnx files (e.g. when using non-default alpha CUGAN parameters). Instead, the modified ONNX network is passed directly into the various ML runtime filters, which now support (network_path=b'raw onnx protobuf serialization', path_is_serialization=True) for this. This feature also opens the door to generating ONNX on the fly (e.g. ever dreamed of GPU-accelerated 2D convolution or std.Expr?). A sketch follows after this list.
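
    For instance, a minimal sketch of the raw-serialization path (the model path and the use of the Python onnx package are illustrative):

        import onnx
        import vapoursynth as vs

        core = vs.core

        # Load and (optionally) edit the network in memory...
        model = onnx.load("path/to/model.onnx")
        src = core.std.BlankClip(format=vs.RGBS, width=640, height=360)
        # ...then hand the serialized protobuf to the runtime filter
        # directly; no temporary onnx file is written to disk.
        flt = core.ov.Model(src, network_path=model.SerializeToString(),
                            path_is_serialization=True)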

    Update Instructions

1. Delete the previous vsmlrt-cuda, vsov, vsort and vstrt directories and vsov.dll, vsort.dll and vstrt.dll from your VS plugins directory, and then extract the newly released files. (Specifically, do not leave files from the previous version in place and simply overwrite with the new release, as the new release might have removed some files in those four directories.)
    2. Replace vsmlrt.py in your Python package directory.
3. Update the models directories by overwriting with the new release. (Models are generally append-only. We will make special notices and bump the model release tag if we change any previously released models.)

    Compatibility Notes

vsmlrt.py in this release is not compatible with binaries from previous releases; only script-level compatibility is maintained. Generally, please make sure to upgrade the filters and vsmlrt.py as a whole.

We strive to maintain script source-level compatibility as much as possible (i.e. there won't be a great api4-style breakage), which means scripts written for v7 (for example) will continue to function for the foreseeable future. Minor issues (like the non-monotonic denoise setting of cugan) will be documented instead of fixed with a breaking change.

    Known issue

CUGAN(version=2) (a.k.a. cugan-pro) may produce a blank clip when using the ORT_CUDA(fp16) backend. This is fixed in the v10 release.

    Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v8...v9

    Source code(tar.gz)
    Source code(zip)
    models.v9.7z(633.40 MB)
    scripts.v9.7z(6.39 KB)
    vsmlrt-cuda.v9.7z(834.21 MB)
    vsmlrt-windows-x64-cpu.v9.7z(651.69 MB)
    vsmlrt-windows-x64-cuda.v9.7z(1707.95 MB)
    VSORT-Windows-x64.v9.7z(12.63 MB)
    VSOV-Windows-x64.v9.7z(11.46 MB)
    VSTRT-Windows-x64.v9.7z(374.23 KB)
  • v8(Mar 12, 2022)

    • This release upgrades the cuda libraries to their latest version. Models are observed to be accelerated by ~1.1x.
    • vsmlrt.CUGAN() now accepts a new parameter alpha, which controls the strength of filtering. Setting alpha to non-default values requires the Python onnx package (but this might change in the future.)
    • Added tf32 parameter to the trt backend in vsmlrt.py. TF32 acceleration is enabled by default on the Ampere GPUs, mostly for fp32 inference, and it has no effect on other architectures.
    Source code(tar.gz)
    Source code(zip)
    models.v8.7z(603.70 MB)
    scripts.v8.7z(4.87 KB)
    vsmlrt-cuda.v8.7z(847.24 MB)
    vsmlrt-windows-x64-cpu.v8.7z(613.77 MB)
    vsmlrt-windows-x64-cuda.v8.7z(1694.63 MB)
    VSORT-Windows-x64.v8.7z(12.68 MB)
    VSOV-Windows-x64.v8.7z(6.15 MB)
    VSTRT-Windows-x64.v8.7z(355.66 KB)
  • v7(Jan 27, 2022)

    This release adds support for bilibili's Real-CUGAN, please refer to the wiki for details.

    Special notes for CUGAN:

1. Make sure the RGBS input to CUGAN is within the [0,1] range (if in doubt, it is better to use core.std.Expr(input, "x 0 max 1 min") to condition the input before feeding the NN; fmtconv YUV2RGB might generate out-of-range RGB values): out-of-range values will trip the NN into producing large negative values.
2. Do not use tiling (i.e. you must set tiles=1), as CUGAN requires access to the entire input frame for its depth-detection mechanism to work. Both notes are combined in the sketch after this list.
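
Putting the two notes together, a minimal sketch (the source, matrix and backend choices are illustrative):

    import vapoursynth as vs
    from vsmlrt import CUGAN, Backend

    core = vs.core

    src = core.lsmas.LWLibavSource("input.mkv")
    src = core.resize.Bicubic(src, format=vs.RGBS, matrix_in_s="709")
    src = core.std.Expr(src, "x 0 max 1 min")  # clamp RGBS into [0, 1]
    # tiles=1: CUGAN needs the whole frame for its depth detection.
    flt = CUGAN(src, noise=0, scale=2, tiles=1, backend=Backend.ORT_CUDA())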

    Compared to v6, only scripts.v7.7z, models.v7.7z, vsmlrt-windows-x64-cpu.v7.7z and vsmlrt-windows-x64-cuda.v7.7z files are updated.

    Source code(tar.gz)
    Source code(zip)
    models.v7.7z(603.70 MB)
    scripts.v7.7z(4.38 KB)
    vsmlrt-cuda.v7.7z(906.05 MB)
    vsmlrt-windows-x64-cpu.v7.7z(613.76 MB)
    vsmlrt-windows-x64-cuda.v7.7z(1736.62 MB)
    VSORT-Windows-x64.v7.7z(12.74 MB)
    VSOV-Windows-x64.v7.7z(6.15 MB)
    VSTRT-Windows-x64.v7.7z(351.28 KB)
  • v6(Jan 20, 2022)

    This release contains some performance optimization of the vs-trt plugin. The general takeaway is that vs-trt can beat all benchmarked solutions on DPIR, waifu2x and RealESRGANv2 models. Specific highlights are as follows:

    • waifu2x: when using CPU, vs-ov beats waifu2x-w2xc by 2.7x (Intel 32C64T); when using GPU, vs-ort/vs-trt beats vulkan-ncnn by ~4x.
• DPIR: vs-trt beats existing implementations on both Volta (Tesla V100) and Ampere (A10) platforms (by up to 1.5x), and vs-ort saves a significant amount of GPU memory (as much as 3.7x) compared to its counterpart
    • RealESRGANv2: vs-trt, being the only backend that utilizes TensorRT, is up to 3.3x faster than the reference implementation

    Please see detailed benchmark results in the wiki:

    • waifu2x: https://github.com/AmusementClub/vs-mlrt/wiki/waifu2x#benchmarking
    • DPIR: https://github.com/AmusementClub/vs-mlrt/wiki/DPIR#benchmarking
    • RealESRGANv2: https://github.com/AmusementClub/vs-mlrt/wiki/RealESRGANv2#benchmarking

    This release also fixed the following two bugs:

• vs-ov: some error messages from openvino were sent to stdout, affecting vspipe | x265 usage.
    • vs-ort/vs-ov: error in converting RealESRGANv2 model to fp16 format.
    Source code(tar.gz)
    Source code(zip)
    models.v6.7z(552.03 MB)
    scripts.v6.7z(4.19 KB)
    vsmlrt-cuda.v6.7z(906.05 MB)
    vsmlrt-windows-x64-cpu.v6.7z(562.10 MB)
    vsmlrt-windows-x64-cuda.v6.7z(1684.95 MB)
    VSORT-Windows-x64.v6.7z(12.74 MB)
    VSOV-Windows-x64.v6.7z(6.15 MB)
    VSTRT-Windows-x64.v6.7z(351.50 KB)
  • v5(Dec 30, 2021)

    Changelog:

1. added fp16 support to vs-ov and vs-ort (the input model is still fp32; these filters convert it to fp16 on the fly). Now all three backends support inference with fp16 (though using fp16 mainly benefits vs-ort's CUDA backend).
    2. ~~fixed vs-ov spurious logging messages to stdout which interferes with vspipe | x265 pipeline (requires patched openvino)~~ Turns out the fix is not picked by the CI. Please use v6 for vs-ov.
    3. changes to the vs-trt backend vsmlrt.Backend.TRT() of the vsmlrt.py wrapper
  • max_shapes defaults to the tile size now (as TensorRT GPU memory usage is related to max_shapes rather than the actual shape used in inference, this should help save GPU memory);
  • the default opt_shapes is None now, which means it will be set to the actual tile size in use: this is especially beneficial for large models like DPIR. If you prefer faster engine build times, you should set opt_shapes=(64, 64) to restore the previous behavior (see the sketch after this list). This change also makes it easier to use the tiles parameter (as in this case you generally don't know the exact inference shape);
      • changed default cache & engine directory: first try saving the engine and cache file to the same directory as the onnx model and if not writable, use the system temporary directory (on the same drive as the onnx model files).
      • fixed a bug when reusing the same backend variable for different filters
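
For example, the two engine-build strategies look like this (a minimal sketch):

    import vsmlrt

    # Fast engine builds: optimize kernels for a small 64x64 profile
    # (the pre-v5 behavior).
    fast_build = vsmlrt.Backend.TRT(opt_shapes=(64, 64))

    # v5 default: opt_shapes=None, so the engine is optimized for the
    # actual tile size in use (slower build, typically faster inference).
    tuned = vsmlrt.Backend.TRT()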

    vsmlrt-cuda and model packages are identical to v4.

PS: we have successfully used both ~~vs-ov and~~ vs-trt in production anime encodings, so this release should be ready for production. As always, issues and suggestions are welcome. Update: it turns out vs-ov is broken; the fix to openvino was not correctly picked up by the CI pipeline. Please use v6 for vs-ov.

    Source code(tar.gz)
    Source code(zip)
    models.v5.7z(552.03 MB)
    scripts.v5.7z(4.10 KB)
    vsmlrt-cuda.v5.7z(906.05 MB)
    vsmlrt-windows-x64-cpu.v5.7z(562.10 MB)
    vsmlrt-windows-x64-cuda.v5.7z(1684.95 MB)
    VSORT-Windows-x64.v5.7z(12.57 MB)
    VSOV-Windows-x64.v5.7z(6.14 MB)
    VSTRT-Windows-x64.v5.7z(351.59 KB)
  • v4(Dec 17, 2021)

    This release introduces the following features:

    • vsmlrt.py: added support for vs-trt (including transparent engine compilation)
    • added RealESRGANv2 models, see https://github.com/AmusementClub/vs-mlrt/releases/download/model-20211209/RealESRGANv2_v1.7z
• full binary releases for Windows, which include the full set of models (waifu2x, RealESRGANv2 and DPIR) and all required DLLs. To simplify installation, we provide two variants:
      • CPU only: vsmlrt-windows-x64-cpu.v4.7z
  • CPU+CUDA: vsmlrt-windows-x64-cuda.v4.7z

To install, just extract them into your VS plugins directory (preserving the existing directory structure within the 7z archive), then move vsmlrt.py into your VS Python site-packages directory and you're done.

    Component Downloads

    Besides the full releases, each individual component also has its own release, so that users can upgrade only what has been changed:

    • models: full fp32 model release 20211209, includes waifu2x, RealESRGANv2 and DPIR.
    • scripts: vsmlrt.py wrapper script, extract to VS python site-packages directory
    • vsmlrt-cuda: shared CUDA DLLs for vs-ort and vs-trt
    • VSOV-Windows-x64: vs-ov plugin (pure CPU backend)
    • VSORT-Windows-x64: vs-ort plugin, includes both CPU and CUDA backend; CUDA backend requires vsmlrt-cuda package.
    • VSTRT-Windows-x64: vs-trt plugin, requires vsmlrt-cuda package.

    All component packages should be extracted to your VS plugins directory, except for scripts.v4.7z, which needs to be extracted to VS python site-packages directory.

    Known Issues

1. Building a TRT engine for waifu2x cunet and upresnet10 will fail on RTX 2000 and RTX 3000 series GPUs; please use vsort if you are using an affected GPU.
    2. due to the way NVidia DLLs are named, there might be DLL conflicts if you also have other AI filters (e.g. waifu2x caffe) in your plugins directory. Due to licensing restrictions and windows technical restrictions, there is no easy way to solve this DLL conflict problem. You will have to remove those conflicting plugins. Fortunately, the only affected plugin seems to be waifu2x caffe and we have already provided full functionality coverage and better performance with the vsmlrt.py script so there is no reason to use the caffe plugin anymore.

    Installation Notes

1. It is recommended to update to the latest GPU driver (e.g. >= v472.50) if you intend to use the CUDA backend of vsort or vstrt, for best performance and compatibility; however, GeForce GPU users with GPU driver >= v452.39 should be able to use the CUDA backend.
    2. There are no changes to vsmlrt-cuda.7z from v3, so no need to re-download it if you already have it from v3.
    Source code(tar.gz)
    Source code(zip)
    models.v4.7z(552.03 MB)
    scripts.v4.7z(3.80 KB)
    vsmlrt-cuda.v4.7z(906.05 MB)
    vsmlrt-windows-x64-cpu.v4.7z(561.87 MB)
    vsmlrt-windows-x64-cuda.v4.7z(1684.26 MB)
    VSORT-Windows-x64.v4.7z(12.03 MB)
    VSOV-Windows-x64.v4.7z(6.12 MB)
    VSTRT-Windows-x64.v4.7z(350.99 KB)
  • v3(Dec 16, 2021)

    This release improves the interface of wrapper and plugins:

    • The argument pad is renamed to overlap, and it is now possible to specify different overlap values on each direction.
    • The arguments block_w and block_h are merged into a single argument tilesize.
    • vsmlrt.py now supports DPIR models. The type of argument backend is changed to a typed data class. To use the plugin, you need to extract v3 DPIR model files into VS plugins\models directory (please keep the directory structure inside the 7z archive intact while extracting.)

    Built-in models can be found at model-20211209.

    Example waifu2x wrapper usage:

    from vsmlrt import Waifu2x, Waifu2xModel, Backend
    
    src = core.std.BlankClip(format=vs.RGBS)
    
    # backend could be:
    #  - CPU Backend.OV_CPU(): the recommended CPU backend; generally faster than ORT-CPU.
    #  - CPU Backend.ORT_CPU(num_streams=1, verbosity=2): vs-ort's cpu backend.
    #  - GPU Backend.ORT_CUDA(device_id=0, cudnn_benchmark=True, num_streams=1, verbosity=2)
    #     - use device_id to select device
    #     - set cudnn_benchmark=False to reduce script reload latency when debugging, but with slight throughput performance penalty.
    flt = Waifu2x(src, noise=-1, scale=2, model=Waifu2xModel.upconv_7_anime_style_art_rgb, backend=Backend.ORT_CUDA())
    

    Example DPIR wrapper usage:

    from vsmlrt import DPIR, DPIRModel, Backend
    src = core.std.BlankClip(format=vs.RGBS) # or vs.GRAYS for gray only models
    # DPIR is a huge model and GPU backend is highly recommended.
    # If the model runs out of GPU memory, increase the tiles parameter.
    flt = DPIR(src, strength=5, model=DPIRModel.drunet_color, tiles=2, backend=Backend.ORT_CUDA())
    

    Known Issues

1. Building a TRT engine for waifu2x cunet and upresnet10 will fail on RTX 2000 and RTX 3000 series GPUs; please use vsort if you are using an affected GPU.
    2. due to the way NVidia DLLs are named, there might be DLL conflicts if you also have other AI filters (e.g. waifu2x caffe) in your plugins directory. Due to licensing restrictions and windows technical restrictions, there is no easy way to solve this DLL conflict problem. You will have to remove those conflicting plugins. Fortunately, the only affected plugin seems to be waifu2x caffe and we have already provided full functionality coverage and better performance with the vsmlrt.py script so there is no reason to use the caffe plugin anymore.

    Installation Notes

1. It is recommended to update to the latest GPU driver (e.g. >= v472.50) if you intend to use the CUDA backend of vsort or vstrt, for best performance and compatibility; however, GeForce GPU users with GPU driver >= v452.39 should be able to use the CUDA backend.
    2. There are no changes to vsmlrt-cuda.7z from v2, so no need to re-download it if you already have it from v2.
    Source code(tar.gz)
    Source code(zip)
    vsmlrt-cuda.7z(906.51 MB)
    VSORT-Windows-x64.v3.zip(31.36 MB)
    VSOV-Windows-x64.v3.zip(11.41 MB)
    VSTRT-Windows-x64.v3.zip(613.29 KB)
  • v2(Dec 10, 2021)

This release introduces the vs-trt plugin, which should provide the best possible performance on NVidia GPUs at the expense of an extra, tedious engine-building step. vs-trt is only recommended for large AI models, e.g. DPIR; smaller models like waifu2x won't see much performance benefit. Please refer to its docs for further usage instructions (and be forewarned: it's very hard to use unless you are prepared to spend some time understanding the process and doing some trial-and-error experiments.)

    If you use GPU support for vsort or vstrt, then you also need to download and extract vsmlrt-cuda.7z into your VS plugins directory (while keeping the directory structure inside the 7z files). The DLLs there will be shared by vsort and vstrt. Please also note that vstrt requires the use of new models released in model-20211209.

This release also introduces builtin model support for vsov and vsort (as vstrt requires building engines separately, builtin model support is moot there). You can place the model onnx files under the VS plugins\models directory and set builtin=True for vsov and vsort filters, so that the network_path argument is interpreted as a path relative to plugins\models. This mode makes it easier to make a VS portable release with integrated models. For example, after extracting waifu2x-v3.7z into your VS plugins\models directory (while keeping the directory structure inside the 7z files), you can do this to use the waifu2x models with vsmlrt.py without worrying about their absolute paths:

    from vsmlrt import Waifu2x, Waifu2xModel
    
    src = core.std.BlankClip(format=vs.RGBS)
    # backend could be: "ort-cpu", "ort-cuda", "ov-cpu"; suggested choice is "ov-cpu" for pure CPU and "ort-cuda" for GPU.
    flt = Waifu2x(src, noise=-1, scale=2, model=Waifu2xModel.upconv_7_anime_style_art_rgb, backend="ort-cuda")
    

    vsmlrt-cuda.7z Changelog

    1. added nvrtc for vstrt dynamic layer fusion support, only necessary if you use vstrt. If you only intend to use vsort, you can just download the smaller package vsmlrt-cuda-no-nvrtc.7z.

    Known Issues

1. Building a TRT engine for waifu2x cunet and upresnet10 will fail on RTX 2000 and RTX 3000 series GPUs; please use vsort if you are using an affected GPU.
    2. due to the way NVidia DLLs are named, there might be DLL conflicts if you also have other AI filters (e.g. waifu2x caffe) in your plugins directory. Due to licensing restrictions and windows technical restrictions, there is no easy way to solve this DLL conflict problem. You will have to remove those conflicting plugins. Fortunately, the only affected plugin seems to be waifu2x caffe and we have already provided full functionality coverage and better performance with the vsmlrt.py script so there is no reason to use the caffe plugin anymore.

    Installation Notes

    1. please update to the latest GPU driver (e.g. >= v472.50) if you intend to use the CUDA backend of vsort or vstrt for best performance and compatibility.
    2. GeForce GPU users may use the v2b version of vsort which supports GPU driver >= v452.39.
    Source code(tar.gz)
    Source code(zip)
    vsmlrt-cuda-no-nvrtc.7z(894.86 MB)
    vsmlrt-cuda.7z(906.51 MB)
    VSORT-Windows-x64.v2.zip(11.48 MB)
    VSORT-Windows-x64.v2b.zip(31.36 MB)
    VSOV-Windows-x64.v2.zip(11.41 MB)
    VSTRT-Windows-x64.v2.zip(610.32 KB)
  • model-20211209(Dec 9, 2021)

    Model release 20211209

    This requires plugin release v2 or above. Users of v1 or v0 plugin releases please continue to use the previous release.

    In general, we strive to keep previous model releases usable with newer plugin releases, but new model releases generally require newer plugin releases.

    Changelog

1. Modified the input dimension to -1 to better support dynamic shapes and the upcoming vstrt plugin. vsov and vsort users can continue to use the last release (though upgrading is highly recommended.)
    2. Added Real-CUGAN models
    3. Added cugan-pro and RealESRGANv3 models.
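
    To check what shape a downloaded model declares, you can inspect the ONNX graph directly. This is a minimal sketch assuming the onnx Python package is installed; the file name is illustrative, and dynamic dimensions appear as a dim_value of -1 or a symbolic dim_param depending on how the model was exported:

    import onnx

    # Illustrative path; point it at an actual extracted model file.
    model = onnx.load("waifu2x/upconv_7_anime_style_art_rgb/scale2.0x_model.onnx")
    for inp in model.graph.input:
        dims = [d.dim_param or d.dim_value for d in inp.type.tensor_type.shape.dim]
        print(inp.name, dims)
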
    Source code(tar.gz)
    Source code(zip)
    cugan-pro_v3.7z(26.76 MB)
    cugan_v2.7z(51.28 MB)
    dpir_v3.7z(459.91 MB)
    RealESRGANv2_v1.7z(4.32 MB)
    RealESRGANv3_v1.7z(2.19 MB)
    waifu2x_v3.7z(83.12 MB)
  • v1(Dec 4, 2021)

    Initial public preview vs-ort release.

    Changelog

    1. VSOV: moved tbb.dll into its own directory, so that we don't put any non-VS plugin DLL into the top level of the plugins directory.
    2. VSORT: initial release.

    Installation Notes

    • VSORT: ONNX Runtime
      • CPU only: extract VSORT-Windows-x64.zip into vapoursynth/plugins directory. You can additionally remove vsort/onnxruntime_providers_cuda.dll and vsort/onnxruntime_providers_shared.dll to save some disk space.
      • CUDA: extract both VSORT-Windows-x64.zip and vsmlrt-cuda.7z into vapoursynth/plugins directory.
    • VSOV: just extract VSOV-Windows-x64.zip into vapoursynth/plugins directory.

    Please note that the CUDA libraries are huge (they require ~1.9 GiB of space after extraction).

    Please refer to the wiki for details.

    Source code(tar.gz)
    Source code(zip)
    vsmlrt-cuda.7z(728.05 MB)
    VSORT-Windows-x64.zip(10.99 MB)
    VSOV-Windows-x64.zip(11.41 MB)
  • v0(Dec 3, 2021)

  • model-20211203(Dec 3, 2021)

    Initial ONNX Model Release

    Waifu2x

    Waifu2x is a well-known image super-resolution network for anime-style art.

    Link: https://github.com/AmusementClub/vs-mlrt/releases/download/model-20211203/waifu2x_v2.7z
    Docs: https://github.com/AmusementClub/vs-mlrt/wiki/waifu2x

    Includes all known publicly available waifu2x models:

    • anime_style_art: noise1 noise2 noise3 scale2.0x
    • anime_style_art_rgb: noise0 noise1 noise2 noise3 scale2.0x
    • upconv_7_anime_style_art_rgb: scale2.0x noise3_scale2.0x noise2_scale2.0x noise1_scale2.0x noise0_scale2.0x
    • photo: noise0 noise1 noise2 noise3 scale2.0x
    • ukbench: scale2.0x
    • upconv_7_photo: scale2.0x noise0_scale2.0x noise1_scale2.0x noise2_scale2.0x noise3_scale2.0x
    • cunet: noise0 noise1 noise2 noise3 scale2.0x noise0_scale2.0x noise1_scale2.0x noise2_scale2.0x noise3_scale2.0x
    • upresnet10: scale2.0x noise0_scale2.0x noise1_scale2.0x noise2_scale2.0x noise3_scale2.0x

    DPIR

    DPIR, or Plug-and-Play Image Restoration with Deep Denoiser Prior, is a denoising and deblocking neural network. See also https://github.com/HolyWu/vs-dpir.

    DPIR requires a strength parameter, which you need to pass in the form of a GRAYS clip, as shown below.
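
    The sketch below supplies a constant strength as a GRAYS clip, assuming the DPIR wrapper and backend strings from the bundled vsmlrt.py; the strength value of 5.0 is illustrative:

    import vapoursynth as vs
    from vapoursynth import core
    from vsmlrt import DPIR, DPIRModel

    src = core.std.BlankClip(format=vs.RGBS)
    # A constant-value GRAYS clip acts as the per-pixel strength (noise level) map.
    strength = core.std.BlankClip(src, format=vs.GRAYS, color=5.0)
    flt = DPIR(src, strength=strength, model=DPIRModel.drunet_color, backend="ort-cuda")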

    Link: https://github.com/AmusementClub/vs-mlrt/releases/download/model-20211203/dpir_v2.7z
    Docs: https://github.com/AmusementClub/vs-mlrt/wiki/DPIR

    Includes these models:

    • drunet_gray: GRAY denoise
    • drunet_deblocking_grayscale: GRAY deblocking
    • drunet_color: RGB denoise
    • drunet_deblocking_color: RGB deblocking
    Source code(tar.gz)
    Source code(zip)
    dpir_v2.7z(460.14 MB)
    waifu2x_v2.7z(83.21 MB)