PPLNN

Overview

PPLNN, short for "PPLNN is a Primitive Library for Neural Network", is a high-performance deep-learning inference engine for efficient AI inferencing. It can run various ONNX models and provides enhanced support for OpenMMLab models.

(architecture diagram)

Contributions

This project uses the Contributor Covenant as its code of conduct. Any contributions would be highly appreciated.

License

This project is distributed under the Apache License, Version 2.0.

Comments
  • cuda convolution kernel input question.

    Hi,

    I see that the currently implemented CUDA conv kernels are either fp16 or int8, and their data layout is NHWC, as required by NVIDIA's tensor cores. So for ./tools/pplnn.py, where does the layout transpose happen? On the CPU side? From the nvprof results, I only see the conv kernel.

    If I want to do the transpose on the GPU side, how should I change the command? Or do I need to add an extra Transpose node to the ONNX file?
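
    For reference, splicing an explicit NCHW -> NHWC Transpose node into the ONNX file is mechanically straightforward with the onnx helper API. A minimal sketch, assuming the graph input is named "input" (hypothetical; check graph.input for the real name); note that downstream operators would then receive NHWC data, so whether this cooperates with pplnn's own layout handling is exactly the question above:

    import onnx
    from onnx import helper

    model = onnx.load("model.onnx")
    graph = model.graph

    # Redirect every consumer of the original input to the transposed tensor.
    for node in graph.node:
        node.input[:] = ["input_nhwc" if name == "input" else name
                         for name in node.input]

    # Insert an explicit NCHW -> NHWC Transpose right after the graph input.
    transpose = helper.make_node(
        "Transpose", inputs=["input"], outputs=["input_nhwc"],
        perm=[0, 2, 3, 1], name="pre_transpose")
    graph.node.insert(0, transpose)
    onnx.save(model, "model_nhwc.onnx")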

    opened by leiwen83 12
  • [gemm_fp32_fma performance] For common shapes, gemm_fp32_fma is nearly on par with TensorFlow 1.15's Eigen matmul?

    Question: while benchmarking SGEMM, I found that OpenPPL's gemm_fp32_fma is nearly on par with TensorFlow 1.15's Eigen matmul for some common shapes. Is this expected, and is there any way to improve it? Setup: OpenPPL v0.8 on a 32-core Intel machine, with multi-threading enabled for both. Build command: ./build.sh -DPPLNN_USE_X86_64=ON -DPPLNN_ENABLE_ONNX_MODEL=OFF -DPPL_USE_X86_OMP=ON -DPPLNN_USE_OPENMP=ON. The test data are below: (benchmark table attached as an image)
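
    For context, GEMM throughput is usually compared in GFLOP/s, where one M x N x K GEMM costs 2*M*N*K floating-point operations. A minimal timing sketch, assuming NumPy is linked against a multi-threaded BLAS; the shapes below are placeholders, not the shapes from the report:

    import time
    import numpy as np

    def gemm_gflops(m, n, k, iters=10):
        a = np.random.rand(m, k).astype(np.float32)
        b = np.random.rand(k, n).astype(np.float32)
        a @ b  # warm-up
        start = time.perf_counter()
        for _ in range(iters):
            a @ b
        elapsed = (time.perf_counter() - start) / iters
        return 2.0 * m * n * k / elapsed / 1e9  # one GEMM costs 2*M*N*K FLOPs

    for shape in [(256, 256, 256), (1024, 1024, 1024)]:
        print(shape, "%.1f GFLOP/s" % gemm_gflops(*shape))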

    opened by huangmiumang 9
  • [CUDA] `RuntimeBuilder.Preprocess()` causes subsequent CUDA function calls to fail

    What are the problems?(screenshots or detailed error messages)

    I observe that for some models (e.g. YOLOX-s and DBNet-r18; others like ResNet-18 are fine), after creating a runtime with RuntimeBuilder, subsequent CUDA function calls (or kernel launches) may fail.

    I first got the CUDA invalid argument error when testing ppl.nn with mmdeploy's test.py, at a point after runtime creation and before inference, while copying data from host to device. Later I hit the same problem when testing with mmdeploy's SDK.

    After digging around for a while, I found that the simplest way to reproduce the problem using pplnn.py is:

    insert the following code

    import torch
    # Any CUDA call made after runtime creation fails; this minimal one triggers it.
    t = torch.Tensor([[1,1],[1,1]]).cuda()
    

    to https://github.com/openppl-public/ppl.nn/blob/1ae5d95f3ee49b3e582564cc004443931fbe2f7a/tools/pplnn.py#L564 and then

    python pplnn.py --use-cuda --onnx-model model.onnx --in-shape 1_3_640_640 --quick-select
    

    got

    INFO: PPLNN version: [0.8.0], commit: [02418bb57bef2d888b57d44589a599080cb806d9]
    [INFO][2022-07-06 22:23:06.057][utils.cc:456] total partition(s) of graph[torch-jit-export]: 1.
    [INFO][2022-07-06 22:23:06.067][opt_graph.cc:324] added 1020 new bridge kernels
    [INFO][2022-07-06 22:23:06.223][opt_graph.cc:581] deleted 990 bridge kernels
    Traceback (most recent call last):
      File "pplnn.py", line 567, in <module>
        t = torch.Tensor([[1,1],[1,1]]).cuda()
    RuntimeError: CUDA error: invalid argument
    

    Which version(commit id or tag) of ppl.nn is used?

    02418bb57bef2d888b57d44589a599080cb806d9

    What's the operating system ppl.nn runs on?

    Ubuntu 18.04

    What's the compiler and its version?

    GCC-7.5, CUDA-11.1

    What are the commands used to build ppl.nn?

    cmake .. \
        -DCMAKE_INSTALL_PREFIX=/workspace/ppl.nn/install \
        -DPPLNN_ENABLE_PYTHON_API=ON \
        -DPPLNN_USE_X86_64=ON \
        -DPPLNN_USE_CUDA=ON \
        -DPPL_USE_X86_AVX512=OFF \
        -DPPLNN_ENABLE_CUDA_JIT=OFF \
        -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_CUDA_ARCHITECTURES=75
    
    opened by lzhangzz 9
  • Mask R-CNN failed with pplnn

    The model was converted from the mmdetection library. When I try to execute it with pplnn, it shows these errors:

    [INFO][2021-07-14 17:18:19.999][pplnn.cc:703] ppl.nn version: 5d56662bf5a288898f0dd5b90f763459cc86f47a
    [WARNING][2021-07-14 17:18:21.873][engine.cc:209] Default input dims for dynamic graph are 1_3_224_224, we recommend using '--dims' to set a suitable training shape.
    [INFO][2021-07-14 17:18:21.873][pplnn.cc:104] ***** register CudaEngine *****
    [INFO][2021-07-14 17:18:22.320][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export]: 1.
    [ERROR][2021-07-14 17:18:22.322][reshape_reshape.cc:66] infer shape failed.
    [ERROR][2021-07-14 17:18:22.338][reshape_add.cc:39] unbroadcastable input.
    [ERROR][2021-07-14 17:18:22.339][reshape_add.cc:39] unbroadcastable input.
    [ERROR][2021-07-14 17:18:22.340][reshape_add.cc:39] unbroadcastable input.
    [ERROR][2021-07-14 17:18:22.341][reshape_concat.cc:42] input shape not match.
    [ERROR][2021-07-14 17:18:22.341][reshape_add.cc:39] unbroadcastable input.
    [ERROR][2021-07-14 17:18:22.342][reshape_add.cc:39] unbroadcastable input.
    [ERROR][2021-07-14 17:18:22.342][reshape_add.cc:39] unbroadcastable input.
    [ERROR][2021-07-14 17:18:22.342][reshape_add.cc:39] unbroadcastable input.
    [ERROR][2021-07-14 17:18:22.342][reshape_add.cc:39] unbroadcastable input.
    [ERROR][2021-07-14 17:18:22.343][reshape_add.cc:39] unbroadcastable input.
    [ERROR][2021-07-14 17:18:22.343][reshape_unsqueeze.cc:36] axes overflow.
    [ERROR][2021-07-14 17:18:22.343][reshape_concat.cc:42] input shape not match.
    [ERROR][2021-07-14 17:18:22.344][reshape_split.cc:59] splited axis and sum of split point not match.
    [ERROR][2021-07-14 17:18:22.344][reshape_split.cc:59] splited axis and sum of split point not match.
    [ERROR][2021-07-14 17:18:22.345][reshape_split.cc:59] splited axis and sum of split point not match.
    [ERROR][2021-07-14 17:18:22.345][reshape_split.cc:59] splited axis and sum of split point not match.
    [ERROR][2021-07-14 17:18:22.345][reshape_split.cc:59] splited axis and sum of split point not match.
    [INFO][2021-07-14 17:18:22.346][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export1]: 1.
    [INFO][2021-07-14 17:18:22.346][opt_graph.cc:204] Create 2 TensorImpl
    [INFO][2021-07-14 17:18:22.346][opt_graph.cc:316] added 2 new bridge kernels
    [INFO][2021-07-14 17:18:22.346][opt_graph.cc:478] deleted 1 bridge kernels
    [INFO][2021-07-14 17:18:22.347][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export2]: 1.
    [INFO][2021-07-14 17:18:22.347][opt_graph.cc:204] Create 20 TensorImpl
    [INFO][2021-07-14 17:18:22.347][opt_graph.cc:316] added 21 new bridge kernels
    [INFO][2021-07-14 17:18:22.347][opt_graph.cc:478] deleted 14 bridge kernels
    [ERROR][2021-07-14 17:18:22.348][reshape_split.cc:59] splited axis and sum of split point not match.
    [ERROR][2021-07-14 17:18:22.348][reshape_concat.cc:42] input shape not match.
    [ERROR][2021-07-14 17:18:22.348][reshape_split.cc:59] splited axis and sum of split point not match.
    [ERROR][2021-07-14 17:18:22.349][reshape_split.cc:59] splited axis and sum of split point not match.
    [ERROR][2021-07-14 17:18:22.349][reshape_split.cc:59] splited axis and sum of split point not match.
    [ERROR][2021-07-14 17:18:22.349][reshape_split.cc:59] splited axis and sum of split point not match.
    [ERROR][2021-07-14 17:18:22.350][reshape_split.cc:59] splited axis and sum of split point not match.
    [ERROR][2021-07-14 17:18:22.350][reshape_split.cc:59] splited axis and sum of split point not match.
    [ERROR][2021-07-14 17:18:22.350][reshape_split.cc:59] splited axis and sum of split point not match.
    [ERROR][2021-07-14 17:18:22.389][reshape_add.cc:39] unbroadcastable input.
    [ERROR][2021-07-14 17:18:22.389][reshape_add.cc:39] unbroadcastable input.
    [ERROR][2021-07-14 17:18:22.390][reshape_add.cc:39] unbroadcastable input.
    [ERROR][2021-07-14 17:18:22.390][reshape_add.cc:39] unbroadcastable input.
    [ERROR][2021-07-14 17:18:22.390][reshape_add.cc:39] unbroadcastable input.
    [ERROR][2021-07-14 17:18:22.391][reshape_add.cc:39] unbroadcastable input.
    [ERROR][2021-07-14 17:18:22.391][reshape_unsqueeze.cc:36] axes overflow.
    [ERROR][2021-07-14 17:18:22.391][reshape_unsqueeze.cc:36] axes overflow.
    [INFO][2021-07-14 17:18:22.392][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export3]: 1.
    [INFO][2021-07-14 17:18:22.392][opt_graph.cc:204] Create 2 TensorImpl
    [INFO][2021-07-14 17:18:22.392][opt_graph.cc:316] added 2 new bridge kernels
    [INFO][2021-07-14 17:18:22.392][opt_graph.cc:478] deleted 1 bridge kernels
    [INFO][2021-07-14 17:18:22.392][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export4]: 1.
    [INFO][2021-07-14 17:18:22.393][opt_graph.cc:204] Create 20 TensorImpl
    [INFO][2021-07-14 17:18:22.393][opt_graph.cc:316] added 21 new bridge kernels
    [INFO][2021-07-14 17:18:22.408][opt_graph.cc:478] deleted 14 bridge kernels
    [ERROR][2021-07-14 17:18:22.408][reshape_split.cc:59] splited axis and sum of split point not match.
    [ERROR][2021-07-14 17:18:22.409][reshape_concat.cc:42] input shape not match.
    [ERROR][2021-07-14 17:18:22.409][reshape_split.cc:59] splited axis and sum of split point not match.
    [ERROR][2021-07-14 17:18:22.409][reshape_split.cc:59] splited axis and sum of split point not match.
    [ERROR][2021-07-14 17:18:22.410][reshape_split.cc:59] splited axis and sum of split point not match.
    [ERROR][2021-07-14 17:18:22.410][reshape_split.cc:59] splited axis and sum of split point not match.
    [ERROR][2021-07-14 17:18:22.410][reshape_split.cc:59] splited axis and sum of split point not match.
    [ERROR][2021-07-14 17:18:22.411][reshape_split.cc:59] splited axis and sum of split point not match.
    [ERROR][2021-07-14 17:18:22.411][reshape_split.cc:59] splited axis and sum of split point not match.
    [ERROR][2021-07-14 17:18:22.413][reshape_split.cc:59] splited axis and sum of split point not match.
    [INFO][2021-07-14 17:18:22.426][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export5]: 1.
    [ERROR][2021-07-14 17:18:22.426][reshape_concat.cc:42] input shape not match.
    [ERROR][2021-07-14 17:18:22.427][reshape_concat.cc:42] input shape not match.
    [ERROR][2021-07-14 17:18:22.427][reshape_concat.cc:42] input shape not match.
    [ERROR][2021-07-14 17:18:22.427][reshape_concat.cc:42] input shape not match.
    [ERROR][2021-07-14 17:18:22.427][reshape_concat.cc:42] input shape not match.
    [ERROR][2021-07-14 17:18:22.428][reshape_concat.cc:42] input shape not match.
    [ERROR][2021-07-14 17:18:22.428][reshape_concat.cc:42] input shape not match.
    [ERROR][2021-07-14 17:18:22.428][reshape_concat.cc:42] input shape not match.
    [ERROR][2021-07-14 17:18:22.429][reshape_concat.cc:42] input shape not match.
    [INFO][2021-07-14 17:18:22.429][opt_graph.cc:204] Create 135 TensorImpl
    [INFO][2021-07-14 17:18:22.430][opt_graph.cc:316] added 174 new bridge kernels
    [INFO][2021-07-14 17:18:22.433][opt_graph.cc:478] deleted 153 bridge kernels
    [INFO][2021-07-14 17:18:22.434][opt_graph.cc:204] Create 2263 TensorImpl
    [INFO][2021-07-14 17:18:22.660][opt_graph.cc:316] added 2626 new bridge kernels
    [INFO][2021-07-14 17:20:05.963][opt_graph.cc:478] deleted 2547 bridge kernels
    [ERROR][2021-07-14 17:20:06.007][scheduler_common.cc:170] exec kernel[Pad_146] failed: invalid value
    [ERROR][2021-07-14 17:20:06.007][sequential_scheduler.cc:116] execute kernel[Pad_146] failed: invalid value
    [ERROR][2021-07-14 17:20:06.007][pplnn.cc:804] Run() failed: invalid value
    

    I'm running it with real image data. Does pplnn support Mask R-CNN, or what should I do to execute it successfully? Thanks a lot! The model was generated by this command:

    python ../tools/deployment/pytorch2onnx.py ../configs/mask_rcnn/mask_rcnn_r50_fpn_mstrain-poly_3x_coco.py \
    mask_rcnn_r50_fpn_mstrain-poly_3x_coco_20210524_201154-21b550bb.pth \
    --output-file mask_rcnn.onnx --simplify --dynamic-export
    
    opened by Maosquerade 9
  • tools/pplnn.py --use-cuda output error

    What are the problems?(screenshots or detailed error messages)

    Using ./tools/pplnn.py --use-cuda --onnx-model tests/testdata/conv.onnx to test the Python API and the CUDA engine, I added printing of the input and output data values at https://github.com/openppl-public/ppl.nn/blob/master/tools/pplnn.py#L499 and https://github.com/openppl-public/ppl.nn/blob/master/tools/pplnn.py#L511. The input tensor and output tensor appear to have the same values, which differs from the x86 engine's output:

    INFO: PPLNN version: [0.6.3], commit: [9444a9d2ee0b89d8cd4a2fee8cef839fedfe8837]
    [INFO][2022-04-19 18:43:40.768][engine_graph_partitioner.cc:103] total partition(s) of graph[torch-jit-export]: 1.
    [INFO][2022-04-19 18:43:40.768][opt_graph.cc:329] added 4 new bridge kernels
    [INFO][2022-04-19 18:43:40.770][algo_conv_hmma.cc:129] Compiling Conv_0
    [INFO][2022-04-19 18:43:41.454][opt_graph.cc:583] deleted 2 bridge kernels
    INFO: ----- input info -----
    INFO: input[0]
    INFO:     name: input
    INFO:     dim(s): [1, 3, 4, 4]
    INFO:     type: FLOAT32
    INFO:     format: NDARRAY
    INFO:     byte(s) excluding padding: 192
    INFO:     in_data: [[[[-0.7580919  -1.0537796  -1.4523766  -1.1736736 ]
       [-0.50453496 -1.48383    -1.3174736  -0.8811438 ]
       [-1.5446684  -0.33240414 -1.429975   -1.172169  ]
       [-1.2639251  -0.00716734 -0.26453447 -1.4403057 ]]
    
      [[-1.6206262  -1.3826382  -0.74133873 -0.9391637 ]
       [-0.42861128 -0.09090185 -1.2538221  -0.02137303]
       [-0.074507   -0.29974604 -0.45086026 -1.9801757 ]
       [-0.07279325 -0.67775655 -1.4832225  -1.862076  ]]
    
      [[-1.0764339  -0.25367737 -1.8603811  -1.5876365 ]
       [-1.8216178  -0.6460962  -0.5559113  -0.9660294 ]
       [-1.837322   -1.0467303  -0.04060197 -0.5114651 ]
       [-0.21527338 -0.26388478 -1.6131785  -1.4633346 ]]]]
    INFO: ----- output info -----
    INFO: output[0]
    INFO:     name: 5
    INFO:     dim(s): [1, 3, 5, 5]
    INFO:     type: FLOAT32
    INFO:     format: NDARRAY
    INFO:     byte(s) excluding padding: 300
    INFO:     out_data: [[[[-0.7580919  -1.0537796  -1.4523766  -1.1736736  -0.50453496]
       [-1.48383    -1.3174736  -0.8811438  -1.5446684  -0.33240414]
       [-1.429975   -1.172169   -1.2639251  -0.00716734 -0.26453447]
       [-1.4403057  -1.6206262  -1.3826382  -0.74133873 -0.9391637 ]
       [-0.42861128 -0.09090185 -1.2538221  -0.02137303 -0.074507  ]]
    
      [[-0.29974604 -0.45086026 -1.9801757  -0.07279325 -0.67775655]
       [-1.4832225  -1.862076   -1.0764339  -0.25367737 -1.8603811 ]
       [-1.5876365  -1.8216178  -0.6460962  -0.5559113  -0.9660294 ]
       [-1.837322   -1.0467303  -0.04060197 -0.5114651  -0.21527338]
       [-0.26388478 -1.6131785  -1.4633346   0.          0.        ]]
    
      [[ 0.          0.          0.          0.          0.        ]
       [ 0.          0.          0.          0.          0.        ]
       [ 0.          0.          0.          0.          0.        ]
       [ 0.          0.          0.          0.          0.        ]
       [ 0.          0.          0.          0.          0.        ]]]]
    INFO: Run ok
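
    One way to double-check these values is to compute a reference result for the same model with another runtime. A hedged sketch, assuming onnxruntime is installed; the input name "input" and the shape [1, 3, 4, 4] are taken from the log above:

    import numpy as np
    import onnxruntime as ort

    x = np.random.randn(1, 3, 4, 4).astype(np.float32)
    sess = ort.InferenceSession("tests/testdata/conv.onnx",
                                providers=["CPUExecutionProvider"])
    ref = sess.run(None, {"input": x})[0]
    # Feed the same x to the pplnn.py CUDA run and compare:
    # np.allclose(ref, cuda_output, atol=1e-3) should hold for a correct engine.
    print(ref.shape)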
    

    Which version(commit id or tag) of ppl.nn is used?

    PPLNN version: [0.6.3], commit: [9444a9d2ee0b89d8cd4a2fee8cef839fedfe8837]

    What's the operating system ppl.nn runs on?

    Ubuntu 18.04

    What's the compiler and its version?

    g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

    What are the commands used to build ppl.nn?

    ./build.sh -DHPCC_USE_X86_64=ON -DPPLNN_ENABLE_PYTHON_API=ON -DHPCC_USE_CUDA=ON

    What are the execution commands?

    PYTHONPATH=./pplnn-build/install/lib python3 ./tools/pplnn.py --use-cuda --onnx-model tests/testdata/conv.onnx

    minimal code snippets for reproducing these problems(if necessary)

    models and inputs for reproducing these problems (send them to [email protected] if necessary)

    opened by sky-fun 8
  • CUDA inference reports errors

    What are the problems?(snapshots or detailed error messages)

    I converted the C++ classification example project to use CUDA inference (the x86 build compiles and runs normally, and both the CUDA and x86 benchmarks run fine). The build prints the following:

    $ bear make -j
    Consolidate compiler generated dependencies of target classification
    [ 50%] Building CXX object CMakeFiles/classification.dir/classification.cpp.o
    [100%] Linking CXX executable classification
    /home/ubuntu/Documents/ppl.nn/pplnn-build/install/lib/cmake/ppl/../../../lib/libpplnn_static.a(engine_factory.cc.o): In function ‘ppl::nn::CudaEngineFactory::Create(ppl::nn::CudaEngineOptions const&)’:
    /home/ubuntu/Documents/ppl.nn/src/ppl/nn/engines/cuda/module/cuda_module.h:42: undefined reference to ‘cuModuleUnload’
    /home/ubuntu/Documents/ppl.nn/pplnn-build/install/lib/cmake/ppl/../../../lib/libpplnn_static.a(engine.cc.o): In function ‘ppl::nn::cuda::CudaEngine::~CudaEngine()’:
    /home/ubuntu/Documents/ppl.nn/src/ppl/nn/engines/cuda/module/cuda_module.h:42: undefined reference to ‘cuModuleUnload’
    /home/ubuntu/Documents/ppl.nn/pplnn-build/install/lib/cmake/ppl/../../../lib/libpplnn_static.a(engine.cc.o): In function ‘ppl::nn::cuda::CudaEngine::~CudaEngine()’:
    /home/ubuntu/Documents/ppl.nn/src/ppl/nn/engines/cuda/module/cuda_module.h:42: undefined reference to ‘cuModuleUnload’
    /home/ubuntu/Documents/ppl.nn/pplnn-build/install/lib/cmake/ppl/../../../lib/libpplnn_static.a(cuda_compiler.cc.o): In function ‘ppl::nn::cuda::CUDANVRTCCompile(std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::vector<char const*, std::allocator<char const*> >, int, bool)’:
    /home/ubuntu/Documents/ppl.nn/src/ppl/nn/engines/cuda/module/cuda_compiler.cc:44: undefined reference to ‘nvrtcCreateProgram’
    /home/ubuntu/Documents/ppl.nn/src/ppl/nn/engines/cuda/module/cuda_compiler.cc:45: undefined reference to ‘nvrtcCompileProgram’
    /home/ubuntu/Documents/ppl.nn/src/ppl/nn/engines/cuda/module/cuda_compiler.cc:48: undefined reference to ‘nvrtcGetProgramLogSize’
    /home/ubuntu/Documents/ppl.nn/src/ppl/nn/engines/cuda/module/cuda_compiler.cc:51: undefined reference to ‘nvrtcGetProgramLog’
    /home/ubuntu/Documents/ppl.nn/src/ppl/nn/engines/cuda/module/cuda_compiler.cc:56: undefined reference to ‘nvrtcGetPTXSize’
    /home/ubuntu/Documents/ppl.nn/src/ppl/nn/engines/cuda/module/cuda_compiler.cc:59: undefined reference to ‘nvrtcGetPTX’
    /home/ubuntu/Documents/ppl.nn/src/ppl/nn/engines/cuda/module/cuda_compiler.cc:60: undefined reference to ‘nvrtcDestroyProgram’
    /home/ubuntu/Documents/ppl.nn/src/ppl/nn/engines/cuda/module/cuda_compiler.cc:61: undefined reference to ‘cudaDeviceSynchronize’
    /home/ubuntu/Documents/ppl.nn/src/ppl/nn/engines/cuda/module/cuda_compiler.cc:60: undefined reference to ‘nvrtcGetErrorString’
    /home/ubuntu/Documents/ppl.nn/src/ppl/nn/engines/cuda/module/cuda_compiler.cc:44: undefined reference to ‘nvrtcGetErrorString’
    /home/ubuntu/Documents/ppl.nn/src/ppl/nn/engines/cuda/module/cuda_compiler.cc:59: undefined reference to ‘nvrtcGetErrorString’
    /home/ubuntu/Documents/ppl.nn/src/ppl/nn/engines/cuda/module/cuda_compiler.cc:56: undefined reference to ‘nvrtcGetErrorString’
    /home/ubuntu/Documents/ppl.nn/pplnn-build/install/lib/cmake/ppl/../../../lib/libpplnn_static.a(cuda_module.cc.o): In function ‘ppl::nn::cuda::CUDAModule::GetKernelFunc()’:
    /home/ubuntu/Documents/ppl.nn/src/ppl/nn/engines/cuda/module/cuda_module.cc:25: undefined reference to ‘cuModuleLoadDataEx’
    /home/ubuntu/Documents/ppl.nn/src/ppl/nn/engines/cuda/module/cuda_module.cc:25: undefined reference to ‘cuGetErrorName’
    /home/ubuntu/Documents/ppl.nn/src/ppl/nn/engines/cuda/module/cuda_module.cc:28: undefined reference to ‘cuModuleGetFunction’
    /home/ubuntu/Documents/ppl.nn/src/ppl/nn/engines/cuda/module/cuda_module.cc:28: undefined reference to ‘cuGetErrorName’
    /home/ubuntu/Documents/ppl.nn/pplnn-build/install/lib/cmake/ppl/../../../lib/libpplnn_static.a(cuda_module.cc.o): In function ‘ppl::nn::cuda::CUDAModule::GetKernelFunc(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)’:
    ...
    ...
    ...
    

    Which version(commit id or tag) of ppl.nn is used?

    ppl.nn version: 0a545145b6b1816fd190c6023a588328872fe80f

    What's the operating system ppl.nn runs on?

    Linux ubuntu-1660ti 5.4.0-100-generic #113~18.04.1-Ubuntu SMP Mon Feb 7 15:02:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

    What's the compiler and its version?

    I tried two versions of gcc; neither works:

    • gcc (Ubuntu 9.4.0-1ubuntu1~18.04) 9.4.0
    • gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

    What are the commands used to build ppl.nn?

    ./build.sh -DPPLNN_ENABLE_PYTHON_API=ON -DHPCC_USE_X86_64=ON -DHPCC_USE_CUDA=ON

    What are the execution commands?

    bear make -j

    minimal code snippets for reproducing these problems(if necessary)

    #include "ppl/nn/engines/cuda/cuda_engine_options.h"
    #include "ppl/nn/engines/cuda/engine_factory.h"
    ...
    /************************ 2. create runtime builder from onnx model *************************/
        CudaEngineOptions options;
        options.device_id = 0;
        options.mm_policy = CUDA_MM_BEST_FIT;
    
        auto cuda_engine = CudaEngineFactory::Create(options);
        if (!cuda_engine)
        {
            return false;
        }
        cuda_engine->Configure(ppl::nn::CUDA_CONF_USE_DEFAULT_ALGORITHMS, false);
        vector<unique_ptr<Engine>> engines;
        vector<Engine *> engine_ptrs;
        engines.emplace_back(unique_ptr<Engine>(cuda_engine));
        engine_ptrs.emplace_back(engines[0].get());
    ...
    

    models and inputs for reproducing these problems (send them to [email protected] if necessary)

    opened by watersounds 8
  • centernet runs with memory error.

    My GPU is a Tesla T4, and the sample model runs normally. When I run CenterNet with --mm-policy=mem, it reports an error like this but still produces an output (screenshot attached). When I use --mm-policy=perf, it fails with an out-of-memory error (screenshot attached). Both runs end with a memory error. Is this error familiar to your team, and how can I avoid it?

    opened by Maosquerade 8
  • pplnn fails to run a MobileNetV2 model (using CUDA)

    What are the problems?(screenshots or detailed error messages)

    pplnn fails to run a MobileNetV2 model with the CUDA engine. The model was exported from torchvision.

    ppl.nn version: [0.9.0], commit: [2da19ac438d4f726b8744d650a1751d310fc0710-dirty]
    [INFO][2022-12-04 17:42:46.453][pplnn.cc:308] ***** register CudaEngine *****
    [INFO][2022-12-04 17:42:46.474][utils.cc:369] total partition(s) of graph[torch_jit]: 1.
    [INFO][2022-12-04 17:42:46.478][opt_graph.cc:312] added 242 new bridge kernels
    [INFO][2022-12-04 17:42:46.509][algo_conv_hmma.cc:141] Compiling /features/features.0/features.0.0/Conv
    [INFO][2022-12-04 17:42:51.219][algo_conv_hmma.cc:146] select kernel nvIdxnSm75Fp16Conv_hmma1688_nhwc_b64x32_w32x16_k32_s16
    [INFO][2022-12-04 17:42:51.239][algo_conv_hmma.cc:141] Compiling /features/features.1/conv/conv.1/Conv
    [INFO][2022-12-04 17:42:55.559][algo_conv_hmma.cc:146] select kernel nvIdxnSm75Fp16Conv_hmma1688_nhwc_b64x16_w32x8_k64_s32
    [INFO][2022-12-04 17:42:55.650][algo_conv_hmma.cc:141] Compiling /features/features.2/conv/conv.0/conv.0.0/Conv
    [INFO][2022-12-04 17:42:58.170][algo_conv_hmma.cc:146] select kernel nvIdxnSm75Fp16Conv_hmma1688_nhwc_b128x32_w32x16_k32_s32
    [INFO][2022-12-04 17:42:58.184][algo_conv_hmma.cc:141] Compiling /features/features.2/conv/conv.2/Conv
    [INFO][2022-12-04 17:43:00.891][algo_conv_hmma.cc:146] select kernel nv2spkSm75Fp16Conv_hmma1688_nhwc_f1_b32x16_w16x16_k64_s32_buf2
    [INFO][2022-12-04 17:43:00.921][algo_conv_hmma.cc:141] Compiling /features/features.3/conv/conv.0/conv.0.0/Conv
    [INFO][2022-12-04 17:43:06.278][algo_conv_hmma.cc:146] select kernel nvIdxnSm75Fp16Conv_hmma1688_nhwc_b64x32_w32x8_k32_s32
    [INFO][2022-12-04 17:43:06.289][algo_conv_hmma.cc:141] Compiling /features/features.3/conv/conv.2/Conv
    [INFO][2022-12-04 17:43:06.524][algo_conv_hmma.cc:146] select kernel nv2spkSm75Fp16Conv_hmma1688_nhwc_f1_b64x8_w64x8_k128_s32_buf1
    [INFO][2022-12-04 17:43:06.557][algo_conv_hmma.cc:141] Compiling /features/features.4/conv/conv.0/conv.0.0/Conv
    [INFO][2022-12-04 17:43:12.012][algo_conv_hmma.cc:146] select kernel nvIdxnSm75Fp16Conv_hmma1688_nhwc_b128x32_w64x8_k32_s32
    [INFO][2022-12-04 17:43:12.017][algo_conv_hmma.cc:141] Compiling /features/features.4/conv/conv.2/Conv
    Segmentation fault (core dumped)
    

    What are the types of GPU/CPU you are using?

    RTX 2080 Ti

    What's the operating system ppl.nn runs on?

    Ubuntu 18.04

    What's the compiler and its version?

    g++ 7.5.0 nvcc V10.2.89

    Which version(commit id or tag) of ppl.nn is used?

    2da19ac438d4f726b8744d650a1751d310fc0710-dirty

    What are the commands used to build ppl.nn?

    cmake .. -DCMAKE_BUILD_TYPE=Release -DPPLNN_USE_CUDA=ON -DCMAKE_INSTALL_PREFIX=install
    cmake --build . -j 20 --config Release
    cmake --build . --target install -j 20 --config Release

    What are the execution commands?

    ./pplnn-build/tools/pplnn --use-cuda --onnx-model=mobilenet_v2.onnx --kernel-type=float16 --export-algo-file=algos/mobilenet_v2_fp16.json

    minimal code snippets for reproducing these problems(if necessary)

    import torch
    import torchvision

    # Export a pretrained MobileNetV2 to ONNX (weights must be passed by keyword).
    model = torchvision.models.mobilenet_v2(
        weights=torchvision.models.MobileNet_V2_Weights.DEFAULT)
    dummy_input = torch.randn(1, 3, 224, 224)
    torch.onnx.export(
        model,
        dummy_input,
        "mobilenet_v2.onnx",
        input_names=["inp"],
        output_names=["out"],
        opset_version=11,
    )
    
    ./pplnn-build/tools/pplnn --use-cuda --onnx-model=mobilenet_v2.onnx  --kernel-type=float16 --export-algo-file=algos/mobilenet_v2_fp16.json
    

    models and inputs for reproducing these problems (send them to [email protected] if necessary)

    opened by shiwenloong 7
  • Celeron J1900: running the Python demo reports get unsupported isa 0

    (screenshot attached)

    CPU: J1900 (/proc/cpuinfo)
    vendor_id : GenuineIntel
    cpu family : 6
    model : 55
    model name : Intel(R) Celeron(R) CPU J1900 @ 1.99GHz
    stepping : 9
    microcode : 0x90c
    cpu MHz : 2042.652
    cache size : 1024 KB
    physical id : 0
    siblings : 4
    core id : 0
    cpu cores : 4
    apicid : 0
    initial apicid : 0
    fpu : yes
    fpu_exception : yes
    cpuid level : 11
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer rdrand lahf_lm 3dnowprefetch epb pti ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms dtherm ida arat md_clear
    bugs : cpu_meltdown spectre_v1 spectre_v2 mds msbds_only
    bogomips : 4000.00
    clflush size : 64
    cache_alignment : 64
    address sizes : 36 bits physical, 48 bits virtual

    gcc 7.5.0, OS: Ubuntu 18.04 LTS, PPLNN version: [0.6.3]

    I use the command: PYTHONPATH=./pplnn-build/install/lib python3 ./tools/pplnn.py --use-x86 --onnx-model tests/testdata/conv.onnx

    Could it be that this CPU is too old, so its instruction set is not supported?
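
    For reference, the SIMD feature flags can be checked directly from /proc/cpuinfo (Linux only). A minimal sketch, using common x86 flag names rather than pplnn's internal ISA codes:

    # List which SIMD instruction sets this CPU advertises.
    with open("/proc/cpuinfo") as f:
        flags = set()
        for line in f:
            if line.startswith("flags"):
                flags = set(line.split(":", 1)[1].split())
                break
    for isa in ("sse4_1", "sse4_2", "avx", "avx2", "fma", "avx512f"):
        print(isa, "yes" if isa in flags else "no")
    # The cpuinfo above lists sse4_1/sse4_2 but no avx/avx2/fma, which would be
    # consistent with the x86 engine rejecting this CPU's ISA.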

    opened by F0xZz 7
  • The x86 engine runs fine, but the CUDA engine hangs at Compiling Conv_0 until all 64 GB of memory is exhausted

    [DEBUG][2022-02-26 11:23:12.125][fuse_shape_optimizer.cc:257] Output count 1 for fused shape node[Shape_127_Fused]
    [DEBUG][2022-02-26 11:23:12.126][fuse_shape_optimizer.cc:257] Output count 1 for fused shape node[Shape_139_Fused]
    [DEBUG][2022-02-26 11:23:12.126][fuse_shape_optimizer.cc:257] Output count 1 for fused shape node[Shape_151_Fused]
    [DEBUG][2022-02-26 11:23:12.126][fuse_shape_optimizer.cc:257] Output count 1 for fused shape node[Shape_163_Fused]
    [DEBUG][2022-02-26 11:23:12.126][fuse_shape_optimizer.cc:257] Output count 1 for fused shape node[Shape_176_Fused]
    [DEBUG][2022-02-26 11:23:12.126][fuse_shape_optimizer.cc:257] Output count 1 for fused shape node[Shape_185_Fused]
    [INFO][2022-02-26 11:23:12.127][engine_graph_partitioner.cc:103] total partition(s) of graph[torch-jit-export]: 1.
    [DEBUG][2022-02-26 11:23:12.153][opt_graph.cc:186] Can not reshape safely for node[Resize_170]
    [DEBUG][2022-02-26 11:23:12.154][opt_graph.cc:186] Can not reshape safely for node[Resize_158]
    [DEBUG][2022-02-26 11:23:12.155][opt_graph.cc:186] Can not reshape safely for node[Resize_146]
    [DEBUG][2022-02-26 11:23:12.156][opt_graph.cc:186] Can not reshape safely for node[Resize_134]
    [DEBUG][2022-02-26 11:23:12.156][reshape_concat.cc:43] ERROR: input[1]'s dim[2]'s value[1] != input[0]'s dim[2]'s value[37].
    [DEBUG][2022-02-26 11:23:12.156][opt_graph.cc:186] Can not reshape safely for node[Concat_171]
    [DEBUG][2022-02-26 11:23:12.172][opt_graph.cc:186] Can not reshape safely for node[Resize_183]
    [DEBUG][2022-02-26 11:23:12.172][opt_graph.cc:186] Can not reshape safely for node[Resize_192]
    [DEBUG][2022-02-26 11:23:12.173][opt_graph.cc:200] Create 305 TensorImpl
    [DEBUG][2022-02-26 11:23:12.173][fs_conv.cc:80] Fuse node[Conv_172] and nextnode[Relu_173]
    [DEBUG][2022-02-26 11:23:12.173][fs_conv.cc:80] Fuse node[Conv_124] and nextnode[Relu_125]
    [DEBUG][2022-02-26 11:23:12.173][fs_conv.cc:80] Fuse node[Conv_136] and nextnode[Relu_137]
    [DEBUG][2022-02-26 11:23:12.173][fs_conv.cc:80] Fuse node[Conv_148] and nextnode[Relu_149]
    [DEBUG][2022-02-26 11:23:12.173][fs_conv.cc:80] Fuse node[Conv_160] and nextnode[Relu_161]
    [DEBUG][2022-02-26 11:23:12.173][fs_conv.cc:80] Fuse node[Conv_120] and nextnode[Add_121]
    [DEBUG][2022-02-26 11:23:12.173][fs_conv.cc:80] Fuse node[Conv_120] and nextnode[Relu_122]
    [DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_118] and nextnode[Relu_119]
    [DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_116] and nextnode[Relu_117]
    [DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_113] and nextnode[Add_114]
    [DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_113] and nextnode[Relu_115]
    [DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_111] and nextnode[Relu_112]
    [DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_109] and nextnode[Relu_110]
    [DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_105] and nextnode[Add_107]
    [DEBUG][2022-02-26 11:23:12.174][fs_conv.cc:80] Fuse node[Conv_105] and nextnode[Relu_108]
    [DEBUG][2022-02-26 11:23:12.175][fs_conv.cc:80] Fuse node[Conv_103] and nextnode[Relu_104]
    [DEBUG][2022-02-26 11:23:12.175][fs_conv.cc:80] Fuse node[Conv_101] and nextnode[Relu_102]
    [DEBUG][2022-02-26 11:23:12.175][fs_conv.cc:80] Fuse node[Conv_98] and nextnode[Add_99]
    [DEBUG][2022-02-26 11:23:12.175][fs_conv.cc:80] Fuse node[Conv_98] and nextnode[Relu_100]
    [DEBUG][2022-02-26 11:23:12.175][fs_conv.cc:80] Fuse node[Conv_96] and nextnode[Relu_97]
    [DEBUG][2022-02-26 11:23:12.175][fs_conv.cc:80] Fuse node[Conv_94] and nextnode[Relu_95]
    [DEBUG][2022-02-26 11:23:12.176][fs_conv.cc:80] Fuse node[Conv_91] and nextnode[Add_92]
    [DEBUG][2022-02-26 11:23:12.176][fs_conv.cc:80] Fuse node[Conv_91] and nextnode[Relu_93]
    [DEBUG][2022-02-26 11:23:12.176][fs_conv.cc:80] Fuse node[Conv_89] and nextnode[Relu_90]
    [DEBUG][2022-02-26 11:23:12.176][fs_conv.cc:80] Fuse node[Conv_87] and nextnode[Relu_88]
    [DEBUG][2022-02-26 11:23:12.176][fs_conv.cc:80] Fuse node[Conv_84] and nextnode[Add_85]
    [DEBUG][2022-02-26 11:23:12.177][fs_conv.cc:80] Fuse node[Conv_84] and nextnode[Relu_86]
    [DEBUG][2022-02-26 11:23:12.177][fs_conv.cc:80] Fuse node[Conv_82] and nextnode[Relu_83]
    [DEBUG][2022-02-26 11:23:12.177][fs_conv.cc:80] Fuse node[Conv_80] and nextnode[Relu_81]
    [DEBUG][2022-02-26 11:23:12.177][fs_conv.cc:80] Fuse node[Conv_77] and nextnode[Add_78]
    [DEBUG][2022-02-26 11:23:12.177][fs_conv.cc:80] Fuse node[Conv_77] and nextnode[Relu_79]
    [DEBUG][2022-02-26 11:23:12.177][fs_conv.cc:80] Fuse node[Conv_75] and nextnode[Relu_76]
    [DEBUG][2022-02-26 11:23:12.178][fs_conv.cc:80] Fuse node[Conv_73] and nextnode[Relu_74]
    [DEBUG][2022-02-26 11:23:12.178][fs_conv.cc:80] Fuse node[Conv_70] and nextnode[Add_71]
    [DEBUG][2022-02-26 11:23:12.178][fs_conv.cc:80] Fuse node[Conv_70] and nextnode[Relu_72]
    [DEBUG][2022-02-26 11:23:12.178][fs_conv.cc:80] Fuse node[Conv_68] and nextnode[Relu_69]
    [DEBUG][2022-02-26 11:23:12.178][fs_conv.cc:80] Fuse node[Conv_66] and nextnode[Relu_67]
    [DEBUG][2022-02-26 11:23:12.178][fs_conv.cc:80] Fuse node[Conv_62] and nextnode[Add_64]
    [DEBUG][2022-02-26 11:23:12.179][fs_conv.cc:80] Fuse node[Conv_62] and nextnode[Relu_65]
    [DEBUG][2022-02-26 11:23:12.180][fs_conv.cc:80] Fuse node[Conv_60] and nextnode[Relu_61]
    [DEBUG][2022-02-26 11:23:12.180][fs_conv.cc:80] Fuse node[Conv_58] and nextnode[Relu_59]
    [DEBUG][2022-02-26 11:23:12.181][fs_conv.cc:80] Fuse node[Conv_55] and nextnode[Add_56]
    [DEBUG][2022-02-26 11:23:12.182][fs_conv.cc:80] Fuse node[Conv_55] and nextnode[Relu_57]
    [DEBUG][2022-02-26 11:23:12.182][fs_conv.cc:80] Fuse node[Conv_53] and nextnode[Relu_54]
    [DEBUG][2022-02-26 11:23:12.182][fs_conv.cc:80] Fuse node[Conv_51] and nextnode[Relu_52]
    [DEBUG][2022-02-26 11:23:12.183][fs_conv.cc:80] Fuse node[Conv_48] and nextnode[Add_49]
    [DEBUG][2022-02-26 11:23:12.183][fs_conv.cc:80] Fuse node[Conv_48] and nextnode[Relu_50]
    [DEBUG][2022-02-26 11:23:12.183][fs_conv.cc:80] Fuse node[Conv_46] and nextnode[Relu_47]
    [DEBUG][2022-02-26 11:23:12.183][fs_conv.cc:80] Fuse node[Conv_44] and nextnode[Relu_45]
    [DEBUG][2022-02-26 11:23:12.183][fs_conv.cc:80] Fuse node[Conv_41] and nextnode[Add_42]
    [DEBUG][2022-02-26 11:23:12.183][fs_conv.cc:80] Fuse node[Conv_41] and nextnode[Relu_43]
    [DEBUG][2022-02-26 11:23:12.184][fs_conv.cc:80] Fuse node[Conv_39] and nextnode[Relu_40]
    [DEBUG][2022-02-26 11:23:12.184][fs_conv.cc:80] Fuse node[Conv_37] and nextnode[Relu_38]
    [DEBUG][2022-02-26 11:23:12.184][fs_conv.cc:80] Fuse node[Conv_33] and nextnode[Add_35]
    [DEBUG][2022-02-26 11:23:12.184][fs_conv.cc:80] Fuse node[Conv_33] and nextnode[Relu_36]
    [DEBUG][2022-02-26 11:23:12.185][fs_conv.cc:80] Fuse node[Conv_31] and nextnode[Relu_32]
    [DEBUG][2022-02-26 11:23:12.185][fs_conv.cc:80] Fuse node[Conv_29] and nextnode[Relu_30]
    [DEBUG][2022-02-26 11:23:12.185][fs_conv.cc:80] Fuse node[Conv_26] and nextnode[Add_27]
    [DEBUG][2022-02-26 11:23:12.185][fs_conv.cc:80] Fuse node[Conv_26] and nextnode[Relu_28]
    [DEBUG][2022-02-26 11:23:12.185][fs_conv.cc:80] Fuse node[Conv_24] and nextnode[Relu_25]
    [DEBUG][2022-02-26 11:23:12.186][fs_conv.cc:80] Fuse node[Conv_22] and nextnode[Relu_23]
    [DEBUG][2022-02-26 11:23:12.186][fs_conv.cc:80] Fuse node[Conv_19] and nextnode[Add_20]
    [DEBUG][2022-02-26 11:23:12.186][fs_conv.cc:80] Fuse node[Conv_19] and nextnode[Relu_21]
    [DEBUG][2022-02-26 11:23:12.186][fs_conv.cc:80] Fuse node[Conv_17] and nextnode[Relu_18]
    [DEBUG][2022-02-26 11:23:12.186][fs_conv.cc:80] Fuse node[Conv_15] and nextnode[Relu_16]
    [DEBUG][2022-02-26 11:23:12.186][fs_conv.cc:80] Fuse node[Conv_11] and nextnode[Add_13]
    [DEBUG][2022-02-26 11:23:12.187][fs_conv.cc:80] Fuse node[Conv_11] and nextnode[Relu_14]
    [DEBUG][2022-02-26 11:23:12.187][fs_conv.cc:80] Fuse node[Conv_9] and nextnode[Relu_10]
    [DEBUG][2022-02-26 11:23:12.187][fs_conv.cc:80] Fuse node[Conv_7] and nextnode[Relu_8]
    [DEBUG][2022-02-26 11:23:12.187][fs_conv.cc:80] Fuse node[Conv_4] and nextnode[Relu_5]
    [DEBUG][2022-02-26 11:23:12.187][fs_conv.cc:80] Fuse node[Conv_2] and nextnode[Relu_3]
    [DEBUG][2022-02-26 11:23:12.187][fs_conv.cc:80] Fuse node[Conv_0] and nextnode[Relu_1]
    [INFO][2022-02-26 11:23:12.192][opt_graph.cc:311] added 261 new bridge kernels
    [INFO][2022-02-26 11:23:12.724][algo_conv_hmma.cc:126] Compiling Conv_0

    opened by stujiajia 7
  • [x86-compile] error: impossible constraint in ‘asm’

    I tried to compile the latest master.

    CPU | result
    ------- | -------------
    Core i5-9500 (no AVX-512 support) | error: impossible constraint in ‘asm’
    Xeon 6130 (AVX-512 support) | pass

    I see that the latest commit adds AVX-512 support. If this is a bug, will ppl support more CPUs (without AVX-512), and is there a macro to separate the AVX-512 code paths? Thanks.

    opened by alanzhai219 7
  • pytorch wrapper

    Hi guys, is it possible to supply a torch wrapper for ppl.nn? It would make ppl.nn much easier to use. The wrapper could parse an ONNX file and accept a torch.Tensor for the forward pass.

    improvement 
    opened by ShiyangZhang 0