C++ library based on TensorRT integration

Overview

YoloV5 inference in 3 lines of code: a TensorRT C++ library

  1. Supports the latest TensorRT 8.0, with up-to-date parser operator support
  2. Supports static explicit batch sizes and dynamic implicit batch sizes, which the official parser does not
  3. Supports custom plugins and simplifies the plugin implementation process
  4. Supports FP32, FP16, and INT8 compilation
  5. Optimized code structure; prints the network information during compilation
  6. Optimized memory allocation
  7. YoloV5 inference included as a working example
  8. A C++ class library that wraps compilation and inference, wraps tensors, and supports n-dimensional tensor management

YoloV5 inference in 3 lines of code

// Create an inference engine on GPU 0
auto engine = YoloV5::create_infer("yolov5s.fp32.trtmodel", 0);

// Load an image
auto image = cv::imread("1.jpg");

// Run inference and fetch the result
auto box = engine->commit(image).get();
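
Note that commit() returns a future (hence the .get() call); get() blocks until inference finishes, so multiple images can be committed before collecting their results.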

Demo

YoloV5 ONNX Inference - Downloaded Model

  • This yolov5s.onnx model was exported directly from the latest official release
  • Configure the TensorRT, CUDA, and cuDNN dependencies first; see the configuration section below
git clone git@github.com:shouxieai/tensorRT_cpp.git
cd tensorRT_cpp
make run -j32

YoloV5 ONNX Inference - Exporting the Official Model

  • With PyTorch >= 1.7, the ONNX model exported from yolov5 can be used by this framework directly
  • With PyTorch < 1.7, or for other yolov5 versions (2.0, 3.0, 4.0), a small change to the opset is enough for the export to be supported by the framework
  • For TensorRT inference with older PyTorch versions, dynamic batch sizes, and other advanced topics, open our blog and scan the QR code there to join the discussion group
  1. Download yolov5
git clone git@github.com:ultralytics/yolov5.git
  2. Export the ONNX model
cd yolov5
python export.py
  3. Copy the model and run
cp yolov5/yolov5s.onnx tensorRT_cpp/workspace/
cd tensorRT_cpp
make run -j32

Configuring Project Dependencies

  1. Linux and VSCode are recommended, though Windows is also supported
  2. Configure your cuDNN, CUDA, TensorRT 8.0, and protobuf paths in the Makefile
  3. Configure your library paths in .vscode/c_cpp_properties.json
  4. CUDA version: CUDA10.2
  5. cuDNN version: cudnn8.2.2.26; note that both the dev package (header files) and the runtime package (so files) must be downloaded
  6. TensorRT version: tensorRT-8.0.1.6-cuda10.2
  7. protobuf version (used by the ONNX parser): protobufv3.11.4

Model Compilation - FP32/FP16

TRTBuilder::compile(
  TRTBuilder::TRTMode_FP32,   // compile in FP32 mode
  {},                         // output node names, used only for Caffe
  3,                          // max batch size
  "plugin.onnx",              // ONNX file
  "plugin.fp32.trtmodel",     // path to save the compiled model
  {},                         // redefined input shapes, if any
  false                       // whether to use a dynamic batch size
);
  • For FP32 compilation, only the ONNX file is required; the input node shapes of the ONNX model may be redefined
  • Dynamic versus static batch support is a single option here, something the officially released parser does not offer
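
Compiling in FP16 requires only changing the mode flag. A minimal sketch, assuming the enum value TRTBuilder::TRTMode_FP16 (TRTMode_FP16 appears later in this document under the TRT namespace) and a hypothetical output file name:

TRTBuilder::compile(
  TRTBuilder::TRTMode_FP16,   // only the mode changes relative to the FP32 call above
  {},                         // output node names, used only for Caffe
  3,                          // max batch size
  "plugin.onnx",              // ONNX file
  "plugin.fp16.trtmodel",     // hypothetical save path
  {},                         // redefined input shapes, if any
  false                       // whether to use a dynamic batch size
);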

Model Compilation - INT8

  • INT8 inference is known to be slightly less accurate than FP32 (expect roughly a 5% loss), but it is much, much faster; with the integrated compilation flow shown here, INT8 compilation is easy
// Define the INT8 calibration callback: it reads the data and hands it to the tensor
auto int8process = [](int current, int count, vector<string>& images, shared_ptr<TRTInfer::Tensor>& tensor){
    for(int i = 0; i < images.size(); ++i){

        // INT8 compilation requires calibration: read each image and load it into the tensor via set_norm_mat
        auto image = cv::imread(images[i]);
        cv::resize(image, image, cv::Size(640, 640));
        float mean[] = {0, 0, 0};
        float std[]  = {1, 1, 1};
        tensor->set_norm_mat(i, image, mean, std);
    }
};

// Compile the model as INT8
auto model_file = "yolov5s.int8.trtmodel";
TRTBuilder::compile(
  TRTBuilder::TRTMode_INT8,   // select INT8
  {},                         // output node names, used only for Caffe
  3,                          // max batch size
  "yolov5s.onnx",             // ONNX file
  model_file,                 // path to save the compiled model
  {},                         // redefined input shapes, if any
  false,                      // whether to use a dynamic batch size
  int8process,                // calibration data callback
  ".",                        // directory of calibration images
  ""                          // path for saving/loading the calibration data
);
  • This avoids the official calibration workflow, which is split into separate steps and overly complex; here it is integrated into a single callback function
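
The last argument is the save/load path for the calibration result; a hedged sketch of reusing it across builds (the cache file name below is hypothetical):

// Passing a non-empty path saves the calibration data there and lets it be
// read back on later builds, per the save/load parameter comment above.
TRTBuilder::compile(
  TRTBuilder::TRTMode_INT8, {}, 3, "yolov5s.onnx", model_file,
  {}, false, int8process, ".",
  "yolov5s.calibration.data"  // hypothetical calibration cache file
);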

Model Inference

  • For inference, a Tensor class is provided that manages tensors and data exchange, completely hiding the details of moving data between GPU and CPU
  • An Engine class is provided that wraps model inference and management
// Load the model; a shared pointer is returned, and nullptr means loading failed
auto engine = TRTInfer::load_engine("yolov5s.fp32.trtmodel");

// Print the model information
engine->print();

// Load an image
auto image = cv::imread("demo.jpg");

// Get the model's input and output tensor nodes, by name or by index
auto input = engine->input(0);
auto output = engine->output(0);

// Load the image into the input tensor at batch index 0: subtract the mean, divide by the standard deviation
float mean[] = {0, 0, 0};
float std[]  = {1, 1, 1};
input->set_norm_mat(0, image, mean, std);

// Run inference; both asynchronous and synchronous execution are allowed
engine->forward();

// The pointer obtained here points to the final result and can be accessed directly
float* output_ptr = output->cpu<float>();
// Process output_ptr to obtain the final detections
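
As a hedged illustration of that post-processing for the head shape printed later (1 x 255 x 80 x 80), assuming the usual YoloV5 layout 255 = 3 anchors x (cx, cy, w, h, objectness + 80 class scores); a real decode also needs the anchor sizes and NMS:

// Minimal sketch: scan one output head for grid cells whose objectness clears a threshold.
const int num_anchor = 3, num_attr = 85, grid_h = 80, grid_w = 80;
auto sigmoid = [](float x){ return 1.0f / (1.0f + expf(-x)); };

for(int a = 0; a < num_anchor; ++a){
    for(int y = 0; y < grid_h; ++y){
        for(int x = 0; x < grid_w; ++x){
            // channel-major layout: attribute k of anchor a lives in channel a * num_attr + k
            float objectness = sigmoid(output_ptr[((a * num_attr + 4) * grid_h + y) * grid_w + x]);
            if(objectness > 0.25f){
                // candidate detection: decode cx, cy, w, h using the anchors, then run NMS
            }
        }
    }
}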

A Plugin Example

  • Only the necessary kernels and the inference procedure need to be defined; the plugin's serialization, deserialization, and registration are completely hidden
  • Plugins supporting both FP32 and FP16 can be implemented concisely; see the HSwish cu/hpp code for details
template<>
__global__ void HSwishKernel(float* input, float* output, int edge) {

    KernelPositionBlock;
    float x = input[position];
    float a = x + 3;
    a = a < 0 ? 0 : (a >= 6 ? 6 : a);
    output[position] = x * a / 6;
}

int HSwish::enqueue(const std::vector<GTensor>& inputs, std::vector<GTensor>& outputs, const std::vector<GTensor>& weights, void* workspace, cudaStream_t stream) {

    int count = inputs[0].count();
    auto grid = cuda::grid_dims(count);
    auto block = cuda::block_dims(count);
    HSwishKernel <<<grid, block, 0, stream >>> (inputs[0].ptr<float>(), outputs[0].ptr<float>(), count);
    return 0;
}

RegisterPlugin(HSwish);
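
The KernelPositionBlock macro and the cuda::grid_dims / cuda::block_dims helpers are not shown in this excerpt; a typical implementation (an assumption for illustration, not necessarily the repo's exact code) looks like:

// Hypothetical equivalents of the helpers used by the kernel above.
#define KernelPositionBlock                                       \
    int position = blockDim.x * blockIdx.x + threadIdx.x;         \
    if (position >= edge) return;

namespace cuda {
    const int GPU_BLOCK_THREADS = 512;  // assumed threads-per-block

    dim3 block_dims(int numJobs){
        return dim3(numJobs < GPU_BLOCK_THREADS ? numJobs : GPU_BLOCK_THREADS);
    }

    dim3 grid_dims(int numJobs){
        int threads = numJobs < GPU_BLOCK_THREADS ? numJobs : GPU_BLOCK_THREADS;
        return dim3((numJobs + threads - 1) / threads);
    }
}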

How to Run

  1. Configure the dependency paths in the Makefile
  2. Run make run -j64

Execution Output

[2021-07-22 14:37:11][info][_main.cpp:160]:===================== test fp32 ==================================
[2021-07-22 14:37:11][info][trt_builder.cpp:430]:Compile FP32 Onnx Model 'yolov5s.onnx'.
[2021-07-22 14:37:18][warn][trt_infer.cpp:27]:NVInfer WARNING: src/tensorRT/onnx_parser/ModelImporter.cpp:257: Change input batch size: images, final dimensions: (1, 3, 640, 640), origin dimensions: (5, 3, 640, 640)
[2021-07-22 14:37:18][info][trt_builder.cpp:548]:Input shape is 1 x 3 x 640 x 640
[2021-07-22 14:37:18][info][trt_builder.cpp:549]:Set max batch size = 3
[2021-07-22 14:37:18][info][trt_builder.cpp:550]:Set max workspace size = 1024.00 MB
[2021-07-22 14:37:18][info][trt_builder.cpp:551]:Dynamic batch dimension is true
[2021-07-22 14:37:18][info][trt_builder.cpp:554]:Network has 1 inputs:
[2021-07-22 14:37:18][info][trt_builder.cpp:560]:      0.[images] shape is 1 x 3 x 640 x 640
[2021-07-22 14:37:18][info][trt_builder.cpp:566]:Network has 3 outputs:
[2021-07-22 14:37:18][info][trt_builder.cpp:571]:      0.[470] shape is 1 x 255 x 80 x 80
[2021-07-22 14:37:18][info][trt_builder.cpp:571]:      1.[471] shape is 1 x 255 x 40 x 40
[2021-07-22 14:37:18][info][trt_builder.cpp:571]:      2.[472] shape is 1 x 255 x 20 x 20
[2021-07-22 14:37:18][verbo][trt_builder.cpp:575]:Network has 226 layers:
[2021-07-22 14:37:18][verbo][trt_builder.cpp:606]:  >>> 0.  Slice              1 x 3 x 640 x 640 -> 1 x 3 x 320 x 640 
[2021-07-22 14:37:18][verbo][trt_builder.cpp:606]:      1.  Slice              1 x 3 x 320 x 640 -> 1 x 3 x 320 x 320 
[2021-07-22 14:37:18][verbo][trt_builder.cpp:606]:  >>> 2.  Slice              1 x 3 x 640 x 640 -> 1 x 3 x 320 x 640 
[2021-07-22 14:37:18][verbo][trt_builder.cpp:606]:      3.  Slice              1 x 3 x 320 x 640 -> 1 x 3 x 320 x 320 
[2021-07-22 14:37:18][verbo][trt_builder.cpp:606]:  >>> 4.  Slice              1 x 3 x 640 x 640 -> 1 x 3 x 320 x 640 
[2021-07-22 14:37:18][verbo][trt_builder.cpp:606]:      5.  Slice              1 x 3 x 320 x 640 -> 1 x 3 x 320 x 320 
[2021-07-22 14:37:18][verbo][trt_builder.cpp:606]:  >>> 6.  Slice              1 x 3 x 640 x 640 -> 1 x 3 x 320 x 640 
[2021-07-22 14:37:18][verbo][trt_builder.cpp:606]:      7.  Slice              1 x 3 x 320 x 640 -> 1 x 3 x 320 x 320
[2021-07-22 14:37:18][verbo][trt_builder.cpp:606]:      222.LeakyRelu          1 x 768 x 20 x 20 -> 1 x 768 x 20 x 20 
[2021-07-22 14:37:18][verbo][trt_builder.cpp:606]:  *** 223.Convolution        1 x 192 x 80 x 80 -> 1 x 255 x 80 x 80 channel: 255, kernel: 1 x 1, padding: 0 x 0, stride: 1 x 1, dilation: 1 x 1, group: 1
[2021-07-22 14:37:18][verbo][trt_builder.cpp:606]:  *** 224.Convolution        1 x 384 x 40 x 40 -> 1 x 255 x 40 x 40 channel: 255, kernel: 1 x 1, padding: 0 x 0, stride: 1 x 1, dilation: 1 x 1, group: 1
[2021-07-22 14:37:18][verbo][trt_builder.cpp:606]:  *** 225.Convolution        1 x 768 x 20 x 20 -> 1 x 255 x 20 x 20 channel: 255, kernel: 1 x 1, padding: 0 x 0, stride: 1 x 1, dilation: 1 x 1, group: 1
[2021-07-22 14:37:18][info][trt_builder.cpp:615]:Building engine...
[2021-07-22 14:37:19][warn][trt_infer.cpp:27]:NVInfer WARNING: Detected invalid timing cache, setup a local cache instead
[2021-07-22 14:37:40][info][trt_builder.cpp:635]:Build done 22344 ms !
Engine 0x23dd7780 detail
        Max Batch Size: 3
        Dynamic Batch Dimension: true
        Inputs: 1
                0.images : shape {1 x 3 x 640 x 640}
        Outputs: 3
                0.470 : shape {1 x 255 x 80 x 80}
                1.471 : shape {1 x 255 x 40 x 40}
                2.472 : shape {1 x 255 x 20 x 20}
[2021-07-22 14:37:42][info][_main.cpp:77]:input.shape = 3 x 3 x 640 x 640
[2021-07-22 14:37:42][info][_main.cpp:96]:input->shape_string() = 3 x 3 x 640 x 640
[2021-07-22 14:37:42][info][_main.cpp:124]:outputs[0].size = 2
[2021-07-22 14:37:42][info][_main.cpp:124]:outputs[1].size = 5
[2021-07-22 14:37:42][info][_main.cpp:124]:outputs[2].size = 1

About

Comments
  • no result in int8 mode

    Hi, thanks for sharing this awesome code!

    I tested this repo on Ubuntu 18 Linux with CUDA V11.2.152, TensorRT-8.0.0.3, and cudnn8.2.0.

    I tested the yolox_s model in FP32, FP16, and INT8 modes, but in INT8 mode the network produces no detections.

    FP16 test code is below:

    static void test_fp16(Yolo::Type type){
    
        TRT::set_device(0);
        INFO("===================== test %s fp16 ==================================", Yolo::type_name(type));
    
        const char* name = nullptr;
        if(type == Yolo::Type::V5){
            name = "yolov5m";
        }else if(type == Yolo::Type::X){
            name = "yolox_s";
        }
    
        if(not requires(name))
            return;
    
        string onnx_file = iLogger::format("%s.onnx", name);
        string model_file = iLogger::format("%s.fp16.trtmodel", name);
        int test_batch_size = 1;  // when using a batch size greater than 1, check whether the model has been modified accordingly (see the code-modification section of readme.md); otherwise there will be errors
        
        // For dynamic versus static batch details, open http://www.zifuture.com:8090/,
        // find the QR code on the right, and scan it to join the discussion group (free; just engineers communicating)
        if(not iLogger::exists(model_file)){
            TRT::compile(
                TRT::TRTMode_FP16,          // compile mode: FP32, FP16, or INT8
                {},                         // ignored for ONNX; output node names for Caffe
                test_batch_size,            // batch size to compile for
                onnx_file,                  // ONNX file to compile
                model_file,                 // model file to save
                {},                         // input shapes to redefine; the ONNX input shapes can be overridden here
                false                       // whether to use a dynamic batch dimension: true for dynamic, false for a static fixed batch size
            );
        }
    
        forward_engine(model_file, type);
    }
    
    

    Below is the output of the log:

    [2021-09-01 19:38:45][info][app_yolo.cpp:240]:===================== test YoloX fp32 ==================================
    [2021-09-01 19:38:45][info][trt_builder.cpp:473]:Compile FP32 Onnx Model 'yolox_s.onnx'.
    [2021-09-01 19:38:45][info][trt_builder.cpp:602]:Input shape is 1 x 3 x 640 x 640
    [2021-09-01 19:38:45][info][trt_builder.cpp:603]:Set max batch size = 1
    [2021-09-01 19:38:45][info][trt_builder.cpp:604]:Set max workspace size = 1024.00 MB
    [2021-09-01 19:38:45][info][trt_builder.cpp:605]:Dynamic batch dimension is false
    [2021-09-01 19:38:45][info][trt_builder.cpp:608]:Network has 1 inputs:
    [2021-09-01 19:38:45][info][trt_builder.cpp:614]:      0.[images] shape is 1 x 3 x 640 x 640
    [2021-09-01 19:38:45][info][trt_builder.cpp:620]:Network has 1 outputs:
    [2021-09-01 19:38:45][info][trt_builder.cpp:625]:      0.[output] shape is 1 x 8400 x 85
    [2021-09-01 19:38:45][info][trt_builder.cpp:670]:Building engine...
    [2021-09-01 19:38:45][warn][trt_builder.cpp:33]:NVInfer WARNING: Convolution + generic activation fusion is disable due to incompatible driver or nvrtc
    [2021-09-01 19:38:46][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
    [2021-09-01 19:38:46][warn][trt_builder.cpp:33]:NVInfer WARNING: Detected invalid timing cache, setup a local cache instead
    [2021-09-01 19:39:18][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
    [2021-09-01 19:39:18][info][trt_builder.cpp:690]:Build done 32689 ms !
    [2021-09-01 19:39:18][warn][trt_builder.cpp:33]:NVInfer WARNING: The logger passed into createInferRuntime differs from one already assigned, 0x557f9ae330b0, logger not updated.
    
    [2021-09-01 19:39:19][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
    [2021-09-01 19:39:19][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
    [2021-09-01 19:39:19][info][trt_infer.cpp:169]:Infer 0x7fe3f8000c40 detail
    [2021-09-01 19:39:19][info][trt_infer.cpp:170]: Max Batch Size: 1
    [2021-09-01 19:39:19][info][trt_infer.cpp:171]: Dynamic Batch Dimension: false
    [2021-09-01 19:39:19][info][trt_infer.cpp:172]: Inputs: 1
    [2021-09-01 19:39:19][info][trt_infer.cpp:176]:         0.images : shape {1 x 3 x 640 x 640}
    [2021-09-01 19:39:19][info][trt_infer.cpp:179]: Outputs: 1
    [2021-09-01 19:39:19][info][trt_infer.cpp:183]:         0.output : shape {1 x 8400 x 85}
    [2021-09-01 19:39:19][info][app_yolo.cpp:94]:Save to YoloX_result/car.jpg, 2 object, 10.10 ms
    [2021-09-01 19:39:19][info][app_yolo.cpp:94]:Save to YoloX_result/group.jpg, 1 object, 6.19 ms
    [2021-09-01 19:39:19][info][app_yolo.cpp:94]:Save to YoloX_result/zand.jpg, 2 object, 8.72 ms
    [2021-09-01 19:39:19][info][app_yolo.cpp:94]:Save to YoloX_result/zgjr.jpg, 3 object, 6.03 ms
    [2021-09-01 19:39:19][info][app_yolo.cpp:94]:Save to YoloX_result/gril.jpg, 1 object, 5.95 ms
    [2021-09-01 19:39:19][info][app_yolo.cpp:94]:Save to YoloX_result/yq.jpg, 1 object, 5.92 ms
    [2021-09-01 19:39:19][info][yolo.cpp:214]:Engine destroy.
    [2021-09-01 19:39:19][info][app_yolo.cpp:277]:===================== test YoloX fp16 ==================================
    [2021-09-01 19:39:19][info][trt_builder.cpp:473]:Compile FP16 Onnx Model 'yolox_s.onnx'.
    [2021-09-01 19:39:19][warn][trt_builder.cpp:483]:Platform not have fast fp16 support
    [2021-09-01 19:39:19][info][trt_builder.cpp:602]:Input shape is 1 x 3 x 640 x 640
    [2021-09-01 19:39:19][info][trt_builder.cpp:603]:Set max batch size = 1
    [2021-09-01 19:39:19][info][trt_builder.cpp:604]:Set max workspace size = 1024.00 MB
    [2021-09-01 19:39:19][info][trt_builder.cpp:605]:Dynamic batch dimension is false
    [2021-09-01 19:39:19][info][trt_builder.cpp:608]:Network has 1 inputs:
    [2021-09-01 19:39:19][info][trt_builder.cpp:614]:      0.[images] shape is 1 x 3 x 640 x 640
    [2021-09-01 19:39:19][info][trt_builder.cpp:620]:Network has 1 outputs:
    [2021-09-01 19:39:19][info][trt_builder.cpp:625]:      0.[output] shape is 1 x 8400 x 85
    [2021-09-01 19:39:19][info][trt_builder.cpp:670]:Building engine...
    [2021-09-01 19:39:19][warn][trt_builder.cpp:33]:NVInfer WARNING: Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
    [2021-09-01 19:39:19][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
    [2021-09-01 19:39:19][warn][trt_builder.cpp:33]:NVInfer WARNING: Detected invalid timing cache, setup a local cache instead
    [2021-09-01 19:39:47][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
    [2021-09-01 19:39:47][info][trt_builder.cpp:690]:Build done 28282 ms !
    [2021-09-01 19:39:47][warn][trt_builder.cpp:33]:NVInfer WARNING: The logger passed into createInferRuntime differs from one already assigned, 0x557f9ae330b0, logger not updated.
    
    [2021-09-01 19:39:48][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
    [2021-09-01 19:39:48][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
    [2021-09-01 19:39:48][info][trt_infer.cpp:169]:Infer 0x7fe3f8015650 detail
    [2021-09-01 19:39:48][info][trt_infer.cpp:170]: Max Batch Size: 1
    [2021-09-01 19:39:48][info][trt_infer.cpp:171]: Dynamic Batch Dimension: false
    [2021-09-01 19:39:48][info][trt_infer.cpp:172]: Inputs: 1
    [2021-09-01 19:39:48][info][trt_infer.cpp:176]:         0.images : shape {1 x 3 x 640 x 640}
    [2021-09-01 19:39:48][info][trt_infer.cpp:179]: Outputs: 1
    [2021-09-01 19:39:48][info][trt_infer.cpp:183]:         0.output : shape {1 x 8400 x 85}
    [2021-09-01 19:39:48][info][app_yolo.cpp:94]:Save to YoloX_result/car.jpg, 2 object, 10.75 ms
    [2021-09-01 19:39:48][info][app_yolo.cpp:94]:Save to YoloX_result/group.jpg, 1 object, 6.38 ms
    [2021-09-01 19:39:48][info][app_yolo.cpp:94]:Save to YoloX_result/zand.jpg, 2 object, 9.12 ms
    [2021-09-01 19:39:48][info][app_yolo.cpp:94]:Save to YoloX_result/zgjr.jpg, 3 object, 6.10 ms
    [2021-09-01 19:39:48][info][app_yolo.cpp:94]:Save to YoloX_result/gril.jpg, 1 object, 6.07 ms
    [2021-09-01 19:39:48][info][app_yolo.cpp:94]:Save to YoloX_result/yq.jpg, 1 object, 6.01 ms
    [2021-09-01 19:39:48][info][yolo.cpp:214]:Engine destroy.
    [2021-09-01 19:39:48][info][app_yolo.cpp:190]:===================== test YoloX int8 ==================================
    [2021-09-01 19:39:48][info][trt_builder.cpp:473]:Compile INT8 Onnx Model 'yolox_s.onnx'.
    [2021-09-01 19:39:48][info][trt_builder.cpp:593]:Using image list[6 files]: inference
    [2021-09-01 19:39:48][info][trt_builder.cpp:602]:Input shape is 1 x 3 x 640 x 640
    [2021-09-01 19:39:48][info][trt_builder.cpp:603]:Set max batch size = 1
    [2021-09-01 19:39:48][info][trt_builder.cpp:604]:Set max workspace size = 1024.00 MB
    [2021-09-01 19:39:48][info][trt_builder.cpp:605]:Dynamic batch dimension is false
    [2021-09-01 19:39:48][info][trt_builder.cpp:608]:Network has 1 inputs:
    [2021-09-01 19:39:48][info][trt_builder.cpp:614]:      0.[images] shape is 1 x 3 x 640 x 640
    [2021-09-01 19:39:48][info][trt_builder.cpp:620]:Network has 1 outputs:
    [2021-09-01 19:39:48][info][trt_builder.cpp:625]:      0.[output] shape is 1 x 8400 x 85
    [2021-09-01 19:39:48][info][trt_builder.cpp:670]:Building engine...
    [2021-09-01 19:39:48][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
    [2021-09-01 19:39:49][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
    [2021-09-01 19:39:49][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
    [2021-09-01 19:39:49][info][app_yolo.cpp:193]:Int8 1 / 6
    [2021-09-01 19:39:49][info][app_yolo.cpp:193]:Int8 2 / 6
    [2021-09-01 19:39:49][info][app_yolo.cpp:193]:Int8 3 / 6
    [2021-09-01 19:39:49][info][app_yolo.cpp:193]:Int8 4 / 6
    [2021-09-01 19:39:49][info][app_yolo.cpp:193]:Int8 5 / 6
    [2021-09-01 19:39:50][info][app_yolo.cpp:193]:Int8 6 / 6
    [2021-09-01 19:40:27][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
    [2021-09-01 19:40:27][warn][trt_builder.cpp:33]:NVInfer WARNING: Detected invalid timing cache, setup a local cache instead
    [2021-09-01 19:40:59][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
    [2021-09-01 19:40:59][info][trt_builder.cpp:685]:No set entropyCalibratorFile, and entropyCalibrator will not save.
    [2021-09-01 19:40:59][info][trt_builder.cpp:690]:Build done 70917 ms !
    [2021-09-01 19:40:59][warn][trt_builder.cpp:33]:NVInfer WARNING: The logger passed into createInferRuntime differs from one already assigned, 0x557f9ae330b0, logger not updated.
    
    [2021-09-01 19:40:59][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
    [2021-09-01 19:40:59][warn][trt_builder.cpp:33]:NVInfer WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
    [2021-09-01 19:40:59][info][trt_infer.cpp:169]:Infer 0x7fe3b4000c40 detail
    [2021-09-01 19:40:59][info][trt_infer.cpp:170]: Max Batch Size: 1
    [2021-09-01 19:40:59][info][trt_infer.cpp:171]: Dynamic Batch Dimension: false
    [2021-09-01 19:40:59][info][trt_infer.cpp:172]: Inputs: 1
    [2021-09-01 19:40:59][info][trt_infer.cpp:176]:         0.images : shape {1 x 3 x 640 x 640}
    [2021-09-01 19:40:59][info][trt_infer.cpp:179]: Outputs: 1
    [2021-09-01 19:40:59][info][trt_infer.cpp:183]:         0.output : shape {1 x 8400 x 85}
    [2021-09-01 19:40:59][info][app_yolo.cpp:94]:Save to YoloX_result/car.jpg, 0 object, 7.75 ms
    [2021-09-01 19:40:59][info][app_yolo.cpp:94]:Save to YoloX_result/group.jpg, 0 object, 4.04 ms
    [2021-09-01 19:40:59][info][app_yolo.cpp:94]:Save to YoloX_result/zand.jpg, 0 object, 6.04 ms
    [2021-09-01 19:40:59][info][app_yolo.cpp:94]:Save to YoloX_result/zgjr.jpg, 0 object, 3.74 ms
    [2021-09-01 19:40:59][info][app_yolo.cpp:94]:Save to YoloX_result/gril.jpg, 0 object, 3.72 ms
    [2021-09-01 19:40:59][info][app_yolo.cpp:94]:Save to YoloX_result/yq.jpg, 0 object, 3.74 ms
    [2021-09-01 19:40:59][info][yolo.cpp:214]:Engine destroy.
    

    The time consumed in FP16 and FP32 modes is nearly identical; could that be caused by a problem in my code?

    Yours sincerely!

    opened by qixuxiang 14
  • Solving the AffineMatrix?

    A plain image resize should not need any offset, right?

    M = [ scale, 0,     -scale * from.width * 0.5 + to.width * 0.5,
          0,     scale, -scale * from.height * 0.5 + to.height * 0.5,
          0,     0,     1 ]

    The offset -scale * from.width * 0.5 + to.width * 0.5 cancels out. But isn't the first translation matrix

    P = [ 1, 0, -scale * from.width * 0.5,
          0, -1, scale * from.height * 0.5,
          0, 0, 1 ]

    and shouldn't the second translation matrix be

    T = [ 1, 0, to.width * 0.5,
          0, -1, to.height * 0.5,
          0, 0, 1 ]

    ? Reference for the matrices: https://www.cnblogs.com/xuanyuyt/p/7112876.html
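
    For reference, one standard derivation of M (added for illustration; not part of the original thread): compose "translate the source center to the origin", "uniform scaling", and "translate to the destination center":

    M = T(to.width * 0.5, to.height * 0.5) · S(scale) · T(-from.width * 0.5, -from.height * 0.5)

    Multiplying these out yields exactly the M above; no -1 entries appear, because a plain letterbox resize involves no vertical flip.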

    opened by cqray1990 8
  • Environment Configuration Issues!

    [Two screenshots attached.] I have never used VS Code before. I want to learn from the blogger's video, but there are always errors. Is there any solution? (I have installed TensorRT on Windows.)

    opened by Havehandssook 6
  • arcface feature extraction computes the same similarity for different faces; what is going on?

    Cosine distance code:

    def cosine_distance(matrix1, matrix2):
        matrix1_matrix2 = np.dot(matrix1, matrix2.transpose())
        matrix1_norm = np.sqrt(np.multiply(matrix1, matrix1).sum(axis=1))
        matrix1_norm = matrix1_norm[:, np.newaxis]
        matrix2_norm = np.sqrt(np.multiply(matrix2, matrix2).sum(axis=1))
        matrix2_norm = matrix2_norm[:, np.newaxis]
        cosine_distance = np.divide(matrix1_matrix2, np.dot(matrix1_norm, matrix2_norm.transpose()))
        return cosine_distance

    Could it be that the features are not normalized? Is there a good normalization function you would recommend?

    opened by seawater668 6
  • Error: arcface ONNX to TensorRT conversion fails

    Using the arcface_iresnet50.onnx model provided by this project, the error is as follows:
    While parsing node number 182 [BatchNormalization]:
    ERROR: /home/Project/tensorRT_Pro-main/src/tensorRT/onnx_parser/onnx2trt_utils.cpp:1523 In function scaleHelper:
    [8] Assertion failed: dims.nbDims == 4 || dims.nbDims == 5
    [2021-10-04 16:39:25][error][trt_builder.cpp:517]:Can not parse OnnX file: arcface_iresnet50_iii.onnx
    

    My environment is as follows:

    tensorrt 7.2
    cudnn8.1
    cuda 11.2
    protobufv3.11.4
    gpu 3080  arch86
    

    BatchNormalization_182 is the second-to-last layer of the model.

    opened by create-li 6
  • Threading question: preprocess, engine->forward, decode_kernel_invoker

    Hello teacher. preprocess runs inside commit, on the main thread, while decode_kernel_invoker runs in the work function, in the same thread as forward. Why was it designed this way? If images arrive with high concurrency, could the main thread get stuck inside commit? Why not run preprocess in the work thread, the same way decode is?

    I also wonder whether preprocess and decode could be split out and run in separate threads, with work doing only the forward pass. Would that be more efficient?

    opened by QZ1219 4
  • CMake build fails

    cmake configures and generates everything successfully, but make fails at about 85% with the error below (I did not modify the code in the file that reports the error):

    [ 83%] Building CXX object CMakeFiles/pro.dir/src/application/app_yolo_fast.cpp.o
    [ 83%] Building CXX object CMakeFiles/pro.dir/src/application/app_alphapose.cpp.o
    [ 85%] Building CXX object CMakeFiles/pro.dir/src/application/app_python/interface.cpp.o
    In file included from /xxx/datav/projects/tensorRT_Pro/src/application/app_python/interface.cpp:6:0:
    /xxx/datav/projects/tensorRT_Pro/src/application/tools/pybind11.hpp:159:20: fatal error: Python.h: No such file or directory
     #include <Python.h>
                        ^
    compilation terminated.
    CMakeFiles/pro.dir/build.make:1142: recipe for target 'CMakeFiles/pro.dir/src/application/app_python/interface.cpp.o' failed
    make[3]: *** [CMakeFiles/pro.dir/src/application/app_python/interface.cpp.o] Error 1
    make[3]: *** Waiting for unfinished jobs....
    CMakeFiles/Makefile2:328: recipe for target 'CMakeFiles/pro.dir/all' failed
    make[2]: *** [CMakeFiles/pro.dir/all] Error 2
    CMakeFiles/Makefile2:537: recipe for target 'CMakeFiles/yolo.dir/rule' failed
    make[1]: *** [CMakeFiles/yolo.dir/rule] Error 2
    Makefile:300: recipe for target 'yolo' failed
    make: *** [yolo] Error 2

    opened by sungh66 4
  • Problem building protobuf

    make reports the error:

    g++: error: google/protobuf/util/internal/.libs/proto_writer.o: No such file or directory
    Makefile:2372: recipe for target 'libprotobuf.la' failed
    make[2]: *** [libprotobuf.la] Error 1
    

    Do I need to download googletest first? If so, which version? (ubuntu18.04, Jetson Xavier NX)

    opened by DCC-lzhy 4
  • ][trt_builder.cpp:36]:NVInfer: TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.1

    The warnings and errors are shown above. It is annoying and confusing to sort out the compatibility requirements among the TensorRT version, CUDA, and the CUDA toolkit; I cannot figure out the difference among them. Any help would be appreciated!

    I have installed TensorRT:

    dpkg -l | grep tensorrt
    ii nv-tensorrt-repo-ubuntu1804-cuda10.0-trt5.0.2.6-ga-20181009 1-1 amd64 nv-tensorrt repository configuration files
    ii nv-tensorrt-repo-ubuntu1804-cuda10.1-trt5.1.5.0-ga-20190427 1-1 amd64 nv-tensorrt repository configuration files
    ii nv-tensorrt-repo-ubuntu1804-cuda11.4-trt8.2.1.8-ga-20211117 1-1 amd64 nv-tensorrt repository configuration files
    ii tensorrt 8.2.1.8-1+cuda11.4 amd64

    As far as I can tell, that should be TensorRT 8.2.1.8 with cuda11.4.

    And after typing nvcc -V, the reported CUDA version is:

    NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2020 NVIDIA Corporation
    Built on Mon_Nov_30_19:08:53_PST_2020
    Cuda compilation tools, release 11.2, V11.2.67
    Build cuda_11.2.r11.2/compiler.29373293_0

    So which version of the CUDA toolkit should I install? Currently my CUDA toolkit is v11.5.0, I think.

    opened by sainttelant 4
  • The yolox master-branch model exports to ONNX fine, but engine generation fails

    [2021-10-26 14:29:01][info][app_yolo.cpp:121]:===================== test YoloX FP32 yolox_s ==================================
    [2021-10-26 14:29:01][info][trt_builder.cpp:471]:Compile FP32 Onnx Model 'yolox_s.onnx'.
    [2021-10-26 14:29:02][error][trt_builder.cpp:30]:NVInfer: Slice_562: slice size must be positive, size = [0]   (repeated 8 times)
    [2021-10-26 14:29:02][error][trt_builder.cpp:30]:NVInfer: INVALID_ARGUMENT: getPluginCreator could not find plugin ScatterND version 1
    While parsing node number 565 [ScatterND]:
    ERROR: /home/work/tracking/tensorRT_Pro/src/tensorRT/onnx_parser/builtin_op_importers.cpp:4013 In function importFallbackPluginImporter:
    [8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
    [2021-10-26 14:29:02][error][trt_builder.cpp:517]:Can not parse OnnX file: yolox_s.onnx
    [2021-10-26 14:29:02][error][yolo.cpp:138]:Engine yolox_s.FP32.trtmodel load failed
    [2021-10-26 14:29:02][error][app_yolo.cpp:42]:Engine is nullptr
    [100%] Built target yolo

    opened by deep-practice 4
  • Why does the input/output memory allocation add size_matrix (the size of the affine matrix), and why does the copy into affine_matrix_device use sizeof(job.additional.d2i)?

        uint8_t* gpu_workspace        = (uint8_t*)workspace->gpu(size_matrix + size_image);
        float*   affine_matrix_device = (float*)gpu_workspace;
        uint8_t* image_device         = size_matrix + gpu_workspace;

        uint8_t* cpu_workspace        = (uint8_t*)workspace->cpu(size_matrix + size_image);
        float*   affine_matrix_host   = (float*)cpu_workspace;
        uint8_t* image_host           = size_matrix + cpu_workspace;

        memcpy(image_host, image.data, size_image);
        memcpy(affine_matrix_host, job.additional.d2i, sizeof(job.additional.d2i));
        checkCudaRuntime(cudaMemcpyAsync(image_device, image_host, size_image, cudaMemcpyHostToDevice, preprocess_stream));
        checkCudaRuntime(cudaMemcpyAsync(affine_matrix_device, affine_matrix_host, sizeof(job.additional.d2i), cudaMemcpyHostToDevice, preprocess_stream));
    opened by cqray1990 3
  • Is ARM supported?

    CUDA Runtime error cudaSetDevice(device_id) # initialization error, code = cudaErrorInitializationError [ 3 ] in file /src/tensorRT/infer/trt_infer.cpp:472. Does the code support ARM? This happens on a Jetson NX after the project compiles successfully; converting the engine reports the error above. Could you take a look?

    opened by Waynepoo 0
  • RTSP stream decoding error

    The errors are below; do they affect decoding?

    [h264 @ 0x7f918c007980] concealing 1466 DC, 1466 AC, 1466 MV errors in I frame
    [rtsp @ 0x7f918c0021c0] RTP Xiph packet settings (0,1,0) is not implemented. Update your FFmpeg version to the newest one from Git. If the problem still occurs, it means that your file has a feature which has not been implemented.
    Decode Error occurred for picture 0
    
    opened by ZJU-lishuang 0
  • Error converting ONNX to TensorRT

    Hi,

    [2022-12-08 17:36:09][error][trt_builder.cpp:30]:NVInfer: 9: [graphShapeAnalyzer.cpp::nvinfer1::builder::`anonymous-namespace'::ShapeNodeRemover::throwIfError::1306] Error Code 9: Internal Error (Reshape_1087: reshape changes volume )   (repeated 9 times)
    [2022-12-08 17:36:10][error][trt_builder.cpp:30]:NVInfer: g:\chengc\project\tensorrt-onnx-fasterrcnn-fpn-roialign-master\tensorrt_code\src\tensorrt\onnx_parser\modelimporter.cpp:736: While parsing node number 364 [Add -> "1963"]:
    [2022-12-08 17:36:10][error][trt_builder.cpp:30]:NVInfer: g:\chengc\project\tensorrt-onnx-fasterrcnn-fpn-roialign-master\tensorrt_code\src\tensorrt\onnx_parser\modelimporter.cpp:737: --- Begin node ---
    [2022-12-08 17:36:10][error][trt_builder.cpp:30]:NVInfer: g:\chengc\project\tensorrt-onnx-fasterrcnn-fpn-roialign-master\tensorrt_code\src\tensorrt\onnx_parser\modelimporter.cpp:738: input: "1935" input: "1940" output: "1963" name: "Add_1095" op_type: "Add"
    [2022-12-08 17:36:10][error][trt_builder.cpp:30]:NVInfer: g:\chengc\project\tensorrt-onnx-fasterrcnn-fpn-roialign-master\tensorrt_code\src\tensorrt\onnx_parser\modelimporter.cpp:739: --- End node ---
    [2022-12-08 17:36:10][error][trt_builder.cpp:30]:NVInfer: g:\chengc\project\tensorrt-onnx-fasterrcnn-fpn-roialign-master\tensorrt_code\src\tensorrt\onnx_parser\modelimporter.cpp:742: ERROR: g:\chengc\project\tensorrt-onnx-fasterrcnn-fpn-roialign-master\tensorrt_code\src\tensorrt\onnx_parser\onnx2trt_utils.cpp:959 In function elementwiseHelper:
    [8] Assertion failed: tensor_ptr->getDimensions().nbDims == maxNbDims && "Failed to broadcast tensors elementwise!"

    opened by Chengcheng1998727 1