Overview

Solov2-TensorRT-CPP

In this repo, we deploy SOLOv2 to TensorRT with C++.
See the video: https://www.bilibili.com/video/BV1rQ4y1m7mx

Requirements

  • Ubuntu 16.04/18.04/20.04
  • CUDA 10.2
  • cuDNN 8
  • TensorRT 8.0.1
  • OpenCV 3.4
  • LibTorch 1.8.2
  • CMake 3.20

Acknowledgements

SOLO SOLOv2.tensorRT

Getting Started

1. Install SOLOv2 from SOLO

Download the SOLO repository and make sure it runs successfully.

2. Export the ONNX model from the original model

You can follow SOLOv2.tensorRT.

That is, before exporting, you have to modify some parts of the original SOLOv2:

2.1 Modify SOLO-master/mmdet/models/anchor_heads/solov2_head.py (around line 154):

# Modified for ONNX export: freeze the input size (800x800) and the batch size (1)
size = {0: 100, 1: 100, 2: 50, 3: 25, 4: 25}
feat_h, feat_w = ins_kernel_feat.shape[-2], ins_kernel_feat.shape[-1]
# During tracing these may be tensors; convert them to plain Python ints
feat_h = int(feat_h.cpu().numpy()) if isinstance(feat_h, torch.Tensor) else int(feat_h)
feat_w = int(feat_w.cpu().numpy()) if isinstance(feat_w, torch.Tensor) else int(feat_w)
x_range = torch.linspace(-1, 1, feat_w, device=ins_kernel_feat.device)
y_range = torch.linspace(-1, 1, feat_h, device=ins_kernel_feat.device)
y, x = torch.meshgrid(y_range, x_range)
y = y.expand([1, 1, -1, -1])
x = x.expand([1, 1, -1, -1])

# Original code from SOLO
# x_range = torch.linspace(-1, 1, ins_feat.shape[-1], device=ins_feat.device)
# y_range = torch.linspace(-1, 1, ins_feat.shape[-2], device=ins_feat.device)
# y, x = torch.meshgrid(y_range, x_range)
# y = y.expand([ins_feat.shape[0], 1, -1, -1])
# x = x.expand([ins_feat.shape[0], 1, -1, -1])

2.2 In single_stage_ins.py, extend forward_dummy() so it also runs the mask branch, for example:

def forward_dummy(self, img):
    x = self.extract_feat(img)
    outs = self.bbox_head(x)
    if self.with_mask_feat_head:
        mask_feat_pred = self.mask_feat_head(
            x[self.mask_feat_head.start_level:self.mask_feat_head.end_level + 1])
        outs = (outs[0], outs[1], mask_feat_pred)
    return outs

2.3 Export the ONNX model. Move onnx_exporter.py to SOLO/demo/, then run:

# KITTI size
python onnx_exporter.py ../configs/solov2/solov2_light_448_r34_fpn_8gpu_3x.py ../weights/SOLOv2_light_R34.onnx --checkpoint ../checkpoints/SOLOv2_LIGHT_448_R34_3x.pth --shape 384 1152
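
Before building the engine, it can help to sanity-check the exported model with onnxruntime. This is a minimal sketch of my own, not part of this repo; it assumes onnxruntime is installed and reuses the output path and the 384x1152 shape from the command above:

# Quick ONNX sanity check (illustrative; assumes `pip install onnxruntime numpy`)
import numpy as np
import onnxruntime as rt

sess = rt.InferenceSession("../weights/SOLOv2_light_R34.onnx")

# Shape (1, 3, 384, 1152) matches `--shape 384 1152` above
inp = sess.get_inputs()[0]
print("input:", inp.name, inp.shape)
for out in sess.get_outputs():
    print("output:", out.name, out.shape)

dummy = np.random.rand(1, 3, 384, 1152).astype(np.float32)
results = sess.run(None, {inp.name: dummy})
print("ran OK with", len(results), "output tensors")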

3. Build the TensorRT model

First, edit the config file config.yaml:

image_width: 1226
image_height: 370


# the path of the exported ONNX model
onnx_path: "xxxx/SOLOv2_light_R34.onnx"

# save the serialized TensorRT model to:
serialize_path: "xxx/tensorrt_model_1152x384.bin"

# solo parameters
SOLO_NMS_PRE: 500
SOLO_MAX_PER_IMG: 100
SOLO_NMS_KERNEL: "gaussian"
SOLO_NMS_SIGMA: 2.0
SOLO_SCORE_THR: 0.1
SOLO_MASK_THR: 0.5
SOLO_UPDATE_THR: 0.2

segmentor_log_path: "xxx/log/segmentor_log.txt"
segmentor_log_level: "debug"
segmentor_log_flush: "debug"

# test img dir
DATASET_DIR: "xxx/kitti/odometry/colors/07/image_2/"
WARN_UP_IMAGE_PATH: "xxx/kitti.png"
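
Since the binaries read this file at startup, checking the paths beforehand can save a failed run. A minimal sketch of my own, not part of this repo; it assumes PyYAML is installed and uses the keys shown above:

# Validate the paths in config.yaml before running the binaries (illustrative)
import os
import yaml

with open("./config/config.yaml") as f:
    cfg = yaml.safe_load(f)

# serialize_path will report MISSING until build_model has been run once
for key in ("onnx_path", "serialize_path", "DATASET_DIR", "WARN_UP_IMAGE_PATH"):
    path = str(cfg.get(key, ""))
    print(key, "->", path, "OK" if os.path.exists(path) else "MISSING")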

Then compile the CMake project:

mkdir build && cd build

cmake ..

make -j10

Finally, build the TensorRT model:

cd ..
./build/build_model ./config/config.yaml
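
build_model drives the TensorRT C++ API; for reference, the equivalent ONNX-to-engine conversion with the TensorRT Python API looks roughly like this. This is a sketch for TensorRT 8.x, not the repo's code; the file names follow the config above:

# Illustrative ONNX -> TensorRT conversion with the Python API (TensorRT 8.x)
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("SOLOv2_light_R34.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse the ONNX model")

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30  # 1 GiB; raise it if tactics lack workspace
engine_bytes = builder.build_serialized_network(network, config)

with open("tensorrt_model_1152x384.bin", "wb") as f:
    f.write(engine_bytes)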

4. Run the demo

If you have the KITTI dataset, set DATASET_DIR in config.yaml to the right path and run:

./build/InstanceSegment ./config/config.yaml

If you don't have it and just want to run on a single image, set WARN_UP_IMAGE_PATH in config.yaml to the right image path, then run:

./build/demo ./config/config.yaml
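
If the demo aborts with the bindings[x] != nullptr error reported in the comments below, a binding never received a device buffer; listing the engine's bindings helps pin down which one. A hedged sketch with the TensorRT 8.x Python API (my own illustration, not repo code):

# List the engine bindings that must all be backed by device memory (illustrative)
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
with open("tensorrt_model_1152x384.bin", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

# Every binding printed here needs a non-null device pointer before enqueue()
for i in range(engine.num_bindings):
    kind = "input" if engine.binding_is_input(i) else "output"
    print(i, kind, engine.get_binding_name(i), engine.get_binding_shape(i))
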
Comments
  • ONNX conversion fails

    I found that after converting to ONNX there are 11 outputs, not the three in the code. Also, converting with shape 768 1344 fails immediately: Traceback (most recent call last): File "onnx_exporter.py", line 231, in check(args, dummy_input, check_onnx=True, check_trt=False) File "onnx_exporter.py", line 126, in check sess = rt.InferenceSession(args.out) File "/home/tao/anaconda3/lib/python3.8/site-packages/onnxruntime/capi/session.py", line 195, in init self._create_inference_session(providers, provider_options) File "/home/tao/anaconda3/lib/python3.8/site-packages/onnxruntime/capi/session.py", line 205, in _create_inference_session sess.initialize_session(providers or [], provider_options or []) onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (Concat_637) Op (Concat) [ShapeInferenceError] Can't merge shape info. Both source and target dimension have values but they differ. Source=49 Target=48 Dimension=2

    This happens after I added the code you describe to single_stage_ins; without it there is no error, but then there are only 10 outputs. Any advice?

    opened by 1124676457 2
  • compile error!

    hi, professor: when I compile build_model, I get the error: Solov2-TensorRT-CPP/InstanceSegment/common.h:388:9: error: ‘virtual nvinfer1::IBuilder::~IBuilder()’ is protected within this context delete obj; ^~~~~~ Why? My environment: NVIDIA Jetson TX2 with JetPack 4.5.1, TensorRT 7.1.3. Please help!

    opened by jcyhcs 1
  • Tracking issue

    Thanks for sharing the code! In the motion-parameter filtering part of the tracking module, the Kalman filter model differs from the original paper: https://github.com/chenjianqu/Tracking-Solov2-Deepsort/blob/master/InstanceTracking/KalmanTracker.cpp When computing the matching metric (target similarity and geometric distance), the metric is sometimes negative.

    The same implementation appears here: https://github.com/weixu000/libtorch-yolov3-deepsort

    Compared with the original DeepSort-Python, the tracking results do not align with the original algorithm.

    opened by FishHe 1
  • compile error

    Hello teacher, when I compile your code, I always get this error (C++ 14/17): /home/hermione/library/libtorch/include/ATen/TensorIterator.h:200:3: error: reference to 'DeviceType' is ambiguous DeviceType device_type(int arg=0) const { return device(arg).type(); }. Do you know the reason for this? I am using libtorch 1.8.2 + CUDA 10.2.

    opened by wafaer 0
  • Error when exporting the ONNX file

    Hello, when executing this step from your instructions:

    python onnx_exporter.py ../configs/solov2/solov2_light_448_r34_fpn_8gpu_3x.py ../weights/SOLOv2_light_R34.onnx --checkpoint ../checkpoints/SOLOv2_LIGHT_448_R34_3x.pth --shape 384 1152

    I get the error below. Do you know the reason? RuntimeError: Given groups=1, weight of size [256, 258, 3, 3], expected input[1, 260, 40, 40] to have 258 channels, but got 260 channels instead

    Later I switched the weight file to SOLOv2_LIGHT_512_DCN_R50_3x; your command then ran without errors, but no corresponding .onnx output file was produced. Why is that?

    opened by Hbelief1998 0
  • Parameter check failed at: runtime/api/executionContext.cpp::enqueueInternal::322, condition: bindings[x] != nullptr

    When I run "./build/segment ./config/config.yaml", I get the error "[E] [TRT] 3: [executionContext.cpp::enqueueInternal::322] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::enqueueInternal::322, condition: bindings[x] != nullptr)". What might be the reason? Did I get the right ONNX and tensorrt_model_bin? This is my output from running "build_model" and "demo":

    ./build/build_model ./config/config.yaml

    ~/Solov2-TensorRT-CPP/cmake-build-debug/build_model ./config/config.yaml config_file:./config/config.yaml createInferBuilder [05/25/2022-22:57:19] [I] [TRT] [MemUsageChange] Init CUDA: CPU +299, GPU +0, now: CPU 301, GPU 309 (MiB) createNetwork createBuilderConfig createParser parseFromFile:~/Solov2-TensorRT-CPP/ONNX/SOLOv2_light_R34.onnx [05/25/2022-22:57:19] [I] [TRT] ---------------------------------------------------------------- [05/25/2022-22:57:19] [I] [TRT] Input filename: ~/Solov2-TensorRT-CPP/ONNX/SOLOv2_light_R34.onnx [05/25/2022-22:57:19] [I] [TRT] ONNX IR version: 0.0.4 [05/25/2022-22:57:19] [I] [TRT] Opset version: 11 [05/25/2022-22:57:19] [I] [TRT] Producer name: pytorch [05/25/2022-22:57:19] [I] [TRT] Producer version: 1.3 [05/25/2022-22:57:19] [I] [TRT] Domain:
    [05/25/2022-22:57:19] [I] [TRT] Model version: 0 [05/25/2022-22:57:19] [I] [TRT] Doc string:
    [05/25/2022-22:57:19] [I] [TRT] ---------------------------------------------------------------- [05/25/2022-22:57:19] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32. input shape:input (1, 3, 384, 1152) output shape:cate_pred (3872, 80) enableDLA buildEngineWithConfig [05/25/2022-22:57:20] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 664 MiB, GPU 671 MiB [05/25/2022-22:57:21] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +70, GPU +68, now: CPU 822, GPU 1012 (MiB) [05/25/2022-22:57:21] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 822, GPU 1022 (MiB) [05/25/2022-22:57:21] [W] [TRT] Detected invalid timing cache, setup a local cache instead [05/25/2022-22:57:24] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output. [05/25/2022-22:58:39] [I] [TRT] Detected 1 inputs and 13 output network tensors. [05/25/2022-22:58:39] [I] [TRT] Total Host Persistent Memory: 274640 [05/25/2022-22:58:39] [I] [TRT] Total Device Persistent Memory: 83921920 [05/25/2022-22:58:39] [I] [TRT] Total Scratch Memory: 0 [05/25/2022-22:58:39] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 158 MiB, GPU 675 MiB [05/25/2022-22:58:39] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1298, GPU 1635 (MiB) [05/25/2022-22:58:39] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1298, GPU 1643 (MiB) [05/25/2022-22:58:39] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1298, GPU 1627 (MiB) [05/25/2022-22:58:39] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1297, GPU 1611 (MiB) [05/25/2022-22:58:39] [I] [TRT] [MemUsageSnapshot] Builder end: CPU 1210 MiB, GPU 1381 MiB serializeModel done

    Process finished with exit code 0


    ./build/demo ./config/config.yaml

    ~/Solov2-TensorRT-CPP/cmake-build-debug/segment ./config/config.yaml config_file:./config/config.yaml [05/25/2022-23:35:10] [I] [TRT] [MemUsageChange] Init CUDA: CPU +298, GPU +0, now: CPU 411, GPU 309 (MiB) [05/25/2022-23:35:11] [I] [TRT] Loaded engine size: 81 MB [05/25/2022-23:35:11] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 411 MiB, GPU 309 MiB [05/25/2022-23:35:22] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +2140, GPU +980, now: CPU 2804, GPU 1731 (MiB) [05/25/2022-23:35:22] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 2804, GPU 1741 (MiB) [05/25/2022-23:35:22] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2804, GPU 1725 (MiB) [05/25/2022-23:35:22] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 2804 MiB, GPU 1725 MiB [05/25/2022-23:35:22] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 2804 MiB, GPU 1725 MiB [05/25/2022-23:35:22] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2804, GPU 1733 (MiB) [05/25/2022-23:35:22] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2804, GPU 1741 (MiB) [05/25/2022-23:35:22] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 2811 MiB, GPU 2166 MiB [05/25/2022-23:35:23] [E] [TRT] 3: [executionContext.cpp::enqueueInternal::322] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::enqueueInternal::322, condition: bindings[x] != nullptr ) terminate called after throwing an instance of 'c10::Error' what(): CUDA error: invalid argument Exception raised from getDeviceFromPtr at ../aten/src/ATen/cuda/CUDADevice.h:13 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x69 (0x7f30d25c1b29 in ~/NVIDIA/libtorch/lib/libc10.so) frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0xd2 (0x7f30d25beab2 in /home/cqyd/NVIDIA/libtorch/lib/libc10.so) frame #2: + 0x36d1ea7 (0x7f306f824ea7 in ~/NVIDIA/libtorch/lib/libtorch_cuda.so) frame #3: + 0x7c87c (0x559bd949387c in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment) frame #4: + 0x7cdf1 (0x559bd9493df1 in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment) frame #5: + 0x7d2a7 (0x559bd94942a7 in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment) frame #6: + 0x7dff8 (0x559bd9494ff8 in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment) frame #7: + 0x7e0b2 (0x559bd94950b2 in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment) frame #8: + 0x79af5 (0x559bd9490af5 in ~/SOLOV2model/Solov2-TensorRT-CPP/cmake-build-debug/segment) frame #9: + 0x84b4d (0x559bd949bb4d in ~/SOLOV2model/Solov2-TensorRT-CPP/cmake-build-debug/segment) frame #10: + 0x84827 (0x559bd949b827 in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment) frame #11: + 0x16f77 (0x559bd942df77 in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment) frame #12: __libc_start_main + 0xf3 (0x7f306b82c083 in /lib/x86_64-linux-gnu/libc.so.6) frame #13: + 0x1606e (0x559bd942d06e in ~/Solov2-TensorRT-CPP/cmake-build-debug/segment)

    Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

    opened by mmksir 1
  • not use libtorch for deployment solov2 tensorrt?

    hi, professor: is it possible to deploy solov2 without libtorch? Just use the tensorrt deserialize api and write some postprocess code? It would have fewer dependencies, and installing libtorch on jetson isn't friendly! Your build_model already creates a raw tensorrt engine, so the demo could just read it as a file, create a tensorrt context, and then deploy. Please help!

    opened by jcyhcs 1