Official PyTorch Code of GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection (CVPR 2021)

Overview

GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection, CVPR 2021

Abhinav Kumar, Garrick Brazil, Xiaoming Liu

[project], [supp], [slides], [1min_talk], [demo], [arxiv]

This code builds on Kinematic-3D, so the setup and organization are very similar. A few of the implementations, such as classical NMS, are based on Caffe.
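
At a high level, classical NMS prunes overlapping boxes with a hard, non-differentiable rule, so the detector is never trained on the scores that survive NMS; GrooMeD-NMS makes this step differentiable so that a loss on the post-NMS scores can train the network end-to-end. The sketch below is only a conceptual illustration of a smooth suppression (a Soft-NMS-style Gaussian decay), not the paper's grouped matrix formulation; see the paper for the exact operator.

# Conceptual illustration ONLY: a smooth suppression (Soft-NMS-style
# Gaussian decay), not the paper's exact grouped matrix formulation.
# Because every operation is differentiable, a loss on the rescored
# values propagates gradients back to the raw box scores.
import torch

def soft_rescore(scores, ious, sigma=0.5):
    """scores: (N,) sorted in descending order; ious: (N, N) pairwise IoUs."""
    out = [scores[0]]
    for i in range(1, scores.numel()):
        # Decay box i by its overlap with every higher-ranked box.
        decay = torch.exp(-(ious[:i, i] ** 2) / sigma).prod()
        out.append(scores[i] * decay)
    return torch.stack(out)

scores = torch.tensor([0.9, 0.8, 0.3], requires_grad=True)
ious = torch.tensor([[1.0, 0.7, 0.0],
                     [0.7, 1.0, 0.1],
                     [0.0, 0.1, 1.0]])
rescored = soft_rescore(scores, ious)
rescored.sum().backward()   # gradients reach the raw scores
print(rescored, scores.grad)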

References

Please cite the following paper if you find this repository useful:

@inproceedings{kumar2021groomed,
  title={{GrooMeD-NMS}: Grouped Mathematically Differentiable NMS for Monocular {$3$D} Object Detection},
  author={Kumar, Abhinav and Brazil, Garrick and Liu, Xiaoming},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}

Setup

  • Requirements

    1. Python 3.6
    2. PyTorch 0.4.1
    3. torchvision 0.2.1
    4. CUDA 8.0
    5. Ubuntu 18.04/Debian 8.9

    This code has been tested on an NVIDIA GTX 1080 Ti GPU. Other platforms have not been tested. Unless otherwise stated, the scripts and instructions below assume the working directory is the project root.

    Clone the repo first:

    git clone https://github.com/abhi1kumar/groomed_nms.git
  • Cuda & Python

    Install some basic packages:

    sudo apt-get install libopenblas-dev libboost-dev libboost-all-dev git
    sudo apt install gfortran
    
    # We need to compile with an older version of gcc and g++
    sudo apt install gcc-5 g++-5
    sudo ln -sf /usr/bin/gcc-5 /usr/local/cuda-8.0/bin/gcc
    sudo ln -sf /usr/bin/g++-5 /usr/local/cuda-8.0/bin/g++

    Next, install conda and then install the required packages:

    wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh
    bash Anaconda3-2020.02-Linux-x86_64.sh
    source ~/.bashrc
    conda list
    conda create --name py36 --file dependencies/conda.txt
    conda activate py36
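
    To confirm the toolchain matches the pinned versions before building anything, a quick sanity check inside the activated environment:

    # Quick sanity check (run inside the activated py36 environment).
    import torch
    import torchvision

    print(torch.__version__)          # expected: 0.4.1
    print(torchvision.__version__)    # expected: 0.2.1
    print(torch.version.cuda)         # expected: 8.0.x
    print(torch.cuda.is_available())  # True once the GPU driver is set up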
  • KITTI Data

    Download the left color images, camera calibration matrices, and training labels of the full KITTI 3D Object detection dataset.

    Then place a soft-link (or the actual data) in data/kitti:

     ln -s /path/to/kitti data/kitti

    The directory structure should look like this:

    ./groomed_nms
    |--- cuda_env
    |--- data
    |      |---kitti
    |            |---training
    |            |        |---calib
    |            |        |---image_2
    |            |        |---label_2
    |            |
    |            |---testing
    |                     |---calib
    |                     |---image_2
    |
    |--- dependencies
    |--- lib
    |--- models
    |--- scripts

    Then, use the following scripts to extract the data splits, which use soft-links to the above directory for efficient storage:

    python data/kitti_split1/setup_split.py
    python data/kitti_split2/setup_split.py
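
    A small check (a hypothetical helper, not part of the repository) can verify that the layout matches the tree above:

    # Hypothetical helper (not part of the repository) to verify the
    # KITTI layout matches the directory tree above.
    from pathlib import Path

    root = Path("data/kitti")
    for sub in ["training/calib", "training/image_2", "training/label_2",
                "testing/calib", "testing/image_2"]:
        print(sub, "ok" if (root / sub).is_dir() else "MISSING")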

    Next, build the KITTI devkit eval:

     sh data/kitti_split1/devkit/cpp/build.sh
  • Classical NMS

    Lastly, build the classical NMS modules:

    cd lib/nms
    make
    cd ../..
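
    For reference, the compiled module implements classical greedy NMS. A pure-PyTorch equivalent is sketched below; it is illustrative only and much slower than the compiled version:

    # Illustrative pure-PyTorch greedy NMS; the compiled lib/nms module
    # computes the same thing much faster.
    import torch

    def greedy_nms(boxes, scores, iou_thresh=0.4):
        """boxes: (N, 4) as (x1, y1, x2, y2); scores: (N,). Returns kept indices."""
        _, order = scores.sort(descending=True)
        keep = []
        while order.numel() > 0:
            i = order[0].item()
            keep.append(i)
            if order.numel() == 1:
                break
            top, rest = boxes[i], boxes[order[1:]]
            # Intersection of the top box with all remaining boxes
            lt = torch.max(top[:2], rest[:, :2])
            rb = torch.min(top[2:], rest[:, 2:])
            inter = (rb - lt).clamp(min=0).prod(dim=1)
            area_top = (top[2:] - top[:2]).prod()
            area_rest = (rest[:, 2:] - rest[:, :2]).prod(dim=1)
            iou = inter / (area_top + area_rest - inter)
            # Drop boxes that overlap the kept box too much
            order = order[1:][iou <= iou_thresh]
        return keep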

Training

Training is carried out in two stages: a warmup stage followed by full training. Review the configurations in scripts/config for details.

chmod +x scripts_training.sh
./scripts_training.sh

If training is accidentally stopped, you can resume from a checkpointed snapshot with the restore flag. For example, to resume training from iteration 10k, use the following command:

source dependencies/cuda_8.0_env
CUDA_VISIBLE_DEVICES=0 python -u scripts/train_rpn_3d.py --config=groumd_nms --restore=10000
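
For reference, the restore mechanism follows the usual PyTorch checkpoint pattern. The sketch below is generic; the paths and dictionary keys are hypothetical, not the repository's actual API:

# Generic PyTorch checkpoint-resume pattern (paths/keys hypothetical,
# not the repository's actual API).
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for the detector
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Save a snapshot at some iteration...
torch.save({"model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "iteration": 10000}, "snapshot_10000.pth")

# ...and resume from it later.
ckpt = torch.load("snapshot_10000.pth")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
start_iter = ckpt["iteration"] + 1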

Testing

We provide logs, models, and predictions for the main experiments on the KITTI Val 1, Val 2, and Test data splits, available for download here.

Make an output folder in the project directory:

mkdir output

Place different models in the output folder as follows:

./groomed_nms
|--- output
|      |---groumd_nms
|      |
|      |---groumd_nms_split2
|      |
|      |---groumd_nms_full_train_2
|
| ...

To test, run the evaluation script:

chmod +x scripts_evaluation.sh
./scripts_evaluation.sh
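
The evaluation consumes predictions written in the standard KITTI label format, one detection per line with 16 whitespace-separated fields. If you want to inspect the prediction files directly, a small reader might look like this (a hypothetical helper, not part of the repository):

# Hypothetical helper (not part of the repository) to read
# KITTI-format prediction files: each line is
# type trunc occ alpha x1 y1 x2 y2 h w l x y z rot_y score
def read_kitti_predictions(path):
    detections = []
    with open(path) as f:
        for line in f:
            p = line.split()
            detections.append({
                "type": p[0],
                "bbox_2d": [float(v) for v in p[4:8]],
                "dims_hwl": [float(v) for v in p[8:11]],
                "location": [float(v) for v in p[11:14]],
                "rot_y": float(p[14]),
                "score": float(p[15]),
            })
    return detections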

Contact

For questions, feel free to post an issue here or drop an email to [email protected].
