A modern object detector inside fragment shaders

YOLOv4 Tiny in UnityCG/HLSL

Video Demo: https://twitter.com/SCRNinVR/status/1380238589238206465?s=20

Overview

YOLOv4 Tiny is currently one of the fastest object detectors available. The goal of this project is to recreate it from scratch, without any existing ML libraries such as Darknet, PyTorch, or TensorFlow, so that it can run inside the VR game VRChat.

My naive implementation runs at only around 30 FPS so that it doesn't hog GPU resources in VR. It is nowhere near as performant as the original.

This implementation is based on the TensorFlow version from https://github.com/hunglc007/tensorflow-yolov4-tflite

NOTE: This was built and tested with Unity 2018.4.20f1; there may be shader compatibility issues with other versions.

Setup

  1. Download the package from the Releases page
  2. Import it into your project
  3. Open the scene in the Scenes folder
  4. Done, there are no dependencies
  5. Enter Play Mode to run the network

Avatars

  1. Look in the Prefabs folder
  2. Drop the prefab onto your avatar

Code

Important Shader Properties

yolov4tiny.shader

  1. Frame Delay - The number of frames each layer waits before updating. The default is 3; the lower the value, the more GPU intensive the network becomes (see the sketch below).
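
To illustrate the idea, here is a minimal sketch of per-layer frame gating with a Unity custom render texture. This is not the project's actual code: the _FrameDelay property, the reserved counter pixel, and the layer placeholder are assumptions for the example.

#include "UnityCustomRenderTexture.cginc"

float _FrameDelay; // default 3; lower values update layers more often

float4 frag(v2f_customrendertexture i) : SV_Target
{
    float2 texel = 1.0 / _CustomRenderTextureInfo.xy;

    // Pixel (0,0) is reserved as a counter that wraps around _FrameDelay
    float counter = tex2D(_SelfTexture2D, 0.5 * texel).r;
    if (all(i.globalTexcoord.xy < texel))
        return fmod(counter + 1.0, max(_FrameDelay, 1.0)).xxxx;

    // On non-update frames, carry last frame's result forward unchanged
    if (counter >= 1.0)
        return tex2D(_SelfTexture2D, i.globalTexcoord.xy);

    // Update frame: this is where the layer's real work (convolutions, etc.) would run
    return float4(0, 0, 0, 1); // placeholder for the layer computation
}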

nms.shader

  1. Confidence Threshold - The cutoff below which bounding boxes are culled. The default is 0.5; lowering it produces more boxes but also a higher error rate (see the sketch below).
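
For context, the culling works like standard greedy non-maximum suppression: a box survives only if its confidence clears the threshold and no higher-confidence overlapping box beats it. A minimal sketch of the two tests, assuming an intersection-over-union overlap measure (the names iou and cullBox are illustrative, not the shader's actual identifiers):

float _ConfidenceThreshold; // default 0.5

// Intersection-over-union of two boxes given as (center x, center y, width, height)
float iou(float4 a, float4 b)
{
    float2 lo = max(a.xy - a.zw * 0.5, b.xy - b.zw * 0.5);
    float2 hi = min(a.xy + a.zw * 0.5, b.xy + b.zw * 0.5);
    float2 d = max(hi - lo, 0.0);
    float inter = d.x * d.y;
    return inter / (a.z * a.w + b.z * b.w - inter);
}

// A candidate is culled if it falls below the threshold, or if a
// higher-confidence box overlaps it too strongly (IoU above 0.5 here)
bool cullBox(float conf, float4 box, float bestConf, float4 bestBox)
{
    return conf < _ConfidenceThreshold
        || (bestConf > conf && iou(box, bestBox) > 0.5);
}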

Reading the Output

The basic setup is: yolov4tiny.shader -> nms.shader -> output.shader. To read the bounding box information, loop through the output of nms.shader.

The following is a basic outline of how output.shader works; refer to the file for more information.

  1. Set up the input and feed in nms_buffer.renderTexture:
Properties
{
    _NMSout ("NMS Output", 2D) = "black" {}
}
  2. Import the functions:
#include "nms_include.cginc"
  3. Loop through the texture:
const float2 scale = 0.5.xx;

uint i;
uint j;

// Loop through the 26x26 grid output
for (i = 0; i < 26; i++) {
    for (j = 0; j < 26; j++) {
        // Only draw a box if the confidence is over 50%
        // txL20nms: offset of this grid's data in the NMS output, defined in nms_include.cginc
        uint4 buff = asuint(_NMSout[txL20nms.xy + uint2(i, j)]);
        float conf = f16tof32(buff.a);
        [branch]
        if (conf > 0.5) {
            // Class, 0 to 79
            float c = f16tof32(buff.b >> 16);
            // x, y is the center position of the bbox relative to 416, the initial image input size that goes into the network
            float x = f16tof32(buff.r >> 16);
            float y = f16tof32(buff.r);
            // w, h are the width and height of the bbox relative to 416, the initial image input size that goes into the network
            float w = f16tof32(buff.g >> 16);
            float h = f16tof32(buff.g);
            // Scale to camera resolution using UVs
            float2 center = float2(x, y) / 416.0;
            center.y = 1.0 - center.y;
            float2 size = float2(w, h) / 416.0 * scale;
        }
    }
}

// YOLOv4 tiny has two outputs, remember to go through the second one too
// Loop through the 13x13 grid output
for (i = 0; i < 13; i++) {
    for (j = 0; j < 13; j++) {
        // Only draw a box if the confidence is over 50%
        // txL17nms: offset of this grid's data in the NMS output, defined in nms_include.cginc
        uint4 buff = asuint(_NMSout[txL17nms.xy + uint2(i, j)]);
        float conf = f16tof32(buff.a);
        [branch]
        if (conf > 0.5) {
            // Class, 0 to 79
            float c = f16tof32(buff.b >> 16);
            // x, y is the center position of the bbox relative to 416, the initial image input size that goes into the network
            float x = f16tof32(buff.r >> 16);
            float y = f16tof32(buff.r);
            // w, h are the width and height of the bbox relative to 416, the initial image input size that goes into the network
            float w = f16tof32(buff.g >> 16);
            float h = f16tof32(buff.g);
            // Scale to camera resolution using UVs
            float2 center = float2(x, y) / 416.0;
            center.y = 1.0 - center.y;
            float2 size = float2(w, h) / 416.0 * scale;
        }
    }
}
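
Inside the if (conf > 0.5) branches above, center and size end up in 0-1 UV space, so drawing a rectangle outline reduces to a distance test. An illustrative helper (output.shader's actual drawing code may differ):

// Returns true if uv lies on the outline of a box given in UV space
bool onBoxEdge(float2 uv, float2 center, float2 size, float thickness)
{
    float2 d = abs(uv - center) - size * 0.5;
    // Inside the box, but within thickness of at least one edge
    return all(d < 0.0) && any(d > -thickness);
}

A fragment can then return the class color when onBoxEdge is true and the camera pixel otherwise.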

nms.shader packs two 16-bit values into each 32-bit channel. The layout, written as high bits | low bits, is as follows:

R =      X       |    Y
G =      W       |    H
B =  Best class  |    Best class probability
A =              |    Bounding box confidence
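
For reference, two half-precision floats are packed into a single 32-bit channel with f32tof16 and a shift. A minimal sketch of the packing side of this layout (illustrative; not nms.shader's exact code):

// Pack: high 16 bits | low 16 bits, matching the table above
uint4 packBox(float x, float y, float w, float h,
              float bestClass, float classProb, float boxConf)
{
    uint4 o;
    o.r = (f32tof16(x) << 16) | f32tof16(y);
    o.g = (f32tof16(w) << 16) | f32tof16(h);
    o.b = (f32tof16(bestClass) << 16) | f32tof16(classProb);
    o.a = f32tof16(boxConf); // high 16 bits unused
    return o;
}

The packed values are written out with asfloat() and read back with asuint() plus f16tof32(), as the loops above show.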

The network detects the 80 classes of the COCO dataset, indexed in the standard COCO order: 0 = person, 1 = bicycle, and so on.

How It Works

Since this is a direct implementation of a known architecture, you can refer to the original papers for details.

YOLOv4's paper is essentially an ablation study of different parameters and tuning choices to maximize speed and accuracy, so I suggest reading the papers on the previous versions for a better understanding of the underlying architecture.

Other Resources

If you have questions or comments, you can reach me on Discord: SCRN#8008 or Twitter: https://twitter.com/SCRNinVR
