# Adds a landmark prediction branch on top of yolov5, with wing loss as the loss function; yolov5s achieves better performance than retinaface-r50.
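
For context, wing loss is usually defined piecewise: logarithmic for small landmark errors and linear for large ones. The sketch below follows that commonly published form. The function name `wing_loss` and the defaults `w=10` and `epsilon=2` are assumptions for illustration; this page does not state which values the repository uses.

```python
import math

def wing_loss(pred, target, w=10.0, epsilon=2.0):
    """Mean wing loss over paired landmark coordinates (plain Python lists).

    w and epsilon are illustrative defaults, not values taken from this repo.
    """
    # Constant that makes the two pieces meet at |x| == w.
    c = w - w * math.log(1.0 + w / epsilon)
    total = 0.0
    for p, t in zip(pred, target):
        x = abs(p - t)
        if x < w:
            total += w * math.log(1.0 + x / epsilon)  # log region: small errors
        else:
            total += x - c                            # linear region: large errors
    return total / len(pred)
```

Compared with a plain L1 loss, the logarithmic region gives small and medium errors a larger gradient, which is the usual motivation for using it on landmark regression.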

## yolov5-face

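#### WiderFace test

• Accuracy on the WIDER FACE val set (single scale, longest input side: 1024)

| Backbone | Easy | Medium | Hard |
| --- | --- | --- | --- |
| yolov5s | 95.4% | 94.6% | 88.2% |
| yolov5m | 95.8% | 95.1% | 90.5% |
| RetinaFace-R50 (original image scale) | 95.5% | 94.0% | 84.4% |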

#### References

https://github.com/ultralytics/yolov5

https://github.com/DayBreak-u/yolo-face-with-landmark

https://github.com/xialuxi/yolov5_face_landmark

https://github.com/biubug6/Pytorch_Retinaface

• #### Some questions about TensorRT for Yolov5-face

@bobo0810 Many thanks for your TensorRT inference implementation! I have some questions after successfully running Yolov5-face with TensorRT:

1. The results in the table look very impressive. In my case, I measured the runtime (RT) on a 2080 Ti GPU with the following two snippets:
``````
from time import time

start = time()
for i in range(1000):
    pred = yolo_trt_model(img.cpu().numpy())  # TensorRT inference
ends = time()
# Total seconds over 1000 runs is numerically the per-image time in ms.
print('RT for one image: {} ms'.format(ends - start))
``````

This code prints `RT for one image: 6 ms`.

``````
start = time()
for i in range(1000):
    pred = yolo_trt_model(img.cpu().numpy())  # TensorRT inference
    pred = yolo_trt_model.after_process(pred, device)
ends = time()
print('RT for one image: {} ms'.format(ends - start))
``````

This code prints `RT for one image: 11 ms`. Is this the right way to measure the RT?

2. `yolo_trt_model.after_process` seems to take a lot of time. Why not move this step into TensorRT by uncommenting this line? In the original yolov5 repo, the whole model can be exported by this file. Is it possible to put the entire Yolov5-face pipeline into TensorRT?

3. Do the results in the table cover only the running time of `yolo_trt_model.__call__`, or do they also include `yolo_trt_model.after_process` and `non_max_suppression_face`?

opened by vtddggg 13
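
A minimal sketch of how the two measurements in this question could be made more explicit: time each stage with a monotonic clock, skip a few warm-up calls, and divide by the iteration count. The helper `ms_per_call` and the two stand-in functions are hypothetical; in practice they would wrap the `yolo_trt_model(...)` call and the `after_process` call. If the model call returns before the GPU work has finished, a synchronization step would also be needed inside the timed function; whether that applies to `yolo_trt_model` is not stated here.

```python
import time

def ms_per_call(fn, n_iters=1000, warmup=10):
    """Average wall-clock milliseconds per call of fn()."""
    for _ in range(warmup):            # warm-up calls are excluded from timing
        fn()
    start = time.perf_counter()
    for _ in range(n_iters):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / n_iters  # seconds -> ms, then per call

# Stand-ins for the real stages (hypothetical; replace with the model calls).
def infer_only():
    time.sleep(0.001)

def infer_and_postprocess():
    time.sleep(0.002)

t_infer = ms_per_call(infer_only, n_iters=20, warmup=2)
t_total = ms_per_call(infer_and_postprocess, n_iters=20, warmup=2)
print("inference: %.2f ms, inference + post-processing: %.2f ms" % (t_infer, t_total))
```

Reporting the two numbers separately makes it clear which stages a published latency figure includes.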
• #### Question about running multiple inference with TensorRT

I ran the TensorRT engine file for inference successfully, and the performance is amazing compared to the default on the Jetson TX2 board. However, when I run inference repeatedly with the same loaded engine, the results are wrong and the inference time increases. Could you please advise how to use a TRT engine file for multiple inferences? Thanks.

• #### test_widerface.py where is the file wider_val.txt?

`parser.add_argument('--folder_pict', default='/yolov5-face/data/widerface/val/wider_val.txt', type=str, help='folder_pict')`

How do I get the file wider_val.txt? Thanks.

opened by aa12356jm 8
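
The format of wider_val.txt is not shown in this thread. Assuming it is a plain list with one image path per line, relative to the validation images folder (an assumption to check against `test_widerface.py`), such a file could be generated along these lines. The function name `write_image_list` and its parameters are hypothetical.

```python
import os

def write_image_list(images_dir, out_path, exts=(".jpg", ".jpeg", ".png")):
    """Write one image path per line, relative to images_dir (assumed format)."""
    rel_paths = []
    for root, _dirs, files in os.walk(images_dir):
        for name in files:
            if name.lower().endswith(exts):
                full = os.path.join(root, name)
                rel = os.path.relpath(full, images_dir)
                rel_paths.append(rel.replace(os.sep, "/"))
    rel_paths.sort()  # deterministic order across runs
    with open(out_path, "w") as f:
        for rel in rel_paths:
            f.write(rel + "\n")
    return rel_paths
```

If the evaluation script expects a leading slash or a different root, the lines would need to be adjusted accordingly.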
• #### Align Landmark

Hi,

By the way, do you have a 112x112 alignment version or utilities somewhere?

Or can we use RetinaFace's alignment? Are the landmarks (5 points) in the same order as in RetinaFace?

opened by MyraBaba 6
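
As a rough illustration of what a 112x112 alignment utility does, the sketch below fits a similarity transform (scale, rotation, translation) from five detected landmarks to a fixed reference template. The reference points are the widely used ArcFace-style 112x112 template. The point order (left eye, right eye, nose, left mouth corner, right mouth corner) is an assumption, and whether yolov5-face emits landmarks in that order is not confirmed in this thread. The function names are hypothetical.

```python
# ArcFace-style 112x112 reference points; the point order is an assumption.
REF_112 = [(38.2946, 51.6963), (73.5318, 51.5014), (56.0252, 71.7366),
           (41.5493, 92.3655), (70.7299, 92.2041)]

def fit_similarity(src, dst):
    """Least-squares similarity transform mapping src points onto dst points.

    Points are (x, y) tuples. Returns complex (a, b) such that dst ~ a*src + b,
    where abs(a) is the scale and the angle of a is the rotation.
    """
    zs = [complex(x, y) for x, y in src]
    ws = [complex(x, y) for x, y in dst]
    z_mean = sum(zs) / len(zs)
    w_mean = sum(ws) / len(ws)
    num = sum((z - z_mean).conjugate() * (w - w_mean) for z, w in zip(zs, ws))
    den = sum(abs(z - z_mean) ** 2 for z in zs)
    a = num / den
    b = w_mean - a * z_mean
    return a, b

def to_affine_2x3(a, b):
    """The same transform as a 2x3 matrix, the layout warp routines usually expect."""
    return [[a.real, -a.imag, b.real],
            [a.imag, a.real, b.imag]]
```

The 2x3 matrix, converted to a NumPy array, could then be passed to an image warping routine such as OpenCV's `cv2.warpAffine` with an output size of (112, 112).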
• #### Multi-class faces: only the first class is detected?

Hello, I ran into a problem using this code for face detection with 3 classes. After training on multiple classes and running the detection script (detect_face.py), only the first class (class 0) is predicted (its face boxes and landmarks are all normal); faces of the other two classes are never predicted. Printing the raw pred tensor before NMS shows that the last two columns are fixed at 0 (0.00000e+00). I then checked the training code (train.py) and found that the classification scores and obj scores of the other two classes are normal there, but precision and recall during training are both very low. I don't know whether this is related to the NMS 10 s timeout. I have debugged for a long time without finding the cause. Could you help analyze it? Thanks.

opened by changhy666 6
• #### no face detected TensorRT

Hi @bobo0810, I really appreciate your work! I've successfully run all the command lines in your README file, but the resulting image shows no detected face. What am I doing wrong? Could it be caused by a different TensorRT version? My TensorRT version is 8.0.0.3. Thank you!

python torch2trt/main.py --trt_path pretrained/yolov5s-face.trt
[TensorRT] INFO: [MemUsageChange] Init CUDA: CPU +136, GPU +0, now: CPU 259, GPU 407 (MiB)
[TensorRT] INFO: Loaded engine size: 50 MB
[TensorRT] INFO: [MemUsageSnapshot] deserializeCudaEngine begin: CPU 259 MiB, GPU 407 MiB
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +184, GPU +76, now: CPU 455, GPU 527 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +110, GPU +46, now: CPU 565, GPU 573 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 565, GPU 555 (MiB)
[TensorRT] INFO: [MemUsageSnapshot] deserializeCudaEngine end: CPU 565 MiB, GPU 555 MiB
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 565, GPU 565 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 565, GPU 573 (MiB)
bingding: input (1, 3, 640, 640)
bingding: output (1, 25200, 16)
img.shape: torch.Size([1, 3, 640, 640])
orgimg.shape: (768, 1024, 3)
result save in {...}yolov5-face/torch2trt/result.jpg
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 688, GPU 785 (MiB)

opened by quocnhat 5
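
The output binding `(1, 25200, 16)` in the log above can be sanity-checked with a little arithmetic. The sketch assumes the usual yolov5 layout of three detection scales with strides 8, 16 and 32 and three anchors per grid cell, and a per-row layout of 4 box values, 1 objectness score, 5 landmark (x, y) pairs and one score per class. These are assumptions consistent with the shape in the log, not something the log itself states; the function name is hypothetical.

```python
def expected_output_shape(img_size=640, strides=(8, 16, 32),
                          anchors_per_cell=3, num_landmarks=5, num_classes=1):
    """Rows and columns of the raw prediction tensor under the stated assumptions."""
    # One grid per stride; each cell predicts anchors_per_cell candidates.
    rows = sum(anchors_per_cell * (img_size // s) ** 2 for s in strides)
    # 4 box values + 1 objectness + (x, y) per landmark + class scores.
    cols = 4 + 1 + 2 * num_landmarks + num_classes
    return rows, cols

print(expected_output_shape())  # (25200, 16)
```

Under the same assumptions, a model trained with three classes would produce 18 columns per row, which is a quick way to check that an exported engine matches the trained model.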
• #### Why am I having this problem? AssertionError: No results.txt files found in /content/yolov5-face/runs/train/exp, nothing to plot.

When I run this command:
`!python3 train.py --data data/widerface.yaml --cfg models/yolov5s.yaml --weights yolovs-face.pt --epochs 1`
I run into this problem:
Traceback (most recent call last):
File "train.py", line 513, in train(hyp, opt, device, tb_writer, wandb)
File "train.py", line 400, in train plot_results(save_dir=save_dir) # save as results.png
File "/content/yolov5-face/utils/plots.py", line 393, in plot_results assert len(files), 'No results.txt files found in %s, nothing to plot.' % os.path.abspath(save_dir)
AssertionError: No results.txt files found in /content/yolov5-face/runs/train/exp, nothing to plot.

• #### [C++]: 🍅add YOLO5Face MNN/TNN/NCNN/ONNXRuntime C++ demo

Add YOLO5Face MNN/TNN/NCNN/ONNXRuntime C++ demo, all tests passed ~

## Usage

• build lite.ai.toolkit
``````
git clone --depth=1 https://github.com/DefTruth/lite.ai.toolkit.git  # latest
cd lite.ai.toolkit && sh ./build.sh  # On MacOS, you can use the built OpenCV, ONNXRuntime, MNN, NCNN and TNN libs in this repo.
``````

• use YOLO5Face in C++

``````
// Pick one backend; the four lines are alternatives and are not meant to compile together.
auto *yolov5face = new lite::cv::face::detect::YOLO5Face(onnx_path);        // ONNXRuntime
auto *yolov5face = new lite::mnn::cv::face::detect::YOLO5Face(mnn_path);    // MNN
auto *yolov5face = new lite::tnn::cv::face::detect::YOLO5Face(tnn_path);    // TNN
auto *yolov5face = new lite::ncnn::cv::face::detect::YOLO5Face(ncnn_path);  // NCNN
``````

## Demo

``````
#include "lite/lite.h"

static void test_default()
{
  std::string onnx_path = "../../../hub/onnx/cv/yolov5face-s-640x640.onnx"; // yolov5s-face
  std::string test_img_path = "../../../examples/lite/resources/test_lite_face_detector.jpg";
  std::string save_img_path = "../../../logs/test_lite_yolov5face.jpg";

  auto *yolov5face = new lite::cv::face::detect::YOLO5Face(onnx_path);

  std::vector<lite::types::BoxfWithLandmarks> detected_boxes;
  cv::Mat img_bgr = cv::imread(test_img_path);  // load the test image as BGR
  yolov5face->detect(img_bgr, detected_boxes);

  // Draw boxes and landmarks directly on the input image.
  lite::utils::draw_boxes_with_landmarks_inplace(img_bgr, detected_boxes);

  cv::imwrite(save_img_path, img_bgr);

  std::cout << "Default Version Done! Detected Face Num: " << detected_boxes.size() << std::endl;

  delete yolov5face;
}
``````

The annotated image is written to `save_img_path`, and the number of detected faces is printed to stdout.

opened by DefTruth 5
• #### support export onnx/pb with output concat

Verified OK! Assigning a value to a slice of a tensor (ops such as `...` or `:`) is not supported when exporting to ONNX or pb. If you want the concatenated output, please edit models/export.py like this:

`model.model[-1].export = False  # set Detect() layer export=True`

`model.model[-1].export_cat = True`

opened by changhy666 4
• #### Inference Speed

I have a question about yolov5-face inference speed. Yolov5-face is more accurate than SCRFD, but its inference speed is slower.

Is it true?

opened by LeeKyungwook 4
• #### FLOPs counts differ

Hello, the FLOPs reported in your README seem to differ from what I get when counting with yolov5:

yolov5-n: Model Summary: 308 layers, 1705462 parameters, 1705462 gradients, 5.0 GFLOPS

yolov5-0.5n: Model Summary: 308 layers, 439734 parameters, 439734 gradients, 1.4 GFLOPS

opened by ppogg 3
• #### Can't run validate after training

I'm training the yolov5n-0.5 model on the WIDERFACE dataset. Whenever the training loop reaches the validation step, it crashes. Checking memory usage with htop shows that the validation step consumes all of my RAM and swap (16 GB of RAM plus 16 GB of swap, 32 GB in total) and runs out of memory. Has anyone encountered this problem, and what is the suggested fix?

opened by MS1908 0
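• #### A few lines in torch2trt/main.py are wrong

In torch2trt/main.py, lines 69-71: cv2.rectangle takes the top-left and bottom-right corner coordinates, so there is no need to call xyxy2xywh here, nor to divide by gn for normalization. The same applies to the landmarks.

``````
xywh = (xyxy2xywh(det[j, :4].view(1, 4)) / gn).view(-1).tolist()
conf = det[j, 4].cpu().numpy()
landmarks = (det[j, 5:15].view(1, 10) / gn_lks).view(-1).tolist()
``````

In other words, these lines should match lines 174-176 of detect_face.py; I don't know why the extra operations were added. After the fix:

``````
xywh = det[j, :4].view(1, 4).view(-1).tolist()  # now holds x1, y1, x2, y2 in pixels
conf = det[j, 4].cpu().numpy()
landmarks = (det[j, 5:15].view(1, 10)).view(-1).tolist()
``````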
opened by Monkey-D-Luffy-star 0
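
To illustrate the point of this report, the sketch below contrasts the two box formats with plain Python tuples. `xyxy2xywh` here is a simplified stand-in for the repository's tensor version, and `rectangle_corners` is a hypothetical helper showing what a corner-based drawing call such as `cv2.rectangle` needs.

```python
def xyxy2xywh(box):
    """Corner format (x1, y1, x2, y2) -> center format (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0, x2 - x1, y2 - y1)

def rectangle_corners(box_xyxy):
    """Integer top-left and bottom-right points, as a drawing call expects."""
    x1, y1, x2, y2 = box_xyxy
    return (int(x1), int(y1)), (int(x2), int(y2))

box = (100.0, 50.0, 300.0, 250.0)  # pixel coordinates in corner format
print(xyxy2xywh(box))              # (200.0, 150.0, 200.0, 200.0)
print(rectangle_corners(box))      # ((100, 50), (300, 250))
```

Passing the center-format values, or values normalized to the range 0 to 1, to a corner-based drawing call would place the rectangle in the wrong location, which matches the behavior described in the report.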
• #### fine-tuning label.txt keypoint format

I am fine-tuning the model. The default value of kpt_label is 5, but if I put the keypoint coordinates in the label txt in the same form as the bbox, they do not seem to be read. For example: bbox_id x y whatpt_id x y *5. What format should label.txt use?

opened by qlqqqk 0