A tool which profiles Vulkan devices to find their peak capacities

vkpeak

A synthetic benchmarking tool that measures the peak capabilities of Vulkan devices. It only measures the peak metrics that can be achieved using vector operations and does not represent a real-world use case.

Download

Download the Windows/Linux/macOS executable for Intel/AMD/NVIDIA GPUs

https://github.com/nihui/vkpeak/releases

Usage

vkpeak.exe 0

The only parameter, 0, is the device id.

If you encounter a crash or error, try upgrading your GPU driver.
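
When a machine exposes more than one Vulkan device, the invocation above can be scripted. The following is a minimal Python sketch and not part of vkpeak: the helper names `vkpeak_command` and `run_vkpeak` and the default path `./vkpeak` are assumptions made for illustration, and it relies only on the single device-id argument described above.

```python
import subprocess


def vkpeak_command(device_id, executable="./vkpeak"):
    """Build the argument list for one vkpeak run (hypothetical helper)."""
    if not isinstance(device_id, int) or device_id < 0:
        raise ValueError("device id must be a non-negative integer")
    # The device id is the only parameter, as in `vkpeak.exe 0`.
    return [executable, str(device_id)]


def run_vkpeak(device_id, executable="./vkpeak"):
    """Run vkpeak for one device and return its standard output as text."""
    completed = subprocess.run(
        vkpeak_command(device_id, executable),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on any non-zero exit status
    )
    return completed.stdout
```

Calling `run_vkpeak(0)` would then return the report for device 0 as a string. Which device ids are valid depends on the system, so a failing run surfaces as an exception rather than as empty output.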

Build from Source

  1. Download and set up the Vulkan SDK from https://vulkan.lunarg.com/
  • On Linux distributions, you can instead install the essential build requirements from the package manager
dnf install vulkan-headers vulkan-loader-devel
apt-get install libvulkan-dev
pacman -S vulkan-headers vulkan-icd-loader
  2. Clone this project with all submodules
git clone https://github.com/nihui/vkpeak.git
cd vkpeak
git submodule update --init --recursive
  3. Build with CMake
  • You can pass the -DUSE_STATIC_MOLTENVK=ON option to avoid linking the Vulkan loader library on macOS
mkdir build
cd build
cmake ..
cmake --build . -j 4

Sample

[build]$ ./vkpeak 0
device       = GeForce RTX 2070

fp32-scalar  = 8536.18 GFLOPS
fp32-vec4    = 8473.82 GFLOPS

fp16-scalar  = 8405.30 GFLOPS
fp16-vec4    = 16261.30 GFLOPS

fp64-scalar  = 262.86 GFLOPS
fp64-vec4    = 262.86 GFLOPS

int32-scalar = 8363.63 GIOPS
int32-vec4   = 8313.07 GIOPS

int16-scalar = 5518.05 GIOPS
int16-vec4   = 7138.91 GIOPS

vkpeak-20210424-macos % ./vkpeak 0
device       = Apple M1

fp32-scalar  = 2093.55 GFLOPS
fp32-vec4    = 2369.02 GFLOPS

fp16-scalar  = 2195.79 GFLOPS
fp16-vec4    = 2513.04 GFLOPS

fp64-scalar  = 0.00 GFLOPS
fp64-vec4    = 0.00 GFLOPS

int32-scalar = 653.38 GIOPS
int32-vec4   = 649.56 GIOPS

int16-scalar = 653.42 GIOPS
int16-vec4   = 652.94 GIOPS
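
Reports like the two above are easier to compare once they are parsed. The following Python sketch is not part of vkpeak: the names `parse_vkpeak`, `vec4_speedup`, and `SAMPLE` are hypothetical, and the line format is inferred only from the sample output shown here. It computes the vec4-to-scalar throughput ratio for each data type.

```python
import re

# Matches lines such as "fp16-vec4    = 16261.30 GFLOPS" (format inferred from the samples).
METRIC_RE = re.compile(
    r"^\s*([a-z]+\d+)-(scalar|vec4)\s*=\s*(\d+(?:\.\d+)?)\s+(GFLOPS|GIOPS)\s*$"
)


def parse_vkpeak(text):
    """Return {type_name: {"scalar": value, "vec4": value}} from vkpeak-style output."""
    results = {}
    for line in text.splitlines():
        match = METRIC_RE.match(line)
        if match:
            dtype, width, value, _unit = match.groups()
            results.setdefault(dtype, {})[width] = float(value)
    return results


def vec4_speedup(results):
    """Return vec4/scalar per type, or None where the scalar figure is 0.00."""
    ratios = {}
    for dtype, values in results.items():
        if "scalar" in values and "vec4" in values:
            scalar = values["scalar"]
            ratios[dtype] = values["vec4"] / scalar if scalar > 0 else None
    return ratios


# Excerpt of the GeForce RTX 2070 sample above.
SAMPLE = """\
device       = GeForce RTX 2070

fp32-scalar  = 8536.18 GFLOPS
fp32-vec4    = 8473.82 GFLOPS

fp16-scalar  = 8405.30 GFLOPS
fp16-vec4    = 16261.30 GFLOPS
"""
```

For this excerpt, `vec4_speedup(parse_vkpeak(SAMPLE))` gives a ratio of about 0.99 for fp32 and about 1.93 for fp16. Types reported as 0.00, such as fp64 in the Apple M1 sample, map to `None` instead of triggering a division by zero.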

Other Open-Source Code Used

Issues
  • Test fp16 without 16-bit storage

    It would be nice to be able to look at the performance of fp16 ALU ops even without 16-bit storage. This is the case for the Qualcomm A618, for example: it can't load or store 16-bit values, but it can do 16-bit math.

    Similarly, it would be nice to test RelaxedPrecision ALU ops on 32-bit values, which seems to be a common case for glslang-translated GLSL.

    opened by anholt 0
  • Crashes on M1 Max

    When I try to compile it myself, I get an error. When I run the program from the releases page, it works. Any ideas?

    
    fish: Job 1, './vkpeak 0' terminated by signal SIGSEGV (Address boundary error)
    

    Console says:

    
    Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
    Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000010
    Exception Codes:       0x0000000000000001, 0x0000000000000010
    Exception Note:        EXC_CORPSE_NOTIFY
    
    Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
    Terminating Process:   exc handler [99795]
    
    VM Region Info: 0x10 is not in any region.  Bytes before following region: 4332306416
    
    opened by lbibass 2
  • Int8 & Int64 support?

    Hi, nice benchmark! Below are my Titan V and RX Vega results on Windows. AFAIK the Vulkan spec also supports int8 (via VK_KHR_shader_float16_int8, shaderInt8) and int64 (shaderInt64). Are there any plans to support benchmarking int8/int64 throughput? Thanks.

    Results:

    device       = NVIDIA TITAN V

    fp32-scalar  = 17230.91 GFLOPS
    fp32-vec4    = 16898.01 GFLOPS

    fp16-scalar  = 16781.96 GFLOPS
    fp16-vec4    = 32568.21 GFLOPS

    fp64-scalar  = 7664.02 GFLOPS
    fp64-vec4    = 7677.14 GFLOPS

    int32-scalar = 14464.71 GIOPS
    int32-vec4   = 14755.26 GIOPS

    int16-scalar = 9727.97 GIOPS
    int16-vec4   = 11768.93 GIOPS

    device       = Radeon RX Vega

    fp32-scalar  = 11453.46 GFLOPS
    fp32-vec4    = 11010.15 GFLOPS

    fp16-scalar  = 10388.36 GFLOPS
    fp16-vec4    = 17744.94 GFLOPS

    fp64-scalar  = 686.59 GFLOPS
    fp64-vec4    = 686.31 GFLOPS

    int32-scalar = 2188.62 GIOPS
    int32-vec4   = 2170.05 GIOPS

    int16-scalar = 10013.59 GIOPS
    int16-vec4   = 9885.89 GIOPS

    opened by oscarbg 1