Fast single source file BC7/BPTC texture encoder with perceptual metric support

Overview
Note: Since this repo was created, we've released two new codecs with better BC7 encoders:
https://github.com/richgel999/bc7enc_rdo
https://github.com/BinomialLLC/bc7e

bc7enc16 - Fast, single source file BC7/BPTC GPU texture encoder with perceptual colorspace metric support

This project is basically a demo of some of the techniques we use in Basis BC7,
which is Binomial's state-of-the-art vectorized BC7 encoder. Basis BC7 is the
highest-quality and fastest CPU BC7 encoder available (2-3x faster than
ispc_texcomp). It supports all modes and linear/perceptual colorspace metrics.
Licensees get full ISPC source code so they can customize the codec as needed.

bc7enc16 purposely only supports modes 1 and 6. This is a strong opaque texture encoder, with basic
support for alpha channels (using mode 6). The intended use case is opaque
textures, or opaque textures with relatively simple alpha channels.

If alpha is highly correlated with RGB, or alpha is relatively simple
(think simple masks where lots of blocks are either all-transparent or
all-opaque), it should work great. For complex alpha channels, more modes (such
as 4, 5, or maybe 7) are necessary.
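To make "relatively simple" alpha concrete, the following C sketch counts how many 4x4 blocks of an RGBA8 image are fully transparent, fully opaque, or mixed. It is a hypothetical diagnostic helper (`classify_alpha_blocks` and `alpha_block_stats` are illustrative names, not part of bc7enc16), and it assumes tightly packed RGBA pixels with width and height that are multiples of 4. A texture where most blocks land in the first two buckets matches the easy case described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of bc7enc16): block counts by alpha content. */
typedef struct {
    size_t all_transparent; /* every alpha value in the block is 0   */
    size_t all_opaque;      /* every alpha value in the block is 255 */
    size_t mixed;           /* anything else                         */
} alpha_block_stats;

/* Assumes tightly packed RGBA8 pixels and width/height divisible by 4. */
alpha_block_stats classify_alpha_blocks(const uint8_t *rgba, size_t width, size_t height)
{
    alpha_block_stats s = {0, 0, 0};
    for (size_t by = 0; by < height; by += 4) {
        for (size_t bx = 0; bx < width; bx += 4) {
            int n_zero = 0, n_full = 0;
            /* Scan the 16 alpha values of this 4x4 block. */
            for (size_t y = by; y < by + 4; y++) {
                for (size_t x = bx; x < bx + 4; x++) {
                    uint8_t a = rgba[(y * width + x) * 4 + 3];
                    n_zero += (a == 0);
                    n_full += (a == 255);
                }
            }
            if (n_zero == 16)
                s.all_transparent++;
            else if (n_full == 16)
                s.all_opaque++;
            else
                s.mixed++;
        }
    }
    return s;
}
```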

This codec supports a perceptual mode, where it computes colorspace error in
weighted YCbCr space (like etc2comp), and it also supports weighted RGBA
metrics. It's particularly strong in perceptual mode, beating the current
state-of-the-art CPU encoder (Intel's ispc_texcomp) by a wide margin when
measured by Luma PSNR, even though it only supports 2 modes and isn't
vectorized.

Why only modes 1 and 6?
Because with these two modes you have a complete encoder that supports both
opaque and transparent textures in a small amount (~1400 lines) of
understandable plain C code. Mode 6 excels on smooth blocks, mode 1 is strong
on complex blocks, and an encoder that combines both modes well can reach
quite high quality. Fast mode 6-only encoders produce noticeable block
artifacts, which this codec avoids by fully supporting mode 1.

On many textures, modes 1 and 6 are typically the most frequently used modes,
even with other encoders. Mode 1 has two subsets, 64 possible partitions, and
3-bit indices, while mode 6 has large 4-bit indices and high-precision 7777.1
endpoints. This codec produces output that is far higher quality than any BC1
encoder, and approaches (or in perceptual mode exceeds!) the quality of other
full BC7 encoders.
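As a sanity check on those mode descriptions, here is a small C sketch that adds up the bit budget of each mode and confirms that both fill exactly one 128-bit BC7 block. The field breakdown follows the BC7 format description; the struct and function names are illustrative only. One index bit is saved per subset because each subset's anchor index has an implicit most significant bit of 0.

```c
/* Illustrative description of a BC7 mode's bit layout (names are hypothetical). */
typedef struct {
    int mode;            /* BC7 mode number; its unary prefix occupies mode + 1 bits */
    int partition_bits;  /* bits selecting the partition shape (0 if none) */
    int subsets;         /* number of endpoint pairs */
    int channels;        /* 3 = RGB, 4 = RGBA */
    int endpoint_bits;   /* bits per channel per endpoint, excluding p-bits */
    int pbits;           /* total p-bits stored in the block */
    int index_bits;      /* bits per pixel index */
} bc7_mode_layout;

int bc7_total_bits(const bc7_mode_layout *m)
{
    int prefix = m->mode + 1;                                  /* unary mode prefix */
    int endpoints = m->subsets * 2 * m->channels * m->endpoint_bits;
    int indices = 16 * m->index_bits - m->subsets;             /* each anchor drops 1 bit */
    return prefix + m->partition_bits + endpoints + m->pbits + indices;
}

/* Mode 1: 64 partitions, 6-bit RGB endpoints, one shared p-bit per subset. */
static const bc7_mode_layout BC7_MODE1 = {
    .mode = 1, .partition_bits = 6, .subsets = 2, .channels = 3,
    .endpoint_bits = 6, .pbits = 2, .index_bits = 3 };

/* Mode 6: one subset, 7-bit RGBA endpoints plus one p-bit per endpoint (7777.1). */
static const bc7_mode_layout BC7_MODE6 = {
    .mode = 6, .partition_bits = 0, .subsets = 1, .channels = 4,
    .endpoint_bits = 7, .pbits = 2, .index_bits = 4 };
```

Worked out by hand, mode 1 is 2 + 6 + 72 + 2 + 46 = 128 bits and mode 6 is 7 + 0 + 56 + 2 + 63 = 128 bits.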

Why is bc7enc16 so fast in perceptual mode?
Computing error in YCbCr space is more expensive than in RGB space, yet bc7enc16
in perceptual mode is stronger than ispc_texcomp (see the benchmark below),
even without SSE/AVX vectorization and with only 2 modes to work with!

Most BC7 encoders only support linear RGB colorspace metrics, which is a
fundamental weakness. Some support weighted RGB metrics, which is better. With
linear RGB metrics, encoding error is roughly balanced between each channel, and
encoders have to work *very* hard (examining large amounts of RGB search space)
to get overall quality up. With perceptual colorspace metrics, RGB error tends
to become a bit unbalanced, with green quality favored more highly than red and
blue, and blue quality favored the least. A perceptual encoder is tuned to
prefer exploring solutions along the luma axis, where it's much less work to find
solutions with less luma error. bc7enc16 is, as far as I know, the first BC7
codec to support computing error in weighted YCbCr colorspace.
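The following C sketch illustrates the general idea of measuring error in a weighted YCbCr space instead of RGB. It is not bc7enc16's actual distance function: the fixed-point REC709 luma coefficients (109, 366 and 37 out of 512), the unnormalized chroma terms Cr = R - Y and Cb = B - Y, and the caller-supplied weights are all assumptions chosen for illustration.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb8;

/* Assumed transform: REC709 luma in fixed point scaled by 512
 * (109/512 ~ 0.2126, 366/512 ~ 0.7152, 37/512 ~ 0.0722; they sum to 512),
 * with unnormalized chroma Cr = R - Y and Cb = B - Y, all scaled by 512. */
static void to_ycbcr512(rgb8 c, int32_t *y, int32_t *cr, int32_t *cb)
{
    *y  = 109 * c.r + 366 * c.g + 37 * c.b;
    *cr = 512 * c.r - *y;
    *cb = 512 * c.b - *y;
}

/* Weighted squared error in the YCbCr-like space; weights are hypothetical. */
double weighted_ycbcr_error(rgb8 a, rgb8 b, double w_y, double w_cr, double w_cb)
{
    int32_t ya, cra, cba, yb, crb, cbb;
    to_ycbcr512(a, &ya, &cra, &cba);
    to_ycbcr512(b, &yb, &crb, &cbb);
    /* Dividing by 512 (a power of two) is exact in double precision. */
    double dy  = (ya - yb) / 512.0;
    double dcr = (cra - crb) / 512.0;
    double dcb = (cba - cbb) / 512.0;
    return w_y * dy * dy + w_cr * dcr * dcr + w_cb * dcb * dcb;
}
```

With illustrative weights of 4, 2 and 1 for Y, Cr and Cb, a one-step change in green costs more than the same change in red, which in turn costs more than a change in blue, mirroring the channel imbalance described above. With equal weights the red/blue ordering flips, so the weights and the color transform together are what encode the perceptual tuning.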

Note: Most of the timings here (except for the ispc_texcomp "fast" mode timings at the very bottom)
are for the *original* release, before I added several more optimizations. The latest version of
bc7enc16.c is around 8-27% faster than the initial release at the same quality (when mode 1 is enabled;
there's no change with just mode 6).

Some benchmarks across 31 images (kodim corpus+others):

Perceptual (average REC709 Luma PSNR - higher is better quality):

ispc_texcomp slow vs. bc7enc16 uber4/max_partitions 64
ispc_texcomp:   355.4 secs 48.6 dB
bc7enc16:       122.6 secs 50.0 dB

ispc_texcomp slow vs. bc7enc16 uber0/max_partitions 64
ispc_texcomp:   355.4 secs 48.6 dB
bc7enc16:       38.3 secs 49.6 dB

ispc_texcomp basic vs. bc7enc16 uber0/max_partitions 16
ispc_texcomp:   100.2 secs 48.3 dB
bc7enc16:       20.8 secs 49.3 dB

ispc_texcomp fast vs. bc7enc16 uber0/max_partitions 16
ispc_texcomp:   41.5 secs 48.0 dB
bc7enc16:       20.8 secs 49.3 dB

ispc_texcomp ultrafast vs. bc7enc16 uber0/max_partitions 0
ispc_texcomp:   1.9 secs 46.2 dB
bc7enc16:       8.9 secs 48.4 dB

Non-perceptual (average RGB PSNR):

ispc_texcomp slow vs. bc7enc16 uber4/max_partitions 64
ispc_texcomp:   355.4 secs 46.8 dB
bc7enc16:       51 secs 46.1 dB

ispc_texcomp slow vs. bc7enc16 uber0/max_partitions 64
ispc_texcomp:   355.4 secs 46.8 dB
bc7enc16:       29.3 secs 45.8 dB

ispc_texcomp basic vs. bc7enc16 uber4/max_partitions 64
ispc_texcomp:   99.9 secs 46.5 dB
bc7enc16:       51 secs 46.1 dB

ispc_texcomp fast vs. bc7enc16 uber1/max_partitions 16
ispc_texcomp:   41.5 secs 46.1 dB
bc7enc16:       19.8 secs 45.5 dB

ispc_texcomp fast vs. bc7enc16 uber0/max_partitions 8
ispc_texcomp:   41.5 secs 46.1 dB
bc7enc16:       10.46 secs 44.4 dB

ispc_texcomp ultrafast vs. bc7enc16 uber0/max_partitions 0
ispc_texcomp:   1.9 secs 42.7 dB
bc7enc16:       3.8 secs 42.7 dB

DirectXTex CPU in "mode 6 only" mode vs. bc7enc16 uber1/max_partitions 0 (mode 6 only), non-perceptual:

DirectXTex:     466.4 secs 41.9 dB
bc7enc16:       6.7 secs 42.8 dB

DirectXTex CPU in its default mode (no 3-subset modes) vs. bc7enc16 uber1/max_partitions 64, non-perceptual:

DirectXTex:     9485.1 secs 45.6 dB
bc7enc16:       36 secs 46.0 dB

(Note that this version of DirectXTex includes a key p-bit bug fix which I've
submitted but which is still waiting to be accepted. Versions without this fix
will be slightly lower quality.)

UPDATE: To illustrate how strong the mode 1+6 implementation is in bc7enc16, let's compare ispc_texcomp 
fast vs. the latest version of bc7enc16 uber4/max_partitions 64:

Without filterbank optimizations:

                Time       RGB PSNR   Y PSNR
ispc_texcomp:   41.45 secs 46.09 dB   48.0 dB
bc7enc16:       41.42 secs 46.03 dB   48.2 dB

With filterbank optimizations enabled:
bc7enc16:       38.78 secs 45.94 dB   48.12 dB

They both have virtually the same average RGB PSNR with these settings (0.06 dB is basically noise), but
bc7enc16 is just as fast as ispc_texcomp fast, even though it's not vectorized. Interestingly, bc7enc16's Y PSNR
is higher, although it wasn't using perceptual metrics in these benchmarks.

This was a multithreaded benchmark (using OpenMP) on a dual Xeon workstation.
ispc_texcomp was called with 64 blocks at a time and used AVX instructions.
Timings are for encoding only.
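For readers reproducing a similar setup, this C sketch shows one way to structure a block-parallel encoding loop with OpenMP: each 4x4 block is gathered into 64 bytes of RGBA and handed to a per-block encoder that writes 16 bytes. It is not the benchmark harness used above; `encode_image`, `encode_block_fn` and `stub_encode_block` are hypothetical names, and the stub does not produce valid BC7 data. Build with `-fopenmp` (GCC/Clang) to enable threading; without it the pragma is ignored and the loop runs serially.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-block encoder: 16 RGBA pixels in, one 16-byte block out. */
typedef void (*encode_block_fn)(const uint8_t pixels[64], uint8_t out[16]);

/* Placeholder so the driver is runnable: stores the first pixel's RGBA and
 * zero-fills the rest. It does NOT produce valid BC7 data. */
void stub_encode_block(const uint8_t pixels[64], uint8_t out[16])
{
    memset(out, 0, 16);
    memcpy(out, pixels, 4);
}

/* Encodes a width x height RGBA8 image (dimensions divisible by 4).
 * Blocks are independent, so rows of blocks can be split across threads. */
void encode_image(const uint8_t *rgba, int width, int height,
                  uint8_t *out_blocks, encode_block_fn encode)
{
    const int blocks_x = width / 4, blocks_y = height / 4;
#pragma omp parallel for schedule(dynamic)
    for (int by = 0; by < blocks_y; by++) {
        for (int bx = 0; bx < blocks_x; bx++) {
            uint8_t pixels[64];
            /* Gather 4 rows of 4 pixels (16 bytes each) into a contiguous block. */
            for (int y = 0; y < 4; y++)
                memcpy(&pixels[y * 16],
                       &rgba[((size_t)(by * 4 + y) * width + bx * 4) * 4], 16);
            encode(pixels, &out_blocks[((size_t)by * blocks_x + bx) * 16]);
        }
    }
}
```

Only the calls to the per-block encoder would be timed in an "encoding only" measurement; image loading and file output stay outside the timed region.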
Owner: Rich Geldreich