Translates binary information (images, fonts, shaders) into C++ source code.


Binary bakery 🍪

Translates binary files (images, fonts etc.) into C++ source code and gives access to that data at compile- or runtime. There are different reasons why you might want this:

  • Avoiding the complexity of loading images or archives with external libraries
  • Compile-time access to meta information and content itself, including pixel colors
  • Faster load times, especially with small files
  • Avoiding shipping files alongside your application binary, or preventing people from grabbing such files

Binary bakery makes the data itself as well as some meta information available at compile time (constexpr). Images are also encoded in a special way so that their dimensions as well as their pixel colors are accessible directly, without decoding them with an image library.

Basics 🍰 Encoding 🧁 Decoding 🥧 Costs and benefits 🎂

Basics

An executable translates binary file(s) into C++ source code:

Include the resulting payload header as well as the decoder header into your code:

#include <binary_bakery_payload.h>
#define BAKERY_PROVIDE_VECTOR
#include <binary_bakery_decoder.h>

// All binary information can just be read as bytes.
const std::vector<uint8_t> font_bytes = bb::decode_to_vector<uint8_t>(bb::get_payload("fancy_font.ttf"));

// Images have their pixel information available directly, without third party libraries
struct color { uint8_t r, g, b; };

constexpr const uint64_t* logo_ptr = bb::get_payload("logo.png");
const std::vector<color> logo_pixels = bb::decode_to_vector<color>(logo_ptr);

// Meta information is also available - at compile time!
constexpr bb::header meta_information = bb::get_header(logo_ptr);

// For uncompressed payloads, the data can also be accessed at compile time
constexpr color first_pixel = bb::get_element<color>(logo_ptr, 0);

// If you don't want to use std::vector, the data can just be memcopied.
// my_vec and my_decompression_function stand in for your own container and decompression function.
my_vec<uint8_t> storage;
constexpr const uint64_t* large_data_ptr = bb::get_payload("level.bin");
storage.resize(bb::get_element_count<uint8_t>(large_data_ptr));
bb::decode_into_pointer(large_data_ptr, storage.data(), my_decompression_function);

If decompression code is available in the target codebase, the payload can be compressed during encoding, which reduces its impact on binary size and compile time. zstd and LZ4 are currently supported.

Encoding

The tool is a command line executable that encodes all files from its command line parameters. Dragging files on top of it is the same as calling it from the command line:

binary_bakery.exe file1 file2 ...

Configuration

There's a binary_bakery.toml configuration file, which documents its options. Most importantly, you can set your compression there.

The program tries to pick the best available configuration file. In order of priority:

  1. A suitable .toml file among one of the parameters (or files dragged onto the executable).
  2. A binary_bakery.toml in the directory of the files being encoded.
  3. A binary_bakery.toml in the current working directory.
  4. Default settings.

Not all settings have to be set; any that are left out fall back to their defaults.
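The priority order above can be sketched as a small function. This is a hypothetical illustration, not the tool's source: the name `choose_config` is made up, "suitable" is simplified to "has a .toml extension", and the file-existence check is passed in as a callback so the logic can be exercised without touching the disk.

```cpp
#include <filesystem>
#include <functional>
#include <optional>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical sketch of the config lookup order. Returns std::nullopt when
// no file is found, meaning the caller should use default settings.
std::optional<fs::path> choose_config(
   const std::vector<fs::path>& params,
   const fs::path& working_dir,
   const std::function<bool(const fs::path&)>& exists)
{
   // 1. A .toml file among the parameters (simplified notion of "suitable")
   for (const fs::path& p : params)
      if (p.extension() == ".toml")
         return p;

   // 2. binary_bakery.toml in the directory of the files being encoded
   for (const fs::path& p : params) {
      const fs::path candidate = p.parent_path() / "binary_bakery.toml";
      if (exists(candidate))
         return candidate;
   }

   // 3. binary_bakery.toml in the current working directory
   const fs::path cwd_candidate = working_dir / "binary_bakery.toml";
   if (exists(cwd_candidate))
      return cwd_candidate;

   // 4. Nothing found: default settings
   return std::nullopt;
}
```

Injecting `exists` keeps the priority logic separate from the file system; a real tool would pass something wrapping `std::filesystem::exists`.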

Currently png, tga and bmp images are read as images and have their pixel information stored directly. Other image formats like jpg are treated like any other generic binary file. Embedding images without an additional compression algorithm (zstd or LZ4) is not recommended: because the pixels are stored decoded, a png can take up far more memory than its file size suggests.
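To estimate how large an image becomes once baked without extra compression, multiply its dimensions by its bytes per pixel. The sketch below assumes tightly packed pixels with no row padding; `decoded_size` is a hypothetical helper, and the checked values are the uncompressed sizes listed in the sample dataset table under "Costs and benefits".

```cpp
#include <cstdint>

// Hypothetical helper: raw size of decoded pixel data in bytes,
// i.e. width * height * bytes per pixel.
constexpr std::uint64_t decoded_size(std::uint64_t width, std::uint64_t height, std::uint64_t bpp)
{
   return width * height * bpp;
}

// Uncompressed sizes from the sample dataset
static_assert(decoded_size(8, 8, 3) == 192);
static_assert(decoded_size(400, 200, 3) == 240000);
static_assert(decoded_size(2048, 2048, 4) == 16777216);
```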

Decoding

The encoder produces a payload header, which contains valid C++ and needs to be included in your source code. Make sure to only include it in one translation unit because of its potentially large size. To access the encoded information inside, you also need the binary_bakery_decoder.h.

A typical payload header looks like this:

namespace bb{

static constexpr uint64_t bb_bitmap_font_16_png[]{
   0x0020002003000201, 0x00000bf600000c00, 0x62a1b925fffffff0, 0x97ad5c9db662a1b9, 
   0xc262a3bb65a8bc5b, 0x5b9bb462a3bb6aad, 0x82973f7c944f8ca1, /* ... */
};

} // namespace bb
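A payload array like the one above can be produced by packing the file's bytes into 64-bit words and printing them as hex literals. The sketch below is hypothetical and is not the tool's encoder: it omits the 16-byte meta header and assumes the first byte goes into the lowest 8 bits of each word, with zero padding at the end. `pack_words` and `to_cpp_array` are made-up names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical sketch, not the tool's encoder: packs bytes into 64-bit words
// (first byte in the lowest 8 bits, last word zero-padded).
std::vector<std::uint64_t> pack_words(const std::vector<std::uint8_t>& bytes)
{
   std::vector<std::uint64_t> words((bytes.size() + 7) / 8, 0);
   for (std::size_t i = 0; i < bytes.size(); ++i)
      words[i / 8] |= std::uint64_t{bytes[i]} << (8 * (i % 8));
   return words;
}

// Prints the packed words as a C++ array definition, one word per line.
// The 16-byte meta header of a real payload is left out.
std::string to_cpp_array(const std::string& name, const std::vector<std::uint8_t>& bytes)
{
   std::string out = "static constexpr uint64_t " + name + "[]{\n";
   for (const std::uint64_t word : pack_words(bytes)) {
      char line[32];
      std::snprintf(line, sizeof(line), "   0x%016llx,\n", static_cast<unsigned long long>(word));
      out += line;
   }
   out += "};\n";
   return out;
}
```

Storing whole 64-bit words rather than individual bytes keeps the generated source much shorter than one literal per byte.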

Header

You can get a const uint64_t* pointer to the payloads at compile-time by filename with bb::get_payload(const char*). All other functions require that payload pointer.

Inside those uint64 payload arrays is a header with meta information and the data itself. You can access the header with constexpr get_header(const uint64_t*). See binary_bakery_decoder.h#L16-L34 for the header members.

Decompression

If your data was encoded with a compression algorithm, you need to provide a function in your code that does the decompression. All interface functions have such a function pointer parameter. For uncompressed streams, that parameter can be left out as it defaults to nullptr. That function should look like this:

auto compression_fun(
   const void* src,       // Points to the compressed data
   const size_t src_size, // Size of compressed data in bytes
   void* dst,             // Pointer to the preallocated decompressed data
   const size_t dst_size  // Size of decompressed data in bytes
) -> void;

For zstd, for example, that would typically contain a call to ZSTD_decompress(dst, dst_size, src, src_size). For LZ4, that might look like LZ4_decompress_safe(src, dst, src_size, dst_size).
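Any function with that signature will do. The sketch below uses a deliberately trivial, hypothetical run-length format (pairs of count and value bytes) instead of zstd or LZ4, purely to show the contract: read `src_size` bytes from `src` and fill at most `dst_size` bytes of the preallocated `dst`. `rle_decompress` is a made-up name; in real code its body would be the ZSTD_decompress or LZ4_decompress_safe call.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical toy format (NOT zstd or LZ4): a sequence of (count, value)
// byte pairs, each expanding to `count` copies of `value`.
auto rle_decompress(
   const void* src,            // Points to the compressed data
   const std::size_t src_size, // Size of compressed data in bytes
   void* dst,                  // Pointer to the preallocated decompressed data
   const std::size_t dst_size  // Size of decompressed data in bytes
) -> void
{
   const auto* in = static_cast<const std::uint8_t*>(src);
   auto* out = static_cast<std::uint8_t*>(dst);
   std::size_t written = 0;
   for (std::size_t i = 0; i + 1 < src_size; i += 2) {
      const std::size_t count = in[i];
      const std::uint8_t value = in[i + 1];
      // Clamp so we never write past the preallocated buffer
      const std::size_t n = std::min(count, dst_size - written);
      std::memset(out + written, value, n);
      written += n;
   }
}
```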

Data interfaces

template<typename user_type>
std::vector<user_type> bb::decode_to_vector(const uint64_t* payload, decomp_fun)
Returns a vector of your target type. If you want to use this interface, you need to #define BAKERY_PROVIDE_VECTOR before you include the decoder header (to prevent the include if you don't). Note that this function requires user_type to be default constructible.
void bb::decode_into_pointer(const uint64_t* payload, void* dst, decomp_fun)
Writes into preallocated memory. The required decompressed size in bytes is available at compile time from header::decompressed_size. This function memcopies into the destination.
template<typename user_type>
constexpr user_type bb::get_element(const uint64_t* payload, const int index)
Compile-time access that only works for uncompressed data. For images, it should be sizeof(user_type)==bpp.
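Compile-time access needs uncompressed data because a constant expression can't run a decompressor, and it can't reinterpret_cast the uint64_t array to bytes either. One way element access can still work in a constexpr context is to extract bytes with shifts. Assumptions, none taken from the library: the data stream starts after the 16-byte header, the first byte of each word sits in its lowest 8 bits, and `byte_at`, `get_color` and `demo_payload` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical: reads one byte of the data stream at compile time using
// shifts, skipping an assumed 16-byte (two-word) header.
constexpr std::uint8_t byte_at(const std::uint64_t* payload, std::size_t byte_index)
{
   const std::size_t i = 16 + byte_index;
   return static_cast<std::uint8_t>(payload[i / 8] >> (8 * (i % 8)));
}

struct color { std::uint8_t r, g, b; };

// Element access for a 3-byte RGB type (bpp == 3)
constexpr color get_color(const std::uint64_t* payload, std::size_t index)
{
   return color{
      byte_at(payload, 3 * index + 0),
      byte_at(payload, 3 * index + 1),
      byte_at(payload, 3 * index + 2)
   };
}

// Two placeholder header words, then the pixels (10,20,30) and (40,50,60)
// packed into one word with the first byte in the lowest 8 bits.
constexpr std::uint64_t demo_payload[]{
   0, 0,
   0x00003c32281e140aull
};
static_assert(get_color(demo_payload, 1).g == 50);
```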

Do your own thing

If you want to avoid using the provided decoding header altogether, you can access the information yourself. The first 16 bytes contain the header, which is defined at the top of binary_bakery_decoder.h. Everything after that is the byte stream.
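At runtime, reading the byte stream yourself boils down to skipping the 16-byte header and copying. The sketch below assumes a little-endian target, where the in-memory byte order of the uint64_t words matches the order the bytes were packed in; `copy_stream` is a hypothetical name, and the byte count would come from the header.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch: copies `byte_count` bytes of the stream that follows
// the 16-byte header. Reads bytes in native memory order, so this assumes
// a little-endian target.
std::vector<std::uint8_t> copy_stream(const std::uint64_t* payload, std::size_t byte_count)
{
   const auto* stream = reinterpret_cast<const unsigned char*>(payload) + 16; // skip header
   std::vector<std::uint8_t> bytes(byte_count);
   if (byte_count > 0)
      std::memcpy(bytes.data(), stream, byte_count);
   return bytes;
}
```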

Error handling

If there's an error in a compile-time context, that always results in a compile error. Runtime behavior is configurable by providing a function that gets called in error cases. You might want to throw an exception, call std::terminate(), log some error and continue or whatever you desire.

To provide a custom error function, set the bb::error_callback function pointer to your function:

#include <exception>
#include <format>
#include <iostream>
#include <source_location>

auto my_error_function(
   const char* msg,
   const std::source_location& loc
) -> void
{
   std::cerr << std::format(
      "ERROR: {} in: {}({}:{}) \"{}\"\n",
      msg, loc.file_name(), loc.line(), loc.column(), loc.function_name()
      );
   std::terminate();
}
// ...
bb::error_callback = my_error_function;

⚠️ If no bb::error_callback is set, default behavior is ignoring errors and returning nulled values. That is almost certainly not what you want. Errors are things like calling image-only functions on non-image payloads and providing nullptr parameters. Behavior summary:

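|                             | Compile time  | Runtime                          |
|-----------------------------|---------------|----------------------------------|
| Default                     | Compile error | No error, return defaulted types |
| User-defined error function | Compile error | Call user-defined function       |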
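The behavior in the table can be reproduced with a plain function pointer and a helper that is deliberately not constexpr: reaching a non-constexpr call during constant evaluation makes the expression non-constant, which is a compile error, while at runtime the helper forwards to the callback if one is set. This is a hypothetical sketch of one way to get that behavior, not the library's code; `error_callback_type`, `on_error`, `report_error` and `checked_get` are made-up names, and the callback takes only a message for brevity.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch, not the library's code.
using error_callback_type = void (*)(const char* msg);
inline error_callback_type on_error = nullptr; // unset by default

// Deliberately NOT constexpr: if constant evaluation reaches this call, the
// expression stops being a constant expression, which is a compile error.
inline void report_error(const char* msg)
{
   if (on_error != nullptr)
      on_error(msg); // user-defined reaction: log, throw, terminate, ...
}

constexpr std::uint8_t checked_get(const std::uint8_t* data, std::size_t size, std::size_t index)
{
   if (data == nullptr || index >= size) {
      report_error("invalid data access");
      return 0; // runtime: defaulted value if the callback returns or is unset
   }
   return data[index];
}

// Compile time: valid access works, invalid access fails to compile.
static constexpr std::uint8_t table[3]{7, 8, 9};
static_assert(checked_get(table, 3, 1) == 8);
// static_assert(checked_get(table, 3, 5) == 0); // would not compile
```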

Costs and benefits

There are two main concerns about embedding non-code data into your source code and the resulting binary: compile times and the size of the resulting binary. On the flip side, there's also the potential for higher decode speed. What follows is an analysis of the pros and cons of this method versus file loading across various metrics. To get realistic results, a dataset of different images with common dimensions and sizes was created (in sample_datasets/):

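| File          | Dimensions       | Uncompressed size | zstd ratio | LZ4 ratio |
|---------------|------------------|-------------------|------------|-----------|
| 192.png       | 8×8×3BPP         | 192 B             | 91.7%      | 94.8%     |
| 3072.png      | 32×32×3BPP       | 3 KB              | 88.7%      | 99.7%     |
| 49152.png     | 128×128×3BPP     | 48 KB             | 12.0%      | 29.1%     |
| 240000.png    | 400×200×3BPP     | 234 KB            | 34.9%      | 43.6%     |
| 480000.png    | 400×400×3BPP     | 468 KB            | 22.4%      | 31.9%     |
| 3145728.png   | 1024×1024×3BPP   | 3 MB              | 14.2%      | 24.9%     |
| 16777216.png  | 2048×2048×4BPP   | 16 MB             | 10.6%      | 17.5%     |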

Note that the compression ratio here refers to compressed size / uncompressed size (lower is better).

The dataset contains various game spritesheets.

Binary size

The expected size of the resulting binary is the size without any embedded binary files plus the byte-size of the payload. The following plot shows the resulting binary size relative to that expected size.

Good news: the compiler doesn't add overhead beyond the payload size and a small constant penalty of ~3KB from the decoding header. Compression decreases the payload size, reducing the impact on binary size and compile time.

As an example datapoint, an image with an uncompressed size of 16MB only adds 1.78MB to the resulting binary with zstd compression.
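The datapoint follows from the zstd ratio in the table above (10.6% for the 16 MB sample), reading MB as 10^6 bytes:

```latex
16\,777\,216\ \text{B} \times 0.106 \approx 1\,778\,385\ \text{B} \approx 1.78\ \text{MB}
```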

Compile times

The increase in compile times is linear with the size of the payload (note the log scale). Compression decreases the effective payload size. For the biggest 16 MB data sample, compile time increases by 5 seconds uncompressed and 0.5 seconds with zstd.

The payload header should only be included in one translation unit (TU). With commonplace parallel compilation, the effective compile time increase should only be 1/n of that number (with n threads), because the other n-1 threads can compile other TUs. So even ignoring compression, a payload size of 16MB only increases compile times by about 0.6 seconds (assuming 8 threads and enough TUs to saturate them).
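Worked through for the 16 MB sample with n = 8 threads, using the compile time increases from the previous paragraph:

```latex
\frac{5\ \text{s}}{8} = 0.625\ \text{s}\ \text{(uncompressed)}, \qquad \frac{0.5\ \text{s}}{8} = 0.0625\ \text{s}\ \text{(zstd)}
```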

Decode speed

Of interest is also the loading speed compared to traditional loading from files. Here's the decoding time for png images relative to the decoding speed from a file with stb_image (lower is faster):

LZ4 performs identically to uncompressed data in decoding speed, because it is fast enough to not be the bottleneck. zstd is heavier, but it also often reduces the compile impact by half compared to LZ4.

Baking performs better than file loading for all sizes and compression types, in particular for small files, which are the main target for this tool.

All these numbers were measured in release mode (/O2) with Visual Studio 16.11.3, a Ryzen 3800X and a Samsung 970 EVO SSD.

Summary

This is a niche tool. File systems and other means of packing data are good inventions and the better choice in most cases.

If this suits your needs however, the tradeoffs are manageable.

Comments
  • CMake support


    Great idea! I didn't come across something very similar -- although Qt is cross platform, it is definitely extremely heavy.

    I have a few questions:
    1- Your project seems to be based on visual studio, why not port it to CMake?
    2- Is the project (binary file) windows dependent or do you only serve the executable for windows and ask others to compile it for unix/macos?
    3- Would you be interested in extending this to make it integrated via CMake to allow resource files to be updated when they change on pre-built steps?

    opened by CihanSari 15
  • Replace const char* with string_view


    Please check carefully. I tested locally and encountered no problems. However, this PR contains a lot more changes in the application than the others.

    string_view allows easy string comparison alongside other pretty features (non-null terminated strings with actual length etc.)

    opened by CihanSari 7
  • CMake build environment & Ubuntu support


    • Created the build environment using cmake
    • Added instructions on how to prepare the build environment using VCPKG on Windows and Ubuntu
    • Added missing includes for GNU11 on Ubuntu
    • Removed non-common tests (C:\Dropbox or C:)
    • Added a short TODO list on building.md (to be expanded)

    opened by CihanSari 0
  • (documentation) Mention similar/alternative OSS solutions in README, plus a bit about their pros & cons


    Mention similar/alternative OSS solutions, plus a short description & cons of using those.


    Was looking myself for a fundamental 'resource compiler' like yours. Already had inspected a few of the others and wasn't satisfied. Kept looking, found yours, which I've to test further, but it looks promising. 👍

    Augmented the README at the end.

    opened by GerHobbelt 0
  • (QA) fix PVS reports


    as reported by PVS Studio (a static code analyzer tool I use for some other projects too):

    • V801 Decreased performance. It is better to redefine the third function argument as a reference. Consider replacing 'const .. uncompressed_size' with 'const .. &uncompressed_size'. content_meta.cpp 62
    • V801 Decreased performance. It is better to redefine the fourth function argument as a reference. Consider replacing 'const .. compressed_size' with 'const .. &compressed_size'. content_meta.cpp 63
    opened by GerHobbelt 1
  • Potential additions


    • zstd has a compression level setting. Currently that's fixed at the recommended level (3). That could be made configurable.
    • Other compressions?
    • Currently, png, tga and bmp formats are read as images. That maybe should be configurable.
    opened by s9w 5
Owner
Sebastian Werhausen
Whitee is a tiny compiler written in C++17, which translates SysY language into ARM-v7a assembly.

Whitee is a tiny compiler written in C++17, which translates SysY language into ARM-v7a assembly. Table of Contents Background Install Usage Ar

null 14 Dec 11, 2022
Tool based in nodes to build GLSL shaders without any programming knowledge written in C using OpenGL and GLFW.

FNode Tool based in nodes to build GLSL shaders without any programming knowledge written in C using OpenGL and GLFW (raylib library). It contains a c

Víctor Fisac 80 Dec 26, 2022
A toolchain for injecting custom code into Super Mario Galaxy 2.

Syati Syati is a custom code loader for Super Mario Galaxy 2. It is able to compile custom code and link to existing functions in the game to create o

shibbs 20 Mar 29, 2022
Applications based on Wi-Fi CSI (Channel state information), such as indoor positioning, human detection

ESP-CSI The main purpose of this project is to show the use of ESP-WIFI-CSI. The human body detection algorithm is still being optimized. You can get

Espressif Systems 314 Jan 4, 2023
Another system information tool written in C++

Sysfex Another neofetch-like system information fetching tool for linux-based systems written in C++. This is a hobby project, so bugs are to be expec

Mehedi Rahman Mahi 109 Dec 29, 2022
A simple application that generates animated BTTV emotes from static images

emoteJAM WARNING! The application is in active development and can't do anything yet. A simple application that generates animated BTTV emotes from st

Tsoding 7 Apr 27, 2021
Orbit, the Open Runtime Binary Instrumentation Tool, is a standalone C/C++ profiler for Windows and Linux

Orbit, the Open Runtime Binary Instrumentation Tool, is a standalone C/C++ profiler for Windows and Linux. Its main purpose is to help developers visualize the execution flow of a complex application.

Google 3k Dec 30, 2022
VMPImportFixer is a tool aimed to resolve import calls in a VMProtect'd (3.x) binary.

VMPImportFixer VMPImportFixer is a tool aimed to resolve import calls in a VMProtect'd (3.x) binary. Information VMPImportFixer attempts to resolve al

null 256 Dec 28, 2022
A combined suite of utilities for manipulating binary data files.

BinaryTools A combined suite of utilities for manipulating binary data files. It was developed for use on Windows but might compile on other systems.

David Walters 6 Oct 1, 2022
The most powerful and customizable binary pattern scanner written on modern C++

Sig The most powerful and customizable binary pattern scanner written on modern C++ ✔ Capabilities: Support for all common pattern formats: Pattern +

Александр 153 Dec 21, 2022
libcurses and dependencies taken from netbsd and brought into a portable shape (at least to musl or glibc)

netbsd-libcurses portable edition this is a port of netbsd's curses library for usage on Linux systems (tested and developed on sabotage linux, based

null 124 Nov 7, 2022
runsc loads 32/64 bit shellcode (depending on how runsc is compiled) in a way that makes it easy to load in a debugger. This code is based on the code from https://github.com/Kdr0x/Kd_Shellcode_Loader by Gary "kd" Contreras.

runsc This code is based on the code from https://github.com/Kdr0x/Kd_Shellcode_Loader by Gary "kd" Contreras and contains additional functionality. T

null 24 Nov 9, 2022
Source code for Amiga intro Planet Disco Balls

Planet Jazz - Planet Disco Balls (Amiga A500 Intro Source) What is it? This is the 68000 assembler source code for the Planet Jazz "Planet Disco Balls

Jonathan Bennett 21 Oct 1, 2022
MacFlim flim player source code and utilities

MacFlim Video player source code Please do not barf on code quality. It was not in releasable state, but people wanted to use it. You may even be one

Fred Stark 71 Jan 1, 2023
vs herobrine fnf source code

Friday Night Funkin' - Psych Engine Engine originally used on Mind Games Mod, intended to be a fix for the vanilla version's many issues while keeping

indigoUan 4 Aug 6, 2022
Fast regular expression grep for source code with incremental index updates


Arseny Kapoulkine 261 Dec 28, 2022
C-code generator for docopt language.

C-code generator for docopt language Note, at this point the code generator handles only options (positional arguments, commands and pattern matching

null 311 Dec 25, 2022
Utilities and common code for use with raylib

Utilities and shared components for use with raylib

Jeffery Myers 86 Dec 1, 2022
Example code for interfacing with a LCD with a Raspberry Pi Pico

picoLCD is a collection of functions to make interfacing with HD44780 based LCD screens easier on the Raspberry Pi Pico. Rather than reading through data sheets to figure out the correct set of instructions to send to the screen, picoLCD attempts to make it a simpler process, while still being extremely versatile.

null 25 Sep 8, 2022