Data compression utility for minimalist demoscene programs.

Related tags

Compression bzpack
Overview

bzpack

Bzpack is a data compression utility which targets retrocomputing and demoscene enthusiasts. Given the artificially imposed size limits on programs like 256 B, 512 B or 1 KiB intros, the ability to implement a short decoder is nearly as important as the efficiency of the compression format itself. Bzpack doesn’t claim to be a state-of-the-art, general purpose packer. The goal is to achieve acceptable trade-off between simplicity and efficiency in order to minimize the overall program length. Special consideration was given to vintage computing platforms like Sinclair ZX Spectrum.

Format Overview

All supported formats are based on the Lempel–Ziv–Storer–Szymanski algorithm. The compressed stream consists of:

  1. strings of uncompressed bytes (literals),
  2. reusable byte sequences represented as offset-length pairs (phrases).

Bzpack encodes offsets as raw 8-bit values. Literals are encoded in two possible ways:

  1. as a sequence of bytes preceded by block length (the “block” method),
  2. using 1-bit flag for every literal symbol (the “unary” method).

Phrase and literal lengths are encoded using universal codes such as Elias-Gamma.

A Note on Elias-Gamma Coding

The canonical form of Elias-Gamma code consists of N zeroes followed by a (N + 1)-bit binary number. For instance, the number 12 is encoded as 0001100. In his 1975 paper "Universal codeword sets and representations of the integers", Peter Elias devised also an alternative form in which the bits are interleaved: 1010000.

Assuming that the most significant bit is implicit, the interleaved zeroes can be seen as 1-bit flags that indicate whether another significant bit follows. This interleaved form lends itself to a very efficient decoder implementation. Bzpack borrows this approach but uses 1s instead of 0s. Therefore, the actual format as it appears in the output is: (1)1110100, that is:

  1. the most significant bit is not stored,
  2. subsequent significant bits are preceded by 1 indicating their presence,
  3. 0 marks the end of sequence.

Selecting Between 1..N and 2..N Codes

Elias-Gamma code is capable of representing arbitrary integer values 1..N. However, the code can be offset to 2..N which leads to somewhat different distribution of codeword lengths:

Regular 1..N code:

1: 0
2: 100
3: 110
4: 10100
5: 10110
6: 11100
7: 11110

Offset 2..N code:

2: 00
3: 10
4: 0100
5: 0110
6: 1100
7: 1110
8: 10100

Although the 1..N code usually performs better, there are data blocks where 2..N is a better fit. Therefore, it is best to try both options.

Thanks

I have used valuable input from multiple people, including Aleksey "introspec" Pichugin, Slavomir "Busy" Labsky and Pavel "Zilog" Cimbal. Let me also take the opportunity to recognize the work of Einar Saukas.

Issues
  • I added support for bzpack -be1 -e to my 86-DOS kernel depacker

    I added support for bzpack -be1 -e to my 86-DOS kernel depacker

    Here's my depacker based on your C++ code, it assumes the -be1 -e format. Like all my depackers it supports overlapping destination and source buffers. It was very easy to write, I didn't have to fix a single bug. That is, this is my first draft and it appears to work as is.

    Here's the additions to the build scripts to assemble my debugger with bzpack used for compression.

    opened by ecm-pushbx 3
  • Patch to compile on Debian Linux 10

    Patch to compile on Debian Linux 10

    I tried out building revision https://github.com/mbaze/bzpack/commit/9294688b98ce243fec49e4188c042509ba836bf7 on a Debian Linux 10 server with clang 7.0.1 and gcc 8.3.0 and had to modify a few bits to make everything compile. I uploaded everything to the server and here's just the patch. I didn't test decompression of the resulting file yet but both compilers produce the same output file and its size seems plausible: the 92 KiB of an 86-DOS application binary compressed to 68 KiB, using the default compression scheme.

    diff --git a/src/BitStream.h b/src/BitStream.h
    index 014c78c..1d03310 100644
    --- a/src/BitStream.h
    +++ b/src/BitStream.h
    @@ -5,6 +5,8 @@
     #define BIT_STREAM_H
     
     #include <vector>
    +#include <algorithm>
    +#include <cstdint>
     
     class BitStream
     {
    diff --git a/src/Main.cpp b/src/Main.cpp
    index 776c84f..7515b0c 100644
    --- a/src/Main.cpp
    +++ b/src/Main.cpp
    @@ -4,6 +4,7 @@
     #include <memory>
     #include <string>
     #include <stdio.h>
    +#include <algorithm>
     #include "Compressor.h"
     
     const char* pInputFileError = "Error: Cannot open input file.\n";
    @@ -72,7 +73,7 @@ int main(int argCount, char** args)
         }
     
         FILE* pInputFile;
    -    if (fopen_s(&pInputFile, args[1], "rb"))
    +    if ( (pInputFile = fopen(args[1], "rb")) == NULL )
         {
             printf(pInputFileError);
             return 0;
    @@ -89,7 +90,7 @@ int main(int argCount, char** args)
         rewind(pInputFile);
     
         std::unique_ptr<uint8_t[]> spInputStream = std::make_unique<uint8_t[]>(inputFileSize);
    -    size_t bytesRead = fread_s(spInputStream.get(), inputFileSize, 1, inputFileSize, pInputFile);
    +    size_t bytesRead = fread(spInputStream.get(), 1, inputFileSize, pInputFile);
         fclose(pInputFile);
     
         if (bytesRead != inputFileSize)
    @@ -105,7 +106,7 @@ int main(int argCount, char** args)
         }
     
         FILE* pOutputFile;
    -    if (fopen_s(&pOutputFile, args[2], "wb"))
    +    if ( (pOutputFile = fopen(args[2], "wb")) == NULL )
         {
             printf(pOutputFileError);
             return 0;
    diff --git a/src/UniversalCodes.h b/src/UniversalCodes.h
    index e40934f..c69cc41 100644
    --- a/src/UniversalCodes.h
    +++ b/src/UniversalCodes.h
    @@ -5,6 +5,8 @@
     #define UNIVERSAL_CODES_H
     
     #include "BitStream.h"
    +#include <cassert>
    +#define _ASSERT assert
     
     uint32_t GetElias1Cost(uint32_t value);
     void EncodeElias1(BitStream& stream, uint32_t value);
    
    opened by ecm-pushbx 2
  • Suggestion: Encode an exact number as end marker, not unspecified value above 0FFh

    Suggestion: Encode an exact number as end marker, not unspecified value above 0FFh

    I believe this is the code that encodes the end marker for the default E1E1 format: https://github.com/mbaze/bzpack/blob/8627dde07d6228b3ae54dbdd283a3dce838faf75/src/Compressor.cpp#L102

    In my depacker I found that you apparently encode the value 1FFh to be used as an end marker (16 ones followed by 1 zero). I'd prefer if the value, whatever specific value you'd choose, would be fixed and dependable. This would aid my depacker's corruption detection. Accidentally encoding any value higher than FFh is much easier than accidentally encoding one single specific value, so the error detection is hardened if we can depend on one specific value.

    opened by ecm-pushbx 4
Owner
Milos Bazelides
Demoscener, 8-bit enthusiast, senior graphics developer.
Milos Bazelides
Lossless data compression codec with LZMA-like ratios but 1.5x-8x faster decompression speed, C/C++

LZHAM - Lossless Data Compression Codec Public Domain (see LICENSE) LZHAM is a lossless data compression codec written in C/C++ (specifically C++03),

Rich Geldreich 626 Jun 15, 2022
A variation CredBandit that uses compression to reduce the size of the data that must be trasnmitted.

compressedCredBandit compressedCredBandit is a modified version of anthemtotheego's proof of concept Beacon Object File (BOF). This version does all t

Conor Richard 17 Apr 9, 2022
Brotli compression format

SECURITY NOTE Please consider updating brotli to version 1.0.9 (latest). Version 1.0.9 contains a fix to "integer overflow" problem. This happens when

Google 11.2k Jul 3, 2022
Multi-format archive and compression library

Welcome to libarchive! The libarchive project develops a portable, efficient C library that can read and write streaming archives in a variety of form

null 1.8k Jul 2, 2022
Extremely Fast Compression algorithm

LZ4 - Extremely fast compression LZ4 is lossless compression algorithm, providing compression speed > 500 MB/s per core, scalable with multi-cores CPU

lz4 7.2k Jul 4, 2022
LZFSE compression library and command line tool

LZFSE This is a reference C implementation of the LZFSE compressor introduced in the Compression library with OS X 10.11 and iOS 9. LZFSE is a Lempel-

null 1.7k Jun 24, 2022
Small strings compression library

SMAZ - compression for very small strings ----------------------------------------- Smaz is a simple compression library suitable for compressing ver

Salvatore Sanfilippo 984 Jun 20, 2022
Zstandard - Fast real-time compression algorithm

Zstandard, or zstd as short version, is a fast lossless compression algorithm, targeting real-time compression scenarios at zlib-level and better comp

Facebook 17.1k Jun 28, 2022
A massively spiffy yet delicately unobtrusive compression library.

ZLIB DATA COMPRESSION LIBRARY zlib 1.2.11 is a general purpose data compression library. All the code is thread safe. The data format used by the z

Mark Adler 3.6k Jun 28, 2022
A bespoke sample compression codec for 64k intros

pulsejet A bespoke sample compression codec for 64K intros codec pulsejet lifts a lot of ideas from Opus, and more specifically, its CELT layer, which

logicoma 32 Apr 6, 2022
A simple C library implementing the compression algorithm for isosceles triangles.

orvaenting Summary A simple C library implementing the compression algorithm for isosceles triangles. License This project's license is GPL 2 (as of J

Kevin Matthes 0 Apr 1, 2022
Advanced DXTc texture compression and transcoding library

crunch/crnlib v1.04 - Advanced DXTn texture compression library Public Domain - Please see license.txt. Portions of this software make use of public d

null 743 Jun 19, 2022
Better lossless compression than PNG with a simpler algorithm

Zpng Small experimental lossless photographic image compression library with a C API and command-line interface. It's much faster than PNG and compres

Chris Taylor 200 Jun 17, 2022
Slow5tools is a toolkit for converting (FAST5 <-> SLOW5), compressing, viewing, indexing and manipulating data in SLOW5 format.

slow5tools Slow5tools is a simple toolkit for converting (FAST5 <-> SLOW5), compressing, viewing, indexing and manipulating data in SLOW5 format. Abou

Hasindu Gamaarachchi 46 Jun 22, 2022
Simple data packing library (written in C99)

Features Compressed file pack creation Runtime file pack reading Supported operating systems Ubuntu MacOS Windows Build requirements C99 compiler CMak

Nikita Fediuchin 3 Feb 25, 2022
Experimental data compressor for 8-bit computers and low-end platforms

ZX5 (experimental) ZX5 is an experimental data compressor derived from ZX0, similarly targeted for low-end platforms, including 8-bit computers like t

Einar Saukas 9 Apr 14, 2022
Analysing and implementation of lossless data compression techniques like Huffman encoding and LZW was conducted along with JPEG lossy compression technique based on discrete cosine transform (DCT) for Image compression.

PROJECT FILE COMPRESSION ALGORITHMS - Huffman compression LZW compression DCT Aim of the project - Implement above mentioned compression algorithms an

null 1 Dec 14, 2021
Przemyslaw Skibinski 551 Jul 1, 2022
gzip (GNU zip) is a compression utility designed to be a replacement for 'compress'

gzip (GNU zip) is a compression utility designed to be a replacement for 'compress'

ACM at UCLA 7 Apr 27, 2022
A small utility to embed files into C or C++ programs.

hexembed hexembed is a very small utility to help embed files in C or C++ programs in an easy, cross-platform way. Usage > gcc hexembed.c -o hexembed

Lewis Van Winkle 30 Jun 15, 2022
data compression library for embedded/real-time systems

heatshrink A data compression/decompression library for embedded/real-time systems. Key Features: Low memory usage (as low as 50 bytes) It is useful f

Atomic Object 1.1k Jun 30, 2022
Lossless data compression codec with LZMA-like ratios but 1.5x-8x faster decompression speed, C/C++

LZHAM - Lossless Data Compression Codec Public Domain (see LICENSE) LZHAM is a lossless data compression codec written in C/C++ (specifically C++03),

Rich Geldreich 626 Jun 15, 2022
A variation CredBandit that uses compression to reduce the size of the data that must be trasnmitted.

compressedCredBandit compressedCredBandit is a modified version of anthemtotheego's proof of concept Beacon Object File (BOF). This version does all t

Conor Richard 17 Apr 9, 2022
Argh! A minimalist argument handler.

Frustration-free command line processing So many different command line processing libraries out there and none of them just work! Some bring their wh

Adi Shavit 1k Jun 27, 2022
minimalist reddit2nntp gateway

nntpit This is a simple reddit2nntp gateway server that lets you use a newsreader to follow discussions on reddit. The intention is for you to run it

Tavis Ormandy 211 May 29, 2022
A minimalist andf platform-agnostic application layer for writing graphical applications, with a strong emphasis on simplicity and ease of use.

SlimApp A minimalist(*) and platform-agnostic application layer for writing graphical applications. Available as either a single header file or a dire

Arnon Marcus 33 Apr 22, 2022
A minimalist library with basic facilities for developing interactive real-time 3D applications, with a strong emphasis on simplicity and ease of use.

SlimEngine A minimalist and platform-agnostic base project for interactive graphical applications (2D/3D) with a strong emphasis on simplicity, ease o

Arnon Marcus 66 May 10, 2022
kvm-host is a minimalist type 2 hypervisor using Linux Kernel-based Virtual Machine (KVM), capable of running Linux kernel partially.

kvm-host kvm-host is a minimalist type 2 hypervisor using Linux Kernel-based Virtual Machine (KVM), capable of running Linux kernel partially. Build a

null 68 Jul 3, 2022