New generation entropy codecs : Finite State Entropy and Huff0

Overview

New Generation Entropy coders

This library proposes two high speed entropy coders :

Huff0, a Huffman codec designed for modern CPU, featuring OoO (Out of Order) operations on multiple ALU (Arithmetic Logic Unit), achieving extremely fast compression and decompression speeds.

FSE is a new kind of Entropy encoder, based on ANS theory, from Jarek Duda, achieving precise compression accuracy (like Arithmetic coding) at much higher speeds.

Branch Status
master Build Status
dev Build Status

Benchmarks

Benchmarks are run on an Intel Core i7-5600U, with Linux Mint 64-bits. Source code is compiled using GCC 4.8.4, 64-bits mode. Test files are generated using the provided probagen program. Benchmark breaks sample files into blocks of 32 KB. Huff0 and FSE are compared to zlibh, the huffman encoder within zlib, provided by Frederic Kayser.

File Codec Ratio Compression Decompression
Proba80
Huff0 6.38 600 MB/s 1350 MB/s
FSE 8.84 325 MB/s 440 MB/s
zlibh 6.38 265 MB/s 300 MB/s
Proba14
Huff0 1.90 595 MB/s 860 MB/s
FSE 1.91 330 MB/s 460 MB/s
zlibh 1.90 255 MB/s 250 MB/s
Proba02
Huff0 1.13 525 MB/s 555 MB/s
FSE 1.13 325 MB/s 445 MB/s
zlibh 1.13 180 MB/s 210 MB/s

By design, Huffman can't break the "1 bit per symbol" limit, hence loses efficiency on squeezed distributions, such as Proba80. FSE is free of such limit, and its compression efficiency remains close to Shannon limit in all circumstances. However, this accuracy is not always necessary, and less compressible distributions show little difference with Huffman. On its side, Huff0 delivers in the form of a massive speed advantage.

Branch Policy

External contributions are welcomed and encouraged. The "master" branch is only meant to host stable releases. The "dev" branch is the one where all contributions are merged. If you want to propose a patch, please commit into "dev" branch or dedicated feature branch. Direct commit to "master" are not permitted.

Comments
  • Public methods are not exported to DLL/LIB

    Public methods are not exported to DLL/LIB

    As stated in facebook/zstd#472, I would like to use FSE and HUFF0 methods from .NET which requires having a Windows DLL with public method marked as exported.

    As was done for zstd, we just need to add a macro that does the same thing as here: https://github.com/facebook/zstd/blob/426a9d4b7128ef54d79627cff346173e833f733a/lib/zstd.h#L21

    I can try putting a PR together for this, but I have a few questions:

    1. Should there be a single PUBLIC_API macro used in all the fse.h, huf.h error_public.h? Or should there be multiple FSE_API, HUF_API macros to opt-in/out of each module?

    2. Can we consider all methods declared in fse.h, huf.h and error_public.h as public, and everything else as private? What about bitstream.h and mem.h ?

    3. the VC2012 project does not seem to provide a name for the generated dll (so it's probably . Would there be objections to updating the project to generate libfse.dll instead? Though if it can contains huff0 as well, many libfse.dll is too specific, and there should be another name for this library? (the repo is named FiniteStateEntropy but the title says "New Generation Entropy coders"...)

    opened by KrzysFR 9
  • Integer division by zero in FSE_normalizeM2

    Integer division by zero in FSE_normalizeM2

    Hi, I get sometimes integer division by zero error from FSE_normalizeM2, when having large max symbol values. Specifically, I am using following compiler defines to compile FSE:

    FSE_DEFAULT_MEMORY_USAGE=14 FSE_MAX_MEMORY_USAGE=16 FSEU16_DEFAULT_MEMORY_USAGE=14 FSEU16_MAX_MEMORY_USAGE=16 FSEU16_MAX_SYMBOL_VALUE=4095

    Please find below example code that will reproduce the error.

    #include "fse.h"
    
    void test()
    {
    	uint16_t sourceBuffer[64] =
    	{
    		4048, 1072, 1532, 1936, 2252, 2460, 2536, 2480, 2292, 1988, 1596, 1140, 668, 200, 
    		207, 535, 747, 839, 795, 615, 327, 64, 512, 988, 1456, 1868, 2208, 2432, 2532,
    		2500, 2332, 2048, 1672, 1224, 748, 280, 139, 483, 723, 831, 811, 659, 383, 11,
    		432, 904, 1376, 1800, 2152, 2404, 2524, 2516, 2372, 2104, 1744, 1304, 832, 364,
    		71, 431, 687, 823, 827, 263
    	};
    	uint16_t destinationBuffer[64];
    	static_assert(FSEU16_MAX_SYMBOL_VALUE == 4095, "Custom fseu16 max symbol value");
    	FSE_compressU16(destinationBuffer, 64, sourceBuffer, 64, 4048, 13);
    	/* -> Integer division by zero at fse_compress.c line 540 (FSE_normalizeM2) */
    }
    
    

    Thanks, Markus

    opened by sh4k3n 8
  • huff0 with > 256 symbols

    huff0 with > 256 symbols

    I am fascinated by the speed of the huff0 encoder however unfortunately I wish to encode/decode data with 512 bits. I have already tried the fseU16 and it works great, however is there a quick hack/modification I can do to make the huff0 also work with 9-bit data?

    Any pointers in this direction are appreciated. Thanks

    UPDATE: My apologies, I didn't realized that HUF_compress2() is doing exactly what I desire. However will have to use it before saying it for sure.

    question 
    opened by rajputasif 7
  • Crash when FSE_MAX_MEMORY_USAGE 13

    Crash when FSE_MAX_MEMORY_USAGE 13

    1. Clone current master
    2. Edit FSE_MAX_MEMORY_USAGE to 13
    3. fse blows up its stack when compressing

    Valgrind and gdb traces are useless. Changing DEBUGLEVEL to 1 in debug.h does not help, no assert fires. Compiling with -fstack-protector says "*** stack smashing detected ***" and gdb says it came when exiting FSE_compress2 at ../lib/fse_compress.c:706.

    opened by clbr 6
  • FSE_emergencyDistrib does not exit under certain conditions

    FSE_emergencyDistrib does not exit under certain conditions

    when all values in the first maxSymbolValue positions of the normalizedCounter are <= 1 the function will not exit and an infinite loop will result

    opened by tjizep 5
  • Feature request: safe decompression for U16 coder

    Feature request: safe decompression for U16 coder

    The 8-bit decompressor has a safe function which guarantees it will not read outside of the compressed buffer. It would be great if the 16-bit decompressor also had that functionality.

    opened by chadaustin 5
  • GCC and LLVM CLANG Stats

    GCC and LLVM CLANG Stats

    Hi,

    I ran these on a mediocre Core2Duo with 1.3GHz and 4GB DDR on Gentoo Linux with the latest kernel and thought that sharing these stats might be useful. I've not integrated Yepp! into FSE, but would be curious, if it could bring any advantage. (The -lyeppp flag was used in the hope that it would have a positive effect.) I have tried various combinations of gcc/clang flags and none have had a positive effect, except -02 in combination with -lyeppp, but most visibly the usage of -funroll-loops in combination with clang version 3.3.

    FSE : Finite State Entropy, capability demo by Yann Collet (Jan 12 2014)

    File already compressed

    GCC ../data/win98-lz : 4671615 -> 4671758 (100.0%), 73.0 MB/s , 1420.2 MB/s GCC -funroll-loops ../data/win98-lz : 4671615 -> 4671758 (100.0%), 76.5 MB/s , 1405.2 MB/s GCC -funroll-loops -lyeppp ../data/win98-lz : 4671615 -> 4671758 (100.0%), 75.9 MB/s , 1420.9 MB/s

    CLANG ../data/win98-lz : 4671615 -> 4671758 (100.0%), 78.4 MB/s , 1409.0 MB/s CLANG -funroll-loops ../data/win98-lz : 4671615 -> 4671758 (100.0%), 78.4 MB/s , 1418.3 MB/s CLANG -funroll-loops -lyeppp ../data/win98-lz : 4671615 -> 4671758 (100.0%), 78.3 MB/s , 1431.4 MB/s

    File is uncompressed

    GCC ../data/win98-lz : 12536244 -> 4671591 (37.26%), 73.0 MB/s , 96.9 MB/s GCC -funroll-loops ../data/win98-lz : 12536244 -> 4671591 (37.26%), 76.5 MB/s , 112.9 MB/s GCC -funroll-loops -lyeppp `../data/win98-lz : 12536244 -> 4671591 (37.26%), 76.5 MB/s , 112.9 MB/s

    CLANG ../data/win98-lz : 12536244 -> 4671591 (37.26%), 78.2 MB/s , 107.9 MB/s CLANG -funroll-loops ../data/win98-lz : 12536244 -> 4671591 (37.26%), 78.2 MB/s , 108.0 MB/s CLANG -funroll-loops -lyeppp ../data/win98-lz : 12536244 -> 4671591 (37.26%), 78.6 MB/s , 108.0 MB/s

    EDIT: I have been thinking for several months about entropy, the universe and the use and state of compression in computer science. I've also used entropy as a main theme in my thesis. What strikes me is that the similarity between this and a neural networks is diminishing, if you chained multiple FSE's into a multi-layer network. Thought leader in this region is currently the work of Prof. Dr. Jürgen Schmidthuber's work and those of his Students which you can study here: http://www.idsia.ch/~juergen/onlinepub.html The main problem being the topology of data in the study of entropy raises the question why topological data analysis is so rarely used in the field as a method of exploiting the nature of the dataset to achieve higher compression ratios. It would be a pleasure to exchange ideas on entropy with you. Thanks for this great contribution! I've just recently enjoyed a growing awe on groundbreaking algorithms that have come up with linear, near optimal and rarely even near perfect solutions.

    opened by X4 5
  • Build fails because 32 bit absolute addressing isn't supported

    Build fails because 32 bit absolute addressing isn't supported

    I'm on OS X, just did a fresh clone of the repo, and clang outputs the following when calling make.

    I'm currently using the beta of Xcode 8, but this issue has been ongoing for a while.

    make CFLAGS="-march=native -ofast" LDFLAGS="-flto"

    /Applications/Xcode-beta.app/Contents/Developer/usr/bin/make -C programs test cc -I../lib -march=native -ofast -flto probaGenerator.c -o probagen cc -I../lib -march=native -ofast -flto ../lib/huf_decompress.c ../lib/entropy_common.c bench.c commandline.c fileio.c xxhash.c zlibh.c ../lib/fse_decompress.c ../lib/fse_compress.c ../lib/fseU16.c ../lib/huf_compress.c -o fse ./probagen 20% Binary file generator Generating 1023 KB with P=20.00% File proba.bin generated **** compress using FSE **** ./fse -f proba.bin tmp FSE : Finite State Entropy, 64-bits demo by Yann Collet (Sep 4 2016) Compressed 1048575 bytes into 474414 bytes ==> 45.24%
    ./fse -df tmp result FSE : Finite State Entropy, 64-bits demo by Yann Collet (Sep 4 2016) Decoded 1048575 bytes
    diff proba.bin result **** compress using HUF **** ./fse -fh proba.bin tmp FSE : Finite State Entropy, 64-bits demo by Yann Collet (Sep 4 2016) Compressed 1048575 bytes into 478412 bytes ==> 45.62%
    ./fse -df tmp result FSE : Finite State Entropy, 64-bits demo by Yann Collet (Sep 4 2016) Decoded 1048575 bytes
    diff proba.bin result **** compress using zlibh **** ./fse -fz proba.bin tmp FSE : Finite State Entropy, 64-bits demo by Yann Collet (Sep 4 2016) Compressed 1048575 bytes into 478213 bytes ==> 45.61%
    ./fse -df tmp result FSE : Finite State Entropy, 64-bits demo by Yann Collet (Sep 4 2016) Decoded 1048575 bytes
    diff proba.bin result rm result rm proba.bin rm tmp cc -I../lib -march=native -ofast -flto fullbench.c xxhash.c ../lib/fse_decompress.c ../lib/fse_compress.c ../lib/fseU16.c ../lib/huf_compress.c ../lib/huf_decompress.c ../lib/entropy_common.c -o fullbench LLVM ERROR: 32-bit absolute addressing is not supported in 64-bit mode clang: error: linker command failed with exit code 1 (use -v to see invocation) make[1]: *** [fullbench] Error 1 make: *** [test] Error 2

    opened by MarcusJohnson91 4
  • Feature: high-level pseudocode in README

    Feature: high-level pseudocode in README

    Would be great if you could add pseudocode to the README. I'd love to understand how the algorithm works, but there's too much C for me to understand, and the whitepaper is not good for a layman. :/

    opened by dbkaplun 3
  • Can't compile on cygwin64.

    Can't compile on cygwin64.

    In fileio.c and in commandline.c , both of the defined (CYGWIN) lines must be removed to get it to compile as IS_CONSOLE and SET_BINARY_MODE are not supported.

    opened by neheb 3
  • resync with changes from zstd

    resync with changes from zstd

    zstd has accumulated a lot of little fixes and changes to its embedded fse/huff0. Would be great to see those backported to standalone FiniteStateEntropy if plausible.

    opened by adamdmoss 2
  • How should we understand this function ‘FSE_normalizeM2’?

    How should we understand this function ‘FSE_normalizeM2’?

    opened by huhu0823 2
  • a step in normalization of count

    a step in normalization of count

    I am new here and still reading the source code. In this line, why normalizedCounter[s] = -1? In my opinion, -1 should be 1. Could you please explain it to me? Thanks a lot. https://github.com/Cyan4973/FiniteStateEntropy/blob/12a533a9bf4d7bdcc507bf9d11302a7a1be454f5/lib/fse_compress.c#L459

    question 
    opened by BeeBreeze 2
  • question: FSE doesn't always compress better than huf - expected?

    question: FSE doesn't always compress better than huf - expected?

    I can't share the data, but in short, it's 9108363519 bytes (~9GB) of almost-uncompressible data (IIRC it's the 9GB tail of a larger already-compressed stream).

    % ./fse -e ./almost-uncompressable Compressed 9108363519 bytes into 8992064047 bytes ==> 98.72% % ./fse -h ./almost-uncompressable Compressed 9108363519 bytes into 8943423537 bytes ==> 98.19% % ./fse -z ./almost-uncompressable Compressed 9108363519 bytes into 8944678105 bytes ==> 98.20%

    Granted that I don't know the intimate details of FSE and this is a near-pathological case, but I'd have expected the two huffman implementations to fare rather worse than FSE on these almost-but-not-quite-uniform distributions of data.

    Am I wrong?

    Cheers.

    question 
    opened by adamdmoss 2
  • benchmark mode claims a decompression error on this data w/huf

    benchmark mode claims a decompression error on this data w/huf

     % ./fse -h -b ./xxx    
    FSE : Finite State Entropy, 64-bits demo by Yann Collet (Jul 17 2020)
    !! Error decompressing block 4 of cSize 18041 !! => (Corrupted block detected)
    

    gunzip the below file and run the above. xxx.gz The 'xxx' file appears to survive a huf compress and then a huf decompress intact when doing them individually, so perhaps this is an issue specific to the benchmark mode.

    Tested with 3865a704e8079fb45fc921480a6c351bb6bfbd72

    opened by adamdmoss 5
  • FSE_compressU16() computes clipped data if dstCapacity different

    FSE_compressU16() computes clipped data if dstCapacity different

    Repro:

    constexpr size_t inSize = 24;
    uint16_t in[inSize] =
      {0, 0, 3, 2, 0, 0, 0, 0, 314, 0, 0, 0, 0, 51, 50, 0, 0, 59, 22, 36, 0, 55, 32, 22};
    unsigned int maxSymbolValue = 314;
    unsigned int outSize = 48;
    
    uint8_t out1[256];
    uint8_t out2[256];
    size_t        numBytes1 = FSE_compressU16(out1,
                                       outSize,
                                       in,
                                       inSize,
                                       maxSymbolValue,
                                       0);
    
    size_t numBytes2 = FSE_compressU16(out2,
                                       256,
                                       in,
                                       inSize,
                                       maxSymbolValue,
                                       0);
    // numBytes1 is now 34 bytes
    // numBytes2 is now 43 bytes
    
    FSE_decompressU16(in, inSize, out2, numBytes2);
    FSE_decompressU16(in, inSize, out1, numBytes1); // <- Crashes 
    

    When output bytes is 48 rather than 256 as an argument for FSE_compressU16(), all generated bytes are identical up until 34 bytes. The rest of the bytes are clipped. The difference in code paths is that in the 34 bytes case, the non-fast path is chosen.

    opened by JohanKnutzen 1
  • NULL pointer dereference in BIT_reloadDStream()

    NULL pointer dereference in BIT_reloadDStream()

    Crashing line: 430: bitD->bitContainer = MEM_readLEST(bitD->ptr); in bitstream.h in the function BIT_reloadDStream() triggered by FSE_decompressU16() with the following code:

    const uint8_t* casted2 = (uint8_t*)"Àâ(¢x(Kùÿÿcb¿\a";
    uint16_t out[256];
    size_t ret = FSE_decompressU16(out, 256, casted2, 14);
    

    This is caused by size_t const NSize = FSE_readNCount (NCount, &maxSymbolValue, &tableLog, istart, cSrcSize); returning zero in the following block of FSE_decompressU16():

    { size_t const NSize = FSE_readNCount (NCount, &maxSymbolValue, &tableLog, istart, cSrcSize); if (FSE_isError(NSize)) return NSize; ip += NSize; cSrcSize -= NSize; } thus, resulting in a zero cSrcSize when entering FSE_decompressU16_usingDTable() which is unexpected

    I don't know if this can happen in the 8 bit version.

    opened by JohanKnutzen 7
Owner
Yann Collet
data compression
Yann Collet
Lambda code is a new high level compiled statically typed programming language

Lambda code is a new high level compiled statically typed programming language. Written in python, C, C++. Its syntax is far more easier then other middle level compiled languages.

Lambda Code 12 Aug 16, 2022
Przemyslaw Skibinski 561 Sep 8, 2022
Compile and execute C "scripts" in one go!

c "There isn't much that's special about C. That's one of the reasons why it's fast." I love C for its raw speed (although it does have its drawbacks)

Ryan Jacobs 2k Oct 3, 2022
distributed builds for C, C++ and Objective C

distcc -- a free distributed C/C++ compiler system by Martin Pool Current Documents: https://distcc.github.io/ Formally http://distcc.org/ "pump" func

distcc 1.7k Sep 27, 2022
Roaring bitmaps in C (and C++)

CRoaring Portable Roaring bitmaps in C (and C++) with full support for your favorite compiler (GNU GCC, LLVM's clang, Visual Studio). Included in the

Roaring bitmaps: A better compressed bitset 1.1k Sep 30, 2022
Compression abstraction library and utilities

Squash - Compresion Abstraction Library

null 367 Sep 14, 2022
Multi-format archive and compression library

Welcome to libarchive! The libarchive project develops a portable, efficient C library that can read and write streaming archives in a variety of form

null 1.9k Sep 30, 2022
Easing the task of comparing code generated by cc65, vbcc, and 6502-gcc

6502 C compilers benchmark Easing the way to compare code generated by cc65, 6502-gcc, vbcc, and KickC. This repository contains scripts to: Compile t

Sylvain Gadrat 17 Sep 4, 2022
Secure ECC-based DID intersection in Go, Java and C.

SecureUnionID Secure ECC-based DID intersection. ABSTRACT This project is used to protect device ID using Elliptic Curve Cryptography algorithm. The d

Volcengine 19 Aug 22, 2022
nanoc is a tiny subset of C and a tiny compiler that targets 32-bit x86 machines.

nanoc is a tiny subset of C and a tiny compiler that targets 32-bit x86 machines. Tiny? The only types are: int (32-bit signed integer) char (8-

Ajay Tatachar 16 Feb 13, 2022
Smaller C is a simple and small single-pass C compiler

Smaller C is a simple and small single-pass C compiler, currently supporting most of the C language common between C89/ANSI C and C99 (minus some C89 and plus some C99 features).

Alexey Frunze 1.2k Sep 29, 2022
Microvm is a virtual machine and compiler

The aim of this project is to create a stack based language and virtual machine for microcontrollers. A mix of approaches is used. Separate memory is used for program and variable space (Harvard architecture). An interpreter, virtual machine and compiler are available. A demostration of the interpreter in action is presented below.

null 10 Aug 14, 2022
Pre-configured LLVM and ANTLR4 for C++

LLVM + ANTLR4 Starter Project Starter project for ANTLR4 and LLVM C++ project. Prerequisite LLVM 12 Java (for ANTLR4) git Install prerequisite librari

Nathanael Demacon 12 Aug 9, 2022
Aheui JIT compiler for PC and web

아희짓 개요 아희짓은 아희 언어를 위한 JIT (Just in Time) 컴파일러입니다. 어셈블러와 유틸 라이브러리외에 외부 라이브러리에 전혀 의존하지 않고 JIT을 바닥부터 구현합니다. 지원 환경 64비트 windows, mac, linux (x86 아키텍쳐) 웹어셈

Sunho Kim 28 Sep 23, 2022
Interpreter and miner for the LODA language written in C++

LODA Interpreter and Miner (C++) LODA is an assembly language, a computational model and a tool for mining integer sequences. You can use it to search

LODA Language 12 Sep 22, 2022
is a c++20 compile and runtime Struct Reflections header only library.

is a c++20 compile and runtime Struct Reflections header only library. It allows you to iterate over aggregate type's member variables.

RedSkittleFox 4 Apr 18, 2022
C++ python bytecode disassembler and decompiler

C++ python bytecode disassembler and decompiler

Michael Hansen 1.5k Sep 26, 2022
🌳 A compressed rank/select dictionary exploiting approximate linearity and repetitiveness.

The block-ε tree is a compressed rank/select dictionary that achieves new space-time trade-offs by exploiting the approximate linearity and the repeti

Giorgio Vinciguerra 10 Jun 5, 2022
A LLVM and Clang compiler toolchain built for kernel development

Cosmic-Clang Toolchain This is a LLVM and Clang compiler toolchain built for kernel development. Builds are always made from the latest LLVM sources r

Ǥђ๏ຮ₮⌁Ⲙครtє࿐ 0 Apr 12, 2022