TurboRC - Turbo Range Coder / Arithmetic Coding

Overview

TurboRC: Turbo Range Coder

======================================

  • Fastest Range Coder / Arithmetic Coder
    • 100% C (C++ headers).
    • OS/Arch: Linux amd/intel, arm, PowerPC, s390x, MacOs+Apple M1. Windows: Mingw, visual c++
    • No other Range Coder / Arithmetic Coder encode or decode faster with better compression
    • Up to 3 times faster than the next fastest range coder with similar compression ratio
    • Can work as bitwise or/and as multisymbol range coder
    • 32 or 64 bits range coder. Big + Little endian
    • Renormalization output 8,16 or 32 bits
    • Easy connection to bit, nibble or byte predictors.
    • Several built-in predictors: simple, dual speed, 🆕 fsm
    • Built-in order0, order1, order2, Sliding Context, 🆕 Context mixing,
      - Run Length Encoding, Gamma Coding, 🆕 Rice Coding,
      - 🆕 Bit entropy coding,
      - 🆕 Turbo VLC: novel Variable Length Coding for large integers,
      - MTF (Move-To-Front) / QLFC (Quantized Local Frequency Coding)
    • Fast full 8/16 bits BWT: Burrows-Wheeler compression/decompression w/
      • preprocessing : lzp and 🆕 utf-8
      • postprocessing: QLFC (Quantized Local Frequency Coding), 🆕 RLE and 🆕 Bit entropy coder
    • BWT 🆕 libsais + optimized libdivsufsort + 🆕 optimized inverse bwt included
    • static + adaptive cdf - cumulative distribution functions
    • stdin/stdout file compressor included
    • TurboRC App for benchmarking all the functions and test allmost all byte, integer, floating point, date and timestamp file types.
      • 🆕 read and convert text, csv or binary files to 8/16/32 bits before processing
      • 🆕 set predictor and parameters at the command line

Usage examples

    ./turborc -e0   file           " benchmark all basic functions using the default simple predictor
    ./turborc -e20  file           " byte gamma coding + rc
    ./turborc -e11,14 file
    ./turborc -e0 -pss -r47 file   " use dual speed predictor with parameters 4 and 7
    ./turborc -e0 -psf -r1 file    " use FSM predictor with filename "FSM1.txt"
    ./turborc -e0 file -Os         " raw 16 bits input
    ./turborc -e0 file -Ou         " raw 32 bits input
    ./turborc -e0 file -Ft         " text file (one integer/line) 
    ./turborc -e0 file -Fc         " text file with multiple integer entries (separated by non-digits characters ex. 456,32,54)
    ./turborc -e0 file -Fc -v5     " like prev., display the first 100 values read
    ./turborc -e0 file -Fcf        " text file with multiple floating-point entries (separated by non-digits characters ex. 456.56,32.1,54)
    ./turborc -e0 file -Fru -Ob    " convert raw 32 bits input to bytes before processing possibly truncating large values
    ./turborc -e0 file -Ft -K3 -Ou " convert column 3 of a csv text file to 32 bits integers
    ./turborc -e0 file -pss -r47   " benchmark all basic functions using the dual speed predictor with paramters 4 and 7
    ./turborc -e0 file -psf -r1    " benchmark all basic functions using the fsm predictor with the paramter file FSM1.txt

Benchmark

see also Entropy Coder Benchmark

File: enwik9bwt generated BWT (wikipedia XML 1GB )
C Size ratio% C MB/s D MB/s Name / 2020-01
167,395,956 17.13 82 120 25-rcqlfcs Move-To-Front/QLFC
171,300,484 17.13 92 91 24-rcssxrle RLE
172,462,692 17.25 112 108 22-rcsxrle RLE
173,418,060 17.34 48 44 14-rcssx o8bits (strong)
175,618,792 17.56 64 50 12-rcsx o8bits sliding context
176,983,264 17.70 61 58 13-rcss o0 (strong)
183,243,104 18.32 110 102 23-rcssrle RLE
182,958,556 18.30 139 113 21-rcsrle RLE
183,542,772 18.35 81 73 11-rcs o0
File: QC (DNA 100MB)
C Size ratio% C MB/s D MB/s Name / 2020-01
838,416 0.83% 909 1645 24-rcssxrle RLE
839,384 0.83% 1283 1936 22-rcsxrle RLE
1,000,716 0.99% 54 55 14-rcssx o8bits sliding context (strong)
1,041,592 1.03% 76 65 12-rcsx o8bits sliding context
1,081,104 1.07% 1263 1785 23-rcssrle RLE
1,087,896 1.08% 1456 2021 21-rcsrle RLE
3,388,684 3.36% 73 73 13-rcss o0 (strong)
4,145,920 4.10% 92 95 11-rcs o0

File Compression

Range Coder

    ./turborc -1 inputfile outputfile         "order 0 simple
    ./turborc -2 inputfile outputfile         "order 1 simple

Range Coder + RLE

    ./turborc -12 inputfile outputfile         "order 0 simple
    ./turborc -14 inputfile outputfile         "order 1 simple
    ./turborc -d inputfile outputfile          "decompress

BWT (Burrows-Wheeler) + QLFC (Quantized Local Frequency Coding) + TurboRC

    ./turborc -20 inputfile outputfile -l# [-Os]  "bwt compression 
	                                              #:level  1:Bit EC, 2:RC simple, 3:RC dual speed, 4:RLE O0
												  -Os : optional 16-bits BWT only for levels 1 and 4
    ./turborc -d inputfile outputfile             "decompress

Compile:

    Download or clone TurboRC
	git clone --recursive git://github.com/powturbo/Turbo-Range-Coder.git
	cd Turbo-Range-Coder
Linux, MacOS, Windows (MingW), Clang,... (see also makefile)
= haswell">
	make
or
	make AVX2=1                                "compile for recent architectures >= haswell
Windows visual c++
	nmake /f makefile.vs
Windows visual studio c++
	project files in directory vs/vs2017

Function usage:

See examples in "turborc.c"

Environment:

OS/Compiler (32 + 64 bits):
  • Windows: Visual C++ (2017)
  • Windows: MinGW-w64 makefile
  • Linux amd/intel: GNU GCC (>=4.6)
  • Linux amd/intel: Clang (>=3.2)
  • Linux arm: aarch64
  • MaxOS: XCode (>=9) + Apple M1
  • PowerPC ppc64le
  • IBM s390x

References:

Last update: 01 JAN 2022

You might also like...
Silk coder; Encode audio to silk; Decode silk to PCM

silk-codec A library for convert PCM to tencent silk files. Features Convert PCM to silk Convert audio to silk (FFMPEG required) Convert silk to PCM P

This repo contains solutions to coding questions available online on coding platforms like - Codeforces, Codechef, URI Online Judge, and Hackerrank.

CPP_Soln This repo contains solutions to coding questions available online on coding platforms like - Codeforces, Codechef, URI Online Judge , LeetCod

Main libjpeg-turbo repository

Background libjpeg-turbo is a JPEG image codec that uses SIMD instructions to accelerate baseline JPEG compression and decompression on x86, x86-64, A

A terminal emulator that runs in your terminal. Powered by Turbo Vision.
A terminal emulator that runs in your terminal. Powered by Turbo Vision.

tvterm A terminal emulator that runs in your terminal. Powered by Turbo Vision. tvterm is an experimental terminal emulator widget and application bas

BCJR-based decoder for CCSDS turbo codes (according to CCSDS 131.0-B-3)

ccsds-tc ccsds-tc is a small project that attempts to systematize the decoding of space packets as received by ground stations of the Amateur DSN. Thi

A modern port of Turbo Vision 2.0, the classical framework for text-based user interfaces. Now cross-platform and with Unicode support.
A modern port of Turbo Vision 2.0, the classical framework for text-based user interfaces. Now cross-platform and with Unicode support.

Turbo Vision A modern port of Turbo Vision 2.0, the classical framework for text-based user interfaces. Now cross-platform and with Unicode support. I

HastyBadger is a branch of the excellent widget and GUI library Turbo Badger.

Branch Notice - HastyBadger Hasty is not Turbo. HastyBadger is a branch of the excellent widget and GUI library Turbo Badger. Notabe additions are c++

MIRACL Cryptographic SDK: Multiprecision Integer and Rational Arithmetic Cryptographic Library is a C software library that is widely regarded by developers as the gold standard open source SDK for elliptic curve cryptography (ECC).

MIRACL What is MIRACL? Multiprecision Integer and Rational Arithmetic Cryptographic Library – the MIRACL Crypto SDK – is a C software library that is

Universal Number Arithmetic
Universal Number Arithmetic

Universal: a header-only C++ template library for universal number arithmetic The goal of the Universal Numbers Library is to offer applications alter

MIRACL Cryptographic SDK: Multiprecision Integer and Rational Arithmetic Cryptographic Library is a C software library that is widely regarded by developers as the gold standard open source SDK for elliptic curve cryptography (ECC).

MIRACL What is MIRACL? Multiprecision Integer and Rational Arithmetic Cryptographic Library – the MIRACL Crypto SDK – is a C software library that is

C library for arbitrary-precision ball arithmetic
C library for arbitrary-precision ball arithmetic

Arb Arb is a C library for arbitrary-precision interval arithmetic. It has full support for both real and complex numbers. The library is thread-safe,

Fixed point 48.16 bit arithmetic type

fixed_math written from scratch fixed point math library in C++17 features minimum c++17 compiler required fixed point 48.16 arithmethic strong type w

Intel:registered: Homomorphic Encryption Acceleration Library accelerates modular arithmetic operations used in homomorphic encryption

Intel Homomorphic Encryption Acceleration Library (HEXL) Intel ®️ HEXL is an open-source library which provides efficient implementations of integer a

A single-header C/C++ library for parsing and evaluation of arithmetic expressions

ceval A C/C++ header for parsing and evaluation of arithmetic expressions. [README file is almost identical to that of the ceval library] Functions ac

A single-header C/C++ library for parsing and evaluation of arithmetic expressions

ceval A C/C++ header for parsing and evaluation of arithmetic expressions. [README file is almost identical to that of the ceval library] Functions ac

Long integer arithmetic.

LongInt Содержимое репозитория представляет из себя исходный код примитивного консольного калькулятора, который использует реализацию длинных чисел из

C++ Matrix -- High performance and accurate (e.g. edge cases) matrix math library with expression template arithmetic operators
C++ Matrix -- High performance and accurate (e.g. edge cases) matrix math library with expression template arithmetic operators

Matrix This is a math and arithmetic matrix library. It has stood many years of performing in mission critical production for financial systems. It ha

A C/C++ library for parsing and evaluation of arithmetic expressions.

ceval A C/C++ header for parsing and evaluation of arithmetic expressions. Functions accessibe from main() Function Argument(s) Return Value ceval_res

Comments
  • Level 25 fails decompression on enwik8

    Level 25 fails decompression on enwik8

    ./turborc -25 /tmp/enwik8 /tmp/enwik8.rc ll /tmp/enwik8.rc -rw-rw-r-- 1 fred fred 58290080 Mar 21 13:03 /tmp/enwik8.rc rm /tmp/enwik8.rc.bak ./turborc -d /tmp/enwik8.rc /tmp/enwik8.rc.bak ll /tmp/enwik8.rc.bak -rw-rw-r-- 1 fred fred 100000000 Mar 21 13:04 /tmp/enwik8.rc.bak md5sum /tmp/enwik8.rc.bak /tmp/enwik8 0f86d7c5a6180cf9584c1d21144d85b0 /tmp/enwik8.rc.bak a1fa5ffddb56f4953e226637dabbb36a /tmp/enwik8

    opened by flanglet 2
  • Two iota

    Two iota

    TRC is lovely.

    1. these encoders can write past end of buffer. Is there a maximum over-run?

    2. This seems to fix over runs: #define OVERFLOW(in, inlen) if(op >= out+inlen-16) { memcpy(out, in, inlen); return inlen; }

    3. It seems cleaner to specify the outlen as a parameter (inlen for decompress). outlen might be reduced externally for application reasons and internally for maximum overrun. Omitting the memcpy() #define OVERFLOW(in, inlen) if(op >= outbound)return 0;

    Thanks!

    opened by stati64 1
Owner
powturbo
Turbo Data Compression, InformationRetrieval. InvertedIndex, Integer Compression, InMemoryDB, SIMD, Data Structures, Algorithms, BigData, Databases
powturbo
🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes

The Piecewise Geometric Model index (PGM-index) is a data structure that enables fast lookup, predecessor, range searches and updates in arrays of bil

Giorgio Vinciguerra 650 Dec 31, 2022
A collection of basic data structures syntaxes, useful for competitive coding and placement exams

Data-Structures A collection of basic data structures syntaxes, useful for competitive coding and placement exams 1. Array 2. Matrix 3. Linked List Si

MAINAK CHAUDHURI 2 Aug 8, 2021
LeetCode Coding Problems

LeetCode LeetCode Coding Problems Author: Abhishek Kumar My C++ Code for LeetCode OJ. Please give this repo a ⭐️ if it inspires you. Thanks. ☺️ # Titl

Abhishek kumar 3 Jan 7, 2022
Leetcode solutions in C++ for coding interviews.

LeetCode Solutions This repository will consist solutions focusing on the concepts of DataStructures and Algorithms. Daily updates! Please come back a

Meghna Srivastava 4 May 17, 2022
My solutions to problems of Global Coding Challenge 2021

Global Coding Challenge 2021 Organized by Credit Suisse This repo contains all the problems of the competition and solutions to them submitted by bpan

Bibek Panthi 1 Nov 7, 2021
Project in C : Implementation of Huffman Coding algorithm.

HUFFMAN CODING PROJECT Huffman coding algorithm implemented in C programming language. Compressing text files into binary files, and decompressing tho

Raphael CAUSSE 0 Sep 1, 2022
A collection of libraries, data structures, and more that I have created to make coding in C less painful.

ctools A collection of libraries, data structures, and more that I have created to make coding in C less painful. Data structures There are many data

null 3 Nov 27, 2021
👨‍💻 Solution of everyday coding problem given in 30DaysofCode contest held on Hackerrank.

??‍??30DaysOfCode-PhoenixClub This repository gives you the solution of everyday problems given in 30DaysOfCode contest which is held on Hackerrank. N

Urveshkumar 8 Jan 30, 2022
TechMaestro's 75 days coding challenge

You can also join the challenge to: Learn DSA Coding from basic to advance level by FAANG engineers Building Github and Leetcode profile in the process for better resume Daily live sessions Win excited rewards and goodies on completing milestone.

Gopal Oswal 3 Jul 4, 2022
Minimal Huffman coder/decoder

huffandpuff This is an extremely minimal huffman encoder/decoder. It uses no calls at all, not even stdlib/stdio, making it suitable for embedded appl

Adam Ierymenko 89 Nov 28, 2022