LZFSE compression library and command line tool

LZFSE

This is a reference C implementation of the LZFSE compressor introduced in the Compression library with OS X 10.11 and iOS 9.

LZFSE is a Lempel-Ziv style data compression algorithm using Finite State Entropy coding. It targets compression ratios similar to deflate (as implemented by zlib) at higher compression and decompression speeds.

Files

README.md                             This file ;-)
Makefile                              Linux / macOS Makefile
lzfse.xcodeproj                       Xcode project

src/lzfse.h                           Main LZFSE header
src/lzfse_tunables.h                  LZFSE encoder configuration
src/lzfse_internal.h                  LZFSE internal header
src/lzfse_decode.c                    LZFSE decoder API entry point
src/lzfse_encode.c                    LZFSE encoder API entry point
src/lzfse_decode_base.c               LZFSE decoder internal functions
src/lzfse_encode_base.c               LZFSE encoder internal functions
src/lzfse_encode_tables.h             LZFSE encoder tables

src/lzfse_fse.h                       FSE entropy encoder/decoder header
src/lzfse_fse.c                       FSE entropy encoder/decoder functions

src/lzvn_decode_base.h                LZVN decoder
src/lzvn_decode_base.c
src/lzvn_encode_base.h                LZVN encoder
src/lzvn_encode_base.c

src/lzfse_main.c                      Command line tool

Building on OS X

$ xcodebuild install DSTROOT=/tmp/lzfse.dst

Produces the following files in /tmp/lzfse.dst:

usr/local/bin/lzfse                   command line tool
usr/local/include/lzfse.h             LZFSE library header
usr/local/lib/liblzfse.a              LZFSE library

Building on Linux

Tested on Ubuntu 15.10 with gcc 5.2.1 and clang 3.6.2. Should work on any recent distribution.

$ make install INSTALL_PREFIX=/tmp/lzfse.dst/usr/local

Produces the following files in /tmp/lzfse.dst:

usr/local/bin/lzfse                   command line tool
usr/local/include/lzfse.h             LZFSE library header
usr/local/lib/liblzfse.a              LZFSE library

Building with cmake

$ mkdir build
$ cd build
$ cmake ..
$ make install

Installs the header, library, and command line tool in /usr/local.

Bindings

Python: dimkr/pylzfse

Comments
  • Fix labels

    The Squash unit tests were failing on Windows. It looks like I made a cut-and-paste error in the Python script I used to create the labels for the switch statement, so eos was swapped with one of the nop labels.

    opened by jibsen 9
  • License and Usage Approval Questions

    So far LZFSE seems to be quite a nice alternative to zlib. But I hope I'm not the only one who is uneasy about using a proprietary, copyrighted algorithm whose terms might prevent me from using this software in the future.

    My questions so far are:

    1. Will there be an RFC for LZFSE, or any intention to standardize this algorithm publicly so that cross-platform adoption is possible?
    2. Will LZFSE be distributed under another license (like Apache, BSD or MIT)?
    3. What kind of patents held by Apple are involved in the usage and adoption of LZFSE?

    Thanks in advance.

    opened by cookiengineer 8
  • Allow LZVN to work with inputs shorter than 8 bytes.

    Please review this carefully, I'm not particularly confident of the change.

    I'd really like to be able to expose LZVN in Squash, but the requirement that the input buffer be at least 8 bytes is a problem. Output isn't a big deal since I can just compress to a temporary 8-byte buffer, then copy it over to the "real" output buffer if there is enough room.

    After poking around the code a bit, it occurs to me that it should be possible to just skip the compression part and write the entire buffer as a literal, just like you already do for the trailing few bytes. This patch, I think, does that. I'm not very familiar with the code base, or the compression format, though, so please be careful applying this.

    I've modified the Squash LZFSE plugin to also expose LZVN, and with this change it does pass all Squash's unit tests, which is about all I can do. Here is the diff for the changes to Squash in case you want to play with it:

    diff --git a/plugins/lzfse/squash-lzfse.c b/plugins/lzfse/squash-lzfse.c
    index f72324f..39514e5 100644
    --- a/plugins/lzfse/squash-lzfse.c
    +++ b/plugins/lzfse/squash-lzfse.c
    @@ -34,6 +34,8 @@
    
     #include "lzfse/src/lzfse.h"
     #include "lzfse/src/lzfse_internal.h"
    +#include "lzfse/src/lzvn_encode_base.h"
    +#include "lzfse/src/lzvn_decode_base.h"
    
     SQUASH_PLUGIN_EXPORT
     SquashStatus squash_plugin_init_codec (SquashCodec* codec, SquashCodecImpl* impl);
    @@ -101,14 +103,101 @@ squash_lzfse_compress_buffer (SquashCodec* codec,
       return SQUASH_OK;
     }
    
    +static size_t
    +squash_lzvn_get_max_compressed_size (SquashCodec* codec, size_t uncompressed_size) {
    +  return (uncompressed_size) + (uncompressed_size / 80) + 16;
    +}
    +
    +static SquashStatus
    +squash_lzvn_decompress_buffer (SquashCodec* codec,
    +                               size_t* decompressed_size,
    +                               uint8_t decompressed[SQUASH_ARRAY_PARAM(*decompressed_size)],
    +                               size_t compressed_size,
    +                               const uint8_t compressed[SQUASH_ARRAY_PARAM(compressed_size)],
    +                               SquashOptions* options) {
    +  lzvn_decoder_state decoder = { 0, };
    +
    +  decoder.src = compressed;
    +  decoder.src_end = compressed + compressed_size;
    +
    +  decoder.dst = decompressed;
    +  decoder.dst_begin = decompressed;
    +  decoder.dst_end = decompressed + *decompressed_size;
    +  decoder.dst_current = decompressed;
    +
    +  lzvn_decode (&decoder);
    +
    +  const size_t bytes_read = decoder.src - compressed;
    +  const size_t bytes_written = decoder.dst - decoder.dst_begin;
    +
    +  if (SQUASH_UNLIKELY(bytes_read != compressed_size)) {
    +    if (bytes_written == *decompressed_size)
    +      return SQUASH_BUFFER_FULL;
    +    else
    +      return SQUASH_FAILED;
    +  }
    +
    +  *decompressed_size = bytes_written;
    +
    +  return SQUASH_OK;
    +}
    +
    +static SquashStatus
    +squash_lzvn_compress_buffer (SquashCodec* codec,
    +                             size_t* compressed_size,
    +                             uint8_t compressed[SQUASH_ARRAY_PARAM(*compressed_size)],
    +                             size_t uncompressed_size,
    +                             const uint8_t uncompressed[SQUASH_ARRAY_PARAM(uncompressed_size)],
    +                             SquashOptions* options) {
    +  uint8_t outbuf[LZVN_ENCODE_MIN_DST_SIZE];
    +  uint8_t* dest;
    +  size_t dest_l;
    +
    +  if (SQUASH_UNLIKELY(*compressed_size < sizeof(outbuf))) {
    +    dest = outbuf;
    +    dest_l = sizeof(outbuf);
    +  } else {
    +    dest = compressed;
    +    dest_l = *compressed_size;
    +  }
    +
    +  void* workmem = squash_malloc (LZVN_ENCODE_WORK_SIZE);
    +  if (SQUASH_UNLIKELY(workmem == NULL))
    +    return squash_error (SQUASH_MEMORY);
    +
    +  dest_l =
    +    lzvn_encode_buffer(dest, dest_l,
    +                       uncompressed, uncompressed_size,
    +                       workmem);
    +
    +  squash_free (workmem);
    +
    +  if (SQUASH_UNLIKELY(dest_l == 0))
    +    return squash_error (SQUASH_BUFFER_FULL);
    +
    +  if (SQUASH_UNLIKELY(dest == outbuf)) {
    +    if (*compressed_size < dest_l)
    +      return squash_error (SQUASH_BUFFER_FULL);
    +
    +    memcpy (compressed, dest, dest_l);
    +  }
    +
    +  *compressed_size = dest_l;
    +  return SQUASH_OK;
    +}
    +
     SquashStatus
     squash_plugin_init_codec (SquashCodec* codec, SquashCodecImpl* impl) {
       const char* name = squash_codec_get_name (codec);
    
    -  if (SQUASH_LIKELY(strcmp ("lzfse", name) == 0)) {
    +  if (strcmp ("lzfse", name) == 0) {
         impl->get_max_compressed_size = squash_lzfse_get_max_compressed_size;
         impl->decompress_buffer = squash_lzfse_decompress_buffer;
         impl->compress_buffer = squash_lzfse_compress_buffer;
    +  } else if (strcmp ("lzvn", name) == 0) {
    +    impl->get_max_compressed_size = squash_lzvn_get_max_compressed_size;
    +    impl->decompress_buffer = squash_lzvn_decompress_buffer;
    +    impl->compress_buffer = squash_lzvn_compress_buffer;
       } else {
         return SQUASH_UNABLE_TO_LOAD;
       }
    diff --git a/plugins/lzfse/squash.ini b/plugins/lzfse/squash.ini
    index 11f1324..e3fcf60 100644
    --- a/plugins/lzfse/squash.ini
    +++ b/plugins/lzfse/squash.ini
    @@ -1,3 +1,4 @@
     license=BSD3
    
     [lzfse]
    +[lzvn]
    
    opened by nemequ 7
  • MSVC support

    LZFSE currently fails to build with MSVC. See https://ci.appveyor.com/project/quixdb/squash/build/470/job/1fvyqd3lrkrpem6t#L1388 for a build log.

    @jibsen kindly looked into this, and it seems the major issue is reliance on a gcc extension in lzvn_decode_base.c.

    opened by nemequ 6
  • Library release versioning

    Hi, I noticed the project doesn't have proper releases, which makes it hard to build packages for third-party distribution.

    In particular, I was adding a package for this library to conda-forge (see https://github.com/conda-forge/staged-recipes/pull/2596), where having versions is more or less required. Would it be possible to have tags (and optionally releases) that we can refer to?

    CC @jakirkham

    opened by rmax 4
  • Add CMake build system.

    As discussed in #15, rebased on master. Still wondering about the answers to the questions I asked there, but this should at least be a good starting point.

    FWIW, I'm willing to help with maintenance on this, just ping me if you need something.

    opened by nemequ 4
  • Consider requiring scratch memory to already be zeroed

    One potential optimization is to require that memory passed to the encode and decode functions already be zeroed, which would let you skip the memset call. Calling calloc can be much faster than malloc + memset. For people doing lots of small operations without reusing memory, this could result in a significant speedup (I've seen big boosts in other codecs' performance from doing this).

    wontfix 
    opened by nemequ 4
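The suggestion in the issue above compares two ways of obtaining zeroed scratch memory. The following is a minimal sketch of both, not code from lzfse: `SCRATCH_SIZE`, `scratch_malloc_memset`, `scratch_calloc` and `all_zero` are hypothetical names chosen for illustration, and the sketch makes no claim about how much faster either variant is on a given platform.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scratch size, chosen only for illustration. */
#define SCRATCH_SIZE ((size_t)1 << 16)

/* Variant 1: allocate, then clear explicitly. This is what a codec has to
 * do when it cannot assume anything about the memory it is handed. */
static void *scratch_malloc_memset(size_t n) {
  assert(n > 0);
  void *p = malloc(n);
  if (p != NULL)
    memset(p, 0, n);
  return p;
}

/* Variant 2: ask the allocator for memory that is already zeroed, so the
 * codec itself would not need a separate memset pass. */
static void *scratch_calloc(size_t n) {
  assert(n > 0);
  return calloc(1, n);
}

/* Returns 1 if all n bytes at p are zero, 0 otherwise. */
static int all_zero(const void *p, size_t n) {
  const uint8_t *b = p;
  for (size_t i = 0; i < n; i++)
    if (b[i] != 0)
      return 0;
  return 1;
}
```

Both helpers return memory in the same state; the difference the issue points at is only who pays for the clearing, and an API that required pre-zeroed scratch would shift that responsibility to the caller.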
  • Fix #47

    Hi. Fix for the above issue. I hope the code and comments are reasonably clear.

    Cause. The main decode loop cannot differentiate between an invalid decode input and a destination buffer that is too small. When it encounters an invalid input, it keeps allocating exponentially larger buffers until malloc can allocate no more, at which point it fails with the dreaded malloc: Cannot allocate memory error.

    Fix. We fail fast on zero-length inputs, which are always invalid. This obviates the allocation of zero-length destination buffers. We can now interpret a zero return value from lzfse_decode_buffer as a decode failure. The API docs are a bit vague with regard to invalid inputs, but as we can see here, lzfse_decode_buffer returns zero in the case of failure.

    opened by shampoofactory 2
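The cause and fix described in the pull request above can be sketched as a bounded retry loop. This is an illustration under stated assumptions, not the patch itself: `decode_fn`, `decode_with_growth`, `fake_expand4` and `fake_always_fails` are hypothetical names, and the stand-in decoders only mimic the convention described in the thread (0 means failure, a return value equal to the buffer size means the output may have been truncated).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical decoder signature: returns bytes written, 0 on failure,
 * and dst_size when the output did not fit. */
typedef size_t (*decode_fn)(uint8_t *dst, size_t dst_size,
                            const uint8_t *src, size_t src_size);

/* Decode into a malloc'd buffer, growing it only while the decoder reports
 * a full buffer. Returns NULL on invalid input, allocation failure, or when
 * the output would exceed max_size. */
static uint8_t *decode_with_growth(decode_fn decode, const uint8_t *src,
                                   size_t src_size, size_t max_size,
                                   size_t *out_size) {
  assert(decode != NULL && out_size != NULL);
  if (src_size == 0 || max_size == 0)
    return NULL; /* zero-length input is never valid: fail fast */

  size_t cap = src_size < max_size ? src_size : max_size;
  uint8_t *dst = malloc(cap);
  if (dst == NULL)
    return NULL;

  for (;;) {
    size_t n = decode(dst, cap, src, src_size);
    if (n == 0) { /* decode failure: growing the buffer cannot help */
      free(dst);
      return NULL;
    }
    if (n < cap) { /* everything fit */
      *out_size = n;
      return dst;
    }
    if (cap >= max_size) { /* still truncated at the cap: give up */
      free(dst);
      return NULL;
    }
    size_t new_cap = cap > max_size / 2 ? max_size : cap * 2;
    uint8_t *grown = realloc(dst, new_cap);
    if (grown == NULL) {
      free(dst);
      return NULL;
    }
    dst = grown;
    cap = new_cap;
  }
}

/* Stand-in decoder: writes every input byte four times. */
static size_t fake_expand4(uint8_t *dst, size_t dst_size,
                           const uint8_t *src, size_t src_size) {
  size_t n = 0;
  for (size_t i = 0; i < src_size; i++)
    for (int k = 0; k < 4; k++) {
      if (n == dst_size)
        return dst_size; /* out of room: report a full buffer */
      dst[n++] = src[i];
    }
  return n;
}

/* Stand-in decoder that treats every input as invalid. */
static size_t fake_always_fails(uint8_t *dst, size_t dst_size,
                                const uint8_t *src, size_t src_size) {
  (void)dst; (void)dst_size; (void)src; (void)src_size;
  return 0;
}
```

The two design points are the ones the pull request names: a zero return ends the loop instead of triggering another allocation, and an explicit `max_size` (an addition of this sketch) keeps a misbehaving input from exhausting memory.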
  • UBsan flags 2 loops in lzfse_decode_base.c with "Pointer Overflow" warnings

    Hi, I've compiled the most recent lzfse library using Xcode 11.7 and ran it with the UB sanitizer enabled. It flagged these two warnings when I ran it against my test data:

    file lzfse_decode_base.c lines 240-241: for (size_t i = 0; i < M; i++) dst[i] = dst[i - D];

    UBsan flags line 241 as "Thread 1: Pointer overflow" "Addition of unsigned offset to 0x000106a85801 overflowed to 0x000106a85800"

    variables: D = 1 M = 1023 i = 0

    The actual problem is that this code copies bytes between overlapping buffers and ends up performing the copy incorrectly.

    In this specific case, it first copies dst[-1] to dst[0]; on the next loop iteration it copies dst[0] to dst[1], which by then holds the same value as dst[-1]. That is almost certainly not what the code's author intended, because the comment above this code describes "...a more careful path that applies a permutation to account for the possible overlap between source and destination if the distance is small", which refers to this loop.

    I believe the code should be performing the equivalent of

    memmove(dst, dst - D, M);

    which is what the fast code path above it does (the fast code path assumes that the buffers do not overlap however so it performs its work using a memcpy-like loop).

    I've patched my working copy of the source file to use memmove and this stopped the warning from being generated.

    There is another UBsan warning in the same file on line 280-281:

        for (size_t i = 0; i < M; i++)
          dst[i] = dst[i - D];
    

    UBsan flags line 281 as "Thread 1: Pointer overflow" "Addition of unsigned offset to 0x000106b790f4 overflowed to 0x000106b3f0f4"

    with variables:

    M = 1801 D = 237568 i = 0

    which clearly is a non-overlapping copy so the only issue here is the way the code appears to access invalid array elements.

    I've patched my working copy of the source file to use memcpy(dst, dst - D, M) and this stopped the warning from being generated.

    I will also note that lines 294-295 are another loop similar to these 2 and may also generate a similar warning (but my test data did not cause this code to execute).

    Cheers.

    opened by chuchusoft 2
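The two copy strategies discussed in the issue above give different results once the source and destination overlap, which a small stand-alone sketch makes visible. The function names `copy_match_forward` and `copy_match_memmove` are hypothetical and this is not the lzfse decoder code. In LZ77-style formats generally, a match whose distance is smaller than its length conventionally repeats the preceding bytes; the thread itself does not settle which behaviour lzfse expects. The forward variant also forms the source pointer once as `dst - D`, which avoids the wrapped unsigned index `i - D` that the report says UBsan flagged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Forward byte-by-byte copy of M bytes from D bytes behind dst. The source
 * pointer is computed once, so no index passes through unsigned wraparound.
 * When D < M, bytes written earlier in the loop are read back, which
 * repeats the last D bytes over and over. */
static void copy_match_forward(uint8_t *dst, size_t D, size_t M) {
  assert(D > 0);
  const uint8_t *src = dst - D;
  for (size_t i = 0; i < M; i++)
    dst[i] = src[i];
}

/* memmove-based copy: behaves as if the source range were snapshotted
 * first, so only bytes that existed before the call are copied. */
static void copy_match_memmove(uint8_t *dst, size_t D, size_t M) {
  assert(D > 0);
  memmove(dst, dst - D, M);
}
```

Starting from an 8-byte buffer holding "ab" followed by six zero bytes, and copying to offset 2 with D = 2 and M = 6, the forward loop yields "abababab" while the memmove variant yields "abab" followed by four zero bytes. For the non-overlapping case in the report (D much larger than M), both produce the same bytes.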
  • Clarify patent situation

    I would appreciate some sort of clarification on LZFSE's patent situation. At minimum, if Apple believes it has any patents relevant to LZFSE they should disclose them, and if Apple doesn't have any relevant patents it should provide a statement to that effect (Google recently did something similar for Brotli).

    Ideally, though, I would like to see some sort of patent license grant for any patents it owns or acquires which read on LZFSE, at least for the purpose of using LZFSE.

    opened by nemequ 2
  • [Question] How to build this as dylib (consumable in a .NET Core project)

    How can I build this as a dylib on macOS, rather than a .a (static library)?

    I've tried the following with no success:

    • Passing the -DBUILD_SHARED_LIBS=ON parameter to xcodebuild
    • Explicitly specifying "Dynamic Library" as the "Mach-O Type": https://take.ms/4Tzj6
    opened by alex-swiftify 1
  • lzfse decode issue - msys2 build

    Compiles successfully under MSYS2 (Windows 10 x64). Encoding works fine; the only issue is with decoding:

    When decoding a 14 MB file, RAM usage grows enormously (5 GB+) and then the tool outputs:

    malloc: Not enough space
    
    opened by P5-2005 0
  • Add vcpkg installation instructions

    lzfse is available as a port in vcpkg, a C++ library manager that simplifies installation for lzfse and other project dependencies. Documenting the install process here will help users get started by providing a single set of commands to build lzfse, ready to be included in their projects.

    We also test whether our library ports build in various configurations (dynamic, static) on various platforms (OSX, Linux, Windows: x86, x64) to keep a wide coverage for users.

    I'm a maintainer for vcpkg, and here is what the port script looks like. We try to keep the library maintained as close as possible to the original library. :)

    opened by JonLiu1993 0
  • Add cpack support to generate packages

    This commit enables CPack for CMake to generate target packages for each platform: RPMs and DEBs. It also generates packages for Mac and Windows platforms. The commit includes the regular packages and devel packages with the LZFSE header. To generate the package, you should pass -DBUILD_PACKAGING=ON and to generate devel packages you should also include -DINSTALL_HEADERS.

    Signed-off-by: Julio Faracco

    opened by jcfaracco 1
  • Fix README link to Compression library

    Fixes a broken link to the Compression library documentation on developer.apple.com. Also fixes a remaining instance of "OS X" to "macOS". I didn't change the initial one since 10.11 was still correctly called OS X.

    opened by zebe 0
Advanced DXTc texture compression and transcoding library

crunch/crnlib v1.04 - Advanced DXTn texture compression library Public Domain - Please see license.txt. Portions of this software make use of public d

null 775 Dec 26, 2022
Small strings compression library

SMAZ - compression for very small strings ----------------------------------------- Smaz is a simple compression library suitable for compressing ver

Salvatore Sanfilippo 1k Dec 28, 2022
A massively spiffy yet delicately unobtrusive compression library.

ZLIB DATA COMPRESSION LIBRARY zlib 1.2.11 is a general purpose data compression library. All the code is thread safe. The data format used by the z

Mark Adler 4.1k Dec 30, 2022
A simple C library implementing the compression algorithm for isosceles triangles.

orvaenting Summary A simple C library implementing the compression algorithm for isosceles triangles. License This project's license is GPL 2 (as of J

Kevin Matthes 0 Apr 1, 2022
Brotli compression format

SECURITY NOTE Please consider updating brotli to version 1.0.9 (latest). Version 1.0.9 contains a fix to "integer overflow" problem. This happens when

Google 11.7k Dec 29, 2022
Extremely Fast Compression algorithm

LZ4 - Extremely fast compression LZ4 is lossless compression algorithm, providing compression speed > 500 MB/s per core, scalable with multi-cores CPU

lz4 7.9k Dec 31, 2022
Zstandard - Fast real-time compression algorithm

Zstandard, or zstd as short version, is a fast lossless compression algorithm, targeting real-time compression scenarios at zlib-level and better comp

Facebook 19.2k Jan 1, 2023
Lossless data compression codec with LZMA-like ratios but 1.5x-8x faster decompression speed, C/C++

LZHAM - Lossless Data Compression Codec Public Domain (see LICENSE) LZHAM is a lossless data compression codec written in C/C++ (specifically C++03),

Rich Geldreich 641 Dec 22, 2022
A bespoke sample compression codec for 64k intros

pulsejet A bespoke sample compression codec for 64K intros codec pulsejet lifts a lot of ideas from Opus, and more specifically, its CELT layer, which

logicoma 34 Jul 25, 2022
A variation of CredBandit that uses compression to reduce the size of the data that must be transmitted.

compressedCredBandit compressedCredBandit is a modified version of anthemtotheego's proof of concept Beacon Object File (BOF). This version does all t

Conor Richard 18 Sep 22, 2022
Data compression utility for minimalist demoscene programs.

bzpack Bzpack is a data compression utility which targets retrocomputing and demoscene enthusiasts. Given the artificially imposed size limits on prog

Milos Bazelides 20 Jul 27, 2022
gzip (GNU zip) is a compression utility designed to be a replacement for 'compress'

gzip (GNU zip) is a compression utility designed to be a replacement for 'compress'

ACM at UCLA 8 Nov 6, 2022
Better lossless compression than PNG with a simpler algorithm

Zpng Small experimental lossless photographic image compression library with a C API and command-line interface. It's much faster than PNG and compres

Chris Taylor 214 Dec 23, 2022
archiver is a compressing/decompressing tool made for educational purposes

archiver archiver is a compressing/decompressing tool made for educational purposes (specifically, it was a hometask given at a C++ course in the H

Ihor Chovpan 0 Sep 19, 2022
A C++ static library offering a clean and simple interface to the 7-zip DLLs.

bit7z A C++ static library offering a clean and simple interface to the 7-zip DLLs Supported Features • Getting Started • Download • Requirements • Bu

Riccardo 326 Jan 1, 2023
miniz: Single C source file zlib-replacement library, originally from code.google.com/p/miniz

Miniz Miniz is a lossless, high performance data compression library in a single source file that implements the zlib (RFC 1950) and Deflate (RFC 1951

Rich Geldreich 1.6k Jan 5, 2023
Fork of the popular zip manipulation library found in the zlib distribution.

minizip-ng 3.0.0 minizip-ng is a zip manipulation library written in C that is supported on Windows, macOS, and Linux. Developed and maintained by Nat

zlib-ng 971 Jan 4, 2023
Fork of the popular zip manipulation library found in the zlib distribution.

minizip-ng 3.0.1 minizip-ng is a zip manipulation library written in C that is supported on Windows, macOS, and Linux. Developed and maintained by Nat

zlib-ng 971 Jan 4, 2023
PhysFS++ is a C++ wrapper for the PhysicsFS library.

PhysFS++ PhysFS++ is a C++ wrapper for the excellent PhysicsFS library by Ryan C. Gordon and others. It is licensed under the zlib license - same as P

Kevin Howell 80 Oct 25, 2022