zlib replacement with optimizations for "next generation" systems.

Overview

zlib-ng

zlib data compression library for the next generation systems

Maintained by Hans Kristian Rosbach aka Dead2 (zlib-ng àt circlestorm dót org)


Features

  • Zlib compatible API with support for dual-linking
  • Modernized native API based on zlib API for ease of porting
  • Modern C99 syntax and a clean code layout
  • Deflate medium and quick algorithms based on Intel's zlib fork
  • Support for CPU intrinsics when available
    • Adler32 implementation using SSSE3, AVX2, Neon & VSX
    • CRC32-B implementation using PCLMULQDQ & ACLE
    • Hash table implementation using CRC32-C intrinsics on x86 and ARM
    • Slide hash implementations using SSE2, AVX2, Neon & VSX
    • Compare256/258 implementations using SSE4.2 & AVX2
    • Inflate chunk copying using SSE2, AVX2 & Neon
    • Support for hardware-accelerated deflate using IBM Z DFLTCC
  • Unaligned memory read/writes and large bit buffer improvements
  • Includes improvements from Cloudflare and Intel forks
  • Configure, CMake, and NMake build system support
  • Comprehensive set of CMake unit tests
  • Code sanitizers, fuzzing, and coverage
  • GitHub Actions continuous integration on Windows, macOS, and Linux
    • Emulated CI for ARM, AARCH64, PPC, PPC64, SPARC64, S390x using qemu

History

The motivation for this fork came after seeing several 3rd party contributions containing new optimizations not getting implemented into the official zlib repository.

Mark Adler has been maintaining zlib for a very long time. He has done a great job, and hopefully he will continue for a long time yet. The idea of zlib-ng is not to replace zlib, but to co-exist as a drop-in replacement with a lower threshold for code change.

zlib has a long history and is incredibly portable, even supporting lots of systems that predate the Internet. This is great, but it does complicate further development and maintainability. The zlib code has numerous workarounds for old compilers that do not understand ANSI-C or to accommodate systems with limitations such as operating in a 16-bit environment.

Many of these workarounds are only maintenance burdens, and some of them are pretty huge code-wise. For example, the [v]s[n]printf workaround code has a whopping 8 different implementations just to cater to various old compilers. With this many workarounds scattered throughout the code, new programmers with an idea for or interest in zlib will need to take some time to figure out why all of these seemingly strange things are used, and how to work within those confines.

So I decided to make a fork, merge all the Intel optimizations, merge the Cloudflare optimizations that did not conflict, plus a couple of other smaller patches. Then I started cleaning out workarounds, various dead code, all contrib and example code as there is little point in having those in this fork for various reasons.

A lot of improvements have gone into zlib-ng since its start, and numerous people and companies have contributed both small and big improvements, or valuable testing.

Please read LICENSE.md; it is very simple and very liberal.

Build

There are two ways to build zlib-ng:

CMake

To build zlib-ng using the cross-platform makefile generator CMake:

cmake .
cmake --build . --config Release
ctest --verbose -C Release

Alternatively, you can use ccmake, the curses-based CMake configuration interface:

ccmake .

Configure

To build zlib-ng using the bash configure script:

./configure
make
make test

Build Options

| CMake | configure | Description | Default |
|---|---|---|---|
| ZLIB_COMPAT | --zlib-compat | Compile with zlib compatible API | OFF |
| ZLIB_ENABLE_TESTS | | Build test binaries | ON |
| WITH_GZFILEOP | --without-gzfileops | Compile with support for gzFile related functions | ON |
| WITH_OPTIM | --without-optimizations | Build with optimisations | ON |
| WITH_NEW_STRATEGIES | --without-new-strategies | Use new strategies | ON |
| WITH_NATIVE_INSTRUCTIONS | --native | Compiles with full instruction set supported on this host (gcc/clang -march=native) | OFF |
| WITH_SANITIZER | --with-sanitizer | Build with sanitizer (memory, address, undefined) | OFF |
| WITH_FUZZERS | --with-fuzzers | Build test/fuzz | OFF |
| WITH_MAINTAINER_WARNINGS | | Build with project maintainer warnings | OFF |
| WITH_CODE_COVERAGE | | Enable code coverage reporting | OFF |

Install

WARNING: We do not recommend manually installing unless you really know what you are doing, because this can potentially override the system default zlib library, and any incompatibility or wrong configuration of zlib-ng can make the whole system unusable, requiring recovery or reinstall. If you still want a manual install, we recommend using the /opt/ path prefix.

For Linux distros, an alternative way to use zlib-ng (if compiled in zlib-compat mode) instead of zlib, is through the use of the LD_PRELOAD environment variable. If the program is dynamically linked with zlib, then zlib-ng will temporarily be used instead by the program, without risking system-wide instability.

LD_PRELOAD=/opt/zlib-ng/libz.so.1.2.11.zlib-ng /usr/bin/program

CMake

To install zlib-ng system-wide using CMake:

cmake --build . --target install

Configure

To install zlib-ng system-wide using the configure script:

make install

Contributing

zlib-ng aims to be open to contributions, and we would be delighted to receive pull requests on GitHub. Just remember that any code you submit must be your own and it must be zlib licensed. Help with testing and reviewing pull requests is also very much appreciated.

If you are interested in contributing, please consider joining our IRC channel #zlib-ng on the Freenode IRC network.

Acknowledgments

Thanks to Servebolt.com for sponsoring my maintainership of zlib-ng.

Thanks go out to all the people and companies who have taken the time to contribute code reviews, testing and/or patches. Zlib-ng would not have been nearly as good without you.

The deflate format used by zlib was defined by Phil Katz. The deflate and zlib specifications were written by L. Peter Deutsch.

zlib was originally created by Jean-loup Gailly (compression) and Mark Adler (decompression).

Advanced Build Options

| CMake | configure | Description | Default |
|---|---|---|---|
| ZLIB_DUAL_LINK | | Dual link tests with system zlib | OFF |
| UNALIGNED_OK | | Allow unaligned reads | ON (x86, arm) |
| | --force-sse2 | Skip runtime check for SSE2 instructions (Always on for x86_64) | OFF (x86) |
| WITH_AVX2 | | Build with AVX2 intrinsics | ON |
| WITH_SSE2 | | Build with SSE2 intrinsics | ON |
| WITH_SSE4 | | Build with SSE4 intrinsics | ON |
| WITH_PCLMULQDQ | | Build with PCLMULQDQ intrinsics | ON |
| WITH_ACLE | --without-acle | Build with ACLE intrinsics | ON |
| WITH_NEON | --without-neon | Build with NEON intrinsics | ON |
| WITH_POWER8 | | Build with POWER8 optimisations | ON |
| WITH_DFLTCC_DEFLATE | --with-dfltcc-deflate | Build with DFLTCC intrinsics for compression on IBM Z | OFF |
| WITH_DFLTCC_INFLATE | --with-dfltcc-inflate | Build with DFLTCC intrinsics for decompression on IBM Z | OFF |
| WITH_UNALIGNED | | Allow optimizations that use unaligned reads if safe on current arch | ON |
| WITH_INFLATE_STRICT | | Build with strict inflate distance checking | OFF |
| WITH_INFLATE_ALLOW_INVALID_DIST | | Build with zero fill for inflate invalid distances | OFF |
| INSTALL_UTILS | | Copy minigzip and minideflate during install | OFF |

Related Projects

Comments
  • Adler32 Avx512 optimizations

    This PR adds AVX512 support, with 512-bit wide operations as well as VNNI-based multiply-adds for the 8-bit integer operations (on CPUs that support it).

    optimization Architecture 
    opened by KungFuJesus 134
  • Revert inflate window reorganization

    This PR removes the changes in PR #1033. Most of the performance benefits of those changes were due to the use of pclmulqdq when calculating the crc32. Now that we have PR #1161 that incorporates pclmulqdq into the functable.crc32 function, there is no need for reorganizing the inflate window. I think the inflate window was originally reorganized simply to handle the extra memory copy that the pclmulqdq version of crc32 required. According to the discussion on #1173, reverting these changes should also benefit png decoding performance.

    DEVELOP https://github.com/zlib-ng/zlib-ng/commit/43dbfd6709fb3a8028430ea30f3da88fbeb3ced9

    OS: Windows 10 10.0.22000 AMD64
    CPU: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
    Tool: ../zlib-ng/build-develop/Release/minigzip.exe
    Levels: 0-9
    Runs: 70         Trim worst: 40
    
     Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
     0    100.008%      0.018/0.019/0.020/0.000        0.018/0.019/0.020/0.001       15,737,543
     1     54.166%      0.094/0.095/0.096/0.001        0.046/0.047/0.049/0.001        8,523,732
     2     43.871%      0.153/0.155/0.156/0.001        0.047/0.049/0.051/0.001        6,903,609
     3     42.387%      0.210/0.212/0.214/0.001        0.047/0.048/0.050/0.001        6,670,099
     4     41.647%      0.232/0.234/0.236/0.001        0.044/0.047/0.048/0.001        6,553,723
     5     41.216%      0.243/0.246/0.247/0.001        0.045/0.046/0.047/0.001        6,485,938
     6     41.037%      0.277/0.281/0.283/0.002        0.044/0.046/0.047/0.001        6,457,776
     7     40.778%      0.345/0.350/0.353/0.002        0.045/0.046/0.047/0.001        6,416,919
     8     40.704%      0.433/0.438/0.440/0.002        0.044/0.046/0.047/0.001        6,405,244
     9     40.409%      0.514/0.518/0.521/0.002        0.045/0.046/0.048/0.001        6,358,951
    
     avg1  48.622%                        0.255                          0.044
     avg2  54.025%                        0.283                          0.049
     tot                                 76.445                         13.234       76,513,534
    
    OS: Windows 10 10.0.22000 AMD64
    CPU: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
    Tool: ../zlib-ng/build-develop/Release/minigzip.exe
    Levels: 0-9
    Runs: 70         Trim worst: 40
    
     Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
     0    100.008%      0.152/0.161/0.171/0.006        0.154/0.164/0.180/0.008      211,973,953
     1     44.409%      0.981/1.027/1.092/0.036        0.486/0.510/0.541/0.015       94,127,290
     2     35.518%      1.600/1.679/1.794/0.060        0.506/0.533/0.558/0.015       75,282,961
     3     33.882%      2.274/2.385/2.525/0.062        0.488/0.517/0.544/0.014       71,816,478
     4     33.174%      2.576/2.665/2.787/0.071        0.481/0.509/0.532/0.017       70,315,668
     5     32.660%      2.692/2.776/2.924/0.072        0.478/0.501/0.526/0.014       69,225,542
     6     32.508%      3.225/3.335/3.569/0.099        0.469/0.493/0.521/0.013       68,902,222
     7     32.255%      4.390/4.511/4.754/0.113        0.477/0.495/0.519/0.013       68,366,800
     8     32.167%      7.122/7.269/7.570/0.139        0.473/0.497/0.523/0.015       68,180,776
     9     31.887%      7.307/7.448/7.896/0.177        0.473/0.494/0.521/0.015       67,586,442
    
     avg1  40.847%                        3.326                          0.471
     avg2  45.385%                        3.695                          0.524
     tot                                997.655                        141.397      865,778,132
    

    PR https://github.com/zlib-ng/zlib-ng/pull/1178/commits/fa118e26e83ea62227b80f25248313a1cf42dcfe

    OS: Windows 10 10.0.22000 AMD64
    CPU: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
    Tool: ../zlib-ng/build-revert/Release/minigzip.exe
    Levels: 0-9
    Runs: 70         Trim worst: 40
    
     Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
     0    100.008%      0.018/0.020/0.021/0.001        0.017/0.018/0.019/0.001       15,737,543
     1     54.166%      0.092/0.094/0.096/0.001        0.047/0.048/0.049/0.000        8,523,732
     2     43.871%      0.154/0.156/0.157/0.001        0.049/0.050/0.051/0.001        6,903,609
     3     42.387%      0.208/0.212/0.215/0.001        0.047/0.048/0.049/0.001        6,670,099
     4     41.647%      0.231/0.233/0.235/0.001        0.046/0.047/0.048/0.001        6,553,723
     5     41.216%      0.235/0.239/0.240/0.001        0.046/0.047/0.049/0.001        6,485,938
     6     41.037%      0.271/0.276/0.278/0.001        0.045/0.047/0.048/0.001        6,457,776
     7     40.778%      0.344/0.349/0.353/0.002        0.046/0.047/0.048/0.001        6,416,919
     8     40.704%      0.434/0.438/0.441/0.002        0.045/0.047/0.047/0.001        6,405,244
     9     40.409%      0.511/0.516/0.519/0.002        0.045/0.047/0.048/0.001        6,358,951
    
     avg1  48.622%                        0.253                          0.045
     avg2  54.025%                        0.281                          0.050
     tot                                 75.976                         13.391       76,513,534
    
    OS: Windows 10 10.0.22000 AMD64
    CPU: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
    Tool: ../zlib-ng/build-revert/Release/minigzip.exe
    Levels: 0-9
    Runs: 70         Trim worst: 40
    
     Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
     0    100.008%      0.148/0.154/0.159/0.003        0.136/0.145/0.150/0.004      211,973,953
     1     44.409%      0.970/0.989/1.002/0.008        0.485/0.499/0.507/0.006       94,127,290
     2     35.518%      1.607/1.631/1.645/0.010        0.506/0.517/0.522/0.005       75,282,961
     3     33.882%      2.277/2.311/2.340/0.015        0.485/0.497/0.504/0.006       71,816,478
     4     33.174%      2.553/2.589/2.606/0.013        0.477/0.487/0.495/0.005       70,315,668
     5     32.660%      2.605/2.642/2.659/0.014        0.477/0.489/0.498/0.006       69,225,542
     6     32.508%      3.154/3.183/3.205/0.017        0.472/0.481/0.489/0.005       68,902,222
     7     32.255%      4.379/4.415/4.430/0.012        0.470/0.482/0.490/0.005       68,366,800
     8     32.167%      7.048/7.143/7.194/0.040        0.472/0.480/0.488/0.005       68,180,776
     9     31.887%      7.240/7.292/7.322/0.023        0.468/0.481/0.487/0.004       67,586,442
    
     avg1  40.847%                        3.235                          0.456
     avg2  45.385%                        3.594                          0.507
     tot                                970.505                        136.757      865,778,132
    
    optimization 
    opened by nmoinvaz 104
  • Implement Google Test framework

    • CMake now uses Google Test instead of example.
    • Configure/nmake no longer test example, but just use simple minigzip tests. This is because fetching Google Test repository might add more complexity than it is worth.
    • This resolves #1017 by testing each Adler32 and Crc32 variant available on the host platform.
    enhancement cleanup Continuous Integration 
    opened by nmoinvaz 97
  • Incorporate fast-zlib changes.

    This PR seeks to incorporate the fast-zlib changes mentioned in #97. The fast-zlib changes require a rolling hash to be used. They also perform best when the hash mask is 32k-1 instead of the 64k-1 used by the integer hashing method we currently use.

    I don't expect these changes to be incorporated into the first release. We might also decide to not incorporate them at all due to the added complexity. But this is the first step toward determining that. And in the future should anybody want to investigate this again they can use this PR as a starting base.
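For readers unfamiliar with the term, here is a minimal sketch of a rolling hash of the kind classic zlib uses, with the 15-bit (32k-1) mask mentioned above. The names roll_hash and hash_at are hypothetical, and the parameters are assumptions based on classic zlib defaults; this is not the code from this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed parameters: a 15-bit hash (mask 32k-1) and a 3-byte minimum match,
 * as in classic zlib at its default memory level. */
#define HASH_BITS  15u
#define HASH_MASK  ((1u << HASH_BITS) - 1u)
#define MIN_MATCH  3u
#define HASH_SHIFT ((HASH_BITS + MIN_MATCH - 1u) / MIN_MATCH)   /* 5 */

/* Roll one byte into the hash. Each update shifts the state left by
 * HASH_SHIFT bits, so after MIN_MATCH updates the oldest byte has been
 * pushed out past the mask. */
static uint32_t roll_hash(uint32_t h, uint8_t c) {
    return ((h << HASH_SHIFT) ^ c) & HASH_MASK;
}

/* Hash the MIN_MATCH bytes at p, starting from an arbitrary prior state. */
static uint32_t hash_at(uint32_t state, const uint8_t *p) {
    assert(p != NULL);
    uint32_t h = state & HASH_MASK;
    for (unsigned i = 0; i < MIN_MATCH; i++)
        h = roll_hash(h, p[i]);
    return h;
}
```

Because 3 updates of 5 bits each shift out all 15 bits of prior state, the hash at any position depends only on the last three bytes. That is what lets the hash be updated with one byte per position instead of being recomputed from scratch.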

    Benchmarks

    MADLER

     Tool: minigzip-madler.exe Levels: 1-9
     Runs: 70         Trim worst: 40
    
     Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
     1     36.448%      2.421/2.433/2.442/0.006        0.907/0.919/0.925/0.004       77,255,275
     2     35.381%      2.675/2.689/2.696/0.006        0.897/0.907/0.916/0.006       74,991,864
     3     34.426%      3.306/3.321/3.332/0.007        0.880/0.900/0.908/0.008       72,967,657
     4     33.498%      3.540/3.556/3.566/0.008        0.876/0.891/0.896/0.006       71,002,237
     5     32.627%      4.840/4.858/4.872/0.007        0.876/0.888/0.892/0.004       69,156,120
     6     32.188%      6.789/6.822/6.842/0.015        0.870/0.880/0.886/0.004       68,224,174
     7     32.051%      8.181/8.214/8.234/0.013        0.865/0.877/0.882/0.005       67,934,314
     8     31.935%   12.677/12.719/12.745/0.019        0.863/0.872/0.877/0.003       67,688,380
     9     31.916%   16.200/16.248/16.282/0.022        0.861/0.872/0.877/0.004       67,647,801
    
     avg1  33.385%                        6.762                          0.889
     tot                               1825.836                        240.135      636,867,822
    

    ZLIB-NG

    Tool: minigzip-ng.exe Levels: 1-9
    Runs: 70         Trim worst: 40
    
    Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
    1     47.618%      1.840/1.859/1.865/0.006        0.681/0.688/0.693/0.003      100,930,315
    2     35.519%      2.050/2.065/2.074/0.007        0.703/0.715/0.719/0.005       75,286,316
    3     34.198%      2.424/2.435/2.442/0.005        0.686/0.699/0.704/0.004       72,485,932
    4     32.928%      2.849/2.866/2.874/0.006        0.674/0.681/0.687/0.004       69,794,199
    5     32.661%      3.049/3.059/3.064/0.004        0.667/0.678/0.683/0.004       69,226,720
    6     32.507%      3.475/3.493/3.502/0.007        0.670/0.678/0.682/0.003       68,902,076
    7     32.255%      4.448/4.468/4.480/0.008        0.666/0.676/0.681/0.004       68,366,763
    8     32.167%      6.652/6.671/6.683/0.009        0.666/0.675/0.679/0.003       68,180,782
    9     32.156%      9.090/9.109/9.124/0.010        0.667/0.674/0.678/0.003       68,156,152
    
    avg1  34.668%                        4.003                          0.685
    tot                               1080.764                        184.972      661,329,255
    

    FAST-ZLIB

     Tool: minigzip-fast-zlib.exe Levels: 1-9
    Runs: 70         Trim worst: 40
    
    Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
    1     36.448%      2.325/2.350/2.359/0.008        0.908/0.918/0.926/0.005       77,255,275
    2     35.381%      2.517/2.535/2.543/0.006        0.889/0.909/0.917/0.008       74,991,864
    3     34.426%      3.072/3.087/3.097/0.008        0.893/0.901/0.908/0.005       72,967,657
    4     33.481%      3.359/3.377/3.386/0.007        0.874/0.891/0.896/0.005       70,966,161
    5     32.489%      4.366/4.377/4.387/0.006        0.870/0.882/0.887/0.005       68,863,411
    6     32.061%      5.148/5.173/5.188/0.011        0.865/0.878/0.882/0.004       67,955,060
    7     31.979%      5.433/5.459/5.474/0.011        0.868/0.875/0.879/0.003       67,782,174
    8     31.913%      5.889/5.909/5.921/0.010        0.863/0.872/0.878/0.004       67,641,467
    9     31.909%      6.116/6.155/6.174/0.014        0.865/0.872/0.876/0.003       67,634,262
    
    avg1  33.343%                        4.269                          0.889
    tot                               1152.638                        239.939      636,057,331
    

    ZLIB-NG-FAST

     Tool: minigzip-fast-ng.exe Levels: 1-9
    Runs: 70         Trim worst: 40
    
    Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
    1     47.618%      1.853/1.864/1.870/0.005        0.681/0.690/0.694/0.004      100,930,315
    2     35.519%      2.062/2.078/2.086/0.008        0.705/0.715/0.721/0.004       75,286,316
    3     34.198%      2.435/2.451/2.462/0.008        0.693/0.700/0.705/0.003       72,485,932
    4     32.928%      2.870/2.889/2.899/0.007        0.675/0.682/0.688/0.004       69,794,199
    5     32.661%      3.062/3.075/3.086/0.007        0.673/0.680/0.685/0.003       69,226,720
    6     32.507%      3.490/3.513/3.522/0.008        0.668/0.678/0.683/0.004       68,902,076
    7     32.255%      4.488/4.516/4.525/0.008        0.669/0.677/0.682/0.003       68,366,763
    8     32.167%      6.708/6.723/6.735/0.008        0.661/0.676/0.680/0.005       68,180,782
    9     32.003%      8.272/8.302/8.321/0.015        0.677/0.686/0.689/0.003       67,831,961
    
    avg1  34.651%                        3.935                          0.687
    tot                               1062.329                        185.535      661,005,064
    
    optimization Next Devel 
    opened by nmoinvaz 74
  • Split up memcopy by architecture

    This PR is the final result of my changes to break out architecture specific code into separate files.

    This code moves all of the memory chunk copying code into their own files and utilizes functable to reference them.

    enhancement Architecture 
    opened by nmoinvaz 74
  • AVX2 optimizations

    Since this is constant, anyway, we may as well use the variant that doesn't add vector register pressure, has better ILP opportunities, and has shorter instruction latency.

    optimization Architecture 
    opened by KungFuJesus 71
  • Speed up chunkcopy and memset

    This was found to have a significant impact on a highly compressible PNG for both the encode and decode. Some deltas show performance improving as much as 60%+.

    For the scenarios where the "dist" is not an even modulus of our chunk size, we simply repeat the bytes as many times as possible into our vector registers. We then copy the entire vector and then advance the quotient of our chunksize divided by our dist value.

    If dist happens to be 1, there's no reason to not just call memset from libc (this is likely to be just as fast if not faster).
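A scalar sketch of one reading of the approach described above. A 16-byte array stands in for a vector register, and the function name pattern_fill and the CHUNK constant are hypothetical. It assumes the advance per store is the largest whole multiple of dist that fits in a chunk; it is not the PR's SIMD code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 16u   /* stand-in for one SIMD register width (assumption) */

/* Fill len bytes at out by repeating the dist bytes that precede out,
 * which is what an overlapping LZ77 copy with a small distance produces. */
static void pattern_fill(uint8_t *out, size_t dist, size_t len) {
    assert(dist >= 1 && dist <= CHUNK);
    const uint8_t *from = out - dist;

    if (dist == 1) {            /* a run of one byte is just memset */
        memset(out, *from, len);
        return;
    }

    /* Build the chunk once: the pattern repeated across the whole chunk. */
    uint8_t chunk[CHUNK];
    for (size_t i = 0; i < CHUNK; i++)
        chunk[i] = from[i % dist];

    /* Advance by whole pattern periods so every store starts in phase. */
    size_t step = CHUNK - (CHUNK % dist);

    while (len >= CHUNK) {
        memcpy(out, chunk, CHUNK);   /* store the full chunk ... */
        out += step;                 /* ... but advance only by whole periods */
        len -= step;
    }
    memcpy(out, chunk, len);         /* tail shorter than one chunk */
}
```

Each store writes a few bytes past the advance point, but those bytes already hold the correct continuation of the pattern, so the next store simply overwrites them with the same values.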

    bug optimization 
    opened by KungFuJesus 69
  • Clean up longest_match variants and abstract match comparisons to compare258

    This PR cleans up all the longest_match variants by abstracting the memory comparison code into compare258 functions. There are static versions used by longest_match, which are in match_p.h, and non-static versions for use in deflate_quick. Changing deflate_quick to use functable.compare258 allows for a future PR that will make deflate_quick cross-platform.

    One major benefit of this PR is that the longest_match function can now use compare258 with SSE 4.2 intrinsics, whereas before it was only used by deflate_quick. I have also added a 64-bit version of unaligned compare258 for when SSE 4.2 is not supported.

    The benefits of this PR are:

    • Longest_match comparisons are abstracted out so that we can better support additional architectures
    • All deflate algorithms can now take advantage of intrinsics in memory comparison
      • Previously only deflate_quick used compare258_sse4.
    • Allows us to port deflate_quick to all other platforms
    • Adds support for 64-bit unaligned comparison on platforms that don't support SSE4 or AVX2
    • Unifies longest_match variants

    For background see #496. For previous PR see #544.
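To illustrate the 64-bit comparison idea, here is a portable sketch that compares eight bytes at a time and returns the match length, capped at 258. The name compare258_sketch is hypothetical and this is not the PR's implementation. Real implementations typically locate the mismatching byte with a count-trailing-zeros on the XOR of the two words; this sketch falls back to a byte loop so it works regardless of endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return how many leading bytes of a and b are equal, up to 258.
 * Both buffers must have at least 258 readable bytes. */
static uint32_t compare258_sketch(const uint8_t *a, const uint8_t *b) {
    assert(a != NULL && b != NULL);
    uint32_t len = 0;

    /* Compare 8 bytes per step; memcpy keeps the unaligned loads well defined. */
    while (len + 8 <= 256) {
        uint64_t wa, wb;
        memcpy(&wa, a + len, sizeof wa);
        memcpy(&wb, b + len, sizeof wb);
        if (wa != wb)
            break;
        len += 8;
    }

    /* Pin down the exact mismatch inside the differing word, or
     * check the final two bytes after 256 matching bytes. */
    while (len < 258 && a[len] == b[len])
        len++;
    return len;
}
```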

    enhancement optimization cleanup 
    opened by nmoinvaz 62
  • Improved adler32 NEON performance by 30-47%

    We unlocked some ILP by allowing for independent sums in the loop and reducing these sums outside of the loop. Additionally, the multiplication by 32 (now 64) is moved outside of this loop. We have enough register headroom that we could additionally track some more of the sums in the loop separately but it's suspected that the widening pairwise additions are the actual bottlenecks here. It might be possible with additional branching to do half of the pairwise additions by summing to 16 bit intermediates but this requires a fair bit of bookkeeping to get correct. The 16 bit intermediates every 128 loop iterations need to be summed into 32 bit values to not overflow.

    On the Odroid-N2, the Cortex-A73 cores are ~20-25% faster on the adler32 benchmark, and the Cortex-A53 cores are anywhere from 25-30% faster.
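The restructuring can be shown in scalar C. The sketch below keeps independent sums inside the loop, and applies the block-size multiplication and the modulo once per segment. BLOCK, SEGMENT and adler32_blocked are hypothetical names chosen for illustration; the actual PR uses NEON intrinsics and different block sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u   /* largest prime below 2^16 */
#define BLOCK      16u      /* bytes per loop iteration; stands in for one vector */
#define SEGMENT    4096u    /* bytes between modulo reductions; multiple of BLOCK */

/* Byte-at-a-time reference implementation. */
static uint32_t adler32_ref(uint32_t adler, const uint8_t *buf, size_t len) {
    uint32_t s1 = adler & 0xffff, s2 = adler >> 16;
    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + buf[i]) % ADLER_BASE;
        s2 = (s2 + s1) % ADLER_BASE;
    }
    return (s2 << 16) | s1;
}

/* Blocked variant: for a block of BLOCK bytes starting with s1 = A,
 * s2 grows by BLOCK*A + sum((BLOCK - i) * byte[i]). The BLOCK*A terms are
 * accumulated as a plain sum and multiplied once after the loop. */
static uint32_t adler32_blocked(uint32_t adler, const uint8_t *buf, size_t len) {
    assert(buf != NULL || len == 0);
    uint64_t s1 = adler & 0xffff, s2 = adler >> 16;

    while (len >= BLOCK) {
        size_t seg = len < SEGMENT ? len : SEGMENT;
        seg -= seg % BLOCK;
        len -= seg;

        uint64_t s1_starts = 0;   /* sum of s1 at the start of each block */
        uint64_t weighted = 0;    /* sum of (BLOCK - i) * byte[i] */
        for (; seg; seg -= BLOCK, buf += BLOCK) {
            uint64_t sum = 0;
            for (unsigned i = 0; i < BLOCK; i++) {
                sum += buf[i];
                weighted += (uint64_t)(BLOCK - i) * buf[i];
            }
            s1_starts += s1;      /* independent of this block's bytes */
            s1 += sum;
        }
        /* Multiplication and reduction happen outside the hot loop. */
        s2 = (s2 + BLOCK * s1_starts + weighted) % ADLER_BASE;
        s1 %= ADLER_BASE;
    }

    for (; len; len--, buf++) {   /* tail shorter than one block */
        s1 = (s1 + *buf) % ADLER_BASE;
        s2 = (s2 + s1) % ADLER_BASE;
    }
    return (uint32_t)((s2 << 16) | s1);
}
```

The three accumulators in the inner loop (sum, weighted, s1_starts) have no dependency on each other, which is the property that exposes instruction-level parallelism in the vectorized version.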

    optimization Architecture 
    opened by KungFuJesus 51
  • Inline adler copy

    This PR gives the adler checksum a similar treatment to the CRC32 folded calculation. Instead of checksumming followed by a memcpy, this performs the copies while doing the checksum.

    The code is written in such a way that the preprocessor is leveraged for templated functions that are 95% similar. This was chosen as an alternative to putting a branch in the tight loop body. This was definitely worthwhile.
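A scalar sketch of the fused idea: each byte is loaded once, stored to the destination, and folded into the checksum in the same pass. The name adler32_copy_sketch is hypothetical; the PR's versions are SIMD and generated via the preprocessor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u   /* largest prime below 2^16 */
#define ADLER_NMAX 5552u    /* max bytes before 32-bit sums must be reduced */

/* Copy len bytes from src to dst and return the updated Adler-32,
 * touching each source byte only once. */
static uint32_t adler32_copy_sketch(uint32_t adler, uint8_t *dst,
                                    const uint8_t *src, size_t len) {
    assert((dst != NULL && src != NULL) || len == 0);
    uint32_t s1 = adler & 0xffff, s2 = adler >> 16;

    while (len) {
        size_t n = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= n;
        while (n--) {
            uint8_t c = *src++;
            *dst++ = c;     /* the copy ... */
            s1 += c;        /* ... and the checksum in one pass */
            s2 += s1;
        }
        s1 %= ADLER_BASE;   /* deferred reduction, once per ADLER_NMAX bytes */
        s2 %= ADLER_BASE;
    }
    return (s2 << 16) | s1;
}
```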

    Some notes:

    • Attempts were made at trying to minimize the rebasing back to the adler base but in doing so, much more heap access was required, killing any gains
    • Architectures that benefit from SIMD alignment will not benefit from this if the penalty for unaligned access is as much as the gains here (this definitely includes pre-Nehalem Intel CPUs, all ARM so far (M1 may differ), and PowerPC). For those, a fallback C routine exists that calls their respective adler checksum followed by a memcpy
    • In testing this, it seems that there are a lot of cases where chunkmemset_sse2 is faster than chunkmemset_avx. I'll want to explore that more, possibly with my permutation table idea to derive a fast preloaded copy vector.
    • We're beating Chromium's zlib implementation now at just about every compression level most of the time. Sometimes there were traded blows but for the most part, we're anywhere from 2-10% faster for PNG decoding for realistic imagery. For insanely compressible things (where we don't spend nearly as much time in the machinations of inflate) we are more than twice as fast. Where we aren't probably has much to do with the fact that they don't dispatch to an architecture specific function table for copies (though, for when we have larger copy runs it might be why we're beating them by so much). Google's code mostly just inlines these routines and has unrolled copy loops in their inflate implementation. I'm fairly confident improving chunkmemset could still give us a pretty big advantage here, though.
    optimization Architecture 
    opened by KungFuJesus 49
  • Improvements to avx512 adler32 implementations

    Now that better benchmarks are in place, it became apparent that masked broadcast was not faster and it's actually faster to use vmovd, as suspected. Additionally, for the VNNI variant, we've unlocked some additional ILP by doing a second dot product in the loop to a different running sum that gets recombined later. This broke a data dependency chain and allowed the IPC to be ~2.75. The result is about a 40-50% improvement in runtime.

    Additionally, we now call the smaller SIMD variants when the input is too small, if they happen to be compiled in. This helps for very small inputs that are still large enough to fill a vector. For size 16 and 32 inputs I was seeing something like sub 10 ns instead of 50 ns.

    bug optimization Architecture 
    opened by KungFuJesus 49
  • Symbol versioning support for multiple versions of functions.

    This allows us to make changes to functions without losing backwards compatibility.

    With this we can:

    • Change types
    • Add/remove parameters
    • Change function or parameter meaning

    This means we can have one version of a function bound to 2.0.0 backwards compatibility while also having a function of the same name bound to 2.1.0 that differs from the old one.

    For example this can allow us to use the backwards incompatible crc32 changes zlib introduced, but limited to 2.1.0, while still keeping backwards compatibility with applications compiled against 2.0.0. This would make it easier for distros to upgrade from 2.0.0 to 2.1.0 since it would not require a full recompile of all the dependent applications at the same time.

    In the case of zlibng_version, the following is how the symbol is now exported:

    $ readelf --dyn-syms -W libz-ng.so|grep version
        43: 00000000000126d0     8 FUNC    GLOBAL DEFAULT   12 zlibng_version@@ZLIB_NG_2.1.0
        44: 00000000000126d0     8 FUNC    GLOBAL DEFAULT   12 zlibng_version
    
    enhancement Compatibility 
    opened by Dead2 8
  • [BROKEN] Stable pre release

    Once completed, reviewed, tested and verified, this will go on to become Stable version 2.0.7.

    If you think any important fixes are missing, tell me the PR and commit numbers.

    The following has not been backported (yet?):

    • DFLTCC related fixes and improvements @iii-i.
    • Most ARM64 fixes, especially related to the new Apple cpu line.
    • Some Power fixes
    • Most tests and CI changes

    These were too involved with code that has been completely rewritten since 2.0 and would need backporting efforts that target each of the above and bring them up to date with the most important fixes only. I accept PRs for necessary backports for these.

    We could perhaps do a more or less complete backport of the CI, as that would probably be easier than finding individual fixes, and also more maintainable for the future.

    Changes since 2.0.6:

    • Fix CVE-2022-37434 #1328
    • Fix chunkmemset #1196
    • Fix deflateBound too small #1236
    • Fix Z_SOLO #1263
    • Fix ACLE variant of crc32 #1274
    • Fix inflateBack #1311
    • Fix warnings #1194 #1312 #1362
    • MacOS build fix #1198
    • Add invalid windowBits handling #1293
    • Support for Force TZCNT #1186
    • Support for aligned_alloc() #1360
    • Minideflate improvements #1175 #1238
    • Don't use unaligned access for memcpy #1309
    • Build system #1209 #1233 #1267 #1273 #1278 #1292 #1316 #1318 #1365
    • CI and Test improvements #1184 #1208 #1227 #1241 #1334 #1353 #1368
    • Cleanup #1266
    • Documentation #1205 #1359
    • Misc improvements #1294 #1297 #1306 #1344 #1348
    Next Stable 
    opened by Dead2 10
  • Regression in 2.0.6 that corrupts git repositories when using git repack

    When using git-repack against zlib-ng 2.0.6 in compat mode, the result is corrupted, e.g.:

    Error: inflate: data stream error (incorrect data check)
    Error: failed to unpack compressed delta at offset 1459671 from .git/objects/pack/pack-7e786173f2a4faf21da0a3b61492da7620ae09cb.pack
    

    A bisect of 2.0.5 -> 2.0.6 points at this commit as the bad one:

    commit 8a378ba9b85e23a6e2e67b01a1b3d738e86faefe
    Author: Hans Kristian Rosbach <[email protected]>
    Date:   Mon Dec 13 22:30:58 2021 +0100
    
        Fix deflateBound and compressBound returning very small size estimates.
        Remove workaround in switchlevels.c, so we do actual testing of this.
        Use named defines instead of magic numbers where we can.
    

    Bisecting on the develop branch shows that the following commit fixes it:

    From 4fadf3c49e3dc98c5e1c0b86401324061d951d9f Mon Sep 17 00:00:00 2001
    From: Mika Lindqvist <[email protected]>
    Date: Wed, 6 Apr 2022 00:04:45 +0300
    Subject: [PATCH] Add one extra byte to return value of compressBound and
     deflateBound for small lengths due to shift returning 0. * Treat 0 byte input
     as 1 byte input when calculating compressBound and deflateBound
    

    That commit applies cleanly on 2.0.6. Please accept it into a future 2.0.x update.

    opened by dirkmueller 0
  • IBM zSystems DFLTCC: Do not update strm.adler for raw streams

    Commit d38dd9240f2d ("IBM Z DFLTCC: Fix updating strm.adler with inflate()") broke libxml2, as can be seen with the repro from [1]:

    $ echo "<a></a>" | gzip >file.xml.gz
    $ python3 -c 'import libxml2; libxml2.parseFile("file.xml.gz")'
    file.xml.gz:1: parser error : Document is empty
    

    This is because libxml2 expects strm.adler to be untouched for raw streams.

    Fix this and a similar issue in deflate by adding state->wrap checks. Add tests.
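The kind of state->wrap check described above might look like the following sketch. The struct and function names are hypothetical, simplified stand-ins rather than the actual zlib-ng or DFLTCC code. The only assumption taken from zlib is that wrap is 0 for raw deflate streams.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the stream and its internal state. */
struct sketch_state {
    int wrap;                 /* 0 = raw deflate, nonzero = zlib or gzip wrapper */
};

struct sketch_stream {
    uint32_t adler;           /* checksum field visible to the caller */
    struct sketch_state *state;
};

/* Publish a freshly computed checksum only when the stream has a wrapper;
 * for raw streams the caller-visible field is left untouched. */
void sketch_update_adler(struct sketch_stream *strm, uint32_t checksum) {
    if (strm->state->wrap != 0)
        strm->adler = checksum;
}
```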

    [1] https://bugzilla.redhat.com/show_bug.cgi?id=2155328
    [2] https://gitlab.gnome.org/GNOME/libxml2/-/blob/v2.10.3/xzlib.c#L607

    opened by iii-i 1
  • MSVC warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)

    The warning inflate.c(1356,9): warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?) was introduced in commit 3eab3173ac7d1d53457452f3cd1eaeea5b2d43df.

    This warning is present in CI output: https://github.com/zlib-ng/zlib-ng/actions/runs/3686988352/jobs/6239982515#step:12:37
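For context, here is a sketch of the general pattern that C4334 flags and the usual way to avoid it. This is not the actual code at inflate.c line 1356; the function names are hypothetical, and unsigned int is assumed to be 32 bits wide.

```c
#include <stdint.h>

/* The shift happens in 32-bit arithmetic and the result is then widened
 * to 64 bits. This is the pattern the warning reports.
 * Only defined for n < 32 (assuming a 32-bit unsigned int). */
uint64_t bit_narrow(unsigned n) {
    return 1u << n;
}

/* Widening the operand first makes the shift itself 64-bit.
 * Defined for n < 64. */
uint64_t bit_wide(unsigned n) {
    return (uint64_t)1 << n;
}
```

Both functions agree for small shift counts. Only the second is valid once n reaches 32 or more.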

    cc @iii-i

    opened by phprus 2
Releases(2.0.6)