Gzip header-only C++ library

Overview

gzip-hpp is a header-only C++ library for gzip compression and decompression. It was extracted from mapnik-vector-tile so that it can be used as a light-weight, standalone module.

Usage

// Include the specific gzip headers your code needs, for example...
#include <gzip/compress.hpp>
#include <gzip/config.hpp>
#include <gzip/decompress.hpp>
#include <gzip/utils.hpp>
#include <gzip/version.hpp>

// All function calls must pass in a pointer to an
// immutable character sequence (aka a string in C) and its size
std::string data = "hello";
const char * pointer = data.data();
std::size_t size = data.size();

// Check if compressed. Can check both gzip and zlib.
bool c = gzip::is_compressed(pointer, size); // false

// Compress returns a std::string
std::string compressed_data = gzip::compress(pointer, size);

// Decompress returns a std::string and decodes both zlib and gzip
const char * compressed_pointer = compressed_data.data();
std::string decompressed_data = gzip::decompress(compressed_pointer, compressed_data.size());

// Or like so
std::string compressed_data = gzip::compress(tile->data(), tile->data.size());

// Or like so
std::string value = gzip::compress(node::Buffer::Data(obj), node::Buffer::Length(obj));

// Or...etc
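
For intuition, a check in the spirit of `gzip::is_compressed` can be approximated by looking at the first two bytes of the buffer. The sketch below is not the library's implementation: `looks_compressed` is a hypothetical name, and the header rules are taken from RFC 1952 (gzip) and RFC 1950 (zlib).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of gzip-hpp.
// Returns true when the buffer starts with a gzip or zlib header.
inline bool looks_compressed(const char* data, std::size_t size)
{
    if (data == nullptr || size < 2) return false;

    const auto b0 = static_cast<std::uint8_t>(data[0]);
    const auto b1 = static_cast<std::uint8_t>(data[1]);

    // gzip (RFC 1952): magic bytes ID1 = 0x1F, ID2 = 0x8B
    const bool gzip_magic = (b0 == 0x1F && b1 == 0x8B);

    // zlib (RFC 1950): CM = 8 (deflate) in the low nibble, CINFO <= 7 in the
    // high nibble, and the 16-bit value CMF*256 + FLG is a multiple of 31
    const bool zlib_header = ((b0 & 0x0F) == 0x08) &&
                             ((b0 >> 4) <= 7) &&
                             (((b0 << 8) | b1) % 31 == 0);

    return gzip_magic || zlib_header;
}
```

A two-byte check like this is only a heuristic: uncompressed data can start with the same bytes by chance, so a positive result is a hint rather than a guarantee.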

Compress

// Optionally include compression level
std::string data = "hello";
std::size_t size = data.size(); // required: size has no default value
int level = Z_DEFAULT_COMPRESSION; // Z_DEFAULT_COMPRESSION is the default if no arg is passed

std::string compressed_data = gzip::compress(data.data(), size, level);

Decompress

// No args other than the pointer and its size
std::string data = "hello";
std::string compressed_data = gzip::compress(data.data(), data.size());
const char * compressed_pointer = compressed_data.data();

std::string decompressed_data = gzip::decompress(compressed_pointer, compressed_data.size());

Test

# build test binaries
make

# run tests
make test

You can make Release test binaries as well

BUILDTYPE=Release make
BUILDTYPE=Release make test

Versioning

This library is semantically versioned using the /include/gzip/version.hpp file. This file defines a number of macros that can be used to check the current major, minor, or patch versions, as well as the full version string.

Here's how you can check for a particular version to use specific API methods

#if GZIP_VERSION_MAJOR >= 2
// use version 2 api
#else
// use older version apis
#endif

Here's how to check the version string

std::cout << "version: " << GZIP_VERSION_STRING << "\n";
// => version: 0.2.0

And lastly, mathematically checking for a specific version:

#if GZIP_VERSION_CODE >= 20001
// use feature provided in v2.0.1
#endif
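
The value 20001 for v2.0.1 suggests that the version code packs the three components as MAJOR * 10000 + MINOR * 100 + PATCH. That formula is an assumption inferred from the example above, not something the README states, and `version_code` is a hypothetical helper used only to illustrate it.

```cpp
// Assumed packing scheme: MAJOR * 10000 + MINOR * 100 + PATCH.
// Hypothetical helper; the library itself exposes macros, not this function.
constexpr int version_code(int major, int minor, int patch)
{
    return major * 10000 + minor * 100 + patch;
}

// Compile-time checks of the arithmetic
static_assert(version_code(2, 0, 1) == 20001, "2.0.1 packs to 20001");
static_assert(version_code(0, 2, 0) == 200, "0.2.0 packs to 200");
```

Under this scheme, codes compare in the same order as versions as long as the minor and patch numbers stay below 100.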
Comments
  • Add optional -Werror flag

    Per https://github.com/mapbox/gzip-hpp/issues/2

    Next Actions

    • [x] Fix current warnings (which are now erroring)
    • [x] Set MAX for compress/decompress size
    • [x] Remove std::string functions and only allow char * (per https://github.com/mapbox/gzip-hpp/pull/1#issuecomment-326078576)
    • [x] Fix compile errors/Travis tests 🍏
    • [ ] ~~Add tidy and format scripts~~
    • [x] add handy Dockerfile
    • [x] Code Review

    cc @springmeyer @mapsam

    opened by GretaCB 18
  • Blue sky/future: investigate alternative, high perf deflate implementations

    I presume our wrapper around zlib is as fast as possible, and faster than the C++ wrapper boost provides (refs #7).

    But in the past I've seen zlib compression be a meaningful % of the time taken to work with vector tiles. Often the usecase is:

    • Get a tile over a network or a db
    • Decompress it
    • Do an operation on it (query, shave, composite, etc)
    • Recompress
    • Send back over a network or put into db

    When the "do an operation" is fairly speedy, the decompress and recompress times are often meaningful (at least 5-10% of the time taken on the CPU). In applications handling a lot of concurrent requests where the CPU may be working hard, we can increase the amount of concurrency possible on a single machine by reducing the amount of work on the CPU.

    So, this is a long way of saying:

    • gzip is fairly cheap and likely already as fast as it can be
    • except in rare cases of high load with fairly optimized code (meaning where the vector tile operation is already pretty fast so the gzip part stands out more)...
    • In that case we might want to revisit trying to speed up gzip coding operations

    When/if we do, then we should look into benchmarking https://github.com/ebiggers/libdeflate, which claims to be faster than zlib.

    //cc @GretaCB @flippmoke

    opened by springmeyer 8
  • Multiply-defined symbols linker error

    I've included gzip.hpp in two CPP files in a VS2017 project, and when it links I get "already defined" errors for gzip::compress and gzip::decompress. Is there a way to allow multiple inclusion or do I need to include it just once in a wrapper class?

    opened by kravlost 6
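
A common cause of "already defined" linker errors with header-only code is a function that is defined in a header without the `inline` keyword, so every .cpp file that includes the header emits its own copy. Whether that was the cause in this report is not stated; the sketch below only illustrates the general rule, and `demo::payload_size` is a hypothetical function.

```cpp
#include <cstddef>
#include <string>

namespace demo { // hypothetical namespace standing in for a header-only library

// `inline` lets this definition appear in every translation unit that
// includes the header; the linker then merges the copies instead of
// reporting a multiply-defined symbol.
inline std::size_t payload_size(const std::string& s)
{
    return s.size();
}

} // namespace demo
```

Include guards solve a different problem: they stop a header from being processed twice within one .cpp file, but they do not prevent duplicate definitions across several .cpp files.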
  • Travis temporarily broken

    I noticed this repo is currently in an odd state due to https://docs.travis-ci.com/user/open-source-on-travis-ci-com/#Existing-Open-Source-Repositories-on-travis-ci.org. When trying to enable it on https://travis-ci.com/profile/mapbox I see:

    [Screenshot, 2018-09-27 10:46 AM]

    @artemp until this is fixed please ensure your tests are passing locally at https://github.com/mapbox/gzip-hpp/pull/25 and ensure you get another reviewer to confirm as well.

    /cc @mapsam for visibility

    opened by springmeyer 5
  • Add new, lower level API

    Context

    Applications using gzip-hpp like node-cpp-skel (and apps based on it) need to work in as zero-copy way as possible. The common usecase we have is:

    • create a std::unique_ptr<std::string>
    • write gzip-encoded data to the std::string inside that ptr
    • pass the std::string ownership to node.js

    This is described in detail at https://github.com/mapbox/node-cpp-skel/issues/69. And https://github.com/mapbox/node-cpp-skel/issues/67 also relates.

    Problem

    The current gzip API in master was designed by @GretaCB and @springmeyer to be simple and easy to use. However, it does not allow you to write to memory that is owned elsewhere. It only has the ability to create a new std::string.

    Proposed Solution

    So I think the best solution is what is proposed in this PR, which:

    • Keeps the existing API working without changes
    • Adds a new, lower level API that can be used by clients with high performance or zero copy needs.

    With the low level API it is now possible to:

    • write (both in compress and decompress) to an existing std::string by reference, which allows the caller to control the memory and allocation of this memory. An advantage here is that the caller may want to reuse this buffer as an arena or might want to pre-allocate lots of memory with reserve to ensure writing to this buffer does not require re-allocation.
    • ~~resize the buffer only if needed~~
    opened by springmeyer 5
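
The shape of such a two-level API can be sketched as follows. This is only an illustration of the ownership pattern: `encode_into` and `encode` are hypothetical names, and the body is a placeholder that copies bytes where a real implementation would run the zlib deflate loop.

```cpp
#include <cstddef>
#include <string>

// Hypothetical low-level entry point: appends the result to a buffer the
// caller owns instead of returning a new std::string.
// Placeholder body: a plain copy stands in for the actual deflate loop.
inline void encode_into(const char* data, std::size_t size, std::string& output)
{
    output.append(data, size);
}

// Convenience wrapper in the style of the simple API, built on the
// low-level function so both paths share one implementation.
inline std::string encode(const char* data, std::size_t size)
{
    std::string output;
    encode_into(data, size, output);
    return output;
}
```

With this split, a caller can call `reserve` once on a long-lived buffer, pass it to `encode_into` for each tile, and call `clear()` between uses, which in practice keeps the allocated capacity.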
  • Should we allow compressing an empty string?

    Currently the API allows for compressing an empty string without throwing. It results in a compressed string that is 20 chars long. Should we continue to support this? Or should we catch this case and throw (since this usage is likely a programmer mistake)?

    /cc @mapbox/core-tech

    opened by springmeyer 5
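
The 20-byte result is consistent with the fixed overhead of the gzip format. The breakdown below is a worked calculation based on RFC 1952 and RFC 1951, assuming a header with no optional fields (no file name, comment, or extra data); the constant names are illustrative.

```cpp
#include <cstddef>

// Fixed gzip header: ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS
constexpr std::size_t gzip_header_bytes = 10;

// Deflate data for empty input: one final fixed-Huffman block that
// contains only the end-of-block code (10 bits, padded to 2 bytes)
constexpr std::size_t empty_deflate_bytes = 2;

// Trailer: CRC32 (4 bytes) + ISIZE (4 bytes)
constexpr std::size_t gzip_trailer_bytes = 8;

constexpr std::size_t empty_gzip_bytes =
    gzip_header_bytes + empty_deflate_bytes + gzip_trailer_bytes;

static_assert(empty_gzip_bytes == 20, "an empty gzip member is 20 bytes");
```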
  • Add benchmarks

    Per https://github.com/mapbox/hpp-skel/issues/33, add bench tests so we can measure performance going forward, specifically API changes in https://github.com/mapbox/gzip-hpp/pull/5

    Next Actions

    • [x] Troubleshoot compile errors
    • [x] Add benchmarks for both decompress and compress

    There have been a few improvements in hpp-skel since I ported this over. Planning to add those in a separate PR (format, tidy, Debug config, etc)

    cc @springmeyer

    opened by GretaCB 5
  • Update Readme

    Add gzip-specific docs to readme.

    Next Actions

    • [x] Enable Travis tests and update Travis badge link
    • [x] Get tests passing 🍏, fix current AddressSanitizer failure
    • [x] Add pointer compress/decompress tests. Right now there are only std::string tests.
    • [x] Update versioning test/code (currently commented out)
    • [x] Update versioning section in Readme after ^^^ is finished
    • [x] Code Review

    cc @mapsam @springmeyer

    opened by GretaCB 5
  • Where did this implementation originally come from?

    This project exists, in part, to stop the dangerous practice of copying code around without properly understanding what it does and without there being a clear lineage of bug fixes and docs.

    So, we are working on starting off on the right foot in this repo / fixing our colored past. But the question has come up during the vetting of the code of: "where did this come from and why was this decision made?".

    I spent some time trying to figure out the history. We know that proximately we copied the code from mapnik-vector-tile to get this project started, but where did that code come from? Well, if you search on github for while (inflate_s.avail_out == 0) (which is a pretty unique line) we get a bunch of hits for the same code copied around: https://github.com/search?q=while+%28inflate_s.avail_out+%3D%3D+0%29&type=Code&utf8=%E2%9C%93. In looking at that I think the first to come along was https://github.com/kkaefer/DEPRECATED-node-zlib from @kkaefer. So I think that is the origination of the code that mapnik-vector-tile has been using. And the history of commits in that repo show some of the reasons for changes in the code (like the addition of handling of Z_BUF_ERROR in https://github.com/kkaefer/DEPRECATED-node-zlib/commit/523477f17f397e8e455b19e4613204e5efcb8786#diff-1edfe6d0b0a3a9c5bcbb3ba0e11144a9).

    opened by springmeyer 3
  • Avoid re-allocation overhead by predicting the uncompressed buffer size?

    Per chat with @joto , we learned that gzip lib has another API that takes the original size of the original decompressed data. See example in libosmium. This could help with performance in gzip-hpp to avoid constantly resizing the write buffer which requires reallocations.

    This would mean grabbing the original filesize/buffer length via gzip's ISIZE field, which is stored in the trailer (the last four bytes of the stream).

    cc @springmeyer

    opened by GretaCB 3
  • [WIP] start a gzip compression CLI

    First steps toward a command line tool that can compress data. Designed to be compared to the unix gzip tool (TODO: make sure this tool performs as well)

    opened by springmeyer 2
  • Docker update request

    Any chance we can get a 20.04 dockerfile (and upgrade from 16.04) for master? Also perhaps update zlib version from 1.2.8 to 1.2.11? other dependencies..

    opened by pythonmobile 0
  • Making the repo suitable for having it as submodule

    As currently implemented, the library has to be installed before it can be used: the source code includes its own files from the system include path, like `#include <gzip/...>`, which makes it impossible to use the project as a submodule. Since this project is header-only, the changes in this commit are compatible with both the submodule approach and the current install-it-first approach. Unit tests are also passing.

    opened by alphamarket 3
  • Added missing include guards. Fixes a lot of compile issues.

    The headers provided are generally missing include guards. This led to many compiler issues in a few projects.

    I've added basic include guards, so this doesn't happen in future and ugly hacks like this aren't needed:

    #ifndef GZIP_HPP_INCLUDED
    #define GZIP_HPP_INCLUDED
    #include "gzip/decompress.hpp"
    #endif
    
    opened by SimonCahill 0
  • benchmark_register.cc:(.text+0x26e6): undefined reference to `std::__throw_out_of_range_fmt(char const*, ...)'

    cmake did not error out

    cmake ..
    -- The CXX compiler identification is GNU 4.8.5
    -- Check for working CXX compiler: /usr/lib64/ccache/c++
    -- Check for working CXX compiler: /usr/lib64/ccache/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Configuring release build
    -- Looking for C++ include pthread.h
    -- Looking for C++ include pthread.h - found
    -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
    -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - found
    -- Found Threads: TRUE
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /home1/newdbadmin/gzip/gzip-hpp/build

    make has error: /home/travis/build/mapbox/mason/mason_packages/.build/benchmark-1.3.0/src/benchmark_register.cc:(.text+0x26e6): undefined reference to `std::__throw_out_of_range_fmt(char const*, ...)'

    make
    Scanning dependencies of target bench-tests
    [ 16%] Building CXX object CMakeFiles/bench-tests.dir/bench/run.cpp.o
    [ 33%] Linking CXX executable bench-tests
    ../mason_packages/linux-x86_64/benchmark/1.3.0/lib/libbenchmark.a(benchmark_register.cc.o): In function `benchmark::internal::Benchmark::Ranges(std::vector<std::pair<int, int>, std::allocator<std::pair<int, int> > > const&)':
    /home/travis/build/mapbox/mason/mason_packages/.build/benchmark-1.3.0/src/benchmark_register.cc:(.text+0x26e6): undefined reference to `std::__throw_out_of_range_fmt(char const*, ...)'
    collect2: error: ld returned 1 exit status
    make[2]: *** [bench-tests] Error 1
    make[1]: *** [CMakeFiles/bench-tests.dir/all] Error 2
    make: *** [all] Error 2

    Thanks

    opened by geyungjen 0
  • error: reinterpret_cast from type ‘const char*’ to type ‘Bytef* {aka unsigned char*}’ casts away qualifiers

    The solution is to find the line containing reinterpret_cast<z_const Bytef*>(data) and change it to reinterpret_cast<z_const Bytef*>(&data), for example. This needs to be done twice: once in compress.hpp and once in decompress.hpp.

    opened by AKJ7 8
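
For background, zlib declares the input pointer as `z_const Bytef*`, and `z_const` expands to `const` only when `ZLIB_CONST` is defined before `<zlib.h>` is included. Without it, the target type is a plain `Bytef*`, and a `reinterpret_cast` from `const char*` is rejected because it would drop `const`. One alternative that keeps pointing at the same bytes is to remove the qualifier explicitly first. The sketch below uses a local `Bytef` alias so that it compiles without zlib; `to_next_in` is a hypothetical helper.

```cpp
// Stand-in for zlib's typedef so the sketch compiles without <zlib.h>
using Bytef = unsigned char;

// Hypothetical helper: converts a read-only char buffer into the non-const
// Bytef* expected for the input pointer when z_const expands to nothing.
inline Bytef* to_next_in(const char* data)
{
    // const_cast drops the qualifier explicitly, then reinterpret_cast
    // changes the pointee type; the address itself is unchanged.
    return reinterpret_cast<Bytef*>(const_cast<char*>(data));
}
```

This relies on the callee only reading through the pointer. Defining `ZLIB_CONST` before including `<zlib.h>` is another way to keep the original cast well-formed.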
Owner
Mapbox
Mapbox is the location data platform for mobile and web applications. We're changing the way people move around cities and explore our world.