Multi-format archive and compression library

Overview

Welcome to libarchive!

The libarchive project develops a portable, efficient C library that can read and write streaming archives in a variety of formats. It also includes implementations of the common tar, cpio, and zcat command-line tools that use the libarchive library.

Questions? Issues?

Contents of the Distribution

This distribution bundle includes the following major components:

  • libarchive: a library for reading and writing streaming archives
  • tar: the 'bsdtar' program is a full-featured 'tar' implementation built on libarchive
  • cpio: the 'bsdcpio' program is a different interface to essentially the same functionality
  • cat: the 'bsdcat' program is a simple replacement tool for zcat, bzcat, xzcat, and such
  • examples: Some small example programs that you may find useful.
  • examples/minitar: a compact sample demonstrating use of libarchive.
  • contrib: Various items sent to me by third parties; please contact the authors with any questions.

The top-level directory contains the following information files:

  • NEWS - highlights of recent changes
  • COPYING - what you can do with this
  • INSTALL - installation instructions
  • README - this file
  • CMakeLists.txt - input for "cmake" build tool, see INSTALL
  • configure - configuration script, see INSTALL for details. If your copy of the source lacks a configure script, you can try to construct it by running the script in build/autogen.sh (or use cmake).

The following files in the top-level directory are used by the 'configure' script:

  • Makefile.am, aclocal.m4, configure.ac - used to build this distribution, only needed by maintainers
  • Makefile.in, config.h.in - templates used by configure script

Documentation

In addition to the informational articles and documentation in the online libarchive Wiki, the distribution also includes a number of manual pages:

  • bsdtar.1 explains the use of the bsdtar program
  • bsdcpio.1 explains the use of the bsdcpio program
  • bsdcat.1 explains the use of the bsdcat program
  • libarchive.3 gives an overview of the library as a whole
  • archive_read.3, archive_write.3, archive_write_disk.3, and archive_read_disk.3 provide detailed calling sequences for the read and write APIs
  • archive_entry.3 details the "struct archive_entry" utility class
  • archive_internals.3 provides some insight into libarchive's internal structure and operation.
  • libarchive-formats.5 documents the file formats supported by the library
  • cpio.5, mtree.5, and tar.5 provide detailed information about these popular archive formats, including hard-to-find details about modern cpio and tar variants.

The manual pages above are provided in the 'doc' directory in a number of different formats.

You should also read the copious comments in archive.h and the source code for the sample programs for more details. Please let us know about any errors or omissions you find.

Supported Formats

Currently, the library automatically detects and reads the following formats:

  • Old V7 tar archives
  • POSIX ustar
  • GNU tar format (including GNU long filenames, long link names, and sparse files)
  • Solaris 9 extended tar format (including ACLs)
  • POSIX pax interchange format
  • POSIX octet-oriented cpio
  • SVR4 ASCII cpio
  • Binary cpio (big-endian or little-endian)
  • ISO9660 CD-ROM images (with optional Rockridge or Joliet extensions)
  • ZIP archives (with uncompressed or "deflate" compressed entries, including support for encrypted Zip archives)
  • ZIPX archives (with support for bzip2, ppmd8, lzma and xz compressed entries)
  • GNU and BSD 'ar' archives
  • 'mtree' format
  • 7-Zip archives
  • Microsoft CAB format
  • LHA and LZH archives
  • RAR and RAR 5.0 archives (with some limitations due to RAR's proprietary status)
  • XAR archives

The library also detects and handles any of the following before evaluating the archive:

  • uuencoded files
  • files with RPM wrapper
  • gzip compression
  • bzip2 compression
  • compress/LZW compression
  • lzma, lzip, and xz compression
  • lz4 compression
  • lzop compression
  • zstandard compression

The library can create archives in any of the following formats:

  • POSIX ustar
  • POSIX pax interchange format
  • "restricted" pax format, which will create ustar archives except for entries that require pax extensions (for long filenames, ACLs, etc).
  • Old GNU tar format
  • Old V7 tar format
  • POSIX octet-oriented cpio
  • SVR4 "newc" cpio
  • shar archives
  • ZIP archives (with uncompressed or "deflate" compressed entries)
  • GNU and BSD 'ar' archives
  • 'mtree' format
  • ISO9660 format
  • 7-Zip archives
  • XAR archives

When creating archives, the result can be filtered with any of the following:

  • uuencode
  • gzip compression
  • bzip2 compression
  • compress/LZW compression
  • lzma, lzip, and xz compression
  • lz4 compression
  • lzop compression
  • zstandard compression

Notes about the Library Design

The following notes address many of the most common questions we are asked about libarchive:

  • This is a heavily stream-oriented system. That means that it is optimized to read or write the archive in a single pass from beginning to end. For example, this allows libarchive to process archives too large to store on disk by processing them on-the-fly as they are read from or written to a network or tape drive. This also makes libarchive useful for tools that need to produce archives on-the-fly (such as webservers that provide archived contents of a user's account).

  • In-place modification and random access to the contents of an archive are not directly supported. For some formats, this is not an issue: For example, tar.gz archives are not designed for random access. In some other cases, libarchive can re-open an archive and scan it from the beginning quickly enough to provide the needed abilities even without true random access. Of course, some applications do require true random access; those applications should consider alternatives to libarchive.

  • The library is designed to be extended with new compression and archive formats. The only requirement is that the format be readable or writable as a stream and that each archive entry be independent. There are articles on the libarchive Wiki explaining how to extend libarchive.

  • On read, compression and format are always detected automatically.

  • The same API is used for all formats; it should be very easy for software using libarchive to transparently handle any of libarchive's archiving formats.

  • Libarchive's automatic support for decompression can be used without archiving by explicitly selecting the "raw" and "empty" formats.

  • I've attempted to minimize static link pollution. If you don't explicitly invoke a particular feature (such as support for a particular compression or format), it won't get pulled in to statically-linked programs. In particular, if you don't explicitly enable a particular compression or decompression support, you won't need to link against the corresponding compression or decompression libraries. This also reduces the size of statically-linked binaries in environments where that matters.

  • The library is generally thread safe depending on the platform: it does not define any global variables of its own. However, some platforms do not provide fully thread-safe versions of key C library functions. On those platforms, libarchive will use the non-thread-safe functions. Patches to improve this are of great interest to us.

  • In particular, libarchive's modules to read or write a directory tree do use chdir() to optimize the directory traversals. This can cause problems for programs that expect to do disk access from multiple threads. Of course, those modules are completely optional and you can use the rest of libarchive without them.

  • The library is not thread aware, however. It does no locking or thread management of any kind. If you create a libarchive object and need to access it from multiple threads, you will need to provide your own locking.

  • On read, the library accepts whatever blocks you hand it. Your read callback is free to pass the library a byte at a time or mmap the entire archive and give it to the library at once. On write, the library always produces correctly-blocked output.

  • The object-style approach allows you to have multiple archive streams open at once. bsdtar uses this in its "@archive" extension.

  • The archive itself is read/written using callback functions. You can read an archive directly from an in-memory buffer or write it to a socket, if you wish. There are some utility functions to provide easy-to-use "open file," etc, capabilities.

  • The read/write APIs are designed to allow individual entries to be read or written to any data source: You can create a block of data in memory and add it to a tar archive without first writing a temporary file. You can also read an entry from an archive and write the data directly to a socket. If you want to read/write entries to disk, there are convenience functions to make this especially easy.

  • Note: The "pax interchange format" is a POSIX standard extended tar format that should be used when the older ustar format is not appropriate. It has many advantages over other tar formats (including the legacy GNU tar format) and is widely supported by current tar implementations.

Issues
  • The libarchive lib exist a READ memory access Vulnerability

    Hello. While using libFuzzer to write code that calls the archive_read_data function, I found a READ memory access vulnerability (see the picture). The lzma_decode function crashed while decoding my testcase. [picture]

    opened by icycityone 47
  • Support for RAR

    Original issue 40 created by Google Code user ondra.pelech on 2009-10-05T16:05:49.000Z:

    Hi,
    
    it would be great if libarchive supported the RAR format; even if it would
    be passworded archive.
    
    This is just a wish/enhancement, not a bug; and I know it's probably not
    easy to implement and may take a long time. And thanks for this great
    project, I use it through GNOME's gvfs-mount.
    
    Type-Enhancement OpSys-All Milestone-Later Component-libarchive Priority-None 
    opened by kwrobot 45
  • libarchive's CMakeLists.txt finds major() when it shouldn't

    Original issue 125 created by Google Code user audiofanatic on 2011-01-04T08:44:44.000Z:

    When building the cmlibarchive project with LSB compilers (LSB = Linux Standards Base), the archive_entry.c file generates compiler errors because it relies on the following functions which are not provided by the LSB:
    
    major
    minor
    makedev
    
    On linux, these are generally implemented as macros which forward to functions like gnu_dev_makedev, etc., but they actually have very simple inlineable implementations. In fact, with certain GCC flags, these macros/functions *are* fully inlined. It would seem that this has been noted by the cmlibarchive developers too, since they have recently added the following to archive_entry.c
    
    #if !defined(HAVE_MAJOR) && !defined(major)
    /* Replacement for major/minor/makedev. */
    #define major(x) ((int)(0x00ff & ((x) >> 8)))
    #define minor(x) ((int)(0xffff00ff & (x)))
    #define makedev(maj,min) ((0xff00 & ((maj)<<8)) | (0xffff00ff & (min)))
    #endif
    
    The HAVE_MAJOR switch is the problem for LSB compilers. It is set earlier in archive_entry.c and merely depends on one of MAJOR_IN_MKDEV or MAJOR_IN_SYSMACROS being defined. Unfortunately, the top-level CMakeLists.txt file does this detection without considering LSB compilers: the LSB provides neither mkdev.h nor sysmacros.h, but CMakeLists.txt does not account for this. The result is that the system versions of these headers can be found, which is incorrect and dangerous when using LSB compilers. This is easy to fix with the attached patch to the top-level CMakeLists.txt file.
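
To make the arithmetic of the replacement macros quoted above concrete, here is a small sketch that copies their bodies under different names and checks a pack/unpack round trip. The fallback_ prefix and the devnum_round_trips helper are assumptions of this example only; they do not appear in archive_entry.c.

```c
/*
 * Same arithmetic as the replacement macros quoted above. The fallback_
 * prefix is an assumption of this sketch, chosen so the names cannot
 * clash with system definitions of major, minor, and makedev.
 */
#define fallback_major(x) ((int)(0x00ff & ((x) >> 8)))
#define fallback_minor(x) ((int)(0xffff00ff & (x)))
#define fallback_makedev(maj, min) \
    ((0xff00 & ((maj) << 8)) | (0xffff00ff & (min)))

/*
 * Hypothetical helper: returns 1 when a (major, minor) pair survives
 * being packed into a device number and unpacked again.
 */
static int devnum_round_trips(unsigned long maj, unsigned long min)
{
    unsigned long dev = fallback_makedev(maj, min);

    return fallback_major(dev) == (int)maj
        && fallback_minor(dev) == (int)min;
}
```

With these definitions, major 8 and minor 1 pack to the device number 0x0801.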
    
    Note that this bug was originally reported to KitWare since it affects CMake itself. They have requested that this issue be fixed in cmlibarchive itself since they import cmlibarchive sources. For reference, see here:
    
    http://public.kitware.com/Bug/view.php?id=11648
    
    
    
    
    

    See attachment: CMakeLists.txt.patch

    Type-Defect Priority-Medium OpSys-All 
    opened by kwrobot 34
  • Add support for extracting SCHILY.xattr extended attributes

    This patch adds support for extracting SCHILY.xattr extended attributes found in the PAX extended header. Since some of the attributes found there can be binary data, we extend the parser to support binary data.

    One example for an attribute with binary data is SCHILY.xattr.security.ima, which contains a digital signature.

    Signed-off-by: Stefan Berger [email protected]

    Type-Feature 
    opened by stefanberger 26
  • Unicode filenames inside RAR not working

    Original issue 247 created by Google Code user [email protected] on 2012-03-06T03:13:12.000Z:

    What steps will reproduce the problem?
    Attached RAR file contains one file called "テスト3.xlsx".
    Read filenames in attached file with archive_read_next_header and archive_entry_pathname_w.

    What is the expected output? What do you see instead?
    Expected filename == "テスト3.xlsx", but get "テスト3".

    What version are you using?
    3.0.3

    On what operating system?
    Win7-64

    How did you build?  (cmake, configure, or pre-packaged binary)
    Cmake

    What compiler or development environment (please include version)?
    VS2010

    Please provide any additional information below.
    64-bit build.
    

    See attachment: unicode-subfile.rar

    Type-Defect Priority-Medium OpSys-All 
    opened by kwrobot 24
  • [meta] Reporting potential security problems

    (copy of https://groups.google.com/d/topic/libarchive-discuss/zFtqsPhNcQ0/discussion)

    Our fuzzing effort (read more at our home page: https://github.com/google/oss-fuzz) has detected several crashes (two buffer overruns and one null dereference) in libarchive trunk using the fuzz target that we developed:

    https://github.com/google/oss-fuzz/blob/master/targets/libarchive/libarchive_fuzzer.cc

    These crashes are now filed in a security-protected monorail tracker (https://bugs.chromium.org/p/oss-fuzz/issues/list) and we'd like to find libarchive engineers to take a look at them.

    We'd like to CC developers on libarchive issues to give them access to stack traces and reproducer data. For that we'd only need an e-mail with associated gmail account. We can set up the process to auto-CC these e-mails when we find more issues.

    opened by mikea 23
  • build fails on SCO 5

    Original issue 129 created by Google Code user brianchina60221 on 2011-01-18T18:34:53.000Z:

    libarchive/archive.h defines some types appropriate to the platform, but those types aren't used elsewhere in the code. There are many direct uses of, e.g., uint32_t.
    
    The attached patch against trunk is big, but it just rearranges some stuff in archive.h, archive_entry.h, and archive_platform.h, and then seds the C99 types in libarchive/* to use the internal names.
    
    Thank you.
    

    See attachment: types.patch

    Type-Defect Priority-Medium OpSys-Other 
    opened by kwrobot 22
  • libarchive fails to process zip files with garbage padding at end

    Original issue 257 created by Google Code user alexkozlov0 on 2012-04-11T00:29:38.000Z:

    What steps will reproduce the problem?
    1. wget ftp://ftp.adobe.com/pub/adobe/magic/acrobatviewer/unix/1.x/viewer.bin
    2. bsdtar tvf viewer.bin

    What is the expected output?
    The zip file listing.

    What do you see instead?
    bsdtar: Invalid central directory signature
    bsdtar: Error exit delayed from previous errors.

    What version are you using?
    3.0.4, git

    On what operating system?
    FreeBSD 9.x

    How did you build?  (cmake, configure, or pre-packaged binary)
    configure

    What compiler or development environment (please include version)?
    gcc 4.2

    Please provide any additional information below.
    Now that libarchive has a seekable zip reader, it can fall back to the Central Directory at the end of the zip file instead of terminating with an error.
    
    Type-Defect Priority-Medium OpSys-All 
    opened by kwrobot 18
  • Support reading from multiple data objects (multivolume reading)

    Original issue 166 created by Google Code user mcitadel on 2011-08-13T18:39:46.000Z:

    RAR archives can be split into multiple files (to provide multivolume support). Each file contains the RAR signature header, a main archive header, and the optional EOF header. The data blocks are split arbitrarily between each file in a multivolume set of files. Currently, libarchive doesn't handle reading from multiple files.
    
    This patch would introduce reading from multiple files by way of reading from multiple client objects. What would happen is that there is a chain of client objects, each with the callbacks and data necessary to open, read, skip, and close each object it's reading from (such as different files). Data is read from each of these clients as one large stream. I plan on implementing multivolume reading support of RAR files once general reading from multiple streams is accepted and committed to trunk.
    
    I introduced a new callback (switch callback) that can be used to switch from reading of one client to the next or previous client. I needed some way to determine whether a file should be closed because it's going to open the next file, or if it's being closed because libarchive is done reading from the file set. The latter would mean that I also need to free all memory allocated for all data objects of each client.
    
    I've introduced some test cases already for reading from these multiple clients. The test files are simply some reused test rar files that have been split using the 'split' program. There's also a test case for supplying custom callbacks and multiple client objects. This custom callbacks test case is essentially how I envision using libarchive to read from multiple files with custom callbacks. I plan on using libarchive in this way in another application (XBMC).
    
    This patch also updates test_fuzz so it can read from multiple files. Currently, the multiple files used in test_fuzz would have the same result as "test_read_format_rar.rar" would. Once I have RARv3 multivolume reading support implemented, this would provide a better test for test_fuzz.
    
    
    Priority-Medium Type-Review 
    opened by kwrobot 18
  • Wrong locale defaults for windows

    Original issue 132 created by Google Code user repalov on 2011-01-31T18:08:58.000Z:

    What version are you using?
    trunk / revision 2953

    On what operating system?
    Windows 7

    How did you build?  (cmake, configure, or pre-packaged binary)
    cmake

    What compiler or development environment (please include version)?
    Visual Studio 2010

    I have two comments.

    1) Line 464 of archive_string.c (http://code.google.com/p/libarchive/source/browse/trunk/libarchive/archive_string.c#464) uses the ACP code page, but ZIP and TAR (and maybe other) archives created on Windows use the CP_OEMCP (OEM) character set for filenames. At least for Russian this is true: ACP defines code page 1251, but the names in the archive are in code page 866.

    2) It is incorrect to use the _system_ default locale to convert mbstring<->wcstring. If I have an archive with Russian filenames from FreeBSD, its filenames are in koi8-r, and if I can't specify the charset for the archive, I can't get proper names.
    Another problem: if I build libarchive as a DLL, the DLL has its own locale and I can't change it from the program at all.

    So I think a mechanism is needed to change the locale used for string conversion, either for the whole library (at minimum) or per archive (optimal).

    At this time, none of the archivers I tested (7zip, WinRar, bsdtar from libarchive) extracted Russian names correctly from a tar.bz2 archive created on FreeBSD 8.1.
    
    Type-Defect OpSys-All Priority-Critical Milestone-3.0 
    opened by kwrobot 18
  • Hide private symbols in libarchive.so

    Libarchive.so presently exports 281 symbols (over 50%, full list attached) which are not present in libarchive's headers and thus are not supposed to be used by clients.

    Removing these symbols would allow the compiler to optimize code more aggressively (.text reduced by 1%), speed up the dynamic linker on Linux, and prevent clients from inadvertently using internal APIs.

    I attached a simple patch that hides private symbols. It passes make check (I can do additional testing if needed). Would something like this be interesting for the project?

    0001-Hide-private-symbols.patch.txt private_syms.txt

    The issue was found using ShlibVisibilityChecker.

    opened by yugr 17
  • archive_entry_size incorrectly ignores ZIP's Data Descriptor records

    Background

    Here's a puzzle. I have a zip file containing 2 constituent files. unzip -l gives the file sizes as 305 and 320 bytes. libarchive (see the main.c program below) also says 305 and 320 bytes.

    However, if I gzip the zip file and pass it to libarchive, I get 0 and 320, not 305 and 320. This is incorrect, and I believe that this is a bug in libarchive (as opposed to a malformed zip file).

    $ ls
    main.c	test_data_descriptor.zip
    
    $ unzip -l test_data_descriptor.zip
    Archive:  test_data_descriptor.zip
      Length      Date    Time    Name
    ---------  ---------- -----   ----
          305  2015-09-05 16:32   -
          320  2015-09-05 22:00   second.txt
    ---------                     -------
          625                     2 files
    
    $ gzip --keep test_data_descriptor.zip
    
    $ gcc main.c -larchive
    
    $ ./a.out test_data_descriptor.zip
    -
      RAW: 0,   ZIP: 1
      size before: 305 (64)
      size after:  305 (64)
    second.txt
      RAW: 0,   ZIP: 1
      size before: 320 (64)
      size after:  320 (64)
    
    $ ./a.out test_data_descriptor.zip.gz
    -
      RAW: 0,   ZIP: 1
      size before: 0 (0)
      size after:  0 (0)
    second.txt
      RAW: 0,   ZIP: 1
      size before: 320 (64)
      size after:  320 (64)
    

    The test_data_descriptor.zip file can be downloaded from https://github.com/adamhathcock/sharpcompress/files/242365/test_data_descriptor.zip and is referenced from https://github.com/adamhathcock/sharpcompress/issues/88 Its contents are:

    $ hd test_data_descriptor.zip
    00000000  50 4b 03 04 14 00 08 00  08 00 0e 84 25 47 00 00  |PK..........%G..|
    00000010  00 00 00 00 00 00 00 00  00 00 01 00 00 00 2d 73  |..............-s|
    00000020  f3 0c 0a 0e 51 70 23 87  e4 e5 22 4b db 10 d7 0c  |....Qp#..."K....|
    00000030  00 50 4b 07 08 01 61 6f  5b 12 00 00 00 31 01 00  |.PK...ao[....1..|
    00000040  00 50 4b 03 04 14 00 08  00 08 00 1a 70 25 47 00  |.PK.........p%G.|
    00000050  00 00 00 00 00 00 00 40  01 00 00 0a 00 7b 00 73  |.......@.....{.s|
    00000060  65 63 6f 6e 64 2e 74 78  74 53 44 66 00 ac 00 00  |econd.txtSDf....|
    00000070  00 00 08 00 a4 e0 ad bb  63 64 60 69 10 61 60 60  |........cd`i.a``|
    00000080  30 60 80 00 1f 20 66 64  05 33 59 45 81 84 42 7b  |0`... fd.3YE..B{|
    00000090  86 e1 b2 5c c9 69 c7 67  ca f4 bd 60 c6 2d c7 c8  |...\.i.g...`.-..|
    000000a0  c4 c0 c0 c4 90 c0 c0 02  96 96 60 f8 cf 28 cf 00  |..........`..(..|
    000000b0  12 03 a9 55 00 a9 05 b3  45 20 e2 8c 10 71 21 06  |...U....E ...q!.|
    000000c0  88 d8 7e 46 61 b8 18 37  54 ff 4a 06 21 14 fd 8a  |..~Fa..7T.J.!...|
    000000d0  40 36 00 55 54 0d 00 07  74 d9 ea 55 5f d9 ea 55  |@6.UT...t..U_..U|
    000000e0  5f d9 ea 55 0b 76 75 f6  f7 73 51 08 26 8f e2 e5  |_..U.vu..sQ.&...|
    000000f0  22 53 e3 b0 d1 0f 00 50  4b 07 08 e0 08 02 5a 13  |"S.....PK.....Z.|
    00000100  00 00 00 40 01 00 00 50  4b 01 02 1e 00 14 00 08  |...@...PK.......|
    00000110  00 08 00 0e 84 25 47 01  61 6f 5b 12 00 00 00 31  |.....%G.ao[....1|
    00000120  01 00 00 01 00 00 00 00  00 00 00 01 00 00 00 00  |................|
    00000130  00 00 00 00 00 2d 50 4b  01 02 1e 00 14 00 08 00  |.....-PK........|
    00000140  08 00 1a 70 25 47 e0 08  02 5a 13 00 00 00 40 01  |...p%G...Z....@.|
    00000150  00 00 0a 00 11 00 00 00  00 00 01 00 20 00 00 00  |............ ...|
    00000160  41 00 00 00 73 65 63 6f  6e 64 2e 74 78 74 53 44  |A...second.txtSD|
    00000170  04 00 ac 00 00 00 55 54  05 00 07 74 d9 ea 55 50  |......UT...t..UP|
    00000180  4b 05 06 00 00 00 00 02  00 02 00 78 00 00 00 07  |K..........x....|
    00000190  01 00 00 00 00                                    |.....|
    00000195
    

    The main.c program is:

    $ cat main.c
    #include <archive.h>
    #include <archive_entry.h>
    #include <stdio.h>
    
    int main(int argc, char** argv) {
      struct archive* a;
      struct archive_entry* entry;
      int r;
    
      if (argc < 2) {
        fprintf(stderr, "No archive listed.\n");
        return 1;
      }
    
      a = archive_read_new();
      archive_read_support_filter_all(a);
      archive_read_support_format_all(a);
      r = archive_read_open_filename(a, argv[1], 10240);
      if (r != ARCHIVE_OK) {
        fprintf(stderr, "archive_read_open_filename failed.\n");
        return 1;
      }
    
      while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
        printf("%s\n", archive_entry_pathname(entry));
        printf("  RAW: %d,   ZIP: %d\n", archive_format(a) == ARCHIVE_FORMAT_RAW,
               archive_format(a) == ARCHIVE_FORMAT_ZIP);
        printf("  size before: %d (%d)\n", (int)archive_entry_size(entry),
               archive_entry_size_is_set(entry));
        archive_read_data_skip(a);
        printf("  size after:  %d (%d)\n", (int)archive_entry_size(entry),
               archive_entry_size_is_set(entry));
      }
    
      r = archive_read_free(a);
      if (r != ARCHIVE_OK) {
        fprintf(stderr, "archive_read_free failed.\n");
        return 1;
      }
      return 0;
    }
    

    ZIP File Format

    As you may already know, a ZIP file, roughly speaking, is a sequence of records, such as a Local File Header record, Data Descriptor record, Central Directory File Header record or End Of Central Directory record. The https://en.wikipedia.org/wiki/ZIP_(file_format)#Structure Wikipedia page has a good overview, and details the fields per record. See also https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT which is the closest thing we have to an official ZIP file format specification.

    Both the LFH (Local File Header) and CDFH (Central Directory File Header) records contain a little-endian 4 byte "uncompressed size" field. For the first file (with filename "-"), the LFH one is at byte offset 0x0016, with value 0x00000000. The CDFH one is at offset 0x011f, with value 0x00000131 = 305.

    For test_data_descriptor.zip, libarchive has random access to the zip file and uses the CDFH records' uncompressed size (and CDFH records are at the end of the file). For test_data_descriptor.zip.gz (and note the .zip.gz suffix), libarchive can automatically decompress the outer GZIP compression but the decompressed bytes are only available under sequential (streaming) access, not random access, so libarchive reports the LFH records' uncompressed size.

    (Not a) Malformed ZIP File

    My initial reaction was that the test_data_descriptor.zip file was malformed, since the two "uncompressed size" fields are inconsistent. However, there is a further subtlety. The two bytes starting at offset 0x0006 are "general purpose bit flag" bytes and the value here is 0x0008.

    APPNOTE.TXT section 4.4.4 "general purpose bit flag" says "Bit 3: If this bit is set, the fields crc-32, compressed size and uncompressed size are set to zero in the local header. The correct values are put in the data descriptor immediately following the compressed data." Bit 3 is, of course, the 0x0008 bit that is set in test_data_descriptor.zip's first LFH.

    The first Data Descriptor starts with the "50 4b 07 08" signature at byte offset 0x0031 and its "uncompressed size" field is at byte offset 0x003d, with value 0x00000131 = 305.

    So the test_data_descriptor.zip file is valid, and both (LFH+DD) and (CDFH) are reporting the same "uncompressed size" value, 305, but libarchive reports the wrong value, presumably because it reports the literal 0 value in the LFH record and ignores the DD record.
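
The field offsets discussed above can be checked mechanically. The sketch below embeds the first 0x41 bytes of the hex dump and decodes the general purpose bit flag, the LFH "uncompressed size", and the Data Descriptor "uncompressed size". The helper names are hypothetical, and the descriptor offset is passed in as a known value, which is an assumption: a streaming reader would only learn it after inflating the entry's data to its end.

```c
#include <stddef.h>
#include <stdint.h>

/* First 0x41 bytes of test_data_descriptor.zip, copied from the hex dump. */
static const unsigned char first_record[] = {
    0x50, 0x4b, 0x03, 0x04, 0x14, 0x00, 0x08, 0x00,
    0x08, 0x00, 0x0e, 0x84, 0x25, 0x47, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x2d, 0x73,
    0xf3, 0x0c, 0x0a, 0x0e, 0x51, 0x70, 0x23, 0x87,
    0xe4, 0xe5, 0x22, 0x4b, 0xdb, 0x10, 0xd7, 0x0c,
    0x00, 0x50, 0x4b, 0x07, 0x08, 0x01, 0x61, 0x6f,
    0x5b, 0x12, 0x00, 0x00, 0x00, 0x31, 0x01, 0x00,
    0x00
};

/* Little-endian field readers. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
        | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

#define LFH_FLAGS_OFFSET 0x06u          /* general purpose bit flag */
#define LFH_USIZE_OFFSET 0x16u          /* LFH uncompressed size */
#define DD_USIZE_OFFSET 12u             /* after signature, crc-32, compressed size */
#define ZIP_FLAG_DATA_DESCRIPTOR 0x0008u /* bit 3 */
#define ZIP_DD_SIGNATURE 0x08074b50u    /* bytes 50 4b 07 08 */

/*
 * Hypothetical helper: returns the uncompressed size for the entry whose
 * Local File Header starts at zip. If bit 3 is set, the value is taken
 * from the Data Descriptor at dd_offset (assumed known by the caller);
 * returns -1 if no descriptor signature is found there.
 */
static int64_t effective_uncompressed_size(const unsigned char *zip,
                                           size_t dd_offset)
{
    if ((read_le16(zip + LFH_FLAGS_OFFSET) & ZIP_FLAG_DATA_DESCRIPTOR) == 0)
        return (int64_t)read_le32(zip + LFH_USIZE_OFFSET);
    if (read_le32(zip + dd_offset) != ZIP_DD_SIGNATURE)
        return -1;
    return (int64_t)read_le32(zip + dd_offset + DD_USIZE_OFFSET);
}
```

Called with the descriptor offset 0x31, the helper yields 305 for the first entry, while the LFH field on its own reads 0.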

    Expected Behavior

    Given libarchive's iterator model (only requiring sequential access, not random access), I would expect archive_entry_size_is_set to still return 0 before the archive_read_data_skip call but return non-zero afterwards (and for archive_entry_size to then return 305).

    Workaround

    For my particular program, I could work around it if libarchive treated test_data_descriptor.zip.gz as a RAW data.gz file instead of a (possibly separately compressed) ZIP data.zip file, so I could use libarchive once to extract the 'data' to a real (temporary) file and use libarchive a second time on that real (seekable) file to use (CDFH) values instead of (LFH+DD) values. However, per the printf calls above, archive_format doesn't distinguish between test_data_descriptor.zip and test_data_descriptor.zip.gz and returns ARCHIVE_FORMAT_ZIP for both.

    Is there a way for libarchive to treat it as raw (for archive_format to return ARCHIVE_FORMAT_RAW) for the second case?

    opened by nigeltao 0
  • archive_write_set_format_7zip: no flushes/writes into disc during processing

    Is this a bug or a feature? I tried a basic write example with archive_write_set_format_7zip, and it seems to store everything in memory until archive_write_free is called (I tried the callback version of archive_write_open: no writes until archive_write_free). When I tried the same with gzip/xz filters, write was called frequently during the process. Tested with libarchive 3.2.2 and 3.6.1, compressing 2000000 x 1.5KB ASCII random buffers.

    opened by bodzio131 0
  • Missing archive_read_support_filter_by_code API and tests

    As spotted in https://github.com/libarchive/libarchive/pull/1751, neither the cmake build nor the Android one includes the libarchive/archive_read_support_filter_by_code.c file in the build. As such, the API is not exposed.

    To make it even worse, there are no tests or in-tree users for this API (unlike the read_format_by_code).

    We should fix that up by:

    • adding the cmake and android bits
    • adding new test
    • optional: check through (and if needed fix) the rest of the API/files
    • optional: add some misc API tests - say like the ones in mesa that I wrote :-P

    Alternatively, we could remove the API

    • audit for potential users
      • some/all(?) Linux distros are using the cmake build
      • what about BSDs?
      • check around for users - distros, github, general search?
    • check when the code was added - see with author for actual users
    • if seemingly safe and maintainers are happy - remove it
    opened by evelikov 1
  • how can i get file attribute when reading archive file content

    Dear all,

    I am trying to read the contents of an archive file with the code below. Is there any way to get file attributes (such as hidden)? For example, if one file in the archive has the hidden attribute, I want to be able to detect that.

    #include <archive.h>
    #include <archive_entry.h>
    #include <stdio.h>
    #include <stdlib.h>
    
    static void listContent(void)
    {
        struct archive *a;
        struct archive_entry *entry;
        int r;
    
        a = archive_read_new();
        archive_read_support_filter_all(a);
        archive_read_support_format_all(a);
        r = archive_read_open_filename(a, "1.zip", 10240); // Note 1
        if (r != ARCHIVE_OK)
            exit(1);
        while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
            printf("%s\n", archive_entry_pathname(entry));
            archive_read_data_skip(a); // Note 2
        }
        r = archive_read_free(a); // Note 3
        if (r != ARCHIVE_OK)
            exit(1);
    }
    
    opened by mohsenomidi 2
  • Incorrect passphrase for Zip archive

    Hello! I am trying to open a zip archive with a password in Chinese, but I get an error when reading an entry.

    • Password: 夜亱线强爲為为
    • File: clean_one_layer_password_in_utf8.zip
    • OS: linux
    • libarchive: 3.6.1
    • LANG: en_US.utf8

    Simple code to reproduce:

    #include <archive.h>
    #include <archive_entry.h>
    #include <stdio.h>
    #include <sys/stat.h>  /* S_ISREG */
    #include <sys/types.h> /* ssize_t */
    
    #define CHECK_AND_EXIT(result, msg)                                            \
      {                                                                            \
        if (result) {                                                              \
          printf(msg);                                                             \
          return 1;                                                                \
        }                                                                          \
      }
    
    int main(int argc, char **argv) {
      int r;
      char buff[8192];
      ssize_t len;
      FILE *out;
      struct archive *ina;
      struct archive_entry *entry;
      char *input, *passphrase;
    
      input = argv[1];
      passphrase = argv[2];
    
      CHECK_AND_EXIT((ina = archive_read_new()) == NULL,
                     "Cannot create archive reader");
      CHECK_AND_EXIT(archive_read_support_filter_all(ina) != ARCHIVE_OK,
                     "Cannot enable decompression");
      CHECK_AND_EXIT(archive_read_support_format_all(ina) != ARCHIVE_OK,
                     "Cannot enable read formats");
      CHECK_AND_EXIT(archive_read_add_passphrase(ina, passphrase) != ARCHIVE_OK,
                     "Cannot add passphrase");
      CHECK_AND_EXIT(archive_read_open_filename(ina, input, 10240) != ARCHIVE_OK,
                     "Cannot open archive");
      CHECK_AND_EXIT((out = fopen(argv[3], "wb")) == NULL,
                     "Cannot open output file");
    
      while ((r = archive_read_next_header(ina, &entry)) == ARCHIVE_OK) {
        printf("%s: ", archive_entry_pathname(entry));
        /* Skip anything that isn't a regular file. */
        if (!S_ISREG(archive_entry_mode(entry))) {
          printf("skipped\n");
          continue;
        }
        if (archive_entry_size(entry) > 0) {
          do {
    
            len = archive_read_data(ina, buff, sizeof(buff));
            if (len == 0) {
              printf("copied\n");
              break;
            }
            if (len < 0) {
              printf("Error reading input archive: retcode=%zi string=%s\n", len,
                     archive_error_string(ina));
              break;
            }
            /* Write the raw bytes; fprintf would treat buff as a format string. */
            fwrite(buff, 1, (size_t)len, out);
          } while (1);
        }
      }
    
      CHECK_AND_EXIT(r != ARCHIVE_EOF, "Error reading archive");
      /* Close the archives.  */
      CHECK_AND_EXIT(archive_read_free(ina) != ARCHIVE_OK,
                     "Error closing input archive");
      fclose(out);
      return 0;
    }
    
    opened by ikrivosheev 0
Releases(v3.6.1)
LZFSE compression library and command line tool

LZFSE This is a reference C implementation of the LZFSE compressor introduced in the Compression library with OS X 10.11 and iOS 9. LZFSE is a Lempel-

1.7k Jul 30, 2022
Advanced DXTc texture compression and transcoding library

crunch/crnlib v1.04 - Advanced DXTn texture compression library Public Domain - Please see license.txt. Portions of this software make use of public d

748 Aug 10, 2022
Small strings compression library

SMAZ - compression for very small strings ----------------------------------------- Smaz is a simple compression library suitable for compressing ver

Salvatore Sanfilippo 1k Aug 11, 2022
A massively spiffy yet delicately unobtrusive compression library.

ZLIB DATA COMPRESSION LIBRARY zlib 1.2.11 is a general purpose data compression library. All the code is thread safe. The data format used by the z

Mark Adler 3.7k Aug 12, 2022
A simple C library implementing the compression algorithm for isosceles triangles.

orvaenting Summary A simple C library implementing the compression algorithm for isosceles triangles. License This project's license is GPL 2 (as of J

Kevin Matthes 0 Apr 1, 2022
Extremely Fast Compression algorithm

LZ4 - Extremely fast compression LZ4 is lossless compression algorithm, providing compression speed > 500 MB/s per core, scalable with multi-cores CPU

lz4 7.4k Aug 16, 2022
Zstandard - Fast real-time compression algorithm

Zstandard, or zstd as short version, is a fast lossless compression algorithm, targeting real-time compression scenarios at zlib-level and better comp

Facebook 17.4k Aug 8, 2022
Lossless data compression codec with LZMA-like ratios but 1.5x-8x faster decompression speed, C/C++

LZHAM - Lossless Data Compression Codec Public Domain (see LICENSE) LZHAM is a lossless data compression codec written in C/C++ (specifically C++03),

Rich Geldreich 628 Jul 28, 2022
A bespoke sample compression codec for 64k intros

pulsejet A bespoke sample compression codec for 64K intros codec pulsejet lifts a lot of ideas from Opus, and more specifically, its CELT layer, which

logicoma 34 Jul 25, 2022
A variation of CredBandit that uses compression to reduce the size of the data that must be transmitted.

compressedCredBandit compressedCredBandit is a modified version of anthemtotheego's proof of concept Beacon Object File (BOF). This version does all t

Conor Richard 17 Apr 9, 2022
Data compression utility for minimalist demoscene programs.

bzpack Bzpack is a data compression utility which targets retrocomputing and demoscene enthusiasts. Given the artificially imposed size limits on prog

Milos Bazelides 20 Jul 27, 2022
gzip (GNU zip) is a compression utility designed to be a replacement for 'compress'

gzip (GNU zip) is a compression utility designed to be a replacement for 'compress'

ACM at UCLA 7 Apr 27, 2022
Better lossless compression than PNG with a simpler algorithm

Zpng Small experimental lossless photographic image compression library with a C API and command-line interface. It's much faster than PNG and compres

Chris Taylor 201 Jun 28, 2022
Runtime Archiver plugin for Unreal Engine. Cross-platform archiving and unarchiving directories and files. Currently supports ZIP format.

Runtime Archiver Archiving and dearchiving directories and files Explore the docs » Marketplace . Releases . Support Chat Features Fast speed Easy arc

Georgy Treshchev 18 Jul 27, 2022
Slow5tools is a toolkit for converting (FAST5 <-> SLOW5), compressing, viewing, indexing and manipulating data in SLOW5 format.

slow5tools Slow5tools is a simple toolkit for converting (FAST5 <-> SLOW5), compressing, viewing, indexing and manipulating data in SLOW5 format. Abou

Hasindu Gamaarachchi 50 Aug 1, 2022
A C++ static library offering a clean and simple interface to the 7-zip DLLs.

bit7z A C++ static library offering a clean and simple interface to the 7-zip DLLs Supported Features • Getting Started • Download • Requirements • Bu

Riccardo 281 Aug 15, 2022
miniz: Single C source file zlib-replacement library, originally from code.google.com/p/miniz

Miniz Miniz is a lossless, high performance data compression library in a single source file that implements the zlib (RFC 1950) and Deflate (RFC 1951

Rich Geldreich 1.5k Aug 8, 2022
Fork of the popular zip manipulation library found in the zlib distribution.

minizip-ng 3.0.0 minizip-ng is a zip manipulation library written in C that is supported on Windows, macOS, and Linux. Developed and maintained by Nat

zlib-ng 930 Aug 11, 2022
Fork of the popular zip manipulation library found in the zlib distribution.

minizip-ng 3.0.1 minizip-ng is a zip manipulation library written in C that is supported on Windows, macOS, and Linux. Developed and maintained by Nat

zlib-ng 929 Aug 5, 2022