Random access array of tightly packed unsigned integers

Overview

PackedArray: random access array of tightly packed unsigned integers

Build Status

TLDR

PackedArray comes to the rescue when you're in a desperate need for an uint9_t or uint17_t array.

What?

When you want to hold an unordered sequence of unsigned integers into memory, the C programming language lets you choose among 4 data types:

  • uint8_t
  • uint16_t
  • uint32_t
  • uint64_t

If your numbers are within the [0, 100000] range, only 17 bits per integer are needed since 217 = 131072. However, you can't use an array of uint16_t because 16 bits are not enough to store numbers between 65536 and 100000. When you use the next available type, uint32_t, you're wasting 15 bits per integer which represents a 47% overhead in terms of storage requirements.

PackedArray saves memory by packing integers/items together at the bit-level:

b0 b1 b2 ...
i0 i1 i2 i3 i4 i5 i6 i7 i8 i9 ...

A PackedArray is backed by an uint32_t buffer. Several items end up being stored inside the same buffer cell, e.g. i0, i1, and i2. Some items span two buffer cells, e.g. i3, and i7. PackedArray is responsible for encoding/decoding items into/from the storage buffer.

PackedArraySIMD is a PackedArray variant that makes use of SSE2 or NEON instructions.

Going SIMD processes integers 4 by 4 but imposes an interleaved layout in the storage buffer.

PackedArraySIMD interleaved layout, 13 bits per item:

b0 b1 b2 b3 ...
i0 i4 i8a i1 i5 i9a i2 i6 i10a i3 i7 i11a i8b ...

As a consequence, the data layout of PackedArraySIMD isn't compatible with its non SIMD counterpart. In other words, you cannot use PackedArray to unpack data packed with PackedArraySIMD or the other way around.

It is also worth noting the implementations of PackedArraySIMD_pack and PackedArraySIMD_unpack require more plumbing than their non-SIMD counterparts. Additional computations are needed to find out and adjust a data window that can be processed 4 by 4 with SIMD instructions.

PackedArray and PackedArraySIMD are released under the WTFPL v2 license.

For more information, see the PackedArray announcement on my personal website.

Why?

PackedArray is designed as a drop-in replacement for an unsigned integer array. I couldn't find such a data structure in the wild, so I implemented one.

Instead of writing:

uint32_t* a = (uint32_t*)malloc(sizeof(uint32_t) * count);
...
value = a[i];
...
a[j] = value;

You write:

PackedArray* a = PackedArray_create(bitsPerItem, count);
...
value = PackedArray_get(a, i);
...
PackedArray_set(a, j, value);

The PackedArray_computeBitsPerItem helper scans a uint32_t array and returns the number of bits needed to create a PackedArray capable of holding its content.

There are also PackedArray_pack and PackedArray_unpack that operate on several items in a row. Those two could really have been named PackedArray_write and PackedArray_read but I decided "pack" / "unpack" conveys better something is happening under the hood.

// bulk packing / unpacking
PackedArray_pack(a, j, in, count);
PackedArray_unpack(a, j, out, count);

// the following are semantically equivalent
PackedArray_set(a, j, value);
PackedArray_pack(a, j, &value, 1);

value = PackedArray_get(a, i);
PackedArray_unpack(a, i, &value, 1);

Compiling

In order to use PackedArray or PackedArraySIMD in your own project, you just have to bring in the two PackedArray.h and PackedArray.c (or PackedArraySIMD.c) files. It's that simple.

You can customize PackedArray.c's behavior by defining the following macros:

  • PACKEDARRAY_ASSERT
  • PACKEDARRAY_MALLOC
  • PACKEDARARY_FREE

You can customize PackedArraySIMD.c's behavior by defining the following macros:

  • PACKEDARRAY_ASSERT
  • PACKEDARRAY_ALIGNED_MALLOC
  • PACKEDARARY_FREE

PackedArray.c and PackedArraySIMD.c can compile themselves into either a test program or a micro-benchmark. For that, you have to use one of the following preprocessor directives:

  • PACKEDARRAY_SELF_TEST
  • PACKEDARRAY_SELF_BENCH

For example, from command line:

$ cc -o PackedArraySelfTest -DPACKEDARRAY_SELF_TEST -O2 -g PackedArray.c
$ cc -o PackedArraySelfBench -DPACKEDARRAY_SELF_BENCH -DNDEBUG -O2 -g PackedArray.c

$ cc -o PackedArraySIMDSelfTest -DPACKEDARRAY_SELF_TEST -O2 -g PackedArraySIMD.c
$ cc -o PackedArraySIMDSelfBench -DPACKEDARRAY_SELF_BENCH -DNDEBUG -O2 -g PackedArraySIMD.c

Compiling for Windows

There is a Visual Studio 2012 solution in the _win-vs11/ folder.

Compiling for Linux or Mac

There is a GNU Make 3.81 MakeFile in the _gnu-make/ folder:

$ make -C _gnu-make/

Compiling for Mac

See above if you want to compile from command line. Otherwise there is an Xcode project located in the _mac-xcode/ folder.

Compiling for iOS

There is an Xcode project located in the _ios-xcode/ folder.

If you prefer compiling from command line and deploying to a jailbroken device through SSH, use:

$ make -C _gnu-make/ binsubdir=ios CC="$(xcrun --sdk iphoneos --find clang) -isysroot $(xcrun --sdk iphoneos --show-sdk-path) -arch armv7 -arch armv7s -arch arm64" postbuild="codesign -s 'iPhone Developer'"

Compiling for Android

You will have to install the Android NDK, and point the $NDK_ROOT environment variable to the NDK path: e.g. export NDK_ROOT=/opt/android-ndk (without a trailing / character).

Next, the easy way is to make a standalone Android toolchain with the following command:

$ $NDK_ROOT/build/tools/make-standalone-toolchain.sh --system=$(uname -s | tr [A-Z] [a-z])-$(uname -m) --platform=android-3 --toolchain=arm-linux-androideabi-clang3.3 --install-dir=/tmp/android-clang

Now you can compile the self test and self benchmark programs by running:

$ make -C _gnu-make/ binsubdir=android CC=/tmp/android-clang/bin/clang CFLAGS='-march=armv7-a -mfloat-abi=softfp -mfpu=neon -O2'

Implementation details, what the hell is going on?

First, in PackedArray.c or PackedArraySIMD.c, everything that comes below the - 8< ---- marker is the code for the self test and self micro-benchmark programs and can be discarded if you really want to:

If you want to cut down your anxiety, you can use the provided GNU Makefile and invoke:

$ make -C _gnu-make/ cut

This produces the PackedArray.cut.c and PackedArraySIMD.cut.c files.

You may also be troubled by PackedArray.c and PackedArraySIMD.c including themselves with #include PACKEDARRAY_SELF. By combining preprocessing tricks and including themselves, PackedArray.c and PackedArraySIMD.c "generate the code" for the unrolled pack and unpack implementations.

By default PACKEDARRAY_SELF is defined to "PackedArray.c" which assumes the compiler is going to look for the file in the same directory as the file from which the #include statement is being evaluated. This helps compiling when the build system refers to the source files with relative paths. Depending on your compiler/build system combination you may want to override PACKEDARRAY_SELF to __FILE__.

If you want to see the generated code, you can use the provided GNU Makefile and invoke:

$ make -C _gnu-make/ preprocess

This produces the PackedArray.pp.c and PackedArraySIMD.pp.c files.


If you find PackedArray or PackedArraySIMD useful and decide to use it in your own projects please drop me a line @gpakosz.

If you use it in a commercial project, consider using Gittip.

Comments
  • Fix

    Fix "unused function" warning seen with gcc.

    highestBitSet does not use log2 when using a newish Microsoft compiler or gcc. Therefore log2 needs to be defined only when neither using a newish Microsoft compiler, nor gcc. De Morgan's laws in action!

    opened by prideout 3
  • Thread safety

    Thread safety

    Posting for the sake of others information. I attempted to use an array of mutexes to guard segments of a PackedArray to reduce the delay caused by threads waiting single-file to write to the PackedArray. This approach works fine with normal arrays. However with the PackedArray I was getting inconstant results.

    This suggests that any invocation of the PackedArray_get / PackedArray_set should be guarded by a mutex, even if the individual index is guarded.

    opened by guinn8 2
  • 0 Initalized packed array buffer

    0 Initalized packed array buffer

    I have 0 initialized the PackedArray buffer. I was valrgrind'ing around in my application and was getting Conditional jump or move depends on uninitialised value(s) when using a 1- bit packed array buffer and was getting incorrect results. This fix resolves the issue. I reran the tests and everything is passing. This is a fix for #6

    Bug can be reproduced in this commit with this input aliquot/_pom_yang_rewrite/ make && valgrind ./main --bound=$((10**6)) --seg_len=$((5 * $((10**3)))) --writebuf_len=1000000 --preimage_count_bits=1 --num_threads=12

    opened by guinn8 5
  • Inconsistent results with 1-bit packed array

    Inconsistent results with 1-bit packed array

    I only really know the symptoms of the (possible) bug, no idea about the cause. I'll keep my description high level so others know the gist of the issue, happy to answer additional questions.

    I get different results from my program when I use PackedArray *f = PackedArray_create(1, len); vs PackedArray *f = PackedArray_create(2, len);. The only line of code modifying the array is PackedArray_set(f, offset, 1); and as such the number of bits should not matter.

    opened by guinn8 1
  • A question about __PackedArray_pack_ code

    A question about __PackedArray_pack_ code

    Hi @gpakosz ,

    Greeting from me!

    I am reading PackedArray's code, and come across the following statement:

    packed |= *out & ~((uint32_t)(1ULL << ((((uint64_t)count * (uint64_t)PACKEDARRAY_IMPL_BITS_PER_ITEM + startBit - 1) % 32) + 1)) - 1);
    

    Why not just use:

    packed |= *out & ~((uint32_t)(1ULL << (((uint64_t)count * (uint64_t)PACKEDARRAY_IMPL_BITS_PER_ITEM + startBit) % 32)) - 1)
    

    What is the difference between these two methods?

    Thanks very much in advance!

    Bets Regards Nan Xiao

    question 
    opened by NanXiao 3
  • Effect of PACKEDARRAY_* macros?

    Effect of PACKEDARRAY_* macros?

    The documentation lists several macros which are supposed to customize the behaviour of the library. However, it doesn't specify what exactly this does. Would you be able to explain (and maybe include it)?

    opened by kozross 1
Owner
Gregory Pakosz
"If You're Not Appearing, You're Disappearing” — Art Blakey
Gregory Pakosz
Given two arrays X and Y of positive integers, find the number of pairs such that x^y > y^x (raised to power of) where x is an element from X and y is an element from Y.

Number-of-pairs Given two arrays X and Y of positive integers, find the number of pairs such that x^y > y^x (raised to power of) where x is an element

Rajesh Kumar Sah 1 Nov 20, 2021
libsais is a library for linear time suffix array and burrows wheeler transform construction based on induced sorting algorithm.

libsais libsais is a library for fast (see Benchmarks below) linear time suffix array and Burrows-Wheeler transform construction based on induced sort

Ilya Grebnov 112 Dec 22, 2022
A collection of array rotation algorithms.

The most commonly used rotation algorithms (aka block swaps) were documented around 1981 and haven't notably changed since. Below I'll describe severa

null 117 Dec 26, 2022
This repo is full with code from random c++ projects. Anyone can contribute

Randomcplusplus Learn c++ This repo is for random c++ code that can be used for any project Learn how to code in c++ by using and reading peoples exam

Thomas Kerby 2 Jan 30, 2022
This algorithm is amazing and take a high performance to search something under array.

Sequential Binary Algorithm O(n) Algoritmo Este é um algoritmo de complexidade O(log n), que possui uma alta performance em percorrer um vetor de inte

Guilherme Serafim Kollet 1 Oct 26, 2021
Structure-of-array synthesis in C++20

Structure-of-array synthesis in C++20

celtera 62 Nov 7, 2022
Simple PRNG (Pseudo random number generator) i wrote in C

Floian Simple PRNG i wrote in C (Kinda trash actually). Kinda same with Xorshift32. For speed i didn't test it yet, maybe later. Anyone can use and mo

Natsuo 1 Nov 8, 2021
C implementation of Random Depth-first search algorithm with back tracking

Maze Generating Algorithm This is an implementation of Randomized depth-first search algorithm. It creates a maze by taking a random valid path and co

Jonas STIRNEMANN 2 May 2, 2022
Data Structure Studying Group: array, linked list, stack, queue, deque, tree, graph. Summary of these concepts are descripted in the markdown files.

DATA_STRUCTURE STUDY ?? CURRICULUM ✔️ Day1 array linked list ✔️ Day2 circular linked list double linked list polynomial addtion reverse linked list ✔️

Soyeon Kim 3 Jul 22, 2022
Display array is a board that sets 6 ST7735 display with a resolution of 80x160px in a linear array sharing the clock, data, rs, backlight pins together

The display array is a board that sets 6 ST7735 display with a resolution of 80x160px in a linear array sharing the clock, data, rs, backlight pins together, and leaving individual access to the cs lines of each display, This board allows you to display images with a resolution of 480x160px.

Josue Alejandro Gutierrez 70 Dec 19, 2022
x64 Windows PatchGuard bypass, register process-creation callbacks from unsigned code

NoPatchGuardCallback x64 Windows PatchGuard bypass, register process-creation callbacks from unsigned code Read: https://www.godeye.club/2021/05/22/00

Kento Oki 139 Dec 26, 2022
x64 Windows kernel driver mapper, inject unsigned driver using anycall

anymapper x64 Windows kernel driver mapper, inject unsigned driver using anycall This project is WIP. Todo Fix: Can't make API calls from IAT nor func

Kento Oki 72 Dec 26, 2022
Flexible, portable, high-performance bit fields C++ library. unsigned a:13 becomes F<13> a;

C-plus-plus-library-bit-fields Flexible, portible, high-performance bit fields C++ library. The bit fields are specified with a dummy structure where

Walt Karas 27 Oct 12, 2022
An implementation of a weak handle interface to a packed vector in C++

Experimental handle container in C++ Overview Following on from c-handle-container, this library builds on the same ideas but supports a dynamic numbe

Tom Hulton-Harrop 13 Nov 26, 2022
PageBuster - dump all executable pages of packed processes.

PageBuster Ever wanted to dump all the executable pages of a process? Do you crave something capable of dealing with packed processes? We've got you c

rev.ng 188 Oct 10, 2022
A utility to fix intentionally corrupted UPX packed files.

UPX Fixer Some C code to repair corrupt p_info header on UPX! packed malware. It fixes two variants I found that were pretty common. There are many ot

Larry W. Cashdollar 52 Oct 14, 2022
Simple font renderer library written in Opengl 3.3 using stb_truetype.h to load a packed bitmap into texture of a .ttf font.

mv_easy_font Simple font renderer library written in Opengl 3.3 using stb_truetype.h to load a packed bitmap into texture of a .ttf font. Uses instanc

null 27 May 13, 2022
A simple C library for compressing lists of integers using binary packing

The SIMDComp library A simple C library for compressing lists of integers using binary packing and SIMD instructions. The assumption is either that yo

Daniel Lemire 409 Dec 22, 2022
A C++20 implementation of safe (wrap around) integers following MISRA C++ rules

PSsimplesafeint A C++20 implementation of safe (wrap around) integers following MISRA C++ rules. I might also work on a C++17 implementation, but this

Peter Sommerlad 37 Aug 15, 2022