libsrt is a C library for writing fast and safe C code, faster.

Overview

master branch: Build Status (gcc/clang/tcc) Build Status (Ubuntu 16) Build Status (vs win32)

master branch analysis: Build Status

documentation (generated by doing ./make_test.sh 8): HTML

libsrt has been included into Paul Hsieh's String Library Comparisons table. What a great honor! :-)

libsrt: Safe Real-Time library for the C programming language

libsrt is a C library that provides string, vector, bit set, set, map, hash set, and hash map handling. It's been designed for avoiding explicit memory management when using dynamic-size data structures, allowing safe and expressive code, while keeping high performance. It is also suitable for low level and hard real-time applications, because functions are predictable in both space and time (assuming OS and underlying C library is also real-time suitable).

Key points:

  • Safe: write high-level-like C code. Write code faster and safer.
  • Fast: using O(n)/O(log n)/O(1) state of the art algorithms.
  • Rich: strings supporting raw data handling (per-byte), vector, bit set, set, map, hash set, and hash map data structures.
  • Efficient: space optimized, minimum allocation calls (heap and stack support for ALL data structures).
  • Compatible: OS-independent (e.g. built-in space-optimized UTF-8 support).
  • Predictable: intended for both general purpose and for hard real-time applications.
  • Unicode: although string internal representation is raw data (bytes), functions for handling Unicode interpretation/generation/transformation are provided, so when Unicode-specific functions are used, the result of these functions is stored internally as UTF-8 (also, caching some operations, like Unicode string length -e.g. ss_len()/ss_size() give length in bytes, and ss_len_u() the length in Unicode characters-).

How to build

  • In most POSIX systems (e.g. Linux, Unix, and other Unix-like) "make -f Makefile.posix" will be enough, as the Makefile.posix includes platform detection for corner cases. However, you can use multiple flags for tuning the build (you can also mix them):

    • make -f Makefile.posix # Defaults (equivalent to make ADD_CFLAGS="-DS_CRC32_SLC=12")
    • make -f Makefile.posix -j 8 # Make, spawning up to 8 concurrent jobs (faster on multiprocessor systems)
    • make -f Makefile.posix DEBUG=1 # Disable optimizations (-O0 -ggdb)
    • make -f Makefile.posix PROFILING=1 # Profiling and coverage flags
    • make -f Makefile.posix CC=gcc PEDANTIC=1 C99=1 # Pedantic build for GCC in C99 mode (this should give no warnings)
    • make -f Makefile.posix CC=tcc # Use the Tiny C Compiler
    • make -f Makefile.posix CC=gcc CXX=g++ # Use gcc for the library and examples and g++ for the benchmark
    • make -f Makefile.posix CC=clang CXX=clang++ # Use CLang for the library and examples and CLang C++ for the benchmark
    • make -f Makefile.posix FORCE32=1 # Force 32 bit build on 64 bit system
    • make -f Makefile.posix MINIMAL=1 # Microcontroller-suitable build (optimize for size and low memory usage)
    • make -f Makefile.posix CC=gcc PROFILING=1 # Build with gcc and profiling
    • make -f Makefile.posix CC=gcc C90=1 # Build with gcc using C89/90 standard
    • make -f Makefile.posix CC=gcc C99=1 # Build with gcc using C99 standard
    • make -f Makefile.posix CC=gcc C11=1 # Build with gcc using C11 standard
    • make -f Makefile.posix CC=g++ # Build with g++ (C++ instead of C)
    • make -f Makefile.posix CC=clang++ CPP11=1 # Build with clang++ using C++11 standard (instead of C)
    • make -f Makefile.posix CC=clang CXX=clang++ C99=1 CPP11=1 # Build with clang using C99 mode and build the benchmark using C++11
    • make -f Makefile.posix CC=tcc DEBUG=1 # Build with TinyCC with debug symbols
    • make -f Makefile.posix CPP11=1 bench # Build the benchmark including C++11 support (for comparing libsrt vs C++/STL hash map/set)
    • make -f Makefile.posix CC=powerpc-linux-gnu-gcc # Build with gcc cross compiler (PPC)
    • make -f Makefile.posix CC=arm-linux-gnueabi-gcc # Build with gcc cross compiler (ARM)
    • make -f Makefile.posix ADD_CFLAGS="-DS_CRC32_SLC=0" # Build without CRC32 hash tables, 1 bit/loop (100MB/s on [email protected])
    • make -f Makefile.posix ADD_CFLAGS="-DS_CRC32_SLC=1" # Build with CRC32 1024 byte hash table, 1 byte/loop (400MB/s on [email protected])
    • make -f Makefile.posix ADD_CFLAGS="-DS_CRC32_SLC=4" # Build with CRC32 4096 byte hash table, 4 bytes/loop (1000MB/s on [email protected])
    • make -f Makefile.posix ADD_CFLAGS="-DS_CRC32_SLC=8" # Build with CRC32 8192 byte hash table, 8 bytes/loop (2000MB/s on [email protected])
    • make -f Makefile.posix ADD_CFLAGS="-DS_CRC32_SLC=12" # Build with CRC32 12288 byte hash table, 12 bytes/loop (2500MB/s on [email protected]) -this is the default CRC32 mode-
    • make -f Makefile.posix ADD_CFLAGS="-DS_CRC32_SLC=16" # Build with CRC32 16384 byte hash table, 16 bytes/loop (2700MB/s on [email protected]):
    • make -f Makefile.posix ADD_FLAGS=-DSD_DISABLE_HEURISTIC_GROWTH # Build with growth heuristics disabled (not recommended)
    • make -f Makefile.posix ADD_FLAGS=-DS_DISABLE_SM_STRING_OPTIMIZATION # Build without map string optimizations (not recommended, except for benchmarking)
    • make -f Makefile.posix HAS_PNG=1 HAS_JPG=1 # Build enabling PNG and JPG usage so the 'imgc' example can convert import/export those formats (libpng and jpeg 6b -e.g. libjpegturbo- compatible dev libs and headers must be installed in the system)
  • Observations

    • Every make call, in addition to building the targets, it does a full test for that build (unit tests covering all the API function calls -'stest' executable-)
    • Between make calls with different parameters, please use "make clean"
    • It should build with any GCC version >= 2.95.2 (1999), in any hardware platform (x86, x86-64, MIPS32/64, MIPSLE, PPC/PPC64, ARM, etc.)
    • CLang should work in all versions (except in cases of reporting wrong/missing compiler or C standard version)
    • For Windows it is provided a .sln example just for building the test (used for the CI check)
    • In BSD systems not using GNU Make as default, use gmake instead of make
  • For launching the extensive tests:

    • ./make_test.sh # All tests (used in the CI validation): all builds (23 * 4 tests), Valgrind memcheck, CLang static analyzer, documentation generation and validation, coding style check
    • ./make_test.sh 1 # Validate all available C/C++ builds
    • ./make_test.sh 2 # Valgrind memcheck
    • ./make_test.sh 4 # Clang static analyzer
    • ./make_test.sh 8 # Generate documentation
    • ./make_test.sh 16 # Check coding style
    • ./make_test.sh 24 # Like 8 plus 16 (3 would be like 1 plus 2, etc.)
  • A preliminary Autoconf/Automake build is also provided:

    • ./bootstrap.sh
    • ./configure
    • make
    • make check
  • A Visual Studio project is provided in win/vs2013/ for running the test through AppVeyor's CI.

Generic advantages

  • Ease of use

    • Use strings, vectors, bit sets, sets, maps, hash sets, and hash maps in a similar way to higher level languages.
  • Space-optimized

    • Dynamic one-block linear addressing space.
    • Internal structures use indexes instead of pointers (i.e. similar memory usage in both 32 and 64 bit mode).
    • Details: doc/benchmarks.md
  • Time-optimized

    • Buffer direct access
    • Preallocation hints (reducing memory allocation calls)
    • Heap and stack memory allocation support
    • Details: doc/benchmarks.md
  • Predictable (suitable for hard and soft real-time)

    • Predictable execution speed: all API calls have documented time complexity. Also space complexity, when extra space involving dynamic memory is required.
    • Hard real-time: allocating maximum size (strings, vector, bit set, set, map, hash set, hash map) will imply calling 0 or 1 times to the malloc() function (C library). Zero times if using the stack as memory or externally allocated memory pool, and one if using the heap. This is important because malloc() implementation has both memory and time overhead, as internally manages a pool of free memory (which could have O(1), O(log(n)), or even O(n), depending on the compiler provider C library implementation).
    • Soft real-time: logarithmic time memory allocation (when not doing preallocation): for 10^n new elements just log(n) calls to the malloc() function. This is not a guarantee for hard real time, but for soft real time (being careful could provide almost same results, except in the case of very poor malloc() implementation in the C library being used -not a problem with modern compilers-).
  • RAM, ROM, and disk operation

    • Data structures can be stored in ROM memory.
    • Data structures are suitable for memory mapped operation, and disk store/restore. This is true for strings, vectors, and bit sets, and for sets/maps/hash sets/hash maps when using integer data and when using small strings (<= 19 bytes for S/SI/IS data types, and <= 54 bytes for SS).
  • Known edge case behavior

    • Allowing both "carefree code" and per-operation error check. I.e. memory errors and UTF8 format error can be checked after every operation.
  • Implementation

    • Simple internal structures, with linear addressing. It allows to reduce memory allocation calls to the minimum (using the stack it is possible to avoid heap usage).
    • Implementation kept as simple as possible/known.
    • Focus in avoiding code-bloat, avoiding expensive code duplication/inlining (inlining is used when being "free" or because of big speed impact)
    • Small code, suitable for static linking. In fact, currently no dynamic linking is provided (will be added, but it is not a priority).
    • Coverage, profiling, and memory leak/overflow/uninitialized checks (Valgrind, CLang static analysis, Coverity, MS VS).
  • Compatibility

    • C99 (and later) compatible (requiring alloca() function)
    • C++11 (and later) compatible (requiring alloca() function)
    • Endian-safe: the library works with any CPU endianess (tested with little and big endian, but it should work with other modes, e.g. with PDP endianess -not tested-)
    • POSIX builds (Linux, BSD's) require GNU Make (e.g. on FreeBSD use 'gmake' instead of 'make')
    • Compatibility with old C compilers:
      • Oldest GNU C compiler being tested is GCC 2.95.2 (1999)
      • Other old compilers may work in C89 mode only if following language extensions are available:
        • __VA_ARGS__ macro
        • alloca()
        • type of bit-field in 'struct'
        • %S printf extension (only for unit testing)
    • Compatibility with old C++ compilers:
      • It may work in C++98 mode only if following language extensions are available:
        • Anonymous variadic macros
        • Long long integer constant support (LL)

Generic disadvantages/limitations

  • Double pointer usage: because of using just one allocation, write operations require to address a double pointer, so in the case of reallocation the source pointer could be changed.
  • Concurrent read-only operations are safe, but concurrent read/write must be protected by the user (e.g. using mutexes or spinlocks). That can be seen as a disadvantage or as a "feature" (it is faster).

String-specific advantages (srt_string)

  • Binary data support
    • I.e. strings can have zeros in the middle
    • Search and replace into binary data is supported
  • Unicode support
    • Although strings internal storage is binary, Unicode-aware functions store data in UTF-8.
    • Search and replace into UTF-8 data is supported
    • Full and fast Unicode lowercase/uppercase support without requiring "setlocale" nor hash tables.
  • Efficient raw and Unicode (UTF-8) handling. Unicode size is tracked, so resulting operations with cached Unicode size, will keep that, keeping the O(1) for getting that information afterwards.
    • Find/search: O(n), one pass.
    • Replace: O(n), one pass. Worst case overhead is limited to a realloc and a copy of the part already processed.
    • Concatenation: O(n), one pass for multiple concatenation. I.e. Optimal concatenation of multiple elements require just one allocation, which is computed before the concatenation. When concatenating ss_t strings the allocation size compute time is O(1).
    • Resize: O(n) for worst case (when requiring reallocation for extra space. O(n) for resize giving as cut indicator the number of Unicode characters. O(1) for cutting raw data (bytes).
    • Case conversion: O(n), one pass, using the same input if case conversion requires no allocation over current string capacity. If resize is required, in order to keep O(n) complexity, the string is scanned for computing required size. After that, the conversion outputs to the secondary string. Before returning, the output string replaces the input, and the input becomes freed.
    • Avoid double copies for I/O (read access, write reserve)
    • Avoid re-scan (e.g. commands with offset for random access)
    • Transformation operations are supported in all dup/cpy/cat functions, in order to both increase expressiveness and avoid unnecessary copies (e.g. tolower, erase, replace, etc.). E.g. you can both convert to lower a string in the same container, or copy/concatenate to another container.
  • Space-optimized
    • Using just 4 byte overhead for strings with size <= 255 bytes
    • Using sizeof(size_t) * 5 byte overhead for strings with size >= 256 bytes (e.g. 20 bytes for a 32-bit CPU, 40 for 64-bit)
    • Data structure has no pointers, i.e. just one allocation is required for building a string. Or zero, if using the stack.
    • No additional memory allocation for search.
    • Extra memory allocation may be required for: UTF-8 uppercase/lowercase and replace.
    • Strings can grow from 0 bytes to ((size_t)~0 - metainfo_size)
  • String operations
    • Copy, cat, tolower/toupper, find, split, printf, cmp, base64, data compression, crc32 on buffers, etc.
    • All string operations allow C strings and raw buffers as input, without extra copies (ss_[c]refa functions)
    • Allocation, buffer pre-reserve,
    • Raw binary content is allowed, including 0's.
    • "Wide char" and "C style" strings R/W interoperability support.
    • I/O helpers: buffer read, reserve space for async write
    • Aliasing suport, e.g. ss_cat(&a, a) is valid
  • Misc string/buffer operations:
    • Real-time O(n) data compression (stateless, unlimited buffer size, and hash table resource usage proportional to the input size, i.e. efficient also for small inputs)
    • State of the art encoding: base64, hexadecimal, etc. (at GB/s speeds)
    • State of the art CRC32 and Adler32 hashes on strings (at >2 GB/s speeds)
  • Focus on reducing verbosity:
    • ss_cat(&t, s1, ..., sN);
    • ss_cat(&t, s1, s2, ss_printf(&s3, "%i", cnt), ..., sN);
    • ss_free(&s1, &s2, ..., &sN);
    • Expressive code without explicit memory handling
  • Focus on reducing errors, e.g.
    • If a string operation fails, the string is kept in the last successful state (e.g. ss_cat(&a, b, huge_string, other))
    • String operations always return valid strings, e.g. This is OK: srt_string *s = NULL; ss_cpy_c(&s, "a"); Same behavior as: srt_string *s = ss_dup_c("a");
    • ss_free(&s1, ..., &sN); (no manual set to NULL is required)

String-specific disadvantages/limitations

  • No reference counting support. Rationale: simplicity.

Vector-specific advantages (srt_vector)

  • Variable-length concatenation and push functions.
  • Allow explicit size for allocation (8, 16, 32, 64 bits) with size-agnostic generic signed/unsigned functions (easier coding).
  • Allow variable-size generic element.
  • Sorting
    • O(n) for 8-bit elements (counting sort algorithm), much faster than GNU/Clang qsort() (C), and up to 5x faster than GNU/Clang std::vector sort (C++)
    • O(n log n) -pseudo O(n)- for 16/32/64-bit elements (in-place MSD binary radix sort algorithm), 2x-3x faster than GNU/Clang qsort() (C), performing similar to GNU/Clang std::vector sort (C++)
    • O(n log n) for generic elements using the C library (qsort())

Vector-specific disadvantages/limitations

  • No insert function. Rationale: insert is slow (O(n)). Could be added, if someone asks for it.

Set and map advantages (srt_set and srt_map)

  • Abstraction over Red-Black tree implementation using linear memory pool with just 8 byte per node overhead, allowing up to (2^32)-1 nodes (for both 32 an 64 bit compilers). E.g. for a key-value map, one million 32 bit key, 32 bit value map will take just 16MB of memory (16 bytes per element -8 byte metadata, 4 + 4 byte data-).
  • Keys: integer (8, 16, 32, 64 bits) and string (ss_t)
  • Values: integer (8, 16, 32, 64 bits), string (ss_t), and pointer
  • O(1) for allocation
  • O(1) for deleting without strings (one or zero calls to 'free' C function)
  • O(n) for deleting with strings (n + one or zero calls to 'free' C function)
  • O(n) for copy (in case of without strings, would be as fast as a memcpy())
  • O(log n) insert, search, delete
  • O(n) sorted enumeration (amortized O(n log n))
  • O(n) unsorted enumeration (faster than the sorted case)
  • O(n) copy: tree structure is copied as fast as a memcpy(). For types involving strings, additional allocation is used for duplicating strings.
  • Short string optimization so strings up to 18 bytes can fit in the node for (SI, IS, SP maps, and S sets), and up to 54 bytes combined for string-string maps (SS type). Short strings require no extra allocation/de-allocation calls.

Set and map disadvantages/limitations (srt_set and srt_map)

  • Because of being implemented as a tree, it can be slower than a hash-map on average, specially on integer data. However, total execution time is not that bad, as because of allocation heuristics a lot of calls to the allocator are avoided.
  • There is room for node deletion speed-up (currently deletion is a bit slower than insertion, because of an additional tree search used for avoiding having memory fragmentation, as implementation guarantees linear/compacted memory usage, it could be optimized with a small LRU for cases of multiple delete/insert operation mix).

Hash set and hash map advantages (srt_hset and srt_hmap)

  • Implemented using open-addressing hash table, using linear memory pool with 12 byte per bucket overhead, allowing up to (2^32)-1 nodes (for both 32 an 64 bit compilers). E.g. for a key-value hash map, one million 32 bit key, 32 bit value map will take just 20MB of memory (20 bytes per insertion -12 byte for the hash table bucket, 4 + 4 byte data-).
  • Keys: integer (8, 16, 32, 64 bits) and string (ss_t)
  • Values: integer (8, 16, 32, 64 bits), string (ss_t), and pointer
  • O(1) for allocation
  • O(1) for deleting without strings (one or zero calls to 'free' C function)
  • O(n) for deleting with strings (n + one or zero calls to 'free' C function)
  • O(n) for map copy (in case of maps without strings, would be as fast as a memcpy())
  • O(n) -O(1) amortized- insert, search, delete
  • O(n) unsorted enumeration
  • O(n) copy: hash table structure and data elements are copied as fast as a memcpy(). For map types involving strings, additional allocation is used for duplicating strings.
  • Short string optimization so strings up to 18 bytes can fit in the node for (SI, IS, SP maps, and S sets), and up to 54 bytes combined for string-string maps (SS type). Short strings require no extra allocation/de-allocation calls.

Hash set and hash map disadvantages/limitations (srt_hset and srt_hmap)

  • Because of being implemented as a hash table, if not pre-reserved, rehash adds latency.

Test-covered platforms

ISA Word size Endianess Unaligned memory access HW support OS Compilers Code analysis Test coverage
x86, x86-64 (Core i5) 32, 64 little yes Linux Ubuntu 12.04/14.04/17.10 gcc, g++, tcc, clang, clang++ Valgrind, clang, Coverity Travis CI (automatic, every public commit)
x86, x86-64 (Core i5) 32, 64 little yes Windows Visual Studio Express 2013, AppVeyor's VS VS AppVeyor (automatic, every public commit)
x86, x86-64 (Core i5) 32, 64 little yes FreeBSD 10.2 gcc, g++, clang, clang++ Valgrind clang manual
x86, x86-64 (Core2Duo) 32, 64 little yes Darwin 11.4.2 gcc, g++, clang, clang++ none manual
ARMv5 (ARM926EJ-S) 32 little no Arch Linux gcc, g++, clang, clang++ none manual
ARMv5 (ARM926EJ-S) 32 little no Linux Debian 7.0 "Wheezy" gcc, g++, clang, clang++ none manual
ARMv5 (Feroceon) 32 little no Linux Debian 7.0 "Wheezy" gcc, g++ none manual
ARMv6 (ARM1176JZF-S) 32 little yes Linux Raspbian gcc, g++, clang, clang++ Valgrind, clang manual
ARMv7-A (Krait 400) 32 little yes Linux Android 5.1.1 + BusyBox gcc, g++ none manual
ARMv8-A (Cortex A53) 64 little yes Debian 8.5 "Jessie" gcc, g++, clang, clang++ Valgrind, clang manual
ARMv8-A (MSM8996) 64 little yes Linux Android 7.1.1 clang, clang++ clang manual
MIPS, MIPS64 (Octeon) 32, 64 big yes EdgeOS v1.6.0 (Linux Vyatta-based using Debian 7 "Wheezy" packages) gcc, g++, clang, clang++ Valgrind, clang manual
MIPS (EE R5900) 32 little no Playstation 2 Linux (Red Hat based) gcc, g++ 2.95.2 none manual
PowerPC (G4) 32 big yes Linux Ubuntu 12.04 gcc, g++ none manual
PowerPC, PPC64 (Cell) 32, 64 big yes Yellow Dog Linux 6.2 for Playstation 3 (Red Hat based) gcc, g++ 4.1.2 none manual

License

Copyright (c) 2015-2019, F. Aragon. All rights reserved. Released under the BSD 3-Clause License (see the LICENSE file included).

Contact

email: faragon.github (GMail account, add @gmail.com)

Other

Status

Beta. API still can change: suggestions are welcome.

Acknowledgements and references

Check doc/references.md

Comments
  • src/saux/schar.h: fix function prototype

    src/saux/schar.h: fix function prototype

    Fixing incompatibility of parameters type between declaration and definition of the function.

    Signed-off-by: Viviana Quirama [email protected]

    opened by viquiram 5
  • Changing default allocators to support rpmalloc

    Changing default allocators to support rpmalloc

    I've been experimenting with different malloc replacement and so far I think that the best one for my use case is rpmalloc.

    This is how I tried to use it with libsrt.

    In src/saux/sconfig.h:

    #ifndef S_MALLOC
    #define _s_malloc rpmalloc
    #endif
    #ifndef S_CALLOC
    #define _s_calloc rpcalloc
    #endif
    #ifndef S_REALLOC
    #define _s_realloc rprealloc
    #endif
    #ifndef S_FREE
    #define _s_free rpfree
    #endif
    

    Not that I'm having any issue, but have I missed anything?

    P.S. Happy new year!

    opened by drkameleon 3
  • Add string (ss_t) missing tests

    Add string (ss_t) missing tests

    ss_set_size, ss_real_off, ss_alloc_errors, ss_encoding_errors, ss_clear_errors, ss_cpy_wn, ss_enc_b64, ss_enc_hex, ss_enc_HEX, ss_enc_esc_json, ss_enc_esc_xml, ss_enc_esc_url, ss_enc_esc_dquote, ss_enc_esc_squote, ss_dec_b64, ss_dec_hex, ss_dec_esc_json, ss_dec_esc_xml, ss_dec_esc_url, ss_dec_esc_dquote, ss_dec_esc_squote, ss_findb, ss_findbm, ss_findc, ss_findnb, ss_find_cn, ss_findr, ss_findrb, ss_findrbm, ss_findrc, ss_findrnb, ss_findr_cn

    opened by faragon 3
  • Allocation bug in sv_* when running out of memory

    Allocation bug in sv_* when running out of memory

    How to reproduce:

    • System with low memory without swap enabled (e.g. 128, 256, or 512MB RAM system)
    • Run the "counter" example with "4 0" input parameters, e.g. ./counter 4 0 < input

    It crashes because accessing a location that should be allocated, but it is not (according to the allocation logic, it should, but it is not; it looks like a subtle bug). I don't think is going to be the case, but can not be discarded being a bug in the C library (bug found in one 32-bit i686 Linux system with eglibc 2.5 and 512MB of RAM -280MB free-); doing tests in more systems should discard this.

    opened by faragon 3
  • False positive reported by Valgrind (3.13.0) memcheck tests

    False positive reported by Valgrind (3.13.0) memcheck tests

    On Ubuntu 17.10, Valgrind (3.13.0) memcheck tests report an uninitialized memory access, which is a false positive. I've filed two bug tickets, one for glib, and another for Valgrind.

    You can check it running all the tests:

        ./make_test.sh
    

    Or just the Valgrind memcheck tests:

        ./make_test.sh 2
    
    opened by faragon 2
  • sm_t: optimize string data

    sm_t: optimize string data

    Description of the problem:

    Internally, a sm_t node size takes 8 bytes for the red-black tree structure (31 bit for "left", 31 bit for "right", 1 bit for "is_red"), and when the key and/or the associated value is a string currently a pointer to a dynamically allocated ss_t string is used. E.g. for a string key, string value, map, it would take 8 + 2 * sizeof(void ), i.e. 16 bytes for a 32-bit system, and 24 for a 64-bit one (just for the node structure, plus size required for the data in its allocated place(s) (!)). Considering that every dynamic memory allocation (malloc()) it takes from 2 to 4 sizeof(void *), plus at least 5 bytes for the ss_t header, we would be using for the string/string sm_t example on 64-bit mode up to 98 bytes for storing two strings of *zero size string/string (!). Discounting the 8 bytes of the RB tree header, it would be a waste of 90 bytes per node (50 bytes for the 32-bit case). That is a huge internal memory fragmentation issue.

    Solution being considered, composed of three cases:

    • Case A (data embedded in the node): For reducing internal fragmentation, allow defining node size with a fixed size that even in worst case wastes less memory than the pure dynamic memory case. E.g. defining for sm_t types using strings, a node size of 32 bytes. Being the first 8 used for the RB tree information, and remaining 24 bytes for e.g. 2 strings of 7 elements, one 32-bit integer and a string of 15 elements, etc.
    • Case B (indirection, or partial indirection, within sm_t structure): In case of strings larger than the fixed-size space, but below 256 bytes, put in the node one (for key or value) or two (for key and value) redirections to sm_t dynamic-grow reserved area. Using 128 or 256-byte (to be decided) fixed size it would allow to avoid external fragmentation, so in case of deleting one element we could fill the gap with the last element.
    • Case C (unlimited size elements, redirected to dynamic memory): for cases of huge keys/values, we would redirect them to dynamic-allocated storage. This case wouldn't be suitable for memory mapped data (I thought about using file indirections, too, but discarded it, because it would be complicating things too much).

    Additional considerations:

    • Space optimized mode, for WORM ("write once read many", with minimum internal and external fragmentation), and cases of small amount of data deleted (cases where external fragmentation could be ignored).
    opened by faragon 2
  • Big Endian CPU issues

    Big Endian CPU issues

    Although most things are working properly, some issues have been found when testing on MIPS hardware (tested on EdgeRouter Lite, Octeon+ MIPS 64r2 CPU):

    • LZW encoding is not working properly.
    • Locale case conversion not passing the test (this may be a false positive, as the test assume that machine has compliant locales -with the exception of Turkish mode, which requires special handlng-).
    opened by faragon 2
  • docs: fix simple typo, collisons -> collisions

    docs: fix simple typo, collisons -> collisions

    There is a small typo in src/saux/ssearch.h.

    Should read collisions rather than collisons.

    Semi-automated pull request generated by https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md

    opened by timgates42 1
  • What is the fastest way to append items to an existing vector?

    What is the fastest way to append items to an existing vector?

    This is what I'm currently trying:

        srt_vector* vec = sv_alloc_t(SV_U8,100000000);
    
        double start = getCurrentTime();
        Byte* inner_buff = sv_get_buffer(vec);
        for (int i=0; i<100000000; i++) {
            sv_push_u(&vec,(Byte)i);
        }
        
        printf("total time (srt): %g\n",getCurrentTime()-start);
    
        sv_free(&vec);
    

    I've also tried sv_get_buffer after allocating enough memory and setting the elements like vec[x]=i but it doesn't seem to be working.

    Any suggestions?

    opened by drkameleon 1
  • Migrate LGTM.com installation from OAuth to GitHub App

    Migrate LGTM.com installation from OAuth to GitHub App

    Hi There,

    This project is still using an old implementation of LGTM's automated code review, which has now been disabled. To continue using automated code review, and receive checks on your Pull Requests, please install the GitHub App on this repository.

    Thanks, The LGTM Team

    opened by LGTM-badger 1
  • ss_cat_enc_hex not working properly

    ss_cat_enc_hex not working properly

    Case:

    • ss_cat_enc_hex(&tgt, src)
    • src string size: 1 (checked both with heap and stack, same behaviour)

    To do:

    • Fix the bug
    • Update the tests in order to catch that case
    opened by faragon 1
  • Is libsrt thread-safe?

    Is libsrt thread-safe?

    Hello! I'm thinking of adopting this library for my projects, but I'm worried about thread safety. According to Paul Hsieh's String Library Comparisons table, libsrt has Thread Safety: No. Could you please explain what parts of the library are not thread-safe? Thanks!

    opened by seirios 0
  • Use a build system like CMake or SCons

    Use a build system like CMake or SCons

    Using a build system like CMake or SCons would allow more building options and integration. It would also help newer users build and contribute the library.

    CMake is more standard than SCons or even automake and would allow for better integration with most projects. CMake would also allow for better cross-platform and cross-compiler building. Its also rather simple and very powerful allowing new users to very easily use it. https://cmake.org/

    SCons is less mainstream than CMake however it is still very powerful and simple to use. It uses python for the build script allowing you to write very powerful build scripts to build all kinds of programs. https://scons.org/

    opened by jwoodie 1
Cross-platform STL-styled and STL-compatible library with implementing containers, ranges, iterators, type traits and other tools; actors system; type-safe config interface.

Yato A small repository where I'm gatherting useful snippets and abstractions for C++ development. Yato includes 3 main modules: multidimensional cont

Alexey 9 Jul 22, 2022
Library of generic and type safe containers in pure C language (C99 or C11) for a wide collection of container (comparable to the C++ STL).

M*LIB: Generic type-safe Container Library for C language Overview M*LIB (M star lib) is a C library enabling to use generic and type safe container i

PpHd 527 Oct 2, 2022
A library of type safe sets over fixed size collections of types or values, including methods for accessing, modifying, visiting and iterating over those.

cpp_enum_set A library of type safe sets over fixed size collections of types or values, including methods for accessing, modifying, visiting and iter

Carl Dehlin 22 Jun 16, 2022
Like neofetch, but much faster because written in c. Only Linux.

fastfetch fastfetch is a neofetch like tool for fetching system information and displaying them in a pretty way. It is written in c to achieve much be

Linus Dierheimer 346 Sep 28, 2022
Easy to use, header only, macro generated, generic and type-safe Data Structures in C

C Macro Collections Easy to use, header only, macro generated, generic and type-safe Data Structures in C. Table of Contents Installation Contributing

Leonardo Vencovsky 319 Sep 26, 2022
A C++20 implementation of safe (wrap around) integers following MISRA C++ rules

PSsimplesafeint A C++20 implementation of safe (wrap around) integers following MISRA C++ rules. I might also work on a C++17 implementation, but this

Peter Sommerlad 37 Aug 15, 2022
Functional, Type safe, Lazy abstractions for generic iterators in C

C-Iterplus Functional abstractions for working with iterators, as demonstrated in c-iterators. Finding the longest common prefix of an array of string

Chase 26 Jul 25, 2022
Eggs.Variant is a C++11/14/17 generic, type-safe, discriminated union.

Eggs.Variant Introduction Eggs.Variant is a C++11/14/17 generic, type-safe, discriminated union. See the documentation at http://eggs-cpp.github.io/va

null 139 Sep 9, 2022
Simple C++ code to benchmark fast division algorithms

fast_division Simple C++ code to benchmark fast division algorithms relying on constant divisors. The code is a companion to the paper Integer Divisio

Daniel Lemire 37 Jul 31, 2022
Simple and fast configuration file library (written in C99)

Features Configuration file reading Supported operating systems Ubuntu MacOS Windows Build requirements C99 compiler CMake 3.10+ Cloning git clone htt

Nikita Fediuchin 3 May 26, 2022
A fast and efficient non-iterating hashmap library

libhash: a fast and efficient non-iterating hashmap library Libhash is a fast and efficient non-iterating hashmap library Usage Usage is easy and simp

Oğuzhan Eroğlu 6 Aug 19, 2022
A fast Python Common substrings of multiple strings library with C++ implementation

A fast Python Common substrings of multiple strings library with C++ implementation Having a bunch of strings, can I print some substrings which appea

Đào Nguyên Dương 7 Aug 21, 2022
A family of header-only, very fast and memory-friendly hashmap and btree containers.

The Parallel Hashmap Overview This repository aims to provide a set of excellent hash map implementations, as well as a btree alternative to std::map

Gregory Popovitch 1.6k Sep 30, 2022
C++ implementation of a fast hash map and hash set using hopscotch hashing

A C++ implementation of a fast hash map and hash set using hopscotch hashing The hopscotch-map library is a C++ implementation of a fast hash map and

Thibaut Goetghebuer-Planchon 567 Sep 20, 2022
🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes

The Piecewise Geometric Model index (PGM-index) is a data structure that enables fast lookup, predecessor, range searches and updates in arrays of bil

Giorgio Vinciguerra 630 Sep 21, 2022
C++ implementation of a fast hash map and hash set using robin hood hashing

A C++ implementation of a fast hash map and hash set using robin hood hashing The robin-map library is a C++ implementation of a fast hash map and has

Thibaut Goetghebuer-Planchon 814 Sep 21, 2022
A fast hash map/hash table (whatever you want to call it) for the C programming language.

C HashMap A fast hash map/hash table (whatever you want to call it) for the C programming language. It can associate a key with a pointer or integer v

Mashpoe 68 Sep 15, 2022