Experimental managed C-strings library

Overview


Stricks

Managed C strings library.


Why?

Because handling C strings is tedious and error-prone.
Appending means keeping track of length, null termination, reallocation, etc.

Speed is also a concern with excessive (sometimes implicit) calls to strlen.

Principle


The stx_t (or "strick") type is just a normal char* string.

typedef const char* stx_t;

The trick 😉 lies just before the stx_t address:

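Header {
    cap;          // capacity
    len;          // current length
    canary;       // validity marker, checked by every API call
    flags;
    char data[];  // the string itself; stx_t points here
}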

Header takes care of the string state and bounds.
The stx_t type points directly to the data member.

Header and data occupy a single block of memory (an "SBlock"),
avoiding the indirection you find in {len,*str} schemes.

This technique is notably used in antirez's SDS.

The SBlock is invisible to the user, who only passes stx_t values to and from the API.
Conveniently, since stricks really are char*, they can be passed to any non-modifying <string.h> function.

The above layout is simplified. In reality, Stricks uses two header types to optimize space, and houses the canary and flags in a separate struct.
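To make the trick concrete, here is a minimal sketch of the simplified single-header layout. It is not Stricks' code: SketchHead, sk_t and the sk_* functions are hypothetical names, and the real library uses two header types plus a separate canary/flags struct. It shows the key move: one malloc holds header and data, the user receives a pointer to data, and the header is recovered by stepping back offsetof(SketchHead, data) bytes.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified header, NOT the actual Stricks struct. */
typedef struct {
    size_t cap;   /* bytes available for data, excluding the NUL */
    size_t len;   /* current string length */
    char data[];  /* flexible array member: the string itself */
} SketchHead;

typedef const char *sk_t;  /* what the user holds: a plain C string */

/* Recover the header from the data pointer handed to the user. */
static SketchHead *sk_head(sk_t s) {
    return (SketchHead *)(s - offsetof(SketchHead, data));
}

/* One allocation holds header + data + terminating NUL (the "SBlock"). */
static sk_t sk_new(size_t cap) {
    SketchHead *h = malloc(sizeof *h + cap + 1);
    if (!h) return NULL;
    h->cap = cap;
    h->len = 0;
    h->data[0] = '\0';
    return h->data;
}

/* O(1) accessors: no strlen needed. */
static size_t sk_len(sk_t s) { return sk_head(s)->len; }
static size_t sk_cap(sk_t s) { return sk_head(s)->cap; }
static size_t sk_spc(sk_t s) { return sk_cap(s) - sk_len(s); }

/* Free the whole block, starting from the header. */
static void sk_free(sk_t s) { if (s) free(sk_head(s)); }
```

Because sk_new returns a pointer to a NUL-terminated char array, strlen(s) or printf("%s", s) work on it directly, which is the convenience described above.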

Security

Stricks aims to limit memory faults through the API:

  • The const char* typedef forces users to cast before writing.
  • All API methods check for a valid Header.
  • If invalid, no action is taken and a falsy value is returned.

(See stx_free)

Usage

// app.c

#include <stdio.h>
#include "stx.h"

int main(void) {

    stx_t s = stx_from("Stricks");
    stx_append_alloc(&s, " are treats!");

    printf("%s\n", s);   // never pass a string as the format itself

    stx_free(s);
    return 0;
}
$ gcc app.c libstx -o app && ./app
Stricks are treats!

Sample

example/forum.c implements a mock forum with a fixed-size page buffer.
When the next post would truncate, the buffer is flushed.

make && cd example && ./forum

Build & unit-test

make && make check

API

stx_new
stx_from
stx_from_len
stx_dup
stx_load

stx_cap
stx_len
stx_spc

stx_free
stx_reset
stx_update
stx_trim
stx_show
stx_resize
stx_check
stx_equal
stx_split

stx_append / stx_cat
stx_append_count / stx_ncat
stx_append_format / stx_catf
stx_append_alloc / stx_cata
stx_append_count_alloc / stx_ncata

A custom allocator, reallocator and deallocator can be defined with:

#define STX_MALLOC  my_allocator
#define STX_REALLOC my_realloc
#define STX_FREE    my_free
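A minimal sketch of how such hooks are typically wired, assuming stx.c only falls back to malloc/realloc/free when the macros are undefined (the usual #ifndef pattern; this is an assumption, not copied from stx.c). The counting allocator and lib_alloc_block are hypothetical. If stx.c is compiled separately, the overrides would have to be visible when compiling it, e.g. via -DSTX_MALLOC=my_allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counting allocator: tracks how many blocks are live. */
static size_t live_blocks = 0;

static void *my_allocator(size_t n) {
    void *p = malloc(n);
    if (p) live_blocks++;
    return p;
}

static void *my_realloc(void *p, size_t n) {
    void *q = realloc(p, n);
    if (q && !p) live_blocks++;   /* realloc(NULL, n) behaves like malloc */
    return q;
}

static void my_free(void *p) {
    if (p) live_blocks--;
    free(p);
}

/* User overrides, visible before the library's defaults. */
#define STX_MALLOC  my_allocator
#define STX_REALLOC my_realloc
#define STX_FREE    my_free

/* Assumed fallback pattern inside the library: only used if not overridden. */
#ifndef STX_MALLOC
#define STX_MALLOC malloc
#endif
#ifndef STX_REALLOC
#define STX_REALLOC realloc
#endif
#ifndef STX_FREE
#define STX_FREE free
#endif

/* Hypothetical library-internal routine: always allocates through the hook. */
static char *lib_alloc_block(size_t n) {
    return STX_MALLOC(n);
}
```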

stx_new

Allocates and initializes a new strick of capacity cap.

stx_t stx_new (size_t cap)

stx_from

Creates a new strick by copying string src.

stx_t stx_from (const char* src)

Capacity is adjusted to length.

stx_t s = stx_from("Stricks");
stx_show(s); 
// cap:7 len:7 data:'Stricks'

stx_from_len

Creates a new strick with at most len bytes from src.

stx_t stx_from_len (const char* src, size_t len)

If len > strlen(src), the capacity is still len, leaving spare room.
Otherwise the copy is truncated to len bytes and the capacity equals the length.

stx_t s = stx_from_len("Stricks", 7);
stx_show(s); 
// cap:7 len:7 data:'Stricks'

stx_t s = stx_from_len("Stricks", 10);
stx_show(s); 
// cap:10 len:7 data:'Stricks'

stx_t s = stx_from_len("Stricks", 3);
stx_show(s); 
// cap:3 len:3 data:'Str'
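The capacity/length rule shown by these three examples can be captured in a few lines. This sketch uses a hypothetical FromLen struct and from_len function with a plain malloc'd buffer instead of a strick: capacity is always the requested len, and the data length is the shorter of len and strlen(src), computed without reading past len bytes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a strick's cap/len/data triple. */
typedef struct { size_t cap, len; char *data; } FromLen;

/* Copy at most n bytes of src; the capacity is always n. */
static FromLen from_len(const char *src, size_t n) {
    FromLen r = {0, 0, NULL};
    size_t len = 0;
    while (len < n && src[len] != '\0')   /* bounded strlen: never reads past n */
        len++;
    r.data = malloc(n + 1);               /* +1 for the terminating NUL */
    if (!r.data) return r;
    memcpy(r.data, src, len);
    r.data[len] = '\0';
    r.cap = n;
    r.len = len;
    return r;
}
```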

stx_dup

Creates a duplicate strick of src.

stx_t stx_dup (stx_t src)

Capacity gets trimmed down to length.

stx_t s = stx_new(16);
stx_cat(s, "foo");
stx_t dup = stx_dup(s);
stx_show(dup); 
// cap:3 len:3 data:'foo'

stx_load

Reads the contents of a file into a new strick.

stx_t stx_load (const char* src_path)
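Only the signature is documented here. One plausible approach, sketched under that assumption, reads the file in chunks and grows the buffer geometrically until a short read. load_stream and load_path are hypothetical names and return a plain malloc'd buffer rather than a strick; Stricks' own implementation may differ.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: read a whole stream into a NUL-terminated malloc'd buffer. */
static char *load_stream(FILE *f, size_t *outlen) {
    size_t cap = 256, len = 0;
    char *buf = malloc(cap + 1);          /* +1 keeps room for the NUL */
    if (!buf) return NULL;
    for (;;) {
        len += fread(buf + len, 1, cap - len, f);
        if (len < cap) break;             /* short read: EOF or error */
        char *tmp = realloc(buf, 2 * cap + 1);
        if (!tmp) { free(buf); return NULL; }
        buf = tmp;
        cap *= 2;                         /* grow geometrically */
    }
    if (ferror(f)) { free(buf); return NULL; }
    buf[len] = '\0';
    if (outlen) *outlen = len;
    return buf;
}

/* Hypothetical path wrapper, analogous in spirit to stx_load(src_path). */
static char *load_path(const char *path, size_t *outlen) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    char *s = load_stream(f, outlen);
    fclose(f);
    return s;
}
```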

stx_cap

Current capacity accessor.

size_t stx_cap (stx_t s)

stx_len

Current length accessor.

size_t stx_len (stx_t s)

stx_spc

Remaining space, i.e. capacity minus length.

size_t stx_spc (stx_t s)

stx_reset

Sets data length to zero.

void stx_reset (stx_t s)
stx_t s = stx_new(16);
stx_cat(s, "foo");
stx_reset(s);
stx_show(s); 
// cap:16 len:0 data:''

stx_free

void stx_free (stx_t s)

Releases the enclosing SBlock.

🍰 Security:
Once the block is freed, no use-after-free or double-free should be possible through the Strick API :

stx_t s = stx_new(16);
stx_append(s, "foo");
stx_free(s);

// Use-after-free
stx_append(s, "bar");
// No action. Returns 0.  
printf("%zu\n", stx_len(s));
// 0

// Double-free
stx_free(s);
// No action.

🔧 How it works
On the first call, stx_free(s) zeroes out the header, erasing the canary.
All subsequent API calls check the canary and do nothing if dead.
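A minimal sketch of the canary guard, with hypothetical names (CHead, SKETCH_CANARY, c_new, c_check, c_len, c_kill) and a made-up magic value. Every entry point checks the canary first and returns a falsy value if it is dead. The sketch keeps the zeroing step (c_kill) separate from releasing memory, so the demo never reads a header after free(), which standard C leaves undefined; the document does not detail how stx_free sequences these steps.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_CANARY 0xC0FFEEu   /* hypothetical magic value */

typedef struct {
    uint32_t canary;   /* holds the magic value while the block is alive */
    size_t cap, len;
    char data[];
} CHead;

static CHead *chead(const char *s) {
    return (CHead *)(s - offsetof(CHead, data));
}

static const char *c_new(size_t cap) {
    CHead *h = malloc(sizeof *h + cap + 1);
    if (!h) return NULL;
    h->canary = SKETCH_CANARY;
    h->cap = cap;
    h->len = 0;
    h->data[0] = '\0';
    return h->data;
}

/* Guard run at the top of every API call. */
static bool c_check(const char *s) {
    return s != NULL && chead(s)->canary == SKETCH_CANARY;
}

/* Example accessor: a dead header yields a falsy result. */
static size_t c_len(const char *s) {
    return c_check(s) ? chead(s)->len : 0;
}

/* First call zeroes the header; later calls see a dead canary and do nothing. */
static bool c_kill(const char *s) {
    if (!c_check(s)) return false;
    memset(chead(s), 0, sizeof(CHead));
    return true;
}
```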

stx_resize

Changes capacity.

bool stx_resize (stx_t *pstx, size_t newcap)
  • If increased, the passed reference may get transparently updated.
  • If lowered below length, data gets truncated.

Returns: true/false on success/failure.

stx_t s = stx_new(3);
int rc = stx_cat(s, "foobar"); // -> -6
if (rc<0) stx_resize(&s, -rc);
stx_cat(s, "foobar");
stx_show(s); 
// cap:6 len:6 data:'foobar'

stx_update

Sets len to match the data in case it was modified outside the API.

void stx_update (stx_t s)

stx_trim

Removes leading and trailing white space.

void stx_trim (stx_t s)

Capacity remains the same.
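An in-place trim can be sketched as: find the first and last non-space bytes, shift the kept span to the front, and re-terminate. trim_inplace is a hypothetical helper on a plain buffer; shifting with memmove (the source and destination overlap, so memcpy would be undefined) is an assumption about the approach. Only the length changes, matching the note that capacity stays the same.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical: trim a NUL-terminated buffer of length len in place.
   Returns the new length; the buffer's capacity is untouched. */
static size_t trim_inplace(char *s, size_t len) {
    size_t start = 0, end = len;
    while (start < end && isspace((unsigned char)s[start])) start++;
    while (end > start && isspace((unsigned char)s[end - 1])) end--;
    size_t newlen = end - start;
    memmove(s, s + start, newlen);   /* overlapping regions: memmove, not memcpy */
    s[newlen] = '\0';
    return newlen;
}
```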

stx_split

Splits a strick or string on separator sep into an array of stricks.

stx_t*
stx_split (const void* s, const char* sep, unsigned int *outcnt)

*outcnt gets the array length.

stx_t s = stx_from("foo, bar");
unsigned cnt = 0;

stx_t* list = stx_split(s, ", ", &cnt);

for (unsigned i = 0; i < cnt; ++i) {
    stx_show(list[i]);
}

// cap:3 len:3 data:'foo'
// cap:3 len:3 data:'bar'

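Or, more conveniently, using the NULL sentinel that terminates the list:

stx_t part;
stx_t* it = list;   // walk a copy so list still points at the first element
while ((part = *it++)) {
    stx_show(part);
}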
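The splitting logic itself can be sketched in two passes: count the parts, then copy each one into a NULL-terminated array. split and split_free are hypothetical and return plain malloc'd strings rather than stricks. Separator matches are consumed left to right without overlap, so "baaaad" split on "aa" yields "b", "" and "d".

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical split into a NULL-terminated array of malloc'd strings. */
static char **split(const char *s, const char *sep, unsigned *outcnt) {
    size_t seplen = strlen(sep);
    if (outcnt) *outcnt = 0;
    if (seplen == 0) return NULL;          /* an empty separator is rejected */

    /* Pass 1: parts = non-overlapping separator matches + 1. */
    unsigned cnt = 1;
    for (const char *p = strstr(s, sep); p; p = strstr(p + seplen, sep))
        cnt++;

    char **list = malloc((cnt + 1) * sizeof *list);   /* +1 for the NULL sentinel */
    if (!list) return NULL;

    /* Pass 2: copy each part, resuming after the separator just consumed. */
    const char *start = s;
    for (unsigned i = 0; i < cnt; i++) {
        const char *hit = strstr(start, sep);
        size_t n = hit ? (size_t)(hit - start) : strlen(start);
        list[i] = malloc(n + 1);
        if (!list[i]) {                    /* roll back on allocation failure */
            while (i--) free(list[i]);
            free(list);
            return NULL;
        }
        memcpy(list[i], start, n);
        list[i][n] = '\0';
        if (hit) start = hit + seplen;
    }
    list[cnt] = NULL;
    if (outcnt) *outcnt = cnt;
    return list;
}

/* Hypothetical cleanup: free every part, then the array. */
static void split_free(char **list) {
    if (!list) return;
    for (char **p = list; *p; p++) free(*p);
    free(list);
}
```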

stx_equal

Compares the data strings of a and b.

bool stx_equal (stx_t a, stx_t b)
  • Capacities are not compared.
  • Often faster than a plain memcmp, since the stored lengths are compared first and a mismatch returns immediately.
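The length-first comparison can be sketched on (data, length) pairs. eq_len_first is a hypothetical helper: with stored lengths, a mismatch is detected in O(1), and only equal-length strings fall through to memcmp.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical equality on (data, len) pairs; capacities play no role. */
static bool eq_len_first(const char *a, size_t alen, const char *b, size_t blen) {
    if (alen != blen) return false;   /* O(1) early exit from stored lengths */
    return memcmp(a, b, alen) == 0;   /* same length: compare the bytes */
}
```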

stx_show

void stx_show (stx_t s)

Utility. Prints the state of s.

stx_show(foo);
// cap:8 len:5 data:'hello'

stx_check

bool stx_check (stx_t s)

Checks whether s has a valid header.

stx_append / stx_cat

int stx_append (stx_t dst, const char* src)

Appends src to dst.

  • No reallocation.
  • Nothing done if input exceeds remaining space.

Return code:

  • rc >= 0 on success, as change in length.
  • rc < 0 on potential truncation, as needed capacity.
  • rc = 0 on error.
stx_t s = stx_new(5);  
stx_cat(s, "abc"); //-> 3
printf("%s", s); // "abc"  
stx_cat(s, "def"); //-> -6  (needs capacity = 6)
printf("%s", s); // "abc"
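The return-code convention can be reproduced on a plain fixed buffer. append_fixed is a hypothetical helper, not the library function: it returns the number of bytes added on success, minus the needed capacity when the input would not fit (writing nothing), and 0 on invalid arguments. The buffer must provide cap + 1 bytes for the terminating NUL.

```c
#include <string.h>

/* Hypothetical fixed-capacity append following the documented convention. */
static int append_fixed(char *buf, size_t cap, size_t *len, const char *src) {
    if (!buf || !len || !src) return 0;        /* error */
    size_t n = strlen(src);
    size_t need = *len + n;
    if (need > cap) return -(int)need;         /* would truncate: report needed capacity */
    memcpy(buf + *len, src, n + 1);            /* copies the NUL too */
    *len = need;
    return (int)n;                             /* change in length */
}
```

This mirrors the stx_cat example above: with a capacity of 5, appending "abc" returns 3, and appending "def" returns -6 and leaves the buffer unchanged.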

stx_append_count / stx_ncat

int stx_ncat (stx_t dst, const char* src, size_t n)

Appends at most n bytes from src to dst.

  • No reallocation.
  • If n is zero, strlen(src) is used.
  • Nothing done if input exceeds remaining space.

Return code:

  • rc >= 0 on success, as change in length.
  • rc < 0 on potential truncation, as needed capacity.
  • rc = 0 on error.
stx_t s = stx_new(5);  
stx_ncat(s, "abc", 2); //-> 2
printf("%s", s); // "ab"

stx_append_format / stx_catf

int stx_catf (stx_t dst, const char* fmt, ...)

Appends a formatted C string to dst, in place.

  • No reallocation.
  • Nothing done if input exceeds remaining space.

Return code:

  • rc >= 0 on success, as increase in length.
  • rc < 0 on potential truncation, as needed capacity.
  • rc = 0 on error.
stx_t foo = stx_new(32);
stx_catf (foo, "%s has %d apples", "Mary", 10);
stx_show(foo);
// cap:32 len:18 data:'Mary has 10 apples'
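One way to get "nothing done if input exceeds remaining space" for formatted output is to measure first with vsnprintf(NULL, 0, ...), then format only if the result fits. catf_fixed is a hypothetical helper on a plain buffer of cap + 1 bytes; whether stx_catf measures the same way is an assumption.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatted append: measure, then write only if it fits. */
static int catf_fixed(char *buf, size_t cap, size_t *len, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(NULL, 0, fmt, ap);       /* length the output would have */
    va_end(ap);
    if (n < 0) return 0;                       /* encoding error */
    size_t need = *len + (size_t)n;
    if (need > cap) return -(int)need;         /* would truncate: report needed capacity */
    va_start(ap, fmt);                         /* restart the argument list */
    vsnprintf(buf + *len, cap - *len + 1, fmt, ap);
    va_end(ap);
    *len = need;
    return n;                                  /* increase in length */
}
```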

stx_append_alloc / stx_cata

size_t stx_cata (stx_t *pdst, const char* src)

Appends src to *pdst.

  • If over capacity, *pdst gets reallocated.
  • Reallocation reserves twice the needed capacity.

Return code:

  • rc = 0 on error.
  • rc >= 0 on success, as change in length.
stx_t s = stx_new(3);  
stx_cat(s, "abc"); 
stx_cata(&s, "def"); //-> 3 (s may be reallocated)
stx_show(s); // "cap:12 len:6 data:'abcdef'"
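The growth rule can be sketched with a hypothetical GrowBuf struct and grow_append helper. When an append overflows, the new capacity is twice the length actually needed, as in the example above (6 bytes needed, capacity 12). Since realloc may move the block, the caller's handle has to be updatable, which is why stx_cata takes a stx_t* rather than a stx_t.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer standing in for a strick. */
typedef struct { char *data; size_t cap, len; } GrowBuf;

/* Append src, reallocating to 2x the needed capacity on overflow.
   Returns the change in length, or 0 on error. */
static size_t grow_append(GrowBuf *b, const char *src) {
    if (!b || !src) return 0;
    size_t n = strlen(src);
    size_t need = b->len + n;
    if (need > b->cap) {
        size_t newcap = 2 * need;
        char *p = realloc(b->data, newcap + 1);   /* +1 for the NUL */
        if (!p) return 0;                          /* old buffer stays valid */
        b->data = p;                               /* the block may have moved */
        b->cap = newcap;
    }
    memcpy(b->data + b->len, src, n + 1);          /* copies the NUL too */
    b->len = need;
    return n;
}
```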

stx_append_count_alloc / stx_ncata

size_t stx_ncata (stx_t *pdst, const char* src, size_t n)

Appends n bytes of src to *pdst.

  • If n is zero, strlen(src) is used.
  • If over capacity, *pdst gets reallocated.

Return code:

  • rc = 0 on error.
  • rc >= 0 on success, as change in length.

TODO

  • Slices / StringView
  • UTF-8?
  • More high-level methods?

Issues
  • CI: Add jobs for Clang, c99 and c17 and -O3 to build matrix

    Expands the GitHub Actions CI build matrix to eight jobs:

    • All combinations of two compilers (GCC and Clang) and C standards (c99, c11 and c17)
    • Two optimized -O3 builds using GCC and Clang for c17.

    I split this PR into 3 commits, each adding one option:

    1. Different compilers (GCC and Clang)
    2. Different C standards (c99, c11 and c17)
    3. The -O3 optimization level for c17

    If you have any questions, find some combinations not useful or would like to have some combinations added, please let me know!

    opened by EwoutH 7
  • Create initial CI workflow with GitHub Actions

    Builds with make and make check on Ubuntu on pushes and pull requests. GitHub Actions can be enabled here.

    Fun note: This is probably the fastest running CI configuration I have ever written!

    opened by EwoutH 4
  • Fix stx_split for self-colliding separators.

    These changes should cause

    unsigned int count;
    stx_t* results = stx_split("baaaad", "aa", &count);
    

    to return

    "b"
    ""
    "d"
    

    rather than

    "b"
    "aad"
    "ad"
    "d"
    
    opened by ILMTitan 3
  • Compilation fails

    Hi,

    I tried to clone and compile it, but it fails right away. Any idea?

    I'm on a Mac, so it compiles with Clang.

    clang --version
    Apple clang version 12.0.0 (clang-1200.0.32.29)
    Target: x86_64-apple-darwin20.3.0
    Thread model: posix
    InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
    
    opened by Yohannfra 3
  • CodeBlocks compilation error

    Hello,

    I've got an error when compiling stx.c in my project. I'm using CodeBlocks v17.12.

    stx.c|47|error: expected declaration specifiers or '...' before '(' token|

    P.S. I'm NOT using the C99 or C11 standard. Could you make it compatible, please?


    opened by fabek 1
  • Love this library, but can't use it.

    I absolutely love to use this library, small, fast, and simple. Just what I was after.

    My plan was to use this in the embedded place we use at work, but problem is that GPL requires me to provide the full source of our application if I include this. And that's really a tough sell.

    I love doing open source, and I'm happily sharing all the work I can, and I'm a regular contributor. So I absolutely understand you, but is there any way you can consider switching to LGPL, MIT, CC, or some other less restrictive license? I'm pretty sure that would accelerate the popularity, and also increase contributions to your code in the long run.

    opened by jimmyw 2
  • Undefined behavior in grow()

    I saw the link to this project on hacker news, it looks interesting. One thing I noticed was that in grow() it has

    // TYPE1 -> TYPE4
    RELOC(4)
    memcpy (newdata, DATA(newhead, TYPE1), dims.len+1); 
    

    The result of memcpy() is undefined when the source and destination overlap; this should use memmove() instead. On a related note, the code to grow a string seems to be duplicated in a different form in resize(); perhaps that could call grow() instead.

    opened by phillipwood 2
Owner
Francois Alcover
Studied Computer Science at Paris VI.