A standalone and lightweight C library

Overview

Klib: a Generic Library in C

Overview

Klib is a standalone and lightweight C library distributed under MIT/X11 license. Most components are independent of external libraries, except the standard C library, and independent of each other. To use a component of this library, you only need to copy a couple of files to your source code tree without worrying about library dependencies.

Klib strives for efficiency and a small memory footprint. Some components, such as khash.h, kbtree.h, ksort.h and kvec.h, are among the most efficient implementations of similar algorithms or data structures in all programming languages, in terms of both speed and memory use.

A new documentation is available here which includes most information in this README file.

Common components

Components for more specific use cases

Methodology

For the implementation of generic containers, klib extensively uses C macros. To use these data structures, we usually need to instantiate methods by expanding a long macro. This makes the source code look unusual or even ugly and adds difficulty to debugging. Unfortunately, for efficient generic programming in C that lacks template, using macros is the only solution. Only with macros, we can write a generic container which, once instantiated, compete with a type-specific container in efficiency. Some generic libraries in C, such as Glib, use the void* type to implement containers. These implementations are usually slower and use more memory than klib (see this benchmark).

To effectively use klib, it is important to understand how it achieves generic programming. We will use the hash table library as an example:

#include "khash.h"
KHASH_MAP_INIT_INT(m32, char)        // instantiate structs and methods
int main() {
    int ret, is_missing;
    khint_t k;
    khash_t(m32) *h = kh_init(m32);  // allocate a hash table
    k = kh_put(m32, h, 5, &ret);     // insert a key to the hash table
    if (!ret) kh_del(m32, h, k);
    kh_value(h, k) = 10;             // set the value
    k = kh_get(m32, h, 10);          // query the hash table
    is_missing = (k == kh_end(h));   // test if the key is present
    k = kh_get(m32, h, 5);
    kh_del(m32, h, k);               // remove a key-value pair
    for (k = kh_begin(h); k != kh_end(h); ++k)  // traverse
        if (kh_exist(h, k))          // test if a bucket contains data
			kh_value(h, k) = 1;
    kh_destroy(m32, h);              // deallocate the hash table
    return 0;
}

In this example, the second line instantiates a hash table with unsigned as the key type and char as the value type. m32 names such a type of hash table. All types and functions associated with this name are macros, which will be explained later. Macro kh_init() initiates a hash table and kh_destroy() frees it. kh_put() inserts a key and returns the iterator (or the position) in the hash table. kh_get() and kh_del() get a key and delete an element, respectively. Macro kh_exist() tests if an iterator (or a position) is filled with data.

An immediate question is this piece of code does not look like a valid C program (e.g. lacking semicolon, assignment to an apparent function call and apparent undefined m32 'variable'). To understand why the code is correct, let's go a bit further into the source code of khash.h, whose skeleton looks like:

#define KHASH_INIT(name, SCOPE, key_t, val_t, is_map, _hashf, _hasheq) \
  typedef struct { \
    int n_buckets, size, n_occupied, upper_bound; \
    unsigned *flags; \
    key_t *keys; \
    val_t *vals; \
  } kh_##name##_t; \
  SCOPE inline kh_##name##_t *init_##name() { \
    return (kh_##name##_t*)calloc(1, sizeof(kh_##name##_t)); \
  } \
  SCOPE inline int get_##name(kh_##name##_t *h, key_t k) \
  ... \
  SCOPE inline void destroy_##name(kh_##name##_t *h) { \
    if (h) { \
      free(h->keys); free(h->flags); free(h->vals); free(h); \
    } \
  }

#define _int_hf(key) (unsigned)(key)
#define _int_heq(a, b) (a == b)
#define khash_t(name) kh_##name##_t
#define kh_value(h, k) ((h)->vals[k])
#define kh_begin(h, k) 0
#define kh_end(h) ((h)->n_buckets)
#define kh_init(name) init_##name()
#define kh_get(name, h, k) get_##name(h, k)
#define kh_destroy(name, h) destroy_##name(h)
...
#define KHASH_MAP_INIT_INT(name, val_t) \
	KHASH_INIT(name, static, unsigned, val_t, is_map, _int_hf, _int_heq)

KHASH_INIT() is a huge macro defining all the structs and methods. When this macro is called, all the code inside it will be inserted by the C preprocess to the place where it is called. If the macro is called multiple times, multiple copies of the code will be inserted. To avoid naming conflict of hash tables with different key-value types, the library uses token concatenation, which is a preprocessor feature whereby we can substitute part of a symbol based on the parameter of the macro. In the end, the C preprocessor will generate the following code and feed it to the compiler (macro kh_exist(h,k) is a little complex and not expanded for simplicity):

typedef struct {
  int n_buckets, size, n_occupied, upper_bound;
  unsigned *flags;
  unsigned *keys;
  char *vals;
} kh_m32_t;
static inline kh_m32_t *init_m32() {
  return (kh_m32_t*)calloc(1, sizeof(kh_m32_t));
}
static inline int get_m32(kh_m32_t *h, unsigned k)
...
static inline void destroy_m32(kh_m32_t *h) {
  if (h) {
    free(h->keys); free(h->flags); free(h->vals); free(h);
  }
}

int main() {
	int ret, is_missing;
	khint_t k;
	kh_m32_t *h = init_m32();
	k = put_m32(h, 5, &ret);
	if (!ret) del_m32(h, k);
	h->vals[k] = 10;
	k = get_m32(h, 10);
	is_missing = (k == h->n_buckets);
	k = get_m32(h, 5);
	del_m32(h, k);
	for (k = 0; k != h->n_buckets; ++k)
		if (kh_exist(h, k)) h->vals[k] = 1;
	destroy_m32(h);
	return 0;
}

This is the C program we know.

From this example, we can see that macros and the C preprocessor plays a key role in klib. Klib is fast partly because the compiler knows the key-value type at the compile time and is able to optimize the code to the same level as type-specific code. A generic library written with void* will not get such performance boost.

Massively inserting code upon instantiation may remind us of C++'s slow compiling speed and huge binary size when STL/boost is in use. Klib is much better in this respect due to its small code size and component independency. Inserting several hundreds lines of code won't make compiling obviously slower.

Resources

  • Library documentation, if present, is available in the header files. Examples can be found in the test/ directory.
  • Obsolete documentation of the hash table library can be found at SourceForge. This README is partly adapted from the old documentation.
  • Blog post describing the hash table library.
  • Blog post on why using void* for generic programming may be inefficient.
  • Blog post on the generic stream buffer.
  • Blog post evaluating the performance of kvec.h.
  • Blog post arguing B-tree may be a better data structure than a binary search tree.
  • Blog post evaluating the performance of khash.h and kbtree.h among many other implementations. An older version of the benchmark is also available.
  • Blog post benchmarking internal sorting algorithms and implementations.
  • Blog post on the k-small algorithm.
  • Blog post on the Hooke-Jeeve's algorithm for nonlinear programming.
Comments
  • Make ks_resize(&kstr,kstr.l+1) useful for ensuring the string is allocated and NUL-terminated

    Make ks_resize(&kstr,kstr.l+1) useful for ensuring the string is allocated and NUL-terminated

    For example, htslib/sam.c's sam_hdr_read() calls sam_hdr_parse(0, NULL) when the SAM file has no header lines. This crashes, while sam_hdr_parse(0, "") would be correct.

    I also attempted to add ks_init(kstring_t*) and ks_destroy(kstring_t*) functions as wrappers so client code would not need to know exactly which fields to initialise to 0/NULL and which to free. However these names are unfortunately already taken by kstream_t in kseq.h.

    Any thoughts about what to do here? Probably kstring_t is simple enough that a comment showing what to set to zero suffices. Another option would be to add a macro like the following to kstring.h:

    #define EMPTY_KSTRING { 0, 0, NULL }
    

    so that client code could write

    kstring_t str = EMPTY_KSTRING;
    

    I would be happy to add a init/destroy comment or such a macro to this pull request.

    opened by jmarshall 9
  • Can you describe the difference between the various ksw_*

    Can you describe the difference between the various ksw_*

    @attractivechaos @lh3

    I am trying to figure out the types of alignment based on the function names, guessing the following:

    • ksw_align - local alignment: a sub-sequence of the query aligned to a sub-sequence of the target
    • ksw_extend - a prefix of the query aligned to a prefix of the target**
    • ksw_global - alignment of the full query to the full target

    ** doesn't behave this way, which is why I am asking

    Also, I have a ksw_glocal implementation if you are looking for a contribution.

    opened by nh13 7
  • LICENSE file is missing

    LICENSE file is missing

    Hi, I'd like to request adding a license file to this repo. Right now most files have a header that says the license is MIT, and the README says MIT/X11, but there are also files without license headers (e.g. keigen.c) and just a sentence in the README isn't really enough.

    Also, GitHub's license detection will say that Klib as a whole does not have a license: https://github.com/attractivechaos/klib/community

    I'm happy to open a PR if that would help.

    opened by rgommers 4
  • Good hash function for 64bit keys?

    Good hash function for 64bit keys?

    Hi!

    I noticed that khash is very sensitive to the choice of hash function for >= 64bit keys (not so for 32bit keys). In particular kh_int64_hash_func() is extremely slow with random input. Does anyone have a similar experience? Is it expected to have wildly varying results based on the choice of hash function?

    Thanks!

    opened by capr 4
  • About Declarations and Definitions of khash

    About Declarations and Definitions of khash

    KHASH_DECLARE declares with extern, but KHASH_INIT is replaced with KHASH_INIT2(SCOPE = static kh_inline), SCOPE is always static. I want to know how to use KHASH_DECLARE. It looks like a bug.

    Thanks to you.

    opened by kevin-xu 4
  • Please enable the wiki on github

    Please enable the wiki on github

    I've been writing some documentation for klib for personal reference, and if you enabled the wiki feature on github, i'd be able to upload it for all to see and benefit. Its a great library, and with some good docs it has great potential. Thanks again for writing it.

    doc featreq 
    opened by chrishulbert 4
  • kvec: remove need to pass type?

    kvec: remove need to pass type?

    Many of kvec.h function macros require passing the type being stored (kv_resize, kv_copy, kv_push, kv_pushp, kv_a) but this seems to not actually be needed. It's only used for two reasons:

    • To cast the pointer returned from realloc(), which is not actually needed in C where the preferred style is usually to not cast from a void pointer when assigning to a variable.
    • In sizeof(type) which could be changed to sizeof(*(v).a)

    Has this not been changed simply to not break the current API?

    opened by damag 3
  • Got wrong hash for the same string for hash map

    Got wrong hash for the same string for hash map

    Reproduction code:

    KHASH_MAP_INIT_STR(var, double);
    static khash_t(var) *h = NULL;
    static double get_var(const char *name) {
        khiter_t k = kh_get(var, h, name);
        double val = k != kh_end(h) ? kh_val(h, k) : 0;
        printf("value of %s is %g\n", name, val);
        return val;
    }
    
    static void set_var(const char *name, double val) {
        printf("set %s to %g\n", name, val);
    
        int ret;
        khiter_t k = kh_put(var, h, name, &ret);
        kh_value(h, k) = val;
    }
    
    void test() {
        if (h == NULL) {
            h = kh_init(var);
        }
    
        const char *foo = strdup("foo");
    
        set_var(foo, 20);
        free(foo);
    
        get_var("foo");
        get_var(strdup("foo"));
    }
    
    

    Result:

    set foo to 20
    value of foo is 0
    value of foo is 20
    

    If I turned on optimization:

    set foo to 20
    value of foo is 0
    value of foo is 0
    

    wtf

    opened by stevefan1999-personal 3
  • Need to check how much memory my hash table has taken after inserting 1 million key values

    Need to check how much memory my hash table has taken after inserting 1 million key values

    Here is the code.

    https://github.com/attractivechaos/klib/blob/master/khash.h

    What I want: I am entering n unique entry in this table. and after inserting key and value i want to check how much space this hastable is consuming.

    Issue: Example after entering 1000 entry. When i try to get the size of hastable using kh_size(h) function given in "khash.h" I am getting 387 entry only. But when i am trying get the value by giving the key I am able to get.

    Here is my code: PFA. `#include <stdio.h> #include <stdlib.h> #include "khash.h" #include "string.h"

    #define kh_get_val(kname, hash, key, defVal) ({k=kh_get(kname, hash, key);(k!=kh_end(hash)?kh_val(hash,k):defVal);}) #define kh_set(kname, hash, key, val) ({int ret; k = kh_put(kname, hash,key,&ret); kh_value(hash,k) = val; ret;})

    const int khIntInt = 33;

    KHASH_MAP_INIT_INT(khIntInt, long long) // setup khash to handle string key with int payload //KHASH_MAP_INIT_STR(khStrInt, long long) // setup khash to handle string key with int payload int int_key_int_body_example(char * filename) { printf("\n\nBegin khash test string key with int payload\n"); int ret, is_missing; khiter_t k; char *tval; khash_t(khIntInt) *h = kh_init(khIntInt); // create a hashtable printf("\n --------------------------------- FILE READ AND WRITE OPERTAION -------------------\n");

        FILE *fp;
    	int * ch;
    	char line[200];
    	fp=fopen(filename,"r");
    	if(!fp)
    	{
                perror("could not find the file");
                exit(0);
    	}
    
    printf("\n ------------------------------------------------- PUTTING ALL KEYS ---------------------\n");
    
    int count=0;
    long long val_line=0;
        while (fgets ( line, 200, fp ) != NULL ) {
    
        printf(" Trying to Add key = %s and value = %s count =%lu \n", line,line,count++);
        val_line = strtoll(line, NULL, 0);
        printf("\n int_key_int_body: Here is the value in integer After Atoi = %ld \n",val_line);
        kh_set(khIntInt, h, val_line, val_line);
        printf(" ### DEBUG: h->n_buckets=%d,h->size=%d,h->n_occupied=%d,h->upper_bound=%d \n ",h->n_buckets,h->size,h->n_occupied,h->upper_bound);
        }
               fseek(fp, 0, SEEK_SET);
               printf("\n ------------------------------------------------- GETTING ALL VALUES ---------------------\n");
               count=0;
              while (fgets ( line, 200, fp ) != NULL ) {
                     val_line = strtoll(line, NULL, 0);
                     printf("\n int_key_int_body: Here is the value in integer After Atoi = %ld \n",val_line);
                     k=kh_get(khIntInt, h, val_line);
                    if (k == kh_end(h)) {
                         printf("no key found for line=%ld",val_line);
                    } else {
                        printf("GET value for  key = %s and value = %d count =%lu and k_iterator=%d \n", 
                        line,kh_val(h,k),count++,k);
                  }
               }
              long long 
    

    h,k),count++,k); } } long long no_of_elements = kh_size(h); printf("\n No_of_elements = %lld", no_of_elements); printf("\n ------------------------------------------------- ---------------------\n"); kh_destroy(khIntInt, h); return 0; } int main( int argc, char *argv[] ) { char *filename; filename=argv[1]; printf("\n filename=%s",filename); int_key_int_body_example(filename); }`

    opened by vikash009 3
  • kbtree iterator interface

    kbtree iterator interface

    I've stumbled upon this thread while looking for an adequate way to iterate over kbtree keys. Then i've ran across this kbtree variation which exposes an iterator interface instead of __kb_traverse. The following example illustrates their usage:

            int cnt = 0;
            kbitr_t itr;
            kb_itr_first_int(h, &itr);
            do {
                ++cnt;
                printf("%u\n", kb_itr_key(int, &itr));
            } while (kb_itr_next_int(h, &itr) != 0); 
            printf("[ht_khash_int] traverse size: %d\n", cnt);
    

    Assuming that this repo is the "official" for klib, any chance that iterator version can be landed here?

    opened by Lupus 3
  • Prevent unused function warnings in khash.h, klist.h

    Prevent unused function warnings in khash.h, klist.h

    Recent versions of Clang warn about unused static inline functions in .c files (though they suppress this warning for such definitions in header files). Definitions via KHASH_INIT etc are effectively in the .c file, and it's impractical to make these inline other than static inline; so add attributes to suppress these warnings.

    See for example the warnings that drown out other information in travis htslib build logs.

    opened by jmarshall 3
  • heapsort crash if input is empty

    heapsort crash if input is empty

    Given 0 as lsize, the program crashes. These are the relevant lines. https://github.com/attractivechaos/klib/blob/9a063b33efd841fcc42d4b9f68cb78bb528bf75b/ksort.h#L144-L145

    opened by jpcima 0
  • Create SECURITY.md

    Create SECURITY.md

    Hey there!

    I belong to an open source security research community, and a member (@michaellrowley) has found an issue, but doesn’t know the best way to disclose it.

    If not a hassle, might you kindly add a SECURITY.md file with an email, or another contact method? GitHub recommends this best practice to ensure security issues are responsibly disclosed, and it would serve as a simple instruction for security researchers in the future.

    Thank you for your consideration, and I look forward to hearing from you!

    (cc @huntr-helper)

    opened by JamieSlome 0
  • Ask for type less often

    Ask for type less often

    For instance, the following :

    #define kv_push(type, v, x) do {									\
    		if ((v).n == (v).m) {										\
    			(v).m = (v).m? (v).m<<1 : 2;							\
    			(v).a = (type*)realloc((v).a, sizeof(type) * (v).m);	\
    		}															\
    		(v).a[(v).n++] = (x);										\
    	} while (0)
    

    can be rewritten as :

    #define kv_push(v, x) do {									\
    		if ((v).n == (v).m) {										\
    			(v).m = (v).m? (v).m<<1 : 2;							\
    			(v).a = realloc((v).a, sizeof(*(v).a) * (v).m);	\
    		}															\
    		(v).a[(v).n++] = (x);										\
    	} while (0)
    

    The key difference is that asking for type simply to take sizeof(type) can be error prone (and the errors can be hard to debug) and is redundant when the size of the type can be determined by taking sizeof() of a variable known to be of the same type in the structure passed to us. It is also sweeter syntax, as an aside.

    Also, why do we use trust malloc()/realloc() to always succeed ?

    opened by a-p-jo 0
  • Don't pass type as named macro parameter

    Don't pass type as named macro parameter

    Instead of :

    #define kvec_t(type) struct { size_t n, m; type *a; }
    

    do :

    #define kvec_t(...) struct { size_t n, m; __VA_ARGS__ *a; }
    

    C macros are parsed in a very literal way, with no contextual awareness of {}, etc. So while struct { int x, y; } is a valid type, kvec_t(type) will split it into two arguments at the first , and then complain about too many parameters, while kvec_t(...) will paste the entire thing in place of __VA_ARGS__.

    opened by a-p-jo 3
  • Crashes in KSON - kson_parse

    Crashes in KSON - kson_parse

    Hello, I encountered segmentation faults (by double free / heap overflow) in the kson_parse / kson_parse_core functions. Could someone please point me to a contact to discuss these issues? Best Regards

    opened by 0xca7 0
MojoSetup is a standalone installer for Linux

MojoSetup is a standalone installer for Linux, designed to help third-party developers that need to ship software outside of traditional package management infrastructure.

Ryan C. Gordon 38 Dec 10, 2022
Standalone MinHook wrapper for Golang.

Standalone version of GoMinHook! Credit to https://github.com/NaniteFactory/gominhook and https://github.com/TsudaKageyu/minhook as almost all of the

null 3 Jun 4, 2022
Node running standalone on PC, with interface - self-containing all dependencies

GMD Node Windows Application It is the GMD Node App for Windows packaged in a simple "one-click" installer containing all necessary dependencies. We a

Geoma COOP 3 Sep 25, 2022
Python Standalone Deploy Environment

PyStand Python 独立部署环境。Python 3.5 以后,Windows 下面都有一个 Embedded Python 的 独立 Python 运行环境,这个 PyStand 就是配合 Embedded Python 使用的。

Linwei 207 Dec 26, 2022
PikaScript is an ultra-lightweight Python engine with zero dependencies and zero-configuration, that can run with 4KB of RAM (such as STM32G030C8 and STM32F103C8), and is very easy to deploy and expand.

PikaScript 中文页| Star please~ 1. Abstract PikaScript is an ultra-lightweight Python engine with zero dependencies and zero-configuration, that can run

Lyon 906 Dec 29, 2022
Vireo is a lightweight and versatile video processing library written in C++11

Overview Vireo is a lightweight and versatile video processing library that powers our video transcoding service, deep learning recognition systems an

Twitter 874 Dec 27, 2022
agent-less and lightweight communication library compatible with rclcpp for embedded devices

mros2 mros2 (formally mROS 2) realizes an agent-less and lightweight runtime environment compatible with ROS 2 for embedded devices. It consists of ba

null 123 Dec 21, 2022
A C++ fast and lightweight 3D vector library

Vector3D A C++ fast and lightweight 3D vector library. Usage On your C++ code include the file. #include <vector.h> Then, declare your vector and ini

Carlos del Valle 10 Dec 21, 2022
Newlib for Xuantie RISC-V CPU, a lightweight C library for embedded systems.

README for GNU development tools This directory contains various GNU compilers, assemblers, linkers, debuggers, etc., plus their support routines, d

T-Head Semiconductor Co., Ltd. 5 Sep 9, 2022
BNFLite is a C++ template library for lightweight flexible grammar parsers

BNFLite is a C++ template library for lightweight flexible grammar parsers. BNFLite offers creative approach when the developer can specify a language for further parsing directly in the C++ code. Moreover, such "specifications" are executable now!

Alexander S 64 Dec 19, 2022
OptimLib: a lightweight C++ library of numerical optimization methods for nonlinear functions

OptimLib OptimLib is a lightweight C++ library of numerical optimization methods for nonlinear functions. Features: A C++11 library of local and globa

Keith O'Hara 618 Dec 27, 2022
Pipet - c++ library for building lightweight processing pipeline at compile-time for string obfuscation, aes ciphering or whatever you want

Pipet Pipet is a lightweight c++17 headers-only library than can be used to build simple processing pipelines at compile time. Features Compile-time p

C. G. 60 Dec 12, 2022
A lightweight C++14 parsing library for tmx map files created with the Tiled map editor

tmxlite Description A lightweight C++14 parsing library for tmx map files created with the Tiled map editor. Requires no external linking, all depende

Matt Styles 325 Dec 26, 2022
Lightweight Windows/Linux terminal control library for C/C++

TerControl Table of Contents About TerControl Features Installation Contributing License Contact Thanks TerControl is a lightweight opinion based term

Zackery .R. Smith 3 Sep 3, 2022
Fast and lightweight username lookup tool inspired by sherlock.

Lightweight username lookup inspired by Sherlock. Created in C++. Features Works on 250+ websites Fast, easy to use and compact How to use $ scout.exe

eternity 10 Jun 23, 2022
MasterPlan is a project management software / visual idea board software. It attempts to be easy to use, lightweight, and fun.

MasterPlan is a customizeable graphical project management software for independent users or small teams. If you need to share plans across a whole co

SolarLune 444 Dec 23, 2022
A simple, lightweight Python 2.7 interpreter, with predictable memory management and without global locks.

libPyVM is a compact Python 2.7 interpreter that can be used in various places where the standard CPython interpreter is not appropriate or cumbersome

null 9 May 19, 2021
A cross-platform,lightweight,scalable game server framework written in C++, and support Lua Script

Current building status Moon Moon is a lightweight online game server framework implement with multithread and multi-luaVM. One thread may have 1-N lu

Bruce 467 Dec 29, 2022