Universal configuration library parser

LIBUCL

Introduction

This document describes the main features and principles of the configuration language called UCL - universal configuration language.

If you are looking for the libucl API documentation you can find it at this page.

Basic structure

UCL is heavily influenced by the nginx configuration format, which serves as an example of a convenient configuration system. However, UCL is fully compatible with the JSON format and is able to parse JSON files. For example, you can write the same configuration in the following ways:

  • in nginx like:
param = value;
section {
    param = value;
    param1 = value1;
    flag = true;
    number = 10k;
    time = 0.2s;
    string = "something";
    subsection {
        host = {
            host = "hostname";
            port = 900;
        }
        host = {
            host = "hostname";
            port = 901;
        }
    }
}
  • or in JSON:
{
    "param": "value",
    "section": {
        "param": "value",
        "param1": "value1",
        "flag": true,
        "number": 10000,
        "time": "0.2s",
        "string": "something",
        "subsection": {
            "host": [
                {
                    "host": "hostname",
                    "port": 900
                },
                {
                    "host": "hostname",
                    "port": 901
                }
            ]
        }
    }
}

Improvements to the json notation.

There are various things that make ucl configuration more convenient for editing than strict json:

General syntax sugar

  • Braces are not necessary to enclose a top object: it is automatically treated as an object:
"key": "value"

is equal to:

{"key": "value"}
  • Quotes are not required for strings and keys; moreover, : may be replaced with = or even be skipped for objects:
key = value;
section {
    key = value;
}

is equal to:

{
    "key": "value",
    "section": {
        "key": "value"
    }
}
  • No comma mess: you can safely place a trailing comma or semicolon after the last element in an array or an object:
{
    "key1": "value",
    "key2": "value",
}

Automatic arrays creation

  • Non-unique keys in an object are allowed and are automatically converted to arrays internally:
{
    "key": "value1",
    "key": "value2"
}

is converted to:

{
    "key": ["value1", "value2"]
}
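
The duplicate-key behavior above can be imitated outside libucl. The following is only a sketch, not libucl code: it uses Python's standard `json` module with an `object_pairs_hook`, and the helper name `collect_duplicates` is made up for illustration. Repeated keys are gathered into a list, roughly as described above.

```python
import json

def collect_duplicates(pairs):
    """Build a dict from (key, value) pairs, turning repeated keys into lists."""
    result = {}
    implicit = set()  # keys that were already turned into an implicit array
    for key, value in pairs:
        if key not in result:
            result[key] = value
        elif key in implicit:
            result[key].append(value)
        else:
            result[key] = [result[key], value]
            implicit.add(key)
    return result

text = '{"key": "value1", "key": "value2", "other": 1}'
parsed = json.loads(text, object_pairs_hook=collect_duplicates)
print(parsed)
```

Without the hook, Python's parser would silently keep only the last value for `key`, which is the behavior UCL avoids here.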

Named keys hierarchy

UCL accepts named keys and organizes them into an object hierarchy internally. Here is an example of this process:

section "blah" {
	key = value;
}
section foo {
	key = value;
}

is converted to the following object:

section {
	blah {
		key = value;
	}
	foo {
		key = value;
	}
}

Plain definitions may be more complex and contain more than a single level of nested objects:

section "blah" "foo" {
	key = value;
}

is presented as:

section {
	blah {
		foo {
			key = value;
		}
	}
}
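
As a rough model of this conversion (a sketch using plain Python dictionaries, not libucl's internal representation; `nest_named_section` is a hypothetical helper name), each name after the first becomes one more level of nesting:

```python
def nest_named_section(tree, names, body):
    """Store body under tree[names[0]][names[1]]..., creating levels as needed."""
    node = tree
    for name in names[:-1]:
        node = node.setdefault(name, {})
    node.setdefault(names[-1], {}).update(body)
    return tree

# section "blah" { key = value; }  and  section foo { key = value; }
config = {}
nest_named_section(config, ["section", "blah"], {"key": "value"})
nest_named_section(config, ["section", "foo"], {"key": "value"})
print(config)

# section "blah" "foo" { key = value; }
deep = nest_named_section({}, ["section", "blah", "foo"], {"key": "value"})
print(deep)
```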

Convenient numbers and booleans

  • Numbers can have suffixes to specify standard multipliers:
    • [kKmMgG] - standard base-10 multipliers (so 1k is translated to 1000)
    • [kKmMgG]b - power-of-2 multipliers (so 1kb is translated to 1024)
    • [ms|s|min|d|w|y] - time multipliers; all time values are translated to a floating-point number of seconds, for example 10min is translated to 600.0 and 10ms is translated to 0.01
  • Hexadecimal integers can be written with the 0x prefix, for example key = 0xff. However, floating point values can use decimal base only.
  • Booleans can be specified as true or yes or on and false or no or off.
  • It is still possible to treat numbers and booleans as strings by enclosing them in double quotes.
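
The suffix rules above can be illustrated with a small stand-alone sketch. This is not the libucl lexer: the function name `parse_number` and the regular expression are assumptions made for illustration, and the `y` suffix is left out because the text does not state how long a year is taken to be.

```python
import re

DECIMAL = {"k": 1000, "m": 1000**2, "g": 1000**3}
BINARY = {"k": 1024, "m": 1024**2, "g": 1024**3}
# Seconds per unit for the time suffixes covered by this sketch.
TIME = {"ms": 0.001, "s": 1.0, "min": 60.0, "d": 86400.0, "w": 604800.0}

_NUMBER = re.compile(r"^(0x[0-9a-fA-F]+|[0-9]+(?:\.[0-9]+)?)([a-zA-Z]*)$")

def parse_number(text):
    """Convert a number with an optional multiplier suffix to int or float."""
    match = _NUMBER.match(text)
    if match is None:
        raise ValueError(f"not a number: {text!r}")
    digits, suffix = match.groups()
    suffix = suffix.lower()
    if digits.lower().startswith("0x"):
        if suffix:
            raise ValueError("suffixes on hexadecimal values are not modeled")
        return int(digits, 16)
    value = float(digits) if "." in digits else int(digits)
    if not suffix:
        return value
    if suffix in TIME:
        return float(value) * TIME[suffix]  # time values are always floats
    if suffix in DECIMAL:
        return value * DECIMAL[suffix]
    if len(suffix) == 2 and suffix[1] == "b" and suffix[0] in BINARY:
        return value * BINARY[suffix[0]]
    raise ValueError(f"unknown suffix: {suffix!r}")

for sample in ["1k", "1kb", "10min", "0xff"]:
    print(sample, "->", parse_number(sample))
```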

General improvements

Comments

UCL supports two styles of comments:

  • single line: #
  • multiline: /* ... */

Multiline comments may be nested:

# Sample single line comment
/*
 some comment
 /* nested comment */
 end of comment
*/
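
Nested multiline comments are usually handled with a depth counter rather than a simple search for the first `*/`. The sketch below shows that idea under simplifying assumptions: it is not the libucl implementation, the name `strip_comments` is hypothetical, and quoted strings containing `#` or `/*` are not handled.

```python
def strip_comments(text):
    """Remove '#' line comments and nested '/* ... */' comments."""
    out = []
    depth = 0  # current nesting level of /* ... */ comments
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        elif depth > 0:
            i += 1  # inside a multiline comment: drop the character
        elif text[i] == "#":
            # single-line comment: skip up to, but not including, the newline
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

sample = "a = 1; # note\n/* outer /* inner */ still comment */b = 2;\n"
print(strip_comments(sample))
```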

Macros support

UCL supports external macros, both single-line and multiline:

.macro_name "sometext";
.macro_name {
    Some long text
    ....
};

Moreover, each macro can accept an optional list of arguments in parentheses. These arguments are themselves a UCL object that is parsed and passed to the macro as options:

.macro_name(param=value) "something";
.macro_name(param={key=value}) "something";
.macro_name(.include "params.conf") "something";
.macro_name(#this is multiline macro
param = [value1, value2]) "something";
.macro_name(key="()") "something";

UCL also provides a convenient include macro to load content from other files into the current UCL object. This macro accepts either a path to a file:

.include "/full/path.conf"
.include "./relative/path.conf"
.include "${CURDIR}/path.conf"

or a URL (if UCL is built with URL support provided by either libcurl or libfetch):

.include "http://example.com/file.conf"

The .include macro supports the following options:

  • try (default: false) - if this option is true, then UCL treats errors on loading this file as non-fatal. For example, such a file can be absent without stopping the parsing of the top-level document.
  • sign (default: false) - if this option is true, UCL loads and checks the signature for a file from the path named <FILEPATH>.sig. Trusted public keys should be provided to the UCL API after the parser is created but before any configurations are parsed.
  • glob (default: false) - if this option is true, UCL treats the filename as a GLOB pattern and loads all files that match the specified pattern (normally the pattern format is defined in the glob manual page for your operating system). This option is meaningless for URL includes.
  • url (default: true) - allow URL includes.
  • path (default: empty) - a UCL_ARRAY of directories to search for the include file. The search ends after the first match, unless glob is true, in which case all matches are included.
  • prefix (default: false) - put the included contents inside an object instead of loading them into the root. If no key is provided, one is automatically generated based on each file's basename().
  • key (default: empty string) - the key to load the contents of the include into. If the key already exists, it must be of the correct type.
  • target (default: object) - Specify if the prefix key should be an object or an array.
  • priority (default: 0) - specify priority for the include (see below).
  • duplicate (default: 'append') - specify policy of duplicates resolving:
    • append - the default strategy: if the new object has a higher priority, it replaces the old one; if the new object has a lower priority, it is ignored completely; and if two duplicate objects have the same priority, the result is a multi-value key (implicit array)
    • merge - if the value is an object or an array, the new keys are merged into it; if it is a plain object, an implicit array is formed (regardless of priorities)
    • error - create error on duplicate keys and stop parsing
    • rewrite - always rewrite an old value with new one (ignoring priorities)

The UCL parser uses priorities to decide how objects are rewritten when other files are included, as follows:

  • If two objects have the same priority, they form an implicit array
  • If a new object has a higher priority, it overwrites the old one
  • If a new object has a lower priority, it is ignored

By default, the priority of the top-level object is set to zero (the lowest priority). Currently, you can define up to 16 priorities (from 0 to 15). Includes with higher priorities will rewrite keys from objects with lower priorities, as specified by the policy. The priority of the top-level or any other object can be changed with the .priority macro, which has no options and takes the new priority:

# Default priority: 0.
foo = 6
.priority 5
# The following will have priority 5.
bar = 6
baz = 7
# The following will be included with a priority of 3, 5, and 6 respectively.
.include(priority=3) "path.conf"
.include(priority=5) "equivalent-path.conf"
.include(priority=6) "highpriority-path.conf"
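
The three priority rules can be modeled in a few lines. This is a simplified sketch of the default `append` policy only, using a plain dictionary as the store; `insert_with_priority` is a hypothetical name and none of this reflects libucl's internal data structures.

```python
def insert_with_priority(store, key, value, priority):
    """Apply the 'append' policy; store maps key -> (priority, [values])."""
    if not 0 <= priority <= 15:
        raise ValueError("priority must be between 0 and 15")
    if key not in store:
        store[key] = (priority, [value])
        return
    old_priority, values = store[key]
    if priority > old_priority:
        store[key] = (priority, [value])  # higher priority replaces the old value
    elif priority == old_priority:
        values.append(value)  # same priority: implicit array
    # lower priority: the new value is ignored

store = {}
insert_with_priority(store, "foo", 6, 0)
insert_with_priority(store, "foo", 7, 0)   # same priority, becomes [6, 7]
insert_with_priority(store, "foo", 8, 5)   # higher priority, replaces
insert_with_priority(store, "foo", 9, 3)   # lower priority, ignored
print(store)
```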

Variables support

UCL supports variables in input. Variables are registered by a user of the UCL parser and can be presented in the following forms:

  • ${VARIABLE}
  • $VARIABLE

UCL currently does not support nested variables. To escape variables one could use double dollar signs:

  • $${VARIABLE} is converted to ${VARIABLE}
  • $$VARIABLE is converted to $VARIABLE

However, if no valid variables are found in a string, no expansion will be performed (and $$ thus remains unchanged). This is subject to change in future libucl releases.
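
The expansion and escaping rules above can be sketched as follows. This models only what the text describes and is not libucl's variable handler: the function name `expand` and the accepted variable-name characters are assumptions, and registered variables are passed in as a plain dictionary.

```python
import re

# Matches "$$", "${NAME}" or "$NAME" (the name syntax is an assumption).
_TOKEN = re.compile(r"\$\$|\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)")

def expand(text, variables):
    """Expand registered variables; leave text untouched if none are found."""
    found = False

    def replace(match):
        nonlocal found
        if match.group(0) == "$$":
            return "$"  # escaped dollar sign
        name = match.group(1) or match.group(2)
        if name in variables:
            found = True
            return variables[name]
        return match.group(0)  # unknown variable: keep as written

    expanded = _TOKEN.sub(replace, text)
    # No valid variable in the string means no expansion at all,
    # so "$$" stays as it is in that case.
    return expanded if found else text

registered = {"ABI": "amd64"}
print(expand("${ABI}/pkg", registered))
print(expand("$ABI and $${ABI}", registered))
print(expand("cost: $$5", registered))
```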

Multiline strings

UCL can handle multiline strings as well as single line ones. It uses shell/perl like notation for such objects:

key = <<EOD
some text
splitted to
lines
EOD

In this example key will be interpreted as the following string: some text\nsplitted to\nlines. Here are some rules for this syntax:

  • Multiline terminator must start just after << symbols and it must consist of capital letters only (e.g. <<eof or << EOF won't work);
  • Terminator must end with a single newline character (and no spaces are allowed between terminator and newline character);
  • To finish a multiline string, place the terminator on its own line, immediately after a newline and immediately followed by a newline (no spaces or other characters are allowed);
  • The initial and the final newlines are not inserted to the resulting string, but you can still specify newlines at the beginning and at the end of a value, for example:
key <<EOD

some
text

EOD
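
To make the terminator rules concrete, here is a minimal sketch that reads one multiline string from a list of lines. It is an illustration under stated assumptions rather than the libucl parser: `read_heredoc` is a hypothetical name, and keys are assumed to be simple words.

```python
import re

# The terminator must follow "<<" directly and use capital letters only.
_START = re.compile(r"^(\w+)\s*=?\s*<<([A-Z]+)$")

def read_heredoc(lines):
    """Parse 'key = <<TERM', the body lines, and a closing TERM line."""
    match = _START.match(lines[0])
    if match is None:
        raise ValueError("not a multiline string start")
    key, terminator = match.groups()
    body = []
    for line in lines[1:]:
        if line == terminator:  # must match exactly, with no extra spaces
            return key, "\n".join(body)
        body.append(line)
    raise ValueError("unterminated multiline string")

print(read_heredoc(["key = <<EOD", "some text", "splitted to", "lines", "EOD"]))
print(read_heredoc(["key <<EOD", "", "some", "text", "", "EOD"]))
```

In this model, the blank lines in the second call become a leading and a trailing newline in the value, while the newlines next to the terminator lines themselves are not part of it.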

Single quoted strings

It is possible to use single quoted strings to simplify escaping rules. Values passed in single quoted strings are NOT escaped, with two exceptions: a ' character immediately after a \ character (which produces a literal '), and a newline character immediately after a \ character (which is ignored).

key = 'value'; # Read as value
key = 'value\n\'; # Read as  value\n\
key = 'value\''; # Read as value'
key = 'value\
bla'; # Read as valuebla

Emitter

Each UCL object can be serialized to one of the four supported formats:

  • JSON - canonical JSON notation (structure indented with spaces);
  • Compacted JSON - compact json notation (without spaces or newlines);
  • Configuration - nginx like notation;
  • YAML - yaml inlined notation.
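
The difference between the first two formats can be illustrated without libucl. The sketch below uses Python's standard `json` module, not the UCL emitter, so the exact whitespace libucl produces may differ; it only shows the general contrast between an indented and a compact rendering of the same object.

```python
import json

config = {"section": {"flag": True, "number": 10000, "hosts": ["a", "b"]}}

# Indented JSON: one element per line, nested levels indented with spaces.
pretty = json.dumps(config, indent=4)

# Compact JSON: no spaces or newlines between tokens.
compact = json.dumps(config, separators=(",", ":"))

print(pretty)
print(compact)
```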

Validation

UCL allows validation of objects. It uses the same schema format that is used for JSON: JSON Schema v4. UCL supports the full set of JSON Schema features with the exception of remote references, which are unlikely to be useful for configuration objects. A schema definition can itself be written in UCL format instead of JSON, which simplifies writing schemas. Moreover, since UCL supports multiple values for a key in an object, it is possible to specify the generic integer constraints maxValues and minValues to limit the number of values for a single key. UCL is currently not absolutely strict about validating the schemas themselves, so UCL users should supply valid schemas (as defined in the JSON Schema draft v4) to ensure that input objects are validated properly.
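
The idea behind maxValues and minValues can be sketched as a simple count check. This is only a model of the description above, not libucl's validator: the helper names are hypothetical, a Python list stands in for a multi-value key, and where these constraints sit inside a real schema is not covered here.

```python
def value_count(value):
    """A list stands in for a multi-value key; anything else is one value."""
    return len(value) if isinstance(value, list) else 1

def check_value_count(value, constraints):
    """Return True if the number of values satisfies minValues/maxValues."""
    count = value_count(value)
    if count < constraints.get("minValues", 0):
        return False
    if "maxValues" in constraints and count > constraints["maxValues"]:
        return False
    return True

hosts = ["a.example.com", "b.example.com"]
print(check_value_count(hosts, {"minValues": 1, "maxValues": 2}))
print(check_value_count(hosts, {"maxValues": 1}))
print(check_value_count("single", {"minValues": 2}))
```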

Performance

Are the UCL parser and emitter fast enough? Well, here are some numbers. I took a 19 MB file consisting of ~700 thousand lines of JSON (obtained via http://www.json-generator.com/). Then I measured the jansson library, which performs JSON parsing and emitting, and compared it with UCL. Here are the results:

jansson: parsed json in 1.3899 seconds
jansson: emitted object in 0.2609 seconds

ucl: parsed input in 0.6649 seconds
ucl: emitted config in 0.2423 seconds
ucl: emitted json in 0.2329 seconds
ucl: emitted compact json in 0.1811 seconds
ucl: emitted yaml in 0.2489 seconds

So far, UCL seems to be significantly faster than jansson on parsing and slightly faster on emitting. Moreover, UCL compiled with optimizations (-O3) performs significantly faster:

ucl: parsed input in 0.3002 seconds
ucl: emitted config in 0.1174 seconds
ucl: emitted json in 0.1174 seconds
ucl: emitted compact json in 0.0991 seconds
ucl: emitted yaml in 0.1354 seconds

You can run your own benchmarks with make check in the libucl top directory.

Conclusion

UCL has a clear design that should be very convenient for reading and writing. At the same time, it is compatible with JSON and can therefore be used as a simple JSON parser. Macro logic makes it possible to extend the configuration language (for example, by including some Lua code), and comments make it possible to quickly disable or enable parts of a configuration.

Comments
  • ucl_object_free_internal too aggressive?

    Hi! I've been getting some strange segfaults in my go-libucl library, and I outputted ref counts and everything looks good. It seems like when the parent gets freed, all children are automatically freed.

    Instead, shouldn't it just unref the children?

    opened by mitchellh 11
  • Why `const unsigned char *` for chunks but nothing else?

    The API for ucl_parser_add_chunk is inconsistent, it takes a const unsigned char * while everything else takes a const char *. While the project is young I figured I'd report this if you feel you want to make the change.

    It is just a bit odd.

    opened by mitchellh 11
  • C++ interface: variable support

    • User-defined variable support
      • Add static member function ucl::Ucl::find_variable(...) as an assistant function
      • Parse with variables
        • Using key-value pairs (std::map)
        • Or, using 'strategy' class (OOP design pattern); example inherited class is attached
      • Actually, user-defined ucl_variable_handle function was not working properly (or, this feature was not implemented yet). I tried to fix the bug in C code, and I would be grateful if you check it.
    • Replace functions which have std::istream param, with parse_from_file(...) and find_from_file(...)
      • I think getting input from a stream object (or file pointer) is not a good idea
        • Class interface should provide 'core' and 'minimal' functionalities only, considering encapsulation. And istream-to-string conversion is not so difficult, for users of Ucl module.
        • Those functions did not check the validity of input parameters at all. They should check it, but it could be complex job...
        • Most of all, core C interface already provides the file-reading functionality. We can use ucl_parser_add_file function.
        • C++ interface of Ucl was inspired by json11(https://github.com/dropbox/json11/blob/master/json11.hpp), but json11 does not support istream-input.
      • However, changing (and especially removing) interfaces is never desirable. I'll respect your decision and restore the functions if needed.
    opened by ttti07 6
  • Comment code results in doubles

    Output with comments retained results in any consecutive comments being duplicated

    double_comment.in:

    key1 = 1;
    #comment1
    #comment2
    key2 = 2;
    

    result: test_basic -C double_comment.in

    #comment1
    #comment2
    key1 = 1;
    #comment1
    #comment2
    key2 = 2;
    

    It does not happen if the comments are not consecutive: normal.in:

    key1 = 1;
    #comment1
    key2 = 2;
    #comment2
    key3 = 3;
    

    test_basic -C normal.in

    key1 = 1;
    #comment1
    key2 = 2;
    #comment2
    key3 = 3;
    
    opened by allanjude 6
  • Added fuzzer for msgpack

    A couple of notes on this fuzzer:

    1: It runs out of memory fairly quickly (about 20-30 seconds). To me it looks like everything is freed as it should be, but if that is not the case, I will appreciate any observations.

    2: There are 2 assertions that make the fuzzer stop altogether. The first is on line 1304: https://github.com/vstakhov/libucl/blob/e03e0bc63bfe38ac41b67779a491988d1d6fc501/src/ucl_msgpack.c#L1301-L1306 If you see a way to not trigger it, please let me know. We could free parser->stack manually and add a return statement just above it to not stop the fuzzer.

    The second is on line 1047: https://github.com/vstakhov/libucl/blob/e03e0bc63bfe38ac41b67779a491988d1d6fc501/src/ucl_msgpack.c#L1033-L1049 Depending on what the logic is here, it might not be possible to avoid it. If you have a suggestion to not abort here, do let me know.

    In either case, we don’t need to change the code in the repository. We would make a temporary fix when running the fuzzers. Nevertheless, I would like to be sure that we don’t divert from the logic of the application.

    It would be great if we could find a solution, as the fuzzer does find an off-by-one in ucl_msgpack.c, and it covers a large part of the ucl_msgpack.c code.

    opened by AdamKorcz 5
  • src/ucl_util.c: add missing include

    Hi all,

    I ran into the below when building rtpproxy master, which includes a copy of libucl. I sent a PR to sippy/libucl, but seeing that it is a fork of this repo, I am sending the PR to you as well.

    This was compile-tested on OpenWrt master with gcc 7.4.0 and libc musl. I'm not 100% sure what rtpproxy uses libucl for, but the resulting rtpproxy was run-tested on a big endian mips router.

    Kind regards, Seb

    When building within rtpproxy the following breakage occurs:

    ../external/libucl/src/ucl_util.c: In function 'ucl_include_url':
    ../external/libucl/src/ucl_util.c:914:14: error: 'PATH_MAX' undeclared (first use in this function); did you mean 'INT8_MAX'?
      char urlbuf[PATH_MAX];
                  ^~~~~~~~
                  INT8_MAX
    ../external/libucl/src/ucl_util.c:914:14: note: each undeclared identifier is reported only once for each function it appears in
    ../external/libucl/src/ucl_util.c: In function 'ucl_include_file_single':
    ../external/libucl/src/ucl_util.c:987:15: error: 'PATH_MAX' undeclared (first use in this function); did you mean 'INT8_MAX'?
      char filebuf[PATH_MAX], realbuf[PATH_MAX];
                   ^~~~~~~~
                   INT8_MAX
    ../external/libucl/src/ucl_util.c:188:22: warning: implicit declaration of function 'realpath'; did you mean 'realloc'? [-Wimplicit-function-declaration]
     #define ucl_realpath realpath
                          ^
    ../external/libucl/src/ucl_util.c:996:6: note: in expansion of macro 'ucl_realpath'
      if (ucl_realpath (filebuf, realbuf) == NULL) {
          ^~~~~~~~~~~~
    ../external/libucl/src/ucl_util.c:1059:21: warning: implicit declaration of function 'strdup'; did you mean 'strcmp'? [-Wimplicit-function-declaration]
      parser->cur_file = strdup (realbuf);
                         ^~~~~~
                         strcmp
    ../external/libucl/src/ucl_util.c: In function 'ucl_include_file':
    ../external/libucl/src/ucl_util.c:1314:20: error: 'PATH_MAX' undeclared (first use in this function); did you mean 'INT8_MAX'?
      char glob_pattern[PATH_MAX];
                        ^~~~~~~~
                        INT8_MAX
    ../external/libucl/src/ucl_util.c: In function 'ucl_include_common':
    ../external/libucl/src/ucl_util.c:1390:13: error: 'PATH_MAX' undeclared (first use in this function); did you mean 'INT8_MAX'?
      char ipath[PATH_MAX];
                 ^~~~~~~~
                 INT8_MAX
    ../external/libucl/src/ucl_util.c: In function 'ucl_parser_set_filevars':
    ../external/libucl/src/ucl_util.c:1833:15: error: 'PATH_MAX' undeclared (first use in this function); did you mean 'INT8_MAX'?
      char realbuf[PATH_MAX], *curdir;
                   ^~~~~~~~
                   INT8_MAX
    ../external/libucl/src/ucl_util.c: In function 'ucl_parser_add_file_full':
    ../external/libucl/src/ucl_util.c:1868:15: error: 'PATH_MAX' undeclared (first use in this function); did you mean 'INT8_MAX'?
      char realbuf[PATH_MAX];
                   ^~~~~~~~
                   INT8_MAX
    ../external/libucl/src/ucl_util.c: In function 'ucl_object_copy_internal':
    ../external/libucl/src/ucl_util.c:3459:36: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
        new->trash_stack[UCL_TRASH_KEY] =
                                        ^
    ../external/libucl/src/ucl_util.c:3466:38: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
        new->trash_stack[UCL_TRASH_VALUE] =
                                          ^
    make[4]: *** [Makefile:615: libucl_a-ucl_util.o] Error 1
    

    Fix this by including the proper header.

    Signed-off-by: Sebastian Kemper [email protected]

    opened by micmac1 5
  • lua binding for json schema validation

    I would very much like to have access to the json schema validation api via lua/luajit. Is this going to happen any time soon, and is it possible to sponsor it?

    opened by bjne 5
  • fix: Changed OpenSSL check inside configure.am

    In OpenSSL 1.1.0 the EVP_MD_CTX_create() and EVP_MD_CTX_destroy() functions were renamed to EVP_MD_CTX_new() and EVP_MD_CTX_free(). Because a check for EVP_MD_CTX_create() was in place inside configure.am, building with newer OpenSSL versions could not be done.

    Checking for EVP_MD_CTX_create function from configure.am was replaced with a check for CRYPTO_new_ex_data() function.

    Because a compatibility layer was introduced in OpenSSL 1.1.0, no code changes are necessary.

    Fixes: #203

    opened by vimishor 4
  • How to use macros?

    It is not clear from the documentation or examples how to use macros or how they work:

    1. Can a user define a macro in a UCL file?
    2. How do macros take arguments? Sometimes there are keyword arguments passed in parenthesis, and then there is a string after the macro name. What is the syntax for macros?
    3. How do they work?

    more info please, Thanks

    opened by dpacbach 4
  • Fix all Clang warnings from implicit conversions.

    This commit will fix all remaining Clang warnings that arise from building the core libucl library. These warnings arise from various kinds of conversions to and from char pointers: {signed,unsigned} {const,non-const} char*.

    The code uses all four of these variations and implicitly converts between them in many places. This generates many warnings on clang with -Wall.

    To fix this without introducing too much noise in the code there are four new macros:

    Cast to                 Macro
    char*                   S_(...)
    unsigned char*          U_(...)
    const char*             SC_(...)
    const unsigned char*    UC_(...)

    The idea behind this change is, instead of changing any of the types of e.g. function parameters or variables, we continue to do the casting, but make it explicit with these macros.

    opened by dpacbach 4
  • Macro help

    Could someone please explain how to use .macro? I'm trying to figure out how to define and use my own macro but I'm not having any luck. Some more complete examples than what the README provides would be helpful. I'm so confused on how to use it.

    Thank you.

    opened by dangerousHobo 4
  • Heap-buffer-overflow in ucl_skip_comments function of ucl_parser.c:182:11

    While fuzz testing with tests/fuzzers/ucl_add_string_fuzzer.c, I found a heap-buffer-overflow in the ucl_skip_comments function at ucl_parser.c:182:11.

    Verification steps

    CC = clang
    CFLAGS = -O1 -fno-omit-frame-pointer -gline-tables-only -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION -fsanitize=address -fsanitize-address-use-after-scope -fsanitize=fuzzer-no-link
    LIB_FUZZING_ENGINE="-fsanitize=fuzzer"
    
    cd libucl 
    ./autogen.sh && ./configure
    make
    
    $CC $CFLAGS $LIB_FUZZING_ENGINE tests/fuzzers/ucl_add_string_fuzzer.c \
        -DHAVE_CONFIG_H -I./src -I./include src/.libs/libucl.a -I./ \
        -o $OUT/ucl_add_string_fuzzer
    ./ucl_add_string_fuzzer $poc
    

    POC file

    https://github.com/HotSpurzzZ/testcases/blob/main/libucl/libucl_Heap_buffer_overflow_ucl_skip_comments

    AddressSanitizer output

    ==24470==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6070000000d1 at pc 0x00000057296a bp 0x7ffe68c8a060 sp 0x7ffe68c8a058
    READ of size 1 at 0x6070000000d1 thread T0
        #0 0x572969 in ucl_skip_comments /src/libucl/src/ucl_parser.c:182:11
        #1 0x562b3b in ucl_parse_key /src/libucl/src/ucl_parser.c:1466:9
        #2 0x562b3b in ucl_state_machine /src/libucl/src/ucl_parser.c:2475:9
        #3 0x560e4e in ucl_parser_add_chunk_full /src/libucl/src/ucl_parser.c:2995:12
        #4 0x570621 in ucl_parser_add_chunk_priority /src/libucl/src/ucl_parser.c:3030:9
        #5 0x570621 in ucl_parser_add_string_priority /src/libucl/src/ucl_parser.c:3093:9
        #6 0x570621 in ucl_parser_add_string /src/libucl/src/ucl_parser.c:3105:9
        #7 0x55a8de in LLVMFuzzerTestOneInput /src/libucl/tests/fuzzers/ucl_add_string_fuzzer.c:17:2
        #8 0x455243 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
        #9 0x440ed2 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
        #10 0x44671c in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
        #11 0x46f312 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
        #12 0x7f9ae934f082 in __libc_start_main
        #13 0x41f6fd in _start

    0x6070000000d1 is located 0 bytes to the right of 65-byte region [0x607000000090,0x6070000000d1)
    allocated by thread T0 here:
        #0 0x523b2d in malloc /src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:129:3
        #1 0x436ed7 in operator new(unsigned long) cxa_noexception.cpp:0
        #2 0x440ed2 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
        #3 0x44671c in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
        #4 0x46f312 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
        #5 0x7f9ae934f082 in __libc_start_main

    SUMMARY: AddressSanitizer: heap-buffer-overflow (/clusterfuzz/bot1700-7565d8759b-549f9/clusterfuzz/bot/builds/libfuzzer_asan_linux_libucl/custom/ucl_add_string_fuzzer+0x572969)
    Shadow bytes around the buggy address:
      0x0c0e7fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0c0e7fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0c0e7fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0c0e7fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0c0e7fff8000: fa fa fa fa 00 00 00 00 00 00 00 00 01 fa fa fa
    =>0x0c0e7fff8010: fa fa 00 00 00 00 00 00 00 00[01]fa fa fa fa fa
      0x0c0e7fff8020: 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa fa
      0x0c0e7fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c0e7fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c0e7fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c0e7fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    Shadow byte legend (one shadow byte represents 8 application bytes):
      Addressable:           00
      Partially addressable: 01 02 03 04 05 06 07
      Heap left redzone:       fa
      Freed heap region:       fd
      Stack left redzone:      f1
      Stack mid redzone:       f2
      Stack right redzone:     f3
      Stack after return:      f5
      Stack use after scope:   f8
      Global redzone:          f9
      Global init order:       f6
      Poisoned by user:        f7
      Container overflow:      fc
      Array cookie:            ac
      Intra object redzone:    bb
      ASan internal:           fe
      Left alloca redzone:     ca
      Right alloca redzone:    cb
    ==24470==ABORTING
     
    
    opened by HotSpurzzZ 0
  • Heap-buffer-overflow in ucl_maybe_parse_number /src/libucl/src/ucl_parser.c

    Build platform

    ubuntu20.04

    Build steps
    CC = clang
    CFLAGS = -O1 -fno-omit-frame-pointer -gline-tables-only -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION -fsanitize=address -fsanitize-address-use-after-scope -fsanitize=fuzzer-no-link
    LIB_FUZZING_ENGINE="-fsanitize=fuzzer"
    
    $CC $CFLAGS $LIB_FUZZING_ENGINE tests/fuzzers/ucl_add_string_fuzzer.c \
        -DHAVE_CONFIG_H -I./src -I./include src/.libs/libucl.a -I./ \
        -o $OUT/ucl_add_string_fuzzer
    
    Test case

    poc.zip

    Execution steps

    ./ucl_add_string_fuzzer poc

    Output
    INFO: Running with entropic power schedule (0xFF, 100).
    INFO: Seed: 227906990
    INFO: Loaded 1 modules   (3863 inline 8-bit counters): 3863 [0x624b30, 0x625a47), 
    INFO: Loaded 1 PC tables (3863 PCs): 3863 [0x5cde08,0x5dcf78), 
    ./ucl_add_string_fuzzer: Running 1 inputs 1 time(s) each.
    Running: poc
    =================================================================
    ==9526==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000000b3 at pc 0x00000055c824 bp 0x7ffeaeae0290 sp 0x7ffeaeae0288
    READ of size 1 at 0x6020000000b3 thread T0
        #0 0x55c823 in ucl_maybe_parse_number /src/libucl/src/ucl_parser.c:846:45
        #1 0x56bd3f in ucl_lex_number /src/libucl/src/ucl_parser.c:1026:8
        #2 0x56bd3f in ucl_parse_value /src/libucl/src/ucl_parser.c:1899:10
        #3 0x56bd3f in ucl_state_machine /src/libucl/src/ucl_parser.c:2509:29
        #4 0x5618be in ucl_parser_add_chunk_full /src/libucl/src/ucl_parser.c:2995:12
        #5 0x571091 in ucl_parser_add_chunk_priority /src/libucl/src/ucl_parser.c:3030:9
        #6 0x571091 in ucl_parser_add_string_priority /src/libucl/src/ucl_parser.c:3093:9
        #7 0x571091 in ucl_parser_add_string /src/libucl/src/ucl_parser.c:3105:9
        #8 0x55b34e in LLVMFuzzerTestOneInput /src/libucl/tests/fuzzers/ucl_add_string_fuzzer.c:17:2
        #9 0x455402 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
        #10 0x440fb2 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
        #11 0x44681c in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
        #12 0x46f1b2 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
        #13 0x7fdb5594f0b2 in __libc_start_main /build/glibc-sMfBJT/glibc-2.31/csu/../csu/libc-start.c:308:16
        #14 0x41f6fd in _start (/home/wcc/workspace/libucl/build-out/ucl_add_string_fuzzer+0x41f6fd)
    
    0x6020000000b3 is located 0 bytes to the right of 3-byte region [0x6020000000b0,0x6020000000b3)
    allocated by thread T0 here:
        #0 0x52482d in malloc /src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:129:3
        #1 0x436f27 in operator new(unsigned long) cxa_noexception.cpp
        #2 0x440fb2 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
        #3 0x44681c in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
        #4 0x46f1b2 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
        #5 0x7fdb5594f0b2 in __libc_start_main /build/glibc-sMfBJT/glibc-2.31/csu/../csu/libc-start.c:308:16
    
    SUMMARY: AddressSanitizer: heap-buffer-overflow /src/libucl/src/ucl_parser.c:846:45 in ucl_maybe_parse_number
    Shadow bytes around the buggy address:
      0x0c047fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0c047fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0c047fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0c047fff8000: fa fa 00 00 fa fa 00 fa fa fa 00 fa fa fa 00 fa
    =>0x0c047fff8010: fa fa 03 fa fa fa[03]fa fa fa 00 fa fa fa 00 04
      0x0c047fff8020: fa fa 00 01 fa fa 00 01 fa fa 05 fa fa fa 00 fa
      0x0c047fff8030: fa fa 00 01 fa fa 06 fa fa fa 07 fa fa fa 02 fa
      0x0c047fff8040: fa fa 04 fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c047fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    Shadow byte legend (one shadow byte represents 8 application bytes):
      Addressable:           00
      Partially addressable: 01 02 03 04 05 06 07 
      Heap left redzone:       fa
      Freed heap region:       fd
      Stack left redzone:      f1
      Stack mid redzone:       f2
      Stack right redzone:     f3
      Stack after return:      f5
      Stack use after scope:   f8
      Global redzone:          f9
      Global init order:       f6
      Poisoned by user:        f7
      Container overflow:      fc
      Array cookie:            ac
      Intra object redzone:    bb
      ASan internal:           fe
      Left alloca redzone:     ca
      Right alloca redzone:    cb
    ==9526==ABORTING
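
The trace above reports a read that lands 0 bytes past a 3-byte heap region, inside `ucl_maybe_parse_number`, on a buffer allocated by the fuzzer driver. A minimal sketch of the general defensive pattern for this kind of input follows. It assumes the scanner is handed a start/end pair rather than a NUL-terminated string. `scan_uint_bounded` is a hypothetical helper written for illustration; it is not libucl's code and not the fix that was applied upstream. Overflow handling is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of libucl: scan an unsigned decimal integer
 * from the half-open range [start, end). The buffer need not be
 * NUL-terminated, so every dereference is guarded by a comparison with end.
 * Returns true and stores the value only if the whole range consists of
 * digits. Overflow of unsigned long is not checked in this sketch.
 */
static bool
scan_uint_bounded(const unsigned char *start, const unsigned char *end,
                  unsigned long *out)
{
    const unsigned char *p = start;
    unsigned long value = 0;

    assert(start != NULL && end != NULL && out != NULL && start <= end);

    if (p == end) {
        return false; /* empty input is not a number */
    }

    /* The bound is tested before *p is read, so p never goes past end. */
    while (p < end && *p >= '0' && *p <= '9') {
        value = value * 10 + (unsigned long)(*p - '0');
        p++;
    }

    if (p != end) {
        return false; /* trailing content: treat as not a number */
    }

    *out = value;
    return true;
}
```

Calling it on a 3-byte buffer such as `{'1','2','3'}` with `end = buf + 3` reads exactly three bytes, which is the property the sanitizer is checking for.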
    
    
    opened by CCWANG19 1
  • Fix excessive escaping when using `ucl_object_fromstring()`

    `UCL_STRING_ESCAPE` might be useful for something (though I'm not sure what), but it's not a good default behaviour. Strings are already escaped during output to meet the needs of the output format, so adding escape sequences while building the object results in values being escaped twice.
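
The double-escaping effect described above can be reproduced without libucl. The sketch below is only an illustration: `escape_string` is a hypothetical helper that handles just double quotes and backslashes, and it does not model libucl's actual escaping rules. Applying it once stands in for escaping at object-construction time, and applying it again stands in for escaping at output time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not libucl code: copy src into dst, prefixing each
 * double quote and backslash with a backslash. Returns false if dst is too
 * small to hold the escaped string plus its terminating NUL.
 */
static bool
escape_string(const char *src, char *dst, size_t dst_size)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    for (; *src != '\0'; src++) {
        size_t need = (*src == '"' || *src == '\\') ? 2 : 1;

        /* Keep one byte in reserve for the terminating NUL. */
        if (n + need >= dst_size) {
            return false;
        }
        if (need == 2) {
            dst[n++] = '\\';
        }
        dst[n++] = *src;
    }

    if (n >= dst_size) {
        return false;
    }
    dst[n] = '\0';
    return true;
}
```

Starting from the 3-character input `a"b`, one pass yields `a\"b`, which is what an emitter should print. A second pass over that result yields `a\\\"b`: the backslash added by the first pass gets escaped as well, so a reader of the output no longer recovers the original string.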

    opened by flowerysong 5
  • Release a new pip package

    The last release of this project to PyPI was in March 2018. Any chance we could get an update? The libucl project seems to be fairly active.

    Thank you

    opened by retr0h 2
  • [DISCUSSION]: Using dot-notation for setting values

    Hello,

    It's currently possible to look up objects with `ucl_object_lookup_path()`. For example, given the following configuration:

    section {
        A {
            B {
                key = value;
            }
        }
    }
    

    You could use the string:

    "section.A.B.key"
    

    This would return `value`.

    I'm proposing the inverse: changing the parser so that the following examples are possible:

    section.A.B.key = "value";
    foo.bar.baz = [1,2,3,4];
    

    ... which would result in:

    section {
        A {
            B {
                key = "value";
            }
        }
    }
    
    foo {
        bar {
            baz = [1,2,3,4]
        }
    }
    

    I know it's possible to do:

    section "A" "B" {
        key = value
    }
    

    But this isn't quite the same thing as what I'm suggesting.

    I suppose we might have an issue whereby if someone has:

    section.A.B.key = value
    

    in their config already, then the existing parser uses `section.A.B.key` as the key and `value` as its value, so there might be a slight backwards-compatibility issue.

    What do you think? I'm happy to do the work, but wanted to check before going ahead.

    One reason for suggesting this is that it makes it possible to write configuration that is entirely line-based (rather than block-based).
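
To make the proposal concrete, here is a minimal sketch of how a dotted key could be expanded into nested sections, together with the matching dotted lookup. It is self-contained and does not use libucl: `struct node`, `set_path` and `lookup_path` are hypothetical names for illustration only, and the parser change itself is not shown. Node memory is never freed in this sketch, and empty path components (as in `a..b`) are not rejected.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tree node standing in for a UCL object; not libucl's type. */
struct node {
    char *key;
    const char *value;     /* set on leaves only; not owned by the node */
    struct node *children; /* first child */
    struct node *next;     /* next sibling */
};

/* Return the child of parent whose key equals key[0..len), or NULL. */
static struct node *
child_find(const struct node *parent, const char *key, size_t len)
{
    struct node *c;

    for (c = parent->children; c != NULL; c = c->next) {
        if (strlen(c->key) == len && memcmp(c->key, key, len) == 0) {
            return c;
        }
    }
    return NULL;
}

/* Set "a.b.c" = value, creating one nesting level per missing component. */
static struct node *
set_path(struct node *root, const char *path, const char *value)
{
    struct node *cur = root;

    assert(root != NULL && path != NULL);

    for (;;) {
        const char *dot = strchr(path, '.');
        size_t len = (dot != NULL) ? (size_t)(dot - path) : strlen(path);
        struct node *child = child_find(cur, path, len);

        if (child == NULL) {
            child = calloc(1, sizeof(*child));
            if (child == NULL) {
                return NULL;
            }
            child->key = malloc(len + 1);
            if (child->key == NULL) {
                free(child);
                return NULL;
            }
            memcpy(child->key, path, len);
            child->key[len] = '\0';
            child->next = cur->children;
            cur->children = child;
        }
        cur = child;
        if (dot == NULL) {
            break;
        }
        path = dot + 1;
    }

    cur->value = value;
    return cur;
}

/* Inverse operation: resolve a dotted path to a node, or NULL if absent. */
static const struct node *
lookup_path(const struct node *root, const char *path)
{
    const struct node *cur = root;

    while (cur != NULL) {
        const char *dot = strchr(path, '.');
        size_t len = (dot != NULL) ? (size_t)(dot - path) : strlen(path);

        cur = child_find(cur, path, len);
        if (dot == NULL) {
            break;
        }
        path = dot + 1;
    }
    return cur;
}
```

With an empty root, `set_path(&root, "section.A.B.key", "value")` builds the `section`, `A` and `B` levels, and `lookup_path(&root, "section.A.B.key")` then returns the leaf. The sketch also shows where the compatibility concern comes from: splitting on `.` means a key that literally contains dots can no longer be stored as a single flat key.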

    opened by ThomasAdam 1
  • Memory safety bug in hashtable replace

    The randomness in the hash implementation makes it difficult for me to reproduce this bug reliably.

    If I call `ucl_object_replace_key` to replace a property that is a string with another one of the same type, I get an intermittent use-after-free (reported by ASan; without ASan I get some random data in memory, and when I try to call the emit function I get truncated nonsense with an invalid character in the middle).

    The random number generator is seeded with the current time and so this reproduces for a few seconds and then goes away. It's probably a good idea to run a fuzzer over this.

    It looks as if the replaced key (which I hold other references to, which are dropped later, and which trigger deallocation at the expected point) remains in the hash table, in spite of the new version being added.
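
The invariant that the report suggests is being violated can be stated in a few lines of code. The sketch below is a toy model, not libucl's hash implementation: `struct obj`, `struct table` and `table_replace` are hypothetical, the table uses a linear scan instead of hashing, and reference counts are plain integers. It only illustrates the expected outcome of a replace: the slot points at the new object, and the table no longer holds the old one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reference-counted object; not libucl's ucl_object_t. */
struct obj {
    const char *key;
    const char *value;
    int refs;
};

#define TABLE_SLOTS 8

/* Toy container: a fixed array scanned linearly instead of a hash table. */
struct table {
    struct obj *slots[TABLE_SLOTS];
};

/* Return the address of the slot holding key, or NULL if it is absent. */
static struct obj **
table_slot(struct table *t, const char *key)
{
    size_t i;

    for (i = 0; i < TABLE_SLOTS; i++) {
        if (t->slots[i] != NULL && strcmp(t->slots[i]->key, key) == 0) {
            return &t->slots[i];
        }
    }
    return NULL;
}

/*
 * Replace the entry that has the same key as fresh. The slot is overwritten
 * first, then the table's reference to the old object is dropped, so no
 * slot still points at an object the table has stopped counting.
 * Returns the old object, or NULL if the key was not present.
 */
static struct obj *
table_replace(struct table *t, struct obj *fresh)
{
    struct obj **slot;
    struct obj *old;

    assert(t != NULL && fresh != NULL);

    slot = table_slot(t, fresh->key);
    if (slot == NULL) {
        return NULL;
    }

    old = *slot;
    *slot = fresh;
    fresh->refs++;
    old->refs--;
    return old;
}
```

If a caller still holds its own reference to the old object, that object stays alive after the replace, but a later lookup by key must return the new one. A use-after-free of the kind described would follow if the old pointer stayed reachable from the table after its count reached zero.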

    opened by davidchisnall 7
Releases (0.8.2)
  • 0.8.2 (Jan 8, 2023)

    What's Changed

    • Backport JSON fixes from rspamd by @flowerysong in https://github.com/vstakhov/libucl/pull/186
    • Fix typos: replace missmatch with mismatch by @0mp in https://github.com/vstakhov/libucl/pull/185
    • python: update package to 0.8.1 by @denisvm in https://github.com/vstakhov/libucl/pull/184
    • Add ability to pass both the parser and userdata into a macro handler by @MeachamusPrime in https://github.com/vstakhov/libucl/pull/187
    • Provide inline free(3) wrapper, so it's easier to plug the code into our memory usage tracking framework. by @sobomax in https://github.com/vstakhov/libucl/pull/188
    • Remove unnecessary (and ignored) const from return types. by @dpacbach in https://github.com/vstakhov/libucl/pull/191
    • Modernize CMake file with target-based includes. by @dpacbach in https://github.com/vstakhov/libucl/pull/192
    • Remove unnecessary std::move from return statement. by @dpacbach in https://github.com/vstakhov/libucl/pull/193
    • Remove unused CMake logic and add -Wno-pointer-sign by @dpacbach in https://github.com/vstakhov/libucl/pull/196
    • Fix ucl++ bug where iterators stop on a UCL_NULL field. by @dpacbach in https://github.com/vstakhov/libucl/pull/197
    • Suppress the [-Wunused-parameter] warning. by @dpacbach in https://github.com/vstakhov/libucl/pull/198
    • Improve ENOMEM handling by @sobomax in https://github.com/vstakhov/libucl/pull/204
    • Fix mismerge in pull request #204. by @sobomax in https://github.com/vstakhov/libucl/pull/205
    • Document ucl_object_iter_chk_excpn() and add it into test scenario. by @sobomax in https://github.com/vstakhov/libucl/pull/206
    • Added a fuzzer for OSS-fuzz integration by @AdamKorcz in https://github.com/vstakhov/libucl/pull/214
    • Priority Validation/Documentation by @kevans91 in https://github.com/vstakhov/libucl/pull/213
    • Added to fuzzer a return statement if the string is 0 by @AdamKorcz in https://github.com/vstakhov/libucl/pull/216
    • Check for NULL inputs in ucl_object_compare() by @mikeowens in https://github.com/vstakhov/libucl/pull/207
    • Pass correct pointer to var_handler by @andoriyu in https://github.com/vstakhov/libucl/pull/218
    • Miscellaneous fixes by @bdrewery in https://github.com/vstakhov/libucl/pull/219
    • fix: ucl_expand_single_variable doesn't call free by @andoriyu in https://github.com/vstakhov/libucl/pull/220
    • fix: Incorrect pointer arithmetics in ucl_expand_single_variable by @andoriyu in https://github.com/vstakhov/libucl/pull/221
    • Install headers and library with CMake by @netbsduser in https://github.com/vstakhov/libucl/pull/222
    • Added fuzzer for msgpack by @AdamKorcz in https://github.com/vstakhov/libucl/pull/224
    • lua: Return early when init fails by @freqlabs in https://github.com/vstakhov/libucl/pull/229
    • make use of the undocumented flag UCL_PARSER_NO_IMPLICIT_ARRAYS, so by @jmgurney in https://github.com/vstakhov/libucl/pull/235
    • Punished snake pr/error fixes by @PunishedSnakePr in https://github.com/vstakhov/libucl/pull/236
    • CMake-Utils by @PunishedSnakePr in https://github.com/vstakhov/libucl/pull/237
    • Fixed ucl_tool's command line argument parsing by @PunishedSnakePr in https://github.com/vstakhov/libucl/pull/238
    • fix: Changed OpenSSL check inside configure.am by @vimishor in https://github.com/vstakhov/libucl/pull/234
    • update JSON example to match w/ UCL example by @jmgurney in https://github.com/vstakhov/libucl/pull/239
    • Cleanup CURL handle after use by @PunishedSnakePr in https://github.com/vstakhov/libucl/pull/243
    • Added CMake compile definitions by @PunishedSnakePr in https://github.com/vstakhov/libucl/pull/242
    • Fixed expanding runtime variables by @PunishedSnakePr in https://github.com/vstakhov/libucl/pull/241
    • docs: fix simple typo, tectual -> textual by @timgates42 in https://github.com/vstakhov/libucl/pull/245
    • CI: migrate to new CircleCI format by @jethron in https://github.com/vstakhov/libucl/pull/246
    • Python changes by @jethron in https://github.com/vstakhov/libucl/pull/247
    • Updated "three" to "four" output formats by @TimHiggison in https://github.com/vstakhov/libucl/pull/248
    • Fixing building using: pip install -e "git+..." by @vanIvan in https://github.com/vstakhov/libucl/pull/249
    • Avoid leaking memory when validating schemata by @netbsduser in https://github.com/vstakhov/libucl/pull/250
    • Mark + as unsafe which fixes export a key with + in config mode by @bapt in https://github.com/vstakhov/libucl/pull/253
    • Modernise the CMake build system slightly. by @davidchisnall in https://github.com/vstakhov/libucl/pull/256
    • ucl_maybe_parse_number: if there is trailing content, it is not a number by @allanjude in https://github.com/vstakhov/libucl/pull/259
    • Fix memory safety issues by @alpire in https://github.com/vstakhov/libucl/pull/260
    • Fix cmake public include install by @kuriboshi in https://github.com/vstakhov/libucl/pull/258
    • mypy/stubgen: add typeinterfaces for ucl python module by @igalic in https://github.com/vstakhov/libucl/pull/168

    New Contributors

    • @flowerysong made their first contribution in https://github.com/vstakhov/libucl/pull/186
    • @0mp made their first contribution in https://github.com/vstakhov/libucl/pull/185
    • @sobomax made their first contribution in https://github.com/vstakhov/libucl/pull/188
    • @dpacbach made their first contribution in https://github.com/vstakhov/libucl/pull/191
    • @AdamKorcz made their first contribution in https://github.com/vstakhov/libucl/pull/214
    • @kevans91 made their first contribution in https://github.com/vstakhov/libucl/pull/213
    • @mikeowens made their first contribution in https://github.com/vstakhov/libucl/pull/207
    • @andoriyu made their first contribution in https://github.com/vstakhov/libucl/pull/218
    • @netbsduser made their first contribution in https://github.com/vstakhov/libucl/pull/222
    • @freqlabs made their first contribution in https://github.com/vstakhov/libucl/pull/229
    • @jmgurney made their first contribution in https://github.com/vstakhov/libucl/pull/235
    • @PunishedSnakePr made their first contribution in https://github.com/vstakhov/libucl/pull/236
    • @vimishor made their first contribution in https://github.com/vstakhov/libucl/pull/234
    • @timgates42 made their first contribution in https://github.com/vstakhov/libucl/pull/245
    • @jethron made their first contribution in https://github.com/vstakhov/libucl/pull/246
    • @TimHiggison made their first contribution in https://github.com/vstakhov/libucl/pull/248
    • @vanIvan made their first contribution in https://github.com/vstakhov/libucl/pull/249
    • @bapt made their first contribution in https://github.com/vstakhov/libucl/pull/253
    • @davidchisnall made their first contribution in https://github.com/vstakhov/libucl/pull/256
    • @alpire made their first contribution in https://github.com/vstakhov/libucl/pull/260
    • @kuriboshi made their first contribution in https://github.com/vstakhov/libucl/pull/258
    • @igalic made their first contribution in https://github.com/vstakhov/libucl/pull/168

    Full Changelog: https://github.com/vstakhov/libucl/compare/0.8.1...0.8.2

Owner
Vsevolod Stakhov