CommonMark parsing and rendering library and program in C

cmark

cmark is the C reference implementation of CommonMark, a rationalized version of Markdown syntax with a spec. (For the JavaScript reference implementation, see commonmark.js.)

It provides a shared library (libcmark) with functions for parsing CommonMark documents to an abstract syntax tree (AST), manipulating the AST, and rendering the document to HTML, groff man, LaTeX, CommonMark, or an XML representation of the AST. It also provides a command-line program (cmark) for parsing and rendering CommonMark documents.

Advantages of this library:

  • Portable. The library and program are written in standard C99 and have no external dependencies. They have been tested with MSVC, gcc, tcc, and clang.

  • Fast. cmark can render a Markdown version of War and Peace in the blink of an eye (127 milliseconds on a ten year old laptop, vs. 100-400 milliseconds for an eye blink). In our benchmarks, cmark is 10,000 times faster than the original Markdown.pl, and on par with the very fastest available Markdown processors.

  • Accurate. The library passes all CommonMark conformance tests.

  • Standardized. The library can be expected to parse CommonMark the same way as any other conforming parser. So, for example, you can use commonmark.js on the client to preview content that will be rendered on the server using cmark.

  • Robust. The library has been extensively fuzz-tested using american fuzzy lop. The test suite includes pathological cases that bring many other Markdown parsers to a crawl (for example, thousands-deep nested bracketed text or block quotes).

  • Flexible. CommonMark input is parsed to an AST which can be manipulated programmatically prior to rendering.

  • Multiple renderers. Output in HTML, groff man, LaTeX, CommonMark, and a custom XML format is supported. And it is easy to write new renderers to support other formats.

  • Free. BSD2-licensed.

It is easy to use libcmark in Python, Lua, Ruby, and other dynamic languages: see the wrappers/ subdirectory for some simple examples.

There are also libraries that wrap libcmark for Go, Haskell, Ruby, Lua, Perl, Python, R and Scala.

Installing

Building the C program (cmark) and shared library (libcmark) requires cmake. If you modify scanners.re, then you will also need re2c (>= 0.14.2), which is used to generate scanners.c from scanners.re. We have included a pre-generated scanners.c in the repository to reduce build dependencies.

If you have GNU make, you can simply make, make test, and make install. This calls cmake to create a Makefile in the build directory, then uses that Makefile to create the executable and library. The binaries can be found in build/src. The default installation prefix is /usr/local. To change the installation prefix, pass the INSTALL_PREFIX variable when you run make for the first time: make INSTALL_PREFIX=path.

For a more portable method, you can use cmake manually. cmake knows how to create build environments for many build systems. For example, on FreeBSD:

mkdir build
cd build
cmake ..  # optionally: -DCMAKE_INSTALL_PREFIX=path
make      # executable will be created as build/src/cmark
make test
make install

Or, to create Xcode project files on OSX:

mkdir build
cd build
cmake -G Xcode ..
open cmark.xcodeproj

The GNU Makefile also provides a few other targets for developers. To run a benchmark:

make bench

For more detailed benchmarks:

make newbench

To run a test for memory leaks using valgrind:

make leakcheck

To reformat source code using clang-format:

make format

To run a "fuzz test" against ten long randomly generated inputs:

make fuzztest

To do a more systematic fuzz test with american fuzzy lop:

AFL_PATH=/path/to/afl_directory make afl

Fuzzing with libFuzzer is also supported but, because libFuzzer is still under active development, may not work with your system-installed version of clang. Assuming LLVM has been built in $HOME/src/llvm/build the fuzzer can be run with:

CC="$HOME/src/llvm/build/bin/clang" LIB_FUZZER_PATH="$HOME/src/llvm/lib/Fuzzer/libFuzzer.a" make libFuzzer

To make a release tarball and zip archive:

make archive

Installing (Windows)

To compile with MSVC and NMAKE:

nmake

You can cross-compile a Windows binary and dll on Linux if you have the mingw32 compiler:

make mingw

The binaries will be in build-mingw/windows/bin.

Usage

Instructions for the use of the command line program and library can be found in the man pages in the man subdirectory.

Security

By default, the library will scrub raw HTML and potentially dangerous links (javascript:, vbscript:, data:, file:).

To allow these, use the option CMARK_OPT_UNSAFE in the library, or --unsafe with the command line program. If you do so, we recommend using an HTML sanitizer specific to your needs to protect against XSS attacks.
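
As a rough illustration of the link filtering described above, here is a minimal sketch of a scheme check. It is not cmark's actual filtering code: the function names are hypothetical, and the check is deliberately simplified to a case-insensitive prefix match against the four schemes listed above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of libcmark): case-insensitive prefix test.
 * Stops safely if s is shorter than prefix, because the terminating NUL
 * never matches a prefix character. */
int demo_has_prefix_nocase(const char *s, const char *prefix) {
  assert(s != NULL && prefix != NULL);
  for (; *prefix != '\0'; s++, prefix++) {
    if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
      return 0;
  }
  return 1;
}

/* Returns 1 if url starts with one of the schemes named in this section,
 * 0 otherwise. Assumption: a plain prefix match is enough for the sketch. */
int demo_is_unsafe_link(const char *url) {
  static const char *const schemes[] = {"javascript:", "vbscript:", "data:",
                                        "file:"};
  size_t i;
  for (i = 0; i < sizeof(schemes) / sizeof(schemes[0]); i++) {
    if (demo_has_prefix_nocase(url, schemes[i]))
      return 1;
  }
  return 0;
}
```

A caller would test each link destination with demo_is_unsafe_link() before emitting it. A check like this does not replace a real HTML sanitizer when unsafe mode is enabled.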

Contributing

There is a forum for discussing CommonMark; you should use it instead of GitHub issues for questions and possibly open-ended discussions. Use the GitHub issue tracker only for simple, clear, actionable issues.

Authors

John MacFarlane wrote the original library and program. The block parsing algorithm was worked out together with David Greenspan. Vicent Marti optimized the C implementation for performance, increasing its speed tenfold. Kārlis Gaņģis helped work out a better parsing algorithm for links and emphasis, eliminating several worst-case performance issues. Nick Wellnhofer contributed many improvements, including most of the C library's API and its test harness.

Comments
  • Extension support in libcmark

    Hello, I always see "extensions" mentioned on discussions of features in CommonMark (for example the discussion about tables).

    Does libcmark itself support actual extensions, and if so, is there a guide on how to implement one and an index of common extensions? Or are extensions purely conceptual additions to the specification, left to individual implementations to add?

    I'm pretty sure I could just read the code and find out but

    • I'm lazy
    • This question may be useful for someone else wondering the same thing.

    Thanks for all your work on CommonMark / pandoc!

    opened by MathieuDuponchelle 58
  • Extensions redux

    Hi there!

    I've taken the work in #123 and rejigged it a bit. At GitHub we're currently using a Sundown-based parser/renderer, but it's not super extensible. So, we've decided to roll out CommonMark to replace it.

    Here are some of the changes to #123 in this PR:

    • Took out the shared object searcher. I don't think library code should be searching for and loading objects dynamically at runtime. Instead, you as the user register whatever plugins you might have linked in yourself. (Maybe you loaded it dynamically; that's not something for a Markdown library to do, imo.)
    • Expanded the extension interface enough that the two existing plugins (table, strikethrough) can be implemented without any changes to the core code. #123 had table-specific code in the core (outside of the shared object), which limited its usefulness. With this branch, implementing them requires no change to core code. Extensions can register their own node types, their own renderer functions for those node types, etc.
    • Fixes the Windows build.
    • Adds tests.
    • Adds an autolink extension.
    • Adds a whitelist extension for the HTML renderer.

    This functionality has all been exposed as opt-in in the Ruby gem commonmarker, which is our primary interface.

    opened by kivikakk 55
  • More sourcepos!

    Found time to work a bit more on https://github.com/jgm/cmark/issues/26 , these commits solely implement the definition of a gap-free source map with no empty extents, parsed in the same pass as the nodes.

    A few things to discuss:

    • Ownership of line-termination characters and trailing whitespace at the end of blocks: I make them belong to the root.
    • Ownership of reference definitions, which I also attribute to the root, as there's no proper AST node for them.
    • Performance: my initial implementation of this approach doubled make bench time (erg). I now observe a 25% difference, which seems reasonable to me. I'm not sure how much more can be ironed out, as the task is inherently a bit complex due to potential discontinuities between the n extents constituting a block. Either we consider this a reasonable enough performance hit and hope to find smarter ways to reduce it, or we make it conditional on the SOURCEPOS option, as it is currently.

    Things that are in my opinion out of scope and can be discussed later:

    • Actual API for this
    • "Reverse source map", ie node to extents

    To test this, just enable the call to "print_parser_source_map" in blocks.c:finalize_document , example output for the case that made me revise my approach:

    >     code
    >     more code
    
    0:1 - block_quote (0x9485f0)
    1:2 - block_quote (0x9485f0)
    2:6 - code_block (0x948740)
    6:14 - code_block (0x948740)
    14:15 - block_quote (0x9485f0)
    15:16 - block_quote (0x9485f0)
    16:20 - code_block (0x948740)
    20:30 - code_block (0x948740)
    

    I'm sure there's more to say, but let's keep this short :)

    opened by MathieuDuponchelle 35
  • Consider usage of the GLib in libcmark

    This subject has been briefly discussed in #100 , but I figured a separate issue to sum up arguments for and against that, and discuss whether this would be acceptable would be useful.

    Argument(s) against using the glib

    The main (and only) argument raised against using the glib is that of portability. I would argue its usage would actually help with portability, with respect to things like loading of plugins, or threading.

    Note that I will open another issue at some point regarding multithreading of the inline parsing phase, as I think this phase is very amenable to parallelization, as long as the separation of inline and block parsing is consistently enforced.

    I'm working for an Open Source software consultancy company, collabora, where we routinely deploy glib-based solutions on a wide range of architectures and operating systems, including Windows, and I've never seen any issues with glib's portability.

    I'm writing this from my seldom-used Windows partition, where I've just successfully compiled a version of cmark built against the glib. Thanks to the MSYS2 project, this has been a completely painless experience. Note that it is also trivial to provide installers using that solution.

    Arguments for using the glib

    Features

    See https://developer.gnome.org/glib/2.48/ for the full list of features, here are a few I think are relevant for cmark, in that they could make its codebase way leaner, and help implement features in a portable manner.

    • Portable basic types: gboolean would, for example, help us get rid of that code. By the way, I'm not even sure how cmark could compile at all when the preprocessor enters the #elif !defined(__cplusplus) branch there, as true and false will not be defined.
    • Standard high-level data structures: Having to implement a poor man's linked list in my work on extensions isn't something I'm satisfied with, more generally the more wheels one reinvents the more surface for bugs one has to maintain. I think (not sure) that reference maps are implemented as a hashtable, one could use a GHashTable for this instead. The AST could be implemented as an N-ary tree etc.
    • String-related utilities: https://developer.gnome.org/glib/2.48/glib-String-Chunks.html https://developer.gnome.org/glib/2.48/glib-Strings.html and https://developer.gnome.org/glib/2.48/glib-String-Utility-Functions.html would let us get rid of cmark_chunk and cmark_strbuf, which I had to expose in the API for extensions.
    • Unicode handling: I think the plethora of functions defined in there would make most if not all of the utf8 handling code in cmark irrelevant.
    • Error handling / Logging: https://developer.gnome.org/glib/2.48/glib-Warnings-and-Assertions.html, https://developer.gnome.org/glib/2.48/glib-Message-Logging.html and https://developer.gnome.org/glib/2.48/glib-Error-Reporting.html would be more than useful to improve ease of debugging of the library for us developers, as well as offer a more advanced error API.
    • Parsing and lexing utilities: https://developer.gnome.org/glib/2.48/glib-Lexical-Scanner.html,https://developer.gnome.org/glib/2.48/glib-regex-syntax.html : I haven't benchmarked these functions, but I'm pretty sure re2c - generated scanners will blow them out of the water performance-wise, however their performance might be adequate for extension implementers who would desire a higher-level lexing / regex interface, and would directly have one available when linking with libcmark.
    • Filesystem-related utilities: https://developer.gnome.org/glib/2.48/glib-File-Utilities.html and https://developer.gnome.org/glib/2.48/glib-URI-Functions.html contain functions that would help simplify some code paths as well.
    • command-line parser to get rid of and improve the equivalent code in the cmark executable.
    • Testing: this wouldn't hurt either :)
    • GApplication: not really needed for now, but one could imagine this being useful if we ever wanted to have a "cmark server".
    • Threading: https://developer.gnome.org/glib/2.48/glib-Thread-Pools.html would help a lot in portably parallelizing inline parsing. I haven't benchmarked this, but I believe there's a significant performance reserve to tap into in that direction, which could be very valuable for one of cmark's announced use cases, which is to be run server-side.
    • Plugin loading: would be extremely useful for extensions obviously.

    Other arguments

    • cmark is already a moderately complex library, which implements internally things that have been standard in the glib for ages. My work on extensions only complexifies it a bit more, with the addition of (linux-only) plugin-loading code, a linked list, exposing API that has nothing to do with cmark's actual job ... Using the glib would drastically reduce the scope of the library, and actually help improve its behaviour and portability. I think this outweighs by far the distribution concerns, which simply need to be addressed once and for all by updating the installing documentation, and possibly providing helpers such as Visual Studio solutions for people insisting on using that piece of technology (I'm afraid I won't be able to help there).
    • The "glib port" is by no means something that needs to happen all at once, it can be done incrementally, and wouldn't interfere with daily development.
    • We could decide at some point to port some of the already kind of object-oriented API to GObject, which would let us offer dynamic introspection capabilities for javascript and python
    • I really really think this is a good idea, if that's any help :)

    @nwellnhof , I know you're concerned with this, but please come to this with an open mind, consider all the things the glib would bring to the table, and evaluate whether this port would really prevent you from using cmark at all, or simply mean spending ten minutes figuring out how to update your bundling of cmark, which could profit to other people using that practice.

    opened by MathieuDuponchelle 26
  • Parsing ‘* * * * * * … a’ takes quadratic time

    $ python -c 'print("* "*10000 + "a")' | time cmark > /dev/null
    1.21user 0.00system 0:01.23elapsed 98%CPU (0avgtext+0avgdata 6048maxresident)k
    0inputs+0outputs (0major+1188minor)pagefaults 0swaps
    $ python -c 'print("* "*20000 + "a")' | time cmark > /dev/null
    7.55user 0.00system 0:07.59elapsed 99%CPU (0avgtext+0avgdata 9968maxresident)k
    0inputs+0outputs (0major+2245minor)pagefaults 0swaps
    $ python -c 'print("* "*40000 + "a")' | time cmark > /dev/null
    41.23user 0.01system 0:41.44elapsed 99%CPU (0avgtext+0avgdata 18848maxresident)k
    0inputs+0outputs (0major+4410minor)pagefaults 0swaps
    

    Related: jgm/commonmark-hs#2, mity/md4c#66.

    opened by andersk 21
  • Build fails on Debian Jessie

    It seems that cmark 0.30.0 can't be compiled on Debian Jessie (compilation of cmark 0.29.0 succeeds):

    From within a Docker image launched with docker run --rm -it debian:jessie bash:

    $ apt-get update && apt-get install -qy cmake curl g++ python3
    [...]
    $ cd "$(mktemp -d)"
    $ curl -sSL -o - https://github.com/commonmark/cmark/archive/0.30.0.tar.gz | tar xz
    $ cd cmark-*
    $ make -s -j$(nproc) cmake_build
    -- The C compiler identification is GNU 4.9.2
    -- The CXX compiler identification is GNU 4.9.2
    -- Check for working C compiler: /usr/bin/cc
    -- Check for working C compiler: /usr/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Performing Test HAVE_FLAG_ADDRESS_SANITIZER
    -- Performing Test HAVE_FLAG_ADDRESS_SANITIZER - Failed
    -- Performing Test HAVE_FLAG_SANITIZE_ADDRESS
    -- Performing Test HAVE_FLAG_SANITIZE_ADDRESS - Success
    -- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY
    -- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY - Success
    -- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY
    -- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY - Success
    -- Performing Test COMPILER_HAS_DEPRECATED_ATTR
    -- Performing Test COMPILER_HAS_DEPRECATED_ATTR - Success
    -- Looking for stdbool.h
    -- Looking for stdbool.h - found
    -- Performing Test HAVE___BUILTIN_EXPECT
    -- Performing Test HAVE___BUILTIN_EXPECT - Success
    -- Performing Test HAVE___ATTRIBUTE__
    -- Performing Test HAVE___ATTRIBUTE__ - Success
    -- Found PythonInterp: /usr/bin/python3 (found suitable version "3.4.2", minimum required is "3")
    -- Configuring done
    CMake Error at CMakeLists.txt:42 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:43 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:44 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:50 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:42 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:43 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:44 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:50 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:42 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:43 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:44 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:50 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:42 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:43 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:44 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:50 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:42 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:43 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:44 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    CMake Error at CMakeLists.txt:50 (add_compile_options):
      Error evaluating generator expression:
    
        $<COMPILE_LANGUAGE:C>
    
      Expression did not evaluate to a known generator expression
    
    
    -- Generating done
    -- Build files have been written to: /tmp/tmp.KfonX7KiZZ/cmark-0.30.0/build
    Makefile:37: recipe for target 'build' failed
    make: *** [build] Error 1
    
    opened by mlocati 19
  • Provide a CMARK_UNSAFE environment variable for backwards compatibility.

    Making safe mode the default is a noble idea. However, an incompatible change without a backwards compatibility option, after years of unchanging behaviour, is a serious headache.

    cmark 0.28 is still shipped in many supported OSes/distros. If you use cmark as part of a build process and your Markdown comes from a trusted/reviewed source, you have to choose between old and new, because the old one will complain about an invalid option if you run cmark --unsafe, and the new one will auto-escape HTML without it.

    Environment variables can be an effective backwards compatibility mechanism: for the new versions it will disable safe mode, and for old versions it will have no ill effects.

    This patch adds CMARK_UNSAFE environment variable in addition to the --unsafe option, with the same effect.

    $ ./build/src/cmark 
    <blink>hello world</blink> ^D
    
    <p><!-- raw HTML omitted -->hello world<!-- raw HTML omitted --></p>
    
    $ CMARK_UNSAFE=1 ./build/src/cmark 
    <blink>hello world</blink> ^D
    
    <p><blink>hello world</blink></p>
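
    The behaviour shown above could be sketched as follows. This is only an illustration of the idea, not the code in the patch: the DEMO_* flag values and function names are hypothetical, and treating any non-empty value as "enabled" is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flags for this sketch; libcmark defines its own CMARK_OPT_*
 * values. */
#define DEMO_OPT_DEFAULT 0
#define DEMO_OPT_UNSAFE (1 << 0)

/* Combine the command-line flag with the environment variable.
 * env_value is the result of getenv("CMARK_UNSAFE"), or NULL if unset.
 * Assumption: any non-empty value enables unsafe mode. */
int demo_resolve_options(int unsafe_flag_given, const char *env_value) {
  int options = DEMO_OPT_DEFAULT;
  if (unsafe_flag_given || (env_value != NULL && env_value[0] != '\0'))
    options |= DEMO_OPT_UNSAFE;
  return options;
}

/* Convenience wrapper that reads the process environment. */
int demo_options_from_environment(int unsafe_flag_given) {
  return demo_resolve_options(unsafe_flag_given, getenv("CMARK_UNSAFE"));
}
```

    Older versions that do not know the variable would simply ignore it, which is the compatibility property argued for above.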
    

    I'm open to discussion regarding the naming and implementation.

    opened by dmbaturin 17
  • buffer: proper safety checks for unbounded memory

    Hey @jgm! Long time no chat. :)

    I've been doing some security review on the library. I have some concerns about the way we're handling buffer overflows, so here's a proposed commit. Message as follows:


    The previous work for unbounded memory usage and overflows on the buffer API had several shortcomings:

    1. The total size of the buffer was limited by arbitrarily small precision on the storage type for buffer indexes (typedef'd as bufsize_t). This is not a good design pattern in secure applications, particularly since it requires the addition of helper functions to cast to/from the native size types and the custom type for the buffer, and check for overflows.
    2. The library was calling abort on overflow and memory allocation failures. This is not a good practice for production libraries, since it turns a potential RCE into a trivial, guaranteed DoS to the whole application that is linked against the library. It defeats the whole point of performing overflow or allocation checks when the checks will crash the library and the enclosing program anyway.
    3. The default size limits for buffers were essentially unbounded (capped to the precision of the storage type) and could lead to DoS attacks by simple memory exhaustion (particularly critical in 32-bit platforms). This is not a good practice for a library that handles arbitrary user input.

    Hence, this patchset provides slight (but in my opinion critical) improvements on this area, copying some of the patterns we've used in the past for high throughput, security sensitive Markdown parsers:

    1. The storage type for buffer sizes is now platform native (ssize_t). Ideally, this would be a size_t, but several parts of the code expect buffer indexes to be possibly negative. Either way, switching to a size type is a strict improvement, particularly in 64-bit platforms. All the helpers that assured that values cannot escape the size range have been removed, since they are superfluous.
    2. The overflow checks have been removed. Instead, the maximum size for a buffer has been set to a safe value for production usage (32mb) that can be proven not to overflow in practice. Users that need to parse particularly large Markdown documents can increase this value. A static, compile-time check has been added to ensure that the maximum buffer size cannot overflow on any growth operations.
    3. The library no longer aborts on buffer overflow. The CMark library now follows the convention of other Markdown implementations (such as Hoedown and Sundown) and silently handles buffer overflows and allocation failures by dropping data from the buffer. The result is that pathological Markdown documents that try to exploit the library will instead generate truncated (but valid, and safe) outputs.
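
    The second and third points can be illustrated with a small sketch of a capped buffer that drops data instead of aborting. This is not the code from the patch: the demo_buf type, its functions, and the use of size_t rather than ssize_t are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical capped buffer for this sketch. */
typedef struct {
  char *ptr;
  size_t size;  /* bytes in use */
  size_t asize; /* bytes allocated */
  size_t max;   /* hard cap; the buffer never grows beyond this */
} demo_buf;

void demo_buf_init(demo_buf *b, size_t max) {
  /* Keeping max at or below SIZE_MAX / 2 means doubling cannot overflow. */
  assert(b != NULL && max <= SIZE_MAX / 2);
  b->ptr = NULL;
  b->size = 0;
  b->asize = 0;
  b->max = max;
}

/* Append up to len bytes. Data that does not fit under the cap is silently
 * dropped. Returns the number of bytes actually stored. */
size_t demo_buf_put(demo_buf *b, const char *data, size_t len) {
  size_t room, need, new_asize;
  char *p;
  assert(b != NULL && b->size <= b->max);
  room = b->max - b->size;
  if (len > room)
    len = room; /* truncate instead of aborting */
  if (len == 0)
    return 0;
  need = b->size + len; /* need <= max, so this cannot overflow */
  if (need > b->asize) {
    new_asize = b->asize ? b->asize : 64;
    while (new_asize < need)
      new_asize *= 2;
    if (new_asize > b->max)
      new_asize = b->max;
    p = realloc(b->ptr, new_asize);
    if (p == NULL)
      return 0; /* allocation failure: drop the data, keep old contents */
    b->ptr = p;
    b->asize = new_asize;
  }
  memcpy(b->ptr + b->size, data, len);
  b->size += len;
  return len;
}

void demo_buf_free(demo_buf *b) {
  free(b->ptr);
  demo_buf_init(b, b->max);
}
```

    With a cap of 8 bytes, appending "hello" and then "world" stores "hellowor" and reports that only 3 bytes of the second write were kept.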

    All tests after these small refactorings have been verified to pass.


    NOTE: Regarding 32 bit overflows, generating test cases that crash the library is trivial (any input document larger than 2gb will crash CMark), but most Python implementations have issues with large strings to begin with, so a test case cannot be added to the pathological tests suite, since it's written in Python.

    opened by vmg 17
  • smart_punct.txt

    We have already discussed this for commonmark.js, but I spotted a test for smart punctuation here and have the same question: why is a Markdown transformer responsible for doing anything with typography?

    opened by iamstarkov 16
  • Remove "-rdynamic" flag for static builds

    I ran into problems trying to build the statically linked cmark executable using musl libc. They were caused by the "-rdynamic" flag being implicitly added to the build command. The resulting executable would have references to a musl libc shared object instead of being hermetic as expected. I'm not sure if this has any unintended consequences, but I at least have no problems building a dynamically linked executable with this patch applied on my Linux and macOS machines.

    opened by ericpruitt 15
  • Implement support for custom memory allocators

    Implement support for custom memory allocators

    Supersedes https://github.com/jgm/cmark/pull/127

    As discussed on the previous PR, here's a proposal on supporting custom memory allocators. As you can see, I've wired up the cmark_mem structure throughout the parser, I believe without increasing complexity or memory usage needlessly. I haven't implemented pooling allocators yet, but it should be trivial now that we're passing a "memory" structure everywhere we allocate memory.

    I'd love feedback on this design. I'm moderately happy with it. The external API hasn't been broken, and the "default APIs" will continue working as before, using a default allocator (system malloc + abort on OOM). Every single node is now also aware of its allocator, which means that we can choose an allocator when creating new nodes and check that you cannot insert a node from a specific allocator into a document tree created by another allocator.
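
    The design described above (an allocator table, a default allocator that aborts on OOM, and nodes that remember their allocator) might look roughly like this. It is a sketch only: demo_mem, demo_node and all function names are hypothetical and are not the cmark_mem structure or node type from the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator table for this sketch. */
typedef struct demo_mem {
  void *(*alloc_zeroed)(size_t nmemb, size_t size);
  void (*release)(void *ptr);
} demo_mem;

/* Default allocator: system calloc, abort on out-of-memory. */
void *demo_xcalloc(size_t nmemb, size_t size) {
  void *p = calloc(nmemb, size);
  if (p == NULL && nmemb != 0 && size != 0)
    abort();
  return p;
}
const demo_mem DEMO_DEFAULT_MEM = {demo_xcalloc, free};

/* A second allocator that counts live allocations, to show a custom one. */
size_t demo_live_allocs = 0;
void *demo_counting_calloc(size_t nmemb, size_t size) {
  demo_live_allocs++;
  return demo_xcalloc(nmemb, size);
}
void demo_counting_free(void *ptr) {
  if (ptr != NULL)
    demo_live_allocs--;
  free(ptr);
}
const demo_mem DEMO_COUNTING_MEM = {demo_counting_calloc, demo_counting_free};

/* Every node records the allocator that created it. */
typedef struct demo_node {
  const demo_mem *mem;
  struct demo_node *first_child, *last_child, *next;
} demo_node;

demo_node *demo_node_new(const demo_mem *mem) {
  demo_node *n = mem->alloc_zeroed(1, sizeof(*n));
  if (n != NULL)
    n->mem = mem;
  return n;
}

/* Refuse to link nodes that come from different allocators.
 * Returns 1 on success, 0 if the allocators differ. */
int demo_node_append_child(demo_node *parent, demo_node *child) {
  assert(parent != NULL && child != NULL);
  if (parent->mem != child->mem)
    return 0;
  if (parent->last_child != NULL)
    parent->last_child->next = child;
  else
    parent->first_child = child;
  parent->last_child = child;
  return 1;
}

/* Free a node and its children through the allocator that created them. */
void demo_node_free(demo_node *n) {
  demo_node *child, *next;
  if (n == NULL)
    return;
  for (child = n->first_child; child != NULL; child = next) {
    next = child->next;
    demo_node_free(child);
  }
  n->mem->release(n);
}
```

    Storing the allocator pointer in each node costs one pointer per node, which is the slight memory increase that the node structure changes are meant to offset.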

    To offset the slight memory increase in the node structure, I've put it on a light memory diet -- although we could save quite a few bytes if we did something smarter with the prev, next, etc. links in each structure.

    cc @jgm @nwellnhof

    opened by vmg 14
  • Parsing of "____a__!__!___"

    Sorry for the line noise, it was discovered when I was fuzzing my Markdown parser and I haven't been able to reduce it to a smaller case.

    The following Markdown:

    ____a__!__!___
    

    generates the following output:

    __<strong>a</strong>!<strong>!</strong>_
    

    However, none of the rules in https://spec.commonmark.org/0.30/#emphasis-and-strong-emphasis seems to prohibit the second _ and the last _ from being paired to form another emphasis, thus becoming:

    _<em><strong>a</strong>!<strong>!</strong></em>
    

    Interestingly, if I replace all _ with * I do get the desired output, and in this particular case the distinction shouldn't matter.

    commonmark.js has the same behavior: https://spec.commonmark.org/dingus/?text=____a__!__!___. But GitHub's Markdown implementation has the behavior I expect: a!!___

    opened by xiaq 2
  • use an in-tree header for symbol export information

    Relying on CMake's GenerateExportHeader produces a file that is longer than the one being added here, yet mostly defines macros that are not used here. On top of that, it only contains macros specific to a single compiler, while failing to consistently handle GNUC (which always supports symbol visibility, even for static libraries).

    Replace this with a more targeted header that is easy to read or include into external build systems, and which is also more robust than the one that only exists inside CMake.

    opened by eli-schwartz 8
  • Tracking backslash escapes?

    I think this is similar/related to #131 and #292, but one thing I noticed is that bare square brackets do not roundtrip:

    Input:

    [unescaped brackets],  \[escaped brackets\] and [a link](https://example.com)
    

    Output:

    \[unescaped brackets\],  \[escaped brackets\] and [a link](https://example.com)
    

    Is there a way to add an attribute that can track the position of escaped characters in a line?

    This is useful for me because I'm trying to parse and rewrite documents that have reference links in child documents, and I have to backtrack to identify and protect these links from being overwritten.

    (originally reported this in https://github.com/r-lib/commonmark/issues/20)
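
    One way to approach this outside the library is a pre-pass over the raw line that records where backslash escapes occur. The sketch below is not a cmark API: `find_escapes` and `is_escapable` are hypothetical names, and the scan ignores contexts such as code spans and code blocks, where a backslash is literal.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical helper, not part of cmark: returns 1 if c is an ASCII
       punctuation character, the only kind CommonMark allows to be
       backslash-escaped. */
    static int is_escapable(unsigned char c) {
      return (c >= 0x21 && c <= 0x2F) || (c >= 0x3A && c <= 0x40) ||
             (c >= 0x5B && c <= 0x60) || (c >= 0x7B && c <= 0x7E);
    }

    /* Records the byte offset of each backslash that escapes a punctuation
       character. Stores at most max offsets in out and returns the total
       number of escapes found. */
    size_t find_escapes(const char *line, size_t *out, size_t max) {
      size_t count = 0;
      assert(line != NULL);
      for (size_t i = 0; line[i] != '\0'; i++) {
        if (line[i] == '\\' && is_escapable((unsigned char)line[i + 1])) {
          if (count < max)
            out[count] = i;
          count++;
          i++; /* skip the escaped character so an escaped backslash is
                  not mistaken for the start of another escape */
        }
      }
      return count;
    }
    ```

    Comparing these offsets against the rendered output would let a caller tell brackets that were escaped in the input from brackets that the renderer escaped on its own.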

    opened by zkamvar 0
  • Add vcpkg installation instructions

    cmark is available as a port in vcpkg, a C++ library manager that simplifies installation for cmark and other project dependencies. Documenting the install process here will help users get started by providing a single set of commands to build cmark, ready to be included in their projects.

    We also test whether our library ports build in various configurations (dynamic, static) on various platforms (OSX, Linux, Windows: x86, x64) to keep a wide coverage for users.

    I'm a maintainer for vcpkg, and here is what the port script looks like. We try to keep the library maintained as close as possible to the original library. 😊

    opened by FrankXie05 0
  • incorrect start_column & end_column

    The text: - \na

    Creates the following hierarchy:

    • Document
      • List
        • Paragraph
          • Text

    This AST has the following {start_line, end_line, start_column, end_column}

    • Document: {1, 2, 1, 1}
      • List: {1, 2, 1, 1}
        • Paragraph: {1, 2, 3, 1}
          • Text: {2, 2, 3, 3}

    {2, 2, 3, 3} (the Text bounds) exceeds the bounds of the document and violates some assumptions, which causes my application to panic.

    I couldn't find any documentation which makes it clear whether this is a bug or not, but my intuition says that this is a bug.
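
    The assumption being violated can be written down as a small check. This is a sketch of the kind of validation a consuming application might do, not something cmark provides: `sourcepos_in_bounds` is a hypothetical name, lines are assumed to be split on `\n`, and columns are assumed to count bytes starting from 1.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical validation helper, not part of cmark: returns 1 if the
       1-based (line, column) pair refers to a byte that exists in text,
       and 0 otherwise. */
    int sourcepos_in_bounds(const char *text, int line, int column) {
      int cur_line = 1;
      const char *p = text;
      assert(text != NULL);
      if (line < 1 || column < 1)
        return 0;
      /* Advance to the start of the requested line. */
      while (cur_line < line) {
        while (*p != '\0' && *p != '\n')
          p++;
        if (*p == '\0')
          return 0; /* the document has fewer lines than requested */
        p++;        /* step over the newline */
        cur_line++;
      }
      /* Measure the length of that line, excluding the newline. */
      size_t len = 0;
      while (p[len] != '\0' && p[len] != '\n')
        len++;
      return (size_t)column <= len;
    }
    ```

    For the input above, line 2 contains the single byte `a`, so column 3 on line 2 fails this check while column 1 passes.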

    opened by Parth 1
Releases(0.30.2)
  • 0.30.2(Sep 24, 2021)

    • Fix parsing of emphasis before links (#424, Nick Wellnhofer). Fixes a regression introduced with commit ed0a4bf.

    • Update to Unicode 14.0 (data-man).

    • Add ~ to safe href character set (#394, frogtile).

    • Update CMakeLists.txt (Saleem Abdulrasool). Bump the minimum required CMake to 3.7. Imperatively define output name for static library.

    • Fix install paths in libcmark.pc (Sebastián Mancilla). CMAKE_INSTALL_<dir> can be relative or absolute path, so it is wrong to prefix CMAKE_INSTALL_PREFIX because if CMAKE_INSTALL_<dir> is set to an absolute path it will result in a malformed path with two absolute paths joined together. Instead, use CMAKE_INSTALL_FULL_<dir> from GNUInstallDirs.

  • 0.30.1(Jul 17, 2021)

    • Properly indent block-level contents of list items in man (#258). This handles nested lists as well as items with multiple paragraphs. The change requires addition of a new field block_number_in_list_item to cmark_renderer, but this does not change the public API.
    • Fix quadratic behavior when parsing emphasis (#389, Nick Wellnhofer). Delimiters can be deleted, so store delimiter positions instead of pointers in openers_bottom. Besides causing undefined behavior when reading a dangling pointer, this could also result in quadratic behavior when parsing emphasis.
    • Fix quadratic behavior when parsing smart quotes (#388, Nick Wellnhofer). Remove matching smart quote delimiters. Otherwise, the same opener could be found over and over, preventing the openers_bottom optimization from kicking in and leading to quadratic behavior when processing lots of quotes.
    • Modify CMake configuration so that the project can be built with older versions of CMake (#384, Saleem Abdulrasool). (In 0.30.0, some features were used that require CMake >= 3.3.) The cost of this backwards compatibility is that developers must now explicitly invoke cmark_add_compile_options when a new compilation target is added.
    • Remove a comma at the end of an enumerator list, which was flagged by clang as a C++11 extension.
    • make_man_page.py: use absolute path with CDLL. This avoids the error "file system relative paths not allowed in hardened programs."
    • Include cmark version in cmark(3) man page (instead of LOCAL).
  • 0.30.0(Jun 20, 2021)

    • Use official 0.30 spec.txt.
    • Add cmark_get_default_mem_allocator() (#330). API change: this adds a new exported function in cmark.h.
    • Fix #383. An optimization we used for emphasis parsing was too aggressive, causing us to miss some emphasis that was legal according to the spec. We fix this by indexing the openers_bottom table not just by the type of delimiter and the length of the closing delimiter mod 3, but by whether the closing delimiter can also be an opener. (The algorithm for determining emphasis matching depends on all these factors.) Add regression test.
    • Fix quadratic behavior with inline HTML (#299, Nick Wellnhofer). Repeated starting sequences like <?, <!DECL or <![CDATA[ could lead to quadratic behavior if no matching ending sequence was found. Separate the inline HTML scanners. Remember if scanning the whole input for a specific ending sequence failed and skip subsequent scans.
    • Speed up hierarchy check in tree manipulation API (Nick Wellnhofer). Skip hierarchy check in the common case that the inserted child has no children.
    • Fix quadratic behavior when parsing inlines (#373, Nick Wellnhofer). The inline parsing code would call cmark_node_append_child to append nodes. This public function has a sanity check which is linear in the depth of the tree. Repeated calls could show quadratic behavior in degenerate trees. Use a special function to append nodes without this check. (Issue found by OSS-Fuzz.)
    • Replace invalid characters in XML output (#365, Nick Wellnhofer). Control characters, U+FFFE and U+FFFF aren't allowed in XML 1.0, so replace them with U+FFFD (replacement character). This doesn't solve the problem of how to roundtrip these characters, but at least we don't produce invalid XML.
    • Avoid quadratic output growth with reference links (#354, Nick Wellnhofer). Keep track of the number of bytes added through expansion of reference links and limit the total to the size of the input document. Always allow a minimum of 100KB. Unfortunately, cmark has no error handling, so all we can do is to stop expanding reference links without returning an error. This should never be an issue in practice though. The 100KB minimum alone should cover all real-world cases.
    • Fix issue with type-7 HTML blocks interrupting paragraphs (see commonmark/commonmark.js#213).
    • Treat textarea like script, style, pre (type 1 HTML block), in accordance with spec change.
    • Define whitespace per spec (Asherah Connor).
    • Add MAX_INDENT for xml (#355). Otherwise we can get quadratic increase in size with deeply nested structures.
    • Fix handling of empty strings when creating XML/HTML output (Steffen Kieß).
    • Commonmark renderer: always use fences for code (#317). This solves problems with adjacent code blocks being merged.
    • Improve rendering of commonmark code spans with spaces (#316).
    • Cleaner approach to max digits for numeric entities. This modifies unescaping in houdini_html_u.c rather than the entity handling in inlines.c. Unlike the other approach, this one also works in, e.g., link titles.
    • Fix entity parser (and api test) to respect length limit on numeric entities.
    • Don't allow link destinations with unbalanced unescaped parentheses. See commonmark/commonmark.js#177.
    • print_usage(): Minor grammar fix, swap two words (#305, Øyvind A. Holm).
    • Don't call memcpy with NULL as first parameter. This is illegal according to the C standard, sec. 7.1.4. See https://www.imperialviolet.org/2016/06/26/nonnull.html.
    • Add needed include in blocks.c.
    • Fix unnecessary variable assignment.
    • Skip UTF-8 BOM if present at beginning of buffer (#334).
    • Fix URL check in is_autolink (Nick Wellnhofer). In a recent commit, the check was changed to strcmp, but we really have to use strncmp.
    • Fix null pointer deref in is_autolink (Nick Wellnhofer). Introduced by a recent commit. Found by OSS-Fuzz.
    • Rearrange struct cmark_node (Nick Wellnhofer). Introduce multi-purpose data/len members in struct cmark_node. This is mainly used to store literal text for inlines, code and HTML blocks. Move the content strbuf for blocks from cmark_node to cmark_parser. When finalizing nodes that allow inlines (paragraphs and headings), detach the strbuf and store the block content in the node's data/len members. Free the block content after processing inlines. Reduces size of struct cmark_node by 8 bytes.
    • Improve packing of struct cmark_list (Nick Wellnhofer).
    • Use C string instead of chunk in a number of contexts (Nick Wellnhofer, #309). The node struct never references memory of other nodes now. Node accessors don't have to check for delayed creation of C strings, so parsing and iterating all literals using the public API should actually be faster than before. These changes also reduce the size of struct cmark_node.
    • Add casts for MSVC10 (from kivikakk in cmark-gfm).
    • commonmark renderer: better escaping in smart mode. When CMARK_OPT_SMART is enabled, we escape literal -, ., and quote characters when needed to avoid their being "smartified."
    • Add options field to cmark_renderer.
    • commonmark.c - use size_t instead of int.
    • Include string.h in cmark-fuzz.c.
    • Fix #220 (hash collisions for references) (Vicent Marti via cmark-gfm). Reimplemented reference storage as follows:
      1. New references are always inserted at the end of a linked list. This is an O(1) operation, and does not check whether an existing (duplicate) reference with the same label already exists in the document.
      2. Upon the first call to cmark_reference_lookup (when it is expected that no further references will be added to the reference map), the linked list of references is written into a fixed-size array.
      3. The fixed size array can then be efficiently sorted in-place in O(n log n). This operation only happens once. We perform this sort in a stable manner to ensure that the earliest link reference in the document always has preference, as the spec dictates. To accomplish this, every reference is tagged with a generation number when initially inserted in the linked list.
      4. The sorted array is then compacted in O(n). Since it was sorted in a stable way, the first reference for each label is preserved and the duplicates are removed, matching the spec.
      5. We can now simply perform a binary search for the current cmark_reference_lookup query in O(log n). Any further lookup calls will also be O(log n), since the sorted references table only needs to be generated once. The resulting implementation is notably simple (as it uses standard library builtins qsort and bsearch), whilst performing better than the fixed size hash table in documents that have a high number of references and never becoming pathological regardless of the input.
    • Comment out unused function cmark_strbuf_cstr in buffer.h.
    • Re-add --safe command-line option as a no-op (#344), for backwards compatibility.
    • Update to Unicode 13.0
    • Generate and install cmake-config file (Reinhold Gschweicher). Add full cmake support. The project can either be used with add_subdirectory or be installed into the system (or some other directory) and be found with find_package(cmark). In both cases the cmake target cmark::cmark and/or cmark::cmark_static is all that is needed to be linked. Previously the cmarkConfig.cmake file was generated, but not installed. As an additional bonus of generation by cmake we get a generated cmake-config-version.cmake file for find_package() to search for the same major version. The generated config file is position independent, allowing the installed directory to be copied or moved and still work. The following four files are generated and installed: lib/cmake/cmark/cmark-config.cmake, lib/cmake/cmark/cmark-config-version.cmake, lib/cmake/cmark/cmark-targets.cmake, lib/cmake/cmark/cmark-targets-release.cmake.
    • Adjust the MinGW paths for MinGW64 (Daniil Baturin).
    • Fix CMake generator expression checking for MSVC (Nick Wellnhofer).
    • Fix -Wconst-qual warning (Saleem Abdulrasool). This enables building with /Zc:strictString with MSVC as well.
    • Improve and modernize cmake build (Saleem Abdulrasool).
      • Build: add exports targets for build tree usage (#307).
      • Use target properties for include paths.
      • Remove the unnecessary execute permission on CMakeLists.txt.
      • Reduce property computation in CMake.
      • Use CMAKE_INCLUDE_CURRENT_DIRECTORY.
      • Improve man page installation.
      • Only include GNUInstallDirs once.
      • Replace add_compile_definitions with add_compile_options since the former was introduced in 3.12 (#321).
      • Cleanup CMake (#319).
      • Inline a variable.
      • Use LINKER_LANGUAGE property for C++ runtime.
      • Use CMake to control C standard.
      • Use the correct variable.
      • Loosen the compiler check.
      • Hoist shared flags to top-level CMakeLists.
      • Remove duplicated flags.
      • Use add_compile_options rather than modify CMAKE_C_FLAGS.
      • Hoist sanitizer flags to global state.
      • Hoist -fvisibility flags to top-level.
      • Hoist the debug flag handling.
      • Hoist the profile flag handling.
      • Remove incorrect variable handling.
      • Remove unused CMake includes.
    • Remove "-rdynamic" flag for static builds (#300, Eric Pruitt).
    • Fixed installation on GNU/Linux distributions other than Ubuntu (Vitaly Zaitsev).
    • Link executable with static or shared library (Nick Wellnhofer). If CMARK_STATIC is on (default), link the executable with the static library. This produces exactly the same result as compiling the library sources again and linking with the object files. If CMARK_STATIC is off, link the executable with the shared library. This wasn't supported before and should be the preferred way to package cmark on Linux distros. Building only a shared library and a statically linked executable isn't supported anymore but this doesn't seem useful.
    • Reintroduce version check for MSVC /TP flag (Nick Wellnhofer). The flag is only required for old MSVC versions.
    • normalize.py: use html.escape instead of cgi.escape (#313).
    • Fix pathological_tests.py on Windows (Nick Wellnhofer). When using multiprocessing on Windows, the main program must be guarded with a __name__ check.
    • Remove useless __name__ check in test scripts (Nick Wellnhofer).
    • Add CIFuzz (Leo Neat).
    • cmark.1 - Document --unsafe instead of --safe (#332).
    • cmark.1: remove docs for --normalize which no longer exists (#332).
    • Add lint target to Makefile.
    • Add uninstall target to Makefile.
    • Update benchmarks (#338).
    • Fix typo in documentation (Tim Gates).
    • Increase timeout for pathological tests to avoid CI failure.
    • Update the Racket wrapper with the safe -> unsafe flag change (Eli Barzilay).
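
    The reference storage scheme described under "Fix #220" above (append, sort once, compact, then binary search) can be sketched as follows. This is a simplified illustration rather than cmark's actual code: the `ref` struct, the `age` field standing in for the generation number, and the function names are all hypothetical, and the sketch starts from an array instead of converting a linked list.

    ```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical, simplified reference entry; cmark's real struct differs. */
    typedef struct ref {
      const char *label; /* normalized label */
      const char *url;
      unsigned age;      /* insertion order; lower means earlier in the document */
    } ref;

    /* Order by label, then by insertion order. qsort is not guaranteed to be
       stable, so the age tiebreaker is what keeps the earliest definition
       of a label in front of its duplicates. */
    static int ref_cmp(const void *a, const void *b) {
      const ref *x = (const ref *)a;
      const ref *y = (const ref *)b;
      int c = strcmp(x->label, y->label);
      if (c != 0)
        return c;
      return (x->age > y->age) - (x->age < y->age);
    }

    /* bsearch comparator: the key is a label string, the element is a ref. */
    static int label_cmp(const void *key, const void *elem) {
      return strcmp((const char *)key, ((const ref *)elem)->label);
    }

    /* Sorts refs in place in O(n log n), drops later duplicates in O(n),
       and returns the number of entries kept. */
    size_t ref_map_finalize(ref *refs, size_t n) {
      size_t kept = 0;
      assert(refs != NULL || n == 0);
      if (n == 0)
        return 0;
      qsort(refs, n, sizeof(ref), ref_cmp);
      for (size_t i = 0; i < n; i++) {
        if (kept == 0 || strcmp(refs[kept - 1].label, refs[i].label) != 0)
          refs[kept++] = refs[i];
      }
      return kept;
    }

    /* Looks up a label in a finalized array in O(log n); NULL if absent. */
    const ref *ref_map_lookup(const ref *refs, size_t n, const char *label) {
      if (n == 0)
        return NULL;
      return (const ref *)bsearch(label, refs, n, sizeof(ref), label_cmp);
    }
    ```

    With definitions for `foo`, `bar`, and a second `foo` inserted in that order, finalizing keeps two entries and a lookup of `foo` returns the first definition, which is the preference the spec requires.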
  • 0.29.0(Apr 8, 2019)

    • Update spec to 0.29.
    • Make rendering safe by default (#239, #273). Adds CMARK_OPT_UNSAFE and makes CMARK_OPT_SAFE a no-op (for API compatibility). The new default behavior is to suppress raw HTML and potentially dangerous links. The CMARK_OPT_UNSAFE option has to be set explicitly to prevent this. NOTE: This change will require modifications in bindings for cmark and in most libraries and programs that use cmark. Borrows heavily from @kivikakk's patch in github/cmark-gfm#123.
    • Add sourcepos info for inlines (Yuki Izumi).
    • Disallow more than 32 nested balanced parens in a link (Yuki Izumi).
    • Resolve link references before creating setext header. A setext header line after a link reference should not create a header, according to the spec.
    • commonmark renderer: improve escaping. URL-escape special characters when escape mode is URL, and not otherwise. Entity-escape control characters (< 0x20) in non-literal escape modes.
    • render: only emit actual newline when escape mode is LITERAL. For markdown content, e.g., in other contexts we want some kind of escaping, not a literal newline.
    • Update code span normalization to conform with spec change.
    • Allow empty <> link destination in reference link.
    • Remove leftover includes of memory.h (#290).
    • A link destination can't start with < unless it is an angle-bracket link that also ends with > (#289). (If your URL really starts with <, URL-escape it.)
    • Allow internal delimiter runs to match if both have lengths that are multiples of 3. See commonmark/commonmark#528.
    • Include references.h in parser.h (#287).
    • Fix [link](<foo\>).
    • Use hand-rolled scanner for thematic break (see #284). Keep track of the last position where a thematic break failed to match on a line, to avoid rescanning unnecessarily.
    • Rename ends_with_blank_line with S_ prefix.
    • Add CMARK_NODE__LAST_LINE_CHECKED flag (#284). Use this to avoid unnecessary recursion in ends_with_blank_line.
    • In ends_with_blank_line, call S_set_last_line_blank to avoid unnecessary repetition (#284). Once we settle whether a list item ends in a blank line, we don't need to revisit this in considering parent list items.
    • Disallow unescaped ( in parenthesized link title.
    • Copy line/col info straight from opener/closer (Ashe Connor). We can't rely on anything in subj since it's been modified while parsing the subject and could represent line info from a future line. This is simple and works.
    • render.c: reset last_breakable after cr. Fixes jgm/pandoc#5033.
    • Fix a typo in houdini_href_e.c (Felix Yan).
    • commonmark writer: use ~~~ fences if info string contains backtick. This is needed for round-trip tests.
    • Update scanners for new info string rules.
    • Add XSLT stylesheet to convert cmark XML back to Commonmark (Nick Wellnhofer, #264). Initial version of an XSLT stylesheet that converts the XML format produced by cmark -t xml back to Commonmark.
    • Check for whitespace before reference title (#263).
    • Bump CMake to version 3 (Jonathan Müller).
    • Build: Remove deprecated call to add_compiler_export_flags() (Jonathan Müller). It is deprecated in CMake 3.0, the replacement is to set the CXX_VISIBILITY_PRESET (or in our case C_VISIBILITY_PRESET) and VISIBILITY_INLINES_HIDDEN properties of the target. We're already setting them by setting the CMake variables anyway, so the call can be removed.
    • Build: only attempt to install MSVC system libraries on Windows (Saleem Abdulrasool). Newer versions of CMake attempt to query the system for information about the VS 2017 installation. Unfortunately, this query fails on non-Windows systems when cross-compiling: cmake_host_system_information does not recognize <key> VS_15_DIR. CMake will not find these system libraries on non-Windows hosts anyways, and we were silencing the warnings, so simply omit the installation when cross-compiling to Windows.
    • Simplify code normalization, in line with spec change.
    • Implement code span spec changes. These affect both parsing and writing commonmark.
    • Add link parsing corner cases to regressions (Ashe Connor).
    • Add xml:space="preserve" in XML output when appropriate (Nguyễn Thái Ngọc Duy). (For text, code, code_block, html_inline and html_block tags.)
    • Removed meta from list of block tags. Added regression test. See commonmark/CommonMark#527.
    • entity_tests.py - omit noisy success output.
    • pathological_tests.py: make tests run faster. Commented out the (already ignored) "many references" test, which times out. Reduced the iterations for a couple other tests.
    • pathological_tests.py: added test for deeply nested lists.
    • Optimize S_find_first_nonspace. We were needlessly redoing things we'd already done. Now we skip the work if the first nonspace is greater than the current offset. This fixes pathological slowdown with deeply nested lists (#255). For N = 3000, the time goes from over 17s to about 0.7s. Thanks to Martin Mitas for diagnosing the problem.
    • Allow spaces in link destination delimited with pointy brackets.
    • Adjust max length of decimal/numeric entities. See commonmark/CommonMark#487.
    • Fix inline raw HTML parsing. This fixes a recently added failing spec test case. Previously spaces were being allowed in unquoted attribute values; now we forbid them.
    • Don't allow list markers to be indented >= 4 spaces. See commonmark/CommonMark#497.
    • Check for empty buffer when rendering (Phil Turnbull). For empty documents, ->size is zero so renderer.buffer->ptr[renderer.buffer->size - 1] will cause an out-of-bounds read. Empty buffers always point to the global cmark_strbuf__initbuf buffer so we read cmark_strbuf__initbuf[-1].
    • Also run API tests with CMARK_SHARED=OFF (Nick Wellnhofer).
    • Rename roundtrip and entity tests (Nick Wellnhofer). Rename the tests to reflect that they use the library, not the executable.
    • Generate export header for static-only build (#247, Nick Wellnhofer).
    • Fuzz width parameter too (Phil Turnbull). Allow the width parameter to be generated too so we get better fuzz-coverage.
    • Don't discard empty fuzz test-cases (Phil Turnbull). We currently discard fuzz test-cases that are empty but empty inputs are valid markdown. This improves the fuzzing coverage slightly.
    • Fixed exit code for pathological tests.
    • Add allowed failures to pathological_tests.py. This allows us to include tests that we don't yet know how to pass.
    • Add timeout to pathological_tests.py. Tests must complete in 8 seconds or are errors.
    • Add more pathological tests (Martin Mitas). These tests target the issues #214, #218, #220.
    • Use pledge(2) on OpenBSD (Ashe Connor).
    • Update the Racket wrapper (Eli Barzilay).
    • Makefile: For afl target, don't build tests.
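
    The delimiter-run rule mentioned above ("Allow internal delimiter runs to match if both have lengths that are multiples of 3") can be expressed as a small predicate. This is a sketch of the rule as worded in the spec, not cmark's implementation; the function and parameter names are illustrative, and the lengths are assumed to be those of the original delimiter runs.

    ```c
    #include <assert.h>

    /* Returns 1 if an opening and a closing delimiter run may be paired
       under the "multiple of 3" restriction, 0 if the restriction forbids
       it. opener_can_close and closer_can_open say whether each run can
       act in both roles. */
    int delims_may_match(int opener_len, int opener_can_close,
                         int closer_len, int closer_can_open) {
      assert(opener_len > 0 && closer_len > 0);
      /* The restriction only applies when one of the two runs can both
         open and close emphasis. */
      if (opener_can_close || closer_can_open) {
        int sum_is_multiple = (opener_len + closer_len) % 3 == 0;
        int both_multiples = opener_len % 3 == 0 && closer_len % 3 == 0;
        if (sum_is_multiple && !both_multiples)
          return 0;
      }
      return 1;
    }
    ```

    A run of length 1 that can both open and close cannot pair with a closer of length 2, since the lengths sum to 3, whereas two such runs of length 3 may pair because both lengths are multiples of 3.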
  • 0.28.3(Oct 21, 2017)

  • 0.28.2(Oct 12, 2017)

  • 0.28.1(Oct 11, 2017)

    • --smart: open quote can never occur right after ] or ) (#227).
    • Fix quadratic behavior in finalize (Vicent Marti).
    • Don't use CMAKE_INSTALL_LIBDIR to create libcmark.pc (#236). This wasn't getting set in processing libcmark.pc.in, and we were getting the wrong entry in libcmark.pc. The new approach sets an internal libdir variable to lib${LIB_SUFFIX}. This variable is used both to set the install destination and in the libcmark.pc.in template.
    • Update README.md, replace make astyle with make format (Nguyễn Thái Ngọc Duy).
  • 0.28.0(Aug 2, 2017)

    • Update spec.

    • Use unsigned integer when shifting (Phil Turnbull). Avoids a UBSAN warning which can be triggered when handling a long sequence of backticks.

    • Avoid memcpy'ing NULL pointers (Phil Turnbull). Avoids a UBSAN warning when link title is empty string. The length of the memcpy is zero so the NULL pointer is not dereferenced but it is still undefined behaviour.

    • DeMorgan simplification of some tests in emphasis parser. This also brings the code into closer alignment with the wording of the spec (see jgm/CommonMark#467).

    • Fixed undefined shift in commonmark writer (#211). Found by google/oss-fuzz: https://oss-fuzz.com/v2/testcase-detail/4686992824598528.

    • latex writer: fix memory overflow (#210). We got an array overflow in enumerated lists nested more than 10 deep with start number =/= 1. This commit also ensures that we don't try to set enum_ counters that aren't defined by LaTeX (generally up to enumv). Found by google/oss-fuzz: https://oss-fuzz.com/v2/testcase-detail/5546760854306816.

    • Check for NULL pointer in get_link_type (Phil Turnbull). echo '[](xx:)' | ./build/src/cmark -t latex gave a segfault.

    • Move fuzzing dictionary into single file (Phil Turnbull). This allows AFL and libFuzzer to use the same dictionary.

    • Reset bytes after UTF8 proc (Yuki Izumi, #206).

    • Don't scan past an EOL (Yuki Izumi). The existing negated character classes ([^…]) are careful to always include \x00 in the characters excluded, but these . catch-alls can scan right past the terminating NUL placed at the end of the buffer by _scan_at. As such, buffer overruns can occur. Also, don't scan past a newline in HTML block end scanners.

    • Document cases where get_ functions return NULL (#155). E.g. cmark_node_get_url on a non-link or image.

    • Properly handle backslashes in link destinations (#192). Only ascii punctuation characters are escapable, per the spec.

    • Fixed cmark_node_get_list_start to return 0 for bullet lists, as documented (#202).

    • Use CMARK_NO_DELIM for bullet lists (#201).

    • Fixed code for freeing delimiter stack (#189).

    • Removed abort outside of conditional (typo).

    • Removed coercion in error message when aborting from buffer.

    • Print message to stderr when we abort due to memory demands (#188).

    • libcmark.pc: use CMAKE_INSTALL_LIBDIR (#185, Jens Petersen). Needed for multilib distros like Fedora.

    • Fixed buffer overflow error in S_parser_feed (#184). The overflow could occur in the following condition: the buffer ends with \r and the next memory address contains \n.

    • Update emphasis parsing for spec change. Strong now goes inside Emph rather than the reverse, when both scopes are possible. The code is much simpler. This also avoids a spec inconsistency that cmark had previously: ***hi*** became Strong (Emph "hi") but ***hi**** became Emph (Strong "hi") "*".

    • Fixes for the LaTeX renderer (#182, Doeme)

      • Don't double-output the link in latex-rendering.
      • Prevent ligatures in dashes sensibly when rendering latex. \- is a hyphenation, so it doesn't get displayed at all.
    • Added a test for NULL when freeing subj->last_delim.

    • Cleaned up setting of lower bounds for openers. We now use a much smaller array.

    • Fix #178, quadratic parsing bug. Add pathological test.

    • Slight improvement of clarity of logic in emph matching.

    • Fix "multiple of 3" determination in emph/strong parsing. We need to store the length of the original delimiter run, instead of using the length of the remaining delimiters after some have been subtracted. Test case: a***b* c*. Thanks to Raph Levin for reporting.

    • Correctly initialize chunk in S_process_line (Nick Wellnhofer, #170). The alloc member wasn't initialized. This also allows adding an assertion in chunk_rtrim which doesn't work for alloced chunks.

    • Added 'make newbench'.

    • scanners.c generated with re2c 0.16 (68K smaller!).

    • scanners.re - fixed warnings; use * for fallback.

    • Fixed some warnings in scanners.re.

    • Update CaseFolding to latest (Kevin Wojniak, #168).

    • Allow balanced nested parens in link destinations (Yuki Izumi, #166)

    • Allocate enough bytes for backticks array.

    • Inlines: Ensure that the delimiter stack is freed in subject.

    • Fixed pathological cases with backtick code spans:

      • Removed recursion in scan_to_closing_backticks

      • Added an array of pointers to potential backtick closers to subject

      • This array is used to avoid traversing the subject again when we've already seen all the potential backtick closers.

      • Added a max bound of 1000 for backtick code span delimiters.

      • This helps with pathological cases like:

          x
          x `
          x ``
          x ```
          x ````
          ...
        
      • Added pathological test case.

      Thanks to Martin Mitáš for identifying the problem and for discussion of solutions.

    • Remove redundant cmake_minimum_required (#163, @kainjow).

    • Make shared and static libraries optional (Azamat H. Hackimov). Now you can enable/disable compilation and installation targets for shared and static libraries via -DCMARK_SHARED=ON/OFF and -DCMARK_STATIC=ON/OFF.

    • Added support for built-in ${LIB_SUFFIX} feature (Azamat H. Hackimov). Replaced ${LIB_INSTALL_DIR} option with built-in ${LIB_SUFFIX} for installing for 32/64-bit systems. Normally, CMake will set ${LIB_SUFFIX} automatically for the required environment. If you have any issues with it, you can override this option with -DLIB_SUFFIX=64 or -DLIB_SUFFIX="" during configuration.

    • Add Makefile target and harness to fuzz with libFuzzer (Phil Turnbull). This can be run locally with make libFuzzer but the harness will be integrated into oss-fuzz for large-scale fuzzing.

    • Advertise --validate-utf8 in usage information (Nguyễn Thái Ngọc Duy).

    • Makefile: use warnings with re2c.

    • README: Add link to Python wrapper, prettify languages list (Pavlo Kapyshin).

    • README: Add link to cmark-scala (Tim Nieradzik, #196)
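
    The S_parser_feed overflow noted above (#184), where the buffer ends with \r and the next memory address contains \n, comes down to peeking one byte past the end of the input. The following is a minimal sketch of a bounds-checked line-ending scan, not the actual fix; `line_ending_len` is a hypothetical helper.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical helper: returns the length of the line ending that
       starts at buf[pos]: 0 if there is none, 1 for "\n" or a lone "\r",
       and 2 for "\r\n". Only the first len bytes of buf are valid. */
    size_t line_ending_len(const char *buf, size_t len, size_t pos) {
      assert(buf != NULL);
      if (pos >= len)
        return 0;
      if (buf[pos] == '\n')
        return 1;
      if (buf[pos] == '\r') {
        /* Look at buf[pos + 1] only if it lies inside the buffer. Without
           the bounds check, a "\n" that happens to follow the buffer in
           memory would be consumed as part of the input. */
        if (pos + 1 < len && buf[pos + 1] == '\n')
          return 2;
        return 1;
      }
      return 0;
    }
    ```

    Given the bytes `a`, `\r`, `\n` in memory but a declared length of 2, the function reports a one-byte line ending at offset 1 and never reads the third byte.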

  • 0.27.1(Nov 19, 2016)

    • Set policy for CMP0063 to avoid a warning (#162). Put set_policy under cmake version test. Otherwise we get errors in older versions of cmake.
    • Use VERSION_GREATER to clean up cmake version test.
    • Improve afl target. Use afl-clang by default. Set default for path.
  • 0.27.0(Nov 18, 2016)

    • Update spec to 0.27.
    • Fix warnings building with MSVC on Windows (#165, Hugh Bellamy).
    • Fix CMAKE_C_VISIBILITY_PRESET for cmake versions greater than 1.8 (e.g. 3.6.2) (#162, Hugh Bellamy). This lets us build swift-cmark on Windows, using clang-cl.
    • Fix for non-matching entities (#161, Yuki Izumi).
    • Modified print_delimiters (commented out) so it compiles again.
    • make format: don't change order of includes.
    • Changed logic for null/eol checks (#160).
      • only check once for "not at end of line"
      • check for null before we check for newline characters (the previous patch would fail for NULL + CR)
    • Fix by not advancing past both \0 and \n (Yuki Izumi).
    • Add test for NUL-LF sequence (Yuki Izumi).
    • Fix memory leak in list parsing (Yuki Izumi).
    • Use cmark_mem to free where used to alloc (Yuki Izumi).
    • Allow a shortcut link before a ( (jgm/CommonMark#427).
    • Allow tabs after setext header line (jgm/commonmark.js#109).
    • Don't let URI schemes start with spaces.
    • Fixed h2..h6 HTML blocks (jgm/CommonMark#430). Added regression test.
    • Autolink scheme can contain digits (Gábor Csárdi).
    • Fix nullary function declarations in cmark.h (Nick Wellnhofer). Fixes strict prototypes warnings.
    • COPYING: Update file name and remove duplicate section (Peter Eisentraut).
    • Fix typo (Pavlo Kapyshin).
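
    Two of the entries above concern autolink schemes: they can contain digits, and they cannot start with spaces. As a rough illustration, a scanner following the spec's definition of a scheme (an ASCII letter followed by ASCII letters, digits, `+`, `.`, or `-`, for 2 to 32 characters in total) might look like this. The function names are hypothetical; cmark's own scanners are generated with re2c and are structured differently.

    ```c
    #include <assert.h>
    #include <stddef.h>

    static int is_ascii_alpha(unsigned char c) {
      return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static int is_ascii_digit(unsigned char c) {
      return c >= '0' && c <= '9';
    }

    /* Returns the length of a valid scheme at the start of s if it is
       immediately followed by ':', or 0 otherwise. */
    size_t scan_scheme(const char *s) {
      size_t i;
      assert(s != NULL);
      /* The first character must be an ASCII letter, which also rejects
         a leading space. */
      if (!is_ascii_alpha((unsigned char)s[0]))
        return 0;
      for (i = 1; i < 32; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!(is_ascii_alpha(c) || is_ascii_digit(c) || c == '+' ||
              c == '.' || c == '-'))
          break;
      }
      /* i is now the candidate scheme length, between 1 and 32. */
      if (i < 2 || s[i] != ':')
        return 0;
      return i;
    }
    ```

    Under this sketch `h2:foo` yields a scheme of length 2, while a string beginning with a space or a digit yields 0.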
  • 0.26.1(Jul 16, 2016)

    • Removed unnecessary typedef that caused build failure on some platforms.
    • Use $(MAKE) in Makefile instead of hardcoded make (#146, Tobias Kortkamp).
  • 0.26.0(Jul 15, 2016)

    • Implement spec changes for list items:
      • Empty list items cannot interrupt paragraphs.
      • Ordered lists cannot interrupt paragraphs unless they start with 1.
      • Removed "two blank lines break out of a list" feature.
    • Fix sourcepos for blockquotes (#142).
    • Fix sourcepos for atx headers (#141).
    • Fix ATX headers and thematic breaks to allow tabs as well as spaces.
    • Fix chunk_set_cstr with suffix of current string (#139, Nick Wellnhofer). It's possible that cmark_chunk_set_cstr is called with a substring (suffix) of the current string. Delay freeing of the chunk content to handle this case correctly.
    • Export targets on installation (Jonathan Müller). This allows using them in other cmake projects.
    • Fix cmake warning about CMP0048 (Jonathan Müller).
    • commonmark renderer: Ensure we don't have a blank line before a code block when it's the first thing in a list item.
    • Change parsing of strong/emph in response to spec changes. process_emphasis now gets better results in corner cases. The change is this: when considering matches between an interior delimiter run (one that can open and can close) and another delimiter run, we require that the sum of the lengths of the two delimiter runs mod 3 is not 0.
    • Ported Robin Stocker's changes to link parsing in jgm/commonmark#101. This uses a separate stack for brackets, instead of putting them on the delimiter stack. This avoids the need for looking through the delimiter stack for the next bracket.
    • cmark_reference_lookup: Return NULL if reference is null string.
    • Fix character type detection in commonmark.c (Nick Wellnhofer). Fixes test failures on Windows and undefined behavior.
      • Implement cmark_isalpha.
      • Check for ASCII character before implicit cast to char.
      • Use internal ctype functions in commonmark.c.
    • Better documentation of memory-freeing responsibilities in cmark.h and its man page (#124).
    • Use library functions to insert nodes in emphasis/link processing. Previously we did this manually, which introduces many places where errors can creep in.
    • Correctly handle list marker followed only by spaces. Previously when a list marker was followed only by spaces, cmark expected the following content to be indented by the same number of spaces. But in this case we should treat the line just like a blank line and set list padding accordingly.
    • Fixed a number of issues relating to line wrapping.
      • Extend CMARK_OPT_NOBREAKS to all renderers and add --nobreaks.
      • Do not autowrap, regardless of width parameter, if CMARK_OPT_NOBREAKS is set.
      • Fixed CMARK_OPT_HARDBREAKS for LaTeX and man renderers.
      • Ensure that no auto-wrapping occurs if CMARK_OPT_NOBREAKS is enabled, or if output is CommonMark and CMARK_OPT_HARDBREAKS is enabled.
    • Set stdin to binary mode on Windows (Nick Wellnhofer, #113). This fixes EOLs when reading from stdin.
    • Add library option to render softbreaks as spaces (Pavlo Kapyshin). Note that the NOBREAKS option is HTML-only.
    • renderer: no_linebreaks instead of no_wrap. We generally want this option to prohibit any breaking in things like headers (not just wraps, but softbreaks).
    • Coerce realurllen to int. This is an alternate solution for pull request #132, which introduced a new warning on the comparison (Benedict Cohen).
    • Remove unused variable link_text (Mathieu Duponchelle).
    • Improved safety checks in buffer (Vicent Marti).
    • Add new interface allowing specification of custom memory allocator for nodes (Vicent Marti). Added cmark_node_new_with_mem, cmark_parser_new_with_mem, cmark_mem to API.
    • Reduce storage size for nodes by using bit flags instead of separate booleans (Vicent Marti).
    • config: Add SSIZE_T compat for Win32 (Vicent Marti).
    • cmake: Global handler for OOM situations (Vicent Marti).
    • Add tests for memory exhaustion (Vicent Marti).
    • Document in man page and public header that one should use the same memory allocator for every node in a tree.
    • Fix ctypes in Python FFI calls (Nick Wellnhofer). This didn't cause problems so far because all types are 32-bit on 32-bit systems and arguments are passed in registers on x86-64. The wrong types could cause crashes on other platforms, though.
    • Remove spurious failures in roundtrip tests. In the commonmark writer we separate lists, and lists and indented code, using a dummy HTML comment. So in evaluating the round-trip tests, we now strip out these comments. We also normalize HTML to avoid issues having to do with line breaks.
    • Add 2016 to copyright (Kevin Burke).
    • Added to_commonmark in test/cmark.py (for round-trip tests).
    • spec_tests.py: parameterize do_test with converter.
    • spec_tests.py: exit code is now sum of failures and errors. This ensures that a failing exit code will be given when there are errors, not just with failures.
    • Fixed round trip tests. Previously they actually ran cmark instead of the round-trip version, since there was a bug in setting the ROUNDTRIP variable (#131).
    • Added new roundtrip_tests.py. This replaces the old use of simple shell scripts. It is much faster, and more flexible. (We will be able to do custom normalization and skip certain tests.)
    • Fix tests under MinGW (Nick Wellnhofer).
    • Fix leak in api_test (Mathieu Duponchelle).
    • Makefile: have leakcheck stop on first error instead of going through all the formats and options and probably getting the same output.
    • Add regression tests (Nick Wellnhofer).
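
    The strong/emph change in this release (the "sum of the lengths mod 3" rule for interior delimiter runs) can be illustrated with a small C sketch. The delim_run struct and runs_may_match function are hypothetical names invented for this example; they are not cmark's internal types, and the sketch encodes only the rule as worded in the entry above, not the rest of process_emphasis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a delimiter run (not cmark's struct). */
typedef struct {
  char delim_char; /* '*' or '_' */
  int length;      /* number of delimiter characters in the run */
  bool can_open;
  bool can_close;
} delim_run;

/* Returns true if `opener` and `closer` may be paired. If either run is
 * "interior" (it can both open and close), the pairing is rejected when
 * the sum of the two lengths is a multiple of 3. */
bool runs_may_match(const delim_run *opener, const delim_run *closer) {
  assert(opener != NULL && closer != NULL);
  assert(opener->length > 0 && closer->length > 0);

  if (opener->delim_char != closer->delim_char)
    return false;
  if (!opener->can_open || !closer->can_close)
    return false;

  bool opener_interior = opener->can_open && opener->can_close;
  bool closer_interior = closer->can_open && closer->can_close;
  if ((opener_interior || closer_interior) &&
      (opener->length + closer->length) % 3 == 0)
    return false;

  return true;
}
```

    Under this sketch, an opener of length 1 cannot pair with a both-ways run of length 2 (1 + 2 = 3), but it can pair with a closer-only run of length 2, because neither run is interior.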
  • 0.25.2(Mar 26, 2016)

    • Open files in binary mode (#113, Nick Wellnhofer). Now that cmark supports different line endings, files must be opened in binary mode on Windows.
    • Reset partially_consumed_tab on every new line (#114, Nick Wellnhofer).
    • Handle buffer split across a CRLF line ending (#117). Adds an internal field to the parser struct to keep track of last_buffer_ended_with_cr. Added test.
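
    The CRLF fix above depends on remembering, between calls, whether the previous buffer ended with a carriage return. A minimal sketch of that idea follows. The line_counter type and line_counter_feed function are hypothetical and only count line endings; the flag name is borrowed from the entry above, but this is not cmark's parser code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical chunked line-ending counter. */
typedef struct {
  bool last_buffer_ended_with_cr; /* previous chunk ended in '\r' */
  size_t line_endings;            /* CR, LF, and CRLF each count once */
} line_counter;

void line_counter_feed(line_counter *lc, const char *buf, size_t len) {
  size_t i = 0;
  assert(lc != NULL && (buf != NULL || len == 0));

  /* A leading '\n' completes a CRLF begun in the previous chunk:
   * that line ending was already counted, so skip the byte. */
  if (lc->last_buffer_ended_with_cr && len > 0 && buf[0] == '\n')
    i = 1;
  if (len > 0)
    lc->last_buffer_ended_with_cr = false;

  while (i < len) {
    if (buf[i] == '\r') {
      lc->line_endings++;
      if (i + 1 < len && buf[i + 1] == '\n')
        i++; /* CRLF inside one chunk */
      else if (i + 1 == len)
        lc->last_buffer_ended_with_cr = true; /* may be split CRLF */
    } else if (buf[i] == '\n') {
      lc->line_endings++;
    }
    i++;
  }
}
```

    Feeding "a\r" and then "\nb\n" yields two line endings, the same as feeding "a\r\nb\n" in one call.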
  • 0.25.1(Mar 25, 2016)

    • Release with no code changes. cmark version was mistakenly set to 0.25.1 in the 0.25.0 release (#112), so this release just ensures that this will cause no confusion later.
  • 0.25.0(Mar 25, 2016)

    • Fixed tabs in indentation (#101). This patch fixes S_advance_offset so that it doesn't gobble a tab character when advancing less than the width of a tab.

    • Added partially_consumed_tab to parser. This keeps track of when we have gotten partway through a tab when consuming initial indentation.

    • Simplified add_line (only need parser parameter).

    • Properly handle partially consumed tab. E.g. in

      - foo
      
       <TAB><TAB>bar
      

      we should consume two spaces from the second tab, including two spaces in the code block.

    • Properly handle tabs with blockquotes and fenced blocks.

    • Fixed handling of tabs in lists.

    • Clarified logic in S_advance_offset.

    • Use an assertion to check for in-range html_block_type. It's a programming error if the type is out of range.

    • Refactored S_processLines to make the logic easier to understand, and added documentation (Mathieu Duponchelle).

    • Removed unnecessary check for empty string_content.

    • Factored out contains_inlines.

    • Moved the cmake minimum version to top line of CMakeLists.txt (tinysun212).

    • Fix ctype(3) usage on NetBSD (Kamil Rytarowski). We need to cast value passed to isspace(3) to unsigned char to explicitly prevent possibly undefined behavior.

    • Compile in plain C mode with MSVC 12.0 or newer (Nick Wellnhofer). Under MSVC, we used to compile in C++ mode to get some C99 features like mixing declarations and code. With newer MSVC versions, it's possible to build in plain C mode.

    • Switched from "inline" to "CMARK_INLINE" (Nick Wellnhofer). Newer MSVC versions support enough of C99 to be able to compile cmark in plain C mode. Only the "inline" keyword is still unsupported. We have to use "__inline" instead.

    • Added include guards to config.h.

    • config.h.in - added compatibility snprintf, vsnprintf for MSVC.

    • Replaced sprintf with snprintf (Marco Benelli).

    • config.h: include stdio.h for _vscprintf etc.

    • Include stdarg.h when needed in config.h.

    • Removed an unnecessary C99-ism in buffer.c. This helps compiling on systems like luarocks that don't have all the cmake configuration goodness (thanks to carlmartus).

    • Don't use variable length arrays (Nick Wellnhofer). They're not supported by MSVC.

    • Test with multiple MSVC versions under Appveyor (Nick Wellnhofer).

    • Fix installation dir of man-pages on NetBSD (Kamil Rytarowski).

    • Fixed typo in cmark.h comments (Chris Eidhof).

    • Clarify in man page that cmark_node_free frees a node's children too.

    • Fixed documentation of --width in man page.

    • Require re2c >= 0.14.2 (#102).

    • Generated scanners.c with more recent re2c.
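
    The tab-handling entries in this release (S_advance_offset and partially_consumed_tab) describe advancing by columns without swallowing a whole tab when fewer columns are requested. The following is a simplified sketch of that bookkeeping, assuming a tab stop of 4. The indent_state struct and advance_columns function are hypothetical names for illustration, and only column-wise advancing is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TAB_STOP 4

/* Hypothetical subset of parser state used while consuming indentation. */
typedef struct {
  size_t offset;               /* byte offset into the line */
  size_t column;               /* virtual column, with tabs expanded */
  bool partially_consumed_tab; /* stopped partway through a tab */
} indent_state;

/* Advance by `count` columns. A tab is only stepped over (offset += 1)
 * once all of its remaining columns have been consumed. */
void advance_columns(indent_state *st, const char *line, size_t count) {
  assert(st != NULL && line != NULL);
  while (count > 0 && line[st->offset] != '\0') {
    if (line[st->offset] == '\t') {
      size_t chars_to_tab = TAB_STOP - (st->column % TAB_STOP);
      size_t advance = count < chars_to_tab ? count : chars_to_tab;
      st->partially_consumed_tab = chars_to_tab > count;
      st->column += advance;
      if (!st->partially_consumed_tab)
        st->offset += 1;
      count -= advance;
    } else {
      st->partially_consumed_tab = false;
      st->offset += 1;
      st->column += 1; /* block starts are ASCII, one column per byte */
      count -= 1;
    }
  }
}
```

    For a line beginning with two tabs, advancing two columns leaves the offset on the first tab with partially_consumed_tab set; advancing two more columns finishes that tab and moves the offset to the second one.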

  • 0.24.1(Jan 18, 2016)

    • Commonmark renderer:
      • Use HTML comment, not two blank lines, to separate a list item from a following code block or list. This makes the output more portable, since the "two blank lines" rule is unique to CommonMark. Also, it allows us to break out of a sublist without breaking out of all levels of nesting.
      • is_autolink - handle case where link has no children, which previously caused a segfault.
      • Use 4-space indent for bullet lists, for increased portability.
      • Use 2-space + newline for line break for increased portability (#90).
      • Improved punctuation escaping. Previously all ) and . characters after digits were escaped; now they are only escaped if they are genuinely in a position where they'd cause a list item. This is achieved by changes in render.c: (a) renderer->begin_content is only set to false after a string of digits at the beginning of the line, and (b) we never break a line before a digit. Also, begin_content is properly initialized to true.
    • Handle NULL root in consolidate_text_nodes.
  • 0.24.0(Jan 13, 2016)

    • [API change] Added cmark_node_replace(oldnode, newnode).
    • Updated spec.txt to 0.24.
    • Fixed edge case with escaped parens in link destination (#97). This was also checked against the #82 case with asan.
    • Removed unnecessary check for fenced in cmark_render_html. It's sufficient to check that the info string is empty. Indeed, those who use the API may well create a code block with an info string without explicitly setting fenced.
    • Updated format of test/smart_punct.txt.
    • Updated test/spec.txt, test/smart_punct.txt, and spec_tests.py to new format.
    • Fixed get_containing_block logic in src/commonmark.c. This did not allow for the possibility that a node might have no containing block, causing the commonmark renderer to segfault if passed an inline node with no block parent.
    • Fixed string representations of CUSTOM_BLOCK, CUSTOM_INLINE. The old versions raw_inline and raw_block were being used, and this led to incorrect xml output.
    • Use default opts in python sample wrapper.
    • Allow multiline setext header content, as per spec.
    • Don't allow spaces in link destinations, even with pointy brackets. Conforms to latest change in spec.
    • Updated scheme scanner according to spec change. We no longer use a whitelist of valid schemes.
    • Allow any kind of nodes as children of CUSTOM_BLOCK (#96).
    • cmark.h: moved typedefs for iterator into iterator section. This just moves some code around so it makes more sense to read, and in the man page.
    • Fixed make_man_page.py so it includes typedefs again.
  • 0.23.0(Dec 29, 2015)

    • [API change] Added CUSTOM_BLOCK and CUSTOM_INLINE node types. They are never generated by the parser, and do not correspond to CommonMark elements. They are designed to be inserted by filters that postprocess the AST. For example, a filter might convert specially marked code blocks to svg diagrams in HTML and tikz diagrams in LaTeX, passing these through to the renderer as a CUSTOM_BLOCK. These nodes can have children, but they also have literal text to be printed by the renderer "on enter" and "on exit." Added cmark_node_get_on_enter, cmark_node_set_on_enter, cmark_node_get_on_exit, cmark_node_set_on_exit to API.
    • [API change] Rename NODE_HTML -> NODE_HTML_BLOCK, NODE_INLINE_HTML -> NODE_HTML_INLINE. Define aliases so the old names still work, for backwards compatibility.
    • [API change] Rename CMARK_NODE_HEADER -> CMARK_NODE_HEADING. Note that for backwards compatibility, we have defined aliases: CMARK_NODE_HEADER = CMARK_NODE_HEADING, cmark_node_get_header_level = cmark_node_get_heading_level, and cmark_node_set_header_level = cmark_node_set_heading_level.
    • [API change] Rename CMARK_NODE_HRULE -> CMARK_NODE_THEMATIC_BREAK. Defined the former as the latter for backwards compatibility.
    • Don't allow space between link text and link label in a reference link (spec change).
    • Separate parsing and rendering opts in cmark.h (#88). This change also changes some of these constants' numerical values, but nothing should change in the API if you use the constants themselves. It should now be clear in the man page which options affect parsing and which affect rendering.
    • xml renderer - Added xmlns attribute to document node (jgm/CommonMark#87).
    • Commonmark renderer: ensure html blocks surrounded by blanks. Otherwise we get failures of roundtrip tests.
    • Commonmark renderer: ensure that literal characters get escaped when they're at the beginning of a block, e.g. > \- foo.
    • LaTeX renderer - better handling of internal links. Now we render [foo](#bar) as \protect\hyperlink{bar}{foo}.
    • Check for NULL pointer in _scan_at (#81).
    • Makefile.nmake: be more robust when cmake is missing. Previously, when cmake was missing, the build dir would be created anyway, and subsequent attempts (even with cmake) would fail, because cmake would not be run. Depending on build/CMakeFiles is more robust -- this won't be created unless cmake is run. Partially addresses #85.
    • Fixed DOCTYPE in xml output.
    • commonmark.c: fix size_t to int. This fixes an MSVC warning "conversion from 'size_t' to 'int', possible loss of data" (Kevin Wojniak).
    • Correct string length in cmark_parse_document example (Lee Jeffery).
    • Fix non-ASCII end-of-line character check (andyuhnak).
    • Fix "declaration shadows a local variable" (Kevin Wojniak).
    • Install static library (jgm/CommonMark#381).
    • Fix warnings about dropping const qualifier (Kevin Wojniak).
    • Use full (unabbreviated) versions of constants (CMARK_...).
    • Removed outdated targets from Makefile.
    • Removed need for sudo in make bench.
    • Improved benchmark. Use longer test, since time has limited resolution.
    • Removed bench.h and timing calls in main.c.
    • Updated API docs; getters return empty strings if not set rather than NULL, as previously documented.
    • Added api_tests for custom nodes.
    • Made roundtrip test part of the test suite run by cmake.
    • Regenerate scanners.c using re2c 0.15.3.
    • Adjusted scanner for link url. This fixes a heap buffer overflow (#82).
    • Added version number (1.0) to XML namespace. We don't guarantee stability in this until 1.0 is actually released, however.
    • Removed obsolete TIMER macro.
    • Make LIB_INSTALL_DIR configurable (Mathieu Bridon, #79).
    • Removed out-of-date luajit wrapper.
    • Use input, not parser->curline to determine last line length.
    • Small optimizations in _scan_at.
    • Replaced hard-coded 4 with TAB_STOP.
    • Have make format reformat api tests as well.
    • Added api tests for man, latex, commonmark, and xml renderers (#51).
    • render.c: added begin_content field. This is like begin_line except that it doesn't trigger production of the prefix. So it can be set after an initial prefix (say >) is printed by the renderer, and consulted in determining whether to escape content that has a special meaning at the beginning of a line. Used in the commonmark renderer.
    • Python 3.5 compatibility: don't require HTMLParseError (Zhiming Wang). HTMLParseError was removed in Python 3.5. Since it could never be thrown in Python 3.5+, we simply define a placeholder when HTMLParseError cannot be imported.
    • Set convert_charrefs=False in normalize.py (#83). This defeats the new default as of python 3.5, and allows the script to work with python 3.5.
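
    The renames in this release keep the old names working through aliases. One common way to do that in C is with preprocessor defines, sketched below. Every demo_ and DEMO_ name is hypothetical and stands in for the real identifiers; this shows the pattern only and is not a copy of cmark.h.

```c
#include <assert.h>
#include <stddef.h>

/* New, preferred names (hypothetical stand-ins). */
typedef enum {
  DEMO_NODE_PARAGRAPH,
  DEMO_NODE_HEADING,
  DEMO_NODE_THEMATIC_BREAK
} demo_node_type;

typedef struct {
  demo_node_type type;
  int heading_level;
} demo_node;

/* Returns the heading level, or 0 if the node is not a heading. */
int demo_node_get_heading_level(const demo_node *node) {
  assert(node != NULL);
  return node->type == DEMO_NODE_HEADING ? node->heading_level : 0;
}

/* Backwards-compatibility aliases: old spellings expand to the new ones,
 * so existing callers keep compiling unchanged. */
#define DEMO_NODE_HEADER DEMO_NODE_HEADING
#define DEMO_NODE_HRULE DEMO_NODE_THEMATIC_BREAK
#define demo_node_get_header_level demo_node_get_heading_level
```

    Code written against the old names, such as demo_node_get_header_level(&n), then resolves to the renamed function at compile time with no runtime cost.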
  • 0.22.0(Aug 24, 2015)

    • Removed pre from blocktags scanner. pre is handled separately in rule 1 and needn't be handled in rule 6.
    • Added iframe to list of blocktags, as per spec change.
    • Fixed bug with HRULE after blank line. This previously caused cmark to break out of a list, thinking it had two consecutive blanks.
    • Check for empty string before trying to look at line ending.
    • Make sure every line fed to S_process_line ends with \n (#72). So S_process_line sees only unix style line endings. Ultimately we probably want a better solution, allowing the line ending style of the input file to be preserved. This solution forces output with newlines.
    • Improved cmark_strbuf_normalize_whitespace (#73). Now all characters that satisfy cmark_isspace are recognized as whitespace. Previously \r and \t (and others) weren't included.
    • Treat line ending with EOF as ending with newline (#71).
    • Fixed --hardbreaks with \r\n line breaks (#68).
    • Disallow list item starting with multiple blank lines (jgm/CommonMark#332).
    • Allow tabs before closing #s in ATX header.
    • Removed cmark_strbuf_printf and cmark_strbuf_vprintf. These are no longer needed, and cause complications for MSVC. Also removed HAVE_VA_COPY and HAVE_C99_SNPRINTF feature tests.
    • Added option to disable tests (Kevin Wojniak).
    • Added CMARK_INLINE macro.
    • Removed need to disable MSVC warnings 4267, 4244, 4800 (Kevin Wojniak).
    • Fixed MSVC inline errors when cmark is included in sources that don't have the same set of disabled warnings (Kevin Wojniak).
    • Fix FileNotFoundError errors on tests when cmark is built from another project via add_subdirectory() (Kevin Wojniak).
    • Prefix utf8proc functions to avoid conflict with existing library (Kevin Wojniak).
    • Avoid name clash between Windows .pdb files (Nick Wellnhofer).
    • Improved smart_punct.txt (see jgm/commonmark.js#61).
    • Set POSITION_INDEPENDENT_CODE ON for static library (see #39).
    • make bench: allow overriding BENCHFILE. Previously if you did this, it would clobber BENCHFILE with the default bench file.
    • make bench: Use -10 priority with renice.
    • Improved make_autolink. Ensures that title is chunk with empty string rather than NULL, as with other links.
    • Added clang-check target.
    • Travis: split roundtrip_test and leakcheck (OGINO Masanori).
    • Use clang-format, llvm style, for formatting. Reformatted all source files. Added format target to Makefile. Removed astyle target. Updated .editorconfig.
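
    The cmark_strbuf_normalize_whitespace entry above describes treating every whitespace character, including \r and \t, as whitespace when normalizing. A minimal sketch of in-place whitespace collapsing on a NUL-terminated string follows. demo_isspace and demo_normalize_whitespace are hypothetical, and the exact set of characters accepted by cmark_isspace is assumed here rather than taken from the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for cmark_isspace; assumed set of ASCII whitespace. */
static bool demo_isspace(char c) {
  return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' ||
         c == '\r';
}

/* Collapse each run of whitespace into a single space, in place.
 * Returns the new length of the string. */
size_t demo_normalize_whitespace(char *s) {
  size_t r, w = 0;
  bool last_was_space = false;
  assert(s != NULL);

  for (r = 0; s[r] != '\0'; ++r) {
    if (demo_isspace(s[r])) {
      if (!last_was_space) {
        s[w++] = ' ';
        last_was_space = true;
      }
    } else {
      s[w++] = s[r];
      last_was_space = false;
    }
  }
  s[w] = '\0';
  return w;
}
```

    Because the write index never passes the read index, no second buffer is needed.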
  • 0.21.0(Jul 15, 2015)

    • Updated to version 0.21 of spec.
    • Added latex renderer (#31). New exported function in API: cmark_render_latex. New source file: src/latex.c.
    • Updates for new HTML block spec. Removed old html_block_tag scanner. Added new html_block_start and html_block_start_7, as well as html_block_end_n for n = 1-5. Rewrote block parser for new HTML block spec.
    • We no longer preprocess tabs to spaces before parsing. Instead, we keep track of both the byte offset and the (virtual) column as we parse block starts. This allows us to handle tabs without converting to spaces first. Tabs are left as tabs in the output, as per the revised spec.
    • Removed utf8 validation by default. We now replace null characters in the line splitting code.
    • Added CMARK_OPT_VALIDATE_UTF8 option and command-line option --validate-utf8. This option causes cmark to check for valid UTF-8, replacing invalid sequences with the replacement character, U+FFFD. Previously this was done by default in connection with tab expansion, but we no longer do it by default with the new tab treatment. (Many applications will know that the input is valid UTF-8, so validation will not be necessary.)
    • Added CMARK_OPT_SAFE option and --safe command-line flag.
      • Added CMARK_OPT_SAFE. This option disables rendering of raw HTML and potentially dangerous links.
      • Added --safe option in command-line program.
      • Updated cmark.3 man page.
      • Added scan_dangerous_url to scanners.
      • In HTML, suppress rendering of raw HTML and potentially dangerous links if CMARK_OPT_SAFE. Dangerous URLs are those that begin with javascript:, vbscript:, file:, or data: (except for image/png, image/gif, image/jpeg, or image/webp mime types).
      • Added api_test for CMARK_OPT_SAFE.
      • Rewrote README.md on security.
    • Limit ordered list start to 9 digits, per spec.
    • Added width parameter to render_man (API change).
    • Extracted common renderer code from latex, man, and commonmark renderers into a separate module, renderer.[ch] (#63). To write a renderer now, you only need to write a character escaping function and a node rendering function. You pass these to cmark_render and it handles all the plumbing (including line wrapping) for you. So far this is an internal module, but we might consider adding it to the API in the future.
    • commonmark writer: correctly handle email autolinks.
    • commonmark writer: escape !.
    • Fixed soft breaks in commonmark renderer.
    • Fixed scanner for link url. re2c returns the longest match, so we were getting bad results with [link](foo\(and\(bar\)\)) which it would parse as containing a bare \ followed by an in-parens chunk ending with the final paren.
    • Allow non-initial hyphens in html tag names. This allows for custom tags, see jgm/CommonMark#239.
    • Updated test/smart_punct.txt.
    • Implemented new treatment of hyphens with --smart, converting sequences of hyphens to sequences of em and en dashes that contain no hyphens.
    • HTML renderer: properly split info on first space char (see jgm/commonmark.js#54).
    • Changed version variables to functions (#60, Andrius Bentkus). This is easier to access using ffi, since some languages, like C#, prefer to use only function interfaces for accessing library functionality.
    • process_emphasis: Fixed setting lower bound to potential openers. Renamed potential_openers -> openers_bottom. Renamed start_delim -> stack_bottom.
    • Added case for #59 to pathological_test.py.
    • Fixed emphasis/link parsing bug (#59).
    • Fixed off-by-one error in line splitting routine. This caused certain NULLs not to be replaced.
    • Don't rtrim in subject_from_buffer. This gives bad results in parsing reference links, where we might have trailing blanks (finalize removes the bytes parsed as a reference definition; before this change, some blank bytes might remain on the line).
      • Added column and first_nonspace_column fields to parser.
      • Added utility function to advance the offset, computing the virtual column too. Note that we don't need to deal with UTF-8 here at all. Only ASCII occurs in block starts.
      • Significant performance improvement due to the fact that we're not doing UTF-8 validation.
    • Fixed entity lookup table. The old one had many errors. The new one is derived from the list in the npm entities package. Since the sequences can now be longer (multi-code-point), we have bumped the length limit from 4 to 8, which also affects houdini_html_u.c. An example of the kind of error that was fixed: &ngE; should be rendered as "≧̸" (U+02267 U+00338), but it was being rendered as "≧" (which is the same as &gE;).
    • Replace gperf-based entity lookup with binary tree lookup. The primary advantage is a big reduction in the size of the compiled library and executable (> 100K). There should be no measurable performance difference in normal documents. I detected only a slight performance hit in a file containing 1,000,000 entities.
      • Removed src/html_unescape.gperf and src/html_unescape.h.
      • Added src/entities.h (generated by tools/make_entities_h.py).
      • Added binary tree lookup functions to houdini_html_u.c, and use the data in src/entities.h.
      • Renamed entities.h -> entities.inc, and tools/make_entities_h.py -> tools/make_entities_inc.py.
    • Fixed cases like

      [ref]: url
      "title" ok

      Here we should parse the first line as a reference.
    • inlines.c: Added utility functions to skip spaces and line endings.
    • Fixed backslashes in link destinations that are not part of escapes (jgm/commonmark#45).
    • process_line: Removed "add newline if line doesn't have one." This isn't actually needed.
    • Small logic fixes and a simplification in process_emphasis.
    • Added more pathological tests:
      • Many link closers with no openers.
      • Many link openers with no closers.
      • Many emph openers with no closers.
      • Many closers with no openers.
      • "*a_ " * 20000.
    • Fixed process_emphasis to handle new pathological cases. Now we have an array of pointers (potential_openers), keyed to the delim char. When we've failed to match a potential opener prior to point X in the delimiter stack, we reset potential_openers for that opener type to X, and thus avoid having to look again through all the openers we've already rejected.
    • process_inlines: remove closers from delim stack when possible. When they have no matching openers and cannot be openers themselves, we can safely remove them. This helps with a performance case: "a_ " * 20000 (jgm/commonmark.js#43).
    • Roll utf8proc_charlen into utf8proc_valid (Nick Wellnhofer). Speeds up "make bench" by another percent.
    • spec_tests.py: allow for tab in HTML examples.
    • normalize.py: don't collapse whitespace in pre contexts.
    • Use utf-8 aware re2c.
    • Makefile afl target: removed -m none, added CMARK_OPTS.
    • README: added make afl instructions.
    • Limit generated cmark.3 to 72 character line width.
    • Travis: switched to containerized build system.
    • Removed debug.h. (It uses GNU extensions, and we don't need it anyway.)
    • Removed sundown from benchmarks, because the reading was anomalous. sundown had an arbitrary 16MB limit on buffers, and the benchmark input exceeded that. So who knows what we were actually testing? Added hoedown, sundown's successor, which is a better comparison.
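
    The CMARK_OPT_SAFE entry in this release lists which URL prefixes count as dangerous. That rule can be sketched as a plain prefix check, shown below. This is not the re2c-generated scan_dangerous_url scanner: url_is_dangerous and has_prefix_ci are hypothetical helpers, and case-insensitive matching is an assumption made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Case-insensitive ASCII prefix test. Characters are cast to unsigned char
 * before being passed to tolower to avoid undefined behavior. */
static bool has_prefix_ci(const char *s, const char *prefix) {
  for (; *prefix != '\0'; ++s, ++prefix) {
    if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
      return false;
  }
  return true;
}

/* Returns true for javascript:, vbscript:, file:, and data: URLs, except
 * data: URLs with an image/png, image/gif, image/jpeg, or image/webp type. */
bool url_is_dangerous(const char *url) {
  static const char *const safe_data[] = {
      "data:image/png", "data:image/gif", "data:image/jpeg",
      "data:image/webp"};
  size_t i;
  assert(url != NULL);

  if (has_prefix_ci(url, "data:")) {
    for (i = 0; i < sizeof safe_data / sizeof safe_data[0]; ++i) {
      if (has_prefix_ci(url, safe_data[i]))
        return false;
    }
    return true;
  }
  return has_prefix_ci(url, "javascript:") ||
         has_prefix_ci(url, "vbscript:") || has_prefix_ci(url, "file:");
}
```

    A renderer in safe mode could call a check like this on each link destination and omit the URL when it returns true.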
  • 0.20.0(Jun 8, 2015)

    • Fixed bug in list item parsing when items indented >= 4 spaces (#52).
    • Don't allow link labels with no non-whitespace characters (jgm/CommonMark#322).
    • Fixed multiple issues with numeric entities (#33, Nick Wellnhofer).
    • Support CR and CRLF line endings (Ben Trask).
    • Added test for different line endings to api_test.
    • Allow NULL value in string setters (Nick Wellnhofer). (NULL produces a 0-length string value.) Internally, URL and title are now stored as cmark_chunk rather than char *.
    • Fixed memory leak in cmark_consolidate_text_nodes (#32).
    • Fixed is_autolink in the CommonMark renderer (#50). Previously any link with an absolute URL was treated as an autolink.
    • Cope with broken snprintf on Windows (Nick Wellnhofer). On Windows, snprintf returns -1 if the output was truncated. Fall back to Windows-specific _scprintf.
    • Switched length parameter on cmark_markdown_to_html, cmark_parser_feed, and cmark_parse_document from int to size_t (#53, Nick Wellnhofer).
    • Use a custom type bufsize_t for all string sizes and indices. This allows switching to 64-bit string buffers by changing a single typedef and a macro definition (Nick Wellnhofer).
    • Hardened the strbuf code, checking for integer overflows and adding range checks (Nick Wellnhofer).
    • Removed unused function cmark_strbuf_attach (Nick Wellnhofer).
    • Fixed all implicit 64-bit to 32-bit conversions that -Wshorten-64-to-32 warns about (Nick Wellnhofer).
    • Added helper function cmark_strbuf_safe_strlen that converts from size_t to bufsize_t and throws an error in case of an overflow (Nick Wellnhofer).
    • Abort on strbuf out of memory errors (Nick Wellnhofer). Previously such errors were not being trapped. This involves some internal changes to the buffer library that do not affect the API.
    • Factored out S_find_first_nonspace in S_process_line. Added fields offset, first_nonspace, indent, and blank to cmark_parser struct. This just removes some repetition.
    • Added Racket (5.3+) wrapper (Eli Barzilay).
    • Removed -pg from Debug build flags (#47).
    • Added Ubsan build target, to check for undefined behavior.
    • Improved make leakcheck. We now return an error status if anything in the loop fails. We now check --smart and --normalize options.
    • Removed wrapper3.py, made wrapper.py work with python 2 and 3. Also improved the wrapper to work with Windows, and to use smart punctuation (as an example).
    • In wrapper.rb, added argument for options.
    • Revised luajit wrapper.
    • Added build status badges to README.md.
    • Added links to go, perl, ruby, R, and Haskell bindings to README.md.
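
    Several entries in this release concern a dedicated size type with overflow and range checks (bufsize_t, cmark_strbuf_safe_strlen, the strbuf hardening). The sketch below shows the general shape of such checks, assuming a 32-bit signed size type. demo_bufsize_t, DEMO_BUFSIZE_MAX, and both functions are hypothetical. They also report failure through a return value, whereas the entries above describe raising an error or aborting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Changing these two definitions would switch the sketch to 64-bit sizes. */
typedef int32_t demo_bufsize_t;
#define DEMO_BUFSIZE_MAX INT32_MAX

/* Convert a size_t to the buffer size type, rejecting values that do not fit. */
bool demo_checked_size(size_t n, demo_bufsize_t *out) {
  assert(out != NULL);
  if ((uintmax_t)n > (uintmax_t)DEMO_BUFSIZE_MAX)
    return false;
  *out = (demo_bufsize_t)n;
  return true;
}

/* Add two non-negative sizes, rejecting results above DEMO_BUFSIZE_MAX.
 * The comparison is arranged so that the check itself cannot overflow. */
bool demo_checked_add(demo_bufsize_t a, demo_bufsize_t b, demo_bufsize_t *out) {
  assert(out != NULL && a >= 0 && b >= 0);
  if (a > DEMO_BUFSIZE_MAX - b)
    return false;
  *out = a + b;
  return true;
}
```

    Keeping the type and its maximum behind one typedef and one macro is what makes a later switch to wider buffers a two-line change.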
  • 0.19.0(Apr 29, 2015)

    • Fixed _ emphasis parsing to conform to spec (jgm/CommonMark#317).
    • Updated spec.txt.
    • Compile static library with -DCMARK_STATIC_DEFINE (Nick Wellnhofer).
    • Suppress warnings about Windows runtime library files (Nick Wellnhofer). Visual Studio Express editions do not include the redistributable files. Set CMAKE_INSTALL_SYSTEM_RUNTIME_LIBS_NO_WARNINGS to suppress warnings.
    • Added appveyor: Windows continuous integration (appveyor.yml).
    • Use os.path.join in test/cmark.py for proper cross-platform paths.
    • Fixed Makefile.nmake.
    • Improved make afl: added test/afl_dictionary, increased timeout for hangs.
    • Improved README with a description of the library's strengths.
    • Pass-through Unicode non-characters (Nick Wellnhofer). Despite their name, Unicode non-characters are valid code points. They should be passed through by a library like libcmark.
    • Check return status of utf8proc_iterate (#27).
  • 0.18.3(Apr 1, 2015)

    • Include patch level in soname (Nick Wellnhofer). Minor version is tied to spec version, so this allows breaking the ABI between spec releases.
    • Install compiler-provided system runtime libraries (Changjiang Yang).
    • Use strbuf_printf instead of snprintf. snprintf is not available on some platforms (Visual Studio 2013 and earlier).
    • Fixed memory access bug: "invalid read of size 1" on input [link](<>).
  • 0.18.2(Mar 30, 2015)

    • Added commonmark renderer: cmark_render_commonmark. In addition to options, this takes a width parameter. A value of 0 disables wrapping; a positive value wraps the document to the specified width. Note that width is automatically set to 0 if the CMARK_OPT_HARDBREAKS option is set.
    • The cmark executable now allows -t commonmark for output as CommonMark. A --width option has been added to specify wrapping width.
    • Added roundtrip_test Makefile target. This runs all the spec through the commonmark renderer, and then through the commonmark parser, and compares normalized HTML to the test. All tests pass with the current parser and renderer, giving us some confidence that the commonmark renderer is sufficiently robust. Eventually this should be pythonized and put in the cmake test routine.
    • Removed an unnecessary check in blocks.c. By the time we check for a list start, we've already checked for a horizontal rule, so we don't need to repeat that check here. Thanks to Robin Stocker for pointing out a similar redundancy in commonmark.js.
    • Fixed bug in cmark_strbuf_unescape (buffer.c). The old function gave incorrect results on input like \\*, since the next backslash would be treated as escaping the * instead of being escaped itself.
    • scanners.re: added _scan_scheme, scan_scheme, used in the commonmark renderer.
    • Check for CMAKE_C_COMPILER (not CC_COMPILER) when setting C flags.
    • Update code examples in documentation, adding new parser option argument, and using CMARK_OPT_DEFAULT (Nick Wellnhofer).
    • Added options parameter to cmark_markdown_to_html.
    • Removed obsolete reference to CMARK_NODE_LINK_LABEL.
    • make leakcheck now checks all output formats.
    • test/cmark.py: set default options for markdown_to_html.
    • Warn about buggy re2c versions (Nick Wellnhofer).
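
    The cmark_strbuf_unescape fix in this release concerns input like \\*, where the second backslash is itself escaped and so must not go on to escape the *. A minimal in-place sketch of that behavior follows. unescape_in_place is a hypothetical function operating on a NUL-terminated string, and it uses ispunct from the C library (in the default "C" locale) as a stand-in for cmark's own punctuation test.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Remove backslashes that escape ASCII punctuation, in place.
 * Returns the new length of the string. */
size_t unescape_in_place(char *s) {
  size_t r = 0, w = 0;
  assert(s != NULL);

  while (s[r] != '\0') {
    if (s[r] == '\\' && ispunct((unsigned char)s[r + 1]))
      r++; /* drop the backslash */
    /* Copy the current character and step past it. An escaped character
     * is consumed here, so it is never re-examined as an escape itself. */
    s[w++] = s[r++];
  }
  s[w] = '\0';
  return w;
}
```

    On the three-character input \\* this produces \* (a literal backslash followed by the asterisk), while a backslash before a non-punctuation character such as b is left untouched.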
  • 0.18.1(Mar 10, 2015)

  • 0.18(Mar 4, 2015)

    • Switch to 2-clause BSD license, with agreement of contributors.
    • Added Profile build type, make prof target.
    • Fixed autolink scanner to conform to the spec. Backslash escapes not allowed in autolinks.
    • Don't rely on strnlen being available (Nick Wellnhofer).
    • Updated scanners for new whitespace definition.
    • Added CMARK_OPT_SMART and --smart option, smart.c, smart.h.
    • Added test for --smart option.
    • Fixed segfault with --normalize (closes #7).
    • Moved normalization step from XML renderer to cmark_parser_finish.
    • Added options parameter to cmark_parse_document, cmark_parse_file.
    • Fixed man renderer's escaping for unicode characters.
    • Don't require python3 to make cmark.3 man page.
    • Use ASCII escapes for punctuation characters for portability.
    • Made options an int rather than a long, for consistency.
    • Packed cmark_node struct to fit into 128 bytes. This gives a small performance boost and lowers memory usage.
    • Repacked delimiter struct to avoid hole.
    • Fixed use-after-free bug, which arose when a paragraph containing only reference links and blank space was finalized (#9). Avoid using parser->current in the loop that creates new blocks, since finalize in add_child may have removed the current parser (if it contains only reference definitions). This isn't a great solution; in the long run we need to rewrite to make the logic clearer and to make it harder to make mistakes like this one.
    • Added 'Asan' build type. make asan will link against ASan; the resulting executable will do checks for memory access issues. Thanks @JordanMilne for the suggestion.
    • Add Makefile target to fuzz with AFL (Nick Wellnhofer). The variable $AFL_PATH must point to the directory containing the AFL binaries. It can be set as an environment variable or passed to make on the command line.
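On the strnlen item in this release: strnlen is specified by POSIX rather than by standard C99, so strictly portable code cannot assume it exists. A minimal sketch of a portable substitute follows. The name `bounded_strlen` is hypothetical, and this is not necessarily how cmark itself resolved the issue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for strnlen: returns the length of s, but never
 * examines more than maxlen bytes. */
static size_t bounded_strlen(const char *s, size_t maxlen) {
  assert(s != NULL);
  size_t n = 0;
  while (n < maxlen && s[n] != '\0') {
    n++;
  }
  return n;
}
```

The bound is checked before each byte is read, so the function is safe on buffers that are not NUL-terminated as long as maxlen does not exceed the buffer size.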
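The two struct-packing items in 0.18 (fitting cmark_node into 128 bytes, and repacking the delimiter struct to avoid a hole) rely on the way C compilers insert padding to satisfy alignment. The sketch below illustrates the general technique with two hypothetical structs; neither matches cmark's actual layouts. Grouping the small members together after the pointer lets them share space that would otherwise be padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with holes: on a typical 64-bit ABI the compiler
 * pads after flag_a so that ptr is aligned, and again after flag_b. */
struct loose {
  uint8_t flag_a;
  void *ptr;
  uint8_t flag_b;
};

/* Same members, reordered so the two small fields sit next to each other. */
struct tight {
  void *ptr;
  uint8_t flag_a;
  uint8_t flag_b;
};

/* Number of bytes saved per object by the reordering on this platform. */
static size_t bytes_saved(void) {
  assert(sizeof(struct tight) <= sizeof(struct loose));
  return sizeof(struct loose) - sizeof(struct tight);
}
```

On a typical 64-bit ABI, `struct loose` occupies 24 bytes and `struct tight` 16, though the exact sizes are implementation-defined.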
  • 0.17 (Jan 25, 2015)

    • Stripped out all JavaScript related code and documentation, moving it to a separate repository (https://github.com/jgm/commonmark.js).
    • Improved Makefile targets, so that cmake is run again only when necessary (Nick Wellnhofer).
    • Added INSTALL_PREFIX to the Makefile, allowing installation to a location other than /usr/local without invoking cmake manually (Nick Wellnhofer).
    • make test now guarantees that the project will be rebuilt before tests are run (Nick Wellnhofer).
    • Prohibited overriding of some Makefile variables (Nick Wellnhofer).
    • Provide version number and string, both as macros (CMARK_VERSION, CMARK_VERSION_STRING) and as symbols (cmark_version, cmark_version_string) (Nick Wellnhofer). All of these come from cmark_version.h, which is constructed from a template cmark_version.h.in and data in CMakeLists.txt.
    • Avoid calling free on null pointer.
    • Added an accessor for an iterator's root node (cmark_iter_get_root).
    • Added user data field for nodes (Nick Wellnhofer). This is intended mainly for use in bindings for dynamic languages, where it could store a pointer to a target language object (#287). But it can be used for anything.
    • Man renderer: properly escape multiline strings.
    • Added assertion to raise error if finalize is called on a closed block.
    • Implemented the new spec rule for emphasis and strong emphasis with _.
    • Moved the check for fence-close with the other checks for end-of-block.
    • Fixed a bug in loose list detection for items containing fenced code blocks (#285).
    • Removed recursive algorithm in ends_with_blank_line (#286).
    • Minor code reformatting: renamed parameters.
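The ends_with_blank_line item in 0.17 replaced a recursive algorithm with a non-recursive one. The release note does not state the motivation; a common reason for such a change is that recursion depth grows with nesting depth, which is risky on deeply nested input. The sketch below shows the general shape of walking down a chain of last children in a loop. The struct and function names are hypothetical simplifications and do not reproduce the code in blocks.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified node; cmark's real node has many more fields. */
struct sketch_node {
  struct sketch_node *last_child; /* NULL for a leaf */
  bool is_container;              /* stands in for "is a list or list item" */
  bool last_line_blank;
};

/* Follows the last-child chain iteratively, so stack usage stays constant
 * no matter how deeply the containers are nested. */
static bool sketch_ends_with_blank_line(const struct sketch_node *node) {
  const struct sketch_node *cur = node;
  while (cur != NULL) {
    if (cur->last_line_blank) {
      return true;
    }
    /* Only containers are descended into; anything else ends the walk. */
    cur = cur->is_container ? cur->last_child : NULL;
  }
  return false;
}
```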
Owner
CommonMark
A strongly specified, highly compatible implementation of Markdown
A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.

libpostal: international street address NLP libpostal is a C library for parsing/normalizing street addresses around the world using statistical NLP a

openvenues 3.6k Dec 4, 2022
Command-line arguments parsing library.

argparse argparse - A command line arguments parsing library in C (compatible with C++). Description This module is inspired by parse-options.c (git)

Yecheng Fu 527 Nov 27, 2022
Header only roguelike rendering library.

Header only roguelike rendering library. Support for Opengl33 and Raylib. Features Support for custom glyph atlasses with up to 65655 tiles of custom

Journeyman 8 Nov 4, 2022
tlRender, or timeline render, is an early stage project for rendering editorial timelines

tlRender tlRender, or timeline render, is an early stage project for rendering editorial timelines. The project includes libraries for rendering timel

Darby Johnston 82 Nov 21, 2022
A docker image where you can run a judge program and a converter for multiple sequence alignment

genocon2021-docker This repository provides a Docker image that bundles the judge program (eval.c) and the Multiple Sequence Alignment (MSA) conversion program (decode_cigar.py). It also includes a sample solution program (sam

Sakamoto, Kazunori 4 Sep 20, 2021
XEphem is an interactive astronomy program for all UNIX platforms.

XEphem is an interactive astronomy program for all UNIX platforms. More screenshots are shown below.

null 74 Nov 30, 2022
A System Fetching Program written in C.

A System Fetching Program written in C.

ABHacker Official 7 Nov 24, 2022
A simple program to suspend or hibernate your computer

A simple program to suspend or hibernate your computer. It supports hooks before and after suspending.

Jakub Jirutka 12 Nov 9, 2022
Context Free Grammars to Pushdown Automata Conversion, C++ program

CFG-to-PDA-Conversion Context Free Grammars to Pushdown Automata Conversion, C++ program USF Group Project: Was in charge of Lambda Removal, Unit Remo

Matias 0 Mar 15, 2022
Add colors to your program in C with umbrella.h

☂️ umbrella ☂️ Add colors to your program in C with umbrella.h Using in projects

Marcello Belanda 1 Jan 18, 2022
Isocline is a pure C library that can be used as an alternative to the GNU readline library

Isocline: a portable readline alternative. Isocline is a pure C library that can be used as an alternative to the GNU readline library (latest release

Daan 135 Nov 25, 2022
A linux library to get the file path of the currently running shared library. Emulates use of Win32 GetModuleHandleEx/GetModuleFilename.

whereami A linux library to get the file path of the currently running shared library. Emulates use of Win32 GetModuleHandleEx/GetModuleFilename. usag

Blackle Morisanchetto 3 Sep 24, 2022
Locate the current executable and the current module/library on the file system

Where Am I? A drop-in two files library to locate the current executable and the current module on the file system. Supported platforms: Windows Linux

Gregory Pakosz 380 Nov 20, 2022
A small and portable INI file library with read/write support

minIni minIni is a portable and configurable library for reading and writing ".INI" files. At just below 900 lines of commented source code, minIni tr

Thiadmer Riemersma 287 Nov 22, 2022
The libxo library allows an application to generate text, XML, JSON, and HTML output using a common set of function calls. The application decides at run time which output style should be produced.

libxo libxo - A Library for Generating Text, XML, JSON, and HTML Output The libxo library allows an application to generate text, XML, JSON, and HTML

Juniper Networks 252 Nov 20, 2022
A simple and easy-to-use library to enjoy videogames programming

hb-raylib v3.5 Harbour bindings for raylib 3.5, a simple and easy to use library to learn videogames programming raylib v3.5. The project has an educa

MarcosLMG 1 Aug 28, 2022
Small header-only C++ library that helps to initialize Vulkan instance and device object

Vulkan Extensions & Features Help, or VkExtensionsFeaturesHelp, is a small, header-only, C++ library for developers who use Vulkan API.

Adam Sawicki 11 Oct 12, 2022
Haxe bindings for raylib, a simple and easy-to-use library to learn videogame programming

Haxe bindings for raylib, a simple and easy-to-use library to learn videogame programming. Currently works only for Windows, but feel free to expand t

FSasquatch 33 Nov 17, 2022