Standards compliant, fast, secure markdown processing library in C

Hoedown

Hoedown is a revived fork of Sundown, the Markdown parser based on the original code of the Upskirt library by Natacha Porté.

Features

  • Fully standards compliant

    Hoedown passes out of the box the official Markdown v1.0.0 and v1.0.3 test suites, and has been extensively tested with additional corner cases to make sure its output is as sane as possible at all times.

  • Massive extension support

    Hoedown has optional support for several (unofficial) Markdown extensions, such as non-strict emphasis, fenced code blocks, tables, autolinks, strikethrough and more.

  • UTF-8 aware

    Hoedown is fully UTF-8 aware, both when parsing the source document and when generating the resulting (X)HTML code.

  • Tested & ready to be used in production

    Hoedown has been extensively security audited, and includes protection against all possible DoS attacks (stack overflows, out-of-memory situations, malformed Markdown syntax...).

    We've worked very hard to make Hoedown never leak or crash under any input.

    Warning: Hoedown doesn't validate or post-process the HTML in Markdown documents. Unless you use HTML_ESCAPE or HTML_SKIP, you should strongly consider using a good post-processor in conjunction with Hoedown to prevent client-side attacks.

  • Customizable renderers

    Hoedown is not stuck with XHTML output: the Markdown parser of the library is decoupled from the renderer, so it's trivial to extend the library with custom renderers. A fully functional (X)HTML renderer is included.

  • Optimized for speed

    Hoedown is written in C, with a special emphasis on performance. When wrapped in a dynamic language such as Python or Ruby, it has been shown to be up to 40 times faster than other native alternatives.

  • Zero-dependency

    Hoedown is a zero-dependency library composed of some .c files and their headers. No dependencies, no bullshit. Only standard C99 that builds everywhere.

  • Additional features

    Hoedown comes with a fully functional implementation of SmartyPants, a separate autolinker, escaping utilities, buffers and stacks.
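
The decoupling described under "Customizable renderers" can be pictured with a small sketch. This is not Hoedown's API: `toy_renderer`, `toy_parse` and the callbacks are hypothetical names, and the toy parser only recognises `*emphasis*`. It is meant to show the idea of a parser that hands pieces of the document to a table of callbacks and never writes markup itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback table: the parser knows nothing about output formats. */
typedef struct toy_renderer {
    void (*text)(char *out, size_t cap, const char *s, size_t n);
    void (*emphasis)(char *out, size_t cap, const char *s, size_t n);
} toy_renderer;

/* Append at most n bytes of s to the NUL-terminated buffer out (capacity cap). */
static void toy_append(char *out, size_t cap, const char *s, size_t n)
{
    size_t used = strlen(out);
    assert(used < cap);
    if (n > cap - used - 1)
        n = cap - used - 1;
    memcpy(out + used, s, n);
    out[used + n] = '\0';
}

static void copy_text(char *out, size_t cap, const char *s, size_t n)
{
    toy_append(out, cap, s, n);
}

static void html_emphasis(char *out, size_t cap, const char *s, size_t n)
{
    toy_append(out, cap, "<em>", 4);
    toy_append(out, cap, s, n);
    toy_append(out, cap, "</em>", 5);
}

/* A second renderer that drops the markup instead of emitting tags. */
static void plain_emphasis(char *out, size_t cap, const char *s, size_t n)
{
    toy_append(out, cap, s, n);
}

static const toy_renderer toy_html = { copy_text, html_emphasis };
static const toy_renderer toy_plain = { copy_text, plain_emphasis };

/* Toy parser: finds *spans* and delegates all output to the renderer. */
static void toy_parse(const toy_renderer *r, const char *src, char *out, size_t cap)
{
    size_t i = 0, len = strlen(src);
    assert(r != NULL && cap > 0);
    out[0] = '\0';
    while (i < len) {
        if (src[i] == '*') {
            const char *end = memchr(src + i + 1, '*', len - i - 1);
            if (end != NULL) {
                r->emphasis(out, cap, src + i + 1, (size_t)(end - (src + i + 1)));
                i = (size_t)(end - src) + 1;
                continue;
            }
        }
        /* Plain run: consume at least one byte, stop before the next '*'. */
        size_t start = i++;
        while (i < len && src[i] != '*')
            i++;
        r->text(out, cap, src + start, i - start);
    }
}
```

Swapping `toy_html` for `toy_plain` changes the output format without touching `toy_parse`, which is the property the feature description relies on.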

Bindings

You can see a community-maintained list of Hoedown bindings at the wiki. There is also a migration guide available for authors of Sundown bindings.

Help us

Hoedown is all about security. If you find a (potential) security vulnerability in the library, or a way to make it crash through malicious input, please report it to us by emailing the private Hoedown Security mailing list. The Hoedown security team will review the vulnerability and work with you to reproduce and resolve it.

Unicode character handling

Given that the Markdown spec makes no provision for Unicode character handling, Hoedown takes a conservative approach towards deciding which extended characters trigger Markdown features:

  • Punctuation characters above U+007F are not handled as punctuation. They are treated as normal, in-word characters for word-boundary checks.

  • Whitespace characters above U+007F are not considered whitespace. They are treated as normal, in-word characters for word-boundary checks.
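
A minimal sketch of what these two rules imply for byte-level classification, assuming UTF-8 input. The function names are hypothetical and this is not Hoedown's source; it only illustrates that any byte at or above 0x80 (part of a multi-byte sequence) is treated as an in-word character.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Only ASCII whitespace counts; bytes >= 0x80 are never whitespace. */
static bool ascii_only_is_space(uint8_t c)
{
    return c < 0x80 && isspace(c) != 0;
}

/* Only ASCII punctuation counts; bytes >= 0x80 are never punctuation. */
static bool ascii_only_is_punct(uint8_t c)
{
    return c < 0x80 && ispunct(c) != 0;
}

/* A byte is a word boundary only if it is ASCII whitespace or punctuation. */
static bool is_word_boundary(uint8_t c)
{
    return ascii_only_is_space(c) || ascii_only_is_punct(c);
}

/* Count boundary bytes in a buffer, using the data/size input convention. */
static size_t count_boundaries(const uint8_t *data, size_t size)
{
    size_t n = 0;
    assert(data != NULL || size == 0);
    for (size_t i = 0; i < size; i++)
        if (is_word_boundary(data[i]))
            n++;
    return n;
}
```

Under this scheme the two bytes of U+00A0 NO-BREAK SPACE (0xC2 0xA0) are both in-word characters, so a no-break space does not end a word.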

Install

Just typing make will build Hoedown into a dynamic library and create the hoedown and smartypants executables, which are command-line tools to render Markdown to HTML and perform SmartyPants, respectively.

If you are using CocoaPods, just add the line pod 'hoedown' to your Podfile and call pod install.

Or, if you prefer, you can just drop the files in src into your project.

Issues
  • Markdown AST inquiry

    Gentlemen, first allow me to thank you for your effort with Hoedown. What you have done so far is truly remarkable!

    I am running my own fork of Sundown that extends it in two major ways:

    1. Adds three additional "rendering" hooks that let me build a parsed Markdown abstract syntax tree (AST) instead of rendering
    2. Adds source maps so you can tell where in the Markdown source document the block being rendered comes from

    These two "extensions" allow me to build the Markdown AST in memory so it can be processed later by some other tools (e.g. the API Blueprint Parser in my case).

    My question is: Would you care about such a contribution to the Hoedown project so it can be used to build Markdown ASTs?

    If so, should Hoedown just support building Markdown ASTs thanks to sufficient hooks and source maps or should it also offer a full AST on its own?

    Thank you for your consideration.

    question 
    opened by zdne 64
  • Source maps in Hoedown

    Derived from #22.

    The renderers should be given the position of the block they're rendering. An easy and low-level way to do that would be to pass a size_t pos as last argument to the callbacks where possible, indicating the position of the block in the input buffer.

    That would however make callbacks longer and most of the time this feature isn't gonna be used.
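
The proposal above can be sketched as follows. Everything here is hypothetical (`toy_block_cb`, `toy_each_block`, and the paragraph-splitting rule are illustrations, not Hoedown callbacks); the point is only the extra `size_t pos` argument carrying the block's byte offset in the input buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback shape: pos is the block's offset in the input. */
typedef void (*toy_block_cb)(const uint8_t *text, size_t size, size_t pos, void *opaque);

/* Split the input on blank lines and report each block with its offset. */
static void toy_each_block(const uint8_t *data, size_t size, toy_block_cb cb, void *opaque)
{
    size_t i = 0;
    assert(cb != NULL && (data != NULL || size == 0));
    while (i < size) {
        /* Skip the newlines between blocks. */
        while (i < size && data[i] == '\n')
            i++;
        if (i == size)
            break;
        size_t beg = i;
        /* A block runs until a blank line or the end of the input. */
        while (i < size && !(data[i] == '\n' && i + 1 < size && data[i + 1] == '\n'))
            i++;
        /* Do not report a trailing newline as part of the block. */
        size_t end = i;
        while (end > beg && data[end - 1] == '\n')
            end--;
        cb(data + beg, end - beg, beg, opaque);
    }
}

/* Example consumer: record where each block started. */
typedef struct toy_positions {
    size_t pos[8];
    size_t count;
} toy_positions;

static void toy_record_pos(const uint8_t *text, size_t size, size_t pos, void *opaque)
{
    toy_positions *p = opaque;
    (void)text;
    (void)size;
    if (p->count < 8)
        p->pos[p->count++] = pos;
}
```

A renderer that does not care about positions simply ignores the argument, which is the cost the issue mentions: every callback signature gets longer even when the feature is unused.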

    question 
    opened by mildsunrise 36
  • Import improvements from Lanli (and more things)

    Lanli is an upcoming HTML sanitizer that will soon be published as a companion to Hoedown. Most of their code (and philosophy) is shared, so I'm importing the changes I've done on Lanli. (That forced me to do more improvements in the same commit, instead of splitting them into multiple PRs. Sorry about that, really.)

    Overall, this PR greatly improves API and code consistency, adds basic documentation and some optimizations.

    Here's a list of the changes:

    • Documentation: added short description for each function in the API.

    • Performance: hoedown_escape_html, hoedown_escape_href and parse_inline have been optimized and are slightly faster than before. (This made Lanli 10% faster, I don't know about Hoedown yet)

    • API: following buffer_new, all functions that return a pointer to a newly allocated memory area must be declared with __attribute__ ((malloc)), in order to properly hint the compiler.

    • API: hoedown_buffer_eq[s] and hoedown_buffer_set[s] have been added.

    • Behaviour: implement a malloc wrapper as we said around #48. This allows us to further simplify the code while still being safe, so we can finally close #48.

    • API: extern is no longer used in document.h and html.h as it's not needed.

    • Building: files are now built in C99 mode instead of the default GNU89.

    • Style: all public headers now closely follow this structure (note: tabs aren't expanded):

      /* header.h - short description */
      
      #ifndef HOEDOWN_HEADER_H
      #define HOEDOWN_HEADER_H
      
      #include "public_header_1.h"
      #include "public_header_2.h"
      
      #include <system_header_1.h>
      #include <system_header_2.h>
      
      #ifdef __cplusplus
      extern "C" {
      #endif
      
      // [Platform-specific hacks]
      
      
      /*************
       * CONSTANTS *
       *************/
      
      #define HOEDOWN_CONSTANT 3.1415926535898
      
      typedef enum hoedown_flag {
              HOEDOWN_FLAG_ONE = (1 << 0),
              HOEDOWN_FLAG_TWO = (1 << 1),
              HOEDOWN_FLAG_THREE = (1 << 2)
      } hoedown_flag;
      
      typedef enum hoedown_enum {
              HOEDOWN_ENUM_ONE,
              HOEDOWN_ENUM_TWO,
              HOEDOWN_ENUM_THREE
      } hoedown_enum;
      
      
      /*********
       * TYPES *
       *********/
      
      typedef void *(*hoedown_pointer_type)(void *, size_t);
      
      struct hoedown_type {
              type *field;
              type field;
              other_type **field;
      };
      typedef struct hoedown_type hoedown_type;
      
      
      /*************
       * FUNCTIONS *
       *************/
      
      /* hoedown_function: description */
      return_type *hoedown_function(parameter *one, parameter *two);
      
      /* hoedown_function: description */
      return_type *hoedown_function(parameter *one, parameter *two);
      
      
      /* HOEDOWN_HELPER: description */
      #define HOEDOWN_HELPER(param, param) \
              DEFINITION
      
      
      #ifdef __cplusplus
      }
      #endif
      
      #endif /** HOEDOWN_HEADER_H **/
      

      Sections may be omitted if they're empty.

    • Style: source files, unlike headers, have no comment at the top. Exported functions in source files have no comment on them.

    • Style: descriptions for functions must always be in the infinitive.

    • Style: all files must end with a single newline.

    • Style: the init function should be the first declared function in the header, followed by new.

    • Formatting: trailing whitespace has been removed.

    • API: these guidelines apply when naming instance functions:

      • hoedown_instance_init initializes an already-allocated, uninitialized instance.
      • hoedown_instance_new allocates and initializes a new instance.
      • hoedown_instance_uninit uninitializes the instance, which is then ready for deallocation (or initialization).
      • hoedown_instance_free uninitializes and deallocates the instance.
      • hoedown_instance_reset resets the instance to its recently initialized state.
        It's equivalent to calling uninit and then init, but faster.
      • All other methods should only be called on initialized instances.

      Following these guidelines, and for consistency with buffer, hoedown_stack_new has been renamed to hoedown_stack_init, and hoedown_stack_free to hoedown_stack_uninit.

    • Behaviour: before, hoedown_stack_push always called hoedown_stack_grow to double the current stack size, resulting in repeated grows. Now, it only grows when there's no room for more items, as buffer does.

    • API: exported methods now follow the const uint8_t *data, size_t size convention to accept input, as well as the hoedown_buffer *ob convention for output.

    • API: typedef not only structs, but enums, and use the enum type instead of unsigned int.

    • API: flags should have a plural name, as in hoedown_extensions or hoedown_html_flags. Regular enums should have a singular name, as in hoedown_html_tag or hoedown_action.

    • Keep hoedown.def updated.
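
Several of the guidelines in this list (the init/new/uninit/free/reset naming, the allocation wrapper, the `__attribute__ ((malloc))` hint, and growing only when full) can be sketched together. The `demo_` names below are hypothetical and the code is not taken from Hoedown; in particular, aborting on allocation failure is an assumption about what such a wrapper might do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#if defined(__GNUC__)
#define DEMO_MALLOC_ATTR __attribute__ ((malloc))
#else
#define DEMO_MALLOC_ATTR
#endif

/* demo_malloc: allocate memory, aborting on failure (assumed policy) */
static void *demo_malloc(size_t size) DEMO_MALLOC_ATTR;

static void *demo_malloc(size_t size)
{
    void *ret = malloc(size);
    if (ret == NULL) {
        fprintf(stderr, "demo: allocation failed\n");
        abort();
    }
    return ret;
}

/* demo_realloc: resize memory, aborting on failure (no malloc attribute) */
static void *demo_realloc(void *ptr, size_t size)
{
    void *ret = realloc(ptr, size);
    if (ret == NULL) {
        fprintf(stderr, "demo: allocation failed\n");
        abort();
    }
    return ret;
}

typedef struct demo_stack {
    void **item;
    size_t size;   /* items in use */
    size_t asize;  /* allocated slots */
} demo_stack;

/* demo_stack_init: initialize an already-allocated, uninitialized stack */
static void demo_stack_init(demo_stack *st, size_t initial)
{
    assert(st != NULL);
    if (initial == 0)
        initial = 8;
    st->item = demo_malloc(initial * sizeof(void *));
    st->size = 0;
    st->asize = initial;
}

/* demo_stack_new: allocate and initialize a new stack */
static demo_stack *demo_stack_new(size_t initial)
{
    demo_stack *st = demo_malloc(sizeof *st);
    demo_stack_init(st, initial);
    return st;
}

/* demo_stack_uninit: release the contents; the struct may be reused or freed */
static void demo_stack_uninit(demo_stack *st)
{
    assert(st != NULL);
    free(st->item);
    st->item = NULL;
    st->size = st->asize = 0;
}

/* demo_stack_free: uninitialize and deallocate the stack */
static void demo_stack_free(demo_stack *st)
{
    if (st == NULL)
        return;
    demo_stack_uninit(st);
    free(st);
}

/* demo_stack_reset: empty the stack while keeping its allocation */
static void demo_stack_reset(demo_stack *st)
{
    assert(st != NULL);
    st->size = 0;
}

/* demo_stack_push: push an item, growing only when there is no room left */
static void demo_stack_push(demo_stack *st, void *item)
{
    assert(st != NULL && st->item != NULL);
    if (st->size == st->asize) {
        st->asize *= 2;
        st->item = demo_realloc(st->item, st->asize * sizeof(void *));
    }
    st->item[st->size++] = item;
}
```

With an initial capacity of 2, the first two pushes leave the capacity untouched and only the third doubles it, which is the behaviour the `hoedown_stack_push` item describes.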

    opened by mildsunrise 23
  • Link reference names aren't case insensitive with Unicode

    Imported from vmg/sundown#138.

    Reference names are case-insensitive, but Unicode characters are allowed in them. The spec says they must be case-insensitive, but doesn't say anything about Unicode characters allowed in them (in fact, no mention of Unicode is made in the whole spec).

    Because Hoedown doesn't actually deal with Unicode codepoints, only ASCII letters are lowercased to do the match. So, in some cases the link is not matched:

    See [Ñora][] at the Spanish wikipedia.
    
    [ñora]: http://es.wikipedia.org/wiki/ñora
    

    gives

    <p>See [Ñora][] at the Spanish wikipedia.</p>
    

    So we basically have two options:

    • Mark this as wontfix and explicitly say that non-ASCII letters are case-sensitive in link names.
    • Grab some UTF-8 library, and use it to lowercase the strings so we can match them.

    I'd say the first, since the second is probably out-of-scope for Hoedown.
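
To make the mismatch concrete, here is a minimal sketch of an ASCII-only case-insensitive comparison. It is not Hoedown's matching code (the function names are hypothetical), but it fails on the example above for the same reason: in UTF-8, "Ñ" is the bytes 0xC3 0x91 and "ñ" is 0xC3 0xB1, and neither byte is an ASCII letter, so neither is lowercased.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Lowercase ASCII letters only; every other byte passes through unchanged. */
static uint8_t ascii_lower(uint8_t c)
{
    return (c >= 'A' && c <= 'Z') ? (uint8_t)(c + ('a' - 'A')) : c;
}

/* Compare two reference names byte by byte, ignoring ASCII case only. */
static bool ref_name_equal(const uint8_t *a, size_t asize,
                           const uint8_t *b, size_t bsize)
{
    assert((a != NULL || asize == 0) && (b != NULL || bsize == 0));
    if (asize != bsize)
        return false;
    for (size_t i = 0; i < asize; i++)
        if (ascii_lower(a[i]) != ascii_lower(b[i]))
            return false;
    return true;
}
```

Fixing this properly would need full Unicode case folding, which is what the second option above would bring in through an external library.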

    bug minor 
    opened by mildsunrise 22
  • Executable should parse options

    Derived from #19. It would be great if the executable parsed options for extensions, rendering flags and renderers.

    Example: hoedown --fenced-code-blocks --tables my.markdown

    While that would increase the complexity of the code as an example, it'd show how to pass options to the parser / renderer.

    Also, --version and --smartypants.
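
A rough sketch of how such option parsing might look, mapping long options to extension bits. The `demo_` flag names and the option table are hypothetical, and only the two option spellings from the example above come from this issue; the flags the executable eventually supported may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical extension bits, one per command-line option. */
typedef enum demo_extensions {
    DEMO_EXT_TABLES = (1 << 0),
    DEMO_EXT_FENCED_CODE = (1 << 1),
    DEMO_EXT_STRIKETHROUGH = (1 << 2)
} demo_extensions;

struct demo_option {
    const char *name;
    unsigned int flag;
};

static const struct demo_option demo_options[] = {
    { "--tables", DEMO_EXT_TABLES },
    { "--fenced-code-blocks", DEMO_EXT_FENCED_CODE },
    { "--strikethrough", DEMO_EXT_STRIKETHROUGH }
};

/* Parse argv into extension flags and an input file name.
 * Returns 0 on success and -1 on an unknown option. */
static int demo_parse_args(int argc, const char *const *argv,
                           unsigned int *flags, const char **file)
{
    assert(argv != NULL && flags != NULL && file != NULL);
    *flags = 0;
    *file = NULL;
    for (int i = 1; i < argc; i++) {
        if (strncmp(argv[i], "--", 2) != 0) {
            /* A non-option argument is the input file (the last one wins). */
            *file = argv[i];
            continue;
        }
        size_t k, n = sizeof demo_options / sizeof demo_options[0];
        for (k = 0; k < n; k++) {
            if (strcmp(argv[i], demo_options[k].name) == 0) {
                *flags |= demo_options[k].flag;
                break;
            }
        }
        if (k == n)
            return -1;
    }
    return 0;
}
```

The resulting bitmask would then be passed on to whatever constructs the parser, which is the "how to pass options" demonstration the issue asks for.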

    enhancement 
    opened by mildsunrise 22
  • Finish reorganization

    Now that @devinus has done the base work, there are still some things to do, namely:

    Code (3)

    • [x] Prefix enums as well. Currently they start with MKD_ or HTML_.
    • [x] Normalize guard names and comments on headers.
    • [x] General cleanup.

    Building and versioning (4)

    • [x] Remove the html/ directory from Makefiles.
    • [x] Makefile.win should be modified as well.
    • [x] Add everything to hoedown.def.
    • [x] Reset version.

    Readme and licensing (5)

    • [x] Correct a typo at README.
    • [x] Review the README. Especially clarify the "bindings" part: all those bindings currently aren't Hoedown bindings.
    • [x] Rewrite the "Install" section. The "it's just three files" part is not true anymore.
    • [x] Does README's License match with LICENSE?
    • [x] Update preambles where necessary.
    opened by mildsunrise 22
  • Support for GitHub flavored Markdown

    This might be a tough one, but since GitHub left Sundown in the dust and Redcarpet is apparently now their Markdown parser of choice, my question is: What about possible future extensions to GFM?

    Do you plan to implement such possible changes in Hoedown? Should Hoedown support traditional Markdown only?

    Again thanks for your time and effort!

    opened by zdne 21
  • Add hoedown_document_render_inline

    Whether it's for short posts or full articles, Markdown is great. But sometimes, a full Markdown render is too much.

    This pull request adds a companion to hoedown_document_render: it's hoedown_document_render_inline. As the name implies, the content is passed directly to parse_inline, so it gets parsed as if it was regular Markdown inside of a paragraph, for instance.

    The preprocessing done on this new method is much simpler than that of a regular render:

    • All spacing is converted to spaces, directly. This prevents parse_inline from interpreting a linebreak.
    • No reference or footnote processing.
    • No BOM is interpreted.
    • No linefeed is added at the end.
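
The first preprocessing step in the list above can be sketched like this. The function name is hypothetical and the exact set of characters the pull request treats as "spacing" is an assumption (here: newline, carriage return, tab, vertical tab and form feed).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Replace every spacing byte with a plain space, in place, so that an
 * inline parser never sees a line break. */
static void demo_flatten_spacing(uint8_t *data, size_t size)
{
    assert(data != NULL || size == 0);
    for (size_t i = 0; i < size; i++) {
        switch (data[i]) {
        case '\n': case '\r': case '\t': case '\v': case '\f':
            data[i] = ' ';
            break;
        default:
            break;
        }
    }
}
```

Because the replacement is one byte for one byte, the buffer length is unchanged and no extra linefeed is appended, matching the last item of the list.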

    Use cases

    You could use this on a Markdown-based commenting system similar to StackOverflow's (they call this "mini-markdown"):

    StackOverflow comment box

    Or on a "Todo app":

    Some tasks

    Or on Github itself, for titles:

    Thread title

    You'd use this whenever you have short strings of text, and you want to give them some basic formatting.

    Examples

    Input:     Some **inline** markdown here!
    Output:    Some <strong>inline</strong> markdown here!
    
    Input:     - This is *not* a list item.
    Output:    - This is <em>not</em> a list item.
    
    Input:     Autolinking. http://ddg.gg
    Output:    Autolinking. <a href="http://ddg.gg">http://ddg.gg</a>
    
    Input:     > This < would be interpreted as a `blockquote`.
    Output:    &gt; This &lt; would be interpreted as a <code>blockquote</code>.
    
    Input:     Because images in short comments are unacceptable, the image callback was set
               to `NULL` in this example. ![image](http://something)
    Output:    Because images in short comments are unacceptable, the image callback was set
               to <code>NULL</code> in this example. !<a href="http://something">image</a>
    
    opened by mildsunrise 20
  • Every link between quotes is not rendered as anchor

    "[IAB Guidelines](http://www.iab.net/guidelines/)" generates <q>[IAB Guidelines](http://www.iab.net/guidelines/)</q> instead of "IAB Guidelines".

    bug 
    opened by dedalozzo 18
  • Use of C99 features

    From the README.md:

    [...] standard C99 that builds everywhere.

    If Hoedown is C99, then I suppose there should be no problem with using the bool type with #include <stdbool.h>. Would increase code readability and, you know, a bool takes less memory than an int.

    According to this answer:

    [...] will work only if you use C99 and it's the "standard way" to do it. Choose this if possible.

    Should we transfer all boolean uses in Hoedown to bool? Would there be any problems of compatibility?
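
One behavioural difference worth keeping in mind when weighing this change, shown as a small sketch with hypothetical helper names: converting a value to C99 `bool` normalizes any non-zero value to 1, whereas an `int` used as a boolean keeps the raw value. The size of `bool` is implementation-defined, so the memory argument depends on the platform.

```c
#include <assert.h>
#include <stdbool.h>

/* Returning through bool collapses any non-zero result to exactly 1. */
static bool demo_has_flag(unsigned int flags, unsigned int flag)
{
    assert(flag != 0);
    return flags & flag;
}

/* Returning through int leaks the raw masked value (e.g. 4, not 1). */
static int demo_has_flag_int(unsigned int flags, unsigned int flag)
{
    assert(flag != 0);
    return (int)(flags & flag);
}
```

Code that compares such results with `== 1`, or stores them in narrow bit-fields, behaves differently between the two versions, so a migration would need a review of those call sites.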

    question 
    opened by mildsunrise 18
  • MathJax support

    Based on @uranusjr's work in #112.

    From #100.

    This implements HOEDOWN_EXT_MATH and HOEDOWN_EXT_MATH_DOLLAR. The former triggers char_math (from char_escape with \\[ and \\(, or from active char $ with $$), which parses the block and feeds the content and opening/ending tags to the renderer callback. The latter flag enables an extra math block syntax delimited with a single $. The renderer callback in hoedown_html_renderer outputs the tags and content of the block verbatim. Not sure whether I should trim and/or collapse spaces and newlines inside the block. It’s irrelevant to MathJax.

    opened by mildsunrise 15
  • Support multiple references to the same footnote

    # Heading
    
    Some text with a footnote.[^1]
    
    Some other text with the same footnote.[^1]
    
    [^1]: The footnote
    

    When rendered, the second paragraph renders the string literal "[^1]" instead of generating a second superscript link to the same footnote.

    opened by yorickhenning 2
  • Triple-quoted code breaks list

    While triple backticks can be used to introduce a fenced code block, they may also be used to mark an inline code span. However, for the sake of list item formatting, this distinction is broken:

    * First item
    
      ```Triple quoted code```
    
    * Second list item
    
      ```More such code```
    * Last list item
    

    This ends the list after the first item, starting a new list for the second item. The bullet of the third item gets consumed into the body of the second list item. Taken together I get

    <li><p>First item</p>
    
    <p><code>Triple quoted code</code></p></li>
    </ul>
    
    <ul>
    <li><p>Second list item</p>
    
    <p><code>More such code</code>
    * Last list item</p></li>
    </ul>
    
    opened by gagern 0
  • Detect fenced code block starting in first line of list item

    Prior to this change, the in_fenced flag was not set correctly for the first line, and therefore inverted for every following line if the first line did start with a code block. Since the start of a subsequent list item is explicitly not detected while in fenced code, this essentially disabled the has_next_oli detection, leading to HOEDOWN_LI_END terminating not only the list item but the list as a whole.
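
The inversion described above can be reproduced with a toy model. This is not the Hoedown list parser: `demo_ends_in_fence` and its `check_first` switch are hypothetical, and the sketch only shows how skipping the first line flips the fence state for every later line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A line opens or closes a fence if it starts with three backticks. */
static bool demo_is_fence_line(const char *line)
{
    return strncmp(line, "```", 3) == 0;
}

/* Walk the lines of one list item and report whether the scan ends while
 * still "inside" a fenced block. With check_first set to false the first
 * line is ignored, mimicking the behaviour before the fix. */
static bool demo_ends_in_fence(const char *const *lines, size_t count, bool check_first)
{
    bool in_fenced = false;
    assert(lines != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (i == 0 && !check_first)
            continue;
        if (demo_is_fence_line(lines[i]))
            in_fenced = !in_fenced;
    }
    return in_fenced;
}
```

When the item starts with a fence and the first line is skipped, the closing fence is mistaken for an opening one, so the scan believes it is still in code and never looks for the next list item.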

    Fixes https://github.com/hoedown/hoedown/issues/236.

    opened by gagern 0
  • Fenced code at start of numbered list item resets number at next item

    1. First item.
    
    1. ```
       Some code.
       More code.
       ```
    
       There is code here.
    
    1. Third item.
    

    The code above resets item numbering for the third item. In other words, it ends one <ol> and starts another after the second item, the one starting in the code block. I could reproduce this with hoedown --fenced-code using current HEAD, namely 980b9c549b4348d50b683ecee6abee470b98acda.

    https://spec.commonmark.org/0.29/#lists states that a list consists of consecutive list items of the same kind, so I believe that my expected behavior of one continued list is in line with the spec, and the implemented behavior is not. Things might be different if the There is code here line were considered to be after the list, but as it is still inside the <li> then the list item definitely gets treated as continuing till that line.

    Some experiments show that having a non-code line of text at the beginning of the item fixes the renumbering issue. An empty line as the first item of the numbered list does fix this as well, but as the space after the number is part of the list marker, that means a trailing space in the line with the number. I have written down these cases and the resulting Hoedown rendering in a gist.

    I'm actually experiencing this with Hoextdown, and will write a corresponding bug report there as well. Not sure how much exchange there is between these projects today.

    opened by gagern 0