# Standards compliant, fast, secure markdown processing library in C

# Hoedown

Hoedown is a revived fork of Sundown, the Markdown parser based on the original code of the Upskirt library by Natacha Porté.

## Features

• Fully standards compliant

Out of the box, Hoedown passes the official Markdown v1.0.0 and v1.0.3 test suites, and it has been extensively tested with additional corner cases to make sure its output is as sane as possible at all times.

• Massive extension support

Hoedown has optional support for several (unofficial) Markdown extensions, such as non-strict emphasis, fenced code blocks, tables, autolinks, strikethrough and more.

• UTF-8 aware

Hoedown is fully UTF-8 aware, both when parsing the source document and when generating the resulting (X)HTML code.

• Tested and ready for production use

Hoedown has been extensively security audited, and includes protection against all possible DoS attacks (stack overflows, out of memory situations, malformed Markdown syntax...).

We've worked very hard to make Hoedown never leak or crash under any input.

Warning: Hoedown doesn't validate or post-process the HTML in Markdown documents. Unless you use HTML_ESCAPE or HTML_SKIP, you should strongly consider using a good post-processor in conjunction with Hoedown to prevent client-side attacks.

• Customizable renderers

Hoedown is not stuck with XHTML output: the Markdown parser of the library is decoupled from the renderer, so it's trivial to extend the library with custom renderers. A fully functional (X)HTML renderer is included.

• Optimized for speed

Hoedown is written in C, with a special emphasis on performance. When wrapped in a dynamic language such as Python or Ruby, it has been shown to be up to 40 times faster than other native alternatives.

• Zero-dependency

Hoedown is a zero-dependency library composed of some .c files and their headers. No dependencies, no bullshit. Only standard C99 that builds everywhere.

Hoedown comes with a fully functional implementation of SmartyPants, a separate autolinker, escaping utilities, buffers and stacks.

## Bindings

You can see a community-maintained list of Hoedown bindings at the wiki. There is also a migration guide available for authors of Sundown bindings.

## Help us

Hoedown is all about security. If you find a (potential) security vulnerability in the library, or a way to make it crash through malicious input, please report it to us by emailing the private Hoedown Security mailing list. The Hoedown security team will review the vulnerability and work with you to reproduce and resolve it.

## Unicode character handling

Given that the Markdown spec makes no provision for Unicode character handling, Hoedown takes a conservative approach towards deciding which extended characters trigger Markdown features:

• Punctuation characters above the U+007F codepoint are not handled as punctuation. They are treated as normal, in-word characters for word-boundary checks.

• Whitespace characters above the U+007F codepoint are not considered whitespace. They are treated as normal, in-word characters for word-boundary checks.
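
As a rough illustration of the two rules above, here is a minimal sketch of an ASCII-only word-boundary check. The `demo_*` names are hypothetical and are not part of Hoedown's API; the point is only that every byte at or above 0x80 (that is, any byte of a multi-byte UTF-8 sequence) is classified as an in-word character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not Hoedown API): true only for ASCII whitespace. */
static int demo_is_space(uint8_t c)
{
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '\r' || c == '\f' || c == '\v';
}

/* Hypothetical helper (not Hoedown API): true only for ASCII punctuation. */
static int demo_is_punct(uint8_t c)
{
    return (c >= 0x21 && c <= 0x2F) || (c >= 0x3A && c <= 0x40) ||
           (c >= 0x5B && c <= 0x60) || (c >= 0x7B && c <= 0x7E);
}

/* A position is a word boundary if it is past the end of the input or
 * holds ASCII whitespace or punctuation. Bytes >= 0x80 never match, so
 * non-ASCII spaces and punctuation count as in-word characters. */
int demo_is_word_boundary(const uint8_t *data, size_t size, size_t i)
{
    assert(data != NULL);
    if (i >= size)
        return 1;
    return demo_is_space(data[i]) || demo_is_punct(data[i]);
}
```

With this sketch, the two bytes of a UTF-8 encoded U+00A0 (no-break space, `0xC2 0xA0`) are both reported as in-word, while an ASCII space is reported as a boundary.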

## Install

Just typing make will build Hoedown into a dynamic library and create the hoedown and smartypants executables, which are command-line tools to render Markdown to HTML and perform SmartyPants, respectively.

If you are using CocoaPods, just add the line pod 'hoedown' to your Podfile and call pod install.

Or, if you prefer, you can just throw the files at src into your project.

• #### Markdown AST inquiry

Gentlemen, first allow me to thank you for your effort with Hoedown. What you have done so far is truly remarkable!

I am running my own fork of Sundown that extends it in two major ways:

1. Adds three additional "rendering" hooks that allow me to build a parsed Markdown abstract syntax tree (AST) instead of rendering
2. Adds source maps so you can tell where in the Markdown source document the block being rendered comes from

These two "extensions" allow me to build the Markdown AST in memory so it can be processed later by other tools (e.g. the API Blueprint Parser in my case).

My question is: Would you care about such a contribution to the Hoedown project so it can be used to build Markdown ASTs?

If so, should Hoedown just support building Markdown ASTs thanks to sufficient hooks and source maps or should it also offer a full AST on its own?

Thank you for your consideration.

question
opened by zdne 64
• #### Source maps in Hoedown

Derived from #22.

The renderers should be given the position of the block they're rendering. An easy and low-level way to do that would be to pass a size_t pos as last argument to the callbacks where possible, indicating the position of the block in the input buffer.

That would however make callbacks longer and most of the time this feature isn't gonna be used.
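
To make the proposal concrete, here is a minimal sketch of what a position-aware callback might look like. Everything here is an assumption for illustration: `demo_block_cb`, `demo_parse_blocks` and `demo_record_pos` are hypothetical names, and the toy "parser" simply treats each non-empty line as a block rather than doing any real Markdown parsing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback shape: the usual text/size arguments, plus the
 * byte offset of the block in the input buffer as a trailing size_t. */
typedef void (*demo_block_cb)(const uint8_t *text, size_t size,
                              size_t pos, void *opaque);

/* Toy parser: every non-empty line is reported as one block together
 * with its offset in the input. Returns the number of blocks found. */
size_t demo_parse_blocks(const uint8_t *data, size_t size,
                         demo_block_cb cb, void *opaque)
{
    size_t i = 0, count = 0;

    assert(data != NULL && cb != NULL);
    while (i < size) {
        size_t start = i;
        while (i < size && data[i] != '\n')
            i++;
        if (i > start) {
            cb(data + start, i - start, start, opaque);
            count++;
        }
        if (i < size)
            i++; /* skip the newline */
    }
    return count;
}

/* Example callback: remembers the position of the last block seen. */
void demo_record_pos(const uint8_t *text, size_t size,
                     size_t pos, void *opaque)
{
    (void)text;
    (void)size;
    *(size_t *)opaque = pos;
}
```

A renderer that does not care about positions would simply ignore `pos`, which is the extra cost the post is worried about.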

question
opened by mildsunrise 36
• #### Import improvements from Lanli (and more things)

Lanli is an upcoming HTML sanitizer that will soon be published as a companion to Hoedown. Most of their code (and philosophy) is shared, so I'm importing the changes I've made in Lanli. (That forced me to do more improvements in the same commit, instead of splitting them into multiple PRs. Sorry about that, really.)

Overall, this PR greatly improves API and code consistency, adds basic documentation and some optimizations.

Here's a list of the changes:

• Documentation: added short description for each function in the API.

• Performance: hoedown_escape_html, hoedown_escape_href and parse_inline have been optimized and are slightly faster than before. (This made Lanli 10% faster, I don't know about Hoedown yet)

• API: following buffer_new, all functions that return a pointer to a newly allocated memory area must be declared with __attribute__ ((malloc)), in order to properly hint the compiler.

• API: hoedown_buffer_eq[s] and hoedown_buffer_set[s] have been added.

• Behaviour: implement a malloc wrapper as we said around #48. This allows us to further simplify the code while still being safe, so we can finally close #48.

• API: extern is no longer used in document.h and html.h as it's not needed.

• Building: files are now built in C99 mode instead of the default GNU89.

• Style: all public headers now closely follow this structure (note: tabs aren't expanded):

/* header.h - short description */

#ifndef HOEDOWN_HEADER_H
#define HOEDOWN_HEADER_H

#ifdef __cplusplus
extern "C" {
#endif

// [Platform-specific hacks]

/*************
* CONSTANTS *
*************/

#define HOEDOWN_CONSTANT 3.1415926535898

typedef enum hoedown_flag {
HOEDOWN_FLAG_ONE = (1 << 0),
HOEDOWN_FLAG_TWO = (1 << 1),
HOEDOWN_FLAG_THREE = (1 << 2)
} hoedown_flag;

typedef enum hoedown_enum {
HOEDOWN_ENUM_ONE,
HOEDOWN_ENUM_TWO,
HOEDOWN_ENUM_THREE
} hoedown_enum;

/*********
* TYPES *
*********/

typedef void *(*hoedown_pointer_type)(void *, size_t);

struct hoedown_type {
type *field;
type field;
other_type **field;
};
typedef struct hoedown_type hoedown_type;

/*************
* FUNCTIONS *
*************/

/* hoedown_function: description */
return_type *hoedown_function(parameter *one, parameter *two);

/* hoedown_function: description */
return_type *hoedown_function(parameter *one, parameter *two);

/* HOEDOWN_HELPER: description */
#define HOEDOWN_HELPER(param, param) \
DEFINITION

#ifdef __cplusplus
}
#endif

#endif /** HOEDOWN_HEADER_H **/


Sections may be omitted if they're empty.

• Style: source files, unlike headers, have no comment at the top. Exported functions in source files have no comment on them.

• Style: descriptions for functions must always be in the infinitive.

• Style: all files must end with a single newline.

• Style: the init function should be the first declared function in the header, followed by new.

• Formatting: trailing whitespace has been removed.

• API: these guidelines apply when naming instance functions:

• hoedown_instance_init initializes an already-allocated, uninitialized instance.
• hoedown_instance_new allocates and initializes a new instance.
• hoedown_instance_uninit uninitializes the instance, which is then ready for deallocation (or initialization).
• hoedown_instance_free uninitializes and deallocates the instance.
• hoedown_instance_reset resets the instance to its recently initialized state.
It's equivalent to calling uninit and then init, but faster.
• All other methods should only be called on initialized instances.

Following these guidelines, and for consistency with buffer, hoedown_stack_new has been renamed to hoedown_stack_init, and hoedown_stack_free to hoedown_stack_uninit.

• Behaviour: before, hoedown_stack_push always called hoedown_stack_grow to double the current stack size, resulting in repeated grows. Now, it only grows when there's no room for more items, as buffer does.

• API: exported methods now follow the const uint8_t *data, size_t size convention to accept input, as well as the hoedown_buffer *ob convention for output.

• API: typedef not only structs, but enums, and use the enum type instead of unsigned int.

• API: flags should have a plural name, as in hoedown_extensions or hoedown_html_flags. Regular enums should have a singular name, as in hoedown_html_tag or hoedown_action.

• Keep hoedown.def updated.
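
The instance-function naming guidelines and the revised push behaviour listed above could look roughly like the sketch below. The `demo_stack` type and its functions are hypothetical stand-ins written for this illustration, not the code from the PR; aborting on allocation failure is likewise an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct demo_stack {
    void **item;
    size_t size;  /* number of items in use */
    size_t asize; /* number of allocated slots */
} demo_stack;

/* demo_stack_init: initialize an already-allocated, uninitialized stack */
void demo_stack_init(demo_stack *st, size_t initial_size)
{
    assert(st != NULL);
    st->item = NULL;
    st->size = 0;
    st->asize = 0;
    if (initial_size) {
        st->item = malloc(initial_size * sizeof(void *));
        if (!st->item)
            abort(); /* assumption: allocation failure is fatal */
        st->asize = initial_size;
    }
}

/* demo_stack_uninit: release resources; the struct itself is not freed */
void demo_stack_uninit(demo_stack *st)
{
    assert(st != NULL);
    free(st->item);
    st->item = NULL;
    st->size = 0;
    st->asize = 0;
}

/* demo_stack_push: append an item, growing only when there is no room */
void demo_stack_push(demo_stack *st, void *item)
{
    assert(st != NULL);
    if (st->size == st->asize) {
        size_t new_asize = st->asize ? st->asize * 2 : 8;
        void **grown = realloc(st->item, new_asize * sizeof(void *));
        if (!grown)
            abort();
        st->item = grown;
        st->asize = new_asize;
    }
    st->item[st->size++] = item;
}
```

Because the caller owns the storage for the struct, `init`/`uninit` work equally well for stack-allocated and embedded instances, which is the distinction from `new`/`free` that the guidelines draw.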

opened by mildsunrise 23
• #### Link reference names aren't case insensitive with Unicode

Imported from vmg/sundown#138.

Reference names are case-insensitive, but Unicode characters are allowed in them. The spec says they must be case-insensitive, but doesn't say anything about Unicode characters allowed in them (in fact, no mention of Unicode is made in the whole spec).

Because Hoedown doesn't actually deal with Unicode codepoints, only ASCII letters are lowercased to do the match. So, in some cases the link is not matched:

See [Ñora][] at the Spanish wikipedia.

[ñora]: http://es.wikipedia.org/wiki/ñora


gives

<p>See [Ñora][] at the Spanish wikipedia.</p>


So we basically have two options:

• Mark this as wontfix and explicitly say that non-ASCII letters are case-sensitive in link names.
• Grab some UTF-8 library, and use it to lowercase the strings so we can match them.

I'd say the first, since the second is probably out of scope for Hoedown.
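
The mismatch described in this report can be reproduced with a small sketch of ASCII-only case folding. The function names are hypothetical, and this is a plain byte comparison rather than Hoedown's actual reference lookup; it only shows why `Ñ` (UTF-8 bytes `C3 91`) and `ñ` (`C3 B1`) never compare equal when only `A`-`Z` are lowercased.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lowercase ASCII letters only; every other byte is returned unchanged. */
static uint8_t demo_ascii_lower(uint8_t c)
{
    return (c >= 'A' && c <= 'Z') ? (uint8_t)(c + ('a' - 'A')) : c;
}

/* Hypothetical comparison (not Hoedown API): returns 1 if the two
 * reference names are equal after ASCII-only case folding, else 0. */
int demo_ref_name_eq(const uint8_t *a, size_t a_size,
                     const uint8_t *b, size_t b_size)
{
    size_t i;

    assert(a != NULL && b != NULL);
    if (a_size != b_size)
        return 0;
    for (i = 0; i < a_size; i++) {
        if (demo_ascii_lower(a[i]) != demo_ascii_lower(b[i]))
            return 0;
    }
    return 1;
}
```

Under this scheme `Foo` matches `fOO`, but `Ñora` does not match `ñora`, because the second byte of the first character differs and is outside the ASCII range.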

bug minor
opened by mildsunrise 22
• #### Executable should parse options

Derived from #19. It would be great if the executable parsed options such as extensions, rendering flags and renderers.

Example: hoedown --fenced-code-blocks --tables my.markdown

While that would increase the complexity of the code as an example, it'd show how to pass options to the parser / renderer.

Also, --version and --smartypants.
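
A minimal sketch of how such option parsing might map command-line switches to an extension bitmask is shown below. The `DEMO_EXT_*` values and the `demo_options` struct are made up for illustration and do not correspond to Hoedown's real flag definitions; only the option names come from the request above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values; Hoedown's real flags live in its headers. */
typedef enum demo_extensions {
    DEMO_EXT_FENCED_CODE = (1 << 0),
    DEMO_EXT_TABLES = (1 << 1)
} demo_extensions;

typedef struct demo_options {
    unsigned int extensions; /* OR-ed DEMO_EXT_* values */
    int smartypants;         /* set by --smartypants */
    int show_version;        /* set by --version */
    const char *filename;    /* last non-option argument, or NULL */
} demo_options;

/* Fill opts from argv. Returns 0 on success, -1 on an unknown option. */
int demo_parse_args(int argc, char **argv, demo_options *opts)
{
    int i;

    assert(argv != NULL && opts != NULL);
    opts->extensions = 0;
    opts->smartypants = 0;
    opts->show_version = 0;
    opts->filename = NULL;

    for (i = 1; i < argc; i++) {
        const char *arg = argv[i];
        if (strcmp(arg, "--fenced-code-blocks") == 0)
            opts->extensions |= DEMO_EXT_FENCED_CODE;
        else if (strcmp(arg, "--tables") == 0)
            opts->extensions |= DEMO_EXT_TABLES;
        else if (strcmp(arg, "--smartypants") == 0)
            opts->smartypants = 1;
        else if (strcmp(arg, "--version") == 0)
            opts->show_version = 1;
        else if (strncmp(arg, "--", 2) == 0)
            return -1; /* unknown option */
        else
            opts->filename = arg;
    }
    return 0;
}
```

The resulting bitmask would then be handed to whatever call creates the parser, which is the part an example executable is meant to demonstrate.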

enhancement
opened by mildsunrise 22
• #### Finish reorganization

Now that @devinus has done the base work, there are still some things to do, namely:

#### Code (3)

• [x] Prefix enums as well. Currently they start with MKD_ or HTML_.
• [x] Normalize guard names and comments on headers.
• [x] General cleanup.

#### Building and versioning (4)

• [x] Remove the html/ directory from Makefiles.
• [x] Makefile.win should be modified as well.
• [x] Add everything to hoedown.def.
• [x] Reset version.

#### Readme and licensing (5)

• [x] Correct a typo at README.
• [x] Review the README. Especially clarify the "bindings" part: all those bindings currently aren't Hoedown bindings.
• [x] Rewrite the "Install" section. The "it's just three files" part is not true anymore.
• [x] Does README's License match with LICENSE?
• [x] Update preambles where necessary.
opened by mildsunrise 22
• #### Support for GitHub flavored Markdown

This might be a tough one, but since GitHub left Sundown in the dust and Redcarpet is apparently now their Markdown parser of choice, my question is: What about possible future extensions to GFM?

Do you plan to reimplement such possible changes in Hoedown? Or should Hoedown support traditional Markdown only?

Again thanks for your time and effort!

opened by zdne 21
• #### Add hoedown_document_render_inline

Whether it's for short posts or full articles, Markdown is great. But sometimes, a full Markdown render is too much.

This pull request adds a companion to hoedown_document_render: it's hoedown_document_render_inline. As the name implies, the content is passed directly to parse_inline, so it gets parsed as if it was regular Markdown inside of a paragraph, for instance.

The preprocessing done on this new method is much simpler than that of a regular render:

• All spacing is converted to spaces, directly. This prevents parse_inline from interpreting a linebreak.
• No reference or footnote processing.
• No BOM is interpreted.
• No linefeed is added at the end.
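
The first preprocessing step in the list above can be sketched as follows. This is an assumption about what "all spacing is converted to spaces" means in practice (every ASCII whitespace byte becomes a single space, one for one); `demo_normalize_inline` is a hypothetical name and not the function added by the pull request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy size bytes from data to out, replacing each ASCII whitespace
 * byte (tab, LF, CR, VT, FF) with a plain space. The output has the
 * same length as the input; out must have room for size bytes. */
void demo_normalize_inline(const uint8_t *data, size_t size, uint8_t *out)
{
    size_t i;

    assert(data != NULL && out != NULL);
    for (i = 0; i < size; i++) {
        uint8_t c = data[i];
        if (c == '\t' || c == '\n' || c == '\r' || c == '\v' || c == '\f')
            out[i] = ' ';
        else
            out[i] = c;
    }
}
```

Since no newline survives this pass, an inline parser working on the result has nothing it could interpret as a line break.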

#### Use cases

You could use this in a Markdown-based commenting system similar to StackOverflow's (they call this "mini-markdown"), in a "Todo app", or on GitHub itself, for titles.

You'd use this whenever you have short strings of text, and you want to give them some basic formatting.

#### Examples

Input:     Some **inline** markdown here!
Output:    Some <strong>inline</strong> markdown here!

Input:     - This is *not* a list item.
Output:    - This is <em>not</em> a list item.

Input:     Autolinking. http://ddg.gg
Output:    Autolinking. <a href="http://ddg.gg">http://ddg.gg</a>

Input:     > This < would be interpreted as a `blockquote`.
Output:    &gt; This &lt; would be interpreted as a <code>blockquote</code>.

Input:     Because images in short comments are unacceptable, the image callback was set
to `NULL` in this example. ![image](http://something)
Output:    Because images in short comments are unacceptable, the image callback was set
to <code>NULL</code> in this example. !<a href="http://something">image</a>

opened by mildsunrise 20
• #### Every link between quotes is not rendered as anchor

"[IAB Guidelines](http://www.iab.net/guidelines/)" generates <q>[IAB Guidelines](http://www.iab.net/guidelines/)</q> instead of "IAB Guidelines".

bug
opened by dedalozzo 18
• #### Use of C99 features

From the README.md:

[...] standard C99 that builds everywhere.

If Hoedown is C99, then I suppose there should be no problem with using the bool type with #include <stdbool.h>. Would increase code readability and, you know, a bool takes less memory than an int.

According to this answer:

[...] will work only if you use C99 and it's the "standard way" to do it. Choose this if possible.

Should we transfer all boolean uses in Hoedown to bool? Would there be any problems of compatibility?

question
opened by mildsunrise 18
• #### MathJax support

Based on @uranusjr's work in #112.

From #100.

This implements HOEDOWN_EXT_MATH and HOEDOWN_EXT_MATH_DOLLAR. The former triggers char_math (from char_escape with \\[ and \\(, or from active char $ with $$), which parses the block and feeds the content and opening/ending tags to the renderer callback. The latter flag enables an extra math block syntax delimited with a single $. The renderer callback in hoedown_html_renderer outputs the tags and content of the block verbatim. Not sure whether I should trim and/or collapse spaces and newlines inside the block. It's irrelevant to MathJax.

opened by mildsunrise 15
• #### Support multiple references to the same footnote

# Heading

Some text with a footnote.[^1]

Some other text with the same footnote.[^1]

[^1]: The footnote


When rendered, the second paragraph renders the string literal "[^1]" instead of generating a second superscript link to the same footnote.

opened by yorickhenning 2
• #### Triple-quoted code breaks list

While triple backticks can be used to introduce a fenced code block, they may also be used to mark an inline code span. However, for the sake of list item formatting, this distinction is broken:

* First item

Triple quoted code

* Second list item

More such code
* Last list item


This ends the list after the first item, starting a new list for the second item. The bullet of the third item gets consumed into the body of the second list item. Taken together I get

<ul>
<li><p>First item</p>

<p><code>Triple quoted code</code></p></li>
</ul>

<ul>
<li><p>Second list item</p>

<p><code>More such code</code>
* Last list item</p></li>
</ul>

opened by gagern 0
• #### Detect fenced code block starting in first line of list item

Prior to this change, the in_fenced flag was not set correctly for the first line, and therefore inverted for every following line if the first line did start with a code block. Since the start of a subsequent list item is explicitly not detected while in fenced code, this essentially disabled the has_next_oli detection, leading to HOEDOWN_LI_END terminating not only the list item but the list as a whole.
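
The effect described above can be illustrated with a simplified sketch. This is not the patched Hoedown code: `demo_find_next_item` and its helpers are hypothetical, list markers are assumed to be a single bullet character followed by a space, and the `check_first_line` switch exists only to contrast the old and new behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* A line opens or closes a fence if, after optional leading spaces,
 * it starts with three tildes or three backticks. */
static int demo_is_fence(const char *line)
{
    while (*line == ' ')
        line++;
    return (line[0] == '~' || line[0] == '`') &&
           line[1] == line[0] && line[2] == line[0];
}

/* A line starts a list item if it begins with a bullet and a space. */
static int demo_is_bullet(const char *line)
{
    return (line[0] == '*' || line[0] == '-' || line[0] == '+') &&
           line[1] == ' ';
}

/* Return the index of the first line after line 0 that starts a new
 * list item, or n if none is found. Lines inside a fenced block never
 * count as list items. If check_first_line is 0, a fence opened on
 * line 0 is missed, so the flag is inverted for all following lines. */
size_t demo_find_next_item(const char *const *lines, size_t n,
                           int check_first_line)
{
    int in_fenced = 0;
    size_t i;

    assert(lines != NULL && n > 0 && demo_is_bullet(lines[0]));
    if (check_first_line && demo_is_fence(lines[0] + 2))
        in_fenced = 1;

    for (i = 1; i < n; i++) {
        if (demo_is_fence(lines[i])) {
            in_fenced = !in_fenced;
            continue;
        }
        if (!in_fenced && demo_is_bullet(lines[i]))
            return i;
    }
    return n;
}
```

For an item whose first line opens a fence, the sketch finds the following bullet only when the first line is taken into account; otherwise the closing fence is mistaken for an opening one and the next item is never detected.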

Fixes https://github.com/hoedown/hoedown/issues/236.

opened by gagern 0
• #### Fenced code at start of numbered list item resets number at next item

1. First item.

1. 
Some code.
More code.


There is code here.

1. Third item.


The code above resets item numbering for the third item. In other words, it ends one <ol> and starts another after the second item, the one starting in the code block. I could reproduce this with hoedown --fenced-code using current HEAD, namely 980b9c549b4348d50b683ecee6abee470b98acda.

https://spec.commonmark.org/0.29/#lists states that a list consists of consecutive list items of the same kind, so I believe that my expected behavior of one continued list is in line with the spec, and the implemented behavior is not. Things might be different if the There is code here line were considered to be after the list, but since it is still inside the <li>, the list item is clearly treated as continuing up to that line.

Some experiments show that having a non-code line of text at the beginning of the item fixes the renumbering issue. An empty line as the first item of the numbered list does fix this as well, but as the space after the number is part of the list marker, that means a trailing space in the line with the number. I have written down these cases and the resulting Hoedown rendering in a gist.

I'm actually experiencing this with Hoextdown, and will write a corresponding bug report there as well. Not sure how much exchange there is between these projects today.

opened by gagern 0