A markdown parser for tree-sitter

Overview

tree-sitter-markdown

Progress:

  • Leaf blocks
    • Thematic breaks
    • ATX headings
    • Setext headings
    • Indented code blocks
    • Fenced code blocks
    • HTML blocks
    • Link reference definitions
    • Paragraphs
    • Blank lines
    • Tables (extension)
  • Container blocks
    • Block quotes
    • List items
    • Task list items (extension) ironic
    • Lists
  • Inlines
    • Backslash escapes
    • Entity and numeric character references
    • Code spans
    • Emphasis and strong emphasis
    • Strikethrough (extension)
    • Links
    • Images
    • Autolinks
    • Autolinks (extension)
    • Raw HTML
    • Disallowed Raw HTML (extension)
    • Hard line breaks
    • Soft line breaks
    • Textual content
Comments
  • Error on markdown images

    If two consecutive lines have images the parser fails.

    Example

    ### test doc
    
    ![img1](link1)
    ![img2](link2)
    
    what's going on?
    ### why an error after two image links ?
    

    ast from TSPlayground

    atx_heading [0, 0] - [1, 0]
      atx_h3_marker [0, 0] - [0, 3]
      heading_content [0, 3] - [0, 12]
    ERROR [2, 0] - [7, 0]
      image [3, 0] - [3, 14]
        image_description [3, 2] - [3, 6]
        link_destination [3, 8] - [3, 13]
      ERROR [4, 0] - [5, 0]
      ERROR [6, 0] - [6, 3]
    

    However, the parser works if the newline between the two images is removed or a blank line is added between them.

    That is, both of the following parse correctly, while the example above does not:

    ![img1](link1)![img2](link2)
    
    ![img1](link1)
    
    ![img2](link2)
    
    invalid 
    opened by shadmansaleh 9
  • nvim becomes unresponsive when a fenced code block uses attributes

    Hi there, and thank you for your parser! Fixing this issue might be out of scope, as it is specific to Pandoc's extension of markdown, but since it leads to a crash / nvim becoming unresponsive I am reporting it anyway, because people might come across this by accident.

    Pandoc allows attributes to be added to block level and inline elements (https://pandoc.org/MANUAL.html#extension-attributes) using a syntax with curly braces (https://pandoc.org/MANUAL.html#extension-header_attributes). These attributes can also be added to (fenced) code blocks (https://pandoc.org/MANUAL.html#fenced-code-blocks), which is commonly used by literate programming formats combining computation and documentation such as Rmarkdown to note down the language of a code block for execution. Unfortunately this syntax leads to a crash.

    So while

    ```R
    1 + 1
    ```
    

    works,

    ```{R}
    1 + 1
    ```
    

    becomes very slow once it is typed out and gives warnings about the missing closing } while typing. This must be somehow related to how the info_string after the three backticks is parsed. If this could allow for curly braces it would help a lot of pandoc and Rmarkdown users.
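    As a rough illustration of what accepting curly braces could mean, the sketch below pulls a language name out of both the plain and the braced form of an info string. It is standalone Rust, not this parser's grammar or scanner; the function name `language_from_info_string` and the rules for skipping `#id` and `key=value` tokens are assumptions made for the example.

```rust
/// Extracts a language name from the info string that follows a code fence.
/// Accepts the plain form (`R`) and, as an assumption for illustration, the
/// braced attribute form (`{R}`, `{r, echo=FALSE}`, `{#id .haskell}`).
fn language_from_info_string(info: &str) -> Option<&str> {
    let trimmed = info.trim();
    // Strip one pair of surrounding braces, but only if both are present.
    let inner = trimmed
        .strip_prefix('{')
        .and_then(|rest| rest.strip_suffix('}'))
        .unwrap_or(trimmed);
    // Tokens are separated by whitespace or commas. Skip identifiers (`#id`)
    // and key-value attributes (`echo=FALSE`); the first remaining token is
    // taken to be the language.
    let first = inner
        .split(|c: char| c.is_whitespace() || c == ',')
        .find(|token| !token.is_empty() && !token.starts_with('#') && !token.contains('='))?;
    // Attribute classes are written with a leading dot, e.g. `{.haskell}`.
    let lang = first.trim_start_matches('.');
    if lang.is_empty() {
        None
    } else {
        Some(lang)
    }
}
```

    With this rule, `R` and `{R}` both yield `R`, so an injection query could treat the two fences from the report the same way. An unclosed brace is deliberately left untouched rather than treated as an error.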

    opened by jmbuhr 8
  • Error: query: invalid node type at position 152

    Hi again @MDeiml,

    I've run into a new error when using the markdown parser with the nightly; calling get_string_parser fails:

    vim.treesitter.get_string_parser("test", "cpp")              -- success
    vim.treesitter.get_string_parser("test", "markdown_inline")  -- success
    vim.treesitter.get_string_parser("test", "markdown")         -- FAIL
    

    With error (the runtime paths look odd because I'm using the nightly AppImage):

    E5108: Error executing lua ...488e/usr/share/nvim/runtime/lua/vim/treesitter/query.lua:174: query: invalid node type at position 152
    stack traceback:
            [C]: in function '_ts_parse_query'
            ...488e/usr/share/nvim/runtime/lua/vim/treesitter/query.lua:174: in function 'get_query'
            ...r/share/nvim/runtime/lua/vim/treesitter/languagetree.lua:35: in function 'get_string_parser'
            .../site/pack/vendor/start/ts-vimdoc.nvim/lua/ts-vimdoc.lua:23: in function 'docgen'
            [string ":lua"]:1: in main chunk
    
    opened by ibhagwan 7
  • Highlight group suggestions

    opened by pwntester 7
  • Incorrect formatting of `tick quoted text`

    Describe the bug

    Code example

    ### Search
    
    Search commands all operate on the `d` register by default. Use `"<char>` to operate on a different one.
    
    | Key | Description                                 | Command            |
    | --- | ------------------------------------------- | ------------------ |
    | `a` | Search for regex pattern                    | `search`           |
    | `?` | Search for previous pattern                 | `rsearch`          |
    | `n` | Select next search match                    | `search_next`      |
    | `N` | Select previous search match                | `search_prev`      |
    | `*` | Use current selection as the search pattern | `search_selection` |
    
    

    Expected behavior

    As GitHub displays above

    Actual behavior

    In Helix (using the latest commit e375ba95ff9a12418f9b9e7c190f549d08b5380a):

    Screenshot from 2022-09-29 09-42-39

    Please also see https://github.com/helix-editor/helix/issues/3849. Is it enough to update the Helix commit hash, or do the queries need modification?

    bug 
    opened by David-Else 6
  • Text in square brackets conceals like a link

    Text contained in square brackets is rendered as a link, despite the absence of a link destination in parentheses.

    Code example

    This [link] renders like a link when it should not.
    

    Expected behavior

    When no link destination is provided, square brackets render as text.

    Actual behavior

    When no link destination is provided, square brackets render as a link.

    bug wontfix 
    opened by storm 6
  • Feature request: add front matter support

    Rationale

    Markdown is a popular choice for writing content for static sites and documentation. Most existing tooling supports so-called "front matter", which attaches metadata to the document.

    See example https://gohugo.io/content-management/front-matter/

    Suggestion

    Front matter is neither part of the standard nor standardized among existing tooling. Nevertheless, the community seems to have settled on two front matter formats:

    • YAML is used with the --- separator, e.g.:

      ---
      tags: [foo, bar]
      summary: a short summary
      ---
      
      # My blog post
      
      ...
      
    • TOML is used with the +++ separator, e.g.:

      +++
      tags = ["foo", "bar"]
      summary = "a short summary"
      +++
      
      # My blog post
      
      ...
      

    It would be nice to support both YAML and TOML injections for these two most popular choices. I'm sure that would cover 99% of cases.
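    A minimal sketch of the delimiter rule described above, written as standalone Rust rather than as a tree-sitter grammar or injection query. The `FrontMatter` enum and `split_front_matter` function are hypothetical names; the sketch assumes the opening delimiter sits on the very first line and that the closing delimiter repeats it exactly.

```rust
/// Front matter content, tagged with the format implied by its delimiter.
#[derive(Debug, PartialEq)]
enum FrontMatter<'a> {
    Yaml(&'a str), // delimited by `---`
    Toml(&'a str), // delimited by `+++`
}

/// Splits a document into optional front matter and the remaining Markdown body.
fn split_front_matter(doc: &str) -> (Option<FrontMatter<'_>>, &str) {
    let mut lines = doc.split_inclusive('\n');
    let first = match lines.next() {
        Some(line) => line,
        None => return (None, doc), // empty input
    };
    let delimiter = first.trim_end();
    if delimiter != "---" && delimiter != "+++" {
        return (None, doc);
    }
    let start = first.len(); // byte offset where the front matter content begins
    let mut offset = start; // byte offset of the line currently being inspected
    for line in lines {
        if line.trim_end() == delimiter {
            let content = &doc[start..offset];
            let body = &doc[offset + line.len()..];
            let front_matter = if delimiter == "---" {
                FrontMatter::Yaml(content)
            } else {
                FrontMatter::Toml(content)
            };
            return (Some(front_matter), body);
        }
        offset += line.len();
    }
    // No closing delimiter: treat the whole input as ordinary Markdown.
    (None, doc)
}
```

    Returning the content as a tagged slice keeps the choice of YAML or TOML parser out of this step, which is roughly the split an injection would need: the delimiter selects the language, and the enclosed range is handed to that language's parser.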

    enhancement 
    opened by ikalnytskyi 6
  • Error: Ranges can only be made from 6 element long tables or nodes.

    Haven't investigated this fully yet but it seems to be related to the latest updates.

    I'm using this parser to generate vimdoc from markdown, I've narrowed it down to the part of code that's failing:

    this fails even with an empty file

      local parser = vim.treesitter.get_string_parser(
        -- contents of a sample markdown file consisting of 2 lines:
        -- title and body
        "# title\nbody\n\n",
        "markdown"
      )
      parser:parse()
    

    The call to parser:parse() fails with:

    Error detected while processing /home/bhagwan/test.lua:
    E5113: Error while calling lua chunk: /usr/share/nvim/runtime/lua/vim/treesitter/languagetree.lua:115:
    Ranges can only be made from 6 element long tables or nodes.
    stack traceback:
            [C]: in function 'set_included_ranges'
            /usr/share/nvim/runtime/lua/vim/treesitter/languagetree.lua:115: in function 'parse'
            /usr/share/nvim/runtime/lua/vim/treesitter/languagetree.lua:149: in function 'parse'
            /home/bhagwan/test.lua:3: in main chunk
    
    opened by ibhagwan 5
  • Bug: code block in quote block

    something like this will break the AST:

    > ```bash
    > git add --all
    > git commit -m "msg"
    > ```
    

    playground shows that:

    block_quote [0, 0] - [4, 0]
      block_quote_marker [0, 0] - [0, 2]
      fenced_code_block [0, 2] - [4, 0]
        info_string [0, 5] - [0, 9]
          language [0, 5] - [0, 9]
        block_quote_marker [1, 0] - [1, 2]
        code_fence_content [1, 2] - [3, 2]
          ERROR [1, 2] - [3, 1]
            command [1, 2] - [1, 15]
              name: command_name [1, 2] - [1, 5]
                word [1, 2] - [1, 5]
              argument: word [1, 6] - [1, 9]
              argument: word [1, 10] - [1, 15]
            command [2, 0] - [2, 21]
              file_redirect [2, 0] - [2, 5]
                destination: word [2, 2] - [2, 5]
              name: command_name [2, 6] - [2, 12]
                word [2, 6] - [2, 12]
              argument: word [2, 13] - [2, 15]
              argument: string [2, 16] - [2, 21]
          block_quote_marker [2, 0] - [2, 2]
          block_quote_marker [3, 0] - [3, 2]
    

    It seems this problem is very hard to fix.

    invalid 
    opened by black-desk 5
  • Link concealing?

    I've been using plasticboy/vim-markdown for markdown syntax highlighting and other features for years. I really missed its link concealing feature when I switched to tree-sitter-markdown today. Link concealing shows [text](link) as text and makes the whole document less cluttered, especially if the link is very long.

    Not sure if link concealing is within the scope of tree-sitter.

    invalid 
    opened by smartding 5
  • Table support?

    First, great job on this, and thank you for taking ownership of markdown in neovim treesitter :-)

    Are tables supported? I'm using treesitter to auto-generate vimdoc for my plugin from the README. I used to do this with https://github.com/ikatyang/tree-sitter-markdown, which recognized the text below as a table; this seems not to be the case here.

    screenshot-1639535526

    For reference, here's how it's parsed with the ikatyang: screenshot-1639536128

    enhancement 
    opened by ibhagwan 5
  • Inline nodes don't return parent

    Describe the bug

    When I try to get the parent of a node with type inline, I get nil. I am using the neovim method tsnode:parent(). It works for all nodes except inline ones.

    I am not sure though if the bug relates to this parser.

    Code example

    ### Test Heading
    
    - test list
    

    Expected behavior

    When you get a parent node of an inline element (text test list and Test Heading) you get paragraph and atx_heading nodes respectively

    Actual behavior

    You get nil

    bug 
    opened by sarmong 3
  • List the extra hl groups

    I'm using the monokai colorscheme, and it doesn't support the hl groups you use. Can you list the (probably) needed extra hl groups so that other colorschemes would know which to add? All or the ones you think are nonstandard for other programming languages. I just don't know where to look really. Thanks!

    opened by OfekShochat 2
  • add basic latex support

    I use Markdown (well, Kramdown I suppose) for writing blog posts involving math. For this reason, LaTeX injection makes the most sense for me when controlled by a pair of $$. I took a basic stab at adding this to the markdown-inline parser.

    Currently having an unclosed $$ is an error; I imagine this is not desirable?

    Unrelatedly, any hints about being able to test this parser in neovim, which complains about "invalid node type at position 21 for language markdown_inline" would be most welcome.

    opened by ryleelyman 1
  • Add install, usage and examples

    Hello,

    The readme is not very welcoming in my opinion. I'm not familiar with Rust and don't know how to install the package. It would also be great to have a simple usage example in the readme.

    enhancement 
    opened by sucrecacao 0
  • Faulty highlighting of inline comments

    Describe the bug

    Code example

    # A heading
    A text with<!-- an inline comment -->
    

    Expected behavior

    The comment is highlighted as a comment.

    Actual behavior

    The comment is highlighted as normal text. See this screenshot (with neovim): 20221018_17h14m44s_grim

    This seems similar but not quite the same as #36, though I didn't quite understand what the issue is there (and it's also marked as "invalid").

    Highlighting gets applied correctly when I remove the heading and also if there's a comment as the first element of the paragraph.

    bug 
    opened by jghauser 3
  • Rust user observations

    I've been testing both tree-sitter-md and tree-sitter-markdown over the past couple days and have the following observations. Note that each observation is standalone and they aren't listed in any particular order. Also note that my use case is general in that it is not specifically or only concerned with syntax highlighting but includes other uses such as general markdown parsing/processing similar to pulldown-cmark.

    1. While tree-sitter-md (this crate) is more recently updated and maintained than tree-sitter-markdown, and depends on tree-sitter 0.20, its grammar seems more basic than tree-sitter-markdown's. In particular:

      • Block node inline kinds are present, and there are inline node inline kinds, but there seems to be no kind for just the plain text (?), so this would require additional calculations to extract the plain text, or replace parts of the rendered content with the inline spans... (?)
      • No support for tables due to strict commonmark spec (?)
      • Unsure if there are others... (?)
    2. tree-sitter-md uses a 2-pass approach which forces walking those multiple trees. Perhaps the MarkdownTree struct could provide a method to walk all of the trees? tree-sitter-markdown doesn't work like this, and there aren't any drawbacks afaik, because whether a node is a block or inline can be determined by its kind. Here's an example of usage for this crate, but tree-sitter-markdown does not need the nested bit to walk the inline tree(s).

      let input = String::from("Markdown content here...");
      
      let mut parser = MarkdownParser::default();
      let tree = parser.parse(&input.as_bytes(), None).unwrap();
      
      // walk block tree...
      let mut cursor = tree.block_tree().walk();
      'outer: loop {
          let node = cursor.node();
      
          // do something with node...
      
          // walk inline tree...
          if let Some(inline_tree) = tree.inline_tree(&node) {
              let mut inline_cursor = inline_tree.walk();
              'inline_outer: loop {
                  let inline_node = inline_cursor.node();
      
                  // do something with inline_node...
      
                  if !inline_cursor.goto_first_child() {
                      if !inline_cursor.goto_next_sibling() {
                          loop {
                              if !inline_cursor.goto_parent() {
                                  break 'inline_outer;
                              }
                              if inline_cursor.goto_next_sibling() {
                                  break;
                              }
                          }
                      }
                  }
              }
          }
      
          if !cursor.goto_first_child() {
              if !cursor.goto_next_sibling() {
                  loop {
                      if !cursor.goto_parent() {
                          break 'outer;
                      }
                      if cursor.goto_next_sibling() {
                          break;
                      }
                  }
              }
          }
      }
      
    3. In my initial testing, parsing 2 large but slightly different files both took roughly the same amount of time (50 ms) for each file, which should not be the case because it should only be processing the changes (right?). Using tree-sitter-markdown, it took 50 ms for the first one and 4 ms for the second one. Perhaps I'm doing something wrong?

    4. Saw an error during cargo test: thread 'project::tests::doc_from_empty' panicked at 'Could not load injection query: QueryError { row: 0, column: 1, offset: 1, message: "inline", kind: NodeType }', tree-sitter-md-0.1.1/bindings/rust/lib.rs:113:14. I did not see this on a standalone attempt; it only cropped up when I combined both tree-sitter-md and tree-sitter-markdown into the same project, so would guess that is causing some specific conflict.
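    The nested loops in observation 2 repeat one pre-order traversal twice. One possible way to factor it out is sketched below: a single helper that drives any cursor exposing the three moves used in that example. The `Cursor` trait, `walk_preorder`, and the toy tree are hypothetical and exist only to make the sketch self-contained; none of them is part of tree-sitter-md's API.

```rust
/// The three cursor moves used by the traversal. Hypothetical trait for
/// illustration; tree-sitter's `TreeCursor` offers methods with these names.
trait Cursor {
    type Item;
    fn current(&self) -> Self::Item;
    fn goto_first_child(&mut self) -> bool;
    fn goto_next_sibling(&mut self) -> bool;
    fn goto_parent(&mut self) -> bool;
}

/// Visits every node in pre-order, starting from the cursor's position.
fn walk_preorder<C: Cursor>(cursor: &mut C, mut visit: impl FnMut(C::Item)) {
    loop {
        visit(cursor.current());
        if cursor.goto_first_child() || cursor.goto_next_sibling() {
            continue;
        }
        // Climb until an ancestor has an unvisited sibling, or stop at the root.
        loop {
            if !cursor.goto_parent() {
                return;
            }
            if cursor.goto_next_sibling() {
                break;
            }
        }
    }
}

// Toy tree and cursor, used only to exercise the helper.
struct ToyNode {
    kind: &'static str,
    children: Vec<ToyNode>,
}

struct ToyCursor<'a> {
    root: &'a ToyNode,
    path: Vec<usize>, // child indices leading from the root to the current node
}

impl<'a> ToyCursor<'a> {
    fn node_at(&self, path: &[usize]) -> &'a ToyNode {
        path.iter().fold(self.root, |node, &i| &node.children[i])
    }
}

impl<'a> Cursor for ToyCursor<'a> {
    type Item = &'static str;

    fn current(&self) -> &'static str {
        self.node_at(&self.path).kind
    }

    fn goto_first_child(&mut self) -> bool {
        let has_child = !self.node_at(&self.path).children.is_empty();
        if has_child {
            self.path.push(0);
        }
        has_child
    }

    fn goto_next_sibling(&mut self) -> bool {
        let (last, sibling_count) = match self.path.split_last() {
            Some((&last, parent_path)) => (last, self.node_at(parent_path).children.len()),
            None => return false, // the root has no siblings
        };
        if last + 1 < sibling_count {
            *self.path.last_mut().unwrap() = last + 1;
            true
        } else {
            false
        }
    }

    fn goto_parent(&mut self) -> bool {
        self.path.pop().is_some()
    }
}
```

    With a helper of this shape, the block walk and each inline walk would differ only in the closure passed as `visit`, which is roughly what a combined walking method on the tree struct would have to provide.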

    enhancement 
    opened by qtfkwk 3
Owner
Matthias Deiml