A TreeSitter parser for the Neorg File Format

Overview

NFF TreeSitter Parser

A TreeSitter grammar for Neorg.

Available Commands

Command      Result
yarn         installs needed dependencies (only do if you don't have tree-sitter in your path)
yarn gen     tree-sitter generate && node-gyp build
yarn test    tree-sitter test
  • npm can be used instead of yarn
  • With npm, the bare yarn command (no arguments) becomes npm install

Features

  • Has support for pretty much the entire specification
  • Has support for carryover tags
  • Can show errors (yes, it can show errors in a markdown-like format, crazy)
  • Isn't a massive editor hog

Drawbacks

  • Does not support attached modifiers (things like *this*).

❤️ Contribution

If you know a thing or two about TreeSitter and would like to support us by contributing, then please do! If you have any questions, you can ask away in the GitHub issues or on our Discord! The specification can be found in the docs/ directory in the Neorg repo.

Issues
  • Parser Rewrite


    Today's the day. The decision has been made. The parser - although cool, simply wasn't enough for a format like Neorg. Neorg needs a decent chunk of custom lexing. I will try and implement as many things as I can via regex rules and will only use the lexer as a last resort (aka the good way of doing things).

    Things we need:

    • If a token cannot be detected then mark it as a paragraph
    • Detecting *bold*, /italic/, _underline_, [these](links) and all the other modifiers present in the specification.
    • Properly identifying escape sequences
    • Distinguishing attached vs detached modifiers
    • Making sure the parser doesn't crash every 3.5ms :P
    • The ability for the parser to create a true structured syntax tree. For example, the current parser generates trees similar to this:
      heading1
        leading_whitespace
        heading1_prefix
        paragraph_segment
      paragraph
        paragraph_segment
            words
      

      Since the paragraph is usually underneath the heading you would logically want it to look like so:

        heading1
          etc.
          paragraph
              paragraph_segment
                 words
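
The nesting described above can be sketched as a pass over a flat list of nodes. This is only an illustration: the `Node` struct, `is_heading`, `nest_under_headings`, and the rule that a heading owns every sibling up to the next heading are assumptions made for the example, not how the parser or TreeSitter builds its tree.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical node type for illustration; TreeSitter's real nodes differ.
struct Node {
    std::string kind;
    std::vector<Node> children;
};

// True for kinds such as "heading1" or "heading2", but not "heading1_prefix".
inline bool is_heading(const Node& n) {
    const std::string prefix = "heading";
    if (n.kind.size() <= prefix.size() ||
        n.kind.compare(0, prefix.size(), prefix) != 0) {
        return false;
    }
    return std::all_of(n.kind.begin() + prefix.size(), n.kind.end(),
                       [](unsigned char c) { return std::isdigit(c) != 0; });
}

// Moves every non-heading node that follows a heading into that heading.
// Assumption: a heading owns all siblings up to the next heading.
inline std::vector<Node> nest_under_headings(const std::vector<Node>& flat) {
    std::vector<Node> result;
    for (const Node& n : flat) {
        if (!is_heading(n) && !result.empty() && is_heading(result.back())) {
            result.back().children.push_back(n);
        } else {
            result.push_back(n);
        }
    }
    return result;
}
```

Given a flat list of `heading1` followed by `paragraph`, the function returns a single `heading1` node whose children include the paragraph. Nodes that appear before the first heading stay at the top level.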
      

    Excited to start this new endeavor. I'll start proper work on the parser tomorrow. If you have any experience with TreeSitter and wanna hop along for the ride and help out then please do! :)

    opened by vhyrro 10
  • Error while compiling on Win10


    I am running windows, and I have this after following the documentation:

    Console output:

    [nvim-treesitter] [0/1] Downloading...
    [nvim-treesitter] [0/1] Checking out locked revision
    [nvim-treesitter] [0/1] Compiling...
    LINK : warning LNK4044: unrecognized option '/Z-reserved-lib-stdc++'; ignored
       Creating library parser.lib and object parser.exp
    parser-d99b79.o : error LNK2001: unresolved external symbol tree_sitter_norg_external_scanner_create
    parser-d99b79.o : error LNK2001: unresolved external symbol tree_sitter_norg_external_scanner_destroy
    parser-d99b79.o : error LNK2001: unresolved external symbol tree_sitter_norg_external_scanner_scan
    parser-d99b79.o : error LNK2001: unresolved external symbol tree_sitter_norg_external_scanner_serialize
    parser-d99b79.o : error LNK2001: unresolved external symbol tree_sitter_norg_external_scanner_deserialize
    parser.so : fatal error LNK1120: 5 unresolved externals
    nvim-treesitter[norg]: Error during compilation
    clang: error: linker command failed with exit code 1120 (use -v to see invocation)
    

    I am pretty sure it used to work before, but for some reason it no longer does, and I couldn't make sense of what the issue is:

    init.lua section:

    local parser_configs = require('nvim-treesitter.parsers').get_parser_configs()
    
    parser_configs.norg = {
        install_info = {
            url = "https://github.com/vhyrro/tree-sitter-norg",
            files = { "src/parser.c" },
            branch = "main"
        },
    }
    
    require'nvim-treesitter.configs'.setup {
        ensure_installed = 'all', -- one of "all", "maintained" (parsers with maintainers), or a list of languages
        ignore_install = { "fortran" }, -- List of parsers to ignore installing
        highlight = {
            enable = true -- false will disable the whole extension
        },
        playground = {
            enable = true,
            disable = {},
            updatetime = 25, -- Debounced time for highlighting nodes in the playground from source code
            persist_queries = false -- Whether the query persists across vim sessions
        },
        autotag = {enable = true},
        rainbow = {enable = true}
        -- refactor = {highlight_definitions = {enable = true}}
    }
    
    opened by kishikaisei 8
  • Macos startup errors


    Yo!

    I use doom-nvim, and .norg is not working on either my old Intel MacBook Pro or my new M1 MacBook Air.

    M1 startup error

    nvim-treesitter[norg]: Error during compilation
    Undefined symbols for architecture arm64:
      "_tree_sitter_norg_external_scanner_create", referenced from:
          _tree_sitter_norg.language in parser-29db6d.o
      "_tree_sitter_norg_external_scanner_deserialize", referenced from:
          _tree_sitter_norg.language in parser-29db6d.o
      "_tree_sitter_norg_external_scanner_destroy", referenced from:
          _tree_sitter_norg.language in parser-29db6d.o
      "_tree_sitter_norg_external_scanner_scan", referenced from:
          _tree_sitter_norg.language in parser-29db6d.o
      "_tree_sitter_norg_external_scanner_serialize", referenced from:
          _tree_sitter_norg.language in parser-29db6d.o
    ld: symbol(s) not found for architecture arm64
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    

    Intel startup error

    nvim-treesitter[norg]: Error during compilation
    Undefined symbols for architecture x86_64:
      "_tree_sitter_norg_external_scanner_create", referenced from:
          _language.0 in cc3MwxxW.o
      "_tree_sitter_norg_external_scanner_deserialize", referenced from:
          _language.0 in cc3MwxxW.o
      "_tree_sitter_norg_external_scanner_destroy", referenced from:
          _language.0 in cc3MwxxW.o
      "_tree_sitter_norg_external_scanner_scan", referenced from:
          _language.0 in cc3MwxxW.o
      "_tree_sitter_norg_external_scanner_serialize", referenced from:
          _language.0 in cc3MwxxW.o
    ld: symbol(s) not found for architecture x86_64
    collect2: error: ld returned 1 exit status
    
    opened by molleweide 6
  • Ranged elements


    Adds the ranged element syntax discussed on Discord.

    • [x] change spoiler syntax from | to !
    • [x] implement ranged attached modifiers
    You can now use the range-modifier `|` to write |*ranged markup like this
    which has the benefit
    
    of allowing arbitrary whitespace inside of it      *|.
    
    Ranged markup elements are placed into the paragraph in which they start.
    They can contain other in-line syntax like more markup, links, etc.
    No detached modifiers, headings, etc. are allowed inside of them.
    
    • [x] support things like |*/ ranged bold + italic /*|
    • [x] implement general ranged element frames
    - |
      If you want to include more elements inside of a single detached modifier you can use a `||` ranged element.
      This allows you to for example include code blocks inside list items:
    
      @code lua
      print("Hello world!")
      @end
      ---
    
    opened by mrossinek 5
  • [Question] Feasibility of using this parser as a basis for a markdown tree-sitter parser?


    I guess this is more an enquiry than an issue, but I was wondering how realistic it would be to use this parser as the base for a markdown parser. Norg markup seems to be closely related to markdown and markdown is currently missing a tree-sitter parser. A proper markdown parser would be very useful, especially if you have markdown documents with lots of code blocks (e.g. rmarkdown).

    opened by BlackEdder 4
  • Attached modifiers


    This adds support for attached modifiers.

    ⚠️ This is very much WIP!

    Open tasks:

    • [x] rewrite check_attached with int32 instead of unsigned char
    • [x] allow attached modifiers within headings (currently they immediately cause the heading to end)
    • [x] permit nested attached modifiers
    • this will likely come with some caveat of a limited nesting level support...

    We also still need to add the following new attached modifiers:

    • [x] +: inline comment
    • [x] $: inline math
    • [x] =: variable reference
    opened by mrossinek 3
  • [dev branch] Paragraph delimiters terminate parent headings instead of parent indent segments.


    Reproduction example:

    * Heading
      - \
          Text
          ---
    This text no longer belongs in the heading even though it should.
    

    The --- paragraph delimiter should end the indent segment, but it instead terminates the heading.

    opened by vhyrro 2
  • Cyrillic `н` letter bug


    TreeSitter treats the Cyrillic letter н (the analog of Latin n) at the beginning of a word as a NeorgMarkupVariableDelimiter. If another н appears near the end of a later word in the same paragraph, everything between the two letters becomes a variable.

    For example:

    Я тоже не знал, пока в моей жизни не появились проклятые рыбки. Жизнь аквариумиста - это один сплошной стресс.
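
One plausible explanation, not confirmed in this report: if the scanner narrows the lexer's 32-bit lookahead to an 8-bit character before comparing, н (U+043D) collapses to 0x3D, which is the code for `=`, the variable reference modifier. The sketch below only demonstrates that arithmetic; `matches_narrowed`, `matches_wide`, and `kCyrillicEn` are hypothetical names, not functions from `scanner.cc`.

```cpp
#include <cstdint>

// Hypothetical helper: compares after narrowing to 8 bits (lossy for non-ASCII).
constexpr bool matches_narrowed(std::int32_t lookahead, unsigned char expected) {
    return static_cast<unsigned char>(lookahead) == expected;
}

// Hypothetical helper: compares full 32-bit code points.
constexpr bool matches_wide(std::int32_t lookahead, std::int32_t expected) {
    return lookahead == expected;
}

constexpr std::int32_t kCyrillicEn = 0x043D;  // Cyrillic small letter en

// 0x043D modulo 256 is 0x3D, so the narrowed comparison sees '='.
static_assert(matches_narrowed(kCyrillicEn, '='),
              "narrowing makes the Cyrillic letter look like '='");
// Comparing whole code points keeps the two characters distinct.
static_assert(!matches_wide(kCyrillicEn, '='),
              "a 32-bit comparison does not confuse them");
```

Comparing whole code points avoids the collision, which is consistent with the earlier task of rewriting `check_attached` with int32 instead of unsigned char.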
    
    bug 
    opened by anuvyklack 2
  • Ignore unneeded data


    Not sure if this is removing too much data but we actually only care about:

    grammar.js
    src/scanner.cc
    

    Running tree-sitter generate produces src/parser.c which we also need for :TSInstall norg purposes (duh...)

    Please give this a try and let me know what you think

    opened by mrossinek 2
  • Compilation error during install


    I have the following in my vim config

    parser_configs.norg = {
      install_info = {
        url = "https://github.com/vhyrro/tree-sitter-norg",
        files = { "src/parser.c", "src/scanner.cc" },
        branch = "main",
      },
    }
    

    When I run :TSUpdate, I get this error:

    nvim-treesitter[norg]: Error during compilation
    src/scanner.cc:352:208: error: wrong number of arguments specified for ‘nodiscard’ attribute
      352 |      TokenType check_detached(TSLexer* lexer, const std::vector<TokenType>& results, const std::array<unsigned char, Size>& expected, std::pair<char, std::vector<TokenType>> terminate_at = { 0, NONE | NONE })
          |                                                                                                                                                                                                                ^
    

    From :checkhealth nvim_treesitter

    ## Installation
      - WARNING: `tree-sitter` executable not found (parser generator, only needed for :TSInstallFromGrammar, not required for :TSInstall)
      - OK: `node` found v16.4.2 (only needed for :TSInstallFromGrammar)
      - OK: `git` executable found.
      - OK: `cc` executable found. Selected from { vim.NIL, "cc", "gcc", "clang", "cl" }
      - OK: Neovim was compiled with tree-sitter runtime ABI version 13 (required >=13). Parsers must be compatible with runtime ABI.
    

    And cc --version

    cc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
    Copyright (C) 2019 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
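
A possible cause of the `nodiscard` error above, not confirmed in this report: the message form `[[nodiscard("...")]]` is a C++20 feature that GCC accepts from version 10 onward, while the compiler shown here is GCC 9.3. The sketch below shows one way to guard such an attribute. `NORG_NODISCARD` and `indentation_level` are hypothetical names for illustration, not code from `scanner.cc`.

```cpp
// Hypothetical macro: use the message form only where the compiler reports
// support for it (C++20), otherwise fall back to plain [[nodiscard]].
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(nodiscard) >= 201907L
#    define NORG_NODISCARD(msg) [[nodiscard(msg)]]
#  endif
#endif
#ifndef NORG_NODISCARD
#  define NORG_NODISCARD(msg) [[nodiscard]]
#endif

// Illustrative function; the two-spaces-per-level rule is an arbitrary choice.
NORG_NODISCARD("the indentation level should be used")
inline int indentation_level(int leading_spaces) {
    return leading_spaces / 2;
}
```

With this guard, older compilers see the plain attribute and newer ones keep the explanatory message.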
    
    opened by stevearc 2
  • [WIP] Formal Syntax Specification


    Rewrites the rather informal NFF spec (https://github.com/nvim-neorg/neorg/blob/main/docs/NFF-0.1-spec.md) to become a lot more formal to provide a ground-truth for parser implementations.

    opened by mrossinek 1
  • “:TSInstall norg” command stuck on compiling


    Someone from nvim-treesitter/nvim-treesitter recommended that I forward this issue here.

    Describe the bug

    I’m not able to install the norg parser. It never finishes compiling.

    To Reproduce

    1. Open up nvim
    2. Run :TSInstall norg

    Expected behavior

    I expected the norg parser to be installed within a few seconds

    Output of :checkhealth nvim-treesitter

    nvim-treesitter: require("nvim-treesitter.health").check()
    ========================================================================
    ## Installation
      - WARNING: `tree-sitter` executable not found (parser generator, only needed for :TSInstallFromGrammar, not required for :TSInstall)
      - OK: `node` found v14.18.1 (only needed for :TSInstallFromGrammar)
      - OK: `git` executable found.
      - OK: `cc` executable found. Selected from { vim.NIL, "cc", "gcc", "clang", "cl", "zig" }
        Version: gcc (GCC) 10.3.0
      - OK: Neovim was compiled with tree-sitter runtime ABI version 13 (required >=13). Parsers must be compatible with runtime ABI.

    ## Parser/Features H L F I J
      - nix            ✓ ✓ ✓ . ✓
      - clojure        ✓ ✓ ✓ . ✓

      Legend: H[ighlight], L[ocals], F[olds], I[ndents], In[j]ections
             +) multiple parsers found, only one will be used
             x) errors found in the query, try to run :TSUpdate {lang}
    

    Output of nvim --version

    NVIM v0.6.1
    Build type: Release
    LuaJIT 2.1.0-beta3
    Compiled by nixbld
    
    Features: +acl +iconv +tui
    See ":help feature-compile"
    
       system vimrc file: "$VIM/sysinit.vim"
      fall-back for $VIM: "/nix/store/29bl3zlp96r1w8d0ii3zcim1p9fdk1xm-neovim-unwrapped-0.6.1/share/nvim"
    
    Run :checkhealth for more info
    

    Additional context

    No response

    opened by alandao 1
  • Current development branch


    Consolidates the work on:

    • the formal syntax specification (#29)
    • the breaking changes and paragraph parsing refactoring (#30)
    • the inline link targets (#25)
    opened by mrossinek 0
  • Links that span multiple lines are treated as errors


    Although this syntax is disallowed:

    {
    * My Link
    }
    

    and

    [
    my description
    ]
    

    This syntax is allowed:

    {* My
    link}
    

    and

    [my
    description]
    

    But since both elements allow only a paragraph_segment, any newline is treated as an error, which breaks things like Neorg's hop module.
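
The distinction between the two cases can be sketched as a small check. This is inferred only from the four examples in this issue, not from the specification, and `is_valid_span` is a hypothetical helper: a delimited span may contain newlines, but not directly after the opening delimiter or directly before the closing one.

```cpp
#include <string>

// Hypothetical check inferred from the examples in this issue, not from the
// specification: newlines are fine inside the span, but not immediately
// after the opening delimiter or immediately before the closing delimiter.
inline bool is_valid_span(const std::string& text, char open, char close) {
    if (text.size() < 3 || text.front() != open || text.back() != close) {
        return false;  // needs both delimiters and at least one content character
    }
    const char first = text[1];
    const char last = text[text.size() - 2];
    return first != '\n' && last != '\n';
}
```

Under this rule `{* My\nlink}` and `[my\ndescription]` pass, while the two forms with the delimiters on their own lines are rejected.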

    opened by vhyrro 1
  • Compilation error during install


    I have the following in my vim config

    local parser_config = require 'nvim-treesitter.parsers'.get_parser_configs()
    
    parser_config.norg = {
      install_info =
        { url    = 'https://github.com/vhyrro/tree-sitter-norg'
        , files  = { 'src/parser.c', 'src/scanner.cc' }
        , branch = 'main'
      }
    }
    

    When I run TSUpdate, I get this error:

    nvim-treesitter[norg]: Error during compilation
    src/scanner.cc:97:39: error: expected expression
            return std::vector<TokenType>({ lhs, static_cast<TokenType>(rhs) });
                                          ^
    src/scanner.cc:100:27: warning: rvalue references are a C++11 extension [-Wc++11-extensions]
        std::vector<TokenType>&& operator|(std::vector<TokenType>&& lhs, TokenType rhs)
                              ^
    src/scanner.cc:100:62: warning: rvalue references are a C++11 extension [-Wc++11-extensions]
        std::vector<TokenType>&& operator|(std::vector<TokenType>&& lhs, TokenType rhs)
                                                                 ^
    src/scanner.cc:327:41: error: expected ';' at end of declaration list
        std::vector<size_t>& get_tag_stack() noexcept { return m_TagStack; }
                                            ^
                                            ;
    src/scanner.cc:556:27: warning: in-class initialization of non-static data member is a C++11 extension [-Wc++11-extensions]
        TokenType m_LastToken = NONE;
                              ^
    src/scanner.cc:559:26: warning: in-class initialization of non-static data member is a C++11 extension [-Wc++11-extensions]
        size_t m_ParsedChars = 0, m_IndentationLevel = 0;
                             ^
    src/scanner.cc:559:50: warning: in-class initialization of non-static data member is a C++11 extension [-Wc++11-extensions]
        size_t m_ParsedChars = 0, m_IndentationLevel = 0;
                                                     ^
    src/scanner.cc:565:54: warning: in-class initialization of non-static data member is a C++11 extension [-Wc++11-extensions]
        const std::array<int32_t, 6> s_DetachedModifiers = { '*', '-', '>', '|', '=', '~' };
                                                         ^
    src/scanner.cc:121:13: error: use of undeclared identifier 'advance'
                advance(lexer);
                ^
    src/scanner.cc:131:13: error: use of undeclared identifier 'advance'
                advance(lexer);
                ^
    src/scanner.cc:141:17: error: use of undeclared identifier 'advance'
                    advance(lexer);
                    ^
    src/scanner.cc:148:21: error: use of undeclared identifier 'advance'
                        advance(lexer);
                        ^
    src/scanner.cc:162:13: error: use of undeclared identifier 'advance'
                advance(lexer);
                ^
    src/scanner.cc:167:48: error: use of undeclared identifier 'm_Current'
                    if (lexer->lookahead == ']' && m_Current != '\\')
                                                   ^
    src/scanner.cc:169:21: error: use of undeclared identifier 'advance'
                        advance(lexer);
                        ^
    src/scanner.cc:177:17: error: use of undeclared identifier 'advance'
                    advance(lexer);
                    ^
    src/scanner.cc:184:20: error: use of undeclared identifier 'check_link'
                return check_link(lexer);
                       ^
    src/scanner.cc:188:13: error: use of undeclared identifier 'advance'
                advance(lexer);
                ^
    src/scanner.cc:206:17: error: use of undeclared identifier 'skip'
                    skip(lexer);
                    ^
    src/scanner.cc:211:17: error: use of undeclared identifier 'advance'
                    advance(lexer);
                    ^
    src/scanner.cc:217:21: error: use of undeclared identifier 'advance'
                        advance(lexer);
                        ^
    src/scanner.cc:220:25: error: use of undeclared identifier 'advance'
                            advance(lexer);
                            ^
    src/scanner.cc:223:29: error: use of undeclared identifier 'advance'
                                advance(lexer);
                                ^
    src/scanner.cc:227:37: error: use of undeclared identifier 'advance'
                                        advance(lexer);
                                        ^
    src/scanner.cc:252:25: error: use of undeclared identifier 'advance'
                            advance(lexer);
                            ^
    fatal error: too many errors emitted, stopping now [-ferror-limit=]
    6 warnings and 20 errors generated.
    

    From :checkhealth nvim_treesitter

      health#nvim_treesitter#check
      ========================================================================
      ## Installation
        - OK: `tree-sitter` found  0.20.0 (parser generator, only needed for :TSInstallFromGrammar)
        - OK: `node` found v16.8.0 (only needed for :TSInstallFromGrammar)
        - OK: `git` executable found.
        - OK: `cc` executable found. Selected from { vim.NIL, "cc", "gcc", "clang", "cl" }
        - OK: Neovim was compiled with tree-sitter runtime ABI version 13 (required >=13). Parsers must be compatible with runtime ABI.
    

    And cc --version

    Apple clang version 12.0.5 (clang-1205.0.22.11)
    Target: x86_64-apple-darwin20.6.0
    Thread model: posix
    InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
    
    opened by tdjordan 21