BNFLite is a C++ template library for lightweight flexible grammar parsers

Overview

About

BNFLite is a C++ template library for lightweight flexible grammar parsers. BNFLite offers creative approach when the developer can specify a language for further parsing directly in the C++ code. Moreover, such "specifications" are executable now!

Purpose

BNFLite is intended to parse:

  • command line arguments;
  • small configuration files;
  • output of different tools

Preface

Once the author participated in the development of some tool which was invented to integrate together a lot of proprietary modules. There were thousands combinations of command line options, poor examples, ambiguous docs. So the command line was not compatible from version to version. Up-to-date formal BNF specs of the command line language could help but not for projects with limited budget. Starting YACC era, there is a solution to support some extra executable code describing language specifics. As a rule, usage of such means is too heavy because it is extra stuff and it is not BNF. BNFLite does not have such drawbacks!

Usage

You just need to include bnflite.h in to your C++ application:

#include "bnflite.h"

Concept

BNF Notation

The BNF (Backus–Naur form) specifies rules of a context-free grammar. Each computer language should have a complete BNF syntactic specification. Formal BNF term is called "production rule". Each rule except "terminal" is a conjunction of a series of more concrete rules or terminals:

production_rule ::= ... | ... ;

For example:


   
     ::= <0> | <1> | <2> | <3> | <4> | <5> | <6> | <7> | <8> | <9>

    
      ::= 
     
       | 
       
        
       
      
     
    
   

which means that the number is just a digit or another number with one more digit.

Generally terminal is a symbol called "token". There are two types of productions rules: Lexical production is called "lexem". We will call syntax production rule as just a "rule".

BNFlite notation

All above can be presented in C++ friendly notation:

Lexem Digit = Token("0") | "1"  | "2" | "4" | "5" | "6" | "7" | "8" | "9"; //C++11: = "0"_T + "1" ...
LEXEM(Number) = Digit | Digit + Number;

These both expressions are executable due to this "bnflite.h" source code library which supports "Token", "Lexem" and "Rule" classes with overloaded "+" and "|" operators. More practical and faster way is to use simpler form:

Token Digit("01234567");
Lexem Number = Iterate(1, Digit);

Now e.g. bnf::Analyze(Number, "532") can be called with success.

ABNF Notation

Augmented BNF specifications introduce constructions like "* " to support repetition where and imply at least and at most occurrences of the element. For example, 3*3 allows exactly three and 1*2 allows one or two. Simplified construction * allows any number(from 0 to infinity). Alternatively 1* requires at least one. BNFLite offers to use the following constructions: Series(a, token, b); Iterate(a, lexem, b); Repeat(a, rule, b);

But BNFLite also supports ABNF-like forms:

*<3> - any 2 or 3 digit number */ Lexem I_DIGIT = 1*DIGIT; /* 1* any number */ Lexem O_DIGIT = *DIGIT; /* * - any number or nothing */ Lexem N_DIGIT = !DIGIT; /* <0>*<1> - one digit or nothing */``` ">
Token DIGIT("0123456789");
Lexem AB_DIGIT = DIGIT(2,3)  /* <2>*<3>
           
             - any 2 or 3 digit number */
Lexem I_DIGIT = 1*DIGIT;     /* 1*
            
              any number  */
Lexem O_DIGIT = *DIGIT;      /* *
             
               - any number or nothing */
Lexem N_DIGIT = !DIGIT;      /* <0>*<1>
              
                - one digit or nothing */```

              
             
            
           

So, you can almost directly transform ABNF specifications to BNFLite

User's Callbacks

To receive intermediate parsing results the callback system can be used. The first kind of callback can be used as expression element:

bool MyNumber(const char* number_string, size_t length_of_number) //...
Lexem Number = Iterate(1, Digit) + MyNumber;

The second kind of callback can be bound to production Rule. The user need to define own context type and work with it:

typedef Interface
      
        Usr;
Usr DoNothing(std::vector
       
        & usr) {  return usr[0]; }
//...
Rule Foo;
Bind(Foo, DoNothing);

       
      

Restrictions for Recursion in Rules

Lite version have some restrictions for rule recursion. You can not write:

Lexem Number = Digit | Digit + Number; /* failure */

because Number is not initialized yet in the expressions. You can use macro LEXEM for such constructions

LEXEM(Number) = Digit | Digit + Number;

that means

Lexem Number; Number = Digit | Digit + Number;

when parsing is finished (after Analyze call) you have to break recursion manually like this:

Number = Null();

Otherwise not all BNFlite internal objects will be released (memory leaks expected)

Design Notes

BNFlite is а class library. It is not related to the template Boost::Spirit library This is expendable approach, for example, the user can inherit public lib classes to create own constructions to parse and perform simultaneously. It fact, parser goes from implementation of domain specific language here.
The prior-art is rather "A BNF Parser in Forth".

Examples

  1. examples/cmd.cpp - simple command line parser
  2. examples/cfg.cpp - parser of restricted custom xml configuration
  3. examples/ini.cpp - parser of ini-files (custom parsing example to perform grammar spaces and comments)
  4. examples/calc.cpp - arithmetic calculator

$cd examples

$ g++ -I. -I.. calc.cpp

$ ./a.exe "2+(1+3)*2"

Result of 2+(1+3)*2 = 10

Examples have been tested on several msvc and gcc compilers.

Unit Test ( C-like expression parser and calculator )

  1. c_xprs/utest.cpp - simple unit test
  2. c_xprs/c_xprs.h - parser of C-like expressions

$cd c_xprs

$ g++ -I. -I.. utest.cpp

$ ./a.exe

Output result of several C-like expressions

Demo (simplest formula compiler & bite-code interpreter)

  • formula_compiler/main.cpp - starter of byte-code formula compiler and interpreter
  • formula_compiler/parser.cpp - BNF-lite parser with grammar section and callbacks
  • formula_compiler/code_gen.cpp - byte-code generator
  • formula_compiler/code_lib.cpp - several examples of embedded functions (e.g POW(2,3) - power: 222)
  • formula_compiler/code_run.cpp - byte-code interpreter (used SSE2 for parallel calculation of 4 formulas)

To build and run (remove option -march=pentium4 if it needed for arm or 64 build):

$ cd formula_compiler

$ g++ -O2 -march=pentium4 -std=c++14 -I.. main.cpp parser.cpp code_gen.cpp code_lib.cpp code_run.cpp

$ ./a.exe "2 + 3 *GetX()"

5 byte-codes in: 2 + 3 *GetX()

Byte-code: Int(2),Int(3),opCall<1>,opMul ,opAdd ;

result = 2, 5, 8, 11;

Note: The embedded function GetX() returns a sequential integer number started from 0. So, the result is four parallel computations: 2 + 3 * 0 = 2; 2 + 3 * 1 = 5; 2 + 3 * 2 = 8; 2 + 3 * 3 = 11.

Contacts

Alexander Semjonov : [email protected]

Contributing

If you have any idea, feel free to fork it and submit your changes back to me.

Donations

If you think that the library you obtained here is worth of some money and are willing to pay for it, feel free to send any amount through WebMoney WMID: 047419562122

Roadmap

  • Productize several approaches to catch syntax errors by means of this library (done in unit test)
  • Generate fastest C code parser from C++ BNFlite statements (..looking for customer)
  • Support wide characters (several approaches, need customer reqs, ..looking for customer)
  • Support releasing of ringed Rules (see "Restrictions for Recursion"), in fact the code exists but it is not "lite"

License

  • MIT
You might also like...
the implementations of 'A Flexible New Technique for Camera Calibration' and Bouguet's method

StereoCameraCalibration MonocularCameraCalibration/StereoCameraCalibration/StereoCameraRectification 1、Class "MonocularCameraCalibration" provides the

A terse, flexible language and runtime for creating and executing visual novels.

Fabulist A terse, flexible language and runtime for creating and executing visual novels. Contributing We're open to contributions from anyone and eve

VDBFusion: Flexible and Efficient TSDF Integration
VDBFusion: Flexible and Efficient TSDF Integration

VDBFusion: Flexible and Efficient TSDF Integration This is a small utility library that implements the VDBFusion algorithm, similar to TSDF-based reco

First open source android modding library for Geometry Dash Based on Hooking-and-Patching-android-template

Android-ML First open source android modding library for Geometry Dash Based on Hooking-and-Patching-android-template Installation Download this githu

free template opensource with minimal depends library flutter & dart

Project Name Sosial Media Donate ID: Jika Anda Menyukai karya saya dan ingin memberikan dana untuk saya membeli beberapa snack silahkan donasi seberap

Template library and blog that explain how JSI modules are built from scratch in React Native

react-native-jsi-template This is an example library that explains how anyone can build jsi modules from scratch in React Native. This code is written

C language simulation CXX template library

lib.xtl This is the implementation of a C language imitation cxx template library Explain This library simulates some operations in STL. Simulation ve

Xtl - eXtended Template Library

eXtended Template Library Open Hub Linux Windows Coverage Technical Debt Code Quality License Contribute with Gratipay Contribute with Beerpay View th

Generic Math Template Library

Generic Math Template Library

Comments
  • Used reserved names

    Used reserved names

    INTERNATIONAL STANDARD ISO/IEC14882:2011(E) Third edition 2011-09-01

    17.6.4.3.2 Global names [global.names] 1 Certain sets of names and function signatures are always reserved to the implementation: — Each name that contains a double underscore _ _ or begins with an underscore followed by an uppercase letter (2.12) is reserved to the implementation for any use. — Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.

    opened by aka-sps 1
  • std::function vs function pointer (was ::Non-portable: Converting a function pointer to an object pointer type or vice versa is conditionally-supported)

    std::function vs function pointer (was ::Non-portable: Converting a function pointer to an object pointer type or vice versa is conditionally-supported)

    INTERNATIONAL STANDARD ISO/IEC14882:2011(E) Third edition 2011-09-01

    5.2.10 Reinterpret cast [expr.reinterpret.cast]

    8 Converting a function pointer to an object pointer type or vice versa is conditionally-supported. The meaning of such a conversion is implementation-defined, except that if an implementation supports conversions in both directions, converting a prvalue of one type to the other type and back, possibly with different cv-qualification, shall yield the original pointer value.

    opened by aka-sps 1
Owner
Alexander S
I am a professional software developer with very strong background in mathematics, C/C++, Linux, excellent algorithms writing capacity and experience of working
Alexander S
Parsing Expression Grammar Template Library

Welcome to the PEGTL The Parsing Expression Grammar Template Library (PEGTL) is a zero-dependency C++ header-only parser combinator library for creati

The Art of C++ 1.6k Jan 8, 2023
This is a C plus plus coding template for Compitative programming. This template is very optimized for the Online Judgment

C-plusplus-compitative-Programming-Template Tech We Used C++ Features Easy to compile Easy debug facility Analysised and optimized base template Steps

Alan Binu 15 Jan 27, 2022
OpenGL Template Engine - a C++ OpenGL graphics engine which aimed to be a simple startup template for 3D OpenGL projects.

OpenGL Template Engine is a C++ OpenGL graphics engine which aimed to be a simple startup template for 3D OpenGL projects. This is the template I personally use for my own projects and provides me with the general OpenGL 3D render setup with model import and UI.

Marcus Nesse Madland 2 May 16, 2022
Tree-sitter grammar for comment tags like TODO, FIXME(user).

Tree-sitter grammar for comment tags like TODO:, FIXME(user):, etc. Useful to be embedded inside comments.

Santos Gallegos 86 Jan 9, 2023
tree-sitter grammar for emacs lisp

Tree-sitter Grammar for Emacs Lisp A simple tree-sitter grammar for elisp. Syntax supported: Atoms (integers, floats, strings, characters, symbols) Li

Wilfred Hughes 30 Dec 7, 2022
Flexible, portable, high-performance bit fields C++ library. unsigned a:13 becomes F<13> a;

C-plus-plus-library-bit-fields Flexible, portible, high-performance bit fields C++ library. The bit fields are specified with a dummy structure where

Walt Karas 27 Oct 12, 2022
A flexible image resampling library

Image Resampling Library This is a simple yet flexible image resampling library that supports the following image samplers: Nearest Average Bilinear G

null 39 Sep 15, 2022
foxBMS is a free, open and flexible development environment to design battery management systems.

foxBMS is a free, open and flexible development environment to design battery management systems. It is the first modular open source BMS development platform.

The foxBMS Team 100 Jan 10, 2023
FlexOS: Towards Flexible OS Isolation (ASPLOS'22) Artifact Evaluation Repository

FlexOS ASPLOS'22 Artifact Evaluation This repository contains the artifacts, including experiments and graphs, for the paper: FlexOS: Towards Flexible

null 12 Aug 24, 2022
☕ GDBFrontend is an easy, flexible and extensionable gui debugger.

GDBFrontend is an easy, flexible and extensionable gui debugger. Installing Requirements GDB => 8.2 (with python3) python3 => 3.2 tmux PIP Package (Py

Oğuzhan Eroğlu 2.5k Jan 1, 2023