Small Extremely Powerful Header Only C++ Lexical Analyzer/String Parser Library


lexpp

Lexpp is made with simplicity and size in mind. The entire library is about 500 lines!

Lexpp is very powerful and can be used for almost all parsing needs!

See the examples/ directory for more elaborate usage.

How to Use

Just place the lexpp.h file in your project include directory.

In exactly one .cpp file, define LEXPP_IMPLEMENTATION before including lexpp.h, like this:

#define LEXPP_IMPLEMENTATION
#include "lexpp.h"

You are now ready to use lexpp!

Basic Examples

String Parsing

std::string data = "some text to parse! ";
std::vector<std::string> tokens = lexpp::lex(data, " ;\n");

for(std::string& token : tokens){
    std::cout << token << std::endl;
}
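
To make the idea of splitting on a set of separator characters concrete, here is a minimal stand-alone sketch. It is not lexpp's implementation: `split_on_chars` is a hypothetical helper, and the choice to drop separators and empty pieces is an assumption made for illustration, so lexpp's actual output may differ.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of lexpp: splits `text` at any character
// found in `separators`, dropping the separators and any empty pieces.
std::vector<std::string> split_on_chars(const std::string& text,
                                        const std::string& separators) {
    std::vector<std::string> tokens;
    std::string current;
    for (char c : text) {
        if (separators.find(c) != std::string::npos) {
            // Separator reached: flush the token collected so far.
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        } else {
            current.push_back(c);
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);  // trailing token with no separator after it
    }
    return tokens;
}
```

Calling `split_on_chars(data, " ;\n")` on the string above would give the four words without the surrounding spaces, under the assumptions stated.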

Some more string parsing

std::string data = "some text to parse! ";
std::vector<std::string> tokens = lexpp::lex(data, {"<=", "<<", "\n", "::", ",", "}", "{", ";", " "}, false);

for(std::string& token : tokens){
    std::cout << token << std::endl;
}
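
When separators are whole strings such as "<=" or "::", a lexer has to decide which one wins when several match at the same position. The sketch below shows one common policy, longest match first. `split_on_strings` is a hypothetical helper written for this illustration; it is not taken from lexpp, and lexpp may resolve overlaps differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of lexpp: splits `text` at any of the given
// separator strings. When several separators match at the same position,
// the longest one is consumed. Separators and empty pieces are dropped.
std::vector<std::string> split_on_strings(const std::string& text,
                                          const std::vector<std::string>& separators) {
    std::vector<std::string> tokens;
    std::string current;
    std::size_t i = 0;
    while (i < text.size()) {
        // Length of the longest separator matching at position i (0 = none).
        std::size_t matched = 0;
        for (const std::string& sep : separators) {
            assert(!sep.empty());  // an empty separator would match everywhere
            if (sep.size() > matched && text.compare(i, sep.size(), sep) == 0) {
                matched = sep.size();
            }
        }
        if (matched > 0) {
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
            i += matched;  // skip the whole separator
        } else {
            current.push_back(text[i]);
            ++i;
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);
    }
    return tokens;
}
```

With this policy, listing both "<" and "<=" as separators never leaves a stray "=" behind, because the two-character separator is preferred whenever it matches.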

Using Custom Token Classifier

enum MyTokens{
    Keyword = 0,
    Number,
    String,
    Other
};

static std::string TokenToString(int tok){
    switch(tok){
        case Keyword: return "Keyword";
        case Number:  return "Number";
        case String:  return "String";
        case Other:   return "Other";
    }
    return "Unknown"; // reached only for values outside MyTokens
}

Now the Lexing

// `data` is the input string and is_number is a user-supplied helper; neither is defined in this snippet.
std::vector<std::string> keywords = {"for", "void", "return", "if", "int"};
std::vector<lexpp::Token> tokens = lexpp::lex(data, {"<=", "<<", "\n", "::", ",", "}", "{", "(", ")", ";", " "}, [keywords](std::string& token, bool* discard, bool is_separator) -> int {
    if(std::find(keywords.begin(), keywords.end(), token) != keywords.end()){
        return MyTokens::Keyword;
    }
    if(is_number(token))
        return MyTokens::Number;
    else
        return MyTokens::String;
}, false);

for(lexpp::Token& token : tokens){
    std::cout << TokenToString(token.type) << " -> " << token.value << std::endl;
}
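
The classifier above calls `is_number`, which is not defined in this section. A minimal sketch follows, assuming "number" means a non-empty run of decimal digits; signs, decimal points, and exponents are deliberately not handled, and the real examples may use a different definition.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true when `token` is a non-empty run of decimal digits.
// Assumption: no sign, decimal point, or exponent is accepted.
bool is_number(const std::string& token) {
    if (token.empty()) {
        return false;
    }
    // Iterate as unsigned char: std::isdigit is undefined for negative values.
    for (unsigned char c : token) {
        if (!std::isdigit(c)) {
            return false;
        }
    }
    return true;
}
```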

Using the TokenParser class

We need to extend the TokenParser class to create our custom token parser:

class MyTokenParser : public lexpp::TokenParser
{
public:
    MyTokenParser(std::string data, std::string separators)
    : TokenParser(data, separators, false){}

    virtual int process_token(std::string& token, bool* discard, bool isSeparator) override
    {
        if(std::find(keywords.begin(), keywords.end(), token) != keywords.end())
            return MyTokens::Keyword;
        else if(is_number(token))
            return MyTokens::Number;
        else if(isSeparator)
            return MyTokens::Other;
        else
            return MyTokens::String;
    }

    std::vector<std::string> keywords = {"for", "void", "return", "if", "int"};
};

Now using the class with the lexer

std::vector<lexpp::Token> tokens = lexpp::lex(std::make_shared<MyTokenParser>(data, "\n :,[]{}().\t"));
for(lexpp::Token& token : tokens){
    std::cout << TokenToString(token.type) << " -> " << token.value << std::endl;
}

Making an email parser with lexpp

First, a struct to store our data:

struct Email{
    std::string name;
    std::string domainFront;
    std::string domainEnd;
    std::string domain;
};

Now we need to make our custom token parser for email parsing

class EmailTokenParser : public lexpp::TokenParser
{
public:
EmailTokenParser(std::string data, std::string separators = "\n@.")
:TokenParser(data, separators, true){}

virtual int process_token(std::string& token, bool* discard, bool isSeparator) override
{
    if(isSeparator){
        if(ci == 2){
            currMail.domain = currMail.domainFront + "." + currMail.domainEnd;
            emailIds.push_back(currMail);
            ci = 0;
            *discard = true;
            return 0;  
        }
        if(token.size() <= 0){
            *discard = true;
            return 0;  
        }
        if(token == "\n"){
            ci = 0;
            *discard = true;
            return 0;  
        }
        else if(token == "@"){
            ci = 1;
            *discard = true;
            return 0;                
        }
        else if(token == "."){
            ci = 2;
            *discard = true;
            return 0;                
        }
    }

    if(ci == 0)
        currMail.name = token;
    else if(ci == 1)
        currMail.domainFront = token;
    else if(ci == 2)
        currMail.domainEnd = token;
    return 0; // the token type is unused here, but a non-void function must return a value
}

int ci = 0;
Email currMail;
std::vector<Email> emailIds;
};

Now, finally, calling lex:

std::shared_ptr<EmailTokenParser> tok_parser = std::make_shared<EmailTokenParser>(data+"\n", "\n@.");
lexpp::lex(tok_parser);
for(Email& email : tok_parser->emailIds){
    std::cout << "Email : \nNAME: " << email.name << "\nDOMAIN : " << email.domain << std::endl;
}
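
As a cross-check for the parser above, the same fields can be extracted by hand for a single address. This is a stand-alone sketch that does not use lexpp: `parse_email` is a hypothetical helper, the Email struct is repeated so the block compiles on its own, and it assumes input of the form name@front.end, with everything after the first dot treated as the domain end.

```cpp
#include <cstddef>
#include <string>

struct Email {
    std::string name;
    std::string domainFront;
    std::string domainEnd;
    std::string domain;
};

// Hypothetical helper, independent of lexpp. Fills `out` and returns true
// when `text` looks like name@front.end; returns false otherwise.
bool parse_email(const std::string& text, Email& out) {
    const std::size_t at = text.find('@');
    if (at == std::string::npos || at == 0) {
        return false;  // no '@', or nothing before it
    }
    const std::size_t dot = text.find('.', at + 1);
    if (dot == std::string::npos || dot == at + 1 || dot + 1 == text.size()) {
        return false;  // no dot after '@', or an empty part around the dot
    }
    out.name = text.substr(0, at);
    out.domainFront = text.substr(at + 1, dot - at - 1);
    out.domainEnd = text.substr(dot + 1);
    out.domain = out.domainFront + "." + out.domainEnd;
    return true;
}
```

Running both approaches on the same input and comparing the resulting name and domain fields is a quick way to check the token parser's state handling.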
Owner
Jaysmito Mukherjee