C XML Minimalistic Library (CXML) - An XML library for C with a focus on simplicity and ease of use.

Overview

cxml (C XML Minimalistic Library) is a powerful and flexible XML library for C with a focus on simplicity and ease of use, coupled with features that enable quick processing of XML documents.

cxml provides DOM and streaming interfaces for interacting with XML documents. This includes XPATH (1.0) support for simple and complex operations on the DOM; a built-in, simple and intuitive query language and an API for selection, creation, deletion and update operations (which may be used as an alternative to the XPATH API or in tandem with it); and a SAX-like interface for streaming large XML documents with no callback requirement. cxml works with any XML file encoded in an ASCII-compatible encoding (UTF-8, for example).

You should be able to use the library to process or extract data from an XML document almost effortlessly.

Note: cxml is a non-validating XML parser library. This means that DTD structures aren't used for validating the XML document. However, cxml enforces correct use of namespaces, and general XML well-formedness.

Quick Start

Say we have an XML file named "foo.xml", containing some tags/elements:

<bar>
    <bar>It's foo-bar!</bar>
    <bar/>
    <foo>This is a foo element</foo>
    <bar>Such a simple foo-bar document</bar>
    <foo/>
    <bar>So many bars here</bar>
    <bar>Bye for now</bar>
</bar>

foo.xml



Using XPATH


We can perform a simple XPATH operation that selects all bar elements that have a text child node and are also the first bar child of their parent (as an example).

#include <cxml/cxml.h>

int main(){
    // load/parse xml file (`false` ensures the file isn't loaded 'lazily')
    cxml_root_node *root = cxml_load_file("foo.xml", false);

    // using the xpath interface, select all bar elements.
    cxml_set *node_set = cxml_xpath(root, "//bar[text() and position()=1]");
    char *item;

    // display all selected "bar" elements
    cxml_for_each(node, &node_set->items){
        // get the string representation of the element found
        item = cxml_element_to_rstring(node);
        // we own this string, we must free.
        printf("%s\n", item);
        free(item);
    }
    // free root node
    cxml_destroy(root);
    // cleanup the set
    cxml_set_free(node_set);
    // it's allocated, so it has to be freed.
    free(node_set);

    return 0;
}

A large subset of XPATH 1.0 is supported. Check out this page for non-supported XPATH features.



Using CXQL


Suppose we only need the first "bar" element. We could still use the XPATH interface and take the first element in the returned node set. However, cxml ships with a built-in query language that makes this quite easy.

Using the query language:

#include <cxml/cxml.h>

int main(){
    // load/parse xml file
    cxml_root_node *root = cxml_load_file("foo.xml", false);

    // find 'bar' element
    cxml_element_node *elem = cxml_find(root, "<bar>/");

    // get the string representation of the element found
    char *str = cxml_element_to_rstring(elem);
    printf("%s\n", str);

    // we own this string, so we must free.
    free(str);

    // We destroy the entire root, which frees `elem` automatically
    cxml_destroy(root);

    return 0;
}


An example to find the first bar element containing text "simple":

#include <cxml/cxml.h>

int main(){
    // load/parse xml file
    cxml_root_node *root = cxml_load_file("foo.xml", false);

    cxml_element_node *elem = cxml_find(root, "<bar>/$text|='simple'/");

    char *str = cxml_element_to_rstring(elem);
    printf("%s\n", str);

    free(str);

    // We destroy the entire root, which frees `elem` automatically
    cxml_destroy(root);

    return 0;
}

In actuality, this selects the first bar element that has a text (child) node whose string-value contains "simple". The query language isn't limited to finding only "first" elements. Check out the documentation for more details on this.


Here's a quick example that pretty prints an XML document:

#include <cxml/cxml.h>

int main(){
    // load/parse xml file
    cxml_root_node *root = cxml_load_file("foo.xml", false);

    // get the "prettified" string
    char *pretty = cxml_prettify(root);
    printf("%s\n", pretty);

    // we own this string.
    free(pretty);

    // destroy root
    cxml_destroy(root);

    return 0;
}



Using SAX


The SAX API may be the least convenient, but can be rewarding for very large files.

Here's an example that uses the API to print every piece of text and the name of every element found in the XML document:

#include <cxml/cxml.h>

int main(){
    // create an event reader object ('true' allows the reader to close itself once all events are exhausted)
    cxml_sax_event_reader reader = cxml_stream_file("foo.xml", true);

    // event object for storing the current event
    cxml_sax_event_t event;

    // cxml string objects to store name and text
    cxml_string name = new_cxml_string();
    cxml_string text = new_cxml_string();

    while (cxml_sax_has_event(&reader)){ // while there are events available to be processed.
        // get us the current event
        event = cxml_sax_get_event(&reader);

        // check if the event type is the beginning of an element
        if (event == CXML_SAX_BEGIN_ELEMENT_EVENT)
        {
            // consume the current event by collecting the element's name
            cxml_sax_get_element_name(&reader, &name);
            printf("Element: `%s`\n", cxml_string_as_raw(&name));
            cxml_string_free(&name);
        }
        // or a text event
        else if (event == CXML_SAX_TEXT_EVENT)
        {
            // consume the current event by collecting the text data
            cxml_sax_get_text_data(&reader, &text);
            printf("Text: `%s`\n", cxml_string_as_raw(&text));
            cxml_string_free(&text);
        }
    }

    return 0;
}


Quick Questions

If you have small questions that you feel aren't worth opening an issue for, use cxml's discussions.

Tests and Examples

The tests folder contains the tests. See the examples folder for more examples and use cases.

Documentation

This is still a work in progress. See the examples folder for now.

Installation

Check out the installation guide for information on how to install, build or use the library in your project.

Dependencies

cxml depends only on the C standard library. All that is needed to build the library from source is a C11-compliant compiler.

Contributing

Your contributions are absolutely welcome! See the contribution guidelines to learn more. You can also check out the project architecture for a high-level description of the entire project. Thanks!

Reporting Bugs/Requesting Features

cxml is in its early stages, but under active development. Any bugs found can be reported by opening an issue (check out the issue template). Please be nice. Providing details for reproducibility of the bug(s) would help greatly in implementing a fix, or better still, if you have a fix, consider contributing. You can also open an issue if you have a feature request that could improve the library.

Project Non-goals

cxml started out as a little personal experiment, but along the way it has acquired many more features than I had initially envisioned. However, some things are not, and will not be, in scope for this project. Here are some of the non-goals:

  • Contain every possible feature (DTD validation, namespace well-formedness validation, etc.)
  • Be the most powerful/sophisticated XML library.
  • Be the "best" XML library.

To take full advantage of this library, you should have a good understanding of XML, including its dos and don'ts.

License

cxml is distributed under the MIT License.
