Header-only library providing Unicode-aware string support for C++

Overview

CsString

Introduction

CsString is a standalone library which provides Unicode-aware string support.

CsBasicString is a class template which provides Unicode-aware strings. The encoding, such as UTF-8 or UTF-16, is passed to CsBasicString as a template argument. The following typedefs are provided for convenience.

using CsString       = CsBasicString<utf8>;
using CsString_utf8  = CsBasicString<utf8>;
using CsString_utf16 = CsBasicString<utf16>;
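
The idea of passing the encoding as a template argument can be sketched roughly as follows. This is a minimal illustration and not CsString's actual implementation: `sketch_utf8`, `sketch_utf16`, `SketchBasicString`, `append()`, `length()`, and `unitCount()` are hypothetical names, and the only assumption taken from the library is that an encoding type exposes a `storage_unit`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoding policy for UTF-8: 8-bit storage units.
struct sketch_utf8 {
    using storage_unit = std::uint8_t;

    // Append the UTF-8 encoding of one code point.
    static void append(std::vector<storage_unit> &out, char32_t cp) {
        assert(cp <= 0x10FFFF);
        auto put = [&out](char32_t v) { out.push_back(static_cast<storage_unit>(v)); };
        if (cp < 0x80) {
            put(cp);
        } else if (cp < 0x800) {
            put(0xC0 | (cp >> 6));
            put(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            put(0xE0 | (cp >> 12));
            put(0x80 | ((cp >> 6) & 0x3F));
            put(0x80 | (cp & 0x3F));
        } else {
            put(0xF0 | (cp >> 18));
            put(0x80 | ((cp >> 12) & 0x3F));
            put(0x80 | ((cp >> 6) & 0x3F));
            put(0x80 | (cp & 0x3F));
        }
    }
};

// Hypothetical encoding policy for UTF-16: 16-bit storage units.
struct sketch_utf16 {
    using storage_unit = std::uint16_t;

    // Append the UTF-16 encoding of one code point (surrogate pair if needed).
    static void append(std::vector<storage_unit> &out, char32_t cp) {
        assert(cp <= 0x10FFFF);
        if (cp < 0x10000) {
            out.push_back(static_cast<storage_unit>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<storage_unit>(0xD800 | (cp >> 10)));
            out.push_back(static_cast<storage_unit>(0xDC00 | (cp & 0x3FF)));
        }
    }
};

// The string stores code units; the encoding policy decides how many
// units each code point occupies.
template <typename E>
class SketchBasicString {
public:
    void append(char32_t cp) { E::append(m_data, cp); ++m_length; }
    std::size_t length() const { return m_length; }          // code points
    std::size_t unitCount() const { return m_data.size(); }  // storage units

private:
    std::vector<typename E::storage_unit> m_data;
    std::size_t m_length = 0;
};

using SketchString_utf8  = SketchBasicString<sketch_utf8>;
using SketchString_utf16 = SketchBasicString<sketch_utf16>;
```

Appending U+00BF to both sketch types yields one code point in each, stored as two 8-bit units in the UTF-8 variant and one 16-bit unit in the UTF-16 variant.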

System Requirements

To use CsString you will need a C++17 compiler and a C++17 standard library.

CsString currently uses the CMake build system to build and run the unit test suite. The library has been tested with the clang sanitizer and has been through a major code review.

Documentation

Class level documentation for CsString is available on the CopperSpice website:

www.copperspice.com/docs/cs_string/index.html

Presentations

Multiple videos discussing Unicode and strings can be found on the following pages:

www.youtube.com/copperspice
www.copperspice.com/presentations.html

Authors / Contributors

  • Ansel Sermersheim
  • Barbara Geller

License

This library is released under the BSD 2-clause license. For more information refer to the LICENSE file provided with this project.

Comments
  • MSVC linker warning:

    LINK : warning LNK4044: unrecognized option '/Wl,--no-undefined'; ignored

    In CMake, the compiler ID is the most robust way to identify the compiler in use, e.g. $<$<CXX_COMPILER_ID:MSVC>:/wd4715>

    opened by ThomasKrenn 4
  • Inconsistent 'size_type' and 'int' usage

    MSVC warnings: 'argument': conversion from 'CsString::CsStringIterator<E,A>::size_type' to 'int'

    'walk' takes an int as its first argument, but it is called with a size_type, e.g. here:

    template <typename E, typename A>
    CsStringIterator<E,A> &CsStringIterator<E,A>::operator+=(size_type n)
    {
       m_iter += E::walk(n, m_iter);
       return *this;
    }

    while the function is declared as

    walk(int len, std::vector<storage_unit>::const_iterator iter)

    opened by ThomasKrenn 3
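
One common way to address this kind of warning without hiding a real overflow is to make the narrowing explicit and checked. The sketch below only illustrates that idea; `checked_narrow` is a hypothetical helper and not part of CsString, and whether `walk` should instead take a `size_type` is a design decision for the library's authors.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical helper: convert an unsigned size to int, throwing instead of
// silently truncating when the value does not fit.
inline int checked_narrow(std::size_t n) {
    if (n > static_cast<std::size_t>(std::numeric_limits<int>::max())) {
        throw std::out_of_range("size does not fit in int");
    }
    return static_cast<int>(n);
}
```
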
  • CsStringView dangling pointer?

    Over here: https://en.cppreference.com/w/cpp/string/basic_string_view they say:

    std::string_view good("a string literal");   // OK: "good" points to a static array
    std::string_view bad("a temporary string"s); // "bad" holds a dangling pointer

    The first example does not work with CsStringView.

    opened by ThomasKrenn 3
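
The lifetime rule behind the cppreference example can be illustrated with `std::string_view` alone. This sketch makes no claim about how CsStringView behaves; it only shows the standard-library pattern in which a view is safe because the data it points at outlives it. The names `first_word`, `from_literal`, and `from_owner` are hypothetical.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: returns a view into the caller's data, so the caller
// must keep that data alive for as long as the result is used.
inline std::string_view first_word(std::string_view text) {
    return text.substr(0, text.find(' '));
}

// Safe: a string literal has static storage duration, so the returned view
// stays valid after this function returns.
inline std::string_view from_literal() {
    std::string_view good("static literal");
    return first_word(good);
}

// Safe: the owning std::string outlives the view taken from it, and the
// result is copied into a new std::string before the owner is destroyed.
inline std::string from_owner() {
    std::string owner = "owned string";
    std::string_view view = first_word(owner);
    return std::string(view);
}
```

Returning `view` itself from `from_owner` would reproduce the dangling case, because `owner` is destroyed when the function returns.
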
  • Encodings do not check iterator limits

    Because the encodings do not check iterator limits, a seek can implicitly run past the boundaries of the data structure being iterated over, so invalid input could cause crashes. For example, seeking 3 characters forward in a string of 3 code units / 1 code point would exceed the string length and read undefined memory.

    opened by dascandy 3
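
A bounds-checked variant of such a seek could look roughly like this. It is a sketch under stated assumptions and not CsString's code: `walk_checked` is a hypothetical function, it handles UTF-8 only, and it reports running past the end by returning `false` rather than reading beyond the buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using storage_unit  = std::uint8_t;
using unit_iterator = std::vector<storage_unit>::const_iterator;

// Hypothetical: advance `iter` by `count` code points without passing `end`.
// Returns true on success; on failure `iter` is left unchanged.
inline bool walk_checked(unit_iterator &iter, unit_iterator end, std::size_t count) {
    unit_iterator pos = iter;

    for (std::size_t i = 0; i < count; ++i) {
        if (pos == end) {
            return false;                  // no code point left to skip
        }

        // Sequence length from the lead byte; anything else (ASCII or a
        // stray continuation byte) is stepped over as a single unit.
        const storage_unit lead = *pos;
        std::ptrdiff_t len = 1;
        if ((lead & 0xE0) == 0xC0) {
            len = 2;
        } else if ((lead & 0xF0) == 0xE0) {
            len = 3;
        } else if ((lead & 0xF8) == 0xF0) {
            len = 4;
        }

        if (end - pos < len) {
            return false;                  // truncated multi-unit sequence
        }
        pos += len;
    }

    iter = pos;
    return true;
}
```

For the case described in the issue, a string holding one code point in three code units, `walk_checked(iter, end, 3)` returns `false` and leaves `iter` where it was.
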
  • Documentation for constData() wrong?

    There seems to be an error in the documentation of the constData() method: it should have return type "const typename E::storage_unit *" and not "void".

    From https://www.copperspice.com/docs/cs_string/class_csstring_1_1csbasicstring.html#ac353b4a5bfcdfde79571314d373fbd04

    template<typename E, typename A = std::allocator> void CsString::CsBasicString< E, A >::constData() const

    Returns a pointer to the null terminated data stored in this CsBasicString. The pointer is valid until the string is modified.

    opened by PDHCoder 2
  • Request for Discussions Section

    Hi, I heard about you from my teacher, and I am happy to know you are there.

    Could you please open a Discussions section as a quick and safe way to learn about your projects?

    See you there

    opened by ghost 2
  • size_t and int usage and other issue

    CsString has some inconsistent use of int and size_t, e.g. here:

    int walk(int len, std::vector<storage_unit>::const_iterator iter)

    What is the idea behind using the namespace CsString? Does the statement "using namespace CsString;" not lead to name conflicts with the class?

    CsChar is not exported on Windows: class CsChar { ... }; in file cs_char.h

    Does the statement below create a temporary CsString?

    CsString::CsStringView text{ "hello" };

    If yes, then any further use of 'text' will cause problems.

    CMake: Visual Studio C++ does not have the -Wl option.

    The header files are not included in the project; adding them would be helpful for IDEs that support header files in the project (e.g. CLion, Visual Studio).

    Best regards Thomas Krenn

    opened by ThomasKrenn 2
  • Compile failure on storage on OSX build

    Not sure where else to put this information; feel free to just close it if it's in the wrong place.


    Here's what's installed

    12:10:13 [email protected]:~/src/cs_string (git:master:bf347f4) ruby-2.4.1 $ g++ --version
    Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
    Apple LLVM version 8.0.0 (clang-800.0.42.1)
    Target: x86_64-apple-darwin16.6.0
    Thread model: posix
    InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin


    Here's the build output

    $ make check
    /Applications/Xcode.app/Contents/Developer/usr/bin/make check-am
    /Applications/Xcode.app/Contents/Developer/usr/bin/make bin/test
    g++ -DPACKAGE_NAME="cs_string" -DPACKAGE_TARNAME="cs_string" -DPACKAGE_VERSION="1.0.0" -DPACKAGE_STRING="cs_string\ 1.0.0" -DPACKAGE_BUGREPORT="[email protected]" -DPACKAGE_URL="" -DPACKAGE="cs_string" -DVERSION="1.0.0" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=".libs/" -DSIZEOF_SIZE_T=8 -DQT_SHARED -I./src -DCS_STRING_ALLOW_UNSAFE -g -O2 -std=c++11 -MT test/bin_test-main.o -MD -MP -MF test/.deps/bin_test-main.Tpo -c -o test/bin_test-main.o test -f 'test/main.cpp' || echo './'test/main.cpp
    test/main.cpp:100:28: error: character too large for enclosing character literal type
    CsString::CsChar c127 = '¿';
    ^
    test/main.cpp:104:28: error: character too large for enclosing character literal type
    CsString::CsChar c256 = '↴';
    ^
    test/main.cpp:110:26: error: character too large for enclosing character literal type
    CsString::CsChar cX = '𝅘𝅥𝅮';
    ^
    3 errors generated.
    make[2]: *** [test/bin_test-main.o] Error 1
    make[1]: *** [check-am] Error 2
    make: *** [check] Error 2


    Remarks

    Given the structure of test_o2, it might be worthwhile separating these cases (CsString::CsChar c127 = '¿';) from the CsString::CsChar u127 = UCHAR('¿');. That is, collect the known bogosity into its own test.

    opened by doolin 2
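
The errors above arise because a plain character literal has type `char`, which cannot hold a character that needs more than one code unit in the literal's encoding. Standard C++ offers `char32_t` literals with the `U` prefix for this. The sketch below uses only that standard feature, written with universal character names so it does not depend on the source file encoding; it does not use CsChar or the `UCHAR` macro. The variable names mirror the test, and U+1D160 is an assumed code point for the third character.

```cpp
// A char32_t literal can represent any single Unicode code point.
constexpr char32_t c127 = U'\u00BF';      // INVERTED QUESTION MARK
constexpr char32_t c256 = U'\u21B4';      // RIGHTWARDS ARROW WITH CORNER DOWNWARDS
constexpr char32_t cX   = U'\U0001D160';  // assumed: MUSICAL SYMBOL EIGHTH NOTE

static_assert(c127 == 0xBF, "fits in one byte as a value, but needs two UTF-8 code units");
static_assert(c256 == 0x21B4, "needs three UTF-8 code units");
static_assert(cX == 0x1D160, "needs four UTF-8 code units");
static_assert(cX > 0xFFFF, "outside the Basic Multilingual Plane");
```
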