Simdutf - Unicode routines (UTF8, UTF16): billions of characters per second.

Overview

simdutf: Unicode validation and transcoding at billions of characters per second

Most modern software relies on the Unicode standard. In memory, Unicode strings are represented using either UTF-8 or UTF-16. The UTF-8 format is the de facto standard on the web (JSON, HTML, etc.) and it has been adopted as the default in many popular programming languages (Go, Rust, Swift, etc.). The UTF-16 format is standard in Java, C# and in many Windows technologies.

Not all sequences of bytes are valid Unicode strings. It is unsafe to use Unicode strings in UTF-8 and UTF-16LE without first validating them. Furthermore, we often need to convert strings from one encoding to another, by a process called transcoding. For security purposes, such transcoding should be validating: it should refuse to transcode incorrect strings.

This library provides fast Unicode functions such as

  • UTF-8 and UTF-16LE validation,
  • UTF-8 to UTF-16LE transcoding, with or without validation,
  • UTF-16LE to UTF-8 transcoding, with or without validation,
  • From a UTF-8 string, compute the size of the UTF-16 equivalent string,
  • From a UTF-16 string, compute the size of the UTF-8 equivalent string,
  • UTF-8 and UTF-16LE character counting.

The functions are accelerated using SIMD instructions (e.g., ARM NEON, SSE, AVX, etc.). When your strings contain hundreds of characters, we can often transcode them at speeds exceeding a billion characters per second. You should expect high speeds not only with English strings (ASCII) but also with Chinese, Japanese, Arabic strings, and so forth. We handle the full character range (including, for example, emojis).

The library compiles down to tens of kilobytes. Our functions are exception-free and non-allocating. We have extensive tests.

How fast is it?

Over a wide range of realistic data sources, we transcode a billion characters per second or more. Our approach can be 3 to 10 times faster than the popular ICU library on difficult (non-ASCII) strings. We can be 20x faster than ICU when processing easy strings (ASCII). Our good results apply to both recent x64 and ARM processors.

To illustrate, we present benchmark results with values given in billions of characters processed per second.

Datasets: https://github.com/lemire/unicode_lipsum

Please refer to our benchmarking tool for a proper interpretation of the numbers. Our results are reproducible.

Requirements

  • C++11 compatible compiler. We support LLVM clang, GCC, Visual Studio. (Our optional benchmark tool requires C++17.)
  • For high speed, you should have a recent 64-bit system (e.g., ARM or x64).
  • If you rely on CMake, you should use a recent CMake (at least 3.15); otherwise you may use the single header version. The library is also available from Microsoft's vcpkg.

Usage

We made a video to help you get started with the library.

Usage (CMake)

cmake -B build
cmake --build build
cd build
ctest .

Visual Studio users must specify whether they want to build the Release or Debug version.

To run benchmarks, execute the benchmark command. You can get help on its usage by first building it and then calling it with the --help flag. E.g., under Linux you may do the following:

cmake -B build
cmake --build build
./build/benchmarks/benchmark --help

Instructions are similar for Visual Studio users.

Since ICU is so common and popular, we assume that you may have it already on your system. When it is not found, it is simply omitted from the benchmarks. Thus, to benchmark against ICU, make sure you have ICU installed on your machine and that cmake can find it. For macOS, you may install it with brew using brew install icu4c. If you have ICU on your system but cmake cannot find it, you may need to provide cmake with a path to ICU, such as ICU_ROOT=/usr/local/opt/icu4c cmake -B build.

Single-header version

You can create a single-header version of the library where all of the code is put into two files (simdutf.h and simdutf.cpp). We publish a zip archive containing these files, e.g., see https://github.com/simdutf/simdutf/releases/download/v1.0.0/singleheader.zip

You may generate it on your own using a Python script.

python3 ./singleheader/amalgamate.py

We require Python 3 or better.

Under Linux and macOS, you may test it as follows:

cd singleheader
c++ -o amalgamation_demo amalgamation_demo.cpp -std=c++17
./amalgamation_demo

Example

Using the single-header version, you could compile the following program.

#include <cstdlib>
#include <iostream>
#include <memory>
#include <string>

#include "simdutf.cpp"
#include "simdutf.h"

int main(int argc, char *argv[]) {
  const char *source = "1234";
  // 4 == strlen(source)
  bool validutf8 = simdutf::validate_utf8(source, 4);
  if (validutf8) {
    std::cout << "valid UTF-8" << std::endl;
  } else {
    std::cerr << "invalid UTF-8" << std::endl;
    return EXIT_FAILURE;
  }
  // We need a buffer of the right size where to write the UTF-16LE words.
  size_t expected_utf16words = simdutf::utf16_length_from_utf8(source, 4);
  std::unique_ptr<char16_t[]> utf16_output{new char16_t[expected_utf16words]};
  // convert to UTF-16LE
  size_t utf16words =
      simdutf::convert_utf8_to_utf16(source, 4, utf16_output.get());
  std::cout << "wrote " << utf16words << " UTF-16LE words." << std::endl;
  // It wrote utf16words * sizeof(char16_t) bytes.
  bool validutf16 = simdutf::validate_utf16(utf16_output.get(), utf16words);
  if (validutf16) {
    std::cout << "valid UTF-16LE" << std::endl;
  } else {
    std::cerr << "invalid UTF-16LE" << std::endl;
    return EXIT_FAILURE;
  }
  // convert it back:
  // We need a buffer of the right size where to write the UTF-8 bytes.
  size_t expected_utf8words =
      simdutf::utf8_length_from_utf16(utf16_output.get(), utf16words);
  std::unique_ptr<char[]> utf8_output{new char[expected_utf8words]};
  // convert to UTF-8
  size_t utf8words = simdutf::convert_utf16_to_utf8(
      utf16_output.get(), utf16words, utf8_output.get());
  std::cout << "wrote " << utf8words << " UTF-8 words." << std::endl;
  std::string final_string(utf8_output.get(), utf8words);
  std::cout << final_string << std::endl;
  if (final_string != source) {
    std::cerr << "bad conversion" << std::endl;
    return EXIT_FAILURE;
  } else {
    std::cerr << "perfect round trip" << std::endl;
  }
  return EXIT_SUCCESS;
}

API

Our API is made of a few non-allocating functions. They typically take a pointer and a length as parameters, and they sometimes take a pointer to an output buffer. Users are responsible for memory allocation.

namespace simdutf {


/**
 * Validate the UTF-8 string.
 *
 * Overridden by each implementation.
 *
 * @param buf the UTF-8 string to validate.
 * @param len the length of the string in bytes.
 * @return true if and only if the string is valid UTF-8.
 */
simdutf_warn_unused bool validate_utf8(const char *buf, size_t len) noexcept;

/**
 * Validate the UTF-16LE string.
 *
 * Overridden by each implementation.
 *
 * This function is not BOM-aware.
 *
 * @param buf the UTF-16LE string to validate.
 * @param len the length of the string in number of 2-byte words (char16_t).
 * @return true if and only if the string is valid UTF-16LE.
 */
simdutf_warn_unused bool validate_utf16(const char16_t *buf, size_t len) noexcept;

/**
 * Convert possibly broken UTF-8 string into UTF-16LE string.
 *
 * The input string is validated during the conversion.
 * This function is suitable for working with inputs from untrusted sources.
 *
 * @param input         the UTF-8 string to convert
 * @param length        the length of the string in bytes
 * @param utf16_buffer  the pointer to buffer that can hold conversion result
 * @return the number of written char16_t; 0 if the input was not a valid UTF-8 string
 */
simdutf_warn_unused size_t convert_utf8_to_utf16(const char * input, size_t length, char16_t* utf16_buffer) noexcept;

/**
 * Convert valid UTF-8 string into UTF-16LE string.
 *
 * This function assumes that the input string is valid UTF-8.
 *
 * @param input         the UTF-8 string to convert
 * @param length        the length of the string in bytes
 * @param utf16_buffer  the pointer to buffer that can hold conversion result
 * @return the number of written char16_t
 */
simdutf_warn_unused size_t convert_valid_utf8_to_utf16(const char * input, size_t length, char16_t* utf16_buffer) noexcept;

/**
 * Compute the number of 2-byte words that this UTF-8 string would require in UTF-16LE format.
 *
 * This function does not validate the input.
 *
 * @param input         the UTF-8 string to process
 * @param length        the length of the string in bytes
 * @return the number of char16_t words required to encode the UTF-8 string as UTF-16LE
 */
simdutf_warn_unused size_t utf16_length_from_utf8(const char * input, size_t length) noexcept;

/**
 * Convert possibly broken UTF-16LE string into UTF-8 string.
 *
 * The input string is validated during the conversion.
 * This function is suitable for working with inputs from untrusted sources.
 *
 * This function is not BOM-aware.
 *
 * @param input         the UTF-16LE string to convert
 * @param length        the length of the string in 2-byte words (char16_t)
 * @param utf8_buffer   the pointer to buffer that can hold conversion result
 * @return number of written bytes; 0 if input is not a valid UTF-16LE string
 */
simdutf_warn_unused size_t convert_utf16_to_utf8(const char16_t * input, size_t length, char* utf8_buffer) noexcept;

/**
 * Convert valid UTF-16LE string into UTF-8 string.
 *
 * This function assumes that the input string is valid UTF-16LE.
 *
 * This function is not BOM-aware.
 *
 * @param input         the UTF-16LE string to convert
 * @param length        the length of the string in 2-byte words (char16_t)
 * @param utf8_buffer   the pointer to buffer that can hold the conversion result
 * @return number of written bytes; 0 if conversion is not possible
 */
simdutf_warn_unused size_t convert_valid_utf16_to_utf8(const char16_t * input, size_t length, char* utf8_buffer) noexcept;

/**
 * Compute the number of bytes that this UTF-16LE string would require in UTF-8 format.
 *
 * This function does not validate the input.
 *
 * This function is not BOM-aware.
 *
 * @param input         the UTF-16LE string to convert
 * @param length        the length of the string in 2-byte words (char16_t)
 * @return the number of bytes required to encode the UTF-16LE string as UTF-8
 */
simdutf_warn_unused size_t utf8_length_from_utf16(const char16_t * input, size_t length) noexcept;

/**
 * Count the number of code points (characters) in the string assuming that
 * it is valid.
 *
 * This function assumes that the input string is valid UTF-16LE.
 *
 * This function is not BOM-aware.
 *
 * @param input         the UTF-16LE string to process
 * @param length        the length of the string in 2-byte words (char16_t)
 * @return number of code points
 */
simdutf_warn_unused size_t count_utf16(const char16_t * input, size_t length) noexcept;

/**
 * Count the number of code points (characters) in the string assuming that
 * it is valid.
 *
 * This function assumes that the input string is valid UTF-8.
 *
 * @param input         the UTF-8 string to process
 * @param length        the length of the string in bytes
 * @return number of code points
 */
simdutf_warn_unused size_t count_utf8(const char * input, size_t length) noexcept;


}
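
For instance, counting code points with the functions above might look like this (a minimal sketch; the input is assumed to be UTF-8):

#include <cstdlib>
#include <cstring>
#include <iostream>

#include "simdutf.h"

int main() {
  // "café" encoded explicitly as UTF-8: 5 bytes, 4 code points.
  const char *text = "caf\xc3\xa9";
  size_t bytes = std::strlen(text);
  if (!simdutf::validate_utf8(text, bytes)) {
    return EXIT_FAILURE;
  }
  std::cout << bytes << " bytes, " << simdutf::count_utf8(text, bytes)
            << " code points" << std::endl;
  return EXIT_SUCCESS;
}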

Usage

The library is used by haskell/text.

License

This code is made available under the Apache License 2.0 as well as the MIT license.

We include a few competitive solutions under the benchmarks/competition directory. They are provided for research purposes only.

Comments
  • Iconv tool

    Fix #146 This PR adds a tool to directly use the simdutf library from the command line (only transcoding functions for now). In case of an unimplemented transcoding, we use iconv as a fallback. For now, the name of the tool is sutf, but I am open to a better name. The usage is similar to iconv: sutf [OPTION...] [-f encoding] [-t encoding] [inputfile ...], where the only option for now is to specify an output file with [-o output_file]. If no output file is specified, the output is redirected to standard output. There is also -h or --help and -l or --list to display the formats supported by simdutf. There can also be more than one input file.

    opened by NicolasJiaxin 74
  • Fuzzer for buffer overflow

    Fix #156 I am not entirely sure how to proceed, but this is my first fuzzer test to detect buffer overflows. It is not very sophisticated yet, but I just wanted to know if I had a good approach. From #92, at least one case of overflow occurred when trying to predict the output length with utf*_length_from_utf*. Also, I think buffer overflows can only happen with valid inputs. Right now, the fuzzer test I have is pretty much just a UTF-8 to UTF-16 test, but the idea would be to generate something mostly uniform and insert some irregularities (a single non-ASCII character in the middle of a long ASCII string, something like that). I could also add extensive random tests with valid and invalid inputs. Is this reasonable?
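
    A minimal libFuzzer-style harness along those lines might look as follows. This is only a sketch, not the project's actual fuzzer: it sizes the output buffer with utf16_length_from_utf8 (the prediction that #92 identified as a potential source of overflows) and relies on AddressSanitizer to flag any write past the predicted size.

    #include <cstddef>
    #include <cstdint>
    #include <memory>

    #include "simdutf.h"

    // Sketch: build with something like
    //   clang++ -std=c++17 -fsanitize=fuzzer,address fuzz_convert.cpp simdutf.cpp
    extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
      const char *input = reinterpret_cast<const char *>(data);
      if (!simdutf::validate_utf8(input, size)) {
        return 0; // this sketch only exercises the conversion path on valid inputs
      }
      // Allocate exactly the predicted number of char16_t; any write beyond it
      // is a heap-buffer-overflow that the sanitizer will report.
      size_t predicted = simdutf::utf16_length_from_utf8(input, size);
      std::unique_ptr<char16_t[]> output{new char16_t[predicted]};
      size_t written = simdutf::convert_utf8_to_utf16(input, size, output.get());
      if (written > predicted) { __builtin_trap(); } // prediction is exact for valid input
      return 0;
    }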

    opened by NicolasJiaxin 31
  • Add support of UTF-16BE

    UTF-16 comes in two flavours (BE, LE). Some of the time, it can be distinguished with a BOM, but often we must rely on the platform's default. There are effectively no big endian platforms left (other than legacy systems). I believe that UTF-16LE is effectively the default (e.g., under Windows).

    In time, we may want to support UTF-16BE. It involves a simple byte reversal.
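
    To make the "byte reversal" concrete, here is a scalar sketch (not part of the library) of the kind of helper that would turn UTF-16BE input into UTF-16LE, after which the existing UTF-16LE routines apply; a SIMD version would swap many code units per instruction.

    #include <cstddef>
    #include <cstdint>

    // Hypothetical helper: swap the two bytes of every 16-bit code unit in place.
    void utf16_swap_bytes(char16_t *data, size_t len) {
      for (size_t i = 0; i < len; i++) {
        uint16_t w = static_cast<uint16_t>(data[i]);
        data[i] = static_cast<char16_t>(static_cast<uint16_t>((w >> 8) | (w << 8)));
      }
    }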

    API Design 
    opened by lemire 26
  • AVX512 feature branch (legacy)

    This is a feature branch for AVX512 implementations. It will not be merged. Instead, we are working on https://github.com/simdutf/simdutf/pull/174 with the new (extended) simdutf API.

    For the time being, it will remain for benchmarking and testing purposes.

    opened by lemire 21
  • Preliminary version of the new UTF8=>UTF16 + some benchmarks

    This PR adds some UTF8=>UTF16 benchmarks and a new SSE based transcoder.

    The new transcoder seems empirically faster. The main reason I built it was to make it more portable. It should be "easily" ported to NEON and AVX, my next step.

    The general idea is to first index a block of data. Currently it works in "cache line" sizes. So you find out where the "continuation bytes" are (0b10______) in a 64-byte block, and that tells you right away whether you are dealing with ASCII. If you are, you can just quickly transcode ASCII => UTF16. There is a cheaper way to detect ASCII, but the nice thing here is that if you know where the "continuation bytes" are, you have all you need to know where the UTF-8 characters begin (see the sketch below). No need for another movemask.

    Furthermore, given that we are working in large blocks (say 64 bytes), then we can repeatedly call a decoding routine with the same mask computed once.

    This should be more portable because NEON lacks movemask, and requires many operations to emulate the same result. However, it does fine doing a movemask over a large block since you can amortize the cost somewhat.
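
    As a rough illustration of the indexing step (a sketch under my reading of the description above, not the PR's code): with SSE2 you can build a 64-bit bitmap of continuation bytes for a 64-byte block using four movemasks; a zero bitmap means the block is pure ASCII, and otherwise the bitmap tells you where each character starts.

    #include <immintrin.h>
    #include <cstdint>

    // Mark the continuation bytes (0b10xxxxxx) of a 64-byte block in a 64-bit bitmap.
    static inline uint64_t continuation_bitmap(const uint8_t *block64) {
      const __m128i top2 = _mm_set1_epi8(static_cast<char>(0xC0));
      const __m128i cont = _mm_set1_epi8(static_cast<char>(0x80));
      uint64_t bitmap = 0;
      for (int i = 0; i < 4; i++) {
        __m128i in = _mm_loadu_si128(reinterpret_cast<const __m128i *>(block64 + 16 * i));
        // A byte is a continuation byte when its two top bits are 10.
        __m128i is_cont = _mm_cmpeq_epi8(_mm_and_si128(in, top2), cont);
        bitmap |= static_cast<uint64_t>(static_cast<uint32_t>(_mm_movemask_epi8(is_cont))) << (16 * i);
      }
      return bitmap; // zero means the whole block is ASCII
    }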

    Currently, the haswell kernel just contains my previous UTF8=>UTF16 SSE transcoder while the westmere kernel contains the new UTF8=>UTF16 SSE transcoder. Here are the numbers:

    $  ./build/benchmarks/benchmark -P convert -F benchmarks/dataset/wikipedia_mars/chinese.txt 
    testcases: 1
    convert_valid_utf8_to_utf16+fallback, input size: 75146, iterations: 100, 
      18.842 ins/byte,    3.395 GHz,    0.797 GB/s 
    convert_valid_utf8_to_utf16+haswell, input size: 75146, iterations: 100, 
       2.989 ins/byte,    3.398 GHz,    1.673 GB/s 
    convert_valid_utf8_to_utf16+westmere, input size: 75146, iterations: 100, 
       3.253 ins/byte,    3.401 GHz,    2.865 GB/s 
    $ ./build/benchmarks/benchmark -P convert -F benchmarks/dataset/wikipedia_mars/english.txt 
    testcases: 1
    convert_valid_utf8_to_utf16+fallback, input size: 181798, iterations: 100, 
      21.947 ins/byte,    3.395 GHz,    0.836 GB/s 
    convert_valid_utf8_to_utf16+haswell, input size: 181798, iterations: 100, 
       1.192 ins/byte,    3.405 GHz,    8.843 GB/s 
    convert_valid_utf8_to_utf16+westmere, input size: 181798, iterations: 100, 
       1.175 ins/byte,    3.407 GHz,    8.835 GB/s 
    $ ./build/benchmarks/benchmark -P convert -F benchmarks/dataset/wikipedia_mars/french.txt 
    testcases: 1
    convert_valid_utf8_to_utf16+fallback, input size: 245549, iterations: 100, 
      21.790 ins/byte,    3.394 GHz,    0.703 GB/s 
    convert_valid_utf8_to_utf16+haswell, input size: 245549, iterations: 100, 
       2.226 ins/byte,    3.395 GHz,    1.299 GB/s 
    convert_valid_utf8_to_utf16+westmere, input size: 245549, iterations: 100, 
       3.707 ins/byte,    3.396 GHz,    2.045 GB/s 
    
    opened by lemire 20
  • Making the AVX-512 UTF-8 to UTF-16 transcoder more branchy

    In this PR (to be applied to https://github.com/simdutf/simdutf/pull/97), we add a few more branches to help the performance of some Asian inputs. The net result is a slight loss of performance all around, with a nice gain in the few cases where our haswell kernel was slower. This might prove to be an interesting compromise since with this new code, the icelake kernel is always faster than haswell, except maybe on emoji inputs.

    The gist of the idea is to check when our expanded 512-bit registers contain too few valid characters and to merge two successive expanded 512-bit registers when we detect such a scenario. When this branch is not taken, we pay for some arithmetic and a branch not taken. However, when the branch is fruitful, the gains are interesting: from 3.3 GB/s to 3.9 GB/s over some Japanese file (a 15% boost).

    ===========================
    Using ICU version 70.1
    testcases: 1
    input detected as UTF8
    current system detected as icelake
    ===========================
    convert_utf8_to_utf16+haswell, input size: 69840, iterations: 10000, dataset: unicode_lipsum/lipsum/Chinese-Lipsum.utf8.txt
       3.851 GB/s (1.3 %)    1.294 Gc/s     2.98 byte/char
    convert_utf8_to_utf16+icelake, input size: 69840, iterations: 10000, dataset: unicode_lipsum/lipsum/Chinese-Lipsum.utf8.txt
       3.331 GB/s (1.4 %)    1.119 Gc/s     2.98 byte/char
    We define the number of bytes to be the number of *input* bytes.
    We define a 'char' to be a code point (between 1 and 4 bytes).
    ===========================
    Using ICU version 70.1
    testcases: 1
    input detected as UTF8
    current system detected as icelake
    ===========================
    convert_utf8_to_utf16+haswell, input size: 67808, iterations: 10000, dataset: unicode_lipsum/lipsum/Japanese-Lipsum.utf8.txt
       3.578 GB/s (1.3 %)    1.233 Gc/s     2.90 byte/char
    convert_utf8_to_utf16+icelake, input size: 67808, iterations: 10000, dataset: unicode_lipsum/lipsum/Japanese-Lipsum.utf8.txt
       3.338 GB/s (0.9 %)    1.151 Gc/s     2.90 byte/char
    We define the number of bytes to be the number of *input* bytes.
    We define a 'char' to be a code point (between 1 and 4 bytes).
    ===========================
    Using ICU version 70.1
    testcases: 1
    input detected as UTF8
    current system detected as icelake
    ===========================
    convert_utf8_to_utf16+haswell, input size: 66600, iterations: 10000, dataset: unicode_lipsum/lipsum/Korean-Lipsum.utf8.txt
       2.206 GB/s (1.1 %)    0.899 Gc/s     2.45 byte/char
    convert_utf8_to_utf16+icelake, input size: 66600, iterations: 10000, dataset: unicode_lipsum/lipsum/Korean-Lipsum.utf8.txt
       3.340 GB/s (0.9 %)    1.361 Gc/s     2.45 byte/char
    We define the number of bytes to be the number of *input* bytes.
    We define a 'char' to be a code point (between 1 and 4 bytes).
    

    After:

    ===========================
    Using ICU version 70.1
    testcases: 1
    input detected as UTF8
    current system detected as icelake
    ===========================
    convert_utf8_to_utf16+haswell, input size: 69840, iterations: 10000, dataset: unicode_lipsum/lipsum/Chinese-Lipsum.utf8.txt
       3.838 GB/s (1.6 %)    1.289 Gc/s     2.98 byte/char
    convert_utf8_to_utf16+icelake, input size: 69840, iterations: 2000, dataset: unicode_lipsum/lipsum/Chinese-Lipsum.utf8.txt
       3.955 GB/s (0.8 %)    1.329 Gc/s     2.98 byte/char 
    We define the number of bytes to be the number of *input* bytes.
    We define a 'char' to be a code point (between 1 and 4 bytes).
    ===========================
    Using ICU version 70.1
    testcases: 1
    input detected as UTF8
    current system detected as icelake
    ===========================
    convert_utf8_to_utf16+haswell, input size: 67808, iterations: 10000, dataset: unicode_lipsum/lipsum/Japanese-Lipsum.utf8.txt
       3.587 GB/s (1.5 %)    1.236 Gc/s     2.90 byte/char
    convert_utf8_to_utf16+icelake, input size: 67808, iterations: 2000, dataset: unicode_lipsum/lipsum/Japanese-Lipsum.utf8.txt
       3.932 GB/s (0.9 %)    1.355 Gc/s     2.90 byte/char 
    We define the number of bytes to be the number of *input* bytes.
    We define a 'char' to be a code point (between 1 and 4 bytes).
    ===========================
    Using ICU version 70.1
    testcases: 1
    input detected as UTF8
    current system detected as icelake
    ===========================
    convert_utf8_to_utf16+haswell, input size: 66600, iterations: 10000, dataset: unicode_lipsum/lipsum/Korean-Lipsum.utf8.txt
       2.204 GB/s (3.4 %)    0.898 Gc/s     2.45 byte/char
    convert_utf8_to_utf16+icelake, input size: 66600, iterations: 2000, dataset: unicode_lipsum/lipsum/Korean-Lipsum.utf8.txt
       3.946 GB/s (0.9 %)    1.608 Gc/s     2.45 byte/char 
    We define the number of bytes to be the number of *input* bytes.
    We define a 'char' to be a code point (between 1 and 4 bytes).
    
    
    Algorithmic/Performance AVX-512 
    opened by lemire 15
  • AVX-512 UTF-16 to UTF-32

    Fix #129 The algorithm offers a vectorized approach in the presence of surrogate pairs instead of the scalar fallback that we have for other architectures. For surrogates, the idea is to have a register that is shifted by one 16-bit word to the right to align high surrogate words with low surrogate words. We expand all words of both registers (shifted and non-shifted) to 32-bit words. Then, we shift the bits of the low surrogates by 10 to the left in the non-shifted register, add both registers together and add a constant (0xfca02400, credits to @clausecker) to remove the surrogate prefixes and to add the plane shift 0x10000.
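
    For reference, here is the lane-wise arithmetic in scalar form (a sketch of the formula only, not the vectorized code): with the lead surrogate in 0xD800..0xDBFF and the trail surrogate in 0xDC00..0xDFFF, the three corrections fold into the single 32-bit addend mentioned above.

    #include <cstdint>

    // code point = 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00)
    //            = (lead << 10) + trail + 0xfca02400   (modulo 2^32)
    inline uint32_t combine_surrogates(uint16_t lead, uint16_t trail) {
      return (static_cast<uint32_t>(lead) << 10) + trail + 0xfca02400u;
    }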

    With AVX-512, this is not too hard to do with all the masked instructions and _mm512_mask_compressstoreu_epi32. I might have used too many masked instructions, but I do not know if this affects performance (in Intel's documentation, the throughput and the latency of masked vs non-masked instructions are the same I think). I will test it to be sure.

    opened by NicolasJiaxin 15
  • Relatively poor performance of validate_utf16 on ARM (Apple M1) for pure emoji

    Using Apple M1 and clang 12, I am getting that the accelerated validate_utf16 function is slower than our scalar fallback on the Emoji-Lipsum.utf16.txt test file (pure emoji).

    ❯  ./build/benchmarks/benchmark -F unicode_lipsum/lipsum/Emoji-Lipsum.utf16.txt -P validate_utf16
    We define the number of bytes to be the number of *input* bytes.
    We define a 'char' to be a code point (between 1 and 4 bytes).
    ===========================
    Using ICU version 69.1
    testcases: 1
    input detected as UTF16 little-endian
    current system detected as arm64
    ===========================
    validate_utf16+arm64, input size: 65542, iterations: 2000, dataset: unicode_lipsum/lipsum/Emoji-Lipsum.utf16.txt
    kpc_set_config failed, run the program with sudo
       1.873 GB/s (7.2 %)    0.936 Gc/s     2.00 byte/char 
    validate_utf16+fallback, input size: 65542, iterations: 2000, dataset: unicode_lipsum/lipsum/Emoji-Lipsum.utf16.txt
       2.579 GB/s (3.4 %)    1.289 Gc/s     2.00 byte/char 
    

    So we are nearly 40% slower with the accelerated function than the fallback function.

    But not on x64 (AMD Rome, GCC 12)...

    $ ./build/benchmarks/benchmark -F unicode_lipsum/lipsum/Emoji-Lipsum.utf16.txt -P validate_utf16
    We define the number of bytes to be the number of *input* bytes.
    We define a 'char' to be a code point (between 1 and 4 bytes).
    ===========================
    Using ICU version 67.1
    testcases: 1
    input detected as UTF16 little-endian
    current system detected as haswell
    ===========================
    validate_utf16+fallback, input size: 65542, iterations: 2000, dataset: unicode_lipsum/lipsum/Emoji-Lipsum.utf16.txt
       9.507 ins/byte,    9.507 cycle/byte,    2.025 GB/s (0.8 %),     3.405 GHz,    5.654 ins/cycle
      19.012 ins/char,   19.012 cycle/char,    1.013 Gc/s (0.8 %)     2.00 byte/char
    validate_utf16+haswell, input size: 65542, iterations: 2000, dataset: unicode_lipsum/lipsum/Emoji-Lipsum.utf16.txt
       1.071 ins/byte,    1.071 cycle/byte,   12.343 GB/s (1.9 %),     3.466 GHz,    3.814 ins/cycle
       2.142 ins/char,    2.142 cycle/char,    6.172 Gc/s (1.9 %)     2.00 byte/char
    validate_utf16+westmere, input size: 65542, iterations: 2000, dataset: unicode_lipsum/lipsum/Emoji-Lipsum.utf16.txt
       1.825 ins/byte,    1.825 cycle/byte,    8.915 GB/s (3.2 %),     3.449 GHz,    4.717 ins/cycle
       3.649 ins/char,    3.649 cycle/char,    4.458 Gc/s (3.2 %)     2.00 byte/char
    

    The relevant ARM code is as follows:

    const char16_t* arm_validate_utf16le(const char16_t* input, size_t size) {
        const char16_t* end = input + size;
        const auto v_d8 = simd8<uint8_t>::splat(0xd8);
        const auto v_f8 = simd8<uint8_t>::splat(0xf8);
        const auto v_fc = simd8<uint8_t>::splat(0xfc);
        const auto v_dc = simd8<uint8_t>::splat(0xdc);
        while (input + 16 < end) {
            // 0. Load data: since the validation takes into account only higher
            //    byte of each word, we compress the two vectors into one which
            //    consists only the higher bytes.
            const auto in0 = simd16<uint16_t>(input);
            const auto in1 = simd16<uint16_t>(input + simd16<uint16_t>::SIZE / sizeof(char16_t));
            const auto t0 = in0.shr<8>();
            const auto t1 = in1.shr<8>();
            const simd8<uint8_t> in = simd16<uint16_t>::pack(t0, t1);
            // 1. Check whether we have any 0xD800..DFFF word (0b1101'1xxx'yyyy'yyyy).
            const auto surrogates_wordmask = ((in & v_f8) == v_d8);
            if(surrogates_wordmask.none()) {
    // We never come here
                 input += 16;
            } else {
    /*****
    We keep coming here 
    ****/
                 const auto vH = simd8<uint8_t>((in & v_fc) ==  v_dc);
                const auto vL = simd8<uint8_t>(surrogates_wordmask).bit_andnot(vH);
                // We are going to need these later:
                const uint8_t low_vh = vH.first();
                const uint8_t high_vl = vL.last();
                // We shift vH down, possibly killing low_vh
                const auto sh = simd8<uint8_t>({1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,0xFF});
                const auto vHshifteddown = vH.apply_lookup_16_to(sh);
                const auto match = vHshifteddown == vL;
                // We need to handle the fact that high_vl is unmatched.
                // We could use this...
                // const uint8x16_t allbutlast = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0xFF};
                //             match = vorrq_u8(match, allbutlast);
                // but sh will do:
                const auto fmatch = simd8<bool>(simd8<uint8_t>(match) | sh);
                // We deliberately take these two lines out of the following branchy code
                // so that they are always s
                if (fmatch.all() && low_vh == 0) {
                    input += (high_vl == 0) ? 16 : 15;
                } else {
                    return nullptr;
                }
            }
        }
        return input;
    }
    
    
    Algorithmic/Performance NEON 
    opened by lemire 14
  • Port to ARM NEON of the utf16 to utf8 transcoder.

    I think it is important to port the utf16 to utf8 transcoder to ARM NEON.

    Apple M1

    |                           | llvm  | arm64  |
    |---------------------------|-------|--------|
    | Arabic-Lipsum.utf16.txt   | 0.338 | 4.881  |
    | Chinese-Lipsum.utf16.txt  | 0.389 | 3.312  |
    | Emoji-Lipsum.utf16.txt    | 0.351 | 0.591  |
    | Hebrew-Lipsum.utf16.txt   | 0.336 | 6.632  |
    | Hindi-Lipsum.utf16.txt    | 0.275 | 3.290  |
    | Japanese-Lipsum.utf16.txt | 0.355 | 3.242  |
    | Korean-Lipsum.utf16.txt   | 0.375 | 3.324  |
    | Latin-Lipsum.utf16.txt    | 0.401 | 21.514 |
    | Russian-Lipsum.utf16.txt  | 0.260 | 6.472  |

    These numbers are very, very good.

    Zen2 results:

    |                           | avx2  |
    |---------------------------|-------|
    | Arabic-Lipsum.utf16.txt   | 3.549 |
    | Chinese-Lipsum.utf16.txt  | 2.307 |
    | Emoji-Lipsum.utf16.txt    | 0.329 |
    | Hebrew-Lipsum.utf16.txt   | 3.543 |
    | Hindi-Lipsum.utf16.txt    | 2.228 |
    | Japanese-Lipsum.utf16.txt | 2.252 |
    | Korean-Lipsum.utf16.txt   | 2.192 |
    | Latin-Lipsum.utf16.txt    | 6.769 |
    | Russian-Lipsum.utf16.txt  | 3.540 |

    opened by lemire 13
  • Improving drastically the performance of the utf-8 to utf-16 icelake transcoder in some cases

    Over some inputs dominated by 1-byte and 2-byte UTF-8 characters, the icelake utf-8 to utf-16 transcoder has a very low instructions-per-cycle count. The reason is a long dependency chain. We can break the chain at the expense of a higher instruction count.

    The idea is that we load 64 bytes with each iteration, and if we detect that they are made of 1-byte or 2-byte characters, then we always decode fully the set of characters (between 16 and 32) that begin in the first 32 bytes of our input. Of course, that's pessimistic because we can often process much more if there are few ASCII (1-byte) characters.

    For lipsum/Arabic, this change pushes our speed from ~4.7 GB/s to 7.8 GB/s under GCC and from 5.0 GB/s to 7.6 GB/s under LLVM.

    My approach is simplistic and might be improved, but the gains appear quite solid.

    Warning: the cycle/byte metric should be ignored, it is reporting an erroneous value, see https://github.com/simdutf/simdutf/pull/194

    lipsum files:

    first the new code, then the old code
    We start with GCC 12
    unicode_lipsum/lipsum/Arabic-Lipsum.utf8.txt
     -- new : --
       1.373 ins/byte,    1.373 cycle/byte,    7.828 GB/s (0.3 %),     3.418 GHz,    3.144 ins/cycle 
     -- old : --
       0.669 ins/byte,    0.669 cycle/byte,    4.692 GB/s (0.2 %),     3.411 GHz,    0.920 ins/cycle 
    unicode_lipsum/lipsum/Chinese-Lipsum.utf8.txt
     -- new : --
       1.213 ins/byte,    1.213 cycle/byte,    5.811 GB/s (0.3 %),     3.415 GHz,    2.064 ins/cycle 
     -- old : --
       1.213 ins/byte,    1.213 cycle/byte,    5.815 GB/s (0.3 %),     3.414 GHz,    2.066 ins/cycle 
    unicode_lipsum/lipsum/Emoji-Lipsum.utf8.txt
     -- new : --
       1.553 ins/byte,    1.553 cycle/byte,    3.795 GB/s (0.3 %),     3.411 GHz,    1.728 ins/cycle 
     -- old : --
       1.553 ins/byte,    1.553 cycle/byte,    3.798 GB/s (0.4 %),     3.410 GHz,    1.729 ins/cycle 
    unicode_lipsum/lipsum/Hebrew-Lipsum.utf8.txt
     -- new : --
       1.374 ins/byte,    1.374 cycle/byte,    7.857 GB/s (0.4 %),     3.420 GHz,    3.156 ins/cycle 
     -- old : --
       0.670 ins/byte,    0.670 cycle/byte,    4.638 GB/s (0.3 %),     3.413 GHz,    0.911 ins/cycle 
    unicode_lipsum/lipsum/Hindi-Lipsum.utf8.txt
     -- new : --
       1.228 ins/byte,    1.228 cycle/byte,    4.910 GB/s (0.3 %),     3.409 GHz,    1.768 ins/cycle 
     -- old : --
       1.228 ins/byte,    1.228 cycle/byte,    4.907 GB/s (0.3 %),     3.412 GHz,    1.766 ins/cycle 
    unicode_lipsum/lipsum/Japanese-Lipsum.utf8.txt
     -- new : --
       1.220 ins/byte,    1.220 cycle/byte,    5.408 GB/s (0.4 %),     3.414 GHz,    1.933 ins/cycle 
     -- old : --
       1.220 ins/byte,    1.220 cycle/byte,    5.419 GB/s (0.3 %),     3.414 GHz,    1.937 ins/cycle 
    unicode_lipsum/lipsum/Korean-Lipsum.utf8.txt
     -- new : --
       1.230 ins/byte,    1.230 cycle/byte,    4.873 GB/s (0.3 %),     3.415 GHz,    1.755 ins/cycle 
     -- old : --
       1.230 ins/byte,    1.230 cycle/byte,    5.757 GB/s (24.2 %),     3.415 GHz,    2.074 ins/cycle 
    unicode_lipsum/lipsum/Latin-Lipsum.utf8.txt
     -- new : --
       0.253 ins/byte,    0.253 cycle/byte,   23.135 GB/s (5.0 %),     3.449 GHz,    1.698 ins/cycle 
     -- old : --
       0.253 ins/byte,    0.253 cycle/byte,   23.067 GB/s (4.6 %),     3.451 GHz,    1.692 ins/cycle 
    unicode_lipsum/lipsum/Russian-Lipsum.utf8.txt
     -- new : --
       1.373 ins/byte,    1.373 cycle/byte,    7.840 GB/s (0.3 %),     3.414 GHz,    3.152 ins/cycle 
     -- old : --
       0.660 ins/byte,    0.660 cycle/byte,    5.912 GB/s (42.9 %),     3.410 GHz,    1.144 ins/cycle 
    Now LLVM
    unicode_lipsum/lipsum/Arabic-Lipsum.utf8.txt
     -- new : --
       1.297 ins/byte,    1.297 cycle/byte,    7.613 GB/s (0.2 %),     3.419 GHz,    2.889 ins/cycle 
     -- old : --
       0.634 ins/byte,    0.634 cycle/byte,    4.984 GB/s (0.3 %),     3.413 GHz,    0.926 ins/cycle 
    unicode_lipsum/lipsum/Chinese-Lipsum.utf8.txt
     -- new : --
       1.086 ins/byte,    1.086 cycle/byte,    5.354 GB/s (0.3 %),     3.416 GHz,    1.702 ins/cycle 
     -- old : --
       1.086 ins/byte,    1.086 cycle/byte,    5.355 GB/s (0.3 %),     3.418 GHz,    1.701 ins/cycle 
    unicode_lipsum/lipsum/Emoji-Lipsum.utf8.txt
     -- new : --
       1.350 ins/byte,    1.350 cycle/byte,    4.174 GB/s (0.5 %),     3.413 GHz,    1.650 ins/cycle 
     -- old : --
       1.350 ins/byte,    1.350 cycle/byte,    4.173 GB/s (0.4 %),     3.413 GHz,    1.650 ins/cycle 
    unicode_lipsum/lipsum/Hebrew-Lipsum.utf8.txt
     -- new : --
       1.298 ins/byte,    1.298 cycle/byte,    7.698 GB/s (0.4 %),     3.423 GHz,    2.919 ins/cycle 
     -- old : --
       0.636 ins/byte,    0.636 cycle/byte,    4.924 GB/s (0.3 %),     3.416 GHz,    0.917 ins/cycle 
    unicode_lipsum/lipsum/Hindi-Lipsum.utf8.txt
     -- new : --
       1.099 ins/byte,    1.099 cycle/byte,    4.563 GB/s (0.2 %),     3.410 GHz,    1.471 ins/cycle 
     -- old : --
       1.099 ins/byte,    1.099 cycle/byte,    4.563 GB/s (0.3 %),     3.411 GHz,    1.471 ins/cycle 
    unicode_lipsum/lipsum/Japanese-Lipsum.utf8.txt
     -- new : --
       1.092 ins/byte,    1.092 cycle/byte,    5.000 GB/s (0.2 %),     3.416 GHz,    1.599 ins/cycle 
     -- old : --
       1.092 ins/byte,    1.092 cycle/byte,    5.001 GB/s (0.3 %),     3.415 GHz,    1.600 ins/cycle 
    unicode_lipsum/lipsum/Korean-Lipsum.utf8.txt
     -- new : --
       1.101 ins/byte,    1.101 cycle/byte,    4.524 GB/s (0.2 %),     3.415 GHz,    1.459 ins/cycle 
     -- old : --
       1.101 ins/byte,    1.101 cycle/byte,    4.530 GB/s (0.2 %),     3.415 GHz,    1.461 ins/cycle 
    unicode_lipsum/lipsum/Latin-Lipsum.utf8.txt
     -- new : --
       0.238 ins/byte,    0.238 cycle/byte,   24.442 GB/s (6.7 %),     3.466 GHz,    1.677 ins/cycle 
     -- old : --
       0.238 ins/byte,    0.238 cycle/byte,   24.657 GB/s (7.4 %),     3.462 GHz,    1.694 ins/cycle 
    unicode_lipsum/lipsum/Russian-Lipsum.utf8.txt
     -- new : --
       1.298 ins/byte,    1.298 cycle/byte,    7.645 GB/s (0.3 %),     3.416 GHz,    2.904 ins/cycle 
     -- old : --
       0.626 ins/byte,    0.626 cycle/byte,    5.001 GB/s (0.3 %),     3.410 GHz,    0.917 ins/cycle 
    

    wikipedia/Mars files:

    first the new code, then the old code
    We start with GCC 12
    unicode_lipsum/wikipedia_mars/arabic.utf8.txt
     -- new : --
       1.047 ins/byte,    1.047 cycle/byte,    8.909 GB/s (1.5 %),     3.404 GHz,    2.740 ins/cycle 
     -- old : --
       0.699 ins/byte,    0.699 cycle/byte,    4.982 GB/s (0.7 %),     3.402 GHz,    1.024 ins/cycle 
    unicode_lipsum/wikipedia_mars/chinese.utf8.txt
     -- new : --
       1.412 ins/byte,    1.412 cycle/byte,    4.146 GB/s (0.2 %),     3.404 GHz,    1.720 ins/cycle 
     -- old : --
       1.406 ins/byte,    1.406 cycle/byte,    4.059 GB/s (0.2 %),     3.404 GHz,    1.677 ins/cycle 
    unicode_lipsum/wikipedia_mars/czech.utf8.txt
     -- new : --
       1.231 ins/byte,    1.231 cycle/byte,    7.044 GB/s (0.3 %),     3.408 GHz,    2.544 ins/cycle 
     -- old : --
       1.097 ins/byte,    1.097 cycle/byte,    3.811 GB/s (0.2 %),     3.404 GHz,    1.228 ins/cycle 
    unicode_lipsum/wikipedia_mars/english.utf8.txt
     -- new : --
       0.389 ins/byte,    0.389 cycle/byte,   13.748 GB/s (2.1 %),     3.410 GHz,    1.569 ins/cycle 
     -- old : --
       0.384 ins/byte,    0.384 cycle/byte,   14.155 GB/s (1.8 %),     3.410 GHz,    1.593 ins/cycle 
    unicode_lipsum/wikipedia_mars/esperanto.utf8.txt
     -- new : --
       0.896 ins/byte,    0.896 cycle/byte,    8.703 GB/s (0.7 %),     3.417 GHz,    2.282 ins/cycle 
     -- old : --
       0.843 ins/byte,    0.843 cycle/byte,    5.928 GB/s (0.6 %),     3.412 GHz,    1.464 ins/cycle 
    unicode_lipsum/wikipedia_mars/french.utf8.txt
     -- new : --
       0.989 ins/byte,    0.989 cycle/byte,    9.001 GB/s (1.0 %),     3.407 GHz,    2.614 ins/cycle 
     -- old : --
       0.882 ins/byte,    0.882 cycle/byte,    4.654 GB/s (0.6 %),     3.403 GHz,    1.207 ins/cycle 
    unicode_lipsum/wikipedia_mars/german.utf8.txt
     -- new : --
       0.880 ins/byte,    0.880 cycle/byte,    9.253 GB/s (0.6 %),     3.407 GHz,    2.389 ins/cycle 
     -- old : --
       0.821 ins/byte,    0.821 cycle/byte,    6.218 GB/s (0.4 %),     3.405 GHz,    1.500 ins/cycle 
    unicode_lipsum/wikipedia_mars/greek.utf8.txt
     -- new : --
       1.114 ins/byte,    1.114 cycle/byte,    8.458 GB/s (0.3 %),     3.408 GHz,    2.766 ins/cycle 
     -- old : --
       0.776 ins/byte,    0.776 cycle/byte,    4.878 GB/s (0.3 %),     3.405 GHz,    1.112 ins/cycle 
    unicode_lipsum/wikipedia_mars/hebrew.utf8.txt
     -- new : --
       1.242 ins/byte,    1.242 cycle/byte,    7.602 GB/s (0.3 %),     3.407 GHz,    2.771 ins/cycle 
     -- old : --
       0.878 ins/byte,    0.878 cycle/byte,    4.085 GB/s (0.2 %),     3.404 GHz,    1.054 ins/cycle 
    unicode_lipsum/wikipedia_mars/hindi.utf8.txt
     -- new : --
       1.126 ins/byte,    1.126 cycle/byte,    5.329 GB/s (0.3 %),     3.402 GHz,    1.764 ins/cycle 
     -- old : --
       1.119 ins/byte,    1.119 cycle/byte,    5.267 GB/s (0.3 %),     3.403 GHz,    1.732 ins/cycle 
    unicode_lipsum/wikipedia_mars/japanese.utf8.txt
     -- new : --
       1.346 ins/byte,    1.346 cycle/byte,    4.394 GB/s (0.3 %),     3.405 GHz,    1.738 ins/cycle 
     -- old : --
       1.341 ins/byte,    1.341 cycle/byte,    4.259 GB/s (0.3 %),     3.405 GHz,    1.678 ins/cycle 
    unicode_lipsum/wikipedia_mars/korean.utf8.txt
     -- new : --
       1.408 ins/byte,    1.408 cycle/byte,    4.199 GB/s (0.3 %),     3.407 GHz,    1.735 ins/cycle 
     -- old : --
       1.402 ins/byte,    1.402 cycle/byte,    4.035 GB/s (0.2 %),     3.407 GHz,    1.661 ins/cycle 
    unicode_lipsum/wikipedia_mars/persan.utf8.txt
     -- new : --
       1.183 ins/byte,    1.183 cycle/byte,    6.586 GB/s (0.4 %),     3.407 GHz,    2.286 ins/cycle 
     -- old : --
       0.989 ins/byte,    0.989 cycle/byte,    4.468 GB/s (0.3 %),     3.405 GHz,    1.297 ins/cycle 
    unicode_lipsum/wikipedia_mars/portuguese.utf8.txt
     -- new : --
       0.970 ins/byte,    0.970 cycle/byte,    9.124 GB/s (0.4 %),     3.406 GHz,    2.598 ins/cycle 
     -- old : --
       0.885 ins/byte,    0.885 cycle/byte,    4.966 GB/s (0.3 %),     3.403 GHz,    1.291 ins/cycle 
    unicode_lipsum/wikipedia_mars/russian.utf8.txt
     -- new : --
       1.155 ins/byte,    1.155 cycle/byte,    8.067 GB/s (0.4 %),     3.406 GHz,    2.736 ins/cycle 
     -- old : --
       0.799 ins/byte,    0.799 cycle/byte,    4.601 GB/s (0.4 %),     3.402 GHz,    1.080 ins/cycle 
    unicode_lipsum/wikipedia_mars/thai.utf8.txt
     -- new : --
       1.036 ins/byte,    1.036 cycle/byte,    5.878 GB/s (0.7 %),     3.403 GHz,    1.789 ins/cycle 
     -- old : --
       1.032 ins/byte,    1.032 cycle/byte,    5.813 GB/s (0.8 %),     3.402 GHz,    1.764 ins/cycle 
    unicode_lipsum/wikipedia_mars/turkish.utf8.txt
     -- new : --
       1.054 ins/byte,    1.054 cycle/byte,    8.461 GB/s (0.5 %),     3.407 GHz,    2.617 ins/cycle 
     -- old : --
       0.928 ins/byte,    0.928 cycle/byte,    4.582 GB/s (0.2 %),     3.404 GHz,    1.249 ins/cycle 
    unicode_lipsum/wikipedia_mars/vietnamese.utf8.txt
     -- new : --
       1.554 ins/byte,    1.554 cycle/byte,    3.909 GB/s (0.3 %),     3.403 GHz,    1.785 ins/cycle 
     -- old : --
       1.543 ins/byte,    1.543 cycle/byte,    3.724 GB/s (0.3 %),     3.402 GHz,    1.689 ins/cycle 
    Now LLVM
    unicode_lipsum/wikipedia_mars/arabic.utf8.txt
     -- new : --
       0.987 ins/byte,    0.987 cycle/byte,    8.033 GB/s (1.3 %),     3.406 GHz,    2.327 ins/cycle 
     -- old : --
       0.657 ins/byte,    0.657 cycle/byte,    5.003 GB/s (1.4 %),     3.403 GHz,    0.966 ins/cycle 
    unicode_lipsum/wikipedia_mars/chinese.utf8.txt
     -- new : --
       1.268 ins/byte,    1.268 cycle/byte,    3.773 GB/s (0.3 %),     3.405 GHz,    1.404 ins/cycle 
     -- old : --
       1.262 ins/byte,    1.262 cycle/byte,    3.722 GB/s (0.5 %),     3.406 GHz,    1.379 ins/cycle 
    unicode_lipsum/wikipedia_mars/czech.utf8.txt
     -- new : --
       1.148 ins/byte,    1.148 cycle/byte,    6.535 GB/s (1.1 %),     3.409 GHz,    2.200 ins/cycle 
     -- old : --
       1.016 ins/byte,    1.016 cycle/byte,    3.896 GB/s (0.3 %),     3.406 GHz,    1.162 ins/cycle 
    unicode_lipsum/wikipedia_mars/english.utf8.txt
     -- new : --
       0.360 ins/byte,    0.360 cycle/byte,   13.995 GB/s (2.2 %),     3.411 GHz,    1.476 ins/cycle 
     -- old : --
       0.354 ins/byte,    0.354 cycle/byte,   13.818 GB/s (2.2 %),     3.408 GHz,    1.437 ins/cycle 
    unicode_lipsum/wikipedia_mars/esperanto.utf8.txt
     -- new : --
       0.828 ins/byte,    0.828 cycle/byte,    8.048 GB/s (1.3 %),     3.421 GHz,    1.948 ins/cycle 
     -- old : --
       0.775 ins/byte,    0.775 cycle/byte,    5.895 GB/s (0.7 %),     3.414 GHz,    1.338 ins/cycle 
    unicode_lipsum/wikipedia_mars/french.utf8.txt
     -- new : --
       0.929 ins/byte,    0.929 cycle/byte,    7.498 GB/s (1.6 %),     3.406 GHz,    2.046 ins/cycle 
     -- old : --
       0.824 ins/byte,    0.824 cycle/byte,    4.427 GB/s (1.6 %),     3.403 GHz,    1.072 ins/cycle 
    unicode_lipsum/wikipedia_mars/german.utf8.txt
     -- new : --
       0.814 ins/byte,    0.814 cycle/byte,    8.567 GB/s (0.8 %),     3.408 GHz,    2.045 ins/cycle 
     -- old : --
       0.756 ins/byte,    0.756 cycle/byte,    6.200 GB/s (0.5 %),     3.407 GHz,    1.376 ins/cycle 
    unicode_lipsum/wikipedia_mars/greek.utf8.txt
     -- new : --
       1.044 ins/byte,    1.044 cycle/byte,    7.772 GB/s (0.7 %),     3.409 GHz,    2.381 ins/cycle 
     -- old : --
       0.724 ins/byte,    0.724 cycle/byte,    5.042 GB/s (0.4 %),     3.406 GHz,    1.072 ins/cycle 
    unicode_lipsum/wikipedia_mars/hebrew.utf8.txt
     -- new : --
       1.165 ins/byte,    1.165 cycle/byte,    7.018 GB/s (0.5 %),     3.408 GHz,    2.399 ins/cycle 
     -- old : --
       0.819 ins/byte,    0.819 cycle/byte,    4.208 GB/s (0.3 %),     3.405 GHz,    1.013 ins/cycle 
    unicode_lipsum/wikipedia_mars/hindi.utf8.txt
     -- new : --
       1.012 ins/byte,    1.012 cycle/byte,    4.647 GB/s (0.6 %),     3.403 GHz,    1.381 ins/cycle 
     -- old : --
       1.005 ins/byte,    1.005 cycle/byte,    4.626 GB/s (0.8 %),     3.404 GHz,    1.366 ins/cycle 
    unicode_lipsum/wikipedia_mars/japanese.utf8.txt
     -- new : --
       1.209 ins/byte,    1.209 cycle/byte,    4.001 GB/s (0.4 %),     3.405 GHz,    1.421 ins/cycle 
     -- old : --
       1.204 ins/byte,    1.204 cycle/byte,    3.920 GB/s (0.6 %),     3.405 GHz,    1.386 ins/cycle 
    unicode_lipsum/wikipedia_mars/korean.utf8.txt
     -- new : --
       1.264 ins/byte,    1.264 cycle/byte,    3.843 GB/s (0.6 %),     3.408 GHz,    1.426 ins/cycle 
     -- old : --
       1.259 ins/byte,    1.259 cycle/byte,    3.726 GB/s (0.5 %),     3.408 GHz,    1.376 ins/cycle 
    unicode_lipsum/wikipedia_mars/persan.utf8.txt
     -- new : --
       1.091 ins/byte,    1.091 cycle/byte,    6.085 GB/s (0.5 %),     3.408 GHz,    1.948 ins/cycle 
     -- old : --
       0.907 ins/byte,    0.907 cycle/byte,    4.457 GB/s (0.4 %),     3.406 GHz,    1.187 ins/cycle 
    unicode_lipsum/wikipedia_mars/portuguese.utf8.txt
     -- new : --
       0.906 ins/byte,    0.906 cycle/byte,    8.437 GB/s (0.7 %),     3.409 GHz,    2.243 ins/cycle 
     -- old : --
       0.822 ins/byte,    0.822 cycle/byte,    5.089 GB/s (0.5 %),     3.404 GHz,    1.229 ins/cycle 
    unicode_lipsum/wikipedia_mars/russian.utf8.txt
     -- new : --
       1.083 ins/byte,    1.083 cycle/byte,    7.130 GB/s (0.9 %),     3.406 GHz,    2.266 ins/cycle 
     -- old : --
       0.744 ins/byte,    0.744 cycle/byte,    4.665 GB/s (0.7 %),     3.403 GHz,    1.020 ins/cycle 
    unicode_lipsum/wikipedia_mars/thai.utf8.txt
     -- new : --
       0.931 ins/byte,    0.931 cycle/byte,    5.022 GB/s (0.7 %),     3.402 GHz,    1.374 ins/cycle 
     -- old : --
       0.928 ins/byte,    0.928 cycle/byte,    5.001 GB/s (0.9 %),     3.403 GHz,    1.363 ins/cycle 
    unicode_lipsum/wikipedia_mars/turkish.utf8.txt
     -- new : --
       0.985 ins/byte,    0.985 cycle/byte,    7.827 GB/s (0.6 %),     3.410 GHz,    2.261 ins/cycle 
     -- old : --
       0.862 ins/byte,    0.862 cycle/byte,    4.705 GB/s (0.4 %),     3.405 GHz,    1.192 ins/cycle 
    unicode_lipsum/wikipedia_mars/vietnamese.utf8.txt
     -- new : --
       1.397 ins/byte,    1.397 cycle/byte,    3.476 GB/s (0.7 %),     3.402 GHz,    1.427 ins/cycle 
     -- old : --
       1.386 ins/byte,    1.386 cycle/byte,    3.373 GB/s (0.3 %),     3.402 GHz,    1.374 ins/cycle 
    
    opened by lemire 11
  • Alternative UTF-8 to UTF-16 transcoder

    The code has been updated, see more results below.

    This is a draft version of a C/C++ port of Robert's transcoder (UTF-8 to UTF-16) with an additional fast path for 1,2,3 bytes. It is not better across the board currently, but there are some nice wins (10% to 20% faster in some cases). This is a first draft and there might be various code generation issues to examine.

    Let us start with the regressions... (where the new code is worse)

    For each file, the first numbers are for the new transcoder, and the second one is the existing transcoder.
    
    ../unicode_lipsum/wikipedia_mars/chinese.utf8.txt
       3.758 GB/s (0.8 %)    2.844 Gc/s     1.32 byte/char 
       4.045 GB/s (0.8 %)    3.061 Gc/s     1.32 byte/char 
    ../unicode_lipsum/wikipedia_mars/czech.utf8.txt
       3.698 GB/s (0.8 %)    3.482 Gc/s     1.06 byte/char 
       3.923 GB/s (1.0 %)    3.695 Gc/s     1.06 byte/char 
    ../unicode_lipsum/wikipedia_mars/french.utf8.txt
       4.298 GB/s (2.5 %)    4.182 Gc/s     1.03 byte/char 
       4.403 GB/s (1.5 %)    4.285 Gc/s     1.03 byte/char 
    ../unicode_lipsum/wikipedia_mars/esperanto.utf8.txt
       5.670 GB/s (1.4 %)    5.485 Gc/s     1.03 byte/char 
       5.962 GB/s (1.1 %)    5.768 Gc/s     1.03 byte/char 
    ../unicode_lipsum/wikipedia_mars/german.utf8.txt
       5.890 GB/s (1.1 %)    5.760 Gc/s     1.02 byte/char 
       6.070 GB/s (0.9 %)    5.936 Gc/s     1.02 byte/char 
    ../unicode_lipsum/wikipedia_mars/korean.utf8.txt
       3.754 GB/s (0.9 %)    2.797 Gc/s     1.34 byte/char 
       3.945 GB/s (0.8 %)    2.939 Gc/s     1.34 byte/char 
    ../unicode_lipsum/wikipedia_mars/turkish.utf8.txt
       4.441 GB/s (0.9 %)    4.222 Gc/s     1.05 byte/char 
       4.673 GB/s (1.2 %)    4.442 Gc/s     1.05 byte/char
    ../unicode_lipsum/wikipedia_mars/vietnamese.utf8.txt
       3.459 GB/s (0.8 %)    3.062 Gc/s     1.13 byte/char 
       3.867 GB/s (0.7 %)    3.423 Gc/s     1.13 byte/char 
    

    So for a number of cases, we are down 10%.

    Let us look at the wins..

    For each file, the first numbers are for the new transcoder, and the second one is the existing transcoder.
    ../unicode_lipsum/wikipedia_mars/arabic.utf8.txt
       4.873 GB/s (1.4 %)    3.878 Gc/s     1.26 byte/char 
       4.157 GB/s (1.3 %)    3.308 Gc/s     1.26 byte/char 
    ../unicode_lipsum/wikipedia_mars/greek.utf8.txt
       4.745 GB/s (1.1 %)    3.741 Gc/s     1.27 byte/char 
       4.154 GB/s (0.9 %)    3.276 Gc/s     1.27 byte/char 
    ../unicode_lipsum/wikipedia_mars/hebrew.utf8.txt
       4.002 GB/s (0.8 %)    3.081 Gc/s     1.30 byte/char 
       3.679 GB/s (0.9 %)    2.832 Gc/s     1.30 byte/char 
    ../unicode_lipsum/wikipedia_mars/hindi.utf8.txt
       4.830 GB/s (1.1 %)    3.336 Gc/s     1.45 byte/char 
       4.618 GB/s (0.9 %)    3.190 Gc/s     1.45 byte/char 
    ../unicode_lipsum/wikipedia_mars/persan.utf8.txt
       4.289 GB/s (0.8 %)    3.424 Gc/s     1.25 byte/char 
       3.944 GB/s (1.0 %)    3.148 Gc/s     1.25 byte/char 
    ../unicode_lipsum/wikipedia_mars/portuguese.utf8.txt
       4.784 GB/s (1.3 %)    4.664 Gc/s     1.03 byte/char 
       4.683 GB/s (1.0 %)    4.565 Gc/s     1.03 byte/char 
    ../unicode_lipsum/wikipedia_mars/russian.utf8.txt
       4.486 GB/s (1.0 %)    3.438 Gc/s     1.30 byte/char 
       3.833 GB/s (1.0 %)    2.938 Gc/s     1.30 byte/char 
    ../unicode_lipsum/wikipedia_mars/thai.utf8.txt
       5.205 GB/s (1.5 %)    3.549 Gc/s     1.47 byte/char 
       4.725 GB/s (1.3 %)    3.222 Gc/s     1.47 byte/char 
    ../unicode_lipsum/lipsum/Arabic-Lipsum.utf8.txt
       4.682 GB/s (0.7 %)    2.623 Gc/s     1.78 byte/char 
       3.350 GB/s (1.0 %)    1.877 Gc/s     1.78 byte/char 
    ../unicode_lipsum/lipsum/Chinese-Lipsum.utf8.txt
       5.264 GB/s (0.7 %)    1.768 Gc/s     2.98 byte/char 
       3.974 GB/s (0.7 %)    1.335 Gc/s     2.98 byte/char 
    ../unicode_lipsum/lipsum/Emoji-Lipsum.utf8.txt
       3.427 GB/s (0.9 %)    0.857 Gc/s     4.00 byte/char 
       2.973 GB/s (0.9 %)    0.743 Gc/s     4.00 byte/char 
    ../unicode_lipsum/lipsum/Hebrew-Lipsum.utf8.txt
       4.637 GB/s (0.7 %)    2.601 Gc/s     1.78 byte/char 
       3.317 GB/s (0.9 %)    1.861 Gc/s     1.78 byte/char 
    ../unicode_lipsum/lipsum/Hindi-Lipsum.utf8.txt
       4.934 GB/s (10.9 %)    1.837 Gc/s     2.69 byte/char 
       3.966 GB/s (0.8 %)    1.477 Gc/s     2.69 byte/char 
    ../unicode_lipsum/lipsum/Japanese-Lipsum.utf8.txt
       4.950 GB/s (0.7 %)    1.706 Gc/s     2.90 byte/char 
       3.964 GB/s (0.9 %)    1.366 Gc/s     2.90 byte/char 
    ../unicode_lipsum/lipsum/Korean-Lipsum.utf8.txt
       4.523 GB/s (0.6 %)    1.843 Gc/s     2.45 byte/char 
       3.920 GB/s (0.9 %)    1.598 Gc/s     2.45 byte/char 
    ../unicode_lipsum/lipsum/Russian-Lipsum.utf8.txt
       4.699 GB/s (0.6 %)    2.600 Gc/s     1.81 byte/char 
       3.290 GB/s (0.8 %)    1.820 Gc/s     1.81 byte/char
    

    Many of the wins are larger than 10%.

    Another way to look at the issue is that, in the worst case, we achieve ~3.5 GB/s (with emojis, Vietnamese). This is significantly better than with the haswell kernel (AVX2) where our worst case was about half that (~1.5 GB/s), and better than the previous AVX-512 which could go as low as 3.5 GB/s.

    The numbers without the new fast path are less favorable with many cases where we get worse results (with speeds significantly below 4 GB/s):

    For each file, the first numbers are for the new transcoder, and the second one is the existing transcoder.
    
    ../unicode_lipsum/wikipedia_mars/chinese.utf8.txt
       3.014 GB/s (0.9 %)    2.281 Gc/s     1.32 byte/char 
       3.993 GB/s (1.0 %)    3.022 Gc/s     1.32 byte/char 
    ../unicode_lipsum/wikipedia_mars/czech.utf8.txt
       3.430 GB/s (0.9 %)    3.231 Gc/s     1.06 byte/char 
       3.905 GB/s (0.8 %)    3.677 Gc/s     1.06 byte/char 
    ../unicode_lipsum/wikipedia_mars/english.utf8.txt
      12.308 GB/s (2.3 %)   12.218 Gc/s     1.01 byte/char 
      13.470 GB/s (2.4 %)   13.371 Gc/s     1.01 byte/char 
    ../unicode_lipsum/wikipedia_mars/esperanto.utf8.txt
       5.074 GB/s (1.3 %)    4.908 Gc/s     1.03 byte/char 
       5.876 GB/s (1.3 %)    5.685 Gc/s     1.03 byte/char 
    ../unicode_lipsum/wikipedia_mars/french.utf8.txt
       4.064 GB/s (3.0 %)    3.955 Gc/s     1.03 byte/char 
       4.396 GB/s (1.2 %)    4.278 Gc/s     1.03 byte/char 
    ../unicode_lipsum/wikipedia_mars/german.utf8.txt
       5.307 GB/s (1.1 %)    5.190 Gc/s     1.02 byte/char 
       6.068 GB/s (0.9 %)    5.934 Gc/s     1.02 byte/char 
    ../unicode_lipsum/wikipedia_mars/hindi.utf8.txt
       3.899 GB/s (1.1 %)    2.693 Gc/s     1.45 byte/char 
       4.665 GB/s (1.0 %)    3.223 Gc/s     1.45 byte/char 
    ../unicode_lipsum/wikipedia_mars/japanese.utf8.txt
       3.177 GB/s (0.9 %)    2.298 Gc/s     1.38 byte/char 
       4.108 GB/s (0.9 %)    2.972 Gc/s     1.38 byte/char 
    ../unicode_lipsum/wikipedia_mars/korean.utf8.txt
       3.006 GB/s (1.1 %)    2.240 Gc/s     1.34 byte/char 
       3.949 GB/s (0.9 %)    2.943 Gc/s     1.34 byte/char 
    ../unicode_lipsum/wikipedia_mars/persan.utf8.txt
       3.833 GB/s (0.9 %)    3.060 Gc/s     1.25 byte/char 
       3.947 GB/s (0.9 %)    3.151 Gc/s     1.25 byte/char 
    ../unicode_lipsum/wikipedia_mars/portuguese.utf8.txt
       4.476 GB/s (1.0 %)    4.363 Gc/s     1.03 byte/char 
       4.733 GB/s (1.3 %)    4.614 Gc/s     1.03 byte/char 
    ../unicode_lipsum/wikipedia_mars/thai.utf8.txt
       4.208 GB/s (1.6 %)    2.869 Gc/s     1.47 byte/char 
       4.718 GB/s (1.4 %)    3.217 Gc/s     1.47 byte/char 
    ../unicode_lipsum/wikipedia_mars/turkish.utf8.txt
       4.162 GB/s (0.9 %)    3.957 Gc/s     1.05 byte/char 
       4.667 GB/s (0.8 %)    4.437 Gc/s     1.05 byte/char 
    ../unicode_lipsum/wikipedia_mars/vietnamese.utf8.txt
       2.790 GB/s (0.9 %)    2.469 Gc/s     1.13 byte/char 
       3.890 GB/s (0.9 %)    3.444 Gc/s     1.13 byte/char 
    ../unicode_lipsum/lipsum/Hindi-Lipsum.utf8.txt
       3.588 GB/s (1.4 %)    1.336 Gc/s     2.69 byte/char 
       3.963 GB/s (0.8 %)    1.476 Gc/s     2.69 byte/char 
    ../unicode_lipsum/lipsum/Japanese-Lipsum.utf8.txt
       3.865 GB/s (0.9 %)    1.332 Gc/s     2.90 byte/char 
       3.974 GB/s (0.8 %)    1.370 Gc/s     2.90 byte/char 
    ../unicode_lipsum/lipsum/Korean-Lipsum.utf8.txt
       3.568 GB/s (0.8 %)    1.454 Gc/s     2.45 byte/char 
       3.953 GB/s (0.9 %)    1.611 Gc/s     2.45 byte/char 
    

    This is with validation (throughout).

    opened by lemire 11
  • Feature request: UTF8 Byte Length

    Node has Buffer.byteLength() explained in https://www.w3schools.com/nodejs/met_buffer_bytelength.asp. The Buffer.byteLength() method returns the length of a specified string object, in bytes.

    If possible, it would be great to have a similar & faster approach.
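
    For reference, the API above already exposes this computation for UTF-16LE input through utf8_length_from_utf16 (it assumes valid input and allocates nothing); a minimal sketch of a hypothetical wrapper:

    #include <cstddef>
    #include <string>

    #include "simdutf.h"

    // Number of bytes the UTF-8 encoding of a (valid) UTF-16LE string would occupy.
    size_t utf8_byte_length(const std::u16string &text) {
      return simdutf::utf8_length_from_utf16(text.data(), text.size());
    }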

    opened by anonrig 5
  • API for escaping characters " \

    I'm using this library to convert utf16le to utf8 and place it inside a JSON string. Currently this library has no way of escaping the special characters " and \ which I'd need. Is there any interest in getting this upstream?
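
    simdutf does not currently provide such escaping; as a point of comparison, here is a minimal scalar sketch of the post-processing step described here (a hypothetical helper handling only the characters " and \):

    #include <string>
    #include <string_view>

    // Escape the two JSON-special characters '"' and '\' in already-transcoded UTF-8.
    std::string escape_for_json(std::string_view utf8) {
      std::string out;
      out.reserve(utf8.size());
      for (char c : utf8) {
        if (c == '"' || c == '\\') {
          out.push_back('\\');
        }
        out.push_back(c);
      }
      return out;
    }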

    opened by HookedBehemoth 9
  • Add high-level C++17/C++20 conversion functions

    Starting with C++11, we have a full range of specialized string classes... E.g., std::u8string, std::u16string... std::u8string_view, and so forth. Strictly speaking they were introduced with C++11 (for std::*string) and C++17 (for std::*string_view) but std::u8string became available with C++20.

    We could use std::string, assuming that it is UTF-8, but it might also use other encodings. If we are explicit that we are assuming UTF-8 then it is ok.

    What we could do is provide high-level conversion functions. That might be helpful to some users.

    The objective would be to improve quality of life for users who prefer not to work directly with pointers. A sketch:

    #include <cstdio>
    #include <string>
    #include <string_view>
    
    #ifndef SIMDUTF_CPLUSPLUS
    #if defined(_MSVC_LANG) && !defined(__clang__)
    #define SIMDUTF_CPLUSPLUS (_MSC_VER == 1900 ? 201103L : _MSVC_LANG)
    #else
    #define SIMDUTF_CPLUSPLUS __cplusplus
    #endif
    #endif
    
    #if (SIMDUTF_CPLUSPLUS >= 202002L)
    #define SIMDUTF_CPLUSPLUS20 1
    #endif
    
    #if (SIMDUTF_CPLUSPLUS >= 201703L)
    #define SIMDUTF_CPLUSPLUS17 1
    #endif
    
    #if SIMDUTF_CPLUSPLUS17
    
    // Placeholder bodies: a real implementation would call the simdutf
    // transcoding routines instead of returning bogus strings.
    inline std::u32string to_u32string(const std::u16string_view in) {
      return U"bogus code";
    }
    
    // Accepts std::string / std::string_view, assumed to hold UTF-8.
    inline std::u32string to_u32string(const std::string_view in) {
      return U"bogus code";
    }
    
    #if SIMDUTF_CPLUSPLUS20
    inline std::u32string to_u32string(const std::u8string_view in) {
      return U"bogus code";
    }
    #endif
    
    inline std::u16string to_u16string(const std::u16string_view in) {
      return u"bogus code";
    }
    
    inline std::u16string to_u16string(const std::u32string_view in) {
      return u"bogus code";
    }
    
    int main() {
      printf("Support for C++17.\n");
      std::string mystring("hello"); // assumed to be UTF-8
    #if SIMDUTF_CPLUSPLUS20
      std::u8string mystringu8(u8"hello");
    #endif
      std::u16string mystringu16(u"hello");
      std::u32string mystringu32(U"hello");
    #if SIMDUTF_CPLUSPLUS20
      std::u32string mystringu8_as32 = to_u32string(mystringu8);
    #endif
      std::u32string mystring_as32 = to_u32string(mystring);
    }
    
    #else
    int main() { printf("No support for C++17.\n"); }
    #endif
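
    For illustration, a real implementation of one of these helpers could delegate to the simdutf transcoding routines. A minimal sketch, assuming the simdutf 2.x function names (utf16_length_from_utf8 and convert_utf8_to_utf16le) and a UTF-8 std::string_view input:

    #include <string>
    #include <string_view>
    
    #include "simdutf.h"
    
    // Sketch of a real to_u16string: size the output, transcode (with
    // validation), and shrink to the number of code units actually written.
    // Assumes the std::string_view holds UTF-8.
    inline std::u16string to_u16string(std::string_view in) {
      std::u16string out(simdutf::utf16_length_from_utf8(in.data(), in.size()), u'\0');
      size_t written = simdutf::convert_utf8_to_utf16le(in.data(), in.size(), out.data());
      out.resize(written); // 0 signals invalid UTF-8 input
      return out;
    }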
    

    References:

    https://en.cppreference.com/w/cpp/string/basic_string_view
    https://en.cppreference.com/w/cpp/string/basic_string

    opened by lemire 1
Releases(v2.0.9)
  • v2.0.9(Dec 15, 2022)

    What's Changed

    • Improving drastically the performance of the utf-8 to utf-16 icelake transcoder in some cases by @lemire in https://github.com/simdutf/simdutf/pull/193
    • Correcting cycle-per-byte and cycle-per-char metrics as reported by our benchmarking tool. by @lemire in https://github.com/simdutf/simdutf/pull/194

    Full Changelog: https://github.com/simdutf/simdutf/compare/v2.0.8...v2.0.9

    Source code(tar.gz)
    Source code(zip)
    singleheader.zip(144.80 KB)
  • v2.0.8(Dec 14, 2022)

    What's Changed

    • fix compilation error on gcc 9 debug build by @toge in https://github.com/simdutf/simdutf/pull/192

    New Contributors

    • @toge made their first contribution in https://github.com/simdutf/simdutf/pull/192

    Full Changelog: https://github.com/simdutf/simdutf/compare/v2.0.7...v2.0.8

    Source code(tar.gz)
    Source code(zip)
    singleheader.zip(144.77 KB)
  • v2.0.7(Dec 13, 2022)

    What's Changed

    • fix: check for avx512 before applying gcc-8 changes by @anonrig in https://github.com/simdutf/simdutf/pull/187
    • fix: update SIMDUTF_VERSION to 2.0.6 by @anonrig in https://github.com/simdutf/simdutf/pull/190
    • Silence some spurious GCC warnings (unrelated to our code). by @lemire in https://github.com/simdutf/simdutf/pull/191

    New Contributors

    • @anonrig made their first contribution in https://github.com/simdutf/simdutf/pull/187

    Full Changelog: https://github.com/simdutf/simdutf/compare/v2.0.6...v2.0.7

    Source code(tar.gz)
    Source code(zip)
    singleheader.zip(144.74 KB)
  • v2.0.6(Dec 8, 2022)

  • v2.0.5(Dec 1, 2022)

  • v2.0.4(Nov 30, 2022)

    What's Changed

    • Better code generation for UTF-8 to UTF-16 routine under GCC and LLVM (icelake kernel) by @lemire in https://github.com/simdutf/simdutf/pull/183

    New Contributors

    • @sno2 made their first contribution in https://github.com/simdutf/simdutf/pull/181

    Full Changelog: https://github.com/simdutf/simdutf/compare/v2.0.3...v2.0.4

    Source code(tar.gz)
    Source code(zip)
    singleheader.zip(143.71 KB)
  • v2.0.3(Nov 11, 2022)

    What's Changed

    • Allow skipping the build of the tools via SIMDUTF_TOOLS=OFF by @schlenk in https://github.com/simdutf/simdutf/pull/178
    • Guarding calls to iconv to accommodate win-iconv by @lemire in https://github.com/simdutf/simdutf/pull/179
    • Allows the installation of the tool sutf. by @lemire in https://github.com/simdutf/simdutf/pull/180

    New Contributors

    • @schlenk made their first contribution in https://github.com/simdutf/simdutf/pull/178

    Full Changelog: https://github.com/simdutf/simdutf/compare/v2.0.2...v2.0.3

    Source code(tar.gz)
    Source code(zip)
    singleheader.zip(143.31 KB)
  • v2.0.2(Oct 28, 2022)

  • v2.0.1(Oct 28, 2022)

  • v2.0.0(Oct 27, 2022)

    What's Changed

    Most text today is represented using the Unicode standard. The simdutf library seeks to provide high performance Unicode functions for C++ programmers. Version 2.0 introduces a richer API, with support for the most popular Unicode formats (UTF-32, UTF-16BE, UTF-16LE and UTF-8). Users can transcode between these formats, and validate the inputs as needed. For users that so desire, we also return a structure containing failure information, including the nature and location of the error.
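
    As a taste of the richer API, the *_with_errors conversion functions return a small result structure describing what went wrong and where. A minimal sketch, assuming the simdutf 2.x names (convert_utf8_to_utf16le_with_errors, utf16_length_from_utf8, and simdutf::result with error and count members):

    #include <cstdio>
    #include <string>
    
    #include "simdutf.h"
    
    int main() {
      std::string utf8 = "caf\xc3\xa9"; // "café" encoded as UTF-8
      std::u16string out(simdutf::utf16_length_from_utf8(utf8.data(), utf8.size()), u'\0');
      // The *_with_errors variant reports the nature and location of a failure.
      simdutf::result res =
          simdutf::convert_utf8_to_utf16le_with_errors(utf8.data(), utf8.size(), out.data());
      if (res.error == simdutf::SUCCESS) {
        out.resize(res.count); // on success, count = number of char16_t written
        printf("converted %zu code units\n", out.size());
      } else {
        printf("invalid UTF-8 near byte %zu\n", res.count); // on error, count = position
      }
    }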

    For advanced x64 processors, we introduce a whole new AVX-512 kernel with novel algorithms by @WojciechMula and @clausecker. It can be twice as fast as our previous kernels, reaching speeds close to 5 GB/s on non-trivial Unicode inputs. The library relies on runtime dispatching: if your processor supports the new kernel, it is used automatically. Currently supported processors include Ice Lake, Rocket Lake, and Zen 4.

    On an Ice Lake processor, we get the following speeds with the Arabic-Lipsum.utf8.txt test file:

    | function          | UTF-8 to UTF-16 speed (GB/s) |
    |:------------------|:-----------------------------|
    | simdutf (AVX-512) | 4.6                          |
    | simdutf (AVX2)    | 2.3                          |
    | ICU               | 1.4                          |
    | iconv             | 0.7                          |

    Major changes

    • AVX512 kernel for Ice Lake / Zen 4 processors by @WojciechMula and @clausecker in https://github.com/simdutf/simdutf/pull/174
    • Support for UTF-32, UTF-16BE and transcoding between UTF-32, UTF-16BE, UTF-16LE and UTF-8, by @NicolasJiaxin, @clausecker and others
    • Ascii validation by @NicolasJiaxin in https://github.com/simdutf/simdutf/pull/110
    • One pass autodetect encodings by @NicolasJiaxin in https://github.com/simdutf/simdutf/pull/134
    • Returning a struct indicating success and length for some functions by @NicolasJiaxin in https://github.com/simdutf/simdutf/pull/157
    • Iconv-like tool (sutf) by @NicolasJiaxin in https://github.com/simdutf/simdutf/pull/160

    Performance

    • Optimize ARM utf16 validation by @danlark1 in https://github.com/simdutf/simdutf/pull/145

    Bug fixes

    • fix valid_utf8_to_utf16.h producing invalid utf16 (issue111) by @lemire in https://github.com/simdutf/simdutf/pull/119
    • Fix Buffer Overrun on aarch64 by @wx257osn2 in https://github.com/simdutf/simdutf/pull/171
    • fix some typos by @striezel in https://github.com/simdutf/simdutf/pull/139

    Testing

    • Fuzzer for buffer overflow by @NicolasJiaxin in https://github.com/simdutf/simdutf/pull/163
    • update actions/checkout in GitHub Actions to v3 by @striezel in https://github.com/simdutf/simdutf/pull/138

    Building

    • 👷‍♀️ CMake: Guard Tests/Examples Behind CMake Variables by @ThePhD in https://github.com/simdutf/simdutf/pull/149

    Benchmarking

    • Added iconv to the benchmarks, by @lemire in https://github.com/simdutf/simdutf/pull/164
    • We use simpler performance counters since under graviton 2 (AWS), you may only access two counters at a time by @lemire in https://github.com/simdutf/simdutf/pull/123

    New Contributors

    • @striezel made their first contribution in https://github.com/simdutf/simdutf/pull/139
    • @danlark1 made their first contribution in https://github.com/simdutf/simdutf/pull/145
    • @ThePhD made their first contribution in https://github.com/simdutf/simdutf/pull/149
    • @wx257osn2 made their first contribution in https://github.com/simdutf/simdutf/pull/171

    Full Changelog: https://github.com/simdutf/simdutf/compare/v1.0.1...v2.0.0

    Source code(tar.gz)
    Source code(zip)
    singleheader.zip(143.40 KB)