Parsing gigabytes of JSON per second

Overview


JSON is everywhere on the Internet. Servers spend a *lot* of time parsing it. We need a fresh approach. The simdjson library uses commonly available SIMD instructions and microparallel algorithms to parse JSON 2.5x faster than RapidJSON and 25x faster than JSON for Modern C++.
  • Fast: Over 2.5x faster than commonly used production-grade JSON parsers.
  • Record Breaking Features: Minify JSON at 6 GB/s, validate UTF-8 at 13 GB/s, NDJSON at 3.5 GB/s.
  • Easy: First-class, easy to use and carefully documented APIs.
  • Beyond DOM: Try the new On Demand API for twice the speed (>4GB/s).
  • Strict: Full JSON and UTF-8 validation, lossless parsing. Performance with no compromises.
  • Automatic: Selects a CPU-tailored parser at runtime. No configuration needed.
  • Reliable: From memory allocation to error handling, simdjson's design avoids surprises.
  • Peer Reviewed: Our research appears in venues such as the VLDB Journal and Software: Practice and Experience.

This library is part of the Awesome Modern C++ list.


Quick Start

The simdjson library is easily consumable with a single .h and .cpp file.

  1. Prerequisites: g++ (version 7 or better) or clang++ (version 6 or better), and a 64-bit system with a command-line shell (e.g., Linux, macOS, FreeBSD). We also support programming environments like Visual Studio and Xcode, but different steps are needed.

  2. Pull simdjson.h and simdjson.cpp into a directory, along with the sample file twitter.json.

    wget https://raw.githubusercontent.com/simdjson/simdjson/master/singleheader/simdjson.h https://raw.githubusercontent.com/simdjson/simdjson/master/singleheader/simdjson.cpp https://raw.githubusercontent.com/simdjson/simdjson/master/jsonexamples/twitter.json
    
  3. Create quickstart.cpp:

    #include <iostream>
    #include "simdjson.h"
    int main(void) {
      simdjson::dom::parser parser;
      simdjson::dom::element tweets = parser.load("twitter.json");
      std::cout << tweets["search_metadata"]["count"] << " results." << std::endl;
    }
  4. c++ -o quickstart quickstart.cpp simdjson.cpp

  5. ./quickstart

    100 results.
    

On Demand

The new On Demand JSON parser is just as easy, but much faster due to just-in-time parsing. It is in alpha right now. More information can be found in the On Demand Guide.

  1. Do step 1 of the Quick Start.

  2. Create quickstart.cpp:

    #include <iostream>
    #include "simdjson.h"
    using namespace simdjson;
    int main(void) {
       ondemand::parser parser;
       padded_string json = padded_string::load("twitter.json");
       ondemand::document tweets = parser.iterate(json);
       std::cout << uint64_t(tweets["search_metadata"]["count"]) << " results." << std::endl;
    }
  3. c++ -march=native -o quickstart quickstart.cpp simdjson.cpp

  4. ./quickstart

    100 results.
    

You'll notice that the code here is very similar to the main Quick Start code (and indeed, it does the same thing). However, if you compare the performance, you should find On Demand much faster.

Documentation

Usage documentation is available:

  • Basics is an overview of how to use simdjson and its APIs.
  • Performance shows some more advanced scenarios and how to tune for them.
  • Implementation Selection describes runtime CPU detection and how you can work with it.
  • API contains the automatically generated API documentation.

Performance results

The simdjson library executes three-quarters fewer instructions than the state-of-the-art parser RapidJSON and half as many as sajson. To our knowledge, simdjson is the first fully-validating JSON parser to run at gigabytes per second (GB/s) on commodity processors. It can parse millions of JSON documents per second on a single core.

The following figure presents the parsing speed, in GB/s, for various files on an Intel Skylake processor (3.4 GHz) using the GNU GCC 9 compiler (with the -O3 flag). We compare against the best and fastest C++ libraries. The simdjson library offers full Unicode (UTF-8) validation and exact number parsing. The RapidJSON library is tested in two modes: fast and exact number parsing. The sajson library offers fast (but not exact) number parsing and partial Unicode validation. In this data set, the file sizes range from 65KB (github_events) to 3.3MB (gsoc-2018). Many files (canada, mesh.pretty, mesh, random and numbers) are mostly made of numbers; in such instances, we see lower JSON parsing speeds due to the high cost of number parsing. The simdjson library uses exact number parsing, which is particularly taxing.

On a Skylake processor, the parsing speeds (in GB/s) of various parsers on the twitter.json file are as follows, again using GNU GCC 9.1 (with the -O3 flag). The popular JSON for Modern C++ library is particularly slow: it evidently trades parsing speed for other desirable features.

| parser | GB/s |
| --- | --- |
| simdjson | 2.5 |
| RapidJSON UTF8-validation | 0.29 |
| RapidJSON UTF8-valid., exact numbers | 0.28 |
| RapidJSON insitu, UTF8-validation | 0.41 |
| RapidJSON insitu, UTF8-valid., exact | 0.39 |
| sajson (insitu, dynamic) | 0.62 |
| sajson (insitu, static) | 0.88 |
| dropbox | 0.13 |
| fastjson | 0.27 |
| gason | 0.59 |
| ultrajson | 0.34 |
| jsmn | 0.25 |
| cJSON | 0.31 |
| JSON for Modern C++ (nlohmann/json) | 0.11 |

The simdjson library offers high speed whether it processes tiny files (e.g., 300 bytes) or larger files (e.g., 3MB). The following plot presents parsing speed for synthetic files over various sizes generated with a script on a 3.4 GHz Skylake processor (GNU GCC 9, -O3).

All our experiments are reproducible.

You can go beyond 4 GB/s with our new On Demand API. For NDJSON files, we can exceed 3 GB/s with our multithreaded parsing functions.

Real-world usage

If you are planning to use simdjson in a product, please work from one of our releases.

Bindings and Ports of simdjson

We distinguish between "bindings" (which just wrap the C++ code) and a port to another programming language (which reimplements everything).

About simdjson

The simdjson library takes advantage of modern microarchitectures, parallelizing with SIMD vector instructions, reducing branch misprediction, and reducing data dependency to take advantage of each CPU's multiple execution cores.

A description of the design and implementation of simdjson can be found in our research article:

We have an in-depth paper focused on the UTF-8 validation:

We also have an informal blog post providing some background and context.

For the video inclined,
simdjson at QCon San Francisco 2019
(it was the best voted talk, we're kinda proud of it).

Funding

The work is supported by the Natural Sciences and Engineering Research Council of Canada under grant number RGPIN-2017-03910.

Contributing to simdjson

Head over to CONTRIBUTING.md for information on contributing to simdjson, and HACKING.md for information on source, building, and architecture/design.

License

This code is made available under the Apache License 2.0.

Under Windows, we build some tools using the windows/dirent_portable.h file (which is outside our library code): it is available under the liberal (business-friendly) MIT license.

For compilers that do not support C++17, we bundle the string-view library which is published under the Boost license (http://www.boost.org/LICENSE_1_0.txt). Like the Apache license, the Boost license is a permissive license allowing commercial redistribution.

Issues
  • On-Demand Parsing


    This introduces a DOM-like API that parses JSON with forward-only streaming--combining the ease of traditional DOM parsers with the performance of SAX. One major virtue of this approach is that we know what type the user wants a value to be before we parse it, so we eliminate the typical "type check" employed by DOM and SAX, instead parsing with a parser dedicated to that type.

    It is far faster than using the DOM (4.0 GB/s vs. 2.3 GB/s). It is also much easier than the SAX approach (15 lines for a Tweet reader versus 300 in SAX), as well as being slightly faster (results on the Clang 10 compiler, on a Skylake machine):

    | Benchmark | Generic DOM | SAX | On-Demand |
    |------------|---|---|---|
    | Tweets | 2.3 GB/s | 3.5 GB/s | 4.0 GB/s |
    | LargeRandom | 0.50 GB/s | 0.71 GB/s | 0.71 GB/s |

    Examples

    The benchmarks have some real, fairly simple examples with equivalent DOM and SAX implementations.

    points.json

    This parses a giant array of points: [ { "x": 1.0, "y": 1.2, "z": 1.3 }, ... ]

    // A point holds three double coordinates.
    struct my_point { double x; double y; double z; };

    ondemand::parser parser;
    std::vector<my_point> container;
    for (ondemand::object p : parser.parse(json)) {
      container.emplace_back(my_point{p["x"], p["y"], p["z"]});
    }
    

    twitter.json

    This parses a list of Tweets (from the Twitter API) into C++ structs, serializing text, screen names, ids, and favorite/retweet counts.

    // Walk the document, parsing the tweets as we go
    std::vector<twitter::tweet> tweets;
    ondemand::parser parser;
    auto doc = parser.parse(json);
    for (ondemand::object tweet : doc["statuses"]) {
      tweets.emplace_back(twitter::tweet{
        tweet["created_at"],
        tweet["id"],
        tweet["text"],
        nullable_int(tweet["in_reply_to_status_id"]),
        read_user(tweet["user"]),
        tweet["retweet_count"],
        tweet["favorite_count"]
      });
    }
    

    With these auxiliary functions:

    simdjson_really_inline twitter::twitter_user read_user(ondemand::object && u) {
      return { u["id"], u["screen_name"] };
    }
    simdjson_really_inline uint64_t nullable_int(ondemand::value && value) {
      if (value.is_null()) { return 0; }
      return std::move(value);
    }
    

    Principles

    • Inline Iteration: Iterating arrays or objects is done through exactly the kind of for loop you'd expect. You can nest them, iterating an array of arrays or an array of objects with array values through nested for loops. Under the hood, the iterator checks for the "[", passes you the index of the value, and when you finish with a value, it checks for "," and passes the next value until it sees "]".
    • Forward-Only Iteration: To prevent reiteration of the same values and to keep the number of variables down (literally), only a single index is maintained and everything uses it (even if you have nested for loops). This means when you're going through an array of arrays, for example, that the inner array loop will advance the index to the next comma, and the array can just pick it up and look at it.
    • Inline, On-Demand Parsing: Parses exactly the type you want and nothing else. Because it's inline this means way fewer branches per value, and they're more predictable as well. For example, if you ask for an unsigned integer, we just start parsing digits. If there were no digits, we toss an error. With a generic parser you have to do a big switch statement checking whether it's a digit before you even start parsing, so it's both an extra branch, and a hard to predict one (because you are also checking other values).
    • Streaming Output: This is streaming in the sense of output, but not input. You still have to pass the whole file as input; it just doesn't parse anything besides the structural marks until you ask. This also means the parser memory has to grow as the file grows (because of structural indexes). Streaming input is a whole other problem, however.
    • Validate What You Use: It deliberately validates the values you use and the structure leading to it, but nothing else. The goal is a guarantee that the value you asked for is the correct one and is not malformed so that there is no confusion over whether you got the right value. But it leaves the possibility that the JSON as a whole is invalid. A full-validation mode is possible and planned, but I think this mode should be the default, personally, or at least pretty heavily advertised. Full-validation mode should really only be for debug.
    • Avoids Genericity Pitfalls: I think it avoids the pitfalls of generating a generic DOM: you don't know what to expect in the file, so you can't tune the parser to expect it (and thus branch mispredictions abound). Even SAX falls into this, though definitely less than others: the core of SAX still has to have a giant switch statement in a loop, and that's just going to be inherently branchy.
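    The "parse exactly the type you want" principle above can be illustrated in isolation. The sketch below is our own toy (the `parse_unsigned` name is hypothetical, not simdjson's API): when the caller asks for an unsigned integer, we start consuming digits immediately and report an error if there are none, instead of first dispatching on the value's type:

    ```cpp
    #include <cctype>
    #include <cstdint>
    #include <iostream>

    // Toy sketch of type-directed parsing (not simdjson's actual code): no
    // generic type switch, just "start parsing digits, fail if there are none".
    bool parse_unsigned(const char *p, uint64_t &out) {
      if (!isdigit((unsigned char)*p)) { return false; } // no digits: error
      uint64_t v = 0;
      while (isdigit((unsigned char)*p)) { v = v * 10 + uint64_t(*p++ - '0'); }
      out = v;
      return true;
    }

    int main() {
      uint64_t count = 0;
      if (parse_unsigned("100", count)) { std::cout << count << std::endl; } // prints 100
    }
    ```

    Overflow handling and the rest of the JSON number grammar are omitted for brevity; the point is that the success path is a single predictable digit loop.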

    Impact on DOM parse (skylake haswell gcc10.2)

    As expected / hoped, this is entirely neutral with respect to our existing performance, all the way down to identical instruction count:

    | File | Blocks | master Cycles | PR Cycles | +Throughput | master Instr. | PR Instr. | -Instr. |
    | --- | --: | --: | --: | --: | --: | --: | --: |
    | gsoc-2018 | 51997 | 72.7 | 71.6 | 1% | 160.1 | 160.1 | 0% |
    | instruments | 3442 | 108.2 | 107.3 | 0% | 370.6 | 370.6 | 0% |
    | github_events | 1017 | 78.6 | 78.4 | 0% | 256.7 | 256.7 | 0% |
    | numbers | 2345 | 284.1 | 283.4 | 0% | 791.0 | 791.0 | 0% |
    | apache_builds | 1988 | 84.9 | 84.7 | 0% | 295.1 | 295.1 | 0% |
    | mesh | 11306 | 319.3 | 318.5 | 0% | 984.2 | 984.2 | 0% |
    | twitterescaped | 8787 | 188.1 | 187.8 | 0% | 493.0 | 493.0 | 0% |
    | marine_ik | 46616 | 318.4 | 318.1 | 0% | 895.7 | 895.7 | 0% |
    | update-center | 8330 | 113.2 | 113.2 | 0% | 326.5 | 326.5 | 0% |
    | mesh.pretty | 24646 | 189.0 | 188.9 | 0% | 571.3 | 571.3 | 0% |
    | twitter | 9867 | 92.0 | 92.0 | 0% | 281.6 | 281.6 | 0% |
    | citm_catalog | 26987 | 81.7 | 81.7 | 0% | 287.5 | 287.5 | 0% |
    | canada | 35172 | 311.2 | 311.4 | 0% | 946.7 | 946.7 | 0% |
    | semanticscholar-corpus | 134271 | 108.8 | 109.0 | 0% | 274.4 | 274.4 | 0% |
    | random | 7976 | 141.2 | 142.2 | 0% | 482.1 | 482.1 | 0% |

    Design

    The primary classes are:

    • ondemand::parser: The equivalent of dom::parser.
      • This handles allocation and parse calls, and keeps memory around between parses.
    • ondemand::document: Holds iteration state. Can be cast to array, object or scalar.
      • Forward-Only: This is a forward-only input iterator. You may only get the document's value once. Once you have retrieved an array, object, or scalar, subsequent attempts to get other values fail.
      • Iteration Owner: If you let go of the document object, iteration will fail. This is not checked, but the failure will be really obvious :) Moves are disallowed after iteration has started, because array/object/value all point to the document.
      • Locks the Parser: Only one iteration is allowed at a time. If you attempt to parse a new document before destroying the old one, you will get an error. document cannot be copied.
    • ondemand::array: Manages array iteration.
      • Forward-Only: Retrieving the same array element twice will fail. There is currently no check on whether it has handed out a value, it's just something you shouldn't do multiple times without ++. This is consistent with C++'s "input iterator" concept.
      • Child Blindness: Once you get an array element, it has no control over whether you do anything with it. For example, you could decide not to handle a value if it's an array or object. To control for this, when you ++ the array checks whether there is an unfinished array or object by checking if we're at the current depth. If so, it skips tokens until it's returned to the current depth.
      • Chainable: We allow you to pass an error into the iterator, which it will yield on its first iteration and then stop. This allows error chaining to make its way all the way to the loop: for (auto o : parser.parse(json)) works!
      • C++ Iterator: Because C++ breaks what could be a single next() call into !=, ++, and * calls, we have to break up the algorithm into parts and keep some state between them.
        • operator * Reads a value, advancing once. value takes up the slack from there, incrementing depth if there is a [ or { even if the user doesn't use the value. If there is an error to report, it decrements depth so that the loop will terminate, and returns it.
        • operator ++ Checks if we have a ] (decrementing depth) or , (setting error if no comma).
        • operator != lets the loop continue if current depth >= array depth.
      • Zero Overhead: It keeps state, but that state is all either constant and knowable at compile time, or only has an effect for a single iteration (the first or last). Our goal is to get the compiler to elide them all.
        • document: This member variable is in many objects, but always has the same value. It is likely the compiler will elide it.
        • at_start: This member variable is initially true. If it is true, we check for ] before the first != and then set it to false.
        • error: Whether this member variable is passed in initially or detected by ++, error has no effect unless it is nonzero, and when it is zero, the loop always terminates after the next iteration. We hope this will be elided, therefore, into a trailing control flow.
        • depth: This member variable is constant and knowable at compile time, because depth will have been incremented a constant number of times based on how many nested objects you have. Whether the compiler recognizes this is anybody's game, however :/
    • ondemand::object: Manages object iteration and object["key"].
      • Forward-Only: [] will never go backwards; you must do [] in order to get all the fields you want. Retrieving the same field twice with * will fail. There is currently no check on whether it has handed out a value, it's just something you shouldn't do multiple times without ++. This is consistent with C++'s "input iterator" concept.
      • Child Blindness: Once you get a field or value, the object has no control over whether you do anything with it. For example, you could decide not to handle a value if it's an array or object. To control for this, when you ++ or do a second [], the array checks whether there is an unfinished array or object by checking if we're at the current depth. If so, it skips tokens until it's returned to the current depth.
      • Chainable: We allow you to pass an error into the iterator, which it will yield on its first iteration and then stop. This allows error chaining to make its way all the way to the loop: for (auto o : parser.parse(json)) works!
      • C++ Iterator: Because C++ breaks what could be a single next() call into !=, ++, and * calls, we have to break up the algorithm into parts and keep some state between them.
        • operator * Reads key, : and a value, advancing exactly three times. Returns an error if key or : is invalid. value takes up the slack from there, incrementing depth if there is a [ or { even if the user doesn't use the value. If there is an error to report, it decrements depth so that the loop will terminate, and returns it.
        • operator ++ Checks if we have a } (decrementing depth) or , (setting error if no comma).
        • operator != lets the loop continue if current depth >= object depth.
      • Zero Overhead: It keeps state, but that state is all either constant and knowable at compile time, or only has an effect for a single iteration (the first or last). Our goal is to get the compiler to elide them all.
        • document: This member variable is in many objects, but always has the same value. It is likely the compiler will elide it.
        • at_start: This member variable is initially true. If it is true, we check for } before the first != and then set it to false thereafter. We expect this to be elided in favor of leading control flow.
        • error: Whether this member variable is passed in initially or detected by ++, error has no effect unless it is nonzero, and when it is zero, the loop always terminates after the next iteration. We hope this will be elided, therefore, into a trailing control flow.
        • depth: This member variable is constant and knowable at compile time, because depth will have been incremented a constant number of times based on how many nested objects you have. Whether the compiler recognizes this is anybody's game, however :/
    • ondemand::value: A transient object giving you the opportunity to convert a JSON value into an array, object, or scalar.
      • Forward-Only: This is transient: its value can only be retrieved once. Further retrievals will fail. It is an error to keep multiple value objects around at once (it is also hard to do, but possible).
      • Skippable: If you don't use the value (for example, if you have a field and don't care about the key), the destructor will check if it's { or [ and increment depth, to keep things consistent.
    • ondemand::raw_json_string: Represents the raw json string inside the buffer, terminated by ". This allows you to inspect and do comparisons against the raw, escaped json string without performance penalty.
      • The unescape() method on it will parse it (escapes and all) into a string buffer of your choice.
    • ondemand::token_iterator: Internal. Used to actually track structural iteration. document is the only way users will see this.
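    The forward-only, depth-tracked iteration in the design above can be modeled with a toy cursor (names such as `cursor` and `skip_to_depth` are ours, for illustration only): a single index advances over the input, depth rises on [ and falls on ], and skipping an unconsumed child simply means advancing until we return to our own depth:

    ```cpp
    #include <iostream>
    #include <string>

    // Toy model (not simdjson's code) of forward-only iteration with a
    // single index and a depth counter, as in the design notes above.
    struct cursor {
      std::string buf;
      size_t i = 0;
      int depth = 0;
      char next() {                    // consume one character, tracking depth
        char c = buf[i++];
        if (c == '[') { depth++; }
        if (c == ']') { depth--; }
        return c;
      }
      void skip_to_depth(int target) { // skip an unfinished child array
        while (depth > target && i < buf.size()) { next(); }
      }
    };

    int main() {
      cursor c{"[[12],[34]]"};
      c.next();              // outer '[' -> depth 1
      c.next();              // first child is '[' -> depth 2
      c.skip_to_depth(1);    // caller ignored the inner array: skip it
      std::cout << c.next() << std::endl; // prints ',' between the two children
    }
    ```

    Real code also tracks { and } and validates commas and keys; this keeps only the depth mechanics that make child blindness recoverable.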

    Concerns / Rough Edges

    • Compiler Flags: You have to compile all of simdjson.cpp/simdjson.h with the target flags. Otherwise, simdjson_result<> and other things in the include/ headers can't be inlined with your Haswell-specific code.
    • Heavy Reliance On Optimizers: The object and array structs have four member variables each, which I'm expecting the compiler to elide completely in normal cases. I think these are largely unavoidable given C++'s iterator design. Without that elision, register pressure will be intense and stuff will get shoved into memory. I made some assumptions about how optimizers should work, particularly that it can deduce the depth value since it's constant, and that it can elide variables like at_start into control flow since it only affects the header of the loop.
    • Unconsumed Values: Because the user drives the parse, we hand them array, object and value objects which they can then iterate or call get_string/etc. on. However, if they don't do this, or only partially iterate, then the structural_index and depth won't get updated. I added a destructor to value to check whether the value has been used, and check for start/end array if not. We also keep track of depth so that if child iterations are left partially iterated, we can skip everything until we get back to our own depth. All of this can be optimized away if the compiler is smart enough ... but I'm not convinced it will be :) We'll see.

    Performance

    clang does a lot better on the (relatively complex) twitter benchmark with ondemand than g++. I assume this has to do with its affinity for SSA optimizations:

    Haswell clang10.0 (Skylake)

    | Benchmark | Generic DOM | On-Demand | SAX |
    |------------|---|---|---|
    | PartialTweets | 2.3 GB/s | 4.0 GB/s | 3.5 GB/s |
    | LargeRandom | 0.50 GB/s | 0.71 GB/s | 0.71 GB/s |

    Haswell gcc10 (Skylake)

    GCC is more or less on par with clang:

    | Benchmark | DOM | On-Demand | SAX |
    |------------|---|---|---|
    | PartialTweets | 2.5 GB/s | 3.8 GB/s | 3.7 GB/s |
    | LargeRandom | 0.50 GB/s | 0.78 GB/s | 0.74 GB/s |

    Running the Benchmark

    You can see several examples in benchmark/bench_ondemand.cpp. To compile for your native platform, do this:

    rm -rf build
    mkdir build
    cd build
    cmake -DCMAKE_CXX_FLAGS="-march=native" ..
    make bench_ondemand
    benchmark/bench_ondemand --benchmark_counters_tabular=true
    

    Raw Data: Haswell clang10.0 (Skylake)

    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Benchmark                         Time             CPU   Iterations best_branch_miss best_bytes_per_sec best_cache_miss best_cache_ref best_cycles best_cycles_per_byte best_docs_per_sec best_frequency best_instructions best_instructions_per_byte best_instructions_per_cycle best_items_per_sec branch_miss      bytes bytes_per_second cache_miss  cache_ref     cycles cycles_per_byte docs_per_sec  frequency instructions instructions_per_byte instructions_per_cycle      items items_per_second
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    PartialTweets<OnDemand>      165179 ns       165149 ns         4237           1.538k             4.056G               0        58.152k    574.945k             0.910422          6.42265k       3.69267G          1.93943M                    3.07108                     3.37325           642.265k    1.71353k   631.515k       3.56129G/s   8.96861m   58.1731k   580.103k         0.91859   6.05512k/s  3.5126G/s     1.93943M               3.07108                3.34325        100       605.512k/s [best: throughput=  4.06 GB/s doc_throughput=  6422 docs/s instructions=     1939432 cycles=      574945 branch_miss=    1538 cache_miss=       0 cache_ref=     58152 items=       100 avg_time=    157210 ns]
    PartialTweets<Iter>          182774 ns       182773 ns         3828           2.825k           3.66457G               0         58.28k    636.406k              1.00774          5.80282k       3.69295G             1.84M                    2.91363                     2.89124           580.282k    3.08751k   631.515k       3.21789G/s    3.9185m    58.158k   644.682k         1.02085   5.47127k/s 3.52723G/s        1.84M               2.91363                2.85412        100       547.127k/s [best: throughput=  3.66 GB/s doc_throughput=  5802 docs/s instructions=     1840002 cycles=      636406 branch_miss=    2825 cache_miss=       0 cache_ref=     58280 items=       100 avg_time=    174683 ns]
    PartialTweets<Dom>           282433 ns       282382 ns         2480           3.149k           2.31979G               0        92.404k    1005.16k              1.59167          3.67338k       3.69235G           2.9721M                     4.7063                     2.95682           367.338k     3.2693k   631.515k        2.0828G/s   0.439516   92.8304k   1011.84k         1.60225    3.5413k/s 3.58324G/s      2.9721M                4.7063                2.93731        100        354.13k/s [best: throughput=  2.32 GB/s doc_throughput=  3673 docs/s instructions=     2972097 cycles=     1005165 branch_miss=    3149 cache_miss=       0 cache_ref=     92404 items=       100 avg_time=    274248 ns]
    Creating a source file spanning 44921 KB 
    LargeRandom<Dom>           91468995 ns     91467607 ns            8         968.537k           503.312M        10.8974M       15.4273M    337.109M              7.32865           10.9419        3.6886G          1041.23M                    22.6361                     3.08872           10.9419M    967.752k   45.9988M         479.6M/s   10.9539M   15.4278M   337.335M         7.33356    10.9328/s 3.68803G/s     1041.23M               22.6361                3.08665      1000k       10.9328M/s [best: throughput=  0.50 GB/s doc_throughput=    10 docs/s instructions=  1041233885 cycles=   337109080 branch_miss=  968537 cache_miss=10897426 cache_ref=  15427295 items=   1000000 avg_time=  91455371 ns]
    LargeRandomSum<Dom>        89605540 ns     89588793 ns            8         968.435k           514.102M        10.3415M       14.5719M     330.07M              7.17562           11.1764         3.689G          1022.23M                    22.2231                     3.09702           11.1764M    968.274k   45.9988M       489.658M/s   10.3269M   14.5726M   330.174M         7.17789    11.1621/s 3.68544G/s     1022.23M               22.2231                3.09604      1000k       11.1621M/s [best: throughput=  0.51 GB/s doc_throughput=    11 docs/s instructions=  1022233883 cycles=   330069936 branch_miss=  968435 cache_miss=10341543 cache_ref=  14571861 items=   1000000 avg_time=  89591929 ns]
    LargeRandom<OnDemand>      64622779 ns     64622223 ns           11         929.755k           712.493M        5.63348M       8.01073M     238.12M              5.17666           15.4894       3.68834G           648.69M                    14.1023                     2.72422           15.4894M    930.709k   45.9988M       678.835M/s   5.66211M   8.01242M   238.304M         5.18066    15.4746/s 3.68765G/s      648.69M               14.1023                2.72212      1000k       15.4746M/s [best: throughput=  0.71 GB/s doc_throughput=    15 docs/s instructions=   648690465 cycles=   238120090 branch_miss=  929755 cache_miss= 5633485 cache_ref=   8010731 items=   1000000 avg_time=  64609673 ns]
    LargeRandomSum<OnDemand>   64569064 ns     64568969 ns           11         951.323k           714.434M        4.98723M       7.12271M    237.524M              5.16371           15.5316       3.68913G           641.69M                    13.9502                     2.70158           15.5316M    958.366k   45.9988M       679.395M/s    5.0381M   7.12423M    238.12M         5.17667    15.4873/s 3.68785G/s      641.69M               13.9502                2.69481      1000k       15.4873M/s [best: throughput=  0.71 GB/s doc_throughput=    15 docs/s instructions=   641690192 cycles=   237524226 branch_miss=  951323 cache_miss= 4987231 cache_ref=   7122709 items=   1000000 avg_time=  64556393 ns]
    LargeRandom<Iter>          60862746 ns     60863442 ns           11         990.089k           757.035M        5.62286M       7.98759M    224.156M              4.87309           16.4577        3.6891G          581.692M                    12.6458                     2.59503           16.4577M    995.475k   45.9988M       720.759M/s   5.65907M   7.98859M   224.456M          4.8796    16.4302/s 3.68786G/s     581.692M               12.6458                2.59157      1000k       16.4302M/s [best: throughput=  0.76 GB/s doc_throughput=    16 docs/s instructions=   581691751 cycles=   224156097 branch_miss=  990089 cache_miss= 5622863 cache_ref=   7987593 items=   1000000 avg_time=  60850056 ns]
    LargeRandomSum<Iter>       59778441 ns     59777987 ns           12          981.57k           770.555M        5.01271M       7.15428M    220.194M              4.78696           16.7516       3.68861G          570.691M                    12.4067                     2.59177           16.7516M    986.227k   45.9988M       733.846M/s   5.05344M   7.15511M   220.459M         4.79271    16.7286/s 3.68796G/s     570.691M               12.4067                2.58865      1000k       16.7286M/s [best: throughput=  0.77 GB/s doc_throughput=    16 docs/s instructions=   570691393 cycles=   220194110 branch_miss=  981570 cache_miss= 5012710 cache_ref=   7154280 items=   1000000 avg_time=  59765975 ns]
    Creating a source file spanning 134087 KB 
    Kostya<Dom>                94265351 ns     94266428 ns            7          1045.8k           1.45789G        15.7114M       22.3239M    347.272M               2.5292           10.6179        3.6873G          975.883M                    7.10741                     2.81014           5.56684M    1045.46k   137.305M       1.35653G/s   15.7288M   22.2886M   347.637M         2.53186    10.6082/s 3.68782G/s     975.883M               7.10741                2.80719   524.288k       5.56177M/s [best: throughput=  1.46 GB/s doc_throughput=    10 docs/s instructions=   975882556 cycles=   347271978 branch_miss= 1045795 cache_miss=15711430 cache_ref=  22323923 items=    524288 avg_time=  94251475 ns]
    KostyaSum<Dom>             93482419 ns     93481371 ns            7         1048.42k           1.47166G        15.4012M       21.9002M    344.184M              2.50671           10.7182       3.68902G          970.115M                     7.0654                      2.8186            5.6194M    1049.48k   137.305M       1.36792G/s   15.4135M   21.7919M   344.773M           2.511    10.6973/s 3.68815G/s     970.115M                7.0654                2.81378   524.288k       5.60848M/s [best: throughput=  1.47 GB/s doc_throughput=    10 docs/s instructions=   970115386 cycles=   344183956 branch_miss= 1048419 cache_miss=15401206 cache_ref=  21900243 items=    524288 avg_time=  93468372 ns]
    Kostya<OnDemand>           59869079 ns     59857167 ns           12         468.968k            2.2974G        9.97175M       13.9606M    220.483M              1.60579           16.7321       3.68914G          635.858M                    4.63099                     2.88393           8.77243M    472.341k   137.305M       2.13634G/s   9.99057M   13.9227M   220.746M         1.60771    16.7064/s 3.68788G/s     635.858M               4.63099                2.88049   524.288k       8.75898M/s [best: throughput=  2.30 GB/s doc_throughput=    16 docs/s instructions=   635857684 cycles=   220483050 branch_miss=  468968 cache_miss= 9971751 cache_ref=  13960563 items=    524288 avg_time=  59856199 ns]
    KostyaSum<OnDemand>        60280325 ns     60279422 ns           12         469.529k           2.27999G        9.67866M       13.4947M    222.157M              1.61799           16.6053       3.68898G          630.615M                     4.5928                     2.83859           8.70594M    469.704k   137.305M       2.12137G/s   9.67636M   13.4264M   222.299M         1.61902    16.5894/s 3.68781G/s     630.615M                4.5928                2.83678   524.288k       8.69763M/s [best: throughput=  2.28 GB/s doc_throughput=    16 docs/s instructions=   630614946 cycles=   222157441 branch_miss=  469529 cache_miss= 9678657 cache_ref=  13494711 items=    524288 avg_time=  60267368 ns]
    Kostya<Iter>               61758017 ns     61757605 ns           11         497.377k           2.22614G        9.95741M       13.9293M    227.537M              1.65716           16.2131       3.68908G          606.497M                    4.41715                     2.66549           8.50035M    497.937k   137.305M        2.0706G/s   9.99912M   13.9207M   227.752M         1.65873    16.1923/s 3.68785G/s     606.497M               4.41715                2.66297   524.288k       8.48945M/s [best: throughput=  2.23 GB/s doc_throughput=    16 docs/s instructions=   606497405 cycles=   227536701 branch_miss=  497377 cache_miss= 9957411 cache_ref=  13929258 items=    524288 avg_time=  61745137 ns]
    KostyaSum<Iter>            59370790 ns     59359390 ns           12         464.597k           2.31499G        9.64345M       13.4522M    218.801M              1.59354           16.8602       3.68902G          597.061M                    4.34843                     2.72878           8.83958M    464.774k   137.305M       2.15425G/s   9.67782M   13.4532M    218.89M         1.59419    16.8465/s 3.68754G/s     597.061M               4.34843                2.72767   524.288k       8.83244M/s [best: throughput=  2.31 GB/s doc_throughput=    16 docs/s instructions=   597060518 cycles=   218801084 branch_miss=  464597 cache_miss= 9643448 cache_ref=  13452173 items=    524288 avg_time=  59357614 ns]
    
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Benchmark                    Time             CPU   Iterations best_branch_miss best_bytes_per_sec best_cache_miss best_cache_ref best_cycles best_cycles_per_byte best_docs_per_sec best_frequency best_instructions best_instructions_per_byte best_instructions_per_cycle best_items_per_sec branch_miss      bytes bytes_per_second cache_miss  cache_ref     cycles cycles_per_byte docs_per_sec  frequency instructions instructions_per_byte instructions_per_cycle      items items_per_second
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    PartialTweets<Sax>      191072 ns       191072 ns         3670           1.323k           3.47984G               0        59.009k    670.174k              1.06122          5.51031k       3.69287G          2.17912M                    3.45062                     3.25157           551.031k    1.47164k   631.515k       3.07814G/s   0.013624   58.9976k   674.469k         1.06802   5.23364k/s 3.52993G/s     2.17912M               3.45062                3.23086        100       523.364k/s [best: throughput=  3.48 GB/s doc_throughput=  5510 docs/s instructions=     2179117 cycles=      670174 branch_miss=    1323 cache_miss=       0 cache_ref=     59009 items=       100 avg_time=    182754 ns]
    Creating a source file spanning 44921 KB 
    LargeRandom<Dom>      91475275 ns     91476347 ns            8         933.026k           503.328M        11.1398M       15.6303M    337.144M              7.32942           10.9422        3.6891G          1040.23M                    22.6144                     3.08543           10.9422M     932.62k   45.9988M       479.554M/s   11.1208M   15.6317M   337.404M         7.33507    10.9318/s 3.68843G/s     1040.23M               22.6144                3.08305      1000k       10.9318M/s [best: throughput=  0.50 GB/s doc_throughput=    10 docs/s instructions=  1040233883 cycles=   337144298 branch_miss=  933026 cache_miss=11139804 cache_ref=  15630295 items=   1000000 avg_time=  91461980 ns]
    LargeRandomSum<Dom>   90329253 ns     90329264 ns            8         932.216k            509.83M        10.4788M       14.7649M    332.849M              7.23603           11.0836       3.68915G          1022.23M                    22.2231                     3.07117           11.0836M    932.897k   45.9988M       485.644M/s    10.522M   14.7658M   333.176M         7.24315    11.0706/s 3.68846G/s     1022.23M               22.2231                3.06815      1000k       11.0706M/s [best: throughput=  0.51 GB/s doc_throughput=    11 docs/s instructions=  1022233881 cycles=   332848515 branch_miss=  932216 cache_miss=10478836 cache_ref=  14764904 items=   1000000 avg_time=  90315728 ns]
    LargeRandom<Sax>      67111081 ns     67111854 ns           10         973.014k           686.397M         5.6914M       8.09484M    247.225M               5.3746           14.9221        3.6891G          675.692M                    14.6893                     2.73311           14.9221M    975.305k   45.9988M       653.653M/s    5.7521M    8.0964M   247.525M         5.38113    14.9005/s 3.68825G/s     675.692M               14.6893                2.72979      1000k       14.9005M/s [best: throughput=  0.69 GB/s doc_throughput=    14 docs/s instructions=   675691776 cycles=   247224897 branch_miss=  973014 cache_miss= 5691399 cache_ref=   8094842 items=   1000000 avg_time=  67098397 ns]
    

    Raw Data: Haswell gcc 10.2 (Skylake)

    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Benchmark                         Time             CPU   Iterations best_branch_miss best_bytes_per_sec best_cache_miss best_cache_ref best_cycles best_cycles_per_byte best_docs_per_sec best_frequency best_instructions best_instructions_per_byte best_instructions_per_cycle best_items_per_sec branch_miss      bytes bytes_per_second cache_miss  cache_ref     cycles cycles_per_byte docs_per_sec  frequency instructions instructions_per_byte instructions_per_cycle      items items_per_second
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    PartialTweets<OnDemand>      177528 ns       177530 ns         3941           1.657k           3.74809G               0        54.903k    622.194k              0.98524          5.93507k       3.69277G          2.11159M                    3.34369                     3.39378           593.507k    1.80355k   631.515k       3.31293G/s  0.0284192   54.9838k   626.235k        0.991639   5.63285k/s 3.52748G/s     2.11159M               3.34369                3.37188        100       563.285k/s [best: throughput=  3.75 GB/s doc_throughput=  5935 docs/s instructions=     2111588 cycles=      622194 branch_miss=    1657 cache_miss=       0 cache_ref=     54903 items=       100 avg_time=    169681 ns]
    PartialTweets<Iter>          384499 ns       384501 ns         1819           2.968k           1.68681G               0         55.31k    1.38216M              2.18865          2.67105k       3.69182G          4.40862M                    6.98102                     3.18965           267.105k    3.19399k   631.515k       1.52963G/s   0.239142   55.4276k   1.38969M         2.20056   2.60078k/s 3.61426G/s     4.40862M               6.98102                3.17239        100       260.078k/s [best: throughput=  1.69 GB/s doc_throughput=  2671 docs/s instructions=     4408621 cycles=     1382163 branch_miss=    2968 cache_miss=       0 cache_ref=     55310 items=       100 avg_time=    376595 ns]
    PartialTweets<Dom>           266111 ns       266112 ns         2630            3.53k           2.46696G               0        87.587k    945.164k              1.49666          3.90642k       3.69221G          2.91945M                    4.62293                     3.08883           390.642k    3.70404k   631.515k       2.21014G/s  0.0285171   87.3744k   952.087k         1.50762   3.75782k/s 3.57777G/s     2.91945M               4.62293                3.06637        100       375.782k/s [best: throughput=  2.47 GB/s doc_throughput=  3906 docs/s instructions=     2919449 cycles=      945164 branch_miss=    3530 cache_miss=       0 cache_ref=     87587 items=       100 avg_time=    257984 ns]
    Creating a source file spanning 44921 KB 
    LargeRandom<Dom>           91849988 ns     91849888 ns            8         889.651k           502.486M        10.9371M       15.2509M    337.691M              7.34129           10.9239        3.6889G          970.316M                    21.0944                     2.87339           10.9239M    889.185k   45.9988M       477.604M/s   11.0046M   15.2543M   338.763M         7.36461    10.8873/s 3.68822G/s     970.316M               21.0944                2.86429      1000k       10.8873M/s [best: throughput=  0.50 GB/s doc_throughput=    10 docs/s instructions=   970315574 cycles=   337690544 branch_miss=  889651 cache_miss=10937148 cache_ref=  15250891 items=   1000000 avg_time=  91836392 ns]
    LargeRandomSum<Dom>        92188755 ns     92188563 ns            8         889.635k           499.861M          10.36M       14.4073M    339.484M              7.38027           10.8668       3.68911G          974.316M                    21.1813                     2.86999           10.8668M    889.272k   45.9988M       475.849M/s   10.4213M   14.4099M   340.028M          7.3921    10.8473/s 3.68839G/s     974.316M               21.1813                 2.8654      1000k       10.8473M/s [best: throughput=  0.50 GB/s doc_throughput=    10 docs/s instructions=   974315578 cycles=   339483518 branch_miss=  889635 cache_miss=10360012 cache_ref=  14407265 items=   1000000 avg_time=  92175359 ns]
    LargeRandom<OnDemand>      58992605 ns     58991725 ns           12         869.377k           781.677M        5.62944M       7.89403M    217.093M              4.71954           16.9934       3.68916G          615.695M                     13.385                     2.83609           16.9934M    868.032k   45.9988M       743.627M/s   5.67331M   7.89681M    217.57M         4.72991    16.9515/s 3.68815G/s     615.695M                13.385                2.82987      1000k       16.9515M/s [best: throughput=  0.78 GB/s doc_throughput=    16 docs/s instructions=   615694894 cycles=   217093162 branch_miss=  869377 cache_miss= 5629445 cache_ref=   7894027 items=   1000000 avg_time=  58980167 ns]
    LargeRandomSum<OnDemand>   56594492 ns     56594225 ns           12         876.324k           813.963M        5.01997M       7.05425M    208.485M               4.5324           17.6953       3.68921G          606.695M                    13.1894                     2.91002           17.6953M    876.066k   45.9988M       775.129M/s     5.066M   7.05672M   208.735M         4.53784    17.6696/s 3.68828G/s     606.695M               13.1894                2.90653      1000k       17.6696M/s [best: throughput=  0.81 GB/s doc_throughput=    17 docs/s instructions=   606694893 cycles=   208485037 branch_miss=  876324 cache_miss= 5019967 cache_ref=   7054246 items=   1000000 avg_time=  56582402 ns]
    LargeRandom<Iter>          53364551 ns     53364201 ns           13         894.323k            863.44M        5.63805M       7.89683M    196.541M              4.27273           18.7709       3.68925G          570.695M                    12.4067                      2.9037           18.7709M    894.787k   45.9988M       822.046M/s   5.66445M   7.89822M    196.82M          4.2788    18.7392/s 3.68823G/s     570.695M               12.4067                2.89958      1000k       18.7392M/s [best: throughput=  0.86 GB/s doc_throughput=    18 docs/s instructions=   570694596 cycles=   196540518 branch_miss=  894323 cache_miss= 5638049 cache_ref=   7896828 items=   1000000 avg_time=  53352485 ns]
    LargeRandomSum<Iter>       54883627 ns     54883439 ns           13         871.251k           841.314M        5.02219M       7.05069M    201.706M              4.38502           18.2899       3.68918G          577.695M                    12.5589                     2.86405           18.2899M    871.778k   45.9988M       799.291M/s   5.06164M   7.05285M   202.423M         4.40061    18.2204/s 3.68823G/s     577.695M               12.5589                2.85391      1000k       18.2204M/s [best: throughput=  0.84 GB/s doc_throughput=    18 docs/s instructions=   577695426 cycles=   201705764 branch_miss=  871251 cache_miss= 5022193 cache_ref=   7050692 items=   1000000 avg_time=  54871279 ns]
    Creating a source file spanning 134087 KB 
    Kostya<Dom>                86984857 ns     86984354 ns            8         494.739k           1.58086G        15.8348M       22.1883M      320.4M              2.33349           11.5135       3.68893G          936.468M                    6.82035                     2.92281            6.0364M    494.617k   137.305M       1.47009G/s    15.849M   22.1757M   320.827M          2.3366    11.4963/s 3.68833G/s     936.468M               6.82035                2.91892   524.288k       6.02738M/s [best: throughput=  1.58 GB/s doc_throughput=    11 docs/s instructions=   936467833 cycles=   320400264 branch_miss=  494739 cache_miss=15834791 cache_ref=  22188266 items=    524288 avg_time=  86971545 ns]
    KostyaSum<Dom>             86970961 ns     86970134 ns            8         495.135k           1.58123G        15.5502M       21.6331M    320.352M              2.33314           11.5162       3.68924G          938.565M                    6.83562                     2.92979           6.03781M    494.743k   137.305M       1.47034G/s   15.6095M   21.6834M   320.783M         2.33628    11.4982/s 3.68842G/s     938.565M               6.83562                2.92586   524.288k       6.02837M/s [best: throughput=  1.58 GB/s doc_throughput=    11 docs/s instructions=   938564987 cycles=   320352061 branch_miss=  495135 cache_miss=15550182 cache_ref=  21633093 items=    524288 avg_time=  86957969 ns]
    Kostya<OnDemand>           60343057 ns     60343775 ns           12         456.213k           2.28197G        10.1305M       13.9803M    221.966M              1.61659           16.6197       3.68902G          647.782M                    4.71783                     2.91838           8.71353M    456.138k   137.305M       2.11911G/s   10.1682M    13.868M   222.551M         1.62085    16.5717/s 3.68805G/s     647.782M               4.71783                2.91072   524.288k       8.68835M/s [best: throughput=  2.28 GB/s doc_throughput=    16 docs/s instructions=   647782119 cycles=   221966325 branch_miss=  456213 cache_miss=10130481 cache_ref=  13980257 items=    524288 avg_time=  60330609 ns]
    KostyaSum<OnDemand>        58642231 ns     58641748 ns           12          453.15k            2.3471G        9.82263M       13.5381M    215.814M              1.57178            17.094       3.68913G          643.064M                    4.68347                     2.97972            8.9622M    453.464k   137.305M       2.18062G/s   9.84356M   13.5389M    216.28M         1.57518    17.0527/s 3.68816G/s     643.064M               4.68347                2.97329   524.288k       8.94052M/s [best: throughput=  2.35 GB/s doc_throughput=    17 docs/s instructions=   643063529 cycles=   215813521 branch_miss=  453150 cache_miss= 9822627 cache_ref=  13538124 items=    524288 avg_time=  58629573 ns]
    Kostya<Iter>               59348929 ns     59348585 ns           12          452.97k           2.32237G        10.0784M       13.9769M    218.108M              1.58849           16.9139       3.68906G          642.015M                    4.67583                     2.94357           8.86776M    453.045k   137.305M       2.15465G/s   10.1584M    13.924M    218.88M         1.59412    16.8496/s 3.68804G/s     642.015M               4.67583                2.93318   524.288k       8.83404M/s [best: throughput=  2.32 GB/s doc_throughput=    16 docs/s instructions=   642015174 cycles=   218107739 branch_miss=  452970 cache_miss=10078433 cache_ref=  13976859 items=    524288 avg_time=  59336271 ns]
    KostyaSum<Iter>           121340358 ns    121337200 ns            6         453.895k           1.13236G          9.935M       13.6518M    447.335M              3.25796           8.24704       3.68919G          1.31992G                    9.61305                     2.95063           4.32383M    454.037k   137.305M       1079.18M/s   9.95978M   13.6513M   447.577M         3.25973     8.2415/s 3.68871G/s     1.31992G               9.61305                2.94903   524.288k       4.32092M/s [best: throughput=  1.13 GB/s doc_throughput=     8 docs/s instructions=  1319919270 cycles=   447334605 branch_miss=  453895 cache_miss= 9934995 cache_ref=  13651799 items=    524288 avg_time= 121327101 ns]
    
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Benchmark                    Time             CPU   Iterations best_branch_miss best_bytes_per_sec best_cache_miss best_cache_ref best_cycles best_cycles_per_byte best_docs_per_sec best_frequency best_instructions best_instructions_per_byte best_instructions_per_cycle best_items_per_sec branch_miss      bytes bytes_per_second cache_miss  cache_ref     cycles cycles_per_byte docs_per_sec  frequency instructions instructions_per_byte instructions_per_cycle      items items_per_second
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    PartialTweets<Sax>      181416 ns       181385 ns         3854           1.384k           3.67147G               0        58.443k    635.165k              1.00578          5.81375k       3.69269G          2.07459M                     3.2851                     3.26623           581.375k    1.53157k   631.515k       3.24252G/s   0.044629   58.4857k   640.112k         1.01361   5.51314k/s 3.52903G/s     2.07459M                3.2851                3.24098        100       551.314k/s [best: throughput=  3.67 GB/s doc_throughput=  5813 docs/s instructions=     2074593 cycles=      635165 branch_miss=    1384 cache_miss=       0 cache_ref=     58443 items=       100 avg_time=    173490 ns]
    Creating a source file spanning 44921 KB 
    LargeRandom<Dom>      88806316 ns     88806323 ns            8         871.991k           518.893M        10.8318M       15.3949M    326.983M              7.10851           11.2806       3.68855G          970.316M                    21.0944                     2.96748           11.2806M    872.063k   45.9988M       493.972M/s   10.8779M   15.3961M   327.465M         7.11899    11.2605/s 3.68741G/s     970.316M               21.0944                2.96311      1000k       11.2605M/s [best: throughput=  0.52 GB/s doc_throughput=    11 docs/s instructions=   970315579 cycles=   326982612 branch_miss=  871991 cache_miss=10831777 cache_ref=  15394882 items=   1000000 avg_time=  88792975 ns]
    LargeRandomSum<Dom>   89824596 ns     89807350 ns            8          872.01k           513.081M        10.3216M       14.5612M    330.716M              7.18967           11.1542       3.68889G          974.316M                    21.1813                     2.94608           11.1542M    871.863k   45.9988M       488.466M/s   10.3039M   14.5623M   331.208M         7.20036    11.1349/s 3.68798G/s     974.316M               21.1813                 2.9417      1000k       11.1349M/s [best: throughput=  0.51 GB/s doc_throughput=    11 docs/s instructions=   974315579 cycles=   330716103 branch_miss=  872010 cache_miss=10321626 cache_ref=  14561194 items=   1000000 avg_time=  89811290 ns]
    LargeRandom<Sax>      62784008 ns     62783943 ns           11         913.123k           738.521M        5.61415M       8.01956M    229.784M              4.99545           16.0552       3.68924G          672.695M                    14.6242                      2.9275           16.0552M    918.166k   45.9988M       698.711M/s   5.65478M   8.02137M   231.532M         5.03344    15.9276/s 3.68776G/s     672.695M               14.6242                2.90541      1000k       15.9276M/s [best: throughput=  0.74 GB/s doc_throughput=    16 docs/s instructions=   672694521 cycles=   229784372 branch_miss=  913123 cache_miss= 5614148 cache_ref=   8019564 items=   1000000 avg_time=  62771549 ns]
    

    Loose Ends

    Things that likely won't get finished in this checkin, but might be considered for a full release (and also might not :)):

    • Bugs
      • Possible corruption with incomplete arrays / objects ("[ 1, 2, ")
      • Win32 / VS2019 failure (#1208)
    • Rough Edges
      • x["a"]["b"] unsupported (right now x["a"] would be released early and the element never gets fully skipped)
      • parser.load()
      • parser.iterate(buf, len)
      • get_c_str()
      • Out-of-order key lookup support
    • Features
      • Print / minify
      • document_stream
      • Validation of skipped values
      • Strict object support (don't ever skip keys, error immediately if the next key is not what you expect)
      • Nullable value support: .get_nullable_int64()
      • .is_false_or_null() is probably useful ...
      • SIMDJSON_ONDEMAND_SAFETY_RAILS tests
      • Compile-time safety tests (make sure bad things don't compile)
    • Performance
      • Add ondemand to competitions
      • Make more competitions
      • Performance optimization for recursion
      • Tuple support [ x, y, z ]? It would force-unroll loops, basically, possibly improving performance.
      • Ability to stop iterating when finished (right now it will inspect and skip all remaining elements)
    • Sanity checks:
      • Sanity review of the & and && versions of methods (I hate the error messages, but I hate letting people compile programs that will fail at runtime even more). Can we make it so people can use things in more flexible ways (for example, ["a"]["b"] above)? Can we make the error messages better?
      • Sanity review of document vs. value. I don't like that they behave almost identically but have different code. Even setting aside the root value parsing difference, the fact that document owns the iterator while value holds a reference makes the code fundamentally non-reusable. We should look into doing something about that.

    Next Steps

    • [X] Supported way to use it on multiple platforms ("single kernel mode"? Plug in to simdjson's architecture selection?)
    • [X] Parse numbers/booleans at the root correctly, without overrun
    • [X] Don't overrun when objects/arrays are unbalanced
    • [X] Thorough type tests similar to DOM API tests
    • [X] Error tests for common error cases (to make sure errors are actually raised and iteration stops)
    • [X] Last performance check to ensure we haven't dropped below 4.0GB/s in the final round of fixes
    • [X] Resolve compiler failures on other platforms
    enhancement performance research 
    opened by jkeiser 187
  • Bringing ndjson(document_stream) to On Demand

    Bringing ndjson(document_stream) to On Demand

    I have been trying to implement a simple document stream for On Demand. The main issue, as discussed by @lemire in this comment, is that On Demand does not know where the end of a single document lies in a document stream (when stage 1 covers multiple documents). To overcome this, I created a JSON iterator that traverses a document and moves to the next document when needed. This JSON iterator is also used to create a document instance when the * operator is called.
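
    The lazy-boundary iterator idea can be sketched in a few lines. The toy class below is hypothetical, not simdjson's code: it iterates newline-delimited JSON, with operator++ scanning forward to the next boundary only when asked and operator* materializing a view of the current document on demand:

    ```cpp
    #include <cassert>
    #include <string_view>

    // Toy iterator over newline-delimited JSON (NDJSON). Like the On Demand
    // JSON iterator described above, it does not compute all document
    // boundaries up front: operator++ scans forward to the next delimiter,
    // and operator* hands back a view of the current document only when asked.
    class ndjson_iterator {
      std::string_view buf;
      size_t begin = 0, end = 0;

    public:
      explicit ndjson_iterator(std::string_view b) : buf(b) {
        end = buf.find('\n');
        if (end == std::string_view::npos) end = buf.size();
      }
      // Materialize a view of the current document lazily.
      std::string_view operator*() const { return buf.substr(begin, end - begin); }
      // Advance to the next document: the next boundary is found only now.
      ndjson_iterator &operator++() {
        begin = (end < buf.size()) ? end + 1 : buf.size();
        end = buf.find('\n', begin);
        if (end == std::string_view::npos) end = buf.size();
        return *this;
      }
      bool done() const { return begin >= buf.size(); }
    };

    int main() {
      ndjson_iterator it("{\"x\":1}\n{\"x\":2}");
      assert(*it == "{\"x\":1}");
      ++it;
      assert(*it == "{\"x\":2}");
      ++it;
      assert(it.done());
      return 0;
    }
    ```

    The real implementation additionally has to coordinate with stage 1 (which indexes several documents per batch), but the ownership pattern is the same: the stream owns the buffer, the iterator owns only a position.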

    • [x] implements threaded version so that stage 1 is processed independently
    • [x] add the rest of the tests so that we have as good a coverage as the DOM document_stream,
    • [x] add documentation and examples,
    • [x] add benchmarking.

    Fixes https://github.com/simdjson/simdjson/issues/1464

    opened by NicolasJiaxin 106
  • MSVC simdjson is slower than g++ on Windows

    MSVC simdjson is slower than g++ on Windows

    On the same machine and OS, WSL g++ 7.5-compiled simdjson parses at 2.6 GB/s while MSVC 2019-compiled simdjson parses at 1.0 GB/s. ClangCL parses at 1.4 GB/s, so there might be a link.exe issue involved. My machine is Kaby Lake R (AVX2 but not AVX-512).

    After investigation: these seem to be the major impactors:

    • [ ] 40%: @TrianglesPCT may be fixing some or all of the largest regression, caused by the generic SIMD code, by removing lambdas.
    • [ ] 10%: We need to understand why this did not fully recover the performance we had before this. Either one of them could be the culprit, but it's probably not anything in between.
    • [ ] 10%: We need to understand why we lost another 10% to the stage 1 structural scanner refactor.

    Data

    g++ 7.5.0 under WSL

    [email protected]:~/simdjson/build$ benchmark/parse ../jsonexamples/twitter.json
    number of iterations 200 
                                                         
    ../jsonexamples/twitter.json
    
         9867 blocks -     631515 bytes - 55263 structurals (  8.8 %)
    special blocks with: utf8      2284 ( 23.1 %) - escape       598 (  6.1 %) - 0 structurals      1287 ( 13.0 %) - 1+ structurals      8581 ( 87.0 %) - 8+ structurals      3272 ( 33.2 %) - 16+ structurals         0 (  0.0 %)
    special block flips: utf8      1104 ( 11.2 %) - escape       642 (  6.5 %) - 0 structurals       940 (  9.5 %) - 1+ structurals       940 (  9.5 %) - 8+ structurals      2593 ( 26.3 %) - 16+ structurals         0 (  0.0 %)
    
    All Stages
    |    Speed        :  24.3210 ns per block ( 70.04%) -   0.3800 ns per byte -   4.3429 ns per structural -    2.631 GB/s
    |- Stage 1
    |    Speed        :  11.5728 ns per block ( 33.33%) -   0.1808 ns per byte -   2.0665 ns per structural -    5.530 GB/s
    |- Stage 2
    |    Speed        :  12.6267 ns per block ( 36.36%) -   0.1973 ns per byte -   2.2547 ns per structural -    5.068 GB/s
    
    3181.7 documents parsed per second
    

    VS 2019 (cl.exe 19.25.28614)

    PS C:\Users\john\Source\simdjson\build> .\benchmark\Release\parse.exe ..\jsonexamples\twitter.json
    number of iterations 200 
    
    ..\jsonexamples\twitter.json
    
         9867 blocks -     631515 bytes - 55263 structurals (  8.8 %)
    special blocks with: utf8      2284 ( 23.1 %) - escape       598 (  6.1 %) - 0 structurals      1287 ( 13.0 %) - 1+ structurals      8581 ( 87.0 %) - 8+ structurals      3272 ( 33.2 %) - 16+ structurals         0 (  0.0 %)
    special block flips: utf8      1104 ( 11.2 %) - escape       642 (  6.5 %) - 0 structurals       940 (  9.5 %) - 1+ structurals       940 (  9.5 %) - 8+ structurals      2593 ( 26.3 %) - 16+ structurals         0 (  0.0 %)
    
    All Stages
    |    Speed        :  65.5249 ns per block ( 83.29%) -   1.0239 ns per byte -  11.7004 ns per structural -    0.977 GB/s
    |- Allocation
    |    Speed        :   2.8679 ns per block (  3.65%) -   0.0448 ns per byte -   0.5121 ns per structural -   22.315 GB/s
    |- Stage 1
    |    Speed        :  32.2862 ns per block ( 41.04%) -   0.5045 ns per byte -   5.7652 ns per structural -    1.982 GB/s
    |- Stage 2
    |    Speed        :  29.4285 ns per block ( 37.41%) -   0.4598 ns per byte -   5.2549 ns per structural -    2.175 GB/s
    
    1976.0 documents parsed per second
    

    VS 2019 (cl.exe 19.25.28614) with /arch:AVX2

    Compiling with /arch:AVX2 only gave a 10% improvement:

    PS C:\Users\john\Source\simdjson\build> .\benchmark\Release\parse.exe ..\jsonexamples\twitter.json
    number of iterations 200
    
    ..\jsonexamples\twitter.json
    
         9867 blocks -     631515 bytes - 55263 structurals (  8.8 %)
    special blocks with: utf8      2284 ( 23.1 %) - escape       598 (  6.1 %) - 0 structurals      1287 ( 13.0 %) - 1+ structurals      8581 ( 87.0 %) - 8+ structurals      3272 ( 33.2 %) - 16+ structurals         0 (  0.0 %)
    special block flips: utf8      1104 ( 11.2 %) - escape       642 (  6.5 %) - 0 structurals       940 (  9.5 %) - 1+ structurals       940 (  9.5 %) - 8+ structurals      2593 ( 26.3 %) - 16+ structurals         0 (  0.0 %)
    
    All Stages
    |    Speed        :  60.7013 ns per block ( 82.70%) -   0.9485 ns per byte -  10.8391 ns per structural -    1.054 GB/s
    |- Allocation
    |    Speed        :   2.4726 ns per block (  3.37%) -   0.0386 ns per byte -   0.4415 ns per structural -   25.882 GB/s
    |- Stage 1
    |    Speed        :  27.1889 ns per block ( 37.04%) -   0.4249 ns per byte -   4.8550 ns per structural -    2.354 GB/s
    |- Stage 2
    |    Speed        :  29.8135 ns per block ( 40.62%) -   0.4659 ns per byte -   5.3236 ns per structural -    2.147 GB/s
    
    2246.1 documents parsed per second
    
    performance 
    opened by jkeiser 103
  • UTF-8 validation flag lookup algorithm

    UTF-8 validation flag lookup algorithm

    This lookup algorithm's primary feature is that it does most of the work with 3 lookup tables against the high nibbles of bytes 1 and 2, and the low nibble of byte 1.

    EDIT: @zwegner independently came up with a better variant that uses scalar masks to process continuation bytes (which probably makes better use of the execution cores by spreading the load across simd and scalar execution units). I have integrated it here. ARM uses fastvalidate still, because none of the algorithms could match it.

    UTF-8 Shootout

    While evaluating the algorithms, I ran a "UTF-8 shootout" to figure out what was going to be the fastest. What you see here represents the winner :)

    I put all of the algorithms in here in separate headers, which can be switched between by changing the #include. A brief "shootout" between the algorithms, running stage 1 against twitter.json with ./parse -tf -n 1000 jsonexamples/twitter.json (and ./parse -tfs -n 1000 jsonexamples/twitter.json for SSE) yields this on my Kaby Lake machine (run multiple times for each and pick the best number):

    twitter.json:

    | | AVX2 | SSE4.2 |
    |--------------|--------------|----------------|
    | @zwegner | 5.952074 | 3.364491 |
    | lookup | 5.825784 | 3.400727 |
    | range | 5.715068 | 3.263643 |
    | fastvalidate | 5.588628 | 3.208918 |

    random.json:

    | | AVX2 | SSE4.2 |
    |--------------|--------------|----------------|
    | @zwegner | 4.318748 | 2.632677 |
    | lookup | 4.087078 | 2.387633 |
    | range | 3.917698 | 2.295306 |
    | fastvalidate | 3.778505 | 2.174089 |

    gsoc-2018.json:

    | | AVX2 | SSE4.2 |
    |--------------|--------------|----------------|
    | @zwegner | 7.923407 | 4.403640 |
    | lookup | 8.091007 | 4.583158 |
    | range | 7.891465 | 4.483739 |
    | fastvalidate | 7.949907 | 4.563674 |

    This algorithm uses 4-bit table lookups to look up 8-bit "error flags" from each nibble, and treats them as an error if all nibbles in the sequence have the error flag. It turns out UTF-8 has 8 3-nibble error sequences (byte 1 alongside the high nibble of the next 4 bytes) and 2 2-nibble error sequences (byte 1 by itself). It is also notable that, incredibly, no more than 8 distinct combinations of the first byte and later paired bytes are required to reliably detect all errors.

    It works by sequences of 4-bit table lookups, &'d together, so that the error is present only if all nibbles are part of the error value. For example, to detect this overlong encoding of "a" (0x61):

    • overlong encoding of a: 11000001 10010001
    • Lookup high nibble 1100 in this table, yielding ERROR_OVERLONG_2
    • Lookup low nibble 0001 in this table, yielding ERROR_OVERLONG_2
    • When &'d together, the bits align and we have an error! If the low nibble had been 0010, we would have detected no errors.

    The algorithm is simple:

    1. First byte errors: Check for errors that only require the first byte to detect: the overlong encoding of 2 bytes (1100000_) and a subset of too-large encodings (anything >= 11110101). This is done by a pair of table lookups (high and low nibble) and a single &. Too-large encodings could be detected quickly with >, but when handling both of these together, I couldn't find any way to beat 2 table lookups and an &, instruction-count-wise.
    2. Second byte errors: Check for errors that require the first byte + another (overlong 3- and 4-byte encodings, missing/extra continuations, some of the smaller overlarge values, and surrogates). To accomplish this, we're essentially doing the same thing as step 1 (flag lookups with &) but with 3 nibbles each time.
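    The nibble-and-AND trick of step 1 can be sketched in scalar C++. This is a minimal illustration, not the code from the patch: the real version performs these lookups as SIMD byte shuffles across many lanes at once, and the flag values and table contents below were chosen for the example.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Illustrative error flags (one bit per error class).
    constexpr uint8_t ERROR_OVERLONG_2 = 0x01; // 1100000_ lead bytes (0xC0, 0xC1)
    constexpr uint8_t ERROR_TOO_LARGE  = 0x02; // lead bytes >= 0xF5

    // Flags keyed by the high nibble of byte 1.
    constexpr uint8_t HI_NIBBLE[16] = {
        0, 0, 0, 0, 0, 0, 0, 0,        // 0x0_ .. 0x7_ : ASCII
        0, 0, 0, 0,                    // 0x8_ .. 0xB_ : continuation bytes
        ERROR_OVERLONG_2,              // 0xC_
        0, 0,                          // 0xD_, 0xE_
        ERROR_TOO_LARGE                // 0xF_
    };
    // Flags keyed by the low nibble of byte 1.
    constexpr uint8_t LO_NIBBLE[16] = {
        ERROR_OVERLONG_2, ERROR_OVERLONG_2,   // _0, _1 (covers 0xC0, 0xC1)
        0, 0, 0,                              // _2 .. _4 (0xF0..0xF4 are valid)
        ERROR_TOO_LARGE, ERROR_TOO_LARGE,     // _5, _6
        ERROR_TOO_LARGE, ERROR_TOO_LARGE,     // _7, _8
        ERROR_TOO_LARGE, ERROR_TOO_LARGE,     // _9, _A
        ERROR_TOO_LARGE, ERROR_TOO_LARGE,     // _B, _C
        ERROR_TOO_LARGE, ERROR_TOO_LARGE,     // _D, _E
        ERROR_TOO_LARGE                       // _F (covers 0xF5 .. 0xFF)
    };

    // A byte is flagged only if BOTH nibble lookups carry the same flag bit.
    inline uint8_t first_byte_error(uint8_t b) {
        return HI_NIBBLE[b >> 4] & LO_NIBBLE[b & 0x0F];
    }
    ```

    For instance, 0xC0 (the lead byte of an overlong "a") carries ERROR_OVERLONG_2 in both nibble tables, so the & reports an error, while the valid lead byte 0xC2 does not survive the &.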

    This made AVX around 1-1.5% faster on my machine (about the same as the range lookup algorithm), and astonishingly passed make test the first time I ran it! Submitting this is partly to see what ARM thinks of it :)

    I'm curious if anyone has ideas for further improvement. It feels like overuse of the AND-flag-lookup method, but that method takes so few instructions (4-5 per pair) that it's hard to come up with non-lookup-based methods that compare. I suspect 2-byte-long error detection of overlong/surrogate/too-large bytes is about as optimal as it can get, but perhaps there are clever ways to handle missing/extra continuation detection or the 1-byte-long error detection that save us a few instructions.

    opened by jkeiser 97
  • [WIP] Exact float parsing

    [WIP] Exact float parsing

    This is an attempt at providing exact (or more exact) float parsing at high speed.

    • [ ] in functions like compute_float_64 and the 128-bit counterpart, we need to ensure that the binary exponent is always greater than 0 and strictly smaller than 0x7FF.
    • [ ] we need to add more testing
    • [ ] we should compare the performance against abseil-cpp, ScanadvDouble, Boost Spirit and Andrei Alexandrescu's implementation (Folly)
    • [ ] the lookup tables could be made smaller, we should investigate
    • [ ] Make sure that the code compiles under most compilers including Visual Studio; this involves avoiding 128-bit integers.
    • [ ] Benchmark on ARM processors.
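    For context, the fast path that such float parsers try to hit first can be written in a few lines. This is the classic Clinger fast path, a hedged sketch with bounds and names of my choosing, not code from this PR: when the decimal significand fits in 53 bits and the power of ten is itself an exact double, a single multiplication or division is correctly rounded.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Powers of ten that are exactly representable as doubles: 10^0 .. 10^22.
    static const double POW10[] = {
        1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
        1e8,  1e9,  1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
        1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22
    };

    // Compute w * 10^q exactly when possible. Returns false when the inputs
    // fall outside the fast path, in which case the caller must use a slower
    // exact method (such as the 128-bit computation discussed in this PR).
    bool fast_path(uint64_t w, int64_t q, double &out) {
        if (w > (uint64_t(1) << 53)) return false; // significand not exact as a double
        if (q < -22 || q > 22) return false;       // power of ten not exact as a double
        double d = static_cast<double>(w);
        out = (q >= 0) ? d * POW10[q] : d / POW10[-q];
        return true; // a single rounding, hence correctly rounded
    }
    ```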

    Fixes https://github.com/lemire/simdjson/issues/242

    This replaces https://github.com/lemire/simdjson/pull/303

    research 
    opened by lemire 87
  • elimination of g++ -Weffc++ warnings

    elimination of g++ -Weffc++ warnings

    • document.h bugfixes
      • operator++
      • document.h
        • noninitialized members
        • document_stream load_many trailing whitespace
    • implementation.h
      • missing virtual destructor
      • whitespaces (not me, my editor)
    • parsedjson_iterator.h
      • operator=
    • document_stream.h
      • trailing blank
      • noninitialized members
    • document.h
      • trailing whitespace
      • noninitialized members
      • operator++
    • parsedjson_iterator.h
      • noninitialized members
    • json_minifier.h
      • noninitialized members
    • json_scanner.h
      • noninitialized members
      • trailing space
    • json_structural_indexer.h
      • noninitialized members
      • trailing space
    • stage2_build_tape.h
      • noninitialized members
    opened by ostri 80
  • MSVC simdjson twice as slow as ClangCL

    MSVC simdjson twice as slow as ClangCL

    On Windows, simdjson compiled with MSVC parses twitter.json at almost half the speed of ClangCL-compiled simdjson, on the same machine, in the same command prompt.

    | Platform | Overall | Stage 1 | Stage 2 |
    |---|---|---|---|
    | MSVC 19.25.28614 | 1.3051 | 2.3777 | 3.3898 |
    | ClangCL 9.0.0 | 2.2221 | 5.4161 | 4.6401 |

    Methodology:

    • MSVC: git clean -ffdx build && cmake -B build && cmake --build build --target parse --config Release && build\benchmark\Release\parse jsonexamples\twitter.json
    • ClangCL: git clean -ffdx build && cmake -B build -T ClangCL && cmake --build build --target parse --config Release && build\benchmark\Release\parse jsonexamples\twitter.json

    I validated that MSVC simdjson is using the haswell implementation, both by running json2json to print out the implementation, and by doing set SIMDJSON_FORCE_IMPLEMENTATION=haswell.

    performance platform coverage 
    opened by jkeiser 64
  • Faster float parsing

    Faster float parsing

    @lemire here's a version that works for all floats in canada.json. There are a few edge cases to be worked out, and the performance does take a hit (about a 20% increase in running time). But I will be playing with using a mix of powers of 10 and 5, or just using both all-10 and all-5, to see if I can get more accurate mantissas (or at least, mantissas whose inaccuracy is not correlated). It does hit the fast path most of the time; it just gets hit hard by what I guess is a slow strtod on this Mac.

    research 
    opened by michaeleisel 52
  • [WIP] Unroll the loop, do more work during pipeline stall

    [WIP] Unroll the loop, do more work during pipeline stall

    This patch improves simdjson performance on AVX2 by 3-4% in my tests. This comment describes the 4 changes made and the reasons why. It also adds a -f option to parse.cpp that causes it to run only find_structural_bits; this was critical to isolating the different performance gains.

    opened by jkeiser 52
  • Change new usages to std::allocator to accept custom memory allocator

    Change new usages to std::allocator to accept custom memory allocator

    I've checked the performance using benchmark/perfdiff as well as benchmark/on_demand, and there seems to be no difference.

    The main issue with this PR is that std::allocator::allocate throws bad_alloc when it fails to allocate, instead of gracefully returning nullptr like new (std::nothrow) does. Per this Stack Overflow post, forcing a bad_alloc with -fno-exceptions results in an abort() (also confirmed locally).
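    The contrast can be demonstrated standalone (hypothetical helper names; the absurdly large request simply forces the failure path):

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <memory>
    #include <new>

    // std::allocator::allocate reports failure by throwing (std::bad_alloc or
    // a type derived from it), which is fatal when exceptions are disabled.
    bool allocator_failure_throws(std::size_t n) {
        std::allocator<char> alloc;
        try {
            char *p = alloc.allocate(n);
            alloc.deallocate(p, n); // only reached if the allocation succeeded
            return false;
        } catch (const std::bad_alloc &) {
            return true;
        }
    }

    // new (std::nothrow) reports failure by returning nullptr instead.
    bool nothrow_failure_returns_null(std::size_t n) {
        char *p = new (std::nothrow) char[n];
        if (p == nullptr) return true;
        delete[] p;
        return false;
    }
    ```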

    Edit: I also wasn't sure whether to change the malloc calls, so I chose not to.

    Closes issue #1017

    opened by rrohak 46
  • Add CIFuzz integration

    Add CIFuzz integration

    Add CIFuzz workflow action to have fuzzers build and run on each PR. This is a service offered by OSS-Fuzz, on which simdjson already runs. I noticed you already had several CI fuzz jobs, but thought you might be interested in this as well.

    Signed-off-by: David Korczynski [email protected]

    opened by DavidKorczynski 2
  • Improve the depth model so that skip_child does not need to check for a key status

    Improve the depth model so that skip_child does not need to check for a key status

    Currently, we are sometimes left pointing at a key in On Demand when skip_child is called.

    This requires the introduction of an ugly patch:

    https://github.com/simdjson/simdjson/pull/1743

    There might be a better design that avoids this complexity.

    on demand 
    opened by lemire 0
  • Allow JSON Pointer queries on arrays and objects in On Demand

    Allow JSON Pointer queries on arrays and objects in On Demand

    The JSON Pointer specification is a document-wide format. It has no notion of a subdocument: there is no such thing as a relative query.

    Between each JSON Pointer query, on the On Demand document, we reset our string buffer (the buffer used for unescaped strings). This is needed because someone could do multiple JSON Pointer queries on the same document, possibly generating a buffer overflow. By resetting the string buffer, we ensure that buffer overflows are impossible.

    To achieve the same results over arrays and objects, we would need to similarly reset the string buffer between JSON Pointer calls. This would invalidate all other string references, including references outside of the array/object in question.

    https://github.com/simdjson/simdjson/pull/1696#issuecomment-901233038
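    The invalidation hazard can be shown with a toy string arena. This is a hypothetical miniature, not simdjson's actual buffer class: resetting reuses the same storage, so a string_view handed out before the reset silently aliases whatever is stored next.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstring>
    #include <string>
    #include <string_view>
    #include <vector>

    // A bump arena for unescaped strings: store() appends and returns a view,
    // reset() rewinds so the next query reuses the space.
    class string_arena {
        std::vector<char> buf_;
        std::size_t used_ = 0;
    public:
        explicit string_arena(std::size_t capacity) : buf_(capacity) {}
        std::string_view store(const std::string &s) {
            assert(used_ + s.size() <= buf_.size()); // toy: no growth
            char *dst = buf_.data() + used_;
            std::memcpy(dst, s.data(), s.size());
            used_ += s.size();
            return std::string_view(dst, s.size());
        }
        void reset() { used_ = 0; } // invalidates all previously returned views
    };
    ```

    Each query through such an arena is safe on its own, but the first query's view points at clobbered bytes once a reset and a second store have happened.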

    on demand 
    opened by lemire 1
  • Faster double-to-string operation when writing out JSON

    Faster double-to-string operation when writing out JSON

    I'm not sure how high of a priority the speed of writing out JSON is. But if it is, I have a prototype that runs in about 40% of the time of the fastest library I know of, which is https://github.com/ulfjack/ryu (to be fair, ryu tries to cover a larger set of use cases than we need). In many ways it's just an inverse of what we do for string to double, and uses the same multiplication trick.

    performance 
    opened by michaeleisel 13
  • Investigate potential performance drops with respect to v0.9

    Investigate potential performance drops with respect to v0.9

    | task | v0.9.7 | current main branch | loss |
    |:--|:---|:------------|:------------|
    | partial_tweets | 3.13 GB/s | 2.95 GB/s | 6% |
    | large_random | 0.80 GB/s | 0.73 GB/s | 10% |
    | kostya | 2.23 GB/s | 2.11 GB/s | 6% |
    | distinct_user_id | 3.64 GB/s | 2.95 GB/s | 25% |
    | find_tweet | 4.84 GB/s | 4.79 GB/s | 1% |
    | top_tweet | 3.44 GB/s | 3.43 GB/s | 0% |

    This might be related to https://github.com/simdjson/simdjson/pull/1663#issuecomment-892690965

    We should investigate this issue before releasing v1.0.0.

    We should run a regression analysis to find where and when we lost the performance.

    performance 
    opened by lemire 5
  • Experiment with bogus-error approach for no-overhead bound-check-like behavior

    Experiment with bogus-error approach for no-overhead bound-check-like behavior

    The simdjson library is highly optimized. Through clever optimizations, it avoids most bound checks.

    There are a few limitations. For example, we require a few bytes of padding at the end of the input (https://github.com/simdjson/simdjson/issues/174). We also refuse to parse a single JSON document that exceeds 4 GB (https://github.com/simdjson/simdjson/issues/128).

    To get around this, we have an outstanding PR, https://github.com/simdjson/simdjson/pull/1665, which undoes these clever optimizations, adds regular bound checking, and lowers the performance somewhat, but also allows you to lift the padding requirement.

    A more daring approach would be not to go back to conventional bound checking and, instead, to push forward with our clever bound-free approach. Instead of doing all of these bound checks all over the place, examine the document when we get started and adjust the structural index so that, at a strategic location, you get a bogus error. This bogus error brings you into a distinct mode where you finish the processing with more careful code. Then you would get the no-padding behavior for free (given a large enough input).

    This "bogus error" approach is also how I would try to handle the "stage 1 in chunks". You give me a 6 GB JSON document. I index it in chunks of 1 MB. I change the index so that somewhere before the end of the chunk, I encounter a bogus error. Then I know to load a new index.

    This would be a bit challenging, for sure. And it would require that we maintain a slow path with bound checking at times. The latter could be achieved with templates, maybe.
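    A scalar toy of the chunked-indexing idea, with names of my choosing (and ignoring strings, which the real stage 1 must mask out before hunting for structural characters): index one chunk at a time and exhaust it before loading the next, rather than building one giant index for a multi-gigabyte input.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <string>
    #include <vector>

    static bool is_structural(char c) {
        return c == '{' || c == '}' || c == '[' || c == ']' || c == ':' || c == ',';
    }

    // Index only the bytes in [begin, end) of the document.
    std::vector<std::size_t> index_chunk(const std::string &json,
                                         std::size_t begin, std::size_t end) {
        std::vector<std::size_t> idx;
        for (std::size_t i = begin; i < end && i < json.size(); i++)
            if (is_structural(json[i])) idx.push_back(i);
        return idx;
    }

    // The driver: when one chunk's indexes run out (the role of the "bogus
    // error" in the proposal above), index the next chunk and continue.
    std::vector<std::size_t> index_in_chunks(const std::string &json,
                                             std::size_t chunk) {
        std::vector<std::size_t> all;
        for (std::size_t begin = 0; begin < json.size(); begin += chunk)
            for (std::size_t pos : index_chunk(json, begin, begin + chunk))
                all.push_back(pos);
        return all;
    }
    ```

    Chunk by chunk or in one pass, the resulting structural index is identical; only the working memory differs.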

    cc @jkeiser

    research 
    opened by lemire 4
  • [WIP] Lifting the padding requirement from simdjson APIs

    [WIP] Lifting the padding requirement from simdjson APIs

    @jkeiser did a lot of hard work on two PRs to remove the need for padding in the On Demand front-end: https://github.com/simdjson/simdjson/pull/1511 and https://github.com/simdjson/simdjson/pull/1606

    These PRs are diverging from our main branch more and more. So we need to refresh them. Let us do so.

    • [x] merge fully all stale PRs
    • [x] actually turn on the feature
    • [x] update documentation
    • [x] assess performance change
    • [ ] debug/make tests pass, run through sanitizers
    • [ ] Recover some of the performance loss.

    In DOM, there is an unacceptable performance regression.

    In On Demand, there is definitely a performance regression, but it is in the expected range (5%). Factors to take into account: (1) we improve the usability; (2) in some cases, we save a copy and some memory usage, which might be worth more than 5%; (3) we might be able to claw back this extra 5%. At a glance, the regression can be explained by the extra number of instructions we need: for example, we get the worst regression with distinct_user_id (pointer), where the number of instructions went from 2166639 to 2431248, a 12% uptick. (4) The ~5% range is the kind of performance difference we get just by switching compilers, so it is unlikely to displease an existing user (they will not notice the regression).

    | task | main branch | this PR | loss (%) |
    |:------|:--------------|:----------|:------|
    | amazon_cellphones | 2.04 GB/s | 1.92 GB/s | 6% |
    | large_amazon_cellphones | 2.07 GB/s | 2.18 GB/s | -5% |
    | partial_tweets | 3.04 GB/s | 3.00 GB/s | 1% |
    | large_random | 0.77 GB/s | 0.74 GB/s | 4% |
    | large_random (unordered) | 0.74 GB/s | 0.71 GB/s | 4% |
    | kostya | 2.29 GB/s | 2.18 GB/s | 5% |
    | distinct_user_id | 3.50 GB/s | 3.35 GB/s | 4% |
    | distinct_user_id (pointer) | 3.14 GB/s | 2.85 GB/s | 9% |
    | find_tweet | 4.81 GB/s | 4.81 GB/s | 0% |
    | top_tweet | 3.40 GB/s | 3.41 GB/s | 0% |
    | top_tweet (forward) | 3.11 GB/s | 3.01 GB/s | 3% |

    It is a PR on top of https://github.com/simdjson/simdjson/pull/1663

    Fixes https://github.com/simdjson/simdjson/issues/174

    research 
    opened by lemire 21
  • Document precisely what we validate for skipped values

    Document precisely what we validate for skipped values

    We always validate whatever you use, as well as any objects and arrays that are part of the path to it. The idea is, you should never get the wrong value just because something you skipped has an error in it; but there are many errors that won't affect that.

    A (partial?) list for documentation of what we do and don't validate for skipped values:

    Strings:

    • We ALWAYS validate the boundaries of strings (that strings are properly closed, taking into account \"). This means that even if a string has some invalid stuff inside, we validate enough that they cannot affect anything else.
    • We ALWAYS validate string characters (newlines and control characters being disallowed).
    • We only validate the content of escape characters if you unescape a string (like using get_string() or unescaped_key()). i.e. \p is not allowed, and \u must be followed by hex digits.

    Numbers/Booleans/Null:

    • For the purposes of structure, we ALWAYS treat any sequence of non-whitespace characters (except { } [ ] : and ,) as a single number/boolean/null. This means we'll treat -0ab-10 trrrue as two values without a comma between them, whether you actually use the values or not.
    • We only validate the content of numbers, booleans and null if you actually use them.

    Arrays/Objects:

    • We ALWAYS validate that objects/arrays are closed before continuing to iterate, even if the objects/arrays are skipped.
    • We only check for missing or extra , or : when you iterate or index an array or object.
    • We only check that keys are strings when you iterate or index an array or object.
    • We only validate remaining ,/: in an array or object if you fully iterate it.
    • We only validate that the closing ] or } matches the opening one if you fully iterate the object/array.

    Document:

    • We ALWAYS validate that the JSON is not empty.
    • We NEVER check if there is extra JSON after your document.
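    The structural rule for atoms can be mimicked in a few standalone lines (a hypothetical sketch, not simdjson code): any run of bytes that is neither whitespace nor one of { } [ ] : , forms a single atom, whose content is only checked if the caller actually uses it.

    ```cpp
    #include <cassert>
    #include <cctype>
    #include <string>
    #include <vector>

    // Split the input into "atoms": maximal runs of bytes that are neither
    // whitespace nor structural characters. No content validation happens here.
    std::vector<std::string> split_atoms(const std::string &in) {
        const std::string structural = "{}[]:,";
        std::vector<std::string> atoms;
        std::string cur;
        for (char c : in) {
            if (std::isspace(static_cast<unsigned char>(c)) ||
                structural.find(c) != std::string::npos) {
                if (!cur.empty()) { atoms.push_back(cur); cur.clear(); }
            } else {
                cur.push_back(c);
            }
        }
        if (!cur.empty()) atoms.push_back(cur);
        return atoms;
    }
    ```

    Under this rule, the garbage input -0ab-10 trrrue really does split into exactly two atoms, as described above, and neither is rejected until someone asks for its value.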
    opened by jkeiser 7
  • Document the On Demand depth model

    Document the On Demand depth model

    The On Demand front-end depends crucially on how it tracks the current depth. An improper depth state can break the parsing. I think that before we release 1.0, we should have a clearly documented model of how the depth is meant to change.

    Furthermore, we should also clearly document the depth and positions at different stages where the user is handed back control.

    I got started with a wiki at https://github.com/simdjson/simdjson/wiki/OnDemand-logic

    on demand 
    opened by lemire 0
  • Consider parsing -0 as a double and not as zero

    Consider parsing -0 as a double and not as zero

    Currently, we parse -0 as 0 (integer value) and -0.0 as -0 (double value) in the DOM API.
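    For reference, here is the distinction using the C library's parsers (a standalone illustration, not simdjson code): the integer parse loses the sign of zero, while the double parse keeps the IEEE 754 sign bit.

    ```cpp
    #include <cassert>
    #include <cmath>
    #include <cstdint>
    #include <cstdlib>

    // An integer has no signed zero: strtoll("-0") yields plain 0.
    int64_t parse_as_integer(const char *s) { return std::strtoll(s, nullptr, 10); }

    // A double does: strtod("-0") yields -0.0, whose sign bit survives and
    // is observable through std::signbit.
    double parse_as_double(const char *s) { return std::strtod(s, nullptr); }
    ```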

    opened by lemire 7
Releases(v1.0.2)
  • v1.0.2(Oct 27, 2021)

  • v1.0.1(Oct 20, 2021)

  • v1.0.0(Sep 7, 2021)

    Release 1.0.0 of the simdjson library builds on the earlier pre-1.0 releases that made the On Demand front-end our default. The On Demand front-end is a new way to build parsers. With On Demand, if you open a file containing 1000 numbers and you need just one of these numbers, only one number is parsed. If you need to put the numbers into your own data structure, they are materialized there directly, without being first written to a temporary tree. Thus we expect that simdjson On Demand might often provide superior performance when you do not need an intermediate materialized view of a DOM tree. The On Demand front-end was primarily developed by @jkeiser.

    If you adopted simdjson in an earlier version and relied on the DOM approach, it remains available as always. Though On Demand is our new default, we remain committed to supporting the conventional DOM approach in the future, as there are instances where it is more appropriate.

    Release 1.0.0 adds several key features:

    • In big data analytics, it is common to serialize large sets of records as multiple JSON documents separated by white space. You can now get the benefits of On Demand while parsing almost infinitely long streams of JSON records (see iterate_many). At each step, you have access to the current document, while a secondary thread indexes the following block. You can thus access enormous files while using a small amount of memory and achieve record-breaking speeds. Credit: @NicolasJiaxin.
    • In some cases, JSON documents contain numbers embedded within strings (e.g., "3.1416"). You can access these numbers directly using methods such as get_double_in_string(). Credit: @NicolasJiaxin
    • Given an On Demand instance (value, array, object, etc.), you can now convert it to a JSON string using the to_json_string method which returns a string view in the original document for unbeatable speeds. Credit: @NicolasJiaxin
    • The On Demand front-end now supports the JSON Pointer specification. You can request a specific value using a JSON Pointer within a large document. Credit: @NicolasJiaxin
    • Arrays in On Demand now have a count_elements() method. Objects have a count_fields() method. Arrays and objects have a reset method for when you need to iterate through them more than once. Document instances now have a rewind method in case you need to process the same document multiple times.

    Other improvements include:

    • We have extended and improved our documentation and we have added much testing.
    • We have accelerated the JSON minification function (simdjson::minify) under ARM processors (credit @dougallj)

    We encourage users of previous versions of the simdjson library to update, and we encourage deploying it in production.

    Source code(tar.gz)
    Source code(zip)
    simdjson.cpp(520.94 KB)
    simdjson.h(1.13 MB)
  • v0.9.7(Jul 31, 2021)

    Fixing an issue whereby the operator << would halt the program, due to a noexcept clause, when called with a simdjson_result input.

    credit @teathinker

    Source code(tar.gz)
    Source code(zip)
  • v0.9.6(Jun 6, 2021)

    This is a patch release fixing issue https://github.com/simdjson/simdjson/issues/1601. That is, users reusing the same document instance could get spurious errors in subsequent parsing attempts because the "error" flag in the document was 'sticky'. This patch fixes the issue by modifying two lines of code.

    Source code(tar.gz)
    Source code(zip)
  • v0.9.5(May 28, 2021)

    Patch release fixing issue 1588: Unexpected behavior while traversing a json array containing json objects containing a subset of known keys. This patch adds a single line. Users relying on the On Demand front-end should update.

    Source code(tar.gz)
    Source code(zip)
  • v0.9.4(May 20, 2021)

    This fourth patch release is the second and final fix for an issue with the On Demand front-end where, when we search for a key and it is not found, we can end up in a poor state from which follow-up queries lead to spurious errors, even with valid JSON.

    Source code(tar.gz)
    Source code(zip)
  • v0.9.3(May 14, 2021)

  • v0.9.2(Apr 1, 2021)

    This is a patch release which fixes a bug for users of the On Demand front-end. In some instances, when trying to access keys that are missing, the parser will fail with a generic error (TAPE_ERROR) under versions 0.9.0 and 0.9.1. Thanks to @jpalus for reporting the issue (https://github.com/simdjson/simdjson/issues/1521) and to @jkeiser for reviewing the patch.

    Source code(tar.gz)
    Source code(zip)
  • v0.9.1(Mar 18, 2021)

  • v0.9.0(Mar 17, 2021)

    The high-performance On Demand front-end introduced in version 0.7 and optimized in version 0.8 becomes the primary simdjson front-end in version 0.9 (credit @jkeiser). The On Demand front-end benefits from new features:

    • The On Demand elements have received a type() method so branching on the schema becomes possible.
    • We make it safer to recover one JSON element as multiple types https://github.com/simdjson/simdjson/pull/1414
    • We added safety checks for out-of-order iterations https://github.com/simdjson/simdjson/pull/1416

    Other changes :

    • You can now parse a DOM document as a separate entity from the parser https://github.com/simdjson/simdjson/pull/1430
    • Improve compatibility with the Qt library.
    • We encourage users to replace templated get<T>() methods with specific methods, using get_object() instead of get<simdjson::dom::object>(), and get_uint64() instead of get<uint64_t>().
    • dom::document_stream::iterator has received a default constructor and is copyable
    Source code(tar.gz)
    Source code(zip)
  • v0.8.2(Feb 11, 2021)

    This patch adds an explicit include for string_view, which is needed by Visual Studio 2017. It also includes other minor Visual Studio 2017 fixes. Users of Visual Studio 2017 are reminded that they should specifically target x64 builds.

    credit to @NancyLi1013 for reporting the issue

    Source code(tar.gz)
    Source code(zip)
  • v0.8.1(Jan 28, 2021)

  • v0.8.0(Jan 22, 2021)

    The high-performance On Demand front-end introduced in version 0.7 has received major quality-of-life and performance improvements (credit @jkeiser).

    • Runtime dispatching is now supported, achieving high performance without compiling for a specific CPU.
    • Object field lookup is now order-insensitive: double x = object["x"]; double y = object["y"]; will work no matter which order the fields appear in the object. Reading fields in order still gives maximum performance.
    • Object lookup and array iteration can now be used against untyped values, enabling things like chained lookup (object["a"]["b"])
    • Numbers, strings and boolean values can be saved and parsed later by storing the ondemand::value, allowing more efficient filtering and bulk parsing, as well as fixing smaller quality-of-life issues.

    We have improved our CMake build with respect to installed artefacts so that CMake dependencies automatically handle thread dependencies.

    We have greatly improved our benchmarks with a set of realistic tasks on realistic datasets, using Google Benchmark as a framework.

    skylake-clang10

    Source code(tar.gz)
    Source code(zip)
  • v0.7.1(Dec 14, 2020)

    This is a patch release for version 0.7.0, which mistakenly disabled, by default, the optimized ARM NEON and POWER kernels. The result was a substantial loss of performance, by default, on these platforms. Users could still work around the limitation by passing macro values to the compiler.

    Source code(tar.gz)
    Source code(zip)
  • v0.7.0(Dec 5, 2020)

    This version improves our support for streams of documents (ndjson). We have improved the documentation, especially with how one might deal with truncated inputs and track the location of the current JSON. We have added fuzz testing and other tests. We added some minor fixes.

    Performance:

    • SIMD accelerated (AltiVec) kernel for POWER processors.

    Platform support:

    • We can now disable exceptions when compiling with MSVC
    Source code(tar.gz)
    Source code(zip)
  • v0.6.1(Nov 4, 2020)

    This is a minor patch release for version 0.6.0 to support legacy libc++ builds.

    https://github.com/simdjson/simdjson/issues/1285

    https://github.com/simdjson/simdjson/issues/1286

    Source code(tar.gz)
    Source code(zip)
  • v0.6.0(Oct 23, 2020)

    Release 0.6 features faster serialization, updated number parsing and a new parsing API (On Demand).

    Novel parsing approach: Prior to release 0.6, the simdjson library only supported the DOM model. In the DOM model, you parse the entire document eagerly and then query the resulting parsed document. With release 0.6, we have introduced an innovative new query model called "On Demand" (credit: @jkeiser). As the name suggests, the programmer can now access only the components of the JSON document that are needed. It is useful when the dialect of the JSON document is known at compile-time. It can multiply the performance: on a benchmark where we retrieve unique identifiers from a file, we go from 2.3 GB/s to 4.6 GB/s. Compared to the generic DOM approach, the On Demand strategy may improve parsing speed and reduce memory usage. The On Demand API is both easy to use and highly efficient. We encourage our users to try out the new On Demand API and to provide constructive feedback. Though it is well tested, we consider On Demand to be a novel and experimental feature that will receive further updates in upcoming releases, including runtime dispatching. Users should be mindful that the On Demand API is subject to change in future releases; we encourage testing more than deployment. Please check our documentation: https://github.com/simdjson/simdjson/blob/master/doc/ondemand.md

    Performance:

    • Serializing a parsed JSON element is about 10x faster compared to previous releases.

    New features:

    • We make it easier to check whether a given implementation/kernel is available at runtime with a new function my_implementation->supported_by_runtime_system().

    Correctness:

    • Serialization is now guaranteed to be locale-independent.
    • We have redone the number parsing routines and extended our testing. We no longer fall back on the system's C library, thus avoiding potential sources of inexact parsing.
    • Fuzz testing has been extended. E.g., our standalone UTF-8 validation is now fuzz tested. (Credit @pauldreik)
    Source code(tar.gz)
    Source code(zip)
  • v0.5.0(Aug 19, 2020)

    Performance

    • Faster and simpler UTF-8 validation with the lookup4 algorithm https://github.com/simdjson/simdjson/pull/993
    • We improved the performance of simdjson under Visual Studio by about 25%. Users will still get better performance with clang-cl (+30%) but the gap has been reduced. https://github.com/simdjson/simdjson/pull/1031

    Code usability

    • In parse_many, when parsing streams of JSON documents, we give users runtime control over whether threads are used (via the parser.threaded attribute). https://github.com/simdjson/simdjson/issues/925
    • Prefixed public macros to avoid name clashes with other libraries. https://github.com/simdjson/simdjson/issues/1035
    • Better documentation regarding package managers (brew, MSYS2, conan, apt, vcpkg, FreeBSD package manager, etc.).
    • Better documentation regarding CMake usage.

    Standards

    • We improved standard compliance with respect to both the JSON RFC 8259 and JSON Pointer RFC 6901. We added the at_pointer method to nodes for standard-compliant JSON Pointer queries. The legacy at(std::string_view) method remains but is deprecated since it is not standard-compliant as per RFC 6901.
    • We removed computed GOTOs without sacrificing performance thus improving the C++ standard compliance (since computed GOTOs are compiler-specific extensions).
    • Better support for C++20 https://github.com/simdjson/simdjson/pull/1050
    Source code(tar.gz)
    Source code(zip)
  • v0.4.7(Jul 17, 2020)

    This is a patch release to remove an overhead of 100 ns per query. It will improve performance on small files. The fix involves the removal of a single line of code. Credit to @vitlibar from Yandex.

  • v0.4.6(Jul 1, 2020)

  • v0.4.5(Jun 30, 2020)

  • v0.4.4(Jun 30, 2020)

  • v0.4.3(Jun 30, 2020)

  • v0.4.2(Jun 28, 2020)

    Fixed an installation bug whereby the header might not be installed when only the library was built (issue https://github.com/simdjson/simdjson/issues/1000)

  • v0.4.1(Jun 27, 2020)

    This is a patch release to fix minor issues with release 0.4.0.

    • We removed a spurious 'optional' header (which was unused).
    • We document std::string_view aliasing for pre-C++17 compilers.
    • We made the CMake build more robust (e.g., when git or ninja is missing).
    • When used inside another CMake project, we build only the library by default (no tests or benchmarks).
    • We improved compatibility with 32-bit systems.
  • v0.4.0(Jun 24, 2020)

    Highlights

    • Test coverage has been greatly improved and we have resolved many static-analysis warnings on different systems.

    New features:

    • We added a fast (8 GB/s) minifier that works directly on JSON strings.
    • We added a fast (10 GB/s) UTF-8 validator that works directly on strings (any strings, including non-JSON).
    • Array and object elements have a constant-time size() method.

    Performance:

    • Performance improvements to the API (type(), get<>()).
    • The parse_many function (ndjson) has been entirely reworked. It now uses a single secondary thread instead of several new threads.
    • We have introduced a faster UTF-8 validation algorithm (lookup3) for all kernels (ARM, x64 SSE, x64 AVX).

    System support:

    • C++11 support for older compilers and systems.
    • FreeBSD support (and tests).
    • We support the clang front-end compiler (clang-cl) under Visual Studio.
    • It is now possible to target ARM platforms under Visual Studio.
    • The simdjson library will never abort or print to standard output/error.
  • v0.3.1(Apr 2, 2020)

    • Fixed a problem with cmake install https://github.com/simdjson/simdjson/issues/660
    • Bumped SOVERSION https://github.com/simdjson/simdjson/pull/662
    • Add element.type() https://github.com/simdjson/simdjson/pull/669

    cc @jkeiser @TkTech @cdluminate

  • v0.3.0(Mar 31, 2020)

    Highlights

    • Multi-Document Parsing: Read a bundle of JSON documents (ndjson) 2-4x faster than doing it individually. API docs / Design Details
    • Simplified API: The API has been completely revamped for ease of use, including a new JSON navigation API and fluent support for error code and exception styles of error handling with a single API. Docs
    • Exact Float Parsing: Now simdjson parses floats flawlessly without any performance loss, thanks to great work by @michaeleisel and @lemire. Blog Post
    • Even Faster: The fastest parser got faster! With a shiny new UTF-8 validator and meticulously refactored SIMD core, simdjson 0.3 is 15% faster than before, running at 2.5 GB/s (where 0.2 ran at 2.2 GB/s).

    Minor Highlights

    • Fallback implementation: simdjson now has a non-SIMD fallback implementation, and can run even on very old 64-bit machines.
    • Automatic allocation: as part of API simplification, the parser no longer has to be preallocated; it will adjust automatically when it encounters larger files.
    • Runtime selection API: We've exposed simdjson's runtime CPU detection and implementation selection as an API, so you can tell what implementation we detected and test with other implementations.
    • Error handling your way: Whether you use exceptions or check error codes, simdjson lets you handle errors in your style. APIs that can fail return simdjson_result, letting you check the error code before using the result. But if you are more comfortable with exceptions, skip the error code and cast straight to T, and exceptions will be thrown automatically if an error happens. Use the same API either way!
    • Error chaining: We also worked to keep non-exception error-handling short and sweet. Instead of having to check the error code after every single operation, now you can chain JSON navigation calls like looking up an object field or array element, or casting to a string, so that you only have to check the error code once at the very end.