Fast CSV parser and writer for Modern C++

Overview

csv2

CSV Reader

#include <csv2/reader.hpp>

int main() {
  csv2::Reader<csv2::delimiter<','>,
               csv2::quote_character<'"'>,
               csv2::first_row_is_header<true>,
               csv2::trim_policy::trim_whitespace> csv;
               
  if (csv.mmap("foo.csv")) {
    const auto header = csv.header();
    for (const auto row: csv) {
      for (const auto cell: row) {
        // Do something with cell value
        // std::string value;
        // cell.read_value(value);
      }
    }
  }
}
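
The loop above leaves the cell handling as a comment. Once `cell.read_value(value)` has filled a `std::string`, converting it to a number is ordinary standard-library work. The sketch below is not part of csv2: `parse_double` and `parse_long` are hypothetical helper names, and the strict "whole string must be numeric" rule is an assumption about what a caller would want.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

namespace sketch {

// Hypothetical helper, not part of csv2: convert a cell's text to double.
// Returns false if the text is empty, has trailing junk, or is out of range.
inline bool parse_double(const std::string &text, double &out) {
  if (text.empty()) return false;
  char *end = nullptr;
  errno = 0;
  const double v = std::strtod(text.c_str(), &end);
  if (end == text.c_str() || *end != '\0' || errno == ERANGE) return false;
  out = v;
  return true;
}

// Same idea for base-10 integers.
inline bool parse_long(const std::string &text, long &out) {
  if (text.empty()) return false;
  char *end = nullptr;
  errno = 0;
  const long v = std::strtol(text.c_str(), &end, 10);
  if (end == text.c_str() || *end != '\0' || errno == ERANGE) return false;
  out = v;
  return true;
}

} // namespace sketch
```

Inside the cell loop this would be used as `double d; if (sketch::parse_double(value, d)) { ... }`. Trailing whitespace makes both helpers return false, so trim the text first if the reader is not already doing so.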

Performance Benchmark

This benchmark measures the average execution time (of 5 runs after 3 warmup runs) for csv2 to memory-map the input CSV file and iterate over every cell in the CSV. See benchmark/main.cpp for more details.

cd benchmark
g++ -I../include -O3 -std=c++11 -o main main.cpp
./main <csv_file>
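
The measurement described above (a mean over 5 timed runs, taken after 3 untimed warmup runs) can be sketched with `std::chrono`. This is only an assumed shape: `average_seconds` is a hypothetical name, and benchmark/main.cpp remains the authoritative version of how the numbers below were produced.

```cpp
#include <chrono>

namespace sketch {

// Hypothetical timing helper: call `work` `warmup` times without timing it,
// then return the mean wall-clock time in seconds over `runs` timed calls.
template <typename Fn>
double average_seconds(Fn &&work, int warmup = 3, int runs = 5) {
  for (int i = 0; i < warmup; ++i) work();

  using clock = std::chrono::steady_clock;
  double total = 0.0;
  for (int i = 0; i < runs; ++i) {
    const auto start = clock::now();
    work();
    const std::chrono::duration<double> elapsed = clock::now() - start;
    total += elapsed.count();
  }
  return runs > 0 ? total / runs : 0.0;
}

} // namespace sketch
```

The callable passed as `work` would memory-map the file and walk every cell. The warmup runs are there so that file-system caching and similar one-time costs do not skew the timed runs.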

Hardware

MacBook Pro (15-inch, 2019)
Processor: 2.4 GHz 8-Core Intel Core i9
Memory: 32 GB 2400 MHz DDR4
Operating System: macOS Catalina version 10.15.3

Results (as of 23 APR 2020)

| Dataset | File Size | Rows | Cols | Time |
|:--- | ---:| ---:| ---:| ---:|
| Denver Crime Data | 111 MB | 479,100 | 19 | 0.174s |
| AirBnb Paris Listings | 196 MB | 141,730 | 96 | 0.289s |
| 2015 Flight Delays and Cancellations | 574 MB | 5,819,079 | 31 | 1.047s |
| StackLite: Stack Overflow questions | 870 MB | 17,203,824 | 7 | 1.505s |
| Used Cars Dataset | 1.4 GB | 539,768 | 25 | 1.979s |
| Title-Based Semantic Subject Indexing | 3.7 GB | 12,834,026 | 4 | 5.929s |
| Bitcoin tweets - 16M tweets | 4 GB | 47,478,748 | 9 | 7.040s |
| DDoS Balanced Dataset | 6.3 GB | 12,794,627 | 85 | 12.648s |
| Seattle Checkouts by Title | 7.1 GB | 34,892,623 | 11 | 12.883s |
| SHA-1 password hash dump | 11 GB | 262,974,241 | 2 | 19.505s |
| DOHUI NOH scaled_data | 16 GB | 496,782 | 3213 | 32.780s |

Reader API

Here is the public API available to you:

template <class delimiter = delimiter<','>, 
          class quote_character = quote_character<'"'>,
          class first_row_is_header = first_row_is_header<true>,
          class trim_policy = trim_policy::trim_whitespace>
class Reader {
public:
  
  // Use this if you'd like to mmap and read from file
  bool mmap(string_type filename);

  // Use this if you have the CSV contents in std::string already
  bool parse(string_type contents);

  // Shape
  size_t rows() const;
  size_t cols() const;
  
  // Row iterator
  // If first_row_is_header, row iteration will start
  // from the second row
  RowIterator begin() const;
  RowIterator end() const;

  // Access the first row of the CSV
  Row header() const;
};
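
The template parameters above are compile-time policies: the delimiter, quote character and trimming rule are part of the `Reader` type rather than runtime options. The compiler output quoted in the issue list shows the trim policy as `trim_characters<' ', '\011'>`, which suggests that `trim_whitespace` means "space and tab". The sketch below illustrates how such a policy type could work; it lives in a hypothetical `sketch` namespace and is not csv2's implementation.

```cpp
#include <cstddef>
#include <string>

namespace sketch {

// Hypothetical policy type: the set of characters to strip is part of the type.
// Assumes at least one character is supplied.
template <char... Cs> struct trim_characters {
  // True if c is one of the characters listed in the template arguments.
  static bool matches(char c) {
    const char set[] = {Cs...};
    for (char s : set) {
      if (s == c) return true;
    }
    return false;
  }

  // Return a copy of `in` with leading and trailing matching characters removed.
  static std::string trim(const std::string &in) {
    std::size_t first = 0;
    std::size_t last = in.size();
    while (first < last && matches(in[first])) ++first;
    while (last > first && matches(in[last - 1])) --last;
    return in.substr(first, last - first);
  }
};

// Space and tab ('\t' is the same character as '\011').
using trim_whitespace = trim_characters<' ', '\t'>;

} // namespace sketch
```

With this shape, `sketch::trim_whitespace::trim(" \t0.123 ")` yields `0.123`, and a different character set costs nothing at runtime because the choice is fixed when the template is instantiated.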

Here's the Row class:

// Row class
class Row {
public:
  // Get raw contents of the row
  void read_raw_value(Container& value) const;
  
  // Cell iterator
  CellIterator begin() const;
  CellIterator end() const;
};

and here's the Cell class:

// Cell class
class Cell {
public:
  // Get raw contents of the cell
  void read_raw_value(Container& value) const;
  
  // Get converted contents of the cell
  // Handles escaped content, e.g., 
  // """foo""" => ""foo""
  void read_value(Container& value) const;
};
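
The difference between `read_raw_value` and `read_value` is the handling of escaped quotes. A minimal sketch of the conversion shown in the comment above, under the assumption that "escaped content" means a doubled quote character standing for one literal quote, might look like this. `collapse_doubled_quotes` is a hypothetical name and not a csv2 function.

```cpp
#include <cstddef>
#include <string>

namespace sketch {

// Hypothetical stand-in for the conversion described above: every pair of
// consecutive quote characters is replaced by a single quote character.
inline std::string collapse_doubled_quotes(const std::string &raw, char quote = '"') {
  std::string out;
  out.reserve(raw.size());
  for (std::size_t i = 0; i < raw.size(); ++i) {
    if (raw[i] == quote && i + 1 < raw.size() && raw[i + 1] == quote) {
      ++i; // skip the first quote of the pair, keep the second
    }
    out.push_back(raw[i]);
  }
  return out;
}

} // namespace sketch
```

Applied to `"""foo"""` this produces `""foo""`, matching the example in the comment. Note that the sketch does not strip the enclosing quotes of a quoted field; it only collapses pairs.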

CSV Writer

This library also provides a basic csv2::Writer class that can be used to write CSV rows to a file. Here's a basic usage example:

#include <csv2/writer.hpp>
#include <fstream>
#include <string>
#include <vector>
using namespace csv2;

int main() {
    std::ofstream stream("foo.csv");
    Writer<delimiter<','>> writer(stream);

    std::vector<std::vector<std::string>> rows = 
        {
            {"a", "b", "c"},
            {"1", "2", "3"},
            {"4", "5", "6"}
        };

    writer.write_rows(rows);
    stream.close();
}
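
To make the expected file contents concrete, here is a standalone sketch of what writing rows amounts to: fields joined by the delimiter, one row per line. `sketch::write_rows` is a hypothetical free function, not csv2::Writer, and it makes no attempt at quoting or escaping; whether csv2::Writer does either is not stated here.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical free function, not csv2::Writer: join each row's fields with
// `delim` and end every row with '\n'. No quoting or escaping is attempted.
inline void write_rows(std::ostream &os,
                       const std::vector<std::vector<std::string>> &rows,
                       char delim = ',') {
  for (const auto &row : rows) {
    for (std::size_t i = 0; i < row.size(); ++i) {
      if (i != 0) os << delim;
      os << row[i];
    }
    os << '\n';
  }
}

} // namespace sketch
```

Writing into a `std::ostringstream` instead of a file makes the output easy to inspect in a test. Taking `std::ostream&` rather than `std::ofstream` is a design choice of this sketch, so that the same function works for files and in-memory buffers.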

Writer API

Here is the public API available to you:

template <class delimiter = delimiter<','>>
class Writer {
public:
  
  // Construct using an std::ofstream
  Writer(output_file_stream stream);

  // Use this to write a single row to file
  void write_row(container_of_strings row);

  // Use this to write a list of rows to file
  void write_rows(container_of_rows rows);
};

Compiling Tests

mkdir build && cd build
cmake -DCSV2_TEST=ON ..
make
cd test
./csv2_test

Generating Single Header

python3 utils/amalgamate/amalgamate.py -c single_include.json -s .

Contributing

Contributions are welcome. Have a look at the CONTRIBUTING.md document for more information.

License

The project is available under the MIT license.

Issues
  • Quote character is not getting parsed

    Hello,

    I copy-pasted the default example to read a CSV file into a variable, which works fine, but the quote characters are not removed by std::string value; cell.read_value(value);

    My dev environment:
    OS: Windows 10
    Compiler: MSVC 2019

    Example CSV:

    a,b,c
    "Hello", 0.123, "World"

    opened by remz1337 3
  • could you make a package? (tag/release)

    A tagged release would make csv2 easy to integrate with xmake package management, and it would help with CMake FetchContent too.

    Example: https://github.com/xmake-io/xmake-repo/blob/master/packages/a/abseil/xmake.lua

    I have already written a config. If you make the release, I will send this PR to the xmake repo:

    package("csv2")
        set_urls("https://github.com/p-ranav/csv2.git")
        set_homepage("https://github.com/p-ranav/csv2")
        set_description("A CSV parser library")
        set_license("MIT")
        // add_version()

        on_install(function (package)
            os.cp("include/csv2", package:installdir("include"))
        end)
        on_test(function (package)
            assert(package:has_cxxtypes("csv2::Reader<csv2::delimiter<','>, csv2::quote_character<'\"'>, csv2::first_row_is_header<false>>",
            {configs = {languages = "c++11"}, includes = "csv2/reader.hpp"}))
        end)

    package_end()

    opened by wanghenshui 2
  • Read a single cell. Can you provide sample

    Hello Pranav, can you provide an example of how to read the contents of a single cell? How to extract the contents that can be a string, an int or a double? I share the request of Amirmasoudabdol, that is to be able to access each cell to read/write the content. Thank you very much. Sergio

    opened by sergioferrari 2
  • LICENSE.termcolor doesn't exist

    https://github.com/p-ranav/csv2/blob/68ded29a7af0d6660afc41fb96677462e42578e2/CMakeLists.txt#L56

    I believe this is a copy and paste error and is meant to be LICENSE.mio?

    opened by SuperWig 2
  • warning when compiled with clang 11

    /n1/env_centos/7.6/include/csv.hpp:6313:20: warning: explicitly defaulted move assignment operator is implicitly deleted [-Wdefaulted-function-deleted]
    [build]         CSVReader& operator=(CSVReader&& other) = default;
    [build]                    ^
    [build] /n1/env_centos/7.6/include/csv.hpp:6379:23: note: move assignment operator of 'CSVReader' is implicitly deleted because field 'records' has a deleted move assignment operator
    [build]         RowCollection records = RowCollection(100);
    [build]                       ^
    [build] /n1/env_centos/7.6/include/csv.hpp:5984:24: note: copy assignment operator of 'ThreadSafeDeque<csv::CSVRow>' is implicitly deleted because field '_lock' has a deleted copy assignment operator
    [build]             std::mutex _lock;
    [build]                        ^
    [build] /opt/rh/devtoolset-9/root/usr/lib/gcc/x86_64-redhat-linux/9/../../../../include/c++/9/bits/std_mutex.h:95:12: note: 'operator=' has been explicitly marked deleted here
    [build]     mutex& operator=(const mutex&) = delete;
    

    Does it matter?

    opened by xgdgsc 1
  • Fix .pc description and url

    Fix .pc description and url. PROJECT_DESCRIPTION and PROJECT_URL are not set in the project command so they evaluate to nothing. csv2 requires CMake 3.8 but setting HOMEPAGE_URL in the project command is only available in CMake 3.12 and later and the variables seem to only be used in the .pc file so set the values instead of using substitution.

    opened by niclasr 0
  • Fix invalid cast in rows()

    Hi, I've started using csv2 in my project but got the following error:

    /Users/keichi/Projects/research/csv2/include/csv2/reader.hpp:284:16: error: cannot initialize a variable of type 'char *' with an lvalue
          of type 'const char *const'
        for (char *p = buffer_; (p = (char *)memchr(p, '\n', (buffer_ + buffer_size_) - p)); ++p)
    

    This PR is a quick fix.

    opened by keichi 0
  • adding a cell view

    Rather than building a full string, returning a string view may be more efficient in some cases (suppose one wants to drop a column). Something like:

    class Cell {
      [...]
      std::string_view read_view() const { return std::string_view(buffer_ + start_, end_ - start_); }
      [...]
    };

    Even for conversion, it's probably enough.

    opened by reder2000 0
  • Empty file can not be parsed

    I use csv.mmap(filename) to read csv file.

    When the input file is empty (size 0), I expect a zero length output.

      if (csv.mmap(fileName)) {
        for (const auto row : csv)  // should not loop
    

    but I get the following error

    libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: Invalid argument
    fish: Job 1, '../bin/mp20_client.exe -in 0....' terminated by signal SIGABRT (Abort)

    opened by wodadehencou 1
  • Parsing error

    problem1

    int,string
    1,
    

    One column too few is parsed; the last column should be null.

    problem2

    int,string,int
    1,"",123
    

    There are 3 columns in total, but only 2 columns are parsed; '",123' is treated as one column.

    opened by hanguanmiao 0
  • How to solve problem? error: no matching function for call to.

    In file included from /tmp/tmp.RkhsHRAu07/main.cpp:2:0:
    /tmp/tmp.RkhsHRAu07/csv2/reader.hpp: In instantiation of ‘bool csv2::Reader<delimiter, quote_character, first_row_is_header, trim_policy>::mmap(StringType&&) [with StringType = const char (&)[8]; delimiter = csv2::delimiter<','>; quote_character = csv2::quote_character<'"'>; first_row_is_header = csv2::first_row_is_header; trim_policy = csv2::trim_policy::trim_characters<' ', '\011'>]’:
    /tmp/tmp.RkhsHRAu07/main.cpp:13:27: required from here
    /tmp/tmp.RkhsHRAu07/csv2/reader.hpp:24:11: error: no matching function for call to ‘mio::basic_mmap<(mio::access_mode)0, char>::basic_mmap(const char [8])’
    mmap_ = mio::mmap_source(filename);
    ^
    /tmp/tmp.RkhsHRAu07/csv2/reader.hpp:24:11: note: candidates are:
    In file included from /tmp/tmp.RkhsHRAu07/csv2/reader.hpp:4:0,
    from /tmp/tmp.RkhsHRAu07/main.cpp:2:
    /tmp/tmp.RkhsHRAu07/include/csv2/mio.hpp:216:3: note: mio::basic_mmap<AccessMode, ByteT>::basic_mmap(mio::basic_mmap<AccessMode, ByteT>&&) [with mio::access_mode AccessMode = (mio::access_mode)0; ByteT = char]
    basic_mmap(basic_mmap &&);
    ^
    /tmp/tmp.RkhsHRAu07/include/csv2/mio.hpp:216:3: note: no known conversion for argument 1 from ‘const char [8]’ to ‘mio::basic_mmap<(mio::access_mode)0, char>&&’
    /tmp/tmp.RkhsHRAu07/include/csv2/mio.hpp:178:3: note: mio::basic_mmap<AccessMode, ByteT>::basic_mmap() [with mio::access_mode AccessMode = (mio::access_mode)0; ByteT = char]
    basic_mmap() = default;
    ^
    /tmp/tmp.RkhsHRAu07/include/csv2/mio.hpp:178:3: note: candidate expects 0 arguments, 1 provided
    gmake[3]: *** [CMakeFiles/TestReadBigFile.dir/main.o] Error 1
    gmake[2]: *** [CMakeFiles/TestReadBigFile.dir/all] Error 2
    gmake[1]: *** [CMakeFiles/TestReadBigFile.dir/rule] Error 2
    gmake: *** [TestReadBigFile] Error 2

    opened by jhhe66 2
  • trailing missing value unrecognized

    First, thanks for this wonderful library. It's very fast compared to other CSV parsers.

    The issue is when the reader parses the following line (note that this is the last line of the file), column size will be 4. The parser failed to find the trailing missing column:

    ABC,123,123,123,\0

    If there's an additional line, column size will be 5 which is correct:

    ABC,123,123,123,\n\0

    opened by zhouw 1
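
Several of the reports above concern a trailing empty field (`1,` and `ABC,123,123,123,`). For reference, the behaviour the reporters expect can be stated as a small splitter that always emits the field after the last delimiter, even when it is empty. `split_fields` is a hypothetical function for illustration only; it is not csv2's parser and it ignores quoting.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical splitter, not csv2's parser: split one line on `delim`,
// keeping empty fields, including one after a trailing delimiter.
// Quoted fields are not handled here.
inline std::vector<std::string> split_fields(const std::string &line, char delim = ',') {
  std::vector<std::string> fields;
  std::size_t start = 0;
  while (true) {
    const std::size_t pos = line.find(delim, start);
    if (pos == std::string::npos) {
      fields.push_back(line.substr(start)); // last field, possibly empty
      break;
    }
    fields.push_back(line.substr(start, pos - start));
    start = pos + 1;
  }
  return fields;
}

} // namespace sketch
```

Under this rule `ABC,123,123,123,` has five fields whether or not a newline follows, and `1,` has two, the second being empty.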
Releases(v0.1)
  • v0.1(Dec 20, 2021)

Owner
Pranav