arrowvctrs

Codecov test coverage R-CMD-check Lifecycle: experimental

The goal of arrowvctrs is to wrap the Arrow Data C API and Arrow Stream C API to provide lightweight Arrow support for R packages to consume and produce streams of data in Arrow format. Right now it’s just a fun way for me to learn about Arrow!
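For orientation, the Arrow C Data Interface mentioned above comes down to two plain C structs, one describing a type and one describing the data. The struct layouts below follow the Arrow specification; the helpers `schema_init_int32` and `schema_is_released` are hypothetical names used only for illustration and are not part of arrowvctrs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Type description, laid out as in the Arrow C Data Interface specification */
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

/* Data description, laid out as in the same specification */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Release callback for a schema that owns no heap memory:
   it only marks the struct as released by clearing `release`. */
static void static_schema_release(struct ArrowSchema* schema) {
  schema->release = NULL;
}

/* Hypothetical helper: fill `out` as a childless int32 column.
   `name` must outlive the schema because it is not copied. */
void schema_init_int32(struct ArrowSchema* out, const char* name) {
  assert(out != NULL);
  out->format = "i"; /* "i" is the format string for int32 */
  out->name = name;
  out->metadata = NULL;
  out->flags = 0;
  out->n_children = 0;
  out->children = NULL;
  out->dictionary = NULL;
  out->release = static_schema_release;
  out->private_data = NULL;
}

/* A struct whose release callback is NULL has been released and must not be read */
int schema_is_released(const struct ArrowSchema* schema) {
  assert(schema != NULL);
  return schema->release == NULL;
}
```

A consumer is expected to call `schema->release(schema)` once it is done with the struct. After that call, `schema_is_released()` returns a non-zero value.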

Installation

You can install the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("paleolimbot/arrowvctrs")

Example

This basic example converts a data frame to an Arrow vctr and back again:

library(arrowvctrs)
(vctr <- as_arrow_vctr(ggplot2::mpg))
#> 
   
#> - schema:
#>   
   
#>   - format: +s
#>   - name: NULL
#>   - flags: 
#>   - metadata: NULL
#>   - dictionary: NULL
#>   - children[11]:
#>     
   
#>     - format: u
#>     - name: manufacturer
#>     - flags: 
#>     - metadata: NULL
#>     - dictionary: NULL
#>     - children[0]:
#>     
   
#>     - format: u
#>     - name: model
#>     - flags: 
#>     - metadata: NULL
#>     - dictionary: NULL
#>     - children[0]:
#>     
   
#>     - format: g
#>     - name: displ
#>     - flags: 
#>     - metadata: NULL
#>     - dictionary: NULL
#>     - children[0]:
#>     
   
#>     - format: i
#>     - name: year
#>     - flags: 
#>     - metadata: NULL
#>     - dictionary: NULL
#>     - children[0]:
#>     
   
#>     - format: i
#>     - name: cyl
#>     - flags: 
#>     - metadata: NULL
#>     - dictionary: NULL
#>     - children[0]:
#>     
   
#>     - format: u
#>     - name: trans
#>     - flags: 
#>     - metadata: NULL
#>     - dictionary: NULL
#>     - children[0]:
#>     
   
#>     - format: u
#>     - name: drv
#>     - flags: 
#>     - metadata: NULL
#>     - dictionary: NULL
#>     - children[0]:
#>     
   
#>     - format: i
#>     - name: cty
#>     - flags: 
#>     - metadata: NULL
#>     - dictionary: NULL
#>     - children[0]:
#>     
   
#>     - format: i
#>     - name: hwy
#>     - flags: 
#>     - metadata: NULL
#>     - dictionary: NULL
#>     - children[0]:
#>     
   
#>     - format: u
#>     - name: fl
#>     - flags: 
#>     - metadata: NULL
#>     - dictionary: NULL
#>     - children[0]:
#>     
   
#>     - format: u
#>     - name: class
#>     - flags: 
#>     - metadata: NULL
#>     - dictionary: NULL
#>     - children[0]:
#> - array:
#>   
   
#>   - length: 234
#>   - null_count: 0
#>   - offset: 0
#>   - buffers[0]:  list()
#>   - dictionary: NULL
#>   - children[11]:
#>     
   
#>     - length: 234
#>     - null_count: 0
#>     - offset: 0
#>     - buffers[2]: List of 2
#>       $ : int [1:235] 0 4 8 12 16 20 24 28 32 36 ...
#>       $ : raw [1:1463] 61 75 64 69 ...
#>     - dictionary: NULL
#>     - children[0]:
#>     
   
#>     - length: 234
#>     - null_count: 0
#>     - offset: 0
#>     - buffers[2]: List of 2
#>       $ : int [1:235] 0 2 4 6 8 10 12 14 24 34 ...
#>       $ : raw [1:2455] 61 34 61 34 ...
#>     - dictionary: NULL
#>     - children[0]:
#>     
   
#>     - length: 234
#>     - null_count: 0
#>     - offset: 0
#>     - buffers[1]: List of 1
#>       $ : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
#>     - dictionary: NULL
#>     - children[0]:
#>     
   
#>     - length: 234
#>     - null_count: 0
#>     - offset: 0
#>     - buffers[1]: List of 1
#>       $ : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
#>     - dictionary: NULL
#>     - children[0]:
#>     
   
#>     - length: 234
#>     - null_count: 0
#>     - offset: 0
#>     - buffers[1]: List of 1
#>       $ : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
#>     - dictionary: NULL
#>     - children[0]:
#>     
   
#>     - length: 234
#>     - null_count: 0
#>     - offset: 0
#>     - buffers[2]: List of 2
#>       $ : int [1:235] 0 8 18 28 36 44 54 62 72 80 ...
#>       $ : raw [1:2026] 61 75 74 6f ...
#>     - dictionary: NULL
#>     - children[0]:
#>     
   
#>     - length: 234
#>     - null_count: 0
#>     - offset: 0
#>     - buffers[2]: List of 2
#>       $ : int [1:235] 0 1 2 3 4 5 6 7 8 9 ...
#>       $ : raw [1:234] 66 66 66 66 ...
#>     - dictionary: NULL
#>     - children[0]:
#>     
   
#>     - length: 234
#>     - null_count: 0
#>     - offset: 0
#>     - buffers[1]: List of 1
#>       $ : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
#>     - dictionary: NULL
#>     - children[0]:
#>     
   
#>     - length: 234
#>     - null_count: 0
#>     - offset: 0
#>     - buffers[1]: List of 1
#>       $ : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
#>     - dictionary: NULL
#>     - children[0]:
#>     
   
#>     - length: 234
#>     - null_count: 0
#>     - offset: 0
#>     - buffers[2]: List of 2
#>       $ : int [1:235] 0 1 2 3 4 5 6 7 8 9 ...
#>       $ : raw [1:234] 70 70 70 70 ...
#>     - dictionary: NULL
#>     - children[0]:
#>     
   
#>     - length: 234
#>     - null_count: 0
#>     - offset: 0
#>     - buffers[2]: List of 2
#>       $ : int [1:235] 0 7 14 21 28 35 42 49 56 63 ...
#>       $ : raw [1:1462] 63 6f 6d 70 ...
#>     - dictionary: NULL
#>     - children[0]:
tibble::as_tibble(from_arrow_vctr(vctr))
#> # A tibble: 234 × 11
#>    manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
#>    <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
#>  1 audi         a4           1.8  1999     4 auto… f        18    29 p     comp…
#>  2 audi         a4           1.8  1999     4 manu… f        21    29 p     comp…
#>  3 audi         a4           2    2008     4 manu… f        20    31 p     comp…
#>  4 audi         a4           2    2008     4 auto… f        21    30 p     comp…
#>  5 audi         a4           2.8  1999     6 auto… f        16    26 p     comp…
#>  6 audi         a4           2.8  1999     6 manu… f        18    26 p     comp…
#>  7 audi         a4           3.1  2008     6 auto… f        18    27 p     comp…
#>  8 audi         a4 quattro   1.8  1999     4 manu… 4        18    26 p     comp…
#>  9 audi         a4 quattro   1.8  1999     4 auto… 4        16    25 p     comp…
#> 10 audi         a4 quattro   2    2008     4 manu… 4        20    28 p     comp…
#> # … with 224 more rows
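The printed output can be read with two small pieces of knowledge: what the format strings mean, and how a string column is split into an offsets buffer and a bytes buffer. The sketch below assumes 32-bit offsets (as in the `int [1:235]` buffers printed above, one more offset than there are rows) and covers only the four format strings that appear in this example. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Map the format strings seen in the example output to readable type names */
const char* format_name(const char* format) {
  assert(format != NULL);
  if (strcmp(format, "+s") == 0) return "struct";
  if (strcmp(format, "u") == 0) return "utf8 string";
  if (strcmp(format, "g") == 0) return "float64";
  if (strcmp(format, "i") == 0) return "int32";
  return "unknown";
}

/* Copy element i of a string column into `out` as a NUL-terminated string.
   Element i occupies bytes offsets[i] .. offsets[i + 1] - 1 of `data`.
   Returns the string length, or -1 if `out` is too small. */
int32_t utf8_element(const int32_t* offsets, const uint8_t* data, int64_t i,
                     char* out, size_t out_size) {
  assert(offsets != NULL && data != NULL && out != NULL && i >= 0);
  int32_t start = offsets[i];
  int32_t len = offsets[i + 1] - start;
  if (len < 0 || (size_t)len + 1 > out_size) return -1;
  memcpy(out, data + start, (size_t)len);
  out[len] = '\0';
  return len;
}
```

With the first offsets of the manufacturer column (0, 4, 8, ...) and its first bytes (61 75 64 69), element 0 decodes to "audi". The model column (offsets 0, 2, 4, ... and bytes 61 34) starts with "a4".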
Issues
  • Minor compiler warnings

    Thanks for putting this together -- I've been on-and-off following under the different names it had over the last few months.

    One thing I noticed, which may be due to my use of gcc / g++ (now at -10 or -11) while you are presumably using clang++, is that I get two nuisance warnings. They are easy to fix and I would be happy to send you a PR:

    • in src/cast-from-array.c you declare double_copy_result to hold the function return, but the compiler is unhappy that the result is never used -- the simple fix is to not declare and assign it
    • in src/narrow/buffer.cc we have a simple signed/unsigned mismatch.

    That's all, and I don't mean to intrude or trample on your feet, but you appear to be a rather careful coder so maybe you'd value two fewer warnings.

    opened by eddelbuettel 6
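The two kinds of warning described in this issue can be sketched in a few lines. The code below is not taken from the package: `copy_doubles` and `sum_prefix` are hypothetical stand-ins that show one common way to silence an unused-result warning and a signed/unsigned comparison warning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy routine that returns a status code */
static int copy_doubles(double* dst, const double* src, int64_t n) {
  for (int64_t i = 0; i < n; i++) dst[i] = src[i];
  return 0;
}

/* Copy the first n values into scratch and sum them.
   n is signed (as Arrow lengths are); capacity is unsigned.
   Returns 0.0 when n is negative or exceeds capacity. */
double sum_prefix(const double* src, int64_t n, size_t capacity, double* scratch) {
  assert(src != NULL && scratch != NULL);
  /* Signed/unsigned fix: rule out negative n first, then cast explicitly
     so both sides of the comparison are unsigned. */
  if (n < 0 || (size_t)n > capacity) return 0.0;
  /* Unused-result fix: do not assign the return value to a variable that
     is never read; the (void) cast documents that it is ignored on purpose. */
  (void)copy_doubles(scratch, src, n);
  double total = 0.0;
  for (int64_t i = 0; i < n; i++) total += scratch[i];
  return total;
}
```

Whether ignoring the status code is acceptable depends on the routine. If it can actually fail, checking the value is the better fix.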
  • Could narrow_pointer_addr_dbl be exported as well?

    You have

    # only needed for arrow package before 7.0.0
    narrow_pointer_addr_dbl <- function(ptr) {
      .Call(narrow_c_pointer_addr_dbl, ptr)
    }
    

    and I had been / am using schema and array pointers cast to double, which maps reasonably well for non-R use. Is there a reason this could not be exported, whether or not Arrow 7.0 changed this? (Also, changed to ... what? I did not immediately find a replacement or alternative in the updated arrow package, but I will admit that I find navigating its code base a little difficult given its size and rate of change.)

    Also, #5 really asked for a C-level API, as helpful as the exported R functions are. I may still play with a local variant to export the object code and see how it goes.

    Thanks again for the package. It is quite helpful.

    opened by eddelbuettel 4
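For readers unfamiliar with the idea of passing a pointer as a double, here is a minimal sketch of the round trip. It is an illustration only and does not reproduce the package's implementation; both function names are hypothetical. It relies on the address fitting in the 53 bits that a double can represent exactly, which the sketch checks with an assertion.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a pointer's address as a double.
   Integers are represented exactly by a double only up to 2^53. */
double pointer_addr_as_dbl(const void* ptr) {
  uint64_t addr = (uint64_t)(uintptr_t)ptr;
  assert(addr <= (UINT64_C(1) << 53));
  return (double)addr;
}

/* Decode an address produced by pointer_addr_as_dbl() back into a pointer */
void* pointer_from_dbl(double addr) {
  assert(addr >= 0.0);
  return (void*)(uintptr_t)addr;
}
```

The receiving side has no way to verify that the number still refers to a live object, so this style of hand-off only works when both sides agree on ownership and lifetime.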
  • Exporting record batch readers from arrow package segfaults

    ...probably a misunderstanding on my part about the object types that are expected:

    library(carrow)
    
      # some test data
      df <- data.frame(a = 1L, b = 2, c = "three")
      batch <- arrow::record_batch(df)
      tf <- tempfile()
    
      # write a valid file
      file_obj <- arrow::FileOutputStream$create(tf)
      writer <- arrow::RecordBatchFileWriter$create(file_obj, batch$schema)
      writer$write(batch)
      writer$close()
      file_obj$close()
    
      # create the reader
      read_file_obj <- arrow::ReadableFile$create(tf)
      reader <- arrow::RecordBatchFileReader$create(read_file_obj)
    
      # export it to carrow
      stream <- as_carrow_array_stream(reader)
    
      schema <- carrow_array_stream_get_schema(stream)
      identical(
        carrow_schema_info(schema, recursive = TRUE),
        carrow_schema_info(as_carrow_schema(reader$schema), recursive = TRUE)
      )
    #> [1] TRUE
    
      # skip("Attempt to read batch from exported RecordBatchReader segfaults")
      # batch <- carrow_array_stream_get_next(stream)
    
      read_file_obj$close()
      unlink(tf)
    

    Created on 2021-11-23 by the reprex package (v2.0.1)

    opened by paleolimbot 1
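As background for this report, the sketch below shows how a consumer is expected to drain a stream through the Arrow C Stream interface: call `get_next` repeatedly, and treat an output array whose `release` is NULL as the end of the stream. The struct layouts follow the Arrow specification. The toy producer and the names `stream_total_length` and `toy_stream_init` are hypothetical, and nothing here diagnoses the segfault described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ArrowSchema; /* only used through a pointer in this sketch */

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

struct ArrowArrayStream {
  int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
  int (*get_next)(struct ArrowArrayStream*, struct ArrowArray* out);
  const char* (*get_last_error)(struct ArrowArrayStream*);
  void (*release)(struct ArrowArrayStream*);
  void* private_data;
};

/* Consumer: sum the lengths of all batches; returns -1 if get_next fails */
int64_t stream_total_length(struct ArrowArrayStream* stream) {
  assert(stream != NULL && stream->get_next != NULL);
  int64_t total = 0;
  for (;;) {
    struct ArrowArray batch;
    if (stream->get_next(stream, &batch) != 0) return -1;
    if (batch.release == NULL) break; /* end of stream */
    total += batch.length;
    batch.release(&batch); /* the consumer releases every batch it receives */
  }
  return total;
}

/* Toy producer: yields *private_data batches of 3 rows each, with no buffers */
static void toy_array_release(struct ArrowArray* array) { array->release = NULL; }

static int toy_get_next(struct ArrowArrayStream* stream, struct ArrowArray* out) {
  int* remaining = (int*)stream->private_data;
  struct ArrowArray empty = {0};
  *out = empty; /* release == NULL unless a batch is produced below */
  if (*remaining > 0) {
    (*remaining)--;
    out->length = 3;
    out->release = toy_array_release;
  }
  return 0;
}

static void toy_stream_release(struct ArrowArrayStream* stream) { stream->release = NULL; }

void toy_stream_init(struct ArrowArrayStream* stream, int* remaining) {
  stream->get_schema = NULL;     /* not exercised by this sketch */
  stream->get_last_error = NULL; /* not exercised by this sketch */
  stream->get_next = toy_get_next;
  stream->release = toy_stream_release;
  stream->private_data = remaining;
}
```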
  • Fedora 36 gcc 12 problems

    narrow-check.zip shows what happens when R CMD check is run on Fedora 36 with gcc 12.1.1 and the Fedora libarrow etc. binaries. Same for geoarrow: geoarrow-check.zip

    The GDAL 3.5.0 Arrow and Parquet drivers read the GDAL autotest/ogr/data examples OK - my reason for raising a diffuse issue is https://github.com/wcjochem/sfarrow/issues/14, a new failing revdep check on F36/gcc12 and Fedora's arrow RPMs (no external arrow libraries at all previously). I think, since you are obviously looking at checking the GDAL drivers, it might be worth seeing how to cross-check sf with GDAL 3.5.0 and the two experimental drivers, geoarrow and sfarrow (and others if they appear - maybe Python and reticulate). What do you think (@edzer for reference, this format feels promising)?

    opened by rsbivand 4
  • Could the C API be exported?

    narrow already automates creating the needed instantiations via src/init.c. That registers the object code.

    What do you think about going one step further and providing a file, say, narrowAPI.h, with a bunch of declarations such as (untested, adapted from prior work)

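    SEXP attribute_hidden narrow_c_array_info(SEXP x) {
      /* C requires constant static initializers, so look the routine up on first use */
      static SEXP(*fun)(SEXP) = NULL;
      if (fun == NULL) fun = (SEXP(*)(SEXP)) R_GetCCallable("narrow","narrow_c_array_info");
      return fun(x);
    }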
    

    and so on.

    A number of packages do this (e.g. nloptr; I also have two small packages exporting some of R's own (unexported) API functions via copies in the package, and have added it in small scope to other packages). The canonical example (also in WRE) is always lme4 and Matrix.

    Would you have any interest in supporting such use?

    opened by eddelbuettel 11
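The mechanism requested in this issue is a name-based lookup: one package registers a routine under a (package, name) pair, and another package fetches the function pointer at run time and caches it. The toy registry below imitates that idea in plain C so it can be compiled without R. It is not R's implementation, and `register_callable`, `get_callable`, `add_one` and `call_add_one` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic function pointer type used to store routines of any signature */
typedef void (*generic_fn)(void);

struct callable {
  const char* package;
  const char* name;
  generic_fn fn;
};

#define MAX_CALLABLES 16
static struct callable registry[MAX_CALLABLES];
static size_t n_callables = 0;

/* Provider side: publish a routine; returns -1 when the table is full */
int register_callable(const char* package, const char* name, generic_fn fn) {
  if (n_callables >= MAX_CALLABLES) return -1;
  registry[n_callables].package = package;
  registry[n_callables].name = name;
  registry[n_callables].fn = fn;
  n_callables++;
  return 0;
}

/* Consumer side: look a routine up by package and name; NULL if absent */
generic_fn get_callable(const char* package, const char* name) {
  for (size_t i = 0; i < n_callables; i++) {
    if (strcmp(registry[i].package, package) == 0 &&
        strcmp(registry[i].name, name) == 0) {
      return registry[i].fn;
    }
  }
  return NULL;
}

/* A routine the provider wants to share */
int add_one(int x) { return x + 1; }

/* Consumer wrapper: resolve the pointer on first use, then reuse it.
   The pointer is cast back to its real signature before being called. */
int call_add_one(int x) {
  static int (*fun)(int) = NULL;
  if (fun == NULL) fun = (int (*)(int))get_callable("provider", "add_one");
  assert(fun != NULL);
  return fun(x);
}
```

The header proposed in the issue would contain one such wrapper per exported routine, so that client packages never need to link against the provider's object code directly.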
Owner
Dewey Dunnington
Former RStudio summer intern, ggplot2