Rangeless - c++ LINQ -like library of higher-order functions for data manipulation

Overview

rangeless::fn

range-free LINQ-like library of higher-order functions for manipulation of containers and lazy input-sequences.

Documentation

What it's for

  • Reduce the amount of mutable state.
  • Flatten control-flow.
  • Lessen the need to deal with iterators directly.
  • Make the code more expressive and composeable.

This library is intended for moderate to advanced-level c++ programmers that like the idea of c++ ranges, but can't or choose not to use them for various reasons (e.g. high complexity, compilation overhead, debug-build performance, size of the library, etc).

Motivations:

Features

  • Portable c++11. (examples are c++14)
  • Single-header.
  • Minimal standard library dependencies.
  • No inheritance, polymorphism, type-erasures, ADL, advanced metaprogramming, enable_ifs, concepts, preprocessor magic, arcane programming techniques (for some definition of arcane), or compiler-specific workarounds.
  • Low #include and compile-time overhead.
  • Enables trivial parallelization (see fn::to_async and fn::transform_in_parallel).
  • Allows for trivial extension of functionality (see fn::adapt).

Simple examples

struct employee_t
{
            int id;
    std::string last_name;
    std::string first_name;
            int years_onboard;

    bool operator<(const employee_t& other) const
    {
        return id < other.id;
    }
};

namespace fn = rangeless::fn;
using fn::operators::operator%;   // arg % fn   equivalent to fn(std::forward<Arg>(arg))
using fn::operators::operator%=;  // arg %= fn; equivalent to arg = fn( std::move(arg));

// Abbreviated lambda macro, see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0573r2.html
// (It is not defined in this library, because nobody likes macros except those we define ourselves).
#define L(expr) ([&](auto&& _ ){ return expr; })

auto employees = std::vector<employee_t>{/*...*/};

employees %= fn::where L( _.last_name != "Doe" );
employees %= fn::take_top_n_by(10, L( _.years_onboard ));
employees %= fn::sort_by L( std::tie( _.last_name, _.first_name) );

// or:

employees = std::move(employees)
          % fn::where L( _.last_name != "Doe" )
          % fn::take_top_n_by(10, L( _.years_onboard ))
          % fn::sort_by L( std::tie( _.last_name, _.first_name) );

How does this work? E.g. fn::sort_by(projection_fn) is a higher-order function that returns a unary function that takes inputs by value (normally passed as rvalue), sorts them by the user-provided projection, and returns them by value (non-copying). Similarly, fn::where(predicate) filters the input to those satisfying the predicate.

operator %, pronounced "then", invoked as arg % unary_function, is syntax-sugar, similar to F#'s operator |>, that enables structuring your code in top-down manner, consistent with the direction of the data-flow, similar to UNIX pipes. It is implemented as:

    template<typename Arg, typename F>
    auto operator % (Arg&& arg, F&& fn) -> decltype( std::forward<F>(fn)( std::forward<Arg>(arg)) )
    {
        return std::forward<F>(fn)( std::forward<Arg>(arg));
    }

Without using operator% the above example would look as follows:

employees = fn::where L( _.last_name != "Doe" )( std::move(employees) );
employees = fn::take_top_n_by(10, L( _.years_onboard ))( std::move(employees) );
employees = fn::sort_by L( std::tie( _.last_name, _.first_name) )( std::move(employees) );

// or, as single nested function call:

employees = fn::sort_by L( std::tie( _.last_name, _.first_name))(
                fn::take_top_n_by(10, L( _.years_onboard ))(
                    fn::where L( _.last_name != "Doe" )(
                        std::move(employees) )));

Example: Top-5 most frequent words chosen among the words of the same length.

    auto my_isalnum = L( std::isalnum(_) || _ == '_' );

    fn::from( std::istreambuf_iterator<char>(istr.rdbuf()), {})
  % fn::transform L( char(std::tolower(_)) )
  % fn::group_adjacent_by(my_isalnum)             // returns sequence-of-std::string
  % fn::where L( my_isalnum( _.front()))          // discard strings with punctuation
  % fn::counts()                                  // returns map<string,size_t> of word->count
  % fn::group_all_by L( _.first.size())           // returns [[(word, count)]], each subvector containing words of same length
  % fn::transform(                                // transform each sub-vector...
        fn::take_top_n_by(5UL, L( _.second))      // by filtering it taking top-5 by count.
  % fn::concat()                                  // undo group_all_by (flatten)
  % fn::for_each L( void(std::cout << _.first << "\t" << _.second << "\n") );

    // compilation time:
    // >>time g++ -I ../include/ -std=c++14 -o test.o -c test.cpp
    // real   0m1.176s
    // user   0m1.051s
    // sys    0m0.097s

More examples

Description

Unlike range-v3, this library is centered around value-semantics rather than reference-semantics. This library does not know or deal with the multitude of range-related concepts; rather, it deals with data transformations via higher-order functions. It differentiates between two types of inputs: a Container and a lazy seq<NullaryInvokable> satisfying single-pass forward-only InputRange semantics (also known as a data-stream). Most of the function-objects in this library have two overloads of operator() respectively. Rather than composing views over ranges as with range-v3, operator()s take inputs by value, operate on it eagerly or compose a lazy seq, as appropriate (following the Principle of Least Astonishment), and return the result by value (with move-semantics) to the next stage.

E.g.

  • fn::where
    • given a container, passed by rvalue, returns the same container filtered to elements satisfying the predicate, using the erase-remove or iterate-erase idioms under the hood, as appropriate.
    • given a container, passed by lvalue-reference, returns a copy of the container with elements satisfying the predicate, using std::copy_if under the hood.
    • given a seq, passed by value, composes and returns a seq that will skip the elements not satisfying the predicate (lazy).
  • fn::sort
    • given a container, passed by value, returns the sorted container.
    • given a seq, passed by value, moves elements into a std::vector, and delegates to the above.
  • fn::transform
    • given a seq, passed by value, returns a seq wrapping a composition of the transform-function over the underlying NullaryInvokable that will lazily yield the results of transform-function.
    • given a container, passed by value, wraps it as seq and delegates to the above (i.e. also lazy).

Some functions in this library internally buffer elements, as appropriate, with single-pass streaming inputs, whereas range-v3, on the other hand, imposes multipass ForwardRange or stronger requirement on the inputs in situations that would otherwise require buffering. This makes this library conceptually more similar to UNIX pipelines with eager sort and lazy sed, than to c++ ranges.

Operations Buffering behavior Laziness
fn::group_adjacent_by, fn::in_groups_of buffer elements of the incoming group lazy
fn::unique_all_by buffer unique keys of elements seen so far lazy
fn::drop_last, fn::sliding_window buffer a queue of last n elements lazy
fn::transform_in_parallel buffer a queue of n executing async-tasks lazy
fn::group_all_by, fn::sort_by, fn::lazy_sort_by, fn::reverse, fn::to_vector buffer all elements eager
fn::take_last buffer a queue of last n elements eager
fn::where_max_by, fn::where_min_by buffer maximal/minimal elements as seen so-far eager
fn::take_top_n_by buffer top n elements as seen so-far eager

Signaling end-of-sequence from a generator-function

More often than not a generator-function that yields a sequence of values will not be an infinite Fibonacci sequence, but rather some bounded sequence of objects, either from a file, a socket, a database query, etc, so we need to be able to signal end-of-sequence. One way to do it is to yield elements wrapped in std::unique_ptr or std::optional:

  fn::seq([]() -> std::unique_ptr<...> { ... })
% fn::take_while([](const auto& x) { return bool(x); })
% fn::transform(fn::get::dereferenced{})
% ...

If your value-type has an "empty" state interpretable as end-of-inputs, you can use the value-type directly without wrapping.

If you don't care about incurring an exception-handling overhead once per whole seq, there's a simpler way of doing it: just return fn::end_seq() from the generator function (e.g. see my_intersperse example). This throws end-of-sequence exception that is caught under the hood (python-style). If you are in -fno-exceptions land, then this method is not for you.

Summary of different ways of passing inputs

      fn::seq([]{ ... }) % ... // as input-range from a nullary invokable

          std::move(vec) % ... // pass a container by-move
                    vec  % ... // pass by-copy

           fn::from(vec) % ... // as move-view yielding elements by-move (std::move will make copies iff vec is const)
          fn::cfrom(vec) % ... // as above, except always take as const-reference / yield by copy
           fn::refs(vec) % ... // as seq taking vec by reference and yielding reference-wrappers

fn::from(it_beg, it_end) % ... // as a move-view into range (std::move will make copies iff const_iterator)
  fn::from(beg_end_pair) % ... // as above, as std::pair of iterators

Note: fn::from can also be used to adapt an lvalue-reference to an Iterable that implements begin() and end() as free-functions rather than methods.

Primer on using projections

Grouping/sorting/uniqing/where_max_by/etc. functions take a projection function rather than a binary comparator as in std:: algorithms.

    // Sort by employee_t::operator<.
    employees %= fn::sort(); // same as fn::sort_by( fn::by::identity{} );

    // Sort by a projection involving multiple fields (first by last_name, then by first_name):
    employees %= fn::sort_by L( std::make_pair( _.last_name, _.first_name ));

    // The above may be inefficient (makes copies); prefer returning as tuple of references:
    employees %= fn::sort_by L( std::tie( _.last_name, _.first_name ));

    // If need to create a mixed tuple capturing lvalues as references and rvalues as values:
    employees %= fn::sort_by L( std::make_tuple( _.last_name.size(), std::ref( _.last_name ), std::ref( _.first_name )));

    // Equivalently, using fn::tie_lvals(...) that captures lvalues as references like std::tie, and rvalues as values:
    employees %= fn::sort_by L( fn::tie_lvals( _.last_name.size(), _.last_name, _.first_name ));

    // fn::by::decreasing() and fn::by::decreasing_ref() can wrap individual values or references,
    // The wrapper captures the value or reference and exposes inverted operator<.
    // E.g. to sort by (last_name's length, last_name descending, first_name):
    employees %= fn::sort_by L( fn::tie_lvals( _.last_name.size(), fn::by::decreasing_ref( _.last_name ), _.first_name ));

    // fn::by::decreasing() can also wrap the entire projection-function:
    employees %= fn::sort_by( fn::by::decreasing L( fn::tie_lvals( _.last_name.size(), _.last_name, _.first_name )));

    // If the projection function is expensive, and you want to invoke it once per element:
    auto expensive_key_fn = [](const employee_t& e) { return ... };

    employees = std::move(employees)
              % fn::transform([&](employee_t e)
                {
                    auto key = expensive_key_fn(e);
                    return std::make_pair( std::move(e), std::move(key) );
                })
              % fn::sort_by( fn::by::second{})   // or L( std::ref( _.second ))
              % fn::transform( fn::get::first{}) // or L( std::move( _.first ))
              % fn::to_vector();

    // Alternatively, expensive_key_fn can be wrapped with a unary-function-memoizer
    // (results are internally cached in std::map and subsequent lookups are log-time).
    employees %= fn::sort_by( fn::make_memoized( expensive_key_fn ));

    // fn::make_comp() takes a projection and creates a binary Comparator object
    // that can be passed to algorithms that require one.
    gfx::timsort( employees.begin(), employees.end(),
                  fn::by::make_comp L( std::tie( _.last_name, _.first_name )));

#include and compilation-time overhead

Despite packing a lot of functionality, #include <fn.hpp> adds only a tiny sliver (~0.03s) of compilation-time overhead in addition to the few common standard library include-s that it relies upon:

// tmp.cpp

#if defined(STD_INCLUDES)
#    include <stdexcept>
#    include <algorithm>
#    include <functional>
#    include <vector>
#    include <map>
#    include <deque>
#    include <string>
#    include <cassert>
#elif defined(INCLUDE_FN)
#    include "fn.hpp"
#elif defined(INCLUDE_RANGE_ALL)
#    include <range/v3/all.hpp>
#endif

int main()
{
    return 0;
}
# just std-includes used by fn.hpp
>>time for i in {1..10}; do g++ -std=c++14 -DSTD_INCLUDES=1 -o tmp.o -c tmp.cpp; done
real    0m3.682s
user    0m3.106s
sys     0m0.499s

# fn.hpp
>>time for i in {1..10}; do g++ -std=c++14 -DINCLUDE_FN=1 -o tmp.o -c tmp.cpp; done
real    0m3.887s
user    0m3.268s
sys     0m0.546s

# range/v3/all.hpp, for comparison
>>time for i in {1..10}; do g++ -std=c++14 -DINCLUDE_RANGE_ALL=1 -I. -o tmp.o -c tmp.cpp; done
real    0m22.687s
user    0m20.412s
sys     0m2.043s

There are not many compiler-torturing metaprogramming techniques used by the library, so the template-instantiation overhead is reasonable as well (see the Top-5 most frequent word example; leaving it as an excercise to the reader to compare with raw-STL-based implementation).

Discussion

Many programmers after getting introduced to toy examples get an impression that the intended usage is "to express the intent" or "better syntax" or to "separate the concerns", etc.
Others look at the toy examples and point out that they could be straightforwardly written as normal imperative code, and I tend to agree with them: There's no real reason to write code like:

    std::move(xs)
  % fn::where(     [](const auto& x) { return x % 2 == 0; })
  % fn::transform( [](const auto& x) { return x * x; }
  % fn::for_each(  [](const auto& x) { std::cout << x << "\n"; })

, if it can be written simply as

for(const auto& x : xs)
    if(x % 2 == 0)
        std::cerr << x*x << "\n";

The "functional-style" equivalent just incurs additional compile-time, debugging-layers, and possibly run-time overhead.

There are some scenarios where functional-style is useful:

Const-correctness and reduction of mutable state.

When you declare a non-const variable in your code (or worse, when you have to deal with an API that forces you to do that), you introduce another "moving part" in your program. Having more things const makes your code more robust and easier to reason about.

With this library you can express complex data transformations as a single expression, assigning the result to a const variable (unless you intend to std::move() it, of course).

const auto result = std::move(inputs) % stage1 % stage2 % ... ;

Another code-pattern for this is immediately-executed-lambda.

const X x = [&]
{
    X x{};

    // build X...
    return x;
}();

Reduced boilerplate.

E.g. compare

employees %= fn::where L( _.last_name != "Doe" );

vs. idiomatic c++ way:

employees.erase(
    std::remove_if( employees.begin(), 
                    employees.end(), 
                    [](const employee_t& e)
                    {
                        return e.last_name == "Doe"; 
                    }), 
    employees.end());

Transformations over infinite (arbitrarily large) streams (InputRanges)

The most useful use-case is the scenarios for writing a functional pipeline over an infinite stream that you want to manipulate lazily, like a UNIX pipeline, e.g. the above-mentioned aln_filter.cpp.

Implement embarassingly-parallel problems trivially.

Replacing fn::transform with fn::transform_in_parallel where appropriate may be all it takes to parallelize your code.

Downsides and Caveats.

Compilation errors related to templates are completely gnarly. For this reason the library has many static_asserts to help you figure things out. If you encounter a compilation error that could benefit from adding a static_assert, please open an issue.

Sometimes it may be difficult to reason about the complexity space and time requirements of some operations. There are two ways to approach this: 1) Peek into documentation where I discuss space and time big-O for cases that are not obvious (e.g. how lazy_sort_by differs from regular sort_by, or how unique_all_by operates in single-pass for seqs). 2) Feel free to peek under the hood of the library. Most of the code is intermediate-level c++ that should be within the ability to grasp by someone familiar with STL and <algorithm>.

There's a possibility that a user may instantiate a seq and then forget to actually iterate over it, e.g.

    std::move(inputs) % fn::transform(...); // creates and immediately destroys a lazy-seq.

    // whereas the user code probably intended:
    std::move(inputs) % fn::transform(...) % fn::for_each(...);

Minimum supported compilers: MSVC-19.15, GCC-4.9.3, clang-3.7, ICC-18

References:

c++ -specific (in no particular order):

Recommended blogs:

Issues
  • std::inserter not found

    std::inserter not found

    I get a compiler error when trying to use this library with a C++17 msvc2019 project:

    "inserter" is not a member of "std"

    Adding #include <iterator> fixes this.

    I would send a pull request, but I am not experienced enough in C++ to have the confidence.

    opened by XDracam 1
  • exists_where doesn't work after fn::transform

    exists_where doesn't work after fn::transform

    The exists_where operation takes a const Iterable& which makes it fail when used after fn::transform or other operators... I think that ones that generate seq? The issue seems to be that seq.begin is a non-const operation because seq is consumable.

    I think that exists_where should work like for_each or foldl, both of which take Iterable&&. It's really just a foldl with a fixed algorithm that can return early.

    First invocation works, second one doesn't:

    std::vector<int> a = {1, 2, 3};
    a % fn::exists_where([](auto item_a) { return item_a == 1; });
    a % fn::transform([](auto item_a) { return item_a; }) % fn::exists_where([](auto item_a) { return item_a == 1; });
    
    opened by willisblackburn 0
  • Allow easier inclusion in other cmake projects

    Allow easier inclusion in other cmake projects

    This is a bit awkward because I'm such a cmake noob but this PR adds fn as a interface library which allows it to be used directly via target_link_libraries in other projects (using CPM.cmake here)

    CPMAddPackage(
        NAME fn
        GITHUB_REPOSITORY ast-al/rangeless
    )
    
    target_link_libraries(MyApp PRIVATE
        fn
    )
    
    opened by doubleday 1
  • Recent compilers are warning about pessimizations

    Recent compilers are warning about pessimizations

    I have been playing around with rangeless as a faster compiling alternative to std::ranges and ranges-v3. I have been getting some compiler warnings that there are redundant or pessimizing moves in fn::transform and in fn::take_while. Would you consider removing the unnecessary std::move wrappers on lines 1592 and 1846? See this example. Thanks.

    opened by gotnone 1
Manual mapper that uses PTE manipulation, Virtual Address Descriptor (VAD) manipulation, and forceful memory allocation to hide executable pages. (VAD hide / NX bit swapping)

Stealthy Kernel-mode Injector Manual mapper that uses PTE manipulation, Virtual Address Descriptor (VAD) manipulation, and forceful memory allocation

Charlie Wolfe 79 Jun 22, 2022
Example code for collecting weather data from an ESP32 and then uploading this data to InfluxDB in order to create a dashboard using Grafana.

InfluxGrafanaTutorial Example code for collecting weather data from an ESP32 and then uploading this data to InfluxDB in order to create a dashboard u

Michael Klements 5 May 12, 2022
Libft is an individual project at 42 that requires us to re-create some standard C library functions including some additional ones that can be used later to build a library of useful functions for the rest of the program.

Libft is an individual project at 42 that requires us to re-create some standard C library functions including some additional ones that can be used later to build a library of useful functions for the rest of the program.

Paulo Rafael Ramalho 0 Apr 5, 2022
This repository was created in order to keep local data with code in the cloud.

Airplane Ino Данный репозиторий был создан для совсместной комфортной работы над проектом. В данном файле(README.md) будет размещена основная полезная

surpri6e 1 Nov 26, 2021
Quick fix to iphone usb tethering with ios14 or higher for Linux kernel lower than 5.10.4

Quick fix to Linux Iphone USB tethering with IOS 14 or higher (Tested with ubuntu 18.04, kernel 5.4.0-65, if you fail in the build, please download yo

null 23 May 6, 2022
Allows to join RDF of previous expansions on a higher character level

mod-rdf-expansion Allows to join RDF of previous expansions on a higher character level. Up to character level 58, you can join the "Random Classic Du

AzerothCore 3 Apr 3, 2022
MozJPEG improves JPEG compression efficiency achieving higher visual quality and smaller file sizes at the same time

Mozilla JPEG Encoder Project MozJPEG improves JPEG compression efficiency achieving higher visual quality and smaller file sizes at the same time. It

Mozilla 4.9k Jun 24, 2022
hb-libgd is a graphics library for the dynamic manipulation of images.

hb-libgd Harbour bindings for libGD 2.3.3, GD is an open source code library for the dynamic creation of images by programmers. GD has builtin support

Rafał Jopek 0 Dec 10, 2021
Pdfmm - A C++ PDF manipulation library forked from PoDoFo

pdfmm What is pdfmm? Requirements String encoding API Stability TODO Licensing No warranty Contributions Authors What is pdfmm? pdfmm is a s a free po

null 22 Jun 22, 2022
Modifies the hosts file in order to block sites hosting Kant's rat

In the Minecraft cheating community, it's not uncommon for clients or client cracks/leaks to be malware. The most famous example of this would be the Autumn client "crack", released by Kant. This application attempts to blacklist known hosts of Kant's malware, in order to prevent someone from accidentally getting themselves ratted.

Gardening_Tool 55 Jun 16, 2022
A simple utility that cold patches dwm (uDWM.dll) in order to disable window rounded corners in Windows 11

Win11DisableRoundedCorners A simple utility that cold patches the Desktop Window Manager (uDWM.dll) in order to disable window rounded corners in Wind

Valentin-Gabriel Radu 387 Jun 27, 2022
DLL Hijack Search Order Enumeration BOF

DLL Hijack Search Order BOF What is this? This is a Cobalt Strike BOF file, meant to use two arguments (path to begin, and a DLL filename of interest)

null 117 Jun 24, 2022
A collection of DLLs that use search order hijacking to automatically inject specified DLLs.

?? Koaloader ?? A collection of DLLs that use search order hijacking to automatically inject specified DLLs. ?? Usage Simply place one of the proxy dl

null 28 Jun 25, 2022
A ZX Spectrum-like library built for "dos-like" by Mattias Gustavsson.

ZX-Like A ZX Spectrum-like library built for "dos-like" by Mattias Gustavsson. It allows for the creation of ZX Spectrum like screens for demos, games

Oli Wilkinson 3 Oct 20, 2021
This is a tool for software engineers to view,record and analyse data(sensor data and module data) In the process of software development.

![Contributors][Huang Jianyu] Statement 由于工具源码在网上公开,除使用部分开源项目代码外,其余代码均来自我个人,工具本身不包含公司的知识产权,所有与公司有关的内容均从软件包中移除,软件发布遵循Apache协议,任何人均可下载进行修改使用,如使用过程中出现任何问

HuangJianyu 34 May 5, 2022
Vimb - the vim like browser is a webkit based web browser that behaves like the vimperator plugin for the firefox and usage paradigms from the great editor vim.

Vimb - the vim like browser is a webkit based web browser that behaves like the vimperator plugin for the firefox and usage paradigms from the great editor vim. The goal of vimb is to build a completely keyboard-driven, efficient and pleasurable browsing-experience.

Daniel Carl 1.1k Jun 19, 2022
Itpp - IT++ library mirror/fork. C++ library of mathematical, signal processing and communication classes and functions.

Introduction ************ IT++ is a C++ library of mathematical, signal processing and communication classes and functions. Its main use is in simula

null 18 Apr 9, 2022
Samir Teymurov 1 Oct 6, 2021