Blazing fast, composable, Pythonic quantile filters.

Overview

Rolling Quantiles for NumPy

Python tests

Hyper-efficient and composable filters.

  • Simple, clean, intuitive interface.
  • Supports streaming data or bulk processing.
  • Python 3 bindings for a lean library written in pure C.

A Quick Tour

Let me give you but a superficial overview of this module's elegance.

import numpy as np
import rolling_quantiles as rq

pipe = rq.Pipeline( # rq.Pipeline is the only stateful object
  # declare a cascade of filters by a sequence of immutable description objects
  rq.LowPass(window=201, portion=100, subsample_rate=2),
    # the above takes a median (101th element out of 201) of the most recent 200
    # points and then spits out every other one
  rq.HighPass(window=10, portion=3)
    # that subsampled rolling median is then fed into this filter that takes a
    # 30% quantile on a window of size 10, and subtracts it from its raw input

# the pipeline exposes a set of read-only attributes that describe it
pipe.lag # = 60.0, the effective number of time units that the real-time output
         #   is delayed from the input
pipe.stride # = 2, how many inputs it takes to produce an output
            #  (>1 due to subsampling)


input = np.random.randn(1000)
output = pipe.feed(input) # the core, singular exposed method

# every other output will be a NaN to demarcate unready values
subsampled_output = output[1::pipe.stride]

Example Signal

That may be a lot to take in, so let me break it down for you:

  • rq.Pipeline(description...) constructs a filter pipeline from one or more filter descriptions and initializes internal state.
  • .feed(*) takes in a Python number or np.array and its output is shaped likewise.
  • The two filter types are rq.LowPass and rq.HighPass that compute rolling quantiles and return them as is, and subtract them from the raw signal respectively. Compose them however you like!
  • NaNs in the output purposefully indicate missing values, usually due to subsampling. If you pass a NaN into a LowPass filter, it will slowly deplete its reserve and continue to return valid quantiles until the window empties completely.
  • rq.LowPass and rq.HighPass alternatively take in a quantile=q argument, 0<=q<=1. The filters would perform a linear interpolation in this case. In order to control the statistical characteristics of this quantile estimate, parameters alpha and beta are exposed as well with default values (1, 1). Refer to SciPy's documentation for details on this aspect.

I also expose a convenience function rq.medfilt(signal, window_size) at the top-level of the package to directly supplant scipy.signal.medfilt.

That's it! I detailed the entire library. Don't let the size of its interface fool you!

Installation

Downloads

If you are running Linux, MacOS, or Windows with Python 3.8+ and NumPy ~1.20, execute the following:

pip install rolling-quantiles

These are the conditions under which binaries are built and sent to the Python Package Index, which holds pip's packages. Should the NumPy version be unsuitable, for instance, I suggest building the package from source. This is rather straightforward because the handful of source files in C have absolutely minimal dependencies.

Building from Source

The meat of this package is a handful of C files with no external dependencies, besides NumPy 1.16+ and Python 3.7+ for the bindings located in src/python.c. As such, you may build from source by running the following from the project's root directory:

  1. cd python
  2. Check pyproject.toml to make sure the listed NumPy version matches your desired target.
  3. python -m build (make sure this invokes Python 3)
  4. pip install dist/<name_of_generated_wheel_file>.whl

Note of Caution on MacOS Big Sur

Make sure to specify MACOSX_DEPLOYMENT_TARGET=10.X as a prefix to the build command, e.g. python -m build. The placeholder X can be any MacOS version earlier than Big Sur (I use 9.) By default, the build system would attempt to build for MacOS 11 that is incompatible with current Python interpreters that have been compiled against a prior version.

Benchmarking a median filter on 100 million doubles.

I make use of binary heaps that impart desirable guarantees on their amortized runtime. Realistically, their performance may depend on the statistics of the incoming signal. I pummeled the filters with Gaussian Brownian motion to gauge their practical usability under a typical drifting stochastic process.

window rolling_quantiles [1] scipy [2] pandas [3]
4 14 seconds 22 seconds 25 seconds
10 21 seconds 47 seconds 31 seconds
20 28 seconds 95 seconds 35 seconds
30 30 seconds 140 seconds 37 seconds
40 34 seconds 190 seconds 40 seconds
50 36 seconds 242 seconds 40 seconds
1,000 61 seconds N/A 62 seconds

Likewise, with simulated Gaussian white noise (no drift in the signal):

window rolling_quantiles [1] scipy [2] pandas [3]
4 14 seconds 22 seconds 25 seconds
10 20 seconds 51 seconds 31 seconds
20 25 seconds 105 seconds 36 seconds
30 27 seconds 156 seconds 39 seconds
40 30 seconds 218 seconds 41 seconds
50 30 seconds 279 seconds 42 seconds
1,000 45 seconds N/A 70 seconds

Intel(R) Core(TM) i7-8700T CPU @ 2.40GHz, single-threaded performance on Linux. My algorithm looked even better (relative to pandas) on a 2020 MacBook Pro. Check out this StackOverflow answer for a particular use case.

[1] rq.Pipeline(...)

[2] scipy.signal.medfilt(...)

[3] pd.Series.rolling(*).quantile(...)

Brought to you by Myrl

You might also like...
ThunderSVM: A Fast SVM Library on GPUs and CPUs
ThunderSVM: A Fast SVM Library on GPUs and CPUs

What's new We have recently released ThunderGBM, a fast GBDT and Random Forest library on GPUs. add scikit-learn interface, see here Overview The miss

Fast, differentiable sorting and ranking in PyTorch
Fast, differentiable sorting and ranking in PyTorch

Torchsort Fast, differentiable sorting and ranking in PyTorch. Pure PyTorch implementation of Fast Differentiable Sorting and Ranking (Blondel et al.)

The official implementation of our CVPR 2021 paper - Hybrid Rotation Averaging: A Fast and Robust Rotation Averaging Approach

Graph Optimizer This repo contains the official implementation of our CVPR 2021 paper - Hybrid Rotation Averaging: A Fast and Robust Rotation Averagin

Unofficial third-party implementation of FFD (fast feature detector) published in IEEE TIP 2020.

fast_feature_detector Unofficial third-party implementation of FFD (fast feature detector) published in IEEE TIP 2020. Caution I have not got any perm

SOINN / 聚类 / 无监督聚类 / 快速 / clustering / unsupervised clustering / fast
SOINN / 聚类 / 无监督聚类 / 快速 / clustering / unsupervised clustering / fast

____ ___ ___ _ _ _ _ / ___| / _ \_ _| \ | | \ | | \___ \| | | | || \| | \| | ___) | |_| | || |\ | |\ | |____/ \___/___|_| \_|_| \_| SOIN

CTranslate2 is a fast inference engine for OpenNMT-py and OpenNMT-tf models supporting both CPU and GPU executio

CTranslate2 is a fast inference engine for OpenNMT-py and OpenNMT-tf models supporting both CPU and GPU execution. The goal is to provide comprehensive inference features and be the most efficient and cost-effective solution to deploy standard neural machine translation systems such as Transformer models.

fast face classification
fast face classification

Fast Face Classification (F²C)—— An Efficient Training Approach for Very Large Scale Face Recognition Training on ultra-large-scale datasets is time-c

FG-Net: Fast Large-Scale LiDAR Point Clouds Understanding Network Leveraging Correlated Feature Mining and Geometric-Aware Modelling
FG-Net: Fast Large-Scale LiDAR Point Clouds Understanding Network Leveraging Correlated Feature Mining and Geometric-Aware Modelling

FG-Net: Fast Large-Scale LiDAR Point Clouds Understanding Network Leveraging Correlated Feature Mining and Geometric-Aware Modelling Comparisons of Running Time of Our Method with SOTA methods RandLA and KPConv:

Header-only C++/python library for fast approximate nearest neighbors

Hnswlib - fast approximate nearest neighbor search Header-only C++ HNSW implementation with python bindings. NEWS: Hnswlib is now 0.5.2. Bugfixes - th

Comments
  • Impossible to install with python 3.9.X

    Impossible to install with python 3.9.X

    Context: python 3.9.12 Ubuntu 22 pip 22.1.1

    This packages is impossible to install using pip on python 3.9.X:

    pip install rolling-quantiles
    ERROR: Could not find a version that satisfies the requirement rolling-quantiles (from versions: none)
    ERROR: No matching distribution found for rolling-quantiles
    

    Installation using setup.py fails

    
     python rolling-quantiles-1.0.0/python/setup.py install
    running install
    running bdist_egg
    running egg_info
    creating UNKNOWN.egg-info
    writing UNKNOWN.egg-info/PKG-INFO
    writing dependency_links to UNKNOWN.egg-info/dependency_links.txt
    writing top-level names to UNKNOWN.egg-info/top_level.txt
    writing manifest file 'UNKNOWN.egg-info/SOURCES.txt'
    reading manifest file 'UNKNOWN.egg-info/SOURCES.txt'
    writing manifest file 'UNKNOWN.egg-info/SOURCES.txt'
    installing library code to build/bdist.linux-x86_64/egg
    running install_lib
    running build_ext
    building 'triton' extension
    Warning: Can't read registry to find the necessary compiler setting
    Make sure that Python modules winreg, win32api or win32con are installed.
    INFO: C compiler: gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC
    
    creating build
    creating build/temp.linux-x86_64-3.9
    creating build/temp.linux-x86_64-3.9/src
    INFO: compile options: '-I/home/andrei/.pyenv/versions/3.9.12/envs/test-lev/lib/python3.9/site-packages/numpy/core/include -I/home/andrei/.pyenv/versions/3.9.12/envs/test-lev/include -I/home/andrei/.pyenv/versions/3.9.12/include/python3.9 -c'
    extra options: '-O3'
    INFO: gcc: src/filter.c
    INFO: gcc: src/heap.c
    INFO: gcc: src/python.c
    INFO: gcc: src/quantile.c
    cc1: fatal error: src/filter.c: No such file or directory
    compilation terminated.
    cc1: fatal error: src/heap.c: No such file or directory
    compilation terminated.
    cc1: fatal error: src/python.c: No such file or directory
    compilation terminated.
    cc1: fatal error: src/quantile.c: No such file or directory
    compilation terminated.
    error: Command "gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/home/andrei/.pyenv/versions/3.9.12/envs/test-lev/lib/python3.9/site-packages/numpy/core/include -I/home/andrei/.pyenv/versions/3.9.12/envs/test-lev/include -I/home/andrei/.pyenv/versions/3.9.12/include/python3.9 -c src/filter.c -o build/temp.linux-x86_64-3.9/src/filter.o -O3" failed with exit status 1
    
    

    Installation using Building from Source section also fails:

    pip install build
    Collecting build
      Using cached build-0.8.0-py3-none-any.whl (17 kB)
    Collecting pep517>=0.9.1
      Using cached pep517-0.12.0-py2.py3-none-any.whl (19 kB)
    Collecting tomli>=1.0.0
      Using cached tomli-2.0.1-py3-none-any.whl (12 kB)
    Collecting packaging>=19.0
      Using cached packaging-21.3-py3-none-any.whl (40 kB)
    Collecting pyparsing!=3.0.5,>=2.0.2
      Using cached pyparsing-3.0.9-py3-none-any.whl (98 kB)
    Installing collected packages: tomli, pyparsing, pep517, packaging, build
    Successfully installed build-0.8.0 packaging-21.3 pep517-0.12.0 pyparsing-3.0.9 tomli-2.0.1
    (test-lev) [email protected]:~/worspace/test-LEV/rolling-quantiles-1.0.0/python$ python -m build
    * Creating venv isolated environment...
    * Installing packages in isolated environment... (numpy~=1.20, setuptools>=54, wheel)
    * Getting dependencies for sdist...
    * Installing packages in isolated environment... (numpy ~= 1.20)
    * Building sdist...
    running sdist
    running egg_info
    creating rolling_quantiles.egg-info
    writing rolling_quantiles.egg-info/PKG-INFO
    writing dependency_links to rolling_quantiles.egg-info/dependency_links.txt
    writing requirements to rolling_quantiles.egg-info/requires.txt
    writing top-level names to rolling_quantiles.egg-info/top_level.txt
    writing manifest file 'rolling_quantiles.egg-info/SOURCES.txt'
    reading manifest file 'rolling_quantiles.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    warning: no previously-included files matching '.DS_Store' found anywhere in distribution
    warning: no previously-included files matching 'pypi-token.txt' found anywhere in distribution
    adding license file '../LICENSE'
    writing manifest file 'rolling_quantiles.egg-info/SOURCES.txt'
    running check
    creating rolling_quantiles-1.0.0
    creating rolling_quantiles-1.0.0/rolling_quantiles
    creating rolling_quantiles-1.0.0/rolling_quantiles.egg-info
    creating rolling_quantiles-1.0.0/src
    copying files to rolling_quantiles-1.0.0...
    copying MANIFEST.in -> rolling_quantiles-1.0.0
    copying README.md -> rolling_quantiles-1.0.0
    copying pyproject.toml -> rolling_quantiles-1.0.0
    copying setup.cfg -> rolling_quantiles-1.0.0
    copying setup.py -> rolling_quantiles-1.0.0
    copying ../LICENSE -> rolling_quantiles-1.0.0/..
    copying rolling_quantiles/__init__.py -> rolling_quantiles-1.0.0/rolling_quantiles
    copying rolling_quantiles.egg-info/PKG-INFO -> rolling_quantiles-1.0.0/rolling_quantiles.egg-info
    copying rolling_quantiles.egg-info/SOURCES.txt -> rolling_quantiles-1.0.0/rolling_quantiles.egg-info
    copying rolling_quantiles.egg-info/dependency_links.txt -> rolling_quantiles-1.0.0/rolling_quantiles.egg-info
    copying rolling_quantiles.egg-info/requires.txt -> rolling_quantiles-1.0.0/rolling_quantiles.egg-info
    copying rolling_quantiles.egg-info/top_level.txt -> rolling_quantiles-1.0.0/rolling_quantiles.egg-info
    copying rolling_quantiles.egg-info/zip-safe -> rolling_quantiles-1.0.0/rolling_quantiles.egg-info
    copying src/filter.c -> rolling_quantiles-1.0.0/src
    copying src/heap.c -> rolling_quantiles-1.0.0/src
    copying src/python.c -> rolling_quantiles-1.0.0/src
    copying src/quantile.c -> rolling_quantiles-1.0.0/src
    Writing rolling_quantiles-1.0.0/setup.cfg
    Creating tar archive
    removing 'rolling_quantiles-1.0.0' (and everything under it)
    * Building wheel from sdist
    * Creating venv isolated environment...
    * Installing packages in isolated environment... (numpy~=1.20, setuptools>=54, wheel)
    * Getting dependencies for wheel...
    * Installing packages in isolated environment... (numpy ~= 1.20, wheel)
    * Building wheel...
    running bdist_wheel
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-cpython-39
    creating build/lib.linux-x86_64-cpython-39/rolling_quantiles
    copying rolling_quantiles/__init__.py -> build/lib.linux-x86_64-cpython-39/rolling_quantiles
    running build_ext
    building 'triton' extension
    Warning: Can't read registry to find the necessary compiler setting
    Make sure that Python modules winreg, win32api or win32con are installed.
    INFO: C compiler: gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC
    
    creating build/temp.linux-x86_64-cpython-39
    creating build/temp.linux-x86_64-cpython-39/src
    INFO: compile options: '-I/tmp/build-env-sv7dbz4o/lib/python3.9/site-packages/numpy/core/include -I/tmp/build-env-sv7dbz4o/include -I/home/andrei/.pyenv/versions/3.9.12/include/python3.9 -c'
    extra options: '-O3'
    INFO: gcc: src/filter.c
    INFO: gcc: src/heap.c
    INFO: gcc: src/python.c
    INFO: gcc: src/quantile.c
    src/heap.c:17:10: fatal error: heap.h: No such file or directory
       17 | #include "heap.h"
          |          ^~~~~~~~
    compilation terminated.
    src/filter.c:17:10: fatal error: filter.h: No such file or directory
       17 | #include "filter.h"
          |          ^~~~~~~~~~
    compilation terminated.
    src/quantile.c:17:10: fatal error: quantile.h: No such file or directory
       17 | #include "quantile.h"
          |          ^~~~~~~~~~~~
    compilation terminated.
    src/python.c:24:10: fatal error: filter.h: No such file or directory
       24 | #include "filter.h"
          |          ^~~~~~~~~~
    compilation terminated.
    error: Command "gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/tmp/build-env-sv7dbz4o/lib/python3.9/site-packages/numpy/core/include -I/tmp/build-env-sv7dbz4o/include -I/home/andrei/.pyenv/versions/3.9.12/include/python3.9 -c src/filter.c -o build/temp.linux-x86_64-cpython-39/src/filter.o -O3" failed with exit status 1
    
    ERROR Backend subprocess exited when trying to invoke build_wheel
    
    
    bug 
    opened by andrewshkovskii 4
  • Feature request: Centered roll

    Feature request: Centered roll

    As is, the roll is left-aligned, or 'real-time'. This is great for some uses, but it would be great if there was an option to have a 'centered' option such that the edges are rolled all the way to the end (decreasing number of elements in sliding window towards the edges). Such options exist in rolling stats packages like pandas.rolling(center=True, min_periods=1).quantile(q). I'm aware that a centered method can be achieved by effectively lagging the output by pipe.lag. However, it is not possible to recover the right edge of the output. To be clear, when you compare the output traces from these two functions, you can clearly see how rolling_quantiles is not able to achieve the same kind of output.

    import numpy as np
    import pandas as pd
    import rolling_quantiles as rq
    import matplotlib.pyplot as plt
    
    import multiprocessing as mp
    from functools import partial
    
    def rp_pandas(x_in, win_len, ptile=10):
        x_in_df = pd.Series(x_in)
        x_out_pd = x_in_df.rolling(window=win_len, center=True, min_periods=1).quantile(quantile=(ptile/100))
        return np.array(x_out_pd)
    
    def rp_rq_centered(x_in, win_len, ptile=10):
        pipe = rq.Pipeline( rq.LowPass(window=win_len, quantile=(ptile/100)) )
        lag = int(np.floor(pipe.lag))
        return pipe.feed(x_in)[lag:]
    
    def mp_rolling_quantile(x_in, win_len, ptile, function):
        pool = mp.Pool(processes=None)
        results = pool.map(partial(function , win_len=win_len, ptile=ptile), [x_in[ii] for ii in np.arange(x_in.shape[0])])
        pool.close()
        pool.join()
        return np.row_stack(results)
    
    x_in = np.random.rand(100, 100)
    win_len = 11
    ptile = 50
    
    
    output_pd = mp_rolling_quantile(x_in, win_len, ptile, rp_pandas)
    output_rq = mp_rolling_quantile(x_in, win_len, ptile, rp_rq_centered)
    
    %matplotlib notebook
    plt.figure()
    plt.plot(x_in[0])
    plt.plot(output_pd[0][0:], linewidth=2)
    plt.plot(output_rq[0][0:], linewidth=2)
    plt.legend(['x_in', 'pd: centered=True, min_periods=1', 'rq: centered'])
    

    image

    enhancement 
    opened by RichieHakim 0
  • Feature request: parallel processing implementation

    Feature request: parallel processing implementation

    Hi, love the package. Thank you for making it.

    I noticed that there is a block of code in the 'illustrations.py' file (here) demonstrating some code that hasn't been implemented yet in the code base:

    rq.LineUp(rq.Pipeline) # possibly parallelized execution of parallel pipelines
    big_input = np.random.randn(100, 1000)
    # broadcast. route one row to each pipeline.
    big_output = pipes.feed(big_input) # respects Fortran or C ordering to preserve cache locality
    

    I would greatly benefit from being able to parallelize these computations. Multithreading and numba don't seem to improve speed through their parallelization methods. A CPU process monitor shows that most CPU cores are not being utilized, and there are expected errors in attempting to parallelize the rq.Pipeline object. Adding the LineUp feature to your package would be greatly appreciated.

    Thanks, Rich

    enhancement 
    opened by RichieHakim 2
Releases(v1.1.0)
  • v1.1.0(Jun 2, 2022)

  • 1.0.0(Apr 14, 2021)

  • v0.2.1(Mar 14, 2021)

  • v0.2.0(Mar 5, 2021)

    Extended the interface to enable linear interpolations between the discrete values approximating a true quantile fraction. The estimation characteristics may be controlled via alpha and beta parameters, each between zero and one.

    Binaries available via pip install rolling-quantiles for Linux, MacOS, and Windows running Python 3.8+ and NumPy 1.20.

    Source code(tar.gz)
    Source code(zip)
Owner
Myrl Marmarelis
Myrl Marmarelis
functorch is a prototype of JAX-like composable function transforms for PyTorch.

functorch Why functorch? | Install guide | Transformations | Future Plans functorch is a prototype of JAX-like composable FUNCtion transforms for pyTO

Facebook Research 1.2k Dec 27, 2022
functorch is a prototype of JAX-like composable function transforms for PyTorch.

functorch Why functorch? | Install guide | Transformations | Documentation | Future Plans This library is currently under heavy development - if you h

null 1.2k Dec 27, 2022
Fast and robust certifiable relative pose estimation

Fast and Robust Relative Pose Estimation for Calibrated Cameras This repository contains the code for the relative pose estimation between two central

null 42 Dec 6, 2022
A light and fast internet speed plugin(DDE).

lfxNet English | 简体中文 | 繁體中文 lfxNet 是一款轻量、快速的实时显示系统资源信息的应用程序。 目录 背景 编译 下载 作者 鸣谢 协议 背景 喜爱 DDE ,为 Deepin 爱好者、也是开发者之一。因习惯其它系统上有一个任务栏网速插件,但在 Deepin/UOS上没有

偕臧 57 Dec 19, 2022
fast zksnark prover

rapidsnark rapid snark is a zkSnark proof generation written in C++ and intel assembly. That generates proofs created in circom and snarkjs very fast.

iden3 78 Jan 3, 2023
Caffe: a fast open framework for deep learning.

Caffe Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR)/The Berke

Berkeley Vision and Learning Center 33k Dec 30, 2022
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree

CatBoost 6.9k Dec 31, 2022
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

Microsoft 14.5k Jan 5, 2023
Fast parallel CTC.

In Chinese 中文版 warp-ctc A fast parallel implementation of CTC, on both CPU and GPU. Introduction Connectionist Temporal Classification is a loss funct

Baidu Research 4k Dec 26, 2022
ThunderGBM: Fast GBDTs and Random Forests on GPUs

Documentations | Installation | Parameters | Python (scikit-learn) interface What's new? ThunderGBM won 2019 Best Paper Award from IEEE Transactions o

Xtra Computing Group 648 Dec 16, 2022