Velox is a new C++ vectorized database acceleration library aimed to optimizing query engines and data processing systems.

Related tags

Database velox
Overview

Velox logo

Velox is a C++ database acceleration library which provides reusable, extensible, and high-performance data processing components. These components can be reused to build compute engines focused on different analytical workloads, including batch, interactive, stream processing, and AI/ML. Velox was created by Facebook and it is currently developed in partnership with Intel, ByteDance, and Ahana.

In common usage scenarios, Velox takes a fully optimized query plan as input and performs the described computation. Considering Velox does not provide a SQL parser, a dataframe layer, or a query optimizer, it is usually not meant to be used directly by end-users; rather, it is mostly used by developers integrating and optimizing their compute engines.

Velox provides the following high-level components:

  • Type: a generic typing system that supports scalar, complex, and nested types, such as structs, maps, arrays, tensors, etc.
  • Vector: an Arrow-compatible columnar memory layout module, which provides multiple encodings, such as Flat, Dictionary, Constant, Sequence/RLE, and Bias, in addition to a lazy materialization pattern and support for out-of-order writes.
  • Expression Eval: a fully vectorized expression evaluation engine that allows expressions to be efficiently executed on top of Vector/Arrow encoded data.
  • Function Packages: sets of vectorized function implementations following the Presto and Spark semantic.
  • Operators: implementation of common data processing operators such as scans, projection, filtering, groupBy, orderBy, shuffle, hash join, unnest, and more.
  • I/O: a generic connector interface that allows different file formats (ORC/DWRF and Parquet) and storage adapters (S3, HDFS, local files) to be used.
  • Network Serializers: an interface where different wire protocols can be implemented, used for network communication, supporting PrestoPage and Spark's UnsafeRow.
  • Resource Management: a collection of primitives for handling computational resources, such as memory arenas and buffer management, tasks, drivers, and thread pools for CPU and thread execution, spilling, and caching.

Velox is extensible and allows developers to define their own engine-specific specializations, including:

  1. Custom types
  2. Simple and vectorized functions
  3. Aggregate functions
  4. Operators
  5. File formats
  6. Storage adapters
  7. Network serializers

Examples

Examples of extensibility and integration with different component APIs can be found here

Documentation

Developer guides detailing many aspects of the library, in addition to the list of available functions can be found here.

Getting Started

We provide scripts to help developers setup and install Velox dependencies.

Setting up on macOS

See scripts/setup-macos.sh

Setting up on Linux (Ubuntu 20.04 or later)

See scripts/setup-ubuntu.sh

Building Velox

Run make in the root directory to compile the sources. For development, use make debug to build a non-optimized debug version, or make release to build an optimized version. Use make unittest to build and run tests.

Contributing

Check our contributing guide to learn about how to contribute to the project.

Community

The main communication channel with the Velox OSS community is through the the Velox-OSS Slack workspace. Please don't hesitate in reaching out.

License

Velox is licensed under the Apache 2.0 License. A copy of the license can be found here.

Issues
  • Add hdfs read support

    Add hdfs read support

    Add Hdfs read support. The design is similar to the s3 connector. There is an HdfsFileSystem class that creates the client and vends HdfsReadFiles.

    HdfsReadFile is in charge of returning data from the file via the pread methods. On each pread HdfsReadFile creates an hdfsFile and reads the requested data, after the read is finished the file is closed. Since each pread creates its own file object, the pread methods are thread safe.

    To test this I depend on the hadoop executable that has a local mini cluster implementation. For my tests I start a mini cluster and read from it.

    CLA Signed 
    opened by Arty-Maly 43
  • How connect presto with velox?

    How connect presto with velox?

    We want to try velox with presto, it seems the todo includes

    1. define protocol between presto and velox, build a binary with velox, and list on a port x
    2. presto connect port x and send plan tree, receive the result

    is it right ? thanks.

    opened by RaymondChina 33
  • Build abstract and portable SIMD using xsimd

    Build abstract and portable SIMD using xsimd

    Summary: This change replaces all Intel intrinsics with portable xsimd wrappers. It only runs on AVX2 chips for now. Subsequent changes will make it portable to SSE4.2 and NEON.

    Differential Revision: D35579863

    CLA Signed fb-exported 
    opened by Yuhta 32
  • Add TPC-H query 18 to ParquetTpchTest

    Add TPC-H query 18 to ParquetTpchTest

    I have modified the existing TPC-H queries 1 and 6 slightly to accommodate adding more queries easily.

    I would like add more TPC-H queries, but I first wanted to get the framework changes in place a thumbs up before adding more queries.

    CLA Signed 
    opened by 9prady9 25
  • Fixing Unsaferow Deserializer for Arrays BFS traversal

    Fixing Unsaferow Deserializer for Arrays BFS traversal

    Summary: The diff fixes a bug associated with Unsaferow deserializer BFS traversal for Arrays. The top level ordering of arrays and their contents was not correct.

    Differential Revision: D32452803

    CLA Signed fb-exported Merged 
    opened by beroyfb 25
  • Initial ArrayProxy implementation.

    Initial ArrayProxy implementation.

    Add initial basic implementation of ArrayProxy.

    • Tested for primitives.

    • The current proxy does not replace old output types yet, instead VectorWriter <ArrayProxyT> is created, eventuallyVectorWriter<ArrayProxyT>will replaceVectorWriter<Array>once all types are converted to use proxies, and all functionalities of arrayProxy are tested and completed.

    • Note: The VectorWriter protocol is not changed which is:

    1. call init() on the vectorWriter with the results vectors. then for each row:
    2. setOffset()
    3. pass current() to simple function.
    4. call commit(bool isNull) .
    • The ArrayProxy utilizes the same protocol to write to children. Note that this protocol allows for out of order writes, but arrayProxy writes things in order.

    • Array proxy has the following general interface: 1 - addItem : add not null item , returns reference to the inner item proxy. 2 - addNull: add null item.

    For primitives the following are enabled:

    • push_back
    • resize
    • operator[]
    proxy.resize(10); 
    proxy[0]=1;
    proxy[1] = std::nullopt;
    ..etc
    

    Example of function using array proxy:

    template <typename T>
    struct SimpleFunctionArrayProxyPushBack {
      bool call(exec::ArrayProxy<int64_t>& out, const int64_t& n) {
        for (int i = 0; i < n; i++) {
          if (WITH_NULLS && i % 5) {
            out.push_back(std::nullopt);
          } else {
            out.push_back(i);
          }
        }
        return true;
      }
    };
    registerFunction<
            SimpleFunctionArrayProxyPushBack,
            ArrayProxyT<int64_t>,
            int64_t>({"simple_proxy_resize"});
    
    CLA Signed 
    opened by laithsakka 24
  • re-organize presto function registration

    re-organize presto function registration

    • Split functions to semantic type-based category: Deprecate registerVectorFunctions() and registerSimpleFunctions() and instead introduce registerStringFunctions(), registerMapFunctions()...etc.

    • This split also reduces the compile time by distributing the instantiations of the simple functions across multiple compilation units.

    CLA Signed Reverted 
    opened by laithsakka 21
  • Introduce ArrayView type for simple functions

    Introduce ArrayView type for simple functions

    This is continuous and extension work of https://github.com/facebookincubator/velox/pull/493

    ArrayView

    Introduce and use ArrayView as an input type for simple functions for ArrayType. ArrayView is a lightweight lazy access abstraction for the underlying array with the following interface.

    • size()
    • operator [ ], at returns an ArrayView::Element

    Element is an abstract over the value and nullability of the array at a specific index, it have an interface similar to std::optional:

    • value() // for now unchecked access, (shall we make it checked?) i.e add a dynamic check that value is not null
    • operator * // unchecked access
    • has_value()
    • implicit bool cast => if(v) == if(v.has_value())

    ArrayView provides an iterator interface to iterate over the elements

    i.e:

    foo( ArrayView<T> inputs){
      for(const auto & element: inputs){
          if(element.has_value()){
             bar(*element); 
           }
      }
    }
    

    or

    foo( ArrayView<T> inputs){
      for(int i=0; i<inputs.size(); i++){
          if(inputs[i].has_value()){
             bar(*inputs[i]); 
           }
      }
    }
    

    or

    foo( ArrayView<T> inputs){
     auto it = inputs.begin(); 
     while(it!=inputs.end()){
        if(*it.has_value()){
                 bar(**it); 
        }
        ++it;
     }
    }
    

    Performance:

    this diff speedup simple functions by 2X, simple functions are not yet as fast as optimized vector functions, but they are almost as fast as vector functions that do not have special fast paths based on dynamic info not available to simple functions.

    1) min function

    VectorMinIntegerBasic is added and it computes min with out any vector-function privileged optimizations It is added as a comparison point to observe the overhead (if any) of the argument dispatch in the vector reader. the performance of vectorMinIntegerBasic and simpleMinIntegerIterator are similar

    ============================================================================
    fastMinInteger                                             488.62us    2.05K
    vectorMinInteger                                  84.74%   576.58us    1.73K
    vectorMinIntegerNoFastPath                        58.81%   830.82us    1.20K
    simpleMinIntegerIterator                          58.75%   831.76us    1.20K
    simpleMinInteger                                  56.18%   869.66us    1.15K
    ============================================================================
    

    before and after this diff

    the simple function min implementation did not support null reading: so here we compare non-nulls min before: 1.58ms after: 803.24us

    2) compare function

    • with current vector function
    with this diff
    ============================================================================
    vectorSimpleFunction                                         1.04ms   961.28
    vectorFunctionInteger                            202.30%   514.24us    1.94K
    ============================================================================
    
    before this diff
    ============================================================================
    SimpleFunction                                         2.11ms   473.00
    vectorFunction                            364.69%   579.71us    1.72K
    ============================================================================
    
    • with the fast path under those conditions disabled in the vector function: (elementsDecoded.isIdentityMapping() && !elementsDecoded.mayHaveNulls() && searchDecoded.isConstantMapping())
    with this diff
    ============================================================================
    vectorSimpleFunction                                         1.03ms   974.86
    vectorFunctionInteger                            128.00%   801.37us    1.25K
    ============================================================================
    
    before this diff
    ============================================================================
    SimpleFunction                                         1.91ms   523.35
    vectorFunction                            233.55%   818.14us    1.22K
    

    Test

    • add ArrayViewTest
    • simpleFunctionTest already covers cases of nested complex types.

    Next

    • introduce View types for row, map .
    • introduce 0-copy write proxies for complex types.
    CLA Signed Merged 
    opened by laithsakka 21
  • Add spilling to HashAggregation

    Add spilling to HashAggregation

    Add spilling to HashAgrregation

    Spilling of aggregation works by spilling sorted runs of accumulators. These are merged at output time. No hash table is needed at output time since we are merging sorted streams and will see all accumulators of a particular key before we see accumulators of the next key. Spilling writes out accumulators for a fraction of the hash key space of the group by. This creates space in the group by state and thus can receive more keys without increasing memory utilization.

    • A method for shrinking the memory footprint will be added later after the Task memory arbitration strategy is added.

    • Adds basic spilling logic to GroupingSet. Before processing a batch of input, checks that this will fit. The check considers hash table growth, available space in the table's RowContainer, available space in the reservation of the operator's MemoryUsageTracker etc. If there is a possibility of the input not fitting, this looks at the parent trackers to see if it is possible to extend the reservation. If so, it tries to extend. If growth is not possible, GroupingSet spills part of the data, creating empty space in the container.

    • In getOutput, if there has been spilling, frees the HashTable but retains the RowContainer. first returns the state from non-spilled partitions. For partitins that have spilled, creates a merge RowContainer into which this reads batches of accumulators. The merge uses a TreeOfLosers that tells te caller when the last row of a particular key is received, thus the spill reader does not have to compare keys.

    • Adds a config for spilling path to QueryCtx.

    • Adds a testing device for forcing a percentage of GroupingSet input batches to spill regardless of memory.

    • Adds an optional spill executor to QueryCtx. This is used to parallelize outgoing writes and possible background double buffering for spill reading.

    • Exposes intermediate types of Aggregates. This is needed to spill a single aggregation where the input type is raw but the spill type is the intermediate type.

    • Adds a forced spilling case to AggregationTestBase.

    CLA Signed 
    opened by oerling 20
  • Add spilling to RowContainer

    Add spilling to RowContainer

    Add a Spiller class to manage spilling rows from a RowContainer.

    The spiller groups rows into a power of two number of partitions based on hash of the keys. One or more of these partitions can be spillable. More partitions become spillable as memory pressure demands.

    Spilling trims the number of rows and the size of variable length data in the RowContainer. This allows inserting more data in the freed space but does not release the memory. Spilling with a target size of 0 writes out everything and releases the memory, leaving the RowContainer in a post-construction state.

    Spilling picks rows that fall in the same hash partition and sorts them by their key columns. The runs are ideally at least a target file size bytes in size. The runs go into one spill file. The file internally consists of relatively short vectors, e.g. 1MB or 128 rows, whichever comes first.

    The data in a RowContainer can be read back partition by partition, merging the rows in the container with spilled rows.

    Spilling can write multiple partitions in parallel. This is useful when needing to rapidly free up space under memory contention. This also minimizes the stopping wall time for consuming data in a spilling operator.

    CLA Signed 
    opened by oerling 19
  • Skip inMap indices in subcolumn readers

    Skip inMap indices in subcolumn readers

    Summary: With recent enablement of dictionary encoding in flatmaps, we also enabled string dictionary encoding, which exposed an issue that we write the inMap indices at the front of all indices, but don't skip them when trying to read stride dictionary metadata in the index.

    This diff passes additional info from FlatMapColumnReader into (subcolumn) value readers to help

    Differential Revision: D31479861

    CLA Signed fb-exported 
    opened by HuamengJiang 18
  • Add additional info to MEM_CAP_EXCEEDED error

    Add additional info to MEM_CAP_EXCEEDED error

    This patch adds a mechanism to the MemoryUsageTracker class that allows its user to register a callback to add additional info to a MEM_CAP_EXCEEDED error when encountered during a call to increase reservation. Moreover, this mechanism is leveraged to add details at the Task and Operator level in the memory tracker hierarchy.

    Testing: Added tests to force this error and verify details added to the error message.

    CLA Signed 
    opened by bikramSingh91 2
  • Fixed DateTimeFormatter format function logic and registered format_datetime Velox function

    Fixed DateTimeFormatter format function logic and registered format_datetime Velox function

    Summary: Updated logic in DateTimeFormatter format function so that year 0 is BC and CENTURY_OF_ERA token format supports negative years. Registered format_datetime Velox function and wrote tests to verify correctness.

    Differential Revision: D37259813

    CLA Signed fb-exported 
    opened by ConnorDevlin2000 2
  • Add aggregate function last(expr, [,ignoreNull])

    Add aggregate function last(expr, [,ignoreNull])

    Summary: Implement last aggregate function last(expr[, ignoreNull]) Returns the last value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values

    Reference: https://spark.apache.org/docs/latest/api/sql/#last

    Differential Revision: D37360558

    CLA Signed fb-exported 
    opened by nemausus 1
Owner
Facebook Incubator
We work hard to contribute our work back to the web, mobile, big data, & infrastructure communities. NB: members must have two-factor auth.
Facebook Incubator
A very fast lightweight embedded database engine with a built-in query language.

upscaledb 2.2.1 Fr 10. Mär 21:33:03 CET 2017 (C) Christoph Rupp, [email protected]; http://www.upscaledb.com This is t

Christoph Rupp 531 Jun 20, 2022
The database built for IoT streaming data storage and real-time stream processing.

The database built for IoT streaming data storage and real-time stream processing.

HStreamDB 501 Jun 29, 2022
MySQL Server, the world's most popular open source database, and MySQL Cluster, a real-time, open source transactional database.

Copyright (c) 2000, 2021, Oracle and/or its affiliates. This is a release of MySQL, an SQL database server. License information can be found in the

MySQL 8k Jul 4, 2022
A mini database for learning database

A mini database for learning database

Chuckie Tan 3 Nov 3, 2021
GridDB is a next-generation open source database that makes time series IoT and big data fast,and easy.

Overview GridDB is Database for IoT with both NoSQL interface and SQL Interface. Please refer to GridDB Features Reference for functionality. This rep

GridDB 1.8k Jun 27, 2022
TimescaleDB is an open-source database designed to make SQL scalable for time-series data.

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.

Timescale 13.3k Jun 27, 2022
LogMessage is one of the output format of database incremental data

LogMessage LogMessage是一种数据库增量数据的输出格式,oceanbase的增量采集模块liboblog正是使用的这种消息格式来输出增量数据,LogMessage支持oceanbase中不同数据类型的增量数据的写入,具有序列化和反序列化的能力。 如何编译 LogMessage的编译

OceanBase 7 Jan 21, 2022
BerylDB is a data structure data manager that can be used to store data as key-value entries.

BerylDB is a data structure data manager that can be used to store data as key-value entries. The server allows channel subscription and is optimized to be used as a cache repository. Supported structures include lists, sets, and keys.

BerylDB 195 Jun 24, 2022
A friendly and lightweight C++ database library for MySQL, PostgreSQL, SQLite and ODBC.

QTL QTL is a C ++ library for accessing SQL databases and currently supports MySQL, SQLite, PostgreSQL and ODBC. QTL is a lightweight library that con

null 155 Jun 26, 2022
dqlite is a C library that implements an embeddable and replicated SQL database engine with high-availability and automatic failover

dqlite dqlite is a C library that implements an embeddable and replicated SQL database engine with high-availability and automatic failover. The acron

Canonical 3k Jun 27, 2022
Trilogy is a client library for MySQL-compatible database servers, designed for performance, flexibility, and ease of embedding.

Trilogy is a client library for MySQL-compatible database servers, designed for performance, flexibility, and ease of embedding.

GitHub 248 Jun 11, 2022
SOCI - The C++ Database Access Library

Originally, SOCI was developed by Maciej Sobczak at CERN as abstraction layer for Oracle, a Simple Oracle Call Interface. Later, several database backends have been developed for SOCI, thus the long name has lost its practicality. Currently, if you like, SOCI may stand for Simple Open (Database) Call Interface or something similar.

SOCI 1.1k Jun 25, 2022
The fastest database-library on Android OS.

Android SQLite3 NDK 封装 Demo下载 (操作:按钮新增 按钮查询 点按编辑 长按删除) 写在前面 sqlite3 开源、集成简单(现在的版本只有2个文件 sqlite3.h sqlite3.c) 这个库抽离自 Telegram 的开源代码、作者:DrKLO 我个人感觉 Tele

水银灯、 2 Dec 27, 2021
C++11 wrapper for the LMDB embedded B+ tree database library.

lmdb++: a C++11 wrapper for LMDB This is a comprehensive C++ wrapper for the LMDB embedded database library, offering both an error-checked procedural

D.R.Y. C++ 257 Jun 16, 2022
ObjectBox C and C++: super-fast database for objects and structs

ObjectBox Embedded Database for C and C++ ObjectBox is a superfast C and C++ database for embedded devices (mobile and IoT), desktop and server apps.

ObjectBox 131 Jun 17, 2022
ESE is an embedded / ISAM-based database engine, that provides rudimentary table and indexed access.

Extensible-Storage-Engine A Non-SQL Database Engine The Extensible Storage Engine (ESE) is one of those rare codebases having proven to have a more th

Microsoft 780 Jun 13, 2022
Nebula Graph is a distributed, fast open-source graph database featuring horizontal scalability and high availability

Nebula Graph is an open-source graph database capable of hosting super large scale graphs with dozens of billions of vertices (nodes) and trillions of edges, with milliseconds of latency.

vesoft inc. 807 Jun 30, 2022
OceanBase is an enterprise distributed relational database with high availability, high performance, horizontal scalability, and compatibility with SQL standards.

What is OceanBase database OceanBase Database is a native distributed relational database. It is developed entirely by Alibaba and Ant Group. OceanBas

OceanBase 4.4k Jun 27, 2022
Config and tools for config of tasmota devices from mysql database

tasmota-sql Tools for management of tasmota devices based on mysql. The tasconfig command can load config from tasmota and store in sql, or load from

RevK 3 Jan 8, 2022