HybridSE (Hybrid SQL Engine) is an LLVM-based, hybrid-execution and high-performance SQL engine

Overview

Introduction

HybridSE (Hybrid SQL Engine) is an LLVM-based, hybrid-execution and high-performance SQL engine. It can provide fast and consistent execution on heterogeneous SQL data systems, e.g., OLAD database, HTAP system, SparkSQL, and Flink Stream SQL.

HybridSE is a modularized SQL compiler and executor with the following features:

  • SQL syntax validation
  • Logical plan generation and optimization
  • Expression optimization
  • Online/Offline physical plan generation
  • Native code generation
  • Cluster/Standalone runner

By leveraging HybridSE, developers can implement a high-performance SQL database with ease, or improve the performance of an existing offline SQL execution engine. Compared with the built-in SQL engines of systems such as MySQL and SparkSQL, HybridSE offers better performance. It is also designed for AI scenarios, with grammar extensions and optimizations that make it more like a modern SQL engine.

HybridSE has the following characteristics:

  • High Performance

    Leveraging the power of LLVM JIT, HybridSE can generate binary code dynamically for different hardware environments. It also has dozens of built-in plan passes and more flexible memory management, which together ensure high performance.

  • Great Scalability

    Thanks to its modularized design, HybridSE can generate logical and physical plans for different stages. With SDKs for multiple languages, HybridSE can be used for SQL optimization regardless of whether the system is a realtime OLAD database, a distributed OLAP system, or a stream SQL engine.

  • Machine Learning Aimed Optimization

    Offers special table join operations and customized UDFs/UDAFs, which fulfill the requirements of feature extraction and deployment in machine learning applications.

  • Online-Offline Consistency

    The same SQL and CodeGen logic is guaranteed to have equivalent meaning and produce exactly the same result online and offline. Consistency also applies to UDFs/UDAFs written in multiple programming languages.

Quick Start

Requirements

Prepare Code & Docker

git clone https://github.com/4paradigm/HybridSE.git
cd HybridSE
docker run -v `pwd`:/HybridSE -it ghcr.io/4paradigm/hybridsql:latest
cd /HybridSE
# initialize the environment before building
source tools/init_env.profile.sh

We recommend using the Docker image listed above for a faster start and to avoid dependency problems. See HybridSQL-docker for the complete list of dependencies.

Build

cd /HybridSE
mkdir -p build && cd build
cmake ..
# compile the core library
make -j$(nproc) hybridse_core # on macOS, install coreutils if nproc is not found
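
If `nproc` is not available (for example on macOS without coreutils), the job count can be detected with a fallback instead. The following is a minimal sketch; the `detect_jobs` helper and the `JOBS` variable are names assumed here for illustration and are not part of the HybridSE tooling.

```shell
# Hypothetical helper: print the number of available CPUs,
# trying nproc first, then sysctl (macOS/BSD), then getconf.
detect_jobs() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    elif command -v sysctl >/dev/null 2>&1 && sysctl -n hw.ncpu >/dev/null 2>&1; then
        sysctl -n hw.ncpu
    elif command -v getconf >/dev/null 2>&1 && getconf _NPROCESSORS_ONLN >/dev/null 2>&1; then
        getconf _NPROCESSORS_ONLN
    else
        # Fall back to a serial build if nothing else works
        echo 1
    fi
}

JOBS="$(detect_jobs)"
echo "building with ${JOBS} parallel jobs"
```

With `JOBS` set, `make -j"${JOBS}" hybridse_core` can be used in place of `make -j$(nproc) hybridse_core`.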

Install

cd /HybridSE
mkdir -p build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX="CONFIG_YOUR_HYBRIDSE_INSTALL_DIR"
make -j$(nproc) install

See HybridSE Quick start for more information.

Run tests

cd /HybridSE
mkdir -p build && cd build
cmake .. -DTESTING_ENABLE=ON
export SQL_CASE_BASE_DIR=/HybridSE
make -j$(nproc) && make -j$(nproc) test

Run simple engine demo

cd /HybridSE
mkdir -p build && cd build
cmake ..
make -j$(nproc) hybridse_proto && make -j$(nproc) hybridse_parser && make -j$(nproc) simple_engine_demo
./src/simple_engine_demo

simple_engine_demo is an in-memory SQL engine implemented on HybridSE. For more information, see How to create a simple SQL engine.

Run ToyDB

  • Build ToyDB
cd /HybridSE
mkdir -p build && cd build
cmake .. -DEXAMPLES_ENABLE=ON
make -j$(nproc) hybridse_proto && make -j$(nproc) hybridse_parser && make -j$(nproc) toydb
  • Start ToyDB
cd /HybridSE/examples/toydb/onebox
sh start_all.sh
sh start_cli.sh

ToyDB is a simple in-memory database powered by HybridSE, supporting basic CRUD operations. See ToyDB quick start for more information.

Related Projects

| Project | Status | Description |
| --- | --- | --- |
| FEDB | Open Source | NewSQL database optimised for realtime inference and decisioning applications |
| SparkFE | Open Source | LLVM-based, high-performance Spark native execution engine designed for feature engineering |
| NativeFlink | Under Development | High-performance, Batch-Stream-in-onebox FlinkSQL execution engine |

Roadmap

ANSI SQL compatibility

HybridSE is already compatible with mainstream DDL and DML, and will progressively support ANSI SQL, which will reduce the cost of migrating from other SQL engines.

  • [2021H1&H2] Enrich the standard Window syntax; support Where, Group By, Join, etc.
  • [2021H1&H2] Extend AI-domain-specific grammar and UDAF functions

Performance

HybridSE offers dozens of SQL expression and logical plan optimizations and a standardized optimization pass interface, and will implement more SQL optimizations.

  • [2021H1] Logical and physical plan optimization for batch mode and request mode data processing
  • [2021H1] High-performance, distributed execution plan generation and codegen
  • [2021H2] Compilation and codegen optimization for LLVM-based expression
  • [2021H2] More classic SQL expression pass support

Ecosystem Integration

HybridSE has already been integrated into NativeSpark and FEDB. It can be integrated into NoSQL, OLAP, and OLTP systems, and will support more open source systems in the future.

  • [2021H2] Adapt to open source SQL compute frameworks like FlinkSQL
  • [2021H2] Adapt to various row and column encoding formats; be compatible with Apache Arrow
  • [2021H2] Support popular programming languages, including C++, Java, Python, Go, Rust, etc.

Contribution

License

Apache License 2.0

Release: v0.2.1
Owner: 4Paradigm (4Paradigm Open Source Community)