OpenMLDB is an open-source database particularly designed to efficiently provide consistent data for machine learning driven applications.

Overview

1. Introduction

OpenMLDB is an open-source database designed to efficiently provide consistent data for machine learning. A database for machine learning has two major tasks, feature extraction and feature access, which together provision data for offline training and online inference. Without OpenMLDB, online and offline data provisioning run on two separate systems, and verifying online-offline consistency takes significant effort. In contrast, OpenMLDB provides unified SQL programming and a single execution engine for both online and offline data provisioning, so online-offline consistency is inherently guaranteed. The system is also carefully designed and optimized for efficiency. With OpenMLDB, database engineers can provide consistent data to machine learning efficiently by writing SQL scripts alone, and an offline model can be deployed for online serving immediately at little cost.

[Figure: the OpenMLDB workflow]

The above figure illustrates the OpenMLDB workflow. SQL engineers first write SQL scripts for offline feature extraction, which provides data for offline model training. Once the model quality is satisfactory, online feature extraction and access can be enabled immediately for online serving with no additional effort. Because the SQL programming and execution engine are unified, OpenMLDB inherently guarantees online-offline consistency, and separate consistency verification is no longer needed. Furthermore, optimization techniques (e.g., data skew optimization for offline feature extraction and in-memory indexing for online feature extraction) are adopted to meet the performance requirements of both offline training and online inference. In summary, OpenMLDB makes SQL the only programming interface needed for consistent and efficient data provisioning for both offline model training and online inference serving.
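The consistency idea above can be illustrated with a minimal Python sketch. It is not OpenMLDB code: the row layout, the 3-second window, and all function names are hypothetical. It only shows why a single feature definition shared by the batch path and the request path yields the same value in both.

```python
from typing import Dict, List

Row = Dict[str, int]  # hypothetical row layout: {"ts": milliseconds, "amount": value}


def amount_sum_last_3s(history: List[Row], current: Row) -> int:
    """Single feature definition: sum of `amount` over the 3 s up to and including `current`."""
    lo = current["ts"] - 3000
    return sum(r["amount"] for r in history + [current] if lo <= r["ts"] <= current["ts"])


def offline_features(table: List[Row]) -> List[int]:
    """Batch path: compute the feature for every stored row, in time order."""
    rows = sorted(table, key=lambda r: r["ts"])
    return [amount_sum_last_3s(rows[:i], row) for i, row in enumerate(rows)]


def online_feature(stored: List[Row], request: Row) -> int:
    """Request path: compute the feature for one incoming row against stored rows."""
    return amount_sum_last_3s(stored, request)


table = [
    {"ts": 1000, "amount": 10},
    {"ts": 2000, "amount": 20},
    {"ts": 6000, "amount": 5},
]
batch = offline_features(table)
# Replaying the last stored row as an online request reuses the same definition.
replayed = online_feature(table[:2], table[2])
print(batch, replayed)
```

Because both paths call the same function, there is no second implementation that could drift from the first, which is the property the unified engine is described as providing.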

2. Highlight Features

2.1. SQL Programming APIs

We believe SQL is the most suitable programming API for feature engineering because of its elegant design and popularity. OpenMLDB offers SQL as the programming API for both offline and online feature extraction. We also extend standard SQL to make it more powerful for feature extraction.

2.2. Online-Offline Consistency

Based on the SQL programming API, we designed a unified execution engine for both online and offline feature extraction. As a result, OpenMLDB inherently guarantees online-offline consistency at no additional cost.

2.3. Efficiency

We propose a few techniques to improve the performance of both offline and online feature extraction. As a result, our offline feature extraction can be significantly faster than existing open-source big data processing frameworks. Moreover, our online service provides low latency (tens of milliseconds) to meet the performance requirements of online inference.

See section 7 (Publications & Blogs) below for more technical details.

2.4. Integrated CLI

We provide a powerful integrated CLI for SQL programming, job management, online and offline deployment, and database administration. Developers who are familiar with database CLIs should be very comfortable with our tool.

Note that the CLI in the current release, 0.3.0, only partially supports cluster mode. Cluster mode will be fully supported in the next release, 0.4.0.

3. Build & Install

👉 Read more

4. Demo & QuickStart

Since v0.3.0, OpenMLDB has two operating modes: cluster mode and standalone mode. Cluster mode suits large-scale datasets and real-world applications, and provides scalability and high availability. The lightweight standalone mode runs on a single node and is ideal for small businesses and demonstrations.

We demonstrate the workflow using the cluster and standalone modes:

5. Roadmap

Below are a few highlight features planned for future releases. Please join our community to learn more about our plans and to discuss your ideas.

Version | Est. release date | Highlight features
0.4.0   | End of 2021       | - Full support of standalone and cluster modes in the integrated CLI
0.5.0   | 2022 Q1           | - Monitoring APIs and tools for online serving
        |                   | - Efficient queries over a fairly long period of time by window functions
        |                   | - Kafka/Pulsar connector support for online data source

6. Community

You may join our community for feedback and discussion

  • Email: [email protected]

  • Slack Workspace: Our Slack channels carry release notes, user support, development discussion, and more.

  • GitHub Issues and Discussions: If you are a serious developer, you are most welcome to join our discussion on GitHub. GitHub Issues are used to report bugs and collect new requirements. GitHub Discussions are mostly used by our project maintainers to publish and comment on RFCs.

  • Blogs (Chinese)

  • WeChat Groups (Chinese)

7. Publications & Blogs

Comments
  • feat: support the SQL RLIKE expression

    close #862

    Development

    • [x] SQL syntax https://github.com/4paradigm/zetasql/pull/41
    • [x] extend BinaryExpr to support RLIKE type
    • [x] add regexp_like builtin function
    • [x] Codegen: deal with RLIKE BinaryExpr

    Test

    • [x] ~~logic plan test in ast_node_converter_test to verify correctly plan generated for rlike expr (discuss needed)~~
    • [x] unit test for new functions in udf.cc -> udf_test.cc
    • [x] udf_ir_builder_test for the registered function regexp_like udf functions
    • [x] expr_ir_builder_test for the supported rlike expression
    • [x] and end2end, a full sql example to test correctly for both rlike expression and regexp_like function (discuss needed)
    build execute-engine 
    opened by jiang1997 33
  • refactor: rm cmake modules

    opened by dl239 20
  • `lag()/at()/lead()` return show offset'th row, it is not related to window frame bound

    Bug Description

      - id: 6
        desc: window merge optimization
        inputs:
          - columns: [ "row_id int","ts timestamp","group1 string","val1 int" ]
            indexs: [ "index1:group1:ts" ]
            name: t1
            data: |
              1, 1612130400000, g1, 1
              2, 1612130401000, g1, 2
              3, 1612130402000, g1, 3
              4, 1612130403000, g1, 4
              5, 1612130404000, g1, 5
        sql: |
          select
          `row_id` as row_id_1,
          `row_id` as t1_row_id_original_0,
          case when !isnull(at(`val1`, 0)) over t1_group1_ts_0s_5s_10 then count_where(`val1`, `val1` = at(`val1`, 0)) over t1_group1_ts_0s_5s_10 else null end as t1_val1_window_count_1,
          case when !isnull(at(`val1`, 0)) over t1_group1_ts_1s_5s_10 then count_where(`val1`, `val1` = at(`val1`, 0)) over t1_group1_ts_1s_5s_10 else null end as t1_val1_window_count_2
          from
            `t1` WINDOW
            t1_group1_ts_0s_5s_10 as (partition by `group1` order by `ts` rows_range between 5s preceding and 0s preceding MAXSIZE 10),
            t1_group1_ts_1s_5s_10 as (partition by `group1` order by `ts` rows_range between 5s preceding and 1s preceding MAXSIZE 10);
        expect:
          columns: ["row_id_1 int", "t1_row_id_original_0 int", "t1_val1_window_count_1 int64", "t1_val1_window_count_2 int64"]
          order: row_id_1
          data: |
            1, 1, 1, 0
            2, 2, 1, 0
            3, 3, 1, 0
            4, 4, 1, 0
            5, 5, 1, 0
    

    Expected Behavior

    The case should pass, but the current result is:

    +----------+----------------------+------------------------+------------------------+
    | row_id_1 | t1_row_id_original_0 | t1_val1_window_count_1 | t1_val1_window_count_2 |
    +----------+----------------------+------------------------+------------------------+
    | 1        | 1                    | 1                      | NULL                   |
    | 2        | 2                    | 1                      | NULL                   |
    | 3        | 3                    | 1                      | NULL                   |
    | 4        | 4                    | 1                      | NULL                   |
    | 5        | 5                    | 1                      | NULL                   |
    +----------+----------------------+------------------------+------------------------+
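The expected values in the case can be checked with a small Python model of the two windows. This is an illustrative sketch, not OpenMLDB code: the helper names are hypothetical, `at(val1, 0)` is taken to mean the current row's value regardless of the frame bound (the behavior named in the issue title), and `MAXSIZE 10` is ignored because no window here holds more than five rows.

```python
from typing import List, Tuple

RowT = Tuple[int, int, str, int]  # (row_id, ts in ms, group1, val1)

# Rows copied from the test case above.
rows: List[RowT] = [
    (1, 1612130400000, "g1", 1),
    (2, 1612130401000, "g1", 2),
    (3, 1612130402000, "g1", 3),
    (4, 1612130403000, "g1", 4),
    (5, 1612130404000, "g1", 5),
]


def window_vals(data: List[RowT], cur: RowT, start_ms: int, end_ms: int) -> List[int]:
    """val1 of rows in cur's partition with ts in [cur.ts - start_ms, cur.ts - end_ms]."""
    _, ts, group, _ = cur
    return [v for (_, t, g, v) in data if g == group and ts - start_ms <= t <= ts - end_ms]


def count_equal_to_current(data: List[RowT], cur: RowT, start_ms: int, end_ms: int) -> int:
    """Models count_where(val1, val1 = at(val1, 0)) over the given rows_range frame."""
    current_val = cur[3]  # at(val1, 0): the current row, even if the frame excludes it
    return sum(1 for v in window_vals(data, cur, start_ms, end_ms) if v == current_val)


result = [
    (r[0], r[0],
     count_equal_to_current(rows, r, 5000, 0),      # 5s preceding .. 0s preceding
     count_equal_to_current(rows, r, 5000, 1000))   # 5s preceding .. 1s preceding
    for r in rows
]
for line in result:
    print(line)
```

Under this reading the second window never contains the current row, so its count is 0 rather than NULL, matching the `expect` block.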
    

    Work List

    • [x] add extra handling in logic plan for lag functions
      • working in #1605
    • [x] fix lag correctness in request mode & cluster environment
    • [x] fix window merge result correctness (may be related to #1587)
    bug community high-priority 
    opened by aceforeverd 18
  • style: enforece cpp style convention in hybridse

    This PR tries to fix APIs that do not follow the Google C++ style.

    What this PR does:

    • [x] Rename the methods of EngineOptions and JitOptions to follow Google C++ style naming

    Related issue: Gitee issue 286

    opened by FrankSzn 17
  • style: enforece cpp style convention in hybridse

    This PR tries to fix APIs that do not follow the Google C++ style.

    Background: C++ code should follow our style guide: https://github.com/4paradigm/rfcs/blob/main/style-guide/code-convention.md

    What this PR does:

    • [x] Rename the methods of EngineOptions and JitOptions to follow Google C++ style naming, leaving the corresponding part in openmldb-batch unchanged

    Related issue: Gitee issue 286

    opened by FrankSzn 14
  • segmentation fault when trying demo under the standalone mode

    A segmentation fault occurs when running the demo in standalone mode with ../openmldb/bin/openmldb --host 127.0.0.1 --port 6527. It may be caused by insufficient memory on the machine.
    bug high-priority 
    opened by yabg-shuai666 13
  • feat: support dayofyear() built-in function

    • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

    Adds the dayofyear() function which returns the day of the year for a given date (a number from 1 to 366).

    • What is the current behavior? (You can also link to an open issue here)

    dayofyear function doesn't exist, see issue for more details: #785
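The semantics described above (a number from 1 to 366 for a given date) can be sketched in Python. This only illustrates the expected results; it is not the OpenMLDB implementation, and the Python function name simply mirrors the SQL one.

```python
from datetime import date


def dayofyear(d: date) -> int:
    """Day of the year: 1 for January 1, up to 365, or 366 in a leap year."""
    return d.timetuple().tm_yday


print(dayofyear(date(2021, 1, 1)))    # first day of a year
print(dayofyear(date(2020, 12, 31)))  # last day of a leap year
print(dayofyear(date(2021, 12, 31)))  # last day of a non-leap year
```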

    enhancement community 
    opened by Nicholas-SR 13
  • feat: implement function last_day

    • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

    Feature: implement built-in function last_day

    • What is the new behavior (if this is a feature change)?

    Closes #821
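The PR text does not spell out the semantics of last_day. Assuming it follows the common SQL meaning (the last day of the month that contains the given date, as in MySQL's LAST_DAY), the behavior can be sketched in Python as follows. This is an illustration under that assumption, not the OpenMLDB implementation.

```python
import calendar
from datetime import date


def last_day(d: date) -> date:
    """Last day of the month containing `d` (assumed semantics)."""
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    return d.replace(day=days_in_month)


print(last_day(date(2022, 2, 10)))
print(last_day(date(2020, 2, 10)))  # leap year
```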

    udf execute-engine 
    opened by HeZean 12
  • feat: add a udf  function similar to Hive get_json_object

    • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

    feature

    • What is the current behavior? (You can also link to an open issue here)

    https://github.com/4paradigm/OpenMLDB/issues/1639

    • What is the new behavior (if this is a feature change)? add a udf function similar to Hive get_json_object
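For readers unfamiliar with Hive's get_json_object, the following Python sketch shows the general idea: extract a value from a JSON string by a path such as `$.store.fruit[0].weight`, and return null when the JSON or the path is invalid. It is a simplified illustration, not the UDF added by this PR. It supports only `.key` and `[index]` steps, and returning non-string values as JSON text is an assumption of this sketch.

```python
import json
import re
from typing import Any, Optional

# One path step: either ".key" or "[index]". Wildcards are not supported here.
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def get_json_object(json_text: str, path: str) -> Optional[str]:
    """Return the value at `path` as a string, or None if it cannot be resolved."""
    if not path.startswith("$"):
        return None
    try:
        node: Any = json.loads(json_text)
    except (TypeError, ValueError):
        return None  # invalid JSON input
    pos = 1
    while pos < len(path):
        step = _STEP.match(path, pos)
        if step is None:
            return None  # malformed or unsupported path
        key, index = step.group(1), step.group(2)
        if key is not None:
            if not isinstance(node, dict) or key not in node:
                return None
            node = node[key]
        else:
            i = int(index)
            if not isinstance(node, list) or i >= len(node):
                return None
            node = node[i]
        pos = step.end()
    if node is None:
        return None
    # Strings are returned as-is; other values are returned as JSON text.
    return node if isinstance(node, str) else json.dumps(node)


doc = '{"store": {"fruit": [{"type": "apple", "weight": 8}]}, "owner": "amy"}'
print(get_json_object(doc, "$.owner"))
print(get_json_object(doc, "$.store.fruit[0].weight"))
```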
    documentation workflow build monitoring batch-engine sdk execute-engine storage-engine 
    opened by hezhaozhao-git 11
  • feat: like prediate and like udf

    Features

    Resolves #224. Detailed SQL rules can be found in #224 and #686.

    Not implemented yet:

    • convert and into string if possible, currently only support string or null
    • data exception: invalid escape sequence is not checked

    Further work

    • Implicit conversion for like's <target expression> & <pattern expression>
    • support RLIKE/SIMILAR TO predicate (regexp_match)
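The general LIKE matching rules can be sketched in Python by translating the pattern into a regular expression: `%` matches any sequence of characters, `_` matches exactly one character, and the escape character makes the next character literal. This is an illustration only, not the code in this PR. The backslash default for the escape character and the NULL-in, NULL-out handling are assumptions of the sketch, and, like the PR, it does not check for invalid escape sequences.

```python
import re
from typing import Optional


def like(target: Optional[str], pattern: Optional[str], escape: str = "\\") -> Optional[bool]:
    """SQL-style LIKE: '%' = any sequence, '_' = any single character."""
    if target is None or pattern is None:
        return None  # NULL operand gives NULL (assumed three-valued behavior)
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # The escaped character is matched literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            # Ordinary character (a trailing lone escape character also lands here).
            parts.append(re.escape(ch))
        i += 1
    return re.fullmatch("".join(parts), target, flags=re.DOTALL) is not None


print(like("Mary", "M%"))
print(like("100%", "100\\%"))
```

The whole target must match the pattern, which is why `re.fullmatch` is used rather than `re.search`.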
    enhancement 
    opened by aceforeverd 11
  • docs(udf): how to generate udf documents (udfs_8h.md)

    Incomplete.

    • add the document for the second task in #1707: UDF doc generation
      • extra code is needed to make the steps work
    • also mentions #807, which is related to linked UDF doc problems
    documentation build execute-engine 
    opened by aceforeverd 10
  • docs: improved the en version of create_deploy

    • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...) docs

    • What is the current behavior? (You can also link to an open issue here)

    • What is the new behavior (if this is a feature change)? improved the format and the syntax

    documentation 
    opened by michelle-qinqin 0
  • fix!: able to compile & execute multi window union query in batch mode

    breaking change:

    • for a window agg node with append_input, the output schema changes to producer row + project row; before this commit it was project row + producer row
    execute-engine 
    opened by aceforeverd 5
  • Fixed different signedness

    I tried to fix the signedness mismatch that was causing the warning message in issue #2109. I noticed that line 333 uses int instead of size_t when dealing with both schema.columns().size()

    execute-engine 
    opened by marandabui 5
  • docs: go sdk

    • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

    go SDK doc

    • What is the current behavior? (You can also link to an open issue here)

    • What is the new behavior (if this is a feature change)?

    documentation 
    opened by qsliu2017 0
  • fix: correct go module to its path

    • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

    Bug fix

    • What is the current behavior? (You can also link to an open issue here)

    Go SDK module is under github.com/4paradigm/OpenMLDB/go but declares its path github.com/4paradigm/OpenMLDB/go/openmldb

    • What is the new behavior (if this is a feature change)?
    opened by qsliu2017 3
Releases(v0.6.2)
  • v0.6.2(Sep 20, 2022)

    Features

    • Support independently executing the OpenMLDB offline engine without the OpenMLDB deployment (#2423 @tobegit3hub)
    • Support the log setting of ZooKeeper and disable ZooKeeper logs in the diagnostic tool (#2451 @vagetablechicken)
    • Support query parameters of the SQL query APIs (#2277 @qsliu2017)
    • Improve the documents (#2406 @aceforeverd, #2408 #2414 @vagetablechicken, #2410 #2402 #2356 #2374 #2396 #2376 #2419 @michelle-qinqin, #2424 #2418 @dl239, #2455 @lumianph, #2458 @tobegit3hub)
    • Other minor features (#2420 @aceforeverd, #2411 @wuyou10206, #2446 #2452 @vagetablechicken, #2475 @tobegit3hub)

    Bug Fixes

    • Table creation succeeds even if partitionnum is set to 0, which should report an error. (#2220 @dl239)
    • There are thread races in aggregators if there are concurrent puts. (#2472 @zhanghaohit)
    • The limit clause does not work if it is used with the where and group by clauses. (#2447 @aceforeverd)
    • The TaskManager process will terminate if ZooKeeper disconnects. (#2494 @tobegit3hub)
    • The replica cluster does not create the database if a database is created in the leader cluster (#2488 @dl239)
    • When there is data in base tables, deployment with long windows still can be executed (which should report an error). (#2501 @zhanghaohit)
    • Other minor bug fixes (#2415 @aceforeverd, #2417 #2434 #2435 #2473 @dl239, #2466 @vagetablechicken)

    Code Refactoring

    #2413 @dl239, #2470 #2467 #2468 @vagetablechicken

    Source code(tar.gz)
    Source code(zip)
    openmldb-0.6.2-darwin.tar.gz(154.90 MB)
    openmldb-0.6.2-linux.tar.gz(394.74 MB)
  • v0.6.1(Aug 30, 2022)

    Features

    • Support new built-in functions last_day and regexp_like (#2262 @HeZean, #2187 @jiang1997)
    • Support Jupyter Notebook for the TalkingData use case (#2354 @vagetablechicken)
    • Add a new API to disable Spark logs of the batch engine (#2359 @tobegit3hub)
    • Add the use case of precision marketing based on OneFlow (#2267 @Elliezza @vagetablechicken @siqi)
    • Support the RPC request timeout in CLI and Python SDK (#2371 @vagetablechicken)
    • Improve the documents (#2021 @liuceyi, #2348 #2316 #2324 #2361 #2315 #2323 #2355 #2328 #2360 #2378 #2319 #2350 #2395 #2398 @michelle-qinqin, #2373 @njzyfr, #2370 @tobegit3hub, #2367 #2382 #2375 #2401 @vagetablechicken, #2387 #2394 @dl239, #2379 @aceforeverd, #2403 @lumianph, #2400 @gitpod-for-oss @aceforeverd)
    • Other minor features (#2363 @aceforeverd, #2185 @qsliu2017)

    Bug Fixes

    • APIServer will core dump if no rs in QueryResp. (#2346 @vagetablechicken)
    • Data has not been deleted from pre-aggr tables if there are delete operations in a main table. (#2300 @zhanghaohit)
    • Task jobs will core dump when enabling UnsafeRowOpt with multiple threads in the Yarn cluster. (#2352 #2364 @tobegit3hub)
    • Other minor bug fixes (#2336 @dl239, #2337 @dl239, #2385 #2372 @aceforeverd, #2383 #2384 @vagetablechicken)

    Code Refactoring

    #2310 @hv789, #2306 #2305 @yeya24, #2311 @Mattt47, #2368 @TBCCC, #2391 @PrajwalBorkar, #2392 @zahyaah, #2405 @wang-jiahua

    Source code(tar.gz)
    Source code(zip)
    openmldb-0.6.1-darwin.tar.gz(154.85 MB)
    openmldb-0.6.1-linux.tar.gz(394.23 MB)
  • v0.6.0(Aug 10, 2022)

    Highlights

    • Add a new toolkit of managing OpenMLDB, currently including a diagnostic tool and a log collector (#2299 #2326 @dl239 @vagetablechicken)
    • Support aggregate functions with suffix _where using pre-aggregation (#1821 #1841 #2321 #2255 #2321 @aceforeverd @nautaa @zhanghaohit)
    • Support a new SQL syntax of EXCLUDE CURRENT_ROW (#2053 #2165 #2278 @aceforeverd)
    • Add new OpenMLDB ecosystem plugins for DolphinScheduler (#1921 #1955 @vagetablechicken) and Airflow (#2215 @vagetablechicken)

    Other Features

    • Support SQL syntax of DELETE in SQL and Kafka Connector (#2183 #2257 @dl239)
    • Support customized order in the insert statement (#2075 @vagetablechicken)
    • Add a new use case of TalkingData AdTracking Fraud Detection (#2008 @vagetablechicken)
    • Improve the startup script to remove mon (#2050 @dl239)
    • Improve the performance of offline batch SQL engine (#1882 #1943 #1973 #2142 #2273 #1773 @tobegit3hub)
    • Support returning version numbers from TaskManager (#2102 @tobegit3hub)
    • Improve the CICD workflow and release procedure (#1873 #2025 #2028 @mangoGoForward)
    • Support GitHub Codespaces (#1922 @nautaa)
    • Support new built-in functions char(int), char_length, character_length, radians, hex, median (#1896 #1895 #1897 #2159 #2030 @wuxiaobai24 @HGZ-20 @Ivyee17)
    • Support returning result set for a new query API (#2189 @qsliu2017)
    • Improve the documents (#1796 #1817 #1818 #2254 #1948 #2227 #2254 #1824 #1829 #1832 #1840 #1842 #1844 #1845 #1848 #1849 #1851 #1858 #1875 #1923 #1925 #1939 #1942 #1945 #1957 #2031 #2054 #2140 #2195 #2304 #2264 #2260 #2257 #2254 #2247 #2240 #2227 #2115 #2126 #2116 #2154 #2152 #2178 #2147 #2146 #2184 #2138 #2145 #2160 #2197 #2198 #2133 #2224 #2223 #2222 #2209 #2248 #2244 #2242 #2241 #2226 #2225 #2221 #2219 #2201 #2291 #2231 #2196 #2297 #2206 #2238 #2270 #2296 #2317 #2065 #2048 #2088 #2331 #1831 #1945 #2118 @ZtXavier @pearfl @PrajwalBorkar @tobegit3hub @ZtXavier @zhouxh19 @dl239 @vagetablechicken @tobegit3hub @aceforeverd @jmoldyvan @lumianph @bxiiiiii @michelle-qinqin @yclchuxue @redundan3y)

    Bug Fixes

    • The SQL engine may produce incorrect results under certain circumstances. (#1950 #1997 #2024 @aceforeverd)
    • The genDDL function generates incorrect DDL if the SQL is partitioned by multiple columns. (#1956 @dl239)
    • The snapshot recovery may fail for disk tables. (#2174 @zhanghaohit)
    • enable_trace does not work for some SQL queries. (#2292 @aceforeverd)
    • Tablets cannot save ttl when updating the ttl of index. (#1935 @dl239)
    • MakeResultSet uses a wrong schema in projection. (#2049 @dl239)
    • A table does not exist when deploying SQL by the APIServer (#2205 @vagetablechicken)
    • The cleanup for ZooKeeper does not work properly. (#2191 @mangoGoForward)

    Other minor bug fixes (#2052 #1959 #2253 #2273 #2288 #1964 #2175 #1938 #1963 #1956 #2171 #2036 #2170 #2236 #1867 #1869 #1900 #2162 #2161 #2173 #2190 #2084 #2085 #2034 #1972 #1408 #1863 #1862 #1919 #2093 #2167 #2073 #1803 #1998 #2000 #2012 #2055 #2174 #2036 @Xeonacid @CuriousCorrelation @Shigm1026 @jiang1997 @Harshvardhantomar @nautaa @Ivyee17 @frazie @PrajwalBorkar @dl239 @aceforeverd @tobegit3hub @dl239 @vagetablechicken @zhanghaohit @mangoGoForward @SaumyaBhushan @BrokenArrow1404 @harshlancer)

    Code Refactoring

    #1884 #1917 #1953 #1965 #2017 #2033 #2044 @mangoGoForward; #2131 #2130 #2112 #2113 #2104 #2107 #2094 #2068 #2071 #2070 #1982 #1878 @PrajwalBorkar; #2158 #2051 #2037 #2015 #1886 #1857 @frazie; #2100 #2096 @KikiDotPy; #2089 @ayushclashroyale; #1994 @fpetrakov; #2079 @kayverly; #2062 @WUBBBB; #1843 @1korenn; #2092 @HeZean; #1984 @0sirusD3m0n; #1976 @Jaguar16; #2086 @marc-marcos; #1999 @Albert-Debbarma

    Source code(tar.gz)
    Source code(zip)
    openmldb-0.6.0-darwin.tar.gz(154.89 MB)
    openmldb-0.6.0-linux.tar.gz(394.07 MB)
  • v0.5.3(Jul 22, 2022)

  • v0.5.2(Jun 9, 2022)

    Features

    • Add new built-in functions, including char_length, char, radians, and replace (#1895 #1896 #1897 @Ivyee17, #1924 @aceforeverd)
    • Add the demo of DolphinScheduler task (#1921 @vagetablechicken)
    • Support inserting values with a specified database name (#1929 @dl239)
    • Improve window computation with UnsafeRowOpt by removing the zipped dataframe (#1882 @tobegit3hub)
    • Improve the documents (#1831 @yclchuxue, #1925 @lumianph, #1902 #1923 @vagetablechicken)
    • Support GitHub Codespaces (#1922 @nautaa)

    Bug Fixes

    • DistributeWindowIterator::GetKey() may result in core dump (#1892 @aceforeverd)
    • Tablet does not make ttl persistent when updating the ttl of index (#1935 @dl239)
    • TaskManager startup fails if LANG=zh_CN.UTF-8 is set (#1912 @vagetablechicken)
    • There are duplicate records in PRE_AGG_META_INFO (#1919 @nautaa)
    • The OpenMLDB Spark fails to fallback to SparkSQL for unsupported functions (#1908 @tobegit3hub)
    • Other minor bug fixes (#1914 @aceforeverd, #1900 @mangoGoForward, #1934 @vagetablechicken)

    Code Refactoring

    #1899 @auula, #1913 @dl239, #1917 @mangoGoForward, #1803 @SaumyaBhushan, #1870 @Ivyee17, #1886 @frazie

    Source code(tar.gz)
    Source code(zip)
    openmldb-0.5.2-darwin.tar.gz(154.40 MB)
    openmldb-0.5.2-linux.tar.gz(387.28 MB)
  • v0.5.1(May 26, 2022)

    Features

    • Support the new OpenMLDB Kafka connector (#1771 @vagetablechicken)
    • Support very long SQLs in TaskManager (#1833 @tobegit3hub)
    • Support window union correctly in the cluster mode (#1855 #1856 @aceforeverd @dl239)
    • Support count_where(*, condition) in the storage engine (#1841 @nautaa)
    • Add a new micro-benchmark tool for performance evaluation (#1800 @dl239)

    Bug Fixes

    • Auto creating table throws error when a new ttl is greater than the current ttl. (#1737 @keyu813)
    • Offline tasks crash when enabling UnsafeRowOpt for continuous windows. (#1773 @tobegit3hub)
    • The aggregator is not reset if the table is empty. (#1784 @zhanghaohit)
    • The order for window union rows and original rows with the same order key is undefined. (#1802 @aceforeverd)
    • Queries with pre-aggregate enabled may crash under certain tests. (#1838 @zhanghaohit)
    • Ending space in CLI may cause program crash. (#1820 @aceforeverd)
    • When creating an engine with empty databases, it cannot execute the command of USE database in the Python SDK. (#1854 @vagetablechicken)
    • When using the soft copy for csv files, it cannot read offline path with options. (#1872 @vagetablechicken)

    Code Refactoring

    #1766 @hiyoyolumi; #1777 @jmoldyvan; #1779 @SohamRatnaparkhi; #1768 @SaumyaBhushan; #1795 @vighnesh-kadam; #1806 @Mount-Blanc; #1978 @wangxinyu666666; #1781 @SaumyaBhushan; #1786 @xuduling; #1810 @IZUMI-Zu; #1824 @bxiiiiii; #1843 @1korenn; #1851 @zhouxh19; #1862 @Ivyee17; #1867, #1869, #1873, #1884 @mangoGoForward; #1863 @Ivyee17; #1815 @jmoldyvan; #1857 @frazie; #1878 @PrajwalBorkar

    Source code(tar.gz)
    Source code(zip)
    apache-dolphinscheduler-dev-SNAPSHOT-bin.tar.gz(668.29 MB)
    openmldb-0.5.1-darwin.tar.gz(152.85 MB)
    openmldb-0.5.1-linux.tar.gz(385.57 MB)
    workflow_openmldb_demo.json(16.96 KB)
  • v0.5.0(May 7, 2022)

    Highlights

    • We have introduced an important performance optimization technique, pre-aggregation, which can significantly improve performance for queries whose time windows contain a massive number of rows, e.g., a few million. (#1532 #1573 #1583 #1622 #1627 #1672 #1712 @zhanghaohit @nautaa)
    • We have added a new storage engine that supports persistent storage (such as HDD and SSD) for the online SQL engine. Such a storage engine is helpful when a user wants to reduce the cost with acceptable performance degradation. (#1483 @Leowner)
    • We have supported C/C++ based User-Defined Functions (UDFs) with dynamic registration to enhance the development experience. (#1509 #1733 #1700 @dl239 @tobegit3hub)
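The release notes do not describe how pre-aggregation works internally. As a generic illustration of the idea, the Python sketch below keeps partial sums per fixed time bucket and answers a window sum from whole buckets plus the raw rows at the two edges. The bucket width, data layout, and function names are all hypothetical, and a real engine would locate the edge rows through an index rather than the linear scan used here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BUCKET_MS = 1000  # hypothetical bucket width in milliseconds


def build_buckets(rows: List[Tuple[int, int]]) -> Dict[int, int]:
    """Pre-aggregate (ts, value) rows into per-bucket partial sums."""
    buckets: Dict[int, int] = defaultdict(int)
    for ts, value in rows:
        buckets[ts // BUCKET_MS] += value
    return buckets


def window_sum(rows: List[Tuple[int, int]], buckets: Dict[int, int], start: int, end: int) -> int:
    """Sum of values with start <= ts <= end, using whole buckets where possible."""
    first_full = -(-start // BUCKET_MS)       # first bucket that begins at or after `start`
    last_full = (end + 1) // BUCKET_MS - 1    # last bucket that ends at or before `end`
    if first_full > last_full:
        # No bucket lies fully inside the window: fall back to raw rows.
        return sum(v for ts, v in rows if start <= ts <= end)
    total = sum(buckets.get(b, 0) for b in range(first_full, last_full + 1))
    lo_edge = first_full * BUCKET_MS
    hi_edge = (last_full + 1) * BUCKET_MS
    # Add the raw rows in the partial buckets at both edges of the window.
    total += sum(v for ts, v in rows if start <= ts < lo_edge or hi_edge <= ts <= end)
    return total


rows = [(t, t % 7) for t in range(0, 10000, 250)]
buckets = build_buckets(rows)
print(window_sum(rows, buckets, 1300, 8700))
```

With a long window, most rows fall into whole buckets, so the number of values to combine grows with the number of buckets rather than the number of rows.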

    Other Features

    • Enhance the OpenMLDB Prometheus exporter ( #1584, #1645, #1754 @aceforeverd )
    • Support collecting statistics of query response time for online queries ( #1497, #1521 @aceforeverd )
    • Support new SQL commands: SHOW COMPONENTS, SHOW TABLE STATUS (#1380 #1431 #1704 @aceforeverd)
    • Support setting global variables (#1310 #1359 #1364 @keyu813 @aceforeverd)
    • Support reading Spark configuration files from the CLI (#1600 @tobegit3hub)
    • Support using multiple threads for the Spark local mode (#1675 @tobegit3hub)
    • Enhance the performance of join by using Spark's native expressions (#1502 @tobegit3hub)
    • Support the validation for TaskManager configuration (#1262 @tobegit3hub)
    • Support tracking unfinished jobs in the TaskManager (#1474 @tobegit3hub)
    • Other minor features (#1601 @dl239; #1574 @vagetablechicken; #1546 @keyu813; #1729 @vagetablechicken; #1460 @tobegit3hub)

    Bug Fixes

    • Incorrect results when the order of conditions specified in where is different from that of the index (#1709 @aceforeverd)
    • Incorrect results of lag/at/lead under certain circumstances (#1605 #1739 @aceforeverd)
    • Memory leakage in zk_client (#1660 @wuxiaobai24)
    • Catalog update failed if the role of a tablet is changed (#1655 @dl239)
    • Related bugs about UnsafeRow for the offline engine (#1298, #1312, #1326, #1362, #1637, #1381, #1731 @tobegit3hub)
    • Incorrect results after adding a new index in the standalone mode (#1721 @keyu813)
    • Incorrect results of SHOW JOBS under certain circumstances (#1453 @tobegit3hub)
    • Incorrect results of the date columns with UnsafeRowOpt(#1469 @tobegit3hub)
    • Other minor bug fixes (#1698 @kfiring; #1651 @kutlayacar; #1621 @KaidoWang; #1150, #1243 @tobegit3hub)

    Code Refactoring

    #1616 @dl239; #1743 @zjx1319

    Acknowledgement

    We appreciate the contribution to this release from all of our contributors, especially those from the community, including @nautaa @Leowner @keyu813 @wuxiaobai24 @kfiring @kutlayacar @KaidoWang @zjx1319

    We are looking forward to your contribution!

    Source code(tar.gz)
    Source code(zip)
    kafka-connect-jdbc.tgz(157.95 MB)
    openmldb-0.5.0-darwin.tar.gz(152.83 MB)
    openmldb-0.5.0-linux.tar.gz(385.42 MB)
  • v0.4.4(Apr 1, 2022)

    Features

    • Support the standalone version by Java and Python SDKs (#1302 #1325 #1485 @tobegit3hub @HuilinWu2 @keyu813)
    • Support the blocking execution for offline queries (#1486 @vagetablechicken )
    • Add the getStatement API in Java SDK (#1231 @dl239 )
    • Support multiple rows insertion in the Python SDK (#1402 @hezhaozhao-git )
    • Support the JDBC connection (#1511 @vagetablechicken )

    Bug Fixes

    • The error message is empty when executing show deployment in CLI fails. (#1415 @dl239 )
    • The show job and show jobs cannot display correct information. (#1440 @vagetablechicken )
    • Executing a built-in function on a string field longer than 2048 characters causes OpenMLDB to crash. (#1540 @dl239)
    • The simple expression inference fails in some cases (#1443 @jingchen2222 )
    • The PreparedStatement in Java SDK does not perform as expected. (#1511 @vagetablechicken )

    Code Refactoring

    #1467 @aimanfatima ; #1513 @L-Y-L ; #1503 @Stevinson ;

    Acknowledgement

    We appreciate the contribution to this release from all of our contributors, especially those from the community, including @hezhaozhao-git @HuilinWu2 @keyu813 @aimanfatima @L-Y-L @Stevinson . We are looking forward to your contribution!

    Source code(tar.gz)
    Source code(zip)
    openmldb-0.4.4-darwin.tar.gz(149.02 MB)
    openmldb-0.4.4-linux.tar.gz(370.68 MB)
    pulsar-io-jdbc-openmldb-2.11.0-SNAPSHOT.nar(158.47 MB)
  • v0.4.3(Mar 14, 2022)

    Features

    • Add the output of the number of rows imported after successfully importing data (#1401 @Manny-op)
    • Code Refactoring (#1366 @Cupid0320; #1378 @wuteek; #1418 @prashantpaidi; #1420 @shiyoubun; #1422 @vagetablechicken)

    Bug Fixes

    • Loading online data with "not null" columns in Spark fails. (#1341 @vagetablechicken)
    • max_where and min_where results are incorrect if no rows match. (#1403 @aceforeverd)
    • The insert and select execution of the standalone version fails. (#1426 @dl239)
    • Other minor bug fixes (#1379 @wuteek; #1384 @jasleon)
    Source code(tar.gz)
    Source code(zip)
    openmldb-0.4.3-darwin.tar.gz(148.92 MB)
    openmldb-0.4.3-linux.tar.gz(369.90 MB)
  • v0.4.2(Mar 1, 2022)

    Features

    • Support timestamps in long int when importing a csv file (#1237 @vagetablechicken)
    • Change the default execution mode in CLI from online to offline (#1332 @dl239)
    • Enhancements for the Python SDK:
      • Support fetchmany and fetchall in Python SDK (#1215 @HuilinWu2)
      • Support fetching logs of TaskManager jobs in Python SDK and APIs (#1214 @tobegit3hub)
      • Support fetching the schema of result sets in Python SDK (#1194 @tobegit3hub)
      • Support the SQL magic function in Jupyter Notebook when using the Python SDK. (#1164 @HuilinWu2)
    • Enhancements for the TaskManager:
      • Taskmanager can find the local batchjob jar if the path is not configured. (#1250 @tobegit3hub)
      • Support the Yarn-client mode in TaskManager (#1265 @tobegit3hub)
      • Support correctness checking for TaskManager's configuration (#1262 @tobegit3hub)
      • Support reordering for the task list (#1256 @tobegit3hub)
    • Add new UDF functions of lower and lcase (#1192 @Liu-2001)
    • Offline queries that do not execute on tables will run successfully even when the connection fails. (#1264 @tobegit3hub)
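The fetchmany and fetchall methods added to the Python SDK are standard DB-API 2.0 cursor methods. The sketch below shows how the two calls are typically used, assuming the SDK follows the DB-API semantics. It does not use OpenMLDB itself: the standard-library sqlite3 module stands in for the connection, and the table t and the helper fetch_in_batches are hypothetical names chosen for illustration.

```python
import sqlite3


def fetch_in_batches(cursor, batch_size):
    """Yield lists of rows from a DB-API cursor until it is exhausted."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:  # an empty list signals that no rows remain
            break
        yield rows


# sqlite3 is only a stand-in for a DB-API 2.0 connection (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER, name TEXT)")
cur.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(i, f"row{i}") for i in range(5)],
)

# fetchmany: read the result set in fixed-size chunks.
cur.execute("SELECT id, name FROM t ORDER BY id")
batches = list(fetch_in_batches(cur, 2))

# fetchall: read every remaining row in a single call.
cur.execute("SELECT id, name FROM t ORDER BY id")
all_rows = cur.fetchall()

conn.close()
```

Reading in batches keeps memory bounded for large result sets, while fetchall is convenient when the result is known to be small.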

    Bug Fixes

    • Offline data import fails when the timestamp value is null. (#1274 @tobegit3hub)
    • Start time of TaskManager jobs in CLI is null. (#1272 @tobegit3hub)
    • LAST JOIN may fail in the cluster version under certain circumstances. (#1226 @dl239)
    • Invalid SQL may run successfully. (#1208 @aceforeverd)
    Source code(tar.gz)
    Source code(zip)
    openmldb-0.4.2-darwin.tar.gz(148.97 MB)
    openmldb-0.4.2-linux.tar.gz(369.59 MB)
  • v0.4.1(Feb 9, 2022)

    [0.4.1] - 2022-02-09

    Features

    • Improve CLI error messages and support the 'enable_trace' system variable (#1129 @jingchen2222)

    Bug Fixes

    • CLI coredumps when it fails to connect to a nameserver. (#1166 @keyu813)
    • The Java SDK leaks memory. (#1148 @dl239)
    • The startup fails if a pid file exists. (#1108 @dl239)
    • Columns of the date type get incorrect values when loading data into an online table. (#1103 @yabg-shuai666)
    • Offline data import for the CSV format may cause incorrect results. (#1100 @yabg-shuai666)
    • 'Offline path' cannot be displayed after importing offline data. (#1172 @vagetablechicken)
    Source code(tar.gz)
    Source code(zip)
    openmldb-0.4.1-darwin.tar.gz(143.93 MB)
    openmldb-0.4.1-linux.tar.gz(364.44 MB)
  • v0.4.0(Jan 13, 2022)

    [0.4.0] - 2022-01-14

    Highlights

    • The SQL-centric feature is enhanced for both standalone and cluster versions. Now you can enjoy the SQL-centric development and deployment experience seamlessly. (#991,#1034,#1071,#1064,#1061,#1049,#1045,#1038,#1034,#1029,#997,#996,#968,#946,#840,#830,#814,#776,#774,#764,#747,#740,#466,#481,#1033,#1027,#966,#951,#950,#932,#853,#835,#804,#800,#596,#595,#568,#873,#1025,#1021,#1019,#994,#991,#987,#912,#896,#894,#893,#873,#778,#777,#745,#737,#701,#570,#559,#558,#553 @tobegit3hub; #1030,#965,#933,#920,#829,#783,#754,#1005,#998 @vagetablechicken)
    • The Chinese documentation has been thoroughly polished and is accessible at https://docs.openmldb.ai/ . The documentation repository is available at https://github.com/4paradigm/openmldb-docs-zh , and contributions are welcome.
    • Experimental feature: We have introduced a monitoring module based on Prometheus + Grafana for online feature processing. (#1048 @aceforeverd)

    Other Features

    • Support SQL syntax: LIKE, HAVING (#841 @aceforeverd; #927,#698 @jingchen2222)
    • Support new built-in functions: reverse (#1004 @nautaa), dayofyear (#856 @Nicholas-SR)
    • Improve the compilation and installation process, and support building from source (#999,#871,#594,#752,#793,#805,#875 @aceforeverd; #992 @vagetablechicken)
    • Improve the GitHub CI/CD workflow (#842,#884,#875,#919,#1056,#874 @aceforeverd)
    • Support system databases and tables (#773 @dl239)
    • Improve the function create index (#828 @dl239)
    • Improve the demo image (#1023,#690,#734,#751 @zhanghaohit)
    • Improve the Python SDK (#913,#906 @tobegit3hub;#949,#909 @HuilinWu2; #838 @dl239)
    • Simplify the concepts of execution modes (#877,#985,#892 @jingchen2222)
    • Add data import and export for the cluster version (#1078 @tobegit3hub)
    • Add new deployment command for the cluster version (#921 @dl239)
    • Support default values when creating a table (#563 @zoyopei)
    • Support string delimiters and quotes (#668 @ZackeryWang)
    • Add a new lru_cache to support upsert (#795 @vagetablechicken)
    • Support adding index with any ts_col (#828 @dl239)
    • Improve the ts packing in sql_insert_now (#944 ,#974 @keyu813)
    • Improve documentation (#952,#885 @mahengyang; #834 @Nicholas-SR; #792,#1058,#1002,#872,#836 @lumianph; #844,#782 @jingchen2222; #1022,#805 @aceforeverd)
    • Other minor updates (#1073 @dl239)

    Bug Fixes

    #847, #831, #647, #934, #953, #1015, #982, #927, #994, #1008, #1028, #1019, #779, #855, #350, #631, #1074, #1073, #1081

    @nautaa, @Nicholas-SR, @aceforeverd, @dl239, @jingchen2222, @tobegit3hub, @keyu813

    Source code(tar.gz)
    Source code(zip)
    openmldb-0.4.0-darwin.tar.gz(143.87 MB)
    openmldb-0.4.0-linux.tar.gz(364.33 MB)
    SHA256SUM(189 bytes)
  • v0.3.2(Nov 18, 2021)

  • v0.3.0(Nov 5, 2021)

    Highlight

    We introduce a new standalone mode that can be deployed on a single node, which is suitable for small businesses and for demonstration purposes. Please read more details here. The standalone mode is designed for ease of use and adds the following features, which are supported by the standalone mode only.

    • The standalone deployment mode https://github.com/4paradigm/OpenMLDB/issues/440
    • Connection establishment by specifying the host name and port in CLI https://github.com/4paradigm/OpenMLDB/issues/441
    • LOAD DATA command for bulk loading https://github.com/4paradigm/OpenMLDB/issues/443
    • SQL syntax support for exporting data: SELECT INTO FILE https://github.com/4paradigm/OpenMLDB/issues/455
    • Deployment commands: DEPLOY, SHOW DEPLOYMENT, and DROP DEPLOYMENT https://github.com/4paradigm/OpenMLDB/issues/460 https://github.com/4paradigm/OpenMLDB/issues/447

    Other Features

    • A new CLI command to support different levels of performance sensitivity: SET performance_sensitive=true|false. When it is set to false, SQL queries can be executed without indexes. Please read here for more details about the performance sensitivity configuration https://github.com/4paradigm/OpenMLDB/issues/555
    • Supporting SQL queries over multiple databases https://github.com/4paradigm/OpenMLDB/issues/476
    • Supporting inserting multiple tuples into a table using a single SQL https://github.com/4paradigm/OpenMLDB/issues/398
    • Improvements for the Java SDK:
      • The new API getTableSchema https://github.com/4paradigm/OpenMLDB/pull/483
      • The new API genDDL, which is used to generate DDLs according to a given SQL script https://github.com/4paradigm/OpenMLDB/issues/588

    Bugfix

    • Exceptions caused by certain physical plans with special structures when performing column resolution for logical plans. https://github.com/4paradigm/OpenMLDB/issues/437
    • Under specific circumstances, SQL queries with a WHERE clause produce unexpected results when certain WHERE conditions do not fit into indexes. https://github.com/4paradigm/OpenMLDB/issues/599
    • A bug when enabling WindowParallelOpt and WindowSkewOptimization at the same time. https://github.com/4paradigm/OpenMLDB/issues/444
    • A bug in the LCA (Lowest Common Ancestor) algorithm that supports WindowParallelOpt for particular SQL queries. https://github.com/4paradigm/OpenMLDB/issues/485
    • Workaround for the Spark bug (SPARK-36932) that occurs when columns with the same name appear in LastJoin. https://github.com/4paradigm/OpenMLDB/issues/484

    Acknowledgement

    We appreciate the contributions to this release from external contributors who are not from 4Paradigm's core OpenMLDB team, including Kanekanekane, shawn-happy, lotabout, Shouren, zoyopei, and huqianshan.

    Source code(tar.gz)
    Source code(zip)
    openmldb-0.3.0-linux.tar.gz(136.63 MB)
    openmldb-0.3.0-macos.tar.gz(21.28 MB)
  • v0.2.3(Sep 3, 2021)

    Feature

    • Data importer supports bulk load #250
    • Support parameterized query under BatchMode #262, #168
    • Support Hive metastore and Iceberg tables for offline #245, #146
    • Integrated with Trino #254
    • Support global SortBy node for offline #296

    Bug Fix

    • Fix end2end offline tests for the same SQL #300
    • desc does not display the value of ttl after adding an index #156

    SQL Syntax

    • nvl & nvl2: #238
    • bitwise operators: &, |, ^, ~ #244
    • between predicate: #277
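The semantics of the constructs listed above can be sketched in Python. This is an illustration under the conventional SQL definitions, not OpenMLDB's implementation: None stands in for SQL NULL, the function names mirror the SQL names, and NULL handling inside the bitwise operators and the between predicate is not modeled.

```python
def nvl(value, default):
    """Return default when value is NULL (None), otherwise value."""
    return default if value is None else value


def nvl2(value, if_not_null, if_null):
    """Return if_not_null when value is not NULL, otherwise if_null."""
    return if_not_null if value is not None else if_null


def between(value, low, high):
    """Inclusive range check, as in `value BETWEEN low AND high`."""
    return low <= value <= high


# Python's integer operators use the same symbols as the SQL bitwise operators.
bit_and = 6 & 3   # 0b110 & 0b011 -> 0b010
bit_or = 6 | 3    # 0b110 | 0b011 -> 0b111
bit_xor = 6 ^ 3   # 0b110 ^ 0b011 -> 0b101
bit_not = ~6      # two's complement negation: -(6 + 1)
```

Note that nvl2 tests only for NULL, so a value such as 0 or an empty string still selects the if_not_null branch.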
    Source code(tar.gz)
    Source code(zip)
    openmldb-0.2.3-linux.tar.gz(133.64 MB)
  • hybridse-v0.2.3(Aug 31, 2021)

  • 0.2.2(Aug 9, 2021)

  • v0.2.1(Aug 3, 2021)

  • v0.2.0(Jul 23, 2021)

    Features

    • Refactor the front-end using zetasql. As a result, OpenMLDB can support more SQL syntax and provide friendlier syntax error messages.
    • Better code style and comments
    • Add the APIServer module. Users can access OpenMLDB through the REST API. #70

    SQL Syntax

    Changed

    • table options syntax: #103
    • lead method: #136

    Removed

    • || and && as logical operator: #99
    • at function: #136

    Note

    • openmldb-0.2.0-linux.tar.gz targets x86_64
    • aarch64 artifacts are considered experimental
    Source code(tar.gz)
    Source code(zip)
    openmldb-0.2.0-210802-linux-gnu-aarch64.tar.gz(523.56 MB)
    openmldb-0.2.0-linux.tar.gz(529.77 MB)
    _sql_router_sdk_210802_aarch64.so(447.91 MB)
  • v0.1.5-pre(Jul 15, 2021)

Owner
4Paradigm
4Paradigm Open Source Community