StarRocks, formerly known as DorisDB, is a next-gen sub-second MPP database for full analysis scenarios, including multi-dimensional analytics, real-time analytics and ad-hoc queries.

StarRocks

Technology

  • Native vectorized SQL engine: StarRocks adopts vectorization technology to make full use of the parallel computing power of CPU, achieving sub-second query returns in multi-dimensional analyses, which is 5 to 10 times faster than previous systems.
  • Simple architecture: StarRocks does not rely on any external systems. The simple architecture makes it easy to deploy, maintain and scale out. StarRocks also provides high availability, reliability, scalability and fault tolerance.
  • Standard SQL: StarRocks supports ANSI SQL syntax (TPC-H and TPC-DS are fully supported). It is also compatible with the MySQL protocol. Various clients and BI software can be used to access StarRocks.
  • Smart query optimization: StarRocks can optimize complex queries through CBO (Cost Based Optimizer). With a better execution plan, the data analysis efficiency will be greatly improved.
  • Realtime update: The update model of StarRocks can perform upsert/delete operations by primary key and keeps queries efficient under concurrent updates.
  • Intelligent materialized view: The materialized view of StarRocks can be automatically updated during the data import and automatically selected when the query is executed.
  • Convenient query federation: StarRocks allows direct access to data from Hive, MySQL and Elasticsearch without importing.

Use cases

  • StarRocks supports not only high-concurrency, low-latency point queries, but also high-throughput ad-hoc queries.
  • StarRocks unifies batch and near-real-time streaming data ingestion.
  • Pre-aggregations, flat tables, star and snowflake schemas are supported and all run at enhanced speed.
  • StarRocks hybridizes serving and analytical processing (HSAP) in an easy way. The minimalist architectural design reduces the complexity and maintenance cost of StarRocks and increases its reliability and scalability.

Install

Download the current release here.
For detailed instructions, please refer to deploy.

Links

LICENSE

Code in this repository is provided under the Elastic License 2.0. Some portions are available under open source licenses. Please see our FAQ.

Contributing to StarRocks

A big thanks for your attention to StarRocks! In order to accept your pull request, please follow the CONTRIBUTING.md.

Comments
  • [BugFix] fix race condition of workgroup scheduling (backport #12604)

    This is an automatic backport of pull request #12604 done by Mergify.


    Mergify commands and options

    More conditions and actions can be found in the documentation.

    You can also trigger Mergify actions by commenting on this pull request:

    • @Mergifyio refresh will re-evaluate the rules
    • @Mergifyio rebase will rebase this PR on its base branch
    • @Mergifyio update will merge the base branch into this PR
    • @Mergifyio backport <destination> will backport this PR on <destination> branch

    Additionally, on Mergify dashboard you can:

    • look at your merge queues
    • generate the Mergify configuration with the config editor.

    Finally, you can contact us on https://mergify.com

    automerge 
    opened by mergify[bot] 147
  • [Feature]Encode integers/binary per column for exchange

    What type of PR is this:

    • [ ] BugFix
    • [x] Feature
    • [ ] Enhancement
    • [ ] Refactor
    • [ ] UT
    • [ ] Doc
    • [ ] Tool

    Which issues of this PR fixes :

    Encode integer and binary columns per column for exchange, controlled by transmission_encode_level. If transmission_encode_level & 2 is set, integers are encoded with streamvbyte, whether or not they are in order; if transmission_encode_level & 4 is set, binary columns are compressed with lz4; if transmission_encode_level & 1 is set, adaptive encoding is enabled.

    For example, if transmission_encode_level = 7, SR adaptively encodes number and string columns according to the encoding ratio (< 0.9); if transmission_encode_level = 6, SR always encodes number and string columns.

    In short, for transmission_encode_level, 2 encodes integers and types backed by integers, and 4 encodes strings; json and object columns are left to be supported later.
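The bit flags described above can be sketched as follows. This is only an illustration of the bitmask semantics in the PR description, not the BE implementation: the names `decode_level` and `should_encode` are hypothetical, and the sketch assumes that the "encoding ratio" means encoded size divided by raw size.

```python
from dataclasses import dataclass

# Bit values come from the PR description; the constant names are hypothetical.
ADAPTIVE = 1          # enable adaptive encoding
ENCODE_INTEGERS = 2   # integers encoded with streamvbyte
ENCODE_BINARY = 4     # binary/string columns compressed with lz4


@dataclass(frozen=True)
class EncodeLevel:
    adaptive: bool
    integers: bool
    binary: bool


def decode_level(level: int) -> EncodeLevel:
    """Split transmission_encode_level into its three flags."""
    return EncodeLevel(
        adaptive=bool(level & ADAPTIVE),
        integers=bool(level & ENCODE_INTEGERS),
        binary=bool(level & ENCODE_BINARY),
    )


def should_encode(level: int, is_integer_column: bool,
                  raw_size: int, encoded_size: int,
                  threshold: float = 0.9) -> bool:
    """Decide whether a column is sent encoded.

    Assumption: ratio = encoded_size / raw_size, and adaptive mode keeps
    the encoded form only when the ratio is below the threshold.
    """
    flags = decode_level(level)
    enabled = flags.integers if is_integer_column else flags.binary
    if not enabled:
        return False
    if not flags.adaptive:
        return True  # forced encoding, e.g. level 6
    return raw_size > 0 and encoded_size / raw_size < threshold
```

With level 7 a column that only shrinks from 100 to 95 bytes would be sent raw under these assumptions, while level 6 would still encode it.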

    NOTE:

    To stay compatible with older versions when downgrading or upgrading across this PR, set transmission_encode_level to 0, wait for the queries that are already running to finish, and then replace the binary.

    Problem Summary(Required) :

    Encode integers for exchange to reduce data size at the cost of a little CPU.

    effects:

    mysql> set transmission_encode_level =7;
    mysql> select max(l_linenumber), max(l_orderkey),max(l_partkey),max(l_tax),max(l_discount),max(l_extendedprice),max(l_quantity),max(l_shipdate),min(l_suppkey),max(l_commitdate) from orders join lineitem on o_orderkey=l_partkey  where    l_orderkey< 500000000;
    +-------------------+-----------------+----------------+------------+-----------------+----------------------+-----------------+-----------------+----------------+-------------------+
    | max(l_linenumber) | max(l_orderkey) | max(l_partkey) | max(l_tax) | max(l_discount) | max(l_extendedprice) | max(l_quantity) | max(l_shipdate) | min(l_suppkey) | max(l_commitdate) |
    +-------------------+-----------------+----------------+------------+-----------------+----------------------+-----------------+-----------------+----------------+-------------------+
    |                 7 |       499999974 |       20000000 |       0.08 |            0.10 |            104798.50 |           50.00 | 1998-12-01      |              1 | 1998-10-31        |
    +-------------------+-----------------+----------------+------------+-----------------+----------------------+-----------------+-----------------+----------------+-------------------+
    1 row in set (4.32 sec)
    
    mysql> set transmission_encode_level =0;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> select max(l_linenumber), max(l_orderkey),max(l_partkey),max(l_tax),max(l_discount),max(l_extendedprice),max(l_quantity),max(l_shipdate),min(l_suppkey),max(l_commitdate) from orders join lineitem on o_orderkey=l_partkey  where    l_orderkey< 500000000;
    +-------------------+-----------------+----------------+------------+-----------------+----------------------+-----------------+-----------------+----------------+-------------------+
    | max(l_linenumber) | max(l_orderkey) | max(l_partkey) | max(l_tax) | max(l_discount) | max(l_extendedprice) | max(l_quantity) | max(l_shipdate) | min(l_suppkey) | max(l_commitdate) |
    +-------------------+-----------------+----------------+------------+-----------------+----------------------+-----------------+-----------------+----------------+-------------------+
    |                 7 |       499999974 |       20000000 |       0.08 |            0.10 |            104798.50 |           50.00 | 1998-12-01      |              1 | 1998-10-31        |
    +-------------------+-----------------+----------------+------------+-----------------+----------------------+-----------------+-----------------+----------------+-------------------+
    1 row in set (5.48 sec)
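To put the two timings above side by side, the arithmetic can be sketched as below. The inputs are the single-run numbers shown in the session output, so this is a rough comparison rather than a benchmark; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline_s: float, candidate_s: float) -> float:
    """Fraction of the baseline time saved by the candidate run."""
    return (baseline_s - candidate_s) / baseline_s


with_encoding_s = 4.32      # transmission_encode_level = 7
without_encoding_s = 5.48   # transmission_encode_level = 0

saved = relative_reduction(without_encoding_s, with_encoding_s)
speedup = without_encoding_s / with_encoding_s
print(f"time saved: {saved:.1%}, speedup: {speedup:.2f}x")
```

For this one query the encoded run takes roughly a fifth less time.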
    

    Checklist:

    • [x] I have added test cases for my bug fix or my new feature
    • [ ] I have added user document for my new feature or new function
    Approved 
    opened by fzhedu 103
  • [Refactor] Refactor event based compaction framework

    What type of PR is this:

    • [ ] BugFix
    • [ ] Feature
    • [ ] Enhancement
    • [x] Refactor
    • [ ] UT
    • [ ] Doc
    • [ ] Tool

    Which issues of this PR fixes :

    Fixes #10503

    Problem Summary(Required) :

    This PR only contains the refactor of the event-based compaction framework:

    1. optimize CompactionManager's update_tablet() to avoid multiple calls and full copies
    2. support compaction on tablets with missing versions
    3. optimize the default base compaction score calculation strategy
    4. optimize the compaction scheduler to avoid creating unnecessary compaction tasks
    5. optimize the cumulative compaction scheduler interval
    6. fix the duplicate compaction bug

    The default compaction strategy still needs optimization after some testing; meanwhile, we will support a size-tiered compaction strategy in subsequent PRs.

    Checklist:

    • [ ] I have added test cases for my bug fix or my new feature
    • [ ] This pr will affect users' behaviors
    • [ ] This pr needs user documentation (for new or modified features or behaviors)
      • [ ] I have added documentation for my new feature or new function
    Approved be-build 2.5 
    opened by meegoo 85
  • [Doc] add date functions and update other docs

    What type of PR is this:

    • [ ] BugFix
    • [ ] Feature
    • [ ] Enhancement
    • [ ] Refactor
    • [ ] UT
    • [ ] Doc
    • [ ] Tool

    Which issues of this PR fixes :

    Fixes #

    Problem Summary(Required) :

    Checklist:

    • [ ] I have added test cases for my bug fix or my new feature
    • [ ] I have added user document for my new feature or new function
    documentation Approved 
    opened by evelynzhaojie 73
  • [BugFix] fix left join on big chunk

    What type of PR is this:

    • [x] BugFix
    • [ ] Feature
    • [ ] Enhancement
    • [ ] Refactor
    • [ ] UT
    • [ ] Doc
    • [ ] Tool

    Which issues of this PR fixes :

    Fixed https://github.com/StarRocks/StarRocksTest/issues/1355

    Problem Summary(Required) :

    Left outer join on small left side and big right side:

    • Add _probe_row_finished to indicate whether the probe row is finished; once it is finished and still unmatched, the left row needs to be emitted
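A minimal sketch of the idea, assuming a nested-loop style probe and an output chunk size limit: an unmatched left row may only be emitted once the whole right side has been scanned for it, even if output chunks were flushed in between. The function name and row layout are hypothetical and do not reflect the actual BE code.

```python
from typing import Iterator, List, Optional, Tuple

LeftRow = Tuple[int, str]
RightRow = Tuple[int, str]
OutRow = Tuple[int, str, Optional[str]]


def left_outer_join_chunked(left: List[LeftRow], right: List[RightRow],
                            chunk_size: int) -> Iterator[List[OutRow]]:
    """Yield joined rows in chunks of at most chunk_size rows."""
    out: List[OutRow] = []
    for lkey, lval in left:
        matched = False
        for rkey, rval in right:
            if rkey == lkey:
                matched = True
                out.append((lkey, lval, rval))
                if len(out) == chunk_size:
                    # Flushing here must not reset `matched`: the probe row
                    # is not finished yet.
                    yield out
                    out = []
        # The probe row is finished only at this point, so only now is it
        # safe to decide that it had no match and emit it with a NULL.
        if not matched:
            out.append((lkey, lval, None))
            if len(out) == chunk_size:
                yield out
                out = []
    if out:
        yield out
```

In this sketch a left row with three matches and a chunk size of two spans two output chunks, and it is not emitted again as unmatched after the first flush.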

    Test case:

    • https://github.com/StarRocks/StarRocksTest/pull/1360

    Checklist:

    • [ ] I have added test cases for my bug fix or my new feature
    • [ ] I have added user document for my new feature or new function
    opened by mofeiatwork 67
  • [BugFix] Fix the page selection bug in late materialization for large columns

    What type of PR is this:

    • [x] BugFix
    • [ ] Feature
    • [ ] Enhancement
    • [ ] Refactor
    • [ ] UT
    • [ ] Doc
    • [ ] Tool

    Which issues of this PR fixes :

    Fixes #

    Problem Summary(Required) :

    The page selection in late materialization depends on the next_row context, which is currently updated only after the whole selection procedure. Because this procedure may contain multiple select operations for a large column, the context is not updated in time for the subsequent operations, which causes unexpected selection results.

    This PR advances the next_row context in each selection operation.

    Checklist:

    • [x] I have added test cases for my bug fix or my new feature
    • [ ] This pr will affect users' behaviors
    • [ ] This pr needs user documentation (for new or modified features or behaviors)
      • [ ] I have added documentation for my new feature or new function
    be-build 
    opened by GavinMar 62
  • [BugFix] string_to_float_internal may lose some precision

    For 2.97, string_to_float_internal calculates 2 + 97 / 100, which is 2.9699999999999998; it should actually calculate 297 / 100, which is 2.9700000000000002, the closest double to 2.97.
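The rounding difference can be reproduced with IEEE 754 doubles in any language. The sketch below uses two hypothetical helpers, `parse_split` and `parse_combined`, that mirror the two calculations named above; they handle only simple non-negative inputs of the form `digits.digits` and are not the StarRocks implementation.

```python
def parse_split(text: str) -> float:
    """Integer part plus fraction, like 2 + 97 / 100 (two rounding steps)."""
    whole, frac = text.split(".")
    return int(whole) + int(frac) / 10 ** len(frac)


def parse_combined(text: str) -> float:
    """All digits divided once, like 297 / 100 (a single rounding step)."""
    whole, frac = text.split(".")
    return int(whole + frac) / 10 ** len(frac)


print(repr(parse_split("2.97")))     # the lossy path
print(repr(parse_combined("2.97")))  # matches the literal 2.97
```

The split version rounds twice (once for 97 / 100, once for the addition), which is where the last bit is lost; the combined version rounds only once.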

    Signed-off-by: xyz [email protected]

    What type of PR is this:

    • [x] bug
    • [ ] feature
    • [ ] enhancement
    • [ ] refactor
    • [ ] others

    Which issues of this PR fixes :

    Fixes #9633

    Problem Summary(Required) :

    Checklist:

    • [x] I have added test cases for my bug fix or my new feature
    • [ ] I have added user document for my new feature or new function
    Approved 
    opened by xiaoyong-z 50
  • [Doc] modify varchar length limit (backport #12723)

    This is an automatic backport of pull request #12723 done by Mergify.


    documentation automerge 
    opened by mergify[bot] 42
  • [Enhancement] Add tables_config in  information_schema db

    What type of PR is this:

    • [ ] bug
    • [ ] feature
    • [x] enhancement
    • [ ] refactor
    • [ ] others

    Which issues of this PR fixes :

    Fixes #9498

    Problem Summary(Required) :

    Add information_schema.tables_config to show table configuration. This table contains columns such as PRIMARY_KEY and PARTITION_KEY.

    Checklist:

    • [x] I have added test cases for my bug fix or my new feature
    • [ ] I have added user document for my new feature or new function
    Approved 
    opened by waittttting 36
  • [Feature]Drop lake table

    What type of PR is this:

    • [ ] bug
    • [x] feature
    • [ ] enhancement
    • [ ] refactor
    • [ ] others

    Which issues of this PR fixes :

    Fixes #

    Problem Summary(Required) :

    Drop lake table:

    1. use a daemon thread to drop lake tablet and delete shard
    2. persist shard infos in image and journal

    Approved 
    opened by abc982627271 36
  • [BugFix] array_append/remove solve only null array

    Signed-off-by: fzhedu [email protected]

    What type of PR is this:

    • [x] BugFix
    • [ ] Feature
    • [ ] Enhancement
    • [ ] Refactor
    • [ ] UT
    • [ ] Doc
    • [ ] Tool

    Which issues of this PR fixes :

    Fixes https://github.com/StarRocks/starrocks/issues/10123

    Problem Summary(Required) :

    array_append/remove should directly return for only null array.

    Checklist:

    • [x] I have added test cases for my bug fix or my new feature
    • [ ] This pr will affect users' behaviors
    • [ ] This pr needs user documentation (for new or modified features or behaviors)
      • [ ] I have added documentation for my new feature or new function
    Approved be-build 
    opened by fzhedu 34
  • [Enhancement] Instead of using def for null values, use empty to represent null (backport #16265)

    This is an automatic backport of pull request #16265 done by Mergify.


    automerge be-build 
    opened by mergify[bot] 1
  • [Enhancement][Cherry-pick][Branch-2.4] Make schema changed table not to be moved to trash

    What type of PR is this:

    • [ ] BugFix
    • [ ] Feature
    • [x] Enhancement
    • [ ] Refactor
    • [ ] UT
    • [ ] Doc
    • [ ] Tool

    Which issues of this PR fixes :

    Fixes #13651

    Problem Summary(Required) :

    When a table schema change is complete, the old table is moved to the trash directory, which takes up a lot of disk space. This PR fixes this issue.

    Checklist:

    • [ ] I have added test cases for my bug fix or my new feature
    • [ ] This pr will affect users' behaviors
    • [ ] This pr needs user documentation (for new or modified features or behaviors)
      • [ ] I have added documentation for my new feature or new function

    Bugfix cherry-pick branch check:

    • [ ] I have checked the version labels which the pr will be auto backported to target branch
      • [ ] 2.5
      • [ ] 2.4
      • [ ] 2.3
      • [ ] 2.2
    automerge be-build 
    opened by zaorangyang 0
  • [BugFix] DictMappingRewriter support compoundPredicate (backport #15737)

    This is an automatic backport of pull request #15737 done by Mergify.


    automerge be-build 
    opened by mergify[bot] 1
  • [Enhancement] Use thrift 0.17.0

    What type of PR is this:

    • [x] Enhancement

    Problem Summary(Required) :

    In Apache Thrift 0.9.3 to 0.13.0, malicious RPC clients could send short messages which would result in a large memory allocation, potentially leading to denial of service.

    Bump up the version to the latest 0.17.0.

    Tested both insert and select queries working for

    • FE w/ thrift 0.13.0 and BE w/ thrift 0.17.0
    • FE w/ thrift 0.17.0 and BE w/ thrift 0.13.0

    Checklist:

    • [ ] I have added test cases for my bug fix or my new feature
    • [ ] This pr will affect users' behaviors
    • [ ] This pr needs user documentation (for new or modified features or behaviors)
      • [ ] I have added documentation for my new feature or new function

    Bugfix cherry-pick branch check:

    • [ ] I have checked the version labels which the pr will be auto backported to target branch
      • [X] 2.5
      • [ ] 2.4
      • [ ] 2.3
      • [ ] 2.2
    be-build 
    opened by zuyu 4
  • Rename data distribution type name

    Refactoring request

    The current DataPartition names are confusing and not straightforward:

    • UNPARTITIONED means broadcast
    • RANDOM means round robin
    • HASH_PARTITIONED means partitioned by hash
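The mapping above can be made concrete with a small routing sketch. The enum mirrors the three names from the issue; the `destinations` helper, its parameters, and the use of CRC32 as the hash are hypothetical choices for illustration only, not StarRocks code.

```python
import zlib
from enum import Enum
from typing import List


class DataPartition(Enum):
    UNPARTITIONED = "broadcast"
    RANDOM = "round robin"
    HASH_PARTITIONED = "partitioned by hash"


def destinations(kind: DataPartition, row_index: int, key: str,
                 num_receivers: int) -> List[int]:
    """Return the receiver indices a row is sent to under each type."""
    if kind is DataPartition.UNPARTITIONED:
        # Broadcast: every receiver gets the row.
        return list(range(num_receivers))
    if kind is DataPartition.RANDOM:
        # Round robin: rows rotate across receivers.
        return [row_index % num_receivers]
    # Hash: rows with the same key always land on the same receiver.
    return [zlib.crc32(key.encode("utf-8")) % num_receivers]
```

Read this way, names such as BROADCAST or ROUND_ROBIN would describe the behavior more directly than UNPARTITIONED or RANDOM, which is the point of the request.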
    good first issue type/cleanup 
    opened by mofeiatwork 0
Releases(2.3.6)
  • 2.3.6(Dec 23, 2022)

    Improvements

    • The Pipeline execution engine supports INSERT INTO statements. To enable it, set the FE configuration item enable_pipeline_load_for_insert to true. #14723
    • The memory used by Compaction for the primary key table is reduced. #13861 #13862

    Bug Fixes

    The following bugs are fixed:

    • For aggregation queries and multi-table JOIN queries, the statistics are not collected accurately and CROSS JOIN occurs in the execution plan, resulting in long query latency. #15497
    • When you create a materialized view by using CREATE MATERIALIZED VIEW AS SELECT, if the SELECT clause does not use aggregate functions, and uses GROUP BY, for example CREATE MATERIALIZED VIEW test_view AS SELECT a,b from test group by b,a order by a;, then the BE nodes all crash. #13743
    • If you restart the BE immediately after you use INSERT INTO to frequently load data into the primary key table to make data changes, the BE may restart very slowly. #15128
    • If only JRE is installed on the environment and JDK is not installed, queries fail after FE restarts. After the bug is fixed, FE cannot restart in that environment and it returns error JAVA_HOME can not be jre. To successfully restart FE, you need to install JDK on the environment. #14332
    • Queries cause BE crashes. #14221
    • exec_mem_limit cannot be set to an expression. #13647
    • You cannot create a sync refreshed materialized view based on subquery results. #13507
    • The comments for columns are deleted after you refresh the Hive external table. #13742
    • During a correlated JOIN, the right table is processed before the left table and the right table is very large. If compaction is performed on the left table while the right table is being processed, the BE node crashes. #14070
    • If the Parquet file column names are case-sensitive, and the query condition uses upper-case column names from the Parquet file, the query returns no result. #13860 #14773
    • During bulk loading, if the number of connections to Broker exceeds the default maximum number of connections, Broker is disconnected and the loading job fails with an error message list path error. #13911
    • When BEs are highly loaded, the metric for resource groups starrocks_be_resource_group_running_queries may be incorrect. #14043
    • If the query statement uses OUTER JOIN, it may cause the BE node to crash. #14840
    • After you create an asynchronous materialized view by using StarRocks 2.4 and then roll StarRocks back to 2.3, FE may fail to start. #14400
    • When the primary key table uses delete_range, and the performance is not good, it may slow down data reading from RocksDB and cause high CPU usage. #15130
  • 2.4.2(Dec 16, 2022)

    Behavior Change

    • Constrained the session variable query_timeout with an upper limit of 259200 and a lower limit of 1.
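As a rough illustration of the new bounds, the check can be sketched as below. The sketch assumes that query_timeout is expressed in seconds and that out-of-range values are rejected rather than clamped; the release note states neither, and the function name is hypothetical.

```python
QUERY_TIMEOUT_MIN = 1        # lower limit from the release note
QUERY_TIMEOUT_MAX = 259200   # upper limit from the release note


def validate_query_timeout(value: int) -> int:
    """Return the value if it is within bounds, otherwise raise."""
    if not QUERY_TIMEOUT_MIN <= value <= QUERY_TIMEOUT_MAX:
        raise ValueError(
            f"query_timeout must be between {QUERY_TIMEOUT_MIN} "
            f"and {QUERY_TIMEOUT_MAX}, got {value}"
        )
    return value


# If the unit is seconds, the upper limit corresponds to 72 hours.
assert QUERY_TIMEOUT_MAX == 72 * 60 * 60
```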

    Improvement

    • Optimized the performance of Bucket Hint when a multitude of buckets exist. #13142

    Bug Fixes

    The following bugs are fixed:

    • Flushing the Primary Key index may cause BE to crash. #14857 #14819
    • Materialized view types cannot be correctly identified by SHOW FULL TABLES. #13954
    • Upgrading StarRocks v2.2 to v2.4 may cause BE to crash. #13795
    • Broker Load may cause BE to crash. #13973
    • The session variable statistic_collect_parallel does not take effect. #14352
    • INSERT INTO may cause BE to crash. #14818
    • JAVA UDF may cause BE to crash. #13947
    • Cloning replicas during partial updates may cause BE to crash and fail to restart. #13683
    • Colocated Join may not take effect. #13561
  • 2.3.5(Dec 2, 2022)

    Improvements

    • Colocate Join supports Equi Join. #13546
    • Fix the problem that primary key index files are too large due to continuously appending WAL records when data is frequently loaded. #12862
    • FE scans all tablets in batches so that FE releases db.readLock at scanning intervals in case of holding db.readLock for too long. #13070

    Bug Fixes

    The following bugs are fixed:

    • When a view is created based directly on the result of UNION ALL, and the UNION ALL operator's input columns include NULL values, the schema of the view is incorrect since the data type of columns is NULL_TYPE rather than UNION ALL's input columns. #13917
    • The query result of SELECT * FROM ... and SELECT * FROM ... LIMIT ... is inconsistent. #13585
    • External tablet metadata synchronized to FE may overwrite local tablet metadata, which causes data loading from Flink to fail. #12579
    • BE nodes crash when null filter in Runtime Filter handles literal constants. #13526
    • An error is returned when you execute CTAS. #12388
    • The metrics ScanRows collected by pipeline engine in audit log may be wrong. #12185
    • The query result is incorrect when you query compressed HIVE data. #11546
    • Queries time out and StarRocks responds slowly after a BE node crashes. #12955
    • The error of Kerberos authentication failure occurs when you use Broker Load to load data. #13355
    • Too many OR predicates cause statistics estimation to take too long. #13086
    • A BE node crashes if Broker Load loads ORC files (Snappy compression) that contain uppercase column names. #12724
    • An error is returned when unloading or querying Primary Key table takes more than 30 minutes. #13403
    • The backup task fails when you back up large data volumes to HDFS by using a broker. #12836
    • The data StarRocks read from Iceberg may be incorrect, which is caused by the parquet_late_materialization_enable parameter. #13132
    • An error failed to init view stmt is returned when a view is created. #13102
    • An error is returned when you use JDBC to connect to StarRocks and execute SQL statements. #13526
    • The query times out because it involves too many buckets and uses tablet hint. #13272
    • A BE node crashes and cannot be restarted, and in the meantime, the loading job into a newly built table reports an error. #13701
    • All BE nodes crash when a materialized view is created. #13184
    • When you execute ALTER ROUTINE LOAD to update the offset of consumed partitions, an error The specified partition 1 is not in the consumed partitions may be returned, and followers eventually crash. #12227
  • 2.4.1(Nov 17, 2022)

    New Feature

    • Supports non-equi joins - LEFT SEMI JOIN and ANTI JOIN. Optimized the JOIN function. #13019

    Improvements

    • Supports property aliveStatus in HeartbeatResponse. aliveStatus indicates if a node is alive in the cluster. Mechanisms that judge the aliveStatus are further optimized. #12713

    • Optimized the error message of Routine Load. #12155

    Bug Fixes

    • BE crashes after being upgraded from v2.4.0RC to v2.4.0. #13128

    • Late materialization causes incorrect results to queries on data lakes. #13133

    • The get_json_int function throws exceptions. #12997

    • Data may be inconsistent after deletion from a PRIMARY KEY table with a persistent index. #12719

    • BE may crash during compaction on a PRIMARY KEY table. #12914

    • The json_object function returns incorrect results when its input contains an empty string. #13030

    • BE crashes due to RuntimeFilter. #12807

    • FE hangs due to excessive recursive computations in CBO. #12788

    • BE may crash or report an error when exiting gracefully. #12852

    • Compaction crashes after data is deleted from a table with new columns added to it. #12907

    • Data may be inconsistent due to incorrect mechanisms in OLAP external table metadata synchronization. #12368

    • When one BE crashes, the other BEs may execute relevant queries till timeout. #12954

    Behavior Change

    • When parsing Hive external table fails, StarRocks throws error messages instead of converting relevant columns into NULL columns. #12382
  • 2.3.4(Nov 11, 2022)

    Release date: November 10, 2022

    Improvements

    • The error message provides a solution when StarRocks fails to create a Routine Load job because the number of running Routine Load job exceeds the limit. #12204
    • The query fails when StarRocks queries data from Hive and fails to parse CSV files. #13013

    Bug Fixes

    The following bugs are fixed:

    • The query may fail if HDFS file paths contain (). #12660
    • The result of ORDER BY ... LIMIT ... OFFSET is incorrect when the subquery contains LIMIT. #9698
    • StarRocks is case-insensitive when querying ORC files. #12724
    • BE may crash when RuntimeFilter is closed without invoking the prepare method. #12906
    • BE may crash because of memory leak. #12906
    • The query result may be incorrect after you add a new column and immediately delete data. #12907
    • BE may crash because of sorting data. #11185
    • If StarRocks and MySQL client are not on the same LAN, the loading job created by using INSERT INTO SELECT can not be terminated successfully by executing KILL only once. #11879
    • The metrics ScanRows collected by pipeline engine in audit log may be wrong. #12185
  • 2.2.9(Nov 16, 2022)

    Improvements

    • Added the session variable hive_partition_stats_sample_size to control the number of Hive partitions from which to collect statistics. An excessive number of partitions will cause errors in obtaining Hive metadata. #12700
    • Elasticsearch external tables support custom time zones. #12662

    Bug Fixes

    The following bugs are fixed:

    • The DECOMMISSION operation is stuck if an error occurs during metadata synchronization for external tables. #12369
    • Compaction crashes if a column that is newly added is deleted. #12907
    • SHOW CREATE VIEW does not display the comments that were added when creating the view. #4163
    • Memory leak in Java UDF may cause OOM. #12418

    Behavior Changes

    Extended the length of Hive STRING columns that can be queried by StarRocks from 64 KB to 1 MB. If a STRING column exceeds 1 MB, it will be processed as a null column during queries. #12986

  • 2.4.0(Oct 20, 2022)

    New Features

    • Supports creating a materialized view based on multiple base tables to accelerate queries with JOIN operations.
    • Supports overwriting data via INSERT OVERWRITE.
    • [Preview] Provides stateless Compute Nodes (CN) that can be horizontally scaled. You can use StarRocks Operator to deploy CN into your Kubernetes (K8s) cluster to achieve automatic horizontal scaling.
    • Outer Join supports non-equi joins in which join items are related by comparison operators including <, <=, >, >=, and <>.
    • Supports creating Iceberg catalogs and Hudi catalogs, which allow direct queries on data from Apache Iceberg and Apache Hudi.
    • Supports querying ARRAY-type columns from Apache Hive™ tables in CSV format.
    • Supports viewing the schema of external data via DESC.
    • Supports granting a specific role or IMPERSONATE permission to a user via GRANT and revoking them via REVOKE, and supports executing an SQL statement with IMPERSONATE permission via EXECUTE AS.
    • Supports FQDN access: now you can use a domain name or the combination of hostname and port as the unique identification of a BE or an FE node. This prevents access failures caused by changing IP addresses.
    • flink-connector-starrocks supports Primary Key model partial update.
    • Provides the following new functions:
      • array_contains_all: checks whether a specific array is a subset of another.
      • percentile_cont: calculates the percentile value with linear interpolation.
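The semantics of the two functions can be sketched in a few lines. This is an illustration under assumptions, not the engine's code: the argument order of `array_contains_all` (container first, candidate subset second) is assumed, and `percentile_cont` follows the common SQL definition of linear interpolation between the two nearest ranks, ignoring NULL handling.

```python
import math
from typing import List, Sequence


def array_contains_all(arr: Sequence, subset: Sequence) -> bool:
    """True if every element of `subset` also appears in `arr`."""
    return set(subset) <= set(arr)


def percentile_cont(values: List[float], p: float) -> float:
    """Percentile with linear interpolation between neighboring ranks."""
    if not values or not 0.0 <= p <= 1.0:
        raise ValueError("need a non-empty input and 0 <= p <= 1")
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)   # fractional rank, 0-based
    lo, hi = math.floor(pos), math.ceil(pos)
    # Interpolate between the two surrounding values.
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)
```

For the input 1, 2, 3, 4 and p = 0.5 the fractional rank is 1.5, so the result lies halfway between 2 and 3.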

    Improvements

    • The Primary Key model supports flushing VARCHAR-type primary key indexes to disks. From version 2.4.0, the Primary Key model supports the same data types for primary key indexes regardless of whether the persistent primary key index is turned on or not.
    • Optimized the query performance on external tables.
      • Supports late materialization during queries on external tables in Parquet format to optimize the query performance on data lakes with small-scale filtering involved.
      • Small I/O operations can be merged to reduce the delay for querying data lakes, thereby improving the query performance on external tables.
    • Optimized the performance of window functions.
    • Optimized the performance of Cross Join by supporting predicate pushdown.
    • Histograms are added to CBO statistics. Full statistics collection is further optimized.
    • Adaptive multi-threading is enabled for tablet scanning to reduce the dependency of scanning performance on the tablet number. As a result, you can set the number of buckets more easily.
    • Supports querying compressed TXT files in Apache Hive.
    • Adjusted the mechanisms of default PageCache size calculation and memory consistency check to avoid OOM issues during multi-instance deployments.
    • Improved the performance of large-size batch load on the PRIMARY KEY model up to two times by removing final_merge operations.
    • Supports a Stream Load transaction interface to implement a two-phase commit (2PC) for transactions that are run to load data from external systems such as Apache Flink® and Apache Kafka®, improving the performance of highly concurrent stream loads.
    • Functions:
      • You can use COUNT DISTINCT over multiple columns to calculate the number of distinct column combinations.
      • Window functions min() and max() support sliding windows.
      • Optimized the performance of the window_funnel function.
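
As a rough model of the first two function items above, the Python sketch below counts distinct column combinations and computes a sliding-window minimum. It only illustrates the semantics; the function names are hypothetical, NULL handling is not modeled, and the deque-based algorithm is a common textbook technique rather than a description of how StarRocks implements min() over sliding windows.

```python
from collections import deque

def count_distinct_combinations(rows):
    """Count distinct tuples of column values (NULL handling not modeled)."""
    return len({tuple(row) for row in rows})

def sliding_min(values, width):
    """Minimum over the current row and the (width - 1) rows before it."""
    candidates = deque()   # row indexes whose values increase left to right
    result = []
    for i, value in enumerate(values):
        # Drop candidates that can never be the minimum again.
        while candidates and values[candidates[-1]] >= value:
            candidates.pop()
        candidates.append(i)
        # Drop the front candidate once it slides out of the window.
        if candidates[0] <= i - width:
            candidates.popleft()
        result.append(values[candidates[0]])
    return result

rows = [("a", 1), ("a", 1), ("a", 2), ("b", 1)]
print(count_distinct_combinations(rows))
print(sliding_min([4, 2, 5, 1, 3], width=2))
```

A sliding maximum follows the same pattern with the comparison reversed.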

    Bug Fixes

    The following bugs are fixed:

    • DECIMAL data types returned by DESC are different from those specified in the CREATE TABLE statement. #7309
    • FE metadata management issues that affect the stability of FE. #6685 #9445 #7974 #7455
    • Data load-related issues:
      • Broker Load fails when an ARRAY-type column is set. #9158
      • Replicas are inconsistent after data is loaded to a non-Duplicate Key table via Broker Load. #8714
      • Executing ALTER ROUTINE LOAD raises NPE. #7804
    • Data Lake analytic-related issues:
      • Queries on Parquet-format data in Hive external tables fail. #7413 #7482 #7624
      • Incorrect results are returned for queries with a LIMIT clause on an Elasticsearch external table. #9226
      • An unknown error is raised during queries on an Apache Iceberg table with a complex data type. #11298
    • Metadata can be inconsistent between the Leader FE and Follower FE nodes. #11215
    • BE crashes when BITMAP type data size is larger than 2GB. #11178

    Behavior Change

    • Page Cache is enabled by default. The default cache size is 20% of the system memory.

    Others

    • Announcing the stable release of Resource Group.
    • Announcing the stable release of the JSON data type and its related functions.
  • 2.2.8(Oct 20, 2022)

    Bug Fixes

    The following bugs are fixed:

    • BEs may crash if an expression encounters an error in the initial stage. #11395
    • BEs may crash if invalid JSON data is loaded. #10804
    • Parallel writing encounters an error when the pipeline engine is enabled. #11451
    • BEs crash when the ORDER BY NULL LIMIT clause is used. #11648
    • BEs crash if the column type defined in the external table is different from the column type in the source Parquet file. #11839
  • 2.3.3(Oct 10, 2022)

    Bug Fixes

    The following bugs are fixed:

    • Query results may be inaccurate when you query a Hive external table stored as a text file. #11546

    • Nested arrays are not supported when you query Parquet files. #10983

    • Queries may time out if concurrent queries that read data from StarRocks and from external data sources are routed to the same resource group, or if a single query reads data from both StarRocks and external data sources. #10983

    • When the Pipeline execution engine is enabled by default, the parameter parallel_fragment_exec_instance_num is changed to 1, which slows down data loading via INSERT INTO. #11462

    • BEs may crash if an error occurs when an expression is initialized. #11396

    • The error heap-buffer-overflow may occur if you execute ORDER BY LIMIT. #11185

    • Schema change fails if you restart Leader FE in the meantime. #11561

  • 2.2.7(Sep 27, 2022)

    Bug Fixes

    The following bugs are fixed:

    • Data may be lost when JSON data is loaded. #11054
    • The output from SHOW FULL TABLES is incorrect. #11126
    • In previous versions, to access data in a view, users must have permissions on both the base tables and the view. In the current version, users are only required to have permissions on the view. #11290
    • The result from a complex query that is nested with EXISTS or IN is incorrect. #11415
    • REFRESH EXTERNAL TABLE fails when the schema of the corresponding Hive table is changed. #11406
    • An error may occur when a non-leader FE replays the bitmap index creation operation. #11261
  • 2.2.6(Sep 27, 2022)

    Bug Fixes

    The following bugs are fixed:

    • The result of ORDER BY ... LIMIT ... OFFSET is incorrect when the subquery contains LIMIT.
    • The BE crashes when partial update is performed on a table with large data volume.
    • Compaction causes memory to overflow when the size of BITMAP data exceeds 2 GB.
    • The like() and regexp() functions do not work if the pattern length exceeds 16 KB.

    Behavior Changes

    • The format used to represent array values in the output is modified. Escape characters are no longer used in the returned JSON values. For example, [{"k1":"v1"}] is returned, instead of "[{"k1":"v1"}]".
  • 2.3.2(Sep 7, 2022)

    New Features

    • Late materialization is supported to accelerate range filter-based queries on external tables in Parquet format. #9738
    • The SHOW AUTHENTICATION statement is added to display user authentication-related information. #9996

    Improvements

    • A configuration item is provided to control whether StarRocks recursively traverses all data files for the bucketed Hive table from which StarRocks queries data. #10239
    • The resource group type realtime is renamed as short_query. #10247
    • StarRocks no longer distinguishes between uppercase letters and lowercase letters in Hive external tables by default. #10187

    Bug Fixes

    The following bugs are fixed:

    • Queries on an Elasticsearch external table may unexpectedly exit when the table is divided into multiple shards. #10369
    • StarRocks throws errors when sub-queries are rewritten as common table expressions (CTEs). #10397
    • StarRocks throws errors when a large amount of data is loaded. #10370 #10380
    • When the same Thrift service IP address is configured for multiple catalogs, deleting one catalog invalidates the incremental metadata updates in the other catalogs. #10511
    • The statistics of memory consumption from BEs are inaccurate. #9837
    • StarRocks throws errors for queries on Primary Key tables. #10811
    • Queries on logical views are not allowed even when you have SELECT permissions on these views. #10563
    • StarRocks does not impose limits on the naming of logical views. Now logical views need to follow the same naming conventions as tables. #10558
  • 2.1.13(Sep 6, 2022)

    Release date: September 6, 2022

    Improvements

    • Added a BE configuration item enable_check_string_lengths to check the length of loaded data. This mechanism helps prevent compaction failures caused by VARCHAR data size out of range. #10380
    • Optimized the query performance when a query contains more than 1000 OR operators. #9332
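
The length check described for enable_check_string_lengths can be pictured as a filter applied to rows before they are written. The Python sketch below is a hypothetical illustration, not the BE implementation: the helper names are invented, and measuring the limit in UTF-8 bytes is an assumption.

```python
def exceeds_varchar_limit(value, max_bytes):
    """Return True if the UTF-8 encoding of `value` is longer than `max_bytes`."""
    return len(value.encode("utf-8")) > max_bytes

def filter_rows(rows, column, max_bytes):
    """Split rows into (accepted, rejected) by the byte length of one column."""
    accepted, rejected = [], []
    for row in rows:
        too_long = exceeds_varchar_limit(row[column], max_bytes)
        (rejected if too_long else accepted).append(row)
    return accepted, rejected

rows = [{"name": "hello"}, {"name": "héllo"}, {"name": "hi"}]
accepted, rejected = filter_rows(rows, "name", max_bytes=5)
print(accepted)
print(rejected)
```

Note that "héllo" has five characters but six UTF-8 bytes, which is why a byte-based check rejects it for a limit of 5.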

    Bug Fixes

    The following bugs are fixed:

    • An error may occur and BEs may crash when you query ARRAY columns (calculated by using the REPLACE_IF_NOT_NULL function) from a table using the Aggregate Key Model. #10144
    • The query result is incorrect if more than one IFNULL() function is nested in the query. #5028 #10486
    • After a dynamic partition is truncated, the number of tablets in the partition changes from the value configured by dynamic partitioning to the default value. #10435
    • If the Kafka cluster is stopped when you use Routine Load to load data into StarRocks, deadlocks may occur, affecting query performance. #8947
    • An error occurs when a query contains both subqueries and ORDER BY clauses. #10066

    Full Changelog: https://github.com/StarRocks/starrocks/compare/2.1.12...2.1.13

    Thanks to

    @Astralidea, @Linkerist, @Seaven, @ZiheLiu, @amber-create, @dulong41, @meegoo, @mergify, @rickif, @trueeyu, @xiaoyong-z

  • 2.3.1(Aug 22, 2022)

    Release date: August 22, 2022

    Improvements

    • Broker Load supports transforming the List type in Parquet files into non-nested ARRAY data type. #9150
    • Optimizes the performance of JSON-related functions (json_query, get_json_string, and get_json_int). #9623
    • Optimizes the exception message: during a query on Hive, Iceberg, or Hudi, if the data type of the column in the query is not supported by StarRocks, the system throws an exception on the corresponding column. #10139
    • Reduces the schedule latency of resource group to optimize the resource isolation performance. #10122

    Bug Fixes

    • Incorrect results are returned for queries on an Elasticsearch external table because the LIMIT operator is pushed down incorrectly. #9952
    • Queries on an Oracle external table fail when the LIMIT operator is used. #9542
    • BE is blocked when all Kafka Brokers are stopped during a Routine Load. #9935
    • BE crashes during a query on a Parquet file whose data type does not match that of the corresponding external table. #10107
    • A query hangs because the external table scan range is empty. #10091
    • The system throws an exception when an ORDER BY clause is included in a subquery. #10180
    • Hive Metastore hangs when the Hive metadata is reloaded asynchronously. #10132
  • 2.2.5(Aug 19, 2022)

    Release date: August 18, 2022

    Improvements

    • Improved the system performance when the pipeline engine is enabled. #9580
    • Improved the accuracy of memory usage metrics for index metadata. #9837

    Bug Fixes

    The following bugs are fixed:

    • BEs may be stuck in querying Kafka partition offsets (get_partition_offset) during Routine Load. #9937
    • An error occurs when multiple Broker Load threads attempt to load the same HDFS file. #9507
  • 2.1.12(Aug 9, 2022)

    Release date: August 9, 2022

    Improvements

    • Added two parameters, bdbje_cleaner_threads and bdbje_replay_cost_percent, to speed up metadata cleanup in BDB JE.

    Bug Fixes

    The following bugs are fixed:

    • Some queries are forwarded to the Leader FE, causing the /api/query_detail action to return incorrect execution information about SQL statements such as SHOW FRONTENDS. #9185
    • After a BE is terminated, the current process is not completely terminated, resulting in a failed restart of the BE. #9175
    • When multiple Broker Load jobs are created to load the same HDFS data file, if one job encounters exceptions, the other jobs may not be able to properly read data and consequently fail. #9506
    • The related variables are not reset when the schema of a table changes, resulting in an error (no delete vector found tablet) when querying the table. #9192

    Full Changelog: https://github.com/StarRocks/starrocks/compare/2.1.11...2.1.12

    Thanks to

    @EsoragotoSpirit, @HangyuanLiu, @Pslydhh, @Seaven, @amber-create, @chaoyli, @decster, @dulong41, @femiiii, @gengjun-git, @hellolilyliuyi, @jaogoy, @liuyehcf, @minchowang, @padmejin, @rickif, @stdpain, @trueeyu, @xiaoyong-z, @yongbingwang

  • 2.0.9(Aug 6, 2022)

    Release date: August 6, 2022

    Bug Fixes

    The following bugs are fixed:

    • For a Broker Load job, if the broker is heavily loaded, internal heartbeats may time out, causing data loss. #8282
    • For a Broker Load job, if the destination StarRocks table does not have the column specified by the COLUMNS FROM PATH AS parameter, the BEs stop running. #5346
    • Some queries are forwarded to the Leader FE, causing the /api/query_detail action to return incorrect execution information about SQL statements such as SHOW FRONTENDS. #9185
    • When multiple Broker Load jobs are created to load the same HDFS data file, if one job encounters exceptions, the other jobs may not be able to properly read data either and consequently fail. #9506

    Full Changelog: https://github.com/StarRocks/starrocks/compare/2.0.8...2.0.9

    Thanks to

    @dulong41, @gengjun-git, @liuyehcf, @padmejin, @trueeyu

  • 2.2.4(Aug 4, 2022)

    Improvements

    • Supports synchronizing schema changes on Hive table to the corresponding external table. #9010
    • Supports loading ARRAY data in Parquet files via Broker Load. #9131

    Bug Fixes

    The following bugs are fixed:

    • Broker Load cannot handle Kerberos logins with multiple keytab files. #8820 #8837
    • Supervisor may fail to restart services if stop_be.sh exits immediately after it is executed. #9175
    • Incorrect Join Reorder precedence causes error "Column cannot be resolved". #9063 #9487
  • 2.3.0(Jul 28, 2022)

    New Features

    • The Primary Key model supports the full DELETE WHERE syntax. #4500
    • The Primary Key model supports persistent primary key indexes to prevent excessive memory consumption. #3399
    • When a Routine Load job is executed, and a global dictionary is built to optimize queries on low-cardinality columns, the global dictionary can be updated, thus improving query performance.
    • The CREATE TABLE AS SELECT statement can be executed asynchronously and write results to a new table. #5543
    • Support the following resource group-related features:
      • Monitor resource groups: You can use the audit log to view which resource group a query runs in, and call API operations to obtain the monitoring metrics of specific resource groups.
      • Limit the consumption of large queries on CPU, memory, or I/O resources: You can route queries to resource groups by matching classifiers or configure session variables to directly specify resource groups for queries.
    • StarRocks provides JDBC external tables to query Oracle, PostgreSQL, MySQL, SQLServer, Clickhouse, and other databases. StarRocks also supports predicate pushdown when performing queries. #3352 #3353 #3354 #3723 #3724 #3725
    • [Preview] A new Data Source Connector framework is released to support user-defined external catalogs. You can use the external catalogs to directly access and analyze Hive without creating external tables. #5062 #7268
    • Add the following functions:
      • window_funnel
      • ntile
      • bitmap_union_count, array_to_bitmap (#7404), base64_to_bitmap
      • week, time_slice
    • Add the EXECUTE AS statement. After you use the GRANT statement to grant the permission to impersonate a specific user, you can use the EXECUTE AS statement to switch the execution context of the current session to that user.
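
Of the functions added in the list above, ntile is easy to model. The Python sketch below follows the usual SQL definition of NTILE (rows are split into buckets as evenly as possible, with the earlier buckets receiving the extra rows). That StarRocks follows this convention exactly is an assumption, and the helper is hypothetical.

```python
def ntile(num_rows, buckets):
    """Return the 1-based bucket number for each of `num_rows` ordered rows."""
    if buckets <= 0:
        raise ValueError("buckets must be positive")
    base, extra = divmod(num_rows, buckets)
    assignment = []
    for bucket in range(1, buckets + 1):
        # The first `extra` buckets each take one additional row.
        size = base + (1 if bucket <= extra else 0)
        assignment.extend([bucket] * size)
    return assignment

print(ntile(7, 3))
```

Seven rows in three buckets give bucket sizes of 3, 2, and 2. When there are fewer rows than buckets, the trailing buckets stay empty.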

    Improvements

    • The compaction mechanism can merge large metadata more quickly. This prevents metadata squeezing and excessive disk usage that can occur shortly after frequent data updates.
    • The performance of loading Parquet files and compressed files is optimized. #3945
    • The mechanism of creating materialized views is optimized. After the optimization, materialized views can be created at a speed up to 10 times higher than before.
    • The performance of the following operators is optimized:
      • TopN and sort operators #3953 #4106 #4227 #4680
      • Equivalence comparison operators that contain functions can use Zone Map indexes when these operators are pushed down to scan operators.
    • Optimize the Apache Hive™ external tables.
      • When Apache Hive™ tables are stored in Parquet, ORC, or CSV format, StarRocks can synchronize schema changes such as ADD COLUMN and REPLACE COLUMN from the Hive tables when you execute the REFRESH statement on the external tables.
      • Hive resources' hive.metastore.uris can be modified. #3843
    • Optimize the Apache Iceberg external tables. A custom catalog can be used to create an Iceberg resource. #5062
    • Optimize the Elasticsearch external tables. Sniffing the addresses of the data nodes in an Elasticsearch cluster can be disabled. #6314
    • When the sum() function calculates numeric strings, implicit conversion is performed.
    • The year, month, and day functions support the DATE data type.

    Bug Fixes

    The following bugs are fixed:

    • CPU utilization increases abnormally due to an excessive number of tablets. #5875
    • Problems cause the "fail to prepare tablet reader" error message to occur. #7248 #7854 #8257
    • The FEs fail to restart. #5642 #4969 #5580
    • The CTAS statement cannot be run successfully when the statement includes a JSON function. #6498

    Others

    • [Preview] StarGo, a cluster management tool, can deploy, start, upgrade, and roll back clusters and manage multiple clusters.
    • StarRocks clusters can be quickly deployed on AWS by using CloudFormation.
    • Pipeline Engine is now enabled by default.
  • 2.2.3(Jul 27, 2022)

    Release date: July 25, 2022

    Bug Fixes

    The following bugs are fixed:

    • An error occurs when users delete a resource group. #8036
    • Thrift server exits when the number of threads is insufficient. #7974
    • In some scenarios, join reorder in CBO returns no results. #7099 #7831 #6866
  • 2.0.8(Jul 15, 2022)

    Release date: July 15, 2022

    Bug Fixes

    The following bugs are fixed:

    • Repeatedly switching the Leader FE node may cause all load jobs to hang and fail. #7350
    • BE crashes when the memory usage estimation of MemTable exceeds 4GB, because, during a data skew in load, some fields may occupy a large amount of memory resources. #7161
    • After restarting FEs, the schemas of materialized views changed due to incorrect parsing of uppercase and lowercase letters. #7362
    • When you load JSON data from Kafka into StarRocks by using Routine Load, if there are blank rows in the JSON data, the data after the blank rows will be lost. #8534

    Full Changelog: https://github.com/StarRocks/starrocks/compare/2.0.7...2.0.8

    Thanks to

    @Astralidea, @HangyuanLiu, @Seaven, @gengjun-git, @imay, @mergify, @padmejin, @rickif, @stdpain, @trueeyu, @xiaoyong-z

  • 2.1.11(Jul 9, 2022)

    Release date: July 9, 2022

    Bug Fixes

    The following bugs are fixed:

    • Data loading into a table of the Primary Key model is suspended in the event of frequent data loads into that table. #7763
    • Aggregate expressions are processed in an incorrect sequence during low-cardinality optimization, causing the count distinct function to return unexpected results. #7659
    • No results are returned for the LIMIT clause, because the pruning rule in the clause cannot be properly processed. #7894
    • If the global dictionary for low-cardinality optimization is applied on columns that are defined as join conditions for a query, the query returns unexpected results. #8302

    Full Changelog: https://github.com/StarRocks/starrocks/compare/2.1.10...2.1.11

    Thanks to

    @HangyuanLiu, @Seaven, @Youngwb, @chaoyli, @decster, @gengjun-git, @imay, @kangkaisen, @mergify, @rickif, @sduzh, @stdpain, @trueeyu

  • 2.2.2(Jun 30, 2022)

    Release date: June 29, 2022

    Improvements

    • UDFs can be used across databases. #6865 #7211
    • Optimized concurrency control for internal processing such as schema change. This reduces pressure on FE metadata management. In addition, the possibility that load jobs may pile up or slow down is reduced in scenarios where huge volume of data needs to be loaded at high concurrency. #6838

    Bug Fixes

    The following bugs are fixed:

    • The number of replicas (replication_num) created by using CTAS is incorrect. #7036
    • Metadata may be lost after ALTER ROUTINE LOAD is performed. #7068
    • Runtime filters fail to be pushed down. #7206 #7258
    • Pipeline issues that may cause memory leaks. #7295
    • Deadlock may occur when a Routine Load job is aborted. #6849
    • Some profile statistics information is inaccurate. #7074 #6789
    • The get_json_string function incorrectly processes JSON arrays.

    Full Changelog: https://github.com/StarRocks/starrocks/compare/2.2.1...2.2.2

    Thanks to

    @Astralidea, @DorianZheng, @HangyuanLiu, @Seaven, @Youngwb, @ZiheLiu, @caneGuy, @decster, @dirtysalt, @gengjun-git, @huangfeng1993, @imay, @liuyehcf, @meegoo, @mergify, @mofeiatwork, @rickif, @satanson, @sduzh, @sevev, @stdpain, @trueeyu, @xiaoyong-z

  • 2.1.10(Jun 27, 2022)

    Release date: June 24, 2022

    Bug Fixes

    The following bugs are fixed:

    • Repeatedly switching the Leader FE node may cause all load jobs to hang and fail. #7350
    • A field of the DECIMAL(18,2) type is shown as DECIMAL64(18,2) when the table schema is checked with DESC. #7309
    • BE crashes when the memory usage estimation of MemTable exceeds 4GB, because, during a data skew in load, some fields may occupy a large amount of memory resources. #7161
    • A large number of small segment files are created due to the overflow in the calculation of the max_rows_per_segment when there are many input rows in a compaction. #5610

    Full Changelog: https://github.com/StarRocks/starrocks/compare/2.1.8...2.1.10

    Thanks to

    @Seaven, @chaoyli, @decster, @gengjun-git, @mergify, @mofeiatwork, @satanson, @sduzh, @trueeyu, @xiaoyong-z

  • 2.1.8(Jun 8, 2022)

    Release date: June 10, 2022

    Improvements

    • The concurrency control mechanism used for internal processing workloads such as schema changes is optimized to reduce the pressure on frontend (FE) metadata. As such, load jobs are less likely to pile up and slow down if these load jobs are concurrently run to load a large amount of data. #6560 #6804
    • The performance of StarRocks in loading data at a high frequency is improved. #6532 #6533

    Bug Fixes

    The following bugs are fixed:

    • ALTER operation logs do not record all information about LOAD statements. Therefore, after you perform an ALTER operation on a routine load job, the metadata of the job is lost after checkpoints are created. #6936
    • A deadlock may occur if you stop a routine load job. #6450
    • By default, a backend (BE) uses the default UTC+8 time zone for a load job. If your server uses the UTC time zone, 8 hours are added to the timestamps in the DateTime column of the table that is loaded by using a Spark load job. #6592
    • The GET_JSON_STRING function cannot process non-JSON strings. If you extract a JSON value from a JSON object or array, the function returns NULL. The function has been optimized to return an equivalent JSON-formatted STRING value for a JSON object or array. #6426
    • If the data volume is large, a schema change may fail due to excessive memory consumption. Optimizations have been made to allow you to specify memory consumption limits at all stages of a schema change. #6705
    • If the number of duplicate values in a column of a table that is being compacted exceeds 0x40000000, the compaction is suspended. #6513
    • After an FE restarts, it encounters high I/O and abnormally increasing disk usage due to a few issues in BDB JE v7.3.8 and shows no sign of restoring to normal. The FE is restored to normal after it rolls back to BDB JE v7.3.7. [#6634](https://github.com/StarRocks/starrocks/issues/6634)
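
The 8-hour shift described in the time zone item above (#6592) is the kind of offset that appears when a UTC timestamp is rendered in UTC+8. The Python sketch below only illustrates that arithmetic with the standard library; the sample timestamp is made up, and the sketch says nothing about where in the load path the conversion actually happened.

```python
from datetime import datetime, timedelta, timezone

UTC_PLUS_8 = timezone(timedelta(hours=8))

def wall_clock_in(ts, tz):
    """Return the naive wall-clock time of an aware timestamp in zone `tz`."""
    return ts.astimezone(tz).replace(tzinfo=None)

# A value written on a server that runs in UTC (sample value).
source = datetime(2022, 6, 10, 12, 0, tzinfo=timezone.utc)

# The same instant rendered in the UTC+8 default used by the load job.
loaded = wall_clock_in(source, UTC_PLUS_8)

print(source.replace(tzinfo=None), "->", loaded)
print(loaded - source.replace(tzinfo=None))
```

The two wall-clock values differ by exactly eight hours even though they describe the same instant.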

    Full Changelog: https://github.com/StarRocks/starrocks/compare/2.1.7...2.1.8

    Thanks to

    @Astralidea, @Linkerist, @RowenWoo, @chaoyli, @gengjun-git, @meegoo, @mergify, @rickif, @stdpain, @xiaoyong-z

  • 2.0.7(Jun 13, 2022)

    Release date: June 13, 2022

    Bug Fixes

    The following bugs are fixed:

    • If the number of duplicate values in a column of a table that is being compacted exceeds 0x40000000, the compaction is suspended. #6513
    • After an FE restarts, it encounters high I/O and abnormally increasing disk usage due to a few issues in BDB JE v7.3.8 and shows no sign of restoring to normal. The FE is restored to normal after it rolls back to BDB JE v7.3.7. #6634

    Full Changelog: https://github.com/StarRocks/starrocks/compare/2.0.6...2.0.7

    Thanks to

    @Astralidea, @RowenWoo, @gengjun-git, @meegoo, @mergify, @xiaoyong-z

  • 2.2.1(Jun 6, 2022)

    Release date: June 2, 2022

    Improvements

    • Optimized the data loading performance and reduced long tail latency by reconstructing part of the hotspot code and reducing lock granularity. #6641
    • Added the CPU and memory usage information of the machines on which BEs are deployed for each query to the FE audit log. #6208 #6209
    • Supported JSON data types in the tables that use the Primary Key model and tables that use the Unique Key model. #6544
    • Reduced the load on FEs by reducing lock granularity and deduplicating BE report requests. Optimized the report performance when a large number of BEs are deployed, and solved the issue of Routine Load tasks getting stuck in a large cluster. #6293

    Bug Fixes

    The following bugs are fixed:

    • An error occurs when StarRocks parses the escape characters specified in the SHOW FULL TABLES FROM DatabaseName statement. #6559
    • FE disk space usage rises sharply (fixed by rolling back the BDB JE version). #6708
    • BEs become faulty because relevant fields cannot be found in the data returned after columnar scanning is enabled (enable_docvalue_scan=true). #6600

    Full Changelog: https://github.com/StarRocks/starrocks/compare/2.2.0...2.2.1

    Thanks to

    @Astralidea, @HangyuanLiu, @Linkerist, @RowenWoo, @Seaven, @Youngwb, @ZiheLiu, @dirtysalt, @gengjun-git, @meegoo, @mergify, @mofeiatwork, @rickif, @sevev, @trueeyu, @xiaoyong-z

  • 2.1.7(May 26, 2022)

    Release date: May 26, 2022

    Improvements

    • For window functions in which the frame is set to ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, if the partition involved in a calculation is large, StarRocks caches all data of the partition before it performs the calculation. In this situation, a large amount of memory is consumed. StarRocks has been optimized not to cache all data of the partition in this situation. #5829
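
To see why this frame does not need the whole partition in memory, consider a running sum: each output row depends only on the previous result and the current row. The Python sketch below illustrates that idea with a generator. It is a conceptual model under that assumption, not a description of the StarRocks execution engine.

```python
def running_sum(partition_rows):
    """Model SUM(x) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).

    Rows are consumed one at a time, so only the running total is kept
    in memory instead of the whole partition.
    """
    total = 0
    for value in partition_rows:
        total += value
        yield total

# The input can be any iterator, for example rows streamed from a scan.
print(list(running_sum(iter([3, 1, 4, 1, 5]))))
```

The same pattern applies to other aggregates that can be updated incrementally, such as COUNT, MIN, and MAX.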

    Bug Fixes

    The following bugs are fixed:

    • When data is loaded into a table that uses the Primary Key model, data processing errors may occur if the creation time of each data version stored in the system does not monotonically increase due to reasons such as backward-moved system time and related unknown bugs. Such data processing errors cause backends (BEs) to stop. #6046
    • Some graphical user interface (GUI) tools automatically configure the set_sql_limit variable. As a result, the SQL statement ORDER BY LIMIT is ignored, and consequently an incorrect number of rows are returned for queries. #5966
    • If the DROP SCHEMA statement is executed on a database, the database is forcibly deleted and cannot be restored. #6201
    • When JSON-formatted data is loaded, BEs stop if the data contains JSON format errors. For example, key-value pairs are not separated by commas (,). #6098
    • When a large amount of data is being loaded in a highly concurrent manner, tasks that are run to write data to disks are piled up on BEs. In this situation, the BEs may stop. #3877
    • StarRocks estimates the amount of memory that is required before it performs a schema change on a table. If the table contains a large number of STRING fields, the memory estimation result may be inaccurate. In this situation, if the estimated amount of memory that is required exceeds the maximum memory that is allowed for a single schema change operation, schema change operations that are supposed to be properly run encounter errors. #6322
    • After a schema change is performed on a table that uses the Primary Key model, a "duplicate key xxx" error may occur when data is loaded into that table. #5878
    • If low-cardinality optimization is performed during Shuffle Join operations, partitioning errors may occur. #4890
    • If a colocation group (CG) contains a large number of tables and data is frequently loaded into the tables, the CG may not be able to stay in the stable state. In this case, the JOIN statement does not support Colocate Join operations. StarRocks has been optimized to wait for a little longer during data loading. This way, the integrity of the tablet replicas to which data is loaded can be maximized.

    Full Changelog: https://github.com/StarRocks/starrocks/compare/2.1.6...2.1.7

    Thanks to

    @Astralidea, @HangyuanLiu, @Linkerist, @Youngwb, @chaoyli, @decster, @dirtysalt, @gengjun-git, @meegoo, @rickif, @sevev, @stdpain, @trueeyu, @xiaoyong-z

  • 2.0.6(May 26, 2022)

    Release date: May 25, 2022

    Bug Fixes

    The following bugs are fixed:

    • Some graphical user interface (GUI) tools automatically configure the set_sql_limit variable. As a result, the SQL statement ORDER BY LIMIT is ignored, and consequently an incorrect number of rows are returned for queries. #5966
    • If a colocation group (CG) contains a large number of tables and data is frequently loaded into the tables, the CG may not be able to stay in the stable state. In this case, the JOIN statement does not support Colocate Join operations. StarRocks has been optimized to wait for a little longer during data loading. This way, the integrity of the tablet replicas to which data is loaded can be maximized.
    • If a few replicas fail to be loaded due to reasons such as heavy loads or high network latencies, cloning on these replicas is triggered. In this case, deadlocks may occur, which may cause a situation in which the loads on processes are low but a large number of requests time out. #5646 #6290
    • After the schema of a table that uses the Primary Key model is changed, a "duplicate key xxx" error may occur when data is loaded into that table. #5878
    • If the DROP SCHEMA statement is executed on a database, the database is forcibly deleted and cannot be restored. #6201

    Full Changelog: https://github.com/StarRocks/starrocks/compare/2.0.5...2.0.6

    Thanks to

    @Astralidea, @Linkerist, @chaoyli, @decster, @dirtysalt, @gengjun-git, @sevev, @stdpain

  • 2.2.0(May 25, 2022)

    New Features

    • [Preview] Resource groups are supported. By using resource groups to control CPU and memory usage, StarRocks can achieve resource isolation and rational use of resources when different tenants perform complex and simple queries in the same cluster.
    • [Preview] Java UDFs (user-defined functions) are supported. StarRocks supports writing UDFs in Java, extending StarRocks' functions.
    • [Preview] Primary key model supports partial updates when data is loaded to the primary key model using Stream Load, Broker Load, and Routine Load. In real-time data update scenarios such as updating orders and joining multiple streams, partial updates allow users to update only a few columns.
    • [Preview] JSON data types and JSON functions are supported.
    • External tables based on Apache Hudi are supported, which further improves data lake analytics experience.
    • The following functions are supported:
      • ARRAY functions, including array_agg, array_sort, array_distinct, array_join, reverse, array_slice, array_concat, array_difference, array_overlap, and array_intersect
      • BITMAP functions, including bitmap_max and bitmap_min
      • Other functions, including retention and square
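
The partial-update behavior described above for the Primary Key model can be pictured with a small Python sketch. It only illustrates the idea of an upsert that touches the supplied columns and is not how StarRocks implements it. The function name `partial_upsert`, the sample `orders` data, and the choice to fill unsupplied columns of a new row with `None` are all assumptions made for illustration.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def partial_upsert(table: Dict[Any, Row], key_col: str, incoming: Row, defaults: Row) -> None:
    """Merge `incoming` into `table`, touching only the columns it supplies."""
    key = incoming[key_col]
    if key in table:
        # Existing primary key: overwrite the supplied columns, keep the rest.
        table[key].update(incoming)
    else:
        # New primary key: columns that were not supplied come from `defaults`
        # (an assumption of this sketch).
        table[key] = {**defaults, **incoming}


# Hypothetical table keyed by order_id.
orders: Dict[Any, Row] = {
    1: {"order_id": 1, "status": "created", "amount": 30},
}
defaults: Row = {"order_id": None, "status": None, "amount": None}

# Update only the status of order 1; amount is left untouched.
partial_upsert(orders, "order_id", {"order_id": 1, "status": "paid"}, defaults)
# Insert order 2 with only the amount known so far.
partial_upsert(orders, "order_id", {"order_id": 2, "amount": 12}, defaults)
```

After both calls, order 1 keeps its original amount and carries the new status, while order 2 exists with an empty status. This mirrors the order-update and stream-joining scenarios mentioned above, where each source supplies only some of the columns.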

    Improvements

    • The CBO's Parser and Analyzer are restructured, the code structure is optimized, and syntax such as INSERT with CTE is supported. These changes improve the performance of complex queries, such as those that reuse common table expressions (CTEs).
    • Query performance on Apache Hive external tables backed by object storage (AWS S3, Alibaba Cloud OSS, Tencent COS) is optimized. After optimization, the performance of object storage-based queries is comparable to that of HDFS-based queries. Late materialization of ORC files is also supported, improving query performance on small files.
    • When external tables are used to query Apache Hive, StarRocks supports automatic and incremental updating of cached metastore data by consuming Hive Metastore events, such as data changes and partition changes. Moreover, it also supports querying DECIMAL and ARRAY data in Apache Hive.
    • The performance of the UNION ALL operator is optimized, delivering an improvement of 2 to 25 times.
    • The pipeline engine, which adaptively adjusts query parallelism, is released, and its profile is optimized. The pipeline engine can improve performance for small queries in high-concurrency scenarios.
    • StarRocks supports the loading of CSV files with multi-character row delimiters.
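
To make the multi-character row delimiter item concrete, here is a minimal Python sketch of what splitting on such a delimiter means. It is not StarRocks' loader: the function `split_rows`, the delimiter `|#|`, and the sample payload are hypothetical, and quoting or escaping rules are deliberately ignored.

```python
from typing import List


def split_rows(payload: str, row_delimiter: str, column_separator: str = ",") -> List[List[str]]:
    """Split a CSV-like payload into rows of fields using a multi-character row delimiter."""
    if not row_delimiter:
        raise ValueError("row_delimiter must be non-empty")
    records = payload.split(row_delimiter)
    # A trailing delimiter leaves one empty record at the end; drop it.
    if records and records[-1] == "":
        records.pop()
    return [record.split(column_separator) for record in records]


# The first record contains a newline, so "\n" cannot serve as the row delimiter here.
payload = "1,line one\nstill line one|#|2,second|#|"
rows = split_rows(payload, row_delimiter="|#|")
```

A multi-character delimiter is useful when field values may themselves contain newlines, as in the first record of this sample.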

    Bug Fixes

    The following bugs are fixed:

    • Deadlocks occur when data is loaded and changes are committed into tables based on the Primary Key model. #4998
    • Some FE (including BDBJE) stability issues. #4428, #4666, #2
    • The return value overflows when the SUM function is used to calculate a large amount of data. #3944
    • The return values of ROUND and TRUNCATE functions have precision issues. #4256
    • Some bugs detected by SQLancer. Please see SQLancer related issues.

    Others

    • The Flink connector flink-connector-starrocks supports Flink 1.14.
Owner
StarRocks
GridDB is a next-generation open source database that makes time series IoT and big data fast and easy.

Overview GridDB is a database for IoT with both a NoSQL interface and a SQL interface. Please refer to GridDB Features Reference for functionality. This rep

GridDB 2k Jan 8, 2023
Velox is a new C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.

Velox is a C++ database acceleration library which provides reusable, extensible, and high-performance data processing components

Facebook Incubator 2k Jan 8, 2023
A very fast lightweight embedded database engine with a built-in query language.

upscaledb 2.2.1 Fr 10. Mär 21:33:03 CET 2017 (C) Christoph Rupp, [email protected]; http://www.upscaledb.com This is t

Christoph Rupp 542 Dec 30, 2022
A redis module, similar to redis zset, but you can set multiple scores for each member to support multi-dimensional sorting

TairZset: Support multi-score sorting zset Introduction Chinese TairZset is a data structure developed based on the redis module. Compared with the na

Alibaba 60 Dec 1, 2022
The database built for IoT streaming data storage and real-time stream processing.

The database built for IoT streaming data storage and real-time stream processing.

HStreamDB 575 Dec 26, 2022
A mini database for learning database

A mini database for learning database

Chuckie Tan 4 Nov 14, 2022
SiriDB is a highly-scalable, robust and super fast time series database

SiriDB is a highly-scalable, robust and super fast time series database. Built from the ground up, SiriDB uses a unique mechanism to operate without a global index and allows server resources to be added on the fly. SiriDB's unique query language includes dynamic grouping of time series for easy analysis over large amounts of time series.

SiriDB 471 Jan 9, 2023
TimescaleDB is an open-source database designed to make SQL scalable for time-series data.

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.

Timescale 14.3k Jan 2, 2023
An open-source time series database that aims to be simple, easy to use, and high-performance; supports Linux and Windows. Time Series Database

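PinusDB time series database (pinusdb) PinusDB is a time series database designed for small and medium-scale scenarios (fewer than 100,000 devices, generating fewer than 1 billion records per day). Its design goals are simplicity, ease of use, and high performance. It is operated through SQL statements, has a very low learning and usage cost, and offers rich functionality and relatively high performance. Our goal is to become the simplest, easiest-to-use, and most robust single-node time series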

null 99 Nov 19, 2022
A friendly and lightweight C++ database library for MySQL, PostgreSQL, SQLite and ODBC.

QTL QTL is a C ++ library for accessing SQL databases and currently supports MySQL, SQLite, PostgreSQL and ODBC. QTL is a lightweight library that con

null 173 Dec 12, 2022
ObjectBox C and C++: super-fast database for objects and structs

ObjectBox Embedded Database for C and C++ ObjectBox is a superfast C and C++ database for embedded devices (mobile and IoT), desktop and server apps.

ObjectBox 152 Dec 23, 2022
dqlite is a C library that implements an embeddable and replicated SQL database engine with high-availability and automatic failover

dqlite dqlite is a C library that implements an embeddable and replicated SQL database engine with high-availability and automatic failover. The acron

Canonical 3.3k Jan 9, 2023
ESE is an embedded / ISAM-based database engine, that provides rudimentary table and indexed access.

Extensible-Storage-Engine A Non-SQL Database Engine The Extensible Storage Engine (ESE) is one of those rare codebases having proven to have a more th

Microsoft 792 Dec 22, 2022
Nebula Graph is a distributed, fast open-source graph database featuring horizontal scalability and high availability

Nebula Graph is an open-source graph database capable of hosting super large scale graphs with dozens of billions of vertices (nodes) and trillions of edges, with milliseconds of latency.

vesoft inc. 834 Dec 24, 2022
OceanBase is an enterprise distributed relational database with high availability, high performance, horizontal scalability, and compatibility with SQL standards.

What is OceanBase database OceanBase Database is a native distributed relational database. It is developed entirely by Alibaba and Ant Group. OceanBas

OceanBase 5.1k Jan 4, 2023
Config and tools for config of tasmota devices from mysql database

tasmota-sql Tools for management of tasmota devices based on mysql. The tasconfig command can load config from tasmota and store in sql, or load from

RevK 3 Jan 8, 2022
Serverless SQLite database read from and write to Object Storage Service, run on FaaS platform.

serverless-sqlite Serverless SQLite database read from and write to Object Storage Service, run on FaaS platform. NOTES: This repository is still in t

老雷 7 May 12, 2022
Trilogy is a client library for MySQL-compatible database servers, designed for performance, flexibility, and ease of embedding.

Trilogy is a client library for MySQL-compatible database servers, designed for performance, flexibility, and ease of embedding.

GitHub 482 Dec 31, 2022
DB Browser for SQLite (DB4S) is a high quality, visual, open source tool to create, design, and edit database files compatible with SQLite.

DB Browser for SQLite What it is DB Browser for SQLite (DB4S) is a high quality, visual, open source tool to create, design, and edit database files c

null 17.5k Jan 2, 2023