OrioleDB – building a modern cloud-native storage engine

Overview

OrioleDB – building a modern cloud-native storage engine

(... and solving some PostgreSQL wicked problems)

build status codecov dockerhub

OrioleDB is a new storage engine for PostgreSQL, bringing a modern approach to database capacity, capabilities and performance to the world's most-loved database platform.

OrioleDB consists of an extension, building on the innovative table access method framework and other standard Postgres extension interfaces. By extending and enhancing the current table access methods, OrioleDB opens the door to a future of more powerful storage models that are optimized for cloud and modern hardware architectures.

OrioleDB is currently distributed under the standard PostgreSQL license.

  1. Designed for modern hardware. OrioleDB design avoids legacy CPU bottlenecks on modern servers containing dozens and hundreds CPU cores, providing optimized usage of modern storage technologies such as SSD and NVRAM.

  2. Reduced maintenance needs. OrioleDB implements the concepts of undo log and page-mergins, eliminating the need for dedicated garbage collection processes. Additionally, OrioleDB implements default 64-bit transaction identifiers, thus eliminating the well-known and painful wraparound problem.

  3. Designed to be distributed. OrioleDB implements a row-level write-ahead log with support for parallel apply. This log architecture is optimized for raft consensus-based replication allowing the implementation of active-active multimaster.

The key technical differentiations of OrioleDB are as follows:

  1. No buffer mapping and lock-less page reading. In-memory pages in OrioleDB are connected with direct links to the storage pages. This eliminates the need for in-buffer mapping along with its related bottlenecks. Additionally, in OrioleDB in-memory page reading doesn't involve atomic operations. Together, these design decisions bring vertical scalability for Postgres to the whole new level.

  2. MVCC is based on the UNDO log concept. In OrioleDB, old versions of tuples do not cause bloat in the main storage system, but eviction into the undo log comprising undo chains. Page-level undo records allow the system to easily reclaim space occupied by deleted tuples as soon as possible. Together with page-mergins, these mechanisms eliminate bloat in the majority of cases. Dedicated VACUUMing of tables is not needed as well, removing a significant and common cause of system performance deterioration and database outages.

  3. Copy-on-write checkpoints and row-level WAL. OrioleDB utilizes copy-on-write checkpoints, which provides a structurally consistent snapshot of data every moment of time. This is friendly for modern SSDs and allows row-level WAL logging. In turn, row-level WAL logging is easy to parallelize (done), compact and suitable for active-active multimaster (planned).

See, usage and architecture documentation as well as PostgresBuild 2021 slides.

Status

OrioleDB now has public alpha status. It is recommended for experiments, testing, benchmarking, etc., but is not recommended for production usage. If you are interested in OrioleDB's benefits in production, please contact us.

Installation

Use docker container

We provide docker images for amd64 and arm64v8 architectures under Alpine Linux.

docker pull orioledb/orioledb

See our dockerhub for details.

Build from source

Before building and installing OrioleDB, one should ensure to have the following:

Typical installation procedure may look like this:

 $ git clone https://github.com/orioledb/orioledb
 $ cd orioledb
 $ make USE_PGXS=1
 $ make USE_PGXS=1 install
 $ make USE_PGXS=1 installcheck

Before starting working with OrioleDB, adding the following line to postgresql.conf is required. This change requires a restart of the PostgreSQL database server.

shared_preload_libraries = 'orioledb.so'

Setup

Run the following SQL query on the database to enable the OrioleDB engine.

CREATE EXTENSION orioledb;

Once the above steps are complete, you can start using OrioleDB's tables. See usage documentation for details.

CREATE TABLE table_name (...) USING orioledb;
Issues
  • Optimizer statistics difference

    Optimizer statistics difference

    After pgbench initialization estimated and actual rows are differs by two order of magnitude.

    postgres=# select n_live_tup from pg_stat_user_tables where relname = 'pgbench_accounts';
     n_live_tup 
    ------------
      100003298
    (1 row)
    
    postgres=# explain (analyze, buffers) select count(1) from pgbench_accounts;
                                                                  QUERY PLAN                                                               
    ---------------------------------------------------------------------------------------------------------------------------------------
     Aggregate  (cost=164157.55..164157.56 rows=1 width=8) (actual time=30772.449..30772.451 rows=1 loops=1)
       ->  Seq Scan on pgbench_accounts  (cost=0.00..150086.64 rows=5628364 width=0) (actual time=0.141..24405.699 rows=100000000 loops=1)
     Planning:
       Buffers: shared hit=23 read=2
     Planning Time: 0.458 ms
     Execution Time: 30772.624 ms
    (6 rows)
    
    postgres=# 
    
    opened by ioxgrey 22
  • orioledb_undo catalog grows steadily till the disk space is run out during a pgbench init procedure

    orioledb_undo catalog grows steadily till the disk space is run out during a pgbench init procedure

    Steps to reproduce:

    /usr/local/pgsql/bin/initdb --wal-segsize=128 /pg_data cp ./orioledb-test/postgresql.auto.conf /pg_data/

    /usr/local/pgsql/bin/pg_ctl -D /pg_data start /usr/local/pgsql/bin/pgbench -i -I d /usr/local/pgsql/bin/psql -f orioledb-test/pgbench.sql /usr/local/pgsql/bin/pg_ctl -D /pg_data stop

    mv /pg_data/orioledb_undo /pg_wal/ mv /pg_data/pg_wal /pg_wal/ ln -s /pg_wal/orioledb_undo /pg_data/orioledb_undo ln -s /pg_wal/pg_wal /pg_data/pg_wal

    /usr/local/pgsql/bin/pg_ctl -D /pg_data start

    export PGPASSWORD='password' /usr/local/pgsql/bin/pgbench -i -I gvp -s 75000 -U pgbench -h 127.0.0.1 pgbench

    … pgbench: fatal: PQputline failed(92%) done (elapsed 17690.38 s, remaining 1385.41 s)

    $ df -h | grep -e pg_data -e pg_wal -e Filesystem

    Filesystem Size Used Avail Use% Mounted on /dev/nvme0n1 1.5T 1002G 488G 68% /pg_data /dev/nvme1n1 1.5T 1.5T 24K 100% /pg_wal

    $ du --max-depth=1 -h

    1.4T ./orioledb_undo 69G ./pg_wal 1.5T . Let's see a postgres log: 2022-03-09 23:34:15 MSK [61541]: [2-1] us=,db=,app=,cli= PANIC: could not write buffer to file orioledb_undo/00000058BA 2022-03-09 23:34:15 MSK [61456]: [9-1] us=,db=,app=,cli= LOG: background worker "orioledb background writer" (PID 61541) was terminated by signal 6: Aborted 2022-03-09 23:34:15 MSK [61456]: [10-1] us=,db=,app=,cli= LOG: terminating any other active server processes 2022-03-09 23:34:17 MSK [61456]: [11-1] us=,db=,app=,cli= LOG: all server processes terminated; reinitializing

    orioledb specific settings in the postgresql.auto.conf (server: 2 sockets, 36 physical cores total, 256GB RAM): orioledb.main_buffers = 16GB orioledb.free_tree_buffers = 256MB orioledb.catalog_buffers = 256MB orioledb.undo_buffers = 1GB orioledb.recovery_pool_size = 14 orioledb.recovery_queue_size = 64MB orioledb.checkpoint_completion_ratio = 0.9 orioledb.bgwriter_num_workers = 8 orioledb.max_io_concurrency = 64

    pgbench.sql:

    CREATE EXTENSION IF NOT EXISTS orioledb WITH SCHEMA public; COMMENT ON EXTENSION orioledb IS 'OrioleDB -- the next generation transactional engine';

    CREATE USER pgbench WITH PASSWORD 'password'; CREATE DATABASE pgbench; ALTER DATABASE pgbench OWNER TO pgbench;

    \c pgbench

    CREATE EXTENSION IF NOT EXISTS orioledb WITH SCHEMA public; COMMENT ON EXTENSION orioledb IS 'OrioleDB -- the next generation transactional engine';

    SET ROLE pgbench;

    SET default_tablespace = ''; SET default_table_access_method = orioledb;

    CREATE TABLE public.pgbench_accounts ( aid int8 NOT NULL, bid int8, abalance integer, filler character(84) ) USING orioledb WITH (fillfactor='100', compress = 1, toast_compress = 5, primary_compress = -1);

    ALTER TABLE public.pgbench_accounts OWNER TO pgbench;

    CREATE TABLE public.pgbench_branches ( bid int8 NOT NULL, bbalance integer, filler character(88) ) USING orioledb WITH (fillfactor='100', compress = 1, toast_compress = 5, primary_compress = -1);

    ALTER TABLE public.pgbench_branches OWNER TO pgbench;

    CREATE TABLE public.pgbench_history ( tid int8, bid int8, aid int8, delta integer, mtime timestamp without time zone, filler character(22) ) USING orioledb WITH (compress = 1, toast_compress = 5, primary_compress = -1);

    ALTER TABLE public.pgbench_history OWNER TO pgbench;

    CREATE TABLE public.pgbench_tellers ( tid int8 NOT NULL, bid int8, tbalance integer, filler character(84) ) USING orioledb WITH (fillfactor='100', compress = 1, toast_compress = 5, primary_compress = -1);

    ALTER TABLE public.pgbench_tellers OWNER TO pgbench;

    opened by vbp1 15
  • Orioledb bloat

    Orioledb bloat

    While using standard pgbench test orioledb_data directory size raise two times for approx one million transactions.

    pgbench -i  -s 100 postgres
    pg_ctl restart
    
    du -hs `ls` | grep orioledb_data
    1.6G	orioledb_data
    
    pgbench -c 10 -j 5 -t 100000 -P 1  postgres
    pg_ctl restart
    
    du -hs `ls` | grep orioledb_data
    3.6G	orioledb_data
    
    opened by ioxgrey 15
  • segfault at 0 ip 00007f605f3dc8f3 sp 00007ffdcca73680 error 6 in orioledb.so

    segfault at 0 ip 00007f605f3dc8f3 sp 00007ffdcca73680 error 6 in orioledb.so

    Hi!

    VM: 4vCPU, 16GB RAM, SSD, Centos 7

    I created pgbench tables with options:

    • compress = 5,
    • toast_compress = 10,
    • primary_compress = -1

    and start to load pgbench test like:

    [[email protected] ~]# /usr/pgsql-patches14/bin/pgbench -h 10.0.0.8 -p 5433 -U postgres -c 90 -j 4 -T 3600 -P 60 -n
    progress: 60.0 s, 475.0 tps, lat 183.291 ms stddev 7.043
    progress: 120.0 s, 489.0 tps, lat 183.984 ms stddev 9.003
    progress: 180.0 s, 491.6 tps, lat 183.076 ms stddev 6.638
    progress: 240.0 s, 492.0 tps, lat 182.927 ms stddev 6.709
    progress: 300.0 s, 491.1 tps, lat 183.203 ms stddev 6.653
    progress: 360.0 s, 491.7 tps, lat 183.022 ms stddev 6.703
    progress: 420.0 s, 489.5 tps, lat 183.882 ms stddev 8.176
    progress: 480.0 s, 490.8 tps, lat 183.303 ms stddev 6.986
    progress: 540.0 s, 491.7 tps, lat 182.990 ms stddev 6.713
    progress: 600.0 s, 491.7 tps, lat 183.037 ms stddev 6.626
    progress: 660.0 s, 491.7 tps, lat 183.012 ms stddev 6.764
    progress: 720.0 s, 488.9 tps, lat 184.066 ms stddev 8.690
    progress: 780.0 s, 491.5 tps, lat 183.074 ms stddev 6.469
    progress: 840.0 s, 491.2 tps, lat 183.236 ms stddev 6.825
    progress: 900.0 s, 491.9 tps, lat 182.908 ms stddev 6.304
    progress: 960.0 s, 491.1 tps, lat 183.267 ms stddev 6.548
    progress: 1020.0 s, 490.4 tps, lat 183.525 ms stddev 7.172
    WARNING:  terminating connection because of crash of another server process
    DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    

    and the following issue in the log:

    Mar 29 12:16:40 pgbench-load-01 kernel: postgres[23000]: segfault at 0 ip 00007f605f3dc8f3 sp 00007ffdcca80e70 error 6
    Mar 29 12:16:40 pgbench-load-01 kernel: postgres[22999]: segfault at 0 ip 00007f605f3dc8f3 sp 00007ffdcca80e70 error 6
    Mar 29 12:16:40 pgbench-load-01 kernel: in orioledb.so[7f605f36c000+89000]
    Mar 29 12:16:40 pgbench-load-01 kernel:
    Mar 29 12:16:40 pgbench-load-01 kernel: postgres[23001]: segfault at 0 ip 00007f605f3dc8f3 sp 00007ffdcca80e70 error 6
    Mar 29 12:16:40 pgbench-load-01 kernel: in orioledb.so[7f605f36c000+89000]
    Mar 29 12:16:40 pgbench-load-01 kernel:
    Mar 29 12:16:40 pgbench-load-01 kernel: in orioledb.so[7f605f36c000+89000]
    Mar 29 12:16:40 pgbench-load-01 kernel:
    Mar 29 12:16:48 pgbench-load-01 kernel: postgres[16232]: segfault at 0 ip 00007f605f3dc8f3 sp 00007ffdcca73680 error 6 in orioledb.so[7f605f36c000+89000]
    Mar 29 12:16:48 pgbench-load-01 systemd-logind: Removed session 100.
    Mar 29 12:23:15 pgbench-load-01 systemd: Starting Cleanup of Temporary Directories...
    Mar 29 12:23:15 pgbench-load-01 systemd: Started Cleanup of Temporary Directories.
    Mar 29 12:26:59 pgbench-load-01 dhclient[692]: DHCPREQUEST on eth1 to 10.0.0.1 port 67 (xid=0x4102cba8)
    Mar 29 12:26:59 pgbench-load-01 dhclient[692]: DHCPACK from 10.0.0.1 (xid=0x4102cba8)
    Mar 29 12:26:59 pgbench-load-01 NetworkManager[667]: <info>  [1648556819.1422] dhcp4 (eth1):   address 10.0.0.8
    Mar 29 12:26:59 pgbench-load-01 NetworkManager[667]: <info>  [1648556819.1430] dhcp4 (eth1):   plen 32 (255.255.255.255)
    Mar 29 12:26:59 pgbench-load-01 NetworkManager[667]: <info>  [1648556819.1432] dhcp4 (eth1):   classless static route 10.0.0.1/32 gw 0.0.0.0
    Mar 29 12:26:59 pgbench-load-01 NetworkManager[667]: <info>  [1648556819.1433] dhcp4 (eth1):   gateway 10.0.0.1
    Mar 29 12:26:59 pgbench-load-01 NetworkManager[667]: <info>  [1648556819.1433] dhcp4 (eth1):   lease time 86400
    Mar 29 12:26:59 pgbench-load-01 NetworkManager[667]: <info>  [1648556819.1434] dhcp4 (eth1): state changed bound -> bound
    Mar 29 12:26:59 pgbench-load-01 dbus[581]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
    Mar 29 12:26:59 pgbench-load-01 dhclient[692]: bound to 10.0.0.8 -- renewal in 33148 seconds.
    Mar 29 12:26:59 pgbench-load-01 systemd: Starting Network Manager Script Dispatcher Service...
    Mar 29 12:26:59 pgbench-load-01 dbus[581]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
    Mar 29 12:26:59 pgbench-load-01 systemd: Started Network Manager Script Dispatcher Service.
    Mar 29 12:26:59 pgbench-load-01 nm-dispatcher: req:1 'dhcp4-change' [eth1]: new request (4 scripts)
    Mar 29 12:26:59 pgbench-load-01 nm-dispatcher: req:1 'dhcp4-change' [eth1]: start running ordered scripts...
    Mar 29 12:52:04 pgbench-load-01 su: (to postgres) root on pts/0
    Mar 29 12:52:25 pgbench-load-01 kernel: postgres[16776]: segfault at 0 ip 00007f184a6288f3 sp 00007fff38603c10 error 6
    Mar 29 12:52:25 pgbench-load-01 kernel: postgres[16778]: segfault at 0 ip 00007f184a6288f3 sp 00007fff38603c10 error 6
    Mar 29 12:52:25 pgbench-load-01 kernel: in orioledb.so[7f184a5b8000+89000]
    Mar 29 12:52:25 pgbench-load-01 kernel:
    Mar 29 12:52:25 pgbench-load-01 kernel: in orioledb.so[7f184a5b8000+89000]
    Mar 29 12:52:25 pgbench-load-01 kernel:
    Mar 29 12:52:30 pgbench-load-01 kernel: postgres[16782]: segfault at 0 ip 00007f184a6288f3 sp 00007fff385f6440 error 6 in orioledb.so[7f184a5b8000+89000]
    Mar 29 12:57:27 pgbench-load-01 kernel: postgres[16840]: segfault at 0 ip 00007fd75f4b98f3 sp 00007ffdd767bd40 error 6 in orioledb.so[7fd75f449000+89000]
    Mar 29 12:57:32 pgbench-load-01 kernel: postgres[16842]: segfault at 0 ip 00007fd75f4b98f3 sp 00007ffdd7677900 error 6 in orioledb.so[7fd75f449000+89000]
    

    If I try to start PostgreSQL it doesn't get recovered:

    2022-03-29 12:57:24.695 UTC [16833] LOG:  starting PostgreSQL 14.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
    2022-03-29 12:57:24.696 UTC [16833] LOG:  listening on IPv4 address "0.0.0.0", port 5433
    2022-03-29 12:57:24.696 UTC [16833] LOG:  listening on IPv6 address "::", port 5433
    2022-03-29 12:57:24.697 UTC [16833] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5433"
    2022-03-29 12:57:24.700 UTC [16833] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5433"
    2022-03-29 12:57:24.704 UTC [16835] LOG:  database system was interrupted while in recovery at 2022-03-29 12:52:27 UTC
    2022-03-29 12:57:24.704 UTC [16835] HINT:  This probably means that some data is corrupted and you will have to use the last backup for recovery.
    2022-03-29 12:57:24.704 UTC [16838] LOG:  orioledb background writer started
    2022-03-29 12:57:24.704 UTC [16837] LOG:  orioledb background writer started
    2022-03-29 12:57:24.705 UTC [16836] LOG:  orioledb background writer started
    2022-03-29 12:57:24.810 UTC [16835] LOG:  database system was not properly shut down; automatic recovery in progress
    2022-03-29 12:57:24.812 UTC [16835] LOG:  orioledb recovery started.
    2022-03-29 12:57:24.818 UTC [16841] LOG:  orioledb recovery worker 0 started.
    2022-03-29 12:57:24.819 UTC [16839] LOG:  orioledb recovery worker 2 started.
    2022-03-29 12:57:24.819 UTC [16840] LOG:  orioledb recovery worker 1 started.
    2022-03-29 12:57:25.328 UTC [16835] LOG:  redo starts at D/8968F880
    2022-03-29 12:57:27.177 UTC [16833] LOG:  background worker "orioledb recovery worker" (PID 16840) was terminated by signal 11: Segmentation fault
    2022-03-29 12:57:27.177 UTC [16833] LOG:  terminating any other active server processes
    2022-03-29 12:57:27.187 UTC [16833] LOG:  all server processes terminated; reinitializing
    2022-03-29 12:57:29.333 UTC [16833] LOG:  OrioleDB public alpha 4 started
    2022-03-29 12:57:29.335 UTC [16842] LOG:  database system was interrupted while in recovery at 2022-03-29 12:57:24 UTC
    2022-03-29 12:57:29.335 UTC [16842] HINT:  This probably means that some data is corrupted and you will have to use the last backup for recovery.
    2022-03-29 12:57:29.448 UTC [16842] LOG:  database system was not properly shut down; automatic recovery in progress
    2022-03-29 12:57:29.451 UTC [16842] LOG:  orioledb recovery after fatal error started.  Unable to make multiprocess recovery.
    2022-03-29 12:57:29.921 UTC [16842] LOG:  redo starts at D/8968F880
    2022-03-29 12:57:32.279 UTC [16833] LOG:  startup process (PID 16842) was terminated by signal 11: Segmentation fault
    2022-03-29 12:57:32.279 UTC [16833] LOG:  aborting startup due to startup process failure
    2022-03-29 12:57:32.708 UTC [16833] LOG:  database system is shut down
    
    opened by Vadim0908 12
  • postgres  was terminated by signal 11: Segmentation fault during pgbench test

    postgres was terminated by signal 11: Segmentation fault during pgbench test

    We have pgbench db SF=75K initialized like #18 /usr/local/pgsql/bin/pgbench --client=100 --jobs=100 --protocol=prepared --progress=10 --time=900 --username=pgbench --host=127.0.0.1 --port=5432 pgbench pgbench (14.2) starting vacuum...end. progress: 10.0 s, 77986.2 tps, lat 1.278 ms stddev 0.544 ... pgbench: error: client 8 script 0 aborted in command 8 query 0: ERROR: failed to fetch tuple being updated progress: 70.0 s, 81340.5 tps, lat 1.228 ms stddev 0.560 ... WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

    From postgres.log: 2022-03-23 18:32:10 MSK [699032]: [1-1] us=pgbench,db=pgbench,app=pgbench,cli=127.0.0.1 ERROR: failed to fetch tuple being updated 2022-03-23 18:32:10 MSK [699032]: [2-1] us=pgbench,db=pgbench,app=pgbench,cli=127.0.0.1 STATEMENT: UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2; 2022-03-23 18:36:50 MSK [653113]: [15-1] us=,db=,app=,cli= LOG: orioledb checkpoint 72 complete 2022-03-23 18:36:52 MSK [653113]: [16-1] us=,db=,app=,cli= LOG: checkpoint complete: wrote 8 buffers (0.0%); 0 WAL file(s) added, 67 removed, 0 recycled; write=0.702 s, sync=0.001 s, total=157.240 s; sync files=7, longest=0.001 s, average=0.001 s; distance=5112146 kB, estimate=5112146 kB 2022-03-23 18:36:55 MSK [652995]: [9-1] us=,db=,app=,cli= LOG: background worker "orioledb background writer" (PID 653112) was terminated by signal 11: Segmentation fault 2022-03-23 18:36:55 MSK [652995]: [10-1] us=,db=,app=,cli= LOG: terminating any other active server processes 2022-03-23 18:36:57 MSK [652995]: [11-1] us=,db=,app=,cli= LOG: all server processes terminated; reinitializing 2022-03-23 18:37:04 MSK [652995]: [12-1] us=,db=,app=,cli= LOG: OrioleDB public alpha 3 started 2022-03-23 18:37:04 MSK [703713]: [1-1] us=,db=,app=,cli= LOG: database system was interrupted; last known up at 2022-03-23 18:36:51 MSK 2022-03-23 18:37:04 MSK [703713]: [2-1] us=,db=,app=,cli= LOG: database system was not properly shut down; automatic recovery in progress 2022-03-23 18:37:04 MSK [703713]: [3-1] us=,db=,app=,cli= LOG: orioledb recovery after fatal error started. Unable to make multiprocess recovery. 2022-03-23 18:37:21 MSK [703713]: [4-1] us=,db=,app=,cli= LOG: redo starts at D0/AC9ED5E0 2022-03-23 18:37:31 MSK [652995]: [13-1] us=,db=,app=,cli= LOG: startup process (PID 703713) was terminated by signal 11: Segmentation fault 2022-03-23 18:37:31 MSK [652995]: [14-1] us=,db=,app=,cli= LOG: aborting startup due to startup process failure 2022-03-23 18:37:33 MSK [652995]: [15-1] us=,db=,app=,cli= LOG: database system is shut down

    Unfortunately there is no coredump of the first failure, but I do have coredump of the second. And... Signal: 11 (SEGV) Command Line: postgres: startup recovering 00000001000000D000000015

    Stack trace of thread 742878: #0 0x00007effb1adeb13 seq_buf_write_u32 (orioledb.so + 0x71b13) #1 0x00007effb1ab4847 free_extent_for_checkpoint (orioledb.so + 0x47847) #2 0x00007effb1a96d94 try_merge_pages (orioledb.so + 0x29d94) #3 0x00007effb1a97708 btree_try_merge_and_unlock (orioledb.so + 0x2a708) #4 0x00007effb1a934f1 walk_page (orioledb.so + 0x264f1) #5 0x00007effb1ade235 ppool_run_clock (orioledb.so + 0x71235) #6 0x00007effb1ade2df ppool_reserve_pages (orioledb.so + 0x712df) #7 0x00007effb1a901f8 o_btree_insert_item (orioledb.so + 0x231f8) #8 0x00007effb1a90c10 o_btree_insert_tuple_to_leaf (orioledb.so + 0x23c10) #9 0x00007effb1a97a22 o_btree_modify_insert_update (orioledb.so + 0x2aa22) #10 0x00007effb1a97dc7 o_btree_modify_handle_tuple_not_found (orioledb.so + 0x2adc7) #11 0x00007effb1a98050 o_btree_modify_internal (orioledb.so + 0x2b050) #12 0x00007effb1a9906e o_btree_normal_modify (orioledb.so + 0x2c06e) #13 0x00007effb1a998dc o_btree_autonomous_insert (orioledb.so + 0x2c8dc) #14 0x00007effb1aa7da0 free_extent (orioledb.so + 0x3ada0) #15 0x00007effb1ac2956 o_tree_init_free_extents (orioledb.so + 0x55956) #16 0x00007effb1ac3883 o_btree_load_shmem (orioledb.so + 0x56883) #17 0x00007effb1ac1392 apply_modify_record (orioledb.so + 0x54392) #18 0x00007effb1abf0a1 replay_container (orioledb.so + 0x520a1) #19 0x00007effb1abf5b4 o_recovery_logicalmsg_redo_hook (orioledb.so + 0x525b4) #20 0x00000000007623fa logicalmsg_redo (postgres + 0x3623fa) #21 0x000000000057a853 StartupXLOG (postgres + 0x17a853) #22 0x000000000074c42e StartupProcessMain (postgres + 0x34c42e) #23 0x000000000058863b AuxiliaryProcessMain (postgres + 0x18863b) #24 0x00000000007488bd StartChildProcess (postgres + 0x3488bd) #25 0x0000000000748a59 PostmasterStateMachine (postgres + 0x348a59) #26 0x00000000007499b5 reaper (postgres + 0x3499b5) #27 0x00007effb1ed78b0 __restore_rt (libpthread.so.0 + 0x128b0) #28 0x00007effb1c52437 __GI___select (libc.so.6 + 0xf1437) #29 0x000000000074a52d ServerLoop (postgres + 0x34a52d) #30 0x000000000074bd98 PostmasterMain (postgres + 0x34bd98) #31 0x00000000004e7a7d main (postgres + 0xe7a7d) #32 0x00007effb1b8508b __libc_start_main (libc.so.6 + 0x2408b) #33 0x00000000004e7b1a _start (postgres + 0xe7b1a)

    opened by vbp1 7
  • One more bloat issue

    One more bloat issue

    I try to reproduce Write-amplification and bloat test from here https://gist.github.com/akorotkov/f5e98ba5805c42ee18bf945b30cc3d67 Standard engine

    pgbench -c 74 -j 74 -P 1 -M prepared -f wa_bloat_script.sql -t 1000000 postgres
    number of transactions actually processed: 74000000/74000000
    tps = 68255.683199 (without initial connection time)
    
    du -hs `ls` | grep base
    5.5G	base
    

    Orioledb engine

    pgbench -c 74 -j 74 -P 1 -M prepared -f wa_bloat_script.sql -t 1000000 postgres
    number of transactions actually processed: 74000000/74000000 
    tps = 164179.914483 (without initial connection time)
    

    Is 2.5 times faster but data size grows proportionally

    du -hs `ls` | grep oriole
    13G	orioledb_data
    

    This size tend to grow further

    pgbench -c 74 -j 74 -P 1 -M prepared -f wa_bloat_script.sql -t 1000000 postgres
    number of transactions actually processed: 74000000/74000000
    tps = 151986.912793 (without initial connection time)
    
    du -hs `ls` | grep oriole
    17G	orioledb_data
    
    opened by ioxgrey 7
  • Segmentation fault in WSL

    Segmentation fault in WSL

    Hi there,

    Awesome project, really looking forward to this! I tried this locally, generally following the instructions, but I got a segfault:

     create table test (id bigserial primary key, value varchar) using orioledb;
    2022-02-24 23:46:58.736 PST [6551] ERROR:  Unable to found internal opclass 8500 for datoid 16385.
    2022-02-24 23:46:58.736 PST [6551] STATEMENT:  create table test (id bigserial primary key, value varchar) using orioledb;
    ERROR:  Unable to found internal opclass 8500 for datoid 16385.
    server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
    2022-02-24 23:46:58.737 PST [6508] LOG:  server process (PID 6551) was terminated by signal 11: Segmentation fault
    2022-02-24 23:46:58.737 PST [6508] DETAIL:  Failed process was running: create table test (id bigserial primary key, value varchar) using orioledb;
    The connection to the server was lost. Attempting reset: 2022-02-24 23:46:58.737 PST [6508] LOG:  terminating any other active server processes
    2022-02-24 23:46:58.738 PST [6552] FATAL:  the database system is in recovery mode
    Failed.
    !> 2022-02-24 23:46:58.738 PST [6508] LOG:  all server processes terminated; reinitializing
    2022-02-24 23:46:58.756 PST [6508] LOG:  OrioleDB public alpha 1 started
    2022-02-24 23:46:58.757 PST [6553] LOG:  database system was interrupted; last known up at 2022-02-24 23:46:34 PST
    2022-02-24 23:46:58.904 PST [6553] LOG:  database system was not properly shut down; automatic recovery in progress
    2022-02-24 23:46:58.905 PST [6553] LOG:  orioledb recovery after fatal error started.  Unable to make multiprocess recovery.
    2022-02-24 23:46:58.907 PST [6553] LOG:  redo starts at 0/16F07B8
    2022-02-24 23:46:58.908 PST [6553] LOG:  invalid record length at 0/1714E48: wanted 24, got 0
    2022-02-24 23:46:58.908 PST [6553] LOG:  redo done at 0/1714860 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
    2022-02-24 23:46:58.916 PST [6553] LOG:  orioledb recovery finished.
    2022-02-24 23:46:58.918 PST [6553] LOG:  orioledb checkpoint 2 started
    2022-02-24 23:46:58.939 PST [6553] LOG:  orioledb checkpoint 2 complete
    2022-02-24 23:46:58.952 PST [6508] LOG:  database system is ready to accept connections
    2022-02-24 23:46:58.952 PST [6559] LOG:  orioledb background writer started
    

    How to reproduce:

    1. Clone and install Postgres with patches using ./configure --prefix=/opt/postgresql-14-extensions
    2. Install orioledb extension as documented.
    3. Run CREATE EXTENSION orioledb; (works)
    4. Run create table test (id bigserial primary key, value varchar) using orioledb; (segfaults)

    Versions used:

    Extension: 19516a549a23abed91f0fcac954f0c1c7f3647b6 Postgres: 87605494f7aa5d3c7a056000b3e9eab76d189c30

    Environment: WSL2 (Windows Subsystem for Linux 2), x86_64 (AMD), Windows 10 host

    opened by levkk 6
  • EXPLAIN SIGSEGV

    EXPLAIN SIGSEGV

    I am start a transaction during pgbench initialization

    pgbench -i -I gv -s 1000 postgres
    generating data (client-side)...
    100000000 of 100000000 tuples (100%) done (elapsed 223.70 s, remaining 0.00 s)
    vacuuming...
    done in 239.29 s (client-side generate 238.71 s, vacuum 0.58 s).
    
    postgres=# begin;
    BEGIN
    postgres=*# select count(1) from pgbench_accounts;
       count
    -----------
     100000000
    (1 row)
    
    postgres=*# explain (analyze, buffers) select * from pgbench_accounts ORDER BY aid DESC LIMIT 10;
                                                                   QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=0.18..0.53 rows=10 width=97) (actual time=0.043..0.047 rows=10 loops=1)
       ->  Custom Scan (o_scan) on pgbench_accounts  (cost=0.18..196990.44 rows=5628364 width=97) (actual time=0.042..0.045 rows=20 loops=1)
             Backward index scan of: pgbench_accounts_pkey
             Primary pages: read=4
     Planning:
       Buffers: shared hit=30 dirtied=3
     Planning Time: 0.198 ms
     Execution Time: 0.067 ms
    (8 rows)
    
    postgres=*# explain (analyze, buffers) select count(1) from pgbench_accounts;
    server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
    The connection to the server was lost. Attempting reset: Succeeded.
    postgres=#
    
    [email protected]:/db/14$ gdb postgres postgres.core
    #0  0x00007fcd098c60d9 in find_tree_in_descr () from /usr/local/pgsql/lib/orioledb.so
    (gdb) bt
    #0  0x00007fcd098c60d9 in find_tree_in_descr () from /usr/local/pgsql/lib/orioledb.so
    #1  0x00007fcd098a0586 in o_btree_read_page () from /usr/local/pgsql/lib/orioledb.so
    #2  0x00007fcd0989215b in btree_find_read_page () from /usr/local/pgsql/lib/orioledb.so
    #3  0x00007fcd09892f07 in find_page () from /usr/local/pgsql/lib/orioledb.so
    #4  0x00007fcd098a41f4 in load_next_internal_page () from /usr/local/pgsql/lib/orioledb.so
    #5  0x00007fcd098a5409 in make_btree_seq_scan_cb () from /usr/local/pgsql/lib/orioledb.so
    #6  0x00007fcd098ca9b5 in orioledb_beginscan () from /usr/local/pgsql/lib/orioledb.so
    #7  0x0000562f9b3ea762 in SeqNext ()
    #8  0x0000562f9b3c4739 in ExecProcNodeInstr ()
    #9  0x0000562f9b3cd841 in fetch_input_tuple ()
    #10 0x0000562f9b3d02d6 in ExecAgg ()
    #11 0x0000562f9b3c4739 in ExecProcNodeInstr ()
    #12 0x0000562f9b3be842 in standard_ExecutorRun ()
    #13 0x0000562f9b36ba9d in ExplainOnePlan ()
    #14 0x0000562f9b36beac in ExplainOneQuery ()
    #15 0x0000562f9b36c5a4 in ExplainQuery ()
    #16 0x0000562f9b521569 in standard_ProcessUtility ()
    #17 0x00007fcd098aa3db in orioledb_utility_command () from /usr/local/pgsql/lib/orioledb.so
    #18 0x0000562f9b51fc9f in PortalRunUtility ()
    #19 0x0000562f9b520037 in FillPortalStore ()
    #20 0x0000562f9b5202fd in PortalRun ()
    #21 0x0000562f9b51c593 in exec_simple_query ()
    #22 0x0000562f9b51e410 in PostgresMain ()
    #23 0x0000562f9b4a4251 in ServerLoop ()
    #24 0x0000562f9b4a5082 in PostmasterMain ()
    #25 0x0000562f9b248c00 in main ()
    (gdb)
    
    opened by ioxgrey 5
  • Checkpoint blocks commit

    Checkpoint blocks commit

    During pgbench setup run after data generation step pgbench backend stuck in commit

    pgbench -i -s 200 postgres
    dropping old tables...
    creating tables...
    generating data (client-side)...
    20000000 of 20000000 tuples (100%) done (elapsed 47.60 s, remaining 0.00 s)
    vacuuming...
    creating primary keys...
    done in 643.01 s (drop tables 0.05 s, create tables 0.01 s, client-side generate 619.78 s, vacuum 1.52 s, primary keys 21.65 s).
    

    Compare "elapsed 47.60 s" to "client-side generate 619.78 s"

    opened by ioxgrey 5
  • Variable-size pages?

    Variable-size pages?

    https://github.com/orioledb/orioledb/blob/main/doc/arch.md#data-compression says that storage pages have variable length.

    How did you pull that off? So far as I can tell, pg uses BLCKSZ pretty much everywhere. I looked through the code and I'm not seeing how you did it.

    Did you somehow replace md.c in the storage manager?

    I'm not seeing any commits here (https://github.com/orioledb/postgres/commits/patches14) which implement variable-length pages, but perhaps I missed it.

    ORIOLEDB_BLCKSZ and ORIOLEDB_COMP_BLCKSZ are fixed.

    opened by ccleve 4
  • very long shutdown checkpoint on a database contains compressed tables

    very long shutdown checkpoint on a database contains compressed tables

    orioledb branch: fix_free_extents_vacated_bytes

    testing env: 2xCPU Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz (72 cores), 256GB RAM, 4x1.6TB NVME INTEL SSDPEDMD016T4 RAID0 for pgdata, Altlinux c9f1 orioledb specific settings:

    orioledb.main_buffers = 16GB
    orioledb.free_tree_buffers = 1GB
    orioledb.catalog_buffers = 256MB
    orioledb.undo_buffers = 1GB
    orioledb.recovery_pool_size = 4
    orioledb.recovery_queue_size = 64MB
    orioledb.checkpoint_completion_ratio = 0.9
    orioledb.bgwriter_num_workers = 8
    orioledb.max_io_concurrency = 64
    

    steps to reproduce a problem:

    1. fist of all we shoult create pgbench db containing compressed tables (in my case Scale factor = 75К)
    2. then run pgbench --client=100 --jobs=100 --protocol=prepared --progress=10 --time=90 --username=pgbench --host=127.0.0.1 --port=5433 pgbench
    3. then we should try to stop dbms: pg_ctl -D /pg_data stop

    On my testing enviroinment, when db contains only uncompressed tables shutdown checkpoint takes 20-30 seconds, when db contains compressed tables shutdown checkpoint takes 15-30 minutes, sometimes it never finishes (at least I couldn't wait untill the end)

    There is no errors or suspicious messages in the postgres log.

    PID     USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    3531574 postgres  20   0   25.5g  20.2g  20.2g R 100.0   8.0   55:57.55 postgres: checkpointer performing shutdown checkpoint
    
    perf top shows us:
    Samples: 2K of event 'cycles', 99 Hz, Event count (approx.): 38390778263 lost: 0/0 drop: 0/0
    Overhead  Shared Object                    Symbol
      39.95%  orioledb.so                      [.] get_extent
      10.53%  orioledb.so                      [.] btree_page_search
       8.18%  orioledb.so                      [.] find_page
       6.43%  orioledb.so                      [.] free_tree_len_off_cmp
       4.69%  orioledb.so                      [.] o_btree_read_page
    
    # gdb -p 3531574 --ex bt --batch
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    Missing separate debuginfo for /lib64/libzstd.so.1
    Try to install the hash file /usr/lib/debug/.build-id/5f/b21d91bc2846571353e13ad181b37f8dcbc9b4.debug
    0x00007f95217071af in btree_page_search () from /usr/local/pgsql/lib/orioledb.so
    #0  0x00007f95217071af in btree_page_search () from /usr/local/pgsql/lib/orioledb.so
    #1  0x00007f9521707a76 in find_page () from /usr/local/pgsql/lib/orioledb.so
    #2  0x00007f9521720a12 in get_extent () from /usr/local/pgsql/lib/orioledb.so
    #3  0x00007f952170b364 in get_free_disk_extent () from /usr/local/pgsql/lib/orioledb.so
    #4  0x00007f952170bf2b in perform_page_io () from /usr/local/pgsql/lib/orioledb.so
    #5  0x00007f952173098c in checkpoint_ix () from /usr/local/pgsql/lib/orioledb.so
    #6  0x00007f95217327cf in checkpoint_tables_callback () from /usr/local/pgsql/lib/orioledb.so
    #7  0x00007f9521725587 in o_indices_foreach_oids () from /usr/local/pgsql/lib/orioledb.so
    #8  0x00007f9521731eaf in o_perform_checkpoint () from /usr/local/pgsql/lib/orioledb.so
    #9  0x0000000000572ae9 in CheckPointGuts ()
    #10 0x0000000000579446 in CreateCheckPoint ()
    #11 0x000000000073eac6 in CheckpointerMain ()
    #12 0x000000000058864f in AuxiliaryProcessMain ()
    #13 0x00000000007488bd in StartChildProcess ()
    #14 0x0000000000749bf2 in reaper ()
    #15 <signal handler called>
    #16 0x00007f95218cb437 in __GI___select (nfds=9, readfds=0x7ffd45779210, writefds=0x0, exceptfds=0x0, timeout=0x7ffd45779160) at ../sysdeps/unix/sysv/linux/select.c:41
    #17 0x000000000074a52d in ServerLoop ()
    #18 0x000000000074bd98 in PostmasterMain ()
    #19 0x00000000004e7a7d in main ()
    [Inferior 1 (process 3531574) detached]
    
    opened by vbp1 2
  • stack smashing on pgbench rw test

    stack smashing on pgbench rw test

    testing env: 2xCPU Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz (72 cores), 256GB RAM, 4x1.6TB NVME INTEL SSDPEDMD016T4 RAID0 for pgdata, Altlinux c9f1 orioledb specific settings:

    orioledb.main_buffers = 16GB orioledb.free_tree_buffers = 1GB orioledb.catalog_buffers = 256MB orioledb.undo_buffers = 1GB orioledb.recovery_pool_size = 4 orioledb.recovery_queue_size = 64MB orioledb.checkpoint_completion_ratio = 0.9 orioledb.bgwriter_num_workers = 8 orioledb.max_io_concurrency = 64

    pgbench (Scale factor = 75К) orioledb (Commits on May 2, 2022) + uncompressed tables

    test log:

    2022-05-08 10:46:01: Tests [orioledb_RW-retest] is started. Runtime for each job is 900
    2022-05-08 10:46:01: Running task with parameters: query mode prepared, facuum full 0, clients 100, runtime 900.
    2022-05-08 10:46:01: Exec string is /opt/pgpro/ent-13/bin/pgbench --client=100 --jobs=100 --protocol=prepared --progress=10 --time=900 --username=pgbench --host=127.0.0.1 --port=5432 pgbench
    2022-05-08 11:01:01: Running task with parameters: query mode prepared, facuum full 0, clients 250, runtime 900.
    2022-05-08 11:01:01: Exec string is /opt/pgpro/ent-13/bin/pgbench --client=250 --jobs=250 --protocol=prepared --progress=10 --time=900 --username=pgbench --host=127.0.0.1 --port=5432 pgbench
    2022-05-08 11:16:03: Running task with parameters: query mode prepared, facuum full 0, clients 500, runtime 900.
    2022-05-08 11:16:03: Exec string is /opt/pgpro/ent-13/bin/pgbench --client=500 --jobs=500 --protocol=prepared --progress=10 --time=900 --username=pgbench --host=127.0.0.1 --port=5432 pgbench
    2022-05-08 11:31:06: Running task with parameters: query mode prepared, facuum full 0, clients 750, runtime 900.
    2022-05-08 11:31:06: Exec string is /opt/pgpro/ent-13/bin/pgbench --client=750 --jobs=750 --protocol=prepared --progress=10 --time=900 --username=pgbench --host=127.0.0.1 --port=5432 pgbench
    2022-05-08 11:46:09: Running task with parameters: query mode prepared, facuum full 0, clients 1000, runtime 900.
    2022-05-08 11:46:09: Exec string is /opt/pgpro/ent-13/bin/pgbench --client=1000 --jobs=1000 --protocol=prepared --progress=10 --time=900 --username=pgbench --host=127.0.0.1 --port=5432 pgbench
    2022-05-08 11:51:45: /opt/pgpro/ent-13/bin/pgbench got an error. canceling test [orioledb_RW-retest].
    

    1000_clients.job.log:

    ...
    progress: 310.0 s, 67222.1 tps, lat 14.829 ms stddev 16.250
    progress: 320.0 s, 66037.5 tps, lat 15.111 ms stddev 15.738
    WARNING:  terminating connection because of crash of another server process
    DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process
     exited abnormally and possibly corrupted shared memory.
    HINT:  In a moment you should be able to reconnect to the database and repeat your command.
    ...
    

    postgres.log:

    2022-05-08 11:49:56 MSK [2079647]: [49-1] us=,db=,app=,cli= LOG:  checkpoint starting: time
    2022-05-08 11:49:56 MSK [2079647]: [50-1] us=,db=,app=,cli= LOG:  orioledb checkpoint 95 started
    *** stack smashing detected ***: <unknown> terminated
    2022-05-08 11:51:40 MSK [2079257]: [9-1] us=,db=,app=,cli= LOG:  background worker "orioledb background writer" (PID 2079644) was terminated by signal 11: Segmentation fault
    2022-05-08 11:51:40 MSK [2079257]: [10-1] us=,db=,app=,cli= LOG:  terminating any other active server processes
    2022-05-08 11:51:45 MSK [2079257]: [11-1] us=,db=,app=,cli= LOG:  issuing SIGKILL to recalcitrant children
    2022-05-08 11:51:45 MSK [2079257]: [12-1] us=,db=,app=,cli= LOG:  all server processes terminated; reinitializing
    

    syslog

    stack traces:

    # coredumpctl info 2079647
    PID: 2079647 (postgres)
    UID: 46 (postgres)
    GID: 46 (postgres)
    Signal: 6 (ABRT)
    Command Line: postgres: checkpointer
    Message: Process 2079647 (postgres) of user 46 dumped core.
    Stack trace of thread 2079647:
         #0  0x00007f8ee31b97eb __GI_raise (libc.so.6 + 0x387eb)
         #1  0x00007f8ee31a3515 __GI_abort (libc.so.6 + 0x22515)
         #2  0x00007f8ee31fb598 __libc_message (libc.so.6 + 0x7a598)
         #3  0x00007f8ee328bb8d __GI___fortify_fail_abort (libc.so.6 + 0x10ab8d)
         #4  0x00007f8ee328bb42 __stack_chk_fail (libc.so.6 + 0x10ab42)
         #5  0x00007f8ee30b9f9f btree_page_reorg (orioledb.so + 0x2ff9f)
         #6  0x00007f8e40604050 n/a (n/a + 0x0)
    
    # coredumpctl info 2079644
    PID: 2079644 (postgres)
    UID: 46 (postgres)
    GID: 46 (postgres)
    Signal: 11 (SEGV)
    Command Line: postgres: orioledb background writer
    Message: Process 2079644 (postgres) of user 46 dumped core.
    Stack trace of thread 2079644:
         #0  0x00007f8ee30ed5bd o_idx_len (orioledb.so + 0x635bd)
         #1  0x00007f8ee30bb646 o_btree_len (orioledb.so + 0x31646)
         #2  0x00007f8ee30d8632 get_checkpoint_number (orioledb.so + 0x4e632)
         #3  0x00007f8ee30b1953 walk_page (orioledb.so + 0x27953)
         #4  0x00007f8ee30fd585 ppool_run_clock (orioledb.so + 0x73585)
         #5  0x00007f8ee30fc81a bgwriter_main (orioledb.so + 0x7281a)
         #6  0x000000000073d80f StartBackgroundWorker (postgres + 0x33d80f)
         #7  0x0000000000749575 do_start_bgworker (postgres + 0x349575)
         #8  0x000000000074bd93 PostmasterMain (postgres + 0x34bd93)
         #9  0x00000000004e7a7d main (postgres + 0xe7a7d)
         #10 0x00007f8ee31a508b __libc_start_main (libc.so.6 + 0x2408b)
         #11 0x00000000004e7b1a _start (postgres + 0xe7b1a)
    

    stack trace for other background writers looks like below:

    # coredumpctl info 2079642
    PID: 2079642 (postgres)
    UID: 46 (postgres)
    GID: 46 (postgres)
    Signal: 11 (SEGV)
    Command Line: postgres: orioledb background writer
    Message: Process 2079642 (postgres) of user 46 dumped core.
    Stack trace of thread 2079642:
         #0  0x00007f8ee32dce83 __memmove_avx_unaligned_erms (libc.so.6 + 0x15be83)
         #1  0x00007f8ee30bb65f memcpy (orioledb.so + 0x3165f)
         #2  0x00007f8ee30d8632 get_checkpoint_number (orioledb.so + 0x4e632)
         #3  0x00007f8ee30b1953 walk_page (orioledb.so + 0x27953)
         #4  0x00007f8ee30fd585 ppool_run_clock (orioledb.so + 0x73585)
         #5  0x00007f8ee30fc81a bgwriter_main (orioledb.so + 0x7281a)
         #6  0x000000000073d80f StartBackgroundWorker (postgres + 0x33d80f)
         #7  0x0000000000749575 do_start_bgworker (postgres + 0x349575)
         #8  0x000000000074bd93 PostmasterMain (postgres + 0x34bd93)
         #9  0x00000000004e7a7d main (postgres + 0xe7a7d)
         #10 0x00007f8ee31a508b __libc_start_main (libc.so.6 + 0x2408b)
         #11 0x00000000004e7b1a _start (postgres + 0xe7b1a)
    

    stack trace for workers looks like below:

    # coredumpctl info 2273949
    PID: 2273949 (postgres)
    UID: 46 (postgres)
    GID: 46 (postgres)
    Signal: 11 (SEGV)
    Command Line: postgres: pgbench pgbench 127.0.0.1(47518) UPDATE
    Message: Process 2273949 (postgres) of user 46 dumped core.
    Stack trace of thread 2273949:
         #0  0x00007f8ee32dce83 __memmove_avx_unaligned_erms (libc.so.6 + 0x15be83)
         #1  0x00007f8ee30bb65f memcpy (orioledb.so + 0x3165f)
         #2  0x00007f8ee30d8632 get_checkpoint_number (orioledb.so + 0x4e632)
         #3  0x00007f8ee30b1953 walk_page (orioledb.so + 0x27953)
         #4  0x00007f8ee30fd585 ppool_run_clock (orioledb.so + 0x73585)
         #5  0x00007f8ee30fd62f ppool_reserve_pages (orioledb.so + 0x7362f)
         #6  0x00007f8ee30b0a45 load_page (orioledb.so + 0x26a45)
         #7  0x00007f8ee30ace10 find_page (orioledb.so + 0x22e10)
         #8  0x00007f8ee30b3779 o_btree_find_tuple_by_key_cb (orioledb.so + 0x29779)
         #9  0x00007f8ee30b3a51 o_btree_find_tuple_by_key (orioledb.so + 0x29a51)
         #10 0x00007f8ee30e7d65 o_iterate_index (orioledb.so + 0x5dd65)
         #11 0x00007f8ee30e8210 o_index_scan_getnext (orioledb.so + 0x5e210)
         #12 0x00007f8ee30e8469 o_exec_fetch (orioledb.so + 0x5e469)
         #13 0x00007f8ee30ec4cf o_exec_custom_scan (orioledb.so + 0x624cf)
         #14 0x000000000069020d ExecProcNode (postgres + 0x29020d)
         #15 0x000000000066641b ExecProcNode (postgres + 0x26641b)
         #16 0x00007f8ee3126405 pgss_ExecutorRun (pg_stat_statements.so + 0x4405)
         #17 0x00000000007c7d20 ProcessQuery (postgres + 0x3c7d20)
         #18 0x00000000007c86da PortalRunMulti (postgres + 0x3c86da)
    
    opened by vbp1 13
  • More details of undo design 🚀

    More details of undo design 🚀

    First and foremost, I want to let you know that I think this engine is a solid work.

    I've been following the in-place update engine of postgres for a long time. There was an implementation of in-place update engine called [zheap](https://github.com/EnterpriseDB/zheap), which is incomplete and has met some issues.

    It would be kind of you to show more details(detailed design doc, trade-off between space and performance, etc.) of your undo design and the difference between orioledb and zheap.

    opened by ezreal1997 1
  • stuck spinlock at oxid_get_csn on primary <-> sync replica (synchronous_commit = 'remote_apply') configuration

    stuck spinlock at oxid_get_csn on primary <-> sync replica (synchronous_commit = 'remote_apply') configuration

    Steps to reproduce:

    1. create primary <-> sync replica (synchronous_commit = 'remote_apply') pair
    2. run on primary: pgbench --client=xxx --jobs=xxx --protocol=prepared --progress=10 --time=900 --username=pgbench --host=127.0.0.1 --port=5432 pgbench
    3. when previous pgbench finishes, without any pause, run on primary: pgbench --client=yyy --jobs=yyy --protocol=prepared --progress=10 --time=900 --username=pgbench --host=127.0.0.1 --port=5432 pgbench While running the preliminary vacuum, we will get an error on primary.

    postgres log:

    2022-05-04 21:02:36 MSK [210064]: [14-1] us=,db=,app=,cli= LOG:  orioledb checkpoint 108 started
    2022-05-04 21:07:25 MSK [210064]: [15-1] us=,db=,app=,cli= PANIC:  stuck spinlock detected at oxid_get_csn, src/transam/oxid.c:728
    2022-05-04 21:07:26 MSK [208645]: [14-1] us=,db=,app=,cli= LOG:  checkpointer process (PID 210064) was terminated by signal 6: Aborted
    2022-05-04 21:07:26 MSK [208645]: [15-1] us=,db=,app=,cli= LOG:  terminating any other active server processes
    

    coredump/stack trace:

    [[email protected] data]# coredumpctl info 210064
        PID: 210064 (postgres)
        UID: 46 (postgres)
        GID: 46 (postgres)
        Signal: 6 (ABRT)
        Command Line: postgres: checkpointer
        Executable: /usr/local/pgsql/bin/postgres
        Message: Process 210064 (postgres) of user 46 dumped core.
    
                    Stack trace of thread 210064:
                    #0  0x00007fd5d81c07eb __GI_raise (libc.so.6 + 0x387eb)
                    #1  0x00007fd5d81aa515 __GI_abort (libc.so.6 + 0x22515)
                    #2  0x00000000008d6c65 errfinish (postgres + 0x4d6c65)
                    #3  0x00000000004c7dab s_lock_stuck (postgres + 0xc7dab)
                    #4  0x00007fd5d814ad95 oxid_get_csn (orioledb.so + 0x68d95)
                    #5  0x00007fd5d814aeb0 xid_is_finished (orioledb.so + 0x68eb0)
                    #6  0x00007fd5d811a368 find_non_lock_only_undo_record (orioledb.so + 0x38368)
                    #7  0x00007fd5d810b412 o_find_tuple_version (orioledb.so + 0x29412)
                    #8  0x00007fd5d810c076 o_btree_iterator_fetch_internal (orioledb.so + 0x2a076)
                    #9  0x00007fd5d81227aa o_indices_foreach_oids (orioledb.so + 0x407aa)
                    #10 0x00007fd5d812f970 o_perform_checkpoint (orioledb.so + 0x4d970)
                    #11 0x0000000000572ae9 CheckPointGuts (postgres + 0x172ae9)
                    #12 0x0000000000579446 CreateCheckPoint (postgres + 0x179446)
                    #13 0x000000000073e832 CheckpointerMain (postgres + 0x33e832)
                    #14 0x000000000058864f AuxiliaryProcessMain (postgres + 0x18864f)
                    #15 0x00000000007488bd StartChildProcess (postgres + 0x3488bd)
                    #16 0x000000000074a072 sigusr1_handler (postgres + 0x34a072)
                    #17 0x00007fd5d84fe8b0 __restore_rt (libpthread.so.0 + 0x128b0)
                    #18 0x00007fd5d8279437 __GI___select (libc.so.6 + 0xf1437)
                    #19 0x000000000074a52d ServerLoop (postgres + 0x34a52d)
                    #20 0x000000000074bd98 PostmasterMain (postgres + 0x34bd98)
    

    One can avoid the problem by running pgbench with --no-vacuum.

    opened by vbp1 2
  • poor rw performance when a database contains compressed tables

    poor rw performance when a database contains compressed tables

    testing env: 2xCPU Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz (72 cores), 256GB RAM, 4x1.6TB NVME INTEL SSDPEDMD016T4 RAID0 for pgdata, Altlinux c9f1 orioledb specific settings:

    orioledb.main_buffers = 16GB
    orioledb.free_tree_buffers = 1GB
    orioledb.catalog_buffers = 256MB
    orioledb.undo_buffers = 1GB
    orioledb.recovery_pool_size = 4
    orioledb.recovery_queue_size = 64MB
    orioledb.checkpoint_completion_ratio = 0.9
    orioledb.bgwriter_num_workers = 8
    orioledb.max_io_concurrency = 64
    

    pgbench (Scale factor = 75К) orioledb alpha5 + uncompressed tables vs orioledb branch fix_free_extents_vacated_bytes + compressed tables test results looks like that: pgbench_75k_rw

    pgbench exec string was:

    /opt/pgpro/ent-13/bin/pgbench --client=n --jobs=n --protocol=prepared --progress=10 --time=900 --username=pgbench --host=127.0.0.1 --port=5432 pgbench
    

    I've noticed a lot of LWLock | BufferContent while taking the orioledb branch fix_free_extents_vacated_bytes + compressed tables test.

    perf top (orioledb branch fix_free_extents_vacated_bytes + compressed tables test):

    Samples: 14K of event 'cycles', 99 Hz, Event count (approx.): 410994119338 lost: 0/0 drop: 10042/10042
    Overhead  Shared Object         Symbol
      47.96%  orioledb.so           [.] get_extent
      12.67%  orioledb.so           [.] find_page
      10.23%  orioledb.so           [.] free_tree_len_off_cmp
       6.58%  orioledb.so           [.] btree_page_search
       2.62%  orioledb.so           [.] o_btree_read_page
    
    opened by vbp1 9
  • Relation sizes

    Relation sizes

    Hi!

    Thanks for a really good job!

    While testing OrioleDB, I came across the fact that the standard psql(dt+) flags don't allow me to determine the size of the table. The same issue with pg_(total)relation/table_size() functions.

    Can you advise me please a way to determine the size of the relation? Now I'm doing it with a query like bellow:

    SELECT pg_size_pretty(sum((pg_stat_file('orioledb_data/'||s.f)).size)) FROM pg_ls_dir('orioledb_data') AS s(f) WHERE f like '13892_16831%';
     pg_size_pretty
    ----------------
     22 GB
    (1 row)
    
    opened by Vadim0908 0
  • Comparison to other solutions (e.g. Yugabyte DB)

    Comparison to other solutions (e.g. Yugabyte DB)

    As all the information provided is in the README, I am having a problem comparing it with other options. Is that something that you could possible provide?

    opened by cdhagmann 2
Releases(alpha7)
  • alpha7(Jun 6, 2022)

    The 7th alpha resease of OrioleDB incorporates the following changes.

    • Make OrioleDB work with data directories initialized without patches,
    • Fix crashes due to wrong usage of o_handle_startup_proc_interrupts_hook(),
    • Fix crashed due to missing o_btree_load_shmem() during INSERT ... ON CONFLICT ...,
    • Fix eviction and checkpointing of temporary trees,
    • Fix accessing unlocked pages during checkpointing,
    • Fix page merging concurrency,
    • FIx page corruption during undo rollback,
    • Fix concurrency during reading transaction CSN,
    • Fix concurrency for unique checking,
    • FIx deadlocks during accessing tree meta information,
    • Add overflow check to add_invalidate_wal_record(),
    • Fix crashed when replaying joint commits,
    • Fix ON CONFLICT for unique index without primary key,
    • Fix update of renamed column,
    • Add support for MERGE command with tests,
    • Fix concurrent DROP TABLE issue.
    Source code(tar.gz)
    Source code(zip)
  • alpha6(Apr 14, 2022)

    The 6th alpha resease of OrioleDB incorporates the following changes.

    • orioledb_commit_hash() function;
    • Fix calculation of vacated bytes during free extents in-place operations;
    • Fix incorrect management of free space in compressed tables;
    • Support for PG13 and PG15;
    • Support for CREATE TABLE ... LIKE.
    Source code(tar.gz)
    Source code(zip)
  • alpha5(Mar 31, 2022)

    The 5th alpha resease of OrioleDB incorporates the following fixes.

    • Fix multiple EXPLAIN ANALYZE in the same transaction
    • Fix counting tuples in EXPLAIN ANALYZE
    • Fix integer overflow in orioledb_relation_size()
    • Fix calculation of sample rows in orioledb_acquire_sample_rows()
    • Fix index build concurrent to checkpointing
    • Fix leaf pages calculation during index build
    • Fix initialization of sampling scan
    • New tests for datatypes and table definition

    The sponsors of this release are:

    • Ian Livingstone (@ianlivingstone),
    • Michail Nikolaev (@michail-nikolaev).

    Full Changelog: https://github.com/orioledb/orioledb/compare/alpha4...alpha5

    Source code(tar.gz)
    Source code(zip)
  • alpha4(Mar 25, 2022)

    The 4th alpha resease of OrioleDB incorporates the following fixes.

    • Merge sparse pages during checkpoint
    • Fix runXmin overrun
    • Fix page corruption + page structure check
    • Use repalloc_huge() for an array of downlinks during tree scan
    • Missing advance_oxids() in read_xids()
    • Fix race condition between advance_global_xmin() and set_oxid_csn()
    • GUC for default compression levels
    • Show compression levels in table description
    • Remove old tmp files during normal restart
    • Fix interlock between write_xidsmap() and advance_global_xmin()
    • Fix checkpoint writeback concurrency

    The sponsors of this release are:

    • Ian Livingstone (@ianlivingstone),
    • Michail Nikolaev (@michail-nikolaev).
    Source code(tar.gz)
    Source code(zip)
  • alpha3(Mar 13, 2022)

    The 3rd alpha resease of OrioleDB incorporates the following fixes.

    • Fix wrong initialization UCM, which leads to hang.
    • Fix tracking xmin among processes.
    • Prevent copying the whole pages for point queries.
    • Fix performance issue with zeroing memory in o_btree_iterator_create().
    • Fix pg8000 support in testgres tests.
    Source code(tar.gz)
    Source code(zip)
  • alpha2(Mar 7, 2022)

    This release contains following bug fixes:

    • Support for ALTER TABLE ... OWNER TO ...,
    • Fix buffer prefetch error during ANALYZE,
    • Fix custom AM support in pgbench.
    Source code(tar.gz)
    Source code(zip)
  • alpha1(Mar 2, 2022)

Owner
OrioleDB
OrioleDB
React-native-quick-sqlite - ⚡️ The fastest SQLite implementation for react-native.

React Native Quick SQLite The **fastest** SQLite implementation for react-native. Copy typeORM patch-package from example dir npm i react-nati

Oscar Franco 280 Jun 22, 2022
Simple constant key/value storage library, for read-heavy systems with infrequent large bulk inserts.

Sparkey is a simple constant key/value storage library. It is mostly suited for read heavy systems with infrequent large bulk inserts. It includes bot

Spotify 965 Jun 19, 2022
Kreon is a key-value store library optimized for flash-based storage

Kreon is a key-value store library optimized for flash-based storage, where CPU overhead and I/O amplification are more significant bottlenecks compared to I/O randomness.

Computer Architecture and VLSI Systems (CARV) Laboratory 20 Jun 10, 2022
LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values. Authors: Sanjay Ghem

Google 29.7k Jul 4, 2022
The database built for IoT streaming data storage and real-time stream processing.

The database built for IoT streaming data storage and real-time stream processing.

HStreamDB 501 Jun 29, 2022
Serverless SQLite database read from and write to Object Storage Service, run on FaaS platform.

serverless-sqlite Serverless SQLite database read from and write to Object Storage Service, run on FaaS platform. NOTES: This repository is still in t

老雷 7 May 12, 2022
Test any type of cloud database on Android apps. No need of a dedicated backend.

DB Kong - Database connections simplified DB Kong is an Android library that allows you to connect, interact and test any type of cloud database on An

Arjun 9 May 9, 2022
pgagroal is a high-performance protocol-native connection pool for PostgreSQL.

pgagroal is a high-performance protocol-native connection pool for PostgreSQL.

Agroal 524 Jun 15, 2022
HybridSE (Hybrid SQL Engine) is an LLVM-based, hybrid-execution and high-performance SQL engine

HybridSE (Hybrid SQL Engine) is an LLVM-based, hybrid-execution and high-performance SQL engine. It can provide fast and consistent execution on heterogeneous SQL data systems, e.g., OLAD database, HTAP system, SparkSQL, and Flink Stream SQL.

4Paradigm 45 Sep 12, 2021
Modern, asynchronous, and wicked fast C++11 client for Redis

redox Modern, asynchronous, and wicked fast C++11 client for Redis [] (https://travis-ci.org/hmartiro/redox) Redox is a C++ interface to the Redis key

Hayk Martiros 374 Jun 14, 2022
❤️ SQLite ORM light header only library for modern C++

SQLite ORM SQLite ORM light header only library for modern C++ Status Branch Travis Appveyor master dev Advantages No raw string queries Intuitive syn

Yevgeniy Zakharov 1.6k Jun 27, 2022
The PostgreSQL client API in modern C++

C++ client API to PostgreSQL {#mainpage} Dmitigr Pgfe (PostGres FrontEnd, hereinafter referred to as Pgfe) - is a C++ client API to PostgreSQL servers

Dmitry Igrishin 134 Jun 3, 2022
Modern cryptography for PostgreSQL using libsodium.

pgsodium pgsodium is an encryption library extension for PostgreSQL using the libsodium library for high level cryptographic algorithms. pgsodium can

Michel Pelletier 252 May 17, 2022
An embeddable fulltext search engine. Groonga is the successor project to Senna.

README Groonga is an open-source fulltext search engine and column store. Reference manual See doc/source/ directory or http://groonga.org/docs/. Here

Groonga Project 678 Jun 19, 2022
ESE is an embedded / ISAM-based database engine, that provides rudimentary table and indexed access.

Extensible-Storage-Engine A Non-SQL Database Engine The Extensible Storage Engine (ESE) is one of those rare codebases having proven to have a more th

Microsoft 780 Jun 13, 2022
PGSpider: High-Performance SQL Cluster Engine for distributed big data.

PGSpider: High-Performance SQL Cluster Engine for distributed big data.

PGSpider 127 Jun 24, 2022
A very fast lightweight embedded database engine with a built-in query language.

upscaledb 2.2.1 Fr 10. Mär 21:33:03 CET 2017 (C) Christoph Rupp, [email protected]; http://www.upscaledb.com This is t

Christoph Rupp 531 Jun 20, 2022
An Embedded NoSQL, Transactional Database Engine

UnQLite - Transactional Embedded Database Engine

PixLab | Symisc Systems 1.7k Jun 23, 2022
dqlite is a C library that implements an embeddable and replicated SQL database engine with high-availability and automatic failover

dqlite dqlite is a C library that implements an embeddable and replicated SQL database engine with high-availability and automatic failover. The acron

Canonical 3k Jun 27, 2022