A framework to monitor and improve the performance of PostgreSQL using Machine Learning methods.

Overview

pg_plan_inspector

pg_plan_inspector is being developed as a framework to monitor and improve the performance of PostgreSQL using Machine Learning methods.

One movie is worth a thousand words. I demonstrate that a tool provided by this framework shows the progress of a running query. Watch this: query-progress-example01.mp4

query-progress-example01.mp4

This is a POC model, and the primary purposes of this framework are:

  1. To implement an external module that monitors the state of the running queries. (Refer to [3].)
  2. To show that the PostgreSQL optimizer can be improved from providing feedback on the analysis of the executed plans.

To achieve these purposes, I am currently developing two modules: pg_query_plan and plan_analyzer.

alt text

The development of this framework has just begun. The movie shown above is the result of one of the by-products of developing these modules.

Terminology

  • Executed plan
    An executed plan is a plan that contains both estimated and actual values, i.e. it is the same as the result of the EXPLAIN command with ANALYZE option.
  • Query plan
    A query plan is a snapshot of the executed plan, which is taken by the pg_query_plan module. It also contains both estimated and actual values, and the actual values are the snapshot values when it is taken.
  • Machine Learning
    Many methods I learned in statistics as a student are now classified as Machine Learning methods. Thanks to the new AI era, I can use the word Machine Learning without hesitation.

Contents

  1. Supported Versions
  2. Installation and Usage
  3. Tentative Conclusion
  4. Future work
  5. Related works
  6. Limitations and Warning
    Version
    References
    Change Log

1. Supported Versions

This framework supports PostgreSQL versions 13 and 14 beta2.

2. Installation and Usage

This framework is composed of two modules: pg_query_plan and plan_analyzer.

The installations and their usages are described in README-pg_query_plan.md and README-plan_analyzer.md, respectively.

3. Tentative Conclusion

pg_query_plan

As shown in README-pg_query_plan.md, the pg_query_plan module can be monitored through the state of running queries. In other words, the first main purpose has been achieved by this module.

plan_analyzer

In README-tools.md, I introduced an example that a tool, called analyze.py, detects the functional dependency between the attributes of the table by applying a simple heuristics rule to the executed plans. This is an ad hoc; however, at least it shows that the second main purpose is not wrong and is possible.

As shown in query-progress-example01.mp4 and README-plan_analyzer.md, I have implemented the tool that shows the progress of a running query with some accuracy. More precisely, the tool gives relatively accurate results if the Plan Rows estimated by the optimizer can be corrected by a linear regression model, otherwise the results will be inaccurate, for example, if the distribution of the table data has changed considerably. In addition, the linear regression for correction of the estimated rows is generally often inaccurate when hash or merge joins are included. These causes are due to the validity limit of the linear regression model for the correction of the estimated rows.

Considering these, it is obvious that the essential task is to improve the cardinality estimation of the PostgreSQL optimizer.

4. Future Work

Step 1

In the first step, I will implement a bridge software between the plan_analyzer module and PostgreSQL server to feedback the analysis results of the executed plan to the optimizer.

A naive approach is that the plan_analyzer module sends the regression parameters stored in the repository to the PostgreSQL optimizer, and the optimizer corrects the estimated Plan Rows using the regression parameters. PostgreSQL fortunately already has a mechanism to intervene in the optimizer's processing and at least two modules have been made to improve the optimizer's processing for better results: pg_dbms_stats and pg_plan_advsr.

In addition, I will improve the algorithm of finding functional dependency in the first step.

Info:

pg_plan_advsr has already achieved one of the goals of this step using another approach.

Step 2

The approach mentioned above works to a certain extent if the linear regression model for the correction of the estimated rows is valid, otherwise it will not work.

Recently, lot of research to improve DBMS functions using AI technology[6] is being conducted, and hundreds of papers have been published on cardinality estimation. These state-of-the-art methods[7,9,8,1,2,5] are attempting to go beyond the traditional methods, such as using histograms. As PostgreSQL uses a traditional method, it would be worthwhile to add ML methods to the optimizer.

This framework will be able to provide feedback on the differences between the estimated cardinality (Plan Rows) and the actual cardinality in Step 1; therefore, the feedback can be used to improve learning.

alt text

5. Related Works

Two projects have similar features to the pg_query_plan module.

  1. pg_query_state also shows the query plan on the working backend. However, this module needs to modify the PostgreSQL core by patching. On the other hand, pg_query_plan is a pure module and does not need to modify the core.
  2. pg_show_plans shows the execution plans of all current running SQL statements. Unlike pg_query_plan and pg_query_state, pg_show_plans shows the execution plan that only contains the estimated values and not the actual values.

Query progress indicators were previously studied [4]. However, it seems that it is not currently being extensively researched.

Two interesting demonstrations will be shown at VLDB 2021: Demonstrations.

  • PostCENN: PostgreSQL with Machine Learning Models for Cardinality Estimation
  • DBMind: A Self-Driving Platform in openGauss

6. Limitations and Warning

6.1. Limitations

  1. These modules cannot work if the query handles the partitioned tables.
  2. These modules cannot work if the query has custom scans or foreign scans.
  3. These modules do not consider the effects of triggers.
  4. The pg_query_plan module is currently not available on standby.

6.2. Warning

Use this framework at your own risk.

  • With the pg_query_plan module, the performance of query processing is reduced by a few percent due to the overhead of data collection.
  • The values provided by the plan_analyzer module are estimates, not guaranteed to be 100% accurate.

Version

Version 0.1 (POC model)

References

[1] Andreas Kipf, et al. "Learned Cardinalities:Estimating Correlated Joins with Deep Learning". In CIDR, 2019.
[2] Benjamin Hilprecht, et al. "DeepDB: Learn from Data, not from Queries!". Proceedings of the VLDB, Vol. 13, Issue 7, March 2020, pages 992–1005.
[3] Lukas Fittl. "What's Missing for Postgres Monitoring". PGCon 2020.
[4] Patil L.V., and Mane Urmila P. "Survey on SQL Query Progress Indicator". International Journal of Engineering Research & Technology (IJERT) Vol. 2, Issue 3, March 2013.
[5] Rong Zhu, Ziniu Wu, et al. "FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation". arXiv preprint arXiv:2011.09022(2020).
[6] X. Zhou, C. Chai, G. Li, and J. Sun. "Database Meets Artificial Intelligence: A Survey". TKDE, 2020.
[7] Xiaoying Wang, et al. "Are We Ready For Learned Cardinality Estimation?". arXiv:2012.14743, December 2020.
[8] Ziniu Wu, et al. "BayesCard: Revitalizing Bayesian Networks for Cardinality Estimation". arXiv:2012.14743, December 2020.
[9] Zongheng Yang, et al. "NeuroCard: One Cardinality Estimator for All Tables". Proceedings of the VLDB, Vol. 14, Issue 1, September 2020.

Change Log

  • 28th July 2021: Version 0.1 Released.
Owner
suzuki hironobu
suzuki hironobu
YugabyteDB is a high-performance, cloud-native distributed SQL database that aims to support all PostgreSQL features

YugabyteDB is a high-performance, cloud-native distributed SQL database that aims to support all PostgreSQL features. It is best to fit for cloud-native OLTP (i.e. real-time, business-critical) applications that need absolute data correctness and require at least one of the following: scalability, high tolerance to failures, or globally-distributed deployments.

yugabyte 6.6k Jul 1, 2022
pgagroal is a high-performance protocol-native connection pool for PostgreSQL.

pgagroal is a high-performance protocol-native connection pool for PostgreSQL.

Agroal 524 Jun 15, 2022
High-performance time-series aggregation for PostgreSQL

PipelineDB has joined Confluent, read the blog post here. PipelineDB will not have new releases beyond 1.0.0, although critical bugs will still be fix

PipelineDB 2.5k Jun 24, 2022
A PostgreSQL extension providing an async networking interface accessible via SQL using a background worker and curl.

pg_net is a PostgreSQL extension exposing a SQL interface for async networking with a focus on scalability and UX.

Supabase 41 Jun 19, 2022
Modern cryptography for PostgreSQL using libsodium.

pgsodium pgsodium is an encryption library extension for PostgreSQL using the libsodium library for high level cryptographic algorithms. pgsodium can

Michel Pelletier 252 May 17, 2022
A friendly and lightweight C++ database library for MySQL, PostgreSQL, SQLite and ODBC.

QTL QTL is a C ++ library for accessing SQL databases and currently supports MySQL, SQLite, PostgreSQL and ODBC. QTL is a lightweight library that con

null 155 Jun 26, 2022
The official C++ client API for PostgreSQL.

libpqxx Welcome to libpqxx, the C++ API to the PostgreSQL database management system. Home page: http://pqxx.org/development/libpqxx/ Find libpqxx on

Jeroen Vermeulen 643 Jun 25, 2022
Prometheus exporter for PostgreSQL

pgexporter pgexporter is a Prometheus exporter for PostgreSQL. pgexporter will connect to one or more PostgreSQL instances and let you monitor their o

null 15 Apr 17, 2022
PostgreSQL extension for pgexporter

pgexporter_ext pgexporter_ext is an extension for PostgreSQL to provide additional Prometheus metrics for pgexporter. Features Disk space metrics See

null 4 Apr 13, 2022
The PostgreSQL client API in modern C++

C++ client API to PostgreSQL {#mainpage} Dmitigr Pgfe (PostGres FrontEnd, hereinafter referred to as Pgfe) - is a C++ client API to PostgreSQL servers

Dmitry Igrishin 134 Jun 3, 2022
C++ client library for PostgreSQL

Welcome to taoPQ taoPQ is a lightweight C++ client library for accessing a PostgreSQL➚ database. It has no dependencies beyond libpq➚, the C applicati

The Art of C++ 213 Jun 22, 2022
Backup / restore solution for PostgreSQL

pgmoneta pgmoneta is a backup / restore solution for PostgreSQL. pgmoneta is named after the Roman Goddess of Memory. Features Full backup Restore Sym

null 35 Jun 22, 2022
recovery postgresql table data by update/delete/rollback/dropcolumn command

recovery postgresql table data by update/delete/rollback/dropcolumn command

RadonDB 4 Mar 30, 2022
xxhash functions for PostgreSQL

pg_xxhash PostgreSQL ❤️ xxhash Tested with xxhash 0.8.1 and PostgreSQL 14.1 on Linux and macOS. Think twice before even considering to use it in any s

Igor Hatarist 5 Mar 11, 2022
Distributed PostgreSQL as an extension

What is Citus? Citus is a PostgreSQL extension that transforms Postgres into a distributed database—so you can achieve high performance at any scale.

Citus Data 6.8k Jun 28, 2022
Reliable PostgreSQL Backup & Restore

pgBackRest Reliable PostgreSQL Backup & Restore Introduction pgBackRest aims to be a reliable, easy-to-use backup and restore solution that can seamle

pgBackRest 1.2k Jun 25, 2022
upstream module that allows nginx to communicate directly with PostgreSQL database.

About ngx_postgres is an upstream module that allows nginx to communicate directly with PostgreSQL database. Configuration directives postgres_server

RekGRpth 1 Apr 29, 2022
Open Source Oracle Compatible PostgreSQL.

IvorySQL is advanced, fully featured, open source Oracle compatible PostgreSQL with a firm commitment to always remain 100% compatible and a Drop-in r

null 86 Jun 20, 2022
A high performance fiber RPC network framework. 高性能协程RPC网络框架

ACID: 高性能协程RPC框架 学习本项目需要有一定的C++,网络,RPC知识 项目依赖 1.项目用到了大量C++17/20新特性,如constexpr if的编译期代码生成,基于c++20 coroutine的无栈协程状态机解析 URI 和 HTTP 协议等。注意,必须安装g++-11,否则不支

null 183 Jun 26, 2022