Query C++ codebases using SQLite

Overview

ClangQL: query C++ codebases using SQLite and clangd

What is it?

ClangQL is a proof-of-concept SQLite extension for querying C++ codebases that have been indexed using clangd.

How does it work?

It employs SQLite's virtual table system to act as an intermediary between SQLite and clangd's gRPC interface.

How do I use it?

Once the module has been built, you can load it in the sqlite3 CLI via the usual .load clangql.

Afterwards, you can connect to a codebase by instantiating the various virtual tables:

sqlite> CREATE VIRTUAL TABLE llvm_symbols USING clangql (symbols, clangd-index.llvm.org:5900);
sqlite> CREATE VIRTUAL TABLE llvm_base_of USING clangql (base_of, clangd-index.llvm.org:5900);
sqlite> CREATE VIRTUAL TABLE llvm_overridden_by USING clangql (overridden_by, clangd-index.llvm.org:5900);
sqlite> CREATE VIRTUAL TABLE llvm_refs USING clangql (refs, clangd-index.llvm.org:5900);

You can then query the codebase as if it were a regular table (some caveats apply; see "What works, what doesn't?" to learn more):

sqlite> SELECT Name, Scope, DefPath FROM llvm_symbols WHERE Name = 'Foo';
Name  Scope                                        DefPath
----  -------------------------------------------  --------------------------------------------------------
Foo   clang::clangd::                              clang-tools-extra/clangd/unittests/LSPBinderTests.cpp
Foo   STLExtras_MoveRange_Test::TestBody()::Foo::  llvm/unittests/ADT/STLExtrasTest.cpp
Foo   STLExtras_MoveRange_Test::TestBody()::Foo::  llvm/unittests/ADT/STLExtrasTest.cpp
Foo   SizelessTypeTester::                         clang/unittests/AST/SizelessTypesTest.cpp
Foo                                                llvm/unittests/ADT/TypeTraitsTest.cpp
Foo                                                llvm/unittests/ADT/STLExtrasTest.cpp
Foo   Class::                                      lldb/unittests/Utility/ReproducerInstrumentationTest.cpp
Foo   llvm::orc::CoreAPIsBasedStandardTest::       llvm/unittests/ExecutionEngine/Orc/OrcTestCommon.h
Foo   llvm::TrailingObjects::                      llvm/include/llvm/Support/TrailingObjects.h
Foo                                                llvm/unittests/Support/BinaryStreamTest.cpp
Foo   STLExtras_MoveRange_Test::TestBody()::Foo::  llvm/unittests/ADT/STLExtrasTest.cpp
Foo                                                llvm/unittests/Support/BinaryStreamTest.cpp
Foo   clang::                                      clang/unittests/AST/ASTTypeTraitsTest.cpp
Foo                                                lldb/unittests/Utility/ReproducerInstrumentationTest.cpp

As another example, finding all the subclasses of a particular class:

sqlite> SELECT subclass.Name, subclass.Scope, subclass.DefPath FROM llvm_symbols AS superclass INNER JOIN llvm_base_of AS rel ON rel.Subject = superclass.Id INNER JOIN llvm_symbols AS subclass ON subclass.Id = rel.Object WHERE superclass.Name = 'MCAsmInfo';
Name               Scope   DefPath
-----------------  ------  ---------------------------------------------------
NVPTXMCAsmInfo     llvm::  llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXMCAsmInfo.h
MCAsmInfoWasm      llvm::  llvm/include/llvm/MC/MCAsmInfoWasm.h
BPFMCAsmInfo       llvm::  llvm/lib/Target/BPF/MCTargetDesc/BPFMCAsmInfo.h
MockedUpMCAsmInfo          llvm/unittests/MC/SystemZ/SystemZAsmLexerTest.cpp
AVRMCAsmInfo       llvm::  llvm/lib/Target/AVR/MCTargetDesc/AVRMCAsmInfo.h
MCAsmInfoXCOFF     llvm::  llvm/include/llvm/MC/MCAsmInfoXCOFF.h
MCAsmInfoDarwin    llvm::  llvm/include/llvm/MC/MCAsmInfoDarwin.h
HackMCAsmInfo              llvm/unittests/CodeGen/TestAsmPrinter.cpp
MCAsmInfoELF       llvm::  llvm/include/llvm/MC/MCAsmInfoELF.h
MCAsmInfoCOFF      llvm::  llvm/include/llvm/MC/MCAsmInfoCOFF.h

Searching for all declarations inside the std namespace:

sqlite> SELECT decl.Name FROM llvm_symbols AS decl INNER JOIN llvm_refs AS ref ON ref.SymbolId = decl.Id WHERE decl.Scope = 'std::' AND ref.Declaration = 1;
Name
---------------------
align_val_t
__unexpected
is_execution_policy
__terminate
is_execution_policy_v

In general, four different virtual tables can be queried for each codebase:

  • a symbols table, with information about every symbol in the codebase
  • a base_of table, recording which symbols are base classes of which other symbols
  • an overridden_by table, recording which symbols are overridden by which other symbols
  • a refs table, with information about symbol references

The syntax for instantiating the tables is the following:

CREATE VIRTUAL TABLE my_symbols USING clangql (symbols, host:port);
CREATE VIRTUAL TABLE my_base_of USING clangql (base_of, host:port);
CREATE VIRTUAL TABLE my_overridden_by USING clangql (overridden_by, host:port);
CREATE VIRTUAL TABLE my_refs USING clangql (refs, host:port);

The my_* names are not important and can be anything. The first parameter to the virtual table constructor selects the kind of table and must be left as-is. The second parameter is the connection string. Currently, only unencrypted gRPC connections are supported.
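
The same setup can be scripted. The following is a minimal sketch using Python's standard sqlite3 module, assuming a Python build that allows loadable extensions and that the built module can be found under the name clangql. The helper names create_statements and connect are made up for this illustration and are not part of ClangQL.

```python
import sqlite3

# The four table kinds named above; they must be spelled exactly like this.
TABLE_KINDS = ("symbols", "base_of", "overridden_by", "refs")


def create_statements(prefix: str, endpoint: str) -> list[str]:
    """Build one CREATE VIRTUAL TABLE statement per table kind."""
    return [
        f"CREATE VIRTUAL TABLE {prefix}_{kind} USING clangql ({kind}, {endpoint});"
        for kind in TABLE_KINDS
    ]


def connect(prefix: str, endpoint: str, extension_path: str = "clangql") -> sqlite3.Connection:
    """Load the extension and create the four tables.

    Assumption: extension_path points at the built ClangQL module and the
    clangd index server at `endpoint` is reachable.
    """
    conn = sqlite3.connect(":memory:")
    conn.enable_load_extension(True)
    conn.load_extension(extension_path)
    conn.enable_load_extension(False)
    for statement in create_statements(prefix, endpoint):
        conn.execute(statement)
    return conn


if __name__ == "__main__":
    # Only prints the statements; nothing is loaded or contacted here.
    for statement in create_statements("llvm", "clangd-index.llvm.org:5900"):
        print(statement)
```

Calling connect("llvm", "clangd-index.llvm.org:5900") would then give a connection on which the llvm_* tables from the examples above can be queried.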

What's the schema?

The schema of symbols tables is equivalent to the following:

CREATE TABLE vtable(Id TEXT, Name TEXT, Scope TEXT,
  Signature TEXT, Documentation TEXT, ReturnType TEXT,
  Type TEXT, DefPath TEXT, DefStartLine INT, DefStartCol INT,
  DefEndLine INT, DefEndCol INT, DeclPath TEXT,
  DeclStartLine INT, DeclStartCol INT, DeclEndLine INT, DeclEndCol INT,
  Kind INT, SubKind INT, Language INT,
  Generic INT, TemplatePartialSpecialization INT, TemplateSpecialization INT,
  UnitTest INT, IBAnnotated INT, IBOutletCollection INT, GKInspectable INT,
  Local INT, ProtocolInterface INT)

A textual representation for the Kind, SubKind and Language columns can be obtained using the symbol_kind, symbol_subkind and symbol_language functions.

Currently, the columns from Generic to ProtocolInterface are always 0, because for some reason the server always sends a zero-valued properties field.

The schema for base_of is the same as overridden_by, and is equivalent to the following:

CREATE TABLE vtable(Subject TEXT, Object TEXT)

The meaning is as follows: if a row (S, O) is present in base_of, then S is a base class of O; if a row (S, O) is present in overridden_by, then S has been overridden by O.

Please note that these two tables can only be queried by their Subject; querying by Object is not possible due to limitations in the clangd protocol.
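
To make the direction of the (Subject, Object) pairs concrete, here is a small sketch that runs without the extension. It uses ordinary SQLite tables as stand-ins for the virtual ones. The demo_* table names, the ids and the class names are all invented for the illustration, and only the columns needed by the query are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Ordinary tables standing in for the virtual ones (assumption: same column
# names as the real schema, reduced to the columns this query needs).
conn.executescript("""
    CREATE TABLE demo_symbols(Id TEXT, Name TEXT, Scope TEXT);
    CREATE TABLE demo_base_of(Subject TEXT, Object TEXT);
    INSERT INTO demo_symbols VALUES
        ('id-shape', 'Shape', 'demo::'),
        ('id-circle', 'Circle', 'demo::'),
        ('id-square', 'Square', 'demo::');
    -- A row (S, O) means: S is a base class of O.
    INSERT INTO demo_base_of VALUES
        ('id-shape', 'id-circle'),
        ('id-shape', 'id-square');
""")


def subclasses_of(conn: sqlite3.Connection, name: str) -> list[str]:
    """Names of the direct subclasses of every symbol called `name`."""
    rows = conn.execute(
        """
        SELECT sub.Name
        FROM demo_symbols AS base
        INNER JOIN demo_base_of AS rel ON rel.Subject = base.Id
        INNER JOIN demo_symbols AS sub ON sub.Id = rel.Object
        WHERE base.Name = ?
        ORDER BY sub.Name
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


print(subclasses_of(conn, "Shape"))  # ['Circle', 'Square']
```

The join starts from the base class and constrains rel.Subject, which is the only direction the real tables support. With the stand-in tables the opposite direction would also work, but against ClangQL it would not.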

The schema of refs tables is equivalent to the following:

CREATE TABLE vtable(SymbolId TEXT, Declaration INT,
  Definition INT, Reference INT, Spelled INT,
  Path TEXT, StartLine INT, StartCol INT,
  EndLine INT, EndCol INT)

Please note that querying refs without a SymbolId will return 0 rows.
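
As an illustration of a query shape that always pins SymbolId, the sketch below counts plain references per file. It again uses an ordinary table as a stand-in, with made-up rows and a hypothetical demo_refs name. Unlike the real refs table, the stand-in would happily return rows without a SymbolId.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for a refs table, using the column list from the schema above.
conn.executescript("""
    CREATE TABLE demo_refs(SymbolId TEXT, Declaration INT,
      Definition INT, Reference INT, Spelled INT,
      Path TEXT, StartLine INT, StartCol INT,
      EndLine INT, EndCol INT);
    INSERT INTO demo_refs VALUES
        ('id-foo', 1, 1, 0, 1, 'src/foo.cpp', 10, 5, 10, 8),
        ('id-foo', 0, 0, 1, 1, 'src/main.cpp', 3, 1, 3, 4),
        ('id-foo', 0, 0, 1, 1, 'src/main.cpp', 7, 9, 7, 12),
        ('id-bar', 0, 0, 1, 1, 'src/main.cpp', 8, 1, 8, 4);
""")


def references_per_file(conn: sqlite3.Connection, symbol_id: str) -> list[tuple[str, int]]:
    """Count plain references to one symbol, grouped by file path."""
    return conn.execute(
        """
        SELECT Path, COUNT(*)
        FROM demo_refs
        WHERE SymbolId = ? AND Reference = 1
        GROUP BY Path
        ORDER BY Path
        """,
        (symbol_id,),
    ).fetchall()


print(references_per_file(conn, "id-foo"))  # [('src/main.cpp', 2)]
```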

How do I build it?

ClangQL is built with CMake and depends on Protocol Buffers and gRPC. On Windows I used vcpkg to manage these two dependencies. I'm afraid I'm not knowledgeable enough about Linux or macOS to give precise build instructions for them, but I'm guessing that as long as the correct development packages are installed and visible on your system, CMake will be able to locate them.

Once the repository is cloned, run:

cmake -B build -S . -DCMAKE_BUILD_TYPE=Release -DVCPKG_TARGET_TRIPLET=x64-windows-static-md -DCMAKE_TOOLCHAIN_FILE=D:/vcpkg/scripts/buildsystems/vcpkg.cmake

to configure the build. Adjust CMAKE_BUILD_TYPE, VCPKG_TARGET_TRIPLET, CMAKE_TOOLCHAIN_FILE and the generator type to best suit your system and needs. Please note that the sqlite3 CLI tool and the extension must have the same bitness, at least on Windows. A 32-bit CLI (such as the precompiled one from SQLite.org) will not load a 64-bit extension.
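
If you are unsure which bitness a Windows binary has, the PE header can be inspected directly. The following is a small sketch that only distinguishes x86 from x64 files. The function name pe_bitness is made up for the illustration, and the file names in the usage note are assumptions about your local setup.

```python
import struct

# IMAGE_FILE_HEADER.Machine values from the PE/COFF format.
MACHINE_BITNESS = {0x014C: 32, 0x8664: 64}  # x86, x64


def pe_bitness(path: str) -> int:
    """Return 32 or 64 for an x86/x64 PE file (EXE or DLL)."""
    with open(path, "rb") as f:
        if f.read(2) != b"MZ":
            raise ValueError("not a PE file: missing MZ header")
        # e_lfanew, stored at offset 0x3C, holds the offset of the PE signature.
        f.seek(0x3C)
        (pe_offset,) = struct.unpack("<I", f.read(4))
        f.seek(pe_offset)
        if f.read(4) != b"PE\0\0":
            raise ValueError("not a PE file: missing PE signature")
        # The Machine field follows the signature immediately.
        (machine,) = struct.unpack("<H", f.read(2))
    if machine not in MACHINE_BITNESS:
        raise ValueError(f"unsupported machine type 0x{machine:04X}")
    return MACHINE_BITNESS[machine]
```

For example, comparing pe_bitness("sqlite3.exe") with pe_bitness("clangql.dll") (both paths being placeholders) shows whether the CLI and the extension match.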

Once configured, run:

cmake --build build

to compile the extension. The first build will take a long time (on Windows), because Protobuf and gRPC have to be compiled as well. Later builds will be faster.

I have also uploaded precompiled 32- and 64-bit DLLs for Windows as a GitHub release.

What constraints are available?

On symbols tables, the following constraints will generate more specific requests to the clangd server:

  • Equality on Id, Name or Scope
  • LIKE on Name, Scope, DefPath or DeclPath

The LIKE constraint on Name relies on the fuzzy search semantics of clangd. The LIKE constraint on Scope enables the any_scope field of the fuzzy find request to clangd. The LIKE constraint on DefPath or DeclPath only populates the proximity_path of the fuzzy find request, which in turn prioritizes symbols declared or defined near the specified path.

On base_of and overridden_by tables, only equality on Subject generates specific queries to the server.

On refs tables, only equality on SymbolId generates specific queries to the server.
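
The mapping from constraints on a symbols table to request fields can be pictured roughly as follows. This is a sketch of the idea only, not ClangQL's actual code: the SymbolRequest class, the plan function and the ids, query, scopes and client_side fields are hypothetical. Only any_scope and proximity_path are names taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolRequest:
    """Hypothetical request shape; not ClangQL's or clangd's real types."""
    ids: list[str] = field(default_factory=list)             # assumed: Id lookups
    query: str = ""                                          # assumed: name to search
    scopes: list[str] = field(default_factory=list)          # assumed: exact scopes
    any_scope: bool = False                                  # named in the text
    proximity_path: list[str] = field(default_factory=list)  # named in the text
    client_side: list[tuple[str, str, str]] = field(default_factory=list)


def plan(constraints: list[tuple[str, str, str]]) -> SymbolRequest:
    """Split (column, operator, value) constraints into server-side request
    fields and a remainder that has to be filtered client side."""
    req = SymbolRequest()
    for column, op, value in constraints:
        if column == "Id" and op == "=":
            req.ids.append(value)
        elif column == "Name" and op in ("=", "LIKE"):
            req.query = value
        elif column == "Scope" and op == "=":
            req.scopes.append(value)
        elif column == "Scope" and op == "LIKE":
            req.any_scope = True
        elif column in ("DefPath", "DeclPath") and op == "LIKE":
            req.proximity_path.append(value)
        else:
            # Everything else is scanned after the rows come back.
            req.client_side.append((column, op, value))
    return req


print(plan([("Name", "LIKE", "Foo"), ("Kind", "=", "5")]))
```

Anything that ends up in client_side is the part of a query that, per the notes in this README, is potentially slow.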

What works, what doesn't?

There is currently no way to, e.g., obtain all possible relations between two symbols, so the relation tables are really only useful in joins. This is not a huge deal, as they are meant to be used that way anyway, but you still need to be careful when writing queries.

Not all queries are equally fast: querying on symbol Id, Name or Scope is fast; everything else has to happen client side and is potentially slow.

Similarly, when querying the base_of or overridden_by relations, only one of the two directions is possible; the other is currently unavailable due to protocol limitations. Also, not specifying a Subject will result in 0 rows being returned.

Querying refs without specifying a SymbolId will produce 0 rows. Specifying any one of Definition, Declaration, Reference or Spelled will generate more specific requests to the clangd server; all other fields are scanned client side.

Error checking is nonexistent. This is not ready for production use and was mostly made for fun, to explore to what extent the clangd interface is suitable for use with SQLite, and to learn about the SQLite virtual table system.

Comments
  • Extracting all symbols from a remote source

    Hello Francesco, I'm taking the liberty of writing in Italian; I hope that is not a problem. First of all, congratulations on the project! I was wondering whether it is possible to extract all the symbols from a remote source. I noticed that count(*) on the table returns 9998 rows, and I was wondering whether that is a server-side or a client-side limit. This is what I would like to do:

    .load clangql
    CREATE VIRTUAL TABLE llvm_symbols USING clangql (symbols, android.clangd-index.chromium.org:5900);
    CREATE VIRTUAL TABLE llvm_base_of USING clangql (base_of, android.clangd-index.chromium.org:5900);
    CREATE VIRTUAL TABLE llvm_overridden_by USING clangql (overridden_by, android.clangd-index.chromium.org:5900);
    CREATE VIRTUAL TABLE llvm_refs USING clangql (refs, android.clangd-index.chromium.org:5900);
    
    .headers on
    .mode csv
    .output c:/temp/symbols.txt
    SELECT * FROM llvm_symbols;
    

    Many thanks in advance.

    opened by uazo 4
  • Building for Linux

    Hello Francesco, if you like, here is the script to build on Linux using the GitHub runner. The only change I had to make is this one, since std::exception appears to be a Microsoft-specific extension. Thanks again for the excellent work.

    opened by uazo 0
Releases(v0.3)
Owner
Francesco Bertolaccini
I am an undergraduate CompSci student from Italy. I love all things involving flowing electrons including music, electronics and PCs.