Overview

CLP

Compressed Log Processor (CLP) is a tool capable of losslessly compressing text logs and searching the compressed logs without decompression. To learn more about it, you can read our paper.

Getting Started

You can download a release from the releases page, or you can build the latest version using the packager.

Project Structure

CLP is currently split across a few different components in the components directory:

  • clp-py-utils contains Python utilities common to several of the other components.
  • compression-job-handler contains code to submit compression jobs to a cluster.
  • core contains code to compress uncompressed logs, decompress compressed logs, and search compressed logs.
  • job-orchestration contains code to schedule compression jobs on the cluster.
  • package-template contains the base directory structure and files of the CLP package.

Packages

The packages held by this repository are:

  1. Docker Image clp/clp-core-dependencies-x86-ubuntu-focal
    • A Docker image containing all the necessary dependencies to build CLP core in an Ubuntu Focal x86 environment
  2. Docker Image clp/clp-core-dependencies-x86-ubuntu-bionic
    • A Docker image containing all the necessary dependencies to build CLP core in an Ubuntu Bionic x86 environment
  3. Docker Image clp/clp-core-dependencies-x86-centos7.4
    • A Docker image containing all the necessary dependencies to build CLP core in a CentOS 7.4 x86 environment
  4. Docker Image clp/clp-execution-x86-ubuntu-focal
    • A Docker image containing all the necessary dependencies to run the full CLP package in an x86 environment

Next Steps

This is our open-source release, which we will constantly update with bug fixes, features, etc. If you would like a feature or want to report a bug, please file an issue and we'll be happy to engage. We also welcome any contributions!

Issues
  • [ERROR] [clp] Unable to connect to the database with the provided credentials

    When I run ./sbin/start-clp --uncompressed-logs-dir <directory containing your uncompressed logs>, it reports an error like the one in the title. The detailed execution output is as follows:

    $ ./sbin/start-clp --uncompressed-logs-dir ./myTestData/input/
    2021-10-24 21:25:28,315 [INFO] [clp] Using default config file at etc/clp-config.yaml
    2021-10-24 21:25:28,320 [INFO] [clp] Provision docker network bridge
    2021-10-24 21:25:28,634 [INFO] [clp] Starting CLP scheduler
    2021-10-24 21:25:28,634 [INFO] [clp] Starting scheduler mariadb database
    2021-10-24 21:25:34,037 [INFO] [clp] Starting scheduler queue
    2021-10-24 21:25:39,313 [INFO] [clp] Initializing scheduler queue
    2021-10-24 21:25:40,831 [INFO] [clp] Initializing scheduler database tables
    2021-10-24 21:26:05,825 [ERROR] [clp] Unable to connect to the database with the provided credentials
    2021-10-24 21:26:05,826 [ERROR] [clp] 
    2021-10-24 21:26:05,826 [ERROR] [clp] Failed to provision "clp-mini-cluster"
    
    opened by charleswu52 11
  • Compression: unable to mount uncompressed logs directory in the container

    Hello, I am unable to compress the log file because the path to the uncompressed logs is not mounted in the container. Please see the logs below for further inspection. CLP starts successfully; uncomp_logs is the <uncompressed-logs-dir>.

    $ ./clp-package-ubuntu-focal-x86_64-v0.0.1/sbin/start-clp --uncompressed-logs-dir uncomp_logs/
    2021-11-02 11:25:29,892 [INFO] [clp] Using default config file at clp-package-ubuntu-focal-x86_64-v0.0.1/etc/clp-config.yaml
    2021-11-02 11:25:29,898 [INFO] [clp] Provision docker network bridge
    2021-11-02 11:25:30,102 [INFO] [clp] Starting CLP scheduler
    2021-11-02 11:25:30,103 [INFO] [clp] Starting scheduler mariadb database
    2021-11-02 11:25:33,941 [INFO] [clp] Starting scheduler queue
    2021-11-02 11:25:39,687 [INFO] [clp] Initializing scheduler queue
    2021-11-02 11:25:41,374 [INFO] [clp] Initializing scheduler database tables
    2021-11-02 15:25:41,608 [INFO] Successfully created clp metadata tables for compression and search
    2021-11-02 15:25:41,835 [INFO] Successfully created compression_jobs and compression_tasks orchestration tables
    2021-11-02 11:25:41,851 [INFO] [clp] Starting scheduler service
    2021-11-02 11:25:41,960 [INFO] [clp] Starting CLP worker
    

    Upon compressing, it gives the following error:

    $ ./clp-package-ubuntu-focal-x86_64-v0.0.1/sbin/compress uncomp_logs/auth.log 
    2021-11-02 11:26:23,597 [INFO] [clp] Using default config file at clp-package-ubuntu-focal-x86_64-v0.0.1/etc/clp-config.yaml
    2021-11-02 15:26:23,842 [INFO] [compress] Compression job submitted to compression-job-handler.
    2021-11-02 15:26:23,842 [INFO] [compression-job-handler] compression-job-handler started.
    2021-11-02 15:26:23,863 [INFO] [job-8] Iterating and partitioning files into tasks.
    2021-11-02 15:26:23,863 [ERROR] [job-8] "/opt/clp/uncomp_logs/auth.log" does not exist.
    2021-11-02 15:26:23,871 [INFO] [job-8] Waiting for 0 task(s) to finish.
    

    Indeed, the mentioned directory does not exist in the container:

    [email protected]:/opt/clp# ls   
    LICENSE  README.md  bin  etc  lib  requirements-pre-3.7.txt  sbin  var
    

    I checked the permissions of the uncompressed-logs directory and it is user:docker. Is there anything else I need to check?

    opened by BasantaChaulagain 6
  • Use ArrayBackedSet to replace std::set for index in segment

    References

    N/A

    Description

    1. Introduced a new data structure, ArrayBackedIntPosSet, which replaces std::unordered_set for tracking which IDs have occurred in a segment. The new data structure wraps a vector<bool> that uses 1 bit per ID. Compared to std::unordered_set, ArrayBackedIntPosSet consumes significantly less memory and achieves similar performance.
    2. Removed the variable-ID set from the encoded file object. Instead, IDs are added to the segment index as each message is encoded. For files that don't start with timestamps, we don't know whether the file will end up in the segment for files with timestamps or the segment for files without, so this change adds a temporary ID holder in the archive to handle that case.
    3. Embedded the file object into the archive object to enforce that only one file can be compressed at a time.
    4. Updated make-dictionaries-readable to dump the segment index as well.
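    For illustration, the data structure in point 1 might look roughly like the following. This is a minimal sketch with hypothetical member names (the real ArrayBackedIntPosSet's interface may differ); the key idea is trading std::unordered_set's per-entry node overhead for one bit per possible ID:

```cpp
#include <cstddef>
#include <vector>

// Sketch of a bit-vector-backed set of non-negative integer IDs.
// Membership costs 1 bit per ID in [0, capacity), versus tens of
// bytes per entry for a node-based std::unordered_set<size_t>.
class ArrayBackedIntPosSet {
public:
    explicit ArrayBackedIntPosSet(size_t capacity) : m_bits(capacity, false) {}

    void insert(size_t id) {
        // Grow on demand if an ID exceeds the initial capacity
        if (id >= m_bits.size()) { m_bits.resize(id + 1, false); }
        if (!m_bits[id]) {
            m_bits[id] = true;
            ++m_size;
        }
    }

    bool contains(size_t id) const { return id < m_bits.size() && m_bits[id]; }
    size_t size() const { return m_size; }

private:
    std::vector<bool> m_bits;  // 1 bit per possible ID (space-optimized specialization)
    size_t m_size{0};          // number of distinct IDs inserted
};
```

    Since segment ID spaces are dense (IDs are assigned sequentially by the dictionaries), almost every bit position is meaningful, which is why the dense representation wins over hashing here.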

    Validation performed

    Ran compression locally on var-logs, openstack-24hrs, hadoop-24hrs. Confirmed that the output is correct and the RSS usage & performance match expectations.

    The following is the change in max RSS (bytes):

    • openstack-24hrs: 1,163,071,488 -> 902,053,888 (~22% saving)
    • spark-hibench: 1,019,035,648 -> 974,655,488 (~4% saving)
    • hadoop-24hrs: 80,744,448 -> 76,058,624 (~5.8% saving)
    • var-log: 436,318,208 -> 365,416,448 (~16% saving)

    opened by haiqi96 4
  • Source Compile & Test

    I have a question; I don't know if you can answer it. I found that CLP uses Docker to run the program, and it uses a MySQL database, but the core source code is implemented in C++. I want to know how to compile and test the CLP algorithm using only the C++ code. Thank you! @kirkrodrigues @jackluo923

    opened by charleswu52 4
  • Fail to compile CLP

    I have run into another compilation problem when I try to execute the "make" command. It reports a missing "xmlReaderForIO" symbol in libarchive.a, and I wonder why, since I have "libxml2.a" installed in my lib directory.

    opened by Program-Bear 4
  • Centos7.4 build failed

    All dependencies have been successfully compiled and installed, but an error is reported when make is executed. The error message is as follows:

    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /usr/local/lib/libarchive.a(archive_read_support_filter_lz4.c.o): in function `lz4_filter_read_legacy_stream':
    archive_read_support_filter_lz4.c:(.text+0x1f9): undefined reference to `LZ4_decompress_safe'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /usr/local/lib/libarchive.a(archive_read_support_filter_lz4.c.o): in function `lz4_filter_read_default_stream':
    archive_read_support_filter_lz4.c:(.text+0x575): undefined reference to `LZ4_decompress_safe_usingDict'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: archive_read_support_filter_lz4.c:(.text+0x79e): undefined reference to `LZ4_decompress_safe'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /usr/local/lib/libarchive.a(archive_cryptor.c.o): in function `aes_ctr_release':
    archive_cryptor.c:(.text+0x18): undefined reference to `EVP_CIPHER_CTX_free'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /usr/local/lib/libarchive.a(archive_cryptor.c.o): in function `aes_ctr_init':
    archive_cryptor.c:(.text+0x4e): undefined reference to `EVP_CIPHER_CTX_new'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: archive_cryptor.c:(.text+0x89): undefined reference to `EVP_aes_192_ecb'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: archive_cryptor.c:(.text+0xb7): undefined reference to `EVP_CIPHER_CTX_init'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: archive_cryptor.c:(.text+0xc9): undefined reference to `EVP_aes_128_ecb'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: archive_cryptor.c:(.text+0xd9): undefined reference to `EVP_aes_256_ecb'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /usr/local/lib/libarchive.a(archive_cryptor.c.o): in function `aes_ctr_update':
    archive_cryptor.c:(.text+0x2f9): undefined reference to `EVP_EncryptInit_ex'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: archive_cryptor.c:(.text+0x318): undefined reference to `EVP_EncryptUpdate'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /usr/local/lib/libarchive.a(archive_hmac.c.o): in function `__hmac_sha1_cleanup':
    archive_hmac.c:(.text+0x10): undefined reference to `HMAC_CTX_cleanup'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /usr/local/lib/libarchive.a(archive_hmac.c.o): in function `__hmac_sha1_final':
    archive_hmac.c:(.text+0x48): undefined reference to `HMAC_Final'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /usr/local/lib/libarchive.a(archive_hmac.c.o): in function `__hmac_sha1_init':
    archive_hmac.c:(.text+0x95): undefined reference to `EVP_sha1'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: archive_hmac.c:(.text+0xa9): undefined reference to `HMAC_Init_ex'
    /opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /usr/local/lib/libarchive.a(archive_hmac.c.o): in function `__hmac_sha1_update':
    archive_hmac.c:(.text+0x64): undefined reference to `HMAC_Update'
    collect2: error: ld returned 1 exit status
    make[2]: *** [clp] Error 1
    make[1]: *** [CMakeFiles/clp.dir/all] Error 2
    make: *** [all] Error 2

    opened by woxiang-H 2
  • Can not build on redHat 7

    I am trying to build CLP on a Linux server (Red Hat 7), but I encountered a CMake (version 3.21.1) build failure. The debug messages indicate it could not find Boost (missing iostreams, program_options, filesystem, system). But I have libboost-iostreams.a, libboost-program_options.a, libboost-filesystem.a and libboost-system.a (all version 1.59.0) under /usr/include/lib/, and I wonder why CMake could not find them. I am looking forward to a solution, thanks.

    opened by Program-Bear 2
  • File optimization

    References

    N/A

    Description

    1. Removed two string member variables from the file object. The getter now directly calls the Boost API instead of keeping the temporary strings in the class.
    2. Turned the statically allocated page allocator object into a dynamically allocated one. This allows CLP to free up more memory when marking a file as closed.

    Validation performed

    Ran some local testing and verified that max RSS memory usage went down.

    opened by haiqi96 1
  • Added workflows to build and push images for CLP core's dependencies and CLP's execution environment

    Changes

    Created two workflows, triggered by changes to the respective Dockerfiles and scripts, to build Docker images:

    1. With all the necessary dependencies to build CLP core
    2. With all the necessary tools to run the entire CLP package

    Both images are pushed to GitHub Packages with the names clp-core-dependencies-x86-ubuntu-focal and clp-execution-x86-ubuntu-focal.

    Validation

    Tested the workflows on a personal fork; they run successfully: images are built and pushed to GitHub Packages.

    opened by NamanGulati 1
  • Bugfix: Extract dependency installation from CLP core's library installation scripts; Replace usage of wget with curl.

    References

    Fixes bug in #57

    Description

    • Rely on users / containers installing the necessary dependencies for the library installation scripts rather than having them in the scripts themselves. This is because each distro may have different package managers, etc., which makes the scripts unwieldy.
    • Replace wget with curl since curl is preinstalled in some Linux distributions.

    Validation performed

    • Built all docker images successfully.
    • Verified CLP's core builds in each dependency container.
    • Verified CLP runs in the execution container.
    opened by kirkrodrigues 0
  • Refactor library installation scripts and add ability to specify output directory for generated .deb package.

    Description

    • Refactor CLP core's library installation scripts to be simpler
    • Add the ability to specify an output directory for the .deb package generated by building the package (useful to avoid recompilation on a different machine)

    Validation performed

    • Built clp-env-base:focal docker image and verified that CLP's core could be built within it.
    opened by kirkrodrigues 0
  • Refactor Workflows & block docker push on PR

    References

    Description

    • GitHub build workflows have been failing on every PR because the local clp-docker-build-push-action, which is used across several workflows, builds a Docker image and tries to push it to ghcr. Pushing to the registry from a PR is blocked, causing the workflows to fail.
    • The generate-build-dependency-image workflow currently takes upwards of 40 minutes to finish. Reorganized the workflow's build steps to execute in parallel; it now takes 20 minutes to finish.

    Validation performed

    • Using this PR to validate no docker image push on PRs
    • https://github.com/NamanGulati/clp/actions/runs/2156157051
    opened by NamanGulati 0
  • Change ErrorCode enum to an enum class

    References

    Description

    Changed ErrorCode enum to an enum class in ErrorCode.hpp
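    As a generic illustration of what this change buys (these are not CLP's actual enumerator definitions), a scoped enum removes implicit conversions and scope pollution:

```cpp
#include <type_traits>

// Illustration only: a plain enum implicitly converts to int and leaks its
// enumerators into the enclosing scope; an enum class does neither, so
// accidentally mixing error codes with unrelated integers becomes a
// compile-time error.
enum PlainCode { PlainSuccess, PlainFailure };
enum class ErrorCode { Success, Failure };

static_assert(std::is_convertible<PlainCode, int>::value,
              "a plain enum converts to int implicitly");
static_assert(!std::is_convertible<ErrorCode, int>::value,
              "an enum class requires an explicit static_cast");
```

    The trade-off is that call sites must now qualify values (ErrorCode::Success instead of a bare enumerator) and use an explicit static_cast<int> wherever a numeric value is needed.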

    Validation performed

    Validated that compression and decompression operated as normal

    opened by SharafMohamed 0
  • How to pass custom delimiters, dictionary and non-dictionary schemas

    According to the paper, we can pass the following configs to CLP:

    1. delimiters
    2. dictionary_variables
    3. non_dictionary_variables

    But, AFAIU, there is currently no way to pass these to clg and clp.

    Can you help me if I miss anything? Thanks

    opened by kavirajk 6
Releases (v0.0.1)