C++ Benchmark Authoring Library/Framework

Overview

Celero

C++ Benchmarking Library

Copyright 2017-2019 John Farrier

Apache 2.0 License

Community Support

A Special Thanks to the following corporations for their support:

Builds and Testing

Branch Status
origin/master: Build Status (Master)
origin/develop: Build Status (Develop)

Celero has been successfully built on the following platforms during development. See Travis CI for more details.

  • GCC v6.0.0
  • GCC v7.0.0
  • GCC v8.0.0
  • LLVM v3.9.0
  • LLVM v5.0.1
  • LLVM v7.0.0
  • LLVM v8.0.0
  • Visual Studio 2017
  • Visual Studio 2019
  • XCode v10.1
  • XCode v10.3
  • XCode v11.0

Quality Control

Tooling Status
Codacy Codacy Badge
Statistics View on OpenHub

Overview

Developing consistent and meaningful benchmark results for code is a complicated task. Measurement tools exist (Intel® VTune™ Amplifier, SmartBear AQTime, Valgrind, etc.) external to applications, but they are sometimes expensive for small teams or cumbersome to utilize. This project, Celero, aims to be a small library which can be added to a C++ project and perform benchmarks on code in a way which is easy to reproduce, share, and compare among individual runs, developers, or projects. Celero uses a framework similar to that of GoogleTest to make its API more natural to use and integrate into a project. Make automated benchmarking as much a part of your development process as automated testing.

Celero uses CMake to provide cross-platform builds. It does require a modern compiler (Visual C++ 2012+, GCC 4.7+, Clang 2.9+) due to its use of C++11.

Once Celero is added to your project, you can create dedicated benchmark projects and source files. For convenience, there is a single header file and a CELERO_MAIN macro that can be used to provide a main() for your benchmark project that will automatically execute all of your benchmark tests.

Key Features

  • Supports Windows, Linux, and OSX using C++11.
  • The timing utilities can be used directly in production code (independent of benchmarks).
  • Console table output is formatted as Markdown to easily copy/paste into documents.
  • Archive results to track performance over time.
  • Integrates into CI/CT/CD environments with JUnit-formatted output.
  • User-defined Experiment Values can scale test results, sample sizes, and user-defined properties for each run.
  • User-defined Measurements allow for measuring anything in addition to timing.
  • Supports Test Fixtures.
  • Supports fixed-time benchmark baselines.
  • Capture a rich set of timing statistics to a file.
  • Easily installed using CMake, Conan, or vcpkg.

Command Line

<celeroOutputExecutable> [-g groupNameToRun] [-t resultsTable.csv] [-j junitOutputFile.xml] [-a resultArchive.csv] [-d numberOfIterationsPerDistribution] [-h]
  • -g Use this option to run only one benchmark group out of all benchmarks contained within a test executable.
  • -t Writes all results to a CSV file. Very useful when using problem sets to graph performance.
  • -j Writes JUnit formatted XML output. To utilize JUnit output, benchmarks must use the _TEST version of the macros and specify an expected baseline multiple. When the test exceeds this multiple, the JUnit output will indicate a failure.
  • -a Builds or updates an archive of historical results, tracking current, best, and worst results for each benchmark.
  • -d (Experimental) builds a plot of four different sample sizes to investigate the distribution of sample results.

Celero Basics

Background

The goal, generally, of writing benchmarks is to measure the performance of a piece of code. Benchmarks are useful for comparing multiple solutions to the same problem to select the most appropriate one. Other times, benchmarks can highlight the performance impact of design or algorithm changes and quantify them in a meaningful way.

By measuring code performance, you eliminate errors in your assumptions about what the "right" solution is for performance. Only through measurement can you confirm that using a lookup table, for example, is faster than computing a value. Such lore (which is often repeated) can lead to bad design decisions and, ultimately, slower code.

The goal of writing correct benchmarking code is to eliminate all of the noise and overhead, and measure just the code under test. Sources of noise in the measurements include clock resolution noise, operating system background operations, test setup/teardown, framework overhead, and other unrelated system activity.

At a theoretical level, we want to measure "t," the time to execute the code under test. In reality, we measure "t" plus all of this measurement noise.

These extraneous contributors to our measurement of "t" fluctuate over time. Therefore, we want to try to isolate "t". This is accomplished by taking many measurements but keeping only the smallest total: the smallest total is necessarily the one with the smallest noise contribution and closest to the actual time "t".

Once this measurement is obtained, it has little meaning in isolation. It is essential to create a baseline test by which to compare. A baseline should generally be a "classic" or "pure" solution to the problem on which you measure a solution. Once you have a baseline, you have a meaningful time to compare your algorithm against. Merely saying that your fancy sorting algorithm (fSort) sorted a million elements in 10 milliseconds is not sufficient by itself. However, compare that to a classic sorting algorithm baseline such as quicksort (qSort) and then you can say that fSort is 50% faster than qSort on a million elements. That is a meaningful and powerful measurement.

Implementation

Celero heavily utilizes C++11 features that are available in both Visual C++ 2012 and GCC 4.7. C++11 greatly aided in making the code clean and portable. To make adopting the code more manageable, all definitions needed by a user are defined in a celero namespace within a single include file: Celero.h.

Celero.h has within it the macro definitions that turn each of the user benchmark cases into its own unique class with the associated test fixture (if any) and then registers the test case within a Factory. The macros automatically associate baseline test cases with their associated test benchmarks so that, at run time, benchmark-relative numbers can be computed. This association is maintained by TestVector.

The TestVector utilizes the PImpl idiom to help hide implementation and keep the include overhead of Celero.h to a minimum.

Celero reports its outputs to the command line. Since colors are nice (and perhaps contribute to the human factors/readability of the results), something beyond std::cout was called for. Console.h defines a simple color function, SetConsoleColor, which is utilized by the functions in the celero::print namespace to nicely format the program's output.

Measuring benchmark execution time takes place in the TestFixture base class, from which all benchmarks are ultimately derived. First, the test fixture's setup code is executed. Then, the start time for the test is retrieved and stored in microseconds using an unsigned integer; this is done to reduce floating-point error. Next, the specified number of operations (iterations) is executed. When complete, the end time is retrieved, the test fixture is torn down, the measured time for the execution is returned, and the results are saved.

This cycle is repeated for however many samples were specified. If no samples were specified (zero), then the test is repeated until it has run for at least one second or at least 30 samples have been taken. While writing this specific part of the code, there was a definite "if-else" relationship, but the bulk of the code was repeated within the "if" and "else" sections. An old-fashioned function could have been used here, but it was very natural to utilize std::function to define a lambda that could be called and keep all of the code clean. (C++11 is a fantastic thing.) Finally, the results are printed to the screen.

General Program Flow

To summarize, this pseudo-code illustrates how the tests are executed internally:

for(Each Experiment)
{
    for(Each Sample)
    {
        // Call the virtual function
        // and DO NOT include its time in the measurement.
        experiment->setUp();

        // Start the Timer
        timer->start();

        // Run all iterations
        for(Each Iteration)
        {
            // Call the virtual function
            // and include its time in the measurement.
            experiment->onExperimentStart(x);

            // Run the code under test
            experiment->run(threads, iterations, experimentValue);

            // Call the virtual function
            // and include its time in the measurement.
            experiment->onExperimentEnd();
        }

        // Stop the Timer
        timer->stop();

        // Record data...

        // Call the virtual teardown function
        // and DO NOT include its time in the measurement.
        experiment->tearDown();
    }
}

Using the Code

Celero uses CMake to provide cross-platform builds. It does require a modern compiler (Visual C++ 2012 or GCC 4.7+) due to its use of C++11.

Once Celero is added to your project, you can create dedicated benchmark projects and source files. For convenience, there is a single header file and a CELERO_MAIN macro that can be used to provide a main() for your benchmark project that will automatically execute all of your benchmark tests.

Here is an example of a simple Celero Benchmark. (Note: This is a complete, runnable example.)

#include <celero/Celero.h>

#include <random>

#ifndef WIN32
#include <cmath>
#include <cstdlib>
#endif

///
/// This is the main(int argc, char** argv) for the entire celero program.
/// You can write your own, or use this macro to insert the standard one into the project.
///
CELERO_MAIN

std::random_device RandomDevice;
std::uniform_int_distribution<int> UniformDistribution(0, 1024);

///
/// In reality, all of the "Complex" cases take the same amount of time to run.
/// The difference in the results is a product of measurement error.
///
/// Interestingly, taking the sin of a constant number here resulted in a
/// great deal of optimization in clang and gcc.
///
BASELINE(DemoSimple, Baseline, 10, 1000000)
{
    celero::DoNotOptimizeAway(static_cast<float>(sin(UniformDistribution(RandomDevice))));
}

///
/// Run a test consisting of 1 sample of 710000 operations per measurement.
/// There are likely not enough samples here to get a meaningful result.
///
BENCHMARK(DemoSimple, Complex1, 1, 710000)
{
    celero::DoNotOptimizeAway(static_cast<float>(sin(fmod(UniformDistribution(RandomDevice), 3.14159265))));
}

///
/// Run a test consisting of 30 samples of 710000 operations per measurement.
/// There are not enough samples here to get a reasonable measurement.
/// It should get a Baseline number lower than the previous test.
///
BENCHMARK(DemoSimple, Complex2, 30, 710000)
{
    celero::DoNotOptimizeAway(static_cast<float>(sin(fmod(UniformDistribution(RandomDevice), 3.14159265))));
}

///
/// Run a test consisting of 60 samples of 710000 operations per measurement.
/// There are not enough samples here to get a reasonable measurement.
/// It should get a Baseline number lower than the previous test.
///
BENCHMARK(DemoSimple, Complex3, 60, 710000)
{
    celero::DoNotOptimizeAway(static_cast<float>(sin(fmod(UniformDistribution(RandomDevice), 3.14159265))));
}

The first thing we do in this code is to define a BASELINE test case. This template takes four arguments:

BASELINE(GroupName, BaselineName, Samples, Operations)
  • GroupName - The name of the benchmark group. This is used to batch together runs and results with their corresponding baseline measurement.
  • BaselineName - The name of this baseline for reporting purposes.
  • Samples - The total number of times you want to execute the given number of operations on the test code.
  • Operations - The total number of times you want to run the test code per sample.

Samples and operations here are used to measure very fast code. If, for example, you know the code in your benchmark will complete in under 100 milliseconds, the operations number tells Celero to execute the code that many times before taking a single measurement. Samples define how many such measurements to make.

Celero helps with this by allowing you to specify zero samples. Zero samples will tell Celero to make some statistically significant number of samples based on how long it takes to complete your specified number of operations. These numbers will be reported at run time.

The celero::DoNotOptimizeAway template is provided to ensure that the optimizing compiler does not eliminate your function or code. Since this feature is used in all of the sample benchmarks and their baseline, its time overhead cancels out in the comparisons.

After the baseline is defined, various benchmarks are then defined. The syntax for the BENCHMARK macro is identical to that of the BASELINE macro.

Results

Running Celero's simple example experiment (celeroDemoSimple.exe) benchmark gave the following output on a PC:

Celero
Timer resolution: 0.277056 us
|     Group      |   Experiment    |   Prob. Space   |     Samples     |   Iterations    |    Baseline     |  us/Iteration   | Iterations/sec  |
|:--------------:|:---------------:|:---------------:|:---------------:|:---------------:|:---------------:|:---------------:|:---------------:|
|DemoSimple      | Baseline        |            Null |              30 |         1000000 |         1.00000 |         0.09320 |     10729498.61 |
|DemoSimple      | Complex1        |            Null |               1 |          710000 |         0.99833 |         0.09305 |     10747479.64 |
|DemoSimple      | Complex2        |            Null |              30 |          710000 |         0.97898 |         0.09124 |     10959834.52 |
|DemoSimple      | Complex3        |            Null |              60 |          710000 |         0.98547 |         0.09185 |     10887733.66 |
Completed in 00:00:10.315012

The first test that executes will be the group's baseline. Celero took 30 samples of 1000000 iterations of the code in our test: each set of 1000000 iterations was measured, this was repeated 30 times, and the shortest time was kept. The "Baseline" value for the baseline measurement itself will always be 1.0.

After the baseline is complete, each individual test runs. Each test is executed and measured in the same way. However, there is an additional metric reported: Baseline. This compares the time it takes to run the benchmark to the time it takes to run the baseline. The data here shows that DemoSimple.Complex1 took 0.99833 times as long as the baseline; that is, essentially the same time, as expected.

Automatically computing the number of Iterations and Samples

If you do want Celero to figure out a reasonable number of iterations to run, you can set the iteration value to 0 for your experiment. You can also set the number of samples to 0 to have it compute a statistically valid number of samples. (Note that the current implementation uses 30 as the default number of samples, but does calculate a reasonable number of iterations.)

Update the previous "DemoSimple" code's Complex1 case to use this feature as follows:

/// Run a test consisting of 0 samples of 0 iterations per measurement.
/// Since the sample size is equal to 0, Celero will compute a number to use for both samples and iterations.
BENCHMARK(DemoSimple, Complex1, 0, 0)
{
    celero::DoNotOptimizeAway(static_cast<float>(sin(fmod(UniformDistribution(RandomDevice), 3.14159265))));
}

Now, when this executes, you will see a different number automatically computed for the number of iterations, and the sample size has been increased.

Celero
Timer resolution: 0.277056 us
|     Group      |   Experiment    |   Prob. Space   |     Samples     |   Iterations    |    Baseline     |  us/Iteration   | Iterations/sec  |
|:--------------:|:---------------:|:---------------:|:---------------:|:---------------:|:---------------:|:---------------:|:---------------:|
|DemoSimple      | Baseline        |            Null |              30 |         1000000 |         1.00000 |         0.09177 |     10897044.72 |
|DemoSimple      | Complex1        |            Null |              30 |         8388608 |         1.01211 |         0.09288 |     10766703.67 |
|DemoSimple      | Complex2        |            Null |              30 |          710000 |         0.99559 |         0.09136 |     10945304.31 |
|DemoSimple      | Complex3        |            Null |              60 |          710000 |         0.99671 |         0.09147 |     10933000.72 |
Completed in 00:00:37.583872

Statistically Sound Results

To use Celero for real science, there are three primary factors to consider when reviewing results. First, you MUST check the generated assembly for your test. There are different paths to viewing the assembly for various compilers, but essentially this must be done to ensure that you did not optimize out critical code. You must also verify, via the assembly, that you are comparing apples to apples.

Once that is sorted out, you should run just the "Baseline" case several times. The "us/Iteration" and "Iterations/sec" should not fluctuate by any significant degree between runs. If they do, then ensure that your number of iterations is sufficiently large as to overcome the timer resolution on your machine. Once the number of iterations is high enough, ensure that you are performing a statistically significant number of samples. Lore has it that 30 samples are good, but use your own science to figure out the best number for your situation.

Finally, you need to ensure that the number of iterations and samples is producing stable output for your experiment cases. These numbers may be the same as your now-stable baseline case.

One factor that can impact the number of samples and iterations required is the amount of work your experiment does. For cases where you are utilizing Celero's "problem space" functionality to scale up the algorithms, you can correspondingly scale down the number of iterations. Doing so reduces the total run time of the larger experiments while still maintaining a statistically meaningful measurement. (It saves you time.)

Threaded Benchmarks

Celero can automatically run threaded benchmarks. BASELINE_T and BENCHMARK_T can be used to launch the given code on its own thread using a user-defined number of concurrent executions. celeroDemoMultithread illustrates using this feature. When defining these macros, they use the following format:

BASELINE_T(groupName, baselineName, fixtureName, samples, iterations, threads);
BASELINE_FIXED_T(groupName, baselineName, fixtureName, samples, iterations, threads, useconds);

BENCHMARK_T(groupName, benchmarkName, fixtureName, samples, iterations, threads);
BENCHMARK_TEST_T(groupName, benchmarkName, fixtureName, samples, iterations, threads, target);

Fixed Measurement Benchmarks

While Celero typically measures the baseline time and then executes benchmark cases for comparison, you can also specify a fixed measurement time. This is useful for measuring performance against a real-time requirement. To use this feature, utilize the _FIXED_ versions of the BASELINE and BENCHMARK macros.

// No threads or test fixtures.
BASELINE_FIXED(groupName, baselineName, samples, iterations, useconds);

// For using test fixtures:
BASELINE_FIXED_F(groupName, baselineName, fixtureName, samples, iterations, useconds);

// For using threads and test fixtures.
BASELINE_FIXED_T(groupName, baselineName, fixtureName, samples, iterations, threads, useconds);

Example:

BASELINE_FIXED_F(DemoTransform, FixedTime, DemoTransformFixture, 30, 10000, 100)
{ /* Nothing to do */ }

User-Defined Measurements (UDM)

Celero, by default, measures the execution time of your experiments. If you want to measure anything else, say for example the number of page faults via PAPI, user-defined measurements are for you.

Adding user-defined measurements consists of three steps:

  • Define a class for your user-defined measurement. (One per type of measurement.) This class must derive from celero::UserDefinedMeasurement. Celero provides a convenience class celero::UserDefinedMeasurementTemplate<> which will be sufficient for most uses.
  • Add (an) instance(s) of your class(es) to your test fixture. Implement getUserDefinedMeasurements to return these instances.
  • At the appropriate point (most likely tearDown()), record your measurements in your user-defined measurement instances.

As a rough example, say you want to measure the number of page faults. The class for your user-defined measurement could be as simple as this:

class PageFaultUDM : public celero::UserDefinedMeasurementTemplate<size_t>
{
  virtual std::string getName() const override
  {
    return "Page Faults";
  }

  // Optionally turn off some statistical reporting.
  virtual bool reportKurtosis() const override
  {
    return false;
  }
};

The only thing you need to implement in this case is a unique name. Other virtual functions are available inside celero::UserDefinedMeasurementTemplate and celero::UserDefinedMeasurement that you can leverage as needed. There are optional virtual functions that you can override to turn off specific statistical measurements in the output. These are:

    virtual bool reportSize() const;
    virtual bool reportMean() const;
    virtual bool reportVariance() const;
    virtual bool reportStandardDeviation() const;
    virtual bool reportSkewness() const;
    virtual bool reportKurtosis() const;
    virtual bool reportZScore() const;
    virtual bool reportMin() const;
    virtual bool reportMax() const;

(By default, all of the report functions inside UserDefinedMeasurementTemplate return true.)

Now, add it to your regular Celero test fixture:

class SortFixture : public celero::TestFixture
{
public:
    SortFixture()
    {
        this->pageFaultUDM.reset(new PageFaultUDM());
    }

    [...]

    virtual std::vector<std::shared_ptr<celero::UserDefinedMeasurement>> getUserDefinedMeasurements() const override
    {
        return { this->pageFaultUDM };
    }

private:
    std::shared_ptr<PageFaultUDM> pageFaultUDM;
};

Finally, you need to record your results. For this pseudo-code example, assume two functions exist: resetPageFaultCounter() and getPageFaults(). These reset the number of page faults and return the number of page faults since the last reset, respectively. Then, add these to the setUp and onExperimentEnd methods:

class SortFixture : public celero::TestFixture
{
public:
    SortFixture()
    {
        this->pageFaultUDM.reset(new PageFaultUDM());
    }

    [...]

    // Gather page fault statistics inside the UDM.
    virtual void onExperimentEnd() override
    {
        [...]
        this->pageFaultUDM->addValue(this->getPageFaults());
    }

    [...] 

    // Reset the page fault counter.
    virtual void setUp(const celero::TestFixture::ExperimentValue& experimentValue) override
    {
        [...]
        this->resetPageFaultCounter();
    }

    [...]

    virtual std::vector<std::shared_ptr<celero::UserDefinedMeasurement>> getUserDefinedMeasurements() const override
    {
        return { this->pageFaultUDM };
    }

private:
    std::shared_ptr<PageFaultUDM> pageFaultUDM;
    [...]
};

You will now be reporting statistics on the number of page faults that occurred during your experiments. See the ExperimentSortingRandomIntsWithUDM example for a complete example.

A note on User-Defined Measurements: this capability was introduced well after the creation of Celero. While it is a great enhancement to the library, it was not designed into the library from the start. As such, the next major release of the library (v3.x) may change the way this is implemented and exposed to the library's users.

Frequency Scaling

CPU Frequency Scaling should be disabled if possible when executing benchmarks. While there is code in Celero to attempt to do this, it may not have sufficient privileges to be effective. On Linux systems, this can be accomplished as follows:

sudo cpupower frequency-set --governor performance
./celeroBenchmarkExecutable
sudo cpupower frequency-set --governor powersave

Notes

  • Benchmarks should always be performed on Release builds. Never measure the performance of a Debug build and make changes based on the results. The (optimizing) compiler is your friend concerning code performance.
  • Accuracy is tied very closely to the total number of samples and the sample sizes. As a general rule, you should aim to execute your baseline code for about as long as your longest benchmark test. Further, it is helpful if all of the benchmark tests take about the same order of magnitude of execution time. (Don't compare a baseline that executed in 0.1 seconds with benchmarks that take 60 seconds and an hour, respectively.)
  • Celero has Doxygen documentation of its API.
  • Celero supports test fixtures for each baseline group.

Celero Charts

Background

It has been noted many times that writing an algorithm to solve small problems is relatively easy. "Brute force" methods tend to function just as well as more elegant approaches. However, as the size of the data increases, an algorithm's scaling behavior comes to dominate its performance.

Theoretically, the best we can hope for with an algorithm is that it scales linearly (order N, O(N) complexity) with respect to the problem size. That is to say: if the problem set doubles, the time it takes for the algorithm to execute doubles. While this seems obvious, it is often an elusive goal.

Even well-performing algorithms eventually run into problems with available memory or CPU cache. When making decisions within our software about algorithms and improvements to existing code, only through measurement and experimentation can we know that our complex algorithms perform acceptably.

Using the Code

While Celero offers simple benchmarking of code and algorithms, it also provides a more sophisticated method of directly producing performance graphs of how the benchmarks change with respect to some independent variable, referred to here as the Problem Set.

Within Celero, a test fixture can supply a vector of problem-set values, which allows the fixture's own setUp function to scale a problem for the benchmarks to run against. For each value in the problem set, a complete set of benchmarks is executed. These measured values are then stored and can be written out to a CSV file for easy plotting of results.

To demonstrate, we will study the performance of three common sorting algorithms: BubbleSort, SelectionSort, and std::sort. (The source code to this demo is distributed with Celero, available on GitHub.) First, we will write a test fixture for Celero.

class SortFixture : public celero::TestFixture
{
public:
    SortFixture()
    {
    }

    virtual std::vector<celero::TestFixture::ExperimentValue> getExperimentValues() const override
    {
        std::vector<celero::TestFixture::ExperimentValue> problemSpace;

        // We will run some total number of sets of tests together. 
        // Each one growing by a power of 2.
        const int totalNumberOfTests = 6;

        for(int i = 0; i < totalNumberOfTests; i++)
        {
            // ExperimentValues is part of the base class and allows us to specify
            // some values to control various test runs to end up building a nice graph.
            problemSpace.push_back({int64_t(pow(2, i+1))});
        }

        return problemSpace;
    }

    /// Before each run, build a vector of random integers.
    virtual void setUp(const celero::TestFixture::ExperimentValue& experimentValue) override
    {
        this->arraySize = experimentValue.Value;
        this->array.reserve(this->arraySize);
    }

    /// Before each iteration: a common utility function to push back random ints to sort.
    void randomize()
    {
        for(int i = 0; i < this->arraySize; i++)
        {
            this->array.push_back(rand());
        }
    }

    /// After each iteration, clear the vector of random integers.
    void clear()
    {
        this->array.clear();
    }

    std::vector<int64_t> array;
    int64_t arraySize;
};

Before the test fixture is utilized by a benchmark, Celero will create an instantiation of the class and call its getExperimentValues() function. The test fixture can then build a vector of TestFixture::ExperimentValue values. For each value added to this array, benchmarks will be executed following calls to the setUp virtual function. A new test fixture is created for each measurement.

The setUp() virtual function is called before each benchmark test is executed. When using a problem-space vector, the function is given a value that was previously returned from getExperimentValues(). The function's code can then decide what to do with it. Here, we use the value to indicate how many elements should be in the array that we intend to sort. For each of the array elements, we simply add a pseudo-random integer.

Now for implementing the actual sorting algorithms. For the baseline case, I implemented the first sorting algorithm I ever learned in school: Bubble Sort. The code for bubble sort is straightforward.

// For a baseline, I'll choose Bubble Sort.
BASELINE_F(SortRandInts, BubbleSort, SortFixture, 30, 10000)
{
    this->randomize();

    for(int x = 0; x < this->arraySize; x++)
    {
        for(int y = 0; y < this->arraySize - 1; y++)
        {
            if(this->array[y] > this->array[y+1])
            {
                std::swap(this->array[y], this->array[y+1]);
            }
        }
    }

    this->clear();
}

Celero will use the values from this baseline when computing baselined measurements for the other two algorithms in the test group SortRandInts. However, when we run this from the command line, we will specify an output file. The output file will contain the measured number of seconds the algorithm took to execute on each given array size.

Next, we will implement the Selection Sort algorithm.

BENCHMARK_F(SortRandInts, SelectionSort, SortFixture, 30, 10000)
{
    this->randomize();

    for(int x = 0; x < this->arraySize; x++)
    {
        auto minIdx = x;

        for(int y = x; y < this->arraySize; y++)
        {
            if(this->array[minIdx] > this->array[y])
            {
                minIdx = y;
            }
        }

        std::swap(this->array[x], this->array[minIdx]);
    }

    this->clear();
}

Finally, for good measure, we will simply use the Standard Library's sorting algorithm: Introsort. We only need to write a single line of code, but here it is for completeness.

BENCHMARK_F(SortRandInts, stdSort, SortFixture, 30, 10000)
{
    this->randomize();
    std::sort(this->array.begin(), this->array.end());
    this->clear();
}

Results

This test was run on a 4.00 GHz AMD with four cores, eight logical processors, and 32 GB of memory. (Hardware aside, the relative performance of these algorithms should be the same on any modern hardware.)

Celero outputs timing and benchmark references for each test automatically. However, to write to an output file for easy plotting, simply specify an output file on the command line.

celeroExperimentSortingRandomInts.exe -t results.csv

While not particularly surprising, std::sort is by far the best option with any meaningful problem-set size. The results are summarized in the following table output, written directly by Celero:

Celero
Celero: CPU processor throttling disabled.
Timer resolution: 0.254288 us
Writing results to: results.csv
-----------------------------------------------------------------------------------------------------------------------------------------------
     Group      |   Experiment    |   Prob. Space   |     Samples     |   Iterations    |    Baseline     |  us/Iteration   | Iterations/sec  |
-----------------------------------------------------------------------------------------------------------------------------------------------
SortRandInts    | BubbleSort      |               2 |              30 |           10000 |         1.00000 |         0.05270 |     18975332.07 |
SortRandInts    | BubbleSort      |               4 |              30 |           10000 |         1.00000 |         0.12060 |      8291873.96 |
SortRandInts    | BubbleSort      |               8 |              30 |           10000 |         1.00000 |         0.31420 |      3182686.19 |
SortRandInts    | BubbleSort      |              16 |              30 |           10000 |         1.00000 |         1.09130 |       916338.31 |
SortRandInts    | BubbleSort      |              32 |              30 |           10000 |         1.00000 |         3.23470 |       309147.68 |
SortRandInts    | BubbleSort      |              64 |              30 |           10000 |         1.00000 |        10.82530 |        92376.19 |
SortRandInts    | SelectionSort   |               2 |              30 |           10000 |         1.09108 |         0.05750 |     17391304.35 |
SortRandInts    | SelectionSort   |               4 |              30 |           10000 |         1.03317 |         0.12460 |      8025682.18 |
SortRandInts    | SelectionSort   |               8 |              30 |           10000 |         1.01464 |         0.31880 |      3136762.86 |
SortRandInts    | SelectionSort   |              16 |              30 |           10000 |         0.72253 |         0.78850 |      1268230.82 |
SortRandInts    | SelectionSort   |              32 |              30 |           10000 |         0.63771 |         2.06280 |       484777.97 |
SortRandInts    | SelectionSort   |              64 |              30 |           10000 |         0.54703 |         5.92180 |       168867.57 |
SortRandInts    | InsertionSort   |               2 |              30 |           10000 |         1.07021 |         0.05640 |     17730496.45 |
SortRandInts    | InsertionSort   |               4 |              30 |           10000 |         1.05970 |         0.12780 |      7824726.13 |
SortRandInts    | InsertionSort   |               8 |              30 |           10000 |         1.00382 |         0.31540 |      3170577.05 |
SortRandInts    | InsertionSort   |              16 |              30 |           10000 |         0.74104 |         0.80870 |      1236552.49 |
SortRandInts    | InsertionSort   |              32 |              30 |           10000 |         0.61508 |         1.98960 |       502613.59 |
SortRandInts    | InsertionSort   |              64 |              30 |           10000 |         0.45097 |         4.88190 |       204838.28 |
SortRandInts    | QuickSort       |               2 |              30 |           10000 |         1.18027 |         0.06220 |     16077170.42 |
SortRandInts    | QuickSort       |               4 |              30 |           10000 |         1.16169 |         0.14010 |      7137758.74 |
SortRandInts    | QuickSort       |               8 |              30 |           10000 |         1.01400 |         0.31860 |      3138731.95 |
SortRandInts    | QuickSort       |              16 |              30 |           10000 |         0.65060 |         0.71000 |      1408450.70 |
SortRandInts    | QuickSort       |              32 |              30 |           10000 |         0.48542 |         1.57020 |       636861.55 |
SortRandInts    | QuickSort       |              64 |              30 |           10000 |         0.34431 |         3.72730 |       268290.72 |
SortRandInts    | stdSort         |               2 |              30 |           10000 |         1.08539 |         0.05720 |     17482517.48 |
SortRandInts    | stdSort         |               4 |              30 |           10000 |         0.94776 |         0.11430 |      8748906.39 |
SortRandInts    | stdSort         |               8 |              30 |           10000 |         0.76926 |         0.24170 |      4137360.36 |
SortRandInts    | stdSort         |              16 |              30 |           10000 |         0.45954 |         0.50150 |      1994017.95 |
SortRandInts    | stdSort         |              32 |              30 |           10000 |         0.33573 |         1.08600 |       920810.31 |
SortRandInts    | stdSort         |              64 |              30 |           10000 |         0.23979 |         2.59580 |       385237.69 |

Each row of the output identifies the test group, the experiment, and the problem space size, followed by the time each algorithm took to complete, measured in microseconds per iteration. The CSV output file can be read directly by programs such as Microsoft Excel and plotted without any modification. The CSV contains the following data:

Group Experiment Problem Space Samples Iterations Baseline us/Iteration Iterations/sec Min (us) Mean (us) Max (us) Variance Standard Deviation Skewness Kurtosis Z Score
SortRandInts BubbleSort 2 30 10000 1 0.0527 1.89753e+07 527 532.533 582 118.74 10.8968 3.64316 13.0726 0.507794
SortRandInts BubbleSort 4 30 10000 1 0.1206 8.29187e+06 1206 1230.77 1455 1941.22 44.0593 4.60056 20.9542 0.562122
SortRandInts BubbleSort 8 30 10000 1 0.3142 3.18269e+06 3142 3195.73 3425 3080.41 55.5014 2.48383 7.72605 0.968143
SortRandInts BubbleSort 16 30 10000 1 1.0913 916338 10913 11022.1 11228 5450.26 73.8259 0.71778 0.387441 1.47825
SortRandInts BubbleSort 32 30 10000 1 3.2347 309148 32347 32803.9 36732 650545 806.563 4.1236 17.2616 0.566519
SortRandInts BubbleSort 64 30 10000 1 10.8253 92376.2 108253 110999 133389 2.8152e+07 5305.85 3.15455 9.60246 0.517542
SortRandInts SelectionSort 2 30 10000 1.09108 0.0575 1.73913e+07 575 620.167 753 2170.97 46.5937 1.33794 1.19871 0.969373
SortRandInts SelectionSort 4 30 10000 1.03317 0.1246 8.02568e+06 1246 1339.57 1413 2261.7 47.5574 -0.263592 -0.727621 1.96745
SortRandInts SelectionSort 8 30 10000 1.01464 0.3188 3.13676e+06 3188 3500.63 3742 20181.2 142.061 -0.438792 -0.522354 2.2007
SortRandInts SelectionSort 16 30 10000 0.722533 0.7885 1.26823e+06 7885 8504.67 9482 322584 567.965 0.274438 -1.43741 1.09103
SortRandInts SelectionSort 32 30 10000 0.63771 2.0628 484778 20628 20826.7 21378 26307.7 162.196 1.64431 2.96239 1.22526
SortRandInts SelectionSort 64 30 10000 0.547033 5.9218 168868 59218 59517.7 60308 55879.5 236.389 1.42419 2.38341 1.26783
SortRandInts InsertionSort 2 30 10000 1.07021 0.0564 1.77305e+07 564 585.4 814 2239.42 47.3225 4.06868 16.6254 0.452216
SortRandInts InsertionSort 4 30 10000 1.0597 0.1278 7.82473e+06 1278 1312 1574 3857.17 62.1061 3.06791 9.38706 0.54745
SortRandInts InsertionSort 8 30 10000 1.00382 0.3154 3.17058e+06 3154 3208.57 3617 8053.91 89.7436 3.40649 12.5161 0.608029
SortRandInts InsertionSort 16 30 10000 0.741043 0.8087 1.23655e+06 8087 8198.43 8556 11392.8 106.737 1.66984 3.10417 1.044
SortRandInts InsertionSort 32 30 10000 0.61508 1.9896 502614 19896 20088.9 20593 20955.8 144.761 1.97818 4.12296 1.33254
SortRandInts InsertionSort 64 30 10000 0.450971 4.8819 204838 48819 49152 50253 129327 359.62 1.7583 2.51588 0.925884
SortRandInts QuickSort 2 30 10000 1.18027 0.0622 1.60772e+07 622 647.4 836 2492.52 49.9252 2.83628 7.08836 0.508761
SortRandInts QuickSort 4 30 10000 1.16169 0.1401 7.13776e+06 1401 1450 1655 4476.21 66.9045 1.94538 2.90363 0.732388
SortRandInts QuickSort 8 30 10000 1.014 0.3186 3.13873e+06 3186 3245.8 3549 5043.89 71.0203 2.88396 9.36231 0.842012
SortRandInts QuickSort 16 30 10000 0.6506 0.71 1.40845e+06 7100 7231.07 7670 17248.2 131.332 1.93858 3.21011 0.997977
SortRandInts QuickSort 32 30 10000 0.485424 1.5702 636862 15702 15863.2 16469 33518 183.079 2.01833 3.2763 0.880494
SortRandInts QuickSort 64 30 10000 0.344314 3.7273 268291 37273 37554.4 37999 34113.3 184.698 0.822276 -0.0186633 1.52339
SortRandInts stdSort 2 30 10000 1.08539 0.0572 1.74825e+07 572 591.233 764 1863.15 43.1642 2.86875 7.63924 0.445585
SortRandInts stdSort 4 30 10000 0.947761 0.1143 8.74891e+06 1143 1185.33 1385 3435.4 58.6123 2.53277 5.69826 0.72226
SortRandInts stdSort 8 30 10000 0.769255 0.2417 4.13736e+06 2417 2459.47 2838 6555.84 80.9682 3.78132 14.5264 0.524486
SortRandInts stdSort 16 30 10000 0.459544 0.5015 1.99402e+06 5015 5120.97 5283 6486.65 80.5398 0.55161 -0.798651 1.31571
SortRandInts stdSort 32 30 10000 0.335734 1.086 920810 10860 13398 24592 8.85889e+06 2976.39 2.1597 4.93241 0.852722
SortRandInts stdSort 64 30 10000 0.23979 2.5958 385238 25958 27384.8 35800 4.88819e+06 2210.92 2.24632 5.15422 0.645326

The point here is not that std::sort is better than more elementary sorting methods, but how easily measurable results can be obtained. By making such measurements more accessible and easier to code, they can become as routine a part of development as automated testing has become.

Test early and test often!

Notes

  • Because I like explicitness as much as the next programmer, I want to note that the sorting algorithm used by std::sort is not defined in the standard, but references cite Introsort as a likely choice for an STL implementation (see Wikipedia).
  • When choosing a sorting algorithm, start with std::sort and see if you can make improvements from there.
  • Don't just trust your experience, measure your code!

FAQ

Q: I asked for N iterations, but Celero ran N+1 iterations.

The internal code performs one un-measured "warm-up" pass. This helps account for caching effects that might otherwise influence measurements.

Q: As my problem space increases in size, my runs take longer and longer. How do I account for the increased complexity?

When defining a problem space, you set up a celero::TestFixture::ExperimentValue. If the Iterations member in the class is greater than zero, that number will be used to control the amount of iterations for the corresponding celero::TestFixture::ExperimentValue.

class MyFixture : public celero::TestFixture
{
public:
    virtual std::vector<std::pair<int64_t, uint64_t>> getExperimentValues() const override
    {
        std::vector<std::pair<int64_t, uint64_t>> problemSpaceValues;

        // We will run some total number of sets of tests together.
        // Each one growing by a power of 2.
        const int totalNumberOfTests = 12;

        for(int i = 0; i < totalNumberOfTests; i++)
        {
            // ExperimentValues is part of the base class and allows us to specify
            // some values to control various test runs to end up building a nice graph.
            // We make the number of iterations decrease as the size of our problem space increases
            // to demonstrate how to adjust the number of iterations per sample based on the
            // problem space size.
            problemSpaceValues.push_back(std::make_pair(int64_t(pow(2, i + 1)), uint64_t(pow(2, totalNumberOfTests - i))));
        }

        return problemSpaceValues;
    }
};

Example and Demo Code

Example and demonstration code are provided under Celero's "experiments" folder. There are two types of projects. The first, "Demo" projects, are useful for illustrating techniques and ideas but may not be interesting from a computer-science perspective. "Experiment" projects, on the other hand, demonstrate real-world questions.

Submissions of real-world Celero use cases to Celero's development branch are encouraged for inclusion in the Demo and Experiment library.

Issues
  • Static library built in VS2015 doesn't work

    Hi, I tried to create a static-library-only build with VS2015. I received the .lib file (around 14MB), but when I tried to compile your runnable example, I received LNK2019 errors (log file attached). However, the same example, compiled with the dynamic libs (40KB celero.lib and ~1MB celero.dll), builds and runs without a problem.

    build.txt

    opened by everyonecancode 12
  • Generating graphs for benchmark results

    This is not a real issue report.

    I just wanted to share little utility script which I developed to generate pretty graphs from Celero results using Bokeh: https://github.com/mloskot/pycelerograph

    I thought it might be useful to others.

    Kudos for Celero, it's a great benchmarking tool!

    opened by mloskot 5
  • Add timed pre/post test function calls

    The pull request gives fixtures the ability to call a function that will be included in the timing results before (onExperimentStart) and after (onExperimentEnd) the benchmark is executed. The proposed functionality enables Celero to time thread creation and wait on asynchronous events to complete without impacting the performance too dramatically (as would be the case if thread.wait() were called in every benchmark execution).

    opened by bkloppenborg 5
  • Add User-Defined Measurements

    This is a first attempt at what I suggested in https://github.com/DigitalInBlue/Celero/issues/126

    It has become a larger change than I expected, but that's also because I needed to change a lot in the printing code to accommodate the new columns. I haven't yet added output of the user-defined measurements in the other two formats; this will follow if you like what I have so far.

    There are some points which I'm not completely happy with, but for which I don't see a perfect solution:

    • The code collecting all the additional fields to be printed does not feel like it belongs in Celero.cpp, however it also doesn't fit into Executor.cpp - or anywhere else, really. Maybe this should be part of the UDMCollector?
    • The UDMCollector itself feels a bit like it should be part of ExperimentResult. However, at the time the ExperimentResult objects are created, we don't have the TestFixture objects yet (which I need for the UDMCollector…)

    Also, I didn't find any specification of what code style you prefer. I tried to go with what I found and keep it consistent.

    opened by tinloaf 4
  • Benchmarks' exceptions support

    This PR adds support of exceptions in google test manner.

    There are several possible related questions/issues:

    • Output to console: std::exception::what() is printed instead of the measurement result columns ("Unknown exception" for exceptions not derived from std::exception). If the baseline test fails with an exception, all related benchmarks are skipped and a special table row is printed.
    • Output to csv-file (--outputTable, ResultTable module): failure-column (0/1) added.
    • Output to junit result xml-file (--junit): there is already failure field in run result (for comparison with objective), tests failed with exception have <error type="exception"/> field.
    • Output to archive (--archive): added additional column (as in csv output). If failed test isn't found in archive then failure output is printed. Failure outputs are overwritten with valid measurements unconditionally. Failed tests don't increment total samples collected and don't change min/max statistics.
    • To return failure status from test run method I used std::pair<bool, uint64_t> instead of uint64_t, it's a substitute for optional. Maybe std::tr1::optional should be used, but I don't know if it's cross-compatible.
    • Windows SEH exceptions are supported but POSIX signals aren't. The same can be said about gtest. I suppose SEH exceptions are supported because their support is straightforward. AFAIK signals can't be handled in safe manner (even non-fatal ones).
    • Recovery from fatal SEH exceptions isn't tested, and may be impossible...
    opened by peterazmanov 4
  • Undo the re-introduction of zero-size experiments

    In commit 353faabdf6adfdc561dc58654f4a7dc8ee46907e, I fixed a bug in which the experiment size vectors were (unnecessarily) populated with experiments of zero size. In commit 999c206c9106c6a64edab037aa4809936ea7c00d (specifically in Experiment.cpp, line 261), the offending code was re-introduced. This PR removes the offending code.

    opened by bkloppenborg 4
  • Support for different types in problemSpace array.

    Hello! Thank you for the support. Celero is being very helpful.

    It would be nice if getExperimentValues() were a templated method. Then the problemSpace array could be parameterized at compile time using templates.

    Another option would be making the problemSpace some kind of map data structure, which maps an integer to a structure representing the input of the benchmark test.

    In my opinion, this would give more flexibility in the setUp methods.

    opened by danielsaad 4
  • Baseline only runs once.

    I'm having a problem with the baseline code.

    Despite adding seven values to the ProblemSetValues array, the baseline code is run only once, with the first value.

    The benchmark code runs normally, but after the first phase it is compared with an artificial baseline which takes one second. Hence, only the first baseline run, using the first value from ProblemSetValues, is compared correctly with the other benchmark code.

    I've set the number of samples = 1 and the number of iterations = 1.

    Am I doing something wrong?

    class LandauVishkinBenchmark : public celero::TestFixture
    {
    public:
        const static int NUMBER_OF_RUNS = 1; // Number of samples
        const static int NUMBER_OF_ITERATIONS = 1;

        LandauVishkinBenchmark()
        {
            this->ProblemSetValues.push_back(0);
            this->ProblemSetValues.push_back(1);
            this->ProblemSetValues.push_back(2);
            this->ProblemSetValues.push_back(3);
            this->ProblemSetValues.push_back(6);
            this->ProblemSetValues.push_back(10);
            this->ProblemSetValues.push_back(20);
        }

        virtual void SetUp(const int32_t errors)
        {
            _errors = errors;
        }

        virtual void TearDown()
        {
        }

        Text* createRandomPattern(Text* T, integer size);

        virtual ~LandauVishkinBenchmark()
        {
        }

        integer _errors;
    };

    Here is the output from Celero:

    [==========] [ CELERO ] [==========] [ STAGE ] Baselining [==========] [ RUN ] LV_DNA50_50.LV_DC_SE -- 1 run, 1 call per run. [ DONE ] LV_DNA50_50.LV_DC_SE (4.328641 sec) [1 calls in 4.328641 sec] [4.328641 us/call] [0.023102 calls/sec] [==========] [ STAGE ] Benchmarking [==========] [ RUN ] LV_DNA50_50.LV_RMQ -- 1 run, 1 call per run. [ DONE ] LV_DNA50_50.LV_RMQ (29.722101 sec) [1 calls in 29.722101 sec] [29.722101 us/call] [0.003364 calls/sec] [ BASELINE ] LV_DNA50_50.LV_RMQ 6.866382 [ RUN ] LV_DNA50_50.LV_DMIN -- 1 run, 1 call per run. [ DONE ] LV_DNA50_50.LV_DMIN (13.512609 sec) [1 calls in 13.512609 sec] [13.512609 us/call] [0.007400 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DMIN 3.121675 [ RUN ] LV_DNA50_50.LV_DC -- 1 run, 1 call per run. [ DONE ] LV_DNA50_50.LV_DC (0.517192 sec) [1 calls in 0.517192 sec] [0.517192 us/call] [0.193352 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DC 0.119481 [ RUN ] LV_DNA50_50.LV_DC_NAV -- 1 run, 1 call per run. [ DONE ] LV_DNA50_50.LV_DC_NAV (0.541809 sec) [1 calls in 0.541809 sec] [0.541809 us/call] [0.184567 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DC_NAV 0.125168 [ RUN ] LV_DNA50_50.LV_DC_PAR -- 1 run, 1 call per run. [ DONE ] LV_DNA50_50.LV_DC_PAR (0.233204 sec) [1 calls in 0.233204 sec] [0.233204 us/call] [0.428809 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DC_PAR 0.053875 [==========] [ STAGE ] Benchmarking [==========] [ RUN ] LV_DNA50_50.LV_RMQ -- 1 run, 1 call per run. Problem Set 1 of 6. [ DONE ] LV_DNA50_50.LV_RMQ (45.937276 sec) [1 calls in 45.937276 sec] [45.937276 us/call] [0.002177 calls/sec] [ BASELINE ] LV_DNA50_50.LV_RMQ 45.937276 [ RUN ] LV_DNA50_50.LV_DMIN -- 1 run, 1 call per run. Problem Set 1 of 6. [ DONE ] LV_DNA50_50.LV_DMIN (15.816064 sec) [1 calls in 15.816064 sec] [15.816064 us/call] [0.006323 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DMIN 15.816064 [ RUN ] LV_DNA50_50.LV_DC -- 1 run, 1 call per run. Problem Set 1 of 6. 
[ DONE ] LV_DNA50_50.LV_DC (0.797697 sec) [1 calls in 0.797697 sec] [0.797697 us/call] [0.125361 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DC 0.797697 [ RUN ] LV_DNA50_50.LV_DC_NAV -- 1 run, 1 call per run. Problem Set 1 of 6. [ DONE ] LV_DNA50_50.LV_DC_NAV (1.028075 sec) [1 calls in 1.028075 sec] [1.028075 us/call] [0.097269 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DC_NAV 1.028075 [ RUN ] LV_DNA50_50.LV_DC_PAR -- 1 run, 1 call per run. Problem Set 1 of 6. [ DONE ] LV_DNA50_50.LV_DC_PAR (0.301660 sec) [1 calls in 0.301660 sec] [0.301660 us/call] [0.331499 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DC_PAR 0.301660 [==========] [ STAGE ] Benchmarking [==========] [ RUN ] LV_DNA50_50.LV_RMQ -- 1 run, 1 call per run. Problem Set 2 of 6. [ DONE ] LV_DNA50_50.LV_RMQ (60.275921 sec) [1 calls in 60.275921 sec] [60.275921 us/call] [0.001659 calls/sec] [ BASELINE ] LV_DNA50_50.LV_RMQ 60.275921 [ RUN ] LV_DNA50_50.LV_DMIN -- 1 run, 1 call per run. Problem Set 2 of 6. [ DONE ] LV_DNA50_50.LV_DMIN (17.519518 sec) [1 calls in 17.519518 sec] [17.519518 us/call] [0.005708 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DMIN 17.519518 [ RUN ] LV_DNA50_50.LV_DC -- 1 run, 1 call per run. Problem Set 2 of 6. [ DONE ] LV_DNA50_50.LV_DC (1.136020 sec) [1 calls in 1.136020 sec] [1.136020 us/call] [0.088027 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DC 1.136020 [ RUN ] LV_DNA50_50.LV_DC_NAV -- 1 run, 1 call per run. Problem Set 2 of 6. [ DONE ] LV_DNA50_50.LV_DC_NAV (1.288852 sec) [1 calls in 1.288852 sec] [1.288852 us/call] [0.077588 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DC_NAV 1.288852 [ RUN ] LV_DNA50_50.LV_DC_PAR -- 1 run, 1 call per run. Problem Set 2 of 6. [ DONE ] LV_DNA50_50.LV_DC_PAR (0.469037 sec) [1 calls in 0.469037 sec] [0.469037 us/call] [0.213203 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DC_PAR 0.469037 [==========] [ STAGE ] Benchmarking [==========] [ RUN ] LV_DNA50_50.LV_RMQ -- 1 run, 1 call per run. Problem Set 3 of 6. 
[ DONE ] LV_DNA50_50.LV_RMQ (75.057950 sec) [1 calls in 75.057950 sec] [75.057950 us/call] [0.001332 calls/sec] [ BASELINE ] LV_DNA50_50.LV_RMQ 75.057950 [ RUN ] LV_DNA50_50.LV_DMIN -- 1 run, 1 call per run. Problem Set 3 of 6. [ DONE ] LV_DNA50_50.LV_DMIN (19.086265 sec) [1 calls in 19.086265 sec] [19.086265 us/call] [0.005239 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DMIN 19.086265 [ RUN ] LV_DNA50_50.LV_DC -- 1 run, 1 call per run. Problem Set 3 of 6. [ DONE ] LV_DNA50_50.LV_DC (1.742060 sec) [1 calls in 1.742060 sec] [1.742060 us/call] [0.057403 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DC 1.742060 [ RUN ] LV_DNA50_50.LV_DC_NAV -- 1 run, 1 call per run. Problem Set 3 of 6. [ DONE ] LV_DNA50_50.LV_DC_NAV (1.428545 sec) [1 calls in 1.428545 sec] [1.428545 us/call] [0.070001 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DC_NAV 1.428545 [ RUN ] LV_DNA50_50.LV_DC_PAR -- 1 run, 1 call per run. Problem Set 3 of 6. [ DONE ] LV_DNA50_50.LV_DC_PAR (0.532349 sec) [1 calls in 0.532349 sec] [0.532349 us/call] [0.187847 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DC_PAR 0.532349 [==========] [ STAGE ] Benchmarking [==========] [ RUN ] LV_DNA50_50.LV_RMQ -- 1 run, 1 call per run. Problem Set 4 of 6. [ DONE ] LV_DNA50_50.LV_RMQ (118.817409 sec) [1 calls in 118.817409 sec] [118.817409 us/call] [0.000842 calls/sec] [ BASELINE ] LV_DNA50_50.LV_RMQ 118.817409 [ RUN ] LV_DNA50_50.LV_DMIN -- 1 run, 1 call per run. Problem Set 4 of 6. [ DONE ] LV_DNA50_50.LV_DMIN (24.625528 sec) [1 calls in 24.625528 sec] [24.625528 us/call] [0.004061 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DMIN 24.625528 [ RUN ] LV_DNA50_50.LV_DC -- 1 run, 1 call per run. Problem Set 4 of 6. [ DONE ] LV_DNA50_50.LV_DC (2.905366 sec) [1 calls in 2.905366 sec] [2.905366 us/call] [0.034419 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DC 2.905366 [ RUN ] LV_DNA50_50.LV_DC_NAV -- 1 run, 1 call per run. Problem Set 4 of 6. 
[ DONE ] LV_DNA50_50.LV_DC_NAV (2.651918 sec) [1 calls in 2.651918 sec] [2.651918 us/call] [0.037709 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DC_NAV 2.651918 [ RUN ] LV_DNA50_50.LV_DC_PAR -- 1 run, 1 call per run. Problem Set 4 of 6. [ DONE ] LV_DNA50_50.LV_DC_PAR (0.951985 sec) [1 calls in 0.951985 sec] [0.951985 us/call] [0.105044 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DC_PAR 0.951985 [==========] [ STAGE ] Benchmarking [==========] [ RUN ] LV_DNA50_50.LV_RMQ -- 1 run, 1 call per run. Problem Set 5 of 6. [ DONE ] LV_DNA50_50.LV_RMQ (172.621011 sec) [1 calls in 172.621011 sec] [172.621011 us/call] [0.000579 calls/sec] [ BASELINE ] LV_DNA50_50.LV_RMQ 172.621011 [ RUN ] LV_DNA50_50.LV_DMIN -- 1 run, 1 call per run. Problem Set 5 of 6. [ DONE ] LV_DNA50_50.LV_DMIN (32.110696 sec) [1 calls in 32.110696 sec] [32.110696 us/call] [0.003114 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DMIN 32.110696 [ RUN ] LV_DNA50_50.LV_DC -- 1 run, 1 call per run. Problem Set 5 of 6. [ DONE ] LV_DNA50_50.LV_DC (4.631579 sec) [1 calls in 4.631579 sec] [4.631579 us/call] [0.021591 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DC 4.631579 [ RUN ] LV_DNA50_50.LV_DC_NAV -- 1 run, 1 call per run. Problem Set 5 of 6. [ DONE ] LV_DNA50_50.LV_DC_NAV (4.179240 sec) [1 calls in 4.179240 sec] [4.179240 us/call] [0.023928 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DC_NAV 4.179240 [ RUN ] LV_DNA50_50.LV_DC_PAR -- 1 run, 1 call per run. Problem Set 5 of 6. [ DONE ] LV_DNA50_50.LV_DC_PAR (1.290418 sec) [1 calls in 1.290418 sec] [1.290418 us/call] [0.077494 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DC_PAR 1.290418 [==========] [ STAGE ] Benchmarking [==========] [ RUN ] LV_DNA50_50.LV_RMQ -- 1 run, 1 call per run. Problem Set 6 of 6. [ DONE ] LV_DNA50_50.LV_RMQ (321.168932 sec) [1 calls in 321.168932 sec] [321.168932 us/call] [0.000311 calls/sec] [ BASELINE ] LV_DNA50_50.LV_RMQ 321.168932 [ RUN ] LV_DNA50_50.LV_DMIN -- 1 run, 1 call per run. Problem Set 6 of 6. 
[ DONE ] LV_DNA50_50.LV_DMIN (51.033489 sec) [1 calls in 51.033489 sec] [51.033489 us/call] [0.001959 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DMIN 51.033489 [ RUN ] LV_DNA50_50.LV_DC -- 1 run, 1 call per run. Problem Set 6 of 6. [ DONE ] LV_DNA50_50.LV_DC (7.547392 sec) [1 calls in 7.547392 sec] [7.547392 us/call] [0.013250 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DC 7.547392 [ RUN ] LV_DNA50_50.LV_DC_NAV -- 1 run, 1 call per run. Problem Set 6 of 6. [ DONE ] LV_DNA50_50.LV_DC_NAV (7.185174 sec) [1 calls in 7.185174 sec] [7.185174 us/call] [0.013918 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DC_NAV 7.185174 [ RUN ] LV_DNA50_50.LV_DC_PAR -- 1 run, 1 call per run. Problem Set 6 of 6. [ DONE ] LV_DNA50_50.LV_DC_PAR (3.774900 sec) [1 calls in 3.774900 sec] [3.774900 us/call] [0.026491 calls/sec] [ BASELINE ] LV_DNA50_50.LV_DC_PAR 3.774900 [==========] [ STAGE ] Completed. 6 tests complete. [==========]

    opened by danielsaad 4
  • A simple `make` should not run very long tests

    I wanted to try out Celero and did this:

    $ cmake CMakeLists.txt
    $ make
    Scanning dependencies of target celero
    [  6%] Building CXX object CMakeFiles/celero.dir/src/BenchmarkInfo.cpp.o
    /Users/deil/code/Celero/src/BenchmarkInfo.cpp:21:4: warning: field 'resetCalls' will be initialized after field 'baselineUnit' [-Wreorder]
                            resetCalls(0),
                            ^
    /Users/deil/code/Celero/src/BenchmarkInfo.cpp:37:4: warning: field 'resetCalls' will be initialized after field 'baselineUnit' [-Wreorder]
                            resetCalls(calls),
                            ^
    /Users/deil/code/Celero/src/BenchmarkInfo.cpp:56:4: warning: field 'resetCalls' will be initialized after field 'baselineUnit' [-Wreorder]
                            resetCalls(other.pimpl->resetCalls),
                            ^
    3 warnings generated.
    [ 13%] Building CXX object CMakeFiles/celero.dir/src/Celero.cpp.o
    [ 20%] Building CXX object CMakeFiles/celero.dir/src/Console.cpp.o
    [ 26%] Building CXX object CMakeFiles/celero.dir/src/Executor.cpp.o
    [ 33%] Building CXX object CMakeFiles/celero.dir/src/JUnit.cpp.o
    [ 40%] Building CXX object CMakeFiles/celero.dir/src/Print.cpp.o
    [ 46%] Building CXX object CMakeFiles/celero.dir/src/ResultTable.cpp.o
    [ 53%] Building CXX object CMakeFiles/celero.dir/src/TestVector.cpp.o
    [ 60%] Building CXX object CMakeFiles/celero.dir/src/TestFixture.cpp.o
    [ 66%] Building CXX object CMakeFiles/celero.dir/src/Timer.cpp.o
    Linking CXX shared library libcelero.dylib
    [ 66%] Built target celero
    Scanning dependencies of target celeroDemoComparison
    [ 73%] Building CXX object CMakeFiles/celeroDemoComparison.dir/examples/DemoComparison.cpp.o
    Linking CXX executable celeroDemoComparison
    [==========] 
    [  CELERO  ]
    [==========] 
    [ STAGE    ] Baselining
    [==========] 
    [ RUN      ] StackOverflow.Baseline -- 100 samples, 5000000 calls per run.
    [     DONE ] StackOverflow.Baseline  (1.197661 sec) [5000000 calls in 1197661 usec] [0.239532 us/call] [4174804.055572 calls/sec]
    [==========] 
    [ STAGE    ] Benchmarking
    [==========] 
    [ RUN      ] StackOverflow.Compare -- 100 samples, 5000000 calls per run.
    

    After a few minutes I aborted the StackOverflow.Compare test. Is it intentional that such a long-running test runs when a user types make, or is there a bug causing the test to hang?

    opened by cdeil 4
  • celero 2.8.0 OSX test build failure

    👋 Trying to build the latest release, but ran into a build issue. The error log is below:

    test failure
    ==> /usr/bin/clang++ -std=c++14 test.cpp -L/usr/local/Cellar/celero/2.8.0/lib -lcelero -o test
    ==> ./test
    Celero
    Timer resolution: 0.001000 us
    Error: celero: failed
    An exception occurred within a child process:
      BuildError: Failed executing: ./test
    

    The full build log is here: https://github.com/Homebrew/homebrew-core/runs/2156154799; it relates to https://github.com/Homebrew/homebrew-core/pull/73571

    opened by chenrui333 3
  • Use _WIN32 instead of WIN32 preprocessor macro

    • _WIN32 is the recommended preprocessor macro
    • Substitute occurrences of WIN32 with _WIN32 using:
      git grep -lw -e 'WIN32' -- | xargs sed -i -e 's/\<WIN32\>/_WIN32/g'
    
    opened by c72578 3
  • Problems with building / packaging celero

    Hi thanks for this project!

    I am trying to do a CMake build for an integration of Celero. I came across a few issues:

    MT / MD Visual Studio

    First question: why are you forcing a static VS runtime linkage here: https://github.com/DigitalInBlue/Celero/blob/9d0b1e8d77dd44b385591612a486e668a404d7b2/CMakeLists.txt#L46 ?

    The CMake default is always MD, and that has nothing to do with whether I am building a shared or static library. So I would avoid coupling it to the CELERO_COMPILE_DYNAMIC_LIBRARIES option altogether. It could be a completely separate option for the project, in my view; and please don't make MT the default.

    (actually you can pass it also from outside via CMAKE_MSVC_RUNTIME_LIBRARY and CMP0091, see here which I would recommend a hundred times over forcing it on the user)

    Generated Config files

    The following line causes the generated celero-config.cmake file not to be relocatable: https://github.com/DigitalInBlue/Celero/blob/9d0b1e8d77dd44b385591612a486e668a404d7b2/CMakeLists.txt#L209-L213

    The code generated by CMake (shown as a screenshot in the original issue) embeds the absolute CMAKE_INSTALL_PREFIX.

    Instead, you should omit the `CMAKE_INSTALL_PREFIX`:

    target_include_directories(${PROJECT_NAME} PUBLIC
      $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>
      $<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>
    )
    

    Exporting targets to a namespace

    It's CMake best practice to export targets to a namespace (e.g. to reference the celero library from CMake as celero::celero instead of only celero). This has some CMake advantages (e.g. it's recognized as a target, and not accidentally mistaken for a linker command). However, it might break users who are already consuming the celero library.

    Warning / build errors when linking static library:

    Lastly, I am also getting these warnings (and corresponding linker errors) when compiling a sample against the built static libraries. (I am building one of the samples.)

    D:\Conan\Fresh\celero\package\96f5122383318fb0b29bd75a76b20a51004d12d4\include\celero/UserDefinedMeasurementCollector.h(43,84): warning C4251: 'celero::UserDefinedMeasurementCollector::collected': class 'std::unordered_map<std::string,std::shared_ptr<celero::UserDefinedMeasurement>,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<const std::string,std::shared_ptr<celero::UserDefinedMeasurement>>>>' needs to have dll-interface to be used by clients of class 'celero::UserDefinedMeasurementCollector' [D:\test\packages\celero\all\test_package\build\66562db33a92334c1d04620ac86f04a958457d64\Debug\celero_test.vcxproj]
    

    and then a bunch of unresolved external errors

    main.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) int __cdecl celero::Random(void)" ([email protected]@@YAHXZ) referenced in function "protected: virtual void __cdecl CeleroUserBaseline_DemoToString_Baseline::UserBenchmark(void)" ([email protected][email protected]@MEAAXXZ) [D:\test\packages\celero\all\test_package\build\66562db33a92334c1d04620ac86f04a958457d64\Debug\celero_test.vcxproj]
    main.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) public: __cdecl celero::Factory::Factory(void)" ([email protected]@@[email protected]) referenced in function "public : __cdecl celero::GenericFactory<class CeleroUserBaseline_DemoToString_Baseline>::GenericFactory<class CeleroUserBaseline_DemoToString_Baseline>(void)" ([email protected][email protected]@@[email protected]@[email protected]) [D:\test\packages\celero\all\test_package\build\66562db33a92334c1d04620ac86f04a958457d64\Debug\celero_test.vcxproj]
    

    Do I need to take explicit care when linking celero as a static library?

    I'd be happy to create a PR for the first issues. Please let me know if you are interested.

    opened by KerstinKeller 1
  • Truncation of group and experiment names in standard output

    Truncation of group and experiment names in standard output

    Issue Template

    Feature Request

    The Group and Experiment columns in the standard output use a fixed 15 characters, with longer names being truncated. Could this be automatically scaled to the width of the widest entry for readability?

    opened by mlohry 0
  • Tests terminate with Signal 11

    Tests terminate with Signal 11

    The celero-test executable terminates with signal:

    [...skipped...]
    [----------] 2 tests from Distribution
    [ RUN      ] Distribution.BuildDistribution
    [       OK ] Distribution.BuildDistribution (0 ms)
    [ RUN      ] Distribution.RunDistribution
    [       OK ] Distribution.RunDistribution (1 ms)
    [----------] 2 tests from Distribution (1 ms total)
    
    [----------] 9 tests from Executor
    [ RUN      ] Executor.RunAll
    *** Signal 11
    
    Stop.
    make: stopped in /disk-samsung/freebsd-ports/benchmarks/libcelero
    

    Version: 2.8.2 OS: FreeBSD 12.2

    opened by yurivict 1
  • Samples and iterations are only computed for first size of problem space (division by zero)

    Samples and iterations are only computed for first size of problem space (division by zero)

    Bug Report

    I have a Fixture which returns number of experiments:

        std::vector<celero::TestFixture::ExperimentValue> getExperimentValues() const override
        {
            // Problem space as number of points (pairs of X/Y)
            std::vector<celero::TestFixture::ExperimentValue> v;
            v.emplace_back(1, 0);
            v.emplace_back(256, 0);
            v.emplace_back(512, 0);
            v.emplace_back(1024, 0);
            return v;
        }
    

    The baseline and other benchmarks are defined similarly to this:

    BASELINE_F(wkt, to_string, Fixture, 0, 0)
    {
        for (auto const& p : this->points_)
        {
            celero::DoNotOptimizeAway(std::to_string(p.x()));
            celero::DoNotOptimizeAway(std::to_string(p.y()));
        }
    }
    

    Consider this output, where subsequent sizes of the problem space receive an unexpected number of iterations (0), leading to a division by zero or similar:

    |     Group      |   Experiment    |   Prob. Space   |     Samples     |   Iterations    |    Baseline     |  us/Iteration   | Iterations/sec  |
    |:--------------:|:---------------:|:---------------:|:---------------:|:---------------:|:---------------:|:---------------:|:---------------:|
    |wkt             | to_string       |               1 |              30 |          262144 |         1.00000 |         0.94479 |      1058432.12 |
    |wkt             | to_string       |             256 |              30 |               0 |         1.00000 |       -nan(ind) |       -nan(ind) |
    |wkt             | to_string       |             512 |              30 |               0 |         1.00000 |       -nan(ind) |       -nan(ind) |
    |wkt             | to_string       |            1024 |              30 |               0 |         1.00000 |       -nan(ind) |       -nan(ind) |
    |wkt             | stringstream    |               1 |              30 |          131072 |         1.63591 |         1.54559 |       647000.75 |
    |wkt             | stringstream    |             256 |              30 |               0 |        -1.00000 |       -nan(ind) |       -nan(ind) |
    |wkt             | stringstream    |             512 |              30 |               0 |        -1.00000 |       -nan(ind) |       -nan(ind) |
    |wkt             | stringstream    |            1024 |              30 |               0 |        -1.00000 |       -nan(ind) |       -nan(ind) |
    |wkt             | lexical_cast    |               1 |              30 |          131072 |         1.70723 |         1.61298 |       619972.00 |
    |wkt             | lexical_cast    |             256 |              30 |               0 |        -1.00000 |       -nan(ind) |       -nan(ind) |
    |wkt             | lexical_cast    |             512 |              30 |               0 |        -1.00000 |       -nan(ind) |       -nan(ind) |
    |wkt             | lexical_cast    |            1024 |              30 |               0 |        -1.00000 |       -nan(ind) |       -nan(ind) |
    

    If I change getExperimentValues to read

        std::vector<celero::TestFixture::ExperimentValue> getExperimentValues() const override
        {
            std::uint64_t const total_tests = 4;
            // Problem space as number of points (pairs of X/Y)
            std::vector<celero::TestFixture::ExperimentValue> v;
            v.emplace_back(1, uint64_t(std::pow(2, total_tests - 0)));
            v.emplace_back(256, uint64_t(std::pow(2, total_tests - 1)));
            v.emplace_back(512, uint64_t(std::pow(2, total_tests - 2)));
            v.emplace_back(1024, uint64_t(std::pow(2, total_tests - 3)));
            return v;
        }
    

    and I keep the BASELINE_F(wkt, to_string, Fixture, 0, 0), then I get this

    |     Group      |   Experiment    |   Prob. Space   |     Samples     |   Iterations    |    Baseline     |  us/Iteration   | Iterations/sec  |
    |:--------------:|:---------------:|:---------------:|:---------------:|:---------------:|:---------------:|:---------------:|:---------------:|
    |wkt             | to_string       |               1 |              30 |          262144 |         1.00000 |         0.91043 |      1098376.39 |
    |wkt             | to_string       |             256 |              30 |               8 |         1.00000 |       249.87500 |         4002.00 |
    |wkt             | to_string       |             512 |              30 |               4 |         1.00000 |       489.50000 |         2042.90 |
    |wkt             | to_string       |            1024 |              30 |               2 |         1.00000 |       980.00000 |         1020.41 |
    |wkt             | stringstream    |               1 |              30 |          262144 |         1.59528 |         1.45240 |       688517.27 |
    |wkt             | stringstream    |             256 |              30 |               8 |         1.19510 |       298.62500 |         3348.68 |
    |wkt             | stringstream    |             512 |              30 |               4 |         1.21655 |       595.50000 |         1679.26 |
    |wkt             | stringstream    |            1024 |              30 |               2 |         1.21939 |      1195.00000 |          836.82 |
    

    where the first experiment always gets the number of iterations calculated by Celero (262144 instead of the pre-calculated 16), but subsequent experiments within a set get the iterations from the getExperimentValues spec.

    I'm confused: is the preference of 0 from BASELINE_F for the first experiment expected behaviour?

    opened by mloskot 0
  • REQ: Runtime warning if CPU scaling is enabled

    REQ: Runtime warning if CPU scaling is enabled

    I tried out Celero for the first time, and got very inconsistent results.

    Tried out google benchmark, and it showed:

    ***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
    

    Would be nice if Celero had a similar warning. I'm betting turning scaling off will fix the inconsistencies I was seeing.

    I disabled CPU scaling (well, technically set it to always run at max frequency) by running sudo cpupower frequency-set -g performance. I'll be back to trying Celero soon.

    opened by darlingm 3
Releases(v2.8.3)
  • v2.8.3(Jan 26, 2022)

  • v2.8.2(Mar 26, 2021)

  • v2.8.1(Mar 25, 2021)

  • v2.8.0(Mar 20, 2021)

  • v2.7.2(Feb 27, 2021)

  • v2.7.1(Feb 27, 2021)

  • v2.7.0(Feb 24, 2021)

    • Removed Google Test submodule
    • Clarified Doxygen use in Readme
    • Added User-Defined Measurements to the Output File
    • Added GitHub Code Analysis
    • Added MinGW Support
    • Cleaned up Compiler issues on MacOS
    Source code(tar.gz)
    Source code(zip)
  • v2.6.0(Oct 18, 2019)

    • This release improves the automated testing and fixes associated bugs.
    • Experiments and Demos were renamed.
    • CMake modernization
    • Google Test was added as a submodule.
    Source code(tar.gz)
    Source code(zip)
  • v2.5.0(Jun 9, 2019)

    This wraps up several minor changes:

    • Testing on Visual Studio 2019
    • Celero now reports the total time to run on the command line.
    • Automatic computation of the required number of iterations has been improved.
    • Added a FindCelero.cmake for easier integration into other projects.
    • Improved Markdown console output.
    • Added a CMake option to allow command-line changes of the location of the GoogleTest repository.
    • Added automated testing for GCC v8, LLVM v7, and XCode 10.1
    Source code(tar.gz)
    Source code(zip)
  • v2.4.0(Oct 27, 2018)

    This release adds "User Defined Measurements" to the library. This was a great contribution that comes with new example code and an updated README.md.

    Source code(tar.gz)
    Source code(zip)
  • v2.3.0(Jul 2, 2018)

    • CMake exports are improved (it is better integrated into CMake's ecosystem).
    • Console output is formatted as a Markdown table for easy copy/paste into reports.
    • Notifies the user when executing in Debug mode (because the results would not be valid measurements).
    • Minor documentation updates.
    • Reduced the overhead of "DoNotOptimizeAway".
    • Greatly improved Google Test integration.
    • Improved the random number generation.
    • Updates for GCC/Linux.
    • Not tested on Mac.
    Source code(tar.gz)
    Source code(zip)
  • v2.2.0(Mar 11, 2018)

    • Improved CMake integration.
    • Console table is now formatted as Markdown.
    • CPU throttle controls have been removed.
    • Added an example of shared_ptr measurements which includes OpenScenGraph's ref_ptr.
    Source code(tar.gz)
    Source code(zip)
  • v2.1.1(Feb 3, 2018)

  • v2.1.0(Oct 5, 2017)

  • v2.0.7(May 28, 2017)

  • v2.0.6(Feb 27, 2017)

    This release introduces exception catching and has a few minor updates to the documentation.

    • The documentation was updated.
    • Clang Format was applied to all source code files.
    • Exception handling was added. Enable via the command line "--catchExceptions" option.
    • Failure handling was added to archives.
    • Minor bugs in the statistics were resolved.
    Source code(tar.gz)
    Source code(zip)
  • 2.0.5(Mar 28, 2016)

    This release re-adds the code to automatically compute some reasonable iteration count and sample size.

    The documentation was updated for this as well. In short, simply set your sample size to zero and Celero will attempt to do your job for you.

    BENCHMARK(DemoSimple, Complex1, 0, 0)
    {
        celero::DoNotOptimizeAway(static_cast<float>(sin(fmod(UniformDistribution(RandomDevice), 3.14159265))));
    }
    

    Will produce something like:

    -----------------------------------------------------------------------------------------------------------------------------------------------
         Group      |   Experiment    |   Prob. Space   |     Samples     |   Iterations    |    Baseline     |  us/Iteration   | Iterations/sec  |
    -----------------------------------------------------------------------------------------------------------------------------------------------
    DemoSimple      | Complex1        | Null            |              30 |         4194304 |         1.00535 |         0.20928 |      4778189.16 |
    

    You can see that the specification of 0 samples and 0 iterations produced 30 samples and 4194304 iterations when it runs. This is based on making numerous measurements before actually running the experiment to determine reasonable values for these two numbers.

    Source code(tar.gz)
    Source code(zip)
  • v2.0.4(Mar 21, 2016)

  • v2.0.3(Mar 20, 2016)

  • v2.0.2(Nov 13, 2015)

    This version changed the default problem space value created when no problem spaces are specified by a benchmark. In the previous release, if no problem space value was defined, then Celero would create a problem space with a default value of zero. This release changes that to the constant TestFixture::Constants::NoProblemSpaceValue. The core problem is that if nothing is done to add a default problem space, then the number of iterations for a result is not set, and the benchmark code is run only a single time. There is a better solution for this, but this should be an improvement for now.

    This release also added a new demo for finding the fastest way to convert an integer to a std::string.

    Source code(tar.gz)
    Source code(zip)
  • v2.0.1(Jul 7, 2015)

    The update to v2.0.0 improved the way experiment values and units were implemented. However, there was an oversight in that if these more complex features were not used, only a single iteration of the code under test was executed. This was resolved in v2.0.1. No API changes were implemented. Minor documentation updates were also performed.

    Source code(tar.gz)
    Source code(zip)
  • v2.0.0(Jun 26, 2015)

    This release adds a few great new features. First, it allows for automatic threading of test cases using fixtures and a user-defined number of concurrent threads.

    Second, this version allows the user to define a hard-coded baseline measurement for measuring performance against real-world measurements, hardware, or time requirements.

    Finally, Celero now allows for dynamic control of the number of iterations based on the problem space value. This means that you can ramp-down the number of iterations as problem complexity increases which can decrease the amount of time required to make the measurements while maintaining accuracy for both large and small data sets.

    Source code(tar.gz)
    Source code(zip)
  • v1.2.0(Apr 9, 2015)

    This release includes multithreaded tests and user-defined units for expressing results. This was tested using GCC, Clang, and Visual Studio on x64 architectures.

    Source code(tar.gz)
    Source code(zip)
  • v1.1.1(Mar 26, 2015)

    • This release fixes the previous release not properly running benchmarks which lacked a problem space definition.
    • Console output was completely revamped.
    • Table output was completely revamped and expanded.
    • README.md and CONTRIBUTING.md were updated.
    Source code(tar.gz)
    Source code(zip)
  • v1.1.0(Mar 22, 2015)

    This version of Celero includes two new test fixture functions: onExperimentStart and onExperimentEnd. Unlike setup and teardown, these functions are each called once for every N calls to the benchmarking function and are included in the measurement.

    Source code(tar.gz)
    Source code(zip)
  • v1.0.9(Feb 9, 2015)

    This version adds a new directory structure to support demo projects as well as experiments. Users are encouraged to submit their own demo and experiments projects (following the project naming convention and general CMake format) on the "develop" branch.

    Source code(tar.gz)
    Source code(zip)
  • v1.0.8(Jan 19, 2015)

    This release adds a summary output to the console for all experiments. Additionally, improvements were made to the timing output for Windows.

    Source code(tar.gz)
    Source code(zip)
  • v1.0.7(Dec 9, 2014)

  • v1.0.6(Nov 9, 2014)

    • Timing results output is improved.
    • Benchmarks which lack a baseline now error-out (vs. asserting)
    • CMake configuration improved to more easily disable compilation/running of tests and examples.
    Source code(tar.gz)
    Source code(zip)
  • v1.0.5(Sep 13, 2014)

    [Edit: v1.0.4 was mis-tagged. It is identical to v1.0.3. This (v1.0.5) is the intended release.]

    This release offers an overhaul of DoNotOptimizeAway. There is slightly new functionality here as well. If a lambda or std::function is passed into DoNotOptimizeAway, the lambda will be executed. The body of this template has also been updated.

    The speed improvement here is important as I think the slowness of the previous implementation made it more difficult to get accurate measurements from very fast code. A demo has been added which compares the new implementation to the old as well as alternatives. Better (portable) implementations of this function are welcome for suggestion. The demo can be used to demonstrate their "goodness".

    Source code(tar.gz)
    Source code(zip)
Owner
John Farrier