A microbenchmark support library

Overview

Benchmark

A library to benchmark code snippets, similar to unit tests. Example:

#include <benchmark/benchmark.h>

static void BM_SomeFunction(benchmark::State& state) {
  // Perform setup here
  for (auto _ : state) {
    // This code gets timed
    SomeFunction();
  }
}
// Register the function as a benchmark
BENCHMARK(BM_SomeFunction);
// Run the benchmark
BENCHMARK_MAIN();

To get started, see Requirements and Installation. See Usage for a full example and the User Guide for a more comprehensive feature overview.

It may also help to read the Google Test documentation as some of the structural aspects of the APIs are similar.

Resources

Discussion group

IRC channel: freenode #googlebenchmark

Additional Tooling Documentation

Assembly Testing Documentation

Requirements

The library can be used with C++03. However, it requires C++11 to build, including compiler and standard library support.

The following minimum versions are required to build the library:

  • GCC 4.8
  • Clang 3.4
  • Visual Studio 14 2015
  • Intel 2015 Update 1

See Platform-Specific Build Instructions.

Installation

This describes the installation process using cmake. As prerequisites, you'll need git and cmake installed.

See dependencies.md for more details regarding supported versions of build tools.

# Check out the library.
$ git clone https://github.com/google/benchmark.git
# Benchmark requires Google Test as a dependency. Add the source tree as a subdirectory.
$ git clone https://github.com/google/googletest.git benchmark/googletest
# Go to the library root directory
$ cd benchmark
# Make a build directory to place the build output.
$ cmake -E make_directory "build"
# Generate build system files with cmake.
$ cmake -E chdir "build" cmake -DCMAKE_BUILD_TYPE=Release ../
# or, starting with CMake 3.13, use a simpler form:
# cmake -DCMAKE_BUILD_TYPE=Release -S . -B "build"
# Build the library.
$ cmake --build "build" --config Release

This builds the benchmark and benchmark_main libraries and tests. On a Unix system, the build directory should now look something like this:

/benchmark
  /build
    /src
      /libbenchmark.a
      /libbenchmark_main.a
    /test
      ...

Next, you can run the tests to check the build.

$ cmake -E chdir "build" ctest --build-config Release

If you want to install the library globally, also run:

sudo cmake --build "build" --config Release --target install

Note that Google Benchmark requires Google Test to build and run the tests. This dependency can be provided in two ways:

  • Check out the Google Test sources into benchmark/googletest, as above.
  • Otherwise, if -DBENCHMARK_DOWNLOAD_DEPENDENCIES=ON is specified during configuration, the library will automatically download and build any required dependencies.

If you do not wish to build and run the tests, add -DBENCHMARK_ENABLE_GTEST_TESTS=OFF to CMAKE_ARGS.

Debug vs Release

By default, benchmark builds as a debug library. You will see a warning in the output when this is the case. To build it as a release library instead, add -DCMAKE_BUILD_TYPE=Release when generating the build system files, as shown above. The use of --config Release in build commands is needed to properly support multi-configuration tools (such as Visual Studio) and can be skipped for other build systems (such as Makefiles).

To enable link-time optimisation, also add -DBENCHMARK_ENABLE_LTO=true when generating the build system files.

If you are using gcc, you might need to set GCC_AR and GCC_RANLIB cmake cache variables, if autodetection fails.

If you are using clang, you may need to set LLVMAR_EXECUTABLE, LLVMNM_EXECUTABLE and LLVMRANLIB_EXECUTABLE cmake cache variables.

Stable and Experimental Library Versions

The main branch contains the latest stable version of the benchmarking library; its API can be considered largely stable, with source-breaking changes made only upon the release of a new major version.

Newer, experimental, features are implemented and tested on the v2 branch. Users who wish to use, test, and provide feedback on the new features are encouraged to try this branch. However, this branch provides no stability guarantees and reserves the right to change and break the API at any time.

Usage

Basic usage

Define a function that executes the code to measure, register it as a benchmark function using the BENCHMARK macro, and ensure an appropriate main function is available:

#include <benchmark/benchmark.h>

static void BM_StringCreation(benchmark::State& state) {
  for (auto _ : state)
    std::string empty_string;
}
// Register the function as a benchmark
BENCHMARK(BM_StringCreation);

// Define another benchmark
static void BM_StringCopy(benchmark::State& state) {
  std::string x = "hello";
  for (auto _ : state)
    std::string copy(x);
}
BENCHMARK(BM_StringCopy);

BENCHMARK_MAIN();

To run the benchmark, compile and link against the benchmark library (libbenchmark.a/.so). If you followed the build steps above, this library will be under the build directory you created.

# Example on linux after running the build steps above. Assumes the
# `benchmark` and `build` directories are under the current directory.
$ g++ mybenchmark.cc -std=c++11 -isystem benchmark/include \
  -Lbenchmark/build/src -lbenchmark -lpthread -o mybenchmark

Alternatively, link against the benchmark_main library and remove BENCHMARK_MAIN(); above to get the same behavior.
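
For instance, here is a minimal sketch of what such a translation unit might look like when linking against benchmark_main (it reuses BM_StringCreation from the example above; no BENCHMARK_MAIN() appears because benchmark_main supplies main()):

#include <benchmark/benchmark.h>

#include <string>

static void BM_StringCreation(benchmark::State& state) {
  for (auto _ : state)
    std::string empty_string;
}
// Register the function as a benchmark; main() is provided by benchmark_main.
BENCHMARK(BM_StringCreation);

When compiling manually, link with -lbenchmark_main in addition to -lbenchmark.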

The compiled executable will run all benchmarks by default. Pass the --help flag for option information or see the guide below.

Usage with CMake

If using CMake, it is recommended to link against the project-provided benchmark::benchmark and benchmark::benchmark_main targets using target_link_libraries. It is possible to use find_package to import an installed version of the library.

find_package(benchmark REQUIRED)

Alternatively, add_subdirectory will incorporate the library directly into one's CMake project.

add_subdirectory(benchmark)

Either way, link to the library as follows.

target_link_libraries(MyTarget benchmark::benchmark)

Platform Specific Build Instructions

Building with GCC

When the library is built using GCC it is necessary to link with the pthread library due to how GCC implements std::thread. Failing to link to pthread will lead to runtime exceptions (unless you're using libc++), not linker errors. See issue #67 for more details. You can link to pthread by adding -pthread to your linker command. Note, you can also use -lpthread, but there are potential issues with ordering of command line parameters if you use that.

Building with Visual Studio 2015 or 2017

The shlwapi library (-lshlwapi) is required to support a call to CPUInfo which reads the registry. Either add shlwapi.lib under [ Configuration Properties > Linker > Input ], or use the following:

// Alternatively, can add libraries using linker options.
#ifdef _WIN32
#pragma comment ( lib, "Shlwapi.lib" )
#ifdef _DEBUG
#pragma comment ( lib, "benchmarkd.lib" )
#else
#pragma comment ( lib, "benchmark.lib" )
#endif
#endif

You can also use the graphical version of CMake:

  • Open the CMake GUI.
  • Under Where to build the binaries, enter the source path plus build.
  • Under CMAKE_INSTALL_PREFIX, enter the source path plus install.
  • Click Configure, Generate, Open Project.
  • If the build fails, try deleting the entire build directory and starting again, or untick options to build less.

Building with Intel 2015 Update 1 or Intel System Studio Update 4

See the instructions for building with Visual Studio. Once built, right-click on the solution and change the build to Intel.

Building on Solaris

If you're running benchmarks on Solaris, you'll want the kstat library linked in too (-lkstat).

User Guide

Command Line

Output Formats

Output Files

Running Benchmarks

Running a Subset of Benchmarks

Result Comparison

Library

Runtime and Reporting Considerations

Passing Arguments

Calculating Asymptotic Complexity

Templated Benchmarks

Fixtures

Custom Counters

Multithreaded Benchmarks

CPU Timers

Manual Timing

Setting the Time Unit

Preventing Optimization

Reporting Statistics

Custom Statistics

Using RegisterBenchmark

Exiting with an Error

A Faster KeepRunning Loop

Disabling CPU Frequency Scaling

Output Formats

The library supports multiple output formats. Use the --benchmark_format=<console|json|csv> flag (or set the BENCHMARK_FORMAT=<console|json|csv> environment variable) to set the format type. console is the default format.

The Console format is intended to be a human readable format. By default the format generates color output. Context is output on stderr and the tabular data on stdout. Example tabular output looks like:

Benchmark                               Time(ns)    CPU(ns) Iterations
----------------------------------------------------------------------
BM_SetInsert/1024/1                        28928      29349      23853  133.097kB/s   33.2742k items/s
BM_SetInsert/1024/8                        32065      32913      21375  949.487kB/s   237.372k items/s
BM_SetInsert/1024/10                       33157      33648      21431  1.13369MB/s   290.225k items/s

The JSON format outputs human readable json split into two top level attributes. The context attribute contains information about the run in general, including information about the CPU and the date. The benchmarks attribute contains a list of every benchmark run. Example json output looks like:

{
  "context": {
    "date": "2015/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    },
    {
      "name": "BM_SetInsert/1024/8",
      "iterations": 21609,
      "real_time": 32317,
      "cpu_time": 32429,
      "bytes_per_second": 986770,
      "items_per_second": 246693
    },
    {
      "name": "BM_SetInsert/1024/10",
      "iterations": 21393,
      "real_time": 32724,
      "cpu_time": 33355,
      "bytes_per_second": 1199226,
      "items_per_second": 299807
    }
  ]
}

The CSV format outputs comma-separated values. The context is output on stderr and the CSV itself on stdout. Example CSV output looks like:

name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,

Output Files

Write benchmark results to a file with the --benchmark_out=<filename> option (or set BENCHMARK_OUT). Specify the output format with --benchmark_out_format={json|console|csv} (or set BENCHMARK_OUT_FORMAT={json|console|csv}). Note that the 'csv' reporter is deprecated and the saved .csv file is not parsable by csv parsers.

Specifying --benchmark_out does not suppress the console output.

Running Benchmarks

Benchmarks are executed by running the produced binaries. Benchmark binaries, by default, accept options that may be specified either through their command line interface or by setting environment variables before execution. For every --option_flag=<value> CLI switch, a corresponding environment variable OPTION_FLAG=<value> exists and is used as the default if set (CLI switches always prevail). A complete list of CLI options is available by running benchmarks with the --help switch.

Running a Subset of Benchmarks

The --benchmark_filter=<regex> option (or BENCHMARK_FILTER=<regex> environment variable) can be used to only run the benchmarks that match the specified <regex>. For example:

$ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32
Run on (1 X 2300 MHz CPU )
2016-06-25 19:34:24
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
BM_memcpy/32          12 ns         12 ns   54687500
BM_memcpy/32k       1834 ns       1837 ns     357143

Result Comparison

It is possible to compare benchmarking results. See the Additional Tooling Documentation.

Runtime and Reporting Considerations

When the benchmark binary is executed, each benchmark function is run serially. The number of iterations to run is determined dynamically by running the benchmark a few times, measuring the time taken, and ensuring that the ultimate result is statistically stable. As such, faster benchmark functions are run for more iterations than slower ones, and the number of iterations is reported alongside the results.

In all cases, the number of iterations for which the benchmark is run is governed by the amount of time the benchmark takes. Concretely, the number of iterations is at least one, not more than 1e9, until CPU time is greater than the minimum time, or the wallclock time is 5x minimum time. The minimum time is set per benchmark by calling MinTime on the registered benchmark object.
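
For example, the minimum time can be raised at registration time; a small sketch (reusing the BM_StringCopy benchmark from the Usage section, with an illustrative value of 2 seconds):

// Keep iterating until at least 2 seconds of CPU time have accumulated
// for this benchmark, instead of the default minimum time.
BENCHMARK(BM_StringCopy)->MinTime(2.0);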

Average timings are then reported over the iterations run. If multiple repetitions are requested using the --benchmark_repetitions command-line option, or at registration time, the benchmark function will be run several times and statistical results across these repetitions will also be reported.

As well as the per-benchmark entries, a preamble in the report will include information about the machine on which the benchmarks are run.

Passing Arguments

Sometimes a family of benchmarks can be implemented with just one routine that takes an extra argument to specify which one of the family of benchmarks to run. For example, the following code defines a family of benchmarks for measuring the speed of memcpy() calls of different lengths:

static void BM_memcpy(benchmark::State& state) {
  char* src = new char[state.range(0)];
  char* dst = new char[state.range(0)];
  memset(src, 'x', state.range(0));
  for (auto _ : state)
    memcpy(dst, src, state.range(0));
  state.SetBytesProcessed(int64_t(state.iterations()) *
                          int64_t(state.range(0)));
  delete[] src;
  delete[] dst;
}
BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(1<<10)->Arg(8<<10);

The preceding code is quite repetitive, and can be replaced with the following short-hand. The following invocation will pick a few appropriate arguments in the specified range and will generate a benchmark for each such argument.

BENCHMARK(BM_memcpy)->Range(8, 8<<10);

By default the arguments in the range are generated in multiples of eight and the command above selects [ 8, 64, 512, 4k, 8k ]. In the following code the range multiplier is changed to multiples of two.

BENCHMARK(BM_memcpy)->RangeMultiplier(2)->Range(8, 8<<10);

Now arguments generated are [ 8, 16, 32, 64, 128, 256, 512, 1024, 2k, 4k, 8k ].

The preceding code shows a method of defining a sparse range. The following example shows a method of defining a dense range. It is then used to benchmark the performance of std::vector initialization for uniformly increasing sizes.

static void BM_DenseRange(benchmark::State& state) {
  for(auto _ : state) {
    std::vector<int> v(state.range(0), state.range(0));
    benchmark::DoNotOptimize(v.data());
    benchmark::ClobberMemory();
  }
}
BENCHMARK(BM_DenseRange)->DenseRange(0, 1024, 128);

Now arguments generated are [ 0, 128, 256, 384, 512, 640, 768, 896, 1024 ].

You might have a benchmark that depends on two or more inputs. For example, the following code defines a family of benchmarks for measuring the speed of set insertion.

static void BM_SetInsert(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming();
    data = ConstructRandomSet(state.range(0));
    state.ResumeTiming();
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 128})
    ->Args({2<<10, 128})
    ->Args({4<<10, 128})
    ->Args({8<<10, 128})
    ->Args({1<<10, 512})
    ->Args({2<<10, 512})
    ->Args({4<<10, 512})
    ->Args({8<<10, 512});

The preceding code is quite repetitive, and can be replaced with the following short-hand. The following macro will pick a few appropriate arguments in the product of the two specified ranges and will generate a benchmark for each such pair.

BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {128, 512}});

Some benchmarks may require specific argument values that cannot be expressed with Ranges. In this case, ArgsProduct offers the ability to generate a benchmark input for each combination in the product of the supplied vectors.

BENCHMARK(BM_SetInsert)
    ->ArgsProduct({{1<<10, 3<<10, 8<<10}, {20, 40, 60, 80}})
// would generate the same benchmark arguments as
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 20})
    ->Args({3<<10, 20})
    ->Args({8<<10, 20})
    ->Args({3<<10, 40})
    ->Args({8<<10, 40})
    ->Args({1<<10, 40})
    ->Args({1<<10, 60})
    ->Args({3<<10, 60})
    ->Args({8<<10, 60})
    ->Args({1<<10, 80})
    ->Args({3<<10, 80})
    ->Args({8<<10, 80});

For more complex patterns of inputs, passing a custom function to Apply allows programmatic specification of an arbitrary set of arguments on which to run the benchmark. The following example enumerates a dense range on one parameter, and a sparse range on the second.

static void CustomArguments(benchmark::internal::Benchmark* b) {
  for (int i = 0; i <= 10; ++i)
    for (int j = 32; j <= 1024*1024; j *= 8)
      b->Args({i, j});
}
BENCHMARK(BM_SetInsert)->Apply(CustomArguments);

Passing Arbitrary Arguments to a Benchmark

In C++11 it is possible to define a benchmark that takes an arbitrary number of extra arguments. The BENCHMARK_CAPTURE(func, test_case_name, ...args) macro creates a benchmark that invokes func with the benchmark::State as the first argument followed by the specified args.... The test_case_name is appended to the name of the benchmark and should describe the values passed.

template <class ...ExtraArgs>
void BM_takes_args(benchmark::State& state, ExtraArgs&&... extra_args) {
  [...]
}
// Registers a benchmark named "BM_takes_args/int_string_test" that passes
// the specified values to `extra_args`.
BENCHMARK_CAPTURE(BM_takes_args, int_string_test, 42, std::string("abc"));

Note that elements of ...args may refer to global variables. Users should avoid modifying global state inside of a benchmark.

Calculating Asymptotic Complexity (Big O)

Asymptotic complexity might be calculated for a family of benchmarks. The following code will calculate the coefficient for the high-order term in the running time and the normalized root-mean square error of string comparison.

static void BM_StringCompare(benchmark::State& state) {
  std::string s1(state.range(0), '-');
  std::string s2(state.range(0), '-');
  for (auto _ : state) {
    benchmark::DoNotOptimize(s1.compare(s2));
  }
  state.SetComplexityN(state.range(0));
}
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN);

As shown in the following invocation, asymptotic complexity might also be calculated automatically.

BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity();

The following code will specify asymptotic complexity with a lambda function, which can be used to customize the high-order term calculation.

BENCHMARK(BM_StringCompare)->RangeMultiplier(2)
    ->Range(1<<10, 1<<18)->Complexity([](benchmark::IterationCount n)->double{return n; });

Templated Benchmarks

This example produces and consumes messages of size sizeof(v) range_x times. It also outputs throughput in the absence of multiprogramming.

template <class Q> void BM_Sequential(benchmark::State& state) {
  Q q;
  typename Q::value_type v;
  for (auto _ : state) {
    for (int i = state.range(0); i--; )
      q.push(v);
    for (int e = state.range(0); e--; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range(0));
}
BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);

Three macros are provided for adding benchmark templates.

#ifdef BENCHMARK_HAS_CXX11
#define BENCHMARK_TEMPLATE(func, ...) // Takes any number of parameters.
#else // C++ < C++11
#define BENCHMARK_TEMPLATE(func, arg1)
#endif
#define BENCHMARK_TEMPLATE1(func, arg1)
#define BENCHMARK_TEMPLATE2(func, arg1, arg2)

Fixtures

Fixture tests are created by first defining a type that derives from ::benchmark::Fixture and then creating/registering the tests using the following macros:

  • BENCHMARK_F(ClassName, Method)
  • BENCHMARK_DEFINE_F(ClassName, Method)
  • BENCHMARK_REGISTER_F(ClassName, Method)

For example:

class MyFixture : public benchmark::Fixture {
public:
  void SetUp(const ::benchmark::State& state) {
  }

  void TearDown(const ::benchmark::State& state) {
  }
};

BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
   for (auto _ : st) {
     ...
  }
}

BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
   for (auto _ : st) {
     ...
  }
}
/* BarTest is NOT registered */
BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
/* BarTest is now registered */

Templated Fixtures

You can also create templated fixtures by using the following macros:

  • BENCHMARK_TEMPLATE_F(ClassName, Method, ...)
  • BENCHMARK_TEMPLATE_DEFINE_F(ClassName, Method, ...)

For example:

template<typename T>
class MyFixture : public benchmark::Fixture {};

BENCHMARK_TEMPLATE_F(MyFixture, IntTest, int)(benchmark::State& st) {
   for (auto _ : st) {
     ...
  }
}

BENCHMARK_TEMPLATE_DEFINE_F(MyFixture, DoubleTest, double)(benchmark::State& st) {
   for (auto _ : st) {
     ...
  }
}

BENCHMARK_REGISTER_F(MyFixture, DoubleTest)->Threads(2);

Custom Counters

You can add your own counters with user-defined names. The example below will add columns "Foo", "Bar" and "Baz" in its output:

static void UserCountersExample1(benchmark::State& state) {
  double numFoos = 0, numBars = 0, numBazs = 0;
  for (auto _ : state) {
    // ... count Foo,Bar,Baz events
  }
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
}

The state.counters object is a std::map with std::string keys and Counter values. The latter is a double-like class, via an implicit conversion to double&. Thus you can use all of the standard arithmetic assignment operators (=,+=,-=,*=,/=) to change the value of each counter.
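
For illustration, a minimal sketch of those operators in action (the values are arbitrary):

  // Counter converts implicitly to double&, so the usual
  // arithmetic-assignment operators all apply:
  state.counters["Foo"] = 10;   // set to 10
  state.counters["Foo"] += 5;   // now 15
  state.counters["Foo"] *= 2;   // now 30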

In multithreaded benchmarks, each counter is set on the calling thread only. When the benchmark finishes, the counters from each thread will be summed; the resulting sum is the value which will be shown for the benchmark.

The Counter constructor accepts three parameters: the value as a double ; a bit flag which allows you to show counters as rates, and/or as per-thread iteration, and/or as per-thread averages, and/or iteration invariants, and/or finally inverting the result; and a flag specifying the 'unit' - i.e. is 1k a 1000 (default, benchmark::Counter::OneK::kIs1000), or 1024 (benchmark::Counter::OneK::kIs1024)?

  // sets a simple counter
  state.counters["Foo"] = numFoos;

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark.
  // Meaning: per one second, how many 'foo's are processed?
  state.counters["FooRate"] = Counter(numFoos, benchmark::Counter::kIsRate);

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark, and the result inverted.
  // Meaning: how many seconds it takes to process one 'foo'?
  state.counters["FooInvRate"] = Counter(numFoos, benchmark::Counter::kIsRate | benchmark::Counter::kInvert);

  // Set the counter as a thread-average quantity. It will
  // be presented divided by the number of threads.
  state.counters["FooAvg"] = Counter(numFoos, benchmark::Counter::kAvgThreads);

  // There's also a combined flag:
  state.counters["FooAvgRate"] = Counter(numFoos,benchmark::Counter::kAvgThreadsRate);

  // This says that we process with the rate of state.range(0) bytes every iteration:
  state.counters["BytesProcessed"] = Counter(state.range(0), benchmark::Counter::kIsIterationInvariantRate, benchmark::Counter::OneK::kIs1024);

When you're compiling in C++11 mode or later you can use insert() with std::initializer_list:

  // With C++11, this can be done:
  state.counters.insert({{"Foo", numFoos}, {"Bar", numBars}, {"Baz", numBazs}});
  // ... instead of:
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;

Counter Reporting

When using the console reporter, by default, user counters are printed at the end after the table, the same way as bytes_processed and items_processed. This is best for cases in which there are few counters, or where there are only a couple of lines per benchmark. Here's an example of the default output:

------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations UserCounters...
------------------------------------------------------------------------------
BM_UserCounter/threads:8      2248 ns      10277 ns      68808 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:1      9797 ns       9788 ns      71523 Bar=2 Bat=5 Baz=3 Foo=1024m
BM_UserCounter/threads:2      4924 ns       9842 ns      71036 Bar=4 Bat=10 Baz=6 Foo=2
BM_UserCounter/threads:4      2589 ns      10284 ns      68012 Bar=8 Bat=20 Baz=12 Foo=4
BM_UserCounter/threads:8      2212 ns      10287 ns      68040 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:16     1782 ns      10278 ns      68144 Bar=32 Bat=80 Baz=48 Foo=16
BM_UserCounter/threads:32     1291 ns      10296 ns      68256 Bar=64 Bat=160 Baz=96 Foo=32
BM_UserCounter/threads:4      2615 ns      10307 ns      68040 Bar=8 Bat=20 Baz=12 Foo=4
BM_Factorial                    26 ns         26 ns   26608979 40320
BM_Factorial/real_time          26 ns         26 ns   26587936 40320
BM_CalculatePiRange/1           16 ns         16 ns   45704255 0
BM_CalculatePiRange/8           73 ns         73 ns    9520927 3.28374
BM_CalculatePiRange/64         609 ns        609 ns    1140647 3.15746
BM_CalculatePiRange/512       4900 ns       4901 ns     142696 3.14355

If this doesn't suit you, you can print each counter as a table column by passing the flag --benchmark_counters_tabular=true to the benchmark application. This is best for cases in which there are a lot of counters, or a lot of lines per individual benchmark. Note that this will trigger a reprinting of the table header any time the counter set changes between individual benchmarks. Here's an example of corresponding output when --benchmark_counters_tabular=true is passed:

---------------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations    Bar   Bat   Baz   Foo
---------------------------------------------------------------------------------------
BM_UserCounter/threads:8      2198 ns       9953 ns      70688     16    40    24     8
BM_UserCounter/threads:1      9504 ns       9504 ns      73787      2     5     3     1
BM_UserCounter/threads:2      4775 ns       9550 ns      72606      4    10     6     2
BM_UserCounter/threads:4      2508 ns       9951 ns      70332      8    20    12     4
BM_UserCounter/threads:8      2055 ns       9933 ns      70344     16    40    24     8
BM_UserCounter/threads:16     1610 ns       9946 ns      70720     32    80    48    16
BM_UserCounter/threads:32     1192 ns       9948 ns      70496     64   160    96    32
BM_UserCounter/threads:4      2506 ns       9949 ns      70332      8    20    12     4
--------------------------------------------------------------
Benchmark                        Time           CPU Iterations
--------------------------------------------------------------
BM_Factorial                    26 ns         26 ns   26392245 40320
BM_Factorial/real_time          26 ns         26 ns   26494107 40320
BM_CalculatePiRange/1           15 ns         15 ns   45571597 0
BM_CalculatePiRange/8           74 ns         74 ns    9450212 3.28374
BM_CalculatePiRange/64         595 ns        595 ns    1173901 3.15746
BM_CalculatePiRange/512       4752 ns       4752 ns     147380 3.14355
BM_CalculatePiRange/4k       37970 ns      37972 ns      18453 3.14184
BM_CalculatePiRange/32k     303733 ns     303744 ns       2305 3.14162
BM_CalculatePiRange/256k   2434095 ns    2434186 ns        288 3.1416
BM_CalculatePiRange/1024k  9721140 ns    9721413 ns         71 3.14159
BM_CalculatePi/threads:8      2255 ns       9943 ns      70936

Note above the additional header printed when the benchmark changes from BM_UserCounter to BM_Factorial. This is because BM_Factorial does not have the same counter set as BM_UserCounter.

Multithreaded Benchmarks

In a multithreaded test (benchmark invoked by multiple threads simultaneously), it is guaranteed that none of the threads will start until all have reached the start of the benchmark loop, and all will have finished before any thread exits the benchmark loop. (This behavior is also provided by the KeepRunning() API.) As such, any global setup or teardown can be wrapped in a check against the thread index:

static void BM_MultiThreaded(benchmark::State& state) {
  if (state.thread_index == 0) {
    // Setup code here.
  }
  for (auto _ : state) {
    // Run the test as normal.
  }
  if (state.thread_index == 0) {
    // Teardown code here.
  }
}
BENCHMARK(BM_MultiThreaded)->Threads(2);

If the benchmarked code itself uses threads and you want to compare it to single-threaded code, you may want to use real-time ("wallclock") measurements for latency comparisons:

BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();

Without UseRealTime, CPU time is used by default.

CPU Timers

By default, the CPU timer only measures the time spent by the main thread. If the benchmark itself uses threads internally, this measurement may not be what you are looking for. Instead, there is a way to measure the total CPU usage of the process, by all the threads.

void callee(int i);

static void MyMain(int size) {
#pragma omp parallel for
  for(int i = 0; i < size; i++)
    callee(i);
}

static void BM_OpenMP(benchmark::State& state) {
  for (auto _ : state)
    MyMain(state.range(0));
}

// Measure the time spent by the main thread and use it to decide for how long
// to run the benchmark loop. Depending on the internal implementation details,
// this may measure anywhere from near-zero (the overhead spent before/after
// work handoff to the worker thread[s]) to the whole single-thread time.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10);

// Measure the user-visible time, the wall clock (literally, the time that
// has passed on the clock on the wall), and use it to decide for how long to
// run the benchmark loop. This will always be meaningful, and will match the
// time spent by the main thread in the single-threaded case; in general it
// decreases as more internal threads share the work.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10)->UseRealTime();

// Measure the total CPU consumption and use it to decide for how long to
// run the benchmark loop. This will always measure to no less than the
// time spent by the main thread in the single-threaded case.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10)->MeasureProcessCPUTime();

// A mixture of the last two. Measure the total CPU consumption, but use the
// wall clock to decide for how long to run the benchmark loop.
BENCHMARK(BM_OpenMP)->Range(8, 8<<10)->MeasureProcessCPUTime()->UseRealTime();

Controlling Timers

Normally, the entire duration of the work loop (for (auto _ : state) {}) is measured. But sometimes, it is necessary to do some work inside of that loop, every iteration, but without counting that time to the benchmark time. That is possible, although it is not recommended, since it has high overhead.

static void BM_SetInsert_With_Timer_Control(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming(); // Stop timers. They will not count until they are resumed.
    data = ConstructRandomSet(state.range(0)); // Do something that should not be measured
    state.ResumeTiming(); // And resume timers. They are now counting again.
    // The rest will be measured.
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert_With_Timer_Control)->Ranges({{1<<10, 8<<10}, {128, 512}});

Manual Timing

For benchmarking something for which neither CPU time nor real-time is correct or accurate enough, completely manual timing is supported using the UseManualTime function.

When UseManualTime is used, the benchmarked code must call SetIterationTime once per iteration of the benchmark loop to report the manually measured time.

An example use case for this is benchmarking GPU execution (e.g. OpenCL or CUDA kernels, OpenGL or Vulkan or Direct3D draw calls), which cannot be accurately measured using CPU time or real-time. Instead, they can be measured accurately using a dedicated API, and these measurement results can be reported back with SetIterationTime.

static void BM_ManualTiming(benchmark::State& state) {
  int microseconds = state.range(0);
  std::chrono::duration<double, std::micro> sleep_duration {
    static_cast<double>(microseconds)
  };

  for (auto _ : state) {
    auto start = std::chrono::high_resolution_clock::now();
    // Simulate some useful workload with a sleep
    std::this_thread::sleep_for(sleep_duration);
    auto end = std::chrono::high_resolution_clock::now();

    auto elapsed_seconds =
      std::chrono::duration_cast<std::chrono::duration<double>>(
        end - start);

    state.SetIterationTime(elapsed_seconds.count());
  }
}
BENCHMARK(BM_ManualTiming)->Range(1, 1<<17)->UseManualTime();

Setting the Time Unit

If a benchmark runs for a few milliseconds, it may be hard to visually compare the measured times, since the output data is given in nanoseconds by default. To change this, set the time unit explicitly:

BENCHMARK(BM_test)->Unit(benchmark::kMillisecond);

Preventing Optimization

To prevent a value or expression from being optimized away by the compiler the benchmark::DoNotOptimize(...) and benchmark::ClobberMemory() functions can be used.

static void BM_test(benchmark::State& state) {
  for (auto _ : state) {
      int x = 0;
      for (int i=0; i < 64; ++i) {
        benchmark::DoNotOptimize(x += i);
      }
  }
}

DoNotOptimize(<expr>) forces the result of <expr> to be stored in either memory or a register. For GNU based compilers it acts as a read/write barrier for global memory. More specifically it forces the compiler to flush pending writes to memory and reload any other values as necessary.

Note that DoNotOptimize(<expr>) does not prevent optimizations on <expr> in any way. <expr> may even be removed entirely when the result is already known. For example:

  /* Example 1: `<expr>` is removed entirely. */
  int foo(int x) { return x + 42; }
  while (...) DoNotOptimize(foo(0)); // Optimized to DoNotOptimize(42);

  /*  Example 2: Result of '<expr>' is only reused */
  int bar(int) __attribute__((const));
  while (...) DoNotOptimize(bar(0)); // Optimized to:
  // int __result__ = bar(0);
  // while (...) DoNotOptimize(__result__);

The second tool for preventing optimizations is ClobberMemory(). In essence ClobberMemory() forces the compiler to perform all pending writes to global memory. Memory managed by block scope objects must be "escaped" using DoNotOptimize(...) before it can be clobbered. In the below example ClobberMemory() prevents the call to v.push_back(42) from being optimized away.

static void BM_vector_push_back(benchmark::State& state) {
  for (auto _ : state) {
    std::vector<int> v;
    v.reserve(1);
    benchmark::DoNotOptimize(v.data()); // Allow v.data() to be clobbered.
    v.push_back(42);
    benchmark::ClobberMemory(); // Force 42 to be written to memory.
  }
}

Note that ClobberMemory() is only available for GNU or MSVC based compilers.

Statistics: Reporting the Mean, Median and Standard Deviation of Repeated Benchmarks

By default each benchmark is run once and that single result is reported. However benchmarks are often noisy and a single result may not be representative of the overall behavior. For this reason it's possible to repeatedly rerun the benchmark.

The number of runs of each benchmark is specified globally by the --benchmark_repetitions flag or on a per benchmark basis by calling Repetitions on the registered benchmark object. When a benchmark is run more than once the mean, median and standard deviation of the runs will be reported.

Additionally the --benchmark_report_aggregates_only={true|false} and --benchmark_display_aggregates_only={true|false} flags, or the ReportAggregatesOnly(bool) and DisplayAggregatesOnly(bool) functions, can be used to change how repeated tests are reported. By default the result of each repeated run is reported. When the report-aggregates-only option is true, only the aggregates (i.e. mean, median and standard deviation, plus complexity measurements if they were requested) of the runs are reported, to both reporters: standard output (console) and the file. When only the display-aggregates-only option is true, only the aggregates are displayed on standard output, while the file output still contains everything. Calling ReportAggregatesOnly(bool) / DisplayAggregatesOnly(bool) on a registered benchmark object overrides the value of the appropriate flag for that benchmark.
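
For example, a small sketch (reusing the BM_test benchmark from the examples above) that requests 10 repetitions and reports only the aggregates:

// Run the whole measurement 10 times and report only the aggregate statistics
// (mean, median, standard deviation) of those runs, on the console and in any
// file output.
BENCHMARK(BM_test)->Repetitions(10)->ReportAggregatesOnly(true);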

Custom Statistics

While having mean, median and standard deviation is nice, this may not be enough for everyone. For example you may want to know what the largest observation is, e.g. because you have some real-time constraints. This is easy. The following code will specify a custom statistic to be calculated, defined by a lambda function.

void BM_spin_empty(benchmark::State& state) {
  for (auto _ : state) {
    for (int x = 0; x < state.range(0); ++x) {
      benchmark::DoNotOptimize(x);
    }
  }
}

BENCHMARK(BM_spin_empty)
  ->ComputeStatistics("max", [](const std::vector<double>& v) -> double {
    return *(std::max_element(std::begin(v), std::end(v)));
  })
  ->Arg(512);

Using RegisterBenchmark(name, fn, args...)

The RegisterBenchmark(name, func, args...) function provides an alternative way to create and register benchmarks. RegisterBenchmark(name, func, args...) creates, registers, and returns a pointer to a new benchmark with the specified name that invokes func(st, args...) where st is a benchmark::State object.

Unlike the BENCHMARK registration macros, which can only be used at global scope, RegisterBenchmark can be called anywhere. This allows benchmarks to be registered programmatically.

Additionally, RegisterBenchmark allows any callable object to be registered as a benchmark, including capturing lambdas and function objects.

For example:

auto BM_test = [](benchmark::State& st, auto Inputs) { /* ... */ };

int main(int argc, char** argv) {
  for (auto& test_input : { /* ... */ })
      benchmark::RegisterBenchmark(test_input.name(), BM_test, test_input);
  benchmark::Initialize(&argc, argv);
  benchmark::RunSpecifiedBenchmarks();
}

Exiting with an Error

When errors caused by external influences, such as file I/O and network communication, occur within a benchmark, the State::SkipWithError(const char* msg) function can be used to skip that run of the benchmark and report the error. Note that only future iterations of the KeepRunning() loop are skipped. For the ranged-for version of the benchmark loop, users must explicitly exit the loop, otherwise all iterations will be performed. Users may explicitly return to exit the benchmark immediately.

The SkipWithError(...) function may be used at any point within the benchmark, including before and after the benchmark loop. Moreover, if SkipWithError(...) has been used, it is not required to reach the benchmark loop and one may return from the benchmark function early.

For example:

static void BM_test(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    // KeepRunning() loop will not be entered.
  }
  while (state.KeepRunning()) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // Needed to skip the rest of the iteration.
    }
    do_stuff(data);
  }
}

static void BM_test_ranged_for(benchmark::State & state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    return; // Early return is allowed when SkipWithError() has been used.
  }
  for (auto _ : state) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // REQUIRED to prevent all further iterations.
    }
    do_stuff(data);
  }
}

A Faster KeepRunning Loop

In C++11 mode, a range-based for loop should be used in preference to the KeepRunning loop for running the benchmarks. For example:

static void BM_Fast(benchmark::State &state) {
  for (auto _ : state) {
    FastOperation();
  }
}
BENCHMARK(BM_Fast);

The reason the ranged-for loop is faster than using KeepRunning is that KeepRunning requires a memory load and store of the iteration count every iteration, whereas the ranged-for variant is able to keep the iteration count in a register.
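
For comparison, here is the same benchmark written with the older KeepRunning loop; a sketch only, since it measures the same thing and differs just in the loop bookkeeping:

static void BM_Fast_KeepRunning(benchmark::State &state) {
  // KeepRunning() loads and stores the iteration count on every pass,
  // which is exactly the overhead the ranged-for loop avoids.
  while (state.KeepRunning()) {
    FastOperation();
  }
}
BENCHMARK(BM_Fast_KeepRunning);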

For example, an empty inner loop using the range-based for method looks like:

# Loop Init
  mov rbx, qword ptr [r14 + 104]
  call benchmark::State::StartKeepRunning()
  test rbx, rbx
  je .LoopEnd
.LoopHeader: # =>This Inner Loop Header: Depth=1
  add rbx, -1
  jne .LoopHeader
.LoopEnd:

Compared to an empty KeepRunning loop, which looks like:

.LoopHeader: # in Loop: Header=BB0_3 Depth=1
  cmp byte ptr [rbx], 1
  jne .LoopInit
.LoopBody: # =>This Inner Loop Header: Depth=1
  mov rax, qword ptr [rbx + 8]
  lea rcx, [rax + 1]
  mov qword ptr [rbx + 8], rcx
  cmp rax, qword ptr [rbx + 104]
  jb .LoopHeader
  jmp .LoopEnd
.LoopInit:
  mov rdi, rbx
  call benchmark::State::StartKeepRunning()
  jmp .LoopBody
.LoopEnd:

Unless C++03 compatibility is required, the ranged-for variant of writing the benchmark loop should be preferred.

Disabling CPU Frequency Scaling

If you see this error:

***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.

you might want to disable the CPU frequency scaling while running the benchmark:

sudo cpupower frequency-set --governor performance
./mybench
sudo cpupower frequency-set --governor powersave
Comments
  • Add user-defined counters.

    Add user-defined counters.

    This PR adds user-defined counters, as discussed in issue #240 . I've added usage notes to the README.md file.

    This was implemented mainly by adding two classes: BenchmarkCounters which is responsible for managing the collection of counters and Counter which contains logic for each individual counter. BenchmarkCounters objects can be summed up to aggregate multithread benchmark results. Then I added a BenchmarkCounters member to State and Run, which are accessible as needed.

    Though the changes in this PR generally adhere to the gist of the discussion in issue #240 , I've opted to leave the bytes_processed and items_processed as they previously were. In an early commit I did move these quantities to the BenchmarkCounters class, but otherwise noted that these changes were maybe a bit too far-reaching for what I wanted to do here. As an illustration of the resulting approach, I left some of this code commented out (eg, see src/benchmark.cc:470).

    I added two methods to the Fixture class: InitState() and TerminateState(). These were needed to change the state before and after the benchmark. They receive a non-const reference to benchmark::State so that they start and stop hardware counters as needed. I did this instead of changing the State constness in Setup() and TearDown() to retain compatibility. See src/benchmark.cc:758.

    The JSON reporter was the easiest to adapt. The Console and CSV reporters had the part printing the header moved out of Reporter::PrintContext(). In the console reporter, I store the previously-printed header, and check on each Run whether the header is different; if it is then a new header is printed. The printing of header inside the table was not immediately clear, so I added another separator line filled with '-' before the names; this makes the in-table headers more visible by separating them more clearly from the results above. In the CSV reporter, I gather first all the fields in a set, and then print counters (or nothing) based on this set. This set is local so that part is done outside of the PrintRunData() method which I promptly recognize as awkward. See src/csv_reporter.cc:80 below.

    Note that I needed to add #includes of <utility>,<vector>,<string> and <initializer_list> (this last only when compiling in C++11) to the benchmark_api.h file. If you do not want this, let me know, or just feel free to suggest alternative approaches that would obviate these includes.

    All the existing unit tests are passed successfully. I added also another test showing use with the PAPI library, but will do a separate PR for that, after this one is closed. Also, I did not add new specific tests for the user counters. If this PR is approved, then we'll discuss what to test for.

    Please let me know if something in the code of this PR should be changed. For example, maybe BenchmarkCounters could be called UserCounters. Also, I tried to be faithful to the style of this project, but if for any reason something was overlooked feel free to point it out; I'll gladly correct it.

    cla: yes 
    opened by biojppm 78
  • CI: Add Conan testing, mass building and publishing

    CI: Add Conan testing, mass building and publishing

    This

    • is adding a recipe for the Conan package manager
    • also adds a small test to the CI if Conan package creation is working
    • updated: also adds a upload script to the CI for the Conan recipe to a Conan repository

    ~~Please note that for future releases of benchmark the version string needs to be updated in conanfile.py~~ (not anymore, see discussion)


    ~~Future work could include~~ Remaining to do list:

    • [x] setting up a free repository on Bintray: https://bintray.com/dominichamon/benchmark
    • [ ] including the package in the official Conan repository conan-center
    • [x] and extending the CI to upload new versions of the recipe and built packages to the Bintray repository automatically

    To use the recipe after this pull request is merged it requires to clone this git repository first; when the recipe is included in conan-center every Conan client would be able to use it via one single install command.


    Please let me know if you have any further questions regarding supporting Conan ๐Ÿ˜„

    Fixes #635

    //cc @raulbocanegra @p-groarke @danimtb @Mikayex @mpusz @iblis-ms

    cla: yes 
    opened by Croydon 71
  • Benchmark ID for Better Tooling

    Benchmark ID for Better Tooling

    This Pull Request implements a Benchmark ID that is output for the various reporting statures.

    What it does

    This pull request outputs an additional column (console and csv) and an additional field (json) called "ID" ("id" for csv and json). It looks like this:

    Multi-Pretty ID Printing Pretty ID Printing

    Rationale

    Suppose one registers 2 benchmarks functions, as below (implementation purely for exposition):

    #include <benchmark/benchmark.h>
    
    static void BM_StringCreation(benchmark::State& state) {
      for (auto _ : state)
        std::string empty_string;
    }
    // Register the function as a benchmark
    BENCHMARK(BM_StringCreation);
    
    // Define another benchmark
    static void BM_StringCreation_Append(benchmark::State& state) {
      for (auto _ : state) {
        std::string empty_string;
        empty_string += "hello";
      }
    }
    BENCHMARK(BM_StringCreation_Append);
    
    BENCHMARK_MAIN();
    

    Now consider that someone runs this code with --benchmark_repetitions=2. We get the report for BM_StringCreation and BM_StringCreation_Append, giving us stddev, mean, and median for the benchmarks. The problem then becomes...

    How do you associate a the name of the benchmark statistic with the benchmark it belongs to?

    Normally, this wouldn't be a problem... but in the presence of custom statistics allowed by ComputeStatistics, we can run into problems. For example, if a statistic called my_stat is added, we run into issues where a tool cannot reliably determine which stats belong to which benchmark:

    --------------------------------------------...
    Name
    --------------------------------------------...
    BM_StringCreation
    BM_StringCreation
    BM_StringCreation_stddev
    BM_StringCreation_median
    BM_StringCreation_mean
    BM_StringCreation_Append
    BM_StringCreation_Append
    BM_StringCreation_Append_stddev
    BM_StringCreation_Append_median
    BM_StringCreation_Append_mean
    BM_StringCreation_Append_my_stat
    

    Is the statistic BM_StringCreation_Append_my_stat called Append_my_stat and belong to BM_StringCreation? Or is the statistic's name my_stat and belongs to BM_StringCreation_Append?

    A unique identifier solves this ambiguity, paving the way for tools to appropriately group statistic measurements with the right benchmark with a very small, simple change. It also allows tools to use order-independent parsers for json and csv, which is important in tooling languages where the default dictionary types and parsers do not guarantee read-order to be the order of submission.

    Code Submitted

    The code submitted here is adds a few extra utility functions and a few new members. Convention was followed as much as possible, and int was preferred except where convention (such as inside of ConsoleReporter, where the saved width types are of size_t) demonstrated a use for otherwise.

    cla: yes 
    opened by BaaMeow 49
  • C++11 Regular Expressions

    C++11 Regular Expressions

    I would like to get benchmark working on Windows. I see that #29 has done a lot of work in that respect but resulted in #30 about the replacement of the Regex class with std::regex.

    This PR implements a C++11 backend to the Regex class that passes the unit tests. This is more to start a discussion of how to proceed with this branch.

    From what I can see Regex is only used in one place in benchmark.cc so the Regex class could be dropped all together. However this is opposed by @dominichamon

    i actually had some issues when i tried using std::regex for this project as the matching wasn't quite the same. I don't remember the details, i'm afraid, but i'd want much more testing before making this change.

    So I need to make the unit tests much more thorough than what they are now. I'm not sure what strings I should be testing to make sure that the C++11 backend is actually doing the same thing (it uses the same extended matching flags).

    There are other issues here, whilst __cplusplus >= 201103L is the correct check to work out if we are in C++11 mode gcc 4.7 and 4.8 don't actually have <regex> it only came in 4.9. So switching to just using std::regex would result in dropping support for anything before 4.9. So a much better check would be for CMake to compile a snippet to work out if we have regular expressions and fallback to POSIX regular expressions as neccessary.

    Soooooo, where shall we go from here? I'm happy to put in as much effort as needed until benchmark builds from CMake for MinGW. VS should drop out nicely from this as CMake is awesome...but MSVC is crap so who knows what will happen there. Maybe clang to the rescue :wink:

    opened by mattyclarkson 36
  • Openmp compatibility

    Openmp compatibility

    This patch makes Google Benchmark compatible with OpenMP and other user-level thread management. Until now, google benchmark would only report the CPU usage of the master thread if the code being benchmarked used OpenMP or otherwise spawned multiple threads internally.

    This version reports the total process CPU usage if the number of google-benchmark threads is set to <= 1 , but reverts to the existing behaviour otherwise.

    It may actually be preferable to report the total process CPU usage in all cases, but this is sufficient for my needs.

    We have been using google benchmark in our parallel programming class, however, every term students are confused when the CPU time roughly reflects the wall-clock time for parallelized codes doing the same amount of work. This version is also advantageous because it can better demonstrate the overhead of threading, that some tasks take more total CPU time when multi-threaded, and, sometimes, tasks may actually take less overall CPU time.

    If my feature patch cannot be merged, I would like to request that the maintainers implement this. It is very important to us.

    cla: no 
    opened by bryan-lunt 35
  • Make `PauseTiming()` and `ResumeTiming()` per thread.

    Make `PauseTiming()` and `ResumeTiming()` per thread.

    Currently we time benchmarks using a single global timer that tracks per-process CPU usage. Pausing and resuming this timer have to act as a barrier to all threads. This has crippling effects on multi-threaded benchmarks. If you pause every iterator you synchronize the entire benchmark. It's effectively no longer multi-threaded.

    This patch changes to a per-thread timer. Instead of measuring process CPU time we sum thread CPU time and we pause on a per-thread basis.

    Below are comparison of the new and old results from basic_test.cc. Note that the BM_spin_pause_during test get 95% faster.

    Benchmark                                                    Time           CPU
    -------------------------------------------------------------------------------
    BM_empty_mean                                               +0.00         +0.00
    BM_empty/threads:4_mean                                     +0.00         +0.00
    BM_spin_empty/8_mean                                        +0.00         +0.00
    BM_spin_empty/512_mean                                      +0.01         +0.00
    BM_spin_empty/8k_mean                                       -0.00         -0.00
    BM_spin_empty/8/threads:4_mean                              +0.00         -0.17
    BM_spin_empty/512/threads:4_mean                            +0.01         +0.00
    BM_spin_empty/8k/threads:4_mean                             -0.01         +0.00
    BM_spin_pause_before/8_mean                                 +0.00         +0.00
    BM_spin_pause_before/512_mean                               +0.03         +0.02
    BM_spin_pause_before/8k_mean                                +0.01         +0.01
    BM_spin_pause_before/8/threads:4_mean                       +0.00         +0.00
    BM_spin_pause_before/512/threads:4_mean                     +0.04         +0.01
    BM_spin_pause_before/8k/threads:4_mean                      -0.03         -0.00
    BM_spin_pause_during/8_mean                                 -0.24         -0.25
    BM_spin_pause_during/512_mean                               -0.24         -0.24
    BM_spin_pause_during/8k_mean                                -0.13         -0.13
    BM_spin_pause_during/8/threads:4_mean                       -0.97         -0.90
    BM_spin_pause_during/512/threads:4_mean                     -0.96         -0.89
    BM_spin_pause_during/8k/threads:4_mean                      -0.95         -0.85
    BM_pause_during_mean                                        -0.23         -0.20
    BM_pause_during/threads:4_mean                              -0.97         -0.90
    BM_pause_during/real_time_mean                              -0.24         -0.26
    BM_pause_during/real_time/threads:4_mean                    -0.97         -0.90
    BM_spin_pause_after/8_mean                                  +0.00         +0.00
    BM_spin_pause_after/512_mean                                +0.00         +0.00
    BM_spin_pause_after/8k_mean                                 -0.00         -0.00
    BM_spin_pause_after/8/threads:4_mean                        +0.00         +0.00
    BM_spin_pause_after/512/threads:4_mean                      -0.01         -0.02
    BM_spin_pause_after/8k/threads:4_mean                       +0.01         +0.01
    BM_spin_pause_before_and_after/8_mean                       +0.00         +0.00
    BM_spin_pause_before_and_after/512_mean                     +0.00         +0.00
    BM_spin_pause_before_and_after/8k_mean                      +0.01         +0.01
    BM_spin_pause_before_and_after/8/threads:4_mean             +0.00         +0.00
    BM_spin_pause_before_and_after/512/threads:4_mean           +0.06         +0.04
    BM_spin_pause_before_and_after/8k/threads:4_mean            -0.00         +0.02
    BM_empty_stop_start_mean                                    +0.00         +0.00
    BM_empty_stop_start/threads:4_mean                          +0.00         +0.00
    

    There's still work to do on this, but I was hoping for initial feedback on the direction.

    cla: yes 
    opened by EricWF 32
  • [BUG] benchmark's CXX feature check fails to detect POSIX_REGEX | GNU_POSIX_REGEX

    [BUG] benchmark's CXX feature check fails to detect POSIX_REGEX | GNU_POSIX_REGEX

    Describe the bug: Trying to compile LLVM 11 from source with CMake 3.19 and an LLVM 11 toolchain on an x86 Debian-userland node. The LLVM source tree pulls in Google's benchmark library, but configuring the build ends with:

    LLVM FileCheck Found: /usr/lib/llvm-11/bin/FileCheck
     -- Version: 0.0.0
     -- Performing Test HAVE_THREAD_SAFETY_ATTRIBUTES -- failed to compile
     -- Performing Test HAVE_GNU_POSIX_REGEX -- failed to compile
     -- Performing Test HAVE_POSIX_REGEX -- failed to compile
     CMake Warning at utils/benchmark/CMakeLists.txt:244 (message):
       Using std::regex with exceptions disabled is not fully supported
    

    System: which OS, compiler, and compiler version are you using:

    • OS: Linux 5.10 rc5 X64
    • package distro: Debian
    • Compiler and version: llvm11 toolchain
    • Cmake version: 3.19.0
    • libpcre2-dev & libpcre2-posix2 version: 10.34-7

    To reproduce: steps to reproduce the behaviour:

    1. install the llvm 11 toolchain from distro (and all other relevant dependencies for compilation, incl. libpcre2)
    2. git clone the llvm source code
    3. cmake for the llvm source code
    4. See failure

    Expected behaviour: CXXFeatureCheck.cmake should properly detect POSIX_REGEX as provided through libpcre2.

    Additional context: https://pcre.org provides its own native API, as well as a set of wrapper functions that correspond to the POSIX regular expression API.

    The CXX feature check is also not verbose enough to understand what is failing when checking for GNU_POSIX_REGEX | POSIX_REGEX.
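
    For reference, the probe that the feature check compiles and runs for POSIX_REGEX is roughly the following (paraphrased; the real file lives at cmake/posix_regex.cpp in the benchmark source tree). If it fails to compile or returns non-zero, HAVE_POSIX_REGEX stays unset and the build falls back to std::regex:

    #include <regex.h>
    #include <string>

    int main() {
      std::string str = "test0159";
      regex_t re;
      if (regcomp(&re, "^[a-z]+[0-9]+$", REG_EXTENDED | REG_NOSUB) != 0) {
        return -1;  // feature check fails if the pattern cannot be compiled
      }
      const int ret = regexec(&re, str.c_str(), 0, nullptr, 0) ? -1 : 0;
      regfree(&re);
      return ret;
    }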

    opened by ghost 31
  • Add PyPI upload job to wheel building workflow

    Add PyPI upload job to wheel building workflow

    This commit adds a job running after the wheel building job responsible for uploading the built wheels to PyPI. The job only runs on successful completion of all build jobs, and uploads to PyPI using a secret added to the Google Benchmark repo (TBD). Also, the setup-python action has been bumped to the latest version v3.

    TODO:

    • Verify that the credential setup is correct (by trying a wheel building CI job)
    • Make sure the token is added to this repo's GitHub secrets under the name pypi_password (or I can change the name to whatever it is now)
    opened by nicholasjng 26
  • Implementation of random interleaving.

    Implementation of random interleaving.

    See http://github.com/google/benchmark/issues/1051 for the feature request.

    Committer: Hai Huang (http://github.com/haih-g)

    On branch fr-1051. Changes to be committed:

    • modified: include/benchmark/benchmark.h
    • modified: src/benchmark.cc
    • new file: src/benchmark_adjust_repetitions.cc
    • new file: src/benchmark_adjust_repetitions.h
    • modified: src/benchmark_api_internal.cc
    • modified: src/benchmark_api_internal.h
    • modified: src/benchmark_register.cc
    • modified: src/benchmark_runner.cc
    • modified: src/benchmark_runner.h
    • modified: test/CMakeLists.txt
    • new file: test/benchmark_random_interleaving_gtest.cc

    enhancement cla: yes next-release 
    opened by haih-g 26
  • Add support for GTest based unit tests.

    Add support for GTest based unit tests.

    As Dominic and I have previously discussed, there is some need/desire to improve the testing situation in Google Benchmark.

    One step to fixing this problem is to make it easier to write unit tests by adding support for GTest, which is what this patch does.

    By default it looks for an installed version of GTest. However, the user can specify -DBENCHMARK_BUILD_EXTERNAL_GTEST=ON to instead download, build, and use a copy of GTest from source. This is quite useful when Benchmark is being built in non-standard configurations, such as against libc++ or in 32-bit mode.

    cla: yes 
    opened by EricWF 26
  • Introduce a dependency on Abseil

    Introduce a dependency on Abseil

    The main motivation is enabling the benchmark library to play well with hosts that use Abseil, especially for command line flag functionality. This also opens up the possibility of reusing functionality such as string utils, concurrency, or logging.

    This was previously discussed in issue #910. This patch:

    • adds the dependency to Abseil. The library will either be downloaded (at a fixed, specific commit) or the user may specify their own location via the ABSEIL_PATH cmake flag

    • replaces flag functionality with Abseil's.

    cla: yes 
    opened by mtrofin 25
  • Null hypothesis test discrepancy

    Null hypothesis test discrepancy

    https://github.com/google/benchmark/blob/49aa374da96199d64fd3de9673b6f405bbc3de3e/tools/gbench/report.py#L212

    As far as I understand, in the Google Benchmark compare.py script we set:

    • The null hypothesis: the baseline and the contender have the same results.
    • The alternative hypothesis: the contrary, that they are different.

    For a null hypothesis test we expect:

    The null hypothesis to be rejected when the p-value of the contender results against the baseline distribution is less than the alpha value (alpha = 1 - confidence level).

    However, in the source code we determine that the results are different when pvalue >= alpha. This is the reverse of what we are looking for.

    On a similar note, regarding the performance comparison we want a one-sided hypothesis test, since getting less execution time than the average should not be considered a failure.

    I wonder if I am missing something here. I also wonder why mannwhitneyu is used instead of a normal distribution, since mannwhitneyu results are good for small samples rather than a larger N.

    good first issue help wanted 
    opened by vicentebolea 11
  • [FR] Allow benchmarks to report their parameterization structurally to JSON

    [FR] Allow benchmarks to report their parameterization structurally to JSON

    Is your feature request related to a problem? Please describe.

    The benchmark::RegisterBenchmark() API supports adding benchmarks that are parameterized with arbitrary values, including custom names, lambdas, arbitrary objects, etc. The structural details of the parameterization are currently not available in the JSON output. The best the user can do is encode information in the benchmark name, which is a good idea anyway for clear console output. For the JSON output, however, it would be nice if a more data-driven approach were available, such as key/value pairs. This way, analysis scripts need not parse the information out of the benchmark names.

    I have personally come across a few cases where I have registered a benchmark matrix manually in code and had to come up with a benchmark name parsing approach when analyzing the data. This is an uncommon edge case, though, so I agree that any solution here should be simple.

    This is similar to #838, but for the parameters that the benchmark library can't know about, doesn't know how to represent in JSON, etc.

    Describe the solution you'd like

    I am not invested in any particular solution, but this is one idea:

    Question is, what's the end use case? I think in general it is bad to enshrine in API something that is going to be rather obscurely used, but rather instead some generalization should be provided that allows to solve bigger problem. There's already global AddCustomContext(). Perhaps having the same but with different scopes will be sufficient?

    Originally posted by @LebedevRI in https://github.com/google/benchmark/issues/838#issuecomment-1242788747
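
    For reference, the global AddCustomContext() mentioned in the quote already exists in recent releases; a minimal sketch of its use looks roughly like this (the key/value strings here are invented examples):

    #include <benchmark/benchmark.h>

    int main(int argc, char** argv) {
      // Run-wide key/value pairs: they are emitted once in the JSON "context"
      // block, not attached to individual benchmarks.
      benchmark::AddCustomContext("dataset", "sample-corpus");
      benchmark::AddCustomContext("build_flags", "-O2");
      benchmark::Initialize(&argc, argv);
      benchmark::RunSpecifiedBenchmarks();
      benchmark::Shutdown();
      return 0;
    }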

    So perhaps code could look like this (pseudocode):

    for (const auto& param1 : param1_values) {
      for (const auto& param2 : param2_values) {
        std::string name =
            "BM_Thing/" + param1.ToString() + "/" + param2.ToString();
        auto* benchmark = benchmark::RegisterBenchmark(
            name, [param1, param2](benchmark::State& state) {
              BM_MyBenchmark(state, param1.value(), param2.value());
            });
        // Proposed per-benchmark API; it does not exist yet.
        benchmark->AddCustomContext("param1", param1.ToString());
        benchmark->AddCustomContext("param2", param2.ToString());
      }
    }
    

    Some of the fancier BENCHMARK_ macro wrappers around benchmark::RegisterBenchmark() could perhaps be adapted to add their stringified args using this new benchmark->AddCustomContext() API.

    Describe alternatives you've considered

    In the past I thought about URL encoding key/value pairs in the benchmark name. ;-)

    Additional context

    Eventually, with something like this in place perhaps some of the filtering logic in compare.py could optionally match benchmarks by these values, rather than requiring the use of a regex against the name.

    opened by matta 9
  • [windows] Terminal background colour

    [windows] Terminal background colour

    Hello folks, my first issue.

    The function ColorPrintf() assumes that all users of this library have terminals with a black background colour. E.g. using it with Microsoft's STL library and the benchmark-std_copy.exe program, the output looks like this:

    [screenshot STL-1]

    Not very nice IMHO.

    With my blue terminal background I'd like it to look like this:

    [screenshot STL-2]

    which I fixed with:

    --- a/src/colorprint.cc  2022-09-10 14:58:12
    +++ b/src/colorprint.cc 2022-09-10 14:54:32
    @@ -136,13 +136,16 @@
       CONSOLE_SCREEN_BUFFER_INFO buffer_info;
       GetConsoleScreenBufferInfo(stdout_handle, &buffer_info);
       const WORD old_color_attrs = buffer_info.wAttributes;
    +  WORD  new_attr;
    
       // We need to flush the stream buffers into the console before each
       // SetConsoleTextAttribute call lest it affect the text that is already
       // printed but has not yet reached the console.
       fflush(stdout);
    +  new_attr = (buffer_info.wAttributes & ~7);
    +  new_attr &= ~8;    // Since 'wAttributes' could have been hi-intensity at startup.
       SetConsoleTextAttribute(stdout_handle,
    -                          GetPlatformColorCode(color) | FOREGROUND_INTENSITY);
    +                          new_attr | GetPlatformColorCode(color) | FOREGROUND_INTENSITY);
       vprintf(fmt, args);
    
       fflush(stdout);
    

    PS. I use JPsoft's 4NT as the shell.

    System

    • Win-10
    • Compiler and version: MSVC ver. 19.34.31721 for x64
    opened by gvanem 2
  • Support --benchmarks_filter in the compare.py 'benchmarks' command

    Support --benchmarks_filter in the compare.py 'benchmarks' command

    Previously compare.py ignored the --benchmarks_filter argument when loading JSON. This defeated workflows where the benchmark was run once and then several "subset reports" were generated from the saved results with the 'benchmarks' command.

    Concretely this came up with the simple case:

    compare.py benchmarks a.json b.json --benchmarks_filter=BM_Example

    This has no practical impact on the 'filters' and 'benchmarksfiltered' commands, which do their filtering at a later stage.

    Fixes #1484

    opened by matta 3
  • [FR] Support compare.py benchmarks a.json b.json --benchmark_filter=blah

    [FR] Support compare.py benchmarks a.json b.json --benchmark_filter=blah

    Is your feature request related to a problem? Please describe.

    I've got an a.json and a b.json that took a while to collect due to lots of benchmarks and lots of repetitions. I was surprised that

    compare.py benchmarks a.json b.json --benchmark_filter=blah
    

    ignored the --benchmarks_filter= argument and always displayed all the results.

    Describe the solution you'd like

    The "benchmarks" sub-command honors --benchmarks_filter= and reports as if only those benchmarks were run.

    Describe alternatives you've considered

    Post processing the benchmark .json before running compare.py is an inconvenient workaround.

    opened by matta 6
Releases(v1.7.0)
  • v1.7.0(Jul 25, 2022)

    Small release to replace broken v1.6.2 release

    What's Changed

    • Stop generating the export header and just check it in by @dominichamon in https://github.com/google/benchmark/pull/1435
    • use target_compile_definitions by @dominichamon in https://github.com/google/benchmark/pull/1440
    • simplified code by @maochongxin in https://github.com/google/benchmark/pull/1439

    New Contributors

    • @maochongxin made their first contribution in https://github.com/google/benchmark/pull/1439

    Full Changelog: https://github.com/google/benchmark/compare/v1.6.2...v1.7.0

  • v1.6.2(Jul 18, 2022)

    What's Changed

    • Add docs for ThreadRange. by @dominichamon in https://github.com/google/benchmark/pull/1318
    • Add docs on Memory profiling (#1217). by @dominichamon in https://github.com/google/benchmark/pull/1319
    • Suppress GoogleTest warnings on windows (MSVC) too. by @dominichamon in https://github.com/google/benchmark/pull/1320
    • Expand documentation for unpacking arbitrary arguments. by @dominichamon in https://github.com/google/benchmark/pull/1324
    • Refine docs on changing cpufreq governor by @dominichamon in https://github.com/google/benchmark/pull/1325
    • Refine the User Guide CPU Frequency Scaling section by @matta in https://github.com/google/benchmark/pull/1331
    • Fix some errors in Custom Statistics document demo code. by @YuanYingdong in https://github.com/google/benchmark/pull/1332
    • Cache PerfCounters instance in PerfCountersMeasurement by @taoliq in https://github.com/google/benchmark/pull/1308
    • Fix cross compilation for macOS ARM builds in cibuildwheel by @nicholasjng in https://github.com/google/benchmark/pull/1334
    • bump numpy, as per dependabot by @dominichamon in https://github.com/google/benchmark/pull/1336
    • Use Win32 API only for Win32 apps by @batortaller in https://github.com/google/benchmark/pull/1333
    • Add mutex when reading counters_ (Fixes #1335) by @taoliq in https://github.com/google/benchmark/pull/1338
    • Avoid potential truncation issues for the integral type parameterized tests. by @staffantj in https://github.com/google/benchmark/pull/1341
    • Expose default display reporter creation in public API by @dominichamon in https://github.com/google/benchmark/pull/1344
    • explicitly export public symbols by @sergiud in https://github.com/google/benchmark/pull/1321
    • Check for macro existence before using by @oontvoo in https://github.com/google/benchmark/pull/1347
    • simplify reference to internal path by @dominichamon in https://github.com/google/benchmark/pull/1349
    • Introduce the possibility to customize the help printer function by @vincenzopalazzo in https://github.com/google/benchmark/pull/1342
    • move bzl file out of tools by @dominichamon in https://github.com/google/benchmark/pull/1352
    • resolve case sensitivity issues with WORKSPACE and workspace by @dominichamon in https://github.com/google/benchmark/pull/1354
    • Make generate_export_header.bzl work for Windows. by @junyer in https://github.com/google/benchmark/pull/1355
    • @platforms is magical; remove it from WORKSPACE. by @junyer in https://github.com/google/benchmark/pull/1356
    • restore BENCHMARK_MAIN() by @sergiud in https://github.com/google/benchmark/pull/1357
    • Allow setting the default time unit globally by @batortaller in https://github.com/google/benchmark/pull/1337
    • Add long description and content type for proper PyPI presentation by @nicholasjng in https://github.com/google/benchmark/pull/1361
    • Add SetBenchmarkFilter() to set --benchmark_filter flag value in user code by @oontvoo in https://github.com/google/benchmark/pull/1362
    • Appended additional BSD 3-Clause to LICENSE by @oontvoo in https://github.com/google/benchmark/pull/1363
    • Add PyPI upload job to wheel building workflow by @nicholasjng in https://github.com/google/benchmark/pull/1359
    • Fix float comparison and add float comparison warning by @bensuperpc in https://github.com/google/benchmark/pull/1368
    • Update LICENSE file to clearly state which file needs BSD 3 by @oontvoo in https://github.com/google/benchmark/pull/1366
    • Add BENCHMARK_STATIC_DEFINE to the Python bindings' cc_binary local… by @nicholasjng in https://github.com/google/benchmark/pull/1369
    • Remove conditional trigger from PyPI upload job by @nicholasjng in https://github.com/google/benchmark/pull/1370
    • Change artifact download name to dist to match upload name by @nicholasjng in https://github.com/google/benchmark/pull/1371
    • Build //:benchmark as a static library only. by @junyer in https://github.com/google/benchmark/pull/1373
    • Fix Bazel build breakage caused by commit 6a894bd. by @junyer in https://github.com/google/benchmark/pull/1374
    • [nfc] Reformat doc-string in generate_export_header by @oontvoo in https://github.com/google/benchmark/pull/1376
    • Updates for inclusive language by @messerb5467 in https://github.com/google/benchmark/pull/1360
    • getting sysinfo in line with Google style by @dominichamon in https://github.com/google/benchmark/pull/1381
    • Small optimization to counter map management by @dominichamon in https://github.com/google/benchmark/pull/1382
    • Shut down Bazel gracefully and revert wheel build strategy to job matrix by @nicholasjng in https://github.com/google/benchmark/pull/1383
    • Fix wheel job name for PyPI uploads by @nicholasjng in https://github.com/google/benchmark/pull/1384
    • Filter out benchmarks that start with "DISABLED_" by @dominichamon in https://github.com/google/benchmark/pull/1387
    • Add benchmark labels to the output of the comparison tool by @dominichamon in https://github.com/google/benchmark/pull/1388
    • Enable -Wconversion by @dominichamon in https://github.com/google/benchmark/pull/1390
    • Add installation and build instructions for Python bindings by @nicholasjng in https://github.com/google/benchmark/pull/1392
    • fix some typos by @cuishuang in https://github.com/google/benchmark/pull/1393
    • Add option to get the verbosity provided by commandline flag -v (#1330) by @Matthdonau in https://github.com/google/benchmark/pull/1397
    • Add support to get clock for new architecture CSKY by @zixuan-wu in https://github.com/google/benchmark/pull/1400
    • Introduce warmup phase to BenchmarkRunner (#1130) by @Matthdonau in https://github.com/google/benchmark/pull/1399
    • Report large numbers in scientific notation in console reporter (#1303) by @Matthdonau in https://github.com/google/benchmark/pull/1402
    • add multiple OSes to bazel workflow by @dominichamon in https://github.com/google/benchmark/pull/1412
    • Add possibility to ask for libbenchmark version number (#1004) by @Matthdonau in https://github.com/google/benchmark/pull/1403
    • Fix DoNotOptimize() GCC copy overhead (#1340) by @alexgpg in https://github.com/google/benchmark/pull/1410
    • Clarify that the cpu frequency is not used for benchmark timings. by @dominichamon in https://github.com/google/benchmark/pull/1414
    • Revert "Add possibility to ask for libbenchmark version number (#1004)" by @dominichamon in https://github.com/google/benchmark/pull/1417
    • Remove redundant formatting tags by @tomcobley in https://github.com/google/benchmark/pull/1420
    • Fix DoNotOptimize() GCC compile error with some types (#1340) by @alexgpg in https://github.com/google/benchmark/pull/1424
    • Expose default help printer function by @yurikhan in https://github.com/google/benchmark/pull/1425
    • fix sanitizer builds by using clang 13 by @dominichamon in https://github.com/google/benchmark/pull/1426
    • Suppress nvcc offsetof warning by @cz4rs in https://github.com/google/benchmark/pull/1429
    • Expose google_benchmark.State for python bindings. by @rmcilroy in https://github.com/google/benchmark/pull/1430

    New Contributors

    • @YuanYingdong made their first contribution in https://github.com/google/benchmark/pull/1332
    • @taoliq made their first contribution in https://github.com/google/benchmark/pull/1308
    • @batortaller made their first contribution in https://github.com/google/benchmark/pull/1333
    • @vincenzopalazzo made their first contribution in https://github.com/google/benchmark/pull/1342
    • @messerb5467 made their first contribution in https://github.com/google/benchmark/pull/1360
    • @cuishuang made their first contribution in https://github.com/google/benchmark/pull/1393
    • @Matthdonau made their first contribution in https://github.com/google/benchmark/pull/1397
    • @zixuan-wu made their first contribution in https://github.com/google/benchmark/pull/1400
    • @alexgpg made their first contribution in https://github.com/google/benchmark/pull/1410
    • @tomcobley made their first contribution in https://github.com/google/benchmark/pull/1420
    • @yurikhan made their first contribution in https://github.com/google/benchmark/pull/1425
    • @cz4rs made their first contribution in https://github.com/google/benchmark/pull/1429
    • @rmcilroy made their first contribution in https://github.com/google/benchmark/pull/1430

    Full Changelog: https://github.com/google/benchmark/compare/v1.6.1...v1.6.2

  • v1.6.1(Jan 10, 2022)

    What's Changed

    Fixes

    • Remove unused parameter from lambda. by @dominichamon in https://github.com/google/benchmark/pull/1223
    • Optimized docs installation by @xvitaly in https://github.com/google/benchmark/pull/1225
    • Fix mention of --benchmarks in comment by @oontvoo in https://github.com/google/benchmark/pull/1229
    • cmake: eliminate redundant target_include_directories by @sergiud in https://github.com/google/benchmark/pull/1242
    • Cmake: options for controlling werror, disable werror for PGI compilers by @PhilipDeegan in https://github.com/google/benchmark/pull/1246
    • Fix -Wdeprecated-declarations warning triggered by clang-cl. by @bc-lee in https://github.com/google/benchmark/pull/1245
    • cmake: make package config relocatable by @sergiud in https://github.com/google/benchmark/pull/1244
    • cmake: allow to use package config from build directory by @sergiud in https://github.com/google/benchmark/pull/1240
    • Fix -Wdeprecated-declarations warning once more. by @bc-lee in https://github.com/google/benchmark/pull/1256
    • Fix un-initted error in test and fix change the API previously proposed to use std::string instead of raw char* by @oontvoo in https://github.com/google/benchmark/pull/1266
    • [cleanup] Change == "" to .empty() on string to avoid clang-tidy warnings by @oontvoo in https://github.com/google/benchmark/pull/1271
    • Fix errorWshorten-64-to-32 with clang 12.0 by @bensuperpc in https://github.com/google/benchmark/pull/1273
    • Fix error with Fix Werror=old-style-cast by @bensuperpc in https://github.com/google/benchmark/pull/1272
    • Fixed typo in doc: s/marcro/macro by @oontvoo in https://github.com/google/benchmark/pull/1274
    • Fix warning with MacOS by @bensuperpc in https://github.com/google/benchmark/pull/1276
    • clang-format Google on {src/,include/} by @dominichamon in https://github.com/google/benchmark/pull/1280
    • format tests with clang-format by @dominichamon in https://github.com/google/benchmark/pull/1282
    • check clang format on pull requests and merges by @dominichamon in https://github.com/google/benchmark/pull/1281
    • Fix dependency typo and unpin cibuildwheel version in wheel building … by @nicholasjng in https://github.com/google/benchmark/pull/1263
    • disable lint check where we know it'd fail by @oontvoo in https://github.com/google/benchmark/pull/1286
    • Disable clang-tidy (unused-using-decls) by @oontvoo in https://github.com/google/benchmark/pull/1287
    • Add clang-tidy check by @dominc8 in https://github.com/google/benchmark/pull/1290
    • Fix broken link to Setup/Teardown section by @Krzmbrzl in https://github.com/google/benchmark/pull/1291
    • Update user_guide.md: thread_index should be thread_index() by @ShawnZhong in https://github.com/google/benchmark/pull/1296
    • clang-tidy: readability-redundant and performance by @dominc8 in https://github.com/google/benchmark/pull/1298
    • update googletest to latest release tag 1.11.0 by @dominichamon in https://github.com/google/benchmark/pull/1301
    • Avoid errors due to "default label in switch which covers all enumeration values" in Windows codepath by @mstorsjo in https://github.com/google/benchmark/pull/1302
    • Fix -DBENCHMARK_ENABLE_INSTALL=OFF build (Fixes #1275) by @LebedevRI in https://github.com/google/benchmark/pull/1305
    • Address c4267 warning on MSVC by @staffantj in https://github.com/google/benchmark/pull/1315
    • Destructor not returning is expected in some cases by @staffantj in https://github.com/google/benchmark/pull/1316

    Features

    • Added support of packaged GTest for running unit tests by @xvitaly in https://github.com/google/benchmark/pull/1226
    • Introduce additional memory metrics by @oontvoo in https://github.com/google/benchmark/pull/1238
    • Added Doxygen support by @xvitaly in https://github.com/google/benchmark/pull/1228
    • Allow template arguments to be specified directly on the BENCHMARK macro by @oontvoo in https://github.com/google/benchmark/pull/1262
    • [RFC] Adding API for setting/getting benchmark_filter flag? by @oontvoo in https://github.com/google/benchmark/pull/1254
    • use docker container for ubuntu-16.04 builds by @dominichamon in https://github.com/google/benchmark/pull/1265
    • Support for building with LLVM clang-10/clang-11 on Windows. by @alisenai in https://github.com/google/benchmark/pull/1227
    • Add Setup/Teardown option on Benchmark. by @oontvoo in https://github.com/google/benchmark/pull/1269
    • compare.py: compute and print 'OVERALL GEOMEAN' aggregate by @LebedevRI in https://github.com/google/benchmark/pull/1289

    New Contributors

    • @PhilipDeegan made their first contribution in https://github.com/google/benchmark/pull/1246
    • @bc-lee made their first contribution in https://github.com/google/benchmark/pull/1245
    • @bensuperpc made their first contribution in https://github.com/google/benchmark/pull/1273
    • @alisenai made their first contribution in https://github.com/google/benchmark/pull/1227
    • @rHermes made their first contribution in https://github.com/google/benchmark/pull/1283
    • @dondenton made their first contribution in https://github.com/google/benchmark/pull/1285
    • @dominc8 made their first contribution in https://github.com/google/benchmark/pull/1290
    • @Krzmbrzl made their first contribution in https://github.com/google/benchmark/pull/1291
    • @ShawnZhong made their first contribution in https://github.com/google/benchmark/pull/1296
    • @staffantj made their first contribution in https://github.com/google/benchmark/pull/1315

    Full Changelog: https://github.com/google/benchmark/compare/v1.6.0...v1.6.1

  • v1.6.0(Sep 7, 2021)

    features

    • [breaking change] introduce accessors for public data members (#1208)
    • add support for percentage units in statistics (#1219)
    • introduce coefficient of variation aggregate (#1220)
    • format percentages in console reporter (#1221)

    bugfixes

    • fix unreachable code warning (#1214)
    • replace #warning with #pragma message (#1216)
    • report PFM as found when it is
    • update u-test value expectations due to scipy upgrade

    other stuff

    • refactored documentation to minimise README.md (#1211)
    • install docs when installing library (#1212)
  • v1.5.6(Aug 11, 2021)

    features

    • helper methods to create integer lists (#1179)
    • default of --benchmark_filter is now empty, rather than "." (#1207)

    fixes

    • type warning (#1193)
    • returning a reference when callers want pointers (65dc63b)

    cleanup

    • remove dead code from PredictNumItersNeeded (#1206)
    • fix clang-tidy warnings (#1195) and typos (#1194)
    • prefix macros (#1186) and flags (#1187, #1185) to avoid name clashes
    • downgrade warnings for googletest (twice) (#1203, 560b0834, ee726a7)
  • v1.5.5(Jun 11, 2021)

    new features

    • Add support for new architecture loongarch (#1173)
    • Fixed version of random interleaving of benchmark repetitions (#1163, fixing #1051)
    • Easier comparison of results across families (#1168 #1166 #1165 #1164)

    fixes

    • Fix perf counter argument parsing (#1160)

    internal cleanup

    • Drop warning to satisfy clang's -Wunused-but-set-variable diag (#1174)
    • Enable some sanitizer builds in github actions (#1167 #1171)
    • Fix memory leak in test (#1169)
  • v1.5.4(May 30, 2021)

    new features

    • better versioning in releases [#1047]
    • MSVC arm64 support [#1090]
    • add support for hardware performance counters [#1114, #1153]
    • add interface for custom context to be included [#1127, #1137]
    • random interleaving to reduce noise [#1105]

    compiler cleanliness

    • support -Wsuggest-override [#1059]
    • builds correctly with gcc-11 [#1060]
    • fix some windows warnings [#1121]
    • fix -Wreserved-identifier failures [#1143]
    • fix pedantic warnings [#1156]
  • v1.5.3(Apr 23, 2021)

    New features

    • Implement custom benchmark names (#1107)
    • Support for macro expansion in benchmark names (#1054)
    • Reduce ramp up repetitions when KeepRunningBatch is used (#1113)

    Platform support

    • CycleTimer implemented for M68K architecture (#1050)
    • Support for DragonFly BSD (#1058)
    • Better support for z/OS (#1063, #1067)
    • Add MSVC ARM64 support for cycle clocks (#1052)
    • Add support for Elbrus 2000 (#1091)
    • Fix CPU frequency for AMD Ryzen (and probably other CPUs) (#1117)

    Bug fixes

    • Fix range when starting at zero (#1073)

    Tool improvements

    • Build tools with bazel (#982)
    • Support JSON dumps of benchmark diffs (#1042)
  • v1.5.2(Sep 11, 2020)

    • Timestamps in output are now rfc3339-formatted #965
    • overflow warnings with timers fixed #980
    • Python dependencies are now covered by a requirements.txt #994
    • JSON output cleaned up when no CPU scaling is present (#1008)
    • CartesianProduct added for easier settings of multiple ranges (#1029)
    • Python bindings improvements:
      • Custom main functions (#993)
      • A rename to google_benchmark (#199
      • More state methods bound (#1037) with a builder interface (#1040)
    • Workflow additions in github include pylint (#1039) and bindings runs (#1041)
  • v1.5.1(Jun 9, 2020)

    • Python bindings are now available in //bindings/python
    • Upgraded bazel from 0.10.1 to 3.2.0 (long overdue)
    • RISC-V and PPC cycleclock fixes
    • Various build warnings and cmake issues resolved
    • Documentation improvements
  • v1.5.0(May 28, 2019)

    • Bump CMake minimum version to 3.5.1 (see dependencies.md) (#801)
    • Add threads and repetitions to the JSON output (#748)
    • Memory management and reporting hooks (#625)
    • Documentation improvements
    • Miscellaneous build fixes (Mostly Intel compiler and Android)
  • v1.4.1(May 25, 2018)

    Bug-fix release on v1.4.

    • Realign expectation that State::iterations() returns 0 before the main benchmark loop begins. (#598)
    • CMake error message fixes (#595, #584)
    • Emscripten check fix (#583)
    • OpenBSD porting (#582)
    • Windows bazel fixes (#581)
    • Bazel pthread linking (#579)
    • Negative regexes (#576)
    • gmock fix (#564)
  • v1.4.0(Apr 4, 2018)

    • Removal of deprecated headers
    • Improved CPU cache info reporting (#486)
    • Support State::KeepRunningBatch() (#521)
    • Support int64_t for AddRange()
    • New platform support: NetBSD, s390x, Solaris
    • Bazel build support
    • Support googletest unit tests
    • Add assembler tests
    • Various warnings fixed
  • v1.3.0(Nov 3, 2017)

    Highlights

    • Ranged for loop optimization! (#454, #460)
    • Make installation optional (#463)
    • Better stats including user-provided ones (#428)
    • JSON reporter format fixes (#426, #431)
    • Documentation improvements (#445, #433, #466)
  • v1.2.0(Jul 21, 2017)

    Highlights

    • User-defined counters
    • Single header library
    • Ability to clear benchmarks so the runtime registration is more flexible
    • Sample-based standard deviation
    • 32-bit build enabled
    • Bug fixes
  • v1.1.0(Nov 4, 2016)

    Highlights

    • ArgNames support
    • Fixes for OSX and Cygwin and MSVC builds
    • PauseTiming and ResumeTiming are per thread (#286)
    • Better Range and Arg specifications
    • Complexity reporting
  • v1.0.0(Jan 15, 2016)

    Significant changes since v0.1.0:

    • cmake 2.8.11 required
    • stricter compiler warnings
    • LTO support
    • fixtures
    • documentation fixes
    • CSV output
    • better windows support
    • minor fixes