Commits · da9ec3dfca2a1266c4f35cc3d340f094cd71487a · Chen Yisong / benchmark

09 Jul, 2018 3 commits

Include system load average in console and JSON reports · da9ec3df

authored Jul 09, 2018

High system load can skew benchmark results. By including system load averages
in the library's output, we help users identify a potential issue in the
quality of their measurements, and thus assist them in producing better (more
reproducible) results.

I got the idea for this from Brendan Gregg's checklist for benchmark accuracy
(http://www.brendangregg.com/blog/2018-06-30/benchmarking-checklist.html).
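
As a rough illustration of the idea (not the library's own code, which is C++), the sketch below reads the load averages with Python's `os.getloadavg()` and flags a run as suspect when the 1-minute load per CPU exceeds a threshold. The helper names and the threshold of 1.0 are assumptions made for this example.

```python
import os


def load_averages():
    """Return the 1, 5 and 15 minute load averages, or [] if unavailable."""
    try:
        return list(os.getloadavg())
    except (AttributeError, OSError):
        # os.getloadavg() is missing on some platforms and may fail on others.
        return []


def looks_loaded(load_avg, cpu_count, threshold=1.0):
    """Hypothetical heuristic: is the 1-minute load per CPU above threshold?"""
    if not load_avg or cpu_count <= 0:
        return False
    return load_avg[0] / cpu_count > threshold


if __name__ == "__main__":
    avg = load_averages()
    print("load_avg:", avg, "suspect:", looks_loaded(avg, os.cpu_count() or 1))
```

Reporting the raw numbers next to the results, as this commit does, leaves the judgment to the user; the threshold check is only one possible way to act on them.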


Update AUTHORS and CONTRIBUTORS (#632) · 1f35fa4a
Federico Ficarelli authored Jul 09, 2018
```
Adding myself to AUTHORS and CONTRIBUTORS according to guidelines.
```

Fix build with Intel compiler (#631) · 0c21bc36

authored Jul 09, 2018

* Set -Wno-deprecated-declarations for Intel

Intel compiler silently ignores -Wno-deprecated-declarations
so warning no. 1786 must be explicitly suppressed.

* Make std::int64_t → double casts explicit

While std::int64_t → double is a perfectly conformant
implicit conversion, Intel compiler warns about it.
Make them explicit via static_cast<double>.

* Make std::int64_t → int casts explicit

Intel compiler warns about emplacing an std::int64_t
into an int container. Just make the conversion explicit
via static_cast<int>.

* Cleanup Intel -Wno-deprecated-declarations workaround logic


03 Jul, 2018 1 commit
- Disable Intel invalid offsetof warning (#629) · 5946795e
  Federico Ficarelli authored Jul 03, 2018
  
28 Jun, 2018 1 commit
- fixed Google Test (Primer) Documentation link (#628) · 847c0069
  Yoshinari Takaoka authored Jun 28, 2018
  
27 Jun, 2018 2 commits

Add Iteration-related Counter::Flags. Fixes #618 (#621) · b123abdc

authored Jun 27, 2018

Inspired by these [two](https://github.com/darktable-org/rawspeed/commit/a1ebe07bea5738f8607b48a7596c172be249590e) [bugs](https://github.com/darktable-org/rawspeed/commit/0891555be56b24f9f4af716604cedfa0da1efc6b) that I found and fixed in my code, both caused by the lack of these flags:
* `kIsIterationInvariant` - `* state.iterations()`
  The value is constant for every iteration, and needs to be **multiplied** by the iteration count.
* `kAvgIterations` - `/ state.iterations()`
  The value is global over all the iterations, and needs to be **divided** by the iteration count.

They play nice with `kIsRate`:
* `kIsIterationInvariantRate`
* `kAvgIterationsRate`.

I'm not sure how meaningful they are when combined with `kAvgThreads`.
I guess a `kIsThreadInvariant` could be added, too, for symmetry with `kAvgThreads`.
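
The arithmetic behind these flags can be sketched in a few lines. This is an illustrative Python model, not the library's C++ implementation: the `Flags` members are hypothetical stand-ins for the `Counter::Flags` values named above, and the order in which the adjustments are applied is an assumption.

```python
import enum


class Flags(enum.Flag):
    """Hypothetical Python stand-ins for the Counter::Flags named above."""
    NONE = 0
    IS_RATE = 1
    AVG_THREADS = 2
    IS_ITERATION_INVARIANT = 4
    AVG_ITERATIONS = 8


def finish(value, flags, iterations, cpu_time=1.0, threads=1):
    """Apply each flag's adjustment to a raw counter value."""
    if Flags.IS_RATE in flags:
        value /= cpu_time        # report per second
    if Flags.AVG_THREADS in flags:
        value /= threads         # report per thread
    if Flags.IS_ITERATION_INVARIANT in flags:
        value *= iterations      # per-iteration constant -> total
    if Flags.AVG_ITERATIONS in flags:
        value /= iterations      # total -> per-iteration average
    return value
```

Combining `IS_ITERATION_INVARIANT` with `IS_RATE` in this model corresponds to `kIsIterationInvariantRate`, and `AVG_ITERATIONS` with `IS_RATE` to `kAvgIterationsRate`.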


Use EXPECT_DOUBLE_EQ when comparing doubles in tests. (#624) · d8584bda
Dominic Hamon authored Jun 27, 2018
```
* Use EXPECT_DOUBLE_EQ when comparing doubles in tests.

Fixes #623

* disable 'float-equal' warning
```

18 Jun, 2018 1 commit

[Tooling] Enable U Test by default, add tooltip about repetition count. (#617) · 7d03f2df

authored Jun 18, 2018

As previously discussed, let's flip the switch ^^.

This exposes the problem that it will now be run
for everyone, even if one did not read the help
about the recommended repetition count.

This is not good. So I think we can do the smart thing:
```
$ ./compare.py benchmarks gbench/Inputs/test3_run{0,1}.json
Comparing gbench/Inputs/test3_run0.json to gbench/Inputs/test3_run1.json
Benchmark                   Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------
BM_One                   -0.1000         +0.1000            10             9           100           110
BM_Two                   +0.1111         -0.0111             9            10            90            89
BM_Two                   +0.2500         +0.1125             8            10            80            89
BM_Two_pvalue             0.2207          0.6831      U Test, Repetitions: 2. WARNING: Results unreliable! 9+ repetitions recommended.
BM_Two_stat              +0.0000         +0.0000             8             8            80            80
```
(old screenshot)
![image](https://user-images.githubusercontent.com/88600/41502182-ea25d872-71bc-11e8-9842-8aa049509b14.png)

Or, in the good case (noise omitted):
```
$ ./compare.py benchmarks /tmp/run{0,1}.json
Comparing /tmp/run0.json to /tmp/run1.json
Benchmark                                            Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------
<99 more rows like this>
./_T012014.RW2/threads:8/real_time                +0.0160         +0.0596            46            47            10            10
./_T012014.RW2/threads:8/real_time_pvalue          0.0000          0.0000      U Test, Repetitions: 100
./_T012014.RW2/threads:8/real_time_mean           +0.0094         +0.0609            46            47            10            10
./_T012014.RW2/threads:8/real_time_median         +0.0104         +0.0613            46            46            10            10
./_T012014.RW2/threads:8/real_time_stddev         -0.1160         -0.1807             1             1             0             0
```
(old screenshot)
![image](https://user-images.githubusercontent.com/88600/41502185-fb8193f4-71bc-11e8-85fa-cbba83e39db4.png)
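
The note on the `_pvalue` rows above could be produced by logic along these lines. This is a minimal sketch, assuming the recommended minimum of 9 repetitions quoted in the warning; the function and constant names are hypothetical and not taken from `compare.py`.

```python
# Recommended minimum quoted in the warning text of the example output.
UTEST_OPTIMAL_REPETITIONS = 9


def utest_note(repetitions):
    """Build the text shown on a `_pvalue` row for a given repetition count."""
    note = "U Test, Repetitions: {}".format(repetitions)
    if repetitions < UTEST_OPTIMAL_REPETITIONS:
        note += ". WARNING: Results unreliable! {}+ repetitions recommended.".format(
            UTEST_OPTIMAL_REPETITIONS)
    return note


if __name__ == "__main__":
    print(utest_note(2))
    print(utest_note(100))
```

Attaching the warning to the row itself means that users who never read the help text still see the recommendation.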


07 Jun, 2018 1 commit
- Disable deprecation warnings when -Werror is enabled. (#609) · 151ead62
  Dominic Hamon authored Jun 07, 2018
```
Fixes #608
```
06 Jun, 2018 1 commit
- Avoid using CMake 3.6 feature list(FILTER ...) (#612) · 505be96a
  Marat Dukhan authored Jun 06, 2018
```
list(FILTER ...) is a CMake 3.6 feature, but benchmark targets CMake 2.8.12
```
05 Jun, 2018 2 commits

cmake: use numeric version in package config (#611) · 1301f53e
Sergiu Deitsch authored Jun 05, 2018


Fix compilation on Android with GNU STL (#596) · 7fb3c564

authored Jun 05, 2018

* Fix compilation on Android with GNU STL

GNU STL in Android NDK lacks string conversion functions from C++11, including std::stoul, std::stoi, and std::stod.
This patch reimplements these functions in benchmark:: namespace using C-style equivalents from C++03.

* Avoid use of log2 which doesn't exist in Android GNU STL

GNU STL in Android NDK lacks the log2 function from C99/C++11.
This patch replaces its uses in the code with the double log(double) function.
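
The replacement relies on the change-of-base identity log2(x) = ln(x) / ln(2). The patch itself is C++; the Python sketch below only illustrates the identity, and the function name is hypothetical.

```python
import math


def log2_via_log(x):
    """Compute log2(x) using only the natural logarithm."""
    return math.log(x) / math.log(2.0)


if __name__ == "__main__":
    for value in (1.0, 8.0, 1024.0):
        print(value, log2_via_log(value))
```

Because of floating-point rounding, the result may differ from an exact power-of-two exponent in the last bits, so comparisons should allow a small tolerance.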


01 Jun, 2018 1 commit

(clang-)format all the things (#610) · 4c2af078

authored Jun 01, 2018

* format all documents according to contributor guidelines and specifications
use clang-format on/off to stop formatting when it makes excessively poor decisions

* format all tests as well, and mark blocks which change too much


30 May, 2018 1 commit
- Some platforms and environments don't pass a valid argc/argv. (#607) · 4fbfa2f3
  Dominic Hamon authored May 30, 2018
```
Specifically some iOS targets.
```
29 May, 2018 6 commits

clang-format run on the benchmark header (#606) · d07372e6
Dominic Hamon authored May 29, 2018


Deprecate CSVReporter - A first step to overhauling reporting. (#488) · 7b8d0249

authored May 29, 2018

As @dominichamon and I have discussed, the current reporter interface
is poor at best. And something should be done to fix it.

I strongly suspect such a fix will require an entire reimagining
of the API, and therefore breaking backwards compatibility fully.

For that reason we should start deprecating and removing parts
that we don't intend to replace. One of these parts, I argue,
is the CSVReporter. I propose that the new reporter interface
should choose a single output format (JSON) and traffic entirely
in that. If somebody really wanted to replace the functionality
of the CSVReporter they would do so as an external tool which
transforms the JSON.

For these reasons I propose deprecating the CSVReporter.
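
Such an external tool could be quite small. The sketch below converts a JSON report into CSV with the Python standard library; it assumes the report has a top-level `benchmarks` array whose entries carry the listed fields, and the column selection and function name are choices made for this example.

```python
import csv
import io
import json

# Columns to export; assumed to be present in each entry of "benchmarks".
FIELDS = ["name", "iterations", "real_time", "cpu_time", "time_unit"]


def json_to_csv(json_text):
    """Convert a benchmark JSON report (as text) into CSV text."""
    report = json.loads(json_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore",
                            restval="", lineterminator="\n")
    writer.writeheader()
    for row in report.get("benchmarks", []):
        writer.writerow(row)
    return out.getvalue()


# Made-up sample data, used only to exercise the function.
SAMPLE = """
{"context": {"num_cpus": 4},
 "benchmarks": [
   {"name": "BM_One", "iterations": 1000, "real_time": 10,
    "cpu_time": 100, "time_unit": "ns"},
   {"name": "BM_Two", "iterations": 1000, "real_time": 9,
    "cpu_time": 90, "time_unit": "ns"}]}
"""

if __name__ == "__main__":
    print(json_to_csv(SAMPLE), end="")
```

Keeping the conversion outside the library means the column set can change without touching the reporter interface.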


cleaner and slightly larger statistics tests (#604) · 16703ff8
Dominic Hamon authored May 29, 2018

Add some 'travis_wait' commands to avoid gcc@7 installation timeouts. (#605) · c8adf453
Dominic Hamon authored May 29, 2018


Benchmarking is hard. Making sense of the benchmarking results is even harder. (#593) · a6a1b0d7

authored May 29, 2018

The first problem you have to solve yourself. The second one can be aided.
The benchmark library can compute some statistics over the repetitions,
which helps with grasping the results somewhat.

But that is only for the one set of results. It does not really help to compare
the two benchmark results, which is the interesting bit. Thankfully, there are
these bundled `tools/compare.py` and `tools/compare_bench.py` scripts.

They can provide a diff between two benchmarking results. Yay!
Except not really: it's just a diff. While it is very informative and better than
nothing, it does not really help answer The Question: am I just looking at the noise?
It's like not having these per-benchmark statistics...

Roughly, we can formulate the question as:
> Are these two benchmarks the same?
> Did my change actually change anything, or is the difference below the noise level?

Well, this really sounds like a [null hypothesis](https://en.wikipedia.org/wiki/Null_hypothesis), does it not?
So maybe we can use statistics here, and solve all our problems?
lol, no, it won't solve all the problems. But maybe it will act as a tool,
to better understand the output, just like the usual statistics on the repetitions...

I'm making an assumption here that most people care about the change
in the average value, not the standard deviation. Thus I believe we can use a t-test,
either [Student's t-test](https://en.wikipedia.org/wiki/Student%27s_t-test) or [Welch's t-test](https://en.wikipedia.org/wiki/Welch%27s_t-test).
**EDIT**: however, after @dominichamon's review, it was decided that it is better
to use the more robust [Mann–Whitney U test](https://en.wikipedia.org/wiki/Mann–Whitney_U_test).
I'm using [scipy.stats.mannwhitneyu](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html#scipy.stats.mannwhitneyu).
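
To make the test less of a black box, here is a pure-Python sketch of the U statistic with a two-sided p-value from the normal approximation. It is not what the tool runs (that is `scipy.stats.mannwhitneyu`), it applies no tie or continuity correction, and the function name is hypothetical.

```python
import math


def mann_whitney_u(xs, ys):
    """Return (U, two-sided p-value) using the plain normal approximation."""
    n1, n2 = len(xs), len(ys)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Count pairs where x beats y; a tie counts as half a win.
    u1 = sum((x > y) + 0.5 * (x == y) for x in xs for y in ys)
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    # Mean and standard deviation of U under the null hypothesis.
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    # Two-sided tail probability of a standard normal at |z|.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p


if __name__ == "__main__":
    old = [46.1, 46.3, 45.9, 46.0, 46.2, 46.4, 45.8, 46.1, 46.0]  # made-up timings
    new = [47.0, 47.2, 46.9, 47.1, 47.3, 46.8, 47.0, 47.4, 47.1]
    print(mann_whitney_u(old, new))
```

The normal approximation is only reasonable for larger samples, which is one way to see why a handful of repetitions gives unreliable p-values.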

There are two new user-facing knobs:
```
$ ./compare.py --help
usage: compare.py [-h] [-u] [--alpha UTEST_ALPHA]
                  {benchmarks,filters,benchmarksfiltered} ...

versatile benchmark output compare tool
<...>
optional arguments:
  -h, --help            show this help message and exit

  -u, --utest           Do a two-tailed Mann-Whitney U test with the null
                        hypothesis that it is equally likely that a randomly
                        selected value from one sample will be less than or
                        greater than a randomly selected value from a second
                        sample. WARNING: requires **LARGE** (9 or more)
                        number of repetitions to be meaningful!
  --alpha UTEST_ALPHA   significance level alpha. if the calculated p-value is
                        below this value, then the result is said to be
                        statistically significant and the null hypothesis is
                        rejected. (default: 0.0500)
```
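
For readers unfamiliar with `argparse`, the two knobs could be declared roughly as follows. This is a reconstruction from the help text above, not a copy of `compare.py`: the subcommands are omitted, the help strings are shortened, and the `utest_alpha` destination is inferred from the `UTEST_ALPHA` metavar.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the two knobs from the help text."""
    parser = argparse.ArgumentParser(
        description="versatile benchmark output compare tool")
    parser.add_argument(
        "-u", "--utest", action="store_true",
        help="Do a two-tailed Mann-Whitney U test")
    parser.add_argument(
        "--alpha", dest="utest_alpha", type=float, default=0.05,
        help="significance level alpha (default: %(default)0.4f)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-u", "--alpha", "0.01"])
    print(args.utest, args.utest_alpha)
```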

Example output:
![screenshot_20180512_175517](https://user-images.githubusercontent.com/88600/39958581-ae897924-560d-11e8-81b9-806db6c3e691.png)
As you can guess, the alpha does not affect anything but the coloring of the computed p-values.
If it is green, then the change in the average values is statistically significant.

I'm detecting the repetitions by matching name. This way, no changes to the json are _needed_.
Caveats:
* This won't work if the json is not in the same order as output by the benchmark,
or if the parsing does not retain the ordering.
* This won't work if after the grouped repetitions there isn't at least one row with
a different name (e.g. a statistic). Since there isn't a knob to disable printing of statistics
(only the other way around), I'm not too worried about this.
* **The results will be wrong if the repetition count is different between the two benchmarks being compared.**
* Even though I have added (hopefully full) test coverage, the code of these python tools is starting
to look a bit jumbled.
* So far I have added this only to the `tools/compare.py`.
Should I add it to `tools/compare_bench.py` too?
Or should we deduplicate them (by removing the latter one)?
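
Detecting repetitions by name, and the ordering caveat that comes with it, can be sketched with `itertools.groupby`, which groups only consecutive rows with an equal key. The helper names are hypothetical and the rows are reduced to a `name` field for brevity; this is not the code in `compare.py`.

```python
import itertools


def group_repetitions(rows):
    """Group consecutive rows that share a name; input order must be preserved."""
    return [(name, list(group))
            for name, group in itertools.groupby(rows, key=lambda r: r["name"])]


def same_repetition_count(old_rows, new_rows):
    """Check that both runs have the same names with the same repetition counts."""
    old = [(n, len(g)) for n, g in group_repetitions(old_rows)]
    new = [(n, len(g)) for n, g in group_repetitions(new_rows)]
    return old == new


if __name__ == "__main__":
    rows = [{"name": "BM_One"}, {"name": "BM_Two"},
            {"name": "BM_Two"}, {"name": "BM_Two_mean"}]
    for name, group in group_repetitions(rows):
        print(name, len(group))
```

A check like `same_repetition_count` is one way a tool could refuse to compare runs whose repetition counts differ instead of silently producing wrong results.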


Update README.md · ec0f69c2
Dominic Hamon authored May 29, 2018


25 May, 2018 1 commit

Add benchmark_main target. (#601) · e776aa02

authored May 25, 2018

* Add benchmark_main library with support for Bazel.

* fix newline at end of file

* Add CMake support for benchmark_main.

* Mention optionally using benchmark_main in README.


24 May, 2018 2 commits

Corrections, additions to initial doc (#600) · d7aed736

authored May 24, 2018

* Correct/clarify build/install instructions

GTest is Google Test; don't obfuscate needlessly for newcomers.
Adding Google Test to the installation guide helps newcomers.
The third option under the line "Note that Google Benchmark requires Google Test to build and run the tests. This
dependency can be provided three ways:"
was not true (it did not occur). If a further option needs to be set for that functionality to work, it needs to be documented.

* Add prerequisite knowledge section

A lot of assumptions are made about the reader in the documentation. This is unfortunate.

* Removal of abbreviations for google test


Return 0 from State::iterations() when not yet started. (#598) · ce3fde16

authored May 24, 2018

* Return a reasonable value from State::iterations() even before starting a benchmark

* Optimize State::iterations() for started case.


14 May, 2018 1 commit
- split_list is not defined for assembly tests (#595) · 6d74c062
  Deniz Evrenci authored May 14, 2018
```
* Update AUTHORS and CONTRIBUTORS

* split_list is not defined for assembly tests
```
09 May, 2018 1 commit
- Remove unnecessary memset functions. (#591) · e90801ae
  Nan Xiao authored May 09, 2018
  
08 May, 2018 3 commits
- [Tools] Fix a few python3-compatibility issues (#585) · 718cc91d
  Roman Lebedev authored May 08, 2018
  
- There is no "FATAL" in message(), only "FATAL_ERROR" (#584) · e8ddd907
  Roman Lebedev authored May 08, 2018
  
- Run git from the source directory (#589) (#590) · 16af6450
  php1ic authored May 08, 2018
```
Git was being executed in the current directory, so could not get the
latest tag if cmake was run from a build directory. Force git to be
run from within the source directory.
```
03 May, 2018 1 commit

Use __EMSCRIPTEN__ (rather than EMSCRIPTEN) to check for emscripten (#583) · 8986839e

authored May 03, 2018

The old EMSCRIPTEN macro is deprecated and not enabled when
EMCC_STRICT is set.

Also fix a typo in EMSCRIPTN (not sure how this ever worked).


02 May, 2018 2 commits

Porting into OpenBSD (#582) · ea5551e7
Nan Xiao authored May 02, 2018


Update bazel WORKSPACE and BUILD files to work better on Windows. (#581) · 62a9d756

authored May 02, 2018

Note, bazel only supports MSVC on Windows, and not MinGW, so
linking against shlwapi.lib only needs to follow MSVC conventions.

git_repository() did not work in local testing, so it is swapped for
http_archive(). The latter is also documented as the preferred way
to depend on an external library in bazel.


01 May, 2018 1 commit

Fix bazel config to link against pthread. (#579) · b678a202

authored May 01, 2018

The benchmarks in test/ currently build because they all
include a dep on gtest, which brings in pthread when needed.


26 Apr, 2018 1 commit

Issue 571: Allow support for negative regex filtering (#576) · ed1bac84

authored Apr 26, 2018

* Allow support for negative regex filtering

This patch allows one to apply a negation to the entire regex filter
by prefixing it with a '-' character, much in the same style as
GoogleTest uses.

* Address issues in PR

* Add unit tests for negative filtering
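
The filtering behaviour can be sketched as follows, assuming the '-' marker is the first character of the filter (as in GoogleTest) and that the regex may match anywhere in the benchmark name. The function name is hypothetical, and Python's `re` stands in for the library's C++ regex backend.

```python
import re


def select_benchmarks(names, spec):
    """Return the names kept by `spec`; a leading '-' negates the whole regex."""
    negate = spec.startswith("-")
    pattern = re.compile(spec[1:] if negate else spec)
    # re.search gives "matches anywhere in the name" semantics (an assumption).
    return [n for n in names if bool(pattern.search(n)) != negate]


if __name__ == "__main__":
    names = ["BM_Foo", "BM_FooBar", "BM_Baz"]
    print(select_benchmarks(names, "Foo"))
    print(select_benchmarks(names, "-Foo"))
```

Negating the entire expression keeps the syntax simple: there is no way to mix positive and negative patterns in a single filter under this scheme.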


23 Apr, 2018 2 commits
- Add caching for cxx_feature_check (#573) · 105ac14b
  Yangqing Jia authored Apr 23, 2018
  
- Fix precision loss warning in MSVC. (#574) · 64d4805d
  Victor Costan authored Apr 23, 2018
  
19 Apr, 2018 1 commit

Report the actual iterations run. (#572) · c4858d80

authored Apr 19, 2018

Before this change, we would report the number of requested iterations
passed to the state. After, we will report the actual number run. As a
side-effect, instead of multiplying the expected iterations by the
number of threads to get the total number, we can report the actual
number of iterations across all threads, which takes into account the
situation where some threads might run more iterations than others.
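
The difference between the two ways of counting can be shown with a toy example. The function names and the per-thread numbers are made up for illustration; the library's real bookkeeping is in C++.

```python
def reported_before(requested_iterations, threads):
    """Old behaviour: requested iterations multiplied by the thread count."""
    return requested_iterations * threads


def reported_after(actual_per_thread):
    """New behaviour: sum of the iterations each thread actually ran."""
    return sum(actual_per_thread)


if __name__ == "__main__":
    actual = [100, 100, 100, 97]  # hypothetical: one thread ran fewer iterations
    print(reported_before(100, len(actual)), reported_after(actual))
```

When every thread runs exactly the requested count the two numbers agree; they diverge only in the uneven case the commit message describes.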


12 Apr, 2018 1 commit

Ensure 64-bit truncation doesn't happen for complexity_n (#569) · 64e5a13f

authored Apr 12, 2018

* Ensure 64-bit truncation doesn't happen for complexity results

* One more complexity_n 64-bit fix

* Missed another vector of int

* Piping through the int64_t
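
The kind of truncation these fixes guard against can be demonstrated with `ctypes`, which stores a Python integer in a fixed-width C integer without overflow checking. This only illustrates what typically happens on common platforms when a 64-bit value passes through a 32-bit `int`; the helper names are hypothetical.

```python
import ctypes


def as_int32(n):
    """Mimic storing `n` in a 32-bit signed int (wraps silently)."""
    return ctypes.c_int32(n).value


def as_int64(n):
    """Mimic storing `n` in a 64-bit signed int."""
    return ctypes.c_int64(n).value


if __name__ == "__main__":
    big_n = 2**32 + 5  # a complexity N that does not fit in 32 bits
    print(as_int32(big_n), as_int64(big_n))
```

Keeping the value in `int64_t` along the whole path, including any intermediate containers, is what avoids the silent wrap.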


09 Apr, 2018 1 commit
- Optimize by using nth_element instead of partial_sort to find the median. (#565) · 50ffc781
  Fred Tingaud authored Apr 09, 2018
  
06 Apr, 2018 1 commit
- Fix #564 - gmock/gmock.h not found in benchmark tests. · 2844167f
  Eric Fiselier authored Apr 05, 2018
  
03 Apr, 2018 1 commit

Allow AddRange to work with int64_t. (#548) · 9913418d

authored Apr 03, 2018

* Allow AddRange to work with int64_t.

Fixes #516

Also, tweak how we manage per-test build needs, and create a standard
_gtest suffix for googletest to differentiate from non-googletest tests.

I also ran clang-format on the files that I changed (but not the
benchmark include or main src as they have too many clang-format
issues).

* Add benchmark_gtest to cmake

* Set(Items|Bytes)Processed now take int64_t
