Benchmark and Job Baselines

In order to scale your results, you can mark a benchmark method or a job as a baseline. Let's learn this feature by examples.

Sample: IntroBenchmarkBaseline

You can mark a method as a baseline with the help of [Benchmark(Baseline = true)].

Source code

using System.Threading;
using BenchmarkDotNet.Attributes;

namespace BenchmarkDotNet.Samples
{
    public class IntroBenchmarkBaseline
    {
        [Benchmark]
        public void Time50() => Thread.Sleep(50);

        [Benchmark(Baseline = true)]
        public void Time100() => Thread.Sleep(100);

        [Benchmark]
        public void Time150() => Thread.Sleep(150);
    }
}

Output

As a result, you will have additional Ratio column in the summary table:

|  Method |      Mean |     Error |    StdDev | Ratio |
|-------- |----------:|----------:|----------:|------:|
|  Time50 |  50.46 ms | 0.0779 ms | 0.0729 ms |  0.50 |
| Time100 | 100.39 ms | 0.0762 ms | 0.0713 ms |  1.00 |
| Time150 | 150.48 ms | 0.0986 ms | 0.0922 ms |  1.50 |

This column contains the mean value of the ratio distribution.

For example, in the case of Time50, we divide the first measurement of Time50 into the first measurement of Time100 (it's the baseline), the second measurement of Time50 into the second measurement of Time100, and so on. Next, we calculate the mean of all these values and display it in the Ratio column. For Time50, we have 0.50.

The Ratio column was formerly known as Scaled. The old title was a source of misunderstanding and confusion because many developers interpreted it as the ratio of means (e.g., 50.46/100.39 for Time50). The ratio of distribution means and the mean of the ratio distribution are pretty close to each other in most cases, but they are not equal.

In @BenchmarkDotNet.Samples.IntroRatioStdDev, you can find an example of how this value can be spoiled by outliers.

Sample: IntroRatioSD

The ratio of two benchmarks is not a single number, it's a distribution. In most simple cases, the range of the ratio distribution is narrow, and BenchmarkDotNet displays a single column Ratio with the mean value. However, it also adds the RatioSD column (the standard deviation of the ratio distribution) in complex situations. In the below example, the baseline benchmark is spoiled by a single outlier

Source code

using System.Threading;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines;
using Perfolizer.Mathematics.OutlierDetection;

namespace BenchmarkDotNet.Samples
{
    // Don't remove outliers
    [Outliers(OutlierMode.DontRemove)]
    // Skip jitting, pilot, warmup; measure 10 iterations
    [SimpleJob(RunStrategy.Monitoring, iterationCount: 10, invocationCount: 1)]
    public class IntroRatioSD
    {
        private int counter;

        [GlobalSetup]
        public void Setup() => counter = 0;

        [Benchmark(Baseline = true)]
        public void Base()
        {
            Thread.Sleep(100);
            if (++counter % 7 == 0)
                Thread.Sleep(5000); // Emulate outlier
        }

        [Benchmark]
        public void Slow() => Thread.Sleep(200);

        [Benchmark]
        public void Fast() => Thread.Sleep(50);
    }
}

Output

Here are statistics details for the baseline benchmark:

Mean = 600.6054 ms, StdErr = 500.0012 ms (83.25%); N = 10, StdDev = 1,581.1428 ms
Min = 100.2728 ms, Q1 = 100.3127 ms, Median = 100.4478 ms, Q3 = 100.5011 ms, Max = 5,100.6163 ms
IQR = 0.1884 ms, LowerFence = 100.0301 ms, UpperFence = 100.7837 ms
ConfidenceInterval = [-1,789.8568 ms; 2,991.0677 ms] (CI 99.9%), Margin = 2,390.4622 ms (398.01% of Mean)
Skewness = 2.28, Kurtosis = 6.57, MValue = 2
-------------------- Histogram --------------------
[-541.891 ms ;  743.427 ms) | @@@@@@@@@
[ 743.427 ms ; 2027.754 ms) | 
[2027.754 ms ; 3312.082 ms) | 
[3312.082 ms ; 4458.453 ms) | 
[4458.453 ms ; 5742.780 ms) | @
---------------------------------------------------

As you can see, a single outlier significantly affected the metrics. Because of this, BenchmarkDotNet adds the Median and the RatioSD columns in the summary table:

 Method |      Mean |         Error |        StdDev |    Median | Ratio | RatioSD |
------- |----------:|--------------:|--------------:|----------:|------:|--------:|
   Base | 600.61 ms | 2,390.4622 ms | 1,581.1428 ms | 100.45 ms |  1.00 |    0.00 |
   Slow | 200.50 ms |     0.4473 ms |     0.2959 ms | 200.42 ms |  1.80 |    0.62 |
   Fast |  50.54 ms |     0.3435 ms |     0.2272 ms |  50.48 ms |  0.45 |    0.16 |

Let's look at the Base and Slow benchmarks. The Mean values are 600 and 200 milliseconds; the "Scaled Mean" value is 0.3. The Median values are 100 and 200 milliseconds; the "Scaled Median" value is 2. Both values are misleading. BenchmarkDotNet evaluates the ratio distribution and displays the mean (1.80) and the standard deviation (0.62).

Sample: IntroCategoryBaseline

The only way to have several baselines in the same class is to separate them by categories and mark the class with [GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByCategory)].

Source code

using System.Threading;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;

namespace BenchmarkDotNet.Samples
{
    [GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByCategory)]
    [CategoriesColumn]
    public class IntroCategoryBaseline
    {
        [BenchmarkCategory("Fast"), Benchmark(Baseline = true)]
        public void Time50() => Thread.Sleep(50);

        [BenchmarkCategory("Fast"), Benchmark]
        public void Time100() => Thread.Sleep(100);

        [BenchmarkCategory("Slow"), Benchmark(Baseline = true)]
        public void Time550() => Thread.Sleep(550);

        [BenchmarkCategory("Slow"), Benchmark]
        public void Time600() => Thread.Sleep(600);
    }
}

Output

|  Method | Categories |      Mean |     Error |    StdDev | Ratio |
|-------- |----------- |----------:|----------:|----------:|------:|
|  Time50 |       Fast |  50.46 ms | 0.0745 ms | 0.0697 ms |  1.00 |
| Time100 |       Fast | 100.47 ms | 0.0955 ms | 0.0893 ms |  1.99 |
|         |            |           |           |           |       |
| Time550 |       Slow | 550.48 ms | 0.0525 ms | 0.0492 ms |  1.00 |
| Time600 |       Slow | 600.45 ms | 0.0396 ms | 0.0331 ms |  1.09 |

Sample: IntroJobBaseline

If you want to compare several runtime configuration, you can mark one of your jobs with baseline = true.

Source code

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;

namespace BenchmarkDotNet.Samples
{
    [SimpleJob(runtimeMoniker: RuntimeMoniker.Net462, baseline: true)]
    [SimpleJob(runtimeMoniker: RuntimeMoniker.Mono)]
    [SimpleJob(runtimeMoniker: RuntimeMoniker.Net50)]
    public class IntroJobBaseline
    {
        [Benchmark]
        public int SplitJoin()
            => string.Join(",", new string[1000]).Split(',').Length;
    }
}

Output

BenchmarkDotNet=v0.10.12, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.192)
Processor=Intel Core i7-6700HQ CPU 2.60GHz (Skylake), ProcessorCount=8
Frequency=2531249 Hz, Resolution=395.0619 ns, Timer=TSC
.NET Core SDK=2.0.3
  [Host]     : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT
  Job-MXFYPZ : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2600.0
  Core       : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT
  Mono       : Mono 5.4.0 (Visual Studio), 64bit

    Method | Runtime |     Mean |     Error |    StdDev | Ratio | RatioSD |
---------- |-------- |---------:|----------:|----------:|------:|--------:|
 SplitJoin |     Clr | 19.42 us | 0.2447 us | 0.1910 us |  1.00 |    0.00 |
 SplitJoin |    Core | 13.00 us | 0.2183 us | 0.1935 us |  0.67 |    0.01 |
 SplitJoin |    Mono | 39.14 us | 0.7763 us | 1.3596 us |  2.02 |    0.07 |

Table of Contents

Benchmark and Job Baselines

Sample: IntroBenchmarkBaseline

Source code

Output

Links

Sample: IntroRatioSD

Source code

Output

Links

Sample: IntroCategoryBaseline

Source code

Output

Links

Sample: IntroJobBaseline

Source code

Output

Links