How to Benchmark Different Python Implementations with pytest-benchmark

Note

Most of this text was generated with AI.

This guide will walk you through setting up and running performance benchmarks using pytest-benchmark. Benchmarking is crucial for making informed decisions about which libraries or implementation strategies offer the best performance for your specific use cases. We’ll use the common example of comparing two JSON serialization libraries: the standard json and the faster orjson.

Why Benchmark?

When you have multiple ways to achieve the same task (e.g., using different libraries or algorithms), benchmarks provide quantitative data on their performance. This data helps you:

  • Identify performance bottlenecks.

  • Choose the most efficient library/method for critical code paths.

  • Track performance regressions or improvements over time.

  • Justify technical decisions with concrete evidence.

Prerequisites

Before you start, make sure you have the following installed in your Python environment:

  1. Python: 3.8 or newer.

  2. uv: Or your preferred Python package manager/runner.

  3. pytest: The testing framework.

  4. pytest-benchmark: The pytest plugin for benchmarking.

  5. orjson: The alternative JSON library we’ll be testing against (the standard json library is built-in).

You can install the necessary Python packages using uv:

uv pip install pytest pytest-benchmark orjson

Setting up Your Benchmark File

  1. Create a directory for your benchmark scripts. This guide assumes a scripts/ directory at the project root.

  2. Inside the scripts/ directory, create a new Python file for your benchmarks. For our JSON example, let’s name it test_json_performance.py.

    project_root/
    └── scripts/
        └── test_json_performance.py
    

Writing Benchmark Functions

In your test_json_performance.py file, you’ll write test functions that pytest can discover and that pytest-benchmark will measure. Each function exercises a specific piece of code.

Here’s how to structure the benchmark for comparing json.dumps and orjson.dumps:

# scripts/test_json_performance.py

import pytest
import json
import orjson

# Sample data to be used for serialization
SAMPLE_DATA = {
    "name": "Example User",
    "email": "user@example.com",
    "age": 30,
    "is_active": True,
    "balance": 1234.56,
    "metadata": {"key" + str(i): "value" + str(i) for i in range(50)},
}

# Benchmark the standard library's json.dumps()
def test_standard_json_dumps(benchmark):
    """Benchmarks the standard json.dumps() function."""
    benchmark(json.dumps, SAMPLE_DATA)


# Benchmark orjson.dumps() on the same input
def test_orjson_dumps(benchmark):
    """Benchmarks the orjson.dumps() function."""
    benchmark(orjson.dumps, SAMPLE_DATA)


# Serialize the sample data once so the loads benchmarks measure only
# deserialization. Note that json.dumps returns str while orjson.dumps
# returns bytes; each loads function accepts its own library's output.
SERIALIZED_JSON_STD = json.dumps(SAMPLE_DATA)
SERIALIZED_JSON_ORJSON = orjson.dumps(SAMPLE_DATA)


# Benchmark the standard library's json.loads()
def test_standard_json_loads(benchmark):
    """Benchmarks the standard json.loads() function."""
    benchmark(json.loads, SERIALIZED_JSON_STD)


# Benchmark orjson.loads()
def test_orjson_loads(benchmark):
    """Benchmarks the orjson.loads() function."""
    benchmark(orjson.loads, SERIALIZED_JSON_ORJSON)

Key points in the code:

  • We import pytest and the libraries we want to test (json, orjson).

  • SAMPLE_DATA provides a consistent input for all benchmarks.

  • The file and function names start with test_, so pytest collects them through its normal test discovery rules.

  • The benchmark fixture (provided by pytest-benchmark) is passed as an argument to these functions.

  • You call benchmark(function_to_test, arg1, arg2, ...) to run and measure the function_to_test with its arguments.
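The fixture also forwards keyword arguments, and pytest-benchmark additionally provides benchmark.pedantic() when you want explicit control over rounds and iterations. Here is a minimal sketch; the function names, the sort_keys argument, and the round/iteration counts are illustrative additions, not part of the original example:

# scripts/test_json_performance.py (optional additions)

import json


def test_standard_json_dumps_sorted(benchmark):
    # Keyword arguments are forwarded to the benchmarked callable
    benchmark(json.dumps, {"b": 2, "a": 1}, sort_keys=True)


def test_standard_json_dumps_pedantic(benchmark):
    # benchmark.pedantic() pins down rounds and iterations explicitly
    benchmark.pedantic(
        json.dumps,
        args=({"b": 2, "a": 1},),
        kwargs={"sort_keys": True},
        rounds=100,     # number of measured rounds
        iterations=10,  # calls to json.dumps per round
    )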

Running the Benchmarks

To run your benchmarks, navigate to your project’s root directory in the terminal and point pytest at the benchmark file:

uv run pytest scripts/test_json_performance.py

If you have multiple benchmark files in the scripts/ directory, you can run them one at a time by substituting the file name:

uv run pytest scripts/{BENCHMARK}.py
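pytest’s usual test selection and pytest-benchmark’s reporting options apply here as well. For example, assuming pytest’s -k keyword selection and pytest-benchmark’s --benchmark-sort flag, you could run only the dumps benchmarks and sort the results table by mean time:

uv run pytest scripts/test_json_performance.py -k dumps --benchmark-sort=mean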

Understanding the Output

After running, pytest-benchmark will produce a table summarizing the performance results. It will look something like this (the exact numbers will vary based on your machine):

Name (time in us)                     Min             Max            Mean          StdDev          Median             IQR  Outliers(*)  Rounds  Iterations
test_orjson_dumps             3.8530 (1.0)    6.5290 (1.0)    4.3386 (1.0)    0.3104 (1.0)    4.2600 (1.0)    0.3045 (1.0)        64;95   22893           1
test_standard_json_dumps    19.0930 (4.96)  31.2950 (4.80)  20.6635 (4.76)   1.6072 (5.18)  20.2170 (4.75)   1.4480 (4.75)       72;165    4633           1
test_orjson_loads             3.3270 (1.0)    5.8330 (1.0)    3.6799 (1.0)    0.3019 (1.0)    3.6020 (1.0)    0.2660 (1.0)      101;111   26329           1
test_standard_json_loads      6.8310 (2.05)  11.2870 (1.94)   7.5088 (2.04)   0.7889 (2.61)   7.2790 (2.02)   0.6900 (2.59)       84;116   12691           1

Key columns to look at:

  • Name: The name of your benchmark function.

  • Min, Max, Mean, Median: Timings, typically in microseconds (us) or milliseconds (ms). Lower values are better; Mean and Median are usually the best general indicators.

  • StdDev: Standard deviation, showing the variability of the measurements. Lower is generally better, indicating more consistent performance.

  • Rounds: How many times the core benchmark loop was run by pytest-benchmark to gather statistics.

  • Iterations: How many times your target function was called within each round.

  • Ops/s: Operations per second; higher values are better. Depending on your pytest-benchmark version and configuration, this column may appear with a scaled unit (e.g., Kops/s) or not at all; the Min, Mean, and Median times remain the primary indicators.

The numbers in parentheses (e.g., (1.0), (4.96)) show each measurement relative to the baseline, which is the fastest test in the comparison; the baseline’s own ratios therefore read (1.0). For test_standard_json_dumps, the (4.96) next to its Min time means its minimum was 4.96 times slower than the Min time of the fastest test, test_orjson_dumps.
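In the table above, the dumps and loads benchmarks each have their own (1.0) baseline, i.e. they are compared within separate groups. If your own run lumps everything into a single comparison, you can opt into grouping explicitly. A minimal sketch, assuming pytest-benchmark’s benchmark marker with its group option (the group labels are arbitrary; decorate the standard json tests the same way):

# scripts/test_json_performance.py (markers added to the existing functions)

@pytest.mark.benchmark(group="dumps")
def test_orjson_dumps(benchmark):
    # Compared only against other benchmarks in the "dumps" group
    benchmark(orjson.dumps, SAMPLE_DATA)


@pytest.mark.benchmark(group="loads")
def test_orjson_loads(benchmark):
    # Compared only against other benchmarks in the "loads" group
    benchmark(orjson.loads, SERIALIZED_JSON_ORJSON)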

From the example output, you could conclude that orjson is significantly faster than the standard json library for both dumps and loads on this particular SAMPLE_DATA and machine: roughly five times faster for serialization and about twice as fast for deserialization.
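To follow up on the “track regressions over time” goal from the introduction, pytest-benchmark can also save each run and compare a new run against a saved one. As a sketch, assuming the --benchmark-autosave and --benchmark-compare flags (saved runs go to a .benchmarks/ directory by default):

uv run pytest scripts/test_json_performance.py --benchmark-autosave
uv run pytest scripts/test_json_performance.py --benchmark-compare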