How to Benchmark Different Python Implementations with pytest-benchmark¶
Note
Most of this text was generated with AI.
This guide will walk you through setting up and running performance benchmarks
using pytest-benchmark. Benchmarking is crucial for making informed decisions
about which libraries or implementation strategies offer the best performance
for your specific use cases. We’ll use the common example of comparing two JSON
serialization libraries: the standard json and the faster orjson.
Why benchmark?¶
When you have multiple ways to achieve the same task (e.g., using different libraries or algorithms), benchmarks provide quantitative data on their performance. This data helps you:
Identify performance bottlenecks.
Choose the most efficient library/method for critical code paths.
Track performance regressions or improvements over time.
Justify technical decisions with concrete evidence.
Prerequisites¶
Before you start, make sure you have the following installed in your Python environment:
Python: (e.g., Python 3.8+)
uv: Or your preferred Python package manager/runner.
pytest: The testing framework.
pytest-benchmark: The pytest plugin for benchmarking.
orjson: The alternative JSON library we'll be testing against (the standard json library is built-in).
You can install the necessary Python packages using uv:
uv pip install pytest pytest-benchmark orjson
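If you are not using uv, the same packages can be installed with pip inside an activated virtual environment:
python -m pip install pytest pytest-benchmark orjson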
Setting up Your Benchmark File¶
Create a directory for your benchmark scripts. Following your project structure, let's assume this is a scripts/ directory.
Inside the scripts/ directory, create a new Python file for your benchmarks. For our JSON example, let's name it test_json_performance.py.
project_root/
└── scripts/
    └── test_json_performance.py
Writing Benchmark Functions¶
In your test_json_performance.py file, you’ll write functions that
pytest-benchmark can discover and run. Each function will test a specific
piece of code.
Here’s how to structure the benchmark for comparing json.dumps and orjson.dumps:
# scripts/test_json_performance.py
import pytest
import json
import orjson

# Sample data to be used for serialization
SAMPLE_DATA = {
    "name": "Example User",
    "email": "user@example.com",
    "age": 30,
    "is_active": True,
    "balance": 1234.56,
    "metadata": {"key" + str(i): "value" + str(i) for i in range(50)},
}


# Benchmark for the standard json library's dumps function
def test_standard_json_dumps(benchmark):
    """Benchmarks the standard json.dumps() function."""
    benchmark(json.dumps, SAMPLE_DATA)


# Benchmark for orjson's dumps function
def test_orjson_dumps(benchmark):
    """Benchmarks the orjson.dumps() function."""
    benchmark(orjson.dumps, SAMPLE_DATA)


# Pre-serialize the sample data once so the loads benchmarks measure only
# deserialization. Note that json.dumps returns a str, while orjson.dumps
# returns bytes; each loads function is fed its own library's output.
SERIALIZED_JSON_STD = json.dumps(SAMPLE_DATA)
SERIALIZED_JSON_ORJSON = orjson.dumps(SAMPLE_DATA)


def test_standard_json_loads(benchmark):
    """Benchmarks the standard json.loads() function."""
    benchmark(json.loads, SERIALIZED_JSON_STD)


def test_orjson_loads(benchmark):
    """Benchmarks the orjson.loads() function."""
    benchmark(orjson.loads, SERIALIZED_JSON_ORJSON)
Key points in the code:
We import pytest and the libraries we want to test (json, orjson).
SAMPLE_DATA provides a consistent input for all benchmarks.
Each function starting with test_ is collected by pytest like a normal test; pytest-benchmark simply adds the measurement machinery.
The benchmark fixture (provided by pytest-benchmark) is passed as an argument to these functions.
You call benchmark(function_to_test, arg1, arg2, ...) to run and measure function_to_test with its arguments. If you need finer control over rounds, iterations, and warmup, see the sketch after this list.
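For finer control, pytest-benchmark also provides benchmark.pedantic(), which lets you pin down rounds, iterations, and warmup instead of letting the plugin calibrate them automatically. The sketch below would live in the same test_json_performance.py file and reuses SAMPLE_DATA from above; the specific rounds/iterations/warmup_rounds values are arbitrary illustrations, not recommendations.
# Optional: an explicitly calibrated benchmark using benchmark.pedantic()
def test_orjson_dumps_pedantic(benchmark):
    """Benchmarks orjson.dumps() with explicit rounds, iterations, and warmup."""
    benchmark.pedantic(
        orjson.dumps,
        args=(SAMPLE_DATA,),
        rounds=200,       # number of measured rounds
        iterations=10,    # calls to orjson.dumps per round
        warmup_rounds=5,  # rounds executed but not measured
    )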
Running the Benchmarks¶
To run your benchmarks, navigate to your project’s root directory in the terminal and use the command structure you’ve established:
uv run pytest scripts/test_json_performance.py
If you have multiple benchmark files in the scripts/ directory, you can run them one at a time:
uv run pytest scripts/{BENCHMARK}.py
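A few pytest-benchmark command-line options are worth knowing. The flags shown below exist in the plugin, but run pytest --help and check the benchmark section to confirm the exact names and accepted values for your installed version:
# Run only the benchmarks and skip ordinary tests in the same files
uv run pytest scripts/test_json_performance.py --benchmark-only
# Sort the results table by mean time instead of the default (min)
uv run pytest scripts/test_json_performance.py --benchmark-sort=mean
# Group results by function name (useful with parametrized benchmarks)
uv run pytest scripts/test_json_performance.py --benchmark-group-by=func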
Understanding the output¶
After running, pytest-benchmark will produce a table summarizing the
performance results. It will look something like this (the exact numbers will
vary based on your machine):
| Name (time in us) | Min | Max | Mean | StdDev | Median | IQR | Outliers(*) | Rounds | Iterations |
|---|---|---|---|---|---|---|---|---|---|
| test_orjson_dumps | 3.8530 (1.0) | 6.5290 (1.0) | 4.3386 (1.0) | 0.3104 (1.0) | 4.2600 (1.0) | 0.3045 (1.0) | 64;95 | 22893 | 1 |
| test_standard_json_dumps | 19.0930 (4.96) | 31.2950 (4.80) | 20.6635 (4.76) | 1.6072 (5.18) | 20.2170 (4.75) | 1.4480 (4.75) | 72;165 | 4633 | 1 |
| test_orjson_loads | 3.3270 (1.0) | 5.8330 (1.0) | 3.6799 (1.0) | 0.3019 (1.0) | 3.6020 (1.0) | 0.2660 (1.0) | 101;111 | 26329 | 1 |
| test_standard_json_loads | 6.8310 (2.05) | 11.2870 (1.94) | 7.5088 (2.04) | 0.7889 (2.61) | 7.2790 (2.02) | 0.6900 (2.59) | 84;116 | 12691 | 1 |
Key columns to look at:
Name: The name of your benchmark function.
Min, Max, Mean, Median: These are timings (often in microseconds, us, or milliseconds, ms). Lower values are better; the Mean or Median are usually good general indicators.
StdDev: Standard deviation, showing the variability of the measurements. Lower is generally better, indicating more consistent performance.
Rounds: How many times the core benchmark loop was run by pytest-benchmark to gather statistics.
Iterations: How many times your target function was called within each round.
OPS (operations per second): Higher values are better. This column may or may not appear depending on your pytest-benchmark version and configuration; if it is missing, you can request it explicitly (see the command sketch after this list). The Min, Mean, and Median times remain the primary metrics.
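If you want the OPS column, or a leaner table in general, the displayed columns can be selected explicitly. The column labels below follow the plugin's documented set (min, max, mean, stddev, median, iqr, outliers, ops, rounds, iterations); adjust the list to taste:
uv run pytest scripts/test_json_performance.py --benchmark-columns=min,mean,median,stddev,ops,rounds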
The numbers in parentheses (e.g., (1.0), (4.96)) show each result relative to the baseline, i.e. the fastest test in its comparison group; in this output the dumps benchmarks are compared against each other and the loads benchmarks against each other, which is why both orjson rows read (1.0). For test_standard_json_dumps, the (4.96) next to its Min time means it was 4.96 times slower than the Min time of the fastest dumps test, test_orjson_dumps (19.0930 us / 3.8530 us ≈ 4.96).
From the example output, you could conclude that orjson is significantly
faster than the standard json for both dumps and loads operations on this
particular SAMPLE_DATA and machine.
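Finally, if you want to track performance over time, one of the motivations listed at the start, pytest-benchmark can save each run and compare later runs against it. The flags below are part of the plugin, but the comparison expression syntax can differ between versions, so treat this as a sketch and confirm with pytest --help:
# Save this run's results (stored under .benchmarks/ by default)
uv run pytest scripts/test_json_performance.py --benchmark-autosave
# Compare the current run against the most recent saved run,
# failing if the mean time regresses by more than 10%
uv run pytest scripts/test_json_performance.py --benchmark-compare --benchmark-compare-fail=mean:10%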