Conversation

@aldenks (Contributor) commented Jan 20, 2026

Parameterize the benchmarks with shards and no shards. Also parameterize with local and memory stores, so we have an example of a store with some modest latency.

While testing performance optimizations for sharded Zarr reads, I found that the memory store is too fast to show any difference, but the local store makes the performance improvements there very visible. The extra parameterizations do expand the test space: test_slice_indexing with benchmarking enabled now takes ~30 seconds vs ~7 seconds on main.

Parameterize with shards and no shards.
Parameterize with local + memory store to have
an example of a store which has some modest latency.
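The expanded parameter grid can be sketched roughly as follows. Note that `SHARD_SHAPES`, `STORE_KINDS`, and `expanded_cases` are hypothetical illustrative names, not the PR's actual code:

```python
import itertools

# Hypothetical sketch of the benchmark grid this PR introduces:
# every slice selection now runs with and without shards, on two stores.
SHARD_SHAPES = [None, (50, 50, 50)]  # no sharding vs a 50^3 shard shape
STORE_KINDS = ["memory", "local"]    # fast in-memory store vs filesystem store

def expanded_cases(selections):
    """Cross slice selections with shard and store parameters,
    the way pytest.mark.parametrize would."""
    return list(itertools.product(SHARD_SHAPES, selections, STORE_KINDS))

cases = expanded_cases(["full", "strided", "small"])
# 2 shard options x 3 selections x 2 stores = 12 cases
```

This 2x2 expansion (shards x stores) is what quadruples each selection's benchmark count in the report below.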
@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Jan 20, 2026
@codspeed-hq (bot) commented Jan 20, 2026

Merging this PR will not alter performance

✅ 24 untouched benchmarks
🆕 24 new benchmarks
⏩ 6 skipped benchmarks¹

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| 🆕 WallTime | test_slice_indexing[None-(slice(0, None, 4), slice(0, None, 4), slice(0, None, 4))-memory] | N/A | 500.6 ms | N/A |
| 🆕 WallTime | test_slice_indexing[None-(slice(None, None, None), slice(None, None, None), slice(None, None, None))-memory] | N/A | 502.7 ms | N/A |
| 🆕 WallTime | test_slice_indexing[None-(slice(None, None, None), slice(None, None, None), slice(None, None, None))-local] | N/A | 937.7 ms | N/A |
| 🆕 WallTime | test_slice_indexing[(50, 50, 50)-(slice(0, None, 4), slice(0, None, 4), slice(0, None, 4))-memory] | N/A | 541.9 ms | N/A |
| 🆕 WallTime | test_slice_indexing[(50, 50, 50)-(slice(10, -10, 4), slice(10, -10, 4), slice(10, -10, 4))-memory] | N/A | 286.5 ms | N/A |
| 🆕 WallTime | test_slice_indexing[None-(slice(0, None, 4), slice(0, None, 4), slice(0, None, 4))-local] | N/A | 930.3 ms | N/A |
| 🆕 WallTime | test_slice_indexing[None-(slice(None, 10, None), slice(None, 10, None), slice(None, 10, None))-memory] | N/A | 938.9 µs | N/A |
| 🆕 WallTime | test_slice_indexing[(50, 50, 50)-(slice(None, None, None), slice(None, None, None), slice(None, None, None))-memory] | N/A | 546.7 ms | N/A |
| 🆕 WallTime | test_slice_indexing[(50, 50, 50)-(slice(0, None, 4), slice(0, None, 4), slice(0, None, 4))-local] | N/A | 658.9 ms | N/A |
| 🆕 WallTime | test_slice_indexing[(50, 50, 50)-(slice(None, None, None), slice(None, None, None), slice(None, None, None))-local] | N/A | 671.5 ms | N/A |
| 🆕 WallTime | test_slice_indexing[(50, 50, 50)-(slice(None, None, None), slice(0, 3, 2), slice(0, 10, None))-memory] | N/A | 7.6 ms | N/A |
| 🆕 WallTime | test_slice_indexing[(50, 50, 50)-(slice(None, 10, None), slice(None, 10, None), slice(None, 10, None))-local] | N/A | 2.7 ms | N/A |
| 🆕 WallTime | test_slice_indexing[None-(slice(10, -10, 4), slice(10, -10, 4), slice(10, -10, 4))-memory] | N/A | 274.2 ms | N/A |
| 🆕 WallTime | test_slice_indexing[None-(slice(None, 10, None), slice(None, 10, None), slice(None, 10, None))-local] | N/A | 1.2 ms | N/A |
| 🆕 WallTime | test_slice_indexing[(50, 50, 50)-(slice(10, -10, 4), slice(10, -10, 4), slice(10, -10, 4))-local] | N/A | 486 ms | N/A |
| 🆕 WallTime | test_slice_indexing[(50, 50, 50)-(0, 0, 0)-local] | N/A | 2.6 ms | N/A |
| 🆕 WallTime | test_slice_indexing[None-(slice(10, -10, 4), slice(10, -10, 4), slice(10, -10, 4))-local] | N/A | 510.9 ms | N/A |
| 🆕 WallTime | test_slice_indexing[None-(slice(None, None, None), slice(0, 3, 2), slice(0, 10, None))-memory] | N/A | 4.7 ms | N/A |
| 🆕 WallTime | test_slice_indexing[(50, 50, 50)-(0, 0, 0)-memory] | N/A | 1.9 ms | N/A |
| 🆕 WallTime | test_slice_indexing[(50, 50, 50)-(slice(None, None, None), slice(0, 3, 2), slice(0, 10, None))-local] | N/A | 12.7 ms | N/A |
| ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.


Comparing aldenks:shard-slice-indexing-benchmark (0542993) with main (68359bc)


Footnotes

  1. 6 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@d-v-b (Contributor) commented Jan 20, 2026

If we want more control over the latency, we could wrap the memory store with a LatencyStore here. That would keep the benchmark more isolated from whatever the OS is doing than LocalStore does. Happy to implement this if you don't mind me pushing to this branch.
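The wrapping idea can be sketched minimally as below, assuming a simple async key-value store interface. `DictStore`, the method signatures, and the latency value are illustrative stand-ins, not zarr-python's actual store classes:

```python
import asyncio

class DictStore:
    """Toy in-memory async store, for illustration only."""
    def __init__(self):
        self._data = {}

    async def set(self, key, value):
        self._data[key] = value

    async def get(self, key):
        return self._data.get(key)

class LatencyStore:
    """Delegate to an inner store, adding a fixed delay to each read so
    benchmarks see a per-request cost even on an in-memory store."""
    def __init__(self, inner, get_latency_s=0.001):
        self._inner = inner
        self._get_latency_s = get_latency_s

    async def set(self, key, value):
        await self._inner.set(key, value)

    async def get(self, key):
        await asyncio.sleep(self._get_latency_s)  # simulated round-trip
        return await self._inner.get(key)

async def demo():
    store = LatencyStore(DictStore(), get_latency_s=0.001)
    await store.set("c/0/0/0", b"\x00" * 8)
    return await store.get("c/0/0/0")
```

Because the delay is injected deterministically rather than depending on the filesystem, the benchmark isolates the read-coalescing behavior from OS page-cache effects.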

@aldenks (Contributor, Author) commented Jan 20, 2026

@d-v-b go ahead! If you make it I can't be accused of juicing my numbers by picking high latencies ;p

In all seriousness, that would be quite useful for picking a good default coalesce_max_gap_bytes.
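For context, gap-based coalescing of chunk byte ranges can be sketched like this. `coalesce_ranges` is a hypothetical illustration of what a parameter like `coalesce_max_gap_bytes` would control, not the PR's actual implementation:

```python
def coalesce_ranges(ranges, max_gap_bytes):
    """Merge (start, stop) byte ranges whose gap to the previous merged
    range is at most max_gap_bytes, trading wasted bytes for fewer requests."""
    merged = []
    for start, stop in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap_bytes:
            # Close enough: extend the previous range across the gap.
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged

coalesce_ranges([(0, 10), (12, 20), (100, 110)], max_gap_bytes=5)
# -> [(0, 20), (100, 110)]
```

A latency-controlled store makes it straightforward to sweep the gap threshold and see where fewer, larger reads stop paying off.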
