Split out cumulative vs. delta storage#8015
jack-berg wants to merge 2 commits into open-telemetry:main from split-sync-metric-storage
Conversation
```java
      aggregatorHolder.activeRecordingThreads.addAndGet(-2);
    }
  } while (true);
}
```
This is the key work that cumulative instruments get to avoid.
On each collection, delta instruments swap out the ConcurrentHashMap which holds the AggregatorHandle instances corresponding to each series. To prevent lost-write scenarios like the following sequence, delta requires additional coordination between collection and recording threads:
- T1: record thread grabs instance 1 of ConcurrentHashMap
- T2: collect thread obtains instance 1 of ConcurrentHashMap, and collects from all series
- T3: collect thread updates the ConcurrentHashMap to a new instance 2
- T4: record thread records to instance 1 of ConcurrentHashMap, which is never seen by any current or future collect
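The T1–T4 race can be sketched as a plain map swap with no coordination. This is a simplified illustration, not the SDK's actual code: the class and the `long[]` accumulator standing in for `AggregatorHandle` are hypothetical, and the PR closes the window with the `activeRecordingThreads` counter visible in the diff above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

/** Sketch of the delta map-swap race; single-threaded use is fine,
 *  but concurrent record/collect can lose writes without coordination. */
class DeltaMapSwapSketch {
  private final AtomicReference<ConcurrentHashMap<String, long[]>> activeMap =
      new AtomicReference<>(new ConcurrentHashMap<>());

  void record(String series, long value) {
    // T1: recording thread reads the current map instance.
    ConcurrentHashMap<String, long[]> map = activeMap.get();
    // If collect() swaps the map between the read above and the write
    // below (T2/T3), this write lands in the retired instance and is
    // never seen by any current or future collect (T4).
    map.computeIfAbsent(series, k -> new long[1])[0] += value;
  }

  ConcurrentHashMap<String, long[]> collect() {
    // T2 + T3: drain all series by swapping in a fresh map.
    return activeMap.getAndSet(new ConcurrentHashMap<>());
  }
}
```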
In prom client java, we have 2 sets of variables (for the map in this case) and switch between them using an atomic. I wonder if we could apply this pattern here as well.
It's quite involved, but you can ask the AI to explain you how it works: https://github.com/prometheus/client_java/blob/main/prometheus-metrics-core/src/main/java/io/prometheus/metrics/core/metrics/Buffer.java
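A heavily simplified sketch of the two-buffer idea (not the actual `Buffer.java`, which folds the buffer bit and the in-flight count into a single atomic; all names here are hypothetical):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.DoubleAdder;

/** Two buffers; an atomic index selects the active one, and per-buffer
 *  in-flight counters let collection wait out straggling observers. */
class DoubleBufferSketch {
  private final AtomicInteger active = new AtomicInteger(0); // 0 or 1
  private final DoubleAdder[] sums = {new DoubleAdder(), new DoubleAdder()};
  private final AtomicInteger[] inFlight = {new AtomicInteger(), new AtomicInteger()};

  void observe(double value) {
    int idx = active.get();
    inFlight[idx].incrementAndGet();
    try {
      // Caveat: idx can go stale between get() and incrementAndGet();
      // the real Buffer closes that window by packing the buffer-select
      // bit and the observation count into one atomic long.
      sums[idx].add(value);
    } finally {
      inFlight[idx].decrementAndGet();
    }
  }

  double collectAndReset() {
    int old = active.getAndUpdate(i -> 1 - i); // switch buffers atomically
    while (inFlight[old].get() > 0) {          // wait for in-flight observers
      Thread.onSpinWait();
    }
    return sums[old].sumThenReset();           // drain the retired buffer
  }
}
```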
I'm familiar with how the prometheus client works and will be incrementally applying techniques where useful / applicable - I'm trying to optimize the perf over there as well. The differences between prom and otel boil down to otel needing to accommodate all combinations of memory mode (immutable, reusable) and temporality (cumulative, delta), where prom client java only has the equivalent of immutable memory mode and cumulative temporality.
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@             Coverage Diff             @@
##               main    #8015    +/-   ##
=========================================
  Coverage     90.17%   90.18%
+ Complexity     7479     7456     -23
=========================================
  Files           834      834
  Lines         22540    22569     +29
  Branches       2236     2241      +5
=========================================
+ Hits          20326    20353     +27
+ Misses         1511     1510      -1
- Partials        703      706      +3
```
…y-java into split-sync-metric-storage
Pull request overview
Refactors synchronous metric storage to separate cumulative vs. delta aggregation temporality implementations, aiming to simplify logic and improve performance under contention (especially for cumulative).
Changes:
- Replaced the monolithic `DefaultSynchronousMetricStorage` implementation with distinct delta and cumulative implementations selected via a factory method.
- Updated `SynchronousMetricStorage.create` to delegate to the new `DefaultSynchronousMetricStorage.create` factory.
- Updated unit tests to construct storage via the factory and adjust internal assertions accordingly.
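The factory-selection shape described above might look roughly like the following. This is a hedged sketch only: the class and method names other than the temporality values are stand-ins, not the SDK's actual signatures.

```java
/** Illustrative only: one implementation per temporality instead of
 *  branching on temporality inside a single monolithic class. */
enum AggregationTemporality { DELTA, CUMULATIVE }

abstract class MetricStorageSketch {
  static MetricStorageSketch create(AggregationTemporality temporality) {
    return temporality == AggregationTemporality.DELTA
        ? new DeltaStorage()
        : new CumulativeStorage();
  }

  abstract String describe();
}

class DeltaStorage extends MetricStorageSketch {
  String describe() {
    return "delta: swaps the series map on collect, coordinates with recorders";
  }
}

class CumulativeStorage extends MetricStorageSketch {
  String describe() {
    return "cumulative: series map lives forever, handles are read in place";
  }
}
```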
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/internal/state/DefaultSynchronousMetricStorage.java | Splits storage logic into delta vs. cumulative implementations and introduces a factory constructor path. |
| sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/internal/state/SynchronousMetricStorage.java | Switches to using the new storage factory rather than directly instantiating the default storage. |
| sdk/metrics/src/test/java/io/opentelemetry/sdk/metrics/internal/state/SynchronousMetricStorageTest.java | Updates tests to use the new factory and adapts assertions around pooling/reset behavior. |
```java
assertThat(storage)
    .extracting("aggregatorHandlePool", as(collection(AggregatorHandle.class)))
    .hasSize(0);
```
These assertions reach into a private implementation detail via reflective field extraction ("aggregatorHandlePool"). This is brittle (renames/refactors won’t be caught by the compiler) and leads to repeated boilerplate throughout the test. Consider reintroducing a small test hook on DefaultSynchronousMetricStorage (e.g., a package-private/VisibleForTesting method that returns the pool size or an optional pool) so tests can assert pooling behavior without relying on reflection, or alternatively assert pooling indirectly via createHandle() call counts and remove the pool-size checks.
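The suggested test hook could be sketched as below. This is an illustration of the reviewer's idea, not code from the PR; the class shape and method name are hypothetical.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Sketch: expose pool size through a package-private accessor so tests
 *  don't reach into the field reflectively; a field rename then breaks
 *  the accessor at compile time instead of silently passing. */
class DeltaStorageTestHookSketch {
  private final Queue<Object> aggregatorHandlePool = new ConcurrentLinkedQueue<>();

  // Package-private, intended for tests only.
  int aggregatorHandlePoolSize() {
    return aggregatorHandlePool.size();
  }

  void returnToPool(Object handle) {
    aggregatorHandlePool.offer(handle);
  }
}
```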
The aggregatorHandlePool field now only exists on the delta storage, and so tests would need to cast the storage to the delta implementation prior to invoking the visible-for-testing method. I don't want to move aggregatorHandlePool down to the base class DefaultSynchronousMetricStorage because it's not a concern for cumulative.
Split out `DefaultSynchronousMetricStorage` into delta and cumulative implementations. This serves two purposes:

- Simplifies the code by eliminating `if (aggregationTemporality == x && memoryMode == y) {..}` style special handling. It's hard to keep it all straight when it's all mashed together.
- Improves performance, especially for cumulative instruments under contention.

Before (From #8000):
After:
I calculate a ~3x improvement for cumulative instruments under contention (i.e. 4-thread cases) and a more modest ~1.2x improvement for cumulative instruments without contention (i.e. 1-thread cases).