feat(transport): add HTTP retry with exponential backoff #1520
```c
opts->traces_sample_rate = 0.0;
opts->max_spans = SENTRY_SPANS_MAX;
opts->handler_strategy = SENTRY_HANDLER_STRATEGY_DEFAULT;
opts->http_retry = true;
```
Question: default to false to avoid behavioral change? On the other hand, Crashpad already retries, so defaulting to true actually makes the behavior consistent across backends.
I generally think retry should be on by default, but the bigger problem is the consequence behind that very basic binary decision.
Retry is what most users expect from other SDKs, since most HTTP clients in higher-level languages retry by default. But those retry mechanisms are often trivial and short-lived (5 attempts with exponential back-off, but not for hours or even half an hour), and they do not account for in-flight requests that persist, so we can continue retrying on the next start.
We can essentially center the discussion on two points:

> default to false to avoid behavioral change
What behavior should a user expect when no retry is occurring? Is this about the storage of inflight envelopes after a crash or a shutdown? Then yes, retry-by-default could be a problem.
> Crashpad already retries, so defaulting to true actually makes the behavior consistent across backends
Yes, but also consider that this affects all events, not only crash events. In the general case, there will be many more regular events than crash events.
This brings us back to the initial point about retry mechanisms: exposing them requires configuration.
If we don't want to expose the retry policy configuration as a first step, we shouldn't default to retrying behind a binary option.
supervacuus left a comment:
I think this is a sensible retry implementation, and the policy makes sense as well. However, as discussed initially, it applies a single policy for all use cases and hides it behind a binary interface.
I think it is highly likely that all currently hard-coded configurations (timeout, attempts, interval, throttle, http-response codes, etc.) will have to be exposed at some point. And I wonder why we wouldn't do that right away.
This PR started with a configurable number of retries, but after discussing it with others, we ended up simplifying it to a boolean because we didn't find these details exposed in other SDKs. For the other parameters, I just decided to follow Crashpad. Now that we have a conversion from Crashpad reports to Sentry envelopes for offline caching, there's an opportunity to disable retries in Crashpad and let sentry-native take care of it in a backend-agnostic way. Then we could also easily make any retry parameters customizable.
- Document the 5-attempt retry limit
- Note there is no rate limiting between attempts
- Warn about potential event loss during extended network outages
…ponses

Only network failures (negative status codes) trigger retries. HTTP responses, including 5xx (500, 502, 503, 504), are discarded.
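The decision described in this commit reduces to a single sign check. A minimal sketch (the function name is illustrative, not sentry-native's actual API):

```c
#include <stdbool.h>

/* Sketch of the retry decision described above: only transport-level
 * failures (reported as negative status codes) are retried; any real
 * HTTP response, including 5xx, is treated as final and discarded. */
static bool should_retry(int status_code)
{
    return status_code < 0;
}
```

This keeps retry semantics unambiguous: a 503 means the server answered, so the envelope is handed to rate limiting or discarded rather than re-sent.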
Bugbot Autofix prepared fixes for 1 of the 1 bugs found in the latest run.
Or push these changes by commenting: Preview (41e84b4fc2)

```diff
diff --git a/src/sentry_retry.c b/src/sentry_retry.c
--- a/src/sentry_retry.c
+++ b/src/sentry_retry.c
@@ -121,13 +121,13 @@
     return sentry__path_join_str(retry->cache_path, filename);
 }

-void
+bool
 sentry__retry_write_envelope(
     sentry_retry_t *retry, const sentry_envelope_t *envelope)
 {
     sentry_uuid_t event_id = sentry__envelope_get_event_id(envelope);
     if (sentry_uuid_is_nil(&event_id)) {
-        return;
+        return false;
     }

     char uuid[37];
@@ -139,9 +139,13 @@
         if (sentry_envelope_write_to_path(envelope, path) != 0) {
             SENTRY_WARNF(
                 "failed to write retry envelope to \"%s\"", path->path);
+            sentry__path_free(path);
+            return false;
         }
         sentry__path_free(path);
+        return true;
     }
+    return false;
 }

 static bool
@@ -379,7 +383,9 @@
     if (sentry__atomic_fetch(&retry->sealed)) {
         return;
     }
-    sentry__retry_write_envelope(retry, envelope);
+    if (!sentry__retry_write_envelope(retry, envelope)) {
+        return;
+    }

     // prevent the startup poll from re-processing this session's envelope
     retry->startup_time = 0;
     if (sentry__atomic_compare_swap(&retry->scheduled, 0, 1)) {
diff --git a/src/sentry_retry.h b/src/sentry_retry.h
--- a/src/sentry_retry.h
+++ b/src/sentry_retry.h
@@ -38,8 +38,9 @@
 /**
  * Writes an event envelope to the retry dir. Non-event envelopes are skipped.
+ * Returns true if a file was written, false otherwise.
  */
-void sentry__retry_write_envelope(
+bool sentry__retry_write_envelope(
     sentry_retry_t *retry, const sentry_envelope_t *envelope);

 /**
```
Could not push Autofix changes. The PR branch may have changed since the Autofix ran, or the Autofix commit may no longer exist.
- Change sentry__retry_write_envelope to return bool indicating success
- Return false for nil event IDs (session envelopes) and write failures
- sentry__retry_enqueue now returns early if write fails, preserving startup_time for session envelopes so retry_flush_task can flush them
Add null check after cloning cache_path to prevent dereferencing null later in sentry__retry_send when iterating directory or joining paths.
When system clock moves backward (now < ts), the condition now >= ts was false, causing the backoff check to be skipped entirely. This made items immediately eligible for retry regardless of their count. Now checks if now < ts (clock skew) OR if backoff hasn't elapsed.
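The fixed eligibility check can be sketched as follows (names and the millisecond-epoch convention are illustrative, not the merged code): an item written at time `ts` is retried only once its backoff has elapsed, and a backward clock jump keeps it waiting instead of releasing it immediately.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the corrected check: a backward clock jump (now < ts)
 * counts as "not yet due" rather than "immediately eligible". */
static bool retry_eligible(uint64_t now, uint64_t ts, uint64_t backoff)
{
    if (now < ts) {
        return false; /* clock moved backward: keep waiting */
    }
    return now - ts >= backoff;
}
```

The old code only tested `now >= ts && now - ts < backoff` to skip an item, so `now < ts` fell through to "eligible", which is exactly the skew case the fix guards against.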
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
Crashpad's kRetryAttempts=5 with `upload_attempts > kRetryAttempts` (checked before post-increment) allows upload_attempts 0-5, giving 6 retries with backoffs: 15m, 30m, 1h, 2h, 4h, 8h. sentry-native's `count + 1 < SENTRY_RETRY_ATTEMPTS` with the old value of 5 only allowed counts 0-3 to be re-enqueued, so the max backoff reached was 4h. Bumping to 6 gives the same 6 retries and the full backoff sequence up to 8h. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
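The arithmetic behind this commit can be sketched in a few lines (constants mirror the discussion, not necessarily the merged code): the n-th retry waits 2ⁿ × 15 minutes, and the `count + 1 < attempts` bound decides whether a failed item is re-enqueued.

```c
#include <stdint.h>

/* Backoff for the n-th retry: 2^n * 15 minutes, i.e.
 * 15m, 30m, 1h, 2h, 4h, 8h for counts 0..5. */
static uint64_t backoff_minutes(unsigned count)
{
    return 15u * ((uint64_t)1 << count);
}

/* Re-enqueue bound as discussed above: with max_attempts = 5 only
 * counts 0..3 pass, capping the backoff at 4h; bumping to 6 allows
 * counts 0..4 and reaches the full 8h backoff on the final retry. */
static int will_reenqueue(unsigned count, unsigned max_attempts)
{
    return count + 1 < max_attempts;
}
```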
Rename sentry__retry_flush to sentry__retry_shutdown and remove the bgworker_flush call so bgworker_shutdown gets the full timeout. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Check the sentry__path_rename return value and log a warning on failure instead of silently ignoring it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move eligible++ after all item fields are populated so a NULL path from allocation failure does not leave a half-initialized item in the array. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```c
sentry_uuid_t event_id = sentry__envelope_get_event_id(envelope);
if (sentry_uuid_is_nil(&event_id)) {
    return false;
}
```
Retry drops session-only envelopes
Medium Severity
sentry__retry_write_envelope only persists envelopes that have a non-nil event_id. Envelopes containing only sessions (and other non-event payloads) will not be written to the retry queue on network failures, so those telemetry items get discarded even when HTTP retry is enabled.
Replace mutable startup_time + sealed with a state enum so that the startup flag is managed via atomic operations instead of clearing a uint64_t that may tear on 32-bit systems. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
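The state machine this commit describes can be modeled with a single atomic word; a hedged sketch using C11 atomics (enum and function names are illustrative, sentry-native uses its own `sentry__atomic_*` helpers):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One atomic int replaces the mutable startup_time + sealed pair, so
 * every transition is a single-word CAS that cannot tear on 32-bit
 * platforms the way a plain uint64_t store can. */
typedef enum {
    RETRY_STATE_STARTUP,
    RETRY_STATE_RUNNING,
    RETRY_STATE_SEALED,
} retry_state_t;

static atomic_int g_retry_state = RETRY_STATE_STARTUP;

/* Leave startup exactly once; a no-op if already running or sealed. */
static bool retry_leave_startup(void)
{
    int expected = RETRY_STATE_STARTUP;
    return atomic_compare_exchange_strong(
        &g_retry_state, &expected, RETRY_STATE_RUNNING);
}

/* Sealing is terminal and unconditional. */
static void retry_seal(void)
{
    atomic_store(&g_retry_state, RETRY_STATE_SEALED);
}
```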
If sentry__retry_new() fails, retry_func was still set on the transport, causing can_retry to return true and envelopes to be dropped instead of cached. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```c
}
sentry__atomic_compare_swap(
    &retry->state, SENTRY_RETRY_STARTUP, SENTRY_RETRY_RUNNING);
if (sentry__atomic_compare_swap(&retry->scheduled, 0, 1)) {
    sentry__bgworker_submit_delayed(retry->bgworker, retry_poll_task, NULL,
```
Bug: A race condition between retry_poll_task and retry_enqueue can cause a new retry item to be queued without a corresponding poll task being scheduled, delaying its processing.
Severity: MEDIUM
Suggested Fix
The race condition can be resolved by modifying retry_poll_task. After setting retry->scheduled to 0, it should re-check if the queue is empty. If new items have been added in the meantime, it should immediately attempt to schedule a new poll task itself by performing the same compare_swap operation used in retry_enqueue. This ensures that even if a race occurs, the system self-corrects and schedules the necessary poll.
Location: src/sentry_retry.c#L408-L412
Potential issue: A race condition exists between the background worker polling for
retries and the function that enqueues new items for retry. Specifically, when the poll
task (`retry_poll_task`) finishes processing the queue, it sets a `scheduled` flag to 0.
However, there is a small window before this flag is set where a new item can be
enqueued by `retry_enqueue`. In this scenario, `retry_enqueue` will see the `scheduled`
flag as 1, assume a poll is active, and fail to schedule a new one. The poll task then
proceeds to set `scheduled` to 0, leaving the newly enqueued item stranded in the queue
with no scheduled task to process it.
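The suggested self-correction can be illustrated with a small single-threaded model (all names are hypothetical; this is not sentry-native's actual code): after the poll task clears `scheduled`, it re-checks the queue and re-submits itself if an enqueue raced in while the flag was still set.

```c
#include <stdatomic.h>

static atomic_int scheduled = 0;     /* 1 while a poll task is pending */
static atomic_int queue_len = 0;     /* items waiting for retry */
static int polls_submitted = 0;      /* counts bgworker submissions */

static void submit_poll(void) { polls_submitted++; }

static void retry_enqueue(void)
{
    atomic_fetch_add(&queue_len, 1);
    int expected = 0;
    if (atomic_compare_exchange_strong(&scheduled, &expected, 1)) {
        submit_poll(); /* no poll pending: schedule one */
    }
}

static void retry_poll_task_finish(void)
{
    atomic_store(&scheduled, 0);
    /* Self-correction: pick up any item enqueued in the window where
     * `scheduled` was still 1 but the queue had already been drained. */
    if (atomic_load(&queue_len) > 0) {
        int expected = 0;
        if (atomic_compare_exchange_strong(&scheduled, &expected, 1)) {
            submit_poll();
        }
    }
}
```

Without the re-check in `retry_poll_task_finish`, an item enqueued between the queue drain and the `scheduled = 0` store would sit in the queue with no poll scheduled, exactly the stranding the report describes.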
Add HTTP retry with exponential backoff for network failures, modeled after Crashpad's upload retry behavior.
Note
Adds an explicit 15s connect timeout to both curl and WinHTTP, matching the value used by Crashpad. This reduces the time the transport worker is blocked on unreachable hosts, previously ~75s on curl (OS TCP SYN retry limit) and 60s on WinHTTP.
Failed envelopes are stored as `<db>/cache/<ts>-<n>-<uuid>.envelope` and retried on startup after a 100ms throttle, and then with exponential backoff (15min, 30min, 1h, 2h, 4h, 8h). When retries are exhausted and offline caching is enabled, envelopes are stored as `<db>/cache/<uuid>.envelope` instead of being discarded.

```mermaid
flowchart TD
    startup --> R{retry?}
    R -->|yes| throttle
    R -->|no| C{cache?}
    throttle -. 100ms .-> resend
    resend -->|success| C
    resend -->|fail| C2[<db>/cache/<br/><ts>-<n>-<uuid>.envelope]
    C2 --> backoff
    backoff -. 2ⁿ×15min .-> resend
    C -->|yes| CACHE[<db>/cache/<br/><uuid>.envelope]
    C -->|no| discard
```

Builds upon:
See also: