
feat: add retry logic #46

Closed
jcabrero wants to merge 1 commit into main from feat/add_retry_logic

Conversation

@jcabrero (Member) commented Jan 23, 2026

The other day, we saw that one of the main potential causes of Inconclusive requests is transient network issues. These can come in many forms (e.g. AMD blocking certificate downloads, GitHub being unavailable, etc.). Such issues are usually transient and can be resolved by retrying after some time.

This PR introduces retry logic to automatically retry functions when required. This is applied to most contract call functions (both read and write) as well as to verification calls.

For read calls, the delay between retries is 5 seconds. For write calls to the blockchain and for verification calls, the retry period is 30 seconds; both can be adjusted through the consts.rs file. A verification is only retried if the result is Inconclusive, because those errors are outside the control of nilCC or the verifier.

With this approach, if a user’s transaction is rejected due to gas costs on a given block, it may succeed on a different block on a subsequent retry 30 seconds later. Similarly, if a request to AMD times out, retries allow time for certificates to be re-downloaded successfully.
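As a rough sketch of the tunables described above: apart from DEFAULT_RETRY_DELAY_SECS and DEFAULT_MAX_RETRY_ATTEMPTS, which appear in the diff below, the names here (READ_RETRY_DELAY, WRITE_RETRY_DELAY, for_reads, for_writes) are illustrative, not necessarily the PR's actual code.

```rust
use std::time::Duration;

// Hypothetical consts.rs entries; only the attempt count and the 30 s
// write delay are stated in the PR, the rest follows the description above.
pub const DEFAULT_MAX_RETRY_ATTEMPTS: u32 = 3;
pub const READ_RETRY_DELAY: Duration = Duration::from_secs(5);
pub const WRITE_RETRY_DELAY: Duration = Duration::from_secs(30);

#[derive(Clone, Copy, Debug)]
pub struct RetryConfig {
    pub max_attempts: u32,
    pub delay: Duration,
}

impl RetryConfig {
    /// Contract reads: short delay, since transient RPC hiccups clear quickly.
    pub fn for_reads() -> Self {
        Self { max_attempts: DEFAULT_MAX_RETRY_ATTEMPTS, delay: READ_RETRY_DELAY }
    }

    /// Writes and verifications: longer delay, to wait out a bad block or a
    /// failed certificate download.
    pub fn for_writes() -> Self {
        Self { max_attempts: DEFAULT_MAX_RETRY_ATTEMPTS, delay: WRITE_RETRY_DELAY }
    }
}
```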

I am not sure this is the best way to approach this, and whether we want to introduce this change. Happy to open a discussion on this.

@jcabrero jcabrero force-pushed the feat/add_retry_logic branch from 196c1f6 to 22cbd94 Compare January 23, 2026 15:28
@jcabrero jcabrero force-pushed the feat/add_retry_logic branch from 22cbd94 to 179d671 Compare January 23, 2026 15:30
@jcabrero jcabrero marked this pull request as ready for review January 23, 2026 15:36
@jimouris (Member)

I haven't followed the latest changes on the contracts, but this will heavily depend on the logic there. Do the contracts expect a response within a certain time limit? They should; otherwise, a node could simply not respond and the HTX would never be acknowledged (either positively or negatively).

I like the retry idea, but it might be better to respond "inconclusive" in time than keep retrying and have the contract assume that you never responded.

cc: @tlitre

@mfontanini (Member)

You need to reply within a limit, yes, I think it's 100 blocks as it is now.

@jcabrero (Member, Author)

I think I read somewhere it was 5 minutes. That's why I chose 30 seconds x 3. Indeed, if it is 100 blocks, then it may be better to adjust the 30 seconds to something smaller.

@mfontanini (Member)

I like the retry idea, but it might be better to respond "inconclusive" in time than keep retrying and have the contract assume that you never responded.

I agree with this. I don't know if we want to keep trying if we got Inconclusive. This is a hard decision: there are many failure types, some of which will be "always inconclusive" while others will only be transient.

@mfontanini (Member) left a comment

I feel a little iffy about this whole thing. I'm not against it; I just feel like we're adding lots of implicit retries, then top-level retries, etc. This may make it hard to understand exactly how long something can take. E.g. if we try to verify and something internally fails, retries, eventually gives up, and then bubbles up, the next level may retry too, internally retrying again, and so on. Maybe I'm seeing this wrong, but I feel like we can't tell exactly how long a verification can take now, given all the inner retries.
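The compounding concern can be made concrete with a rough worst-case bound. The figures below are illustrative, not from the PR: an inner call taking up to 10 s per try, retried 3 times with 30-second delays, wrapped by an outer layer that also retries 3 times with 30-second delays.

```rust
use std::time::Duration;

/// Worst case for one retry layer: `attempts` tries of up to `per_attempt`
/// each, with `attempts - 1` fixed delays in between (multiplier 1.0, no backoff).
pub fn worst_case(attempts: u32, per_attempt: Duration, delay: Duration) -> Duration {
    per_attempt * attempts + delay * (attempts - 1)
}

/// Illustrative nesting: the outer layer retries the whole inner retry loop.
pub fn nested_worst_case() -> (Duration, Duration) {
    // Inner call: 3 tries x 10 s, plus 2 x 30 s delays -> 90 s.
    let inner = worst_case(3, Duration::from_secs(10), Duration::from_secs(30));
    // Outer layer: 3 tries x 90 s, plus 2 x 30 s delays -> 330 s, which
    // already exceeds a 5-minute response window.
    let outer = worst_case(3, inner, Duration::from_secs(30));
    (inner, outer)
}
```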

retry(RetryConfig::for_reads(), "getNodes", || async {
    self.contract
        .getNodes()
        .call()
        .await
})

This is fine as is, but I think there could be a much less verbose way of doing the same thing so you don't need closures everywhere: create an extension trait, e.g. something like this (uncompiled code):

#[async_trait]
pub trait CallBuilderExt<T> {
    async fn call_with_retries(self, config: RetryConfig) -> Result<T, SomeError>;
}

impl <...> CallBuilderExt<...> for CallBuilder<P, D> {
    async fn call_with_retries(self, config: RetryConfig) -> Result<T, SomeError> {
        .... do the actual retries
    }
}


// usage

self.contract
      .getNodes()
      .call_with_retries(RetryConfig::for_reads())
      .await?

We don't have to switch to it (and I'm not sure if you wouldn't hit some annoyance around async here) but just to throw it out there.
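To show the extension-trait shape above in compiling form, here is a synchronous stand-in (the real CallBuilderExt would be async and generic over the contract call builder; RetryConfig's fields and everything else here are illustrative assumptions):

```rust
use std::thread::sleep;
use std::time::Duration;

#[derive(Clone, Copy)]
pub struct RetryConfig {
    pub max_attempts: u32,
    pub delay: Duration,
}

/// Extension trait: any fallible operation gains `.call_with_retries(config)`,
/// so call sites don't need an explicit closure around every call.
pub trait RetryExt<T, E> {
    fn call_with_retries(self, config: RetryConfig) -> Result<T, E>;
}

impl<T, E, F> RetryExt<T, E> for F
where
    F: FnMut() -> Result<T, E>,
{
    fn call_with_retries(mut self, config: RetryConfig) -> Result<T, E> {
        let mut attempt = 1;
        loop {
            match self() {
                Ok(value) => return Ok(value),
                // Out of attempts: surface the last error.
                Err(err) if attempt >= config.max_attempts => return Err(err),
                // Transient failure: wait, then try again.
                Err(_) => {
                    attempt += 1;
                    sleep(config.delay);
                }
            }
        }
    }
}
```

With this, a flaky operation that succeeds on its third call returns Ok after two retries, without any retry plumbing at the call site.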

// =============================================================================

/// Default delay between retry attempts in seconds
pub const DEFAULT_RETRY_DELAY_SECS: u64 = 30;

nit: you can use Duration directly so you don't need the _SECS and the conversion to Duration on use later on
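Concretely, the nit amounts to something like the following (Duration::from_secs is a const fn, so it works in a const item; the _SECS-free name is a suggestion, not code from the PR):

```rust
use std::time::Duration;

/// Default delay between retry attempts.
pub const DEFAULT_RETRY_DELAY: Duration = Duration::from_secs(30);
```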

max_attempts: DEFAULT_MAX_RETRY_ATTEMPTS,
delay: Duration::from_secs(DEFAULT_RETRY_DELAY_SECS),
backoff_multiplier: 1.0,
max_delay: Duration::from_secs(300), // 5 minute cap

To Dimitris' point, I think this goes beyond the 100-block range, but I could be wrong.

@tlitre (Collaborator) commented Jan 23, 2026

I think I read somewhere it was 5 minutes. That's why I chose 30 seconds x 3. Indeed, if it is 100 blocks, then it may be better to adjust the 30 seconds to something smaller.

Yes, it's 5 minutes! Though this is a parameter that's easily changeable by the admin: RESPONSE_WINDOW_SEC.

@jcabrero (Member, Author) commented Feb 5, 2026

I think it's best to close this PR for now. I agree we may be causing unnecessary delays and unpredictable behaviour.

@jcabrero jcabrero closed this Feb 5, 2026
@jcabrero jcabrero deleted the feat/add_retry_logic branch February 5, 2026 10:40
@jcabrero jcabrero restored the feat/add_retry_logic branch February 5, 2026 10:40
@jcabrero jcabrero deleted the feat/add_retry_logic branch February 5, 2026 10:40