**crates/node/src/node.rs** (13 changes: 11 additions & 2 deletions)

```diff
@@ -374,8 +374,8 @@ where
             // Load the safe block hash for both the host and the rollup.
             // The safe block height of the rollup CANNOT be higher than the latest ru height,
             // as we've already processed all the blocks up to the latest ru height.
-            let PairedHeights { host: _, rollup: safe_ru_height } =
-                self.load_safe_block_heights(ru_height)?;
+            let safe_heights = self.load_safe_block_heights(ru_height)?;
+            let safe_ru_height = safe_heights.rollup;
```
**@prestwich (Member), Feb 7, 2026:**

this is just a style change

**Author:**

Yes, this is just refactoring to capture the full `PairedHeights` struct so we can use `safe_heights.host` in the fallback below. Happy to revert to the destructuring style if preferred; I can rename it to make the intent clearer:

```rust
let safe_heights @ PairedHeights { rollup: safe_ru_height, .. } =
    self.load_safe_block_heights(ru_height)?;
```

Or keep the current approach. Let me know your preference.
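For readers following along, a minimal sketch of what `PairedHeights` might look like. The field names come from the destructuring above; the `u64` type and the derives are assumptions, not the real definition:

```rust
/// Hypothetical sketch: field names are taken from the diff above;
/// the height type (`u64`) and derives are assumed, not confirmed.
#[derive(Debug, Clone, Copy)]
pub struct PairedHeights {
    /// Block height on the host (Ethereum) chain.
    pub host: u64,
    /// The paired block height on the rollup chain.
    pub rollup: u64,
}
```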

```diff
             let safe_ru_block_header = self
                 .ru_provider
                 .sealed_header(safe_ru_height)?
@@ -417,8 +417,8 @@ where
             //
             // To do this, we grab the finalized host header to get its height and hash,
             // so we can send the corresponding [`ExExEvent`].
+            //
+            // If finalization isn't available, we fall back to the safe height.
+            // This ensures the WAL is cleared even on networks without finality
+            // reporting (e.g., testnets where the beacon client isn't synced).
             if finalized_ru_block_hash != genesis_ru_hash {
                 self.update_highest_processed_height(finalized_heights.host)?;
+            } else if safe_ru_block_hash != genesis_ru_hash {
+                // Fallback: use safe height when finalized isn't available.
+                // Safe blocks are unlikely to reorg, so this is a reasonable
+                // threshold for clearing the WAL.
+                self.update_highest_processed_height(safe_heights.host)?;
```
**@prestwich (Member):**

In what cases will this fail? Will failures cause difficult-to-recover issues?

**Author:**

Good question. Looking at `update_highest_processed_height`:

Failure cases:

1. `sealed_header(safe_heights.host)?` fails if the DB read errors. It returns `Err`, which propagates up.
2. `.expect("db inconsistent...")` panics if the header doesn't exist. This shouldn't happen, since `load_safe_block_heights` has already validated that these heights exist, but it is a hard crash if the DB is corrupted.
3. `update_exex_head(...)` → `self.host.events.send(...)` fails if the reth node has shut down (channel closed).

Recovery:

- DB errors (1): the node crashes, restarts, and resumes from the last consistent state. No data loss.
- DB inconsistency (2): indicates deeper corruption that needs investigation. Same as the existing behavior.
- Channel closed (3): the node is shutting down anyway; graceful exit.

Is this safe for the fallback path?
The same failure modes exist on the finalized path. The only difference is when we call it: with the safe height instead of the finalized one. If `safe_heights` is valid (which `load_safe_block_heights` ensures), the call should succeed.
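To make the three failure points concrete, a hypothetical sketch of the call path described above. The names come from this thread; the surrounding `impl` is elided, and the actual function in `crates/node/src/node.rs` differs in detail:

```rust
// Hypothetical sketch only; names taken from the discussion above.
// (Method on the node type; the surrounding impl block is elided.)
fn update_highest_processed_height(&mut self, host_height: u64) -> eyre::Result<()> {
    // (1) A DB read error surfaces as `Err` here and propagates to the caller.
    let header = self
        .host_provider
        .sealed_header(host_height)?
        // (2) A missing header is a hard crash. load_safe_block_heights has
        //     already validated the height, so absence implies DB corruption.
        .expect("db inconsistent: header missing for validated height");

    // (3) Sending the ExEx event fails only when the receiving channel is
    //     closed, i.e. the reth node is already shutting down.
    self.update_exex_head(header)?;
    Ok(())
}
```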

However, I see now from the Slack thread that this might be unrelated to the mainnet issue you're seeing. Should we hold off on this PR until the root cause is clearer?

**@prestwich (Member):**

I mean: in what cases will using the safe height to clear the WAL result in issues?

I.e., what is the behavior when Ethereum has a deep reorg? How can a signet node recover from that?

What other edge cases have you considered?

**Author:**

You're right to push on this; I hadn't fully thought through the reorg implications.

Deep reorg scenario:
If we clear the WAL up to safe height X, and Ethereum then reorgs past X:

1. reth sends a revert notification for the reorged blocks.
2. `on_host_revert` tries to unwind the RU chain.
3. But reth may have already pruned the WAL entries we cleared, so we can't replay from the ExEx side.

Why this is problematic:
Safe blocks can reorg (unlike finalized ones). That requires >1/3 malicious validators, but it is not cryptographically impossible. If it happens, we'd be left in an inconsistent state.

Edge cases I missed:

- The beacon client temporarily reporting a stale safe/finalized head (network issues)
- Checkpoint sync completing mid-operation
- The safe head progressing faster than we process during backfill

Conclusion:
You're right: using the safe height for WAL clearing is not safe. The original code is correct in only clearing the WAL for truly finalized blocks; a sketch of that invariant follows below.

The WAL warning on testnets without finality is annoying but not critical: it's a space issue, not a correctness issue. Better to live with disk growth than to risk an unrecoverable state.
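For concreteness, a minimal sketch of the finalized-only rule this conclusion argues for, using the names from the diff above. This is illustrative, not the literal code being restored:

```rust
// Illustrative fragment: only advance the highest processed height
// (and thus allow WAL pruning) for finalized blocks. The safe-height
// fallback from this PR's diff is intentionally absent.
if finalized_ru_block_hash != genesis_ru_hash {
    // Finalized blocks cannot revert without slashing at least 1/3 of
    // stake, so clearing WAL entries up to this height is safe.
    self.update_highest_processed_height(finalized_heights.host)?;
}
// else: no finality information yet. Keep the WAL intact and accept
// disk growth rather than risk an unrecoverable state after a reorg.
```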

Should I close this PR?

```diff
             }

             // Update the RPC's forkchoice timestamp.
```