Enable stStackTests, stCreate2, stCreateTest state tests #2782

pdrobnjak · 2026-01-28T10:33:21Z

Summary

Enable stStackTests: 373/375 tests pass (99.5%), 2 failures skipped
Enable stCreate2: 192/192 tests pass (100%)
Enable stCreateTest: 210/210 tests pass (100%)

Total: 777 additional tests enabled (775 passing, 2 skipped)

Test plan

Verified all three categories pass with skip list applied
stStackTests: 2 result_code failures skipped (underflowTest indices 22, 23)

🤖 Generated with Claude Code

Add test infrastructure for running Ethereum General State Tests:
- Test fixtures archive (fixtures_general_state_tests.tgz)
- Harness package for building and loading test cases
- State test runner (state_test.go, state_harness_test.go)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Update NewStateTestContext to use NewTestWrapperWithSc and NewGigaTestWrapper instead of manually setting removed EvmKeeper fields (GigaExecutorEnabled, GigaOCCEnabled, EvmoneVM). Also fix ModeV2Sequential -> ModeV2withOCC. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add ModeV2Sequential to ExecutorMode constants to provide a V2 execution path with OCC disabled. This gives a true sequential baseline for state test comparisons, as the previous ModeV2withOCC had OCC enabled. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Fix subtest naming bug for i>=10 by using fmt.Sprintf instead of rune arithmetic
- Panic on parseHexBig failure to surface test data issues immediately
- Fix map iteration non-determinism in LoadStateTest and LoadStateTestsFromDir
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
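The first and third fixes in this commit can be illustrated with a minimal Go sketch. It is not the harness code: `buggyName`, `subtestName`, and `sortedKeys` are hypothetical names, and the map is stand-in data. Rune arithmetic such as `'0' + i` only yields a digit for `i` in 0..9, and Go map iteration order is unspecified, so sorting the keys first is one common way to make the order deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// buggyName mimics the rune-arithmetic approach: it is only correct for i in 0..9.
func buggyName(i int) string {
	return string(rune('0' + i))
}

// subtestName formats the index with fmt.Sprintf, which works for any i.
func subtestName(i int) string {
	return fmt.Sprintf("%d", i)
}

// sortedKeys returns the map keys in sorted order so that callers can
// iterate deterministically.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// For i = 10 the rune approach produces ":" instead of "10".
	fmt.Printf("%q vs %q\n", buggyName(10), subtestName(10))

	tests := map[string]int{"b": 2, "a": 1, "c": 3} // stand-in data
	for _, k := range sortedKeys(tests) {
		fmt.Println(k, tests[k])
	}
}
```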

- Add skip list infrastructure (harness/skip.go) to skip specific tests or entire categories via skip_list.json
- Add failure type categorization: result_code, state_mismatch, code_mismatch, nonce_mismatch, error_mismatch, v2_error, giga_error
- Add test summary report with per-category stats and failures by type
- Enhanced logging with detailed diffs for state mismatches
- Gas comparison disabled for now (pending Giga gas accounting finalization)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
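A rough sketch of how failures might be categorized and tallied by type is shown below. The string values are the ones listed in this commit; the Go identifiers, the `Failure` struct, and the helper functions are assumptions made for illustration and may not match the harness.

```go
package main

import (
	"fmt"
	"sort"
)

// FailureType labels why a test failed. The string values come from the
// commit message; the identifier names are assumed.
type FailureType string

const (
	FailureTypeResultCode    FailureType = "result_code"
	FailureTypeStateMismatch FailureType = "state_mismatch"
	FailureTypeCodeMismatch  FailureType = "code_mismatch"
	FailureTypeNonceMismatch FailureType = "nonce_mismatch"
	FailureTypeErrorMismatch FailureType = "error_mismatch"
	FailureTypeV2Error       FailureType = "v2_error"
	FailureTypeGigaError     FailureType = "giga_error"
)

// Failure is a hypothetical record of one failed test.
type Failure struct {
	Category string
	Test     string
	Type     FailureType
}

// countByType tallies failures per failure type.
func countByType(failures []Failure) map[FailureType]int {
	counts := make(map[FailureType]int)
	for _, f := range failures {
		counts[f.Type]++
	}
	return counts
}

// summaryLines renders the tallies sorted by type name for stable output.
func summaryLines(counts map[FailureType]int) []string {
	types := make([]string, 0, len(counts))
	for t := range counts {
		types = append(types, string(t))
	}
	sort.Strings(types)
	lines := make([]string, 0, len(types))
	for _, t := range types {
		lines = append(lines, fmt.Sprintf("%s: %d", t, counts[FailureType(t)]))
	}
	return lines
}

func main() {
	// Illustrative data only; the test-name format is assumed.
	failures := []Failure{
		{Category: "stStackTests", Test: "underflowTest/22", Type: FailureTypeResultCode},
		{Category: "stStackTests", Test: "underflowTest/23", Type: FailureTypeResultCode},
	}
	for _, line := range summaryLines(countByType(failures)) {
		fmt.Println(line)
	}
}
```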

Initial population of the skip list with all test categories to enable a "skip by default, allowlist on pass" workflow for systematically categorizing state test results. Categories include: Cancun, Shanghai, VMTests, and 57 st* categories. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add test mode that runs Giga executor with regular KVStore instead of GigaKVStore. This isolates executor logic from GigaKVStore layer for debugging purposes. Tests pass with this mode, confirming Giga executor logic is correct.
- Add ModeGigaWithRegularStore to ExecutorMode enum
- Add NewGigaTestWrapperWithRegularStore test helper
- Add TestGigaWithRegularStore_StateTests test
- Remove stChainId from skip list for testing
Note: Depends on keeper UseRegularStore changes in separate branch.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Remove stExample from skipped_categories (38/39 tests pass)
- Add solidityExample to skipped_tests (Giga reverts where V2 succeeds)
- Add partial match support for skip patterns (category/shortName)
- Add TestDebugStateTest for verbose single-test debugging
- Add STATE_TEST_NAME filtering to state tests
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
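The skip list with partial matching could look roughly like the sketch below. Only the JSON keys `skipped_categories` and `skipped_tests` and the `category/shortName` pattern shape come from the commits; the flat string-list layout, the `SkipList` type, `parseSkipList`, `ShouldSkip`, and the prefix-based matching rule are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// SkipList mirrors an assumed skip_list.json layout: two flat string lists.
type SkipList struct {
	SkippedCategories []string `json:"skipped_categories"`
	SkippedTests      []string `json:"skipped_tests"`
}

// parseSkipList decodes the JSON document into a SkipList.
func parseSkipList(data []byte) (*SkipList, error) {
	var sl SkipList
	if err := json.Unmarshal(data, &sl); err != nil {
		return nil, err
	}
	return &sl, nil
}

// ShouldSkip reports whether a test is skipped, either because its whole
// category is listed or because a pattern matches the full name exactly or
// as a "category/shortName" prefix (assumed matching rule).
func (s *SkipList) ShouldSkip(category, testName string) bool {
	for _, c := range s.SkippedCategories {
		if c == category {
			return true
		}
	}
	full := category + "/" + testName
	for _, p := range s.SkippedTests {
		if p == full || strings.HasPrefix(full, p+"/") {
			return true
		}
	}
	return false
}

func main() {
	raw := []byte(`{
		"skipped_categories": ["VMTests"],
		"skipped_tests": ["stExample/solidityExample"]
	}`)
	sl, err := parseSkipList(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(sl.ShouldSkip("VMTests", "anyTest"))             // whole category skipped
	fmt.Println(sl.ShouldSkip("stExample", "solidityExample/0")) // partial match
	fmt.Println(sl.ShouldSkip("stExample", "add11"))             // not listed
}
```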

- Set up pre-state in Giga keepers (GigaBankKeeper, GigaEvmKeeper) for Giga mode
- Use GigaEvmKeeper for state comparisons and verification in Giga mode
- Add verifyGigaPostStateWithResult for Giga keeper verification
- Update skip_list.json: 37/39 stExample tests now pass
- Add error logging for failed state comparisons
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The Write() method was clearing the cache after writing dirty entries to the parent store. This caused issues because:
1. The parent store (commitment.Store) writes to a changeSet buffer
2. commitment.Store.Get() reads from the tree, not the changeSet
3. After clearing the cache, reads fell through to parent which couldn't return uncommitted data
Fix: Mark cache entries as clean (non-dirty) instead of clearing them. This preserves readability while still flushing data to parent for eventual commit.
This fixes state test failures where pre-state data was lost after ProcessBlock called WriteGiga().
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
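The behavior described in this commit can be modeled with a toy store. This is a simplified sketch, not the sei-db or cachekv code: `parentStore`, `cacheStore`, and `entry` are hypothetical types, with the parent buffering writes in `pending` while `Get` only sees `committed`.

```go
package main

import "fmt"

// parentStore models a store whose Get cannot see uncommitted writes.
type parentStore struct {
	committed map[string]string // visible to Get
	pending   map[string]string // buffered writes, invisible until a commit
}

func (p *parentStore) Get(k string) (string, bool) {
	v, ok := p.committed[k]
	return v, ok
}

func (p *parentStore) Set(k, v string) { p.pending[k] = v }

// entry is one cached value plus a flag saying whether it still needs flushing.
type entry struct {
	value string
	dirty bool
}

// cacheStore is a write-back cache in front of parentStore.
type cacheStore struct {
	parent *parentStore
	cache  map[string]*entry
}

func newCacheStore() *cacheStore {
	return &cacheStore{
		parent: &parentStore{committed: map[string]string{}, pending: map[string]string{}},
		cache:  map[string]*entry{},
	}
}

func (c *cacheStore) Set(k, v string) { c.cache[k] = &entry{value: v, dirty: true} }

// Get serves from the cache first and falls through to the parent otherwise.
func (c *cacheStore) Get(k string) (string, bool) {
	if e, ok := c.cache[k]; ok {
		return e.value, true
	}
	return c.parent.Get(k)
}

// Write flushes dirty entries to the parent and marks them clean instead of
// deleting them, so later reads still hit the cache.
func (c *cacheStore) Write() {
	for k, e := range c.cache {
		if e.dirty {
			c.parent.Set(k, e.value)
			e.dirty = false
		}
	}
}

func main() {
	s := newCacheStore()
	s.Set("balance/alice", "100")
	s.Write()
	v, ok := s.Get("balance/alice")
	fmt.Println(v, ok) // still readable after Write
}
```

Had `Write` emptied `cache` instead, the final `Get` would fall through to `parentStore.Get` and miss the value, which is the failure mode the commit describes.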

Introduce EvmKeeperInterface and BankKeeperInterface to abstract away the conditional logic for Giga vs V2 keeper access. Add helper methods (EvmKeeper(), BankKeeper(), IsGigaMode()) to StateTestContext that return the appropriate keeper based on execution mode. This eliminates scattered `if isGigaMode` checks and consolidates the duplicate verifyPostStateWithResult/verifyGigaPostStateWithResult functions into a single interface-based implementation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
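As a rough illustration of the pattern, the sketch below selects a keeper behind an interface so that verification code needs no mode checks. `EvmKeeperInterface`, `StateTestContext`, `EvmKeeper()`, and `IsGigaMode()` are names from the commit; the `GetNonce` method, the string addresses, the struct fields, `mapKeeper`, and `verifyNonce` are simplifying assumptions.

```go
package main

import "fmt"

// EvmKeeperInterface abstracts the keeper operations that verification needs.
// The single method here is an assumption for illustration.
type EvmKeeperInterface interface {
	GetNonce(addr string) uint64
}

// mapKeeper is a stand-in implementation backed by a map.
type mapKeeper struct{ nonces map[string]uint64 }

func (k *mapKeeper) GetNonce(addr string) uint64 { return k.nonces[addr] }

// StateTestContext holds both keepers and the execution mode (fields assumed).
type StateTestContext struct {
	gigaMode   bool
	v2Keeper   EvmKeeperInterface
	gigaKeeper EvmKeeperInterface
}

func (c *StateTestContext) IsGigaMode() bool { return c.gigaMode }

// EvmKeeper returns the keeper that matches the execution mode.
func (c *StateTestContext) EvmKeeper() EvmKeeperInterface {
	if c.gigaMode {
		return c.gigaKeeper
	}
	return c.v2Keeper
}

// verifyNonce is a single verification path that works for both modes.
func verifyNonce(c *StateTestContext, addr string, want uint64) error {
	if got := c.EvmKeeper().GetNonce(addr); got != want {
		return fmt.Errorf("nonce mismatch for %s: got %d, want %d", addr, got, want)
	}
	return nil
}

func main() {
	ctx := &StateTestContext{
		gigaMode:   true,
		v2Keeper:   &mapKeeper{nonces: map[string]uint64{"0xabc": 1}},
		gigaKeeper: &mapKeeper{nonces: map[string]uint64{"0xabc": 2}},
	}
	fmt.Println(verifyNonce(ctx, "0xabc", 2)) // uses the Giga keeper
}
```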

Refactor TestGigaVsV2_StateTests and TestGigaWithRegularStore_StateTests to share common code through a parameterized approach:
- Add ComparisonConfig struct with GigaMode and VerifyFixture fields
- Replace runStateTestComparisonWithResult and runV2VsGigaWithRegularStoreComparison with unified runStateTestComparison
- Extract common test iteration logic into runStateTestSuite
- Simplify both test entry points to thin wrappers
This reduces ~90% code duplication between the two test functions while preserving their distinct behaviors (different executor modes and fixture verification settings).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Rename the config field to better describe its purpose (verifying against Ethereum test spec expected post-state). Default to false and make it configurable via VERIFY_ETHEREUM_SPEC environment variable. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
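A minimal sketch of an environment-driven config with a false default follows. The `ComparisonConfig` name, the `GigaMode` field, and the `VERIFY_ETHEREUM_SPEC` variable appear in the commits; the `VerifyEthereumSpec` field name is taken from a later commit, while the string-typed mode, the `configFromEnv` helper, and the exact parsing rule (`== "true"`) are assumptions.

```go
package main

import (
	"fmt"
	"os"
)

// ComparisonConfig parameterizes a comparison run. The real GigaMode field
// holds an executor mode; a string is used here to keep the sketch standalone.
type ComparisonConfig struct {
	GigaMode           string
	VerifyEthereumSpec bool
}

// configFromEnv builds the config from a getenv-style lookup. Verification
// against the Ethereum spec stays off unless the variable is exactly "true"
// (assumed parsing rule).
func configFromEnv(getenv func(string) string) ComparisonConfig {
	return ComparisonConfig{
		GigaMode:           "ModeGigaSequential",
		VerifyEthereumSpec: getenv("VERIFY_ETHEREUM_SPEC") == "true",
	}
}

func main() {
	cfg := configFromEnv(os.Getenv)
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function in, rather than calling `os.Getenv` directly, is only a convenience for testing the default and the override without touching the process environment.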

Add the ability for GigaBankKeeper to use ctx.KVStore() instead of ctx.GigaKVStore() when UseRegularStore flag is set. This mirrors the existing functionality in GigaEvmKeeper.
Changes:
- Add UseRegularStore field to BaseViewKeeper struct
- Add GetKVStore() method that switches between ctx.KVStore() and ctx.GigaKVStore() based on the flag
- Update all direct ctx.GigaKVStore() calls to use k.GetKVStore(ctx)
- Change GigaBankKeeper in App from interface to pointer type to allow the flag to be modified after initialization
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
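The store switch can be sketched as below. This is a simplified model, not the keeper code: the real `ctx.KVStore()` and `ctx.GigaKVStore()` take a store key, and the `Context`, `KVStore`, and `namedStore` types here are hypothetical stand-ins.

```go
package main

import "fmt"

// KVStore is a stand-in for the real store interface.
type KVStore interface {
	Name() string
}

type namedStore string

func (n namedStore) Name() string { return string(n) }

// Context is a stand-in for the SDK context; the real accessors take a store key.
type Context struct {
	regular KVStore
	giga    KVStore
}

func (c Context) KVStore() KVStore     { return c.regular }
func (c Context) GigaKVStore() KVStore { return c.giga }

// BaseViewKeeper carries the flag described in the commit.
type BaseViewKeeper struct {
	UseRegularStore bool
}

// GetKVStore picks the store based on the flag, so call sites never
// reference ctx.GigaKVStore() directly.
func (k *BaseViewKeeper) GetKVStore(ctx Context) KVStore {
	if k.UseRegularStore {
		return ctx.KVStore()
	}
	return ctx.GigaKVStore()
}

func main() {
	ctx := Context{regular: namedStore("regular"), giga: namedStore("giga")}

	// Holding the keeper by pointer lets a test flip the flag after construction.
	keeper := &BaseViewKeeper{}
	fmt.Println(keeper.GetKVStore(ctx).Name())
	keeper.UseRegularStore = true
	fmt.Println(keeper.GetKVStore(ctx).Name())
}
```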

Configure GigaBankKeeper.UseRegularStore = true in NewGigaTestWrapperWithRegularStore so that TestGigaWithRegularStore_StateTests can run without GigaKVStore. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Enable comprehensive verification in state tests:
- Add V2 vs Giga gas comparison (detect gas accounting differences)
- Add V2 vs Giga balance comparison (verify balance changes match)
- Add Ethereum spec balance verification (guarded by VerifyEthereumSpec)
- Add GetBalance to BankKeeperInterface
- Add FailureTypeBalanceMismatch failure type
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Remove mergeTest and eip1559 from the skip list as they now pass. The GASPRICE opcode mismatch has been resolved and both tests produce matching storage values between V2 and Giga executors. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

## Summary
- Merge `TestGigaVsV2_StateTests` and `TestGigaWithRegularStore_StateTests` into a single test
- Default uses GigaStore (`ModeGigaSequential`)
- Set `USE_REGULAR_STORE=true` to use regular KVStore instead
## Test plan
- [x] Run `STATE_TEST_DIR=stExample go test -v -run TestGigaVsV2_StateTests ./giga/tests/...`
- [x] Run `STATE_TEST_DIR=stExample USE_REGULAR_STORE=true go test -v -run TestGigaVsV2_StateTests ./giga/tests/...`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Batch 1: Enable 3 high-pass categories with individual test skips:
- Shanghai: 26/27 passing (1 gas_mismatch skipped)
- stArgsZeroOneBalance: 91/96 passing (5 skipped)
- stTransactionTest: 248/259 passing (11 skipped)
Also update CLAUDE.md with:
- Correct test name format documentation for skipped_tests
- Updated workflow to include commit/push after each batch
- Updated test categories status
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Batch 2: Enable 3 more high-pass categories with individual test skips:
- stSpecialTest: 21/22 passing (1 result_code skipped)
- stSolidityTest: 21/23 passing (2 result_code skipped)
- stNonZeroCallsTest: 21/24 passing (3 gas_mismatch skipped)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Batch 3 (partial): Enable 2 more high-pass categories with individual test skips:
- stRefundTest: 23/26 passing (3 result_code skipped)
- stWalletTest: 41/46 passing (5 skipped: 4 result_code, 1 gas_mismatch)
Note: stEIP150singleCodeGasPrices (450 tests) deferred to separate run
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- stZeroCallsTest: 24/24 passing (100%)
- stStaticFlagEnabled: 34/34 passing (100%)
- stCodeSizeLimit: 9/9 passing (100%)
Total: 67 additional passing tests
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…rationsTest state tests
- stCallCreateCallCodeTest: 56/56 passing (100%)
- stPreCompiledContracts2: 160/160 passing (100%)
- stSystemOperationsTest: 82/83 passing (1 skipped: result_code)
Total: 298 additional passing tests
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- stEIP2930: 110/140 passing (30 skipped: result_code)
Total: 110 additional passing tests
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Fix errcheck: handle file.Close() return value in loader.go and skip.go
- Fix gofmt: correct struct field alignment in types.go
- Fix defer placement: move defer after error check in loader.go
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Fix sync.Once closure capturing stale err variable in GetStateTestsPath
- Fix os.Stat error handling to return actual error instead of ("", nil)
- Remove unused NormalizeTestName function
- Extract magic number 50 to maxFailuresToDisplay constant
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- stRandom: 310/310 passing (100%)
- stRandom2: 221/221 passing (100%)
Total: 531 additional passing tests
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…stTransitionTest state tests
All 5 categories pass with 100% success rate:
- stInitCodeTest: 22/22 tests pass
- stMemExpandingEIP150Calls: 14/14 tests pass
- stEIP3607: 12/12 tests pass
- stBugs: 8/8 tests pass
- stTransitionTest: 6/6 tests pass
Total: 62 new passing tests, 0 skips needed.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Change value-to-pointer assignment for GigaBankKeeper field. The initKeepersWithmAccPerms function returns a value type but app.GigaBankKeeper expects a pointer. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Iteration 5 of state test enablement:
- Cancun: 170/177 tests pass (7 blob tx failures skipped)
- stExtCodeHash: 65/69 tests pass (4 result_code failures skipped)
- stSelfBalance: 42/42 tests pass (100%)
Total new passing tests: 277
Total new skips: 11
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Results:
- stStackTests: 373/375 tests pass (99.5%), 2 failures skipped
- stCreate2: 192/192 tests pass (100%)
- stCreateTest: 210/210 tests pass (100%)
Total: 777 additional tests enabled (775 passing, 2 skipped)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

github-actions · 2026-01-28T10:34:19Z

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

Build	Format	Lint	Breaking	Updated (UTC)
`✅ passed`	`✅ passed`	`✅ passed`	`✅ passed`	Jan 28, 2026, 5:01 PM

codecov · 2026-01-28T10:34:53Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.71%. Comparing base (e94ae27) to head (ebd4449).

Additional details and impacted files

@@                   Coverage Diff                   @@
##           pd/giga-state-tests    #2782      +/-   ##
=======================================================
- Coverage                56.71%   56.71%   -0.01%     
=======================================================
  Files                     2007     2007              
  Lines                   165033   165033              
=======================================================
- Hits                     93602    93593       -9     
- Misses                   63236    63244       +8     
- Partials                  8195     8196       +1

Flag	Coverage Δ
sei-chain	`41.62% <ø> (-0.02%)`	⬇️
sei-cosmos	`48.11% <ø> (-0.01%)`	⬇️
sei-db	`68.72% <ø> (ø)`
sei-tendermint	`58.34% <ø> (+0.02%)`	⬆️

Flags with carried forward coverage won't be shown.
see 23 files with indirect coverage changes

…Test, stMemoryTest state tests
Enable 5 additional state test categories (978 tests total):
- stEIP150Specific: 25/25 (100%) - old estimate was 36%
- stCallCodes: 86/86 (100%) - old estimate was 37%
- stZeroCallsRevert: 16/16 (100%) - old estimate was 0%
- stReturnDataTest: 273/273 (100%) - old estimate was 46%
- stMemoryTest: 578/578 (100%) - old estimate was 44%
All categories pass at 100% with no individual test skips needed. Old estimates were severely stale - actual pass rates far exceed estimates.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- stPreCompiledContracts: 745/745 tests pass (100%)
- stStaticCall: 479/479 tests pass (100%)
- stRevertTest: 241/272 tests pass, 31 failures skipped (result_code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- stMemoryStressTest: 83/83 tests pass (100%) - was estimated at 16%!
- VMTests: updated stats to 311/596 (52%), still too many failures to enable
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

All 3 Homestead categories passed 100% (old estimates were severely stale):
- stCallDelegateCodesCallCodeHomestead: 59/59 (was estimated at 5%)
- stCallDelegateCodesHomestead: 59/59 (was estimated at 9%)
- stDelegatecallTestHomestead: 34/34 (was estimated at 3%)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- stBadOpcode: 4135/4135 tests pass (100%)
- stEIP1559: 1831/1846 tests pass, 15 failures skipped
  - balance_mismatch(9), error_mismatch(5), unknown(1)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

pdrobnjak and others added 30 commits January 23, 2026 11:09

Merge branch 'main' into pd/giga-state-tests

f8facd4

ch

97f1a19

Merge branch 'main' into pd/giga-state-tests

dbf8a0d

Merge branch 'pd/giga-cachekv-fix' into pd/giga-state-tests

3179462

ch

e2a68e9

ch

4a83824

[giga] Enable UseRegularStore for GigaBankKeeper in test wrapper

ad126ff

Merge branch 'main' into pd/giga-state-tests

8f482d3

ch

ef62287

ch

dfc2bc2

ch

4ef9200

ch

a554105

ch

2e87fc0

pdrobnjak and others added 20 commits January 27, 2026 17:18

ch

ffca605

Add CLAUDE.md for state tests documentation

6e873b6

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

ch

7ee3bce

Merge branch 'main' into pd/giga-state-tests

f127d92

Merge branch 'main' into pd/giga-state-tests

68929ef

Enable stZeroCallsTest, stStaticFlagEnabled, stCodeSizeLimit state tests

4c4142d

Enable stEIP2930 state tests

aaa22d0

Enable stRandom, stRandom2 state tests

c45a914

Fix type mismatch in xbank keeper tests

6452263

Fix gofmt formatting in giga_test.go

587a446

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add CLAUDE.md with Go formatting guidelines

e94ae27

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

pdrobnjak changed the base branch from main to pd/giga-state-tests January 28, 2026 10:34

pdrobnjak and others added 6 commits January 28, 2026 12:00

Enable stZeroKnowledge, stZeroKnowledge2, stSStoreTest state tests

b97735d

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Enable stMemoryStressTest state tests

c8f9fbc

Enable stBadOpcode, stEIP1559 state tests

ebd4449

Base automatically changed from pd/giga-state-tests to main January 29, 2026 09:34
