@pdrobnjak
Contributor

Summary

  • Enable stStackTests: 373/375 tests pass (99.5%), 2 failures skipped
  • Enable stCreate2: 192/192 tests pass (100%)
  • Enable stCreateTest: 210/210 tests pass (100%)

Total: 777 additional tests enabled (775 passing, 2 skipped)

Test plan

  • Verified all three categories pass with skip list applied
  • stStackTests: 2 result_code failures skipped (underflowTest indices 22, 23)

🤖 Generated with Claude Code

pdrobnjak and others added 30 commits January 23, 2026 11:09
Add test infrastructure for running Ethereum General State Tests:
- Test fixtures archive (fixtures_general_state_tests.tgz)
- Harness package for building and loading test cases
- State test runner (state_test.go, state_harness_test.go)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update NewStateTestContext to use NewTestWrapperWithSc and
NewGigaTestWrapper instead of manually setting removed EvmKeeper
fields (GigaExecutorEnabled, GigaOCCEnabled, EvmoneVM).

Also fix ModeV2Sequential -> ModeV2withOCC.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add ModeV2Sequential to ExecutorMode constants to provide a V2 execution
path with OCC disabled. This gives a true sequential baseline for state
test comparisons, as the previous ModeV2withOCC had OCC enabled.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix subtest naming bug for i>=10 by using fmt.Sprintf instead of rune arithmetic
- Panic on parseHexBig failure to surface test data issues immediately
- Fix map iteration non-determinism in LoadStateTest and LoadStateTestsFromDir

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
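
To illustrate the first and third fixes in the commit above, here is a minimal sketch. The helper names (`buggyName`, `fixedName`, `sortedKeys`) and the `index_` prefix are hypothetical and are not taken from the harness; the sketch only shows why rune arithmetic breaks at index 10 and how sorting keys makes map iteration deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// buggyName builds a subtest suffix with rune arithmetic. It is only
// correct for i in 0..9; for i == 10 it yields ":" instead of "10".
func buggyName(i int) string {
	return "index_" + string(rune('0'+i))
}

// fixedName formats the index as a decimal number, so it works for any i.
func fixedName(i int) string {
	return fmt.Sprintf("index_%d", i)
}

// sortedKeys returns the map keys in sorted order so that iteration
// (and therefore test ordering) is the same on every run.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	fmt.Println(buggyName(9), buggyName(10)) // index_9 index_:
	fmt.Println(fixedName(9), fixedName(10)) // index_9 index_10
	fmt.Println(sortedKeys(map[string]int{"b": 2, "a": 1, "c": 3})) // [a b c]
}
```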
- Add skip list infrastructure (harness/skip.go) to skip specific tests
  or entire categories via skip_list.json
- Add failure type categorization: result_code, state_mismatch,
  code_mismatch, nonce_mismatch, error_mismatch, v2_error, giga_error
- Add test summary report with per-category stats and failures by type
- Enhanced logging with detailed diffs for state mismatches
- Gas comparison disabled for now (pending Giga gas accounting finalization)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Initial population of the skip list with all test categories to enable
a "skip by default, allowlist on pass" workflow for systematically
categorizing state test results.

Categories include: Cancun, Shanghai, VMTests, and 57 st* categories.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add test mode that runs Giga executor with regular KVStore instead of
GigaKVStore. This isolates executor logic from GigaKVStore layer for
debugging purposes. Tests pass with this mode, confirming Giga executor
logic is correct.

- Add ModeGigaWithRegularStore to ExecutorMode enum
- Add NewGigaTestWrapperWithRegularStore test helper
- Add TestGigaWithRegularStore_StateTests test
- Remove stChainId from skip list for testing

Note: Depends on keeper UseRegularStore changes in separate branch.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove stExample from skipped_categories (38/39 tests pass)
- Add solidityExample to skipped_tests (Giga reverts where V2 succeeds)
- Add partial match support for skip patterns (category/shortName)
- Add TestDebugStateTest for verbose single-test debugging
- Add STATE_TEST_NAME filtering to state tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
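
A rough sketch of how a skip list with partial matching might work. Only the key names `skipped_categories` and `skipped_tests` come from the commit messages; the JSON layout, the `SkipList` type, `ShouldSkip`, and the prefix-based interpretation of "partial match" are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// SkipList mirrors a hypothetical skip_list.json layout.
type SkipList struct {
	SkippedCategories []string `json:"skipped_categories"`
	SkippedTests      []string `json:"skipped_tests"`
}

// parseSkipList decodes the JSON document into a SkipList.
func parseSkipList(data []byte) (SkipList, error) {
	var s SkipList
	err := json.Unmarshal(data, &s)
	return s, err
}

// ShouldSkip reports whether a test is skipped, either because its whole
// category is listed or because a pattern is a prefix of "category/name"
// (assumed meaning of a category/shortName partial match).
func (s SkipList) ShouldSkip(category, name string) bool {
	for _, c := range s.SkippedCategories {
		if c == category {
			return true
		}
	}
	full := category + "/" + name
	for _, p := range s.SkippedTests {
		if strings.HasPrefix(full, p) {
			return true
		}
	}
	return false
}

func main() {
	raw := []byte(`{
		"skipped_categories": ["VMTests"],
		"skipped_tests": ["stExample/solidityExample"]
	}`)
	s, err := parseSkipList(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(s.ShouldSkip("VMTests", "anything"))            // true
	fmt.Println(s.ShouldSkip("stExample", "solidityExample_0")) // true
	fmt.Println(s.ShouldSkip("stExample", "add11"))             // false
}
```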
- Set up pre-state in Giga keepers (GigaBankKeeper, GigaEvmKeeper) for Giga mode
- Use GigaEvmKeeper for state comparisons and verification in Giga mode
- Add verifyGigaPostStateWithResult for Giga keeper verification
- Update skip_list.json: 37/39 stExample tests now pass
- Add error logging for failed state comparisons

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The Write() method was clearing the cache after writing dirty entries to
the parent store. This caused issues because:

1. The parent store (commitment.Store) writes to a changeSet buffer
2. commitment.Store.Get() reads from the tree, not the changeSet
3. After clearing the cache, reads fell through to parent which couldn't
   return uncommitted data

Fix: Mark cache entries as clean (non-dirty) instead of clearing them.
This preserves readability while still flushing data to parent for
eventual commit.

This fixes state test failures where pre-state data was lost after
ProcessBlock called WriteGiga().

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
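
The fix described above can be sketched with toy types. `parentStore` and `cacheStore` are hypothetical stand-ins, not the real commitment.Store or cache store; the parent is modeled so that `Get` only sees committed data, which is the behavior the commit message describes.

```go
package main

import "fmt"

// parentStore is a toy parent whose Get reads only committed data,
// while Set lands in a pending buffer (like a changeSet).
type parentStore struct {
	committed map[string]string
	pending   map[string]string
}

func newParentStore() *parentStore {
	return &parentStore{committed: map[string]string{}, pending: map[string]string{}}
}

func (p *parentStore) Set(k, v string) { p.pending[k] = v }

func (p *parentStore) Get(k string) (string, bool) {
	v, ok := p.committed[k]
	return v, ok
}

type entry struct {
	value string
	dirty bool
}

// cacheStore buffers writes in front of the parent.
type cacheStore struct {
	parent *parentStore
	cache  map[string]*entry
}

func newCacheStore(p *parentStore) *cacheStore {
	return &cacheStore{parent: p, cache: map[string]*entry{}}
}

func (c *cacheStore) Set(k, v string) { c.cache[k] = &entry{value: v, dirty: true} }

func (c *cacheStore) Get(k string) (string, bool) {
	if e, ok := c.cache[k]; ok {
		return e.value, true
	}
	return c.parent.Get(k)
}

// Write flushes dirty entries to the parent and marks them clean instead
// of deleting them, so uncommitted values stay readable from the cache.
func (c *cacheStore) Write() {
	for k, e := range c.cache {
		if e.dirty {
			c.parent.Set(k, e.value)
			e.dirty = false
		}
	}
}

func main() {
	parent := newParentStore()
	c := newCacheStore(parent)
	c.Set("acct", "100")
	c.Write()

	v, ok := c.Get("acct")
	fmt.Println(v, ok) // 100 true

	_, inTree := parent.Get("acct")
	fmt.Println(inTree) // false: the parent alone cannot return uncommitted data
}
```

Had `Write` cleared the cache, the first lookup would have fallen through to the parent and missed.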
Introduce EvmKeeperInterface and BankKeeperInterface to abstract away
the conditional logic for Giga vs V2 keeper access. Add helper methods
(EvmKeeper(), BankKeeper(), IsGigaMode()) to StateTestContext that
return the appropriate keeper based on execution mode.

This eliminates scattered `if isGigaMode` checks and consolidates the
duplicate verifyPostStateWithResult/verifyGigaPostStateWithResult
functions into a single interface-based implementation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
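
A minimal sketch of the interface-based approach, assuming a much smaller method set than the real keepers have. `BankKeeperInterface`, `StateTestContext`, `BankKeeper()`, and `IsGigaMode()` are named in the commit message; the `GetBalance` signature, the toy keeper types, and `verifyBalance` are hypothetical.

```go
package main

import "fmt"

// BankKeeperInterface is reduced here to a single assumed method.
type BankKeeperInterface interface {
	GetBalance(addr string) int64
}

// Toy keepers standing in for the V2 and Giga bank keepers.
type v2BankKeeper struct{ balances map[string]int64 }
type gigaBankKeeper struct{ balances map[string]int64 }

func (k *v2BankKeeper) GetBalance(addr string) int64   { return k.balances[addr] }
func (k *gigaBankKeeper) GetBalance(addr string) int64 { return k.balances[addr] }

// StateTestContext picks the keeper once, based on execution mode.
type StateTestContext struct {
	gigaMode bool
	v2Bank   *v2BankKeeper
	gigaBank *gigaBankKeeper
}

func (c *StateTestContext) IsGigaMode() bool { return c.gigaMode }

func (c *StateTestContext) BankKeeper() BankKeeperInterface {
	if c.gigaMode {
		return c.gigaBank
	}
	return c.v2Bank
}

// verifyBalance is a single implementation that works in both modes,
// with no mode check of its own.
func verifyBalance(c *StateTestContext, addr string, want int64) error {
	if got := c.BankKeeper().GetBalance(addr); got != want {
		return fmt.Errorf("balance mismatch for %s: got %d, want %d", addr, got, want)
	}
	return nil
}

func main() {
	ctx := &StateTestContext{
		gigaMode: true,
		v2Bank:   &v2BankKeeper{balances: map[string]int64{"0xabc": 1}},
		gigaBank: &gigaBankKeeper{balances: map[string]int64{"0xabc": 2}},
	}
	fmt.Println(verifyBalance(ctx, "0xabc", 2)) // <nil>
	ctx.gigaMode = false
	fmt.Println(verifyBalance(ctx, "0xabc", 2) != nil) // true
}
```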
Refactor TestGigaVsV2_StateTests and TestGigaWithRegularStore_StateTests
to share common code through a parameterized approach:

- Add ComparisonConfig struct with GigaMode and VerifyFixture fields
- Replace runStateTestComparisonWithResult and
  runV2VsGigaWithRegularStoreComparison with unified runStateTestComparison
- Extract common test iteration logic into runStateTestSuite
- Simplify both test entry points to thin wrappers

This removes roughly 90% of the code duplicated between the two test functions while
preserving their distinct behaviors (different executor modes and
fixture verification settings).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Rename the config field to better describe its purpose (verifying against
Ethereum test spec expected post-state). Default to false and make it
configurable via VERIFY_ETHEREUM_SPEC environment variable.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add the ability for GigaBankKeeper to use ctx.KVStore() instead of
ctx.GigaKVStore() when UseRegularStore flag is set. This mirrors the
existing functionality in GigaEvmKeeper.

Changes:
- Add UseRegularStore field to BaseViewKeeper struct
- Add GetKVStore() method that switches between ctx.KVStore() and
  ctx.GigaKVStore() based on the flag
- Update all direct ctx.GigaKVStore() calls to use k.GetKVStore(ctx)
- Change GigaBankKeeper in App from interface to pointer type to allow
  the flag to be modified after initialization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
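
The store switch and the pointer change can be sketched as follows. `BaseViewKeeper`, `UseRegularStore`, `GetKVStore`, and `App.GigaBankKeeper` are named in the commit message; `Context`, `KVStore`, and `namedStore` are simplified stand-ins (the real context methods take a store key), so treat this as an illustration only.

```go
package main

import "fmt"

// KVStore is a toy store interface; Name only identifies which store was chosen.
type KVStore interface{ Name() string }

type namedStore string

func (s namedStore) Name() string { return string(s) }

// Context is a simplified stand-in for the SDK context.
type Context struct{}

func (Context) KVStore() KVStore     { return namedStore("regular") }
func (Context) GigaKVStore() KVStore { return namedStore("giga") }

// BaseViewKeeper carries the flag that selects the backing store.
type BaseViewKeeper struct {
	UseRegularStore bool
}

// GetKVStore is the single place that decides which store to use.
func (k *BaseViewKeeper) GetKVStore(ctx Context) KVStore {
	if k.UseRegularStore {
		return ctx.KVStore()
	}
	return ctx.GigaKVStore()
}

// App holds the keeper by pointer so a test helper can flip the flag
// after initialization.
type App struct {
	GigaBankKeeper *BaseViewKeeper
}

func main() {
	app := &App{GigaBankKeeper: &BaseViewKeeper{}}
	fmt.Println(app.GigaBankKeeper.GetKVStore(Context{}).Name()) // giga

	app.GigaBankKeeper.UseRegularStore = true
	fmt.Println(app.GigaBankKeeper.GetKVStore(Context{}).Name()) // regular
}
```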
Configure GigaBankKeeper.UseRegularStore = true in
NewGigaTestWrapperWithRegularStore so that
TestGigaWithRegularStore_StateTests can run without GigaKVStore.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enable comprehensive verification in state tests:
- Add V2 vs Giga gas comparison (detect gas accounting differences)
- Add V2 vs Giga balance comparison (verify balance changes match)
- Add Ethereum spec balance verification (guarded by VerifyEthereumSpec)
- Add GetBalance to BankKeeperInterface
- Add FailureTypeBalanceMismatch failure type

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove mergeTest and eip1559 from the skip list as they now pass.
The GASPRICE opcode mismatch has been resolved and both tests
produce matching storage values between V2 and Giga executors.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
- Merge `TestGigaVsV2_StateTests` and `TestGigaWithRegularStore_StateTests` into a single test
- Default uses GigaStore (`ModeGigaSequential`)
- Set `USE_REGULAR_STORE=true` to use regular KVStore instead

## Test plan
- [x] Run `STATE_TEST_DIR=stExample go test -v -run TestGigaVsV2_StateTests ./giga/tests/...`
- [x] Run `STATE_TEST_DIR=stExample USE_REGULAR_STORE=true go test -v -run TestGigaVsV2_StateTests ./giga/tests/...`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
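
A hedged sketch of how the `USE_REGULAR_STORE` and `VERIFY_ETHEREUM_SPEC` environment variables might be turned into a config. The field names on `ComparisonConfig`, the `envBool` helper, and the use of `strconv.ParseBool` are assumptions; the commits do not say how the harness parses these values, only that both options default to off.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// ComparisonConfig here holds only the two env-driven switches;
// the field names are hypothetical.
type ComparisonConfig struct {
	UseRegularStore    bool
	VerifyEthereumSpec bool
}

// envBool returns false when the variable is unset or not a valid boolean.
// The lookup function is injected so the logic can be tested without
// touching the process environment.
func envBool(lookup func(string) (string, bool), key string) bool {
	raw, ok := lookup(key)
	if !ok {
		return false
	}
	v, err := strconv.ParseBool(raw)
	return err == nil && v
}

// configFromEnv builds the config from the two documented variables.
func configFromEnv(lookup func(string) (string, bool)) ComparisonConfig {
	return ComparisonConfig{
		UseRegularStore:    envBool(lookup, "USE_REGULAR_STORE"),
		VerifyEthereumSpec: envBool(lookup, "VERIFY_ETHEREUM_SPEC"),
	}
}

func main() {
	cfg := configFromEnv(os.LookupEnv)
	fmt.Printf("%+v\n", cfg)
}
```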
pdrobnjak and others added 20 commits January 27, 2026 17:18
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Batch 1: Enable 3 high-pass categories with individual test skips:
- Shanghai: 26/27 passing (1 gas_mismatch skipped)
- stArgsZeroOneBalance: 91/96 passing (5 skipped)
- stTransactionTest: 248/259 passing (11 skipped)

Also update CLAUDE.md with:
- Correct test name format documentation for skipped_tests
- Updated workflow to include commit/push after each batch
- Updated test categories status

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Batch 2: Enable 3 more high-pass categories with individual test skips:
- stSpecialTest: 21/22 passing (1 result_code skipped)
- stSolidityTest: 21/23 passing (2 result_code skipped)
- stNonZeroCallsTest: 21/24 passing (3 gas_mismatch skipped)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Batch 3 (partial): Enable 2 more high-pass categories with individual test skips:
- stRefundTest: 23/26 passing (3 result_code skipped)
- stWalletTest: 41/46 passing (5 skipped: 4 result_code, 1 gas_mismatch)

Note: stEIP150singleCodeGasPrices (450 tests) deferred to separate run

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- stZeroCallsTest: 24/24 passing (100%)
- stStaticFlagEnabled: 34/34 passing (100%)
- stCodeSizeLimit: 9/9 passing (100%)

Total: 67 additional passing tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…rationsTest state tests

- stCallCreateCallCodeTest: 56/56 passing (100%)
- stPreCompiledContracts2: 160/160 passing (100%)
- stSystemOperationsTest: 82/83 passing (1 skipped: result_code)

Total: 298 additional passing tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- stEIP2930: 110/140 passing (30 skipped: result_code)

Total: 110 additional passing tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix errcheck: handle file.Close() return value in loader.go and skip.go
- Fix gofmt: correct struct field alignment in types.go
- Fix defer placement: move defer after error check in loader.go

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
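
For readers unfamiliar with these lint findings, a minimal sketch of the pattern follows. `loadFile` is a hypothetical function, not the code in loader.go; it shows the deferred close placed after the error check and the `Close` return value being handled rather than dropped.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// loadFile reads a whole file. The defer is registered only after Open
// succeeds, and a Close error is surfaced if nothing else failed first.
func loadFile(path string) (data []byte, err error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err // f is nil here, so there is nothing to close
	}
	defer func() {
		if cerr := f.Close(); cerr != nil && err == nil {
			err = cerr
		}
	}()
	return io.ReadAll(f)
}

func main() {
	if _, err := loadFile("does-not-exist.json"); err != nil {
		fmt.Println("open failed:", err)
	}
}
```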
- Fix sync.Once closure capturing stale err variable in GetStateTestsPath
- Fix os.Stat error handling to return actual error instead of ("", nil)
- Remove unused NormalizeTestName function
- Extract magic number 50 to maxFailuresToDisplay constant

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- stRandom: 310/310 passing (100%)
- stRandom2: 221/221 passing (100%)

Total: 531 additional passing tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…stTransitionTest state tests

All 5 categories pass with 100% success rate:
- stInitCodeTest: 22/22 tests pass
- stMemExpandingEIP150Calls: 14/14 tests pass
- stEIP3607: 12/12 tests pass
- stBugs: 8/8 tests pass
- stTransitionTest: 6/6 tests pass

Total: 62 new passing tests, 0 skips needed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change value-to-pointer assignment for GigaBankKeeper field.
The initKeepersWithmAccPerms function returns a value type but
app.GigaBankKeeper expects a pointer.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Iteration 5 of state test enablement:
- Cancun: 170/177 tests pass (7 blob tx failures skipped)
- stExtCodeHash: 65/69 tests pass (4 result_code failures skipped)
- stSelfBalance: 42/42 tests pass (100%)

Total new passing tests: 277
Total new skips: 11

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Results:
- stStackTests: 373/375 tests pass (99.5%), 2 failures skipped
- stCreate2: 192/192 tests pass (100%)
- stCreateTest: 210/210 tests pass (100%)

Total: 777 additional tests enabled (775 passing, 2 skipped)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@pdrobnjak pdrobnjak changed the base branch from main to pd/giga-state-tests January 28, 2026 10:34
@github-actions

github-actions bot commented Jan 28, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | Jan 28, 2026, 5:01 PM |

@codecov

codecov bot commented Jan 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.71%. Comparing base (e94ae27) to head (ebd4449).

Additional details and impacted files

@@                   Coverage Diff                   @@
##           pd/giga-state-tests    #2782      +/-   ##
=======================================================
- Coverage                56.71%   56.71%   -0.01%     
=======================================================
  Files                     2007     2007              
  Lines                   165033   165033              
=======================================================
- Hits                     93602    93593       -9     
- Misses                   63236    63244       +8     
- Partials                  8195     8196       +1     
| Flag | Coverage Δ |
| --- | --- |
| sei-chain | 41.62% <ø> (-0.02%) ⬇️ |
| sei-cosmos | 48.11% <ø> (-0.01%) ⬇️ |
| sei-db | 68.72% <ø> (ø) |
| sei-tendermint | 58.34% <ø> (+0.02%) ⬆️ |

Flags with carried forward coverage won't be shown.
see 23 files with indirect coverage changes


pdrobnjak and others added 6 commits January 28, 2026 12:00
…Test, stMemoryTest state tests

Enable 5 additional state test categories (978 tests total):
- stEIP150Specific: 25/25 (100%) - old estimate was 36%
- stCallCodes: 86/86 (100%) - old estimate was 37%
- stZeroCallsRevert: 16/16 (100%) - old estimate was 0%
- stReturnDataTest: 273/273 (100%) - old estimate was 46%
- stMemoryTest: 578/578 (100%) - old estimate was 44%

All categories pass at 100% with no individual test skips needed.
Old estimates were severely stale - actual pass rates far exceed estimates.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- stPreCompiledContracts: 745/745 tests pass (100%)
- stStaticCall: 479/479 tests pass (100%)
- stRevertTest: 241/272 tests pass, 31 failures skipped (result_code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- stMemoryStressTest: 83/83 tests pass (100%) - was estimated at 16%!
- VMTests: updated stats to 311/596 (52%), still too many failures to enable

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All 3 Homestead categories passed 100% (old estimates were severely stale):
- stCallDelegateCodesCallCodeHomestead: 59/59 (was estimated at 5%)
- stCallDelegateCodesHomestead: 59/59 (was estimated at 9%)
- stDelegatecallTestHomestead: 34/34 (was estimated at 3%)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- stBadOpcode: 4135/4135 tests pass (100%)
- stEIP1559: 1831/1846 tests pass, 15 failures skipped
  - balance_mismatch(9), error_mismatch(5), unknown(1)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Base automatically changed from pd/giga-state-tests to main January 29, 2026 09:34