Enable stStackTests, stCreate2, stCreateTest state tests #2782
Open
pdrobnjak wants to merge 65 commits into main from pd/giga-state-tests-iteration6
Add test infrastructure for running Ethereum General State Tests:
- Test fixtures archive (fixtures_general_state_tests.tgz)
- Harness package for building and loading test cases
- State test runner (state_test.go, state_harness_test.go)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update NewStateTestContext to use NewTestWrapperWithSc and NewGigaTestWrapper instead of manually setting removed EvmKeeper fields (GigaExecutorEnabled, GigaOCCEnabled, EvmoneVM). Also fix ModeV2Sequential -> ModeV2withOCC. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add ModeV2Sequential to ExecutorMode constants to provide a V2 execution path with OCC disabled. This gives a true sequential baseline for state test comparisons, as the previous ModeV2withOCC had OCC enabled. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix subtest naming bug for i>=10 by using fmt.Sprintf instead of rune arithmetic
- Panic on parseHexBig failure to surface test data issues immediately
- Fix map iteration non-determinism in LoadStateTest and LoadStateTestsFromDir
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
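The first and third fixes in this commit can be illustrated with a small sketch. The function names below (`runeName`, `sprintfName`, `sortedKeys`) are hypothetical and are not taken from the harness; the sketch only shows why adding an index to the rune `'0'` stops producing digits at 10, and how sorting keys gives a stable iteration order over a Go map.

```go
package main

import (
	"fmt"
	"sort"
)

// runeName reproduces the kind of bug described above (hypothetical helper):
// adding an index to the rune '0' only yields a decimal digit for 0..9.
func runeName(i int) string {
	return string(rune('0' + i))
}

// sprintfName formats the index as a decimal string, which works for any i.
func sprintfName(i int) string {
	return fmt.Sprintf("%d", i)
}

// sortedKeys returns the keys of m in lexical order. Go randomizes map
// iteration order, so sorting the keys first makes iteration deterministic.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// For i = 10 the rune version yields ":" (the character after '9').
	fmt.Println(runeName(10), sprintfName(10))
	for _, k := range sortedKeys(map[string]int{"b": 2, "a": 1, "c": 3}) {
		fmt.Println(k)
	}
}
```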
- Add skip list infrastructure (harness/skip.go) to skip specific tests or entire categories via skip_list.json
- Add failure type categorization: result_code, state_mismatch, code_mismatch, nonce_mismatch, error_mismatch, v2_error, giga_error
- Add test summary report with per-category stats and failures by type
- Enhanced logging with detailed diffs for state mismatches
- Gas comparison disabled for now (pending Giga gas accounting finalization)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
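A skip list of this kind could be loaded along the following lines. This is a minimal sketch, not the contents of harness/skip.go: the keys `skipped_categories` and `skipped_tests` are the ones mentioned in later commits on this PR, but the exact JSON layout, the `SkipList` type, and the `ParseSkipList` and `ShouldSkip` helpers are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SkipList mirrors an assumed skip_list.json layout with two string arrays.
type SkipList struct {
	SkippedCategories []string `json:"skipped_categories"`
	SkippedTests      []string `json:"skipped_tests"`
}

// ParseSkipList decodes the JSON document into a SkipList (hypothetical helper).
func ParseSkipList(data []byte) (*SkipList, error) {
	var s SkipList
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, fmt.Errorf("parse skip list: %w", err)
	}
	return &s, nil
}

// ShouldSkip reports whether the whole category or the exact test name is listed.
func (s *SkipList) ShouldSkip(category, testName string) bool {
	for _, c := range s.SkippedCategories {
		if c == category {
			return true
		}
	}
	for _, t := range s.SkippedTests {
		if t == testName {
			return true
		}
	}
	return false
}

func main() {
	raw := []byte(`{"skipped_categories": ["VMTests"], "skipped_tests": ["stExample/solidityExample"]}`)
	sl, err := ParseSkipList(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sl.ShouldSkip("VMTests", "VMTests/someTest"))
	fmt.Println(sl.ShouldSkip("stExample", "stExample/otherTest"))
}
```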
Initial population of the skip list with all test categories to enable a "skip by default, allowlist on pass" workflow for systematically categorizing state test results. Categories include: Cancun, Shanghai, VMTests, and 57 st* categories. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add test mode that runs Giga executor with regular KVStore instead of GigaKVStore. This isolates executor logic from GigaKVStore layer for debugging purposes. Tests pass with this mode, confirming Giga executor logic is correct.
- Add ModeGigaWithRegularStore to ExecutorMode enum
- Add NewGigaTestWrapperWithRegularStore test helper
- Add TestGigaWithRegularStore_StateTests test
- Remove stChainId from skip list for testing
Note: Depends on keeper UseRegularStore changes in separate branch.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove stExample from skipped_categories (38/39 tests pass)
- Add solidityExample to skipped_tests (Giga reverts where V2 succeeds)
- Add partial match support for skip patterns (category/shortName)
- Add TestDebugStateTest for verbose single-test debugging
- Add STATE_TEST_NAME filtering to state tests
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
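Partial matching for a category/shortName pattern might look like the sketch below. The commit does not spell out the full test name format, so this assumes, purely for illustration, that full names extend the pattern with further "/"-separated segments (for example an index); `matchesSkipPattern` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesSkipPattern reports whether pattern selects fullName. A pattern
// matches when it equals the full name or is a whole-segment prefix of it.
// The "/"-segmented name format is an assumption, not taken from the harness.
func matchesSkipPattern(pattern, fullName string) bool {
	if pattern == fullName {
		return true
	}
	return strings.HasPrefix(fullName, pattern+"/")
}

func main() {
	fmt.Println(matchesSkipPattern("stExample/solidityExample", "stExample/solidityExample/0"))
	fmt.Println(matchesSkipPattern("stExample/solidity", "stExample/solidityExample/0"))
}
```

Matching on whole segments rather than raw string prefixes keeps a pattern such as `stExample/solidity` from accidentally skipping `stExample/solidityExample`.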
- Set up pre-state in Giga keepers (GigaBankKeeper, GigaEvmKeeper) for Giga mode
- Use GigaEvmKeeper for state comparisons and verification in Giga mode
- Add verifyGigaPostStateWithResult for Giga keeper verification
- Update skip_list.json: 37/39 stExample tests now pass
- Add error logging for failed state comparisons
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The Write() method was clearing the cache after writing dirty entries to the parent store. This caused issues because:
1. The parent store (commitment.Store) writes to a changeSet buffer
2. commitment.Store.Get() reads from the tree, not the changeSet
3. After clearing the cache, reads fell through to parent which couldn't return uncommitted data
Fix: Mark cache entries as clean (non-dirty) instead of clearing them. This preserves readability while still flushing data to parent for eventual commit.
This fixes state test failures where pre-state data was lost after ProcessBlock called WriteGiga().
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
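The behaviour described in this commit can be modelled with toy types. The sketch below is not the real cache store or commitment.Store: `bufferedParent`, `cacheStore`, and `entry` are hypothetical stand-ins that only reproduce the relevant property, namely a parent whose writes land in a pending buffer that its own `Get` does not read.

```go
package main

import "fmt"

// bufferedParent stands in for a parent store whose Set goes to a pending
// change set while Get only sees committed data (assumed simplification).
type bufferedParent struct {
	committed map[string]string
	pending   map[string]string
}

func (p *bufferedParent) Set(k, v string) { p.pending[k] = v }

func (p *bufferedParent) Get(k string) (string, bool) {
	v, ok := p.committed[k]
	return v, ok
}

type entry struct {
	value string
	dirty bool
}

// cacheStore is a write-back cache layered over the parent.
type cacheStore struct {
	parent *bufferedParent
	cache  map[string]*entry
}

func newCacheStore() *cacheStore {
	return &cacheStore{
		parent: &bufferedParent{committed: map[string]string{}, pending: map[string]string{}},
		cache:  map[string]*entry{},
	}
}

func (c *cacheStore) Set(k, v string) { c.cache[k] = &entry{value: v, dirty: true} }

// Get serves from the cache first and only then falls through to the parent.
func (c *cacheStore) Get(k string) (string, bool) {
	if e, ok := c.cache[k]; ok {
		return e.value, true
	}
	return c.parent.Get(k)
}

// Write flushes dirty entries to the parent and marks them clean instead of
// deleting them, so later reads still find the uncommitted values.
func (c *cacheStore) Write() {
	for k, e := range c.cache {
		if e.dirty {
			c.parent.Set(k, e.value)
			e.dirty = false
		}
	}
}

func main() {
	s := newCacheStore()
	s.Set("balance/alice", "100")
	s.Write()
	v, ok := s.Get("balance/alice")
	fmt.Println(v, ok) // still readable after Write
}
```

Had `Write` cleared the cache map instead, the final `Get` would fall through to the parent and miss the value, because the parent has not committed its pending buffer yet.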
Introduce EvmKeeperInterface and BankKeeperInterface to abstract away the conditional logic for Giga vs V2 keeper access. Add helper methods (EvmKeeper(), BankKeeper(), IsGigaMode()) to StateTestContext that return the appropriate keeper based on execution mode. This eliminates scattered `if isGigaMode` checks and consolidates the duplicate verifyPostStateWithResult/verifyGigaPostStateWithResult functions into a single interface-based implementation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
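A rough sketch of the interface-based approach follows. The names `EvmKeeperInterface`, `StateTestContext`, `EvmKeeper()`, and `IsGigaMode()` come from the commit message, but the method set (`GetNonce`), the concrete keeper types, the field names, and `verifyNonce` are hypothetical and exist only to show how one verification function can serve both modes.

```go
package main

import "fmt"

// EvmKeeperInterface abstracts the keeper used for verification. The single
// GetNonce method is an assumption chosen for illustration.
type EvmKeeperInterface interface {
	GetNonce(addr string) uint64
}

// v2Keeper and gigaKeeper are toy stand-ins for the two real keepers.
type v2Keeper struct{ nonces map[string]uint64 }
type gigaKeeper struct{ nonces map[string]uint64 }

func (k *v2Keeper) GetNonce(addr string) uint64   { return k.nonces[addr] }
func (k *gigaKeeper) GetNonce(addr string) uint64 { return k.nonces[addr] }

// StateTestContext picks the keeper once, based on the execution mode.
type StateTestContext struct {
	giga  bool
	v2    *v2Keeper
	gigaK *gigaKeeper
}

func (c *StateTestContext) IsGigaMode() bool { return c.giga }

// EvmKeeper returns the keeper matching the current mode, so callers do not
// need their own mode checks.
func (c *StateTestContext) EvmKeeper() EvmKeeperInterface {
	if c.giga {
		return c.gigaK
	}
	return c.v2
}

// verifyNonce is a single verification path that works for both modes.
func verifyNonce(c *StateTestContext, addr string, want uint64) error {
	if got := c.EvmKeeper().GetNonce(addr); got != want {
		return fmt.Errorf("nonce mismatch for %s: got %d, want %d", addr, got, want)
	}
	return nil
}

func main() {
	ctx := &StateTestContext{
		giga:  true,
		v2:    &v2Keeper{nonces: map[string]uint64{"0xabc": 1}},
		gigaK: &gigaKeeper{nonces: map[string]uint64{"0xabc": 2}},
	}
	fmt.Println(verifyNonce(ctx, "0xabc", 2))
}
```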
Refactor TestGigaVsV2_StateTests and TestGigaWithRegularStore_StateTests to share common code through a parameterized approach:
- Add ComparisonConfig struct with GigaMode and VerifyFixture fields
- Replace runStateTestComparisonWithResult and runV2VsGigaWithRegularStoreComparison with unified runStateTestComparison
- Extract common test iteration logic into runStateTestSuite
- Simplify both test entry points to thin wrappers
This reduces ~90% code duplication between the two test functions while preserving their distinct behaviors (different executor modes and fixture verification settings).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Rename the config field to better describe its purpose (verifying against Ethereum test spec expected post-state). Default to false and make it configurable via VERIFY_ETHEREUM_SPEC environment variable. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
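One way to wire such a flag is sketched below. The environment variable name `VERIFY_ETHEREUM_SPEC` and the field name `VerifyEthereumSpec` appear in this PR; how the value is parsed is not stated, so the use of `strconv.ParseBool` and the helpers `parseVerifyFlag` and `configFromEnv` are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// ComparisonConfig is reduced here to the one field under discussion.
type ComparisonConfig struct {
	VerifyEthereumSpec bool
}

// parseVerifyFlag treats unset or unparsable values as false, matching the
// "default to false" behaviour described above (parsing rules are assumed).
func parseVerifyFlag(raw string) bool {
	v, err := strconv.ParseBool(raw)
	return err == nil && v
}

// configFromEnv reads VERIFY_ETHEREUM_SPEC from the process environment.
func configFromEnv() ComparisonConfig {
	return ComparisonConfig{
		VerifyEthereumSpec: parseVerifyFlag(os.Getenv("VERIFY_ETHEREUM_SPEC")),
	}
}

func main() {
	fmt.Printf("%+v\n", configFromEnv())
}
```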
Add the ability for GigaBankKeeper to use ctx.KVStore() instead of ctx.GigaKVStore() when UseRegularStore flag is set. This mirrors the existing functionality in GigaEvmKeeper.
Changes:
- Add UseRegularStore field to BaseViewKeeper struct
- Add GetKVStore() method that switches between ctx.KVStore() and ctx.GigaKVStore() based on the flag
- Update all direct ctx.GigaKVStore() calls to use k.GetKVStore(ctx)
- Change GigaBankKeeper in App from interface to pointer type to allow the flag to be modified after initialization
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
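The store switch can be sketched with simplified types. `UseRegularStore`, `BaseViewKeeper`, and `GetKVStore` are named in the commit; the `Context` and `KVStore` types below are toy stand-ins (the real context methods take a store key, which is omitted here), and `App` is reduced to the single field of interest.

```go
package main

import "fmt"

// KVStore and Context are simplified stand-ins for the SDK types.
type KVStore map[string]string

type Context struct {
	regular KVStore
	giga    KVStore
}

func (c Context) KVStore() KVStore     { return c.regular }
func (c Context) GigaKVStore() KVStore { return c.giga }

// BaseViewKeeper carries the flag that selects the backing store.
type BaseViewKeeper struct {
	UseRegularStore bool
}

// GetKVStore centralizes the choice so call sites no longer reference
// ctx.GigaKVStore() directly.
func (k *BaseViewKeeper) GetKVStore(ctx Context) KVStore {
	if k.UseRegularStore {
		return ctx.KVStore()
	}
	return ctx.GigaKVStore()
}

// App holds the keeper by pointer so a test helper can flip the flag after
// initialization and have the change observed everywhere.
type App struct {
	GigaBankKeeper *BaseViewKeeper
}

func main() {
	ctx := Context{regular: KVStore{"src": "regular"}, giga: KVStore{"src": "giga"}}
	app := &App{GigaBankKeeper: &BaseViewKeeper{}}
	fmt.Println(app.GigaBankKeeper.GetKVStore(ctx)["src"])
	app.GigaBankKeeper.UseRegularStore = true
	fmt.Println(app.GigaBankKeeper.GetKVStore(ctx)["src"])
}
```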
Configure GigaBankKeeper.UseRegularStore = true in NewGigaTestWrapperWithRegularStore so that TestGigaWithRegularStore_StateTests can run without GigaKVStore. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enable comprehensive verification in state tests:
- Add V2 vs Giga gas comparison (detect gas accounting differences)
- Add V2 vs Giga balance comparison (verify balance changes match)
- Add Ethereum spec balance verification (guarded by VerifyEthereumSpec)
- Add GetBalance to BankKeeperInterface
- Add FailureTypeBalanceMismatch failure type
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove mergeTest and eip1559 from the skip list as they now pass. The GASPRICE opcode mismatch has been resolved and both tests produce matching storage values between V2 and Giga executors. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
- Merge `TestGigaVsV2_StateTests` and `TestGigaWithRegularStore_StateTests` into a single test
- Default uses GigaStore (`ModeGigaSequential`)
- Set `USE_REGULAR_STORE=true` to use regular KVStore instead
## Test plan
- [x] Run `STATE_TEST_DIR=stExample go test -v -run TestGigaVsV2_StateTests ./giga/tests/...`
- [x] Run `STATE_TEST_DIR=stExample USE_REGULAR_STORE=true go test -v -run TestGigaVsV2_StateTests ./giga/tests/...`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Batch 1: Enable 3 high-pass categories with individual test skips:
- Shanghai: 26/27 passing (1 gas_mismatch skipped)
- stArgsZeroOneBalance: 91/96 passing (5 skipped)
- stTransactionTest: 248/259 passing (11 skipped)
Also update CLAUDE.md with:
- Correct test name format documentation for skipped_tests
- Updated workflow to include commit/push after each batch
- Updated test categories status
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Batch 2: Enable 3 more high-pass categories with individual test skips:
- stSpecialTest: 21/22 passing (1 result_code skipped)
- stSolidityTest: 21/23 passing (2 result_code skipped)
- stNonZeroCallsTest: 21/24 passing (3 gas_mismatch skipped)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Batch 3 (partial): Enable 2 more high-pass categories with individual test skips:
- stRefundTest: 23/26 passing (3 result_code skipped)
- stWalletTest: 41/46 passing (5 skipped: 4 result_code, 1 gas_mismatch)
Note: stEIP150singleCodeGasPrices (450 tests) deferred to separate run
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- stZeroCallsTest: 24/24 passing (100%)
- stStaticFlagEnabled: 34/34 passing (100%)
- stCodeSizeLimit: 9/9 passing (100%)
Total: 67 additional passing tests
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…rationsTest state tests
- stCallCreateCallCodeTest: 56/56 passing (100%)
- stPreCompiledContracts2: 160/160 passing (100%)
- stSystemOperationsTest: 82/83 passing (1 skipped: result_code)
Total: 298 additional passing tests
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- stEIP2930: 110/140 passing (30 skipped: result_code)
Total: 110 additional passing tests
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix errcheck: handle file.Close() return value in loader.go and skip.go
- Fix gofmt: correct struct field alignment in types.go
- Fix defer placement: move defer after error check in loader.go
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
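The errcheck and defer-placement fixes follow a common Go pattern, sketched here with a hypothetical `readFixture` helper rather than the actual loader.go code. The defer is registered only after the open succeeded, and the error from `Close` is propagated when nothing else failed.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// readFixture opens path and returns its contents (hypothetical helper).
// The deferred Close is placed after the error check, so it never runs on a
// nil *os.File, and its return value is not silently dropped.
func readFixture(path string) (data []byte, err error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer func() {
		// Surface a Close failure only if the read itself succeeded.
		if cerr := f.Close(); cerr != nil && err == nil {
			err = cerr
		}
	}()
	return io.ReadAll(f)
}

func main() {
	if _, err := readFixture("does-not-exist.json"); err != nil {
		fmt.Println("error:", err)
	}
}
```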
- Fix sync.Once closure capturing stale err variable in GetStateTestsPath
- Fix os.Stat error handling to return actual error instead of ("", nil)
- Remove unused NormalizeTestName function
- Extract magic number 50 to maxFailuresToDisplay constant
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
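The sync.Once fix in this commit addresses a pattern that is easy to get wrong. The sketch below shows one way such a bug can arise and a possible fix; `pathResolver`, `buggyGet`, `Get`, and `failingResolve` are hypothetical names, and the real GetStateTestsPath may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errMissing = errors.New("fixtures not found")

// failingResolve simulates a lookup that fails (illustrative only).
func failingResolve() (string, error) { return "", errMissing }

type pathResolver struct {
	once    sync.Once
	path    string
	err     error
	resolve func() (string, error)
}

// buggyGet keeps the error in a local variable. The closure runs only on the
// first call, so every later call returns the zero-valued local err (nil).
func (r *pathResolver) buggyGet() (string, error) {
	var err error
	r.once.Do(func() {
		r.path, err = r.resolve()
	})
	return r.path, err
}

// Get stores both results alongside the Once, so repeated calls observe the
// same path and the same error as the first call.
func (r *pathResolver) Get() (string, error) {
	r.once.Do(func() {
		r.path, r.err = r.resolve()
	})
	return r.path, r.err
}

func main() {
	buggy := &pathResolver{resolve: failingResolve}
	_, first := buggy.buggyGet()
	_, second := buggy.buggyGet()
	fmt.Println(first != nil, second != nil) // the error is lost on the second call

	fixed := &pathResolver{resolve: failingResolve}
	_, first = fixed.Get()
	_, second = fixed.Get()
	fmt.Println(first != nil, second != nil)
}
```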
- stRandom: 310/310 passing (100%)
- stRandom2: 221/221 passing (100%)
Total: 531 additional passing tests
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…stTransitionTest state tests
All 5 categories pass with 100% success rate:
- stInitCodeTest: 22/22 tests pass
- stMemExpandingEIP150Calls: 14/14 tests pass
- stEIP3607: 12/12 tests pass
- stBugs: 8/8 tests pass
- stTransitionTest: 6/6 tests pass
Total: 62 new passing tests, 0 skips needed.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change value-to-pointer assignment for GigaBankKeeper field. The initKeepersWithmAccPerms function returns a value type but app.GigaBankKeeper expects a pointer. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Iteration 5 of state test enablement:
- Cancun: 170/177 tests pass (7 blob tx failures skipped)
- stExtCodeHash: 65/69 tests pass (4 result_code failures skipped)
- stSelfBalance: 42/42 tests pass (100%)
Total new passing tests: 277
Total new skips: 11
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Results:
- stStackTests: 373/375 tests pass (99.5%), 2 failures skipped
- stCreate2: 192/192 tests pass (100%)
- stCreateTest: 210/210 tests pass (100%)
Total: 775 additional tests enabled (773 passing, 2 skipped)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## pd/giga-state-tests #2782 +/- ##
=======================================================
- Coverage 56.71% 56.71% -0.01%
=======================================================
Files 2007 2007
Lines 165033 165033
=======================================================
- Hits 93602 93593 -9
- Misses 63236 63244 +8
- Partials 8195 8196 +1
…Test, stMemoryTest state tests
Enable 5 additional state test categories (978 tests total):
- stEIP150Specific: 25/25 (100%) - old estimate was 36%
- stCallCodes: 86/86 (100%) - old estimate was 37%
- stZeroCallsRevert: 16/16 (100%) - old estimate was 0%
- stReturnDataTest: 273/273 (100%) - old estimate was 46%
- stMemoryTest: 578/578 (100%) - old estimate was 44%
All categories pass at 100% with no individual test skips needed. Old estimates were severely stale - actual pass rates far exceed estimates.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- stPreCompiledContracts: 745/745 tests pass (100%)
- stStaticCall: 479/479 tests pass (100%)
- stRevertTest: 241/272 tests pass, 31 failures skipped (result_code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- stMemoryStressTest: 83/83 tests pass (100%) - was estimated at 16%!
- VMTests: updated stats to 311/596 (52%), still too many failures to enable
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All 3 Homestead categories passed 100% (old estimates were severely stale):
- stCallDelegateCodesCallCodeHomestead: 59/59 (was estimated at 5%)
- stCallDelegateCodesHomestead: 59/59 (was estimated at 9%)
- stDelegatecallTestHomestead: 34/34 (was estimated at 3%)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- stBadOpcode: 4135/4135 tests pass (100%)
- stEIP1559: 1831/1846 tests pass, 15 failures skipped
  - balance_mismatch(9), error_mismatch(5), unknown(1)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
Total: 775 additional tests enabled (773 passing, 2 skipped)
Test plan
`result_code` failures skipped (underflowTest indices 22, 23)
🤖 Generated with Claude Code