Add hotspot shard split design document and versioned route catalog persistence helpers #316
Conversation
- introduce RouteDescriptor, CatalogSnapshot, and RouteState
- add reserved catalog key helpers and route/version codecs
- implement CatalogStore Snapshot/Save with optimistic version checks
- add catalog tests for round-trip, validation, mismatch, and overflow cases
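To make the commit message above easier to follow, here is a rough sketch of the data model it introduces. Field names are taken from the tests quoted later in this thread; the exact types, any additional fields, and further RouteState values are assumptions, and the authoritative definitions live in distribution/catalog.go.

// Sketch only: inferred from this PR's tests; the real definitions in
// distribution/catalog.go may differ in detail.
package distribution

// RouteState describes the lifecycle phase of a route entry.
// Only RouteStateActive is visible in the tests; further states are assumptions.
type RouteState uint8

const (
	RouteStateActive RouteState = iota
)

// RouteDescriptor describes one key range and the raft group that owns it.
type RouteDescriptor struct {
	RouteID       uint64
	ParentRouteID uint64 // route this one was split from, if any
	GroupID       uint64
	State         RouteState
	Start         []byte // inclusive
	End           []byte // exclusive; nil means +inf
}

// CatalogSnapshot is a point-in-time view of all routes plus the single
// global catalog version used for optimistic saves.
type CatalogSnapshot struct {
	Version uint64
	Routes  []RouteDescriptor
}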
Summary of Changes

Hello @bootjp, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request lays the foundational groundwork for Elastickv's hotspot shard splitting feature. It introduces a durable and versioned route catalog, which is crucial for maintaining consistent routing information across the distributed system. Additionally, it provides comprehensive design and implementation plans, guiding the future development of this complex feature by detailing the architecture, data models, and a phased rollout strategy.
Code Review
This pull request introduces the foundational components for hotspot shard splitting by adding a durable, versioned route catalog. The changes include the core logic for catalog persistence with optimistic concurrency control, along with comprehensive unit tests. Additionally, it adds detailed design documents outlining the overall strategy and the plan for the first milestone.
The code is well-structured and the implementation of the catalog store is robust. The tests cover a good range of scenarios, ensuring the reliability of the new components. The design documents are thorough and provide excellent context for the feature.
I have a few suggestions to improve code clarity, enhance test coverage for an edge case, and align the design documentation with the current implementation and established design principles.
func u64ToInt(v uint64) (int, error) {
	if strconv.IntSize == 32 && v > uint64(^uint32(0)>>1) {
		return 0, errors.WithStack(ErrCatalogInvalidRouteRecord)
	}
	if strconv.IntSize == 64 && v > (^uint64(0)>>1) {
		return 0, errors.WithStack(ErrCatalogInvalidRouteRecord)
	}
	return int(v), nil
}
This function correctly checks for uint64 to int conversion overflow, but it can be simplified for better readability and maintainability by using a constant for the maximum integer value, which avoids platform-specific checks.
func u64ToInt(v uint64) (int, error) {
const maxInt = int(^uint(0) >> 1)
if v > uint64(maxInt) {
return 0, errors.WithStack(ErrCatalogInvalidRouteRecord)
}
return int(v), nil
}
func assertRouteEqual(t *testing.T, want, got RouteDescriptor) {
	t.Helper()
	if want.RouteID != got.RouteID {
		t.Fatalf("route id mismatch: want %d, got %d", want.RouteID, got.RouteID)
	}
	if want.GroupID != got.GroupID {
		t.Fatalf("group id mismatch: want %d, got %d", want.GroupID, got.GroupID)
	}
	if want.ParentRouteID != got.ParentRouteID {
		t.Fatalf("parent route id mismatch: want %d, got %d", want.ParentRouteID, got.ParentRouteID)
	}
	if want.State != got.State {
		t.Fatalf("state mismatch: want %d, got %d", want.State, got.State)
	}
	if !bytes.Equal(want.Start, got.Start) {
		t.Fatalf("start mismatch: want %q, got %q", want.Start, got.Start)
	}
	if !bytes.Equal(want.End, got.End) {
		t.Fatalf("end mismatch: want %q, got %q", want.End, got.End)
	}
}
The code correctly handles commitTS overflow in prepareSave, but this case is not covered by tests. Adding a test for this scenario would improve coverage and prevent future regressions.
func assertRouteEqual(t *testing.T, want, got RouteDescriptor) {
	t.Helper()
	if want.RouteID != got.RouteID {
		t.Fatalf("route id mismatch: want %d, got %d", want.RouteID, got.RouteID)
	}
	if want.GroupID != got.GroupID {
		t.Fatalf("group id mismatch: want %d, got %d", want.GroupID, got.GroupID)
	}
	if want.ParentRouteID != got.ParentRouteID {
		t.Fatalf("parent route id mismatch: want %d, got %d", want.ParentRouteID, got.ParentRouteID)
	}
	if want.State != got.State {
		t.Fatalf("state mismatch: want %d, got %d", want.State, got.State)
	}
	if !bytes.Equal(want.Start, got.Start) {
		t.Fatalf("start mismatch: want %q, got %q", want.Start, got.Start)
	}
	if !bytes.Equal(want.End, got.End) {
		t.Fatalf("end mismatch: want %q, got %q", want.End, got.End)
	}
}

func TestCatalogStoreSaveRejectsCommitTSOverflow(t *testing.T) {
	st := store.NewMVCCStore()
	ctx := context.Background()
	// Set catalog version to 10 at a high timestamp.
	if err := st.PutAt(ctx, CatalogVersionKey(), EncodeCatalogVersion(10), ^uint64(0)-1, 0); err != nil {
		t.Fatalf("setup: %v", err)
	}
	// Set last commit TS to max uint64. This will be the readTS.
	if err := st.PutAt(ctx, []byte("foo"), []byte("bar"), ^uint64(0), 0); err != nil {
		t.Fatalf("setup: %v", err)
	}
	cs := NewCatalogStore(st)
	// Save with expected version 10. This will read at readTS = ^uint64(0),
	// and then try to commit at commitTS = readTS + 1, which overflows to 0.
	_, err := cs.Save(ctx, 10, []RouteDescriptor{
		{RouteID: 1, Start: []byte(""), End: nil, GroupID: 1, State: RouteStateActive},
	})
	if !errors.Is(err, ErrCatalogVersionOverflow) {
		t.Fatalf("expected ErrCatalogVersionOverflow, got %v", err)
	}
}
docs/hotspot_shard_split_design.md (Outdated)
- `start []byte` (inclusive)
- `end []byte` (exclusive, nil=+inf)
- `group_id uint64`
- `version uint64` (catalog generation)
The RouteDescriptor data model defined here includes a version field. However, the Go implementation in distribution/catalog.go does not have this field on the RouteDescriptor struct. Instead, it uses a single global version for the entire catalog snapshot. This aligns with the design principle that Optimistic Concurrency Control (OCC) conflict checks are not required at the store layer (Rule 1), as the leader is responsible for issuing consistent timestamps. Please update the documentation to reflect the current implementation and this design principle, clarifying that per-record versioning for OCC is not used at the store layer.
References
- Optimistic Concurrency Control (OCC) conflict checks are not required at the store layer because the leader is responsible for issuing consistent timestamps, which is expected to prevent write conflicts.
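To illustrate the catalog-level CAS model described above, here is a minimal usage sketch. The Save shape matches the test in this thread (Save(ctx, expectedVersion, routes)), but the Snapshot return type, the ErrCatalogVersionMismatch name, and the errors import path are assumptions to be checked against distribution/catalog.go.

// Sketch only: retry loop around the catalog-level compare-and-set.
package distribution

import (
	"context"

	"github.com/cockroachdb/errors" // assumed errors package; adjust to the project's
)

// updateRoutes rebuilds the route set from the latest snapshot and saves it,
// retrying when another writer bumped the global catalog version first.
func updateRoutes(ctx context.Context, cs *CatalogStore, rebuild func([]RouteDescriptor) []RouteDescriptor) error {
	for {
		snap, err := cs.Snapshot(ctx)
		if err != nil {
			return err
		}
		// Save succeeds only if the catalog version still equals snap.Version;
		// otherwise the snapshot is stale and the loop reloads and retries.
		if _, err := cs.Save(ctx, snap.Version, rebuild(snap.Routes)); err != nil {
			if errors.Is(err, ErrCatalogVersionMismatch) { // assumed error name
				continue
			}
			return err
		}
		return nil
	}
}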
- fix gosec G115 by simplifying uint64->int overflow guard in u64ToInt
- add commit timestamp overflow test for CatalogStore.Save
- align hotspot split design doc with implementation:
  - remove per-route version from RouteDescriptor
  - document global catalog snapshot versioning and catalog-level CAS model
Pull request overview
Adds milestone/planning documentation for hotspot shard splitting and introduces a first control-plane building block: a durable, versioned route catalog persisted in MVCC state.
Changes:
- Added hotspot split design doc and Milestone 1 PR plan docs.
- Introduced distribution.CatalogStore with reserved-key encoding/decoding for catalog version + route descriptors (a rough sketch of that encoding follows below).
- Added unit tests covering catalog codecs, optimistic version checks, deletes, and edge cases (duplicate IDs, invalid ranges, overflow).
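The "reserved-key encoding/decoding" bullet above can be pictured roughly as follows. CatalogVersionKey and EncodeCatalogVersion are names that appear in the tests; the prefix value, the route-key layout, and the routeKeyFor helper are purely illustrative assumptions.

// Sketch only: the real reserved prefix and layouts live in distribution/catalog.go.
package distribution

import "encoding/binary"

// catalogPrefix keeps catalog records out of the user keyspace (assumed value).
var catalogPrefix = []byte("!catalog/")

// CatalogVersionKey is the key under which the global catalog version is stored.
func CatalogVersionKey() []byte {
	return append(append([]byte(nil), catalogPrefix...), "version"...)
}

// routeKeyFor is a hypothetical per-route key: prefix + "route/" + big-endian route ID.
func routeKeyFor(routeID uint64) []byte {
	key := append(append([]byte(nil), catalogPrefix...), "route/"...)
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, routeID)
	return append(key, buf...)
}

// EncodeCatalogVersion encodes the version as a big-endian uint64 (assumed encoding).
func EncodeCatalogVersion(v uint64) []byte {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, v)
	return buf
}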
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| docs/hotspot_split_milestone1_pr_plan.md | Milestone 1 implementation plan + PR breakdown and rollout checklist. |
| docs/hotspot_shard_split_design.md | End-to-end design proposal for hotspot detection, splitting, migration, and routing consistency. |
| distribution/catalog.go | New MVCC-backed route catalog persistence helpers (keys, codecs, snapshot/save). |
| distribution/catalog_test.go | Unit tests for catalog encoding/decoding and versioned save/snapshot behavior. |
distribution/catalog.go (Outdated)
commitTS := readTS + 1
if commitTS == 0 {
	return savePlan{}, errors.WithStack(ErrCatalogVersionOverflow)
}
CatalogStore.Save computes commitTS as readTS+1 in prepareSave. If the underlying MVCC store advances LastCommitTS (e.g., due to concurrent user writes with higher HLC timestamps) between prepareSave and ApplyMutations, this commitTS can become < current LastCommitTS and the catalog write will appear to have committed “in the past”, breaking snapshot/time ordering. Consider choosing commitTS immediately before ApplyMutations as max(store.LastCommitTS()+1, readTS+1) (with overflow handling), or require the caller to supply a commitTS/clock so catalog commits are always strictly increasing.
Suggested change:

- commitTS := readTS + 1
- if commitTS == 0 {
- 	return savePlan{}, errors.WithStack(ErrCatalogVersionOverflow)
- }
+ // Choose a commit timestamp that is strictly greater than both the read
+ // timestamp and the current store LastCommitTS to preserve ordering.
+ candidateFromRead := readTS + 1
+ if candidateFromRead == 0 {
+ 	return savePlan{}, errors.WithStack(ErrCatalogVersionOverflow)
+ }
+ currentLast := s.store.LastCommitTS()
+ candidateFromStore := currentLast + 1
+ if candidateFromStore == 0 {
+ 	return savePlan{}, errors.WithStack(ErrCatalogVersionOverflow)
+ }
+ commitTS := candidateFromRead
+ if candidateFromStore > commitTS {
+ 	commitTS = candidateFromStore
+ }
- recompute catalog commit TS at apply time to avoid stale planned timestamps
- ensure commit TS is greater than both planned minimum and current store LastCommitTS
- keep overflow checks for commit TS calculation
- add regression test for concurrent LastCommitTS advancement between prepare and apply
…stickv into feature/hotspot-re-allocation
/gemini review
Code Review
This pull request introduces the foundational components for hotspot shard splitting by adding a durable and versioned route catalog. The changes include the core logic for catalog persistence, serialization/deserialization of route data, and a comprehensive test suite. Additionally, two markdown files are added, providing a detailed design document and a phased implementation plan. The code is well-structured and the tests are thorough. I have identified one critical compilation issue that needs to be addressed and one medium-severity suggestion for a potential performance improvement in the future.
func (s *CatalogStore) buildSaveMutations(ctx context.Context, plan savePlan) ([]*store.KVPairMutation, error) {
	existingKeys, err := s.routeKeysAt(ctx, plan.readTS)
	if err != nil {
		return nil, err
	}

	mutations := make([]*store.KVPairMutation, 0, len(existingKeys)+len(plan.routes)+1)
	mutations = appendDeleteMutations(mutations, existingKeys)
	mutations, err = appendRouteMutations(mutations, plan.routes)
	if err != nil {
		return nil, err
	}
	mutations = append(mutations, &store.KVPairMutation{
		Op:    store.OpTypePut,
		Key:   CatalogVersionKey(),
		Value: EncodeCatalogVersion(plan.nextVersion),
	})
	return mutations, nil
}
The current implementation of buildSaveMutations uses a "delete-all, then write-all" strategy to update the route catalog. It fetches all existing keys, creates delete mutations for them, and then creates put mutations for the new set of routes. While simple, this can be inefficient if the catalog contains many routes and only a few are changing, as it leads to unnecessary writes and can increase transaction contention.
For a future improvement, consider implementing a diff-based approach:
- Read the full current set of RouteDescriptors, not just the keys.
- Compare the current routes with the new routes to identify which are added, modified, or deleted.
- Generate mutations only for the routes that have actually changed.
This would reduce write amplification and improve performance for large catalogs.
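As a rough sketch of that diff-based approach, assuming the same mutation helpers shown in the excerpt above (appendDeleteMutations, appendRouteMutations); routesEqual is defined inline and routeKeyFor stands in for the real per-route key helper:

// Sketch only: emit mutations just for routes that were added, changed, or removed.
// The caller would still append the catalog-version put mutation as today.
func buildDiffMutations(current, desired []RouteDescriptor) ([]*store.KVPairMutation, error) {
	desiredByID := make(map[uint64]struct{}, len(desired))
	for _, r := range desired {
		desiredByID[r.RouteID] = struct{}{}
	}

	// Routes present now but absent from the desired set get delete mutations.
	currentByID := make(map[uint64]RouteDescriptor, len(current))
	var staleKeys [][]byte
	for _, r := range current {
		currentByID[r.RouteID] = r
		if _, keep := desiredByID[r.RouteID]; !keep {
			staleKeys = append(staleKeys, routeKeyFor(r.RouteID))
		}
	}

	// Routes that are new or whose contents changed get put mutations.
	changed := make([]RouteDescriptor, 0, len(desired))
	for _, r := range desired {
		if old, ok := currentByID[r.RouteID]; !ok || !routesEqual(old, r) {
			changed = append(changed, r)
		}
	}

	mutations := make([]*store.KVPairMutation, 0, len(staleKeys)+len(changed))
	mutations = appendDeleteMutations(mutations, staleKeys)
	return appendRouteMutations(mutations, changed)
}

// routesEqual reports whether two descriptors are identical field-for-field.
func routesEqual(a, b RouteDescriptor) bool {
	return a.RouteID == b.RouteID && a.ParentRouteID == b.ParentRouteID &&
		a.GroupID == b.GroupID && a.State == b.State &&
		bytes.Equal(a.Start, b.Start) && bytes.Equal(a.End, b.End)
}

This trades one slightly larger read (the full descriptors instead of only the keys) for far fewer writes when most of the catalog is unchanged.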
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
	out = append(out, cloneRouteDescriptor(route))
}
sort.Slice(out, func(i, j int) bool {
	return out[i].RouteID < out[j].RouteID
Routes are normalized/sorted by RouteID here, but the in-memory routing engine assumes routes are sorted by Start (key range) for binary search. Unless all callers re-sort by Start before applying a snapshot, returning RouteDescriptor slices in RouteID order is likely to cause subtle misrouting bugs. Consider sorting by Start (then End/RouteID for tie-breaks), or explicitly documenting the ordering contract in CatalogSnapshot and enforcing/validating it in Engine snapshot apply.
Suggested change:

- return out[i].RouteID < out[j].RouteID
+ ri, rj := out[i], out[j]
+ // Primary ordering: Start key (ascending).
+ if c := bytes.Compare(ri.Start, rj.Start); c != 0 {
+ 	return c < 0
+ }
+ // Secondary ordering: End key (ascending, with nil treated as empty slice).
+ if c := bytes.Compare(ri.End, rj.End); c != 0 {
+ 	return c < 0
+ }
+ // Tertiary ordering: RouteID for deterministic tie-breaking.
+ return ri.RouteID < rj.RouteID
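Alternatively, if the ordering contract is enforced when the engine applies a snapshot (the other option mentioned above), a guard along these lines could work; the function name and the exact invariants (sorted by Start, non-overlapping, End exclusive with nil meaning +inf) are assumptions about how the routing engine consumes snapshots.

// Sketch only: reject snapshots whose routes are unsorted or overlapping
// before the engine installs them for binary search by Start.
func validateRouteOrder(routes []RouteDescriptor) error {
	for i := 1; i < len(routes); i++ {
		prev, cur := routes[i-1], routes[i]
		if bytes.Compare(prev.Start, cur.Start) > 0 {
			return fmt.Errorf("routes not sorted by start at index %d", i)
		}
		// prev.End == nil means +inf, so no route may follow it; otherwise the
		// previous range must end at or before the next range's start.
		if prev.End == nil || bytes.Compare(prev.End, cur.Start) > 0 {
			return fmt.Errorf("routes %d and %d overlap", prev.RouteID, cur.RouteID)
		}
	}
	return nil
}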