
Add hotspot shard split design document and versioned route catalog persistence helpers #316

Open
bootjp wants to merge 7 commits into main from feature/hotspot-re-allocation

Conversation

Owner

@bootjp bootjp commented Feb 17, 2026

No description provided.

- introduce RouteDescriptor, CatalogSnapshot, and RouteState
- add reserved catalog key helpers and route/version codecs
- implement CatalogStore Snapshot/Save with optimistic version checks
- add catalog tests for round-trip, validation, mismatch, and overflow cases
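
For reviewers skimming the new API surface, here is a rough usage sketch assembled from the tests in this PR (store.NewMVCCStore, NewCatalogStore, and the Save call shape). It is illustrative only: whether a fresh catalog starts at expected version 0, and what Save's first return value carries, are assumptions, and imports are elided since module paths are not quoted here.

st := store.NewMVCCStore()
cs := NewCatalogStore(st)

// Save replaces the route set only if the caller's expected catalog version still
// matches the stored one; a stale expected version fails the optimistic check.
_, err := cs.Save(context.Background(), 0, []RouteDescriptor{
	{RouteID: 1, Start: []byte(""), End: nil, GroupID: 1, State: RouteStateActive},
})
if err != nil {
	// Version mismatches, duplicate route IDs, and invalid route ranges all surface here.
	panic(err)
}
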
Copilot AI review requested due to automatic review settings February 17, 2026 16:37
@gemini-code-assist
Contributor

Summary of Changes

Hello @bootjp, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request lays the foundational groundwork for Elastickv's hotspot shard splitting feature. It introduces a durable and versioned route catalog, which is crucial for maintaining consistent routing information across the distributed system. Additionally, it provides comprehensive design and implementation plans, guiding the future development of this complex feature by detailing the architecture, data models, and a phased rollout strategy.

Highlights

  • Durable Route Catalog Persistence: Introduced a new CatalogStore in distribution/catalog.go to provide persistence helpers for the route catalog state, enabling durable storage and retrieval of RouteDescriptor and catalog versions using an MVCC store.
  • Versioned Route Catalog Management: Implemented mechanisms for encoding and decoding catalog versions and RouteDescriptor objects, along with methods for taking consistent snapshots and saving updates with optimistic concurrency control, ensuring data integrity and preventing race conditions.
  • Hotspot Shard Split Design Document: Added a comprehensive design document (docs/hotspot_shard_split_design.md) outlining the architecture, data models, execution flow, and phased rollout plan for implementing automatic hotspot shard splitting in Elastickv.
  • Milestone 1 PR Plan: Included a detailed plan (docs/hotspot_split_milestone1_pr_plan.md) breaking down the first milestone of the hotspot shard split feature into individual pull requests, specifying goals, tasks, and validation for each.
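
The snapshot/save pair described above is effectively a compare-and-swap on a single catalog-wide version. The fragment below is a hypothetical read-modify-write loop illustrating that contract; the Snapshot signature and result fields (Version, Routes), the planSplit helper, and the version-mismatch check are all assumptions, since only the Save call shape is quoted elsewhere in this thread.

func updateCatalog(ctx context.Context, cs *CatalogStore) error {
	for {
		snap, err := cs.Snapshot(ctx) // assumed: returns the routes plus the global catalog version
		if err != nil {
			return err
		}
		next := planSplit(snap.Routes) // hypothetical helper deriving the next route set
		_, err = cs.Save(ctx, snap.Version, next)
		if err == nil {
			return nil
		}
		if !isVersionMismatch(err) { // hypothetical check; the concrete error value is not quoted here
			return err
		}
		// Another writer bumped the catalog version first; re-read and retry.
	}
}
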
Changelog
  • distribution/catalog.go
    • Added CatalogStore for managing versioned route catalog persistence.
    • Defined RouteDescriptor and CatalogSnapshot structs for route and catalog state.
    • Implemented encoding and decoding functions for catalog versions and route descriptors.
    • Introduced methods for reading consistent catalog snapshots and saving updates with optimistic version checks.
  • distribution/catalog_test.go
    • Added unit tests for CatalogVersionCodec and RouteDescriptorCodec round-trip serialization.
    • Included tests for CatalogStore snapshot and save operations, covering versioning, duplicate route IDs, and invalid route ranges.
  • docs/hotspot_shard_split_design.md
    • Added a new design document detailing the background, goals, requirements, high-level architecture, data model, hotspot detection, split execution flow, routing consistency, key coverage, API changes, implementation touchpoints, phased rollout, test strategy, risks, and acceptance criteria for hotspot shard splitting.
  • docs/hotspot_split_milestone1_pr_plan.md
    • Added a new document outlining the PR plan for Milestone 1 of the hotspot shard split feature, including objectives, scope, implementation strategy, PR breakdown, merge order, operational rollout checklist, and exit criteria.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces the foundational components for hotspot shard splitting by adding a durable, versioned route catalog. The changes include the core logic for catalog persistence with optimistic concurrency control, along with comprehensive unit tests. Additionally, it adds detailed design documents outlining the overall strategy and the plan for the first milestone.

The code is well-structured and the implementation of the catalog store is robust. The tests cover a good range of scenarios, ensuring the reliability of the new components. The design documents are thorough and provide excellent context for the feature.

I have a few suggestions to improve code clarity, enhance test coverage for an edge case, and align the design documentation with the current implementation and established design principles.

Comment on lines 324 to 331
func u64ToInt(v uint64) (int, error) {
	if strconv.IntSize == 32 && v > uint64(^uint32(0)>>1) {
		return 0, errors.WithStack(ErrCatalogInvalidRouteRecord)
	}
	if strconv.IntSize == 64 && v > (^uint64(0)>>1) {
		return 0, errors.WithStack(ErrCatalogInvalidRouteRecord)
	}
	return int(v), nil
Contributor

Severity: medium

This function correctly checks for uint64 to int conversion overflow, but it can be simplified for better readability and maintainability by using a constant for the maximum integer value, which avoids platform-specific checks.

func u64ToInt(v uint64) (int, error) {
	const maxInt = int(^uint(0) >> 1)
	if v > uint64(maxInt) {
		return 0, errors.WithStack(ErrCatalogInvalidRouteRecord)
	}
	return int(v), nil
}

Comment on lines +241 to +262

func assertRouteEqual(t *testing.T, want, got RouteDescriptor) {
	t.Helper()
	if want.RouteID != got.RouteID {
		t.Fatalf("route id mismatch: want %d, got %d", want.RouteID, got.RouteID)
	}
	if want.GroupID != got.GroupID {
		t.Fatalf("group id mismatch: want %d, got %d", want.GroupID, got.GroupID)
	}
	if want.ParentRouteID != got.ParentRouteID {
		t.Fatalf("parent route id mismatch: want %d, got %d", want.ParentRouteID, got.ParentRouteID)
	}
	if want.State != got.State {
		t.Fatalf("state mismatch: want %d, got %d", want.State, got.State)
	}
	if !bytes.Equal(want.Start, got.Start) {
		t.Fatalf("start mismatch: want %q, got %q", want.Start, got.Start)
	}
	if !bytes.Equal(want.End, got.End) {
		t.Fatalf("end mismatch: want %q, got %q", want.End, got.End)
	}
}
Contributor

Severity: medium

The code correctly handles commitTS overflow in prepareSave, but this case is not covered by tests. Adding a test for this scenario would improve coverage and prevent future regressions.

func assertRouteEqual(t *testing.T, want, got RouteDescriptor) {
	t.Helper()
	if want.RouteID != got.RouteID {
		t.Fatalf("route id mismatch: want %d, got %d", want.RouteID, got.RouteID)
	}
	if want.GroupID != got.GroupID {
		t.Fatalf("group id mismatch: want %d, got %d", want.GroupID, got.GroupID)
	}
	if want.ParentRouteID != got.ParentRouteID {
		t.Fatalf("parent route id mismatch: want %d, got %d", want.ParentRouteID, got.ParentRouteID)
	}
	if want.State != got.State {
		t.Fatalf("state mismatch: want %d, got %d", want.State, got.State)
	}
	if !bytes.Equal(want.Start, got.Start) {
		t.Fatalf("start mismatch: want %q, got %q", want.Start, got.Start)
	}
	if !bytes.Equal(want.End, got.End) {
		t.Fatalf("end mismatch: want %q, got %q", want.End, got.End)
	}
}

func TestCatalogStoreSaveRejectsCommitTSOverflow(t *testing.T) {
	st := store.NewMVCCStore()
	ctx := context.Background()

	// Set catalog version to 10 at a high timestamp.
	if err := st.PutAt(ctx, CatalogVersionKey(), EncodeCatalogVersion(10), ^uint64(0)-1, 0); err != nil {
		t.Fatalf("setup: %v", err)
	}
	// Set last commit TS to max uint64. This will be the readTS.
	if err := st.PutAt(ctx, []byte("foo"), []byte("bar"), ^uint64(0), 0); err != nil {
		t.Fatalf("setup: %v", err)
	}

	cs := NewCatalogStore(st)
	// Save with expected version 10. This will read at readTS = ^uint64(0),
	// and then try to commit at commitTS = readTS + 1, which overflows to 0.
	_, err := cs.Save(ctx, 10, []RouteDescriptor{
		{RouteID: 1, Start: []byte(""), End: nil, GroupID: 1, State: RouteStateActive},
	})
	if !errors.Is(err, ErrCatalogVersionOverflow) {
		t.Fatalf("expected ErrCatalogVersionOverflow, got %v", err)
	}
}

- `start []byte` (inclusive)
- `end []byte` (exclusive, nil=+inf)
- `group_id uint64`
- `version uint64` (catalog generation)
Contributor

Severity: medium

The RouteDescriptor data model defined here includes a version field. However, the Go implementation in distribution/catalog.go does not have this field on the RouteDescriptor struct. Instead, it uses a single global version for the entire catalog snapshot. This aligns with the design principle that Optimistic Concurrency Control (OCC) conflict checks are not required at the store layer (Rule 1), as the leader is responsible for issuing consistent timestamps. Please update the documentation to reflect the current implementation and this design principle, clarifying that per-record versioning for OCC is not used at the store layer.

References
  1. Optimistic Concurrency Control (OCC) conflict checks are not required at the store layer because the leader is responsible for issuing consistent timestamps, which is expected to prevent write conflicts.
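
For reference, the descriptor as implemented (inferred from the assertRouteEqual helper quoted earlier) carries no per-route version; only the catalog as a whole is versioned. A sketch of the shapes involved, with the integer field widths assumed:

type RouteDescriptor struct {
	RouteID       uint64 // width assumed
	ParentRouteID uint64 // width assumed
	GroupID       uint64 // group_id is uint64 per the design doc
	State         RouteState
	Start         []byte // inclusive
	End           []byte // exclusive; nil means +infinity
}

// CatalogSnapshot pairs the route set with the single catalog-wide version that Save
// checks optimistically (field names here are assumptions).
type CatalogSnapshot struct {
	Version uint64
	Routes  []RouteDescriptor
}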

- fix gosec G115 by simplifying uint64->int overflow guard in u64ToInt
- add commit timestamp overflow test for CatalogStore.Save
- align hotspot split design doc with implementation:
  - remove per-route version from RouteDescriptor
  - document global catalog snapshot versioning and catalog-level CAS model
Contributor

Copilot AI left a comment

Pull request overview

Adds milestone/planning documentation for hotspot shard splitting and introduces a first control-plane building block: a durable, versioned route catalog persisted in MVCC state.

Changes:

  • Added hotspot split design doc and Milestone 1 PR plan docs.
  • Introduced distribution.CatalogStore with reserved-key encoding/decoding for catalog version + route descriptors.
  • Added unit tests covering catalog codecs, optimistic version checks, deletes, and edge cases (duplicate IDs, invalid ranges, overflow).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

  • docs/hotspot_split_milestone1_pr_plan.md: Milestone 1 implementation plan + PR breakdown and rollout checklist.
  • docs/hotspot_shard_split_design.md: End-to-end design proposal for hotspot detection, splitting, migration, and routing consistency.
  • distribution/catalog.go: New MVCC-backed route catalog persistence helpers (keys, codecs, snapshot/save).
  • distribution/catalog_test.go: Unit tests for catalog encoding/decoding and versioned save/snapshot behavior.

Comment on lines 453 to 457
commitTS := readTS + 1
if commitTS == 0 {
	return savePlan{}, errors.WithStack(ErrCatalogVersionOverflow)
}

Copilot AI Feb 17, 2026

CatalogStore.Save computes commitTS as readTS+1 in prepareSave. If the underlying MVCC store advances LastCommitTS (e.g., due to concurrent user writes with higher HLC timestamps) between prepareSave and ApplyMutations, this commitTS can become < current LastCommitTS and the catalog write will appear to have committed “in the past”, breaking snapshot/time ordering. Consider choosing commitTS immediately before ApplyMutations as max(store.LastCommitTS()+1, readTS+1) (with overflow handling), or require the caller to supply a commitTS/clock so catalog commits are always strictly increasing.

Suggested change
-commitTS := readTS + 1
-if commitTS == 0 {
-	return savePlan{}, errors.WithStack(ErrCatalogVersionOverflow)
-}
+// Choose a commit timestamp that is strictly greater than both the read
+// timestamp and the current store LastCommitTS to preserve ordering.
+candidateFromRead := readTS + 1
+if candidateFromRead == 0 {
+	return savePlan{}, errors.WithStack(ErrCatalogVersionOverflow)
+}
+currentLast := s.store.LastCommitTS()
+candidateFromStore := currentLast + 1
+if candidateFromStore == 0 {
+	return savePlan{}, errors.WithStack(ErrCatalogVersionOverflow)
+}
+commitTS := candidateFromRead
+if candidateFromStore > commitTS {
+	commitTS = candidateFromStore
+}

- recompute catalog commit TS at apply time to avoid stale planned timestamps
- ensure commit TS is greater than both planned minimum and current store LastCommitTS
- keep overflow checks for commit TS calculation
- add regression test for concurrent LastCommitTS advancement between prepare and apply
@bootjp bootjp requested a review from Copilot February 17, 2026 16:57
Owner Author

bootjp commented Feb 17, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces the foundational components for hotspot shard splitting by adding a durable and versioned route catalog. The changes include the core logic for catalog persistence, serialization/deserialization of route data, and a comprehensive test suite. Additionally, two markdown files are added, providing a detailed design document and a phased implementation plan. The code is well-structured and the tests are thorough. I have identified one critical compilation issue that needs to be addressed and one medium-severity suggestion for a potential performance improvement in the future.

Comment on lines +463 to +481
func (s *CatalogStore) buildSaveMutations(ctx context.Context, plan savePlan) ([]*store.KVPairMutation, error) {
	existingKeys, err := s.routeKeysAt(ctx, plan.readTS)
	if err != nil {
		return nil, err
	}

	mutations := make([]*store.KVPairMutation, 0, len(existingKeys)+len(plan.routes)+1)
	mutations = appendDeleteMutations(mutations, existingKeys)
	mutations, err = appendRouteMutations(mutations, plan.routes)
	if err != nil {
		return nil, err
	}
	mutations = append(mutations, &store.KVPairMutation{
		Op:    store.OpTypePut,
		Key:   CatalogVersionKey(),
		Value: EncodeCatalogVersion(plan.nextVersion),
	})
	return mutations, nil
}
Contributor

Severity: medium

The current implementation of buildSaveMutations uses a "delete-all, then write-all" strategy to update the route catalog. It fetches all existing keys, creates delete mutations for them, and then creates put mutations for the new set of routes. While simple, this can be inefficient if the catalog contains many routes and only a few are changing, as it leads to unnecessary writes and can increase transaction contention.

For a future improvement, consider implementing a diff-based approach:

  1. Read the full current set of RouteDescriptors, not just the keys.
  2. Compare the current routes with the new routes to identify which are added, modified, or deleted.
  3. Generate mutations only for the routes that have actually changed.

This would reduce write amplification and improve performance for large catalogs.
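
A minimal sketch of that diff-based alternative is below. It assumes hypothetical helpers routesAt (full descriptors keyed by route ID at the read timestamp), routeKey, encodeRoute, and routeEqual, plus a store.OpTypeDelete mutation op; only OpTypePut, CatalogVersionKey, and EncodeCatalogVersion appear in the quoted code, so treat the rest as illustrative.

func (s *CatalogStore) buildDiffMutations(ctx context.Context, plan savePlan) ([]*store.KVPairMutation, error) {
	current, err := s.routesAt(ctx, plan.readTS) // hypothetical: map[uint64]RouteDescriptor
	if err != nil {
		return nil, err
	}

	next := make(map[uint64]RouteDescriptor, len(plan.routes))
	for _, r := range plan.routes {
		next[r.RouteID] = r
	}

	mutations := make([]*store.KVPairMutation, 0, len(plan.routes)+1)

	// Delete only the routes that disappeared from the new set.
	for id := range current {
		if _, ok := next[id]; !ok {
			mutations = append(mutations, &store.KVPairMutation{
				Op:  store.OpTypeDelete, // assumed delete op
				Key: routeKey(id),       // hypothetical reserved-key helper
			})
		}
	}

	// Write only the routes that are new or whose contents changed.
	for id, r := range next {
		if old, ok := current[id]; ok && routeEqual(old, r) { // hypothetical comparison
			continue
		}
		encoded, err := encodeRoute(r) // hypothetical codec helper
		if err != nil {
			return nil, err
		}
		mutations = append(mutations, &store.KVPairMutation{
			Op:    store.OpTypePut,
			Key:   routeKey(id),
			Value: encoded,
		})
	}

	// The catalog version bump is still written unconditionally.
	mutations = append(mutations, &store.KVPairMutation{
		Op:    store.OpTypePut,
		Key:   CatalogVersionKey(),
		Value: EncodeCatalogVersion(plan.nextVersion),
	})
	return mutations, nil
}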

Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.


	out = append(out, cloneRouteDescriptor(route))
}
sort.Slice(out, func(i, j int) bool {
	return out[i].RouteID < out[j].RouteID
Copilot AI Feb 18, 2026

Routes are normalized/sorted by RouteID here, but the in-memory routing engine assumes routes are sorted by Start (key range) for binary search. Unless all callers re-sort by Start before applying a snapshot, returning RouteDescriptor slices in RouteID order is likely to cause subtle misrouting bugs. Consider sorting by Start (then End/RouteID for tie-breaks), or explicitly documenting the ordering contract in CatalogSnapshot and enforcing/validating it in Engine snapshot apply.

Suggested change
-	return out[i].RouteID < out[j].RouteID
+	ri, rj := out[i], out[j]
+	// Primary ordering: Start key (ascending).
+	if c := bytes.Compare(ri.Start, rj.Start); c != 0 {
+		return c < 0
+	}
+	// Secondary ordering: End key (ascending, with nil treated as empty slice).
+	if c := bytes.Compare(ri.End, rj.End); c != 0 {
+		return c < 0
+	}
+	// Tertiary ordering: RouteID for deterministic tie-breaking.
+	return ri.RouteID < rj.RouteID
