feat: release IPAM subnet on network delete to avoid range exhaustion#276
feat: release IPAM subnet on network delete to avoid range exhaustion#276moizpgedge wants to merge 1 commit intomainfrom
Conversation
- Add ReleaseSubnet/releaseSubnet on IPAM service to deallocate subnet when a database network is removed - Call ReleaseSubnet from swarm Network.Delete after successful NetworkRemove or ErrNotFound - Document workaround: use a different subnet CIDR to avoid conflicts (docs/installation/configuration.md)
📝 WalkthroughWalkthroughThis PR adds subnet release functionality to the IPAM service with retry logic and state validation. It introduces a ReleaseSubnet method, adds subnet containment validation, and integrates subnet deallocation into network deletion. Configuration documentation is updated with CIDR conflict guidance. Changes
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/installation/configuration.md`:
- Around line 49-59: Clarify the contradictory guidance about changing
docker_swarm.database_networks_cidr and database_networks_subnet_bits:
explicitly state that these values must not be changed after any databases are
created (or only may be changed if all databases using the old range have been
deleted and re-created), and add a prominent warning/admonition callout
indicating the risk and required steps (delete databases or perform full
migration) before changing; also specify what the CIDR must not overlap with
(e.g., host Docker bridge networks, other Docker Swarm IPAM ranges, or any
on-prem/VPC networks) and note that the Control Plane restart is required only
when performed before creating databases or after completing the safe migration
steps.
In `@server/internal/ipam/service.go`:
- Around line 56-71: The loop in releaseSubnet uses releaseMaxRetries with for
retries := releaseMaxRetries; retries >= 0; retries-- which makes 1 +
releaseMaxRetries attempts (e.g. releaseMaxRetries=2 -> 3 attempts) and is
inconsistent with allocateSubnet's semantics; fix by either renaming the
constant to reflect attempts (e.g. releaseMaxAttempts = 3) and keep the loop, or
keep releaseMaxRetries and change the loop to run retries > 0 (so it performs 1
initial attempt plus releaseMaxRetries retries); update the declaration of
releaseMaxRetries/releaseMaxAttempts and the loop around s.releaseSubnet(ctx,
prefix, bits, subnet) accordingly to restore consistent semantics and naming.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
docs/installation/configuration.mdserver/internal/ipam/service.goserver/internal/ipam/subnet.goserver/internal/orchestrator/swarm/network.go
|
|
||
| To avoid subnet range conflicts, use a non-overlapping CIDR. Restart the Control Plane after changing. | ||
|
|
||
| ```json | ||
| { | ||
| "docker_swarm": { | ||
| "database_networks_cidr": "10.129.0.0/18", | ||
| "database_networks_subnet_bits": 26 | ||
| } | ||
| } | ||
| ``` |
There was a problem hiding this comment.
Contradictory guidance: table says "Must not be changed after creating databases" but the note encourages changing CIDR.
Lines 46–47 state the CIDR and subnet bits "Must not be changed after creating databases," yet this new note advises users to switch to a different CIDR and restart. This will confuse operators—clarify when this workaround is safe (e.g., only before any databases exist, or only after all databases on the old range are deleted) and consider adding a warning/admonition callout for visibility.
Also, the note doesn't explain what the CIDR should not overlap with (host Docker networks? other IPAM ranges?). A brief clarification would help.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/installation/configuration.md` around lines 49 - 59, Clarify the
contradictory guidance about changing docker_swarm.database_networks_cidr and
database_networks_subnet_bits: explicitly state that these values must not be
changed after any databases are created (or only may be changed if all databases
using the old range have been deleted and re-created), and add a prominent
warning/admonition callout indicating the risk and required steps (delete
databases or perform full migration) before changing; also specify what the CIDR
must not overlap with (e.g., host Docker bridge networks, other Docker Swarm
IPAM ranges, or any on-prem/VPC networks) and note that the Control Plane
restart is required only when performed before creating databases or after
completing the safe migration steps.
| var lastErr error | ||
| for retries := releaseMaxRetries; retries >= 0; retries-- { | ||
| lastErr = s.releaseSubnet(ctx, prefix, bits, subnet) | ||
| if lastErr == nil { | ||
| return nil | ||
| } | ||
| if !errors.Is(lastErr, storage.ErrValueVersionMismatch) { | ||
| // Non-retryable: log and succeed so delete is not blocked | ||
| s.logger.Warn(). | ||
| Err(lastErr). | ||
| Str("subnet", subnet.String()). | ||
| Str("prefix", prefix.String()). | ||
| Msg("release subnet failed (non-fatal)") | ||
| return nil | ||
| } | ||
| } |
There was a problem hiding this comment.
Off-by-one between constant name and actual attempt count.
releaseMaxRetries = 2 combined with for retries := releaseMaxRetries; retries >= 0; retries-- yields 3 attempts (iterations at 2, 1, 0). The name "max retries" conventionally means retries after the first attempt. Compare with allocateSubnet where maxRetries = 3 yields exactly 3 total attempts.
Either rename the constant (e.g. releaseMaxAttempts = 3) or adjust the loop to retries > 0 so the semantics match.
Option A – rename to reflect actual semantics
const (
maxRetries = 3
- releaseMaxRetries = 2
+ releaseMaxAttempts = 3
)
...
- for retries := releaseMaxRetries; retries >= 0; retries-- {
+ for attempt := 0; attempt < releaseMaxAttempts; attempt++ {Option B – keep constant, fix loop to mean 1 + 2 retries
- for retries := releaseMaxRetries; retries >= 0; retries-- {
+ var lastErr error
+ for attempt := 0; attempt <= releaseMaxRetries; attempt++ {(Either option keeps 3 total attempts; the point is consistency between the name and the loop.)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/internal/ipam/service.go` around lines 56 - 71, The loop in
releaseSubnet uses releaseMaxRetries with for retries := releaseMaxRetries;
retries >= 0; retries-- which makes 1 + releaseMaxRetries attempts (e.g.
releaseMaxRetries=2 -> 3 attempts) and is inconsistent with allocateSubnet's
semantics; fix by either renaming the constant to reflect attempts (e.g.
releaseMaxAttempts = 3) and keep the loop, or keep releaseMaxRetries and change
the loop to run retries > 0 (so it performs 1 initial attempt plus
releaseMaxRetries retries); update the declaration of
releaseMaxRetries/releaseMaxAttempts and the loop around s.releaseSubnet(ctx,
prefix, bits, subnet) accordingly to restore consistent semantics and naming.
Summary
Adds subnet release on database network delete so the IPAM pool is reused and "range is full" no longer occurs after repeated create/delete. Release is best-effort and non-fatal (warnings only) when the subnet is invalid, missing, or out of range.
Changes
Testing
go build ./server/... and go test ./server/internal/ipam/... ./server/internal/orchestrator/swarm/...
make test (full unit tests)
Manual: small CIDR (e.g. /26), create database → delete → create again; second create succeeds (no "range is full").
...
Checklist
Notes for Reviewers
PLAT-399