Skip to content

feat: release IPAM subnet on network delete to avoid range exhaustion#276

Open
moizpgedge wants to merge 1 commit intomainfrom
feat/PLAT-399/Implement-release-deallocate-method-on-IPAM-service
Open

feat: release IPAM subnet on network delete to avoid range exhaustion#276
moizpgedge wants to merge 1 commit intomainfrom
feat/PLAT-399/Implement-release-deallocate-method-on-IPAM-service

Conversation

@moizpgedge
Copy link
Contributor

@moizpgedge moizpgedge commented Feb 24, 2026

Summary

Adds subnet release on database network delete so the IPAM pool is reused and "range is full" no longer occurs after repeated create/delete. Release is best-effort and non-fatal (warnings only) when the subnet is invalid, missing, or out of range.

Changes

  • Add ReleaseSubnet/releaseSubnet on IPAM service to deallocate subnet when a database network is removed
  • Call ReleaseSubnet from swarm Network.Delete after successful NetworkRemove or ErrNotFound
  • Document workaround: use a different subnet CIDR to avoid conflicts (docs/installation/configuration.md)

Testing

  • go build ./server/... and go test ./server/internal/ipam/... ./server/internal/orchestrator/swarm/...

  • make test (full unit tests)

  • Manual: small CIDR (e.g. /26), create database → delete → create again; second create succeeds (no "range is full").

  • ...

Checklist

  • Tests added or updated (unit and/or e2e, as needed)
  • Documentation updated (if needed)
  • Issue is linked
  • Changelog entry added for user-facing behavior changes
  • Breaking changes (if any) are clearly called out in the PR description

Notes for Reviewers

  • Release is best-effort: all failure paths log a warning and return nil so network delete never fails due to IPAM.
  • Network.Delete removes the Docker network first, then calls ReleaseSubnet; order avoids leaking a subnet if release fails.

PLAT-399

- Add ReleaseSubnet/releaseSubnet on IPAM service to deallocate subnet when a database network is removed
- Call ReleaseSubnet from swarm Network.Delete after successful NetworkRemove or ErrNotFound
- Document workaround: use a different subnet CIDR to avoid conflicts (docs/installation/configuration.md)
@coderabbitai
Copy link

coderabbitai bot commented Feb 24, 2026

📝 Walkthrough

Walkthrough

This PR adds subnet release functionality to the IPAM service with retry logic and state validation. It introduces a ReleaseSubnet method, adds subnet containment validation, and integrates subnet deallocation into network deletion. Configuration documentation is updated with CIDR conflict guidance.

Changes

Cohort / File(s) Summary
Documentation
docs/installation/configuration.md
Added guidance on avoiding subnet range conflicts with non-overlapping CIDR values and Control Plane restarts; updated configuration example with new default CIDR (10.129.0.0/18).
IPAM Service
server/internal/ipam/service.go
Implemented ReleaseSubnet method with retry logic, state validation, and best-effort semantics for releasing subnets back to the pool with conflict handling.
IPAM Utilities
server/internal/ipam/subnet.go
Added public Contains method to SubnetRange type for validating subnet containment checks.
Network Management
server/internal/orchestrator/swarm/network.go
Simplified error handling in Delete method and integrated subnet deallocation via ReleaseSubnet call as a best-effort side-effect.

Poem

🐰 Our subnets now dance with precision and grace,
Released to their pools, each one finds its place,
No conflicts remain when we carefully choose,
The IPAM service wins, networks don't lose! 🌿

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: releasing IPAM subnets on network deletion to prevent range exhaustion.
Description check ✅ Passed The PR description includes all major sections (Summary, Changes, Testing, Checklist) and provides comprehensive context about the implementation and testing approach.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/PLAT-399/Implement-release-deallocate-method-on-IPAM-service

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@moizpgedge moizpgedge requested a review from mmols February 24, 2026 05:19
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/installation/configuration.md`:
- Around line 49-59: Clarify the contradictory guidance about changing
docker_swarm.database_networks_cidr and database_networks_subnet_bits:
explicitly state that these values must not be changed after any databases are
created (or only may be changed if all databases using the old range have been
deleted and re-created), and add a prominent warning/admonition callout
indicating the risk and required steps (delete databases or perform full
migration) before changing; also specify what the CIDR must not overlap with
(e.g., host Docker bridge networks, other Docker Swarm IPAM ranges, or any
on-prem/VPC networks) and note that the Control Plane restart is required only
when performed before creating databases or after completing the safe migration
steps.

In `@server/internal/ipam/service.go`:
- Around line 56-71: The loop in releaseSubnet uses releaseMaxRetries with for
retries := releaseMaxRetries; retries >= 0; retries-- which makes 1 +
releaseMaxRetries attempts (e.g. releaseMaxRetries=2 -> 3 attempts) and is
inconsistent with allocateSubnet's semantics; fix by either renaming the
constant to reflect attempts (e.g. releaseMaxAttempts = 3) and keep the loop, or
keep releaseMaxRetries and change the loop to run retries > 0 (so it performs 1
initial attempt plus releaseMaxRetries retries); update the declaration of
releaseMaxRetries/releaseMaxAttempts and the loop around s.releaseSubnet(ctx,
prefix, bits, subnet) accordingly to restore consistent semantics and naming.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a7e8a31 and 65bf163.

📒 Files selected for processing (4)
  • docs/installation/configuration.md
  • server/internal/ipam/service.go
  • server/internal/ipam/subnet.go
  • server/internal/orchestrator/swarm/network.go

Comment on lines +49 to +59

To avoid subnet range conflicts, use a non-overlapping CIDR. Restart the Control Plane after changing.

```json
{
"docker_swarm": {
"database_networks_cidr": "10.129.0.0/18",
"database_networks_subnet_bits": 26
}
}
```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Contradictory guidance: table says "Must not be changed after creating databases" but the note encourages changing CIDR.

Lines 46–47 state the CIDR and subnet bits "Must not be changed after creating databases," yet this new note advises users to switch to a different CIDR and restart. This will confuse operators—clarify when this workaround is safe (e.g., only before any databases exist, or only after all databases on the old range are deleted) and consider adding a warning/admonition callout for visibility.

Also, the note doesn't explain what the CIDR should not overlap with (host Docker networks? other IPAM ranges?). A brief clarification would help.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/installation/configuration.md` around lines 49 - 59, Clarify the
contradictory guidance about changing docker_swarm.database_networks_cidr and
database_networks_subnet_bits: explicitly state that these values must not be
changed after any databases are created (or only may be changed if all databases
using the old range have been deleted and re-created), and add a prominent
warning/admonition callout indicating the risk and required steps (delete
databases or perform full migration) before changing; also specify what the CIDR
must not overlap with (e.g., host Docker bridge networks, other Docker Swarm
IPAM ranges, or any on-prem/VPC networks) and note that the Control Plane
restart is required only when performed before creating databases or after
completing the safe migration steps.

Comment on lines +56 to +71
var lastErr error
for retries := releaseMaxRetries; retries >= 0; retries-- {
lastErr = s.releaseSubnet(ctx, prefix, bits, subnet)
if lastErr == nil {
return nil
}
if !errors.Is(lastErr, storage.ErrValueVersionMismatch) {
// Non-retryable: log and succeed so delete is not blocked
s.logger.Warn().
Err(lastErr).
Str("subnet", subnet.String()).
Str("prefix", prefix.String()).
Msg("release subnet failed (non-fatal)")
return nil
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Off-by-one between constant name and actual attempt count.

releaseMaxRetries = 2 combined with for retries := releaseMaxRetries; retries >= 0; retries-- yields 3 attempts (iterations at 2, 1, 0). The name "max retries" conventionally means retries after the first attempt. Compare with allocateSubnet where maxRetries = 3 yields exactly 3 total attempts.

Either rename the constant (e.g. releaseMaxAttempts = 3) or adjust the loop to retries > 0 so the semantics match.

Option A – rename to reflect actual semantics
 const (
 	maxRetries        = 3
-	releaseMaxRetries = 2
+	releaseMaxAttempts = 3
 )
 ...
-	for retries := releaseMaxRetries; retries >= 0; retries-- {
+	for attempt := 0; attempt < releaseMaxAttempts; attempt++ {
Option B – keep constant, fix loop to mean 1 + 2 retries
-	for retries := releaseMaxRetries; retries >= 0; retries-- {
+	var lastErr error
+	for attempt := 0; attempt <= releaseMaxRetries; attempt++ {

(Either option keeps 3 total attempts; the point is consistency between the name and the loop.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/ipam/service.go` around lines 56 - 71, The loop in
releaseSubnet uses releaseMaxRetries with for retries := releaseMaxRetries;
retries >= 0; retries-- which makes 1 + releaseMaxRetries attempts (e.g.
releaseMaxRetries=2 -> 3 attempts) and is inconsistent with allocateSubnet's
semantics; fix by either renaming the constant to reflect attempts (e.g.
releaseMaxAttempts = 3) and keep the loop, or keep releaseMaxRetries and change
the loop to run retries > 0 (so it performs 1 initial attempt plus
releaseMaxRetries retries); update the declaration of
releaseMaxRetries/releaseMaxAttempts and the loop around s.releaseSubnet(ctx,
prefix, bits, subnet) accordingly to restore consistent semantics and naming.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant