
@jhrv jhrv commented Feb 12, 2026

Problem

Migration fails when the source Cloud SQL instance has any of these features enabled (#240):

  • High Availability (HA)
  • Point-in-Time Recovery (PITR)

Root cause

DefineInstance() uses DeepCopy() to copy the source instance spec to the target. This copies all settings, including HA and PITR. The target instance is then created with these settings via the helper app.

During setup, DMS calls demoteTargetInstance() to convert the target to a read replica. This fails because:

  • A DMS-managed replica cannot have PITR enabled (DMS controls the WAL stream)
  • HA (availability type REGIONAL) is incompatible with being demoted to a DMS replica
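
A minimal sketch of the failure mode described above. The `InstanceSpec` and `Settings` types and the `demoteBlockers` helper are hypothetical, simplified stand-ins and not the migrator's real types; they only illustrate how a deep copy carries HA and PITR over to the target.

```go
package main

import "fmt"

// Settings is a hypothetical, simplified stand-in for the Cloud SQL settings
// the migrator copies; the real spec has many more fields.
type Settings struct {
	AvailabilityType string // "REGIONAL" (HA) or "ZONAL"
	PITREnabled      bool   // point-in-time recovery
	Tier             string
}

// InstanceSpec is a hypothetical, simplified instance spec.
type InstanceSpec struct {
	Name     string
	Settings *Settings
}

// DeepCopy returns an independent copy, including every setting.
func (s *InstanceSpec) DeepCopy() *InstanceSpec {
	out := *s
	if s.Settings != nil {
		settings := *s.Settings
		out.Settings = &settings
	}
	return &out
}

// demoteBlockers lists the settings that, per the root cause above, prevent
// the target from being demoted to a DMS replica.
func demoteBlockers(spec *InstanceSpec) []string {
	var blockers []string
	if spec.Settings == nil {
		return blockers
	}
	if spec.Settings.AvailabilityType == "REGIONAL" {
		blockers = append(blockers, "availabilityType=REGIONAL")
	}
	if spec.Settings.PITREnabled {
		blockers = append(blockers, "pointInTimeRecoveryEnabled=true")
	}
	return blockers
}

func main() {
	source := &InstanceSpec{
		Name:     "source",
		Settings: &Settings{AvailabilityType: "REGIONAL", PITREnabled: true, Tier: "db-f1-micro"},
	}
	// The copy inherits HA and PITR unless they are stripped afterwards.
	target := source.DeepCopy()
	target.Name = "target"
	fmt.Println(demoteBlockers(target))
}
```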

Fix

Changes in instance.go:

  1. CreateInstance() – After calling DefineInstance(), strip HA and PITR from the helper app spec before creating the target instance. This prevents GCP from creating an instance with incompatible settings.

  2. PrepareTargetInstance() – Also explicitly disable PITR (pointInTimeRecoveryEnabled = false) and set availabilityType = ZONAL on the CNRM spec as a safety net.

  3. ValidateSourceInstance() – Log clear warnings when the source has HA or PITR enabled, explaining they will be temporarily disabled during migration.

  4. UpdateTargetInstanceAfterPromotion() – After DMS promote completes, explicitly restore HA (availabilityType=REGIONAL) and PITR from the source CNRM spec onto the target instance.
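
The pre-migration part of the list above (warn about the source, then strip HA and PITR from the target) could look roughly like this. The types and the `sourceWarnings` and `stripForMigration` helpers are hypothetical, simplified stand-ins for the real specs in instance.go, and the warning texts are illustrative.

```go
package main

import "fmt"

// BackupConfiguration, Settings and InstanceSpec are hypothetical, simplified
// stand-ins for the real instance spec types.
type BackupConfiguration struct {
	PointInTimeRecoveryEnabled bool
}

type Settings struct {
	AvailabilityType    string // "REGIONAL" means HA, "ZONAL" means no HA
	BackupConfiguration *BackupConfiguration
}

type InstanceSpec struct {
	Settings Settings
}

// sourceWarnings returns one message per feature that will be switched off
// on the target for the duration of the migration.
func sourceWarnings(source InstanceSpec) []string {
	var warnings []string
	if source.Settings.AvailabilityType == "REGIONAL" {
		warnings = append(warnings, "source has HA enabled; it will be temporarily disabled on the target during migration")
	}
	if bc := source.Settings.BackupConfiguration; bc != nil && bc.PointInTimeRecoveryEnabled {
		warnings = append(warnings, "source has PITR enabled; it will be temporarily disabled on the target during migration")
	}
	return warnings
}

// stripForMigration forces the target to ZONAL and turns PITR off.
func stripForMigration(target *InstanceSpec) {
	target.Settings.AvailabilityType = "ZONAL"
	if target.Settings.BackupConfiguration != nil {
		target.Settings.BackupConfiguration.PointInTimeRecoveryEnabled = false
	}
}

func main() {
	source := InstanceSpec{Settings: Settings{
		AvailabilityType:    "REGIONAL",
		BackupConfiguration: &BackupConfiguration{PointInTimeRecoveryEnabled: true},
	}}
	for _, w := range sourceWarnings(source) {
		fmt.Println("WARN:", w)
	}

	// Copy the backup configuration so the source spec is left untouched.
	target := source
	bc := *source.Settings.BackupConfiguration
	target.Settings.BackupConfiguration = &bc
	stripForMigration(&target)
	fmt.Printf("target: %s pitr=%v\n",
		target.Settings.AvailabilityType,
		target.Settings.BackupConfiguration.PointInTimeRecoveryEnabled)
}
```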

Changes in promote.go:

  1. Update the UpdateTargetInstanceAfterPromotion() call to pass the source instance, so that its settings can be restored on the target.

Changes in database.go:

  1. SetDatabasePassword() – Clear DatabaseRoles from the user object before calling Users.Update. The Cloud SQL Admin API returns databaseRoles in GET responses but rejects them in Update requests with "Invalid request to update database roles". This is a pre-existing bug, unrelated to HA/PITR, that was discovered during testing.
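
A sketch of the idea behind this fix, using only local stand-ins. The `User` struct, the `updateFunc` type, and `fakeUpdate` are hypothetical and merely mimic the behaviour described above; they are not the Cloud SQL Admin client. Unlike the change described, this sketch clears the roles on a copy of the user, which is an assumption made here to keep the caller's object intact.

```go
package main

import (
	"errors"
	"fmt"
)

// User is a hypothetical stand-in for the user resource returned by the API.
type User struct {
	Name          string
	Password      string
	DatabaseRoles []string
}

// updateFunc is a hypothetical stand-in for the Users.Update call.
type updateFunc func(u *User) error

// setPassword sends an update with the new password and without database
// roles. It works on a copy so the caller's user object keeps its roles.
func setPassword(u *User, password string, update updateFunc) error {
	req := *u
	req.Password = password
	req.DatabaseRoles = nil // returned by GET, rejected by Update
	return update(&req)
}

// fakeUpdate mimics the rejection described above when roles are present.
func fakeUpdate(u *User) error {
	if len(u.DatabaseRoles) > 0 {
		return errors.New("Invalid request to update database roles")
	}
	return nil
}

func main() {
	user := &User{Name: "app", DatabaseRoles: []string{"some_role"}}

	// Sending the object as fetched fails in this simulation.
	fmt.Println("unmodified:", fakeUpdate(user))

	// Clearing the roles first lets the update through.
	fmt.Println("cleared:", setPassword(user, "new-password", fakeUpdate))
}
```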

Re-enabling after migration

After promotion, UpdateTargetInstanceAfterPromotion() explicitly restores HA and PITR settings by reading the source CNRM spec and applying them to the target. Additionally, UpdateApplicationInstance() applies the original app spec via naiserator reconciliation.
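
The explicit restore could be sketched as follows. The types and the `restoreAfterPromotion` helper are hypothetical simplifications and not the actual CNRM spec types; the helper returns a description of each change so the restore can be logged.

```go
package main

import "fmt"

// BackupConfiguration, Settings and InstanceSpec are hypothetical, simplified
// stand-ins for the real CNRM spec types.
type BackupConfiguration struct {
	Enabled                    bool
	PointInTimeRecoveryEnabled bool
}

type Settings struct {
	AvailabilityType    string
	BackupConfiguration *BackupConfiguration
}

type InstanceSpec struct {
	Settings Settings
}

// restoreAfterPromotion copies the availability type and the backup/PITR
// settings from source onto target and returns one line per change made.
func restoreAfterPromotion(source InstanceSpec, target *InstanceSpec) []string {
	var changes []string
	if target.Settings.AvailabilityType != source.Settings.AvailabilityType {
		changes = append(changes, fmt.Sprintf("availabilityType: %q -> %q",
			target.Settings.AvailabilityType, source.Settings.AvailabilityType))
		target.Settings.AvailabilityType = source.Settings.AvailabilityType
	}
	if src := source.Settings.BackupConfiguration; src != nil {
		restored := *src // copy so the target does not share the source pointer
		old := target.Settings.BackupConfiguration
		if old == nil || *old != restored {
			changes = append(changes, fmt.Sprintf("backupConfiguration: enabled=%v pitr=%v",
				restored.Enabled, restored.PointInTimeRecoveryEnabled))
			target.Settings.BackupConfiguration = &restored
		}
	}
	return changes
}

func main() {
	source := InstanceSpec{Settings: Settings{
		AvailabilityType:    "REGIONAL",
		BackupConfiguration: &BackupConfiguration{Enabled: true, PointInTimeRecoveryEnabled: true},
	}}
	// State of the target right after promotion: ZONAL, PITR off.
	target := InstanceSpec{Settings: Settings{
		AvailabilityType:    "ZONAL",
		BackupConfiguration: &BackupConfiguration{Enabled: true},
	}}
	for _, change := range restoreAfterPromotion(source, &target) {
		fmt.Println("restored", change)
	}
}
```

Returning the list of changes makes the function idempotent to call and easy to log, which matches the stated goal of making the restore visible rather than relying on implicit reconciliation.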

Testing

Verified end-to-end with a real migration in nav-dev/nais:

Source instance: migrator-test-ha-pitr

  • POSTGRES_15, HA (REGIONAL), PITR enabled, pgaudit enabled, db-f1-micro

Target instance: migrator-test-target

  • POSTGRES_16

Results

Steps verified:

  • Validate source (HA/PITR warnings logged)
  • Create target with ZONAL + no PITR
  • DMS migration setup + CDC replication
  • DMS promote (demote target as replica)
  • Restore HA (REGIONAL) on target after promote
  • Restore PITR on target after promote
  • pgaudit flag survived migration
  • Final target: POSTGRES_16, REGIONAL, PITR=true

What about audit logging?

The cloudsql.enable_pgaudit database flag is copied via DeepCopy() and was not stripped. Testing confirmed it does not break the DMS migration – the flag survived the full setup→promote cycle without issues.

Fixes #240

/cc @mortenlj

DMS migration fails when the source instance has High Availability or
Point-in-Time Recovery enabled, because these settings are copied to
the target via DeepCopy() in DefineInstance(). The target is then
demoted to a read replica by DMS, which is incompatible with HA and PITR.

Changes:
- Strip HA and PITR from helper app spec in CreateInstance()
- Disable PITR and set availability to ZONAL in PrepareTargetInstance()
- Add warnings in ValidateSourceInstance() about temporarily disabled features

After promotion, HA and PITR are automatically re-enabled when
UpdateApplicationInstance() applies the original app spec (which still
has these settings) via naiserator reconciliation.

Fixes #240
@jhrv jhrv self-assigned this Feb 12, 2026
@jhrv jhrv requested a review from mortenlj February 12, 2026 14:23

@mortenlj mortenlj left a comment


👍

jhrv and others added 3 commits February 12, 2026 15:37
Instead of relying on implicit re-enabling through naiserator
reconciliation, explicitly restore AvailabilityType, backup and PITR
settings from the source instance onto the target CNRM spec in
UpdateTargetInstanceAfterPromotion().

This makes the restore visible in logs and less fragile.

The Cloud SQL Admin API returns databaseRoles in GET responses but
rejects them in Update requests with 'Invalid request to update
database roles'. Clear the field before sending the update.
@jhrv jhrv merged commit 2a961a9 into main Feb 13, 2026
4 checks passed
@jhrv jhrv deleted the fix/disable-pitr-ha-for-migration branch February 13, 2026 10:04

mortenlj commented Feb 13, 2026

> The cloudsql.enable_pgaudit database flag is copied via DeepCopy() and was not stripped. Testing confirmed it does not break the DMS migration – the flag survived the full setup→promote cycle without issues.

I wonder whether the problem only appears after you have run nais postgres enable-audit, which is the last step in Enable audit logging.

I just ran a test migrating some databases where I had turned on audit according to our documentation, and all three failed at the same place. The migration job at Google fails on startup with an Internal error and recommends contacting support.

I'll leave the test apps and databases as they are for now in case anyone wants to investigate further, and I'll try ( 😅 ) to remember to delete them after the winter break 😄

Test apps/databases in dev-nais-dev, namespace basseng:

APP          SOURCE             TARGET
db-tester-1  not-same-as-app-1  db-tester-name-1
db-tester-2  not-same-as-app-2  db-tester-name-2
db-tester-3  not-same-as-app-3  db-tester-name-3


Development

Successfully merging this pull request may close these issues.

Migration doesn't work when certain features are enabled
