Revise disaster recovery documentation for consistency, etc. #277
susan-pgedge wants to merge 9 commits into main
Updated formatting and improved clarity in the disaster recovery guide.
📝 Walkthrough

Updated the Disaster Recovery guide and site navigation: the guide's prose, formatting, section order, prerequisites, variable setup, and procedure blocks (Swarm, Control Plane, etcd/data-volume) were reorganized and reworded for clarity; mkdocs.yml adds a Troubleshooting nav entry linking the guide.
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Revised instructions for restoring Docker Swarm and Control Plane, including clearer steps and formatting improvements.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/disaster-recovery/disaster-recovery.md`:
- Line 435: The table row linking "Docker Swarm commands hang" uses a broken
anchor `#reinitializing-the-swarm`; update that anchor in the table entry (the
row containing "Docker Swarm commands hang") to the correct heading anchor
`#restoring-docker-swarm` which matches the "Restoring Docker Swarm" section
(the "1. Reinitialize the Swarm" heading) so the link resolves.
- Line 435: The table row referencing the broken anchor
'#reinitializing-the-swarm' (the "Docker Swarm commands hang" entry) points to a
numbered list item that generates no ID; update that link target to the existing
valid heading anchor '#restoring-docker-swarm' or alternatively convert the "1.
Reinitialize the Swarm" list item into a proper level-2 heading "Reinitializing
the Swarm" so the '#reinitializing-the-swarm' anchor becomes valid; edit the
table cell replacing '#reinitializing-the-swarm' with '#restoring-docker-swarm'
unless you prefer adding a new "## Reinitializing the Swarm" heading to match
the current link.
- Around line 109-168: The numbered step headings ("1. Reinitialize the Swarm",
"2. Join Hosts to the New Swarm", "3. Removing Old Swarm Nodes" and the
equivalent headings in "Restoring the Control Plane" and "Re-adding Hosts") are
indented four spaces, which makes them render as code blocks; outdent those list
markers to 0–3 spaces so they become proper ordered list items, and ensure the
fenced blocks use an unindented ```bash line before and after the commands
(e.g., move the opening ```bash to its own non-indented line and remove the
leading four-space indentation from the "docker swarm init ..." and subsequent
list markers). Apply this same fix to each affected section heading and its
surrounding fenced code blocks.
- Around line 109-168: The numbered list markers ("1.", "2.", "3.") under the
"Reinitialize the Swarm", "Join Hosts to the New Swarm", and "Removing Old Swarm
Nodes" headings are prefixed with 4 spaces causing Python-Markdown to render
them as code blocks; remove the leading 4-space indentation from the lines that
start with the list markers so they are real list items, but keep the following
code blocks (the fenced ```bash``` blocks and the indented command lines like
"docker swarm init --force-new-cluster ..." and "docker swarm join --token ...
${RECOVERY_HOST_IP}:2377") indented exactly as shown so they remain code blocks;
update the three sections' numbered lines accordingly (the list marker lines
shown in the diff and any other similarly-indented numbered steps in "Restoring
the Control Plane" and "Re-adding Hosts").
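Both anchor comments depend on how a heading's text turns into an `id`. As a rough check, the hypothetical `slugify` function below approximates the slug that Python-Markdown's `toc` extension (used by MkDocs) produces by default for plain ASCII headings: lowercase the text, drop punctuation, and collapse runs of spaces and hyphens into a single `-`. It is a sketch under that assumption only; it ignores Unicode normalization and the suffix added to duplicate headings. A numbered list item such as "1. Reinitialize the Swarm" gets no `id` at all, because it is not a heading.

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximate the anchor id that a Markdown heading
# receives from Python-Markdown's default toc slugify (ASCII text only).
# Steps: lowercase; keep only letters, digits, spaces, underscores and
# hyphens; trim surrounding spaces; collapse runs of spaces/hyphens to "-".
slugify() {
  printf '%s\n' "$1" |
    tr '[:upper:]' '[:lower:]' |
    LC_ALL=C sed -E 's/[^a-z0-9 _-]//g' |
    LC_ALL=C sed -E 's/^ +//; s/ +$//; s/[ -]+/-/g'
}

# Usage: compare a link target against the heading it should point at.
slugify "Restoring Docker Swarm"                   # restoring-docker-swarm
slugify "Updating Databases to Remove Old Hosts"   # updating-databases-to-remove-old-hosts
```

If the printed slug does not match the `#fragment` in a link, the link will not resolve once the page is built.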
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/disaster-recovery/disaster-recovery.md (1)
273-289: ⚠️ Potential issue | 🟡 Minor
markdownlint MD046: fenced blocks vs. indented style.
MD046 expects indented code blocks in these sections, so the newly fenced blocks will keep failing lint. Either convert these to indented blocks or adjust the rule for this doc.
✏️ Representative conversion (apply to all flagged blocks)
-```sh
-curl -X POST \
-  "http://${RECOVERY_HOST_IP}:${API_PORT}/v1/databases/<DB_ID>?remove_host=<LOST_HOST_ID>" \
-  -H "Content-Type: application/json" \
-  -d '{
-    "spec": {
-      "database_name": "<DB_NAME>",
-      "database_users": [{"username": "admin", "db_owner": true,
-                          "attributes": ["SUPERUSER", "LOGIN"]}],
-      "port": 5432,
-      "nodes": [
-        { "name": "n1", "host_ids": ["host-1"] },
-        { "name": "n2", "host_ids": ["host-2"] }
-      ]
-    }
-  }'
-```
+    curl -X POST \
+      "http://${RECOVERY_HOST_IP}:${API_PORT}/v1/databases/<DB_ID>?remove_host=<LOST_HOST_ID>" \
+      -H "Content-Type: application/json" \
+      -d '{
+        "spec": {
+          "database_name": "<DB_NAME>",
+          "database_users": [{"username": "admin", "db_owner": true,
+                              "attributes": ["SUPERUSER", "LOGIN"]}],
+          "port": 5432,
+          "nodes": [
+            { "name": "n1", "host_ids": ["host-1"] },
+            { "name": "n2", "host_ids": ["host-2"] }
+          ]
+        }
+      }'

Also applies to: 301-305, 322-325, 386-403, 413-416
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/disaster-recovery/disaster-recovery.md` around lines 273 - 289, Convert the fenced shell code blocks in disaster-recovery.md (the curl POST examples such as the block starting with curl -X POST "http://${RECOVERY_HOST_IP}:${API_PORT}/v1/databases/<DB_ID>?remove_host=<LOST_HOST_ID>" ...) to indented code block style to satisfy markdownlint MD046 (or alternatively add an explicit rule disable for MD046 in this document's frontmatter); apply the same conversion to the other flagged blocks (the additional curl/command examples referenced in the review) so all examples use indented code blocks or the rule is disabled consistently.
♻️ Duplicate comments (1)
docs/disaster-recovery/disaster-recovery.md (1)
109-167: ⚠️ Potential issue | 🟡 Minor
Outdent top‑level numbered steps so they render as lists.
These numbered steps are still indented four spaces at the document root, so Markdown will render them as code blocks rather than ordered lists. Please outdent the list markers (and keep their content indented) in all three sections.
✏️ Representative fix (apply to all three sections)
-    1. Reinitialize the Swarm
+1. Reinitialize the Swarm

-        On a surviving manager, invoke the following command:
+    On a surviving manager, invoke the following command:

     ```bash
     docker swarm init --force-new-cluster \
       --advertise-addr ${RECOVERY_HOST_IP}
     ```

Also applies to: 195-265, 340-377
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/disaster-recovery/disaster-recovery.md` around lines 109 - 167, The numbered sections "1. Reinitialize the Swarm", "2. Join Hosts to the New Swarm", and "3. Removing Old Swarm Nodes" are indented and render as code blocks; outdent each top-level list marker (the "1.", "2.", "3." lines) to the document root while keeping the subsequent code fences and wrapped lines indented as shown (e.g., the docker swarm init block, docker swarm join-token / join examples, and docker node demote / docker service rm examples). Apply the same outdent fix to the other occurrences noted (the ranges around lines 195-265 and 340-377) so all three sections render as proper ordered lists instead of code blocks.
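The outdent fix requested in these comments can be located and applied mechanically. A minimal sketch, assuming `sed -E` is available (GNU or BSD) and that the offending step markers sit at exactly four spaces of indentation: the hypothetical `find_indented_steps` lists candidate lines, and the hypothetical `outdent_steps` writes a copy of the file with only those marker lines moved to column 0, leaving the indented prose, fences, and commands beneath them untouched. Review the matches before committing, since a four-space numbered line inside a legitimately nested list or an indented code block would match as well.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the list-indentation issue described above.

# Print "line:text" for every numbered step marker indented by exactly
# four spaces (e.g. "    1. Reinitialize the Swarm").
find_indented_steps() {
  grep -nE '^ {4}[0-9]+\. ' "$1"
}

# Print the file with those marker lines moved to column 0.
# Lines that merely follow a marker (prose, fences, commands) are unchanged.
outdent_steps() {
  sed -E 's/^ {4}([0-9]+\. )/\1/' "$1"
}

# Usage (path taken from the review comments):
# find_indented_steps docs/disaster-recovery/disaster-recovery.md
# outdent_steps docs/disaster-recovery/disaster-recovery.md > /tmp/disaster-recovery.md
```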
- all server-mode hosts were offline: On a Swarm manager, stop all
  Control Plane services with the command:
  `docker service scale control-plane_<host-id-1>=0 control-plane_<host-id-2>=0 ...`
  Then, restore the data volume from your
  backup (see [Data Volume Restore](#data-volume-restore)).
- at least one server-mode host was still up: Use that host as the
  recovery host. You do not need to stop services or restore the volume
  first.
- you already completed [Restoring Docker Swarm](#restoring-docker-swarm):
  Restore the data volume on the surviving host if it was lost (see
  [Data volume restore](#data-volume-restore)); otherwise skip.

Then on the recovery host, perform the following steps:
Stop Control Plane services before moving live etcd data.
Step 1 renames the etcd directory. If the recovery host still has Control Plane services running, moving that directory can corrupt or race active processes. Add an explicit stop/scale‑down step before the backup/restore actions.
🛠️ Suggested clarification
Then on the recovery host, perform the following steps:
+If any Control Plane services are running on the recovery host, stop them
+before step 1 to avoid moving live etcd data (for example:
+`docker service scale control-plane_<host-id>=0`).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/disaster-recovery/disaster-recovery.md` around lines 181 - 193, The doc
omits an explicit step to stop/scale down Control Plane services on the recovery
host before renaming/moving the etcd directory (Step 1), which can cause
corruption/races; update the "Then on the recovery host, perform the following
steps:" sequence to add a clear pre-step that instructs operators to stop or
scale down Control Plane services (e.g., using docker service scale
control-plane_<host-id>=0 or equivalent) on the recovery host prior to any
backup/restore or moving of the etcd data directory, and reference the related
sections ("Restoring Docker Swarm" / "Data Volume Restore") so readers know when
to skip the stop if services are already offline.
Does this require an update?
Yes, we should update it. For the case where at least one server-mode host was still up, add a step to stop the Control Plane service on the recovery host (e.g. scale it to 0) before they run the steps that rename/move the etcd directory, so we don’t move live data while the process is running.
@moizpgedge Can you make that change inline here?
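The change agreed in this thread, stopping the recovery host's Control Plane service before touching its etcd directory, could look roughly like the sketch below. It reuses `RECOVERY_HOST_ID`, `PGEDGE_DATA_DIR`, and `ETCD_BACKUP_DIR` as named in the review comments and assumes the per-host service name `control-plane_<host-id>` used in the guide's examples; the function name `stop_and_move_etcd` and the `DOCKER` override (which only exists so the function can be dry-run without a Swarm) are hypothetical. In recent Docker releases, `docker service scale` waits for the service to converge unless `--detach` is passed, so the `mv` runs only after the scale-down has finished.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: stop the recovery host's Control Plane service,
# then move the etcd directory aside so no live process is using it.
stop_and_move_etcd() {
  local host_id="$1" data_dir="$2" backup_dir="$3"
  local docker_cmd="${DOCKER:-docker}"

  # Scale the per-host Control Plane service to zero replicas. Without
  # --detach, the CLI blocks until the service has converged.
  "$docker_cmd" service scale "control-plane_${host_id}=0" || return 1

  # Refuse to reuse an existing path: mv would otherwise nest etcd inside it.
  if [ -e "$backup_dir" ]; then
    echo "refusing to overwrite existing ${backup_dir}" >&2
    return 1
  fi

  # Only now move the etcd data out of the way.
  mv "${data_dir}/etcd" "$backup_dir"
}

# Usage on the recovery host:
# stop_and_move_etcd "$RECOVERY_HOST_ID" "$PGEDGE_DATA_DIR" "$ETCD_BACKUP_DIR"
```

The guard on an existing backup path is a design choice for this sketch; drop it if the guide intends `ETCD_BACKUP_DIR` to be a pre-created parent directory.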
You should see one host with `status: "reachable"` and
`etcd_mode: "server"`. Then continue with the next section:
[Updating Databases to Remove Old Hosts](#updating-databases-to-remove-old-hosts).
In the document view, the hyperlink is not displayed and appears as plain text.
- If Control Plane is already deployed as Swarm services (if the etcd
  quorum was lost and you did not run
  [Restoring Docker Swarm](#restoring-docker-swarm)):
It shows plain text "Restoring Docker Swarm" in the document view.
  [Restoring Docker Swarm](#restoring-docker-swarm)):
  `docker service scale control-plane_${RECOVERY_HOST_ID}=1`

- If you completed [Restoring Docker Swarm](#restoring-docker-swarm)
Fixing link comment
♻️ Duplicate comments (2)
docs/disaster-recovery/disaster-recovery.md (2)
109-164: ⚠️ Potential issue | 🟡 Minor
Ordered steps still render as code blocks due to 4-space indentation.
Line 109 and the equivalent step markers in Line 196+ and Line 341+ are still indented by four spaces, so they render as code instead of numbered steps in MkDocs/Python-Markdown.
Proposed fix
-    1. Reinitialize the Swarm
+1. Reinitialize the Swarm
 ...
-    2. Join Hosts to the New Swarm
+2. Join Hosts to the New Swarm
 ...
-    3. Removing Old Swarm Nodes
+3. Removing Old Swarm Nodes

Apply the same outdent pattern to the numbered markers in “Restoring the Control Plane” and “Re-adding Hosts”.
Also applies to: 196-267, 341-378
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/disaster-recovery/disaster-recovery.md` around lines 109 - 164, The numbered list markers (e.g., the "1. Reinitialize the Swarm" heading and the equivalent markers in the "Restoring the Control Plane" and "Re-adding Hosts" sections) are prefixed with four spaces which causes Markdown to render them as code blocks; remove the leading four-space indent from those numbered lines so the lists render as normal ordered lists, ensuring fenced code blocks and their internal indentation remain unchanged and apply the same outdent change to the other occurrences you noted (the sections around lines 196–267 and 341–378).
187-202: ⚠️ Potential issue | 🟠 Major
Conflicting guidance: services may still be live when etcd data is moved.
Line 187-Line 189 says services do not need to be stopped when a server-mode host remains, but Line 196-Line 202 immediately renames/moves etcd data. That can race a running Control Plane service on the recovery host.
Proposed fix
-- at least one server-mode host was still up: Use that host as the
-  recovery host. You do not need to stop services or restore the volume
-  first.
+- at least one server-mode host was still up: Use that host as the
+  recovery host. On that recovery host, stop/scale down the Control Plane
+  service before step 1 so etcd data is not moved while live.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/disaster-recovery/disaster-recovery.md` around lines 187 - 202, The current instructions move the etcd data directory (PGEDGE_DATA_DIR/etcd -> ETCD_BACKUP_DIR) while earlier text says services need not be stopped, creating a race with a running control-plane; update the procedure to either (a) explicitly stop the control-plane/etcd-related services on the recovery host before moving the directory (add a short step to stop the service(s) hosting etcd), or (b) if downtime is unacceptable, replace the mv step with an etcd-consistent snapshot approach (e.g., use etcdctl snapshot save) and document restoring from that snapshot; reference the existing variables and operations (PGEDGE_DATA_DIR, etcd directory move to ETCD_BACKUP_DIR, and the mv action) so the change is applied where the backup/move is described.
♻️ Duplicate comments (2)
docs/disaster-recovery/disaster-recovery.md (2)
181-183: ⚠️ Potential issue | 🟠 Major
Add an explicit “stop Control Plane service” pre-step before moving etcd data.
Line 181-Line 183 currently says services do not need to be stopped when at least one server-mode host remains, but Line 190+ immediately moves etcd data. That can race with live processes. Add a clear pre-step to scale/stop Control Plane on the recovery host before Step 1.
Suggested clarification
-- at least one server-mode host was still up: Use that host as the
-  recovery host. You do not need to stop services or restore the volume
-  first.
+- at least one server-mode host was still up: Use that host as the
+  recovery host. Before moving etcd data, stop/scale down Control Plane
+  service(s) on that recovery host (for example,
+  `docker service scale control-plane_<host-id>=0`). Restore the volume
+  only if needed.

Also applies to: 188-205
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/disaster-recovery/disaster-recovery.md` around lines 181 - 183, Add an explicit pre-step to stop/scale down the Control Plane services on the chosen recovery host before any instructions that move etcd data: insert a clear step such as "Stop/scale Control Plane services on the recovery host (e.g., kube-apiserver, kube-controller-manager, kube-scheduler and any orchestration-managed control-plane processes) and verify they are not running" immediately before the existing Step 1 where etcd data is moved in the section currently describing recovery when "at least one server-mode host was still up" (also apply the same pre-step to the parallel block covering lines 188-205). Mention verifying services are stopped to avoid races with live processes.
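The comment also asks operators to confirm the service has actually stopped before any data is moved. Because this guide runs Control Plane as Docker Swarm services, one way to sketch that check is to query Swarm, assuming `docker service ls --format '{{.Name}} {{.Replicas}}'` reports replica counts as `running/desired` (for example `0/0` once a replicated service has been scaled to zero). The hypothetical `wait_for_zero_replicas` polls until the exact service name reports `0/0` or gives up after a number of attempts. It matches the name exactly in `awk` because the `name` filter may also return services that share only a prefix, such as `control-plane_host-10` when asking for `control-plane_host-1`. The `DOCKER` override is, again, only there for dry runs.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: poll Swarm until a service reports 0/0 replicas.
wait_for_zero_replicas() {
  local svc="$1" attempts="${2:-30}" docker_cmd="${DOCKER:-docker}"
  local replicas="" i=0
  while [ "$i" -lt "$attempts" ]; do
    # Keep only the row whose name matches exactly.
    replicas=$("$docker_cmd" service ls --filter "name=${svc}" \
      --format '{{.Name}} {{.Replicas}}' | awk -v s="$svc" '$1 == s { print $2 }')
    if [ "$replicas" = "0/0" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "${svc} still reports replicas '${replicas}'" >&2
  return 1
}

# Usage on the recovery host, before moving etcd data:
# wait_for_zero_replicas "control-plane_${RECOVERY_HOST_ID}" && echo "safe to move etcd data"
```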
109-157: ⚠️ Potential issue | 🟡 Minor
Fix ordered-list indentation so recovery steps render as steps, not code.
The numbered items are still indented by 4 spaces, so Markdown renders them as code blocks instead of ordered lists (e.g., Line 109, Line 190, Line 335). This makes operational steps harder to follow in docs UI.
Suggested fix (representative)
-    1. Reinitialize the Swarm; on a surviving manager, invoke the following command:
+1. Reinitialize the Swarm; on a surviving manager, invoke the following command:
-        ```bash
-        docker swarm init --force-new-cluster \
-          --advertise-addr ${RECOVERY_HOST_IP}
-        ```
+    ```bash
+    docker swarm init --force-new-cluster \
+      --advertise-addr ${RECOVERY_HOST_IP}
+    ```
-    2. Join Hosts to the New Swarm. If you have other surviving nodes that should be
+2. Join Hosts to the New Swarm. If you have other surviving nodes that should be

Also applies to: 190-260, 335-372
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/disaster-recovery/disaster-recovery.md` around lines 109 - 157, Remove the leading 4-space indentation that turns ordered steps into code blocks in disaster-recovery.md: un-indent the numbered headings ("Reinitialize the Swarm", "Join Hosts to the New Swarm", "Removing Old Swarm Nodes") so they start at column 0 (e.g., change " 1. ..." to "1. ..."), and ensure the fenced code blocks for commands use non-indented triple-backtick fences aligned with the list (e.g., ```bash on its own line, then the commands, then ```), fixing the blocks shown around the docker swarm init/join/node/service examples so they render as proper ordered list steps rather than code blocks.
Actionable comments posted: 1
🧹 Nitpick comments (1)
docs/disaster-recovery/disaster-recovery.md (1)
105-105: Optional: Consider simplifying "a majority of" to "most".
For slightly more concise language, you could replace "a majority of managers" with "most managers," though the current phrasing is perfectly acceptable in technical documentation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/disaster-recovery/disaster-recovery.md` at line 105, Replace the phrase "a majority of managers" with the simpler wording "most managers" in the disaster recovery text (locate the exact string "a majority of managers" in disaster-recovery.md) so the sentence reads more concise while preserving meaning; ensure surrounding sentence grammar and any linked references remain correct after the substitution.
- at least one server-mode host was still up: Use that host as the
  recovery host. You do not need to stop services or restore the volume
  first.
- you already completed [Restoring Docker Swarm](#restoring-docker-swarm):
  Restore the data volume on the surviving host if it was lost (see
  [Data volume restore](#data-volume-restore)); otherwise skip.

Then on the recovery host, perform the following steps:

1. Back up existing etcd data and set aside for restore:
Stop the Control Plane service on the recovery host before moving etcd data.
The current text at lines 180-182 states "You do not need to stop services or restore the volume first" for the case where at least one server-mode host is still up. However, step 1 (line 189) immediately moves the etcd directory, which can cause corruption if the Control Plane service is still running on the recovery host and actively using that directory.
Per the discussion in past reviews (moizpgedge confirmed), you must stop the service on the recovery host before moving the etcd directory to avoid racing with live processes.
🛠️ Proposed fix
- at least one server-mode host was still up: Use that host as the
- recovery host. You do not need to stop services or restore the volume
- first.
+ recovery host. Stop the Control Plane service on that host with:
+ `docker service scale control-plane_<recovery-host-id>=0`
+  You do not need to restore the volume from backup.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/disaster-recovery/disaster-recovery.md` around lines 180 - 189, Update
the guidance so the recovery host's Control Plane service is stopped before
moving etcd data: change the sentence that currently says "You do not need to
stop services..." to explicitly require stopping the Control Plane service on
the recovery host, and insert a clear preparatory step before "Back up existing
etcd data..." that instructs operators to stop the Control Plane service (and
confirm it's not running) on the recovery host to avoid races with the etcd
directory move.