Goal
Create an operator runbook covering deployment, monitoring, upgrades, incident response, and rollback.
Scope
- Systemd/service templates
- Log/metric collection guidance
- Alerting thresholds (stall detection, peer drops, disk usage)
- Backup/restore and disaster recovery
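
As a rough illustration of what the alerting-threshold item could cover, the sketch below evaluates the three signals named above (stall, peer drops, disk) against configurable limits. Every name and number in it is a hypothetical placeholder: `Thresholds`, `evaluate_alerts`, and the default values are assumptions for illustration, not settings taken from the node software, and the real runbook should pick values from observed behaviour.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Thresholds:
    # Illustrative placeholder values only; tune per deployment.
    max_seconds_since_progress: float = 300.0  # stall detection
    min_peer_count: int = 3                    # peer drops
    min_disk_free_fraction: float = 0.15       # disk


def evaluate_alerts(seconds_since_progress: float,
                    peer_count: int,
                    disk_free_fraction: float,
                    thresholds: Thresholds = Thresholds()) -> List[str]:
    """Return the names of the alerts that should fire for one sample."""
    alerts: List[str] = []
    # Stall: the node has not made progress for too long.
    if seconds_since_progress > thresholds.max_seconds_since_progress:
        alerts.append("stall")
    # Peer drop: too few connected peers.
    if peer_count < thresholds.min_peer_count:
        alerts.append("peer_drop")
    # Disk: free space has fallen below the configured fraction.
    if disk_free_fraction < thresholds.min_disk_free_fraction:
        alerts.append("disk_low")
    return alerts
```

A healthy sample such as `evaluate_alerts(10, 8, 0.5)` yields no alerts, while a sample that breaches all three limits returns them in the order checked. Keeping the limits in one frozen dataclass is one way to make the documented thresholds and the deployed ones easy to compare.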
Acceptance criteria
- A new operator can deploy a node from scratch by following the docs
- Runbook includes a troubleshooting decision tree