Part of #271 — Gastown Cloud Proposal A (Sandbox-per-Town)
Goal
Implement disaster-recovery backup of gastown state (Dolt databases, git repos, config files) to Cloudflare R2. This is a backup layer — the Fly persistent volume is the primary storage. R2 handles catastrophic volume loss and cross-region migration.
Context
Fly.io persistent volumes survive machine restarts and stops but not volume destruction or region migration. R2 backups ensure recoverability with a worst-case data loss window of 5 minutes.
Requirements
R2 Key Structure
gastown/{town_id}/
├── latest.json # Pointer: { "timestamp": "20260217T120000Z" }
├── snapshots/{timestamp}/
│ ├── manifest.json # Files included, checksums, gt version
│ ├── dolt/{rig_name}.backup # `dolt backup` output per rig
│ ├── git/{rig_name}.bundle # `git bundle create --all` per rig
│ ├── config.tar # Town + rig config files
│ └── runtime.tar # .runtime/ checkpoint files
└── incremental/ # Future: incremental deltas
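For illustration, a manifest.json for one snapshot might look like the following; the placeholder values and any fields beyond paths, checksums, and the gt version are assumptions, not a finalized schema:

{
  "timestamp": "20260217T120000Z",
  "gt_version": "<gt version string>",
  "files": [
    { "path": "dolt/<rig_name>.backup", "sha256": "<sha256 of file>" },
    { "path": "git/<rig_name>.bundle",  "sha256": "<sha256 of file>" },
    { "path": "config.tar",             "sha256": "<sha256 of file>" },
    { "path": "runtime.tar",            "sha256": "<sha256 of file>" }
  ]
}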
Sync Daemon (r2-sync-daemon.sh)
Runs as a background process inside the sandbox, triggered every 5 minutes.
- Acquire flock (`/tmp/r2-sync.lock`) to prevent concurrent syncs
- Create snapshot directory
- For each rig: `dolt backup` → snapshot dir
- For each rig: `git bundle create --all` (bare repo only, not worktrees) → snapshot dir
- Tar config files (`settings/`, `*/settings/`)
- Tar runtime state (`.runtime/`)
- Write `manifest.json` with checksums (sha256)
- Upload to R2 staging prefix
- Update `latest.json` pointer (atomic swap)
- Cleanup: keep last 3 snapshots, delete older
- Report sync time to cloud API: `POST /api/gastown/heartbeat`
Support --immediate flag for SIGTERM flush (skip timer, run once).
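A minimal sketch of that flow, assuming the town lives under /data/gastown with one directory per rig (rigs/<name>/dolt and rigs/<name>/repo.git); the upload, latest.json swap, retention, and heartbeat steps are only noted in comments:

#!/usr/bin/env bash
# r2-sync-daemon.sh -- illustrative sketch; the town/rig layout and the local
# staging approach are assumptions, not the final implementation.
set -euo pipefail

TOWN_ROOT=${TOWN_ROOT:-/data/gastown}
LOCK=/tmp/r2-sync.lock

run_sync() {
  (
    flock 9                                           # serialize concurrent syncs
    ts=$(date -u +%Y%m%dT%H%M%SZ)
    dir=$(mktemp -d "/tmp/snapshot-$ts.XXXXXX")
    mkdir -p "$dir"/dolt "$dir"/git

    for rig in "$TOWN_ROOT"/rigs/*; do                # assumed rig layout
      name=$(basename "$rig")
      (cd "$rig/dolt" && dolt backup sync-url "file://$dir/dolt/$name")
      git --git-dir="$rig/repo.git" bundle create "$dir/git/$name.bundle" --all
    done

    (cd "$TOWN_ROOT" && tar -cf "$dir/config.tar" settings */settings)
    (cd "$TOWN_ROOT" && tar -cf "$dir/runtime.tar" .runtime)
    (cd "$dir" && find . -type f ! -name checksums.txt -exec sha256sum {} + > checksums.txt)

    # Not shown: build manifest.json from checksums.txt, upload "$dir" to the
    # R2 staging prefix under snapshots/$ts/, swap latest.json, keep only the
    # last 3 snapshots, then POST /api/gastown/heartbeat.
    rm -rf "$dir"
  ) 9>"$LOCK"
}

if [[ "${1:-}" == "--immediate" ]]; then
  run_sync                                            # SIGTERM flush: run once
else
  while true; do run_sync; sleep 300; done            # 5-minute timer
fi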
Restore Script (r2-restore.sh)
Runs on container startup (called by startup.sh from PR 1).
- Check if volume already has data → if yes, skip restore (volume-persisted state takes priority)
- Fetch `latest.json` from R2
- Download snapshot files
- For each rig: `dolt backup restore`
- For each rig: `git clone --bare <bundle>` → recreate worktrees from branches
- Extract config + runtime tarballs
- Verify Dolt integrity: `dolt verify-constraints`
- Report restore status to cloud API
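A sketch of the corresponding restore flow under the same assumed layout; the emptiness check is one possible way to decide that volume-persisted state wins, and the R2 download step is elided:

#!/usr/bin/env bash
# r2-restore.sh -- illustrative sketch; layout and skip heuristic are assumptions.
set -euo pipefail

TOWN_ROOT=${TOWN_ROOT:-/data/gastown}

# Volume-persisted state takes priority: skip if the town root is non-empty.
if [ -d "$TOWN_ROOT" ] && [ -n "$(ls -A "$TOWN_ROOT")" ]; then
  echo "volume already populated; skipping R2 restore"
  exit 0
fi

work=$(mktemp -d)
# Not shown: fetch latest.json, resolve snapshots/<timestamp>/, download the
# snapshot into "$work", and verify files against manifest.json checksums.

mkdir -p "$TOWN_ROOT"
for backup in "$work"/dolt/*; do
  name=$(basename "$backup")
  mkdir -p "$TOWN_ROOT/rigs/$name"
  (cd "$TOWN_ROOT/rigs/$name" && dolt backup restore "file://$backup" dolt)
done

for bundle in "$work"/git/*.bundle; do
  name=$(basename "$bundle" .bundle)
  git clone --bare "$bundle" "$TOWN_ROOT/rigs/$name/repo.git"
  # Worktrees are then recreated from the restored branches, e.g.
  # git --git-dir="$TOWN_ROOT/rigs/$name/repo.git" worktree add "$TOWN_ROOT/rigs/$name/main" main
done

tar -xf "$work/config.tar" -C "$TOWN_ROOT"
tar -xf "$work/runtime.tar" -C "$TOWN_ROOT"

for rig in "$TOWN_ROOT"/rigs/*; do
  (cd "$rig/dolt" && dolt verify-constraints)
done
# Not shown: report restore status to the cloud API.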
SIGTERM Integration
Update startup.sh (from PR 1) to add:
cleanup() {
  /usr/local/bin/r2-sync-daemon.sh --immediate
  gt down
  exit 0
}
trap cleanup SIGTERM SIGINT
R2 Client Configuration
- Use existing R2 client infrastructure (`cloud/src/lib/r2/client.ts`)
- New bucket or prefix: `gastown-backups`
- Sandbox needs R2 credentials as env vars: `R2_ENDPOINT`, `R2_ACCESS_KEY_ID`, `R2_SECRET_ACCESS_KEY`, `R2_BUCKET`
- Inside the sandbox, use the `aws` CLI (S3-compatible) or a small upload script
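For example, the upload side could use the aws CLI against the R2 endpoint; SNAPSHOT_DIR, TOWN_ID, and TIMESTAMP here are hypothetical variables supplied by the sync daemon:

# Sketch: push one snapshot to R2 via the S3-compatible API.
export AWS_ACCESS_KEY_ID="$R2_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="$R2_SECRET_ACCESS_KEY"

aws s3 cp --recursive "$SNAPSHOT_DIR" \
  "s3://$R2_BUCKET/gastown/$TOWN_ID/snapshots/$TIMESTAMP/" \
  --endpoint-url "$R2_ENDPOINT"

# Write latest.json last, only after every snapshot object is in place.
printf '{ "timestamp": "%s" }\n' "$TIMESTAMP" > /tmp/latest.json
aws s3 cp /tmp/latest.json \
  "s3://$R2_BUCKET/gastown/$TOWN_ID/latest.json" \
  --endpoint-url "$R2_ENDPOINT"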
Files
- `cloud/infra/gastown-sandbox/r2-sync-daemon.sh`
- `cloud/infra/gastown-sandbox/r2-restore.sh`
- Updates to `cloud/infra/gastown-sandbox/startup.sh` (SIGTERM handler)
Acceptance Criteria
- Sync daemon runs on a 5-minute timer
- `dolt backup` produces restorable snapshots for each rig
- `git bundle create --all` produces valid bundles for each rig
- Config and runtime files are tarred and uploaded
- `manifest.json` includes sha256 checksums for all files
- `latest.json` pointer is updated atomically (staging prefix → swap)
- Old snapshots are cleaned up (keep last 3)
- Restore script skips if volume already has data
- Restore script successfully restores Dolt + git + config from R2
- `dolt verify-constraints` passes after restore
- Git worktrees are recreated from restored bare repo
- `--immediate` flag runs sync once and exits
- SIGTERM handler flushes to R2 before shutdown
- Heartbeat reported to cloud API after each sync
Dependencies
- PR 1 (Sandbox Docker Image) — startup.sh integration
- PR 2 (Provisioning API) — R2 credentials provisioned as env vars