The call comes at 2am. Fire suppression system failure. Server room flooded. Or: ransomware hit everything and you cannot trust a single disk on-site. Your primary Proxmox environment is gone.
You have offsite backups on Proxmox Backup Server. What happens next matters more than how you got here.
Key Takeaways
- Document PBS credentials, fingerprints, and encryption keys before disaster strikes; you cannot recover them afterward.
- Recovery site type (bare metal, cloud, or hosted) determines your setup time more than anything else.
- Restore infrastructure services first: DNS, DHCP, and authentication before applications.
- A lost encryption key means lost data, no exceptions. Store the key separately from your backups.
- Your real RTO is 2-3x your estimate. Measure it during drills, not during incidents.
- MSPs should hand clients a printed DR runbook, not just a link to documentation.
What You Need Before Disaster Strikes
A DR plan that relies on credentials stored in your datacenter is not a DR plan. Before disaster, get these off-site.
Pre-DR Checklist
- PBS server hostname/IP: password manager + printed copy, verify monthly
- PBS TLS fingerprint: password manager + printed copy, verify monthly
- PBS API token or password: password manager, verify monthly
- Encryption key passphrase: password manager + printed copy in separate physical location, verify quarterly
- Network documentation (VLANs, IPs, firewall rules): cloud storage + printed, verify quarterly
- PVE cluster config backup: offsite PBS or cloud, verify weekly
- Prioritized VM list with dependencies: runbook document, verify quarterly
- RTO/RPO targets per service: runbook document, verify annually
The fingerprint is easy to overlook. Proxmox Backup Server uses TLS with a self-signed certificate by default. The fingerprint appears in PBS under Dashboard > Fingerprint. Without it, your clients refuse to connect, and you cannot retrieve it after the server is compromised.
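Recording the fingerprint is scriptable. As a sketch, the snippet below generates a throwaway self-signed certificate and prints its SHA-256 fingerprint in the same colon-delimited format PBS shows; against a live PBS you would fetch the certificate with `openssl s_client` instead (the hostname and default port 8007 in the comment are assumptions):

```shell
# Demo: create a throwaway self-signed cert and print its SHA-256
# fingerprint, colon-delimited, as PBS displays it in the dashboard.
# Against a live server you would fetch the cert first, e.g.:
#   openssl s_client -connect backup.example.com:8007 </dev/null | openssl x509
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=pbs-demo" \
    -keyout /tmp/pbs-demo.key -out /tmp/pbs-demo.crt 2>/dev/null
openssl x509 -in /tmp/pbs-demo.crt -noout -fingerprint -sha256
```

Capture that output into your password manager and printed copy while the server is healthy.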
Encryption keys deserve special attention. PBS client-side encryption is strong. If you use it and lose the key, the data is unrecoverable with no exceptions. Keep your key backed up separately, not on the server that holds the encrypted data.
Lost Encryption Key = Lost Data
If you use PBS client-side encryption and cannot produce the decryption key, your backups are unreadable. No recovery service can help. Store your key in at least two physically separate locations, neither of which is your primary datacenter.
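On a PVE node, the encryption key for a PBS storage is a small JSON file, which makes the off-site copy easy to automate. A sketch, assuming a storage named offsite-pbs and a hypothetical destination host:

```shell
# PVE keeps the PBS encryption key per storage under /etc/pve/priv/storage/.
# Copy it somewhere that will survive the loss of this site
# (vault.example.internal is a placeholder for your secondary location).
scp /etc/pve/priv/storage/offsite-pbs.enc \
    admin@vault.example.internal:/keys/offsite-pbs.enc
```

Pair the file copy with the printed passphrase copy from the checklist above; neither should live only in the primary datacenter.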
Define your RTO and RPO before disaster. Know which VMs need to be up within one hour, which within four hours, and which can wait until the next day. See RTO vs. RPO if you need to build this framework. MSPs building DR runbooks for clients should work through this with each client individually.
Recovery Site Options
You need hardware and connectivity before you can restore anything. Three realistic options:
| Attribute | Bare Metal | Cloud VMs | remote-backups.com |
|---|---|---|---|
| Setup Time | 4-24 hours | 30-90 minutes | Minutes |
| Cost | High upfront | Low upfront, ongoing | Included in plan |
| Network Complexity | Low | Medium | Low |
Bare metal gives you the most control and best performance. The tradeoff is hardware procurement and setup time. If your DR budget includes dedicated standby hardware, this is the cleanest path.
Cloud VMs get you running faster. Hetzner and OVH support nested virtualization, which means you can run Proxmox VE inside a cloud VM. Performance is limited compared to bare metal, but it's functional for most workloads during a DR event.
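Before installing PVE in a cloud VM, confirm the instance actually exposes hardware virtualization to the guest; a quick check run inside the VM:

```shell
# A non-zero count means the CPU flags KVM needs (vmx for Intel,
# svm for AMD) are visible to this VM
grep -Ec 'vmx|svm' /proc/cpuinfo

# KVM is usable once the device node exists (after the kvm module loads)
ls -l /dev/kvm
```

If the count is zero, nested virtualization is disabled on that instance type and Proxmox VE will fall back to slow software emulation.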
Hosted recovery eliminates the bootstrap problem. If your backups are already on remote-backups.com, PBS connectivity is pre-configured.
Step 1: Bootstrap the Proxmox Environment
Once you have target hardware or a cloud VM provisioned, install Proxmox VE from scratch. A single-node installation is fine. Rebuild the cluster after critical VMs are restored.
Then add your offsite PBS as a storage backend:
```shell
# Add PBS remote storage via CLI
pvesm add pbs offsite-pbs \
    --server backup.example.com \
    --datastore your-datastore \
    --username backup@pbs \
    --fingerprint AA:BB:CC:DD:EE:FF:... \
    --password <your-token-or-password>
```

Fingerprint Format
The fingerprint is the SHA-256 hash of your PBS server's TLS certificate, colon-delimited. Copy it carefully. A typo means the client will refuse to connect.
After adding the storage, verify you can see your backups:
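One way, using the storage name from the example above (`pvesm list` prints the snapshots the storage exposes):

```shell
# List backup snapshots visible through the PBS storage
pvesm list offsite-pbs
```

If this comes back empty or errors out, fix the fingerprint or credentials before going any further.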
If you used client-side encryption, add your encryption key before attempting any restores. In the PVE web UI: Datacenter > Storage, select your PBS storage, and upload the key under the Key Management tab.
Run a spot-check with PBS verify jobs before starting restores. Finding corruption mid-recovery is a costly surprise.
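Verification runs on the PBS side. A sketch, assuming shell access to the PBS host and the datastore name from the earlier example:

```shell
# Verify all snapshots in the datastore
# (can take a while on large stores; run it before starting restores)
proxmox-backup-manager verify your-datastore
```

In practice you would scope this to the snapshots you are about to restore rather than the whole datastore if time is short.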
Step 2: Restore Order Strategy
The order you restore VMs determines how quickly everything else recovers. Dependencies matter.
| Tier | VM Type | RTO Target | Dependencies |
|---|---|---|---|
| 1 | DNS, DHCP | 1 hour | None |
| 2 | Active Directory / LDAP | 2 hours | DNS |
| 3 | Database servers | 3 hours | DNS, AD |
| 4 | Application servers | 4-6 hours | Databases, AD |
| 5 | User-facing services | 6-12 hours | Applications |
| 6 | Everything else | 24+ hours | All above |
Start with infrastructure. A DNS server that isn't running means almost everything in tier 3 and above will fail to start correctly. AD or LDAP servers come next. Applications that depend on database connections cannot be validated until the databases are running.
Document your specific VM list with actual IDs, roles, and dependencies. This list is what your team works from during recovery.
Step 3: Restore VMs from PBS
Two paths: GUI or CLI.
GUI Restore
In the PVE web interface: Datacenter > Storage > offsite-pbs > Backups. Select a VM backup and click Restore.
Key settings:
- VM ID: Use the original VM ID, or assign a new one if there is a conflict
- Target Storage: The local storage for the VM's disks
- Unique: Disable for DR restores where you want the original MAC address and UUID preserved
CLI Restore
```shell
# Restore VM 100 to the same ID, disks to local-lvm storage
qmrestore offsite-pbs:backup/vm/100/2026-03-24T02:00:00Z 100 \
    --storage local-lvm

# Restore container 200
pct restore 200 offsite-pbs:backup/ct/200/2026-03-24T02:00:00Z \
    --storage local-lvm

# Storage name differs between original and DR site: redirect all disks
qmrestore offsite-pbs:backup/vm/101/2026-03-24T02:00:00Z 101 \
    --storage local-nvme
```

The --storage flag matters when your DR site uses different storage pool names than the original. Without it, qmrestore tries to place each disk on the storage recorded in the backup's configuration, and the restore fails if that storage name does not exist at the DR site.
Storage Name Mismatch
If your original PVE used storage named "local-zfs" and your DR node has "local-lvm", the restore will fail unless you redirect disk placement. Pass --storage to send all restored disks to a storage that exists at the DR site.
Step 4: Network Reconfiguration
Your DR site almost certainly has different IP ranges, VLANs, and gateway addresses than your primary. Account for this before booting restored VMs.
Review each VM's network configuration before starting it. In the PVE hardware view, check the bridge assignment for each network device. If your DR site uses a different bridge name (e.g., vmbr1 instead of vmbr0), update it before boot.
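The bridge can also be changed from the CLI before first boot. A sketch with a placeholder MAC address; including the existing MAC keeps the guest's NIC identity stable across the change:

```shell
# Move VM 100's first NIC to the DR site's bridge, keeping its MAC
# (AA:BB:CC:DD:EE:FF is a placeholder for the VM's real MAC address)
qm set 100 --net0 virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr1
```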
For Linux VMs with cloud-init:
```shell
# Update IP config before starting the VM
qm set 100 --ipconfig0 ip=10.10.1.100/24,gw=10.10.1.1

# Then start the VM; cloud-init applies the new config on first boot
qm start 100
```

For VMs without cloud-init, boot them without network access (remove or disconnect the network device), reconfigure the IP inside the VM via console, then re-attach the network device.
DNS reconfiguration happens at the same time. Once your DR DNS server is running at a new IP, every other restored VM needs that address. Update /etc/resolv.conf or your DHCP scope to point to the DR DNS server before networking comes up on application VMs.
Step 5: Validation and Cutover
Restore without validation is not recovery. Before declaring success on any tier, confirm:
Per VM:
- OS boots without filesystem errors or service failures
- Critical services are running (`systemctl status`, `sc query`)
- Data is present and current (check timestamps, query databases)
- Network connectivity to tier dependencies works
Infrastructure level:
- Tier 1 and 2 VMs are reachable by hostname from all restored VMs, not just by IP
- Authentication works across the environment
- Monitoring receives data from restored hosts
External access:
- DNS records updated to DR site IPs
- Firewall rules permit expected external traffic
- Load balancers and reverse proxies reconfigured
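The per-VM service checks lend themselves to a small script run inside each restored Linux guest. A sketch, where the service list and the tier-1 hostname are placeholders for your environment:

```shell
#!/bin/sh
# Per-VM validation sketch: service health plus tier-1 name resolution.
# Service names below are examples; substitute your own.
for svc in sshd cron postgresql; do
    if systemctl is-active --quiet "$svc"; then
        echo "$svc: OK"
    else
        echo "$svc: FAILED"
    fi
done

# Name resolution against the restored DNS tier (placeholder hostname)
getent hosts dc01.example.internal >/dev/null \
    && echo "DNS: OK" || echo "DNS: FAILED"
```

Run it from the runbook on every restored VM and attach the output to the recovery log; a FAILED line is a tier dependency to chase before moving on.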
Document Fixes During Recovery
Track every manual fix you make. If DNS pointed to the wrong IP, or a service needed a manual config file update, that is a gap in your runbook. Fix the runbook after recovery, not during it.
Estimating Recovery Time
Restore time has two components: data transfer and service startup.
Transfer time depends on backup size and available bandwidth:
Rough formula: total backup size (GB) ÷ (bandwidth in Mbps × 0.112 × 3.6) = hours. The 0.112 factor converts Mbps to realistic MB/s after protocol overhead, and the 3.6 converts MB/s into GB per hour.
A 1 Gbps connection delivers roughly 112 MB/s. Restoring 2 TB of VM backups takes about five hours on a 1 Gbps link, assuming no other bottlenecks. Use the initial seed calculator to model your specific scenario.
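The arithmetic is easy to script for your own numbers. A sketch using awk, with the 2 TB / 1 Gbps example from above:

```shell
# hours = size_GB / (Mbps * 0.112 * 3.6)
# 0.112 converts Mbps to realistic MB/s; 3.6 converts MB/s to GB per hour
size_gb=2000   # 2 TB of backups
mbps=1000      # 1 Gbps link
awk -v s="$size_gb" -v m="$mbps" \
    'BEGIN { printf "~%.1f hours\n", s / (m * 0.112 * 3.6) }'
```

With these inputs it prints "~5.0 hours", matching the estimate above; swap in your own backup size and measured bandwidth.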
Service startup time varies. A database server may take 10-20 minutes after boot to complete crash recovery and accept connections. Factor this into your tier timelines.
Your total RTO is: bootstrap time, plus the longest restore in each tier, plus validation time for each tier, summed across tiers. Run a full DR drill to measure your actual numbers. Estimates are almost always optimistic.
Building a DR Runbook
A DR runbook is a step-by-step document your team or your client can follow without you. It answers every question that comes up during recovery, when people are under pressure and making mistakes is easy.
DR Runbook Contents
Escalation contacts
Who to call, in what order, at what hours. Include vendors and hosting providers.
Credential locations
Where to find PBS credentials, encryption keys, and network documentation. Not the credentials themselves.
Recovery site access
How to provision the DR environment, including provider logins and provisioning steps.
Prioritized VM list
VM IDs, roles, dependencies, and RTO targets. This is the restore order reference.
Step-by-step restore procedure
The exact commands and settings for your environment. No ambiguity.
Validation checklist
Per-VM and per-service acceptance criteria. Defines what 'recovered' means.
DNS and network cutover steps
Specific IP changes, DNS record updates, and firewall rule adjustments.
Test schedule and sign-off
When this runbook was last tested, by whom, and what issues were found.
For MSPs managing multiple clients, maintain a separate runbook per client. Each client has different infrastructure, different RTO requirements, and different DNS and network layouts. A shared generic runbook causes mistakes under pressure.
Test the runbook. Hand it to someone unfamiliar with the environment and have them execute it. Every step they cannot follow without asking a question is a gap to fix. The restore testing guide covers how to run a full DR drill against this runbook on an annual schedule.
Wrapping Up
Offsite backups on Proxmox Backup Server solve the data preservation problem. Getting a cluster running from those backups requires preparation that happens before disaster. Store credentials off-site, define your restore order, know your recovery site options, and drill the process at least once a year.
The difference between a drill and a real event is that the drill gives you the opportunity to find gaps before they matter. Every gap found in a drill is one fewer surprise at 2am.
Need offsite PBS storage for your DR plan?
remote-backups.com provides encrypted Proxmox Backup Server targets in EU datacenters with geo-replication across regions.
View Plans


