The call comes at 2am. Fire suppression system failure. Server room flooded. Or: ransomware hit everything and you cannot trust a single disk on-site. Your primary Proxmox environment is gone.
You have offsite backups on Proxmox Backup Server. What happens next matters more than how you got here.
Key Takeaways
- Document PBS credentials, fingerprints, and encryption keys before disaster strikes; you cannot recover them afterward.
- Recovery site type (bare metal, cloud, or hosted) determines your setup time more than anything else.
- Restore infrastructure services first: DNS, DHCP, and authentication before applications.
- A lost encryption key means lost data, no exceptions. Store the key separately from your backups.
- Your real RTO is 2-3x your estimate. Measure it during drills, not during incidents.
- MSPs should hand clients a printed DR runbook, not just a link to documentation.
What You Need Before Disaster Strikes
A DR plan that relies on credentials stored in your datacenter is not a DR plan. Before disaster, get these off-site.
Pre-DR Checklist
- PBS server hostname/IP: password manager + printed copy, verify monthly
- PBS TLS fingerprint: password manager + printed copy, verify monthly
- PBS API token or password: password manager, verify monthly
- Encryption key passphrase: password manager + printed copy in separate physical location, verify quarterly
- Network documentation (VLANs, IPs, firewall rules): cloud storage + printed, verify quarterly
- PVE cluster config backup: offsite PBS or cloud, verify weekly
- Prioritized VM list with dependencies: runbook document, verify quarterly
- RTO/RPO targets per service: runbook document, verify annually
The fingerprint is easy to overlook. Proxmox Backup Server uses TLS with a self-signed certificate by default. The fingerprint appears in PBS under Dashboard > Fingerprint. Without it, your clients refuse to connect, and you cannot retrieve it after the server is compromised.
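Recording the fingerprint is scriptable. As a sketch, the snippet below generates a throwaway self-signed certificate and prints its SHA-256 fingerprint in the same colon-delimited format PBS shows; against a live PBS you would fetch the certificate with `openssl s_client` instead (the hostname and default port 8007 in the comment are assumptions):

```shell
# Demo: create a throwaway self-signed cert and print its SHA-256
# fingerprint, colon-delimited, as PBS displays it in the dashboard.
# Against a live server you would fetch the cert first, e.g.:
#   openssl s_client -connect backup.example.com:8007 </dev/null | openssl x509
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=pbs-demo" \
    -keyout /tmp/pbs-demo.key -out /tmp/pbs-demo.crt 2>/dev/null
openssl x509 -in /tmp/pbs-demo.crt -noout -fingerprint -sha256
```

Capture that output into your password manager and printed copy while the server is healthy.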
Encryption keys deserve special attention. PBS client-side encryption is strong. If you use it and lose the key, the data is unrecoverable with no exceptions. Keep your key backed up separately, not on the server that holds the encrypted data.
Lost Encryption Key = Lost Data
If you use PBS client-side encryption and cannot produce the decryption key, your backups are unreadable. No recovery service can help. Store your key in at least two physically separate locations, neither of which is your primary datacenter.
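On a PVE node, the encryption key for a PBS storage is a small JSON file, which makes the off-site copy easy to automate. A sketch, assuming a storage named offsite-pbs and a hypothetical destination host:

```shell
# PVE keeps the PBS encryption key per storage under /etc/pve/priv/storage/.
# Copy it somewhere that will survive the loss of this site
# (vault.example.internal is a placeholder for your secondary location).
scp /etc/pve/priv/storage/offsite-pbs.enc \
    admin@vault.example.internal:/keys/offsite-pbs.enc
```

Pair the file copy with the printed passphrase copy from the checklist above; neither should live only in the primary datacenter.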
Define your RTO and RPO before disaster. Know which VMs need to be up within one hour, which within four hours, and which can wait until the next day. See RTO vs. RPO if you need to build this framework. MSPs building DR runbooks for clients should work through this with each client individually.
Recovery Site Options
You need hardware and connectivity before you can restore anything. Three realistic options:
| Attribute | Bare Metal | Cloud VMs | remote-backups.com |
|---|---|---|---|
| Setup Time | 4-24 hours | 30-90 minutes | Minutes |
| Cost | High upfront | Low upfront, ongoing | Included in plan |
| Network Complexity | Low | Medium | Low |
Bare metal gives you the most control and best performance. The tradeoff is hardware procurement and setup time. If your DR budget includes dedicated standby hardware, this is the cleanest path.
Cloud VMs get you running faster. Hetzner and OVH support nested virtualization, which means you can run Proxmox VE inside a cloud VM. Performance is limited compared to bare metal, but it's functional for most workloads during a DR event.
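Before installing PVE in a cloud VM, confirm the instance actually exposes hardware virtualization to the guest; a quick check run inside the VM:

```shell
# A non-zero count means the CPU flags KVM needs (vmx for Intel,
# svm for AMD) are visible to this VM
grep -Ec 'vmx|svm' /proc/cpuinfo

# KVM is usable once the device node exists (after the kvm module loads)
ls -l /dev/kvm
```

If the count is zero, nested virtualization is disabled on that instance type and Proxmox VE will fall back to slow software emulation.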
Hosted recovery eliminates the bootstrap problem. If your backups are already on remote-backups.com, PBS connectivity is pre-configured.
Step 1: Bootstrap the Proxmox Environment
Once you have target hardware or a cloud VM provisioned, install Proxmox VE from scratch. A single-node installation is fine. Rebuild the cluster after critical VMs are restored.
Then add your offsite PBS as a storage backend:
```shell
# Add PBS remote storage via CLI
pvesm add pbs offsite-pbs \
    --server backup.example.com \
    --datastore your-datastore \
    --username backup@pbs \
    --fingerprint AA:BB:CC:DD:EE:FF:... \
    --password <your-token-or-password>
```

Fingerprint Format
The fingerprint is the SHA-256 hash of your PBS server's TLS certificate, colon-delimited. Copy it carefully. A typo means the client will refuse to connect.
After adding the storage, verify you can see your backups:
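One way, using the storage name from the example above (`pvesm list` prints the snapshots the storage exposes):

```shell
# List backup snapshots visible through the PBS storage
pvesm list offsite-pbs
```

If this comes back empty or errors out, fix the fingerprint or credentials before going any further.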
If you used client-side encryption, add your encryption key before attempting any restores. In the PVE web UI: Datacenter > Storage, select your PBS storage, and upload the key under the Key Management tab.
Run a spot-check with PBS verify jobs before starting restores. Finding corruption mid-recovery is a costly surprise.
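Verification runs on the PBS side. A sketch, assuming shell access to the PBS host and the datastore name from the earlier example:

```shell
# Verify all snapshots in the datastore
# (can take a while on large stores; run it before starting restores)
proxmox-backup-manager verify your-datastore
```

In practice you would scope this to the snapshots you are about to restore rather than the whole datastore if time is short.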
Step 2: Restore Order Strategy
The order you restore VMs determines how quickly everything else recovers. Dependencies matter.
| Tier | VM Type | RTO Target | Dependencies |
|---|---|---|---|
| 1 | DNS, DHCP | 1 hour | None |
| 2 | Active Directory / LDAP | 2 hours | DNS |
| 3 | Database servers | 3 hours | DNS, AD |
| 4 | Application servers | 4-6 hours | Databases, AD |
| 5 | User-facing services | 6-12 hours | Applications |
| 6 | Everything else | 24+ hours | All above |
Start with infrastructure. A DNS server that isn't running means almost everything in tier 3 and above will fail to start correctly. AD or LDAP servers come next. Applications that depend on database connections cannot be validated until the databases are running.
Document your specific VM list with actual IDs, roles, and dependencies. This list is what your team works from during recovery.
Step 3: Restore VMs from PBS
Two paths: GUI or CLI.
GUI Restore
In the PVE web interface: Datacenter > Storage > offsite-pbs > Backups. Select a VM backup and click Restore.
Key settings:
- VM ID: Use the original VM ID, or assign a new one if there is a conflict
- Target Storage: The local storage for the VM's disks
- Unique: Disable for DR restores where you want the original MAC address and UUID preserved
CLI Restore
```shell
# Restore VM 100 to the same ID, disks to local-lvm storage
qmrestore offsite-pbs:backup/vm/100/2026-03-24T02:00:00Z 100 \
    --storage local-lvm

# Restore container 200
pct restore 200 offsite-pbs:backup/ct/200/2026-03-24T02:00:00Z \
    --storage local-lvm

# Storage name differs between original and DR site: redirect all disks
qmrestore offsite-pbs:backup/vm/101/2026-03-24T02:00:00Z 101 \
    --storage local-nvme
```

The --storage flag matters when your DR site uses different storage pool names than the original. Without it, qmrestore tries to place each disk on the storage recorded in the backup's configuration, and the restore fails if that storage name does not exist at the DR site.
Storage Name Mismatch
If your original PVE used storage named "local-zfs" and your DR node has "local-lvm", the restore will fail unless you redirect disk placement. Pass --storage to send all restored disks to a storage that exists at the DR site.
Step 4: Network Reconfiguration
Your DR site almost certainly has different IP ranges, VLANs, and gateway addresses than your primary. Account for this before booting restored VMs.
Review each VM's network configuration before starting it. In the PVE hardware view, check the bridge assignment for each network device. If your DR site uses a different bridge name (e.g., vmbr1 instead of vmbr0), update it before boot.
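The bridge can also be changed from the CLI before first boot. A sketch with a placeholder MAC address; including the existing MAC keeps the guest's NIC identity stable across the change:

```shell
# Move VM 100's first NIC to the DR site's bridge, keeping its MAC
# (AA:BB:CC:DD:EE:FF is a placeholder for the VM's real MAC address)
qm set 100 --net0 virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr1
```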
For Linux VMs with cloud-init:
```shell
# Update IP config before starting the VM
qm set 100 --ipconfig0 ip=10.10.1.100/24,gw=10.10.1.1

# Then start the VM; cloud-init applies the new config on first boot
qm start 100
```

For VMs without cloud-init, boot them without network access (remove or disconnect the network device), reconfigure the IP inside the VM via console, then re-attach the network device.
DNS reconfiguration happens at the same time. Once your DR DNS server is running at a new IP, every other restored VM needs that address. Update /etc/resolv.conf or your DHCP scope to point to the DR DNS server before networking comes up on application VMs.
Step 5: Validation and Cutover
Restore without validation is not recovery. Before declaring success on any tier, confirm:
Per VM:
- OS boots without filesystem errors or service failures
- Critical services are running (`systemctl status`, `sc query`)
- Data is present and current (check timestamps, query databases)
- Network connectivity to tier dependencies works
Infrastructure level:
- Tier 1 and 2 VMs are reachable by hostname from all restored VMs, not just by IP
- Authentication works across the environment
- Monitoring receives data from restored hosts
External access:
- DNS records updated to DR site IPs
- Firewall rules permit expected external traffic
- Load balancers and reverse proxies reconfigured
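The per-VM service checks lend themselves to a small script run inside each restored Linux guest. A sketch, where the service list and the tier-1 hostname are placeholders for your environment:

```shell
#!/bin/sh
# Per-VM validation sketch: service health plus tier-1 name resolution.
# Service names below are examples; substitute your own.
for svc in sshd cron postgresql; do
    if systemctl is-active --quiet "$svc"; then
        echo "$svc: OK"
    else
        echo "$svc: FAILED"
    fi
done

# Name resolution against the restored DNS tier (placeholder hostname)
getent hosts dc01.example.internal >/dev/null \
    && echo "DNS: OK" || echo "DNS: FAILED"
```

Run it from the runbook on every restored VM and attach the output to the recovery log; a FAILED line is a tier dependency to chase before moving on.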
Document Fixes During Recovery
Track every manual fix you make. If DNS pointed to the wrong IP, or a service needed a manual config file update, that is a gap in your runbook. Fix the runbook after recovery, not during it.
Estimating Recovery Time
Restore time has two components: data transfer and service startup.
Transfer time depends on backup size and available bandwidth:
Rough formula: total backup size (GB) ÷ (bandwidth in Mbps × 0.112 × 3.6) = hours. The 0.112 factor converts Mbps to realistic MB/s after protocol overhead, and the 3.6 converts MB/s into GB per hour.
A 1 Gbps connection delivers roughly 112 MB/s. Restoring 2 TB of VM backups takes about five hours on a 1 Gbps link, assuming no other bottlenecks. Use the initial seed calculator to model your specific scenario.
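The arithmetic is easy to script for your own numbers. A sketch using awk, with the 2 TB / 1 Gbps example from above:

```shell
# hours = size_GB / (Mbps * 0.112 * 3.6)
# 0.112 converts Mbps to realistic MB/s; 3.6 converts MB/s to GB per hour
size_gb=2000   # 2 TB of backups
mbps=1000      # 1 Gbps link
awk -v s="$size_gb" -v m="$mbps" \
    'BEGIN { printf "~%.1f hours\n", s / (m * 0.112 * 3.6) }'
```

With these inputs it prints "~5.0 hours", matching the estimate above; swap in your own backup size and measured bandwidth.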
Service startup time varies. A database server may take 10-20 minutes after boot to complete crash recovery and accept connections. Factor this into your tier timelines.
Your total RTO is: bootstrap time, plus the longest restore in each tier, plus validation time for each tier, summed across tiers. Run a full DR drill to measure your actual numbers. Estimates are almost always optimistic.
Building a DR Runbook
A DR runbook is a step-by-step document your team or your client can follow without you. It answers every question that comes up during recovery, when people are under pressure and making mistakes is easy.
DR Runbook Contents
Escalation contacts
Who to call, in what order, at what hours. Include vendors and hosting providers.
Credential locations
Where to find PBS credentials, encryption keys, and network documentation. Not the credentials themselves.
Recovery site access
How to provision the DR environment, including provider logins and provisioning steps.
Prioritized VM list
VM IDs, roles, dependencies, and RTO targets. This is the restore order reference.
Step-by-step restore procedure
The exact commands and settings for your environment. No ambiguity.
Validation checklist
Per-VM and per-service acceptance criteria. Defines what 'recovered' means.
DNS and network cutover steps
Specific IP changes, DNS record updates, and firewall rule adjustments.
Test schedule and sign-off
When this runbook was last tested, by whom, and what issues were found.
For MSPs managing multiple clients, maintain a separate runbook per client. Each client has different infrastructure, different RTO requirements, and different DNS and network layouts. A shared generic runbook causes mistakes under pressure.
Test the runbook. Hand it to someone unfamiliar with the environment and have them execute it. Every step they cannot follow without asking a question is a gap to fix. The restore testing guide covers how to run a full DR drill against this runbook on an annual schedule.
Wrapping Up
Offsite backups on Proxmox Backup Server solve the data preservation problem. Getting a cluster running from those backups requires preparation that happens before disaster. Store credentials off-site, define your restore order, know your recovery site options, and drill the process at least once a year.
The difference between a drill and a real event is that the drill gives you the opportunity to find gaps before they matter. Every gap found in a drill is one fewer surprise at 2am.
Need offsite PBS storage for your DR plan?
remote-backups.com provides encrypted Proxmox Backup Server targets in EU datacenters with geo-replication across regions.
View Plans


