The best backup system is worthless if you've never tested a restore. This isn't a theoretical concern. Most administrators discover their backup strategy has gaps at the worst possible moment: during an actual outage, with stakeholders watching and downtime accumulating.
The pattern is predictable. Backups run for months. The jobs report success. Nobody checks whether those backups actually produce a bootable, functional system. Then a disk fails, ransomware hits, or someone drops the wrong VM, and the first restore attempt happens under maximum pressure with zero prior practice.
Key Takeaways
- A backup job reporting OK does not prove the backup is restorable — only a restore test does
- Use a tiered testing schedule: verification weekly, file restores monthly, full VM restores quarterly, bare-metal DR drill annually
- Always restore to a new VM ID with the Unique flag and isolated networking to avoid production conflicts
- Document every test: VM tested, backup date, restore duration, pass/fail, and issues found
- Your actual recovery time is usually 2-3x your assumed RTO — measure it during drills
By the end of this article, you'll have a repeatable process for validating your Proxmox Backup Server backups, a testing schedule you can maintain, and a clear understanding of what can go wrong and how to catch it early.
Why Restore Testing Matters
A backup job reporting "OK" tells you that data was written to the datastore. It does not tell you that data can be read back, reassembled into a functioning VM, and booted successfully. Several failure modes only surface during an actual restore:
- Silent corruption. Bitrot, faulty RAM, or storage controller issues can corrupt chunks on disk without triggering backup job errors. Your backup completes, but the data is damaged. PBS verification jobs catch chunk-level corruption, but they don't prove a VM will boot.
- Configuration drift. A backup that worked six months ago covered a VM with 4GB RAM and two disks. That VM now has 16GB, four disks, and a PCIe passthrough device. Your backup may not capture what you think it captures.
- Credential and permission issues. Restore permissions, storage access, and API token scopes that were valid during setup may have changed. You won't know until you try.
- Compliance requirements. Standards like ISO 27001 and SOC 2 mandate documented restore tests. "We assume it works" doesn't pass an audit.
Restore testing complements your broader data protection strategy. If you've implemented ransomware protection measures, regular restore tests confirm those protections actually deliver a recoverable system.
Types of Restore Tests
Not every test requires a full DR drill. Match your testing effort to the risk you're validating against.
Restore Test Types
| Attribute | Verification job | File-level restore | Full VM restore to temp ID | Bare-metal DR drill |
|---|---|---|---|---|
| Effort | Low | Medium | High | Very High |
| What It Validates | Chunk integrity (hash check) | Data accessibility, file contents | Complete VM recovery, boot, services | Full infrastructure rebuild from scratch |
| Recommended Frequency | Weekly | Monthly | Quarterly | Annually |
Verification jobs are automated and lightweight. They re-read chunks from disk and confirm SHA-256 hashes match. This catches storage-level corruption but doesn't prove a VM will boot. Think of it as a smoke detector, not a fire drill.
File-level restores prove you can access individual files inside a backup. Pick a few files you know the contents of, restore them, and compare. This catches backup scope issues (wrong disks, excluded paths) without the overhead of spinning up a full VM.
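One way to make the comparison step concrete is a checksum check against sentinel files whose contents you recorded ahead of time. The sketch below simulates the restore with a local copy so the validation logic is clear; in a real test, the restored file would come from the PBS file-restore dialog, and all paths here are illustrative:

```shell
# Sketch: validate a file-level restore by comparing checksums.
# The cp below stands in for the file actually pulled out of the backup.
WORK=$(mktemp -d)
ORIG="$WORK/app.conf"                  # known-content sentinel file
RESTORED="$WORK/app.conf.restored"     # where the restored copy lands
echo "listen_port = 8080" > "$ORIG"
cp "$ORIG" "$RESTORED"                 # stand-in for the restore step
if cmp -s "$ORIG" "$RESTORED"; then
    STATUS="PASS"
else
    STATUS="FAIL"
fi
echo "file restore check: $STATUS"
rm -rf "$WORK"
```

For binary files, comparing sha256sum output on both copies gives the same guarantee.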
Full VM restore tests are the gold standard for operational confidence. You restore a complete VM to a temporary ID, boot it, verify services work, and tear it down. This is what the next section covers in detail.
Bare-metal DR drills simulate total infrastructure loss. You start from scratch with a fresh Proxmox VE install, reconnect to your PBS datastore, and restore everything. This is the only way to validate your entire recovery chain, including documentation, credentials, and network configuration.
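During a drill, the step most often missing from documentation is reattaching the datastore to the fresh node. Assuming hypothetical server, datastore, and user names, the reconnection looks roughly like this with pvesm:

```shell
# Reattach an existing PBS datastore to a freshly installed PVE node.
# Server, datastore, username, and fingerprint are all placeholders.
pvesm add pbs pbs-main \
    --server backup.example.com \
    --datastore main \
    --username restore@pbs \
    --password "$PBS_PASSWORD" \
    --fingerprint "AA:BB:CC:...:FF"
```

Keep the datastore fingerprint and credentials in your DR documentation; without them, this step stalls the entire drill.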
Step-by-Step: Full VM Restore Test
This walkthrough covers the most practical test: restoring a production VM to a temporary ID, validating it works, and cleaning up.
1. Select a Test Candidate
Choose a VM that matters. Testing restores on a throwaway container proves nothing useful. Pick a production VM with real services: a database server, application host, or domain controller. Rotate which VMs you test each quarter so you eventually cover everything critical.
2. Locate the Backup
In the Proxmox VE web interface, navigate to Datacenter > Storage, select your PBS storage, and open the Backups tab. You'll see a list of available snapshots grouped by VM ID. Select the backup you want to test. For restore testing, the most recent backup is usually the right choice.
3. Start the Restore
Click Restore on the selected backup. The restore dialog is where the critical settings live:
- VM ID: Assign a new VM ID that doesn't conflict with any existing VM. If your production VM is ID 100, restore to something like 9100. Never restore over a running production VM during a test.
- Unique: Enable this checkbox. It generates new MAC addresses and UUID for the restored VM, preventing network identity conflicts with the production system.
- Target Storage: Choose where to place the restored VM's disks. This can be the same storage as production or a different pool, as long as it has enough free space.
- Start after restore: Leave this disabled. You want to review the VM configuration before booting it.
4. Monitor Progress
The restore task appears in the task log. Watch for errors; a healthy restore completes without warnings. Note the restore duration: this is your actual recovery time for this VM, not a theoretical estimate.
5. Isolate and Boot
Before starting the restored VM, review its network configuration.
Prevent IP Conflicts
Use an isolated VLAN or disable the network interfaces on the test VM before booting. A restored VM with the same IP address as your production system will cause network conflicts that impact production. In the VM's hardware settings, either remove the network device or assign it to a test-only bridge.
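Both isolation options can be applied from the CLI before first boot; the VM ID and bridge name below are examples:

```shell
# Keep the restored test VM off the production network before booting.
# Option 1: move the NIC to a test-only bridge and keep the link down
qm set 9100 --net0 virtio,bridge=vmbr99,link_down=1

# Option 2: remove the network device entirely
qm set 9100 --delete net0
```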
Start the VM and verify:
- OS boots successfully. Watch the console for boot errors, filesystem checks, or service failures.
- Services start. Log in and confirm that critical services (databases, web servers, application daemons) are running.
- Data is present. Check that recent data exists, not just the OS. Query a database, open a recent file, verify log timestamps.
- Application-level validation. If possible, run a basic smoke test against the application. Can you log into the web UI? Does the API respond?
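A basic smoke test can be scripted as a couple of probes against the isolated test VM. The address, port, and database query below are assumptions to adapt to your services:

```shell
# Hypothetical smoke test against the restored VM on its isolated network.
VM_IP="10.99.0.100"     # test-VLAN address, example only

# Does the web UI answer?
curl -fsS --max-time 5 "http://$VM_IP/" > /dev/null \
    && echo "web: OK" || echo "web: FAIL"

# Does the database accept queries? (PostgreSQL example; table is illustrative)
psql -h "$VM_IP" -U app -c "SELECT max(created_at) FROM orders;" \
    && echo "db: OK" || echo "db: FAIL"
```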
6. Document and Clean Up
Record your results: VM tested, backup date, restore duration, pass/fail, and any issues found. This documentation is valuable for compliance and for improving your process.
When you're satisfied, shut down and delete the test VM. Don't leave test VMs running. They consume resources and create confusion.
CLI Alternative
If you prefer the command line, qmrestore handles the same operation:
```shell
# Restore VM backup to new ID 9100, with unique MAC/UUID
qmrestore <backup-path> 9100 --storage local-lvm --unique
```

Replace <backup-path> with the PBS backup path (e.g., pbs:backup/vm/100/2026-02-10T02:00:00Z). The --unique flag is the CLI equivalent of the "Unique" checkbox in the GUI.
For containers, use pct restore instead:
```shell
pct restore 9200 <backup-path> --storage local-lvm --unique
```

Automating Restore Tests
Manual quarterly tests are essential, but automation handles the routine checks between them.
PBS Verification Jobs
Verification jobs are built into Proxmox Backup Server and should be your first line of defense. They re-read stored chunks and verify SHA-256 hashes, catching storage corruption before it becomes a restore failure.
Configure these in the PBS web UI under your datastore's Verify Jobs tab. Schedule them to run at least weekly. Pair them with monitoring and alerting so a failed verification triggers an immediate notification, not an entry in a log nobody reads.
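The same job can be created from the PBS shell; the job ID, datastore name, and schedule below are illustrative:

```shell
# Weekly verification job: check all chunks, re-verify snapshots whose last
# verification is more than 30 days old
proxmox-backup-manager verify-job create weekly-verify \
    --store main \
    --schedule "sat 02:00" \
    --outdated-after 30 \
    --ignore-verified true
```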
Scripted Restore Validation
For teams that want automated end-to-end restore testing, you can script the restore-boot-validate-cleanup cycle:
```shell
#!/bin/bash
# Monthly automated restore test
TEST_VMID=9999
BACKUP="pbs:backup/vm/100/latest"
STORAGE="local-lvm"

# Restore to temporary VM
qmrestore "$BACKUP" $TEST_VMID --storage $STORAGE --unique 2>&1
if [ $? -ne 0 ]; then
    echo "RESTORE FAILED" | mail -s "Restore Test FAILED" admin@example.com
    exit 1
fi

# Boot and wait for QEMU guest agent
qm start $TEST_VMID
sleep 120

# Basic health check via guest agent
qm guest cmd $TEST_VMID ping 2>&1
RESULT=$?

# Clean up
qm stop $TEST_VMID && qm destroy $TEST_VMID --purge

# Report
if [ $RESULT -eq 0 ]; then
    echo "Restore test PASSED" | mail -s "Restore Test OK" admin@example.com
else
    echo "Restore test FAILED - VM did not respond" | mail -s "Restore Test FAILED" admin@example.com
fi
```

Adapt this to your environment. Add application-specific health checks (HTTP requests, database queries) for more thorough validation.
Storage Consumption
Automated restore tests temporarily consume storage for the test VM's disks. Ensure your target storage has enough headroom, and always destroy test VMs automatically. A forgotten test VM with a 500GB disk will cause problems.
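A pre-flight check at the top of the automation script avoids a half-finished restore. The path and threshold here are examples; size the threshold to the restored VM's disks:

```shell
# Abort the automated restore test when the target filesystem lacks headroom.
TARGET_PATH="/tmp"        # mount point backing the test storage (example)
NEEDED_GB=1               # example threshold; size it to the test VM's disks
AVAIL_GB=$(df -BG --output=avail "$TARGET_PATH" | tail -n 1 | tr -dc '0-9')
if [ "$AVAIL_GB" -lt "$NEEDED_GB" ]; then
    echo "space check: FAIL (${AVAIL_GB}G available, ${NEEDED_GB}G needed)"
    exit 1
fi
echo "space check: OK (${AVAIL_GB}G available)"
```

For non-filesystem backends like LVM-thin, pvesm status on the PVE node reports per-storage free space instead.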
Building a DR Testing Schedule
A sustainable testing schedule uses a tiered approach. Not everything needs to be tested at the same frequency.
- Weekly: PBS verification jobs run automatically. These are set-and-forget once configured.
- Monthly: File-level restore test. Pick 2-3 random files from different VMs, restore them, verify their contents match expectations.
- Quarterly: Full VM restore test on 2-3 critical VMs. Follow the step-by-step process above. Rotate which VMs you test so all critical systems are covered within a year.
- Annually: Full DR drill. Simulate a complete host failure. Start from a fresh Proxmox VE installation, connect to your PBS datastore (including any offsite datastores), and restore your core infrastructure from scratch. This tests not just your backups, but your documentation, your credentials, and your team's ability to execute under pressure.
What to Document
Keep a restore test log. For each test, record:
- Date of the test
- VMs or data tested (VM IDs, file paths)
- Backup snapshot used (timestamp)
- Time to restore (wall clock from start to verified-working)
- Result (pass/fail)
- Issues found and remediation taken
- Tester name
This log becomes evidence for compliance audits and a reference for estimating recovery times during actual incidents.
Common Restore Failures and How to Avoid Them
When a restore test fails, the cause usually falls into one of these categories:
- "Datastore not found" — The PBS storage isn't configured on the target node. If you're restoring to a different PVE node than the one that created the backup, add the PBS storage to that node first.
- Slow restores — Usually a bandwidth bottleneck between PVE and PBS. For remote/offsite datastores, network throughput is the limiting factor. Use the initial seed calculator to estimate restore times for large VMs and plan accordingly.
- Permission errors — The backup user or API token lacks restore privileges. The DatastoreBackup role allows creating backups but not restoring. You need at least DatastoreReader for restore operations.
- Disk space errors — The target storage doesn't have enough free space for the restored VM's disks. Check available space before starting. A thin-provisioned 500GB disk might only use 80GB in the backup, but restore may allocate the full 500GB depending on your storage backend.
- Boot failures after successful restore — The restore completed but the VM won't boot. Common causes include missing EFI disk (if the original used UEFI boot), hardware configuration mismatches, or driver issues. Check the VM's hardware configuration matches what's needed.
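When chasing a boot failure, comparing the restored VM's hardware configuration against the original often finds the mismatch quickly; the VM IDs below are examples:

```shell
# Compare firmware-related settings between production and the restored copy
qm config 100  | grep -E 'bios|efidisk|machine|scsihw'
qm config 9100 | grep -E 'bios|efidisk|machine|scsihw'
```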
Recommended DR Testing Cadence
Weekly: PBS Verification Jobs
Automated chunk integrity checks. Set and forget after initial configuration.
Monthly: File-Level Restore
Restore 2-3 random files from different VMs. Verify contents match expectations.
Quarterly: Full VM Restore
Restore 2-3 critical VMs to temporary IDs. Boot, verify services, document results.
Annually: Bare-Metal DR Drill
Simulate total host failure. Fresh PVE install, reconnect to PBS, restore core infrastructure.
Conclusion
Backups that haven't been tested are assumptions, not protection. The verification job catches corrupted chunks. The file-level restore catches scope issues. The full VM restore proves you can actually recover. And the annual DR drill proves your team can execute the entire recovery chain under pressure.
Build the testing habit now, while nothing is on fire. Start with a single VM restore this week, add verification jobs if you haven't already, and commit to the quarterly schedule. Each test you run is one fewer surprise during a real incident.