The best backup system is worthless if you've never tested a restore. This isn't a theoretical concern. Most administrators discover their backup strategy has gaps at the worst possible moment: during an actual outage, with stakeholders watching and downtime accumulating.
The pattern is predictable. Backups run for months. The jobs report success. Nobody checks whether those backups actually produce a bootable, functional system. Then a disk fails, ransomware hits, or someone drops the wrong VM, and the first restore attempt happens under maximum pressure with zero prior practice.
Key Takeaways
- A backup job reporting OK does not prove the backup is restorable — only a restore test does
- Use a tiered testing schedule: verification weekly, file restores monthly, full VM restores quarterly, bare-metal DR drill annually
- Always restore to a new VM ID with the Unique flag and isolated networking to avoid production conflicts
- Document every test: VM tested, backup date, restore duration, pass/fail, and issues found
- Your actual recovery time is usually 2-3x your assumed RTO — measure it during drills
By the end of this article, you'll have a repeatable process for validating your Proxmox Backup Server backups, a testing schedule you can maintain, and a clear understanding of what can go wrong and how to catch it early.
Why Restore Testing Matters
A backup job reporting "OK" tells you that data was written to the datastore. It does not tell you that data can be read back, reassembled into a functioning VM, and booted successfully. Several failure modes only surface during an actual restore:
- Silent corruption. Bitrot, faulty RAM, or storage controller issues can corrupt chunks on disk without triggering backup job errors. Your backup completes, but the data is damaged. PBS verification jobs catch chunk-level corruption, but they don't prove a VM will boot.
- Configuration drift. A backup that worked six months ago covered a VM with 4GB RAM and two disks. That VM now has 16GB, four disks, and a PCIe passthrough device. Your backup may not capture what you think it captures.
- Credential and permission issues. Restore permissions, storage access, and API token scopes that were valid during setup may have changed. You won't know until you try.
- Compliance requirements. Standards like ISO 27001 and SOC 2 mandate documented restore tests. "We assume it works" doesn't pass an audit.
Restore testing complements your broader data protection strategy. If you've implemented ransomware protection measures, regular restore tests confirm those protections actually deliver a recoverable system.
Types of Restore Tests
Not every test requires a full DR drill. Match your testing effort to the risk you're validating against.
Restore Test Types
| Attribute | Verification job | File-level restore | Full VM restore to temp ID | Bare-metal DR drill |
|---|---|---|---|---|
| Effort | Low | Medium | High | Very High |
| What It Validates | Chunk integrity (hash check) | Data accessibility, file contents | Complete VM recovery, boot, services | Full infrastructure rebuild from scratch |
| Recommended Frequency | Weekly | Monthly | Quarterly | Annually |
Verification jobs are automated and lightweight. They re-read chunks from disk and confirm SHA-256 hashes match. This catches storage-level corruption but doesn't prove a VM will boot. Think of it as a smoke detector, not a fire drill.
File-level restores prove you can access individual files inside a backup. Pick a few files you know the contents of, restore them, and compare. This catches backup scope issues (wrong disks, excluded paths) without the overhead of spinning up a full VM.
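One way to make the comparison step concrete is a checksum check against sentinel files whose contents you recorded ahead of time. The sketch below simulates the restore with a local copy so the validation logic is clear; in a real test, the restored file would come from the PBS file-restore dialog, and all paths here are illustrative:

```shell
# Sketch: validate a file-level restore by comparing checksums.
# The cp below stands in for the file actually pulled out of the backup.
WORK=$(mktemp -d)
ORIG="$WORK/app.conf"                  # known-content sentinel file
RESTORED="$WORK/app.conf.restored"     # where the restored copy lands
echo "listen_port = 8080" > "$ORIG"
cp "$ORIG" "$RESTORED"                 # stand-in for the restore step
if cmp -s "$ORIG" "$RESTORED"; then
    STATUS="PASS"
else
    STATUS="FAIL"
fi
echo "file restore check: $STATUS"
rm -rf "$WORK"
```

For binary files, comparing sha256sum output on both copies gives the same guarantee.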
Full VM restore tests are the gold standard for operational confidence. You restore a complete VM to a temporary ID, boot it, verify services work, and tear it down. This is what the next section covers in detail.
Bare-metal DR drills simulate total infrastructure loss. You start from scratch with a fresh Proxmox VE install, reconnect to your PBS datastore, and restore everything. This is the only way to validate your entire recovery chain, including documentation, credentials, and network configuration.
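During a drill, the step most often missing from documentation is reattaching the datastore to the fresh node. Assuming hypothetical server, datastore, and user names, the reconnection looks roughly like this with pvesm:

```shell
# Reattach an existing PBS datastore to a freshly installed PVE node.
# Server, datastore, username, and fingerprint are all placeholders.
pvesm add pbs pbs-main \
    --server backup.example.com \
    --datastore main \
    --username restore@pbs \
    --password "$PBS_PASSWORD" \
    --fingerprint "AA:BB:CC:...:FF"
```

Keep the datastore fingerprint and credentials in your DR documentation; without them, this step stalls the entire drill.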
Step-by-Step: Full VM Restore Test
This walkthrough covers the most practical test: restoring a production VM to a temporary ID, validating it works, and cleaning up.
1. Select a Test Candidate
Choose a VM that matters. Testing restores on a throwaway container proves nothing useful. Pick a production VM with real services: a database server, application host, or domain controller. Rotate which VMs you test each quarter so you eventually cover everything critical.
2. Locate the Backup
In the Proxmox VE web interface, navigate to Datacenter > Storage, select your PBS storage, and open the Backups tab. You'll see a list of available snapshots grouped by VM ID. Select the backup you want to test. For restore testing, the most recent backup is usually the right choice.
3. Start the Restore
Click Restore on the selected backup. The restore dialog is where the critical settings live:
- VM ID: Assign a new VM ID that doesn't conflict with any existing VM. If your production VM is ID 100, restore to something like 9100. Never restore over a running production VM during a test.
- Unique: Enable this checkbox. It generates new MAC addresses and UUID for the restored VM, preventing network identity conflicts with the production system.
- Target Storage: Choose where to place the restored VM's disks. This can be the same storage as production or a different pool, as long as it has enough free space.
- Start after restore: Leave this disabled. You want to review the VM configuration before booting it.
4. Monitor Progress
The restore task appears in the task log. Watch for errors; a healthy restore completes without warnings. Note the restore duration: this is your actual recovery time for this VM, not a theoretical estimate.
5. Isolate and Boot
Before starting the restored VM, review its network configuration.
Prevent IP Conflicts
Use an isolated VLAN or disable the network interfaces on the test VM before booting. A restored VM with the same IP address as your production system will cause network conflicts that impact production. In the VM's hardware settings, either remove the network device or assign it to a test-only bridge.
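Both isolation options can be applied from the CLI before first boot; the VM ID and bridge name below are examples:

```shell
# Keep the restored test VM off the production network before booting.
# Option 1: move the NIC to a test-only bridge and keep the link down
qm set 9100 --net0 virtio,bridge=vmbr99,link_down=1

# Option 2: remove the network device entirely
qm set 9100 --delete net0
```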
Start the VM and verify:
- OS boots successfully. Watch the console for boot errors, filesystem checks, or service failures.
- Services start. Log in and confirm that critical services (databases, web servers, application daemons) are running.
- Data is present. Check that recent data exists, not just the OS. Query a database, open a recent file, verify log timestamps.
- Application-level validation. If possible, run a basic smoke test against the application. Can you log into the web UI? Does the API respond?
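A basic smoke test can be scripted as a couple of probes against the isolated test VM. The address, port, and database query below are assumptions to adapt to your services:

```shell
# Hypothetical smoke test against the restored VM on its isolated network.
VM_IP="10.99.0.100"     # test-VLAN address, example only

# Does the web UI answer?
curl -fsS --max-time 5 "http://$VM_IP/" > /dev/null \
    && echo "web: OK" || echo "web: FAIL"

# Does the database accept queries? (PostgreSQL example; table is illustrative)
psql -h "$VM_IP" -U app -c "SELECT max(created_at) FROM orders;" \
    && echo "db: OK" || echo "db: FAIL"
```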
6. Document and Clean Up
Record your results: VM tested, backup date, restore duration, pass/fail, and any issues found. This documentation is valuable for compliance and for improving your process.
When you're satisfied, shut down and delete the test VM. Don't leave test VMs running. They consume resources and create confusion.
CLI Alternative
If you prefer the command line, qmrestore handles the same operation:
```shell
# Restore VM backup to new ID 9100, with unique MAC/UUID
qmrestore <backup-path> 9100 --storage local-lvm --unique
```

Replace <backup-path> with the PBS backup path (e.g., pbs:backup/vm/100/2026-02-10T02:00:00Z). The --unique flag is the CLI equivalent of the "Unique" checkbox in the GUI.
For containers, use pct restore instead:
```shell
pct restore 9200 <backup-path> --storage local-lvm --unique
```

Automating Restore Tests
Manual quarterly tests are essential, but automation handles the routine checks between them.
PBS Verification Jobs
Verification jobs are built into Proxmox Backup Server and should be your first line of defense. They re-read stored chunks and verify SHA-256 hashes, catching storage corruption before it becomes a restore failure.
Configure these in the PBS web UI under your datastore's Verify Jobs tab. Schedule them to run at least weekly. Pair them with monitoring and alerting so a failed verification triggers an immediate notification, not an entry in a log nobody reads.
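The same job can be created from the PBS shell; the job ID, datastore name, and schedule below are illustrative:

```shell
# Weekly verification job: check all chunks, re-verify snapshots whose last
# verification is more than 30 days old
proxmox-backup-manager verify-job create weekly-verify \
    --store main \
    --schedule "sat 02:00" \
    --outdated-after 30 \
    --ignore-verified true
```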
Scripted Restore Validation
For teams that want automated end-to-end restore testing, you can script the restore-boot-validate-cleanup cycle:
```shell
#!/bin/bash
# Monthly automated restore test
TEST_VMID=9999
BACKUP="pbs:backup/vm/100/latest"
STORAGE="local-lvm"

# Restore to temporary VM
qmrestore "$BACKUP" $TEST_VMID --storage $STORAGE --unique 2>&1
if [ $? -ne 0 ]; then
    echo "RESTORE FAILED" | mail -s "Restore Test FAILED" admin@example.com
    exit 1
fi

# Boot and wait for QEMU guest agent
qm start $TEST_VMID
sleep 120

# Basic health check via guest agent
qm guest cmd $TEST_VMID ping 2>&1
RESULT=$?

# Clean up
qm stop $TEST_VMID && qm destroy $TEST_VMID --purge

# Report
if [ $RESULT -eq 0 ]; then
    echo "Restore test PASSED" | mail -s "Restore Test OK" admin@example.com
else
    echo "Restore test FAILED - VM did not respond" | mail -s "Restore Test FAILED" admin@example.com
fi
```

Adapt this to your environment. Add application-specific health checks (HTTP requests, database queries) for more thorough validation.
Storage Consumption
Automated restore tests temporarily consume storage for the test VM's disks. Ensure your target storage has enough headroom, and always destroy test VMs automatically. A forgotten test VM with a 500GB disk will cause problems.
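A pre-flight check at the top of the automation script avoids a half-finished restore. The path and threshold here are examples; size the threshold to the restored VM's disks:

```shell
# Abort the automated restore test when the target filesystem lacks headroom.
TARGET_PATH="/tmp"        # mount point backing the test storage (example)
NEEDED_GB=1               # example threshold; size it to the test VM's disks
AVAIL_GB=$(df -BG --output=avail "$TARGET_PATH" | tail -n 1 | tr -dc '0-9')
if [ "$AVAIL_GB" -lt "$NEEDED_GB" ]; then
    echo "space check: FAIL (${AVAIL_GB}G available, ${NEEDED_GB}G needed)"
    exit 1
fi
echo "space check: OK (${AVAIL_GB}G available)"
```

For non-filesystem backends like LVM-thin, pvesm status on the PVE node reports per-storage free space instead.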
Building a DR Testing Schedule
A sustainable testing schedule uses a tiered approach. Not everything needs to be tested at the same frequency.
- Weekly: PBS verification jobs run automatically. These are set-and-forget once configured.
- Monthly: File-level restore test. Pick 2-3 random files from different VMs, restore them, verify their contents match expectations.
- Quarterly: Full VM restore test on 2-3 critical VMs. Follow the step-by-step process above. Rotate which VMs you test so all critical systems are covered within a year.
- Annually: Full DR drill. Simulate a complete host failure. Start from a fresh Proxmox VE installation, connect to your PBS datastore (including any offsite datastores), and restore your core infrastructure from scratch. This tests not just your backups, but your documentation, your credentials, and your team's ability to execute under pressure.
What to Document
Keep a restore test log. For each test, record:
- Date of the test
- VMs or data tested (VM IDs, file paths)
- Backup snapshot used (timestamp)
- Time to restore (wall clock from start to verified-working)
- Result (pass/fail)
- Issues found and remediation taken
- Tester name
This log becomes evidence for compliance audits and a reference for estimating recovery times during actual incidents.
Common Restore Failures and How to Avoid Them
When a restore test fails, the cause usually falls into one of these categories:
- "Datastore not found" — The PBS storage isn't configured on the target node. If you're restoring to a different PVE node than the one that created the backup, add the PBS storage to that node first.
- Slow restores — Usually a bandwidth bottleneck between PVE and PBS. For remote/offsite datastores, network throughput is the limiting factor. Use the initial seed calculator to estimate restore times for large VMs and plan accordingly.
- Permission errors — The backup user or API token lacks restore privileges. The DatastoreBackup role allows creating backups but not restoring. You need at least DatastoreReader for restore operations.
- Disk space errors — The target storage doesn't have enough free space for the restored VM's disks. Check available space before starting. A thin-provisioned 500GB disk might only use 80GB in the backup, but restore may allocate the full 500GB depending on your storage backend.
- Boot failures after successful restore — The restore completed but the VM won't boot. Common causes include missing EFI disk (if the original used UEFI boot), hardware configuration mismatches, or driver issues. Check the VM's hardware configuration matches what's needed.
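When chasing a boot failure, comparing the restored VM's hardware configuration against the original often finds the mismatch quickly; the VM IDs below are examples:

```shell
# Compare firmware-related settings between production and the restored copy
qm config 100  | grep -E 'bios|efidisk|machine|scsihw'
qm config 9100 | grep -E 'bios|efidisk|machine|scsihw'
```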
Recommended DR Testing Cadence
Weekly: PBS Verification Jobs
Automated chunk integrity checks. Set and forget after initial configuration.
Monthly: File-Level Restore
Restore 2-3 random files from different VMs. Verify contents match expectations.
Quarterly: Full VM Restore
Restore 2-3 critical VMs to temporary IDs. Boot, verify services, document results.
Annually: Bare-Metal DR Drill
Simulate total host failure. Fresh PVE install, reconnect to PBS, restore core infrastructure.
Conclusion
Backups that haven't been tested are assumptions, not protection. The verification job catches corrupted chunks. The file-level restore catches scope issues. The full VM restore proves you can actually recover. And the annual DR drill proves your team can execute the entire recovery chain under pressure.
Build the testing habit now, while nothing is on fire. Start with a single VM restore this week, add verification jobs if you haven't already, and commit to the quarterly schedule. Each test you run is one fewer surprise during a real incident.