
PBS Verify Jobs: Automate Backup Integrity Checks

Your backup job ran at 2:00 AM and reported success. What it didn't tell you is that three chunks in VM 105's backup are silently corrupted from a bad sector on your storage array. You won't find out until you try to restore under pressure. PBS verify jobs are designed to surface that problem before it becomes a crisis.

Key Takeaways
  • Verify jobs re-read stored chunks and recompute SHA-256 checksums to detect silent corruption at rest
  • They read already-stored data, not live systems, so production workloads are untouched (schedule them outside backup windows to avoid storage contention)
  • The --ignore-verified and --outdated-after flags prevent redundant rechecking and reduce I/O load
  • Verify does not require decrypting client-side encrypted data — it validates encrypted chunks directly
  • For MSPs, namespace-scoped verify jobs let you validate per-client data with tenant-level granularity
  • Verify catches chunk corruption; it does not prove a VM will boot — combine with quarterly restore tests

This post covers what verify jobs actually do, how to configure them for single-node and multi-tenant environments, and how to wire them into your alerting so failures don't disappear into a log nobody reads.

What PBS Verify Jobs Actually Do

Proxmox Backup Server stores backups as chunks: fixed-size blocks for VM and block-device backups, variably sized ones for file-level backups. Each chunk is identified by its SHA-256 digest, which doubles as its integrity checksum. When you run a verify job, PBS re-reads each chunk from disk and recomputes that digest. If the computed hash doesn't match the stored one, the chunk is flagged as corrupted.
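
Conceptually, a verify pass is just "re-read, re-hash, compare". The shell sketch below reduces that loop to a single file; it is an illustration only, since real PBS chunks live under the datastore's chunk store and are managed entirely by the server.

```shell
# Illustration: what verification does for one chunk, reduced to shell.
# Real PBS chunks are stored by the server, named by their digest.
chunk=$(mktemp)
printf 'example chunk payload' > "$chunk"

# Digest recorded when the chunk was first written:
stored_digest=$(sha256sum "$chunk" | cut -d' ' -f1)

# Later, a verify pass re-reads the data from disk and re-hashes it:
current_digest=$(sha256sum "$chunk" | cut -d' ' -f1)

if [ "$stored_digest" = "$current_digest" ]; then
    result="chunk OK"
else
    result="chunk CORRUPT"
fi
echo "$result"
rm -f "$chunk"
```

Any bit that flips between write time and read time changes the recomputed digest, which is exactly the failure mode verify jobs exist to catch.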

This catches a specific class of failure: data that was written correctly but degraded on disk afterward. Common causes include:

  • Bitrot on spinning disks or consumer SSDs with weak error correction
  • Storage controller failures that corrupt data during reads or writes
  • Filesystem bugs or unclean shutdowns on the backup datastore
  • Bad RAM that flips bits after a write completes

What verify jobs do not do: they don't check whether a backup produces a functional, bootable system. A backup with intact chunks but a corrupted partition table will pass verification. For application-level validation, you still need restore tests, which the PBS restore testing guide covers in detail.

Encrypted Backups

PBS uses client-side encryption. If a backup was encrypted before reaching the server, verify jobs can still run against it. They validate the encrypted chunks directly, comparing stored and computed hashes on the ciphertext. No decryption key is required on the server side.

This matters for MSPs: you can verify client backups without ever having access to their encryption keys.

Chunk deduplication and verification

When chunks are deduplicated across multiple backups, verifying one backup also validates shared chunks. PBS tracks which backups reference each chunk, so verification coverage compounds over time. Run verify on your oldest active backup and you'll likely cover chunks used by more recent ones too.
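
The compounding effect is easy to see in miniature. In this sketch (plain shell, hypothetical paths, not the real PBS layout), two snapshot indexes reference the same content-addressed chunk, so a single hash check covers both:

```shell
# Two snapshots, one shared chunk: verifying the unique chunk once
# validates every snapshot that references it. Paths are illustrative.
store=$(mktemp -d)
digest=$(printf 'shared block' | sha256sum | cut -d' ' -f1)
printf 'shared block' > "$store/$digest"

echo "$digest" > "$store/snapshot-a.idx"   # Monday's backup
echo "$digest" > "$store/snapshot-b.idx"   # Tuesday's backup

# One verification pass over the unique chunks (hex-named files):
verified=0
for f in "$store"/[0-9a-f]*; do
    [ "$(sha256sum "$f" | cut -d' ' -f1)" = "$(basename "$f")" ] && verified=$((verified + 1))
done
echo "verified $verified unique chunk(s) referenced by 2 snapshots"
rm -rf "$store"
```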

Setting Up Verify Jobs

CLI Configuration

The proxmox-backup-manager verify-job command manages verify jobs on the server. The basic pattern is:

bash
proxmox-backup-manager verify-job create weekly-verify \
    --store main-datastore \
    --schedule "weekly" \
    --ignore-verified true \
    --outdated-after 30
Create a weekly verify job

Key flags:

  • --store — the datastore to verify
  • --schedule — a systemd-style calendar event: daily, weekly, monthly, or an expression like sun 03:00 for 3 AM on Sundays (cron syntax is not supported)
  • --ignore-verified — skip chunks that have already been verified; avoids redundant work on large datastores
  • --outdated-after — re-verify chunks that haven't been checked in N days, even if they were previously verified

The --ignore-verified and --outdated-after flags work together. With --ignore-verified true and --outdated-after 30, a verify job will skip chunks verified within the last 30 days and re-check anything older. This keeps I/O load predictable and ensures nothing goes unchecked indefinitely.
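
For reference, PBS schedules are systemd-style calendar events rather than cron expressions. A few commonly used forms (typical examples, not an exhaustive grammar; see the PBS calendar-events documentation for the full syntax):

```text
daily              # every day at 00:00
sun 03:00          # Sundays at 3 AM
mon..fri 01:30     # weekdays at 1:30 AM
sat 18:15          # Saturdays at 6:15 PM
*:0/30             # every 30 minutes
```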

To list existing jobs:

bash
proxmox-backup-manager verify-job list
List verify jobs

To update a job after creation:

bash
proxmox-backup-manager verify-job update weekly-verify \
    --schedule "sun 02:00"
Update verify schedule

Web UI Setup

In the PBS web interface, navigate to a datastore and open the Verify Jobs tab. Click Add and fill in:

  • Job ID — a label for the job
  • Schedule — choose from the dropdown or enter a custom expression
  • Ignore verified — check this on any datastore over 1 TB to avoid re-scanning recently verified chunks
  • Re-Verify After — set a day count; 30 days works for most environments

Save, then click Run Now to confirm the job executes without errors before relying on the schedule.

Namespaced Verification

If you're running multi-tenant namespaces, you can scope a verify job to a specific namespace rather than verifying the entire datastore. This gives you per-client verification scheduling and isolated failure reporting.

bash
proxmox-backup-manager verify-job create client-a-verify \
    --store shared-datastore \
    --ns clients/client-a \
    --schedule "weekly" \
    --ignore-verified true \
    --outdated-after 14
Verify a specific namespace

For MSP environments, the typical pattern is one verify job per client namespace, each with a slightly different schedule time to spread the I/O load across the week.
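
One way to generate those staggered jobs is a small loop that emits one create command per namespace, advancing the schedule by 30 minutes each time. The namespace and datastore names below are placeholders; the loop prints the commands so you can review them before piping the output to sh:

```shell
# Generate staggered verify-job commands for client namespaces.
# Names are illustrative placeholders; adjust for your environment.
store="shared-datastore"
hour=5
minute=0
for ns in clients/acme clients/globex clients/initech; do
    job_id="verify-$(echo "$ns" | tr '/' '-')"
    printf 'proxmox-backup-manager verify-job create %s \\\n' "$job_id"
    printf '    --store %s --ns %s \\\n' "$store" "$ns"
    printf '    --schedule "sun %02d:%02d" --ignore-verified true --outdated-after 14\n' "$hour" "$minute"
    # Offset the next client's job by 30 minutes:
    minute=$((minute + 30))
    if [ "$minute" -ge 60 ]; then
        minute=0
        hour=$((hour + 1))
    fi
done
```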

Scheduling Strategy

Verify jobs read from storage. On a busy datastore with active backup windows, running verification at the same time creates contention. Schedule verify jobs during off-peak hours, after backup jobs complete.

A general rule: run backup jobs first, then verification. If your backups finish by 4 AM, schedule verification at 5 AM.

For large datastores, the --ignore-verified flag makes a significant difference. Without it, a 10 TB datastore running a full verify every week means reading 10 TB every Sunday. With it, PBS only reads the chunks not already checked in the last 30 days, which on a stable datastore is a small fraction of the total.

Recommended Verify Schedules by Datastore Size

  • Under 1 TB: weekly full scan with ignore-verified false; low overhead per run
  • 1–10 TB: weekly, ignore-verified true, outdated-after 30 days; incremental runs read roughly 5–20% of the total
  • 10–50 TB: twice weekly, ignore-verified true, outdated-after 14 days; load spread across the two runs
  • 50 TB+: daily during off-peak hours, ignore-verified true, outdated-after 7 days; a small slice each day, full coverage over a week
Stagger jobs in multi-tenant setups

In MSP environments with multiple client namespaces, offset each verify job by 30–60 minutes. If every client's verify job starts at 5 AM, you get a burst of storage reads that peaks immediately. Staggering them means consistent, lower I/O throughout the morning window.

Interpreting Verify Results

After a verify job runs, results appear in the Task Log in the PBS web interface, and via proxmox-backup-manager task list on the CLI.

A clean run looks like this in the logs:

text
verify datastore 'main-datastore': OK
Verification successful.
verified 14832 chunks, 0 failed
Successful verify job output

A failure looks like this:

text
verify chunk 'a1b2c3d4...': checksum mismatch
verify backup 'vm/105/2026-03-15T02:00:05Z': failed
Verification failed: 1 chunk failed
Failed verify job output

When a chunk fails, the output includes the chunk hash and which backup snapshots reference it. Use that information to identify which VMs are affected and which backup dates are potentially unrestorable.
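
Turning a failed task log into a list of affected snapshots takes one sed pass over the saved output. The heredoc below stands in for a real log saved from the failed task; the log format matches the example above:

```shell
# Extract the snapshot paths that failed verification from a task log.
# The heredoc stands in for real output saved from a failed verify run.
cat > /tmp/verify-task.log <<'EOF'
verify chunk 'a1b2c3d4...': checksum mismatch
verify backup 'vm/105/2026-03-15T02:00:05Z': failed
Verification failed: 1 chunk failed
EOF

affected=$(sed -n "s/^verify backup '\(.*\)': failed$/\1/p" /tmp/verify-task.log)
echo "$affected"
```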

A failed verify means that backup may not restore

A chunk checksum mismatch means that specific backup snapshot may not restore cleanly. Don't delete it immediately. First, check whether more recent snapshots of the same VM pass verification. If newer snapshots are intact, the corrupted snapshot can be pruned. If the corruption is recent and affects your only good backup, run an immediate fresh backup before removing anything.

After verifying the scope of corruption, consider whether you need to run garbage collection after pruning affected snapshots to reclaim space from orphaned chunks.

Alerting on Verify Failures

Verify results in the task log are useless if nobody reads them. Wire failures into your alerting pipeline.

PBS Notification System

Proxmox Backup Server has a built-in notification system under Configuration > Notifications. Configure a notification target (email, Gotify, or a custom webhook) and set up a notification rule that fires on verify jobs with a failed or warning status.

With this in place, a verify failure sends an alert immediately rather than waiting for someone to check the UI.

Prometheus Integration

If you're running PBS alongside Prometheus and Grafana, verify status can join the same alerting stack you already use for backup job status, storage utilization, and task durations. PBS has no native Prometheus scrape endpoint (its built-in metric export targets external servers such as InfluxDB), so this typically goes through an exporter that polls the PBS task API. With that in place, you can create an alert rule that fires when the last verify job for a datastore ends in a non-OK state.

yaml
- alert: PBSVerifyJobFailed
  expr: pbs_verify_job_status{status!="ok"} == 1
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "PBS verify job failed on {{ $labels.datastore }}"
    description: "Chunk corruption detected. Check PBS task log immediately."
Example Prometheus alert rule

Adapt the metric name to match your PBS exporter's actual labels. The point is that verify failures should be as visible as backup failures — they're equally serious.

Verify vs Restore Testing

Verification and restore testing are complementary, not interchangeable. Both belong in your backup monitoring and alerting practice.

Verify Jobs vs Restore Tests

  • What it checks: verify jobs check chunk checksums on disk; restore tests exercise a full restore, boot, and services
  • Automation: verify jobs are fully automated; restore tests are manual or scripted
  • Overhead: verify jobs are low to medium (storage reads only); restore tests are high (storage, CPU, temporary disk space)
  • What it proves: verify proves data is intact at rest; a restore test proves the backup produces a working system
  • Recommended frequency: verify weekly; restore test quarterly per critical VM

Verify weekly. Restore test quarterly. The verify job catches corruption early, before your only intact backup gets pruned. The restore test catches everything else: configuration drift, permission issues, and whether your restored VM actually boots.

For MSPs, this combination gives you a defensible position with clients. Automated weekly verification proves you're actively monitoring backup health. Quarterly restore tests prove the backups actually work. Both are documentable.

Compliance and Client Reporting

Verify job results create an audit trail. For clients or organizations subject to ISO 27001, SOC 2, or internal IT governance frameworks, you need to demonstrate that backups are not just taken but actively validated.

The PBS task log exports to JSON. You can pull verify job results via the API and include them in client-facing reports:

bash
# List recent verify tasks for a datastore
proxmox-backup-manager task list \
    --typefilter verificationjob \
    --output-format json | jq '.[] | {upid, status, starttime}'
Export verify job results via PBS API

For MSPs managing multiple clients through a structured PBS setup, automate this into a weekly report: one line per client namespace, last verify date, pass/fail. Clients with SOC 2 audits will ask for this. Have it ready.
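
A minimal version of that report can be built with jq alone. The JSON document below is a stand-in for real task-list output (field names match the filter shown above; the UPIDs are shortened placeholders, not real identifiers):

```shell
# Build a one-line-per-task verify report from task-list JSON.
# The sample document stands in for real `task list --output-format json`
# output; on a live host, pipe the real JSON in instead.
cat > /tmp/verify-tasks.json <<'EOF'
[
  {"upid": "UPID:pbs:0001:client-a-verify", "status": "OK", "starttime": 1742004000},
  {"upid": "UPID:pbs:0002:client-b-verify", "status": "verification failed", "starttime": 1742007600}
]
EOF

report=$(jq -r '.[] | [(.starttime | todate), .status, .upid] | @tsv' /tmp/verify-tasks.json)
echo "$report"
```

Each output line carries the run time, pass/fail status, and task ID, which is enough for a per-client summary row.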

Verification is necessary but not sufficient for compliance

Most compliance frameworks require demonstrating recoverability, not just integrity. Verify jobs satisfy the "data is intact" requirement. To fully satisfy backup-related controls, pair them with documented restore tests. Both need to be in your evidence package.

Wrapping Up

Verify jobs are low-cost insurance against a class of failures that silent corruption makes genuinely dangerous. Configure them once, set a reasonable schedule with --ignore-verified and --outdated-after, and wire failures into your alerting stack. Then forget about them until they send you an alert.

For MSPs, the value compounds: automated verification plus quarterly restore tests gives you a repeatable, documentable process. Clients want proof their backups are valid. Verify jobs make that proof automatic.

Don't skip the alerting step. A verify job that fails and writes to a log nobody reads is worse than not running it — it creates a false sense of coverage.

Managed PBS with built-in verification

remote-backups.com runs automated verify jobs across all customer datastores. Integrity monitoring is included with every plan, with alerting on failure.


Frequently Asked Questions

How long does a verify job take?

It depends on datastore size, storage speed, and whether --ignore-verified is enabled. On a 1 TB datastore with fast NVMe storage, a full scan typically runs in 30–60 minutes. With --ignore-verified and --outdated-after 30, a weekly job on a stable datastore may complete in minutes since most chunks were already checked. The first run after enabling verify jobs is always the slowest — subsequent runs with incremental flags are much faster.

Do verify jobs affect backup performance?

Verify jobs read from the same storage pool used for backups. If you run them concurrently with active backup jobs, there will be I/O contention. Schedule verify jobs after your backup window closes. On systems with fast storage and no concurrent backups, the impact is minimal. Monitor your storage latency during the first few runs to confirm it's acceptable in your environment.

Does encryption prevent verification?

No. PBS verify jobs work on encrypted chunks directly — they validate the SHA-256 checksum of the ciphertext, not the plaintext. No decryption key is required on the server. From a verify job configuration standpoint, encrypted and unencrypted backups are handled identically. This is one reason client-side encryption is safe to use: you can verify integrity without exposing keys to the server operator.

What should I do when a verify job finds corruption?

First, identify which backup snapshots are affected using the verify task log. Check whether more recent snapshots of the same VM pass verification. If newer backups are intact, the corrupted snapshot can be pruned safely. If the corruption is recent or widespread, run a fresh backup immediately before removing anything. Then investigate the root cause: check storage health (SMART data, ZFS scrub results), controller logs, and server RAM. A single corrupted chunk might be a one-off event; multiple failures across different backups suggest a hardware problem.
Bennet Gallein

remote-backups.com operator

Infrastructure enthusiast and founder of remote-backups.com. I build and operate reliable backup infrastructure powered by Proxmox Backup Server, so you can focus on what matters most: your data staying safe.