
PBS Troubleshooting: Fix Common Issues

PBS doesn't fail often. When it does, you need to diagnose and fix it fast, before your next backup window opens or your client notices. This guide covers the most common Proxmox Backup Server failures in order of frequency: connection problems, job failures, performance degradation, garbage collection confusion, and sync issues.

Key Takeaways
  • Port 8007 is the PBS HTTPS interface — check firewall, service status, and fingerprint before anything else
  • Username realm matters: user@pbs for PBS users, user@pam for Linux system users
  • Pruning removes snapshot manifests but does not free disk space — only garbage collection does that
  • GC duration scales with chunk count, not raw storage size; large datastores taking hours is normal
  • Verify failures don't mean a backup is gone — they flag specific corrupted chunks; re-run the affected backup
  • For MSPs: read the PBS task log directly, not just the PVE job summary — the detail is in PBS

Connection Issues

Connection failures account for a large portion of PBS support tickets. Most come down to four things: firewall, service state, addressing, or TLS fingerprint.

Connection Refused or Timeout

PBS serves its web interface and API on port 8007/TCP over HTTPS. If connections time out or get refused, work through this checklist:

Firewall: Check that port 8007 is open on the PBS host. On Debian-based systems:

bash
# Check current iptables rules
iptables -L INPUT -n --line-numbers | grep 8007

# Or with ufw
ufw status | grep 8007

# Open the port if missing
ufw allow 8007/tcp
Check and open port 8007

Service state: Two systemd units handle PBS connections. Both must be running:

bash
systemctl status proxmox-backup.service
systemctl status proxmox-backup-proxy.service
Check PBS service status

The proxmox-backup-proxy service handles HTTPS on port 8007. If it's stopped, all remote connections fail regardless of firewall state.

Addressing: Verify the IP or hostname is correct on the client side. DNS resolution failures present identically to refused connections from the client's perspective.
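A quick way to separate DNS, TCP, and TLS failures is to test each layer in order. A minimal sketch, run from the client side (the hostname is a placeholder; adjust for your environment):

```shell
# Hypothetical helper: run from the PVE node or another client machine.
# Usage: check_pbs_reachable backup.example.com
check_pbs_reachable() {
  local host="$1"

  # 1. Name resolution (skip this step if you connect by IP)
  getent hosts "$host" || echo "DNS: '$host' does not resolve"

  # 2. TCP reachability on the PBS port
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/8007" 2>/dev/null; then
    echo "TCP: port 8007 reachable"
  else
    echo "TCP: port 8007 refused or filtered"
  fi

  # 3. TLS handshake (-k because a self-signed certificate is expected)
  curl -sk --connect-timeout 3 "https://$host:8007" >/dev/null \
    && echo "TLS: handshake OK" \
    || echo "TLS: handshake failed"
}
```

If step 2 fails while the PBS host is up, look at the firewall and service state; if only step 3 fails, suspect the certificate.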

TLS Fingerprint Errors

Proxmox Backup Server uses a self-signed certificate by default. PVE records the certificate's SHA-256 fingerprint when you add PBS as a storage backend. Reinstalling PBS or renewing the certificate generates a new fingerprint, and PVE then refuses the connection with a fingerprint mismatch error.

Find the current fingerprint:

bash
proxmox-backup-manager cert info | grep Fingerprint
Get current PBS TLS fingerprint

Alternatively, open the PBS web UI and navigate to Dashboard, then Certificate Information. Copy the fingerprint from there.

Update the stored fingerprint in PVE: go to Datacenter > Storage, edit your PBS storage entry, and paste the new fingerprint into the Fingerprint field.

Fingerprint changes break all connected PVE nodes

If multiple PVE clusters or standalone nodes use the same PBS instance, you must update the fingerprint in each one's storage configuration. Within a single cluster, /etc/pve/storage.cfg is shared, so one edit covers every node in that cluster, but separate clusters and standalone nodes each need their own update. One missed environment causes intermittent failures that look unrelated to the cert change.
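One way to script the update, sketched below: read the fingerprint from PBS, then push it into each PVE environment's storage entry. The host names and storage ID are placeholders, and it assumes `pvesm set` accepts the PBS storage's fingerprint option, as it does for PBS-type storages in current PVE releases.

```shell
# Sketch: refresh a PBS fingerprint on several standalone PVE nodes.
# Usage: update_pbs_fingerprint <pbs-host> <storage-id> <pve-node>...
update_pbs_fingerprint() {
  local pbs_host="$1" storage_id="$2" fp node
  shift 2
  # Pull the current fingerprint straight from the PBS host
  fp="$(ssh "root@$pbs_host" proxmox-backup-manager cert info \
        | awk -F': ' '/Fingerprint/ {print $2}')"
  [ -n "$fp" ] || { echo "could not read fingerprint from $pbs_host"; return 1; }
  echo "New fingerprint: $fp"
  for node in "$@"; do
    # One node per cluster is enough; standalone nodes each need this
    ssh "root@$node" pvesm set "$storage_id" --fingerprint "$fp"
  done
}
```

Example: `update_pbs_fingerprint pbs01 pbs-storage pve1 pve2 pve3`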

Authentication Failures

PBS uses realm-qualified usernames. The realm suffix is not optional:

  • user@pbs (PBS internal users): accounts created in the PBS user database
  • user@pam (Linux PAM): system users authenticated by the OS
  • user@realm!tokenname (API tokens): token-based auth for automation

Using admin instead of admin@pam or admin@pbs produces a 401 error. The token format for API tokens is user@realm!tokenname, not user@realm:tokenname.

Beyond usernames, check ACL permissions. A user needs at least DatastoreReader on the target datastore for read operations and DatastoreBackup to write backups. Missing ACLs produce permission denied errors, not auth failures, but the two are easy to confuse.
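For example, granting backup-write rights to a PBS-realm user on a single datastore. A sketch to run on the PBS host; the datastore and user names are placeholders:

```shell
# Sketch: grant DatastoreBackup on one datastore to a PBS user.
# Usage: grant_backup_acl <datastore> <auth-id>
grant_backup_acl() {
  local store="$1" user="$2"
  # DatastoreBackup allows writing backups; DatastoreReader covers reads/restores
  proxmox-backup-manager acl update "/datastore/$store" DatastoreBackup --auth-id "$user"
  # Confirm the assignment
  proxmox-backup-manager acl list
}
```

Example: `grant_backup_acl client-a 'backup-svc@pbs'`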

Backup Job Failures

Datastore Full

The most common backup failure. PBS rejects new backups when a datastore hits capacity. Before expanding storage, verify that maintenance has been running:

bash
# Check current datastore usage
proxmox-backup-manager datastore list

# List prune jobs and their last run time
proxmox-backup-manager prune-job list

# Check GC status and last run for a datastore
proxmox-backup-manager garbage-collection status <store>
Check datastore usage and maintenance jobs

If pruning has been running but the datastore is still full, garbage collection may not have run since the last prune. Pruning removes snapshot manifests. It does not free disk space. Only garbage collection reclaims space by deleting orphaned chunks. See our pruning and garbage collection guide for the full explanation of how these two operations relate.

Run GC manually to reclaim space immediately:

bash
proxmox-backup-manager garbage-collection start <datastore-name>
Run GC on a datastore manually

Snapshot Failures

QEMU guest agent not responding: If qemu-guest-agent is enabled in the VM config but not installed or running inside the guest, the backup may stall waiting for a quiesced snapshot. Fix: either install and start the agent inside the VM, or disable the agent option in the VM's hardware configuration if you don't need quiescing.
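From the PVE node you can confirm whether the agent actually answers before the next backup window. A sketch; the VM ID is a placeholder:

```shell
# Sketch: check guest agent health for a VM (run on the PVE node).
# Usage: check_guest_agent <vmid>
check_guest_agent() {
  local vmid="$1"
  # Is the agent option enabled in the VM config?
  qm config "$vmid" | grep '^agent'
  # Does the agent inside the guest respond? Fails if not installed/running.
  qm agent "$vmid" ping && echo "agent responding" || echo "agent not responding"
}
```

If the ping fails, either install and start the agent inside the guest (`systemctl enable --now qemu-guest-agent` on most distros) or disable the option with `qm set <vmid> --agent 0`.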

Storage backend errors: If the VM's underlying storage has I/O errors, backup reads fail. Check dmesg and storage logs on the PVE node to confirm the disk, ZFS pool, or NFS mount is healthy before blaming PBS.

Timeout During Backup

Large changed blocks over a slow or unstable network produce timeouts. The PBS client retries internally, but a sustained degraded connection can outlast the retry window.

Diagnose the network path first:

bash
# On PBS server (or any host near it): start iperf3 server
iperf3 -s

# On PVE node: test throughput
iperf3 -c <pbs-host-ip> -t 30
Test throughput between PVE node and PBS

If throughput is well below your expected baseline, the problem is the network path, not PBS configuration. For WAN-connected offsite targets, schedule backups during off-peak hours and consider seeding large initial datasets locally. Our initial seed loading guide covers transfer strategies for large first-time syncs.

Common Backup Failures at a Glance

  • "datastore full": GC hasn't run since the last prune. Fix: run proxmox-backup-manager garbage-collection start <store>
  • Backup stalls, then fails: guest agent timeout. Fix: install qemu-guest-agent or disable the agent in the VM config
  • Job fails after partial upload: network timeout or instability. Fix: test with iperf3, schedule during off-peak hours, reduce concurrent jobs
  • "permission denied" on backup write: ACL missing DatastoreBackup. Fix: add the ACL via the PBS UI or proxmox-backup-manager acl update
  • "no space left on device": datastore at 100% capacity. Fix: run GC, then tighten the retention policy if still full

Performance Issues

Slow Backups

Narrow down to network, disk, or CPU before changing any configuration.

Network: Run iperf3 between the PVE node and PBS host as shown above. Compare against your expected baseline.

Disk I/O: Run iostat -x 2 on both the PVE node and PBS host during an active backup. Watch %util for the storage devices involved. A device near 100% utilization is saturated.
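The disk check can be wrapped so you sample one device repeatedly during a live backup. A sketch; it assumes the sysstat package is installed and the device name is a placeholder:

```shell
# Sketch: sample extended disk stats during a backup window (needs sysstat).
# Usage: watch_disk_saturation <device> [samples]
watch_disk_saturation() {
  local dev="$1" samples="${2:-10}"
  # -d: device report, -x: extended stats.
  # Watch %util (near 100 = saturated) and await (latency per request).
  iostat -dx "$dev" 2 "$samples"
}
```

Example: `watch_disk_saturation sda 15` samples the device every 2 seconds, 15 times.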

CPU: Run htop on the PBS host during a backup. High CPU from proxmox-backup-proxy points to compression and checksum overhead. PBS compresses chunks with zstd on the client side before upload, and there is no per-datastore server switch to turn it off; if compressing a chunk doesn't shrink it, PBS stores it uncompressed. For large datasets that are already compressed (VM images full of compressed files, database backups using their own compression), the zstd pass yields little additional saving, so check the numbers before tuning anything.

Check the compression ratio first

PBS shows deduplication and compression statistics per datastore. If your compression ratio is below 1.05x (less than 5% savings), compression is contributing little; zstd bails out of incompressible data quickly, so the CPU cost on such data is usually smaller than it looks. If the ratio is above 1.4x, the overhead is clearly paying for itself.

Too Many Concurrent Jobs

PBS does not hard-limit concurrent backup jobs. If 30 VMs all back up simultaneously, PBS serves all of them at once, and CPU and disk I/O suffer on both ends. Stagger your PVE backup job schedules by 5 to 10 minutes. For MSPs managing multiple client environments from a shared PBS target, this is one of the most impactful configuration changes you can make.

We cover multi-client scheduling in depth in our PBS backup scheduling guide.
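PVE and PBS schedules use systemd-style calendar events, so you can sanity-check a staggered plan before committing it. A sketch, assuming a systemd host with `systemd-analyze` available; the times are placeholders:

```shell
# Sketch: preview when staggered backup windows would next fire.
preview_stagger() {
  local t
  # One job start every 10 minutes beginning at 01:00
  for t in "01:00" "01:10" "01:20" "01:30"; do
    systemd-analyze calendar "$t" | grep -E 'Normalized|Next elapse'
  done
}
```

If `Next elapse` shows two jobs landing in the same minute, the stagger isn't doing its job.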

Performance Diagnostics

  • Slow backup throughput: iperf3 -c <pbs-host>; compare against your line-rate baseline
  • Disk saturation: iostat -x 2 during a backup; look for %util above 90% on the storage device
  • CPU bottleneck: htop on the PBS host; look for high proxmox-backup-proxy CPU
  • Too many concurrent jobs: PBS web UI under Active Tasks; more than 4-6 simultaneous backup jobs is a warning sign
  • GC overlap with backups: proxmox-backup-manager garbage-collection status <store>; confirm the GC schedule doesn't overlap the backup window

Garbage Collection Problems

Space Not Freed After Pruning

This is the most common source of confusion in Proxmox Backup Server. The workflow is:

  1. Prune removes snapshot manifests based on your retention policy
  2. GC identifies chunks not referenced by any remaining snapshot and deletes them

If you prune 50 old snapshots but don't run GC, the disk usage does not change. Chunks from those snapshots may be shared with other snapshots through deduplication. GC is the only operation that actually frees disk space, and it only frees chunks that are no longer referenced by any snapshot.

Run GC after every significant pruning operation and check the output:

bash
# Run GC (replace <store> with your datastore name)
proxmox-backup-manager garbage-collection start <store>

# Check recent GC task logs for bytes freed
proxmox-backup-manager task list --limit 10 | grep -i garbage
Run GC and check freed space

GC Takes Too Long

GC duration scales with the number of unique chunks in the datastore, not with total storage size. A 10TB datastore with high deduplication runs faster than a 2TB datastore with many unique chunks. On HDD-backed datastores, the mark phase involves random reads across the entire chunk store, which is slow by nature.

Running GC weekly instead of daily keeps each run smaller. If GC regularly takes more than 4-6 hours, check available memory on the PBS host. The mark phase loads chunk manifests into memory. Tight memory causes swap usage, which extends GC significantly.

Never run GC while backup or sync jobs are active

PBS includes a 24-hour safety window that protects newly written chunks, but the safest approach is to schedule GC in a dedicated maintenance window with no concurrent backup or sync activity. An overlapping GC sweep can delete chunks that an in-progress backup hasn't finished indexing yet.

Verification Failures

PBS verify jobs read each snapshot's chunks and validate their checksums. A failed verify means one or more chunks have mismatched checksums, which indicates data corruption on disk.

A verify failure does not mean the entire backup is unrecoverable. PBS identifies which snapshots are affected. Re-run the backup for those VMs to create new, clean snapshots. Then prune the corrupted snapshots.
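After re-running the affected backups, a manual verify confirms the datastore is clean again. A sketch to run on the PBS host, assuming the `verify` subcommand available in current PBS releases; the datastore name is a placeholder:

```shell
# Sketch: re-verify a datastore and list recent verify tasks.
# Usage: reverify_store <datastore>
reverify_store() {
  local store="$1"
  # Reads every snapshot's chunks and re-checks their checksums
  proxmox-backup-manager verify "$store"
  # Recent verify tasks with their status
  proxmox-backup-manager task list --limit 10 | grep -i verif
}
```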

Prevention is straightforward: run verify jobs on a regular schedule and alert on failures. Catching corruption early, before you need the backup, is the entire point. Our PBS verify jobs guide covers scheduling strategies and interpreting verify output.

Alert on every verify failure

A silent verify failure on your offsite datastore that you discover at restore time is a worst-case scenario. Configure email or webhook notifications for failed verify tasks. It takes 10 minutes to set up and saves hours of recovery work.

Sync Job Issues

Sync job failures generally fall into three categories: remote connection problems, namespace configuration, and encryption mismatch.

Remote connection failures follow the same pattern as PVE-to-PBS connection issues: wrong hostname, port 8007 not reachable, fingerprint mismatch, or authentication failure. Work through the connection checklist from the first section.

Namespace mismatch: If you use PBS namespaces for multi-tenant isolation, sync jobs must specify matching source and target namespaces. A sync job anchored at the root namespace won't pull data from sub-namespaces unless its --max-depth setting allows recursion into them. Check the --ns and --remote-ns parameters, plus --max-depth, when a sync job succeeds but transfers no data.
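To inspect and widen the namespace scope of an existing sync job, something like the following sketch can help. The job ID and depth are placeholders, and it assumes `sync-job update` accepts the max-depth option as in current PBS releases:

```shell
# Sketch: inspect and widen a sync job's namespace scope (run on the pulling PBS).
# Usage: fix_sync_depth <job-id>
fix_sync_depth() {
  local job="$1"
  # Show configured sync jobs, including their ns / remote-ns settings
  proxmox-backup-manager sync-job list
  # Recurse two namespace levels below the configured anchor
  proxmox-backup-manager sync-job update "$job" --max-depth 2
}
```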

Encrypted sync: Sync jobs transfer raw chunks without decrypting them. If the source datastore contains a mix of encrypted and unencrypted backups, use --encrypted-only to ensure only encrypted data reaches the remote target. Omitting this flag on a mixed datastore silently transfers unencrypted chunks to your offsite server.

bash
proxmox-backup-manager sync-job create offsite-sync \
  --store local-datastore \
  --remote offsite-pbs \
  --remote-store remote-datastore \
  --encrypted-only true \
  --schedule "02:00"
Create sync job with encrypted-only flag

MSP Quick Checklist

When a client reports a backup problem, work through these steps before touching any configuration:

  1. Is PBS running? systemctl status proxmox-backup.service proxmox-backup-proxy.service
  2. Is the datastore full? proxmox-backup-manager datastore list
  3. Did the job fail or just warn? Check the PBS task log directly, not just the PVE job summary. PVE sometimes shows "OK with warnings" for jobs that actually wrote corrupted data.
  4. When did GC last run? proxmox-backup-manager garbage-collection status <store>
  5. What does the error message actually say? PBS error messages are specific. Read the full task log.
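The checklist above condenses into a single triage function you can keep on each PBS host. A sketch; the datastore name is a placeholder:

```shell
# Sketch: first-response triage, run on the PBS host before changing any config.
# Usage: pbs_triage <datastore>
pbs_triage() {
  local store="$1"
  # 1. Is PBS running?
  systemctl --no-pager status proxmox-backup.service proxmox-backup-proxy.service
  # 2. Is the datastore full?
  proxmox-backup-manager datastore list
  # 3. When did GC last run?
  proxmox-backup-manager garbage-collection status "$store"
  # 4. Read the full PBS task log, not just the PVE summary
  proxmox-backup-manager task list --limit 20
}
```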

For MSPs managing multiple PBS environments, catching problems before clients notice them requires centralized monitoring. Prometheus metrics from each PBS instance let you alert on "datastore approaching full" before it causes job failures. Our PBS monitoring guide covers the full Prometheus and Grafana setup for multi-environment visibility.

Quick Reference

Quick Reference Commands

  • Service status: systemctl status proxmox-backup-proxy.service (whether the HTTPS proxy is running)
  • Datastore usage: proxmox-backup-manager datastore list (used and total space per datastore)
  • TLS fingerprint: proxmox-backup-manager cert info (current certificate fingerprint)
  • Active tasks: proxmox-backup-manager task list --limit 20 (recent job history with status)
  • Prune jobs: proxmox-backup-manager prune-job list (configured prune schedules)
  • GC status: proxmox-backup-manager garbage-collection status <store> (last GC run for a datastore)
  • Run GC now: proxmox-backup-manager garbage-collection start <store> (trigger GC immediately)
  • ACL permissions: proxmox-backup-manager acl list (current permission assignments)

Wrapping Up

Most PBS problems have a short list of root causes: network or firewall blocking port 8007, fingerprint mismatch after a cert change, the pruning/GC confusion where space doesn't free up as expected, or too many concurrent jobs saturating CPU and disk. Work through the connection stack first, then look at job logs for specific error messages. PBS error output is specific enough that the fix is usually obvious once you're reading the right log.

For MSPs, the key is detecting issues before they cause job failures. Centralized monitoring with alerts on datastore capacity, failed verify jobs, and failed sync jobs covers the scenarios that matter most at scale.

Need the offsite PBS leg handled for you?

remote-backups.com provides encrypted PBS targets in EU datacenters. Managed GC scheduling, monitored sync jobs, and isolated credentials included.

View Plans

Bennet Gallein

remote-backups.com operator

Infrastructure enthusiast and founder of remote-backups.com. I build and operate reliable backup infrastructure powered by Proxmox Backup Server, so you can focus on what matters most: your data staying safe.