
PBS Performance Tuning: Optimize Backup Speed

Your backup job is taking four hours. It could finish in 45 minutes. Proxmox Backup Server has a collection of tuning knobs buried in sysctl, ZFS properties, and the PBS daemon config that most admins never find. Initial seed uploads crawl over WAN not because the link is too slow, but because TCP buffers are sized for 1999. This post covers the specific changes that actually move the needle.

Key Takeaways
  • ZFS recordsize=1M is the highest-impact single storage change for a PBS datastore — set it before writing data
  • TCP buffer tuning can double WAN throughput on high-latency links with no hardware changes
  • Disable per-backup chunk verification on trusted ZFS pools; run separate verify jobs on a schedule instead
  • ZSTD level 1 (PBS default) captures most compression gains at a fraction of higher-level CPU cost
  • Stagger backup windows by 30-60 minutes per client — simultaneous jobs fight for the same disk I/O pool
  • Benchmark raw disk and network before tuning. You cannot fix a bottleneck you have not identified.

Where PBS Bottlenecks Happen

Before touching any settings, identify which layer is your actual bottleneck. Tuning TCP buffers on a disk-bound system wastes time.

Three places kill PBS performance:

Network limits initial seeds and remote sync jobs. Symptoms: backup throughput tracks exactly with your link speed, CPU and disk sit idle.

Disk I/O limits incremental backups and restores. Symptoms: iotop shows 100% utilization on the datastore drive, network interface is underutilized.

CPU limits encryption and compression throughput. Symptoms: htop shows one or more cores pinned, disk and network are both idle.

Diagnostic Commands

Check what is saturated before tuning:

bash
# Disk I/O per process — look for proxmox-backup or backup-proxy
iotop -a -o

# CPU per core
htop

# Active connections and current socket buffer sizes
ss -tnp | grep backup

# Real-time disk throughput on your datastore volume
iostat -xm 2 /dev/sdX
Identify bottleneck

If iotop shows proxmox-backup saturating your disks, fix storage first. If ss shows tiny socket buffer sizes on an offsite link, start with network tuning.

Network Optimization

Network is the bottleneck for offsite sync and initial seeds. Linux's default TCP socket buffers were calibrated for LAN speeds and have not kept pace with modern WAN links.
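Before changing anything, it helps to see what the kernel is currently using. A read-only check against the standard procfs paths:

```shell
# Current congestion control algorithm (usually "cubic" on a stock install)
cat /proc/sys/net/ipv4/tcp_congestion_control

# Current socket buffer caps in bytes
cat /proc/sys/net/core/rmem_max
cat /proc/sys/net/core/wmem_max
```

If the buffer caps report values in the low hundreds of kilobytes, the tuning below applies to you.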

TCP Buffer Tuning

The kernel defaults cap socket buffers around 128K-256K. On a 1Gbps WAN link with 20ms RTT, this limits throughput well below line rate. The bandwidth-delay product at that RTT demands a buffer of at least 2.5MB to fill the pipe:

bash
# Maximum socket receive and send buffers (128 MB)
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728

# TCP read/write buffers: min, default, max
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728

# Required for large buffers to take effect
net.ipv4.tcp_window_scaling = 1

# BBR congestion control — better than cubic on variable-latency WAN links
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
/etc/sysctl.d/99-pbs-network.conf

Apply with sysctl -p /etc/sysctl.d/99-pbs-network.conf. These settings survive reboots.
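The 2.5MB figure comes straight from the bandwidth-delay product. A quick sketch of the arithmetic, using the example link speed and RTT from above:

```shell
rate_bps=1000000000   # 1 Gbps link
rtt_s=0.02            # 20 ms round-trip time

# BDP in bytes = (bits per second / 8) * RTT in seconds
bdp=$(awk -v r="$rate_bps" -v t="$rtt_s" 'BEGIN { printf "%d", r / 8 * t }')
echo "$bdp"   # 2500000, about 2.5 MB of data in flight to keep the pipe full
```

Plug in your own link speed and measured RTT to size the minimum buffer for your WAN.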

Apply on both ends

TCP buffer tuning helps most when applied on both the source (PVE node or PBS source) and the destination (your remote PBS target). Tuning only one side leaves half the improvement on the table.

MTU Considerations

If your backup VLAN supports jumbo frames end-to-end (switches, NICs, and the remote side), set MTU 9000 on the backup interface. This reduces per-packet overhead and lowers CPU interrupt load at high throughput:

bash
# Test first (non-persistent)
ip link set eth1 mtu 9000

# Verify with a large ping to the remote PBS target
ping -M do -s 8972 <pbs-target-ip>
Set jumbo frames on backup interface

Only configure this if your entire path supports jumbo frames. A mismatch causes fragmentation and usually makes performance worse.
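The 8972-byte payload in the ping test is not arbitrary: it is the 9000-byte MTU minus the IPv4 and ICMP headers:

```shell
mtu=9000
ipv4_header=20   # bytes
icmp_header=8    # bytes
echo $((mtu - ipv4_header - icmp_header))   # 8972, the largest unfragmented ping payload
```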

Bandwidth Throttling

PBS sync jobs accept a --rate limit. Set this during business hours to protect production traffic; remove it or increase it for overnight windows:

bash
# Limit to 100 MiB/s
proxmox-backup-manager sync-job update your-sync-job --rate 100MiB

# Remove the limit (delete the rate option) for overnight runs
proxmox-backup-manager sync-job update your-sync-job --delete rate
Rate-limit a sync job during business hours

Without a rate limit, a large initial sync will saturate your uplink and affect everything else on the connection.
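One way to automate the day/night split is a pair of cron entries that adjust the limit on a schedule. A sketch, assuming the job name from above; the time windows and limits are example values:

```shell
# /etc/cron.d/pbs-sync-rate (sketch; adjust times and limits to your environment)
# Throttle at 08:00 for business hours
0 8 * * 1-5   root  proxmox-backup-manager sync-job update your-sync-job --rate 100MiB
# Raise the limit at 20:00 for the overnight window
0 20 * * 1-5  root  proxmox-backup-manager sync-job update your-sync-job --rate 500MiB
```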

WireGuard vs Direct Connection

If your offsite PBS target is behind WireGuard, expect 5-10% throughput reduction from tunnel overhead. WireGuard at the kernel level is significantly faster than OpenVPN userspace. For a dedicated backup link where both machines sit in the same datacenter, a direct VLAN without a tunnel is faster still — though you lose the encryption layer in transit.

Network Tuning Settings
| Setting | Default | Tuned Value | Impact |
| --- | --- | --- | --- |
| net.core.rmem_max | 212992 | 134217728 | 2-4x throughput on high-latency WAN |
| net.core.wmem_max | 212992 | 134217728 | 2-4x throughput on high-latency WAN |
| TCP congestion control | cubic | bbr | 10-30% improvement on variable-latency links |
| MTU (backup VLAN) | 1500 | 9000 | 5-15% CPU reduction at high throughput |
| Sync job rate limit | none | per-environment | Prevents uplink saturation during business hours |

Storage Backend Tuning

Most PBS deployments use ZFS. The default ZFS properties are not optimized for PBS write patterns.

ZFS recordsize

Proxmox Backup Server stores data as variable-size chunks that average around 4MB. ZFS's default recordsize is 128K, which means a single PBS chunk write triggers roughly 32 separate ZFS record writes. Setting recordsize to 1M aligns ZFS records much closer to PBS's write size and reduces metadata overhead:

bash
zfs set recordsize=1M tank/pbs-datastore
Set recordsize on PBS datastore dataset
Set this before writing data

ZFS recordsize only applies to newly written data. If you change it on an existing datastore, already-written chunks retain the old record size. Run a full garbage collection cycle, then let new backups repopulate the datastore to migrate chunks gradually.
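The "roughly 32 record writes" figure above is just the ratio of the average chunk size to the default recordsize:

```shell
chunk=$((4 * 1024 * 1024))   # average PBS chunk, ~4 MiB
record=$((128 * 1024))       # ZFS default recordsize, 128K
echo $((chunk / record))     # 32 records written per chunk at the default
```

With recordsize=1M the same chunk spans 4 records instead of 32, cutting the associated metadata writes proportionally.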

Other ZFS Properties

Do not disable atime on a PBS datastore

PBS garbage collection uses chunk access times to identify recently written chunks and protect them from deletion during the sweep phase. Setting atime=off breaks this safety window and can cause GC to delete chunks that a concurrent backup job has written but not yet finalized, corrupting that backup.

bash
# PBS compresses its own chunks. ZFS compression on top wastes CPU.
# Disable if PBS compression is enabled (the default).
zfs set compression=off tank/pbs-datastore

# If PBS compression is disabled (e.g., for pre-compressed VM images),
# lz4 at the ZFS level is reasonable:
# zfs set compression=lz4 tank/pbs-datastore

# Prevent your ZFS management tool from snapshotting the datastore
zfs set com.sun:auto-snapshot=false tank/pbs-datastore
ZFS property tuning for PBS datastores
ZFS Properties for PBS Datastores
| Property | Recommended | Why |
| --- | --- | --- |
| recordsize | 1M | Matches PBS chunk size, reduces ZFS metadata writes |
| compression | off (when PBS compresses) | Avoids CPU cost of compressing already-compressed chunk data |
| com.sun:auto-snapshot | false | ZFS snapshots of a PBS datastore duplicate data without benefit |
| sync | disabled (with UPS + SLOG) | PBS issues its own fsyncs; disabling ZFS sync improves throughput at the risk of losing in-flight data on power loss |

SLOG and L2ARC

A SLOG (ZIL separate log device) accelerates synchronous writes. PBS issues fsync() calls when finalizing chunk writes, so a small NVMe SLOG device can help on spinning-disk arrays by offloading those sync operations. On all-NVMe pools, SLOG adds hardware complexity without meaningful benefit.

L2ARC (second-level read cache) benefits restore workloads, not backup writes. If you restore frequently from a spinning-disk datastore, an L2ARC SSD can cut restore times significantly. It does nothing for ongoing backup throughput.

ext4 and XFS Alternatives

ZFS has overhead that matters on commodity hardware. For dedicated PBS nodes on modern NVMe storage where you do not need ZFS checksumming or snapshots, ext4 or XFS with tuned mount options can outperform ZFS for raw sequential throughput. Test both if you are building a new node. For existing setups, the ZFS tuning above closes most of the gap.
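If you do test ext4 on a new node, a plain mount with relatime is a reasonable starting point. The device path and mount point below are placeholders; keep atime semantics intact, for the garbage-collection reasons covered above:

```shell
# Sketch /etc/fstab entry; device and mount point are examples
/dev/nvme0n1p1  /mnt/pbs-datastore  ext4  defaults,relatime  0  2
```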

PBS-Specific Settings

Chunk Verification After Backup

By default, Proxmox Backup Server runs a verification pass immediately after each backup job completes. This re-reads the written chunks and validates their hashes. On unreliable or untested storage it is a reasonable safeguard. On a ZFS pool with checksums enabled, it is redundant and adds significant I/O and CPU load right at the end of your backup window.

Disable per-backup verification in the datastore settings (Datastore > Options > "Verify New Backups" off). Replace it with a scheduled verify job during off-peak hours where you have full control over timing. See PBS verify jobs for scheduling strategies.
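The replacement verify job can also be created from the CLI. A sketch, with the job ID, datastore name, schedule, and re-verification window as example values:

```shell
# Weekly verify during an off-peak window; skip chunks verified in the last 30 days
proxmox-backup-manager verify-job create weekly-verify \
    --store your-datastore \
    --schedule "sat 03:00" \
    --ignore-verified true \
    --outdated-after 30
```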

Compression Settings

PBS uses ZSTD at level 1 by default for stored chunks. Level 1 is a sensible default — it compresses substantially better than LZ4 at similar CPU cost. Higher ZSTD levels (3, 6, 12+) produce meaningfully smaller chunks only for highly compressible data, while consuming dramatically more CPU per chunk.

If your backup data is already compressed (database dumps with page compression, pre-compressed archives, media files), disable PBS compression for those jobs entirely:

bash
proxmox-backup-manager datastore update your-datastore \
    --tuning compress=false
Disable compression on a specific datastore

Compressing incompressible data wastes CPU cycles and saves zero space.

Parallel Job Limits

PBS scales worker task counts with CPU count, but you can set explicit limits to prevent backup storms from overwhelming I/O:

bash
# Show current node configuration
proxmox-backup-manager node-config show

# Cap concurrent workers — tune to your CPU and storage I/O capacity
proxmox-backup-manager node-config update --max-workers 8
View and adjust worker task limits

More concurrent workers increase throughput when you have spare CPU and disk headroom. They hurt performance when you exceed either.

Garbage Collection Scheduling

GC is I/O-intensive and competes directly with active backup jobs. Running GC during a backup window cuts effective throughput roughly in half. Schedule GC for off-peak maintenance windows, well clear of your backup start times. See pruning and garbage collection for retention policy configuration and safe scheduling.
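GC timing is set per datastore. A sketch using an example datastore name and an off-peak calendar event:

```shell
# Run garbage collection daily at 04:30, after the backup window closes
proxmox-backup-manager datastore update your-datastore --gc-schedule "04:30"
```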

Client-Side Optimization

Exclusion Patterns

Backing up temp directories, cache files, and virtual filesystems adds data that has no recovery value. Adding exclusion patterns to vzdump shrinks backup size and cuts runtime:

bash
# Exclude directories that contain no useful backup data
exclude-path: /tmp
exclude-path: /var/tmp
exclude-path: /var/cache
exclude-path: /run
exclude-path: /proc
exclude-path: /sys
/etc/vzdump.conf

For VM backups with the QEMU guest agent active, exclusions apply at the guest filesystem level. Without the agent, vzdump snapshots the entire disk.

vzdump Performance Settings

bash
# Reduce I/O priority so backup does not compete with production VMs
# --ionice 8 puts the backup in the idle I/O class: it runs only when the disk is otherwise free
vzdump 101 --ionice 8 --storage pbs-storage

# Limit bandwidth in KB/s during business hours (102400 = 100 MB/s)
vzdump 101 --bwlimit 102400 --storage pbs-storage
Run vzdump with I/O priority and bandwidth limit

Encryption Overhead

PBS uses AES-256-GCM for client-side encryption. On CPUs with AES-NI hardware acceleration (Intel Westmere and later, AMD Zen and later), encryption adds less than 5% to total backup time. The bottleneck on encrypted jobs is almost never the encryption itself.

Run proxmox-backup-client benchmark to see the exact delta on your hardware. If your PBS nodes lack AES-NI and encryption overhead is significant, consider a dedicated PBS node with a modern CPU. Read PBS client-side encryption for full key management tradeoffs.
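To confirm the CPU actually exposes the AES instructions before interpreting benchmark numbers, check the CPU flags:

```shell
# Prints "aes" if the CPU advertises AES-NI, otherwise reports its absence
grep -m1 -o -w aes /proc/cpuinfo || echo "AES-NI not available"
```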

Backup Speed by Scenario
| Scenario | Typical Speed | Bottleneck | Fix |
| --- | --- | --- | --- |
| Local VM backup, spinning disk | 150-300 MB/s | Disk I/O | ZFS tuning, NVMe SLOG |
| Local VM backup, NVMe | 400-800 MB/s | CPU (compression) | Disable double-compression, verify ZSTD level |
| Offsite sync, 1 Gbps WAN | 80-110 MB/s | TCP buffers + latency | TCP buffer tuning, BBR congestion control |
| Initial seed over WAN | 40-80 MB/s | Disk read + WAN ceiling | Consider physical seed loading for multi-TB datasets |
| Encrypted backup, AES-NI CPU | Near-identical to unencrypted | AES-NI handles it | No action needed |
| Encrypted backup, no AES-NI | 30-50% slower | Software AES | Replace or dedicate a node with AES-NI |

Multi-Client Optimization

When multiple clients back up to the same PBS datastore simultaneously, they compete for the same disk I/O pool. PBS processes jobs in parallel, but the underlying storage has fixed IOPS.

Stagger Backup Windows

Schedule jobs with 30-60 minute offsets between clients. Ten VMs starting at 02:00 simultaneously produce a write storm for the first several minutes. The same ten VMs starting at 02:00, 02:30, 03:00... each get room to complete their initial burst before the next job stresses the datastore.
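Expressed as cron entries, the staggered schedule looks like this. The VMIDs and times are examples; in practice you would set this through the PVE backup job scheduler rather than raw cron:

```shell
# /etc/cron.d/staggered-backups (sketch)
0  2 * * *  root  vzdump 101 --storage pbs-storage --quiet 1
30 2 * * *  root  vzdump 102 --storage pbs-storage --quiet 1
0  3 * * *  root  vzdump 103 --storage pbs-storage --quiet 1
```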

PBS backup scheduling covers cron syntax and staggering strategies for multi-client environments.

Per-Client Bandwidth Limits

For sync jobs sending data to remote PBS targets, set per-job rate limits so no single large client monopolizes the uplink:

bash
proxmox-backup-manager sync-job update client-a-offsite --rate 50MiB
proxmox-backup-manager sync-job update client-b-offsite --rate 50MiB
proxmox-backup-manager sync-job update client-c-offsite --rate 50MiB
Per-sync-job rate limits for multi-client

Namespace Isolation

PBS namespaces partition a single datastore into logical segments per client or tenant. Namespaces share the underlying disk pool, so they do not provide I/O isolation. They do provide credential isolation (separate tokens per namespace) and allow per-namespace retention policies. See capacity planning for multi-client setups for datastore sizing across tenants.

Namespaces share disk bandwidth

If you need true I/O isolation between clients (e.g., one client's backup storm must not affect another's restore), separate datastores on separate disks are the only option. Namespaces are a logical boundary, not a performance boundary.

Benchmarking Your Setup

Tune against real numbers. Before changing anything, establish a baseline.

Raw Network Throughput

bash
# On the PBS target (server mode)
iperf3 -s

# On the source machine (4 parallel streams for 30 seconds)
iperf3 -c <pbs-target-ip> -t 30 -P 4
iperf3 throughput test

This gives you the ceiling for backup transfers to that target. If you see 900 Mbps raw but 200 Mbps in PBS, the bottleneck is not the network link.

Raw Disk Throughput

bash
fio --name=write-test \
    --ioengine=libaio \
    --rw=write \
    --bs=4M \
    --direct=1 \
    --size=10G \
    --numjobs=1 \
    --runtime=60 \
    --group_reporting \
    --filename=/mnt/pbs-datastore/fio-test

# Clean up after
rm /mnt/pbs-datastore/fio-test
fio sequential write test on datastore path

Compare this against your actual PBS backup throughput reported in the task log. If PBS backup speed is less than 50% of raw disk write speed, compression and chunking overhead is the likely cause.
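A quick way to frame that comparison, using made-up example numbers for both measurements:

```shell
raw_mbs=480   # example: fio sequential write result
pbs_mbs=210   # example: average MB/s from the PBS task log
echo "$((pbs_mbs * 100 / raw_mbs))%"   # 43%: under 50%, so look at chunking/compression overhead
```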

PBS Built-in Benchmark

bash
proxmox-backup-client benchmark
PBS client benchmark

This tests local chunk write throughput and reports MB/s for both encrypted and unencrypted modes. Use it to verify that AES-NI is active and to quantify encryption overhead on your specific CPU before deciding whether it matters for your workload.

Establishing a Baseline

Before tuning, run a full backup job and record:

  • Wall-clock time from start to finish
  • Average MB/s from the PBS task log
  • CPU and disk utilization during the job (sar -u 1 and sar -d 1)

Change one variable at a time. Run the same backup again. Compare. This is the only reliable way to know whether a change actually helped or just moved the bottleneck elsewhere.

Wrapping Up

The highest-return changes for the least effort are ZFS recordsize and TCP buffer tuning. Set recordsize=1M on your PBS datastore dataset before the next GC cycle, apply the sysctl buffer changes on both source and target, and you will see measurable improvement without touching anything else.

From there, work inward. Disable double-compression (ZFS compression off when PBS compression is on). Stagger backup schedules so jobs do not pile onto disk simultaneously. Replace per-backup chunk verification with scheduled verify jobs during off-peak windows.

The more specialized tuning — BBR congestion control, SLOG devices, explicit worker limits — pays off mainly at scale or under specific hardware constraints. For a handful of VMs on modern hardware, the first three changes close most of the gap between a default install and a tuned one.

Need fast offsite PBS without the tuning headaches?

remote-backups.com provides pre-optimized PBS targets with 10Gbps internal networking, NVMe storage, and EU datacenter locations. No sysctl required.

View Plans

Frequently Asked Questions

Does changing ZFS recordsize affect PBS deduplication?

No. PBS deduplication operates at the chunk level, not the ZFS record level. PBS splits backup data into variable-size chunks based on content hashing, and deduplication happens when PBS identifies matching chunk hashes regardless of how those chunks are stored on disk. ZFS recordsize affects how PBS chunk files are physically laid out on the ZFS dataset, not how PBS identifies duplicate data. Changing recordsize only affects newly written data — existing chunks retain their original record size until garbage collection reclaims and rewrites them.

How much overhead does PBS encryption add?

On CPUs with AES-NI hardware acceleration (Intel Westmere and later, AMD Zen and later), PBS encryption adds less than 5% to total backup time. PBS uses AES-256-GCM, which maps directly to hardware AES instructions. If your PBS node lacks AES-NI, software AES can reduce throughput by 30-50%. Run `proxmox-backup-client benchmark` with and without encryption to measure the actual delta on your hardware before drawing conclusions.

Should I raise the PBS compression level?

PBS uses ZSTD at level 1 by default. At that level, ZSTD compresses meaningfully better than LZ4 at comparable CPU cost, making it the better default. Higher ZSTD levels (3, 6, 12) produce smaller chunks only for highly compressible data while consuming dramatically more CPU per chunk — rarely worth it for general VM workloads. If your data is largely incompressible (pre-compressed databases, media files, encrypted volumes), disable PBS compression entirely rather than changing the algorithm. Compressing incompressible data wastes CPU and saves nothing.

How fast can PBS backups realistically run?

On local NVMe storage, incremental PBS backups of already-seen data (high dedup ratio) can run at 600-900 MB/s because PBS only needs to identify chunk hashes, not write new data. For first-time backups to NVMe over gigabit LAN, 400-600 MB/s is achievable with proper tuning. Initial seeds over WAN typically land between 50-100 MB/s, constrained by link speed and TCP throughput. The ceiling is always the slowest layer: disk, network, or CPU — and PBS itself adds minimal overhead once those are tuned.

Do more CPU cores make PBS backups faster?

Yes, when running parallel jobs. PBS processes multiple backup jobs concurrently and each job uses workers for chunking, compression, and hashing. Running 10 simultaneous backup jobs benefits from having 10+ cores available. For a single sequential backup job, additional cores provide diminishing returns past 4-8, since the chunking pipeline has sequential dependencies. If your bottleneck is single-job throughput rather than concurrency, faster cores matter more than additional ones.
Bennet Gallein

remote-backups.com operator

Infrastructure enthusiast and founder of remote-backups.com. I build and operate reliable backup infrastructure powered by Proxmox Backup Server, so you can focus on what matters most: your data staying safe.