You have 40 clients. They all want nightly backups. You have an 8-hour overnight window and a single Proxmox Backup Server cluster. Do the math. If every client kicks off at midnight, disk I/O saturates, the network chokes, and half the jobs fail or run long into the morning. Garbage collection never gets a chance to finish. Verification sits in a queue. Monday morning starts with a wall of alerts.
Scheduling PBS backups across dozens of clients is an engineering problem, not a checkbox exercise. This post covers how to build a schedule that actually works.
Key Takeaways
- Stagger backup start times across your window instead of running everything at once
- Schedule garbage collection per-datastore AFTER that datastore's backups complete
- Small, fast clients go first; large clients get dedicated late-window slots
- Always use UTC for cron schedules to avoid DST surprises
- Track actual backup durations weekly and adjust the schedule based on real data
- Build retry windows into the schedule so a failed job doesn't cascade into the next day
The Scheduling Problem
When you manage a single Proxmox Backup Server instance, one or two backup jobs at midnight works fine. Scale that to 20, 40, or 60 clients and everything changes. Concurrent backup jobs compete for the same resources: disk I/O, network bandwidth, CPU for chunk indexing, and memory for deduplication lookups.
PBS handles concurrent writes to the same datastore, but performance degrades as parallelism increases. If you run a datastore-per-client architecture, each job writes to its own datastore. That helps with lock contention but doesn't solve the I/O bottleneck on the same physical storage.
The goal is simple: every client backed up, verified, with garbage collection completed and room for retries, all before business hours.
Understanding PBS Resource Constraints
Before designing a schedule, you need to know what's fighting for resources.
Disk I/O
PBS is write-heavy during backup ingestion. Each backup job streams chunks to disk, deduplicates against the chunk index, and writes new chunks plus the snapshot manifest. During garbage collection, the pattern flips to read-heavy as PBS scans all chunk references. Running both simultaneously is the fastest way to tank throughput.
Network Bandwidth
Every client pushes data over the network. A 1 Gbps link moves at most about 450 GB/hour (125 MB/s) at line rate. If you have 10 clients each sending 50 GB concurrently, raw transfer alone needs more than a full hour. Factor in encryption overhead and protocol framing, and the real number is lower.
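The window math above is worth doing explicitly. This sketch runs the same numbers with an assumed efficiency factor; the link speed, client count, and per-client change rate are illustrative, not measurements:

```shell
# Back-of-the-envelope transfer-window math.
# All inputs are illustrative assumptions.
link_gb_per_hour=450          # 1 Gbps at line rate
efficiency_pct=90             # assumed after protocol/encryption overhead
clients=10
gb_per_client=50              # changed data per client

effective=$(( link_gb_per_hour * efficiency_pct / 100 ))
total_gb=$(( clients * gb_per_client ))
transfer_min=$(( total_gb * 60 / effective ))
echo "${total_gb} GB at ${effective} GB/h: ~${transfer_min} min of raw transfer"
```

Run the same arithmetic against your own link speed and nightly change rates before committing to a schedule.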
For offsite replication to remote targets, bandwidth constraints are even tighter. Sync jobs that replicate to a remote Proxmox Backup Server instance share the same uplink as incoming backups.
Garbage Collection
GC runs per-datastore, not globally. While GC is running on a datastore, new backup jobs targeting that datastore will queue or fail depending on timing. GC on a 2 TB datastore with high churn can take 30+ minutes. Multiply that by 40 datastores and you have a real scheduling constraint.
Never Run GC During Active Backups
Garbage collection and backup jobs targeting the same datastore do not mix. GC can block writes and cause backup failures. Always schedule GC after the last backup job for that datastore completes.
Verification and Memory
Verification jobs (proxmox-backup-manager verify, or scheduled verify jobs on the datastore) check chunk integrity on disk. They're read-heavy and can run alongside backups to other datastores, but they add I/O load. Memory usage scales with datastore size because PBS needs to hold chunk index structures in RAM during operations. A datastore with millions of chunks can consume several GB.
Scheduling Strategies
Staggered Start Times
The simplest and most effective strategy: don't start everything at once. Group clients by estimated backup size and spread start times across your backup window.
Small clients (under 20 GB changed data) finish in 10-30 minutes. Large clients (100+ GB) might need 2-3 hours. Schedule small, fast jobs early in the window. They finish quickly, freeing up I/O for the bigger jobs that follow.
Group by Duration, Not Data Size
A client with 500 GB total but only 2 GB daily change is a fast backup. A client with 50 GB total but 40 GB daily change is a slow one. Group by expected duration, not total dataset size.
Here's what a staggered schedule looks like for 20 clients across an 8-hour window (22:00 to 06:00 UTC):
20-Client Staggered Schedule
| Client | Data Size | Start (UTC) | Est. Duration | GC Schedule |
|---|---|---|---|---|
| client-01 | 5 GB | 22:00 | 10 min | Sat 07:00 |
| client-02 | 8 GB | 22:00 | 15 min | Sat 07:00 |
| client-03 | 12 GB | 22:15 | 15 min | Sat 07:15 |
| client-04 | 15 GB | 22:15 | 20 min | Sat 07:15 |
| client-05 | 18 GB | 22:30 | 20 min | Sat 07:30 |
| client-06 | 22 GB | 22:30 | 25 min | Sat 07:30 |
| client-07 | 30 GB | 23:00 | 30 min | Sat 08:00 |
| client-08 | 35 GB | 23:00 | 35 min | Sat 08:00 |
| client-09 | 40 GB | 23:30 | 40 min | Sat 08:30 |
| client-10 | 45 GB | 23:30 | 40 min | Sat 08:30 |
| client-11 | 50 GB | 00:00 | 45 min | Sun 07:00 |
| client-12 | 60 GB | 00:00 | 50 min | Sun 07:00 |
| client-13 | 75 GB | 00:45 | 60 min | Sun 07:30 |
| client-14 | 80 GB | 01:00 | 65 min | Sun 07:30 |
| client-15 | 100 GB | 01:30 | 90 min | Sun 08:00 |
| client-16 | 120 GB | 02:00 | 100 min | Sun 08:00 |
| client-17 | 150 GB | 02:30 | 120 min | Sun 08:30 |
| client-18 | 200 GB | 03:00 | 150 min | Sun 09:00 |
| client-19 | 300 GB | 03:00 | 180 min | Sun 09:00 |
| client-20 | 500 GB | 03:00 | 210 min | Sun 10:00 |
Notice the pattern: two small clients start together every 15 minutes early in the window. As job sizes grow, start times spread out more. GC runs on weekends when there's no I/O contention.
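The early part of that stagger is mechanical enough to generate rather than hand-maintain. A minimal sketch (script paths, client names, and the two-per-slot rule are illustrative assumptions) that emits cron lines for the first six clients:

```shell
# Generate staggered cron entries: two clients per 15-minute slot,
# starting at 22:00 UTC. Names and script paths are illustrative.
i=0
cron_lines=""
for client in client-01 client-02 client-03 client-04 client-05 client-06; do
    offset=$(( (i / 2) * 15 ))      # two clients share each 15-min slot
    hour=$(( 22 + offset / 60 ))
    min=$(( offset % 60 ))
    line=$(printf '%02d %02d * * * /usr/local/bin/backup-%s.sh' "$min" "$hour" "$client")
    cron_lines="${cron_lines}${line}
"
    i=$(( i + 1 ))
done
printf '%s' "$cron_lines"
```

For the larger clients later in the window, fixed hand-tuned slots based on measured durations beat any formula.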
Bandwidth-Aware Scheduling
Calculate your available bandwidth and work backwards. If you have a 1 Gbps link and 4 concurrent backup streams, each client gets roughly 250 Mbps of effective throughput. That might be fine for local LAN backups, but for remote targets it changes everything.
PBS supports the --rate-limit flag to cap bandwidth per backup job (specified in bytes per second). Use it to prevent a single large client from starving others.
```
proxmox-backup-client backup \
    vm-100-disk-0.img:/dev/vg/vm-100-disk-0 \
    --repository remote-pbs:client-05 \
    --rate-limit 52428800
```

That caps the job at 50 MiB/s (roughly 420 Mbps), leaving room for other concurrent streams. For offsite targets on remote-backups.com, edge locations reduce latency and improve throughput for geographically distributed clients.
Avoiding GC Conflicts
Garbage collection is the most common scheduling conflict in multi-client PBS deployments. The rules are straightforward:
- GC runs after all backups to that datastore are complete
- Stagger GC start times across datastores (don't fire 40 GC jobs at 06:00)
- Run GC on weekends for large datastores where the process takes 30+ minutes
- For small datastores (under 500 GB), weeknight GC at a staggered off-peak time works fine
```
proxmox-backup-manager garbage-collection start client-05-datastore
```

A practical approach: batch your datastores into GC groups. Group A runs Saturday morning, Group B Sunday morning. Within each group, stagger starts by 15-30 minutes so I/O load is distributed.
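Rather than firing GC manually, you can write each datastore's gc-schedule property with a staggered offset. A sketch for the Saturday group — the datastore names follow this post's per-client convention and are illustrative, and the commands are echoed as a dry run so you can review before applying:

```shell
# Stagger Saturday-group GC by 15 minutes per datastore.
# Commands are echoed (dry run); drop 'echo' semantics to apply.
minute=0
gc_cmds=""
for client in client-01 client-02 client-03 client-04; do
    cmd=$(printf 'proxmox-backup-manager datastore update %s-datastore --gc-schedule "sat 07:%02d"' \
        "$client" "$minute")
    gc_cmds="${gc_cmds}${cmd}
"
    minute=$(( minute + 15 ))
done
printf '%s' "$gc_cmds"
```

Pipe the output through `sh` once you're satisfied with the generated schedule.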
Timezone Handling
If you manage clients across timezones, you get distributed load for free. A client in UTC+1 wanting a "midnight backup" starts at 23:00 UTC. A client in UTC-5 wanting the same starts at 05:00 UTC. That spreads your window naturally.
Always Schedule in UTC
Store all cron schedules in UTC and document the local-time equivalent for each client. This prevents confusion during daylight saving transitions and makes automation consistent.
PVE Integration
Most MSP environments run PVE with vzdump targeting a PBS storage backend. Backup scheduling happens in PVE's job configuration, not on the PBS side.
```
vzdump: backup-client-05
    enabled 1
    storage pbs-client-05
    schedule *-*-* 23:30:00
    mailnotification failure
    mode snapshot
    compress zstd
    notes-template {{guestname}}
```

The schedule field uses systemd calendar format. `*-*-* 23:30:00` means "every day at 23:30." For specific days, use patterns like `Mon..Fri *-*-* 23:30:00` to skip weekends.
When using PBS namespaces (available in PBS 2.x), specify the target namespace in the PVE storage configuration. Each client's PVE cluster should write to its own isolated namespace or datastore to maintain tenant separation.
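For reference, a namespace-aware PBS storage entry in PVE's storage configuration looks roughly like this. The hostname, datastore name, and token are placeholders, and the exact fields may vary with your PVE version:

```
pbs: pbs-client-05
    server pbs.example.com
    datastore tenants
    namespace client-05
    username backup@pbs!pve-token
    fingerprint <server certificate fingerprint>
    content backup
```

The token secret is stored separately in PVE's private storage credentials, not in this file.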
The critical coordination point: PVE backup jobs and PBS garbage collection must not overlap on the same datastore. If PVE fires a vzdump at 23:30 and GC is still running from a previous cycle, the backup will fail.
Monitoring Your Schedule
A schedule is only as good as its tracking. You need visibility into actual backup durations, not just whether jobs succeeded.
Track these metrics over time:
- Actual vs estimated duration for each client, reviewed weekly
- Jobs that consistently overrun their allocated window
- GC duration per datastore, which grows as the datastore grows
- Failed jobs and retry counts per night
PBS exposes task logs and metrics through its API. Feed these into Prometheus and Grafana for dashboards that show schedule health at a glance. When a client's backup takes 90 minutes instead of the usual 45, you want to know before it cascades into the next slot.
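A duration-drift check doesn't need a full monitoring stack to start with. This sketch flags clients whose latest run exceeded baseline by more than 50%; the input rows are illustrative, and in practice you'd pull durations from the PBS task log API instead:

```shell
# Flag clients whose latest backup ran >50% over their baseline.
# Input rows ("client baseline_min latest_min") are illustrative.
alerts=""
while read -r client baseline latest; do
    if [ "$latest" -gt $(( baseline * 150 / 100 )) ]; then
        alerts="${alerts}OVERRUN ${client}: ${latest}m vs ${baseline}m baseline
"
    fi
done <<'EOF'
client-05 45 90
client-09 40 42
client-17 120 130
EOF
printf '%s' "$alerts"
```

The 50% threshold is an assumption; tune it so that normal nightly variance doesn't page anyone.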
Adjust Quarterly
Review and adjust your schedule at least every quarter. Client data grows, new clients onboard, and what worked in January may cause overlap by April.
When Schedules Fail
Backup jobs fail. Disks fill up, networks hiccup, VM snapshots hang. Your schedule needs a plan for this.
Retry Strategy
Build a 60-90 minute retry window at the end of your backup window. If a job fails at 23:30, an automatic retry at 05:00 still finishes before business hours. Don't retry immediately. The condition that caused the failure (I/O saturation, network issue) might still be present. Wait for load to drop.
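The retry slot only works if the wrapper script knows whether the primary run succeeded. One way to implement that guard is a per-day marker file — a minimal sketch, where the script name, paths, and the `--retry-if-missed` flag are illustrative:

```shell
# Marker-file guard: the retry invocation is a no-op if the primary
# run already succeeded today. Names and paths are illustrative.
run_backup() {
    # Stand-in for the real call, e.g.:
    # proxmox-backup-client backup root.pxar:/ --repository pbs:client-05
    return 0
}

backup_client() {
    marker="${MARKER_DIR:-/var/run}/backup-client-05.$(date -u +%F).ok"
    if [ "${1:-}" = "--retry-if-missed" ] && [ -e "$marker" ]; then
        echo "primary run already succeeded; skipping retry"
        return 0
    fi
    run_backup && touch "$marker"
}
```

The marker embeds the UTC date, so yesterday's success never suppresses tonight's retry.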
```
# Primary backup at 23:30 UTC
30 23 * * * /usr/local/bin/backup-client-05.sh
# Retry window at 05:00 UTC (only runs if primary failed)
0 5 * * * /usr/local/bin/backup-client-05.sh --retry-if-missed
```

Alerting
Set up alerts on missed backups with clear thresholds. A single missed nightly backup is a warning. Two consecutive misses is critical. By the third morning without a successful backup, someone should be investigating.
Don't alert on every transient failure. Alert on patterns: a client that fails every Tuesday, a datastore where GC consistently runs past the backup window, a job whose duration has doubled over two weeks.
Common Mistakes
Scheduling Mistakes vs Best Practices
Common Mistakes
- Start all backups at midnight
- Run GC on a fixed daily schedule regardless of backup timing
- Estimate backup windows once and never revisit
- No retry window in the schedule
- Schedule in local time per client
Best Practices
- Stagger start times based on expected duration
- Schedule GC per-datastore after backups complete
- Track actual durations and adjust quarterly
- Reserve 60-90 minutes at end of window for retries
- Use UTC everywhere, document local equivalents
Wrapping Up
Scheduling PBS backups for dozens of clients is not a one-time task. It requires understanding your resource constraints, staggering jobs based on real duration data, keeping garbage collection out of backup windows, and building in room for failures. Start with the staggered approach, monitor actual performance weekly, and adjust quarterly. The payoff is mornings that start with green dashboards instead of a flood of alerts.
Need managed PBS with built-in scheduling?
remote-backups.com handles scheduling, garbage collection, monitoring, and geo-replication for multi-client Proxmox Backup Server environments.
View Plans


