
PBS Backup Scheduling for Multi-Client

You have 40 clients. They all want nightly backups. You have an 8-hour overnight window and a single Proxmox Backup Server cluster. Do the math. If every client kicks off at midnight, your disk I/O saturates, the network chokes, and half the jobs fail or run into the morning. Garbage collection never gets a chance to finish. Verification sits in a queue. Monday morning starts with a wall of alerts.

Scheduling PBS backups across dozens of clients is an engineering problem, not a checkbox exercise. This post covers how to build a schedule that actually works.

Key Takeaways
  • Stagger backup start times across your window instead of running everything at once
  • Schedule garbage collection per-datastore AFTER that datastore's backups complete
  • Small, fast clients go first; large clients get dedicated late-window slots
  • Always use UTC for cron schedules to avoid DST surprises
  • Track actual backup durations weekly and adjust the schedule based on real data
  • Build retry windows into the schedule so a failed job doesn't cascade into the next day

The Scheduling Problem

When you manage a single Proxmox Backup Server instance, one or two backup jobs at midnight works fine. Scale that to 20, 40, or 60 clients and everything changes. Concurrent backup jobs compete for the same resources: disk I/O, network bandwidth, CPU for chunk indexing, and memory for deduplication lookups.

PBS handles concurrent writes to the same datastore, but performance degrades as parallelism increases. If you run a datastore-per-client architecture, each job writes to its own datastore. That helps with lock contention but doesn't solve the I/O bottleneck on the same physical storage.

The goal is simple: every client backed up, verified, with garbage collection completed and room for retries, all before business hours.

Understanding PBS Resource Constraints

Before designing a schedule, you need to know what's fighting for resources.

Disk I/O

PBS is write-heavy during backup ingestion. Each backup job streams chunks to disk, deduplicates against the chunk index, and writes new chunks plus the snapshot manifest. During garbage collection, the pattern flips to read-heavy as PBS scans all chunk references. Running both simultaneously is the fastest way to tank throughput.

Network Bandwidth

Every client pushes data over the network. A 1 Gbps link tops out at roughly 450 GB/hour (125 MB/s) of raw throughput. If you have 10 clients each sending 50 GB concurrently, you need more than that full hour just for raw transfer. Factor in encryption overhead and protocol framing, and the real number is lower.
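A quick back-of-the-envelope check tells you whether a window is even feasible. This is a sketch; the link speed, efficiency factor, and per-client change sizes are illustrative assumptions:

```python
# Rough feasibility check for a backup window.
# All figures are illustrative: adjust link speed and client sizes to taste.

LINK_GBPS = 1.0       # uplink capacity
EFFICIENCY = 0.8      # assumed loss to protocol framing, TLS, retransmits

def window_hours_needed(client_gb, link_gbps=LINK_GBPS, efficiency=EFFICIENCY):
    """Hours of pure transfer time to move all clients' changed data."""
    total_gb = sum(client_gb)
    gb_per_hour = link_gbps / 8 * 3600 * efficiency   # Gbps -> GB/hour
    return total_gb / gb_per_hour

# 10 clients, 50 GB of changed data each
hours = window_hours_needed([50] * 10)
print(f"{hours:.1f} h of raw transfer")   # ~1.4 h on a 1 Gbps link
```

If the answer is already a large fraction of your window before deduplication and retries are considered, the window is too small or the link is.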

For offsite replication to remote targets, bandwidth constraints are even tighter. Sync jobs that replicate to a remote Proxmox Backup Server instance share the same uplink as incoming backups.

Garbage Collection

GC runs per-datastore, not globally. While GC is running on a datastore, new backup jobs targeting that datastore will queue or fail depending on timing. GC on a 2 TB datastore with high churn can take 30+ minutes. Multiply that by 40 datastores and you have a real scheduling constraint.

Never Run GC During Active Backups

Garbage collection and backup jobs targeting the same datastore do not mix. GC can block writes and cause backup failures. Always schedule GC after the last backup job for that datastore completes.

Verification and Memory

Verification jobs (proxmox-backup-manager verify) check chunk integrity on disk. They're read-heavy and can run alongside backups to other datastores, but they add I/O load. Memory usage scales with chunk count because PBS needs to hold chunk index structures in RAM during operations. A datastore with millions of chunks can consume several GB.
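To get a feel for the numbers, you can estimate chunk count from datastore size. PBS uses 4 MiB fixed-size chunks for VM images (file-level backups use variable chunking that averages near that); the per-entry memory figure below is an illustrative assumption, not a PBS constant:

```python
# Rough chunk-count and index-memory estimator.
# BYTES_PER_INDEX_ENTRY is an assumed in-memory cost, not a documented value.

CHUNK_MIB = 4                    # PBS fixed-size chunk for VM images
BYTES_PER_INDEX_ENTRY = 128      # assumption for illustration only

def estimate_chunks(datastore_tib):
    return int(datastore_tib * 1024 * 1024 / CHUNK_MIB)

def index_memory_mib(datastore_tib):
    return estimate_chunks(datastore_tib) * BYTES_PER_INDEX_ENTRY / 1024 / 1024

print(estimate_chunks(2))                         # 2 TiB -> 524288 chunks
print(f"{index_memory_mib(2):.0f} MiB of index")  # ~64 MiB under these assumptions
```

Deduplication means real chunk counts are often lower than this ceiling, but the scaling direction holds: bigger datastores, more chunks, more RAM.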

Scheduling Strategies

Staggered Start Times

The simplest and most effective strategy: don't start everything at once. Group clients by estimated backup size and spread start times across your backup window.

Small clients (under 20 GB changed data) finish in 10-30 minutes. Large clients (100+ GB) might need 2-3 hours. Schedule small, fast jobs early in the window. They finish quickly, freeing up I/O for the bigger jobs that follow.

Group by Duration, Not Data Size

A client with 500 GB total but only 2 GB daily change is a fast backup. A client with 50 GB total but 40 GB daily change is a slow one. Group by expected duration, not total dataset size.
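The point is easy to encode. A sketch that sorts clients by expected duration, where the per-job throughput figure is an assumption you'd replace with your own measurements:

```python
# Sort clients by expected duration, not total size.
# THROUGHPUT_GB_PER_MIN is an assumed per-job ingest rate; measure your own.

THROUGHPUT_GB_PER_MIN = 1.5

clients = {
    "client-a": {"total_gb": 500, "daily_change_gb": 2},
    "client-b": {"total_gb": 50,  "daily_change_gb": 40},
}

def expected_minutes(c):
    # Duration is driven by changed data, not by total dataset size.
    return c["daily_change_gb"] / THROUGHPUT_GB_PER_MIN

order = sorted(clients, key=lambda name: expected_minutes(clients[name]))
print(order)   # ['client-a', 'client-b'] -- the 500 GB client backs up faster
```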

Here's what a staggered schedule looks like for 20 clients across an 8-hour window (22:00 to 06:00 UTC):

20-Client Staggered Schedule

Client      Data Size   Start (UTC)   Est. Duration   GC Schedule
client-01   5 GB        22:00         10 min          Sat 07:00
client-02   8 GB        22:00         15 min          Sat 07:00
client-03   12 GB       22:15         15 min          Sat 07:15
client-04   15 GB       22:15         20 min          Sat 07:15
client-05   18 GB       22:30         20 min          Sat 07:30
client-06   22 GB       22:30         25 min          Sat 07:30
client-07   30 GB       23:00         30 min          Sat 08:00
client-08   35 GB       23:00         35 min          Sat 08:00
client-09   40 GB       23:30         40 min          Sat 08:30
client-10   45 GB       23:30         40 min          Sat 08:30
client-11   50 GB       00:00         45 min          Sun 07:00
client-12   60 GB       00:00         50 min          Sun 07:00
client-13   75 GB       00:45         60 min          Sun 07:30
client-14   80 GB       01:00         65 min          Sun 07:30
client-15   100 GB      01:30         90 min          Sun 08:00
client-16   120 GB      02:00         100 min         Sun 08:00
client-17   150 GB      02:30         120 min         Sun 08:30
client-18   200 GB      03:00         150 min         Sun 09:00
client-19   300 GB      03:00         180 min         Sun 09:00
client-20   500 GB      03:00         210 min         Sun 10:00

Notice the pattern: two small clients start together every 15 minutes early in the window. As job sizes grow, start times spread out more. GC runs on weekends when there's no I/O contention.
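A schedule like this can be generated rather than maintained by hand. A minimal sketch, assuming the interval widths (pairs of short jobs every 15 minutes, wider gaps for large ones) rather than deriving them:

```python
# Assign staggered start times from a duration-sorted client list.
# Interval widths and the 45-minute pairing threshold are assumptions.
from datetime import datetime, timedelta

def stagger(durations_min, window_start="22:00", pair_until_min=45):
    """durations_min: expected job durations in minutes."""
    t = datetime.strptime(window_start, "%H:%M")
    slots, in_pair = [], 0
    for d in sorted(durations_min):
        slots.append(t.strftime("%H:%M"))
        if d <= pair_until_min and in_pair == 0:
            in_pair = 1                      # second short job shares the slot
        else:
            in_pair = 0
            # gap scales with job size: 15 min for small jobs, 30 for large
            t += timedelta(minutes=15 if d <= pair_until_min else 30)
    return slots

print(stagger([10, 15, 15, 20, 60, 90]))
# ['22:00', '22:00', '22:15', '22:15', '22:30', '23:00']
```

Regenerating the schedule from fresh duration data makes the quarterly review a script run instead of a spreadsheet session.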

Bandwidth-Aware Scheduling

Calculate your available bandwidth and work backwards. If you have a 1 Gbps link and 4 concurrent backup streams, each client gets roughly 250 Mbps of effective throughput. That might be fine for local LAN backups, but for remote targets it changes everything.

PBS supports the --rate-limit flag to cap bandwidth per backup job (specified in bytes per second). Use it to prevent a single large client from starving others.

bash
proxmox-backup-client backup \
    vm-100-disk-0.img:/dev/vg/vm-100-disk-0 \
    --repository remote-pbs:client-05 \
    --rate-limit 52428800
Bandwidth-limited backup job

That caps the job at 50 MB/s (roughly 400 Mbps), leaving room for other concurrent streams. For offsite targets on remote-backups.com, edge locations reduce latency and improve throughput for geographically distributed clients.
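Working out the byte value for a target rate, and deriving a fair per-stream cap from your link, is simple arithmetic. A sketch; the link speed, stream count, and headroom factor are illustrative:

```python
# Convert a per-job cap in MB/s to the bytes-per-second value the
# rate-limit flag expects, and split a link fairly among N streams.

def mbs_to_bytes(mb_per_s):
    return mb_per_s * 1024 * 1024

def per_stream_cap(link_gbps, streams, headroom=0.8):
    """Bytes/s per stream, leaving headroom for other traffic."""
    link_bytes = link_gbps * 1e9 / 8
    return int(link_bytes * headroom / streams)

print(mbs_to_bytes(50))          # 52428800 -- the value used above
print(per_stream_cap(1, 4))      # 25000000 bytes/s (~25 MB/s) per stream
```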

Avoiding GC Conflicts

Garbage collection is the most common scheduling conflict in multi-client PBS deployments. The rules are straightforward:

  1. GC runs after all backups to that datastore are complete
  2. Stagger GC start times across datastores (don't fire 40 GC jobs at 06:00)
  3. Run GC on weekends for large datastores where the process takes 30+ minutes
  4. For small datastores (under 500 GB), weeknight GC at a staggered off-peak time works fine
bash
proxmox-backup-manager garbage-collection start client-05-datastore
Trigger GC for a specific datastore

A practical approach: batch your datastores into GC groups. Group A runs Saturday morning, Group B Sunday morning. Within each group, stagger starts by 15-30 minutes so I/O load is distributed.
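The group assignment is mechanical enough to script. A sketch that alternates datastores between Saturday and Sunday groups with a 30-minute stagger inside each group (both choices are assumptions, matching the batching described above):

```python
# Batch datastores into weekend GC groups with staggered starts.
# Alternating A/B assignment and the 30-minute step are assumptions.
from datetime import datetime, timedelta

def gc_schedule(datastores, start="07:00", step_min=30):
    base = datetime.strptime(start, "%H:%M")
    plan = {}
    for i, ds in enumerate(datastores):
        day = "Sat" if i % 2 == 0 else "Sun"   # alternate between groups
        offset = (i // 2) * step_min           # stagger within each group
        plan[ds] = f"{day} {(base + timedelta(minutes=offset)):%H:%M}"
    return plan

print(gc_schedule(["ds-01", "ds-02", "ds-03", "ds-04"]))
# {'ds-01': 'Sat 07:00', 'ds-02': 'Sun 07:00', 'ds-03': 'Sat 07:30', 'ds-04': 'Sun 07:30'}
```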

Timezone Handling

If you manage clients across timezones, you get distributed load for free. A client in UTC+1 wanting a "midnight backup" starts at 23:00 UTC. A client in UTC-5 wanting the same starts at 05:00 UTC. That spreads your window naturally.

Always Schedule in UTC

Store all cron schedules in UTC and document the local-time equivalent for each client. This prevents confusion during daylight saving transitions and makes automation consistent.
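The DST hazard is easy to demonstrate: the same "local midnight" maps to different UTC times in winter and summer. A sketch using Python's standard zoneinfo module (requires Python 3.9+ with a system timezone database; Europe/Berlin is just an example zone):

```python
# The same local midnight is a different UTC time across a DST transition.
from datetime import datetime
from zoneinfo import ZoneInfo

berlin = ZoneInfo("Europe/Berlin")

winter = datetime(2024, 1, 15, 0, 0, tzinfo=berlin)   # CET, UTC+1
summer = datetime(2024, 7, 15, 0, 0, tzinfo=berlin)   # CEST, UTC+2

print(winter.astimezone(ZoneInfo("UTC")).strftime("%H:%M"))  # 23:00
print(summer.astimezone(ZoneInfo("UTC")).strftime("%H:%M"))  # 22:00
```

A cron entry pinned in local time silently shifts by an hour twice a year; pinned in UTC, it never moves.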

PVE Integration

Most MSP environments run PVE with vzdump targeting a PBS storage backend. Backup scheduling happens in PVE's job configuration, not on the PBS side.

bash
vzdump: backup-client-05
    enabled 1
    storage pbs-client-05
    schedule *-*-* 23:30:00
    mailnotification failure
    mode snapshot
    compress zstd
    notes-template {{guestname}}
/etc/pve/jobs.cfg — vzdump schedule targeting PBS

The schedule field uses systemd calendar format. *-*-* 23:30:00 means "every day at 23:30." For specific days, use patterns like Mon..Fri *-*-* 23:30:00 to skip weekends.

When using PBS namespaces (available in PBS 2.x), specify the target namespace in the PVE storage configuration. Each client's PVE cluster should write to its own isolated namespace or datastore to maintain tenant separation.

The critical coordination point: PVE backup jobs and PBS garbage collection must not overlap on the same datastore. If PVE fires a vzdump at 23:30 and GC is still running from a previous cycle, the backup will fail.

Monitoring Your Schedule

A schedule is only as good as its tracking. You need visibility into actual backup durations, not just whether jobs succeeded.

Track these metrics over time:

  • Actual vs estimated duration for each client, reviewed weekly
  • Jobs that consistently overrun their allocated window
  • GC duration per datastore, which grows as the datastore grows
  • Failed jobs and retry counts per night

PBS exposes task logs and metrics through its API. Feed these into Prometheus and Grafana for dashboards that show schedule health at a glance. When a client's backup takes 90 minutes instead of the usual 45, you want to know before it cascades into the next slot.
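The overrun check itself is a few lines once you have duration history. A sketch; the task records below are illustrative stand-ins for what you'd pull from the PBS task log API, and the 1.5x drift factor is an assumed threshold:

```python
# Flag clients whose recent backup duration has drifted past the estimate.

def overruns(tasks, estimates, factor=1.5):
    """tasks: {client: [durations in minutes, newest last]}."""
    flagged = {}
    for client, history in tasks.items():
        recent = sum(history[-3:]) / len(history[-3:])   # mean of last 3 runs
        if recent > estimates[client] * factor:
            flagged[client] = round(recent)
    return flagged

# Illustrative history: client-05's runs jumped from ~20 to ~45 minutes.
tasks = {"client-05": [20, 22, 44, 46, 45], "client-09": [40, 41, 39]}
estimates = {"client-05": 20, "client-09": 40}
print(overruns(tasks, estimates))   # {'client-05': 45}
```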

Adjust Quarterly

Review and adjust your schedule at least every quarter. Client data grows, new clients onboard, and what worked in January may cause overlap by April.

When Schedules Fail

Backup jobs fail. Disks fill up, networks hiccup, VMs hang on snapshot creation. Your schedule needs a plan for this.

Retry Strategy

Build a 60-90 minute retry window at the end of your backup window. If a job fails at 23:30, an automatic retry at 05:00 still finishes before business hours. Don't retry immediately. The condition that caused the failure (I/O saturation, network issue) might still be present. Wait for load to drop.

bash
# Primary backup at 23:30 UTC
30 23 * * * /usr/local/bin/backup-client-05.sh

# Retry window at 05:00 UTC (only runs if primary failed)
0  5  * * * /usr/local/bin/backup-client-05.sh --retry-if-missed
Cron entry with retry window

Alerting

Set up alerts on missed backups with clear thresholds. A single missed nightly backup is a warning. Two consecutive misses is critical. By the third morning without a successful backup, someone should be investigating.

Don't alert on every transient failure. Alert on patterns: a client that fails every Tuesday, a datastore where GC consistently runs past the backup window, a job whose duration has doubled over two weeks.
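The consecutive-miss thresholds above reduce to a tiny classifier, which keeps the policy explicit and testable:

```python
# Map consecutive missed backups to alert severity:
# one miss warns, two or more are critical.

def severity(consecutive_misses):
    if consecutive_misses == 0:
        return "ok"
    if consecutive_misses == 1:
        return "warning"
    return "critical"

print([severity(n) for n in range(4)])
# ['ok', 'warning', 'critical', 'critical']
```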

Common Mistakes

Scheduling Mistakes vs Best Practices
Common Mistakes
  • Start all backups at midnight
  • Run GC on a fixed daily schedule regardless of backup timing
  • Estimate backup windows once and never revisit
  • No retry window in the schedule
  • Schedule in local time per client
Best Practices
  • Stagger start times based on expected duration
  • Schedule GC per-datastore after backups complete
  • Track actual durations and adjust quarterly
  • Reserve 60-90 minutes at end of window for retries
  • Use UTC everywhere, document local equivalents

Wrapping Up

Scheduling PBS backups for dozens of clients is not a one-time task. It requires understanding your resource constraints, staggering jobs based on real duration data, keeping garbage collection out of backup windows, and building in room for failures. Start with the staggered approach, monitor actual performance weekly, and adjust quarterly. The payoff is mornings that start with green dashboards instead of a flood of alerts.

Need managed PBS with built-in scheduling?

remote-backups.com handles scheduling, garbage collection, monitoring, and geo-replication for multi-client Proxmox Backup Server environments.

View Plans

Frequently Asked Questions

Can garbage collection on one datastore block backups to another?

Yes. GC runs per-datastore, not globally. GC on datastore A does not affect backup writes to datastore B. The conflict only occurs when GC and backups target the same datastore simultaneously.

How many concurrent backup jobs can one PBS instance handle?

There's no hard-coded limit. The practical ceiling depends on disk I/O, memory, and network bandwidth. Most deployments run 4-8 concurrent backup streams comfortably on a server with NVMe storage and 64 GB RAM. Beyond that, test and monitor.

Should scheduling live on the PVE side or the PBS side?

Use PVE scheduling (vzdump jobs) for VM and container backups, since PVE manages the snapshot lifecycle. Use PBS-side scheduling for sync jobs, GC, pruning, and verification. The two systems coordinate through the PBS storage backend.

What about clients that need more than one backup per day?

Add additional time slots for those clients, but treat each slot as a separate scheduling entry. A client backing up at 12:00 and 00:00 UTC occupies two slots in your schedule, and both need staggering relative to other jobs in that time range.
Bennet Gallein

remote-backups.com operator

Infrastructure enthusiast and founder of remote-backups.com. I build and operate reliable backup infrastructure powered by Proxmox Backup Server, so you can focus on what matters most: your data staying safe.