You back up a 100GB VM tonight. Tomorrow you back it up again and the snapshot finishes in 90 seconds, transferring around 800MB to the server. That is not magic, and it is not a clever incremental flag inside the guest. It is chunking, specifically content-defined chunking with a rolling hash, and once you see how it works the rest of Proxmox Backup Server starts to make sense.
Key Takeaways
- PBS uses content-defined chunking (CDC), not fixed-block chunking.
- Split points are decided by a Buzhash rolling hash over a sliding window, not by byte offset.
- Target chunk size is ~4MB on average, with min and max bounds to handle outliers.
- Each chunk is identified by the SHA-256 of its contents, so identical bytes always produce the same chunk.
- Inserting a byte at the start of a file shifts boundaries only locally, not for the whole disk.
- Encrypting before chunking destroys dedup. PBS chunks first and encrypts each chunk afterwards, which is why dedup survives encryption.
The Backup Chunking Problem
A running VM has a 100GB virtual disk. Last night's backup uploaded the full image. Tonight, somewhere between 50MB and 5GB of bytes changed, depending on workload. You need to figure out which pieces actually moved and ship only those.
The naive answer is "diff the files." That works on text. It does not work on a block device with about 100 billion potential byte positions and no useful diff semantics.
The real answer needs three things. First, a way to break the disk into pieces. Second, a way to recognise pieces that already exist on the server so you skip them. Third, the recognition step has to survive small edits without invalidating the entire disk.
That third requirement is the hard one. The first part of this series covered how PBS talks to the client over HTTP/2; chunking is the next layer up, and it is where the dedup story begins.
Approach 1: Fixed-Block Chunking
The obvious approach is to slice the disk into equal-sized blocks. Pick a size, say 4MB. Block 0 is bytes 0 through 4MB. Block 1 is bytes 4MB through 8MB. Hash each block. Compare hashes to last night. Upload the ones that changed.
This works when writes are aligned to the block size. Databases that overwrite 4KB pages in place produce predictable change sets. ZFS scrubs do not move blocks around. For these workloads, fixed-block chunking is fine.
It falls apart the moment something inserts or removes bytes inside a file. Imagine a 1GB log file. Someone prepends a 17-byte header. Every subsequent byte in the file has shifted forward by 17. Block 0 now starts with the header and has lost its last 17 bytes. Block 1 starts with those 17 bytes, followed by yesterday's block 1 shifted along, so its hash no longer matches anything from yesterday. The same is true of block 2 and every block after it.
No block of that file deduplicates any more, even though 99.99% of the data is identical.
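The collapse is easy to reproduce. The following is a minimal sketch, not taken from any backup tool: it uses random bytes as a stand-in for the log file and a 4 KiB block size (scaled down from 4MB so it runs instantly), and the helper name `fixed_block_hashes` is made up for the illustration.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # scaled down from 4 MB so the demo runs instantly

def fixed_block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of data with SHA-256."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).digest()
        for offset in range(0, len(data), block_size)
    ]

yesterday = os.urandom(256 * BLOCK_SIZE)   # stand-in for the log file
today = b"X" * 17 + yesterday              # the 17-byte header

known = set(fixed_block_hashes(yesterday))
reused = sum(1 for digest in fixed_block_hashes(today) if digest in known)
print(f"blocks reused after the prepend: {reused} of {len(known)}")
```

Every block boundary now falls 17 bytes earlier in the original content than it did yesterday, so none of today's block hashes match.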
This is why some S3 backup tools have poor dedup ratios
A lot of VM-backup-to-object-storage products still use fixed-block chunking under the hood. That choice is fine for snapshot-style consistent block writes, but it explains why their reported dedup ratios are often a fraction of what Proxmox Backup Server achieves on the same workload.
Approach 2: Content-Defined Chunking
Content-defined chunking (CDC) flips the question. Instead of asking "where does block N start," it asks "given the bytes I am looking at right now, should I cut a chunk boundary here?"
The mechanism looks like this. A small window slides through the data one byte at a time. At each position, a rolling hash is computed over the window's current contents. When the hash hits a chosen marker pattern, that position becomes a chunk boundary. Repeat until the file ends.
The marker is something cheap to check, usually "the low N bits of the hash are all zero." Picking N controls the average distance between cuts, which controls the average chunk size. The hash is content-driven, so cuts happen at content-driven positions.
Now reconsider the 17-byte prepend. The window is only a few dozen bytes wide, so it slides fully into yesterday's content almost immediately after the new bytes. As soon as the window contains the same bytes it contained yesterday at the corresponding position, the hash produces the same value as yesterday, and the cut decisions match. The boundaries resync within a small region around the edit. Everything after that point produces identical chunks.
That is the property that makes incremental backups of a 100GB disk feasible.
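The resync can be demonstrated in a few lines. This is a rough sketch under loud assumptions, not the PBS chunker: it recomputes CRC32 over the window at every position as a stand-in for a true rolling hash, it has no min or max chunk bounds, and it scales the average chunk down to about 1 KiB. The names `cdc_chunks`, `WINDOW`, and `MASK` are made up for the illustration.

```python
import hashlib
import os
import zlib

WINDOW = 48            # bytes of context that decide a cut
MASK = (1 << 10) - 1   # low 10 bits zero: a cut roughly every 1 KiB

def cdc_chunks(data):
    """Cut after every position whose trailing WINDOW bytes hash to the marker."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        if zlib.crc32(data[end - WINDOW:end]) & MASK == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])   # trailing partial chunk
    return chunks

yesterday = os.urandom(64 * 1024)
today = b"X" * 17 + yesterday         # same 17-byte prepend as before

old_chunks = cdc_chunks(yesterday)
new_chunks = cdc_chunks(today)
known = {hashlib.sha256(chunk).digest() for chunk in old_chunks}
reused = sum(1 for chunk in new_chunks if hashlib.sha256(chunk).digest() in known)
print(f"chunks reused after the prepend: {reused} of {len(old_chunks)}")
```

Because each cut decision depends only on the bytes inside the window, every boundary after the first one lands at the same place in the content, and only the chunk that contains the new header has to be uploaded.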
The Buzhash Rolling Hash
Proxmox Backup Server uses Buzhash for the rolling hash step. There are alternatives (Rabin fingerprinting, FastCDC, Gear hashing, others), and each has trade-offs. Buzhash is fast to compute, easy to make truly rolling (constant-time add-byte and remove-byte operations), and has a predictable enough distribution to land on the target chunk size.
The mechanics are simple enough to fit in a paragraph. Each possible byte value gets assigned a random 32-bit constant ahead of time, stored in a lookup table. The current hash is the XOR of rotated copies of the constants for each byte in the window, where each constant is rotated by the byte's age. To slide forward one position, you rotate the hash left by one bit, XOR in the new byte's constant, and XOR out the oldest byte's constant after rotating it to account for its age. No expensive modular arithmetic, no big-integer math, no cryptographic operations.
WINDOW_SIZE = 64
TARGET_MASK = (1 << 22) - 1          # low 22 bits zero: ~4 MiB average chunk size
MIN_CHUNK = 512 * 1024               # 512 KiB
MAX_CHUNK = 16 * 1024 * 1024         # 16 MiB

def rotate_left(value, count):
    """Rotate a 32-bit value left by count bits."""
    count %= 32
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF

def chunk_stream(data, table):
    """Yield content-defined chunks of data (a bytes object).

    table holds one random 32-bit constant per possible byte value.
    """
    h = 0
    chunk_start = 0
    for i, byte in enumerate(data):
        # Roll the hash: rotate by one bit, then XOR in the new byte's constant.
        h = rotate_left(h, 1) ^ table[byte]
        if i >= WINDOW_SIZE:
            # XOR out the byte that just left the window, rotated by its age.
            h ^= rotate_left(table[data[i - WINDOW_SIZE]], WINDOW_SIZE)
        size = i - chunk_start + 1
        if size < MIN_CHUNK:
            continue
        if size >= MAX_CHUNK or (h & TARGET_MASK) == 0:
            yield data[chunk_start:i + 1]
            chunk_start = i + 1
    if chunk_start < len(data):
        yield data[chunk_start:]     # trailing partial chunk

That snippet is not the real PBS implementation. It is the idea, compressed to fit on a screen. The production version is written in Rust, handles edge cases, and is heavily optimised for cache-friendly access patterns. But the loop body is conceptually the same.
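The constant-time update can be checked against a from-scratch computation over the same window. The sketch below is an illustration only: the table is seeded with an arbitrary value, and the names `rotl32`, `buzhash_from_scratch`, and `buzhash_roll` are invented for the example rather than taken from the PBS source.

```python
import random

WINDOW_SIZE = 64
rng = random.Random(1234)                       # arbitrary seed, for repeatability
TABLE = [rng.getrandbits(32) for _ in range(256)]

def rotl32(value, count):
    """Rotate a 32-bit value left by count bits."""
    count %= 32
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF

def buzhash_from_scratch(window):
    """XOR of each byte's constant, rotated by the byte's age in the window."""
    h = 0
    for byte in window:
        h = rotl32(h, 1) ^ TABLE[byte]
    return h

def buzhash_roll(h, old_byte, new_byte):
    """Slide the window forward by one byte in constant time."""
    return rotl32(h, 1) ^ rotl32(TABLE[old_byte], WINDOW_SIZE) ^ TABLE[new_byte]

data = bytes(rng.getrandbits(8) for _ in range(4096))
h = buzhash_from_scratch(data[:WINDOW_SIZE])
for i in range(WINDOW_SIZE, len(data)):
    h = buzhash_roll(h, data[i - WINDOW_SIZE], data[i])
    # The rolled value must equal a full recomputation over the current window.
    assert h == buzhash_from_scratch(data[i - WINDOW_SIZE + 1:i + 1])
print("rolling update matches full recomputation at every position")
```

The from-scratch version costs one pass over the window per position; the rolling version costs two table lookups, two rotations, and two XORs regardless of window size.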
The rolling hash does not need to be cryptographically strong
Buzhash collisions are fine here. The rolling hash only decides where to cut. Once a chunk has been cut, Proxmox Backup Server hashes it again with SHA-256 to produce the chunk's actual identity. That second hash is the one that needs to be collision-resistant, and SHA-256 is.
Why ~4MB Average?
The chunk size target is a tunable parameter in any CDC system, and the choice has real consequences. PBS lands on roughly 4MB average for good reasons.
Chunk size trade-offs
| Property | 256 KB | 4 MB (PBS default) | 16 MB |
|---|---|---|---|
| Dedup quality on small edits | Excellent | Very good | Mediocre |
| Metadata overhead | Heavy | Moderate | Light |
| Network round-trips per backup | Many | Manageable | Few |
| Restore granularity | Fine-grained | Good | Coarse |
| Best fit for | Mostly-text repos, source trees | VMs, containers, mixed workloads | Cold archives, large media files |
Smaller chunks deduplicate better in theory because a small edit invalidates a smaller region. They also produce far more metadata, multiply chunk-existence checks on the wire, and turn the chunk store into a directory tree with millions of tiny files. Larger chunks waste bandwidth when edits land in the middle of a 16MB region and force the whole thing to be re-uploaded.
Around 4MB is where the curve flattens for typical VM workloads. The min and max bounds (512 KiB and 16 MiB) handle the pathological cases: a hash that never hits the cut condition would otherwise produce a single giant chunk; one that hits constantly would produce thousands of tiny ones.
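The effect of the bounds on the average can be estimated with a little arithmetic. This is a back-of-the-envelope sketch that assumes the marker rule and constants from the snippet earlier in this post (low 22 bits zero, 512 KiB minimum, 16 MiB maximum) and a hash whose low bits are uniformly distributed; it is not a measurement of a real datastore.

```python
MIB = 1024 * 1024
MIN_CHUNK = 512 * 1024
MAX_CHUNK = 16 * MIB
CUT_PROBABILITY = 2 ** -22          # chance that the low 22 hash bits are all zero

eligible = MAX_CHUNK - MIN_CHUNK    # positions where the marker is allowed to fire
# Probability that no marker appears before the max bound forces a cut.
p_forced = (1 - CUT_PROBABILITY) ** eligible
# Mean of a geometric waiting time truncated at `eligible`, plus the minimum size.
expected = MIN_CHUNK + (1 - p_forced) / CUT_PROBABILITY

print(f"expected chunk size: {expected / MIB:.2f} MiB")
print(f"chunks cut by the max bound: {p_forced:.1%}")
```

Under these assumptions the average lands at roughly 4.4 MiB, and only about 2% of chunks are cut by the maximum bound rather than by the marker.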
SHA-256 as Chunk Identity
After a chunk is cut, the bytes go through SHA-256. The resulting 32-byte hash is the chunk's identity, full stop. On disk in the chunk store, the chunk lives at a path derived from its hash, sharded by the first byte or two to keep directory sizes sane.
Two chunks with identical contents always produce identical hashes. That is the property that makes cross-VM dedup work. If your web cluster runs ten VMs from the same Debian template, the chunks that contain /usr/bin/python3 are physically stored exactly once. The next post in this series goes into how the deduplication and chunk store layer handles reference counting and physical storage on disk.
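As an illustration of content-addressed storage, the sketch below maps a chunk to a path built from its SHA-256 digest. The root directory, the four-hex-digit shard prefix, and the function name `chunk_path` are assumptions made for the example, not a statement of the exact on-disk layout.

```python
import hashlib
from pathlib import PurePosixPath

STORE_ROOT = PurePosixPath("/datastore/.chunks")   # hypothetical location

def chunk_path(chunk, prefix_hex_digits=4):
    """Map chunk contents to a path: <root>/<digest prefix>/<full digest>."""
    digest = hashlib.sha256(chunk).hexdigest()
    return STORE_ROOT / digest[:prefix_hex_digits] / digest

template_a = chunk_path(b"bytes shared by every VM cloned from the template")
template_b = chunk_path(b"bytes shared by every VM cloned from the template")
other = chunk_path(b"bytes unique to one VM")

print(template_a == template_b)   # identical contents map to one file
print(template_a == other)        # different contents map to different files
```

Because the path is a pure function of the contents, a second VM that produces the same chunk finds the file already present and has nothing to store.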
When client-side encryption is enabled, the chunk identity is a keyed digest instead of a plain SHA-256: an HMAC-SHA-256 over the plaintext, keyed by the per-datastore encryption key. Dedup therefore still works across snapshots that share a keyring. The detail belongs in the encryption deep dive, not here. For now, the takeaway is that the chunk ID is always content-derived.
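The keyed-identity idea can be sketched with Python's standard `hmac` module. This follows the description in the paragraph above and is not lifted from the PBS source; the function name `chunk_id` and the randomly generated demo keys are hypothetical.

```python
import hashlib
import hmac
import os

def chunk_id(plaintext, key=None):
    """Plain SHA-256 without a key, HMAC-SHA-256 over the plaintext with one."""
    if key is None:
        return hashlib.sha256(plaintext).hexdigest()
    return hmac.new(key, plaintext, hashlib.sha256).hexdigest()

key_a = os.urandom(32)   # hypothetical keys, generated just for the demo
key_b = os.urandom(32)
chunk = b"identical plaintext chunk"

same_key_matches = chunk_id(chunk, key_a) == chunk_id(chunk, key_a)
cross_key_matches = chunk_id(chunk, key_a) == chunk_id(chunk, key_b)
print(f"same key: {same_key_matches}, different keys: {cross_key_matches}")
```

The same plaintext under the same key always yields the same ID, so dedup works within a keyring, while a different key yields an unrelated ID and reveals nothing about shared content.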
A Real Example: 100GB Linux VM Across Two Backups
Concrete numbers make this less abstract. Picture a 100GB Linux VM running a standard application stack.
Day 1 (first backup):
  Disk size:                  100 GB
  Average chunk size:         ~4 MB
  Approx. chunks produced:    25,000
  Chunks already on server:   0 (new datastore)
  Chunks uploaded:            25,000
  Bytes on wire:              ~100 GB (compressed slightly)

Day 2 (incremental, after 5 GB of writes):
  Disk size:                  100 GB (unchanged)
  Writes since Day 1:         5 GB scattered
                              (logs, apt updates, app working set)
  Chunks fully invalidated:   ~1,250
  Boundary-shifted chunks:    ~10-30 (edges of changed regions)
  Chunks uploaded:            ~1,275
  Chunks reused server-side:  ~23,725
  Bytes on wire:              ~5-6 GB

Day 1 is the boring one. Everything is new, everything gets uploaded, the wire moves close to the raw disk size minus whatever compression saves on top.
Day 2 is where CDC earns its keep. The 5GB of changes are scattered across log files, package updates, and application working data, and each changed region invalidates the chunks that overlap it. The rolling hash resyncs shortly after each edit, so each contiguous edit region adds only one or two boundary-shifted chunks at its edges; across the whole disk that comes to roughly 10 to 30 extra chunks. The rest of the disk is identical to yesterday, the server already has those chunks, and the client never sends them.
The wire ends up moving roughly the volume of changed data plus a small overhead for the chunk-existence checks. That is the 800MB-on-an-otherwise-quiet-day pattern, scaled up to a workload with real activity.
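The Day 2 figures follow from simple arithmetic. The sketch below reproduces them under the same simplifications as the example: decimal units, a flat 4 MB per chunk, changed data that fills whole chunks, and the midpoint of the 10 to 30 boundary-chunk estimate.

```python
CHUNK_MB = 4

total_chunks = 100_000 // CHUNK_MB          # 100 GB disk  -> 25,000 chunks
invalidated = 5_000 // CHUNK_MB             # 5 GB changed -> 1,250 chunks
boundary_shifted = 25                       # midpoint of the 10-30 estimate

uploaded = invalidated + boundary_shifted   # chunks the server has not seen
reused = total_chunks - uploaded            # chunks skipped on the wire
wire_gb = uploaded * CHUNK_MB / 1000        # before compression

print(f"uploaded {uploaded} chunks, reused {reused}, ~{wire_gb:.1f} GB on the wire")
```

This is close to a best case: writes that are spread thinly over many chunks invalidate more chunks than their byte count suggests.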
Random-write databases break the pattern
A 50GB PostgreSQL instance under heavy OLTP load can rewrite blocks anywhere on the device between snapshots. Even though the total write volume might be 2GB, those writes touch many chunk boundaries, so the chunk count uploaded is closer to the change footprint than to the change size. Dedup still helps across snapshots of the same database; it just helps less than it does on an OS disk.
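A rough estimate shows how quickly small random writes spread across chunks. The sketch below assumes 8 KB page writes (the PostgreSQL default page size) landing independently and uniformly at random, with a flat 4 MB per chunk; real OLTP workloads have hot spots, so the second line models writes confined to a tenth of the disk. The function name `expected_touched` and both scenarios are illustrative assumptions.

```python
def expected_touched(chunks_in_range, writes):
    """Expected number of distinct chunks hit by uniformly random writes."""
    return chunks_in_range * (1 - (1 - 1 / chunks_in_range) ** writes)

CHUNK_MB = 4
total_chunks = 50_000 // CHUNK_MB        # 50 GB database -> 12,500 chunks
writes = 2_000_000 // 8                  # 2 GB written as 8 KB pages -> 250,000 writes

uniform = expected_touched(total_chunks, writes)          # writes land anywhere
hot_set = expected_touched(total_chunks // 10, writes)    # writes confined to 10% of the disk

for label, touched in (("uniform", uniform), ("10% hot set", hot_set)):
    print(f"{label}: ~{touched:.0f} chunks, ~{touched * CHUNK_MB / 1000:.1f} GB uploaded")
```

In both scenarios the upload is set by how many chunks the writes touch, not by the 2GB that was written: fully uniform writes touch essentially every chunk, and even a 10% hot set costs about 5GB.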
When Chunking Wins, When It Does Not
Be honest about the wins and losses. Content-defined chunking is fantastic when neighbouring writes preserve neighbouring bytes. It struggles when neighbouring writes produce uncorrelated bytes.
The wins:
- OS images, application binaries, source trees. Most files change rarely; when they do change, the changes are localised. Dedup ratios of 20:1 or higher are common across snapshots and across VMs that share a base image.
- Log files and append-mostly data. New writes go to the end; old data does not move. The first few hundred chunks of a log file are identical day after day.
- Document trees, media libraries, software repositories. Files are added, occasionally edited, rarely shuffled. Cross-snapshot dedup is excellent.
The losses:
- Pre-encrypted volumes. A LUKS-encrypted disk looks like noise. Two snapshots of the same encrypted disk taken seconds apart have nothing in common at the chunk level if the encryption layer rotates anything. Do not use guest-level full-disk encryption like LUKS if you want PBS dedup to work; let PBS handle encryption at the chunk layer instead, where dedup is preserved.
- Pre-compressed archives. A 10GB compressed tarball that gets one file added is recompressed from that point on, so nearly every byte after the change is different. Dedup ratios drop to near zero.
- Random-write databases. Already covered above. The dedup story is "less bad than fixed-block chunking," not "great."
The rule of thumb: if you can predict that yesterday's bytes will appear in tomorrow's snapshot at roughly similar positions, CDC will find the overlap. If yesterday and tomorrow have nothing in common at the byte level, chunking can only do so much.
The Encryption Ordering Question
This belongs in part 4 of the series, but it is worth flagging here because it explains a choice that confuses people coming from other backup products.
If you encrypt the disk before chunking, every snapshot looks like uniform random noise. The Buzhash rolling hash will still produce chunks, but no two snapshots will share any. Dedup ratio: 1.0. That is what happens with naive server-side encryption schemes.
Proxmox Backup Server encrypts at the chunk level, after the content has been chunked, using an HMAC over the plaintext to derive a deterministic chunk ID. The server stores ciphertext and never sees the plaintext, but identical plaintext from the same keyring still produces identical chunk IDs. Dedup survives. Privacy survives. That ordering decision is one of the more important design choices in the protocol, and part 4 of this series digs into how encryption preserves dedup.
Wrapping Up
Chunks are the atom of Proxmox Backup Server. Dedup, encryption, sync replication, pruning and garbage collection, restore, verification: everything operates on top of the single decision to cut chunk boundaries based on content rather than position. Once you internalise that one mechanism, the rest of the system stops looking like magic and starts looking like a sensible pipeline of small steps. The next post picks up at the chunk store itself and the index files that turn a pile of chunks back into a coherent VM disk.


