On 12/20/25 15:27, [email protected] wrote:
> I've used tarsnap to back up a series of old CD/DVD/external drive backups, a
> lot of which were duplicate directories of photos.
> It's costing more than I'd budgeted so I was looking to cut down what I was
> storing and wanted to see what was taking up the most space.
> I used the `tarsnap --print-stats -f '*'` command from the docs but the output
> confuses me: the 'All archives (unique data)' line lists the compressed size
> as ~140GB, and this tallies with my monthly spend. But the sum of all the
> individual archives' compressed unique data only comes to ~41GB, which I
> wasn't expecting.
> [...]
> So either I've missed something or I've misunderstood how to interpret the
> stats, but I'm not sure how to figure out what I can prune if 70% of the
> storage is not directly attributed -- any pointers?
"Unique data" means "how much data is in this archive *and not any others*".
Or from a different perspective: It tells you how much data will be removed
if you delete that *one* archive.
If you have two identical archives, they'll both show very close to zero
"unique data" (just a very small amount of non-deduplicated metadata).
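A toy model may make the definition concrete. It is a minimal sketch that assumes each archive is just a set of block IDs and each block has a fixed compressed size. The function name `unique_sizes` and the archive names are hypothetical. The model ignores metadata and is not how Tarsnap actually stores anything. A block counts toward an archive's "unique data" only if no other archive references it:

```python
from collections import Counter

# Hypothetical model: an archive is a set of block IDs; every block has a size.
# This illustrates the "unique data" definition, not Tarsnap's real bookkeeping.
def unique_sizes(archives: dict[str, set[str]], block_size: dict[str, int]) -> dict[str, int]:
    # Reference count: how many archives use each block.
    refcount = Counter(b for blocks in archives.values() for b in blocks)
    # A block is "unique" to an archive only if no other archive uses it,
    # i.e. deleting that one archive would free the block.
    return {
        name: sum(block_size[b] for b in blocks if refcount[b] == 1)
        for name, blocks in archives.items()
    }

sizes = {"a": 100, "b": 200, "c": 300}
archives = {
    "photos-2019": {"a", "b"},
    "photos-2019-copy": {"a", "b"},  # identical to photos-2019
    "docs": {"c"},
}
print(unique_sizes(archives, sizes))
# {'photos-2019': 0, 'photos-2019-copy': 0, 'docs': 300}
```

The two identical photo archives each report zero unique data. Deleting either one alone frees nothing, because the other still references every block.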
So, of your ~140 GB of data, ~40 GB is blocks which are present in only one
archive and the other ~100 GB is blocks which appear in multiple archives.
Some of those blocks might appear in two archives; some might be found in
every single one of your archives. (Tarsnap does actually know for each
block of data how many archives use it -- it needs this reference count in
order to know when it can be deleted -- but there's no interface to that
information.)
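The arithmetic can be sketched with the same kind of toy model. The function name `storage_breakdown`, the block sizes and the archive names are all hypothetical. The numbers were picked only to echo the proportions in this thread, read as GB. Each block is stored once, however many archives use it. The "All archives" total therefore counts every block. Summing the per-archive unique figures counts only the blocks with a reference count of 1. The per-refcount breakdown is the kind of information Tarsnap tracks internally but does not expose, so here it exists only inside the model:

```python
from collections import Counter

# Hypothetical model: archives are sets of block IDs, each block has a size,
# and each block is stored exactly once regardless of how many archives use it.
def storage_breakdown(archives: dict[str, set[str]], block_size: dict[str, int]):
    refcount = Counter(b for blocks in archives.values() for b in blocks)
    # Everything stored, each block counted once (the "All archives" figure).
    total = sum(block_size[b] for b in refcount)
    # Sum of per-archive unique data: only blocks used by exactly one archive.
    unique = sum(block_size[b] for b, n in refcount.items() if n == 1)
    # Bytes held by blocks that are shared by N archives, keyed by N.
    by_refcount = Counter()
    for b, n in refcount.items():
        by_refcount[n] += block_size[b]
    return total, unique, total - unique, dict(sorted(by_refcount.items()))

sizes = {"a": 40, "b": 60, "c": 25, "d": 15}
archives = {
    "cd1": {"a", "b", "c"},
    "cd2": {"a", "b"},
    "dvd": {"a", "d"},
}
print(storage_breakdown(archives, sizes))
# (140, 40, 100, {1: 40, 2: 60, 3: 40})
```

In this model, 100 of the 140 units sit in shared blocks. Block "a" appears in every archive, so no single deletion will free it. Its space comes back only once all three archives are gone.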
--
Colin Percival
FreeBSD Release Engineering Lead & EC2 platform maintainer
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid