On Wednesday 26 April 2006 23:06, Matt Zimmerman wrote:
> ...
> Note that the package file format incorporates gzip compression (gzip -9),
> so you could get a very close approximation of the package size delta by
> comparing the gzip-compressed sizes of the images before and after
> recompressing
> ... 

I created a gzip-compressed tarball for each scenario, with these results:

  Original files, including duplicates:  41,449,521 bytes
  Original files, unique only:           35,481,117 bytes
  Recompressed (fast), unique only:      28,949,292 bytes
  Recompressed (max), unique only:       28,941,136 bytes
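
For anyone who wants to repeat the measurement, here is a minimal sketch in
Python. It is not the exact procedure I used; the function name
gzip_tarball_size is made up for illustration, and it assumes the images for
one scenario sit together in a single directory. It builds the tarball in
memory at gzip level 9 (matching the "gzip -9" mentioned above) and reports
its size.

```python
import io
import os
import tarfile


def gzip_tarball_size(directory, compresslevel=9):
    """Return the size in bytes of an in-memory .tar.gz of `directory`.

    Hypothetical helper: the directory layout is an assumption.
    """
    buffer = io.BytesIO()
    # "w:gz" writes a gzip-compressed tar stream; compresslevel=9 is the
    # strongest gzip setting.
    with tarfile.open(fileobj=buffer, mode="w:gz",
                      compresslevel=compresslevel) as tar:
        arcname = os.path.basename(os.path.normpath(directory))
        tar.add(directory, arcname=arcname)
    # The archive is complete once the with-block has closed it.
    return len(buffer.getvalue())
```

Calling gzip_tarball_size on a directory before and after recompressing its
images gives two numbers whose difference approximates the package size delta.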

These numbers closely resemble the bzip2 numbers and also show savings of 
11 to 12 MB in the most probable scenario, in which all duplicates can be 
symlinked.
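
The arithmetic behind that estimate can be checked with a short Python
sketch. The byte counts are the ones listed above; the dictionary keys and
the helper name savings are only labels chosen for this example.

```python
# Tarball sizes in bytes, copied from the figures in this mail.
sizes = {
    "original_with_dupes": 41449521,
    "original_unique": 35481117,
    "recompressed_fast_unique": 28949292,
    "recompressed_max_unique": 28941136,
}


def savings(before, after):
    """Return the saving in bytes and in MiB (1 MiB = 1024 * 1024 bytes)."""
    delta = before - after
    return delta, delta / (1024 * 1024)


baseline = sizes["original_with_dupes"]
for name, size in sizes.items():
    delta, mib = savings(baseline, size)
    print(f"{name}: saves {delta} bytes ({mib:.2f} MiB)")
```

Removing the duplicates alone accounts for roughly half of the total; the
rest comes from recompressing the unique files.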

> ...
> Replacing these with symlinks may not be as easy as it appears; consider
> that the package with the symlink must depend on the package with the
> actual file, and this may not always be desirable or appropriate.
>
> Could you send the list of duplicate images?
> ...

The list is quite long, so I compressed it and placed it online. You can 
download the list here:
http://www.ffnn.nl/media/external/ubuntu/dapper-duplicates-060425.txt.bz2

The file is organised into duplicate groups: each group starts with a header 
giving the file size, followed by a list of all the files that are 
byte-for-byte identical. I think it is pretty self-explanatory once you open 
it. I hope this information is useful for the discussion.
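
To illustrate how such a list can be produced, here is a rough Python
sketch. It is not necessarily the tool that generated the list above: the
function names are hypothetical, and it assumes that grouping files by size
first and then by SHA-256 digest is good enough to identify binary-equal
files.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_groups(root):
    """Return a sorted list of (size, [paths]) for files with equal content."""
    # Step 1: bucket regular files by size; files of different sizes
    # cannot be identical, so most files never need to be hashed.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 2: within each size bucket, group by content digest.
    groups = []
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[file_digest(path)].append(path)
        for same in by_digest.values():
            if len(same) > 1:
                groups.append((size, sorted(same)))
    return sorted(groups)
```

Each returned tuple corresponds to one duplicate group: the file size,
followed by the paths of the identical files.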

With kind regards,

Frank Schoep
