On Wednesday 26 April 2006 23:06, Matt Zimmerman wrote:
> ...
> Note that the package file format incorporates gzip compression (gzip -9),
> so you could get a very close approximation of the package size delta by
> comparing the gzip-compressed sizes of the images before and after
> recompressing
> ...
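
For reference, the comparison suggested above can be sketched roughly as follows. This is only a minimal illustration, assuming the images before and after recompression sit in two separate directories; `gzip_tar_size` and `size_delta` are hypothetical helper names, not part of any packaging tool, and the real package size will differ slightly because of package metadata.

```python
import io
import os
import tarfile


def gzip_tar_size(directory: str) -> int:
    """Return the size in bytes of a gzip-compressed tarball of `directory`."""
    buffer = io.BytesIO()
    # compresslevel=9 mirrors the "gzip -9" setting mentioned in the quote.
    with tarfile.open(fileobj=buffer, mode="w:gz", compresslevel=9) as archive:
        archive.add(directory, arcname=os.path.basename(os.path.normpath(directory)))
    # The archive is fully written once the "with" block has closed it.
    return len(buffer.getvalue())


def size_delta(before_dir: str, after_dir: str) -> int:
    """Bytes saved going from `before_dir` to `after_dir` (positive = smaller)."""
    return gzip_tar_size(before_dir) - gzip_tar_size(after_dir)
```

Calling `size_delta("images-original", "images-recompressed")` (both directory names are placeholders) would then give an approximation of the compressed size difference in bytes.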

I created gzip-compressed tarballs of each scenario, resulting in these
numbers:

Original files including dupes tar-gz'ed, 41.449.521 bytes
Original unique files tar-gz'ed, 35.481.117 bytes
Recompressed (fast) unique files tar-gz'ed, 28.949.292 bytes
Recompressed (max) unique files tar-gz'ed, 28.941.136 bytes

These numbers closely resemble the bzip2 numbers and also show savings of
11 to 12 MB in the most probable scenario, if all duplicates could be
symlinked.

> ...
> Replacing these with symlinks may not be as easy as it appears; consider
> that the package with the symlink must depend on the package with the
> actual file, and this may not always be desirable or appropriate.
>
> Could you send the list of duplicate images?
> ...

The list is quite long, so I compressed it and placed it online. You can
download it here:

http://www.ffnn.nl/media/external/ubuntu/dapper-duplicates-060425.txt.bz2

Each duplicate group starts with a header giving the file size, followed by
a list of all the binary-identical files. I think the file is pretty
self-explanatory once you open it.

I hope this information is useful regarding the subject.

With kind regards,

Frank Schoep

-- 
ubuntu-art mailing list
ubuntu-art@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-art