On Tue, Jan 18, 2011 at 07:16:04AM -0800, Orvar Korvar wrote:
> BTW, I thought about this. What do you say?
> 
> Assume I want to compress data and I succeed in doing so. And then I
> transfer the compressed data. So all the information I transferred is
> the compressed data. But then you haven't counted all the information:
> knowledge about which algorithm was used, which number system, the laws
> of math, etc. So there is a lot of other information that is implicit
> when you compress/decompress, not just the data itself.
> 
> So, if you add the data and all the implicit information, you get a
> certain bit size X. Do this again on the same set of data, with another
> algorithm, and you get another bit size Y.
> 
> You compress the data using lots of implicit information. If you use
> less implicit information (a simple algorithm relying on simple math),
> will X be smaller than if you use lots of implicit information (an
> advanced algorithm relying on a large body of advanced math)? What can
> you say about the numbers X and Y? Advanced math requires many math
> books that you would need to transfer as well.

Just as the laws of thermodynamics preclude perpetual motion machines,
so do they preclude unbounded, lossless data compression.  Yes,
thermodynamics and information theory are linked, amazingly enough.

Data compression algorithms work by identifying certain types of
patterns, then replacing the input with notes such as "pattern 1 is ...
and appears at offsets 12345 and 1234567" (I'm simplifying a lot).  Data
that has few or no patterns observable by the compression algorithm in
question will not compress; randomly-generated data (e.g., the output of
/dev/urandom) will actually expand if you insist on compressing it.
Even the single bit needed to indicate whether a file is compressed or
not means expansion whenever you fail to compress and store the original
instead of the "compressed" version.  Data compression removes
repetition, which is why already-compressed data is so hard to compress
further.
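A minimal sketch of this, using Python's zlib (the sample sizes and
inputs are just illustrative choices):

```python
import os
import zlib

repetitive = b"abc" * 10000      # 30000 bytes of an obvious pattern
random_data = os.urandom(30000)  # 30000 bytes with no pattern to find

# The patterned input shrinks dramatically; the random input grows,
# because zlib finds nothing to replace and still pays header/framing
# overhead on top of the original bytes.
print(len(zlib.compress(repetitive)))   # much smaller than 30000
print(len(zlib.compress(random_data)))  # slightly larger than 30000
```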

Try it yourself: build a pipeline of all the compression tools you
have and see how many rounds of compression you can apply to typical
data before further compression fails.

Nico
-- 
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss