> Assuming that you may pick a specific compression algorithm,
> most algorithms can have different levels/percentages of
> deflations/inflations which affects the time to compress
> and/or inflate wrt the CPU capacity.

Yes?  I'm not sure what your point is.  Are you suggesting that, rather than 
hard-coding (for instance) the nine "gzip1/gzip2/.../gzip9" alternatives, it 
would be useful to have a "gzip" setting with a compression level? That might 
make some sense, but in practice, there's a limited number of compression 
algorithms and limited utility for setting the degree of compression, so the 
current approach doesn't seem to sacrifice much. (If you get into more complex 
compression algorithms, there are more knobs to tweak, too; and it doesn't seem 
particularly useful to expose all of those.)
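
Just to make that knob concrete, here's a rough Python sketch (zlib standing in for the gzip code, and the data is made up purely for illustration) of how the compression level trades CPU time against output size:

    import time
    import zlib

    data = b"the quick brown fox jumps over the lazy dog\n" * 10000

    for level in (1, 6, 9):
        start = time.perf_counter()
        compressed = zlib.compress(data, level)          # higher level = more CPU
        elapsed = time.perf_counter() - start
        print(f"level {level}: {len(compressed)} bytes in {elapsed * 1000:.2f} ms")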

> Secondly, if I can add an additional item, would anyone
> want to be able to encrypt the data vs compress

Yes, and I think Darren Moffat is working on it.  Encryption & compression are 
orthogonal, though.  (The only constraint is that it's far preferable to 
compress first, then encrypt, since compression relies on regularity in the 
data stream which encryption removes.)
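
A quick illustrative sketch of why the ordering matters (Python; os.urandom stands in for real ciphertext, since encrypted output looks random and leaves no regularity for the compressor):

    import os
    import zlib

    plaintext = b"AAAA BBBB CCCC " * 4096                  # highly regular data

    compress_then_encrypt = zlib.compress(plaintext)        # compresses very well
    ciphertext_like = os.urandom(len(plaintext))            # stand-in for encrypt(plaintext)
    encrypt_then_compress = zlib.compress(ciphertext_like)  # barely shrinks, if at all

    print(len(plaintext), len(compress_then_encrypt), len(encrypt_then_compress))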

>       Third, if data were to be compressed within a file
>       object, should a reader be made aware that the data
>       being read is compressed or should he just read
>       garbage?

I don't understand your question here. Compression is transparent, so a reader 
will get back exactly what was written. Both the compression and decompression 
happen automatically.
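
If it helps, here's a toy sketch (Python, nothing to do with the actual ZFS implementation) of what "transparent" means here: the caller only ever sees uncompressed bytes.

    import zlib

    class TransparentStore:
        """Toy store that compresses on write and decompresses on read."""

        def __init__(self):
            self._blocks = {}

        def write(self, name, data):
            self._blocks[name] = zlib.compress(data)     # compressed on the way down

        def read(self, name):
            return zlib.decompress(self._blocks[name])   # original bytes on the way up

    store = TransparentStore()
    store.write("file", b"hello world" * 100)
    assert store.read("file") == b"hello world" * 100    # reader never sees compressed data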

(There's a separate issue that backup applications would like to be able to 
read the compressed data directly; I haven't paid attention to see if there's 
an ioctl to enable this yet.)

> Fourth, if you take 8k and expect to alloc 8k of disk
> block storage for it and compress it to 7k, are you
> really saving 1k? Or are you just creating an additional
> 1K of internal fragmentation?

You're really saving 1K, because the disk space is not allocated until after 
the compression step. Remember, ZFS uses variably-sized blocks. In your 
example, you'll allocate a 7K block which happens to hold 8K worth of the 
user's data.
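
Back-of-the-envelope, assuming 512-byte sectors (this is just illustrative arithmetic in Python, not ZFS's actual allocator):

    SECTOR = 512                      # assumed sector size

    def allocated(nbytes):
        # round up to a whole number of sectors
        return ((nbytes + SECTOR - 1) // SECTOR) * SECTOR

    logical = 8 * 1024                # what the application wrote
    compressed = 7 * 1024             # what came out of the compressor

    print(allocated(logical))                             # 8192 bytes without compression
    print(allocated(compressed))                          # 7168 bytes with compression
    print(allocated(logical) - allocated(compressed))     # the 1K you actually save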

>       Fifth and hopefully last, should the znode have a
> new length field that keeps the non-compressed length
> for Posix compatibility.

With this & your third question, I think you've got a fundamental 
misunderstanding of what the compression in ZFS does. It is transparent to the 
application. The application reads & writes uncompressed data, it sees 
uncompressed files, it doesn't even have any way to know that the file has been 
compressed (except for looking at stat data & counting the blocks used).
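
For instance, a rough sketch of what "looking at stat data" means (Python; the path is hypothetical and 512-byte st_blocks units are assumed):

    import os

    st = os.stat("/tank/fs/somefile")   # hypothetical file on a compressed dataset
    logical = st.st_size                # the length POSIX applications see
    physical = st.st_blocks * 512       # blocks actually allocated on disk

    if physical < logical:
        print(f"probably compressed: {logical} logical bytes vs {physical} on disk")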

> Really last..., why not just compress the data as a stream
> before writing it out to disk? Then you can at least do
> a file on it and identify the type of compression...

This is preferable when the application supports it, because it allows you to 
compress the whole file at once and get better compression ratios, choose an 
appropriate compression algorithm, not try to compress incompressible data, 
etc. However, it's less general, since it requires that the application do the 
compression. If you have existing applications which only deal with 
uncompressed data, then having the file system do the compression is useful.
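
For example, an application doing its own whole-file compression might look something like this (Python sketch; the file name is made up):

    import gzip

    # The application writes a gzip stream itself, so file(1) can identify it,
    # but every reader now has to know the data is gzipped.
    with gzip.open("report.txt.gz", "wt") as f:           # hypothetical file name
        f.write("whole-file compression chosen by the application\n" * 1000)

    with gzip.open("report.txt.gz", "rt") as f:           # readers must decompress explicitly
        print(f.readline())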

This isn't exactly new. Stac did this for DOS (at the disk level, not the 
file system level) in the early 1990s. File system level compression came in around 
the same time (DiskDoubler and StuffIt SpaceSaver on the Mac, for instance). 
Windows NTFS has built-in compression, but it compresses the whole file, rather 
than individual blocks. (Better compression, but the performance isn't as good 
if you're only reading a small portion of the file.)

Anton
 
 