[zfs-discuss] seeking in ZFS when data is compressed

2009-03-16 Thread Carsten Aulbert
Hi all,

I was just reading
http://blogs.sun.com/dap/entry/zfs_compression

and would like to know what people's experience has been with enabling
compression in ZFS.

In principle I don't think it's a bad thing, especially when the CPUs
are fast enough that compression improves performance because the hard
drives would otherwise be the bottleneck. However, two aspects are
unclear to me:

o What happens when a user opens the file and does a lot of seeking
inside the file? For example, our scientists use a data format where
quite compressible data is contained in stretches and the file header
contains a dictionary of where each stretch of data starts. If these
files are compressed on disk, what will happen with ZFS? Will it just
make educated guesses, or does it have to read all of the typically
30-150 MB of the file and then do the seeking from buffer caches?

o Another problem I see (but probably isn't one): a user is accessing
a file via an NFS-exported ZFS, appending a line of text, closing the
file (and hopefully also flushing everything correctly). Then the user
opens it again, appends another line of text, and so on. Imagine this
happening a few times per second. How will ZFS react to this pattern?
Will it just open the final record of the file, uncompress it, add the
data, recompress it, flush it to disk, and report back to the user's
processes? Is there a potential problem here?

Cheers (and sorry if these questions are stupid ones)

Carsten


Re: [zfs-discuss] seeking in ZFS when data is compressed

2009-03-16 Thread Richard Elling

Carsten Aulbert wrote:

> Hi all,
>
> I was just reading
> http://blogs.sun.com/dap/entry/zfs_compression
>
> and would like to know what people's experience has been with enabling
> compression in ZFS.
>
> In principle I don't think it's a bad thing, especially when the CPUs
> are fast enough that compression improves performance because the hard
> drives would otherwise be the bottleneck. However, two aspects are
> unclear to me:
>
> o What happens when a user opens the file and does a lot of seeking
> inside the file? For example, our scientists use a data format where
> quite compressible data is contained in stretches and the file header
> contains a dictionary of where each stretch of data starts. If these
> files are compressed on disk, what will happen with ZFS? Will it just
> make educated guesses, or does it have to read all of the typically
> 30-150 MB of the file and then do the seeking from buffer caches?

Files are not compressed in ZFS.  Blocks are compressed.

If compressing a block cannot gain at least 12.5% in space savings,
then the block is stored uncompressed.  If your file contains
compressible parts and incompressible parts, then (depending on the
size/blocks) it may be partially compressed.
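
In code terms, the per-block decision is roughly this (a sketch in C,
not the actual ZFS source; the function name is mine):

    #include <stddef.h>

    /* Keep the compressed version only if it saves at least
     * one eighth (12.5%) of the logical block size. */
    int
    store_compressed(size_t lsize, size_t csize)
    {
        return csize <= lsize - (lsize >> 3);
    }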


> o Another problem I see (but probably isn't one): a user is accessing
> a file via an NFS-exported ZFS, appending a line of text, closing the
> file (and hopefully also flushing everything correctly). Then the user
> opens it again, appends another line of text, and so on. Imagine this
> happening a few times per second. How will ZFS react to this pattern?
> Will it just open the final record of the file, uncompress it, add the
> data, recompress it, flush it to disk, and report back to the user's
> processes? Is there a potential problem here?

The file will be cached in RAM. When the file is closed and synced, the
data will be written to the ZIL and ultimately to the data set.  I don't
think there is a fundamental problem here... you should notice the NFS
sync behaviour whether the backing store is ZFS or some other file
system. Using a slog or a nonvolatile write cache will help performance
for such workloads.
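
For concreteness, the client-side pattern you describe looks roughly
like this (a sketch; the fsync() stands in for whatever flush the NFS
close-to-open semantics force, and it is exactly what the ZIL or a
slog absorbs):

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /* Append one line, flush, close -- repeated a few times per second. */
    void
    append_line(const char *path, const char *line)
    {
        int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
        if (fd < 0)
            return;
        (void) write(fd, line, strlen(line));
        (void) fsync(fd);   /* the synchronous commit the ZIL handles */
        (void) close(fd);
    }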


> Cheers (and sorry if these questions are stupid ones)


They are good questions :-)
-- richard



Re: [zfs-discuss] seeking in ZFS when data is compressed

2009-03-16 Thread Carsten Aulbert
Hi Richard,

Richard Elling wrote:
> Files are not compressed in ZFS.  Blocks are compressed.

Sorry, yes, I was not specific enough.

> If compressing a block cannot gain at least 12.5% in space savings,
> then the block is stored uncompressed.  If your file contains
> compressible parts and incompressible parts, then (depending on the
> size/blocks) it may be partially compressed.


I guess the block size is related (or equal) to the record size set for
this file system, right?

What will happen then if I have a file which contains a header which
fits into 1 or 2 blocks, and is followed by stretches of data which
are, say, 500 kB each (for simplicity), which could be visualized as
sitting in a rectangle with M rows and N columns. Since the file system
has no way of knowing details of the file, it will cut the file into
blocks and store it compressed or uncompressed as you have written.
However, what happens if the typical usage pattern is to read only
columns of the rectangle, i.e. read the header, seek to the start of
stretch #1, then seek to stretch #N+1, and so on?

Can ZFS make educated guesses about where the seek targets might be,
or will it read the file block by block until it reaches the target
position? In the latter case it could be quite inefficient if the file
is huge and has a large variance in compressibility.

 
> The file will be cached in RAM. When the file is closed and synced,
> the data will be written to the ZIL and ultimately to the data set.
> I don't think there is a fundamental problem here... you should notice
> the NFS sync behaviour whether the backing store is ZFS or some other
> file system. Using a slog or a nonvolatile write cache will help
> performance for such workloads.


Thanks, that's the answer I was hoping for :)

> They are good questions :-)

Good :)

Cheers

Carsten


Re: [zfs-discuss] seeking in ZFS when data is compressed

2009-03-16 Thread A Darren Dunham
On Mon, Mar 16, 2009 at 10:34:54PM +0100, Carsten Aulbert wrote:
> Can ZFS make educated guesses about where the seek targets might be,
> or will it read the file block by block until it reaches the target
> position? In the latter case it could be quite inefficient if the file
> is huge and has a large variance in compressibility.

Imagine a file that isn't compressed.  You don't have to read earlier
blocks to find data further into the file.  If you want data from a
particular offset, the filesystem can direct you to the block containing
that offset, and the byte within the block.

Now, since only the individual ZFS blocks are compressed, nothing
changes when compression is enabled.  The filesystem can still compute
the ZFS block containing any offset.  You might have to decompress the
final block to read the data, but that's the only file data block that
has to be read.
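
In other words, the lookup is plain arithmetic on logical block
numbers (a sketch in C, assuming a fixed block size; compression only
changes how big a block is on disk, not which block an offset lives in):

    #include <stdint.h>

    /* Map a file offset to the logical block holding it and the
     * byte within that block. */
    void
    locate(uint64_t offset, uint64_t blocksize,
           uint64_t *block, uint64_t *byte)
    {
        *block = offset / blocksize;
        *byte  = offset % blocksize;
    }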

This is different from something like a .Z file, where the entire file
is compressed.  For that, to read data at the end you either need a
directory of offsets or you have to decompress all of the earlier data.
-- 
Darren


Re: [zfs-discuss] seeking in ZFS when data is compressed

2009-03-16 Thread A Darren Dunham
On Mon, Mar 16, 2009 at 09:54:57PM +0100, Carsten Aulbert wrote:
> o What happens when a user opens the file and does a lot of seeking
> inside the file? For example, our scientists use a data format where
> quite compressible data is contained in stretches and the file header
> contains a dictionary of where each stretch of data starts. If these
> files are compressed on disk, what will happen with ZFS? Will it just
> make educated guesses, or does it have to read all of the typically
> 30-150 MB of the file and then do the seeking from buffer caches?

Individual ZFS blocks are compressed.  So seeking isn't expensive.  It
doesn't have to decompress earlier blocks to find the correct offset.

> o Another problem I see (but probably isn't one): a user is accessing
> a file via an NFS-exported ZFS, appending a line of text, closing the
> file (and hopefully also flushing everything correctly). Then the user
> opens it again, appends another line of text, and so on. Imagine this
> happening a few times per second. How will ZFS react to this pattern?
> Will it just open the final record of the file, uncompress it, add the
> data, recompress it, flush it to disk, and report back to the user's
> processes? Is there a potential problem here?

If it's truly an append, then no.  Only the last block has to be
rewritten. 
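
In block terms (a sketch, assuming a fixed block size and n > 0
appended bytes):

    #include <stdint.h>

    /* Blocks dirtied by appending n bytes to a file of length len.
     * Usually first == last: only the tail block is rewritten. */
    void
    append_touches(uint64_t len, uint64_t n, uint64_t blocksize,
                   uint64_t *first, uint64_t *last)
    {
        *first = len / blocksize;
        *last  = (len + n - 1) / blocksize;
    }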

-- 
Darren


Re: [zfs-discuss] seeking in ZFS when data is compressed

2009-03-16 Thread Richard Elling

Carsten Aulbert wrote:

> Hi Richard,
>
> Richard Elling wrote:
>> Files are not compressed in ZFS.  Blocks are compressed.
>
> Sorry, yes, I was not specific enough.
>
>> If compressing a block cannot gain at least 12.5% in space savings,
>> then the block is stored uncompressed.  If your file contains
>> compressible parts and incompressible parts, then (depending on the
>> size/blocks) it may be partially compressed.
>
> I guess the block size is related (or equal) to the record size set
> for this file system, right?


The block size is dynamic, but for large files will likely top out at the
recordsize.
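
As a rough sketch of the relation (it glosses over details like
rounding the small-file case up to a sector multiple):

    #include <stdint.h>

    /* Small files get one block sized to the data; files of a
     * recordsize or more are cut into recordsize blocks. */
    uint64_t
    file_blocksize(uint64_t filesize, uint64_t recordsize)
    {
        return (filesize >= recordsize) ? recordsize : filesize;
    }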


> What will happen then if I have a file which contains a header which
> fits into 1 or 2 blocks, and is followed by stretches of data which
> are, say, 500 kB each (for simplicity), which could be visualized as
> sitting in a rectangle with M rows and N columns. Since the file
> system has no way of knowing details of the file, it will cut the file
> into blocks and store it compressed or uncompressed as you have
> written. However, what happens if the typical usage pattern is to read
> only columns of the rectangle, i.e. read the header, seek to the start
> of stretch #1, then seek to stretch #N+1, and so on?


File systems do this even if the blocks are not compressed.

I can read your question as asking whether or not the file will always be
stored in contiguous blocks. The general answer is no, especially since
ZFS has a COW architecture.  But the practical impacts of this are difficult
to predict, because there is so much caching occurring at all levels of
the data path.  Suffice it to say, if you think your disks are seeking
themselves to death, there is a simple dtrace script, iopattern, which
will prove or disprove the case.


> Can ZFS make educated guesses about where the seek targets might be,
> or will it read the file block by block until it reaches the target
> position? In the latter case it could be quite inefficient if the file
> is huge and has a large variance in compressibility.


This isn't a ZFS function, per se.  If a program is written to seek(),
then it can seek.  If a program is written to read sequentially, then
it will do that.

The reason I say "per se" above is that there is, by default, some
prefetching which can occur at the device or file level.  For magnetic
disks, the prefetching is usually free because it costs so much time to
move the head.  For SSDs, I think the jury is still out... we just
don't have enough collective experience to know whether prefetching is
a win there.
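
Coming back to the seek() point: a reader that uses the header's
dictionary can just pread() at the recorded offsets, e.g. (a sketch;
the dictionary layout and names are made up):

    #include <stdint.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Read stretch i directly, using the offset table from the file
     * header.  ZFS reads (and decompresses) only the blocks that the
     * requested range covers. */
    ssize_t
    read_stretch(int fd, const uint64_t *offsets, int i,
                 char *buf, size_t len)
    {
        return pread(fd, buf, len, (off_t)offsets[i]);
    }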
-- richard



Re: [zfs-discuss] seeking in ZFS when data is compressed

2009-03-16 Thread Carsten Aulbert
Darren, Richard,

thanks a lot for the very good answers. Regarding the seeking, I was
probably misled by the belief that a block was an impenetrable unit
into which as much data as possible gets squeezed (like .Z files would
be if you first compressed and then cut the data into blocks).

Thanks a lot!

Carsten