[zfs-discuss] Disk usage
Hey all, I have a question/puzzle with ZFS. See the following:

bash-3.00# df -h | grep d25 ; zfs list | grep d25
FILESYSTEM            SIZE    USED   AVAIL  CAPACITY  MOUNTED ON
r12_data/d25         *659G*  *40G*   *63G*    39%     /opt/d25/oakwc12
r12_data/d24           42G     40G    2.1G    95%     /opt/d24/oakwcr12

df -h says the d25 file system is 659GB, with 40GB used and 63GB available?

NAME          USED    AVAIL   REFER  MOUNTPOINT
r12_data/d25  760K   *62.7G*  39.9G  /opt/d25/oakwc12
r12_data/d24  39.9G    2.14G  39.9G  /opt/d24/oakwcr12

zfs list says the d25 file system has 62.7GB available?

Shouldn't the new file system (d25) size be what the clone was allocated?
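[A minimal sketch of why the two tools can disagree, assuming d25 was created as a clone of d24 and reusing the dataset names from the post; the commands are illustrative, not taken from the poster's system. A fresh clone shares nearly all of its blocks with its origin snapshot, so zfs list shows a tiny USED next to a large REFER, while df derives its numbers via statvfs, where a clone's free space is driven by the pool (and any quota), not by whatever was allocated to the origin.]

    # Hypothetical recreation of the setup described above.
    zfs snapshot r12_data/d24@clone_base
    zfs clone r12_data/d24@clone_base r12_data/d25

    # Right after cloning, USED is only the clone's private metadata (~760K here),
    # while REFER shows the ~40G of data still shared with the snapshot.
    zfs list -o name,used,avail,refer,mountpoint r12_data/d25

    # Setting a quota (or reservation) is the usual way to give the clone a fixed
    # "size" of its own instead of the pool-wide free space that df reports.
    zfs set quota=42g r12_data/d25
    df -h /opt/d25/oakwc12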
Re: [zfs-discuss] seeking in ZFS when data is compressed
Darren, Richard, thanks a lot for the very good answers. Regarding the seeking, I was probably misled by the belief that the block size was like an impenetrable block into which as much data as possible gets squeezed (like .Z files would be if you first compressed the data and then cut it into blocks).

Thanks a lot!

Carsten
Re: [zfs-discuss] seeking in ZFS when data is compressed
Carsten Aulbert wrote:
> Hi Richard,
>
> Richard Elling wrote:
>> Files are not compressed in ZFS. Blocks are compressed.
>
> Sorry, yes, I was not specific enough.
>
>> If the compression of the blocks cannot gain more than 12.5% space savings,
>> then the block will not be compressed. If your file contains compressible
>> parts and uncompressible parts, then (depending on the size/blocks) it may
>> be partially compressed.
>
> I guess the block size is related (or equal) to the record size set for this
> file system, right?

The block size is dynamic, but for large files it will likely top out at the recordsize.

> What will happen then if I have a file which contains a header which fits
> into 1 or 2 blocks, and is followed by stretches of data which are, say,
> 500 kB each (for simplicity), which could be visualized as sitting in a
> rectangle with M rows and N columns. Since the file system has no way of
> knowing details about the file, it will "cut" the file into blocks and store
> it compressed or uncompressed as you have written. However, what happens if
> the typical usage pattern is to read only columns of the "rectangle", i.e.
> read the header, seek to the start of stretch #1, then seek to stretch
> #N+1, ...

File systems do this even if the blocks are not compressed. I read your question as asking whether or not the file will always be stored in contiguous blocks. The general answer is no, especially since ZFS has a COW architecture. But the practical impact of this is difficult to predict, because there is so much caching occurring at all levels of the data path. Suffice to say, if you think your disks are seeking themselves to death, there is a simple dtrace script, iopattern, which will prove or disprove the case.

> Can ZFS make educated guesses where the seek targets might be, or will it
> read the file block by block until it reaches the target position? In the
> latter case it might be quite inefficient if the file is huge and has a
> large variance in compressibility.

This isn't a ZFS function, per se. If a program is written to seek(), then it can seek. If a program is written to read sequentially, then it will do that. The reason I say "per se" above is because there is, by default, some prefetching which can occur at the device or file level. For magnetic disks, the prefetching is usually free because it costs so much time to move the head. For SSDs, I think the jury is still out... we just don't have enough collective experience to know that prefetching will never be a win.
 -- richard
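[A minimal sketch of how one might check the settings discussed above and watch the actual on-disk access pattern; the dataset name is hypothetical, and iopattern is assumed to be installed from the DTrace Toolkit rather than shipped with the OS.]

    # Hypothetical dataset; adjust to your own pool/file system.
    zfs get recordsize,compression tank/data

    # iopattern (DTrace Toolkit) prints the percentage of random vs. sequential
    # disk I/O each interval, showing whether seeks really dominate the workload.
    ./iopattern 5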
Re: [zfs-discuss] seeking in ZFS when data is compressed
On Mon, Mar 16, 2009 at 09:54:57PM +0100, Carsten Aulbert wrote:
> o what happens when a user opens the file and does a lot of seeking inside
> the file? For example, our scientists use a data format where quite
> compressible data is contained in stretches and the file header contains a
> dictionary of where each stretch of data starts. If these files are
> compressed on disk, what will happen with ZFS? Will it just make educated
> guesses, or does it have to read all of the typically 30-150 MB of the file
> and then do the seeking from buffer caches?

Individual ZFS blocks are compressed, so seeking isn't expensive. It doesn't have to decompress earlier blocks to find the correct offset.

> o Another problem I see (but probably isn't): A user is accessing a file
> via an NFS-exported ZFS, appending a line of text, closing the file (and
> hopefully also flushing everything correctly). However, then the user opens
> it again, appends another line of text, ... Imagine this happening a few
> times per second. How will ZFS react to this pattern? Will it only open the
> final record of the file, uncompress it, add data, recompress it, flush it
> to disk and report that back to the user's processes? Is there a potential
> problem here?

If it's truly an append, then no. Only the last block has to be rewritten.

-- Darren
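[To make the append case concrete, a tiny sketch of that access pattern; the file path and loop count are made up. Each iteration opens the file, appends one line, and closes it, so only the file's final block needs to be read, modified, recompressed, and rewritten.]

    # Hypothetical path; simulates the "append a line, close, repeat" pattern.
    i=0
    while [ $i -lt 1000 ]; do
        echo "log line $i" >> /tank/data/app.log   # open, append one line, close
        i=$((i + 1))
    done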
Re: [zfs-discuss] seeking in ZFS when data is compressed
On Mon, Mar 16, 2009 at 10:34:54PM +0100, Carsten Aulbert wrote:
> Can ZFS make educated guesses where the seek targets might be, or will it
> read the file block by block until it reaches the target position? In the
> latter case it might be quite inefficient if the file is huge and has a
> large variance in compressibility.

Imagine a file that isn't compressed. You don't have to read earlier blocks to find data further into the file. If you want data from a particular offset, the filesystem can direct you to the block containing that offset, and the byte within the block.

Now, since only the individual ZFS blocks are compressed, nothing changes when compression is enabled. The filesystem can still compute the ZFS block containing any offset. You might have to decompress that block to read the data, but that's the only file data block that has to be read.

This is different from something like a .Z file, where the entire file is compressed. There, reading data near the end requires either a separate index or decompressing the earlier data.

-- Darren
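[A minimal arithmetic sketch of that lookup, with made-up numbers: because compression is per block and offsets are logical, the block holding a given offset is just integer division by the block size, here assuming a large file that has reached the default 128K recordsize.]

    # Hypothetical values for illustration only.
    recordsize=131072              # 128K, the typical top-out for large files
    offset=52428800                # logical offset 50 MB into the file

    echo "block index:   $(( offset / recordsize ))"   # which ZFS block to fetch
    echo "byte in block: $(( offset % recordsize ))"   # where the data sits inside it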
Re: [zfs-discuss] seeking in ZFS when data is compressed
Hi Richard,

Richard Elling wrote:
> Files are not compressed in ZFS. Blocks are compressed.

Sorry, yes, I was not specific enough.

> If the compression of the blocks cannot gain more than 12.5% space savings,
> then the block will not be compressed. If your file contains compressible
> parts and uncompressible parts, then (depending on the size/blocks) it may
> be partially compressed.

I guess the block size is related (or equal) to the record size set for this file system, right?

What will happen then if I have a file which contains a header which fits into 1 or 2 blocks, and is followed by stretches of data which are, say, 500 kB each (for simplicity), which could be visualized as sitting in a rectangle with M rows and N columns. Since the file system has no way of knowing details about the file, it will "cut" the file into blocks and store it compressed or uncompressed as you have written. However, what happens if the typical usage pattern is to read only columns of the "rectangle", i.e. read the header, seek to the start of stretch #1, then seek to stretch #N+1, ...

Can ZFS make educated guesses where the seek targets might be, or will it read the file block by block until it reaches the target position? In the latter case it might be quite inefficient if the file is huge and has a large variance in compressibility.

> The file will be cached in RAM. When the file is closed and synced, the
> data will be written to the ZIL and ultimately to the data set. I don't
> think there is a fundamental problem here... you should notice the NFS sync
> behaviour whether the backing store is ZFS or some other file system. Using
> a slog or nonvolatile write cache will help performance for such workloads.

Thanks, that's the answer I was hoping for :)

> They are good questions :-)

Good :)

Cheers

Carsten
Re: [zfs-discuss] seeking in ZFS when data is compressed
Carsten Aulbert wrote:
> Hi all,
>
> I was just reading http://blogs.sun.com/dap/entry/zfs_compression and would
> like to know what people's experience is with enabling compression in ZFS.
> In principle I don't think it's a bad thing, especially not when the CPUs
> are fast enough to improve performance where the hard drives might be too
> slow. However, I'm missing two aspects:
>
> o what happens when a user opens the file and does a lot of seeking inside
> the file? For example, our scientists use a data format where quite
> compressible data is contained in stretches and the file header contains a
> dictionary of where each stretch of data starts. If these files are
> compressed on disk, what will happen with ZFS? Will it just make educated
> guesses, or does it have to read all of the typically 30-150 MB of the file
> and then do the seeking from buffer caches?

Files are not compressed in ZFS. Blocks are compressed. If the compression of the blocks cannot gain more than 12.5% space savings, then the block will not be compressed. If your file contains compressible parts and uncompressible parts, then (depending on the size/blocks) it may be partially compressed.

> o Another problem I see (but probably isn't): A user is accessing a file
> via an NFS-exported ZFS, appending a line of text, closing the file (and
> hopefully also flushing everything correctly). However, then the user opens
> it again, appends another line of text, ... Imagine this happening a few
> times per second. How will ZFS react to this pattern? Will it only open the
> final record of the file, uncompress it, add data, recompress it, flush it
> to disk and report that back to the user's processes? Is there a potential
> problem here?

The file will be cached in RAM. When the file is closed and synced, the data will be written to the ZIL and ultimately to the data set. I don't think there is a fundamental problem here... you should notice the NFS sync behaviour whether the backing store is ZFS or some other file system. Using a slog or nonvolatile write cache will help performance for such workloads.

> Cheers (and sorry if these questions are stupid ones)

They are good questions :-)
 -- richard
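[A minimal sketch of the knobs mentioned above; the pool, dataset, and device names are made up. It shows turning on per-dataset compression, checking the resulting ratio, and adding a separate log device to absorb the synchronous writes that NFS generates on close/fsync.]

    # Hypothetical pool/dataset/device names.
    zfs set compression=on tank/data     # per-block; blocks saving <12.5% stay uncompressed
    zfs get compressratio tank/data      # observed compression ratio for the dataset

    # A dedicated slog device takes the synchronous (ZIL) writes off the main disks,
    # which helps the NFS append/close workload described above.
    zpool add tank log c4t1d0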
[zfs-discuss] seeking in ZFS when data is compressed
Hi all,

I was just reading http://blogs.sun.com/dap/entry/zfs_compression and would like to know what people's experience is with enabling compression in ZFS. In principle I don't think it's a bad thing, especially not when the CPUs are fast enough to improve performance where the hard drives might be too slow. However, I'm missing two aspects:

o what happens when a user opens the file and does a lot of seeking inside the file? For example, our scientists use a data format where quite compressible data is contained in stretches and the file header contains a dictionary of where each stretch of data starts. If these files are compressed on disk, what will happen with ZFS? Will it just make educated guesses, or does it have to read all of the typically 30-150 MB of the file and then do the seeking from buffer caches?

o Another problem I see (but probably isn't): A user is accessing a file via an NFS-exported ZFS, appending a line of text, closing the file (and hopefully also flushing everything correctly). However, then the user opens it again, appends another line of text, ... Imagine this happening a few times per second. How will ZFS react to this pattern? Will it only open the final record of the file, uncompress it, add data, recompress it, flush it to disk and report that back to the user's processes? Is there a potential problem here?

Cheers (and sorry if these questions are stupid ones)

Carsten
Re: [zfs-discuss] User quota design discussion..
Hello Jorgen,

If you look at the list archives you will see that it made a huge difference for some people, including me. Now I'm easily able to saturate a GbE link while zfs send|recv'ing.

-- 
Best regards,
Robert Milkowski
http://milek.blogspot.com

Saturday, March 14, 2009, 1:06:40 PM, you wrote:

JL> Sorry, did not mean it as a complaint, it just has been slow for us. But
JL> if it has been made faster, that would be excellent. ZFS send is very
JL> powerful.

JL> Lund

JL> Robert Milkowski wrote:
>> Hello Jorgen,
>>
>> Friday, March 13, 2009, 1:14:12 AM, you wrote:
>>
>> JL> That is a good point, I had not even planned to support quotas for ZFS
>> JL> send, but consider a rescan to be the answer. We don't ZFS send very
>> JL> often as it is far too slow.
>>
>> Since build 105 it should be *MUCH* faster.
>>
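[For readers who haven't used it, a minimal sketch of the zfs send/receive replication being discussed; dataset, snapshot, and host names are all made up. A full send ships one snapshot, and an incremental send ships only the blocks changed since the previous snapshot.]

    # Hypothetical dataset, snapshot, and host names.
    zfs snapshot tank/data@mon
    zfs send tank/data@mon | ssh backuphost zfs receive backup/data

    # Later, send only what changed between the two snapshots.
    zfs snapshot tank/data@tue
    zfs send -i tank/data@mon tank/data@tue | ssh backuphost zfs receive backup/data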
Re: [zfs-discuss] Freezing OpenSolaris with ZFS
Hi again,

I read through your thread, Blake, and I don't really know if we have exactly the same problem. I get different output and the system doesn't reboot automatically. The controller is an Adaptec RAID 31605 and the board is a Supermicro X7DBE. Here is some perhaps useful information. "fmdump -eV" gives the following output:

Mar 16 2009 09:22:25.383935668 ereport.io.scsi.cmd.disk.dev.uderr
nvlist version: 0
    class = ereport.io.scsi.cmd.disk.dev.uderr
    ena = 0x139594f76b1
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = dev
        device-path = /p...@0,0/pci8086,2...@1c/pci8086,3...@0/pci9005,2...@e/d...@0,0
        devid = id1,s...@tadaptec_31605___7ae2bc31
    (end detector)
    driver-assessment = fail
    op-code = 0x1a
    cdb = 0x1a 0x0 0x8 0x0 0x18 0x0
    pkt-reason = 0x0
    pkt-state = 0x1f
    pkt-stats = 0x0
    stat-code = 0x0
    un-decode-info = sd_cache_control: Mode Sense caching page code mismatch 0
    un-decode-value =
    __ttl = 0x1
    __tod = 0x49be0c41 0x16e264b4

Mar 16 2009 09:22:25.385150895 ereport.io.scsi.cmd.disk.dev.uderr
nvlist version: 0
    class = ereport.io.scsi.cmd.disk.dev.uderr
    ena = 0x1395a77fe51
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = dev
        device-path = /p...@0,0/pci8086,2...@1c/pci8086,3...@0/pci9005,2...@e/d...@1,0
        devid = id1,s...@tadaptec_31605___6a16ac31
    (end detector)
    driver-assessment = fail
    op-code = 0x1a
    cdb = 0x1a 0x0 0x8 0x0 0x18 0x0
    pkt-reason = 0x0
    pkt-state = 0x1f
    pkt-stats = 0x0
    stat-code = 0x0
    un-decode-info = sd_cache_control: Mode Sense caching page code mismatch 0
    un-decode-value =
    __ttl = 0x1
    __tod = 0x49be0c41 0x16f4efaf

Mar 16 2009 09:22:25.386359711 ereport.io.scsi.cmd.disk.dev.uderr
nvlist version: 0
    class = ereport.io.scsi.cmd.disk.dev.uderr
    ena = 0x1395b9f5321
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = dev
        device-path = /p...@0,0/pci8086,2...@1c/pci8086,3...@0/pci9005,2...@e/d...@2,0
        devid = id1,s...@tadaptec_31605___7206bc31
    (end detector)
    driver-assessment = fail
    op-code = 0x1a
    cdb = 0x1a 0x0 0x8 0x0 0x18 0x0
    pkt-reason = 0x0
    pkt-state = 0x1f
    pkt-stats = 0x0
    stat-code = 0x0
    un-decode-info = sd_cache_control: Mode Sense caching page code mismatch 0
    un-decode-value =
    __ttl = 0x1
    __tod = 0x49be0c41 0x1707619f

Mar 16 2009 09:22:25.387617744 ereport.io.scsi.cmd.disk.dev.uderr
nvlist version: 0
    class = ereport.io.scsi.cmd.disk.dev.uderr
    ena = 0x1395cd262d1
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = dev
        device-path = /p...@0,0/pci8086,2...@1c/pci8086,3...@0/pci9005,2...@e/d...@3,0
        devid = id1,s...@tadaptec_31605___1a2ebc31
    (end detector)
    driver-assessment = fail
    op-code = 0x1a
    cdb = 0x1a 0x0 0x8 0x0 0x18 0x0
    pkt-reason = 0x0
    pkt-state = 0x1f
    pkt-stats = 0x0
    stat-code = 0x0
    un-decode-info = sd_cache_control: Mode Sense caching page code mismatch 0
    un-decode-value =
    __ttl = 0x1
    __tod = 0x49be0c41 0x171a93d0

Mar 16 2009 09:22:25.388910486 ereport.io.scsi.cmd.disk.dev.uderr
nvlist version: 0
    class = ereport.io.scsi.cmd.disk.dev.uderr
    ena = 0x1395e0ddda1
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = dev
        device-path = /p...@0,0/pci8086,2...@1c/pci8086,3...@0/pci9005,2...@e/d...@4,0
        devid = id1,s...@tadaptec_31605___6c66cc31
    (end detector)
    driver-assessment = fail
    op-code = 0x1a
    cdb = 0x1a 0x0 0x8 0x0 0x18 0x0
    pkt-reason = 0x0
    pkt-state = 0x1f
    pkt-stats = 0x0
    stat-code = 0x0
    un-decode-info = sd_cache_control: Mode Sense caching page code mismatch 0
    un-decode-value =
    __ttl = 0x1
    __tod = 0x49be0c41 0x172e4d96

Mar 16 2009 09:22:25.390144519 ereport.io.scsi.cmd.disk.dev.uderr
nvlist version: 0
    class = ereport.io.scsi.cmd.disk.dev.uderr
    ena = 0x1395f3b4431
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = dev
        device-path = /p...@0,0/pci8086,2...@1c/pci8086,3...@0/pci9005,2...@e/d...@5,0
        devid = id1,s...@tadaptec_31605___508acc