Re: [zfs-discuss] Deduplication Memory Requirements
On Thu, May 5, 2011 at 8:50 PM, Edward Ned Harvey wrote:
> If you have to use the 4k recordsize, it is likely to consume 32x more
> memory than the default 128k recordsize of ZFS. At this rate, it becomes
> increasingly difficult to get a justification to enable the dedup. But it's
> certainly possible.

You're forgetting that zvols use an 8k volblocksize by default. If you're currently exporting volumes with iSCSI it's only a 2x increase. The tradeoff is that you should have more duplicate blocks, and reap the rewards there. I'm fairly certain it won't offset the large increase in the size of the DDT, however. Dedup with zvols is probably never a good idea as a result.

Only if you're hosting your VM images in .vmdk files will you get 128k blocks. Of course, your chance of getting many identical blocks then gets much, much smaller. You'll have to worry about the guests' block alignment in the context of the image file, since two identical files may not create identical blocks as seen from ZFS. This means you may get only fractional savings and have an enormous DDT.

-B
--
Brandon High : bh...@freaks.com
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
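The alignment point above is easy to demonstrate. A toy sketch (plain Python hashing, not ZFS code; the 4k block size and 512-byte shift are arbitrary illustration values): identical data stored at different offsets produces no matching block checksums, so block-level dedup finds nothing to share.

```python
import hashlib
import random

def block_hashes(data: bytes, blocksize: int = 4096) -> set:
    """Checksum each fixed-size block, the way a dedup table keys blocks."""
    return {hashlib.sha256(data[i:i + blocksize]).hexdigest()
            for i in range(0, len(data), blocksize)}

payload = random.Random(42).randbytes(128 * 1024)   # the "file" contents

aligned = block_hashes(payload)                     # content starts at offset 0
shifted = block_hashes(b"\x00" * 512 + payload)     # same content, shifted 512 bytes

# Identical data, zero shared block checksums once the alignment differs.
print(len(aligned), len(aligned & shifted))  # -> 32 0
```

The same effect occurs when a guest filesystem lays out identical files at different offsets inside two VM images: the host sees different blocks even though the bytes are the same.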
Re: [zfs-discuss] Deduplication Memory Requirements
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>
> If you have to use the 4k recordsize, it is likely to consume 32x more
> memory than the default 128k recordsize of ZFS. At this rate, it becomes
> increasingly difficult to get a justification to enable the dedup. But it's
> certainly possible.

Sorry, I didn't realize ... RE just said (and I take his word for it) that the default block size for a zvol is 8k, while of course the default recordsize for a ZFS filesystem is 128k. The point is that the memory requirement is a constant multiplied by the number of blocks, so smaller blocks ==> more blocks ==> more memory consumption.

This could be a major difference in implementation ... If you use ZFS over NFS as your VM storage backend, it defaults to the 128k recordsize, while ZFS over iSCSI defaults to the 8k volblocksize. In either case, you really want to be aware of, and tune, the block size appropriately for the guest(s) you are running.
Re: [zfs-discuss] Deduplication Memory Requirements
> From: Brandon High [mailto:bh...@freaks.com]
>
> On Wed, May 4, 2011 at 8:23 PM, Edward Ned Harvey wrote:
> > Generally speaking, dedup doesn't work on VM images. (Same is true for ZFS
> > or netapp or anything else.) Because the VM images are all going to have
> > their own filesystems internally with whatever blocksize is relevant to the
> > guest OS. If the virtual blocks in the VM don't align with the ZFS (or
> > whatever FS) host blocks... Then even when you write duplicated data inside
> > the guest, the host won't see it as a duplicated block.
>
> A zvol with 4k blocks should give you decent results with Windows
> guests. Recent versions use 4k alignment by default and 4k blocks, so
> there should be lots of duplicates for a base OS image.

I agree with everything Brandon said. The one thing I would add is: The "correct" recordsize for each guest machine would depend on the filesystem that the guest machine is using. Without knowing a specific filesystem on a specific guest OS, the 4k recordsize sounds like a reasonable general-purpose setting. But if you know more details of the guest, you could hopefully use a larger recordsize and therefore consume less RAM on the host.

If you have to use the 4k recordsize, it is likely to consume 32x more memory than the default 128k recordsize of ZFS. At this rate, it becomes increasingly difficult to get a justification to enable the dedup. But it's certainly possible.
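The 32x figure is just the ratio of block counts: DDT memory scales with the number of unique blocks times a fixed per-entry size. A rough sketch (worst case, every block unique; the 376-byte entry size is the value measured elsewhere in this thread and varies by platform):

```python
def ddt_ram_bytes(pool_bytes: int, block_size: int, entry_size: int = 376) -> int:
    """Worst-case DDT footprint: one fixed-size entry per (unique) block."""
    return (pool_bytes // block_size) * entry_size

TIB = 1 << 40
for bs in (128 * 1024, 8 * 1024, 4 * 1024):
    gib = ddt_ram_bytes(TIB, bs) / (1 << 30)
    print(f"{bs // 1024:>4}K blocks: {gib:6.2f} GiB of DDT per TiB stored")
```

At 128k records that works out to roughly 2.9 GiB of DDT per TiB stored; at the 4k recordsize it is 32x that, which is where the justification gets hard.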
Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements
On May 4, 2011, at 7:56 PM, Edward Ned Harvey wrote:
> This is a summary of a much longer discussion "Dedup and L2ARC memory
> requirements (again)". Sorry even this summary is long. But the results vary
> enormously based on individual usage, so any "rule of thumb" metric that has
> been bouncing around on the internet is simply not sufficient. You need to go
> into this level of detail to get an estimate that's worth the napkin or
> bathroom tissue it's scribbled on.
>
> This is how to (reasonably) accurately estimate the hypothetical ram
> requirements to hold the complete data deduplication tables (DDT) and L2ARC
> references in ram. Please note both the DDT and L2ARC references can be
> evicted from memory according to system policy, whenever the system decides
> some other data is more valuable to keep. So following this guide does not
> guarantee that the whole DDT will remain in ARC or L2ARC. But it's a good
> start.

As the size of the data grows, the need to have the whole DDT in RAM or L2ARC decreases. There is one notable exception: destroying a dataset or snapshot requires the DDT entries for the destroyed blocks to be updated. This is why people can go for months or years and not see a problem, until they try to destroy a dataset.

> I am using a solaris 11 express x86 test system for my example numbers
> below.
>
> --- To calculate size of DDT ---
>
> Each entry in the DDT is a fixed size, which varies by platform. You can
> find it with the command:
> echo ::sizeof ddt_entry_t | mdb -k
> This will return a hex value, that you probably want to convert to decimal.
> On my test system, it is 0x178 which is 376 bytes
>
> There is one DDT entry per non-dedup'd (unique) block in the zpool.

The workloads which are nicely dedupable tend to not have unique blocks. So this is another way of saying, "if your workload isn't dedupable, don't bother with deduplication." For years now we have been trying to convey this message. 
One way to help convey the message is... > Be > aware that you cannot reliably estimate #blocks by counting #files. You can > find the number of total blocks including dedup'd blocks in your pool with > this command: > zdb -bb poolname | grep 'bp count' Ugh. A better method is to simulate dedup on existing data: zdb -S poolname or measure dedup efficacy zdb -DD poolname which offer similar tabular analysis > Note: This command will run a long time and is IO intensive. On my systems > where a scrub runs for 8-9 hours, this zdb command ran for about 90 minutes. > On my test system, the result is 44145049 (44.1M) total blocks. > > To estimate the number of non-dedup'd (unique) blocks (assuming average size > of dedup'd blocks = average size of blocks in the whole pool), use: > zpool list > Find the dedup ratio. In my test system, it is 2.24x. Divide the total > blocks by the dedup ratio to find the number of non-dedup'd (unique) blocks. Or just count the unique and non-unique blocks with: zdb -D poolname > > In my test system: > 44145049 total blocks / 2.24 dedup ratio = 19707611 (19.7M) approx > non-dedup'd (unique) blocks > > Then multiply by the size of a DDT entry. > 19707611 * 376 = 7410061796 bytes = 7G total DDT size A minor gripe about zdb -D output is that it doesn't do the math. > > --- To calculate size of ARC/L2ARC references --- > > Each reference to a L2ARC entry requires an entry in ARC (ram). This is > another fixed size, which varies by platform. You can find it with the > command: > echo ::sizeof arc_buf_hdr_t | mdb -k > On my test system, it is 0xb0 which is 176 bytes Better yet, without need for mdb privilege, measure the current L2ARC header size in use. Normal user accounts can: kstat -p zfs::arcstats:hdr_size kstat -p zfs::arcstats:l2_hdr_size arcstat will allow you to easily track this over time. > > We need to know the average block size in the pool, to estimate the number > of blocks that will fit into L2ARC. 
Find the amount of space ALLOC in the > pool: > zpool list > Divide by the number of non-dedup'd (unique) blocks in the pool, to find the > average block size. In my test system: > 790G / 19707611 = 42K average block size > > Remember: If your L2ARC were only caching average size blocks, then the > payload ratio of L2ARC vs ARC would be excellent. In my test system, every > 42K L2ARC would require 176bytes ARC (a ratio of 244x). This would result > in a negligible ARC memory consumption. But since your DDT can be pushed > out of ARC into L2ARC, you get a really bad ratio of L2ARC vs ARC memory > consumption. In my test system every 376bytes DDT entry in L2ARC consumes > 176bytes ARC (a ratio of 2.1x). Yes, it is approximately possible to have > the complete DDT present in ARC and L2ARC, thus consuming tons of ram. This is a good thing for those cases when you need to quickly reference la
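The whole walkthrough above reduces to a few lines of arithmetic. This sketch just replays the thread's example numbers (your own inputs come from the mdb, zdb, and zpool commands quoted above; entry and header sizes vary by platform):

```python
# Example figures from this thread's test system (Solaris 11 Express x86).
ddt_entry_size = 376          # echo ::sizeof ddt_entry_t | mdb -k   -> 0x178
arc_ref_size   = 176          # echo ::sizeof arc_buf_hdr_t | mdb -k -> 0xb0
total_blocks   = 44_145_049   # zdb -bb poolname | grep 'bp count'
dedup_ratio    = 2.24         # zpool list
alloc_bytes    = 790 * 2**30  # ALLOC column of zpool list

unique_blocks = int(total_blocks / dedup_ratio)
ddt_bytes     = unique_blocks * ddt_entry_size
avg_block     = alloc_bytes // unique_blocks

print(f"unique blocks : {unique_blocks:,}")
print(f"DDT size      : {ddt_bytes / 2**30:.1f} GiB")
print(f"avg block size: {avg_block // 1024} KiB")
# ARC cost of holding the whole DDT in L2ARC: one header per DDT entry.
print(f"ARC overhead if DDT lives in L2ARC: {unique_blocks * arc_ref_size / 2**30:.1f} GiB")
```

This reproduces the ~7G DDT and 42K average block size worked out in the post.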
Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements
> From: Karl Wagner [mailto:k...@mouse-hole.com]
>
> so there's an ARC entry referencing each individual DDT entry in the L2ARC?!
> I had made the assumption that DDT entries would be grouped into at least
> minimum block sized groups (8k?), which would have led to a much more
> reasonable ARC requirement.
>
> seems like a bad design to me, which leads to dedup only being usable by
> those prepared to spend a LOT of dosh... which may as well go into more
> storage (I know there are other benefits too, but that's my opinion)

The whole point of the DDT is that it needs to be structured, and really fast to search. So no, you're not going to consolidate it into an unstructured memory block as you said. You pay the memory consumption price for the sake of performance. Yes it consumes a lot of RAM, but don't call it a "bad design." It's just a different design than what you expected, because what you expected would hurt performance while consuming less RAM.

And we're not talking crazy dollars here. So your emphasis on a LOT of dosh seems exaggerated. I just spec'd out a system where upgrading from 12 to 24G of RAM to enable dedup effectively doubled the storage capacity of the system, and that upgrade cost the same as one of the disks. (This is a 12-disk system.) So it was actually a 6x cost reducer, at least. It all depends on how much mileage you get out of the dedup. Your mileage may vary.
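To make the design point concrete, here is a toy dedup table (an illustration only, nothing like the real on-disk ZFS structures): every write must look up its block checksum as an individual key, which is why entries are tracked and cached one by one rather than packed into opaque 8k groups.

```python
import hashlib

class ToyDDT:
    """Toy dedup table: checksum -> (refcount, block). Illustration only."""
    def __init__(self):
        self.entries = {}   # a real DDT is a structured on-disk table, not a dict

    def write(self, block: bytes) -> str:
        # Every logical write is an individual keyed lookup into the table.
        key = hashlib.sha256(block).hexdigest()
        refcnt, _ = self.entries.get(key, (0, block))
        self.entries[key] = (refcnt + 1, block)
        return key

ddt = ToyDDT()
for block in (b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096):
    ddt.write(block)

stored = len(ddt.entries)                         # unique blocks actually stored
written = sum(r for r, _ in ddt.entries.values())  # logical blocks written
print(f"{written} blocks written, {stored} stored, ratio {written / stored:.2f}x")
# -> 4 blocks written, 2 stored, ratio 2.00x
```

Grouping entries into larger blobs would shrink the header overhead Karl objects to, but every lookup would then have to read and search a whole group, which is the performance cost Ned describes.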
Re: [zfs-discuss] Deduplication Memory Requirements
On May 5, 2011, at 6:02 AM, Edward Ned Harvey wrote:
> Is this a zfs discussion list, or a nexenta sales & promotion list?

Obviously, this is a Nexenta sales & promotion list. And Oracle. And OSX. And BSD. And Linux. And anyone who needs help or can offer help with ZFS technology :-) This list has never been more diverse. The only sad part is the unnecessary assassination of the OpenSolaris brand. But life moves on, and so does good technology.

-- richard-who-is-proud-to-work-at-Nexenta
Re: [zfs-discuss] Deduplication Memory Requirements
On May 5, 2011, at 2:58 PM, Brandon High wrote:
> On Wed, May 4, 2011 at 8:23 PM, Edward Ned Harvey wrote:
> >> Or if you're intimately familiar with both the guest & host filesystems, and
> >> you choose blocksizes carefully to make them align. But that seems
> >> complicated and likely to fail.
>
> Using a 4k block size is a safe bet, since most OSs use a block size
> that is a multiple of 4k. It's the same reason that the new "Advanced
> Format" drives use 4k sectors.

Yes, 4KB block sizes are replacing the 512B blocks of yesteryear. However, the real reason the HDD manufacturers headed this way is that they can get more usable bits per platter. The tradeoff is that your workload may consume more real space on the platter than before. TANSTAAFL.

The trick for best performance and best opportunity for dedup (alignment notwithstanding) is to have a block size that is smaller than your workload. Or, don't bring a 128KB block to a 4KB block battle. For this reason, the default 8KB block size for a zvol is a reasonable choice, but perhaps 4KB is better for many workloads.

-- richard
Re: [zfs-discuss] Permanently using hot spare?
On Thu, May 05, 2011 at 03:13:06PM -0700, TianHong Zhao wrote:
> Just detach the faulty disk, then the spare will become the "normal"
> disk once it's finished resilvering.
>
> #zpool detach <pool> <faulted-disk>
>
> Then you need to add the new spare:
> #zpool add <pool> spare <new-disk>
>
> There seems to be a new feature in the illumos project to support a zpool
> property like "spare promotion",
> which would not require the manual "detach" operation.
>
> Tianhong

Thanks! Great tip.

Ray
Re: [zfs-discuss] multipl disk failures cause zpool hang
Thanks again. No, I don’t see any bio functions, but you have shed very useful light on the issue.

My test platform is b147; the pool disks are from a storage system via a QLogic fiber HBA. My test case is:
1. zpool set failmode=continue pool1
2. dd if=/dev/zero of=/pool1/fs/myfile count=1000 &
3. unplug the fiber cable, wait about 30 sec.
4. zpool status (hang)
5. wait about 1 min.
6. can not open a new ssh session to the box, but existing ssh sessions are still alive.
7. Use the existing session to get into mdb and get the threadlist.
8. Eventually, I have to power cycle the box.

Tianhong

From: Steve Gonczi [mailto:gon...@comcast.net]
Sent: Thursday, May 05, 2011 6:32 PM
To: TianHong Zhao
Subject: Re: [zfs-discuss] multipl disk failures cause zpool hang

You are most welcome. The zio_wait just indicates that the sync thread is waiting for an io to complete. Search through the threadlist and see if there is a thread that is stuck in "biowait". zio is asynchronous, so the thread performing the actual io will be a different thread.

But first let's just verify again that you are not deleting large files or large snapshots, or zfs destroy-ing large file systems when this hang happens, and that you are running a fairly modern zfs version (something 145+). If I am reading your posts correctly, you can repeatably make this happen on a mostly idle system, just by disconnecting and reconnecting your cable, correct?

In that case, maybe this is a lost "biodone" problem. If you find a thread sitting in biowait for a long time, that would be my suspicion. When you unplug the cable, the strategy routine that would normally complete or time out or fail the io could be taking a rare exit path, and on that particular path, fails to issue a biodone() like it is supposed to. The next step after this would be figuring out which is the device's strategy call and give that function a good thorough review, esp. the different exit paths. 
Steve /sG/

- "TianHong Zhao" wrote:

Thanks for the information. I think you’re right that the spa_sync thread is blocked in zio_wait while holding scl_lock, which blocks all zpool-related commands (such as zpool status).

Question is why zio_wait is blocked forever? If the underlying device is offline, could the zio service just bail out? What if I set “zfs sync=disabled”?

Here is what I collected from “threadlist”:

#mdb -K
>::threadlist -v
ff02d9627400 ff02f05f80a8 ff02d95f2780 1 59 ff02d57e585c
PC: _resume_from_idle+0xf1    CMD: zpool status
stack pointer for thread ff02d9627400: ff00108a3a70
[ ff00108a3a70 _resume_from_idle+0xf1() ]
swtch+0x145()
cv_wait+0x61()
spa_config_enter+0x86()
spa_vdev_state_enter+0x3c()
spa_vdev_set_common+0x37()
spa_vdev_setpath+0x22()
zfs_ioc_vdev_setpath+0x48()
zfsdev_ioctl+0x15e()
cdev_ioctl+0x45()
spec_ioctl+0x5a()
fop_ioctl+0x7b()
ioctl+0x18e()
_sys_sysenter_post_swapgs+0x149()
…
ff0010378c40 fbc2e3300 0 60 ff034935bcb8
PC: _resume_from_idle+0xf1    THREAD: txg_sync_thread()
stack pointer for thread ff0010378c40: ff00103789b0
[ ff00103789b0 _resume_from_idle+0xf1() ]
swtch+0x145()
cv_wait+0x61()
zio_wait+0x5d()
dsl_pool_sync+0xe1()
spa_sync+0x38d()
txg_sync_thread+0x247()
thread_start+8()

Tianhong

From: Steve Gonczi [mailto:gon...@comcast.net]
Sent: Wednesday, May 04, 2011 10:43 AM
To: TianHong Zhao
Subject: Re: [zfs-discuss] multipl disk failures cause zpool hang

Hi TianHong,

I have seen similar apparent hangs, all related to destroying large snapshots or file systems, or deleting large files (with dedup enabled; by large I mean in the terabyte range). In the cases I have looked at, the root problem is the sync taking way too long, and because of the sync interlock with keeping the current txg open, zfs eventually runs out of space in the current txg and is unable to accept any more transactions. In those cases, the system would come back to life eventually, but it may take a long time (days potentially). 
Looks like yours is a reproducible scenario, and I think the disconnect-reconnect triggered hang may be new. It would be good to root cause this. I recommend loading the kernel debugger and generating a crash dump. It would be pretty straightforward to verify whether this is the "sync taking a long time" failure or not. The output from ::threadlist -v would be telling. There have been posts earlier as to how to load the debugger and create a crash dump.

Best wishes

Steve /sG/
Re: [zfs-discuss] Permanently using hot spare?
On 05/ 6/11 09:53 AM, Ray Van Dolson wrote:
> Have a failed drive on a ZFS pool (three RAIDZ2 vdevs, one hot spare).
> The hot spare kicked in and all is well. Is it possible to just make that
> hot spare disk -- already resilvered into the pool -- a permanent part of
> the pool? We could then throw in a new disk and mark it as a spare and
> avoid what would seem to be an unnecessary resilver (twice, once when the
> spare is brought in and again when we replace the failed disk).

Yes, as Tianhong just posted, just detach the faulted device. What you describe is what I normally do: add the original drive back as a spare when it is replaced.

-- Ian.
Re: [zfs-discuss] Permanently using hot spare?
Just detach the faulty disk, then the spare will become the "normal" disk once it's finished resilvering.

#zpool detach <pool> <faulted-disk>

Then you need to add the new spare:

#zpool add <pool> spare <new-disk>

There seems to be a new feature in the illumos project to support a zpool property like "spare promotion", which would not require the manual "detach" operation.

Tianhong

-Original Message-
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Ray Van Dolson
Sent: Thursday, May 05, 2011 5:53 PM
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] Permanently using hot spare?

Have a failed drive on a ZFS pool (three RAIDZ2 vdevs, one hot spare). The hot spare kicked in and all is well. Is it possible to just make that hot spare disk -- already resilvered into the pool -- a permanent part of the pool? We could then throw in a new disk and mark it as a spare and avoid what would seem to be an unnecessary resilver (twice, once when the spare is brought in and again when we replace the failed disk). This document[1] seems to make it sound like it can be done, but I'm not really seeing how... Can I "add" the spare disk to the pool when it's already in use? Probably not...

Note this is on Solaris 10 U9.

Thanks, Ray

[1] http://dlc.sun.com/osol/docs/content/ZFSADMIN/gayrd.html#gcvcw
Re: [zfs-discuss] multipl disk failures cause zpool hang
Thanks for the information. I think you’re right that the spa_sync thread is blocked in zio_wait while holding scl_lock, which blocks all zpool-related commands (such as zpool status).

Question is why zio_wait is blocked forever? If the underlying device is offline, could the zio service just bail out? What if I set “zfs sync=disabled”?

Here is what I collected from “threadlist”:

#mdb -K
>::threadlist -v
ff02d9627400 ff02f05f80a8 ff02d95f2780 1 59 ff02d57e585c
PC: _resume_from_idle+0xf1    CMD: zpool status
stack pointer for thread ff02d9627400: ff00108a3a70
[ ff00108a3a70 _resume_from_idle+0xf1() ]
swtch+0x145()
cv_wait+0x61()
spa_config_enter+0x86()
spa_vdev_state_enter+0x3c()
spa_vdev_set_common+0x37()
spa_vdev_setpath+0x22()
zfs_ioc_vdev_setpath+0x48()
zfsdev_ioctl+0x15e()
cdev_ioctl+0x45()
spec_ioctl+0x5a()
fop_ioctl+0x7b()
ioctl+0x18e()
_sys_sysenter_post_swapgs+0x149()
…
ff0010378c40 fbc2e3300 0 60 ff034935bcb8
PC: _resume_from_idle+0xf1    THREAD: txg_sync_thread()
stack pointer for thread ff0010378c40: ff00103789b0
[ ff00103789b0 _resume_from_idle+0xf1() ]
swtch+0x145()
cv_wait+0x61()
zio_wait+0x5d()
dsl_pool_sync+0xe1()
spa_sync+0x38d()
txg_sync_thread+0x247()
thread_start+8()

Tianhong

From: Steve Gonczi [mailto:gon...@comcast.net]
Sent: Wednesday, May 04, 2011 10:43 AM
To: TianHong Zhao
Subject: Re: [zfs-discuss] multipl disk failures cause zpool hang

Hi TianHong,

I have seen similar apparent hangs, all related to destroying large snapshots or file systems, or deleting large files (with dedup enabled; by large I mean in the terabyte range). In the cases I have looked at, the root problem is the sync taking way too long, and because of the sync interlock with keeping the current txg open, zfs eventually runs out of space in the current txg and is unable to accept any more transactions. In those cases, the system would come back to life eventually, but it may take a long time (days potentially). 
Looks like yours is a reproducible scenario, and I think the disconnect-reconnect triggered hang may be new. It would be good to root cause this. I recommend loading the kernel debugger and generating a crash dump. It would be pretty straightforward to verify whether this is the "sync taking a long time" failure or not. The output from ::threadlist -v would be telling. There have been posts earlier as to how to load the debugger and create a crash dump.

Best wishes

Steve /sG/

- "TianHong Zhao" wrote:

Thanks for the reply. This sounds like a serious issue if we have to reboot a machine in such a case; I am wondering if anybody is working on this. BTW, the zpool failmode is set to continue in my test case.

Tianhong Zhao
Re: [zfs-discuss] Deduplication Memory Requirements
On Wed, May 4, 2011 at 8:23 PM, Edward Ned Harvey wrote:
> Generally speaking, dedup doesn't work on VM images. (Same is true for ZFS
> or netapp or anything else.) Because the VM images are all going to have
> their own filesystems internally with whatever blocksize is relevant to the
> guest OS. If the virtual blocks in the VM don't align with the ZFS (or
> whatever FS) host blocks... Then even when you write duplicated data inside
> the guest, the host won't see it as a duplicated block.

A zvol with 4k blocks should give you decent results with Windows guests. Recent versions use 4k alignment by default and 4k blocks, so there should be lots of duplicates for a base OS image.

> There are some situations where dedup may help on VM images... For example
> if you're not using sparse files and you have a zero-filled disk... But in

compression=zle works even better for these cases, since it doesn't require DDT resources.

> Or if you're intimately familiar with both the guest & host filesystems, and
> you choose blocksizes carefully to make them align. But that seems
> complicated and likely to fail.

Using a 4k block size is a safe bet, since most OSs use a block size that is a multiple of 4k. It's the same reason that the new "Advanced Format" drives use 4k sectors.

Windows uses 4k alignment and 4k (or larger) clusters. ext3/ext4 uses 1k, 2k, or 4k blocks; filesystems over 512MB use 4k by default. The block alignment is determined by the partitioning, so some care needs to be taken there. zfs uses 'ashift' size blocks. I'm not sure what ashift works out to be when using a zvol though, so it could be as small as 512b but may be set to the same as the blocksize property. ufs is 4k or 8k on x86 and 8k on sun4u. As with ext4, block alignment is determined by partitioning and slices.

-B
--
Brandon High : bh...@freaks.com
[zfs-discuss] Permanently using hot spare?
Have a failed drive on a ZFS pool (three RAIDZ2 vdevs, one hot spare). The hot spare kicked in and all is well. Is it possible to just make that hot spare disk -- already resilvered into the pool -- a permanent part of the pool? We could then throw in a new disk and mark it as a spare and avoid what would seem to be an unnecessary resilver (twice, once when the spare is brought in and again when we replace the failed disk). This document[1] seems to make it sound like it can be done, but I'm not really seeing how... Can I "add" the spare disk to the pool when it's already in use? Probably not...

Note this is on Solaris 10 U9.

Thanks, Ray

[1] http://dlc.sun.com/osol/docs/content/ZFSADMIN/gayrd.html#gcvcw
Re: [zfs-discuss] Quick zfs send -i performance questions
On Thu, May 5, 2011 at 11:17 AM, Giovanni Tirloni wrote:
> What I find curious is that it only happens with incrementals. Full
> sends go as fast as possible (monitored with mbuffer). I was just wondering
> if other people have seen it, if there is a bug (b111 is quite old), etc.

I missed that you were using b111 earlier. That's probably a large part of the problem. There were a lot of performance and reliability improvements between b111 and b134, and there have been more between b134 and b148 (OI) or b151 (S11 Express). Updating the host you're receiving on to something more recent may fix the performance problem you're seeing.

Fragmentation shouldn't be too great of an issue if the pool you're writing to is relatively empty. There were changes made to zpool metaslab allocation post-b111 that might improve performance for pools between 70% and 96% full. This could also be why the full sends perform better than incremental sends.

-B
--
Brandon High : bh...@freaks.com
Re: [zfs-discuss] Quick zfs send -i performance questions
On Thu, May 5, 2011 at 2:17 PM, Giovanni Tirloni wrote:
> What I find curious is that it only happens with incrementals. Full
> sends go as fast as possible (monitored with mbuffer). I was just wondering
> if other people have seen it, if there is a bug (b111 is quite old), etc.

I have been using zfs send / recv via ssh and a WAN connection to replicate about 20 TB of data. One initial full followed by an incremental every 4 hours. This has been going on for over a year and I have not had any reliability issues. I started at Solaris 10U6, then 10U8, and now 10U9. I did run into a bug early on that if the ssh failed, then the zfs recv would hang, but that was fixed ages ago.

--
{1-2-3-4-5-6-7-}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
Re: [zfs-discuss] Quick zfs send -i performance questions
On Wed, May 4, 2011 at 9:04 PM, Brandon High wrote:
> On Wed, May 4, 2011 at 2:25 PM, Giovanni Tirloni wrote:
> > The problem we've started seeing is that a zfs send -i is taking hours to
> > send a very small amount of data (eg. 20GB in 6 hours) while a zfs send full
> > transfers everything faster than the incremental (40-70MB/s). Sometimes we
> > just give up on sending the incremental and send a full altogether.
>
> Does the send complete faster if you just pipe to /dev/null? I've
> observed that if recv stalls, it'll pause the send, and the two go
> back and forth stepping on each other's toes. Unfortunately, send and
> recv tend to pause with each individual snapshot they are working on.
>
> Putting something like mbuffer
> (http://www.maier-komor.de/mbuffer.html) in the middle can help smooth
> it out and speed things up tremendously. It prevents the send from
> pausing when the recv stalls, and allows the recv to continue working
> when the send is stalled. You will have to fiddle with the buffer size
> and other options to tune it for your use.

We've done various tests piping it to /dev/null and then transferring the files to the destination. What seems to stall is the recv, because it doesn't complete (through mbuffer, ssh, locally, etc). The zfs send always completes at the same rate.

Mbuffer is being used but doesn't seem to help. When things start to stall, the in / out buffers will quickly fill up and nothing will be sent. Probably because the mbuffer on the other side can't receive any more data until the zfs recv gives it some air to breathe.

What I find curious is that it only happens with incrementals. Full sends go as fast as possible (monitored with mbuffer). I was just wondering if other people have seen it, if there is a bug (b111 is quite old), etc.

--
Giovanni Tirloni
Re: [zfs-discuss] Deduplication Memory Requirements
We have customers using dedup with lots of vm images... in one extreme case they are getting dedup ratios of over 200:1! You don't need dedup or sparse files for zero filling. Simple zle compression will eliminate those for you far more efficiently and without needing massive amounts of ram. Our customers have the ability to access our systems engineers to design the solution for their needs. If you are serious about doing this stuff right, work with someone like Nexenta that can engineer a complete solution instead of trying to figure out which of us on this forum are quacks and which are cracks. :) Tim Cook wrote: >On Wed, May 4, 2011 at 10:23 PM, Edward Ned Harvey < >opensolarisisdeadlongliveopensola...@nedharvey.com> wrote: > >> > From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- >> > boun...@opensolaris.org] On Behalf Of Ray Van Dolson >> > >> > Are any of you out there using dedupe ZFS file systems to store VMware >> > VMDK (or any VM tech. really)? Curious what recordsize you use and >> > what your hardware specs / experiences have been. >> >> Generally speaking, dedup doesn't work on VM images. (Same is true for ZFS >> or netapp or anything else.) Because the VM images are all going to have >> their own filesystems internally with whatever blocksize is relevant to the >> guest OS. If the virtual blocks in the VM don't align with the ZFS (or >> whatever FS) host blocks... Then even when you write duplicated data >> inside >> the guest, the host won't see it as a duplicated block. >> >> There are some situations where dedup may help on VM images... For example >> if you're not using sparse files and you have a zero-filed disk... But in >> that case, you should probably just use a sparse file instead... Or ... >> If >> you have a "golden" image that you're copying all over the place ... but in >> that case, you should probably just use clones instead... 
>> >> Or if you're intimately familiar with both the guest & host filesystems, and >> you choose blocksizes carefully to make them align. But that seems >> complicated and likely to fail. >> >> >> >That's patently false. VM images are the absolute best use-case for dedup >outside of backup workloads. I'm not sure who told you/where you got the >idea that VM images are not ripe for dedup, but it's wrong. > >--Tim
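As a concrete illustration of Garrett's zle point above, turning on lightweight zero-run compression is a one-line property change. This is only a sketch; 'tank/vmimages' is a hypothetical dataset name, and compression applies only to data written after the property is set.

```shell
# zle compresses only runs of zeros, so it strips zero-filled regions of
# VM images cheaply, without dedup's DDT memory footprint.
zfs set compression=zle tank/vmimages

# Check how much it actually saved once new writes have landed:
zfs get compressratio tank/vmimages
```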
Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements
so there's an ARC entry referencing each individual DDT entry in the L2ARC?! I had made the assumption that DDT entries would be grouped into at least minimum block sized groups (8k?), which would have led to a much more reasonable ARC requirement. seems like a bad design to me, which leads to dedup only being usable by those prepared to spend a LOT of dosh... which may as well go into more storage (I know there are other benefits too, but that's my opinion) -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. Edward Ned Harvey wrote: > From: Erik Trimble [mailto:erik.trim...@oracle.com] > > Using the standard > c_max value of 80%, remember that this is 80% of the > TOTAL system RAM, > including that RAM normally dedicated to other > purposes. So long as the > total amount of RAM you expect to dedicate to > ARC usage (for all ZFS uses, > not just dedup) is less than 4 times that > of all other RAM consumption, you > don't need to "overprovision". Correct, usually you don't need to > overprovision for the sake of ensuring enough ram available for OS and > processes. But you do need to overprovision 25% if you want to increase the > size of your usable ARC without reducing the amount of ARC you currently have > in the system being used to cache other files etc. > Any > entry that is > migrated back from L2ARC into ARC is considered "stale" > data in the L2ARC, > and thus, is no longer tracked in the ARC's reference > table for L2ARC. Good > news. I didn't know that. I thought the L2ARC was still valid, even if > something was pulled back into ARC. So there are two useful models: (a) The upper bound: The whole DDT is in ARC, and the whole L2ARC is filled with average-size blocks. or (b) The lower bound: The whole DDT is in L2ARC, and all the rest of the L2ARC is filled with average-size blocks. ARC requirements are based only on L2ARC references. The actual usage will be something between (a) and (b)... 
And the actual is probably closer to (b) In my test system: (a) (upper bound) On my test system I guess the OS and processes consume 1G. (I'm making that up without any reason.) On my test system I guess I need 8G in the system to get reasonable performance without dedup or L2ARC. (Again, I'm just making that up.) I need 7G for DDT and I have 748982 average-size blocks in L2ARC, which means 131820832 bytes = 125M or 0.1G for L2ARC I really just need to plan for 7.1G ARC usage Multiply by 5/4 and it means I need 8.875G system ram My system needs to be built with at least 8G + 8.875G = 16.875G. (b) (lower bound) On my test system I guess the OS and processes consume 1G. (I'm making that up without any reason.) On my test system I guess I need 8G in the system to get reasonable performance without dedup or L2ARC. (Again, I'm just making that up.) I need 0G for DDT (because it's in L2ARC) and I need 3.4G ARC to hold all the L2ARC references, including the DDT in L2ARC So I really just need to plan for 3.4G ARC for my L2ARC references. Multiply by 5/4 and it means I need 4.25G system ram My system needs to be built with at least 8G + 4.25G = 12.25G. Thank you for your input, Erik. Previously I would have only been comfortable with 24G in this system, because I was calculating a need for significantly higher than 16G. But now, what we're calling the upper bound is just *slightly* higher than 16G, while the lower bound and most likely actual figure is significantly lower than 16G. So in this system, I would be comfortable running with 16G. But I would be even more comfortable running with 24G. ;-)
Re: [zfs-discuss] Deduplication Memory Requirements
I assume you're talking about a situation where there is an initial VM image, and then to clone the machine, the customers copy the VM, correct? If that is correct, have you considered ZFS cloning instead? When I said dedup wasn't good for VM's, what I'm talking about is: If there is data inside the VM which is cloned... For example if somebody logs into the guest OS and then does a "cp" operation... Then dedup of the host is unlikely to be able to recognize that data as cloned data inside the virtual disk. I have the same opinion. In talks with customers about the use of dedup and cloning, the answer is simple: when you know that duplicates will occur but don't know when, use dedup; when you know that duplicates will occur and that they are there from the beginning, use cloning. Thus VM images cry out for cloning. I'm not a fan of dedup for VMs. I heard the argument once: "but what about VM patching?". Aside from the problem of detecting the clones, I wouldn't patch each VM, but patch the master image and regenerate the clones, especially for a general patching session; just saving a gig because there is a patch on 2 or 3 of 100 servers isn't worth the effort of spending a lot of memory on dedup. For a simple reason: patching each VM on its own is likely to increase VM sprawl. So all I save is some iron, but I'm not simplifying administration. However, this needs good administrative processes. You can use dedup for VMs, but I'm not sure someone should ... > Is this a zfs discussion list, or a nexenta sales & promotion list? Well ... I have an opinion on how he sees that ... however it's just my own ;) -- ORACLE Joerg Moellenkamp | Sales Consultant Phone: +49 40 251523-460 | Mobile: +49 172 8318433 Oracle Hardware Presales - Nord ORACLE Deutschland B.V.& Co. KG | Nagelsweg 55 | 20097 Hamburg ORACLE Deutschland B.V.& Co. KG Hauptverwaltung: Riesstr. 
25, D-80992 München Registergericht: Amtsgericht München, HRA 95603 Komplementärin: ORACLE Deutschland Verwaltung B.V. Rijnzathe 6, 3454PV De Meern, Niederlande Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697 Geschäftsführer: Jürgen Kunz, Marcel van de Molen, Alexander van der Ven Oracle is committed to developing practices and products that help protect the environment
Re: [zfs-discuss] Deduplication Memory Requirements
On Thu, 2011-05-05 at 09:02 -0400, Edward Ned Harvey wrote: > > From: Garrett D'Amore [mailto:garr...@nexenta.com] > > > > We have customers using dedup with lots of vm images... in one extreme > > case they are getting dedup ratios of over 200:1! > > I assume you're talking about a situation where there is an initial VM image, > and then to clone the machine, the customers copy the VM, correct? > If that is correct, have you considered ZFS cloning instead? No. Obviously if you can clone, it's better. But sometimes you can't do this even with v12n, and we have this situation at customer sites today. (I have always said, zfs clone is far easier, far more proven, and far more efficient, *if* you can control the "ancestral" relationship to take advantage of the clone.) For example, one area where cloning can't help is with patches and updates. In some instances these can get quite large, and across 1000's of VMs the space required can be considerable. > > When I said dedup wasn't good for VM's, what I'm talking about is: If there > is data inside the VM which is cloned... For example if somebody logs into > the guest OS and then does a "cp" operation... Then dedup of the host is > unlikely to be able to recognize that data as cloned data inside the virtual > disk. I disagree. I believe that within the VMDKs data is aligned nicely, since these are disk images. At any rate, we are seeing real (and large) dedup ratios in the field when used with v12n. In fact, this is the killer app for dedup. > > > Our customers have the ability to access our systems engineers to design the > > solution for their needs. If you are serious about doing this stuff right, > > work > > with someone like Nexenta that can engineer a complete solution instead of > > trying to figure out which of us on this forum are quacks and which are > > cracks. :) > > Is this a zfs discussion list, or a nexenta sales & promotion list? My point here was that there is a lot of half baked advice being given... 
the idea that you should only use dedup if you have a bunch of zeros on your disk images, for example, is absolutely and totally nuts. It doesn't match real-world experience, and it doesn't match the theory either. And sometimes real-world experience trumps the theory. I've been shown on numerous occasions that ideas that I thought were half-baked turned out to be very effective in the field, and vice versa. (I'm a developer, not a systems engineer. Fortunately I have a very close working relationship with a couple of awesome systems engineers.) Folks come here looking for advice. I think the advice that if you're contemplating these kinds of solutions, you should get someone with real-world experience solving these kinds of problems every day, is very sound. Trying to pull out the truths from the myths I see stated here nearly every day is going to be difficult for the average reader here, I think. - Garrett
Re: [zfs-discuss] Deduplication Memory Requirements
Hi, On 05/ 5/11 03:02 PM, Edward Ned Harvey wrote: From: Garrett D'Amore [mailto:garr...@nexenta.com] We have customers using dedup with lots of vm images... in one extreme case they are getting dedup ratios of over 200:1! I assume you're talking about a situation where there is an initial VM image, and then to clone the machine, the customers copy the VM, correct? If that is correct, have you considered ZFS cloning instead? When I said dedup wasn't good for VM's, what I'm talking about is: If there is data inside the VM which is cloned... For example if somebody logs into the guest OS and then does a "cp" operation... Then dedup of the host is unlikely to be able to recognize that data as cloned data inside the virtual disk. ZFS cloning and ZFS dedup are solving two problems that are related, but different: - Through Cloning, a lot of space can be saved in situations where it is known beforehand that data is going to be used multiple times from multiple different "views". Virtualization is a perfect example of this. - Through Dedup, space can be saved in situations where the duplicate nature of data is not known, or not known beforehand. Again, in virtualization scenarios, this could be common modifications to VM images that are performed multiple times, but not anticipated, such as extra software, OS patches, or simply many users saving the same files to their local desktops. To go back to the "cp" example: If someone logs into a VM that is backed by ZFS with dedup enabled, then copies a file, the extra space that the file will take will be minimal. The act of copying the file will break down into a series of blocks that will be recognized as duplicate blocks. This is completely independent of the clone nature of the underlying VM's backing store. But I agree that the biggest savings are to be expected from cloning first, as they typically translate into n GB (for the base image) x # of users, which is a _lot_. 
Dedup is still the icing on the cake for all those data blocks that were unforeseen. And that can be a lot, too, as everyone who has seen cluttered desktops full of downloaded files can probably confirm. Cheers, Constantin -- Constantin Gonzalez Schmitz, Sales Consultant, Oracle Hardware Presales Germany Phone: +49 89 460 08 25 91 | Mobile: +49 172 834 90 30 Blog: http://constantin.glez.de/ | Twitter: zalez
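Constantin's "cp" point and Edward's alignment worry can both be illustrated with a toy experiment (plain Python, not ZFS itself): hash fixed-size blocks of the same data twice, once block-aligned and once shifted by 512 bytes, and count how many block hashes the two layouts share. The block size and payload here are arbitrary stand-ins, not ZFS internals.

```python
import hashlib
import os

BLOCK = 4096  # stand-in for the dataset recordsize/volblocksize

def block_hashes(data, blocksize=BLOCK):
    """Hash data in fixed, aligned blocks -- the granularity dedup sees."""
    return {hashlib.sha256(data[i:i + blocksize]).hexdigest()
            for i in range(0, len(data), blocksize)}

payload = os.urandom(64 * BLOCK)      # 256 KiB of "file" content
aligned = payload                     # a copy that lands block-aligned
shifted = b"\x00" * 512 + payload     # the same bytes, offset by half a sector

# An aligned copy dedupes completely against the original ...
print(len(block_hashes(aligned) & block_hashes(payload)))   # 64 shared blocks
# ... but shifting the identical data by 512 bytes shares nothing.
print(len(block_hashes(aligned) & block_hashes(shifted)))   # 0
```

This is why an in-guest copy of a file can dedupe well (the guest filesystem usually keeps it block-aligned within the image), while two different guests writing the same file at different offsets may not.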
Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements
> From: Erik Trimble [mailto:erik.trim...@oracle.com] > > Using the standard c_max value of 80%, remember that this is 80% of the > TOTAL system RAM, including that RAM normally dedicated to other > purposes. So long as the total amount of RAM you expect to dedicate to > ARC usage (for all ZFS uses, not just dedup) is less than 4 times that > of all other RAM consumption, you don't need to "overprovision". Correct, usually you don't need to overprovision for the sake of ensuring enough ram available for OS and processes. But you do need to overprovision 25% if you want to increase the size of your usable ARC without reducing the amount of ARC you currently have in the system being used to cache other files etc. > Any > entry that is migrated back from L2ARC into ARC is considered "stale" > data in the L2ARC, and thus, is no longer tracked in the ARC's reference > table for L2ARC. Good news. I didn't know that. I thought the L2ARC was still valid, even if something was pulled back into ARC. So there are two useful models: (a) The upper bound: The whole DDT is in ARC, and the whole L2ARC is filled with average-size blocks. or (b) The lower bound: The whole DDT is in L2ARC, and all the rest of the L2ARC is filled with average-size blocks. ARC requirements are based only on L2ARC references. The actual usage will be something between (a) and (b)... And the actual is probably closer to (b) In my test system: (a) (upper bound) On my test system I guess the OS and processes consume 1G. (I'm making that up without any reason.) On my test system I guess I need 8G in the system to get reasonable performance without dedup or L2ARC. (Again, I'm just making that up.) I need 7G for DDT and I have 748982 average-size blocks in L2ARC, which means 131820832 bytes = 125M or 0.1G for L2ARC I really just need to plan for 7.1G ARC usage Multiply by 5/4 and it means I need 8.875G system ram My system needs to be built with at least 8G + 8.875G = 16.875G. 
(b) (lower bound) On my test system I guess the OS and processes consume 1G. (I'm making that up without any reason.) On my test system I guess I need 8G in the system to get reasonable performance without dedup or L2ARC. (Again, I'm just making that up.) I need 0G for DDT (because it's in L2ARC) and I need 3.4G ARC to hold all the L2ARC references, including the DDT in L2ARC So I really just need to plan for 3.4G ARC for my L2ARC references. Multiply by 5/4 and it means I need 4.25G system ram My system needs to be built with at least 8G + 4.25G = 12.25G. Thank you for your input, Erik. Previously I would have only been comfortable with 24G in this system, because I was calculating a need for significantly higher than 16G. But now, what we're calling the upper bound is just *slightly* higher than 16G, while the lower bound and most likely actual figure is significantly lower than 16G. So in this system, I would be comfortable running with 16G. But I would be even more comfortable running with 24G. ;-)
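Edward's two bounds reduce to one small formula: baseline system RAM plus the extra ARC need, over-provisioned by the 5/4 factor he describes. A sketch using his admittedly made-up numbers (8G baseline, 7G DDT, 0.1G and 3.4G of L2ARC references):

```python
def ram_needed(base_gb, extra_arc_gb, factor=5/4):
    """Total system RAM: baseline workload RAM plus the extra ARC need,
    over-provisioned by 25% so the existing ARC cache isn't displaced."""
    return base_gb + extra_arc_gb * factor

# (a) upper bound: whole 7G DDT in ARC + 0.1G of refs for L2ARC data blocks
print(round(ram_needed(8.0, 7.0 + 0.1), 3))   # 16.875
# (b) lower bound: DDT lives in L2ARC; ARC holds only 3.4G of L2ARC refs
print(round(ram_needed(8.0, 3.4), 3))         # 12.25
```

Both results match the 16.875G and 12.25G figures worked out above.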
Re: [zfs-discuss] Deduplication Memory Requirements
> From: Garrett D'Amore [mailto:garr...@nexenta.com] > > We have customers using dedup with lots of vm images... in one extreme > case they are getting dedup ratios of over 200:1! I assume you're talking about a situation where there is an initial VM image, and then to clone the machine, the customers copy the VM, correct? If that is correct, have you considered ZFS cloning instead? When I said dedup wasn't good for VM's, what I'm talking about is: If there is data inside the VM which is cloned... For example if somebody logs into the guest OS and then does a "cp" operation... Then dedup of the host is unlikely to be able to recognize that data as cloned data inside the virtual disk. > Our customers have the ability to access our systems engineers to design the > solution for their needs. If you are serious about doing this stuff right, > work > with someone like Nexenta that can engineer a complete solution instead of > trying to figure out which of us on this forum are quacks and which are > cracks. :) Is this a zfs discussion list, or a nexenta sales & promotion list?
Re: [zfs-discuss] Faster copy from UFS to ZFS
Ian Collins wrote: > >> *ufsrestore works fine on ZFS filesystems (although I haven't tried it > >> with any POSIX ACLs on the original ufs filesystem, which would probably > >> simply get lost). > > star -copy -no-fsync is typically 30% faster than ufsdump | ufsrestore. > > > Does it preserve ACLs? Star supports ACLs from the withdrawn POSIX draft. Star could already support ZFS ACLs, had Sun offered a correctly working ACL support library when they introduced ZFS ACLs. Unfortunately it took some time until this lib was fixed, and since then I have had other projects that took my time. ZFS ACLs are not forgotten, however. Jörg -- EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin j...@cs.tu-berlin.de(uni) joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
Re: [zfs-discuss] Faster copy from UFS to ZFS
Erik Trimble wrote: > rsync is indeed slower than star; so far as I can tell, this is due > almost exclusively to the fact that rsync needs to build an in-memory > table of all work being done *before* it starts to copy. After that, it > copies at about the same rate as star (my observations). I'd have to > look at the code, but rsync appears to internally buffer a significant > amount (due to its expected network use pattern), which helps for ZFS > copying. The one thing I'm not sure of is whether rsync uses a socket, > pipe, or semaphore method when doing same-host copying. I presume socket > (which would slightly slow it down vs star). The reason why star is faster than any other copy method is based on the fact that star is not implemented like historical tar or cpio implementations. Since around 1990, star forks into two processes unless you forbid this by an option. In the normal modes, one of them is the "archive process" that just reads or writes from/to the archive file or tape; the other is the tar process that understands the archive content and deals with the filesystem (the direction of the filesystem operation depends on whether it is in extract or create mode). Between both processes, there is a large FIFO of shared memory that is used to share the data. If the FIFO has much free space, star will read files into the FIFO in one single chunk; this is another reason for its speed. Another advantage in star is that it reads every directory in one large chunk and thus allows the OS to optimize at this point. BTW: An OS that floods (and probably overflows) the stat/vnode cache in such a case may cause an unneeded slowdown. In copy mode, star starts two archive processes and a FIFO between them. The create process tries to keep the FIFO as full as possible, and as it makes sense to use a FIFO size of up to approx. 
half of the real system memory, this FIFO may be really huge, so it will even be able to keep modern tapes streaming for at least 30-60 seconds. Ufsdump only allows a small number of 126 kB buffers (I believe it is 6 buffers), so ufsdump | ufsrestore is tightly coupled, while star allows creation and extraction of the internal virtual archive to run nearly independently of each other. This way, star does not need to wait every time extraction slows down, but just fills the FIFO instead. Before SEEK_HOLE/SEEK_DATA existed, the only place where ufsdump was faster than star was sparse files. This is why I talked with Jeff Bonwick in September 2004 to find a useful interface for user space programs (in particular star) that do not read the filesystem at block level (like ufsdump) but cleanly in the documented POSIX way. Since SEEK_HOLE/SEEK_DATA have been introduced, there is no known case where star is not at least 30% faster than ufsdump. BTW: ufsdump is another implementation that first sits and collects all filenames before it starts to read file content. > That said, rsync is really the only solution if you have a partial or > interrupted copy. It's also really the best method to do verification. Star offers another method to continue interrupted extracts or copies: Star sets the time stamp of an incomplete file to 0 (1.1.1970 GMT). As star does not overwrite files that are not newer in the archive, star can skip the other files in extract mode and continue with the missing files or with the file(s) that have the time stamp 0. Jörg -- EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin j...@cs.tu-berlin.de(uni) joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
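The two-process-plus-FIFO design Jörg describes can be sketched in miniature. The toy below uses threads and a bounded queue standing in for star's forked processes and shared-memory FIFO (none of this is star's actual code): the reader keeps the FIFO as full as it can, while the writer drains it at its own pace, so a briefly stalled writer does not immediately stall the reader.

```python
import queue
import threading

def copy_with_fifo(blocks, fifo_size=8):
    """Copy a list of data blocks through a bounded FIFO, star-style."""
    fifo = queue.Queue(maxsize=fifo_size)
    out = []

    def reader():
        for b in blocks:
            fifo.put(b)      # blocks only when the FIFO is completely full
        fifo.put(None)       # end-of-archive marker

    def writer():
        while (b := fifo.get()) is not None:
            out.append(b)    # a slow writer just lets the FIFO fill up

    t1 = threading.Thread(target=reader)
    t2 = threading.Thread(target=writer)
    t1.start(); t2.start(); t1.join(); t2.join()
    return out

data = [bytes([i]) * 1024 for i in range(64)]
assert copy_with_fifo(data) == data
print("copied", len(data), "blocks through the FIFO")
```

A real implementation sizes the buffer in the hundreds of megabytes, as described above, so the tape side can keep streaming for tens of seconds while the filesystem side catches up.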