[zfs-discuss] Re: Motley group of discs?
That's a lot of talking without an answer :)

> internal EIDE 320GB (boot drive), internal 250, 200 and 160 GB drives,
> and an external USB 2.0 600 GB drive.
> So, what's the best zfs configuration in this situation?

RAIDZ uses disk space like RAID-5, so the best you could do here for redundant space is a raidz where each drive contributes only as much as the smallest (160 GB) member: (160 * 4 or 5) - 160. The space left over on the larger drives can then be used non-redundantly or mirrored.

If you want to play with OpenSolaris and ZFS, you can do so easily in a VMware or Parallels virtual machine. It sounds like that is all you want to do right now.
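To make the capacity math concrete, here is a minimal sketch assuming standard zpool syntax and purely hypothetical device names; it is not tuned to the OP's actual controllers:

  # Hypothetical: a raidz vdev is limited by its smallest member, so five
  # drives of 600, 320, 250, 200 and 160 GB behave like five 160 GB drives.
  zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c2t0d0

  # Usable redundant space is roughly (drives - 1) * smallest drive:
  # (5 - 1) * 160 GB = ~640 GB; capacity above 160 GB per drive goes unused here.
  zpool list tank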
[zfs-discuss] Re: tape-backup software (was: Very Large Filesystems)
The software that we use for our production backups is compatible with ZFS. I cannot comment on its stability with ZFS or its ability to handle multi-TB filesystems, as we do not have any ZFS systems in production yet. I can say that overall, the software is solid, stable and very well documented.

Check out the vendor's website: http://www.commvault.com
Re: [zfs-discuss] Motley group of discs?
Al Hopper wrote:
> On Fri, 4 May 2007, mike wrote:
>> Isn't the benefit of ZFS that it will allow you to use even the most
>> unreliable disks and be able to inform you when they are attempting to
>> corrupt your data?
>
> Yes - I won't argue that ZFS can be applied exactly as you state above.
> However, ZFS is no cure for bad practices that include:
> - not proactively replacing mechanical components *before* they fail

There's a nice side benefit from this one: the piece of hardware you retire becomes a backup of "old data".

When I ran lots of older SPARC boxes, I made a point of upgrading the disks, from 1GB to 4GB to 9GB... It wasn't for disk space but to put in place newer, quieter, faster, less power-hungry drives, and it had the added benefit of ensuring that in 2004 the SCA SCSI drive in the SPARC 5 had been made maybe 1 or 2 years ago, not 10, and was thus also less likely to fail.

I still try to do this with PC hard drives today, but sometimes they fail inside my replacement window :-(

Darren
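As a rough illustration of how that proactive-replacement habit maps onto ZFS (hypothetical pool and device names; syntax as in current OpenSolaris builds), swapping a drive out before it dies is a one-liner plus a resilver:

  # Replace an ageing mirror member with a new drive and watch the resilver
  zpool replace tank c1t2d0 c1t5d0
  zpool status tank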
Re: [zfs-discuss] recovered state after system crash
kyusun Chang wrote On 05/04/07 19:34,:
> If system crashes some time after last commit of transaction group (TxG),
> what happens to the file system transactions since the last commit of TxG

They are lost, unless they were synchronous (see below).

> (I presume last commit of TxG represents the last on-disk consistency)?

Correct.

> Does ZFS recover all file system transactions which it returned with
> success since the last commit of TxG, which implies that the ZIL must flush
> log records for each successful file system transaction before it returns
> to the caller so that it can replay the filesystem transactions?

Only synchronous transactions (those forced by O_DSYNC or fsync()) are written to the intent log.

> Blogs on the ZIL state (I hope I read it right) that log records are
> maintained in-memory and flushed to disk only when 1) at synchronous
> write request (does that mean they free the in-memory log after that),

Yes, they are then freed in memory.

> 2) when TxG is committed (and free in-memory log).
>
> Thank you for your time.
[zfs-discuss] recovered state after system crash
If the system crashes some time after the last commit of a transaction group (TxG), what happens to the file system transactions since the last commit of the TxG (I presume the last commit of a TxG represents the last on-disk consistency)?

Does ZFS recover all file system transactions which it returned with success since the last commit of the TxG, which implies that the ZIL must flush log records for each successful file system transaction before it returns to the caller, so that it can replay the filesystem transactions?

Blogs on the ZIL state (I hope I read it right) that log records are maintained in-memory and flushed to disk only 1) at a synchronous write request (does that mean they free the in-memory log after that), or 2) when the TxG is committed (and the in-memory log is freed).

Thank you for your time.
Re: [zfs-discuss] Motley group of discs?
Lee Fyock wrote:
> I didn't mean to kick up a fuss.
>
> I'm reasonably zfs-savvy in that I've been reading about it for a year
> or more. I'm a Mac developer and general geek; I'm excited about zfs
> because it's new and cool.
>
> At some point I'll replace my old desktop machine with something new
> and better -- probably when Unreal Tournament 2007 arrives,
> necessitating a faster processor and better graphics card. :-)
>
> In the mean time, I'd like to hang out with the system and drives I
> have. As "mike" said, my understanding is that zfs would provide error
> correction until a disc fails, if the setup is properly done. That's
> the setup for which I'm requesting a recommendation.
>
> I won't even be able to use zfs until Leopard arrives in October, but
> I want to bone up so I'll be ready when it does.
>
> Money isn't an issue here, but neither is creating an optimal zfs
> system. I'm curious what the right zfs configuration is for the system
> I have.

Given the odd sizes of your drives, there might not be one, unless you are willing to sacrifice capacity.

Ian
Re: [zfs-discuss] Motley group of discs?
I didn't mean to kick up a fuss.

I'm reasonably zfs-savvy in that I've been reading about it for a year or more. I'm a Mac developer and general geek; I'm excited about zfs because it's new and cool.

At some point I'll replace my old desktop machine with something new and better -- probably when Unreal Tournament 2007 arrives, necessitating a faster processor and better graphics card. :-)

In the mean time, I'd like to hang out with the system and drives I have. As "mike" said, my understanding is that zfs would provide error correction until a disc fails, if the setup is properly done. That's the setup for which I'm requesting a recommendation.

I won't even be able to use zfs until Leopard arrives in October, but I want to bone up so I'll be ready when it does.

Money isn't an issue here, but neither is creating an optimal zfs system. I'm curious what the right zfs configuration is for the system I have.

Thanks!
Lee

On May 4, 2007, at 7:41 PM, Al Hopper wrote:

On Fri, 4 May 2007, mike wrote:

Isn't the benefit of ZFS that it will allow you to use even the most unreliable disks and be able to inform you when they are attempting to corrupt your data?

Yes - I won't argue that ZFS can be applied exactly as you state above. However, ZFS is no cure for bad practices that include:
- not proactively replacing mechanical components *before* they fail
- not having maintenance policies in place

To me it sounds like he is a SOHO user; may not have a lot of funds to go out and swap hardware on a whim like a company might.

You may be right - but you're simply guessing. The original system probably cost around $3k (?? I could be wrong). So what I'm suggesting, that he spend ~ $300, represents ~ 10% of the original system cost. Since the OP asked for advice, I've given him the best advice I can come up with.

I've also encountered many users who don't keep up to date with current computer hardware capabilities and pricing, and who may be completely unaware that you can purchase two 500GB disk drives, with a 5 year warranty, for around $300. And possibly less if you check out Frys weekly bargain disk drive offers.

Now consider the total cost of ownership of the solution I recommended: 500 gigabytes of storage, coupled with ZFS, which translates into $60/year for 5 years of error-free storage capability. Can life get any better than this! :)

Now contrast my recommendation with what you propose - re-targeting a bunch of older disk drives, which incorporate older, less reliable technology, with a view to saving money. How much is your time worth? How many hours will it take you to recover from a failure of one of these older drives and the accompanying increased risk of data loss?

If the ZFS-savvy OP comes back to this list and says "Al's solution is too expensive" I'm perfectly willing to rethink my recommendation. For now, I believe it to be the best recommendation I can devise.

ZFS in my opinion is well-suited for those without access to continuously upgraded hardware and expensive fault-tolerant hardware-based solutions. It is ideal for home installations where people think their data is safe until the disk completely dies. I don't know how many non-savvy people I have helped over the years who have no data protection, and ZFS could offer them at least some fault-tolerance and protection against corruption, and could help notify them when it is time to shut off their computer and call someone to come swap out their disk and move their data to a fresh drive before it's completely failed...

Agreed.

One piece-of-the-puzzle that's missing right now IMHO, is a reliable, two-port, low-cost PCI SATA disk controller. A solid/de-bugged 3124 driver would go a long way to ZFS-enabling a bunch of cost-constrained ZFS users.

And, while I'm working this hardware wish list, please ... a PCI-Express based version of the SuperMicro AOC-SAT2-MV8 8-port Marvell based disk controller card. Sun ... are you listening?

- mike

On 5/4/07, Al Hopper <[EMAIL PROTECTED]> wrote:

On Fri, 4 May 2007, Lee Fyock wrote:

Hi--

I'm looking forward to using zfs on my Mac at some point. My desktop server (a dual-1.25GHz G4) has a motley collection of discs that has accreted over the years: internal EIDE 320GB (boot drive), internal 250, 200 and 160 GB drives, and an external USB 2.0 600 GB drive.

My guess is that I won't be able to use zfs on the boot 320 GB drive, at least this year. I'd like to favor available space over performance, and be able to swap out a failed drive without losing any data.

So, what's the best zfs configuration in this situation? The FAQs I've read are usually related to matched (in size) drives.

Seriously, the best solution here is to discard any drive that is 3 years (or more) old[1] and purchase two new SATA 500GB drives. Set up the new drives as a zfs mirror. Being a believer in diversity, I'd recommend the fol
Re: [zfs-discuss] Motley group of discs?
On 5/4/07, Al Hopper <[EMAIL PROTECTED]> wrote:
> Yes - I won't argue that ZFS can be applied exactly as you state above.
> However, ZFS is no cure for bad practices that include:
> - not proactively replacing mechanical components *before* they fail
> - not having maintenance policies in place

I mainly was speaking on behalf of the home users. If any data is important you obviously get what you pay for. However, I think ZFS can help improve integrity - perhaps you don't know the disk is starting to fail until it has corrupted some data. If ZFS was in place, some if not all of the data would still have been safe.

I replace my disks when they start to get corrupt, and I am still always nervous and have high-stress data moves off failing disks to the new ones/temporary storage.

ZFS in my opinion is a proactive way to minimize data loss. It's obviously not an excuse to let your hardware rot for years.

> And, while I'm working this hardware wish list, please ... a PCI-Express
> based version of the SuperMicro AOC-SAT2-MV8 8-port Marvell based disk
> controller card. Sun ... are you listening?

Yeah - I've got a wishlist too; port-multiplier friendly PCI-e adapters... Marvell or SI or anything, as long as it's PCI-e and has 4 or 5 eSATA ports that can work with a port multiplier (for 4-5 disks per port) ... I don't think there is a clear, fully supported option yet or I'd be using it right now.

- mike
Re: [zfs-discuss] Motley group of discs?
On Fri, 4 May 2007, mike wrote:

> Isn't the benefit of ZFS that it will allow you to use even the most
> unreliable disks and be able to inform you when they are attempting to
> corrupt your data?

Yes - I won't argue that ZFS can be applied exactly as you state above. However, ZFS is no cure for bad practices that include:

- not proactively replacing mechanical components *before* they fail
- not having maintenance policies in place

> To me it sounds like he is a SOHO user; may not have a lot of funds to
> go out and swap hardware on a whim like a company might.

You may be right - but you're simply guessing. The original system probably cost around $3k (?? I could be wrong). So what I'm suggesting, that he spend ~ $300, represents ~ 10% of the original system cost. Since the OP asked for advice, I've given him the best advice I can come up with.

I've also encountered many users who don't keep up to date with current computer hardware capabilities and pricing, and who may be completely unaware that you can purchase two 500GB disk drives, with a 5 year warranty, for around $300. And possibly less if you check out Frys weekly bargain disk drive offers.

Now consider the total cost of ownership of the solution I recommended: 500 gigabytes of storage, coupled with ZFS, which translates into $60/year for 5 years of error-free storage capability. Can life get any better than this! :)

Now contrast my recommendation with what you propose - re-targeting a bunch of older disk drives, which incorporate older, less reliable technology, with a view to saving money. How much is your time worth? How many hours will it take you to recover from a failure of one of these older drives and the accompanying increased risk of data loss?

If the ZFS-savvy OP comes back to this list and says "Al's solution is too expensive" I'm perfectly willing to rethink my recommendation. For now, I believe it to be the best recommendation I can devise.

> ZFS in my opinion is well-suited for those without access to
> continuously upgraded hardware and expensive fault-tolerant
> hardware-based solutions. It is ideal for home installations where
> people think their data is safe until the disk completely dies. I
> don't know how many non-savvy people I have helped over the years who
> have no data protection, and ZFS could offer them at least some
> fault-tolerance and protection against corruption, and could help
> notify them when it is time to shut off their computer and call
> someone to come swap out their disk and move their data to a fresh
> drive before it's completely failed...

Agreed.

One piece-of-the-puzzle that's missing right now IMHO, is a reliable, two-port, low-cost PCI SATA disk controller. A solid/de-bugged 3124 driver would go a long way to ZFS-enabling a bunch of cost-constrained ZFS users.

And, while I'm working this hardware wish list, please ... a PCI-Express based version of the SuperMicro AOC-SAT2-MV8 8-port Marvell based disk controller card. Sun ... are you listening?

> - mike
>
> On 5/4/07, Al Hopper <[EMAIL PROTECTED]> wrote:
> > On Fri, 4 May 2007, Lee Fyock wrote:
> >
> > > Hi--
> > >
> > > I'm looking forward to using zfs on my Mac at some point. My desktop
> > > server (a dual-1.25GHz G4) has a motley collection of discs that has
> > > accreted over the years: internal EIDE 320GB (boot drive), internal
> > > 250, 200 and 160 GB drives, and an external USB 2.0 600 GB drive.
> > >
> > > My guess is that I won't be able to use zfs on the boot 320 GB drive,
> > > at least this year. I'd like to favor available space over
> > > performance, and be able to swap out a failed drive without losing
> > > any data.
> > >
> > > So, what's the best zfs configuration in this situation? The FAQs
> > > I've read are usually related to matched (in size) drives.
> >
> > Seriously, the best solution here is to discard any drive that is 3 years
> > (or more) old[1] and purchase two new SATA 500GB drives. Set up the new
> > drives as a zfs mirror. Being a believer in diversity, I'd recommend the
> > following two products (one of each):
> >
> > - Western Digital Caviar RE2 WD5000YS 500GB 7200 RPM 16MB Cache SATA
> >   3.0Gb/s Hard Drive [2]
> > - Seagate Barracuda 7200.10 (Perpendicular Recording) ST3500630AS 500GB
> >   7200 RPM 16MB Cache SATA 3.0Gb/s Hard Drive
> >
> > Not being familiar with Macs - I'm not sure about your availability of
> > SATA ports on the motherboard.
> >
> > [1] it continues to amaze me that many sites, large or small, don't have a
> > (written) policy for mechanical component replacement - whether disk
> > drives or fans.
> > [2] $151 at zipzoomfly.com
> > [3] $130 at newegg.com
> >
> > Regards,
> >
> > Al Hopper Logical Approach Inc, Plano, TX. [EMAIL PROTECTED]
> > Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT
> > OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
> > http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
Re: [zfs-discuss] Motley group of discs?
mike wrote:
> Isn't the benefit of ZFS that it will allow you to use even the most
> unreliable disks and be able to inform you when they are attempting to
> corrupt your data?
>
> To me it sounds like he is a SOHO user; may not have a lot of funds to
> go out and swap hardware on a whim like a company might.

There's a limit to how much even ZFS can do with bad disks. Sure, it can manage a failing mirror better than SVM or low-end hardware RAID, but given the motley collection of drives in the OP's system, there aren't that many options.

Given the silly prices of new drives (320GB are about the best $/GB), replacement is the best long-term option. Otherwise, mirroring the two largest drives and discarding the smaller ones might be a good option.

Ian
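To picture the mirror option Ian describes, here is a minimal sketch, assuming standard zpool syntax and hypothetical device names for the two largest drives; a mirror's usable size is that of its smaller member:

  # Mirror the two largest drives; ZFS only uses as much of the bigger
  # drive as the smaller one can match.
  zpool create tank mirror c1t0d0 c2t0d0
  zpool list tank    # reported capacity is roughly the smaller drive's size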
Re: [zfs-discuss] Motley group of discs?
Isn't the benefit of ZFS that it will allow you to use even the most unreliable disks and be able to inform you when they are attempting to corrupt your data?

To me it sounds like he is a SOHO user; he may not have a lot of funds to go out and swap hardware on a whim like a company might.

ZFS in my opinion is well-suited for those without access to continuously upgraded hardware and expensive fault-tolerant hardware-based solutions. It is ideal for home installations where people think their data is safe until the disk completely dies. I don't know how many non-savvy people I have helped over the years who have no data protection, and ZFS could offer them at least some fault-tolerance and protection against corruption, and could help notify them when it is time to shut off their computer and call someone to come swap out their disk and move their data to a fresh drive before it's completely failed...

- mike

On 5/4/07, Al Hopper <[EMAIL PROTECTED]> wrote:

On Fri, 4 May 2007, Lee Fyock wrote:

> Hi--
>
> I'm looking forward to using zfs on my Mac at some point. My desktop
> server (a dual-1.25GHz G4) has a motley collection of discs that has
> accreted over the years: internal EIDE 320GB (boot drive), internal
> 250, 200 and 160 GB drives, and an external USB 2.0 600 GB drive.
>
> My guess is that I won't be able to use zfs on the boot 320 GB drive,
> at least this year. I'd like to favor available space over
> performance, and be able to swap out a failed drive without losing
> any data.
>
> So, what's the best zfs configuration in this situation? The FAQs
> I've read are usually related to matched (in size) drives.

Seriously, the best solution here is to discard any drive that is 3 years (or more) old[1] and purchase two new SATA 500GB drives. Set up the new drives as a zfs mirror. Being a believer in diversity, I'd recommend the following two products (one of each):

- Western Digital Caviar RE2 WD5000YS 500GB 7200 RPM 16MB Cache SATA 3.0Gb/s Hard Drive [2]
- Seagate Barracuda 7200.10 (Perpendicular Recording) ST3500630AS 500GB 7200 RPM 16MB Cache SATA 3.0Gb/s Hard Drive

Not being familiar with Macs - I'm not sure about your availability of SATA ports on the motherboard.

[1] it continues to amaze me that many sites, large or small, don't have a (written) policy for mechanical component replacement - whether disk drives or fans.
[2] $151 at zipzoomfly.com
[3] $130 at newegg.com

Regards,

Al Hopper Logical Approach Inc, Plano, TX. [EMAIL PROTECTED]
Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
Re: [zfs-discuss] Motley group of discs?
On 4-May-07, at 6:53 PM, Al Hopper wrote:
> ...
> [1] it continues to amaze me that many sites, large or small, don't have a
> (written) policy for mechanical component replacement - whether disk
> drives or fans.

You're not the only one. In fact, while I'm not exactly talking "enterprise" level here - more usually "IT", and we know what that means - I've seen many RAID systems purchased and set up without any spare disks on hand, or any thought given to what happens next when one fails.

Likely this is a combination of low expectations of the computing services department (you can usually blame Windows and everyone will believe it) and a lack of feedback ("you're fired") when massive data loss occurs.

--Toby
Re: [zfs-discuss] Motley group of discs?
On Fri, 4 May 2007, Lee Fyock wrote:

> Hi--
>
> I'm looking forward to using zfs on my Mac at some point. My desktop
> server (a dual-1.25GHz G4) has a motley collection of discs that has
> accreted over the years: internal EIDE 320GB (boot drive), internal
> 250, 200 and 160 GB drives, and an external USB 2.0 600 GB drive.
>
> My guess is that I won't be able to use zfs on the boot 320 GB drive,
> at least this year. I'd like to favor available space over
> performance, and be able to swap out a failed drive without losing
> any data.
>
> So, what's the best zfs configuration in this situation? The FAQs
> I've read are usually related to matched (in size) drives.

Seriously, the best solution here is to discard any drive that is 3 years (or more) old[1] and purchase two new SATA 500GB drives. Set up the new drives as a zfs mirror. Being a believer in diversity, I'd recommend the following two products (one of each):

- Western Digital Caviar RE2 WD5000YS 500GB 7200 RPM 16MB Cache SATA 3.0Gb/s Hard Drive [2]
- Seagate Barracuda 7200.10 (Perpendicular Recording) ST3500630AS 500GB 7200 RPM 16MB Cache SATA 3.0Gb/s Hard Drive

Not being familiar with Macs - I'm not sure about your availability of SATA ports on the motherboard.

[1] it continues to amaze me that many sites, large or small, don't have a (written) policy for mechanical component replacement - whether disk drives or fans.
[2] $151 at zipzoomfly.com
[3] $130 at newegg.com

Regards,

Al Hopper Logical Approach Inc, Plano, TX. [EMAIL PROTECTED]
Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
[zfs-discuss] thoughts on ZFS copies
I've put together some thoughts on the ZFS copies property.
http://blogs.sun.com/relling/entry/zfs_copies_and_data_protection

I hope that you might find this useful. I tried to use simplified drawings to illustrate the important points. Feedback appreciated.

There is more work to be done to understand all of the implications presented by the copies feature, so if you find something confusing or have questions, then please speak up and I'll add it to the list of things to do. I've already got performance characterization and modeling on the list :-)

-- richard
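For readers who haven't tried the property yet, here is a minimal sketch with a hypothetical dataset name; as with compression, the setting only affects data written after the property is changed:

  # Keep two copies of every block of this dataset, even on a single-disk pool
  zfs set copies=2 tank/home
  zfs get copies tank/home

  # Space accounting reflects the extra copies, so expect "used" to grow faster
  zfs list tank/home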
Re: [zfs-discuss] ARC, mmap, pagecache...
On 5/4/07, Roch - PAE <[EMAIL PROTECTED]> wrote:
> Manoj Joseph writes:
> > Hi,
> >
> > I was wondering about the ARC and its interaction with the VM
> > pagecache... When a file on a ZFS filesystem is mmaped, does the ARC
> > cache get mapped to the process' virtual memory? Or is there another copy?
>
> My understanding is,
>
> The ARC does not get mapped to user space. The data ends up in the ARC
> (recordsize chunks) and in the page cache (in page chunks). Both copies
> are updated on writes.

If that is the case, are there any plans to unify the ARC and the page cache?

Thanks,
- Ryan
--
UNIX Administrator
http://prefetch.net
Re: [zfs-discuss] ARC, mmap, pagecache...
Manoj Joseph writes:
> Hi,
>
> I was wondering about the ARC and its interaction with the VM
> pagecache... When a file on a ZFS filesystem is mmaped, does the ARC
> cache get mapped to the process' virtual memory? Or is there another copy?

My understanding is,

The ARC does not get mapped to user space. The data ends up in the ARC (recordsize chunks) and in the page cache (in page chunks). Both copies are updated on writes.

-r

> -Manoj
[zfs-discuss] Motley group of discs?
Hi--

I'm looking forward to using zfs on my Mac at some point. My desktop server (a dual-1.25GHz G4) has a motley collection of discs that has accreted over the years: internal EIDE 320GB (boot drive), internal 250, 200 and 160 GB drives, and an external USB 2.0 600 GB drive.

My guess is that I won't be able to use zfs on the boot 320 GB drive, at least this year. I'd like to favor available space over performance, and be able to swap out a failed drive without losing any data.

So, what's the best zfs configuration in this situation? The FAQs I've read are usually related to matched (in size) drives.

Thanks!
Lee
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
> A couple more questions here. ...
> You still have idle time in this lockstat (and mpstat).
>
> What do you get for a lockstat -A -D 20 sleep 30?
>
> Do you see anyone with long lock hold times, long sleeps, or excessive
> spinning?

Hmm, I ran a series of "lockstat -A -l ph_mutex -s 16 -D 20 sleep 5" commands while writing to the gzip compressed zpool, and noticed these high mutex block times:

Adaptive mutex block: 8 events in 5.100 seconds (2 events/sec)

---
Count indv cuml rcnt      nsec Lock              Caller
    5  62%  62% 0.00 317300109 ph_mutex+0x1380   page_create_va+0x334

      nsec -- Time Distribution -- count Stack
 536870912 |@@                        5  segkmem_page_create+0x89
                                         segkmem_xalloc+0xbc
                                         segkmem_alloc_vn+0xcd
                                         segkmem_alloc+0x20
                                         vmem_xalloc+0x4fc
                                         vmem_alloc+0x159
                                         kmem_alloc+0x4f
                                         kobj_alloc+0x7e
                                         kobj_zalloc+0x1c
                                         zcalloc+0x2d
                                         z_deflateInit2_+0x1b8
                                         z_deflateInit_+0x32
                                         z_compress_level+0x77
                                         gzip_compress+0x4b
                                         zio_compress_data+0xbc
---
Count indv cuml rcnt      nsec Lock              Caller
    1  12%  75% 0.00 260247717 ph_mutex+0x1a40   page_create_va+0x334

      nsec -- Time Distribution -- count Stack
 268435456 |@@                        1  segkmem_page_create+0x89
                                         segkmem_xalloc+0xbc
                                         segkmem_alloc_vn+0xcd
                                         segkmem_alloc+0x20
                                         vmem_xalloc+0x4fc
                                         vmem_alloc+0x159
                                         kmem_alloc+0x4f
                                         kobj_alloc+0x7e
                                         kobj_zalloc+0x1c
                                         zcalloc+0x2d
                                         z_deflateInit2_+0x1de
                                         z_deflateInit_+0x32
                                         z_compress_level+0x77
                                         gzip_compress+0x4b
                                         zio_compress_data+0xbc
---
Count indv cuml rcnt      nsec Lock              Caller
    1  12%  88% 0.00 348135263 ph_mutex+0x1380   page_create_va+0x334

      nsec -- Time Distribution -- count Stack
 536870912 |@@                        1  segkmem_page_create+0x89
                                         segkmem_xalloc+0xbc
                                         segkmem_alloc_vn+0xcd
                                         segkmem_alloc+0x20
                                         vmem_xalloc+0x4fc
                                         vmem_alloc+0x159
                                         kmem_alloc+0x4f
                                         kobj_alloc+0x7e
                                         kobj_zalloc+0x1c
                                         zcalloc+0x2d
                                         z_deflateInit2_+0x1a1
                                         z_deflateInit_+0x32
                                         z_compress_level+0x77
                                         gzip_compress+0x4b
                                         zio_compress_data+0xbc
---
Re: [zfs-discuss] Force rewriting of all data, to push stripes onto newly added devices?
Mario Goebbels wrote:
> I'm just in sort of a scenario, where I've added devices to a pool and
> would now like the existing data to be spread across the new drives, to
> increase the performance. Is there a way to do it, like a scrub? Or would
> I have to have all files to copy over themselves, or similar hacks?

For the short term, cp works (or any other process which would result in a new write of the files).

-- richard
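In case it helps to see what "have the files copy over themselves" looks like in practice, here is a minimal sketch under the assumption that the data sits in a hypothetical /tank/data directory and that briefly holding a second copy of each file is acceptable; it simply rewrites each file so ZFS allocates fresh blocks across all vdevs:

  # Rewrite every regular file in place (not atomic; test on scratch data first)
  cd /tank/data
  for f in *; do
      [ -f "$f" ] || continue
      cp -p "$f" "$f.respread" && mv "$f.respread" "$f"
  done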
Re: [zfs-discuss] Force rewriting of all data, to push stripes onto newly added devices?
Mario Goebbels wrote:
> I'm just in sort of a scenario, where I've added devices to a pool and
> would now like the existing data to be spread across the new drives, to
> increase the performance. Is there a way to do it, like a scrub? Or would
> I have to have all files to copy over themselves, or similar hacks?
>
> Thanks,
> -mg

This requires rewriting the block pointers; it's the same problem as supporting vdev removal. I would guess that they'll be solved at the same time.

- Bart

--
Bart Smaalders Solaris Kernel Performance
[EMAIL PROTECTED] http://blogs.sun.com/barts
Re: [zfs-discuss] Re: Re: gzip compression throttles system?
Ian Collins writes:
> Roch Bourbonnais wrote:
> >
> > with recent bits ZFS compression is now handled concurrently with many
> > CPUs working on different records.
> > So this load will burn more CPUs and achieve its results (compression)
> > faster.
>
> Would changing (selecting a smaller) filesystem record size have any effect?

If the problem is that we just have a high kernel load compressing blocks, then probably not. If anything, small records might be a tad less efficient (thus needing more CPU).

> > So the observed pauses should be consistent with that of a load
> > generating high system time.
> > The assumption is that compression now goes faster than when it was
> > single threaded.
> >
> > Is this undesirable? We might seek a way to slow down compression in
> > order to limit the system load.
>
> I think you should, otherwise we have a performance throttle that scales
> with the number of cores!

Again I wonder to what extent the issue becomes painful due to lack of write throttling. Once we have that in, we should revisit this.

-r
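For anyone who wants to experiment along these lines, here is a minimal sketch with a hypothetical dataset name; both properties only affect newly written data, and per Roch's comment a smaller recordsize may not help (it can even cost a little more CPU), so treat this as a way to measure rather than a recommendation:

  # Try a lighter gzip level and/or a smaller record size, then re-run the workload
  zfs set compression=gzip-1 tank/gzipfs
  zfs set recordsize=32k tank/gzipfs
  zfs get compression,recordsize,compressratio tank/gzipfs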
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
Roch Bourbonnais wrote:
> with recent bits ZFS compression is now handled concurrently with
> many CPUs working on different records.
> So this load will burn more CPUs and achieve its results
> (compression) faster.

Is this done using the taskqs created in spa_activate()?

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/spa.c#109

These threads seem to be running the gzip compression code, and are apparently started with a priority of maxclsyspri == 99.

> So the observed pauses should be consistent with that of a load
> generating high system time.
> The assumption is that compression now goes faster than when it was
> single threaded.
>
> Is this undesirable? We might seek a way to slow
> down compression in order to limit the system load.

Hmm, I see that the USB device drivers are also using taskqs, see file usr/src/uts/common/io/usb/usba/usbai_pipe_mgmt.c, function usba_init_pipe_handle(). The USB device driver is using a priority of minclsyspri == 60 (or "maxclsyspri - 5" == 94, in the case of isochronous usb pipes):

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/io/usb/usba/usbai_pipe_mgmt.c#427

Could this be a problem? That is, when zfs' taskq is filled with lots of compression requests, is there no time left to run USB taskqs that have a lower priority than zfs'?
Re: [zfs-discuss] iscsitadm local_name in ZFS
cedric briner wrote:
> hello dear community,
>
> Is there a way to have a ``local_name'' as defined in iscsitadm(1M) when
> you shareiscsi a zvol? This way, it will give an even easier way to
> identify a device through its IQN.
>
> Ced.

Okay, no reply from you, so... maybe I didn't make myself clear. Let me try to re-explain what I mean:

When you use a zvol and enable shareiscsi, could you add a suffix to the IQN (iSCSI Qualified Name)? This suffix would be given by myself and would help me to identify which IQN corresponds to which zvol: it is just a more human-readable tag on an IQN.

Similarly, this tag is also given when you use iscsitadm, and in the man page of iscsitadm it is called a local_name:

iscsitadm create target -b /dev/dsk/c0d0s5 tiger
or
iscsitadm create target -b /dev/dsk/c0d0s5 hd-1

tiger and hd-1 are local_names.

Ced.

--
Cedric BRINER
Geneva - Switzerland
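Until shareiscsi grows such an option, one workaround is to skip shareiscsi for that zvol and create the target by hand, which lets you pick the alias yourself. A minimal sketch, with a hypothetical pool/zvol name; the backing device path under /dev/zvol/rdsk is where zvols normally appear:

  # Create the zvol, then hand-build the iSCSI target with a readable local_name
  zfs create -V 10g tank/vol1
  iscsitadm create target -b /dev/zvol/rdsk/tank/vol1 tank-vol1
  iscsitadm list target -v    # the IQN is shown alongside the alias "tank-vol1"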
[zfs-discuss] Force rewriting of all data, to push stripes onto newly added devices?
I'm in sort of a scenario where I've added devices to a pool and would now like the existing data to be spread across the new drives, to increase the performance. Is there a way to do it, like a scrub? Or would I have to have all the files copy over themselves, or use similar hacks?

Thanks,
-mg
[zfs-discuss] Re: Re: Re: gzip compression throttles system?
> A couple more questions here. ...
> What do you have zfs compression set to? The gzip level is
> tunable, according to zfs set, anyway:
>
> PROPERTY     EDIT  INHERIT  VALUES
> compression  YES   YES      on | off | lzjb | gzip | gzip-[1-9]

I've used the "default" gzip compression level, that is I used

zfs set compression=gzip gzip_pool

> You still have idle time in this lockstat (and mpstat).
>
> What do you get for a lockstat -A -D 20 sleep 30?

# lockstat -A -D 20 /usr/tmp/fill /gzip_pool/junk
lockstat: warning: 723388 aggregation drops on CPU 0
lockstat: warning: 239335 aggregation drops on CPU 1
lockstat: warning: 62366 aggregation drops on CPU 0
lockstat: warning: 51856 aggregation drops on CPU 1
lockstat: warning: 45187 aggregation drops on CPU 0
lockstat: warning: 46536 aggregation drops on CPU 1
lockstat: warning: 687832 aggregation drops on CPU 0
lockstat: warning: 575675 aggregation drops on CPU 1
lockstat: warning: 46504 aggregation drops on CPU 0
lockstat: warning: 40874 aggregation drops on CPU 1
lockstat: warning: 45571 aggregation drops on CPU 0
lockstat: warning: 33422 aggregation drops on CPU 1
lockstat: warning: 501063 aggregation drops on CPU 0
lockstat: warning: 361041 aggregation drops on CPU 1
lockstat: warning: 651 aggregation drops on CPU 0
lockstat: warning: 7011 aggregation drops on CPU 1
lockstat: warning: 61600 aggregation drops on CPU 0
lockstat: warning: 19386 aggregation drops on CPU 1
lockstat: warning: 566156 aggregation drops on CPU 0
lockstat: warning: 105502 aggregation drops on CPU 1
lockstat: warning: 25362 aggregation drops on CPU 0
lockstat: warning: 8700 aggregation drops on CPU 1
lockstat: warning: 585002 aggregation drops on CPU 0
lockstat: warning: 645299 aggregation drops on CPU 1
lockstat: warning: 237841 aggregation drops on CPU 0
lockstat: warning: 20931 aggregation drops on CPU 1
lockstat: warning: 320102 aggregation drops on CPU 0
lockstat: warning: 435898 aggregation drops on CPU 1
lockstat: warning: 115 dynamic variable drops with non-empty dirty list
lockstat: warning: 385192 aggregation drops on CPU 0
lockstat: warning: 81833 aggregation drops on CPU 1
lockstat: warning: 259105 aggregation drops on CPU 0
lockstat: warning: 255812 aggregation drops on CPU 1
lockstat: warning: 486712 aggregation drops on CPU 0
lockstat: warning: 61607 aggregation drops on CPU 1
lockstat: warning: 1865 dynamic variable drops with non-empty dirty list
lockstat: warning: 250425 aggregation drops on CPU 0
lockstat: warning: 171415 aggregation drops on CPU 1
lockstat: warning: 166277 aggregation drops on CPU 0
lockstat: warning: 74819 aggregation drops on CPU 1
lockstat: warning: 39342 aggregation drops on CPU 0
lockstat: warning: 3556 aggregation drops on CPU 1
lockstat: warning: ran out of data records (use -n for more)

Adaptive mutex spin: 4701 events in 64.812 seconds (73 events/sec)

Count indv cuml rcnt spin Lock                 Caller
---
 1726  37%  37% 0.00    2 vph_mutex+0x17e8     pvn_write_done+0x10c
 1518  32%  69% 0.00    1 vph_mutex+0x17e8     hat_page_setattr+0x70
  264   6%  75% 0.00    2 vph_mutex+0x2000     page_hashin+0xad
  194   4%  79% 0.00    4 0xfffed2ee0a88       cv_wait+0x69
  106   2%  81% 0.00    2 vph_mutex+0x2000     page_hashout+0xdd
   91   2%  83% 0.00    4 0xfffed2ee0a88       taskq_dispatch+0x2c9
   83   2%  85% 0.00    4 0xfffed2ee0a88       taskq_thread+0x1cb
   83   2%  86% 0.00    1 0xfffec17a56b0       ufs_iodone+0x3d
   47   1%  87% 0.00    4 0xfffec1e4ce98       vdev_queue_io+0x85
   43   1%  88% 0.00    6 0xfffec139a2c0       trap+0xf66
   38   1%  89% 0.00    6 0xfffecb5f8cd0       cv_wait+0x69
   37   1%  90% 0.00    4 0xfffec143ee90       dmult_deque+0x36
   26   1%  91% 0.00    2 htable_mutex+0x108   htable_release+0x79
   26   1%  91% 0.00    1 0xfffec17a56b0       ufs_putpage+0xa4
   18   0%  91% 0.00    4 0xfffec00dca48       ghd_intr+0xa8
   17   0%  92% 0.00    2 0xfffec00dca48       ghd_waitq_delete+0x35
   12   0%  92% 0.00    2 htable_mutex+0x248   htable_release+0x79
   11   0%  92% 0.00    8 0xfffec1e4ce98       vdev_queue_io_done+0x3b
   10   0%  93% 0.00    3 0xfffec00dca48       ghd_transport+0x71
   10   0%  93% 0.00    2 0xff00077dc138       page_get_mnode_freelist+0xdb
---

Adaptive mutex block: 167 events in 64.812 seconds (3 events/sec)

Count indv cuml rcnt  nsec Lock                 Caller
---
   78  47%  47% 0.00 31623 vph_mutex+0x17e8     pvn_write_done+0x10c
Re: [zfs-discuss] gzip compression throttles system?
Darren Moffat,

Yes and no. An earlier statement within this discussion was whether gzip is appropriate for .wav files. This just gets a relative time to compress, and relative sizes of the files after the compression.

My assumption is that gzip will run as a user app in one environment. The normal r/w sys calls then take a user buffer. So, it would be hard to believe that the .wav file won't be read one user buffer at a time. Yes, it could be mmap'ed, but then it would have to be unmapped. Too many sys calls, I think, for the app. Sorry, haven't looked at it for a while.

Overall, I am just trying to guess at the read-ahead delay versus the user buffer versus the internal FS. The internal FS should take it basically one FS block at a time (or do multiple blocks in parallel), and the user app takes it anywhere from one buffer to one page size, 8k, at a time. So, because it reads one buffer at a time in a loop, with a context switch from kernel to user each time, I would expect that the gzip app would be slower.

So, my first step is to keep it simple (KISS) and tell the group "what happens if" we do this simple comparison? How many bytes/sec are compressed? Are they approximately the same speed? Do you end up with the same size file?

Mitchell Erblich
--

Darren J Moffat wrote:
> Erblichs wrote:
> > So, my first order would be to take 1GB or 10GB .wav files
> > AND time both the kernel implementation of Gzip and the
> > user application. Approx the same times MAY indicate
> > that the kernel implementation gzip funcs should
> > maybe be treated more as interactive scheduling
> > threads and that their priority is too high and blocks other
> > threads or processes from executing.
>
> If you just run gzip(1) against the files you are operating on the whole
> file so you only incur startup costs once and are thus doing quite a
> different compression to operating on a block level. A fairer
> comparison would be to build a userland program that compresses and then
> writes to disk in ZFS blocksize chunks, that way you are compressing the
> same sizes of data and doing the startup every time just like zio has to do.
>
> --
> Darren J Moffat
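A rough shell-level version of the comparison being discussed, assuming a hypothetical .wav file and pool name; as Darren notes, gzip(1) compresses the whole file in one stream while the kernel compresses recordsize chunks, so the numbers are only indicative:

  # User-space gzip: one process, whole-file stream
  time gzip -c /data/test.wav > /tmp/test.wav.gz
  ls -l /data/test.wav /tmp/test.wav.gz

  # In-kernel gzip: write the same file into a gzip-compressed dataset
  zfs create -o compression=gzip tank/gziptest
  time cp /data/test.wav /tank/gziptest/
  sync    # let the async writes (and their compression) complete
  zfs get compressratio tank/gziptest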
[zfs-discuss] Re: Re: Re: Re: ZFS improvements
> On Mon, Apr 23, 2007 at 09:38:47AM -0700, Gino wrote:
> >
> > we had 5 corrupted zpool (on different servers and different SANs) !
> > With Solaris up to S10U3 and Nevada up to snv59 we are able to corrupt
> > easily a zpool only disconnecting a few times one or more luns of a
> > zpool under high i/o load.
> >
> > We are testing now snv60.
>
> As I've mentioned before, I believe you were tripping over the space map
> bug (6458218) which was fixed in build 60 and will appear in S10u4. Let
> us know if you are able to reproduce the problem on build 60 or later.

Eric, we've done our first test with snv60. We moved over 40TB of data between 4 zpools, and in the meantime we've done about 10 snapshots and forced 50 panics by disabling ports on the FC switches.

None of the pools has been corrupted! Also, we found snv60 MUCH more stable than S10U3.

gino
Re: [zfs-discuss] ZFS vs UFS2 overhead and may be a bug?
On Thu, May 03, 2007 at 02:15:45PM -0700, Bakul Shah wrote:
> [originally reported for ZFS on FreeBSD but Pawel Jakub Dawidek
> says this problem also exists on Solaris hence this email.]

Thanks!

> Summary: on ZFS, the overhead for reading a hole seems far worse
> than actually reading from a disk. Small buffers are used to
> make this overhead more visible.
>
> I ran the following script on both ZFS and UFS2 filesystems.
>
> [Note that on FreeBSD cat uses a 4k buffer and md5 uses a 1k
> buffer. On Solaris you can replace them with dd with
> respective buffer sizes for this test and you should see
> similar results.]
>
> $ dd SPACY                # 10G zero bytes allocated
> $ truncate -s 10G HOLEY   # no space allocated
>
> $ time dd <SPACY >/dev/null bs=1m   # A1
> $ time dd <HOLEY >/dev/null bs=1m   # A2
> $ time cat SPACY >/dev/null         # B1
> $ time cat HOLEY >/dev/null         # B2
> $ time md5 SPACY                    # C1
> $ time md5 HOLEY                    # C2
>
> I have summarized the results below.
>
>                    ZFS               UFS2
>                Elapsed  System   Elapsed  System   Test
> dd SPACY bs=1m  110.26   22.52    340.38   19.11     A1
> dd HOLEY bs=1m   22.44   22.41     24.24   24.13     A2
>
> cat SPACY       119.64   33.04    342.77   17.30     B1
> cat HOLEY       222.85  222.08     22.91   22.41     B2
>
> md5 SPACY       210.01   77.46    337.51   25.54     C1
> md5 HOLEY       856.39  801.21     82.11   28.31     C2

This is what I see on Solaris (hole is 4GB):

# /usr/bin/time dd if=/ufs/hole of=/dev/null bs=128k
real       23.7
# /usr/bin/time dd if=/zfs/hole of=/dev/null bs=128k
real       21.2
# /usr/bin/time dd if=/ufs/hole of=/dev/null bs=4k
real       31.4
# /usr/bin/time dd if=/zfs/hole of=/dev/null bs=4k
real     7:32.2

--
Pawel Jakub Dawidek                 http://www.wheel.pl
[EMAIL PROTECTED]                   http://www.FreeBSD.org
FreeBSD committer                   Am I Evil? Yes, I Am!
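For anyone wanting to repeat the setup on Solaris rather than FreeBSD, here is a minimal sketch; the paths are hypothetical, and mkfile -n creates the file without allocating its blocks, playing the role of truncate -s in the original script:

  # SPACY: 10 GB of real zeros; HOLEY: a 10 GB file that is one big hole
  dd if=/dev/zero of=/zfs/SPACY bs=1024k count=10240
  mkfile -n 10g /zfs/HOLEY

  # Then compare read times with different buffer sizes, e.g.:
  /usr/bin/time dd if=/zfs/SPACY of=/dev/null bs=4k
  /usr/bin/time dd if=/zfs/HOLEY of=/dev/null bs=4k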
[zfs-discuss] Filesystem full not reported in /var/adm/messages
Hello,

Is someone able to explain to me why ZFS does not report a filesystem full condition in /var/adm/messages? Did I miss something, or is it expected behaviour?

Tested on Solaris 11/06 (ZFS version 3).

Thank you for your feedback!
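Whatever the answer turns out to be, nothing stops you from checking capacity directly rather than waiting for a syslog entry; a minimal sketch with a hypothetical pool name:

  # Check free space at the pool and dataset level
  zpool list tank
  zfs list -o name,used,available,mountpoint -r tank
  df -h /tank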
Re: [zfs-discuss] gzip compression throttles system?
Erblichs wrote:
> So, my first order would be to take 1GB or 10GB .wav files
> AND time both the kernel implementation of Gzip and the
> user application. Approx the same times MAY indicate
> that the kernel implementation gzip funcs should
> maybe be treated more as interactive scheduling
> threads and that their priority is too high and blocks other
> threads or processes from executing.

If you just run gzip(1) against the files you are operating on the whole file, so you only incur startup costs once and are thus doing quite a different compression to operating on a block level. A fairer comparison would be to build a userland program that compresses and then writes to disk in ZFS blocksize chunks; that way you are compressing the same sizes of data and doing the startup every time, just like zio has to do.

--
Darren J Moffat