Re: [zfs-discuss] ZFS on a damaged disk
On 12 December, 2006 - Patrick P Korsnick sent me these 1,1K bytes:

> i have a machine with a disk that has some sort of defect and i've found that if i partition only half of the disk that the machine will still work. i tried to use 'format' to scan the disk and find the bad blocks, but it didn't work.
>
> so as i don't know where the bad blocks are but i'd still like to use some of the rest of the disk, i thought ZFS might be able to help. i partitioned the disk so slices 4,5,6 and 7 are each 5GB. i thought i'd make one or multiple zpools on those slices and then i'd be able to narrow down where the bad sections are.
>
> so my question is can i declare a zpool that spans multiple c0d0sXX but isn't a mirror and if i can, then will zfs be able to detect where the problem c0d0sXX is and not use it? if not, i'll have to make 4 different zpools and experiment with storing stuff on each to find the approximate location of the bad blocks.

Either create 4 separate pools (zpool create slice4 c0d0s4; zpool create slice5 c0d0s5; and so on) and then torture each of them to see where it's corrupted. Or you can, for instance, create a raidz(2) of those 4 and watch performance go downhill, but still work:

  zpool create broken raidz2 c0d0s4 c0d0s5 c0d0s6 c0d0s7

/Tomas
--
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
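A rough sketch of that "torture" step, assuming the slice names above (the file name and sizes here are made up, and this is untested on a half-broken disk):

  zpool create slice4 c0d0s4
  # fill the pool with throwaway data so most blocks get touched
  dd if=/dev/urandom of=/slice4/junk bs=1024k count=4000
  # ask ZFS to re-read and verify every allocated block
  zpool scrub slice4
  zpool status -v slice4    # persistent READ/CKSUM errors point at a bad slice

Repeat for slice5, slice6 and slice7; the pools that scrub clean sit on the usable parts of the disk.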
[zfs-discuss] ZFS on a damaged disk
i have a machine with a disk that has some sort of defect and i've found that if i partition only half of the disk that the machine will still work. i tried to use 'format' to scan the disk and find the bad blocks, but it didn't work. so as i don't know where the bad blocks are but i'd still like to use some of the rest of the disk, i thought ZFS might be able to help. i partitioned the disk so slices 4,5,6 and 7 are each 5GB. i thought i'd make one or multiple zpools on those slices and then i'd be able to narrow down where the bad sections are. so my question is can i declare a zpool that spans multiple c0d0sXX but isn't a mirror and if i can, then will zfs be able to detect where the problem c0d0sXX is and not use it? if not, i'll have to make 4 different zpools and experiment with storing stuff on each to find the approximate location of the bad blocks. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Kickstart hot spare attachment
> If the SCSI commands hang forever, then there is nothing that ZFS can do, as a single write will never return. The more likely case is that the commands are continually timing out with very long response times, and ZFS will continue to talk to them forever.

It looks like the sd driver defaults to a 60-second timeout, which is quite long. It might be useful if FMA saw a potential fault for any I/O longer than some much lower value. (This gets tricky with power management, since if you have to wait for the disk to spin up, it can take a long time compared to normal I/O.)

That said, it sounds to me like your enclosure is actually powering down the drive. If so, it ought to stop responding to selection, and I/O should fail in a "hard" way within 250 ms (or less, depending on whether you've got a SCSI bus which supports QAS, as the newer, faster versions do).

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: ZFS and write caching (SATA)
It took manufacturers of SCSI drives some years to get this right. Around 1997 or so we were still seeing drives at my former employer that didn't properly flush their caches under all circumstances (and had other "interesting" behaviours WRT caching). Lots of ATA disks never did bother to implement the write cache controls. I haven't talked recently with any vendors who have been sourcing SATA disks, so I don't know what they're seeing. Generally the major players have their own disk qualification suites and often wind up with custom firmware because they want all of their detected bugs fixed before they'll accept a particular disk. If you buy a disk off-the-shelf, you get a drive that's gone through the disk manufacturer's testing (which is good, don't get me wrong) but hasn't been qualified with the particular commands or configuration that a particular operating system or file system might send. If you can do your own tests, that would be best; but that involves executing a flush (with all the various combinations of commands outstanding, dirty vs. clean cache buffers, etc.) and immediately powering off the device, which generally can't be done without special hardware. My *hunch* is that "enterprise-class" SATA disks have probably gone through more of this sort of testing than consumer SATA, even at the drive manufacturers. (It's not at all the same firmware.) This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Usage in Warehousing (lengthy intro)
> http://www.norcotek.com/item_detail.php?categoryid=8&modelno=DS-1220

Yeah, SiI3726 multipliers are cool:
http://cooldrives.com/cosapomubrso.html
http://cooldrives.com/mac-port-multiplier-sata-case.html

But finding PCI-X slots for Ying Tian's si3124 or marvell88sx cards is getting tricky, even harder at 133 MHz. The 1x PCIe two-SATA si3132 card should come up
http://elektronkind.org/category/geekery/solaris/
but has issues:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6404812
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6492430
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6492427
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=2133861

What would be nice is support for Marvell's 88SX7042 4x PCIe four-SATA card:
http://www.amug.org/amug-web/html/amug/reviews/articles/sonnet/e4p/

An easier bet is AMD's 4x4 platform
http://www.tomshardware.com/2006/11/30/brute_force_quad_cores/page6.html
with its watered-down Professional 3600 chipset
http://www.nvidia.com/page/pg_20060814366736.html
that would likely "just work" with 12 SATA ports.

Man, if someone would sell me a diskless thumper... it's an impressive grouping of PCI-X slots.

Rob
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: ZFS Storage Pool advice
> We're looking for pure performance.
>
> What will be contained in the LUNs is student user account files that they will access, and department share files like MS Word documents, Excel files, PDF. There will be no applications on the ZFS storage pools or pool. Does this help on what strategy might be best?

I think so. I would suggest striping a single pool across all available LUNs, then. (I'm presuming that you would be prepared to recover from ZFS-detected errors by reloading from backup.) There doesn't seem to be any compelling reason to split your storage into multiple pools, and by using a single pool, you don't have to worry about reallocating storage if one pool fills up while another has free space.

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
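For example, a minimal sketch of that layout (the device names are placeholders, not the actual LUNs):

  # one pool dynamically striped across all three LUNs
  zpool create studata c4t0d0 c4t1d0 c4t2d0
  # separate file systems for the two kinds of data, all drawing from the same free space
  zfs create studata/students
  zfs create studata/deptshare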
[zfs-discuss] Re: Uber block corruption?
> Also note that the UB is written to every vdev (4 per disk) so the > chances of all UBs being corrupted is rather low. The chances that they're corrupted by the storage system, yes. However, they are all sourced from the same in-memory buffer, so an undetected in-memory error (e.g. kernel bug) will be replicated to all vdevs. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: ZFS behavior under heavy load (I/O that is)
I think you may be observing that fsync() is slow. The file will be written, and visible to other processes via the in-memory cache, before the data has been pushed to disk. vi forces the data out via fsync, and that can be quite slow when the file system is under load, especially before a fix which allows fsync to work on a per-file basis. (In the S10U2 aka 6/06 Solaris release, fsync on ZFS forced all changes to disk, not just those of the requested file.) This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
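If you want to confirm that fsync() is what vi is stuck in, a DTrace sketch along these lines should show the latency distribution (untested; note that on Solaris the fsync(3C) call enters the kernel as the fdsync syscall):

  dtrace -n '
  syscall::fdsync:entry { self->ts = timestamp; }
  syscall::fdsync:return /self->ts/ {
          @["fsync latency (ns)"] = quantize(timestamp - self->ts);
          self->ts = 0;
  }'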
Re: [zfs-discuss] Monitoring ZFS
Thanks, Neil, for the assistance.

Tom

Neil Perrin wrote On 12/12/06 19:59,:
> Tom Duell wrote On 12/12/06 17:11,:
>> Group,
>>
>> We are running a benchmark with 4000 users simulating a hospital management system running on Solaris 10 6/06 on a USIV+ based SunFire 6900 with a 6540 storage array.
>>
>> Are there any tools for measuring internal ZFS activity to help us understand what is going on during slowdowns?
>
> dtrace can be used in numerous ways to examine every part of ZFS and Solaris. lockstat(1M) (which actually uses dtrace underneath) can also be used to see the cpu activity (try lockstat -kgIW -D 20 sleep 10).
>
> You can also use iostat (eg iostat -xnpcz) to look at disk activity.

Yes, we are doing this and the disks are performing extremely well.

>> We have 192GB of RAM and while ZFS runs well most of the time, there are times where the system time jumps up to 25-40% as measured by vmstat and iostat. These times coincide with slowdowns in file access as measured by a side program that simply reads a random block in a file... these response times can exceed 1 second or longer.
>
> ZFS commits transaction groups every 5 seconds. I suspect this flurry of activity is due to that. Committing can indeed take longer than a second.
>
> You might be able to show this by changing it with:
>
> # echo txg_time/W 10 | mdb -kw
>
> then the activity should be longer but less frequent. I don't however recommend you keep it at that value.

Thanks, we may try that to see what effects it might have.

>> Any pointers greatly appreciated!
>>
>> Tom

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS behavior under heavy load (I/O that is)
I'm observing the following behavior on our E2900 (24 x 92 config), 2 FCs, and ... I've a large filesystem (~758GB) with compress mode on. When this filesystem is under heavy load (>150MB/s) I've problems saving files in 'vi'. I posted here about it and recall that the issue is addressed in Sol10U3. This morning I observed another variation of this problem as follows:

- Create a file in 'vi' and save it; the session will hang as if it is waiting for the write to complete.
- In another session you'll observe the write from 'vi' is indeed complete, as evidenced by the contents of the file.

Am I repeating myself here or is it a different problem altogether?

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Monitoring ZFS
Tom Duell wrote On 12/12/06 17:11,:
> Group,
>
> We are running a benchmark with 4000 users simulating a hospital management system running on Solaris 10 6/06 on a USIV+ based SunFire 6900 with a 6540 storage array.
>
> Are there any tools for measuring internal ZFS activity to help us understand what is going on during slowdowns?

dtrace can be used in numerous ways to examine every part of ZFS and Solaris. lockstat(1M) (which actually uses dtrace underneath) can also be used to see the cpu activity (try lockstat -kgIW -D 20 sleep 10).

You can also use iostat (eg iostat -xnpcz) to look at disk activity.

> We have 192GB of RAM and while ZFS runs well most of the time, there are times where the system time jumps up to 25-40% as measured by vmstat and iostat. These times coincide with slowdowns in file access as measured by a side program that simply reads a random block in a file... these response times can exceed 1 second or longer.

ZFS commits transaction groups every 5 seconds. I suspect this flurry of activity is due to that. Committing can indeed take longer than a second.

You might be able to show this by changing it with:

# echo txg_time/W 10 | mdb -kw

then the activity should be longer but less frequent. I don't however recommend you keep it at that value.

> Any pointers greatly appreciated!
>
> Tom

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: Re[2]: [zfs-discuss] Uber block corruption?
> Hello Toby, > > Tuesday, December 12, 2006, 4:18:54 PM, you wrote: > TT> On 12-Dec-06, at 9:46 AM, George Wilson wrote: > > >> Also note that the UB is written to every vdev (4 per disk) so the > >> chances of all UBs being corrupted is rather low. > > It depends actually - if all your vdevs are on the same array with > write back cache set to on you actually can end-up with all UB > corrupted - at least in theory. Do such caches respond to explicit flushes? My understanding is that it should try to flush between writing the front 2 and the back 2. Not that even that would guarantee anything if there are real bugs in the cache code, but it would improve the odds. -- Darren Dunham [EMAIL PROTECTED] Senior Technical Consultant TAOShttp://www.taos.com/ Got some Dr Pepper? San Francisco, CA bay area < This line left intentionally blank to confuse you. > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Monitoring ZFS
Group,

We are running a benchmark with 4000 users simulating a hospital management system running on Solaris 10 6/06 on a USIV+ based SunFire 6900 with a 6540 storage array.

Are there any tools for measuring internal ZFS activity to help us understand what is going on during slowdowns?

We have 192GB of RAM and while ZFS runs well most of the time, there are times where the system time jumps up to 25-40% as measured by vmstat and iostat. These times coincide with slowdowns in file access as measured by a side program that simply reads a random block in a file... these response times can exceed 1 second or longer.

Any pointers greatly appreciated!

Tom

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: ZFS Storage Pool advice
Also there will be no NFS services on this system. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: ZFS Storage Pool advice
We're looking for pure performance. What will be contained in the LUNs is student user account files that they will access, and department share files like MS Word documents, Excel files, and PDFs. There will be no applications on the ZFS storage pools or pool. Does this help determine what strategy might be best? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Performance problems during 'destroy' (and bizarre Zone problem as well)
Anantha N. Srirama wrote: - Why is the destroy phase taking so long? Destroying clones will be much faster with build 53 or later (or the unreleased s10u4 or later) -- see bug 6484044. - What can explain the unduly long snapshot/clone times - Why didn't the Zone startup? - More surprisingly why did the Zone startup after an hour? Perhaps there was so much activity on the system that we couldn't push out transaction groups in the usual < 5 seconds. 'zfs snapshot' and 'zfs clone' take at least 1 transaction group to complete, so this could explain it. We've seen this problem as well and are working on a fix... --mat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS and write caching (SATA)
> PS> While I do intend to perform actual powerloss tests, it would be interesting to hear from anybody whether it is generally expected to be safe.
>
> Well, if disks honor cache flush commands then it should be reliable whether it's a SATA or SCSI disk.

Yes. Sorry, I could have stated my question clearer. What I am specifically concerned about is exactly that - whether your typical SATA drive *will* honor cache flush commands, as I understand a lot of PATA drives did/do not. Googling tends to give very little concrete information on this since very few people actually seem to care about this.

Since I wanted to confirm my understanding of ZFS semantics w.r.t. write caching anyway, I thought I might as well also ask about the general tendency among drives since, if anywhere, people here might know.

--
/ Peter Schuller, InfiDyne Technologies HB
PGP userID: 0xE9758B7D or 'Peter Schuller <[EMAIL PROTECTED]>'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
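One data point that is easy to collect from Solaris (for disks driven by sd/ssd; the menu names below are from memory, so treat them as approximate): format in expert mode will at least tell you whether the drive reports its volatile write cache as enabled:

  format -e
  # select the disk, then:
  format> cache
  cache> write_cache
  write_cache> display

That doesn't prove the drive honors flush-cache commands, but a drive that won't even report or toggle its cache setting is not an encouraging sign.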
Re: [zfs-discuss] ZFS Storage Pool advice
Hi Kory,

It depends on the capabilities of your array in our experience... and also the zpool type. If you're going to do RAID-Z in a write intensive environment you're going to have a lot more I/Os with three LUNs than with a single large LUN. Your controller may go nutty.

Also (Richard can address this better than I), you may want to disable the ZIL or have your array ignore the write cache flushes that ZFS issues.

Best Regards,
Jason

On 12/12/06, Kory Wheatley <[EMAIL PROTECTED]> wrote:
> This question is concerning ZFS. We have a Sun Fire V890 attached to an EMC disk array. Here's our plan to incorporate ZFS: On our EMC storage array we will create 3 LUNs. Now how would ZFS be used for the best performance? What I'm trying to ask is if you have 3 LUNs and you want to create a ZFS storage pool, would it be better to have a storage pool per LUN or combine the 3 LUNs as one big disk under ZFS and create 1 huge ZFS storage pool.
>
> Example:
> LUN1 200gb ZFS Storage Pool "pooldata1"
> LUN2 200gb ZFS Storage Pool "pooldata2"
> LUN3 200gb ZFS Storage Pool "pooldata3"
> or
> LUN 600gb ZFS Storage Pool "alldata"

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
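For reference, the ZIL half of that suggestion is controlled by the zil_disable tunable (disabling it trades away synchronous-write semantics, so it is generally discouraged outside of testing); a sketch from memory, for Solaris 10-era bits:

  # in /etc/system, takes effect on the next boot
  set zfs:zil_disable = 1

  # or temporarily on a live system, for testing only
  echo zil_disable/W0t1 | mdb -kw

Whether the array can be told to ignore SCSI cache-flush requests is array-specific and not something ZFS controls.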
Re: [zfs-discuss] SunCluster HA-NFS from Sol9/VxVM to Sol10u3/ZFS
Robert Milkowski wrote:
> Hello Matthew,
>
> MCA> Also, I am considering what type of zpools to create. I have a SAN with T3Bs and SE3511s. Since neither of these can work as a JBOD (at least that is what I remember) I guess I am going to have to add in the LUNs in a mirrored zpool of the RAID-5 LUNs?
>
> 1. those boxes can work as JBODs but not in a clustered environment.

Actually, those boxes can't act as JBODs. They only present LUNs created from the drives in the enclosures.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS and write caching (SATA)
Hello Peter,

Tuesday, December 12, 2006, 11:18:32 PM, you wrote:

PS> Hello, my understanding is that ZFS is specifically designed to work with write caching, by instructing drives to flush their caches when a write barrier is needed. And in fact, even turns write caching on explicitly on managed devices.
PS> My question is of a practical nature: will this *actually* be safe on the average consumer grade SATA drive? I have seen offhand references to PATA drives generally not being trustworthy when it comes to this (SCSI therefore being recommended), but I have not been able to find information on the status of typical SATA drives.
PS> While I do intend to perform actual powerloss tests, it would be interesting to hear from anybody whether it is generally expected to be safe.

Well, if disks honor cache flush commands then it should be reliable whether it's a SATA or SCSI disk.

--
Best regards,
Robert    mailto:[EMAIL PROTECTED]    http://milek.blogspot.com

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: Sol10u3 -- is "du" bug fixed?
Hello Anton,

Tuesday, December 12, 2006, 9:36:41 PM, you wrote:

ABR> Is there an easy way to determine whether a pool has this fix applied or not?

Yep. Just do 'df -h' and see what the reported size of the pool is. It should be something like N-1 times the disk size for each raid-z group. If it is N times the disk size then the pool was created before the fix.

--
Best regards,
Robert    mailto:[EMAIL PROTECTED]    http://milek.blogspot.com

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
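Illustrative only (the sizes below are made up for a 3 x 400GB raid-z group):

  df -h /mypool
  # ~800G reported  -> roughly (N-1) x disk size: pool created with the fix
  # ~1.2T reported  -> roughly  N    x disk size: pool created before the fix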
[zfs-discuss] ZFS and write caching (SATA)
Hello, my understanding is that ZFS is specifically designed to work with write caching, by instructing drives to flush their caches when a write barrier is needed. And in fact, even turns write caching on explicitly on managed devices. My question is of a practical nature: will this *actually* be safe on the average consumer grade SATA drive? I have seen offhand references to PATA drives generally not being trustworthy when it comes to this (SCSI therefore being recommended), but I have not been able to find information on the status of typical SATA drives. While I do intend to perform actual powerloss tests, it would be interesting to hear from anybody whether it is generally expected to be safe. -- / Peter Schuller, InfiDyne Technologies HB PGP userID: 0xE9758B7D or 'Peter Schuller <[EMAIL PROTECTED]>' Key retrieval: Send an E-Mail to [EMAIL PROTECTED] E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Netapp to Solaris/ZFS issues
> NetApp can actually grow their RAID groups, but they recommend adding > an entire RAID group at once instead. If you add a disk to a RAID > group on NetApp, I believe you need to manually start a reallocate > process to balance data across the disks. There's no reallocation process that I'm aware of. Obviously adding a single column to a pretty full volume prevents you from doing the most optimal (full-stripe) writes. But since the existing parity disk covers the new column, you do have full availability of the new space. That's a different story with raidz. Hopefully you don't wait until the raid group is full before adding disks, and the blocks sort themselves out over time. -- Darren Dunham [EMAIL PROTECTED] Senior Technical Consultant TAOShttp://www.taos.com/ Got some Dr Pepper? San Francisco, CA bay area < This line left intentionally blank to confuse you. > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS Storage Pool advice
> Are you looking purely for performance, or for the added reliability that ZFS can give you? If the latter, then you would want to configure across multiple LUNs in either a mirrored or RAID configuration. This does require sacrificing some storage in exchange for the peace of mind that any “silent data corruption” in the array or storage fabric will be not only detected but repaired by ZFS.
>
> From a performance point of view, what will work best depends greatly on your application I/O pattern, how you would map the application’s data to the available ZFS pools if you had more than one, how many channels are used to attach the disk array, etc. A single pool can be a good choice from an ease-of-use perspective, but multiple pools may perform better under certain types of load (for instance, there’s one intent log per pool, so if the intent log writes become a bottleneck then multiple pools can help).

Bad example, as there's actually one intent log per file system!

> This also depends on how the LUNs are configured within the EMC array. If you can put together a test system, and run your application as a benchmark, you can get an answer. Without that, I don’t think anyone can predict which will work best in your particular situation.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: Sol10u3 -- is "du" bug fixed?
Is there an easy way to determine whether a pool has this fix applied or not? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: ZFS Storage Pool advice
Are you looking purely for performance, or for the added reliability that ZFS can give you? If the latter, then you would want to configure across multiple LUNs in either a mirrored or RAID configuration. This does require sacrificing some storage in exchange for the peace of mind that any “silent data corruption” in the array or storage fabric will be not only detected but repaired by ZFS.

From a performance point of view, what will work best depends greatly on your application I/O pattern, how you would map the application’s data to the available ZFS pools if you had more than one, how many channels are used to attach the disk array, etc. A single pool can be a good choice from an ease-of-use perspective, but multiple pools may perform better under certain types of load (for instance, there’s one intent log per pool, so if the intent log writes become a bottleneck then multiple pools can help). This also depends on how the LUNs are configured within the EMC array.

If you can put together a test system, and run your application as a benchmark, you can get an answer. Without that, I don’t think anyone can predict which will work best in your particular situation.

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kickstart hot spare attachment
Eric Schrock wrote: > Hmmm, it means that we correctly noticed that the device had failed, but > for whatever reason the ZFS FMA agent didn't correctly replace the > drive. I am cleaning up the hot spare behavior as we speak so I will > try to reproduce this. Ok, great. >> Well, as long as I know which device is affected :-> If "zpool status" >> doesn't return it may be difficult to figure out. >> >> Do you know if the SATA controllers in a Thumper can better handle this >> problem? > > I will be starting a variety of experiments in this vein in the near > future. Others may be able to describe their experiences so far. How > exactly did you 'spin down' the drives in question? Is there a > particular failure mode you're interested in? The Andataco cabinet has a button for each disk slot that if you hold down will spin the drive down so you can pull it out. I'm interested in any failure mode that might happen to my server :-> Basically, we're very interested in building a nice ZFS server box that will house a good chunk of our data, be it homes, research or whatever. I just have to know the server is as bulletproof as possible, that's why I'm doing the stress tests. >> Do you have an idea as to when this might be available? > > It will be a while before the complete functionality is finished. I > have begun the work, but there are several distinct phases. First, I > am cleaning up the existing hot spare behavior. Second, I'm adding > proper hotplug support to ZFS so that it detects device removal without > freaking out and correctly resilvers/replaces drives when they are > plugged back in. Finally, I'll be adding a ZFS diagnosis engine to both > analyze ZFS faults as well as consume SMART data to predict disk failure > and proactively offline devices. I would estimate that it will be a few > months before I get all of this into Nevada. Ok, thanks. Jim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kickstart hot spare attachment
On Tue, Dec 12, 2006 at 02:38:22PM -0500, James F. Hranicky wrote: > > Dec 11 14:42:32.1271 1319464e-7a8c-e65b-962e-db386e90f7f2 ZFS-8000-D3 > 100% fault.fs.zfs.device > > Problem in: zfs://pool=2646e20c1cb0a9d0/vdev=724c128cdbc17745 >Affects: zfs://pool=2646e20c1cb0a9d0/vdev=724c128cdbc17745 >FRU: - > > I'm not really sure what it means. Hmmm, it means that we correctly noticed that the device had failed, but for whatever reason the ZFS FMA agent didn't correctly replace the drive. I am cleaning up the hot spare behavior as we speak so I will try to reproduce this. > Well, as long as I know which device is affected :-> If "zpool status" > doesn't return it may be difficult to figure out. > > Do you know if the SATA controllers in a Thumper can better handle this > problem? I will be starting a variety of experiments in this vein in the near future. Others may be able to describe their experiences so far. How exactly did you 'spin down' the drives in question? Is there a particular failure mode you're interested in? > Do you have an idea as to when this might be available? It will be a while before the complete functionality is finished. I have begun the work, but there are several distinct phases. First, I am cleaning up the existing hot spare behavior. Second, I'm adding proper hotplug support to ZFS so that it detects device removal without freaking out and correctly resilvers/replaces drives when they are plugged back in. Finally, I'll be adding a ZFS diagnosis engine to both analyze ZFS faults as well as consume SMART data to predict disk failure and proactively offline devices. I would estimate that it will be a few months before I get all of this into Nevada. - Eric -- Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Storage Pool advice
Kory Wheatley wrote:
> This question is concerning ZFS. We have a Sun Fire V890 attached to an EMC disk array. Here's our plan to incorporate ZFS: On our EMC storage array we will create 3 LUNs. Now how would ZFS be used for the best performance? What I'm trying to ask is if you have 3 LUNs and you want to create a ZFS storage pool, would it be better to have a storage pool per LUN or combine the 3 LUNs as one big disk under ZFS and create 1 huge ZFS storage pool.

One huge zpool. Remember, the pool can contain many file systems, but the reverse is not true.
-- richard

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
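i.e. roughly this (the device names are placeholders for the three LUNs):

  zpool create alldata c4t0d0 c4t1d0 c4t2d0
  zfs create alldata/data1
  zfs create alldata/data2
  # space can still be capped per file system, without splitting the pool
  zfs set quota=200g alldata/data1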
Re: [zfs-discuss] SunCluster HA-NFS from Sol9/VxVM to Sol10u3/ZFS
Matthew C Aycock wrote:
> We are currently working on a plan to upgrade our HA-NFS cluster that uses HA-StoragePlus and VxVM 3.2 on Solaris 9 to Solaris 10 and ZFS. Is there a known procedure or best practice for this? I have enough free disk space to recreate all the filesystems and copy the data if necessary, but would like to avoid copying if possible.

You will need to copy the data from the old file system into ZFS.

> Also, I am considering what type of zpools to create. I have a SAN with T3Bs and SE3511s. Since neither of these can work as a JBOD (at least that is what I remember) I guess I am going to have to add in the LUNs in a mirrored zpool of the RAID-5 LUNs?

Lacking other information, particularly performance requirements, what you suggest is a good strategy: ZFS mirrors of RAID-5 LUNs.

> We are at the extreme start of this project and I was hoping for some guidance as to what direction to start.

By all means, read the Sun Cluster Concepts Guide first. It will answer many questions that may arise as you go through the design. Note that version 3.2, which is required for ZFS, has updates to the concepts guide regarding the use of ZFS, available RSN.
-- richard

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kickstart hot spare attachment
Eric Schrock wrote: > On Tue, Dec 12, 2006 at 02:08:57PM -0500, James F. Hranicky wrote: >> Sure, but that's what I want to avoid. The FMA agent should do this by >> itself, but it's not, so I guess I'm just wondering why, or if there's >> a good way to get to do so. If this happens in the middle of the night I >> don't want to have to run the commands by hand. > > Yes, the FMA agent should do this. Can you run 'fmdump -v' and see if > the DE correctly identified the faulted devices? Here you go: # fmdump -v TIME UUID SUNW-MSG-ID Nov 29 16:29:12.1947 e50198f2-2eb9-c58b-d7c5-87aaae5cb935 ZFS-8000-D3 100% fault.fs.zfs.device Problem in: zfs://pool=8e63f0b8e4263e71/vdev=9272c0973ecdb27c Affects: zfs://pool=8e63f0b8e4263e71/vdev=9272c0973ecdb27c FRU: - Nov 30 10:31:48.8844 1a44a780-05c0-cb6e-d44f-f1d8999f40e5 ZFS-8000-D3 100% fault.fs.zfs.device Problem in: zfs://pool=51f1caf6cad1aa2f/vdev=769276842b0efd54 Affects: zfs://pool=51f1caf6cad1aa2f/vdev=769276842b0efd54 FRU: - Dec 11 14:04:57.8803 c46d21e0-200d-43a1-e5db-ae9c9ebf3482 ZFS-8000-D3 100% fault.fs.zfs.device Problem in: zfs://pool=2646e20c1cb0a9d0/vdev=52070de44ec80c15 Affects: zfs://pool=2646e20c1cb0a9d0/vdev=52070de44ec80c15 FRU: - Dec 11 14:42:32.1271 1319464e-7a8c-e65b-962e-db386e90f7f2 ZFS-8000-D3 100% fault.fs.zfs.device Problem in: zfs://pool=2646e20c1cb0a9d0/vdev=724c128cdbc17745 Affects: zfs://pool=2646e20c1cb0a9d0/vdev=724c128cdbc17745 FRU: - I'm not really sure what it means. >> For instance, the zpool command hanging or the system hanging trying to >> reboot normally. > > If the SCSI commands hang forever, then there is nothing that ZFS can > do, as a single write will never return. The more likely case is that > the commands are continually timining out with very long response times, > and ZFS will continue to talk to them forever. The future FMA > integration I mentioned will solve this problem. In the meantime, you > should be able to 'zpool offline' the affected devices by hand. Well, as long as I know which device is affected :-> If "zpool status" doesn't return it may be difficult to figure out. Do you know if the SATA controllers in a Thumper can better handle this problem? > There is also associated work going on to better handle asynchrounous > reponse times across devices. Currently, a single slow device will slow > the entire pool to a crawl. Do you have an idea as to when this might be available? Thanks for all your input, Jim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Netapp to Solaris/ZFS issues
On 12/12/06, James F. Hranicky <[EMAIL PROTECTED]> wrote: Jim Davis wrote: >> Have you tried using the automounter as suggested by the linux faq?: >> http://nfs.sourceforge.net/#section_b > > Yes. On our undergrad timesharing system (~1300 logins) we actually hit > that limit with a standard automounting scheme. So now we make static > mounts of the Netapp /home space and then use amd to make symlinks to > the home directories. Ugly, but it works. This is how we've always done it, but we use amd (am-utils) to manage two maps, a filesystem map and a homes map. The homes map is of all type:=link, so amd handles the link creation for us, plus we only have a handful of mounts on any system. It looks like if each user has a ZFS quota-ed home directory which acts as its own little filesystem, we won't be able to do this anymore, as we'll have to export and mount each user directory separately. Is this the case, or is there a way to export and mount a volume containing zfs quota-ed directories, i.e., have the quota-ed subdirs not necessarily act like they're separate filesystems? This is definitely a feature I'd love to see, whereby one can share the filesystem at a higher point in the tree (aka /pool/a/b, sharing /pool/a, but have "b" as its own filesystem). I know this breaks some of the sharing, but I'd love to have clients be able to mount /pool/a and by way of that see b as well and not have that treated as a separate share. Jim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kickstart hot spare attachment
On Tue, Dec 12, 2006 at 02:08:57PM -0500, James F. Hranicky wrote:
> Sure, but that's what I want to avoid. The FMA agent should do this by itself, but it's not, so I guess I'm just wondering why, or if there's a good way to get it to do so. If this happens in the middle of the night I don't want to have to run the commands by hand.

Yes, the FMA agent should do this. Can you run 'fmdump -v' and see if the DE correctly identified the faulted devices?

> For instance, the zpool command hanging or the system hanging trying to reboot normally.

If the SCSI commands hang forever, then there is nothing that ZFS can do, as a single write will never return. The more likely case is that the commands are continually timing out with very long response times, and ZFS will continue to talk to them forever. The future FMA integration I mentioned will solve this problem. In the meantime, you should be able to 'zpool offline' the affected devices by hand.

There is also associated work going on to better handle asynchronous response times across devices. Currently, a single slow device will slow the entire pool to a crawl.

- Eric
--
Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
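i.e., with the pool name from earlier in this thread and a made-up device name:

  zpool offline zmir c0t3d0    # stop issuing I/O to the slow or hung disk
  zpool status zmir            # the device should now show as OFFLINE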
[zfs-discuss] Re: Re: Sol10u3 -- is "du" bug fixed?
> IIRC you have to re-create entire raid-z pool to get > it fixed - just > rewriting data or upgrading a pool won't do it. You are correct ... Now I have to find some place to stick +1TB of temp files ;) Thanks for the help, Jeb This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Netapp to Solaris/ZFS issues
Jim Davis wrote: >> Have you tried using the automounter as suggested by the linux faq?: >> http://nfs.sourceforge.net/#section_b > > Yes. On our undergrad timesharing system (~1300 logins) we actually hit > that limit with a standard automounting scheme. So now we make static > mounts of the Netapp /home space and then use amd to make symlinks to > the home directories. Ugly, but it works. This is how we've always done it, but we use amd (am-utils) to manage two maps, a filesystem map and a homes map. The homes map is of all type:=link, so amd handles the link creation for us, plus we only have a handful of mounts on any system. It looks like if each user has a ZFS quota-ed home directory which acts as its own little filesystem, we won't be able to do this anymore, as we'll have to export and mount each user directory separately. Is this the case, or is there a way to export and mount a volume containing zfs quota-ed directories, i.e., have the quota-ed subdirs not necessarily act like they're separate filesystems? Jim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
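For reference, the per-user layout being discussed looks roughly like this (the names are made up); because every home directory is its own file system, every one is also its own NFS export, which is what breaks the single static mount:

  zfs create pool/home
  zfs set sharenfs=rw pool/home      # descendants inherit the sharenfs property
  zfs create pool/home/alice
  zfs set quota=2g pool/home/alice
  zfs create pool/home/bob
  zfs set quota=2g pool/home/bob

An NFSv3 client that mounts only server:/pool/home will see empty directories where alice and bob live unless it mounts each of them separately.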
Re: [zfs-discuss] Kickstart hot spare attachment
Eric Schrock wrote:
> On Tue, Dec 12, 2006 at 07:53:32AM -0800, Jim Hranicky wrote:
>> - I know I can attach it via the zpool commands, but is there a way to kickstart the attachment process if it fails to attach automatically upon disk failure?
>
> Yep. Just do a 'zpool replace zmir <failed-device> <spare-device>'. This is what the FMA agent does in response to failed drive faults.

Sure, but that's what I want to avoid. The FMA agent should do this by itself, but it's not, so I guess I'm just wondering why, or if there's a good way to get it to do so. If this happens in the middle of the night I don't want to have to run the commands by hand.

>> - Is there something inherent to an old SCSI bus that causes spun-down drives to hang the system in some way, even if it's just hanging the zpool/zfs system calls? Would a thumper be more resilient to this?
>
> There are a number of drive failure modes that result in arbitrarily misbehaving drives, as opposed to drives which fail to open entirely. We are working on a more complete FMA diagnosis engine which will be able to diagnose this type of failure and proactively fault the device.
>
> I'm not sure exactly what behavior you're seeing by 'spun-down drives', so this may or may not address your issue.

For instance, the zpool command hanging or the system hanging trying to reboot normally.

Jim

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Sol10u3 -- is "du" bug fixed?
Jeb Campbell wrote: After upgrade you did actually re-create your raid-z pool, right? No, but I did "zpool upgrade -a". Hmm, I guess I'll try re-writing the data first. I know you have to do that if you change compression options. Ok -- rewriting the data doesn't work ... I'll create a new temp pool and see what that does ... then I'll investigate options for recreating my big pool ... Unfortunately, this bug is only fixed when you create the pool on the new bits. --matt ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SunCluster HA-NFS from Sol9/VxVM to Sol10u3/ZFS
Hello Matthew,

Tuesday, December 12, 2006, 7:13:47 PM, you wrote:

MCA> We are currently working on a plan to upgrade our HA-NFS cluster that uses HA-StoragePlus and VxVM 3.2 on Solaris 9 to Solaris 10 and ZFS. Is there a known procedure or best practice for this? I have enough free disk space to recreate all the filesystems and copy the data if necessary, but would like to avoid copying if possible.

You will have to copy data. Also keep in mind that ZFS is supported in Sun Cluster 3.2, which is not out yet (should be really soon now).

MCA> Also, I am considering what type of zpools to create. I have a SAN with T3Bs and SE3511s. Since neither of these can work as a JBOD (at least that is what I remember) I guess I am going to have to add in the LUNs in a mirrored zpool of the RAID-5 LUNs?

1. Those boxes can work as JBODs, but not in a clustered environment.
2. The configuration of the arrays - well, it depends. I would suggest doing redundancy at the ZFS level at least. For some performance numbers on those arrays with ZFS, see the list archives.

--
Best regards,
Robert    mailto:[EMAIL PROTECTED]    http://milek.blogspot.com

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kickstart hot spare attachment
On Tue, Dec 12, 2006 at 07:53:32AM -0800, Jim Hranicky wrote:
> - I know I can attach it via the zpool commands, but is there a way to kickstart the attachment process if it fails to attach automatically upon disk failure?

Yep. Just do a 'zpool replace zmir <failed-device> <spare-device>'. This is what the FMA agent does in response to failed drive faults.

> - In this instance the spare is twice as big as the other drives -- does that make a difference?

Nope. The 'size' of a replacing vdev is the minimum size of its two children, so it won't affect anything.

> - Is there something inherent to an old SCSI bus that causes spun-down drives to hang the system in some way, even if it's just hanging the zpool/zfs system calls? Would a thumper be more resilient to this?

There are a number of drive failure modes that result in arbitrarily misbehaving drives, as opposed to drives which fail to open entirely. We are working on a more complete FMA diagnosis engine which will be able to diagnose this type of failure and proactively fault the device.

I'm not sure exactly what behavior you're seeing by 'spun-down drives', so this may or may not address your issue.

- Eric
--
Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Sol10u3 -- is "du" bug fixed?
Hello Jeb,

Tuesday, December 12, 2006, 7:11:30 PM, you wrote:

>> After upgrade you did actually re-create your raid-z pool, right?

JC> No, but I did "zpool upgrade -a".
JC> Hmm, I guess I'll try re-writing the data first. I know you have to do that if you change compression options.

IIRC you have to re-create the entire raid-z pool to get it fixed - just rewriting data or upgrading a pool won't do it.

--
Best regards,
Robert    mailto:[EMAIL PROTECTED]    http://milek.blogspot.com

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Performance problems during 'destroy' (and bizarre Zone problem as well)
Setting:
We've been operating in the following setup for well over 60 days.
- E2900 (24 x 92)
- 2 2Gbps FC to EMC SAN
- Solaris 10 Update 2 (06/06)
- ZFS with compression turned on
- Global zone + 1 local zone (sparse)
- Local zone is fed ZFS clones from the global Zone

Daily Routine:
- Shutdown local Zone
- Recreate ZFS clones
- Restart local Zone
- End to end timing for this refresh is anywhere between 5 to 30 minutes. The bulk of the time is spent in the ZFS 'destroy' phase.

Problem:
- We had extensive read/write activity in the global and local Zones yesterday. I estimate that we wrote 1/4 of one large ZFS filesystem, ~160GB of write.
- This morning we had a fair amount of activity on the system when the refresh started; zpool was reporting around 150MB/s of write.
- Our 'zfs destroy' commands took what I consider 'normal'; the FS that was fielding the bulk of the I/O took 15 minutes. During this time everything was crawling or, more accurately, came to a dead stop. A simple 'rm' would hang. I've reported this problem to the forum in the past. I also believe the fix for the problem is in Update 3 for Solaris 10, right?
- Surprisingly, today the ZFS 'snapshot & clone' took an inordinate amount of time. I observed each snapshot & clone activity together took 10+ minutes. In the past the same activity has taken no more than a few seconds even during busy times. The total end-to-end timing for all snapshots/clones was a whopping 1:44:00!!!
- Even more surprising was that the local Zone refused to start up (zoneadm -z bluenile boot) with no error messages.
- I was able to start the Zone only an hour or so after the completion of the ZFS commands.

Questions:
- Why is the destroy phase taking so long?
- What can explain the unduly long snapshot/clone times?
- Why didn't the Zone start up?
- More surprisingly, why did the Zone start up after an hour?

Thanks in advance.

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS Storage Pool advice
This question is concerning ZFS. We have a Sun Fire V890 attached to an EMC disk array. Here's our plan to incorporate ZFS: On our EMC storage array we will create 3 LUNs. Now how would ZFS be used for the best performance? What I'm trying to ask is if you have 3 LUNs and you want to create a ZFS storage pool, would it be better to have a storage pool per LUN or combine the 3 LUNs as one big disk under ZFS and create 1 huge ZFS storage pool?

Example:
LUN1 200gb ZFS Storage Pool "pooldata1"
LUN2 200gb ZFS Storage Pool "pooldata2"
LUN3 200gb ZFS Storage Pool "pooldata3"
or
LUN 600gb ZFS Storage Pool "alldata"

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] SunCluster HA-NFS from Sol9/VxVM to Sol10u3/ZFS
We are currently working on a plan to upgrade our HA-NFS cluster that uses HA-StoragePlus and VxVM 3.2 on Solaris 9 to Solaris 10 and ZFS. Is there a known procedure or best practice for this? I have enough free disk space to recreate all the filesystems and copy the data if necessary, but would like to avoid copying if possible.

Also, I am considering what type of zpools to create. I have a SAN with T3Bs and SE3511s. Since neither of these can work as a JBOD (at least that is what I remember) I guess I am going to have to add in the LUNs in a mirrored zpool of the RAID-5 LUNs?

We are at the extreme start of this project and I was hoping for some guidance as to what direction to start in.

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re[2]: [zfs-discuss] Re: zpool import takes to long with large numbers of file systems
Hello Jason, Thursday, December 7, 2006, 11:18:17 PM, you wrote: JJWW> Hi Luke, JJWW> That's terrific! JJWW> You know you might be able to tell ZFS which disks to look at. I'm not JJWW> sure. It would be interesting, if anyone with a Thumper could comment JJWW> on whether or not they see the import time issue. What are your load JJWW> times now with MPXIO? On x4500 importing a pool made of 44 disks takes about 13 seconds. -- Best regards, Robertmailto:[EMAIL PROTECTED] http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Sol10u3 -- is "du" bug fixed?
> After upgrade you did actually re-create your raid-z > pool, right? No, but I did "zpool upgrade -a". Hmm, I guess I'll try re-writing the data first. I know you have to do that if you change compression options. Ok -- rewriting the data doesn't work ... I'll create a new temp pool and see what that does ... then I'll investigate options for recreating my big pool ... Thanks for the info, Jeb This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re[2]: [zfs-discuss] Uber block corruption?
Hello Toby,

Tuesday, December 12, 2006, 4:18:54 PM, you wrote:

TT> On 12-Dec-06, at 9:46 AM, George Wilson wrote:
>> Also note that the UB is written to every vdev (4 per disk) so the chances of all UBs being corrupted is rather low.

It depends actually - if all your vdevs are on the same array with write-back cache set to on, you actually can end up with all UBs corrupted - at least in theory.

--
Best regards,
Robert    mailto:[EMAIL PROTECTED]    http://milek.blogspot.com

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Need Clarification on ZFS quota property.
> Hi All,
>
> Assume the device c0t0d0 size is 10 KB. I created ZFS file system on this
> $ zpool create -f mypool c0t0d0s2

This creates a pool on the entire slice.

> and to limit the size of ZFS file system I used quota property.
> $ zfs set quota = 5000K mypool

Note that this sets a quota only on the default filesystem that was created along with the zpool. There may be other filesystems created on the pool with different quotas. You are not setting a quota on the pool itself.

> Which 5000 K bytes are belongs (or reserved) to mypool first 5000KB or last 5000KB or random ?

All blocks belong to the pool. The /mypool filesystem may be allocated any particular space there depending on other filesystems and layout. Attempts to allocate space greater than 5000K will fail.

> UFS and VxFS file systems have options to limit the size of file system on the device (E.g. We can limit the size offrom 1 block to some nth block . Like this is there any sub command to limit the size of ZFS file system from 1 block to some n th block ?

I'm not sure what you're saying here. UFS and VxFS normally take the entire space of a disk slice or volume. The pool creation does the same thing. Can you clarify what you mean by limiting the size of UFS or VxFS?

--
Darren Dunham [EMAIL PROTECTED]
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
< This line left intentionally blank to confuse you. >

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
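To make the distinction concrete, a small sketch (the sizes and names are illustrative):

  zpool create mypool c0t0d0s2       # the pool owns every block on the slice
  zfs set quota=5000K mypool         # caps the top-level 'mypool' file system (and anything under it)
  zfs create mypool/scratch
  zfs set quota=2000K mypool/scratch # a child file system can carry its own, smaller quota
  zfs list                           # shows per-file-system USED/AVAIL against the shared pool space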
[zfs-discuss] Re: ZFS Usage in Warehousing (lengthy intro)
> But seriously, the big issue with SCSI, is that the SCSI commands are sent > over the SCSI bus at the original (legacy) rate of 5 Mbits/Sec in 8-bit > mode. Actually, this isn't true on the newest (Ultra320) SCSI systems, though I don't know if the 3320 supports packetized SCSI. It's definitely an issue for older SCSI buses if the reads and writes are small, less than a megabyte, say. (For data warehousing applications you should see larger reads, as long as your data is laid out contiguously on disk.) There's rather a nice chart at http://www.hitachigst.com/hdd/library/whitepap/tech/hdwpacket.htm showing how the overhead grows with the speed of the bus. > And since it takes an average of 5 SCSI commands to do something useful Urm? What's wrong with just READ(10) or WRITE(10)? > Also, it takes a lot of time to send those commands - so you have latency. Not much compared to the rotational latency if you're actually reading from media, though. (Measured latency for a read operation with disconnect/reconnect on a parallel SCSI bus is around 22 µs. [That's microseconds in case your mail program/browser doesn't get it right.]) > This is the main reason why SCSI is EOL I presume you mean parallel SCSI? I'd argue that the larger reason was the cost and cooling requirements of parallel cabling; SAS seems to be alive, at least, if not taking off quickly. FC, SAS, and SATA all have lower overhead since they're point-to-point and don't need to arbitrate (or drive multiple receivers). How noticeable this is depends on your application. For large sequential I/O, the data transfer time dominates the overhead; for random I/O, the seek time and rotational latency dominates the overhead. Only in the cases where you're doing fairly small sequential I/Os, you have a very fast caching controller, or you have so many spindles on one connection that you have enough I/O operations in flight to keep the bus busy, will this matter much. For this application, with a mix of random & sequential I/O, FC disks, or other disks with very low seek+rotation times, might perform quite a lot better than inexpensive disks with longer seek+rotation times. I'd be concerned that the updates would dominate performance, unless they're happening at a rate of fewer than about 50/second/spindle. Anton This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sol10u3 -- is "du" bug fixed?
Hello Jeb, Tuesday, December 12, 2006, 6:04:36 PM, you wrote: JC> I updated to Sol10u3 last night, and I'm still seeing different JC> differences between "du -h" and "ls -h". JC> "du" seems to take into account raidz and compression -- if this is correct, please let me know. JC> It makes sense that "du" reports actual disk usage, but this JC> makes some scripts I wrote very broken (need real sizes of files JC> in a directory to be able to put them on dvd isos). JC> Sol10u3 on 3 disk RaidZ: JC> [EMAIL PROTECTED]:~/burnout/2006-11-30]$ ls -lh JMS-data-1-2006-11-30.iso JC> -rw-r--r-- 1 splus splus 3.5G Dec 1 10:15 JMS-data-1-2006-11-30.iso JC> [EMAIL PROTECTED]:~/burnout/2006-11-30]$ du -hs JMS-data-1-2006-11-30.iso JC> 5.2GJMS-data-1-2006-11-30.iso After upgrade you did actually re-create your raid-z pool, right? -- Best regards, Robertmailto:[EMAIL PROTECTED] http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: Re: Snapshots impact on performance
Hello Chris, Wednesday, December 6, 2006, 6:23:48 PM, you wrote: CG> One of our file servers internally to Sun that reproduces this CG> running nv53 here is the dtrace output: Any conclusions yet? -- Best regards, Robertmailto:[EMAIL PROTECTED] http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Sol10u3 -- is "du" bug fixed?
I updated to Sol10u3 last night, and I'm still seeing differences between "du -h" and "ls -h". "du" seems to take into account raidz and compression -- if this is correct, please let me know. It makes sense that "du" reports actual disk usage, but this makes some scripts I wrote very broken (I need the real sizes of files in a directory to be able to put them on dvd isos).

Sol10u3 on 3 disk RaidZ:

[EMAIL PROTECTED]:~/burnout/2006-11-30]$ ls -lh JMS-data-1-2006-11-30.iso
-rw-r--r-- 1 splus splus 3.5G Dec 1 10:15 JMS-data-1-2006-11-30.iso
[EMAIL PROTECTED]:~/burnout/2006-11-30]$ du -hs JMS-data-1-2006-11-30.iso
5.2G   JMS-data-1-2006-11-30.iso

Thanks, Jeb

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
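[Editorial note: for scripts like the DVD-layout one above, one workaround is to sum the logical file sizes from ls -l (column 5, in bytes) instead of the allocated blocks that du reports, since du on raidz charges for parity and credits compression. A minimal, untested sketch; the directory name is hypothetical:

  # total logical bytes of the files to burn, ignoring raidz/compression effects
  $ ls -l /export/burnout/2006-11-30 | awk '/^-/ {sum += $5} END {print sum}'

ls -l reports the plain file length, which is the number that matters when sizing an ISO.]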
Re: [zfs-discuss] ZFS Usage in Warehousing (lengthy intro)
On Dec 12, 2006, at 10:02, Al Hopper wrote:

> Another possibility, which is on my todo list to check out, is:
> http://www.norcotek.com/item_detail.php?categoryid=8&modelno=DS-1220

I would not go with this device. I picked up one along with 12 500GB SATA drives, hoping to make a dumping ground on the network for my servers to rsync to. It's possible I have something configured or tuned incorrectly in terms of Solaris & ZFS (and if so, I can't figure out what), but performance is terrible compared to my existing dumping ground based on a cheap-o raid-5 card & FreeBSD.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: zpool mirror
> Not right now (without a bunch of shell-scripting). I'm working on
> being able to "send" a whole tree of filesystems & their snapshots.
> Would that do what you want?

Exactly! When do you think that (really useful) feature will be available?

thanks, gino

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
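[Editorial note: until that feature exists, the "bunch of shell-scripting" workaround looks roughly like the sketch below. This is only an illustration; the pool, filesystem, and snapshot names are made up, and it sends full (not incremental) streams:

  # snapshot and send every filesystem under tank/home, one stream per filesystem
  $ for fs in $(zfs list -H -r -o name tank/home); do
      zfs snapshot "$fs@migrate"
      zfs send "$fs@migrate" > /backup/$(echo "$fs" | tr '/' '_').zfs
    done

Snapshots taken one-by-one in a loop are not a single consistent point in time, and this doesn't carry over properties or earlier snapshots, which is exactly why the built-in "send a whole tree" feature would be better.]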
[zfs-discuss] Re: Uber block corruption?
> [...] there is no possibility of referencing an overwritten > block unless you have to back off more than two uberblocks. At this > point, blocks that have been overwritten will show up as corrupted (bad > checksums). Hmmm. Is there some way we can warn the user to scrub their pool because we had trouble reading an überblock? (Maybe some FMA rules about what to do if an überblock read fails?) This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: zfs exported a live filesystem
Jim Hranicky wrote:
>> Now having said that I personally wouldn't have expected that zpool
>> export should have worked as easily as that while there were shared
>> filesystems. I would have expected that exporting the pool should have
>> attempted to unmount all the ZFS filesystems first - which would have
>> failed without a -f flag because they were shared.
>>
>> So IMO it is a bug or at least an RFE.
>
> Ok, where should I file an RFE?

http://bugs.opensolaris.org/

-- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: ZFS related kernel panic
> UFS will panic on EIO also. Most other file systems, too.

In which cases will UFS panic on an I/O error? A quick browse through the UFS code shows several cases where we can panic if we have bad metadata on disk, but none if a disk read (or write) fails altogether.

If UFS fails to read a block, it returns EIO (in most cases, occasionally a different error depending on the context) to its caller. (In a few cases, it can continue past the error; for instance, if it can't read a cylinder group header and wants to allocate a block there, it will go on to a different cylinder group.) If UFS fails to write a block, the buffer cache or page cache will just keep retrying.

QFS won't even panic on bad metadata, unless that's enabled with an /etc/system variable; it will just return errors to its caller. (It won't panic on I/O errors at all.)

As for why expectations for ZFS are higher? I suspect that it's primarily because ZFS has been sold (deservedly) as being very good at dealing with hardware problems. This means that it should not only detect problems, but continue on past them whenever possible. Ditto blocks are a first step in this direction. Bringing down the machine when a read or write fails is so 1980s; ZFS needs a bit of fine-tuning here.

We don't need to be defensive. ZFS is a new file system. It will take some time to work all the quirks out and it will take some time to eliminate all the panic cases. But we will.

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Corruption
Bill Casale wrote:
> Please reply directly to me. Seeing the message below. Is it possible
> to determine exactly which file is corrupted? I was thinking the
> OBJECT/RANGE info may be pointing to it but I don't know how to equate
> that to a file.

This is bug:

  6410433 'zpool status -v' would be more useful with filenames

and i'm actually working on it right now!

eric

> # zpool status -v
>   pool: u01
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         u01         ONLINE       0     0     6
>           c1t102d0  ONLINE       0     0     6
>
> errors: The following persistent errors have been detected:
>
>           DATASET  OBJECT   RANGE
>           u01      4741362  600178688-600309760
>
> Thanks, Bill

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Usage in Warehousing (lengthy intro)
On Fri, 8 Dec 2006, Jochen M. Kaiser wrote: > Dear all, > > we're currently looking forward to restructure our hardware environment for > our datawarehousing product/suite/solution/whatever. > > We're currently running the database side on various SF V440's attached via > dual FC to our SAN backend (EMC DMX3) with UFS. The storage system is > (obviously in a SAN) shared between many systems. Performance is mediocre > in terms of raw throughput at 70-150MB/sec. (lengthy, sequential reads due to > full table scan operations on the db side) and excellent is terms of I/O and > service times (averaging at 1,7ms according to sar). > >From our applications perspective sequential read is the most important > >factor. > Read-to-Write ratio is almost 20:1. > > We now want to consolidate our database servers (Oracle, btw.) to a pair of > x4600 systems running Solaris 10 (which we've already tested in a benchmark > setup). The whole system was still I/O-bound, even though the backend (3510, > 12x146GB, QFS, RAID10) delivered a sustained data rate of 250-300MB/sec. > > I'd like to target a sequential read performance of 500++MB/sec while reading > from the db on multiple tablespaces. We're experiencing massive data volume > growth of about 100% per year and are therefore looking both for an > expandable, > yet "cheap" solution. We'd like to use a DAS solution, because we had negative > experiences with SAN in the past in terms of tuning and throughput. > > Being a friend of simplicity I was thinking about using a pair (or more) of > 3320 > SCSI JBODs with multiple RAIDZ and/or RAID10 zfs disk pools on which we'd Have you not heard that SCSI is dead? :) But seriously, the big issue with SCSI, is that the SCSI commands are sent over the SCSI bus at the original (legacy) rate of 5 Mbits/Sec in 8-bit mode. And since it takes an average of 5 SCSI commands to do something useful, you can't send enough commands over the bus to busy out a modern SCSI drive. Even a single drive on a single SCSI bus. Also, it takes a lot of time to send those commands - so you have latency. And everyone understands how latency affects throughput on a LAN (or WAN) .. same issue with SCSI. This is the main reason why SCSI is EOL and could not be extended without breaking the existing standards. While I understand you don't want to build a SAN, an alternative would be a Fibre Channel (FC) box that presents SATA drives. This would be a DAS solution with one or two connections to (Qlogic) FC controllers in the host - IOW not a SAN and there is no FC switch required. Many such boxes are designed to provide expansion to a FC based hardware RAID box. For example, the DS4000 EXP100 Storage Expansion Unit from IBM. In your application you'd need to find something that supports FC rates of 4Gb/Sec, if possible. Another possiblity, which is on my todo list to checkout, is: http://www.norcotek.com/item_detail.php?categoryid=8&modelno=DS-1220 Now if I could find a Marvell based equivalent to the: http://www.supermicro.com/products/accessories/addon/AoC-SAT2-MV8.cfm with external SATA ports, life would be great. Another card with external SATA ports that works with Solaris (via the si3124 driver) is: http://www.newegg.com/product/product.asp?item=N82E16816124003 which only has a 32-bit PCI connection. :( > place the database. If we need more space we'll simply connect yet another > JBOD. I'd calculate 1-2 PCIe U320 controllers (w/o raid) per jbod, starting > with a > minimum of 4 controllers per server. 
> > Regarding ZFS I'd be very interested to know, whether someone else is running > a similar setup and can provide me with some hints or point me at some > caveats. > > I'd be also very interested in the cpu usage of such a setup for the zfs raidz > pools. After searching this forum I found the rule of thumb that 200MB/sec > throughput roughly consume one 2GHz Opteron cpu, but am hoping that someone > can provide me with some in depth data. (Frankly I can hardly imagine that > this > holds true for reads). > > I'd be also be interested in you opinion on my targeted setup, so if you have > any comments - go ahead. > > Any help is appreciated, > > Jochen > > P.S. Fallback scenarios would be Oracle with ASM or a (zfs/ufs) SAN setup. > Regards, Al Hopper Logical Approach Inc, Plano, TX. [EMAIL PROTECTED] Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005 OpenSolaris Governing Board (OGB) Member - Feb 2006 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
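[Editorial note: since Jochen asked about layout, the "RAID10"-style zfs pool he describes would be built as a stripe of mirrors, with each side of every mirror on a different JBOD/controller so a whole enclosure or HBA can fail. A rough sketch only; the controller/target names below are made up for illustration:

  # stripe of mirrors across two JBODs (c2 = first enclosure, c3 = second)
  # zpool create dwpool \
      mirror c2t0d0 c3t0d0 \
      mirror c2t1d0 c3t1d0 \
      mirror c2t2d0 c3t2d0 \
      mirror c2t3d0 c3t3d0

  # grow the pool later by adding more mirror pairs
  # zpool add dwpool mirror c2t4d0 c3t4d0

Mirrors cost half the raw capacity, as the original post notes, but for a sequential-read-heavy warehouse they spread reads across both sides of each mirror.]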
[zfs-discuss] Kickstart hot spare attachment
For my latest test I set up a stripe of two mirrors with one hot spare like so:

  zpool create -f -m /export/zmir zmir mirror c0t0d0 c3t2d0 mirror c3t3d0 c3t4d0 spare c3t1d0

I spun down c3t2d0 and c3t4d0 simultaneously, and while the system kept running (my tar over NFS barely hiccuped), the zpool command hung again. I rebooted the machine with -dnq, and although the system didn't come up the first time, it did after a fsck and a second reboot. However, once again the hot spare isn't getting used:

# zpool status -v
  pool: zmir
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist
        for the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Tue Dec 12 09:15:49 2006
config:

        NAME        STATE     READ WRITE CKSUM
        zmir        DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            c0t0d0  ONLINE       0     0     0
            c3t2d0  UNAVAIL      0     0     0  cannot open
          mirror    DEGRADED     0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t4d0  UNAVAIL      0     0     0  cannot open
        spares
          c3t1d0    AVAIL

A few questions:

- I know I can attach it via the zpool commands, but is there a way to
  kickstart the attachment process if it fails to attach automatically
  upon disk failure? (A manual sketch follows this message.)
- In this instance the spare is twice as big as the other drives -- does
  that make a difference?
- Is there something inherent to an old SCSI bus that causes spun-down
  drives to hang the system in some way, even if it's just hanging the
  zpool/zfs system calls? Would a thumper be more resilient to this?

Jim

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
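[Editorial note: on the first question, one way to kick the replacement off by hand (a sketch only; it does not address whatever is hanging the zpool command) is to tell ZFS explicitly to substitute the spare for the failed device, using the pool and device names from the status output above:

  # manually put the hot spare in place of the failed mirror half
  # zpool replace zmir c3t2d0 c3t1d0
  # zpool status zmir     # c3t1d0 should now appear under the mirror as an in-use spare

Once the failed disk is physically replaced and resilvered (zpool replace zmir c3t2d0 <new disk>), the spare can be detached and returned to the AVAIL list with zpool detach.]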
[zfs-discuss] Re: Netapp to Solaris/ZFS issues
NetApp can actually grow their RAID groups, but they recommend adding an entire RAID group at once instead. If you add a disk to a RAID group on NetApp, I believe you need to manually start a reallocate process to balance data across the disks. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Uber block corruption?
On 12-Dec-06, at 9:46 AM, George Wilson wrote: Also note that the UB is written to every vdev (4 per disk) so the chances of all UBs being corrupted is rather low. Furthermore the time window where UBs are mutually inconsistent would be very short, since they'd be updated together? --Toby Thanks, George Darren Dunham wrote: DD> To reduce the chance of it affecting the integrety of the filesystem, DD> there are multiple copies of the UB written, each with a checksum and a DD> generation number. When starting up a pool, the oldest generation copy DD> that checks properly will be used. If the import can't find any valid DD> UB, then it's not going to have access to any data. Think of a UFS DD> filesystem where all copies of the superblock are corrupt. Actually the latest UB, not the oldest. My *other* oldest... yeah. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Netapp to Solaris/ZFS issues
Hello Jim,

Wednesday, December 6, 2006, 3:28:53 PM, you wrote:

JD> We have two aging Netapp filers and can't afford to buy new Netapp gear,
JD> so we've been looking with a lot of interest at building NFS fileservers
JD> running ZFS as a possible future approach. Two issues have come up in the
JD> discussion
JD> - Adding new disks to a RAID-Z pool (Netapps handle adding new disks very
JD> nicely). Mirroring is an alternative, but when you're on a tight budget
JD> losing N/2 disk capacity is painful.

Actually, you can add another raid-z group to the pool. I believe it's the same as what NetApp is doing (instead of actually growing the raid group).

JD> - The default scheme of one filesystem per user runs into problems with
JD> linux NFS clients; on one linux system, with 1300 logins, we already have
JD> to do symlinks with amd because linux systems can't mount more than about
JD> 255 filesystems at once. We can of course just have one filesystem
JD> exported, and make /home/student a subdirectory of that, but then we run
JD> into problems with quotas -- and on an undergraduate fileserver, quotas
JD> aren't optional!

It can with 2.6 kernels. However, there are other problems; we ended up with a limit of around 700.

-- Best regards, Robert    mailto:[EMAIL PROTECTED] http://milek.blogspot.com
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
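[Editorial note: for the first point, the "add another raid-z group" approach looks roughly like the sketch below; pool and device names are made up. The new group is striped with the existing one, so capacity and bandwidth grow, but existing data is not rebalanced onto the new disks:

  # original pool: one 4-disk raidz group
  # zpool create home raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0

  # later, grow it by adding a second raidz group
  # zpool add home raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0
  # zpool status home     # now shows two raidz vdevs striped together
]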
Re: [zfs-discuss] Uber block corruption?
[EMAIL PROTECTED] wrote:

  Hello Casper,

  Tuesday, December 12, 2006, 10:54:27 AM, you wrote:

  >> So 'a' UB can become corrupt, but it is unlikely that 'all' UBs will
  >> become corrupt through something that doesn't also make all the data
  >> also corrupt or inaccessible.

  CDSC> So how does this work for data which is freed and overwritten; does
  CDSC> the system make sure that none of the data referenced by any of the
  CDSC> old ueberblocks is ever overwritten?

  Why should it? If blocks are not in use according to the current UB, I
  guess you can safely assume they are free.

  What if a newer UB is corrupted and you fall back to an older one?

  Casper

A block freed in transaction group N cannot be reused until transaction group N+3, so there is no possibility of referencing an overwritten block unless you have to back off more than two uberblocks. At this point, blocks that have been overwritten will show up as corrupted (bad checksums).

-Mark

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Uber block corruption?
Also note that the UB is written to every vdev (4 per disk) so the chances of all UBs being corrupted is rather low. Thanks, George Darren Dunham wrote: DD> To reduce the chance of it affecting the integrety of the filesystem, DD> there are multiple copies of the UB written, each with a checksum and a DD> generation number. When starting up a pool, the oldest generation copy DD> that checks properly will be used. If the import can't find any valid DD> UB, then it's not going to have access to any data. Think of a UFS DD> filesystem where all copies of the superblock are corrupt. Actually the latest UB, not the oldest. My *other* oldest... yeah. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
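[Editorial note: if you're curious what is actually on disk, zdb can display the pool's active uberblock. A sketch only, assuming a pool named tank; the exact flags and output format vary between builds:

  # display the currently active uberblock (transaction group, timestamp, etc.)
  # zdb -u tank
]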
Re: [zfs-discuss] How to do DIRECT IO on ZFS ?
Maybe this will help: http://blogs.sun.com/roch/entry/zfs_and_directio -r dudekula mastan writes: > Hi All, > > We have directio() system to do DIRECT IO on UFS file system. Can > any one know how to do DIRECT IO on ZFS file system. > > Regards > Masthan > > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS Usage in Warehousing (no more lengthy intro)
Hello Jochen, Sunday, December 10, 2006, 10:51:57 AM, you wrote: JMK> James, >> Just a thought. >> >> have you thought about giving thumper x4500's a trial >> for this work >> load? Oracle would seem to be IO limited in the end >> so 4 cores may be >> enough to keep oracle happy when linked with upto >> 2GB/s disk IO speed. JMK> === JMK> Actually yes, however I've doubts in regard to scalability JMK> of cpu power. I'd imagine that a RaidZ setup will increase JMK> cpu usage of zfs, so Mirroring will be the way to go. JMK> I've also browsed some info on greenplum and other appliance JMK> vendors. However none are listed as strategic products for our JMK> company (forcing a lengthy assessment process), support/consulting JMK> in Germany is usually non-existent and a port of our current setup JMK> is difficult at best. JMK> I've asked Robert Milkowski (milek.blogspot.com) if he can provide JMK> me with some cpu figures from his throughput benchmarks. It's not that bad with CPU usage. For example with RAID-Z2 while doing scrub I get something like 800MB/s read from disks (550-600MB/s from zpool iostat perspective) and all four cores are mostly consumed - I get something like 10% idle on each cpu. -- Best regards, Robertmailto:[EMAIL PROTECTED] http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
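[Editorial note: to reproduce that kind of measurement on your own pool, a rough sketch (pool name hypothetical) is to start a scrub, which reads every allocated block, and watch pool bandwidth and CPU idle side by side:

  # zpool scrub tank
  # zpool iostat tank 5    # pool-level read bandwidth during the scrub
  # mpstat 5               # per-CPU usr/sys/idle while the scrub runs
  # zpool status tank      # shows scrub progress and completion
]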
Re: [zfs-discuss] ZFS Corruption
Bill, If you want to find the file associated with the corruption you could do a "find /u01 -inum 4741362" or use the output of "zdb -d u01" to find the object associated with that id. Thanks, George Bill Casale wrote: Please reply directly to me. Seeing the message below. Is it possible to determine exactly which file is corrupted? I was thinking the OBJECT/RANGE info may be pointing to it but I don't know how to equate that to a file. # zpool status -v pool: u01 state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scrub: none requested config: NAMESTATE READ WRITE CKSUM u01 ONLINE 0 0 6 c1t102d0 ONLINE 0 0 6 errors: The following persistent errors have been detected: DATASET OBJECT RANGE u01 4741362 600178688-600309760 Thanks, Bill ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: Re[2]: [zfs-discuss] Uber block corruption?
>Hello Casper, > >Tuesday, December 12, 2006, 10:54:27 AM, you wrote: > >>>So 'a' UB can become corrupt, but it is unlikely that 'all' UBs will >>>become corrupt through something that doesn't also make all the data >>>also corrupt or inaccessible. > > >CDSC> So how does this work for data which is freed and overwritten; does >CDSC> the system make sure that none of the data referenced by any of the >CDSC> old ueberblocks is ever overwritten? > >Why it should? If blocks are not used due to current UB I guess you >can safely assume they are free. What if a newer UB is corrupted and you fall back to an older one? Casper ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Corruption
Hello Bill,

Tuesday, December 12, 2006, 2:34:01 PM, you wrote:

BC> Please reply directly to me. Seeing the message below.
BC> Is it possible to determine exactly which file is corrupted?
BC> I was thinking the OBJECT/RANGE info may be pointing to it
BC> but I don't know how to equate that to a file.

BC> # zpool status -v
BC>   pool: u01
BC>  state: ONLINE
BC> status: One or more devices has experienced an error resulting in data
BC>         corruption. Applications may be affected.
BC> action: Restore the file in question if possible. Otherwise restore the
BC>         entire pool from backup.
BC>    see: http://www.sun.com/msg/ZFS-8000-8A
BC>  scrub: none requested
BC> config:
BC>
BC>         NAME        STATE     READ WRITE CKSUM
BC>         u01         ONLINE       0     0     6
BC>           c1t102d0  ONLINE       0     0     6
BC>
BC> errors: The following persistent errors have been detected:
BC>
BC>           DATASET  OBJECT   RANGE
BC>           u01      4741362  600178688-600309760
                       ^^^^^^^
That OBJECT value is the inode number, so just use find to locate the file. There's an RFE for this, so that zpool status will give you actual file names.

-- Best regards, Robert    mailto:[EMAIL PROTECTED] http://milek.blogspot.com
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
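[Editorial note: concretely, assuming the u01 dataset is mounted at /u01 (check with 'zfs get mountpoint u01' if unsure), the lookup is:

  # map the object number from zpool status back to a path
  # find /u01 -inum 4741362 -print

This walks the whole filesystem, so it can take a while on a large pool.]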
Re: [zfs-discuss] How to do DIRECT IO on ZFS ?
Hello dudekula, Tuesday, December 12, 2006, 9:36:24 AM, you wrote: > Hi All, We have directio() system to do DIRECT IO on UFS file system. Can any one know how to do DIRECT IO on ZFS file system. Right now you can't. -- Best regards, Robert mailto:[EMAIL PROTECTED] http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS Corruption
Please reply directly to me. Seeing the message below. Is it possible to determine exactly which file is corrupted? I was thinking the OBJECT/RANGE info may be pointing to it but I don't know how to equate that to a file.

# zpool status -v
  pool: u01
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        u01         ONLINE       0     0     6
          c1t102d0  ONLINE       0     0     6

errors: The following persistent errors have been detected:

          DATASET  OBJECT   RANGE
          u01      4741362  600178688-600309760

Thanks, Bill

-- Bill Casale - TSE, OS Team, 1 Network Drive, Burlington, MA. 01802, Sun Microsystems

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: zfs exported a live filesystem
For the record, this happened with a new filesystem. I didn't muck about with an old filesystem while it was still mounted; I created a new one, mounted it and then accidentally exported it.

> > Except that it doesn't:
> >
> > # mount /dev/dsk/c1t1d0s0 /mnt
> > # share /mnt
> > # umount /mnt
> > umount: /mnt busy
> > # unshare /mnt
> > # umount /mnt
>
> If you umount -f it will though!

Well, sure, but I was still surprised that it happened anyway.

> The system is working as designed, the NFS client did
> what it was supposed to do. If you brought the pool back in
> again with zpool import things should have picked up where they left off.

Yep -- an import/shareall made the FS available again.

> What's more, you were probably running as root when you
> did that so you got what you asked for - there is only so much protection
> we can give without being annoying!

Sure, but there are still safeguards in place even when running things as root, such as requiring "umount -f" as above, or warning you when running format on a disk with mounted partitions. Since this appeared to be an operation that may warrant such a safeguard, I thought I'd check and see if this was to be expected or if a safeguard should be put in. Annoying isn't always bad :->

> Now having said that I personally wouldn't have
> expected that zpool export should have worked as easily as that while
> there were shared filesystems. I would have expected that exporting
> the pool should have attempted to unmount all the ZFS filesystems first -
> which would have failed without a -f flag because they were shared.
>
> So IMO it is a bug or at least an RFE.

Ok, where should I file an RFE?

Jim

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
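[Editorial note: for anyone who lands in the same spot, the recovery Jim describes amounts to roughly the following; the pool name is hypothetical, and shareall assumes the filesystems have sharenfs set or /etc/dfs/dfstab entries:

  # bring the accidentally-exported pool back and re-share its filesystems
  # zpool import tank
  # zfs mount -a          # usually happens automatically on import
  # shareall -F nfs

NFS clients holding file handles from before the export should recover once the filesystems are shared again.]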
Re: [zfs-discuss] zfs exported a live filesystem
Boyd Adamson wrote:
> On 12/12/2006, at 8:48 AM, Richard Elling wrote:
>> Jim Hranicky wrote:
>>> By mistake, I just exported my test filesystem while it was up and
>>> being served via NFS, causing my tar over NFS to start throwing stale
>>> file handle errors. Should I file this as a bug, or should I just
>>> "not do that" :->
>>
>> Don't do that. The same should happen if you umount a shared UFS file
>> system (or any other file system types).
>> -- richard
>
> Except that it doesn't:
>
> # mount /dev/dsk/c1t1d0s0 /mnt
> # share /mnt
> # umount /mnt
> umount: /mnt busy
> # unshare /mnt
> # umount /mnt
>
> If you umount -f it will though!

I don't quite agree that unmounting a UFS filesystem that is exported over NFS is the same as running zpool export on the pool. The equivalent to running umount on the UFS file system is running zfs umount on the ZFS file system in the pool. Running zpool export on the pool is closer to removing (cleanly) the disks or metadevices that the ufs file system is stored on.

The system is working as designed; the NFS client did what it was supposed to do. If you brought the pool back in again with zpool import, things should have picked up where they left off. What's more, you were probably running as root when you did that, so you got what you asked for - there is only so much protection we can give without being annoying!

If you look at the RBAC profiles we currently ship for ZFS you will see that there are two distinct profiles, one for ZFS File System Management and one for ZFS Storage Management. The reason they are separate is because they work at quite different layers in the system with different protections.

Now having said that, I personally wouldn't have expected that zpool export should have worked as easily as that while there were shared filesystems. I would have expected that exporting the pool should have attempted to unmount all the ZFS filesystems first - which would have failed without a -f flag because they were shared.

So IMO it is a bug or at least an RFE.

-- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re[2]: [zfs-discuss] Uber block corruption?
Hello Casper,

Tuesday, December 12, 2006, 10:54:27 AM, you wrote:

>> So 'a' UB can become corrupt, but it is unlikely that 'all' UBs will
>> become corrupt through something that doesn't also make all the data
>> also corrupt or inaccessible.

CDSC> So how does this work for data which is freed and overwritten; does
CDSC> the system make sure that none of the data referenced by any of the
CDSC> old ueberblocks is ever overwritten?

Why should it? If blocks are not in use according to the current UB, I guess you can safely assume they are free.

-- Best regards, Robert    mailto:[EMAIL PROTECTED] http://milek.blogspot.com
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Need Clarification on ZFS quota property.
On 12 December, 2006 - dudekula mastan sent me these 2,7K bytes:

> Hi All,
>
> Assume the device c0t0d0 size is 10 KB.
>
> I created ZFS file system on this
>
> $ zpool create -f mypool c0t0d0s2
>
> and to limit the size of ZFS file system I used the quota property.
>
> $ zfs set quota = 5000K mypool
>
> Which 5000 KB belong to (or are reserved for) mypool: the first 5000 KB,
> the last 5000 KB, or random?

"random".. When you've stored 5000K, you can't store any more there.

> UFS and VxFS file systems have options to limit the size of a file
> system on the device (e.g. we can limit the size from block 1 to
> some nth block). Like this, is there any sub command to limit the
> size of a ZFS file system from block 1 to some nth block?

Just the amount, not specific positions on (or portions of) the FS/devices.

/Tomas -- Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
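[Editorial note: not from Tomas's mail -- as a side point, zfs set takes no spaces around the '=', and a quota is normally put on a filesystem rather than typed against the pool's root dataset. A small sketch, assuming a pool named mypool:

  # create a filesystem and cap how much space it may consume
  # zfs create mypool/data
  # zfs set quota=5000K mypool/data
  # zfs get quota mypool/data

The quota limits how much the dataset can use, not which blocks on the device it occupies.]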
[zfs-discuss] Need Clarification on ZFS quota property.
Hi All,

Assume the device c0t0d0 size is 10 KB.

I created a ZFS file system on this:

$ zpool create -f mypool c0t0d0s2

and to limit the size of the ZFS file system I used the quota property:

$ zfs set quota = 5000K mypool

Which 5000 KB belong to (or are reserved for) mypool: the first 5000 KB, the last 5000 KB, or random?

UFS and VxFS file systems have options to limit the size of a file system on the device (e.g. we can limit the size from block 1 to some nth block). Like this, is there any sub command to limit the size of a ZFS file system from block 1 to some nth block?

Your help is appreciated.

Thanks & Regards Masthan

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Doubt on solaris 10 installation ..
[EMAIL PROTECTED] looks like the more appropriate list to post questions like yours. dudekula mastan wrote: Hi Everybody, I have some problems in solaris 10 installation. After installing the first CD , I removed the CD from CDrom , after that the machine is getting rebooting again and again. It is not asking second CD to install. If you have any idea. Please tell me. Thanks & Regards Masthan -- Zoram Thanga::Sun Cluster Development::http://blogs.sun.com/zoram ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Uber block corruption?
>So 'a' UB can become corrupt, but it is unlikely that 'all' UBs will >become corrupt through something that doesn't also make all the data >also corrupt or inaccessible. So how does this work for data which is freed and overwritten; does the system make sure that none of the data referenced by any of the old ueberblocks is ever overwritten? Casper ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] How to do DIRECT IO on ZFS ?
Hi All,

We have the directio() call to do direct I/O on a UFS file system. Does anyone know how to do direct I/O on a ZFS file system?

Regards Masthan

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss