Re: [zfs-discuss] zfs on a raid box
Hi everyone,

I've had some time to upgrade the machine in question to nv-b77 and run the same tests, and I'm happy to report that hot spares now work a lot better. The only question remaining for us: how long until these changes are integrated into a supported Solaris release? Some logs below.

  # zpool history data
  History for 'data':
  2007-11-22.14:48:18 zpool create -f data raidz2 c4t0d0 c4t1d0 c4t2d0 c4t3d0
      c4t4d0 c4t5d0 c4t6d0 c4t8d0 c4t9d0 c4t10d0 spare c4t11d0 c4t12d0

From /var/adm/messages:

  Nov 22 15:15:52 ddd scsi: [ID 107833 kern.warning] WARNING:
      /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd16):
      Error for Command: write(10)    Error Level: Fatal
      Requested Block: 103870006      Error Block: 103870006
      Vendor: transtec                Serial Number:
      Sense Key: Not_Ready
      ASC: 0x4 (LUN not ready intervention required), ASCQ: 0x3, FRU: 0x0

(and about 27 more of these, until 15:16:02)

  Nov 22 15:16:12 ddd scsi: [ID 107833 kern.warning] WARNING:
      /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd16):
      offline or reservation conflict

(95 of these, until 15:43:49, almost half an hour later)

And then the console showed: "The device has been offlined and marked as faulted. An attempt will be made to activate a hotspare if available."

And my current zpool status shows:

  # zpool status
    pool: data
   state: DEGRADED
  status: One or more devices are faulted in response to persistent errors.
          Sufficient replicas exist for the pool to continue functioning in a
          degraded state.
  action: Replace the faulted device, or use 'zpool clear' to mark the device
          repaired.
   scrub: resilver completed with 0 errors on Thu Nov 22 16:09:49 2007
  config:

          NAME          STATE     READ WRITE CKSUM
          data          DEGRADED     0     0     0
            raidz2      DEGRADED     0     0     0
              c4t0d0    ONLINE       0     0     0
              c4t1d0    ONLINE       0     0     0
              spare     DEGRADED     0     0     0
                c4t2d0  FAULTED      0 23.7K     0  too many errors
                c4t11d0 ONLINE       0     0     0
              c4t3d0    ONLINE       0     0     0
              c4t4d0    ONLINE       0     0     0
              c4t5d0    ONLINE       0     0     0
              c4t6d0    ONLINE       0     0     0
              c4t8d0    ONLINE       0     0     0
              c4t9d0    ONLINE       0     0     0
              c4t10d0   ONLINE       0     0     0
          spares
            c4t11d0     INUSE     currently in use
            c4t12d0     AVAIL

One remark: I find the output above a bit confusing ('spare' is apparently 'DEGRADED' and consists of c4t2d0 and c4t11d0), but the hot spare was properly activated this time and my pool is otherwise in good health.

Thanks everyone for the replies and suggestions.

Regards, Paul Boven.
-- 
Paul Boven <[EMAIL PROTECTED]> +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
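[Editor's sketch] For anyone wanting to script a check for this condition, a minimal sketch of parsing 'zpool status' output for faulted devices and in-use spares. The two sample lines are pasted from the output above as a stand-in; on a live system you would pipe 'zpool status data' in instead.

```shell
# Hypothetical monitoring snippet: flag FAULTED devices and in-use
# spares in 'zpool status' output. The sample text below stands in
# for the output of a real 'zpool status data'.
status_output='                c4t2d0  FAULTED      0 23.7K     0  too many errors
            c4t11d0     INUSE     currently in use'

# Second whitespace-separated field is the device state.
faulted=$(printf '%s\n' "$status_output" | awk '$2 == "FAULTED" {print $1}')
inuse=$(printf '%s\n' "$status_output" | awk '$2 == "INUSE" {print $1}')

[ -n "$faulted" ] && echo "faulted device(s): $faulted"
[ -n "$inuse" ] && echo "spare(s) in use: $inuse"
```

Something along these lines could page an operator well before anyone reads the console message.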
Re: [zfs-discuss] zfs on a raid box
Hi Dan,

Dan Pritts wrote:
> On Mon, Nov 19, 2007 at 11:10:32AM +0100, Paul Boven wrote:
>> Any suggestions on how to further investigate / fix this would be very
>> much welcomed. I'm trying to determine whether this is a zfs bug or one
>> with the Transtec raidbox, and whether to file a bug with either
>> Transtec (Promise) or zfs.
> the way i'd try to do this would be to use the same box under solaris
> software RAID, or better yet linux or windows software RAID (to make
> sure it's not a solaris device driver problem).
> Does pulling the disk then get noticed? If so, it's a zfs bug.

Excellent suggestion, and today I had some time to give it a try. I created a 4-disk SVM volume (2x 2-disk stripe, mirrored, with 2 more disks as hot spares):

  d10 -m /dev/md/rdsk/d11 /dev/md/rdsk/d12 1
  d11 1 2 /dev/rdsk/c4t0d0s0 /dev/rdsk/c4t1d0s0 -i 1024b -h hsp001
  d12 1 2 /dev/rdsk/c4t2d0s0 /dev/rdsk/c4t3d0s0 -i 1024b -h hsp001
  hsp001 c4t4d0s0 c4t5d0s0

I started a write and then pulled a disk. Without any further probing, SVM put a hot spare in place and started resyncing:

  d10     m  463GB d11 d12 (resync-0%)
  d11     s  463GB c4t0d0s0 c4t1d0s0
  d12     s  463GB c4t2d0s0 (resyncing-c4t4d0s0) c4t3d0s0
  hsp001  h      - c4t4d0s0 (in-use) c4t5d0s0

This is all on b76. The issue does indeed seem to be with zfs. I'm currently downloading b77, and once that is installed I'll see whether the fault diagnostics and hot spare handling have improved, as several people here have pointed out.

Regards, Paul Boven.
Re: [zfs-discuss] zfs on a raid box
Hi MP,

MP wrote:
>> but my issue is that not only the 'time left', but also the progress
>> indicator itself varies wildly, and keeps resetting itself to 0%, not
>> giving any indication that
>
> Are you sure you are not being hit by this bug:
>
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6343667
>
> i.e. scrub or resilver gets reset to 0% on a snapshot creation or deletion.
> Cheers.

I'm very sure of that: I've never taken a snapshot on these, and I am the only user on the machine (it's not in production yet).

Regards, Paul Boven.
Re: [zfs-discuss] raidz2
Hi Eric, everyone,

Eric Schrock wrote:
> There have been many improvements in proactively detecting failure,
> culminating in build 77 of Nevada. Earlier builds:
>
> - Were unable to distinguish device removal from devices misbehaving,
>   depending on the driver and hardware.
>
> - Did not diagnose a series of I/O failures as disk failure.
>
> - Allowed several (painful) SCSI retries and continued to queue up I/O,
>   even if the disk was fatally damaged.
>
> Most classes of hardware would behave reasonably well on device removal,
> but certain classes caused cascading failures in ZFS, all of which should
> be resolved in build 77 or later.

I seem to be having exactly the problems you are describing (see my postings with the subject 'zfs on a raid box'), so I would very much like to give b77 a try. I'm currently running b76, as that's the latest SXCE that's available. Are the sources to anything beyond b76 already available? Would I need to build it, or bfu? I'm seeing zfs not making use of available hot spares when I pull a disk, long and indeed painful SCSI retries, and very poor write performance on a degraded zpool. I hope to test whether b77 fares any better with this.

Regards, Paul Boven.
Re: [zfs-discuss] zfs on a raid box
Hi Tom, everyone,

Tom Mooney wrote:
> A little extra info:
> ZFS brings in a ZFS spare device the next time the pool is accessed, not
> a raidbox hot spare. Resilvering starts automatically and increases disk
> access times by about 30%. The first hour of estimated time left (for
> 5-6 TB pools) is wildly inaccurate, but it starts to settle down after
> that.

Thanks for your reply. I'm talking about zfs hot spares, not the hot spare functionality of the raid box:

  # zpool create -f data raidz c4t0d0 c4t0d1 c4t0d2 c4t0d3 c4t0d4 c4t0d5
      c4t0d6 c4t0d7 c4t0d8 c4t0d9 spare c4t0d10 c4t0d11

I did my initial tests by pulling a disk during a 100GB sequential write, which should have kicked in a hot spare right away. But no hot spare was activated (as shown by 'zpool status'), and write performance fell to less than 25%. I have also tried to start resilvering manually, but that doesn't seem to work either.

I've heard from several people that zfs currently has problems with reporting the 'estimated time left' - but my issue is that not only the 'time left' but also the progress indicator itself varies wildly, and keeps resetting itself to 0%, giving no indication that the resilvering will ever finish. And with nv-b76, 'zpool status' simply hangs when a drive is missing, so I can't even keep track of the resilvering, if any. So, at least for me, hot spare functionality in zfs seems completely broken.

Any suggestions on how to further investigate / fix this would be very much welcomed. I'm trying to determine whether this is a zfs bug or one with the Transtec raidbox, and whether to file a bug with either Transtec (Promise) or zfs.

Regards, Paul Boven.
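[Editor's sketch] One cheap way to quantify the retry storm while investigating is to count the repeated SCSI warnings in /var/adm/messages. A minimal sketch - the sample lines below are hardcoded stand-ins modelled on the warnings posted in this thread; on a live system you would grep the real log file instead.

```shell
# Hypothetical helper: count 'offline or reservation conflict' warnings.
# The sample text stands in for lines from /var/adm/messages.
messages='Nov 12 17:30:52 ddd offline or reservation conflict
Nov 12 17:31:12 ddd offline or reservation conflict
Nov 12 17:31:32 ddd offline or reservation conflict'

conflicts=$(printf '%s\n' "$messages" | grep -c 'offline or reservation conflict')
echo "reservation-conflict warnings: $conflicts"   # prints: reservation-conflict warnings: 3
```

Comparing that count against the timestamps gives a rough measure of how long the driver keeps retrying before anything is diagnosed.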
Re: [zfs-discuss] zfs on a raid box
Hi Dan,

Dan Pritts wrote:
> On Tue, Nov 13, 2007 at 12:25:24PM +0100, Paul Boven wrote:
>> We're building a storage system that should have about 2TB of storage
>> and good sequential write speed. The server side is a Sun X4200 running
>> Solaris 10u4 (plus yesterday's recommended patch cluster), the array we
>> bought is a Transtec Provigo 510 12-disk array. The disks are SATA, and
>> it's connected to the Sun through U320-scsi.
>
> We are doing basically the same thing with similar Western Scientific
> (wsm.com) raids, based on Infortrend controllers. ZFS notices when we
> pull a disk and goes on and does the right thing.
>
> I wonder if you've got a scsi card/driver problem. We tried using
> an Adaptec card with solaris with poor results; switched to LSI,
> it "just works".

Thanks for your reply. The SCSI card in the X4200 is a Sun Single Channel U320 card that came with the system, but the PCB artwork does sport a nice 'LSI LOGIC' imprint.

So, just to make sure we're talking about the same thing here: your drives are SATA, you're exporting each drive through the Western Scientific raidbox as a separate volume, and zfs actually brings in a hot spare when you pull a drive? Over here, I've still not been able to accomplish that - even after installing Nevada b76 on the machine, removing a disk will not cause a hot spare to become active, nor does resilvering start. Our Transtec raidbox seems to be based on a chipset by Promise, by the way.

Regards, Paul Boven.
[zfs-discuss] zfs on a raid box
Hi everyone,

We're building a storage system that should have about 2TB of storage and good sequential write speed. The server side is a Sun X4200 running Solaris 10u4 (plus yesterday's recommended patch cluster); the array we bought is a Transtec Provigo 510 12-disk array. The disks are SATA, and it's connected to the Sun through U320 SCSI.

Now, the raidbox was sold to us as doing JBOD and various other raid levels, but JBOD turns out to mean 'create a single-disk stripe for every drive'. Which works, after a fashion: when using a 12-drive zfs pool with raidz and 1 hot spare, I get 132MB/s write performance; with raidz2 it's still 112MB/s. If instead I configure the array as a RAID-50 through the hardware raid controller, I can only manage 72MB/s. So at first glance, this seems a good case for zfs.

Unfortunately, if I then pull a disk from the zfs array, it will keep trying to write to this disk, and will never activate the hot spare. A zpool status will then show the pool as 'degraded', one drive marked as unavailable - and the hot spare still marked as available. Write performance also drops to about 32MB/s. If I then try to activate the hot spare by hand (zpool replace ) the resilvering starts, but never makes it past 10% - it seems to restart all the time. As this box is not in production yet, and I'm the only user on it, I'm 100% sure that nothing is happening on the zfs filesystem during the resilvering - no reads, no writes, and certainly no snapshots.

In /var/adm/messages, I see this message repeated several times each minute:

  Nov 12 17:30:52 ddd scsi: [ID 107833 kern.warning] WARNING:
      /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd47):
  Nov 12 17:30:52 ddd offline or reservation conflict

Why isn't this enough for zfs to switch over to the hot spare? I've tried disabling (setting to write-thru) the write cache on the array box, but that didn't make any difference to the behaviour either.

I'd appreciate any insights or hints on how to proceed with this - should I even be trying to use zfs in this situation?

Regards, Paul Boven.
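[Editor's sketch] For reference, sequential-write numbers like the MB/s figures above are typically measured with a simple dd run. A minimal sketch - the file name and the tiny 16MB size are placeholders (a real test should write far more than the cache can absorb, e.g. the 100GB runs described in this thread):

```shell
# Minimal sequential-write test sketch. TESTFILE and the 16MB size
# are placeholders; a real benchmark writes much more data and
# divides bytes written by elapsed time to get MB/s.
TESTFILE=/tmp/seqwrite-test.$$
dd if=/dev/zero of="$TESTFILE" bs=1024k count=16 2>/dev/null
sync
written=$(wc -c < "$TESTFILE")
echo "wrote $written bytes"
rm -f "$TESTFILE"
```

Wrapping the dd in time(1) and repeating the run on the healthy and degraded pool would make the performance drop reported above directly comparable.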
[zfs-discuss] Setting up for zfsboot
Hi everyone,

Now that zfsboot is becoming available, I'm wondering how to put it to use. Imagine a system with 4 identical disks. Of course I'd like to use raidz, but zfsboot doesn't do raidz. What if I were to partition the drives, so that 4 small partitions make up a zfsboot pool (a 4-way mirror), and the remainder of each drive becomes part of a raidz? Would I still have the advantages of having the whole disk 'owned' by zfs, even though it's split into two parts? Swap would probably have to go on a zvol - would that be best placed on the n-way mirror, or on the raidz?

Regards, Paul Boven.
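[Editor's sketch] The layout described above could be expressed as two pools over per-disk slices. The script below only prints the plan rather than touching any disks, and all device and slice names (c0t0d0..c0t3d0, s0 for boot, s1 for data) are made up for illustration:

```shell
# Hypothetical 4-disk layout: slice 0 of each disk joins a 4-way
# mirrored boot pool, slice 1 joins a raidz data pool. This script
# only echoes the plan; device names are invented for illustration.
DISKS="c0t0d0 c0t1d0 c0t2d0 c0t3d0"

boot_slices=""; data_slices=""
for d in $DISKS; do
    boot_slices="$boot_slices ${d}s0"
    data_slices="$data_slices ${d}s1"
done

echo "zpool create rootpool mirror$boot_slices"
echo "zpool create datapool raidz$data_slices"
```

Whether the slice-based pools still get whole-disk treatment (write cache enabled, EFI label) is exactly the open question in the message above; on slices, zfs does not claim the whole disk.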