[zfs-discuss] ZFS related kernel panic
Last Friday, one of our V880s kernel panicked with the following message. This is a SAN-connected ZFS pool attached to one LUN. From this, it appears that the SAN 'disappeared' and then there was a panic shortly after. Am I reading this correctly? Is this normal behavior for ZFS?

This is a mostly patched Solaris 10 6/06 install. Before patching this system we did have a couple of NFS-related panics, always on Fridays! This is the fourth panic, first time with a ZFS error. There are no errors in zpool status.

Dec 1 20:30:21 foobar scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],1 (sd17):
Dec 1 20:30:21 foobar   SCSI transport failed: reason 'incomplete': retrying command
Dec 1 20:30:21 foobar scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],1 (sd17):
Dec 1 20:30:21 foobar   SCSI transport failed: reason 'incomplete': retrying command
Dec 1 20:30:21 foobar scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],1 (sd17):
Dec 1 20:30:21 foobar   disk not responding to selection
Dec 1 20:30:21 foobar scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],1 (sd17):
Dec 1 20:30:21 foobar   disk not responding to selection
Dec 1 20:30:21 foobar scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],1 (sd17):
Dec 1 20:30:21 foobar   disk not responding to selection
Dec 1 20:30:21 foobar scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],1 (sd17):
Dec 1 20:30:21 foobar   disk not responding to selection
Dec 1 20:30:22 foobar scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],1 (sd17):
Dec 1 20:30:22 foobar   disk not responding to selection
Dec 1 20:30:22 foobar unix: [ID 836849 kern.notice]
Dec 1 20:30:22 foobar ^Mpanic[cpu2]/thread=2a100aedcc0:
Dec 1 20:30:22 foobar unix: [ID 809409 kern.notice] ZFS: I/O failure (write on unknown off 0: zio 3004c0ce540 [L0 unallocated] 2L/2P DVA[0]=0:2ae190:2 fletcher2 uncompressed BE contiguous birth=586818 fill=0 cksum=102297a2db39dfc:cc8e38087da7a38f:239520856ececf15:c2fd369cea9db4a1): error 5
Dec 1 20:30:22 foobar unix: [ID 10 kern.notice]
Dec 1 20:30:22 foobar genunix: [ID 723222 kern.notice] 02a100aed740 zfs:zio_done+284 (3004c0ce540, 0, a8, 70513bf0, 0, 60001374940)
Dec 1 20:30:22 foobar genunix: [ID 179002 kern.notice]   %l0-3: 03006319fc80 70513800 0005 0005
Dec 1 20:30:22 foobar   %l4-7: 7b224278 0002 0008f442 0005
Dec 1 20:30:22 foobar genunix: [ID 723222 kern.notice] 02a100aed940 zfs:zio_vdev_io_assess+178 (3004c0ce540, 8000, 10, 0, 0, 10)
Dec 1 20:30:22 foobar genunix: [ID 179002 kern.notice]   %l0-3: 0002 0001 0005
Dec 1 20:30:22 foobar   %l4-7: 0010 35a536bc 00043d7293172cfc
Dec 1 20:30:22 foobar genunix: [ID 723222 kern.notice] 02a100aeda00 genunix:taskq_thread+1a4 (600012a0c38, 600012a0be0, 50001, 43d72c8bfb810, 2a100aedaca, 2a100aedac8)
Dec 1 20:30:22 foobar genunix: [ID 179002 kern.notice]   %l0-3: 0001 0600012a0c08 0600012a0c10 0600012a0c12
Dec 1 20:30:22 foobar   %l4-7: 030060946320 0002 0600012a0c00
Dec 1 20:30:22 foobar unix: [ID 10 kern.notice]
Dec 1 20:30:22 foobar genunix: [ID 672855 kern.notice] syncing file systems...
Re: [zfs-discuss] ZFS related kernel panic
Douglas Denny wrote:
> Last Friday, one of our V880s kernel panicked with the following
> message. This is a SAN-connected ZFS pool attached to one LUN. From
> this, it appears that the SAN 'disappeared' and then there was a
> panic shortly after. Am I reading this correctly?

Yes.

> Is this normal behavior for ZFS?

Yes. You have no redundancy (from ZFS' point of view at least), so ZFS has no option except panicking in order to maintain the integrity of your data.

> This is a mostly patched Solaris 10 6/06 install. Before patching this
> system we did have a couple of NFS-related panics, always on Fridays!
> This is the fourth panic, first time with a ZFS error. There are no
> errors in zpool status.

Without data, it is difficult to suggest what might have caused your NFS panics.

James C. McPherson
--
Solaris kernel software engineer, system admin and troubleshooter
http://www.jmcp.homeunix.com/blog
Find me on LinkedIn @ http://www.linkedin.com/in/jamescmcpherson
Re: [zfs-discuss] ZFS related kernel panic
On 12/4/06, James C. McPherson [EMAIL PROTECTED] wrote:
>> Is this normal behavior for ZFS?
>
> Yes. You have no redundancy (from ZFS' point of view at least), so ZFS
> has no option except panicking in order to maintain the integrity of
> your data.

This is interesting from an implementation point of view. Any singly attached SAN connection that has a disconnect from its switch/backend will cause ZFS to panic; why would it not wait and see if the device came back? Should all SAN-connected ZFS pools have redundancy built in, with dual HBAs to dual SAN switches/controllers?

-Doug
Re: [zfs-discuss] ZFS related kernel panic
Douglas Denny wrote:
> On 12/4/06, James C. McPherson [EMAIL PROTECTED] wrote:
>>> Is this normal behavior for ZFS?
>>
>> Yes. You have no redundancy (from ZFS' point of view at least), so
>> ZFS has no option except panicking in order to maintain the integrity
>> of your data.
>
> This is interesting from an implementation point of view. Any singly
> attached SAN connection that has a disconnect from its switch/backend
> will cause ZFS to panic; why would it not wait and see if the device
> came back? Should all SAN-connected ZFS pools have redundancy built
> in, with dual HBAs to dual SAN switches/controllers?

UFS will panic on EIO also. Most other file systems, too. You can put UFS on top of SVM, but unless SVM is configured for redundancy, it (UFS) would still panic in such situations. ZFS doesn't bring anything new here, but I sense a change in expectations that I can't quite reconcile.
 -- richard
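To make the "redundancy from ZFS' point of view" suggestion concrete: a minimal sketch of a pool that mirrors two SAN LUNs, ideally presented over separate HBAs/switches. The device names c2t0d0 and c3t0d0 are hypothetical placeholders.

  # mirror two SAN LUNs reached over different HBA paths
  zpool create tank mirror c2t0d0 c3t0d0

  # confirm ZFS sees both sides of the mirror as healthy
  zpool status tank

With a layout like this, losing one path or LUN should degrade the pool rather than panic the host, since ZFS can still complete writes on the surviving side.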
Re: [zfs-discuss] replacing a drive in a raidz vdev
I am having no luck replacing my drive as well. A few days ago I replaced my drive and it's completely messed up now.

  pool: mypool2
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool
        will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 8.70% done, 8h19m to go
config:

        NAME            STATE     READ WRITE CKSUM
        mypool2         DEGRADED     0     0     0
          raidz         DEGRADED     0     0     0
            c3t0d0      ONLINE       0     0     0
            c3t1d0      ONLINE       0     0     0
            c3t2d0      ONLINE       0     0     0
            c3t3d0      ONLINE       0     0     0
            c3t4d0      ONLINE       0     0     0
            c3t5d0      ONLINE       0     0     0
            replacing   DEGRADED     0     0     0
              c3t6d0s0/o UNAVAIL     0     0     0  cannot open
              c3t6d0    ONLINE       0     0     0

errors: No known data errors

This is what I get; I am running Solaris 10 U2. Two days ago I saw it in the 2.00% range with about 10h remaining; now it's still going, and it's already been at least a few days since it started.

When I do zpool list:

NAME      SIZE   USED   AVAIL   CAP  HEALTH    ALTROOT
mypool2   952G   684G    268G   71%  DEGRADED  -

I have almost 1TB of space. When I do df -k it shows me only 277GB, which is better than only displaying 12GB as I saw yesterday.

mypool2/d3  277900047  12022884  265877163  5%  /d/d3

When I do zfs list I get:

mypool2                     684G   254G    52K  /mypool2
mypool2/d                   191G   254G   189G  /mypool2/d
mypool2/[EMAIL PROTECTED]   653M      -   145G  -
mypool2/[EMAIL PROTECTED]  31.2M      -   145G  -
mypool2/[EMAIL PROTECTED]  36.8M      -   144G  -
mypool2/[EMAIL PROTECTED]  37.9M      -   144G  -
mypool2/[EMAIL PROTECTED]  31.7M      -   145G  -
mypool2/[EMAIL PROTECTED]  27.7M      -   145G  -
mypool2/[EMAIL PROTECTED]  34.0M      -   146G  -
mypool2/[EMAIL PROTECTED]  26.8M      -   149G  -
mypool2/[EMAIL PROTECTED]  34.4M      -   151G  -
mypool2/[EMAIL PROTECTED]   141K      -   189G  -
mypool2/d3                  492G   254G  11.5G  legacy

I am so confused with all of this... Why is it taking so long to replace that one bad disk? Why such different results? What is going on? Is there a problem with my zpool/zfs combination? Did I do anything wrong? Did I actually lose data on my drive? If I knew it would be this bad I would just destroy my whole zpool and zfs and start from the beginning, but I wanted to see how it would go through the replacement, to see what the process is like... I am so happy I did not use ZFS in my production environment yet, to be honest with you...

Chris

On Sat, 2 Dec 2006, Theo Schlossnagle wrote:
> I had a disk malfunction in a raidz pool today. I had an extra one in
> the enclosure and performed a:
>
>   zpool replace pool old new
>
> and several unexpected behaviors have transpired: the zpool replace
> command hung for 52 minutes, during which no zpool commands could be
> executed (like status, iostat or list). When it finally returned, the
> drive was marked as replacing, as I expected from reading the man
> page. However, its progress counter has not been monotonically
> increasing. It started at 1% and then went to 5% and then back to 2%,
> etc. etc. I just logged in to see if it was done, ran zpool status
> and received:
>
>   pool: xsr_slow_2
>  state: ONLINE
> status: One or more devices is currently being resilvered. The pool
>         will continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>  scrub: resilver in progress, 100.00% done, 0h0m to go
> config:
>
>         NAME                       STATE     READ WRITE CKSUM
>         xsr_slow_2                 ONLINE       0     0     0
>           raidz                    ONLINE       0     0     0
>             c4t600039316A1Fd0s2    ONLINE       0     0     0
>             c4t600039316A1Fd1s2    ONLINE       0     0     0
>             c4t600039316A1Fd2s2    ONLINE       0     0     0
>             c4t600039316A1Fd3s2    ONLINE       0     0     0
>             replacing              ONLINE       0     0     0
>               c4t600039316A1Fd4s2  ONLINE   2.87K   251     0
>               c4t600039316A1Fd6    ONLINE       0     0     0
>             c4t600039316A1Fd5s2    ONLINE       0     0     0
>
> I thought to myself, if it is 100% done why is it still replacing?
> I waited about 15 seconds and ran the command again to find something
> rather disconcerting:
>
>   pool: xsr_slow_2
>  state: ONLINE
> status: One or more devices is currently being resilvered. The pool
>         will continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>  scrub: resilver in progress, 0.45% done, 27h27m to go
> config:
>
>         NAME
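For anyone trying to follow a replacement like these, a rough monitoring sketch rather than guessing from df output. The pool and device names are hypothetical, and the grep pattern assumes the "resilver in progress" status wording shown above.

  # start the replacement: old device, then its substitute
  zpool replace mypool2 c3t6d0 c3t7d0

  # poll once a minute until the resilver line disappears
  while zpool status mypool2 | grep -q 'resilver in progress'; do
      zpool status mypool2 | grep 'resilver in progress'
      sleep 60
  done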
Re: [zfs-discuss] ZFS related kernel panic
Hi all,

Having experienced this, it would be nice if there were an option to offline the filesystem instead of kernel panicking, on a per-zpool basis. If it's a system-critical partition like a database, I'd prefer it to kernel-panic and thereby trigger a fail-over of the application. However, if it's a zpool hosting some fileshares, I'd prefer it to stay online. Putting that level of control in would alleviate a lot of the complaints, it seems to me... or at least give less of a leg to stand on. ;-)

A nasty little notice that tells you the system will kernel panic if a vdev becomes unavailable wouldn't be bad either when you're creating a striped zpool. Even the best of us forgets these things.

Best Regards,
Jason

On 12/4/06, Richard Elling [EMAIL PROTECTED] wrote:
> Douglas Denny wrote:
> [...]
>
> UFS will panic on EIO also. Most other file systems, too. You can put
> UFS on top of SVM, but unless SVM is configured for redundancy, it
> (UFS) would still panic in such situations. ZFS doesn't bring anything
> new here, but I sense a change in expectations that I can't quite
> reconcile.
>  -- richard
Re: [zfs-discuss] replacing a drive in a raidz vdev
On Mon, 2006-12-04 at 13:56 -0500, Krzys wrote:
> mypool2/[EMAIL PROTECTED]  34.4M     -   151G  -
> mypool2/[EMAIL PROTECTED]   141K     -   189G  -
> mypool2/d3                 492G  254G  11.5G  legacy
>
> I am so confused with all of this... Why is it taking so long to
> replace that one bad disk?

To work around a bug where a pool traverse gets lost when the snapshot configuration of a pool changes, both scrubs and resilvers will start over again any time you create or delete a snapshot. Unfortunately, this workaround has problems of its own -- if your inter-snapshot interval is less than the time required to complete a scrub, the resilver will never complete.

The open bug is:

  6343667 scrub/resilver has to start over when a snapshot is taken

If it's not going to be fixed any time soon, perhaps we need a better workaround. Ideas:

 - perhaps snapshots should be made to fail while a resilver (not
   scrub!) is in progress...
 - or maybe snapshots should fail when a *restarted* resilver is in
   progress -- that way, if you can complete the resilver between two
   snapshot times, you don't miss any snapshots, but if it takes
   longer than that, snapshots are sacrificed in the name of pool
   integrity.

 - Bill
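Until 6343667 is fixed, a cron-driven snapshot job can apply that policy itself. A minimal sketch, assuming snapshots are taken by a script you control; the pool, filesystem, and snapshot naming scheme are all hypothetical.

  #!/bin/sh
  # Skip the scheduled snapshot while a scrub or resilver is running,
  # so the resilver is not restarted and gets a chance to finish.
  POOL=mypool2
  SNAP="$POOL/d@nightly-`date +%Y%m%d`"

  if zpool status $POOL | grep -q 'in progress'; then
      echo "scrub/resilver running on $POOL; skipping snapshot" >&2
      exit 0
  fi
  zfs snapshot $SNAP

The cost is the one Bill describes: if the resilver takes longer than the snapshot interval, scheduled snapshots are sacrificed until it completes.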
[zfs-discuss] ZFS on multi-volume
Hi all

Sorry if my question is not very clear; I'm not very familiar with ZFS (which is why I ask). Suppose I have a lot of low-cost RAID arrays (like Brownie, meaning IDE/SATA disks), all on SCSI attachment (around 10 of them, with a total of about 20 To of space). Now if I buy a «high»-level big RAID array on FC attachment and a big Sun server, can I create a ZFS filesystem on all the disks such that all data is on the new big RAID array (using hardware RAID), and all data is mirrored onto the sum of my old low-cost RAID arrays?

If it's possible, what do you think of the performance? The purpose is to make a big NFS server with primary data on a high-level RAID array, but using ZFS to mirror all data onto the old arrays.

Regards.

--
Albert SHIH
Universite de Paris 7 (Denis DIDEROT)
U.F.R. de Mathematiques.
7th floor, plateau D, office 10
Heure local/Local time: Mon Dec 4 23:04:04 CET 2006
Re: [zfs-discuss] ZFS related kernel panic
Jason J. W. Williams wrote:
> Hi all,
>
> Having experienced this, it would be nice if there were an option to
> offline the filesystem instead of kernel panicking, on a per-zpool
> basis. If it's a system-critical partition like a database, I'd prefer
> it to kernel-panic and thereby trigger a fail-over of the application.
> However, if it's a zpool hosting some fileshares, I'd prefer it to
> stay online. Putting that level of control in would alleviate a lot of
> the complaints, it seems to me... or at least give less of a leg to
> stand on. ;-)

Agreed, and we are working on this.

--matt
[zfs-discuss] Re: ZFS related kernel panic
> If you take a look at these messages, the somewhat unusual condition
> that may lead to unexpected behaviour (i.e. fast giveup) is that
> whilst this is a SAN connection, it is achieved through a
> non-Leadville config; note the fibre-channel and sd references. In a
> Leadville-compliant installation this would be the ssd driver, hence
> you'd have to investigate the specific semantics and driver tweaks
> that this system has applied to sd in this instance.

If only it were possible to use the Leadville drivers... We've seen the same problems here (*instant* panic if the FC switch reboots, due to ZFS - I wouldn't mind if it kept on retrying a tad bit longer - preferably configurable). And to panic? How can that in any sane way be a good way to protect the application? *BANG* - no chance at all for the application to handle the problem...

Btw. in our case we have also wrapped the raw FC-attached disks with SVM metadevices first, because if a disk in an A3500FC unit goes bad then we hit the _other_ failure mode of ZFS - total hang - until I noticed that wrapping the devices with a layer of SVM metadevices insulated ZFS from that problem; now it correctly notices that the disk is gone/dead and displays that when doing zpool status etc.

(We (Lysator ACS - a students' computer club) can't use the Leadville driver, since the ifp driver (and hence use of the ssd disks) for the Qlogic QLA2100 HBA boards is based on an older Qlogic firmware that only supports a max of 16 LUNs per target, and we want more... So we use the Qlogic qla2100 driver instead, which works really nicely, but then it uses the sd disk devices instead. Being a computer club with limited funds means one finds ways to use old hardware in new and interesting ways :-)

Hardware in use:

Primary file server: Sun Ultra 450, two Qlogic QLA2100 HBAs. One connected via an 8-port FC-AL *hub* to two Sun A5000 JBOD boxes (filled with 9 and 18GB FC disks), the other via a Brocade 2400 8-port switch (running in QuickLoop mode) to a Compaq StorageWorks RA8000 RAID and two A3500FC systems.

Now... What can *possibly* go wrong with that setup? :-)

I'll tell you a couple:

1. When the server entered multiuser and started serving NFS to all the users' $HOMEs, many many disks in the A5000 started resetting themselves again and again and again...

   Solution: tune down the maximum number of tagged commands sent to the disks in /kernel/drv/qla2100.conf:

     hba1-max-iocb-allocation=7;   # was 256
     hba1-execution-throttle=7;    # was 31

   (This problem wasn't there with the old Sun ifp driver, probably because it has less aggressive limits - but since that driver is totally nonconfigurable, it's impossible to tell.)

2. The power cord to the Brocade switch got slightly loose, causing the switch to reboot and sending the server into an *instant PANIC thanks to ZFS*.
Re: [zfs-discuss] ZFS related kernel panic
Any chance we might get a short refresher warning when creating a striped zpool? O:-)

Best Regards,
Jason

On 12/4/06, Matthew Ahrens [EMAIL PROTECTED] wrote:
> Jason J. W. Williams wrote:
> > Having experienced this, it would be nice if there were an option to
> > offline the filesystem instead of kernel panicking, on a per-zpool
> > basis. [...]
>
> Agreed, and we are working on this.
>
> --matt
Re: [zfs-discuss] Re: ZFS related kernel panic
Peter Eriksson wrote:
> If only it were possible to use the Leadville drivers... We've seen
> the same problems here (*instant* panic if the FC switch reboots, due
> to ZFS - I wouldn't mind if it kept on retrying a tad bit longer -
> preferably configurable). And to panic? How can that in any sane way
> be a good way to protect the application? *BANG* - no chance at all
> for the application to handle the problem...

The *application* should not be worrying about handling error conditions in the kernel. That's the kernel's job, and in this case, ZFS' job. ZFS protects *your data* by preventing any more writes from occurring when it cannot guarantee the integrity of your data.

> Btw. in our case we have also wrapped the raw FC-attached disks with
> SVM metadevices first [...] now it correctly notices that the disk is
> gone/dead and displays that when doing zpool status etc.

Hm. An extra layer of complexity. Kinda defeats one of the stated goals of ZFS.

> (We (Lysator ACS - a students' computer club) can't use the Leadville
> driver [...] Being a computer club with limited funds means one finds
> ways to use old hardware in new and interesting ways :-)

Ebay.se?

> Hardware in use: Primary file server: Sun Ultra 450, two Qlogic
> QLA2100 HBAs. [...] Now... What can *possibly* go wrong with that
> setup? :-)

Hmmm, let's start with the mere existence of the EOL'd A3500FC hardware in your config. Kinda goes downhill from there :)

> 1. When the server entered multiuser and started serving NFS to all
> the users' $HOMEs, many many disks in the A5000 started resetting
> themselves again and again and again... [...]

Ebay.se

> 2. The power cord to the Brocade switch got slightly loose, causing
> the switch to reboot and sending the server into an *instant PANIC
> thanks to ZFS*.

Yes, as noted, this is by design in order to *protect your data*.

James C. McPherson
--
Solaris kernel software engineer
Sun Microsystems
Re: [zfs-discuss] ZFS related kernel panic
Matthew Ahrens wrote:
> Jason J. W. Williams wrote:
> > Having experienced this, it would be nice if there were an option to
> > offline the filesystem instead of kernel panicking, on a per-zpool
> > basis. [...]
>
> Agreed, and we are working on this.

Similar to UFS's onerror mount option, I take it?

/dale
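For reference, UFS's onerror behavior is chosen at mount time and accepts panic (the default), lock, or umount. A sketch with a hypothetical device and mount point:

  # keep the rest of the system up by unmounting on internal error
  # instead of panicking
  mount -F ufs -o onerror=umount /dev/dsk/c0t0d0s6 /export

This is roughly the per-filesystem control being asked of ZFS: trade a whole-system panic for losing access to one filesystem.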
[zfs-discuss] Re: ZFS on multi-volume
It is possible to configure ZFS in the way you describe, but your performance will be limited by the older arrays. All mirror writes have to be stored on both arrays before they are considered complete, so writes will be as slow as the slowest disk or array involved. ZFS does not currently consider performance in selecting a mirror side for reads, so half of the reads will run at the speed of the new array, half at the speed of the old array.

If you need to use both types of arrays (20 To is a lot of space to give up!), consider creating two pools, one composed of newer arrays and one of older arrays, at least if your data is easily split into fast and slow sets (e.g. fresh data vs. archival, or database logs vs. infrequently-accessed tables).
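To make the two options concrete, a sketch with hypothetical device names (c4t0d0 on the new FC array; c2t0d0 and c2t1d0 on the older arrays):

  # Option 1: one pool, new array mirrored against an old one;
  # writes complete at the speed of the slower side
  zpool create tank mirror c4t0d0 c2t0d0

  # Option 2: two pools split by workload, avoiding the slow-mirror cost
  zpool create fastpool c4t0d0
  zpool create slowpool c2t0d0 c2t1d0

With the two-pool split, the fast pool serves the hot data and the old arrays can hold archival copies without throttling every write.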
[zfs-discuss] Re: ZFS related kernel panic
> And to panic? How can that in any sane way be a good way to protect
> the application? *BANG* - no chance at all for the application to
> handle the problem...

I agree -- a disk error should never be fatal to the system; at worst, the file system should appear to have been forcibly unmounted (and "worst" really means that critical metadata, like the superblock/uberblock, can't be updated on any of the disks in the pool). That at least gives other applications which aren't using the file system the chance to keep going.

An I/O error detected when writing a file can be reported at write() time, fsync() time, or close() time. Any application which doesn't check all three of these won't handle all I/O errors properly; and applications which care about knowing that their data is on disk must either use synchronous writes (O_SYNC/O_DSYNC) or call fsync before closing the file. ZFS should report back these errors in all cases and avoid panicking (obviously).

That said, it also appears that the device drivers (either the FibreChannel or SCSI disk drivers in this case) are misbehaving. The FC driver appears to be reporting back an error which is interpreted as fatal by the SCSI disk driver, when one or the other should be retrying the I/O. (It also appears that either the FC driver, SCSI disk driver, or ZFS is misbehaving in the observed hang.)

So ZFS should be more resilient against write errors, and the SCSI disk or FC drivers should be more resilient against LIPs (the most likely cause of your problem) or other transient errors. (Alternatively, the ifp driver could be updated to support the maximum number of targets on a loop, which might also solve your second problem.)
Re: [zfs-discuss] Re: ZFS related kernel panic
Anton B. Rang wrote:
> I agree -- a disk error should never be fatal to the system; at worst,
> the file system should appear to have been forcibly unmounted (and
> "worst" really means that critical metadata, like the
> superblock/uberblock, can't be updated on any of the disks in the
> pool). That at least gives other applications which aren't using the
> file system the chance to keep going.

But it's still not the application's problem to handle the underlying device failure.

> That said, it also appears that the device drivers (either the
> FibreChannel or SCSI disk drivers in this case) are misbehaving. The
> FC driver appears to be reporting back an error which is interpreted
> as fatal by the SCSI disk driver, when one or the other should be
> retrying the I/O. (It also appears that either the FC driver, SCSI
> disk driver, or ZFS is misbehaving in the observed hang.)

In this case it is most likely that it's the qla2x00 driver which is at fault. The Leadville drivers do the appropriate retries. The sd driver and ZFS also do the appropriate retries.

> So ZFS should be more resilient against write errors, and the SCSI
> disk or FC drivers should be more resilient against LIPs (the most
> likely cause of your problem) or other transient errors.
> (Alternatively, the ifp driver could be updated to support the maximum
> number of targets on a loop, which might also solve your second
> problem.)

Your alternative option isn't going to happen. The ifp driver and the card it supports have both long since been EOL'd.

James C. McPherson
--
Solaris kernel software engineer
Sun Microsystems
Re: [zfs-discuss] Re: ZFS related kernel panic
Anton B. Rang wrote:
> I agree -- a disk error should never be fatal to the system; at worst,
> the file system should appear to have been forcibly unmounted [...]
> That at least gives other applications which aren't using the file
> system the chance to keep going.

This is not always the desired behavior. In particular, for a high-availability cluster, if one node is having difficulty and another is not, then we'd really like to have the services relocated to the good node ASAP. I think this case is different, though...

> An I/O error detected when writing a file can be reported at write()
> time, fsync() time, or close() time. Any application which doesn't
> check all three of these won't handle all I/O errors properly [...]
> ZFS should report back these errors in all cases and avoid panicking
> (obviously).

From what I recall of previous discussions on this topic (search the archives), the difficulty is attributing a failure temporally, given that you want a file system to have better performance by caching.

> That said, it also appears that the device drivers (either the
> FibreChannel or SCSI disk drivers in this case) are misbehaving. [...]

Agree 110%. When debugging layered software/firmware, it is essential to understand all of the assumptions made at all interfaces. Currently, ZFS assumes that a fatal write error is in fact fatal.

> So ZFS should be more resilient against write errors, and the SCSI
> disk or FC drivers should be more resilient against LIPs (the most
> likely cause of your problem) or other transient errors. [...]

NB: LIPs are a normal part of everyday life for fibre channel; they are not an error. But I think Anton is right here: the way that the driver deals with incurred exceptions is key to the upper layers being stable. This can be tuned, but remember that tuning may lead to instability. We might be dealing with an instability case here, not a functional-spec problem.
 -- richard
Re: [zfs-discuss] ZFS related kernel panic
Dale Ghent wrote:
> Similar to UFS's onerror mount option, I take it?

Actually, it would be interesting to see how many customers change the onerror setting. We have some data, just need more days in the hour.
 -- richard
Re: [zfs-discuss] ZFS related kernel panic
Richard Elling wrote:
> Actually, it would be interesting to see how many customers change the
> onerror setting. We have some data, just need more days in the hour.

I'm pretty sure you'd find that info in over 6 years of submitted Explorer output :) I imagine that stuff is sandboxed away in a far-off department, though.

/dale
[zfs-discuss] need Clarification on ZFS
Hi All,

I am new to Solaris. Please clarify the following questions for me.

1) On Linux, to detect the presence of an ext2/ext3 file system on a device we use the tune2fs command. Similar to tune2fs, is there any command to detect the presence of a ZFS file system on a device?

2) When a device is shared between two machines, what our project does is:

   - Create an ext2 file system on the device
   a) Mount the device on machine 1
   b) Write data to the device
   c) Unmount the device from machine 1
   d) Mount the device on machine 2
   e) Read the data on the device
   f) Compare the current read data with the previously written data and report the result
   g) Unmount the device from machine 2
   h) Go to step a.

   Like this, can we share a ZFS file system between two machines? If so, please explain how.

3) Can we create ZFS pools (or ZFS file systems) on VxVM volumes? If so, how?

4) Can we share ZFS pools (ZFS file systems) between two machines?

5) Like the fsck command on Linux, is there any command to check the consistency of a ZFS file system?

Your help is appreciated.

Thanks & Regards
Masthan
Re: [zfs-discuss] need Clarification on ZFS
Hi Mastan,

On Dec 4, 2006, at 11:13 PM, dudekula mastan wrote:
> 1) On Linux, to detect the presence of an ext2/ext3 file system on a
> device we use the tune2fs command. Similar to tune2fs, is there any
> command to detect the presence of a ZFS file system on a device?

zpool import will list any zpools even when they're not currently visible in a zpool list. zfs get -r all zpool-name will list all zfs filesystems.

> 2) When a device is shared between two machines [...] Like this, can
> we share a ZFS file system between two machines? If so, please
> explain how.

It's always going from machine 1 to machine 2?

  zfs send [EMAIL PROTECTED] | ssh [EMAIL PROTECTED] zfs recv filesystem-on-machine2

will stream a snapshot from the first machine to a filesystem/device/snapshot on machine2.

> 3) Can we create ZFS pools (or ZFS file systems) on VxVM volumes? If
> so, how?

It's been so long since I've cared about VxVM volumes, I don't know.

> 4) Can we share ZFS pools (ZFS file systems) between two machines?

Yes, but what are the requirements here?

> 5) Like the fsck command on Linux, is there any command to check the
> consistency of a ZFS file system?

ZFS is a transactional, copy-on-write filesystem. It's always consistent. There is output from zpool status that includes this information, for example:

[strongspace12(zone):/] root# zpool status
  pool: thumper12
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        thumper     ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
            c7t6d0  ONLINE       0     0     0
            c6t6d0  ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c5t7d0  ONLINE       0     0     0
            c4t7d0  ONLINE       0     0     0
            c7t7d0  ONLINE       0     0     0
            c6t7d0  ONLINE       0     0     0
            c1t7d0  ONLINE       0     0     0
            c0t7d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
        spares
          c5t1d0    AVAIL
          c5t2d0    AVAIL
          c5t3d0    AVAIL

errors: No known data errors

Regards,
Jason

Jason A. Hoffman, PhD | Founder, CTO, Joyent Inc.
Applications = http://joyent.com/
Hosting = http://textdrive.com/
Backups = http://strongspace.com/
Weblog = http://joyeur.com/
Email = [EMAIL PROTECTED] or [EMAIL PROTECTED]
Mobile = (858)342-2179
Re: [zfs-discuss] need Clarification on ZFS
> 1) On Linux, to detect the presence of an ext2/ext3 file system on a
> device we use the tune2fs command. Similar to tune2fs, is there any
> command to detect the presence of a ZFS file system on a device?

You can use 'zpool import' to check normal disk devices, or give an optional list of devices/directories to search specifically for zfs presence; or you can use 'fstyp' to guess at a filesystem on any type of named device based on its signature.

# fstyp /dev/rdsk/c0t8d0s0
ufs
# fstyp /dev/rdsk/c1t8d0s0
zfs

> 2) When a device is shared between two machines [...] Like this, can
> we share a ZFS file system between two machines? If so, please
> explain how.

Yes. 'zpool export' and 'zpool import' can be used to unmount and remount the pool and filesystems on different machines at separate times.

> 3) Can we create ZFS pools (or ZFS file systems) on VxVM volumes? If
> so, how?

Haven't tried it, but you should be able to pass the volume in on the zpool create command line as a device.

> 4) Can we share ZFS pools (ZFS file systems) between two machines?

Not simultaneously at this point.

> 5) Like the fsck command on Linux, is there any command to check the
> consistency of a ZFS file system?

Not in exactly the same way (because it's not needed in the same way), but it can be scrubbed periodically to verify that all the data checksums correctly. Also, you can do this while it is online.

--
Darren Dunham  [EMAIL PROTECTED]
Senior Technical Consultant  TAOS  http://www.taos.com/
Got some Dr Pepper?  San Francisco, CA bay area
This line left intentionally blank to confuse you.
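The export/import cycle for the shared-device test in question 2 would look roughly like this. The pool name is hypothetical; both machines must see the same device, and only one may have the pool imported at a time.

  # on machine 1, after writing the data
  zpool export mypool

  # on machine 2
  zpool import            # with no arguments, lists importable pools
  zpool import mypool     # take over the pool and mount its filesystems

Repeating the handoff in the other direction is the same pair of commands with the roles of the machines swapped.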