Re: [zfs-discuss] help diagnosing system hang
Ethan Erchinger wrote:
> Richard Elling wrote:
>>> asc = 0x29
>>> ascq = 0x0
>>
>> ASC/ASCQ 29/00 is POWER ON, RESET, OR BUS DEVICE RESET OCCURRED
>> http://www.t10.org/lists/asc-num.htm#ASC_29
>>
>> [this should be more descriptive as the codes are, more-or-less,
>> standardized; I'll try to file an RFE, unless someone beats me to it]
>>
>> Depending on which system did the reset, it should be noted in the
>> /var/adm/messages log. This makes me suspect the hardware (firmware,
>> actually).
>>
> Firmware of the SSD, or something else?

The answer may lie in the /var/adm/messages file, which should report whether a reset was received or sent.

 -- richard
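P.S. A rough sketch of what to look for (nothing box-specific, adjust the pattern as needed):

  grep -i reset /var/adm/messages

Entries logged by the HBA or disk driver around the time of the hang should indicate whether the reset was issued by the host or reported by the device.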
[zfs-discuss] Status of zpool remove in raidz and non-redundant stripes
I've seen discussions as far back as 2006 that say development is underway to allow the addition and removal of disks in a raidz vdev to grow/shrink the group. Meaning, if a 4x100GB raidz only used 150GB of space, one could do 'zpool remove tank c0t3d0' and data residing on c0t3d0 would be migrated to other disks in the raidz. Then, c0t3d0 would be free for removal and reuse.

What is the status of this support in nv101? If a pool has multiple raidz vdevs, how would one add a disk to the second raidz vdev?
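As far as I can tell, the only thing that works today is growing the pool by adding an entire new top-level vdev, e.g. (device names made up):

  zpool add tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0

which is not the same as widening, shrinking, or removing an existing raidz vdev.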
Re: [zfs-discuss] help diagnosing system hang
Richard Elling wrote:
> The answer may lie in the /var/adm/messages file which should report
> if a reset was received or sent.

Here is a sample set of messages at that time. It looks like timeouts on the SSD for various requested blocks. Maybe I need to talk with Intel about this issue.

Ethan

==
Dec 2 20:14:01 opensolaris scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd16):
Dec 2 20:14:01 opensolaris Error for Command: write    Error Level: Retryable
Dec 2 20:14:01 opensolaris scsi: [ID 107833 kern.notice] Requested Block: 840    Error Block: 840
Dec 2 20:14:01 opensolaris scsi: [ID 107833 kern.notice] Vendor: ATA    Serial Number: CVEM840201EU
Dec 2 20:14:01 opensolaris scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention
Dec 2 20:14:01 opensolaris scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Dec 2 20:15:08 opensolaris scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt0):
Dec 2 20:15:08 opensolaris Disconnected command timeout for Target 15
Dec 2 20:15:09 opensolaris scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt0):
Dec 2 20:15:09 opensolaris Log info 0x3114 received for target 15.
Dec 2 20:15:09 opensolaris scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Dec 2 20:15:09 opensolaris scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt0):
Dec 2 20:15:09 opensolaris Log info 0x3114 received for target 15.
Dec 2 20:15:09 opensolaris scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Dec 2 20:15:09 opensolaris scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt0):
Dec 2 20:15:09 opensolaris Log info 0x3114 received for target 15.
Dec 2 20:15:09 opensolaris scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Dec 2 20:15:09 opensolaris scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt0):
Dec 2 20:15:09 opensolaris Log info 0x3114 received for target 15.
Dec 2 20:15:09 opensolaris scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Dec 2 20:15:12 opensolaris scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd16):
Dec 2 20:15:12 opensolaris Error for Command: write    Error Level: Retryable
Dec 2 20:15:12 opensolaris scsi: [ID 107833 kern.notice] Requested Block: 810    Error Block: 810
Dec 2 20:15:12 opensolaris scsi: [ID 107833 kern.notice] Vendor: ATA    Serial Number: CVEM840201EU
Dec 2 20:15:12 opensolaris scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention
Dec 2 20:15:12 opensolaris scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Dec 2 20:16:19 opensolaris scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt0):
Dec 2 20:16:19 opensolaris Disconnected command timeout for Target 15
Dec 2 20:16:21 opensolaris scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt0):
Dec 2 20:16:21 opensolaris Log info 0x3114 received for target 15.
Dec 2 20:16:21 opensolaris scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Dec 2 20:16:21 opensolaris scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt0):
Dec 2 20:16:21 opensolaris Log info 0x3114 received for target 15.
Dec 2 20:16:21 opensolaris scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Dec 2 20:16:21 opensolaris scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt0):
Dec 2 20:16:21 opensolaris Log info 0x3114 received for target 15.
Dec 2 20:16:21 opensolaris scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Dec 2 20:16:21 opensolaris scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],0/pci10de,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt0):
Dec 2 20:16:21 opensolaris Log info 0x3114 received for target 15.
Dec 2 20:16:21 opensolaris scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
[zfs-discuss] redundancy in non-redundant stripes
With ZFS, we can enable copies=[1,2,3] to configure how many copies of data there are. With copies of 2 or more, in theory, an entire disk can have read errors, and the zfs volume still works.

The unfortunate part here is that the redundancy lies in the volume, not the pool vdev like with raidz or mirroring. So if a disk were to go missing, the zpool (stripe) is missing a vdev and thus becomes offline. If a single disk in a raidz vdev is missing, it would become degraded and still usable. Now, with non-redundant stripes, the disk can't be replaced, but all the data is still there with copies=2 if a disk dies. Is there not a way to force the zpool online or prevent it from offlining itself?

One of the key benefits of the metadata copies is that if a single block fails, the filesystem is still navigable to grab what data is possible.
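For reference, the setting I mean is the per-dataset property, e.g. (dataset name made up):

  zfs set copies=2 tank/data
  zfs get copies tank/data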
Re: [zfs-discuss] redundancy in non-redundant stripes
On Fri, 5 Dec 2008, Mike Brancato wrote:
> With ZFS, we can enable copies=[1,2,3] to configure how many copies
> of data there are. With copies of 2 or more, in theory, an entire
> disk can have read errors, and the zfs volume still works.

So you are saying that if we use copies of 2 or more, and we have only one disk drive and it does not spin up, then we should be ok?

My understanding is that the copies function is purely statistical, and if some drives are overloaded and therefore are not selected as the next round-robin device, it is possible that the several copies may end up on the same drive. The copies functionality is intended to aid with media failure and not whole-drive failure.

Bob

==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Status of zpool remove in raidz and non-redundant stripes
Mike Brancato wrote:
> I've seen discussions as far back as 2006 that say development is underway to
> allow the addition and removal of disks in a raidz vdev to grow/shrink the
> group. Meaning, if a 4x100GB raidz only used 150GB of space, one could do
> 'zpool remove tank c0t3d0' and data residing on c0t3d0 would be migrated to
> other disks in the raidz. Then, c0t3d0 would be free for removal and reuse.
>
> What is the status of this support in nv101?

Not available. I predict that you will see it mentioned everywhere (billboards, graffiti, slashdot, etc.) when it arrives.

 -- richard
Re: [zfs-discuss] redundancy in non-redundant stripes
In theory, with 2 80GB drives, you would always have a copy somewhere else. But with a single drive, no. I guess I'm thinking of the optimal situation.

With multiple drives, copies are spread through the vdevs. I guess it would work better if we could require that, with copies=2 or more, at least one copy be on a different vdev.
Re: [zfs-discuss] Status of zpool remove in raidz and non-redundant stripes
Well, I knew it wasn't available. I meant to ask: what is the status of the development of the feature? Not started, I presume. Is there no timeline?
Re: [zfs-discuss] redundancy in non-redundant stripes
Mike Brancato wrote:
> With ZFS, we can enable copies=[1,2,3] to configure how many copies of data
> there are. With copies of 2 or more, in theory, an entire disk can have read
> errors, and the zfs volume still works.

No, this is not a completely true statement.

> The unfortunate part here is that the redundancy lies in the volume, not the
> pool vdev like with raidz or mirroring. So if a disk were to go missing, the
> zpool (stripe) is missing a vdev and thus becomes offline. If a single disk
> in a raidz vdev is missing, it would become degraded and still usable. Now,
> with non-redundant stripes, the disk can't be replaced, but all the data is
> still there with copies=2 if a disk dies. Is there not a way to force the
> zpool online or prevent it from offlining itself?

No. If you want this feature with copies>1, then consider mirroring.

> One of the key benefits of the metadata copies is that if a single block
> fails, the filesystem is still navigable to grab what data is possible.

Yes. But you cannot guarantee that the metadata copies are on different vdevs.

 -- richard
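P.S. By "consider mirroring" I mean something along these lines (device names are just placeholders):

  zpool create tank mirror c0t0d0 c0t1d0

or, to turn an existing single-disk vdev into a mirror after the fact:

  zpool attach tank c0t0d0 c0t1d0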
Re: [zfs-discuss] zpool replace - choke point
[EMAIL PROTECTED] said:
> Thanks for the tips. I'm not sure if they will be relevant, though. We
> don't talk directly with the AMS1000. We are using a USP-VM to virtualize
> all of our storage and we didn't have to add anything to the drv
> configuration files to see the new disk (mpxio was already turned on). We
> are using the Sun drivers and mpxio and we didn't require any tinkering to
> see the new LUNs.

Yes, the fact that the USP-VM was recognized automatically by Solaris drivers is a good sign. I suggest that you check to see what queue-depth and disksort values you ended up with from the automatic settings:

  echo "*ssd_state::walk softstate |::print -t struct sd_lun un_throttle" \
    | mdb -k

The "ssd_state" would be "sd_state" on an x86 machine (Solaris 10). The "un_throttle" above will show the current max_throttle (queue depth); replace it with "un_min_throttle" to see the min, and "un_f_disksort_disabled" to see the current queue-sort setting.

The HDS docs for the 9500 series suggested 32 as the max_throttle to use, and the default setting (Solaris 10) was 256 (hopefully with the USP-VM you get something more reasonable). And while 32 did work for us, i.e. no operations were ever lost as far as I could tell, the array back-end -- the drives themselves and the internal SATA shelf connections -- has an actual queue depth of four for each array controller. The AMS1000 has the same limitation for SATA shelves, according to our HDS engineer.

In short, Solaris, especially with ZFS, functions much better if it does not try to send more FC operations to the array than the actual physical devices can handle. We were actually seeing NFS client operations hang for minutes at a time when the SAN-hosted NFS server was making its ZFS devices busy, and this was true even if clients were using different devices than the busy ones. We do not see these hangs after making the described changes, and I believe this is because the OS is no longer waiting around for a response from devices that aren't going to respond in a reasonable amount of time.

Yes, having the USP between the host and the AMS1000 will affect things; there's probably some huge cache in there somewhere. But unless you've got a cache hundreds of GB in size, at some point a resilver operation is going to end up running at the speed of the actual back-end device.

Regards, Marion
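P.S. If the throttle you find is far higher than the array can handle, the usual cap is an /etc/system entry plus a reboot. This is only a sketch: confirm whether your LUNs attach through ssd or sd on your platform, and take the actual value from your array vendor's recommendation rather than from here.

  set ssd:ssd_max_throttle=32
  * or, for LUNs driven by sd rather than ssd:
  set sd:sd_max_throttle=32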
[zfs-discuss] ZFS fragments 32 bits RAM? Problem?
I see this old post about ZFS fragmenting RAM on 32-bit systems, which makes the memory run out. Is it still true, or has it been fixed?

http://mail.opensolaris.org/pipermail/zfs-discuss/2006-July/003506.html
Re: [zfs-discuss] Status of zpool remove in raidz and non-redundant stripes
> "mb" == Mike Brancato <[EMAIL PROTECTED]> writes:

mb> if a 4x100GB raidz only used 150GB of space, one could do
mb> 'zpool remove tank c0t3d0' and data residing on c0t3d0 would
mb> be migrated to other disks in the raidz.

That sounds like in-place changing of stripe width, and wasn't part of the discussion I remember. We were wishing for vdev removal, but you'd have to remove a whole vdev at a time. It would be analogous to 'zpool add', so just as you can't add 1 disk to widen a 3-disk raidz vdev to 4 disks, you couldn't do the reverse even with the wished-for feature. To change from a 4x100GB raidz to a 3x100GB raidz, you'd have to:

  zpool add pool raidz disk5 disk6 disk7
  zpool evacuate pool raidz disk1 disk2 disk3 disk4

RFE 4852783 is to create something like 'zpool evacuate', removing the whole vdev at once and migrating onto other vdevs, not other disks. The feature's advantage as-is would be for pools with many vdevs. It could also be an advantage for pools with just one vdev that are humongous: you want to change the shape of the one vdev, but you need to do the copy/evacuation online because it takes a week. If not for the week, on a 1-vdev pool you could destroy the pool and make a new one without needing any more media than you would with the new feature.

For home storage with big, slow, cheap pools, what you want sounds nice. Someone once told me he'd gotten Veritas to change a plex's width with the vg online, but for me I think it's scary because, if it crashed halfway through, I'm not sure how the system could communicate to me what's happening in a way I'd understand, much less recover from it. I'm not saying Veritas doesn't do both, just that I'd chuckle happily if I saw it actually work (which was the storyteller's response too). For vdev removal I think you could harmlessly stop the evacuation at any time with only O(1) quickie-import-time recovery, without needing to communicate anything. Much easier.

So I like the RFE as-is, analogous to Linux LVM2's pvmove.
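For anyone who hasn't used it, the LVM2 sequence in question looks roughly like this (device and volume-group names are made up):

  pvmove /dev/sdd1            # migrate allocated extents off the device, online
  vgreduce datavg /dev/sdd1   # then drop the now-empty device from the volume group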
Re: [zfs-discuss] ZFS fragments 32 bits RAM? Problem?
On Fri, Dec 05, 2008 at 11:35:27AM -0800, Orvar Korvar wrote:
> I see this old post about ZFS fragmenting the RAM if it is 32 bit. This makes
> the memory run out. Is it still true, or has it been fixed?

Don't waste your time trying to run ZFS on a 32-bit machine. The performance is horrible. I really wish I hadn't.

-brian

--
"Coding in C is like sending a 3 year old to do groceries. You gotta tell them
exactly what you want or you'll end up with a cupboard full of pop tarts and
pancake mix." -- IRC User (http://www.bash.org/?841435)
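P.S. If you're stuck on 32-bit hardware anyway, one mitigation that gets suggested is to cap the ARC in /etc/system and reboot. This only limits memory pressure from the ARC; it does not fix the underlying kernel address-space fragmentation, and the value below is just a placeholder to tune for your RAM.

  set zfs:zfs_arc_max=0x20000000
  * 0x20000000 = 512 MB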
Re: [zfs-discuss] zfs not yet suitable for HA applications?
Trying to keep this in the spotlight. Apologies for the lengthy post.

I'd really like to see features as described by Ross in his summary of the "Availability: ZFS needs to handle disk removal / driver failure better" thread (http://www.opensolaris.org/jive/thread.jspa?messageID=274031). I'd like to have these/similar features as well. Has there already been internal discussion regarding adding this type of functionality to ZFS itself, and was there approval, disapproval or no decision?

Unfortunately my situation has put me in urgent need to find workarounds in the meantime.

My setup: I have two iSCSI target nodes, each with six drives exported via iscsi (Storage Nodes). I have a ZFS Node that logs into each target from both Storage Nodes and creates a mirrored zpool with one drive from each Storage Node comprising each half of the mirrored vdevs (6 x 2-way mirrors).

My problem: If a Storage Node crashes completely, is disconnected from the network, iscsitgt core dumps, a drive is pulled, or a drive has a problem accessing data (read retries), then my ZFS Node hangs while ZFS waits patiently for the layers below to report a problem and time out the devices. This can lead to a roughly 3 minute or longer halt when reading OR writing to the zpool on the ZFS Node. While this is acceptable in certain situations, I have a case where my availability demand is more severe.

My goal: figure out how to have the zpool pause for NO LONGER than 30 seconds (roughly within a typical HTTP request timeout) and then issue reads/writes to the good devices in the zpool/mirrors while the other side comes back online or is fixed.

My ideas:

1. In the case of the iscsi targets disappearing (iscsitgt core dump, Storage Node crash, Storage Node disconnected from network), I need to lower the iSCSI login retry/timeout values. Am I correct in assuming the only way to accomplish this is to recompile the iscsi initiator? If so, can someone help point me in the right direction (I have never compiled ONNV sources - do I need to do this or can I just recompile the iscsi initiator)?

1.a. I'm not sure in what initiator session states iscsi_sess_max_delay is applicable - only for the initial login, or also in the case of reconnect? Ross, if you still have your test boxes available, can you please try setting "set iscsi:iscsi_sess_max_delay = 5" in /etc/system, reboot and try failing your iscsi vdevs again? I can't find a case where this was tested for quick failover.

1.b. I would much prefer to have bug 649 addressed and fixed rather than having to resort to recompiling the iscsi initiator (if iscsi_sess_max_delay doesn't work). This seems like a trivial feature to implement. How can I sponsor development?

2. In the case of the iscsi target being reachable, but the physical disk having problems reading/writing data (retryable events that take roughly 60 seconds to time out), should I change the iscsi_rx_max_window tunable with mdb? Is there a tunable for iscsi_tx? Ross, I know you tried this recently in the thread referenced above (with value 15), which resulted in a 60 second hang. How did you offline the iscsi vol to test this failure? Unless iscsi uses a multiple of the value for retries, then maybe the way you failed the disk caused the iscsi system to follow a different failure path? Unfortunately I don't know of a way to introduce read/write retries to a disk while the disk is still reachable and presented via iscsitgt, so I'm not sure how to test this.
2.a. With the fix of http://bugs.opensolaris.org/view_bug.do?bug_id=6518995 , we can set sd_retry_count along with sd_io_time to cause I/O failure when a command takes longer than sd_retry_count * sd_io_time. Can (or should) these tunables be set on the imported iscsi disks in the ZFS Node, or can/should they be applied only to the local disks on the Storage Nodes? If there is a way to apply them to ONLY the imported iscsi disks (and not the local disks) of the ZFS Node, and without rebooting every time a new iscsi disk is imported, then I'm thinking this is the way to go. (A combined /etc/system sketch of the tunables discussed in 1.a and here is at the foot of this message [1].)

In a year of having this setup in customer beta I have never had Storage Nodes (or both sides of a mirror) down at the same time. I'd like ZFS to take advantage of this. If (and only if) both sides fail then ZFS can enter failmode=wait.

Currently using Nevada b96. Planning to move to >100 shortly to avoid zpool commands hanging while the zpool is waiting to reach a device.

David Anderson
Aktiom Networks, LLC

Ross wrote:
> I discussed this exact issue on the forums in February, and filed a bug at
> the time. I've also e-mailed and chatted with the iSCSI developers, and the
> iSER developers a few times. There was also another thread about the iSCSI
> timeouts being made configurable a few months back, and finally, I started
> another discussion on ZFS availability,
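[1] The /etc/system entries referenced above would look roughly like this. The 5-second login delay comes straight from 1.a; the sd values are placeholders only, and whether sd_io_time/sd_retry_count can be scoped to just the imported iscsi disks is exactly the open question in 2.a.

  set iscsi:iscsi_sess_max_delay = 5
  * placeholder values: sd_io_time * sd_retry_count bounds the time to I/O failure
  set sd:sd_io_time = 10
  set sd:sd_retry_count = 3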
Re: [zfs-discuss] zfs not yet suitable for HA applications?
Hi Dan, replying inline:

On Fri, Dec 5, 2008 at 9:19 PM, David Anderson <[EMAIL PROTECTED]> wrote:
> Trying to keep this in the spotlight. Apologies for the lengthy post.

Heh, don't apologise, you should see some of my posts... o_0

> I'd really like to see features as described by Ross in his summary of the
> "Availability: ZFS needs to handle disk removal / driver failure better"
> thread (http://www.opensolaris.org/jive/thread.jspa?messageID=274031).
> I'd like to have these/similar features as well. Has there already been
> internal discussions regarding adding this type of functionality to ZFS
> itself, and was there approval, disapproval or no decision?
>
> Unfortunately my situation has put me in urgent need to find workarounds in
> the meantime.
>
> My setup: I have two iSCSI target nodes, each with six drives exported via
> iscsi (Storage Nodes). I have a ZFS Node that logs into each target from
> both Storage Nodes and creates a mirrored zpool with one drive from each
> Storage Node comprising each half of the mirrored vdevs (6 x 2-way mirrors).
>
> My problem: If a Storage Node crashes completely, is disconnected from the
> network, iscsitgt core dumps, a drive is pulled, or a drive has a problem
> accessing data (read retries), then my ZFS Node hangs while ZFS waits
> patiently for the layers below to report a problem and time out the devices.
> This can lead to a roughly 3 minute or longer halt when reading OR writing
> to the zpool on the ZFS Node. While this is acceptable in certain
> situations, I have a case where my availability demand is more severe.
>
> My goal: figure out how to have the zpool pause for NO LONGER than 30
> seconds (roughly within a typical HTTP request timeout) and then issue
> reads/writes to the good devices in the zpool/mirrors while the other side
> comes back online or is fixed.
>
> My ideas:
>
> 1. In the case of the iscsi targets disappearing (iscsitgt core dump,
> Storage Node crash, Storage Node disconnected from network), I need to lower
> the iSCSI login retry/timeout values. Am I correct in assuming the only way
> to accomplish this is to recompile the iscsi initiator? If so, can someone
> help point me in the right direction (I have never compiled ONNV sources -
> do I need to do this or can I just recompile the iscsi initiator)?

I believe it's possible to just recompile the initiator and install the new driver. I have some *very* rough notes that were sent to me about a year ago, but I've no experience compiling anything in Solaris, so I don't know how useful they will be. I'll try to dig them out in case they're useful.

> 1.a. I'm not sure in what initiator session states iscsi_sess_max_delay is
> applicable - only for the initial login, or also in the case of reconnect?
> Ross, if you still have your test boxes available, can you please try
> setting "set iscsi:iscsi_sess_max_delay = 5" in /etc/system, reboot and try
> failing your iscsi vdevs again? I can't find a case where this was tested
> for quick failover.

Will gladly have a go at this on Monday.

> 1.b. I would much prefer to have bug 649 addressed and fixed rather
> than having to resort to recompiling the iscsi initiator (if
> iscsi_sess_max_delay doesn't work). This seems like a trivial feature to
> implement. How can I sponsor development?
>
> 2. In the case of the iscsi target being reachable, but the physical disk
> having problems reading/writing data (retryable events that take roughly
> 60 seconds to time out), should I change the iscsi_rx_max_window tunable with
> mdb? Is there a tunable for iscsi_tx? Ross, I know you tried this recently
> in the thread referenced above (with value 15), which resulted in a 60
> second hang. How did you offline the iscsi vol to test this failure? Unless
> iscsi uses a multiple of the value for retries, then maybe the way you
> failed the disk caused the iscsi system to follow a different failure path?
> Unfortunately I don't know of a way to introduce read/write retries to a
> disk while the disk is still reachable and presented via iscsitgt, so I'm
> not sure how to test this.

So far I've just been shutting down the Solaris box hosting the iSCSI target. Next step will involve pulling some virtual cables. Unfortunately I don't think I've got a physical box handy to test drive failures right now, but my previous testing (of simply pulling drives) showed that it can be hit and miss as to how well ZFS detects these types of 'failure'. Like you I don't know yet how to simulate failures, so I'm doing simple tests right now, offlining entire drives or computers. Unfortunately I've found more than enough problems with just those tests to keep me busy.

> 2.a. With the fix of
> http://bugs.opensolaris.org/view_bug.do?bug_id=6518995 , we can set
> sd_retry_count along with sd_io_time to cause I/O failure when a command
> takes longer than sd_retry_count * sd_io_time. Can (or should) these
> tunables be set on the imported iscsi disks in the ZFS N
Re: [zfs-discuss] zfs not yet suitable for HA applications?
> "da" == David Anderson <[EMAIL PROTECTED]> writes:

da> (I have never
da> compiled ONNV sources - do I need to do this or can I just
da> recompile the iscsi initiator)?

The source offering is disorganized and spread over many ``consolidations'' which are pushed through ``gates'', similar to Linux with its source tarballs and kernel patch-kits, but only tens of consolidations instead of thousands of packages. The downside: the overall source-to-binary system you get with Gentoo portage or Debian dpkg or Red Hat SRPMs to gather the consolidations and turn them into an .iso is Sun-proprietary. There was talk of an IPS ``distribution builder'' but it seems to be a binary-only FLAR replacement for replicating installed systems, not a proper open build system that consumes sources and produces IPS ``images''.

I don't know which consolidation holds the iSCSI initiator sources, or how to find it. Also, for sharing your experiences you need to get the exact same version of the sources as on other people's binary installs, so you can compare with others while changing only what you're trying to change: ``snv_101 + my timeout change''. I'm not sure how to do that---I see some bugs, for example 6684570, that refer to versions like ``onnv-gate:2008-04-04'', but how does that map to the snv_ version markers, or is it a different branch entirely? On Linux or BSD I'd use the package system to both find the source and get the exact version of it I'm running.

There are only a few consolidations to dig through: maybe either the ON consolidation or the Storage consolidation? Once you find it, all you have to do is solve the version question.
[zfs-discuss] Problem with ZFS and ACL with GDM
I am the maintainer of GDM, and I am noticing that GDM has a problem when running on a ZFS filesystem, as with Indiana.

When GDM (the GNOME Display Manager) starts the login GUI, it runs the following commands on Solaris:

  /usr/bin/setfacl -m user:gdm:rwx,mask:rwx /dev/audio
  /usr/bin/setfacl -m user:gdm:rwx,mask:rwx /dev/audioctl

It does this because the login GUI programs are run as the "gdm" user, and in order to support text-to-speech via orca, for users with accessibility needs, the "gdm" user needs access to the audio device. We were using setfacl because logindevperm(3) normally manages the audio device permissions and we only want the "gdm" user to have access on-the-fly when the GDM GUI is started.

However, I notice that when using ZFS on Indiana the above commands fail with the following error:

  File system doesn't support aclent_t style ACL's.
  See acl(5) for more information on ACL styles support by Solaris.

What is the appropriate command to use with ZFS? If different commands are needed based on the file system type, then how can GDM determine which command to use? Or is there a better way for GDM to ensure that the audio device has the appropriate permissions for the "gdm" user to support text-to-speech accessibility?

I am not subscribed to this list, so please cc: me in any response.

Thanks, Brian
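P.S. From what I can tell so far, the equivalent operation using the NFSv4-style ACLs that ZFS supports would look something like the chmod(1) syntax below. This is only a sketch of that syntax; whether it is the right approach for GDM, and how GDM should detect which ACL style a given file system wants, is exactly what I'm asking the list.

  /usr/bin/chmod A+user:gdm:read_data/write_data:allow /dev/audio
  /usr/bin/chmod A+user:gdm:read_data/write_data:allow /dev/audioctl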