Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-04-10 Thread Markus Kovero
... I have identified the culprit is the Western Digital drive WD2002FYPS-01U1B0. It's not clear if they can fix it in firmware, but Western Digital is replacing my drives. Feb 17 04:45:10 thecratewall scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc Feb 17 04:45:10 thecratewall scsi:

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-04-10 Thread Maurice Volaski
Hi, do you have disks connected in sata1/2? With WD2003FYYS-01T8B0/WD20EADS-00S2B0/WD1001FALS-00J7B1/WD1002FBYS-01A6B0 these timeouts are to be expected if disk is in SATA2 mode, No, why are they to be expected with SATA2 mode? Is the defect specific to the SATA2 circuitry? I guess it could

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-04-10 Thread Markus Kovero
No, why are they to be expected with SATA2 mode? Is the defect specific to the SATA2 circuitry? I guess it could be a temporary workaround provided they would eventually fix the problem in firmware, but I'm getting new drives, so I guess I can't complain :-) Probably your new disks do

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-04-09 Thread Maurice Volaski
At 11:19 AM +1000 2/19/10, James C. McPherson wrote: On 19/02/10 12:51 AM, Maurice Volaski wrote: For those who've been suffering this problem and who have non-Sun jbods, could you please let me know what model of jbod and cables (including length thereof) you have in your configuration. For

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-02-18 Thread Maurice Volaski
For those who've been suffering this problem and who have non-Sun jbods, could you please let me know what model of jbod and cables (including length thereof) you have in your configuration. For those of you who have been running xVM without MSI support, could you please confirm whether the

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-02-18 Thread John
For those who've been suffering this problem and who have non-Sun jbods, could you please let me know what model of jbod and cables (including length thereof) you have in your configuration. We are seeing the problem on both Sun and non-Sun hardware. On our Sun thumper x4540, we can

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-02-18 Thread James C. McPherson
On 19/02/10 12:51 AM, Maurice Volaski wrote: For those who've been suffering this problem and who have non-Sun jbods, could you please let me know what model of jbod and cables (including length thereof) you have in your configuration. For those of you who have been running xVM without MSI

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-02-06 Thread Maurice Volaski
For those who've been suffering this problem and who have non-Sun jbods, could you please let me know what model of jbod and cables (including length thereof) you have in your configuration. For those of you who have been running xVM without MSI support, could you please confirm whether the

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-02-04 Thread Tonmaus
Hi Simon I.e. you'll have to manually intervene if a consumer drive causes the system to hang, and replace it, whereas the RAID edition drives will probably report the error quickly and then ZFS will rewrite the data elsewhere, and thus maybe not kick the drive. IMHO the relevant aspects

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-02-03 Thread Mark Nipper
Looks like I got the textbook response from Western Digital: --- Western Digital technical support only provides jumper configuration and physical installation support for hard drives used in systems running the Linux/Unix operating systems. For setup questions beyond physical installation of

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-02-03 Thread Tonmaus
Hi Simon, they are the new revision. I got the impression as well that the complaints you reported were mainly related to embedded Linux systems probably running LVM / mda. (thecus, Qnap, ) Other reports I had seen related to typical HW raids. I don't think the situation is comparable to

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-02-03 Thread Simon Breden
Hi Tonmaus, they are the new revision. OK. I got the impression as well that the complaints you reported were mainly related to embedded Linux systems probably running LVM / mda. (thecus, Qnap, ) Other reports I had seen related to typical HW raids. I don't think the situation is

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-02-02 Thread Simon Breden
My timeout issue is definitely the WD10EARS disks. WD has chosen to cripple their consumer grade disks when used in quantities greater than one. I'll now need to evaluate alternative suppliers of low cost disks for low end high volume storage. Mark. typo ST32000542AS not NS This was

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-02-02 Thread Tonmaus
Hi Simon, I am running 5 WD20EADS in a raidz-1+spare on ahci controller without any problems I could relate to TLER or head parking. Cheers, Tonmaus

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-02-02 Thread Simon Breden
Hi Tonmaus, That's good to hear. Which revision are they: 00R6B0 or 00P8B0? It's marked on the drive top. From what I've seen elsewhere, people seem to be complaining about the newer 00P8B0 revision, so I'd be interested to hear from you. These revision numbers are listed in the first post of

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-02-02 Thread Mark Nipper
That's good to hear. Which revision are they: 00R6B0 or 00P8B0? It's marked on the drive top. Interesting. I wonder if this is the issue too with the 01U1B0 2.0TB drives? I have 24 WD2002FYPS-01U1B0 drives under OpenSolaris with an LSI 1068E controller that have weird timeout issues and I

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-02-02 Thread Simon Breden
If I'm not mistaken then the WD2002FYPS is an enterprise model: WD RE4-GP (RAID Edition, Green Power), so you almost certainly have the firmware that allows (1) the idle time before spindown to be modified with WDIDLE3.EXE and (2) the error reporting time to be modified with WDTLER.EXE. So I

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-02-01 Thread Mark Bennett
The results are in: My timeout issue is definitely the WD10EARS disks. Although differences in the error rate were seen with different LSI firmware revisions, the errors persisted. The more disks on the expander, the higher the number with iostat errors. This then causes zpool issues (disk

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-01-26 Thread Mark Nipper
I would definitely be interested to see if the newer firmware fixes the problem for you. I have a very similar setup to yours, and finally forcing the firmware flash to 1.26.00 of my on-board LSI 1068E on a SuperMicro H8DI3+ running snv_131 seemed to address the issue. I'm still waiting to

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2010-01-25 Thread Mark Bennett
I can produce the timeout error on multiple, similar servers. These are storage servers, so no zones or GUI running. Hardware: Supermicro X7DWN with AOC-USASLP-L8i controller E1 (single port) backplanes (16 24 bay) (LSILOGICSASX28 A.0 and LSILOGICSASX36 A.1) up to 36 1 TB WD SATA disks This

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-12-11 Thread Calvin Morrow
I'm glad I was able to help someone. My card is also a 3081E-R (B3). It shipped to me with the IR firmware, and I immediately flashed the IT firmware on it because I had heard it was supposed to be (better, faster, stable, shiny) with Solaris and ZFS. The motherboard on that server has an LSI

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-12-11 Thread Calvin Morrow
Can't say when the problems may have been introduced, but it looks like we've got my report (b104) and another report from b111 of issues with the 1068E. The IR firmware seems to do some sort of internal multipathing while the IT firmware doesn't do any. With the IT firmware, I enabled

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-12-06 Thread Stan Seibert
I've spent all weekend fighting this problem on our storage server after installing a ZFS log device, and your suggestion fixed it! I also have an LSI 3081E-R adapter (B3 revision) connected to a SAS expander backplane with 7 drives on it. None of the /etc/system options mentioned in this

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-12-05 Thread Calvin Morrow
I found this thread after fighting the same problem in Nexenta which uses the OpenSolaris kernel from b104. Thankfully, I think I have (for the moment) solved my problem. Background: I have an LSI 3081e-R (1068E based) adapter which experiences the same disconnected command timeout error

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-12-05 Thread Chad Cantwell
I was under the impression that the problem affecting most of us was introduced much later than b104, sometime between ~114 and ~118. When I first started using my LSI 3081 cards, they had the IR firmware on them, and it caused me all kinds of problems. The disks showed up but I couldn't

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-12-01 Thread Travis Tabbal
Just an update, my scrub completed without any timeout errors in the log. XVM with MSI disabled globally.

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-12-01 Thread Travis Tabbal
Perhaps. As I noted though, it also occurs on the onboard NVidia SATA controller when MSI is enabled. I had already put a line in /etc/system to disable MSI for that controller per a forum thread and it worked great. I'm now running with all MSI disabled via XVM as the mpt controller is giving

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-30 Thread James C. McPherson
Tru Huynh wrote: On Sat, Nov 21, 2009 at 07:08:20PM +1000, James C. McPherson wrote: If you and everybody else who is seeing this problem could provide details about your configuration (output from cfgadm -lva, raidctl -l, prtconf -v, what your zpool configs are, and the firmware rev of each
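
The configuration details requested in this message can be gathered in one pass. The following is only a sketch: the function name `collect_mpt_diag` and the output layout are hypothetical, `zpool status` is assumed as the way to capture the pool configuration, and the firmware revisions still have to be recorded separately.

```shell
# Hypothetical helper: save the output of each requested command into its
# own file under the given directory. Commands that are not installed are
# noted in the file instead of aborting the run.
collect_mpt_diag() {
  outdir=$1
  mkdir -p "$outdir" || return 1
  for spec in "cfgadm -lva" "raidctl -l" "prtconf -v" "zpool status"; do
    name=${spec%% *}                      # first word is the command name
    if command -v "$name" >/dev/null 2>&1; then
      $spec >"$outdir/$name.txt" 2>&1     # unquoted on purpose: split into args
    else
      echo "skipped: $name not found" >"$outdir/$name.txt"
    fi
  done
  return 0
}

# Example (run as root on the affected host):
#   collect_mpt_diag /var/tmp/mpt-diag
```

Writing one file per command keeps the results easy to attach to a bug report individually.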

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-30 Thread Mark Johnson
James C. McPherson wrote: Adam Cheal wrote: I thought you had just set set xpv_psm:xen_support_msi = -1 which is different, because that sets the xen_support_msi variable which lives inside the xpv_psm module. Setting mptsas:* will have no effect on your system if you do not have an mptsas

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-30 Thread Adam Cheal
Can folks confirm/deny each of these? o The problems are not seen with Sun's version of this card On the Thumper x4540 (which uses 6 of the same LSI 1068E controller chips), we do not see this problem. Then again, it uses a one-to-one mapping of controller PHY ports to internal disks; no

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-30 Thread Travis Tabbal
o The problems are not seen with Sun's version of this card Unable to comment as I don't have a Sun card here. If Sun would like to send me one, I would be willing to test it compared to the cards I do have. I'm running Supermicro USAS-L8i cards (LSI 1068e based). o The problems are not

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-30 Thread Carson Gaspar
Mark Johnson wrote: I think there are two different bugs here... I think there is a problem with MSIs and some variant of mpt card on xVM. These seem to be showing up as timeout errors. Disabling MSIs for this adapter seems to fix this problem. For folks seeing this problem, what HBA adapter

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-30 Thread Jeremy Kitchen
On Nov 30, 2009, at 2:14 PM, Carson Gaspar wrote: Mark Johnson wrote: I think there are two different bugs here... I think there is a problem with MSIs and some variant of mpt card on xVM. These seem to be showing up as timeout errors. Disabling MSIs for this adapter seems to fix this

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-30 Thread Carson Gaspar
Carson Gaspar wrote: Mark Johnson wrote: I think there are two different bugs here... I think there is a problem with MSIs and some variant of mpt card on xVM. These seem to be showing up as timeout errors. Disabling MSIs for this adapter seems to fix this problem. For folks seeing this

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-30 Thread James C. McPherson
Hi all, I believe it's an accurate summary of the emails on this thread over the last 18 hours to say that (1) disabling MSI support in xVM makes the problem go away (2) disabling MSI support on bare metal when you only have disks internal to your host (no jbods), makes the problem go

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-30 Thread Chad Cantwell
Hi, I just posted a summary of a similar issue I'm having with non-Sun hardware. For the record, it's in a Chenbro RM41416 chassis with 4 Chenbro SAS backplanes but no expanders (each backplane is 4 disks connected by SFF-8087 cable). Each of my LSI brand SAS3081E PCI-E cards is connected to

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-30 Thread Travis Tabbal
(1) disabling MSI support in xVM makes the problem go away Yes here. (6) mpt(7d) without MSI support is sloow. That does seem to be the case. It's not so bad overall, and at least the performance is consistent. It would be nice if this were improved. For those of you who have been

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-30 Thread Rob Logan
Chenbro 16 hotswap bay case. It has 4 mini backplanes that each connect via an SFF-8087 cable StarTech HSB430SATBK hmm, both are passive backplanes with one SATA tunnel per link... no SAS Expanders (LSISASx36) like those found in SuperMicro or J4x00 with 4 links per connection. wonder

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-29 Thread James C. McPherson
Adam Cheal wrote: Thank you to all who've provided data about this. I've updated the bugs mentioned earlier and I believe we can now make progress on diagnosis. The new synopsis (should show up on b.o.o tomorrow) is as follows: 6894775 mpt's msi support is suboptimal with xVM FYI, as the

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-29 Thread Adam Cheal
Hi Adam, thanks for this info. I've talked with my colleagues in Beijing (since I'm in Beijing this week) and we'd like you to try disabling MSI/MSI-X for your mpt instances. In /etc/system, add set mpt:mpt_enable_msi = 0 then regen your boot archive and reboot. I had already done
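
The /etc/system change described in this message could be applied along the following lines. This is a minimal sketch, not a vetted procedure: the helper name `add_system_tunable` is hypothetical, and it assumes a POSIX grep (on Solaris that would be /usr/xpg4/bin/grep rather than /usr/bin/grep).

```shell
# Hypothetical helper: append a tunable line to a system file only if that
# exact line is not already present, keeping a backup copy first.
add_system_tunable() {
  file=$1
  line=$2
  [ -f "$file" ] || return 1
  cp "$file" "$file.bak" || return 1
  # -x: whole-line match, -F: fixed string, -e: pattern follows
  if ! grep -qxF -e "$line" "$file"; then
    printf '%s\n' "$line" >>"$file"
  fi
}

# Example (as root), followed by the boot archive rebuild and reboot
# that the message asks for:
#   add_system_tunable /etc/system 'set mpt:mpt_enable_msi = 0' &&
#     bootadm update-archive && init 6
```

Checking for the line first makes the helper safe to run more than once, and the backup copy gives a way back if the system misbehaves after the reboot.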

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-29 Thread James C. McPherson
Adam Cheal wrote: Hi Adam, thanks for this info. I've talked with my colleagues in Beijing (since I'm in Beijing this week) and we'd like you to try disabling MSI/MSI-X for your mpt instances. In /etc/system, add set mpt:mpt_enable_msi = 0 then regen your boot archive and reboot. I had

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-29 Thread James C. McPherson
Adam Cheal wrote: I thought you had just set set xpv_psm:xen_support_msi = -1 which is different, because that sets the xen_support_msi variable which lives inside the xpv_psm module. Setting mptsas:* will have no effect on your system if you do not have an mptsas card installed. The mptsas

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-24 Thread Travis Tabbal
On Nov 23, 2009, at 7:28 PM, Travis Tabbal wrote: I have a possible workaround. Mark Johnson mark.john...@sun.com has been emailing me today about this issue and he proposed the following: You can try adding the following to /etc/system, then rebooting... set

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-24 Thread Travis Tabbal
Travis Tabbal wrote: I have a possible workaround. Mark Johnson mark.john...@sun.com has been emailing me today about this issue and he proposed the following: You can try adding the following to /etc/system, then rebooting... set xpv_psm:xen_support_msi = -1 I am also running

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-24 Thread James C. McPherson
Thank you to all who've provided data about this. I've updated the bugs mentioned earlier and I believe we can now make progress on diagnosis. The new synopsis (should show up on b.o.o tomorrow) is as follows: 6894775 mpt's msi support is suboptimal with xVM James C. McPherson -- Senior

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-23 Thread Travis Tabbal
I will give you all of this information on monday. This is great news :) Indeed. I will also be posting this information when I get to the server tonight. Perhaps it will help. I don't think I want to try using that old driver though, it seems too risky for my taste. Is there a command

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-23 Thread James C. McPherson
Travis Tabbal wrote: I will give you all of this information on monday. This is great news :) Indeed. I will also be posting this information when I get to the server tonight. Perhaps it will help. I don't think I want to try using that old driver though, it seems too risky for my taste.

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-23 Thread Travis Tabbal
I have a possible workaround. Mark Johnson mark.john...@sun.com has been emailing me today about this issue and he proposed the following: You can try adding the following to /etc/system, then rebooting... set xpv_psm:xen_support_msi = -1 I have been able to format a ZVOL container from a
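
Since several different MSI settings are discussed in this thread, it can help to check which ones are actually active in /etc/system before rebooting. A small sketch follows; the function name `list_msi_tunables` is hypothetical, and it relies only on the fact that comment lines in /etc/system begin with `*`.

```shell
# Hypothetical helper: print the active "set ... msi ..." lines of a system
# file. Comment lines in /etc/system begin with '*' and are not matched.
list_msi_tunables() {
  grep -E '^[[:space:]]*set[[:space:]]+[^*]*msi' "$1"
}

# Example:
#   list_msi_tunables /etc/system
```

The function prints nothing and returns a non-zero status when no MSI setting is present.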

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-23 Thread Carson Gaspar
Travis Tabbal wrote: I have a possible workaround. Mark Johnson mark.john...@sun.com has been emailing me today about this issue and he proposed the following: You can try adding the following to /etc/system, then rebooting... set xpv_psm:xen_support_msi = -1 I am also running XVM, and after

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-23 Thread Jeremy Kitchen
On Nov 23, 2009, at 7:28 PM, Travis Tabbal wrote: I have a possible workaround. Mark Johnson mark.john...@sun.com has been emailing me today about this issue and he proposed the following: You can try adding the following to /etc/system, then rebooting... set xpv_psm:xen_support_msi = -1

[zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-21 Thread Carson Gaspar
For all of those suffering from mpt timeouts in snv_127, I decided to give the ancient itmpt driver a whirl. It works fine, and in my brief testing a zfs scrub that would generate about 1 timeout every 2 minutes or so now runs with no problems. The downside is that lsiutil and raidctl both

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-21 Thread James C. McPherson
Carson Gaspar wrote: For all of those suffering from mpt timeouts in snv_127, I decided to give the ancient itmpt driver a whirl. It works fine, and in my brief testing a zfs scrub that would generate about 1 timeout every 2 minutes or so now runs with no problems. The downside is that

Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-21 Thread Jeremy Kitchen
On Nov 21, 2009, at 1:08 AM, James C. McPherson wrote: We currently have two bugs open on what I believe to be the same issue, namely 6894775 mpt driver timeouts and bus resets under load 6900767 Server hang with LSI 1068E based SAS controller under load If you and everybody else who is seeing