...
I have identified the culprit as the Western Digital drive WD2002FYPS-01U1B0.
It's not clear if they can fix it in firmware, but Western Digital is
replacing my drives.
Feb 17 04:45:10 thecratewall scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Feb 17 04:45:10 thecratewall scsi:
Hi, do you have disks connected in sata1/2? With
WD2003FYYS-01T8B0/WD20EADS-00S2B0/WD1001FALS-00J7B1/WD1002FBYS-01A6B0
these timeouts are to be expected if the disk is in SATA2 mode,
No, why are they to be expected with SATA2 mode? Is the defect
specific to the SATA2 circuitry? I guess it could be a temporary
workaround provided they would eventually fix the problem in
firmware, but I'm getting new drives, so I guess I can't complain :-)
Probably your new disks do
At 11:19 AM +1000 2/19/10, James C. McPherson wrote:
On 19/02/10 12:51 AM, Maurice Volaski wrote:
For those who've been suffering this problem and who have non-Sun
jbods, could you please let me know what model of jbod and cables
(including length thereof) you have in your configuration.
For those of you who have been running xVM without MSI support,
could you please confirm whether the
We are seeing the problem on both Sun and non-Sun hardware. On our Sun thumper
x4540, we can
Hi Simon
I.e. you'll have to manually intervene
if a consumer drive causes the system to hang, and
replace it, whereas the RAID edition drives will
probably report the error quickly and then ZFS will
rewrite the data elsewhere, and thus maybe not kick
the drive.
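For reference, the manual intervention described there is essentially the standard ZFS replace procedure; a rough sketch, where "tank" and the cXtYd0 device names are placeholders, not anyone's actual configuration:
# find the drive ZFS has faulted (or that is hanging the pool)
zpool status -x
# physically swap the disk, then resilver onto the replacement
zpool replace tank c0t4d0 c0t6d0
# watch resilver progress
zpool status tank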
IMHO the relevant aspects
Looks like I got the textbook response from Western Digital:
---
Western Digital technical support only provides jumper configuration and
physical installation support for hard drives used in systems running the
Linux/Unix operating systems. For setup questions beyond physical installation
of
Hi Simon,
they are the new revision.
I got the impression as well that the complaints you reported were mainly
related to embedded Linux systems probably running LVM / mda. (thecus, Qnap,
) Other reports I had seen related to typical HW raids. I don't think the
situation is comparable to
Hi Tonmaus,
they are the new revision.
OK.
My timeout issue is definitely the WD10EARS disks.
WD has chosen to cripple their consumer grade disks
when used in quantities greater than one.
I'll now need to evaluate alternative suppliers of low
cost disks for low end high volume storage.
Mark.
Typo: ST32000542AS, not NS.
This was
Hi Simon,
I am running 5 WD20EADS in a raidz-1+spare on ahci controller without any
problems I could relate to TLER or head parking.
Cheers,
Tonmaus
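For anyone unfamiliar with that layout, a raidz1 pool with a hot spare is created along these lines; a sketch only, reading "5 disks in raidz-1 + spare" as four data disks plus one spare, with placeholder device names rather than Tonmaus's actual ones:
# four-disk raidz1 vdev plus one hot spare (device names are placeholders)
zpool create tank raidz1 c1t0d0 c1t1d0 c1t2d0 c1t3d0 spare c1t4d0
zpool status tank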
Hi Tonmaus,
That's good to hear. Which revision are they: 00R6B0 or 00P8B0? It's marked on
the drive top.
From what I've seen elsewhere, people seem to be complaining about the newer
00P8B0 revision, so I'd be interested to hear from you. These revision numbers
are listed in the first post of
Interesting. I wonder if this is the issue too with the 01U1B0 2.0TB drives?
I have 24 WD2002FYPS-01U1B0 drives under OpenSolaris with an LSI 1068E
controller that have weird timeout issues and I
If I'm not mistaken then the WD2002FYPS is an enterprise model: WD RE4-GP (RAID
Edition, Green Power), so you almost certainly have the firmware that allows
(1) the idle time before spindown to be modified with WDIDLE3.EXE and (2) the
error reporting time to be modified with WDTLER.EXE.
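As an aside, on drives that support SCT Error Recovery Control the same error-reporting limit can usually be read or set from the host with smartmontools instead of the DOS utilities; a sketch only, assuming a recent smartctl and a drive that actually honours SCT ERC (the device path, and any -d option your controller needs, will differ):
# show the current SCT ERC (TLER-style) read/write recovery limits
smartctl -l scterc /dev/rdsk/c0t2d0
# cap error recovery at 7.0 seconds for reads and writes (values are in 0.1 s units)
smartctl -l scterc,70,70 /dev/rdsk/c0t2d0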
So I
The results are in:
My timeout issue is definitely the WD10EARS disks.
Although differences in the error rate were seen with different LSI firmware
revisions, the errors persisted. The more disks on the expander, the higher the
number of iostat errors.
This then causes zpool issues (disk
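The per-device error counts referred to here are the ones Solaris exposes through iostat; a quick way to watch them (nothing box-specific assumed):
# cumulative soft/hard/transport error counts per device
iostat -En | egrep "Errors|Vendor"
# or watch error columns alongside I/O activity, refreshing every 10 seconds
iostat -xne 10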
I would definitely be interested to see if the newer firmware fixes the problem
for you. I have a very similar setup to yours, and finally forcing the
firmware flash to 1.26.00 of my on-board LSI 1068E on a SuperMicro H8DI3+
running snv_131 seemed to address the issue. I'm still waiting to
I can produce the timeout error on multiple, similar servers.
These are storage servers, so no zones or gui running.
Hardware:
Supermicro X7DWN with AOC-USASLP-L8i controller
E1 (single port) backplanes (16 and 24 bay)
(LSILOGICSASX28 A.0 and LSILOGICSASX36 A.1)
up to 36 1TB WD SATA disks
This
I'm glad I was able to help someone.
My card is also a 3081E-R (B3). It shipped to me with the IR firmware, and I
immediately flashed the IT firmware on it because I had heard it was supposed
to be (better, faster, stable, shiny) with Solaris and ZFS.
The motherboard on that server has an LSI
Can't say when the problems may have been introduced, but it looks like we've
got my report (b104) and another report from b111 of issues with the 1068E.
The IR firmware seems to do some sort of internal multipathing while the IT
firmware doesn't do any. With the IT firmware, I enabled
I've spent all weekend fighting this problem on our storage server after
installing a ZFS log device, and your suggestion fixed it!
I also have a LSI 3081E-R adapter (B3 revision) connected to a SAS expander
backplane with 7 drives on it. None of the /etc/system options mentioned in
this
I found this thread after fighting the same problem in Nexenta which uses the
OpenSolaris kernel from b104. Thankfully, I think I have (for the moment)
solved my problem.
Background:
I have an LSI 3081e-R (1068E based) adapter which experiences the same
disconnected command timeout error
I was under the impression that the problem affecting most of us was introduced
much later than b104,
sometime between ~114 and ~118. When I first started using my LSI 3081 cards,
they had the IR firmware
on them, and it caused me all kinds of problems. The disks showed up but I
couldn't
Just an update, my scrub completed without any timeout errors in the log. XVM
with MSI disabled globally.
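For anyone repeating that test, the check is just a scrub plus a look at the messages file; a sketch, with "tank" standing in for the real pool name:
# run a scrub to completion, then check pool health
zpool scrub tank
zpool status -v tank
# look for mpt timeout/reset chatter in the system log
grep -i mpt /var/adm/messages | tail -20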
Perhaps. As I noted though, it also occurs on the onboard NVidia SATA
controller when MSI is enabled. I had already put a line in /etc/system to
disable MSI for that controller per a forum thread and it worked great. I'm now
running with all MSI disabled via XVM as the mpt controller is giving
Tru Huynh wrote:
On Sat, Nov 21, 2009 at 07:08:20PM +1000, James C. McPherson wrote:
If you and everybody else who is seeing this problem could provide details
about your configuration (output from cfgadm -lva, raidctl -l, prtconf -v,
what your zpool configs are, and the firmware rev of each
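A one-shot way to collect everything being asked for there (a sketch; the output path is arbitrary):
# gather the requested configuration details into one file to post to the list
( cfgadm -lva; raidctl -l; prtconf -v; zpool status -v ) > /var/tmp/mpt-config.txt 2>&1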
James C. McPherson wrote:
Adam Cheal wrote:
I thought you had just set
set xpv_psm:xen_support_msi = -1
which is different, because that sets the xen_support_msi variable which lives
inside the xpv_psm module. Setting mptsas:* will have no effect on your system
if you do not have an mptsas card installed.
Can folks confirm/deny each of these?
o The problems are not seen with Sun's version of this card
On the Thumper x4540 (which uses 6 of the same LSI 1068E controller chips), we
do not see this problem. Then again, it uses a one-to-one mapping of controller
PHY ports to internal disks; no
Unable to comment as I don't have a Sun card here. If Sun would like to send me
one, I would be willing to test it compared to the cards I do have. I'm running
Supermicro USAS-L8i cards (LSI 1068e based).
o The problems are not
Mark Johnson wrote:
I think there are two different bugs here...
I think there is a problem with MSIs and some variant of mpt
card on xVM. These seem to be showing up as timeout errors.
Disabling MSIs for this adapter seems to fix this problem.
For folks seeing this problem, what HBA adapter
Hi all,
I believe it's an accurate summary of the emails on this thread
over the last 18 hours to say that
(1) disabling MSI support in xVM makes the problem go away
(2) disabling MSI support on bare metal when you only have
disks internal to your host (no jbods), makes the problem
go
Hi,
I just posted a summary of a similar issue I'm having with non-Sun hardware.
For the record, it's in a Chenbro RM41416 chassis with 4 Chenbro SAS backplanes
but no expanders (each backplane is 4 disks connected by SFF-8087 cable). Each
of my LSI brand SAS3081E PCI-E cards is connected to
(1) disabling MSI support in xVM makes the problem go away
Yes here.
(6) mpt(7d) without MSI support is sloow.
That does seem to be the case. It's not so bad overall, and at least the
performance is consistent. It would be nice if this were improved.
For those of you who have been
Chenbro 16 hotswap bay case. It has 4 mini backplanes that each connect via
an SFF-8087 cable
StarTech HSB430SATBK
hmm, both are passive backplanes with one SATA tunnel per link...
no SAS Expanders (LSISASx36) like those found in SuperMicro or J4x00 with 4
links per connection.
wonder
Hi Adam,
thanks for this info. I've talked with my colleagues in Beijing (since I'm in
Beijing this week) and we'd like you to try disabling MSI/MSI-X for your mpt
instances. In /etc/system, add
set mpt:mpt_enable_msi = 0
then regen your boot archive and reboot.
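Spelled out, that procedure is just the following; the /etc/system line is exactly as given above, the rest is the standard boot-archive regeneration and reboot:
# disable MSI/MSI-X for the mpt driver
echo "set mpt:mpt_enable_msi = 0" >> /etc/system
# regenerate the boot archive, then reboot
bootadm update-archive
init 6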
I had already done
Thank you to all who've provided data about this. I've updated
the bugs mentioned earlier and I believe we can now make progress
on diagnosis.
The new synopsis (should show up on b.o.o tomorrow) is as follows:
6894775 mpt's msi support is suboptimal with xVM
James C. McPherson
--
Senior
I will give you all of this information on monday.
This is great news :)
Indeed. I will also be posting this information when I get to the server
tonight. Perhaps it will help. I don't think I want to try using that old
driver though, it seems too risky for my taste.
Is there a command
I have a possible workaround. Mark Johnson mark.john...@sun.com has been
emailing me today about this issue and he proposed the following:
You can try adding the following to /etc/system, then rebooting...
set xpv_psm:xen_support_msi = -1
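One way to confirm after the reboot that the workaround took effect is to look at how interrupts are being delivered; a sketch, assuming the mdb ::interrupts dcmd is available on your build (the column layout varies):
# mpt should now show fixed interrupts rather than MSI in the Type column
echo "::interrupts" | mdb -k | grep -i mpt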
I have been able to format a ZVOL container from a
I am also running XVM, and after
For all of those suffering from mpt timeouts in snv_127, I decided to
give the ancient itmpt driver a whirl. It works fine, and in my brief
testing a zfs scrub that would generate about 1 timeout every 2 minutes
or so now runs with no problems.
The downside is that lsiutil and raidctl both
On Nov 21, 2009, at 1:08 AM, James C. McPherson wrote:
We currently have two bugs open on what I believe to be the same
issue, namely
6894775 mpt driver timeouts and bus resets under load
6900767 Server hang with LSI 1068E based SAS controller under load
If you and everybody else who is seeing