> Can folks confirm/deny each of these?
> 
> o The problems are not seen with Sun's version of
>  this card

On the Thumper x4540 (which uses 6 of the same LSI 1068E controller chips), we 
do not see this problem. Then again, it uses a one-to-one mapping of controller 
PHY ports to internal disks; no JBODs or expanders here. 1 controller per 8 
disks is a much more performance-oriented ratio, so I don't expect to see the
problem there.

We have not tried using a Sun re-branded LSI controller for the external JBODs. 
My understanding is that Sun uses a custom firmware derived from LSI's public
offerings.

> o The problems are not seen with LSI's version of
>  the driver

Incorrect. We have tried using the latest itmpt driver from LSI and see the 
same problem.

> o The problems are seen with the latest LSI
> firmware

Correct. We've tried Phase 15, 16 and 17. All exhibit the same problem.
 
> o Errors still occur if MSIs are disabled.  They seem to occur less
>  frequently.  Were timeouts being seen before MSIs were disabled?
>  If so, are timeouts being seen after MSIs were disabled?

Correct: disabling the MSIs did not fix the problem, although it did slow
the IO on the system down enough to delay the onset of the problem by a few
hours. Timeouts were being seen before disabling MSIs, and they are usually
coupled with bus resets, which I believe is standard behaviour for the sd
driver when an IO has been timed out for too long.

> folks seeing the command failures, what are you using for a jbod?
> Is there firmware on the jbod, and if so, is it up to date? Have you
> tried a different jbod?  Are the command failures tied to a subset of
> the disks or affect all of them? Have you tried a different length
> cable?

We use Dell DCS J23 JBODs (23-disk enclosure), 2 per LSI3801E, fully populated
with enterprise-grade WD SATA drives. We've tried both R105 and R106 firmware
(both latest production-grade firmware) on them with no differences. The
problem affects all disks in the JBOD(s), not specific ones. Usually one or two
disks start to time out, which snowballs into all of them when the bus resets.
We have 15 of these systems running, all with the same config using 2-foot
external cables. Changing cables doesn't help. We have not tried using a
different JBOD.
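
For a rough sense of how different the fan-out is between the two setups, the
figures above work out as below. This is only back-of-the-envelope arithmetic
from the numbers in this message; the function name is made up for
illustration, and it says nothing about where the actual bottleneck is.

```python
# Disks served per controller, using only the counts quoted in this message.
# disks_per_controller is a hypothetical helper, not part of any tool.

def disks_per_controller(total_disks, controllers):
    """Return the average number of disks behind each controller."""
    return total_disks / controllers

# x4540: 6 LSI 1068E chips, one-to-one PHY-to-disk mapping, 8 disks each.
x4540_ratio = disks_per_controller(total_disks=6 * 8, controllers=6)

# Our setup: 2 fully populated 23-disk J23 JBODs behind one LSI3801E.
jbod_ratio = disks_per_controller(total_disks=2 * 23, controllers=1)

print(f"x4540: {x4540_ratio:g} disks per controller")
print(f"JBOD:  {jbod_ratio:g} disks per controller")
print(f"JBOD fan-out is {jbod_ratio / x4540_ratio:g}x the x4540's")
```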

- Adam
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss