On Wed, Jun 23, 2010 at 10:14 AM, Jeff Bacon <ba...@walleyesoftware.com> wrote:
>> > Have I missed any changes/updates in the situation?
>>
>> I've been getting very bad performance out of an LSI 9211-4i card
>> (mpt_sas) with Seagate Constellation 2TB SAS disks, an SM SC846E1 and
>> Intel X-25E/M SSDs. Long story short, I/O will hang for over 1 minute
>> at random under heavy load.
>
> Hm. That I haven't seen. Is this hang as in some drive hangs up with
> iostat busy% at 100 and nothing else happening (can't talk to a disk) or
> a hang as perceived by applications under load?
>
> What's your read/write mix, and what are you using for CPU/mem? How many
> drives?

I'm using iozone to get some performance numbers, and the I/O hangs
occur during its write phase.

This pool has:

18 x 2TB SAS disks as 9 data mirrors
2 x 32GB X-25E as log mirror
1 x 160GB X-25M as cache
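
For anyone wanting to recreate a similar layout, it would look roughly
like this (the device names below are placeholders, not the actual
ones on this system):

```shell
# Hypothetical device names -- substitute your own from "format" output.
zpool create tank \
  mirror c0t0d0  c0t1d0  mirror c0t2d0  c0t3d0  mirror c0t4d0  c0t5d0 \
  mirror c0t6d0  c0t7d0  mirror c0t8d0  c0t9d0  mirror c0t10d0 c0t11d0 \
  mirror c0t12d0 c0t13d0 mirror c0t14d0 c0t15d0 mirror c0t16d0 c0t17d0 \
  log mirror c1t0d0 c1t1d0 \
  cache c1t2d0
```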

When it's stuck, iostat shows 2 I/O operations active and the SSDs at 100% busy.
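
In case anyone wants to reproduce the observation, I'm simply watching
it with something like:

```shell
# Extended per-device stats at 1-second intervals. During a hang,
# actv sits at a small constant and %b is pinned at 100 on the log SSDs.
iostat -xn 1
```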

There are timeout messages when this happens:

Jun 23 00:05:51 osol-x8-hba scsi: [ID 107833 kern.warning] WARNING:
/p...@0,0/pci8086,3...@3/pci1000,3...@0 (mpt_sas0):
Jun 23 00:05:51 osol-x8-hba     Disconnected command timeout for Target 11
Jun 23 00:05:51 osol-x8-hba scsi: [ID 365881 kern.info]
/p...@0,0/pci8086,3...@3/pci1000,3...@0 (mpt_sas0):
Jun 23 00:05:51 osol-x8-hba     Log info 0x31140000 received for target 11.
Jun 23 00:05:51 osol-x8-hba     scsi_status=0x0, ioc_status=0x8048,
scsi_state=0xc
Jun 23 00:05:51 osol-x8-hba scsi: [ID 365881 kern.info]
/p...@0,0/pci8086,3...@3/pci1000,3...@0 (mpt_sas0):
Jun 23 00:05:51 osol-x8-hba     Log info 0x31140000 received for target 11.
Jun 23 00:05:51 osol-x8-hba     scsi_status=0x0, ioc_status=0x8048,
scsi_state=0xc
Jun 23 00:11:51 osol-x8-hba scsi: [ID 107833 kern.warning] WARNING:
/p...@0,0/pci8086,3...@3/pci1000,3...@0 (mpt_sas0):
Jun 23 00:11:51 osol-x8-hba     Disconnected command timeout for Target 11
Jun 23 00:11:51 osol-x8-hba scsi: [ID 365881 kern.info]
/p...@0,0/pci8086,3...@3/pci1000,3...@0 (mpt_sas0):
Jun 23 00:11:51 osol-x8-hba     Log info 0x31140000 received for target 11.
Jun 23 00:11:51 osol-x8-hba     scsi_status=0x0, ioc_status=0x8048,
scsi_state=0xc
Jun 23 00:11:51 osol-x8-hba scsi: [ID 365881 kern.info]
/p...@0,0/pci8086,3...@3/pci1000,3...@0 (mpt_sas0):
Jun 23 00:11:51 osol-x8-hba     Log info 0x31140000 received for target 11.
Jun 23 00:11:51 osol-x8-hba     scsi_status=0x0, ioc_status=0x8048,
scsi_state=0xc



> I wonder if maybe your SSDs are flooding the channel. I have a (many)
> 847E2 chassis, and I'm considering putting in a second pair of
> controllers and splitting the drives front/back so it's 24/12 vs all 36
> on one pair.

My plan is to use the newer SC846E26 chassis with two cables, but
right now what I have available for testing is the SC846E1.

I like the fact that SM uses the LSI chipsets in their backplanes.
It's been a good experience so far.


>> Swapping the 9211-4i for a MegaRAID 8888ELP (mega_sas) improves
>> performance by 30-40% instantly and there are no hangs anymore so I'm
>> guessing it's something related to the mpt_sas driver.
>
> Well, I sorta hate to swap out all of my controllers (bother, not to
> mention the cost) but it'd be nice to have raidutil/lsiutil back.

As much as I would like to blame faulty hardware for this issue, I
only pointed out that the MegaRAID doesn't show the problem because
that's the controller I've been using in this particular setup without
any issues.

This system will be available to me for quite some time, so if anyone
wants me to run tests to help understand what's happening, I'd be
happy to do so.

-- 
Giovanni Tirloni
gtirl...@sysdroid.com
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss