James: We are running Phase 16 on our LSISAS3801Es, and have also tried the 
recently released Phase 17, but it didn't help. All firmware NVRAM settings are 
default. Basically, when we put the disks behind this controller under load 
(e.g. scrubbing, or a recursive ls on a large ZFS filesystem), we get this 
series of log entries at random intervals:

scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@34,0 (sd49):
       incomplete read- retrying
scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
       mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110b00
scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
       mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31110b00
scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
       mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000
scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
       mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31112000
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
       Log info 0x31110b00 received for target 40.
       scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
       Log info 0x31110b00 received for target 40.
       scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
       Log info 0x31110b00 received for target 40.
       scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
       Log info 0x31110b00 received for target 40.
       scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@2d,0 (sd42):
       incomplete read- retrying
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
       Rev. 8 LSI, Inc. 1068E found.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
       mpt0 supports power management.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
       mpt0: IOC Operational.

Is the controller timing out accessing a disk, retrying, giving up, and then 
doing a bus reset?
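
For reference, the hex status words in the log can be split into their fields 
before looking them up. The following is only a minimal sketch: the bit layout 
(a 0x8000 "log info available" flag in IOCStatus, and a bus type / originator / 
code / subcode split for a SAS IOCLogInfo word) is an assumption based on LSI's 
MPI header files (mpi.h, mpi_log_sas.h) and should be verified against them. 
The function names are hypothetical.

```python
# Assumed layout, taken from LSI's MPI headers (mpi.h, mpi_log_sas.h).
# Verify against the headers before relying on it.
IOCSTATUS_FLAG_LOG_INFO_AVAILABLE = 0x8000
IOCSTATUS_MASK = 0x7FFF


def split_ioc_status(value):
    """Return (log_info_available, status_code) for a 16-bit IOCStatus."""
    flag = bool(value & IOCSTATUS_FLAG_LOG_INFO_AVAILABLE)
    return flag, value & IOCSTATUS_MASK


def split_sas_log_info(value):
    """Split a 32-bit IOCLogInfo word into its assumed SAS fields."""
    return {
        "bus_type": (value >> 28) & 0xF,    # top nibble
        "originator": (value >> 24) & 0xF,  # next nibble
        "code": (value >> 16) & 0xFF,       # next byte
        "subcode": value & 0xFFFF,          # low 16 bits
    }


if __name__ == "__main__":
    # Values copied from the log entries above.
    for status in (0x8000, 0x804B):
        flag, code = split_ioc_status(status)
        print(f"IOCStatus 0x{status:04x}: log_info={flag} code=0x{code:04x}")
    for info in (0x31110B00, 0x31112000):
        f = split_sas_log_info(info)
        print(
            f"IOCLogInfo 0x{info:08x}: bus_type=0x{f['bus_type']:x} "
            f"originator=0x{f['originator']:x} code=0x{f['code']:02x} "
            f"subcode=0x{f['subcode']:04x}"
        )
```

Under that layout, ioc_status=0x804b is the log-info flag plus status code 
0x004b, which mpi.h appears to name MPI_IOCSTATUS_SCSI_IOC_TERMINATED. That 
would suggest the IOC itself terminated the I/O, but the header is the 
authority here, not this sketch.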

This is happening with random disks behind the controller and on multiple 
systems with the same hardware config. We are running snv_118 right now and 
were hoping this was some magic mpt-related "bug" that was going to be fixed in 
snv_125, but it doesn't look like it. The LSI3801E is driving 2 x 23-disk JBODs, 
which, although a dense configuration, it should be able to handle. We are also 
using wide raidz2 vdevs (22 disks each, one per JBOD), which admittedly is 
slower performance-wise, but the goal here is density, not performance. I would 
have hoped that the system would just "slow down" if there was I/O contention, 
rather than experience things like bus resets.
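
One way to check whether the affected disks really are random is to tally the 
warnings per sd instance and per mpt target. This is a rough sketch, assuming 
log lines shaped like the excerpt above; the regular expressions and the 
tally() helper are hypothetical, and the sample lines are copied from this post 
rather than read from a real messages file.

```python
import re
from collections import Counter

# Patterns assumed from the log excerpt in this post.
SD_RE = re.compile(r"\((sd\d+)\):")
TARGET_RE = re.compile(r"received for target (\d+)")


def tally(lines):
    """Count log lines naming each sd instance and each mpt target."""
    disks, targets = Counter(), Counter()
    for line in lines:
        m = SD_RE.search(line)
        if m:
            disks[m.group(1)] += 1
        m = TARGET_RE.search(line)
        if m:
            targets[int(m.group(1))] += 1
    return disks, targets


SAMPLE = [
    "scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@34,0 (sd49):",
    "       incomplete read- retrying",
    "scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):",
    "       Log info 0x31110b00 received for target 40.",
    "       Log info 0x31110b00 received for target 40.",
    "scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@2d,0 (sd42):",
    "       incomplete read- retrying",
]

if __name__ == "__main__":
    disks, targets = tally(SAMPLE)
    print("per disk:", dict(disks))
    print("per target:", dict(targets))
```

If the counts cluster on one JBOD or a few targets, that would point at a 
cable, expander, or enclosure rather than the HBA; an even spread across both 
JBODs would be more consistent with a controller or driver problem.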

Your thoughts?
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss