Re: [zfs-discuss] How can we help fix MPT driver post build 129

2009-12-07 Thread Travis Tabbal
To be fair, I think it's obvious that Sun people are looking into it and that 
users are willing to help diagnose and test. There were requests for particular 
data in those threads you linked to. Have you sent yours? It might help them 
find a pattern in the errors. 
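
As a rough illustration of what looking for a pattern could mean here, the 
sketch below tallies mpt messages by kind and target from syslog-style lines 
like the ones quoted in the original post. It is only a sketch: the function 
name summarize, the three message categories and the regular expressions are 
my own assumptions based on that excerpt, not anything Sun asked for.

```python
import re
from collections import Counter

# Continuation lines carry the mpt message text, for example:
# "Dec  5 08:18:09 gb2000-007  Disconnected command timeout for Target 79"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+(.*)$")
TARGET_RE = re.compile(r"[Tt]arget (\d+)")


def summarize(lines):
    """Count selected mpt message kinds per target (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip())
        if not m:
            continue  # device-path lines and wrapped fragments are skipped
        msg = m.group(3)
        # The categories below are assumptions taken from the quoted excerpt.
        if "Disconnected command timeout" in msg:
            kind = "timeout"
        elif msg.startswith("Log info"):
            kind = "log_info"
        elif "SAS Discovery Error" in msg:
            kind = "discovery_error"
        else:
            continue
        t = TARGET_RE.search(msg)
        target = t.group(1) if t else "-"  # "-" when no target is named
        counts[(kind, target)] += 1
    return counts
```

Feeding it the system log, for example summarize(open("/var/adm/messages")), 
gives a per-target count that can be compared between builds or between the 
MSI and non-MSI configurations.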

I understand the frustration that it hasn't been fixed in the couple of builds 
since they became aware of it, but it could be a very tricky problem. It also 
sounds like it's not reproducible on Sun hardware, so they have to get cards 
and such as well. It's also less urgent now that they have identified a 
workaround that works for most of us. While disabling MSIs is not optimal, it 
does help a lot.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How can we help fix MPT driver post build 129

2009-12-05 Thread Rob Nelson
How can we help with what is outlined below?  I can reproduce these at will, so 
if anyone at Sun would like an environment to test this situation, let me know.

What is the best info to grab for you folks to help here?

Thanks - nola



This is in regard to these threads:

http://www.opensolaris.org/jive/thread.jspa?messageID=421400#421400
http://www.opensolaris.org/jive/thread.jspa?threadID=118947tstart=0
http://www.opensolaris.org/jive/thread.jspa?threadID=117702tstart=1
http://www.opensolaris.org/jive/thread.jspa?messageID=437031tstart=0

And bug IDs: 

6894775 mpt driver timeouts and bus resets under load
6900767 Server hang with LSI 1068E based SAS controller under load

Exec Summary:  Those using the LSI 1068 chipset with the LSI SAS2x IC expander 
have seen IO errors under load from about build 118 to 129 (the last build I 
tested).

At build 111b, it worked.  If you take the same hardware and load test scripts 
and run them under 111b, you're OK; run them under about build 118 or later and 
you get, for example:

Dec  5 08:17:04 gb2000-007 scsi: [ID 365881 kern.info] 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:17:04 gb2000-007  Log info 0x3000 received for target 79.
Dec  5 08:17:04 gb2000-007  scsi_status=0x0, ioc_status=0x804b, 
scsi_state=0xc
Dec  5 08:17:07 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:17:07 gb2000-007  SAS Discovery Error on port 4. DiscoveryStatus 
is DiscoveryStatus is |Unaddressable device found|
Dec  5 08:18:09 gb2000-007 scsi: [ID 107833 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:09 gb2000-007  Disconnected command timeout for Target 79
Dec  5 08:18:14 gb2000-007 scsi: [ID 365881 kern.info] 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:14 gb2000-007  Log info 0x3113 received for target 79.
Dec  5 08:18:14 gb2000-007  scsi_status=0x0, ioc_status=0x8048, 
scsi_state=0xc
Dec  5 08:18:17 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:17 gb2000-007  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3000
Dec  5 08:18:17 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:17 gb2000-007  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3000
Dec  5 08:18:19 gb2000-007 scsi: [ID 365881 kern.info] 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:19 gb2000-007  Log info 0x3000 received for target 79.
Dec  5 08:18:19 gb2000-007  scsi_status=0x0, ioc_status=0x804b, 
scsi_state=0xc
Dec  5 08:18:22 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:22 gb2000-007  SAS Discovery Error on port 4. DiscoveryStatus 
is DiscoveryStatus is |Unaddressable device found|
Dec  5 08:19:24 gb2000-007 scsi: [ID 107833 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:24 gb2000-007  Disconnected command timeout for Target 79
Dec  5 08:19:29 gb2000-007 scsi: [ID 365881 kern.info] 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:29 gb2000-007  Log info 0x3113 received for target 79.
Dec  5 08:19:29 gb2000-007  scsi_status=0x0, ioc_status=0x8048, 
scsi_state=0xc
Dec  5 08:19:32 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:32 gb2000-007  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3000
Dec  5 08:19:32 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:32 gb2000-007  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3000
Dec  5 08:19:34 gb2000-007 scsi: [ID 365881 kern.info] 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:34 gb2000-007  Log info 0x3000 received for target 79.
Dec  5 08:19:34 gb2000-007  scsi_status=0x0, ioc_status=0x804b, 
scsi_state=0xc
Dec  5 08:19:37 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:37 gb2000-007  SAS Discovery Error on port 4. DiscoveryStatus 
is DiscoveryStatus is |Unaddressable device found|
Dec  5 08:20:39 gb2000-007 scsi: [ID 107833 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:20:39 gb2000-007  Disconnected command timeout for Target 79
Dec  5 08:20:39 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:20:39 gb2000-007  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x31112000
Dec  5 08:20:44 gb2000-007 scsi: [ID 365881 kern.info] 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:20:44 gb2000-007  Log info 0x3113 received for target 79.
Dec  5 08:20:44 gb2000-007  scsi_status=0x0, ioc_status=0x8048, 
scsi_state=0xc
Dec  5 08:20:44 gb2000-007 scsi: