Re: [zfs-discuss] Update - mpt errors on snv 101b

2009-12-08 Thread Rob Nelson
I can report I/O errors with Chenbro-based expanders built on the LSI
SASx36 IC, tested with builds 111b/121/128a/129.  The HBA was LSI 1068
based.  If I bypass the expander by adding more HBAs, mpt does not
report I/O errors.


-nola


On Dec 8, 2009, at 6:48 AM, Bruno Sousa wrote:


Hi James,

Thank you for your feedback; I will send the prtconf -v output to
your email.
I also have another system where I can test things if that would
help, and if you need extra information or even access to the system,
please let me know.

Thank you,
Bruno

James C. McPherson wrote:

Bruno Sousa wrote:

Hi all,

During this problem I did a power-off/power-on of the server, and the
bus reset/scsi timeout issue persisted. After that I decided to
power-off/power-on the jbod array, and after that everything returned
to normal.
No scsi timeouts, normal performance, everything is okay now.
With this, is it safe to assume that the problem may be caused by the
SAS expander (one single LSI SASX36 Expander Chip) used by the
supermicro jbod chassis, and not by the hba/mpt driver?


Hi Bruno,
that is indeed what I, personally, suspect is the case. Tracking
that down and conclusively proving so is, however, another thing
entirely.

Could you send the output from prtconf -v for your host please,
so that we can have a look at the vital information for the
enclosure services and SMP nodes that the SAS Expander presents?



Thank you,
James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp
http://www.jmcp.homeunix.com/blog



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




[zfs-discuss] How can we help fix MPT driver post build 129

2009-12-05 Thread Rob Nelson
How can we help with what is outlined below?  I can reproduce these at will, so 
if anyone at Sun would like an environment to test this situation, let me know.

What is the best info to grab for you folks to help here?

Thanks - nola



This is in regard to these threads:

http://www.opensolaris.org/jive/thread.jspa?messageID=421400#421400
http://www.opensolaris.org/jive/thread.jspa?threadID=118947&tstart=0
http://www.opensolaris.org/jive/thread.jspa?threadID=117702&tstart=1
http://www.opensolaris.org/jive/thread.jspa?messageID=437031&tstart=0

And bug IDs: 

6894775 mpt driver timeouts and bus resets under load
6900767 Server hang with LSI 1068E based SAS controller under load

Exec Summary:  Those using the LSI 1068 chipset with the LSI SAS2x IC expander 
have IO errors under load from about build 118 to 129 (last build I tested).

At build 111b, it worked.  If you take the same hardware and load test scripts 
and run under 111b you're OK; run under roughly 118 or later and you suffer 
from, for example:

Dec  5 08:17:04 gb2000-007 scsi: [ID 365881 kern.info] /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:17:04 gb2000-007  Log info 0x3000 received for target 79.
Dec  5 08:17:04 gb2000-007  scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec  5 08:17:07 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:17:07 gb2000-007  SAS Discovery Error on port 4. DiscoveryStatus is DiscoveryStatus is |Unaddressable device found|
Dec  5 08:18:09 gb2000-007 scsi: [ID 107833 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:09 gb2000-007  Disconnected command timeout for Target 79
Dec  5 08:18:14 gb2000-007 scsi: [ID 365881 kern.info] /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:14 gb2000-007  Log info 0x3113 received for target 79.
Dec  5 08:18:14 gb2000-007  scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Dec  5 08:18:17 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:17 gb2000-007  mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x3000
Dec  5 08:18:17 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:17 gb2000-007  mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x3000
Dec  5 08:18:19 gb2000-007 scsi: [ID 365881 kern.info] /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:19 gb2000-007  Log info 0x3000 received for target 79.
Dec  5 08:18:19 gb2000-007  scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec  5 08:18:22 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:22 gb2000-007  SAS Discovery Error on port 4. DiscoveryStatus is DiscoveryStatus is |Unaddressable device found|
Dec  5 08:19:24 gb2000-007 scsi: [ID 107833 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:24 gb2000-007  Disconnected command timeout for Target 79
Dec  5 08:19:29 gb2000-007 scsi: [ID 365881 kern.info] /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:29 gb2000-007  Log info 0x3113 received for target 79.
Dec  5 08:19:29 gb2000-007  scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Dec  5 08:19:32 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:32 gb2000-007  mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x3000
Dec  5 08:19:32 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:32 gb2000-007  mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x3000
Dec  5 08:19:34 gb2000-007 scsi: [ID 365881 kern.info] /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:34 gb2000-007  Log info 0x3000 received for target 79.
Dec  5 08:19:34 gb2000-007  scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec  5 08:19:37 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:37 gb2000-007  SAS Discovery Error on port 4. DiscoveryStatus is DiscoveryStatus is |Unaddressable device found|
Dec  5 08:20:39 gb2000-007 scsi: [ID 107833 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:20:39 gb2000-007  Disconnected command timeout for Target 79
Dec  5 08:20:39 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:20:39 gb2000-007  mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000
Dec  5 08:20:44 gb2000-007 scsi: [ID 365881 kern.info] /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:20:44 gb2000-007  Log info 0x3113 received for target 79.
Dec  5 08:20:44 gb2000-007  scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Dec  5 08:20:44 gb2000-007 scsi: 
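
For anyone who wants to boil a log like the one above down to something easier to attach to a bug report, here is a minimal Python sketch. It counts command timeouts per target and discovery errors per port, and splits each ioc_status word into a status code and the "log info available" flag. The status and state names are an assumption: they are how I recall them from LSI's Fusion-MPT (MPI) headers, so please check them against the headers that ship with your driver source. The function names (decode_ioc_status, decode_scsi_state, summarize) are made up for this example, and SAMPLE is just a few message lines copied from the excerpt above.

```python
import re
from collections import Counter

# ASSUMPTION: values and names as recalled from LSI's MPI headers.
# Verify against the headers for your driver before relying on them.
IOCSTATUS_FLAG_LOG_INFO_AVAILABLE = 0x8000
IOCSTATUS_MASK = 0x7FFF
IOCSTATUS_NAMES = {
    0x0048: "SCSI_TASK_TERMINATED",
    0x004B: "SCSI_IOC_TERMINATED",
}
SCSI_STATE_BITS = {
    0x04: "NO_SCSI_STATUS",
    0x08: "TERMINATED",
}

# A few message lines copied from the log excerpt in this post.
SAMPLE = """\
Dec  5 08:17:04 gb2000-007  Log info 0x3000 received for target 79.
Dec  5 08:17:04 gb2000-007  scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec  5 08:17:07 gb2000-007  SAS Discovery Error on port 4. DiscoveryStatus
Dec  5 08:18:09 gb2000-007  Disconnected command timeout for Target 79
Dec  5 08:18:14 gb2000-007  Log info 0x3113 received for target 79.
Dec  5 08:18:14 gb2000-007  scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
"""


def decode_ioc_status(value):
    """Return (status name, log-info-available flag) for an ioc_status word."""
    code = value & IOCSTATUS_MASK
    name = IOCSTATUS_NAMES.get(code, "UNKNOWN_0x%04x" % code)
    return name, bool(value & IOCSTATUS_FLAG_LOG_INFO_AVAILABLE)


def decode_scsi_state(value):
    """Return the names of the scsi_state bits we know about, lowest bit first."""
    return [name for bit, name in sorted(SCSI_STATE_BITS.items()) if value & bit]


def summarize(text):
    """Count timeouts, discovery errors and ioc_status codes in mpt log text."""
    timeouts = Counter(
        re.findall(r"Disconnected command timeout for Target (\d+)", text)
    )
    discovery = Counter(re.findall(r"SAS Discovery Error on port (\d+)", text))
    statuses = Counter(
        decode_ioc_status(int(raw, 16))[0]
        for raw in re.findall(r"ioc_status=(0x[0-9a-fA-F]+)", text)
    )
    return {
        "timeouts_by_target": timeouts,
        "discovery_errors_by_port": discovery,
        "ioc_status": statuses,
    }


if __name__ == "__main__":
    for key, counter in summarize(SAMPLE).items():
        print(key, dict(counter))
```

To run it against a real log, read the file into a string and pass it to summarize() instead of SAMPLE. The patterns search the whole text rather than going line by line, so they do not care whether the mail client wrapped the log lines.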

[zfs-discuss] Helpful Newbie ZFS Build Tip

2007-07-18 Thread Rob Nelson
OK - here's some info for those of you just starting out with zfs from the 
coding/building level.  I struggled for many days walking down this path: 
install a specific snv_xx release - build the code of that snv_xx release with 
nightly - install the kernel only with Install (capital I).

While this all worked and I could boot in most cases with my newly built 
kernel, zfs would not work correctly; the most common error was "internal 
error: out of memory".

IT WAS NOT UNTIL DOING A FULL BFU install that zfs worked correctly.

I was a little shocked by this, as my source and the original install were at 
the same build level.  From the docs it seemed as though if you're at the same 
build level you don't need the full BFU treatment. NOT SO.

So save yourself some time and always go for the BFU the first time you build 
fresh and plan on using zfs.  

I assume the issue is that the userland zfs tools need to match the kernel 
build (maybe a debug vs non-debug issue)? Veteran zfs folks - please 
correct me if I am missing some subtlety.

Cheers - nola
 
 
This message posted from opensolaris.org