Re: [zfs-discuss] solaris 10u8 hangs with message Disconnected command timeout for Target 0

2011-08-17 Thread Richard Elling
On Aug 15, 2011, at 11:17 PM, Ding Honghui wrote:
> My Solaris storage server hangs. When I connect to the console, the messages
> in [1] are displayed there.
> I can't log in at the console, and I/O seems to be totally blocked.
> 
> The system is Solaris 10u8 on a Dell R710 with a Dell MD3000 disk array. Two
> HBA cables connect the server to the MD3000.
> The problem occurs randomly.

This symptom is consistent with a broken SATA disk behind a SAS expander.

Unfortunately, the mpt driver is closed source, so we can only infer what the 
code does by using the open source mpt_sas driver as (hopefully) a derivative.

> 
> Any help would be very much appreciated.
> 
> Regards,
> Ding
> 
> [1]
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /pci@0,0/pci8086,3410@9/pci8086,32c@0/pci1028,1f04@8 (mpt1):
> Aug 16 13:14:16 nas-hz-02   Disconnected command timeout for Target 0

A command did not complete and the mpt driver reset the target.
If that target is an expander, then everything behind the expander can
reset, resulting in the abort of any in-flight commands, as follows...

> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa1802a44b8f0ded (sd47):
> Aug 16 13:14:16 nas-hz-02   Error for Command: write(10)   Error Level: Retryable
> Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679073   Error Block: 1380679073
> Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL   Serial Number:
> Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
> Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa18029e4b8f0d61 (sd41):
> Aug 16 13:14:16 nas-hz-02   Error for Command: write(10)   Error Level: Retryable
> Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679072   Error Block: 1380679072
> Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL   Serial Number:
> Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
> Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa1802a24b8f0dc5 (sd45):
> Aug 16 13:14:16 nas-hz-02   Error for Command: write(10)   Error Level: Retryable
> Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679073   Error Block: 1380679073
> Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL   Serial Number:
> Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
> Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa18029c4b8f0d35 (sd39):
> Aug 16 13:14:16 nas-hz-02   Error for Command: write(10)   Error Level: Retryable
> Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679072   Error Block: 1380679072
> Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL   Serial Number:
> Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
> Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa1802984b8f0cd2 (sd35):
> Aug 16 13:14:16 nas-hz-02   Error for Command: write(10)   Error Level: Retryable
> Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679072   Error Block: 1380679072
> Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL   Serial Number:
> Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
> Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
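Richard's reading of the log (one timeout on mpt1 Target 0, then Unit Attention on every LUN at the same moment) can be checked against a longer messages file with a small scan. This is a minimal Python 3.8+ sketch, assuming each console message sits on a single line; real /var/adm/messages lines may also carry an "[ID nnnnnn kern.warning]" tag after "scsi:", which these regexes do not handle. `reset_fanout` and `SAMPLE` are hypothetical names for illustration, and an identical timestamp is only a rough proxy for "caused by the same reset".

```python
import re
from collections import defaultdict

# Assumption: unwrapped lines in the format quoted in this thread,
# with no "[ID nnnnnn kern.warning]" tag after "scsi:".
STAMP = r"(\w{3} +\d+ [\d:]{8}) \S+"  # group 1: "Aug 16 13:14:16", then host name
TIMEOUT_RE = re.compile(STAMP + r" +Disconnected command timeout for Target (\d+)$")
DEV_RE = re.compile(STAMP + r" scsi: WARNING: (\S+) \((\w+)\):$")  # group 3: sd47, mpt1, ...
SENSE_RE = re.compile(STAMP + r" scsi: Sense Key: (.+)$")


def reset_fanout(lines):
    """Hypothetical helper: for each 'Disconnected command timeout', list the
    sd instances that logged Unit Attention with the same timestamp."""
    timeouts = []                   # (timestamp, target) in log order
    ua_by_stamp = defaultdict(set)  # timestamp -> sd instances reporting Unit Attention
    current = None                  # instance named by the most recent WARNING line
    for raw in lines:
        line = raw.rstrip()
        if m := TIMEOUT_RE.match(line):
            timeouts.append((m.group(1), int(m.group(2))))
        elif m := DEV_RE.match(line):
            current = m.group(3)
        elif (m := SENSE_RE.match(line)) and m.group(2) == "Unit Attention":
            if current is not None and current.startswith("sd"):
                ua_by_stamp[m.group(1)].add(current)
    return {(ts, tgt): sorted(ua_by_stamp[ts]) for ts, tgt in timeouts}


# A trimmed excerpt of the log above, for illustration.
SAMPLE = [
    "Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /pci@0,0/pci8086,3410@9/pci8086,32c@0/pci1028,1f04@8 (mpt1):",
    "Aug 16 13:14:16 nas-hz-02   Disconnected command timeout for Target 0",
    "Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa1802a44b8f0ded (sd47):",
    "Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention",
    "Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa18029e4b8f0d61 (sd41):",
    "Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention",
]

if __name__ == "__main__":
    for (stamp, target), devs in reset_fanout(SAMPLE).items():
        print(f"{stamp}: timeout on target {target}, Unit Attention on {', '.join(devs)}")
```

Fed the complete, unwrapped log from this thread, the same scan would list all five instances (sd35, sd39, sd41, sd45, sd47) against Target 0.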

You will be happiest if you do not use SATA disks directly connected to SAS 
expanders.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] solaris 10u8 hangs with message Disconnected command timeout for Target 0

2011-08-15 Thread Andrew Gabriel

Ding Honghui wrote:

> Hi,
>
> My Solaris storage server hangs. When I connect to the console, the messages
> in [1] are displayed there.
>
> I can't log in at the console, and I/O seems to be totally blocked.
>
> The system is Solaris 10u8 on a Dell R710 with a Dell MD3000 disk array. Two
> HBA cables connect the server to the MD3000.
>
> The problem occurs randomly.
>
> Any help would be very much appreciated.


The SCSI target you are talking to is being reset. "Unit Attention"
means it has forgotten the operating parameters negotiated with the
system, and is a warning that the device might have been changed without
the system knowing; here it is telling you this happened because of a
"device internal reset". That sort of thing can happen if the firmware
in the SCSI target crashes and restarts, if the power supply blips, or
if the device was swapped. I don't know anything about a Dell MD3000,
but given that it happened on lots of disks at the same moment following
a timeout, it looks like the array power-cycled or the array firmware
(if any) rebooted. (I'm not sure whether a SCSI bus reset can do this.)
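The sense data in these messages can be decoded against the standard code table, which bears on the closing question: the SPC table gives ASC 0x29 separate qualifiers for a power-on (0x01) and a SCSI bus reset (0x02), whereas every LUN here reported 0x04, "device internal reset". A device may also report the generic 0x00, so the qualifier is a hint about where the reset came from rather than proof. This is a minimal Python sketch; `decode_reset_reason` is a hypothetical helper, and the table is a subset of the T10 ASC/ASCQ list, not an exhaustive copy.

```python
import re

# A subset of the ASC 0x29 qualifiers (ASCQ), as named in the T10 SPC
# ASC/ASCQ assignments table.
ASC_29_QUALIFIERS = {
    0x00: "POWER ON, RESET, OR BUS DEVICE RESET OCCURRED",
    0x01: "POWER ON OCCURRED",
    0x02: "SCSI BUS RESET OCCURRED",
    0x03: "BUS DEVICE RESET FUNCTION OCCURRED",
    0x04: "DEVICE INTERNAL RESET",
    0x07: "I_T NEXUS LOSS OCCURRED",
}

# Captures the hex ASC and ASCQ fields of a Solaris "ASC: ..., ASCQ: ..." line.
ASC_RE = re.compile(r"ASC: 0x([0-9a-fA-F]+).*?ASCQ: 0x([0-9a-fA-F]+)")


def decode_reset_reason(line):
    """Hypothetical helper: return the SPC name for an ASC 0x29 sense line,
    or None if the line carries no ASC/ASCQ pair."""
    m = ASC_RE.search(line)
    if m is None:
        return None
    asc, ascq = int(m.group(1), 16), int(m.group(2), 16)
    if asc != 0x29:
        return f"ASC 0x{asc:02X}, ASCQ 0x{ascq:02X} (outside this table)"
    return ASC_29_QUALIFIERS.get(ascq, f"ASC 0x29, unlisted ASCQ 0x{ascq:02X}")


if __name__ == "__main__":
    line = "Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0"
    print(decode_reset_reason(line))
```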



> [1]
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /pci@0,0/pci8086,3410@9/pci8086,32c@0/pci1028,1f04@8 (mpt1):
> Aug 16 13:14:16 nas-hz-02   Disconnected command timeout for Target 0
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa1802a44b8f0ded (sd47):
> Aug 16 13:14:16 nas-hz-02   Error for Command: write(10)   Error Level: Retryable
> Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679073   Error Block: 1380679073
> Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL   Serial Number:
> Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
> Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa18029e4b8f0d61 (sd41):
> Aug 16 13:14:16 nas-hz-02   Error for Command: write(10)   Error Level: Retryable
> Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679072   Error Block: 1380679072
> Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL   Serial Number:
> Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
> Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa1802a24b8f0dc5 (sd45):
> Aug 16 13:14:16 nas-hz-02   Error for Command: write(10)   Error Level: Retryable
> Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679073   Error Block: 1380679073
> Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL   Serial Number:
> Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
> Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa18029c4b8f0d35 (sd39):
> Aug 16 13:14:16 nas-hz-02   Error for Command: write(10)   Error Level: Retryable
> Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679072   Error Block: 1380679072
> Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL   Serial Number:
> Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
> Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa1802984b8f0cd2 (sd35):
> Aug 16 13:14:16 nas-hz-02   Error for Command: write(10)   Error Level: Retryable
> Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679072   Error Block: 1380679072
> Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL   Serial Number:
> Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
> Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0


--
Andrew Gabriel


[zfs-discuss] solaris 10u8 hangs with message Disconnected command timeout for Target 0

2011-08-15 Thread Ding Honghui
Hi,

My Solaris storage server hangs. When I connect to the console, the messages
in [1] are displayed there.
I can't log in at the console, and I/O seems to be totally blocked.

The system is Solaris 10u8 on a Dell R710 with a Dell MD3000 disk array. Two
HBA cables connect the server to the MD3000.
The problem occurs randomly.

Any help would be very much appreciated.

Regards,
Ding

[1]
Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /pci@0,0/pci8086,3410@9/pci8086,32c@0/pci1028,1f04@8 (mpt1):
Aug 16 13:14:16 nas-hz-02   Disconnected command timeout for Target 0
Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa1802a44b8f0ded (sd47):
Aug 16 13:14:16 nas-hz-02   Error for Command: write(10)   Error Level: Retryable
Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679073   Error Block: 1380679073
Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL   Serial Number:
Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa18029e4b8f0d61 (sd41):
Aug 16 13:14:16 nas-hz-02   Error for Command: write(10)   Error Level: Retryable
Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679072   Error Block: 1380679072
Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL   Serial Number:
Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa1802a24b8f0dc5 (sd45):
Aug 16 13:14:16 nas-hz-02   Error for Command: write(10)   Error Level: Retryable
Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679073   Error Block: 1380679073
Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL   Serial Number:
Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa18029c4b8f0d35 (sd39):
Aug 16 13:14:16 nas-hz-02   Error for Command: write(10)   Error Level: Retryable
Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679072   Error Block: 1380679072
Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL   Serial Number:
Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa1802984b8f0cd2 (sd35):
Aug 16 13:14:16 nas-hz-02   Error for Command: write(10)   Error Level: Retryable
Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679072   Error Block: 1380679072
Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL   Serial Number:
Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0