James C. McPherson wrote:

Patrick Petit wrote:

Darren Reed wrote:

Patrick Petit wrote:

Using a ZFS emulated volume, I wasn't expecting to see a system [1] hang caused by a SCSI error. What do you think? The error is not systematic. When it happens, the Solaris/Xen dom0 console keeps displaying the following message and the system hangs.

Aug  3 11:11:23 jesma58 scsi: WARNING: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci17c2,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd2):
Aug  3 11:11:23 jesma58         Error for Command: read(10)    Error Level: Retryable
Aug  3 11:11:23 jesma58 scsi:   Requested Block: 67679394      Error Block: 67679394
Aug  3 11:11:23 jesma58 scsi:   Vendor: SEAGATE                Serial Number: 3JA7XWQY
Aug  3 11:11:23 jesma58 scsi:   Sense Key: Unit_Attention
Aug  3 11:11:23 jesma58 scsi:   ASC: 0x29 (bus device reset message occurred), ASCQ: 0x3, FRU: 0x4

Have you looked into this further using FMA, starting with fmadm?

fmadm shows no error :-(
jesma58# fmadm faulty -a
  STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
jesma58#
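For what it's worth, an empty "fmadm faulty" doesn't necessarily mean nothing was recorded: retryable errors like these are usually logged as undiagnosed telemetry (ereports) before any fault is declared. Assuming a stock Solaris FMA setup, a sketch of where else to look:

```shell
# Faults FMA has actually diagnosed (empty in this case):
fmadm faulty -a

# Raw error telemetry (ereports), logged even when no fault
# has been diagnosed yet:
fmdump -e
fmdump -eV      # verbose: full payload of each ereport

# Per-device soft/hard/transport error counters:
iostat -En
```

If fmdump shows scsi/io ereports piling up against sd2 with no corresponding fault, that at least confirms the events are reaching FMA and simply haven't crossed a diagnosis threshold.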


I have had a similar issue with the solitary SATA disk which makes
up my zfs root pool - errors such as these send the system into a
hung state (fortunately not a hard hang) and require a break/F1-A
plus a forced crash dump to get out of.


As I understand it, ZFS will retry operations based on various settings,
such as those in 'sd', and I don't believe there are specific error-case
handlers in the ZFS code to deal with issues like this.
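For reference, the retry/timeout behaviour in question lives in the sd driver rather than in ZFS, and is tunable via /etc/system. A sketch of the sort of fragment involved; the names are the commonly cited sd tunables, the values are purely illustrative, and both should be verified against the tuning documentation for your particular release before use:

```
* /etc/system fragment (illustrative only -- verify for your release).
* Per-command timeout, in seconds (default is 60):
set sd:sd_io_time = 0x3c
* Number of times sd retries a failed command before giving up:
set sd:sd_retry_count = 3
```

The worst-case latency a consumer like ZFS sees for a single bad I/O is roughly the per-command timeout multiplied by the retry count, which is why a single flaky disk can stall things for minutes at the defaults.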

I am wondering to what extent it is the role of ZFS to fix SCSI controller errors. Shouldn't that be the role of the controller driver, or even the controller itself? I would expect that in such circumstances the lower layers would repair and/or isolate the faulty block, for instance by reassigning it. But, having written SCSI drivers in the past (in my defense, that was a long time ago), I don't recall drivers being that elaborate, so they left the upper layers to deal with the hot potato :-(


OTOH it would be nice to see ZFS invoking an error path immediately
on receipt of a failure like yours or mine. But I fear that this would
detract from the device agnosticism that we presently have.

Patrick, is your pool mirrored? I know that mine isn't, and as a result
I expect that I will suffer.

No it's not mirrored. It's a simple pool backed by a physical disk drive.



The other thing that I am concerned with in your scenario is that you
are dd-ing a disk image onto a zvol. I'm not sure that this is the
right way to go about it (although I don't know what *is* the right
way to do it).



Yes, I am wondering the same. Would it be preferable to dd onto the raw (rdsk) device?
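For what it's worth, a zvol exposes both a block node (/dev/zvol/dsk/...) and a character node (/dev/zvol/rdsk/...), so the image can be written through either; going through rdsk avoids buffering in the block layer. A sketch, where the pool and volume names and the image path are made up for illustration:

```shell
# Create a zvol sized to hold the guest image (names hypothetical):
zfs create -V 8g tank/domu-root

# Write the disk image through the raw (character) device node:
dd if=/path/to/domu.img of=/dev/zvol/rdsk/tank/domu-root bs=1024k
```

Whether writing through dsk vs. rdsk changes anything about the hang you're seeing is a separate question, but the raw path is at least the more conventional target for a whole-image dd.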

best regards,

James C. McPherson
--
Solaris Datapath Engineering
Storage Division
Sun Microsystems


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss