James C. McPherson wrote:
Patrick Petit wrote:
Darren Reed wrote:
Patrick Petit wrote:
Using a ZFS emulated volume, I wasn't expecting to see a system
hang caused by a SCSI error. What do you think? The error is not
systematic. When it happens, the Solaris/Xen dom0 console keeps
displaying the following message and the system hangs.
Aug 3 11:11:23 jesma58 scsi: WARNING: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci17c2,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd2):
Aug 3 11:11:23 jesma58    Error for Command: read(10)    Error Level: Retryable
Aug 3 11:11:23 jesma58 scsi:    Requested Block: 67679394    Error Block: 67679394
Aug 3 11:11:23 jesma58 scsi:    Vendor: SEAGATE    Serial Number: 3JA7XWQY
Aug 3 11:11:23 jesma58 scsi:    Sense Key: Unit_Attention
Aug 3 11:11:23 jesma58 scsi:    ASC: 0x29 (bus device reset message occurred), ASCQ: 0x3, FRU: 0x4
Have you looked into this further using FMA, starting with fmadm?
fmadm shows no error :-(
jesma58# fmadm faulty -a
   STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
jesma58#
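Even when "fmadm faulty" comes back empty, the raw telemetry may still hold the error reports, since fmadm only shows diagnosed faults. Something along these lines would show whether the SCSI errors reached FMA at all (these are standard FMA commands; whether they show anything in this particular case is, of course, unknown):

```shell
# List the raw error telemetry (ereports) FMA has received;
# retryable disk errors often appear here even when no fault
# was ever diagnosed.
fmdump -e

# Same, but dump the full payload of each ereport (device path,
# sense data, etc.) for closer inspection.
fmdump -eV

# Show diagnosis-engine activity, if a case was ever opened.
fmdump -v
```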
I have had a similar issue with the solitary SATA disk which makes
up my zfs root pool - errors such as these send the system into a
hang state (fortunately not a hard hang) and require a break/F1-A
plus a forced crash dump to get out of.
As I understand it, ZFS will retry operations based on various settings
such as those in 'sd' and I don't believe there are specific error case
handlers in the ZFS code to deal with issues like this.
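The sd retry behavior mentioned above is tunable. As a sketch only - the values below are illustrative, not recommendations - the usual knobs live in /etc/system:

```shell
# /etc/system fragment -- illustrative values, not recommendations.
# sd_io_time: seconds before an outstanding sd command times out
# (the default is 60).
# sd_retry_count: how many times sd retries a failed command before
# giving up and passing the error to the layers above (such as ZFS).
set sd:sd_io_time = 10
set sd:sd_retry_count = 3
```

Lowering these shortens the window during which the whole I/O path appears wedged while sd retries, at the cost of giving a flaky device less chance to recover on its own.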
I am wondering to what extent it is ZFS's role to fix SCSI
controller errors. Shouldn't that be the role of the controller
driver, or even the controller itself? In such circumstances I would
expect the lower layers to repair and/or isolate the faulty block,
for instance by reassigning it. But, having written SCSI drivers in
the past (in my defense, that was a long time ago), I don't recall
drivers being that elaborate, so they left the upper layers to deal
with the hot potato :-(
OTOH it would be nice to see ZFS invoking an error path immediately
on receipt of a failure like yours or mine. But I fear that this would
detract from the device agnosticism that we presently have.
Patrick, is your pool mirrored? I know that mine isn't, so I have to
expect that I will suffer for it.
No it's not mirrored. It's a simple pool backed by a physical disk drive.
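For what it's worth, a single-disk pool can be converted into a mirror in place, which would at least let ZFS self-heal read errors from the other side. A sketch - the pool and device names here are hypothetical placeholders:

```shell
# Attach a second disk to the existing single-disk pool, turning
# it into a two-way mirror (pool and device names hypothetical).
zpool attach mypool c1t0d0 c1t1d0

# Watch the resilver progress, then confirm both sides are ONLINE.
zpool status mypool
```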
The other thing that I am concerned with in your scenario is that you
are dd-ing a disk image onto a zvol. I'm not sure that this is the
right way to go about it (although I don't know what *is* the right
way to do it).
Yes, I am wondering the same. Would it be preferable to dd onto the
raw (rdsk) device?
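For reference, zvols are exposed under both /dev/zvol/dsk (block) and /dev/zvol/rdsk (raw/character) paths, so writing the image through the raw node is straightforward. A sketch, with the pool, volume, and image names all hypothetical:

```shell
# Create a zvol sized to hold the image (names hypothetical).
zfs create -V 10G tank/domu-root

# Write the disk image through the raw (character) device node;
# a large block size avoids many small writes.
dd if=/path/to/domu.img of=/dev/zvol/rdsk/tank/domu-root bs=1M
```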
best regards,
James C. McPherson
--
Solaris Datapath Engineering
Storage Division
Sun Microsystems
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss