James C. McPherson wrote:
Patrick Petit wrote:
Darren Reed wrote:
Patrick Petit wrote:
Using a ZFS emulated volume, I wasn't expecting to see a system
hang caused by a SCSI error. What do you think? The error is not
systematic. When it happens, the Solaris/Xen dom0 console keeps
displaying the following message and the system hangs.
Aug 3 11:11:23 jesma58 scsi: WARNING: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci17c2,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd2):
Aug 3 11:11:23 jesma58    Error for Command: read(10)    Error Level: Retryable
Aug 3 11:11:23 jesma58 scsi:    Requested Block: 67679394    Error Block: 67679394
Aug 3 11:11:23 jesma58 scsi:    Vendor: SEAGATE    Serial Number: 3JA7XWQY
Aug 3 11:11:23 jesma58 scsi:    Sense Key: Unit_Attention
Aug 3 11:11:23 jesma58 scsi:    ASC: 0x29 (bus device reset message occurred), ASCQ: 0x3, FRU: 0x4
Have you looked into this further using FMA, starting with fmadm?
fmadm shows no error :-(
jesma58# fmadm faulty -a
   STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
jesma58#
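Even when "fmadm faulty" comes back empty, the raw telemetry may still hold the error reports, since fmadm only shows diagnosed faults. Something along these lines would show whether the SCSI errors reached FMA at all (these are standard FMA commands; whether they show anything in this particular case is, of course, unknown):

```shell
# List the raw error telemetry (ereports) FMA has received;
# retryable disk errors often appear here even when no fault
# was ever diagnosed.
fmdump -e

# Same, but dump the full payload of each ereport (device path,
# sense data, etc.) for closer inspection.
fmdump -eV

# Show diagnosis-engine activity, if a case was ever opened.
fmdump -v
```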
I have had a similar issue with the solitary SATA disk which makes
up my zfs root pool - errors such as these send the system into a
hang state (fortunately not a hard hang) and require a break/F1-A
plus a forced crash dump to get out of.
As I understand it, ZFS will retry operations based on various settings
such as those in 'sd' and I don't believe there are specific error case
handlers in the ZFS code to deal with issues like this.
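The sd retry behavior mentioned above is tunable. As a sketch only - the values below are illustrative, not recommendations - the usual knobs live in /etc/system:

```shell
# /etc/system fragment -- illustrative values, not recommendations.
# sd_io_time: seconds before an outstanding sd command times out
# (the default is 60).
# sd_retry_count: how many times sd retries a failed command before
# giving up and passing the error to the layers above (such as ZFS).
set sd:sd_io_time = 10
set sd:sd_retry_count = 3
```

Lowering these shortens the window during which the whole I/O path appears wedged while sd retries, at the cost of giving a flaky device less chance to recover on its own.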
I am wondering to what extent it is ZFS's role to fix SCSI
controller errors. Shouldn't that be the role of the controller
driver, or even the controller itself? In such circumstances I would
expect the lower layers to repair and/or isolate the faulty block,
for instance by reassigning it. But, having written SCSI drivers in
the past (in my defense, that was a long time ago), I don't recall
drivers being that elaborate, so they left the upper layers to deal
with the hot potato :-(
OTOH it would be nice to see ZFS invoking an error path immediately
on receipt of a failure like yours or mine. But I fear that this would
detract from the device agnosticism that we presently have.
Patrick, is your pool mirrored? I know that mine isn't, so I have to
expect that I will suffer for it.
No it's not mirrored. It's a simple pool backed by a physical disk drive.
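For what it's worth, a single-disk pool can be converted into a mirror in place, which would at least let ZFS self-heal read errors from the other side. A sketch - the pool and device names here are hypothetical placeholders:

```shell
# Attach a second disk to the existing single-disk pool, turning
# it into a two-way mirror (pool and device names hypothetical).
zpool attach mypool c1t0d0 c1t1d0

# Watch the resilver progress, then confirm both sides are ONLINE.
zpool status mypool
```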
The other thing that I am concerned with in your scenario is that you
are dd-ing a disk image onto a zvol. I'm not sure that this is the
right way to go about it (although I don't know what *is* the right
way to do it).
Yes, I am wondering the same. Would it be preferable to dd onto the
raw (rdsk) device?
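For reference, zvols are exposed under both /dev/zvol/dsk (block) and /dev/zvol/rdsk (raw/character) paths, so writing the image through the raw node is straightforward. A sketch, with the pool, volume, and image names all hypothetical:

```shell
# Create a zvol sized to hold the image (names hypothetical).
zfs create -V 10G tank/domu-root

# Write the disk image through the raw (character) device node;
# a large block size avoids many small writes.
dd if=/path/to/domu.img of=/dev/zvol/rdsk/tank/domu-root bs=1M
```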
best regards,
James C. McPherson
--
Solaris Datapath Engineering
Storage Division
Sun Microsystems
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss