mpt doesn't propagate read errors and dies on a single sector?

Attila Nagy Sat, 20 Oct 2012 14:39:45 -0700

Hi,

I have a Sun X4540 with LSI C1068E based SAS controllers (FW version: 1.27.02.00-IT). My problem is that if one drive starts to fail with read errors, the machine becomes completely unusable (running stable/9 with ZFS), because, it seems, ZFS can't see that there are read errors on a device, and the mpt driver (controller, kernel?) wants to re-issue the operation endlessly.


Here is a verbose (dev.mpt.0.debug=7 level) dump:
mpt0: Address Reply:
SCSI IO Request Reply @ 0xffffff87ffcfdc00
        IOC Status    Success
        IOCLogInfo    0x00000000
        MsgLength     0x09
        MsgFlags      0x00
        MsgContext    0x000200eb
        Bus:          0
        TargetID      3
        CDBLength     10
        SCSI Status:  Check Condition
        SCSI State:   (0x00000001)AutoSense_Valid
        TransferCnt   0x20000
        SenseCnt      0x0012
        ResponseInfo  0x00000000
(da3:mpt0:0:3:0): READ(10). CDB: 28 0 3a 38 5d e 0 1 0 0
(da3:mpt0:0:3:0): CAM status: SCSI Status Error
(da3:mpt0:0:3:0): SCSI status: Check Condition
(da3:mpt0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da3:mpt0:0:3:0): Info: 0x3a385d1a
(da3:mpt0:0:3:0): Error 5, Unretryable error
SCSI IO Request @ 0xffffff80003046f0
        Chain Offset  0x00
        MsgFlags      0x00
        MsgContext    0x000200ea
        Bus:                0
        TargetID            3
        SenseBufferLength   32
        LUN:              0x0
        Control           0x02000000  READ  SIMPLEQ
        DataLength      0x00020000
        SenseBufAddr    0x0c65d5e0
        CDB[0:10]       28 00 3a 38 5e 0e 00 01 00 00

SE64 0xffffff87ffd1c430: Addr=0x000000010e858000 FlagsLength=0xd3020000
         64_BIT_ADDRESSING LAST_ELEMENT END_OF_BUFFER END_OF_LIST
mpt0: Address Reply:
SCSI IO Request Reply @ 0xffffff87ffcfdd00
        IOC Status    Success
        IOCLogInfo    0x00000000
        MsgLength     0x09
        MsgFlags      0x00
        MsgContext    0x000200ea
        Bus:          0
        TargetID      3
        CDBLength     10
        SCSI Status:  Check Condition
        SCSI State:   (0x00000001)AutoSense_Valid
        TransferCnt   0x20000
        SenseCnt      0x0012
        ResponseInfo  0x00000000

And I get these Check Condition SCSI errors endlessly. If ZFS is enabled at boot, the machine can't even start because of this (zpool import never finishes). If I boot without ZFS and then try to import, the zpool command gets stuck in the vdev_g state:

 1163 root          1  20    0 35440K  5200K vdev_g  6   0:01 0.10% zpool
procstat -k 1163
  PID    TID COMM             TDNAME KSTACK
 1163 100116 zpool            -      mi_switch sleepq_timedwait _sleep biowait vdev_geom_read_guid vdev_geom_open vdev_open vdev_open_children vdev_raidz_open vdev_open vdev_open_children vdev_root_open vdev_open spa_load spa_tryimport zfs_ioc_pool_tryimport zfsdev_ioctl devfs_ioctl_f

Could it be that GEOM/ZFS doesn't receive this read error and waits indefinitely for the command to complete?
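
For reference, the two READ(10) CDBs in the dump above can be decoded by hand to see which blocks each request covers. The following is only a minimal sketch: the helper name decode_read10 is made up for illustration, and the 512-byte sector size is an assumption (it is not stated in the dump, although it is consistent with the 0x20000 transfer length).

```python
def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Returns (starting LBA, number of blocks). Hypothetical helper,
    not part of any FreeBSD tool.
    """
    cdb = [int(b, 16) for b in cdb_hex.split()]
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(bytes(cdb[2:6]), "big")     # bytes 2..5: LBA
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")  # bytes 7..8: length
    return lba, blocks


BLOCK_SIZE = 512  # assumption: 512-byte sectors

# CDB from the CAM error message and CDB from the SCSI IO Request dump
first_lba, first_len = decode_read10("28 0 3a 38 5d e 0 1 0 0")
next_lba, next_len = decode_read10("28 00 3a 38 5e 0e 00 01 00 00")
failing_lba = 0x3A385D1A  # "Info:" field of the sense data

print(f"failed read: LBA {first_lba:#x}, {first_len} blocks, "
      f"{first_len * BLOCK_SIZE:#x} bytes")
print(f"bad sector offset within that read: {failing_lba - first_lba}")
print(f"next request starts at LBA {next_lba:#x}")
```

Under these assumptions, the failed read covers 256 blocks (0x20000 bytes, matching TransferCnt and DataLength), the LBA reported in the Info field lies inside that range, and the request dumped right after the error starts where the failed one ends, so in this excerpt the two CDBs address adjacent ranges rather than the same one.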


_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
