Hi Chris,

On Sun, Aug 09, 2009 at 05:53:12PM -0700, Chris Baker wrote:
> OK - had a chance to do more testing over the weekend. Firstly some extra 
> data:
> 
> With the mirror moved to both drives on ICH10R ports, on sudden disk power-off 
> the mirror faulted cleanly over to the remaining drive - no problem.
> 
> Having a one-drive pool on the ICH10R under heavy write traffic and then 
> powering the drive off causes the zpool/zfs hangs described above.
> 
> ZPool being tested is called "Remove" and consists of:
> c7t2d0s0 - attached to the ICH10R
> c8t0d0s0 - second disk attached to the Si3132 card with the Si3124 driver
> 
> This leads me to the following suspicions:
> (1) We have an Si3124 issue where the drive removal is not always detected, or 
> that info is not passed back to ZFS, even though we know the kernel noticed
> (2) In the event that the only disk in a pool goes faulted, the zpool/zfs 
> subsystem will block indefinitely waiting to get rid of the pending writes.
> 
> I've just recabled back to one disk on ICH10R and one on Si3132 and tried the 
> sudden off with the Si drive:
> 
> *) First try - mirror faulted and IO continued - good news but confusing
> *) Second try - zfs/zpool hung, couldn't even get a zpool status; tried a 
> savecore but savecore hung moving the data to a separate zpool
> *) Third try - zfs/zpool hung, ran savecore -L to a UFS filesystem I created 
> for that purpose
> 
> After the first try, dmesg shows:
> Aug 10 00:34:41 TS1  SATA device detected at port 0
> Aug 10 00:34:41 TS1 sata: [ID 663010 kern.info] 
> /p...@0,0/pci8086,3...@1c,3/pci1095,7...@0 :
> Aug 10 00:34:41 TS1 sata: [ID 761595 kern.info]         SATA disk device at 
> port 0
> Aug 10 00:34:41 TS1 sata: [ID 846691 kern.info]         model WDC 
> WD5000AACS-00ZUB0
> Aug 10 00:34:41 TS1 sata: [ID 693010 kern.info]         firmware 01.01B01
> Aug 10 00:34:41 TS1 sata: [ID 163988 kern.info]         serial number      
> WD-xxxxxxxxxxxxxx
> Aug 10 00:34:41 TS1 sata: [ID 594940 kern.info]         supported features:
> Aug 10 00:34:41 TS1 sata: [ID 981177 kern.info]          48-bit LBA, DMA, 
> Native Command Queueing, SMART, SMART self-test
> Aug 10 00:34:41 TS1 sata: [ID 643337 kern.info]         SATA Gen2 signaling 
> speed (3.0Gbps)
> Aug 10 00:34:41 TS1 sata: [ID 349649 kern.info]         Supported queue depth 
> 32, limited to 31
> Aug 10 00:34:41 TS1 sata: [ID 349649 kern.info]         capacity = 976773168 
> sectors
> Aug 10 00:34:41 TS1 fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, 
> TYPE: Fault, VER: 1, SEVERITY: Major
> Aug 10 00:34:41 TS1 EVENT-TIME: Mon Aug 10 00:34:41 BST 2009
> Aug 10 00:34:41 TS1 PLATFORM:                                  , CSN:         
>                          , HOSTNAME: TS1
> Aug 10 00:34:41 TS1 SOURCE: zfs-diagnosis, REV: 1.0
> Aug 10 00:34:41 TS1 EVENT-ID: ab7df266-3380-4a35-e0bc-9056878fd182
> Aug 10 00:34:41 TS1 DESC: The number of I/O errors associated with a ZFS 
> device exceeded
> Aug 10 00:34:41 TS1          acceptable levels.  Refer to 
> http://sun.com/msg/ZFS-8000-FD for more information.
> Aug 10 00:34:41 TS1 AUTO-RESPONSE: The device has been offlined and marked as 
> faulted.  An attempt
> Aug 10 00:34:41 TS1          will be made to activate a hot spare if 
> available.
> Aug 10 00:34:41 TS1 IMPACT: Fault tolerance of the pool may be compromised.
> Aug 10 00:34:41 TS1 REC-ACTION: Run 'zpool status -x' and replace the bad 
> device.
> 
> and after the second and third test, just:
> SATA device detached at port 0
> 
> Core files were tarred together and bzip2-compressed; they can be found at:
> 
> http://dl.getdropbox.com/u/1709454/dump.bakerci.200908100106.tar.bz2
> 
> Please let me know if you need any further core/debug. Apologies to readers 
> having all this inflicted on them via the email digest.

I spent some time analysing the dump, and I find that ZFS does not know that the
disk is dead. There are about 1900 WRITE requests pending on that disk
(c8t0d0s0).

Attached are the details. Let me know what you find from fmdump.

I suspect that this has to do with driver support for the card.

Hope that helps.

Regards,
Sanjeev
-- 
----------------
Sanjeev Bagewadi
Solaris RPE 
Bangalore, India

The pool in question is "remove":

-- snip --
ZFS spa @ 0xffffff01c77b9800
    Pool name: remove
    State: ACTIVE
       VDEV Address      State    Aux   Description
    0xffffff01c9faec80  HEALTHY    -       root

            VDEV Address      State    Aux   Description
         0xffffff01c9faf2c0  HEALTHY    -      mirror

                 VDEV Address      State    Aux     Description
              0xffffff01d4099940  HEALTHY    -    /dev/dsk/c7t2d0s0

                 VDEV Address      State    Aux     Description
              0xffffff01d4099300  HEALTHY    -    /dev/dsk/c8t0d0s0
-- snip --

Obviously, the status shown for c8t0d0s0 is wrong: that vdev should have been
marked faulted/removed rather than HEALTHY.
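
Purely to illustrate what I mean by "marked dead" (this is a toy model, not the
ZFS vdev code; all names below are made up), here is what the dump shows for the
two leaves versus what we would expect after the disk was pulled:
-- snip --
/*
 * Toy model of the leaf vdev states (NOT ZFS code; names are invented).
 * It only contrasts the state in the dump with the expected state after
 * a detected removal.
 */
#include <stdio.h>

typedef enum { TOY_HEALTHY, TOY_FAULTED, TOY_REMOVED } toy_state_t;

static const char *toy_name[] = { "HEALTHY", "FAULTED", "REMOVED" };

struct toy_vdev {
	const char	*desc;
	toy_state_t	actual;		/* state seen in the dump */
	toy_state_t	expected;	/* state after a detected removal */
};

int
main(void)
{
	struct toy_vdev leaves[] = {
		{ "/dev/dsk/c7t2d0s0", TOY_HEALTHY, TOY_HEALTHY },
		{ "/dev/dsk/c8t0d0s0", TOY_HEALTHY, TOY_REMOVED },
	};
	size_t i;

	for (i = 0; i < sizeof (leaves) / sizeof (leaves[0]); i++)
		printf("%-20s actual=%-8s expected=%s\n", leaves[i].desc,
		    toy_name[leaves[i].actual], toy_name[leaves[i].expected]);
	return (0);
}
-- snip --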
 
Looking at the threads, we have spa_sync() waiting for an IO to complete:
-- snip --
> ffffff0008678c60::findstack -v
stack pointer for thread ffffff0008678c60: ffffff0008678a00
[ ffffff0008678a00 _resume_from_idle+0xf1() ]
  ffffff0008678a30 swtch+0x147()
  ffffff0008678a60 cv_wait+0x61(ffffff01db0e9bc0, ffffff01db0e9bb8)
  ffffff0008678aa0 zio_wait+0x5d(ffffff01db0e9900)
  ffffff0008678b10 dsl_pool_sync+0xe1(ffffff01cedac300, d9)
  ffffff0008678ba0 spa_sync+0x32a(ffffff01c77b9800, d9)
  ffffff0008678c40 txg_sync_thread+0x265(ffffff01cedac300)
  ffffff0008678c50 thread_start+8()
-- snip --
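
For anyone not familiar with this path: zio_wait() simply sleeps on a condition
variable until the I/O's completion handler signals it (that is the cv_wait in
the stack above), so if the completion never arrives the txg sync thread sleeps
forever. A minimal user-space analogue of the pattern (plain pthreads, not the
actual ZFS code):
-- snip --
/*
 * User-space sketch of the zio_wait()/cv_wait() pattern seen in the stack.
 * This is NOT the ZFS implementation; it only shows why the sync thread
 * hangs when the completion callback for an I/O never fires.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

struct fake_zio {
	pthread_mutex_t	lock;
	pthread_cond_t	cv;
	int		done;		/* set by the completion path */
};

/* Completion path: in the real world this is driven by the disk driver. */
static void *
fake_zio_done(void *arg)
{
	struct fake_zio *zio = arg;

	sleep(1);			/* pretend the disk took a while */
	pthread_mutex_lock(&zio->lock);
	zio->done = 1;
	pthread_cond_broadcast(&zio->cv);
	pthread_mutex_unlock(&zio->lock);
	return (NULL);
}

/* What the sync thread does: block until the I/O reports completion. */
static void
fake_zio_wait(struct fake_zio *zio)
{
	pthread_mutex_lock(&zio->lock);
	while (!zio->done)
		pthread_cond_wait(&zio->cv, &zio->lock);
	pthread_mutex_unlock(&zio->lock);
}

int
main(void)
{
	struct fake_zio zio = { PTHREAD_MUTEX_INITIALIZER,
	    PTHREAD_COND_INITIALIZER, 0 };
	pthread_t tid;

	/*
	 * Comment out the next line and the program hangs in fake_zio_wait()
	 * the same way txg_sync_thread is hung in the dump: nothing ever
	 * signals the condition variable.
	 */
	pthread_create(&tid, NULL, fake_zio_done, &zio);

	fake_zio_wait(&zio);
	printf("I/O completed, sync can proceed\n");
	pthread_join(&tid, NULL);
	return (0);
}
-- snip --
Since nothing ever errors out or completes the writes to c8t0d0s0, spa_sync()
for that txg can never finish, which is why zpool/zfs commands appear hung.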

There are 1921 write IOs on the write queue for the vdev (c8t0d0s0):
-- snip --
> 0xffffff01d4099300::print -at struct vdev vdev_queue
{
    ffffff01d4099790 avl_tree_t vdev_queue.vq_deadline_tree = {
        ffffff01d4099790 struct avl_node *avl_root = 0xffffff01db8ae600
        ffffff01d4099798 int (*)() avl_compar = vdev_queue_deadline_compare
        ffffff01d40997a0 size_t avl_offset = 0x220
        ffffff01d40997a8 ulong_t avl_numnodes = 0x781
        ffffff01d40997b0 size_t avl_size = 0x2d0
    }
    ffffff01d40997b8 avl_tree_t vdev_queue.vq_read_tree = {
        ffffff01d40997b8 struct avl_node *avl_root = 0
        ffffff01d40997c0 int (*)() avl_compar = vdev_queue_offset_compare
        ffffff01d40997c8 size_t avl_offset = 0x208
        ffffff01d40997d0 ulong_t avl_numnodes = 0
        ffffff01d40997d8 size_t avl_size = 0x2d0
    }
    ffffff01d40997e0 avl_tree_t vdev_queue.vq_write_tree = {
        ffffff01d40997e0 struct avl_node *avl_root = 0xffffff01db8ae5e8
        ffffff01d40997e8 int (*)() avl_compar = vdev_queue_offset_compare
        ffffff01d40997f0 size_t avl_offset = 0x208
        ffffff01d40997f8 ulong_t avl_numnodes = 0x781
        ffffff01d4099800 size_t avl_size = 0x2d0
    }
    ffffff01d4099808 avl_tree_t vdev_queue.vq_pending_tree = {
        ffffff01d4099808 struct avl_node *avl_root = 0xffffff01ddaae0f8
        ffffff01d4099810 int (*)() avl_compar = vdev_queue_offset_compare
        ffffff01d4099818 size_t avl_offset = 0x208
        ffffff01d4099820 ulong_t avl_numnodes = 0x23
        ffffff01d4099828 size_t avl_size = 0x2d0
    }
    ffffff01d4099830 kmutex_t vdev_queue.vq_lock = {
        ffffff01d4099830 void *[1] _opaque = [ 0 ]
    }
}
-- snip --
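
For anyone reading the ::print output cold, the avl_numnodes fields are in hex;
a trivial conversion (nothing ZFS-specific):
-- snip --
/* Quick conversion of the hex counters in the ::print output above. */
#include <stdio.h>

int
main(void)
{
	printf("vq_deadline_tree avl_numnodes 0x781 = %d\n", 0x781);  /* 1921 */
	printf("vq_write_tree    avl_numnodes 0x781 = %d\n", 0x781);  /* 1921 */
	printf("vq_pending_tree  avl_numnodes 0x23  = %d\n", 0x23);   /*   35 */
	return (0);
}
-- snip --
So 1921 writes are queued against the vdev and another 35 are already
outstanding to the device (which, if I remember the default zfs_vdev_max_pending
of 35 correctly, means the queue to the device is full), and none of them will
ever complete.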

So, ZFS is not aware that the device has been removed, and it is still waiting
for those IOs to finish.

You could run 'fmdump -v /var/fm/fmd/fltlog' and see if any faults were
reported. If ZFS did detect the failure, that would be reported there as well.


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
