Thanks again.
No, I don’t see any bio functions, but you have shed useful light on the issue. My test platform is b147; the pool disks come from a storage system via a QLogic fibre HBA. My test case:

1. zpool set failmode=continue pool1
2. dd if=/dev/zero of=/pool1/fs/myfile count=10000000 &
3. Unplug the fibre cable and wait about 30 sec.
4. zpool status (hangs).
5. Wait about 1 min.
6. Cannot open a new ssh session to the box, though existing ssh sessions stay alive.
7. Use an existing session to get into mdb and grab a threadlist.
8. Eventually, I have to power-cycle the box.

Tianhong

From: Steve Gonczi [mailto:gon...@comcast.net]
Sent: Thursday, May 05, 2011 6:32 PM
To: TianHong Zhao
Subject: Re: [zfs-discuss] multipl disk failures cause zpool hang

You are most welcome. The zio_wait just indicates that the sync thread is waiting for an I/O to complete. Search through the threadlist and see if there is a thread stuck in "biowait". zio is asynchronous, so the thread performing the actual I/O will be a different thread.

But first, let's verify again that you are not deleting large files or large snapshots, or running zfs destroy on large file systems, when this hang happens, and that you are running a fairly modern zfs version (something 145+). If I am reading your posts correctly, you can repeatably make this happen on a mostly idle system, just by disconnecting and reconnecting your cable, correct?

In that case, maybe this is a lost "biodone" problem. If you find a thread sitting in biowait for a long time, that would be my suspicion. When you unplug the cable, the strategy routine that would normally complete, time out, or fail the I/O could be taking a rare exit path, and on that particular path it fails to issue a biodone() like it is supposed to. The next step after this would be figuring out which function is the device's strategy call and giving it a good, thorough review, especially the different exit paths.
Steve /sG/

----- "TianHong Zhao" <tianhong.z...@nexsan.ca> wrote:

Thanks for the information. I think you're right that the spa_sync thread is blocked in zio_wait while holding the scl_lock, which blocks all zpool-related commands (such as zpool status).

The question is why zio_wait is blocked forever. If the underlying device is offline, could the zio service just bail out? What if I set "zfs sync=disabled"?

Here are the threadlists I collected:

# mdb -K
> ::threadlist -v
ffffff02d9627400 ffffff02f05f80a8 ffffff02d95f2780  1  59 ffffff02d57e585c
  PC: _resume_from_idle+0xf1    CMD: zpool status
  stack pointer for thread ffffff02d9627400: ffffff00108a3a70
  [ ffffff00108a3a70 _resume_from_idle+0xf1() ]
    swtch+0x145()
    cv_wait+0x61()
    spa_config_enter+0x86()
    spa_vdev_state_enter+0x3c()
    spa_vdev_set_common+0x37()
    spa_vdev_setpath+0x22()
    zfs_ioc_vdev_setpath+0x48()
    zfsdev_ioctl+0x15e()
    cdev_ioctl+0x45()
    spec_ioctl+0x5a()
    fop_ioctl+0x7b()
    ioctl+0x18e()
    _sys_sysenter_post_swapgs+0x149()
…
ffffff0010378c40 fffffffffbc2e330 0  0  60 ffffff034935bcb8
  PC: _resume_from_idle+0xf1    THREAD: txg_sync_thread()
  stack pointer for thread ffffff0010378c40: ffffff00103789b0
  [ ffffff00103789b0 _resume_from_idle+0xf1() ]
    swtch+0x145()
    cv_wait+0x61()
    zio_wait+0x5d()
    dsl_pool_sync+0xe1()
    spa_sync+0x38d()
    txg_sync_thread+0x247()
    thread_start+8()

Tianhong

From: Steve Gonczi [mailto:gon...@comcast.net]
Sent: Wednesday, May 04, 2011 10:43 AM
To: TianHong Zhao
Subject: Re: [zfs-discuss] multipl disk failures cause zpool hang

Hi TianHong,

I have seen similar apparent hangs, all related to destroying large snapshots or file systems, or deleting large files with dedup enabled (by large I mean in the terabyte range). In the cases I have looked at, the root problem is the sync taking way too long; because of the sync interlock with keeping the current txg open, zfs eventually runs out of space in the current txg and is unable to accept any more transactions.
In those cases the system would come back to life eventually, but it could take a long time (potentially days). Yours looks like a reproducible scenario, and I think the disconnect/reconnect-triggered hang may be new; it would be good to root-cause it. I recommend loading the kernel debugger and generating a crash dump. It would be pretty straightforward to verify whether or not this is the "sync taking a long time" failure; the output from ::threadlist -v would be telling. There have been earlier posts on how to load the debugger and create a crash dump.

Best wishes

Steve /sG/
_______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss