Re: [zfs-discuss] multiple disk failures cause zpool hang

2011-05-05 Thread TianHong Zhao
Thanks again.

 

No, I don’t see any bio functions, but you have shed very useful light on the 
issue.

 

My test platform is b147; the pool disks come from a storage system via a 
QLogic Fibre Channel HBA.

 

My test case is (a rough command sketch follows the list):

1.   zpool set failmode=continue pool1

2.   dd if=/dev/zero of=/pool1/fs/myfile count=1000 &

3.   Unplug the fibre cable and wait about 30 seconds.

4.   zpool status  (hangs)

5.   Wait about 1 minute.

6.   Cannot open a new ssh session to the box, although existing ssh sessions 
are still alive.

7.   Use an existing session to get into mdb and collect the threadlist.

8.   Eventually, I have to power cycle the box.
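(For reference, roughly the same sequence as shell commands; pool1 and 
/pool1/fs are just the names from my setup, and piping the dcmd through 
mdb -k assumes the live kernel can still be read from the surviving session:)

# zpool set failmode=continue pool1
# dd if=/dev/zero of=/pool1/fs/myfile count=1000 &
  (pull the fibre cable, wait ~30 seconds)
# zpool status                                        <-- hangs here
  (from an already-open ssh session)
# echo "::threadlist -v" | mdb -k > /var/tmp/threadlist.txt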

 

Tianhong

 

From: Steve Gonczi [mailto:gon...@comcast.net] 
Sent: Thursday, May 05, 2011 6:32 PM
To: TianHong Zhao
Subject: Re: [zfs-discuss] multiple disk failures cause zpool hang

 

You are most welcome.

The zio_wait just indicates that the sync thread is waiting for an I/O to 
complete.

Search through the threadlist and see if there is a thread that is stuck 
in "biowait".  zio is asynchronous, so the thread performing the actual I/O
will be a different thread.
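(One way to hunt for such a thread, assuming the live kernel can still be 
read with mdb -k, is to filter the thread stacks by function; the thread 
address in the second command is a placeholder taken from the first 
command's output:)

# echo "::stacks -c biowait" | mdb -k
# echo "<thread-addr>::findstack -v" | mdb -k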

But first let's just verify again that you are not deleting large files, large 
snapshots, or zfs destroy-ing large file systems when this hang happens, and 
that you are running a fairly modern zfs version (something 145+). If I am 
reading your posts correctly, you can repeatably make this happen on a mostly 
idle system, just by disconnecting and reconnecting your cable, correct?

In that case, maybe this is a lost "biodone" problem.
If you find a thread sitting in biowait for a long time, that would be my 
suspicion.

When you unplug the cable, the strategy routine that would normally complete, 
time out, or fail the I/O could be taking a rare exit path, and on that 
particular path it fails to issue a biodone() like it is supposed to.

The next step after this would be figuring out which function is the device's 
strategy routine and giving it a good thorough review, especially the 
different exit paths.
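(If such a thread does turn up, here is a sketch of how one might inspect the 
buffer it is waiting on; the addresses are placeholders you would take from 
the actual stack output, and B_DONE missing from b_flags would be consistent 
with a biodone() that never happened:)

# echo "<thread-addr>::findstack -v" | mdb -k
  (the buf_t address usually shows up as the biowait argument)
# echo "<buf-addr>::print -t buf_t b_flags b_error b_edev" | mdb -k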

Steve

/sG/

- "TianHong Zhao"  wrote: 

Thanks for the information.

 

I think you’re right that the spa_sync thread is blocked in zio_wait while 
holding scl_lock, which blocks all zpool-related commands (such as zpool 
status).

The question is why zio_wait is blocked forever. If the underlying device is 
offline, could the zio just bail out?

What if I set “zfs sync=disabled”?
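(For what it's worth, sync is a per-dataset property; assuming the dataset in 
question is pool1/fs, it would be set and checked like this:)

# zfs set sync=disabled pool1/fs
# zfs get sync pool1/fs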

 

Here is the "::threadlist -v" output I collected:

# mdb -K
> ::threadlist -v

ff02d9627400 ff02f05f80a8 ff02d95f2780   1  59 ff02d57e585c
  PC: _resume_from_idle+0xf1    CMD: zpool status
  stack pointer for thread ff02d9627400: ff00108a3a70
  [ ff00108a3a70 _resume_from_idle+0xf1() ]
swtch+0x145()
cv_wait+0x61()
spa_config_enter+0x86()
spa_vdev_state_enter+0x3c()
spa_vdev_set_common+0x37()
spa_vdev_setpath+0x22()
zfs_ioc_vdev_setpath+0x48()
zfsdev_ioctl+0x15e()
cdev_ioctl+0x45()
spec_ioctl+0x5a()
fop_ioctl+0x7b()
ioctl+0x18e()
_sys_sysenter_post_swapgs+0x149()

…

ff0010378c40 fbc2e3300   0  60 ff034935bcb8
  PC: _resume_from_idle+0xf1    THREAD: txg_sync_thread()
  stack pointer for thread ff0010378c40: ff00103789b0
  [ ff00103789b0 _resume_from_idle+0xf1() ]
swtch+0x145()
cv_wait+0x61()
zio_wait+0x5d()
dsl_pool_sync+0xe1()
spa_sync+0x38d()
txg_sync_thread+0x247()
thread_start+8()

 

Tianhong

 

From: Steve Gonczi [mailto:gon...@comcast.net] 
Sent: Wednesday, May 04, 2011 10:43 AM
To: TianHong Zhao
Subject: Re: [zfs-discuss] multiple disk failures cause zpool hang

 

Hi TianHong,

I have seen similar apparent hangs, all related to destroying large snapshots 
or file systems, or deleting large files, with dedup enabled (by large I mean 
in the terabyte range).

In the cases I have looked at, the root problem is the sync taking way too 
long; because the sync interlocks with keeping the current txg open, zfs 
eventually runs out of space in the current txg and is unable to accept any 
more transactions.

In those cases, the system would come back to life eventually, but it may 
take a long time (days, potentially).

Looks like yours is a reproducible scenario, and I think the 
disconnect-reconnect triggered hang may be new. It would be good to root 
cause this.

I recommend loading the kernel debugger and generating a crash dump. It would 
be pretty straightforward to verify whether this is the "sync taking a long 
time" failure or not. The output from ::threadlist -v would be telling.

There have been earlier posts on how to load the debugger and create a crash 
dump.
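(For reference, a minimal sketch of one way to capture and examine such a 
dump, assuming a dump device is already configured; the unix.0/vmcore.0 names 
are just the defaults savecore writes:)

# dumpadm                   (confirm the dump device and the savecore directory)
# savecore -L               (take a live dump without bringing the box down; needs a dedicated dump device)
  ... or, to panic the box deliberately and dump on the way down:
# reboot -d
  ... then, in the savecore directory:
# mdb unix.0 vmcore.0
> ::threadlist -v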

Best wishes

Steve
 
/sG/

 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Permanently using hot spare?

2011-05-05 Thread TianHong Zhao
Just detach the faulty disk; the spare will become the "normal" disk once 
it has finished resilvering.

# zpool detach <pool> <faulted-disk>

Then you need to add the new spare:

# zpool add <pool> spare <new-disk>

There seems to be a new feature in the illumos project to support a zpool 
property like "spare promotion", which would not require the manual "detach" 
operation.
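(A concrete sketch with hypothetical names -- a pool "tank" whose spare 
c0t5d0 has taken over for the failed c0t2d0:)

# zpool status tank              (spare c0t5d0 shows as INUSE for the failed c0t2d0)
# zpool detach tank c0t2d0       (the in-use spare becomes a permanent member of the vdev)
# zpool add tank spare c0t6d0    (bring in a fresh disk as the new hot spare)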
 

Tianhong


-Original Message-
From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Ray Van Dolson
Sent: Thursday, May 05, 2011 5:53 PM
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] Permanently using hot spare?

Have a failed drive on a ZFS pool (three RAIDZ2 vdevs, one hot spare).
The hot spare kicked in and all is well.

Is it possible to just make that hot spare disk -- already resilvered into
the pool -- a permanent part of the pool?  We could then throw in a new disk
and mark it as a spare, and avoid what would seem to be an unnecessary
resilver (twice: once when the spare is brought in, and again when we replace
the failed disk).

This document[1] seems to make it sound like it can be done, but I'm not
really seeing how... 

Can I "add" the spare disk to the pool when it's already in use?
Probably not...

Note this is on Solaris 10 U9.

Thanks,
Ray

[1] http://dlc.sun.com/osol/docs/content/ZFSADMIN/gayrd.html#gcvcw
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] multiple disk failures cause zpool hang

2011-05-05 Thread TianHong Zhao
Thanks for the information.

 

I think you’re right that the spa_sync thread is blocked in zio_wait while 
holding scl_lock, which blocks all zpool-related commands (such as zpool 
status).

The question is why zio_wait is blocked forever. If the underlying device is 
offline, could the zio just bail out?

What if I set “zfs sync=disabled”?

 

Here is the "::threadlist -v" output I collected:

# mdb -K
> ::threadlist -v

ff02d9627400 ff02f05f80a8 ff02d95f2780   1  59 ff02d57e585c
  PC: _resume_from_idle+0xf1    CMD: zpool status
  stack pointer for thread ff02d9627400: ff00108a3a70
  [ ff00108a3a70 _resume_from_idle+0xf1() ]
swtch+0x145()
cv_wait+0x61()
spa_config_enter+0x86()
spa_vdev_state_enter+0x3c()
spa_vdev_set_common+0x37()
spa_vdev_setpath+0x22()
zfs_ioc_vdev_setpath+0x48()
zfsdev_ioctl+0x15e()
cdev_ioctl+0x45()
spec_ioctl+0x5a()
fop_ioctl+0x7b()
ioctl+0x18e()
_sys_sysenter_post_swapgs+0x149()

…

ff0010378c40 fbc2e3300   0  60 ff034935bcb8
  PC: _resume_from_idle+0xf1    THREAD: txg_sync_thread()
  stack pointer for thread ff0010378c40: ff00103789b0
  [ ff00103789b0 _resume_from_idle+0xf1() ]
swtch+0x145()
cv_wait+0x61()
zio_wait+0x5d()
dsl_pool_sync+0xe1()
spa_sync+0x38d()
txg_sync_thread+0x247()
thread_start+8()

 

Tianhong

 

From: Steve Gonczi [mailto:gon...@comcast.net] 
Sent: Wednesday, May 04, 2011 10:43 AM
To: TianHong Zhao
Subject: Re: [zfs-discuss] multiple disk failures cause zpool hang

 

Hi TianHong,

I have seen similar apparent hangs, all related to destroying large snapshots 
or file systems, or deleting large files, with dedup enabled (by large I mean 
in the terabyte range).

In the cases I have looked at, the root problem is the sync taking way too 
long; because the sync interlocks with keeping the current txg open, zfs 
eventually runs out of space in the current txg and is unable to accept any 
more transactions.

In those cases, the system would come back to life eventually, but it may 
take a long time (days, potentially).

Looks like yours is a reproducible scenario, and I think the 
disconnect-reconnect triggered hang may be new. It would be good to root 
cause this.

I recommend loading the kernel debugger and generating a crash dump. It would 
be pretty straightforward to verify whether this is the "sync taking a long 
time" failure or not. The output from ::threadlist -v would be telling.

There have been earlier posts on how to load the debugger and create a crash 
dump.

Best wishes

Steve
 
/sG/

----- "TianHong Zhao"  wrote: 

Thanks for the reply.

This sounds like a serious issue if we have to reboot the machine in such a 
case; I am wondering if anybody is working on this.
BTW, the zpool failmode is set to continue in my test case.

Tianhong Zhao

 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] multiple disk failures cause zpool hang

2011-05-04 Thread TianHong Zhao
Thanks for the reply.

This sounds like a serious issue if we have to reboot the machine in such a 
case; I am wondering if anybody is working on this.
BTW, the zpool failmode is set to continue in my test case.
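(For reference, checking the setting looks like this; pool1 is just my pool 
name, and the output below is a sketch of the usual zpool get format:)

# zpool get failmode pool1
NAME   PROPERTY  VALUE     SOURCE
pool1  failmode  continue  local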

Tianhong Zhao

-Original Message-
From: Edward Ned Harvey 
[mailto:opensolarisisdeadlongliveopensola...@nedharvey.com] 
Sent: Wednesday, May 04, 2011 9:50 AM
To: TianHong Zhao; zfs-discuss@opensolaris.org
Subject: RE: multiple disk failures cause zpool hang

> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of TianHong Zhao
> 
> There seem to be a few threads about zpool hangs. Do we have a 
> workaround to resolve the hang issue without rebooting?
> 
> In my case, I have a pool with disks from external LUNs via a fibre 
> cable. When the cable is unplugged while there is I/O in the pool, all 
> zpool-related commands hang (zpool status, zpool list, etc.), and putting 
> the cable back does not solve the problem.
> 
> Eventually, I cannot even open a new SSH session to the box; somehow 
> the system goes into a half-locked state.

I've hit that one a lot.  I am not aware of any way to fix it without a 
reboot.  In fact, in all my experience, you don't even have a choice: wait 
long enough (a few hours) and the system will become totally unresponsive, 
and you'll have no alternative but to power cycle.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] multiple disk failures cause zpool hang

2011-05-03 Thread TianHong Zhao
ROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFECB

memcntl(0xFECC, 6984, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0

stat64("/lib/libm.so.2", 0x08046654)= 0

resolvepath("/lib/libm.so.2", "/lib/libm.so.2", 1023) = 14

open("/lib/libm.so.2", O_RDONLY)= 3

mmapobj(3, MMOBJ_INTERPRET, 0xFECB0508, 0x080466C0, 0x) = 0

close(3)= 0

memcntl(0xFEB9, 39464, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0

stat64("/lib/libsocket.so.1", 0x08046654)   = 0

resolvepath("/lib/libsocket.so.1", "/lib/libsocket.so.1", 1023) = 19

open("/lib/libsocket.so.1", O_RDONLY)   = 3

mmapobj(3, MMOBJ_INTERPRET, 0xFECB0B10, 0x080466C0, 0x) = 0

close(3)= 0

memcntl(0xFEC9, 16524, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0

stat64("/lib/libnsl.so.1", 0x08046654)  = 0

resolvepath("/lib/libnsl.so.1", "/lib/libnsl.so.1", 1023) = 16

open("/lib/libnsl.so.1", O_RDONLY)  = 3

mmap(0x, 4096, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFEC8

mmapobj(3, MMOBJ_INTERPRET, 0xFEC80018, 0x080466C0, 0x) = 0

close(3)= 0

memcntl(0xFE57, 78408, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0

sigfillset(0xFEF683A8)  = 0

stat64("/usr/lib//libshare.so.1", 0x08046B24)   = 0

resolvepath("/usr/lib//libshare.so.1", "/usr/lib/libshare.so.1", 1023) =
22

open("/usr/lib//libshare.so.1", O_RDONLY)   = 3

mmapobj(3, MMOBJ_INTERPRET, 0xFEC80928, 0x08046B90, 0x) = 0

close(3)= 0

memcntl(0xFEC5, 24216, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0

mmap(0x, 4096, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFEC4

sysi86(SI86FPSTART, 0xFEF68CD4, 0x133F, 0x1F80) = 0x0001

open("/usr/lib/locale/en_US.UTF-8/LC_MESSAGES/SUNW_OST_SGS.mo",
O_RDONLY) Err#2 ENOENT

sysconfig(_CONFIG_NPROC_ONLN)   = 4

issetugid() = 0

open("/usr/lib/locale/en_US.UTF-8/LC_MESSAGES/SUNW_OST_OSLIB.mo",
O_RDONLY) Err#2 ENOENT

issetugid() = 0

brk(0x08089000) = 0

brk(0x08099000) = 0

stat64("/usr/lib/locale/en_US.UTF-8/en_US.UTF-8.so.3", 0x08042BA0) = 0

resolvepath("/usr/lib/locale/en_US.UTF-8/en_US.UTF-8.so.3",
"/usr/lib/locale/en_US.UTF-8/en_US.UTF-8.so.3", 1023) = 44

open("/usr/lib/locale/en_US.UTF-8/en_US.UTF-8.so.3", O_RDONLY) = 3

mmapobj(3, MMOBJ_INTERPRET, 0xFEC40560, 0x08042C0C, 0x) = 0

close(3)= 0

memcntl(0xFE64, 6780, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0

stat64("/usr/lib/locale/en_US.UTF-8/libc.so.1", 0x08042A80) Err#2 ENOENT

stat64("/usr/lib/locale/en_US.UTF-8/methods_unicode.so.3", 0x08042A80) =
0

resolvepath("/usr/lib/locale/en_US.UTF-8/methods_unicode.so.3",
"/usr/lib/locale/common/methods_unicode.so.3", 1023) = 43

open("/usr/lib/locale/en_US.UTF-8/methods_unicode.so.3", O_RDONLY) = 3

mmapobj(3, MMOBJ_INTERPRET, 0xFEC40D30, 0x08042AEC, 0x) = 0

close(3)= 0

mmap(0x, 4096, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFEC3

memcntl(0xFE62, 3576, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0

open("/dev/zfs", O_RDWR)= 3

open("/etc/mnttab", O_RDONLY)   = 4

open("/etc/dfs/sharetab", O_RDONLY) = 5

stat64("/lib/libavl.so.1", 0x080431A8)  = 0

resolvepath("/lib/libavl.so.1", "/lib/libavl.so.1", 1023) = 16

open("/lib/libavl.so.1", O_RDONLY)  = 6

mmapobj(6, MMOBJ_INTERPRET, 0xFEC305E0, 0x08043214, 0x) = 0

close(6)= 0

memcntl(0xFEC1, 2416, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0

sysconfig(_CONFIG_PAGESIZE) = 4096

stat64("/lib/libuutil.so.1", 0x080430F8)= 0

resolvepath("/lib/libuutil.so.1", "/lib/libuutil.so.1", 1023) = 18

open("/lib/libuutil.so.1", O_RDONLY)= 6

mmapobj(6, MMOBJ_INTERPRET, 0xFEC30B68, 0x08043164, 0x) = 0

close(6)= 0

mmap(0x0000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFE61

memcntl(0xFE51, 9080, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0

brk(0x080A9000) = 0

ioctl(3, ZFS_IOC_POOL_CONFIGS, 0x08042590)  = 0

stat64("/lib/libnvpair.so.1", 0x08041AB8)   = 0

resolvepath("/lib/libn