Thank you so much for your reply! 
Here are the outputs:

>1. Find PID of the hanging 'zpool import', e.g. with 'ps -ef | grep zpool'
r...@mybox:~# ps -ef|grep zpool
    root   915   908   0 03:34:46 pts/3       0:00 grep zpool
    root   901   874   1 03:34:09 pts/2       0:00 zpool import drowning
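
(Side note: if it is any easier, I believe pgrep could find the PID in one step; my 
assumption is that the Solaris pgrep supports -f to match against the full argument 
list and -l for long output. The ps | grep above worked fine, though.)

pgrep -fl zpool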

>2. Substitute PID with actual number in the below command
>echo "0tPID::pid2proc|::walk thread|::findstack -v" | mdb -k

r...@mybox:~# echo "0t901::pid2proc|::walk thread|::findstack -v" | mdb -k
stack pointer for thread ffffff02ed8c7880: ffffff0010191a10
[ ffffff0010191a10 _resume_from_idle+0xf1() ]
  ffffff0010191a40 swtch+0x147()
  ffffff0010191a70 cv_wait+0x61(ffffff02eb010dda, ffffff02eb010d98)
  ffffff0010191ac0 txg_wait_synced+0x7f(ffffff02eb010c00, 31983c5)
  ffffff0010191b00 dsl_sync_task_group_wait+0xee(ffffff02f1d11bd8)
  ffffff0010191b80 dsl_sync_task_do+0x65(ffffff02eb010c00, fffffffff78be1f0, 
  fffffffff78be250, ffffff02edc38400, ffffff0010191b98, 0)
  ffffff0010191bd0 dsl_dataset_rollback+0x53(ffffff02edc38400, 2)
  ffffff0010191c00 dmu_objset_rollback+0x46(ffffff02eb674b20)
  ffffff0010191c40 zfs_ioc_rollback+0x10d(ffffff02f2b58000)
  ffffff0010191cc0 zfsdev_ioctl+0x10b(b600000000, 5a1a, 803e240, 100003, 
  ffffff02ee813338, ffffff0010191de4)
  ffffff0010191d00 cdev_ioctl+0x45(b600000000, 5a1a, 803e240, 100003, 
  ffffff02ee813338, ffffff0010191de4)
  ffffff0010191d40 spec_ioctl+0x83(ffffff02df6a7480, 5a1a, 803e240, 100003, 
  ffffff02ee813338, ffffff0010191de4, 0)
  ffffff0010191dc0 fop_ioctl+0x7b(ffffff02df6a7480, 5a1a, 803e240, 100003, 
  ffffff02ee813338, ffffff0010191de4, 0)
  ffffff0010191ec0 ioctl+0x18e(3, 5a1a, 803e240)
  ffffff0010191f10 _sys_sysenter_post_swapgs+0x14b()
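
For anyone reading this in the archives later, my understanding of that pipeline is 
below; please correct me if I'm off:

0t901            PID 901 as a decimal literal (mdb treats bare numbers as hex)
::pid2proc       look up the proc_t for that PID
::walk thread    walk every kernel thread belonging to that process
::findstack -v   print each thread's kernel stack, with function arguments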

>3. Do
>echo "::spa" | mdb -k

r...@mybox:~# echo "::spa" | mdb -k
ADDR                 STATE NAME                                                
ffffff02f2b8b800    ACTIVE mypool
ffffff02d5890000    ACTIVE rpool
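
(If the per-vdev state would help, I believe the same dcmd takes a -v flag that also 
prints the vdev tree, so I can run this and post the output:

echo "::spa -v" | mdb -k)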

>4. Find address of your pool in the output of stage 3 and replace ADDR with it
>in the below command (it is single line):
>echo "ADDR::print spa_t spa_dsl_pool->dp_tx.tx_sync_thread|::findstack -v" | 
>mdb -k

r...@mybox:~# echo "ffffff02f2b8b800::print spa_t 
spa_dsl_pool->dp_tx.tx_sync_thread|::findstack -v" | mdb -k
mdb: spa_t is not a struct or union type

So I decided to remove "spa_t" to see what would happen:

r...@mybox:~# echo "ffffff02f2b8b800::print 
spa_dsl_pool->dp_tx.tx_sync_thread|::findstack -v" | mdb -k
mdb: failed to look up type spa_dsl_pool->dp_tx.tx_sync_thread: no symbol 
corresponds to address
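
My uneducated guess is that the spa_t typedef just isn't in the CTF data mdb is 
reading, so maybe spelling out the struct name would work instead? Something like 
this (untested guess on my part, single line):

echo "ffffff02f2b8b800::print struct spa spa_dsl_pool->dp_tx.tx_sync_thread|::findstack -v" | mdb -k

I can try that if you think it's worth a shot.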

>What do you mean by halt here? Are you able to interrupt 'zpool import' with 
>CTRL-C?
Yes, I can interrupt it with CTRL-C.

>Does 'zfs list' provide any output?
JACKPOT!!!!!  When I ran "zfs list", the import completed!  But now "zfs list" itself 
hangs just like "zpool import" did.

r...@mybox:~# ps -ef | grep zfs
    root   940   874   0 03:49:15 pts/2       0:00 grep zfs
    root   936   908   0 03:44:28 pts/3       0:01 zfs list

r...@mybox:~# echo "0t936::pid2proc|::walk thread|::findstack -v" | mdb -k
stack pointer for thread ffffff02d72ea020: ffffff000fdeaa10
[ ffffff000fdeaa10 _resume_from_idle+0xf1() ]
  ffffff000fdeaa40 swtch+0x147()
  ffffff000fdeaa70 cv_wait+0x61(ffffff02eb010dda, ffffff02eb010d98)
  ffffff000fdeaac0 txg_wait_synced+0x7f(ffffff02eb010c00, 31990da)
  ffffff000fdeab00 dsl_sync_task_group_wait+0xee(ffffff02f1d11bd8)
  ffffff000fdeab80 dsl_sync_task_do+0x65(ffffff02eb010c00, fffffffff78be1f0, 
  fffffffff78be250, ffffff02f1d0ce00, ffffff000fdeab98, 0)
  ffffff000fdeabd0 dsl_dataset_rollback+0x53(ffffff02f1d0ce00, 2)
  ffffff000fdeac00 dmu_objset_rollback+0x46(ffffff02eb3322a8)
  ffffff000fdeac40 zfs_ioc_rollback+0x10d(ffffff02ebf4e000)
  ffffff000fdeacc0 zfsdev_ioctl+0x10b(b600000000, 5a1a, 8043a20, 100003, 
  ffffff02ee813e78, ffffff000fdeade4)
  ffffff000fdead00 cdev_ioctl+0x45(b600000000, 5a1a, 8043a20, 100003, 
  ffffff02ee813e78, ffffff000fdeade4)
  ffffff000fdead40 spec_ioctl+0x83(ffffff02df6a7480, 5a1a, 8043a20, 100003, 
  ffffff02ee813e78, ffffff000fdeade4, 0)
  ffffff000fdeadc0 fop_ioctl+0x7b(ffffff02df6a7480, 5a1a, 8043a20, 100003, 
  ffffff02ee813e78, ffffff000fdeade4, 0)
  ffffff000fdeaec0 ioctl+0x18e(3, 5a1a, 8043a20)
  ffffff000fdeaf10 _sys_sysenter_post_swapgs+0x14b()


>Apparently as you have 5TB of data there, it worked fine some time ago. What
>happened to the pool before this issue was noticed?
A reboot?
This box acts as network storage for all of my computers.  All of the PCs in 
the house are set to back up to it daily, and it is like an extra hard drive 
for my wife's netbook and laptop.  We dump all of the pictures off of the 
camera there as well as any HD video we capture.  I NEVER reboot this box 
unless I am prompted to.  I'm running OpenSolaris (uname -a: SunOS mybox 5.11 
snv_111b i86pc i386 i86pc Solaris), and if I remember right, I was prompted to 
update.  I did so, and needed to reboot.  Rebooted, and the box would not 
start.  I used another PC to find out how to start in single user mode and 
tried that.  No dice.  I had to physically remove the drives to get to a login 
prompt.  BTW, I just stopped the "zfs list" after it had been running for about 30 
minutes, and the whole time it was writing to my drives constantly (I used 
'zpool iostat 1' to check).  I am by no means an expert, but whatever "zfs list" is 
trying to do, it is hanging.

Right now, my goal is to back up all of my important data.  Once I do that, I 
will delete this pool and start over from scratch.  My biggest concern is to 
keep this from happening again.  Any suggestions?
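
For the backup itself, this is roughly what I had in mind, assuming I can get the 
pool imported and keep it stable long enough (the snapshot name and "backuppool", a 
pool on a separate external drive, are just placeholders):

zfs snapshot -r drowning@rescue
zfs send -R drowning@rescue | zfs receive -F backuppool/drowning

Or would a plain rsync of the important directories to another machine be safer, 
given the state the pool is in?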