Re: [zfs-discuss] zfs destroy hanging

2009-07-21 Thread Moshe Vainer
Some more info: the system won't shut down; issuing shutdown -g0 -i5 just sits 
there doing nothing.

Then I tried to find locks in the savecore I took, but mdb crashes:
mdb -k ./unix.1 ./vmcore.1
mdb: failed to read panicbuf and panic_reg -- current register set will be 
unavailable
Loading modules: [ unix genunix specfs dtrace cpu.generic uppc pcplusmp 
scsi_vhci zfs sd ip hook neti sctp arp usba uhci fctl fcip fcp md cpc random 
crypto smbsrv nfs lofs logindmux ptm ufs sppp nsmb ipc ]
> ::findlocks
mdb: type graph not yet built; run ::typegraph.
> ::typegraph
typegraph:   pass = initial
mdb: failed to read slab at 0x4350b001065fff0: no mapping for address

*** mdb: received signal ABRT at:
[1] libc.so.1`_lwp_kill+0xa()
[2] libc.so.1`raise+0x19()
[3] libumem.so.1`umem_do_abort+0x1c()
[4] libumem.so.1`umem_err_recoverable+0xb8()
[5] libumem.so.1`process_free+0x17e()
[6] libumem.so.1`free+0x16()
[7] mdb`mdb_free+0x3b()
[8] genunix.so`avl_walk_fini+0x26()
[9] genunix.so`combined_walk_fini+0x40()
[10] mdb`mdb_wcb_destroy+0x3d()
[11] mdb`walk_common+0xb8()
[12] mdb`mdb_pwalk+0x3f()
[13] genunix.so`kmem_estimate_allocated+0x37()
[14] genunix.so`typegraph_estimate+0x2e()
[15] genunix.so`list_walk_step+0xa2()
[16] mdb`walk_step+0x5e()
[17] mdb`walk_common+0x7d()
[18] mdb`mdb_pwalk+0x3f()
[19] mdb`mdb_walk+0xc()
[20] genunix.so`typegraph+0xd1()
[21] mdb`dcmd_invoke+0x64()
[22] mdb`mdb_call_idcmd+0xff()
[23] mdb`mdb_call+0x390()
[24] mdb`yyparse+0x4e5()
[25] mdb`mdb_run+0x2cd()
[26] mdb`main+0x1246()
[27] mdb`_start+0x6c()

mdb: (c)ore dump, (q)uit, (r)ecover, or (s)top for debugger [cqrs]?
 --- I hit [r]
mdb: unloading module 'genunix' ...
Segmentation Fault (core dumped)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs destroy hanging

2009-07-21 Thread Moshe Vainer
And pstack won't give a stack for the bootadm process:

devu...@zfs05:/var/crash/zfs05# pstack 23870
23870:  /sbin/bootadm -a update_all
devu...@zfs05:/var/crash/zfs05# pstack -F 23870
23870:  /sbin/bootadm -a update_all
devu...@zfs05:/var/crash/zfs05# kill -9 23870
devu...@zfs05:/var/crash/zfs05# kill -9 23870
devu...@zfs05:/var/crash/zfs05# kill -9 23870
devu...@zfs05:/var/crash/zfs05# pstack -F 23870
pstack: cannot examine 23870: unanticipated system error
devu...@zfs05:/var/crash/zfs05# pstack -F 23870
pstack: cannot examine 23870: unanticipated system error
devu...@zfs05:/var/crash/zfs05# ps -ef | grep bootadm
root 24214 23890   0 11:11:00 pts/13  0:00 grep bootadm
root 23870 23847   0 10:35:03 pts/8   0:00 /sbin/bootadm -a update_all
devu...@zfs05:/var/crash/zfs05# kill -9 23870
devu...@zfs05:/var/crash/zfs05# ps -ef | grep bootadm
root 24220 23890   0 11:11:13 pts/13  0:00 grep bootadm
root 23870 23847   0 10:35:03 pts/8   0:00 /sbin/bootadm -a update_all
devu...@zfs05:/var/crash/zfs05# pstack -F 23870
pstack: cannot examine 23870: unanticipated system error


Re: [zfs-discuss] zfs destroy hanging

2009-07-20 Thread Moshe Vainer
We have just hit a hang like this.
Here's the output of ps -ef | grep zfs:

root   425     7   0   Jun 17 console 0:00 /usr/lib/saf/ttymon -g -d /dev/console -l console -m ldterm,ttcompat -h -p zfs0
root 22879 22876   0 18:18:37 ?       0:01 /usr/sbin/zfs rollback -r tank/aa
root 22884     1   0 18:19:02 ?       0:00 /usr/sbin/zfs clone tank/templates/x tank/yy
root 23188 23006   0 18:42:50 pts/8   0:00 grep z
root 22883 22826   0 18:19:01 ?       0:00 /usr/sbin/zfs destroy -rf tank/aa
root 22880 22855   0 18:18:37 ?       0:01 /usr/sbin/zfs rollback -r tank/aa
root 22930     1   0 18:24:33 ?       0:00 /usr/sbin/zfs clone tank/bb tank/ccc
root 22961 23010   0 18:25:42 pts/2   0:00 zfs list
root 22995 22945   0 18:27:11 pts/4   0:00 zfs list


I have created a crash dump, though it is rather large (around 0.5 GB compressed).
Should I upload it here (I would rather not), or is there someone I should send 
it to directly?

The symptoms are otherwise the same: destroy hangs, the process is not killable 
even by kill -9, and all other zfs commands hang waiting for it to complete, 
even zfs list.


Re: [zfs-discuss] zfs destroy hanging

2009-07-20 Thread Moshe Vainer
Forgot to mention:
1. This system was installed as 2008.11, so it should have no upgrade issues.
2. I'm not sure how to run mdb -k on the dump; the only thing it produced is the 
following:
> ::status
debugging live kernel (64-bit) on zfs05
operating system: 5.11 snv_101b (i86pc)
> $C



Re: [zfs-discuss] zfs destroy hanging

2009-07-20 Thread Moshe Vainer
OK, sorry for spamming. Got some more info from mdb -k:
devu...@zfs05:/var/crash/zfs05#  mdb -k unix.0 vmcore.0
mdb: failed to read panicbuf and panic_reg -- current register set will be 
unavailable
Loading modules: [ unix genunix specfs dtrace cpu.generic uppc pcplusmp 
scsi_vhci zfs sd ip hook neti sctp arp usba uhci fctl fcip fcp md cpc random 
crypto smbsrv nfs lofs logindmux ptm ufs sppp nsmb ipc ]
> ::status
debugging crash dump vmcore.0 (64-bit) from zfs05
operating system: 5.11 snv_101b (i86pc)
panic message:
dump content: kernel pages only
> $C
> ::stack
> ::pgrep zfs
S    PID   PPID   PGID   SID  UID      FLAGS             ADDR NAME
R  22884  1  22511  22511  0 0x4a004900 ff0317295350 zfs
R  22930  1  22930  22918  0 0x4a004000 ff0306ca68f8 zfs
R  22879  22876  22825  22825  0 0x4a004000 ff0306bf4918 zfs
R  22880  22855  22825  22825  0 0x4a004900 ff031754f068 zfs
R  22883  22826  22825  22825  0 0x4a004000 ff0306ca5038 zfs
R  22995  22945  22995  22939  0 0x4a004000 ff0306caa6d8 zfs
R  22961  23010  22961  22974  0 0x4a004000 ff031728f050 zfs


Re: [zfs-discuss] zfs destroy hanging

2009-02-15 Thread David Dyer-Bennet
Thanks, I've filed your message where I can easily get at it even if I'm
having trouble with the server.  I'm afraid I'd rather I didn't get the
chance to use it, but if something weird does go on, I'm happy to have the
procedure to capture information that might help get it identified and
fixed.

On Sat, February 14, 2009 16:30, James C. McPherson wrote:

 Hi David,
 if this happens to you again, you could help get more
 data on the problem by getting a crash dump, either forced,
 via reboot, or (if you have a dedicated dump device) via
 savecore:

 (dedicated dump dev)
 # savecore -L /var/crash/`uname -n`

[rest snipped to save bandwidth]

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] zfs destroy hanging

2009-02-14 Thread Blake
I think you can kill the destroy command process using traditional methods.

Perhaps your slowness issue is because the pool is in an older format.
I've not had these problems since upgrading to the ZFS version that
ships by default with 2008.11.


On Fri, Feb 13, 2009 at 4:14 PM, David Dyer-Bennet d...@dd-b.net wrote:
 This shouldn't be taking anywhere *near* half an hour.  The snapshots
 differ trivially, by one or two files and less than 10k of data (they're
 test results from working on my backup script).  But so far, it's still
 sitting there after more than half an hour.

 local...@fsfs:~/src/bup2# zfs destroy ruin/export
 cannot destroy 'ruin/export': filesystem has children
 use '-r' to destroy the following datasets:
 ruin/export/h...@bup-20090210-202557utc
 ruin/export/h...@20090210-213902utc
 ruin/export/home/local...@first
 ruin/export/home/local...@second
 ruin/export/home/local...@bup-20090210-202557utc
 ruin/export/home/local...@20090210-213902utc
 ruin/export/home/localddb
 ruin/export/home
 local...@fsfs:~/src/bup2# zfs destroy -r ruin/export

 It's still hung.

 Ah, here's zfs list output from shortly before I started the destroy:

 ruin                       474G   440G   431G  /backups/ruin
 ruin/export               35.0M   440G    18K  /backups/ruin/export
 ruin/export/home          35.0M   440G    19K  /export/home
 ruin/export/home/localddb   35M   440G  27.8M  /export/home/localddb

 As you can see, the ruin/export/home filesystem (and subs) is NOT large.

 iostat shows no activity on pool ruin over a minute.

 local...@fsfs:~$ pfexec zpool iostat ruin 10
                capacity     operations    bandwidth
 pool         used  avail   read  write   read  write
 ----------  -----  -----  -----  -----  -----  -----
 ruin         474G   454G     10      0  1.13M    840
 ruin         474G   454G      0      0      0      0
 ruin         474G   454G      0      0      0      0
 ruin         474G   454G      0      0      0      0
 ruin         474G   454G      0      0      0      0
 ruin         474G   454G      0      0      0      0
 ruin         474G   454G      0      0      0      0
 ruin         474G   454G      0      0      0      0
 ruin         474G   454G      0      0      0      0

 The pool still thinks it is healthy.

 local...@fsfs:~$ zpool status -v ruin
  pool: ruin
  state: ONLINE
 status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
 action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
  scrub: scrub completed after 4h42m with 0 errors on Mon Feb  9 19:10:49 2009
 config:

NAMESTATE READ WRITE CKSUM
ruinONLINE   0 0 0
  c7t0d0ONLINE   0 0 0

 errors: No known data errors

 There is still a process out there trying to run that destroy.  It doesn't
 appear to be using much cpu time.

 local...@fsfs:~$ ps -ef | grep zfs
 localddb  7291  7228   0 15:10:56 pts/4   0:00 grep zfs
root  7223  7101   0 14:18:27 pts/3   0:00 zfs destroy -r ruin/export

 Running 2008.11.

 local...@fsfs:~$ uname -a
 SunOS fsfs 5.11 snv_101b i86pc i386 i86pc Solaris

 Any suggestions?  Eventually I'll kill the process by the gentlest way
 that works, I suppose (if it doesn't complete).
 --
 David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
 Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
 Photos: http://dd-b.net/photography/gallery/
 Dragaera: http://dragaera.info



Re: [zfs-discuss] zfs destroy hanging

2009-02-14 Thread David Dyer-Bennet

On Sat, February 14, 2009 13:04, Blake wrote:
 I think you can kill the destroy command process using traditional
 methods.

kill and kill -9 failed.  In fact, rebooting failed; I had to use a hard
reset (it shut down most of the way, but then got stuck).

 Perhaps your slowness issue is because the pool is an older format.
 I've not had these problems since upgrading to the zfs version that
 comes default with 2008.11

We can hope.  In case that's the cause, I upgraded the pool format (after
considering whether I'd be needing to access it with older software; hope
I was right :-)).

The pool did import and scrub cleanly, anyway.  That's hopeful.  Also this
particular pool is a scratch pool at the moment, so I'm not risking losing
data, only risking losing confidence in ZFS.  It's also a USB external
disk.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] zfs destroy hanging

2009-02-14 Thread James C. McPherson
On Sat, 14 Feb 2009 15:40:04 -0600 (CST)
David Dyer-Bennet d...@dd-b.net wrote:

 
 On Sat, February 14, 2009 13:04, Blake wrote:
  I think you can kill the destroy command process using traditional
  methods.
 
 kill and kill -9 failed.  In fact, rebooting failed; I had to use a
 hard reset (it shut down most of the way, but then got stuck).
 
  Perhaps your slowness issue is because the pool is an older format.
  I've not had these problems since upgrading to the zfs version that
  comes default with 2008.11
 
 We can hope.  In case that's the cause, I upgraded the pool format
 (after considering whether I'd be needing to access it with older
 software; hope I was right :-)).
 
 The pool did import and scrub cleanly, anyway.  That's hopeful.  Also
 this particular pool is a scratch pool at the moment, so I'm not
 risking losing data, only risking losing confidence in ZFS.  It's
 also a USB external disk.

Hi David,
if this happens to you again, you could help get more
data on the problem by getting a crash dump, either forced,
via reboot, or (if you have a dedicated dump device) via
savecore:

(dedicated dump dev)
# savecore -L /var/crash/`uname -n`

or

# reboot -dq


(forced, 64bit mode)
# echo '0>rip' | mdb -kw

(forced, 32bit mode)
# echo '0>eip' | mdb -kw


Try the command line options first, only use the mdb
kick in the guts if the other two fail.

Once you've got the core, you could post the output of

::status
$C

when run over the core with mdb -k.
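Those two dcmds can also be fed to mdb on stdin so the whole thing is captured in one go for pasting into a reply. A sketch; the helper name and the `.0` dump suffix are assumptions (savecore numbers its files unix.N / vmcore.N under the crash directory):

```shell
# mdb_core: run ::status and $C over a saved crash dump, non-interactively.
# Assumes the dump lives in /var/crash/<hostname> as unix.N / vmcore.N.
mdb_core() {
    n=${1:-0}                               # which dump pair, default 0
    dumpdir="/var/crash/$(uname -n)"        # savecore's usual target dir
    # $C is single-quoted so the shell does not try to expand it.
    printf '::status\n$C\n' | mdb -k "$dumpdir/unix.$n" "$dumpdir/vmcore.$n"
}

# Usage: mdb_core 0
```

Redirecting the output to a file (`mdb_core 0 > mdb-out.txt`) keeps the transcript intact even if mdb itself later misbehaves, as it did upthread with ::typegraph.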



James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog


[zfs-discuss] zfs destroy hanging

2009-02-13 Thread David Dyer-Bennet
This shouldn't be taking anywhere *near* half an hour.  The snapshots
differ trivially, by one or two files and less than 10k of data (they're
test results from working on my backup script).  But so far, it's still
sitting there after more than half an hour.

local...@fsfs:~/src/bup2# zfs destroy ruin/export
cannot destroy 'ruin/export': filesystem has children
use '-r' to destroy the following datasets:
ruin/export/h...@bup-20090210-202557utc
ruin/export/h...@20090210-213902utc
ruin/export/home/local...@first
ruin/export/home/local...@second
ruin/export/home/local...@bup-20090210-202557utc
ruin/export/home/local...@20090210-213902utc
ruin/export/home/localddb
ruin/export/home
local...@fsfs:~/src/bup2# zfs destroy -r ruin/export

It's still hung.

Ah, here's zfs list output from shortly before I started the destroy:

ruin                       474G   440G   431G  /backups/ruin
ruin/export               35.0M   440G    18K  /backups/ruin/export
ruin/export/home          35.0M   440G    19K  /export/home
ruin/export/home/localddb   35M   440G  27.8M  /export/home/localddb

As you can see, the ruin/export/home filesystem (and subs) is NOT large.

iostat shows no activity on pool ruin over a minute.

local...@fsfs:~$ pfexec zpool iostat ruin 10
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
ruin         474G   454G     10      0  1.13M    840
ruin         474G   454G      0      0      0      0
ruin         474G   454G      0      0      0      0
ruin         474G   454G      0      0      0      0
ruin         474G   454G      0      0      0      0
ruin         474G   454G      0      0      0      0
ruin         474G   454G      0      0      0      0
ruin         474G   454G      0      0      0      0
ruin         474G   454G      0      0      0      0

The pool still thinks it is healthy.

local...@fsfs:~$ zpool status -v ruin
  pool: ruin
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scrub: scrub completed after 4h42m with 0 errors on Mon Feb  9 19:10:49 2009
config:

NAMESTATE READ WRITE CKSUM
ruinONLINE   0 0 0
  c7t0d0ONLINE   0 0 0

errors: No known data errors

There is still a process out there trying to run that destroy.  It doesn't
appear to be using much cpu time.

local...@fsfs:~$ ps -ef | grep zfs
localddb  7291  7228   0 15:10:56 pts/4   0:00 grep zfs
root  7223  7101   0 14:18:27 pts/3   0:00 zfs destroy -r ruin/export

Running 2008.11.

local...@fsfs:~$ uname -a
SunOS fsfs 5.11 snv_101b i86pc i386 i86pc Solaris

Any suggestions?  Eventually I'll kill the process by the gentlest way
that works, I suppose (if it doesn't complete).
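On "the gentlest way that works": the usual escalation is SIGTERM first, then SIGKILL only if the process is still around (and if even SIGKILL does nothing, as other posts in this thread report, only a reboot clears it). A sketch, with the helper name and the grace period my own choices:

```shell
# gentle_kill: SIGTERM, short grace period, then SIGKILL as a last resort.
gentle_kill() {
    pid=$1
    kill -TERM "$pid" 2>/dev/null || return 0  # already gone
    sleep 1                                    # grace period; tune to taste
    kill -0 "$pid" 2>/dev/null || return 0     # it exited on TERM
    kill -KILL "$pid" 2>/dev/null              # last resort
}

# Usage: gentle_kill 7223   # the hung 'zfs destroy' PID above
```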
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
