Andrew Robert Nicols wrote:
On Wed, Jul 08, 2009 at 08:43:17AM +1200, Ian Collins wrote:
Ian Collins wrote:
Brent Jones wrote:
On Fri, Jul 3, 2009 at 8:31 PM, Ian Collins<i...@ianshome.com> wrote:
Ian Collins wrote:
I was doing an incremental send between pools, the receive side is
locked up and no zfs/zpool commands work on that pool.

The stacks look different from those reported in the earlier "ZFS
snapshot send/recv "hangs" X4540 servers" thread.

Here is the process information from scat (other commands hanging on
the pool are also in cv_wait):

Has anyone else seen anything like this?  The box wouldn't even
reboot, it had to be power cycled.  It locks up on receive regularly
now.
I hit this too:
6826836

Fixed in 117

http://opensolaris.org/jive/thread.jspa?threadID=104852&tstart=120
I don't think this is the same problem (which is why I started a new thread); a single incremental send will eventually lock the pool up, pretty much guaranteed each time.

One more data point: this didn't happen when I had a single pool (a stripe of mirrors) on the server. It started happening when I split the mirrors and created a second pool built from three 8-drive raidz2 vdevs. Sending to the new pool (either locally or from another machine) causes the hangs.

And here are my data points:

We were running two X4500s under Nevada 112 and hit this issue on
both of them. When receiving a large amount of data through a zfs receive,
they would lock up. Any zpool or zfs commands would hang and were unkillable.
The only way to resolve the situation was to reboot without syncing disks.
I reported this in some posts back in April
(http://opensolaris.org/jive/click.jspa?searchID=2021762&messageID=368524)

One of them had an old enough zpool and zfs version to down/up/sidegrade to
Solaris 10 u6, so I made that change.
The thumper running Solaris 10 is now mostly fine - it normally receives an
hourly snapshot with no problem.

The thumper running 112 has continued to experience the issues described by
Ian and others. I've just upgraded to 117 and am now having even more issues:
I'm unable to receive or roll back snapshots; instead I see:

506 r...@thumper1:~> cat snap | zfs receive -vF thumperpool
receiving incremental stream of vlepool/m...@200906182000 into 
thumperp...@200906182000
cannot receive incremental stream: most recent snapshot of thumperpool does not
match incremental source

511 r...@thumper1:~> zfs rollback -r thumperpool/m...@200906181800
cannot destroy 'thumperpool/m...@200906181900': dataset already exists

Thanks for the additional data Andrew.

Can you do a "zfs destroy" of thumperpool/m...@200906181900?
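
For reference, the usual recovery sequence in this situation looks something
like the following. This is only a sketch: the snapshot names are abbreviated
in the output above, so "thumperpool/mail" below is a hypothetical placeholder
for the real dataset name, and "zfs destroy" will itself refuse (EBUSY) if a
clone depends on the snapshot.

```shell
# List snapshots under the pool to see which ones conflict
# ("thumperpool/mail" is a hypothetical stand-in for the real dataset)
zfs list -t snapshot -r thumperpool

# Destroy the conflicting intermediate snapshot (Ian's suggestion);
# this fails if a clone or hold depends on it
zfs destroy thumperpool/mail@200906181900

# Then retry the recursive rollback and the receive
zfs rollback -r thumperpool/mail@200906181800
zfs receive -vF thumperpool < snap
```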

--
Ian.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
