> > Hello Matthew,
> > Tuesday, September 12, 2006, 7:57:45 PM, you wrote:
> > MA> Ben Miller wrote:
> > >> I had a strange ZFS problem this morning.  The entire system would
> > >> hang when mounting the ZFS filesystems.  After trial and error I
> > >> determined that the problem was with one of the 2500 ZFS filesystems.
> > >> When mounting that user's home the system would hang and need to be
> > >> rebooted.  After I removed the snapshots (9 of them) for that
> > >> filesystem everything was fine.
> > >>
> > >> I don't know how to reproduce this and didn't get a crash dump.  I
> > >> don't remember seeing anything about this before, so I wanted to
> > >> report it and see if anyone has any ideas.
> >
> > MA> Hmm, that sounds pretty bizarre, since I don't think that mounting a
> > MA> filesystem really interacts with snapshots at all.
> > MA> Unfortunately, I don't think we'll be able to diagnose this without a
> > MA> crash dump or reproducibility.  If it happens again, force a crash dump
> > MA> while the system is hung and we can take a look at it.
> >
> > Maybe it wasn't hung after all.  I've seen similar behavior here
> > sometimes.  Were the disks used in the pool actually working?
> >
> 
> There was lots of activity on the disks (iostat and status LEDs) until
> it got to this one filesystem, and then everything stopped.  'zpool
> iostat 5' stopped running, the shell wouldn't respond, and activity on
> the disks stopped.  This fs is relatively small (175M used of a 512M
> quota).
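
For anyone trying to tell a real hang from a slow mount, this is roughly
what I was watching (a sketch; run each in its own terminal):

  # zpool iostat 5   # per-pool I/O every 5 seconds; output froze at the hang
  # iostat -xn 5     # per-device I/O; this went quiet at the same time

If the reads keep going it is probably just a slow mount like the one
described in the quote below; here both went completely quiet and the
shell stopped responding.
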
> > Sometimes it takes a lot of time (30-50 minutes) to mount a file
> > system - it's rare, but it happens.  And during this, ZFS reads from
> > the disks in the pool.  I did report it here some time ago.
> >
> In my case the system crashed during the evening and was still hung
> when I came in the next morning, so it was hung for a good 9-10 hours.
> 
The problem happened again last night, but for a different user's filesystem.
I took a crash dump while it was hung, and the back trace looks like this:
> ::status
debugging crash dump vmcore.0 (64-bit) from hostname
operating system: 5.11 snv_40 (sun4u)
panic message: sync initiated
dump content: kernel pages only
> ::stack
0xf0046a3c(f005a4d8, 2a100047818, 181d010, 18378a8, 1849000, f005a4d8)
prom_enter_mon+0x24(2, 183c000, 18b7000, 2a100046c61, 1812158, 181b4c8)
debug_enter+0x110(0, a, a, 180fc00, 0, 183e000)
abort_seq_softintr+0x8c(180fc00, 18abc00, 180c000, 2a100047d98, 1, 1859800)
intr_thread+0x170(600019de0e0, 0, 6000d7bfc98, 600019de110, 600019de110, 
600019de110)
zfs_delete_thread_target+8(600019de080, ffffffffffffffff, 0, 600019de080, 
6000d791ae8, 60001aed428)
zfs_delete_thread+0x164(600019de080, 6000d7bfc88, 1, 2a100c4faca, 2a100c4fac8, 
600019de0e0)
thread_start+4(600019de080, 0, 0, 0, 0, 0)
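
For anyone else needing to do this: per Matthew's suggestion, I forced the
dump from the console by sending a break to get to the OBP "ok" prompt and
typing "sync", which panics the machine so savecore can write the dump out -
that is where the "sync initiated" panic message above comes from.  To read
it back (paths assume savecore's default /var/crash/<hostname> directory):

  # cd /var/crash/hostname
  # mdb -k unix.0 vmcore.0
  > ::status
  > ::stack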

In single user I set the mountpoint for that user's filesystem to none and then 
brought the system up fine.  Then I destroyed the snapshots for that user and 
their filesystem mounted fine.  In this case the quota was reached with the 
snapshots, and 52% was used without them.
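
For the archives, the recovery steps were roughly these (dataset names are
placeholders for the real ones):

  # zfs set mountpoint=none pool/home/user   # in single user, so boot completes
  # zfs list -t snapshot                     # find that user's snapshots
  # zfs destroy pool/home/user@snapname      # repeat for each of the snapshots
  # zfs set mountpoint=/export/home/user pool/home/user   # remounts it as well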

Ben