I remember asking about this a long time ago, and everybody seemed to think
it was a non-issue: the vague, unclearly reported rumor that ZFS behaves
poorly when it's 100% full.  Well, now I have one really solid data point to
confirm it, and possibly a way to reproduce it, avoid it, and recover from it.

 

I'm looking to see if anyone else has similar or related issues.  In
particular, if you have a test machine you could use to try reproducing the
problem, that would be very valuable.

 

I called support, and asked them to file a bug report, which they said they
did.  Still, more information is better than less.  Hence this email.

 

Core problem:  You can't stop a scrub when the disk is full, and "zfs
destroy" of an old snapshot takes vastly longer while a scrub is running.
It's a catch-22: you can't stop the scrub because the disk is full, and you
can't free disk space because the scrub is running.  We had to yank the
power and enter failsafe mode for service.

 

The system was in this bad state for approximately 3 hours before we power
cycled.  The scrub was running, "zfs destroy" was also running, and neither
one was killable.  Disks thrashing.  We tried to be patient and let it run
for hours with no sign of change.  Once it was in "failsafe" mode, and the
scrub was stopped, we were able to destroy snapshots and free space in less
than a minute.

 

I don't have the precise error message anymore because my terminal
disappeared when the server went down.  But I know it contained the terms
"cannot scrub" and "out of space" and it was something like this:

      (Attempt to stop the scrub)

      zpool scrub -s tank

      cannot scrub tank: out of space

This makes no sense of course, because the command was to STOP scrubbing.
So the response "cannot scrub" is nonsensical.  Meanwhile, the pool
continues scrubbing.

 

Steps to reproduce the problem (I think; a command sketch follows the list):

1.  Have a zpool.

2.  Have daily snapshots going back for several days.  Each day, more space
is consumed and some things are deleted, etc., so space is being used by
both the present filesystem and all the previous snapshots.

3.  When the disk is almost full, start a scrub.

4.  Some process now creates data until the disk is full, and it's a little
unclear what happens after that.  Maybe the process continues trying and
failing to write.  Maybe the process dies.  Other processes may be
attempting the same or similar writes.  The system becomes unusable for an
unknown period of time before users call IT, and IT logs in to check the
status of the system.

5.  While the disk is full and the scrub is running, try to destroy the
oldest snapshot.  It just sits there, but it's sometimes normal for this to
take a while, so just be patient.  While you're waiting, check to see if
anything else is happening, and discover that a scrub is in progress.

6.  Try to stop the scrub.  The command returns fine, but the scrub
continues.  Try stopping the scrub a few more times.  Check the man page to
ensure you're not being an idiot.  Keep trying to stop the scrub.  It's not
working, but there is no error message.  At some point, after repeated
attempts, I started getting the aforementioned "cannot scrub" error message.

7.  Try killing the "zfs destroy."  It won't die.  Try kill -9 (SIGKILL)
and it still won't die.  Well, you know it can't die until it undoes all the
work it's been doing for the last hour or two, so it's not surprising that
it's not dying.

8.  You can't stop the scrub because the disk is full.  You can't free disk
space because the scrub is running.  Eventually you give up hope and do the
power-cycle.

9.  In failsafe mode, the scrub is not running.  Maybe that's because we
already gave the "zpool scrub -s" command, or maybe it's just normal
behavior for a scrub to be cancelled after a reboot or something - I don't
know.  But I do know the scrub was stopped while entering failsafe mode.  We
were able to destroy the old snaps in a few seconds.  Once disk space was
free, we rebooted, and everything was fine again.
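
In command form, the rough sequence looks something like this (the pool,
filesystem, snapshot, and filler-file names below are made up for
illustration; I haven't re-run this):

      # Steps 1-2: an existing pool with several daily snapshots.
      # Step 3: start a scrub while the pool is nearly full.
      zpool scrub tank

      # Step 4: something fills the remaining space (filler path is made up).
      dd if=/dev/zero of=/tank/filler bs=1M

      # Step 5 (another terminal): try to destroy the oldest snapshot.
      # It just sits there.
      zfs destroy tank/fs@2010-05-01

      # Step 6 (another terminal): try to stop the scrub.  It returns fine
      # but the scrub continues; eventually it starts reporting
      # "cannot scrub tank: out of space".
      zpool scrub -s tank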

 

What to do if the problem is encountered:

If you have more space to add, you should be able to add space, and then
stop the scrub.  But once you add space, you can never remove it.  So if you
do this, be prepared for it to become permanent.
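
I would guess something along these lines, with a made-up device name:

      # Add a spare device to the pool (device name is hypothetical), then
      # try again to stop the scrub.
      zpool add tank c2t3d0
      zpool scrub -s tank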

 

Maybe if you wait long enough, the scrub and/or zfs destroy might eventually
finish.  Make your own choice.  We decided to power cycle after it seemed
like it was making no progress for an unreasonably long time.  

 

Upon reboot, enter failsafe mode and import the pool.  Ensure it is not
scrubbing.  Destroy the old snapshots (this completed in a few seconds for
us), and reboot as normal again.
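
Roughly, from the failsafe shell (the pool and snapshot names here are just
examples, not a transcript):

      # From failsafe mode (pool and snapshot names are examples):
      zpool import -f tank
      zpool status tank                  # confirm no scrub is running
      zfs destroy tank/fs@2010-05-01     # completed in seconds for us
      # ...then reboot as normal.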

 

How to avoid the problem:

Option #1:

I don't know if this works.  I just know it was rumored some time ago, and
it seems possible.

 

Before there is a problem, create a new zfs filesystem, with a space
reservation.

zfs create -o reservation=1G tank/reservation

 

Then, when or if there is a problem someday, hopefully you can "zfs destroy
tank/reservation" or "zfs set reservation=none tank/reservation" or
something like that, to free up a little space and stop the scrub.
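
Untested, but I imagine the emergency release would look something like this:

      # Set up ahead of time:
      zfs create -o reservation=1G tank/reservation

      # Later, in an emergency, either of these should give the space back:
      zfs set reservation=none tank/reservation
      zfs destroy tank/reservation

      # ...then try again to stop the scrub:
      zpool scrub -s tank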

 

Option #2:

Poll for disk usage while scrubbing, and stop scrubbing if disk usage gets
above a threshold. 
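
A minimal sketch of such a polling script, assuming the pool is named tank
and picking 90% full as an arbitrary threshold:

      #!/bin/sh
      # Stop the scrub on $POOL if pool capacity reaches $LIMIT percent.
      POOL=tank
      LIMIT=90
      while zpool status $POOL | grep "scrub in progress" > /dev/null; do
            CAP=`zpool list -H -o capacity $POOL | tr -d '%'`
            if [ "$CAP" -ge "$LIMIT" ]; then
                  zpool scrub -s $POOL
                  break
            fi
            sleep 60
      done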

 

We are running fully patched Solaris 10 on an x86 SunFire X4275.

 [r...@nas ~]# uname -a

SunOS nas 5.10 Generic_142901-07 i86pc i386 i86pc

 [r...@nas ~]# cat /etc/release

                       Solaris 10 10/08 s10x_u6wos_07b X86

           Copyright 2008 Sun Microsystems, Inc.  All Rights Reserved.

                        Use is subject to license terms.

                            Assembled 27 October 2008

 

We are running the system default version of zpool.

 [r...@nas ~]# zpool upgrade

This system is currently running ZFS pool version 15.

 

The following pools are out of date, and can be upgraded.  After being
upgraded, these pools will no longer be accessible by older software
versions.

VER  POOL
---  ------------
10   tank
10   rpool

 

 

We are running the system default version of zfs.

 [r...@nas ~]# zfs upgrade

This system is currently running ZFS filesystem version 4.

 

The following filesystems are out of date, and can be upgraded.  After being
upgraded, these filesystems (and any 'zfs send' streams generated from
subsequent snapshots) will no longer be accessible by older software
versions.

VER  FILESYSTEM
---  ------------
 3   tank
 3   rpool
 3   rpool/ROOT
 3   rpool/ROOT/nas_slash

 

 
