I remember asking about this a long time ago, and everybody seemed to think it was a non-issue: the vague and unclearly reported rumor that ZFS behaves poorly when it's 100% full. Well, now I have one really solid data point to confirm it, and possibly a way to reproduce it, avoid it, and prevent it.
I'm looking to see if anyone else has similar or related issues. Of particular value: if you have a test machine available to attempt reproducing the problem, that could be very valuable. I called support and asked them to file a bug report, which they said they did. Still, more information is better than less; hence this email.

Core problem: you can't stop a scrub when the disk is full, and a 'zfs destroy' of an old snapshot takes infinitely longer while a scrub is running. It's a catch-22: you can't stop the scrub because the disk is full, and you can't free disk space because the scrub is running. We had to yank the power and enter failsafe mode for service.

The system was in this bad state for approximately 3 hours before we power cycled: scrub running, "zfs destroy" also running, neither one killable, disks thrashing. We tried to be patient and let it run for hours with no sign of change. Once the machine was in failsafe mode and the scrub was stopped, we were able to destroy snapshots and free space in less than a minute.

I don't have the precise error message anymore, because my terminal disappeared when the server went down, but I know it contained the terms "cannot scrub" and "out of space", and it was something like this (attempting to STOP the scrub):

    # zpool scrub -s tank
    cannot scrub tank: out of space

This makes no sense, of course, because the command was to stop scrubbing, so the response "cannot scrub" is nonsensical. Meanwhile, the pool continues scrubbing.

Steps to reproduce the problem (I think) - a rough shell sketch of these steps follows the list:

1. Have a zpool.

2. Have daily snapshots going back several days. Each day, more space is consumed and some things are deleted, so space is being used by both the present filesystem and all the previous snapshots.

3. When the disk is almost full, start a scrub.

4. Some process now creates data until the disk is full, and it's a little unclear what happens after that. Maybe the process keeps trying and failing to write; maybe it dies. Other processes may be attempting the same or similar writes. The system becomes unusable for an unknown period of time before users call IT, and IT logs in to see the status of the system.

5. While the disk is full and the scrub is running, try to destroy the oldest snapshot. It just sits there - but it's normal for this to take some time, sometimes, so just be patient. While you're waiting, check whether anything else is happening, and discover that a scrub is in progress.

6. Try to stop the scrub. The command returns fine, but the scrub continues. Try stopping the scrub a few more times. Check the man page to make sure you're not being an idiot. Keep trying to stop the scrub. It's not working, but there is no error message. At some point, after repetition, I started getting the aforementioned "cannot scrub ... out of space" error.

7. Try killing the "zfs destroy". It won't die. Try kill -9 / kill -KILL, and it still won't die. Then again, you know it can't die until it undoes all the work it's been doing for the last hour or two, so it's not surprising that it isn't dying.

8. You can't stop the scrub because the disk is full. You can't free disk space because the scrub is running. Eventually you give up hope and do the power cycle.

9. In failsafe mode, the scrub is not running. Maybe that's because we had already given the "zpool scrub -s" command, or maybe it's just normal behavior for a scrub to be cancelled after a reboot - I don't know. But I do know the scrub was stopped once we entered failsafe mode, and we were able to destroy the old snaps in a few seconds.
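If anyone does have a scratch box to try this on, here is roughly how I would attempt it, using a small file-backed pool so no real data is at risk. This is only a sketch of the steps above, not something I have actually re-run yet; the pool name "testfull", the backing file path, and all of the sizes are made-up examples.

    # Throwaway pool backed by a file (name, path, and size are arbitrary).
    mkfile 200m /var/tmp/testfull.img
    zpool create testfull /var/tmp/testfull.img

    # Simulate several days of churn: write, snapshot, repeat, then delete
    # some files so space is held by both the live filesystem and snapshots.
    for day in 1 2 3 4 5; do
        dd if=/dev/urandom of=/testfull/day$day bs=1024k count=20
        zfs snapshot testfull@day$day
    done
    rm /testfull/day1 /testfull/day2      # space still referenced by snapshots

    # With the pool getting full, start a scrub, then fill it the rest of
    # the way (this dd runs until it hits "out of space").
    zpool scrub testfull
    dd if=/dev/urandom of=/testfull/filler bs=1024k

    # Now try to free space and stop the scrub, per steps 5 and 6 above.
    zfs destroy testfull@day1     # this is what hung on our production pool
    zpool scrub -s testfull       # and this is what refused to take effect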
Back on our production system: now that disk space is free, we reboot, and everything is fine again.

What to do if the problem is encountered:

If you have more space to add, you should be able to add it and then stop the scrub. But once you add a device to a pool you can never remove it, so if you do this, be prepared for it to become permanent. Maybe if you wait long enough, the scrub and/or the "zfs destroy" might eventually finish; make your own choice. We decided to power cycle after it seemed to be making no progress for an unreasonably long time. Upon reboot, enter failsafe mode and import the pool. Ensure it is not scrubbing. Destroy old snapshots (this completed in a few seconds for us), and then reboot as normal.

How to avoid the problem:

Option #1: I don't know whether this works; I just know it was rumored some time ago, and it seems plausible. Before there is a problem, create a new zfs filesystem with a space reservation:

    zfs create -o reservation=1G tank/reservation

Then, when or if there is a problem someday, hopefully you can "zfs destroy tank/reservation" or "zfs set reservation=none tank/reservation" or something like that, to free up a little space and stop the scrub.

Option #2: Poll for disk usage while scrubbing, and stop the scrub if usage climbs above a threshold. (A rough sketch of such a polling script is at the end of this message.)

We are running fully patched Solaris 10 on x86 SunFire x4275.

    [r...@nas ~]# uname -a
    SunOS nas 5.10 Generic_142901-07 i86pc i386 i86pc

    [r...@nas ~]# cat /etc/release
    Solaris 10 10/08 s10x_u6wos_07b X86
    Copyright 2008 Sun Microsystems, Inc.  All Rights Reserved.
    Use is subject to license terms.
    Assembled 27 October 2008

We are running the system default version of zpool.

    [r...@nas ~]# zpool upgrade
    This system is currently running ZFS pool version 15.

    The following pools are out of date, and can be upgraded.  After being
    upgraded, these pools will no longer be accessible by older software versions.

    VER  POOL
    ---  ------------
    10   tank
    10   rpool

We are running the system default version of zfs.

    [r...@nas ~]# zfs upgrade
    This system is currently running ZFS filesystem version 4.

    The following filesystems are out of date, and can be upgraded.  After being
    upgraded, these filesystems (and any 'zfs send' streams generated from
    subsequent snapshots) will no longer be accessible by older software versions.

    VER  FILESYSTEM
    ---  ------------
    3    tank
    3    rpool
    3    rpool/ROOT
    3    rpool/ROOT/nas_slash
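For Option #2 above, something along these lines could be cron'd or left running whenever a scrub is kicked off. This is only a sketch of the idea, not anything we actually have deployed; the pool name, the 90% threshold, and the 60-second poll interval are all placeholder values, and on other ZFS releases the exact 'zpool list' property name or 'zpool status' wording may need adjusting.

    #!/bin/ksh
    # Sketch: stop an in-progress scrub if the pool gets too full.
    POOL=tank          # example pool name
    THRESHOLD=90       # example threshold, in percent

    while :; do
        # Pool capacity as a bare number, e.g. "87" (strip the trailing %).
        CAP=`zpool list -H -o capacity $POOL | sed 's/%//'`

        if [ "$CAP" -ge "$THRESHOLD" ]; then
            # Only act if a scrub is actually running.
            if zpool status $POOL | grep "scrub in progress" > /dev/null; then
                echo "$POOL at ${CAP}%, stopping scrub" | logger -t scrubwatch
                zpool scrub -s $POOL
            fi
        fi
        sleep 60
    done

Of course, if the box is already wedged the way ours was, the "zpool scrub -s" here will presumably fail the same way; the point is to catch it before the pool hits 100%.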