On Wed, Sep 26, 2012 at 02:59:12PM +0100, Simon Slavin scratched on the wall:

> Allow me to present an alternative point of view.
> 
> If filling up your filespace is causing you to crash, it's easier to
> understand what's going on if you can see the files which are filling
> it up.  Running out of disk space when you can't see huge files makes
> it difficult to debug.  With an unlinked file I would find it hard to 
> figure out what was filling my hard disk causing me to crash.

  That's a valid point, and is one of the reasons why /tmp is
  traditionally its own file system... if you accidentally fill it, you
  might lock an application or two, but you shouldn't cause issues
  with the whole OS.

  Of course if the application dies because it ran out of space, that's
  its own fault for making assumptions about storage.  Having the file
  unlinked (or not) doesn't change those issues, other than leaving
  behind evidence of the cause.  An application properly checking 
  and reporting error codes doesn't need evidence.

  Regardless, you're saying it is easier and more logical to hunt
  through a whole file tree looking for a large file, rather than just
  checking to see how full the disks are.  While I get where you're
  coming from, I think it is a pretty weak point next to the general
  advantages open-and-unlink offers.
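
  For what it's worth, "how full the disks are" is one df away, or a
  few lines of C.  Here's a minimal sketch using the POSIX statvfs()
  call; the "/tmp" mount point is just an example:

    #include <stdio.h>
    #include <sys/statvfs.h>

    int main(void)
    {
        struct statvfs vfs;

        if (statvfs("/tmp", &vfs) != 0) {
            perror("statvfs");
            return 1;
        }

        /* f_bavail: blocks available to unprivileged users */
        double pct_free = 100.0 * vfs.f_bavail / vfs.f_blocks;
        printf("/tmp: %.1f%% free\n", pct_free);
        return 0;
    }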


  There also seems to be a running assumption that disk space should,
  in some direct way, relate to the sum total of all the file sizes on
  a volume.  That's not a very safe assumption with Unix file systems.
  While the use of hard (non-symbolic) links is somewhat rare these
  days, they do exist.  You can have the same file show up in multiple
  places in the file system as a full-on native file (not a symlink
  or alias), but it only takes up space on the disk once (that's why
  the call is unlink() and not delete()).
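
  A quick sketch, if you've never played with hard links; the paths
  here are made up for the example:

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/stat.h>

    int main(void)
    {
        struct stat st;

        /* Create a file, then give it a second name with link() --
         * a hard link, not a symlink. */
        int fd = open("/tmp/original", O_CREAT | O_WRONLY, 0644);
        if (fd < 0) { perror("open"); return 1; }
        close(fd);

        if (link("/tmp/original", "/tmp/second_name") != 0) {
            perror("link");
            return 1;
        }

        /* Both names refer to one i-node; st_nlink counts the names.
         * unlink() removes a name; the data goes away only when the
         * count hits zero (and no process holds it open). */
        stat("/tmp/original", &st);
        printf("inode %lu now has %lu names\n",
               (unsigned long)st.st_ino, (unsigned long)st.st_nlink);
        return 0;
    }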
  
  Similarly, the "size" of a file and how much disk space it takes up
  are not always the same.   Most Unix file systems support sparse
  files, so you can have a file that reports a "size" significantly
  larger than its actual disk usage.

  In fact, we used to test backup and archive software that way.  Open
  a file, write a byte, seek the file offset forward several hundred
  gigabytes, write another byte.  The file shows up as huge, but it is
  only taking up a few kilobytes on disk.  The smart backup systems
  understand this, the dumb ones burn through a lot of tape recording
  zeros.  Same with tar and similar utilities, and don't get me started
  with crappy home-grown quota systems.  You can cause a lot of headaches
  for your sysadmin that way.  It's also fun to freak out the newbies 
  by putting a 2 TB file onto their 500 GB disk.
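
  The trick itself is only a couple of system calls.  A rough sketch;
  the path and the 2 GB gap are arbitrary:

    #define _FILE_OFFSET_BITS 64   /* 64-bit off_t on 32-bit systems */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/stat.h>

    int main(void)
    {
        struct stat st;
        int fd = open("/tmp/sparse_demo",
                      O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        write(fd, "x", 1);
        lseek(fd, (off_t)2 * 1024 * 1024 * 1024, SEEK_CUR); /* ~2 GB gap */
        write(fd, "x", 1);
        close(fd);

        /* st_size is the logical size; st_blocks the 512-byte blocks
         * actually allocated.  On a sparse-capable file system the
         * two will differ wildly. */
        stat("/tmp/sparse_demo", &st);
        printf("size: %lld bytes, on disk: %lld bytes\n",
               (long long)st.st_size, (long long)st.st_blocks * 512);
        return 0;
    }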

> Avoiding filename clashes can be done by creating files with random
> or time-related elements to their names.  It's less of a problem.

  As I already pointed out, there are many, many reasons for the
  "open-and-unlink" pattern that go beyond avoiding file name
  collisions.  In fact, that's a very minor reason, since an
  application should be
  using one of the C standard calls like tmpnam() to generate unique
  file names in a known, accepted way.
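
  tmpnam() only hands back a name, which leaves a race between
  generating it and opening it; the POSIX mkstemp() call (my
  suggestion here, not ISO C) does both in one atomic step, so it is
  usually the better choice.  A sketch, with a made-up template:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        /* The trailing six X's are replaced with a unique suffix. */
        char name[] = "/tmp/myapp_XXXXXX";

        int fd = mkstemp(name);   /* creates AND opens the file */
        if (fd < 0) { perror("mkstemp"); return 1; }

        printf("unique temp file: %s\n", name);
        close(fd);
        unlink(name);             /* clean up the demo file */
        return 0;
    }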

> > How would a file that clogs up /tmp be preferable to some unnamed data
> > that will be automatically removed from the file system by the fsck
> > after the crash?
> 
> Unix deletes the contents of /tmp at boot time.

  Some, but not all.  I've run across systems that don't bother to
  delete /tmp.  Not that it matters... many Unix systems have a high
  enough uptime that anything done at reboot happens pretty rarely,
  and really shouldn't be considered standard maintenance.

> That's why it's special. 

  That, and it has specific permissions that allow anyone to create
  files.  Traditionally, it is also its own file system, although
  that's somewhat rare these days.

> In contrast, using unlink() can cause some
> chaos including filespace hogs turning up in lost+found -- the sort
> of thing that might cause problems that a mundane user might never
> understand.

  No, it can't.  In fact, if "problems for the mundane user" is your
  primary concern, open-and-unlink is a much better pattern in almost
  every way.

  Understand that if a process does an open-and-unlink and the
  process terminates FOR ANY REASON, including a crash, then the OS
  will close all the file descriptors, which will trigger an automatic
  deletion of the file.  The process cannot exit without cleaning up.
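
  The whole pattern is three calls.  A minimal sketch (the name
  template is just an example, and error checks are trimmed for
  brevity):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        char name[] = "/tmp/scratch_XXXXXX";

        int fd = mkstemp(name);
        if (fd < 0) { perror("mkstemp"); return 1; }

        /* Remove the name immediately.  The i-node lives on as long
         * as this descriptor is open; when the process exits --
         * cleanly, by signal, or by crash -- the kernel closes fd
         * and the space is reclaimed.  Nothing to clean up, ever. */
        unlink(name);

        /* Use the descriptor like any other file. */
        write(fd, "scratch data\n", 13);
        lseek(fd, 0, SEEK_SET);
        /* ... read it back, etc. ... */

        close(fd);   /* last reference; the storage is freed here */
        return 0;
    }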

  Compare this to a traditional file in /tmp, which is just going to get
  left there, taking up space until the next reboot.  Most users-- even
  savvy users-- don't manually clean up their /tmp directory between
  reboots.  If this is a desktop with a multi-month uptime, and more
  and more cruft is getting left behind, it is conceivable that /tmp
  may fill up.

  If an application manages to fill up /tmp, the correct thing to do is
  notice that the write operations aren't working, report the error and
  exit.  The open-and-unlink file will always be cleaned up.  The
  normal file will hopefully be cleaned up by the application, assuming
  it remembers to do so correctly, even when exiting under an error
  condition.
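
  "Notice, report, exit" is only a few lines.  A sketch, using a
  hypothetical write_or_die() helper:

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Write a buffer; bail out loudly if the write fails. */
    static void write_or_die(int fd, const void *buf, size_t len)
    {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == ENOSPC)
                fprintf(stderr, "temp volume is full, aborting\n");
            else
                perror("write");
            exit(EXIT_FAILURE);
        }
        if ((size_t)n != len) {
            /* Short write: often the disk filling up mid-call. */
            fprintf(stderr, "short write (%zd of %zu bytes), aborting\n",
                    n, len);
            exit(EXIT_FAILURE);
        }
    }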

  If the application is not so careful and crashes due to filling up
  /tmp, the open-and-unlink file will just be deleted by the OS and
  /tmp will have plenty of free space.  The normal file will remain,
  keeping /tmp full, and likely confusing the heck out of the user.
  The savvy user/admin may be smart enough to look in /tmp, the mundane
  user will just see their machine get slow, many applications not
  work, and will likely reboot the machine because something is clearly
  borked.  Clearly Linux/MacOS/Sun sucks.

  If the OS itself crashes, locks, panics, or loses power while an
  application is putting things into /tmp, then things get a bit more
  interesting.  In the case of the open-and-unlink file, the file and
  its contents will be cleaned up when the disks are fsck'ed on reboot.
  The file will NOT be put into lost+found, since the reference count
  in the i-node is zero.  It will simply be deleted and the space
  recovered.  This is true of files created in /tmp, as well as files
  created on any other file system.  While this is a process that only
  happens on reboot, it is recovering from a situation that can only
  happen due to an unintentional shutdown.

  Normal files in /tmp will, in most cases, simply be deleted on
  reboot.  That works for files in /tmp, but not in other places.
  It should also be noted that since cleaning up unlinked files is part
  of the mount process, the open-and-unlink files will be removed, and
  the space recovered, long before any /tmp cleanup script is run.

  So in every way, the open-and-unlink approach is better for the
  mundane user, as it always has an equal or better chance of returning
  the system to a usable state, even for poorly written applications,
  and even if there is a very long duration between reboots.
  
  The only situation where a normal file makes more sense is when
  something goes wrong and a developer or administrator needs to
  figure out why it went wrong.  In short, situations when
  you need evidence left behind in the form of a big temp file that
  wasn't cleaned up after a crash.  If that crash was caused by filling
  up /tmp or some similar issue, the application should have reported
  the error.  If the crash was caused by some other problem, then the
  file is just going to be wasted space and may cause issues until it
  is cleaned up.  In all those situations, it makes more sense to blame
  the application for playing fast and loose with error codes.

> >> Is there any chance that the use of this trick can be discontinued ?
> > 
> > This is not a trick, it's a widely used Unix idiom.
> 
> It's widely used outside /tmp.

  It's widely used inside /tmp for anything that only needs to be seen
  by the process that created the file.

  When this technique is used correctly for files of small or moderate
  size, you (the admin or user) never see it.  That's half the point.
  Just because you've never noticed it doesn't mean it isn't happening.
  Just because you don't know about it doesn't mean it hasn't been
  there all along.

  
  The standard C I/O library includes the tmpfile() call, which performs
  the whole process of generating a random temporary file name, opening
  the file, and then unlinking the file.  It returns an anonymous file
  pointer with no associated file name that does not appear in the file
  system, and is deleted as soon as the file pointer is closed OR if
  the application terminates for any reason.  It will create the file 
  in the system's default temp space, which is /tmp in the case of UNIX
  systems.  This call is part of the POSIX standard, as well as the
  ISO C90 standard.  I'm sure one could trace its roots back
  pretty far into the history of C and UNIX.  There is a strong history
  of open-and-unlink being the standard practice for this kind of thing.
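
  Using it is about as simple as file I/O gets:

    #include <stdio.h>

    int main(void)
    {
        FILE *fp = tmpfile();   /* anonymous, already unlinked */
        if (fp == NULL) {
            perror("tmpfile");
            return 1;
        }

        fputs("intermediate results go here\n", fp);
        rewind(fp);
        /* ... read the data back as needed ... */

        fclose(fp);   /* file and contents vanish here, or on any exit */
        return 0;
    }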

  It is exactly what I would expect SQLite to be doing with its temp
  files.
  
   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson