Chris Kirby <chris.kirby at sun.com> writes:
> Don, thanks for your comments, please see below:
>
> Don Cragun wrote:
>>> Date: Mon, 27 Aug 2007 16:04:42 -0500
>>> From: Chris Kirby <chris.kirby at sun.com>
>>
>> The base point is that any time you lie to applications, some
>> application software is going to make wrong decisions based on the
>> lie.
>
>
> Yes, and we certainly don't want to lie. But returning an error when
> we can return valid (albeit less precise) info will also cause
> applications to make wrong decisions.
>
> In the case of the NetBeans installer, it died because it thought
> there wasn't enough free space when, in fact, there were several TB
> of space available.
>
> I suspect that most apps that use f_bfree/f_bavail just want to know
> if they have enough space to write their data.

I don't claim to be an expert in this area (or any other), but all this
seems very clearly to be an application bug. If the application asks
"how much free space is there on this filesystem" and the system says
"more space than your data structure can handle", then surely the correct
response is:

   "Since the data I'm installing *will* fit into a 32 bit size and the
   free space *won't*, there is no problem, just install."

Or,

   "The data I'm installing *won't* fit into a 32 bit size so using a 32
   bit struct to get the available space was stupid in the first place"
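
In code, that first branch looks something like this (a sketch; the
"required" size and the function name are hypothetical, and EOVERFLOW
is what a 32-bit statvfs() returns when the real values can't be
represented):

#include <sys/statvfs.h>
#include <errno.h>

/*
 * Sketch of the check an installer could make.  If the system
 * reports EOVERFLOW instead of scaling, free space exceeds what
 * the struct can hold, so anything that fits in 32 bits fits
 * on disk.
 */
int
enough_space(const char *path, unsigned long long required)
{
        struct statvfs sv;

        if (statvfs(path, &sv) != 0) {
                /* EOVERFLOW means "more than this struct can count". */
                return (errno == EOVERFLOW);
        }

        /* f_bavail is counted in f_frsize units. */
        return ((unsigned long long)sv.f_bavail * sv.f_frsize >= required);
}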

>>>For ZFS, we report f_frsize as 512 regardless of the size of the fs.
>>>...
>> 
>> 
>> Why?  Why shouldn't you always set f_frsize to the actual size of an
>> allocation unit on the filesystem?  Is it still true that we don't
>> support disks formatted with 1024 byte sectors?
>
>
> For ZFS, we don't have a fixed allocation block size, so in general
> there won't be one true f_frsize across an entire VFS.  So we return
> SPA_MINBLOCKSIZE (512) for f_frsize.
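
As an aside, that's also why callers converting those counts to bytes
should multiply by f_frsize, not f_bsize; a minimal sketch:

#include <sys/statvfs.h>

/* Sketch: bytes available to an unprivileged caller. */
unsigned long long
avail_bytes(const struct statvfs *sv)
{
        /* f_bavail is in f_frsize units; f_bsize is only a hint. */
        return ((unsigned long long)sv->f_bavail * sv->f_frsize);
}
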
>
>
>> When you cap f_files, f_ffree, and f_favail at UINT32_MAX when the
>> correct values for these fields are larger, you are not returning
>> valid information.
>
>
> I think it's valid in the sense that you will be able to create at
> least UINT32_MAX files.  Of course once you've done so, we might still
> report that you can create UINT32_MAX additional files.  :-)
>
> Any application making a decision on an available file count such
> that UINT32_MAX is not enough but UINT32_MAX+1 would be OK should
> already be using the largefile syscalls like statvfs64.
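
Right - and in the transitional largefile environment that's just the
following (a sketch; statvfs64() is the Solaris spelling, and a 64-bit
compile gets the same data from plain statvfs()):

#include <sys/statvfs.h>

/* Sketch: how many more files can be created, without the 32-bit cap. */
unsigned long long
files_left(const char *path)
{
        struct statvfs64 sv;

        if (statvfs64(path, &sv) != 0)
                return (0);
        return (sv.f_favail);   /* fsfilcnt64_t, not capped at UINT32_MAX */
}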
>
>> 
>> You may be returning "valid" values for f_frsize, f_blocks, f_bfree,
>> and f_bavail; but you aren't checking to see if that is true or
>> not.  (If shifting f_blocks, f_bfree, or f_bavail right throws away
>> a bit that was not a zero bit, the scaled values being returned are
>> not valid.)
>
>
> You're right that we're discarding some bits through the scaling
> process.  However, any non-zero bits that are discarded are
> effectively partial f_frsize blocks.  For any filesystem large enough
> to get into this situation, we're talking about a relatively *very*
> small amount of rounding down.  (e.g., for a 1PB fs, the scaled
> f_frsize is only 256K)
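
For anyone following along, the scaling being described amounts to
halving the counts and doubling the reported unit until everything
fits in 32 bits - something like this sketch (the idea, not the
actual ZFS code):

#include <stdint.h>

/* Sketch: scale 64-bit counts down until they fit in 32 bits. */
static void
scale_to_fit(uint64_t *blocks, uint64_t *bfree, uint64_t *bavail,
    uint64_t *frsize)
{
        while (*blocks > UINT32_MAX || *bfree > UINT32_MAX ||
            *bavail > UINT32_MAX) {
                *blocks >>= 1;  /* any odd low bit is discarded here... */
                *bfree >>= 1;
                *bavail >>= 1;
                *frsize <<= 1;  /* ...absorbed into the larger unit */
        }
}

For a 1PB (10^15 byte) fs that works out to nine doublings of the
512-byte unit, which is where the 256K figure comes from.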
>
> Remember that the fs code can be doing delayed writes, delayed
> allocation, background delete processing, etc.  So the statvfs values
> are just rumors anyway.  Most filesystems don't even bother to grab a
> lock when reporting statvfs info.

Not to mention that (even if it's 100% accurate) the amount of free
space *now* is not valid at any time other than *now*.  Assuming that
300GB available now means I can write 300GB is not valid in the face
of other writers.
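
Which is why the only robust pattern is to go ahead and write, and
handle ENOSPC when it happens, e.g.:

#include <errno.h>
#include <unistd.h>

/* Sketch: statvfs is advisory; ENOSPC at write time is the real answer. */
ssize_t
write_or_report(int fd, const void *buf, size_t len)
{
        ssize_t n = write(fd, buf, len);

        if (n < 0 && errno == ENOSPC) {
                /* space vanished since we looked: clean up, retry, report */
        }
        return (n);
}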

>> Since the statvfs(2) and statvfs.h(3HEAD) man pages don't state any
>> relationship between f_bsize and f_frsize, applications may well have
>> made their own assumptions.  Is there documentation somewhere that
>> specifies how many bytes should be written at a time (on boundaries
>> that are a multiple of that value) to get the most efficiency out of
>> the underlying hardware?  I would hope that f_bsize would be that
>> value.  If it is, it seems that f_bsize should be an integral
>> multiple of f_frsize.
>
> Aside from the comment in statvfs(2) about f_bsize being the
> "preferred file system block size", I can't find any documentation
> that talks about that.
>
> For filesystems that support direct I/O, f_bsize has traditionally
> provided the most efficient I/O size multiplier.
>
> But the setting of f_bsize is up to the underlying fs.  And at least
> for QFS, UFS, and ZFS, its value is not scaled based on
> f_frsize. That's also why I don't rescale f_bsize.
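
FWIW, a caller that wants to honor the hint can just round its
transfer size up to a multiple of f_bsize - a sketch (whether it
actually buys anything is up to the fs):

#include <stddef.h>
#include <sys/statvfs.h>

/* Sketch: round a transfer size up to the fs's preferred block size. */
size_t
preferred_len(const struct statvfs *sv, size_t want)
{
        size_t bsz = sv->f_bsize ? sv->f_bsize : 512;

        return ((want + bsz - 1) / bsz * bsz);
}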
>
>
> -Chris
