Re: adding linux syscall fallocate

HRISHIKESH GOYAL Mon, 18 Nov 2019 12:47:06 -0800

Hi Jason,

Thanks for the detailed and clear explanation.


> However, the file system is allowed to NOT zero out the space SO LONG AS
> it knows that the space is uninitialized and thus returns zero-filled pages
> when the space is read.

Does that mean that when there's a read request on uninitialised blocks,
the filesystem doesn't read the disk but directly returns zero-filled blocks?

Regards,
Hrishikesh

On Mon, Nov 18, 2019, 23:34 Jason Thorpe <thor...@me.com> wrote:

>
>
> > On Nov 17, 2019, at 11:21 AM, HRISHIKESH GOYAL <hrishi.go...@gmail.com>
> wrote:
> >
> > Questions:
> > 1. As I understand from the above Stack Overflow answer and the truncate
> > man page, even though `truncate` doesn't allocate space for file baz, the
> > filesystem should still update the free space by reducing it to 0.3G
> > (otherwise the filesystem metadata are not consistent with the file
> > metadata). Could anyone please correct me?
> >
> > 2. Does it mean that `truncate` only updates the file's vnode attribute
> > (i.e. size) and doesn't update the superblock attribute (free_space)?
> >
> > 3. I checked the first 100 bytes of both of the above files using the C
> > fread() function; all are filled with the NUL character ('\0'). How did
> > file bar (the previously fallocate'ed file) get initialised with NULs? (As
> > per my understanding, since the blocks are uninitialised, they should
> > contain random bytes and not all NULs, right?)
>
> I think what you are missing is that many file systems support sparse
> files.  Consider an application that does:
>
> 1- Create file "foo".
> 2- Write a single byte to offset 0.
> 3- Write a single byte to offset (4GiB-1).
>
> That file will have a logical size of 4GiB; this size is recorded in the
> inode.  However, on FFS, it will only have 2 file system blocks allocated.
> The direct and indirect block pointers for the whole middle range will not
> point to any physical space on disk[*], and when an application reads from
> that range, the file system will return zero-filled pages.
>
> [*] ...a little bit of hand-waving some of the details here; some of the
> indirect block pointers will in fact be filled in, because they are needed
> to be able to find the block at the end of the file that's actually
> allocated, and at 4GiB, you're definitely into indirect block territory.
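
The three steps above can be sketched in Python. This is a minimal
illustration, assuming a POSIX-like system and a file system that supports
sparse files. The helper names make_sparse_file and read_middle and the
64 MiB size (instead of 4 GiB) are illustrative choices, not something from
the thread. The allocated size reported via st_blocks depends on the file
system, so the sketch only prints it.

```python
import os
import tempfile

SIZE = 64 * 1024 * 1024  # illustrative logical size; the thread uses 4 GiB


def make_sparse_file(path, size=SIZE):
    """Write one byte at offset 0 and one at offset size-1; return os.stat()."""
    with open(path, "wb") as f:
        f.write(b"A")       # step 2: single byte at offset 0
        f.seek(size - 1)    # skip the middle without writing anything
        f.write(b"Z")       # step 3: single byte at the last offset
    return os.stat(path)


def read_middle(path, offset, length):
    """Read `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "foo")
        st = make_sparse_file(p)
        print("logical size:", st.st_size)
        # st_blocks is counted in 512-byte units on Linux and the BSDs.
        print("allocated bytes (approx.):", st.st_blocks * 512)
        print("middle reads as zeros:",
              read_middle(p, SIZE // 2, 4096) == bytes(4096))
```

On a file system with sparse-file support, the allocated size should come
out far smaller than the logical size, while the middle still reads as zeros.
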
>
> This is similar to what happens when you call truncate() on a file with a
> size beyond the current EOF, only in that case, you didn't need to write a
> byte to the end to get the size to change; there's simply no block
> allocated to the end of the file.
>
> Now, what happens if you call posix_fallocate(fd, 0, 4GiB) on a descriptor
> for "foo"?  The file system will have to allocate all of the necessary
> space, FILL IT WITH ZEROS, and fill in the direct and indirect block
> pointers in the inode.
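
To compare the two ways of growing a file, here is a small Python sketch,
assuming a Unix system where os.posix_fallocate is available. The helper
names and the 1 MiB length are illustrative; the file names baz
(truncate'd) and bar (fallocate'd) follow the questions quoted above. Both
files end up with the same logical size and both read back as zeros; what
differs is how much space the file system reports as allocated, which the
sketch prints without assuming a particular value.

```python
import os
import tempfile

LENGTH = 1024 * 1024  # 1 MiB, small on purpose; the thread uses 4 GiB


def extend_with_truncate(path, length=LENGTH):
    """Grow the file to `length` bytes without requesting any allocation."""
    with open(path, "wb") as f:
        f.truncate(length)
    return os.stat(path)


def extend_with_fallocate(path, length=LENGTH):
    """Grow the file to `length` bytes and ask for the space to be reserved."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        os.posix_fallocate(fd, 0, length)  # takes a descriptor, not a path
    finally:
        os.close(fd)
    return os.stat(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        baz = extend_with_truncate(os.path.join(d, "baz"))
        bar = extend_with_fallocate(os.path.join(d, "bar"))
        for name, st in (("baz (truncate)", baz), ("bar (fallocate)", bar)):
            # st_blocks is counted in 512-byte units on Linux and the BSDs.
            print(name, "size:", st.st_size, "allocated:", st.st_blocks * 512)
```
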
>
> Now, a file system is allowed to make an optimization here.  The
> posix_fallocate() specification does state that if offset+len is beyond the
> current file size, the file size will be updated, i.e. it behaves like
> ftruncate() in that regard.  However, the file system is allowed to NOT
> zero out the space SO LONG AS it knows that the space is uninitialized and
> thus returns zero-filled pages when the space is read.  This allows the file
> system to avoid redundantly filling the space with zeros only to have those
> zeros overwritten with actual data later.  This is good for performance AND
> for reducing P/E (program/erase) cycles on flash storage.  This would
> require an additional size field in the inode to indicate the end of the
> initialized space (this information would have to persist across unmounts,
> and essentially represents an incompatible format change in the case of FFS,
> since software that does not understand this extra field could not safely
> mount the file system).
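
The "end of the initialized space" idea can be modelled with a toy
in-memory inode. This is only a sketch under stated assumptions: ToyInode,
init_size, disk_reads and the 0xDE filler are hypothetical names and values
made up for illustration, and nothing here reflects how FFS or any real file
system lays out its metadata. The point it shows is that reads past
init_size are answered with synthesized zeros and never look at the stale
bytes on "disk".

```python
from dataclasses import dataclass, field


@dataclass
class ToyInode:
    """Hypothetical inode carrying an extra 'initialized size' field."""
    size: int = 0          # logical file size
    init_size: int = 0     # end of the initialized space
    disk: bytearray = field(default_factory=bytearray)  # allocated storage
    disk_reads: int = 0    # how many reads actually touched the "disk"

    def fallocate(self, length):
        """Reserve space up to `length` without zeroing it."""
        if length > len(self.disk):
            # 0xDE stands in for stale data left on disk; nothing is zeroed.
            self.disk.extend(b"\xde" * (length - len(self.disk)))
        self.size = max(self.size, length)

    def write(self, offset, data):
        """Write data, zeroing only the gap below `offset` if there is one."""
        end = offset + len(data)
        self.fallocate(end)
        if offset > self.init_size:
            self.disk[self.init_size:offset] = bytes(offset - self.init_size)
        self.disk[offset:end] = data
        self.init_size = max(self.init_size, end)

    def read(self, offset, length):
        """Return file contents; bytes past init_size are synthesized zeros."""
        end = min(offset + length, self.size)
        if end <= offset:
            return b""
        out = bytearray()
        init_end = min(end, self.init_size)
        if offset < init_end:
            self.disk_reads += 1          # only initialized data is fetched
            out += self.disk[offset:init_end]
        out += bytes(max(0, end - max(offset, self.init_size)))
        return bytes(out)


if __name__ == "__main__":
    ino = ToyInode()
    ino.fallocate(8192)
    print(ino.read(0, 16), "disk reads:", ino.disk_reads)
    ino.write(0, b"hello")
    print(ino.read(0, 8), "disk reads:", ino.disk_reads)
```

In this model a single extra field is enough because initialized data is
always a prefix of the file; writes that land beyond init_size pay for
zeroing the gap at that point.
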
>
> Technically, a file system is allowed to make that optimization for the
> "allocate to fill in a sparse hole" case as well, but it would require a
> bunch of extra metadata to track the valid ranges of the file, and so
> probably isn't worth it.
>
> -- thorpej
>
>
