Re: adding linux syscall fallocate

HRISHIKESH GOYAL Mon, 18 Nov 2019 13:41:57 -0800

Ok, thanks.

Now I'll check the FFS source code around fallocate and see if I can add
anything to it.


On Tue, Nov 19, 2019, 02:17 Jason Thorpe <thor...@me.com> wrote:

>
>
> > On Nov 18, 2019, at 12:22 PM, HRISHIKESH GOYAL <hrishi.go...@gmail.com>
> wrote:
> >
> > Hi Jason,
> >
> > Thanks for the detailed and clear explanation.
> >
> > > However, the file system is allowed to NOT zero out the space SO LONG
> AS it knows that the space is uninitialized and thus return zero-filled
> pages when the space is read.
> >
> > Does that mean that when there's a read request on uninitialised blocks,
> the filesystem doesn't read the disk but directly returns zero-filled blocks?
>
> Yes, that would be how it would have to work, otherwise you're leaking the
> contents of what might have been in that now-allocated space on disk
> before, which can be a security risk.
>
> >
> > Regards,
> > Hrishikesh
> >
> > On Mon, Nov 18, 2019, 23:34 Jason Thorpe <thor...@me.com> wrote:
> >
> >
> > > On Nov 17, 2019, at 11:21 AM, HRISHIKESH GOYAL <hrishi.go...@gmail.com>
> wrote:
> > >
> > > Questions:
> > > 1. As I understand from the Stack Overflow answer above and the truncate
> man page, even though `truncate` doesn't allocate space for file baz, the
> filesystem should still update the free space by reducing it to 0.3G
> (otherwise the filesystem metadata is not consistent with the file metadata).
> Could anyone please correct me?
> > >
> > > 2. Does it mean that `truncate` only updates the file's vnode attribute
> (i.e. size) and doesn't update the superblock attribute (free_space)?
> > >
> > > 3. I checked the first 100 bytes of both files above using the C fread()
> function, and all are filled with the NUL character ('\0'). How did file bar
> (the previously fallocate'ed file) get initialised with NULs? As per my
> understanding, since they are uninitialised, they should be some random
> bytes and not all NULs, right?
> >
> > I think what you are missing is that many file systems support
> sparse files.  Consider an application that does:
> >
> > 1- Create file "foo".
> > 2- Write a single byte to offset 0.
> > 3- Write a single byte to offset (4GiB-1).
> >
> > That file will have a logical size of 4GiB; this size is recorded in the
> inode.  However, on FFS, it will only have 2 file system blocks allocated.
> The direct and indirect block pointers for the whole middle range will not
> point to any physical space on disk[*], and when an application reads from
> that range, the file system will return zero-filled pages.
> >
> > [*] ...a little bit of hand-waving some of the details here; some of the
> indirect block pointers will in fact be filled in, because they are needed
> to be able to find the block at the end of the file that's actually
> allocated, and at 4GiB, you're definitely into indirect block territory.
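
The three steps above can be sketched in C roughly as follows. This is a minimal sketch, assuming a POSIX system with a 64-bit off_t and a file system that supports sparse files; the helper names (make_sparse_file, range_is_zero, allocated_bytes) are hypothetical and not taken from the thread or from FFS.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path`, write one byte at offset 0 and one
 * byte at offset size-1, leaving a hole in between.
 * Returns an open descriptor, or -1 on failure. */
static int make_sparse_file(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (pwrite(fd, "A", 1, 0) != 1 || pwrite(fd, "Z", 1, size - 1) != 1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 1 if all `len` bytes starting at `off` read back as zero,
 * 0 if any byte is non-zero, -1 on read error or unexpected EOF. */
static int range_is_zero(int fd, off_t off, size_t len)
{
    unsigned char buf[4096];
    while (len > 0) {
        size_t want = len < sizeof(buf) ? len : sizeof(buf);
        ssize_t got = pread(fd, buf, want, off);
        if (got <= 0)
            return -1;
        for (ssize_t i = 0; i < got; i++)
            if (buf[i] != 0)
                return 0;
        off += got;
        len -= (size_t)got;
    }
    return 1;
}

/* Space actually allocated to the file, in bytes (st_blocks counts
 * 512-byte units on most systems); -1 on error. */
static long long allocated_bytes(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    return (long long)st.st_blocks * 512;
}
```

Calling make_sparse_file("foo", (off_t)4 << 30) reproduces the 4GiB case described above. On a file system with sparse-file support, st_size should report the full logical size while allocated_bytes() stays far smaller, and range_is_zero() over the middle of the file should return 1.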
> >
> > This is similar to what happens when you call truncate() on a file with
> a size beyond the current EOF, only in that case, you didn't need to write
> a byte to the end to get the size to change; there's simply no block
> allocated to the end of the file.
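
For comparison, here is a minimal sketch of the truncate case, again assuming a POSIX system. The helper names (extend_with_ftruncate, file_size, byte_at) are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an empty file and grow it to `size` with
 * ftruncate(), without writing any data.
 * Returns an open descriptor, or -1 on failure. */
static int extend_with_ftruncate(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Logical size as recorded in the inode, or -1 on error. */
static off_t file_size(int fd)
{
    struct stat st;
    return fstat(fd, &st) == 0 ? st.st_size : (off_t)-1;
}

/* Returns the byte stored at offset `off`, or -1 on error or EOF. */
static int byte_at(int fd, off_t off)
{
    unsigned char c;
    return pread(fd, &c, 1, off) == 1 ? (int)c : -1;
}
```

After extend_with_ftruncate(path, size), file_size() should report the new size and byte_at() should return 0 anywhere inside the file, even though no data was ever written.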
> >
> > Now, what happens if you do a posix_fallocate("foo", 0, 4GiB)?  The file
> system will have to allocate all of the necessary space, FILL IT WITH
> ZEROS, and fill in the direct and indirect block pointers in the inode.
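
Note that the real posix_fallocate() takes an open file descriptor rather than a path. A minimal sketch of the call follows, assuming a POSIX system whose libc provides posix_fallocate() and a file system that supports it; preallocate_file and count_nonzero_prefix are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` and reserve `len` bytes for it.
 * Returns an open descriptor, or -1 on failure. */
static int preallocate_file(const char *path, off_t len)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    /* posix_fallocate() returns an error number directly; it does not
     * report failure through errno. */
    int err = posix_fallocate(fd, 0, len);
    if (err != 0) {
        fprintf(stderr, "posix_fallocate: %s\n", strerror(err));
        close(fd);
        return -1;
    }
    return fd;
}

/* Counts the non-zero bytes among the first `n` bytes of the file
 * (n is capped at 4096); returns -1 on read error. */
static int count_nonzero_prefix(int fd, size_t n)
{
    unsigned char buf[4096];
    if (n > sizeof(buf))
        n = sizeof(buf);
    ssize_t got = pread(fd, buf, n, 0);
    if (got < 0)
        return -1;
    int nonzero = 0;
    for (ssize_t i = 0; i < got; i++)
        if (buf[i] != 0)
            nonzero++;
    return nonzero;
}
```

After preallocate_file(path, len) succeeds, fstat() on the descriptor should report st_size equal to len, and count_nonzero_prefix(fd, 100) should return 0, which matches the all-NUL result from the fread() check in question 3.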
> >
> > Now, a file system is allowed to make an optimization, here.  The
> posix_fallocate() specification does state that if offset+len is beyond the
> current file size, the file size will be updated, i.e. it behaves like
> ftruncate() in that regard.  However, the file system is allowed to NOT
> zero out the space SO LONG AS it knows that the space is uninitialized and
> thus return zero-filled pages when the space is read.  This allows the file
> system to avoid redundantly filling the space with zeros only to have those
> zeros overwritten with actual data later.  This is good for performance AND
> for reducing P/E (program/erase) cycles on flash storage.  This would
> require an additional size field in the inode to indicate the end of the
> initialized space (this
> information would have to persist across unmounts, and essentially
> represents an incompatible format change in the case of FFS since software
> that does not understand this extra field could not safely mount the file
> system).
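
The "initialized size" idea can be illustrated with a toy in-memory model. This is only a sketch of the concept under simplifying assumptions (a single contiguous buffer stands in for the allocated disk blocks, and 0xAA stands in for stale on-disk data); struct toy_inode, its init_size field, and the toy_* functions are hypothetical and do not correspond to any real FFS structure.

```c
#include <stdlib.h>
#include <string.h>

/* Toy "inode"; every name here is hypothetical, not FFS code. */
struct toy_inode {
    size_t size;         /* logical file size */
    size_t init_size;    /* end of the initialized region */
    unsigned char *data; /* allocated space, may hold stale bytes */
};

/* Allocate `len` bytes WITHOUT zeroing them; 0xAA simulates whatever
 * was previously stored in that space. Returns 0 on success. */
static int toy_fallocate(struct toy_inode *ip, size_t len)
{
    ip->data = malloc(len);
    if (ip->data == NULL)
        return -1;
    memset(ip->data, 0xAA, len);
    ip->size = len;
    ip->init_size = 0;
    return 0;
}

/* Read up to `len` bytes at `off`. Bytes at or beyond init_size are
 * returned as zeros without consulting the backing store, so stale
 * data never leaks. Returns the number of bytes read. */
static size_t toy_read(const struct toy_inode *ip, size_t off,
                       unsigned char *buf, size_t len)
{
    if (off >= ip->size)
        return 0;
    if (len > ip->size - off)
        len = ip->size - off;
    for (size_t i = 0; i < len; i++)
        buf[i] = (off + i < ip->init_size) ? ip->data[off + i] : 0;
    return len;
}

/* Write inside the allocated range. If the write starts past init_size,
 * the gap is zeroed first so init_size can advance safely.
 * Returns the number of bytes written. */
static size_t toy_write(struct toy_inode *ip, size_t off,
                        const void *buf, size_t len)
{
    if (off >= ip->size)
        return 0;
    if (len > ip->size - off)
        len = ip->size - off;
    if (off > ip->init_size)
        memset(ip->data + ip->init_size, 0, off - ip->init_size);
    memcpy(ip->data + off, buf, len);
    if (off + len > ip->init_size)
        ip->init_size = off + len;
    return len;
}
```

In this model a read right after toy_fallocate() returns all zeros even though the buffer still holds 0xAA, and only the bytes between the old init_size and the start of a write ever get zeroed explicitly.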
> >
> > Technically, a file system is allowed to make that optimization for the
> "allocate to fill in a sparse hole" case as well, but it would require a
> bunch of extra metadata to track the valid ranges of the file, and so
> probably isn't worth it.
> >
> > -- thorpej
> >
>
> -- thorpej
>
>
