OK, thanks. Now I'll check the FFS source code around fallocate and see if I can add anything to it.
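Before digging into the FFS code, the sparse-file behavior described in the quoted thread below can be probed from user space. What follows is a minimal C sketch. It assumes a POSIX system where `st_blocks` counts 512-byte units, which is the case on NetBSD and Linux. The helper names `make_sparse`, `file_sizes`, `read_byte_at`, and `preallocate` are illustrative only and do not come from the thread. The sketch writes one byte at offset 0 and one at the last offset, like steps 1-3 in the thread, and then compares the logical size with the space actually allocated.

```c
#define _FILE_OFFSET_BITS 64  /* 64-bit off_t even on 32-bit platforms */
#define _XOPEN_SOURCE 700     /* pread, pwrite, posix_fallocate, st_blocks */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or truncate) path and write one byte at offset 0 and one at
 * offset size - 1, leaving a hole in between. Returns 0 or -1. */
static int make_sparse(const char *path, off_t size)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0)
        return -1;
    int ok = pwrite(fd, "a", 1, 0) == 1 &&
             pwrite(fd, "z", 1, size - 1) == 1;
    if (close(fd) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Report the logical size (st_size, kept in the inode) and the bytes
 * actually allocated (st_blocks, assumed to be 512-byte units). */
static int file_sizes(const char *path, off_t *logical, off_t *allocated)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    *logical = st.st_size;
    *allocated = (off_t)st.st_blocks * 512;
    return 0;
}

/* Read the byte at offset off; returns 0..255, or -1 on error. Bytes inside
 * a hole read back as 0 even though no disk block backs them. */
static int read_byte_at(const char *path, off_t off)
{
    unsigned char c;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, &c, 1, off);
    close(fd);
    return n == 1 ? c : -1;
}

/* Ask the file system to allocate [0, len). posix_fallocate() returns an
 * error number directly (0 on success) instead of setting errno. */
static int preallocate(const char *path, off_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return errno;
    int err = posix_fallocate(fd, 0, len);
    close(fd);
    return err;
}
```

On a file system that supports sparse files, you can call `make_sparse("/tmp/foo", (off_t)1 << 32)` and then `file_sizes()`. That should report a logical size of 4 GiB with only a few blocks allocated. `read_byte_at()` anywhere in the middle returns 0. After a successful `preallocate()`, the allocated figure should grow to cover the whole range, and reads should still return 0. `posix_fallocate()` may fail on file systems that do not implement it.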
On Tue, Nov 19, 2019, 02:17 Jason Thorpe <thor...@me.com> wrote:

> > On Nov 18, 2019, at 12:22 PM, HRISHIKESH GOYAL <hrishi.go...@gmail.com> wrote:
> >
> > Hi Jason,
> >
> > Thanks for the detailed and clear explanation.
> >
> > > However, the file system is allowed to NOT zero out the space SO LONG AS it knows that the space is uninitialized and thus return zero-filled pages when the space is read.
> >
> > Does that mean that when there's a read request on uninitialised blocks, the file system doesn't read the disk but directly returns zero-filled blocks?
>
> Yes, that would be how it would have to work; otherwise you're leaking the contents of whatever might have been in that now-allocated space on disk before, which can be a security risk.
>
> > Regards,
> > Hrishikesh
> >
> > On Mon, Nov 18, 2019, 23:34 Jason Thorpe <thor...@me.com> wrote:
> >
> > > > On Nov 17, 2019, at 11:21 AM, HRISHIKESH GOYAL <hrishi.go...@gmail.com> wrote:
> > > >
> > > > Questions:
> > > > 1. As I understand from the above Stack Overflow answer and the truncate man page, even though `truncate` doesn't allocate space for file baz, the file system should still update the free space by reducing it to 0.3G (otherwise the file system metadata would not be consistent with the file metadata). Could anyone please correct me?
> > > >
> > > > 2. Does it mean that `truncate` only updates the file's vnode (i.e. size) attribute and doesn't update the superblock (free space) attribute?
> > > >
> > > > 3. I checked the first 100 bytes of both files above using the C fread() function; all are NUL characters ('\0'). How did file bar (the previously fallocate'd file) get initialised with NULs? As I understand it, since they are uninitialised, they should be random bytes and not all NULs, right?
> > >
> > > I think what you are missing is that many file systems support sparse files. Consider an application that does:
> > >
> > > 1- Create file "foo".
> > > 2- Write a single byte to offset 0.
> > > 3- Write a single byte to offset (4GiB-1).
> > >
> > > That file will have a logical size of 4GiB; this size is recorded in the inode. However, on FFS, it will only have 2 file system blocks allocated. The direct and indirect block pointers for the whole middle range will not point to any physical space on disk[*], and when an application reads from that range, the file system will return zero-filled pages.
> > >
> > > [*] ...a little bit of hand-waving some of the details here; some of the indirect block pointers will in fact be filled in, because they are needed to be able to find the block at the end of the file that's actually allocated, and at 4GiB, you're definitely into indirect block territory.
> > >
> > > This is similar to what happens when you call truncate() on a file with a size beyond the current EOF, except that in that case you don't need to write a byte at the end to change the size; there's simply no block allocated at the end of the file.
> > >
> > > Now, what happens if you open "foo" and call posix_fallocate(fd, 0, 4GiB)? The file system will have to allocate all of the necessary space, FILL IT WITH ZEROS, and fill in the direct and indirect block pointers in the inode.
> > >
> > > Now, a file system is allowed to make an optimization here. The posix_fallocate() specification does state that if offset+len is beyond the current file size, the file size will be updated, i.e. it behaves like ftruncate() in that regard. However, the file system is allowed to NOT zero out the space SO LONG AS it knows that the space is uninitialized and thus return zero-filled pages when the space is read. This allows the file system to avoid redundantly filling the space with zeros only to have those zeros overwritten with actual data later. This is good for performance AND for reducing P/E (program/erase) cycles on flash storage. This would require an additional size field in the inode to indicate the end of the initialized space. This information would have to persist across unmounts, and in the case of FFS it essentially represents an incompatible format change, since software that does not understand this extra field could not safely mount the file system.
> > >
> > > Technically, a file system is allowed to make that optimization for the "allocate to fill in a sparse hole" case as well, but it would require a bunch of extra metadata to track the valid ranges of the file, and so probably isn't worth it.
> > >
> > > -- thorpej
>
> -- thorpej
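The "initialized size" optimization discussed in the thread can be modeled in a few lines. The idea is an extra field recording where real data ends, with reads past that point returning zeros. This is a hypothetical in-memory sketch, not FFS code. `struct toyfile`, its `init_size` field, and the `toy_*` functions are invented for illustration. Stale on-disk contents are simulated by filling newly reserved space with 0xAA. The model tracks a single watermark, which matches the posix_fallocate()-past-EOF case. Tracking arbitrary valid ranges inside sparse holes would need the extra metadata the thread mentions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory file with an "initialized size" watermark. */
struct toyfile {
    unsigned char *blocks;  /* backing store; may hold stale data */
    size_t allocated;       /* bytes of backing store reserved */
    size_t size;            /* logical file size (what st_size would report) */
    size_t init_size;       /* bytes [0, init_size) hold real data */
};

/* Reserve space for [0, len) and extend the logical size WITHOUT zeroing.
 * init_size is left alone, so the new space counts as uninitialized. */
static int toy_fallocate(struct toyfile *f, size_t len)
{
    if (len > f->allocated) {
        unsigned char *p = realloc(f->blocks, len);
        if (p == NULL)
            return -1;
        /* simulate whatever was previously on disk in the new space */
        memset(p + f->allocated, 0xAA, len - f->allocated);
        f->blocks = p;
        f->allocated = len;
    }
    if (len > f->size)
        f->size = len;
    return 0;
}

/* Read up to n bytes at off; returns the number of bytes read. Bytes at or
 * past init_size come back as zeros without touching the backing store. */
static size_t toy_read(const struct toyfile *f, size_t off,
                       unsigned char *buf, size_t n)
{
    if (off >= f->size)
        return 0;
    if (n > f->size - off)
        n = f->size - off;
    size_t valid = 0;
    if (off < f->init_size)
        valid = (f->init_size - off < n) ? f->init_size - off : n;
    memcpy(buf, f->blocks + off, valid);  /* initialized part */
    memset(buf + valid, 0, n - valid);    /* uninitialized part reads as 0 */
    return n;
}

/* Write n bytes at off. If the write starts past init_size, the gap between
 * the old watermark and off must be zeroed so it cannot expose stale data. */
static int toy_write(struct toyfile *f, size_t off,
                     const unsigned char *buf, size_t n)
{
    if (off + n > f->allocated && toy_fallocate(f, off + n) != 0)
        return -1;
    if (off > f->init_size)
        memset(f->blocks + f->init_size, 0, off - f->init_size);
    memcpy(f->blocks + off, buf, n);
    if (off + n > f->init_size)
        f->init_size = off + n;
    if (off + n > f->size)
        f->size = off + n;
    return 0;
}
```

A write that starts beyond `init_size` zeroes only the gap between the old watermark and the write offset. That zeroing is exactly the cost the optimization defers. Everything above the new watermark stays untouched in the backing store and still reads as zeros.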