On Tue, Jul 17, 2012 at 08:15:20AM -0400, Thor Lancelot Simon wrote:
>
> In the case of the sparse file, the user has explicitly taken actions
> that -- on normal Unix systems and filesystems -- reduce the space
> required to store the file.  If I open a file and lseek() 1TB off
> the end, I have a reasonable expectation to be charged for zero bytes
> of storage, or perhaps the size of the inode -- not 1,000,000,000,000
> bytes of storage.

The DragonFly VFS quota project originated as an existing Google Summer of
Code proposal from 2010. I clearly remember some discussions about sparse
files, and a preference being expressed for counting the seek size rather
than the number of blocks actually used.

> However, it is not the case AFAICT that opening a file and seeking
> 1TB off the end causes 1TB of allocation in HAMMER.  Nor would I expect
> the HAMMER maintainers to think such a behavior was desirable; as far
> as I can tell they have more sense than that.

HAMMER behaves in the same way as UFS; nothing changes here.

> > Having a quota system based on visible file sizes gives at least consistent
> > results with what a regular user sees when listing files or using du(1).
>
> You can say that because you avoid mentioning stat(2) or stat(1) or
> (at least, not explicitly) ls(1), all of which do actually expose the
> difference between the user's requested file length (st_size) and the
> block allocations performed on behalf of the user (st_blocks * st_blksize).
>
> The problem is that you're mixing up apples and oranges: what the filesystem
> (HAMMER) or storage device (deduplication) do behind the user's back which
> may reduce or increase actual block usage on the underlying storage device
> are fundamentally different from what the user expressly requests the
> system do to manage block allocation (intentionally creating holes in files).
>
> Creating an inconsistency between what stat(2) reports and what is charged
> against the user's quota really seems like a very bad idea.  I understand
> that you are trying to simplify away what looks to you like annoying
> complexity, but consider the famous Einstein quote: "as simple as possible,
> but no simpler".  You've gone too simple: your scheme breaks user and
> application expectations with regard to behavior the user/application
> expressly requested from the kernel.  Not a good thing.
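
For reference, the two numbers being argued about can be compared with a
short script. This is only a rough sketch in Python, not anything from the
quota code: the helper name sparse_file_sizes is made up for illustration,
and it assumes st_blocks is counted in 512-byte units, which holds on most
Unix systems but is not guaranteed by POSIX. Whether the allocated figure
ends up smaller than the visible size depends on the filesystem supporting
holes.

```python
import os
import tempfile


def sparse_file_sizes(hole_bytes):
    """Create a temporary file with a hole of hole_bytes followed by one byte.

    Returns (visible_size, allocated_bytes) as reported by fstat(2).
    Hypothetical helper, for illustration only.
    """
    fd, path = tempfile.mkstemp()
    try:
        # Seeking past end-of-file does not allocate anything by itself.
        os.lseek(fd, hole_bytes, os.SEEK_SET)
        # Writing a single byte is what actually extends the file length.
        os.write(fd, b"x")
        os.fsync(fd)
        st = os.fstat(fd)
        # Assumption: st_blocks is expressed in 512-byte units.
        return st.st_size, st.st_blocks * 512
    finally:
        os.close(fd)
        os.unlink(path)


# 256 MiB hole; the visible size is what ls -l shows, the allocated
# size is what du(1) would normally report.
apparent, allocated = sparse_file_sizes(1 << 28)
print("st_size:", apparent, "allocated:", allocated)
```

A quota based on the first number charges the full seek size; one based on
the second charges only the blocks the filesystem actually handed out.
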
>
> Existing applications reasonably expect that regardless of how much
> disk space is available, they can lseek off the end of an existing
> file and not get back an error.  In fact, EDQUOT is not among the
> documented error values for lseek(2) so applications will not
> handle it (for the record, lseek also cannot return EFBIG nor ENOSPC).
> So you can be pretty sure you will break a good number of existing
> Unix applications, likely in data-corrupting ways!

As far as I remember, concerns about potential application breakage didn't
come up when the decision was made not to handle sparse files specially.
I may have to do it if the first implementation really causes problems in
practice.

> Again, I am very curious whether you really have consensus from the
> other Dragonfly developers in favor of this choice.

There was no consensus, but no strong opposition either.

Adding kernel@ to the discussion.

-- 
Francois Tigeot