> Personally, I think the underlying structure of files should not be made
> visible to apps at all - they should just see a byte stream (perhaps with
> an advisory useful block size to write in).

i would buy this argument if mmap()ing a large sparse file and filling it up randomly (but with relatively large chunks at a time) did not lead to severely fragmented files that can take 10x longer to read than one written with plain sequential write() calls. because of this, some workaround is necessary. it is very disappointing to see an average of 120 iops of 64KB each (and only because i formatted my FS with 64KB blocks/frags!) when sequentially reading a file created by mmap().

posix_fallocate() is answering a real problem. the workaround today is to write the file first, which doubles the IO traffic. i am not sure we can do better with FFS, due to the issues you've mentioned, but there are many other filesystems in existence that do allow block allocation without exposing prior data or requiring initialisation. given the current issues, i'd be happy with a userspace implementation.

.mrg.
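
p.s. to make the two options concrete, here is a rough sketch of what a userspace fallback could look like. it is only an illustration, not a proposed patch: preallocate() and fill_with_zeros() are hypothetical names, the 64KB chunk size is just the block size mentioned above, and the fallback assumes a freshly created, empty file (it would overwrite existing data otherwise). it tries posix_fallocate() first and only falls back to writing zeros sequentially when the filesystem reports that it cannot preallocate.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* hypothetical fallback: force block allocation by writing zeros
 * sequentially. this is the "write the file first" workaround and
 * costs one extra pass of IO. assumes the file holds no data yet. */
static int fill_with_zeros(int fd, off_t len)
{
    static const char zeros[64 * 1024];   /* assumed chunk size */
    off_t done = 0;

    while (done < len) {
        size_t want = sizeof(zeros);
        if ((off_t)want > len - done)
            want = (size_t)(len - done);

        ssize_t n = pwrite(fd, zeros, want, done);
        if (n < 0) {
            if (errno == EINTR)
                continue;                  /* interrupted, retry */
            return errno;
        }
        done += n;
    }
    return 0;
}

/* hypothetical helper: returns 0 on success, an errno value on failure.
 * posix_fallocate() returns the error number directly rather than
 * setting errno. */
static int preallocate(int fd, off_t len)
{
    int err = posix_fallocate(fd, 0, len);

    if (err == 0)
        return 0;
    /* filesystem cannot preallocate: fall back to the zero-fill pass */
    if (err == EINVAL || err == EOPNOTSUPP || err == ENOSYS)
        return fill_with_zeros(fd, len);
    return err;
}
```

the intended use would be to call preallocate(fd, size) right after creating the file and before mmap()ing it, so that the random-order fill lands on blocks that were already laid out in one go. whether that actually yields contiguous allocation depends on the filesystem.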