On Sunday 07 September 2008 17:43, Shapor Naghibzadeh wrote:
> I've noticed most filesystems have relatively little diversity in file
> attributes (especially within a directory), so we have lots of
> duplicated bits of attribute metadata. For example, an email system
> with "virtual" accounts (not tied to real Unix users) may have
> millions of files with the exact same user/group/mode (Maildirs).
> With Tux3, if the inodes didn't explicitly track the extra 6 or so
> bytes of user/group/mode data per entry, we could see a potential 25%
> reduction in size of our already compact inodes.
>
> After first reading this post, I thought the right approach may be to
> combine xattrs and user/group/mode in to a single attribute atom table
> which could grow dynamically in addressability (with 2 or 3 levels).
> However, I think an inheritance model would work better. With atoms,
> it is possible for any user (malicious or not) to grow the atom table
> significantly. Updating reference counts also sounds complex, with a
> lot of corner cases.
>
> Initially, I thought we could track user/group/mode defaults on a
> per-directory basis, but discarded this due to the inability to
> (easily) map an inode to a parent directory (not to mention hard
> links, duh). It would be possible, however, to have attribute
> defaults for inode table blocks (or higher level branches of the tree,
> even). If we did that, it could lessen the need for a more complex
> atom based approach.

I completely agree with you on the thrust of this. This is purely a
compression optimization; in other words, it had better cause no change
to semantics.

The inheritance can be per inode table block. That is, each inode table
block has a default user/group/mode in its header. If an inode exactly
matches the default, the attribute is not represented; otherwise the
attribute appears in the inode.

A slight variation on that idea is to say that the user/group/mode
attribute of each inode applies to the next one, if the next inode does
not have one of its own. That requires scanning the inodes in a table
block to find out what the user/group/mode attribute should be, so I
think I prefer the one-per-table-block approach. The default costs 12
bytes in the block header, versus savings of up to 64 * 12 = 768 bytes
per table block, which is a big deal.

So yes, I think we should do something very much like this. Later, of
course, say after atomic commit and versioning are working, but with
FUSE being a reality there is no need to wait for the kernel port.

> I suppose the inheritance and atom approaches could be combined or
> chosen based on how the filesystem is being used, but that sounds
> exponentially complex. :)

Yup.

Regards,

Daniel

_______________________________________________
Tux3 mailing list
http://tux3.org/cgi-bin/mailman/listinfo/tux3
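
A minimal sketch of the per-table-block default discussed in the reply
above, written in Python for brevity. It is not Tux3 code: the names
Owner, InodeTableBlock, store, owner_of and bytes_saved are hypothetical,
and the only figures taken from the message are the 12-byte attribute
size and the 64 in the savings estimate, read here as 64 inodes in one
table block (an assumption). It shows the one rule that matters: an
attribute equal to the block default is not stored, and lookups still
return the full user/group/mode, so semantics do not change.

```python
from dataclasses import dataclass, field
from typing import Dict, NamedTuple, Optional


class Owner(NamedTuple):
    """A user/group/mode triple (hypothetical layout, widths not modeled)."""
    uid: int
    gid: int
    mode: int


ATTR_BYTES = 12  # size of one user/group/mode attribute, as stated in the message


@dataclass
class InodeTableBlock:
    """Toy inode table block carrying a default Owner in its header."""
    default: Owner
    # inum -> explicit Owner, or None when the inode inherits the default
    inodes: Dict[int, Optional[Owner]] = field(default_factory=dict)

    def store(self, inum: int, owner: Owner) -> None:
        # Keep the attribute only when it differs from the block default.
        self.inodes[inum] = None if owner == self.default else owner

    def owner_of(self, inum: int) -> Owner:
        # Callers always get a full triple, whether stored or inherited.
        stored = self.inodes[inum]
        return self.default if stored is None else stored

    def bytes_saved(self) -> int:
        # Bytes of elided attributes, minus the one default kept in the header.
        elided = sum(1 for owner in self.inodes.values() if owner is None)
        return elided * ATTR_BYTES - ATTR_BYTES


# Best case: all 64 inodes in the block share the default.
maildir = Owner(uid=1000, gid=1000, mode=0o100600)
block = InodeTableBlock(default=maildir)
for inum in range(64):
    block.store(inum, maildir)
print(block.bytes_saved())  # 64 * 12 - 12 = 756
```

The gross saving is the 768 bytes from the estimate above; the net figure
is 12 bytes lower because the default itself lives in the header. An
inode that does not match the default simply keeps its own attribute, so
the worst case costs only the 12 header bytes.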
