On Feb 09, 2008 08:10 -0700, Mark Maybee wrote:
>> Hi Matthew,
>>
>> There's also the question of the attribute size limit. From conversations
>> I had with Andreas, I got the feeling that the Lustre layout information
>> could be quite big when a file is striped through all OSTs, and I would
>> imagine this would become an even bigger concern in the future if we
>> intend to continue scaling horizontally.
>
> By their very nature, these attributes will have a size limitation. I
> don't really see the point in allowing a size larger than the supported
> block size. 64K seems reasonable. I think that very large values could
> possibly be supported using some form of indirection: storing a block
> pointer for the value, or storing an object ID for values that span
> multiple blocks.

For the most common case in Lustre, the striping attribute will be
relatively small (in the range of 80-128 bytes). In other cases (less
common, but still present) the current ext3 size limit of 4096 bytes is
already a limiting factor on the striping of a file, directly affecting
the total bandwidth that can be allocated to a single file.

It is reasonable to have Lustre striping attributes up to the 16kB-24kB
range, at which point we will have a different (more efficient) mechanism
for storing large stripes, but it needs some upcoming infrastructure first.
There is very little use in the middle range.

It seems possible that we may need to have two separate mechanisms for
storing the small attributes and storing the large ones. The small Lustre
SAs will be stored in the dnode, and the large ones in the existing xattr
mechanism. Given that some applications (ZFS/OSX/pNFS) need to be able to
fall back to looking in an xattr for the data they need for compatibility,
this isn't any extra overhead.

>> Anyway, your proposal is interesting, but there's also one thing I would
>> like to add:
>>
>> Could we have a special integer value that would essentially mean "this is
>> an unknown, name-value type of attribute", which would be used to store
>> additional, perhaps user-specified attributes?
>> In the space of the attribute value, we could store the name of the
>> attribute and the value itself (perhaps with 1 or 2 additional bytes for
>> the name length of the attribute).

To be clear - as yet Lustre has a fairly limited set of "system attributes"
that are needed for high performance operation. There is the ability to
store "user attributes" on a file, and while good performance is desirable
this is not a widely-used feature and falls into the "nice to have" category.
In ext3 there is no separation of system attributes and user attributes, so
they all benefit from the [di]node local storage optimization.

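As a rough illustration of the two-mechanism idea above (small SAs in the
dnode, large ones in the existing xattr mechanism), the placement decision
could be sketched as below. This is only a sketch under stated assumptions:
the names sa_choose_store, SA_LARGE_MAX and the enum values are hypothetical
and are not existing ZFS or Lustre interfaces, and the 24kB ceiling is simply
the upper end of the range mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ceiling: upper end of the 16kB-24kB range discussed above. */
#define SA_LARGE_MAX (24u * 1024u)

/* Hypothetical storage locations for a system attribute (SA). */
enum sa_store {
    SA_STORE_DNODE,   /* kept inline in the dnode (fast path) */
    SA_STORE_XATTR,   /* falls back to the existing xattr mechanism */
    SA_STORE_TOO_BIG  /* would need the future large-stripe mechanism */
};

/*
 * Decide where an attribute of `size` bytes would live, given that
 * `dnode_free` bytes are still available for SAs inside the dnode.
 * Both parameters are assumptions made for this illustration.
 */
static enum sa_store
sa_choose_store(size_t size, size_t dnode_free)
{
    assert(size > 0);  /* a zero-length attribute has nothing to place */

    if (size > SA_LARGE_MAX)
        return SA_STORE_TOO_BIG;
    if (size <= dnode_free)
        return SA_STORE_DNODE;
    return SA_STORE_XATTR;
}
```

With, say, 200 bytes free in the dnode, a typical 100-byte striping
attribute would stay in the dnode, while a 4096-byte one would go to the
xattr. A reader would check the dnode first and fall back to the xattr,
which is the same lookup order the compatibility case already requires.
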
>> I also think that instead of having an additional block pointer (which are
>> huge) in the dnode for "spillage", we should have something like a
>> "uint64_t dn_spillobj" which would be an object id of a "spillage" object.
>> An object id is much more space-efficient and, like Andreas mentioned,
>> allows for an unlimited number of attributes/attribute sizes.
>
> I don't quite understand this. The whole point of these attributes is
> to make them fast to access... using an object ID is going to be far
> more expensive than a block pointer to access. Matt's model can also
> support unlimited numbers of attributes if we allow the blocks to be
> chained.

While I agree with your point, I think part of the issue is that as soon
as we store a blkptr_t in the dnode this will consume some significant
chunk of the SA space and push attributes out of the dnode.

The other tradeoff is one of complexity. You know the code better than I,
but it seems cleaner to have a dnode reference as a container for a bunch
of SA blocks rather than having another block tree attached to the same
dnode. That said, if you don't think there is a lot of added complexity
to have chained blocks for the SAs, I'll defer to your experience.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.