[zfs-code] Design for EAs in dnode.

Andreas Dilger Mon, 11 Feb 2008 19:42:15 -0700

On Feb 09, 2008  08:10 -0700, Mark Maybee wrote:
>> Hi Matthew,
>> There's also the question of the attribute size limit.. from conversations 
>> I had with Andreas, I got the feeling that the Lustre layout information 
>> could be quite big when a file is striped through all OSTs, and I would 
>> imagine this would become an even bigger concern in the future if we 
>> intend to continue scaling horizontally.
>
> By their very nature, these attributes will have a size limitation.  I
> don't really see the point in allowing a size larger than the supported
> block size.  64K seems reasonable.  I think that very large values could
> possibly be supported using some form of indirection: storing a block
> pointer for the value, or storing an object ID for values that span
> multiple blocks.


For the most common case in Lustre, the striping attribute will be
relatively small (in the range of 80 - 128 bytes).  In other cases
(less common, but still present) the current ext3 size limit of 4096
bytes is already a limiting factor on the striping of a file - directly
affecting the total bandwidth that can be allocated to a single file.
It is reasonable to have Lustre striping attributes of up to 16kB - 24kB;
beyond that we will have a different (more efficient) mechanism for
storing large stripes, but that needs some upcoming infrastructure first.
There is very little use in the middle range.
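
To make the size figures above concrete, here is a rough sketch of how the
striping attribute might grow with stripe count.  The 32-byte header and
24-byte per-stripe entry are assumptions chosen because they reproduce the
80 - 128 byte range quoted above (2 to 4 stripes); they are not taken from
this thread, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes; picked so 2..4 stripes give the 80..128 bytes quoted above. */
enum {
    STRIPE_EA_HDR_BYTES     = 32,  /* assumed fixed header */
    STRIPE_EA_PER_OST_BYTES = 24   /* assumed per-stripe entry */
};

/* Size in bytes of a striping attribute that covers `stripes` OSTs. */
static size_t stripe_ea_size(size_t stripes)
{
    return STRIPE_EA_HDR_BYTES + stripes * STRIPE_EA_PER_OST_BYTES;
}

/* Largest stripe count whose attribute still fits in `limit` bytes. */
static size_t stripe_ea_max_stripes(size_t limit)
{
    assert(limit >= STRIPE_EA_HDR_BYTES);
    return (limit - STRIPE_EA_HDR_BYTES) / STRIPE_EA_PER_OST_BYTES;
}
```

Under these assumed sizes, a 4096-byte limit caps a file at 169 stripes,
while a 16kB - 24kB attribute would cover roughly 681 to 1022 stripes.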

It seems possible that we may need to have two separate mechanisms for
storing the small attributes and storing the large ones.  The small
Lustre SAs will be stored in the dnode, and the large ones in the existing
xattr mechanism.  Given that some applications (ZFS/OSX/pNFS) need to 
be able to fall back to looking in an xattr for the data they need for
compatibility, this isn't any extra overhead.
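
The two-tier lookup described above could be sketched roughly as follows.
This is only an illustration: the structures and function names are
hypothetical and do not correspond to any existing ZFS or Lustre interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute record and store, for illustration only. */
struct attr {
    const char *name;
    const void *value;
    size_t      len;
};

struct attr_store {
    const struct attr *attrs;
    size_t             count;
};

/* A file has a small in-dnode store and the existing xattr store. */
struct file_attrs {
    struct attr_store in_dnode;  /* small, fast to reach */
    struct attr_store xattr;     /* large values, existing mechanism */
};

/* Linear search of one store; returns NULL if the name is absent. */
static const struct attr *store_find(const struct attr_store *s,
                                     const char *name)
{
    for (size_t i = 0; i < s->count; i++) {
        if (strcmp(s->attrs[i].name, name) == 0)
            return &s->attrs[i];
    }
    return NULL;
}

/* Try the dnode first, then fall back to the xattr store. */
static const struct attr *attr_lookup(const struct file_attrs *f,
                                      const char *name)
{
    assert(f != NULL && name != NULL);
    const struct attr *a = store_find(&f->in_dnode, name);
    if (a != NULL)
        return a;
    return store_find(&f->xattr, name);
}
```

The common small attribute is found on the first probe; only large or
legacy attributes pay for the second lookup.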

>> Anyway, your proposal is interesting, but there's also one thing I would 
>> like to add:
>>
>> Could we have a special integer value that would essentially mean "this is 
>> an unknown, name-value type of attribute", which would be used to store 
>> additional, perhaps user-specified attributes?
>> In the space of the attribute value, we could store the name of the 
>> attribute and the value itself (perhaps with 1 or 2 additional bytes for 
>> the name length of the attribute).
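
One possible encoding for the name-value attribute proposed above, as a
minimal sketch: a 1-byte name length, then the name, then the value.  The
reserved type ID, the 1-byte length choice, and the function names are all
assumptions made for illustration; the thread does not fix any of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reserved type ID meaning "generic name-value attribute". */
#define SA_TYPE_NAMEVALUE 0xFFFFu

/*
 * Pack [1-byte name length][name][value] into buf.
 * Returns the number of bytes written, or 0 if it does not fit.
 */
static size_t nv_pack(uint8_t *buf, size_t cap, const char *name,
                      const void *val, size_t vlen)
{
    assert(buf != NULL && name != NULL);
    size_t nlen = strlen(name);

    if (nlen == 0 || nlen > UINT8_MAX)
        return 0;
    if (vlen > cap || 1 + nlen > cap - vlen)
        return 0;

    buf[0] = (uint8_t)nlen;
    memcpy(buf + 1, name, nlen);
    if (vlen > 0)
        memcpy(buf + 1 + nlen, val, vlen);
    return 1 + nlen + vlen;
}

/*
 * Split a packed attribute of `len` bytes back into name and value.
 * The returned pointers alias buf.  Returns 0 on success, -1 if malformed.
 */
static int nv_unpack(const uint8_t *buf, size_t len,
                     const uint8_t **name, size_t *nlen,
                     const uint8_t **val, size_t *vlen)
{
    if (len < 1)
        return -1;
    size_t n = buf[0];
    if (n == 0 || 1 + n > len)
        return -1;

    *name = buf + 1;
    *nlen = n;
    *val  = buf + 1 + n;
    *vlen = len - 1 - n;
    return 0;
}
```

The cost of this scheme is that every such attribute spends extra bytes on
its name, which matters when the space in the dnode is small.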

To be clear - as yet Lustre has a fairly limited set of "system attributes"
that are needed for high performance operation.  There is the ability to
store "user attributes" on a file, and while good performance is desirable
this is not a widely-used feature and falls into the "nice to have" category.
In ext3 there is no separation of system attributes and user attributes, so
they all benefit from the [di]node local storage optimization.

>> I also think that instead of having an additional block pointer (which are 
>> huge) in the dnode for "spillage", we should have something like a 
>> "uint64_t dn_spillobj" which would be an object id of a "spillage" object.
>> An object id is much more space-efficient and, like Andreas mentioned, 
>> allows for an unlimited number of attributes/attribute sizes.
>
> I don't quite understand this.  The whole point of these attributes is
> to make them fast to access... using an object ID is going to be far
> more expensive than a block pointer to access.  Matt's model can also
> support unlimited numbers of attributes if we allow the blocks to be
> chained.

While I agree with your point, I think part of the issue is that as soon
as we store a blkptr_t in the dnode this will consume some significant
chunk of the SA space and push attributes out of the dnode.  The other
tradeoff is one of complexity.  You know the code better than I, but it
seems cleaner to have a dnode reference as a container for a bunch of SA
blocks rather than having another block tree attached to the same dnode.

That said, if you don't think there is a lot of added complexity to have
chained blocks for the SAs, I'll defer to your experience.
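
The space tradeoff between the two spill references can be estimated with
some simple arithmetic.  The sketch below assumes a 512-byte dnode with a
64-byte fixed header and 128-byte block pointers; treat these sizes as
assumptions about the on-disk layout, and the function name as hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed on-disk sizes, in bytes. */
enum {
    DNODE_BYTES     = 512,  /* whole dnode */
    DNODE_HDR_BYTES = 64,   /* fixed header fields */
    BLKPTR_BYTES    = 128   /* one block pointer */
};

/*
 * Bytes left in the dnode for SAs, given the number of data block
 * pointers and the size of whatever reference points at the spill area.
 */
static size_t sa_space(size_t data_blkptrs, size_t spill_ref_bytes)
{
    size_t used = DNODE_HDR_BYTES + data_blkptrs * BLKPTR_BYTES
                + spill_ref_bytes;
    assert(used <= DNODE_BYTES);
    return DNODE_BYTES - used;
}
```

With one data block pointer, sa_space(1, 0) leaves 320 bytes for SAs.  A
spill block pointer, sa_space(1, BLKPTR_BYTES), reduces that to 192 bytes,
while an 8-byte object ID, sa_space(1, sizeof(uint64_t)), leaves 312.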

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
