[zfs-code] Design for EAs in dnode.

Mark Maybee Sat, 09 Feb 2008 08:10:22 -0700

Ricardo M. Correia wrote:
> Hi Matthew,
> 
> On Thu, 2008-02-07 at 12:48 -0800, Matthew Ahrens wrote:
>> on disk:
>> header:
>> struct sa_phys {
>>          uint16_t sa_numattrs;
>>          struct {
>>                  uint16_t sa_type;     /* enum sssa_type */
>>                  uint16_t sa_length;   /* in sainfo[sa_type] chunks */
>>          } sssa_attr[];
>> };
>>
>> followed by the data in the order specified by the header.  8-byte alignment 
>> will be enforced for all attribute starting offsets.
> 
> This would require repacking when adding an attribute, right? Anyway, 
> maybe that wouldn't be a big problem..
> 
> There's also the question of the attribute size limit.. from 
> conversations I had with Andreas, I got the feeling that the Lustre 
> layout information could be quite big when a file is striped through all 
> OSTs, and I would imagine this would become an even bigger concern in 
> the future if we intend to continue scaling horizontally.
> 
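As an aside on the quoted layout, here is a minimal sketch in C of how a
reader might find where each attribute's data starts under the 8-byte
alignment rule.  The sainfo_chunk[] table, struct sa_attr_hdr, and both
function names are hypothetical; the thread only says that sa_length is
counted in sainfo[sa_type] chunks.  The sketch also assumes offsets are
measured from the start of the header, which the proposal does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-type chunk sizes in bytes; not defined in the thread. */
static const uint16_t sainfo_chunk[] = { 8, 1, 16 };
#define SA_NTYPES (sizeof (sainfo_chunk) / sizeof (sainfo_chunk[0]))

/* One (type, length) pair from the header, as in the quoted struct. */
struct sa_attr_hdr {
        uint16_t sa_type;
        uint16_t sa_length;     /* in sainfo_chunk[sa_type] units */
};

/* Round an offset up to the next multiple of 8. */
static size_t
sa_align8(size_t off)
{
        return ((off + 7) & ~(size_t)7);
}

/* Byte offset of attribute idx's data, from the start of the header. */
static size_t
sa_data_offset(const struct sa_attr_hdr *attrs, uint16_t numattrs,
    uint16_t idx)
{
        size_t off;
        uint16_t i;

        assert(idx < numattrs);
        /* Header: the count, then one (type, length) pair per attribute. */
        off = sa_align8(sizeof (uint16_t) +
            (size_t)numattrs * sizeof (struct sa_attr_hdr));
        /* Skip the data of every attribute stored before idx. */
        for (i = 0; i < idx; i++) {
                assert(attrs[i].sa_type < SA_NTYPES);
                off = sa_align8(off + (size_t)attrs[i].sa_length *
                    sainfo_chunk[attrs[i].sa_type]);
        }
        return (off);
}
```

Because every offset depends on the lengths stored before it, adding an
attribute anywhere but the end shifts the data that follows, which is the
repacking cost raised in the question above.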
By their very nature, these attributes will have a size limitation.  I
don't really see the point in allowing a size larger than the supported
block size.  64K seems reasonable.  I think that very large values could
possibly be supported using some form of indirection: storing a block
pointer for the value, or storing an object ID for values that span
multiple blocks.
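
To make the indirection idea concrete, here is a rough sketch in C of how
a writer might pick a storage form from the value's size.  Everything in
it is an assumption for illustration: the enum, the function name, and
the inline_space parameter (the room left next to the other attributes)
do not come from any existing code.

```c
#include <assert.h>
#include <stddef.h>

#define SA_MAX_BLOCKSIZE (64 * 1024)    /* the 64K limit suggested above */

enum sa_storage {
        SA_STORE_INLINE,        /* value lives with the other attributes */
        SA_STORE_BLKPTR,        /* value fits one block: store a block pointer */
        SA_STORE_OBJECT         /* value spans blocks: store an object ID */
};

/* Pick the cheapest storage form that can hold value_len bytes. */
static enum sa_storage
sa_choose_storage(size_t value_len, size_t inline_space)
{
        assert(inline_space <= SA_MAX_BLOCKSIZE);
        if (value_len <= inline_space)
                return (SA_STORE_INLINE);
        if (value_len <= SA_MAX_BLOCKSIZE)
                return (SA_STORE_BLKPTR);
        return (SA_STORE_OBJECT);
}
```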


> Anyway, your proposal is interesting, but there's also one thing I would 
> like to add:
> 
> Could we have a special integer value that would essentially mean "this 
> is an unknown, name-value type of attribute", which would be used to 
> store additional, perhaps user-specified attributes?
> In the space of the attribute value, we could store the name of the 
> attribute and the value itself (perhaps with 1 or 2 additional bytes for 
> the name length of the attribute).
> 
> These attributes could have lower priority than the other system 
> attributes (they would be stored at the end) and they would be ignored 
> (but not removed) by any implementation that doesn't understand them.
> 
I'm not sure how storing them at the end makes them lower priority.  I'm
also not sure exactly how these would be used... does Lustre need these?
For any file system layer, we would need to add infrastructure to manage
and manipulate these objects, as well as new interfaces to be able to
access these attributes from outside the kernel.

All that aside, there is nothing that would prohibit the definition of
this type of "special system attribute" in the current model.
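
For illustration, a minimal sketch in C of what the value of such a
name-value attribute could look like, following the suggestion of a
length byte in front of the name.  The one-byte length, the absence of a
terminating NUL, and the sa_nv_pack()/sa_nv_unpack() names are all
assumptions; nothing in the proposal fixes this encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Pack [1-byte name length][name][value] into buf.
 * Returns the number of bytes written, or 0 if it does not fit.
 */
static size_t
sa_nv_pack(uint8_t *buf, size_t buflen, const char *name,
    const void *val, size_t vallen)
{
        size_t namelen = strlen(name);

        if (namelen == 0 || namelen > UINT8_MAX)
                return (0);
        /* Written this way so the size check cannot overflow. */
        if (vallen > buflen || 1 + namelen > buflen - vallen)
                return (0);
        buf[0] = (uint8_t)namelen;
        memcpy(buf + 1, name, namelen);
        if (vallen > 0)
                memcpy(buf + 1 + namelen, val, vallen);
        return (1 + namelen + vallen);
}

/*
 * Split a packed attribute into name and value; both point into buf.
 * Returns 0 on success, -1 if the buffer is malformed.
 */
static int
sa_nv_unpack(const uint8_t *buf, size_t len, const uint8_t **name,
    size_t *namelen, const uint8_t **val, size_t *vallen)
{
        if (len < 1 || buf[0] == 0 || (size_t)buf[0] + 1 > len)
                return (-1);
        *namelen = buf[0];
        *name = buf + 1;
        *val = buf + 1 + *namelen;
        *vallen = len - 1 - *namelen;
        return (0);
}
```

An implementation that does not understand the attribute never needs to
call the unpack side; it can copy the packed bytes through untouched.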

> I also think that instead of having an additional block pointer (which 
> are huge) in the dnode for "spillage", we should have something like a 
> "uint64_t dn_spillobj" which would be an object id of a "spillage" object.
> An object id is much more space-efficient and, like Andreas mentioned, 
> allows for an unlimited number of attributes/attribute sizes.
> 
I don't quite understand this.  The whole point of these attributes is
to make them fast to access... using an object ID is going to be far
more expensive to access than a block pointer.  Matt's model can also
support unlimited numbers of attributes if we allow the blocks to be
chained.
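
A rough in-memory sketch, in C, of what chaining could mean for a lookup.
The structures and the sa_chain_lookup() name are hypothetical, and a
plain pointer stands in for the block pointer that would link one
attribute block to the next on disk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One attribute as seen after a block has been read in. */
struct sa_entry {
        uint16_t type;
        uint16_t len;
        const void *data;
};

/* One attribute block; next stands in for an on-disk block pointer. */
struct sa_block {
        const struct sa_entry *entries;
        uint16_t nentries;
        const struct sa_block *next;
};

/*
 * Search the chain for an attribute type.  *hops is set to the number
 * of links followed, i.e. the extra block reads a real lookup would pay.
 */
static const struct sa_entry *
sa_chain_lookup(const struct sa_block *blk, uint16_t type, unsigned *hops)
{
        uint16_t i;

        assert(hops != NULL);
        *hops = 0;
        for (; blk != NULL; blk = blk->next) {
                for (i = 0; i < blk->nentries; i++) {
                        if (blk->entries[i].type == type)
                                return (&blk->entries[i]);
                }
                if (blk->next != NULL)
                        (*hops)++;
        }
        return (NULL);
}
```

Attributes in the first block cost no extra reads, so the common ones
stay fast; only attributes pushed further down the chain pay for the
additional hops.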

> I also find **extremely** appealing your idea of making this whole thing 
> a DMU service.
> I think that would make our lives (as in Lustre developers' lives) much, 
> much easier in the long run.. the VFS dependencies in the ZPL code have 
> been kind of a pain, to say the least.. :)
> 
> Thanks,
> Ricardo
> 
> --
>       *Ricardo Manuel Correia*
> Lustre Engineering
> 
> *Sun Microsystems, Inc.*
> Portugal
> Phone +351.214134023 / x58723
> Mobile +351.912590825
> Email Ricardo.M.Correia at Sun.COM <mailto:Ricardo.M.Correia at Sun.COM>
> 
> 
